Video script templates for explainer videos, testimonials, product demos, tutorials, and brand stories. Each template includes scene breakdowns, timing markers, B-roll notes, and CTA placement. Built for marketing teams producing 60-second to 10-minute videos.
Last updated: March 2026 · Reading time: 11 min
A video script template gives your video structure before a camera turns on. Without one, you end up with 45 minutes of raw footage, 6 hours of editing, and a final product that wanders. Wyzowl’s 2025 State of Video Marketing report found that 91% of businesses use video as a marketing tool, but only 33% say their videos consistently hit performance targets. The gap is almost always in pre-production planning, not production quality.
A video script is a pre-written document that specifies dialogue, visual directions, timing, and calls-to-action for every scene in a video, ensuring the final product delivers a clear message within a defined runtime.
Scripts do three things that winging it cannot: they enforce a time constraint (critical for ads and social clips), they ensure the CTA lands at the right moment, and they let you get stakeholder approval before spending money on production. A 90-second explainer video script takes 30 minutes to write. The same video without a script takes 3x longer to edit into something coherent.
The template pack includes scripts for the five most common marketing video types (explainer, testimonial, product demo, tutorial, and brand story), each with a different structure optimized for its purpose.
An explainer video script follows a three-act structure: problem, solution, CTA. The entire script runs 60-90 seconds (150-225 words). Every word needs to earn its place. According to Vidyard’s 2024 Video Benchmark Report, 62% of viewers watch explainer videos to completion when they’re under 90 seconds. Over 2 minutes, completion drops to 38%.
| Scene | Duration | Audio / Dialogue | Visual Direction | B-Roll Notes |
|---|---|---|---|---|
| 1: Hook | 0:00-0:08 | “[Relatable frustration statement]. Sound familiar?” | Close-up of person struggling with the problem | Frustrated office worker, cluttered desk, or error screen |
| 2: Problem | 0:08-0:25 | “Every [month/week], [target audience] wastes [X hours/dollars] on [problem]. That’s [Y] per year going to [waste/competitors].” | Animated data visualization showing the cost | Time-lapse of manual work, money counter, calendar filling up |
| 3: Solution Intro | 0:25-0:40 | “[Product name] fixes this. It [one-sentence description of core function].” | Product logo reveal, then UI walkthrough | Clean product shots, interface highlights |
| 4: Key Benefits | 0:40-1:05 | “With [product], you get: [benefit 1 with number], [benefit 2 with number], and [benefit 3 with number].” | Split screen showing before/after for each benefit | Dashboard screenshots, happy customer reactions, metrics improving |
| 5: Social Proof | 1:05-1:15 | “[X,000] companies already use [product], including [recognizable brand].” | Logo bar of customers, customer quote overlay | Customer testimonial clip (3-5 seconds), logo carousel |
| 6: CTA | 1:15-1:30 | “Start your free trial at [URL]. No credit card required.” | Full-screen CTA with URL and button animation | Screen recording of signup flow (show how easy it is) |
The critical rule for explainers: one product, one problem, one CTA. If you’re tempted to mention a second use case, make a second video. Viewers don’t rewind. They leave.
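To see how tight the explainer timings are, here is a minimal sketch that turns each scene’s duration into a word budget. It assumes the 150-words-per-minute conversational rate and copies the timing markers from the template above; the function names are illustrative, not part of the template pack.

```python
# Scene timings copied from the explainer template above.
EXPLAINER_SCENES = [
    ("Hook", "0:00", "0:08"),
    ("Problem", "0:08", "0:25"),
    ("Solution Intro", "0:25", "0:40"),
    ("Key Benefits", "0:40", "1:05"),
    ("Social Proof", "1:05", "1:15"),
    ("CTA", "1:15", "1:30"),
]

WORDS_PER_MINUTE = 150  # assumption: conversational speaking rate


def to_seconds(timestamp):
    """Convert an 'M:SS' timing marker into total seconds."""
    minutes, seconds = timestamp.split(":")
    return int(minutes) * 60 + int(seconds)


def word_budgets(scenes, wpm=WORDS_PER_MINUTE):
    """Return {scene name: maximum words}, rounded down per scene."""
    budgets = {}
    for name, start, end in scenes:
        duration = to_seconds(end) - to_seconds(start)
        budgets[name] = duration * wpm // 60
    return budgets


if __name__ == "__main__":
    for name, words in word_budgets(EXPLAINER_SCENES).items():
        print(f"{name}: up to {words} words")
```

Under these assumptions the 8-second hook gets about 20 words, which is roughly two short sentences. If a scene’s dialogue runs over its budget, trim the dialogue rather than stretching the scene.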
Testimonial videos use the SCSR framework: Situation, Challenge, Solution, Result. You’re not scripting word-for-word (that looks rehearsed and kills authenticity). You’re scripting the questions that guide the interviewee through a compelling narrative arc. The interviewer asks, the customer answers in their own words.
| Scene | Duration | Interview Question (Off-Camera) | Visual Direction | B-Roll Notes |
|---|---|---|---|---|
| 1: Situation | 0:00-0:20 | “Tell me about your company and your role.” | Customer speaking to camera, name/title lower third | Customer’s office, team working, company logo |
| 2: Challenge | 0:20-0:45 | “What was the biggest challenge you were facing before you found [product]?” | Continue interview, cut to B-roll on key phrases | Screenshots of old process, team meetings, whiteboard with problem notes |
| 3: Solution | 0:45-1:10 | “How did [product] help? What was the implementation like?” | Customer speaking, cut to product in use | Screen recordings of the product, onboarding photos, team using the tool |
| 4: Result | 1:10-1:40 | “What results have you seen? Can you share any specific numbers?” | Customer delivers the key metric, number displayed as text overlay | Dashboard showing improvement, before/after comparison, team celebration |
| 5: Recommendation | 1:40-2:00 | “Would you recommend [product]? Who would benefit most?” | Customer speaking directly to camera, natural close | Wide shot of customer at their desk, product on screen in background |
Pro tip: record the interview for 15-20 minutes even though you’ll use 2 minutes. Ask each question 2-3 different ways. The best soundbites come from the second or third time a customer explains something, after they’ve warmed up. BrightLocal (2024) found that 79% of consumers say video testimonials influence their purchase decisions, but only when the delivery feels natural, not scripted.
Product demo scripts follow a Feature-Benefit-Proof loop. Show the feature, explain the benefit, prove it with a specific result or data point. Then move to the next feature. Limit demos to 3-5 features. Gong’s analysis of 25,537 sales demos (2024) found that demos covering 3-4 features convert 28% better than those covering 7+ features.
| Scene | Duration | Audio / Dialogue | Visual Direction |
|---|---|---|---|
| 1: Context | 0:00-0:30 | “If you’re a [role] who needs to [task], here’s how [product] makes that happen in [timeframe] instead of [current timeframe].” | Presenter on camera, then transition to screen share |
| 2: Feature 1 | 0:30-1:15 | “First, [feature name]. [Show the action]. This means you can [benefit], which our customers say saves them [X hours/week].” | Screen recording of feature in action, highlight clicks with cursor emphasis |
| 3: Feature 2 | 1:15-2:00 | “Next, [feature name]. Watch what happens when I [action]. That just [accomplished benefit] in [timeframe].” | Screen recording, split screen with before/after if applicable |
| 4: Feature 3 | 2:00-2:45 | “This is the one our customers mention most: [feature name]. [Demo the action]. [Company name] used this to [specific result].” | Screen recording, customer quote overlay or mini-testimonial clip |
| 5: Integration/Workflow | 2:45-3:30 | “And it connects to the tools you already use. [Show integration with CRM, email, etc.]. Everything syncs automatically.” | Show integration settings, data flowing between tools |
| 6: CTA | 3:30-4:00 | “Start with a free trial or book a live demo with our team. Link’s below.” | Presenter on camera, CTA screen with URL and QR code |
Keep the demo environment clean. Use a dedicated demo account with realistic (not lorem ipsum) data. Viewers should be able to picture their own data in the product. Rehearse the exact click path 3 times before recording. Nothing kills a demo faster than a loading spinner or a wrong click.
Tutorial videos use the Hook-Teach-Recap structure. The hook grabs attention with the outcome (“By the end of this video, you’ll know how to…”), the teach section delivers the content in numbered steps, and the recap reinforces the key points. YouTube’s Creator Academy (2025) reports that tutorial videos with clear timestamps and step numbering get 2.3x more watch time than unstructured how-to content.
| Scene | Duration | Audio / Dialogue | Visual Direction |
|---|---|---|---|
| 1: Hook | 0:00-0:20 | “By the end of this video, you’ll know exactly how to [specific outcome]. I’ll walk you through [X] steps, and I’ll show you the mistake that 80% of people make at step [Y].” | Presenter on camera, energetic but not hyper. Show end result briefly. |
| 2: Context | 0:20-0:45 | “Before we start, here’s what you’ll need: [list 2-3 prerequisites]. If you don’t have [X], check the link in the description.” | On-screen checklist graphic |
| 3: Step 1 | 0:45-1:45 | “Step 1: [Action]. Go to [location] and click [button]. You’ll see [result].” | Screen recording with cursor highlights and text callouts |
| 4: Steps 2-4 | 1:45-4:30 | Repeat pattern: state the step, show the action, explain why it matters, show the result. | Alternate between screen recording and presenter on camera for transitions |
| 5: Common Mistake | 4:30-5:15 | “Now here’s where most people get stuck. They [common error]. Instead, do [correct approach]. See the difference?” | Side-by-side: wrong way vs. right way |
| 6: Recap | 5:15-5:45 | “Let’s recap. Step 1: [quick summary]. Step 2: [quick summary]. Step 3: [quick summary]. And remember, avoid [common mistake].” | Numbered list on screen, presenter voice-over |
| 7: CTA | 5:45-6:00 | “If this helped, subscribe and check out [related video] for the next step. Drop a comment if you have questions.” | End screen with subscribe button, related video thumbnails |
For YouTube tutorials, add chapter markers in the description. Each step becomes a timestamp. This improves both viewer experience and search visibility (Google surfaces chaptered videos in search results with direct links to relevant sections).
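The chapter list can be generated straight from the scene table. The sketch below uses the tutorial template’s scene names and start times. The checks (first chapter at 0:00, at least three chapters, each at least 10 seconds long) reflect our understanding of YouTube’s chapter requirements, so treat them as assumptions and confirm against YouTube’s current help pages.

```python
# Scene titles and start times (in seconds) from the tutorial template above.
TUTORIAL_SCENES = [
    ("Hook", 0),
    ("Context", 20),
    ("Step 1", 45),
    ("Steps 2-4", 105),
    ("Common Mistake", 270),
    ("Recap", 315),
    ("CTA", 345),
]

MIN_CHAPTER_SECONDS = 10  # assumption: minimum chapter length on YouTube


def format_timestamp(total_seconds):
    """Format seconds as an 'M:SS' timestamp."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


def chapter_lines(scenes, video_length):
    """Return one 'M:SS Title' line per scene for the video description."""
    if len(scenes) < 3:
        raise ValueError("need at least three chapters")
    if scenes[0][1] != 0:
        raise ValueError("the first chapter must start at 0:00")
    starts = [start for _, start in scenes] + [video_length]
    for current, following in zip(starts, starts[1:]):
        if following - current < MIN_CHAPTER_SECONDS:
            raise ValueError(f"chapter at {format_timestamp(current)} is too short")
    return [f"{format_timestamp(start)} {title}" for title, start in scenes]


if __name__ == "__main__":
    print("\n".join(chapter_lines(TUTORIAL_SCENES, video_length=360)))
```

In practice you would replace the generic scene names with descriptive, searchable titles (for example, the actual action performed in each step) before pasting the lines into the description.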
Brand story videos follow Origin-Mission-Proof-CTA. They’re the most emotional of the five types and the hardest to script without slipping into corporate clichés. The story must be specific and human. “We started in a garage” works. “We’re passionate about making the world better” doesn’t. Nike’s brand films don’t talk about shoe manufacturing. They talk about athletes. Your brand story should talk about the people you serve.
| Scene | Duration | Audio / Dialogue | Visual Direction | B-Roll Notes |
|---|---|---|---|---|
| 1: Origin | 0:00-0:40 | “In [year], [founder/team] noticed [specific problem]. [Specific anecdote that makes the problem real].” | Founder speaking to camera or voice-over with archival/early-days footage | Early office photos, first product prototype, founding team |
| 2: Mission | 0:40-1:10 | “We set out to [specific mission]. Not [what competitors do], but [what makes you different].” | Team working, montage of building/creating | Workshop footage, design process, team collaboration shots |
| 3: Proof | 1:10-2:00 | “Since then, we’ve [major milestone with number]. [Customer name] said it best: ‘[2-sentence quote].’ And [second proof point with data].” | Customer clips, data visualizations, milestone moments | Customer using product, event footage, awards/press mentions |
| 4: Where We’re Going | 2:00-2:30 | “Today, we’re [current focus]. Because [why this matters to customers, not to you].” | Forward-looking shots: new product, team expansion, vision | New office, product roadmap visuals, customer growth map |
| 5: CTA | 2:30-2:50 | “Join [X,000+] [customers/members/users] who [specific outcome]. [Action verb] at [URL].” | Logo on screen, URL, CTA button | Community montage, customer faces, final product shot |
The single most important brand story rule: make the customer the hero, not your company. Your company is the guide that helped them succeed. This is the StoryBrand framework (Donald Miller) applied to video, and it works because viewers care about people like them achieving results, not about your founding story for its own sake.
Three formulas cover 90% of marketing video scripts. Pick the one that matches your video’s goal:
PAS (Problem-Agitate-Solve). State the problem. Make it feel urgent by showing consequences. Present your product as the answer. Best for: explainer videos, social ads, landing page videos. Example: “Your team spends 12 hours a week on manual reporting. That’s 624 hours a year, nearly a third of a full-time employee’s year lost to spreadsheets. [Product] automates your reports in 15 minutes.”
AIDA (Attention-Interest-Desire-Action). Grab attention with a surprising fact or bold claim. Build interest with the “how.” Create desire with benefits and social proof. Close with a clear action. Best for: product launches, demo videos, sales enablement. This formula has been tested since 1898 (Elias St. Elmo Lewis) and still outperforms unstructured scripts by 35-50% on completion rates, according to a Wistia analysis of 14,000 marketing videos (2024).
Before-After-Bridge. Show the “before” state (the viewer’s current situation). Show the “after” state (what life looks like with your product). Build the bridge (how to get from before to after). Best for: testimonial videos, case study videos, brand stories.
One rule that applies to all three: front-load the value. YouTube’s internal data shows that 20% of viewers leave within the first 10 seconds. Your hook has to earn the next 80 seconds. Don’t start with your logo animation. Start with the viewer’s problem.
Pacing is how fast or slow information moves through the video. Too fast and viewers feel overwhelmed. Too slow and they leave. Here are the benchmarks we use at ScaleGrowth.Digital:
| Metric | Guideline | Why |
|---|---|---|
| Speaking rate | 140-160 words per minute | Conversational pace, easy to follow. Faster than that feels like a sales pitch. |
| Scene length | 3-8 seconds per visual | Human attention resets every 8 seconds (Microsoft attention study, 2023). Change visuals within that window. |
| CTA timing (under 2 min) | Last 15 seconds | Short videos should end on the CTA. Don’t waste time with a post-CTA outro. |
| CTA timing (2-10 min) | 75% mark + end | First mention at 75% (when viewers are most engaged), repeat at the end. |
| B-roll ratio | 40-60% of total runtime | Talking-head footage without B-roll drops retention by 25-30% (Wistia, 2024). |
| Silence/pauses | 0.5-1 second after key points | Gives the viewer time to absorb. Especially important after numbers or surprising claims. |
| Music | 60-70% quieter than dialogue | Background only. If viewers notice the music, it’s too loud. |
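Two of these guidelines are easy to check against an edit decision list before the final cut. The sketch below is a rough illustration: the `Shot` structure and function names are hypothetical, and the thresholds are the table’s 8-second visual limit and 40-60% B-roll range.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    seconds: float   # how long the visual stays on screen
    is_broll: bool   # True for B-roll, False for talking head


def broll_ratio(shots):
    """Share of total runtime covered by B-roll (0.0 to 1.0)."""
    total = sum(shot.seconds for shot in shots)
    if total == 0:
        return 0.0
    return sum(shot.seconds for shot in shots if shot.is_broll) / total


def pacing_warnings(shots, max_shot_seconds=8, broll_range=(0.40, 0.60)):
    """Return a list of human-readable warnings; empty means within guidelines."""
    warnings = []
    for index, shot in enumerate(shots, start=1):
        if shot.seconds > max_shot_seconds:
            warnings.append(f"shot {index} holds for {shot.seconds}s (guideline: {max_shot_seconds}s or less)")
    low, high = broll_range
    ratio = broll_ratio(shots)
    if not low <= ratio <= high:
        warnings.append(f"B-roll is {ratio:.0%} of runtime (guideline: {low:.0%}-{high:.0%})")
    return warnings


if __name__ == "__main__":
    rough_cut = [Shot(6, False), Shot(4, True), Shot(10, False)]
    for warning in pacing_warnings(rough_cut):
        print(warning)
```

The rough cut in the example trips both checks: its third shot holds for 10 seconds, and B-roll makes up only a fifth of the runtime.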
Word count to runtime conversion: 150 words equals roughly 1 minute of spoken content. A 90-second explainer video script should be 200-225 words. A 5-minute tutorial should be 700-750 words. These numbers assume conversational pacing with natural pauses. If your script reads longer than the target runtime at 150 WPM, cut content rather than speeding up delivery.
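The conversion is simple enough to script. This is a minimal sketch assuming 150 words per minute and counting words by splitting on whitespace; the function names are illustrative, and bracketed placeholders count as words until you fill them in.

```python
WORDS_PER_MINUTE = 150  # conversational pace used throughout this guide


def estimated_runtime_seconds(script, wpm=WORDS_PER_MINUTE):
    """Estimate spoken runtime in seconds from a script's word count."""
    return len(script.split()) * 60 / wpm


def max_words(target_seconds, wpm=WORDS_PER_MINUTE):
    """Largest word count that fits the target runtime."""
    return target_seconds * wpm // 60


def words_to_cut(script, target_seconds, wpm=WORDS_PER_MINUTE):
    """How many words must go for the script to fit; 0 if it already fits."""
    return max(0, len(script.split()) - max_words(target_seconds, wpm))


if __name__ == "__main__":
    print(max_words(90))   # word ceiling for a 90-second explainer
    print(max_words(300))  # word ceiling for a 5-minute tutorial
```

A word-count estimate is only a first pass. Pauses, on-screen demos, and B-roll without voice-over all add time, so the read-aloud test remains the final check.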
“We’ve produced or consulted on video projects for over 25 brands. The number one mistake isn’t production quality, it’s script length. Teams write 500-word scripts for 60-second videos, then try to speed up the voice-over to fit. The result sounds rushed, viewers retain nothing, and the CTA gets cut for time. Write to the clock first. If your 90-second script is over 225 words, you need to cut, not compress.”
Hardik Shah, Founder of ScaleGrowth.Digital
Four mistakes to avoid:
Mistake 1: Starting with your logo. A 5-second logo intro on a 60-second video wastes 8% of your runtime on something that adds zero value to the viewer. Put your logo in the lower third throughout the video instead.
Mistake 2: Multiple CTAs. “Subscribe, visit our website, follow us on social, and download our ebook.” That’s four asks. The viewer does none of them. One video, one CTA. HubSpot’s 2024 video marketing data shows single-CTA videos convert 247% better than multi-CTA videos.
Mistake 3: Writing for readers, not listeners. Read your script out loud before recording. If any sentence requires a second read to understand, simplify it. Written language and spoken language are different. Contractions, short sentences, and conversational phrasing work on camera. Academic-sounding prose doesn’t.
Mistake 4: No B-roll planning. If B-roll isn’t in the script, it won’t be in the edit. Mark every scene with visual direction and B-roll notes. Editors can’t use footage you didn’t plan to capture.
Each template in the download is a table with 5 columns: scene number, duration, audio/dialogue, visual direction, and B-roll notes. Here’s the quick-start process:
Step 1: Pick the template that matches your video type. Don’t mix templates. An explainer is not a tutorial.
Step 2: Fill in the audio/dialogue column first. Write conversationally, as if you’re explaining it to a colleague. Keep word count within the target for your runtime.
Step 3: Add visual direction for each scene. Describe what’s on screen: talking head, screen recording, animation, product shot.
Step 4: Note B-roll requirements. This becomes your shot list for production day. Missing B-roll means awkward jump cuts in editing.
Step 5: Read the entire script out loud and time it. Adjust until it fits your target runtime at conversational pace.
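If you keep the filled-in template as structured data, Step 4 can be partly automated. The sketch below assumes each row carries the five template columns and that B-roll notes are comma-separated, as in the tables above; `SceneRow`, `shot_list`, and `missing_broll` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class SceneRow:
    scene: str
    duration: str
    dialogue: str
    visual_direction: str
    broll_notes: str = ""


def shot_list(rows):
    """Split each row's B-roll notes into (scene, shot) pairs for production day."""
    shots = []
    for row in rows:
        for item in row.broll_notes.split(","):
            item = item.strip()
            # Drop a leading conjunction left over from the prose list.
            for prefix in ("or ", "and "):
                if item.startswith(prefix):
                    item = item[len(prefix):]
            if item:
                shots.append((row.scene, item))
    return shots


def missing_broll(rows):
    """Scenes with no B-roll notes, which risk jump cuts in the edit."""
    return [row.scene for row in rows if not row.broll_notes.strip()]


if __name__ == "__main__":
    rows = [
        SceneRow("1: Hook", "0:00-0:08", "[Relatable frustration statement]",
                 "Close-up of person struggling with the problem",
                 "Frustrated office worker, cluttered desk, or error screen"),
        SceneRow("6: CTA", "1:15-1:30", "Start your free trial at [URL].",
                 "Full-screen CTA with URL and button animation"),
    ]
    for scene, shot in shot_list(rows):
        print(f"{scene}: {shot}")
    print("Scenes missing B-roll:", missing_broll(rows))
```

Running the missing-B-roll check before production day catches the gap while it is still cheap to fix.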
For webinar-specific scripts, see our webinar script template. For short-form social video scripts, check the TikTok script template.
Get all 5 video script templates in Google Docs format. Each includes scene breakdowns, timing markers, B-roll notes, and a word-count-to-runtime calculator.
Use the 150-words-per-minute rule. A 60-second video needs about 150 words. A 90-second explainer needs 200-225 words. A 5-minute tutorial needs 700-750 words. Always read the script aloud and time it. Written pacing is deceptively fast compared to natural speaking pace.
Don’t script a testimonial video word-for-word. Script the questions, not the answers. A word-for-word testimonial looks and sounds rehearsed, which kills credibility. Prepare 8-10 guided questions that follow the Situation-Challenge-Solution-Result framework. Record 15-20 minutes and edit down to the 90-120 second highlights.
For videos under 2 minutes, place the CTA in the last 15 seconds. For videos 2-10 minutes, mention it at the 75% mark and again at the end. For YouTube videos over 10 minutes, add a mid-roll CTA at the 40-50% mark as well. Except in short social ads, where the whole video runs 30 seconds or less, never place a CTA in the first 30 seconds.
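These placement rules translate into a small helper. This is a sketch under two labeled assumptions: “at the end” is read as the final 15 seconds for every length, and the mid-roll mention uses 45%, the midpoint of the 40-50% range. The function name is illustrative.

```python
END_CTA_SECONDS = 15  # assumption: the closing CTA occupies the final 15 seconds


def cta_positions(runtime_seconds):
    """Return the start times (in seconds) of each CTA mention."""
    end_cta = max(0, runtime_seconds - END_CTA_SECONDS)
    if runtime_seconds < 120:
        # Under 2 minutes: a single CTA in the last 15 seconds.
        return [end_cta]
    # 2 minutes and up: first mention at the 75% mark, repeat at the end.
    positions = [runtime_seconds * 75 // 100, end_cta]
    if runtime_seconds > 600:
        # Over 10 minutes: add a mid-roll mention (45% is an assumed midpoint).
        positions.insert(0, runtime_seconds * 45 // 100)
    return positions


if __name__ == "__main__":
    for runtime in (90, 240, 900):
        print(runtime, cta_positions(runtime))
```

For a 4-minute demo, for instance, this puts the first mention at 3:00 and the closing CTA at 3:45.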
The ideal video length depends on the platform and purpose. Social ads: 15-30 seconds. Explainer videos: 60-90 seconds. Product demos: 2-5 minutes. Tutorials: 5-10 minutes. Webinars: 30-45 minutes. Wyzowl’s 2025 data shows that 2 minutes is the sweet spot for marketing videos overall, with 68% average completion rate at that length.
Most marketing videos don’t need professional equipment. A modern smartphone (iPhone 13 or later, or Samsung Galaxy S22 or later), a $25 lavalier microphone, and natural window light produce professional-enough quality. Audio quality matters more than video quality. Viewers tolerate 720p video but won’t tolerate echo or background noise. Invest in a microphone before a camera.
ScaleGrowth.Digital develops video content strategies tied to search demand and conversion funnels. From scripting to distribution planning, we handle the strategy so your production team can focus on execution.