A 4-tab experiment log with test backlog, active test tracking, results archive, and a learnings database. ICE scoring, confidence intervals, and revenue impact calculations built in.
Last updated: March 2026 · Reading time: 9 min
VWO’s 2024 State of Experimentation report found that companies running 10+ tests per month are 2.7x more likely to report significant revenue growth than those running fewer than 3. But running tests isn’t enough. VWO also found that 62% of CRO teams have no centralized documentation of past experiments. That means they’re re-testing ideas, missing patterns across tests, and unable to build on previous wins. This template fixes that. It gives you a backlog to prioritize ideas, a live tracker for running experiments, a results archive with revenue impact calculations, and a learnings database that turns individual test results into institutional knowledge.

A/B test log: A structured record of every split test or multivariate experiment, capturing the hypothesis, test design, sample size, statistical results, business impact, and the key learning extracted from each test.
Four tabs cover the full testing lifecycle: idea, execution, analysis, and learning.
| Tab | Purpose | Key Columns |
|---|---|---|
| 1. Test Backlog | Prioritized idea queue | Hypothesis, Page/Element, Impact (1-10), Confidence (1-10), Ease (1-10), ICE Score, Priority Rank |
| 2. Active Tests | Currently running experiments | Test Name, Start Date, End Date, Control Description, Variant Description, Sample Size Target, Current Status |
| 3. Results Log | Completed test outcomes | Test Name, Winner, Uplift %, Confidence Level, Revenue Impact (Monthly), Revenue Impact (Annual), Decision |
| 4. Learnings Database | Searchable knowledge base | Test ID, Category, Key Finding, Applies To, Date, Validated By Repeat Test? |
Each tab maps to a stage of the experimentation workflow.
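The Revenue Impact columns in the Results Log are just arithmetic on numbers you already track: monthly traffic, baseline conversion rate, and average order value. A minimal sketch of that calculation in Python (the function and the example figures are illustrative, not the template’s built-in formulas):

```python
def revenue_impact(monthly_visitors: int, baseline_cr: float,
                   uplift_pct: float, avg_order_value: float) -> tuple[float, float]:
    """Return (monthly, annual) incremental revenue from a winning variant."""
    baseline_conversions = monthly_visitors * baseline_cr
    extra_conversions = baseline_conversions * (uplift_pct / 100)
    monthly = extra_conversions * avg_order_value
    return monthly, monthly * 12

# Illustrative inputs: 40k visitors/month, 2.5% baseline CR, 12% uplift, $80 AOV
monthly, annual = revenue_impact(40_000, 0.025, 12, 80)
print(f"Monthly: ${monthly:,.0f}  Annual: ${annual:,.0f}")
# Monthly: $9,600  Annual: $115,200
```

Note that annualizing a short-term uplift by multiplying by 12 is optimistic, since test effects often decay. Treat the annual figure as an upper bound when it goes into a business review.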
A simple framework that prevents your team from testing low-impact ideas first.
| Dimension | Question It Answers | Scoring Guide |
|---|---|---|
| Impact (I) | How much will this move the target metric? | 1-3: Minor (<5% lift). 4-6: Moderate (5-15%). 7-10: Major (15%+) |
| Confidence (C) | How sure are you this will work? | 1-3: Gut feeling. 4-6: Supported by data or competitor evidence. 7-10: Validated by past tests |
| Ease (E) | How easy is this to implement and test? | 1-3: Needs dev work, 2+ weeks. 4-6: Moderate effort, 1-2 weeks. 7-10: Copy/config change, days |
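To see how the three scores turn into a Priority Rank, here is a small sketch (the backlog entries are invented for illustration; in the template itself this is a spreadsheet formula, not code):

```python
# A hypothetical backlog scored with ICE; each dictionary stands in for
# a row of the Test Backlog tab.

backlog = [
    {"hypothesis": "Cut checkout form from 9 fields to 5", "impact": 8, "confidence": 6, "ease": 7},
    {"hypothesis": "Add customer logos near primary CTA",  "impact": 5, "confidence": 7, "ease": 8},
    {"hypothesis": "Redesign pricing page layout",         "impact": 9, "confidence": 4, "ease": 2},
]

# ICE Score = simple average of the three 1-10 dimensions
for idea in backlog:
    idea["ice"] = round((idea["impact"] + idea["confidence"] + idea["ease"]) / 3, 1)

# Priority Rank = highest ICE first
for rank, idea in enumerate(sorted(backlog, key=lambda i: i["ice"], reverse=True), 1):
    print(f"{rank}. ICE {idea['ice']:.1f}  {idea['hypothesis']}")
# 1. ICE 7.0  Cut checkout form from 9 fields to 5
# 2. ICE 6.7  Add customer logos near primary CTA
# 3. ICE 5.0  Redesign pricing page layout
```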
Setup takes 20 minutes. The ongoing habit takes 5 minutes per test.
We use a version of this log inside every analytics and measurement engagement. The Results Log tab is what goes into quarterly business reviews. The Learnings Database is what our CRO team references before designing any new test. It’s the compound interest of experimentation.

“The test itself is worth 10% of the value. The other 90% is in the documented learning. A company that runs 50 tests and documents zero learnings is worse off than a company that runs 15 tests and builds a searchable knowledge base from every result.”
Hardik Shah, Founder of ScaleGrowth.Digital
Get the complete 4-tab experiment tracker with ICE scoring, sample size calculator, revenue impact formulas, and a learnings database. Ready to use in 20 minutes. Download Free Template →
Google Sheets format. No spam. Instant access.
Calculate statistical significance, required sample size, and test duration before you start any experiment. Try Calculator →
A pre-launch checklist for landing pages covering copy, design, load speed, and conversion elements. Pairs well with testing. Get Checklist →
The full CRO methodology from audit through testing to implementation. Context for how this test log fits into a broader program. Read Guide →
ICE stands for Impact, Confidence, and Ease. Each dimension is scored 1-10. The ICE score is the average of all three: (I + C + E) / 3. For example, a test scored Impact 8, Confidence 6, Ease 7 gets an ICE of (8 + 6 + 7) / 3 = 7.0. Tests with higher ICE scores should be prioritized. The framework was created by Sean Ellis at GrowthHackers and is widely used by CRO teams to rank test ideas objectively rather than relying on opinion or seniority.
Run each test until it reaches your pre-calculated sample size, and always for at least 1 full business cycle (typically 7-14 days). For most sites with 10,000-50,000 monthly visitors, expect tests to run 2-4 weeks. Never stop a test early because results look promising. Early stopping inflates false-positive rates significantly.
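If you want to sanity-check that pre-calculated sample size outside the spreadsheet, the standard two-proportion approximation is short. A minimal sketch, assuming a 2.5% baseline conversion rate, a lift to 3.0% worth detecting, 95% confidence, 80% power, and 50,000 monthly visitors split 50/50 (all numbers illustrative):

```python
from math import ceil
from scipy.stats import norm

def sample_size_per_variant(p1: float, p2: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per variant for a two-sided, two-proportion z-test."""
    z_alpha = norm.ppf(1 - alpha / 2)   # 1.96 at 95% confidence
    z_beta = norm.ppf(power)            # 0.84 at 80% power
    p_bar = (p1 + p2) / 2               # pooled conversion rate
    return ceil(2 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar)
                / (p2 - p1) ** 2)

n = sample_size_per_variant(0.025, 0.030)   # detect 2.5% -> 3.0%
daily_per_variant = 50_000 / 30 / 2         # 50k/month, 50/50 split
print(n, "visitors per variant;", ceil(n / daily_per_variant), "days minimum")
# ~16,800 per variant, i.e. roughly three weeks at this traffic level
```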
The industry standard is 95% statistical confidence (p-value < 0.05). For high-stakes tests (pricing changes, checkout flow), use 99% confidence. For lower-risk tests (button color, headline copy), 90% confidence can be acceptable if you're optimizing for test velocity over certainty.
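Once a test finishes, a two-proportion z-test converts raw counts into the p-value you compare against these thresholds. A sketch using statsmodels, one common tool for this (the conversion counts are invented):

```python
from statsmodels.stats.proportion import proportions_ztest

# Invented counts: control converted 310/12,400, variant 372/12,350
conversions = [310, 372]
visitors = [12_400, 12_350]

z_stat, p_value = proportions_ztest(conversions, visitors)

alpha = 0.05   # 95% confidence; tighten to 0.01 for pricing or checkout tests
verdict = "significant" if p_value < alpha else "not significant"
print(f"z = {z_stat:.2f}, p = {p_value:.4f} -> {verdict}")
# z = -2.46, p = 0.0139 -> significant at 95%, but not at the 99% bar
```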
This depends on traffic volume. Sites with 100,000+ monthly visitors can run 4-8 concurrent tests across different pages. Sites with 10,000-50,000 visitors should focus on 1-2 tests at a time to avoid traffic splitting issues. VWO’s 2024 report found that companies running 10+ tests per month are 2.7x more likely to report significant revenue growth.
Our analytics team designs experiment roadmaps, builds testing infrastructure, and documents learnings that compound over time. We’ve helped clients find 15-40% conversion lifts through systematic testing. Talk to Our Analytics Team → Get in Touch →