A 4-tab experiment log with test backlog, active test tracking, results archive, and a learnings database. ICE scoring, confidence intervals, and revenue impact calculations built in.
Last updated: March 2026 · Reading time: 9 min
VWO’s 2024 State of Experimentation report found that companies running 10+ tests per month are 2.7x more likely to report significant revenue growth than those running fewer than 3. But running tests isn’t enough. VWO also found that 62% of CRO teams have no centralized documentation of past experiments. That means they’re re-testing ideas, missing patterns across tests, and unable to build on previous wins. This template fixes that. It gives you a backlog to prioritize ideas, a live tracker for running experiments, a results archive with revenue impact calculations, and a learnings database that turns individual test results into institutional knowledge.

A/B test log: A structured record of every split test or multivariate experiment, capturing the hypothesis, test design, sample size, statistical results, business impact, and the key learning extracted from each test.
Four tabs cover the full testing lifecycle: idea, execution, analysis, and learning.
| Tab | Purpose | Key Columns |
|---|---|---|
| 1. Test Backlog | Prioritized idea queue | Hypothesis, Page/Element, Impact (1-10), Confidence (1-10), Ease (1-10), ICE Score, Priority Rank |
| 2. Active Tests | Currently running experiments | Test Name, Start Date, End Date, Control Description, Variant Description, Sample Size Target, Current Status |
| 3. Results Log | Completed test outcomes | Test Name, Winner, Uplift %, Confidence Level, Revenue Impact (Monthly), Revenue Impact (Annual), Decision |
| 4. Learnings Database | Searchable knowledge base | Test ID, Category, Key Finding, Applies To, Date, Validated By Repeat Test? |
Each tab maps to a stage of the experimentation workflow.
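The Revenue Impact columns in the Results Log are just arithmetic on numbers you already track: monthly traffic, baseline conversion rate, and average order value. A minimal sketch of that calculation in Python (the function and the example figures are illustrative, not the template’s built-in formulas):

```python
def revenue_impact(monthly_visitors: int, baseline_cr: float,
                   uplift_pct: float, avg_order_value: float) -> tuple[float, float]:
    """Return (monthly, annual) incremental revenue from a winning variant."""
    baseline_conversions = monthly_visitors * baseline_cr
    extra_conversions = baseline_conversions * (uplift_pct / 100)
    monthly = extra_conversions * avg_order_value
    return monthly, monthly * 12

# Illustrative inputs: 40k visitors/month, 2.5% baseline CR, 12% uplift, $80 AOV
monthly, annual = revenue_impact(40_000, 0.025, 12, 80)
print(f"Monthly: ${monthly:,.0f}  Annual: ${annual:,.0f}")
# Monthly: $9,600  Annual: $115,200
```

Note that annualizing a short-term uplift by multiplying by 12 is optimistic, since test effects often decay. Treat the annual figure as an upper bound when it goes into a business review.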
A simple framework that prevents your team from testing low-impact ideas first.
| Dimension | Question It Answers | Scoring Guide |
|---|---|---|
| Impact (I) | How much will this move the target metric? | 1-3: Minor (<5% lift). 4-6: Moderate (5-15%). 7-10: Major (15%+) |
| Confidence (C) | How sure are you this will work? | 1-3: Gut feeling. 4-6: Supported by data or competitor evidence. 7-10: Validated by past tests |
| Ease (E) | How easy is this to implement and test? | 1-3: Needs dev work, 2+ weeks. 4-6: Moderate effort, 1-2 weeks. 7-10: Copy/config change, days |
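To see how the three scores turn into a Priority Rank, here is a small sketch (the backlog entries are invented for illustration; in the template itself this is a spreadsheet formula, not code):

```python
# A hypothetical backlog scored with ICE; each dictionary stands in for
# a row of the Test Backlog tab.

backlog = [
    {"hypothesis": "Cut checkout form from 9 fields to 5", "impact": 8, "confidence": 6, "ease": 7},
    {"hypothesis": "Add customer logos near primary CTA",  "impact": 5, "confidence": 7, "ease": 8},
    {"hypothesis": "Redesign pricing page layout",         "impact": 9, "confidence": 4, "ease": 2},
]

# ICE Score = simple average of the three 1-10 dimensions
for idea in backlog:
    idea["ice"] = round((idea["impact"] + idea["confidence"] + idea["ease"]) / 3, 1)

# Priority Rank = highest ICE first
for rank, idea in enumerate(sorted(backlog, key=lambda i: i["ice"], reverse=True), 1):
    print(f"{rank}. ICE {idea['ice']:.1f}  {idea['hypothesis']}")
# 1. ICE 7.0  Cut checkout form from 9 fields to 5
# 2. ICE 6.7  Add customer logos near primary CTA
# 3. ICE 5.0  Redesign pricing page layout
```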
Setup takes 20 minutes. The ongoing habit takes 5 minutes per test.
We use a version of this log inside every analytics and measurement engagement. The Results Log tab is what goes into quarterly business reviews. The Learnings Database is what our CRO team references before designing any new test. It’s the compound interest of experimentation.

“The test itself is worth 10% of the value. The other 90% is in the documented learning. A company that runs 50 tests and documents zero learnings is worse off than a company that runs 15 tests and builds a searchable knowledge base from every result.”
Hardik Shah, Founder of ScaleGrowth.Digital
Get the complete 4-tab experiment tracker with ICE scoring, sample size calculator, revenue impact formulas, and a learnings database. Ready to use in 20 minutes. Download Free Template →
Google Sheets format. No spam. Instant access.
Calculate statistical significance, required sample size, and test duration before you start any experiment. Try Calculator →
A pre-launch checklist for landing pages covering copy, design, load speed, and conversion elements. Pairs well with testing. Get Checklist →
The full CRO methodology from audit through testing to implementation. Context for how this test log fits into a broader program. Read Guide →
ICE stands for Impact, Confidence, and Ease. Each dimension is scored 1-10. The ICE score is the average of all three: (I + C + E) / 3. For example, a test scored Impact 8, Confidence 6, Ease 7 gets an ICE of (8 + 6 + 7) / 3 = 7.0. Tests with higher ICE scores should be prioritized. The framework was created by Sean Ellis at GrowthHackers and is widely used by CRO teams to rank test ideas objectively rather than relying on opinion or seniority.
Run each test until it reaches your pre-calculated sample size, and always for at least 1 full business cycle (typically 7-14 days). For most sites with 10,000-50,000 monthly visitors, expect tests to run 2-4 weeks. Never stop a test early because results look promising. Early stopping inflates false-positive rates significantly.
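If you want to sanity-check that pre-calculated sample size outside the spreadsheet, the standard two-proportion approximation is short. A minimal sketch, assuming a 2.5% baseline conversion rate, a lift to 3.0% worth detecting, 95% confidence, 80% power, and 50,000 monthly visitors split 50/50 (all numbers illustrative):

```python
from math import ceil
from scipy.stats import norm

def sample_size_per_variant(p1: float, p2: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per variant for a two-sided, two-proportion z-test."""
    z_alpha = norm.ppf(1 - alpha / 2)   # 1.96 at 95% confidence
    z_beta = norm.ppf(power)            # 0.84 at 80% power
    p_bar = (p1 + p2) / 2               # pooled conversion rate
    return ceil(2 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar)
                / (p2 - p1) ** 2)

n = sample_size_per_variant(0.025, 0.030)   # detect 2.5% -> 3.0%
daily_per_variant = 50_000 / 30 / 2         # 50k/month, 50/50 split
print(n, "visitors per variant;", ceil(n / daily_per_variant), "days minimum")
# ~16,800 per variant, i.e. roughly three weeks at this traffic level
```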
The industry standard is 95% statistical confidence (p-value < 0.05). For high-stakes tests (pricing changes, checkout flow), use 99% confidence. For lower-risk tests (button color, headline copy), 90% confidence can be acceptable if you're optimizing for test velocity over certainty.
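Once a test finishes, a two-proportion z-test converts raw counts into the p-value you compare against these thresholds. A sketch using statsmodels, one common tool for this (the conversion counts are invented):

```python
from statsmodels.stats.proportion import proportions_ztest

# Invented counts: control converted 310/12,400, variant 372/12,350
conversions = [310, 372]
visitors = [12_400, 12_350]

z_stat, p_value = proportions_ztest(conversions, visitors)

alpha = 0.05   # 95% confidence; tighten to 0.01 for pricing or checkout tests
verdict = "significant" if p_value < alpha else "not significant"
print(f"z = {z_stat:.2f}, p = {p_value:.4f} -> {verdict}")
# z = -2.46, p = 0.0139 -> significant at 95%, but not at the 99% bar
```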
This depends on traffic volume. Sites with 100,000+ monthly visitors can run 4-8 concurrent tests across different pages. Sites with 10,000-50,000 visitors should focus on 1-2 tests at a time to avoid traffic splitting issues. VWO’s 2024 report found that companies running 10+ tests per month are 2.7x more likely to report significant revenue growth.
Our analytics team designs experiment roadmaps, builds testing infrastructure, and documents learnings that compound over time. We’ve helped clients find 15-40% conversion lifts through systematic testing. Talk to Our Analytics Team → Get in Touch →