Enter your test results to calculate statistical significance, p-value, confidence level, and uplift percentage. Includes a sample size calculator to plan your next test. Uses a two-proportion Z-test for accurate results.
Last updated: March 2026 · Reading time: 9 min
Statistical significance means the observed difference between two variations is unlikely to be caused by random chance alone. At the standard 95% confidence level, a p-value below 0.05 indicates significance.

The sample size calculator uses the same Z-test framework in reverse. Given your baseline conversion rate, minimum detectable effect, confidence level, and statistical power, it calculates how many visitors you need per variation before starting the test. This prevents the common mistake of calling a test too early with insufficient data.
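The core calculation can be sketched in a few lines of Python. This is a minimal illustration of a two-proportion Z-test, not the calculator's actual code; the function name and the visitor/conversion counts are made-up examples:

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-tailed two-proportion Z-test; returns (z, p_value, relative uplift %)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)           # pooled conversion rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-tailed p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    uplift = (p_b - p_a) / p_a * 100                   # relative uplift, %
    return z, p_value, uplift

# Hypothetical example: control 300/10,000 (3.0%) vs variant 345/10,000 (3.45%)
z, p, uplift = two_proportion_z_test(300, 10_000, 345, 10_000)
```

With these example numbers the variant shows a 15% relative uplift, yet the p-value comes out above 0.05 — a useful reminder that a big-looking uplift on 10,000 visitors per arm can still fail to reach significance.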
| Metric | What It Tells You | What to Look For |
|---|---|---|
| P-value | Probability of seeing a difference at least this large if there were no real difference | Below 0.05 for 95% confidence. Below 0.01 for 99% confidence. |
| Confidence Level | How sure you can be that the result isn’t random | 95%+ is the industry standard. Some teams use 90% for faster decisions. |
| Uplift % | Relative improvement of Variant B over Control A | Consider business impact. A 2% uplift on $1M revenue = $20K. A 2% uplift on $10K = $200. |
| Sample Size | Visitors needed per variation to detect a given effect | The smaller the expected uplift, the more traffic you need. A 5% MDE needs 4x more traffic than a 10% MDE. |
“I’ve seen teams celebrate a ‘winning’ test that had p=0.04 but only 800 visitors per arm. That’s not a win, that’s a coin flip dressed up in math. Calculate your sample size before you start, commit to running the full duration, and resist the temptation to peek early. The math only works if you follow the protocol.” — Hardik Shah, Founder of ScaleGrowth.Digital
Calculate return on investment for any marketing channel or campaign. Use Calculator →
Complete CRO guide covering research, testing frameworks, and quick wins. Read Guide →
30-point checklist for landing pages that convert. Get Checklist →
The standard threshold is p < 0.05, which corresponds to 95% confidence. This means there's less than a 5% probability of seeing a difference this large if there were no real effect. Some teams use p < 0.10 (90% confidence) for faster decision-making, while high-stakes tests may require p < 0.01 (99% confidence).
It depends on your baseline conversion rate and the minimum effect you want to detect. For a 3% baseline conversion rate and a 10% relative MDE at 95% confidence with 80% power, you need roughly 53,000 visitors per variation. Higher baseline rates and larger expected effects require smaller samples. Use the sample size calculator above to get your specific number.
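You can check numbers like this with the standard two-proportion sample size formula. The sketch below hardcodes the usual z-values for 95% confidence (two-tailed) and 80% power; the function name is illustrative, not from any particular library:

```python
import math

def sample_size_per_arm(baseline, rel_mde, z_alpha=1.959964, z_beta=0.841621):
    """Visitors per variation for a two-tailed test at 95% confidence / 80% power."""
    p1 = baseline
    p2 = baseline * (1 + rel_mde)   # rate the variant must reach
    p_bar = (p1 + p2) / 2
    delta = p2 - p1
    n = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / delta ** 2
    return math.ceil(n)

n = sample_size_per_arm(0.03, 0.10)   # 3% baseline, 10% relative MDE
```

Because sample size scales with the inverse square of the effect size, halving the MDE to 5% roughly quadruples the required traffic — the same relationship noted in the table above.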
Minimum detectable effect is the smallest relative change in conversion rate your test is designed to detect. An MDE of 10% on a 5% baseline means you’re testing whether the variant can move the rate from 5.0% to at least 5.5%. Smaller MDEs require larger sample sizes. A practical MDE for most businesses is 5-15%.
Looking at results early (called “peeking”) inflates your false positive rate. If you check a test 10 times during its run, your actual false positive rate could be 30% instead of the intended 5%. Either commit to running the test to full sample size without checking, or use a sequential testing method that adjusts for multiple looks. Most standard A/B testing tools don’t account for peeking.
This calculator uses a two-tailed test, which is the standard for A/B testing. A two-tailed test checks whether the variant is different from the control in either direction (better or worse). A one-tailed test only checks one direction, which halves the p-value for the same data and requires a somewhat smaller sample (roughly 20% less at 95% confidence and 80% power), but it can’t detect if your variant is actually performing worse. Use two-tailed unless you have a strong statistical reason not to.
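The relationship between the two is simple to see numerically: for the same test statistic, the two-tailed p-value is exactly twice the one-tailed one. The z value below is an arbitrary example:

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = 1.801                                # an arbitrary example z-statistic
one_tailed = 1 - normal_cdf(z)           # tests "variant is better" only
two_tailed = 2 * (1 - normal_cdf(z))     # tests "variant is different"
```

Here the one-tailed p-value clears the 0.05 bar while the two-tailed one does not — exactly the situation where an analyst might be tempted to switch tests after the fact, which is the statistical sin this FAQ warns against.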
Our CRO practice handles test design, implementation, statistical analysis, and rollout. We find the tests that move revenue, not just conversion rate. Get a CRO Audit →