Free tool · 2026
A/B Test Significance Calculator
Stopping a test early because the winner "looks right" is the single most common A/B testing mistake. Drop in your raw numbers — we'll tell you if you've actually seen a real lift or just noise.
Test inputs
Drop in your raw visitors and conversions for each variant.
Variant A (control): conversion rate 3.00%
Variant B (test): conversion rate 3.70%
Confidence level. Industry standard: 95%. Use 90% for early reads, 99% for high-stakes changes.
Verdict
What the math is doing
We run a pooled two-proportion z-test — the standard statistical test for whether two conversion rates are significantly different.
p_pool = (conv_a + conv_b) / (visitors_a + visitors_b)
se = √(p_pool × (1 − p_pool) × (1/v_a + 1/v_b))
z = (rate_b − rate_a) / se
confidence = 2 × Φ(|z|) − 1
At 95% confidence (industry standard), |z| ≥ 1.96 is the threshold. P-values below 0.05 (the complement of 95%) signal "this difference is unlikely to have happened by chance."
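If you want to sanity-check the verdict yourself, here is a minimal Python sketch of the same calculation. The function name and the 10,000-visitors-per-variant figures are ours (hypothetical); the 3.00% and 3.70% rates match the example inputs above.

from math import erf, sqrt

def ab_significance(visitors_a, conv_a, visitors_b, conv_b):
    """Pooled two-proportion z-test; returns (z, confidence)."""
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    # Pooled conversion rate across both variants
    p_pool = (conv_a + conv_b) / (visitors_a + visitors_b)
    # Standard error of the rate difference under the pooled rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / se
    # Two-sided confidence via the standard normal CDF: Φ(|z|) = 0.5 * (1 + erf(|z| / √2))
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * phi - 1

# Hypothetical example: 10,000 visitors per variant, 3.00% vs 3.70% conversion
z, confidence = ab_significance(10_000, 300, 10_000, 370)
print(f"z = {z:.2f}, confidence = {confidence:.1%}")  # |z| ≥ 1.96 means significant at 95%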
Important caveats. (1) Don't peek at the test daily and stop the moment significance is hit — that biases the result. Decide your sample-size target up front and run to it. (2) Significance isn't the same as "big enough to matter." A statistically significant 0.3% lift might be operationally meaningless if your implementation cost is high.
Frequently asked
What does 95% confidence actually mean?
It means: if the variants truly had identical conversion rates, the probability of observing a difference this large (or larger) just by chance is less than 5%. It does NOT mean 'B is 95% likely to be better than A' — that's a common misinterpretation. The honest interpretation: at 95% confidence, you're accepting a 5% false-positive rate over many tests.
When should I use 90% vs 95% vs 99%?
95% is the default for almost everything in performance marketing — it balances false-positive risk against test duration. Use 90% for fast, low-risk creative tests (more false positives but faster shipping) and 99% for high-stakes irreversible changes (pricing, checkout flow, brand positioning).
How many conversions do I need before I read the verdict?
At least 50 per variant for a meaningful read; 100+ per variant for a confident one. Below 50 conversions, even a 30% apparent lift can fail to reach significance — the variance is huge at low volume. If your daily traffic is too low to hit 100 conversions per variant in 14 days, consider running fewer parallel tests or testing bigger changes (more likely to produce large lift).
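A rough way to check whether your traffic can get there in time, as a back-of-envelope Python sketch (the 2,000 daily visitors and 3% baseline are hypothetical numbers, not defaults from this tool):

import math

def days_to_target(daily_visitors, baseline_rate, variants=2, target_conversions=100):
    """Rough days needed for each variant to reach target_conversions,
    assuming traffic is split evenly across the variants."""
    conversions_per_variant_per_day = (daily_visitors / variants) * baseline_rate
    return math.ceil(target_conversions / conversions_per_variant_per_day)

print(days_to_target(2_000, 0.03))  # 4 days to reach 100 conversions per variant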
What's wrong with peeking at a test daily?
Each time you check, you have a small chance of seeing 'significance' by random fluctuation alone. Check 14 times and your effective false-positive rate is around 25%, not 5%. To peek safely, either pre-commit to a sample size up front, or use sequential testing methods (Bayesian A/B testing tools like Optimizely's stats engine, which adjust significance dynamically).
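To see the inflation for yourself, here is a small Monte Carlo sketch (our own illustration, not this tool's engine or any vendor's): it simulates A/A tests in which both variants share the same true rate, peeks at the z-test once per day, and counts how often any peek crosses the 95% threshold.

import numpy as np

def z_stat(va, ca, vb, cb):
    """Pooled two-proportion z statistic (0 when there are no conversions yet)."""
    p = (ca + cb) / (va + vb)
    if p == 0 or p == 1:
        return 0.0
    se = np.sqrt(p * (1 - p) * (1 / va + 1 / vb))
    return (cb / vb - ca / va) / se

def peeking_false_positive_rate(days=14, daily_visitors=1_000, rate=0.03, trials=5_000, seed=0):
    """Share of A/A tests (no true difference) called 'significant' on at least one daily peek."""
    rng = np.random.default_rng(seed)
    false_positives = 0
    for _ in range(trials):
        va = vb = ca = cb = 0
        for _ in range(days):
            va += daily_visitors
            vb += daily_visitors
            ca += rng.binomial(daily_visitors, rate)  # simulated conversions, variant A
            cb += rng.binomial(daily_visitors, rate)  # simulated conversions, variant B
            if abs(z_stat(va, ca, vb, cb)) >= 1.96:   # daily peek at the 95% threshold
                false_positives += 1
                break
    return false_positives / trials

print(f"{peeking_false_positive_rate():.0%}")  # far above the nominal 5% false-positive rate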
How is statistical significance different from business significance?
Statistical significance asks 'is this lift real?'. Business significance asks 'is this lift worth shipping?'. A 0.4% lift on a 3% baseline (relative lift +13%) is huge in ecommerce. A 0.4% lift on a 30% baseline (relative lift +1.3%) might not survive minor implementation overhead. Always reason about both — significance is a green light to consider shipping, not a mandate to ship.
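The relative-lift arithmetic behind those two examples, as a quick sketch:

def relative_lift(baseline_rate, absolute_lift):
    """Relative lift = absolute (percentage-point) lift divided by the baseline rate."""
    return absolute_lift / baseline_rate

print(f"{relative_lift(0.03, 0.004):+.1%}")  # 0.4pp on a 3% baseline  -> +13.3% relative
print(f"{relative_lift(0.30, 0.004):+.1%}")  # 0.4pp on a 30% baseline -> +1.3% relative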
Should I use this for incrementality testing too?
Yes, but with caveats. Conversion-lift tests (Meta Conversion Lift, geo holdouts) use the same two-proportion math, but the treatment group is exposed to a campaign and the holdout isn't. The math is identical; the sample-size requirements are much larger because lift is usually smaller than in standard A/B tests. Use the conversion-lift sample-size calculator to plan; use this tool to read the result.
Stop guessing whether a creative actually wins.
Floowzy auto-tests creative variants across your accounts and flags significance once each variant has enough volume — no more eyeballing dashboards.