- Optimizely Web Experimentation
- Optimizely Performance Edge
- Optimizely Personalization
- Optimizely Feature Experimentation
- Optimizely Full Stack (Legacy)
Statistical significance measures how unusual your results would be if the variation and baseline performed identically.
Confidence intervals
A confidence interval is the estimated range likely to include the true effect, such as the true uplift. For example, if you ran an experiment many times and calculated a 90% confidence interval for each run, about 90% of those intervals would contain the true effect. Confidence intervals reflect the sample size and the variability (dispersion or noise) in the experiment data.
Confidence intervals contain:
- A point estimate (uplift or improvement) – A single value derived from your statistical model of choice.
- A margin of error – A range around the point estimate that reflects how much uncertainty the sample carries about the true population value.
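To make these two components concrete, here is a textbook two-proportion z-interval for the absolute difference in conversion rates. This is a generic statistical sketch with hypothetical function and variable names, not the calculation Optimizely's Stats Engine performs:

```python
from math import sqrt

# Hypothetical helper: 90% z-interval (z = 1.645, two-sided) for the
# absolute difference between variation and baseline conversion rates.
def diff_confidence_interval(conv_b, n_b, conv_v, n_v, z=1.645):
    p_b, p_v = conv_b / n_b, conv_v / n_v
    point_estimate = p_v - p_b                    # the single-value estimate
    margin = z * sqrt(p_b * (1 - p_b) / n_b       # margin of error from
                      + p_v * (1 - p_v) / n_v)    # sampling variability
    return point_estimate - margin, point_estimate + margin

# Made-up data: 250/1000 baseline conversions vs. 300/1000 for the variation.
low, high = diff_confidence_interval(250, 1000, 300, 1000)
```

The interval is the point estimate plus or minus the margin of error; more visitors shrink the margin, which is why intervals narrow as data accumulates.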
You should report confidence intervals to supplement your statistical significance results, as they can offer information about the observed effect size of your experiment.
The confidence interval in Optimizely's sequential engine is adaptive. The Optimizely Experiment Results page shows the running intersection of all previous confidence intervals: it tracks the largest lower limit and the smallest upper limit observed while the experiment runs.
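The running-intersection idea can be sketched in a few lines. The interval snapshots below are made up for illustration, and this is not Optimizely's actual implementation:

```python
# Keep the largest lower bound and smallest upper bound seen so far,
# so the displayed interval can only narrow over time.
def running_intersection(intervals):
    lower, upper = float("-inf"), float("inf")
    history = []
    for lo, hi in intervals:        # interval snapshots observed over time
        lower = max(lower, lo)      # largest lower limit so far
        upper = min(upper, hi)      # smallest upper limit so far
        history.append((lower, upper))
    return history

snapshots = [(-0.30, 0.40), (-0.10, 0.35), (-0.15, 0.20)]
# Final displayed interval is (-0.10, 0.20): the third snapshot's lower
# bound is ignored because an earlier snapshot already ruled it out.
```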
Optimizely sets your confidence interval to the same level that you set your statistical significance threshold for the project. By default, the statistical significance setting for your project is 90%.
When Optimizely declares significance
Optimizely declares a variation significant when its confidence interval stops crossing zero. An interval that crosses zero means the data does not yet show a clear impact.
When a variation reaches statistical significance, its confidence interval lies entirely above or entirely below zero.
- Winning variation – The confidence interval is entirely above 0%.
- Inconclusive variation – The confidence interval includes 0%.
- Losing variation – The confidence interval is entirely below 0%.
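The three outcomes above amount to a simple decision rule, sketched here with the interval bounds from the examples in this article (expressed as fractions):

```python
# Minimal sketch of the rule: a variation is called a winner or loser
# only once its confidence interval no longer crosses zero.
def classify(lower, upper):
    if lower > 0:
        return "winner"         # entire interval above 0%
    if upper < 0:
        return "loser"          # entire interval below 0%
    return "inconclusive"       # interval still includes 0%

classify(0.7757, 1.0521)    # "winner"       (+77.57% to +105.21%)
classify(-0.2019, 0.2213)   # "inconclusive" (-20.19% to +22.13%)
classify(-0.1555, -0.0919)  # "loser"        (-15.55% to -9.19%)
```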
Confidence interval entirely above zero percent
In this example, the evidence collected so far makes random chance an unlikely explanation for the observed improvement. However, the improvement Optimizely measured (+89.9%) may differ from the exact improvement you see going forward. The confidence interval indicates that this test variation has a positive impact in the long run. For this experiment iteration, the error bounds were between 77.57% and 105.21% improvement.
The statistical significance setting for this example is 90%. As Optimizely collects more data, the confidence interval may narrow.
Confidence interval includes zero percent
The confidence interval gives you an idea of whether implementing that variation has a positive or negative impact.
When you see low statistical significance on specific variations, the confidence interval can serve as another data point to help you make decisions. When you have an inconclusive variation, the interval looks like this:
This variation's improvement is between -20.19% and 22.13%. You can read the confidence interval as a worst-case, middle-ground, and best-case scenario.
- Worst-case – -20.19%
- Middle-ground – 0.69%
- Best-case – 22.13%
Confidence interval entirely below zero percent
In this example, the evidence collected so far makes random chance an unlikely explanation for the observed negative improvement. The negative improvement Optimizely measured (-15.3%) may differ from the exact change you see going forward, but the confidence interval indicates that this test variation has a negative impact in the long run. For this experiment iteration, the error bounds were between -15.55% and -9.19% improvement.
How statistical significance and confidence intervals are connected
Optimizely supports multiple statistical analysis methods: Frequentist (Fixed Horizon), Bayesian, and Sequential (the Optimizely Stats Engine). Each method evaluates significance differently, but the relationship between confidence intervals and statistical significance is consistent across them.
Optimizely shows you the statistical likelihood that an observed improvement is caused by your change rather than random variation. Until Stats Engine collects enough data, the Experiment Results page states that more visitors are needed and estimates the wait time based on the observed conversion rate.
The significance level determines how much evidence you require before calling a winner.
- Lower thresholds let you decide and iterate faster but increase the chance of false conclusions.
- Higher thresholds reduce error probability but require more traffic or a longer test duration.
Choose a level that matches your testing strategy, the number of hypotheses you plan to evaluate, and the amount of traffic available.
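A rough fixed-horizon approximation shows how steeply this trade-off scales: required sample size grows with the square of the critical z-value for your threshold. The function name is hypothetical and this is textbook arithmetic, not Stats Engine math:

```python
# Two-sided z critical values for common significance thresholds.
Z = {0.80: 1.282, 0.90: 1.645, 0.95: 1.960, 0.99: 2.576}

def relative_traffic(level, reference=0.90):
    """Traffic needed at `level`, relative to the default 90% threshold."""
    return (Z[level] / Z[reference]) ** 2

relative_traffic(0.95)  # roughly 1.4x the traffic of a 90% threshold
relative_traffic(0.99)  # roughly 2.5x
```

Moving from 90% to 99% significance more than doubles the traffic a fixed-horizon test needs, which is why the threshold should match the traffic you actually have.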
This rule applies across all methods:
- In Fixed Horizon, you calculate the interval once, at the end of the test; its error-control guarantees hold only if you do not act on interim results.
- In Bayesian, the interval represents a highly probable range for the true effect.
- In Sequential, the interval updates continuously while maintaining error‑control guarantees.
When the entire interval moves above or below zero, the data provides enough evidence to declare a meaningful effect: your result is statistically significant.
Improvement intervals
The Results page shows a different improvement interval based on the experiment or optimization type:
- For A/B tests – Optimizely displays the relative improvement in conversion rate for the variation over the baseline as a percentage. This is true for all A/B test metrics, regardless of whether they are binary or numeric conversions.
- A relative improvement interval of 1% to 10% means the variation outperforms the baseline by 1% to 10%. For example, if the baseline conversion rate is 25%, you can expect the variation conversion rate to fall between 25.25% and 27.5%.
- For multi-armed bandit (MAB) optimizations – Optimizely displays absolute improvement.
- For A/B tests or MAB optimizations with Stats Accelerator enabled – Optimizely displays both absolute and relative improvement.
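The conversion from a relative improvement interval to an expected conversion-rate range in the A/B example above is simple arithmetic, sketched here with a hypothetical helper:

```python
# Apply relative-improvement bounds to a baseline conversion rate.
def variation_rate_range(baseline_rate, rel_low, rel_high):
    return (baseline_rate * (1 + rel_low),
            baseline_rate * (1 + rel_high))

# A 1% to 10% relative improvement on a 25% baseline:
variation_rate_range(0.25, 0.01, 0.10)  # roughly (0.2525, 0.275)
```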
Estimated wait time and <1% significance (for sequential tests)
While the experiment or campaign runs, Optimizely estimates how long the test needs to reach a conclusive result. This estimate uses the current observed baseline and variation conversion rates; if those rates change, the estimate adjusts automatically. In sequential testing, the visitors remaining estimate updates continuously as Optimizely collects more data.
You may see a significance of less than 1%, with a certain number of visitors remaining. Optimizely needs more evidence to determine whether the change reflects a true difference in visitor behavior. This applies across all engines, but only the sequential engine shows dynamic visitors remaining estimates in real time.
For example, a variation may require more than 100,000 additional visitors before reaching significance. This number assumes the current conversion rate remains stable. If more visitors see the variation and conversion performance declines, the experiment takes longer and the visitors remaining estimate increases. If conversions improve, Optimizely needs fewer visitors to confirm that the change is real.
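Why a bigger observed lift needs fewer visitors can be seen with a standard fixed-horizon sample-size formula (shown for intuition only; this is not the sequential estimate Optimizely computes, and the function name is hypothetical):

```python
from math import sqrt, ceil

# Classic two-proportion sample-size formula: visitors per arm to detect
# the observed lift at ~90% two-sided significance and ~80% power.
def visitors_per_arm(p_base, p_var, z_alpha=1.645, z_beta=0.842):
    effect = abs(p_var - p_base)
    p_bar = (p_base + p_var) / 2
    n = ((z_alpha * sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * sqrt(p_base * (1 - p_base) + p_var * (1 - p_var)))
         / effect) ** 2
    return ceil(n)

visitors_per_arm(0.10, 0.11)  # small lift  -> large sample required
visitors_per_arm(0.10, 0.15)  # bigger lift -> far smaller sample
```

The required sample shrinks roughly with the square of the effect size, which is why an improving conversion rate makes the visitors remaining estimate drop quickly.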
To learn about the importance of sample size, see How long to run an experiment.