Fixed Horizon is a frequentist statistical method used to run traditional A/B tests with a predetermined sample size. This approach relies on well-established statistical concepts such as p-values, minimum detectable effect (MDE), and variance to determine whether observed differences between variations are meaningful. See the following sections to understand the key statistical elements behind Frequentist (Fixed Horizon) testing and how they impact experiment design and analysis.
To learn how to configure a Frequentist (Fixed Horizon) test in Optimizely Experimentation, see Configure a Frequentist (Fixed Horizon) A/B test.
Why Frequentist (Fixed Horizon) tests require a predetermined sample size
In a Frequentist (Fixed Horizon) experiment, you must calculate the sample size before starting. This ensures the following:
- Statistical validity is maintained – You reduce the chance that results are due to random variation.
- Mid-test peeking is avoided – You prevent inflated false positives caused by early looks at partial data.
- A clear decision rule is committed to – You evaluate results once, using pre-defined thresholds.
Sample size calculation
The required sample size per variation depends on the following:
- Baseline metric value – The current performance of the metric you are testing.
- MDE – The smallest detectable relative change.
- Statistical significance level – Your desired confidence threshold.
- Variance – The variability of your data.
These inputs work together to determine the total number of visitors needed to make a reliable conclusion. Use the built-in Frequentist (Fixed Horizon) Sample Size Calculator to automatically compute the required visitors per variation.
Why peeking mid-test is a problem
In a Fixed Horizon experiment, peeking mid-test introduces bias and increases the chance of false positives. When you check results before the full sample is collected, you might stop the experiment too early, believing you found a winner when the effect was due to random chance.
Analogy – If you flip a coin 100 times to test fairness and check after 10 flips (7 heads), you might falsely conclude bias. Waiting for all 100 flips gives a more reliable answer.
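The coin analogy can be made concrete with a short Monte Carlo sketch (illustrative only; this is not Optimizely's internal logic). It simulates a fair coin, so the null hypothesis is true, and compares the false positive rate of a single test at the fixed horizon against a strategy that peeks at every interim checkpoint and stops at the first "significant" result:

```python
import math
import random
from statistics import NormalDist

def coin_p_value(heads, flips, p0=0.5):
    """Two-sided z-test p-value for the hypothesis that the coin is fair."""
    se = math.sqrt(p0 * (1 - p0) / flips)
    z = (heads / flips - p0) / se
    return 2 * NormalDist().cdf(-abs(z))

def false_positive_rates(trials=2000, flips=500, peek_every=25,
                         alpha=0.05, seed=7):
    """Compare false positive rates under a true null hypothesis when
    testing once at the fixed horizon versus peeking at checkpoints."""
    rng = random.Random(seed)
    fixed_fp = peeking_fp = 0
    for _ in range(trials):
        outcomes = [rng.random() < 0.5 for _ in range(flips)]
        # Fixed Horizon: one test after all flips are collected.
        if coin_p_value(sum(outcomes), flips) < alpha:
            fixed_fp += 1
        # Peeking: test at every checkpoint, stop at the first "win".
        heads = 0
        for i, h in enumerate(outcomes, start=1):
            heads += h
            if i % peek_every == 0 and coin_p_value(heads, i) < alpha:
                peeking_fp += 1
                break
    return fixed_fp / trials, peeking_fp / trials

fixed_rate, peeking_rate = false_positive_rates()
```

With these settings, the fixed-horizon false positive rate stays near the nominal 5%, while the peeking strategy's rate is several times higher, even though the coin is fair in every trial.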
Fixed Horizon example scenario
Scenario
- Baseline Metric Value – 5%
- Minimum Detectable Effect – 10% relative increase. That is, you expect the metric value in the variant to be \( 5\% \times (100 + 10)\% = 5.5\% \).
- Statistical Significance level – 95%
- Number of Variations – 2 (1 baseline + 1 treatment)
Outcome
- Visitors needed per variation – 34,363
- Minimum duration – 7 days.
You should run all tests for at least one business cycle (seven days) to account for recurring patterns in user behavior, such as differences between weekend and weekday visitors. See Seasonality and traffic spikes.
Interpretation
- You must run the experiment until each variation (1 baseline + 1 test) receives 34,363 visitors and at least seven days have passed.
- You cannot view results before these conditions are complete.
Statistical calculations
The following sections assume that there are two variants:
- One baseline (or control) group.
- One treatment (or test) group.
In situations with multiple treatments, Optimizely compares each treatment group to the baseline (control) group, resulting in a series of two-sample problems. This is a multiple comparison problem and requires a correction to control the rate of false discoveries.
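One simple, well-known correction is Bonferroni adjustment, shown below purely as an illustration of multiple-comparison control; this document does not specify which correction Optimizely applies, so treat the function as a hypothetical sketch rather than the product's method:

```python
def bonferroni_adjust(p_values, alpha=0.05):
    """Bonferroni correction: multiply each raw p-value by the number of
    comparisons m (capped at 1.0). A treatment is significant only if its
    adjusted p-value is still below alpha."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Three treatments, each compared against one shared baseline.
raw = [0.010, 0.030, 0.200]
adjusted = bonferroni_adjust(raw)  # adjusted ≈ [0.03, 0.09, 0.60]
```

Note that after adjustment only the first treatment remains below the 0.05 threshold; the second, significant on its raw p-value, no longer is.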
Confidence interval calculations
Depending on the metric type (binary, numeric, or ratio) and the improvement type (absolute or relative), Optimizely uses one of the following confidence interval calculations:
Numeric or binary metric, relative improvement
- Let \( \hat{\theta}_h \) and \( \hat{\sigma}^2_h \) denote the estimated metric (sample mean) and variance in the baseline (or control) group with sample size \( n_h \).
- Let \( \hat{\theta}_t \) and \( \hat{\sigma}^2_t \) denote the estimated metric and variance in the treatment group with sample size \( n_t \).
- Let \( \hat{R} := \frac{\hat{\theta}_t}{\hat{\theta}_h} \). The relative improvement is then defined as \( 100(\hat{R} - 1)\% \).
An approximate \( 100(1 - \alpha)\% \) confidence interval using the Delta method is: \[ \hat{R} - 1 \pm \Phi^{-1}(1 - \alpha/2)\sqrt{\frac{1}{\hat{\theta}_h^2} \left( \frac{\hat{\sigma}_h^2 \hat{R}^2}{n_h} + \frac{\hat{\sigma}_t^2}{n_t} \right)} \]
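The interval above translates directly into code. The following is a sketch that mirrors the formula term by term (variable names are illustrative, and this is not Optimizely's implementation); the example numbers reuse the binary-metric scenario from earlier, where the variance of a conversion rate \( p \) is \( p(1-p) \):

```python
import math
from statistics import NormalDist

def relative_improvement_ci(theta_h, var_h, n_h, theta_t, var_t, n_t,
                            alpha=0.05):
    """Delta-method confidence interval, in percent, for the relative
    improvement 100*(R - 1)% where R = theta_t / theta_h."""
    R = theta_t / theta_h
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Phi^{-1}(1 - alpha/2)
    se = math.sqrt((var_h * R ** 2 / n_h + var_t / n_t) / theta_h ** 2)
    return 100 * (R - 1 - z * se), 100 * (R - 1 + z * se)

# Binary metric: 5% baseline vs. 5.5% treatment conversion rate.
lo, hi = relative_improvement_ci(
    theta_h=0.05, var_h=0.05 * 0.95, n_h=34_363,
    theta_t=0.055, var_t=0.055 * 0.945, n_t=34_363,
)
```

The interval is centered on the observed 10% relative improvement; its width shrinks as the per-variation sample sizes grow.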
P-values
The p-value corresponding to a Wald confidence interval is defined as follows:
Let \( w \) denote the observed value of the Wald statistic \( W \).
The p-value is: \[ p\text{-value} = \mathcal{P}_{\theta_0}(|W| > |w|) \approx 2\Phi(-|w|) \]
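The normal approximation \( 2\Phi(-|w|) \) is a one-liner in code (a sketch of the formula above, not Optimizely's implementation):

```python
from statistics import NormalDist

def wald_p_value(w):
    """Two-sided p-value for an observed Wald statistic w: 2 * Phi(-|w|)."""
    return 2 * NormalDist().cdf(-abs(w))
```

For example, `wald_p_value(1.96)` is just under 0.05, matching the familiar rule of thumb that \( |w| > 1.96 \) corresponds to significance at the 95% level.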
Sample size estimation calculations
Estimation for relative improvement for binary or numeric metrics
Suppose the true ratio of the treatment metric to the control metric is \( \gamma \), and assume that \( n_t = k n_h \). The sample size needed to achieve power \( 1 - \beta \) is
\[ n_h = \frac{\frac{1}{\theta_h^2} \left( \sigma_h^2 \gamma^2 + \frac{\sigma_t^2}{k} \right)} {\left(\frac{1 - \gamma}{z_{\alpha / 2} + z_\beta}\right)^2}. \]
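Plugging the earlier example scenario into this formula reproduces its 34,363 figure. The sketch below assumes a binary metric (so \( \sigma^2 = \theta(1 - \theta) \)), equal allocation (\( k = 1 \)), and 80% power (\( \beta = 0.2 \), a common default that the example does not state explicitly):

```python
import math
from statistics import NormalDist

def sample_size_per_variation(theta_h, var_h, var_t, gamma,
                              k=1.0, alpha=0.05, beta=0.20):
    """Baseline sample size n_h needed to detect a true ratio gamma
    (treatment / control) with power 1 - beta, where n_t = k * n_h."""
    z = NormalDist().inv_cdf
    numerator = (var_h * gamma ** 2 + var_t / k) / theta_h ** 2
    denominator = ((1 - gamma) / (z(1 - alpha / 2) + z(1 - beta))) ** 2
    return numerator / denominator

# Example scenario: 5% baseline, 10% relative MDE, binary metric.
theta_h, theta_t = 0.05, 0.055
n_h = sample_size_per_variation(
    theta_h=theta_h,
    var_h=theta_h * (1 - theta_h),   # binomial variance p(1 - p)
    var_t=theta_t * (1 - theta_t),
    gamma=theta_t / theta_h,
)
```

Rounding `n_h` up gives 34,363 visitors per variation, matching the outcome quoted in the example scenario above.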
Next steps
For more information, see the following documentation: