Use minimum detectable effect to prioritize experiments

  • Optimizely Web Experimentation
  • Optimizely Performance Edge
  • Optimizely Feature Experimentation
  • Optimizely Full Stack (Legacy)
This article is part of The Optimization Methodology series.

Many optimization programs prioritize their roadmaps by estimating the effort versus the impact of an experiment. Use the minimum detectable effect (MDE) to gauge an experiment's potential effort and impact.

MDE represents the relative minimum improvement over the baseline you are willing to detect in an experiment to a certain degree of statistical significance. It can help determine the relationship between impact and effort (or cost and potential value) for your experiment.

Use MDE to benchmark the time it takes to run an experiment and the impact you are likely to see so you can prioritize experiments according to expected ROI. Depending on how granular you want your results, you can set expectations for how long it may take to run an experiment.

MDE can provide a guideline through the uncertainty of experimentation.

How MDE affects sample size

One major cost in every experiment is the time it takes to reach a statistically significant result. To estimate how long a given experiment must run to achieve statistical significance, determine the following:

  • Traffic allocated for the experiment – What percentage of your traffic will you allocate for this experiment?

  • Total sample size – Total sample size is the number of variations multiplied by the sample size per variation. Use Optimizely Experimentation’s sample size calculator to estimate the sample size to reach statistical significance, depending on the baseline conversion rate and the MDE.

Use the following formula to determine how long to run an experiment:

  • time to run an experiment = total sample size / traffic allocated to the experiment

Dividing the total sample size by the traffic allocated to the experiment tells you approximately how long the experiment takes to run, so you can prioritize it accordingly.
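As a minimal sketch of that arithmetic (the sample size, variation count, and weekly traffic below are placeholder values for illustration, not Optimizely-specific figures), you can wrap the division in a small helper:

```python
import math

def weeks_to_run(sample_size_per_variation, num_variations, visitors_per_week):
    """Estimated runtime: total sample size divided by weekly traffic allocated."""
    total_sample_size = sample_size_per_variation * num_variations
    return total_sample_size / visitors_per_week

# Placeholder inputs for illustration; round up because reporting is usually weekly.
weeks = weeks_to_run(sample_size_per_variation=62_000, num_variations=2,
                     visitors_per_week=40_000)
print(math.ceil(weeks))  # 3.1 weeks -> 4 whole weeks
```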

To calculate the sample size per variation for your experiment, you need the current baseline conversion rate and the MDE. The MDE represents the relative minimum improvement over the baseline that you are willing to detect in this experiment at a given significance level. For example, assume a standard 95% statistical significance.

If your experiment measures an improvement equal to or higher than the MDE, you reach significance within the given sample size. You see a significant result with equal or fewer visitors than originally estimated, and you can call a winner more quickly. However, if your experiment detects improvement lower than the MDE, it does not reach statistical significance within the given sample size. You must keep running the experiment to call a winner.

For example, you run an experiment to optimize a checkout flow. You measure conversions with a pageview goal on the checkout confirmation page; the baseline conversion rate is 10%. You estimate that your variation will improve the baseline by at least 5% and your variation conversion rate will be 10.5% or greater. Your MDE is 5%.


With Optimizely Experimentation's sample size calculator, you determine a sample size of 62,000 per variation. In this experiment, which includes one original and one variation, you need approximately 124,000 visitors to detect a change of 5% or more at 95% statistical significance. You launch the experiment.

When results start appearing, you note that the actual conversion rate for your variation is higher than 10.5%. If this trend continues, your experiment will likely reach significance within 124,000 visitors. If the conversion rate is lower than 10.5%, you probably will not reach statistical significance by 124,000 visitors. Now, you must decide whether to keep running the experiment. Depending on your results, you may gather more data or move on to the next idea.
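As a rough cross-check on a sample size estimate like the one above, the classic fixed-horizon two-proportion formula gives a ballpark per-variation figure. Treat it only as an approximation: Optimizely's Stats Engine uses sequential testing, so its calculator reports somewhat different numbers, and the 80% power level used here is an assumption.

```python
from statistics import NormalDist

def approx_sample_size_per_variation(baseline, mde, alpha=0.05, power=0.80):
    """Fixed-horizon two-proportion z-test sample size per variation.

    baseline: baseline conversion rate, e.g. 0.10
    mde: relative minimum detectable effect, e.g. 0.05 for a 5% lift
    """
    p1 = baseline
    p2 = baseline * (1 + mde)          # variation rate at exactly the MDE
    z = NormalDist().inv_cdf
    pooled = (p1 + p2) / 2
    numerator = (z(1 - alpha / 2) * (2 * pooled * (1 - pooled)) ** 0.5
                 + z(power) * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return numerator / (p2 - p1) ** 2

# Checkout example: 10% baseline, 5% relative MDE
print(round(approx_sample_size_per_variation(0.10, 0.05)))  # roughly 58,000
```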

Set boundaries by estimating MDE

Rather than trying to pin down an exact MDE, use it to set boundaries for your experiment so you can make informed business decisions, such as whether to keep running an experiment under certain operational constraints.

Notice how the baseline conversion rate and MDE directly affect the sample size:

The smaller your baseline is, the larger the sample size required to detect the same relative change (MDE).

Baseline   MDE   Statistical significance   Sample size (per variation)
15%        10%   95%                        7,271
10%        10%   95%                        12,243
3%         10%   95%                        51,141

The smaller your MDE is, the larger the sample size required to reach statistical significance.

Baseline   MDE   Statistical significance   Sample size (per variation)
10%        10%   95%                        12,243
10%        5%    95%                        59,401
10%        3%    95%                        185,661

The graph below demonstrates how your sample size may balloon as you attempt to detect a smaller MDE.
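For a rough sense of this effect, you can sweep a few MDEs and watch the required sample grow. The sketch below uses a fixed-horizon power calculation from statsmodels with an assumed 80% power; the table values above come from Optimizely's calculator, so the numbers will not match exactly.

```python
# Sweep a few relative MDEs at a fixed 10% baseline and print the approximate
# per-variation sample size (fixed-horizon approximation, 80% power assumed).
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.10
analysis = NormalIndPower()
for mde in (0.10, 0.05, 0.03, 0.02):
    h = proportion_effectsize(baseline * (1 + mde), baseline)   # Cohen's h
    n = analysis.solve_power(effect_size=h, alpha=0.05, power=0.80, ratio=1.0,
                             alternative='two-sided')
    print(f"MDE {mde:.0%}: ~{n:,.0f} visitors per variation")
```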

Sample size translates directly into how long it takes to run an experiment:

  • number of weeks to run an experiment = total sample size / unique visitors per week

In the example above, if you had 40,000 weekly unique visitors on the page where you plan to run the experiment, you would calculate the following:

  • total sample size / unique visitors per week = 60,000 / 40,000 = 1.5, or approximately 2 weeks
Because most business metrics are reported weekly, consider rounding the estimated experiment runtime up to whole weeks.

The baseline, number of variations, number of unique visitors, and statistical significance are constant for this experiment. You can plot the time it takes to run this experiment as a function of the MDE.

In this example, detecting an improvement at a granularity of 4% lift or less takes at least five weeks. You can use this information to prioritize the experiment. If a 4% lift on this goal affects an important metric and the estimated runtime fits realistically within your roadmap, you may want to continue with the experiment.

For example, a 4% lift in 5 weeks may be a reasonable tradeoff of impact (lift) for effort (experiment runtime), but a 2% lift measured over several months may not be. If the traffic cost is too high, consider de-prioritizing the hypothesis in your roadmap or seek to measure lift with less granularity.

Use a range of MDEs to identify the time you are willing to invest in each experiment. This range can also help you decide whether to keep running or stop an experiment with inconclusive results.

MDE and operational constraints

Your time to run an experiment may be limited for operational reasons, such as:

  • Traffic – You can only allocate a limited amount of traffic to an experiment (or your site has relatively low traffic).
  • Time – You are trying to get results quickly due to operational pressures.
  • Value – You will not run the experiment unless you can prove that it provides a certain amount of value.

Use MDE to make informed business decisions.

For example, if you have only two weeks to run an experiment, plot the MDE you could detect for each goal in the experiment and see what impact you can observe within those two weeks.

According to the sample graph above, you could capture small changes for Goal 2 (yellow) and Goal 3 (green) within two weeks: 2% and 3% lift, respectively. However, you would only be able to observe an 8% change or higher in Goal 4 (red).

Is 8% lift granular enough for this experiment? If so, launch the experiment with the two-week window. If you must detect improvement in that goal with better granularity, consider pushing for more time for the experiment or blocking out time for it in your roadmap. In either case, you have a clearer idea of what to expect from this experiment.
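As an illustrative sketch of this kind of check, you can invert the sample size calculation: fix the time window and traffic, then find the smallest relative lift each goal could detect. The goal baselines, weekly traffic, and two-week window below are made-up assumptions, and the fixed-horizon calculation only approximates what Optimizely's calculator reports.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

def smallest_detectable_lift(baseline, visitors_per_variation,
                             alpha=0.05, power=0.80):
    """Scan relative lifts until the required per-variation sample fits the
    available traffic (fixed-horizon approximation, not Optimizely's Stats Engine)."""
    analysis = NormalIndPower()
    lift = 0.005
    while lift < 1.0:
        h = proportion_effectsize(baseline * (1 + lift), baseline)
        n_needed = analysis.solve_power(effect_size=h, alpha=alpha,
                                        power=power, ratio=1.0)
        if n_needed <= visitors_per_variation:
            return lift
        lift += 0.005
    return None

# Illustrative assumptions: a two-week window, 40,000 weekly visitors split
# evenly across an original and one variation, and made-up goal baselines.
per_variation = 40_000 * 2 / 2
for goal, baseline in {"goal_a": 0.30, "goal_b": 0.20, "goal_c": 0.05}.items():
    lift = smallest_detectable_lift(baseline, per_variation)
    print(f"{goal}: smallest detectable lift ~ {lift:.1%}")
```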

Use MDE and time to run an experiment to inform how you prioritize experiments in your roadmap.

Getting better at estimating MDEs means setting limits and ranges rather than looking for exact numbers. If you are off by a few percentage points, you can run the experiment longer to find the answer; just make sure the additional runtime aligns with your roadmap and that you would not be better served testing something else.

Examining your experiment plan to understand the estimated time can help with prioritization, letting you build a more detailed, intentional roadmap.