Why you should not change a running experiment

  • Optimizely Web Experimentation
  • Optimizely Performance Edge
  • Optimizely Feature Experimentation
  • Optimizely Full Stack (Legacy)

Although the Optimizely Experimentation application does not forbid it, Optimizely strongly recommends not changing an active experiment. In particular, avoid changing the following settings while an experiment is running:

  • Traffic Allocation – Proportion of total traffic included in the experiment
  • Traffic Distribution – Proportion of traffic sent to a particular variation (the sketch below shows how the two settings combine)
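
Here is a minimal sketch of how the two settings combine; the percentages and visitor counts are hypothetical, not Optimizely defaults:

```python
# Hypothetical numbers showing how allocation and distribution combine.
traffic_allocation = 0.50            # 50% of all visitors enter the experiment
traffic_distribution = {             # how experiment traffic is split across variations
    "control": 0.50,
    "variation_1": 0.50,
}

daily_visitors = 10_000
in_experiment = daily_visitors * traffic_allocation
for variation, share in traffic_distribution.items():
    print(f"{variation}: {in_experiment * share:.0f} visitors per day")
# control: 2500 visitors per day
# variation_1: 2500 visitors per day
```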

Considerations for changing traffic allocation

Although Optimizely advises against changing traffic allocation while an experiment is running, changing the allocation monotonically (only ever increasing it, or only ever decreasing it) is acceptable, though it can lengthen the time it takes to reach statistical significance.

However, if you change the traffic allocation non-monotonically (for example, first decreasing and then increasing it), your users may be re-bucketed, potentially producing statistically invalid experiment results. See how bucketing works for more information.
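
Optimizely's production bucketing is more involved than this, but the general idea of deterministic, hash-based bucketing can be sketched as follows; the hashing scheme, bucket count, and IDs here are illustrative only:

```python
import hashlib

BUCKETS = 10_000  # toy model: hash each visitor into one of 10,000 buckets

def bucket(visitor_id: str, experiment_id: str) -> int:
    """Deterministically map a visitor to a bucket for a given experiment."""
    digest = hashlib.md5(f"{experiment_id}:{visitor_id}".encode()).hexdigest()
    return int(digest, 16) % BUCKETS

def in_experiment(visitor_id: str, experiment_id: str, allocation: float) -> bool:
    """A visitor is included when their bucket falls below the allocation cutoff."""
    return bucket(visitor_id, experiment_id) < allocation * BUCKETS

# Non-monotonic change: allocation is lowered, then raised again.
for allocation in (0.80, 0.30, 0.80):
    print(allocation, in_experiment("visitor-42", "exp-1", allocation))

# Visitors whose buckets sit between the old and new cutoffs drop out of the
# experiment and later re-enter, mixing exposure histories across the change.
```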

Setting a very low traffic allocation (for example, less than 25%) for an extended period (more than one or two days) can underpower the experiment and negatively affect your results.
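
As a rough illustration of the effect on duration, the sketch below shows how a lower allocation stretches the time needed to collect a fixed sample size; the daily traffic and sample-size target are hypothetical:

```python
# Hypothetical figures: how traffic allocation stretches experiment duration.
daily_visitors = 20_000
required_per_variation = 15_000   # e.g. from an up-front sample size estimate
variations = 2

for allocation in (1.00, 0.50, 0.25, 0.10):
    per_variation_per_day = daily_visitors * allocation / variations
    days = required_per_variation / per_variation_per_day
    print(f"allocation {allocation:4.0%}: about {days:.1f} days to reach the target sample")
```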

Considerations for changing traffic distribution

You should not change the traffic distribution between variations of a running experiment. Doing so invalidates the data and can lead to Simpson's paradox, as illustrated in the sketch after the steps below. The only way to safely change the traffic distribution is to:

  1. Pause the experiment.
  2. Either duplicate the experiment in Optimizely Web Experimentation or copy the experiment rule in Optimizely Feature Experimentation.
  3. Publish the new experiment.
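
To see why a mid-run distribution change is risky, here is a small numeric sketch of Simpson's paradox with hypothetical figures: the underlying conversion rate also drops in the second period, and because the variation receives most of its traffic during that weaker period, the pooled numbers reverse the per-period conclusion:

```python
# Hypothetical (visitors, conversions) per arm, before and after the split changes.
periods = {
    "before change (50/50 split)": {"control": (5000, 500), "variation": (5000, 550)},
    "after change (10/90 split)":  {"control": (1000, 40),  "variation": (9000, 405)},
}

totals = {"control": [0, 0], "variation": [0, 0]}
for period, arms in periods.items():
    for arm, (visitors, conversions) in arms.items():
        totals[arm][0] += visitors
        totals[arm][1] += conversions
        print(f"{period:29s} {arm:10s} {conversions / visitors:.1%}")

for arm, (visitors, conversions) in totals.items():
    print(f"{'pooled across both periods':29s} {arm:10s} {conversions / visitors:.1%}")

# The variation wins in each period (11.0% vs 10.0%, then 4.5% vs 4.0%),
# yet the pooled numbers make control look better (9.0% vs 6.8%).
```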

The potential harm of changing a running experiment

When an experiment runs, Optimizely Experimentation collects conversion data for each variation over the entire run and compares it to the control group's conversion data from the same period. If you make a change midway through the experiment, the effects of that change can only be measured from that point forward, which invalidates your results.

How changing a running experiment can affect result data accuracy

Suppose a change you made improves conversions by 5%. You take note of that result and decide to add another change to the same variation, expecting it to have a similar effect.

Suddenly, your conversion rate drops back to the same level as the control group. Now you cannot tell whether the drop happened because of the new change or because the original change performed worse than the early numbers indicated.

Now suppose the conversion rate does not drop all the way to the control group's rate but instead dips to a 2% lift. Again, you cannot be sure what caused the dip, and you may still read it as an overall improvement. In reality, the second change may be hurting conversions, with the positive effect of the first change masking that decrease.
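
The arithmetic behind that scenario can be sketched as follows; the baseline rate and the two lift figures are hypothetical:

```python
# Hypothetical figures: a helpful first change and a harmful second change
# overlap in time, and the results page only shows the blended lift.
baseline = 0.100            # control conversion rate
lift_change_a = +0.05       # first change: +5% relative lift
lift_change_b = -0.03       # second change: actually hurts by 3% (unknown to you)

with_a_only = baseline * (1 + lift_change_a)
with_a_and_b = baseline * (1 + lift_change_a) * (1 + lift_change_b)

print(f"variation with change A only:  {with_a_only:.2%}")                    # ≈ 10.5%
print(f"variation with changes A + B:  {with_a_and_b:.2%}")                   # ≈ 10.2%
print(f"observed relative lift now:    {with_a_and_b / baseline - 1:+.1%}")   # ≈ +1.9%

# From the blended number alone you cannot tell whether change B hurt,
# change A's early numbers were inflated, or both.
```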

How adding metrics can affect results data accuracy

For example, imagine you are running an experiment with revenue as the primary metric and several page view events as secondary metrics. With these metrics, the experiment reaches 90% statistical significance in a few days.

After a few days, you want to check how some Call to Action (CTA) click events are performing, so you add those as metrics to your already running experiment. Will these new, additional metrics affect the statistical significance of the initial metrics (revenue and page views)?

Yes, adding metrics after an experiment has started affects the results. The first (primary) metric is unaffected, but every metric after the first is subject to Optimizely Experimentation's False Discovery Rate control, so adding metrics can lower the statistical significance reported for your existing secondary metrics.
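
To illustrate the direction of the effect, the sketch below applies the classic Benjamini-Hochberg false-discovery-rate procedure (not Optimizely's Stats Engine, which uses its own sequential FDR control) to hypothetical secondary-metric p-values:

```python
# Hypothetical p-values for secondary metrics, corrected with the classic
# Benjamini-Hochberg procedure. This is not Optimizely's Stats Engine; it only
# shows why a correction gets stricter as more metrics are added.
def benjamini_hochberg(p_values, alpha=0.10):
    """Indices of metrics that remain significant after BH correction."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_passing_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            largest_passing_rank = rank
    return {order[r] for r in range(largest_passing_rank)}

secondary = [0.03, 0.045]                      # original secondary metrics
with_added = secondary + [0.60, 0.70, 0.80]    # CTA click metrics added mid-run

print(benjamini_hochberg(secondary))    # {0, 1}: both original metrics pass
print(benjamini_hochberg(with_added))   # set(): neither passes once more metrics exist
```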

Best practices

If you want to change your running experiment, you must pause the existing experiment and start a new one. This way, the changes you make do not contaminate the existing data. You can duplicate an experiment in Optimizely Web Experimentation or copy a rule in Optimizely Feature Experimentation to create a new experiment similar to an existing one. Duplicating an experiment does not duplicate the original experiment's results.

If you intend to pause a variation, the decision should be final. Do not later compare that variation's results to those of the still-running variations; if an event affects all variations after the pause, its impact is not reflected in the paused variation's data.

These best practices apply to Optimizely Web Experimentation, Optimizely Performance Edge, Optimizely Feature Experimentation, and all A/B testing tools.