Take action now: Critical imbalance experiment health

  • Optimizely Web Experimentation
  • Optimizely Performance Edge
  • Optimizely Feature Experimentation
  • Optimizely Full Stack (Legacy)

The bottom line 

A Critical Experiment Health warning on the experimentation results page means that your experiment has failed two different imbalance checks. Investigate the pattern of visitor assignments to variations to identify the underlying problems.

The red exclamation point

When you launch a Stats Engine A/B test, Optimizely Experimentation's automatic sample ratio mismatch (SRM) detection runs in the background. This algorithm finds unusual imbalances in visitor traffic to an experiment's variations and detects experiment problems early if they arise.

Optimizely's automatic SRM detection pairs visitor imbalance detection with an experiment health indicator. This indicator alerts you to visitor traffic issues and tells you when a critical traffic imbalance occurs.

An experiment that shows Critical health status has failed two different imbalance checks. It means Optimizely's automatic SRM detection algorithm identified an imbalance in both of the following (a simplified sketch of this kind of check follows the list):

  1. Total decisions – The aggregated impressions per user that Optimizely Experimentation receives, regardless of each user's variation assignment in the experiment.
  2. First decisions – The first impression of a variation by a particular user, captured when that visitor is initially bucketed into a variation.
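
Both checks boil down to comparing the observed visitor counts per variation against the experiment's configured traffic split. The following sketch uses a classical chi-squared goodness-of-fit test on hypothetical counts; Optimizely's production SRM detection is a sequential algorithm, so this only illustrates the idea, not the actual implementation.

```python
# Minimal sketch of a sample ratio mismatch (SRM) check, assuming a classical
# chi-squared goodness-of-fit test and made-up visitor counts. This illustrates
# the idea only; it is not Optimizely's production algorithm.
from scipy.stats import chisquare

def srm_p_value(observed_counts, target_split):
    """Compare observed counts per variation against the configured split."""
    total = sum(observed_counts)
    expected = [total * share for share in target_split]
    _, p_value = chisquare(f_obs=observed_counts, f_exp=expected)
    return p_value

# Hypothetical counts for a two-variation, 50/50 experiment. A Critical status
# corresponds to the case where both checks fail.
first_decisions = [5_210, 4_790]    # first impression per user
total_decisions = [15_800, 14_200]  # all impressions, regardless of assignment

for label, counts in [("first decisions", first_decisions),
                      ("total decisions", total_decisions)]:
    p = srm_p_value(counts, [0.5, 0.5])
    print(f"{label}: p = {p:.2g} -> {'imbalance' if p < 0.001 else 'looks balanced'}")
```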

Critical health status

A Critical experiment health status is the highest-priority imbalance warning. It means the experiment has a statistically significant traffic imbalance and there is evidence of consistent, undeniable underlying assignment bias. Pause the experiment immediately and investigate the issue.

See Imbalance detected: What to do if Optimizely's automatic SRM detection alerts you to an imbalance in your Stats Engine A/B test

Examples of why a critical imbalance occurred

The following examples do not cover every cause of imbalance; they illustrate situations that may have occurred. Decide what to do with your experiment depending on your situation. See Causes of imbalance for additional causes.

Example 1: Traffic distribution changes mid-experiment

You have three variations in your experiment. You set the traffic distribution to 0% for one of those variations. After the experiment launches, someone on your team ramps up that 0% traffic distribution. Now your results look strange.

The experiment has become completely corrupted. It is entirely unusable. End the experiment immediately.

To restart your experiment safely, duplicate it and do not adjust its traffic distribution while it is running. As a best practice, Optimizely strongly recommends against setting a variation's traffic distribution to 0%. Instead, remove that variation from the experiment, because no traffic is being sent to it.

Changing an experiment's traffic distribution while the experiment is running is one of the most damaging things an experimenter can do to an experimentation program. In particular, launching an experiment with a variation at 0% traffic and later ramping that variation up is a sure way to destroy your results: it triggers a severe imbalance and invites Simpson's Paradox into the experiment, because the ramped variation collects traffic during only part of the experiment's lifetime.
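
To see why, here is a toy illustration with made-up numbers: a variation ramped up from 0% only after a high-converting period has passed ends up being compared against a control whose aggregate numbers still include that period.

```python
# Toy illustration (made-up numbers) of how ramping a 0% variation
# mid-experiment confounds the aggregate comparison.

# Period 1: high-converting promo week. Variation B is still at 0% traffic.
period_1 = {"A": (10_000, 1_000),  # (visitors, conversions) -> 10.0%
            "B": (0, 0)}

# Period 2: normal week with a lower baseline. B is ramped to a 50/50 split.
period_2 = {"A": (5_000, 250),     # 5.0%
            "B": (5_000, 275)}     # 5.5% -- B actually wins this period

for name in ("A", "B"):
    visitors = period_1[name][0] + period_2[name][0]
    conversions = period_1[name][1] + period_2[name][1]
    print(f"{name}: {conversions / visitors:.2%} aggregate conversion rate")

# Prints A: 8.33% and B: 5.50% -- the aggregate favors A even though B beat A
# in the only period where both variations received traffic.
```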

Example 2: The results page shows a different traffic split than what you set

Your experiment is set to a 50/50 traffic split, but the results page displays an 80/20 split. None of the metrics shows the expected uplift behavior seen consistently in similar experiments, or even its polar opposite.
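
For a sense of scale, the sketch below computes a chi-squared statistic for an observed 80/20 split against the configured 50/50 split; the visitor counts are hypothetical and the test is a simplification, not Optimizely's detection algorithm.

```python
# Hypothetical worked example of how extreme an observed 80/20 split is when
# the experiment was configured for 50/50. Not Optimizely's algorithm.
from scipy.stats import chisquare

observed = [8_000, 2_000]  # what the results page shows
expected = [5_000, 5_000]  # what a 50/50 split should produce for 10,000 visitors

statistic, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-squared = {statistic:.0f}, p = {p_value:.3g}")
# chi-squared = 3600 and p is effectively zero: random bucketing noise cannot
# explain the mismatch, so an implementation or targeting problem is almost certain.
```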

Optimizely's automatic SRM detection identifies the earliest date of the imbalance in your experiment. Correlate that date with any unusual activity in the experiment change history to investigate whether a team member changed the traffic distribution.

The earliest date also lets you investigate whether third-party audience segmentation software was triggered, purposely or accidentally, and rerouted traffic while the experiment was live.

Another way to determine whether an experiment was headed for corrupted results from launch is to use carefully chosen guardrail metrics as monitoring goals. Guardrail metrics, such as bounce rate, help triangulate evidence of fatal implementation errors.