Imbalance detected: What to do if Optimizely's automatic SRM detection alerts you to an imbalance in your Stats Engine A/B test

  • Optimizely Web Experimentation
  • Optimizely Performance Edge
  • Optimizely Feature Experimentation
  • Optimizely Full Stack (Legacy)

Sample ratio mismatch (SRM)

A sample ratio mismatch (SRM) occurs when the traffic distribution between variations in an A/B experiment becomes severely imbalanced due to an implementation issue. This imbalance may lead to experiment degradation and, in extreme cases, inaccurate results. However, not every imbalance is a reason to panic and abandon the experiment.

Timing matters when you check an experiment for an imbalance, which is why Optimizely's automatic SRM detection evaluates an experiment continuously. A detected imbalance is a symptom that can point to various data quality issues.

Automatic SRM detection alerts

An automatic SRM detection alert does not necessarily mean your experiment is ruined. If Optimizely's automatic SRM detection alerts you to an imbalance in your Stats Engine A/B test, an external influence may be affecting the distribution of traffic. Do not overreact to every traffic disparity, because a disparity alone does not mean that the experiment is useless.

SRM imbalances fall into the following severity levels, from highest to lowest:

  1. Immediate, critical failure
  2. Monitor performance
  3. Expected behavior

Immediate, critical failure

A Critical experiment health status indicates the highest-priority visitor imbalance. It means that the visitor counts in your experiment differ from the traffic split you intended by a statistically significant margin, which points to a consistent and non-ignorable underlying assignment bias. Investigate why the traffic imbalance occurs.

See Take action now: Critical imbalance experiment health for examples of why a critical imbalance may occur and what you should do if your experiment is reporting a critical status.

Monitor performance

A Minor imbalance detected experiment health status indicates that Optimizely Experimentation has detected a minor visitor imbalance. Investigate why the visitor imbalance occurs, and continue to monitor the performance of your experiment.

For examples of why your experiment may report a Minor imbalance detected status and what you should do if it does, see My experiment is reporting a minor imbalance for its experiment health status. How worried should I be?

Expected behavior

A Good experiment health status indicates that no visitor imbalance is detected. You do not need to do anything, and your experiment is running smoothly.

See Good experiment health for information on this health status. 

Information about a normal traffic split

You may observe that the number of visitors assigned to each experiment variation is rarely at an exact 50/50 split, yet the experiment shows a green checkmark of "good" health. This is not a bug. Do not expect a perfect 50/50 split for every experiment you run. Some slight deviation is normal.

An imbalance occurs when the actual proportion of traffic does not match the intended size assigned to a variation. You cannot judge by eye how improbable an observed split is, or whether assignment is biased toward a particular variation, across the life of the experiment. When an experiment shows a good health status and a slightly imperfect split, the algorithm has determined that the split is not unusual given what you intended.

This is supported by Optimizely's hashing function, which determines what variation to show to a user. Optimizely uses a MurmurHash function to assign visitors. For bucketing, that means Optimizely assigns each user a number between 0 and 10,000 to determine if they qualify for the experiment and, if so, which variation they see.
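The idea of hash-based bucketing can be illustrated with a minimal Python sketch. This is not Optimizely's implementation: it substitutes SHA-256 from the standard library for MurmurHash, and the names bucket_for, variation_for, TOTAL_BUCKETS, and the key format are hypothetical, chosen only for illustration.

```python
import hashlib

# Assumed bucket range for illustration: integers 0 through 9,999.
TOTAL_BUCKETS = 10_000


def bucket_for(user_id: str, experiment_id: str) -> int:
    """Map a user to a stable bucket number in [0, TOTAL_BUCKETS)."""
    # SHA-256 is a stand-in here; it is NOT the hash Optimizely uses.
    key = f"{user_id}:{experiment_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % TOTAL_BUCKETS


def variation_for(user_id, experiment_id, allocation):
    """Return the variation whose bucket range contains the user.

    allocation is a list of (variation_name, end_bucket) pairs with
    ascending end buckets. Returns None if the user falls outside
    every range, meaning the user is not in the experiment.
    """
    bucket = bucket_for(user_id, experiment_id)
    for name, end_bucket in allocation:
        if bucket < end_bucket:
            return name
    return None


# A 50/50 split: buckets 0-4999 see control, 5000-9999 see treatment.
allocation = [("control", 5_000), ("treatment", 10_000)]
first = variation_for("user-123", "exp-42", allocation)
second = variation_for("user-123", "exp-42", allocation)
print(first == second)  # the same user always gets the same variation
```

Because the bucket depends only on the user and experiment identifiers, repeated calls for the same user return the same variation, while different users spread across the buckets roughly evenly.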

Think of it as a coin flip per user, but that coin flip always gives the same result for the same user. If you flip a coin 10,000 times, it is extremely unlikely you get precisely 5,000 heads and 5,000 tails.

  • The probability of a perfect 50/50 split (exactly 5,000 heads) is about 0.008, or 0.8%. If you repeated the 10,000 flips many times, only about 0.8% of the runs would land on exactly 5,000 heads.
  • What you can expect is approximately 5,000 heads in 10,000 independent and identically distributed (IID) fair coin flips.
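You can check the coin-flip figures yourself with a few lines of Python. This sketch uses only the standard library; the variable names are illustrative.

```python
import math

flips = 10_000
heads = 5_000

# Binomial probability of exactly 5,000 heads in 10,000 fair flips:
# C(10000, 5000) / 2^10000. Python integers are exact, so the
# division happens only once at the end.
p_exact = math.comb(flips, heads) / 2**flips

# Standard deviation of the number of heads: sqrt(n * p * (1 - p)).
std_dev = math.sqrt(flips * 0.5 * 0.5)

print(f"P(exactly {heads} heads) = {p_exact:.4f}")  # about 0.008
print(f"standard deviation = {std_dev:.0f} heads")
```

The standard deviation of 50 heads means that a count within roughly 100 of 5,000 is entirely ordinary for a fair coin, which is why a slightly uneven split on its own is not evidence of a problem.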

Differentiating good traffic splits from minor and critical imbalances

In a live experiment, the actual split of traffic assigned to a variation roughly aligns with the intended traffic split you assigned to that variation. A sample ratio mismatch is a statistically significant difference in traffic splits from the intended experiment design. A statistically significant difference is not necessarily one you can see by looking at the numbers.

Optimizely Experimentation's automatic SRM detection algorithm considers:

  • the severity of the imbalance.
  • the consistency of the imbalance over time.

For the detection algorithm to declare an imbalance, it must observe the traffic distribution settle into a consistent pattern of being weighted more towards one variation than another. Eventually, the cumulative effect of this pattern carries the test across its statistical significance threshold.

Not all traffic splits are the same. In the initial stages of an experiment, when traffic volume is relatively low, percentage differences can look exaggerated without being statistically extreme when analyzed by Optimizely's statistical imbalance detection test. This is partly why the automatic SRM detection algorithm does not start analysis until more than 1,000 visitors arrive at a Stats Engine experiment.
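To see why the same percentage split means different things at different traffic volumes, consider the following sketch. It uses a classic one-time chi-squared goodness-of-fit test for two variations, which is a simplification chosen for illustration and not Optimizely's automatic SRM detection algorithm, which evaluates the experiment continuously. The function name srm_p_value and the visitor counts are hypothetical.

```python
import math


def srm_p_value(visitors_a: int, visitors_b: int,
                intended_share_a: float = 0.5) -> float:
    """P-value of a chi-squared goodness-of-fit test for two variations.

    Illustrative only: a single fixed-sample check, not the continuous
    test that Optimizely runs.
    """
    total = visitors_a + visitors_b
    expected_a = total * intended_share_a
    expected_b = total - expected_a
    chi2 = ((visitors_a - expected_a) ** 2 / expected_a
            + (visitors_b - expected_b) ** 2 / expected_b)
    # With one degree of freedom, the chi-squared upper tail
    # probability equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# The same 55/45 split at two different traffic volumes.
early = srm_p_value(55, 45)        # 100 visitors
later = srm_p_value(5_500, 4_500)  # 10,000 visitors

print(f"100 visitors:    p = {early:.3f}")
print(f"10,000 visitors: p = {later:.1e}")
```

With 100 visitors, a 55/45 split gives a p-value of roughly 0.32, which is well within normal variation. With 10,000 visitors, the same 55/45 split gives a p-value that is vanishingly small, which would indicate a real imbalance.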