- Optimizely Web Experimentation
- Optimizely Personalization
- Optimizely Performance Edge
- Optimizely Feature Experimentation
- Optimizely Full Stack (Legacy)
Sample ratio mismatch (SRM)
A sample ratio mismatch (SRM) occurs when the traffic distribution between variations in an A/B experiment becomes severely imbalanced due to an implementation issue. This imbalance may degrade the experiment and, in extreme cases, produce inaccurate results. Not every imbalance, however, is a reason to panic and abandon the experiment.
Timing matters when you check an experiment for imbalance, which is why Optimizely's automatic SRM detection evaluates an experiment continuously. A detected imbalance is a symptom that can point to various data quality issues.
Automatic SRM detection alerts
An automatic SRM detection alert does not necessarily mean your experiment is ruined. If Optimizely's automatic SRM detection alerts you to an imbalance in your Stats Engine A/B test, an external influence may be affecting the distribution of traffic. Avoid overreacting to every traffic disparity, because a disparity alone does not make an experiment useless.
The Experiment Health indicator on the Optimizely Experiment Results page alerts you if your experiment is experiencing an SRM.
Critical health status
A Critical experiment health status indicates a critical visitor imbalance. The visitor counts per variation differ from the split you intended by a statistically significant margin, which points to a consistent and non-ignorable underlying assignment bias.
See Possible causes for traffic imbalances for reasons why a critical imbalance may occur, and what you should do if your experiment reports a critical status.
Good health status
A Good experiment health status indicates that no visitor imbalance is detected. Your experiment is running smoothly, and you do not need to do anything.
Information about a normal traffic split
You may observe that the number of visitors assigned to each experiment variation is never exactly at a 50/50 split, yet the experiment shows a green checkmark of "good" health. This is not a bug. Do not expect a perfect 50/50 split for every experiment you run. Some slight deviation is normal.
An imbalance occurs when the actual proportion of traffic does not match the intended proportion assigned to a variation. You cannot judge by eye how improbable an observed split is, or whether assignment bias toward a particular variation exists, across the life of the experiment. When an experiment shows a good health status and a slightly imperfect split, the algorithm has found nothing unusual relative to the split you intended.
This behavior follows from Optimizely's hashing function, which determines what variation to show to a user. It uses a MurmurHash function to assign visitors. For bucketing, that means Optimizely assigns each user a number between 0 and 10,000 to determine if they qualify for the experiment and, if so, which variation they see.
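The bucketing idea described above can be sketched in a few lines. This is an illustration only, not Optimizely's implementation: it uses SHA-256 from the Python standard library as a stand-in for MurmurHash, and the names `bucket_for`, `assign_variation`, the key format, and the assumption of 10,000 buckets numbered 0 to 9,999 are all hypothetical.

```python
import hashlib
from collections import Counter

TOTAL_BUCKETS = 10_000  # assumption: buckets are numbered 0..9,999


def bucket_for(user_id: str, experiment_id: str) -> int:
    """Map a user to a stable bucket number.

    SHA-256 is a stand-in here; Optimizely uses a MurmurHash function.
    """
    key = f"{experiment_id}:{user_id}".encode("utf-8")  # hypothetical key format
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % TOTAL_BUCKETS


def assign_variation(user_id, experiment_id, allocation):
    """Return the variation whose bucket range contains the user, or None.

    `allocation` is a list of (variation_name, end_bucket_exclusive) pairs
    in ascending order of end bucket.
    """
    bucket = bucket_for(user_id, experiment_id)
    for name, end_bucket in allocation:
        if bucket < end_bucket:
            return name
    return None  # user falls outside the experiment's traffic allocation


# An intended 50/50 split across the full bucket range.
fifty_fifty = [("control", 5_000), ("treatment", 10_000)]

# Assign 20,000 synthetic visitors and count each variation.
counts = Counter(
    assign_variation(f"visitor-{i}", "exp-1", fifty_fifty) for i in range(20_000)
)
print(counts)
```

Because the bucket depends only on the hashed key, the same visitor always lands in the same variation, while the counts across many visitors come out close to, but rarely exactly at, the intended split.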
Think of it as a coin flip per user, but that coin flip always gives the same result for the same user. If you flip a coin 10,000 times, it is extremely unlikely you get precisely 5,000 heads and 5,000 tails.
- The probability of getting exactly 5,000 heads, a perfect 50/50 split, is about 0.008 (or 0.8%).
- What you can expect is approximately 5,000 heads in 10,000 independent and identically distributed (IID) fair coin flips.
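The coin-flip figures above can be checked with an exact binomial calculation. The sketch below uses only the Python standard library; the helper name `prob_heads_between` and the "near split" range of 4,900 to 5,100 heads are illustrative choices, not values from Optimizely.

```python
import math

FLIPS = 10_000


def prob_heads_between(low: int, high: int, flips: int = FLIPS) -> float:
    """Exact probability that a fair coin gives between low and high heads (inclusive)."""
    favourable = sum(math.comb(flips, k) for k in range(low, high + 1))
    return favourable / 2 ** flips


exact_split = prob_heads_between(5_000, 5_000)  # exactly 5,000 heads
near_split = prob_heads_between(4_900, 5_100)   # within 100 heads of 5,000

print(f"exactly 5,000 heads:  {exact_split:.4f}")
print(f"4,900 to 5,100 heads: {near_split:.4f}")
```

The first probability is roughly 0.008, while the second is above 0.95: an exact 50/50 outcome is rare, but an outcome close to 50/50 is what you should expect nearly every time.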
Differentiating good traffic splits from critical imbalances
In a live experiment, the actual split of traffic across variations roughly aligns with the intended traffic split you assigned to each variation. A sample ratio mismatch is a statistically significant departure of the observed traffic split from the intended experiment design. Statistical significance does not imply that the difference is visible to the eye.
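To make "statistically significant difference in traffic splits" concrete, here is a minimal sketch of a one-off chi-square goodness-of-fit check for two variations. This is a swapped-in textbook technique, not Optimizely's automatic SRM detection algorithm: a fixed-sample test like this is meant to be run once, and re-running it repeatedly on a live experiment inflates false alarms. The function name `srm_p_value` and the example counts are hypothetical.

```python
import math


def srm_p_value(count_a: int, count_b: int, intended_share_a: float = 0.5) -> float:
    """P-value of a chi-square goodness-of-fit test with two variations.

    With two variations the test has one degree of freedom, so the
    chi-square tail probability reduces to erfc(sqrt(chi2 / 2)).
    """
    total = count_a + count_b
    expected_a = total * intended_share_a
    expected_b = total - expected_a
    chi2 = (
        (count_a - expected_a) ** 2 / expected_a
        + (count_b - expected_b) ** 2 / expected_b
    )
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts for an intended 50/50 split of 10,000 visitors.
mild = srm_p_value(5_050, 4_950)    # 50.5% / 49.5%
severe = srm_p_value(5_300, 4_700)  # 53% / 47%

print(f"5,050 vs 4,950: p = {mild:.3f}")
print(f"5,300 vs 4,700: p = {severe:.2e}")
```

The first split gives a p-value of about 0.32, which is entirely consistent with chance, while the second gives a p-value far below any conventional threshold even though 53/47 may not look alarming at a glance.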
Optimizely Experimentation's automatic SRM detection algorithm considers:
- the severity of the imbalance.
- the consistency of the imbalance over time.
To declare an imbalance, the detection algorithm must observe the traffic distribution settle into a consistent pattern of being weighted more towards one variation over another. Over time, the cumulative effect of that pattern crosses the test's significance threshold.
Not all uneven traffic splits are equally meaningful. In the initial stages of an experiment, when traffic volume is relatively low, percentage differences can look exaggerated without being statistically extreme according to Optimizely's imbalance detection test. This is partly why the automatic SRM detection algorithm does not start its analysis until more than 1,000 visitors have arrived at a Stats Engine experiment.
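The effect of low traffic can be illustrated with the standard deviation of a variation's observed share under purely random assignment, which shrinks with the square root of the visitor count. This is a general binomial fact rather than part of Optimizely's algorithm, and the helper name `expected_swing` is hypothetical.

```python
import math


def expected_swing(visitors: int, intended_share: float = 0.5) -> float:
    """Standard deviation of a variation's observed share under random assignment."""
    return math.sqrt(intended_share * (1 - intended_share) / visitors)


# Typical random swing, in percentage points, at different traffic volumes.
for visitors in (100, 1_000, 10_000, 100_000):
    swing_pp = 100 * expected_swing(visitors)
    print(f"{visitors:>7,} visitors: about ±{swing_pp:.2f} percentage points")
```

With 100 visitors, a swing of around 5 percentage points is ordinary noise, so a 55/45 split means little. With 10,000 visitors the typical swing is about 0.5 percentage points, so the same 55/45 split would be far outside what chance alone produces.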