Optimizely's automatic sample ratio mismatch detection

  • Optimizely Web Experimentation
  • Optimizely Performance Edge
  • Optimizely Feature Experimentation
  • Optimizely Full Stack (Legacy)

A sample ratio mismatch (SRM) occurs when the traffic distribution between variations in a Stats Engine A/B experiment becomes severely and unexpectedly unbalanced, often due to an implementation issue or third-party bots.

An SRM indicates that something external may be affecting how traffic is distributed. Do not overreact to every traffic disparity, though: an imbalance does not automatically mean that an experiment is useless.

How Optimizely protects your Stats Engine A/B experiments with its automatic SRM detection

Optimizely Experimentation aims to alert customers to any experiment deterioration as soon as possible. Early detection helps you assess the severity of the imbalance and stop a faulty experiment, which can greatly reduce the number of users exposed to it.

To rapidly detect deterioration caused by mismanaged traffic distribution, Optimizely Experimentation's automatic SRM detection uses a statistical method called sequential sample ratio mismatch (SSRM). Optimizely's SSRM algorithm continuously checks traffic counts throughout an A/B experiment. It provides immediate detection at the beginning of an experiment's lifecycle instead of waiting until the experiment's end to test for an imbalance.
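As a rough illustration of how a check can run after every visitor, the sketch below implements one style of sequential test: a Dirichlet-multinomial mixture (Bayes factor) test. It is not confirmed to be Optimizely's production algorithm. The function names, the prior concentration of 100, and the 0.01 error level are assumptions chosen for illustration.

```python
import math


def log_beta(params):
    """Log of the multivariate beta function B(params)."""
    return sum(math.lgamma(a) for a in params) - math.lgamma(sum(params))


def log_bayes_factor(counts, expected, concentration=100.0):
    """Log evidence for 'traffic split differs from expected' vs. 'split is as expected'.

    counts:        visitors seen so far in each variation
    expected:      intended traffic shares, for example [0.5, 0.5]
    concentration: assumed strength of a Dirichlet prior centered on `expected`
    """
    prior = [concentration * p for p in expected]
    posterior = [a + n for a, n in zip(prior, counts)]
    # Probability of the observed sequence under the mixture alternative.
    log_alternative = log_beta(posterior) - log_beta(prior)
    # Probability of the same sequence under the intended split.
    log_null = sum(n * math.log(p) for n, p in zip(counts, expected))
    return log_alternative - log_null


def first_srm_alert(assignments, expected, alpha=0.01):
    """Return the 1-based visitor index at which an imbalance is flagged, or None.

    assignments: iterable of variation indexes (0, 1, ...) in arrival order
    alpha:       assumed bound on the false alarm rate over the whole stream
    """
    counts = [0] * len(expected)
    threshold = math.log(1.0 / alpha)
    for visitor_number, variation in enumerate(assignments, start=1):
        counts[variation] += 1
        # The check runs after every data point, not only at the end.
        if log_bayes_factor(counts, expected) >= threshold:
            return visitor_number
    return None
```

With an intended 50/50 split, a stream that alternates perfectly between two variations is never flagged, while a stream that consistently sends 40% of visitors to one variation is flagged partway through, without waiting for the experiment to finish.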

For information on why Optimizely does not use chi-squared tests to evaluate for imbalances, see A Better Way to Test for Sample Ratio Mismatches (SRMs) and Validate Experiment Implementations.

Going through your old experiments and trying to find imbalances using an online ratio mismatch calculator is not helpful. This retroactive or end-of-experiment imbalance check is not a recommended use of your time. Retroactive imbalance testing informs you about a possible implementation problem only after the experiment has collected all the data, which is far too late and goes against why most experimenters want imbalance detection in the first place.

Optimizely Experimentation emphasizes the importance of running automatic checks. The automatic SRM detection algorithm created at Optimizely checks for imbalances after every data point, not just at the end, so that you can identify actual problematic imbalances at the first sign of trouble.

Optimizely's automatic SRM notification is only available for A/B tests:
  • With the traffic distribution set to Manual (Stats Accelerator is NOT enabled).
  • That have been running for 45 days or less. This is measured as total running time, not age: days when the experiment is paused do not count toward the total.
  • That have at least 1000 visitors.
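
The three conditions above can be sketched as a simple check. This is only an illustration: the `ExperimentSnapshot` class, its field names, and the mode strings are hypothetical and do not correspond to any Optimizely API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_RUNNING_DAYS = 45   # limit stated in the list above
MIN_VISITORS = 1000     # minimum stated in the list above


@dataclass
class ExperimentSnapshot:
    """Hypothetical container for the facts the conditions depend on."""
    distribution_mode: str   # assumed values: "manual" or "stats_accelerator"
    running_periods: list    # (start, end) datetime pairs; paused time is excluded
    visitors: int


def total_running_days(periods):
    """Sum only the time the experiment was actually running."""
    total = sum((end - start for start, end in periods), timedelta())
    return total / timedelta(days=1)


def srm_notification_available(experiment):
    """True when all three listed conditions hold."""
    return (
        experiment.distribution_mode == "manual"
        and total_running_days(experiment.running_periods) <= MAX_RUNNING_DAYS
        and experiment.visitors >= MIN_VISITORS
    )
```

For example, an experiment that ran for 30 days, was paused for a month, and then ran for 10 more days counts as 40 running days, so it still qualifies even though it was created well over 45 days ago.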

Segmenting experiment results

Optimizely Experimentation does not check for visitor imbalances when you segment your results.

Segments and filters should only be used for data exploration, not making decisions.
This follows from the Sure Thing Principle (STP) [1, 2]: if doing something increases the chances of something bad happening across almost all your segments, it also dramatically raises the chances of something bad happening overall in your experiment. So, Optimizely Experimentation does not check for visitor imbalances in segments.

Citations: 1. Savage, L. J. (1954). The Foundations of Statistics. John Wiley & Sons. 2. Joyce, J. M. (1999). The Foundations of Causal Decision Theory. Cambridge University Press.

Paused or archived experiments and flags

Optimizely Experimentation does not check for visitor imbalances in paused or archived experiments and flags.

Sample ratio mismatch

An SRM occurs when the traffic distribution between variations in a Stats Engine A/B experiment becomes significantly imbalanced. Optimizely Experimentation's Stats Engine does not generate SRMs, and its traffic-splitting mechanism is trustworthy. A severe traffic distribution imbalance may lead to experiment degradation and, in extreme cases, inaccurate results.

For example, in a Stats Engine A/B test, you set a 50/50 traffic split between Variation A and Variation B. But instead, you observe a 40/60 traffic distribution.
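
To get a feel for when a 40/60 observation is actually surprising, the sketch below computes how many standard errors the observed share sits from the intended share. It is a single-look illustration only, not the sequential test used for automatic detection, and the function name is hypothetical.

```python
import math


def split_z_score(visitors_in_a, total_visitors, expected_share_a=0.5):
    """Standard errors between the observed share of Variation A and the intended share."""
    observed_share = visitors_in_a / total_visitors
    standard_error = math.sqrt(
        expected_share_a * (1 - expected_share_a) / total_visitors
    )
    return (observed_share - expected_share_a) / standard_error


small = split_z_score(4, 10)       # 40/60 after 10 visitors
large = split_z_score(400, 1000)   # 40/60 after 1000 visitors
```

Here `small` is about -0.63, which is well within ordinary random variation, while `large` is about -6.32, which is very unlikely under a true 50/50 split. Applying a fixed cutoff to this score at every look would inflate false alarms, which is why continuous monitoring needs a sequential method.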

Remember, not every imbalance is a reason to panic and immediately abandon your experiment. If you properly understand the cause of the traffic distribution imbalance, you can still make concrete conclusions. An imbalance does not mean your experiment results are immediately invalid.

Evaluating experiments for traffic imbalances is most helpful at the start of your experiment launch period. Finding an experiment with an unknown source of a traffic imbalance lets you turn it off quickly and reduce the blast radius.

Optimizely Experimentation's automatic SRM detection uses a sequential sample ratio mismatch algorithm that continuously and efficiently checks traffic counts throughout an experiment. Automatic SRM detection is available only for Stats Engine A/B experiments.

For information on why a traffic imbalance may be occurring, see Possible causes for traffic imbalances.