Optimizely's automatic Sample Ratio Mismatch (SRM) detection discovers experiment deterioration early

  • Optimizely Web Experimentation
  • Optimizely Performance Edge
  • Optimizely Feature Experimentation
  • Optimizely Full Stack (Legacy)

Optimizely's automatic SRM detection beta is now closed, and we are preparing for wide release! If you are an Optimizely customer, check out our other fantastic programs on the beta signup page.

Sample Ratio Mismatch occurs when the traffic distribution between variations in an A/B experiment becomes severely and unexpectedly unbalanced, often due to an implementation issue. If an SRM does occur, it indicates a potential external influence affecting the distribution of traffic. Exercise caution and do not overreact to every traffic disparity; a disparity does not automatically mean an experiment is useless.

How Optimizely protects your A/B experiments with its automatic SRM detection

Optimizely aims to alert customers to experiment deterioration as soon as possible. Early detection helps you assess the severity of the imbalance and stop a faulty experiment quickly, which greatly reduces the number of users exposed to it.

To rapidly detect deterioration caused by mismanaged traffic distribution, Optimizely's automatic SRM detection uses a statistical method called sequential sample ratio mismatch (SSRM). Optimizely's SSRM algorithm continuously checks traffic counts throughout an experiment, so an imbalance can be detected early in the experiment's lifecycle instead of waiting until the experiment's end to test for it.

For information on why Optimizely does not use chi-squared tests to evaluate for imbalances, see A Better Way to Test for Sample Ratio Mismatches (SRMs) and Validate Experiment Implementations.
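To build intuition for how a sequential check differs from a single end-of-experiment test, the sketch below implements a simple Bayes-factor-based sequential test for a two-variation experiment with an intended 50/50 split, broadly in the spirit of the approach described in that paper. It is not Optimizely's production algorithm: the Beta(1, 1) prior, the 0.001 threshold, and all names here are illustrative assumptions.

```typescript
// A minimal sketch of a sequential SRM check for a two-variation, 50/50 experiment.
// NOT Optimizely's production algorithm: prior, threshold, and names are assumptions.
class SequentialSrmMonitor {
  private countA = 0;
  private countB = 0;
  private logFactA = 0;   // ln(countA!)
  private logFactB = 0;   // ln(countB!)
  private logFactN1 = 0;  // ln((countA + countB + 1)!)
  private maxLogBF = 0;   // running maximum of the log Bayes factor

  constructor(private readonly threshold = 0.001) {}

  /** Record one bucketed visitor; returns true once an SRM is flagged. */
  record(variation: "A" | "B"): boolean {
    if (variation === "A") {
      this.countA += 1;
      this.logFactA += Math.log(this.countA);
    } else {
      this.countB += 1;
      this.logFactB += Math.log(this.countB);
    }
    const n = this.countA + this.countB;
    this.logFactN1 += Math.log(n + 1);

    // Log Bayes factor of "split ~ Beta(1, 1)" versus "split = 50/50" after n visitors.
    const logBF = this.logFactA + this.logFactB - this.logFactN1 + n * Math.LN2;
    this.maxLogBF = Math.max(this.maxLogBF, logBF);

    // Always-valid sequential p-value: safe to check after every single visitor,
    // unlike a fixed-horizon test that is only valid at a preplanned sample size.
    const sequentialP = Math.min(1, Math.exp(-this.maxLogBF));
    return sequentialP < this.threshold;
  }
}

// Usage: a persistent 40/60 split is typically flagged within a few hundred
// visitors, long before the experiment would normally end.
const monitor = new SequentialSrmMonitor();
for (let visitors = 1; visitors <= 100000; visitors++) {
  const variation = Math.random() < 0.4 ? "A" : "B";
  if (monitor.record(variation)) {
    console.log(`SRM flagged after ${visitors} visitors`);
    break;
  }
}
```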

Sample ratio mismatch (SRM)

An SRM occurs when the traffic distribution between variations in an A/B experiment becomes significantly imbalanced due to an implementation issue. Optimizely's Stats Engine does not generate SRMs and its traffic-splitting mechanism is trustworthy. A severe traffic distribution imbalance may lead to experiment degradation and, in extreme cases, inaccurate results.

For example, in an A/B test, you set a 50/50 traffic split between Variation A and Variation B. But instead, you observe a 40/60 traffic distribution.

It is important to remember that not every imbalance calls for immediate panic and abandonment of the experiment. If the cause of the traffic distribution imbalance is well understood, you can still draw concrete conclusions. An imbalance does not automatically invalidate your experiment results.

Evaluating experiments for traffic imbalances is most helpful at the start of your experiment's launch period. Catching an experiment with an unknown source of traffic allocation imbalance early lets you turn it off quickly and reduce the blast radius.

Remember, Optimizely's automatic SRM detection leverages a sequential sample ratio mismatch algorithm that continuously and efficiently checks traffic counts throughout an experiment. Currently, Optimizely's automatic SRM detection is only available for A/B experiments.

Causes of imbalance

An imbalance flagged by Optimizely's automatic SRM detection is a symptom of an underlying data quality issue. Implementation errors and third-party bots are the most common culprits behind experiment imbalances. To minimize the likelihood of an imbalance, set up your experiment carefully.

Specific experiment configurations pose a greater risk of an imbalance occurring. Assess the following scenarios to see if they are relevant to your experiment structure.

Redirect experiments

Redirect experiments are a known and reasonable cause of traffic imbalance. In Optimizely Web Experimentation or Optimizely Performance Edge, they allow you to compare two separate URLs as variations in an A/B test. For example, you might create a redirect experiment that compares two landing page versions.

Due to the nature of redirects, visitors may close the window or tab and leave the page before the redirect finishes executing. The Optimizely Web Experimentation code does not activate in this situation, so the event data is never sent to Optimizely and the user is not counted. Redirect experiments are still valid experiments, but it is reasonable to expect that a slight imbalance may occur.

URL redirects vary and cannot be relied on to behave consistently, so it is unreasonable to expect a specific, fixed rate of successful redirects. Therefore, Optimizely does not endorse ad hoc over- or under-correction of traffic for redirect experiments, nor running redirect experiments for an extended period of time solely to rebalance visitor counts.

There are two major reasons Optimizely delays event tracking until after the redirect is completed:

  1. Performance

    Optimizely's redirect hides the initial page's content through CSS. Naturally, there is a delay between a user accessing a webpage and actually receiving content, and end users are rightfully sensitive to site performance. If Optimizely waited for the event to be sent before redirecting, that delay would grow. And if changes are applied to the page the customer is redirected to, the snippet still needs to apply them, pushing the user's experience out even further. Optimizely minimizes this extra time by waiting to send the event until the customer is on the second page.

  2. Accuracy

    The only way to know for certain that the redirect completed, and that the user is receiving the variation they were bucketed into, is to send the event once the second page loads. You might think that giving the snippet time to send the event and confirm receipt before redirecting would ensure accurate results. However, that approach would count users in the redirect variation even when the redirect never finished, and include their data (or lack thereof) in results processing, distorting the reliability and precision of the metrics reported on the results page.

    There are a multitude of reasons why a redirect may fail. For example:

    1. The browser may reject it if there are too many redirects. Optimizely may not be the only thing redirecting the user, and it may be one step in a series of redirects.
    2. A user could have a browser setting or extension that rejects redirects.
    3. The delay could be long enough that a user closes the tab before the redirect has finished. 

For more on this topic, refer to our documentation on redirect experiments.

Reduce and then increase traffic allocation

There are two major ways an experimenter can irreparably harm their experiment results. The first is ramping traffic allocation down and then back up, which causes rebucketing.

Bucketing at Optimizely is:

  • Deterministic – Because of the way Optimizely hashes user IDs, a returning user is not reassigned to a new variation.
    In most cases, the user's visitor ID is not a consistent ID that works across devices (like a customer ID for logged-in users). Therefore, the user does not see the same variation across devices.
  • Sticky unless reconfigured – If you reconfigure a live, running experiment (for example, by decreasing and then increasing traffic allocation), a user may be rebucketed into a different variation.

When users are rebucketed because an experimenter ramped their traffic allocation down and then back up, visitor counts for each variation are distorted, resulting in a traffic imbalance caused by the experimenter.
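To make the rebucketing mechanism concrete, here is a simplified sketch of range-based deterministic bucketing. The FNV-1a hash, the 10,000-bucket space, and the range layout are illustrative stand-ins rather than the SDKs' actual implementation; the point is that a visitor's bucket value never changes, but ramping traffic down and back up moves the range boundaries underneath it.

```typescript
// Simplified sketch of deterministic, range-based bucketing. The hash and the
// range layout below are illustrative stand-ins, not the SDKs' actual code.
function bucketValue(userId: string, experimentId: string): number {
  // FNV-1a hash of userId + experimentId, mapped onto 10,000 buckets, so the
  // same visitor always lands on the same bucket value for this experiment.
  const input = userId + experimentId;
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0) % 10000;
}

// Variation ranges scale with the experiment's traffic allocation (0.0 to 1.0),
// assuming a 50/50 split between two variations.
function assignVariation(bucket: number, trafficAllocation: number): string {
  const perVariation = 5000 * trafficAllocation;
  if (bucket < perVariation) return "A";
  if (bucket < perVariation * 2) return "B";
  return "excluded";
}

// The bucket value is stable across visits, but the ranges are not stable across
// allocation changes, so down-ramping and then up-ramping can flip the assignment.
const bucket = bucketValue("visitor-123", "exp-456");
console.log(assignVariation(bucket, 1.0)); // original 100% allocation
console.log(assignVariation(bucket, 0.5)); // ramped down to 50%: may flip or drop out
console.log(assignVariation(bucket, 1.0)); // ramped back up: may flip again
```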

Forced-bucketing

The second major way you can irreparably harm your experiment results is with forced-bucketing into a legacy Full Stack Experimentation experiment.

If a user first gets bucketed in an Optimizely Web Experimentation experiment and then that decision is used to force-bucket them in a legacy Full Stack Experimentation experiment, then the results of that Full Stack Experimentation experiment become imbalanced.

See the following example of how force-bucketing can cause an imbalance:

Imagine an experiment with two variations: Variation A and Variation B.

Variation A is providing a superior user experience in comparison to Variation B. Visitors who were assigned to Variation A are finding it enjoyable, and many of them continue to log in and land in the Full Stack Experimentation experiment, where they are force-bucketed to Variation A.

In contrast, visitors who were assigned to Variation B are not having a good experience and only a few proceed to log in and land in the Full Stack Experimentation experiment, where they are assigned to Variation B.

As a result, there are significantly more visitors in Variation A than in Variation B, and these Variation A visitors are likely to convert more in the Full Stack Experimentation experiment because they are happier with their experience. In addition to the visitor traffic imbalance, metrics and conversion rates are also skewed.
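A small simulation makes this skew concrete. The 50/50 Web bucketing mirrors the scenario above, while the login rates (60 percent for Variation A visitors, 30 percent for Variation B visitors) are invented for illustration only.

```typescript
// Illustrative simulation of the force-bucketing scenario above. The login rates
// are assumptions chosen only to show the direction of the skew.
let fullStackA = 0;
let fullStackB = 0;

for (let i = 0; i < 100000; i++) {
  // Fair 50/50 bucketing in the Web Experimentation experiment.
  const webVariation = Math.random() < 0.5 ? "A" : "B";

  // Variation A's happier visitors log in (and reach the Full Stack experiment)
  // more often than Variation B's visitors.
  const loginRate = webVariation === "A" ? 0.6 : 0.3;

  if (Math.random() < loginRate) {
    // Force-bucketing reuses the Web decision instead of bucketing independently,
    // so the Full Stack experiment inherits the biased entry population.
    if (webVariation === "A") fullStackA++;
    else fullStackB++;
  }
}

// Roughly a 2:1 split (about 30,000 vs. 15,000) instead of the intended 50/50.
console.log({ fullStackA, fullStackB });
```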

Delayed or failed Optimizely API calls

The Event API sends event data directly to Optimizely Experimentation. A traffic imbalance may occur if anything causes these calls to be delayed or not fire, as in the following scenarios.

Differences in the snippet or event dispatch timing

The Optimizely Web Experimentation snippet is one line of JavaScript code that executes on your website. If something causes the snippet code to misfire, a traffic distribution imbalance may occur. Additionally, if you use the holdEvents or sendEvents JavaScript APIs anywhere other than in the project JavaScript, the script may not load properly, resulting in a traffic distribution imbalance. Adding more scripts to your webpage may also cause implementation and loading rates to differ across variations, particularly in the case of redirects.
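For reference, this is the hold-and-release pattern the paragraph above describes, placed in the project JavaScript. Only the holdEvents and sendEvents call types come from the text above; when to release the queued events is site-specific and is shown here purely as an assumption.

```typescript
// Sketch of the hold/release pattern. Per the paragraph above, these calls belong
// in the project JavaScript; calling them elsewhere can keep events from sending.
const optimizelyQueue: Array<{ type: string }> =
  ((window as any).optimizely = (window as any).optimizely || []);

// Queue events instead of sending them immediately.
optimizelyQueue.push({ type: "holdEvents" });

// Later, once the page reaches a state where tracking should resume (assumption:
// your own readiness condition), release everything that was queued.
optimizelyQueue.push({ type: "sendEvents" });
```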

The Optimizely Feature Experimentation SDKs make HTTP requests for every decision event or conversion event that gets triggered. Each SDK has a built-in event dispatcher for handling these events. A traffic distribution imbalance may occur if events are dispatched incorrectly due to misconfiguration or other dispatching issues.

For more information, refer to your Feature Experimentation SDK's configure event dispatcher documentation.
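As an illustration of why dispatching matters, the sketch below retries transient network failures so decision and conversion events are less likely to be silently dropped. The LogEvent shape, the function name, and how a custom dispatcher is registered with your SDK are assumptions; the exact interface differs by SDK and version, so follow the documentation referenced above.

```typescript
// Hedged sketch of a retrying event dispatcher. The event shape and how this
// function plugs into a specific SDK are assumptions; consult your SDK's docs.
interface LogEvent {
  url: string;
  httpVerb: "POST";
  params: Record<string, unknown>;
}

async function dispatchWithRetry(event: LogEvent, maxAttempts = 3): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const response = await fetch(event.url, {
        method: event.httpVerb,
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(event.params),
      });
      if (response.ok) return; // event accepted; done
    } catch {
      // Network error: fall through and retry below.
    }
    // Brief backoff so a momentary blip does not permanently drop the event.
    await new Promise((resolve) => setTimeout(resolve, 200 * attempt));
  }
  console.warn(`Event to ${event.url} not delivered after ${maxAttempts} attempts`);
}
```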

What to do if Optimizely identifies an imbalance in your experiment

If Optimizely identifies an imbalance in your experiment, troubleshooting depends on the cause of the imbalance and whether you can correct the problem directly.

If you can determine the root cause, you can quickly stop your experiment, fix the underlying issue, duplicate the experiment, and start that new experiment. You can continue monitoring your fix in the new experiment to verify that you have corrected the problem.

If you cannot determine the root cause of the imbalance, stop the experiment or remove the affected variation quickly to lessen the negative impact on customers while you investigate further. For more information, refer to Imbalance detected: What to do next if Optimizely identifies an SRM.