Optimizely's automatic sample ratio mismatch detection discovers experiment deterioration early

  • Optimizely Web Experimentation
  • Optimizely Performance Edge
  • Optimizely Feature Experimentation
  • Optimizely Full Stack (Legacy)

A sample ratio mismatch (SRM) occurs when the traffic distribution between variations in a Stats Engine A/B experiment becomes severely and unexpectedly unbalanced, often due to an implementation issue. If an SRM does occur, it indicates a potential external influence affecting the distribution of traffic. Exercise caution and refrain from overreacting to every traffic disparity, because an imbalance does not automatically mean an experiment is useless.

How Optimizely protects your Stats Engine A/B experiments with its automatic SRM detection

Optimizely aims to alert customers to experiment deterioration as soon as possible. Early detection helps you assess the severity of the imbalance and stop a faulty experiment, which can greatly reduce the number of users exposed to it.

To rapidly detect deterioration caused by mismanaged traffic distribution, Optimizely's automatic SRM detection uses a statistical method called sequential sample ratio mismatch (SSRM). Optimizely's SSRM algorithm continuously checks traffic counts throughout an experiment, so an imbalance can be detected as soon as it emerges instead of only when the experiment ends.
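To make the idea concrete, the following sketch shows one way a sequential SRM check can work: a Dirichlet-multinomial Bayes factor is recomputed after every visitor and compared against a 1/α threshold, which keeps the false-alarm rate below α even under continuous monitoring. This is a simplified illustration only, not Optimizely's production implementation; the function names, prior, and simulated 40/60 bug are assumptions made for the example.

```python
# Illustrative sequential SRM check (not Optimizely's exact implementation).
import numpy as np
from scipy.special import gammaln

def srm_bayes_factor(counts, expected_props, prior=None):
    """Bayes factor comparing observed bucket counts to the intended split."""
    counts = np.asarray(counts, dtype=float)
    expected_props = np.asarray(expected_props, dtype=float)
    if prior is None:
        prior = np.ones_like(counts)  # uniform Dirichlet prior (an assumption)
    n = counts.sum()
    # Log marginal likelihood under the Dirichlet-multinomial alternative.
    log_alt = (gammaln(prior.sum()) - gammaln(prior.sum() + n)
               + np.sum(gammaln(prior + counts) - gammaln(prior)))
    # Log likelihood under the null: traffic follows the configured split.
    log_null = np.sum(counts * np.log(expected_props))
    return np.exp(log_alt - log_null)

# Simulate a 50/50 experiment whose assignment is actually skewed to ~40/60,
# checking the running counts after every visitor.
alpha = 0.01                      # false-alarm rate under continuous monitoring
counts = np.zeros(2)
rng = np.random.default_rng(7)
for visitor in range(1, 50_001):
    counts[int(rng.random() < 0.6)] += 1   # buggy assignment: ~40/60 split
    if srm_bayes_factor(counts, [0.5, 0.5]) > 1 / alpha:
        print(f"Imbalance flagged after {visitor} visitors: {counts}")
        break
```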

For information on why Optimizely does not use chi-squared tests to evaluate for imbalances, see A Better Way to Test for Sample Ratio Mismatches (SRMs) and Validate Experiment Implementations.

Sample ratio mismatch

An SRM occurs when the traffic distribution between variations in a Stats Engine A/B experiment becomes significantly imbalanced due to an implementation issue. Optimizely's Stats Engine does not generate SRMs, and its traffic-splitting mechanism is trustworthy. A severe traffic distribution imbalance may lead to experiment degradation and, in extreme cases, inaccurate results.

For example, in a Stats Engine A/B test, you set a 50/50 traffic split between Variation A and Variation B. But instead, you observe a 40/60 traffic distribution.

Not every imbalance is a reason to panic and immediately abandon the experiment. If you understand the cause of the traffic distribution imbalance, you can still draw valid conclusions. An imbalance does not automatically invalidate your experiment results.

Evaluating experiments for traffic imbalances is most helpful at the start of your experiment launch period. Catching an imbalance from an unknown source early lets you turn the experiment off quickly and reduce the blast radius.

Optimizely's automatic SRM detection uses a sequential sample ratio mismatch algorithm that continuously and efficiently checks traffic counts throughout an experiment. Currently, Optimizely's automatic SRM detection is only available for Stats Engine A/B experiments.

Causes of imbalance

An imbalance flagged by Optimizely's automatic SRM detection is a symptom of an underlying data quality issue. Implementation errors and third-party bots are the most common culprits behind experiment imbalances. To minimize the likelihood of an imbalance, set up your experiment carefully.

Certain experiment configurations carry a greater risk of an imbalance. Review the following scenarios to see whether they apply to your experiment structure.

Redirect experiments

Redirect experiments are a known and reasonable cause of traffic imbalance. In Optimizely Web Experimentation or Optimizely Performance Edge, you can compare two separate URLs as variations in a Stats Engine A/B test. For example, you might create a redirect experiment that compares two landing page versions.

Due to the nature of redirects, visitors may close the window or tab and leave the page before the redirect finishes executing. The Optimizely Web Experimentation code does not activate in this situation, so the event data is never sent to Optimizely and the user is not counted. Redirect experiments are valid experiments, but it is reasonable to expect a slight imbalance.

URL redirects can vary, and you cannot rely on them to behave consistently or expect a specific, fixed rate of successful redirects. Optimizely does not endorse ad hoc over- or under-correction of traffic for redirect experiments, nor does it endorse running redirect experiments for an extended period solely to rebalance visitor counts.

There are two major reasons Optimizely delays event tracking until after the redirect is completed:

  1. Performance – Optimizely's redirect hides the initial page's content through CSS, so there is already a delay between a user requesting a webpage and seeing any content. End users are rightfully sensitive to site performance, and waiting for the event to be sent before redirecting would exacerbate that delay. If changes are also applied to the destination page, the snippet still needs to apply them, pushing the user even further from receiving the experience. Optimizely minimizes this extra time by sending the event after the customer reaches the second page.

  2. Accuracy – The only way Optimizely knows that the redirect completed and that the user was bucketed and received the variation is to send the event when the second page loads. You might think that giving the snippet time to send the event and confirm receipt before redirecting would ensure accurate results. However, if the redirect then fails, Optimizely would have counted the user in the redirect variation inaccurately and included their data (or lack thereof) in results processing, distorting the reliability and precision of the metrics reported on the results page.

    There are many reasons why a redirect may fail. For example:

    1. The browser may reject it if there are too many redirects. Optimizely may not be the only thing redirecting the user, and it may be one step in a series of redirects.
    2. A user can have a browser setting or extension that rejects redirects.
    3. The delay can be long enough that a user closes the tab before the redirect has finished. 

For more on this topic, refer to Optimizely's documentation on redirect experiments.

How to irreparably harm your results

There are two major ways you can irreparably harm the results of your experiments. 

Reduce and then increase traffic allocation

If you down-ramp and then up-ramp traffic allocation, which causes re-bucketing, you can irreparably harm the results of your experiment.

Bucketing at Optimizely is:

  • Deterministic – Because Optimizely hashes user IDs, a returning user is not reassigned to a new variation.
  • Sticky unless reconfigured – If you reconfigure a live, running experiment, for example by decreasing and then increasing traffic allocation, a user may be rebucketed into a different variation.

When users are rebucketed because an experimenter down-ramped and then up-ramped their own traffic allocation, it distorts visitor counts for each variation of that experiment. This results in a traffic imbalance caused by the experimenter. 
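To see why reconfiguring allocation distorts counts, the following sketch uses a generic hash (not Optimizely's actual hashing or ramping logic; the IDs and split are made up) to map each user ID to a fixed point on a unit range and carve that range up according to the current traffic allocation. The hash points never change, but when the allocation changes the boundaries move, so some visitors drop out of the experiment or land in a different variation.

```python
# Simplified, hypothetical sketch of hash-based bucketing (not Optimizely's
# actual algorithm). A user's ID hashes to a fixed point in [0, 1); the
# current traffic allocation decides which slice of that range is in the
# experiment and where the variation boundary sits.
import hashlib
from typing import Optional

def bucket(user_id: str, experiment_id: str, traffic: float) -> Optional[str]:
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    point = (int(digest, 16) % 10_000) / 10_000   # deterministic point in [0, 1)
    if point >= traffic:
        return None                               # outside the allocation
    return "A" if point < traffic / 2 else "B"    # 50/50 split of the allocation

# The same visitors, re-evaluated as the allocation is ramped down and back up.
# Their hash points never change, but the boundaries do, so some visitors drop
# out or switch variations between the 100% and 20% runs.
users = [f"visitor-{i}" for i in range(8)]
for traffic in (1.00, 0.20, 1.00):
    print(traffic, {u: bucket(u, "exp-1", traffic) for u in users})
```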

Forced-bucketing

The second major way you can irreparably harm the results of your experiment is with forced bucketing into a legacy Full Stack Experimentation experiment.

If a user first gets bucketed in an Optimizely Web Experimentation experiment and then that decision is used to force-bucket them in a legacy Full Stack Experimentation experiment, then the results of that Full Stack Experimentation experiment become imbalanced.

See the following example of how force-bucketing can cause an imbalance:

An experiment has two variations: Variation A and Variation B.

Variation A provides a superior user experience in comparison to Variation B. Visitors assigned to Variation A find it enjoyable, and many of them continue to log in and land in the Full Stack Experimentation experiment, where they are force-bucketed to Variation A.

In contrast, visitors assigned to Variation B do not have a good experience, and only a few proceed to log in and land in the Full Stack Experimentation experiment, where they are assigned to Variation B.

As a result, there are significantly more visitors in Variation A than in Variation B. These Variation A visitors are also more likely to convert in the Full Stack Experimentation experiment because they are happier with their experience. So, in addition to the visitor traffic imbalance, metrics and conversion rates are skewed.
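The following short simulation reproduces this scenario with made-up numbers (a 50/50 Web split and assumed login rates of 60% for Variation A versus 30% for Variation B) to show how the downstream Full Stack Experimentation experiment ends up with roughly twice as many Variation A visitors as Variation B visitors.

```python
# Illustrative simulation of the force-bucketing scenario above.
import numpy as np

rng = np.random.default_rng(0)
visitors = 10_000
web_variation = rng.choice(["A", "B"], size=visitors)      # clean 50/50 Web split
login_rate = np.where(web_variation == "A", 0.60, 0.30)    # assumed login rates
logged_in = rng.random(visitors) < login_rate

# Only logged-in visitors reach the legacy Full Stack experiment, where they
# are force-bucketed into the variation they already saw on the web.
fs_a = int(np.sum(logged_in & (web_variation == "A")))
fs_b = int(np.sum(logged_in & (web_variation == "B")))
print(f"Full Stack visitors - A: {fs_a}, B: {fs_b} "
      f"({fs_a / (fs_a + fs_b):.0%} vs {fs_b / (fs_a + fs_b):.0%})")
```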

Additional ways to harm your results

There are other situations that may arise and cause irreparable harm to the results of your experiments. 

Delayed or failed Optimizely API calls

The Event API sends event data directly to Optimizely Experimentation. A traffic imbalance may occur if anything delays the calls or prevents them from firing, for example a network failure, a content blocker, or a visitor leaving the page before the call completes.
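As a rough sketch only (the endpoint and payload below are placeholders, not the real Event API request format), adding retry and failure logging around event calls makes dropped events visible instead of letting them silently skew traffic counts:

```python
# Illustrative retry wrapper for sending event data. The endpoint and payload
# structure are placeholders; consult the Event API reference for the real
# request format.
import time

import requests

EVENT_ENDPOINT = "https://example.com/v1/events"  # placeholder URL (assumption)

def send_event_with_retry(payload: dict, attempts: int = 3) -> bool:
    """POST an event payload, retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            response = requests.post(EVENT_ENDPOINT, json=payload, timeout=5)
            if response.ok:
                return True
        except requests.RequestException:
            pass  # network error; fall through and retry
        time.sleep(2 ** attempt)  # back off before the next attempt
    return False  # caller can log this so dropped events are investigated
```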

Differences in IDs across devices

In some cases, the user ID is not a consistent identifier that works across devices (such as a customer ID for logged-in users), so the user does not see the same variation on every device.
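As a simplified illustration (again using a generic hash rather than Optimizely's, with made-up IDs), each device-scoped ID hashes to its own bucketing point, so the same person can receive different variations on different devices unless a consistent identifier such as a customer ID is used:

```python
# Hypothetical sketch: device-scoped IDs hash to different points, so the same
# person can be assigned different variations on different devices.
import hashlib

def variation_for(user_id: str) -> str:
    point = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000
    return "A" if point < 5_000 else "B"          # 50/50 split

print(variation_for("customer-42"))               # consistent logged-in ID
print(variation_for("cookie-phone-9f3a"))         # same person, phone cookie
print(variation_for("cookie-laptop-4b7c"))        # same person, laptop cookie
```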

Differences in the snippet or event dispatch timing

The Optimizely Web Experimentation snippet is one line of JavaScript code that executes on your website. A traffic distribution imbalance may occur if something causes the snippet code to misfire. Additionally, if you use the holdEvents or sendEvents JavaScript APIs in a location other than in the project JavaScript, the script may not load properly, resulting in a traffic distribution imbalance. Adding more scripts to your webpage may cause implementation or loading rates to differ across variations, particularly in the case of redirects.

The Optimizely Feature Experimentation SDKs make HTTP requests for every decision event or conversion event that gets triggered. Each SDK has a built-in event dispatcher for handling these events. A traffic distribution imbalance may occur if events are dispatched incorrectly due to misconfiguration or other dispatching issues.
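If you use the Python SDK, one way to surface dispatch problems is a custom event dispatcher that logs failed requests instead of dropping them silently, as sketched below. The event attributes used here (url, params, headers) and the constructor arguments are assumptions based on the Python SDK; confirm them against the configure event dispatcher documentation referenced next.

```python
# Sketch of a custom event dispatcher for the Feature Experimentation Python
# SDK that logs failed dispatches so problems that could skew traffic counts
# are visible. Attribute names and constructor arguments are assumed; check
# your SDK's event dispatcher documentation.
import logging

import requests
from optimizely import optimizely

class LoggingEventDispatcher:
    @staticmethod
    def dispatch_event(event):
        try:
            response = requests.post(
                event.url, json=event.params, headers=event.headers, timeout=10
            )
            response.raise_for_status()
        except requests.RequestException:
            logging.exception("Optimizely event dispatch failed: %s", event.url)

# "YOUR_SDK_KEY" is a placeholder; pass the dispatcher when creating the client.
client = optimizely.Optimizely(
    sdk_key="YOUR_SDK_KEY", event_dispatcher=LoggingEventDispatcher()
)
```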

For information, refer to your Feature Experimentation SDK's configure event dispatcher documentation.

What to do if Optimizely identifies an imbalance in your experiment

If Optimizely identifies an imbalance in your experiment, troubleshooting depends on the cause of the imbalance and whether you can correct the problem directly.

If you can determine the root cause, you can stop your experiment, fix the underlying issue, duplicate the experiment, and start that new experiment. You can continue monitoring your fix in the new experiment to verify that you have corrected the problem.

If you cannot determine the root cause of the imbalance, stop the experiment or remove the affected variation to lessen the negative impact on customers while you investigate further. For information, refer to Imbalance detected: What to do next if Optimizely identifies an SRM in your Stats Engine A/B test.