Maximize lift with multi-armed bandit optimizations

  • Optimizely Web Experimentation
  • Optimizely Web Personalization
  • Optimizely Feature Experimentation
  • Optimizely Full Stack (Legacy)

Traditional A/B tests and multi-armed bandits (MABs) differ in two main ways:

  • Multi-armed bandits do not generate statistical significance.

  • Multi-armed bandits do not use a control or a baseline experience.

Instead of statistical significance, the MAB results page focuses on improvement over equal allocation as its primary summary of your optimization's performance. This article breaks down the key differences between MABs and traditional A/B tests, demonstrates how each approach would perform in an identical situation, and answers some frequently asked questions.

Why MABs do not show statistical significance

With a traditional A/B test, the goal is exploration: collecting data to discover if a variation performs better or worse than the control. This is expressed through the concept of statistical significance.

Statistical significance tells you whether a change had the effect you expected. You can use those lessons to make your variations better each time. Fixed traffic allocation strategies are usually the best ways to reduce the time it takes to reach a statistically significant result.

On the other hand, Optimizely Experimentation's MAB algorithms are designed for exploitation: MABs aggressively push traffic to whichever variations are performing best, because the MAB does not consider the reason for that superior performance to be very important.

Since MABs essentially ignore statistical significance, Optimizely Experimentation does too. This is why statistical significance does not appear on the results page for MABs: It avoids confusion about the purpose and meaning of MAB optimizations.

Why MABs do not use a baseline

In a traditional A/B test, statistical significance is calculated relative to the performance of one baseline experience. MABs do not do this. A/B tests are well suited to validating an optimized variation and confirming whether to deploy a winning variation or stay with the control. MABs instead make allocation decisions continuously throughout the experiment's lifetime, explicitly evaluating the tradeoffs between all variations at once rather than comparing each one to a baseline experience.

Improvement over original

Improvement over original is an estimate of the gain in total conversions compared to simply delivering all traffic to the original variation.

To calculate it, Optimizely Experimentation examines the cumulative average conversions per visitor for each variation. Then it multiplies the original's conversion rate by the total number of visitors in the test. Finally, this number is compared to the observed conversion counts in the test.
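The steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as described, not Optimizely's implementation: the function name, the list-based inputs, and the choice to report both an absolute and a relative gain are assumptions made for the example.

```python
def improvement_over_original(visitors, conversions):
    """Estimate the conversion gain versus sending all traffic to the original.

    visitors and conversions are per-variation lists. Index 0 is treated as
    the original (the first variation in the list). Hypothetical helper.
    """
    if not visitors or len(visitors) != len(conversions):
        raise ValueError("need matching, non-empty visitor and conversion lists")
    if visitors[0] == 0:
        raise ValueError("the original needs at least one visitor")

    # Cumulative average conversions per visitor for the original.
    original_rate = conversions[0] / visitors[0]

    total_visitors = sum(visitors)
    total_conversions = sum(conversions)

    # Conversions expected if every visitor had seen the original.
    expected_if_all_original = original_rate * total_visitors

    absolute_gain = total_conversions - expected_if_all_original
    if expected_if_all_original == 0:
        return absolute_gain, float("nan")
    return absolute_gain, absolute_gain / expected_if_all_original


gain, relative = improvement_over_original(
    visitors=[1000, 1000, 500, 2500],
    conversions=[500, 500, 225, 1375],
)
print(gain, relative)
```

In this made-up example the original converts at 50%, so 5,000 visitors would be expected to produce 2,500 conversions. The test observed 2,600, which is a gain of 100 conversions.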

There are no statistical significance measures associated with this calculation. It does not predict or guarantee any reproducibility in future tests or campaigns. Also, the original variation in this context is the first variation in the list and may not be named "original" if you changed it.

Algorithms powering Optimizely Experimentation's MAB

Optimizely uses a different algorithm depending on the type of metric you are optimizing:

  • Binary metrics – Optimizely Experimentation uses a Bayesian MAB procedure called Thompson Sampling [Russo, Van Roy 2013]. Optimizely characterizes each variation as a Beta distribution, where its parameters are the variation's observed number of conversions and visitors. These distributions are sampled several times, and Optimizely allocates traffic to the variations according to their win ratio.
  • Numeric metrics – Optimizely Experimentation uses the Epsilon Greedy procedure, where a small fraction of traffic is allocated uniformly across all variations, and the rest is allocated to the variation with the highest observed mean.
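As a rough illustration of the two procedures, the sketch below computes traffic weights with each one. It is a textbook-style approximation, not Optimizely's code. It assumes a uniform prior, so each variation is modeled as Beta(1 + conversions, 1 + non-conversions). The number of draws and the epsilon value are arbitrary choices made for the example.

```python
import random


def thompson_allocation(conversions, visitors, draws=10_000, seed=0):
    """Share of draws in which each variation has the highest sampled rate."""
    rng = random.Random(seed)
    wins = [0] * len(visitors)
    for _ in range(draws):
        # One sample per variation from its Beta posterior (uniform prior assumed).
        samples = [
            rng.betavariate(1 + c, 1 + (n - c))
            for c, n in zip(conversions, visitors)
        ]
        wins[samples.index(max(samples))] += 1
    # The win ratio of each variation becomes its traffic weight.
    return [w / draws for w in wins]


def epsilon_greedy_allocation(means, epsilon=0.1):
    """Spread epsilon uniformly; give the rest to the highest observed mean."""
    k = len(means)
    best = max(range(k), key=lambda i: means[i])
    weights = [epsilon / k] * k
    weights[best] += 1.0 - epsilon
    return weights


print(thompson_allocation([500, 500, 450, 550], [1000, 1000, 1000, 1000]))
print(epsilon_greedy_allocation([12.0, 15.5, 14.0], epsilon=0.1))
```

In both functions the weights sum to 1, and the variation that currently looks best receives the largest share while the others keep some traffic.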

MAB optimization versus A/B testing: a demonstration

In the following head-to-head comparison, simulated data is sent to both an A/B test with fixed traffic distribution and a MAB optimization. Traffic distribution over time and the cumulative count of conversions for each mode are both observed. The true conversion rates driving the simulated data are:

  • Original – 50%

  • Variation 1 – 50%

  • Variation 2 – 45%

  • Variation 3 – 55%


The MAB algorithm detects early on that Variation 3 is performing better. Even without any statistical significance information for this signal (remember, the multi-armed bandit does not show statistical significance), it still begins to push traffic to Variation 3 in order to exploit the perceived advantage and gain more conversions.

For the ordinary A/B experiment, the traffic distribution remains fixed in order to more quickly arrive at a statistically significant result. Because fixed traffic allocations are optimal for reaching statistical significance, MAB-driven experiments generally take longer to find winners and losers than A/B tests.

By the end of the simulation, the MAB optimized the experiment to achieve roughly 700 more conversions than if traffic had been held constant.
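A comparison along these lines can be sketched as follows. This is a simplified stand-in for the simulation described above, not a reproduction of it. It assumes Thompson Sampling with a uniform prior, updates the model after every visitor rather than in batches, and uses an arbitrary visitor count and seed, so its numbers will not match the figures quoted in this article.

```python
import random

# True conversion rates: Original, Variation 1, Variation 2, Variation 3.
TRUE_RATES = [0.50, 0.50, 0.45, 0.55]


def simulate(n_visitors=20_000, seed=42):
    rng = random.Random(seed)
    k = len(TRUE_RATES)

    # Fixed allocation: visitors rotate evenly through the variations.
    fixed_conversions = 0
    for v in range(n_visitors):
        if rng.random() < TRUE_RATES[v % k]:
            fixed_conversions += 1

    # Bandit: sample each variation's Beta posterior, send the visitor to
    # the variation with the highest sample, then record the outcome.
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_visitors):
        samples = [
            rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)
        ]
        arm = samples.index(max(samples))
        if rng.random() < TRUE_RATES[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    bandit_visitors = [successes[i] + failures[i] for i in range(k)]
    return fixed_conversions, sum(successes), bandit_visitors


fixed, bandit, traffic = simulate()
print("fixed allocation conversions:", fixed)
print("bandit conversions:", bandit)
print("bandit visitors per variation:", traffic)
```

The per-variation visitor counts show how the bandit concentrates traffic on Variation 3 over time, which is where its extra conversions come from.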

Traffic distribution is updated frequently, so Optimizely Feature Experimentation customers should implement sticky bucketing to avoid exposing the same visitor to multiple variations. To do this, implement the user profile service. See our developer documentation for more detail.
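As a minimal sketch of what sticky bucketing involves, the class below stores each visitor's profile in memory. It assumes the user profile service interface exposes a lookup method and a save method, and that a profile is a dictionary keyed by "user_id" with an "experiment_bucket_map" entry. Treat these names, and the example IDs, as assumptions to verify against the developer documentation for your SDK. A production service would use persistent storage rather than a dictionary.

```python
class InMemoryUserProfileService:
    """Hypothetical in-memory store for sticky bucketing decisions."""

    def __init__(self):
        self._profiles = {}

    def lookup(self, user_id):
        # Return the saved profile, or None if this visitor is new.
        return self._profiles.get(user_id)

    def save(self, user_profile):
        # Persist the profile so later decisions reuse the same variation.
        self._profiles[user_profile["user_id"]] = user_profile


service = InMemoryUserProfileService()
service.save(
    {
        "user_id": "visitor-123",
        # Assumed shape: experiment ID mapped to the assigned variation ID.
        "experiment_bucket_map": {"exp-1": {"variation_id": "var-3"}},
    }
)
print(service.lookup("visitor-123"))
print(service.lookup("visitor-999"))
```

Because the saved assignment is returned on later lookups, a visitor keeps the same variation even after the bandit shifts the traffic distribution.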


Does the multi-armed bandit algorithm work with Multivariate tests and Web Personalization?

Yes. To use a MAB in a Multivariate test (MVT), select Partial Factorial. In the Traffic Mode dropdown, select Multi-armed Bandit.


In Optimizely Web Personalization, you can apply a MAB on the experience level. This works best when you have two variations aside from the holdback.


How often does the multi-armed bandit make a decision?

The MAB model is updated hourly. If you need a different frequency for model updates, contact Optimizely.

Why is a baseline variation listed on the Results page for my multi-armed bandit campaign?

In MVT and Web Personalization, your Results page still designates one variation as a baseline. However, this designation does not actually mean anything because MABs do not measure success relative to a baseline variation. It is just a label that has no effect on your experiment or campaign.

You should not see a baseline variation in the Results page when using a MAB with an Optimizely Web Experimentation or Optimizely Feature Experimentation experiment.

What happens if I change my primary metric?

If you change the primary metric mid-experiment in a Web Experimentation MVT or a Web Personalization campaign, the MAB begins optimizing for the new primary metric instead of the one you originally selected. For this reason, do not change the primary metric once you begin the experiment or campaign.

You cannot change your primary metric in Optimizely Web Experimentation or Optimizely Feature Experimentation once your experiment has begun. See Why you should not change a running experiment for more information.

What happens when I stop or pause a variation?

If you pause or stop a variation, the MAB ignores data from that variation when it adjusts traffic distribution among the remaining live variations. There are side effects to be aware of, however. When the set of variations changes mid-experiment, the existing bandit model is discarded, and the MAB has to start from scratch to optimize for the remaining variations and adapt traffic accordingly. This can add considerably to the time an experiment needs to complete. Because of this, Optimizely recommends not changing variations mid-experiment. See Why you should not change a running experiment for more information.

How do multi-armed bandits handle conversion rates that change over time and Simpson's Paradox?

Optimizely Experimentation uses an exponential decay function that weights recent visitor behavior more heavily than earlier observations, so the MAB adapts more quickly when conversion rates change over time.

Also, Optimizely Experimentation reserves a portion of traffic for pure exploration so that time variation is easier to detect.
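To illustrate the decay weighting idea, the sketch below computes a conversion rate in which each older batch of data counts for less. The decay factor, the per-batch input format, and the function name are assumptions made for the example. The actual decay function and its parameters are not specified in this article.

```python
def decayed_conversion_rate(batches, decay=0.9):
    """Conversion rate with exponentially decayed weights.

    batches is a list of (visitors, conversions) tuples, oldest first.
    decay is a hypothetical factor between 0 and 1; 1.0 means no decay.
    """
    weighted_visitors = 0.0
    weighted_conversions = 0.0
    for visitors, conversions in batches:
        # Shrink everything seen so far, then add the newest batch at full weight.
        weighted_visitors = decay * weighted_visitors + visitors
        weighted_conversions = decay * weighted_conversions + conversions
    if weighted_visitors == 0:
        return 0.0
    return weighted_conversions / weighted_visitors


# Conversion rate rises from 30% in the older batch to 60% in the newer one.
history = [(100, 30), (100, 60)]
print(decayed_conversion_rate(history, decay=1.0))  # plain cumulative average
print(decayed_conversion_rate(history, decay=0.5))  # leans toward recent data
```

With no decay the estimate is the cumulative average of 45%. With a decay factor of 0.5 the older batch counts half as much, so the estimate moves to 50%, closer to the recent behavior.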