This article was originally posted on Optimizely.com as a blog post. It has been moved to the support documentation for completeness.
Most importantly, Optimizely has changed how it refers to Accelerate Impact and Accelerate Learnings: Accelerate Impact is now called Multi-Armed Bandit (MAB), and Accelerate Learnings is now called Stats Accelerator (SA).
Over two months of customer research into the usability of Stats Accelerator, we discovered that customers were unclear about how best to use the product. As a refresher, Stats Accelerator had two modes: Accelerate Impact and Accelerate Learnings.
This article describes what Optimizely learned from our research and how we set out to improve the products.
The handful of customer challenges we discovered can be distilled into two themes: misaligned expectations and unclear results.
Customers struggled to decide when to pick one mode or the other, often picking Accelerate Impact because it had the least amount of friction to set up. In contrast to Accelerate Impact, Accelerate Learnings requires at least two variations aside from the baseline. Even after selecting Accelerate Impact, customers were surprised to see their results page lacking the statistical values they had been accustomed to.
In search of those values, they would switch to Manual distribution because it yielded something they could report back to their stakeholders. This activity suggested that their mental framework was tied to A/B testing even though Accelerate Impact is centered on optimization.
This revealed a gap in understanding when to experiment versus when to optimize, which is the fundamental difference between Accelerate Learnings and Accelerate Impact. Admittedly, the way we designed the interface and named the two modes (both start with "Accelerate") likely contributed to this conflation of the two concepts.
Experimentation and Optimization
The decision to run one or the other comes down to understanding the intent of your test. Do you want to optimize towards a primary metric? Or do you want to discover which variation improves your product, with statistical certainty that the change is genuinely better and not just an anomaly?
Let us consider when to run an A/B test. You have a hypothesis, create the experiment, and run it. If a variation reaches statistical significance, you analyze the results, create a new hypothesis with these learnings and run a new experiment with an updated baseline.
This is different from optimization. With optimization, you are not concerned with learning insights; you want to maximize (or minimize) a metric. For example, a customer runs a website promotion to drive more sign-ups. Suppose this test had two variations: Variation 1 has a 10% conversion rate, and Variation 2 has a 20% conversion rate. With uniform allocation of 100 visitors (Scenario 1), each variation receives 50 visitors, so the test yields 5 + 10 = 15 conversions on average.
Had they known that Variation 2 yields more conversions on average, they could have allocated all 100 visitors to Variation 2 (Scenario 2), yielding 20 conversions on average.
During this promotion, the customer wants the algorithm to drive visitors to the variation that yields the most sign-ups. Compared to Scenario 1, Scenario 2 produced five additional conversions, a 33% improvement (5 more than 15). In machine learning, those five missed conversions are called regret, and regret is what the customer wants to minimize. This is an optimization problem. For a longer discussion of the benefits of machine learning, read Optimizely's blog post 5 Ways to Use Machine Learning to Get More Conversions.
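The arithmetic behind the two scenarios can be checked with a few lines of Python. This is a minimal sketch that assumes the conversion rates are known exactly and works with expected (average) conversions; the function and variable names are illustrative only.

```python
# Promotion example: two variations, 100 visitors,
# true conversion rates of 10% and 20% (assumed known for illustration).
conversion_rates = {"Variation 1": 0.10, "Variation 2": 0.20}
visitors = 100


def expected_conversions(allocation: dict[str, float]) -> float:
    """Expected conversions when `allocation` gives each variation's share of traffic."""
    return sum(visitors * share * conversion_rates[v] for v, share in allocation.items())


# Scenario 1: uniform allocation, 50 visitors per variation.
scenario_1 = expected_conversions({"Variation 1": 0.5, "Variation 2": 0.5})
# Scenario 2: all traffic sent to the better variation.
scenario_2 = expected_conversions({"Variation 1": 0.0, "Variation 2": 1.0})

regret = scenario_2 - scenario_1       # conversions lost by splitting traffic evenly
relative_gain = regret / scenario_1    # improvement of Scenario 2 over Scenario 1

print(f"Scenario 1: {scenario_1:.0f}, Scenario 2: {scenario_2:.0f}")
print(f"Regret: {regret:.0f} conversions ({relative_gain:.0%})")
```

In a real test the rates are unknown up front, which is why a bandit has to learn them while it allocates traffic; the regret it accumulates is measured against this best-case Scenario 2.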
The customer running the site was not concerned with gaining actionable insights from the test. Because this is a promotion, there is no intent to permanently implement the winning variation in their product (see our Knowledge Base article on when to experiment and when to optimize).
It is this fundamental difference between experimentation and optimization that we focused on when improving Stats Accelerator over the last year.
First, we changed the name to reflect the feature's intent, and we also placed additional guardrails in the creation flow and the results page.
We renamed Accelerate Impact to Multi-Armed Bandit (MAB) and Accelerate Learnings to Stats Accelerator (SA). Although both modes use multi-armed bandit algorithms to allocate visitors dynamically, we adopted the common industry term MAB for Accelerate Impact because multi-armed bandits are most often associated with minimizing regret.
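To make "allocating visitors dynamically to minimize regret" concrete, the sketch below uses Thompson sampling, a widely used regret-minimizing bandit strategy. It is a generic illustration, not a description of how Optimizely's MAB or Stats Accelerator allocate traffic. The function name `thompson_allocate` and the uniform Beta(1, 1) prior are assumptions made for this example.

```python
import random


def thompson_allocate(stats, n_visitors, rng):
    """Assign visitors with Thompson sampling over Beta posteriors (hypothetical helper).

    stats maps variation name -> (conversions, visitors) observed so far.
    Each variation's conversion rate gets a uniform Beta(1, 1) prior.
    Returns how many of the next n_visitors go to each variation.
    """
    counts = {name: 0 for name in stats}
    for _ in range(n_visitors):
        # Draw a plausible conversion rate for each variation from its posterior.
        draws = {
            name: rng.betavariate(1 + conv, 1 + seen - conv)
            for name, (conv, seen) in stats.items()
        }
        # Send this visitor to the variation whose sampled rate is highest.
        counts[max(draws, key=draws.get)] += 1
    return counts


rng = random.Random(42)  # fixed seed so the example is reproducible
observed = {"Variation 1": (10, 100), "Variation 2": (20, 100)}
allocation = thompson_allocate(observed, 1000, rng)
print(allocation)
```

Because Variation 2 has the higher observed conversion rate, it wins most of the posterior draws and receives most of the next 1,000 visitors, while Variation 1 still gets occasional traffic in case its estimate is too pessimistic. A production allocator would also update the posteriors as new conversions arrive.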
MAB Creation Flow
We have made MAB a first-class option when creating a test.
Before this change, placing Accelerate Impact under the A/B test made it seem as if statistical significance and confidence intervals were part of the results. Users had to select "Accelerate Impact" within their A/B test's Traffic Distribution.
In doing so, they thought of this optimization in the paradigm of an experiment. We moved MAB out of the A/B test and into its own test type for two reasons. First, doing so emphasizes the distinction between MAB as an optimization and an A/B test: an A/B test aims to establish, with statistical rigor, when a variation is better, whereas a MAB should be used to maximize reward (or minimize regret). Second, with its own creation flow, MAB is dissociated from the experimentation paradigm, encouraging our customers to decide which option is appropriate for their business use case.
We have also included guardrails highlighting that MAB does not use the statistical values typical of an A/B test.
MAB Results Page
Before the improvements, a results page for an A/B test with Accelerate Impact enabled looked similar to that of an A/B test.
Customers who saw this were confused, still expecting to see confidence intervals and statistical significance values, attributes akin to the experimentation framework. Now, emphasizing that an optimization ignores statistical significance, we have taken those columns out and focused on how MAB performs.
Users can get a summary of this performance and observe a new estimate called “Improvement Over Equal Allocation.” This is the cumulative gain of running a MAB over running an A/B test with uniform allocation.
To calculate this, we divide the history of the test into a series of equal-length epochs, where an epoch is a period during which the traffic allocation and conversion rates are constant. Within each epoch, we calculate the gain of running a MAB over allocating visitors equally to all arms (variations).
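The per-epoch calculation can be sketched as follows. This is a simplified illustration, not Optimizely's exact estimator. It assumes each epoch records its visitor count, the share of traffic each variation received, and each variation's observed conversion rate, which is used as a plug-in estimate. The `Epoch` structure and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Epoch:
    """One epoch: constant traffic allocation and conversion rates (hypothetical structure)."""
    visitors: int
    allocation: dict[str, float]        # share of traffic each variation received
    conversion_rates: dict[str, float]  # observed conversion rate per variation


def improvement_over_equal_allocation(epochs: list[Epoch]) -> float:
    """Cumulative extra conversions from the bandit's allocation versus a uniform split."""
    total_gain = 0.0
    for e in epochs:
        k = len(e.conversion_rates)
        # Expected conversion rate per visitor under the bandit's actual allocation.
        bandit_rate = sum(e.allocation[v] * r for v, r in e.conversion_rates.items())
        # Expected conversion rate per visitor if traffic had been split equally.
        equal_rate = sum(e.conversion_rates.values()) / k
        total_gain += e.visitors * (bandit_rate - equal_rate)
    return total_gain


epochs = [
    Epoch(100, {"A": 0.5, "B": 0.5}, {"A": 0.10, "B": 0.20}),  # bandit still exploring
    Epoch(100, {"A": 0.2, "B": 0.8}, {"A": 0.10, "B": 0.20}),  # bandit favors B
]
print(improvement_over_equal_allocation(epochs))
```

The first epoch contributes no gain because the bandit has not yet shifted traffic. The second contributes 100 × (0.18 − 0.15) = 3 extra conversions, so the cumulative improvement over equal allocation is 3 conversions.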
For a more detailed explanation, see our Maximize lift with multi-armed bandit optimizations article.
Moreover, we have included a tour to highlight the new aspects of the MAB results page.
Stats Accelerator Creation Flow
Aside from renaming Accelerate Learnings to Stats Accelerator, nothing else has changed. This mode is specifically for an A/B test, and enabling it requires you to change the Distribution Mode.
Stats Accelerator Results Page
We have made several improvements to the results page of an A/B test with Stats Accelerator enabled. A badge now indicates when SA is enabled. More importantly, we have included a tour that focuses more on Weighted Improvement, which estimates the true lift by filtering out bias within each epoch, and less on Conversion Rate, which is not used as an input to the algorithm.
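The reason for comparing variations within each epoch is that the traffic split changes over time while conversion rates may drift. In that situation, pooling all visitors can produce a misleading result, an instance of Simpson's paradox. The sketch below contrasts a naive pooled comparison with a per-epoch, visitor-weighted one. For simplicity it measures lift as an absolute difference in conversion rates and weights epochs by their total visitors. Both choices are assumptions, and Optimizely's actual Weighted Improvement formula may differ. The data and function names are hypothetical.

```python
def pooled_difference(epochs):
    """Naive estimate: pool all epochs, then compare overall conversion rates."""
    base_conv = sum(e["baseline"][0] for e in epochs)
    base_n = sum(e["baseline"][1] for e in epochs)
    var_conv = sum(e["variation"][0] for e in epochs)
    var_n = sum(e["variation"][1] for e in epochs)
    return var_conv / var_n - base_conv / base_n


def weighted_improvement(epochs):
    """Compare within each epoch, then average the differences weighted by epoch size."""
    weighted_sum = 0.0
    total_weight = 0
    for e in epochs:
        base_conv, base_n = e["baseline"]
        var_conv, var_n = e["variation"]
        diff = var_conv / var_n - base_conv / base_n  # within-epoch difference
        weight = base_n + var_n
        weighted_sum += weight * diff
        total_weight += weight
    return weighted_sum / total_weight


# Each entry is (conversions, visitors) for that arm within the epoch.
epochs = [
    # Epoch 1: even 50/50 split, relatively high conversion rates.
    {"baseline": (50, 500), "variation": (60, 500)},
    # Epoch 2: rates drop for everyone while the split shifts to 20/80.
    {"baseline": (10, 200), "variation": (48, 800)},
]

print(f"Pooled difference:    {pooled_difference(epochs):+.4f}")
print(f"Weighted improvement: {weighted_improvement(epochs):+.4f}")
```

The variation beats the baseline in both epochs (+2 and +1 percentage points), yet the pooled comparison comes out slightly negative because most of the variation's traffic arrived during the low-converting epoch. Comparing within each epoch removes that bias and gives a weighted improvement of +1.5 percentage points.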
Customer feedback helped us identify and address the core confusion in understanding the distinction between Accelerate Learnings and Accelerate Impact. We hope our improvements encourage users to decide whether to run an experiment or an optimization upfront and adopt the proper framework to understand their results.
As we continue incorporating feedback, we aim to refine our product so customers can clearly understand which feature matches their intent. Our latest improvements are a step in that direction.