Take action based on the results of an experiment

  • Optimizely Web Experimentation
  • Optimizely Personalization
  • Optimizely Performance Edge
  • Optimizely Feature Experimentation
  • Optimizely Full Stack

When you have analyzed your results, you are ready to take action. 

Winning, losing, and inconclusive variations are opportunities to make decisions about your business through data. If you find winning variations among your experiments, you decide which changes to publish to your site and how.

Losing and inconclusive variations present another set of valuable opportunities. To learn from your results, home in on the expectations your site fails to meet and run proof-of-concept tests before committing resources to an unproven idea. These types of results may not surface quick wins, but they focus your testing and keep you oriented toward long-term success.

See Interpret your Optimizely Experimentation Results.

Winning, losing, and inconclusive experiments 

  • Positive – At least one variation shows a statistically significant positive difference (% Improvement) from the baseline conversion rate for the primary goal, and possibly other goals as well (see the sketch after this list).
  • Negative – All variations show statistically significant negative differences from the original for your primary goal and potentially other goals as well.
  • Inconclusive – All variations have relatively equal performance, showing no statistically significant positive or negative results for your primary goal.
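
% Improvement is the relative lift of a variation over the baseline conversion rate. As a quick illustration, here is a minimal sketch with made-up visitor and conversion counts (these are hypothetical numbers, not Optimizely output):

```python
# Illustrative only: visitor and conversion counts are hypothetical.
baseline_conversions, baseline_visitors = 400, 10_000
variation_conversions, variation_visitors = 460, 10_000

baseline_rate = baseline_conversions / baseline_visitors      # 0.040
variation_rate = variation_conversions / variation_visitors   # 0.046

# % Improvement is the variation's lift relative to the baseline rate.
improvement = (variation_rate - baseline_rate) / baseline_rate
print(f"% Improvement: {improvement:.1%}")  # 15.0%
```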

See Manage experiments and campaigns for Web Experimentation, Personalization, and Performance Edge.

One winning variation

When your experiment finishes and you see a clear winner, first evaluate whether to implement the variation or test again. If you decide to implement the variation, what to do next depends on your Experimentation product. If you are using:

  • Web Experimentation, Personalization, Performance Edge – Work with a developer to permanently implement the variation in your code.
    If you decide to implement your changes but your developer cannot do so quickly, do not update the running experiment's traffic distribution to favor the winning variation. Instead, pause the running experiment, duplicate it, and then set the entire traffic allocation in the new experiment to the winning variation. See Send all traffic to a winning variation.
  • Feature Experimentation – Conclude your rule and deploy your winning variation; a sketch of serving the deployed variation from application code follows this list. See new rule statuses and the walkthrough of concluding an experiment in action.
    Flag lifecycle management is in beta.
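
After the winning variation is deployed behind the flag, your application keeps reading the decision from the SDK. Below is a minimal sketch using the Optimizely Feature Experimentation Python SDK's decide API; the SDK key, flag key (checkout_redesign), and variation key (variation_b) are hypothetical placeholders, so adapt them to your own flag setup.

```python
# Minimal sketch: serving a deployed winning variation with the Optimizely
# Feature Experimentation Python SDK (pip install optimizely-sdk).
# The SDK key, flag key, and variation key are hypothetical placeholders.
from optimizely import optimizely

optimizely_client = optimizely.Optimizely(sdk_key="YOUR_SDK_KEY")

user = optimizely_client.create_user_context("user-123")
decision = user.decide("checkout_redesign")  # hypothetical flag key

if decision.enabled and decision.variation_key == "variation_b":
    print("Serve the winning checkout experience")
else:
    print("Serve the original experience")
```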

Evaluate your variations and decide if you should:

  • Implement changes – Experiments that test messaging, imagery, and content prioritization are good candidates for expansion. Translate insights from these wins to other areas of your website or other domains if your testing program spans multiple sites.
  • Experiment more – Sometimes, winning test results let you keep optimizing along a trend. This might be the case if your experiment generated an unexpectedly large improvement or if a dramatic structural change (instead of a minor cosmetic tweak) produced the winning result. If so, continue to iterate on the winning variation to find more opportunities.
  • Move on to other ideas – Some tests see diminishing returns from continued iteration sooner than others. For example, a button color changed from low to high contrast may not show much further impact on user behavior.
  • Do not implement – 
    • If the variation does not help your business goals. For example, you run a test to optimize your conversion funnel. The winning variation increases conversions on the product details page, but overall submissions stay the same. Before you implement this variation, consider the overall situation. Are there opportunities to optimize further down the funnel? If so, consider running a second test to improve conversions down the funnel before committing to permanent changes.
    • If your changes result in a misleading message and visitors who click through immediately bounce, do not put that change into production. It is unlikely to improve your most important metrics. Worse, you may create a perception that your site makes misleading propositions, and you risk slowing down future tests on the funnel. Visitors who click through with no possibility of purchase add noise to any signal in the funnel, making it difficult to achieve statistically significant results.

If you run your test during a tumultuous time, try it again before acting on the results. External factors may lead you to optimize your site for a certain type of visitor or situation. Run your tests for at least one business cycle (seven days) so typical user behavior is accounted for, and consider running the test again to account for seasonality and traffic spikes.

Multiple winning variations

You may see multiple winning variations in a single test. This is a good outcome, but what do you do with all of these winning variations?

  • Combine the ideas – Try combining multiple winning variations into a single optimized experience. By combining your winning ideas, you optimize and iterate based on multiple data points at once. When you are trying to test what resonates with visitors the most, mix existing themes in new combinations to find the best experience.
  • Expand on one idea – If you see a promising trend in one of your variations, consider taking a little time exploring that opportunity. 
  • Implement the highest-performing variation – When speed is a concern and you want to implement results quickly, you can implement the variation with the highest lift in your code. But before you do, segment your results to check how your most valuable customers respond to the change. If those important visitors prefer a different variation, weigh the business impact of the two choices.

Losing variation

Experiments with losing variations are not bad. They are often just as actionable as winning variations, and they provide focused and valuable insights about your visitors' behaviors that help to guide your optimization efforts in the long run. The most valuable result of A/B testing is learning more about the customer.

Pay attention to your losing variations. Dig deep into the results of losing tests to find out:

  • Why do visitors who see this experience convert less often?

  • Do certain visitors convert significantly better (or worse) than others?

  • Can you use this data to brainstorm new hypotheses about how visitors respond to your site experience?

    • What can you optimize based on this information?

    • How can you leverage this insight? 

Use what you have learned to plan a new, bolder experiment or make major decisions about the direction of a redesign.

Here are a few more ways to handle a losing variation after you analyze it:

  • Try it again, with some attitude – If you have generated a statistically significant result, even a negative one, you have shown that you can affect your visitors' behavior with your experiment. Iterate on the hypothesis, perhaps in a different direction, and test again.
  • Move on – Sometimes, the thing you are testing is already relatively "optimized" (though no site is ever completely optimized). Continuing to test the same thing will likely generate more negative results, and it carries an opportunity cost: you are not testing something with clearer potential to produce positive results. This might be true if you have already iterated multiple times on the same experiment idea or if you have been running tests on one area of your site for an extended period.
  • Sometimes a loser is a winner – A losing variation may provide advantageous business results. For example, if you test a prototype of a costly redesign and it loses, you can avoid committing to the full project. Prototyping and testing can generate benefits by avoiding costly investments.

Inconclusive experiment

When you check your experiment at the end of the projected time-to-results, your test may not have reached statistical significance. Inconclusive tests can provide valuable information. Waiting longer helps you gather more data—but how long should you wait? What is the potential impact of this test compared to the next experiment you might run?

Inconclusive tests are the most confusing type of test result, but they are also a signal to test bigger.

Many test executions are not dramatic enough to impact results. Organizational fear and a lack of resources are the biggest hurdles to testing more significant changes.

Here are a couple of ways to go bigger:

  • Test more than one element at a time – Although you may sacrifice some empirical rigor by testing more things at once, your single greatest responsibility is to move the needle; you can focus on establishing why things are happening once you have achieved the lift you were looking for.
  • Increase the degree of drama – Make your variations clearly and visibly different. Bolder changes to the experience are more likely to produce measurable differences in behavior.

In certain situations, you may want to take action on an experiment that has run for the projected time but has yet to reach significance. Taking action is a business decision based on the Experiment Results page data and your business goals. Review the following sections to learn a few tactics that may help you decide what to do with your inconclusive experiment.

Check Visitors Remaining

The number of visitors remaining estimates how long your experiment needs to run to reach significance, based on traffic, conversion rate, and the number of variations. Specifically, visitors remaining estimates the number of additional visitors needed for a particular variation: if the conversion rates for the original (baseline) and that variation stay the same, that is how many more visitors Stats Engine needs before it can call a significant result.

[Image: visitors-remaining.png]

But if the conversion rates change, your visitors remaining will also adjust. A ballpark estimate based on visitors remaining helps you make a business decision about how long to wait.
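
One way to turn visitors remaining into a rough waiting time is to divide it by the traffic each variation receives per day. This is a back-of-the-envelope sketch, not Stats Engine's calculation, and all of the numbers are hypothetical:

```python
# Rough back-of-the-envelope estimate, not a Stats Engine calculation.
# All numbers are hypothetical placeholders.
import math

visitors_remaining = 12_000        # from the Experiment Results page
daily_experiment_visitors = 4_000  # visitors entering the experiment per day
variations = 2                     # original plus one variation, split evenly

daily_visitors_per_variation = daily_experiment_visitors / variations
days_to_wait = math.ceil(visitors_remaining / daily_visitors_per_variation)
print(f"Roughly {days_to_wait} more days if conversion rates hold")  # ~6 days
```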

To decide whether to keep running the test, compare the visitors remaining to the number of visitors projected in your experiment test plan.

Wait for statistical significance if:

  • You have not reached the number of visitors predicted in your experiment test plan.

  • Your visitors remaining suggests you do not have long to wait.

Declare the test inconclusive if: 

  • You have exceeded the number of visitors you planned for, and the visitors remaining suggests you will not reach significance anytime soon.

For example, suppose you want to be confident that the 5% lift you see is real, but you would likely have to wait another two weeks to gather enough data to reach 95% statistical significance (that is, less than a 5% chance the lift is false). You could keep running the test, but it is probably not worth it. Taken together, the visitors remaining, your experiment test plan, and the lift you expected to see suggest that the change did not affect your visitors' behavior in a significant way.

At this point, segment your results and check your secondary and monitoring metrics to look for insights for the next round of tests.

When visitors remaining does not provide a ready answer, check the confidence interval to help you interpret your results.

Check the Confidence Interval

The Confidence Interval can help you decide whether to keep running a test or move on to another idea. The interval straddles zero for experiments that have not reached significance yet.

[Image: confidence interval.png]

The confidence interval expands and contracts as Stats Engine gathers more data from visitors' behaviors. The more erratic visitor behavior is, the less sure Stats Engine is about future results, and the interval widens. A narrow difference interval indicates that Stats Engine is homing in on how the variation is likely to convert relative to the baseline in the future.

If your results are inconclusive, your difference interval can provide insight when paired with visitors remaining.
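
To build intuition for how the interval behaves, here is a sketch of a classical 95% confidence interval for the absolute difference in conversion rates, using a normal approximation. Stats Engine computes its intervals with sequential statistics, so the values it reports will differ; the counts below are hypothetical.

```python
# Classical (fixed-horizon) 95% interval for the difference in conversion
# rates, using a normal approximation. Stats Engine uses sequential methods,
# so this is for intuition only. All counts are hypothetical.
import math

def diff_confidence_interval(conv_a, n_a, conv_b, n_b, z=1.96):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

low, high = diff_confidence_interval(conv_a=400, n_a=10_000, conv_b=430, n_b=10_000)
print(f"95% CI for the difference: [{low:+.4f}, {high:+.4f}]")
# An interval that straddles zero means the result is not yet conclusive.
```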

Segment Your Results

Segments are one of the most important tools for digging into overall performance metrics. Your total performance is an average over the whole population, but different sub-groups may have different goals and conversion rates. Dig into your segments to discover whether a group responds to your experiment differently from the average population. If a segment responds differently to your experiment, that may lead you to develop a personalization strategy.
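
If you export your results or analytics data, a quick group-by shows whether a segment diverges from the overall numbers. Below is a minimal sketch using pandas; the column names and values are hypothetical and do not reflect an Optimizely export format.

```python
# Minimal sketch of exploring results by segment with pandas.
# Column names and values are hypothetical, not an Optimizely export format.
import pandas as pd

events = pd.DataFrame(
    {
        "variation":   ["original", "original", "variation_1", "variation_1"],
        "segment":     ["mobile",   "desktop",  "mobile",      "desktop"],
        "visitors":    [5_000,      5_000,      5_000,         5_000],
        "conversions": [150,        250,        210,           240],
    }
)

events["conversion_rate"] = events["conversions"] / events["visitors"]
print(events.pivot(index="segment", columns="variation", values="conversion_rate"))
```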

Segments and filters should only be used for data exploration, not for making decisions.