- Optimizely Web Experimentation
- Optimizely Web Personalization
- Optimizely Performance Edge
- Optimizely Feature Experimentation
- Optimizely Full Stack
Once you have analyzed your results, you are ready to take action. This is the big moment!
Winning, losing, and inconclusive variations are opportunities to make decisions about your business, through data. If you find winning variations among your experiments, you will decide which changes to publish to your site, and how.
Losing and inconclusive variations present another set of valuable opportunities: to learn from your results, home in on expectations that your site is failing to meet, and conduct proof-of-concept tests before committing resources to an unproven idea. These types of results may not surface quick wins, but they focus your testing and keep you oriented toward long-term success.
This article provides steps to iterate on winning, losing, and inconclusive experiments. Use it to turn your data into action.
If you would like to take a step back and dig deeper into your results, check out this article on interpreting your data.
Definitions: winning, losing, and inconclusive
What is a winning, losing, or inconclusive variation?
- Winning – When at least one experiment variation shows a statistically significant positive difference (% Improvement) from the baseline conversion rate for the primary goal, and potentially for other goals as well.
- Losing – When all experiment variations show statistically significant negative differences from the original, for your primary goal and potentially other goals as well.
- Inconclusive – When the performance for all variations is relatively equal, showing no statistically significant positive or negative results for your primary goal.
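The three definitions above can be sketched as a small classifier. This is only an illustration: `VariationResult` and `classify_experiment` are hypothetical names, and the inputs are assumed to be values you read off your results for the primary goal. A mixed outcome (for example, one significant negative variation and one non-significant variation) is not covered by the definitions, so the sketch assumes it counts as inconclusive.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class VariationResult:
    """Primary-goal result for one variation (hypothetical structure)."""
    name: str
    improvement: float  # relative difference from baseline, e.g. 0.05 for +5%
    significant: bool   # True if the difference is statistically significant


def classify_experiment(variations: Sequence[VariationResult]) -> str:
    """Return 'winning', 'losing', or 'inconclusive' for the primary goal."""
    if not variations:
        raise ValueError("at least one variation is required")
    # Winning: at least one variation is significantly better than baseline.
    if any(v.significant and v.improvement > 0 for v in variations):
        return "winning"
    # Losing: every variation is significantly worse than the original.
    if all(v.significant and v.improvement < 0 for v in variations):
        return "losing"
    # Assumption: anything else, including mixed outcomes, is inconclusive.
    return "inconclusive"
```

For example, an experiment with one significant +8% variation and one flat variation would be classified as winning, because a single significant positive variation is enough.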
Below, we suggest how to take action on each of these types of results.
One winning variation
When your experiment finishes and you see a single clear winner, first decide whether to implement the variation or test again. If you decide to implement it, work with a developer to make the change permanent in your code.
Evaluate the winning variation and decide what to do next:
- Implement changes – Experiments that test messaging, imagery, and content prioritization are naturally good candidates for expansion. Translate insights from these wins to other areas of your web site or even to other domains if your testing program spans multiple sites.
- Experiment more – Sometimes, winning test results present an opportunity to keep optimizing along a trend. This might be the case if your experiment generated an unexpectedly large improvement or if a dramatic structural change (versus a minor cosmetic tweak) produced a winning result. If so, continue to iterate on the winning variation to look for further gains.
- Move on to other ideas – Some tests are more prone to see diminishing returns from continued mining than others. For example, a button color changed from low to high contrast may not show much further impact on user behavior.
- Do not implement – In a few situations, implementing a winning variation may not move the needle in the right direction for your business.
  - If the variation does not help your business goals, you should not implement the change. For example, imagine that you run a test to optimize your conversion funnel. The winning variation increases conversions on the product details page, but overall submissions stay the same. Before you implement this variation, step back to consider the bigger picture. Are there opportunities to optimize further down the funnel? If so, consider running a second test to improve conversions down the funnel before committing to permanent changes.
  - If your changes result in a misleading message, and visitors who click through immediately bounce, do not put that change into production. It is unlikely to make a true impact on your most important metrics. Worse, you may create a perception that your site presents misleading propositions, and you risk slowing down future tests on the funnel. Visitors who click through with no possibility of purchase add noise to any signal in the funnel, making it difficult to achieve statistically significant results.
  - If you ran your test during a tumultuous time, try it again before acting on the results. External factors may lead you to optimize your site for a certain type of visitor or situation. Optimizely Experimentation recommends running all tests for a minimum of one business cycle (seven days) to ensure all kinds of user behavior are accounted for. Consider running the test again to account for seasonality and certain traffic spikes.
Multiple winning variations
You may see multiple winning variations in a single test. This is a good outcome, but what do you do with all of these winning variations?
- Combine the ideas – Try combining multiple winning variations into a single optimized experience. By combining your winning ideas, you optimize and iterate based on multiple data points at once. When you are trying to test what resonates with visitors the most, mix existing themes in new combinations to find the best experience.
- Expand on one idea – If you see a promising trend in one of your variations, consider taking a little time to explore that opportunity.
- Implement the highest winning variation – When speed is a concern and you want to implement results quickly, you can choose to implement the highest-winning variation in your code. But before you do, segment your results to check how your most valuable customers respond to the change. If those important visitors prefer the next-winning variation, weigh the business impact of those two choices.
Losing variations
Some teams try to avoid losing tests, but experiments with losing variations are not bad. They are often just as actionable as winning variations, and they provide focused and valuable insights about your visitors' behaviors that help to guide your optimization efforts in the long run. The most valuable result of A/B testing is learning more about the customer.
Pay attention to your losing variations. Dig deep into the results of losing tests to find out:
- Why do visitors who see this experience convert less often?
- Do certain visitors convert significantly better (or worse) than others?
- Can you use this data to brainstorm new hypotheses about how visitors respond to your site experience?
- What can you optimize, based on this information?
- How can you leverage this insight?
Use what you have learned to plan a new, bolder experiment or make major decisions about the direction of a redesign.
Here are a few more ways to handle a losing variation, after you analyze it:
- Try it again, with some attitude – If you have generated a statistically significant result, even if that result is negative, you have still shown an ability to affect your visitors' behavior with your experiment.
- Move on – Sometimes, the thing you are trying to test is already relatively "optimized" (though no site is ever completely optimized). Continuing to test the same thing is likely to generate more negative results, and you also pay an opportunity cost for not testing something with clearer potential to generate positive results. This might be true if you have already iterated multiple times on the same experiment idea, or if you have been running tests on one area of your site for an extended period.
- Sometimes a loser is actually a winner – Sometimes, a losing variation actually drives advantageous business results. For example, if you test a prototype of a costly redesign, and it loses, you can avoid committing to the full project. Prototyping and testing can generate benefits by avoiding costly investments.
Inconclusive variations
Sometimes, when you check your experiment at the end of the projected time-to-results, your test has not reached statistical significance. You know that inconclusive tests can provide valuable information. Waiting longer will help you gather more data—but how long should you wait? What is the potential impact of this test, compared to the next experiment you might run?
Inconclusive tests are the most confusing type of test result. But they also generally point in a clear direction: go bigger!
Many test executions simply are not dramatic enough to impact results. Organizational fear and a lack of resources are the biggest hurdles to testing more significant changes.
Here are a couple of ways to go bigger:
- Test more than one element at a time – Although you may sacrifice some empirical rigor by testing more things at once, your single greatest responsibility is to move the needle; you can focus on getting better at establishing why things are happening once you have gotten that lift you have been looking for.
- Increase the degree of drama – Make your variations clearly and visibly different. Making more substantial attempts at changing the environment will yield proportionally stronger results.
In certain situations, you may wish to take action on an experiment that has run for the projected time but has yet to reach significance. Taking action is a business decision based on the Results page data and your business goals. Below, we discuss a few tactics that may help.
Check Visitors Remaining
Visitors remaining estimates how long your experiment needs to run to reach significance, based on traffic, conversion rate, and the number of variations. Specifically, visitors remaining estimates the number of visitors needed for a particular variation. If your conversion rates for the original and that variation stay the same, you would need X number of visitors for Stats Engine to call a significant result.
But, if the conversion rates start to change, your visitors remaining will also adjust. A ballpark estimate based on visitors remaining will help you make a business decision on how long to wait.
To decide whether to keep running the test, compare the visitors remaining to the number of visitors projected in your experiment test plan.
Wait for statistical significance if:
- You have not reached the number of visitors predicted in your experiment test plan.
- Your visitors remaining suggests you do not have long to wait.
Declare the test inconclusive if:
- You have exceeded the number of visitors you planned for, and visitors remaining suggests you will not reach significance anytime soon.
Imagine, for example, that you want to be confident that the 5% lift you see is real, and that you would likely have to wait another two weeks to gather enough data to reach 95% statistical significance (that is, a less than 5% likelihood that the lift is a false positive). You could keep running the test, but it is probably not worth it. Together, the data for visitors remaining, your experiment test plan, and the lift you expect to see already indicate that the change you made did not change your visitors' behavior in a significant way.
At this point, segment your results and check your secondary and monitoring metrics to look for insights for the next round of tests.
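The wait-or-stop rule described in this section can be sketched as a short helper. It is a rough illustration under stated assumptions, not how Stats Engine works internally: `days_to_significance`, `next_step`, and the `max_extra_days` threshold are all hypothetical, and the visitor counts are assumed to be values you read from the Results page and your test plan, expressed per variation.

```python
import math


def days_to_significance(visitors_remaining: int,
                         daily_visitors_per_variation: float) -> int:
    """Ballpark number of days until the visitors remaining are collected."""
    if daily_visitors_per_variation <= 0:
        raise ValueError("daily traffic must be positive")
    return math.ceil(visitors_remaining / daily_visitors_per_variation)


def next_step(visitors_so_far: int,
              planned_visitors: int,
              visitors_remaining: int,
              daily_visitors_per_variation: float,
              max_extra_days: int = 7) -> str:
    """Suggest 'wait' or 'declare inconclusive'.

    max_extra_days is a hypothetical business threshold for how long
    you are willing to keep waiting; choose it to fit your own roadmap.
    """
    extra_days = days_to_significance(visitors_remaining,
                                      daily_visitors_per_variation)
    # Wait if the test plan's visitor count has not been reached yet.
    if visitors_so_far < planned_visitors:
        return "wait"
    # Wait if significance looks close.
    if extra_days <= max_extra_days:
        return "wait"
    # Plan exceeded and significance is not expected soon.
    return "declare inconclusive"
```

With 12,000 visitors collected against a plan of 10,000, 7,000 visitors remaining, and 500 visitors per variation per day, the estimate is 14 more days, so the sketch suggests declaring the test inconclusive. Because visitors remaining shifts as conversion rates change, treat the output as a ballpark and re-check it periodically.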
When visitors remaining does not provide a ready answer, check the confidence interval to help you interpret your results.
Check the Confidence Interval
The Confidence Interval can help you decide whether to keep running a test or move on to another idea. The interval always straddles zero for experiments that have not reached significance yet.
As Stats Engine gathers more data from visitors' behaviors, the confidence interval expands and contracts. The more erratic visitor behaviors are, the less sure Stats Engine is about future results, and the interval widens. A narrow interval indicates that Stats Engine is homing in on how the variation is likely to convert in the future.
If your results are inconclusive, your confidence interval can provide insight when paired with visitors remaining.
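One way to read an interval alongside your goals is sketched below. The function name `read_interval` and the `min_worthwhile_lift` parameter are hypothetical, and the "move on" rule is an assumption about how a team might apply the interval, not a rule stated by Stats Engine. The bounds are assumed to be the lower and upper ends of the interval shown for a variation, expressed as fractions (0.02 for 2%).

```python
def read_interval(lower: float, upper: float,
                  min_worthwhile_lift: float) -> str:
    """Interpret a variation's interval for the primary goal.

    lower, upper: interval bounds as fractions, e.g. -0.01 and 0.02.
    min_worthwhile_lift: hypothetical smallest lift worth implementing.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    # An interval entirely on one side of zero no longer straddles zero.
    if lower > 0:
        return "winner"
    if upper < 0:
        return "loser"
    # The interval straddles zero: the result is not yet significant.
    # Assumption: if even the upper bound is below the lift you care
    # about, more data is unlikely to reveal a worthwhile win.
    if upper < min_worthwhile_lift:
        return "move on"
    return "keep watching"
```

For instance, an interval of -1% to +2% with a 5% minimum worthwhile lift returns "move on", while a wide interval of -10% to +15% returns "keep watching", since a meaningful lift is still plausible.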
Segment Your Results
Segments are one of the analyst's most important tools for digging into overall performance metrics. Your total performance can be thought of as an average population, but different sub-groups may have different goals—and different conversion rates. Dig into your segments to discover whether there is a group that responds to your experiment differently from the average population. If there is a segment that responds differently to your experiment, that may lead you to develop a personalization strategy.
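As a minimal sketch of this kind of segment comparison, the helper below computes the relative improvement of a variation over the original within each segment. The data layout and the names `conversion_rate` and `improvement_by_segment` are hypothetical, and the counts are assumed to come from an export of your results. The sketch only compares raw rates; it does not establish statistical significance, and small segments can show large swings by chance.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: segment -> arm -> (conversions, visitors)
SegmentData = Dict[str, Dict[str, Tuple[int, int]]]


def conversion_rate(conversions: int, visitors: int) -> float:
    """Conversions divided by visitors; 0.0 when there are no visitors."""
    return conversions / visitors if visitors else 0.0


def improvement_by_segment(data: SegmentData) -> Dict[str, Optional[float]]:
    """Relative improvement of 'variation' over 'original' per segment.

    Returns None for a segment whose original conversion rate is zero,
    because relative improvement is undefined there.
    """
    improvements: Dict[str, Optional[float]] = {}
    for segment, arms in data.items():
        base = conversion_rate(*arms["original"])
        variant = conversion_rate(*arms["variation"])
        improvements[segment] = (variant - base) / base if base else None
    return improvements


example = {
    "mobile": {"original": (50, 1000), "variation": (60, 1000)},
    "desktop": {"original": (100, 1000), "variation": (90, 1000)},
}
segment_lifts = improvement_by_segment(example)
```

In this made-up example, the variation improves on the original for mobile visitors but falls behind for desktop visitors, which is the kind of split that might prompt a personalization strategy.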