Interpret your Optimizely Experimentation Results

  • Optimizely Web Experimentation
  • Optimizely Performance Edge
  • Optimizely Feature Experimentation
  • Optimizely Full Stack (Legacy)

Once you publish an experiment or campaign, you can start checking your Results page. Unlike many testing tools, Optimizely Experimentation's Stats Engine uses a statistical approach that allows you to peek into results without introducing error.

The Results page is where you will find value in Optimizely Experimentation. To run a truly data-driven experimentation program, it is important to take time to review and interpret the data that you collect before deciding to take action.

Your experiment results—whether winning, losing, or inconclusive—are an incredibly valuable resource. The data on your Results page helps you learn about your visitors, make data-driven business decisions, and feed the iterative cycle of your experimentation program. Before stopping an experiment, really dig into your data to look for valuable insights beyond which variations won or lost.

This article provides some high-level tactics for investigating the results of your experiment.

A few quick tips:

  • Use losing and inconclusive tests to learn more about what visitors expect and how you can provide it

  • Use winning variations to learn what changes generated desired outcomes—and why

  • Compare results to qualitative research and your hypothesis to bring your experiment full circle

  • Before stopping an experiment, check that you have gathered enough data for your business needs

  • Document and share takeaways—they are valuable resources

Once you are done analyzing results, decide how to take action on winners, losers, and inconclusive results.

Materials to prepare

  • Test results
  • Analytics data from other platforms
  • Hypothesis or experiment plan
  • Qualitative data (surveys, customer reports)

People and resources

  • Program manager
  • Analyst

Actions you will perform 

  • Segment results to look for patterns
  • Check secondary and monitoring goals
  • Consider seasonality or traffic spikes
  • Check the difference interval
  • Use root cause analysis to evaluate why the test affected visitors' behaviors

Deliverables

  • Documented results, insights, and takeaways to share with your organization

What to watch out for

  • An under-developed hypothesis makes it difficult to interpret results
  • Bias towards certain outcomes can stand in the way of understanding the data
  • Do not forget to document takeaways and communicate what you have learned

This article is part of the Optimization Methodology series.

If you are using Optimizely Experimentation to test on a checkout page, you might need to configure your site for PCI compliance.

Segment your results

Think of the overall results of an experiment as an average across all visitors. Not all visitors behave like your average visitor. Segmenting your results (filtering results for specific audiences or attributes) is a powerful way to generate insights about your customers.

Different types of visitors have different goals on your site. You may find that a change that does not move the needle for most visitors is a huge hit with a certain subset. Conversely, an experience that lifts conversions across the board might also be very bad for a particular group.

Below, in an Optimizely Web A/B Test, Variation #1 is a clear winner for the Form success primary metric.

winning-variation.png

But what if you segment for All Phones only, and you see that Variation #1 is a clear loser for the same metric?

Moreover, suppose Variation #1 is also a statistically significant loss for mobile phone visitors, even though the result is not yet statistically significant for visitors overall.

At this point, you should investigate why Variation #1 is a bad experience for Mobile visitors and consider excluding them from the experiment going forward.

Analyze

Dig into Optimizely Web Experimentation's default segments, such as browser type or device type, as well as any custom segments that are important to your business.

Here is what to look for:

  • Do any segments of visitors behave differently from visitors overall?

  • What do you know about those visitors? Why do you think they respond differently?

  • What do your most valuable visitors prefer?

Imagine that you are testing a streamlined login process on your site. You test a Facebook login and see a significant lift across all visitors. But when you segment by browser type, the Facebook login is a statistically significant loss for visitors using Internet Explorer. Why?

Assuming that nothing is broken, start by considering what you already know. Maybe Internet Explorer visitors are likely to be older or to come from a professional services environment compared to Safari visitors (sometimes linked to higher-income or tech-savvy users). Are professional visitors less likely to log in with a personal account? Do older visitors hesitate before connecting through Facebook? Will you roll out the Facebook login as an option instead of a requirement? Will you personalize it for just the high-converting segments?
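You can apply these segments directly on the Results page. If you also export visitor-level results to your analytics stack, a quick offline cut can surface the same pattern. Here is a minimal sketch in Python, assuming a hypothetical export with browser, variation, and converted columns; the file and column names are illustrative, not an Optimizely API:

    import pandas as pd

    # Hypothetical visitor-level export: one row per visitor with the variation
    # they saw, their browser, and whether they completed the login goal (0/1).
    df = pd.read_csv("experiment_results_export.csv")

    # Conversion rate per variation, per browser segment
    segmented = (
        df.groupby(["browser", "variation"])["converted"]
          .agg(visitors="count", conversions="sum")
          .assign(conversion_rate=lambda g: g["conversions"] / g["visitors"])
    )
    print(segmented)

    # A segment that runs opposite to the overall result (for example, Internet
    # Explorer losing while visitors overall win) is a candidate for a follow-up
    # experiment or a targeting exclusion, not an immediate decision.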

Segments and filters should only be used for data exploration, not for making decisions.

Learn

Combine insights from segmenting results with other data, like results from previous experiments, direct data and indirect data.

In the example above, why did Mobile Visitors respond differently from other visitors? Is the text Call To Action (CTA) difficult to click on mobile? Is the pop-up CTA frustrating on a smaller screen?

In your next round of experiments, these insights serve as inputs for your direct data.

Share what you have done with your organization. Data-driven insights may benefit other teams, and you will help increase the impact of your program.

Check secondary and monitoring goals

Optimizely Experimentation allows you to set a primary metric to measure success. Stats Engine calculates that primary metric independently of your other metrics, so it reaches significance as quickly as possible. Secondary and monitoring goals are all the goals in the experiment that are not the primary goal.

As a best practice, we recommend setting secondary metrics to track conversions down the funnel. Monitoring goals help you answer: where am I optimizing this experience, and where, if anywhere, am I worsening it?
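In Optimizely Web Experimentation, you typically define these metrics from events configured in the UI. In Feature Experimentation, the underlying events are fired from your code. Below is a minimal sketch using the Python SDK; the SDK key and event keys are assumptions and must already exist in your Optimizely project:

    from optimizely import optimizely

    # Assumed SDK key; the event keys below must be defined in your project.
    optimizely_client = optimizely.Optimizely(sdk_key="YOUR_SDK_KEY")
    user = optimizely_client.create_user_context("visitor-123", {"device": "mobile"})

    # Primary metric: the behavior your change targets directly (high signal).
    user.track_event("signup_click")

    # Secondary metric: a step further down the funnel.
    user.track_event("category_pageview")

    # Monitoring metric with a revenue tag, to watch for cannibalized revenue.
    user.track_event("purchase", {"revenue": 2999})  # reserved tag, value in cents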

Here are a few questions to help you evaluate secondary goals:

  • Where in your funnel do you see improvement or loss? Does a pattern emerge?

  • Is the exit rate at any step in the funnel higher than in the original?

  • How does a significant lift or loss at a certain step correspond to changes you have made?

Here are a few questions to help you evaluate monitoring goals:

  • How does my test affect this monitoring goal?

  • Are there multiple monitoring goals? What story do these goals tell together?

  • How valuable is my primary goal compared to the metrics tracked by this monitoring goal?

An example

Imagine you are testing a more attention-grabbing CTA on your homepage. Your primary goal is clicks to the submit button. But you wonder how this change affects browsing behavior on the product categories page. If you track click events on the search button and pageviews on the product categories page, you can evaluate how your sign-up experiment affects purchase behavior. Did visitors sign up, then exit the site? Consider how this tradeoff affects key company metrics and the bottom line.

Evaluate all monitoring goals to look for warnings that you are cannibalizing another revenue path.
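One way to make that tradeoff concrete is to put rough dollar values on the primary and monitoring goals. A back-of-the-envelope sketch with made-up numbers:

    # Hypothetical trade-off: the variation lifts sign-ups (primary goal) but
    # depresses purchases (monitoring goal). All values below are assumptions.
    visitors = 100_000
    signup_lift = 0.02          # +2 percentage points on sign-up rate
    purchase_drop = -0.005      # -0.5 percentage points on purchase rate

    value_per_signup = 12       # assumed long-term value of a new account, dollars
    value_per_purchase = 60     # assumed average order value, dollars

    net_impact = visitors * (signup_lift * value_per_signup
                             + purchase_drop * value_per_purchase)
    print(f"Estimated net impact: {net_impact:,.0f} dollars")
    # 100,000 * (0.02 * 12 - 0.005 * 60) = -6,000: a "winning" primary goal
    # can still be a net loss for the business.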

Secondary and monitoring goals provide a broad context for immediate lifts and losses. They help you guide your program towards a global maximum, so you do not end up refining small parts of your site in isolation. Keep your program focused on providing long-term value to your business.

If your test is taking a long time to reach significance, take a look at your primary goal. Is it a high-signal goal?

High-signal and low-signal goals

  • A high-signal goal measures a behavior that is directly affected by the changes in your variation.
  • A low-signal goal is not directly impacted by your test.

For example, if you add a value proposition such as free shipping on your product details page, the Add-to-Cart click might be a high-signal goal. Clicks to navigation links or revenue at the end of the checkout funnel are low-signal goals; they are not the strongest indicators that your new offer works.

Stats Engine calculates your primary goal independently from secondary and monitoring goals, so the primary goal reaches significance faster than if it were pooled with those goals. To ensure that your test reaches significance as quickly as possible, choose a high-signal goal as your primary goal.

If you need to change your primary goal in the middle of your experiment, you can, but we do not recommend making this a regular practice. Stats Engine will recalculate your test with all previous data as if the new goal had always been the primary goal. The old primary goal will be pooled with the secondary goals, so it will take longer to reach significance than it would have otherwise.

Adding too many low-signal monitoring goals can also slow down your experiment. So, take stock of what you need to know for the results of your test and long-term planning, and set your goals accordingly! To learn more about setting different types of goals, check out this article on primary and secondary goals.

Seasonality and traffic spikes

Before you stop the test, check that you have captured all the necessary data.

If external events or traffic spikes are influencing your results, or if the difference interval of your statistically significant experiment is too large, consider letting your experiment run longer for a more comprehensive test.

Optimizely recommends running all tests for a minimum of one business cycle (7 days) to ensure all kinds of user behavior are accounted for.

Sometimes, optimization teams focus experiments on high-traffic periods or seasons when they make the most money. Testing during traffic surges can help speed up optimization.

But there are a couple of things to watch out for. If you are testing promising experiences that are likely to generate lift—for instance, seasonal messages during the winter holidays—it might be more effective to translate those experiments into personalization campaigns. By focusing all testing on high-traffic or high-profit periods, you also risk missing part of the conversion cycle; your data will provide an incomplete picture.

For example, imagine you run tests on weekends because most of your visitors make purchases on Saturday and Sunday. If you limit your experiment window to the weekend, you assume that visitors encounter your variation and convert within the same period. But it can take multiple visits for a customer to convert.

To capture data from the first interaction to the final conversion, run your experiment on weekdays as well as the weekend. Design your experiment to optimize the entire conversion and business cycle.

Generally, we recommend testing across your full conversion cycle (at least 7 days) and through peaks and troughs in traffic.

Broad difference interval

Sometimes, when your goal reaches statistical significance, the difference interval may still be relatively large. The difference interval (also called the confidence or improvement interval) is a range of values that likely contains the true difference between the original and the variation; it tells you what improvement you can expect if you run the test again.

broad-interval.png

For example, if a variation “wins” with a confidence interval of 0.1% to 10%, the lift you can expect if you run that variation is anywhere within that broad range. Decide whether that level of uncertainty is acceptable for your business before you decide to stop the test.

If your primary goal is revenue-generating, a narrower confidence interval can help you project the impact of this change more precisely. In other words, you would be able to predict whether your improvement is worth $1,000 or $1 million. If you are making a business case such as asking for more developer resources to push changes live to your site, it can help to be more specific.
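As a rough illustration of how a difference interval translates into a business case, here is a sketch that computes a simple fixed-horizon interval on the absolute difference in conversion rate (not Stats Engine's sequential calculation) and projects it onto revenue. All counts and dollar values are made up:

    from math import sqrt

    # Hypothetical counts from the Results page
    visitors_a, conversions_a = 50_000, 2_500   # original: 5.0% conversion
    visitors_b, conversions_b = 50_000, 2_800   # variation: 5.6% conversion

    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    diff = p_b - p_a

    # 95% interval on the absolute difference (two-proportion normal approximation)
    se = sqrt(p_a * (1 - p_a) / visitors_a + p_b * (1 - p_b) / visitors_b)
    low, high = diff - 1.96 * se, diff + 1.96 * se
    print(f"Difference: {diff:+.2%}, interval: [{low:+.2%}, {high:+.2%}]")

    # Project the interval onto revenue to see how precise your business case is
    monthly_visitors = 200_000
    revenue_per_conversion = 80  # dollars, assumed
    print(f"Projected monthly revenue impact: "
          f"{low * monthly_visitors * revenue_per_conversion:,.0f} to "
          f"{high * monthly_visitors * revenue_per_conversion:,.0f} dollars")

The wider that projected range, the harder it is to promise a specific return; a narrower interval tightens the estimate.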

Segment your results to see if a certain subset of visitors is moving the needle. Create a new experiment targeted specifically to those visitors to see if you can recreate that lift. If this subset of visitors displays consistent behavior over time, your results will show improvement with a smaller confidence interval.

If the primary goal is engagement or user acquisition instead of revenue, a large confidence interval and more nebulous result may serve your purposes just as well—a more precise prediction may not make a difference. Since you know that the change led to a better experience in terms of overall conversions, you can feel comfortable pushing the changes live to your site.

Once you have analyzed your results and documented what you have learned, you are ready to decide how to take action.