- Optimizely Web Experimentation
- Optimizely Performance Edge
- Optimizely Feature Experimentation
- Optimizely Full Stack (Legacy)
Choosing the right metrics helps you validate or disprove your hypothesis and ensures that you are making progress toward your overall business goals.
In experience optimization, metrics play different roles depending on where you set them and what you want them to tell you. Below, we walk you through each role: primary metrics, secondary metrics, and monitoring goals.
- Primary metric – Determines whether the test "wins" or "loses"—it tracks how your changes affect your visitors’ behaviors.
- Secondary metrics – Provide additional information about your visitors’ behavior in the vicinity of your change and across your site.
- Monitoring goals – All goals and events that are not your primary or secondary metrics. Monitoring goals do not affect how quickly your primary or secondary metrics reach significance.
Here are a few general tips for setting goals and events:
- Focus on a direct visitor action that is on the same page as the changes you made.
- Consider how your changes affect other parts of your site. Set goals and events to measure potential interaction effects so you know whether your test truly moves customers in the right direction.
- Place different types of goals at different points in your funnel to gather timely data about your visitors’ behavior.
Optimizely Experimentation lets you set a primary metric for each experiment to determine its success. It is the most important goal of the experiment and decides whether your hypothesis is proven or disproven.
In Optimizely Experimentation, the primary metric always reaches statistical significance at full speed, unaffected by any other goals or events you add. Stats Engine treats the primary metric separately because it is the most important and tells you whether your hypothesis is supported.
In general, the more goals and variations you include in an experiment, the longer each will take to reach significance. For this reason, it is important to be mindful in distinguishing a primary metric from secondary metrics and monitoring goals. Stats Engine corrects for false discovery rate to help you make better business decisions.
When choosing a primary metric, ask yourself these questions:
- What visitor action indicates that this variation is a success? – Often, the best path is to measure the action that visitors take as a direct result of this test.
- Does this event directly measure the behavior you are trying to influence? – Use an action that is directly affected by your change to decide whether your test helped or hurt.
- Does the event fully capture the behavior you are trying to influence? – If one event reflects only part of the behavior, look for the metric that best captures the whole change.
Imagine you are testing the design and placement of an Add-to-Cart button. Your business cares about revenue, but it is measured five pages down the funnel. If you make revenue your primary metric, you are likely to devote a large amount of traffic to this test and still risk an inconclusive result.
You decide to measure clicks on the Add-to-Cart button on product pages instead. It is a primary metric that is directly affected by the changes you made. And with a goal tree, you know that this metric rolls directly up to company goals.
Suppose you are testing bulk discounts on your site. Your primary metric might be conversion rate or it might be average order value (AOV). Neither metric fully accounts for the behavior you are trying to affect.
The conversion rate could rise as customers are incentivized or decrease as customers wait to create large, discounted orders. AOV could rise as customers buy more in bulk or decrease as discounts take the place of full-price orders.
From this perspective, revenue-per-visitor is the best metric. It equals the conversion rate (how often customers purchase) multiplied by the AOV (how much they spend). It is the best overarching goal in this test, where smaller goals may provide conflicting information.
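The relationship between these three metrics can be sketched with a little arithmetic. The snippet below is only an illustration: the function name and the visitor, order, and revenue figures are hypothetical, and it assumes at least one order so that AOV is defined.

```python
def revenue_per_visitor(visitors: int, orders: int, revenue: float) -> dict:
    """Break revenue per visitor (RPV) into conversion rate and AOV.

    Assumes visitors > 0 and orders > 0 (hypothetical helper, not an Optimizely API).
    """
    conversion_rate = orders / visitors        # how often visitors purchase
    average_order_value = revenue / orders     # how much each order is worth
    return {
        "conversion_rate": conversion_rate,
        "aov": average_order_value,
        # Conversion rate x AOV; algebraically the same as revenue / visitors.
        "rpv": conversion_rate * average_order_value,
    }


# Made-up numbers for a bulk-discount test.
baseline = revenue_per_visitor(visitors=10_000, orders=500, revenue=25_000.0)
variation = revenue_per_visitor(visitors=10_000, orders=450, revenue=27_000.0)

for name, result in (("baseline", baseline), ("variation", variation)):
    print(f"{name}: CR={result['conversion_rate']:.3f} "
          f"AOV={result['aov']:.2f} RPV={result['rpv']:.2f}")
```

In this made-up case the variation converts less often but has a higher AOV, so the two smaller metrics point in opposite directions while RPV shows the net effect.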
Secondary metrics help you gather insights that are key to long-term success. In Optimizely Experimentation, the metrics that you rank from two to five are secondary metrics.
Secondary metrics track long-distance events and more ambitious metrics. End-of-funnel events like order value and order confirmation make excellent secondary metrics because they provide valuable information but are generally slower to reach significance. If you do not make these long-term wins your primary metric, you do not have to wait for them to reach significance before you can evaluate your experiment.
Secondary metrics are also useful for gaining visibility across the different steps of your funnel. For example, if you make a change to your product page and display shipping costs, your secondary metric might measure the change in drop-offs from the shipping page in your funnel. In general, use secondary metrics to learn when visitors drop off or navigate back to the home page and how these patterns compare between the original and variations.
Here is a list of common secondary metrics and the reason for tracking:
- Searches submitted – See how many searches are submitted.
- Category pageview – Discover whether visitors navigate to the site via category pages.
- Subcategory pageview – Learn whether visitors reach subcategory pages.
- Product pageview – Know the percentage of visitors who do or do not view a product during a visit.
- Add-to-cart – Understand what percentage of visitors add items to the cart per test, category, or product type.
- Shopping cart pageview – See how many visitors progress to the shopping cart.
- Checkout pageview – Understand how many visitors continue from the shopping cart to checkout.
- Payment pageview – Learn what percentage of visitors continue from checkout to payment.
- Conversion rate – Know what percentage of visitors ultimately convert or complete payment.
If you are using Optimizely Web Experimentation to test on a checkout page, you might need to configure your site for PCI compliance.
Estimate time to statistical significance for multiple secondary metrics
Want to estimate how much longer it will take for multiple secondary metrics to reach statistical significance? Here is an easy back-of-the-envelope method.
In Optimizely Experimentation's Sample Size Calculator, fill out your baseline conversion rate and minimum detectable effect (MDE) as usual. For the statistical significance threshold, enter 100 - (100 - S)/N, where S is your desired threshold (the default is 90) and N is the number of metrics multiplied by the number of variations, not counting the baseline.
For example, if you are running an experiment with 2 metrics and 2 variations plus a baseline at 90% significance, your secondary metric will require the number of visitors it takes to reach 100 - (100 - 90)/(2*2) = 97.5% significance with 1 goal and 1 variation.
This is an upper bound on the number of visitors you will need on average, which means you will likely see significance sooner.
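The back-of-the-envelope adjustment above is simple enough to script. The following is a minimal sketch; the function and parameter names are made up for illustration and are not part of any Optimizely tool.

```python
def adjusted_significance_threshold(desired: float, metrics: int, variations: int) -> float:
    """Return 100 - (100 - S) / N for the back-of-the-envelope estimate.

    desired    -- S, the significance threshold you want, in percent (for example 90)
    metrics    -- number of metrics in the experiment
    variations -- number of variations, not counting the baseline
    """
    if metrics < 1 or variations < 1:
        raise ValueError("metrics and variations must both be at least 1")
    n = metrics * variations
    return 100 - (100 - desired) / n


# The example from the text: 2 metrics, 2 variations plus a baseline, 90% threshold.
print(adjusted_significance_threshold(90, metrics=2, variations=2))  # 97.5
```

You would then enter the returned value as the significance threshold in the calculator, leaving the baseline conversion rate and MDE unchanged.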
Monitoring goals are all goals and events that are not your primary or secondary metrics. Like secondary metrics, monitoring goals help you gather insights that are key to long-term success, but they are diagnostic. They do not affect how quickly your primary or secondary metrics reach significance.
Monitoring goals track whether your experiment is truly moving visitors in the right direction. Every time you create an experiment, you are trying to optimize the user experience to improve a business outcome. But your change might also create adverse effects in another metric. Monitoring goals help you answer the question, "Where am I optimizing this experience, and where (if anywhere) am I worsening it?" Monitoring goals form a warning system that alerts you when you are cannibalizing another revenue path.
For example, imagine that you show visitors more products on the product category page. With your primary metric, you find that people view more products as a result. Here are some other questions you might ask at the same time, each with the monitoring goal that can help you answer it:
- Are people more price-conservative when initially presented with more products? – Average order value
- Are people actually buying more products? – Conversion rate
- Are people frustrated and unable to find what they are looking for? – Subcategory filters
Here is a list of common monitoring goals and their reason for tracking:
- Search bar opened – Learn what percentage of search bar interactions do not lead to submissions.
- Top menu clickthrough rate (CTR) – Discover how often visitors navigate via the top menu per page or step in the funnel.
- Home page CTR – See how often visitors exit to the Home page from any given page.
- Category page filter usage – Understand the frequency of filter usage.
- Product page quantity selection – Understand the percentage of visitors who interact with quantity selection.
- Product page more info – Understand how many visitors seek more information about a product.
- Product page tabs – Discover how often visitors interact with each tab.
- Payment type chosen – See which payment type users prefer, per experiment.
- Return/back button CTR – Learn how often visitors exit a page via a particular button.
Stats Engine approach to metrics and goals
When you run an experiment with many variations and metrics, there is a greater chance that some of them will produce false positives. A false positive occurs when your test data shows a significant difference between your original and your variation, but the difference is actually random noise and no underlying difference exists. In other words, it is harder to declare winners when there are many variations and metrics.
Stats Engine uses false discovery rate control to address this issue and reduce your chance of making an incorrect business decision or implementing a false positive among conclusive results. As a result, Stats Engine becomes more conservative when you add more metrics to an experiment.
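To get a feel for why more metrics and variations raise the risk of false positives, consider the textbook calculation below. It is a simplified sketch, not how Stats Engine works internally: it assumes every comparison is independent, that no variation has any real effect, and that each comparison is tested separately at the same error rate with no correction. The function name is hypothetical.

```python
def chance_of_any_false_positive(alpha: float, comparisons: int) -> float:
    """Probability that at least one of several uncorrected, independent
    comparisons looks significant purely by chance."""
    return 1 - (1 - alpha) ** comparisons


# alpha = 0.10 corresponds to a 90% significance threshold.
for metrics, variations in [(1, 1), (2, 2), (5, 4)]:
    comparisons = metrics * variations
    risk = chance_of_any_false_positive(0.10, comparisons)
    print(f"{metrics} metric(s) x {variations} variation(s): {risk:.0%}")
```

Under these assumptions, a single comparison carries a 10% chance of a false positive, while twenty uncorrected comparisons carry a chance of roughly 88% that at least one of them is falsely significant.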
Here is how Stats Engine ranks primary and secondary metrics and monitoring goals:
- The primary metric, independent of all of the other metrics
- Secondary metrics as an independent group of up to four metrics (metrics #2-5)
- All monitoring goals together as one group (metrics #6+)
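The grouping by rank can be expressed as a small lookup. This is only an illustrative sketch of the ranking rule described above; the function names and the metric names in the example are hypothetical and do not come from an Optimizely API.

```python
def metric_group(rank: int) -> str:
    """Map a metric's rank in the experiment (1-based) to its group."""
    if rank < 1:
        raise ValueError("rank starts at 1")
    if rank == 1:
        return "primary"
    if rank <= 5:
        return "secondary"      # metrics #2-5
    return "monitoring"         # metrics #6 and beyond


def group_metrics(ranked_names: list) -> dict:
    """Split an ordered list of metric names into the three groups."""
    groups = {"primary": [], "secondary": [], "monitoring": []}
    for rank, name in enumerate(ranked_names, start=1):
        groups[metric_group(rank)].append(name)
    return groups


# Hypothetical experiment with seven ranked metrics.
ranked = [
    "Add-to-cart clicks", "Checkout pageview", "Payment pageview",
    "Conversion rate", "Revenue per visitor", "Search bar opened", "Home page CTR",
]
print(group_metrics(ranked))
```

Reordering the list changes which group a metric falls into, which is why the order in which you rank metrics matters.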
Revenue goals and skew correction with Stats Engine
In general, Stats Engine works the same way for revenue-per-visitor goals as it does for other goals. You can look at your results any time and get an accurate assessment of your error rates on winners and losers, as well as confidence intervals on the average revenue per visitor (RPV).
However, when interpreting your results for RPV goals, there are some differences of which you should be aware.
Testing for a difference in average revenue between a variation and baseline is more challenging than testing for a difference in conversion rates. This is because revenue distributions tend to be heavily right-tailed, or skewed. This skewness undermines the distributional results that many techniques rely on, including t-tests and Stats Engine. The practical implication is that these techniques have less power; that is, they are less able to detect differences in average revenue when those differences actually exist.
Optimizely Experimentation's Stats Engine regains some of this lost power through skew correction. Skew corrections were specifically designed to work well with all other aspects of Stats Engine.
Thanks to skew correction, confidence intervals for continuously-valued goals are no longer symmetric about their currently observed effect size. The underlying skewness of the distributions is now correctly factored into the shape of the confidence interval. Additionally, detecting differences in average revenue is more feasible at the visitor counts that Optimizely Experimentation customers regularly see in A/B tests.
Strategies for metrics
Use the strategies described in this section to help you decide what metrics to use for your experiments.
Consider speed and impact
Think of your primary metric in terms of distance. In a funnel, the most immediate effects are directly downstream from the changes you made. The closer an event is to the change, the louder the signal and the bigger the measurable impact. As you move downstream, the signal starts to fade as visitors from different paths and motivations enter the stream. At the end of the funnel, the effect may be too faint to measure.
Remember, all other things being equal, metrics that have a lower conversion rate require more visitors to reach statistical significance. Events that are further from the page you are testing also tend to show a smaller improvement from your variation, because visitors enter from different paths, leave the site before they convert, and so on. In either case, your test will take longer to reach significance.
Instead, consider setting a primary metric on the same page as your change. The impact of your change will be picked up immediately, so you will quickly find a winning variation. Quick wins help generate credibility and interest in your testing program and provide fast, reliable insights about how your visitors behave. By focusing on small, grounded wins, you build a testing program that is data-rich and can quickly iterate on the insights it generates.
Ambitious, program-level events like revenue and conversion rate make excellent secondary metrics and help keep your program focused on long-term success.
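To illustrate why low-conversion metrics need more visitors, here is a rough sketch based on the classical fixed-horizon approximation for comparing two proportions. It is not Stats Engine's sequential method and not Optimizely's Sample Size Calculator, so treat the output as an order-of-magnitude comparison only. The function name, the 0.10 error rate, and the 80% power value are assumptions chosen for the example.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variation(baseline_rate: float, relative_lift: float,
                           alpha: float = 0.10, power: float = 0.80) -> int:
    """Approximate visitors needed per variation for a two-sided,
    fixed-horizon test of two conversion rates (classical formula)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


# Same 10% relative lift, measured near the change versus at the end of the funnel.
near_change = visitors_per_variation(baseline_rate=0.20, relative_lift=0.10)
end_of_funnel = visitors_per_variation(baseline_rate=0.02, relative_lift=0.10)
print(near_change, end_of_funnel)
```

With the same relative lift, the metric with the 2% baseline needs many times more visitors than the metric with the 20% baseline, which is the reasoning behind placing the primary metric close to the change.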
Choose high-signal goals
As we mentioned in the primary metric section, Optimizely Experimentation's Stats Engine reacts to the number of goals or events and variations in your experiment to align statistical significance with your risk in making business decisions from experiments. We also mentioned that significance takes longer to achieve when there are more goals or events and variations in an experiment and that your primary metric is exempt from this slowdown. Here, we will add to the story.
Adding more goals or events and variations to your experiment increases your chance of implementing a falsely significant result with traditional statistics, and this is what Stats Engine corrects with false discovery rate control. However, not all goals and events are equal. High-signal goals and events (those you believe will be most affected by your variations) are less likely to contribute to false discoveries. This is because high-signal goals and events are usually less noisy, so it is easier to tell whether your variation is having an effect on them.
As an analogy, think of experimentation with multiple goals or events and variations as picking needles of true differences from a haystack of noise. It is easier to find a large (high-signal) needle in the haystack than a small (low-signal) needle.
Similarly, with Stats Engine, the more high-signal goals in your experiment, the faster all your secondary metrics and monitoring goals will reach significance. If speed to significance is a concern for your organization, consider limiting the number of non-primary metrics in your experiment and focusing on goals or events that you believe are related to your variations.
Of course, you are free to add many metrics to your experiments. The strength of Stats Engine is that you will not be exposed to higher error rates, but the cost of broad, undirected exploration is longer time to significance.