- Optimizely Web Experimentation
- Optimizely Performance Edge
- Optimizely Feature Experimentation
- Optimizely Full Stack (Legacy)
When you run an experiment in Optimizely Experimentation in which variation A is identical to variation B, you are running what is known as an A/A experiment rather than an A/B experiment: there is no true "B" variation, because the original page and the variation are exactly the same. A/A experiments are also referred to as calibration tests.
A/A test and monitoring campaigns
An A/A test is different from a monitoring campaign. A monitoring campaign is an experiment without any variations.
- Monitoring campaign – The goal is to deliver content to visitors or determine the baseline conversion rate for a certain goal before you test.
- A/A test – The typical purpose is to validate your experiment configuration. Specifically, an A/A calibration test is a data-reliability and quality-assurance procedure that evaluates your experiment implementation from end to end.
The majority of your A/A calibration test results should show that the improvement between the identical baseline variations is statistically inconclusive. Expect this inconclusive result because you made no changes to the original variation. However, when running a calibration test, it is important to understand that a difference in conversion rate between identical variations is still possible. The statistical significance of your result is a probability, not a certainty.
As with any experimental process, some percentage of outcomes will be anomalies, because an experiment calculates results from a random sample of the population of all visitors to your experiment. To declare any variation significant, an experiment must make a judgment call about how large a trend counts as evidence of a true difference. A large enough spurious trend can therefore make it look like a true difference exists when none actually does. The significance level controls this trade-off between identifying real trends and accepting more errors.
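The trade-off above can be seen in a quick simulation. The sketch below runs many simulated A/A tests with identical conversion rates in both arms and counts how often a classical fixed-horizon two-proportion z-test declares a "winner" by chance alone. Note this is only an illustration: Optimizely's Stats Engine uses sequential statistics, so its numbers differ, but the significance-level trade-off is the same. All names and parameters here (conversion rate, sample sizes) are hypothetical.

```python
# Sketch: why some A/A tests look "significant" by chance alone.
# Uses a classical fixed-horizon two-proportion z-test for illustration;
# Optimizely's Stats Engine computes significance sequentially instead.
import math
import random

def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    # Normal CDF via erf; p-value is the two-tailed area beyond |z|.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

rng = random.Random(42)
true_rate = 0.10          # identical conversion rate in both arms
visitors_per_arm = 1000
alpha = 0.10              # 90% significance threshold
runs = 200

false_positives = 0
for _ in range(runs):
    conv_a = sum(rng.random() < true_rate for _ in range(visitors_per_arm))
    conv_b = sum(rng.random() < true_rate for _ in range(visitors_per_arm))
    if z_test_p_value(conv_a, visitors_per_arm, conv_b, visitors_per_arm) < alpha:
        false_positives += 1

rate = false_positives / runs
print(f"Spurious 'winners' in {runs} A/A tests: {rate:.0%}")  # typically close to alpha
```

Lowering `alpha` (raising the significance threshold) reduces these spurious winners, at the cost of requiring stronger evidence, and therefore more visitors, to call a real winner.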
Optimizely Experimentation's statistical approach
Statistical significance is a project-level setting that you can adjust up or down based on your comfort level with statistical error. This also holds true for an A/A test: even when there is no real difference, there is a small chance that Optimizely Experimentation reports a significant result based on underlying trends in the experiment data.
When you examine the results of your A/A test, you should see the following behavior:
- Your statistical significance stabilizes around a certain value over time.
- The confidence intervals for your experiment shrink around zero as more data is collected, ruling out progressively larger non-zero effects.
- The baseline and variation might perform differently at different points in the test results, but neither should remain a statistically significant winner as data accumulates.
Statistical significance measures how unusual your experiment results would be if there were no difference between your variation and baseline and the observed lift were due to random chance alone. When you observe a lift at 90% or higher statistical significance, results at least that extreme would occur no more than 10% of the time by chance if there were no real lift. Assuming no difference in performance between the variation and the baseline, the higher your statistical significance, the more unusual your observed results are.
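As a worked example of that interpretation, the sketch below computes a significance figure (1 minus the p-value) for one hypothetical result using a fixed-horizon two-proportion z-test. The conversion counts are invented for illustration, and this classical calculation is not Stats Engine: Optimizely's sequential approach is what makes its significance figure valid whenever you look at your results.

```python
# Sketch: "statistical significance" as 1 - p-value for one hypothetical
# result, via a classical fixed-horizon two-proportion z-test.
import math

def significance(conv_a, n_a, conv_b, n_b):
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return 1 - p_value

# Hypothetical counts: baseline converts 100/1000, variation 130/1000
# (a 30% relative lift).
sig = significance(100, 1000, 130, 1000)
print(f"Significance: {sig:.0%}")
```

A result like this would clear a 90% significance threshold; the identical-variation results of an A/A test should only rarely do so.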
Interpret a regular A/B experiment
Pay attention to the statistical significance in your experiments, and be skeptical of implementing variations that do not reach your chosen significance level.
With Stats Engine, Optimizely Experimentation accurately represents the likelihood of error regardless of when you look at your results. Ultimately, you may need to find the happy medium between declaring winning and losing variations with high confidence (which requires more visitors per experiment) and the opportunity cost of not being able to run more experiments.
See Confidence intervals and improvement intervals for information.
Importance of unique variation keys
Each variation key must be unique to ensure that Optimizely can accurately track and differentiate between the variations being tested. Unique keys prevent data conflicts and ensure the results are correctly attributed to the respective variations.
Run an A/A test in Feature Experimentation
Create a flag
- Go to Flags > Create New Flag.
- Enter a unique name in the Name field. Valid keys contain only alphanumeric characters, hyphens, and underscores; are limited to 64 characters; and cannot contain spaces.
- (Optional) Enter a Description.
- Click Create Flag to save your flag. To learn more, see Create feature flags.
Create flag variables
- Click Variables and click + Add Variable.
- Select the variable type.
- Edit the Variable Key and optionally update the Description and Default Value. You cannot modify the Variable Key after you save the variable.
- Repeat steps one through three to create as many variables as you need for that flag. For example, you can set the buttontext to Learn More and the buttoncolor to #FF0000.
- Click Save.
Create flag variations
- Click Variations > Add variations.
- Configure the variation with the following:
- Enter a Name.
- Enter a Key. You cannot change the key after you save the variation.
- (Optional) Enter a Description.
- Configure the Variables for the variation.
- Create two variations and ensure each has a unique key. For example, name them Variation A1 and Variation A2. Assign unique keys like key_A1 and key_A2 to each. Make sure that both variations have the same variable values.
- Click Save.
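The steps above boil down to two rules: variation keys must be unique, and for a valid A/A test every variation must deliver identical variable values. A pre-launch sanity check over your planned configuration can be sketched as follows; the dict layout below is an assumption for illustration, not the Optimizely datafile format.

```python
# Sketch: a pre-launch sanity check for an A/A flag configuration.
# The dict structure is hypothetical, chosen only to illustrate the
# two rules: unique variation keys, identical variable values.
variations = [
    {"key": "key_A1", "name": "Variation A1",
     "variables": {"buttontext": "Learn More", "buttoncolor": "#FF0000"}},
    {"key": "key_A2", "name": "Variation A2",
     "variables": {"buttontext": "Learn More", "buttoncolor": "#FF0000"}},
]

# Rule 1: every variation key must be unique.
keys = [v["key"] for v in variations]
assert len(keys) == len(set(keys)), "variation keys must be unique"

# Rule 2: for an A/A test, all variations deliver identical variables.
first = variations[0]["variables"]
assert all(v["variables"] == first for v in variations), \
    "A/A variations must have identical variable values"
print("A/A configuration check passed")
```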
Create events
Events track the success of your experiment by recording specific user actions, such as clicks, pageviews, form submissions, purchases, and scroll depth.
Create an event to track your changes. To learn more, see Create events.
Create an A/B test rule
- Click Flags and select the flag that you created for the A/A test.
- Select the environment you want to target, such as Development.
- Click Add Rule > A/B Test.
- Enter a Name.
- The Key is automatically created based on the Name. You can optionally update it.
- (Optional) Click Add description to add a description. You should add your hypothesis for your A/A test rule as the description.
- (Optional) Search for and add audiences. To create an audience, see Target audiences. Audiences evaluate in the order in which you drag and drop them. You can match each user on any or all of the audience conditions.
- Set the Traffic Allocation to 100%.
- Choose the primary and secondary metrics you want to track during the test. These metrics should be the same for both variations to ensure consistency in data collection.
- Set the Distribution Mode to Manual.
- Choose the flag variations you created, specifically Variation A1 and Variation A2, and set their distribution to 50%.
- Click Save.
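The 50%/50% distribution configured above is applied deterministically: a given user ID always lands in the same variation. Optimizely's actual bucketing hashes the user ID together with the experiment using MurmurHash; the sketch below uses MD5 as a stand-in purely to illustrate how hashing yields a stable, even split.

```python
# Sketch: deterministic 50/50 traffic splitting by hashing user IDs.
# Optimizely's real bucketing uses MurmurHash over the user ID and
# experiment; MD5 here is an illustrative stand-in, and the keys
# (key_A1/key_A2, aa_test_rule) come from the example configuration.
import hashlib

def bucket(user_id: str, rule_key: str) -> str:
    digest = hashlib.md5(f"{rule_key}:{user_id}".encode()).hexdigest()
    point = int(digest, 16) % 10000      # bucket value in [0, 10000)
    return "key_A1" if point < 5000 else "key_A2"   # 50% / 50% split

counts = {"key_A1": 0, "key_A2": 0}
for i in range(10000):
    counts[bucket(f"user-{i}", "aa_test_rule")] += 1

print(counts)  # roughly an even split
# The same user always lands in the same variation:
assert bucket("user-7", "aa_test_rule") == bucket("user-7", "aa_test_rule")
```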
Start your rule and flag (ruleset)
After creating your A/A test, you should implement the A/A test using Feature Experimentation decide methods. Then, you can start your test.
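The decide call is made from your application code. The sketch below shows the decide/track call pattern using a minimal stand-in client so it runs without a datafile or network; a real application would instead create a client from the `optimizely-sdk` package and call its `create_user_context()` and `decide()` methods. The flag key, variable names, and event key are the hypothetical ones used in the steps above.

```python
# Sketch of the application-side decide/track pattern for the A/A test.
# StubUserContext is a stand-in so this snippet is self-contained; a
# real app would use an Optimizely client from the optimizely-sdk
# package (create_user_context() / decide() in the Python SDK).
from dataclasses import dataclass

@dataclass
class Decision:
    variation_key: str
    enabled: bool
    variables: dict

class StubUserContext:
    def __init__(self, user_id: str):
        self.user_id = user_id
        self.tracked = []

    def decide(self, flag_key: str) -> Decision:
        # A real SDK buckets the user; the stub always returns key_A1.
        return Decision("key_A1", True,
                        {"buttontext": "Learn More", "buttoncolor": "#FF0000"})

    def track_event(self, event_key: str):
        self.tracked.append(event_key)

user = StubUserContext("user-123")
decision = user.decide("aa_test_flag")   # flag key from the steps above
if decision.enabled:
    # In an A/A test, both variations intentionally render identically.
    button_text = decision.variables["buttontext"]
user.track_event("button_click")         # hypothetical event key
print(decision.variation_key, button_text)
```

Because both variations carry identical variable values, users in either bucket see the same experience; only the assignment and event data differ on the Results page.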
- Click Run on your A/B test rule.
- Click OK on the Ready to Run Status page. This alerts you that your rule stays set to Ready to run until you start your ruleset (flag).
- Click Run to start your flag and rule.