Optimizely Warehouse-Native Experimentation Analytics enhances your experimentation results by evaluating experiments against data that lives in your data warehouse. Your data warehouse remains the single source of truth, keeping your data secure and centralized. With its warehouse-native architecture, Analytics works with your existing data structures, letting you efficiently model your data to support your experimentation analytics needs.
This article covers the steps you must complete before diving into Warehouse-Native Experimentation Analytics: connect your data warehouse, create the necessary datasets, and have experiments running in Optimizely Feature Experimentation or Web Experimentation.
Prerequisites
- Optimizely account ID in the Warehouse-Native Analytics app settings. To add your account ID, send an email to support@optimizely.com.
- An account in Optimizely Feature Experimentation or Optimizely Web Experimentation.
- An experiment in Optimizely Feature Experimentation or Optimizely Web Experimentation if you want to view and analyze experiments.
Data warehouse configurations
Optimizely Analytics supports Snowflake, BigQuery, Databricks, and Amazon Redshift.
Create a service account with a read/write schema
Optimizely Analytics uses the service account to query the warehouse. The schema caches common computations, such as the drop-down list of unique values in a column, which substantially reduces warehouse cost and improves performance.
- Create a service account for Optimizely Analytics in your data warehouse of choice (Snowflake, BigQuery, Databricks, or Amazon Redshift) to query the warehouse.
- Add a schema with read/write access granted.
- Note the database and schema name for enabling materialization in Optimizely Analytics later.
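As a rough illustration of the steps above, the following sketch renders the kind of Snowflake-style statements a database administrator might run to create the service account, role, and read/write cache schema. All names used here (OPTIMIZELY_SVC, ANALYTICS_DB, OPTIMIZELY_CACHE, OPTIMIZELY_ROLE) are placeholder assumptions, not values Optimizely requires, and the exact grants you need depend on your warehouse and security policy.

```python
def service_account_ddl(user: str, database: str, schema: str, role: str) -> list[str]:
    """Render illustrative Snowflake-style statements that create a service
    user, a role, and a dedicated read/write schema for cached computations."""
    return [
        f"CREATE ROLE IF NOT EXISTS {role};",
        f"CREATE USER IF NOT EXISTS {user} DEFAULT_ROLE = {role};",
        f"GRANT ROLE {role} TO USER {user};",
        # Dedicated schema Optimizely Analytics can write cached results into.
        f"CREATE SCHEMA IF NOT EXISTS {database}.{schema};",
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        f"GRANT ALL ON SCHEMA {database}.{schema} TO ROLE {role};",
        # Read access to the source data the service account will query.
        f"GRANT SELECT ON ALL TABLES IN DATABASE {database} TO ROLE {role};",
    ]

for stmt in service_account_ddl("OPTIMIZELY_SVC", "ANALYTICS_DB",
                                "OPTIMIZELY_CACHE", "OPTIMIZELY_ROLE"):
    print(stmt)
```

Note the database (`ANALYTICS_DB`) and schema (`OPTIMIZELY_CACHE`) names in this sketch; those are the two values you will enter when enabling materialization later.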
Configure your data warehouse for Optimizely Analytics
Setting up your data warehouse for analytics is a crucial step in ensuring seamless data integration and analysis. This process involves configuring your warehouse to efficiently handle data operations and connect with Optimizely Analytics. Learn how to configure the following warehouses:
- Configure your Snowflake warehouse
- Configure your BigQuery warehouse
- Configure your Amazon Redshift warehouse
- Configure your Databricks warehouse
Send Optimizely data to your warehouse
You can send decision and conversion events to your data warehouse using real-time or batch processing. Integrating experimentation decision events into your data warehouse centralizes your data and enables more robust analytics and data-driven decision-making.
- Send data to your Snowflake warehouse
- Send data to your BigQuery warehouse
- Send data to your Databricks warehouse
- Send data to your Amazon Redshift warehouse
Connect Optimizely Analytics to your warehouse
Connect your data warehouse
Create a connection to your data warehouse. The following warehouse options are available:
- Connect your Snowflake warehouse
- Connect your BigQuery warehouse
- Connect your Amazon Redshift warehouse
- Connect your Databricks warehouse
Enable Materialization
Analytics creates materialized tables in the data warehouse, which contain intermediate results for improving performance.
- Go to Settings > General Settings > Materialization and enable the feature.
- Configure the following fields:
- Database – The database name in the data warehouse where the materialized tables are created.
- Schema – The schema name in the database where the materialized tables are created.
- Refresh Cron Schedule – How often the materialized tables refresh, expressed in cron syntax. The recommended periodicity is daily, so the schedule is 0 0 * * *.
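To see why 0 0 * * * means "daily at midnight", here is a minimal sketch of how the five cron fields (minute, hour, day-of-month, month, day-of-week) are checked against a timestamp. It supports only * and plain integers, which is enough to illustrate the recommended schedule; real cron also supports ranges, lists, and steps.

```python
from datetime import datetime

def cron_matches(expr: str, ts: datetime) -> bool:
    """Check a timestamp against a 5-field cron expression.
    Supports only '*' and plain integers, enough to illustrate '0 0 * * *'."""
    minute, hour, dom, month, dow = expr.split()
    fields = [
        (minute, ts.minute),
        (hour, ts.hour),
        (dom, ts.day),
        (month, ts.month),
        (dow, ts.isoweekday() % 7),  # cron convention: 0 = Sunday
    ]
    return all(f == "*" or int(f) == actual for f, actual in fields)

# '0 0 * * *' fires at midnight on any day, and at no other time:
assert cron_matches("0 0 * * *", datetime(2024, 11, 10, 0, 0))
assert not cron_matches("0 0 * * *", datetime(2024, 11, 10, 9, 30))
```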
Create datasets and establish relationships between them
Analytics uses three datasets to generate insights into experimentation: a decision dataset, an actor dataset, and an event dataset. You must create these three datasets and link them together.
Create a decision dataset
A decision dataset populates the experiments list in Optimizely Analytics and serves as the foundation for analyzing experimentation decisions and outcomes.
- Go to Data > Datasets > + New Dataset > Source Dataset.
- Select a data table on the Pick a source page and click Save.
- Select Decision stream from the drop-down list in the Semantics tab.
- Configure the following fields for the decision table – Actor dataset, Experiment ID, Rule key, Experiment name, Variation, Timestamp, Is holdback (optional), and Custom partition time column (optional).
- Actor dataset – The actor corresponding to the identifier used for experiment variation decisions, which is typically a user. The decision dataset and the event dataset must have a many-to-one relationship with this actor dataset.
- Experiment ID – The experiment ID used by Optimizely. The ID is preferred, but if it is not available, leave it blank and populate the experiment name or the rule key.
- Rule key – The rule key assigned by Optimizely to identify experiments. This is required only if the Experiment ID is not available.
- Experiment name – The experiment name used by Optimizely. This is required only if the Experiment ID and the rule key are not available.
- Variation – The variation ID used by Optimizely. The variation ID is preferred, but if it is unavailable, you can select the variation name or variation key as an alternative.
- Timestamp – The time at which the decision was made.
- Is holdback (optional) – The boolean column that indicates decisions that should be excluded from the experiment.
- Custom partition time column (optional) – The option you use when the decision table is partitioned by a timestamp or date column other than the decision timestamp specified. When specified, each query on the decision dataset includes a filter on the custom partition time column in addition to the decision timestamp column. The before and after skew settings allow the filter on the custom partition time column to be wider than the time range selected by the user. For example, with the default settings of 1 day before and after, a user-specified time range of November 10 to 15 would be expanded to November 9 to 16.
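The skew arithmetic above can be sketched in a few lines. This is an illustration of the window widening described in the article, not Optimizely's implementation; the function name and defaults are assumptions.

```python
from datetime import date, timedelta

def partition_filter_window(start: date, end: date,
                            skew_before: int = 1,
                            skew_after: int = 1) -> tuple[date, date]:
    """Widen the user-selected time range by the before/after skew (in days)
    so the partition-column filter does not exclude rows whose partition
    value differs slightly from the decision timestamp."""
    return start - timedelta(days=skew_before), end + timedelta(days=skew_after)

# The article's example: November 10-15 with the default 1-day skews
# becomes November 9-16.
print(partition_filter_window(date(2024, 11, 10), date(2024, 11, 15)))
# -> (datetime.date(2024, 11, 9), datetime.date(2024, 11, 16))
```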
Learn how to create datasets in Optimizely Analytics.
Experiment on anything (optional)
You can also analyze any experiments run outside of Optimizely by connecting the data from your own data warehouse. Whether your experiments are conducted through in-house systems or other feature flagging tools, you can evaluate the results with Optimizely’s Stats Engine by uploading or connecting third-party datasets to Optimizely Analytics.
Optimizely Analytics distinguishes between Optimizely-sourced decision datasets and external ones: any decision dataset not labeled as Optimizely-sourced is treated as originating from your data warehouse.
Once your third-party decision dataset is connected and configured, you can use it in a similar fashion to Optimizely-sourced datasets to create an Experiment Scorecard, view experiments, and explore results.
To label Optimizely-sourced decision datasets:
- Go to Settings > General Settings.
- Enter your Optimizely Account ID and select the Optimizely Decisions dataset. This labels your Optimizely-sourced decision datasets for the system. Except for the dataset you select here, all other source datasets that have Decision Stream selected in the Semantics setting are assumed to originate from your data warehouse.
Create an event dataset
You must create an event dataset to calculate experiment success metrics in Optimizely Analytics.
Events represent actions taken by users, such as clicks, page views, or purchases. By tracking these events, you can gain insight into what users are doing within your product. By examining sequences of events, you can analyze user journeys and identify patterns, bottlenecks, or drop-off points.
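The drop-off analysis described above can be sketched as an ordered funnel count: for each user, walk their event sequence and record the furthest step reached. The table shape and event names here are illustrative assumptions, not a schema Optimizely requires.

```python
from collections import Counter

def funnel_dropoff(journeys: dict[str, list[str]],
                   steps: list[str]) -> list[tuple[str, int]]:
    """Count how many users reach each step of an ordered funnel.
    `journeys` maps a user id to the ordered list of events they fired."""
    reached = Counter()
    for events in journeys.values():
        pos = 0
        for step in steps:
            if step in events[pos:]:
                pos = events.index(step, pos) + 1  # later steps must follow
                reached[step] += 1
            else:
                break  # user dropped off before this step
    return [(step, reached[step]) for step in steps]

journeys = {
    "u1": ["page_view", "add_to_cart", "purchase"],
    "u2": ["page_view", "add_to_cart"],
    "u3": ["page_view"],
}
print(funnel_dropoff(journeys, ["page_view", "add_to_cart", "purchase"]))
# -> [('page_view', 3), ('add_to_cart', 2), ('purchase', 1)]
```

Here the largest drop-off is between add_to_cart and purchase, the kind of bottleneck this analysis surfaces.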
If the data you want to evaluate your experiments against lives in one table, create a source dataset. If you want to create a logical combination of multiple warehouse tables exposed as a single dataset, often used to unify event tables by type, create a union dataset.
Create an actor dataset
Actors represent the users themselves, and datasets about them can include attributes like demographics, preferences, or account details. This helps in segmenting users for targeted analysis.
If you have a users table, create an actor dataset from it to track unique visitors to experiments, and link it to your decision dataset using the user's unique identifier.
If you do not have a users table, you can create a column actor dataset from an identifier column, such as the visitor_id column in an event table or in the existing decision dataset.
Create relationships between datasets
When your datasets are ready, link them by creating the following relationships:
- Create a one-to-many relationship between the actor dataset and the decisions dataset using the user's unique identifier.
- Create a one-to-many relationship between the actor dataset and the event dataset for your metric using the user's unique identifier.
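Conceptually, these two relationships let each event be attributed to a variation through the shared user identifier: one actor row joins to many decision and event rows. The following pure-Python sketch illustrates the join; the column names and table shapes are illustrative assumptions, not a required schema.

```python
# One actor, many decisions, many events -- all keyed by user_id.
actors = [{"user_id": "u1", "plan": "pro"}, {"user_id": "u2", "plan": "free"}]
decisions = [
    {"user_id": "u1", "experiment_id": "exp_1", "variation": "treatment"},
    {"user_id": "u2", "experiment_id": "exp_1", "variation": "control"},
]
events = [
    {"user_id": "u1", "event": "purchase"},
    {"user_id": "u1", "event": "page_view"},
    {"user_id": "u2", "event": "page_view"},
]

# Index the "one" side of each relationship by the unique identifier.
by_actor = {a["user_id"]: a for a in actors}
variation_for = {d["user_id"]: d["variation"] for d in decisions}

# Attribute each event to an actor and a variation (the many-to-one joins).
attributed = [
    {**e,
     "plan": by_actor[e["user_id"]]["plan"],
     "variation": variation_for[e["user_id"]]}
    for e in events
]
print(attributed[0])
# -> {'user_id': 'u1', 'event': 'purchase', 'plan': 'pro', 'variation': 'treatment'}
```

In the warehouse, the same attribution happens as SQL joins on the identifier column; the point is that both the decision dataset and the event dataset resolve to the same actor.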
Gain insights and analyze experiments
After configuring datasets and metrics, you can gain insights and analyze experiments.
- View experiments and explore results on the experiment results page.
- Create an Experiment Scorecard to track key experiment metrics and insights.
- Manage user permissions to ensure secure data access for your scorecards.