Connect your Redshift warehouse to Optimizely Analytics to unify customer and product data, create advanced segments, and measure key outcomes like revenue and retention. This direct connection keeps your analytics accurate, eliminates the need for manual exports, and ensures experimentation insights align with your single source of truth.
Perform the following steps to configure your Redshift warehouse for Optimizely Analytics:
Retrieve your JDBC URL
The first parameter required to set up your Amazon Redshift connection is the JDBC connection string for your cluster. You can find the cluster connection string in the Amazon Redshift console, on a cluster's details page.
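A Redshift JDBC connection string generally follows this shape (the cluster endpoint, port, and database name below are placeholders; use the values shown for your own cluster):

```
jdbc:redshift://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev
```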
Create a Redshift user
- Configure a Redshift user with a unique username and password. By default, only the admin user you create when you launch the data warehouse has access to the default database. To grant other users access, create one or more accounts. Read about authenticating with username and password.
- Provide the Redshift user with permissions for the specific table, views, or schemas you want to analyze with Analytics.
- Enable case-sensitive lookup for databases, tables, and column names using the following SQL:

  ```sql
  ALTER USER <username> SET enable_case_sensitive_identifier = true;
  ```

- Update the maximum query execution time to five minutes:
  - Go to the workgroup configuration.
  - Select the Limits tab, and update the limit for Maximum query execution time from 14400s to 300s.
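As a sketch, the user setup described above might look like the following SQL. The user name, password, and schema name are placeholders; substitute your own, and grant access only to the objects you want Analytics to read:

```sql
-- Create a dedicated user for Optimizely Analytics (placeholder credentials).
CREATE USER optimizely_user PASSWORD 'ChangeMe123';

-- Grant read access to the schema and tables you want to analyze.
GRANT USAGE ON SCHEMA analytics TO optimizely_user;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO optimizely_user;

-- Enable case-sensitive identifier lookup for this user.
ALTER USER optimizely_user SET enable_case_sensitive_identifier = true;
```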
IP allowlisting (conditional)
Ensure the warehouse accepts incoming Optimizely Analytics requests over the public internet. This action is conditional and only required if the warehouse cluster is guarded by a security group (such as a firewall) that prevents access to the cluster from Optimizely Analytics.
Following is the IP allowlist:
- 35.196.71.222
- 34.73.142.185
- 34.148.77.115
- 34.73.63.141
- 34.74.199.69
- 34.74.109.219
- 34.139.128.201
- 35.243.168.58
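If the cluster sits behind a security group that you manage with the AWS CLI, a sketch like the following can generate ingress rules for the addresses above. The security group ID is a placeholder, and the loop only prints the commands so you can review them before running anything:

```shell
SG_ID="sg-0123456789abcdef0"   # placeholder: the security group guarding your cluster
PORT=5439                      # default Redshift port; adjust if your cluster differs

allowlist="35.196.71.222 34.73.142.185 34.148.77.115 34.73.63.141 \
34.74.199.69 34.74.109.219 34.139.128.201 35.243.168.58"

for ip in $allowlist; do
  # Preview the command; remove 'echo' to actually apply the rule.
  echo aws ec2 authorize-security-group-ingress \
    --group-id "$SG_ID" --protocol tcp --port "$PORT" --cidr "$ip/32"
done
```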
Configure CUPED (Experimentation only)
To enable CUPED (Controlled-experiment using Pre-Experiment Data) in your Amazon Redshift environment, a one-time manual setup is required. This guide will walk you through creating and configuring the necessary AWS Lambda function and IAM roles.
Create the Amazon Web Services (AWS) Lambda function
Create the basic Lambda function from the AWS console.
- Go to the AWS Lambda Console.
- Click Functions > Create function.
- Ensure the Author from scratch option is selected.
- Configure the basic function settings:
  - Function name – Enter cuped_thetas.
  - Runtime – Select Python 3.12.
  - Architecture – Keep the default value, x86_64.
  - Execution role – Select Create a new role with basic Lambda permissions.
- Click Create function to finalize.
Configure the function code and layers
Add the Python code and the required dependencies (layers) to your new function.
- Select the Code tab in your newly created function's console.
- Replace the default code in the editor with the following snippet:

  ```python
  import numpy as np, pandas as pd, math, json

  def create_matrix(triangleVals, n):
      out = []
      outIdx = 0
      for i in range(n):
          out.append([])
          for j in range(n):
              if j >= i:
                  out[i].append(triangleVals[outIdx])
                  outIdx += 1
              else:
                  out[i].append(out[j][i])
      return out

  def lambda_handler(event, context):
      input = event['arguments'][0][0]
      df = pd.DataFrame(np.array([0 if i is None else i for i in json.loads(input)]))
      m = len(df)
      n = int((-3 + math.sqrt(9 + 8*m)) / 2)
      XtX = create_matrix(df.iloc[0:m-n].values.flatten(), n)
      Xty = df.iloc[m-n:].values.flatten()
      beta_hat = np.linalg.inv(XtX).dot(Xty)
      results = dict()
      results['results'] = [json.dumps(beta_hat.tolist())]
      return json.dumps(results)
  ```

- Scroll down to the Layers section and click Add a Layer.
- Select AWS Layers under Choose a layer.
- Choose AWSSDKPandas-Python312 from the list of AWS layers.
- Click Add to attach the layer to your function.
Test the Lambda function
Before proceeding, verify that the function is working correctly.
- Select Test in the Code tab.
- Click Create new event.
- Enter an Event name (e.g., TestEvent).
- Paste the following test event data in the JSON editor:

  ```json
  {
    "arguments": [
      [
        "[1,2,3]"
      ]
    ]
  }
  ```

- Click Save, and then click Test again.
- Verify that the function executes successfully and returns the following response:

  ```json
  "{\"results\": [\"[3.0]\"]}"
  ```
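Because the handler is plain Python, you can also sanity-check the computation locally before testing in the console. This sketch reproduces the function code with the same TestEvent payload (it assumes numpy and pandas are installed locally):

```python
import json
import math

import numpy as np
import pandas as pd

def create_matrix(triangleVals, n):
    # Expand the flattened upper triangle back into a symmetric n x n matrix.
    out = []
    outIdx = 0
    for i in range(n):
        out.append([])
        for j in range(n):
            if j >= i:
                out[i].append(triangleVals[outIdx])
                outIdx += 1
            else:
                out[i].append(out[j][i])
    return out

def lambda_handler(event, context):
    data = event['arguments'][0][0]
    df = pd.DataFrame(np.array([0 if i is None else i for i in json.loads(data)]))
    m = len(df)
    n = int((-3 + math.sqrt(9 + 8 * m)) / 2)
    # Solve XtX * beta = Xty for the CUPED theta coefficients.
    XtX = create_matrix(df.iloc[0:m - n].values.flatten(), n)
    Xty = df.iloc[m - n:].values.flatten()
    beta_hat = np.linalg.inv(XtX).dot(Xty)
    return json.dumps({'results': [json.dumps(beta_hat.tolist())]})

# Same payload as the console TestEvent.
response = lambda_handler({"arguments": [["[1,2,3]"]]}, None)
print(response)  # {"results": ["[3.0]"]}
```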
Configure the Redshift IAM role
Finally, you need to grant your Redshift cluster permission to invoke the Lambda function you just created.
- Ensure a default IAM role – Confirm that your Redshift cluster has a default IAM role attached to it.
- Attach a new policy – Attach a new policy to that role that grants permission for the lambda:InvokeFunction action on the cuped_thetas function:

  ```json
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": "lambda:InvokeFunction",
        "Resource": "arn:aws:lambda:<REGION>:<AWS account>:function:cuped_thetas"
      }
    ]
  }
  ```
Performance guidance
Use the following techniques to keep Optimizely Analytics running at an optimal cost and performance profile:
- Ensure your events/conversions table is clustered by event date (not time), and event type (the same column selected in the Semantics tab of the dataset in Optimizely Analytics), in that order.
- (Experimentation Analytics only) If the warehouse has a separate decisions table, ensure that table is clustered by the experiment ID column and decision date (not time).
- Ensure that you have created a new schema in your warehouse, give Optimizely Analytics read and write access to it, and then enter the name of that schema in the Optimizely Analytics app settings, in the Materialization section. This enables the materialization of repeated queries, which is a large cost/performance boost.
- Ensure your warehouse instance size is appropriate.
- Consider creating a new warehouse for Optimizely Analytics, if this same warehouse instance is also used by other workloads, so that Optimizely Analytics queries are isolated from other workloads.
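In Redshift, clustering corresponds to sort keys. A sketch of the clustering recommendations above, using hypothetical table and column names (match these to your own schema and to the columns selected in the Semantics tab):

```sql
-- Events/conversions table: sort by event date, then event type.
ALTER TABLE analytics.events ALTER SORTKEY (event_date, event_type);

-- Experimentation Analytics only: a separate decisions table.
ALTER TABLE analytics.decisions ALTER SORTKEY (experiment_id, decision_date);
```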
Create an empty scratch schema
After configuring your warehouse and granting permissions, create a writable scratch space for Optimizely Analytics. This space optimizes internal operations, enhancing performance, resource utilization, and user experience, making it essential for advanced analytics in Redshift.
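A minimal sketch of this setup, assuming the placeholder names optimizely_scratch and optimizely_user; enter the schema name in the Materialization section of the Optimizely Analytics app settings afterward:

```sql
-- Placeholder schema and user names; substitute your own.
CREATE SCHEMA optimizely_scratch;

-- GRANT ALL on a schema gives USAGE and CREATE, allowing read and write.
GRANT ALL ON SCHEMA optimizely_scratch TO optimizely_user;
```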