Quick Start Guide

This guide provides an overview of the end-to-end flow of using Warehouse-Native Analytics: logging into the platform, creating a service account, creating an application, connecting to your data warehouse, defining datasets, and building an exploration.

See our Terms and Concepts Guide for explanations of these terms.

Step 1: Log into Warehouse-Native Analytics

First, access the Warehouse-Native Analytics platform.

  1. Open the Warehouse-Native Analytics web application URL in any browser of your choice.
  2. Enter the username and password provided by your Warehouse-Native Analytics Administrator and click Sign In.


If you encounter any difficulties logging into Analytics, reach out to your Analytics Administrator or use the Support portal for assistance.

Step 2: Create a Service Account

You need to create a service account for Analytics within your data warehouse. Analytics uses this service account to query your warehouse.
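
For example, in Snowflake the service account setup might look like the following. This is only a sketch: the names NETSPRING_ROLE, NETSPRING_SVC, and ANALYTICS_WH are hypothetical, SPRINGFLIX is the example database used later in this guide, and the exact statements depend on your warehouse.

  -- Create a role and a service user for Analytics (hypothetical names)
  CREATE ROLE NETSPRING_ROLE;
  CREATE USER NETSPRING_SVC PASSWORD = '<strong-password>' DEFAULT_ROLE = NETSPRING_ROLE;
  GRANT ROLE NETSPRING_ROLE TO USER NETSPRING_SVC;

  -- Allow the role to run queries and read the analytics data
  GRANT USAGE ON WAREHOUSE ANALYTICS_WH TO ROLE NETSPRING_ROLE;
  GRANT USAGE ON DATABASE SPRINGFLIX TO ROLE NETSPRING_ROLE;
  GRANT USAGE ON SCHEMA SPRINGFLIX.PUBLIC TO ROLE NETSPRING_ROLE;
  GRANT SELECT ON ALL TABLES IN SCHEMA SPRINGFLIX.PUBLIC TO ROLE NETSPRING_ROLE;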

Create a Read/Write Schema

We also highly recommend adding a schema to which the service account has read/write access, usually called netspring_rw. Analytics uses this schema to cache common computations, such as the list of unique values in a column, which substantially improves performance and reduces the warehouse cost of Analytics. If you need to put in a request to get this schema created, that will not block you from continuing your setup; you can come back to this step once the schema exists.
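
Continuing the Snowflake sketch above (hypothetical names; adjust to your environment), creating the read/write schema and granting the service account access might look like this:

  -- Read/write schema used by Analytics for cached computations
  CREATE DATABASE IF NOT EXISTS NETSPRING;
  CREATE SCHEMA IF NOT EXISTS NETSPRING.NETSPRING_RW;
  GRANT USAGE ON DATABASE NETSPRING TO ROLE NETSPRING_ROLE;
  GRANT ALL PRIVILEGES ON SCHEMA NETSPRING.NETSPRING_RW TO ROLE NETSPRING_ROLE;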

Once the schema exists, perform the following steps in Analytics:

  1. On the left navigation panel in Analytics, go to Settings and, under General Settings, scroll down to the Materialization section and enable the feature using the toggle. When materialization is enabled, Analytics creates materialized tables within the data warehouse that contain intermediate results for improving performance.
  2. Then, fill in the following fields.
    • Database - the name of the database in the data warehouse where the materialized tables will be created. In this example, the database is netspring.
    • Schema - the name of the schema within the database where the materialized tables will be created. In this example, the schema is netspring_rw.
    • Refresh Cron Schedule - the refresh periodicity of the materialized tables, in cron syntax. The recommended periodicity is daily, so the schedule is 0 0 * * * (midnight every day).

Step 3: Create an Application

Create an application within your organization to store all data and analysis we will generate in the subsequent steps.

  1. On the left navigation panel, click the Application button and then click + beside the Organization name to create a new application within that Organization.
  2. In the Creating new application dialog that appears, enter the application details: provide a Name and Description for the new application and select a Color of your choice.
  3. Click Next step: "Setup Connection" to proceed to data connection setup.
  4. Select your preferred data warehouse and click Next.
  5. Enter the Connection Setup details.
  6. Click the clockwise arrow icon next to the Health field to test your connection. A successful connection shows as good in green; a failed one displays an error. The most common causes of errors are incorrect warehouse credentials or an Analytics service account that lacks the required permissions.
  7. Click Confirm to finish the application setup. After successful creation, the new application appears under your Organization; you can verify this by clicking the Application button on the left navigation panel.

Learn how to set up Connections in Analytics.

Step 4: Create Datasets and a Relationship

Before you start creating explorations in Analytics, you need to add user and event datasets and create a relationship between them.

Creating Event Datasets

Before you create a dataset, you need to determine whether your events are represented by one table or by many tables within your warehouse. These are the two most common setups, but yours may differ.

If you use RudderStack or Segment, your events are stored in multiple tables. If you use Snowplow, your events are stored in a single table. If neither of these applies to you, reach out to us by clicking the Help icon in the left navigation bar and choosing Get Help; this document covers the common setups, but Analytics supports many others.

Single Events Table

  1. On the left navigation panel, click +, navigate to Dataset, and select Source Dataset. Source datasets read from a single table in your warehouse.
  2. In the Create Dataset dialog, navigate to the Pick a Source section on the left and expand the database using the tree view. You can also enter the table path directly. We have used the SPRINGFLIX database as an example for this Quick Start guide.
  3. Select the required table or view. In this example, we've used the EVENT table from the PUBLIC schema.
  4. On the dataset definition page that appears, provide a name for the dataset, such as "Product Events" in this example, and click Save.
  5. Annotate the dataset as an event stream. This tells Analytics that the dataset you have created is a stream of events performed by users, accounts, and so on. To do that:
    • Go to the Semantics tab.
    • Choose Event Stream in the This is an drop-down.
    • Click + Add under Event Streams.
    • In the Product Events are events that occur at field, select a column that orders the events by time, such as event_ts in this example. For the with event type field, select the column that represents the name of the event, event_type in this example. The Event Type is the name that shows in the Analytics UI when you select events.
    • Click Save.
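
For reference, the single events table in this example has roughly the following shape. This is a sketch: only the event_ts and event_type columns are used above, and the other column names are hypothetical.

  -- Hypothetical shape of the SPRINGFLIX.PUBLIC.EVENT table
  CREATE TABLE SPRINGFLIX.PUBLIC.EVENT (
      user_id    VARCHAR,   -- identifier of the user who performed the event
      event_ts   TIMESTAMP, -- orders events in time
      event_type VARCHAR,   -- event name shown in the Analytics UI
      properties VARIANT    -- any additional event attributes
  );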

Events Stored Across Multiple Tables

When you have multiple tables, you create a Union Dataset. A union dataset is a combination of multiple warehouse tables exposed to Analytics users as a single logical dataset. This type of dataset is specifically designed to unify your view of events across multiple tables and query them efficiently. You can create a union dataset by following the step-by-step procedure given in this section.
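
Conceptually, a union dataset behaves like a UNION ALL over the member tables. For example, with Segment-style tables that each hold one event type (the table and column names below are hypothetical):

  -- What a union dataset logically represents (a sketch)
  SELECT user_id, event_ts, 'Page Viewed' AS event_type
  FROM SPRINGFLIX.PUBLIC.PAGE_VIEWED
  UNION ALL
  SELECT user_id, event_ts, 'Content Played' AS event_type
  FROM SPRINGFLIX.PUBLIC.CONTENT_PLAYED;

Analytics performs this unification for you; you do not write this SQL yourself.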

In the Semantics tab, annotate the dataset as an event stream. To do that:

  • Choose Event Stream in the This is an drop-down.
  • Click + Add under Event Streams.
  • In the Product Events are events that occur at field, select a column that orders the events by time, such as event_ts in this example. For the with event type field, select the column that represents the name of the event, event_type in this example. The Event Type is the name that shows in the Analytics UI when you select events.
  • Click Save.

Creating User Datasets

Next, we need to create a User Dataset. Analytics requires a dimension table of users with one row per user. Follow these steps to create a user dataset:

  1. On the left navigation panel, click +, navigate to Dataset, and select Source Dataset.
  2. In the Create Dataset dialog, navigate to the Pick a Source section on the left and expand the database using the tree view. You can also enter the table path directly. We have used the SPRINGFLIX database as an example.
  3. Select the required table or view. In this example, we've used the USERS table from the PUBLIC schema.
  4. On the dataset definition page that appears, provide a name for the dataset, such as "Users" in this example, and click Save.
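
For reference, the USERS table in this example is a dimension table with one row per user, roughly shaped as follows. Only the id column is used later in this guide; the other columns are hypothetical.

  -- Hypothetical shape of the SPRINGFLIX.PUBLIC.USERS table
  CREATE TABLE SPRINGFLIX.PUBLIC.USERS (
      id         VARCHAR,   -- primary key, one row per user
      email      VARCHAR,   -- example user attribute
      created_at TIMESTAMP  -- example user attribute
  );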

If you do not have a single table representing all users, reach out to Analytics Support, or click the Help icon in the left navigation bar and choose Get Help; this document covers the common setups, but Analytics supports many others.

 

Creating a Relationship between your User and Event Datasets

Now that we have our Event and User datasets, let us establish a relationship between them. Creating a relationship between datasets tells Analytics how to join these tables together.

  1. From the Users dataset, navigate to the Related Datasets tab and click + Add Related Dataset.
  2. In the Dataset field, choose the event dataset that was created earlier - here, Product Events.
  3. For Cardinality, choose one to many because one user will have multiple events.
  4. Then, select the columns to join on. In the Users table, choose the id column; this is usually the primary key. In the Events table, choose the column that represents the id of the user who performed the event.
  5. Finally, click Save.
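
This relationship tells Analytics to join the two datasets roughly as follows (a sketch; user_id stands for whatever column you selected in the Events table):

  -- The join implied by the one-to-many relationship (a sketch)
  SELECT u.id, e.event_type, e.event_ts
  FROM SPRINGFLIX.PUBLIC.USERS u
  JOIN SPRINGFLIX.PUBLIC.EVENT e
    ON e.user_id = u.id;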

Creating Column Actor Datasets

An actor is represented by a dataset - typically this maps to a dimension table in the data warehouse. In some cases, however, there is no such dimension table in the warehouse; there is only an identifier column in an event table (e.g., Events.user_id). In this case, you can create a Column Actor dataset. A Column Actor dataset does not map directly to any warehouse table; it is defined solely by its relationships to the identifier columns in other tables. Follow the steps below to create a column actor dataset:

  1. On the left navigation panel, go to Data > Datasets.
  2. On the Datasets page, click + New Dataset at the top and choose Column Actor Dataset.
  3. Give your dataset a name, add a description (optional), and click Save.
  4. The next step is to add Related Datasets - here, you can define a relationship with another dataset.

To add related datasets, follow the steps below.

  1. Click + Add Related Dataset in the upper-right corner.
  2. Click the Dataset dropdown and choose the dataset you want to link to the current column actor dataset.
  3. Select the Cardinality for the relationship.
  4. Now, define the relationship. Here, you only choose a column on the non-column-actor side, because the column actor dataset has no physical columns of its own. For example, choosing user_id means that the column actor dataset will be identified by the user_id column of the related dataset, with no underlying physical table (see the sketch after this list).
  5. Click Save.
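
Conceptually, the actor set that a column actor dataset represents is the set of distinct identifier values in the related dataset, as if it were derived on the fly (a sketch; user_id is the hypothetical identifier column):

  -- The actor set implied by Events.user_id (a sketch)
  SELECT DISTINCT user_id
  FROM SPRINGFLIX.PUBLIC.EVENT;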

Step 5: Build an Exploration

The next step is to build an exploration using the datasets you created. Before you create an exploration, go to Settings and configure the Defaults: choose the Users dataset for the Actors dataset field, and choose Product Events for the Event Stream field. Once you set these defaults, Users is pre-populated as the selected actor when you define measures.


Now, you are all set to build your exploration.

  1. On the left navigation panel, go to Exploration.
  2. Choose a template from the list of available exploration templates. For the example, let us choose Event Segmentation and create an exploration of Daily Active Users.
  3. On the definition page, select the measure using the drop-down - here, Count of unique actors that performed event.
  4. Then, under the Events section, click Select Events and choose an event. For the example, let us choose Play Content.

The list of events you see in the events drop-down is extracted from the event type column of the event stream.

  5. Click Run Exploration to display a chart in the visualization window.
  6. Give the exploration a name and click Save.

You have now successfully created your first exploration!
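
For intuition, the Daily Active Users exploration above corresponds roughly to the following warehouse query. This is only a sketch, using the example EVENT table from Step 4 and its hypothetical user_id column; Analytics generates the actual SQL for you.

  -- Roughly what this Daily Active Users exploration computes (a sketch)
  SELECT DATE_TRUNC('day', event_ts) AS day,
         COUNT(DISTINCT user_id)     AS daily_active_users
  FROM SPRINGFLIX.PUBLIC.EVENT
  WHERE event_type = 'Play Content'
  GROUP BY 1
  ORDER BY 1;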