Source Dataset

A source dataset represents a logical table from the data warehouse. After connecting to the data warehouse, you can map a database table, database view, or SQL query from the warehouse to a source dataset in Analytics without physically transferring or moving any data into Analytics. You can then use the source dataset to run analyses or create derived datasets.
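Conceptually, a source dataset behaves like a stored query definition rather than a copy of the data. The Python sketch below illustrates that idea under stated assumptions; it is not how Analytics is implemented. An in-memory SQLite database stands in for the data warehouse, and the `SourceDataset` class and its `run` method are hypothetical names used only for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDataset:
    """Hypothetical model of a source dataset: a name plus the SQL that defines it.

    No rows are copied into this object; the SQL is sent to the warehouse
    each time the dataset is queried.
    """
    name: str
    sql: str

    def run(self, conn: sqlite3.Connection, columns: str = "*") -> list:
        # Wrap the dataset definition as a subquery so downstream queries
        # can read from it as if it were a table.
        query = f"SELECT {columns} FROM ({self.sql}) AS {self.name}"
        return conn.execute(query).fetchall()


# An in-memory SQLite database stands in for the data warehouse (assumption).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE events (user_id INTEGER, event TEXT)")
warehouse.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "click"), (2, "view"), (1, "view")],
)
warehouse.execute("CREATE VIEW clicks AS SELECT * FROM events WHERE event = 'click'")

# A source dataset can map a table, a view, or an arbitrary SQL query.
from_table = SourceDataset("events_ds", "SELECT * FROM events")
from_view = SourceDataset("clicks_ds", "SELECT * FROM clicks")
from_query = SourceDataset("viewers_ds", "SELECT user_id FROM events WHERE event = 'view'")

print(from_table.run(warehouse, "COUNT(*)"))  # [(3,)]
print(from_view.run(warehouse))  # [(1, 'click')]
```

Because each dataset only holds its SQL, the warehouse remains the single copy of the data, which is the property the paragraph above describes.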

Create a Source Dataset

Follow the steps below to create a source dataset:

  1. On the left navigation panel, go to Data > Datasets.
Add a New Dataset
  2. On the Datasets page, click + New Dataset at the top and choose Source Dataset.
New Dataset Button
  3. In the Create Dataset dialog, go to the Pick a Source section on the left and expand the database in the tree view.
Pick a Data Source
  4. Expand the schema and select the table or view you want to add to the query.
  5. To preview the selected data, click Preview in the Source Preview section on the right. Then click Confirm to continue to the dataset definition.
Data Preview
  6. On the dataset definition page, Analytics displays a default SQL query generated from the selections you made in the previous steps, which reads the contents of the selected table or view. You can use the suggested query as is or modify it to shape the data as needed.
SQL Preview
  7. Click Run Query at the top-right and preview the results to make sure the SQL statement returns the expected data.
Run Query
  8. (Optional) Go to the Column Properties tab to inspect the properties of individual columns. The following properties are available:
Column Editor
    • Measure or attribute: A measure is a column in an Analytics dataset that represents a quantitative measurement of a business activity. An attribute is a column in an Analytics dataset that represents a qualitative or categorical property of the entity the dataset represents. This default setting makes it easier to construct queries for Explorations and Visualizations downstream in the application.
    • Default Aggregation: The aggregation applied to the column by default. Like the measure or attribute setting, it makes it easier to construct queries for Explorations and Visualizations downstream in the application.
    • Data Index: Controls whether Analytics caches the column's distinct values from the warehouse, which reduces the time it takes to populate column-value dropdowns. It is enabled by default for string columns and disabled for columns of other data types.
  9. (Optional) Go to the Primary Keys tab to define keys. A primary key consists of one or more columns that together uniquely identify each row in the dataset. Primary keys are optional for all datasets in Analytics. A primary key cannot be defined for a dataset that contains fully duplicated rows, so make sure the dataset has none before you define one. To add a primary key:
    1. Click + Add in the upper-right corner to add primary keys.
    2. Click the blank box under Columns Group.
    3. From the drop-down menu, select the column to use as a key in your dataset.
    4. Repeat the previous step to add another key.
Primary Keys
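The rule about fully duplicated rows follows from what a primary key is: if two rows are identical in every column, no combination of columns can tell them apart. The Python sketch below checks both conditions on in-memory rows. The function names and sample data are hypothetical, and the check is illustrative, not the validation Analytics actually runs.

```python
from collections import Counter


def has_fully_duplicated_rows(rows: list) -> bool:
    """Return True if any two rows have identical values in every column."""
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return any(n > 1 for n in counts.values())


def is_valid_primary_key(rows: list, key_columns: list) -> bool:
    """Return True if the key columns, taken together, identify each row uniquely."""
    keys = [tuple(row[col] for col in key_columns) for row in rows]
    return len(keys) == len(set(keys))


order_lines = [
    {"order_id": 1, "line": 1, "sku": "A"},
    {"order_id": 1, "line": 2, "sku": "B"},
    {"order_id": 2, "line": 1, "sku": "A"},
]

print(is_valid_primary_key(order_lines, ["order_id"]))  # False: order 1 appears twice
print(is_valid_primary_key(order_lines, ["order_id", "line"]))  # True: a two-column key

# Appending an exact copy of a row makes every candidate key fail.
with_duplicate = order_lines + [order_lines[0]]
print(has_fully_duplicated_rows(with_duplicate))  # True
print(is_valid_primary_key(with_duplicate, ["order_id", "line", "sku"]))  # False
```

The second call shows why the Primary Keys tab accepts several columns: neither `order_id` nor `line` is unique on its own, but the pair is.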
  10. (Optional) Go to the Related Datasets tab to define relationships between this dataset and other datasets. Like the source tables in the data warehouse, Analytics datasets are relational.
    • By setting up relationships between datasets, end users will have access to all of the events or dimensional attributes that they need to bring business context into their product analyses.
    • Using these relationships, Analytics understands when event datasets or attribute datasets can and cannot be combined in the same analysis.
    • Events and attributes are available for selection in the UI based on the existing relationships, and are automatically disabled when it would not make sense to include them in an analysis. This lets end users make full use of the available data in the warehouse without having to be familiar with how it is laid out.
    • The purpose of establishing a relationship between two datasets is to let you use both datasets in a single exploration query and analyze the data sliced and diced across them. The relationship must be logical: the values in the selected columns of both datasets must match. For instance, the "Events" dataset may include a "User ID" column whose values come from the "ID" column in the "Users" dataset. These two columns can be used to establish a logical relationship between the two datasets.
    • To add related datasets,
        1. Click + Add Related Dataset in the upper-right corner.
        2. Click the Dataset dropdown and choose the dataset you want to link to the current source dataset.
        3. Give a Name to the relationship you are creating.
        4. Select the relevant Cardinality for the relationship. The available options are:
          • many to one - Use this when multiple rows in the current dataset can match a single row in the related dataset based on the selected columns.
          • one to many - Use this when a single row in the current dataset can match multiple rows in the related dataset based on the selected columns.
          • one to one - Use this when a single row in the current dataset matches at most one row in the related dataset, and vice versa, based on the selected columns.
        5. Click + located at the bottom-right and use the dropdowns to select the columns that need to match for both datasets to establish a successful relationship.
        6. Click Save.
Related Datasets
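To make the cardinality options concrete, the Python sketch below infers a relationship's cardinality from sample rows, using the "Events" and "Users" example from this step. In Analytics you choose the cardinality yourself in the dialog; `infer_cardinality` is a hypothetical helper for sanity-checking that choice against your data, and it returns `None` when the data fits none of the three listed options.

```python
from collections import Counter


def infer_cardinality(current, related, current_col, related_col):
    """Infer the cardinality of current.current_col -> related.related_col.

    Counts how often each matching value occurs on each side.
    """
    left = Counter(row[current_col] for row in current)
    right = Counter(row[related_col] for row in related)
    shared = left.keys() & right.keys()
    many_left = any(left[v] > 1 for v in shared)
    many_right = any(right[v] > 1 for v in shared)
    if many_left and many_right:
        return None  # many-to-many: not one of the three options listed above
    if many_left:
        return "many to one"
    if many_right:
        return "one to many"
    return "one to one"


users = [{"ID": 1, "Name": "Ana"}, {"ID": 2, "Name": "Ben"}]
events = [
    {"User ID": 1, "Event": "click"},
    {"User ID": 1, "Event": "view"},
    {"User ID": 2, "Event": "click"},
]

print(infer_cardinality(events, users, "User ID", "ID"))  # many to one
print(infer_cardinality(users, events, "ID", "User ID"))  # one to many
```

The direction matters: the same pair of columns is "many to one" when defined from the Events dataset and "one to many" when defined from the Users dataset.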
  11. Go to the Semantics tab to annotate this dataset as an event stream or an actor.
    • Choose the type for your dataset: Actor, Event Stream, or None.
    • If you choose Actor as the type, then click Add Preferred Property and add as many properties as needed.
    • If you choose Event Stream as the type, then click Add Event Stream and configure event properties and actor properties.
    • Click Save.
Semantics
  12. (Optional) Go to the Derived Columns tab to define new derived columns for this dataset.
  13. (Optional) Go to the Cohorts tab to define cohorts: groups of rows in this dataset that represent logical groupings, such as user cohorts.
  14. Go to the Sampling Keys tab to set a sampling key. Sampling keys are available only for actor datasets; a sampling key lets you use sampled data when you create an exploration from this source dataset.
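This page does not describe how Analytics samples data. One common reason to key sampling on an actor is so that all of an actor's rows are kept or dropped together, leaving per-actor sequences intact in the sample. The Python sketch below shows hash-based sampling on an actor key under that assumption; `in_sample` and `sample_rows` are hypothetical names, and the hashing scheme is illustrative rather than the product's.

```python
import hashlib


def in_sample(actor_id, rate: float) -> bool:
    """Deterministically decide whether an actor belongs to the sample.

    Hashing the actor ID maps it to a fixed point in [0, 1), so the same
    actor is always in or out of the sample for a given rate.
    """
    digest = hashlib.sha256(str(actor_id).encode("utf-8")).digest()
    # 48 bits fit exactly in a float, so the point is always strictly below 1.
    point = int.from_bytes(digest[:6], "big") / 2**48
    return point < rate


def sample_rows(rows, key: str, rate: float):
    """Keep every row whose actor is sampled, so each actor's rows stay together."""
    return [row for row in rows if in_sample(row[key], rate)]


# 100 hypothetical users with three events each.
events = [{"user_id": u, "step": s} for u in range(1, 101) for s in range(3)]
sampled = sample_rows(events, "user_id", 0.1)
```

Sampling whole actors rather than individual rows is what keeps metrics such as per-user funnels meaningful on the sample.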
Creating a Source Dataset

Once you have saved your dataset, you can view and access it on the Datasets page.

New Source Dataset

Active Event

Any event triggered by the user is an active event; for example, a click by the user is an active event. Analytics lets you define active events by creating a cohort on any event stream. To define an active event:

  1. Navigate to Cohorts on the left panel.
  2. Click New Cohort, choose a dataset and click Confirm.
  3. Give your cohort a name. Add a new block and set a value for it based on your requirements and click Save.
  4. Now, this cohort will appear in the active events drop-down under the Semantics tab of your dataset.
Active Events
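Conceptually, the cohort you define here acts as a filter that marks which rows of the event stream count as active events. The Python sketch below models that idea. The `Event` and `Cohort` classes, the event names, and the name-based condition are illustrative assumptions, not how Analytics stores cohorts.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Event:
    user_id: int
    name: str


@dataclass(frozen=True)
class Cohort:
    """Hypothetical cohort: a named condition applied to rows of an event stream."""
    name: str
    condition: Callable[[Event], bool]

    def members(self, stream: List[Event]) -> List[Event]:
        # Keep only the rows that satisfy the cohort's condition.
        return [event for event in stream if self.condition(event)]


stream = [
    Event(1, "click"),
    Event(1, "email_delivered"),  # system-generated, not triggered by the user
    Event(2, "search"),
    Event(2, "session_timeout"),  # system-generated, not triggered by the user
]

# The condition plays the role of the block you configure in the cohort editor.
active_events = Cohort("Active events", lambda e: e.name in {"click", "search"})
print([e.name for e in active_events.members(stream)])  # ['click', 'search']
```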

Why do you have to define active events?

Once defined, active events can be used in features such as funnels. When you define the stages of a funnel, Analytics lets you set stages that display only active events. In Funnels, active events are available only if your event stream has an annotation for active events.
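A funnel counts how many actors reach each stage in order. The Python sketch below assumes strictly ordered stages and rejects any stage that is not an active event, mirroring the restriction described above. The `funnel_counts` function and its tuple input format are hypothetical, and the product's own funnel logic may differ.

```python
def funnel_counts(events, stages, active_names):
    """Count how many users reach each funnel stage, in order.

    events: (user_id, timestamp, event_name) tuples.
    stages: event names in funnel order; each must be an active event.
    """
    not_active = [s for s in stages if s not in active_names]
    if not_active:
        raise ValueError(f"stages must be active events: {not_active}")
    progress = {}  # user_id -> number of stages completed so far
    for user, _, name in sorted(events, key=lambda e: (e[0], e[1])):
        done = progress.get(user, 0)
        # Advance only when the event matches the user's next expected stage.
        if done < len(stages) and name == stages[done]:
            progress[user] = done + 1
    return [sum(1 for n in progress.values() if n > i) for i in range(len(stages))]


active_names = {"search", "click", "purchase"}
events = [
    (1, 1, "search"), (1, 2, "click"), (1, 3, "purchase"),
    (2, 1, "search"), (2, 2, "purchase"),
    (3, 5, "click"),
]
print(funnel_counts(events, ["search", "click", "purchase"], active_names))  # [2, 1, 1]
```

User 2 skips the "click" stage and stops at the first stage, and user 3 never performs the first stage, so neither advances past where the ordered stages allow.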

Edit your dataset

To edit your dataset,

  1. On the left navigation panel, go to Data > Datasets.
  2. On the Datasets page, click the dataset you want to edit.
  3. Make the necessary changes and click Save.

Duplicate your dataset

To duplicate your dataset,

  1. On the left navigation panel, go to Data > Datasets.
  2. On the Datasets page, click the dataset you want to duplicate.
  3. Click the three dots at the top-right and click Save As. Give your copy a name and, optionally, a description, and choose a destination folder.
  4. Click OK and the new dataset will be created.

Delete your dataset

To delete your dataset,

  1. On the left navigation panel, go to Data > Datasets.
  2. On the Datasets page, click the dataset you want to delete.
  3. Click the three dots at the top-right and click Delete. When the confirmation box appears, click Delete again to confirm, and your dataset will be deleted.