Derived Dataset

  • Updated

In Analytics, a derived dataset is created by combining existing datasets (either source or derived) to create a compound dataset. A fundamental distinction between source and derived datasets is that source datasets are defined by mapping tables present in the data warehouse, whereas derived datasets are based on other datasets within Analytics.

Create a Derived Dataset

You can create a derived dataset from multiple datasets with custom SQL statements. Follow the steps below to create a derived dataset:

  1. On the left navigation panel, go to Data > Datasets.
Add a New Dataset
  1. On the Datasets page, click + New Dataset at the top and choose Derived Dataset.
Create a Derived Dataset
  1. On the dataset definition page, insert your SQL query into the SQL box:
SQL Query

You can quickly insert dataset names or columns into the box by utilizing the auto-complete feature that displays a list of selectable options.

SQL Query
  1. Click Run Query at the top-right to preview the results of your query:
Run Query
  1. The Data Preview tab shows you a preview of all the columns inside the table.
Data preview
  1. (Optional) Go to the Column Properties tab to inspect properties of individual columns. The following properties are defined:
Column Editor
    • Measure or attribute: This default setting makes it easier to construct queries for Explorations and Visualizations downstream in the application. A measure is a column in a Analytics dataset that represents a quantitative measurement of a business activity and an attribute is a column in a Analytics Dataset that represents a qualitative or categorical property of the entity the dataset represents.
    • Default Aggregation: This default setting makes it easier to construct queries for Explorations and Visualizations downstream in the application.
    • Data Index: Data Index controls whether or not Analytics caches the distinct values of the column in the warehouse. It helps reduce the latency with which column value dropdowns are populated. This is true by default for columns with string data type and disabled for other data types.

These properties have default values assigned to them. This makes it easier to construct queries for Explorations and Visualizations downstream in the application.

  1. (Optional) Go to the Primary Keys tab to define the keys. A primary key consists of one or more columns in a dataset that serves to uniquely identify each row within the dataset. Configuring a primary key is optional for all datasets in Analytics. If you want to define a primary key, ensure that a dataset within Analytics does not include any fully duplicated rows, as a primary key cannot be established for such a dataset. To create a primary key,
      1. Click + Add in the upper-right corner to add primary keys.
      2. Click the blank box under Columns Group.
      3. Select a column to set it as a key in your dataset from the drop-down menu.
      4. Repeat the previous step to add another key.
Primary Keys
  1. (Optional) Go to the Related Datasets tab to define a relationship between this dataset with other datasets.
    • The purpose of establishing a relationship between two datasets is to allow the use of both datasets in a single exploration query and perform analysis based on the sliced and diced data obtained from the two datasets. However, the relationship must be logical, where the values in the selected columns in both datasets must match to establish a successful relationship. For instance, the "Events" dataset may include a column called "User ID", which takes its values from the "ID" column in the "Users" dataset. By using these two columns, a logical relationship can be established between the two datasets.
    • To add related datasets,
        1. Click + Add Related Dataset in the upper-right corner.
        2. Click the Dataset dropdown and choose the dataset you want to link to the current derived dataset.
        3. Give a Name to the relationship you are creating.
        4. Select the relevant Cordinality for the relationship. The available options are:
          • many to one - Use this when multiple rows in the current dataset can match a single row in the related dataset based on the selected columns.
          • one to many - Use this when a single row in the current dataset can match multiple rows in the related dataset based on the selected columns.
          • one to one - Use this when you want to establish a relationship by directly matching a selected column in a single row of both datasets.
        5. Click + located at the bottom-right and use the dropdowns to select the columns that need to match for both datasets to establish a successful relationship.
        6. Click Save.
Related Datasets
  1. Go to the Semantics tab to annotate this dataset as an event stream if it represents events or an actor.
    • Choose the type for your dataset - ActorEvent Stream, or None.
    • If you choose Actor as the type, then click Add Preferred Property and add as many properties as needed.
    • If you choose Event Stream as the type, then click Add Event Stream and configure event properties and actor properties.
    • Click Save.
  1. (Optional) Go to the Derived Columns tab where you can define new derived columns for this dataset. Learn more
  2. (Optional) Go to the Cohorts tab where you can define cohorts of rows of this dataset that represents logical groupings such as user cohorts. Learn more

Once you have saved your dataset successfully, you can view and access it on the Dataset page.

Edit your dataset

To edit your dataset,

  1. On the left navigation panel, go to Data > Datasets.
  2. On the Datasets page, click the dataset you want to edit.
  3. Make necessary changes and click Save.
Related Datasets

Duplicate your dataset

To duplicate your dataset,

  1. On the left navigation panel, go to Data > Datasets.
  2. On the Datasets page, click the dataset you want to duplicate.
  3. Click the three dots at the top-right and and click Save As. Give your copy a name and a description(optional), and choose a destination folder.
  4. Click OK and the new dataset will be created.
Related Datasets

Delete your dataset

To delete your dataset,

  1. On the left navigation panel, go to Data > Datasets.
  2. On the Datasets page, click the dataset you want to delete.
  3. Click the three dots at the top-right and and click Delete. A confirmation box will appear asking you to confirm the deletion - click Delete and your dataset will be deleted.
Related Datasets