Union Dataset

  • Updated

A union dataset is a logical combination of multiple warehouse tables exposed to users as a single logical dataset. A typical use case for union datasets is to create a single logical event dataset from a set of event tables in the warehouse - one for each event type. Analytics's union datasets have performance and cost advantages over physically combining these tables in the warehouse.

A source dataset in Analytics maps to only one warehouse table, while a union dataset maps to multiple warehouse tables.

Create a Union Dataset

To create a union dataset, follow the steps below:

  1. On the left navigation panel, go to Data and click Datasets.
  2. On the Datasets page, click + New Dataset at the top and choose Union Dataset.
  3. On the dataset definition page, the first tab that you will see is Tables. In this section, you can click + Add Tables and select all warehouse tables that need to be included for this union dataset and click Confirm. Then, you can see all the tables that you had chosen to create the dataset.
  4. Then, save the dataset to be able to see the columns, primary keys, and other such properties. Every time you change the tables on a union dataset, you need to click Save in order to be able to view the properties.
Union Dataset
  1. The next section is Columns.
    • This will contain all the columns of the union dataset, where each column maps to one or more columns of the tables that were chosen for the dataset.
    • By default, Analytics will attempt to combine and map columns based on matching column names and data types.
    • For example, say the union dataset is created from two warehouse tables "Table 1" and "Table 2". Say "Table 1" has a String column called "Country", and "Table 2" too has a String column called "Country". In the resulting union dataset, Analytics will create a String column called "Country" mapped to the "Country" columns of "Table 1" and "Table 2".
    • To see the mapping of these columns - click on an entry and all properties related to the entry are displayed - Column name, Description, Data Type, Measure or Attribute, Default Aggregation, Data Index, Status, Source Tables, Mapping Type and Value are displayed. You can inspect properties of individual columns, if required.
    • You can also edit column properties directly using one click in the Columns tab without having to open individual entries.
Union Dataset

Following are the fields that you can edit within the individual column sections:

Name and Description

To edit the name of a column, click the column name and edit it. To edit the description, click the existing description, make necessary changes, and click Apply. To see column description, you can hover over the description column in the Columns tab.

Union Dataset

Column Type

To change the column type, click tehe field under Measure or Attribute and choose a value from the drop-down.

Union Dataset

Default Aggregation

To set the default aggregation, click the drop-down and choose an aggregator. Learn more about Aggregators.

Union Dataset

Data Index and Status

Both these properties can be enabled/disabled using the respective toggles.

Union Dataset

Mapping Type

  • In the Column Mappings section, you can edit default values like Mapping Type and column Value. There are 3 mapping types -ColumnConstant, and None.
  • Every column is mapped to zero or more columns of underlying tables. This is called column mapping. Below are some details related to this.

    • If the column should map to a specific table column, it has COLUMN mapping.
    • If the column should just be a constant, then it has CONSTANT mapping. This type of mapping is equivalent of selecting a constant, like select 'val' as columnName from table0.
    • If the column has NONE mapping, then the column selections are not projected from the underlying table.Union Dataset

These properties have default values assigned to them. This makes it easier to construct queries for Explorations and Visualizations downstream in the application.

Overrides

Within this section, you can see the Source Table name, Mapping Type and column Value. You can make changes to the mapping type and column values here as well. To add a new override, click + at the top of this section and a new row will be added.

Union Dataset

Define Primary Keys

  1. Go to the Primary Keys tab to define the keys.
      • A primary key consists of one or more columns in a dataset that serves to uniquely identify each row within the dataset. Configuring a primary key is optional for all datasets in Analytics. If you want to define a primary key, ensure that a dataset within Analytics does not include any fully duplicated rows, as a primary key cannot be established for such a dataset.
      • To define a primary key,
          1. Click + Add in the upper-right corner to add primary keys.
          2. Click the blank box under Columns Group.
          3. Select a column to set it as a key in your dataset from the drop-down menu.
  1. Repeat the previous step to add another key.
Primary Keys

Define Related Datasets

Defining related datasets is an optional step. Go to the Related Datasets tab to define a relationship between this dataset with other datasets.

  • The purpose of establishing a relationship between two datasets is to allow the use of both datasets in a single exploration query and perform analysis based on the sliced and diced data obtained from the two datasets.
  • However, the relationship must be logical, where the values in the selected columns in both datasets must match to establish a successful relationship. For instance, the "Events" dataset may include a column called "User ID", which takes its values from the "ID" column in the "Users" dataset. By using these two columns, a logical relationship can be established between the two datasets.
  • To add related datasets, follow the steps below.
      1. Click + Add Related Dataset in the upper-right corner.
      2. Click the Dataset dropdown and choose the dataset you want to link to the current derived dataset.
      3. Give a Name to the relationship you are creating.
      4. Select the relevant Cardinality for the relationship. The available options are:
        • many to one - Use this when multiple rows in the current dataset can match a single row in the related dataset based on the selected columns.
        • one to many - Use this when a single row in the current dataset can match multiple rows in the related dataset based on the selected columns.
        • one to one - Use this when you want to establish a relationship by directly matching a selected column in a single row of both datasets.
      5. Click + located at the bottom-right and use the dropdowns to select the columns that need to match for both datasets to establish a successful relationship.
      6. Click Save.
        1.  
Related Datasets

Set Sampling Keys

Go to the Sampling Keys tab to set a sampling key. A sampling key for a dataset is exclusive to an actor dataset and allows the utilization of sampled data when creating an exploration using this derived dataset.

Related Datasets

Configure Semantics

The Semantics tab allows you to annotate this dataset as an event stream or an actor.

    1. Choose the type for your dataset - ActorEvent Stream, or None.
        • If you choose Actor as the type, then click Add Preferred Property and add as many properties as needed.
        • If you choose Event Stream as the type, then click Add Event Stream and configure event properties and actor properties.
    2. Click Save.
Related Datasets

Finally, give your dataset a name and add a description, if required and click Save.

Edit your dataset

To edit your dataset,

  1. On the left navigation panel, go to Data > Datasets.
  2. On the Datasets page, click the dataset you want to edit.
  3. Make necessary changes and click Save.
Related Datasets

Duplicate your dataset

To duplicate your dataset,

  1. On the left navigation panel, go to Data > Datasets.
  2. On the Datasets page, click the dataset you want to duplicate.
  3. Click the three dots at the top-right and and click Save As. Give your copy a name and a description(optional), and choose a destination folder.
  4. Click OK and the new dataset will be created.
Related Datasets

Delete your dataset

To delete your dataset,

  1. On the left navigation panel, go to Data > Datasets.
  2. On the Datasets page, click the dataset you want to delete.
  3. Click the three dots at the top-right and and click Delete. A confirmation box will appear asking you to confirm the deletion - click Delete and your dataset will be deleted.
Related Datasets