
Transforming data

You can create reusable, rule-based data transformations as part of your data pipeline. You can perform transformations as part of your data onboarding, or create reusable transformation data tasks. You can perform row-level transformations and create datasets that are either materialized as tables or created as views that perform transformations on the fly.

You can perform explicit transformations per dataset, or create global rules that transform multiple datasets. You can also filter a dataset to create a subset of rows.
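
For orientation only, a row-level transformation combined with a filter is conceptually equivalent to a SQL query such as the one below. The table and column names (orders, amount, order_status) are hypothetical; in practice you configure these operations in the data task rather than writing SQL.

    -- Derive a column and keep a subset of rows (hypothetical dataset "orders")
    SELECT
        order_id,
        amount,
        amount * 0.25 AS tax_amount,    -- example of a derived column
        order_status
    FROM orders
    WHERE order_status = 'SHIPPED';     -- example of a row filter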

Creating a transformation data task

The easiest way to create a transformation data task is to click ... on a storage data task and then select Transform data.

You can also click Add new in a data project and select Transform data. In this case you need to define which source data task to use.

  1. Define your source data in Transform dataset.

    Select source datasets and click Add to add them to Output.

    You can add a SQL-based dataset by selecting a dataset and clicking Add SQL-based dataset.

    Adding SQL-based datasets

    Tip note: You can also add more datasets from other storage data tasks by clicking Select source data.
  2. Make all required changes to the included datasets, such as transformations, filtering data, or adding columns.

    For more information, see Managing datasets.

  3. When you have added the transformations that you want, you can validate the datasets by clicking Validate datasets. If the validation finds errors, fix the errors before proceeding.

    For more information, see Validating and adjusting the datasets.

  4. Create a data model

    Click Model to set the relationships between the included datasets.

    For more information, see Creating a data model.

  5. Click Prepare to prepare the data task and all required artifacts. This can take a little while.

  6. When the status displays Prepared, you can run the data task.

    Click ... and then Run.

The data task will now start creating datasets to transform the data.

Information note: It is not possible to change which datasets are included when you have started generating datasets.

Adding SQL-based datasets

You can include SQL-based tables in transformation tasks. A SQL-based table lets you enter a SQL SELECT query in your pipeline to define simple or complex transformations; an illustrative query is sketched after the step below.

For more information, see Adding SQL-based datasets.

  • In Transform dataset, select the datasets to include in the query, and click Add SQL-based table.
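
As a rough illustration, assuming a selected source dataset named customers with columns customer_id, customer_name, and country (all hypothetical), the SELECT query for a SQL-based dataset could look like the following. How source datasets are referenced in the query depends on your project and data platform.

    -- Hypothetical SELECT query defining a SQL-based dataset
    SELECT
        customer_id,
        UPPER(TRIM(customer_name)) AS customer_name,   -- normalize names
        country
    FROM customers
    WHERE country IS NOT NULL;                         -- drop rows without a country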

Scheduling a transformation task

You can schedule a transformation task to run periodically. You can set a time-based schedule, or set the task to run when input data tasks have completed running.

Click ... on a data task and select Scheduling to create a schedule. The default scheduling setting is inherited from the settings in the data project. For more information about default settings, see Transform default values.

You always need to set Scheduling to On to enable the schedule.

Information note: If all datasets in the task are non-materialized, there is nothing to run as the transformation is performed on the fly with views. You can still create a schedule for a non-materialized transformation task to run when the schedule condition is met. The task will complete immediately, which can trigger a downstream task, for example a data mart. This allows you to build an event-based pipeline schedule that includes non-materialized transformations without interrupting the pipeline flow.

Time-based schedules

You can use a time-based schedule to run the task regardless of when the different input sources are updated.

  • Select At specific time in Run the data task.

You can set an hourly, daily, weekly or monthly schedule.

Event-based schedules

You can use an event-based schedule to run the task when input data tasks have completed running.

  • Select On specific event in Run the data task.

You can choose whether to run the task when any of the input tasks has completed successfully, or only when any task from a selection of the input tasks has completed successfully.

Information note: The task will not run if any input task or downstream task is running when the schedule is triggered. The task is skipped until the next scheduled run.

Monitoring a transformation task

You can monitor the status and progress of a transformation task by clicking Monitor.

For more information, see Monitoring a data task.

Transformation settings

You can set properties for the transformation data task.

  • Click Settings.

Warning note: If the task has already been run, changing any setting other than Runtime settings requires that you recreate the datasets.

General settings

  • Database

    Database to use in the data source.

  • Data task schema

    You can change the name of the data task schema. The default name is store.

  • Internal schema

    You can change the name of the internal schema. The default name is store__internal.

  • Prefix for all tables and views

    You can set a prefix for all tables and views created with this task.

    Information note: You must use a unique prefix when you want to use a database schema in several data tasks.
  • Materialized

    You can select whether to create tables (Materialized), or only create views that perform transformations on the fly (Non-materialized). A conceptual sketch follows after this list.

  • History

    You can keep historical change data to let you easily recreate data as it looked at a specific point in time. You can use history views and live history views to see historical data.
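
As a conceptual sketch of the Materialized and Non-materialized options, a materialized dataset corresponds to a physical table that is loaded when the task runs, while a non-materialized dataset corresponds to a view that applies the transformation on the fly when it is queried. The object and column names below are hypothetical; the actual tables and views are created and managed by the data task.

    -- Materialized: a physical table loaded when the task runs
    CREATE TABLE transformed_orders AS
    SELECT order_id, amount, amount * 0.25 AS tax_amount
    FROM orders;

    -- Non-materialized: a view that performs the transformation on the fly
    CREATE VIEW transformed_orders_view AS
    SELECT order_id, amount, amount * 0.25 AS tax_amount
    FROM orders;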

Runtime settings

  • Parallel execution

    You can set the maximum number of data connections for full loads to a number from 1 to 5.

  • Warehouse

    The name of the cloud data warehouse.

Limitations

  • It is not possible to change data types in a transformation data task when the Non-materialized option is selected.

  • Non-materialized SQL-based datasets provide the full result set only; incremental filtering is not supported. They also provide current data only and do not keep historical records.

