Adding SQL transformations

You can include SQL-based tables in transformation tasks. A SQL transformation allows you to input a SQL SELECT query into a pipeline to define complex or simple transformations. You can let SQL assistant generate a query from a text prompt using generative AI. You can also use macros to adapt the query to run if an initial load or an incremental load is performed.

For more information about creating a transformation task, see Transforming data.

Adding a SQL transformation

To add a SQL transformation in a Transform data task:

In Transform, select the datasets to include in the query, and click Add SQL transformation.

Set the name of the transformation in Name. For more information about the other settings, see Settings.

Click Add when you are ready to create the SQL transformation.

Edit SQL transformation is displayed.
Write the query to return the SQL-based table in the query pane.

For more information, see Editing the SQL query.
Click Run query.

This will fetch metadata and a preview of the data.
Review the parameters in Parameters.

For more information, see Parameters.
Set a primary key in Metadata.

For more information, see Metadata.
When you are satisfied with the results of the query, click Save and close. The dataset is now added to the list of targets, and you can preview it.

Editing the SQL query

Type the query to return the SQL-based table in the query pane. The query can only read from the source datasets that you selected, but you can add more source datasets with Add.

You can create a statement to load a dataset from Objects. Click and then Generate select statement in editor.
You can copy the name of a dataset to the clipboard by clicking and then Copy.
Click > to move a dataset or column name to the editor.
Click to search and replace text in the editor.

You can use SQL assistant to generate a query using generative AI. For more information, see Generating a SQL transformation from a text prompt. SQL assistant is not available in Qlik Cloud Government.

If incremental load is used, you can use Macros to adapt the query to run if an initial load or an incremental load is performed.

For more information, see Using macros to adapt the query to load type.

Keyboard shortcuts

Keyboard shortcuts are expressed assuming that you are working in Windows. For macOS, use Command instead of Ctrl.

Ctrl+F Search and replace
Ctrl+/ Comment/uncomment the current line
Ctrl+Enter Run the query
Ctrl+Z Undo
Ctrl+Y Redo

Previewing results

You can preview the results using a sample of the data in Results.

Data preview must be enabled on tenant level in Administration. You can find the Viewing data in Data Integration option in Settings, under Feature control.

You must be assigned the following roles to be able to preview data:

Can view data in the space where the connection resides.
Can view in the space where the project resides.

Results using a sample of the data are displayed. You can set how many data rows to include in the sample with Number of rows.

Click Last executed query to see the last run time and duration of the last executed query.

Filtering the results

You can filter the results by one or more columns.

Click on the first column to filter by and then Filter.

Select which operator to use and which value to filter by.

You can add more column filters by clicking Add filter.

The filter will only affect the existing sample data.

Sorting the results

You can sort the data sample by a specific column.

Click on the column to sort by and then Sort ascending or Sort descending.

Sorting will only affect the existing sample data. If you used a filter to only include orders from 2024 and invert the sort order, the sample data will still only contain orders from 2024.

Managing displayed columns

You can select which columns to display in the results. The selection is not saved when you save the transformation.

Hide a column by clicking on the column and then Hide column.
Manage display of all columns by clicking on any column and then Displayed columns.

Parameters

Click Parameters to change the parameters in the SELECT query. The parameters should be mapped to the source or target objects.

Parameters are automatically mapped when you:

Use Generate select statement in editor in Objects.
Click > to move a dataset or column name to the editor.
A parameter name matches a table name from the tables in Objects.

If you use arbitrary parameter names, you must map the parameters manually. Only parameter names that match dataset names will be mapped automatically.

You can map to

History type 2 structures (_history)
Current structures
Target objects

You can only map target objects when a parameter is defined inside an incremental macro.

You can replace a parameter with a reference to have a query that is easier to read. Change the value in Parameter, and the reference in SELECT query will change as well.

Metadata

You can edit the metadata settings in Metadata.

Set primary keys in the Key column.
Set if the column is nullable in the Nullable column.
Select a column and click Edit to set column name, key, nullable and data type.

Running the query

You can run the query at any time during development by clicking Run query. The first time you run a query, metadata will be retrieved. When the query is run you can go to Metadata and set a primary key, which is required fo for the transformation to be complete.

If the editor contains more than one query you can run a single query by selecting the query and clicking Run query.

Using macros to adapt the query to load type

If incremental load is used, you can use Macros to adapt the query to run if an initial load or an incremental load is performed.

Select the macro and add the query inside the respective clause.

Q_RUN_INCREMENTAL adds a clause where you can place a query that is only run when an incremental load is performed.
Q_RUN_INITIAL_LOAD adds a clause where you can place a query that is only run when an initial load is performed.

You do not need to specify the complete query with each macro. You can adjust or add code based on your use case.

Use case: Incremental filter using date range:

In this use case, a set of orders data needs to be transformed with custom SQL. Since this is a large dataset, you want to include incremental filtering to ensure that the incremental load only considers orders from the past 12 hours.

This is the base query:

SELECT O.ORDER_ID, O.CUSTOMER_ID, OD.PRODUCT_ID, OD.QUANTITY, O.AMOUNT , OD.LAST_UPDATE_DTM FROM “mydb”.”myStorageTask”.”ORDER” O JOIN “mydb”.”myStorageTask”.”ORDER_DETAIL” OD ON O.ORDER_ID = OD.ORDER_ID

This is the filter to apply during the incremental run, where you only consider orders that were added the past 12 hours:

WHERE OD.UPDATE_DTM > DATEADD(hour, -12, CURRENT_TIMESTAMP())

When implementing a SQL-based table, it is recommended to replace the table names with the ${TABLE} notation. This prevents hard-coding database and schema names which may change from environment to environment, for example when moving from development to production.

This is the complete query, using the filter macro and with table names replaced with the ${TABLE} notation:

SELECT O.ORDER_ID, O.CUSTOMER_ID, OD.PRODUCT_ID, OD.QUANTITY, O.AMOUNT , OD.LAST_UPDATE_DTM FROM ${ORDER} O JOIN ${ORDER_DETAIL} OD ON O.ORDER_ID = OD.ORDER_ID {{#QLIK_FUNC_IF Q_RUN_INCREMENTAL}} WHERE OD.UPDATE_DTM > DATEADD(hour, -12, CURRENT_TIMESTAMP() ) {{/QLIK_FUNC_IF}}

This results in an initial load query:

SELECT O.ORDER_ID, O.CUSTOMER_ID, OD.PRODUCT_ID, OD.QUANTITY, O.AMOUNT , OD.LAST_UPDATE_DTM FROM ${ORDER} O JOIN ${ORDER_DETAIL} OD ON O.ORDER_ID = OD.ORDER_ID

And an incremental load query:

SELECT O.ORDER_ID, O.CUSTOMER_ID, OD.PRODUCT_ID, OD.QUANTITY, O.AMOUNT , OD.LAST_UPDATE_DTM FROM ${ORDER} O JOIN ${ORDER_DETAIL} OD ON O.ORDER_ID = OD.ORDER_ID WHERE OD.UPDATE_DTM > DATEADD(hour, -12, CURRENT_TIMESTAMP() )

Use case: Incremental filter using data in the target :

In this use case, a set of orders data needs to be transformed with custom SQL. Since orders are a large dataset, you want to include incremental filtering to ensure that the incremental load only considers order details records that are newer than the data in the target table. You also want to mark if a row was processed by the initial load or an incremental load.

As you can see by this example, the macros can be used to adjust any part of the query for initial load or incremental load.

This is the base query:

This is the filter to apply during the incremental run. “ORDERS_DERIVED” is the name of the target dataset.

WHERE OD.UPDATE_DTM > = (SELECT MAX(LAST_UPDATE_DTM) FROM “mydb”.”myTrasformTask”.”ORDERS_DERIVED” )

Incremental loading can also map a ${TABLE} notation to the target objects created by the transform task.

This is the complete query, using the filter macro, and with table names replaced with the ${TABLE} notation. LOADED_BY is set to INIT if the load is initial, and INCR if the load is incremental:

SELECT O.ORDER_ID, O.CUSTOMER_ID, OD.PRODUCT_ID, OD.QUANTITY, O.AMOUNT, OD.LAST_UPDATE_DTM, {{#QLIK_FUNC_IF Q_RUN_INITIAL_LOAD}} ‘INIT’ as LOADED_BY {{/QLIK_FUNC_IF}} {{#QLIK_FUNC_IF Q_RUN_INCREMENTAL}} ‘INCR’ as LOADED_BY {{/QLIK_FUNC_IF}} FROM ${ORDER} O JOIN ${ORDER_DETAIL} OD ON O.ORDER_ID = OD.ORDER_ID {{#QLIK_FUNC_IF Q_RUN_INCREMENTAL}} WHERE OD.UPDATE_DTM >= (SELECT MAX(LAST_UPDATE_DTM) FROM ${TARGET_TABLE} ) {{/QLIK_FUNC_IF}}

This results in an initial load query:

SELECT O.ORDER_ID, O.CUSTOMER_ID, OD.PRODUCT_ID, OD.QUANTITY, O.AMOUNT, OD.LAST_UPDATE_DTM, ‘INIT’ as LOADED_BY FROM ${ORDER} O JOIN ${ORDER_DETAIL} OD ON O.ORDER_ID = OD.ORDER_ID

And an incremental load query:

SELECT O.ORDER_ID, O.CUSTOMER_ID, OD.PRODUCT_ID, OD.QUANTITY, O.AMOUNT, OD.LAST_UPDATE_DTM, ‘INCR’ as LOADED_BY FROM ${ORDER} O JOIN ${ORDER_DETAIL} OD ON O.ORDER_ID = OD.ORDER_ID WHERE OD.UPDATE_DTM >= (SELECT MAX(LAST_UPDATE_DTM) FROM ${TARGET_TABLE} )

Settings

Click to edit the settings of the SQL transformation.

Enter a name for the target dataset in Name.

You can also add a longer description in Description.

In Materialization, select if the transformed output should be materialized or not. You can choose to inherit the setting from the data task settings.

On will create tables and will handle associated ELT processing.
Off will create views that perform transformations on the fly.

In Historical Data Store (Type 2), select if you want to keep historical data. You can choose to inherit the setting from the data task settings. This setting requires that Materialization is on.

Incremental load allows you to adjust the query for an incremental data load by applying filters or other conditions to reduce the set of data being processed using macros. Incremental load is only available if the data is materialized as tables.

When Incremental load is On

The first execution of the task will perform an initial load, inserting all results of the query into your target table. Subsequent executions will execute incremental loads leveraging filters or specific conditions you have defined for incremental processing. During incremental load, the task will only process data as an update or an insert, deletes are not managed.
When Incremental load is Off

The first execution of the task will perform an initial load, inserting all results of the query into your target table. Subsequent executions will process all results of the query by comparing to your target table, and processing records that are new, changed or deleted.

If the query will select all records that should exist in the target, set Incremental load to off. Records that are not selected will be deleted in the target.

Best practice

When you add a SQL transformation, a primary key is not defined automatically. Add a key to the dataset in the Dataset tab.
Do not create column headers in the transformed output manually.
Avoid using SELECT with *, as this may return different columns each time the query is run. If there was a change in the schema, change the query accordingly and perform Describe table again. Then you can edit columns and adjust tables.

Did this page help you?

If you find any issues with this page or its content – a typo, a missing step, or a technical error – please let us know!

Leave your feedback here