
Replicating data with a Standard, Premium, or Enterprise subscription

To set up a replication task:

  1. In Data Integration > Projects, click Create project.

  2. In the New project dialog, do the following:

    1. Provide a Name for your project.
    2. Select the Space in which you want the project to be created.
    3. Optionally, provide a Description.
    4. Select Replication as the Use case.
    5. Optionally, clear the Open check box if you want to create an empty project without configuring any settings.
    6. Click Create.

      One of the following will occur:

      • If the Open check box in the New project dialog was selected (the default), the project will open.
      • If you cleared the Open check box in the New project dialog, the project will be added to your list of projects. You can open the project later by selecting Open from the project's menu.
  3. After the project opens, click Replicate data.

    The Replicate data wizard opens.

  4. In the General tab, specify a name and description for the replication task. Then click Next.

  5. In the Select source connection tab, select a connection to the source data. You can optionally edit the connection settings by selecting Edit from the menu in the Actions column.

    If you have not yet created a connection to your data source, you need to create one by clicking Create connection in the top right of the tab.

    You can filter the list of connections using the filters on the left. Connections can be filtered according to source type, gateway, space, and owner. The All filters button above the connections list shows the number of current filters. You can use this button to close or open the Filters panel on the left. Currently active filters are also shown above the list of available connections.

    You can also sort the list by selecting Last modified, Last created, or Alphabetical from the drop-down list on the right. Click the arrow to the right of the list to change the sorting order.

    After you have selected a data source connection, click Test connection in the top right of the tab (optional, but recommended), and then click Next.

  6. In the Select datasets tab, select tables and/or views to include in the replication task. You can also use wildcards and create selection rules as described in Selecting data from a database.

  7. In the Select target connection tab, select the target from the list of available connections and then click Next. In terms of functionality, the tab is the same as the Select source connection tab described earlier.

  8. In the Settings tab, optionally change the following settings and then click Next.

    Information note: When replicating from SaaS application sources, the Full load replication mode is enabled by default and cannot be disabled.
    • Full load: Loads the data from the selected source tables to the target platform and creates the target tables if necessary. The full load occurs automatically when the task is started, but can also be performed manually should the need arise.
    • Apply changes: Keeps the target tables up-to-date with any changes made to the source tables.

    • Store changes: Stores the changes to the source tables in Change Tables (one per source table).

      For more information, see Store changes.
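      As a rough sketch of what Store changes produces, a change table conceptually records each captured change alongside the changed row. The table and column names below are hypothetical, for illustration only, and do not reflect Qlik's actual change table schema:

```sql
-- Hypothetical change table for a source table "orders".
-- Names and header columns are illustrative only.
CREATE TABLE orders__ct (
    change_seq  BIGINT,        -- ordering of captured changes
    change_oper CHAR(1),       -- I = insert, U = update, D = delete
    order_id    INTEGER,       -- replicated source columns follow
    amount      NUMERIC(12,2)
);
```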

    When working with Data Movement gateway, changes are captured from the source in near real-time. When working without Data Movement gateway (by setting Data gateway to None in the connector settings), changes are captured according to the scheduler settings. The default change capture interval is every six hours. For more information, see Scheduling tasks when working without Data Movement gateway.

    Information note: When replicating to data warehouse targets, you cannot select which Apply changes mode to use. Changes will always be applied in Batch optimized mode for maximum efficiency.

    Changes are applied to the target tables using one of the following methods:

    • Batch optimized: This is the default. When this option is selected, changes are applied in batches. A preprocessing action occurs to group the transactions into batches in the most efficient way.
    • Transactional: Select this option to apply each transaction individually, in the order it is committed. In this case, strict referential integrity is ensured for all tables.
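    To make the difference concrete, the following is a hedged sketch of how the two modes might apply the same captured changes. The table and column names are hypothetical, and the actual statements generated are target-specific:

```sql
-- Captured source changes, in commit order:
--   1. INSERT order 42 (amount 10.00)
--   2. UPDATE order 42 (amount 12.50)
--   3. DELETE order 17

-- Transactional: each change is applied individually, in commit order.
INSERT INTO orders (order_id, amount) VALUES (42, 10.00);
UPDATE orders SET amount = 12.50 WHERE order_id = 42;
DELETE FROM orders WHERE order_id = 17;

-- Batch optimized: the batch is preprocessed so that only the net
-- effect of the grouped changes reaches the target.
INSERT INTO orders (order_id, amount) VALUES (42, 12.50);
DELETE FROM orders WHERE order_id = 17;
```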

    When replicating to certain data warehouse targets, you need to set a staging area. Data is processed and prepared in the staging area before being transferred to the warehouse.

    Either select an existing staging area or click Create new to define a new staging area and follow the instructions in Connecting to cloud storage.

    To edit the connection settings, click Edit. To test the connection (recommended), click Test connection.

    For information on which staging areas are supported with which data warehouses, see the Supported as a staging area column in Target platform use cases and supported versions.

    Information note: This setting is only relevant when accessing data sources without Data Movement gateway. For information on the benefits of Data Movement gateway and use cases that require it, see Qlik Data Gateway - Data Movement.
    • Replicate data every: You can schedule how often to capture changes from the data source and set a Start time and Start date. If the source datasets support CDC (Change data capture), only the changes to the source data will be replicated and applied to the corresponding target tables. If the source datasets do not support CDC (for example, Views), changes will be applied by reloading all the source data to the corresponding target tables. If some of the source datasets support CDC and some do not, two separate sub-tasks will be created (assuming the Apply changes or Store changes replication options are selected): one for reloading the datasets that do not support CDC, and the other for capturing the changes to datasets that do support CDC.

      The task setup wizard allows you to schedule a basic hourly interval. After you have completed setting up the task, you can explore different scheduling options, as described in Scheduling tasks when working without Data Movement gateway.

    For information about minimum scheduling intervals according to data source type and subscription tier, see Minimum allowed scheduling intervals.

  9. In the Summary tab, a visual of the data pipeline is displayed. If some of the selected datasets do not support CDC, two pipelines will be displayed: one for the CDC task and the other for the Reload task. Choose whether to Open the <name> data task or Do nothing, and then click Create.

    Depending on your choice, either the task will be opened or a list of projects will be displayed.

  10. If you chose to open the task, the Datasets tab will show the structure and metadata of the selected source tables. This includes all explicitly listed tables as well as tables that match the selection rules.

    If you want to add more tables from the data source, click Select source data.

  11. Optionally, change the task settings as described in Data replication task settings.

  12. You can perform transformations on the datasets, filter data, or add columns.

    For more information, see Managing datasets.

  13. When you have added the transformations that you want, you can validate the datasets by clicking Validate datasets. If the validation fails, resolve the errors before proceeding.

    For more information, see Validating and adjusting the datasets.

  14. When you are ready, click Prepare to catalog the data task and prepare it for execution.

  15. When the data task has been prepared, click Run.

    For information on recovering tasks and other methods of running tasks, see Advanced run options.

  16. The replication task should now start, and you can see its progress in Monitor. For more information, see Monitoring an individual data task.

Setting load priority for datasets

You can control the load order of datasets in your data task by assigning a load priority to each dataset. This can be useful, for example, if you want to load smaller datasets before large datasets.

  1. Click Load priority.

  2. Select a load priority for each dataset.

    The default load priority is Normal. Datasets will be loaded in the following order of priority:

    • Highest

    • Higher

    • High

    • Normal

    • Low

    • Lower

    • Lowest

    Datasets with the same priority are loaded in no particular order.

  3. Click OK.

Information note: Datasets from SaaS application sources may contain dependencies in load order. Consider this when setting the load priority.

Refreshing metadata

In the Design view of a task, you can refresh the metadata to align it with changes to the source metadata. For SaaS applications using Metadata manager, Metadata manager must be refreshed before you can refresh the metadata in the data task.

Information note: This operation only affects tables in the Design view of a task.
  1. You can either:

    • Click ..., and then Refresh metadata to refresh metadata for all datasets in the task.

    • Click ... on a dataset in Datasets, and then Refresh metadata to refresh metadata for a single dataset.

    You can view the status of the metadata refresh under Refresh metadata in the lower part of the screen. You can see when metadata was last refreshed by hovering the cursor over the info button.

  2. Prepare the data task to apply the changes.

    When you have prepared the data task and the changes are applied, the changes are removed from Refresh metadata.

You must prepare storage tasks that consume this task to propagate the changes.

If a column is removed, a transformation with Null values is added to ensure that storage will not lose historical data.
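Conceptually, the automatic Null-values transformation is equivalent to projecting the removed column as NULL, as in the following hedged sketch (hypothetical table and column names):

```sql
-- The dropped column "discount" is kept downstream, populated with NULL,
-- so storage retains the historical column.
SELECT order_id,
       amount,
       CAST(NULL AS NUMERIC(10,2)) AS discount
FROM orders;
```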

Limitations for refreshing metadata

  • If a column is dropped and a later column is renamed in the same time slot, the change is interpreted as a rename of the dropped column, provided the two columns have the same data type and data length (see the DDL sketch after this list).

    Example:  

    Before: a b c d

    After: a c1 d

    In this example, b was dropped and c was renamed to c1, and b and c have the same data type and data length.

    This will be identified as a rename of b to c1 and a drop of c.

  • A rename of the last column is not recognized, even when the last column was dropped and the column before it was renamed.

    Example:  

    Before: a b c d

    After: a b c1

    In this example, d was dropped and c was renamed to c1.

    This will be identified as a drop of c and d, and an add of c1.

  • New columns are assumed to be added at the end. If columns are added in the middle with the same data type as the next column, they may be interpreted as a drop and rename.
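For reference, the following hypothetical DDL (PostgreSQL syntax, illustrative names) reproduces the two Before/After examples above:

```sql
-- First example: columns a b c d become a c1 d.
ALTER TABLE t DROP COLUMN b;
ALTER TABLE t RENAME COLUMN c TO c1;
-- If b and c have the same data type and length, this is identified
-- as a rename of b to c1 and a drop of c.

-- Second example: columns a b c d become a b c1.
ALTER TABLE t DROP COLUMN d;
ALTER TABLE t RENAME COLUMN c TO c1;
-- Identified as a drop of c and d plus an add of c1, because a rename
-- of the last column is not recognized.
```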

Schema evolution

Schema evolution allows you to easily detect structural changes to multiple data sources and then control how those changes will be applied to your task. Schema evolution can be used to detect DDL changes that were made to the source data schema. You can also apply some changes automatically.

Information note: Schema evolution is not available with web application sources, or with a Qlik Talend Cloud Starter subscription.

For each change type, you can select which action to use to handle the change in the Schema evolution section of the task settings: apply the change, ignore the change, suspend the table, or stop task processing. Some actions are not available for all change types.

  • Apply to target

    Apply changes automatically.

  • Ignore

    Ignore changes.

  • Suspend table

    Suspend the table. The table will be displayed with an error status in Monitor.

  • Stop task

    Stop processing of the task. This is useful if you want to handle all schema changes manually. This will also stop scheduling, that is, scheduled runs will not be performed.

The following changes are supported:

  • Add column

  • Rename column

  • Change column data type

  • Drop table

  • Drop column

  • Add table that matches the selection pattern

    If you used a Selection rule to add datasets that match a pattern, new tables that meet the pattern will be detected and added.
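As a hedged illustration, the following source-side statements (PostgreSQL syntax, with hypothetical table and column names) correspond to the supported change types listed above:

```sql
ALTER TABLE orders ADD COLUMN discount NUMERIC(10,2);     -- Add column
ALTER TABLE orders RENAME COLUMN qty TO quantity;         -- Rename column
ALTER TABLE orders ALTER COLUMN price TYPE NUMERIC(12,2); -- Change column data type
DROP TABLE orders_archive;                                -- Drop table
ALTER TABLE orders DROP COLUMN legacy_flag;               -- Drop column
CREATE TABLE orders_2025 (LIKE orders);                   -- Add table that matches
                                                          -- the selection pattern
```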

For more information about task settings, see Schema evolution.

You can also get notifications about changes that are handled with schema evolution. For more information, see Setting notifications for changes in operation.

Limitations for schema evolution

The following limitations apply to schema evolution:

  • Schema evolution is only supported when using CDC as the update method.

  • When you have changed schema evolution settings, you must prepare the task again.

  • Schema evolution is not supported if you rename tables. In that case, you must refresh the metadata before preparing the task.

  • If you are designing a task, you must refresh the browser to receive schema evolution changes. You can set notifications to be alerted on changes.

  • In Landing tasks, dropping a column is not supported. Dropping a column and then adding it again will result in a table error.

  • In Landing tasks, a drop table operation will not drop the table. Dropping a table and then adding a table will only truncate the old table, and a new table will not be added.

  • Changing the length of a column is not possible for all targets; it depends on the support in the target database.

  • If a column name is changed, explicit transformations defined using that column will not take effect, as they are based on the column name.

  • The limitations for refreshing metadata also apply to schema evolution.

When capturing DDL changes, the following limitations apply:

  • When a rapid sequence of operations occurs in the source database (for instance, DDL>DML>DDL), Qlik Talend Data Integration might parse the log in the wrong order, resulting in missing data or unpredictable behavior. To minimize the chances of this happening, best practice is to wait for the changes to be applied to the target before performing the next operation.

    As an example of this, during change capture, if a source table is renamed multiple times in quick succession (and the second operation renames it back to its original name), an error that the table already exists in the target database might be encountered (see the sketch after this list).

  • If you change the name of a table used in a task and then stop the task, Qlik Talend Data Integration will not capture any changes made to that table after the task is resumed.
  • Renaming a source table while a task is stopped is not supported.

  • Reallocation of a table's Primary Key columns is not supported (and will therefore not be written to the DDL History Control table).
  • When a column's data type is changed and the (same) column is then renamed while the task is stopped, the DDL change will appear in the DDL History Control table as “Drop Column” and then “Add Column” when the task is resumed. Note that the same behavior can also occur as a result of prolonged latency.
  • CREATE TABLE operations performed on the source while a task is stopped will be applied to the target when the task is resumed, but will not be recorded as a DDL in the DDL History Control table.
  • Operations associated with metadata changes (such as ALTER TABLE, reorg, rebuilding a clustered index, and so on) may cause unpredictable behavior if they were performed either:

    • During Full Load

      -OR-

    • Between the Start processing changes from timestamp and the current time (i.e. the moment the user clicks OK in the Advanced Run Options dialog).

      Example:

      IF:

      The specified Start processing changes from time is 10:00 am.

      AND:

      A column named Age was added to the Employees table at 10:10 am.

      AND:

      The user clicks OK in the Advanced Run Options dialog at 10:15 am.

      THEN:

      Changes that occurred between 10:00 and 10:10 might result in CDC errors.

    Information note: In any of the above cases, the affected table(s) must be reloaded in order for the data to be properly moved to the target.

  • The DDL statement ALTER TABLE ADD/MODIFY <column> <data_type> DEFAULT <> does not replicate the default value to the target; the new/modified column is set to NULL (see the sketch after this list). Note that this may happen even if the DDL that added or modified the column was executed in the past. If the new/modified column is nullable, the source endpoint updates all the table rows before logging the DDL itself. As a result, Qlik Talend Data Integration captures the changes but does not update the target. Because the new/modified column is set to NULL, if the target table has no Primary Key or Unique Index, subsequent updates will generate a "zero rows affected" message.
  • Modifications to TIMESTAMP and DATE precision columns will not be captured.
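Two of the limitations above can be illustrated concretely. The statements below are a hedged sketch in PostgreSQL syntax, with hypothetical table and column names:

```sql
-- Rapid rename sequence: if the second rename is processed before the
-- first has been applied to the target, the target may report that the
-- table already exists.
ALTER TABLE orders RENAME TO orders_tmp;
ALTER TABLE orders_tmp RENAME TO orders;

-- DEFAULT values are not replicated: on the target, the new column is
-- created but populated with NULL rather than 'active'.
ALTER TABLE employees ADD COLUMN status VARCHAR(10) DEFAULT 'active';
```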

Handling changes that are not automatically applied

This section describes how to handle changes that cannot be applied to the target automatically, that is, when the selected action is Ignore, Suspend table, or Stop task.

For each DDL change, the actions are as follows:

  • Add column

    To implement the change: Refresh metadata, prepare the task, and then run the task.

    To preserve the current state: No action required.

  • Create table

    To implement the change: Refresh metadata, prepare the task, and then run the task.

    To preserve the current state: No action required.

  • Change column data type

    To implement the change: Check limitations. If no limitation applies, refresh metadata, prepare the task, and then run the task.

    To preserve the current state: This is not always supported. Possible workarounds:

      • Change the column data type manually. This needs to be repeated in downstream tasks.

      • Create a new column and direct the old column to the new one, then manually create a view that combines the two columns (see the sketch after this list).

  • Rename table

    To implement the change: Check limitations. If no limitation applies, prepare the task, and then run the task.

    To preserve the current state: Define an explicit rename rule to the old name.

  • Rename column

    To implement the change: Check limitations. If no limitation applies, refresh metadata, prepare the task, and then run the task.

    To preserve the current state: Refresh metadata, then define an explicit rename rule to the old name.
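The view workaround mentioned above might look like the following hedged sketch (hypothetical names), where an old INTEGER column was superseded by a new NUMERIC column after a data type change:

```sql
-- Expose the old and new columns as a single combined column.
CREATE VIEW orders_combined AS
SELECT order_id,
       COALESCE(amount_new, CAST(amount_old AS NUMERIC(12,2))) AS amount
FROM orders;
```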

Limitations and considerations when replicating data

Transformations are subject to the following limitations:

  • Transformations are not supported for columns with right-to-left languages.
  • Transformations cannot be performed on columns that contain special characters (e.g. #, \, /, -) in their name.

  • The only supported transformation for LOB/CLOB data types is to drop the column on the target.
  • Using a transformation to rename a column and then add a new column with the same name is not supported.

Changing the nullability of columns that are moved is not supported, either directly or using a transformation rule. However, new columns created in the task are nullable by default.
