Lake landing settings
The following Lake landing task settings apply to Qlik Open Lakehouse projects.
To configure settings for a Lake landing data task, open the landing task and click Settings in the toolbar.
The Settings: <Task-Name> dialog opens. The available settings are described below.
General
Update method
The landing task always starts with a Full Load. After the Full Load completes, you can keep the landed data up-to-date using one of the following methods:
- Change data capture (CDC)
The landed data is kept up-to-date using CDC (Change Data Capture) technology. CDC may not be supported by all data sources, and it does not capture DDL operations such as renaming columns or changes in metadata.
If your data also contains views or tables that do not support CDC, two data pipelines are created: one with all tables that support CDC, and another with all other tables and views, using Reload as the update method.
- Reload
All of the landed data is reloaded from the source. This is useful if your source does not support CDC, but can be used with any supported data source. You can schedule the reloads periodically.
Information note: This setting is not available when using a SaaS application Preview connector. Preview connectors are indicated by a button, both in the Create connection dialog and in the online help.
Folder to use
Select which folder to use when landing data to the staging area.
- Default folder
This creates a folder with the default name: <project name>/<data task name>.
- Root folder
Store data in the root folder of the storage.
- Folder
Specify a folder name to use.
Data uploading
File attributes
Configure the file settings for uploading data.
- Field delimiter
The field delimiter character used in the source data file. The default is ",".
- Null value
An optional default string to replace null values in the source data file.
- Record delimiter
The record delimiter character used in the source data file. The default is "\n".
- Quote character
The quote character used in the source data file. The default is """.
- Quote escape character
The quote escape character used in the source data file. The default is """.
- Maximum file size (KB)
The maximum size a file can reach before it is closed. Smaller files might be uploaded faster (depending on the network) and improve performance when used in conjunction with the parallel execution option. However, it is generally considered bad practice to clutter the database with small files.
- Compress files using
When selected, the CSV files are compressed using GZIP.
Information note: Requires Data Movement gateway 2023.5.16 or later.
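As a rough illustration of how these attributes map onto the staged CSV files, the sketch below uses Python's standard csv and gzip modules with the defaults listed above. It is not the product's internal writer; the file name and sample rows are made up.

```python
import csv
import gzip

# Hypothetical sample rows; None models a source NULL.
rows = [
    ["1", "Acme, Inc.", None],
    ["2", 'Says "hi"', "ok"],
]

# Defaults from the settings above: "," field delimiter, "\n" record delimiter,
# '"' as both quote and quote escape character, and GZIP compression when
# "Compress files using" is selected.
with gzip.open("landing_segment.csv.gz", "wt", newline="") as f:
    writer = csv.writer(
        f,
        delimiter=",",
        lineterminator="\n",
        quotechar='"',
        doublequote=True,   # escape quotes by doubling them, e.g. "" inside a field
        quoting=csv.QUOTE_MINIMAL,
    )
    for row in rows:
        # "Null value": replace source NULLs with an optional default string (empty here).
        writer.writerow(["" if v is None else v for v in row])
```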
Metadata
LOB columns
- Include LOB columns and limit column size to (KB)
You can choose to include LOB columns in the task and set the maximum LOB size. LOBs that are larger than the maximum size will be truncated.
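For example, with an 8 KB limit, truncation behaves roughly as in the following sketch (the limit value and function name are illustrative, not product code):

```python
LOB_LIMIT_KB = 8  # illustrative value for "limit column size to (KB)"

def truncate_lob(value: bytes) -> bytes:
    """LOBs larger than the configured maximum size are cut down to that size."""
    limit = LOB_LIMIT_KB * 1024
    return value[:limit]
```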
Full load
Performance tuning
- Maximum number of tables to load in parallel: Enter the maximum number of tables to load into the target at one time. The default value is 5 (see the sketch after this list).
- Transaction consistency timeout (seconds): Enter the number of seconds to wait for open transactions to close before beginning the Full Load operation. The default value is 600 (10 minutes). The full load will start after the timeout value is reached, even if there are transactions that are still open.
Information note: To replicate transactions that were open when Full Load started but were only committed after the timeout value was reached, you need to reload the target tables.
- Commit rate during full load: The maximum number of events that can be transferred together. The default value is 10000.
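To make the parallelism setting concrete, the following minimal sketch loads at most five tables at a time. It is not how the product schedules its loads; load_table and the table names are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_TABLES_IN_PARALLEL = 5  # default for "Maximum number of tables to load in parallel"

def load_table(table_name: str) -> None:
    """Placeholder for a full load of a single table."""
    print(f"loading {table_name}")

tables = ["orders", "customers", "invoices", "items", "events", "returns"]

# At most MAX_TABLES_IN_PARALLEL full loads run at any one time;
# the remaining tables wait for a free slot.
with ThreadPoolExecutor(max_workers=MAX_TABLES_IN_PARALLEL) as pool:
    list(pool.map(load_table, tables))
```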
After full load completes
Create primary key or unique index: Select this option if you want to delay creation of the primary key or unique index on the data platform until after full load completes.
Error handling
Data error handling is supported with the Change Data Capture (CDC) update method only.
For data truncation errors: Select what you want to happen when a truncation occurs in one or more specific records. You can select one of the following from the list:
- Ignore: The task continues and the error is ignored.
- Suspend table: The task continues, but data from the table with the error record is moved into an error state and its data is not replicated.
- Stop task: The task is stopped and manual intervention is required.
For other data errors: Select what you want to happen when an error occurs in one or more specific records. You can select one of the following from the list:
- Ignore: The task continues and the error is ignored.
- Suspend table: The task continues, but data from the table with the error record is moved into an error state and its data is not replicated.
- Stop task: The task is stopped and manual intervention is required.
Escalate error handling when other data errors reach (per table): Select this check box to escalate error handling when the number of non-truncation data errors (per table) reaches the specified amount. Valid values are 1-10,000.
Escalation action: Choose what should happen when error handling is escalated. Note that the available actions are dependent on the action selected from the For other data errors drop-down list described above.
- Suspend table (default): The task continues, but data from the table with the error record is moved into an error state and its data is not landed.
- Stop task: The task is stopped and manual intervention is required.
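Conceptually, the escalation threshold works as a per-table counter of non-truncation data errors; once the counter reaches the configured amount, the escalation action replaces the normal action. The sketch below illustrates that reading with made-up names and values; it is not the product's code.

```python
from collections import defaultdict

ESCALATION_THRESHOLD = 50            # "Escalate error handling when other data errors reach (per table)"
ON_DATA_ERROR = "ignore"             # "For other data errors"
ESCALATION_ACTION = "suspend table"  # "Escalation action"

data_errors = defaultdict(int)       # non-truncation data errors seen per table

def handle_data_error(table: str) -> str:
    """Return the action to take for a data error in the given table."""
    data_errors[table] += 1
    if data_errors[table] >= ESCALATION_THRESHOLD:
        return ESCALATION_ACTION     # suspend table (default) or stop task
    return ON_DATA_ERROR             # ignore, suspend table, or stop task
```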
Table errors
Number of retries before returning a table error
This option allows you to control when the table error handling policy is triggered. By default, after a table error is encountered, the table is suspended or the task is stopped (according to the selected action) after three retries. Sometimes a table error occurs due to planned maintenance of the SaaS application. In these cases, the default number of retries might not be enough for the maintenance to be completed before the table error handling policy is triggered. This also depends on the task's scheduling interval, as a retry is performed each time the task runs. For example, if you schedule a task to run every hour and the SaaS application is taken offline for maintenance just as the task starts to run, the default of three retries allows the SaaS application to be offline for up to three hours without the table error handling policy being triggered. A longer maintenance window would require you to increase the number of retries (or change the scheduling) to prevent the table error handling policy from being triggered.
To sum up, if you are aware that your SaaS application undergoes periodic maintenance, best practice is to increase the number of retries according to the scheduling so that the maintenance can be completed without the table error handling policy being triggered, as illustrated in the sketch after the notes below.
- This option will only be shown for tasks configured with a Lite or Standard SaaS application connector.
- If you are using Data Movement gateway, version 2024.11.70 or later is required.
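Continuing the retry guidance above: the tolerated maintenance window is roughly the number of retries multiplied by the scheduling interval, so you can size the retry count from a known maintenance window. The helper below is a back-of-the-envelope illustration only.

```python
import math

def retries_needed(maintenance_hours: float, schedule_interval_hours: float) -> int:
    """Minimum retry count so a maintenance window does not trigger table error handling."""
    return math.ceil(maintenance_hours / schedule_interval_hours)

# A task scheduled hourly with the default of 3 retries tolerates roughly 3 hours of downtime;
# a 6-hour maintenance window needs at least 6 retries (or a wider schedule).
print(retries_needed(6, 1))  # 6
```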
When encountering a table error: Select one of the following from the drop-down list:
- Suspend table (default): The task continues, but data from the table with the error record is moved into an error state and its data is not replicated.
- Stop task: The task is stopped and manual intervention is required.
Escalate error handling when table errors reach (per table): Select this check box to escalate error handling when the number of table errors (per table) reaches the specified amount. Valid values are 1-10,000.
Escalation action: The escalation policy for table errors is set to Stop task and cannot be changed.
Environmental errors
- Maximum retry count: Select this option and then specify the maximum number of attempts to retry a task when a recoverable environmental error occurs. After the task has been retried the specified number of times, the task is stopped and manual intervention is required.
To never retry a task, clear the check box or specify "0".
To retry a task an infinite number of times, specify "-1".
- Interval between retry attempts (seconds): Use the counter to select or type the number of seconds that the system waits between attempts to retry a task. Valid values are 0-2,000.
- Increase retry interval for long outages: Select this check box to increase the retry interval for long outages. When this option is enabled, the interval between each retry attempt is doubled, until the Maximum retry interval is reached (and continues retrying according to the specified maximum interval).
- Maximum retry interval (seconds): Use the counter to select or type the number of seconds to wait between attempts to retry a task when the Increase retry interval for long outages option is enabled. Valid values are 0-2,000.
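Taken together, these settings amount to a capped exponential backoff when Increase retry interval for long outages is enabled. The sketch below shows that reading; the names and the use of EnvironmentError as a stand-in for a recoverable environmental error are illustrative, not a documented API.

```python
import time

MAX_RETRY_COUNT = 5            # "Maximum retry count" (-1 = retry forever, 0 = never retry)
RETRY_INTERVAL = 5             # "Interval between retry attempts (seconds)", 0-2,000
INCREASE_FOR_LONG_OUTAGES = True
MAX_RETRY_INTERVAL = 1800      # "Maximum retry interval (seconds)", 0-2,000

def run_with_retries(run_task) -> None:
    attempt = 0
    interval = RETRY_INTERVAL
    while True:
        try:
            run_task()
            return
        except EnvironmentError:                  # stand-in for a recoverable environmental error
            attempt += 1
            if MAX_RETRY_COUNT != -1 and attempt > MAX_RETRY_COUNT:
                raise                             # task stops; manual intervention required
            time.sleep(interval)
            if INCREASE_FOR_LONG_OUTAGES:
                # Double the wait each time, capped at the maximum retry interval.
                interval = min(interval * 2, MAX_RETRY_INTERVAL)
```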
Change processing tuning
Transaction offload tuning
Offload transactions in progress to disk if: Transaction data is usually kept in memory until it is fully committed to the source or target. However, transactions that are larger than the allocated memory or that are not committed within the specified time limit will be offloaded to disk.
- Total memory size for all transactions exceeds (MB): The maximum size that all transactions can occupy in memory before being offloaded to disk. The default value is 1024.
- Transaction duration exceeds (seconds): The maximum time that each transaction can stay in memory before being offloaded to disk. The duration is calculated from the time Qlik Talend Data Integration started capturing the transaction. The default value is 60.
Batch tuning
- Minimum number of changes per transaction: The minimum number of changes to include in each transaction. The default value is 1000.
Information note: The changes will be applied to the target either when the number of changes is equal to or greater than the Minimum number of changes per transaction value, or when the Maximum time to batch transactions before applying (seconds) value described below is reached, whichever occurs first. Because the frequency of changes applied to the target is controlled by these two parameters, changes to the source records may not immediately be reflected in the target records.
- Maximum time to batch transactions before applying (seconds): The maximum time to collect transactions in batches before declaring a timeout. The default value is 1.
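Put differently, a batch is applied to the target as soon as either the change count or the time limit is reached, whichever comes first. The following minimal sketch captures that rule with illustrative names; it is not product code.

```python
import time

MIN_CHANGES_PER_TRANSACTION = 1000   # "Minimum number of changes per transaction"
MAX_BATCH_SECONDS = 1                # "Maximum time to batch transactions before applying (seconds)"

def should_apply_batch(pending_changes: int, batch_started_at: float) -> bool:
    """Apply the batch when either threshold is met, whichever occurs first."""
    enough_changes = pending_changes >= MIN_CHANGES_PER_TRANSACTION
    timed_out = (time.time() - batch_started_at) >= MAX_BATCH_SECONDS
    return enough_changes or timed_out
```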
Miscellaneous tuning
- Statements cache size (number of statements): The maximum number of prepared statements to store on the server for later execution (when applying changes to the target). The default is 50. The maximum is 200.
- DELETE and INSERT when updating a primary key column: This option requires full supplemental logging to be turned on in the source database.
Schema evolution
Select how to handle the following types of DDL changes in the schema. When you have changed schema evolution settings, you must prepare the task again. The table below describes which actions are available for the supported DDL changes.
| DDL change | Apply to target | Ignore | Suspend table | Stop task |
|---|---|---|---|---|
| Add column | Yes | Yes | Yes | Yes |
| Rename column | No | No | Yes | Yes |
| Rename table | No | No | Yes | Yes |
| Change column data type | No | Yes | Yes | Yes |
| Create table (if you used a Selection rule to add datasets that match a pattern, new tables that match the pattern will be detected and added) | Yes | Yes | No | No |
Character substitution
You can substitute or delete source characters in the target database and/or you can substitute or delete source characters that are not supported by a selected character set.
All characters must be specified as Unicode code points.
- Character substitution will also be performed on the Control tables.
- Invalid values will be indicated by a red triangle in the top right of the table cell. Hovering your mouse cursor over the triangle will show the error message.
- Any table-level or global transformations defined for the task will be performed after the character substitution has been completed.
- Substitution actions defined in the Substitute or Delete Source Characters table are performed before the substitution action defined in the Substitute or Delete Source Characters Unsupported by the Selected Character Set table.
- Character substitution does not support LOB data types.
Substituting or deleting source characters
Use the Substitute or delete source characters table to define replacements for specific source characters. This may be useful when the Unicode representation of a character is different on the source and target platforms. For example, on Linux, the minus character in the Shift_JIS character set is represented as U+2212, but on Windows it is represented as U+FF0D.
| To | Do This |
|---|---|
| Define substitution actions | |
| Edit the specified source or target character | Click |
| Delete entries from the table | Click |
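To make the mapping concrete, the following Python sketch shows the kind of code-point substitution such a table describes, using the U+2212 to U+FF0D example above. It illustrates the concept only and is not how the product applies substitutions.

```python
# Map source code points to target code points (or to None to delete them).
# Example from above: the minus sign U+2212 becomes the fullwidth hyphen-minus U+FF0D.
substitutions = {0x2212: 0xFF0D}

source_value = "\u2212100"                     # "−100" containing U+2212
target_value = source_value.translate(substitutions)
print(target_value)                            # "－100", now using U+FF0D
```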
Substituting or deleting source characters unsupported by the selected character set
Use the Unsupported source characters by character set table to define a single replacement character for all characters not supported by the selected character set.
| To | Do This |
|---|---|
| Define or edit a substitution action | |
| Disable the substitution action | Select the blank entry from the Character Set drop-down list. |
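Conceptually, this is a fallback rule: any character the selected character set cannot represent is replaced with the single substitution character. The Python sketch below illustrates the idea; the function name, the shift_jis example, and the "?" replacement are assumptions, not product behavior.

```python
def substitute_unsupported(text: str, charset: str = "shift_jis", replacement: str = "?") -> str:
    """Replace every character the selected character set cannot encode with one replacement character."""
    out = []
    for ch in text:
        try:
            ch.encode(charset)   # supported by the selected character set: keep as-is
            out.append(ch)
        except UnicodeEncodeError:
            out.append(replacement)
    return "".join(out)

print(substitute_unsupported("price: €100"))   # "price: ?100", since '€' is not in Shift_JIS
```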
More options
These options are not exposed in the UI as they are only relevant to specific versions or environments. Consequently, do not set these options unless explicitly instructed to do so by Qlik Support or product documentation.
To set an option, simply copy the option into the Add feature name field and click Add. Then set the value or enable the option according to the instructions you received.
Loading dataset segments in parallel
During full load, you can accelerate the loading of large datasets by splitting the dataset into segments, which will be loaded in parallel. Tables can be split by data ranges, all partitions, all subpartitions, or specific partitions.
For more information, see Landing dataset segments in parallel.
Scheduling CDC tasks when working without Data Movement gateway
Data Movement gateway is not supported with a Qlik Talend Cloud Starter subscription and is optional with other subscription tiers. When working without Data Movement gateway, you keep the target data up-to-date by setting a scheduling interval. The schedule determines how often the target datasets are updated with changes to the source datasets. Whereas the schedule determines the update frequency, the dataset type determines the update method. If the source datasets support CDC (Change data capture), only the changes to the source data are replicated and applied to the corresponding target tables. If the source datasets do not support CDC (for example, Views), changes are applied by reloading all of the source data to the corresponding target tables. If some of the source datasets support CDC and some do not, two separate sub-tasks are created: one for reloading the datasets that do not support CDC, and the other for capturing the changes to datasets that do support CDC. In this case, to ensure data consistency, it is strongly recommended to set the same schedule for both sub-tasks.
For information about minimum scheduling intervals according to data source type and subscription tier, see Minimum allowed scheduling intervals.
To change the scheduling:
- Open your data project and then do one of the following:
  - In tasks view, click on the data task and select Scheduling.
  - In pipeline view, click on the data task and select Scheduling.
  - Open the landing task and click the Scheduling toolbar button.
- Change the scheduling settings as needed and then click OK.