
Landing data from data sources

The first step in transferring data is landing it. This involves continuously transferring data from the on-premises data source to a landing area.

You can land data from a number of data sources through source connections.

The landing area is defined when you create the data project. It can be one of the following:

  • Qlik Cloud (via Amazon S3)

    When you land data to Qlik Cloud (via Amazon S3), you can use it to generate QVD tables ready for analytics in Qlik Cloud.

  • Cloud data warehouse

    When you land data to a cloud data warehouse, such as Snowflake or Azure Synapse Analytics, you can store tables in the same cloud data warehouse.

Create and configure a landing data task

This section describes how to create a landing data task. The quickest way to create a data pipeline is to onboard data, which creates a landing data task and a storage data task, ready to prepare and run. For more information, see Onboarding data.

  1. Click Add new in Qlik Cloud Data Integration home, and select Land data.
  2. In the Land data dialog, enter a name and a description of the data task.

    Select Open to open the landing data task when it is created.

    Click Create.

  3. Click Select source data.

  4. Select a data connection to the source data and click Next.

    You can use the filters in the left panel to filter the list of connections by source type, space, and owner.

    If you don't have a data connection to the source data yet, you need to create one first, by clicking Add connection.

    For more information about setting up a connection to the supported sources, see Connecting to data sources.

    Information note: After you have selected tables in the next step, it is not possible to change the source data connection from an on-premises data source to a cloud data source, or vice versa. You can only change the connection to another data source of the same type.

  5. Select tables and views to include in the data asset. The selection dialog is different depending on which type of source you have connected to.

    When you are done selecting tables, click Finish.

    Datasets is displayed.

  6. Optionally, change the settings for the landing.

    • Click Settings.

    For more information about settings, see Landing settings.

  7. You can now preview the structure and metadata of the selected data asset tables. This includes all explicitly listed tables, and tables that match the selection rules.

    If you want to add more tables from the data source, click Select source data.

  8. You can perform transformations on the datasets, filter data, or add columns.

    For more information, see Managing datasets.

  9. When you have added the transformations that you want, you can validate the datasets by clicking Validate datasets. If the validation finds errors, fix the errors before proceeding.

    For more information, see Validating and adjusting the datasets.

  10. When you are ready, click Prepare to catalog the data task and prepare it for execution.

  11. When the data task is prepared, and you are ready to start replicating data, click Run.

The replication should now start, and you can see the progress in Monitor. For more information, see Monitoring the landing task.

Selecting data from a database

You can select specific tables or views, or use selection rules to include or exclude groups of tables.

Information note: If the selection includes views, CDC is not supported.

Use % as a wildcard to define selection criteria for schemas and tables.

  • %.% defines all tables in all schemas.

  • Public.% defines all tables in the schema Public.

Selection criteria gives you a preview based on your selections.
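
The wildcard behavior described above can be illustrated with a minimal Python sketch. It is not Qlik code: the function name matches, the sample table list, and the choice of case-sensitive matching are all assumptions made for illustration.

```python
import re


def matches(pattern: str, schema: str, table: str) -> bool:
    """Return True if schema.table matches a pattern such as 'Public.%'.

    Assumption: matching is case-sensitive and % stands for any run of
    characters, including an empty one.
    """
    schema_pattern, table_pattern = pattern.split(".", 1)

    def to_regex(part: str) -> str:
        # Escape literal text, then let each % match any sequence of characters.
        return ".*".join(re.escape(piece) for piece in part.split("%"))

    return (
        re.fullmatch(to_regex(schema_pattern), schema) is not None
        and re.fullmatch(to_regex(table_pattern), table) is not None
    )


# Hypothetical source tables as (schema, table) pairs.
tables = [("Public", "orders"), ("Public", "customers"), ("Sales", "orders")]

all_tables = [t for t in tables if matches("%.%", *t)]
public_only = [t for t in tables if matches("Public.%", *t)]
```

Here all_tables contains every pair, while public_only contains only the two tables in the schema Public.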

You can now either:

  • Create a rule to include or exclude a group of tables based on the selection criteria.

    Click Add rule from selection criteria to create a rule, and select either Include or Exclude.

    You can see the rule under Selection rules.

  • Select one or more datasets, and click Add selected datasets.

    You can see the added datasets under Explicitly selected datasets.

Selection rules only apply to the current set of tables and views, not to tables and views that are added in the future.

Running a landing task with Full load and CDC

You can run the landing task when it is prepared. This starts the replication which transfers data from the on-premises data source to the landing area.

  • Click Run to start landing data.

The replication should now start, and the data asset will have the status Running. First, the full data source is copied. After that, changes are continuously tracked and transferred as they are discovered, which keeps the data in the landing area up to date.

In Qlik Cloud Data Integration home you can view the status, the date and time when the landing data was last updated, and the number of tables in error. You can also open the data asset and select the Tables tab to view basic metadata information for the tables.

You can monitor progress in detail by opening the Monitor tab. For more information, see Monitoring the landing task.

When all tables are loaded and the first set of changes has been processed, the Data is updated to field on the data asset card shows the time up to which source changes are available in the data task.

Running a landing data task with Reload and compare

You can copy data using the landing data task when it is prepared.

  • Click Run to start the full load.

The data now starts being copied, and the data task will have the status Running. When the full data source has been copied, the status changes to Completed.

In Qlik Cloud Data Integration home you can view the status, the date and time when the landing data was last updated, and the number of tables in error. You can also open the data asset and select the Tables tab to view basic metadata information for the tables.

You can monitor progress in detail by opening the Monitor tab. For more information, see Monitoring the landing task.

When all tables are loaded, the Data is updated to field on the data task card shows the time up to which source changes are available in the data asset. However, some tables in the data task may be updated to a later time, depending on when they started loading. This means that data consistency is not guaranteed. For example, if the load started at 08:00 and took 4 hours, Data is updated to shows 08:00 when the load is completed. However, a table that started reloading at 11:30 will include source changes that occurred between 08:00 and 11:30.

Data is updated to reflects only tables that loaded successfully. It says nothing about tables whose reloads failed. In cloud targets, the field is empty if a reload completed with all tables in error.
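
The timestamp rule above can be modeled with a short Python sketch. This is an illustration only: the function name data_is_updated_to, the table names, and the state strings are assumptions, and the product's actual calculation is not documented here.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


def data_is_updated_to(load_started: datetime,
                       table_states: Dict[str, str]) -> Optional[datetime]:
    """Model of the field: the load start time, or None (empty) when
    every table ended in error."""
    if any(state == "Completed" for state in table_states.values()):
        return load_started
    return None


# The example from the text: the load starts at 08:00 and one table
# only starts reloading at 11:30. The date itself is arbitrary.
load_started = datetime(2024, 1, 1, 8, 0)
table_started = {
    "orders": load_started,
    "invoices": load_started + timedelta(hours=3, minutes=30),
}

# Source changes each table may contain beyond the reported timestamp.
extra_window = {name: started - load_started
                for name, started in table_started.items()}

reported = data_is_updated_to(
    load_started, {"orders": "Completed", "invoices": "Completed"})
```

In this model, reported is 08:00 even though the invoices table may already contain three and a half hours of later source changes, which is why consistency across tables is not guaranteed.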

Reloading data when using Reload and compare

When you use full load without CDC, you need to reload the data to keep it up to date with the data source. You can do this in either of two ways:

  • Click Run to perform a manual reload of data.

  • Set up a scheduled reload.

Scheduling a Reload and compare landing data task

You can schedule periodic reloads for the landing data task if you have the Can operate role in the space of the data task. The data task status must be at least Prepared for the schedule to be active.

  • Click ... on a data task and select Scheduling.

    You can set a time-based schedule.

Information note: If a data task is still reloading when a scheduled reload is about to start, the scheduled reload is skipped until the next scheduled reload event.
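
The skip behavior in the note can be simulated with a minimal Python sketch. The function scheduled_runs and its parameters are hypothetical, and the sketch assumes a fixed interval and a fixed reload duration, which a real task would not have.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def scheduled_runs(first: datetime, interval: timedelta,
                   reload_duration: timedelta,
                   events: int) -> List[Tuple[datetime, str]]:
    """Return (scheduled time, action) for a number of schedule events.

    An event is skipped when the previous reload is still running.
    """
    results = []
    busy_until = first  # nothing is running before the first event
    for i in range(events):
        scheduled = first + i * interval
        if scheduled < busy_until:
            results.append((scheduled, "skipped"))
        else:
            results.append((scheduled, "started"))
            busy_until = scheduled + reload_duration
    return results


# Hourly schedule, with each reload taking 90 minutes.
runs = scheduled_runs(datetime(2024, 1, 1, 8, 0), timedelta(hours=1),
                      timedelta(minutes=90), events=4)
actions = [action for _, action in runs]
```

With these assumed values, every second event is skipped, because the reload that started an hour earlier has not finished yet.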

Monitoring the landing task

You can monitor the status and progress of the landing data task by clicking Monitor. The update method of the landing task can be either Reload and compare or Change data capture (CDC).

Reload and compare monitoring details

You can view the following details for the landing task in Full load status:

  • Queued - the number of tables currently queued.

  • Loading - the number of tables currently being loaded.

  • Completed - the number of tables completed.

  • Error - the number of tables in error.

You can view the following details for each table in the landing task:

  • Name

    The name of the target table in the landing task.

  • State

    Table state will be either: Queued, Loading, Completed, or Error.

  • Started

    The time that loading started.

  • Ended

    The time that loading ended.

  • Duration

    Duration of the load in format hh:mm:ss.

  • Number of records

    The number of records that were replicated during the load.

  • Message

    Displays the error message if the load was not processed successfully.

Change data capture (CDC) monitoring details

You can view the following CDC details for the landing task to monitor change processing in CDC status:

  • Incoming changes - the number of changes present at the source and waiting to be processed. You can view how many are accumulated and how many are being applied.

  • Processed changes - the number of changes that have been processed and applied (in the last 24 hours).

  • Throughput - the average target throughput in kilobytes per second. This indicates how fast the change records are loaded to the target endpoint.

  • Latency - the current latency of the data asset (hh:mm:ss). This duration represents the time from when the change is available in the source until the change is applied and available in the target or landing asset.
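
As an illustration of how the latency and throughput figures relate to the underlying timestamps, here is a minimal Python sketch. The function names are hypothetical, and treating one kilobyte as 1024 bytes is an assumption; the page does not say which definition the monitor uses.

```python
from datetime import datetime, timedelta


def format_hms(delta: timedelta) -> str:
    """Format a duration as hh:mm:ss, the format used in the monitor."""
    total_seconds = int(delta.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def latency(available_in_source: datetime,
            applied_in_target: datetime) -> timedelta:
    """Time from a change being available in the source to being applied."""
    return applied_in_target - available_in_source


def throughput_kb_per_second(bytes_loaded: int, elapsed: timedelta) -> float:
    """Average throughput, assuming 1 kilobyte = 1024 bytes."""
    return (bytes_loaded / 1024) / elapsed.total_seconds()


change_latency = latency(datetime(2024, 1, 1, 10, 0, 0),
                         datetime(2024, 1, 1, 10, 1, 35))
latency_text = format_hms(change_latency)
rate = throughput_kb_per_second(614_400, timedelta(seconds=60))
```

With these sample values, latency_text is 00:01:35 and rate is 10.0 kilobytes per second.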

You can view the following details for each table in the landing task:

  • Name

    The name of the target table in the landing asset.

  • State

    Table state will be either: Accumulating changes or Error.

  • Last processed

    The date and time when the last changes were made to the table.

  • Inserts

    The number of insert operations.

  • Updates

    The number of update operations.

    Information note: Updates are handled as inserts for SaaS application sources.

  • Deletes

    The number of delete operations.

  • Message

    Displays the error message if changes to the table fail and are not processed.

If you are landing data from an on-premises source and chose Full load mode, the tables are automatically reloaded when the landing asset is run.

If you are landing data from an on-premises source and chose Full load and CDC mode, the tables will be continuously updated with new data after the initial full load.

Landing settings

You can set properties for the landing data task.

  • Click Settings.

General settings

  • Database

    Database to use in the target.

    Information note: This option is not available when landing data to Qlik Cloud (via Amazon S3).

  • Data task schema

    You can change the name of the landing data task schema. The default name is landing.

    Information note: This option is not available when landing data to Qlik Cloud (via Amazon S3).

  • Prefix for all tables and views

    You can set a prefix for all tables and views created with this task.

    Information note: You must use a unique prefix when you want to use a database schema in several data tasks.

  • Update method

    You can land data in two different modes. It is not possible to change the mode once the landing data asset is prepared.

    • Change data capture (CDC)

      The landing starts with a full load. The landed data is then kept up to date using CDC (Change Data Capture) technology. CDC may not be supported by all data sources. CDC does not capture DDL operations, such as renaming columns, or changes in metadata.

    • Reload and compare

      The landing performs only full loads from the source. This is useful if your source does not support CDC, but it can be used with any supported data source.

      You can schedule the reloads periodically.

  • Proxy server when using Data Movement gateway

    You can choose to use a proxy server when the Data Movement gateway connects to the cloud data warehouse and the storage area.

    For more information about configuring the Data Movement gateway to use a proxy server, see Setting the Qlik Cloud tenant and a proxy server.

    • Use proxy to connect to cloud data warehouse

      Information note: Available when using Snowflake, Google BigQuery, and Databricks.

    • Use proxy to connect to storage

      Information note: Available when using Azure Synapse Analytics, Amazon Redshift, and Databricks.

  • Folder to use

    You can select which folder to use when landing data.

    Information note: This option is only available when landing data to Qlik Cloud (via Amazon S3).

    • Default folder

      This creates a folder with the default name: <project name>/<data task name>.

    • Root folder

      Store data in the root folder of the storage.

    • Folder

      Specify a folder name to use.
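
The three folder options can be summarized in a small Python sketch. The function landing_folder and the option keys "default", "root", and "folder" are hypothetical names chosen for illustration; only the default pattern <project name>/<data task name> comes from the description above.

```python
def landing_folder(option: str, project_name: str, task_name: str,
                   custom_folder: str = "") -> str:
    """Return the folder path, relative to the storage root, for an option."""
    if option == "default":
        # Default folder: <project name>/<data task name>
        return f"{project_name}/{task_name}"
    if option == "root":
        # Root folder: store data directly in the root of the storage.
        return ""
    if option == "folder":
        if not custom_folder:
            raise ValueError("A folder name is required for the Folder option.")
        return custom_folder.strip("/")
    raise ValueError(f"Unknown folder option: {option}")


default_path = landing_folder("default", "sales_project", "land_orders")
root_path = landing_folder("root", "sales_project", "land_orders")
custom_path = landing_folder("folder", "sales_project", "land_orders",
                             custom_folder="/landing/orders/")
```

Stripping the leading and trailing slashes from a custom folder name is a convenience of this sketch, not documented product behavior.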

Runtime settings

  • LOB (Large Object)

    You can choose to include LOB columns, and set the maximum LOB size. LOBs that are larger than the maximum size will be truncated.

    Information note: When using Azure Synapse Analytics as a target, maximum LOB size cannot be higher than 7 MB.

  • Parallel execution

    You can set the maximum number of data connections for full loads to a number from 1 to 5.

  • For initial load

    You can set how to perform the initial full load for SaaS application data sources.

    Information note: This option requires Data Movement gateway version 2022.11.74 or later.

    • Use cached data

      This option lets you use cached data that was read when generating metadata with Full data scan selected.

      This creates less overhead regarding API use and quotas, as the data has already been read from the source. Any changes since the initial data scan can be picked up by Change data capture (CDC).

    • Load data from source

      This option performs a new load from the data source. This option is useful if:

      • The metadata scan was not performed recently.

      • The source dataset is small and frequently changing, and you do not want to maintain a full history of changes.

  • Read changes every (Minutes)

    Set the interval between reading changes from the source in minutes. The valid range is 1 to 1440.

    Information note: This option is only available for data tasks with the update method Change data capture (CDC).

  • Change processing interval

    You can set the interval between processing changes from the source.

    Information note: This option is only available when landing data to Qlik Cloud (via Amazon S3).
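
The numeric limits listed in the runtime settings can be collected in a short validation sketch. This is Python written for illustration, not a Qlik API: the RuntimeSettings class, its field names, and the string values for the target and update method are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RuntimeSettings:
    # Hypothetical field names that mirror the settings described above.
    target: str
    update_method: str                 # "cdc" or "reload_and_compare"
    max_lob_size_mb: int
    parallel_connections: int
    read_changes_every_min: Optional[int] = None


def validate(settings: RuntimeSettings) -> List[str]:
    """Return a list of problems; an empty list means the values are in range."""
    problems = []
    if not 1 <= settings.parallel_connections <= 5:
        problems.append("Parallel execution must be between 1 and 5.")
    if (settings.target == "Azure Synapse Analytics"
            and settings.max_lob_size_mb > 7):
        problems.append("Maximum LOB size cannot be higher than 7 MB "
                        "for Azure Synapse Analytics.")
    if settings.read_changes_every_min is not None:
        if settings.update_method != "cdc":
            problems.append("Read changes every (Minutes) is only "
                            "available with CDC.")
        elif not 1 <= settings.read_changes_every_min <= 1440:
            problems.append("Read changes every (Minutes) must be "
                            "between 1 and 1440.")
    return problems


def truncate_lob(value: bytes, max_bytes: int) -> bytes:
    """LOBs larger than the maximum size are truncated."""
    return value[:max_bytes]


ok = validate(RuntimeSettings("Snowflake", "cdc", 8, 5, 60))
bad = validate(RuntimeSettings("Azure Synapse Analytics",
                               "reload_and_compare", 8, 6, 5))
```

In this sketch, ok is an empty list, while bad reports three problems: too many parallel connections, a LOB size above the Azure Synapse Analytics limit, and a read interval set on a task that does not use CDC.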

Operations on the landing data task

You can perform the following operations on a landing data task from the task menu.

  • Open

    This opens the landing data task. You can view the table structure and details about the data task.

  • Edit

    You can edit the name and the description of the task, and add tags.

  • Delete

    You can delete the data task.

    The following objects are not deleted, and need to be deleted manually:

    • The data in the landing area.

  • Run

    You can run the data task to start copying data.

    For more information, see Running a landing task with Full load and CDC and Running a landing data task with Reload and compare.

  • Stop

    You can stop a data task that is running. While the task is stopped, the landing area is not updated with changed data.

    When you stop a full load data task with a reload schedule, only the current reload is stopped. If the data task status is Stopped and there is an active reload schedule, it will reload again at the next scheduled time. To prevent this, turn off the reload schedule in Schedule reload.

  • Reload

    You can perform a manual reload of a data task in full load mode.

  • Prepare

    This prepares a task for execution. This includes:

    • Validating that the design is valid.

    • Creating or altering the physical tables and views to match the design.

    • Generating the SQL code for the data task.

    • Creating or altering the catalog entries for the task output datasets.

  • Recreate tables

    This recreates the datasets from the source.

    You must also recreate all downstream data tasks that consume this data task.

  • Scheduling

    You can set up a scheduled reload for landing data tasks in Full load mode. You can set a time-based schedule that can be customized.

    You can also turn scheduled reloads on or off.

    You must have the Can operate role on the space of the data task to schedule reloads.

  • Store data

    You can create a storage data task that uses data from this landing data task.

Maintenance of the landing area

Automatic cleanup of the landing area is not supported, and accumulated data can affect performance. We recommend that you perform manual cleanups of old full load data in the landing area.

  • Qlik Cloud (via Amazon S3)

    If there are several folders of full load data, you can delete all but the most recent folder. You can also delete change data partitions that have been processed.

  • Cloud data warehouse

    You can delete full load and change table records that have been processed.

Limitations

  • Source and landing connection properties such as credentials, SSL, and proxy are propagated to the landing only when the task is cataloged. If the source or landing connection credentials change, the new credentials are not propagated to the landing when the task is stopped, resumed, or recovered. This means that the landing might fail and need to be recreated in order to recover.

  • Replicating varchar data longer than 8000 bytes, or nvarchar data longer than 4000 bytes, is not supported.
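
A pre-flight check for the length limitation could look like the following minimal Python sketch. The function unsupported_columns and the (name, data type, maximum length in bytes) tuple layout are hypothetical; how you obtain column metadata depends on your source database.

```python
from typing import List, Tuple

# Limits from the limitation above, in bytes.
LIMITS_IN_BYTES = {"varchar": 8000, "nvarchar": 4000}


def unsupported_columns(columns: List[Tuple[str, str, int]]) -> List[str]:
    """Return the names of columns whose declared length exceeds the limit.

    Each column is a (name, data type, maximum length in bytes) tuple.
    """
    flagged = []
    for name, data_type, max_bytes in columns:
        limit = LIMITS_IN_BYTES.get(data_type.lower())
        if limit is not None and max_bytes > limit:
            flagged.append(name)
    return flagged


# Hypothetical column metadata.
columns = [
    ("notes", "varchar", 12000),
    ("title", "nvarchar", 4000),
    ("body", "NVARCHAR", 4002),
    ("id", "int", 4),
]
flagged = unsupported_columns(columns)
```

Here flagged contains notes and body. The title column sits exactly at the limit and is not flagged, because the limitation applies to data longer than the stated sizes.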
