Early Access: The content on this website is provided for informational purposes only in connection with pre-General Availability Qlik Products.
All content is subject to change and is provided without warranty.

Landing data to Qlik Open Lakehouse

Data lands in Amazon S3, ready for the storage data task to convert it into the Iceberg open table format. You can land data from any source supported by Qlik.

Landing data to a Qlik Open Lakehouse requires a pre-configured Amazon S3 bucket. Qlik Open Lakehouse is specifically optimized for high-volume, streaming data sources, and is compatible with all Qlik-supported data sources. Data lands in CSV format in S3. The Storage data task converts the data to the Iceberg open table format, stored as Parquet files. The Iceberg specification enables the data to be queried from any engine that natively supports Trino SQL, such as Amazon Athena, Ahana, or Starburst Enterprise. Optionally, tables can be mirrored to Snowflake, where they can be queried without duplicating data.
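To make the flow above concrete, here is a minimal sketch of what a landed CSV change batch could look like and how it would be parsed before conversion. The column names and change-type values are illustrative assumptions only; the actual file layout Qlik writes to S3 is internal to the product.

```python
import csv
import io

# Hypothetical landed CSV batch, as it might appear in the S3 bucket.
# Header names and change types here are assumptions for illustration.
landed_csv = """\
change_type,order_id,customer,amount
INSERT,1001,acme,250.00
INSERT,1002,globex,99.50
UPDATE,1001,acme,275.00
"""

# The Storage data task consumes batches like this and rewrites them as
# Iceberg metadata plus Parquet data files; here we only parse the batch.
records = list(csv.DictReader(io.StringIO(landed_csv)))
```

Once converted to Iceberg, the same rows become queryable from any Trino-compatible engine rather than being read as raw CSV.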

Landing data to a Qlik Open Lakehouse is available in projects with an AWS Glue Data Catalog target connection.

Preparations

  • To mirror data to Snowflake, you must first create a Qlik Open Lakehouse project to ingest your data and store it using the Iceberg open table format. You can add a Mirror data task after the Storage data task. To perform data transformations, create a Snowflake project that uses the Qlik Open Lakehouse project as the source. For more information, see Mirroring data to a cloud data warehouse.

  • You can configure your source and target connection settings in the task setup wizard, but to simplify the setup procedure, it is recommended to configure them before you create the task.

Creating a Lake landing task

To create a lake landing task, do the following:

  1. Create a project, and select Data pipeline in Use case.

  2. Select Qlik Open Lakehouse in Data platform and establish a connection to the data catalog.

  3. Set up a storage area in Landing target connection.

  4. Click Create to create the project.

When you onboard data or create a landing task in the project, a Lake landing task is created instead of a Landing task. Lake landing tasks operate and behave mostly like Landing tasks, except that they land data in cloud storage. For more information, see Landing data from data sources.

All files are landed in CSV format. Whenever the landed data is updated, the storage task that consumes the landing task updates the external tables.
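The update cycle above can be sketched as a mini-batch upsert: each run of the storage task folds the newly landed change records into the current table state, keyed by primary key. This is an assumed simplification for illustration, not Qlik's implementation; the `op` values and record shape are hypothetical.

```python
def apply_batch(table, batch):
    """Fold one mini-batch of change records into the table state.

    `table` maps primary key -> row; `batch` is a list of change
    records with an assumed shape: {"op", "id", "data"}.
    """
    for rec in batch:
        if rec["op"] == "DELETE":
            table.pop(rec["id"], None)
        else:
            # INSERT and UPDATE both upsert the row under its key.
            table[rec["id"]] = rec["data"]
    return table

# Two consecutive mini-batch runs, as the storage task might apply them.
state = {}
apply_batch(state, [
    {"op": "INSERT", "id": 1, "data": {"name": "acme"}},
    {"op": "INSERT", "id": 2, "data": {"name": "globex"}},
])
apply_batch(state, [
    {"op": "UPDATE", "id": 1, "data": {"name": "acme corp"}},
])
```

In the real pipeline the "table state" is an Iceberg table backed by Parquet files, and each run produces a new table snapshot rather than mutating rows in place.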

Settings

For more information about task settings, see Lake landing settings.

Limitations

  • Because the storage task runs every minute, landed data is not partitioned in the bucket, and the data partition frequency cannot be changed in the task settings.

  • Although landing data from SaaS sources runs on a schedule, the storage task runs mini-batches every minute. This requires a continuously active lakehouse cluster, which incurs a minimal cost.

  • If a Primary Key value changes, records with the original key are marked as Deleted, and the row containing the changed key value is marked as Insert.
