Mirroring data to a cloud data warehouse
Mirror tasks enable you to query data stored in your Qlik Open Lakehouse from Snowflake. Data is reflected in Snowflake without duplication. Mirrored tables ensure storage and engineering costs are minimal, and maintains a single source of truth.
After onboarding your data to a Qlik Open Lakehouse, you can mirror your data to Snowflake using mirror tables. Data in Iceberg can be queried using a query engine that natively supports Iceberg, such as Amazon Athena. However, mirror tables are ideal when you want to implement the open Iceberg lakehouse architecture but continue to use Snowflake as your query engine. The Mirror data task automates the process of making Iceberg tables accessible in Snowflake by declaring them as external tables. Snowflake refers to the Iceberg table as an external table because it does not manage the table, it only reads from it. External tables allows you to query your Iceberg data in Snowflake without migrating data or the management of your tables into Snowflake.
To mirror data, you create an external volume in Snowflake that points to the S3 bucket where the Iceberg tables are located and updated by the storage task. You then create a Snowflake catalog integration that points to the data catalog used by your Qlik Open Lakehouse project, for example, AWS Glue Data Catalog.
The mirror task executes the necessary DDL statements to create the external tables in Snowflake. The table (schema) is displayed in Snowflake, along with the changes and history tables, but if you look at the table definition, it displays as a view created on top of the external table. Snowflake users can query the views as if the data is stored in their Snowflake environment. Mirrored data offers high performance, as Qlik continues to manage and optimize the data.
Refresh mechanism
Snowflake points to the metadata that reflects the latest snapshot of the available data within Iceberg. There are two ways to refresh the metadata:
-
Qlik-managed: This option requires an active Snowflake warehouse, and includes monitoring and data preview. Select this option when you want to create downstream transformations, and monitor and schedule the task. Qlik owns the metadata refresh operation, so you can configure this manually, for example, to run every 30 minutes. This option is particularly relevant for multi-table transformations, as the metadata for all tables updates simultaneously. While you may lose some of the real-time gain that the Snowflake-managed refresh offers, you maintain consistency between tables. For multi-table transformations, you can trigger the refresh as often as needed. Qlik recommends that you set event-based triggering on downstream transformation tasks that follow the scheduled mirror task.
-
Snowflake-managed: A serverless operation that leverages Snowpipe infrastructure without requiring or activating a compute warehouse. This option is recommended when you do not require downstream transformations. The refresh interval is configured when you create the Snowflake catalog integration. To monitor the status of the automatic refresh, query the SYSTEM$AUTO_REFRESH_STATUS in Snowflake. Qlik loses ownership of the process and is unable to monitor tasks of this type.
Each mirror task within a project can be configured with its own refresh mechanism: if you create two mirror tasks, one can use the Qlik-managed refresh, and the other can use the Snowflake-managed refresh.
Schema evolution
If you manually add or remove columns or tables from the storage task, or indirectly from the landing task, changes are automatically reflected in the mirror task design. You must prepare the task to apply the changes to the mirrored table. If schema evolution is enabled in the mirror and storage tasks settings, any schema changes that are automatically detected in the storage task are applied to the mirror tables.
Prerequisites
A Mirror data task can only be added after a storage task has been created in a Qlik Open Lakehouse project. A storage task can have multiple Mirror data tasks. A mirror data task can only be associated with one storage task.
To mirror your data to Snowflake, you need:
-
A connection to the Snowflake database where you want to mirror your data. Optionally, you can create a new connection during the mirror task creation. The requirements can be found in the instructions for connecting to Snowflake.
-
A Snowflake external volume. This grants Snowflake with restricted access to your S3 location. To configure the volume, see Configure an external volume for Amazon S3.
-
An AWS Glue Data Catalog integration. This enables Snowflake to connect to data held in Iceberg open table format in your object store. To configure a catalog integration, see Configure a catalog integration for AWS Glue.
Creating a mirror data task
To mirror data to Snowflake, do the following:
-
Open the project that contains the storage task for the data you want to mirror.
-
Click
More actions on the storage task. Select Mirror data, and configure it:
-
Name: Enter a name for your mirror task.
-
Description: Optionally, describe the purpose of the task.
-
Connection:
-
To use an existing connection, click Select to open the Secure source connection dialog. Select the Space where your connection is located, then select the connection. Click Edit to amend the connection properties.
-
To create a new connection, click Create connection to open the Create connection dialog and follow the instructions.
-
-
Database: Enter the name of the database where you want to mirror the data.
-
Snowflake external volume: Enter the name of the external volume created in Snowflake.
-
Snowflake catalog integration: Enter the name of the catalog integration created in Snowflake.
-
Select how you want your data to be refreshed in Snowflake:
-
Qlik-managed: Select this option if you want to create downstream transformations. This requires an active Snowflake warehouse and is monitored by Qlik.
-
Snowflake-managed: Select this option when you do not want to perform downstream transformations. A Snowflake warehouse is not required and therefore not monitored by Qlik. This is administered and monitored in Snowflake.
-
Create the mirror task to add it to the storage task in your pipeline.
-
Click
More actions on the mirror task and select Open. Ensure you are displaying the Design view.
-
To select a subset of the available datasets, click Select source data and remove any unwanted datasets.
-
Click Prepare to create the external table in Snowflake and mirror the data.
Performing transformations
If you need to transform your data, you can create a Snowflake project and use a mirror data task within your Qlik Open Lakehouse project as the source. To create a transformation task using data from an existing project, see Building cross-project pipelines.
Deleting a mirror data task
When you delete a mirror task, the external tables and views are deleted from Snowflake and no longer available to query. A storage task cannot be deleted while a mirror task is reading from it.
To delete a mirror data task, do the following:
-
On the Mirror data task you want to delete, click the
More actions menu on the task and select Delete. -
In the confirmation dialog, click Delete.