Early Access: The content on this website is provided for informational purposes only in connection with pre-General Availability Qlik Products.
All content is subject to change and is provided without warranty.
Skip to main content Skip to complementary content

Verifying datasets

Once the initial replication process is complete in Qlik Talend Cloud, the new destination schema will contain all data replicated from your sources. Any historical data replicated through Stitch will still exist in your data warehouse under a different schema. Some data that was replicated in the Stitch schema might not be available in the Qlik Talend Cloud target schema for any of the following reasons:

  • The historical data is no longer available.

  • Different connector versions leverage different APIs for an integration, and over time, data replicated through those APIs changes.

  • Source data is deleted over time.

The following verification queries are included to help you identify differences between the datasets that were replicated with Qlik Talend Cloud and those that were replicated with Stitch. Run these example queries against your Qlik Talend Cloud destination schema and your Stitch destination schema, and then compare and identify any differences between them.

Information noteThe verification queries are meant as examples to guide you on how to validate your datasets. You might need to adjust them to your specific needs.

Example query using hash row-level comparison

The easiest way to compare the output of each of the schemas (Stitch and Qlik Talend Cloud) is to use a hash row-level comparison to find mismatches in the data.

Warning noteAlthough the following method offers the best results, it is also the most computationally expensive method and might result in extra consumption in your data warehouse.

In the example below, the generated hash is based on a few key elements of the data, such as primary_key and a date row, but it can be extended to include more columns to compare as highlighted in column_1 and column_2 in the hash.

For each query, replace the values stitch_schema.stitch_table and qtc_schema.qtc_table with your Stitch schema name and Qlik Talend Cloud schema name.

  1. Generate an MD5 Hash for tables in the existing Stitch schema.

    SELECT id, HASH_MD5([primary_key] || '|' || [column_1] || '|' || [column_2] || '|' || [date]) AS hash_value FROM [stitch_schema.stitch_table];
  2. Generate MD5 Hash for tables in the new Qlik Talend Cloud schema.

    SELECT id, HASH_MD5([primary_key] || '|' || [column_1] || '|' || [column_2] || '|' || [date]) AS hash_value FROM [qtc_schema.qtc_table];
  3. Compare the two hashes to identify data inconsistencies:

    SELECT id FROM (SELECT id, HASH_MD5([primary_key] || '|' || [column_1] || '|' || [column_2] || '|' || [date]) AS hash FROM [stitch_schema.stitch_table) stitch
    JOIN (SELECT id, HASH_MD5([primary_key] || '|' || [column_1] || '|' || [column_2] || '|' || [date]) AS hash FROM [qtc_schema.qtc_table]) qtc ON stitch.id = qtc.id WHERE stitch.hash <> qtc.hash

The comparison will return data when there are differences across both schemas. You can extend this query to make the comparison more comprehensive by including more columns to hash.

Information noteDo not include the system columns generated by Stitch in the Stitch schemas as this will cause a mismatch with all hash comparisons.

Example query comparing the structural integrity across schemas

Alternatively, you can compare and verify datasets between Stitch and Qlik Talend Cloud by running a structural validation query on the migrated datasets. This type of query compares the structural integrity across the existing Stitch schema and the new Qlik Talend Cloud schema to identify potential differences.

To effectively run structural validation on the migrated datasets, you need to compare high-level information from both schemas together. This includes comparing row and column counts from each of the migrated tables, and comparing primary keys, null values, and foreign keys in each table.

For each query, replace the values stitch_schema.stitch_table and qtc_schema.qtc_table with your Stitch schema name and Qlik Talend Cloud schema name.

  1. Compare the primary key columns that are generated in each schema using the following example queries.

    1. Stitch

    2. SELECT COLUMN_NAME, DATA_TYPE
      FROM INFORMATION_SCHEMA.COLUMNS
      WHERE TABLE_NAME = '[stitch_schema.stitch_table]';
    3. Qlik Talend Cloud

    4. SELECT COLUMN_NAME, DATA_TYPE
      FROM INFORMATION_SCHEMA.COLUMNS
      WHERE TABLE_NAME = '[qtc_schema.qtc_table]';
  2. Compare rows and columns across each table.

    1. Stitch

      SELECT COUNT(*) FROM [stitch_schema.stitch_table]; -- From Stitch's target (for example, Snowflake)
    2. Qlik Talend Cloud

      SELECT COUNT(*) FROM [qtc_schema.qtc_table]; -- From Qlik Talend Cloud's target
  3. Check for missing records that are not available in the Qlik Talend Cloud schema.

    SELECT id FROM [stitch_schema.stitch _table] WHERE id NOT IN (SELECT id FROM [qtc_schema.qtc _table]);
Information noteThe queries in this example are meant to guide you on best practices and to help identify potential differences between data that was loaded by the two platforms. Some differences might be minimal, but others might require an adjustment to downstream reporting or data modeling. These queries were written using a Snowflake data warehouse. You might need to edit the syntax to reflect your destination warehouse.

Once you have the query reports, compare the results to identify any major changes that affect your dataset now that you have transitioned to Qlik Talend Cloud. If the reports show minimal differences, you can continue with the process of pointing your downstream reporting to the new schemas without any issue. If the reports show major differences with your new destination schema, you will need to adjust downstream reporting and modeling capabilities to work with the destination schema that will be kept up-to-date by the Qlik Talend Cloud replication tasks after the migration.

Visit the discussion forum at community.qlik.com

Did this page help you?

If you find any issues with this page or its content – a typo, a missing step, or a technical error – let us know how we can improve!