
Landing data in a data lake

You can set up a Land data in data lake task to land data in an Amazon S3 bucket. Although you can configure your source and Amazon S3 target connections in the task setup wizard, it is recommended that you configure them before you set up the task to simplify the setup procedure.

For information on configuring a connection to your Amazon S3 bucket, see Amazon S3 target.

For information on configuring connections to your data sources, see Connecting to data sources.
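
Before you configure the Amazon S3 target connection, it can be useful to confirm that your credentials can actually write to the bucket. The following is a minimal sketch using the AWS SDK for Python (boto3) and is not part of the Qlik setup itself; the bucket name, region, and test key are hypothetical placeholders for your own values.

```python
# Minimal pre-flight check (assumes boto3 is installed and AWS credentials are
# available via the environment, a named profile, or an instance role).
# The bucket name and region are hypothetical -- replace them with your own.
import boto3
from botocore.exceptions import ClientError

BUCKET = "my-landing-bucket"   # hypothetical bucket used by the Amazon S3 target connection
REGION = "eu-west-1"           # hypothetical region

s3 = boto3.client("s3", region_name=REGION)

try:
    # Confirms that the bucket exists and is reachable with these credentials.
    s3.head_bucket(Bucket=BUCKET)
    # Confirms write access by creating and then removing a small test object.
    s3.put_object(Bucket=BUCKET, Key="qlik-landing-access-check.tmp", Body=b"ok")
    s3.delete_object(Bucket=BUCKET, Key="qlik-landing-access-check.tmp")
    print(f"Write access to s3://{BUCKET} confirmed.")
except ClientError as error:
    print(f"Bucket check failed: {error}")
```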

To set up a data lake landing task:

  1. Click the Add new button in the top right and then select Create data project from the drop-down menu.

  2. In the New data project dialog, do the following:

    1. Provide a Name for your project.
    2. Select the Space in which you want the project to be created.
    3. Optionally, provide a Description.
    4. Select Replication as the Use case.
    5. Optionally, clear the Open check box if you want to create an empty project without configuring any settings.
    6. Click Create.

      One of the following will occur:

      • If the Open check box in the New data project dialog was selected (the default), the project will open.
      • If you cleared the Open check box in the New data project dialog, the project will be added to your list of projects. You can open the project later by selecting Open from the project's menu.
  3. After the project opens, click Land data in data lake.

    The Land data in data lake wizard opens.

  4. In the General tab, specify a name and description for the data lake landing task. Then click Next.

  5. In the Select source connection tab, select a connection to the source data. You can optionally edit the connection settings by selecting Edit from the menu in the Actions column.

    If you don't yet have a data connection to the source data, you need to create one first by clicking Create connection in the top right of the tab.

    You can filter the list of connections using the filters on the left. Connections can be filtered according to source type, gateway, space, and owner. The All filters button above the connections list shows the number of current filters. You can use this button to close or open the Filters panel on the left. Currently active filters are also shown above the list of available data connections.

    You can also sort the list by selecting Last modified, Last created, or Alphabetical from the drop-down list on the right. Click the arrow to the right of the list to change the sorting order.

    After you have selected a data source connection, optionally click Test connection in the top right of the tab (recommended), and then click Next.

  6. In the Select datasets tab, select tables and/or views to include in the data lake landing task. You can also use wildcards and create selection rules as described in Selecting data from a database.

  7. In the Select target connection tab, select the Amazon S3 target from the list of available connections and then click Next. In terms of functionality, the tab is the same as the Select source connection tab described earlier.

  8. In the Settings tab, optionally change the following settings and then click Next.

    • Change data capture (CDC): The data lake landing task starts with a full load (during which all of the selected tables are landed). The landed data is then kept up to date using CDC technology.
    • Reload: Performs a full load of the data from the selected source tables to the target platform and creates the target tables if necessary. The full load occurs automatically when the task is started, but can also be performed manually or scheduled to occur periodically as needed.

    Select one of the following, according to which bucket folder you want the files to be written to (see the sketch after this list):

    • Default folder: The default folder format is <your-project-name>/<your-task-name>.
    • Root folder: The files will be written to the bucket directly.
    • Folder: Enter the folder name. The folder will be created during the data lake landing task if it does not exist.
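
    The sketch below is an illustration of how each option maps to an S3 object key prefix; the project, task, folder, and bucket names are hypothetical, and the actual file names written under the prefix are determined by the landing task.

    ```python
    # Illustration only -- hypothetical project, task, folder, and bucket names.
    project, task, custom_folder = "sales-project", "orders-landing", "landing/orders"

    prefixes = {
        "Default folder": f"{project}/{task}/",  # <your-project-name>/<your-task-name>
        "Root folder": "",                       # files are written directly to the bucket
        "Folder": f"{custom_folder}/",           # created during the task if it does not exist
    }

    for option, prefix in prefixes.items():
        print(f"{option:<15} -> s3://my-landing-bucket/{prefix}")
    ```
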
  9. In the Summary tab, a visual of the data pipeline is displayed. Choose whether to Open the <name> task or Do nothing. Then click Create.

    Depending on your choice, either the task will be opened or a list of projects will be displayed.

  10. If you chose to open the task, the Datasets tab will show the structure and metadata of the selected data asset tables. This includes all explicitly listed tables as well as tables that match the selection rules.

    If you want to add more tables from the data source, click Select source data.

  11. Optionally, change the task settings as described in Landing in a data lake settings.

  12. You can perform transformations on the datasets, filter data, or add columns.

    For more information, see Managing datasets.

  13. When you have added the transformations that you want, you can validate the datasets by clicking Validate datasets. If the validation fails, resolve the errors before proceeding.

    For more information, see Validating and adjusting the datasets.

  14. When you are ready, click Prepare to catalog the landing task and prepare it for execution.

  15. When the data task has been prepared, click Run.

  16. The data lake landing task should now start. You can monitor its progress in Monitor view. For more information, see Landing data in a data lake.
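
    If you want to confirm that files are being landed where you expect, you can also list the objects under the task's folder prefix directly in Amazon S3. This optional check is performed outside of Qlik; the sketch below assumes the default folder option from step 8, and the bucket name and prefix are hypothetical placeholders.

    ```python
    # Optional check outside of Qlik: list landed objects under the task's prefix.
    import boto3

    BUCKET = "my-landing-bucket"               # hypothetical bucket from the S3 target connection
    PREFIX = "sales-project/orders-landing/"   # hypothetical <project-name>/<task-name>/ prefix

    s3 = boto3.client("s3")
    response = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX, MaxKeys=20)

    for obj in response.get("Contents", []):
        print(obj["Key"], obj["Size"], obj["LastModified"])
    ```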
