Creating a data flow
Start by creating a new data flow.
-
From the launcher menu, select Analytics > Create or Analytics > Prepare data.
-
Click Data flow.
The Create a new data flow dialog opens.
-
In the corresponding field, enter a Name for your data flow.
-
From the corresponding drop-down list, select the Space in which you want to save the data flow.
-
Add a Description to document the purpose of the data flow.
-
Add some Tags to the data flow to make it easier to find.
-
Optionally select the Open data flow check box to directly view the data flow once it’s created.
-
Click Create.
Your empty data flow opens and you reach the Overview tab of the navigation header. The new data flow can also be found later in the Analytics > Home page of Qlik Talend Cloud.
Four tabs are available in a data flow overview:
-
Summary, where you can see general information on the data flow's creation, updates, and status. You can also check the Output section of the page, listing the resulting files of the data flow, and the Inputs section, listing the different sources used at the start of your data flow. For more information on the outputs and inputs, you can click Open Impact Analysis or Open Lineage.
-
Notifications, where you can configure which events regarding the data flow will trigger notifications.
-
Run history, where you can see a list of the previous data flow runs, with the status, time, and duration of each run. Click View to display a summary and optionally download a detailed log.
-
Published copies, where you can see in which spaces the data flow was published.
To start designing your data flow, go to the Editor tab of the navigation header.
Selecting a source
The first building block of your data flow is the source that contains the data you want to prepare. You can use any data file from your catalog or upload new files.
To select a source for your data flow:
-
From the Sources tab of the left panel, drag a Datasets source and drop it on the canvas.
The Data catalog window opens, where you can browse for previously uploaded datasets, or click Upload data file to browse for files on your computer and upload them on the fly.
-
Using the search and filters, select the check box in front of one or more datasets from your list and click Next.
-
In the Summary, you can review the datasets you have selected, check the fields they contain, and exclude some if you want. Click Load into data flow.
The source is added to the canvas, with a warning saying you need to connect it to other nodes.
Adding processors
Processors are the building blocks that contain the different preparation functions available in a data flow. They receive the incoming data and return the prepared data to the next step of the flow. Processors allow you to perform complex extraction, improvement, and cleaning operations on diverse data with a live preview. See the full list of Data flow processors for more information on the available functions.
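Conceptually, a chain of processors is just function composition: each step takes rows in and returns prepared rows. The sketch below illustrates this idea in plain Python; it is not Qlik's implementation, and the sample data and processor names are invented for illustration.

```python
# Hypothetical input rows, as a small source dataset might look.
rows = [
    {"name": "  alice ", "country": "fr"},
    {"name": "BOB", "country": "de"},
    {"name": None, "country": "us"},
]

# Each "processor" takes a table (list of rows) and returns a prepared table.
def trim_and_title(rows):
    return [dict(r, name=r["name"].strip().title()) if r["name"] else r
            for r in rows]

def drop_missing_names(rows):
    return [r for r in rows if r["name"]]

def uppercase_country(rows):
    return [dict(r, country=r["country"].upper()) for r in rows]

# Running the flow is composing the processors in order; each intermediate
# result is what a live preview would show at that step.
prepared = rows
for processor in (trim_and_title, drop_missing_names, uppercase_country):
    prepared = processor(prepared)

print(prepared)
```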
To connect a first processor to your data source:
-
You can either:
-
From the Processors tab of the left panel, drag the processor of your choice and drop it on the canvas next to your source.
You will need to manually connect the source and the processor. Create a link by clicking the dot on the right of the source node, holding, and dragging the link to the dot on the left of the processor node.
-
Click the action menu of the source, select Add processor, and click the processor of your choice.
The processor is placed on the canvas and automatically connected to the source.
-
-
Click the processor to start configuring it in the right panel.
The different functions available, and the parameters to configure depend on each processor. See the individual processor documentation for more information.
-
Click Save.
-
Add and connect as many processors as needed to prepare your data.
Activate the Data preview switch in the Preview panel to see the effects of a processor on a sample of your data. Click the cog icon to open the preview Settings and configure the sample size up to 10,000 rows. You can also activate the Script switch to look at the Qlik Script equivalent of your data flow at this point.
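The reason the preview works on a sample is that applying every processor to the full source would be slow on large data. A minimal sketch of this sampling idea, with an invented generator standing in for a large source:

```python
import itertools

SAMPLE_SIZE = 100  # the preview settings allow a sample size up to 10,000 rows

def all_rows():
    # Stand-in for a large source: rows are generated lazily, so the
    # full dataset is never materialized for the preview.
    for i in range(1_000_000):
        yield {"id": i, "value": i * 2}

def preview(rows, processor, n=SAMPLE_SIZE):
    # Apply the processor to only the first n rows so feedback stays fast.
    sample = list(itertools.islice(rows, n))
    return processor(sample)

def double_value(rows):
    return [dict(r, value=r["value"] * 2) for r in rows]

shown = preview(all_rows(), double_value)
print(len(shown), shown[0])
```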
Selecting a target
To end the data flow, you need to connect the last processor to a target node. You can choose between two target types:
-
Data files, for files stored in Qlik Cloud.
-
Connections, to write to an external source added as a connection in Qlik Cloud.
Both options allow you to export the prepared data as a .qvd, .parquet, .txt or .csv file.
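For the delimited formats, the export step amounts to serializing the prepared rows with the right delimiter. A minimal sketch, using a hypothetical `write_rows` helper that picks the delimiter from the target extension (.qvd and .parquet need dedicated libraries and are omitted here):

```python
import csv
import io

def write_rows(rows, extension):
    # Hypothetical helper: tab-delimited for .txt, comma-delimited otherwise.
    delimiter = "\t" if extension == ".txt" else ","
    buf = io.StringIO()  # a real target would be a file in Qlik Cloud
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), delimiter=delimiter)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

prepared = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
print(write_rows(prepared, ".csv"))
```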
To connect a target to the rest of the flow:
-
You can either:
-
From the Targets tab of the left panel, drag the target type of your choice, and drop it on the canvas next to the last processor.
Manually connect the last processor to the target in the same way you connected processors previously.
-
Click the action menu of the last processor, select Add target, and click the target of your choice.
Information note: In the case of Data files, the target is placed on the canvas and automatically connected to the processor. In the case of Connections, you need to select an existing connection or create a new one before the target appears on the canvas.
-
Click the target to start configuring it in the right panel.
-
Click Save.
With at least one source, one target, and any number of optional processors, the data flow can now be run.
Running the data flow
When all the nodes of your data flow are connected, configured, and marked as OK, a green check mark shows that the data flow is considered valid and can be run. At this point, it is possible to use the Preview script button on the top right of the canvas to look at the full script that will be generated behind the scenes.
-
Click Run flow to start processing the data.
-
A notification opens to show the status of the run.
-
When the flow has successfully completed, the prepared data that has been output can be found in different places, depending on the target:
-
In your Catalog among your other assets, and in the Outputs section of the data flow Overview, for data files.
-
In the Outputs section of the data flow Overview for connection-based datasets.
-
You can now freely use this prepared data as a clean source to feed an AutoML experiment, or in a visualization app.
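Putting the steps above together, a complete run is: read the source, apply the processors in order, write to the target, and record a status like the one shown in Run history. A minimal end-to-end sketch, with all function names and sample data invented for illustration:

```python
import time

def source():
    # Stand-in for a dataset selected from the catalog.
    return [{"name": " ada ", "score": "10"},
            {"name": "LIN", "score": "7"}]

def clean_names(rows):
    return [dict(r, name=r["name"].strip().title()) for r in rows]

def score_as_int(rows):
    return [dict(r, score=int(r["score"])) for r in rows]

def target(rows):
    # Stand-in for writing a data file or a connection-based dataset.
    return {"written": len(rows)}

def run_flow():
    start = time.time()
    rows = source()
    for processor in (clean_names, score_as_int):
        rows = processor(rows)
    result = target(rows)
    # Status, row count, and duration mirror what Run history records.
    return {"status": "Succeeded", "rows": result["written"],
            "duration": time.time() - start}

print(run_flow())
```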