Amazon S3

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.

This section gives a high-level look at this connector, including useful links and supported features.

Feature/Capability	Support details
Supported Qlik Talend Data Integration projects	Replication projects only. Data pipeline projects are not supported.
Target update methods	Replication tasks: Apply changes, Store changes. Landing data in a data lake tasks: Change data capture (CDC).
Managing metadata	Manual metadata generation is not required.
Schema evolution	Only the Change column data type operation is supported.
Replication of LOB columns (NCLOB, CLOB, and BLOB)	Not supported.
Scheduled CDC	Required. This is how the target is kept up to date with changes to the source. For replication tasks, see Scheduling CDC for replication tasks. For lake landing tasks, see Scheduling CDC for lake landing tasks.
Notifications	Partially supported. See Setting notifications for changes in operation.
Monitoring	CDC-only, as full load is not relevant for this connector. See Monitoring an individual data task.
Automatic denesting of JSON column payloads	Not supported. JSON column payloads in source datasets are not denested automatically on the target.

Preparing for authentication

To access your data, you need to authenticate the connection with your account credentials.

Make sure that the account you use has read access to the tables you want to fetch.

To connect to Amazon S3, you need permissions in AWS Identity and Access Management (IAM) that allow you to create policies, create roles, and attach policies to roles. These permissions are required to authorize access to your S3 bucket, which involves two steps:

Creating an IAM policy
Creating an IAM role

Creating an IAM policy

An IAM policy is a JSON document, written in the IAM access policy language, that manages permissions to bucket resources.

Amazon S3 permissions
Permission name	Operation	Description
s3:GetObject	GET Object	Allows for the retrieval of objects from Amazon S3.
s3:GetObject	HEAD Object	Allows for the retrieval of metadata from an object without returning the object itself.
s3:ListBucket	GET Bucket (List Objects)	Allows for the return of some or all (up to 1,000) of the objects in a bucket.
s3:ListBucket	HEAD Bucket	Used to determine if a bucket exists and access is allowed.

To create the IAM policy:

In AWS, navigate to the IAM service by clicking the Services menu and typing IAM.
Click IAM once it displays in the results.
Click Policies in the menu on the left side of the page.
Click Create Policy.
In the Create Policy page, click the JSON tab.
Select everything currently in the text field and delete it.

In the text field, paste the following JSON and replace MyBucketName with the name of your bucket:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::MyBucketName",
                "arn:aws:s3:::MyBucketName/*"
            ]
        }
    ]
}

Click Review policy.
On the Review Policy page, give the policy a name. For example: qlik_amazon_s3.
Click Create policy.
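
If you set up several buckets, the policy document above can be generated instead of edited by hand. The following is a minimal Python sketch, not part of Qlik or AWS tooling: the function name `build_read_policy` is hypothetical, and the sketch only produces the JSON text to paste into the JSON tab.

```python
import json


def build_read_policy(bucket_name: str) -> str:
    """Return the read-only policy from this page as JSON text for one bucket.

    build_read_policy is a hypothetical helper used for illustration only.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    # The bucket itself, which s3:ListBucket applies to.
                    f"arn:aws:s3:::{bucket_name}",
                    # Every object in the bucket, which s3:GetObject applies to.
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }
    return json.dumps(policy, indent=4)


if __name__ == "__main__":
    # Replace MyBucketName with the name of your bucket.
    print(build_read_policy("MyBucketName"))
```

Both resource entries are needed because the bucket-level and object-level permissions apply to different ARNs.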

Creating an IAM role

To complete this step, you need the following AWS IAM permissions: CreateRole and AttachRolePolicy. Refer to Amazon’s documentation for more info.

If you are creating multiple Amazon S3 integrations, you need to complete this step for each integration you are connecting.

In AWS, navigate to the IAM Roles page.
Click Create Role.
On the Create Role page:
1. In the Select type of trusted entity section, click the Another AWS account option.
2. In the Account ID field, paste 338144066592.
3. In the Options section, select the Require external ID checkbox.
4. In the External ID field that displays, paste qlik_connection_<tenant-id> and replace <tenant-id> with your tenant ID.
  To find your tenant ID, see Finding tenant information.
5. Click Next: Permissions.
On the Attach permissions page:
1. Search for the policy you created in Creating an IAM policy.
2. Once located, check the box next to it in the table.
3. Click Next: Tags.
If you want to enter any tags, do so on the Add tags page. Otherwise, click Next: Review.
On the Review page:
1. In the Role name field, paste qlik_s3_<tenant-id> and replace <tenant-id> with your tenant ID.
  To find your tenant ID, see Finding tenant information.
2. Enter a description in the Role description field. For example: Qlik role for Amazon S3 integration.
3. Click Create role.
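
For reference, choosing Another AWS account together with an external ID typically results in a role trust policy along the following lines. This Python sketch only builds that document so you can compare it with what the AWS console shows on the role's trust relationships. The function name `build_trust_policy` and the tenant ID in the usage example are hypothetical, and the exact JSON that AWS generates may differ.

```python
import json

# Account ID given in the role creation steps on this page.
QLIK_ACCOUNT_ID = "338144066592"


def build_trust_policy(tenant_id: str) -> dict:
    """Build the trust policy expected for the role (illustrative assumption)."""
    external_id = f"qlik_connection_{tenant_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The trusted account that is allowed to assume the role.
                "Principal": {"AWS": f"arn:aws:iam::{QLIK_ACCOUNT_ID}:root"},
                "Action": "sts:AssumeRole",
                # The role can only be assumed when this external ID is supplied.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # "my-tenant-id" is a placeholder; use your own tenant ID.
    print(json.dumps(build_trust_policy("my-tenant-id"), indent=4))
```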

Defining the search pattern

The Search Pattern field defines the search criteria Qlik should use for selecting and replicating files. This field accepts regular expressions, which can be used to include a single file or multiple files.

When creating a search pattern, keep the following in mind:

When including multiple files for a single table, each file should have the same header row values.
Special characters such as periods (.) have special meaning in regular expressions. To match them literally, escape them with a backslash. For example: \.
Qlik uses Python for regular expressions, which may vary in syntax from other varieties. Try using PyRegex to test your expressions before saving the integration.
Search patterns should account for how data in files is updated. Consider these examples:

Scenario	Single file, periodically updated	Multiple files, generated daily
How updates are made	A single JSONL file is periodically updated with new and updated customer data.	A new CSV file is created every day that contains new and updated customer data. Old files are never updated after they're created.
File name	`customers.jsonl`	`customers-[STRING].csv`, where `[STRING]` is a unique, random string
Search pattern	Because there will only ever be one file, you could enter the exact name of the file in your S3 bucket: `customers\.jsonl`	To ensure new and updated files are identified, you'd want to enter a search pattern that would match all files beginning with `customers`, regardless of the string in the file name: `(customers-).*\.csv`
Matches	`customers.jsonl`, exactly	`customers-reQDSwNG6U.csv` `customers-xaPTXfN4tD.csv` `customers-MBJMhCbNCp.csv` etc.
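
The two search patterns in the table can be tried locally with Python's `re` module before saving the integration. This is a minimal sketch: the file names other than those in the table are made up for illustration, and the use of `re.search` is an assumption, since this page does not state which matching function Qlik applies.

```python
import re

# Sample object names; "customersXjsonl" and "orders-2024.csv" are made up.
file_names = [
    "customers.jsonl",
    "customersXjsonl",
    "customers-reQDSwNG6U.csv",
    "customers-xaPTXfN4tD.csv",
    "orders-2024.csv",
]


def matching_files(pattern, names):
    """Return the names in which the pattern is found (re.search is an assumption)."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.search(name)]


# Escaped period: only the literal file name matches.
single = matching_files(r"customers\.jsonl", file_names)

# Daily files: any name containing "customers-" followed later by ".csv".
daily = matching_files(r"(customers-).*\.csv", file_names)

# Unescaped period: "." matches any character, so "customersXjsonl" also matches.
unescaped = matching_files(r"customers.jsonl", file_names)

if __name__ == "__main__":
    print(single)
    print(daily)
    print(unescaped)
```

The third pattern shows why periods need escaping: without the backslash, an unintended file is selected.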

File requirements

First-row header (CSV files only)	Every file must have a first-row header containing column names. The first row in any file is treated as the header row, and its values are presented as the columns available for selection. The Amazon S3 integration allows you to map several files to a single target table, and header row values are used to determine a table’s schema. For the best results when including multiple files in a table, each file should have the same header row values. This is different from configuring multiple tables. See Defining the search pattern for examples.
File types	CSV (`.csv`), Text (`.txt`), JSONL (`.jsonl`)
Compression types	gzip compressed files (`.gz`). These files must be correctly compressed or errors will surface during extraction.
Delimiters (CSV files only)	Comma (`,`), Tab (`\t`), Pipe (`|`), Semicolon (`;`)
Character encoding	UTF-8
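
Some of these requirements can be checked locally before files are uploaded. The following Python sketch reads the first-row header of CSV files supplied as bytes and confirms that all files intended for one table share the same header values. The helper names `read_header` and `headers_match` are hypothetical, and treating any name ending in `.gz` as gzip-compressed is an assumption made for this example.

```python
import csv
import gzip
import io


def read_header(raw: bytes, delimiter: str = ",", compressed: bool = False) -> list:
    """Return the first-row header of a CSV file given as bytes."""
    if compressed:
        # Raises an error if the data is not valid gzip.
        raw = gzip.decompress(raw)
    # Raises UnicodeDecodeError if the content is not UTF-8.
    text = raw.decode("utf-8")
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    return next(reader)


def headers_match(files: dict, delimiter: str = ",") -> bool:
    """Return True when every file has the same header row values.

    files maps a file name to its raw bytes; names ending in ".gz"
    are treated as gzip-compressed (an assumption of this sketch).
    """
    headers = {
        tuple(read_header(raw, delimiter, compressed=name.endswith(".gz")))
        for name, raw in files.items()
    }
    return len(headers) == 1


if __name__ == "__main__":
    sample = {
        "customers-a.csv": b"id,name\n1,Ada\n",
        "customers-b.csv.gz": gzip.compress(b"id,name\n2,Grace\n"),
    }
    print(headers_match(sample))
```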

Creating the connection

For more information, see Connecting to SaaS applications.

Fill in the required connection properties.
Provide a name for the connection in Connection name.
Select Open connection metadata to define metadata for the connection when it has been created.
Click Create.

Connection settings
Setting	Description
Data gateway	Select a Data Movement gateway if required by your use case. Note: This field is not available with the Qlik Talend Cloud Starter subscription, as it does not support Data Movement gateway. If you have another subscription tier and do not want to use Data Movement gateway, select None. For information on the benefits of Data Movement gateway and use cases that require it, see Qlik Data Gateway - Data Movement.
Start Date	Enter the date, in the format `MM/DD/YYYY`, from which the data must be replicated from your source to your target.
S3 Bucket	Name of the S3 bucket.
AWS Account ID	The external ID in AWS. See Preparing for authentication. The pattern is: `qlik_connection_<tenant-id>`.
Search pattern	Enter the files to include in your table. You can enter a single file name or a regular expression. Examples: `users\.csv`, `products\.jsonl`.
Directory	Limits the search to this directory path. When defined, only files in this location are searched, and only those that match the search pattern are selected. You cannot use a regular expression. Example: csv-exports-folder or employee_jsonl_exports.
Table configuration	Configure a table by specifying the files you want to include. You can configure multiple tables.
Table name	Table name. Each target has its own rules for how tables can be named. For example, Amazon Redshift table names cannot exceed 127 characters.
Primary key	Enter the primary key to identify unique rows or records. When you enter more than one key, use commas to separate the values. For CSV files, enter the header fields or column names. For JSONL files, enter the attribute names or object keys. Example: id, name.
Specify datetime fields	Enter the values that must appear as datetime instead of string in your table. Example: created_at, modified_at.
Delimiter	Select the delimiter from the drop-down list.
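
Before filling in the form, it can help to check that values follow the formats described in the table above. The following Python sketch validates a Start Date in `MM/DD/YYYY` format and splits a comma-separated list such as the primary key or datetime fields. The function names are hypothetical, and the sketch does not reflect how Qlik itself parses these settings.

```python
from datetime import date, datetime


def parse_start_date(value: str) -> date:
    """Parse a Start Date in MM/DD/YYYY format; raises ValueError otherwise."""
    return datetime.strptime(value, "%m/%d/%Y").date()


def split_field_list(value: str) -> list:
    """Split a comma-separated setting such as "id, name" into clean names."""
    return [part.strip() for part in value.split(",") if part.strip()]


if __name__ == "__main__":
    print(parse_start_date("01/31/2024"))
    print(split_field_list("id, name"))
    print(split_field_list("created_at, modified_at"))
```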
