Amazon S3
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.
Preparing for authentication
To access your data, you need to authenticate the connection with your account credentials.
To connect to Amazon S3, you need permissions in AWS Identity and Access Management (IAM) that allow you to create policies, create roles, and attach policies to roles. These permissions are required to grant access to your S3 bucket.
Creating an IAM policy
An IAM policy is a JSON document that defines permissions to bucket resources. The policy created here grants the following read-only permissions:
Permission name | Operation | Description |
---|---|---|
s3:GetObject | GET Object | Allows for the retrieval of objects from Amazon S3. |
s3:GetObject | HEAD Object | Allows for the retrieval of metadata from an object without returning the object itself. |
s3:ListBucket | GET Bucket (List Objects) | Allows for the return of some or all (up to 1,000) of the objects in a bucket. |
s3:ListBucket | HEAD Bucket | Used to determine if a bucket exists and access is allowed. |
To create the IAM policy:
- In AWS, navigate to the IAM service by clicking the Services menu and typing IAM.
- Click IAM once it displays in the results.
- Click Policies in the menu on the left side of the page.
- Click Create Policy.
- In the Create Policy page, click the JSON tab.
- Select everything currently in the text field and delete it.
- In the text field, paste the following JSON and replace MyBucketName with the name of your bucket:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "",
          "Effect": "Allow",
          "Action": [
            "s3:GetObject",
            "s3:ListBucket"
          ],
          "Resource": [
            "arn:aws:s3:::MyBucketName",
            "arn:aws:s3:::MyBucketName/*"
          ]
        }
      ]
    }

- Click Review policy.
- On the Review Policy page, give the policy a name. For example: qlik_amazon_s3.
- Click Create policy.
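
If you need the same policy for several buckets, generating the JSON can help avoid copy-and-paste mistakes. The following is a minimal Python sketch that reproduces the policy document from the steps above. The function name `build_read_only_policy` is hypothetical and is not part of any AWS or Qlik API.

```python
import json


def build_read_only_policy(bucket_name):
    """Return the read-only policy document for the given bucket name."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                # The bucket ARN covers s3:ListBucket; the "/*" ARN covers
                # s3:GetObject on the objects inside the bucket.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


# "MyBucketName" is the placeholder used in the steps above.
print(json.dumps(build_read_only_policy("MyBucketName"), indent=2))
```

The printed JSON can be pasted into the JSON tab of the Create Policy page.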
Creating an IAM role
To complete this step, you need the following AWS IAM permissions: CreateRole and AttachRolePolicy. Refer to Amazon’s documentation for more information.
If you are creating multiple Amazon S3 integrations, you need to complete this step for each integration you are connecting.
- In AWS, navigate to the IAM Roles page.
- Click Create Role.
- On the Create Role page:
- In the Select type of trusted entity section, click the Another AWS account option.
- In the Account ID field, paste 338144066592.
- In the Options section, select the Require external ID checkbox.
- In the External ID field that displays, paste qlik_connection_<tenant-id> and replace <tenant-id> with your tenant ID. To find your tenant ID, see Finding tenant information.
- Click Next: Permissions.
- On the Attach permissions page:
- Search for the policy you created in Creating an IAM policy.
- Once located, check the box next to it in the table.
- Click Next: Tags.
- If you want to enter any tags, do so on the Add tags page. Otherwise, click Next: Review.
- On the Review page:
- In the Role name field, paste qlik_s3_<tenant-id> and replace <tenant-id> with your tenant ID. To find your tenant ID, see Finding tenant information.
- Enter a description in the Role description field. For example: Qlik role for Amazon S3 integration.
- Click Create role.
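
Choosing Another AWS account with an external ID in the console results in a trust policy attached to the role. As a rough illustration of what that trust relationship looks like, the Python sketch below builds an equivalent document. The account ID and the naming patterns come from the steps above. The helper names and the sample tenant ID `abc123` are hypothetical.

```python
import json

TRUSTED_ACCOUNT_ID = "338144066592"  # account ID entered in the steps above


def build_trust_policy(tenant_id):
    """Return a trust policy that requires the external ID for the tenant."""
    external_id = f"qlik_connection_{tenant_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{TRUSTED_ACCOUNT_ID}:root"},
                "Action": "sts:AssumeRole",
                # The role can only be assumed when this external ID is supplied.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


def build_role_name(tenant_id):
    """Return the role name following the qlik_s3_<tenant-id> pattern."""
    return f"qlik_s3_{tenant_id}"


# "abc123" is a made-up tenant ID for illustration only.
print(build_role_name("abc123"))
print(json.dumps(build_trust_policy("abc123"), indent=2))
```

After creating the role, you can compare this output with the Trust relationships tab of the role in the IAM console.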
Defining the search pattern
The Search Pattern field defines the search criteria Qlik should use for selecting and replicating files. This field accepts regular expressions, which can be used to include a single file or multiple files.
When creating a search pattern, keep the following in mind:
- When including multiple files for a single table, each file should have the same header row values.
- Special characters such as periods (.) have special meaning in regular expressions. To match them literally, escape them with a backslash. For example: \.
- Qlik uses Python for regular expressions, which may vary in syntax from other varieties. Try using PyRegex to test your expressions before saving the integration.
- Search patterns should account for how data in files is updated. Consider these examples:
Scenario | Single file, periodically updated | Multiple files, generated daily |
---|---|---|
How updates are made | A single JSONL file is periodically updated with new and updated customer data. | A new CSV file is created every day that contains new and updated customer data. Old files are never updated after they're created. |
File name | customers.jsonl | customers-[STRING].csv, where [STRING] is a unique, random string |
Search pattern | Because there will only ever be one file, you could enter the exact name of the file in your S3 bucket. | To ensure new and updated files are identified, you'd want to enter a search pattern that matches all files sharing the same file name prefix. |
Matches | customers.jsonl, exactly | |
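
Because search patterns are Python regular expressions, you can try candidate patterns locally with the `re` module before saving the integration. The sketch below is illustrative only: the two patterns and the sample file names are assumptions chosen to fit the scenarios in the table, and the use of `re.search` is a guess at how patterns are applied, not confirmed Qlik behavior.

```python
import re

# Hypothetical patterns for the two scenarios described above.
SINGLE_FILE = r"customers\.jsonl"   # period escaped to match a literal "."
DAILY_FILES = r"customers-.*\.csv"  # any CSV file beginning with "customers-"

# Made-up file names for illustration only.
names = [
    "customers.jsonl",
    "customersXjsonl",
    "customers-a1b2c3.csv",
    "customers-z9y8x7.csv",
    "orders-a1b2c3.csv",
]


def matching(pattern, candidates):
    """Return the candidates in which the pattern is found."""
    regex = re.compile(pattern)
    return [name for name in candidates if regex.search(name)]


print(matching(SINGLE_FILE, names))
# ['customers.jsonl']
print(matching(DAILY_FILES, names))
# ['customers-a1b2c3.csv', 'customers-z9y8x7.csv']

# An unescaped period matches any character, so this pattern is too broad.
print(matching(r"customers.jsonl", names))
# ['customers.jsonl', 'customersXjsonl']
```

The last call shows why escaping matters: without the backslash, the period also matches the `X` in `customersXjsonl`.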
File requirements
First-row header (CSV files only) | |
File types | |
Compression types | These files must be correctly compressed or errors will surface during extraction. |
Delimiters (CSV files only) | |
Character encoding | UTF-8 |
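
When several CSV files feed a single table, it can help to check up front that they are valid UTF-8 and share the same header row. The following is a minimal sketch of such a check on in-memory data. The function names and sample files are hypothetical, and a comma delimiter is assumed.

```python
import csv
import io


def read_header(raw, delimiter=","):
    """Decode raw CSV bytes as UTF-8 and return the first row as a list."""
    text = raw.decode("utf-8")  # raises UnicodeDecodeError if not valid UTF-8
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return next(reader, [])


def headers_match(files, delimiter=","):
    """Return True if every file in the mapping has the same header row."""
    headers = {tuple(read_header(raw, delimiter)) for raw in files.values()}
    return len(headers) <= 1


# Made-up file contents for illustration only.
sample = {
    "customers-a.csv": b"id,name,created_at\n1,Ana,2024-01-01\n",
    "customers-b.csv": b"id,name,created_at\n2,Bo,2024-01-02\n",
}
print(headers_match(sample))  # True
```

In practice, the bytes would come from the files in your bucket rather than from literals.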
Creating the connection
For more information, see Connecting to SaaS applications.
- Fill in the required connection properties.
- Provide a name for the connection in Connection name.
- Select Open connection metadata to define metadata for the connection when it has been created.
- Click Create.
Setting | Description |
---|---|
Data gateway | Select a Data Movement gateway if required by your use case. Note: This field is not available with the Qlik Talend Cloud Starter subscription, as it does not support Data Movement gateway. If you have another subscription tier and do not want to use Data Movement gateway, select None. For information on the benefits of Data Movement gateway and use cases that require it, see Qlik Data Gateway - Data Movement. |
Start Date | Enter the date, in the format |
S3 Bucket | Name of the S3 bucket. |
AWS Account ID | The external ID in AWS. See Preparing for authentication. The pattern is: |
Search pattern | Enter the files to include in your table. You can enter a single file name or a regular expression. Example: |
Directory | Limits the search to this directory path. When defined, only files in this location are searched, and only those that match the search pattern are selected. You cannot use a regular expression. Example: csv-exports-folder or employee_jsonl_exports. |
Table configuration | Configure a table by specifying the files you want to include. You can configure multiple tables. |
Table name | Name of the table. Each target has its own rules for how tables can be named. For example, Amazon Redshift table names cannot exceed 127 characters. |
Primary key | Enter the primary key to identify unique rows or records. When you enter more than one key, use a comma to separate the values. Example: id, name. |
Specify datetime fields | Enter the fields that must appear as datetime instead of string in your table. Example: created_at, modified_at. |
Delimiter | Select the delimiter from the drop-down list. |
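
To illustrate how the Directory and Search pattern settings work together, the sketch below filters a list of made-up object keys, first by directory prefix and then by regular expression. The function name, the sample keys, and the choice to apply the pattern to the part of the key after the directory are assumptions for illustration, not documented Qlik behavior.

```python
import re


def select_files(keys, search_pattern, directory=None):
    """Return keys under the directory whose remainder matches the pattern."""
    # Directory is a plain path prefix, not a regular expression.
    prefix = directory.strip("/") + "/" if directory else ""
    regex = re.compile(search_pattern)
    selected = []
    for key in keys:
        if not key.startswith(prefix):
            continue  # outside the configured directory
        relative = key[len(prefix):]
        if regex.search(relative):
            selected.append(key)
    return selected


# Made-up object keys for illustration only.
keys = [
    "csv-exports-folder/customers-a1.csv",
    "csv-exports-folder/orders-a1.csv",
    "archive/customers-b2.csv",
]

print(select_files(keys, r"customers-.*\.csv", directory="csv-exports-folder"))
# ['csv-exports-folder/customers-a1.csv']
print(select_files(keys, r"customers-.*\.csv"))
# ['csv-exports-folder/customers-a1.csv', 'archive/customers-b2.csv']
```

In this sketch, leaving Directory empty applies the pattern across the whole bucket, which is why the second call also returns the file under `archive/`.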