Setting up connections to data catalogs
Connect to your AWS Glue Data Catalog to use it as a target in your Qlik Open Lakehouse projects.
Using AWS Glue Data Catalog as the target provides integration with the AWS analytics ecosystem and enables Qlik Open Lakehouse to interoperate with a centralized metadata store. AWS Glue Data Catalog acts as the unified metadata layer, allowing Qlik to write Iceberg tables that are immediately queryable by AWS-native services such as Amazon Athena, without additional configuration. Data written by Qlik is also available to third-party tools without replication.
Prerequisites
To create an AWS Glue Data Catalog connection, you need:
- An Amazon S3 target bucket.
- If you are using role-based authentication to access the bucket, you need:
  - Permission to access the network integration you want to use for the connection.
  - The role ARN.
- If you are using access key authentication to access the bucket, you need:
  - Your AWS Access Key ID.
  - Your AWS Secret Access Key.
Setting AWS Glue Data Catalog connection properties
To configure the connection, do the following:
1. In Connections, click Create connection.

2. Select the Space where you want to create the connection, or choose Create new data space.

3. Select AWS Glue Data Catalog from the Connector name list, or use the Search box.

4. Click Create, and configure the properties:

   - Catalog region: From the list, select the region for your catalog.

   - S3 target bucket: Enter the bucket location, using the following format:

     `s3://<bucket>/<data_storage_prefix>`

   - Authentication type: From the list, select Role-based or Access key authentication, and complete the following information for your selection:

   Role-based

   - Network integration: Select the network integration from the list.

   - ARN role: Enter the ARN of the role created in AWS.
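The S3 target bucket value follows the `s3://<bucket>/<data_storage_prefix>` pattern. As a minimal sketch (not part of the product), a string in that format can be checked and split before you paste it into the connection dialog; the bucket-name rules used here are simplified:

```python
import re

# Pattern for the target bucket location: s3://<bucket>/<optional prefix>.
# Bucket-name rules are simplified (lowercase letters, digits, dots, hyphens).
_TARGET_RE = re.compile(
    r"^s3://(?P<bucket>[a-z0-9][a-z0-9.-]{1,61}[a-z0-9])(?:/(?P<prefix>.*))?$"
)

def parse_s3_target(value: str) -> tuple[str, str]:
    """Split an s3://bucket/prefix string into (bucket, prefix)."""
    m = _TARGET_RE.match(value.strip())
    if not m:
        raise ValueError(f"not a valid S3 target location: {value!r}")
    return m.group("bucket"), m.group("prefix") or ""

# Hypothetical bucket and prefix, for illustration only.
print(parse_s3_target("s3://my-qlik-bucket/lakehouse/landing"))
```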
Create an AWS role
To create an AWS role, do the following:
1. In the AWS Console, go to IAM.

2. In Roles, click Create role.

3. For Trusted entity type, select Custom trust policy.

4. In Qlik Cloud, in the Create an AWS role dialog, copy the Trusted entity, which is the entity assigned to the clusters in your integration, and paste it into the AWS Console.

5. Click Roles, and select the role you created above.

6. In Permission policies, click Add permissions, and select Create inline policy.

7. In Qlik Cloud, in the Create an AWS role dialog, copy the inline policy below, paste it into the AWS Console, and change the <bucket_name> value to your bucket name:

8. From the Role page, in Summary, copy the ARN.

9. In Qlik Cloud, close the Create an AWS role dialog, and paste the ARN value into ARN role.
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glue:CreateTable",
                "glue:UpdateTable",
                "glue:DeleteTable",
                "glue:BatchDeleteTable",
                "glue:GetTable",
                "glue:GetTables",
                "glue:CreateDatabase",
                "glue:UpdateDatabase",
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:GetUserDefinedFunction"
            ],
            "Resource": [
                "arn:aws:glue:us-east-2:*:catalog",
                "arn:aws:glue:us-east-2:*:database/*",
                "arn:aws:glue:us-east-2:*:table/*/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": "arn:aws:s3:::<bucket_name>"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:DeleteObjectVersion",
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": "arn:aws:s3:::<bucket_name>/*"
        }
    ]
}
```
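Substituting your bucket name into the policy template can also be done programmatically, which catches malformed JSON before it reaches the AWS Console. A minimal sketch, using an abbreviated copy of the S3 statements from the policy above and a hypothetical bucket name:

```python
import json

# Abbreviated copy of the inline policy template shown above; the real
# template also includes the Glue statement.
POLICY_TEMPLATE = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::<bucket_name>"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:DeleteObject", "s3:GetObject"],
      "Resource": "arn:aws:s3:::<bucket_name>/*"
    }
  ]
}
"""

def render_policy(template: str, bucket: str) -> dict:
    """Replace the <bucket_name> placeholder and parse the result as JSON."""
    rendered = template.replace("<bucket_name>", bucket)
    return json.loads(rendered)  # raises ValueError if the JSON is malformed

policy = render_policy(POLICY_TEMPLATE, "my-qlik-target-bucket")
print(policy["Statement"][0]["Resource"])  # arn:aws:s3:::my-qlik-target-bucket
```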
Access key

- Access key: Enter your unique AWS Access Key ID to use for authentication.
- Secret key: Enter your AWS Secret Access Key to use with your access key.
Define user permissions
To create an inline policy in AWS, do the following:
1. In the AWS Console, go to IAM.

2. Navigate to Policies > Create policy.

3. In Qlik Cloud, in the Create an AWS role dialog, copy the policy.

4. In AWS, in the Policy editor, paste in the policy, and change the <bucket_name> parameter to your bucket name:

5. Attach the policy to the IAM user that grants Qlik access.
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glue:CreateTable",
                "glue:UpdateTable",
                "glue:DeleteTable",
                "glue:BatchDeleteTable",
                "glue:GetTable",
                "glue:GetTables",
                "glue:CreateDatabase",
                "glue:UpdateDatabase",
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:GetUserDefinedFunction"
            ],
            "Resource": [
                "arn:aws:glue:us-east-2:*:catalog",
                "arn:aws:glue:us-east-2:*:database/*",
                "arn:aws:glue:us-east-2:*:table/*/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": "arn:aws:s3:::<bucket_name>"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:DeleteObjectVersion",
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": "arn:aws:s3:::<bucket_name>/*"
        }
    ]
}
```
In Name, enter the display name for the connection.
Supported data types
The following table shows the supported Iceberg source data types and their default mapping to Qlik Talend Data Integration data types.
| Iceberg data types | Qlik Talend Data Integration data types |
| --- | --- |
| BOOLEAN | BOOLEAN |
| BYTES | BINARY |
| DATE | DATE |
| TIME | TIME |
| DATETIME | TIMESTAMP |
| INT1 | INT |
| INT2 | INT |
| INT4 | INT |
| INT8 | LONG |
| NUMERIC | DECIMAL(precision, scale) |
| REAL4 | FLOAT |
| REAL8 | DOUBLE |
| UINT1 | INT |
| UINT2 | LONG |
| UINT4 | LONG |
| UINT8 | DECIMAL(20, 0) |
| STRING | STRING |
| WSTRING | STRING |
| BLOB | BINARY |
| NCLOB | STRING |
| CLOB | STRING |
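The mapping above can be expressed as a simple lookup table, for example to check programmatically which target type a given source type lands as. A sketch transcribed from the table (the table itself is the authoritative reference):

```python
# Default mapping of Iceberg source data types to Qlik Talend Data
# Integration data types, transcribed from the table above.
ICEBERG_TO_QTDI = {
    "BOOLEAN": "BOOLEAN",
    "BYTES": "BINARY",
    "DATE": "DATE",
    "TIME": "TIME",
    "DATETIME": "TIMESTAMP",
    "INT1": "INT",
    "INT2": "INT",
    "INT4": "INT",
    "INT8": "LONG",
    "NUMERIC": "DECIMAL(precision, scale)",
    "REAL4": "FLOAT",
    "REAL8": "DOUBLE",
    "UINT1": "INT",
    "UINT2": "LONG",
    "UINT4": "LONG",
    "UINT8": "DECIMAL(20, 0)",
    "STRING": "STRING",
    "WSTRING": "STRING",
    "BLOB": "BINARY",
    "NCLOB": "STRING",
    "CLOB": "STRING",
}

def target_type(source_type: str) -> str:
    """Look up the default target type, case-insensitively."""
    return ICEBERG_TO_QTDI[source_type.upper()]

print(target_type("int8"))  # LONG
```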