Early Access: The content on this website is provided for informational purposes only in connection with pre-General Availability Qlik Products.
All content is subject to change and is provided without warranty.

Large datasets in Qlik Predict - Preview

Experiment training is available for datasets with large file sizes and cell counts.

Qlik Predict limitations and capacities

This section lists general limitations and capacities for Qlik Predict functionality. For capacities that are specific to individual Qlik Cloud subscriptions, refer to Qlik Pricing and Qlik Cloud® Subscriptions or contact your Qlik account representative.

General limitations

  • Qlik Predict has an API rate limit of 300 requests per minute.

  • Maximum number of columns in the dataset: 500

    This applies to both training and apply datasets. For training datasets, the limit counts only the columns used as features in an experiment version, so the source dataset can contain more columns. For example, a dataset with 501 columns can still be used for training if you drop at least one feature during experiment configuration.

  • Any flat file that can be uploaded and profiled in Qlik Cloud is supported for use in Qlik Predict.

    For multi-table files such as Microsoft Excel files with multiple sheets, only the first table will be imported. If data profiling fails for a table (for example, if it is empty), the file is not supported.
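A client that calls the API in a loop can stay under the 300 requests per minute limit by throttling itself. The following is a minimal sketch of a sliding-window throttle, not part of any Qlik SDK: the class name `SlidingWindowLimiter` and its parameters are hypothetical, and the sketch assumes the limit is counted over a rolling 60-second window, which the documentation does not specify.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side throttle: at most max_requests calls per window_s seconds.

    Hypothetical helper; assumes a rolling window, which may not match
    how the service actually counts requests.
    """

    def __init__(self, max_requests=300, window_s=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window_s = window_s
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of requests still inside the window

    def acquire(self):
        """Block until one more request fits in the window, then record it."""
        while True:
            now = self._clock()
            # Forget requests that have left the window.
            while self._sent and now - self._sent[0] >= self.window_s:
                self._sent.popleft()
            if len(self._sent) < self.max_requests:
                self._sent.append(now)
                return
            # Window is full: wait until the oldest request expires.
            self._sleep(self.window_s - (now - self._sent[0]))
```

Call `acquire()` immediately before each request. The `clock` and `sleep` parameters exist only so the behavior can be checked without waiting in real time.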

Training dataset and profiling limitations

This section lists guardrails enforced on training dataset sizes in Qlik Predict.

Note the following:

  • The limitations apply only to the data included in the experiment version: all included features, including the target column.

  • Each limitation either comes from general data profiling across the Qlik Cloud platform or is specific to Qlik Predict.

  • The limitations are maximum capacities. The limits of your Qlik Cloud subscription may be lower.

Dataset size limits for training

These limits are technical capacities for the size, cell count, and number of included columns in training datasets.

Maximum sizes for training datasets, by dataset type

Dataset type | Maximum dataset size | Maximum dataset cell count | Maximum number of included columns
CSV | 2 GiB | 100 million | 500
Parquet | 2 GiB | 500 million | 500
QVD | 2 GiB | 500 million | 500
Others | 1 GiB | 100 million | 500
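The limits in the table above can be checked before a dataset is submitted for training. The sketch below is illustrative only: the function name `training_limit_violations` is hypothetical, the cell count is assumed to be rows multiplied by included columns, and the size limit is assumed to apply to the byte size you pass in. Your subscription may impose lower limits than these technical maximums.

```python
GIB = 2**30  # bytes in one gibibyte

# (maximum size in bytes, maximum cell count) per dataset type,
# copied from the table above.
TRAINING_LIMITS = {
    "csv": (2 * GIB, 100_000_000),
    "parquet": (2 * GIB, 500_000_000),
    "qvd": (2 * GIB, 500_000_000),
    "other": (1 * GIB, 100_000_000),
}
MAX_INCLUDED_COLUMNS = 500


def training_limit_violations(dataset_type, size_bytes, rows, included_columns):
    """Return a list of human-readable limit violations (empty if none).

    Assumption: cell count = rows * included columns (features plus target).
    Unknown dataset types fall back to the "Others" row.
    """
    max_size, max_cells = TRAINING_LIMITS.get(
        dataset_type.lower(), TRAINING_LIMITS["other"]
    )
    cells = rows * included_columns
    problems = []
    if size_bytes > max_size:
        problems.append(f"size {size_bytes} bytes exceeds {max_size} bytes")
    if cells > max_cells:
        problems.append(f"cell count {cells} exceeds {max_cells}")
    if included_columns > MAX_INCLUDED_COLUMNS:
        problems.append(
            f"{included_columns} included columns exceeds {MAX_INCLUDED_COLUMNS}"
        )
    return problems
```

For example, a 1 GiB CSV file with 1,000,000 rows and 101 included columns has 101 million cells, which is over the CSV cell limit but would be within the Parquet limit.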

In addition, certain training functionality is only available for datasets under specific sizes and cell counts.

Training feature availability by dataset type and size

Dataset type | Free text feature engineering supported | Time series experiments supported
CSV | Up to 100 million cells and up to 1 GiB (exceeding either limit is not supported) | Up to 1 GiB
Parquet | Up to 100 million cells and up to 1 GiB (exceeding either limit is not supported) | Up to 1 GiB
QVD | Up to 100 million cells and up to 1 GiB (exceeding either limit is not supported) | Up to 1 GiB
Others | Up to 100 million cells and up to 1 GiB (exceeding either limit is not supported) | Up to 1 GiB
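Because the thresholds above are the same for every dataset type, the availability rules reduce to two comparisons. The following sketch shows that logic under the assumption that "up to" is inclusive; the function name `feature_availability` and the dictionary keys are hypothetical, and subscription settings can disable these features regardless of size.

```python
GIB = 2**30  # bytes in one gibibyte
MAX_FREE_TEXT_CELLS = 100_000_000


def feature_availability(size_bytes, cells):
    """Report which size-gated training features a dataset qualifies for.

    Assumption: limits are inclusive ("up to 1 GiB" allows exactly 1 GiB).
    """
    within_size = size_bytes <= GIB
    return {
        # Free text feature engineering needs BOTH limits to be met.
        "free_text_feature_engineering": within_size and cells <= MAX_FREE_TEXT_CELLS,
        # Time series experiments are gated on size only.
        "time_series_experiments": within_size,
    }
```

A 0.5 GiB dataset with 150 million cells would therefore qualify for time series experiments but not for free text feature engineering.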

Profiling limitations

When you add a training dataset to an experiment, Qlik Cloud data profiling analyzes it to estimate statistics such as cell count and distinct value count. When you run an experiment version, Qlik Predict preprocesses the data, which can change some of these statistics.

For large datasets (those exceeding 1 GiB), the data is only partially profiled. As a result, some estimated statistics, such as row, cell, distinct value, and null counts, can change after training is run.

As a result, with some large datasets, you could experience the following training errors:

  • Training fails due to a dataset exceeding the cell count permitted by the subscription, even though no errors were present upon profiling.

  • Training fails because the null count exceeds the maximum permitted threshold, even though no errors were present upon profiling.

  • The estimated experiment type is found to be incompatible with the training dataset, even though no errors were present upon profiling.

For troubleshooting steps you can follow to reduce your dataset size, see Troubleshooting.

Troubleshooting

Training error: dataset limit exceeded

You might encounter an error when training an experiment version because the dataset exceeds Qlik Predict guardrails.

Possible cause 1

The training dataset exceeds the maximum cell count or file size for Qlik Predict. In some cases, this may not have been identified before training due to partial profiling.

For more information, see Training dataset and profiling limitations.

Proposed action 1

Consider:

  • Dropping features from the experiment to reduce the size.

  • Converting the dataset to a different file type.

  • Returning to dataset preparation and reducing the number of rows in the dataset.
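To judge how far to cut rows or features, it can help to work backwards from the cell limit. The sketch below assumes cell count equals rows multiplied by included columns; both function names are hypothetical, and the limit you pass in should be the lower of the Qlik Predict maximum and your subscription's capacity.

```python
def max_rows_within_cell_limit(included_columns, max_cells):
    """Largest row count that keeps rows * included_columns <= max_cells."""
    if included_columns <= 0:
        raise ValueError("included_columns must be positive")
    return max_cells // included_columns


def max_columns_within_cell_limit(rows, max_cells):
    """Largest included-column count that keeps rows * columns <= max_cells."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    return max_cells // rows
```

For a CSV dataset (100 million cell limit) with 120 included columns, `max_rows_within_cell_limit(120, 100_000_000)` gives 833,333 rows. With 1,000,000 rows, at most 100 columns can be included.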

Possible cause 2

The training dataset exceeds the maximum size for your Qlik Cloud subscription. In some cases, this may not have been identified before training, for example when data profiling only estimates the row count.

For more information, see Training dataset and profiling limitations.

Proposed action 2

Consider:

  • Dropping features from the experiment to reduce the size.

  • Returning to dataset preparation and reducing the number of rows in the dataset.

Possible cause 3

The training dataset has too many columns. You can have a maximum of 500 columns across the data you have selected to use in training.

For more information, see General limitations.

Proposed action 3

Deselect unneeded features until the number of included columns is 500 or fewer.

Training: Certain functionality is not available when configuring an experiment

For example, you might notice that the following are not available for your experiment:

  • Free text feature engineering

  • Time series experiments

Possible cause 1

Your dataset is too large for the functionality to be available. Free text feature engineering and time series experiments are supported only up to specific dataset sizes and cell counts.

For more information, see Training dataset and profiling limitations.

Proposed action 1

Reduce the size of your training data. You can try any of the following:

  • Dropping features from the experiment to reduce the size.

  • Returning to dataset preparation and reducing the number of rows in the dataset.

Possible cause 2

Your Qlik Cloud subscription does not include this functionality.

For more information, see Qlik Predict subscription-governed capacities.

Proposed action 2

Contact your tenant administrator or service account owner to learn about the limits of the subscription.
