
Monitoring data drift in deployed models

In the Data drift monitoring pane in your ML deployment, you can analyze data drift for the source deployed model. Data drift monitoring allows you to identify changes to the distributions of one or more features used to train the model.

When the calculated drift for a feature reaches or exceeds 0.25, it is recommended that you re-train the model with the most recent data, or configure a new model if the original machine learning question has changed significantly.

Information note: Data drift analysis is only available in English.

Data drift analysis in AutoML

Embedded analysis showing feature drift calculations for a deployed model. The sheet includes visualizations to display information such as feature drift over time, value distributions, and a comparison of feature drift and importance within the same chart

Data drift calculations in AutoML

In Qlik AutoML, data drift is calculated as the population stability index (PSI).
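
This page does not describe how AutoML bins feature values or handles empty bins, so the following is only a minimal sketch of the commonly used PSI formula: the sum over all bins of (actual - expected) * ln(actual / expected). The function name population_stability_index, the eps floor for empty bins, and the sample proportions are assumptions made for illustration, not Qlik's implementation.

```python
import math


def population_stability_index(expected, actual, eps=1e-6):
    """Compute PSI from two lists of per-bin proportions.

    expected: proportions per bin in the training (reference) data.
    actual:   proportions per bin in the newer data.
    eps:      assumed floor that keeps empty bins from causing log(0).
    """
    if len(expected) != len(actual):
        raise ValueError("both distributions need the same number of bins")
    total = 0.0
    for e, a in zip(expected, actual):
        e = max(e, eps)
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


# Hypothetical two-bin feature: an even split in training, 90/10 in new data.
print(round(population_stability_index([0.5, 0.5], [0.9, 0.1]), 3))  # 0.879
```

Identical distributions give a PSI of 0, and the value grows as the two distributions move apart.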

You can identify significant data drift for a feature by looking at its PSI value. If the PSI value is greater than or equal to 0.25, consider re-training the model or creating a new experiment.

Population stability index (PSI) values and indications

PSI value                              Description
Below 0.1                              Low drift
Greater than 0.1 but less than 0.25    Minor drift
Greater than or equal to 0.25          Significant drift
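
The thresholds in the table can be expressed as a small helper, for example if you want to label PSI values outside the embedded analysis. This is a sketch only: the function name drift_level is hypothetical, and because the table does not say how a PSI of exactly 0.1 is classified, the code assumes it counts as minor drift.

```python
def drift_level(psi):
    """Map a PSI value to the description used in the table above."""
    if psi < 0:
        raise ValueError("PSI cannot be negative")
    if psi >= 0.25:
        return "Significant drift"
    # Assumption: exactly 0.1 is treated as minor drift (not stated in the table).
    if psi >= 0.1:
        return "Minor drift"
    return "Low drift"


for value in (0.04, 0.18, 0.25):
    print(value, drift_level(value))
```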

Launching a data drift analysis

  1. Open an ML deployment.

  2. From the left panel, select Data drift monitoring.

  3. An embedded analysis is generated. Stay on the Feature Drift sheet to perform data drift analysis.

Availability of the analysis

New calculations for data drift are not generated immediately when you open an analysis. Data drift calculations are generated once daily at 4:30 PM UTC.
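
If you want to know when fresh calculations should next be available in your own time zone, the daily schedule can be worked out as follows. This is a minimal sketch using only the 4:30 PM UTC time stated above; the function name next_drift_calculation is hypothetical, and the code assumes a run that starts exactly at the current moment does not count as upcoming.

```python
from datetime import datetime, time, timedelta, timezone

DAILY_RUN_UTC = time(16, 30)  # 4:30 PM UTC, as stated above


def next_drift_calculation(now):
    """Return the first 16:30 UTC that falls strictly after `now`."""
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    now_utc = now.astimezone(timezone.utc)
    candidate = datetime.combine(now_utc.date(), DAILY_RUN_UTC, tzinfo=timezone.utc)
    if candidate <= now_utc:
        candidate += timedelta(days=1)
    return candidate


upcoming = next_drift_calculation(datetime.now(timezone.utc))
print("Next calculation (UTC):", upcoming.isoformat())
print("Next calculation (local):", upcoming.astimezone().isoformat())
```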

Navigating embedded analytics

Use the interactive interface to analyze the deployed model with embedded analytics.

Switching between sheets

The Sheets panel lets you switch between the sheets in the analysis. Each sheet has a specific focus. The panel can be expanded and collapsed as needed.

The Feature Drift sheet contains all information about data drift. Switching to the Operations sheet allows you to monitor the usage of your ML deployment. For more information, see Monitoring deployed model operations.

Making selections

Use selections to refine the data. You can select features and their specific values or ranges, and filter for specific dates and importance ranges. In some cases, you might need to make one or more selections for visualizations to be displayed. Click data values in visualizations to make selections.

You can work with selections in the following ways:

  • Select values by clicking content, defining ranges, and drawing.

  • Search within charts to select values.

  • Click a selected field in the toolbar at the top of the embedded analysis. This allows you to search in existing selections, lock or unlock them, and further modify them.

  • In the toolbar at the top of the embedded analysis, click Remove to remove a selection. Clear all selections by clicking the Clear selections icon.

  • Step forward and backward in your selections by clicking Step backward in selections and Step forward in selections.

Analyzing feature drift alongside importance

Use the Feature drift vs importance chart to analyze feature drift and permutation importance together. You can identify when changes in drift are happening in parallel with changing patterns in importance. Viewing these two metrics together, you can uncover newly emerging patterns and develop a deeper understanding of the trends affecting your data.

To understand what the drift scores mean for your model's performance, see Data drift calculations in AutoML.

Monitoring feature drift over time

In the Feature drift over time chart, view the timeline for each drift calculation and analyze changes that have been happening over time as new predictions are generated.

A reference line has been added at a PSI value of 0.25 to indicate when a feature is demonstrating significant drift. To learn more about what the drift scores mean for your model's performance, see Data drift calculations in AutoML.
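
The reference line corresponds to a simple check against each daily PSI value. The sketch below assumes you have a list of (date, PSI) pairs for one feature; the function name significant_drift_points and the sample values are made up for illustration and are not exported from the analysis.

```python
SIGNIFICANT_PSI = 0.25  # position of the reference line in the chart


def significant_drift_points(history):
    """Return the (date, psi) pairs that sit on or above the reference line."""
    return [(day, psi) for day, psi in history if psi >= SIGNIFICANT_PSI]


# Hypothetical daily PSI values for a single feature.
history = [
    ("2024-03-01", 0.06),
    ("2024-03-02", 0.14),
    ("2024-03-03", 0.27),
    ("2024-03-04", 0.31),
]
for day, psi in significant_drift_points(history):
    print(day, psi)
```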

Viewing feature distribution

The Value distribution chart compares the value distribution of a feature in the training dataset with its distribution in the dataset used for the latest prediction generated with the model. You can identify which ranges of a feature are affected most, and least, by drift.

The blue bars indicate the percentage of values in the latest apply dataset that fall within each range. The purple circle-shaped markers show the percentage of values in the training dataset that fall within each range. If you notice a large difference between the height of the bars and the position of the markers, it is likely that the range is affected by drift.
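
The comparison that the chart draws can be reproduced roughly as follows. This is a sketch under stated assumptions: the range edges and sample values are invented, the function name bin_percentages is hypothetical, and values that fall outside the given edges are simply ignored, which may differ from how the chart treats them.

```python
from bisect import bisect_right


def bin_percentages(values, edges):
    """Percentage of in-range values in each range [edges[i], edges[i+1]).

    The last range also includes its upper edge. Values outside the
    edges are ignored (an assumption made for this sketch).
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1
        if v == edges[-1]:
            i = len(counts) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    total = sum(counts)
    return [100.0 * c / total if total else 0.0 for c in counts]


edges = [0, 10, 20, 30]      # hypothetical ranges for one numeric feature
training = [1, 5, 12, 25]    # made-up training values (the markers)
latest = [22, 25, 28, 3]     # made-up latest apply values (the bars)

train_pct = bin_percentages(training, edges)
latest_pct = bin_percentages(latest, edges)
gaps = [l - t for l, t in zip(latest_pct, train_pct)]

# The range with the largest absolute gap is the one most affected by drift.
most_affected = max(range(len(gaps)), key=lambda i: abs(gaps[i]))
print(train_pct, latest_pct, gaps, most_affected)
```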
