Early Access: The content on this website is provided for informational purposes only in connection with pre-General Availability Qlik Products.
All content is subject to change and is provided without warranty.

Working with multivariate time series forecasting

With Qlik Predict, you can train machine learning models to forecast time-specific metrics. Using neural network-based methods, models learn and predict complex patterns involving time-specific associations, grouped target data, historical features, and known future variables. To create a time series forecast, prepare a training dataset, use it in a time series experiment, deploy a model, and then create apply datasets that you can use to generate predictions.

Components of a time series problem

With time series forecasting, the goal is to predict target values for specific dates into the future. For example, you might want to predict sales for the next week, month, or quarter.

When developing your time series problem, define the following components:

  • Target and groups

  • Date index

  • Forecast horizon

  • Covariates

Information note: This framework describes how to define a machine learning question for time series forecasting problems. To define machine learning questions for classification and regression problems, see Defining machine learning questions.

Simplified illustration outlining the components of a time series forecasting problem in Qlik Predict.

Target

As with other experiment types, the target is the column for which you want the model to predict future values. For time series experiments, the target needs to contain numeric data—for example, sales or inventory.

If you are using groups in the time series forecast, models will predict one target value per group per time step in the forecast window. If you are not using groups, your trained models will predict one target value for each time step in the forecast window.

Date index

The date index tracks the time series metrics over a continuous time interval (time step). You need to decide on your time step at an early stage: how often do you need to predict future values?

Specifically, the date index is a column that appears in your training and apply datasets for time series problems. The date index determines the structure of both of these datasets: each row represents a step in time (or, with groups, a step in time for each unique grouping).

When you add your training dataset in a time series experiment, possible date index columns are automatically identified and presented to you as Insights at the column level. You can identify them from the Possible date index insight in schema view.

Groups

Groups are features containing categorical information for which you want to generate predictions separately. Classic examples of groups include store number and product type, which are often used to organize data for a target such as sales. By selecting store number and product type as groups, your time series models will provide predictions for each individual value across these columns. For example, with a target of sales, if you have three store numbers (1, 2, and 3) and two product types (grocery and produce), your model will generate sales predictions for each unique combination of these values.
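As a minimal illustration of the example above, the combinations for which predictions are generated can be enumerated as follows. The store and product values come from the example; the variable names are hypothetical and are not part of Qlik Predict.

```python
from itertools import product

# Group values from the example above.
store_numbers = [1, 2, 3]
product_types = ["grocery", "produce"]

# Each unique (store number, product type) pair is forecast separately.
group_combinations = list(product(store_numbers, product_types))

for store_number, product_type in group_combinations:
    print(f"store_number={store_number}, product_type={product_type}")

print(len(group_combinations))  # 6
```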

You should incorporate groups into your time series problem if you have the data and need individual predictions by category. Another advantage of groups is that models can learn globally, better understanding the patterns that exist between the different groupings you define.

You can configure the groups to use for each experiment version. If you do not specify groups but groups are identified in your training dataset, the training will use groups.

Groups are identified by duplicate values in the date index column—for example, for a date of 1/14/2025, you have two records: one for store A, and the other for store B.

Each group in a time series experiment is considered to be a separate time series within your dataset. If you are not using groups, the target alone forms a single time series. See What is a time series?.

Forecast horizon

The forecast horizon specifies how far into the future you want to forecast. The forecast horizon is composed of the forecast window (the number of time steps for which you need predictions) and forecast gap (an optional number of time steps after your historical data for which you do not want predictions).

You set the forecast window and gap size when configuring an experiment version. These values are used both during model training and when generating predictions from models deployed as ML deployments.

The forecast window is the number of time steps for which you want to predict into the future. For example, if your time step is one day and you want to forecast sales for the next two weeks, you would set your forecast window to 14.

The forecast gap is the amount of time in the future for which you do not require predictions. Setting a forecast gap is optional, because you may or may not need one. The forecast gap starts at the end of the recorded historical training data you have provided. The forecast window begins where the forecast gap ends.

For example, you might be looking to predict future sales, but you are only interested in future sales for dates later than one week after the end of your input data. In this case, with a time step of days, you could set your forecast gap size to seven time steps.
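The relationship between the forecast gap and the forecast window can be sketched with simple date arithmetic. This is an illustration only: the function name, its parameters, and the example date are assumptions, and the sketch assumes a fixed daily time step.

```python
from datetime import date, timedelta

def forecast_dates(last_historical_date, gap, window, step=timedelta(days=1)):
    """Return the dates covered by the forecast window.

    The gap starts right after the last historical date, and the window
    starts where the gap ends. Gap and window are counted in time steps.
    """
    first = last_historical_date + step * (gap + 1)
    return [first + step * i for i in range(window)]

# Daily time step, a gap of seven time steps, and a window of 14 time steps.
dates = forecast_dates(date(2025, 1, 14), gap=7, window=14)
print(dates[0])    # 2025-01-22
print(dates[-1])   # 2025-02-04
print(len(dates))  # 14
```

With a gap of zero, the first predicted date is the time step immediately after the last historical record.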

Your selected forecast window, in addition to how much training data you have, limits how far into the future you can forecast. For more information, see Maximum forecast window.

Covariates

In time series problems, features are often called covariates. Similar to other machine learning problems, covariates are the other variables that you suspect have an influence on the outcome of the target. Each covariate is represented as a single column in your training dataset.

In time series forecasting, there are several types of covariates and they have some important distinctions:

  • Static covariates: Columns that do not vary over the course of a time series. Static covariates are applicable in time series experiments where groups are being used. For example, suppose you have groups for Product and Store Number, and there is a feature Default Discount. If Product A in Store 1 has a default discount of 10% and Product B in Store 2 has a default discount of 20%, Default Discount would be a static covariate. That is, it does not vary within the data for the group within which it appears.

    Static covariates are detected automatically from historical features you include in the experiment. You do not need to indicate which features are static covariates.

  • Past covariates: Time-dependent variables that are available only in the historical data, and which vary across this data. Past covariates are detected automatically from historical features you include in the experiment. You do not need to explicitly indicate which features are past covariates.

  • Future covariates: Future covariates, also known as future features, are time-dependent variables for which you will know the future values within the forecast horizon. When using future covariates in training, you need to indicate them as future features in the training configuration.
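To make the distinction between static and time-varying covariates concrete, the following sketch checks which columns stay constant within every group. It is not the detection logic used by Qlik Predict, which is not described here; the rows, column names, and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: two groups (product, store), two time steps each.
rows = [
    {"product": "A", "store": 1, "default_discount": 0.10, "foot_traffic": 310},
    {"product": "A", "store": 1, "default_discount": 0.10, "foot_traffic": 295},
    {"product": "B", "store": 2, "default_discount": 0.20, "foot_traffic": 120},
    {"product": "B", "store": 2, "default_discount": 0.20, "foot_traffic": 140},
]

def constant_within_groups(rows, group_keys, columns):
    """Return the columns whose value never changes inside any group."""
    values_seen = defaultdict(set)
    for row in rows:
        group = tuple(row[key] for key in group_keys)
        for column in columns:
            values_seen[(column, group)].add(row[column])
    return [
        column
        for column in columns
        if all(
            len(values) == 1
            for (seen_column, _), values in values_seen.items()
            if seen_column == column
        )
    ]

static = constant_within_groups(
    rows, ["product", "store"], ["default_discount", "foot_traffic"]
)
print(static)  # ['default_discount']
```

Here default_discount differs between groups but never within one, so it behaves like a static covariate, while foot_traffic varies over time within each group.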

Future features

With future features, you can provide additional data to your models about future information you already know or can reasonably expect. In particular, you have access to future values for this feature spanning your selected forecast horizon. When defining future features, you need to provide historical as well as future data.

For example, for a model predicting metrics that could be influenced by future discounts offered by a store, you could include the historically observed discounts, as well as the discounts for future time periods within the forecast window. Other examples of future features could be weather or calendar information.

Other important concepts

This section outlines concepts that are relevant to your time series problem, but that you do not configure directly in an experiment or ML deployment. These are properties that are defined by your data or by other properties you configure for the model.

Time steps

The time step is defined by your training dataset and is important for both training and predictions.

In your training dataset, the time step is the interval at which the data in your date index is recorded. For example, the time step can be daily, every hour, every minute, or every second.

It is important to be aware of the time step used in your training data. Other experiment parameters you define, such as forecast window and forecast gap size, will follow this time step interval.

After deploying your model, the apply data for which you want to create predictions will need to follow the same time step as defined in the training dataset.

Quality

When you select a training dataset, the system infers the time step used. If there are some missing or inconsistent values in the date index, these values can be interpolated automatically by the system. However, if your data contains time intervals that are inconsistent to the point where different time steps are detected, the data must be fixed first. For example, if you have several months of data recorded once daily, but there is a section in which data is consistently recorded on a weekly basis, the dataset cannot be used because multiple time steps will be detected.
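Before uploading a dataset, you can inspect the intervals in your date index yourself. The sketch below counts the intervals between consecutive dates. It is a simple pre-check, not the inference method used by the system; the function name and sample dates are assumptions.

```python
from collections import Counter
from datetime import date, timedelta

def interval_counts(dates):
    """Count the intervals between consecutive distinct dates."""
    ordered = sorted(set(dates))
    return Counter(later - earlier for earlier, later in zip(ordered, ordered[1:]))

# Daily data for January 1 to 10, with January 5 missing.
daily = [date(2025, 1, day) for day in range(1, 11) if day != 5]
counts = interval_counts(daily)

likely_step, _ = counts.most_common(1)[0]
print(likely_step)  # 1 day, 0:00:00

# Intervals other than the most common one deserve a closer look.
unusual = {interval: n for interval, n in counts.items() if interval != likely_step}
print(unusual)
```

A single unusual interval, as in this sample, usually points to a missing record. Many occurrences of a second interval, such as a long run of seven-day intervals in otherwise daily data, suggest that more than one time step is present.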

Apply window

The apply window, or look-back period, is the portion of the training data that the algorithm can use to provide the predictions for your specified forecast window.

The apply window is calculated and set by the system. It is measured in time steps. The apply window is defined by what you set as the forecast window and gap (forecast horizon).

The apply window is identified automatically from your training configuration. To generate predictions for a given forecast window, you need to provide the historical data covering at least your apply window. This is provided in your apply dataset. See Preparing an apply dataset.

Maximum forecast window

The maximum forecast window is the maximum number of time steps for which you can generate forecasts, given your chosen forecast window, how much historical data you have provided, and the minimum sample size expected by the system. It is estimated as you configure your time series experiment, and confirmed after you have run a version of the training. It is displayed as the Estimated maximum forecast or Maximum forecast under Based on your data, when you open Target and experiment type in the experiment configuration panel.

The more historical data you provide, the further in time you will be able to predict. However, to generate reliable predictions, it is important to select a reasonable forecast window.

The maximum forecast window can be as large as 180 time steps.

Forecast cut-off time

The forecast cut-off time is especially important when defining your apply dataset during predictions. The forecast cut-off time is the last date in your sample for which you have a target value. Essentially, dates after this cut-off time are the dates for which you want to generate predictions.

What is a time series?

In Qlik Predict time series forecasting, each group is considered to be a separate time series within the training dataset. If you are not using groups, the target alone forms a single time series. For example, suppose your training dataset contains sales metrics. These sales metrics are defined for each store and product type. With Store and Product Type columns defined as groups, there are three time series in the training dataset.

Preparing a training dataset

For multivariate time series forecasts, your training dataset needs to contain the following columns:

  • Date index

  • Target column

  • Group columns (optional)

  • Feature columns (optional—without features, you are training a univariate forecasting model)

Illustrations showing the required columns and data for time series training datasets. Scenarios with no groups, one group, and two groups are described.

Linear diagram outlining the needed components, and timeline, of a training dataset for a time series forecasting model.

Date index column

You need a date index containing full dates or time stamps. This column is the chronological index along which the target and covariate metrics are tracked. The date index column organizes the time-based measurements sequentially along a consistent time interval (the time step).

The date index column is organized as follows, depending on whether or not you are using groups:

  • No groups: A single record for each time step. For example, with a daily forecast, each row represents a single day.

  • With groups: One or more duplicate entries for each time step depending on the groups used.

There is flexibility in the time step you use. You could, for example, record dates on a daily, weekly, or monthly basis, and so on.

Missing or inconsistently recorded values in this column are sometimes acceptable, if they can be interpolated. However, your date index values cannot contain multiple different time steps. For example, if the interval is determined to be once daily, but at some point, an interval of twice daily is identified, an error will occur during training.

Target column and group columns

Your dataset needs to have a target column containing a numeric metric that you want to forecast. A common example is sales.

If you are using groups, you provide historical target values for each possible value in groups that you add. For example, if your target is Sales and you add a group Store Number that contains data for Store A and Store B, your dataset needs to include two separate records for each time step: one with the sales value for Store A, and the other with the sales value for Store B.
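The layout described above can be checked with a short script. This sketch uses hypothetical records and a hypothetical function name; it reports any time step that lacks a record for one of the groups.

```python
from collections import defaultdict

# Hypothetical training records: (date index, Store Number, Sales).
records = [
    ("2025-01-13", "Store A", 120.0),
    ("2025-01-13", "Store B", 95.0),
    ("2025-01-14", "Store A", 130.0),
    ("2025-01-14", "Store B", 101.0),
]

def missing_group_records(records):
    """Map each date to the groups that have no record on that date."""
    all_groups = {group for _, group, _ in records}
    groups_by_date = defaultdict(set)
    for day, group, _ in records:
        groups_by_date[day].add(group)
    return {
        day: sorted(all_groups - present)
        for day, present in groups_by_date.items()
        if all_groups - present
    }

print(missing_group_records(records))       # {}
print(missing_group_records(records[:-1]))  # {'2025-01-14': ['Store B']}
```

A reported gap is not necessarily an error, because some missing values can be interpolated, but it is worth reviewing before training.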

Feature columns

You can train a time series model without any covariates. However, if you include covariates, provide a column in the dataset for each feature. Feature data should generally be historically recorded data unless you are adding future features. Future feature columns can contain both historical and future data. You should only include future feature data in the training dataset if you are confident that the future values of these columns will be known when you create predictions.

Keep track of which features you will use as future features, as you will need to select them as such in the training configuration.

Data volume

Your dataset needs to contain enough records. The volume of your historical data plays a part in determining how far into the future you can predict. Your desired forecast window also affects how much historical data you need.

Generally, more historical data is better than less. However, the data needs to be of good quality and capture the desired trends. If the data provides irrelevant information or contains inaccuracies, it is not helpful to have it in the model. Consider a balance between optimizing volume and maintaining quality and relevance.

Preparing an apply dataset

After you deploy a time series model, you need to develop an apply dataset for which predictions will be made.

Apply dataset — Requirements and validation

For time series models, the apply dataset needs:

  • Columns and column headers for all columns included in the training dataset.

  • The same time step as the training dataset.

  • As many or more historical data records (per target and group) prior to the forecast cut-off time as the number of records in the apply window for the model. These need to be full records containing the historically observed date or time stamp, target, and covariate values. The apply window is determined by the forecast window and gap configured during training — the longer into the future you need to predict, the more historical data you need in your apply dataset to run predictions.

  • Records for all future time steps in your forecast horizon. For these future records, only include the values for the date index column, as well as any future features. Leave the values for the other columns blank.

Tip note: Most of the historical data requirements for your apply dataset are to specify minimum acceptable data volumes. You can always provide more than needed. When the model generates predictions, only the records needed to cover the apply window are used.
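The following sketch assembles a small apply dataset that follows the requirements above, for a model without groups. All column names and values are hypothetical, and discount is assumed to have been configured as a future feature. The three historical rows and three future rows are for illustration only: the real minimums depend on the apply window calculated by the system and on your forecast horizon.

```python
import csv
import io
from datetime import date, timedelta

# Same columns and headers as the (hypothetical) training dataset.
columns = ["date", "sales", "foot_traffic", "discount"]

# Full historical records up to the forecast cut-off time.
history = [
    {"date": date(2025, 1, 12), "sales": 120, "foot_traffic": 300, "discount": 0.0},
    {"date": date(2025, 1, 13), "sales": 135, "foot_traffic": 320, "discount": 0.1},
    {"date": date(2025, 1, 14), "sales": 128, "foot_traffic": 310, "discount": 0.1},
]

# Forecast cut-off time: the last date that has a target value.
cutoff = max(row["date"] for row in history if row["sales"] is not None)

# Known future values of the future feature, one per future time step.
planned_discounts = [0.1, 0.0, 0.0]

# Future records: only the date index and future features are filled in.
future = [
    {
        "date": cutoff + timedelta(days=i + 1),
        "sales": "",
        "foot_traffic": "",
        "discount": planned_discount,
    }
    for i, planned_discount in enumerate(planned_discounts)
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns)
writer.writeheader()
writer.writerows(history + future)
apply_csv = buffer.getvalue()
print(apply_csv)
```

In the resulting CSV, the future rows carry a date and a discount value, while the sales and foot_traffic fields are left blank.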

Illustrations showing the required columns and data for apply datasets used to generate predictions from time series forecasting models. Scenarios with no groups, one group, and two groups are described.

Linear diagram outlining the needed components, and timeline, of an apply dataset that is used to generate predictions with a time series forecasting model.
