Aggregate processor
Performs operations on groups of data.
The Aggregate processor allows you to group the data from the input flow into aggregation sets, whose values can then be used in numerical operations. The result of each operation is output in a new field.
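To make the idea of an aggregation set concrete, the grouping step can be sketched in Python. This is a minimal illustration under stated assumptions, not the processor's implementation: records are modeled as dictionaries, and the function `group_records` and the sample field names are hypothetical.

```python
from collections import defaultdict
from typing import Any


def group_records(
    records: list[dict[str, Any]], group_by: str
) -> dict[Any, list[dict[str, Any]]]:
    """Split records into aggregation sets keyed by the value of the group-by field.

    Hypothetical helper for illustration only.
    """
    groups: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for record in records:
        # Every record sharing the same group-by value lands in the same set.
        groups[record[group_by]].append(record)
    return dict(groups)


records = [
    {"name": "Ann", "age_group": "18-25", "purchases": 3},
    {"name": "Bob", "age_group": "26-35", "purchases": 5},
    {"name": "Cid", "age_group": "18-25", "purchases": 1},
]
sets = group_records(records, "age_group")
print({key: len(members) for key, members in sets.items()})  # {'18-25': 2, '26-35': 1}
```

Each operation is then applied separately to the values of one field within each set.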
Usage
The Aggregate processor requires one input flow and can generate only one output flow.
Properties
The following properties define how your records are aggregated.
Category name | Property name | Configuration |
---|---|---|
Group by | Field | In the list, select the field you want to use for your aggregation sets. |
Operations | Field | Select the field on which you want to perform a calculation operation. |
Operations | Operation | Select the operation you want to apply on your aggregation set: Average, Count, Count Distinct, Concatenate, Maximum, Minimum, Sum. |
Operations | Output field name (optional) | Enter a name for the generated output field. If left empty, the generated field name combines the name of the selected field with the name of the selected operation. The name must follow the expected format, for example: ASDasd123_4564. |
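As a rough guide to what each operation in the table computes on the values of one aggregation set, here is a Python sketch. The mapping is an assumption based on the usual meaning of each operation name: how the processor handles null values and numeric types is not covered, and the comma separator used for Concatenate is purely hypothetical.

```python
from statistics import mean
from typing import Any, Callable

# Hypothetical mapping from the operation names listed in the table to
# plain Python equivalents. Null handling, numeric types and the
# Concatenate separator are assumptions, not documented behavior.
OPERATIONS: dict[str, Callable[[list[Any]], Any]] = {
    "Average": mean,
    "Count": len,
    "Count Distinct": lambda values: len(set(values)),
    "Concatenate": lambda values: ",".join(str(v) for v in values),  # separator assumed
    "Maximum": max,
    "Minimum": min,
    "Sum": sum,
}


def apply_operation(values: list[Any], operation: str) -> Any:
    """Apply one named operation to the values of a field within an aggregation set."""
    return OPERATIONS[operation](values)


purchases = [3, 1, 3]
for name in OPERATIONS:
    print(name, apply_operation(purchases, name))
```

Note the difference between Count, which counts every value in the set, and Count Distinct, which counts each distinct value once.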
To rename the processor or edit its description, click the Edit icon next to the processor name in the Properties panel.
Example
The data used as source in this example contains customer information such as names, age groups, gender and number of purchases.
To gain more insight into the data, you can use the Aggregate processor to compute new statistics, such as the average number of purchases per age group.
In the processor properties, select the Age group field as the field to group by, and Purchases as the field to perform operations on. Select Average as the operation, and enter a name for the field that will be generated in the output.
To perform another operation on the same age groups, such as calculating the total number of purchases, click the + icon next to the Operations property. This time, select Sum as the operation.
The output flow now contains the Age group field and the two new fields holding the computed statistics. Other fields from the source data are not included in the output of the processor.
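The configuration in this example can be mimicked in Python to show the shape of the output flow. This is a sketch under assumptions: the sample customers, the function `aggregate` and the output field names `avg_purchases` and `total_purchases` are all hypothetical; only the roles of the fields follow the example above.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Callable

# Hypothetical sample records; only the field roles mirror the example.
customers = [
    {"name": "Ann", "age_group": "18-25", "gender": "F", "purchases": 4},
    {"name": "Bob", "age_group": "26-35", "gender": "M", "purchases": 6},
    {"name": "Cid", "age_group": "18-25", "gender": "M", "purchases": 2},
    {"name": "Dee", "age_group": "26-35", "gender": "F", "purchases": 10},
]

Operation = tuple[str, Callable[[list[Any]], Any], str]  # (field, function, output name)


def aggregate(
    records: list[dict[str, Any]], group_by: str, operations: list[Operation]
) -> list[dict[str, Any]]:
    """Group records, then compute one output field per configured operation."""
    groups: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for record in records:
        groups[record[group_by]].append(record)
    output = []
    for key, members in groups.items():
        # Only the group-by field and the computed fields are kept;
        # other source fields such as name and gender are dropped.
        row: dict[str, Any] = {group_by: key}
        for field, func, out_name in operations:
            row[out_name] = func([m[field] for m in members])
        output.append(row)
    return output


result = aggregate(
    customers,
    group_by="age_group",
    operations=[
        ("purchases", mean, "avg_purchases"),   # Average
        ("purchases", sum, "total_purchases"),  # Sum
    ],
)
for row in result:
    print(row)
```

Each output row holds one age group with its average and total number of purchases, which matches the description of the output flow above.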