Aggregate event data using Edge Processor

Learn how to aggregate event data using Edge Processor to optimize data flow and reduce log volume by processing partial aggregations in batches.

You can create a pipeline that aggregates your incoming event data to reduce the volume of raw logs being sent to your destination.

For example, when working with network flow record data, you can get the sum of total bytes transferred from each source IP address. You can then send these aggregations as individual events to your data destination to be indexed.

Data is aggregated using the stats command in an SPL2 pipeline statement. See the stats command documentation in the SPL2 Search Reference manual for more information.

Differences between ingest-time aggregations and search-time aggregations

Due to the streaming data flow through Edge Processor pipelines, the stats command performs partial aggregations in batches. These batches can be made up of single or multiple events, depending on the rate of your data flow through pipelines and data source configurations. If aggregable data is processed across multiple batches, each batch is processed independently. These batches of partial aggregations are then routed as events to your pipeline destination. If you want to retrieve a complete and finalized aggregation of your data, you must run a search using the stats command in Splunk platform once your data is indexed. See Processing aggregations at search-time for more information.

Additionally, some optional arguments are available for search-time stats but not for ingest-time stats, and others are available only for ingest-time aggregations. The following table shows which optional arguments you can use with stats in Edge Processor pipelines and which you can use in Splunk platform searches.
Optional argument   Can be used in an Edge Processor pipeline   Can be used in Splunk platform search
all-num             No                                           Yes
batch_id            Yes                                          No
batch_time          Yes                                          No
by-clause           Yes                                          Yes
delim               No                                           Yes
instance_id         Yes                                          No
partitions          No                                           Yes
span                Yes                                          Yes
For more information on the statistical functions that can be used with stats in a pipeline, see SPL2 statistical functions for Edge Processor pipelines.
Note: While the statistical function avg() is not available for use with the stats command in Edge Processor pipelines, an aggregation with an average of values can still be computed by using sum and count in your Edge Processor pipeline statement, then dividing at search-time. See Processing aggregations at search-time for a full pipeline and search statement example.

Create a pipeline to aggregate your data

Create a data pipeline to aggregate and process event data using the pipeline editor.

To create a pipeline that aggregates your data in batches, use the Add summary action in the pipeline editor to specify the fields you want to summarize.
  1. Navigate to the Pipelines page and then select New pipeline, then Edge Processor pipeline.
  2. On the Define your pipeline's partition page, do the following:
Select how you want to partition the incoming data that is sent to your pipeline.
      You can partition by source type, source, and host.
    2. Enter the conditions for your partition, including the operator and the value.
      Your pipeline will receive and process the incoming data that meets these conditions.
    3. Select Next to confirm the pipeline partition.
  3. On the Add sample data page, do the following:
    1. Enter or upload sample data for generating previews that show how your pipeline processes data.
The sample data must contain accurate examples of the values that you want to retrieve statistics from. For example, the following sample events represent files requested from a server and contain aggregable data.
      server_name server_ip file_requested bytes_out sourcetype
      web-01 10.1.2.3 /index.html 500 web
      web-01 10.1.2.3 /images/logo.png 8500 web
      web-02 10.1.2.4 /css/main.css 1200 web
    2. Select Next to confirm the sample data that you want to use for your pipeline.
  4. On the Select a data destination page, select the name of the destination that you want to send your processed data to.
  5. If you selected a Splunk platform destination, you can configure index routing:
    1. Select one of the following options in the expanded destinations panel.
Default: The pipeline does not route events to a specific index. If the event metadata already specifies an index, then the event is sent to that index. Otherwise, the event is sent to the default index of the Splunk Cloud Platform deployment.
Specify index for events with no index: The pipeline routes events to your specified index only if the event metadata does not already specify an index.
Specify index for all events: The pipeline routes all events to your specified index.
If you selected Specify index for events with no index or Specify index for all events, then from the drop-down list that appears, select the name of the index that you want to send your data to.
      Note: Be aware that the destination index is determined by a precedence order of configurations. See How does an Edge Processor know which index to send data to? for more information.
      Note: It is recommended that you do not send both non-summarized and summarized data to the same index.
  6. Select Done to confirm the data destination.
    After you complete the on-screen instructions, the pipeline builder displays the SPL2 statement for your pipeline.
  7. (Optional) To generate a preview of how your pipeline processes data based on the sample data that you provided, select the Preview Pipeline icon (Image of the Preview Pipeline icon). Use the preview results to validate your pipeline configuration.
Select the plus icon (Image of the plus icon) in the Actions section and then select Summarize values in field.
  9. In the Add summary dialog box, do the following:
    1. Add an aggregation in the Aggregations section. Select a field to aggregate and the calculation you want to perform for that field. You can add multiple fields for multiple aggregations.
For example, using the sample data provided in step 3a, add bytes_out as the source field and select sum as the calculation.
In the Split by section, add one or more fields to group your aggregations by, if desired.
      Continuing the example from the previous step, if you wanted to group the sum of bytes_out by the server names given in the server_name field, add server_name in the Split by section.
    The resulting output of this example would be aggregated as the following events:
    server_name bytes_out _raw
    web-01 9000 {"server_name": "web-01", "bytes_out": 9000}
web-02 1200 {"server_name": "web-02", "bytes_out": 1200}
    Note: Your pipeline keeps the fields specified in the Aggregations and Split by sections and filters out the unspecified fields. In the example above, the pipeline filters out the server_ip and file_requested fields because they are not included in the aggregation.
  10. To save your pipeline, do the following:
    1. Select Save pipeline.
    2. In the Name field, enter a name for your pipeline.
    3. In the Description field, enter a description for your pipeline.
    4. Select Save.
  11. To apply this pipeline, do the following:
    1. Navigate to the Pipelines page.
    2. In the row that lists your pipeline, select the Actions icon (Image of the Actions icon) and then select Apply.
Select the Edge Processors that you want to apply the pipeline to, and then select Save.
    It can take a few minutes for the process of applying the pipeline to the Edge Processor to complete. During this time, the status of the affected Edge Processors is Pending. To confirm that the process completed successfully, do the following:
    • Navigate to the Edge Processors page. Then, verify that the Instance health column for the affected Edge Processors shows that all instances are back in the Healthy status.
    • Navigate to the Pipelines page. Then, verify that the Applied column for the pipeline contains a The pipeline is applied icon (Image of the "applied pipeline" icon).
      Note: You might need to refresh your browser to see the latest updates.

      For information about other ways to apply pipelines to Edge Processors, see Apply pipelines to Edge Processors.

    The Edge Processors that you applied the pipeline to can now process and route data as specified in the pipeline configuration.
You now have a pipeline that calculates aggregations from your event data.
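
For reference, a pipeline built from this example corresponds to an SPL2 statement along the following lines. This is a sketch: the exact statement that the pipeline builder generates can differ, and $source and $destination are the standard pipeline parameters.
CODE
$pipeline = | from $source
    | stats sum(bytes_out) AS bytes_out BY server_name
    | into $destination;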

Managing aggregations

Aggregation patterns

Different types of aggregations can be created with the stats command in an Edge Processor pipeline depending on how many fields you want your indexed events to retain.

Summary pattern: Choosing a limited number of fields to aggregate in your pipeline filters out unselected fields and outputs a distilled summarization of your data. See Create a pipeline to aggregate your data for an example of the summary pattern.

Passthrough pattern: If you want to retain all fields from your events and still reduce the overall number of events being indexed, you can add all of the desired fields to the BY clause of your pipeline statement. See the following example of the passthrough pattern in practice.

Data input
server_name server_ip file_requested bytes_out sourcetype
web-01 10.1.2.3 /index.html 500 web
web-01 10.1.2.3 /index.html 500 web
web-02 10.1.2.4 /css/main.css 1200 web
SPL2 statement
CODE
... | eval orig_sourcetype=sourcetype
    | stats sum(bytes_out) AS bytes_out BY server_name, server_ip, file_requested, orig_sourcetype
    | rename orig_sourcetype AS sourcetype
    | eval _raw=json_delete(_raw, "orig_sourcetype")
Aggregation output
_raw server_name server_ip file_requested bytes_out sourcetype
{"server_name": "web-01", "server_ip": "10.1.2.3", "file_requested": "/index.html", "bytes_out": 1000} web-01 10.1.2.3 /index.html 1000 web
{"server_name": "web-02", "server_ip": "10.1.2.4", "file_requested": "/css/main.css", "bytes_out": 1200} web-02 10.1.2.4 /css/main.css 1200 web

Processing aggregations at search-time

Now that you have an applied pipeline aggregating your data in batches and sending these batches to your Splunk platform destination, you can finalize your batched aggregations by running a search in Splunk platform using the stats command. Each statistical function run in an Edge Processor pipeline has a corresponding SPL1 query that finalizes the aggregation. See the following examples of these pipeline statements and search statements.

All examples use the following sample data as input:
server_name server_ip file_requested bytes_out sourcetype
web-01 10.1.2.3 /index.html 500 web
web-01 10.1.2.3 /images/logo.png 8500 web
web-02 10.1.2.4 /css/main.css 1200 web
sum
SPL2 pipeline statement
CODE
... | stats sum(bytes_out) as bytes_out BY server_name
    | eval sourcetype="web:summary"
Aggregation output
server_name bytes_out _raw sourcetype
web-01 9000 {"server_name": "web-01", "bytes_out": 9000} web:summary
web-02 1200 {"server_name": "web-02", "bytes_out": 1200} web:summary
SPL1 query
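A sketch of the finalizing search, assuming that your pipeline set the web:summary sourcetype as shown above; index=main is a placeholder for your destination index.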
CODE
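index=main sourcetype="web:summary"
    | stats sum(bytes_out) as bytes_out by server_name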
count
SPL2 pipeline statement
CODE
... | stats count() as event_count BY server_name
    | eval sourcetype="web:summary"
Aggregation output
server_name event_count _raw sourcetype
web-01 2 {"server_name": "web-01", "event_count": 2} web:summary
web-02 1 {"server_name": "web-02", "event_count": 1} web:summary
SPL1 query
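Because each batch emits a partial count, the finalizing search sums the partial event_count values instead of counting events again. The index name is again a placeholder.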
CODE
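index=main sourcetype="web:summary"
    | stats sum(event_count) as event_count by server_name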
min
SPL2 statement
CODE
... | stats min(bytes_out) as min_bytes BY server_name
    | eval sourcetype="web:summary"
SPL1 query
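To finalize, take the minimum of the partial minimums: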
CODE
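index=main sourcetype="web:summary"
    | stats min(min_bytes) as min_bytes by server_name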
max
SPL2 statement
CODE
... | stats max(bytes_out) as max_bytes BY server_name
    | eval sourcetype="web:summary"
Aggregation output
server_name max_bytes _raw sourcetype
web-01 8500 {"server_name": "web-01", "max_bytes": 8500} web:summary
web-02 1200 {"server_name": "web-02", "max_bytes": 1200} web:summary
SPL1 query
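To finalize, take the maximum of the partial maximums: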
CODE
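index=main sourcetype="web:summary"
    | stats max(max_bytes) as max_bytes by server_name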
avg
Note: The statistical function avg is not available to use in an Edge Processor pipeline, but a combination of the sum and count functions and a finalizing query achieves the same calculation result.
SPL2 statement
CODE
... | stats sum(bytes_out) as bytes_out, count(bytes_out) as event_count BY server_name
    | eval sourcetype="web:summary"
Aggregation output
server_name bytes_out event_count _raw sourcetype
web-01 9000 2 {"server_name": "web-01", "bytes_out": 9000, "event_count": 2} web:summary
web-02 1200 1 {"server_name": "web-02", "bytes_out": 1200, "event_count": 1} web:summary
SPL1 query
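To finalize, sum the partial sums and partial counts, then divide. The total_bytes, total_count, and avg_bytes field names are illustrative: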
CODE
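index=main sourcetype="web:summary"
    | stats sum(bytes_out) as total_bytes, sum(event_count) as total_count by server_name
    | eval avg_bytes=total_bytes/total_count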
span
Input data
_time server_name server_ip file_requested bytes_out sourcetype
2025-01-01 12:00:05 web-01 10.1.2.3 /index.html 500 web
2025-01-01 12:01:22 web-01 10.1.2.3 /images/logo.png 8500 web
2025-01-01 12:00:28 web-02 10.1.2.4 /css/main.css 1200 web
SPL2 statement
CODE
... | stats sum(bytes_out) as bytes_out BY span(_time, 1m), server_name
    | eval sourcetype="web:summary"
Aggregation output
_time server_name bytes_out _raw sourcetype
2025-01-01 12:00:00 web-01 500 {"server_name": "web-01", "bytes_out": 500, "_time":1735732800} web:summary
2025-01-01 12:01:00 web-01 8500 {"server_name": "web-01", "bytes_out": 8500, "_time":1735732860} web:summary
2025-01-01 12:00:00 web-02 1200 {"server_name": "web-02", "bytes_out": 1200, "_time":1735732800} web:summary
SPL1 query
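To finalize, re-bucket the events into the same 1-minute spans and sum the partial values. This sketch uses the bin command for the time bucketing: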
CODE
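index=main sourcetype="web:summary"
    | bin _time span=1m
    | stats sum(bytes_out) as bytes_out by _time, server_name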