Monitor AI agents with Splunk APM
Monitor the performance, quality, and token usage of your AI agents with Splunk APM.
The Agents page can help you answer questions such as:
- Which of my AI agents are currently degraded in performance?
- What AI agents are using the most tokens?
- What quality issues are currently affecting my AI agents?
- What types of quality issues are most prevalent?
Prerequisites
To monitor AI agents with Splunk APM, set up AI Agent Monitoring.
Monitor all AI agents
To monitor all AI agents, open the Agents page from the Splunk Observability Cloud main menu. The following screenshot displays an example of the Agents page.
On the Agents page, the panels above the table display the aggregate metrics across all your agents. The table displays a list of the instrumented agents in your environment and their individual metrics.
Drill down into the detail view of an AI agent
In the table of agents on the Agents page, select an agent name to navigate to the detail view. The detail view for an agent displays charts for the metrics shown in the table of agents.
The following screenshot displays an example of the detail view for an agent.
Use the agent detail view to answer questions such as:
- When did my agent start experiencing errors or issues?
- Is my agent consuming a high number of tokens?
- What quality issues is my agent facing?
View related logs for an AI agent
In the table of agents on the Agents page, select the icon in the Related logs column to navigate to the Logs page. This page displays a table of related logs.
Select a log from the table to view additional details about the AI agent calls. You can select the Trace ID or Span ID to display an option to navigate to the related trace or span.
Create a detector to generate alerts for an AI agent
To create a detector to trigger alerts for an AI agent from the Agents page, select the actions (…) menu in the row for an agent and select Create Detector. For more information about detectors and alerts, see Create detectors to trigger alerts.
About AI agent quality scores
A quality score is a percentage from 0 to 100 that measures the following metrics for AI agent responses:
- Bias: Whether responses show unfair preference for or against certain groups, ideas, or outcomes.
- Hallucination: Whether responses contain factually incorrect or fabricated information.
- Relevance: Whether responses are on topic, helpful, and match the user's question or task.
- Sentiment: Whether the tone of responses is positive, negative, or neutral.
- Toxicity: How harmful or offensive responses are.
Quality scores are calculated from instrumentation-side evaluations. The instrumentation frameworks for your AI applications trigger evaluations performed by DeepEval, an open-source evaluation framework for LLMs, and the Splunk Distribution of the OpenTelemetry Collector sends the evaluation results to Splunk Observability Cloud. Splunk Observability Cloud receives and displays evaluation results, but does not have visibility into your interaction inputs or outputs. An agent is flagged as having a quality issue when less than 80% of the evaluations pass for a metric.
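The pass-rate check described above can be sketched as follows. This is an illustrative sketch only, assuming each evaluation reduces to a simple pass/fail result; it is not the exact calculation Splunk Observability Cloud performs:

```python
def quality_score(evaluations):
    """Return the percentage of passing evaluations, or None if there are none."""
    if not evaluations:
        return None
    return 100 * sum(evaluations) / len(evaluations)

def has_quality_issue(evaluations, threshold=80):
    """Flag a quality issue when less than 80% of evaluations pass for a metric."""
    score = quality_score(evaluations)
    return score is not None and score < threshold

# 7 of 10 evaluations pass (70%), which is below the 80% threshold.
print(has_quality_issue([True] * 7 + [False] * 3))  # True
```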
By default, Splunk Observability Cloud samples all collected spans to calculate quality scores. To control the sample rate, you can configure the OTEL_INSTRUMENTATION_GENAI_EVALUATION_SAMPLE_RATE setting when you instrument your application with the Splunk Distribution of the OpenTelemetry Collector. For more information on this setting, see Configure the Python agent for AI applications.
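As an illustration, you could set the sampling environment variable from Python before the instrumentation initializes. The value 0.25 is a placeholder and assumes the setting accepts a fraction between 0 and 1; see Configure the Python agent for AI applications for the accepted format. Environment variables like this are more commonly set in the process environment before the application starts:

```python
import os

# Illustrative value only: assumes the setting takes a fraction between 0 and 1.
# This must run before the instrumentation reads the environment; otherwise,
# set the variable in the shell or deployment configuration instead.
os.environ["OTEL_INSTRUMENTATION_GENAI_EVALUATION_SAMPLE_RATE"] = "0.25"
```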