Monitor AI agents with Splunk APM
Monitor the performance, quality, and token usage of your AI agents with Splunk APM.
The Agents page can help you answer questions such as:
- Which of my AI agents are currently degraded in performance?
- What AI agents are using the most tokens?
- What quality issues are currently affecting my AI agents?
- What types of quality issues are most prevalent?
Prerequisites
To monitor AI agents with Splunk APM, set up AI Agent Monitoring.
Monitor all AI agents
To monitor all AI agents, open the Agents page from the Splunk Observability Cloud main menu. The following screenshot displays an example of the Agents page.
On the Agents page, the panels above the table display the aggregate metrics across all your agents. The table displays a list of the instrumented agents in your environment and their individual metrics.
Drill down into the detail view of an AI agent
In the table of agents on the Agents page, select an agent name to navigate to the detail view. The detail view for an agent displays charts for the metrics shown in the table of agents.
The following screenshot displays an example of the detail view for an agent.
Use the agent detail view to answer questions such as:
- When did my agent start experiencing errors or issues?
- Is my agent consuming a high number of tokens?
- What quality issues is my agent facing?
View related logs for an AI agent
In the table of agents on the Agents page, select the icon in the Related logs column to navigate to the Logs page. This page displays a table of related logs.
Select a log from the table to view additional details about the AI agent calls. You can select the Trace ID or Span ID to display an option to navigate to the related trace or span.
Create a detector to generate alerts for an AI agent
To create a detector to trigger alerts for an AI agent from the Agents page, select the actions (…) menu in the row for an agent and select Create Detector. For more information about detectors and alerts, see Create detectors to trigger alerts.
About AI agent quality scores
A quality score is a percentage from 0 to 100 that measures the following metrics for AI agent responses:
- Bias: Whether responses show unfair preference for or against certain groups, ideas, or outcomes.
- Hallucination: Whether responses contain factually incorrect or fabricated information.
- Relevance: Whether responses are on topic, helpful, and match the user's question or task.
- Sentiment: Whether the tone of responses is positive, negative, or neutral.
- Toxicity: How harmful or offensive responses are.
Quality scores are calculated from instrumentation-side evaluations. The instrumentation frameworks for your AI applications trigger evaluations performed by DeepEval, an open-source evaluation framework for LLMs, and the Splunk Distribution of the OpenTelemetry Collector sends the evaluation results to Splunk Observability Cloud. Splunk Observability Cloud receives and displays evaluation results, but does not have visibility into your interaction inputs or outputs. An agent is flagged as having a quality issue when less than 80% of the evaluations pass for a metric.
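The pass-rate check described above can be sketched as follows. This is an illustrative sketch only, assuming each evaluation reduces to a simple pass/fail result; it is not the exact calculation Splunk Observability Cloud performs:

```python
def quality_score(evaluations):
    """Return the percentage of passing evaluations, or None if there are none."""
    if not evaluations:
        return None
    return 100 * sum(evaluations) / len(evaluations)

def has_quality_issue(evaluations, threshold=80):
    """Flag a quality issue when less than 80% of evaluations pass for a metric."""
    score = quality_score(evaluations)
    return score is not None and score < threshold

# 7 of 10 evaluations pass (70%), which is below the 80% threshold.
print(has_quality_issue([True] * 7 + [False] * 3))  # True
```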
By default, Splunk Observability Cloud samples all collected spans to calculate quality scores. To control the sample rate, you can configure the OTEL_INSTRUMENTATION_GENAI_EVALUATION_SAMPLE_RATE setting when you instrument your application with the Splunk Distribution of the OpenTelemetry Collector. For more information on this setting, see Configure the Python agent for AI applications.
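As an illustration, you could set the sampling environment variable from Python before the instrumentation initializes. The value 0.25 is a placeholder and assumes the setting accepts a fraction between 0 and 1; see Configure the Python agent for AI applications for the accepted format. Environment variables like this are more commonly set in the process environment before the application starts:

```python
import os

# Illustrative value only: assumes the setting takes a fraction between 0 and 1.
# This must run before the instrumentation reads the environment; otherwise,
# set the variable in the shell or deployment configuration instead.
os.environ["OTEL_INSTRUMENTATION_GENAI_EVALUATION_SAMPLE_RATE"] = "0.25"
```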