Monitor Kubernetes

Monitor and troubleshoot Kubernetes instances with Splunk Observability Cloud.

Note:

This page describes the new Kubernetes monitoring experience. For documentation on the classic Kubernetes navigator, see Monitor Kubernetes (classic navigator).

Feature availability depends on your Splunk Distribution of the OpenTelemetry Collector for Kubernetes version. Use version 0.138.1 or higher for access to all features. For upgrade instructions, see Upgrade the Collector for Kubernetes.
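As a rough illustration, the version requirement can be checked with a numeric comparison. The sketch below assumes you already have the Collector version as a plain `major.minor.patch` string (how you obtain it depends on your deployment). The names `parse_version` and `meets_minimum` are hypothetical and are not part of any Splunk tooling.

```python
MINIMUM_VERSION = "0.138.1"  # minimum version stated on this page


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '0.138.1' (optionally prefixed with 'v') into (0, 138, 1)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def meets_minimum(version: str, minimum: str = MINIMUM_VERSION) -> bool:
    """Return True when `version` is the minimum version or higher."""
    return parse_version(version) >= parse_version(minimum)


print(meets_minimum("0.138.1"))
print(meets_minimum("0.97.0"))
```

Comparing integer tuples rather than raw strings matters here: as strings, "0.97.0" sorts after "0.138.1", even though 97 is the lower minor version.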

Splunk Observability Cloud uses the Splunk Distribution of OpenTelemetry Collector for Kubernetes to provide robust infrastructure monitoring capabilities.

Use the Kubernetes entities page to:

  • Get an overview of your Kubernetes infrastructure.

  • Monitor the health of your Kubernetes infrastructure.

  • Identify and diagnose an issue with your Kubernetes infrastructure.

  • View services and hosts running on Kubernetes.

Prerequisites

To monitor Kubernetes instances with the Kubernetes entities page, you must meet the following requirement:

  • You are logged in to Splunk Observability Cloud with administrator credentials.

Access the Kubernetes entities page

To access the Kubernetes entities page, you must first opt in to the new Kubernetes experience:

  1. In the Splunk Observability Cloud main menu, select Infrastructure > Kubernetes.

  2. Select a Kubernetes navigator.

  3. Use one of the following methods to opt in to the new experience.

    1. Select Switch now from the banner displayed on the aggregate view for the navigator.

    2. Select a Kubernetes instance name. In the displayed banner, select Switch to new experience.

After you opt in to the new Kubernetes experience, you can access the page by using the Splunk Observability Cloud main menu to select Infrastructure > Kubernetes entities.

Refine your view with the left navigation panel

Use the left navigation panel, which is available in every view, to quickly switch between Kubernetes entity types, search for filters, apply predefined filters, and view or reapply recently used filters.

To refine your view with the left navigation panel, use the following features:

  • Hide Controls: Select this option to collapse the left navigation panel.

  • Kubernetes entities: Select a Kubernetes entity type from this menu to switch between entity types.

  • Quick filters: Use this panel to search for filters, browse and apply predefined filters, or view and reapply recently used filters.

Monitor Kubernetes instances with aggregate views

On the Kubernetes entities page, you can monitor all Kubernetes instances of a selected entity type using the Table and Dashboard tabs.

The following sections describe how to use the aggregate views:

View or pause real-time Kubernetes data

By default, the Kubernetes entities page streams real-time data with a minor delay and displays the most recent data point. The LIVE icon next to the time filter appears when the configured time range ends with Now.

Select the pause button to fix the end of the time range at the current time. To set the end of the time range back to Now, select the play button. Neither action changes the start of the time range.
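The pause and play behavior can be pictured as a small state change on the time range. This is only a conceptual sketch, not the product's implementation: the `TimeRange` class and the `pause` and `play` functions are hypothetical names, and an end value of `None` is used here to stand for Now.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class TimeRange:
    start: datetime
    end: Optional[datetime]  # None stands for "Now" (live streaming)

    @property
    def is_live(self) -> bool:
        return self.end is None


def pause(time_range: TimeRange, current_time: datetime) -> TimeRange:
    """Pin the end of the range to the current time; the start is untouched."""
    return replace(time_range, end=current_time)


def play(time_range: TimeRange) -> TimeRange:
    """Set the end of the range back to Now; the start is untouched."""
    return replace(time_range, end=None)


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)  # example timestamp
live = TimeRange(start=now - timedelta(minutes=15), end=None)
paused = pause(live, now)
resumed = play(paused)
print(live.is_live, paused.is_live, resumed.is_live)
```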

Customize the table view for Kubernetes entities

On the Kubernetes entities page, the Table tab is selected by default. The table displays the list of Kubernetes instances with customizable columns for metrics and attributes.

To customize the columns in the table, select the gear icon in the upper-right corner. The drop-down menu displays the current columns in the table. Use the drop-down menu to add, remove, and reorder table columns.

View and filter by OpenTelemetry tags in the table view

To view OpenTelemetry tag keys or values in the table view, follow the steps to customize the table view. You can use the drop-down menu to add a table column for:

  • Tags, which displays all OpenTelemetry tag keys for each instance.

  • A specific tag key. The column displays the tag value for each instance with a matching tag key.

The table displays the first tag key or value for each instance. To view the full list, select the +<number> button next to the first tag key or value.

When you select the +<number> button for an instance, you can use the Search bar above the table to filter the list by specific tag keys or values.

Monitor entity performance with the Dashboard tab

On the Kubernetes entities page, select the Dashboard tab to display a dashboard that summarizes the performance of the selected entity type.

Use this view to quickly identify the instances that consume the most resources, cause the highest number of errors, and contribute to slowing down your infrastructure.

Monitor Kubernetes pod statuses

On the Kubernetes entities page, select Pods from the Kubernetes entities drop-down menu. The table of pods includes a Status column. When you select the name of a pod, the detail view also displays the status next to the pod name.

Splunk Observability Cloud displays the following pod statuses.

Kubernetes pod statuses

Pod status name | Description
Pending | The pod has been accepted by the Kubernetes cluster, but one or more of the containers has not been set up and made ready to run. This includes the time a pod spends waiting to be scheduled as well as the time spent downloading container images over the network.
Running | The pod has been bound to a node, and all of the containers have been created. At least one container is still running, or is in the process of starting or restarting.
Succeeded | All containers in the pod have terminated successfully and will not be restarted.
Failed | All containers in the pod have terminated, and at least one container has terminated in failure. That is, the container either exited with a non-zero status or was terminated by the system.
Evicted | The pod has been terminated and removed from a node by the kubelet. This typically occurs due to resource pressure on the node.
NodeAffinity | The pod cannot be scheduled onto a node because the node's labels do not satisfy the node affinity rules defined in the pod's specification, or in the PersistentVolume (PV) associated with the pod.
NodeLost | The node has become unreachable by the control plane, causing the pods on that node to enter an Unknown state.
Shutdown | The pod has been terminated, typically because the node it was running on was shut down.
UnexpectedAdmissionError | The pod was rejected during the admission process, meaning the kubelet was unable to admit or start the pod on the node.
CrashLoopBackOff | A container in the pod is in a crash loop (failing and restarting repeatedly), and the backoff delay between restarts is currently in effect.
CreateContainerConfigError | The pod cannot be created because there is an issue with the configuration specified for one of its containers.
ErrImagePull | The kubelet on the node where the pod is scheduled was unable to pull the container image specified in the pod's definition from the container registry.
ImagePullBackOff | The pod is unable to start because the container image specified in its configuration cannot be pulled from the container registry.
Error | One of the containers in the pod exited with a non-zero exit code, indicating an abnormal termination.
ContainerCannotRun | A container within the pod failed to start successfully.
Unknown | The state of the pod could not be obtained. This typically occurs due to an error in communicating with the node where the pod should be running.
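The statuses in the table come from different levels of a pod's reported state in Kubernetes: pod phases (Pending, Running, Succeeded, Failed, Unknown), pod-level reasons (such as Evicted), and container-level reasons (such as CrashLoopBackOff). This page does not state how Splunk Observability Cloud combines them, so the following is only a sketch under an assumed precedence: a container reason from the table wins, then the pod-level reason, then the phase. The function `display_status` is hypothetical and operates on a dict shaped like the `status` field of a Kubernetes Pod object.

```python
from typing import Any

# Container-level reasons that appear in the pod status table.
CONTAINER_REASONS = {
    "CrashLoopBackOff", "CreateContainerConfigError", "ErrImagePull",
    "ImagePullBackOff", "Error", "ContainerCannotRun",
}


def display_status(pod_status: dict[str, Any]) -> str:
    """Pick a single status string for a pod (assumed precedence, see above)."""
    for container in pod_status.get("containerStatuses") or []:
        state = container.get("state") or {}
        for kind in ("waiting", "terminated"):
            reason = (state.get(kind) or {}).get("reason")
            if reason in CONTAINER_REASONS:
                return reason
    # Fall back to the pod-level reason, then the phase.
    return pod_status.get("reason") or pod_status.get("phase") or "Unknown"


examples = [
    {"phase": "Running",
     "containerStatuses": [{"state": {"waiting": {"reason": "CrashLoopBackOff"}}}]},
    {"phase": "Failed", "reason": "Evicted"},
    {"phase": "Pending"},
]
for status in examples:
    print(display_status(status))
```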

Troubleshoot performance with the Kubernetes analyzer in aggregate views

This feature is available for Kubernetes nodes, pods, and containers.

In any aggregate view on the Kubernetes entities page, select the Kubernetes analyzer icon next to the Group by drop-down menu to analyze the performance of the selected entity type.

For more information on using the Kubernetes analyzer, see Troubleshoot performance with the Kubernetes analyzer.

Configure a Kubernetes entity type to require a filter to display data

When a Kubernetes entity type exceeds the limit on the number of metric time series (MTS) processed in a signal, the aggregate views on the Kubernetes entities page display only a subset of the data until you apply a filter to limit the results. For more information on MTS limits, see Maximum number of metric time series processed in a signal.

MTS limit violations can cause low responsiveness on the Kubernetes entities page. To prevent this, you can configure a Kubernetes entity type to require a filter to display data. When you configure this setting, the aggregate views for the entity type will not display data until a filter is added.

To configure a Kubernetes entity type to require a filter to display data:

  1. In any aggregate view on the Kubernetes entities page, select the actions menu (...) in the upper-right corner.

  2. Check the box for Require filters for data display.

Monitor a Kubernetes instance with the detail view

On the Kubernetes entities page, select the name of a Kubernetes instance to navigate to the detail view to explore more information about the instance.

The following sections describe how to use the detail view:

Investigate instances with the cluster map

This feature is available for Kubernetes clusters, nodes, pods, and containers.

In the detail view for an instance, select the Cluster map tab to monitor your Kubernetes infrastructure with an interactive hierarchical map that displays the child resources associated with the selected instance. You can select elements in the map to drill down into them or use the filter to explore your data. The level of detail shown on the map is dynamic and depends on the number of displayed elements.

View Kubernetes events associated with an instance

Note: This feature is available for Kubernetes nodes and pods. It requires enabling Log Observer Connect and configuring a default connection to retrieve entity logs. For instructions, see Get started with Log Observer Connect.

In the detail view for an instance, select the K8s Events tab to view an events rate chart that groups events by severity and a searchable table that lists the Kubernetes events associated with the entity type.

Search embedded logs

Note: This feature is available for Kubernetes nodes. It requires enabling Log Observer Connect and configuring a default connection to retrieve entity logs. For instructions, see Get started with Log Observer Connect.

Search for specific keywords within logs embedded in Kubernetes entities to speed up troubleshooting and log analysis. Your search does not change the Log Chart Summary in Log Observer Connect.

To search embedded logs, navigate to the detail view for an instance and select the Logs tab. Use one of the search bars to search for the keyword that you want to find in embedded logs.

Searches are case-insensitive and treat the keywords you enter as a single string, which matches Log Observer Connect behavior. When you open the logs in Log Observer Connect, the search persists to maintain context.
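The search semantics can be sketched in a few lines. This example assumes a plain substring match, which is an illustration of "case-insensitive, single string" and not a statement of how the product matches logs internally. The function `matches` and the sample log lines are hypothetical.

```python
def matches(log_line: str, query: str) -> bool:
    """Case-insensitive match that treats the whole query as one string."""
    return query.casefold() in log_line.casefold()


logs = [
    "Readiness probe failed: Connection refused",
    "Connection established, probe ok",
]
hits = [line for line in logs if matches(line, "connection refused")]
print(hits)
```

Only the first line matches. The second line contains the word "Connection", but because the query is treated as one string rather than as separate keywords, it is not a match.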

Troubleshoot performance with the Kubernetes analyzer

This feature is available for Kubernetes nodes, pods, and containers.

In the detail view for an instance, select the Analyzer tab to access the Kubernetes analyzer. The analyzer helps you troubleshoot Kubernetes problems at scale by highlighting Kubernetes instances that are in a bad state, such as nodes that are not ready. The analyzer then suggests what those instances might have in common, for example that they all run the same workload or are all located in the same AWS region.

The analyzer displays suggested filters for the elements selected in the table view. Select links in the analyzer to add filters to the table view and explore conditions across your entire Kubernetes environment.

The analyzer uses AI-driven insights to examine potential patterns between nodes, pods, or containers. The trouble indicators are:

  • Pods that are in pending status

  • Pods that are in failed status

  • Pods with unknown condition

  • Containers with high restart counts

  • Nodes not ready

  • Nodes with unknown condition

  • Nodes experiencing high CPU

  • Nodes experiencing high memory

The analyzer displays metric properties that are overrepresented for known conditions, such as pods in pending status or pods in failed status. You can use properties that are highly correlated with these conditions to filter the table.
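To make "overrepresented" concrete, the sketch below compares how often a property value occurs among instances in a bad state with how often it occurs across all instances. This is not the analyzer's actual algorithm, which this page does not describe. The function `overrepresented`, the `min_lift` threshold, and the sample data are all hypothetical, and the bad instances are assumed to be a subset of all instances.

```python
from collections import Counter


def overrepresented(all_items, bad_items, key, min_lift=1.5):
    """Return (value, lift) pairs for values of `key` that are more common
    among bad instances than in the whole population, highest lift first."""
    total = Counter(item[key] for item in all_items if key in item)
    bad = Counter(item[key] for item in bad_items if key in item)
    n_all, n_bad = sum(total.values()), sum(bad.values())
    if not n_all or not n_bad:
        return []
    results = []
    for value, bad_count in bad.items():
        if total[value] == 0:  # value never seen in the population; skip it
            continue
        lift = (bad_count / n_bad) / (total[value] / n_all)
        if lift >= min_lift:
            results.append((value, round(lift, 2)))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Ten example pods: four in us-east-1, six in us-west-2; three are Pending.
pods = [
    {"name": f"pod-{i}",
     "region": "us-east-1" if i < 4 else "us-west-2",
     "status": "Pending" if i < 3 else "Running"}
    for i in range(10)
]
pending = [pod for pod in pods if pod["status"] == "Pending"]
print(overrepresented(pods, pending, "region"))
```

In this sample, every pending pod is in us-east-1 although only 40% of all pods are, so that region is reported with a lift of 2.5 and would be a natural candidate for a filter.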

View and compare YAML configuration files for a Kubernetes instance

Note:

This feature is available for Kubernetes pods and is enabled by default in version 0.138.1 or higher of the Splunk Distribution of the OpenTelemetry Collector for Kubernetes.

For instructions on how to enable this feature in lower versions of the Collector, see Collect YAML configuration files with the Collector for Kubernetes version 0.138.1 and lower.

In the detail view for an instance, select the YAML tab to display the configuration file for the instance. You can use this tab to compare deployment versions and debug errors. For example, you can compare the arguments that were passed to your applications.

Splunk Observability Cloud displays YAML files up to 7 days old. By default, YAML files are collected every 6 hours. To update the collection interval, update the interval value in your Collector values.yaml file. For instructions, see Collect YAML configuration files with the Collector for Kubernetes version 0.138.1 and lower.
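As a quick worked example of how the collection interval affects what you can compare, the arithmetic below estimates how many YAML snapshots fall inside the 7-day window. It assumes every scheduled collection succeeds and that each collected file stays available for the full window; the function name `snapshots_in_window` is hypothetical.

```python
from datetime import timedelta

RETENTION = timedelta(days=7)          # YAML files are displayed up to 7 days old
DEFAULT_INTERVAL = timedelta(hours=6)  # default collection interval


def snapshots_in_window(interval: timedelta, window: timedelta = RETENTION) -> int:
    """Number of whole collection intervals that fit in the display window."""
    return window // interval


print(snapshots_in_window(DEFAULT_INTERVAL))    # default: every 6 hours
print(snapshots_in_window(timedelta(hours=1)))  # a shorter, example interval
```

Under these assumptions the default interval yields 28 snapshots per instance, while an hourly interval yields 168. A shorter interval gives finer-grained comparisons at the cost of collecting more data.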

In the YAML tab, you can:

  • Enable the Compact Manifest setting to show only relevant metadata in the configuration file.

  • Select Compare and use the drop-down menus to select two YAML files. Each YAML file is identified by the timestamp at which it was collected.

    By default, the Split view displays the two YAML files and their differences side by side. Select Unified to show the differences as part of a single file.
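To illustrate what a unified comparison of two configuration snapshots looks like, the sketch below uses Python's standard `difflib` module on two small YAML fragments. It demonstrates the general idea of a unified diff, not the product's rendering. The YAML content and the timestamps used as file labels are made up for the example.

```python
import difflib

# Two hypothetical snapshots of the same pod configuration.
older = """\
containers:
- name: app
  args: ["--workers=2"]
"""
newer = """\
containers:
- name: app
  args: ["--workers=4", "--verbose"]
"""

diff = difflib.unified_diff(
    older.splitlines(keepends=True),
    newer.splitlines(keepends=True),
    fromfile="2025-01-01T00:00Z",  # label for the earlier snapshot
    tofile="2025-01-01T06:00Z",    # label for the later snapshot
)
diff_text = "".join(diff)
print(diff_text)
```

Lines prefixed with `-` exist only in the earlier snapshot and lines prefixed with `+` exist only in the later one, which makes a change in the arguments passed to an application easy to spot.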