
Amazon EKS Monitoring


Overview

Amazon Elastic Kubernetes Service (EKS) is a managed Kubernetes service that makes it easy to run Kubernetes on AWS and on-premises. EKS integrates with many AWS services to provide scalability and security, and it helps ensure high availability of your clusters and their resources across multiple Availability Zones. You do not need to install, operate, or maintain your own Kubernetes control plane.

Creating a new monitor

Prerequisites for monitoring Amazon EKS metrics: Click here

To learn how to create a new Amazon EKS monitor, refer here.

Monitored Parameters

Go to the Monitors Category View by clicking the Monitors tab. Click the EKS instance listed under Amazon in the Cloud Apps section. The Amazon EKS bulk configuration view is displayed, divided into three tabs:

  • Availability tab gives the availability history for the past 24 hours or 30 days.
  • Performance tab gives the health status and events for the past 24 hours or 30 days.
  • List view tab enables you to perform bulk admin configurations.

Click a monitor in the list to open the Amazon EKS monitor dashboard. It has six tabs: Overview, Node, Pods, Services, Persistent Volumes, and Service Map.

Note:
  • If a node is not in the 'Ready' state, the node's availability is automatically affected. In addition, by default the node's health in the monitor depends on the following parameters, and alerts for them can be configured under Settings → Performance Polling → Optimize Data Collection → Elastic Kubernetes Service:
    • EKS Node Memory Pressure
    • EKS Node Disk Pressure
    • EKS Node PID Pressure
    • EKS Node Out of Disk
    • EKS Node Network Unavailable
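The Ready state and the pressure parameters above correspond to the standard conditions Kubernetes reports in each node's `status.conditions`. As a minimal sketch, assuming the node data comes from `kubectl get nodes -o json`, the checks could look like the following. The mapping to the Applications Manager parameter names and the rule for combining them are illustrative assumptions, not the product's actual implementation, and the function names are hypothetical.

```python
import json
import subprocess

# Minimal sketch; function names are hypothetical, not Applications Manager APIs.
# Conditions that indicate a problem when their status is "True".
PROBLEM_CONDITIONS = {
    "MemoryPressure": "EKS Node Memory Pressure",
    "DiskPressure": "EKS Node Disk Pressure",
    "PIDPressure": "EKS Node PID Pressure",
    "OutOfDisk": "EKS Node Out of Disk",  # legacy condition, not reported by current Kubernetes
    "NetworkUnavailable": "EKS Node Network Unavailable",
}


def load_nodes() -> list:
    """Fetch all nodes with kubectl (needs a kubeconfig pointing at the EKS cluster)."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)["items"]


def evaluate_node(node: dict) -> dict:
    """Availability comes from the Ready condition; health also requires no problem conditions."""
    conditions = {
        c["type"]: c["status"] for c in node.get("status", {}).get("conditions", [])
    }
    available = conditions.get("Ready") == "True"
    problems = [
        label for ctype, label in PROBLEM_CONDITIONS.items()
        if conditions.get(ctype) == "True"
    ]
    return {
        "name": node["metadata"]["name"],
        "available": available,
        "healthy": available and not problems,
        "problems": problems,
    }
```

Against a live cluster, `[evaluate_node(n) for n in load_nodes()]` returns one summary per node.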

Mode of Monitoring

  • REST API
    • Metadata/Service API (DescribeCluster)
    • CloudWatch API (Container Insights)
  • kubectl
Note:
  • Cluster Information is collected from the Metadata API.
  • Metrics marked with * are collected from AWS CloudWatch.
  • All other metrics are collected using the kubectl utility.
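For the two REST API sources, a minimal sketch with the AWS SDK for Python (boto3) might look like this: `describe_cluster` on the EKS client returns the Cluster Information fields, and `get_metric_statistics` on the CloudWatch client reads a Container Insights metric such as `cluster_failed_node_count`, which appears to correspond to a starred metric like Failed Nodes. It assumes Container Insights is enabled and AWS credentials are configured; the function names and the ten-minute lookback window are hypothetical choices, not Applications Manager's implementation.

```python
from datetime import datetime, timedelta, timezone

# Minimal sketch; function names are hypothetical, not Applications Manager APIs.


def cluster_information(cluster_name: str, eks_client=None) -> dict:
    """Read the Cluster Information fields via the EKS DescribeCluster API."""
    if eks_client is None:
        import boto3  # imported lazily so a stub client can be passed in for testing
        eks_client = boto3.client("eks")
    cluster = eks_client.describe_cluster(name=cluster_name)["cluster"]
    return {
        "Cluster Status": cluster["status"],
        "Cluster ARN": cluster["arn"],
        # The endpoint is only populated once the cluster has been created.
        "Cluster Endpoint": cluster.get("endpoint"),
    }


def latest_container_insights_value(cluster_name: str, metric_name: str,
                                    cloudwatch_client=None, minutes: int = 10):
    """Return the newest 1-minute average of a cluster-level Container Insights metric."""
    if cloudwatch_client is None:
        import boto3
        cloudwatch_client = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = cloudwatch_client.get_metric_statistics(
        Namespace="ContainerInsights",
        MetricName=metric_name,
        Dimensions=[{"Name": "ClusterName", "Value": cluster_name}],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=60,
        Statistics=["Average"],
    )
    datapoints = response.get("Datapoints", [])
    if not datapoints:
        return None  # no data yet, or Container Insights is not enabled
    # CloudWatch does not guarantee datapoint order, so pick the newest explicitly.
    return max(datapoints, key=lambda p: p["Timestamp"])["Average"]
```

For example, `latest_container_insights_value("my-cluster", "cluster_failed_node_count")` returns the latest failed-node count, where `my-cluster` is a placeholder cluster name.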

Overview

Parameter - Description
CLUSTER INFORMATION
Cluster Status - Current status of the cluster (CREATING, ACTIVE, DELETING, FAILED, UPDATING).
Cluster ARN - The Amazon Resource Name (ARN) of the cluster.
Cluster Endpoint - The endpoint for your Kubernetes API server.
NAMESPACE DETAILS
Namespace Name - Name of the namespace.
Running pods in namespace * - Number of running pods in the namespace.
Resource Version - The resource version of the namespace object, an internal value that Kubernetes updates whenever the object changes.
Namespace Availability - Availability of the namespace.
Namespace Created Time - Timestamp at which the namespace was created.
PODS
Used Pods % - Number of pods used, as a percentage of the maximum number of pods.
Used Pods - Number of pods used.
Maximum Pods - Maximum number of pods available.
Top 5 Nodes by Used Pods - Displays a graphical representation of the top 5 nodes by number of pods used.
CLUSTER USAGE DETAILS
Avg Cluster CPU Usage - Average amount of CPU used by the cluster (in percentage).
Avg Cluster Memory Usage - Average amount of memory used by the cluster (in percentage).
API SERVER COMPONENT ERROR RATES
API Server - 5xx Requests - Total number of server-side HTTP 5xx error responses from the Kubernetes API server components, aggregated over the poll interval.
API Server - 4xx Requests - Total number of client-side HTTP 4xx error responses from the Kubernetes API server components, aggregated over the poll interval.
API Server - 429 Requests - Total number of throttled API requests (HTTP 429) from the Kubernetes API server components, aggregated over the poll interval.
API SERVER COMPONENTS INFLIGHT REQUESTS
API Server - Read Only Inflight Requests - Average number of in-flight read-only requests being processed by the API server during the poll interval.
API Server - Mutating Inflight Requests - Average number of in-flight mutating requests being processed by the API server during the poll interval.
NODE DETAILS
Master Nodes - Number of master nodes available.
Worker Nodes - Number of worker nodes available.
Failed Nodes * - Number of nodes that have failed.
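CLUSTER DETAILS
Git Version - Git version of the Kubernetes server running the cluster.
Build Date - Build date of the cluster's Kubernetes server.
Compiler - Name of the compiler used to build the cluster's Kubernetes server.
Platform - OS platform of the cluster.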
COMPONENT DETAILS

Note: Component Details monitoring is not supported from Applications Manager version 17.86 onward, because the kubectl get componentstatuses command has been deprecated by Kubernetes since version 1.19.

Component Name - Name of the component.
Availability - Availability status of the component.
NODE GROUP DETAILS
Node Group Name - The name of the Amazon EKS managed node group.
Node Group Status - The current status of the managed node group (CREATING, ACTIVE, UPDATING, DELETING, CREATE_FAILED, DELETE_FAILED, DEGRADED).
Kubernetes Version - The Kubernetes version of the managed node group.
Release Version - The AMI ID specified in the launch template if the node group uses a custom AMI; otherwise, the version of the Amazon EKS optimized AMI used by the node group.
Minimum Node Size - The minimum number of nodes that the managed node group can scale in to.
Maximum Node Size - The maximum number of nodes that the managed node group can scale out to.
Desired Node Size - The desired number of nodes that the managed node group should maintain.
Instance Types - The instance type associated with the node group if it was not deployed with a launch template; null if it was deployed with a launch template.
Created At - The timestamp when the node group was created.
FARGATE PROFILE DETAILS
Fargate Profile Name - The name of the Fargate profile.
Fargate Profile Status - The current status of the Fargate profile (CREATING, ACTIVE, DELETING, CREATE_FAILED, DELETE_FAILED).
Created At - The timestamp when the Fargate profile was created.
Subnets - The IDs of the subnets into which pods are launched.
 
Note: API Server Component graph metrics are disabled from data collection by default and are mapped under performance polling as 'API Server Component Metrics'. To enable data collection, navigate to Settings → Performance Polling, select the Optimize Data Collection tab, choose Elastic Kubernetes Service as the monitor type and API Server Component Metrics as the metric name, then set the preferred time interval.
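The API server error-rate parameters are totals aggregated over the poll interval. One way to derive such totals, sketched here under assumptions, is to read the cumulative Prometheus counter `apiserver_request_total` (exposed on the API server's `/metrics` endpoint, for example through `kubectl get --raw /metrics`) at each poll and subtract the previous snapshot. The helper names are hypothetical, 429 responses are counted both in the 4xx total and on their own, and a counter that decreases is treated as a restart; Applications Manager's actual method may differ.

```python
import re

# Minimal sketch; helper names are hypothetical, not Applications Manager APIs.
# Matches one sample of the cumulative counter, for example:
# apiserver_request_total{code="500",verb="GET"} 42
_SAMPLE = re.compile(r'^apiserver_request_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
_CODE = re.compile(r'(?:^|,)code="(?P<code>\d+)"')


def snapshot(metrics_text: str) -> dict:
    """Sum apiserver_request_total by HTTP status code from Prometheus text output."""
    counts = {}
    for line in metrics_text.splitlines():
        sample = _SAMPLE.match(line)
        if not sample:
            continue  # skips "# HELP" / "# TYPE" lines and other metrics
        code = _CODE.search(sample.group("labels"))
        if not code:
            continue
        key = code.group("code")
        counts[key] = counts.get(key, 0.0) + float(sample.group("value"))
    return counts


def interval_delta(previous: float, current: float) -> float:
    """Increase of a counter between two polls; a drop means the counter was reset."""
    return current - previous if current >= previous else current


def error_counts(previous: dict, current: dict) -> dict:
    """Per-interval totals of 5xx, 4xx (including 429), and 429 responses."""
    totals = {"5xx": 0.0, "4xx": 0.0, "429": 0.0}
    for code, count in current.items():
        delta = interval_delta(previous.get(code, 0.0), count)
        if code.startswith("5"):
            totals["5xx"] += delta
        elif code.startswith("4"):
            totals["4xx"] += delta
            if code == "429":
                totals["429"] += delta
    return totals
```

Calling `snapshot` on the `/metrics` text at each poll and passing the previous and current results to `error_counts` yields the interval totals.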

Node

Parameter - Description
Node CPU Utilization
Node CPU Utilization - The average percentage of CPU utilized by each node at the time of polling.
Node Memory Utilization
Node Memory Utilization - The average percentage of memory used by each node at the time of polling.
Top 5 Nodes by Memory Details - Displays a graphical representation of the top 5 nodes by memory usage (in percentage).
Top 5 Nodes by CPU Details - Displays a graphical representation of the top 5 nodes by CPU usage (in percentage).
Node Usage Details
Node Name - Name of the node.
Allocatable Memory - The memory resources of the node that are available for scheduling (in GiB).
Memory Limit - The maximum limit of memory resources that can be used (in percentage).
Memory Request - Amount of memory requested (in percentage).
Allocatable CPU - The CPU resources of the node that are available for scheduling.
CPU Limit - The maximum limit of CPU resources that can be used (in percentage).
CPU Request - Amount of CPU requested (in percentage).
Network Total Usage * - The total amount of data transmitted and received over the network per node in the cluster (in kB/s).
File System Usage * - The total amount of file system capacity used on the nodes in the cluster (in percentage).
Running containers in node * - The number of running containers per node in the cluster.
Node Pod Details
Node Name - Name of the node.
Pod Usage - Displays a graphical representation of the total number of pods available, split into used and free pods.
kube-system Pods - Number of pods in the kube-system namespace.
Non kube-system Pods - Number of pods outside the kube-system namespace.
Images - Number of images present on the node.
Used Pods - Total number of pods present on the node.
Allocatable Pods - Number of pods that can be scheduled on the node.
NODE DETAILS
Node Name - Name of the node.
Instance ID - EC2 instance ID of the node.
OS Image - OS image name of the node.
OS - Name of the operating system running on the node.
Architecture - Architecture details of the node.
Type - Type of node used.
Kubelet Version - The version of the kubelet running on the node.
Allocatable Ephemeral Storage - Amount of ephemeral (temporary local) storage available for scheduling on the node (in GiB).
Created Time - Timestamp at which the node was created.
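The request and limit percentages relate the resources declared by the pods on a node to the node's allocatable capacity. A minimal sketch, assuming a node object from `kubectl get nodes -o json` and pod objects from `kubectl get pods --all-namespaces -o json`, follows the convention of the "Allocated resources" section of `kubectl describe node` (sums of container requests and limits divided by allocatable). This may differ from Applications Manager's exact formula; the helpers are hypothetical, parse only the common quantity suffixes, and ignore init containers.

```python
# Minimal sketch; helper names are hypothetical, not Applications Manager APIs.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_cpu(quantity: str) -> float:
    """Kubernetes CPU quantity to cores: "250m" -> 0.25, "2" -> 2.0."""
    return float(quantity[:-1]) / 1000 if quantity.endswith("m") else float(quantity)


def parse_memory(quantity: str) -> float:
    """Kubernetes memory quantity to bytes; handles the common suffixes only."""
    for suffixes in (_BINARY, _DECIMAL):  # check "Mi" before "M"
        for suffix, factor in suffixes.items():
            if quantity.endswith(suffix):
                return float(quantity[: -len(suffix)]) * factor
    return float(quantity)  # plain bytes


def node_allocation(node: dict, pods: list) -> dict:
    """Requests and limits of the node's active pods as a percentage of allocatable."""
    name = node["metadata"]["name"]
    allocatable = node["status"]["allocatable"]
    cpu_alloc = parse_cpu(allocatable["cpu"])
    mem_alloc = parse_memory(allocatable["memory"])
    totals = {"cpu_requests": 0.0, "cpu_limits": 0.0,
              "memory_requests": 0.0, "memory_limits": 0.0}
    for pod in pods:
        if pod["spec"].get("nodeName") != name:
            continue
        if pod.get("status", {}).get("phase") in ("Succeeded", "Failed"):
            continue  # completed pods no longer hold node resources
        for container in pod["spec"]["containers"]:  # init containers ignored here
            resources = container.get("resources", {})
            for kind in ("requests", "limits"):
                values = resources.get(kind, {})
                if "cpu" in values:
                    totals["cpu_" + kind] += parse_cpu(values["cpu"])
                if "memory" in values:
                    totals["memory_" + kind] += parse_memory(values["memory"])
    return {
        "Allocatable Memory (GiB)": round(mem_alloc / 2**30, 2),
        "Memory Request (%)": round(100 * totals["memory_requests"] / mem_alloc, 1),
        "Memory Limit (%)": round(100 * totals["memory_limits"] / mem_alloc, 1),
        "Allocatable CPU (cores)": cpu_alloc,
        "CPU Request (%)": round(100 * totals["cpu_requests"] / cpu_alloc, 1),
        "CPU Limit (%)": round(100 * totals["cpu_limits"] / cpu_alloc, 1),
    }
```

Limits can add up to more than 100 percent when a node is overcommitted, which is why they are reported separately from requests.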

Pods

Parameter - Description
Pod CPU Utilization - The average CPU usage percentage of individual pods at the time of polling.
Pod Memory Utilization - The average memory usage percentage of individual pods at the time of polling.
Top 10 Pods by CPU Usage (%) - Displays a graphical representation of the top 10 pods based on their CPU utilization percentage.
Top 10 Pods by Memory Usage (%) - Displays a graphical representation of the top 10 pods based on their memory utilization percentage.
POD CPU AND MEMORY USAGE
Pod UUID - Universally unique ID of the pod.
Pod Name - Name of the pod.
Pod Namespace - Namespace of the pod.
Pod CPU Used (millicores) - The amount of CPU consumed by the pod (in millicores).
Pod CPU Usage (%) - The percentage of CPU utilized by the pod.
Pod Memory Used (MiB) - The amount of memory used by the pod (in MiB).
Pod Memory Usage (%) - The percentage of memory utilized by the pod.
POD PERFORMANCE STATISTICS
Pod UUID - Universally unique ID of the pod.
Pod Name - Name of the pod.
Pod Namespace - Namespace of the pod.
Pod Containers Count - The number of containers run by the pod.
Pod CPU Limit (millicores) - The maximum amount of CPU resources that all containers in a pod are allowed to use collectively (in millicores). If 0, no limit is set and the pod can use CPU up to the node's capacity.
Pod CPU Limit (%) - The maximum amount of CPU resources that all containers in a pod are allowed to use collectively (in %). It is the sum of the CPU limits set on each container within the pod. The kubelet enforces this limit to ensure that the pod does not exceed the specified CPU usage on the node. If a container tries to use more CPU than its limit, it is throttled.
Pod CPU Request (millicores) - The guaranteed minimum amount of CPU resources that all containers in a pod collectively request (in millicores). If 0, no CPU is guaranteed and the pod can use CPU only when the node has free capacity.
Pod CPU Request (%) - The guaranteed minimum amount of CPU resources that all containers in a pod collectively request (in %). It is the sum of the CPU requests of all containers in the pod. Kubernetes uses this value to schedule the pod onto a node that has enough CPU capacity to meet the request.
Pod Memory Limit (in MiB) - The maximum amount of memory (RAM) that all containers in a pod can use collectively (in MiB). If 0, no limit is set and the pod can use memory up to the node's capacity.
Pod Memory Limit (in %) - The maximum amount of memory (RAM) that all containers in a pod can use collectively (in %). It is the sum of the memory limits of all containers in the pod. The kubelet enforces this limit; if a container exceeds its memory limit, it may be terminated.
Pod Memory Request (in MiB) - The minimum amount of memory that all containers in a pod request collectively (in MiB). If 0, no memory is guaranteed and the pod can use memory only when the node has free capacity.
Pod Memory Request (in %) - The minimum amount of memory that all containers in a pod request collectively (in %). It is the sum of the memory requests of all containers in the pod. Kubernetes uses this value to schedule the pod onto a node that has enough available memory to satisfy the request.
Network Transmitted Bytes * - Amount of data transmitted over the network by the pod (in kB/s).
Network Received Bytes * - Amount of data received over the network by the pod (in kB/s).
POD DETAILS
Pod Node Name - Name of the node on which the pod is running.
Pod Application - Name of the pod application.
Pod Type - Type of pod used.
Pod Created - Medium by which the pod was created.
Pod Status - Current status of the pod (Pending, Running, Succeeded, Failed, or Unknown).
Pod Age - Amount of time elapsed since the pod was created (in days).
Pod Start Time - Timestamp at which the pod was started.
Pod Created Time - Timestamp at which the pod was created.
Top 10 Pods by CPU Details - Displays a graphical representation of the top 10 pods based on their CPU usage (in %).
Top 10 Pods by Memory Details - Displays a graphical representation of the top 10 pods based on their memory usage (in %).
CONTAINER DETAILS
Container ID - ID of the container.
Container Name - Name of the container.
Container Image - Name of the container image.
Container Pod Name - Name of the pod that the container belongs to.
Container Restarts - The number of times the container has restarted.
Container Status - Status of the container. The possible values shown for each status are listed below:
Status - Value
Running - Running
Waiting
  • ContainerCreating
  • CrashLoopBackOff
  • ErrImagePull
  • ImagePullBackOff
  • CreateContainerConfigError
  • InvalidImageName
  • CreateContainerError
Terminated
  • OOMKilled
  • Error
  • Completed
  • ContainerCannotRun
  • DeadlineExceeded
Container Start Time - Timestamp at which the container was started.
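The Status and Value columns correspond to the Kubernetes container state object: each entry in a pod's `status.containerStatuses` carries exactly one of `running`, `waiting`, or `terminated`, and the waiting and terminated states carry a `reason` string such as `CrashLoopBackOff` or `OOMKilled`. A minimal sketch of building Container Details rows from pod JSON (for example, from `kubectl get pods -o json`) is shown below; the function names and row layout are hypothetical.

```python
# Minimal sketch; function names are hypothetical, not Applications Manager APIs.


def classify_state(state: dict) -> tuple:
    """Map a containerStatuses[].state object to a (status, value) pair."""
    if "running" in state:
        return "Running", "Running"
    if "waiting" in state:
        # e.g. ContainerCreating, CrashLoopBackOff, ImagePullBackOff
        return "Waiting", state["waiting"].get("reason", "Waiting")
    if "terminated" in state:
        # e.g. OOMKilled, Error, Completed
        return "Terminated", state["terminated"].get("reason", "Terminated")
    return "Unknown", "Unknown"


def container_rows(pod: dict) -> list:
    """One row per container, built from a single pod object."""
    rows = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        state = cs.get("state", {})
        status, value = classify_state(state)
        # Only running and terminated states record a start time.
        started = (state.get("running") or state.get("terminated") or {}).get("startedAt")
        rows.append({
            "Container ID": cs.get("containerID", ""),
            "Container Name": cs["name"],
            "Container Image": cs["image"],
            "Container Pod Name": pod["metadata"]["name"],
            "Container Restarts": cs.get("restartCount", 0),
            "Container Status": status,
            "Value": value,
            "Container Start Time": started,
        })
    return rows
```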

Services

Parameter - Description
SERVICE DETAILS
Service UUID - Universally unique ID of the service.
Service Name - Name of the service.
Service Namespace - Name of the namespace in which the service resides.
Application - Name of the service application.
Service Type - Type of the service (for example, ClusterIP, NodePort, or LoadBalancer).
Service Protocol - Name of the service protocol.
Host IP Address - IP address of the service host.
Service Target Port - The port (number or name) on the backing pods to which the service forwards traffic.
Running pods in service * - The number of running pods backing the service in the cluster.
Created Time - Timestamp at which the service was created.
DEPLOYMENT DETAILS
Deployment UUID - Universally unique ID of the deployment.
Deployment Name - Name of the deployment.
Deployment Namespace - Namespace in which the deployment exists.
Deployment Replicas - The number of replicas in the deployment.
Deployment Available Replicas - Number of available replicas in the deployment.
Deployment Availability - Availability of the deployment.
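In Kubernetes, a Service selects its backing pods through the equality-based label selector in `spec.selector`, and a Deployment reports an `Available` condition in `status.conditions`. The product collects Running pods in service from CloudWatch; assuming Service, Pod, and Deployment objects from `kubectl get ... -o json`, the selector-based count below is only a kubectl-side illustration of what that metric means. Using the `Available` condition for Deployment Availability is likewise an assumption, and the function names are hypothetical.

```python
# Minimal sketch; function names are hypothetical, not Applications Manager APIs.


def selector_matches(selector: dict, labels: dict) -> bool:
    """Equality-based match: every selector key must be present with the same value."""
    return bool(selector) and all(labels.get(k) == v for k, v in selector.items())


def running_pods_in_service(service: dict, pods: list) -> int:
    """Count Running pods in the service's namespace whose labels match its selector."""
    namespace = service["metadata"]["namespace"]
    # Services without a selector have manually managed endpoints; they count as 0 here.
    selector = service["spec"].get("selector") or {}
    return sum(
        1
        for pod in pods
        if pod["metadata"]["namespace"] == namespace
        and pod.get("status", {}).get("phase") == "Running"
        and selector_matches(selector, pod["metadata"].get("labels", {}))
    )


def deployment_row(deployment: dict) -> dict:
    """Replica counts plus an Up/Down availability taken from the Available condition."""
    status = deployment.get("status", {})
    conditions = {c["type"]: c["status"] for c in status.get("conditions", [])}
    return {
        "Deployment Name": deployment["metadata"]["name"],
        "Deployment Namespace": deployment["metadata"]["namespace"],
        "Deployment Replicas": deployment["spec"].get("replicas", 1),  # API default is 1
        "Deployment Available Replicas": status.get("availableReplicas", 0),  # omitted when 0
        "Deployment Availability": "Up" if conditions.get("Available") == "True" else "Down",
    }
```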

Persistent Volumes

Parameter - Description
PERSISTENT VOLUMES (PV) DETAILS
PV Name - Name of the Persistent Volume.
PV Status - Status of the Persistent Volume (Available, Bound, Released, Failed, or Pending).
PV Claim - Name of the Persistent Volume Claim bound to the volume.
PV Access Mode - The mode through which the Persistent Volume can be accessed (for example, ReadWriteOnce).
PV Storage Class - Name of the Persistent Volume storage class.
PV Capacity - The capacity of the Persistent Volume (in GiB).
PV Created Time - Timestamp at which the Persistent Volume was created.
PERSISTENT VOLUME CLAIM (PVC) DETAILS
PVC UUID - Universally unique ID of the Persistent Volume Claim.
PVC Name - Name of the Persistent Volume Claim.
PVC Namespace - Name of the namespace in which the claim exists.
PVC Status - Status of the Persistent Volume Claim (Pending, Bound, or Lost).
PV Name - Name of the Persistent Volume bound to the claim.
PVC Access Mode - The mode through which the Persistent Volume Claim can be accessed.
PVC Storage Class - Name of the Persistent Volume storage class.
PVC Requests - Amount of storage requested by the Persistent Volume Claim (in GiB).
PVC Created Time - Timestamp at which the Persistent Volume Claim was created.
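A PV records its bound claim in `spec.claimRef`, while a PVC records its bound volume in `spec.volumeName` and its requested size in `spec.resources.requests.storage`. Assuming objects from `kubectl get pv -o json` and `kubectl get pvc --all-namespaces -o json`, a minimal sketch of deriving the PV and PVC rows, with capacities converted to GiB, might look like this; the function names are hypothetical and only the common quantity suffixes are parsed.

```python
# Minimal sketch; function names are hypothetical, not Applications Manager APIs.
# Two-letter binary suffixes come first so "Mi" is matched before "M".
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
          "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def to_gib(quantity: str) -> float:
    """Storage quantity ("10Gi", "500M", "1073741824") to GiB."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor / 2**30
    return float(quantity) / 2**30  # plain bytes


def pv_row(pv: dict) -> dict:
    spec = pv["spec"]
    claim = spec.get("claimRef") or {}
    return {
        "PV Name": pv["metadata"]["name"],
        "PV Status": pv.get("status", {}).get("phase"),
        "PV Claim": claim.get("name", ""),  # empty when no claim reference is set
        "PV Access Mode": ", ".join(spec.get("accessModes", [])),
        "PV Storage Class": spec.get("storageClassName", ""),
        "PV Capacity (GiB)": to_gib(spec["capacity"]["storage"]),
        "PV Created Time": pv["metadata"]["creationTimestamp"],
    }


def pvc_row(pvc: dict) -> dict:
    spec = pvc["spec"]
    return {
        "PVC Name": pvc["metadata"]["name"],
        "PVC Namespace": pvc["metadata"]["namespace"],
        "PVC Status": pvc.get("status", {}).get("phase"),
        "PV Name": spec.get("volumeName", ""),  # set once the claim is bound
        "PVC Access Mode": ", ".join(spec.get("accessModes", [])),
        "PVC Storage Class": spec.get("storageClassName", ""),
        "PVC Requests (GiB)": to_gib(spec["resources"]["requests"]["storage"]),
        "PVC Created Time": pvc["metadata"]["creationTimestamp"],
    }
```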

Service Map

  • Displays a graphical map view of namespace and service details.
  • All namespaces, with their status and running pod count, are shown inside the cluster circle.
  • Green indicates that a namespace is UP and red indicates that it is DOWN.
  • Within the cluster, the services under each namespace are shown as a tree.
  • Each service shows its host IP address, port, and number of running pods.

Applications Manager Amazon EKS Monitoring: Service map view of Kubernetes clusters in Amazon EKS
