AWS ECS Monitoring


AWS ECS - Overview

Amazon Elastic Container Service (Amazon ECS) is a highly scalable, fast, container management service that makes it easy to run, stop, and manage Docker containers on a cluster. Amazon ECS lets you launch and stop container-based applications with simple API calls, allows you to get the state of your cluster from a centralized service, and gives you access to many familiar Amazon EC2 features.

Creating a new AWS ECS monitor

To learn how to create a new AWS ECS monitor, refer here.

Monitored Parameters

Go to the Monitors Category View by clicking the Monitors tab. Click ECS under Amazon in the Cloud Apps section. The Amazon ECS bulk configuration view is displayed, organized into three tabs:

  • Availability tab gives the availability history for the past 24 hours or 30 days.
  • Performance tab gives the health status and events for the past 24 hours or 30 days.
  • List view tab enables you to perform bulk admin configurations.

Click a monitor in the list to open the AWS ECS dashboard, which has five tabs:

Overview

Parameter Description
CLUSTER INFORMATION
Name Name of the cluster.
Status The status of the cluster. (ACTIVE, PROVISIONING, DEPROVISIONING, FAILED, INACTIVE)
Cluster ARN The Amazon Resource Name (ARN) that identifies the cluster.
Registered Container Instances The number of container instances registered to the cluster.
CLUSTER TASKS
Running Tasks The number of tasks that are in RUNNING state.
Pending Tasks The number of tasks that are in PENDING state.
Running EC2 Tasks The number of EC2 tasks that are in RUNNING state.
Pending EC2 Tasks The number of EC2 tasks that are in PENDING state.
Running Fargate Tasks The number of Fargate tasks that are in RUNNING state.
Pending Fargate Tasks The number of Fargate tasks that are in PENDING state.
CLUSTER SERVICES
Active Services The number of active services running in the cluster.
Active EC2 Services The number of EC2 services that are running on the cluster in an ACTIVE state.
Draining EC2 Services The number of EC2 services that are in DRAINING state.
Active Fargate Services The number of Fargate services that are running on the cluster in an ACTIVE state.
Draining Fargate Services The number of Fargate services that are in DRAINING state.
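
The source does not say how these values are collected. As a minimal sketch, assuming they correspond to the fields of an Amazon ECS DescribeClusters response requested with the STATISTICS option, the Overview parameters could be derived as shown below. The helper name `summarize_cluster` and the sample data are hypothetical and for illustration only.

```python
from typing import Any, Dict

# Statistic names returned by DescribeClusters when "STATISTICS" is requested,
# mapped to the parameter names used in the Overview tab.
STATISTIC_LABELS = {
    "runningEC2TasksCount": "Running EC2 Tasks",
    "pendingEC2TasksCount": "Pending EC2 Tasks",
    "runningFargateTasksCount": "Running Fargate Tasks",
    "pendingFargateTasksCount": "Pending Fargate Tasks",
    "activeEC2ServiceCount": "Active EC2 Services",
    "drainingEC2ServiceCount": "Draining EC2 Services",
    "activeFargateServiceCount": "Active Fargate Services",
    "drainingFargateServiceCount": "Draining Fargate Services",
}


def summarize_cluster(cluster: Dict[str, Any]) -> Dict[str, Any]:
    """Map one DescribeClusters cluster entry to the Overview parameters."""
    summary = {
        "Name": cluster.get("clusterName"),
        "Status": cluster.get("status"),
        "Cluster ARN": cluster.get("clusterArn"),
        "Registered Container Instances": cluster.get(
            "registeredContainerInstancesCount", 0
        ),
        "Running Tasks": cluster.get("runningTasksCount", 0),
        "Pending Tasks": cluster.get("pendingTasksCount", 0),
        "Active Services": cluster.get("activeServicesCount", 0),
    }
    # Statistics arrive as name/value pairs whose values are strings.
    for stat in cluster.get("statistics", []):
        label = STATISTIC_LABELS.get(stat.get("name"))
        if label is not None:
            summary[label] = int(stat["value"])
    return summary


# Hand-written sample shaped like a DescribeClusters entry (not real data).
sample_cluster = {
    "clusterName": "demo-cluster",
    "clusterArn": "arn:aws:ecs:us-east-1:123456789012:cluster/demo-cluster",
    "status": "ACTIVE",
    "registeredContainerInstancesCount": 2,
    "runningTasksCount": 5,
    "pendingTasksCount": 1,
    "activeServicesCount": 3,
    "statistics": [
        {"name": "runningEC2TasksCount", "value": "2"},
        {"name": "runningFargateTasksCount", "value": "3"},
        {"name": "pendingFargateTasksCount", "value": "1"},
    ],
}

if __name__ == "__main__":
    for name, value in summarize_cluster(sample_cluster).items():
        print(f"{name}: {value}")
```

With boto3, the input dictionary would typically be one element of the `clusters` list returned by `describe_clusters(clusters=[...], include=["STATISTICS"])`.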

Cluster Performance

Parameter Description
CPU
CPU Reservation Amount of CPU units that are reserved by running tasks in the cluster (in percentage).
CPU Utilization Amount of CPU units that are used in the cluster (in percentage).
MEMORY
Memory Reservation Amount of memory that is reserved by running tasks in the cluster (in percentage).
Memory Utilization Amount of memory that is used in the cluster (in percentage).
CPU USAGE
CPU Reserved The CPU units reserved by tasks in the cluster. This metric is collected only for tasks that have a defined CPU reservation in their task definition.
CPU Utilized The CPU units used by tasks in the cluster. This metric is collected only for tasks that have a defined CPU reservation in their task definition.
MEMORY USAGE
Memory Reserved Amount of memory that is reserved by tasks in the cluster (in GB).
Memory Utilized Amount of memory being used by tasks in the cluster (in GB).
DISK I/O THROUGHPUT
Storage Read Rate Rate at which data is read from storage in the cluster (in kB/s).
Storage Write Rate Rate at which data is written to storage in the cluster (in kB/s).
CLUSTER NETWORK I/O
Data Transmit Rate Rate at which data is transmitted by the cluster (in kB/s).
Data Receive Rate Rate at which data is received by the cluster (in kB/s).
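
To make the percentage metrics concrete, here is a small worked sketch. It assumes the usual Amazon ECS definition, in which cluster reservation and utilization are the units reserved or used by tasks divided by the total units registered by the container instances in the cluster, multiplied by 100. The function name and the numbers are hypothetical.

```python
def percent_of_registered(task_units: float, registered_units: float) -> float:
    """Return task_units as a percentage of the cluster's registered units."""
    if registered_units <= 0:
        raise ValueError("registered_units must be positive")
    return 100.0 * task_units / registered_units


# Assumed example: two container instances registering 2048 CPU units each.
registered_cpu = 2 * 2048

# Three tasks reserving 256, 512, and 1024 CPU units.
reserved_cpu = 256 + 512 + 1024

# Assumed measurement: the tasks are currently using 1024 CPU units in total.
used_cpu = 1024

cpu_reservation = percent_of_registered(reserved_cpu, registered_cpu)
cpu_utilization = percent_of_registered(used_cpu, registered_cpu)

if __name__ == "__main__":
    print(f"CPU Reservation: {cpu_reservation:.2f}%")
    print(f"CPU Utilization: {cpu_utilization:.2f}%")
```

The same calculation applies to Memory Reservation and Memory Utilization, with memory amounts in place of CPU units.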

Tasks

Parameter Description
Task Details
Task ID The unique identifier for the task.
Health Status The health status for the task, which is determined by the health of the essential containers in the task. (HEALTHY, UNHEALTHY, UNKNOWN)
Last Status The last known status of the task. (PROVISIONING, PENDING, ACTIVATING, RUNNING, DEACTIVATING, DEPROVISIONING, STOPPED)
Desired Status Displays the desired status of the task.
Launch Type The launch type on which your task is running. (EC2 or FARGATE)
Connectivity The connectivity status of a task. (CONNECTED or DISCONNECTED)
Connectivity At The timestamp for when the task last went into the CONNECTED state.
Configured CPU Units The number of CPU units configured for the task.
Configured Memory The amount of memory configured for the task (in MB).
Number of Containers The number of containers in the task.
Task Info
Task ID The unique identifier for the task.
Container Instance ID The unique identifier for the container instance.
Task Group The name of the task group associated with your task.
Task Definition The full description of the task definition.
Created At The timestamp for when the task was created.
Started At The timestamp for when the task was started.
Started By Shows the tag specified when a task is started.
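
As an illustration of where these task parameters may come from, the sketch below assumes they mirror the fields of an Amazon ECS DescribeTasks response. The helper names and the sample task are hypothetical. Note that DescribeTasks reports `cpu` and `memory` as strings.

```python
from typing import Any, Dict


def id_from_arn(arn: str) -> str:
    """Return the trailing ID segment of an ECS ARN."""
    return arn.rsplit("/", 1)[-1]


def summarize_task(task: Dict[str, Any]) -> Dict[str, Any]:
    """Map one DescribeTasks task entry to the Task Details parameters."""
    return {
        "Task ID": id_from_arn(task["taskArn"]),
        "Health Status": task.get("healthStatus", "UNKNOWN"),
        "Last Status": task.get("lastStatus"),
        "Desired Status": task.get("desiredStatus"),
        "Launch Type": task.get("launchType"),
        "Connectivity": task.get("connectivity"),
        "Configured CPU Units": int(task["cpu"]) if "cpu" in task else None,
        "Configured Memory (MB)": int(task["memory"]) if "memory" in task else None,
        "Number of Containers": len(task.get("containers", [])),
    }


def has_reached_desired_status(task: Dict[str, Any]) -> bool:
    """True when the last known status equals the desired status."""
    return task.get("lastStatus") == task.get("desiredStatus")


# Hand-written sample shaped like a DescribeTasks entry (not real data).
sample_task = {
    "taskArn": "arn:aws:ecs:us-east-1:123456789012:task/demo-cluster/0123456789abcdef",
    "healthStatus": "HEALTHY",
    "lastStatus": "PENDING",
    "desiredStatus": "RUNNING",
    "launchType": "FARGATE",
    "connectivity": "CONNECTED",
    "cpu": "256",
    "memory": "512",
    "containers": [{"name": "web"}, {"name": "sidecar"}],
}

if __name__ == "__main__":
    for name, value in summarize_task(sample_task).items():
        print(f"{name}: {value}")
    print("Reached desired status:", has_reached_desired_status(sample_task))
```

Comparing Last Status with Desired Status is one simple way to spot tasks that are still starting or stopping.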

Services

Parameter Description
Service Status
Service Name The name of the service.
Status The status of the service. (ACTIVE, DRAINING, INACTIVE)
CPU Utilization The percentage of CPU units that are used in the service.
Memory Utilization The percentage of memory that is used in the service.
Pending Tasks The number of tasks in the service that are in the PENDING state.
Running Tasks The number of tasks in the service that are in the RUNNING state.
Desired Tasks The desired number of instantiations of the task definition to keep running on the service.
Launch Type The launch type on which your service is running. (EC2 or FARGATE)
Scheduling Strategy The scheduling strategy to use for the service. (REPLICA or DAEMON)
Service Insights
Service Name The name of the service.
CPU Reserved The CPU units reserved by tasks in the service. This metric is collected only for tasks that have a defined CPU reservation in their task definition.
CPU Utilized The CPU units used by tasks in the service. This metric is collected only for tasks that have a defined CPU reservation in their task definition.
Memory Reserved The memory that is reserved by tasks in the service (in GB).
Memory Utilized The memory being used by tasks in the service (in GB).
Storage Read Rate Rate at which data is read from storage in the service (in kB/s).
Storage Write Rate Rate at which data is written to storage in the service (in kB/s).
Data Transmit Rate Rate at which data is transmitted by the service (in kB/s).
Data Receive Rate Rate at which data is received by the service (in kB/s).
TaskSet Count The number of task sets in the service.
Service Details
Service Name The name of the service.
Task Definition The task definition to use for tasks in the service.
Platform Version The platform version on which to run your service.
Created At The timestamp for when the service was created.
Created By The principal that created the service.
Service Events
Event ID Indicates the ID for the event.
Service Name Name of the service.
Generated Time Date and time at which the event was generated.
Message The message shown for the event.
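
The relationship between Running Tasks, Pending Tasks, and Desired Tasks is often what matters when reading this tab. The following sketch assumes service data shaped like an Amazon ECS DescribeServices response. The function names and the sample services are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Tuple


def services_below_desired(services: List[Dict[str, Any]]) -> List[str]:
    """Names of ACTIVE services running fewer tasks than desired."""
    return [
        svc["serviceName"]
        for svc in services
        if svc.get("status") == "ACTIVE"
        and svc.get("runningCount", 0) < svc.get("desiredCount", 0)
    ]


def latest_events(service: Dict[str, Any], limit: int = 3) -> List[Tuple[str, datetime, str]]:
    """Newest events first, as (Event ID, Generated Time, Message) tuples."""
    events = sorted(
        service.get("events", []), key=lambda e: e["createdAt"], reverse=True
    )
    return [(e["id"], e["createdAt"], e["message"]) for e in events[:limit]]


# Hand-written samples shaped like DescribeServices entries (not real data).
sample_services = [
    {
        "serviceName": "web",
        "status": "ACTIVE",
        "desiredCount": 4,
        "runningCount": 2,
        "pendingCount": 2,
        "events": [
            {
                "id": "evt-1",
                "createdAt": datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc),
                "message": "(service web) has started 2 tasks.",
            },
            {
                "id": "evt-2",
                "createdAt": datetime(2024, 1, 1, 10, 5, tzinfo=timezone.utc),
                "message": "(service web) has reached a steady state.",
            },
        ],
    },
    {"serviceName": "worker", "status": "ACTIVE", "desiredCount": 1, "runningCount": 1},
    {"serviceName": "legacy", "status": "DRAINING", "desiredCount": 0, "runningCount": 1},
]

if __name__ == "__main__":
    print("Below desired count:", services_below_desired(sample_services))
    for event_id, generated_at, message in latest_events(sample_services[0]):
        print(event_id, generated_at.isoformat(), message)
```

Only ACTIVE services are checked here, because a DRAINING service is expected to scale down to zero tasks.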

Container Instances

Parameter Description
CPU
Instance CPU Reserved Capacity The percentage of CPU currently being reserved on a single EC2 instance in the cluster.
Instance CPU Utilization The total percentage of CPU units being used on a single EC2 instance in the cluster.
MEMORY
Instance Memory Reserved Capacity The percentage of memory currently being reserved on a single EC2 instance in the cluster.
Instance Memory Utilization The total percentage of memory being used on a single EC2 instance in the cluster.
CPU USAGE
Instance CPU Used Amount of CPU units being used on a single EC2 instance in the cluster.
Instance CPU Remaining Amount of CPU units remaining after use on a single EC2 instance in the cluster.
Instance CPU Limit Maximum amount of CPU units that can be assigned to a single EC2 instance in the cluster.
MEMORY USAGE
Instance Memory Used Amount of memory being used on a single EC2 instance in the cluster (in GB).
Instance Memory Remaining Amount of memory remaining after use on a single EC2 instance in the cluster (in GB).
Instance Memory Limit Maximum amount of memory that can be assigned to a single EC2 instance in the cluster (in GB).
FILESYSTEM UTILIZATION
Instance FileSystem Utilization Total amount of file system capacity being used on a single EC2 instance in the cluster (in percentage).
NETWORK USAGE
Instance Network Traffic Rate Rate at which data is sent and received over the network on a single EC2 instance in the cluster (in kB/s).
Container Instances
Container Instance ID The ID of the container instance.
Status The status of the container instance. (REGISTERING, REGISTRATION_FAILED, ACTIVE, INACTIVE, DEREGISTERING, DRAINING)
Running Tasks Number of tasks that are in RUNNING state per container instance.
Pending Tasks Number of tasks that are in PENDING state per container instance.
Version The version counter for the container instance. Every time a container instance experiences a change that triggers a CloudWatch event, the version counter is incremented.
Agent Version The version number of the Amazon ECS container agent.
Agent Connected Indicates whether the agent is connected to Amazon ECS. (True or False)
Instance ID The EC2 instance ID of the container instance.
Registered At The timestamp for when the container instance was registered.
Container Instances Insights
Container Instance ID The ID of the container instance.
Memory Reserved Capacity Amount of memory currently being reserved on the instance (in percentage).
Memory Utilization Amount of memory currently being used on the instance (in percentage).
CPU Reserved Capacity Amount of CPU currently being reserved on the instance (in percentage).
CPU Utilization Amount of CPU currently being used on the instance (in percentage).
FileSystem Utilization Amount of file system capacity being used on the instance (in percentage).
Network Traffic Rate Rate at which data is sent and received over the network on the instance (in kB/s).
Instances Resource Details
Container Instance ID The unique identifier for the container instance.
Available CPU Amount of CPU units available to allocate to tasks.
Available Memory Amount of memory available to allocate to tasks (in MB).
Reserved Ports The ports that were reserved by the Amazon ECS container agent.
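
As a rough sketch of the Instances Resource Details values, the code below assumes they map to an Amazon ECS DescribeContainerInstances response: Available CPU and Available Memory from `remainingResources`, and Reserved Ports from the `PORTS` entry of `registeredResources`. That mapping is an assumption, and the helper names and sample data are hypothetical.

```python
from typing import Any, Dict, List, Optional


def find_resource(resources: List[Dict[str, Any]], name: str) -> Optional[Dict[str, Any]]:
    """Return the resource entry with the given name, or None if absent."""
    for resource in resources:
        if resource.get("name") == name:
            return resource
    return None


def instance_resource_details(instance: Dict[str, Any]) -> Dict[str, Any]:
    """Map one container instance entry to the resource detail parameters."""
    remaining = instance.get("remainingResources", [])
    registered = instance.get("registeredResources", [])
    cpu = find_resource(remaining, "CPU")
    memory = find_resource(remaining, "MEMORY")
    ports = find_resource(registered, "PORTS")
    return {
        "Container Instance ID": instance["containerInstanceArn"].rsplit("/", 1)[-1],
        "Available CPU": cpu["integerValue"] if cpu else None,
        "Available Memory (MB)": memory["integerValue"] if memory else None,
        # Port numbers are reported as strings; sort them numerically.
        "Reserved Ports": sorted(ports.get("stringSetValue", []), key=int) if ports else [],
    }


# Hand-written sample shaped like a DescribeContainerInstances entry (not real data).
sample_instance = {
    "containerInstanceArn": (
        "arn:aws:ecs:us-east-1:123456789012:container-instance/demo-cluster/abc123"
    ),
    "remainingResources": [
        {"name": "CPU", "type": "INTEGER", "integerValue": 1536},
        {"name": "MEMORY", "type": "INTEGER", "integerValue": 3072},
    ],
    "registeredResources": [
        {"name": "CPU", "type": "INTEGER", "integerValue": 2048},
        {"name": "MEMORY", "type": "INTEGER", "integerValue": 3953},
        {
            "name": "PORTS",
            "type": "STRINGSET",
            "stringSetValue": ["2376", "22", "51678", "2375", "51679"],
        },
    ],
}

if __name__ == "__main__":
    for name, value in instance_resource_details(sample_instance).items():
        print(f"{name}: {value}")
```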