AWS Step Function Monitoring


AWS Step Function - Overview

AWS Step Functions is a managed service that lets you build and run serverless workflows by coordinating multiple AWS services. It ensures reliable execution of business processes with visual workflow design and monitoring.

Applications Manager tracks execution outcomes (failed, timed-out, throttled, aborted, succeeded), execution rates, and duration to ensure workflow health. For express state machines, it also monitors billed duration and billed memory to aid cost optimization.

Creating a new AWS Step Function monitor

To learn how to create a new AWS Step Function monitor, refer here.

Monitored Parameters

Go to the Monitors Category View by clicking the Monitors tab. Click the Step Function instance listed under Amazon in the Cloud Apps section. The AWS Step Function bulk configuration view that opens is divided into three tabs:

  • Availability tab shows the availability history for the past 24 hours or 30 days.
  • Performance tab shows the health status and events for the past 24 hours or 30 days.
  • List view tab lets you perform bulk admin configurations.

Clicking a monitor in the list opens the AWS Step Function dashboard, which includes the following tabs:

Performance Overview

Parameter | Description
STATE MACHINE INFORMATION
Type | The type of the state machine (Standard or Express).
Status | The current status of the state machine.
Versions | The number of versions published for the state machine at the time of polling.
Aliases | The number of aliases created for the state machine at the time of polling.
STATE MACHINE EXECUTION PERFORMANCE (%)
Failed Execution Percentage | The percentage of executions that failed during the polling interval (in %).
Timed Out Execution Percentage | The percentage of executions that timed out during the polling interval (in %).
Aborted Execution Percentage | The percentage of executions that were aborted during the polling interval (in %).
Throttled Execution Percentage | The percentage of executions that were throttled during the polling interval (in %).
Succeeded Execution Percentage | The percentage of executions that succeeded during the polling interval (in %).
Total Execution Failure Percentage | The combined percentage of unsuccessful executions (failed, timed out, aborted, and throttled) during the polling interval (in %).
EXECUTION ERRORS
Failed Executions | The number of executions that failed during the polling interval (in count).
Timed Out Executions | The number of executions that timed out during the polling interval (in count).
Aborted Executions | The number of executions that were aborted during the polling interval (in count).
Throttled Executions | The number of executions that were throttled during the polling interval (in count).
TOTAL EXECUTION
Total Executions Rate | The number of state machine executions (failed, timed out, aborted, throttled, and succeeded) per minute during the polling interval (in count/min). This count includes redriven executions.
Total Executions | The number of state machine executions (failed, timed out, aborted, throttled, and succeeded) during the polling interval (in count). This count includes redriven executions.
SUCCEEDED EXECUTIONS
Succeeded Executions Rate | The number of executions that completed successfully per minute during the polling interval (in count/min).
Succeeded Executions | The number of executions that completed successfully during the polling interval (in count).
STARTED EXECUTIONS
Started Executions Rate | The number of executions started per minute during the polling interval (in count/min).
Started Executions | The number of executions started during the polling interval (in count).
EXECUTION DURATION
Average Execution Duration | The average time between the start and end of an execution, calculated across all executions during the polling interval (in secs).
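The percentage, rate, and average metrics above are simple arithmetic over the per-interval counts. The Python sketch below shows one plausible way to derive them. The `IntervalCounts` record, the `percent` and `summarize` helpers, and the use of total executions as the percentage denominator are illustrative assumptions, not Applications Manager's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IntervalCounts:
    """Hypothetical execution counts collected for one polling interval."""
    failed: int
    timed_out: int
    aborted: int
    throttled: int
    succeeded: int
    interval_minutes: float
    durations_secs: List[float]  # end time minus start time for each finished execution


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to 2 decimals (0.0 for an idle interval)."""
    return 0.0 if whole == 0 else round(100.0 * part / whole, 2)


def summarize(c: IntervalCounts) -> Dict[str, float]:
    # Every unsuccessful outcome counts toward Total Execution Failure Percentage.
    failures = c.failed + c.timed_out + c.aborted + c.throttled
    # Assumption: total executions is the sum of all outcomes and is the denominator.
    total = failures + c.succeeded
    durations = c.durations_secs
    avg = sum(durations) / len(durations) if durations else 0.0
    return {
        "total_executions": total,
        "total_executions_rate": round(total / c.interval_minutes, 2),
        "failed_pct": percent(c.failed, total),
        "timed_out_pct": percent(c.timed_out, total),
        "succeeded_pct": percent(c.succeeded, total),
        "total_failure_pct": percent(failures, total),
        "avg_duration_secs": round(avg, 2),
    }


if __name__ == "__main__":
    sample = IntervalCounts(failed=2, timed_out=1, aborted=1, throttled=0,
                            succeeded=16, interval_minutes=5,
                            durations_secs=[1.5, 2.5, 3.0, 5.0])
    print(summarize(sample))
```

For the sample interval (20 executions in 5 minutes, 4 of them unsuccessful), this yields a rate of 4.0 executions/min and a total failure percentage of 20.0%.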

Redriven Executions

Note: This tab applies only to Standard state machines.

Parameter | Description
REDRIVEN EXECUTION PERFORMANCE (%)
Failed Redriven Execution Percentage | The percentage of redriven executions that failed during the polling interval (in %).
Timed Out Redriven Execution Percentage | The percentage of redriven executions that timed out during the polling interval (in %).
Aborted Redriven Execution Percentage | The percentage of redriven executions that were aborted during the polling interval (in %).
Succeeded Redriven Execution Percentage | The percentage of redriven executions that succeeded during the polling interval (in %).
Total Redriven Failure Percentage | The combined percentage of unsuccessful redriven executions (failed, timed out, and aborted) during the polling interval (in %).
REDRIVEN EXECUTION ERRORS
Failed Redriven Executions | The number of redriven executions that failed during the polling interval (in count).
Timed Out Redriven Executions | The number of redriven executions that timed out during the polling interval (in count).
Aborted Redriven Executions | The number of redriven executions that were aborted during the polling interval (in count).
REDRIVEN EXECUTIONS
Redriven Executions | The number of executions that were redriven (restarted from their point of failure) during the polling interval (in count).
Redriven Execution Percentage | The percentage of failed, timed out, and aborted executions that were redriven during the polling interval (in %).
SUCCEEDED REDRIVEN EXECUTIONS
Succeeded Redriven Executions | The number of redriven executions that succeeded during the polling interval (in count).
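The redrive percentages can be sketched in the same way. The denominators below are assumptions: the per-outcome percentages are taken over all redriven executions, and Redriven Execution Percentage is taken over the failed, timed out, and aborted executions in the same interval. `redrive_summary` is a hypothetical helper, not part of any product API.

```python
from typing import Dict


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to 2 decimals (0.0 if whole is 0)."""
    return 0.0 if whole == 0 else round(100.0 * part / whole, 2)


def redrive_summary(redriven: int, redriven_failed: int, redriven_timed_out: int,
                    redriven_aborted: int, redriven_succeeded: int,
                    failed: int, timed_out: int, aborted: int) -> Dict[str, float]:
    """Hypothetical helper: derive the redrive percentages for one polling interval."""
    # Failed, timed out, and aborted redrives together form the redrive failure total.
    redrive_failures = redriven_failed + redriven_timed_out + redriven_aborted
    # Assumption: only failed, timed out, and aborted executions can be redriven,
    # so they form the base for Redriven Execution Percentage.
    eligible = failed + timed_out + aborted
    return {
        # Assumption: per-outcome percentages use all redriven executions as the base.
        "failed_redriven_pct": percent(redriven_failed, redriven),
        "succeeded_redriven_pct": percent(redriven_succeeded, redriven),
        "total_redriven_failure_pct": percent(redrive_failures, redriven),
        "redriven_execution_pct": percent(redriven, eligible),
    }


print(redrive_summary(redriven=4, redriven_failed=1, redriven_timed_out=0,
                      redriven_aborted=0, redriven_succeeded=3,
                      failed=5, timed_out=2, aborted=1))
```

Here 4 of the 8 eligible executions were redriven (50.0%), and 3 of those 4 redrives succeeded (75.0%).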

Failed Executions

Note:
  • This tab applies only to Standard state machines.
  • This table shows details of up to 1000 of the most recent failed executions, if available.

Parameter | Description
Failed Execution Details
Execution Name | The name of the execution.
Start Time | The date the execution started.
End Time | The date the execution ended.
Duration | The total duration of the failed execution (in secs).
Number of Redrives | The number of times each failed execution has been redriven (retried).
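The Duration column is the end time minus the start time, and the table is capped at 1000 rows. A minimal sketch of both rules follows. The `FailedExecution` record, the `recent_failures` helper, and ordering by start time are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List

MAX_ROWS = 1000  # the tab lists at most this many failed executions


@dataclass
class FailedExecution:
    """Hypothetical record for one failed Standard execution."""
    name: str
    start: datetime
    end: datetime
    redrive_count: int

    @property
    def duration_secs(self) -> float:
        # Duration column: end time minus start time, in seconds.
        return (self.end - self.start).total_seconds()


def recent_failures(rows: Iterable[FailedExecution],
                    limit: int = MAX_ROWS) -> List[FailedExecution]:
    """Return the newest `limit` failures (assumption: ordered by start time)."""
    return sorted(rows, key=lambda r: r.start, reverse=True)[:limit]


base = datetime(2024, 1, 1, 12, 0, 0)
rows = [
    FailedExecution("order-a", base, base + timedelta(seconds=30), 0),
    FailedExecution("order-b", base + timedelta(minutes=5),
                    base + timedelta(minutes=5, seconds=12.5), 2),
    FailedExecution("order-c", base + timedelta(minutes=1),
                    base + timedelta(minutes=2), 1),
]
for r in recent_failures(rows, limit=2):
    print(r.name, r.duration_secs, r.redrive_count)
```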

Express Machine

Note: This tab applies only to Express state machines.

Parameter | Description
Express Billed Memory Rate | The billed memory per minute for Express Workflows during the polling interval (in MB/min).
Express Actual Memory Rate | The memory consumed per minute by Express Workflows during the polling interval (in MB/min).
Express Billed Memory | The total billed memory for Express Workflows during the polling interval (in MB).
Express Billed Duration | The billed duration of Express Workflow executions during the polling interval (in minutes).
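Billed values can exceed actual usage because billing rounds each execution up. The sketch below assumes the granularity described on AWS's Express Workflows pricing page (duration rounded up to the nearest 100 ms, memory billed in 64 MB chunks); verify these rules against current AWS pricing. The helper names and the (duration_ms, memory_mb) input pairs are hypothetical, and the sketch only illustrates how per-minute rates and minute-based durations follow from per-execution values.

```python
import math
from typing import Dict, Iterable, Tuple

# Assumed billing granularity, taken from AWS's published Express Workflows
# pricing (verify before relying on it):
DURATION_STEP_MS = 100  # billed duration rounds up to the next 100 ms
MEMORY_CHUNK_MB = 64    # billed memory rounds up to the next 64 MB chunk


def billed_duration_ms(actual_ms: float) -> int:
    """Round one execution's duration up to the next billing step."""
    return math.ceil(actual_ms / DURATION_STEP_MS) * DURATION_STEP_MS


def billed_memory_mb(actual_mb: float) -> int:
    """Round one execution's memory use up to the next whole chunk."""
    return math.ceil(actual_mb / MEMORY_CHUNK_MB) * MEMORY_CHUNK_MB


def express_interval(executions: Iterable[Tuple[float, float]],
                     interval_minutes: float) -> Dict[str, float]:
    """Aggregate hypothetical (duration_ms, memory_mb) pairs for one polling interval."""
    actual_mem = billed_mem = billed_ms = 0.0
    for duration_ms, memory_mb in executions:
        actual_mem += memory_mb
        billed_mem += billed_memory_mb(memory_mb)
        billed_ms += billed_duration_ms(duration_ms)
    return {
        "billed_memory_mb": billed_mem,
        "billed_memory_rate_mb_per_min": round(billed_mem / interval_minutes, 2),
        "actual_memory_rate_mb_per_min": round(actual_mem / interval_minutes, 2),
        "billed_duration_min": round(billed_ms / 60_000, 4),  # ms -> minutes
    }


print(express_interval([(250, 40), (1020, 100)], interval_minutes=5))
```

In this sample, 140 MB of actual memory becomes 192 MB billed (64 + 128), which is why the billed memory rate (38.4 MB/min) exceeds the actual memory rate (28.0 MB/min).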

Configuration

Parameter | Description
CONFIGURATION
Creation Time | The date and time when the state machine was created.
Role ARN | The Amazon Resource Name (ARN) of the IAM role used when creating the state machine.
KMS Key ID | The KMS key (alias, ID, or ARN) used to encrypt data.
Revision ID | The revision identifier of the state machine.
Label | A user-defined or auto-generated string that identifies a Map state. Present only if the stateMachineArn specified in the input is a qualified state machine ARN.
X-Ray Tracing | Indicates whether AWS X-Ray tracing is enabled.
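State machine version and alias ARNs add a qualifier after the state machine name, separated by a colon (for example, `...:stateMachine:OrderFlow:3` for a version or `...:stateMachine:OrderFlow:PROD` for an alias). The sketch below splits such a qualifier off an ARN. The helper name is hypothetical, and whether exactly these ARNs count as "qualified" for the Label field is an assumption.

```python
from typing import Optional, Tuple


def split_state_machine_arn(arn: str) -> Tuple[str, Optional[str]]:
    """Split a state machine ARN into (base ARN, qualifier).

    The qualifier is the version number or alias name after the state machine
    name, or None when the ARN is unqualified.
    """
    parts = arn.split(":")
    # Expected layout: arn:partition:states:region:account:stateMachine:name[:qualifier]
    if (len(parts) not in (7, 8) or parts[0] != "arn" or parts[2] != "states"
            or parts[5] != "stateMachine" or not all(parts)):
        raise ValueError(f"not a state machine ARN: {arn!r}")
    if len(parts) == 7:
        return arn, None
    return ":".join(parts[:7]), parts[7]


base = "arn:aws:states:us-east-1:123456789012:stateMachine:OrderFlow"
print(split_state_machine_arn(base))            # unqualified
print(split_state_machine_arn(base + ":PROD"))  # alias ARN
print(split_state_machine_arn(base + ":3"))     # version ARN
```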
