AWS Step Function Monitoring


AWS Step Function - Overview

AWS Step Functions is a managed service that lets you build and run serverless workflows by coordinating multiple AWS services. It ensures reliable execution of business processes with visual workflow design and monitoring.

Applications Manager tracks execution outcomes (failed, timed out, throttled, aborted, succeeded), execution rates, and execution duration to ensure workflow health. For Express state machines, it also monitors billed duration and billed memory to aid cost optimization.

Creating a new AWS Step Function monitor

To learn how to create a new AWS Step Function monitor, refer here.

Monitored Parameters

Go to the Monitors Category View by clicking the Monitors tab. Click the Step Function instance listed under Amazon in the Cloud Apps section. The AWS Step Function bulk configuration view is displayed, organized into three tabs:

  • The Availability tab shows the availability history for the past 24 hours or 30 days.
  • The Performance tab shows the health status and events for the past 24 hours or 30 days.
  • The List view tab lets you perform bulk admin configurations.

Click a monitor in the list to open the AWS Step Function dashboard, which includes the following tabs:

Performance Overview

Parameter | Description
STATE MACHINE INFORMATION
Type | The type of state machine.
Status | The current status of the state machine.
Versions | The number of versions published for the state machine at the time of polling.
Aliases | The number of aliases created for the state machine at the time of polling.
STATE MACHINE EXECUTION PERFORMANCE (%)
Failed Execution Percentage | The percentage of executions that failed during the poll interval (in %).
Timed Out Execution Percentage | The percentage of executions that timed out during the poll interval (in %).
Aborted Execution Percentage | The percentage of executions that were aborted during the poll interval (in %).
Throttled Execution Percentage | The percentage of executions that were throttled during the poll interval (in %).
Succeeded Execution Percentage | The percentage of executions that succeeded during the poll interval (in %).
Total Execution Failure Percentage | The combined percentage of unsuccessful executions (failed, timed out, aborted, throttled) during the poll interval (in %).
EXECUTION ERRORS
Failed Executions | The total number of executions that failed during the poll interval (in count).
Timed Out Executions | The total number of executions that timed out during the poll interval (in count).
Aborted Executions | The total number of executions that were aborted during the poll interval (in count).
Throttled Executions | The total number of executions that were throttled during the poll interval (in count).
TOTAL EXECUTION
Total Executions Rate | The total number of state machine executions (failed, timed out, aborted, throttled, succeeded) per minute during the poll interval (in count/min). This count includes redriven executions.
Total Executions | The total number of state machine executions (failed, timed out, aborted, throttled, succeeded) during the poll interval (in count). This count includes redriven executions.
SUCCEEDED EXECUTIONS
Succeeded Executions Rate | The total number of successfully completed executions per minute during the poll interval (in count/min).
Succeeded Executions | The total number of successfully completed executions during the poll interval (in count).
STARTED EXECUTIONS
Started Executions Rate | The total number of started executions per minute during the poll interval (in count/min).
Started Executions | The total number of started executions during the poll interval (in count).
EXECUTION DURATION
Average Execution Duration | The average time between the start and end of an execution, calculated across all executions performed during the poll interval (in secs).
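
The percentage and rate parameters above can be related to the raw counts with a short sketch. It is illustrative only: the source does not state the exact formulas Applications Manager uses, so the sketch assumes that the total is the sum of the five outcome counts, that each percentage uses this total as its denominator, and that each rate is the count divided by the poll interval in minutes. The names `ExecutionCounts`, `percentage`, and `summarize` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExecutionCounts:
    """Outcome counts for one poll interval (hypothetical container)."""
    failed: int
    timed_out: int
    aborted: int
    throttled: int
    succeeded: int


def percentage(part: int, total: int) -> float:
    """Return part as a percentage of total, or 0.0 when total is zero."""
    return round(100.0 * part / total, 2) if total else 0.0


def summarize(counts: ExecutionCounts, poll_minutes: float) -> dict:
    """Derive percentage and per-minute values from raw counts (assumed formulas)."""
    unsuccessful = (
        counts.failed + counts.timed_out + counts.aborted + counts.throttled
    )
    total = unsuccessful + counts.succeeded
    return {
        "Failed Execution Percentage": percentage(counts.failed, total),
        "Succeeded Execution Percentage": percentage(counts.succeeded, total),
        "Total Execution Failure Percentage": percentage(unsuccessful, total),
        "Total Executions": total,
        "Total Executions Rate": round(total / poll_minutes, 2),
        "Succeeded Executions Rate": round(counts.succeeded / poll_minutes, 2),
    }


# Example: 50 executions observed over an assumed 5-minute poll interval.
summary = summarize(
    ExecutionCounts(failed=3, timed_out=1, aborted=1, throttled=0, succeeded=45),
    poll_minutes=5,
)
```

Guarding against a zero total avoids a division error for poll intervals in which no executions ran.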

Redriven Executions

Note: This tab applies only to Standard state machines.

Parameter | Description
REDRIVEN EXECUTION PERFORMANCE (%)
Failed Redriven Execution Percentage | The percentage of redriven executions that failed during the poll interval (in %).
Timed Out Redriven Execution Percentage | The percentage of redriven executions that timed out during the poll interval (in %).
Aborted Redriven Execution Percentage | The percentage of redriven executions that were aborted during the poll interval (in %).
Succeeded Redriven Execution Percentage | The percentage of redriven executions that succeeded during the poll interval (in %).
Total Redriven Failure Percentage | The combined percentage of unsuccessful redriven executions (failed, timed out, aborted) during the poll interval (in %).
REDRIVEN EXECUTION ERRORS
Failed Redriven Executions | The total number of redriven executions that failed during the poll interval (in count).
Timed Out Redriven Executions | The total number of redriven executions that timed out during the poll interval (in count).
Aborted Redriven Executions | The total number of redriven executions that were aborted during the poll interval (in count).
REDRIVEN EXECUTIONS
Redriven Executions | The total number of executions that were redriven (retried after failure) during the poll interval (in count).
Redriven Execution Percentage | The percentage of failed, timed out, and aborted executions that were redriven during the poll interval (in %).
SUCCEEDED REDRIVEN EXECUTIONS
Succeeded Redriven Executions | The total number of redriven executions that succeeded during the poll interval (in count).

Failed Executions

Note:
  • This tab applies only to Standard state machines.
  • This table lists details of the 1,000 most recent failed executions, if available.

Parameter | Description
FAILED EXECUTION DETAILS
Execution Name | The name of the execution.
Start Time | The date and time the execution started.
End Time | The date and time the execution ended.
Duration | The total duration of the failed execution (in secs).
Number of Redrives | The number of times the failed execution was redriven (i.e., retried).
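
As a rough illustration of how the rows in this table relate to one another, the sketch below builds them from a list of execution records. It assumes each record carries `name`, `status`, `startDate`, `stopDate`, and `redriveCount` fields, modelled on the items returned by the AWS Step Functions ListExecutions API; whether Applications Manager collects the data this way is not stated in the source. The function names and sample data are hypothetical.

```python
from datetime import datetime, timezone


def to_row(execution: dict) -> dict:
    """Convert one execution record into a Failed Executions table row."""
    duration = (execution["stopDate"] - execution["startDate"]).total_seconds()
    return {
        "Execution Name": execution["name"],
        "Start Time": execution["startDate"].isoformat(),
        "End Time": execution["stopDate"].isoformat(),
        "Duration (secs)": duration,
        # Treat a missing redrive count as zero redrives (assumption).
        "Number of Redrives": execution.get("redriveCount", 0),
    }


def recent_failed(executions: list, limit: int = 1000) -> list:
    """Return rows for the most recent failed executions, newest first."""
    failed = [e for e in executions if e["status"] == "FAILED"]
    failed.sort(key=lambda e: e["startDate"], reverse=True)
    return [to_row(e) for e in failed[:limit]]


# Hypothetical sample data: one failed and one succeeded execution.
sample = [
    {
        "name": "order-1001",
        "status": "FAILED",
        "startDate": datetime(2024, 1, 1, 10, 0, 0, tzinfo=timezone.utc),
        "stopDate": datetime(2024, 1, 1, 10, 2, 30, tzinfo=timezone.utc),
        "redriveCount": 2,
    },
    {
        "name": "order-1002",
        "status": "SUCCEEDED",
        "startDate": datetime(2024, 1, 1, 10, 5, 0, tzinfo=timezone.utc),
        "stopDate": datetime(2024, 1, 1, 10, 5, 40, tzinfo=timezone.utc),
    },
]
rows = recent_failed(sample)
```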

Express Machine

Note: This tab applies only to Express state machines.

Parameter | Description
Express Billed Memory Rate | The total billed memory per minute for Express Workflows during the poll interval (in MB/min).
Express Actual Memory Rate | The total memory consumed per minute by Express Workflows during the poll interval (in MB/min).
Express Billed Memory | The total billed memory for Express Workflows during the poll interval (in MB).
Express Billed Duration | The billed duration of Express Workflow executions during the poll interval (in minutes).

Configuration

Parameter | Description
CONFIGURATION
Creation Time | The date and time when the state machine was created.
Role ARN | The Amazon Resource Name (ARN) of the IAM role used when creating the state machine.
KMS Key ID | The KMS key (alias, key ID, or ARN) used to encrypt data.
Revision ID | The revision identifier of the state machine.
Label | A user-defined or auto-generated string that identifies a Map state. Present only if the stateMachineArn specified in input is a qualified state machine ARN.
X-Ray Tracing | Indicates whether X-Ray tracing is enabled.
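
For readers who want to cross-check these values against AWS directly, the sketch below maps a response shaped like the AWS Step Functions DescribeStateMachine output onto the Configuration parameters. It assumes the response field names `creationDate`, `roleArn`, `encryptionConfiguration.kmsKeyId`, `revisionId`, `label`, and `tracingConfiguration.enabled`; how Applications Manager itself gathers this data is not stated in the source. The function name, the "-" placeholder, and the sample values are hypothetical.

```python
def configuration_view(description: dict) -> dict:
    """Map a DescribeStateMachine-style response to the Configuration table."""
    tracing = description.get("tracingConfiguration", {})
    encryption = description.get("encryptionConfiguration", {})
    return {
        "Creation Time": description.get("creationDate", "-"),
        "Role ARN": description.get("roleArn", "-"),
        # Optional fields fall back to "-" when absent (assumed placeholder).
        "KMS Key ID": encryption.get("kmsKeyId", "-"),
        "Revision ID": description.get("revisionId", "-"),
        "Label": description.get("label", "-"),
        "X-Ray Tracing": "Enabled" if tracing.get("enabled") else "Disabled",
    }


# Hypothetical response for an unqualified state machine ARN (no label).
sample_description = {
    "creationDate": "2024-01-01T10:00:00+00:00",
    "roleArn": "arn:aws:iam::123456789012:role/example-role",
    "revisionId": "example-revision-id",
    "tracingConfiguration": {"enabled": True},
}
config = configuration_view(sample_description)
```

In this sample, Label falls back to the placeholder because the response does not come from a qualified state machine ARN.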
