AWS Step Functions is a managed service that lets you build and run serverless workflows by coordinating multiple AWS services. It ensures reliable execution of business processes with visual workflow design and monitoring.
Applications Manager tracks execution outcomes (failed, timed-out, throttled, aborted, succeeded), execution rates, and duration to ensure workflow health. For express state machines, it also monitors billed duration and billed memory to aid cost optimization.
To learn how to create a new AWS Step Function monitor, refer here.
Go to the Monitors Category View by clicking the Monitors tab. Click on the Step Function instance available under Amazon in the Cloud Apps section. Displayed below is the AWS Step Function bulk configuration view distributed into three tabs:
By clicking a monitor from the list, you'll be taken to the AWS Step Function dashboard which includes the following tabs:
| Parameter | Description |
|---|---|
| STATE MACHINE INFORMATION | |
| Type | The type of state machine. |
| Status | The current status of the state machine. |
| Versions | The maximum number of versions published for the state machine at the time of polling. |
| Aliases | The number of aliases created for the state machine at the time of polling. |
| STATE MACHINE EXECUTION PERFORMANCE (%) | |
| Failed Execution Percentage | The percentage of executions that failed between the poll interval (in %). |
| Timed Out Execution Percentage | The percentage of executions that timed out between the poll interval (in %). |
| Aborted Execution Percentage | The percentage of executions aborted between the poll interval (in %). |
| Throttled Execution Percentage | The percentage of executions throttled between the poll interval (in %). |
| Succeeded Execution Percentage | The percentage of executions that succeeded between the poll interval (in %). |
| Total Execution Failure Percentage | The total percentage of failed executions (failed, timed out, aborted, throttled) between the poll interval (in %). |
| EXECUTION ERRORS | |
| Failed Executions | The total number of executions that failed between the poll interval (in count). |
| Timed Out Executions | The total number of executions that timed out between the poll interval (in count). |
| Aborted Executions | The total number of executions that were aborted between the poll interval (in count). |
| Throttled Executions | The total number of executions that were throttled between the poll interval (in count). |
| TOTAL EXECUTION | |
| Total Executions Rate | The total number of state machine executions (failed, timed out, aborted, throttled, successful) per minute between the poll interval (in count/min). This count includes redriven executions. |
| Total Executions | The total number of state machine executions (failed, timed out, aborted, throttled, successful) between the poll interval (in count). This count includes redriven executions. |
| SUCCEEDED EXECUTIONS | |
| Succeeded Executions Rate | The total number of successfully completed executions per minute between the poll interval (in count/min). |
| Succeeded Executions | The total number of successfully completed executions between the poll interval (in count). |
| STARTED EXECUTIONS | |
| Started Executions Rate | The total number of started executions per minute between the poll interval (in count/min). |
| Started Executions | The total number of started executions between the poll interval (in count). |
| EXECUTION DURATION | |
| Average Execution Duration | The average time taken between the start and end time of an execution, calculated across all the executions performed between the poll interval (in secs). |
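
The percentage and rate metrics in the table above can be related to the raw counts with simple arithmetic. The sketch below is only an illustration under stated assumptions: it assumes each percentage uses the total number of executions in the poll interval as its denominator, and the `ExecutionCounts` type and function names are hypothetical, not part of Applications Manager.

```python
from dataclasses import dataclass


@dataclass
class ExecutionCounts:
    """Raw execution counts for one poll interval (hypothetical type)."""
    failed: int
    timed_out: int
    aborted: int
    throttled: int
    succeeded: int

    @property
    def total(self) -> int:
        # Total executions: every outcome listed in the table.
        return (self.failed + self.timed_out + self.aborted
                + self.throttled + self.succeeded)


def percentage(part: int, total: int) -> float:
    """Share of `part` in `total` as a percentage; 0.0 when nothing ran."""
    return 0.0 if total == 0 else 100.0 * part / total


def total_failure_percentage(counts: ExecutionCounts) -> float:
    """Failed + timed out + aborted + throttled, as a share of all executions."""
    unsuccessful = (counts.failed + counts.timed_out
                    + counts.aborted + counts.throttled)
    return percentage(unsuccessful, counts.total)


def rate_per_minute(count: int, poll_interval_minutes: float) -> float:
    """Convert a per-interval count into a count/min rate."""
    if poll_interval_minutes <= 0:
        raise ValueError("poll interval must be positive")
    return count / poll_interval_minutes
```

Under these assumptions, 2 failed executions out of 20 in a 5-minute poll interval correspond to a Failed Execution Percentage of 10% and a Total Executions Rate of 4 per minute.
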
Note: This tab is applicable only to Standard state machines.
| Parameter | Description |
|---|---|
| REDRIVEN EXECUTION PERFORMANCE (%) | |
| Failed Redriven Execution Percentage | The percentage of redriven executions that failed between the poll interval (in %). |
| Timed Out Redriven Execution Percentage | The percentage of redriven executions that timed out between the poll interval (in %). |
| Aborted Redriven Execution Percentage | The percentage of redriven executions that were aborted between the poll interval (in %). |
| Succeeded Redriven Execution Percentage | The percentage of redriven executions that succeeded between the poll interval (in %). |
| Total Redriven Failure Percentage | The total percentage of failed redriven executions, including failed, timed out, and aborted, between the poll interval (in %). |
| REDRIVEN EXECUTION ERRORS | |
| Failed Redriven Executions | The total number of redriven executions that failed between the poll interval (in count). |
| Timed Out Redriven Executions | The total number of redriven executions that timed out between the poll interval (in count). |
| Aborted Redriven Executions | The total number of redriven executions that were aborted between the poll interval (in count). |
| REDRIVEN EXECUTIONS | |
| Redriven Executions | The total number of executions that were redriven (retried after failure) between the poll interval (in count). |
| Redriven Execution Percentage | The rate of redriven executions retried from failed, timed out, and aborted executions between the poll interval (in %). |
| SUCCEEDED REDRIVEN EXECUTIONS | |
| Succeeded Redriven Executions | The total number of redriven executions that succeeded between the poll interval (in count). |
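
The redriven percentages above can be sketched in the same way. This is an assumed interpretation, not a documented formula: the failure percentage is taken relative to the number of redriven executions, and the redriven percentage relative to the failed, timed-out, and aborted executions that were eligible for a redrive. Both function names are hypothetical.

```python
def redriven_failure_percentage(failed: int, timed_out: int,
                                aborted: int, redriven: int) -> float:
    """Share of redriven executions that failed, timed out, or were aborted.

    Assumed formula; throttled executions are not part of this total,
    matching the table description.
    """
    if redriven == 0:
        return 0.0
    return 100.0 * (failed + timed_out + aborted) / redriven


def redriven_execution_percentage(redriven: int,
                                  unsuccessful_executions: int) -> float:
    """Share of failed, timed-out, and aborted executions that were redriven.

    Assumed formula based on the metric description.
    """
    if unsuccessful_executions == 0:
        return 0.0
    return 100.0 * redriven / unsuccessful_executions
```
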
| Parameter | Description |
|---|---|
| FAILED EXECUTION DETAILS | |
| Execution Name | The name of the execution. |
| Start Time | The date and time when the execution started. |
| End Time | The date and time when the execution ended. |
| Duration | The total duration of the failed execution (in secs). |
| Number of Redrives | The total number of redrives for each failed execution (i.e., retries). |
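
The Duration value follows from the start and end times of an execution. As a minimal sketch, assuming both timestamps are available as ISO 8601 strings with a UTC offset (the format actually shown in the product may differ, and the function name is hypothetical):

```python
from datetime import datetime


def execution_duration_seconds(start_time: str, end_time: str) -> float:
    """Return the elapsed time between two ISO 8601 timestamps, in seconds."""
    start = datetime.fromisoformat(start_time)
    end = datetime.fromisoformat(end_time)
    if end < start:
        raise ValueError("end time precedes start time")
    return (end - start).total_seconds()
```

For example, an execution that starts at `2024-05-01T10:00:00+00:00` and ends at `2024-05-01T10:02:30+00:00` has a duration of 150 seconds.
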
Note: This tab is applicable only to Express state machines.
| Parameter | Description |
|---|---|
| Express Billed Memory Rate | The total billed memory per minute for the Express Workflows between the poll interval (in MB/min). |
| Express Actual Memory Rate | The total memory consumed per minute by Express Workflows between the poll interval (in MB/min). |
| Express Billed Memory | The total billed memory for Express Workflows between the poll interval (in MB). |
| Express Billed Duration | The billed duration for Express Workflow executions between the poll interval (in minutes). |
| Parameter | Description |
|---|---|
| CONFIGURATION | |
| Creation Time | The date and time when the state machine was created. |
| Role ARN | The Amazon Resource Name (ARN) of the IAM role used when creating the state machine. |
| KMS Key ID | The KMS Key (alias, ID, ARN) used to encrypt data. |
| Revision ID | The revision identifier of the state machine. |
| Label | A user-defined or auto-generated string that identifies a Map state. Present only if the stateMachineArn specified in input is a qualified state machine ARN. |
| X-Ray Tracing | Indicates whether X-Ray Tracing is enabled or not. |
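
The configuration parameters above resemble fields returned by the AWS Step Functions `DescribeStateMachine` API. The sketch below shows one way such a response could be mapped onto the table rows; the mapping and the `summarize_configuration` helper are assumptions for illustration, not a description of how Applications Manager collects the data.

```python
from typing import Any, Dict


def summarize_configuration(response: Dict[str, Any]) -> Dict[str, Any]:
    """Map a DescribeStateMachine-style response onto the table's parameters.

    Optional fields (encryption, label, tracing) may be absent, so they
    are read defensively and default to None or False.
    """
    encryption = response.get("encryptionConfiguration") or {}
    tracing = response.get("tracingConfiguration") or {}
    return {
        "Creation Time": response.get("creationDate"),
        "Role ARN": response.get("roleArn"),
        "KMS Key ID": encryption.get("kmsKeyId"),
        "Revision ID": response.get("revisionId"),
        "Label": response.get("label"),
        "X-Ray Tracing": bool(tracing.get("enabled", False)),
    }
```

With boto3, such a response could be obtained from `boto3.client("stepfunctions").describe_state_machine(stateMachineArn=...)`, given credentials that permit the call.
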