
AWS Elastic Disaster Recovery Monitoring


AWS Elastic Disaster Recovery - Overview

AWS Elastic Disaster Recovery (DRS) is a cost-effective, reliable disaster recovery service that minimizes downtime and data loss by continuously replicating source servers to AWS using block-level replication.

ManageEngine Applications Manager integrates with AWS DRS to help you monitor and manage your disaster recovery operations. The integration provides real-time monitoring of your regional DRS environment, including source server fleet status, replication states, and recovery readiness across AWS Regions.

Applications Manager provides monitoring through three distinct components:

  • Elastic Disaster Recovery (DRS): Acts as the regional monitor, providing a high-level overview of the source server fleet status, replication states, and recovery readiness across your AWS Regions.
  • DRS Source Server: Monitors on-premises or cloud-based servers configured for continuous replication to the AWS staging area, offering critical insights into replication health and data synchronization.
  • DRS Recovery Instance: Monitors the EC2 instances launched during recovery drills or failovers, tracking their performance and status as they take over workloads from the source server.

Creating a new Elastic Disaster Recovery monitor

To learn how to create a new Elastic Disaster Recovery monitor, refer here.

Monitored parameters

Click the Monitors tab to open the Monitors Category View. In the Cloud Apps section, click Elastic Disaster Recovery (DRS) under Amazon. The Elastic Disaster Recovery bulk configuration view opens, organized into three tabs:

  • Availability tab: shows the availability history for the past 24 hours or 30 days.
  • Performance tab: shows health status and events for the past 24 hours or 30 days.
  • List view tab: lets you perform bulk admin configurations.
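Availability over the 24-hour or 30-day window can be read as the share of that window during which the monitor was up. The following is a minimal sketch of that arithmetic, assuming outages are supplied as non-overlapping (start, end) pairs. The function name and inputs are hypothetical and do not describe how Applications Manager computes its availability history internally.

```python
from datetime import datetime, timedelta


def availability_percent(
    outages: list[tuple[datetime, datetime]],
    window_end: datetime,
    window: timedelta,
) -> float:
    """Return the percentage of the window during which the monitor was up.

    Hypothetical helper: assumes the outage intervals do not overlap.
    """
    window_start = window_end - window
    downtime = timedelta()
    for start, end in outages:
        # Clip each outage to the reporting window (for example 24 hours or 30 days).
        clipped_start = max(start, window_start)
        clipped_end = min(end, window_end)
        if clipped_end > clipped_start:
            downtime += clipped_end - clipped_start
    # timedelta / timedelta yields a float ratio.
    return 100.0 * (1 - downtime / window)


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 0, 0)
    # A single 72-minute outage inside the last 24 hours.
    outages = [(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 11, 12))]
    print(round(availability_percent(outages, now, timedelta(hours=24)), 2))
```

Clipping each outage to the window means an outage that began before the window only counts for the part that falls inside it.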

Click a monitor name in the list to open its Elastic Disaster Recovery dashboard.

Click the Elastic Disaster Recovery (DRS) monitor to see its metrics under the following tab:

  • Performance Overview

Click a DRS Source Server monitor to see its metrics under the following tabs:

  • Performance Overview
  • Recovery Instances
  • Configuration

Click a DRS Recovery Instance monitor to see its metrics under the following tabs:

  • Performance Overview
  • Configuration

Elastic Disaster Recovery Metrics

Elastic Disaster Recovery - Performance Overview

Parameter | Description
SOURCE SERVER FLEET OVERVIEW
Source Servers | The average number of source servers being replicated to AWS DRS in this region during the poll interval.
Active Source Servers | The average number of source servers actively replicating data to AWS DRS in this region during the poll interval.
Protected Source Servers | The average number of source servers in this region that are fully protected and ready for recovery during the poll interval.
SOURCE SERVERS
Source Server ID | The unique identifier of the source server.
Hostname | The hostname of the source server.
Data Replication State | The detailed data replication state of the source server.
Last Launch Result | The result of the last recovery launch attempt.
RECOVERY INSTANCES
Recovery Instance ID | The unique identifier of the recovery instance.
Source Server ID | The unique identifier of the source server.
EC2 Instance ID | The ID of the EC2 instance attached to the recovery instance.
EC2 Instance State | The current state of the EC2 recovery instance.
Failback State | The current failback state of the recovery instance.
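The fleet figures are averages of per-server counts taken during the poll interval. The sketch below shows that averaging, assuming each sample is a list of data replication states, one per source server. Which states count as "active" or "protected" is an illustrative assumption here; Applications Manager does not document its exact rule.

```python
from statistics import mean

# Assumption: these groupings are illustrative only and are not
# Applications Manager's documented definition of "active" or "protected".
ACTIVE_STATES = {"INITIAL_SYNC", "BACKLOG", "CREATING_SNAPSHOT", "CONTINUOUS", "RESCAN"}
PROTECTED_STATES = {"CONTINUOUS"}


def fleet_overview(samples: list[list[str]]) -> dict[str, float]:
    """Average the fleet counts over the samples taken within one poll interval.

    Each sample is a list of data replication states, one per source server.
    """
    if not samples:
        raise ValueError("need at least one sample")
    return {
        "source_servers": mean(len(s) for s in samples),
        # Booleans sum as 0/1, so this counts servers in the matching states.
        "active_source_servers": mean(sum(st in ACTIVE_STATES for st in s) for s in samples),
        "protected_source_servers": mean(sum(st in PROTECTED_STATES for st in s) for s in samples),
    }


if __name__ == "__main__":
    samples = [
        ["CONTINUOUS", "CONTINUOUS", "STALLED"],
        ["CONTINUOUS", "BACKLOG", "STALLED"],
    ]
    print(fleet_overview(samples))
```

Averaging rather than reporting the last sample smooths out servers that flap between states within one interval, which is why a count such as Protected Source Servers can be fractional.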

DRS Source Server Metrics

DRS Source Server - Performance Overview

Parameter | Description
LAST RECOVERY DETAILS
Last Launch Type | The type of the last recovery launch.
Last Launch Job ID | The job ID of the last recovery launch.
Last Launch Result | The result of the last recovery launch attempt for this source server.
Last Launch Time | The time of the last recovery launch API call.
REPLICATION PROGRESS
Replication Progress | The average percentage of data replicated to the staging area at the time of polling (in %).
LAG DURATION
Lag Duration | The average amount of time by which the source server lags behind the replication target during the poll interval (in seconds).
BACKLOG
Backlog | The maximum amount of data not yet replicated to the staging area at the time of polling (in MB).
ELAPSED REPLICATION DURATION
Elapsed Replication Duration | The maximum elapsed time of the replication run at the time of polling (in minutes).
DURATION SINCE LAST SUCCESSFUL RECOVERY LAUNCH
Duration Since Last Successful Recovery Launch | The maximum elapsed time since the last successful recovery launch at the time of polling (in minutes).
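Several of these metrics can be derived from per-disk byte counts and launch timestamps. The sketch below assumes disk records shaped like the replicatedDisks entries of the DRS API (totalStorageBytes, replicatedStorageBytes, backloggedStorageBytes); treat those field names, the 2**20-byte MB, and all helper names as assumptions. One reading of the table is that progress is averaged and backlog maximized across the polls in an interval, which is what summarize() does.

```python
from datetime import datetime
from statistics import mean

MB = 1024 * 1024  # assumption: MB means 2**20 bytes


def progress_percent(disks: list[dict]) -> float:
    """Share of total disk bytes already replicated to the staging area."""
    total = sum(d["totalStorageBytes"] for d in disks)
    replicated = sum(d["replicatedStorageBytes"] for d in disks)
    return 100.0 * replicated / total if total else 0.0


def backlog_mb(disks: list[dict]) -> float:
    """Data not yet replicated, summed across disks, in MB."""
    return sum(d["backloggedStorageBytes"] for d in disks) / MB


def minutes_since(last_success: datetime, now: datetime) -> float:
    """Elapsed minutes since the last successful recovery launch."""
    return (now - last_success).total_seconds() / 60


def summarize(polls: list[list[dict]]) -> dict[str, float]:
    """Reduce several polls: average the progress, keep the largest backlog."""
    return {
        "replication_progress_pct": mean(progress_percent(p) for p in polls),
        "backlog_mb": max(backlog_mb(p) for p in polls),
    }
```

Reporting the maximum backlog rather than the average keeps a short burst of unreplicated data visible even if it drains before the interval ends.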

DRS Source Server - Recovery Instances

Parameter | Description
RECOVERY INSTANCES
Recovery Instance ID | The unique identifier of the recovery instance.
EC2 Instance ID | The ID of the EC2 instance attached to the recovery instance.
EC2 Instance State | The current state of the EC2 recovery instance.
Failback State | The current failback state of the recovery instance.

DRS Source Server - Configuration

Parameter | Description
SOURCE SERVER DETAILS
Recovery Instance ID | The ID of the recovery instance associated with this source server.
Hostname | The hostname of the source server.
Agent Version | The version of the AWS Replication Agent installed on the source server.
Creation Time | The date and time the source server was added to the DRS service.
Last Updated Time | The date and time the source server was last updated.
LAUNCH SETTINGS
Instance Type Right-Sizing Method | The method used to right-size the target EC2 instance type.
Copy Private IP | Indicates whether the private IP address is copied during launch.
Copy Tags | Indicates whether tags are copied from the source server to the recovery instance.
Launch Template ID | The ID of the EC2 launch template used to launch recovery instances.
OS BYOL | Indicates whether Bring Your Own License (BYOL) is enabled for the operating system.
REPLICATION SETTINGS
Staging Area Subnet ID | The subnet ID of the staging area.
EBS Encryption | The EBS encryption setting for replicated disks.
Replication Server Instance Type | The EC2 instance type used for the replication server.
Default Staging Disk Type (Large) | The default EBS volume type for large staging disks.
Auto-Replicate New Disks | Indicates whether new disks added to the source server are replicated automatically.
Use Dedicated Replication Server | Indicates whether a dedicated replication server is used for this source server.
Associate Default Security Group | Indicates whether the default security group is associated with the replication server.
Replication Server Security Group IDs | The IDs of the security groups associated with the replication server.
Data Plane Routing | The network routing used for data replication.
Bandwidth Throttling | The bandwidth throttling setting, where 0 indicates no throttling (in Mbps).
Create Public IP | Indicates whether a public IP address is created for the replication server.
REPLICATION STATUS DETAILS
Replication Direction | The direction of data replication for the source server.
Data Replication Error | The error message for the current data replication, if any.
Data Replication State | The current state of data replication for this source server.
Replicating From | The Availability Zone from which data is being replicated.
Replicating To | The staging Availability Zone to which data is being replicated.
Replicated Storage | The total replicated storage across all disks (in GB).
Total Storage | The total storage capacity across all disks (in GB).
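Replicated Storage and Total Storage are totals across all disks, and a Bandwidth Throttling value of 0 means replication is not throttled. The following is a minimal sketch of both, assuming per-disk byte counts as input and GB meaning 2**30 bytes. The disk field names and helper names are hypothetical.

```python
GB = 1024 ** 3  # assumption: GB means 2**30 bytes


def storage_totals_gb(disks: list[dict]) -> tuple[float, float]:
    """Return (replicated, total) storage in GB, summed across all disks.

    Hypothetical input shape: each disk dict carries
    replicatedStorageBytes and totalStorageBytes.
    """
    replicated = sum(d["replicatedStorageBytes"] for d in disks) / GB
    total = sum(d["totalStorageBytes"] for d in disks) / GB
    return replicated, total


def describe_throttling(mbps: int) -> str:
    """Interpret the Bandwidth Throttling setting, where 0 means no throttling."""
    if mbps < 0:
        raise ValueError("throttling cannot be negative")
    return "no throttling" if mbps == 0 else f"limited to {mbps} Mbps"
```

When the replicated total is still below the total capacity, replication of the existing data has not yet caught up.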

DRS Recovery Instance Metrics

DRS Recovery Instance - Performance Overview

Parameter | Description
INSTANCE INFORMATION
Data Replication State | The current state of data replication for this recovery instance.
Data Replication Error | The error message for the current data replication, if any.
Failback State | The current failback state of the recovery instance.
REPLICATION PROGRESS
Replication Progress | The average progress of data synchronization for the recovery instance at the time of polling (in %).
REPLICATION LAG DURATION
Lag Duration | The average time difference between the source and the recovery instance during the poll interval, representing potential data loss (RPO) (in seconds).
REPLICATION BACKLOG
Replication Backlog | The maximum amount of data waiting to be synchronized to the recovery instance at the time of polling (in MB).
ELAPSED REPLICATION DURATION
Elapsed Replication Duration | The maximum time the recovery instance has been in its current replication state at the time of polling (in minutes).
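Because Lag Duration approximates potential data loss, a natural check is to compare the average lag in a poll interval against your recovery point objective (RPO). The sketch below assumes lag samples arrive as ISO 8601 time durations such as "PT1M30S"; it parses only hour, minute, and second components. The RPO target and the helper names are hypothetical.

```python
import re
from statistics import mean

# Matches simple ISO 8601 time durations such as "PT45S" or "PT1H2M3.5S".
# Assumption: lag is reported in this form; day/month/year parts are not handled.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?$")


def duration_seconds(text: str) -> float:
    """Convert an ISO 8601 time duration to seconds."""
    m = _DURATION.match(text)
    if not m:
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = m.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


def lag_breaches_rpo(samples: list[str], rpo_target_s: float) -> tuple[float, bool]:
    """Average the lag samples from one poll interval and compare with an RPO target."""
    avg = mean(duration_seconds(s) for s in samples)
    return avg, avg > rpo_target_s
```

For example, lag samples of 30 and 90 seconds average to exactly 60 seconds, which does not exceed a 60-second RPO target.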

DRS Recovery Instance - Configuration

Parameter | Description
RECOVERY INSTANCE DETAILS
Source Server ID | The unique identifier of the source server.
EC2 Instance ID | The EC2 instance ID of the recovery instance.
EC2 Instance State | The current state of the EC2 recovery instance.
Job ID | The ID of the job that launched the recovery instance.
Drill Instance | Indicates whether this recovery instance was launched as a drill.
Hostname | The hostname of the source server.
FQDN | The fully qualified domain name (FQDN) of the recovery instance.
Agent Version | The version of the AWS Replication Agent installed on the source server.
Point-in-Time Snapshot Timestamp | The timestamp of the last point-in-time snapshot.
Last Updation Time | The date and time the source server was last updated.
Agent Last Seen | The timestamp when the service last saw the agent.
REPLICATION STATUS DETAILS
Replicating From | The Availability Zone from which data is being replicated.
Replicating To | The staging Availability Zone to which data is being replicated.
Replication Start Time | The timestamp when replication started.
Replicated Storage | The total replicated storage across all disks (in GB).
Total Storage | The total storage capacity across all disks (in GB).
FAILBACK DETAILS
Failback State | The current failback state of the recovery instance.
Failback Client ID | The failback client ID of the recovery instance.
Failback Job ID | The failback job ID associated with the recovery instance.
Failback to Original Server | Indicates whether failback is configured to return to the original source server.
Failback Client Last Seen | The timestamp when the failback client was last seen.
