Windows Server Cluster Monitoring


Overview

Applications Manager's Windows Cluster monitor tracks cluster details, cluster nodes, resource groups, cluster performance, networks, disk utilization, and storage statistics. You can also monitor cluster events by configuring Event Log rules.

Creating a new Windows Cluster monitor

Supported Versions: Windows Server 2016, Windows Server 2012, Windows Server 2012 R2, Windows Server 2008, Windows Server 2008 R2

Prerequisites for monitoring Windows Cluster metrics: Click here

Using the REST API to add a new Windows Cluster monitor: Click here

Follow the steps given below to create a new Windows Cluster monitor in Applications Manager:

  1. Click the New Monitor link.
  2. Select Windows Cluster under the Servers category.
  3. Specify the Display Name of the Windows Cluster.
  4. Enter the Virtual IP Address or the Listener IP Address of the cluster.
  5. Select the Version of the Windows Server from the drop-down menu.
  6. Either enter the Cluster Domain Administrator username and password, or select credentials from the Credential Manager drop-down menu. If you use the Cluster Domain Administrator credentials, make sure the user account has permission to execute WMI queries on the 'root\mscluster' namespace on the cluster server nodes.
  7. Select the Node Discovery option. The available options are Do not Discover Nodes and Discover and Monitor Nodes:
    • Do not Discover Nodes - The cluster server nodes are not discovered as Windows Server monitors. If a node has already been added as a Windows Server monitor, it is associated with the cluster internally so that cluster-specific event logs can be collected.
    • Discover and Monitor Nodes - The cluster server nodes are discovered as Windows Server monitors, and their availability and performance are monitored. If a node has already been added as a Windows Server monitor, it is not discovered again; the existing monitor is associated with the cluster internally so that cluster-specific event logs can be collected.
  8. Select the Enable Event Log Monitoring option:
    • Checked - Enables event log monitoring on all the cluster server nodes. Events generated from the configured event log rules are propagated to the cluster. During each server's event log collection, the cluster events are also collected and stored in the database without generating an alert. Then, during the cluster's data collection, the cluster events from all the nodes are read from the database and alerts are generated for the configured event log rules.
    • Unchecked - Disables event log monitoring for the cluster.
      • Cluster Add: If this option is unchecked while adding the cluster, event log monitoring is not enabled on the discovered nodes. For nodes that already exist, the current event log monitoring status of the server is left unchanged.
      • Cluster Update: If this option is unchecked while updating the cluster, event log monitoring is disabled on all the servers and the cluster, and the event log related alarms and events are cleared from the database for all the servers and the cluster. Use this option only when necessary.
  9. Enter the polling interval time in minutes.
  10. From the combo box, choose the Monitor Group with which you want to associate the monitor (optional).
  11. Click Add Monitor(s).
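The Node Discovery choice in step 7 reduces to a simple per-node decision. The Python sketch below models that documented behaviour only as an illustration; the `NodeDiscovery` enum, the `plan_node_actions` function, the `existing_servers` set, and the action labels are all hypothetical and are not part of Applications Manager.

```python
from enum import Enum


class NodeDiscovery(Enum):
    """The two Node Discovery options offered when adding a cluster."""
    DO_NOT_DISCOVER = "Do not Discover Nodes"
    DISCOVER_AND_MONITOR = "Discover and Monitor Nodes"


def plan_node_actions(nodes, existing_servers, option):
    """Return a hypothetical action label for each cluster node.

    "associate": the node already exists as a Windows Server monitor and is
                 linked to the cluster for cluster-specific event logs.
    "discover":  a new Windows Server monitor is created for the node.
    "skip":      the node is neither discovered nor associated.
    """
    actions = {}
    for node in nodes:
        if node in existing_servers:
            # Under both options, an existing monitor is reused, never re-discovered.
            actions[node] = "associate"
        elif option is NodeDiscovery.DISCOVER_AND_MONITOR:
            actions[node] = "discover"
        else:
            actions[node] = "skip"
    return actions


if __name__ == "__main__":
    print(plan_node_actions(["NODE1", "NODE2"], {"NODE1"}, NodeDiscovery.DO_NOT_DISCOVER))
    # prints {'NODE1': 'associate', 'NODE2': 'skip'}
```

Note that an already-monitored node ends up associated under either option; the option only decides what happens to nodes that are not yet monitored.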

Monitored Parameters

Go to the Monitors Category View by clicking the Monitors tab. Click Windows Cluster under the Servers section. The bulk configuration view of the Windows Cluster monitors is displayed in three tabs:

  • Availability tab gives the availability history for the past 24 hours or 30 days.
  • Performance tab gives the health status and events for the past 24 hours or 30 days.
  • List view enables you to perform bulk admin configurations.

Click on the monitor name to view all the Windows Cluster attributes monitored under the following tabs:

Overview

Parameter - Description
CLUSTER DETAILS
Cluster Name/IP Address - The name or IP address of the cluster.
Quorum Owner Node - The name of the node that currently owns the quorum resource.
Quorum Path - The path to the quorum files.
Quorum Type - The current quorum type. The possible values are:
  • InputObject
  • Cluster
  • DiskOnly
  • NodeAndDiskMajority
  • NodeAndFileShareMajority
  • NodeMajority
Number of Nodes - The total number of nodes in the cluster.
Max Nodes - The maximum number of nodes that can participate in the cluster.
Number of Networks - The number of networks used by the server cluster for communication.
Resources Online - The number of resources that are currently online.
Resources Offline - The number of resources that are currently offline.
Resource Groups Online - The number of resource groups that are currently online.
Resource Groups Offline - The number of resource groups that are currently offline.
Disks in Use - The number of disks currently in use in the cluster.
DISK UTILIZATION
Disk Used Percentage - The percentage of the cluster's total disk space that is used.
Disk Free Percentage - The percentage of the cluster's total disk space that is free.
Disk Size - The total disk space, in megabytes.
Disk Used - The total used disk space, in megabytes.
Disk Free - The total free disk space, in megabytes.
NODES
Node Name - The name by which the node is known.
State - The current state of the node. The possible node states are:
  • Up - The node is physically plugged in, turned on, booted, and capable of executing programs.
  • Down - The node is turned off or not operational.
  • Joining - The node is in the process of joining a cluster.
  • Paused - The node is running but not participating in cluster operations.
  • Unknown - The operation was not successful.
RESOURCE CONTROL AND MULTICAST REQUEST REPLY
Messages Outstanding - The length of the internal message queue.
RHS Processes - The number of Resource Hosting Subsystem (RHS) processes running on the node.
RHS Restarts - The number of Resource Hosting Subsystem (RHS) failures that have occurred on the node.
NETWORK RECONNECTIONS
Reconnect Count - The number of times the TCP connection was broken and re-established.
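The node states listed above are assumed to correspond to the numeric State values that the WMI class MSCluster_Node (in the root\mscluster namespace) reports for each node, following Microsoft's CLUSTER_NODE_STATE enumeration: -1 Unknown, 0 Up, 1 Down, 2 Paused, 3 Joining. The Python sketch below only decodes values that have already been retrieved; it does not query WMI, and the function names and sample values are hypothetical.

```python
# Assumed mapping of the numeric MSCluster_Node.State values
# (Microsoft's CLUSTER_NODE_STATE enumeration).
NODE_STATES = {-1: "Unknown", 0: "Up", 1: "Down", 2: "Paused", 3: "Joining"}


def decode_node_states(raw_states):
    """Convert {node name: raw State value} into {node name: state label}."""
    # Any value not in the table is reported as "Unknown".
    return {name: NODE_STATES.get(code, "Unknown") for name, code in raw_states.items()}


def count_nodes_in_state(raw_states, label):
    """Count the nodes whose decoded state equals the given label."""
    return sum(1 for state in decode_node_states(raw_states).values() if state == label)


if __name__ == "__main__":
    sample = {"NODE1": 0, "NODE2": 2, "NODE3": 1}  # made-up values
    print(decode_node_states(sample))
    # prints {'NODE1': 'Up', 'NODE2': 'Paused', 'NODE3': 'Down'}
    print(count_nodes_in_state(sample, "Up"))  # prints 1
```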


Performance

Parameter - Description
MULTICAST REQUEST REPLY
Messages Outstanding - The length of the internal message queue.
RESOURCE CONTROL MANAGER
RHS Processes - The number of Resource Hosting Subsystem (RHS) processes running on the node.
RHS Restarts - The number of Resource Hosting Subsystem (RHS) failures that have occurred on the node.
NETWORK RECONNECTIONS
Node Name - The name by which the node is known.
Reconnect Count - The number of times the TCP connection was broken and re-established.
Normal Message Queue Length - The number of normal messages in the queue waiting to be sent.
Normal Message Queue Length Delta - The incoming rate of normal messages to the queue.
Urgent Message Queue Length - The number of urgent messages in the queue waiting to be sent.
Urgent Message Queue Length Delta - The incoming rate of urgent messages to the queue.
RESOURCE TYPE STATS
Resource Failure - The number of times the Resource Hosting Subsystem (RHS) was terminated because a resource failed.
Resource Failure Access Violation - The number of times RHS was terminated because a resource failed with an access violation.
Resource Failure Deadlock - The number of times RHS was terminated because a resource failed with a deadlock.
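Counters such as RHS Restarts and the Resource Failure counters are, as an assumption for this sketch, cumulative totals rather than per-poll values. Under that assumption, the number of new failures in a polling interval is the difference between two successive samples. The Python sketch below shows that general technique; the function names and the reset-handling rule (a drop in value is treated as a counter reset, for example after the cluster service restarts) are illustrative choices, not Applications Manager behaviour.

```python
def counter_increase(previous, current):
    """Increase of a cumulative counter between two polls.

    If there is no previous sample, no increase can be computed yet.
    If the current value is lower than the previous one, the counter is
    assumed to have been reset, so the current value is taken as the increase.
    """
    if previous is None:
        return 0  # first poll: no baseline yet
    if current < previous:
        return current  # assumed reset
    return current - previous


def new_failures(prev_sample, curr_sample):
    """Per-counter increase for every counter present in the current sample."""
    return {
        name: counter_increase(prev_sample.get(name), value)
        for name, value in curr_sample.items()
    }


if __name__ == "__main__":
    before = {"RHS Restarts": 2, "Resource Failure": 1}  # made-up values
    after = {"RHS Restarts": 3, "Resource Failure": 1}
    print(new_failures(before, after))
    # prints {'RHS Restarts': 1, 'Resource Failure': 0}
```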

Networks

Parameter - Description
Name - The name of the network.
Address - The address of the entire network or subnet.
Role - The role of the network in the cluster (the network's Role property). The possible values are:
  • None - The network is not used by the cluster.
  • Cluster - The network carries internal cluster communication.
  • Client - The network connects client systems to the cluster.
  • Both - The network connects client systems to the cluster and carries internal cluster communication.
State - The current state of the network. The possible values are:
  • Unknown - The operation was not successful.
  • Unavailable - All the network interfaces on the network are unavailable, which means the nodes that own the network interfaces are down.
  • Down - The network is not operational; none of the nodes on the network can communicate.
  • Partitioned - The network is operational, but two or more nodes on the network cannot communicate. Typically, a path-specific problem has occurred.
  • Up - The network is operational; all the nodes in the cluster can communicate.
NETWORK MESSAGES
Bytes Received - The Bytes Received/sec performance counter: the number of new cluster message bytes received on the network per second.
Bytes Sent - The Bytes Sent/sec performance counter: the number of new cluster message bytes sent over the network per second.
Messages Received - The Messages Received/sec performance counter: the number of new cluster messages received on the network per second.
Messages Sent - The Messages Sent/sec performance counter: the number of new cluster messages sent over the network per second.
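A per-second counter such as Bytes Received/sec expresses an average rate: the change in a cumulative count divided by the time elapsed between two samples. The Python sketch below shows only that arithmetic; the `Sample` type, the `per_second_rate` function, and the sample values are hypothetical, and the sketch does not claim to reproduce how Windows or Applications Manager compute these counters.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """A raw cumulative count taken at a given time, in seconds."""
    count: int
    timestamp: float


def per_second_rate(first: Sample, second: Sample) -> float:
    """Average rate per second between two samples of a cumulative counter."""
    elapsed = second.timestamp - first.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    return (second.count - first.count) / elapsed


if __name__ == "__main__":
    # 600,000 bytes received over a 300-second interval (made-up values).
    print(per_second_rate(Sample(1_000_000, 0.0), Sample(1_600_000, 300.0)))
    # prints 2000.0
```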

Storage

Parameter - Description
Path - The path (including the drive letter, if present) of the clustered disk partition.
Volume Label - The volume label of the partition (the VolumeLabel property).
Size - The total size of the partition, in megabytes.
Used - The total used space in the partition, in megabytes.
Free - The total free space available in the partition, in megabytes.
Used Percentage - The percentage of used space in the partition.
Free Percentage - The percentage of free space in the partition.
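The percentage columns follow from the size and used values: used percentage = used / size × 100, and free percentage = 100 minus the used percentage. The Python sketch below applies this per partition and, as an assumption about how the cluster-wide Disk Utilization figures in the Overview tab relate to the partitions, aggregates all partitions by summing their sizes and used space. The `Partition` type, function names, and sample values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Partition:
    path: str
    size_mb: float
    used_mb: float

    @property
    def free_mb(self) -> float:
        return self.size_mb - self.used_mb


def usage_percentages(size_mb: float, used_mb: float) -> tuple[float, float]:
    """Return (used %, free %) rounded to two decimals; sizes in MB."""
    if size_mb <= 0:
        return 0.0, 0.0  # avoid division by zero for empty or unknown sizes
    used_pct = used_mb * 100 / size_mb
    return round(used_pct, 2), round(100 - used_pct, 2)


def cluster_totals(partitions: list[Partition]) -> dict:
    """Assumed aggregation: cluster figures are sums over all partitions."""
    size = sum(p.size_mb for p in partitions)
    used = sum(p.used_mb for p in partitions)
    free = sum(p.free_mb for p in partitions)
    used_pct, free_pct = usage_percentages(size, used)
    return {"size_mb": size, "used_mb": used, "free_mb": free,
            "used_pct": used_pct, "free_pct": free_pct}


if __name__ == "__main__":
    disks = [Partition("C:", 100_000, 25_000), Partition("D:", 50_000, 35_000)]
    print(cluster_totals(disks))
```

With the made-up values above, the cluster totals are 150,000 MB in size and 60,000 MB used, which gives 40.0% used and 60.0% free.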

Resource Groups

Parameter - Description
Name - The name of the resource group.
Current Node - The node on which the resource group is currently running.
Preferred Node - The names of the preferred cluster nodes to which the resource group can fail over or fail back.
State - The current state of the resource group. The possible values are:
  • Unknown
  • Online
  • Offline
  • Failed
  • PartialOnline
  • Pending
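The resource group states above are assumed to correspond to the numeric State values reported by the WMI class MSCluster_ResourceGroup, following Microsoft's CLUSTER_GROUP_STATE enumeration: -1 Unknown, 0 Online, 1 Offline, 2 Failed, 3 PartialOnline, 4 Pending. The Python sketch below decodes those values, counts groups per state (the kind of figures shown as Resource Groups Online/Offline in the Overview tab), and flags groups whose Current Node is not one of their Preferred Nodes. The `ResourceGroup` type, the `summarize` function, and all sample data are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field

# Assumed mapping of MSCluster_ResourceGroup.State (CLUSTER_GROUP_STATE).
GROUP_STATES = {-1: "Unknown", 0: "Online", 1: "Offline", 2: "Failed",
                3: "PartialOnline", 4: "Pending"}


@dataclass
class ResourceGroup:
    name: str
    state: int
    current_node: str
    preferred_nodes: list[str] = field(default_factory=list)


def summarize(groups: list[ResourceGroup]):
    """Return (count of groups per state label, names of groups off their preferred nodes)."""
    counts = Counter(GROUP_STATES.get(g.state, "Unknown") for g in groups)
    # Groups with no preferred nodes configured are never flagged.
    off_preferred = [g.name for g in groups
                     if g.preferred_nodes and g.current_node not in g.preferred_nodes]
    return counts, off_preferred


if __name__ == "__main__":
    counts, off = summarize([
        ResourceGroup("SQL Group", 0, "NODE1", ["NODE1"]),
        ResourceGroup("File Server", 0, "NODE2", ["NODE1"]),
        ResourceGroup("Cluster Group", 1, "NODE1"),
    ])
    print(counts["Online"], counts["Offline"], off)
    # prints 2 1 ['File Server']
```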

Events

Parameter - Description
Rule Name - The name of the Windows Cluster event log rule.
Log File Type - The log file type under which the event log rule was created. Windows Cluster events are generated in the 'System' log file, but Applications Manager lets you create a cluster rule under any log file type. Hence, events generated in other log files on all the servers in the cluster can also appear at the cluster level.
Node Name - The name of the Windows Cluster node on which the event was generated.
Source - The source associated with the event log entry.
Event ID - The event ID associated with the event log entry.
Type - The event type: Event of Any Type, Error, Warning, or Information. Note: For Security events, the types are Success Audit and Failure Audit.
Description - The description of the incoming event.
Generated Time - The time at which the event was generated.
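The event flow described under Enable Event Log Monitoring works in two phases: each node's event log collection stores matching cluster events without alerting, and the cluster's own data collection later turns the stored events from all nodes into alerts. The Python sketch below models that two-phase flow with an in-memory list standing in for the database. The `Event`, `Rule`, and `ClusterEventStore` types, their matching logic, and the alert text format are hypothetical simplifications, not the product's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    node: str
    log_file: str
    source: str
    event_id: int
    event_type: str


@dataclass(frozen=True)
class Rule:
    name: str
    log_file: str
    event_id: Optional[int] = None  # None matches any event ID
    event_type: str = "Any"         # "Any" stands for "Event of Any Type"

    def matches(self, event: Event) -> bool:
        return (event.log_file == self.log_file
                and (self.event_id is None or event.event_id == self.event_id)
                and (self.event_type == "Any" or event.event_type == self.event_type))


class ClusterEventStore:
    """In-memory stand-in for the database shared by node and cluster polls."""

    def __init__(self):
        self.pending = []  # (rule name, event) pairs awaiting the cluster poll

    def collect_from_node(self, events, rules):
        """Phase 1 (node poll): store events that match a rule; raise no alerts."""
        for event in events:
            for rule in rules:
                if rule.matches(event):
                    self.pending.append((rule.name, event))

    def collect_for_cluster(self):
        """Phase 2 (cluster poll): turn stored events from all nodes into alerts."""
        alerts = [f"{rule_name}: event {e.event_id} on {e.node}"
                  for rule_name, e in self.pending]
        self.pending.clear()  # each stored event is alerted on only once
        return alerts
```

Keeping the two phases separate means the alert is raised once per cluster poll for events gathered from every node, rather than separately on each server.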