|
Failover or redundancy support for OpManager is necessary to achieve
uninterrupted service. If the OpManager DB crashes or loses its network
connectivity, OpManager stops monitoring your network. Regular backups
help you recover from DB crashes, but it takes time for OpManager to
resume its service. In the meantime your network is left unmonitored,
and critical devices such as routers and mail servers may go down and
affect your business. Implementing a redundancy system helps you
overcome such failures.
Failover support requires you to configure an OpManager Secondary or
Standby server that keeps monitoring the OpManager Primary server. If
the Primary server fails, the Standby server automatically starts
monitoring the network. The transition is quick and smooth enough that
the end user does not feel the impact of the failure of the Primary
server or the subsequent takeover by the Standby. In parallel, the
Standby server triggers an email alert about the Primary's failure
(sent to the email ID configured in the mail server settings). Once the
Primary server is restored to operation, the Standby server
automatically goes back to standby mode.
Working Mechanism
The Primary server updates its presence with a symbolic count in the
BEFailover table at a specified interval known as the
HEART_BEAT_INTERVAL. With every update the count gets incremented. This
count is known as LASTCOUNT. Similarly, the Standby server also
registers its presence by updating its LASTCOUNT in the BEFailover
table.
When the Primary server fails, it stops updating the LASTCOUNT. The
Standby server checks the Primary's LASTCOUNT at a periodic interval
known as FAIL_OVER_INTERVAL. By default the FAIL_OVER_INTERVAL value is
60 seconds. If required, you can modify it in the Failover.xml file
(<OpManager_Standby_home>\conf). For example, if you have specified
FAIL_OVER_INTERVAL as 50 seconds, the Standby checks the Primary's
LASTCOUNT every 50 seconds. Each time the Standby server looks up the
LASTCOUNT, it compares the previous and present counts. When the
Primary server fails to update the LASTCOUNT, consecutive counts are
the same, so the Standby assumes that the Primary server has failed and
starts monitoring the network.
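
The LASTCOUNT comparison described above can be illustrated with a
small sketch. This is only a simulation under stated assumptions, not
OpManager's actual implementation: the FailoverRow and StandbyWatcher
classes and the heartbeat function are hypothetical names, and an
in-memory object stands in for a row of the BEFailover table.

```python
from dataclasses import dataclass


@dataclass
class FailoverRow:
    """Hypothetical stand-in for one server's row in the BEFailover table."""
    lastcount: int = 0


def heartbeat(row: FailoverRow) -> None:
    """What a live server does every HEART_BEAT_INTERVAL: increment LASTCOUNT."""
    row.lastcount += 1


class StandbyWatcher:
    """Compares the Primary's LASTCOUNT between two consecutive checks."""

    def __init__(self) -> None:
        self.previous = None  # no reading taken yet

    def primary_failed(self, primary: FailoverRow) -> bool:
        """Run once per FAIL_OVER_INTERVAL; True if the count did not change."""
        current = primary.lastcount
        stalled = self.previous is not None and current == self.previous
        self.previous = current
        return stalled


primary = FailoverRow()
watcher = StandbyWatcher()

heartbeat(primary)
first = watcher.primary_failed(primary)   # first reading, nothing to compare
heartbeat(primary)
second = watcher.primary_failed(primary)  # count advanced, Primary is alive
third = watcher.primary_failed(primary)   # no heartbeat since the last check
print(first, second, third)
```

In this sketch only the third check reports a failure, because it is
the only one where two consecutive readings of LASTCOUNT are equal.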

While installing OpManager on the standby server,


| Note: The Date and Time settings of the Primary and the Standby should be the same. |
If you are running OpManager with MSSQL as the backend DB, implement
clustering. Clustering refers to an array of databases that store the
same data and share a single virtual IP. If any DB in the cluster
fails, the other DBs still have the data, thereby providing high
availability. The Primary server sends all its data to the virtual IP,
and the data gets stored in multiple locations. When the Standby server
takes control of the network after the Primary fails, it also sends its
data to the same virtual IP.
To configure MSSQL server clustering, see the following link published
by Microsoft:
http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/failclus.mspx#EDAAC
For MSSQL, the Standby OpManager server can be started once the
installation is completed, provided you have already configured MSSQL
clustering for the Primary server.
Once the Primary server fails, the Standby server takes over the role
of the Primary server and starts monitoring the network. Once the
Primary server is up again, the Standby server goes back to standby
mode and resumes monitoring the Primary server.
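
The takeover and the return to standby mode can be pictured as a
two-state transition. The sketch below is illustrative only: the Role
enum and next_role function are hypothetical names, and the
primary_alive flag is an assumed input, since this guide does not
describe how the Standby detects that the Primary is back.

```python
from enum import Enum


class Role(Enum):
    STANDBY = "standby"  # watching the Primary, not monitoring the network
    ACTIVE = "active"    # monitoring the network in place of the Primary


def next_role(role: Role, primary_alive: bool) -> Role:
    """Return the Standby server's role after one check of the Primary."""
    if role is Role.STANDBY and not primary_alive:
        return Role.ACTIVE   # Primary failed: take over monitoring
    if role is Role.ACTIVE and primary_alive:
        return Role.STANDBY  # Primary restored: go back to standby mode
    return role              # nothing changed


# Assumed sequence of checks: Primary up, up, down, down, up again.
observations = [True, True, False, False, True]
role = Role.STANDBY
history = []
for alive in observations:
    role = next_role(role, alive)
    history.append(role.value)
print(history)
```

The Standby stays in standby mode while the Primary is up, becomes
active on the first failed check, and drops back as soon as the Primary
is seen again.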
|