How to create a solid IT incident management team

Jul 04 · 08 min read

What do you do if one of your critical services goes down, or if a customer asks for their data to be removed, citing privacy concerns? Every organization, including ManageEngine, turns to the same crew for rescue: the incident management (IM) team.

The IM team is the first line of defense when things go wrong, and that's why you can't afford to slip up while forming your IM team. At ManageEngine, we've faced various incidents over the years. Those experiences helped us create, train, and nurture a solid IM team. In this article, we'll show you how you can bring out the best in your IM team to handle any incident.

What does an IM team do?

An IM team's responsibilities are pretty straightforward:

  • Minimize the impact of incidents on users and the organization
  • Facilitate the immediate restoration of services
  • Perform root cause analysis (RCA) to reduce the recurrence of an incident

These are the major activities in the big picture of IM. However, your IM team needs to perform a wide range of small-scale activities to get there.

Let's take a quick look at an example incident where our MDM cloud server was down for a short period, causing a service outage.

We use our site monitoring tool to detect availability incidents. During an outage, it automatically sends a notification to our internal communication platform, which keeps track of availability incidents. The IM team receives this information first. Their first job is to reach out to the relevant IT team; in this case, they reached out to our network operations center (NOC) team. The IM team discussed the incident with a manager and appointed a coordinator, both from the NOC team. Then, they formed a chat group involving the coordinator and other employees whose input was required to resolve the incident.

They figured out that the problem was with our ISP. While the NOC team resolved the issue, the IM team recorded the incident and examined which users were affected. They responded to users and assured them that we had identified the issue and that services would be restored as soon as possible. After the NOC team restored the services, the IM team worked with them to create an RCA.

The RCA had a few action items. The IM team tracked these items to closure according to our SLAs. In some exceptional cases, the IM team would have had to follow a different procedure. These include cases where:

  • The NOC team cannot find the root cause.
  • More teams need to be involved.
  • The action items cannot be executed as planned.

If this had been a security incident, they would have taken a completely different route involving risk assessment, testing, and the use of other tools.

Here's the complete picture of what the IM team does:

[Figure: Incident management team activities]

What a successful IM team needs

The IM team handles availability, security, and privacy incidents. They might even tackle all three simultaneously. They wear multiple hats and adopt a different approach in each case. Therefore, we ensure that the IM team has the following capabilities:

1. Knowledge of web applications

In the previous example, we were first alerted about the service outage in our MDM cloud server by the monitoring tool. Evidence of the cause could be in the application logs or the MI logs (MI is a tool for network monitoring developed in-house by ManageEngine). Finding it is crucial for the RCA. To do so, the IM team must have a basic understanding of these applications. They also need this understanding when they face security incidents due to vulnerabilities. Likewise, they must be able to use internal apps to manage the incident database.

2. An understanding of IT operations

In the previous example, if the service outage had occurred in a parent suite of applications (e.g., Zoho WorkDrive, meaning multiple applications under WorkDrive might also face an outage), it could have involved Zorro, the team that handles our data center operations. The IM team must keep Zorro informed about such an incident. While Zorro fixes the problem, the IM team should convey the details of the incident and the measures taken to restore services to the affected users. The IM team can handle these tasks more effectively if they understand IT operations.

3. Familiarity with information security principles

The IM team must understand information security to spot the difference between security events and security incidents. For example, multiple login failures associated with a customer account in the US data center are a security event. However, if the password is compromised and an IP address from East Asia logs in successfully, it affects the customer and becomes a security incident. So, the IM team must clearly distinguish between the two and know when to log an event as an incident.
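The distinction between an event and an incident can be sketched in code. The following is a minimal illustration, not our actual tooling: the `LoginAttempt` record, the region labels, and the failure threshold of three are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoginAttempt:
    source_region: str  # hypothetical label, e.g. "US" or "East Asia"
    succeeded: bool


def classify_logins(attempts, expected_region, failure_threshold=3):
    """Classify one account's login attempts as "incident", "event", or "normal".

    Assumed rule for this sketch: repeated failures alone are a security
    event; repeated failures plus a successful login from an unexpected
    region are treated as a security incident.
    """
    failures = sum(1 for a in attempts if not a.succeeded)
    unexpected_success = any(
        a.succeeded and a.source_region != expected_region for a in attempts
    )
    if failures >= failure_threshold and unexpected_success:
        return "incident"
    if failures >= failure_threshold:
        return "event"
    return "normal"


failed = [LoginAttempt("US", False)] * 3
print(classify_logins(failed, expected_region="US"))
print(classify_logins(failed + [LoginAttempt("East Asia", True)], expected_region="US"))
```

A real team would tune the threshold and the notion of an "unexpected" location to its own customers; the point is only that the rule for promoting an event to an incident should be explicit.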

Furthermore, the IM team must know how to apply the principles of CIA: confidentiality, integrity, and availability of information.

In this case:

  • Breach of confidentiality: A successful login by the attacker
  • Breach of integrity: The attacker modifies information in the account
  • Breach of availability: The attacker destroys data in the account

The IM team needs to approach all security incidents with these principles in mind. They also need to understand when to apply data privacy principles to security incidents. For example, if the compromised account had digital copies of sensitive identity documents (like passports), that's a privacy breach.
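As a rough sketch, the mapping from what the attacker did to which principle was breached could look like this. The action labels and the `assess_breach` function are hypothetical names chosen for illustration; the privacy rule simply mirrors the passport example above.

```python
from enum import Enum


class Principle(Enum):
    CONFIDENTIALITY = "confidentiality"
    INTEGRITY = "integrity"
    AVAILABILITY = "availability"


# Hypothetical action labels mapped to the principle each one breaches.
ACTION_TO_PRINCIPLE = {
    "unauthorized_login": Principle.CONFIDENTIALITY,
    "data_modified": Principle.INTEGRITY,
    "data_destroyed": Principle.AVAILABILITY,
}


def assess_breach(observed_actions, holds_identity_documents=False):
    """Return (set of breached principles, whether it is also a privacy breach)."""
    breached = {
        ACTION_TO_PRINCIPLE[action]
        for action in observed_actions
        if action in ACTION_TO_PRINCIPLE
    }
    # Assumed rule: a confidentiality breach on an account that stores
    # sensitive identity documents is also a privacy breach.
    privacy_breach = holds_identity_documents and Principle.CONFIDENTIALITY in breached
    return breached, privacy_breach


principles, is_privacy = assess_breach(
    ["unauthorized_login", "data_modified"], holds_identity_documents=True
)
print(sorted(p.value for p in principles), is_privacy)
```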

4. Knowledge of frameworks and their application

At ManageEngine, we generally follow the NIST Cybersecurity Framework. We use this framework across all teams to standardize risk assessments, and we ensure our IM team understands how to apply it to guide teams through those risk assessments.

These are some basic requirements to create a solid IM team. However, you should note that expertise in these areas takes time and effort to develop.

How to help your IM team thrive

Guidelines for different types of incidents

Apart from an IM policy, your IM team needs guidelines to ensure they stay on track. Guidelines help them access resources and make better decisions.

1. Availability incidents:

Guidelines must lay out a transparent procedure that directs the team through to the end of the process. For instance, the procedure section in our guidelines starts with the monitoring notification and ends with the post-RCA meeting. Here's what our guidelines for availability incidents cover:

  • The scope of the IM team in the incident
  • The procedure for responding to availability incidents: Gather insights from monitoring tools, notify concerned team members and work with them to restore affected services, analyze the reason for unavailability, log events in the IM register, create an RCA, and track RCA action items to closure
  • Exceptional cases that could deviate from regular incidents
  • Roles and responsibilities of IT teams and the IM team
  • A priority matrix (how to prioritize incidents based on impact)
  • A list of tools to be used
  • Prerequisites for the IM team and other IT teams
  • Review and authorization
  • How to handle exceptions and escalations
  • Metrics
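To make the priority matrix item concrete, here is a minimal sketch of one as a lookup table. It is an assumption-laden example rather than our actual matrix: the guidelines above mention impact only, so the urgency dimension, the three levels, and the P1 to P4 labels are all hypothetical and follow a common convention.

```python
# Hypothetical matrix: (impact, urgency) -> priority. Replace the levels and
# labels with your own organization's definitions.
PRIORITY_MATRIX = {
    ("high", "high"): "P1",
    ("high", "medium"): "P2",
    ("high", "low"): "P3",
    ("medium", "high"): "P2",
    ("medium", "medium"): "P3",
    ("medium", "low"): "P4",
    ("low", "high"): "P3",
    ("low", "medium"): "P4",
    ("low", "low"): "P4",
}


def prioritize(impact, urgency):
    """Look up the priority for an incident; reject unknown levels."""
    key = (impact.lower(), urgency.lower())
    if key not in PRIORITY_MATRIX:
        raise ValueError(f"unknown impact/urgency combination: {key}")
    return PRIORITY_MATRIX[key]


print(prioritize("High", "high"))
print(prioritize("low", "medium"))
```

Keeping the matrix as plain data makes it easy to review and authorize alongside the rest of the guidelines.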

2. Security incidents:

For security incidents, we usually deal with confidential information and a different level of risk. Here's what our guidelines consist of:

  • The scope of the IM team
  • The difference between a security event and an incident
  • Examples and use cases for types of security incidents
  • A list of tools to be used
  • Different phases in security incidents: Draft phase, analysis phase, containment phase, eradication phase, recovery phase, notification phase, review phase, and closure
  • Roles and responsibilities during each of the above phases
  • Incident reports: What and how to report, and who to report to
  • External and internal communication: Who, when, and how to notify
  • RCA templates
  • How to collect evidence
  • How to handle escalations and exceptions
  • Review and approval
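The phases listed above can be modeled as a simple ordered state machine. This sketch assumes that phases are always completed strictly in the listed order, which the guidelines do not state; the `SecurityIncident` class and its methods are hypothetical.

```python
from enum import IntEnum


class Phase(IntEnum):
    DRAFT = 1
    ANALYSIS = 2
    CONTAINMENT = 3
    ERADICATION = 4
    RECOVERY = 5
    NOTIFICATION = 6
    REVIEW = 7
    CLOSURE = 8


class SecurityIncident:
    """Tracks which phase an incident is in and the phases it has passed through."""

    def __init__(self, title):
        self.title = title
        self.phase = Phase.DRAFT
        self.history = [Phase.DRAFT]

    def advance(self):
        """Move to the next phase; a closed incident cannot advance further."""
        if self.phase is Phase.CLOSURE:
            raise ValueError("incident is already closed")
        self.phase = Phase(self.phase + 1)
        self.history.append(self.phase)
        return self.phase


incident = SecurityIncident("Compromised customer account")
incident.advance()
print(incident.phase.name)
```

Recording the history alongside the current phase gives the review phase a ready-made timeline to work from.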

3. Privacy incidents:

These incidents require a thorough understanding of privacy laws across the globe and how they apply to us. For example, if one of our teams sends an email to the wrong set of email addresses, that's a privacy incident only in some parts of the world. After consolidating the requirements of these laws, here's what our guidelines contain:

  • The procedure for responding to privacy incidents: Containment, preliminary assessment, risk analysis, notification, resolution, and closure
  • Each phase elaborated along with relevant privacy laws for reference
  • Examples and use cases related to privacy incidents
  • The difference between a data controller and a data processor and how it applies to us
  • Breach notification templates for notifying authorities

Training, mentorship, and support

Even with the most comprehensive guidelines, your IM team needs exposure to various incidents over time to develop expertise. While this will happen with experience, you can accelerate their learning through training sessions on:

  • Information security awareness
  • Data privacy awareness
  • IT operations management
  • Leadership and communication

At ManageEngine, we also ensure that the IM team interacts with experienced IT leaders who share their insights. These discussions happen after crucial incidents and during quarterly meetings. We also bring in external firms to train our IM team on leadership and communication.

Most importantly, the IM team needs management's support. We go further and ensure our IM team has the right tools to manage incidents. We also have a repository of guidelines and chat groups to facilitate their work.

Forming a solid IM team can take years of effort, but it's an investment no company would ever regret making. If you'd like to learn more about how our IM team leads our IM process from the front lines, check out our IM handbook.

About the author

Shivaram P R, Content writer
