The definitive guide to ITIL incident management
Last updated on: May 22, 2018
IT incident management is one of the help desk's fundamental processes. In this guide, you will learn about the basics of incident management, its components, the roles and responsibilities involved, and how incident management works with other components of the service desk.
An IT incident is any disruption to an organization's IT services that affects anything from a single user to the entire business. In short, an incident is anything that interrupts business continuity.
Incident management is the process of managing IT service disruptions and restoring services within agreed service level agreements (SLAs).
The scope of incident management starts with an end user reporting an issue and ends with a service desk team member resolving that issue.
With proper incident management in place, collecting information about incidents is streamlined and less chaotic, with no need for emails to fly back and forth. Service desk teams can publish forms in the user self-service portal to ensure that all relevant information is collected right at the time of ticket creation.
The next stage in incident management is incident categorization and prioritization. This not only helps sort incoming tickets but also ensures that the tickets are routed to the technicians most qualified to work on the issue. Incident categorization also helps the service desk system apply the most appropriate SLAs to incidents and communicate those priorities to end users. Once an incident is categorized and prioritized, technicians can diagnose the incident and provide the end user with a resolution.
When enabled with the relevant automations, the incident management process allows service desk teams to keep an eye on SLA compliance and sends notifications to technicians when they are approaching an SLA violation; technicians can also configure automated escalations for SLA violations, as applicable to the incident. After diagnosing the issue, the technician offers the end user a resolution, which the end user can validate. This multistep process ensures that any IT issue affecting business continuity is resolved as soon as possible.
Incidents in an IT environment can be categorized in several different ways. Some factors that influence incident categorization include the urgency of the incident and the severity of its impact on users or the business in general. Classifying and categorizing IT incidents helps identify and route incidents to the right technician, saving time and effort. For example, incidents can be classified as major or minor incidents based on their impact on the business and their urgency. Typically, major incidents are the ones that affect business-critical services, thus affecting the entire organization, and need immediate resolutions. Minor incidents usually impact a single user or a department, and might have a documented resolution in place already.
Incident management covers every aspect of an incident across its life cycle. It speeds up the resolution process and makes ticket management transparent. Without incident management, handling tickets can be a hassle. Some of the key problems that can arise include:
Incident management practices are widely used by IT service desk teams. Service desks are usually the single point of contact for end users to report issues to IT management teams.
The incident management process can be summarized as follows:
These processes may be simple or complex based on the type of incident; they also may include several workflows and tasks in addition to the basic process described above.
An incident can be logged through phone calls, emails, SMS, web forms published on the self-service portal, or live chat messages.
Incidents can be categorized and sub-categorized based on the area of IT or the business in which they cause a disruption, such as network or hardware.
The priority of an incident can be determined as a function of its impact and urgency using a priority matrix. The impact of an incident denotes the degree of damage the issue will cause to the user or business. The urgency of an incident indicates the time within which the incident should be resolved. Based on the priority, incidents can be categorized as:
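As a rough illustration of the priority matrix described above, the sketch below maps impact and urgency to a priority label. The three-point scale, the label names ("Critical", "High", "Medium", "Low"), and the cell assignments are assumptions made for the example; they are not taken from this guide or prescribed by ITIL, and each organization would define its own.

```python
from enum import IntEnum


class Level(IntEnum):
    """Shared three-point scale for impact and urgency (an assumed scale)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical priority matrix: (impact, urgency) -> priority label.
# The labels and cell assignments are illustrative only.
PRIORITY_MATRIX = {
    (Level.HIGH, Level.HIGH): "Critical",
    (Level.HIGH, Level.MEDIUM): "High",
    (Level.HIGH, Level.LOW): "Medium",
    (Level.MEDIUM, Level.HIGH): "High",
    (Level.MEDIUM, Level.MEDIUM): "Medium",
    (Level.MEDIUM, Level.LOW): "Low",
    (Level.LOW, Level.HIGH): "Medium",
    (Level.LOW, Level.MEDIUM): "Low",
    (Level.LOW, Level.LOW): "Low",
}


def priority_for(impact: Level, urgency: Level) -> str:
    """Look up the priority for a given impact and urgency."""
    return PRIORITY_MATRIX[(impact, urgency)]


print(priority_for(Level.HIGH, Level.MEDIUM))  # High
```

Keeping the matrix as a plain lookup table, rather than a formula, makes it easy for a service desk to review and adjust individual cells.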
Once the incident is categorized and prioritized, it gets automatically routed to a technician with the relevant expertise.
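One possible way to picture automatic routing is shown below. This is a minimal sketch, assuming each technician has a set of skill labels that match incident categories and that ties are broken by current workload; the `Technician` class, the `route` function, and the sample names are all hypothetical and do not describe any particular service desk product.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Technician:
    name: str
    skills: Set[str]       # incident categories this technician can handle
    open_tickets: int = 0  # current workload


def route(category: str, technicians: List[Technician]) -> Optional[Technician]:
    """Assign the incident to the least-loaded technician qualified for the category.

    Returns None when nobody has the required skill, so the ticket
    can be left for manual triage.
    """
    qualified = [t for t in technicians if category in t.skills]
    if not qualified:
        return None
    chosen = min(qualified, key=lambda t: t.open_tickets)
    chosen.open_tickets += 1  # the new incident now counts toward the workload
    return chosen


team = [
    Technician("Asha", {"network", "hardware"}, open_tickets=3),
    Technician("Ben", {"network"}, open_tickets=1),
    Technician("Chen", {"software"}, open_tickets=0),
]
assigned = route("network", team)
print(assigned.name)  # Ben
```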
Based on the complexity of the incident, it can be broken down into sub-activities or tasks. Tasks are typically created when an incident resolution requires the contribution of multiple technicians from various departments.
While the incident is being processed, the technician needs to ensure the SLA isn't breached. An SLA defines the acceptable time within which an incident needs a response (response SLA) or a resolution (resolution SLA). SLAs can be assigned to incidents based on parameters such as category, requester, impact, and urgency. In cases where an SLA is about to be breached or has already been breached, the incident can be escalated functionally or hierarchically to ensure that it is resolved as early as possible.
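The SLA check described above could be sketched as follows. The response and resolution targets per priority, the 80 percent warning threshold, and the names `SlaPolicy` and `resolution_status` are assumptions chosen for illustration. The sketch also measures plain elapsed time and ignores business hours and holidays, which a real service desk would usually account for.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SlaPolicy:
    response: timedelta    # time allowed before the first response
    resolution: timedelta  # time allowed before resolution


# Hypothetical SLA targets keyed by priority label.
SLA_BY_PRIORITY = {
    "Critical": SlaPolicy(timedelta(minutes=15), timedelta(hours=4)),
    "High": SlaPolicy(timedelta(hours=1), timedelta(hours=8)),
    "Medium": SlaPolicy(timedelta(hours=4), timedelta(hours=24)),
    "Low": SlaPolicy(timedelta(hours=8), timedelta(hours=72)),
}


def resolution_status(priority: str, created_at: datetime, now: datetime,
                      warn_fraction: float = 0.8) -> str:
    """Classify an open incident as 'on_track', 'at_risk', or 'breached'.

    'at_risk' means the elapsed time has passed warn_fraction of the
    resolution target, which is a natural point to notify or escalate.
    """
    policy = SLA_BY_PRIORITY[priority]
    elapsed = now - created_at
    if elapsed >= policy.resolution:
        return "breached"
    if elapsed >= policy.resolution * warn_fraction:
        return "at_risk"
    return "on_track"


created = datetime(2018, 5, 22, 9, 0)
print(resolution_status("High", created, created + timedelta(hours=7)))  # at_risk
```

An "at_risk" result is where an automated notification would typically fire, and "breached" is where a functional or hierarchical escalation would be triggered.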
An incident is considered resolved when the technician has come up with a temporary workaround or a permanent solution for the issue.
An incident can be closed once the issue is resolved and the user acknowledges the resolution and is satisfied with it.
After an incident has been closed, it's good practice to document all the takeaways from that incident. This helps better prepare teams for future incidents and creates a more efficient incident management process. The post-incident review process can be broken down into various aspects, as shown below, and is particularly useful for major incidents.
Apart from the above factors, some end-user facing factors should also be evaluated. For this purpose, a post-closure survey is conducted to collect feedback from the end users affected by the incident. This survey should be used to gain insight into some key areas, like:
Although each organization can have its own custom roles and responsibilities, below are some of the most common IT incident management roles.
This is the stakeholder who usually experiences a disruption in service and raises an incident ticket to initiate the process of incident management.
This is the first point of contact for the requesters when they want to raise a request or incident ticket. The Tier 1 service desk usually consists of technicians who have a working knowledge of the most common issues that might occur in an IT environment, including password resets and Wi-Fi problems.
This service desk is made up of technicians with advanced knowledge of incident management. They usually receive more complex requests from end users; they also receive requests in the form of escalations from Tier 1.
This level is usually composed of specialist technicians who have advanced knowledge of particular domains in the IT infrastructure. For example, technicians for hardware maintenance and server support specialize in very specific fields.
This stakeholder plays a key role in the process of incident management by monitoring how effective the process is, recommending improvements, and ensuring the process is followed, among other responsibilities.
This stakeholder owns the process followed for managing incidents. They also analyze, modify, and improve the process to ensure it best serves the interest of the organization.
Each role has unique responsibilities, as shown below.
Metrics that drive important decisions are termed key performance indicators (KPIs). Below are a few KPIs for effective IT incident management.
The average time taken to resolve an incident.
The average time taken to respond to each incident.
The percentage of incidents resolved within an SLA.
The percentage of incidents resolved on the first call.
The number of identical incidents logged within a specific time frame.
The percentage of resolved incidents that were reopened.
The number of incidents that are pending in the queue without a resolution.
The number of major incidents compared to the total number of incidents.
The average expense pertaining to each ticket.
The number of end users or customers who were satisfied with the IT services delivered to them.
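Several of the KPIs above reduce to simple averages and percentages over closed incidents. The sketch below shows one way to compute four of them: average resolution time, SLA compliance, first-call resolution, and reopen rate. The `ClosedIncident` record, its field names, and the sample data are hypothetical and exist only to make the arithmetic concrete.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class ClosedIncident:
    created_at: datetime
    resolved_at: datetime
    resolved_within_sla: bool
    resolved_on_first_call: bool
    reopened: bool


def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage, or 0.0 when there is no data."""
    return 100.0 * part / whole if whole else 0.0


def kpis(incidents: List[ClosedIncident]) -> Dict[str, float]:
    """Compute a few incident management KPIs over closed incidents."""
    total = len(incidents)
    total_time = sum((i.resolved_at - i.created_at for i in incidents), timedelta())
    avg_hours = total_time.total_seconds() / 3600 / total if total else 0.0
    return {
        "avg_resolution_hours": avg_hours,
        "sla_compliance_pct": percentage(sum(i.resolved_within_sla for i in incidents), total),
        "first_call_resolution_pct": percentage(sum(i.resolved_on_first_call for i in incidents), total),
        "reopen_pct": percentage(sum(i.reopened for i in incidents), total),
    }


start = datetime(2018, 5, 22, 9, 0)
sample = [
    ClosedIncident(start, start + timedelta(hours=1), True, True, False),
    ClosedIncident(start, start + timedelta(hours=2), True, False, False),
    ClosedIncident(start, start + timedelta(hours=3), True, False, True),
    ClosedIncident(start, start + timedelta(hours=6), False, False, False),
]
print(kpis(sample))
```

For the four sample incidents, the average resolution time is 3 hours, SLA compliance is 75 percent, and both the first-call resolution rate and the reopen rate are 25 percent.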
With a proper ITIL incident management process in place, you can:
When choosing a ticketing system or IT help desk software, there are a few features that can make or break your IT incident management. Here are some features to consider when choosing incident management software:
Incident management is a collection of policies, processes, workflows, and documentation that helps IT teams manage an incident from start to finish. The process of incident management involves identifying an incident, logging it with all the relevant information, diagnosing the issue, and restoring the service in a timely manner. The process of incident management is akin to firefighting, where the main goal is to minimize damage to the business.
On the other hand, IT problem management is the process of identifying the root cause leading to one or more incidents and then initiating actions to rectify the issue. Problem management aims to minimize the impact of the problem on the business by taking a more organized approach in the form of root cause analysis, which is used to pinpoint the underlying issue. This issue is then fixed to prevent similar incidents in the future. Ultimately, identifying underlying problems helps with incident management and proactively ensures that normal operations continue.
ITIL change management is the process of modifying the IT infrastructure of an organization in a standardized and systematic manner. It is a well-planned process composed of various stages and statuses that IT changes can go through.
Typically, IT changes are initiated after the IT problem management processes to fix the identified IT problem, to replace a faulty asset that leads to repeat incidents, or as a part of the resolution to a major incident. The objective of IT incident management is to minimize IT disruptions and restore services immediately. In some cases, change implementations can lead to incidents, most of which are minor incidents caused by temporary service disruptions or service unavailability. The impact of such incidents can be minimized by proactively informing end users about the change implementation as well as anticipated incidents or service unavailability. In case of a major incident caused by a change, change management teams can immediately roll back the change to restore normalcy.
Integrating IT asset management and IT incident management processes makes incident diagnosis and resolution much easier for Tier 2 and Tier 3 technicians. For example, when a user reports an issue about limited internet connectivity, the issue could be either with the laptop or with the router the user is connected to. Having all the information about the user's laptop—including the router they're connected to along with its details and relationships—helps the technician pinpoint the cause of the incident and provide the right resolution. From an asset-management perspective, linking IT incidents with assets helps IT service desks identify and retire faulty assets that cause repeat incidents in the organization.
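To illustrate how linking incidents to assets can surface faulty equipment, the sketch below counts incidents per asset and flags those at or above a threshold. The asset identifiers, the `repeat_offenders` name, and the threshold of three incidents are assumptions for the example; a real service desk would draw this data from its asset records.

```python
from collections import Counter
from typing import Dict, Iterable


def repeat_offenders(incident_asset_ids: Iterable[str], threshold: int = 3) -> Dict[str, int]:
    """Return assets linked to at least `threshold` incidents, with their counts."""
    counts = Counter(incident_asset_ids)
    return {asset: n for asset, n in counts.items() if n >= threshold}


# One entry per incident: the ID of the asset the incident was linked to.
linked_assets = ["router-12", "laptop-7", "router-12", "router-12", "laptop-7"]
print(repeat_offenders(linked_assets))  # {'router-12': 3}
```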
An unplanned interruption to an IT service or reduction in the quality of an IT service. Failure of a configuration item, even if it has not yet affected a service, is also an incident (e.g. failure of one disk from a mirror set).
The process of discovering an incident.
Creating and maintaining a record of an incident in the form of a ticket.
Recording an incident with due diligence so that it's placed under the appropriate category.
Closing an open incident ticket once the incident has been resolved.
A set of rules defining the hierarchy for escalating incidents, including triggers that lead to escalations. Triggers are usually based on incident severity and resolution time.
Managing the life cycle of all incidents to restore normal service operation as quickly as possible and minimize business impact.
A series of reports produced by the incident manager for various target groups (e.g. teams responsible for IT management, service level management, other service management processes, or incident management itself).
The person responsible for the effective implementation of the incident management process and carrying out reporting. Also represents the first stage of escalation if an incident cannot be resolved within the agreed service level.
Contains the predefined steps that should be taken to deal with a particular type of incident.
Tracking the processing status of outstanding incidents so that countermeasures may be introduced as soon as possible if service levels are likely to be breached.
Assigning priorities to incidents and defining what constitutes a major incident.
A collection of data with all details of an incident, documenting the history of the incident from registration to closure.
A report that includes information about incidents, how they were handled, and other data that can help measure the performance of the incident management process.
The workaround or correction that fixes the incident and restores service to its best quality.
How far along an incident is in the incident management process. Common statuses include:
ManageEngine's flagship product, ServiceDesk Plus, is an ITIL-ready service desk software used by ITSM professionals worldwide. With industry-certified best practice ITSM functionality, easy-to-use capability, and native mobile apps, ServiceDesk Plus leverages the latest technology to help IT support teams deliver world-class service to end users with reduced costs and complexity. Available in both cloud and on-premises versions, the software comes in three editions and 29 languages. Over 100,000 organizations across 185 countries trust ServiceDesk Plus to optimize their IT service desk performance and be future-ready in their IT service management operations. For more information on ServiceDesk Plus, please visit manageengine.com/products/service-desk.