People often say "what gets measured gets improved," but they rarely say what, exactly, should be measured. With the recent developments in the reporting capabilities of IT help desk software, hundreds of KPIs and help desk metrics can be measured and monitored.
But that doesn't mean you should measure them all. Only the KPIs and metrics that are critical to your IT help desk need to be measured to improve service delivery.
This paper describes the 8 KPIs that are critical to every IT help desk. These KPIs help meet basic IT help desk objectives such as business continuity, organizational productivity, and delivery of services on time and within budget. The KPIs are as follows:
The number of hours the business is down because IT services are unavailable.
Keep lost business hours to the bare minimum.
Most IT teams track service availability to see the overall performance of their IT service desks. But the pain of lost business isn't always reflected in service availability levels, even when those levels are high. For instance, if service availability is at 99.9%, the company still loses more than eight hours per year. Tracking lost business hours clearly highlights the loss and its impact on business.
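The arithmetic behind the 99.9% figure can be checked with a short sketch. It assumes the service is expected to be available around the clock for a 365-day year; the function name `downtime_hours_per_year` is illustrative and not part of any help desk product.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, assuming 24x7 service and ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the hours of unavailability implied by a yearly availability level."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


# 99.9% availability still leaves roughly 8.76 hours of downtime per year.
print(round(downtime_hours_per_year(99.9), 2))
```

If the business only operates during fixed hours, the yearly total would be smaller, so the result should be read as an upper bound under the stated assumption.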
In September 2010, Virgin Blue faced what could be considered every airline's worst nightmare. About 50,000 customers and 100 flights were grounded. Four hundred more flights were delayed or rescheduled over the following days because the solid-state disk server infrastructure hosting Virgin Blue's applications failed. This affected Virgin Blue's online check-in and booking system.
Despite SLAs to restore services immediately, it took 11 hours for the service to be restored, and 10 more hours to restore full operations. This was because of an attempted repair of a faulty device, which delayed the switchover to a contingency hardware platform. By then, the damage was already done. Although these 11 hours didn't cost much in terms of Virgin Blue's IT service availability for the year, they cost Virgin Blue approximately $10 million in terms of lost business.
Industry standards - Lost business hours
|Number of downtime events in the last 12 months||0.56||2.26||3.92|
|Average amount of downtime per event in the last 12 months||0.16 hours||1.49 hours||17.82 hours|
|Longest downtime event||0.21 hours||4.78 hours||43.71 hours|
|Critical application availability||99.90%||99.62%||99.58%|
|Length of time to recover from last downtime event||1.13 hours||5.18 hours||27.11 hours|
That said, a lot of factors could contribute negatively towards lost business hours. In 2010, Gartner projected that, "Through 2015, 80% of outages impacting mission-critical services will be caused by people and process issues, and more than 50% of those outages will be caused by change/configuration/release integration, and hand-off issues."
The ratio of the number of successful changes to the total number of changes that were executed in a given time frame.
Achieve a higher percentage of successful change implementations.
Opinion remains divided on what a failed change implies. It basically refers to any change that did not meet its objectives or go as planned.
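As a minimal sketch, the change success rate can be computed from two counts and expressed as a percentage. What counts as a "successful" change is left to the team's own definition, as noted above; the function name and the validation rules are assumptions made for illustration.

```python
def change_success_rate(successful_changes: int, total_changes: int) -> float:
    """Percentage of executed changes that succeeded in a given time frame."""
    if total_changes <= 0:
        raise ValueError("at least one change must have been executed")
    if not 0 <= successful_changes <= total_changes:
        raise ValueError("successful changes must be between 0 and the total")
    return 100 * successful_changes / total_changes


# Hypothetical month: 45 of 50 executed changes met their objectives.
print(change_success_rate(45, 50))
```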
On October 27, 2011, trading had to be halted at the Australian Stock Exchange (ASX) for four hours due to a failed change implementation. An upgrade on the ASX's internal network (to improve the latency of the trading platform) led to unprecedented connectivity issues between the supporting components and the disseminating gateways of the trading system. ASX had to initiate trading services from one of their disaster recovery sites. Finally, to restore normalcy, the change had to be backed out that night.
A downward trend or a stagnant change success rate usually results from change implementations that fail due to:
Another help desk metric that should be tracked to have an effective change management process is the number of unplanned changes. An unplanned change can be an emergency change or an urgent change.
Although there is no industry standard or defined number for the number of unplanned changes permissible in an IT infrastructure, this reporting metric is important, especially during an increasing trend or a spike in the number of unplanned changes.
An increasing trend in the number of unplanned changes indicates inadequate planning of changes and calls the efficiency of the change management process into question. Therefore, the change management process has to be improved to ensure proper planning and execution of changes.
Increasing trend in unplanned changes
A sudden spike in the number of unplanned changes can be due to unanticipated major incidents, which warrant emergency changes to restore service. Such a situation is probably due to an unstable infrastructure, which could affect service availability and, ultimately, the business.
Discrete spikes in unplanned changes
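One rough way to tell the two patterns apart in a monthly report is sketched below. It is only an illustration: the least-squares slope as a trend indicator, the median baseline, and the `factor` threshold of 2.0 are all assumptions, not values taken from any standard.

```python
from statistics import mean, median


def trend_slope(monthly_counts: list[int]) -> float:
    """Least-squares slope of the counts over time; positive means an upward trend."""
    n = len(monthly_counts)
    if n < 2:
        raise ValueError("need at least two months of data")
    x_mean = (n - 1) / 2
    y_mean = mean(monthly_counts)
    numerator = sum((i - x_mean) * (c - y_mean) for i, c in enumerate(monthly_counts))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def spike_months(monthly_counts: list[int], factor: float = 2.0) -> list[int]:
    """Indexes of months whose count exceeds `factor` times the median month."""
    baseline = median(monthly_counts)
    return [i for i, c in enumerate(monthly_counts) if c > factor * baseline]


# Hypothetical counts of unplanned changes per month.
print(trend_slope([2, 3, 4, 5, 6]))        # steady month-on-month increase
print(spike_months([3, 2, 3, 12, 3, 2]))   # one isolated spike
```

A positive slope sustained over several months points to the planning problem described above, while an isolated index returned by `spike_months` is more likely tied to a major incident.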
A highly stable infrastructure is characterized by maximum availability, very few outages, and low service disruptions.
Maintain a highly stable infrastructure.
To effectively gauge and monitor infrastructural stability, IT help desks need to monitor the following:
Delivering maximum availability and better service quality will be impossible in an infrastructure where routers have to be restarted multiple times a day, servers are often down, or workstations have to be rebooted every now and then. Therefore, such problematic assets must be identified and replaced to ensure business continuity.
A problematic asset might repeatedly be the cause of service disruptions or outages, and for reporting purposes, these could be assets that have more than a couple of incidents associated with them. The percentage reduction in the number of problematic assets can be calculated using the following formula:
Percentage reduction in problematic assets = (Number of problematic assets replaced at the end of the time frame ÷ Number of problematic assets identified at the beginning of the time frame) × 100
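The formula above can be sketched in code together with a simple way to flag problematic assets. The cutoff of "more than two incidents" is one reading of "more than a couple" and is an assumption, as are the function names and the sample asset IDs.

```python
from collections import Counter


def problematic_assets(incident_asset_ids: list[str], threshold: int = 2) -> set[str]:
    """Assets with more than `threshold` incidents logged against them."""
    counts = Counter(incident_asset_ids)
    return {asset for asset, n in counts.items() if n > threshold}


def percent_reduction(replaced_at_end: int, identified_at_start: int) -> float:
    """Problematic assets replaced, as a percentage of those identified."""
    if identified_at_start <= 0:
        raise ValueError("no problematic assets were identified")
    return 100 * replaced_at_end / identified_at_start


# Hypothetical incident log: each entry is the asset an incident was linked to.
log = ["router-1", "router-1", "router-1", "server-7", "server-7", "ws-42"]
print(problematic_assets(log))
print(percent_reduction(3, 12))
```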
Another major indication of stability is the recurrence of major incidents on the IT infrastructure, which can lead to service disruptions or service level deterioration. A major incident, by definition, is a high-impact, high-urgency incident that affects a large number of users, depriving the business of one or two key services.
The goal is to reduce the number of major incidents, which can be achieved with efficient root cause analysis (RCA) and a reduction of the problem backlog. Identifying root causes and fixing problems can reduce the recurrence of major incidents and, subsequently, ticket volumes to the IT help desk.
Teams can also measure these action items with details on time taken to initiate root cause analysis after problem identification and time taken to complete root cause analysis.
One of the world's leading financial institutions was able to improve its stability by reducing its major incidents. This reduction in the number of incidents was achieved by improving its root cause analysis process.
Reducing major incidents helps improve IT stability
The major reasons for a heavy problem backlog could be:
Without identifying and rectifying the root cause, the chances of major incidents recurring are fairly high. Thankfully, though, the problem backlog can be reduced by:
Working on these two simple ITIL® service desk metrics (percentage reduction in the number of major incidents and percentage reduction in the number of problematic assets) can help you maintain a highly stable IT infrastructure.
Total number of tickets handled by the IT help desk and their patterns within a given time frame.
Optimize the number of incidents and service requests, and prepare the IT team to handle the ticket load.
IT help desks should watch out for a few trends when it comes to ticket volumes, such as:
Discrete spikes in ticket volumes
A sudden upward spike in the ticket volume can be due to the following reasons:
The figure below (figure 7) shows the number of tickets handled by the IT help desk at a university in the United States. The graph clearly shows ticket spikes in September 2012 and September 2013, caused by the increased number of students joining the university during the fall. The IT team makes sure that this extra load is distributed evenly across the team, and each member works overtime to handle these ticket spikes.
Ticket volume at an American university
Continuous upward trend in ticket volumes
An upward trend could be due to any of the following reasons:
Increase in the organization size
As the business grows, it is obvious that the IT service desk has to support more end users, which typically leads to increased ticket volumes. This gradual increase in the ticket volume can be handled by an effective staffing plan in accordance with the growth of the business. Furthermore, end users can be segregated into departments and user groups to handle tickets effectively.
Initiatives to support more business functions
As IT starts supporting more business functions, the ticket volume (both incidents and service requests) rises. This can be handled by understanding the requirements and expectations of the end users, and equipping the IT help desk team to handle the increase in tickets.
Decrease of infrastructure stability
With an increasing number of problematic and outdated assets in the IT network, the number of tickets is bound to increase as well. This can be addressed by associating incidents and problems with their assets, which helps the IT team decide whether to retire an asset, upgrade it, and so on.
Percentage of incidents resolved by the first level of support (first call or contact with the IT help desk).
Achieve a higher first call resolution rate (FCRR).
A high first call resolution rate is usually associated with higher customer satisfaction, as confirmed by a study conducted by Customer Relationship Metrics. Furthermore, a study conducted by the Service Quality Measurement Group revealed that every one percent improvement in FCR brings a one percent improvement in customer or end user satisfaction.
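A minimal sketch of the calculation follows. It assumes each resolved incident records the support level at which it was resolved; the `ResolvedIncident` class and its `resolved_at_level` field are hypothetical and would map to whatever your help desk tool exports.

```python
from dataclasses import dataclass


@dataclass
class ResolvedIncident:
    ticket_id: str
    resolved_at_level: int  # hypothetical field: 1 means first level of support


def first_call_resolution_rate(incidents: list[ResolvedIncident]) -> float:
    """Percentage of resolved incidents that were closed by first-level support."""
    if not incidents:
        raise ValueError("no resolved incidents in the time frame")
    first_level = sum(1 for incident in incidents if incident.resolved_at_level == 1)
    return 100 * first_level / len(incidents)


# Hypothetical sample: three of four incidents resolved at the first level.
sample = [
    ResolvedIncident("T1", 1),
    ResolvedIncident("T2", 1),
    ResolvedIncident("T3", 2),
    ResolvedIncident("T4", 1),
]
print(first_call_resolution_rate(sample))
```

Because a ticket that is closed on the first call and later reopened inflates this number, teams may want to exclude reopened tickets from the first-level count.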
First call resolution is also related to cost per ticket. The following graph represents the cost per ticket for every level.
Cost per ticket at various levels of support
Sometimes IT help desk technicians rush to close tickets during the first call, even without accurate resolutions. In such cases, first call resolution rates rise while end user satisfaction rates drop drastically, as depicted in the following graph.
First call resolution rate Vs. End user satisfaction
FCRR Excellence Tip
Here is a simple three-phase technique to get your IT help desk team resolving tickets in the first call.
Phase 1: Learn the environment
Phase 2: Fine tune
Generate reports to ascertain that the efforts of phase 1 panned out, and identify areas of improvement. Below are some sample reports to help you get started.
Phase 3: Optimize
Establish a well-defined process for continual improvement of first call resolution rate.
This technique not only helps you improve the FCRR levels, but also helps ensure that tickets are properly resolved, not just closed.
Another possible trend is a constantly degrading FCRR, as shown in the following graph.
Constantly degrading FCRR
There are a few reasons this could occur, but the primary reasons are as follows:
According to MetricNet's benchmarking levels, the average net FCRR for service desks globally is 74 percent, with a range of 41 to 74 percent. The most common factors among the service desks at the higher end of the spectrum were the presence of highly trained agents, the availability of knowledge management tools, and the presence of tools such as remote desktop management.
Percentage of incidents resolved within the agreed SLA time.
Maintain maximum SLA compliance rate.
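As an illustrative sketch, the SLA compliance rate can be derived from each incident's resolution time and its agreed SLA time. Representing an incident as a pair of hours is an assumption made to keep the example short.

```python
def sla_compliance_rate(incidents: list[tuple[float, float]]) -> float:
    """Percentage of incidents resolved within the agreed SLA time.

    Each incident is a (resolution_hours, sla_hours) pair.
    """
    if not incidents:
        raise ValueError("no incidents in the time frame")
    within_sla = sum(1 for taken, agreed in incidents if taken <= agreed)
    return 100 * within_sla / len(incidents)


# Hypothetical incidents: (hours taken to resolve, hours allowed by the SLA).
print(sla_compliance_rate([(2, 4), (5, 4), (1, 8), (8, 8)]))
```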
Tracking SLA compliance levels helps IT help desks:
Sometimes IT help desk technicians close tickets without proper resolutions, just to avoid SLA violations. When this happens, the SLA compliance rates remain high, but the end-user satisfaction levels are bound to decrease, as shown in the following graph.
SLA compliance rate Vs. End user satisfaction
SLA compliance levels may drop for other reasons, though, so it is important to keep the following possibilities in mind:
During such scenarios, IT service desk teams must understand the requirements of the business, and redefine their SLAs as appropriate.
SLAs and SLA compliance are critical to ensuring business continuity. This case from a cement manufacturing company, however, stresses that SLAs must also be set carefully. The IT help desk was unavailable for immediate response to an issue on a truck dispatch, but did resolve it within the SLA. Unfortunately, the cement manufactured had to be dispatched to the client location within one hour to avoid hardening.
The IT help desk was unaware of this, and SLAs were set without considering these factors. As a result, though the ticket was resolved within the SLA, the cement had already hardened, which affected the business.
Decreasing trend in SLA compliance rate
Another alarming trend to keep an eye out for is a constantly degrading SLA compliance rate.
The total monthly operating expense of IT support, divided by the monthly ticket volume.
Maintain minimum levels of cost per ticket.
As per MetricNet, the following were the cost per ticket benchmarks for 2014.
Industry standard - Cost per ticket at a high density environment
Industry standard - Cost per ticket at a medium density environment
As seen in both cases, the cost of the service request is usually higher than the cost of the incidents. This is because incidents typically take less time to resolve than service requests. So, the cost per ticket is heavily influenced by the mix of incidents and service requests.
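The effect of the ticket mix can be illustrated with a short sketch. All figures below are hypothetical and are not MetricNet benchmarks; the function names are likewise illustrative.

```python
def cost_per_ticket(monthly_operating_expense: float, monthly_ticket_volume: int) -> float:
    """Total monthly operating expense of IT support divided by monthly ticket volume."""
    if monthly_ticket_volume <= 0:
        raise ValueError("ticket volume must be positive")
    return monthly_operating_expense / monthly_ticket_volume


def blended_cost_per_ticket(
    incidents: int, cost_per_incident: float,
    service_requests: int, cost_per_request: float,
) -> float:
    """Average cost per ticket for a given mix of incidents and service requests."""
    total_tickets = incidents + service_requests
    if total_tickets <= 0:
        raise ValueError("ticket volume must be positive")
    total_cost = incidents * cost_per_incident + service_requests * cost_per_request
    return total_cost / total_tickets


# Hypothetical unit costs: an incident costs 10, a service request costs 20.
print(blended_cost_per_ticket(800, 10.0, 200, 20.0))  # mix dominated by incidents
print(blended_cost_per_ticket(500, 10.0, 500, 20.0))  # even mix
```

With the same unit costs, shifting the mix toward service requests raises the blended figure, which is why two help desks with identical efficiency can report different costs per ticket.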
IT support is considered a cost center in most organizations, and is usually the first to get budget cuts during a financial downturn. Therefore, IT support must remain efficient, even when IT spending is reduced. Cost per ticket is a key service desk performance metric that helps IT support analyze its efficiency in handling tickets within a given budget. The goal is always to maintain an optimal level of cost per ticket.
However, it is important to keep in mind that a higher-than-average cost per ticket may not necessarily be a bad thing, and a lower-than-average cost per ticket may not always be good, as shown in the following graphs.
The scenario depicted in this graph may mean that the IT service desk team is compromising on service quality to reduce the cost per ticket, which often results in lower customer satisfaction levels.
Cost per ticket Vs End user satisfaction
In the scenario depicted in the above graph, the increase in the cost per ticket is accompanied by an increase in customer satisfaction levels. This may mean that the increasing cost per ticket has led to better service delivery, justifying the extra cost.
One key factor in optimizing the cost per ticket is to enable quick resolution of tickets and reduce any unnecessary escalation. Cost per ticket can be kept under control by following these pointers:
Percentage of software products and licenses in actual use by the business.
Maximize the return on investment (ROI) of software purchases.
With software license purchases taking up a major part of the IT spending, it is important to track software utilization. Unfortunately, this is one of the least discussed service desk metrics. For easy management, the software can be categorized as follows:
The following service desk metrics can be used to track software utilization:
This metric helps identify any software purchase expenditure that does not provide any value to the organization. Ideally, this ratio should be close to one, meaning there is maximum utilization of all purchased software, thereby ensuring a maximum ROI on the software license purchase. A high number of category one software in the unused list means that a major portion of the software asset spending is sitting in idle software.
This metric helps analyze the license utilization of a particular software, helping IT teams plan ahead for license purchases. The ratio should be as small as possible for maximum ROI. A higher ratio could mean that some of the software applications are over-licensed, which could be an idle investment with no ROI.
A leading global pharmaceutical company saved about one million dollars in spending. The pharmaceutical company, with its services spread across 50+ countries, was using a diverse range of Microsoft products. At one particular office, there were thousands of software applications licensed under a Microsoft volume licensing agreement, but there was no visibility or control of these software assets, initially. The purchase had been made without understanding the business requirements.
In fact, the company had limited information on the software assets and the number and type of assets the organization actually needed. This, again, put the organization at the risk of over-licensing, under-licensing, and compliance penalties.
The IT help desk started with a simple analysis by comparing the installed Microsoft software with the Microsoft licenses they held. The insight gained, and IT's efforts to understand the business requirements, led to a redesigned Microsoft license purchase that involved stepping down from the Microsoft Office Professional edition to the cheaper standard edition, which met the business requirement.
Furthermore, several other volume licenses were replaced. Together, these cuts saved the company about one million dollars in software license purchases.
Another important software asset management metric that could incur cost to the organization is the license compliance rate. Maintaining maximum compliance can save your organization from penalties and fines. The following are a few tips for achieving maximum compliance:
Achieve maximum compliance with a three-step pre-audit
A 100 percent license compliance rate will no longer be a myth with this simple three-step pre-audit.
Step 1: Gap analysis
Step 2: Compliance analysis
Check the total number of software installations vs. the total number of licenses purchased for every software application to identify over- and under-licensed software.
Step 3: Software license optimization
With all the insight gained from steps 1 and 2, redesign your software purchases to optimize compliance and attain a 100 percent license compliance rate.
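The compliance analysis in step 2 can be sketched as a comparison of two inventories. The dictionaries of installation and license counts, the application names, and the status labels are assumptions for illustration; real data would come from your asset management tool.

```python
def compliance_analysis(
    installed: dict[str, int], licensed: dict[str, int]
) -> dict[str, tuple[int, str]]:
    """Compare installations with purchased licenses for every application.

    Returns {application: (license_gap, status)}, where license_gap is
    licenses purchased minus installations found.
    """
    report = {}
    for app in sorted(set(installed) | set(licensed)):
        gap = licensed.get(app, 0) - installed.get(app, 0)
        if gap > 0:
            status = "over-licensed"    # paying for licenses that are not installed
        elif gap < 0:
            status = "under-licensed"   # more installations than licenses: compliance risk
        else:
            status = "compliant"
        report[app] = (gap, status)
    return report


# Hypothetical inventory counts.
installed = {"Office": 120, "Visio": 10, "Project": 5}
licensed = {"Office": 100, "Visio": 25, "Project": 5}
for app, (gap, status) in compliance_analysis(installed, licensed).items():
    print(app, gap, status)
```

Under-licensed entries are the ones that expose the organization to penalties, while over-licensed entries are candidates for the optimization in step 3.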
These 8 KPIs, with respective metrics, will help you establish a measurement engine to constantly measure and continuously improve your service desk performance. The first step in establishing this measurement engine is to understand the business that the IT help desk is supporting, and align the IT help desk objectives to the business objectives. The next step is to identify the KPIs and metrics that are critical to these help desk objectives, and constantly measure them.
The 8 service desk KPIs discussed here are critical to the three basic IT help desk objectives of ensuring business continuity, making the organization productive, and delivering services within budgets and on time, which underlines the fact that these 8 KPIs are the ones that your IT help desk should care most about.
ITIL® is a registered trade mark of AXELOS Limited. All rights reserved.