In the previous blog post in this Sensible Service Management Series we looked at the core of servicing customers: managing Requests.
In Beetil, everything we respond to is called an Incident. Let’s talk about incidents in the strictest sense of the word: dealing with things going wrong.
Everything we talked about last time regarding Requests still applies. We will add some more considerations now for when something needs to be fixed. Recall we talked about three main capabilities:
1. Record
Provide a single point of contact with multiple channels to access it. Make sure you keep a record of all requests as Incidents in Beetil. Record all interactions with your users. Track all your responses and record what you did about them.
2. Respond
Make sure someone owns every request. Build up information in the Beetil knowledgebase. Use external information. Provide scripts for how to deal with common requests. Use Beetil to pass requests to someone else. Regularly monitor how long requests are taking and chase up the slow ones.
3. Report
Make sure you don’t drop the ball on any requests. Look for trends to help you improve your service.
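As a rough illustration of "chase up the slow ones", here is a minimal Python sketch. It does not use Beetil's API: the `Request` record, its fields, and the two-day threshold are all assumptions made up for the example, standing in for whatever your own ticket export looks like.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Request:
    """Hypothetical ticket record; the field names are illustrative only."""
    ref: str
    owner: Optional[str]
    opened_at: datetime
    closed_at: Optional[datetime] = None


def needs_chasing(requests: List[Request], now: datetime,
                  max_age: timedelta = timedelta(days=2)) -> List[Request]:
    """Return open requests that have no owner or are older than max_age, oldest first."""
    flagged = [
        r for r in requests
        if r.closed_at is None
        and (r.owner is None or now - r.opened_at > max_age)
    ]
    return sorted(flagged, key=lambda r: r.opened_at)
```

Run something like this on a schedule and you get a short chase-up list covering both rules from the recap: every request has an owner, and nothing sits open for too long.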
OK, so now let’s extend that. Sometimes a user requests help with a service not working as expected: something needs to be fixed. That is an “incident” in the strictest use of the word.
Sometimes that “user” reporting an incident is an internal person picking up an error before it affects any of the “real” users consuming the service. It can even be a software program detecting the error and automatically alerting us.
In any case, if something needs fixing we need to elaborate on the “2. Respond” part of the Request process we described last time.
Here is how we expand that part of the process:
2.1. Categorise
We didn’t talk much about this last time. It helps to categorise all incidents, whether they are general requests or issues to be fixed. It is particularly important when we are fixing something: a general idea of the type of incident lets us determine how serious it is and how wide and severe the impact is, and lets us pass it to the right person first time.
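One common way to turn "how wide" and "how severe" into a single priority is a small lookup. The sketch below is an assumption for illustration, not something Beetil prescribes: the impact and severity scales, the P1 to P4 labels, and the routing table are all invented names.

```python
# Hypothetical scales, ordered from least to most serious.
IMPACT = ("one user", "team", "whole organisation")
SEVERITY = ("inconvenience", "degraded", "service down")

# Hypothetical mapping from incident category to the group that should see it first.
ROUTING = {
    "email": "messaging team",
    "network": "network team",
}


def priority(impact: str, severity: str) -> str:
    """Combine impact and severity into a priority label, P1 being the most urgent."""
    score = IMPACT.index(impact) + SEVERITY.index(severity)  # 0 (minor) to 4 (worst)
    return f"P{max(1, 4 - score)}"


def first_assignee(category: str) -> str:
    """Pick the group to receive the incident; unknown categories stay with the service desk."""
    return ROUTING.get(category, "service desk")
```

The exact scales matter less than having them written down, so that two people categorising the same incident arrive at the same priority and the same first assignee.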
2.2. Diagnose
This is where keeping records of what we did in the past, and building up the knowledge base, really pay off. Search Beetil’s Incident records, Problem records, and the Knowledgebase, to see if we know what the cause is and how to fix it. If you get a match, fix it if you can or pass it to somebody who can. This is called Level 1 support.
If you don’t get a match or can’t fix it, pass it to Level 2 support: those who have the technical skills to do specialist diagnosis and resolution. If they can’t fix things, they refer the incident to Level 3 support: the folk who built or supplied the stuff that is not working, often a supplier external to your organisation.
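The hand-off between support levels can be sketched as a simple chain: try each level in turn and stop at the first one that can resolve the incident. Everything named here (`escalate_until_resolved`, the three handler functions, and the sample fixes) is hypothetical and only stands in for real people and real records.

```python
from typing import Callable, List, Optional, Tuple

# A handler returns a resolution note, or None if that level cannot fix the incident.
Handler = Callable[[str], Optional[str]]


def escalate_until_resolved(
    symptom: str, levels: List[Tuple[str, Handler]]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each support level in order; return (level name, resolution) for the first success."""
    for name, try_to_fix in levels:
        resolution = try_to_fix(symptom)
        if resolution is not None:
            return name, resolution
    return None, None  # nobody could resolve it


# Hypothetical stand-in for a search of past Incidents, Problems and the knowledgebase.
KNOWN_FIXES = {"password expired": "Reset the password and confirm the user can log in."}


def level_1(symptom: str) -> Optional[str]:
    return KNOWN_FIXES.get(symptom)  # a match in the records means Level 1 can act


def level_2(symptom: str) -> Optional[str]:
    return "Restarted the print spooler." if "printer" in symptom else None


def level_3(symptom: str) -> Optional[str]:
    return "Supplier provided a fix."  # stand-in: the builder or supplier is the last resort


SUPPORT_LEVELS = [("Level 1", level_1), ("Level 2", level_2), ("Level 3", level_3)]
```

Calling `escalate_until_resolved("printer offline", SUPPORT_LEVELS)` skips Level 1, because nothing in the records matches, and stops at Level 2.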
2.3. Escalate
All this passing around amongst support groups is called Functional Escalation, but when we talk of “escalating” we usually think of Hierarchical Escalation, i.e. telling somebody more senior. We hierarchically escalate because:
That person might decide that this is a Major Incident. That means dropping the normal process described here and switching to a crisis-response process, which we will talk about in a future blog post.
2.4. Resolve
Somebody is not getting the service they expect. The incident process must focus on restoring that service. Sometimes that is not the same thing as fixing the underlying problem (fixing the Problem is a different process we will talk about some other time). If we have to fix the problem in order to get the user back on track again we will, but sometimes there is a Workaround: a way to get them back up and running without fixing anything. For example, with some software, simply logging off and on again may get them around an issue and working again. Or rebooting a server may make the problem go away. (There is an old joke that “a problem gone is a problem solved”.)
You can find Workarounds as part of Beetil’s Problem records, and/or you can also record Workarounds in the knowledgebase.
Eventually a Problem may cause so many Incidents that we have to hold the user up without a Workaround while we properly diagnose it and nail it once and for all. Whether that inconvenience is outweighed by the ongoing cost of recurring incidents is a management call. In general, though, the Incident process takes whatever Workarounds or temporary fixes it can to get service restored to the user as quickly as possible.
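To give management something concrete to base that call on, you can count how many Incidents each Problem keeps generating. This is a minimal sketch under assumed data: each Incident is represented only by the identifier of the Problem it was linked to (or `None` if unlinked), and the threshold of five is an arbitrary example.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def problems_to_review(
    linked_problem_ids: Iterable[Optional[str]], threshold: int = 5
) -> List[Tuple[str, int]]:
    """Return (problem id, incident count) pairs at or above threshold, most frequent first."""
    counts = Counter(pid for pid in linked_problem_ids if pid is not None)
    return [(pid, n) for pid, n in counts.most_common() if n >= threshold]
```

The output is only a prompt for the conversation. The decision to stop applying the Workaround and fix the Problem properly still rests with a person.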
2.5. Close
This applies to all Incidents and Requests. Before you close the ticket make sure:
ITIL confuses things by talking a lot about finding and fixing the underlying problem, and recovering the broken service, as part of the Incident process. We will keep it clean by talking about all that as part of the Problem process (coming up in a future blog post). Keep it nice and crisp:
There is a huge body of knowledge out there about Incidents and Requests, which you can investigate further as you need to. ITIL has a lot (in the version 3 book Service Operation and the Operational Support and Analysis intermediate course). The Helpdesk Institute produce a lot of useful material too. COBIT 5 is my choice for a formal definition of what should be happening and what should be produced, and by whom.
For now, start with:
1. Record
2. Respond
2.1 Categorise
2.2 Diagnose
2.3 Escalate
2.4 Resolve
2.5 Close
3. Report