So far the Sensible Service Management Series has covered incidents and requests. This is the “front-office” activity involved in serving the users: meeting their needs and keeping their services running. There is a “back-office” activity closely related to incidents and requests: Problem Management. A problem is an underlying cause of incidents. Usually it means something is broken. If an alligator bites someone, you fix the incident with bandages and maybe surgery. To fix the problem you shoot the alligator before it bites again.
Strictly speaking, something can be wrong or broken and not be a problem, because it is not causing incidents (yet). I like to call those Faults. You can keep life simple (and your Beetil configuration simple) if you treat anything wrong or broken as a problem.
It is not uncommon to find an organization that doesn’t use problem records, only incidents. This is a big mistake. An incident record says that a user is unhappy. If we get the user working again (say using a Workaround – see our Incident Management post) then the incident is over, though the problem may still be there. Without problem records, the incident stays open to track the problem, which usually takes longer to fix. This screws up our reporting: it looks like we have long-running incidents, as if we are not looking after the users.
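To see how much the reporting can be skewed, here is a minimal Python sketch. The numbers are made up for illustration (three incidents from one problem, users restored within the hour, the underlying fix taking ten days); they are not from any real service desk.

```python
from datetime import timedelta
from statistics import mean

# Hypothetical data: three incidents caused by one underlying problem.
# A workaround got each user working again within the hour.
restored_after = [
    timedelta(minutes=30),
    timedelta(minutes=45),
    timedelta(minutes=60),
]
# Assumed time to fix the underlying problem itself.
problem_fixed_after = timedelta(days=10)


def mean_hours(durations):
    """Average a list of timedeltas and return the result in hours."""
    return mean(d.total_seconds() for d in durations) / 3600


# Incidents closed on restoration, problem tracked in its own record.
separate = mean_hours(restored_after)

# Incidents left open until the problem is fixed (no problem record).
conflated = mean_hours([problem_fixed_after] * len(restored_after))

print(f"Mean incident duration, problems tracked separately: {separate} h")
print(f"Mean incident duration, incidents used as problems:  {conflated} h")
```

With these assumed figures the same three incidents report as 0.75 hours on average in one case and 240 hours in the other, even though the users had exactly the same experience.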
Incident Management and Problem Management are very different activities and need to be kept separate. Not only does it make the incident reporting more accurate; keeping problems separate has other benefits:
So use problem records. We open problem records in several different situations:
If you want, you can be quite general about what you define as a problem. For example, lots of user errors might show you there is a problem with the training.
Track all your problems (prioritise, work on them, follow up the slow ones), and record what you did about them, and close them off as you fix them or decide to live with them (if they are too hard or expensive to fix).
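If it helps to picture that lifecycle, here is a rough Python sketch of a problem register. It is not how Beetil stores problems; the `Problem` class, its fields, the status names and the 30-day "slow" threshold are all assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Problem:
    """A hypothetical problem record (not Beetil's data model)."""
    title: str
    priority: int                      # 1 = most urgent
    opened: date
    status: str = "open"               # "open", "fixed" or "accepted"
    resolution: Optional[str] = None   # what we did about it

    def close(self, status: str, resolution: str) -> None:
        """Close as fixed, or as accepted if we decide to live with it."""
        if status not in ("fixed", "accepted"):
            raise ValueError("status must be 'fixed' or 'accepted'")
        self.status = status
        self.resolution = resolution


def work_queue(problems):
    """Open problems, most urgent first, oldest first within a priority."""
    open_ones = (p for p in problems if p.status == "open")
    return sorted(open_ones, key=lambda p: (p.priority, p.opened))


def slow_ones(problems, today, max_age_days=30):
    """Open problems older than the (assumed) follow-up threshold."""
    return [
        p for p in problems
        if p.status == "open" and (today - p.opened).days > max_age_days
    ]


# Example data, invented for illustration.
problems = [
    Problem("Printer driver crashes", priority=2, opened=date(2024, 1, 5)),
    Problem("Mail server disk filling", priority=1, opened=date(2024, 2, 20)),
    Problem("Legacy app memory leak", priority=3, opened=date(2024, 1, 10)),
]
# Too expensive to fix, so we record the decision and live with it.
problems[2].close("accepted", "Too expensive to fix; restart weekly instead")

queue = work_queue(problems)
overdue = slow_ones(problems, today=date(2024, 3, 1))
print([p.title for p in queue])
print([p.title for p in overdue])
```

The point of the sketch is that "decided to live with it" is a recorded outcome with a reason attached, not a problem that quietly drops off the list.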
It is not the bosses’ job to solve problems. Problems don’t get escalated. The old manager’s mantra is “Bring me options not problems”. Those doing operations know best how to fix problems.
To fix a problem (or an incident) you quite often have to do root cause analysis. There are formal techniques you can use to do this. Some argue that there is no single root cause of a problem: it generally takes several causes together to create one, and they have to “line up” in some way. The first and most obvious cause you find is seldom the end of the story: keep asking “why” until the answers are no longer useful. Finding the root cause is not necessarily about assigning blame – it is about removing the cause. Complex systems are in fact permanently broken, so when they actually fail it may be nobody’s fault. On the other hand, there could be negligence.
Once you are tracking and dealing with problems, the next level of maturity is to “kill the alligators before they bite you”: proactively seek out problems and fix them. When you are really good you will forestall them and prevent them ever existing. Find a keen, clever, energetic employee and assign them half a day per week to be an Alligator Killer: measure them on how many problems they find and eliminate.
Your register of problems in Beetil is closely linked to your register of risks and you may want to link them somehow. An unfixed problem is one kind of risk: a risk that it will cause more interruptions to service.
The better you get at dealing with problems, the fewer incidents you will have. The other area that you can improve in order to reduce incidents is Change Management. We’ll talk about that next.