
This view of service management...

On the origin and the descent of managing services. We put meat on the bones.


What is a problem?

17 March 2013 by Robert Falkowitz

Although troubleshooting and the definitive elimination of faults have a long history, the particular innovation of ITIL® 2 was to recommend treating problems and incidents as two separate entities, each with its own life-cycle. This advice has led to a series of confusions and ambiguities, many of which have still not been resolved among service management practitioners and the creators of tools that support service management.

What’s the problem?

The advice of ITIL 2, retained in all subsequent versions of ITIL, addresses the intuitive practice of seeking and eliminating the cause of an incident as part of handling that incident. This practice remains common to this day, particularly for incidents with considerable impact. ITIL's advice, which some may consider counter-intuitive, is to treat problems as the causes of groups of incidents and to base the priority for handling a problem on the overall impact of those incidents, rather than on the impact of any single incident. Thus, a problem that causes frequent but relatively low-impact incidents may have an overall impact that gives it a high priority.
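As a minimal sketch of this prioritization rule, assume each incident has already been linked to a problem and given a numeric impact score (the record layout, identifiers and scoring scale below are hypothetical). Problems are then ranked by the sum of their incidents' impacts rather than by their worst single incident:

```python
from collections import defaultdict

# Hypothetical incident records: (incident_id, problem_id, impact).
# The 1-10 impact scale is an illustrative assumption, not an ITIL value.
incidents = [
    ("INC-1", "PRB-A", 8),  # a single high-impact incident
    ("INC-2", "PRB-B", 2),  # frequent, low-impact incidents
    ("INC-3", "PRB-B", 2),
    ("INC-4", "PRB-B", 2),
    ("INC-5", "PRB-B", 2),
    ("INC-6", "PRB-B", 2),
]


def rank_problems(records):
    """Return (problem_id, total_impact) pairs, highest total first."""
    totals = defaultdict(int)
    for _incident_id, problem_id, impact in records:
        totals[problem_id] += impact
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


print(rank_problems(incidents))  # [('PRB-B', 10), ('PRB-A', 8)]
```

Although PRB-A has the worst single incident, PRB-B comes first because its many low-impact incidents add up to more. Summing is only one possible way of aggregating impact; an organization might just as well weight incidents by urgency or by cost.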

This advice should be understood in the context of an organization that has only limited resources for handling incidents and their causes. Normally, a service provider simply cannot handle every cause and every incident simultaneously and with the same priority. It is therefore important to get the best return on the investment in those resources by ensuring that they work first on the problems causing the most harm.

Confusing concepts

So far, so good. But when we dig a little deeper, we see that the underlying concepts of problem management are described in a somewhat confusing way. The definitions proffered by ITIL have exactly the opposite of their intended effect. Instead of creating a simple standard that enhances communication, they tend to create confusion.

In its current avatar, ITIL defines a problem as “the cause of one or more incidents.” This definition seems simple enough, until we dig deeper. In ITIL 2, a problem was considered to be “the unknown cause of one or more incidents” and a major purpose of problem management was to find that cause. Once the cause had been found and a workaround identified, one no longer spoke of a problem, but of a known error. Why it was called a “known error”, as opposed to a “known problem”, will remain a mystery. Many ITIL 2 trainers will recall having introduced the non-standard concept of an “unknown problem” in order to explain how you can identify a problem without knowing its cause.

As if this were not confusing enough, the concept of the “root cause” only compounds the discomfiture. Currently, “root cause” is defined as “the underlying or original cause of an incident or problem.” I will leave the discussion of what “underlying” and “original” mean for another posting. How is it that a root cause is the cause of a problem or an incident, whereas a problem is also the cause of an incident? A very simple concept, with a long practical history, has been made very confusing.

I realize that there are ways of expounding on these concepts so as to make some sense of them. Indeed, that is what all ITIL trainers worth their salt end up doing. Wouldn’t it be much simpler if, instead of having to produce elaborate explanations involving much legerdemain and fancy footwork, we had more direct and intuitive definitions of the concepts involved?

Focus on the symptoms

The easiest way to resolve this conundrum is to look at what we actually need to do to manage “problems.” When a “problem” is first identified in reaction to one or more incidents, all we know are the symptoms. It is useless to talk about causes, or even “unknown” causes, at this point. Therefore, it would be much simpler to speak of a problem as being a certain collection of related symptoms.

This approach to the definition of “problem” has many advantages. First, it reflects the reality of what we do in managing problems. Second, it is well adapted to the long history of identifying things that go wrong and trying to find out what causes them. Third, it removes the ambiguity of the terminology. Finally, it helps us to perform the work of problem management more easily and, perhaps, more automatically.

The logic of problem management, as a discipline separate from incident management, requires us to identify—that is to say, to name—whatever has apparently caused the incident. We can do this based on the symptoms alone. Suppose users frequently get a certain error message in a certain application when they attempt a certain function. Applying Occam’s razor, we assume that all these incidents have a single cause. In our search for the cause, we first identify the particular group of symptoms. The grouping of symptoms typically includes the CI or class of CI in which the fault is detected, the operational context in which the fault occurs and the unexpected behavior.
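The grouping described above can be sketched as follows, assuming each incident record already carries the three symptom attributes as plain strings (the `Symptom` type, its field names and the sample values are hypothetical). Incidents whose symptoms match exactly fall into one group, which, following Occam's razor, is treated as a single candidate problem:

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Symptom(NamedTuple):
    incident_id: str
    ci_class: str   # the CI, or class of CI, in which the fault is detected
    context: str    # the operational context, e.g. the function attempted
    behavior: str   # the unexpected behavior, e.g. an error message


def group_symptoms(symptoms: List[Symptom]) -> Dict[Tuple[str, str, str], List[str]]:
    """Map each (CI class, context, behavior) signature to its incident ids."""
    groups: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    for s in symptoms:
        groups[(s.ci_class, s.context, s.behavior)].append(s.incident_id)
    return dict(groups)


observed = [
    Symptom("INC-10", "billing-app", "print invoice", "error E42"),
    Symptom("INC-11", "billing-app", "print invoice", "error E42"),
    Symptom("INC-12", "billing-app", "export report", "timeout"),
]
for signature, ids in group_symptoms(observed).items():
    print(signature, ids)
```

Exact string matching is the crudest possible form of symptom correlation; in practice, error messages and contexts would need some normalization or fuzzier matching before they could be grouped reliably.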

We need to name that group of symptoms so that we can easily refer to it, both while we attempt to handle it and in the future. The name is typically derived from the most unusual symptom. Medicine uses precisely the same method. For example, one all too common disease today is not named for the inability of our bodies’ cells to assimilate glucose correctly. Instead, it is named for a common symptom of the disease, the frequent and copious need to urinate, the phenomenon to which the originally Greek word “diabetes” refers.

Once we have identified, and then prioritized, a problem as a group of symptoms, the next obvious step is to identify the cause or causes of that problem. By defining “problem” as symptoms, and not causes, we avoid the metaphysical embarrassment of trying to find the cause of the cause. We avoid having to talk about whether a problem is “known” or “unknown”. Finally, when we have at last identified the causes and a workaround, we no longer have to debate whether a known error is really an object distinct from a problem or whether it is merely a status for a problem.
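To illustrate this point, here is a minimal sketch that assumes a problem record is identified by its symptoms and that “known error” is modeled as a status derived from what the record contains, rather than as a separate object. The class, field and status names, as well as the sample values, are hypothetical:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    IDENTIFIED = "identified"    # symptoms grouped and named, no cause yet
    DIAGNOSED = "diagnosed"      # at least one cause identified
    KNOWN_ERROR = "known error"  # cause identified and a workaround available


@dataclass
class Problem:
    name: str                    # e.g. derived from the most unusual symptom
    symptoms: List[str]
    causes: List[str] = field(default_factory=list)
    workaround: Optional[str] = None

    @property
    def status(self) -> Status:
        """Derive the status from what is known; nothing is ever 'unknown'."""
        if self.causes and self.workaround:
            return Status.KNOWN_ERROR
        if self.causes:
            return Status.DIAGNOSED
        return Status.IDENTIFIED


p = Problem("E42 on invoice printing", ["error E42", "print invoice"])
print(p.status.value)  # identified
p.causes.append("faulty invoice template")
p.workaround = "print invoices from the previous template"
print(p.status.value)  # known error
```

Because the status is computed from the record, the same object simply passes from “identified” to “known error” as diagnosis progresses; no second object has to be created.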

What is proactive?

The difficulties in understanding what is meant by proactive problem management are themselves a symptom of the confusion concerning problem and incident terminology. In spite of the advice of ITIL 2, many service providers and many of the tools used to support problem management continue to treat problem management as the extension of incident management. They consider problem management to be the domain of major incidents or of incidents that are difficult to resolve. With such a (mis-)understanding of problem management in hand, these same people consider proactive problem management to be that part of problem management that reacts to previous incidents after those incidents have already been resolved. In short, some people think of reactive problem management as the work of handling difficult or major incidents, whereas proactive problem management is the work of identifying and resolving the causes of incidents that have already been resolved.

This is surely not what ITIL ever intended in coining the term “proactive” problem management. My point is not to criticize organizations that do not follow ITIL; whether they do is largely irrelevant here. My point is to show how fuzzy definitions turn people off, create misunderstandings and result in a failure to benefit from what is, after all, quite good advice.

ITIL’s intended meaning for proactive problem management covers the work of identifying the potential causes of failure before they cause any incidents. We are all familiar with seeing something anomalous and thinking, “They ought to fix that before there is an accident.” Proactive problem management is simply a structured way of identifying those underlying causes. As such, it is related on the one hand to condition-based maintenance and on the other to risk management.

Proactive problem management is to condition-based maintenance as reactive problem management is to incident management. For example, condition-based maintenance checks the lubricant levels in machines and tops them up when necessary. In this way, it helps prevent incidents due to a lack of lubricant. Proactive problem management asks why lubricants need to be topped up much more frequently than expected and tries to identify the underlying causes of the lubricant loss. As such, proactive problem management is a means of controlling risk. Because lubricant levels are unexpectedly lower than the specification predicts, there is a certain risk that incidents will be caused. Identifying the underlying causes removes much of the uncertainty and therefore helps to clarify the priorities in addressing problems. The definition of risk, after all, is “uncertainty of outcome.” In short, proactive problem management is the work of identifying unexpected groups of symptoms: symptoms that do not follow specification and that are likely to provoke incidents if their causes are not understood and handled.
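The lubricant example can be sketched as a simple check, assuming a hypothetical specification of expected consumption per machine and a log of top-ups (all machine names, figures and the 1.5 tolerance factor are illustrative). Condition-based maintenance would simply top up the oil; this check instead flags the machines whose consumption departs from specification, giving proactive problem management a group of symptoms to investigate before any incident occurs:

```python
from typing import Dict, List

# Hypothetical specification: expected lubricant consumption,
# in litres per 1,000 operating hours.
SPEC_LITRES_PER_1000H = {"press-1": 0.5, "press-2": 0.5, "lathe-1": 0.2}


def flag_anomalies(top_ups: Dict[str, List[float]],
                   operating_hours: Dict[str, float],
                   tolerance: float = 1.5) -> List[str]:
    """Return machines whose observed consumption exceeds spec * tolerance."""
    flagged = []
    for machine, litres in top_ups.items():
        # Normalize the total topped-up volume to litres per 1,000 hours.
        observed = sum(litres) / operating_hours[machine] * 1000
        if observed > SPEC_LITRES_PER_1000H[machine] * tolerance:
            flagged.append(machine)
    return flagged


top_ups = {"press-1": [0.5, 0.5, 0.5], "press-2": [0.5], "lathe-1": [0.2, 0.2]}
hours = {"press-1": 1000.0, "press-2": 2000.0, "lathe-1": 1000.0}
print(flag_anomalies(top_ups, hours))  # ['press-1', 'lathe-1']
```

The check only surfaces the deviation from specification; finding out why press-1 and lathe-1 lose lubricant remains diagnostic work.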

Problem tooling and automation

I realize perfectly well that merely changing the definition of a term should not change the techniques by which the underlying concepts are managed. That being said, virtually all the tools I have ever seen that are intended to support problem management are either much too complicated or much too simple. At best, they are administrative aids for documentation. They hardly ever help to do the actual work of identifying, diagnosing or resolving problems and their causes.

This is a pity, because information technology is probably advanced enough today to do the work of grouping (that is to say, correlating) symptoms and identifying their probable causes. A clearer implementation in tools of the concepts of symptom, symptom correlation and cause would go a long way toward supporting faster and probably more reliable problem identification and diagnosis.

Unfortunately, our tools today seem more concerned with administration, control and compliance than with resolving problems. It is a decadent technology used to support a decadent society. Our tools should first and foremost be designed to increase the value of our services and secondly to limit the destruction of value in our services. Any other use, such as proving that the defined and agreed process is being followed, should come a very distant third.

Summary
Article Name
What is a problem?
Description
ITIL's description of problems is confusing and misleading. We provide a simpler and more coherent vision of what a problem really is.
Author
Robert S. Falkowitz
Publisher Name
Concentric Circle Consulting

Filed Under: Problem Management Tagged With: automation, cause, decadent society, decadent technology, known error, maintenance, proactive problem management, problem, reactive problem management, risk, root cause, symptom

© 2014–2021 Concentric Circle Consulting · All Rights Reserved.