
This view of service management...

On the origin and the descent of managing services. We put meat on the bones.


Problem Workarounds and Incident Resolutions

7 January 2014 by Robert Falkowitz

The Scope of this Discussion

When a problem is identified reactively, it means that one or more incidents have occurred and it has been decided to take note of and perhaps investigate their underlying causes. I exclude from this discussion both the proactively identified problems (the problems identified before any related incidents have occurred) and those organizations that treat problem management as a discipline for resolving difficult incidents rather than a discipline for identifying the causes of those incidents.

Therefore, in the scope defined above, when a problem is recorded, there is always a known way of resolving any related incidents—namely, the same way one or more of the previous incidents have been resolved. I will further exclude from this discussion the case where an incident is resolved by a change that serves, at the same time, to eliminate the cause of the incident.

ITIL’s Definition of Known Error

As we know, ITIL® has defined a known error as a problem for which the root cause has been identified and for which a workaround has been identified. The question I will address here is how that workaround is different from the already known resolution of the incidents related to the problem. Another way of articulating this issue is to ask: when might a reactive problem be recorded for which a workaround is not already known?

What is a Workaround?

It should be clear that all problems within the defined scope have a means for resolving related incidents without eliminating the key factors in the chain of causality that results in the incidents. But this is precisely what we mean by a workaround! So why bother specifying that a known error implies an identified workaround? In fact, it adds little or nothing to the definition of known error to refer to identified workarounds.

An Exceptional Case

There is a special case which is not even covered by such a definition. This is the case where:

  • no workaround at all is possible – either the incident is resolved in such a way that its causes are eliminated or it is simply not resolved at all
  • it is agreed with the customers that the incident does not need to be resolved within the normal timeframe. In other words, the customer agrees not to use the service feature that is failing.

Now, this exceptional case is exceedingly rare. In the vast majority of cases, a workaround is indeed available, albeit the workaround might require that the customer find a different way of working; one that does not depend on the faulty IT service. For all intents and purposes, we can ignore this exception.

Building a Better Workaround

We may conclude that the objective of problem management is not to identify some workaround for an incident. Rather, an objective is to identify a better workaround, if feasible. Better than what? Better than the way in which incidents have been hitherto resolved. What do we mean by better? We mean resolving the incident in such a way that the business impact of the incident is reduced. Here is an example. Suppose the first instance of the incident type was resolved with a loss of 2 FTEs of productivity. After developing a better workaround, it is possible to resolve such incidents with a loss of only 1 FTE of productivity. That’s better.

Why do I say feasible? Very simply, the cost of developing, implementing and using the better workaround must be lower than the probable business impact of incidents due to the problem, over a reasonable horizon. It makes no sense to invest 100’000 to improve a workaround if the savings are expected to be only 50’000.
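The feasibility test can be written down as a simple comparison. What follows is a minimal sketch, not a prescribed method. It reads "probable business impact" as the expected reduction in impact that the better workaround would bring. The function names, the parameters and the way savings are estimated are illustrative assumptions; only the 100’000 versus 50’000 comparison comes from the text above.

```python
def estimate_savings(incidents_per_month: float,
                     impact_before: float,
                     impact_after: float,
                     horizon_months: float) -> float:
    """Estimate the reduction in business impact over the horizon.

    All parameters are hypothetical inputs: the expected incident rate, the
    impact per incident with the old and the new workaround (in the same
    currency as the cost), and the horizon in months.
    """
    return incidents_per_month * (impact_before - impact_after) * horizon_months


def is_workaround_feasible(total_cost: float, expected_savings: float) -> bool:
    """Return True only when the total cost of developing, implementing and
    using the better workaround is lower than the expected savings."""
    return total_cost < expected_savings


# The example from the text: investing 100'000 to save 50'000 makes no sense.
print(is_workaround_feasible(100_000, 50_000))  # False
```

The comparison is deliberately strict: a workaround that merely breaks even is not treated as feasible here, which is one possible reading of "lower than".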

What is a reasonable horizon? The duration of the horizon depends on several factors. The most important is the expected time to live for the system or the service having the problem. If a service or a system is to be decommissioned in 6 months, then 6 months is the maximum horizon. If a bug is known to be resolved by a new software version and it is planned to release that new version in 12 months, then 12 months is the maximum horizon.

The second factor in determining the horizon is the general policy of the organization concerning how to determine the return on investments. This varies considerably according to the risk tolerance of the organization and its capability to deliver solutions as planned. Thus, a low capability organization with low risk tolerance might insist on a positive ROI within 6 months. One with high risk tolerance and with confirmed capabilities to deliver solutions over the long term might require an ROI within three years. A horizon of greater than three years is probably not useful, given the fast pace at which technology changes.
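One way to combine these factors is to take the shortest of the applicable limits. The sketch below assumes that reading; the function name, the parameter names and the idea of treating the three-year remark as a hard cap are my own illustrative choices, not a rule stated by any framework.

```python
from typing import Optional

# Assumption: "greater than three years is probably not useful" is
# treated here as an upper bound of 36 months.
MAX_USEFUL_HORIZON_MONTHS = 36


def horizon_months(policy_roi_months: float,
                   months_to_decommission: Optional[float] = None,
                   months_to_fix_release: Optional[float] = None) -> float:
    """Return the horizon, in months, over which to evaluate a workaround.

    policy_roi_months: the period within which the organization's policy
        requires a positive ROI.
    months_to_decommission: remaining life of the system or service, if a
        decommissioning is planned.
    months_to_fix_release: time until a release that resolves the bug, if
        one is planned.
    """
    limits = [policy_roi_months, MAX_USEFUL_HORIZON_MONTHS]
    if months_to_decommission is not None:
        limits.append(months_to_decommission)
    if months_to_fix_release is not None:
        limits.append(months_to_fix_release)
    # The shortest applicable limit bounds the horizon.
    return min(limits)


# A service to be decommissioned in 6 months caps the horizon at 6 months,
# even if policy would allow three years.
print(horizon_months(36, months_to_decommission=6))  # 6
```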

It should be noted, too, that every organization has limited resources and limited management capabilities. As a result, the organization might decide not to pursue the development of a better workaround even though a positive ROI can be demonstrated. It may be that the limited capabilities and resources are to be devoted to other initiatives with an even higher ROI.

An Example of Identifying a Workaround

Here is a concrete example. An organization uses a client-server application in which the client periodically freezes up. In fact, the cause of this freezing up is known and there is even a solution available, a solution that requires upgrading to the next major version of the application. While the organization does indeed intend to make that upgrade, it will be extremely complicated, requiring the testing and adaptation of a very large number of procedures. What, if anything, can be done in the interim?

In order to resolve incidents related to this problem, the organization has been rebooting client computers. While this does resolve the incident, it also results in a lot of lost productivity and potentially lost data. Is it possible to find a way to minimize that loss of productivity and minimize the risk of lost data? Since the incidents recur frequently and touch a very large number of users, it is worth investing some time and resources to find a better way to resolve the incidents.

It is determined that it is possible to run a script locally that preserves the integrity of the machine, restarts the client software and allows users to get back to work much more quickly and with little risk of data corruption. The script is developed, tested and installed on all client computers. In the future, incidents related to the problem are resolved using this script, until such time as the software is upgraded and the bug itself is resolved.

Cause Determination and Workaround Identification are Not Necessarily Interdependent

So, this is an example of a workaround, designed and implemented under the auspices of problem management, that is a better workaround than the previous way of resolving incidents. It is curious to note that no knowledge of the application bug or its resolution was needed in order to determine this new workaround. In other words, it was possible to find an improved workaround without first understanding the causes of the incidents.

The Real Goal of Problem Management

The message we should retain is that the goal of problem management is not to find incident causes and it is not to find or improve workarounds. The goal is simply to reduce the impact of incidents. Improving workarounds and identifying causes are only two of the principal means by which that goal may be achieved.

Summary
Article Name: Problem Workarounds and Incident Resolutions
Author: Robert S. Falkowitz
Publisher: Concentric Circle Consulting

Filed Under: Incident Management, Problem Management Tagged With: causality, cause, incident, known error, problem, return on investment, ROI, workaround

© 2014–2023 Concentric Circle Consulting · All Rights Reserved.