The causes of incidents

Are 80% of incidents really caused by changes?

In a recent thread, someone claimed that 80% of incidents are caused by changes. That figure does not match my experience with any organization, so I thought it worthwhile to investigate how some organizations perceive the causes of the incidents that occur in their IT services.

The limits of Internet surveys

I am acutely, indeed painfully, aware of the limits of investigations based on a survey advertised via a variety of LinkedIn and Facebook groups. The respondents are self-selected, not random. The sample size is not very large. No matter how simple you try to make a survey, some of our dear IT brethren find the terminology too complex, hard to understand or poorly defined, and they interpret the questions in different ways. The more you try to ensure accurate data with checks and controls, the fewer responses are recorded. Most importantly of all, I doubt that most organizations really have the statistics to back up the values they report. Even though the survey is completely anonymous, there is a well-known tendency for people to overestimate, or be optimistic about, what they do and know.

Therefore, the data reported here is best understood as the perceptions of socially engaged IT personnel willing to share some information. As such, it is no better and no worse than any other anecdotal information that makes its way into social media and tends to be repeated out of context, as if it were established fact.

I note, too, that the free version of SurveyMonkey, used for this survey, has become largely useless for analyzing the collected data. I welcome suggestions from anyone regarding a viable alternative.

The design of the survey

The survey asked about the causes of IT incidents. Perhaps it should have talked of the causes of incidents in the services based on information technology, instead. At least one respondent thought that “users” could not cause “IT” incidents. Oh well.

It divided those causes into four categories:

  • incidents caused by changes
  • incidents caused by defects in components
  • incidents caused by users
  • other causes

With Einstein, I admit that this taxonomy might be too simple. I thought it to be fairly orthogonal, but many respondents thought otherwise. In any case, a change includes the introduction of new hardware or software or the reconfiguration of existing hardware or software. A component defect is, very simply, something that breaks, typically due to age or to patterns of use. Since most IT services are operated by end users, it should be clear that a user, acting as an operator, can make a mistake and cause a failure. One respondent pointed out that some incidents might be due to inadequate capacity. To the extent that inadequate capacity results from implementing an incorrectly dimensioned system, I consider it to be a type of change. Capacity-related incidents may also be caused by changes in load patterns for which no corresponding changes in capacity have been made.

Analysis of the results

The survey was very simple, so the results can be displayed without graphs.

Number of responses:

61 (of whom 2 neglected to provide any data about the causes of incidents)

Complexity of organizations:

Simple:    3%
Medium:   22%
Complex:  75%

Size of IT staff:

<100:        16%
100-1000:    35%
>1000:       49%

Incidents caused by changes made by IT (including releases of software)

Fig. 1: Changes as cause of incidents, by organization size and complexity

Fig. 1 shows, for each range of reported values, the percentage of respondents indicating that changes by IT are the cause of that share of incidents, split out by both the size of the organization and its complexity. As might be expected, not all combinations of size and complexity were recorded. There are three remarks to be made:

  • Neither size of the organization nor complexity of IT appears to have a significant impact on the results.
  • The four bumps in the graph, at 10-19%, 30-39%, 60-69% and 80-89%, are not easily explained. Do they represent psychological phenomena? Are the respondents giving data based on reports from incident logs, or are they providing seat-of-the-pants impressions not backed up by real data?
  • Fewer than 10% of the respondents confirmed the initial claim that changes cause 80% of incidents.
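The banding behind a chart like Fig. 1 can be sketched in a few lines of Python. This is a minimal sketch, assuming each response is a single percentage (0 to 100) for one cause, and assuming that "confirming" the 80% claim means answering 80% or more. The function names and the sample values are hypothetical and illustrative only; they are not the survey data.

```python
from collections import Counter


def decile_band(pct: float) -> str:
    """Map a reported percentage (0-100) to a 10-point band such as '10-19%'."""
    if not 0 <= pct <= 100:
        raise ValueError(f"percentage out of range: {pct}")
    low = min(int(pct // 10) * 10, 90)  # fold exactly 100% into the top band
    high = 100 if low == 90 else low + 9
    return f"{low}-{high}%"


def band_distribution(responses: list[float]) -> dict[str, float]:
    """Percentage of respondents whose answer falls in each band, in band order."""
    counts = Counter(decile_band(p) for p in responses)
    n = len(responses)
    ordered = sorted(counts.items(), key=lambda kv: int(kv[0].split("-")[0]))
    return {band: 100 * c / n for band, c in ordered}


def share_at_least(responses: list[float], threshold: float = 80.0) -> float:
    """Percentage of respondents reporting at least `threshold`% (hypothetical criterion)."""
    return 100 * sum(p >= threshold for p in responses) / len(responses)


# Hypothetical answers to "what % of incidents are caused by changes?"
sample = [15, 12, 35, 38, 65, 85, 20, 5, 45, 62]
print(band_distribution(sample))
print(share_at_least(sample))
```

Grouping the same computation by organization size or complexity would simply mean filtering the list of responses before banding it. Bands that are fixed at 10 points make "bumps" such as those at 10-19% and 60-69% easy to spot, although they can also hide clustering around round numbers within a band.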

Incidents caused by defective components

Fig. 2: Component failure as cause of incidents, by organization size and complexity

Once again, we see the strange bumps in the smoothed curve of the percentage of organizations reporting the prevalence of component failure as a cause of incidents. But the bumps are not exactly in the same places as for incidents caused by IT changes. Many questions remain open, such as whether the respondents consider a component failure in a redundant system, where service continues, to be an incident or not.

Incidents caused by users

Fig. 3: Users as cause of incidents, by organization size and complexity

The respondents seemed somewhat reluctant to blame incidents on users, although a few found users to be a very important cause. I assume that many of the respondents did not consider user support calls to the service desk to be incidents. Otherwise, we might have expected a very large number of user-caused incidents.

Other causes

No attempt is made to analyze other causes.

Synoptic view of all causes

Fig. 4: Tabular analysis of causes

A very small number of respondents clearly indicated that IT changes are the major cause of incidents, but they were a distinct minority. Although most respondents consider the causes of incidents to be multiple and spread out, the overall responses do show that IT changes are considered slightly more important as a cause of incidents than the other categories of causes.

Some of the responses indicated a high percentage of incidents due to other causes. As we have not attempted to analyze what those other causes might be, we can only suppose that this reflects a weakness in the survey itself as well as differing understandings of the questions among the respondents. Indeed, we are obliged to take many of the responses with a grain of salt, given that the total percentages of all the causes were sometimes well under 100%.
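A simple consistency check of the kind that would catch such responses can be sketched in Python. This is a minimal sketch, assuming each response records one percentage for each of the four categories used in the survey; the `Response` class, the `flag_inconsistent` function and the 5-point tolerance are hypothetical choices made for illustration, and the sample values are invented, not survey data.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One respondent's estimate (in %) for each of the four cause categories."""
    changes: float
    components: float
    users: float
    other: float

    def total(self) -> float:
        return self.changes + self.components + self.users + self.other


def flag_inconsistent(responses: list[Response], tolerance: float = 5.0) -> list[tuple[int, float]]:
    """Return (index, total) for responses whose categories do not sum to roughly 100%.

    The tolerance (in percentage points) is an assumed allowance for rounding.
    """
    return [
        (i, r.total())
        for i, r in enumerate(responses)
        if abs(r.total() - 100.0) > tolerance
    ]


# Hypothetical responses: the second one accounts for only 40% of incidents.
sample = [Response(40, 30, 20, 10), Response(20, 10, 5, 5)]
print(flag_inconsistent(sample))
```

Flagging rather than silently rescaling such responses keeps the decision about how to treat them (discard, normalize, or follow up) with the analyst.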

The diagrams in this posting are licensed to you under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
Robert Falkowitz

About Robert Falkowitz

Robert S. Falkowitz is the founder and principal consultant of Concentric Circle Consulting, a company specializing in Service Management training and projects. He has practical experience in all phases of IT management, from software development through infrastructure design and package integration. He has also had roles in quality management, strategy and planning. Robert has provided services to companies in the aviation, logistics, pharmaceuticals, telephony, finance and banking industries, amongst others. With a doctorate from the University of Pennsylvania, he had a career in teaching and research at universities such as Yale, Chicago, Emory and Cornell before entering the commercial sector. He is ITIL V2 qualified at the Manager level and ITIL V3 qualified at the Expert level, holds the itSMF ISO/IEC 20000 Consultant qualification, and was awarded priSM's DPSM credential in 2011. Robert has been a member of the board of itSMF Switzerland since 2007, a founding member of the itSMF International Ethics Review Board, the Translation Officer of IPESC, and a member of the itSMF Publishing Editorial Advisory Task Force. A frequently invited lecturer, he is the author of IT Tools for the Business when the Business is IT: Selecting and Implementing IT service management tools, an itSMF International imprint published by TSO in 2011.
This entry was posted in Incident Management.

4 Responses to The causes of incidents

  1. Seymour Hosking says:

    I felt that the survey was a good start to get people like us thinking realistically about the causes of incidents. I realise now that I classified incidents as “events that raise someone’s blood pressure”.
    In my opinion users are very aware when IT makes a change, and IT is a sitting duck for blame. Users are not so aware of their own actions – I often tell a user “you must have pressed this key” but they insist they hadn’t!

    • Robert Falkowitz says:

      I think you open the discussion, Seymour, to an issue that will become increasingly important as IT support becomes more social and the difference between user and supporter fades. Who, in the end, will be responsible for classifying the causes of incidents? Will it stay with the supporter, will it become subject to "Likes", will the customer become the ultimate arbiter? Does it make any difference? There will be many new challenges in our brave new world.

  2. Pingback: Are 80% of all IT Incidents Change-Related? - ITSMTransition
