Most service management practitioners will agree that there is an advantage to using a shared, standard terminology when discussing how to manage services. To this end, the definitions provided by ITIL® are the most often cited. Some terms, however, continue to be used in diverse ways, adding to the confusion of those who attempt to follow industry standards. This may be due to the existence of multiple industry standards, to poetic license, to ignorance or, indeed, to laziness.
The Case of Severity
One such term is severity. Severity is normally used to describe an event or an incident.
However, some practitioners appear to use this term interchangeably with other attributes of events and incidents, such as impact or priority. I propose here a simple way of distinguishing severity from impact, one that is loosely derived from ITIL®.
Events or incidents may be viewed from two perspectives: how they affect the customers of a service and how they affect its provider. I propose that the term impact should describe the influence of an event or incident on the customers, while the term severity should describe its influence on the service provider.
Since an incident or event normally has a negative influence on service customers, impact is a measure of the loss of value to the customers. Services are intended to provide value to customers; incidents diminish that value. Impact tells us by how much. Ultimately, impact should be measurable in financial terms, but such measurement is often difficult or impractical while an incident or event is being resolved. Therefore, impact tends to be reduced to a rather arbitrary scale, such as high, medium or low. There are numerous indicators of the probable level of impact, including the number of users affected by the incident, the profiles of those users, the role that the affected components play in the delivery of services and the value of those services.
Whatever means are used to measure the probable or actual impact of an incident or event, the resulting assessment should be recognizable and verifiable by the customers. There is an important nuance here, however. Only the customer can really assess the actual impact of an incident, but the service provider and the customer may together assess its probable or potential impact. The service provider must make such assessments early in the management of events and incidents, even if the customer provides no input at all.
Severity measures the effort and expense required by the service provider to manage and resolve an event or incident. A few examples illustrate this definition. A failed disk that can be replaced using a hot swap is a less severe incident than one requiring the shutdown and opening of a computer chassis. The failure of a device costing 100,000 to replace is more severe than the failure of a device costing 10,000. An incident whose resolution requires five technicians and ten hours of work is more severe than one requiring one technician and ten minutes.
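Because severity is framed here as the provider's effort and expense, it can be roughed out as labour cost plus replacement cost. The following Python sketch is one possible model under stated assumptions, not an ITIL® formula: the `ResolutionEffort` fields, the `severity_cost` function and the flat hourly rate are all hypothetical, and the currency is left unspecified, as in the examples above.

```python
from dataclasses import dataclass


@dataclass
class ResolutionEffort:
    """Provider-side effort for one incident (hypothetical fields)."""
    technicians: int
    hours_per_technician: float
    replacement_cost: float = 0.0  # parts or devices replaced, in a single currency


def severity_cost(effort: ResolutionEffort, hourly_rate: float) -> float:
    """Estimate severity as the provider's expense: labour cost plus replacement cost.

    Assumes every technician is charged at the same flat hourly rate.
    """
    labour = effort.technicians * effort.hours_per_technician * hourly_rate
    return labour + effort.replacement_cost


# The two incidents from the text, with an assumed rate of 50 per hour.
big = severity_cost(ResolutionEffort(technicians=5, hours_per_technician=10), 50)
small = severity_cost(ResolutionEffort(technicians=1, hours_per_technician=10 / 60), 50)
print(big, round(small, 2))  # 2500.0 8.33
```

The five-technician, ten-hour incident scores far higher than the one-technician, ten-minute incident, matching the ordering described above. A fuller model might weight different kinds of expense differently.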
As with impact, severity should ultimately be measurable in financial terms, and such measurements are likewise much easier to make after the fact than during investigation and resolution. The major difference between impact and severity is that impact depends on how customers use a service, and those patterns of use are only indirectly influenced by the service provider. The same service provided to two customers who use it differently may lead to radically different impacts when it fails. A lost email message placing an order to buy or sell stocks will have a very different impact from a lost email thanking a customer for years of loyal patronage.
In fact, severity is often the result of a series of strategic and tactical choices made by the service provider. Severity is therefore directly influenced by, and under the control of, that organization. If incidents are more severe because an organization has not enforced any standards for the models and configurations of its servers, then that severity is the direct result of a decision (or of a decision not taken). Likewise, a service provider may choose not to invest in the ongoing development and education of its technicians. In that case, it should not be surprised if its staff take longer to diagnose and resolve incidents, making those incidents more severe than they would otherwise be.
Even though impact and severity describe the same event or incident, there is no necessary relationship between the degree of impact and the degree of severity of a single incident. Incidents that are simple to resolve may have a great impact, just as incidents of minor impact may be very difficult to resolve. Imagine, for example, an Internet bank cut off from the Internet because someone unplugged a cable: huge impact, trivial resolution. It may also happen that the severity of an incident is greater than it need be, perhaps because the service provider has a fragile, complicated and poorly understood infrastructure. In such cases, the incident is likely to have a greater impact than it need have, because of the length of time required to resolve it.
This latter example brings us to the use of the severity metric in managing IT. Severity measurements of events and incidents help to gauge, in common terms, the pain caused by the inadequacies of a service provider's resources and capabilities. Such pain points are obvious places to start improvements, especially if they represent quick wins. A Pareto analysis of the distribution of severity might be one factor in deciding how to prioritize improvement initiatives. I stress "one" because there should be some balance among operational, tactical and strategic improvements, whether or not there is pain.
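A Pareto analysis of this kind can be sketched by totalling severity per cause and keeping the smallest set of top-ranked causes that accounts for most of the total. The sketch below assumes each incident has already been tagged with a cause category and a numeric severity (for example, an estimated cost); the `pareto` function, the 80% default threshold and the sample data are hypothetical and are not drawn from any real organization.

```python
from collections import defaultdict


def pareto(incidents, threshold=0.8):
    """Rank causes by total severity and return the top causes that together
    reach at least `threshold` of the overall severity."""
    totals = defaultdict(float)
    for cause, severity in incidents:
        totals[cause] += severity

    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    grand_total = sum(totals.values())

    selected, cumulative = [], 0.0
    for cause, total in ranked:
        selected.append((cause, total))
        cumulative += total
        if cumulative >= threshold * grand_total:
            break  # the causes selected so far cover the threshold
    return selected


# Hypothetical sample: (cause category, severity expressed as estimated cost).
sample = [
    ("non-standard server configuration", 4000),
    ("disk failure", 500),
    ("non-standard server configuration", 3000),
    ("network cabling", 200),
    ("insufficient staff training", 2000),
    ("disk failure", 300),
]
print(pareto(sample))
```

In this made-up sample, two of the four causes account for 90% of the total severity, which would make them candidates for improvement, subject to the balance among operational, tactical and strategic initiatives.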
In theory, severity could be a metric used for comparisons with industry benchmarks. In practice, it is extremely difficult to compare such measurements meaningfully from one organization to another. Any attempt to benchmark severity should be approached with great caution.
To close this discussion, it is important to note that impact, and not severity, ought to be used to prioritize the handling of events and incidents. If a service provider puts low-severity incidents (meaning the ones that are easier to resolve) at the top of the queue, it is probably not well aligned with its customers’ priorities.
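The rule can be illustrated with a queue ordered by impact alone. In this Python sketch, the three-level `Impact` scale mirrors the high, medium and low scale mentioned earlier; the `Incident` fields, the sample severities and the oldest-first tie-break are assumptions, and severity is recorded but deliberately ignored when ordering.

```python
from dataclasses import dataclass
from enum import IntEnum


class Impact(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Incident:
    ident: str
    impact: Impact   # loss of value to the customer
    severity: float  # provider effort and expense; not used for ordering
    opened_at: int   # hypothetical arrival sequence number


def queue_order(incidents):
    """Highest impact first; among equal impacts, oldest first. Severity is ignored."""
    return sorted(incidents, key=lambda inc: (-inc.impact, inc.opened_at))


queue = [
    Incident("failed disk", Impact.LOW, severity=5000.0, opened_at=1),
    Incident("unplugged bank uplink cable", Impact.HIGH, severity=10.0, opened_at=2),
    Incident("slow mail delivery", Impact.MEDIUM, severity=300.0, opened_at=3),
]
print([inc.ident for inc in queue_order(queue)])
# ['unplugged bank uplink cable', 'slow mail delivery', 'failed disk']
```

The unplugged cable goes first despite its trivial severity, while the costly but low-impact disk replacement waits.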
Concepts, not Terms
I do not expect universal adoption of this understanding of severity, given that other meanings are sometimes well entrenched in a company’s culture. The importance of this discussion, then, lies not so much in the terms themselves as in the underlying concepts, each of which is distinct and provides a useful metric for the management of services.