Methodology
There is a close relationship between problem management and the major incident process.
An incident is any event that is not part of the standard operation of a service and that causes
an interruption or a reduction in the quality of that service. Incidents are recorded in a
standardized system which is used for documenting and tracking outages and disruptions. A
Major Incident is an unplanned or temporary interruption of service with severe negative
consequences. Examples are outages of core infrastructure equipment or services that
affect a significant customer base, such as the isolation of a company site. Any equipment
or service outage that does not meet the criteria for a Major Incident is classified as a
Moderate, Minor or Normal Incident. Major incident reports are escalated to the problem
manager for quality assurance.
Incident Pyramid
The scale of incidents follows an Incident Pyramid: most incidents are Normal and form the
base, with severity escalating up to a single Major Incident at the apex.
Iceberg
Incidents are the portion of problem management activity that forms the tip of an iceberg. The
major incident process deals with the visible portion of the iceberg, while the wider field of
problem management covers a large number of issues that are not yet visible.
Notification (issued within 6 working hours of trigger) <X>
Preliminary (issued within 6 working hours of workaround) <X>
Final (issued within 6 working hours of normal business operations) <X>
Details
Description
Service desk / Risk logging <References>
Trigger (who requested the report/notification) <Job title of person>
Service affected <Name in service catalogue>
Data networks <X> AD <X>
Messaging <X> Security <X>
Payments <X> Operations <X>
Voice <X> Service desk <X>
Hosting <X> Monitoring <X>
Intranet <X> Printing <X>
Documents <X> Third party <X>
Ecommerce <X> Extranet <X>
Backups <X> <X>
Storage <X> <X>
Identification (please clearly describe the incident and its symptoms – immediate and visual causes)
<Description of the incident or outage, including the symptoms displayed or experienced>
Conditions (please describe the environment – business or IT – conditions that caused or were present during
the incident)
<The business and IT conditions present when the incident or outage occurred>
Resolution
Initial (describe the workaround)
<Initial actions and any possible workaround>
Execution
Timelines (date and times) the expanded incident lifecycle
Time when incident started (actual – something has happened to a CI or a risk event has occurred) <dd/mm/yy> <hh:mm>
Time when incident was detected (incident is detected either by monitoring tools, IT personnel or, worst case, the user/customer) <dd/mm/yy> <hh:mm>
Time of diagnosis (underlying cause – we know what happened?) <dd/mm/yy> <hh:mm>
Time of repair (process to fix failure started or corrective action initiated) <dd/mm/yy> <hh:mm>
Time of recovery (component recovered – the CI is back in production – business ready to be resumed) <dd/mm/yy> <hh:mm>
Time of restoration (normal operations resume – the service is back in production) <dd/mm/yy> <hh:mm>
Time of workaround (service is back in production with workaround) <dd/mm/yy> <hh:mm>
Time of escalation (to problem management team) <dd/mm/yy> <hh:mm>
Time period service was unavailable (SLA measure) <minutes>
Time period service was degraded (SLA measure) <minutes>
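The two SLA measures above can be derived from the recorded timestamps. The following is a minimal sketch rather than a prescribed method. It assumes the timestamps are written as dd/mm/yy hh:mm, that "unavailable" runs from incident start to workaround, and that "degraded" runs from workaround to restoration. The form does not define these intervals, and the function names and sample values are hypothetical.

```python
from datetime import datetime

# Matches the form's <dd/mm/yy> <hh:mm> placeholders.
STAMP_FORMAT = "%d/%m/%y %H:%M"


def parse_stamp(text: str) -> datetime:
    """Parse a timestamp written as 'dd/mm/yy hh:mm'."""
    return datetime.strptime(text, STAMP_FORMAT)


def minutes_between(start: str, end: str) -> int:
    """Whole minutes elapsed from start to end; rejects reversed times."""
    delta = parse_stamp(end) - parse_stamp(start)
    if delta.total_seconds() < 0:
        raise ValueError("end time is earlier than start time")
    return int(delta.total_seconds() // 60)


# Hypothetical sample values, not taken from a real incident.
started = "03/02/24 09:15"
workaround = "03/02/24 10:00"
restored = "03/02/24 13:45"

# Assumption: unavailable = start to workaround, degraded = workaround to restoration.
unavailable_minutes = minutes_between(started, workaround)
degraded_minutes = minutes_between(workaround, restored)
print(unavailable_minutes, degraded_minutes)
```

If the organisation's SLA defines the intervals differently (for example, from detection rather than from actual start), only the two calls at the bottom need to change.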
Measurement
Function Please select the most appropriate
<Data networks, Messaging, Voice, Payments, Hosting, Intranet, Security,
Document management, AD, Storage, Service desk, Backups, Operations,
Third party, Printing, Monitoring, Ecommerce>
Cause Please select the most appropriate
<Availability, Configuration, Carrier, Service provider, Environmental, Bug,
Hardware, Vendor, Process, Capacity, Change>
Type (mark with a X) To calculate the IUM please select a single type which best describes the incident
Classification
Scope (Mark with a X) Dashboard designation = S
(4) More than 50% of customers affected <X>
(3) More than 25% of customers affected <X>
(2) Less than 25% of customers affected* <X>
(1) Less than 1% of users affected <X>
(0) Single IT customer affected <X>
Credibility (Mark with a X) Dashboard designation = CR
(4) Areas outside the company will be affected negatively <X>
(3) Company affected negatively <X>
(2) Multiple business units affected negatively <X>
(1) Single business unit affected negatively <X>
(0) No credibility issue* <X>
Operations (Mark with a X) Dashboard designation = OP
(4) Interferes with core business functions <X>
(3) Interferes with business activities* <X>
(2) Significant interference with completion of work <X>
(1) Some interference with normal completion of work <X>
(0) No work interference <X>
Urgency (Mark with a X) Dashboard designation = U
(4) Underway and could not be stopped <X>
(3) Caused by unscheduled change or maintenance <X>
(2) Incident caused by a change <X>
(1) Incident caused by scheduled maintenance <X>
(0) Completion time not important* <X>
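The four classification scores above (S, CR, OP and U) can be captured in a small record for the dashboard. This is a sketch under stated assumptions: the class and method names are hypothetical, and the "suggested priority" (the highest of the four scores) is only an illustrative starting point. The form leaves the final priority to the reviewer's judgement and does not define a formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    scope: int        # dashboard designation S
    credibility: int  # dashboard designation CR
    operations: int   # dashboard designation OP
    urgency: int      # dashboard designation U

    def __post_init__(self) -> None:
        # Every scale on the form runs from 0 to 4.
        for name, value in vars(self).items():
            if not 0 <= value <= 4:
                raise ValueError(f"{name} must be between 0 and 4, got {value}")

    def dashboard(self) -> dict:
        """Scores keyed by the form's dashboard designations."""
        return {
            "S": self.scope,
            "CR": self.credibility,
            "OP": self.operations,
            "U": self.urgency,
        }

    def suggested_priority(self) -> int:
        # Assumption: use the highest single score as a first suggestion only.
        return max(self.dashboard().values())


# Hypothetical example scores.
example = Classification(scope=2, credibility=1, operations=3, urgency=0)
print(example.dashboard(), example.suggested_priority())
```

Validating the 0 to 4 range at construction keeps an out-of-scale score from reaching the dashboard.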
Prioritization (Mark with a X) Dashboard designation = P
Reviewing the scope, credibility, operations and urgency, please classify the
priority of the incident
(4) Critical - An immediate and sustained effort using all <X>
available resources until resolved. On-call procedures
Outage analysis
Service period outage classification (Mark with a X) Dashboard designation = P
(4) Critical - App, server, link (network or voice) unavailable for greater than 4 hours or degraded for greater than 1 day – negative business delivery for more than 1 month <X>
(3) Major - App, server, link (network or voice) unavailable for greater than 1 hour or degraded for greater than 4 hours - negative business delivery for more than 1 week <X>
(2) Moderate - App, server, link (network or voice) unavailable for greater than 30 minutes or degraded for greater than 1 hour - negative business delivery for more than 1 day <X>
(1) Minor - App, server, link (network or voice) unavailable for greater than 5 minutes or degraded for greater than 30 minutes - negative business delivery for more than 1 hour <X>
(0) Low* - App, server, link (network or voice) unavailable for less than 5 minutes or degraded for less than 30 minutes - negative business delivery for less than 1 hour <X>
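The unavailable and degraded thresholds in the list above can be applied mechanically. The sketch below is illustrative only: the function name is hypothetical, "1 day" is assumed to mean 24 hours, a value exactly on a boundary falls into the lower band, and the "negative business delivery" criterion is left out because it needs a human assessment.

```python
from typing import Tuple

# (level, label, unavailable-minutes threshold, degraded-minutes threshold),
# ordered from most to least severe so the first match wins.
BANDS = [
    (4, "Critical", 4 * 60, 24 * 60),  # assumption: 1 day = 24 hours
    (3, "Major", 60, 4 * 60),
    (2, "Moderate", 30, 60),
    (1, "Minor", 5, 30),
]


def classify_service_period(unavailable_minutes: float,
                            degraded_minutes: float) -> Tuple[int, str]:
    """Return (level, label) for the service period outage classification."""
    for level, label, unavailable_limit, degraded_limit in BANDS:
        if unavailable_minutes > unavailable_limit or degraded_minutes > degraded_limit:
            return level, label
    return 0, "Low"


# Hypothetical example: 45 minutes unavailable, 225 minutes degraded.
print(classify_service_period(45, 225))
```

The result is best treated as a proposal for the reviewer, who can raise the level when the business delivery impact warrants it.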
Service consequence outage classification (Mark with a X) Dashboard
designation = C
(4) Critical - Financial loss, which puts a business unit in a critical position - greater than $10m or substantial loss of credibility or litigation or prosecution or fatality or disability. <X>
(3) Major - Financial loss which severely impacts the profitability of a business unit - greater than $1m or serious loss of credibility or sanction or impairment <X>
(2) Moderate - Financial loss which impacts the profitability of the business unit, greater than $100k or embarrassment or reported to regulator or hospitalization. <X>
(1) Minor - Financial loss with a visible impact on profitability but no real effect, greater than $10k or some embarrassment or rule or process breaches or medical treatment <X>
(0) Low* - Financial loss with no real effect, less than <X>
R50k or irritating or no legal or regulatory issue or no
Risk management
Risk impact (Mark with a X) Dashboard designation = I
Evaluate the data and information that is directly affected by the incident,
taking into account the involvement of the people, process, products and
partners.
“At Risk” issues
<People, process, products and partners>
Confidentiality (Information is accessible only to those authorized)
Secure <X>
Confidential <X>
Restricted* <X>
Public <X>
Integrity (Safeguarding the accuracy and completeness of information)
Very high <X>
High <X>
Moderate* <X>
Low <X>
Availability (Authorised users have access to information when required.)
Mandatory <X>
Very high <X>
High <X>
Moderate* <X>
Low <X>
Rating (taking into account the above, please rate the Risk impact)
(4)Critical <X> (3)Major <X> (2)Moderate <X> (1)Low <X> (0)None <X>
Risk vulnerability (Rate as either low, moderate, high or major) Dashboard designation = V
Rate the vulnerability in the following categories of the information or data that
is affected by the incident
Loss <low, moderate, high, major>
Error <low, moderate, high, major>
Failure* <low, moderate, high, major>
Rating (taking into account the above, please rate the Risk vulnerability)
(4)Critical <X> (3)Major <X> (2)Moderate <X> (1)Low <X> (0)None <X>
Countermeasures Dashboard designation = CM
What measures are in place to mitigate any risks identified with the
information or data affected by the incident?
<Due diligence>