
CRISIS MANAGEMENT PLAN

IKE S. GABRIEL

CRISIS MANAGEMENT OBJECTIVES


To keep the impact on traffic as small as possible if a catastrophic outage occurs.

To shorten the outage duration by containing the crisis as early as possible.

To inform higher management immediately after the outbreak of a catastrophic event in the network, so that all necessary support and resources can be deployed.

KEY NOC OBJECTIVE


100% OF NETWORK OUTAGES MUST BE DETECTED AND ESCALATED WITHIN 15 MINUTES

B. KEYS TO EFFECTIVE MANAGEMENT OF CATASTROPHIC EVENTS


- STRICT ADHERENCE TO THE ESCALATION PLAN
- INFORM HIGHER MANAGEMENT AT THE OUTSET OF A CATASTROPHIC EVENT
- ENGAGE THE RESOLVING UNIT AS EARLY AS POSSIBLE
- ASSIGN A MANAGEMENT STAFF MEMBER TO ACT AS THE CENTRAL POINT OF COORDINATION AND TO CONTROL THE FLOW OF INFORMATION

C. ESCALATION PLANNING
Guides the Crisis Management team on whom to call for help when facing a catastrophic outage/event, with the aim of containing the crisis.
Provides instructions on what initial action to take and after what time the next level of help must be contacted.

ESCALATION TIME FRAMES


The Crisis Management team must focus on the right problems at the right time, bring in the right competence when required, and use the escalation channels correctly, knowing that there is a single point of interface toward each channel (i.e., the interface toward the support group/vendor).

Escalation Call Flow


1. Occurrence of a catastrophic outage/event.
2. NOC: the alarm is monitored and the outage detected within 15 minutes.
3. Immediate assignment of the fault to a resolving unit; coordinate the fault/outage with the resolving units (RAFO, NTBN, DNS, CORE, etc.).
4. SIC informs the immediate head: the Manager of Pioneer NMC or Regional Cebu NMC, who MUST BE CONTACTED WITHIN 15 MINUTES.
5. If not contacted, escalate to the next-level management officer: contact the VP for NOC.
6. If not contacted, escalate to the next-level management officer: contact the Head of NOAT.
7. Second level of support required? Contact vendor local support.
8. Third level of support required? Collaborate with vendor local support and global TAC (R&D).
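To make the 15-minute rule in steps 4-6 concrete, the contact decision can be expressed as a small routine. This is an illustrative sketch only (Python); the function and the way contact attempts are recorded are our own assumptions, not part of the plan:

```python
# Escalation chain from the call flow above, in contact order.
ESCALATION_CHAIN = [
    "Manager of Pioneer NMC / Regional Cebu NMC",  # SIC's immediate head
    "VP for NOC",                                  # next-level management officer
    "Head of NOAT",                                # next-level management officer
]

CONTACT_WINDOW_MIN = 15  # each level must be contacted within 15 minutes


def coordinator(reached_after_min: dict) -> str:
    """Return the first contact reached within the window; otherwise keep escalating.

    reached_after_min maps a contact to the minutes it took to reach them,
    or None if they could not be reached at all (hypothetical input format).
    """
    for contact in ESCALATION_CHAIN:
        minutes = reached_after_min.get(contact)
        if minutes is not None and minutes <= CONTACT_WINDOW_MIN:
            return contact
    raise RuntimeError("No management contact reached; continue escalating manually")


# Example: the NMC manager could not be reached, the VP answered after 7 minutes.
print(coordinator({"Manager of Pioneer NMC / Regional Cebu NMC": None,
                   "VP for NOC": 7}))  # -> VP for NOC
```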

CATASTROPHIC OUTAGE MANAGEMENT ESCALATION MATRIX

ELAPSED TIME IN MINUTES: 0~15 | 16~60 | 61~120 | 121~240

DUTY NMC ENGINEER
- 0~15: Detects the outage and starts isolating the problem. Notifies the Head of NMC.
- 16~60: Coordinates with the Resolving Team. Monitors clearing of alarms and traffic normalization.
- 61~120: Continues coordination with the Resolving Team. Sends updates.
- 121~240: Continues coordination with the Resolving Team. Sends updates. Monitors clearing of alarms and traffic normalization. Collaborates with Vendor Local Support to contain the crisis. Correlates events and KPIs with outages.

DMPI O&M RESOLVING TEAM
- Deploys a Quick Reaction Team to the location of the crisis. Contacts the Vendor Support Team for assistance if necessary.
- O&M Support investigates the problem and attempts to neutralize it, seeking support from vendor experts if required. The vendor O&M team provides remote support while technical experts are on their way to the NOC/affected sites.
- Collaborates with Vendor Support to contain the crisis. Investigates work-arounds and triggers the contingency plan.

VENDOR LOCAL EXPERTS (3RD LEVEL)
- Vendor Local TAC takes over responsibility for neutralizing the fault and assigns a product expert to solve the problem.
- Collaborates with Global TAC/R&D to neutralize the fault. Works with DMPI/DTPI O&M support to resolve the problem.

VENDOR GLOBAL RESPONSE CENTER or R&D
- Vendor Global TAC takes over responsibility for neutralizing the fault. Works with the local team to resolve the problem.

HEAD OF NMC
- Notifies the Heads of NOC and NOAT. Engages the support team to determine the extent of the outage and find a workaround solution.
- Collaborates with the Heads of the O&M Resolving Team and decides on further action to be taken.
- Collaborates with the Heads of the O&M Resolving Team, gets updates from the NMC, briefs the Head of NOC/NOAT, and waits for instructions from the Head of NOC/NOAT on further action to be taken.

E. Officer-In-Charge Crisis Management


The OIC of the Crisis Management Team shall be responsible for managing catastrophic outages for a particular week. He/she will be responsible for escalating to higher management to dispatch the required support, and for making decisions as required to contain a crisis. He/she may be required to be at the CG2 NOC during catastrophic situations and must be reachable on two (2) phones.

Catastrophic Event/Outage Duty Officer roster (DUTY WEEK):

| Duty Officer     | 1st Duty | 2nd Duty | 3rd Duty | 4th Duty |
| Alex Galzote     | Week 1   | Week 14  | Week 27  | Week 40  |
| Arnold Melgarejo | Week 2   | Week 15  | Week 28  | Week 41  |
| Richard Cadungog | Week 3   | Week 16  | Week 29  | Week 42  |
| Tante Valdez     | Week 4   | Week 17  | Week 30  | Week 43  |
| Arnold Pedro     | Week 5   | Week 18  | Week 31  | Week 44  |
| PJ Capiral       | Week 6   | Week 19  | Week 32  | Week 45  |
| Melai Sabidong   | Week 7   | Week 20  | Week 33  | Week 46  |
| Alex Galzote     | Week 8   | Week 21  | Week 34  | Week 47  |
| Arnold Melgarejo | Week 9   | Week 22  | Week 35  | Week 48  |
| Richard Cadungog | Week 10  | Week 23  | Week 36  | Week 49  |
| Tante Valdez     | Week 11  | Week 24  | Week 37  | Week 50  |
| Arnold Pedro     | Week 12  | Week 25  | Week 38  | Week 51  |
| PJ Capiral       | Week 13  | Week 26  | Week 39  | Week 52  |
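Since the roster repeats the same 13-slot rotation in each of the four 13-week duty cycles, the assignment for any week can be computed directly rather than looked up. A minimal sketch (Python; the helper name is ours, not the plan's):

```python
# The 13-slot rotation from the roster above; it repeats every 13 weeks.
ROTATION = [
    "Alex Galzote", "Arnold Melgarejo", "Richard Cadungog", "Tante Valdez",
    "Arnold Pedro", "PJ Capiral", "Melai Sabidong",
    "Alex Galzote", "Arnold Melgarejo", "Richard Cadungog", "Tante Valdez",
    "Arnold Pedro", "PJ Capiral",
]


def duty_officer(week: int) -> str:
    """Return the OIC for a given week of the year (1..52)."""
    if not 1 <= week <= 52:
        raise ValueError("week must be between 1 and 52")
    return ROTATION[(week - 1) % 13]


print(duty_officer(15))  # Week 15 (2nd duty cycle) -> Arnold Melgarejo
```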

INSTRUCTIONS TO OIC-DUTY OFFICER


- Must be accessible 24x7 for the entire duration of his/her duty. If his/her present location has no coverage, he/she must inform the NOC Duty Engineer to forward calls to an ALTERNATE PHONE.
- Must have the necessary cash advances (EMERGENCY CASH FUND) and know whom to call to get support.
- If the Duty Officer will not be available, he/she must designate an alternate OIC as his/her replacement.
- May call the NOC from time to time to check the status of the network.
- At the outbreak of a catastrophic outage/event, the Officer-in-Charge will immediately call the Heads of NOC and NOAT to seek help in deploying support teams from other departments.

HUAWEI Support Service

HUAWEI Local Support Organization


- HQ Business Unit Head / Over-all Chair of Crisis Management: R. Frey / R. Zawila
- Country Head: C. Li (Deputy: Ma Xin)
- APAC TSD Head: Ma Xiao Bo
- Philippine TSD Head: Jack Ruan (leads the Crisis Management Team, supported by CHINA R&D)
- Crisis Management Team (Weekly Assigned Officer / O&M Heads):
  - Customer Support Head: Gary Cai
  - Service Delivery Mgr.: Peter Zhang
  - Maintenance Manager: Nelson Villoria
  - Technical Director: Xue Shi Jun
  - 2G/3G RAN LTAC: Zhang Chong
  - CS Core LTAC: Wang Guodong
  - PS Core LTAC: Xue Shi Jun
  - IN-VAS: Shu Peng
  - Router/Switch (Datacom): Zhou Zhi Hao
  - Transmission: Fang Yong Liang

Escalation Procedure Name List

HOTLINE: PH-TAC | 02 8190532 | phil_support@huawei.com

HW Top Manager Group:
| Position                 | Name                        | Email                   |
| Country President        | Li Wei                      | li.wei@huawei.com       |
| Account Director         | Ma Xin                      | maxin@huawei.com        |
| Account Manager          | Xu Jiaxiang                 | xujiaxiang@huawei.com   |
| TSD Director             | Ruan Jiahai (Jack Ruan)     | jhruan@huawei.com       |
| Service Delivery Manager | Zhang Ligang (Peter Zhang)  | zhangligang@huawei.com  |
| Customer Support Manager | Cai Gaoyang (Gary Cai)      | caigaoyang@huawei.com   |
| Technical Director       | Huang Zhan                  | huangzhan@huawei.com    |
| Maintenance Manager      | Huang Xiarong (Panda Huang) | huangxiarong@huawei.com |
| Digitel Network CTO      | Xue Shi Jun                 | xueshijun@huawei.com    |

Tel./Mobile Nos.: 0922 8482934, 0922 8016301, 0917 8306301, 0922 8006189, 0922 8990212, 0922 8990968, 0917 8679888, 0922 8341001, 0917 5954485, 0922 3613941, 0916 4188028, 0922 8850125, 0922 9508457, 0917 9017797, 0922 8991619, 0917 8513210

HW Maintenance Team:
| Position                    | Name           | Tel./Mobile No. | Email                    |
| Core Network Team Leader    | Wang Guodong   | 0922 3861982    | Wangguodongph@huawei.com |
| Wireless Team Leader        | Joel Sabidong  | 0922 8115635    | joelvs@huawei.com        |
| Data Com Team Leader        | Liu Dongbo     | 0922 5306045    | liudongbo@huawei.com     |
| A&S Team Leader             | Shu Peng       | 0933 9471514    | shupeng1@huawei.com      |
| Optical Network Team Leader | Fang Yongliang | 0908 1577115    | fangyongliang@huawei.com |


Maintenance Service Level Agreement

General terms:
- Service availability language: English
- Hotline service availability period: 7 days x 24 hours
- On-site support service availability period: 7 days x 24 hours

| Problem                                                                 | 1st Level (Catastrophic)          | 2nd Level (Critical)             | 3rd Level (Major) | 4th Level (Minor) |
| Hotline response time                                                   | <15 mins                          | <15 mins                         | <30 mins          | <5 days           |
| Departure time from notification (if on-site intervention is required)  | <20 mins                          | <3 hours                         | <12 hours         | Reasonable effort |
| Neutralization time (upon site arrival)                                 | <4 hours (2 hrs for Core Network) | <3 days (4 hrs for Core Network) | <6 days           | Next update       |
| Report time                                                             | <48 hours (after neutralization)  | -                                | -                 | Next update       |
| Final solution time (for bugs)                                          | <45 days                          | <60 days                         | <90 days          | Next update       |

- System patch (software update) service starting time: planning to be mutually agreed
- Emergency replacement: 7 days x 24 hours
- Lead time (for spares): 10 working days for the Hong Kong spare parts center replacement; 45 working days for the R&R Center in China

Note: The maintenance service for DMPI runs until 2012-12-31, but 80% of the DTPI maintenance service has already expired.
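The targets for the first three severity levels can be turned into concrete deadlines for a given notification. A sketch (Python; the structure and names are ours, and the 4th-level terms from the table are omitted for brevity):

```python
from datetime import datetime, timedelta

# Targets for levels 1-3 from the SLA table above.
SLA = {
    1: {"hotline": timedelta(minutes=15), "departure": timedelta(minutes=20),
        "neutralize": timedelta(hours=4)},   # Catastrophic (2 hrs for Core Network)
    2: {"hotline": timedelta(minutes=15), "departure": timedelta(hours=3),
        "neutralize": timedelta(days=3)},    # Critical (4 hrs for Core Network)
    3: {"hotline": timedelta(minutes=30), "departure": timedelta(hours=12),
        "neutralize": timedelta(days=6)},    # Major
}


def deadlines(level: int, notified: datetime, arrived=None) -> dict:
    """Hotline/departure deadlines run from notification; neutralization
    runs from site arrival, as the table specifies."""
    d = {"hotline_response": notified + SLA[level]["hotline"],
         "departure": notified + SLA[level]["departure"]}
    if arrived is not None:
        d["neutralization"] = arrived + SLA[level]["neutralize"]
    return d


print(deadlines(1, notified=datetime(2012, 6, 1, 2, 30)))
```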

DMPI-Huawei Escalation Interface


Escalation proceeds level by level, with information shared between DMPI and Huawei counterparts at each step:

- DMPI NOC Engineer <-> HW Engineer (direct info sharing)
- Within 5 minutes: DMPI Maintenance Leader <-> HW Maintenance Leader
- Within 5 minutes: DMPI Maintenance Manager <-> HW Maintenance Manager, who engages the HW HQ Expert Group within 10 minutes
- Within 5 minutes: DMPI Top Manager Group <-> HW Top Manager Group

HW Top Manager Group members: Service Delivery Manager, Customer Support Manager, TSD Director, Account Manager, Account Director, Country President.


Huawei DMPI Joint Maintenance Team


Sponsor: Rudi
Digitel Account Manager: Sam Xu (Xu Jiaxiang)
Philippines TSD Head: Jack Ruan (Ruan Jiahai)
Service Delivery Manager: Peter Zhang
Customer Support Dep. Manager: Gary Cai
NOC Manager (DMPI): Ike
Maintenance Delivery Manager: Huang Xiarong
DMPI Network CTO: Xue Shi Jun

Product teams (Huawei maintenance leader, Huawei engineers, DMPI NOC counterpart team):
- Core: Wang Guodong (Core Product Maintenance Leader); Core Product Engineers: Yi Zhan, Audi/Rene; DMPI NOC Core Team: Dolly Esplana, Julius Rodriguez
- Wireless: Joel V. Sabidong (Wireless Product Maintenance Leader); Wireless Engineers: Huang Guodong, Eric/Jay; DMPI NOC Wireless Team
- Datacom: Liu Dongbo (Data & Access Product Maintenance Leader); Data & Access Engineers: Richard Liu, Fumin/Cleo; DMPI NOC Datacom Team: Mike C.
- IN & VAS: Shu Peng (IN & VAS Product Maintenance Leader); IN & VAS Engineers: Yu Zhenhua/Joare/Shi Wei/Errol/Ryan/Lloyed; DMPI NOC IN & VAS Team: Jenny, Rhoda Campos
- Optical: Fang Yongliang (Optical & MW Product Maintenance Leader); Optical & MW Engineer: Mark Rey; DMPI NOC Optical Team


Responsibilities - Network Management Center Functions


The NMC's role, under the direction of the Head of Crisis Management, is to counter short-term disturbances and congestion in the network due to:
- Failure of the core network
- Failure of the transmission backbone system
- Accidental cutting of fiber cable or interconnection facilities
- Earthquake, flooding, fire, etc.
- Special events, exhibitions, and sporting events

In the event that such situations occur, the NMC has three ways to deal with them:
- Redirecting traffic through unaffected parts of the network
- Reducing the demand on the network by blocking lower-priority users
- Implementing traffic controls such as call gapping, SS7 link distribution, and BSC/RNC blocking, and activating MSC congestion reduction mechanisms

In case of disaster, for instance, priority might be given to police and other emergency services; in cases of national emergency (war), government and military traffic would be given priority. The admission logic is sketched below.
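This is an illustrative sketch only (Python): the class names and the gap interval are our assumptions, and in practice call gapping is configured on the switching and signaling equipment itself, not in application code. It only shows the decision of admitting priority users while gapping lower-priority demand during congestion.

```python
import time

# Illustrative priority classes; the plan's examples put emergency services first.
PRIORITY = {"emergency": 0, "government_military": 1, "ordinary": 2}


class CallGapper:
    """During congestion, admit at most one ordinary call attempt per gap
    interval; priority traffic is never gapped."""

    def __init__(self, gap_seconds: float):
        self.gap_seconds = gap_seconds
        self._next_allowed = 0.0  # monotonic time of the next admissible attempt

    def admit(self, subscriber_class: str, congested: bool) -> bool:
        if not congested or PRIORITY[subscriber_class] < PRIORITY["ordinary"]:
            return True
        now = time.monotonic()
        if now >= self._next_allowed:
            self._next_allowed = now + self.gap_seconds
            return True
        return False  # gapped: the attempt is rejected for now


gapper = CallGapper(gap_seconds=0.5)
print([gapper.admit("ordinary", congested=True) for _ in range(3)])
# -> [True, False, False] within one gap interval; "emergency" always passes
```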

A1. CONTENTS OF EMERGENCY BINDER


General
This chapter would contain the general information and guidelines, the prerequisites for using the emergency binder, and how to make an analysis of the fault situation.

Emergency Telephone Numbers


This chapter would contain the useful telephone/cell phone numbers that can be contacted during a crisis:
- Operation and Maintenance Center for BSS and Transmission
- Operations and Maintenance Center for NSS and GPRS
- Head, Network Management Center
- Head, Network Operations Center
- Regional Field Maintenance Centers and the managers of the different Regional Field Maintenance Centers
- Head of the ACCESS Field Operations Center
- Power and air-conditioning maintenance personnel
- Spare parts and logistics
- HUAWEI Local Support Center
- ERICSSON Local Support Center
- ACISION support personnel
- Police
- Fire brigade
- MERALCO and the other provincial electric companies
- Complete list of IN-VAS personnel
- Complete list of telephone numbers for all MSC locations

A2. CONTENTS OF EMERGENCY BINDER


EMERGENCY CATEGORIES
The following are some examples of emergency categories. The complete list is located in the appendix, summarized with a detailed description and alert code assignment. Each category has corresponding operational instructions, as attached in the appendices. The following define the different emergency situations:
- Cyclic Restarts: The NSS heading and restart heading are repeatedly output at the workstation. Restarts do not result in reloads even though they are less than 10 minutes apart.
- Cyclic Reload: One or more automatic system reloads of the MSC from the hard disk have failed to recover the NSS.
- System Stop: Failure of the MSC/HLR/VLR/MGW, GPRS, IP switches/routers, or IN-VAS to carry out a recovery attempt.
- Charging Failure (CDR Collector/Charging Gateway): The charging records function fails and all CDR devices are seized.
- System Overload / Reduced Traffic Handling: Substantial reduction of the traffic handling of the MSC without a restart, e.g., excessive call setup delay, SS7 instability, VLR instability, SS7 device lockout, abnormal processor load, route congestion, or external indicators.
- Power Failure: Power at the MSC, i.e., how to handle the situation when the batteries are reaching a critically low voltage.
- Loss of Interconnection Links: Calls to/from GLOBE/SMART/PLDT are impossible.
- Fire: In case of fire, special instructions are required to handle the emergency.
- Backbone Transmission Failure: A situation isolating an entire MSC or BSC.
- Flooding and Earthquake: Special instructions are required for personnel.

CRISIS CLASSIFICATION

| Description of Emergency Scenario | Class | Type | Alert Code |
| Total breakdown of interconnection links with other operators' interconnection networks (SMART, GLOBE & PLDT, IGF) | Interconnection Network | Quality of Service | Yellow |
| Total breakdown of BSC/RNC due to hardware fault | BSS/RNC | Hardware | Orange |
| Breakdown of the Media Gateway hardware affecting multiple BSS/RNC | BSC/RNC | Hardware | Orange |
| Breakdown of the SGSN/GGSN hardware | BSS/RNC | Hardware | Orange |
| Corrupted CDR data - too many erroneous CDRs (50% of the total CDRs) | CDR Collector | Application | Orange |
| Corrupt data - too many duplicated CDRs (50% of the total CDRs) on counter reports | CDR Collector | Application | Red |
| Total failure of all processes in the CDR Collector | CDR Collector | Application | Orange |
| Total breakdown of clustered server or Charging Gateway | Charging Gateway | Hardware | Orange |
| Total breakdown of MSC rectifier | ENVR_NSS | Hardware | Red |
| Total breakdown of MSC inverter | ENVR_NSS | Hardware | Red |
| Total breakdown of BSS/RNC DC power distribution | ENVR_NSS | Hardware | Yellow |
| Total breakdown of transmission backbone DC power system/distribution | ENVR_NSS | Hardware | Yellow |
| Failure of UPS emergency power | ENVR_NSS | Hardware | Yellow |
| Building fire alarm | ENVR_NSS | Hardware | Red |
| Breakdown of fire suppression system | ENVR_NSS | Hardware | Yellow |
| MSC high room temperature | ENVR_NSS | Hardware | Yellow |
| MSC door intrusion alert | ENVR_NSS | Hardware | Yellow |
| Breakdown of more than 2 MSC air-conditioning units | ENVR_NSS | Hardware | Orange |
| Breakdown of genset at the MSC | ENVR_NSS | Hardware | Orange |
| Day tank fuel critical level at MSC | ENVR_NSS | Hardware | Yellow |
| AC main failure to MSC rectifier | ENVR_NSS | Hardware | Yellow |
| Charging function stop from any GSN | 3G | Application | Yellow |
| No transfer of CDRs to CDR Collector | 3G | Application | Yellow |
| Breakdown of one CDR Collector server | 3G | Application | Orange |
| Breakdown of one CG LAN router/switch | 3G | Application | Orange |
| Total breakdown of Ethernet switch on Gi interface | 3G | Hardware | Orange |
| Total breakdown of DNS/DHCP | 3G | Hardware | Yellow |
| Total breakdown of Border GW | 3G | Hardware | Yellow |
| Data transfer between HLR and SGSN not possible | 3G | Hardware | Yellow |
| Breakdown of 3G IP backbone LAN switch | 3G | Hardware | Yellow |
| Total failure of Iub traffic to RNC | 3G | Transmission | Orange |
| Breakdown of SS7 signaling links to HLR | 3G | Transmission | Yellow |
| Corruption of vital service profile data | IN-PPS | Application | Yellow |
| Unable to perform deduction/refund via PMC | IN-PPS | Application | Orange |
| Periodic fee is not functioning | IN-PPS | Application | Orange |
| No generation of call detail records in the IN | IN-PPS | Application | Orange |
| Failure of Oracle database | IN-PPS | Database | Orange |
| Retrieval from backup not possible | IN-PPS | Database | Orange |
| Total breakdown of IN platform | IN-PPS | Hardware | Red |
| Shared hard drive failure | IN-PPS | Hardware | Red |
| Loss of a non-redundant element | IN-PPS | Hardware | Yellow |
| Repetitive changeover to standby platform | IN-PPS | Hardware | Orange |
| Detection of multiple changeovers to standby platform | IN-PPS | Hardware | Yellow |
| 100% of the bearer connection is down | IN-PPS | Network | Red |
| 100% signaling link failure on PCM/trunk | IN-PPS | Network | Red |
| Layer 3 switch routing/link failure | IN-PPS | Network | Red |
| Total loss of call processing | IN-PPS | Services | Red |
| Total failure of recharging functions for all prepaid accounts | IN-PPS | Services | Red |
| Impossibility to carry out a basic operation function | IN-PPS | Services | Orange |
| Unable to perform outgoing calls | IN-PPS | Services | Orange |
| Incorrect generation or loss of call records | Core Network | Application | Orange |
| Loss of connection to CDR Collector | Core Network | Application | Orange |
| Two consecutive switchovers for HLR | Core Network | Hardware | Red |
| Data transfer between HLR and MSC/VLR not possible | Core Network | Hardware | Red |
| Total breakdown of HLR | Core Network | Hardware | Red |
| Total breakdown of MSC or MGW | Core Network | Hardware | Red |
| Two consecutive switchovers for MSC | Core Network | Hardware | Red |
| SG M3UA overload for multiple MSCs | Core Network | Hardware | Red |
| Total loss of MSC call handling functions | Core Network | Transmission | Red |
| No location update for more than 70% of booked subscribers in VLR | Core Network | Transmission | Red |
| Total loss of the signaling links (SS7) MSC/HLR to SGW | Core Network | Transmission | Red |
| Breakdown of transmission to DIGITEL LEC (PSTN/IGF) | Core Network | Transmission | Orange |
| Breakdown of transmission to other PLMN (GLOBE & SMART) | Core Network | Transmission | Red |
| More than 50% of calls are rejected | Core Network | Transmission | Yellow |
| Total loss of connection between HLR and CDR Collector | Core Network | Transmission | Yellow |
| Total loss of network supervision to all MSC/MGW/SGW | OMC | Application | Yellow |
| Total loss of connection to more than 50% of the total number of transmission backbone nodes | OMC | Application | Yellow |
| Total loss of connection to GGSN or SGSN | OMC | Application | Yellow |
| Total loss of connection to OMC-R by more than one BSC | OMC | Application | Yellow |
| No network supervision for entire BSS network elements | OMC | Application | Yellow |
| Total breakdown of one or more OMC-x or INMS servers | OMC | Hardware | Yellow |
| Total loss of an application critical to monitoring | OMC | Hardware | Yellow |
| IP backbone down | OMC | Hardware | Yellow |
| Database corruption | OMC | Hardware | Yellow |
| More than 10% of the total BTS isolated due to transmission failure | Transmission | Network | Yellow |
| One (1) BSC isolated due to transmission failure | Transmission | Network | Yellow |
| One (1) RNC isolated due to transmission failure | Transmission | Network | Orange |
| One (1) MSC isolated due to transmission failure | Transmission | Network | Red |
| Database crash | VAS-SMSC | Database | Red |
| Loss of billing data | VAS-SMSC | Database | Red |
| Breakdown of 2 or more SMSC front-ends | VAS-SMSC | Hardware | Red |
| Total loss of SMSC interconnections to the SG | VAS-SMSC | Interface | Yellow |
| Total loss of SMSC interconnections to the Layer 3 switch | VAS-SMSC | Interface | Orange |
| Total failure of SMSC to handle SMS processing | VAS-SMSC | Services | Red |

CRISIS CODE LEVEL

CRISIS ALERT LEVEL COLOR CODE RESPONSE TIME

- Yellow: The mean response time of the Crisis Management Team to a Yellow-level emergency shall be within 4 hours from the time the crisis is escalated.
- Orange: The mean response time of the Crisis Management Team to an Orange-level emergency shall be within 2 hours from the time the crisis is escalated.
- Red: The mean response time of the Crisis Management Team to a Red-level emergency shall be within 1 hour from the time the crisis is escalated.
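Because the color codes map directly onto response-time targets, checking whether a given response met its target is a one-line comparison. A sketch (Python; the function name is ours):

```python
from datetime import timedelta

# Response-time targets from the crisis code level table above.
RESPONSE_TARGET = {
    "Yellow": timedelta(hours=4),
    "Orange": timedelta(hours=2),
    "Red": timedelta(hours=1),
}


def within_target(color: str, elapsed: timedelta) -> bool:
    """True if the Crisis Management Team responded within the target for
    this alert color, measured from the time the crisis was escalated."""
    return elapsed <= RESPONSE_TARGET[color]


print(within_target("Red", timedelta(minutes=45)))  # -> True
```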

F. EMERGENCY HANDLING ORIENTATION


Regular workshops in emergency handling are an important means of preparing the O&M personnel and the Crisis Management team. The preparation can take the form of regular training and of tests of the content of the emergency binder.

The head of the Crisis Management Team, who is responsible for emergency planning, should make an emergency training plan for all members of the Quick Reaction Team and the O&M personnel. The plan should be followed and followed up.
Additionally, the Head of the Crisis Management team can schedule a CRISIS DRILL to determine the responsiveness and readiness of the team and to validate the effectiveness of the crisis management plan.

10 Rules of Network Safety


1. Only qualified personnel are allowed to conduct network activities at all times: The implementing party who will perform an activity on the LIVE network must be trained personnel, in order to eliminate the chance of network outages or abnormalities occurring due to incompetence, lack of skills, or inexperience.
2. Always secure the necessary clearances and approvals: All network activities that need to be conducted in a LIVE network environment must be covered by an approved WORAP. The MOP shall be strictly followed, carrying out only the specific network activities it covers.
3. Follow the standard operating procedures recommended by the vendors, for both hardware and software, at all times.
4. Ensure that activities are implemented within the prescribed maintenance window: As a general rule, network activities in a LIVE network environment will be implemented during off-peak traffic hours, which are from 1:00 am to 5:00 am. Any necessary exceptions must have the written permission of the Heads of NOC and NOAT.
5. Confirm the operating state of the network before starting the activity: Before entering a site or accessing a network element remotely to perform an approved activity, the implementing personnel must advise the NOC team and wait for confirmation to proceed before doing so. They must also verify that no relevant alarms exist prior to starting the activity.
6. Verify that the network equipment is operating normally after completion of the activity: After having performed the approved network activity successfully, on-site or remotely, the implementing personnel must advise the NOC team of the conclusion of the activity and get confirmation that no alarms or other abnormalities have manifested arising from the activities performed.
7. Only implement activities that are covered by an approved WORAP: If certain operations need to be implemented but are not included in the WORAP/MOP, they must not be executed without the written permission of the Head of NOC or NOAT.
8. In case of the outbreak of a catastrophic outage, immediately contact the head of the crisis management team for that day: The NOC duty engineer must notify the crisis management team within 15 minutes from detection.
9. Never connect unauthorized devices before, during, or after the activity: It is not allowed to connect personal portable devices or storage media such as erasable compact discs, USB drives, or portable hard disks to any network equipment unless this is a requirement of the WORAP.
10. Playing network games or logging on to unauthorized websites from any NMS/OMC/maintenance terminal is STRICTLY not allowed.

END OF PRESENTATION
