
NETWORK OPERATING CENTER

CHECK LIST AND PROCEDURES

NETWORK OPERATING CENTER PROCESSES & ROLES


DOCUMENT AUTHOR
Camille Bertrand KITE camille.kite@aes.com tel 00 237 79 50 01 55

DOCUMENT IDENTIFICATION
Word file: NOC process & roles - English.doc

DOCUMENT STATUS

Being drafted / Being modified / Validated

Being corrected / Being validated / Obsolete

Corrections denote minor changes to the document (new release). Modifications denote major changes to the document (new version).

DOCUMENT CONFIDENTIALITY

Public document / Sensitive document (operations team)

Private document (internal company use) / Highly sensitive document (recipients only)

12/03/2008

Page 1


DOCUMENT CONTROL
DOCUMENT VERIFICATION
Verifier / Approver
Name: Bernard Fanga
Date:
Initials: BF

DISTRIBUTION LIST
Recipients / Contact details

DOCUMENT HISTORY
Version: 1.0
Date: 11/10/2010
Operation and comment: Update


CONTENTS
1. ROLES
1.1 Monitor
1.2 Manage
1.3 Troubleshoot
2. FUNCTIONS
2.1 Performance monitoring
2.2 Status monitoring
2.3 Alert management
2.4 Policy monitoring
2.5 Quality assurance
2.6 Reporting
2.7 Schedule
2.8 Documentation
3. ESCALATION LEVEL BY CARRIER
3.1 MTN
3.2 Orange
3.3 CAMTEL
3.4 ITC GLOBAL
4. ESCALATION LEVEL - AES SONEL
5. ESCALATION PROCESS NOC AES SONEL
6. INCIDENT MANAGEMENT


1. ROLES
The AES SONEL NOC is a team with key roles within the IT infrastructure. In general, the team has to:
- Monitor
- Manage
- Troubleshoot

1.1 MONITOR
The NOC has to monitor the entire network system: routers, switches, UPS units, servers, and so on, so as to manage the WAN, MAN, and LAN in all AES-SONEL facilities. This equipment has resources such as CPU, memory, NIC, storage/flash, power supply, battery charge level, temperature, network equipment interfaces, links, etc. We have to check this information and collect events, because they help the NOC be proactive.

1.2 MANAGE
Some parameters of this equipment have to be optimized. A form must be filled in and approved by a committee before anything is changed on the equipment. The NOC can also adapt its reports to the needs of management, which helps management take the right decisions.

1.3 TROUBLESHOOT
When an incident happens and nobody knows its source, the NOC has to investigate and find the problem. Once the problem is found, the NOC fixes it if it has the authority to do so, or transfers it to the service or person in charge of that task.


2. FUNCTIONS
The NOC covers a large share of the functions around the IT system. Its main functions are:

Level 1:
- Performance monitoring
- Status monitoring
- Policy monitoring
- Incident management
- Open, update, and close trouble tickets
- Periodic activities
- Reporting
- Documentation
- Other duties as assigned

Level 2:
- Monitor data communications networks to ensure that networks are available to all system users
- Monitor datacenter infrastructures
- Resolve and document data communications problems
- Develop and follow troubleshooting procedures in an effort to resolve problems
- Contact users to correct and maintain network operations
- Escalate problems as needed to engineering staff
- Record daily network statistics
- Open, update, and close trouble tickets
- Update documentation to record new equipment installed, new sites, and changes to configurations
- Coordinate installation of communications equipment
- Install communications equipment
- Schedule operations on IT facilities
- Quality assurance
- Other duties as assigned


Given these functions, the NOC works as a back-end team that solves problems affecting a site, a building, or the whole network. End-user problems are handled by the HelpDesk.

2.1 PERFORMANCE MONITORING


Network equipment must be monitored continuously. The NOC checks how resources are used throughout the day. If an alarm is raised (high CPU or memory consumption, etc.), the NOC team must report the problem.
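
As a minimal sketch of the alarm check described above, assuming resource usage is collected as percentages per device, alarms could be derived as follows. The 80 % and 90 % limits and the device names are placeholders; this document does not define threshold values.

```python
# Hypothetical thresholds in percent; this document does not specify values.
THRESHOLDS = {"cpu": 80.0, "memory": 90.0}


def find_alarms(samples, thresholds=THRESHOLDS):
    """Return (device, resource, value) for every reading above its threshold.

    `samples` maps a device name to a dict of resource -> usage in percent.
    Resources without a configured threshold are ignored.
    """
    alarms = []
    for device, usage in sorted(samples.items()):
        for resource, value in sorted(usage.items()):
            limit = thresholds.get(resource)
            if limit is not None and value > limit:
                alarms.append((device, resource, value))
    return alarms


if __name__ == "__main__":
    readings = {
        "router-1": {"cpu": 93.5, "memory": 41.0},   # illustrative device names
        "switch-7": {"cpu": 12.0, "memory": 95.2},
    }
    for alarm in find_alarms(readings):
        print("ALARM", alarm)
```

Keeping the thresholds in one table makes them easy to review and change without touching the check itself.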

2.2 STATUS MONITORING


The NOC team tracks the status of every device installed on the network. If a device is down or unreachable, the team must report the incident as soon as possible by following the escalation procedure.

2.3 ALERT MANAGEMENT


The NOC team must raise an alert when an incident occurs. If the team cannot solve the problem, it must escalate as soon as possible according to the contract with the supplier or, if the incident is internal, pass it to the Network team.

2.4 POLICY MONITORING


To be defined in the next version.

2.5 QUALITY ASSURANCE


To be defined in the next version.

2.6 REPORTING
The NOC team has to produce:
- weekly and monthly reports on the performance and status monitoring of network equipment
- monthly KPI reports

In summary:

Weekly reports:
- Availability of the network

Monthly reports:
- KPI report of the network
- Availability of the entire network and of each MAN/WAN operator link
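
The availability figure in these reports can be illustrated with a short sketch. It assumes availability is the share of the reporting period during which a link was up, computed from recorded outage intervals; the function name and input shape are illustrative and not taken from any NOC tool.

```python
from datetime import datetime, timedelta


def availability_percent(period_start, period_end, outages):
    """Percentage of the reporting period during which the link was up.

    `outages` is a list of (down_at, up_at) datetime pairs. Each outage is
    clipped to the reporting period. Overlapping outages are not merged, so
    the input is assumed to contain non-overlapping intervals.
    """
    period = (period_end - period_start).total_seconds()
    if period <= 0:
        raise ValueError("period_end must be after period_start")
    down = 0.0
    for down_at, up_at in outages:
        start = max(down_at, period_start)
        end = min(up_at, period_end)
        if end > start:
            down += (end - start).total_seconds()
    return 100.0 * (period - down) / period


if __name__ == "__main__":
    start = datetime(2010, 10, 4)
    end = start + timedelta(days=7)
    outage = (start + timedelta(hours=10), start + timedelta(hours=12))
    print(f"{availability_percent(start, end, [outage]):.2f} %")
```

The same function serves the weekly and the monthly report: only the period boundaries change.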

2.7 SCHEDULE
To cover the NOC team's tasks, a NOC schedule is put in place, based on two teams working in round robin. The hours are:
- Group 01: from 07H00 to 15H30, with a break from 12H00 to 13H00
- Group 02: from 10H00 to 18H30, with a break from 13H00 to 14H00

During the month, the two groups alternate:
- Group 01: 1st and 3rd weeks
- Group 02: 2nd and 4th weeks

On Saturday, the group that starts at 07H00 works from 08H00 to 14H00.

2.8 DOCUMENTATION
Each incident has to be documented. Every action carried out by a supplier or an employee must have an intervention report.

2.9 PREREQUISITES:
LEVEL 1:

Network Operations Center
- Knowledge of network operations.
- Knowledge of layer data communications protocols.
- Previous experience with tools used in monitoring the network.
- Ability to troubleshoot network problems effectively in a network operations environment.

- Maintain a broad knowledge of all products, services, and NOC procedures.
- Strong interpersonal, verbal, and written communication skills.
- Excellent organizational, multitasking, prioritizing, and teamwork skills.
- Ability to work independently with little supervision.
- Ability to qualify for security clearance.

LEVEL 2:

Network Operations Center
- Knowledge of CISCO routers and switches, and of VSAT network operations.
- Knowledge of layer data communications protocols.
- Ability to verify that switches and routers, as well as their configured network services and protocols, operate as intended within a given network specification.
- Previous experience with tools used in monitoring the network, including datacenter management tools.
- Ability to troubleshoot network problems effectively in a network operations environment.
- Maintain a broad knowledge of all products, services, and NOC procedures.
- Strong interpersonal, verbal, and written communication skills.
- Excellent organizational, multitasking, prioritizing, and teamwork skills.
- Ability to work independently with little supervision.
- Ability to qualify for security clearance.


3. ESCALATION LEVEL BY CARRIER


Escalation follows the contracts currently signed between the two parties. Escalation continues until the incident is closed.

3.1 MTN
HOTLINE number to call for incidents: 7126

Level | Delay
1 | Immediately
2 | 1/2 hour
3 | 1 hour
4 | 2 hours
5 | 4 hours

Contacts: NOC MTN; NOC Coordinator; Solution & Service Support Manager; Account Manager; Senior Operations Manager; Senior Manager Corporate Sales; Chief Technical Officer

The persons currently in place on the escalation list:

Contact | Person | Cellular | E-mail
NOC MTN | Network Operations Center | 79 00 92 13 | noc@mtncameroon.net
NOC Coordinator | Armand Pichele | 77 55 04 61 | Pichel_A@mtncameroon.net
Solution and Service Support Manager | Samuel PII | 77 55 02 58 | Pii_s@mtncameroon.net
Account Manager | Augustin MIAFFO | 77 55 03 51 | Miaffo_A@mtncameroon.net
Senior Operations Manager | Pierre Paul BISSOMBI | 77 55 10 97 | Bissom_p@mtncameroon.net
Senior Manager Corporate Sales | Alain MORE | 77 55 05 13 | More_A@mtncameroon.net
Chief Technical Officer | Gilbert NGONO | 77 55 10 01 | Ngono_C@mtncameroon.net


3.2 ORANGE
HOTLINE number to call for incidents: 96 40 04 00

Level | Delay
1 | Immediately
2 | 4 hours
  | More than 4 hours / critical incident

Contacts: NOC ORANGE; Infrastructure Manager; Director of Operations; Senior Project Manager; OCMS Director; Deputy Managing Director in charge of technical and administrative matters

The persons currently in place on the escalation list:

Contact | Person | Cellular | E-mail
NOC ORANGE | Network Operations Center | 96 40 04 00 | supporttechnique.internet@orange.cm
Infrastructure Manager | Martin BIYICK | 99 94 98 81 | martin.biyick@orange.cm
Director of Operations | Serge NAFTEUR | 99 94 28 38 | serge.nafteur@orange.cm
Senior Project Manager | Daniel Parfait NLEND | 99 94 12 20 | danielparfait.nlend@orange.cm
OCMS Director | Jean Michel CANTO | 99 94 01 04 | jeanmichel.canto@orange.cm
Deputy Managing Director in charge of technical and administrative matters | Alain MARQUIS | 99 94 08 08 | alain.marquis@orange.cm


3.3 CAMTEL
HOTLINE number to call for incidents: ?

Level | Delay | Contact
1a | Immediately | SAT3 operations room
1b | Immediately | Douala Internet Service
1c | Immediately | Yaoundé Internet Service
2 | 1 hour | Littoral Technical Manager
3 | 2 hours | After-Sales Service Division
4 | 4 hours | AES Account Manager
5 | 6 hours | Sales Manager

The persons currently in place on the escalation list:

Contact | Cellular
SAT3 operations room | 33410755
Douala Internet Service | 22 02 04 53
Yaoundé Internet Service | 22 00 73 33
Littoral Technical Manager | 33 02 13 55
After-Sales Service Division | 22 02 01 12
AES Account Manager | 33 00 30 03
Sales Manager | 22 00 12 91

Persons: DONFACK LOWE Bertin AKOUA Anicet KAMA Bienvenu EYOUM KOUADIO Chantal NJI AWA Mathias
E-mail: bertinlowe2002@yahoo.fr


3.4 VSAT NETWORK (TO BE DETERMINED)


...


4. ESCALATION LEVEL - AES SONEL


NOC number: 5777

Level | Delay | Contact
1 | Immediately | NOC AES SONEL
  | 1/2 hour | Network Team; Infrastructures Team; Application Team
  | 3/4 hour | Division Head; Deputy Director of Infrastructure; Head of the Infrastructure and Networks Division
  | 1 hour | DSI

The persons currently in place on the escalation list:

Contact | Person | E-mail
NOC | | sonel.noc@aes.com
NOC AES SONEL | Sylvain BITHE | sylvain.bithe@aes.com
NOC AES SONEL | Theodore BELL | intern.tbell@aes.com
NOC AES SONEL | Nicolas TONGO MOUSSONGO | nicolas.tongo@aes.com
Network Team | Gabriel OYONO | gabriel.oyono@aes.com
Network Team | Felix NGOH | felix.ngoh@aes.com
Network Team | Daniel Claude MOFEN |
Network Team | Sidonie MBWANG |
Infrastructures Team | Camille KITE | Camille.kite@aes.com
Infrastructures Team | Christian N. AWOMO E. | christian.awomo@aes.com
Infrastructures Team | Aimé Claude TAMPOO | aimeclaude.tampoo@aes.com


Contacts: ITSC Service Head; Application Team Leader; Network and Infra Team Leader; Deputy Director; DSI

Persons: DOOH EDIBE EWANDJE Joseph Moïse Christian NOLA ZE; Bernard FANGA; Jean Louis NGAMBY

E-mail: Edibe.dooh@aes.com; Christian.nolaze@aes.com; Bernard.fanga@aes.com; jeanlouis.ngamby@aes.com


5. ESCALATION PROCESS NOC AES SONEL


WHO: NOC

1. WHEN: the WAN/MAN link is unreachable.
   HOW:
   - Open an incident ticket.
   - ESCALATION 1 (immediately): define the LIST of persons to alert: NOC Provider, NOC AES SONEL, NETWORK AES SONEL.
   - Send the alert by mail to all recipients in LIST.
   Incident solved after 1/2 hour? If YES, go to step 4. If NO, go to step 2.

2. WHEN: 1/2 hour after the incident occurs.
   HOW:
   - ESCALATION 2 (1/2 hour): add persons to the LIST.
   - Send the alert by mail to all recipients in LIST, with the elapsed time of the incident.
   Incident solved after 1 hour? If YES, go to step 4. If NO, go to step 3.

3. WHEN: every hour the incident is not solved.
   HOW:
   - ESCALATION 3 (1 hour): add some persons to the LIST.
   - Send the alert by mail to all recipients in LIST, with the elapsed time of the incident.
   Incident solved after X hours? If NO, repeat step 3. If YES, go to step 4.

4. Stop the alert.
   - Close the incident.
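
The timing of this process can be sketched in a few lines. The sketch assumes the delays read from the process above (escalation 1 immediately, escalation 2 after half an hour, escalation 3 after one hour, then a reminder every further hour); the half-hour value is an assumption where the source text is unclear, and the function names are illustrative.

```python
from datetime import timedelta

# Delays per escalation level, as read from the process description.
# The 30-minute value for level 2 is an assumption.
ESCALATION_DELAYS = [
    (1, timedelta(0)),
    (2, timedelta(minutes=30)),
    (3, timedelta(hours=1)),
]


def escalation_level(elapsed):
    """Highest escalation level whose delay has been reached after `elapsed`."""
    level = 0
    for lvl, delay in ESCALATION_DELAYS:
        if elapsed >= delay:
            level = lvl
    return level


def alerts_due(elapsed):
    """Elapsed times at which an alert mail should already have been sent.

    After escalation 3, one reminder is due for every further full hour
    during which the incident is still open.
    """
    due = [delay for _, delay in ESCALATION_DELAYS if delay <= elapsed]
    reminder = timedelta(hours=2)
    while reminder <= elapsed:
        due.append(reminder)
        reminder += timedelta(hours=1)
    return due


if __name__ == "__main__":
    open_for = timedelta(hours=3, minutes=10)
    print("level:", escalation_level(open_for))
    print("alerts sent at:", [str(t) for t in alerts_due(open_for)])
```

Driving both the level and the alert times from one delay table keeps the mail schedule consistent with the escalation tables for each carrier.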


6. INCIDENT MANAGEMENT
An incident is opened each time a link goes down. The information needed to manage an incident is:
- Incident number (identifier)
- Date and time the incident occurred
- Type of incident
- Description of the incident
- Interventions carried out to solve the incident (TSP & ISP / AES SONEL)
- Date and time the incident was closed
- Time elapsed from detection of the incident to its resolution
- Optionally, a copy of the work permit for the intervention
- Report of the troubleshooting operations carried out

At the end of each week and each month, a report shows how many incidents were opened and closed. These reports are useful for statistics and for analysing how the carrier and AES SONEL react when an event occurs. They can also be used to apply penalties to payments. Useful reports include:
- Opened incidents, closed incidents, percentage of incidents solved
- Incidents opened and closed by provider, with minimum, maximum, and average resolution times
- Incidents opened and closed by escalation level
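
As a hedged illustration of these reports, the sketch below models an incident with a subset of the fields listed above and computes the counts and resolution times per provider. The class and function names are hypothetical and do not correspond to any ticketing tool named in this document.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Incident:
    number: str
    provider: str
    opened_at: datetime
    closed_at: Optional[datetime] = None  # None while the incident is open

    @property
    def resolution_hours(self):
        """Hours from opening to closing, or None if still open."""
        if self.closed_at is None:
            return None
        return (self.closed_at - self.opened_at).total_seconds() / 3600.0


def summarize(incidents):
    """Counts, percentage solved, and min/max/average resolution time."""
    closed = [i for i in incidents if i.closed_at is not None]
    times = [i.resolution_hours for i in closed]
    return {
        "opened": len(incidents),
        "closed": len(closed),
        "percent_solved": 100.0 * len(closed) / len(incidents) if incidents else 0.0,
        "min_hours": min(times) if times else None,
        "max_hours": max(times) if times else None,
        "avg_hours": sum(times) / len(times) if times else None,
    }


def summarize_by_provider(incidents):
    """Apply `summarize` to the incidents of each provider separately."""
    groups = {}
    for incident in incidents:
        groups.setdefault(incident.provider, []).append(incident)
    return {provider: summarize(group) for provider, group in groups.items()}
```

Grouping by escalation level would follow the same pattern, with the level stored as an extra field on the incident.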

7. NETWORK TROUBLESHOOTING OVERVIEW


Network troubleshooting means recognizing and diagnosing networking problems with the goal of keeping your network running optimally. As a network administrator, your primary concern is maintaining connectivity of all devices (a process often called fault management). You also continually evaluate and improve your network's performance. Because serious networking problems can sometimes begin as performance problems, paying attention to performance can help you address issues before they become serious.

7.1.1 About Connectivity Problems

Connectivity problems occur when end stations cannot communicate with other areas of your local area network (LAN) or wide area network (WAN). Using management tools, you can often fix a connectivity problem before users even notice it. Connectivity problems include:

- Loss of connectivity - When users cannot access areas of your network, your organization's effectiveness is impaired. Immediately correct any connectivity breaks.
- Intermittent connectivity - Although users have access to network resources some of the time, they are still facing periods of downtime. Intermittent connectivity problems can indicate that your network is on the verge of a major break. If connectivity is erratic, investigate the problem immediately.
- Timeout problems - Timeouts cause loss of connectivity, but are often associated with poor network performance.

7.1.2 About Performance Problems

Your network has performance problems when it is not operating as effectively as it should. For example, response times may be slow, the network may not be as reliable as usual, and users may be complaining that it takes them longer to do their work. Some performance problems are intermittent, such as instances of duplicate addresses. Other problems can indicate a growing strain on your network, such as consistently high utilization rates. If you regularly examine your network for performance problems, you can extend the usefulness of your existing network configuration and plan network enhancements, instead of waiting for a performance problem to adversely affect the users' productivity.
7.1.3 Solving Connectivity and Performance Problems

When you troubleshoot your network, you employ tools and knowledge already at your disposal. With an in-depth understanding of your network, you can use network software tools, such as "Ping", and network devices, such as "NMS", to locate problems, and then make corrections, such as swapping equipment or reconfiguring segments, based on your analysis. So you can:


- Baseline the network's normal status to use as a basis for comparison when the network operates abnormally
- Precisely monitor network events
- Be notified immediately of critical problems on your network, such as a device losing connectivity
- Establish alert thresholds to warn you of potential problems that you can correct before they affect your network
- Resolve problems by disabling ports or reconfiguring devices
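
One possible way to turn a baseline into an alert threshold is sketched below. It assumes readings taken during normal operation are available as a list of numbers and uses the mean plus `k` standard deviations as the threshold; both this rule and the default `k = 3` are illustrative choices, not something this document prescribes.

```python
from statistics import mean, pstdev


def baseline_threshold(history, k=3.0):
    """Alert threshold derived from readings taken during normal operation.

    Computed as mean + k * population standard deviation of `history`.
    """
    if len(history) < 2:
        raise ValueError("need at least two baseline readings")
    return mean(history) + k * pstdev(history)


def is_abnormal(value, history, k=3.0):
    """True when `value` exceeds the threshold derived from `history`."""
    return value > baseline_threshold(history, k)


if __name__ == "__main__":
    normal_utilization = [10, 12, 11, 13, 9]  # illustrative link utilization in percent
    print("threshold:", round(baseline_threshold(normal_utilization), 2))
    print("20 % abnormal?", is_abnormal(20, normal_utilization))
```

A larger `k` produces fewer alerts but reacts later; the right value depends on how noisy the measured resource is.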

8. TROUBLESHOOTING STRATEGY
If you notice changes on your network, ask the following questions:

- Is the change expected or unusual?
- Has this event ever occurred before?
- Does the change involve a device or network path for which you already have a backup solution in place?
- Does the change interfere with vital network operations?
- Does the change affect one or many devices or network paths?

After you have an idea of how the change is affecting your network, you can categorize it as critical or noncritical. Both of these categories need resolution (except for changes that are one-time occurrences); the difference between the categories is the time that you have to fix the problem. By using a strategy for network troubleshooting, it is possible to approach a problem methodically and resolve it with minimal disruption to network users. It is also important to have an accurate and detailed map of your current network environment. Beyond that, a good approach to problem resolution is:

- Recognition of Symptoms
- Understanding the Problem
- Identifying and Testing the Cause of the Problem
- Solving the Problem

8.1 RECOGNITION OF SYMPTOMS

The first step to resolving any problem is to identify and interpret the symptoms. You may discover network problems in several ways. Users may complain that the network seems slow or that they cannot connect to a server. You may pass your network management station and notice that a node icon is red. Your beeper may go off and display the message: WAN connection down.


8.1.1.1 User Comments

Although you can often solve networking problems before users notice a change in their environment, you invariably get feedback from your users about how the network is running, such as:

- They cannot print.
- They cannot access the application server.
- It takes them much longer to copy files across the network than it usually does.
- They cannot log on to a remote server.
- When they send e-mail to another site, they get a routing error message.
- Their system freezes whenever they try to Telnet.

8.1.1.2 Network Management Software Alerts

Network management software, as described in "Your Network Troubleshooting Toolbox", can alert you to areas of your network that need attention. For example:

- The application displays red (Warning) icons.
- Your weekly Top-N utilization report (which indicates the 10 ports with the highest utilization rates) shows that one port is experiencing much higher utilization levels than normal.
- You receive an e-mail message from your network management station that the threshold for broadcast and multicast packets has been exceeded.

These signs usually provide additional information about the problem, allowing you to focus on the right area.

8.1.1.3 Analyzing Symptoms

When a symptom occurs, ask yourself these types of questions to narrow the location of the problem and to get more data for analysis:

- To what degree is the network not acting normally (for example, does it now take one minute to perform a task that normally takes five seconds)?
- On what subnetwork is the user located?
- Is the user trying to reach a server, end station, or printer on the same subnetwork or on a different subnetwork?
- Are many users complaining that the network is operating slowly or that a specific network application is operating slowly?
- Are many users reporting network logon failures?
- Are the problems intermittent? For example, some files may print with no problems, while other printing attempts generate error messages, make users lose their connections, and cause systems to freeze.


8.2 UNDERSTANDING THE PROBLEM

Networks are designed to move data from a transmitting device to a receiving device. When communication becomes problematic, you must determine why data are not traveling as expected and then find a solution. The two most common causes for data not moving reliably from source to destination are:

- The physical connection breaks (that is, a cable is unplugged or broken).
- A network device is not working properly and cannot send or receive some or all data.

Network management software can easily locate and report a physical connection break (layer 1 problem). It is more difficult to determine why a network device is not working as expected, which is often related to a layer 2 or a layer 3 problem. To determine why a network device is not working properly, look first for:

- Valid service - Is the device configured properly for the type of service it is supposed to provide? For example, has Quality of Service (QoS), which is the definition of the transmission parameters, been established?
- Restricted access - Is an end station supposed to be able to connect with a specific device or is that connection restricted? For example, is a firewall set up that prevents that device from accessing certain network resources?
- Correct configuration - Is there a misconfiguration of IP address, subnet mask, gateway, or broadcast address? Network problems are commonly caused by misconfiguration of newly connected or configured devices.
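
The "correct configuration" check can be partly automated. The sketch below uses Python's standard `ipaddress` module to test whether a gateway and a broadcast address are consistent with a host address and subnet mask; the function name `check_ip_config` and the wording of its messages are illustrative.

```python
import ipaddress


def check_ip_config(address, netmask, gateway, broadcast=None):
    """Return a list of human-readable problems; an empty list means none found."""
    try:
        iface = ipaddress.ip_interface(f"{address}/{netmask}")
        gw = ipaddress.ip_address(gateway)
        bc = ipaddress.ip_address(broadcast) if broadcast is not None else None
    except ValueError as exc:
        return [f"invalid value: {exc}"]

    problems = []
    network = iface.network
    if gw not in network:
        problems.append(f"gateway {gw} is outside {network}")
    elif gw == iface.ip:
        problems.append("gateway is the same as the host address")
    # On /31 and /32 networks every address is usable, so skip this check there.
    if network.prefixlen < 31 and iface.ip in (network.network_address,
                                               network.broadcast_address):
        problems.append(f"{iface.ip} is the network or broadcast address of {network}")
    if bc is not None and bc != network.broadcast_address:
        problems.append(f"broadcast {bc} should be {network.broadcast_address}")
    return problems


if __name__ == "__main__":
    for problem in check_ip_config("192.168.1.10", "255.255.255.0", "192.168.2.1"):
        print("PROBLEM:", problem)
```

Such a check only validates that the values are consistent with each other; it cannot tell whether they match the addressing plan of the site.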

8.3 IDENTIFYING AND TESTING THE CAUSE OF THE PROBLEM

After you develop a theory about the cause of the problem, test your theory. The test must conclusively prove or disprove your theory.


Two general rules of troubleshooting are:


- If you cannot reproduce a problem, then no problem exists unless it happens again on its own.
- If the problem is intermittent and you cannot replicate it, you can configure your network management software to catch the event in progress.

Although network management tools can provide a great deal of information about problems and their general location, you may still need to swap equipment or replace components of your network until you locate the exact trouble spot. After you test your theory, either fix the problem as described in "Solving the Problem" or develop another theory.
8.3.1.1 Sample Problem Analysis

This section illustrates the analysis phase of a typical troubleshooting incident. On your network, a user cannot access the mail server. You need to establish two areas of information:

- What you know - In this case, the user's workstation cannot communicate with the mail server.
- What you do not know and need to test:
  - Can the workstation communicate with the network at all, or is the problem limited to communication with the server? Test by sending a "Ping" or by connecting to other devices.
  - Is the workstation the only device that is unable to communicate with the server, or do other workstations have the same problem? Test connectivity at other workstations.
  - If other workstations cannot communicate with the server, can they communicate with other network devices? Again, test the connectivity.

The analysis process follows these steps:

1. Can the workstation communicate with any other device on the subnetwork?

- If no, then go to step 2.
- If yes, determine if only the server is unreachable.
  - If only the server cannot be reached, this suggests a server problem. Confirm by doing step 2.
  - If other devices cannot be reached, this suggests a connectivity problem in the network. Confirm by doing step 3.

2. Can other workstations communicate with the server?

- If no, then most likely it is a server problem. Go to step 3.
- If yes, then the problem is that the workstation is not communicating with the subnetwork. (This situation can be caused by workstation issues or a network issue with that specific station.)

3. Can other workstations communicate with other network devices?

- If no, then the problem is likely a network problem.
- If yes, the problem is likely a server problem.
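
The three steps above can be written as a small decision function. This is a sketch that takes the answers to the questions as booleans (gathered, for example, with "Ping" tests) and returns the suspected area; the parameter names are illustrative.

```python
def _step3(others_reach_devices):
    """Step 3: can other workstations communicate with other network devices?"""
    return "server" if others_reach_devices else "network"


def diagnose(ws_reaches_peers, only_server_unreachable,
             others_reach_server, others_reach_devices):
    """Follow steps 1 to 3 and return 'workstation', 'server', or 'network'.

    ws_reaches_peers        -- step 1: the workstation reaches some other device
    only_server_unreachable -- step 1: the server is the only unreachable device
    others_reach_server     -- step 2: other workstations reach the server
    others_reach_devices    -- step 3: other workstations reach other devices
    """
    # Step 1, "yes" branch with other devices unreachable too:
    # suspected connectivity problem, confirmed with step 3.
    if ws_reaches_peers and not only_server_unreachable:
        return _step3(others_reach_devices)
    # Step 2 is reached when step 1 is "no", or to confirm a suspected
    # server problem.
    if others_reach_server:
        return "workstation"
    # Step 2 answered "no": go to step 3.
    return _step3(others_reach_devices)


if __name__ == "__main__":
    # The workstation reaches nothing, but its neighbours reach the server.
    print(diagnose(False, False, True, True))
```

The result only names the area to examine next; the follow-up checks for each area still have to be carried out by hand.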

When you determine whether the problem is with the server, subnetwork, or workstation, you can further analyze the problem, as follows:

- For a problem with the server - Examine whether the server is running, if it is properly connected to the network, and if it is configured appropriately.
- For a problem with the subnetwork - Examine any device on the path between the users and the server.
- For a problem with the workstation - Examine whether the workstation can access other network resources and if it is configured to communicate with that particular server.

8.3.1.2 Equipment for Testing

To help identify and test the cause of problems, have available:

- A laptop computer that is loaded with a terminal emulator, TCP/IP stack, TFTP server, CD-ROM drive (to read the online documentation), and some key network management applications. With the laptop computer, you can plug into any subnetwork to gather and analyze data about the segment.
- A spare managed hub to swap for any hub that does not have management. Swapping in a managed hub allows you to quickly spot which port is generating the errors.
- A single port probe to insert in the network if you are having a problem where you do not have management capability.
- Console cables for each type of connector, labeled and stored in a secure place.

8.3.2 Solving the Problem

Many device or network problems are straightforward to resolve, but others yield misleading symptoms. If one solution does not work, continue with another. A solution often involves:


- Upgrading software or hardware (for example, upgrading to a new version of agent software or installing Gigabit Ethernet devices)
- Balancing your network load by analyzing:
  - What users communicate with which servers
  - What the user traffic levels are in different segments
  Based on these findings, you can decide how to redistribute network traffic.
- Adding segments to your LAN (for example, adding a new switch where utilization is continually high)
- Replacing faulty equipment (for example, replacing a module that has port problems or replacing a network card that has a faulty jabber protection mechanism)

To help solve problems, have available:

- Spare hardware equipment (such as modules and power supplies), especially for your critical devices
- A recent backup of your device configurations to reload if flash memory gets corrupted (which can sometimes happen due to a power outage)

9. SERVER MONITORING POLICY


OVERVIEW
This server monitoring policy is an internal IT policy and defines the monitoring of servers in the organization for both security and performance issues.

PURPOSE
This policy is designed to protect the organization against loss of service by providing minimum requirements for monitoring servers. It provides for monitoring servers for file space and performance issues to prevent system failure or loss of service.

SCOPE
This policy applies to all production servers and infrastructure support servers including but not limited to the following types of servers:
1. File servers
2. Database servers
3. Mail servers
4. Web servers
5. Application servers
6. Domain controllers
7. FTP servers

8. DNS servers

DAILY CHECKING
All servers shall be checked manually on a daily basis. The following items shall be checked and recorded:
1. The amount of free space on each drive shall be recorded in a server log.
2. Services shall be checked to determine whether any services have failed.
3. The status of backup of files or system information for the server shall be checked daily.

EXTERNAL CHECKS
Essential servers shall be checked using either a separate computer from the ones being monitored or a server monitoring service. The external monitoring service shall have the ability to notify multiple IT personnel when a service is found to have failed. Servers to be monitored externally include:
1. The mail server
2. The web server
3. External DNS servers
4. Externally used application servers
5. Database or file servers supporting externally used application servers or web servers
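
The first daily check (free space per drive, recorded in a server log) could be scripted along the following lines. This is a sketch using the standard-library `shutil.disk_usage`; the 10 % warning level and the log line format are assumptions, since the policy does not define them.

```python
import shutil
from datetime import datetime


def free_space_report(paths, min_free_percent=10.0):
    """Return one log line per path with its free space and an OK/LOW flag.

    `min_free_percent` is a hypothetical warning level, not a policy value.
    """
    lines = []
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    for path in paths:
        usage = shutil.disk_usage(path)
        free_pct = 100.0 * usage.free / usage.total if usage.total else 0.0
        flag = "LOW" if free_pct < min_free_percent else "OK"
        lines.append(f"{stamp} {path} free={free_pct:.1f}% {flag}")
    return lines


if __name__ == "__main__":
    # Check the drive holding the current directory; list real mount
    # points or drive letters here in practice.
    for line in free_space_report(["."]):
        print(line)
```

Appending these lines to a dated file would give the server log required by the policy; the service and backup checks need tools specific to each server.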


Checklist columns: CATEGORIES, ACTIONS, STATUS, OBSERVATIONS

Network links
- Record the statistics shown on the NMS dashboards
- Review the results of the latest site testing
- Check and update the list of incidents being resolved
- Produce the availability report with Nagios for MTN, OCMS, Camtel, AES
- Communicate the information to the related teams

Infrastructures
- Check the state of the UPS units with Nagios or Centreon
- Check the state of the generators
- Check the state of the servers
- Check the state of the cooling systems
- Check the integrity of the network equipment (Router, Switch, Firewall, ...)

Services
- SAGE 1000
- BSA
- ORACLE DB
- xSQL DB
- CITRIX
- INTERNET WEB ACCESS
- Outlook Web Access
- AES SONEL CONTACT
- All others Internal Portal
- DHCP
- DNS
