
TECHNICAL COMMUNICATION OmniPCX 4400

URGENT NOT URGENT

No. TC0296
Nb of pages : 11

Date : 15-03-2002

TEMPORARY

PERMANENT

SUBJECT : COLLECTING INFORMATION IN CASE OF CPU CRASHES

This technical communication cancels and replaces the technical communication TC0057.

This technical communication gives tips and tricks about CPU problems such as reboots, the telephone application stopping, or being unable to connect to the system.

OmniPCX 4400
COLLECTING INFORMATION IN CASE OF CPU CRASHES

CONTENTS
1. INTRODUCTION
2. TIPS AND TRICKS ABOUT CPU PROBLEMS
    2.1. Who has initialised the reboot of the system?
        2.1.1. Automatic shutdown
    2.2. The telephone application stops
    2.3. One CPU reboots continuously
        2.3.1. Problem with the IO2 configuration
        2.3.2. Problem with the IO2 board
        2.3.3. Problem with OPS files
    2.4. Database corruption
    2.5. Checking the installation of the software version
    2.6. Checking the V24 ports
    2.7. Capturing the system information
3. HARDWARE INVESTIGATIONS
4. INFORMATION TO BE COLLECTED TO OPEN A SERVICE REQUEST
    4.1. Description of the problem
    4.2. Description of the investigations
    4.3. Log files of the incidents
    4.4. Log files of the telephone exceptions
    4.5. Log files of the black boxes
    4.6. Log files of the telephone application
    4.7. Log files of Chorus
    4.8. Log files of the system
    4.9. List of the boards of the installation
    4.10. Configuration of the system
    4.11. Type of CPU
    4.12. References of the CPU
    4.13. Detection of a memory corruption
    4.14. Crashdump file


1. INTRODUCTION

This technical communication gives tips and tricks about CPU problems such as reboots, the telephone application stopping, or being unable to connect to the system.

If no solution is found, hardware checks must be carried out. If the problem is still not solved, the Business Partner must open a Service Request of the observation sheet type. The observation sheet must contain the information listed at the end of this document (section 4).

Many different log files are stored on the system. Each reset of the system replaces some of the previous log files with new ones, so the log files relating to a specific problem must be collected as soon as possible, before the information is lost.

2. TIPS AND TRICKS ABOUT CPU PROBLEMS

2.1. Who has initialised the reboot of the system?

2.1.1. Automatic shutdown

Display the text files /DHS3dyn/incid/incpbm.1, /DHS3dyn/incid/incpbm.2 and /DHS3dyn/incid/incpbm.3.

A line containing mailsys asks 'shutdown &' means that the software application launched the reboot itself; nobody started the shutdown manually.

    0005 Fri Jan 25 15:42:30 2002 mailsys asks 'echo `ps|wc -l` processes running&'
    0006 Fri Jan 25 15:42:30 2002 mailsys asks 'shutdown &' at 25/01/02 15:42:30 ; S
    0007 Fri Jan 25 15:42:31 2002 mailsys asks 'echo `ps|wc -l` processes running&'

In this case, the list of incidents on the system then contains a gravity 0 incident that confirms the reboot. The reboot was started because a failure was detected in the system. The reason for the failure may also be indicated in the incident list.

Note
When analysing the incidents of the system, check that no filter is active and that all the incidents are displayed. Even lower-priority incidents can indicate the reason for the reboot.
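Once the incpbm files have been copied to a PC, the search for an automatic shutdown can be scripted. The following Python sketch is only an illustration: the line layout is inferred from the three-line excerpt in section 2.1.1 and may differ on other releases, and the function name find_automatic_shutdowns is hypothetical.

```python
import re

# Assumed layout, inferred from the excerpt in section 2.1.1:
# "<seq> <weekday> <month> <day> <hh:mm:ss> <year> mailsys asks '<command>' ..."
SHUTDOWN_RE = re.compile(
    r"^(?P<seq>\d{4}) (?P<stamp>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) "
    r"mailsys asks 'shutdown"
)

def find_automatic_shutdowns(text):
    """Return (sequence number, timestamp) for each line where mailsys asks a shutdown."""
    hits = []
    for line in text.splitlines():
        match = SHUTDOWN_RE.match(line.strip())
        if match:
            hits.append((match.group("seq"), match.group("stamp")))
    return hits

# The three lines quoted in section 2.1.1.
sample = """\
0005 Fri Jan 25 15:42:30 2002 mailsys asks 'echo `ps|wc -l` processes running&'
0006 Fri Jan 25 15:42:30 2002 mailsys asks 'shutdown &' at 25/01/02 15:42:30 ; S
0007 Fri Jan 25 15:42:31 2002 mailsys asks 'echo `ps|wc -l` processes running&'
"""
print(find_automatic_shutdowns(sample))
```

A non-empty result points to a reboot launched by the software itself; the incident list should then be checked for the gravity 0 incident around the same timestamp.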


2.2. The telephone application stops

The telephone application has stopped but the system does not restart. There is no V24 or IP access to the system, and the screen on the console port remains blank.

Try to generate a crashdump manually. The crashdump is a copy of the complete memory of the system. The crashdump file will have to be added to the observation sheet.

On the console port only, proceed as follows:
- start a "Capture Text" on the Windows terminal,
- to enter the kernel debugger, press Ctrl G then Ctrl O; the following message appears:

    ** Entry in the debugger requested,
    ** Press <Return> within 30 seconds to enter into the kernel debugger

- press the Return key immediately; the following message appears:

    --- Enter Debugger
    kdbg>

- to start the crashdump (it takes a few seconds):

    kdbg> ED

- to reboot the system:

    kdbg> ZZ

- add the "Capture Text" to the observation sheet together with the crashdump file; see section 4.14 for the way to extract the crashdump file.

2.3. One CPU reboots continuously

2.3.1. Problem with the IO2 configuration

If there are IO2/IO2N boards, check that the management of these boards is correct and identical on the Main and Stand-By CPUs. After the reboot of the CPU, stop the launch of the telephone application. From software version C1.712 onwards, the following commands allow you to consult and modify the management of the IO2 boards even if the telephone application is stopped:

    login : mtcl
    a4400> RUNMAO
    a4400> mgr

2.3.2. Problem with the IO2 board

When the IO2 board is available and managed, it has the switching role: it takes the place of the IO1 embedded in the CPU. When the reboots cannot be explained, it is worth replacing it, as a test, by an IO2N board if possible. As the software is different, the system will also react differently, which can provide information on the initial fault. Attach the result of this test to the observation sheet.

Note
The IO2N is only taken into account from certain software releases onwards. Please refer to the technical communication TC0192 - Installation procedure of IO2N boards. When an IO2 or IO2N board is installed with a CPU, the same type of IO2 board must be installed with the duplicated CPU (if any).

2.3.3. Problem with OPS files

Normally, the same OPS files are installed on both CPUs. If this is not the case and the field PARA_MAO 1 of the file hardware.mao is different, a permanent CPU reboot can occur with incident 2076 or 2070:

    2076 = Size of a TEL's or remanent's region is not the same on main and stand-by CPU

or, for Releases 1.4/2.x:

    2070 = Swapping mode is not the same on main and stand-by CPU

This may occur when you install new OPS files on the Stand-By CPU and the size of the remanent data is modified. In this case, the telephone application must be completely stopped in order to install the new OPS files on both CPUs.
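When a long incident capture has to be searched for the two incident numbers quoted in section 2.3.3, a small script can help. The Python sketch below is an assumption-laden illustration: it simply looks for the numbers as standalone four-digit tokens, because the exact layout of the incident display is not described in this document, so a match must always be confirmed by reading the line itself. The function name and the sample line are hypothetical.

```python
import re

# Incident numbers and meanings quoted from section 2.3.3.
OPS_MISMATCH_INCIDENTS = {
    "2076": "Size of a TEL's or remanent's region is not the same on main and stand-by CPU",
    "2070": "Swapping mode is not the same on main and stand-by CPU",
}

def find_ops_mismatch_incidents(incident_text):
    """Return the listed incident numbers that appear as standalone 4-digit tokens."""
    tokens = set(re.findall(r"\b\d{4}\b", incident_text))
    return sorted(tokens & set(OPS_MISMATCH_INCIDENTS))

# Hypothetical capture line, for illustration only (not real system output).
capture = "25/01/02 15:42:30 incident 2076 raised"
for number in find_ops_mismatch_incidents(capture):
    print(number, "=", OPS_MISMATCH_INCIDENTS[number])
```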

2.4. Database corruption

In case of a database corruption, some incidents are listed in the incidents file. You can also use the following commands to check the database:

    a4400> cd /DHS3data/mao
    a4400> checkinitrem

If a corruption of the database is suspected, you have to restore a save of the database. In recent versions, a save of the database is made automatically every day, by default, on the hard disk of the CPU. To restore a save of the database, use the following commands:

    login : swinst
    Option 4 : Select Save & Restore operations
    Option 4 : Select Restore operations
    Option 2 : Select Restore from cpu disk

Choose the save file to restore from among those present on the disk.


2.5. Checking the installation of the software version

The integrity test checks that the software version has been correctly installed on the CPU. Use the following commands:

    login : swinst
    Option 8 : Select the menu Software identity display
    Option 6 : Application software validity checking

Select the partition to be checked, then check the result of the integrity test. If there is a problem (a message of the "Checksum incorrect" type is displayed), the software must be loaded again.

2.6. Checking the V24 ports

In case of a CPU3:
- if a modem or TA is connected to a V24 port with login, it must not be managed as follows:
    - for a modem: Hayes codes returned in reply to commands, and local echo,
    - for a TA: a welcome menu,
- check for the presence of a client application that dialogs with a V24 port of the CPU. A loop can occur if the V24 port of the CPU is managed with a login, and the system could reboot.

Trick
Use the following command to verify the activity of the V24 ports:

    a4400> sar -y 1 <nb>

where <nb> is the number of scans on the port (example: 20). The parameter 1 means one scan every second. This command provides the number of bytes sent and received on the serial ports.

    (410)xa004010> sar -y 1 20
    Chorus xa004010 MiX V.3.2r4.1.5 r4.1.5 COMP-386 01/25/102
    15:25:18 rawch/s canch/s outch/s rcvin/s xmtin/s mdmin/s
    15:25:19       0       0     124       0     124       0
    15:25:20       0       0      58       0      58       0
    15:25:21       0       0      58       0      58       0
    15:25:22       0       0      58       0      58       0
    etc. up to 20 lines

Check the outch/s field; it indicates the number of characters sent on all V24 ports.
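If the sar capture is long, the outch/s column can be extracted with a short script on a PC. The following Python sketch is only an illustration built on the sample output above: it assumes whitespace-separated integer columns under a header row containing outch/s, and the threshold of 100 characters per second is an arbitrary value chosen for the example, not a figure from this document. The function names are hypothetical.

```python
def parse_sar_y(text):
    """Parse a captured `sar -y` output into a list of (time, {column: value}) samples."""
    columns = None
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        if "outch/s" in fields:
            # Header row: the first field is a time, the rest are column names.
            columns = fields[1:]
        elif columns and len(fields) == len(columns) + 1:
            try:
                values = [int(value) for value in fields[1:]]
            except ValueError:
                continue  # not a data row
            samples.append((fields[0], dict(zip(columns, values))))
    return samples

def busy_samples(samples, threshold):
    """Return the sample times at which outch/s exceeds the given threshold."""
    return [time for time, row in samples if row["outch/s"] > threshold]

# First lines of the sample output shown in this section.
capture = """\
(410)xa004010> sar -y 1 20
Chorus xa004010 MiX V.3.2r4.1.5 r4.1.5 COMP-386 01/25/102
15:25:18 rawch/s canch/s outch/s rcvin/s xmtin/s mdmin/s
15:25:19 0 0 124 0 124 0
15:25:20 0 0 58 0 58 0
15:25:21 0 0 58 0 58 0
15:25:22 0 0 58 0 58 0
"""
samples = parse_sar_y(capture)
print(busy_samples(samples, threshold=100))  # prints ['15:25:19']
```

A steady non-zero outch/s value while nobody is using the ports is the kind of pattern that suggests a loop between the CPU and a connected device.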

2.7. Capturing the system information

In case of frequent CPU problems without any explanation, start a continuous "Capture Text" on the Windows terminal of the PC connected to the console port. The system messages are not stored and are output only on this port. This trace can be added to the observation sheet.


3. HARDWARE INVESTIGATIONS

When the software checks have been done and the problem is not fixed, the hardware must be investigated, for example:
- CPU board: swap the CPU with another board,
- power supply,
- grounding connection,
- fault on the ET board (located in the bottom left-hand cabinets),
- power rectifier too low,
- back panel,
- external environment, etc.

4. INFORMATION TO BE COLLECTED TO OPEN A SERVICE REQUEST

Whenever a CPU crash occurs and, after the hardware has been checked, you cannot find the cause or provide a solution, an observation sheet must be prepared with all the information described below.

4.1. Description of the problem

The observation sheet must describe the problem in detail and answer the following questions:
- Is the telephone application still running?
- Did the system reset?
- Did the system reset by itself?
- Did the system recover from the problem by itself?
- Did somebody manually do something to reboot the system?
- What is displayed on the UA sets during the problem?
- Do you have tone?
- Can you connect to the system during the problem?
- Is it a new installation?
- If this is a new problem on an old system, what has been modified on the installation?
- What is the frequency of the problem?
- When did it happen (date and time)?
- Does the OPS configuration conform to the customer requirements (traffic, virtual sets, etc.)?

4.2. Description of the investigations

The observation sheet must indicate the investigations that were conducted on site: board swapping, item replacement, checks, etc. This information avoids the TS asking for a test that has already been carried out on site.


4.3. Log files of the incidents

The log files of the incidents are stored on the system as follows:

    <----"incidents 2"----><----"incidents 1"----><-"current incidents"->
    |----------------------|----------------------|----------------------|---> time
    reboot 2               reboot 1               last reboot            NOW

Add the results of the following commands to the observation sheet:

    a4400> incvisu
    a4400> incvisu -1
    a4400> incvisu -2

Check that all the incidents are displayed and no incident is filtered.

4.4. Log files of the telephone exceptions

The log files of the exceptions are stored on the system as follows:

    <----"exceptions 2"----><----"exceptions 1"----><-"current exceptions"->
    |-----------------------|-----------------------|-----------------------|---> time
    reboot 2                reboot 1                last reboot             NOW

Add the results of the following commands to the observation sheet:

    a4400> excvisu
    a4400> excvisu -1
    a4400> excvisu -2

4.5. Log files of the black boxes

The log files of the black boxes are stored in the directory /tmpd as follows:

    <---blackbox.3---><---blackbox.2---><---blackbox.1---><----blackbox---->
    |-----------------|-----------------|-----------------|-----------------|---> time
    reboot 3          reboot 2          reboot 1          last reboot       NOW

Add the results of the following commands to the observation sheet:

    a4400> readbbox
    a4400> readbbox -1
    a4400> readbbox -2
    a4400> readbbox -3
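The commands of sections 4.3 to 4.5 follow the same pattern: the bare command shows the current log, and -1, -2 (and -3 for the black boxes) show the logs kept from earlier reboots. As a memory aid, the full list can be generated with a few lines of Python. This is only a sketch for preparing a checklist on a PC; the helper name history_commands is hypothetical and the script does not run anything on the system.

```python
# Deepest history index per command, as listed in sections 4.3 to 4.5.
HISTORY_DEPTH = {"incvisu": 2, "excvisu": 2, "readbbox": 3}

def history_commands(depths=HISTORY_DEPTH):
    """Return each command for the current log, then for each older log (-1, -2, ...)."""
    commands = []
    for name, depth in depths.items():
        commands.append(name)
        commands.extend(f"{name} -{index}" for index in range(1, depth + 1))
    return commands

for command in history_commands():
    print(command)
```

Ticking off each printed line while the "Capture Text" is running helps make sure that no log generation is forgotten before the next reset overwrites it.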


4.6. Log files of the telephone application

The log files of the telephone application are stored in the directory /tmpd as follows:

    <--DHS3-INIT.log3--><--DHS3-INIT.log2--><--DHS3-INIT.olog--><--DHS3-INIT.log--->
    |-------------------|-------------------|-------------------|-------------------|---> time
    reboot 3            reboot 2            reboot 1            last reboot         NOW

Add a copy of the four text files to the observation sheet: /tmpd/DHS3-INIT.log, /tmpd/DHS3-INIT.olog, /tmpd/DHS3-INIT.log2, /tmpd/DHS3-INIT.log3.

4.7. Log files of Chorus

The log files of Chorus are stored in the directory /etc as follows:

    -----boot.log3---><---boot.log2----><---boot.log1----><----boot.log---->
    ------------------|-----------------|-----------------|-----------------|---> time
                      reboot 2          reboot 1          last reboot       NOW

Add the results of the following command to the observation sheet:

    a4400> traceboot v

4.8. Log files of the system

The log files of the system are stored in the directory /DHS3dyn/incid as follows:

    -----incpbm.3----><----incpbm.2----><----incpbm.1----><-----incpbm----->
    ------------------|-----------------|-----------------|-----------------|---> time
                      reboot 2          reboot 1          last reboot       NOW

Add a copy of the three text files to the observation sheet: /DHS3dyn/incid/incpbm.1, /DHS3dyn/incid/incpbm.2, /DHS3dyn/incid/incpbm.3.
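Before the observation sheet is sent, it can be useful to verify that every text file named in sections 4.6 and 4.8 has actually been retrieved. The Python sketch below illustrates one way of doing this on a PC, assuming that the files have been copied into a single local folder under their original names; the folder name collected_logs and the function name missing_log_files are hypothetical.

```python
from pathlib import Path

# File names taken from sections 4.6 and 4.8.
EXPECTED_FILES = [
    "DHS3-INIT.log", "DHS3-INIT.olog", "DHS3-INIT.log2", "DHS3-INIT.log3",
    "incpbm.1", "incpbm.2", "incpbm.3",
]

def missing_log_files(folder, expected=EXPECTED_FILES):
    """Return the expected file names that are absent or empty in `folder`."""
    folder = Path(folder)
    missing = []
    for name in expected:
        path = folder / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing

# "collected_logs" is a hypothetical local folder holding the retrieved files.
print(missing_log_files("collected_logs"))
```

A file reported as missing is not necessarily an error: on a system that has rebooted only a few times, the oldest generations may simply not exist yet.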

4.9. List of the boards of the installation

For each shelf of each node, add the results of the following command to the observation sheet:

    a4400> config x

where x is the number of the shelf.


4.10. Configuration of the system

Please indicate in the observation sheet:
- the number of nodes,
- the network configuration,
- the CCD configuration,
- the number of users,
- the presence of external applications such as "CMP" type boards, etc.

4.11. Type of CPU

Add the results of the following command to the observation sheet:

    a4400> uhwconf

4.12. References of the CPU

Please indicate in the observation sheet the complete technical references of:
- the CPU board,
- the memory,
- the hard disk,
- the processor board.

The CPU must be unplugged to read the references of the different items.

4.13. Detection of a memory corruption

Add the results of the following commands to the observation sheet:

    login : root
    a4400> /usr2/oneshot/mtch/memcheck

In case of a memory corruption, "Segment corrupted" messages are displayed.

4.14. Crashdump file

When a crashdump file is present on the disk, the following message is displayed when you log on to the system:

    There is a dump on /dev/dsk/crashdump from <date>, <time>

Use the following command to extract the crashdump:

    a4400> dumpsave r o /tmpd/mycrash

where "mycrash" is the name you give to the extracted file. The file is as large as the memory of the system and is therefore very big. Use the following command to compress the file on the system:

    a4400> gzip /tmpd/mycrash

Add the compressed file mycrash.gz to the observation sheet.
