

STC CU/ECU alarms handling Procedure V1.0 NSN CM Team NSN MS Use Khalil Al Ngashy

STC CU/ECU alarms handling procedure




Table of contents:
1) Introduction
2) Issue Description
3) Technical Description
3.1) Main CU/ECU alarms description
4) Clearance Actions
5) Recommendations



1) Introduction
The current document is intended to provide information about CU/ECU alarm handling for the STC network, in order to improve the current alarm behavior and the handling of HW issues on the CU/ECU modules. It is based on the standard NSN operation documentation and should not be used as a replacement for that documentation, but as a quick reference and support for field engineers. For ex-Siemens BR equipment, document number A50016-G5100-A001-0776K5 is available, and all personnel working with ex-Siemens GSM radio equipment should be familiar with it; please refer to this documentation for further technical details.

The high system functionality of the base station system is achieved by means of system-integrated routine tests. These routine tests continually check the correct functioning of the base station subsystems, including the BTSEs. In most cases, the results of these routine tests are sufficient to localize the fault and clear it immediately at the BTSE. The modular design of the BTSE allows STC to clear a large percentage of faults in the system by replacing a defective module. Sometimes, however, faults do not result from defective modules but from interface problems in general (for example, interrupted cables); for this case, special troubleshooting procedures for interfaces are provided. This maintenance concept guarantees simple and fast fault clearance and leads to high operational efficiency.

2) Issue description
It has been found that the number of alarms received from the CU/ECU modules in the STC network is higher than the number received from equivalent modules of other vendors. It is therefore necessary to review the procedure being followed to clear these alarms, and to study the current alarm behavior and clearance process.

3) Technical description
3.1) Main CU/ECU alarms description
The following items describe the errors that have to be taken into account when troubleshooting any CU/ECU alarm. The most common service-affecting alarms in the STC network are listed.

3.1.1) The RF part of the CU/ECU has problems (34835)
Error description: The power stage of the CU/ECU detected one of the following errors:
- Over temperature of the power stage
- Loop1 of the power stage does not close
- Loop2 of the power stage does not close
- FlexCU only: over current alarm
Other words: The other words provide additional information about every single alarm generated by the ex-Siemens BSS; it is mandatory to analyze this information to define the appropriate action to be taken (please refer to Annex C for details). For alarm 34835 (RF part of the CU/ECU has problems), the other words are interpreted as follows:
Octets 17/18 (D):



17 = Bit field showing the errors of the power stage (if any bit <> 0 -> error):
- bit0: over temperature
- bit1: unused
- bit2: loop1 alarm (reduced output power)
- bit3: loop2 alarm (excessive output power)
18 = if <> 0: over current alarm (FlexCU only)
According to the above, octet 17 of this alarm can take the hexadecimal values 1 (over temperature), 4 (reduced output power) or 8 (excessive output power).
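The bit field of octet 17 can be decoded mechanically. The following Python sketch is illustrative only: the function name `decode_34835` is hypothetical, and it assumes the octets have already been read from the LMT and converted to integers. It tests every bit separately, on the assumption that more than one bit may be set at the same time even though the text lists only single values.

```python
# Bit positions of octet 17 of alarm 34835, as listed in 3.1.1 (bit1 is unused).
POWER_STAGE_BITS = {
    0: "over temperature",
    2: "loop1 alarm (reduced output power)",
    3: "loop2 alarm (excessive output power)",
}


def decode_34835(octet17: int, octet18: int) -> list[str]:
    """Return the power-stage errors encoded in the other words of alarm 34835."""
    errors = [text for bit, text in POWER_STAGE_BITS.items() if octet17 & (1 << bit)]
    # Octet 18 different from zero means an over current alarm (FlexCU only).
    if octet18 != 0:
        errors.append("over current alarm (FlexCU only)")
    return errors


print(decode_34835(0x01, 0x00))  # ['over temperature']
print(decode_34835(0x08, 0x00))  # ['loop2 alarm (excessive output power)']
```

An over temperature result is the case that, according to 4.1.1, calls for a site visit to check the rack itself.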

3.1.2) Loss of Board (22356)
Error description: Board lost. Communication supervision detects a loss of board. If the board loss alarm is caused by ACT, all associated alarms remain in the same status, e.g. if BCOM is connected and fails, no alarm is sent.
Other words: In this case the other words information does not provide any hint to the field engineer.

3.1.3) RF power reflected warning (34910)
Error description: The power amplifier detects reflected RF power. The error can be caused by:
1. Cable problems in the TX/RX path, especially a defective TX cable from the CU/ECU to the combiner or a defective antenna cable.
2. A defective CU/ECU or FlexCU.
As a consequence, the nominal output power is reduced for safety reasons:
- If TX power reduction = 0 dB, the output power is reduced by 6 dB.
- If TX power reduction = 2 dB, the output power is reduced by 4 dB.
- If TX power reduction = 4 dB, the output power is reduced by 2 dB.
- If TX power reduction > 4 dB, the power is not reduced.
Further consequence: If the error condition has been removed after 20 seconds, the alarm is cleared and the output power is set back to its original value. If the error condition still persists after 20 seconds, alarm 34892 is raised.
Other words: Octet 17: a value other than zero indicates a VSWR problem.

3.1.4) Device driver initialization failed (94)
Error description: The initialization function of a device driver returned a non-zero error code to the OS, i.e. the initialization failed. The device driver can be identified by the address of its initialization function; its name can be found in the linker map file.



Other words: In this case the other words information does not provide any hint to the field engineer.
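The power reduction rule of alarm 34910 (3.1.3) is easy to misread, so it can be written down as a lookup. In this Python sketch the function names are hypothetical, and TX power reduction values that 3.1.3 does not list (for example 1 dB or 3 dB) are deliberately rejected instead of interpolated.

```python
# Extra output-power reduction applied while alarm 34910 is active, keyed by
# the configured TX power reduction in dB. Only the values listed in 3.1.3
# are covered; anything else is rejected rather than guessed.
EXTRA_REDUCTION_DB = {0: 6, 2: 4, 4: 2}


def extra_reduction_db(tx_power_reduction_db: int) -> int:
    """Additional dB of reduction applied by the power amplifier for alarm 34910."""
    if tx_power_reduction_db > 4:
        return 0  # above 4 dB the power is not reduced further
    if tx_power_reduction_db in EXTRA_REDUCTION_DB:
        return EXTRA_REDUCTION_DB[tx_power_reduction_db]
    raise ValueError(f"{tx_power_reduction_db} dB is not covered by 3.1.3")


def vswr_problem(octet17: int) -> bool:
    """Octet 17 of alarm 34910 different from zero indicates a VSWR problem."""
    return octet17 != 0


for configured in (0, 2, 4, 6):
    extra = extra_reduction_db(configured)
    print(f"configured {configured} dB -> extra {extra} dB, total {configured + extra} dB")
```

For the configured values 0, 2 and 4 dB, the configured and the extra reduction add up to 6 dB, which is a quick sanity check when comparing the expected and the measured output power of a CU/ECU carrying this warning.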

3.1.5) RF power reflected into power stage (34892)
Error description: The power amplifier detected reflected RF power. Either the connection between the power amplifier and the combining equipment is not correct, or the power amplifier itself does not work correctly.
Other words: In this case the other words information does not provide any hint to the field engineer.

3.1.6) No startup-request after reset (24603)
Error description: Reset supervision failed: the core did not receive a startup-request message from the peripheral board after the board was brought into the boot phase by a reset. If this FER from object CU/ECU occurs sporadically, it can be ignored. It can occur when the Booter falsely starts the test SW instead of the system SW because of a wrong transmission of the test mode bits via the CC Link.
Other words: Not applicable

3.1.7) Hardware error (4110)
Error description: A hardware error was detected on one of the following BTSE boards: CCTRL, BBSIG/BBSIG44, TRXD/TRXD2 (TPU), ALCO, TRXA, CCLK, LI, CORE, CU/ECU.
Possible causes:
1) A hardware test procedure has detected a chip error.
2) A timeout while loading software:
- BBSIG/BBSIG44: timeout at loading BB1 (/BB2) with Boot-SW or Load-SW
- TRXD: timeout at loading DR1/DR2 with Boot-SW or Load-SW
- TRXD2: timeout at loading LEA with Load-SW
Because of HW tolerances, this error can occur during startup of the TPU (TPU1 HW version) with the following additional information: HW test number = 0b, HW test result = 01. The TPU then recovers successfully without any problem.
NOTE: Beginning with BR3.0, this error replaces the former error CBM_AEID_HW_ERROR and provides more detailed error information.
NOTE2: For EdgeCU/FlexCU in BR8.0, this alarm was enhanced with additional information about failures on HIT DSPs.
Other words: Not applicable



3.1.8) Hardware Error on CU/ECU detected by BIC (20603)
Error description: The Power Supply Unit (PSU) does not work correctly.
Other words: Not applicable

3.1.9) Any of the PLLs shows lock problems (34836)
Error description: Any of the PLLs on the SIPRO has lock problems.
Other words: Not applicable

3.1.10) Test Result failed (28675)
Error description: The initiated PerformTest or AutomaticTest did not pass; see the corresponding test report.
Other words: Not applicable

3.1.11) No phys. Recovery after BTSE restart (30784)
Error description: No physical recovery is performed for a faulty processor board after a BTSE restart (the board remains in state DISABLED.NOT_INSTALLED). With this alarm, the board is set to DISABLED.FAILED in order to indicate on the LMT and RC that there is a problem with this board (note that the original alarm is no longer available after a BTSE restart).
Other words: Not applicable

3.1.12) Error in Flash or mismatch with database (34832)
Error description: There are two possible causes for this error: either the cell allocation number in the database does not match the actual hardware, or the ramping FLASH could not be read completely.



Other words: Not applicable

3.1.13) Inter board communication timeout (24581)
Error description: A task located on the core has not received the expected message from a peripheral board.
Other words: Not applicable

3.1.14) Local AE-queue full (20543)
Error description: The local AE-queueing buffer is full.
Other words: Not applicable

3.1.15) Error in downlink message from COP/HIT (34841)
Error description: The U1-BIC (baseband information controller) received a message from the COP (coding/decoding processor on CU/ECU/GCU) or the HIT (highly integrated transceiver, DSP on ECU/FlexCU) in which:
- the checksum is not correct,
- the downlink burst type is illegal,
- the training sequence code is illegal,
- the time slot number is wrong,
- the number of received data packets is wrong,
- the packet size is wrong, or
- there is a collision of downlink data on FlexCU.
The error can be caused by:
- cable problems on the CC-Link
- a defective COBA
- a defective CU (GCU/ECU/FlexCU)
Other words: Not applicable



3.1.16) Critical SELIC problem (8260)
Error description: A SELIC-ASIC indicates a critical problem that does not allow any further processing of the SELIC-ASIC. The following critical problems can occur:
* Illegal CC Link configuration
* RAM BIST failure
Other words: Not applicable

3.1.17) Diversity Receive Branch Failed (34837)
Error description: A diversity receive branch failed because of a bad receive signal level or a bad signal-to-noise ratio. The error can be located anywhere in the receive path, i.e.:
- at the antenna
- in the RF cabling
- in the combining equipment
- inside the CU/GCU/ECU/FCU
A CU/GCU/ECU/FCU test can indicate whether the error is inside the CU/GCU/ECU/FCU or outside it: if the test fails, the error is inside the unit; otherwise, it is outside. This alarm may also be caused by a strong interfering signal that disturbs the receiver; in this case the CU/GCU/ECU/FCU test should pass.
The alarm changes the availability status of the CU/GCU/ECU/FCU to "degraded". It is recommended to perform a CU/GCU/ECU/FCU test in order to locate the error. The CU/GCU/ECU/FCU leaves the availability status "degraded" if:
- both receivers have a good receive quality
- the supervision is switched off
- reception with diversity is switched off
- Call Processing is blocked, i.e. the CU/GCU/ECU/FCU is locked or a fault occurred on an object that is necessary for this TRX.
Other words (the meaning of octets 17 to 27 depends on octet 1: 0x6a = (G)CU, 0x62 = ECU/FCU):
17 = (G)CU: number of timeslots of the main receiver which do not receive properly.
     ECU/FCU: alarm type discriminator (operating mode): 0 = Invalid, 1 = 2Rx, 2 = 4Rx, 3 = Switch Beam (not supported in BR8), 4 = 4RxTxDiv, 5 = 0Rx.
18 = (G)CU: upper four bits = number of timeslots of the main receiver which do not receive properly due to a bad signal-to-noise ratio; lower four bits = the same due to bad signal strength.
     ECU/FCU: processor ID of the alarm originator.
19 = (G)CU: number of timeslots of the diversity receiver which do not receive properly.
     ECU/FCU: equalizer diversity configuration.
20 = (G)CU: upper four bits = number of timeslots of the diversity receiver which do not receive properly due to a bad signal-to-noise ratio; lower four bits = the same due to bad signal strength.
     ECU/FCU: number of timeslots of the 1st receive branch which do not receive properly.
21 = ECU/FCU: upper four bits = number of timeslots of the 1st receive branch which do not receive properly due to a bad signal-to-noise ratio; lower four bits = the same due to bad signal strength.
22 = ECU/FCU: number of timeslots of the 2nd receive branch which do not receive properly.
23 = ECU/FCU: upper and lower four bits as for octet 21, for the 2nd receive branch.
24 = ECU/FCU: number of timeslots of the 3rd receive branch which do not receive properly.
25 = ECU/FCU: upper and lower four bits as for octet 21, for the 3rd receive branch.
26 = ECU/FCU: number of timeslots of the 4th receive branch which do not receive properly.
27 = ECU/FCU: upper and lower four bits as for octet 21, for the 4th receive branch.

3.1.18) Increased path loss difference (38956)
Error description: The absolute mean value of the path loss difference of the corresponding TRX (see additional info) is above the specified alarm threshold (RFL Alarm Threshold). This indicates possible hardware degradation of the BTS RF path. The path loss difference is represented as a signed integer with a length of 4 bytes.
Positive values indicate higher UL path loss than DL path loss -> degradation at the receiver equipment.



Negative values indicate higher DL path loss than UL path loss -> degradation at the transmitter equipment. The alarm contains information about the responsible HW object; with this information, all affected RF cabling, boosters, combiners, power amplifiers, etc. can be located using the wiring data of the relevant cell. The reported TRX ID keeps its validity until the next configuration change. Locking a BTS RFLoopBack scanner with active alarm conditions will cease these alarms, even if the error condition still exists.
Other words (multi-octet values are sent LSB first):
17-20 = TRX ID (octet 17 = LSB, octet 20 = MSB)
21-23 = measurement count (octet 21 = LSB, octet 23 = MSB)
24-26 = call count (octet 24 = LSB, octet 26 = MSB)
27-30 = mean value of the path loss difference (octet 27 = LSB, octet 30 = MSB)
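The other words of alarm 34837 (3.1.17) pack two counters into one octet, which is easy to misread on site. This Python sketch decodes them; the dictionary input (octet number -> integer value) and all names are hypothetical, and octet 28 is not decoded because its content is not described in 3.1.17.

```python
# Illustrative decoder for the other words of alarm 34837 (3.1.17).
# `octets` maps the octet number shown on the LMT to its integer value.
OPERATING_MODES = {
    0: "Invalid", 1: "2Rx", 2: "4Rx",
    3: "Switch Beam (not supported in BR8)", 4: "4RxTxDiv", 5: "0Rx",
}


def split_nibbles(octet: int) -> dict[str, int]:
    """Upper nibble: timeslots failing on signal-to-noise ratio; lower nibble: on signal strength."""
    return {"bad_snr": (octet >> 4) & 0x0F, "bad_level": octet & 0x0F}


def decode_34837(octets: dict[int, int]) -> dict:
    if octets[1] == 0x6A:  # (G)CU
        return {
            "unit": "(G)CU",
            "main": {"failed_ts": octets[17], **split_nibbles(octets[18])},
            "diversity": {"failed_ts": octets[19], **split_nibbles(octets[20])},
        }
    if octets[1] == 0x62:  # ECU/FCU
        branches = []
        # (count octet, nibble octet) for receive branches 1 to 4.
        pairs = ((20, 21), (22, 23), (24, 25), (26, 27))
        for number, (count_octet, split_octet) in enumerate(pairs, start=1):
            branches.append(
                {"branch": number, "failed_ts": octets[count_octet], **split_nibbles(octets[split_octet])}
            )
        return {
            "unit": "ECU/FCU",
            "mode": OPERATING_MODES.get(octets[17], "unknown"),
            "processor_id": octets[18],
            "equalizer_diversity_config": octets[19],
            "branches": branches,
        }
    raise ValueError(f"unexpected octet 1 value: {octets[1]:#04x}")


gcu = {1: 0x6A, 17: 3, 18: 0x21, 19: 0, 20: 0x00}  # made-up example values
print(decode_34837(gcu)["main"])  # {'failed_ts': 3, 'bad_snr': 2, 'bad_level': 1}
```

A non-zero "bad_snr" count on several receive paths is the pattern that the clearance procedure for this alarm associates with interference rather than with a hardware fault.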

4) Clearance actions
Most of the alarms may appear on a CU/ECU only when the traffic carried by that particular CU/ECU increases; e.g. the RF power reflected in a CU/ECU that is not carrying the BCCH depends on the traffic carried by this CU/ECU. Because of this behavior, it is mandatory to define within the O&M team who is responsible for detecting and tracking the actual faulty modules: a module that is not working properly can pass its initial test under low traffic conditions, which can lead to the reuse of a faulty module at another site. It is also recommended to move the BCCH to the CU/ECU that was generating the alarm once the alarm is cleared, so that the module transmits continuously and any remaining fault shows up. Some of the alarms can also be caused by environmental issues at the site itself, e.g. over temperature, so it is also important to keep a record of each site's behavior regarding faulty modules and cleared alarms, in order to easily recognize any pattern that may show up.



4.1.1) The RF part of the CU/ECU has problems
The initial step to clear this alarm is to perform a test on the module; this can be done remotely from the RC or the BSC. If the test fails, the module can be deleted and recreated to force a restart. If the module stays disabled, the O&M engineer should visit the site and swap the CU/ECU with another CU/ECU of the same BTSE; this rules out a possible failure in the CU/ECU position of the rack and provides a HW reset of the module. If the same CU/ECU is still disabled after swapping, it has to be replaced and sent for repair (please refer to Annex A and B for further details).
If the test passes, or the module recovers after re-creation or swap, it is necessary to keep track of the behavior of this CU/ECU to make sure it does not fail again. If the alarm is generated again and the other words indicate an over temperature problem, it is recommended to visit the site to make sure there is no physical issue with the rack, e.g. a missing cover plate or an open rack door. If the alarm reappears, the module has to be sent for repair (please refer to Annex A and B for further details).

4.1.2) Loss of board alarm, Hardware error alarm or Test result failed alarm
The initial step to clear these alarms is to perform a test on the module; this can be done remotely from the RC or the BSC. If the test fails, the module can be deleted and recreated to force a restart. If the module stays disabled, the O&M engineer should visit the site and swap the CU/ECU with another CU/ECU of the same BTSE; this rules out a possible failure in the CU/ECU position of the rack and provides a HW reset of the module. If the same CU/ECU is still disabled after swapping, it has to be replaced and sent for repair (please refer to Annex A and B for further details). If another CU/ECU does not work in the faulty CU/ECU position in the rack, the CU backplane needs to be checked.
If the test passes, or the module recovers after re-creation or swap, it is necessary to keep track of the behavior of this CU/ECU to make sure it does not fail again; in case of a new failure, the module should be replaced and sent for repair (please refer to Annex A and B for further details).
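The escalation described in 4.1.2 can be captured as a small decision function, which may help when several engineers pick up the same case. This Python sketch is a hypothetical aid, not an NSN tool: the names are made up, `None` means "this step has not been performed yet", and the over temperature site check specific to 4.1.1 is left out.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    MONITOR = "keep tracking the behavior of the module"
    RECREATE = "delete and recreate the module to force a restart"
    SWAP = "visit the site and swap with another CU/ECU of the same BTSE"
    REPLACE = "replace the module and send it for repair (Annex A and B)"
    CHECK_BACKPLANE = "check the CU backplane"


def next_actions(
    test_passed: bool,
    recovered_after_recreate: Optional[bool] = None,
    disabled_after_swap: Optional[bool] = None,
    other_module_fails_in_slot: bool = False,
    failed_again: bool = False,
) -> list[Action]:
    """Next step(s) of the 4.1.2 flow; None means "not done yet"."""
    if failed_again:
        # A module that recovered once and fails again is replaced.
        return [Action.REPLACE]
    if test_passed:
        return [Action.MONITOR]
    if recovered_after_recreate is None:
        return [Action.RECREATE]
    if recovered_after_recreate:
        return [Action.MONITOR]
    if disabled_after_swap is None:
        return [Action.SWAP]
    actions = []
    if disabled_after_swap:
        actions.append(Action.REPLACE)
    if other_module_fails_in_slot:
        actions.append(Action.CHECK_BACKPLANE)
    return actions or [Action.MONITOR]


print(next_actions(False, recovered_after_recreate=False)[0].value)
```

Returning a list rather than a single action reflects that, after the swap, a failing module and a failing rack position can both be observed at the same time.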

4.1.3) RF power reflected warning and RF power reflected into power stage
These two alarms are closely related and are normally caused by a problem outside the module. The module has to be tested remotely first, although the test is very likely to be successful (if the test fails, the CU/ECU has to be sent for repair). If the alarm appears again, the O&M engineer has to visit the site to pinpoint the failure, which can be located in:
1) The physical TX output port of the CU/ECU.
2) The TX FlexiCable connecting the CU/ECU with the combiner.
3) The TX input port of the antenna combiner.
4) The feeder, due to a high VSWR (indicated in the other words of the RF power reflected warning).
If the test from the RC or BSC is successful and the alarm appears again, the O&M engineer should visit the site and perform the following steps in sequence:



A) Perform a visual inspection of all flexicables and input/output ports of the CU/ECU and the corresponding combiner. Replace the related module/flexicable if any physical damage is found.
B) Perform a VSWR test of the corresponding feeder. If necessary, measure the insertion loss of the feeder to make sure it is within the range defined by the feeder manufacturer.
Items A and B should pinpoint most of the possible causes of this alarm. If it is still not clear where the problem is located, the O&M engineer should continue as follows:
C) 1- Swap the CU/ECU with the alarm with another CU/ECU of the same BTSE, for example CU/ECU:X.
   2- Swap the TX flexicable with another one of the same BTSE, for example the one of CU/ECU:Y.
D) 1- If the alarm moves to position X, the CU/ECU has to be changed.
   2- If the alarm moves to position Y, where the flexicable was moved, this flexicable has to be replaced.

E) If the alarm stays in the same CU/ECU position, swap the combiner to which the CU/ECU is connected to confirm that the fault is located in the combiner; once this is determined, the combiner should be replaced and sent for repair (please refer to Annex A and B for further details).

4.1.4) Device driver initialization failed
The initial step to clear this alarm is to perform a test on the module; this can be done remotely from the RC or the BSC. If the test fails, the module can be deleted and recreated to force a restart. If the module stays disabled, the O&M engineer should visit the site and swap the CU/ECU with another CU/ECU of the same BTSE; this rules out a possible failure in the CU/ECU position of the rack and provides a HW reset of the module. If the same CU/ECU is still disabled after swapping, it has to be replaced and sent for repair (please refer to Annex A and B for further details).
If the test passes, or the module recovers after re-creation or swap, it is necessary to keep track of the behavior of this CU/ECU to make sure it does not fail again; in case of a new failure, the module should be replaced and sent for repair (please refer to Annex A and B for further details).
The alarms in items 3.1.6 to 3.1.14 should be troubleshot as per 4.1.4.

4.1.5) Error in downlink message from COP/HIT (34841)
Initially, perform a visual inspection, including the CC-link cables related to the CU/ECU with the alarm (replace them if necessary). Then troubleshoot as per 4.1.4; if the alarm persists, replace the COBA of the site.

4.1.6) Critical SELIC problem (8260)
Proceed as per 4.1.5.
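The swap test in steps C) to E) of 4.1.3 works because each suspect item is moved to a different, known position, so the position where the alarm reappears identifies the faulty item. A minimal Python sketch of that mapping follows; the position labels "X", "Y" and "original" and the function name are hypothetical, and it assumes the CU/ECU and the TX flexicable were swapped with different partners (X differs from Y).

```python
# Hypothetical position labels: "X" is the slot the suspect CU/ECU was moved to
# (step C 1-), "Y" is the CU/ECU whose TX flexicable was swapped with the
# suspect one (step C 2-), and "original" is the slot that raised the alarm.
VERDICTS_4_1_3 = {
    "X": "the CU/ECU is faulty: change the CU/ECU (step D 1-)",
    "Y": "the TX flexicable is faulty: replace it (step D 2-)",
    "original": "swap the combiner to confirm, then replace it (step E)",
}


def locate_tx_fault(alarm_positions: list[str]) -> list[str]:
    """Map every position where the alarm reappeared after the swaps to a verdict."""
    verdicts = []
    for position in alarm_positions:
        verdict = VERDICTS_4_1_3.get(
            position, f"alarm on unrelated position {position}: keep monitoring"
        )
        if verdict not in verdicts:  # report each verdict once
            verdicts.append(verdict)
    return verdicts


print(locate_tx_fault(["X"]))
print(locate_tx_fault(["original", "original"]))
```

Taking a list of positions allows the whole monitoring period to be fed in at once; repeated occurrences at the same position collapse into one verdict.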



4.1.7) Diversity Receive Branch Failed (34837) for CU or ECU
As mentioned in the error description, this alarm indicates a bad receive signal level or a bad signal-to-noise ratio. Normally the problem is located outside the CU, in the RX path (antenna, flexicable or combiner). This alarm does not leave the related TRX in disabled state, so the CU keeps carrying traffic; the field engineer therefore has to take all the necessary precautions to avoid call drops (please refer to Annex B).
1- If the alarm is present on more than one CU/ECU in more than one sector of the site, or if the other words indicate a bad signal-to-noise ratio on more than one RX path, this may indicate an interference problem at the site; contact OPT for support.
2- To find out whether the CU is healthy, perform a test either locally or remotely:
A- If the test fails, the CU is faulty and needs to be changed.
B- If the test passes, the CU is healthy and the problem is outside the CU, in either:
1) The physical RX port of the CU.
2) The RX FlexiCable connecting the CU with the combiner.
3) The RX input port of the antenna combiner.
4) High insertion loss in the antenna feeder.
The FLM has to swap each of the items above:
- the CU with another one of the site, for example CU:X;
- the RX flexicables with other ones of the site, for example the two RX cables of CU:Y and CU:Z;
- the related combiner with a combiner not used by CUs X, Y and Z, for example the combiner of CUs A, B, C and D;
- the related feeder with another one of the same sector at the BTSE side, for example the feeder of CU:M and CU:N.
After that, keep monitoring the site and check where the alarm appears:
- on CU:X: the CU is faulty;
- on CU:Y or CU:Z: change the related RX cable;
- on one of CUs A, B, C or D: check the combiner;
- on CU:M or CU:N: the feeder needs to be checked.
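Step 1 of 4.1.7 is a site-level check that can be automated from the active 34837 alarms. This Python sketch is a hypothetical helper: the record layout is made up, the "nibble octets" are the octets whose upper four bits count timeslots with a bad signal-to-noise ratio (18 and 20 for (G)CU, 21, 23, 25 and 27 for ECU/FCU, per 3.1.17), and it assumes bad-SNR RX paths are counted across all alarms of the site.

```python
from dataclasses import dataclass


@dataclass
class DiversityAlarm:
    """One active 34837 alarm on a site (hypothetical record layout)."""
    cu: str
    sector: int
    nibble_octets: list[int]  # octets whose upper nibble counts timeslots with bad SNR


def bad_snr_paths(alarm: DiversityAlarm) -> int:
    """Number of RX paths of this alarm that report timeslots with bad SNR."""
    return sum(1 for octet in alarm.nibble_octets if (octet >> 4) & 0x0F)


def suspect_interference(alarms: list[DiversityAlarm]) -> bool:
    """Step 1 of 4.1.7: several CUs in several sectors, or bad SNR on >1 RX path."""
    cus = {a.cu for a in alarms}
    sectors = {a.sector for a in alarms}
    if len(cus) > 1 and len(sectors) > 1:
        return True
    return sum(bad_snr_paths(a) for a in alarms) > 1


site = [DiversityAlarm("CU:1", 1, [0x10, 0x00]), DiversityAlarm("CU:2", 1, [0x20, 0x00])]
print(suspect_interference(site))
```

When the helper returns True, the hardware swaps of step 2 are unlikely to help and the case goes to OPT first.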

4.1.8) Increased path loss difference (38956)
Based on the other words information, it is possible to determine whether the problem is in the RX or the TX path: a positive mean path loss difference points to the receiver equipment, a negative one to the transmitter equipment (see 3.1.18). If the problem is in the RX path, the O&M engineer should proceed as per 4.1.7; otherwise, perform the procedure described in 4.1.3.
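The routing decision of 4.1.8 only needs the sign of the 4-byte, LSB-first path loss difference in octets 27 to 30. This Python sketch uses hypothetical names and a dictionary input (octet number -> integer value); it treats the TRX ID and the counters as unsigned, which is an assumption, and it does not convert the path loss difference to dB because the unit is not stated in 3.1.18.

```python
def le_int(octets: dict[int, int], first: int, last: int, signed: bool = False) -> int:
    """Assemble octets first..last, sent LSB first, into one integer."""
    raw = bytes(octets[i] for i in range(first, last + 1))
    return int.from_bytes(raw, "little", signed=signed)


def route_38956(octets: dict[int, int]) -> dict:
    """Decode the other words of alarm 38956 and pick the clearance procedure."""
    diff = le_int(octets, 27, 30, signed=True)
    if diff > 0:
        path, procedure = "RX (receiver equipment)", "4.1.7"
    elif diff < 0:
        path, procedure = "TX (transmitter equipment)", "4.1.3"
    else:
        path, procedure = "undetermined", None
    return {
        "trx_id": le_int(octets, 17, 20),
        "measurement_count": le_int(octets, 21, 23),
        "call_count": le_int(octets, 24, 26),
        "path_loss_difference": diff,
        "path": path,
        "procedure": procedure,
    }


# Made-up example values, not taken from a real alarm.
example = {17: 0x02, 18: 0, 19: 0, 20: 0,            # TRX ID 2
           21: 0x10, 22: 0x27, 23: 0,                # 10000 measurements
           24: 0xE8, 25: 0x03, 26: 0,                # 1000 calls
           27: 0xFB, 28: 0xFF, 29: 0xFF, 30: 0xFF}   # -5
print(route_38956(example)["procedure"])  # 4.1.3
```

`int.from_bytes(..., signed=True)` performs the two's-complement interpretation, so a most significant octet of 0xFF yields a negative difference and therefore the TX procedure.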



5) Recommendations
- All replacement procedures have to be performed as indicated in the official NSN operating documentation; this includes the ESD precautions, among all the other recommendations given in the Maintain Hardware document (A50016-G5100-B326-04-7620).
- It is recommended to keep track of the tested modules and to decide whether a module has to be replaced even if its test is successful.

- O&M engineers in the field should keep a record of the replaced modules with their serial numbers, to avoid the reuse of a replaced module at a different site. This can happen because a faulty module can be enabled right after installation if it carries no traffic. Since field engineers rotate very frequently in STC, proper tracking of replacements is mandatory to avoid this kind of situation.
- Uninstalled modules (spare parts or replaced faulty modules) should always be packed and transported as received from NSN, to avoid further damage to a faulty module or a fault in a spare due to mishandling.
- STC could design a checklist database where the parties involved can register the steps performed on a particular module, and can check and advise based on that information.
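The checklist database suggested above mainly needs two things: a history of steps per serial number, and a way to stop a replaced module from being installed again just because it enables without traffic. The following Python sketch is a hypothetical in-memory model of that idea, not an existing STC or NSN system; all class and method names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleRecord:
    serial: str
    steps: list[str] = field(default_factory=list)  # actions performed, in order
    replaced: bool = False  # True once the module was removed as faulty


class ModuleRegistry:
    """Hypothetical checklist database keyed by module serial number."""

    def __init__(self) -> None:
        self._records: dict[str, ModuleRecord] = {}

    def _record(self, serial: str) -> ModuleRecord:
        return self._records.setdefault(serial, ModuleRecord(serial))

    def log_step(self, serial: str, site: str, step: str) -> None:
        self._record(serial).steps.append(f"{site}: {step}")

    def mark_replaced(self, serial: str, site: str) -> None:
        record = self._record(serial)
        record.replaced = True
        record.steps.append(f"{site}: replaced and sent for repair")

    def may_install(self, serial: str) -> bool:
        """Refuse modules that were replaced as faulty, even if they enable."""
        record = self._records.get(serial)
        return record is None or not record.replaced

    def history(self, serial: str) -> list[str]:
        record = self._records.get(serial)
        return list(record.steps) if record else []


registry = ModuleRegistry()
registry.log_step("SN123", "Site A", "remote test failed")
registry.mark_replaced("SN123", "Site A")
print(registry.may_install("SN123"))
```

Keying everything by serial number rather than by site is what lets the next field engineer see a module's history even after it has moved between sites; a real database would also need a way to clear the flag once a module comes back from repair.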



Annex A. Module replacement and ESD precautions





Annex B. Avoiding the loss of calls



Annex C. Other words



When connected to the LMT, the alarm information is displayed as below:

The other words are always presented as octets in hexadecimal notation (prefix H), as in the above example; for this particular example, octet 4 is H61, octet 15 is H07 and octet 15 is HFF.
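To check the other words against the octet descriptions in section 3 (for example, whether octet 17 of alarm 34910 is different from zero), the H-prefixed values shown on the LMT have to be read as hexadecimal. This Python sketch assumes the display format "Hxx" described above; the function names are hypothetical.

```python
def parse_lmt_octet(text: str) -> int:
    """Convert an octet as displayed on the LMT (e.g. "H61") to an integer."""
    text = text.strip()
    if not text.upper().startswith("H"):
        raise ValueError(f"expected an 'H' prefix: {text!r}")
    value = int(text[1:], 16)
    if not 0 <= value <= 0xFF:
        raise ValueError(f"octet out of range: {text!r}")
    return value


def parse_other_words(displayed: dict[int, str]) -> dict[int, int]:
    """Convert {octet number: "Hxx"} as read from the LMT into {octet number: int}."""
    return {number: parse_lmt_octet(text) for number, text in displayed.items()}


print(parse_lmt_octet("H61"))  # 97
print(parse_lmt_octet("HFF"))  # 255
```

Rejecting values without the "H" prefix avoids silently misreading a decimal number typed by hand as hexadecimal.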
