
Student Guide

Servicing HP ProLiant Server Products

Rev. 3.41

Training


© 2003 Hewlett-Packard Company

All other product names mentioned herein may be trademarks of their respective companies. Hewlett-Packard Company shall not be liable for technical or editorial errors or omissions contained herein. The information is provided "as is" without warranty of any kind and is subject to change without notice. The warranties for HP products are set forth in the express limited warranty statements accompanying such products. Nothing herein should be construed as constituting an additional warranty.

Servicing HP ProLiant Server Products
June 2003

CS3200 Contents Rev 3.41


1 Course Overview
Introduction
Course Goal and Objectives
Course Design
Course Content
Classroom Facilities
Accredited Platform Specialist (APS) Program

2 Maximizing Customer Satisfaction


Introduction
Objectives
Effective Customer Service Skills
Dealing with an Angry Customer

3 Service Resources
Introduction
Objectives
Serial Numbers
Standard Warranties
Service and parts information resources
HP PartSurfer
Service Parts Information (SPI) CD-ROM
Electronic and Telephone Support Services
Information and notification services
Learning Check

4 Server Technology
Introduction
Objectives
PCI
SCSI Architecture
Server Subsystems
Processor Subsystem
Memory Subsystem
Power Subsystem
Input/Output Subsystem
Software Subsystem
Fault Prevention and Recovery Management
Learning Check

Rev. 3.31


5 Server Product Line Overview


Introduction
Objectives
Product Positioning Framework
Server Introduction Timeline
Maximized Expansion Servers
ProLiant ML310
ProLiant ML330, ML330e, ML330 G2, ML330 G3
ProLiant ML350, ML350 1GHz, ML350 G2, ML350 G3
ProLiant ML370, ML370 G2 and ML370 G3
ProLiant ML530 and ML530 G2
ProLiant ML570 and ML570 G2
ProLiant ML750
Density-Optimized Servers
ProLiant DL320 and DL320 G2
ProLiant DL360, DL360 G2 and DL360 G3
ProLiant DL380, DL380 G2 and DL380 G3
ProLiant DL560
ProLiant DL580 and DL580 G2
ProLiant DL590/64
ProLiant DL740
ProLiant DL760 and DL760 G2
Blade Servers
ProLiant BL10e
ProLiant BL20p and BL20p G2
ProLiant BL40p
Cluster Line
ProLiant DL380 G2 and DL380 G3 Packaged Clusters
ProLiant CL380 Cluster


6 Array Products
Introduction
Objectives
Drive array technology
RAID levels supported by HP array controllers
Smart Array controllers
Smart Array controller features
Smart Array 6402/6404
Smart Array 641/642
Smart Array 5312
Smart Array 5304 and 5302
Smart Array 532
Smart Array 5i and 5i plus
Smart Array 4200
Smart Array 431
Integrated Smart Array a.k.a. RAID on a chip (ROC)
RAID LC2
Smart Array 3200
Smart Array 4250ES and 3100ES
SMART 2/E 2/P 2DH 221 and 2SL Array Controllers
Array Controller Service Considerations
Array configuration utilities
Learning Check

7 Tools and Utilities


Introduction
Objectives
SmartStart
ROM Based Setup Utility (RBSU)
Device Drivers

8 Troubleshooting Methodology
Introduction
Objectives
Troubleshooting Prerequisites
HP Troubleshooting Methodology Overview
Step 1-Collecting Data
Step 2-Evaluating Information to Isolate Mode of Failure
Step 3-Developing an Optimized Action Plan
Step 4-Executing the Action Plan
Step 5-Evaluating Results
Step 6-Implementing Preventive Measures
Learning Check


9 Server Diagnostic Tools


Introduction
Objectives
HP Insight Diagnostics
Array Diagnostics Utility (ADU)
Insight Manager
Remote Insight Management
Survey Utility
SCU diagnostics
Learning Check

Appendix A Entry Level Servers


Introduction
Objectives
ProSignia 200
ProLiant 800
ProLiant 1200
ProLiant 400
ProLiant 720
ProSignia 740

Appendix B Workgroup Servers


Introduction
Objectives
ProLiant 2500
ProLiant 1600 and ProLiant 1600R
ProLiant 3000
ProLiant 850R
ProLiant 1850R

Appendix C Enterprise Servers


Introduction
Objectives
ProLiant 5000
ProLiant 6000
ProLiant 6000 Xeon
ProLiant 6500
ProLiant 7000
ProLiant 5500 and 5500R
ProLiant 6400R
ProLiant 8500
ProLiant 8000


Appendix D Appliance and Storage Server Products


Introduction
Objectives
TaskSmart C-Series Servers - Tower
TaskSmart C-Series Servers - Rack
TaskSmart N-Series Servers
Neoserver
ProLiant Storage Systems
Rack 4000 Series
Rack 7000 Series
Rack 9000 Series
UPSs
Rack Builder Pro
Server Console Switch
Learning Check

Appendix E Fault Isolation


Introduction
Objectives
Isolating to a Subsystem
Key Indicators of Problems with Specific Subsystems
Memory Problems
Processor Problems
Power Problems
I/O Subsystem Problems
Hard Drive Problems
Floppy Drive Problems
CD-ROM Problems
Tape Drive Problems
Mouse and Keyboard Problems
Video Problems
Network Problems
Software Problems
Microsoft Windows Problems
Novell NetWare and IntranetWare Problems
Loose Connection Problems
Learning Check

Appendix F - Sample Inspect Report
Appendix G - Sample ADU Report
Appendix H Cabling Guidelines
Appendix I - Error Codes

Course Overview
Module 1

Introduction
Servicing HP ProLiant Server Products is a training program that focuses on customer communication skills, service tools, software utilities, troubleshooting, and repair/replacement procedures. It also gives an overview of the features of the ProLiant server product line and major service considerations. Learning concepts are reinforced through a series of comprehensive lab exercises that provide the opportunity to gain valuable hands-on experience with HP products. The training curriculum has two major segments:

Classroom-delivered instruction
Hands-on lab assignments

Each of these segments requires demonstrable proficiency and is measured by comprehensive certification testing.

Course Goal and Objectives


The goal of Servicing HP ProLiant Server Products is to ensure a consistent base level of expertise among service engineers certified to configure, upgrade, diagnose, and repair ProLiant server products. By design, this course trains the technician both in where to find the information needed to configure and service ProLiant server products and in how to perform the needed service. To meet this goal, service engineers should be able to:

Demonstrate customer communication skills.
Demonstrate ability to quickly and accurately find ProLiant service documentation and product-specific information.
Demonstrate knowledge of and ability to use HP service resources in configuring, upgrading, and servicing ProLiant server products.
Demonstrate technical proficiency and skills by working with HP products in a lab environment.
Demonstrate logical troubleshooting skills by problem recognition, problem isolation, solution development, and testing for proper operation.


Course Design
Servicing HP ProLiant Server Products is a three-day class, made up of a series of product modules, lab assignments, classroom presentations, and product demonstrations. The lab assignments are designed to reinforce the concepts presented in the course and give you the opportunity to enhance your proficiency with ProLiant server products. Learning checks and review questions let you confirm your understanding of the information. To make this course a successful learning experience, you are expected to do the following:

Actively participate in all class presentations. Your experience and expertise are of value to the entire class and will greatly supplement the information being presented.
Complete the lab exercises assigned by your instructor to the best of your ability. The exercises are your opportunity to demonstrate that you have developed the skills and knowledge to service and support ProLiant server products.


Course Content
The following is a brief overview of the information and topics presented in the course modules:

Module 1: Course Overview covers course goals, objectives, class materials, class expectations, lab assignments, requirements for certification, and final testing requirements.
Module 2: Maximizing Customer Satisfaction provides an overview of effective customer communication skills.
Module 3: Service Resources covers documentation, software, utilities, online services, and technical support.
Module 4: Server Technology addresses server technologies and subsystems.
Module 5: Server Product Line Overview describes the ProLiant server product family, including:
  ML, DL, CL, and BL server products
  Product positioning and server introduction timeline
  Features and service considerations
Module 6: Smart Array Products covers the HP Smart Array controller family.
Module 7: Tools and Utilities covers ProLiant server installation and configuration.
Module 8: Troubleshooting Methodology covers the six steps to logical troubleshooting and troubleshooting flowcharts.
Module 9: Server Diagnostic Tools provides information on the tools used to diagnose faults in various ProLiant server subsystems.
Legacy products are covered in the following appendices:
  A. Entry Level Servers
  B. Workgroup Servers
  C. Enterprise Servers
  D. Appliance Servers and Storage Systems
Appendix E on Fault Isolation helps you determine subsystems or Field Replaceable Units (FRUs) that could be causing a problem.
Other appendices provide sample reports from various tools.
Lab exercises provide hands-on experience to reinforce the classroom material.


Classroom Facilities
Your instructor will provide additional information about class specifics. However, it is important to note a few general guidelines that must be followed:

Be sure to locate fire exits.
The classroom is smoke free. Your instructor will provide information regarding available smoking facilities.
Set pagers and cell phones to silent mode. Your instructor will point out the location of phones for use by class members.
There will be scheduled AM and PM breaks. However, feel free to take a break whenever you need to. Your instructor will review the restroom and telephone locations with you before the break.
Your instructor will present lunch information. Please ensure that you return from lunch on time.
Because a lot of information and activities must be covered in three days, it is essential that you:
  Begin class promptly.
  Use your time wisely.
  Attend the entire class time allotted.
The instructor will cover only a portion of the information in this student guide during class. The guide is meant to serve as a reference manual for you to use in the future.



Accredited Platform Specialist (APS) Program


A major key to providing superior customer satisfaction is the ability to deliver quick and effective post-sales service and technical support. To meet this challenge, it is essential to have a properly trained and certified technical staff that is skilled in the service of HP products and knowledgeable of the service resources available. To help our service partners develop the technical expertise needed to achieve this goal, HP is pleased to provide the Accredited Platform Specialist Certification Program. This program develops and measures the in-depth skills and knowledge necessary to service and support ProLiant Server products. Applicants must become Server+ certified, successfully complete all required platform exams, and have their application approved by HP. Attainment of this certification entitles Authorized Service Providers participating in the APS program to receive warranty reimbursement. The terms and conditions of the Authorized Service Provider program outline these benefits.

Exam Preparation
The best way to prepare for the Server Certification Exam is to complete the three-day ILT course Servicing HP ProLiant Server Products or the Servicing HP ProLiant Server Products WBT. Server+ certification and training are also recommended preparation. To improve your chance of success with the certification exam, use the preparation guide, which is accessible via the Internet at
http://h18014.www1.hp.com/training/service/ACT/010_500_epg.html

Sample questions are provided in the exam preparation guide to demonstrate the types of questions to expect at the testing center.

Certification Testing
Certification testing is required at proctored testing centers. This is a 75-minute competency verification exam in which the student can demonstrate mastery of the skills and experience needed to service ProLiant Server products. Exam questions are developed by experienced service engineers based on skills and content defined in the exam preparation guide. Benchmarking the scores of practicing service engineers sets the passing score. This is your assurance that your APS certification will be valued. To register for the ProLiant Server Maintenance Exam, you may call:
Thomson Prometric at 1-800-366-EXAM


APS Application
After passing the Server Certification Exam and obtaining Server+ certification, complete and submit an APS application along with the supporting documentation. The application can be found at
http://h10017.www1.hp.com/certification/na/application.html

Certification Maintenance
The final segment of the training program centers upon the necessity for Accredited Platform Specialists to maintain their knowledge of new HP products that are introduced after certification. HP will proactively provide new product self-paced training. This may be in the form of computer-based training distributed on CD, print-based materials, or new product training implemented through the Internet. HP will determine which new products require self-paced training for the APS. When this determination is made, Accredited Platform Specialists will be notified.

Training History Tracking


The names of Accredited Platform Specialists will be entered into the HP training database so their training history can be maintained. To qualify for warranty reimbursement, an Accredited Platform Specialist must perform service work. Verification will be determined by the training database.

Current Information
You will find the most current APS Certification information at
www.hp.com/go/certification


Maximizing Customer Satisfaction


Module 2

Introduction
Our customers have the right to expect their HP equipment to perform properly. When customers call for help, it is because their systems are functioning below their level of expectation. This situation is an opportunity to gain new customers and to win their loyalty through great service and support. Because the customer is usually unhappy with the situation, it is essential to turn the service event into a positive experience. Conveying concern and determination in a positive and confident manner is as important as resolving the problem quickly and professionally. Customers remember a positive service experience, which becomes a very influential factor for future equipment and service needs. Take every opportunity possible to reinforce the customer's decision to buy HP products and services.

Companies are vulnerable when their database, financial, application, gateway, or e-mail systems are malfunctioning. It is difficult to measure the true cost of downtime for a company. However, we know that productivity suffers when systems are down for repair or maintenance. Maintaining the quality and consistent availability of HP systems is a critical task.

There are two essential skills or areas of expertise needed to provide superior service. The first is outstanding customer service skills. The second is outstanding troubleshooting skills. These two skills overlap and each is an essential part of the other.

This module presents an approach for maximizing customer satisfaction, including an overview of effective customer communication techniques. Topics include:

Effective customer service skills
Dealing with the angry customer

Objectives
To maximize customer satisfaction, field service engineers should be able to:

Define the elements of effective customer communication.
List ways to provide effective customer service.

Effective Customer Service Skills


Voice
There is a voice used by effective HP service personnel. That voice is:

Positive
Caring
Confident
Determined

Computers are critical to the day-to-day functioning of a company. Because people are adversely affected by failures, it is imperative that you use that voice when dealing with our customers.

Why Use
Customers may make initial purchasing decisions based on price, but good customer service is what builds loyal customers. Customers are also more likely to recommend HP to others if they receive excellent, professional customer service. Conversely, dissatisfied customers quickly spread the word and discourage others from choosing HP.

Effective customer service skills are not only good for creating customer satisfaction but are also a key component of troubleshooting. These skills provide the technician or service engineer with an effective diagnostic tool.

Basic Skills for Service Communications


Basic customer service skills can be used during any portion of the communication process:

Preparing for the service call
During the service call
Following up

The following sections provide tips to follow.

Preparing for the Service Call


When possible, to prepare for a service call:

Research the company to become familiar with their business, their computer systems, and how they use the systems. If this is not possible before the service call, learn as much as you can while on-site.
Research the problem. Try to get as much information as possible so that you arrive with the appropriate tools, parts, hardware, software, and so on.
Try these questioning and listening strategies to help you research the problem, whether over the phone or on-site.
    Seek the most effective person or persons to help you understand the perceived problem. This person is usually the system administrator or the person assigned to network operating system support responsibilities.
    Maintain eye contact.
    Ask questions, summarize, and rephrase the answers to make sure you understand what the customer has said.
    Let the customer do most of the talking.
    Do not interrupt the customer even if you think you already understand the problem. Additional details may change your mind.
    Respond to the customer's comments so that they know you are listening.
    Listen to what the customer is saying, and how they are saying it. Be aware of the customer's tone. If you are meeting in person, also pay attention to gestures and facial expressions.
    Keep an open mind and do not jump to conclusions.
    Use terms and expressions that your customer understands.
    Be professional and positive. Do not argue with a customer.
    Listen carefully for pertinent information related to the problem, such as:
        Symptoms
        How often the problem is occurring
        When it started
        What the desktop or workstation was doing when the problem occurred
        Information that does not seem to fit. The customer may be misinterpreting the situation.

During the Service Call

Arrive on time and prepared. If you are unavoidably delayed, let the customer know as soon as possible.
Treat all employees at the customer's company with courtesy and respect.
Maintain a caring attitude while gathering information to understand the problem and the context of the failure. Avoid showing a belittling know-it-all attitude, using ill manners, or presenting a non-caring attitude.
Maintain a professional appearance at all times.
Use an effective greeting:
    Project a positive, helpful attitude at the beginning to help ensure customer cooperation and smooth communication as the service event progresses.
    Do not act rushed or hurried.
    A proper introduction lets the customer know how you fit into the organization, and gives them a sense that a professional is taking responsibility for helping to solve the problem.
    Include all four parts of an effective greeting, whether in person or over the phone:

Part
    Purpose

Salutation
    Acknowledges the customer.
    Invokes a sense of openness and friendliness.
    Examples: A greeting such as "Hi," "Hello," "Good morning," or "Good afternoon," spoken sincerely, provides acknowledgment of the customer and starts the conversation in a positive direction.

Your name
    Stating your name:
    Implies you are responsible for your work.
    Implies you stand behind what you do.
    Gives the customer a path for future assistance.

Company name
    Stating your company's name:
    Lets the customer know you represent your company proudly.
    Implies that your company stands behind their work.
    Gives the customer a path for future assistance.

Purpose
    Tell the customer why you are there or why you are calling. Be sure to include the word "HP." The customer can let you know if you are at, or calling, the correct location and save valuable time. If the customer calls you, state your purpose before they ask to let them know they have reached the correct location.

Ask questions and listen to the customer. (See the previous tips for questioning and listening strategies.)
Apply the steps in the Troubleshooting Methodology Flowchart (covered in Module 13).
Be positive.
    Focus on what you can do, not what you cannot do. Negative words project an uncaring, unhelpful attitude. Instead of "I can't replace your diskette drive right now," say "I can order a replacement diskette drive right now."
    Keep negative words out of conversation as much as possible, including: no, not, never, won't, can't, doesn't, wrong, and but. Eliminate these words from your vocabulary.
    Avoid assigning blame. Instead of "You've called the wrong extension," say "Let me get you to the person who can help."
    Use confident words and phrases. Do not say "This is the first time I've ever seen one of these." Instead, say "This is a great machine" or "I'll get this problem resolved for you."
Manage the conversation, whether on-site or over the phone.
    Explain clearly what you are there to do and get down to business.
    Set the customer's expectations with a timeframe and an outline of your approach to diagnose and resolve the issue. By doing this, you are taking charge of the situation, while discouraging any unreasonable demands.
    Stay focused on the goal to resolve the issue. If the customer drifts into unrelated topics, politely turn the conversation back to the problem. After you do this a few times, the customer usually takes the hint and either sticks to the subject or leaves you alone to do your job.
Keep the customer updated and informed.
    Provide a timeline, schedule, or plan, as appropriate.
    Update the customer as soon as possible and reset any expectations, to allow the customer to adequately plan or prepare, if you realize that you need additional time to troubleshoot, that downtime needs to be scheduled, or that parts need to be ordered.
    Update the chain of command, if appropriate. The customer may take care of this task if he/she feels the situation is under control and progressing.

Following Up
After the meeting or service call:

Keep the customer updated and informed. (See the previous tips.)
Follow up with any work that needs to be done.
Follow through with any promises made.
Once you have solved the problem, remind the customer of the problem and the steps you took to solve it. Use the same terms and phrases that the customer originally used, so that it will be clear that you solved the original problem.
Use terms and language the customer understands when explaining the solution.
Answer any additional questions the customer may have.
Include the words "Thank you" and "HP" as well as your name and, if you are an authorized service provider, your company's name, in the closure. This assures that the customer knows we appreciate the opportunity to solve the problem and that we are available for future opportunities.
Check back with the customer to make sure the problem is truly solved. This is good public relations for HP and will reinforce their decision to choose HP.

If you notify the customer about the possibility of a customer satisfaction survey, they can address any issues they have with the service call before it is ended. You might say, "An HP representative may be contacting you to conduct a survey on the service that my company has provided you on this issue. Is there anything that would keep you from scoring our service a five out of five?" If the customer has any issues that have not been addressed, you can address them at this time. This could also lead to a discussion about who should be contacted for the survey, and where. That would avoid any possible frustration from the wrong person being contacted. In addition, this gives the customer an opportunity to ask questions about the survey process.

Dealing with an Angry Customer


Occasionally, you may find yourself in a difficult situation with an angry customer. Apply the following tips if you are dealing with an angry customer:

Acknowledge the customer's feelings. Be sympathetic and empathetic. For example, tell the customer "I can understand why you are upset."
Rephrasing and repeating the customer's complaint lets them know that you understand their problem. Remember, agreeing with the customer and understanding the customer are two different things.
Put yourself on the customer's team by using "we" and "us." This lets the customer know you are working together to resolve the problem.
Apologize for any inconvenience.
Accept responsibility for resolving the problem, even if you did not have anything to do with creating it. You are representing HP, not just yourself. Let the customer know that you are there to help, and accept responsibility for resolving the problem.
If it is necessary to refer the customer to someone else, let the customer know that you will stay on as part of the team to help solve the problem.
Stress what you can do for the customer, rather than what you cannot do. Give the customer as many options as possible and stress them all. Try to find an agreeable solution among the options.
Stay positive.
Contact your service manager if you are still unable to satisfy the customer. Allow your service manager to handle the situation and to set the customer's expectations. Most service managers are experienced in managing these situations.
Never escalate the anger in an already difficult situation.

Remember, when dealing with an angry customer, don't get hooked by their anger! It only complicates the situation if you become defensive. Try not to take their anger or the situation personally.

Learning Check
1. What voice should HP service engineers use?

2. Why are good customer service skills important?

3. Explain the four components of an effective greeting.

4. What types of information should you research before the service call?

5. Explain how to manage the conversation with the customer.

6. Explain ways to stay positive during the service call.

7. Explain how to keep a customer updated and informed.

8. What actions should you take once the problem is solved?

9. List five questioning and listening strategies.

10. Explain how to deal with an angry customer.

HP Service Resources
Module 3

Introduction
A thorough understanding of the available service and support resources is key to providing superior customer satisfaction. This module provides an overview of these resources. Topics include:

Serial numbers
Standard Warranties
Maintenance information
Parts information
Web-based resources
Electronic and telephone support services
Training
Enterprise Computing Services

Objectives
To use HP service resources effectively, service engineers should be able to:

Interpret a serial number.
Locate part numbers for HP products.
Locate product information for HP products.
Access product and service resources on-line.

Serial Numbers
A serial number is an integral part of the product label, on which the serial number must be presented along with the HP product and/or part number. Pre-merger Compaq serial numbers used a 12-digit format, while pre-merger HP serial numbers used 10 digits. Effective January 1, 2003, the 10-digit policy applies to new product design and introduction in the new HP. Both 12-digit and 10-digit serial number codes provide valuable information such as:

Country code (pre-merger and new HP)
Model configuration (pre-merger Compaq)
Date of manufacture
Supply site/Vendor code
Unique sequential identifier

Two different serial number formats were previously used for pre-merger Compaq servers. One identifies machines built at Compaq sites existing before 1998; the other identifies products built at Compaq sites added since 1998 and before the merger with HP. Since January 1, 2003, the 10-digit serial number policy has applied to ProLiant servers built by HP. As an exception to the policy, an interim interpretation of the 10-digit format will be used until the transition to the standard 10-digit format is complete.

Unit Configuration Code


In the 12-digit format, the unit configuration code is located in the product serial number at the fifth, sixth, seventh, and eighth placeholders. The location is the same in both serial number formats. The quickest way to identify a pre-merger Compaq product from the serial number is to initiate a search in QuickFind 2000 using the configuration code. Configuration codes are also listed in the Maintenance and Service Guide under the Illustrated Parts Catalog section.

Refurbished Units
Units with an R in the first position of the serial number are refurbished products and sold through a reseller.

12-digit serial number breakdown: existing build sites

D 7 45 BRZ1 0018

D = Compaq unique
7 = Year in which the product was produced
45 = Week in which the product was produced
BRZ1 = Configuration code, which defines the specific computer as shipped from Compaq
0018 = Serial number that differentiates units produced in the same week
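The breakdown above can be sketched in code. The following is a minimal illustration only, not an HP tool: the function and field names are hypothetical, and it assumes the 12 characters are laid out exactly as in the example (one "Compaq unique" character, one year digit, a two-digit week, a four-character configuration code, and a four-character sequence). It does not cover the format used at the additional build sites or the R prefix on refurbished units.

```python
from typing import NamedTuple


class CompaqSerial(NamedTuple):
    """Fields of a 12-character serial number (existing build sites layout)."""
    compaq_unique: str  # position 1, labeled "Compaq unique" in the breakdown
    year_digit: str     # position 2, single digit for the production year
    week: int           # positions 3-4, production week
    config_code: str    # positions 5-8, unit configuration code
    sequence: str       # positions 9-12, differentiates units built in the same week


def parse_compaq_serial(serial: str) -> CompaqSerial:
    """Split a serial number into the fields shown in the breakdown (hypothetical helper)."""
    s = serial.replace(" ", "").upper()
    if len(s) != 12:
        raise ValueError(f"expected 12 characters, got {len(s)}")
    if not s[1:4].isdecimal():
        raise ValueError("year and week positions must be digits")
    return CompaqSerial(s[0], s[1], int(s[2:4]), s[4:8], s[8:12])
```

For the example serial number, `parse_compaq_serial("D 7 45 BRZ1 0018").config_code` yields the configuration code BRZ1, which is the value to search for in QuickFind 2000.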

12-digit serial number breakdown: additional build sites

10-digit serial number breakdown: current format

CC S YWW ZZZZ

CC = Country code
S = Supply site
YWW = Date of manufacture
ZZZZ = Unique sequential ID
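As a rough sketch of how the current 10-digit layout could be split programmatically, the pattern below follows the CC S YWW ZZZZ field widths. The function name and dictionary keys are hypothetical. Reading YWW as one year digit followed by a two-digit week is an assumption made by analogy with the 12-digit format; this guide only labels the field as the date of manufacture. The transitional format is not handled.

```python
import re
from typing import Dict

# Field widths follow CC S YWW ZZZZ. The digit-only year/week is an assumption.
_HP_SERIAL = re.compile(
    r"(?P<country>[A-Z0-9]{2})"      # CC: country code
    r"(?P<site>[A-Z0-9])"            # S: supply site
    r"(?P<year>[0-9])"               # Y: assumed year digit
    r"(?P<week>[0-9]{2})"            # WW: assumed week of manufacture
    r"(?P<sequence>[A-Z0-9]{4})"     # ZZZZ: unique sequential ID
)


def split_hp_serial(serial: str) -> Dict[str, str]:
    """Return the fields of a 10-character serial number (hypothetical helper)."""
    cleaned = serial.replace(" ", "").upper()
    match = _HP_SERIAL.fullmatch(cleaned)
    if match is None:
        raise ValueError(f"not a CC S YWW ZZZZ serial number: {serial!r}")
    return match.groupdict()
```

For instance, `split_hp_serial("US E 325 0042")` separates a made-up serial number into its country, site, year, week, and sequence parts.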

10-digit serial number breakdown: transitional

S ZZZZ AAAA YW

S = Supply site
ZZZZ = Unique identifier
AAAA = Configuration code
YW = Date of manufacture

Standard Warranties
HP offers a variety of warranty programs:

Servers
For most servers, HP Services provides a three-year limited warranty, including Pre-Failure Warranty (coverage of hard drives, memory, and processors), fully supported by a worldwide network of resellers and service providers, and lifetime toll-free 7 x 24 hardware technical phone support. The limited warranty includes three-year parts, three-year labor, and three-year on-site support. Complete information on the warranty of any product is available in the product QuickSpecs on HP's worldwide website at http://h18004.www1.hp.com/products/servers/platforms/warranty/index.html

Refurbished Products
Refurbished products (identified by an R in the first character of the serial number) are refurbished by HP and sold through a reseller or through HP Factory Outlet (HP Works). They are covered by a one-year parts and carry-in limited warranty.

Spare and Option Part Warranties
Spare and option parts are covered by a 90-day warranty or the warranty of the computer into which they are installed, whichever is longer. Most options carry a one-year warranty. Once installed in an HP machine, they assume the remainder of that machine's warranty or the remainder of the option warranty, whichever is longer. (Some options have three-year warranties.)

NOTE ABOUT WARRANTIES: Do not make assumptions about warranty based on a system's serial number. For example, the serial number of a system that has a failed hard drive may show a manufacture date of four years ago, but the proof of purchase on the hard drive may show that it is under parts warranty.
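The "whichever is longer" rule for spare and option parts amounts to taking the later of two dates. The sketch below illustrates only that arithmetic. The function and parameter names are hypothetical, and it assumes the part's own warranty runs from its proof-of-purchase date. The actual terms are those in the limited warranty statement for the product.

```python
from datetime import date, timedelta


def part_warranty_end(part_purchase: date,
                      system_warranty_end: date,
                      part_warranty_days: int = 90) -> date:
    """Return the later of the part's own warranty end and the host system's.

    part_warranty_days defaults to the 90-day spare-part term; pass 365 for an
    option with a one-year warranty (an approximation that ignores leap years).
    """
    own_end = part_purchase + timedelta(days=part_warranty_days)
    return max(own_end, system_warranty_end)
```

This also reflects the note above: a hard drive bought recently can still be under its own parts warranty even when the system's warranty, judged from the manufacture date in the serial number, has expired.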

Service and parts information resources


HP service and parts information resources include:

Service Announcements, Advisories, and Bulletins
Maintenance and Service Guides (MSGs)
QuickFind 2000
HP PartSurfer
Service Parts Information (SPI)

Service Announcements, Advisories, and Bulletins


Service Announcements, Advisories, and Bulletins are published by HP to give technicians information about business procedures and technical considerations.

Service Announcements
These publications describe new HP products and new or modified service programs for our Authorized Service Providers. A good example of this type of announcement is Service Product Announcement 3341N, which announces the ProLiant DL380 G3 server.
    Service Product Announcements describe new HP products.
    Service Program Announcements describe new or modified service programs.
Service Announcements can be found in the QuickFind Support Reference Library.

Service Advisories
Service Advisories (SAs) are problem/solution pair documents. They are the official communications from HP to our Service Partners regarding issues and repair instructions. Special return procedures or warranty extension information is also communicated through these documents.

The schedule for publishing Service Advisories depends on your access method. External to HP, SAs are available on the FTP site for QuickFind CD subscribers to update their local copies. This site is updated once every other week. SAs are also available to external service partners through the CSN Service Partner website. This site is updated daily, and partners can search by serial number, date, document type, and more.

High-priority SAs are printed and distributed every two weeks to subscribers. They are simultaneously published to OARS. Notice that OARS also contains SAs that are informative and address noncritical technical issues.

Service Bulletins
These publications are urgent proactive documents of extreme importance to Service Providers. They are printed and delivered immediately, regardless of the biweekly mailing schedule for regular Service Advisories. They are also published directly in OARS.
NOTE: A Service Advisory or Service Bulletin that is distributed on yellow paper is to be considered critically important. It must be given priority attention. Electronically distributed copies highlight a warning message visually separated in the document from regular text.

For a detailed comparison of these service document types and modifications to the numbering system, see Service Advisory 1900(A) and Service Announcement 2000.

Service Advisories and Bulletins are available from two sources:

All Advisories and Bulletins are available on OARS.
All Advisories and Bulletins are available on CSN.

Maintenance and Service Guides


A Maintenance and Service Guide (MSG) is published for all HP computers. It is designed to be used as a troubleshooting guide and reference tool when servicing HP computers. The MSG is divided into various chapters that provide the following information:

Illustrated Parts Catalog: Provides an illustrated reference for specific HP personal computer spare parts; a good resource for part numbers.
Service Preliminaries: Provides preliminary warnings and cautions, information about necessary equipment, and warranty information.
Removal and Replacement Procedures: Describes how to remove and replace field subassemblies for specific HP personal computers.
Switch and Jumper Settings: Provides detailed information for setting switches and jumpers. It notes specific settings for each board.
Power-On Self-Test (POST): Describes the internal system diagnostic programs that are executed automatically when you power on the system.
Error Messages and Codes: Lists the POST and HP Diagnostics Error Codes, and the required course of action to resolve the problem described by each error code.
Specifications: Provides operating and performance specifications for the specific HP personal computer for which a particular guide is developed.
Index: Assists in locating specific information throughout the guide.

MSGs are available at http://www3.compaq.com/support/reference_library/

HP PartSurfer

HP PartSurfer provides fast, easy access to parts information for a wide range of HP products. With this application you can:

Search for part information by product name or model number.
Look up part information by keyword, category, or part type.
Cross-reference exchange part numbers to new part numbers.
Identify all HP products that use or reference a specific part number.
Generate on-screen and hard-copy reports.
Display exploded product and part views.

The HP Service Parts Information (SPI) CD-ROM is a complement to HP PartSurfer. The SPI CD-ROM accesses the same data as HP PartSurfer, but with a slightly different interface. HP-SPI CD-ROMs can be purchased individually or as a subscription (includes four quarterly updates). You can subscribe to HP-SPI by fax, mail, or phone. For information on how to order the SPI CD-ROM, see the SPI website.

Service Parts Information (SPI) CD-ROM

This CD-ROM-based information tool for Windows provides fast, easy access to all the latest information on HP parts in one location. The application includes a Parts Database and Parts Information Reports.

Parts Database features enable you to:

Search for parts information by product model name or number.
Look up parts information by keyword, category, or part type.
Cross-reference any exchange part number to its new part number.
Identify all HP products that use or refer to a specific part number.
Locate part information when product information is unknown.
Display exploded product and part views.

Parts Information Reports provide:

On-screen and hard-copy reports by model name, number, and family.
Parts ID lists that give you product breakdown by category and keyword.
Parts price lists with product breakdown by part number.

SPI CD-ROM ordering information can be found on the website at http://partsurfer.hp.com/hp-spi/order_info.htm.

Electronic and Telephone Support Services


HP electronic and telephone services include:

Websites
ActiveUpdate
ActiveAnswers
Online services
Reseller services
Technical Support Center
Training

HP Websites
Having access to the latest HP technical product information, diagnostic software, and SoftPaq solution files is critical for all service providers. HP provides this information through a variety of websites, including:

ActiveUpdate: http://www.compaq.com/products/servers/management/activeupdate/
ActiveAnswers: http://h71019.www7.hp.com/ActiveAnswers/
Channel Services Network: http://web7.compaq.com/csn/
Customer Profile Center: http://compaq.mycustomprofile.com
HP PartSurfer: http://partsurfer.hp.com/cgi-bin/spi/main
HP Services: http://www.hp.com/services
HP Support: http://h71025.www7.hp.com/support/home/
HP Worldwide: http://www.hp.com
Product Change Notification: http://h18000.www1.hp.com/pcn/
QuickFind 2000: http://h18018.www1.hp.com/Cas-Catalog/quickfind.html
Training: http://h18004.www1.hp.com/training/

HP Worldwide Website
The HP worldwide website is located at http://www.hp.com.

The HP worldwide website is a major source for information and resources. It provides access to:

Product Information
Support Resources
Solutions
Services

HP Services Website
Access the HP Services website at http://www.hp.com/services or select Services from the HP home page.

The HP Services page provides details about the most comprehensive service and support programs available in the computer industry:

Hardware and Software Infrastructure eBusiness Platform eBusiness Solutions Industry Focused

HP Support Home
Access the HP Support website by selecting Support and Drivers from the HP worldwide website and then selecting Compaq and HP ProLiant Servers, or by using the following URL: http://h71025.www7.hp.com/support/home/index.asp

From the Support home page you can access information about the following topics:

Software and drivers
Natural language search
Reference library
Forums and communities
Support tools
Warranty information
Contact support
Parts
Feedback

Following are expanded descriptions for the type of information associated with each link:
Software and drivers: This link takes you to a page where you can search for software and drivers by server or by operating system, as well as locate SoftPaqs for specific products.
Natural language search: On this page you can use a search engine to find answers to questions you can ask in the same manner as you would in conversation, for example, "Where can I find information on the Rapid Deployment Pack?"
Reference library: In the Reference library you can get information for specific products, including service notifications, frequently asked questions, white papers, and manuals.
Forums and communities: This link takes you to a listing of a variety of forums, such as business customer discussion groups, customer communities, and forums for particular HP products.
Support tools: Among the support tools that you can find on this page are proactive notification tools, Internet call logging services, Internet connection services, InsightManager, and ActiveAnswers.
Warranty information: Here you can use a product serial number to determine the warranty expiration date for that product.
Contact support: This link takes you to a page where you can email technical support engineers with questions about ProLiant servers, find and order spare parts, locate resellers and service centers, and obtain telephone numbers for pre- and post-sales technical support worldwide.
Parts: Here you will find links to HP PartSurfer, the End User Replaceable Parts (EURP) program, and the Spare Parts Store, as well as spare part information and illustrations.
Feedback: This link takes you to a form where you can fill out a survey and provide information on how HP can make your support experience better in the future. Note: This is not the place for obtaining technical support. For that type of question, use Contact support.

HP Support Reference Library Product Selection


When you click reference library from the HP Support home page, you go to a page that allows you to select the particular model of server on which to obtain detailed product information.

In the Reference library you select from Product Category, Product Family and Product Series to target a particular server model. Once you have completed the selections a search engine proceeds to locate documents associated with the product.

HP Support Reference Library Product Selection

As displayed here the Reference Library search locates all of the documents associated with the product including:

Service notifications
Frequently asked questions
Parts documentation
Manuals
Service links
White papers

Support Information
A more detailed summary of the information in the Reference Library is shown in the table below.

Category
    Information

Customer Advisories
    A compilation of advisories on such topics as:
        Controllers and adapters
        Hardware
        Operating systems and utilities
        Internet solutions
        Communications and networks
        Clusters
        System management

Drivers, SoftPaqs, Software and Utilities
    Software information referenced by product family, model, and operating system, or by operating system and software category, including:
        Support Paq
        Display
        Management Agents
        Management Applications and Utilities
        Network
        Storage
        System ROMpaqs/BIOS
        Utilities

Manuals
    Product documentation including:
        Maintenance and Service Guide
        Setup and Installation Guide
        User and Reference Guides
        Option Related Guides

Other
    Other information available by product includes:
        Tech Notes
        Services
        White Papers

HP Channel Services Network (CSN)


The HP Channel Services Network is a virtual, global community made up of HP customers, HP service channel partners and HP. The HP Channel Services Network provides partners opportunities to enhance business capabilities to sell and deliver global IT services through a Web-based management system.

HP Channel Services Network provides real-time access to service, sales, and support information. Various business transactions can be conducted on CSN, including parts ordering and tracking, electronic claims, CarePaq registration, and Depot registration. Various reports and metrics, sales assistance, and support tools are also available.

Access CSN at http://web7.compaq.com/csn/. Sign up online or call 1-800-231-9977, option 8.

Based on their business model, partners can choose the appropriate partner program. Each partner program has a specific set of service offerings that partners can sell or deliver. Service delivery partners have access to training, service delivery methodologies, and varying levels of technical support, along with additional opportunities to partner with HP to deliver a complete solution to their customers. HP is currently streamlining the authorization process for partners. Check with your local channel account manager for more information.
Information and Support Tools
In addition to the various ordering and reporting functions of CSN, there are many tools that are valuable to service personnel. To access these tools, log in to CSN and select Tools List from the menu. Next, select Information and Support Tools. The following selections are available:

Diagnostic Tools/Utilities: Provides a search engine by operating system for HP Setup, Diagnostics, and Insight Manager. Links to any training or references for the utilities are available, as well as links to the SoftPaqs needed to load these utilities.
Technical Information/QuickFind: Provides a search engine to QuickFind and other technical databases. Search by serial number, document number, product, operating system, and geographic location. Also allows for selection of Service Advisories and Service Bulletins.
Technician's Toolbox: Provides subscriptions to technician tools. Also available is a Basic Toolkit, which includes QuickFind, SmartStart for Servers, and the Support Software CD for Portables, Desktops, and Professional Workstations.
Vendor Links: Provides direct access to vendor support websites (Comm, Baan, Cisco, Intel, Microsoft, Nortel, Novell, Oracle, and SCO).
Vendor Support Forum: Provides access to online forums (Baan, Cisco, Intel, Microsoft, and Novell), allowing technicians to discuss products or service challenges with other IT professionals.

HP partnership web

(Formerly Compaq business partner login)

This site provides information on:

How to become a partner
How to find a local HP partner
Referral tools for business partners
Training and certification

Site location is http://partner.americas.hp.com/

Information and notification services


HP customers and partners can receive timely notification of critical updates and product changes by subscribing to the services as described below:

ActiveUpdate
ActiveUpdate is a web-based client application that provides proactive notification and automatic delivery of software updates for HP servers, desktops, portables, workstations, and handheld PCs. It connects you to a secure HP server that delivers the latest updates and notifications based on your subscription profile. Once they are downloaded to your local or networked database repository, you choose which updates to implement, and when to deploy them.

Product Change Notification
The Product Change Notification (PCN) system uses a secure web site for proactively communicating product changes via e-mail. Based on a customer-provided profile, PCN notifies customers 30 to 60 days in advance of upcoming critical changes that may impact their computing environment.

Customer Profile Center
The Customer Profile Center allows customers to receive information via e-mail that is relevant to their needs: the latest product announcements, updates, news, and special offers that will help users make more informed purchasing decisions.

The following table summarizes and compares the features of the information and notification services described above:

Comparison of information services
ActiveUpdate
    Proactive delivery of information and software for specified SoftPaqs via web-based subscription service.
    Customer controls which updates to deploy and when.
    Requires the installation of the ActiveUpdate client application.
    Available for Microsoft Windows platforms only.

PCN
    Proactive delivery of information on planned hardware and software changes via e-mail.
    Notification is sent 30 to 60 days in advance.
    Does not require the installation of client software.

Profile Center
    Proactive delivery of information and links for response to marketing offers via e-mail.
    Does not require the installation of client software.


ActiveUpdate
ActiveUpdate is a web-based application that proactively notifies and delivers the latest software updates for HP servers, desktops, workstations, and portables.

Saves you time by downloading and storing new updates automatically
Delivers information customized to your needs
Provides easy-to-understand descriptions of the software updates
Simplifies access to the latest software updates for HP servers, desktops, portables, and workstations by providing a single point of access

System administrators can subscribe to software updates by server, desktop, workstation or portable models at http://h18000.www1.hp.com/products/servers/management/activeupdate/. Select the models, operating systems, and languages for the SoftPaq files you want downloaded. You must submit your subscription in order to receive downloads. Minimum requirements for using ActiveUpdate are as follows:

Operating System: Windows 95/98, Windows 2000 Professional, or Windows NT Workstation/Server 4.0
Minimum Hardware: Pentium or higher recommended
Memory: Minimum 32MB RAM for Windows 95/98; 64MB for Windows 2000 Professional or Windows NT 4.0
Disk Space: 20MB for the ActiveUpdate software and 1GB for the local cache
Internet Connection: Internet connection required (direct or dial-up)
Web Browser: Microsoft Internet Explorer 5.0 or higher


ActiveAnswers

ActiveAnswers provides a dynamic set of tools, e-services, and information to help customers plan, deploy, and operate business solutions. Designed for CIOs, IT managers, VARs, systems integrators, and consultants, ActiveAnswers simplifies solutions to help you achieve faster returns on your IT investments. First-time users are asked to register. Previous users can log on and begin their research at http://h71019.www7.hp.com/ActiveAnswers/

Categories available for research include:

Customer Relationship Management
Database and Business Intelligence
ERP and Supply Chain
Infrastructure and Architecture
Internet and E-Commerce
Messaging/Collaboration and Portals
Telecom and Service Providers


Reseller Services
HP provides 24-hour-a-day, 7-day-a-week support and services for its Authorized Service Providers at 1-800-231-9977. (Outside of North America, contact your local Geo.) You must supply your Authorized Reseller ID to use this toll-free service. A call prompter answers your call with a recorded message, then instructs you to select the type of service needed. Sections available include:

Technical Support
Spare part information, warranty verification, Service Order Management, or field return receiving
Product features and configuration information
Accredited Systems Engineer Support or HPCare Systems Partner (special IDs are required to access these services)

When you call for technical assistance with your HP server or server option, be sure to have the following information available:

HP Reseller ID#
Product name, model number, and serial number
Hardware configuration and expansion boards installed
Detailed description of any error messages and any associated error codes
Knowledge of the conditions under which the problem occurred
Familiarity with any previous troubleshooting steps taken
Hard copy of data (INSPECT)
Hard copy of System Configuration Resource Map
Version of network operating systems
Printouts of software configuration files setup
Updated ROM and drivers, and recorded versions
All network operating systems patches installed and up to date
An INSPECT, Survey, or Insight Manager report ready to fax or e-mail


Product Information Center


The HP Product Information Center can help you obtain HP product information, locate an authorized reseller, order software or purchase out-of-warranty parts. Most services are available 24 hours a day, 7 days a week, including holidays. To contact the HP Customer Support Center, call 1-800-345-1518. (In Canada, call 1-800-567-1616. Outside of North America, contact your local Geo.) The Product Information Center can:

Provide pre-sales and post-sales product information for commercial and consumer desktops, workstations, and portable units.
Send brochures and QuickSpecs.
Provide part number, configuration, and upgrade information.
Provide assistance with dealer locations if you are unable to access it from the option on the menu.

Technical Support Center


HP Technical Support, warranty service, and software support are available 7 days a week, 24 hours a day, 365 days a year at 1-800-652-6672 (1-800-OK-COMPAQ). (Outside of North America, contact your local Geo.) HP receives more than 17,000 telephone calls a day for support and technical assistance. For the HP Technical Support Engineer to assist you, it is essential that you have complete information about the system and the problems for which you are requesting assistance. It is best to call from the location of the machine being serviced.


Training

You can obtain HP training information and registration from the following sources:

U.S.: 1-800-732-5741
Canada: 1-800-392-7024
Outside of North America, contact your local Geo.

Information and registration are also available on the Education and Training website at http://h18014.www1.hp.com/training/

The following information/services are available:

Training schedule or information via fax
Self-paced training or training installation video information and ordering
Sales and technical training information and registration
Student ID number for registering in Service, Sales, or HP Accredited Professional classes


Learning Check
1. A unit configuration code is embedded in the 12-digit serial number used in pre-merger HP.
   True
   False

2. The MSG includes an illustrated parts catalog and is a good place to find spare part numbers.
   True
   False

3. Which service resource provides partners opportunities to enhance business capabilities to sell and deliver global IT services through a Web-based management system?
   a. ActiveUpdate
   b. ActiveAnswers
   c. Channel Services Network
   d. Support website

4. Which service is a web-based client application that provides proactive notification and automatic delivery of software updates for HP servers?
   a. ActiveUpdate
   b. ActiveAnswers
   c. Service Parts Information
   d. Support website

5. What information resource would contain parts removal procedures?
   a. Service Bulletin
   b. Support website
   c. Service Parts Information
   d. Maintenance and Service Guide

6. Which of the following would have information about new products?
   a. Service Advisories
   b. Service Announcements
   c. Service Bulletins
   d. All of the above

7. What information resource would allow you to identify all HP products that use or reference a specific part number?
   a. Maintenance and Service Guide
   b. ActiveAnswers
   c. Channel Services Network
   d. HP PartSurfer

8. Where would you find a link to the Spare Parts Store?
   a. HP support website
   b. ActiveAnswers
   c. Channel Services Network
   d. HP PartSurfer

9. What resource would you use to locate white papers for a particular ProLiant server?
   a. HP support website
   b. ActiveAnswers
   c. Channel Services Network
   d. HP PartSurfer

10. What resource provides a dynamic set of tools, e-services, and information to help customers plan, deploy, and operate business solutions?
   a. ActiveUpdate
   b. ActiveAnswers
   c. Channel Support Network
   d. HP Support website


Server Technology
Module 4

Introduction
To develop outstanding troubleshooting skills, it is important to know how the servers function normally. The remaining modules focus on the tools needed to develop those troubleshooting skills. This module provides an overview of Server Technology specific to ProLiant servers. It describes the various devices and subsystems that make up a ProLiant server. Topics include:

PCI and PCI-X
SCSI technologies
Server subsystems
    Processors
    Memory
    Power
    Input/output
    Software
Network adapters
Fault prevention and recovery management

Objectives
To demonstrate an understanding of ProLiant server technology, service engineers should be able to:

Describe the features of PCI, PCI-X, and SCSI bus architectures
Identify the various subsystems that make up a server
Identify the components of each subsystem
Describe the interaction between the various server subsystems
Describe how the server will react in a failure situation


PCI
PCI architecture and features
The PCI bus (Peripheral Component Interconnect) is a local-bus design developed by Intel, Compaq, DEC, IBM and NCR in late 1991. The focus is oriented around electrical specifications at the expense of ease of integration. The PCI standard offers a number of features and advantages. The fundamental design of PCI involves a buffered local bus. The bus always utilizes some sort of PCI bridge which provides a number of advantages, most importantly making the PCI bus processor independent. Bridging allows buffering and concurrent bus master access, suiting the bus to multitasking environments. By placing a bridge between the bus and the microprocessor the bus frequency may be standardized, eliminating the issues caused by varying processor frequencies in different systems. PCI also offers 64-bit support, making the bus very well suited to Pentium implementations.

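[Figure: PCI system block diagram. The CPU and L2 cache sit on the system bus. A CPU-PCI bridge connects the system bus to the PCI bus, which serves the SCSI controller and the PCI slots. A PCI-EISA bridge connects the PCI bus to the EISA bus, which serves the diskette, COM, and LPT interfaces and the EISA slots.]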


PCI Features

The most significant features and characteristics of the PCI bus are:

Provides switchless and jumperless support.
Plug and play capable; there is no requirement to run a configuration utility in a PCI-only system. (A PCI/EISA system will require the configuration utility.)
Utilizes a multiplexed bus, meaning addresses and data move over the same set of wires. This requires less system board space and results in fewer traces or wires than ISA/EISA.
The processor-independent design allows the bus to be supported under many processors. One option card will work in computers with different processors.
The bridging requirement protects against potential problems associated with high-speed host/local buses.
Supports intelligent I/O devices and burst mode transfers of 133 MB/second.
The PCI bus may buffer read or write activity to allow the processor to continue with other tasks rather than wait for the I/O operation to complete.
Currently, HP uses 32-bit and 64-bit wide PCI buses. 64-bit PCI cards can be installed in a 32-bit bus; however, they will run only at 32-bit.
Parity checking is done on all server PCI buses (control, address, data).

PCI Bus Speeds and Transfer Rates

The transfer rates of ProLiant systems are determined by the speed of the various buses, which are derived from the system clock. Peripherals such as the graphics subsystem, network controllers, and hard drives take advantage of the PCI local bus for faster system throughput. The PCI bus provides a 32-bit data path operating at 33 MHz or 66 MHz, or a 64-bit data path operating at 33 MHz or 66 MHz. This greatly increases the performance of peripherals such as the graphics subsystem and the network controller.

PCI Bus Performance

Card          Clock     Transfer rate
32-bit card   33 MHz    133 MB/s
32-bit card   66 MHz    267 MB/s
64-bit card   33 MHz    267 MB/s
64-bit card   66 MHz    533 MB/s
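The figures in the performance table follow from the bus width and clock rate. The sketch below is an illustration only, not an HP tool: the function name is hypothetical, and it assumes the nominal PCI clocks of 33.33 MHz and 66.66 MHz that are usually written as 33 MHz and 66 MHz.

```python
def pci_peak_bandwidth_mb_s(bus_width_bits, clock_mhz):
    """Peak transfer rate in MB/s: bytes moved per clock times clocks per microsecond."""
    return bus_width_bits / 8 * clock_mhz

# (bus width in bits, label, assumed nominal clock in MHz)
CONFIGS = [
    (32, "33 MHz", 33.33),
    (32, "66 MHz", 66.66),
    (64, "33 MHz", 33.33),
    (64, "66 MHz", 66.66),
]

for width, label, clock in CONFIGS:
    rate = pci_peak_bandwidth_mb_s(width, clock)
    print(f"{width}-bit card at {label}: {rate:.0f} MB/s")
```

These are peak burst rates; sustained throughput on a shared bus is lower.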


PCI-X: An Evolution of the PCI Bus


PCI-X is an evolutionary bus architecture based on the prevalent PCI bus. PCI-X technology leverages the wide acceptance of the PCI bus and provides an evolutionary I/O upgrade to conventional PCI.

PCI-X 1.0 technology increases bus capacity to more than eight times the conventional PCI bus bandwidth, from 133 MB/s with the 32-bit, 33-MHz PCI bus to 1066 MB/s (1 GB/s) with the 64-bit, 133-MHz PCI-X bus. PCI-X achieves this performance through the use of a register-to-register design that allows higher clock frequencies, and through new protocol enhancements such as the attribute phase and split transactions that allow more efficient use of the bus.

PCI-X technology is backward compatible with conventional PCI systems at the system, device driver, and adapter level. Conventional PCI adapters will operate in PCI-X systems, and vice versa. PCI-X (like PCI) adjusts the operating frequency to match that of the slowest device on the bus. If a 33-MHz adapter is present, the bus must operate at 33 MHz. If only conventional 66-MHz devices are present, a PCI bus optionally operates in conventional 66-MHz or 33-MHz mode.

The PCI-X 2.0 specification adds two new transfer rates, PCI-X 266 and PCI-X 533, which provide bandwidths of 2 GB/s and 4 GB/s respectively. PCI-X 2.0 is backward compatible. Existing PCI and PCI-X adapter cards will operate in PCI-X 2.0 systems, and new PCI-X 2.0 adapter cards will operate in existing PCI and PCI-X systems.

The table below summarizes the clock speed, voltage levels, and bandwidth of the PCI and PCI-X specifications:
PCI and PCI-X Specifications

Spec        Year   Clock (MHz)   Level (Volts)   32-bit BW (MB/s)   64-bit BW (MB/s)
PCI 2.3     2002   33            5 or 3.3        133                266
                   66            3.3             266                533
PCI-X 1.0   1999   66            3.3             266                533
                   100           3.3             400                800
                   133           3.3             533                1070
PCI-X 2.0   2002   66            3.3             266                533
                   100           3.3             400                800
                   133           3.3             533                1070
                   266           3.3 - 1.5       1070               2130
                   533           3.3 - 1.5       2130               4270
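The rule that a shared PCI or PCI-X bus runs at the speed of its slowest device can be sketched in a few lines. This is a simplified illustration with hypothetical names; it models only the clock selection and ignores the distinction between conventional PCI mode and PCI-X mode.

```python
def shared_bus_clock_mhz(adapter_clocks_mhz, slot_clock_mhz=133):
    """Return the clock a shared bus segment can run at.

    adapter_clocks_mhz: maximum clock supported by each installed adapter.
    slot_clock_mhz: maximum clock the bus segment itself supports
                    (133 is an assumed default for a PCI-X 1.0 segment).
    """
    # The bus cannot run faster than its slowest participant.
    return min([slot_clock_mhz, *adapter_clocks_mhz])

# A single 33-MHz adapter pulls the whole segment down to 33 MHz.
print(shared_bus_clock_mhz([133, 133, 33]))
# With only 100-MHz and 133-MHz adapters the segment runs at 100 MHz.
print(shared_bus_clock_mhz([133, 100]))
```

In practice this is why a slow legacy adapter is usually placed on its own bus segment, away from high-speed adapters.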


PCI bus slots

The PCI interface provides two bus widths (32- and 64-bit) and two signaling levels (5- and 3.3-volt). Below are the four types of PCI expansion slots used on personal computers and servers.

[Figure: The four PCI slot types: 32-bit, 5v; 32-bit, 3.3v; 64-bit, 5v; and 64-bit, 3.3v. Each 64-bit slot consists of the 32-bit connector plus a 64-bit connector extension.]

Adapter bus widths and slot bus widths are completely interoperable; a 32-bit card can be used in a 64-bit slot and a 64-bit card can be used in a 32-bit slot (although its operation will be limited to 32-bit transfers). However, the signaling level may restrict where a card can be installed. A keyed scheme is used on the 32-bit connector to determine the signaling level.

Correspondingly, PCI cards are keyed in one of three ways: 5-volt, Universal, or 3.3-volt (shown below). A Universal card can be installed in either a 5- or 3.3-volt slot, but a 5- or 3.3-volt card must be installed in a slot that specifically supports its level of signaling. The latest systems designed to support faster slots with 3.3-volt signaling will accept only Universal and 3.3-volt PCI cards. Legacy cards keyed for 5-volt signaling will not work in systems that provide only 3.3-volt slots.
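The keying rules can be expressed as a small compatibility check. The sketch below is illustrative only; the function name and the string labels for card keying and slot signaling are assumptions made for this example.

```python
def card_fits_slot(card_keying, slot_signaling):
    """Return True if a PCI card keyed as card_keying can be installed
    in a slot that uses slot_signaling.

    card_keying: "5V", "3.3V", or "universal"
    slot_signaling: "5V" or "3.3V"
    """
    if slot_signaling not in ("5V", "3.3V"):
        raise ValueError(f"unknown slot signaling: {slot_signaling}")
    if card_keying == "universal":
        return True  # Universal cards are keyed for both signaling levels
    if card_keying in ("5V", "3.3V"):
        return card_keying == slot_signaling
    raise ValueError(f"unknown card keying: {card_keying}")

# A legacy 5-volt card does not fit a 3.3-volt-only slot.
print(card_fits_slot("5V", "3.3V"))
# A Universal card fits either slot type.
print(card_fits_slot("universal", "3.3V"))
```

Bus width is deliberately left out of the check because, as described above, 32-bit and 64-bit cards and slots are interoperable.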


SCSI Architecture
SCSI (Small Computer System Interface) is a system-level parallel channel or I/O bus designed for interconnecting peripheral devices that have intelligent controllers, thereby allowing control signals and data to flow to all peripherals. The (Smart Array) controller does not need to know how many cylinders, heads, or sectors are available on each device. The local device intelligence is capable of managing these functions, including errors. Servers and workstations use SCSI to communicate with a variety of external RAID storage devices.

The SCSI system contains three main components:

SCSI Controller
SCSI Bus (cable)
SCSI Device(s)

SCSI Controller
The SCSI controller is the interface between the computer and the other devices on the bus. The controller may be built into the motherboard or housed on a SCSI host bus adapter card in a PCI or PCI-X slot.


SCSI Cables
SCSI cables consist of 34 pairs of multi-stranded flexible copper wires for a total of 68 conductors. SCSI devices inside the server are connected to the SCSI controller using a 68-pin ribbon cable. The ribbon cable has a connector at each end and one or more connectors along its length. External SCSI devices are connected to the SCSI controller on the SCSI host bus adapter using a round 68-pin cable. Two terminators, one at each end of the SCSI bus, prevent signal reflections within the cables.

SCSI Devices
All SCSI devices share the same data and control lines, and only two devices (an initiator and a target) can communicate at a time. To facilitate communication on the bus, each device must have a unique address or ID number. The number of physical addresses on a bus is a function of the bus width. There can be up to eight devices on an 8-bit bus (ID numbers 0 to 7) and up to 16 devices on a 16-bit bus (ID numbers 0 to 15).
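The addressing rule lends itself to a quick sanity check when planning a bus. The following sketch uses hypothetical helper names and simply lists the valid IDs for a given bus width, then flags duplicate or out-of-range assignments.

```python
from collections import Counter

def valid_scsi_ids(bus_width_bits):
    """Return the ID numbers available on a bus of the given width."""
    if bus_width_bits not in (8, 16):
        raise ValueError("expected an 8-bit or 16-bit SCSI bus")
    return range(bus_width_bits)  # 0-7 on an 8-bit bus, 0-15 on a 16-bit bus

def check_ids(assigned_ids, bus_width_bits):
    """Return (duplicates, out_of_range) for a list of assigned IDs."""
    allowed = valid_scsi_ids(bus_width_bits)
    counts = Counter(assigned_ids)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    out_of_range = sorted(i for i in counts if i not in allowed)
    return duplicates, out_of_range

# Controller at ID 7 plus four drives; ID 1 is used twice and
# ID 9 does not exist on an 8-bit bus.
print(check_ids([7, 0, 1, 1, 9], bus_width_bits=8))
```

The controller counts as a device, so it takes one of the available IDs.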


SCSI Protocols
SCSI-1
SCSI-1 devices used proprietary commands and were very often incompatible with each other. SCSI-1 supports 5 MB/s transfers in synchronous mode and 3 MB/s transfers in asynchronous mode. In synchronous mode, a block of bytes as a whole is acknowledged by the target. In asynchronous mode, each byte is acknowledged by the target. Today asynchronous mode is used only during the command phase.

SCSI-2
New speed levels allow for 20 MB/s (Fast Wide). One of the greatest achievements of the SCSI-2 specification is the Common Command Set, which allows devices from various vendors to cooperate.

SCSI-3
SCSI-3 defines Wide Ultra (40 MB/s), Wide Ultra2 (80 MB/s), Ultra3 (160 MB/s), and Ultra320 (320 MB/s) transfers as well as the new LVD interface.

The following table summarizes the characteristics of SCSI protocols.

Fast / Fast Wide SCSI
Fast SCSI reduces the signal length from 200ns to 100ns and doubles the transfer rate to 10 MB/s. Reducing the signal length makes the signals much more prone to distortion. Fast SCSI requires active termination and high quality cables, and supports cable lengths up to 3 meters. Fast Wide SCSI allows a transfer rate of 20 MB/s.

Ultra / Wide Ultra SCSI
By reducing the signal length to 50ns, the transfer rate achieves 40 MB/s for a Wide Ultra SCSI transfer. The cable length is reduced to 1.5 meters (about 5 feet). High quality cables with a matched impedance are required.

Wide Ultra2 SCSI
Wide Ultra2 is the first SCSI protocol that uses low voltage differential (LVD) signaling instead of Single Ended (SE) signaling. The transfer rate is 80 MB/s and cables can have a length of up to 12 meters.

Ultra3 SCSI
Also called Ultra 160, Ultra3 SCSI increases the transfer rate to 160 MB/s and improves data reliability by adding a CRC checksum. It also supports domain validation. Ultra3 SCSI is available only as Wide SCSI.

Ultra 320 SCSI
Ultra 320 increases throughput to 320 MB/s and adds technologies that improve bus utility and data integrity. These include:

Higher clock frequency
Ultra 320, like Ultra3, uses double transition clocking to trigger data transfer on both the rising and falling edges of the bus clock signal. It also operates at 80 MHz, twice the frequency of Ultra3 (Ultra 160).

Data streaming
Read data streaming minimizes the overhead of data transfer by allowing the target to send one data stream packet followed by multiple data packets. Write data streaming performance is also increased because the bus turnaround delay is not incurred between each data packet.

Packetization and QAS
During arbitration no data is being transferred on the bus, so decreasing arbitration time improves SCSI system performance. Quick arbitration and selection (QAS) eliminates the bus free phase and reduces the number of times arbitration must occur. In other words, QAS allows a device waiting for the bus to grab the bus without arbitration after the previous initiator and target disconnect. Together, QAS and packetization increase performance by 20 to 30%.

Flow control
Flow control allows the initiator to optimize its pre-fetching of data during writes and flushing of data FIFOs during reads. The target device indicates when the last packet of a data stream will be transferred, which allows the initiator to terminate the data pre-fetch or begin flushing data FIFOs sooner than previously possible.
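The transfer rates quoted for each protocol can be derived from three quantities: the bus clock, the number of transfers per clock cycle, and the bus width. The sketch below is a worked illustration with hypothetical function names. The 200 ns, 100 ns, and 50 ns signal lengths and the 80 MHz Ultra 320 clock come from the text above; the 40 MHz clock used for Wide Ultra2 and Ultra3 is an assumption consistent with Ultra 320 running at twice the Ultra3 frequency.

```python
def clock_mhz_from_signal_length(signal_length_ns):
    """Convert a signal length (clock period) in ns to a clock rate in MHz."""
    return 1000 / signal_length_ns

def scsi_rate_mb_s(clock_mhz, bus_width_bits, double_transition=False):
    """Peak rate in MB/s: clock x transfers per clock x bytes per transfer."""
    transfers_per_clock = 2 if double_transition else 1
    return clock_mhz * transfers_per_clock * bus_width_bits / 8

# name: (clock in MHz, bus width in bits, double transition clocking)
PROTOCOLS = {
    "SCSI-1 (synchronous)": (clock_mhz_from_signal_length(200), 8, False),
    "Fast SCSI": (clock_mhz_from_signal_length(100), 8, False),
    "Fast Wide SCSI": (clock_mhz_from_signal_length(100), 16, False),
    "Wide Ultra SCSI": (clock_mhz_from_signal_length(50), 16, False),
    "Wide Ultra2 SCSI": (40, 16, False),  # assumed 40 MHz clock
    "Ultra3 SCSI": (40, 16, True),        # assumed 40 MHz, both clock edges
    "Ultra 320 SCSI": (80, 16, True),     # 80 MHz, both clock edges
}

for name, (clock, width, dt) in PROTOCOLS.items():
    print(f"{name}: {scsi_rate_mb_s(clock, width, dt):.0f} MB/s")
```

Each step up either halves the signal length, doubles the bus width, or adds a second transfer per clock cycle, which is why the rates double from one generation to the next.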


Electrical Interface
There are three electrical levels of SCSI: Single Ended (SE), High voltage differential (HVD) and Low voltage differential (LVD)

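[Figure: Single Ended signaling compared with LV Differential signaling. Single Ended uses one data line referenced to common ground, with a signal level greater than 2.5 volts. LV Differential uses a Data+ line and a Data- line, with a signal level greater than 60 mV.]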

Single Ended (SE) SCSI
Single-ended SCSI uses the ground line as a signal reference. The receiver detects the magnitude of the signal (TTL technology) and decides whether the signal is a logical one or a logical zero. SE SCSI is very sensitive to ground shifts and electromagnetic interference (EMI). For this reason, single-ended SCSI allows only short cables.

High Voltage Differential (HVD) SCSI
A much less commonly used SCSI technology is High Voltage Differential (HVD) SCSI, where the signals travel on two wires. The difference in voltage between the wire pairs determines whether the signal is a logical one or zero. HVD technology has excellent noise immunity and a maximum cable length of 25 meters, but requires external transceivers, making it more expensive than SE and LVD. The fastest transfer mode supported by HVD is Wide Ultra. Older tape libraries used HVD. HVD SCSI devices cannot be mixed with SE or LVD SCSI devices.

Low-Voltage Differential (LVD) SCSI
Low Voltage Differential SCSI takes all the advantages of HVD and adds new features. Low signal voltage swings allow the whole technology to be integrated on a single chip. LVD SCSI is backward compatible with Single Ended SCSI. However, once a single-ended device is connected to the bus, all devices will operate in SE mode. LVD was first implemented in Ultra2 SCSI technology. Maximum cable length is 12 meters. All newer SCSI developments starting with Ultra2 SCSI must use LVD signaling.

SCSI bus termination


Both ends of the SCSI chain are terminated, and the termination may be internal to the SCSI devices that are on the ends of the cable. Passive termination uses only resistors. It works well for standard SCSI and Fast SCSI. Active termination uses a voltage regulator and is required for Wide Ultra SCSI and all faster SCSI protocols.
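[Figure: Passive and active termination circuits. Passive: a 220-ohm resistor from TermPwr to the data line and a 330-ohm resistor from the data line to ground, holding the data line at about 3 volts. Active: a 2.85V voltage regulator fed from TermPwr, connected to the data line through a 110-ohm resistor.]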
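The 3-volt level on a passively terminated data line follows from the 220-ohm and 330-ohm resistors acting as a voltage divider. The sketch below works through the arithmetic; it assumes a TermPwr supply of 5 volts, which is not stated in the text, and the function names are hypothetical.

```python
def passive_terminator_voltage(term_pwr_v, r_pullup_ohm=220.0, r_pulldown_ohm=330.0):
    """Idle voltage on a data line held by a resistive divider.

    The pull-up resistor connects TermPwr to the data line and the
    pull-down resistor connects the data line to ground.
    """
    return term_pwr_v * r_pulldown_ohm / (r_pullup_ohm + r_pulldown_ohm)

def equivalent_resistance_ohm(r_pullup_ohm=220.0, r_pulldown_ohm=330.0):
    """Resistance of the two divider resistors seen in parallel from the data line."""
    return r_pullup_ohm * r_pulldown_ohm / (r_pullup_ohm + r_pulldown_ohm)

TERM_PWR_V = 5.0  # assumed supply voltage for this example

print(passive_terminator_voltage(TERM_PWR_V))  # idle line voltage in volts
print(equivalent_resistance_ohm())             # terminating resistance in ohms
```

Because the passive divider's output scales directly with TermPwr, any sag in the supply shifts the line voltage. An active terminator avoids this by deriving the line voltage from a regulator, which is one reason it is required at higher speeds.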

ProLiant servers and options do not use termination at the device level. Internal termination for ProLiant servers is active and handled on the bus. Use cables with integrated active termination. Drive cages have active termination on the backplane board. Disk drive enclosures are also terminated by default. No action is required when a disk enclosure is connected to a server. Devices placed on the hot-plug backplane should not be terminated. Termination is handled actively onboard the controller, with active termination applied at the cable end.


SCSI device compatibility


LVD Backward Compatibility
LVD SCSI devices are backward compatible. When Ultra3 and Wide Ultra2 devices are connected to an Ultra3 controller, Ultra3 devices will operate in Ultra3 mode and WU2 devices will operate in WU2 mode (see illustration).

LVD/SE Backward Compatibility
LVD SCSI devices are compatible with SE (single ended) devices. If LVD devices are mixed with Wide Ultra devices, all devices will fall back to SE operation and run in Wide Ultra mode. LVD operation is not possible under these circumstances. To benefit from the high data transfer rates of LVD SCSI, do not mix LVD and SE devices. Ultra320, Ultra3, and Ultra2 transfer rates will be enabled only when all devices on the bus use LVD signaling. This requires an LVD controller (Ultra320, Ultra3, or Wide Ultra2), LVD devices (all of them), and LVD-compatible termination (e.g., new cables with an active LVD terminator).
[Figure: Two example SCSI buses. On the first bus, Ultra3, Ultra320, and Wide Ultra2 devices run in LVD mode at 160 MB/s, 320 MB/s, and 80 MB/s respectively: all devices run at full speed (if the speed is supported by the controller). On the second bus, Ultra3, Wide Ultra2, and Wide Ultra devices all run at 40 MB/s: all devices run in Wide Ultra mode (fallback to SE).]


Do not use the old standard SCSI cables with an attached terminator. This terminator supports only the single-ended operating mode. Ultra320, Ultra3, and Ultra2 all require a newer type of terminator that supports both LVD and single-ended operation.
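The fallback behavior described in this section can be summarized as a simple rule: one single-ended device forces the whole bus into Wide Ultra mode; otherwise each device runs at its own rate, capped by the controller. The sketch below is a simplified model with hypothetical names, and it treats Wide Ultra as the only single-ended protocol on the bus.

```python
RATE_MB_S = {"Wide Ultra": 40, "Wide Ultra2": 80, "Ultra3": 160, "Ultra320": 320}
SINGLE_ENDED = {"Wide Ultra"}  # the LVD protocols start with Wide Ultra2

def negotiated_rates(controller, devices):
    """Return the transfer rate (MB/s) each device runs at, in order."""
    if controller in SINGLE_ENDED or any(d in SINGLE_ENDED for d in devices):
        # One SE device drops the entire bus to single-ended operation.
        return [RATE_MB_S["Wide Ultra"]] * len(devices)
    ceiling = RATE_MB_S[controller]
    # All-LVD bus: each device runs at full speed, if the controller supports it.
    return [min(RATE_MB_S[d], ceiling) for d in devices]

print(negotiated_rates("Ultra320", ["Ultra3", "Ultra320", "Wide Ultra2"]))
print(negotiated_rates("Ultra3", ["Ultra3", "Wide Ultra2", "Wide Ultra"]))
```

The second call shows why mixing is discouraged: a single Wide Ultra device reduces two faster drives to 40 MB/s.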

SCSI configuration
SCSI IDs
SCSI IDs are usually set up on the drives by selecting a unique ID number through an array of jumpers. ID 7 is reserved for the SCSI Host Bus Adapter (Smart Array Controller). IDs are set automatically with HP hot-pluggable hard drives.

SCSI devices
SCSI devices are daisy chained together using a common conductor or cable. This conductor is a hot-plug backplane in most ProLiant servers. All signals are common between all SCSI devices on the 50- or 68-pin cable.

Internal and External Connectors
The internal and external connectors of a single SCSI bus cannot be used at the same time. If you have both internal and external devices, two separate SCSI channels must be used. This requires two controllers or a multi-channel controller.

[Figure: Controller connector arrangements showing Port-1 and Port-2.]

Maximum Supported Devices per Bus
Single Ended SCSI supports up to 7 devices; a repeater is required for up to 15 devices. LVD-based SCSI supports up to 15 devices per bus without a repeater. In many documents the maximum number is stated as 14 because the largest StorageWorks drive enclosures support only 14 drives.


SCSI Connectors
These illustrations show the various wide and narrow internal and external SCSI cable connectors.

To help identify SCSI cables, keep the following characteristics in mind:


External SCSI cables have a round wire with securable connectors.
Internal SCSI cables have a ribbon wire with push-on connectors.
SCSI cables are keyed to deter improper installation.
Internal Wide SCSI cables are narrower than standard internal SCSI cables.
The external 68-pin (Fast-Wide) cable is wider and the internal 68-pin (Fast-Wide) cable is smaller than the 50-pin one used by Fast-SCSI.


Booting from a SCSI controller


Boot Controller Order
Use the System Configuration Utility or ROM Based Setup to select the boot controller. The controller that is set as the primary controller will be the boot controller.

Booting from a Standard SCSI Controller
Embedded SCSI controllers and PCI SCSI controllers usually boot from the disk drive with the lowest SCSI ID. Always set the ID of the boot disk to 0. This will help to avoid problems when additional drives are added later.

Booting from an HP SMART Controller
SMART controllers always boot from the first logical drive. The SCSI ID is not relevant in this case. Make sure to configure the boot drive first. The boot order cannot be changed after the operating system has been installed.
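The two boot rules can be contrasted in a short sketch. This is an illustration of the selection logic only, not controller firmware; the function name and the "standard" and "smart" labels are hypothetical.

```python
def select_boot_drive(controller_type, scsi_ids=(), logical_drives=()):
    """Return the boot target for a controller.

    controller_type: "standard" for embedded/PCI SCSI controllers,
                     "smart" for SMART array controllers.
    scsi_ids: SCSI IDs of the attached disk drives (standard controllers).
    logical_drives: logical drive names in the order they were configured
                    (SMART controllers).
    """
    if controller_type == "standard":
        if not scsi_ids:
            raise ValueError("no drives attached")
        return min(scsi_ids)      # boots from the lowest SCSI ID
    if controller_type == "smart":
        if not logical_drives:
            raise ValueError("no logical drives configured")
        return logical_drives[0]  # boots from the first logical drive
    raise ValueError(f"unknown controller type: {controller_type}")

# Boot disk at ID 2: adding a drive at ID 0 later silently changes the boot drive.
print(select_boot_drive("standard", scsi_ids=[2, 3]))
print(select_boot_drive("standard", scsi_ids=[2, 3, 0]))
print(select_boot_drive("smart", logical_drives=["boot volume", "data volume"]))
```

The first two calls show the reason for always setting the boot disk to ID 0 on a standard controller.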


Serial Attached SCSI


Serial Attached SCSI (SAS) is the logical evolution of the traditional SCSI interface. As its name implies, Serial Attached SCSI transports the SCSI protocol over a serial interface. This enables faster device interconnect speeds, simpler cabling, and improved system reliability. The Serial Attached SCSI specification revision 1.0 is expected to be released as an ANSI standard in the first half of 2003. Products are expected to follow in limited numbers in 2004 and begin shipping in volume in 2005.

SAS will use the same electrical and physical interface as Serial ATA (SATA), which will allow its controller to accept either a SATA or SAS hard drive. However, the SAS connector has a filled-in notch to ensure that its hard drives cannot be plugged into a SATA controller. This is because Serial ATA host controllers do not understand SCSI protocols.

Serial Attached SCSI also provides connectivity to a large number of drives using a device called an expander. Expanders are virtual circuit switches that are configured between SAS drives and the host controller. Each expander allows connection to 64 ports, including host connections, hard drives, or other expanders. Edge expanders are typically housed in the drive enclosure, while fan-out expanders support large configurations of up to 4096 total devices.

Today two different backplanes are used to support parallel SCSI and parallel ATA. Using a common interface will enable manufacturers to use a single backplane to support both Serial Attached SCSI and Serial ATA. This will reduce both cost and complexity by reducing the number of layers and signal traces in the backplane.

End users will benefit from the reduced expense of upgrading from ATA to SCSI hard drives. ATA drives are used where cost and capacity are important. As users' growth and applications require higher performance and reliability, an upgrade to Serial Attached SCSI hard drives will be facilitated by the common interface.


Hot-Plug Drives
Hot-plug drive support allows a failed physical drive in a hardware or software fault tolerant volume to be replaced while the computer is still running. This support requires an array controller supporting hot-plug drives and a hot-plug drive bus. The family of SMART Array Controllers provides this capability.

[Figure: On-line drive access during a drive failure.]

Hot-plug support enhances the capability of the On-Line Spare drive because the failed drive may be replaced while the computer is still running. The On-Line Spare may then become available for any further failed drives. Replacement hot-plug drives may be equal to or larger in size than the original drive. Note the following about hot-pluggable hard drives:

The drives require support by the SCSI controller and bus.
They support a variety of both hardware and software fault tolerance.
In a fault tolerant environment, a failed drive can be replaced without bringing the system down.


Server Subsystems
The main components of a system are the buses, the controllers, and the subsystems. A server has five subsystems:
Subsystem       Components or FRUs

Processor       Processor(s)
                Processor board(s), if any
                System board
                Processor bus circuitry, including GTL (Gunning Transceiver Logic) bus
                Terminator board

Memory          Memory modules
                Memory expansion board(s)
                Processor boards that have SIMM sockets and soldered memory
                Memory controller chip is usually on system board, but some system boards have memory module sockets or soldered-on memory

Power           Internal:
                  Power supply
                  230/115 V AC switch
                  On/Off switch
                  Voltage Regulator Module (VRM) or Processor Power Module (PPM)
                  Fans
                  System board thermistors
                  Access panels
                External:
                  Uninterruptible Power Supply (UPS) and cables
                  Power cord
                  Power strip or power distribution unit
                  Outlet
                  Line voltage
                  Wall switch
                  Rack-mountable blanking panels

Input/Output    Input devices:
                  Keyboard
                  Mouse
                  Video display (touch screen)
                Output devices:
                  Video display
                  IMD (Integrated Management Display)
                Input/Output devices:
                  Serial and parallel ports
                  Mouse and keyboard ports
                  Expansion cards (SCSI controller, Video controller, Network Interface Controller, SMART Array controllers, Modem)
                  Storage devices (hard drive, CD-ROM drive, diskette drive)

Software        Operating system/network operating system
                Applications
                Device drivers
                User data files
                Tools and utilities


Processor Subsystem
In general, the processor controls all the activity between the elements that make up a computer. The other devices in the computer are controlled by the program running in the processor. The processor controls the devices by placing a control signal and an address onto the system bus. If the device controller sees the address and control signal it has been configured to react to, it then responds, either reading the data bus (processor WRITE) or placing data onto the data bus in response (processor READ).
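The address-and-control handshake described above can be sketched as a toy model. This is only an illustration, not a description of any real chipset: the class names, the address ranges, and the "READ"/"WRITE" strings are assumptions made for the example.

```python
class DeviceController:
    """A device that reacts only to the address range it was configured for."""

    def __init__(self, name, base_address, size):
        self.name = name
        self.base_address = base_address
        self.registers = [0] * size

    def decodes(self, address):
        # True when the address on the bus falls inside this device's range.
        return self.base_address <= address < self.base_address + len(self.registers)

    def respond(self, control, address, data=None):
        offset = address - self.base_address
        if control == "WRITE":   # processor WRITE: the device reads the data bus
            self.registers[offset] = data
            return None
        if control == "READ":    # processor READ: the device places data on the bus
            return self.registers[offset]
        raise ValueError(f"unknown control signal: {control}")


class SystemBus:
    """Carries a control signal, an address, and data to the attached devices."""

    def __init__(self, devices):
        self.devices = devices

    def cycle(self, control, address, data=None):
        # Every device sees the address; only the one configured for it responds.
        for device in self.devices:
            if device.decodes(address):
                return device.respond(control, address, data)
        raise LookupError(f"no device responds at address {address:#x}")


# Hypothetical devices and address ranges, chosen only for the example.
bus = SystemBus([
    DeviceController("serial port", base_address=0x3F8, size=8),
    DeviceController("video", base_address=0xA000, size=16),
])
bus.cycle("WRITE", 0x3F8, data=0x41)
value = bus.cycle("READ", 0x3F8)
```

In this sketch the processor never addresses a device by name. It only places an address on the bus, and the device whose configured range contains that address responds.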

Processor Types
Pentium 4
The Intel Pentium 4 processor has a new hyper-pipelined design (a 20-stage pipeline versus a 10-stage pipeline for the Pentium III). The deeper pipeline enables instructions inside the processor to be queued and executed at a much faster rate, allowing processors to achieve higher clock speeds. Intel's name for the new Pentium 4 features is NetBurst.

Following are features of the Pentium 4 processor:

Single Instruction Multiple Data (SIMD): a single instruction operates on multiple data elements that are packed together. MMX instructions operate simultaneously on two 32-bit integers, while SSE instructions operate simultaneously on four 32-bit floating-point values. Eight new 128-bit registers were added for SSE.
SSE2: extends MMX and SSE technology. SSE2 is a set of 144 new instructions that are compatible with the original 70 SSE instructions and 57 MMX instructions. The 64-bit MMX instructions are extended to 128 bits, and two 64-bit double-precision floating-point operations are now supported at the same time. This accelerates encryption, video, speech, and scientific applications.

Double-pumped ALUs: two arithmetic logic units (ALUs), called double-pumped ALUs, have twice the effective performance of the ALUs in Pentium III processors because each ALU can execute an operation in every half clock cycle.
Execution Trace Cache: the instruction trace cache is an L1 cache that stores decoded IA-32 instructions and helps to remove decoder pipeline latency.
Quad-Pumped Front Side Bus (FSB): the FSB is still clocked at 100 MHz. The Pentium 4 processor, however, can transfer four data sets (one set is 64 bits) per clock cycle. As a result, the FSB has a bandwidth of 3.2 GB/s. The 100 MHz quad-pumped bus is also referred to as the 400 MHz FSB. Pentium 4 processors at 2.53 GHz and faster have a 533 MHz FSB (approximately 4.2 GB/s).

Xeon and Xeon MP processors
The dual-processor-capable (DP) and multiprocessor-capable (MP) versions of the Pentium 4 processor are called Intel Xeon and Xeon MP. The dual-processor-capable Xeon processor allows two processors to work together in a single system. It offers a larger cache (512 KB or 1 MB) than the single-processor version (256 KB). The pin layout is different from that of the single-processor version.

[Figure: Xeon DP version and Xeon MP version]

The multiprocessor-capable Xeon MP processor allows four processors to work together on a single front side bus. The L3 cache is available in 2 MB, 1 MB, or 512 KB versions. Using special chipsets, multiple groups of four processors can be combined into 8-way, 16-way, and 32-way systems.

Each physical Xeon processor consists of two logical processors. With Hyper-Threading technology, the two logical processors can execute different tasks simultaneously using shared hardware resources. From a software or architecture perspective, this means operating systems and user programs can schedule threads to logical processors as they would on multiple physical processors.


Hyper-Threading Technology
A system with processors that use Hyper-Threading technology appears to software to have twice as many processors as it physically has. The two logical processors per chip can execute different threads simultaneously using shared hardware resources.
[Figure: four physical processors, each containing two logical processors (LP-1 and LP-2)]
Physical processor 1: logical processors 1 and 5
Physical processor 2: logical processors 2 and 6
Physical processor 3: logical processors 3 and 7
Physical processor 4: logical processors 4 and 8

From a software or architecture perspective, this means that operating systems and user programs can schedule threads to logical processors as they would on multiple physical processors. From a hardware perspective, instructions from both logical processors execute simultaneously on shared execution resources. The end result is a performance boost for multi-threaded and multi-tasked software.

Hyper-Threading can be switched off in RBSU for software that cannot profit from it. If it is left on for such software, the operating system may run a job on the otherwise idle logical processor that repeatedly checks for work to do (idle loop), consuming significant execution resources.

Operating Systems and Hyper-Threading
A server with four physical processors may exceed the license limit of the OS if the OS cannot differentiate between physical and logical processors. Once Windows 2000 reaches the license limit, it uses only the number of processors supported by the OS license. In the example above, Windows 2000 Server would use only logical processors 1, 2, 3, and 4. Windows 2000 Advanced Server would use all eight. A four-processor license for Windows .NET would use all eight logical processors. NetWare 5.0 and higher supports Hyper-Threading but requires a special driver (CPQMPK.PSM). Linux and Solaris 8 also support Hyper-Threading.
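The logical processor numbering and the Windows 2000 license example can be sketched as follows. This is a simplified model, assuming an operating system that counts every logical processor against its license limit and takes them in numerical order. The function names are invented for the example, and the limits of four and eight processors are the ones implied by the example in the text.

```python
def logical_processor_map(physical_count):
    """Number logical processors as in the figure: the first logical processor
    of each chip gets the low numbers, the second gets the high numbers."""
    mapping = {}
    for chip in range(1, physical_count + 1):
        mapping[chip] = (chip, chip + physical_count)
    return mapping


def processors_used(physical_count, license_limit):
    """Logical processors used by an OS that cannot tell logical from physical
    processors and stops at its license limit (assumed behavior)."""
    total_logical = 2 * physical_count
    return list(range(1, min(license_limit, total_logical) + 1))


four_way = logical_processor_map(4)
windows_2000_server = processors_used(4, license_limit=4)
windows_2000_advanced = processors_used(4, license_limit=8)
```

With this numbering, a four-processor limit still leaves one logical processor active on each physical chip, which is why the numbering interleaves the chips rather than counting both logical processors of chip 1 first.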


Pentium III processor (Coppermine)
Second-generation Pentium III processors (code name "Coppermine") are based on 0.18-micron technology. They have 28 million transistors and use a new 370-pin package called Flip Chip Pin Grid Array (FC-PGA). FC-PGA Pentium III processors are available with a 133 MHz or 100 MHz FSB and a 256 KB, 8-way set-associative L2 cache that runs at full processor speed. The Coppermine core is identical to the core of the Coppermine processors in a Slot-1 cartridge.

Service Issue
When the processor heat sink is removed and the same heat sink is reinstalled, the thermal contact between the processor and the heat sink is damaged. This causes the processor to overheat. To resolve this issue, install a new heat sink.


Pentium III processor (Tualatin)
The latest Pentium III processors (code name "Tualatin") are based on 0.13-micron technology. The Pentium III processor now has a 512 KB L2 cache and still uses the FC-PGA package. The package has an Integrated Heat Spreader (IHS) and is labeled FC-PGA2. The 370-pin zero insertion force socket (PGA370) is the same as the socket used for Coppermine processors. Coppermine and Tualatin processors are not compatible.

[Figure: FC-PGA2 package with IHS and FC-PGA package]

All Pentium III processors implement a Dynamic Execution microarchitecture, a unique combination of multiple branch prediction, data flow analysis, and speculative execution. The processor can execute MMX instructions for enhanced media and communication performance. Additionally, streaming single-instruction, multiple-data (SIMD) extensions increase performance for floating-point and 3-D applications. Multiple low-power states can significantly reduce power consumption.

The processor includes an integrated, on-die 512 KB, 8-way set-associative L2 cache. The L2 cache implements the Advanced Transfer Cache Architecture with a 256-bit-wide bus. The processor also includes a 16 KB L1 instruction cache and a 16 KB L1 data cache. All caches run at full processor speed. The Tualatin has a cacheable memory space of 64 GB and allows systems with more than 4 GB of RAM.

The 0.13-micron Pentium III processor uses a lower voltage on the front side bus than the 0.18-micron processors. As a result, a Tualatin processor with 512 KB L2 cache will not work in a previous-generation platform because the system bus signal levels are incompatible.

ULV Version
The ultra low voltage (ULV) version of the Tualatin has a 100 MHz front side bus and is not compatible with the standard version of the Pentium III processor. It is used in systems that require ultra low power consumption (for example, notebooks and the ProLiant BL10e).


Pentium III Xeon processors
The Pentium III Xeon processor is based on the Pentium III core with a few additions. The L2 cache is 512 KB, 1 MB, or 2 MB and operates at full speed. The Pentium III Xeon uses the same SC330 package as Pentium II Xeon processors and has all the new features that were introduced with the Pentium III.

70 new instructions designed especially to enhance the performance of floating-point operations and to accelerate memory access. Eight new 128-bit registers have been added to the IA-32 architecture.
The Pentium III Xeon requires different voltages for the processor core and the L2 cache.
Older Pentium II Xeon based systems can be upgraded with Pentium III Xeon processors, but the two processor types cannot be mixed.
The Pentium III has a fixed core speed multiplier. The Pentium III Xeon, however, still requires external setting of the core speed. Thus it is possible to mix Pentium III Xeon processors with different speeds.
The cache address limit is 64 GB, and the internal multiprocessor support is limited to four processors per GTL+ bus. 8-way systems have a total of three GTL+ buses and support two groups of four processors each.
When a Pentium II Xeon system is upgraded, the Pentium III Xeon processors are supported by the existing PPMs (processor power modules).
The 100 MHz FSB has a bandwidth of only 800 MB/s and can be a severe performance bottleneck in 4-processor servers. A large L2 cache can compensate for this.
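The front side bus bandwidth figures quoted in this chapter follow from one simple relationship: clock frequency, multiplied by the number of transfers per clock, multiplied by the bus width in bytes. The sketch below checks the 800 MB/s and 3.2 GB/s figures. The function name is made up for the example, and the 64-bit width of the Pentium III Xeon bus is assumed here because it matches the 800 MB/s figure.

```python
def peak_bandwidth_mb_s(clock_mhz, bus_width_bits, transfers_per_clock=1):
    """Peak bandwidth in MB/s: clock (MHz) x transfers per clock x bytes per transfer."""
    return clock_mhz * transfers_per_clock * bus_width_bits / 8


# Pentium III Xeon: 100 MHz FSB, one 64-bit transfer per clock.
pentium_iii_xeon_fsb = peak_bandwidth_mb_s(100, 64)

# Pentium 4: the same 100 MHz clock, but four 64-bit transfers per clock.
pentium_4_fsb = peak_bandwidth_mb_s(100, 64, transfers_per_clock=4)
```

These are peak figures. Because all four processors of a 4-way server share the same bus, the bandwidth available to each processor is correspondingly lower, which is why the bus can become a bottleneck.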


Pentium III Xeon processors (DP version)
The 800 MHz, 866 MHz, 933 MHz, and 1 GHz Pentium III Xeon processors are based on the Pentium III core. They do not have the typical Xeon features and support only two processors per system. The L2 cache size is 256 KB, and the FSB speed is 133 MHz. In other words, this processor is a standard Pentium III processor in a Slot-2 cartridge instead of a Slot-1 or FC-PGA package. The only differences between an 800 MHz Pentium III and an 800 MHz Pentium III Xeon processor are the packaging and the integrated processor power module, called On-Cartridge Voltage Regulation (OCVR).

Caution: Pentium III Xeon processors with a 133 MHz FSB are supported only in the ProLiant ML530. These processors have a gold-colored heat sink.


Processor Steppings
Processor steppings are versions of the same processor model that vary only slightly. Each stepping requires changes to the System ROM. For each processor stepping, Intel provides a microcode patch for inclusion in the System ROM. Within the System ROM there is a table where the patches are stored. HP continually adds newly released Intel patches to keep the ROMs up to date.

"Unsupported Processor" Message
When a processor is upgraded or the system board is replaced, the server may stop responding during POST (Power-On Self-Test) and display the following message: "Unsupported Processor. System Halted". This happens when the System ROM does not recognize the stepping of the processor. The only solution is to upgrade the System ROM. The RomPaq diskette, however, will not boot after the error message has been displayed, so the server must be set to disaster recovery mode.

Disaster Recovery Procedure
Some servers have a DIP switch labeled "Disaster Recovery". Other systems require the system configuration DIP switch to be set to 1=on, 4=on, 5=on, 6=on. This setting is not documented in some service manuals. After setting the appropriate switch, insert the RomPaq diskette (CD is not supported) and power on the system. Wait for the beep code that indicates the end of the ROM upgrade. This may take up to 5 minutes. There may be no video output on the monitor during disaster recovery.

Note: Upgrade the System ROM before upgrading a processor to avoid the need to use Disaster Recovery.
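The relationship between the patch table and the "Unsupported Processor" message can be pictured with a short sketch. It is purely illustrative: the real System ROM is firmware, not Python, and the stepping labels, patch names, and function names below are hypothetical.

```python
class UnsupportedProcessor(Exception):
    """Raised when the ROM has no microcode patch for the installed stepping."""


# Hypothetical patch table of an older System ROM (labels invented for illustration).
OLD_ROM_PATCHES = {
    ("Pentium III Xeon", "stepping A"): "patch-A",
    ("Pentium III Xeon", "stepping B"): "patch-B",
}

# A newer ROM image contains the old patches plus the newly released one.
NEW_ROM_PATCHES = dict(OLD_ROM_PATCHES)
NEW_ROM_PATCHES[("Pentium III Xeon", "stepping C")] = "patch-C"


def post_processor_check(model, stepping, patch_table):
    """Look up the microcode patch during POST; halt if the stepping is unknown."""
    key = (model, stepping)
    if key not in patch_table:
        raise UnsupportedProcessor("Unsupported Processor. System Halted")
    return patch_table[key]


# Installing a newer stepping under the old ROM halts the system ...
try:
    post_processor_check("Pentium III Xeon", "stepping C", OLD_ROM_PATCHES)
    halted = False
except UnsupportedProcessor:
    halted = True

# ... while the same processor passes POST after the ROM upgrade.
patch = post_processor_check("Pentium III Xeon", "stepping C", NEW_ROM_PATCHES)
```

The sketch also shows why the order of operations in the note matters: if the ROM is upgraded first, the lookup succeeds the first time the new processor is powered on.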


Profusion 8-Way Architecture
The Profusion chipset joins the two processor buses, the I/O bus, and the two memory ports through a crossbar switch. The otherwise independent processor and I/O buses are joined by a logical connection that is made only when required to transfer data. An AGTL+ bus running at 100 MHz can support a maximum of five loads. This allows four processors and one connection to the memory controller on each processor bus, and up to four host-to-PCI bridges with a connection to the memory controller on the I/O bus. Each of the three AGTL+ buses has independent access to the two memory ports. This architecture prevents I/O traffic from consuming bandwidth on the processor buses. In addition, the use of 100 MHz buses and five independent paths allows the crossbar switch to deliver an aggregate instantaneous peak throughput of 4 GB/s. The following figure shows a block diagram of the 8-way SMP architecture.

The features of the 8-way SMP architecture include:


Dual 100 MHz processor buses with dedicated 100 MHz I/O bus (AGTL+)
8-way multiprocessing with Pentium III Xeon processors
Multiported system architecture (five-point crossbar switch)
Dual-ported, interleaved memory
Uniform memory access for all eight processors
Dual cache accelerators and up to three host-to-PCI bridges
Up to 32 GB of synchronous dynamic random access memory (SDRAM)

Intel Itanium Processors

Large Memory Addressability
The number of address lines (bits) available to the address bus determines the maximum addressable memory size for a processor. With 64-bit addressing capabilities, the Itanium processor leapfrogs the memory addressing capabilities of the 32-bit processors that preceded it.

Intel 32-bit Processors
Previous 32-bit processors added four address bits through Physical Address Extension (PAE) to translate between 32-bit linear addresses and 36-bit physical addresses. This allowed a theoretical maximum addressable memory size of 64 GB.

Intel Itanium Processor
The Itanium processor steps up to what is commonly referred to as large memory addressability. Though a theoretical maximum of approximately 16,000,000 TB could be reached with 64 address lines in the Itanium, chipset and space constraints allow only 44 physical address pins on the processor. Even with this limitation, the maximum addressable memory is 16 TB. The DL590/64 is the first ProLiant server to support the Intel Itanium (IA-64) processor.
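The memory limits above all follow from the rule that n address bits can select 2^n bytes. A quick check of the figures, using binary units (1 GB = 2^30 bytes, 1 TB = 2^40 bytes); the function and variable names are chosen only for this example:

```python
GB = 2 ** 30
TB = 2 ** 40


def addressable_bytes(address_bits):
    """Each additional address bit doubles the addressable memory."""
    return 2 ** address_bits


plain_32_bit_gb = addressable_bytes(32) // GB    # 32 address bits
pae_36_bit_gb = addressable_bytes(36) // GB      # 32 bits plus 4 PAE bits
itanium_44_pin_tb = addressable_bytes(44) // TB  # 44 physical address pins
full_64_bit_tb = addressable_bytes(64) // TB     # theoretical 64-bit limit
```

The last value works out to 16,777,216 TB, which is the exact figure behind the rounded 16,000,000 TB quoted above.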

EPIC
Explicitly Parallel Instruction Computing (EPIC) is a design philosophy. The Itanium architecture is based on EPIC, which is a unique combination of innovative features such as predication, speculation, and explicit parallelism.
Speculation allows the compiler to schedule load instructions ahead of branches and stores to reduce memory latency.
Predication eliminates branches and the associated branch misprediction penalties.
Parallelism enables the compiler to provide more information to the processor, allowing it to execute multiple operations simultaneously on a sustained basis.

Though designed for optimal performance with 64-bit operating systems and software, the Itanium processor supports 32-bit binary compatibility in hardware and does not require software emulation.

Machine Check Architecture
Enhanced Machine Check Architecture provides advanced error detection, correction, and containment, which improves the processor's ability to contain and fix errors in the caches and on the system bus, reducing downtime.

Three Caches
The Itanium includes three levels of cache. The L1 and L2 caches are integrated into the processor die. The L3 cache is off the processor die, on the cartridge, but runs at full processor frequency.
L3: 2 MB or 4 MB of unified, on-cartridge L3 cache organized as 4-way set-associative with a 64-byte cache line size. It is fully pipelined and optimized to provide fast access to data at a bandwidth of 12.8 GB/s using a 128-bit-wide cache bus.
L2: The L2 cache is 96 KB, 6-way set-associative, and fully pipelined with a 64-byte cache line size.
L1: The L1 cache is 32 KB (16 KB data and 16 KB instruction), 4-way set-associative, and fully pipelined with a 32-byte cache line size.

Double-Pumped Data Bus
The Itanium processor is compatible with a double-pumped data bus. Double pumping a 133 MHz bus provides an effective bus speed of 266 MHz, enabling 64-bit system bus transactions between the system controllers and processors at 2.1 GB/s.

[Figure: clock cycle timing for a normal bus and a double-pumped bus, showing the points at which data is latched]

A double-pumped bus carries twice the transactions of a normal bus in each clock cycle. Instead of sending, or latching, data on only one edge of the clock cycle, double-pumped buses send data on both the rising and the falling edge of the clock cycle. With Itanium systems, there are two overlapping clock strobes, each operating 180 degrees out of phase with the other. Data is sent at each intersection of the two strobes, which happens twice in each clock cycle.
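The difference between normal and double-pumped latching can be sketched by listing the clock edges on which data is latched. This is a simplified model that assumes a normal bus latches on the rising edge only. The function name is invented for the example, and the 133 MHz clock and 64-bit (8-byte) bus width are the figures quoted for the Itanium system bus.

```python
def latch_points(clock_cycles, double_pumped=False):
    """Return the (cycle, edge) points at which data is latched."""
    points = []
    for cycle in range(clock_cycles):
        points.append((cycle, "rising"))
        if double_pumped:
            points.append((cycle, "falling"))
    return points


normal = latch_points(3)
double = latch_points(3, double_pumped=True)

# Twice the latch points per cycle gives twice the effective bus speed.
transfers_per_cycle = len(double) // len(normal)
effective_mhz = 133 * transfers_per_cycle
peak_mb_s = effective_mhz * 8  # 64-bit bus = 8 bytes per transfer
```

The result, 2128 MB/s, is the value that the text rounds to 2.1 GB/s.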

Memory Subsystem
Memory stores information for future use. Random Access Memory (RAM) is defined as memory in which the data can be read by the processor, modified through processing, and then written back for storage. The amount of time required to either read data from or write data to memory is referred to as access time and is measured in nanoseconds. RAM types include FPM DRAM, EDO RAM, SDRAM, Rambus DRAM (RDRAM), DDR SDRAM, and SRAM. The following graphic illustrates the comparative speed of the various technologies.

[Figure: Speed Comparison, from faster to slower: SRAM, DDR SDRAM, Rambus, SDRAM, EDO, FPM]

Memory Packaging
Memory is used in several areas of the computer, including the main system, cache, and video. Systems use either SIMMs (Single In-Line Memory Modules) or DIMMs (Dual In-Line Memory Modules). These are small circuit boards on which integrated circuits (ICs) are mounted.

SIMMs (Single In-Line Memory Modules)
SIMMs were developed as an easy way to upgrade and downgrade system memory. Modern PCs are designed for the larger 72-pin SIMM, and older system boards use a 30-pin SIMM. The additional pins allow each SIMM to deliver four bytes of data (plus parity) in every memory request. SIMMs are inserted at an angle and pushed back.

Incompatibility
SIMM connectors can be either gold or tin plated. Contact reliability can be affected if the different metal types are mixed, for example, by placing a tin-plated SIMM into a gold-plated memory socket. This metal mixing can cause accelerated corrosion, which results in bad connections and can ultimately cause system failure. Contacts must be the same: gold to gold and tin to tin.

SIMMs must be installed at the specified speed for the system. Mixing SIMMs of the specified speed with SIMMs of a lower speed can produce timing differences. When SIMMs of the specified speed are mixed with SIMMs of a higher speed, the higher-speed SIMMs run at the specified speed, not the higher speed. Parity and nonparity SIMMs should not be mixed.

DIMMs (Dual In-Line Memory Modules)
DIMMs are the next advancement in memory packaging. DIMMs with parity are 72 bits wide, while SIMMs with parity are 36 bits wide. DIMMs also offer greater capacity. DIMMs are available in 5 V and 3.3 V versions. Systems are designed for a specific voltage, and the sockets and DIMMs are keyed to prevent installation of the wrong type.

Buffered DIMMs use a buffer to help reduce loading on the bus and improve signal quality at the DRAM. All address signals and most control signals are buffered. Data is not buffered. Unbuffered DIMMs have no buffering between the bus and the DRAMs.

[Figure: DIMM key slots for buffered and unbuffered DIMMs and for 5 V and 3.3 V DIMMs]
[Figure: Dual In-Line Memory Module (DIMM), DIMM expansion board, and DIMM key slot]

DRAM Technologies
Over the past few years, improvements in DRAM storage density have increased capacity from just 1 kilobit (Kb) per chip to 512 megabits (Mb) per chip. This improvement in storage capacity has reduced the number of DRAM chips required for a particular module capacity. Until recently, computer memory components operated at 5 volts, the industry standard. Today, computer memory components operate at 3.3 volts, which allows them to run faster and consume less power.

Memory Access Time
Memory access time is measured in billionths of a second (nanoseconds, ns). Although DRAM density has improved significantly over the last few years, DRAM speed has not kept pace with processor performance because there is a physical limit to how fast DRAM can handle data requests.


System Bus Timing
A system bus clock controls all computer components that execute instructions or transfer data. The smallest unit of time measured by the system bus clock is called a clock tick, or cycle. A complete clock cycle is measured from one rising edge to the next rising edge. The clock speed, or clock frequency, is measured in megahertz (MHz).

Components operate more efficiently when they are in sync, or synchronized, with the system bus clock. If a component is not synchronized (asynchronous) with the system bus clock, either the rest of the system or the component itself must wait one or more additional clock cycles for data or instructions because of clock resynchronization.

Memory Bus Speed
The speed of the DRAM is not the same as the true speed (or frequency) of the overall memory subsystem. The memory subsystem operates at the memory bus speed, which has the same frequency (in MHz) as the main system bus clock. The two main factors that control the speed of the memory subsystem are the memory timing and the maximum DRAM speed.

SDRAM Technologies
The original DRAM took approximately six system bus clock cycles for each memory access. FPM, EDO, and SDRAM improved performance by automatically retrieving data from additional memory locations on the assumption that they too will be requested.

FPM and EDO DRAMs are controlled asynchronously. When processor speeds were less than 66 MHz, FPM and EDO DRAMs were fast enough to keep pace. But as processors became faster, they had to wait more often for data from FPM and EDO DRAMs. SDRAM uses a clock to synchronize the input and output signals on the memory chip. This clock is synchronized with the system bus clock so that the memory chips and processor coordinate the execution of commands and the transmission of data.

SDRAM Features
SDRAM is the most prevalent memory being used in systems today. In addition to synchronous operation, SDRAM has other features that accelerate data retrieval: multiple memory banks, burst mode access, greater bandwidth, and registers.

Multiple Memory Banks
SDRAM divides memory into two to four banks for simultaneous access to more data. While one memory bank is being accessed, the other bank remains ready to be accessed. This allows the processor to initiate a new memory access before the previous access has been completed, resulting in continuous data flow.

Burst Mode Access
The architectural enhancement of SDRAM allows data to be accessed with each clock cycle after the initial request has been satisfied. SDRAM uses this process, called data bursting, to achieve greater data throughput.

Increased Bandwidth
The bandwidth (capacity) of the memory bus increases with its width (in bits) and its frequency (in MHz). By transferring 8 bytes (64 bits) at a time and running at 100 MHz, SDRAM increases memory bandwidth to 800 MB/s, 50 percent more than EDO DRAMs (533 MB/s at 66 MHz).

Registered SDRAM Modules
To achieve higher memory subsystem densities, registers have been added to memory modules. These registers isolate the module's heavily loaded address and control buses from the rest of the system. The fewer loads that the memory bus sees, the greater the amount of memory that can be added to the system.
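The benefit of burst mode access can be estimated by counting clock cycles. In the sketch below, the six cycles per access come from the figure given for the original DRAM. Reusing six cycles as the initial latency of the burst is only a placeholder assumption, since the text does not give the SDRAM value, and the function names are invented for the example.

```python
def cycles_without_burst(transfers, cycles_per_access):
    """Every transfer pays the full access cost."""
    return transfers * cycles_per_access


def cycles_with_burst(transfers, initial_cycles):
    """The first transfer pays the initial latency; each following transfer
    in the burst completes in one additional clock cycle."""
    if transfers == 0:
        return 0
    return initial_cycles + (transfers - 1)


# Four consecutive transfers, for example one four-word burst.
separate_accesses = cycles_without_burst(4, cycles_per_access=6)
one_burst = cycles_with_burst(4, initial_cycles=6)  # placeholder latency
```

Under these assumptions, four transfers take 24 cycles as separate accesses but only 9 cycles as a single burst, and the advantage grows with the length of the burst.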


Advanced SDRAM Technologies


Despite the performance improvement in the overall system due to use of SDRAM, a growing performance gap between the memory and processor must be filled by more advanced memory technologies. For these advanced technologies to support systems with burst speeds over 200 MHz, memory packaging, interconnections, emissions, and timing will have to be redesigned. The table below illustrates the relative bandwidth of the various technologies.

Rambus DRAM
The Rambus design provides higher performance than traditional SDRAM because RDRAM transfers data on both edges of a synchronous, high-speed clock pulse. RDRAM is capable of operating at 800 MHz and providing a peak bandwidth of 1.6 GB/s. Current RDRAMs use the first generation of signaling technology, called Rambus Signaling Level (RSL), which allows data to be transferred on both edges of a synchronous clock pulse, effectively sending two bits every clock cycle. Quad Rambus Signaling Levels (QRSL), the next-generation technology, transfers two bits of data per clock edge, theoretically doubling peak bandwidth.


Double Data Rate SDRAM
Double Data Rate (DDR) SDRAM has the same core design as SDRAM with two basic differences: more advanced synchronization circuitry and a delay-locked loop. These allow data to be read on both the rising and falling edges of the clock, thus delivering twice the bandwidth of standard SDRAM without increasing the clock frequency. DDR SDRAM has peak data transfer rates of 1.6 GB/s and 2.1 GB/s at clock frequencies of 100 MHz and 133 MHz, respectively. Because of the different signaling technology, it is not possible to mix SDRAM and DDR SDRAM within the same memory subsystem.

Although the specification is still being finalized, DDR II will be backward compatible with DDR SDRAM and will improve bus utilization to increase performance and bandwidth, yielding a theoretical peak bandwidth of 6.4 GB/s. DDR II is also expected to provide improvements in cost, power requirements, I/O, packaging, and clocking. This table summarizes the various types of DDR SDRAM and associated naming conventions.
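As a rough illustration of how the peak transfer rates relate to the names commonly used for DDR parts, the sketch below derives both from the clock frequency. It assumes a 64-bit (8-byte) module data path. The "DDR200/DDR266" chip names and "PC1600/PC2100" module names are the widely used industry labels and are supplied here as an assumption, and the function name is invented for the example.

```python
def ddr_module(clock_mhz, bus_width_bits=64):
    """Derive the transfer rate, peak bandwidth, and common names for DDR SDRAM."""
    transfers_mt_s = clock_mhz * 2                   # data on both clock edges
    peak_mb_s = transfers_mt_s * bus_width_bits // 8
    return {
        "chip": f"DDR{transfers_mt_s}",              # named after the transfer rate
        "module": f"PC{peak_mb_s // 100 * 100}",     # named after peak MB/s, rounded down
        "peak_mb_s": peak_mb_s,
    }


ddr_at_100_mhz = ddr_module(100)
ddr_at_133_mhz = ddr_module(133)
```

The two results correspond to the 1.6 GB/s and 2.1 GB/s figures in the text (1600 MB/s and 2128 MB/s).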


Methods to prevent memory errors


There are two ways to protect against memory errors: testing and the use of error detection/correction technologies. The quality of the testing procedure depends on the source of the memory modules. HP is the leader in the qualification and testing of memory components for industry-standard servers and backs its procedures with an innovative pre-failure warranty. HP has long established its leadership in memory error detection/correction technology for industry-standard servers, and it continues this leadership with Advanced Memory Protection technologies.

Superior testing improves memory reliability
As memory chips become faster and more complex, testing them becomes more difficult and expensive. Memory device manufacturers invest heavily in testing systems, and they continually revamp their testing procedures to maintain device quality. Because manufacturing processes change constantly, HP qualifies each memory module design and manufacturing process to minimize the occurrence of hard errors.

In addition to the rigorous qualification of module manufacturers, HP tests every memory module in the model of server in which it will be installed. This process includes testing each manufacturer's modules on every model of HP server currently shipping and re-qualifying every module manufacturer each time HP offers a new processor speed or a new server platform. This testing and re-qualification process results in continuous improvement of memory module reliability.

Superior qualification and testing procedures allow HP to offer a three-year Pre-Failure Warranty on HP memory. The HP Pre-Failure Warranty allows for replacement of any HP DIMM that exceeds predefined limits for correctable errors. These errors are recorded by the server and can be verified through HP Insight Manager or a diagnostics program.
HP Pre-Failure Warranty
The Pre-Failure Warranty, standard on all HP ProLiant servers, extends the advantage of an HP three-year limited warranty by applying it to critical components, such as memory, before they actually fail. Specifically, the Pre-Failure Warranty ensures that when customers receive notification from HP Insight Manager Version 2.0 or higher that a critical server component may fail, the component is replaced free of charge under the warranty. With the Pre-Failure Warranty, system administrators can proactively schedule downtime for maintenance and avoid interrupting critical business operations that rely on these enterprise servers.

During the warranty period, the Pre-Failure Warranty covers the replacement of DIMMs used in a server's main memory when the predefined thresholds for correctable errors have been exceeded. The predefined thresholds can differ among system architectures. Non-repeating correctable soft errors are not covered under warranty because their occurrence requires no action.

Parity, Non-Parity, ECC and Advanced ECC


Whenever data is transmitted from one device to another, errors can occur (that is, the receiving device does not receive the identical information that was sent by the transmitting device). One of the simplest and most widely used schemes for detecting transmission errors is the parity error detection method.

When a PC with parity memory encounters a soft error during operation, an error message is displayed containing a code number that helps the user or a technician track down a bad memory module. With non-parity memory, users may not notice when a soft (non-repetitive) memory error has occurred, and if they do, they may not realize that it was a memory error. If the error is repetitive (hard error), the user receives error notification the next time the PC is booted, because the PC checks its memory and reports any errors at startup.

Parity allows a memory subsystem to detect single-bit memory failures, but parity memory subsystems cannot correct a failed data bit. Error Correcting Code (ECC) can perform this task for single-bit errors. There are two phases of ECC: detection and correction. When a hard or soft error does occur, a system using an ECC scheme can often detect and possibly correct the error and continue running.

There are four types of memory subsystems available today: non-parity, parity, ECC, and Advanced ECC.
Non-parity subsystems cannot detect faults in the memory.
Parity subsystems can detect single-bit failures in the memory but can only react by causing a system halt.
ECC memory subsystems can detect a two-bit failure and correct a single-bit failure in the system memory.
Advanced ECC can correct multi-bit errors.
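The difference between parity and ECC can be demonstrated on a very small scale. The sketch below uses even parity on a 4-bit word and a textbook Hamming(7,4) code with an added overall parity bit, which corrects one flipped bit and detects two. It is a teaching model only: server memory protects much wider words with a larger code, and the function names here are invented for the example.

```python
from functools import reduce
from operator import xor


def even_parity(bits):
    """Parity bit that makes the total number of 1 bits even."""
    return sum(bits) % 2


def flip(bits, *positions):
    """Return a copy of bits with the given positions inverted (a memory error)."""
    damaged = list(bits)
    for position in positions:
        damaged[position] ^= 1
    return damaged


def ecc_encode(data):
    """Encode 4 data bits into 8 bits. Index 0 holds the overall parity bit,
    indexes 1, 2, and 4 hold check bits, and indexes 3, 5, 6, 7 hold data."""
    code = [0] * 8
    code[3], code[5], code[6], code[7] = data
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = reduce(xor, code[1:])
    return code


def ecc_decode(code):
    """Return (status, data): correct a single-bit error, detect a two-bit error."""
    code = list(code)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position      # a non-zero syndrome points at the bad bit
    overall = reduce(xor, code)       # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1           # single-bit error: flip it back
        status = "corrected"
    else:
        return "uncorrectable", None  # two bits flipped: detected, not corrected
    return status, [code[3], code[5], code[6], code[7]]


word = [1, 0, 1, 1]
stored_parity = even_parity(word)
parity_sees_one_error = even_parity(flip(word, 0)) != stored_parity
parity_sees_two_errors = even_parity(flip(word, 0, 1)) != stored_parity

stored = ecc_encode(word)
clean = ecc_decode(stored)
single = ecc_decode(flip(stored, 6))
double = ecc_decode(flip(stored, 3, 6))
```

In this example, parity notices the single flipped bit but cannot say which bit it was, and it misses the double error entirely. The ECC code restores the original word after a single-bit error and reports the double error as uncorrectable.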


Advanced Memory Protection


HP is committed to providing a broad spectrum of memory protection technologies for ProLiant servers, which will deliver increased fault tolerance for applications requiring higher levels of availability. The HP ProLiant 300, 500, and 700 Series servers will feature one or more Advanced Memory Protection technologies: Online Spare Memory, Hot Plug Mirrored Memory, and Hot Plug RAID Memory. These Advanced Memory Protection technologies are optimized for the features and applications of each server series.

Online Spare Memory will be beneficial to customers with sites that cannot afford downtime from memory errors, yet can wait until a scheduled downtime to replace failed memory modules.

Hot Plug Mirrored Memory will provide a more fault-tolerant option for sites that cannot afford downtime from memory errors and do not want to wait until scheduled downtime to replace failed memory modules. It will allow memory modules to be hot-replaced without shutting down the server.

HP Hot Plug RAID Memory provides the highest level of availability for customers who deploy industry-standard servers with large memory systems to run 24x7 applications. It enables the memory subsystem to operate continuously, even in the event of a complete memory device failure, by allowing DIMMs to be hot-replaced, hot-added, and hot-upgraded.

All HP Advanced Memory Protection technologies support industry-standard 256-MB, 512-MB, and 1-GB DIMMs, and 2-GB DDR DIMMs.

Advanced Memory Protection for HP ProLiant 300 series servers
In the HP ProLiant ML370 G2 and G3 servers and in the DL380 G2 and G3 servers, there are six DIMM sockets on the motherboard. The sockets are organized into three memory banks (A, B, and C). In standard memory mode, all banks are used as available system memory for a total capacity of 6 GB, if 1-GB DIMMs are used. Because the system uses 2-way interleaving, the DIMMs must be installed in pairs, one bank at a time. The DIMMs in each bank must be of the same type and capacity or the performance of the memory subsystem will be degraded. For example, bank A can contain two 512-MB DIMMs while bank B can contain two 1-GB DIMMs.


Online Spare Memory mode for G2 servers
Online Spare Memory mode provides a higher level of memory protection than Standard Memory mode. Online Spare Memory is beneficial to businesses with sites that do not have sufficient IT staff available to service a failure, do not always have replacement memory on hand, or cannot bring down the server before a scheduled shutdown.

To enable Online Spare mode, customers use the ROM-Based Setup Utility (RBSU) at startup to designate bank C as Online Spare memory. For the ProLiant ML370 G2 and DL380 G2 servers, bank C must be populated before the server can be configured in Online Spare mode. Banks A and B are considered system memory with a total capacity of 4 GB if 1-GB DIMMs are used; however, bank B does not have to be populated. The DIMMs installed in bank C must be of equal or greater capacity than those in the remaining banks. For example, if 512-MB DIMMs are used in bank A and 1-GB DIMMs are used in bank B, the DIMMs in bank C would have to be at least 1-GB DIMMs.

The next generation of the Online Spare implementation will not use a dedicated memory bank. Rather, the last populated bank will be the Online Spare bank. For example, if banks A and B are populated, the DIMMs in bank B can be used as Online Spare Memory. The memory socket configuration may also differ from the current generation product. Refer to the Setup and Installation Guide for memory socket configuration and Online Spare Memory population requirements.

In Online Spare mode, if a DIMM in bank A or B exceeds a predefined error threshold, an amber attention LED in front of the failed DIMM will light. The error will be corrected, but the data from the entire bank that contains the failed DIMM will be copied to the Online Spare memory bank. The failed bank will be deactivated, but the server will remain available until the customer can replace the failed DIMM during a scheduled shutdown.

Online Spare Memory mode for G3 servers
The Online Spare implementation for the ProLiant ML370 G3 and DL380 G3 servers does not require bank C to be populated. The Online Spare bank is always the last populated bank.
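The G2 population rule described above (bank C populated, with DIMMs at least as large as those in banks A and B) can be expressed as a small check. This is a minimal sketch, not RBSU logic: the function name and the dictionary layout (bank letter mapped to per-DIMM size in MB, with 0 or a missing key meaning an empty bank) are assumptions made for illustration.

```python
def g2_online_spare_allowed(banks):
    """Return True if bank C can act as the Online Spare bank on a G2 server.

    banks maps "A", "B", "C" to the per-DIMM capacity in MB (hypothetical layout);
    an absent or zero entry means the bank is not populated.
    """
    spare = banks.get("C", 0)
    if spare == 0:
        return False  # bank C must be populated before enabling Online Spare
    largest_system_dimm = max(banks.get("A", 0), banks.get("B", 0))
    # The spare DIMMs must be of equal or greater capacity than the others.
    return spare >= largest_system_dimm


example_from_text = {"A": 512, "B": 1024, "C": 1024}
spare_too_small = {"A": 512, "B": 1024, "C": 512}
bank_b_empty = {"A": 512, "C": 512}
bank_c_empty = {"A": 512, "B": 512}
```

With these inputs, only the first and third configurations satisfy the rule; the second fails because bank C holds smaller DIMMs than bank B, and the fourth fails because bank C is empty.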

Advanced Memory Protection for ProLiant 500 series servers
The HP ProLiant 500 series servers come standard with a primary memory board. The primary memory board has eight DIMM sockets for a total capacity of 8 GB, if 1-GB DIMMs are used in Standard Memory mode. The HP ProLiant ML570 G2 and ML530 G2 are examples of servers that use 2-way interleaving, while the ProLiant DL580 G2 is an example of a server that uses 4-way interleaving. In systems using 2-way interleaving, the sockets are organized into four banks (A, B, C, and D) with two sockets in each bank. Systems using 4-way interleaving are organized into two banks with four sockets each. DIMMs must be installed one full bank at a time (four DIMMs per bank with 4-way interleaving), and the DIMMs in each bank must be of the same type and capacity for the system to operate properly.

No operating system support is required for this option. All software and drivers are in the system BIOS. With a single memory board, customers can also enable Online Spare Memory mode and Single-Card Memory Mirroring. Customers can purchase an optional memory board to increase the available memory in Standard or Online Spare Memory modes or to enable Hot Plug Mirrored mode. The following sections explain the memory protection options for both single-board and dual-board configurations.

Online Spare Memory mode (Single memory board configuration)
Using RBSU, customers can designate bank D as Online Spare Memory and designate the remaining banks (A, B, and C) as system memory. Bank D on the primary memory board is always the Online Spare bank, even if the optional memory board is also installed. Bank D must be populated before the server can be configured in Online Spare mode.

If one of the DIMMs in banks A, B, or C reaches a pre-defined error threshold, the system copies the data from the entire memory bank that contains the failed DIMM to the Online Spare Memory bank. The system then deactivates the failed bank and illuminates the memory board LED indicator in front of the failed DIMM. HP Insight Manager will provide system warnings on the monitor or by other means such as paging. This operation maintains server availability and memory reliability without service intervention. The DIMM that exceeded the error threshold can be replaced at the customer's convenience during a scheduled shutdown.


Online Spare Memory mode (Dual memory board configuration)
The ProLiant ML570 G2 and DL580 G2 servers support dual memory boards. Using dual memory boards in Online Spare mode, users can increase system memory up to 16 GB and maintain a higher level of memory protection than with Standard Memory mode. If the optional memory board is installed prior to booting the server, bank D on the primary memory board can still be designated as the Online Spare bank using RBSU. Using Online Spare mode in a 2-way interleaving configuration, the server can support up to 2 GB of Online Spare memory in bank D on the primary board and up to 14 GB of system memory in the remaining banks (using 1-GB DIMMs).
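The 2-GB spare and 14-GB system memory figures for the dual-board, 2-way interleaved configuration follow from simple arithmetic. The sketch below reproduces them under one stated assumption: exactly one full bank is reserved as the Online Spare bank. The function and parameter names are hypothetical.

```python
def online_spare_capacity(boards, banks_per_board, sockets_per_bank, dimm_gb):
    """Return (system_gb, spare_gb), assuming one whole bank is held back as spare."""
    total_gb = boards * banks_per_board * sockets_per_bank * dimm_gb
    spare_gb = sockets_per_bank * dimm_gb  # the single Online Spare bank
    return total_gb - spare_gb, spare_gb


# Two boards, four banks of two sockets each (2-way interleaving), 1-GB DIMMs.
dual_board_two_way = online_spare_capacity(2, 4, 2, 1)

# One board in the same layout, for comparison.
single_board_two_way = online_spare_capacity(1, 4, 2, 1)
```

The dual-board call returns 14 GB of system memory and 2 GB of spare, matching the figures quoted in the text.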

Systems with 4-way interleaving have only two memory banks per board (four sockets per bank) and can therefore only support failover from a maximum of three banks to the Online Spare bank.

Mirrored memory mode
Mirrored Memory mode is a fault-tolerant memory option that provides a higher level of availability than Online Spare Memory. Online Spare Memory mode protects against single-bit errors and entire DRAM failure, but Mirrored Memory mode provides full protection against single-bit and multi-bit errors. For this reason, Mirrored Memory mode is beneficial to businesses that cannot afford downtime and cannot risk waiting until scheduled downtime to replace degraded memory modules.

Mirrored memory mode single memory board configuration (non-hot plug)
Customers can enable Mirrored Memory mode using the primary memory board that comes standard with the server. This capability provides customers with full protection against single-bit and multi-bit errors using a single memory board. Customers can designate up to two banks (C and D) as mirrored memory. Servers operating in Mirrored Memory mode with a single memory board can support up to 4 GB of system memory (and an equivalent amount of redundant memory) using 1-GB DIMMs.

To enable Mirrored Memory mode in a server with 2-way interleaving (ProLiant ML570 G2 or ML530 G2), banks A and B must be configured identically to banks C and D, respectively. To enable Mirrored Memory mode in a server with 4-way interleaving (ProLiant DL580 G2), bank A must be configured identically to bank B.

The same data is written to both system memory and mirrored memory banks, but data is read only from the system memory banks. If a DIMM in the system memory banks experiences a multi-bit error or reaches the pre-defined error threshold for single-bit errors, banks C and D are automatically designated as system memory and banks A and B are designated as mirrored memory. Data is still written to the system and mirrored memory banks, but it is read only from the system memory banks. This will allow continuous operation and maintain the level of server availability except in the highly unlikely case of a simultaneous error in exactly the same location on a DIMM and its mirrored DIMM. The system illuminates the memory board LED indicators of the DIMMs in the bank that experienced the multi-bit error. These DIMMs can be replaced at the customer's convenience during a scheduled shutdown.
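The write-to-both, read-from-one behavior and the role swap on failure can be modeled with a short simulation. This is a conceptual sketch only: the class and method names are hypothetical, the bank groups are named "AB" and "CD" after the 2-way interleaved layout described above, and the dictionaries stand in for physical DIMMs.

```python
class MirroredMemory:
    """Toy model of single-board Mirrored Memory mode (hypothetical names)."""

    def __init__(self):
        self.banks = {"AB": {}, "CD": {}}
        self.system = "AB"  # bank group that currently serves reads

    def write(self, address, value):
        # Every write goes to both the system and the mirrored banks.
        for contents in self.banks.values():
            contents[address] = value

    def read(self, address):
        # Reads are served only from the current system memory banks.
        return self.banks[self.system][address]

    def report_error(self):
        # On a multi-bit error or exceeded threshold, the roles are swapped.
        self.system = "CD" if self.system == "AB" else "AB"


memory = MirroredMemory()
memory.write(0x10, 42)
memory.banks["AB"][0x10] = 99   # simulate corruption in the system banks
memory.report_error()           # banks C and D become system memory
value_after_failover = memory.read(0x10)
memory.write(0x20, 7)           # writes still reach both bank groups
```

After the swap, reads return the intact copy from banks C and D, and later writes continue to update both groups.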

Mirrored memory mode dual memory board configuration
Hot Plug Mirrored Memory mode uses the optional memory board to provide complete redundancy and a higher level of memory protection than Online Spare mode. Hot Plug Mirrored Memory also provides hot-add and hot-replace capability to increase server availability. Hot-add allows the customer to increase memory capacity by adding DIMMs to open slots, while hot-replace allows a customer to replace a failed DIMM while the system continues to operate. This capability is especially useful for businesses that cannot afford downtime and cannot risk waiting until scheduled downtime to replace degraded memory modules.

Servers operating in mirrored memory mode support up to 8 GB of system memory (and an equivalent amount of redundant memory) using 1-GB DIMMs. To enable Hot Plug Mirrored mode, the two boards must be configured identically. The same data is written to both boards, but data is only read from the primary board.


Hot Plug Mirrored Memory configuration requirements
For hot-plug support, the second memory board must meet the following requirements:
• Same number of memory banks populated as the first board.
• Same amount (total capacity) of memory in each bank as the first board.
• Same type of memory in each bank as the first board (single-sided or double-sided).

If a DIMM on the primary board experiences a multi-bit error or reaches the error threshold for single-bit errors, the data is read from the optional board. This will enable the customer to hot-replace the failed DIMMs on the primary board without shutting down the server. HP will use Hot Plug Mirrored Memory along with Advanced ECC to provide protection against all memory errors except in the highly unlikely case of a simultaneous error in exactly the same location on a DIMM and its mirrored DIMM.

On the membrane of the memory board, a Ready to Hot Plug light will indicate when it is safe to remove one of the memory boards. When the light is green, the user can remove either memory board with the following restrictions:
• If no errors have occurred, either board can be removed.
• If one of the memory banks has a failure, the user can only remove the board that contains the failed bank.
• If both boards have a failed bank, the user cannot remove either board. While this type of failure is highly unlikely, this restriction will protect the customer from entering a risky configuration with one memory board that has known multi-bit errors. The server must be shut down to service this type of failure.

Hot-plug capabilities
The ProLiant 500 Series servers feature hot-add functionality, which allows the customer to increase memory capacity by adding DIMMs to open slots. Hot-add capability requires support from the operating system to recognize the additional memory that is installed. Microsoft Windows Server 2003 supports hot-add capability in the HP ProLiant 500 Series servers.
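The three Ready to Hot Plug removal restrictions described earlier in this section amount to a small decision table. The following sketch encodes them; the function name and the "primary"/"optional" labels are assumptions chosen for illustration, not names used by the server firmware.

```python
def removable_boards(primary_has_failed_bank, optional_has_failed_bank):
    """Return the set of memory boards that may be removed while the server runs."""
    if primary_has_failed_bank and optional_has_failed_bank:
        # Both boards carry a failed bank: neither may be removed,
        # and the server must be shut down for service.
        return set()
    if primary_has_failed_bank:
        return {"primary"}   # only the board with the failed bank
    if optional_has_failed_bank:
        return {"optional"}
    return {"primary", "optional"}  # no errors: either board may be removed


no_errors = removable_boards(False, False)
primary_failed = removable_boards(True, False)
both_failed = removable_boards(True, True)
```

An empty result corresponds to the case in which the server must be shut down before the memory can be serviced.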


Advanced Memory Protection for ProLiant 700 Series servers
HP Hot Plug RAID Memory is available for the ProLiant 700 Series servers (such as the ProLiant DL740 and ProLiant DL760 G2). HP Hot Plug RAID Memory allows the memory subsystem to operate continuously, even in the event of a complete memory device failure. RAID, in this case, stands for Redundant Array of Industry-standard DIMMs, which should not be confused with the Redundant Array of Independent Disks (RAID) schemes used for hard disk drive storage. While HP Hot Plug RAID Memory is conceptually similar to RAID Level 4 disk storage technology, there are some key performance and implementation differences, which are described next.

HP Hot Plug RAID Memory
Hot Plug RAID Memory is conceptually similar to RAID Level 4 in that it generates parity for an entire cache line of data during write operations and records the parity information on a dedicated parity cartridge. This parity information is checked during read operations.

This is where the similarity between HP Hot Plug RAID Memory and RAID disk storage technology ends. Hot Plug RAID Memory does not have the mechanical delays of seek time and rotational latency associated with disk drive arrays. Storage subsystem arrays use a single bus to write the stripes sequentially across multiple drives. In contrast, Hot Plug RAID Memory uses parallel, point-to-point connections to write data simultaneously across multiple memory cartridges. Also, Hot Plug RAID Memory eliminates the write bottleneck associated with typical storage subsystem RAID implementations. In a storage array, the RAID controller generally performs a read operation of existing parity before a write operation can be completed. If a dedicated parity drive is being used, a bottleneck occurs. However, because Hot Plug RAID Memory usually operates on an entire cache line of data, there is no need to read existing parity before a write operation. Therefore, no performance bottleneck occurs.


HP Hot Plug RAID Memory Operation
How does HP Hot Plug RAID Memory work? Servers with HP Hot Plug RAID Memory use five memory controllers to control five memory cartridges. Each cartridge can hold up to eight industry-standard DIMMs. When the memory controllers need to write data to memory, they split the data into four blocks and write them to four of the memory cartridges. A RAID engine calculates parity information, which is stored on the fifth cartridge. With the four data cartridges and the parity cartridge, the data subsystem is completely redundant such that if the data from any DIMM is incorrect or any cartridge is removed, the data can be recreated from the remaining four cartridges.
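The four-data-plus-one-parity layout can be illustrated with bytewise XOR parity, the scheme commonly used for RAID Level 4. The guide does not state the exact parity calculation used by the RAID engine, so XOR is an assumption here, as are the function names and the 32-byte example cache line.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_cache_line(line):
    """Split a cache line into four data blocks and append a parity block."""
    if len(line) % 4:
        raise ValueError("cache line length must be a multiple of 4")
    size = len(line) // 4
    data = [line[i * size:(i + 1) * size] for i in range(4)]
    return data + [xor_blocks(data)]  # five "cartridges" in total


def rebuild(cartridges, missing):
    """Recreate the contents of one cartridge from the remaining four."""
    survivors = [block for i, block in enumerate(cartridges) if i != missing]
    return xor_blocks(survivors)


cache_line = bytes(range(32))
cartridges = write_cache_line(cache_line)
rebuilt = [rebuild(cartridges, i) for i in range(5)]
```

Because the XOR of all five blocks is zero, any single missing block, data or parity, equals the XOR of the other four, which is why one cartridge can be removed without losing data in this model.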

Hot-plug capabilities
The redundancy in HP Hot Plug RAID Memory allows customers to hot-replace, hot-add, and hot-upgrade DIMMs without shutting down the server.

Hot-replace is replacing a failed DIMM while the system continues to operate. HP Hot Plug RAID Memory offers hot-replace capability in a driverless implementation that requires no support from the operating system. Servers will have hot-replace capability directly out of the box, regardless of the operating system.

Hot-add and hot-upgrade capabilities allow customers to scale the server's available memory. Hot-add allows the customer to increase memory capacity by adding DIMMs to open slots. Hot-upgrade allows the customer to replace smaller capacity DIMMs with larger capacity DIMMs. Hot-add and hot-upgrade capabilities require support from the operating system to recognize the additional memory that is installed. Microsoft Windows Advanced Server, Windows Data Center, Novell NetWare 6.0, and SCO UnixWare 7.1.2 will support these capabilities in the HP ProLiant 700 Series servers. HP is working with other operating system vendors to ensure that these capabilities will be supported in their future releases.

When a hot-plug operation is completed, HP Hot Plug RAID Memory automatically rebuilds the data across all the memory cartridges. The process to rebuild 4 GB of memory takes less than 30 seconds.

Online Spare Memory Configuration
Configuration procedure:
1. It is highly recommended that you test new memory when first adding it to the system. Follow these three steps:
   a. Under Advanced Options in the ROM-Based Setup Utility (RBSU), change the Post Speed Up setting to Disabled (it is enabled by default).
   b. Make sure that Online Spare Memory is disabled (it is by default). This option is also in RBSU under Advanced Options and Advanced Memory Protection.
   c. Reboot. All the memory will be tested. This may take a few minutes, depending on how much memory is installed in your system. Once the memory has been tested, you can enable Post Speed Up again for faster system boot.
2. Once the memory has been tested, power down the system and make sure that bank C is populated with memory no smaller than either bank A or B.
3. Power on your server. Online Spare Memory is disabled by default; therefore, all the memory is initially counted and configured as available primary memory.
4. At the prompt, press F9 to enter RBSU.
5. From the RBSU main menu, select Advanced Options.
6. Using the arrow key, move down and select Advanced Memory Protection.
7. To activate Online Spare Memory, highlight Online Spare and press Enter. Once you press Enter, your choice is saved. (The default option is Standard ECC, giving maximum memory size for applications that require large memory.)
8. Press ESC twice to go back to the main RBSU menu.
9. Press F10 to exit RBSU. Your server will automatically reboot.

As your server reboots after Online Spare Memory is enabled, it will display the following message:
xxxxMB System Memory and xxxxMB memory reserved for Online Spare

Note: If the memory size requirements for proper operation are not met, RBSU will not allow you to enable Online Spare Memory and will display the message:
Caution: Current memory configuration does not support Online Spare. See documentation.


Online Spare Memory Troubleshooting
The system will inform you when the ECC threshold has been exceeded by:
1. Integrated Management Log
   The IML will have the following entry:
   Online Spare Memory Engaged for Faulty Module (Slot x, Memory Module y)
2. OS Console
   Depending on your OS, the console will display one of the following messages:
   a. NT/Windows 2000: The System Management Driver has determined that memory module x in slot n has exceeded the memory error threshold and Online Spare Memory has been engaged.
   b. NetWare: CPQHLTH: Excessive ECC memory errors detected and automatically corrected. Online-Spare Memory engaged.
   c. UNIX / Linux: Excessive ECC memory errors detected and automatically corrected. Online-Spare Memory engaged.
3. The following LEDs will light:
   a. Amber LED next to the failed DIMM inside the server. The LED will stay on, to signify which DIMM has exceeded the single-bit error threshold, until the system is rebooted.
   b. Internal Health LED on the front panel will light amber to signify the ECC error and switchover.
4. Insight Manager will display Degraded or Failed status under the Advanced Memory Protection section.
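When reviewing exported log text, the IML entry quoted above can be matched with a regular expression to pull out the slot and module. This sketch assumes that the x and y placeholders are decimal numbers and that the entry text appears exactly as quoted; the function name is hypothetical, and the guide does not describe any IML export format.

```python
import re

# Pattern built from the entry text quoted in this section; numeric
# slot and module values are an assumption.
SPARE_ENGAGED = re.compile(
    r"Online Spare Memory Engaged for Faulty Module "
    r"\(Slot (\d+), Memory Module (\d+)\)"
)


def parse_spare_event(entry):
    """Return (slot, module) if the entry reports an Online Spare switchover."""
    match = SPARE_ENGAGED.search(entry)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = "Online Spare Memory Engaged for Faulty Module (Slot 1, Memory Module 3)"
parsed = parse_spare_event(sample)
unrelated = parse_spare_event("System fan failure")
```

The slot and module numbers identify which DIMM to replace during the next scheduled shutdown.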


Power Subsystem
The power subsystem includes everything related to power, thermal issues and adequate airflow. It sometimes helps when isolating a failure to think of this subsystem in terms of two groups: everything related to power internal to the system and everything related to power external to the system. The power supply is switch controlled.

Hot Pluggable Power Supplies
Built-in hot plug power supply support allows users to insert or remove power supplies in fault tolerant configurations while the system is still up and running.

• Embedded microcontroller
• Automatic load sharing
• Automatic line sensing
• Independent line cord
• Hot plug, N+1 redundant
• All failure conditions sent to IMD and CIM
• Common design throughout workgroup servers
• Common design throughout high end servers

Systems with traditional power supplies do not perform a power supply self-test. With intelligent power supply technology, the microcontroller performs a self-test upon startup that checks the power supply temperature sensors, RAM integrity, ROM revision, analog to digital (A/D) and digital to analog (D/A) accuracy, and non-volatile memory integrity. In case of a failed self-test, the power supply will not enable and will indicate failure by flashing an amber status LED. The inclusion of a self-test at system startup greatly increases system reliability. A system administrator can now discover possible power supply problems before a system runs and performs functions. This could prevent the power supply from failing during a critical function. For example, if the D/A accuracy was outside tolerances the power supply status LED would indicate a failure.
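The startup self-test described above can be pictured as a checklist in which any failed item keeps the supply from enabling. The sketch below is a simplified model, not firmware: the check names are taken from the paragraph, while the function name, the result dictionary, and the boolean pass/fail inputs are assumptions.

```python
SELF_TEST_CHECKS = (
    "temperature sensors",
    "RAM integrity",
    "ROM revision",
    "A/D and D/A accuracy",
    "non-volatile memory integrity",
)


def run_self_test(results):
    """Evaluate self-test results given as {check name: passed?}.

    A check missing from results is treated as failed (an assumption).
    """
    failed = [check for check in SELF_TEST_CHECKS if not results.get(check, False)]
    if failed:
        # A failed self-test keeps the supply disabled and flashes the amber LED.
        return {"enabled": False, "status_led": "amber flashing", "failed": failed}
    return {"enabled": True, "status_led": None, "failed": []}


all_good = run_self_test({check: True for check in SELF_TEST_CHECKS})
bad_converter = run_self_test(
    {check: check != "A/D and D/A accuracy" for check in SELF_TEST_CHECKS}
)
```

The second call models the D/A accuracy example from the text: the supply stays disabled and reports the failure through its status LED.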


Hot-pluggable power supply assemblies can be identified by a port wine colored removal and insertion latch assembly.

Power Supply LEDs


Indicator    LED Status                 Meaning
DC Power     Green                      AC power is connected to this power supply.
             Amber                      Fault detected in this power supply.
             Amber flashing             Failed self-test.
             Green/Amber alternating    Power supply failed to restart after a prolonged fault.
             Green flashing             Power supply will restart within 20 seconds.
             Off                        DC power not switched on or interlock open.
AC Power     Green                      AC power is connected to this power supply.
             Off                        No AC power connected to this supply.

1. DC Power Status LED 2. AC Power Status LED
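For quick reference during troubleshooting, the power supply LED table can be held as a lookup keyed by indicator and LED status. The entries below are copied from the table in this section; the dictionary and function names are hypothetical.

```python
POWER_SUPPLY_LEDS = {
    ("DC Power", "Green"): "AC power is connected to this power supply.",
    ("DC Power", "Amber"): "Fault detected in this power supply.",
    ("DC Power", "Amber flashing"): "Failed self-test.",
    ("DC Power", "Green/Amber alternating"):
        "Power supply failed to restart after a prolonged fault.",
    ("DC Power", "Green flashing"): "Power supply will restart within 20 seconds.",
    ("DC Power", "Off"): "DC power not switched on or interlock open.",
    ("AC Power", "Green"): "AC power is connected to this power supply.",
    ("AC Power", "Off"): "No AC power connected to this supply.",
}


def led_meaning(indicator, status):
    """Look up the meaning of one power supply LED state."""
    return POWER_SUPPLY_LEDS.get((indicator, status), "Not listed in the LED table")


self_test_failure = led_meaning("DC Power", "Amber flashing")
no_input_power = led_meaning("AC Power", "Off")
```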



Hot Pluggable Fans
Redundant fans ensure proper airflow around temperature-sensitive components in case of fan failure. Server fans speed up as the temperature rises and alert the operating system through Insight Manager if the temperature approaches a critical point. Hot-pluggable, redundant fans are standard on today's servers. Like other hot plug components, these fans can be individually powered down and replaced in the event of a failure, while the redundant fan takes over. This helps ensure that a fan failure will not take the server down.

Fan Status
Check the fan LEDs to determine fan status.

LED      Fan Status
Green    Power to fan. Fan OK.
Amber    Replace fan.
Off      No power to fan. Ensure fan is properly seated. Ensure power to fan is good. Replace fan.


Redundant Processor Power Modules (PPMs) or Voltage Regulator Modules (VRMs)
A processor requires tightly controlled power from a dedicated power supply. If a power supply module supporting a processor fails, the system goes down. To prevent that, ProLiant servers have either three processor power supply modules to support every two processors (two active and one redundant) or fully redundant power modules. If one power module fails, the redundant power module takes over operation without interrupting system operation.

Some redundant PPMs are actually multiple physical PPMs. Some, such as the one pictured here, have two PPMs on one physical board. If one PPM fails, the second one takes over.


Input/Output Subsystem
I/O devices link the user with the system. I/O devices can be uni-directional or bidirectional.

PCI Hot Plug
HP PCI Hot Plug technology enables the removal and replacement of PCI controllers without shutting down the system or interfering with other controllers on the PCI bus. The operating system, the system hardware, and the device driver must all support PCI Hot Plug for this function to be used. This is currently available with Microsoft Windows NT, Microsoft Windows 2000/2003, and Novell IntranetWare. SCO's UnixWare operating system also gives administrators full hot-plug capability.

The first generation of hot-pluggable PCI slots required a utility to turn off the driver in the operating system. The second generation of slots turns off the driver when the slot is powered down. The utility is provided whether first or second generation slots are in the server.

PCI Hot Plug systems incorporate the following features that differentiate them from conventional systems:

• Advanced system circuitry that permits software control of the PCI Hot Plug slots
• LED status indicators for each PCI Hot Plug slot that indicate if a slot has power, and if the device driver reported an attention condition
• Slot release levers that automatically disable power to the hot plug slot when opened
• Wider PCI slot spacing and dividers between hot plug slots that permit safe insertion and removal of controllers, while avoiding contact with active adjacent PCI options
• Each hot plug slot can be isolated from the PCI bus, allowing uninterrupted service on adjacent adapters
• Adapter locks prevent removing adapters while power is applied
• Backward compatible with existing PCI cards
• Networking must be installed in order to use PCI Hot Plug, because of the RPC calls used


Board Slot Status
The LEDs at each expansion slot indicate the board slot status.

What the Slot LEDs Indicate

Green On, Amber Off
Power is currently applied to the slot. Do not open the slot release lever. The slot is functioning normally.

Green On, Amber On
Power is currently applied to this slot, but the slot needs attention, such as when there is a problem with the slot, the board, or the driver. Do not open the slot release lever. Follow these steps:
1. Through the PCI Hot Plug application, turn power off to the slot (the green LED turns off).
2. Open the slot release lever (the amber LED turns off).
3. Remove or replace the board.
4. Connect the cables to the PCI board.
5. Close the slot release lever.
6. Return power to the slot through the PCI Hot Plug application (the green LED turns on).

Green Off, Amber On
Power to this slot is turned off, but this slot needs attention, such as when there is a problem with the slot, the board, or the driver. Follow these steps:
1. Open the slot release lever (the amber LED turns off).
2. Remove or replace the board.
3. Connect the cables to the PCI board.
4. Close the slot release lever.
5. Return power to the slot through the PCI Hot Plug application (the green LED turns on).

Green Off, Amber Off
The power to the slot is off. If you need to replace the card in this slot, follow these steps:
1. Open the slot release lever.
2. Remove or replace the board.
3. Connect the cables to the PCI board.
4. Close the slot release lever.
5. Return power to the slot through the PCI Hot Plug application (the green LED turns on).
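The slot LED states above differ mainly in whether power must first be turned off through the PCI Hot Plug application. The sketch below captures that pattern; the function and constant names are hypothetical, and the empty result for a normally functioning slot simply reflects that this table lists no service steps for that state.

```python
POWER_OFF_STEP = "Through the PCI Hot Plug application, turn power off to the slot."

COMMON_STEPS = [
    "Open the slot release lever.",
    "Remove or replace the board.",
    "Connect the cables to the PCI board.",
    "Close the slot release lever.",
    "Return power to the slot through the PCI Hot Plug application.",
]


def slot_service_steps(green_on, amber_on):
    """Return the service steps the table gives for a slot's LED state."""
    if green_on and not amber_on:
        # Slot is functioning normally; do not open the slot release lever.
        return []
    if green_on:
        # Powered slot needing attention: remove power before opening the lever.
        return [POWER_OFF_STEP] + COMMON_STEPS
    return list(COMMON_STEPS)


powered_and_faulty = slot_service_steps(green_on=True, amber_on=True)
unpowered = slot_service_steps(green_on=False, amber_on=False)
```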


PCI Hot Plug with Microsoft Windows NT
On Microsoft Windows NT servers, installation of the hot-plug user interface creates a new icon in the Control Panel called Hot-Plug. This utility can also be accessed through a shortcut in the Administrative Tools folder. The utility provides a means for managing the hot-plug PCI slots on the local server and on remote nodes. A built-in filter permits the user to select the chassis and slots being viewed.

The hot-plug utility provides information about the controllers plugged into the hot-plug slots, such as card location, board-specific information, driver name, duplex status, and board status. The administrator can use the hot-plug utility to perform the following maintenance tasks:

• Turn the power to individual slots off and on to permit controller replacement
• View the properties page(s) for the controllers
• Mark devices failed when they are suspect and remove that status once repaired
• Run diagnostics on the controllers to determine their current status

Microsoft Windows NT PCI Hot Plug Architecture


PCI Hot Plug with Microsoft Windows 2000/2003


PCI Hot Plug with Novell IntranetWare
The PCI Hot Plug architecture takes advantage of the inherent modularity of IntranetWare to minimize the changes required of third-party adapter card software. The system relies on a new central component, the Novell Event Bus, which facilitates communications between the different software modules. The Event Bus is first implemented as a NetWare Loadable Module (NLM), allowing implementation of PCI Hot Plug on existing versions of IntranetWare. The architecture includes the following components:

• Novell Event Bus (NEB)
• Novell Configuration Manager (NCM)
• OEM Specific System Bus Driver (SBD)
• Novell Configuration Manager Console (NCMCON)
• CPQHLTH.NLM
• Device Drivers
   ODI-Compliant network adapters
   NWPA-Compliant storage adapters
   Other Adapters
• Installation Tools

Novell IntranetWare PCI Hot Plug Architecture


Hot Pluggable Drives
Hot-pluggable drive support allows easier servicing and high availability. Built-in hot plug drive support allows users to insert or remove drives in fault tolerant configurations while the system is still up and running. Inserting new hot plug drives is necessary for on-line capacity expansion. Removing hot plug drives is required when a disk drive fails and needs to be replaced.

[Figure: front of a hot-pluggable drive showing the On-Line, Drive Access, and Drive Failure LEDs]

LEDs on the front of the drive indicate hard drive status.

Hard Drive LEDs


LED                      Status      Meaning
On-Line (green)          ON          Hard drive online. Power to hard drive. Do not remove hard drive.
                         Flashing    Hard drive being rebuilt. Do not remove hard drive.
                         OFF         Hard drive off.
Drive Access (green)     Flashing    Drive is being accessed.
                         OFF         Drive is not being accessed.
Drive Failure (amber)    ON          Problem with hard drive. Replace drive.
                         OFF         Hard drive functioning normally.


The following table lists the LEDs on the front of LVD drives and their meanings:

1 Activity    2 Power/Online    3 Fault    Indicator Meaning
Off           Off               Off        OK to remove drive if not part of a fault-tolerant configuration
Off           Off               On         OK to remove failed drive
X             On                X          Drive is online, do not remove
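The LVD drive LED combinations can be turned into a small helper that answers whether a drive may be pulled. This sketch reads the X entries in the table as "any state", which is an assumption, and returns None for combinations the table does not list. The function name is hypothetical.

```python
def lvd_drive_guidance(activity_on, online_on, fault_on):
    """Map the three LVD drive LEDs to the guidance given in the table."""
    if online_on:
        # Activity and Fault are shown as X (treated here as "any state").
        return "Drive is online, do not remove"
    if not activity_on and fault_on:
        return "OK to remove failed drive"
    if not activity_on and not fault_on:
        return "OK to remove drive if not part of a fault-tolerant configuration"
    return None  # combination not covered by the table


online_drive = lvd_drive_guidance(activity_on=True, online_on=True, fault_on=False)
failed_drive = lvd_drive_guidance(activity_on=False, online_on=False, fault_on=True)
unlisted = lvd_drive_guidance(activity_on=True, online_on=False, fault_on=False)
```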


Network Interface Controllers
All HP network device drivers have integrated error recovery features that allow the drivers to detect failure events and recover from these errors. The drivers can reset the NIC and continue running, usually without noticeable interruption, after the following types of errors:

• Adapter check interrupt: When the hardware detects a problem, a detailed console error message is generated and an immediate attempt to recover begins.
• Link status change: Link status changes occur when a cable is unplugged or there is a hub problem. If a fatal link status change occurs, the driver attempts to recover from it.
• Transmit integrity check failure: If the driver receives indication that the interface integrity is compromised (by a cable or hub failure, for example), it reports the failure and attempts to recover.

An RJ45 connection is used on a network interface controller for 100TX. HP network interface controllers support a network speed of 1000 MB/s. Optional Redundant NIC Support Under Windows NT 4.0, Novell IntranetWare, and SCO UNIX, NICs can be installed in redundant controller pairs, sharing a driver. For example, dual-port fast Ethernet network interface controllers can support redundant NICs. When the device driver detects an error on the NIC and cannot effect recovery, the driver switches the roles of the active and standby interfaces (standby becomes active) without interruption of service, allowing conveniently scheduled replacement of the failed controller. In systems with hot plug capability, the failed NIC can be replaced without shutting down the system.

Same Subnet Network and MAC (Media Access Controller) Address failover Detection of Adapter and Cabling Faults
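The active/standby role switch described above can be sketched as follows. This is an illustrative model only, not the HP driver; the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Nic:
    name: str
    healthy: bool = True


class RedundantPair:
    """Hypothetical model of two NICs sharing one driver."""

    def __init__(self, active: Nic, standby: Nic) -> None:
        self.active = active
        self.standby = standby

    def report_error(self, recovered: bool) -> str:
        """Handle an error on the active NIC.

        Returns the name of the NIC carrying traffic afterwards.
        """
        if recovered:
            # The driver reset the NIC successfully; roles are unchanged.
            return self.active.name
        # Recovery failed: mark the NIC bad and swap roles if possible.
        self.active.healthy = False
        if self.standby.healthy:
            self.active, self.standby = self.standby, self.active
        return self.active.name
```

After a failed recovery the former standby carries traffic, and the failed controller stays in the standby role until it is replaced.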

CPQSET
The CPQSET installation utility allows the user to run initial diagnostics and configure NIC teams. A CPQSET icon is usually placed in the Control Panel when an HP NIC driver is installed.


Software Subsystem
The following components make up the software subsystem:

Operating System/Network Operating System
Applications
Insight Manager
Device Drivers
User data files
Tools and Utilities
Systems and Options ROMPaq
ROM-based Configuration Utility
Array Configuration Utility
Diagnostics
Virus Protection


Fault Prevention and Recovery Management


3 Tier Fault Tolerance
HP has a three-tiered approach to providing highly available servers:

Tier 1 - Fault Prevention
Predicts and avoids failures
Allows preventive maintenance prior to failure
Insight Manager

Tier 2 - Fault Tolerance
Keeps server running in event of component failure
ECC Memory
RAID
Redundant Fans
Redundant Power Supplies

Tier 3 - Fault Recovery
Quickly and automatically recovers from critical failures
ASR-2


Controller Duplexing
Some operating systems support controller duplexing, a fault tolerance feature that requires two SMART Array Controllers. With duplexing, the two controllers each have their own drives that contain identical data. In the unlikely event of a Controller failure, the remaining drives and Array Controller service all requests. Controller duplexing is not the same as duplexing the SCSI buses on a single SMART Controller. Controller duplexing is a function of the operating system and takes the place of other fault tolerance methods. Refer to the documentation included with the operating system for implementation. HP recommends using hardware-based fault tolerance instead of controller duplexing. Hardware-based fault tolerance provides a much more robust and controlled environment for fault tolerance protection. If controller duplexing is used, configure each SMART Controller with RAID 0 to achieve maximum storage capacity. In addition, the following fault-tolerant features will not be available: Online Spare, Auto Reliability Monitoring, Interim Data Recovery, and Automatic Data Recovery.


Automatic Server Recovery-2


Automatic server recovery allows a server to:

Perform an automatic restart in case of a system lockup, thermal issue, or UPS activation.
Switch to a recovery server in the event of system failure.
Send notification to a pager when ASR has been activated.
Allow remote control of the server through a serial port, network connection, or a Remote Insight Board that has an onboard modem.

ASR-2 can be configured to page an administrator when the system restarts. ASR-2 depends on the application and driver routinely notifying the ASR-2 hardware of proper system operation. If the time between ASR-2 notifications exceeds the specified period, ASR-2 assumes a fault has occurred and initiates the recovery process.
[Figure: ASR-2 recovery sequence from Server Down to Server Up (Log, Reboot, Analyze; Auto Reconfigure and Restart; Server Quickly Up and Running)]

ASR-2 reboots the server after a hardware or software failure:
1. Logs error to the Critical Error Log
2. Resets the server
3. Pages the administrator
4. Tests devices automatically, deallocates bad components
5. Reboots server
6. If server reboot is successful, pages a second time

The available recovery features are:

Software Error Recovery: automatically restarts the server after a software-induced server failure
Environmental Recovery: allows the server to restart when temperature, fan, or AC power conditions return to normal


Unattended Recovery
For unattended recovery, ASR-2 logs the error information to the Critical Error Log, resets the server, pages the system administrator (if a modem is present and paging is selected), and tries to restart the operating system. Often the server restarts successfully, making unattended recovery the ideal choice for remote locations where trained service personnel are not immediately available. ASR-2 tries to restart the server up to 10 times. If ASR-2 cannot restart the server within 10 attempts, it places a critical error in the Critical Error Log, starts the server into HP Utilities, and enables remote access if configured. ASR-2 must be configured to load the operating system after restart.
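The retry behavior described above can be illustrated with a short sketch. It is a simplified model, not firmware code: the function name, the `try_restart_os` callback, and the log strings are hypothetical, and only the limit of 10 attempts comes from the text.

```python
from typing import Callable, List

MAX_RESTART_ATTEMPTS = 10  # limit stated in the text above


def unattended_recovery(try_restart_os: Callable[[], bool], log: List[str]) -> str:
    """Model the unattended recovery loop.

    try_restart_os is a hypothetical callback returning True when the
    operating system starts. Returns what the server ends up running.
    """
    log.append("error logged, server reset, administrator paged if configured")
    for attempt in range(1, MAX_RESTART_ATTEMPTS + 1):
        if try_restart_os():
            log.append(f"operating system restarted on attempt {attempt}")
            return "operating system"
    # All attempts failed: log a critical error and fall back to HP Utilities.
    log.append("critical error: restart limit reached")
    return "HP Utilities"
```

A caller could pass a callback that simulates a flaky restart to see which branch is taken and what is written to the log.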

Attended Recovery
For attended recovery, ASR-2 performs the following actions:

Logs the error information to the Critical Error Log
Resets the server
Pages the system administrator (if a modem is present and paging is selected)
Starts HP Utilities from the hard drive
Enables remote access

During system configuration, these utilities are placed on the system partition of the hard drive. If dial-in access has been configured and a modem with an auto-answer feature is installed, the system administrator can dial in and remotely diagnose or reconfigure the server. If HP Utilities has been configured for network access, the utilities can be accessed over the network. Insight Manager can be used for dial-in or network access.

Server Failure Notification


Server Failure Notification allows the server to send a pager message if ASR is activated. The options presented are self-explanatory, and are:

Pager status
Pager dial string
Pager message
Pager test


Remote Options
Remote Options enables you to remotely control the server through a modem or a network. Most of the options are self-explanatory, but those that are ambiguous in meaning are explained below:

Serial interface: The communications port for the modem that is used by Server Failure Notification and Remote Options. COM1 and COM2 are the only available selections.
Network status: Enables remote control of the server through the network.
Network frame type: Make sure this option is set correctly for your network; otherwise, no remote communication will occur. Ethernet II is the selection that will work on a standard Microsoft TCP/IP network.

NOTE: In Remote Options, the modem and network access should not be used at the same time. The remote connection function may not work properly when both are enabled.

Hardware Requirements
To use ASR-2 over a modem, you need the following:

HP modem or optional Hayes modem, with the communication parameters set for 8 data bits, no parity, and 1 stop bit
System Configuration Utility, version 2.24 or later, and Diagnostics Utility installed on the system partition of the hard drive
ASR-2 configured to load HP Utilities after restart

Booting into the Operating System


When ASR-2 is enabled to restart into the operating system and a critical error occurs, ASR-2 logs the error in the Critical Error Log and restarts the server. The system ROM pages the designated administrator, and executes the normal restart process. During the recovery process, the ASR-2 feature tries to restart the server up to 10 times. If the ASR-2 feature cannot restart the server within 10 attempts, it logs a critical error in the Critical Error Log, restarts the server into the HP Utilities, and puts the modem into auto-answer mode.


ASR-2 Security
The standard HP password features function differently during ASR-2 than during a typical system startup. During ASR-2, the system does not prompt for the Power-On Password. This allows ASR-2 to restart the operating system or HP Utilities without user intervention.

To maintain system security, set the server to boot in Network Server Mode (an option in the System Configuration Utility). This option ensures that the server keyboard is locked until the Keyboard Password is entered.

Select an Administrator Password (an option in the System Configuration Utility). During attended ASR-2 (local or remote), the Administrator Password must be entered before any modifications can be made to the server configuration.


Sequence of Events After a Hardware or Software Error


The sequence of events after a hardware or software error is as follows:

1. A hardware or software error occurs.
2. The operating system halts normal operation.
3. The ASR timer expires.
4. The error is recorded in the Server Health Log or in the Integrated Management Log, depending on the server.
5. The server is reset.
6. If a modem is installed and paging is enabled, the Server Failure Notification pager alert is sent to the server administrator.
7. The unattended server boots the operating system, or the server boots HP Utilities from the system partition on the hard drive.
8. When the server boots HP Utilities, one of the following occurs:
   If a modem is installed, ASR puts the modem on auto answer so that the server administrator can dial in using third-party terminal emulator software and remotely run HP Utilities to identify the source of the fault.
   The local server administrator runs HP Utilities from the server console to identify the source of the fault.

If the server continues experiencing hardware or software errors and the number of ASR cycles exceeds the specified number of recovery attempts, the server logs an error to the Server Health Log or the Integrated Management Log and boots HP Utilities from the system partition on the hard drive.


Simplified ASR
Servers with ROM based setup have Simplified ASR. Simplified ASR is enabled when the Server Management Driver is loaded. It can be disabled through the Insight Manager Recovery icon. The timer is automatically set to 10 minutes. In case of a thermal shutdown, UPS shutdown, or OS hang, the server will attempt to reboot to the operating system after 10 minutes. Simplified ASR does not have the paging features or the configuration features of ASR-2.


Health Driver

SYSMGMT.SYS/CPQHLTH.NLM (called the System Management Driver)
Provides support for:
   ASR
   Thermal Protection
   Health Log/Critical Error Log
   Remote control of server from Insight Manager PC
Requires configuration in System Configuration
I2C Bus implementation
Driver uses IRQ13 for fan and temperature alerting; possible conflict with other devices.

The Health Driver continually resets the ASR-2 timer according to the frequency you specified in the System Configuration Utility (for example, 10 minutes). If the ASR-2 timer counts down to zero before being reset, because of an operating system crash or a server lock-up, ASR-2 restarts the server into either HP Utilities or the operating system (as indicated by the System Configuration parameters).

The default value is 10 minutes. The allowable settings are 5, 10, 20, and 30 minutes. For remote and off-site (unattended) servers, setting the software error recovery time-out to 5 minutes reduces server downtime and allows the server to recover quickly. For local (attended) servers located onsite, you can set the software error recovery time-out to 20 or 30 minutes, giving you time to arrive at the server if you wish to manually diagnose the problem.

The Health Driver is independent of the ASR-2 timer. You can load it without enabling the ASR-2 timer. This allows the driver to detect and log information about numerous hardware and software errors in the Integrated Management Log. However, you cannot enable the ASR-2 timer without loading the Health Driver.

Before ASR-2 restarts the server, it records any information available about the condition of the operating system in the Critical Error Log or the Integrated Management Log, depending on the server support. This information can be used to diagnose an operating system crash or server lock-up, while still allowing the server to be restarted.
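The timer behavior described above is a watchdog pattern: a healthy system keeps resetting a countdown, and expiry triggers recovery. The following is a minimal simulation under stated assumptions; the class name, method names, and use of simulated seconds are hypothetical, while the allowed settings and the default come from the text.

```python
class AsrTimer:
    """Hypothetical simulation of the ASR-2 countdown timer."""

    ALLOWED_MINUTES = (5, 10, 20, 30)  # settings listed in the text

    def __init__(self, timeout_minutes: int = 10) -> None:
        if timeout_minutes not in self.ALLOWED_MINUTES:
            raise ValueError(f"timeout must be one of {self.ALLOWED_MINUTES}")
        self.timeout_seconds = timeout_minutes * 60
        self.remaining = self.timeout_seconds

    def reset(self) -> None:
        """Called periodically by the health driver while the OS is healthy."""
        self.remaining = self.timeout_seconds

    def tick(self, seconds: int) -> bool:
        """Advance simulated time; return True once the timer has expired."""
        self.remaining = max(0, self.remaining - seconds)
        return self.remaining == 0
```

If the driver stops calling `reset()` (for example, because the operating system has hung), `tick()` eventually returns True, which is the point at which the recovery process would begin.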


Learning Check
1. PCI provides switchless and jumperless support, plug and play capability, and processor independent design.
   True   False

2. Conventional PCI adapters will operate in PCI-X slots, and vice versa.
   True   False

3. Legacy PCI cards keyed for 5-volt signaling will work in systems that provide only 3.3-volt slots.
   True   False

4. Which of the following statements about parallel SCSI is true?
   a. The (smart array) controller knows how many cylinders, heads, or sectors are available on each device.
   b. The SCSI host bus adapter must be built into the motherboard, not in a PCI or PCI-X slot.
   c. Ultra320 SCSI operates at twice the frequency of Ultra3.
   d. Low-Voltage Differential (LVD) devices are not backward compatible with Single Ended (SE) devices.

5. Which of the following statements about SCSI configuration is true?
   a. On HP hot pluggable hard drives, SCSI IDs are usually set up by selecting a unique ID number through an array of jumpers.
   b. With proper termination, the internal and external connectors of a single SCSI bus can be used at the same time.
   c. Single ended (SE) SCSI supports up to 15 devices per bus without a repeater.
   d. If you have both internal and external devices, two separate SCSI channels must be used.

6. Serial Attached SCSI (SAS) will use the same electrical and physical interface as Serial ATA (SATA), which will allow its controller to accept either a SATA or SAS hard drive.
   True   False

7. Which of the following is a true statement about processors?
   a. Processor steppings are versions of the same processor model that vary only slightly. Each stepping requires changes to System ROM.
   b. The Pentium 4 has a Quad Pumped Front Side Bus (FSB) that provides an effective speed of 400MHz with a 100MHz clock.
   c. A system with processors that use Hyper-Threading technology appears to software as having twice the number of processors than are physically present.
   d. All of the above.

8. Which of the following is a true statement about memory?
   a. Despite different signaling technology, it is possible to mix SDRAM and DDR SDRAM within the same memory subsystem.
   b. ECC Memory subsystems can correct a two-bit failure.
   c. The redundancy in Hot Plug RAID Memory allows customers to hot-replace, hot-add, and hot-upgrade DIMMs without shutting down the server.
   d. Hot-update allows the customer to replace smaller capacity DIMMs with larger capacity DIMMs.

9. HP PCI Hot Plug technology enables the removal and replacement of PCI controllers without shutting down the system or interfering with other controllers on the PCI bus. What three components must provide support to make this possible?

10. While working on a ProLiant server you notice that the speed of the power supply fan is changing. This is an indication of an impending fan failure.
   True   False


ProLiant Server Product Line Overview


Module 5

Introduction

ProLiant server products deliver top performance for a variety of business applications. This module describes the rationale for server model designations and covers the chronology of product introductions. This module also provides information on features and service considerations for the newest servers in the product line. Legacy models are covered in appendices A, B and C and appendix D focuses on appliance servers and related products. Topics in this module include:

Product positioning framework
Server introduction timeline
Maximized Expansion servers (ML)
Density-Optimized servers (DL)
Ultra-dense server blades (BL)
Packaged cluster servers (CL)

Objectives
To demonstrate an awareness of the ProLiant server product line, service personnel should be able to:

Identify the major categories of ProLiant servers.
Explain the organizing principles of the ProLiant server product line.
Describe the features and characteristics of the newest ProLiant servers.
Locate configuration and service information relative to each product.


Product Positioning Framework


Traditionally, the ProLiant servers have been divided into four major groups:

Entry Level Servers
Workgroup Servers
Enterprise Servers
Appliance Servers

The needs of our customers are rapidly changing, driven by the Internet and other accelerating technologies. To meet those needs, the ProLiant server family has taken on a new positioning framework. This framework better addresses our customers' target needs and more clearly reflects the breadth of our offering in a way that ties directly to what customers want.

ProSignia servers have been rebranded ProLiant and aligned with existing ProLiant servers.
ProLiant servers have transitioned to a new positioning framework and have taken on a new numbering system.

The ProLiant server categories are as follows:

ML - Maximized Expansion Servers
DL - Density-Optimized Servers
BL - Ultra-dense, power-efficient server blades
CL - Cluster Servers
Appliance Servers


Renumbering
Next-generation platforms of current ProLiant servers have been given new numbers based upon the new positioning framework. Only those ProLiant servers that have been announced with new platform architecture have been renumbered. Servers have not been renumbered retroactively. We will continue to sell our current ProLiant servers, with their existing numbering, until they are discontinued. The ProLiant 6000, ProLiant 6500 and ProLiant 7000 have not undergone a platform transition, and have not been renumbered. They will maintain their naming until they reach end-of-life.

Organizing Principles
The new positioning framework for the ProLiant family is based on two organizing principles:

Customer environment, designated by prefix, e.g., ML
Customer application type, designated by series number, e.g., 330

Customer environment: Customer environment is indicated by the model prefix, e.g., ML denotes emphasis on maximum expansion and DL denotes emphasis on maximum density.

The ML line denotes a line of ProLiant servers that offer maximum internal expansion. They are ideal for remote and branch offices and offer all-inclusive server/storage solutions. They are available in both rack and tower models.

ProLiant ML Line Transition Table

Transitioned From                  New Models
ProLiant 400, ProSignia 720        ProLiant ML330
ProLiant 800, ProSignia 740        ProLiant ML350
ProLiant 1600, ProLiant 1600R      ProLiant ML370
ProLiant 3000, ProLiant 3000R      ProLiant ML530
ProLiant 5500, ProLiant 5500R      ProLiant ML570
ProLiant 8000                      ProLiant ML750

The DL line denotes a line of ProLiant servers that are density-optimized for space-constrained and rack-mounting environments. They are intended for data center and external storage environments as well as efficient clustering. They are available only in rack-optimized models.

ProLiant DL Line Transition Table

Transitioned From: ProLiant 1850R, ProLiant 6400R, ProLiant 8500
New Models: ProLiant DL320, ProLiant DL360, ProLiant DL380, ProLiant DL580, ProLiant DL590, ProLiant DL760

The BL line denotes a line of ProLiant servers that are ultra-dense, power-efficient server blades, which integrate a server-class chipset, ultra-low voltage processor, and other power-saving components in an ultra-dense design that reduces power and cooling costs and saves space. Customers can install up to 280 ProLiant BL10e server blades in a standard 42U rack for better utilization of valuable data center space. BL systems range from power-efficient single processor blades to high-performance SMP server blades.

The CL line denotes a line of ProLiant servers that are packaged for simplified clustering. They are a self-contained, ready-to-go clustering solution and are ideal for a variety of high-availability environments, such as data centers and remote offices. They fit in standard racks or can be configured as a stand-alone tower.

ProLiant CL Line Transition Table

Transitioned From: ProLiant CL1850
New Models: ProLiant CL380

Customer application type: The level of performance and availability they achieve defines the three series of servers in the ProLiant ML and DL line.

The ProLiant 300 series offers cost-effective servers to run small databases and applications, to serve as web servers, or to support infrastructure needs such as file/print and domain server functions.

The ProLiant 500 series offers more performance and availability to handle complex web applications, large databases, and to serve as critical file servers.

The ProLiant 700 series offers maximum performance and availability for industry-standard computing to support very large databases, multi-application needs, and mid-range applications. The 700 series servers are also an effective solution for server consolidation.

Generation Identifier
As ProLiant servers transition from one generation to the next, there is a need to visually identify which generation of server is being serviced to ensure that the correct documentation, options, and parts are used. A one-square-centimeter label will be affixed to the server to identify the generation.

The label will be placed in a consistent location as follows:

Racks: left rack screw opposite Intel logo
Towers: top left chassis behind door/bezel

The identifier will be used in documentation where the generation difference is relevant, e.g., technical documentation. It may appear in one of several formats depending on constraints such as available space in a database field:

ProLiant DL360 generation 2 server
ProLiant DL360 (G2)
ProLiant DL360 G2
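The naming scheme (prefix for customer environment, series number for application type, optional generation identifier) can be decoded mechanically. The sketch below is illustrative only: the function name, the regular expression, and the choice to report generation 1 when no identifier is present are assumptions, and the series is derived only for the ML and DL lines, as in the text.

```python
import re
from typing import NamedTuple, Optional

ENVIRONMENTS = {
    "ML": "Maximized Expansion",
    "DL": "Density-Optimized",
    "BL": "Server blade",
    "CL": "Cluster",
}

# Accepts "DL360 G2", "DL360 (G2)" and "DL360 generation 2 server".
PATTERN = re.compile(
    r"^(?:ProLiant\s+)?(ML|DL|BL|CL)\s*(\d+)([a-z]?)"
    r"(?:\s*\(?(?:G|generation\s+)(\d+)\)?)?(?:\s+server)?$",
    re.IGNORECASE,
)


class Model(NamedTuple):
    prefix: str
    environment: str
    number: str
    series: Optional[int]
    generation: int


def parse_model(name: str) -> Model:
    match = PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized model designation: {name!r}")
    prefix = match.group(1).upper()
    digits, suffix, gen = match.group(2), match.group(3), match.group(4)
    # The 300/500/700 series are defined for the ML and DL lines only.
    series = (int(digits) // 100) * 100 if prefix in ("ML", "DL") else None
    # Assumption: a model without an identifier is treated as generation 1.
    return Model(prefix, ENVIRONMENTS[prefix], digits + suffix.lower(),
                 series, int(gen) if gen else 1)
```

All three documented formats for the DL360 parse to the same result, which is useful when matching a server against documentation or parts lists.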

The identifier may not be used in certain marketing documentation such as brochures and pictures. It will not be applied retroactively but will be implemented with new generations of servers going forward.

Server Introduction Timeline


Server Introduction Timeline 1996 - 1999
The following illustration shows a timeline for the introduction of ProLiant server products for the years from 1996 through 1999. Note the legacy product categories: Entry Level, Workgroup and Enterprise.

[Figure: Server introduction timeline, 1996 through 1999]
Enterprise: ProLiant 5000, ProLiant 5500, ProLiant 7000, ProLiant 6000, ProLiant 6500, ProLiant 6400R, ProLiant 8500, ProLiant 8000, ProLiant CL1850
Workgroup: ProLiant 2500, ProLiant 3000, ProLiant 1600, ProLiant 850R, ProLiant 1850R
Entry Level: ProSignia 200, ProLiant 800, ProLiant 1200, ProLiant 400, ProSignia 720/740, NeoServer, TaskSmart

Server Introduction Timeline 2000


The following illustration continues the timeline for the introduction of ProLiant server products for the year 2000. Note the new product categories: Density-Optimized, Maximized Expansion, and Cluster. These reflect the current marketing strategy.

[Figure: Server introduction timeline, 2000]
Cluster: ProLiant CL380
Maximized Expansion: ProLiant ML370, ProLiant ML350, ProLiant ML530, ProLiant ML330
Density-Optimized: ProLiant DL380, ProLiant DL360, ProLiant DL580, ProLiant DL320

Server Introduction Timeline 2001


The following illustration continues the timeline for the introduction of ProLiant server products for the year 2001. Note the introduction of the Blade product category and several Generation 2 servers.

[Figure: Server introduction timeline, 2001]
Blade: ProLiant BL10e
Maximized Expansion: ProLiant ML750, ProLiant ML330e, ProLiant ML370 G2, ProLiant ML330 G2, ProLiant ML350 G2
Density-Optimized: ProLiant DL760, ProLiant DL380 G2, ProLiant DL590

Server Introduction Timeline 2002


The following illustration continues the timeline for the introduction of ProLiant server products for the year 2002. New Generation 1 servers include the ProLiant BL20p and ProLiant ML310. Several Generation 3 servers appear for the first time.

[Figure: Server introduction timeline, 2002]
Blade: ProLiant BL20p
Cluster: ProLiant DL380 G2 Cluster
Maximized Expansion: ProLiant ML370 G3, ProLiant ML530 G2, ProLiant ML350 G3, ProLiant ML570 G2, ProLiant ML310
Density-Optimized: ProLiant DL360 G2, ProLiant DL580 G2, ProLiant DL320 G2, ProLiant DL380 G3, ProLiant DL360 G3

Server Introduction Timeline 2003


The following illustration continues the timeline for the introduction of ProLiant server products for the year 2003. New Generation 1 servers include the ProLiant BL40p, ProLiant DL560 and ProLiant DL740.

[Figure: Server introduction timeline, 2003]
Blade: ProLiant BL10e G2, ProLiant BL40p, ProLiant BL20p G2
Cluster: ProLiant DL380 G3 Cluster, ProLiant DL380 G3 Integrated Cluster
Maximized Expansion: ProLiant ML330 G3
Density-Optimized: ProLiant DL760 G2, ProLiant DL740, ProLiant DL560

Learning Check
1. What are the current organizing principles for the ProLiant positioning framework?

2. Describe the levels of performance and availability offered by the three series of servers in the ProLiant ML/DL line.

Maximized Expansion Servers


The ProLiant ML line denotes a line of ProLiant servers that offer maximum internal expansion. ML server products include:

ProLiant ML310
ProLiant ML330
ProLiant ML350
ProLiant ML370
ProLiant ML530
ProLiant ML570
ProLiant ML750

Objectives
To demonstrate an awareness of ProLiant maximized expansion server products, service personnel should be able to:

Describe the features and characteristics of ProLiant ML servers.
Locate configuration and service information relative to each product.

ProLiant ML310

The standard features of the ProLiant ML310 include:

Processors
   1P Intel Pentium 4 2.0/2.2/2.8GHz
   400/533MHz front side bus

Memory
   256MB 266MHz PC2100 DDR SDRAM standard on 2.53/2.8GHz models
   128MB 266MHz PC2100 DDR SDRAM standard on 2.0/2.2GHz models
   Four DIMM slots, expandable to 4GB maximum

Cache memory
   512KB second level ECC cache

Expansion slots
   Four 64-bit/33MHz PCI

Network controllers
   NC7760 PCI Gigabit Server Adapter (integrated/embedded)
   Wake On LAN support

Storage controller
   Integrated Dual Channel Ultra ATA-100 IDE Adapter with Integrated ATA RAID 0, 1, and 1+0 (ATA models)
   OR
   Integrated Single Channel Wide Ultra3 SCSI Adapter (SCSI models)

Storage and expansion
   48X CD-ROM and 1.44MB disk drive assembly
   Support for up to five 1-inch Wide Ultra3 NHP SCSI hard drives or four 1-inch ATA NHP drives (depending on model)
   Internal storage capacity of up to 364 GB (5 x 72.8-GB non-hot plug 1-inch Wide Ultra3 SCSI HD) (SCSI models) with optional drive cage
   Internal storage capacity of up to 320 GB (4 x 80-GB 1-inch ATA/100 drives) (ATA models)

Interfaces
   Two serial ports
   One parallel port
   Two RJ-45 Ethernet ports
   Two USB ports
   Video port
   Keyboard port
   Mouse port

Video
   Integrated ATI RAGE XL Video Controller with 8-MB SDRAM video memory

Warranty
   One-year limited warranty, Next Business Day one-year on-site limited Global warranty, and Pre-Failure Warranty, which covers processors, memory, and hard drives
   Certain restrictions and exclusions apply

ProLiant ML310 Component Breakdown

Reference   Description
1           48X CD-ROM
2           Removable media bays
3           1.44 MB floppy drive
4           Two 1-inch Non Hot Plug
5           Four 64-bit/33MHz PCI
6           System fan

ProLiant ML310 Service Considerations


The following service considerations apply to the ProLiant ML310:

NMI Debug button
The NMI Debug button is located near the center of the system board. The Non-Maskable Interrupt (NMI) is a diagnostic mechanism that allows crash dump files to be created when a system is hung and unable to respond to traditional debug mechanisms. The NMI Debug button can be used to diagnose software failures by forcing the operating system to invoke the NMI handler and generate a crash dump log. This log can provide critical troubleshooting information that may be difficult or impossible to obtain through other means. The user initiates an NMI by pressing the NMI Debug button. The NMI can allow a hung system to become responsive enough to generate a crash dump log. The button is enabled or disabled in RBSU.
Warning! The NMI Debug button causes the unit to abruptly fail, as it is designed to do. Therefore, it should never be used during normal operation.

Clearing NVRAM
It may be necessary at some time to clear and reset system configuration settings. When the system configuration switch position 6 is set to the On position, the system is prepared to erase all system configuration settings from both CMOS and NVRAM.
Warning! Clearing nonvolatile RAM (NVRAM) deletes the system configuration. Refer to Chapter 5, "Server Configuration and Utilities," in the Server Setup and Installation Guide for instructions on configuring the server.

Switching From the Current ROM to the Backup ROM
To switch to the backup ROM:
1. Power down the server.
2. Set the system configuration switch positions 1, 5, and 6 to the On position.
3. Power up the server. (The ROM will beep and halt when the ROM images have been swapped.)
4. Power down the server, and reset all switches to the default Off position.
5. Power up the server.

System ID switchbank
The system ID switchbank, located on the system board, is reserved for use by authorized service providers only. All switches default to the Off position.

SCSI ID
No two SCSI devices connected to the same SCSI controller can have the same SCSI ID. If another SCSI device is connected to the controller, check its SCSI ID before beginning the installation procedure for the additional device. The SCSI ID is set by jumpers located on each device.

ATA devices
When installing any ATA devices, make sure that the jumper on the device is set to Cable Select (CS). This setting allows the cable to automatically assign the device ID of an ATA drive attached to the cable.

ProLiant ML330, ML330e, ML330 G2, ML330 G3

The ProLiant ML330 replaced the ProLiant 400 and ProSignia 720 servers. ML330e is a lower cost version of the ML330 with support for ATA drives instead of SCSI. ML330 G2 is an entry-level two-processor server. ML330 G3 has Xeon processors, a 533MHz front side bus and an embedded gigabit NIC. The standard features include:
ML330
  Processor: 667MHz to 1.0GHz Pentium III
  Cache: 256K L2
  No. Processors: 1
  Memory (Std/Max): 64MB/2GB
  Hard Drive Bays: 2 non-hot-plug
  Removable Media: 4x1.5 (3 available) bays for NHP drives, AIT, DAT
  Network Controller: Integrated NC3163 Fast Ethernet
  Storage Controllers: Integrated, single-channel Ultra2 SCSI
  Expansion Slots: 2x64bit; 3x32bit (33MHz)
  Interfaces: Two serial, RJ-45, Parallel, Graphics, Keyboard, Mouse
  Video: Integrated ATI Rage XL PCI with 4MB RAM
  Warranty: 3-3-3

ML330e
  Processor: 800MHz, 933MHz or 1.0GHz Pentium III
  Cache: 256K L2
  No. Processors: 1
  Memory (Std/Max): 64MB/2GB
  Hard Drive Bays: 2 non-hot-plug
  Removable Media: 4x1.5 (3 available) for NHP drives, AIT, DAT
  Network Controller: Integrated NC3163 Fast Ethernet
  Storage Controllers: Integrated, dual-channel Ultra ATA 100
  Expansion Slots: 2x64bit; 3x32bit (33MHz)
  Interfaces: Two serial, RJ-45, Parallel, Graphics, Keyboard, Mouse
  Video: Integrated ATI Rage XL PCI with 4MB RAM
  Warranty: 3-1-1

ML330 G2
  Processor: 1.26GHz or 1.4GHz Pentium III
  Cache: 512K L2
  No. Processors: 1 or 2
  Memory (Std/Max): 128MB/4GB
  Hard Drive Bays: 2 non-hot-plug
  Removable Media: 4x1.5 (3 available) for NHP drives, 2-bay HP SCSI Cage, AIT, DAT
  Network Controller: Integrated NC3163 Fast Ethernet
  Storage Controllers: Integrated, dual-channel Ultra3 SCSI or integrated, dual-channel Ultra ATA 100 RAID
  Expansion Slots: 4x64bit; 1x32bit (33MHz)
  Interfaces: Two serial, RJ-45, Parallel, Graphics, Keyboard, Mouse, Two USB ports
  Video: Integrated ATI Rage XL PCI with 8MB RAM
  Warranty: 1-1-1

ML330 G3
  Processor: 2.4GHz or 2.8GHz Xeon
  Cache: 512K L2
  No. Processors: 1 or 2
  Memory (Std/Max): 256MB/4GB
  Hard Drive Bays: 2 non-hot-plug
  Removable Media: 5 SCSI or 4 ATA NHP drives, AIT, DAT; optional 2-bay HP SCSI Cage
  Network Controller: Integrated NC7760 10/100/1000
  Storage Controllers: Integrated Single Channel Ultra320 SCSI Adapter in a PCI slot
  Expansion Slots: 4x64bit (33MHz)
  Interfaces: Serial, RJ-45, Parallel, Graphics, Keyboard, Mouse, Two USB ports
  Video: Integrated ATI Rage XL PCI with 8MB RAM
  Warranty: 1-1-1


ProLiant ML330, ML330e Component Breakdown


ProLiant ML330 G2 Component Breakdown


ProLiant ML330 G3 Component Breakdown


ProLiant ML330 Service Considerations


The following service considerations apply to the ProLiant ML330:
Front panel System Health Indicator
- Amber indicates pre-failure of processor or DIMM.
- Red indicates failure of processor, PPM or fan.

ROM
- Language choice is selected after the F10 setup is invoked. This eliminates the need for separate images of the ROM.
- When flashing the ROM, the ROMPaq flashes both the System ROM and the integrated SCSI controller's ROM.

Memory
- Only PC133MHz ECC registered DIMMs can be used for the server to boot successfully.

Server Feature Board
- The server feature board must be installed in slot 3 for the system to boot successfully. Failure to do this will generate an 800 POST error.
- The server management information cable must be installed. Failure to do this will generate an 801 POST error.

Mass Storage
- If Wide Ultra2 or Wide Ultra3 drives are mixed with Wide Ultra devices on the embedded controller, all drives will run at Wide Ultra speeds.
- If Wide Ultra2 and Wide Ultra3 drives are mixed on the embedded controller, the devices will run at the maximum speed of the controller and drive.

PCI slots
- All 64-bit PCI slots support only 3.3V PCI cards.

Misc
- POST error messages are non-standard for ML330 and ML330e only. Always refer to the Maintenance and Service Guide (MSG) for POST errors.
- Before loading the operating system, it must be selected through the System menu of the BIOS Setup Utility.
- There is no system utility partition, therefore Diagnostics must be run from BIOS for the ML330.
- The ROM-Based Setup Utility (RBSU) is resident in ROM in the ML330e and ML330 G2.
- There is a battery for CMOS on the system board and a battery for NVRAM on the server feature board. Both are removable.
- All Remote Insight Boards must be installed in slot 4, a 32-bit PCI slot (slot 5 for ML330 G2).
- No option kits for updating to latest processor technology are currently being offered on any of the previous ProLiant ML330 models.
- The 1GHz processor is spared with heatsink and 110 CFM fan.
- ML330 and ML330e do not support the same set of operating systems; to determine the difference see the OS support matrix at ftp://ftp.compaq.com/pub/products/servers/os-support-matrix-310.pdf
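The two mass storage rules for mixed SCSI drives can be expressed as a small model. This is a simplified sketch, not HP firmware behavior: the function name is hypothetical, and the MB/s figures are the nominal peak rates of each wide SCSI generation, used here only to order the modes.

```python
# Nominal peak transfer rates in MB/s, used only to rank the modes.
SPEED_MB_S = {"Wide Ultra": 40, "Wide Ultra2": 80, "Wide Ultra3": 160}

def negotiated_modes(controller, drives):
    """Return the mode each drive runs at on one embedded SCSI channel.

    Rule 1: any Wide Ultra device forces every drive down to Wide Ultra.
    Rule 2: otherwise each drive runs at the slower of controller and drive.
    """
    if "Wide Ultra" in drives:
        return ["Wide Ultra"] * len(drives)
    limit = SPEED_MB_S[controller]
    return [d if SPEED_MB_S[d] <= limit else controller for d in drives]

# An Ultra2 controller caps an Ultra3 drive at Ultra2.
print(negotiated_modes("Wide Ultra2", ["Wide Ultra2", "Wide Ultra3"]))
# One Wide Ultra device slows the whole channel.
print(negotiated_modes("Wide Ultra3", ["Wide Ultra", "Wide Ultra3"]))
```

The practical consequence is that older Wide Ultra devices are best kept off a channel that carries faster drives.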


ProLiant ML350, ML350 1GHz, ML350 G2, ML350 G3

The ProLiant ML350 replaced the ProLiant 800 servers. The standard features of the ProLiant ML350, ML350 1GHz, ML350 G2 and ML350 G3 include:
Generation 1
  Processors: Intel Pentium III 933 MHz, 866 MHz, 800 MHz, 733 MHz, 667 MHz, 600EB MHz (extended bus)
  Memory: 128 MB PC133MHz ECC Registered SDRAM DIMM memory; maximum 2GB
  Cache memory: Integrated 256KB Level 2 ECC cache
  Upgradeability: Upgradeable to dual processing
  Expansion slots: Two 64bit/33MHz PCI; four 32bit/33MHz PCI; one dedicated ISA slot
  Network controller: HP NC3163 Fast Ethernet NIC (embedded) PCI 10/100 WOL (Wake On LAN)
  Storage controller: Integrated Dual Channel Wide Ultra2 SCSI Adapter
  Storage: Support for up to four 1-inch hot-plug or non-hot-plug drives
  Removable drives: Two 5.25-inch bays available; 1.44MB diskette drive; one 32X Max or faster IDE CD-ROM drive
  Interfaces: Two serial ports; RJ-45 port; parallel port; graphics port; keyboard port; mouse port
  Video: Integrated ATI RAGE IIC Video Controller with 4MB Video Memory

1GHz
  Processors: Intel Pentium III 1GHz FCPGA (Flip Chip)
  Memory: 128 MB PC133MHz ECC Registered SDRAM DIMM memory; maximum 4GB
  Cache memory: Integrated 256KB Level 2 ECC cache
  Upgradeability: Upgradeable to dual processing
  Expansion slots: Four 64bit/33MHz PCI; two 32bit/33MHz PCI; no ISA slot
  Network controller: HP NC3163 Fast Ethernet NIC (embedded) PCI 10/100 WOL (Wake On LAN)
  Storage controller: Integrated Dual Channel Wide Ultra3 SCSI Adapter
  Storage: Support for up to four 1-inch hot-plug or non-hot-plug drives
  Removable drives: Two 5.25-inch bays available; 1.44MB diskette drive; one 32X Max or faster IDE CD-ROM drive
  Interfaces: Two serial ports; RJ-45 port; parallel port; graphics port; keyboard port; mouse port; two USB ports
  Video: Integrated ATI RAGE XL Video Controller with 4MB SDRAM Video Memory

Generation 2 and G2 Array
  Processors: Intel Pentium III 1.4GHz, 1.26GHz or 1.13GHz (Array model available with 1.4 and 1.26GHz only)
  Memory: 128 MB PC133MHz ECC Registered SDRAM (256 MB for array models); maximum 4GB
  Cache memory: Integrated 512KB Level 2 ECC cache
  Upgradeability: Upgradeable to dual processing
  Expansion slots: Five 64bit/33MHz PCI (one used by SmartArray 532 in array model); one 32bit/33MHz PCI; no ISA slot
  Network controller: HP NC3163 Fast Ethernet NIC (embedded) PCI 10/100 WOL (Wake On LAN)
  Storage controller: Integrated Dual Channel Wide Ultra3 SCSI (Smart Array 532 RAID controller in array model)
  Storage: Support for up to six 1-inch hot plug drives
  Removable drives: Two 5.25-inch bays available; 1.44MB diskette drive; one 40X Max or faster IDE CD-ROM drive
  Interfaces: Two serial ports; RJ-45 port; parallel port; graphics port; keyboard port; mouse port; two USB ports
  Video: Integrated ATI RAGE XL Video Controller with 8MB SDRAM Video Memory

Generation 3 and G3 Array
  Processors: Intel Xeon Processor 3.06, 2.8, 2.4 GHz; 533MHz FSB; Hyperthreading and NetBurst
  Memory: 256 MB PC2100 ECC DDR SDRAM (512 MB for array models); maximum 8GB
  Cache memory: Integrated 512KB Level 2 cache (full speed)
  Upgradeability: Upgradeable to dual processing
  Expansion slots: Four 64bit/33MHz PCI-X (one used by SmartArray 532 in array model); one 32bit/33MHz PCI; no ISA slot
  Network controller: Broadcom NC7760 (embedded) PCI 10/100/1000 WOL (Wake On LAN)
  Storage controller: Integrated Dual Channel Wide Ultra3 SCSI (Smart Array 641 RAID controller in array model)
  Storage: Support for up to six 1-inch hot plug drives
  Removable drives: Two 5.25-inch bays available; 1.44MB diskette drive; one 48X Max or faster IDE CD-ROM drive
  Interfaces: One serial port; RJ-45 port; parallel port; graphics port; keyboard port; mouse port; two USB ports
  Video: Integrated ATI RAGE XL Video Controller with 8MB SDRAM Video Memory

Warranty (all models)
  Next-business-day, three-year on-site limited warranty; coverage is for parts, labor, and onsite repair
  Pre-Failure Warranty on hard drives, memory and processor


ProLiant ML350 Component Breakdown


ProLiant ML350 G2 Component Breakdown


ProLiant ML350 G3 Component Breakdown


ProLiant ML350 Service Considerations


The following service considerations apply to the ProLiant ML350.
Processors
- Both processors must be the same speed. Pentium III processors can no longer be down-clocked or up-clocked. Intel now locks in the speed.
- All processor sockets must be populated with a processor or terminator board in order for the server to boot successfully. Failure to do this will generate an 802 POST error.
- If 2 processors are installed, the processor in slot 2 must have the same or lower stepping as the processor in slot 1 in order for the server to boot successfully. Failure to do this will generate an 805 POST message. The processors can be exchanged between processor slots to remedy this.

Memory
- Only PC133MHz ECC registered DIMMs can be used for the server to boot successfully. The system will generate an 804 POST error with incorrect memory installed.
- PC133MHz ECC registered SDRAM DIMMs are downward compatible in systems using 100 MHz SDRAM.

Server Feature Board
- The server feature board must be installed in slot 1 for the system to boot successfully. Failure to do this will generate an 800 POST error.
- The server feature board on the ProLiant 400 and ProSignia 720 is not interchangeable with the board for the ML350.

Mass Storage
- When mixing drives, it is recommended to connect Wide Ultra2 drives to channel 1/A of the integrated SCSI controller, and other drives to channel 2/B.
- The connectors on the integrated SCSI controller support 2 internal cables, or one internal and one external cable, or 2 external cables.
- The cable for SCSI channel 2/B is optional (not standard with the machine).

PCI slots
- PCI slots 2 and 3 do not support 5V PCI cards.

Misc
- POST error messages are non-standard. Always refer to the Maintenance and Service Guide (MSG) for POST errors.
- Before loading the operating system, it must be selected through the System menu of the BIOS Setup Utility.
- Remove the system fan before removing the system board.
- There is no system utility partition, therefore Diagnostics must be run from diskette.
- There is a battery for CMOS on the system board and a battery for NVRAM on the server feature board. Both are removable.
- There is an option kit available to upgrade an existing ProLiant 800 6/350/400/450 to a ProLiant ML350.
- To upgrade an existing ProLiant 800 Model 6/350/400/450 with the 6/500 Processor Upgrade Option Kit number 401268B21, the server must have:
  - A minimum System ROM revision of 2/18/1999.
  - The Processor Core Frequency Switch reset to 1=ON, 2=OFF, 3=OFF, 4=ON.
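The POST codes mentioned in the ML350 notes above can be collected into a quick lookup. This is an illustrative sketch only: it covers just the four codes named in this section, the dictionary and function names are hypothetical, and because ML350 POST messages are non-standard the Maintenance and Service Guide remains the authority.

```python
# POST codes and causes as stated in the ML350 service considerations.
ML350_POST_HINTS = {
    800: "Server feature board is not installed in slot 1.",
    802: "A processor socket has neither a processor nor a terminator board.",
    804: "Incorrect memory installed; use PC133MHz ECC registered DIMMs.",
    805: ("Processor in slot 2 has a higher stepping than the processor in "
          "slot 1; exchange the processors between slots."),
}

def post_hint(code):
    """Return the likely cause for a POST code, or a pointer to the MSG."""
    return ML350_POST_HINTS.get(
        code, "Not covered here; refer to the Maintenance and Service Guide."
    )

print(post_hint(805))
```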


ProLiant ML370, ML370 G2 and ML370 G3

The standard features of the ProLiant ML370, ML370 G2 and ML370 G3 include:
ML370
  Processors: Intel Pentium III 800MHz, 866MHz, 933MHz, 1GHz
  Memory: 128MB (expandable to 4GB) of 133MHz ECC Registered SDRAM
  Cache memory: 256KB L2 ECC cache
  Expansion slots: Six PCI slots (four 32-bit/33MHz, two 64-bit/33MHz)
  Network controller: Embedded NC3163 Fast Ethernet NIC 10/100 supporting Wake On LAN
  Storage controllers: Integrated dual channel Wide Ultra2, embedded RAID option; optional Integrated Array Controller RAID 0, 1, 1+0, 5 (RAID On Chip, ROC); optional Integrated Smart Array 5i Controller 0, 1, 0+1, 5
  Internal storage: 436.8GB (six 72.8GB drives) or 509.6GB with 2 (36.4GB) NHP drives in removable media; six 1-inch hard drive bays, four removable media bays
  Form factor: Rack or tower 5U chassis
  Video: 4MB
  Upgrades: Upgradeable to dual processing. The Pentium III 1GHz processor option kit is supported on all previously shipped and currently shipping ProLiant ML370 servers.

ML370 G2
  Processors: Intel Pentium III 1.13GHz, 1.26GHz, 1.4GHz
  Memory: 256MB (expandable to 6GB) of 133MHz ECC Registered SDRAM; dual interleaved memory; online spare memory capable; RBSU configurable
  Cache memory: 512KB L2 ECC cache
  Expansion slots: Six PCI slots (four 64-bit/33MHz, two HP 64-bit/66MHz)
  Network controller: Embedded NC3163 Fast Ethernet NIC 10/100 supporting Wake On LAN
  Storage controllers: Integrated dual channel Wide Ultra3, embedded RAID option; optional Integrated Array Controller RAID 0, 1, 1+0, 5 (RAID On Chip, ROC); optional Integrated Smart Array 5i Controller 0, 1, 0+1, 5
  Internal storage: 436.8GB (six 72.8GB drives) or 582.4GB with 2 HP (72.8GB) hard drives in optional drive cage for removable media area; six 1-inch hard drive bays, four removable media bays
  Form factor: Rack or tower 5U chassis
  Video: 8MB
  Upgrades: Upgradeable to dual processing

ML370 G3
  Processors: Intel Xeon 3.06GHz with 533MHz FSB; Intel Xeon 2.4, 2.8GHz with 400MHz FSB
  Memory: 1GB of 2-way interleaved capable PC2100 DDR SDRAM running at 266MHz on 3.06GHz models, 12GB max; 512MB of 2-way interleaved capable PC2100 DDR SDRAM running at 200MHz on 2.8GHz models and lower, 12GB max; online spare memory capable; RBSU configurable
  Cache memory: 512KB L2 ECC cache on all models; 1MB L3 cache available on 3.06GHz models
  Expansion slots: Six 100MHz PCI-X slots (non hot plug)
  Network controller: Embedded NC7781 NIC 10/100/1000 supporting Wake On LAN
  Storage controllers: Integrated dual channel Wide Ultra3; PCI-based RAID
  Internal storage: 582.4GB: (6 x 72.8GB 1-inch) with standard internal hot plug drive cage + (2 x 72.8GB 1-inch) with optional ML3xx Internal Two Bay Hot Plug Wide Ultra2/Ultra3 SCSI Drive Cage
  Form factor: Rack or tower 5U chassis
  Video: 8MB
  Upgrades: Upgradeable to dual processing

Warranty (all models)
  Next-business-day, three-year on-site limited warranty. Coverage is for parts, labor, and onsite repair.
  Pre-Failure Warranty (on hard drives, memory, and processor)
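The ML370 internal storage figures follow directly from the drive counts and sizes quoted in the feature list. The short sketch below reproduces that arithmetic; the helper name is hypothetical, and decimal arithmetic is used so the totals match the printed values exactly.

```python
from decimal import Decimal

def capacity_gb(*groups):
    """Sum (drive_count, drive_size_gb) pairs; sizes are decimal strings."""
    return sum((count * Decimal(size) for count, size in groups), Decimal("0"))

standard_cage = capacity_gb((6, "72.8"))                   # six hot plug drives
with_nhp_pair = capacity_gb((6, "72.8"), (2, "36.4"))      # plus two 36.4GB NHP drives
with_two_bay_cage = capacity_gb((6, "72.8"), (2, "72.8"))  # plus optional two-bay cage

print(standard_cage, with_nhp_pair, with_two_bay_cage)
```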


ProLiant ML370 Component Breakdown


ProLiant ML370 G2 Component Breakdown


ProLiant ML370 G3 Component Breakdown


ProLiant ML370 Service Considerations


The following service considerations apply to the ProLiant ML370.
Processors
- There must be a processor in slot 1 at all times.
- Both processors must be the same speed. Pentium III processors can no longer be down-clocked or up-clocked. Intel now locks in the speed.
- Unlike the ProLiant DL380, the ProLiant ML370 1GHz processor upgrade option will not require the thermal upgrade kit.

ROM
- The system ROM maintains a primary and redundant image of the BIOS. If one image is corrupt, POST error 105 "Current System ROM is corrupt - now booting redundant system ROM" will appear.
- If both ROM images are corrupt, enable disaster recovery mode by setting SW2 positions 1, 4, 5, and 6 to the ON position and rebooting (G2 only, redundant ROM feature).

Memory
- Only PC133MHz ECC registered DIMMs can be used. POST will warn of unsupported DIMMs.

Mass storage
- If only one SCSI drive is used, it should be installed in Bay 0.
- Wide Ultra2 and Wide Ultra3 drives can be mixed. If they are mixed on the embedded controller, all drives will operate in Wide Ultra2 mode.
- External SCSI port 1 and internal SCSI port 1 are the same port. This port cannot be used for both internal and external devices at the same time.

Battery
- The CMOS/NVRAM battery is not soldered down and is replaceable.

Air Flow
- It is important to observe air flow requirements to avoid overheating and damaging server components. Air flow is affected by a number of factors including blanks in unused slots. Air flow is also affected by removal of covers and baffles during service.

Hot Plug Requirements
- The server must meet the following criteria for hot plug capability:
  - Hot-plug aware components
  - PCI Hot Plug device drivers
  - Operating system with hot plug support
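The redundant ROM behavior described in the ROM notes above amounts to a three-way decision. The sketch below is one way to write it down; the function name is hypothetical, and the case where only the redundant image is corrupt is not described in the text, so the sketch assumes a normal boot there.

```python
def rom_recovery_action(primary_ok, redundant_ok):
    """Map the state of the two ROM images to the action from the notes."""
    if primary_ok:
        # Assumption: a healthy current image boots normally.
        return "Boot normally from the current system ROM."
    if redundant_ok:
        return "Expect POST error 105; the server boots from the redundant ROM."
    return ("Enable disaster recovery mode: set SW2 positions 1, 4, 5 and 6 "
            "to ON and reboot (G2 only).")

print(rom_recovery_action(primary_ok=False, redundant_ok=True))
print(rom_recovery_action(primary_ok=False, redundant_ok=False))
```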

The following service considerations apply to the ProLiant ML370 G2:


I/O Board Repair

Potential damage to I/O board components during repair occurs when the bottom side of the board scrapes the alignment posts during replacement, or when cable connectors left under the board are pulled free, resulting in components being knocked off the board. Service personnel should use caution when replacing the board by moving cables out of the way and by exercising caution while aligning the board to the mounting posts.


ProLiant ML530 and ML530 G2

The ProLiant ML530 is the next generation of the ProLiant 3000 servers. ML530 G2 is a 2P enterprise server with mirrored memory.
ML530
  Processors: 1GHz, 933MHz, 866MHz, 800MHz Pentium III Xeon; supports dual processing; 133MHz frontside bus; Highly Parallel System Architecture
  Processor upgrade: Pentium III Xeon 1GHz processor option kit is supported on the ProLiant ML530 800MHz, 866MHz and 933MHz models
  Memory: 128MB or 256MB (depending on model) 133MHz ECC registered SDRAM DIMMs, upgradeable to 4GB maximum
  Cache memory: 256KB L2
  Expansion slots: Eight slots: five 64-bit PCI (33MHz); two 64-bit PCI (66MHz); one 32-bit PCI (33MHz)
  Network controller: Integrated 10/100 NC3163 Fast Ethernet; Wake On LAN support
  Storage controller: Integrated dual channel Wide Ultra2 SCSI controller; Smart Array 5302/32 Array Controller (Array Models only)
  Storage and expansion: Four removable media bays: two 5.25-inch bays available, 1.44MB diskette drive, one 32X Max or faster IDE CD-ROM drive; support for up to six 1.0-inch drives; optional drive cage adds support for 6 additional 1.0-inch drives
  Interfaces: Two serial ports; RJ-45 port; parallel port; graphics port; keyboard port; mouse port
  Video: ATI Rage IIC, 4MB video RAM

ML530 G2
  Processors: 2.4GHz, 3.0GHz Xeon; supports dual processing; 400MHz system bus; Highly Parallel System Architecture
  Processor upgrade: Upgradeable to dual processing
  Memory: 1GB (2 x 512MB) 200MHz DDR SDRAM DIMMs, upgradeable to 16GB maximum; Advanced Memory Protection including Mirrored Memory and Online Spare Memory; 2:1 interleaved memory
  Cache memory: 512KB L2
  Expansion slots: Seven PCI-X slots: four 64-bit/100MHz Hot Plug; three 64-bit/100MHz Non-Hot Plug
  Network controller: Integrated 10/100 NC3163 Fast Ethernet; Wake On LAN and PXE support
  Storage controller: Integrated Dual Channel Wide Ultra3 SCSI controller
  Storage and expansion: Two Ultra3/Ultra4-ready SCSI Drive Cages standard, supporting up to 12 1-inch hot plug hard drives; optional ML5xx Internal Two Bay Hot Plug Wide Ultra2/Ultra3 SCSI Drive Cage (with fan); 1.44MB diskette drive; one 40X IDE CD-ROM drive; support for up to twelve 1.0-inch drives; optional drive cage adds support for 2 additional 1.0-inch drives; two bays for optional tape backup, DVD, or SCSI devices
  Interfaces: Two serial ports; RJ-45 port; parallel port; graphics port; keyboard port; mouse port; two USB ports
  Video: Integrated Rage XL, 8MB SDRAM video memory

Warranty (both models)
  Next-business-day, three-year onsite limited warranty. Coverage is for parts, labor, and onsite repair.
  Pre-Failure Warranty (on hard drives, memory, and processor)


ProLiant ML530 Component Breakdown


ProLiant ML530 G2 Component Breakdown


ProLiant ML530 Service Considerations


The following service considerations apply to the ProLiant ML530.
Processors
- Both processors must be the same speed. Pentium III Xeon processors can no longer be down-clocked or up-clocked. Intel now locks in the speed.
- In a single processor system, always install the processor in slot 1. Processor slot 2 is terminated on the system board, slot 1 is not.
- Pentium III Xeon processors with the gold colored heat sinks must be installed. Gold colored heat sinks indicate a 133MHz bus. Older processors (black heat sinks, >100MHz bus, or green heat sinks, 100MHz bus) will not boot.

Memory
- Only PC133MHz ECC registered DIMMs can be used for the server to boot successfully. The system will generate an 804 POST error with incorrect memory installed.
- PC133MHz ECC registered SDRAM DIMMs are downward compatible in systems using 100 MHz SDRAM.

Cables
- Cables are color-coded to reduce service time and support.
- The routing of cables is very important. When removing or replacing cables, make sure that you route them in the same manner as the original, including the use of any cable clips. This will ensure no cables are pinched when the system board tray is moved.
- When replacing the system board tray or cables, the PCI retainer and PCI bracket can be removed to provide room to maneuver.

Figure callouts: PCI Retainer, PCI Bracket


ProLiant ML570 and ML570 G2

The standard features of the ProLiant ML570 and ML570 G2 include the following:
ML570
  Processors: Intel Pentium III Xeon processor 900MHz, 700MHz; upgradeable to quad processing
  Cache memory: 2MB L2 per processor (900MHz, 700MHz); 1MB L2 per processor (700MHz only)
  Memory: 1024MB PC100MHz Advanced ECC SDRAM (900MHz); 512MB PC100MHz Advanced ECC SDRAM (700MHz); support for a maximum of 16GB
  Expansion slots: Six total, five available: two 64-bit 66MHz PCI Hot Plug; two 64-bit 33MHz PCI Hot Plug; one 64-bit 33MHz PCI Non-Hot Plug (not available); one 32-bit 33MHz PCI Non-Hot Plug
  Network controller: Integrated 10/100 NC3163 Fast Ethernet; Wake On LAN support
  Storage controller: Integrated dual channel Wide Ultra2 SCSI controller; optional integrated Smart Array controller
  Storage: One 1.44MB diskette drive; one 32X Max or faster IDE CD-ROM drive; twelve 1-inch hard drive bays

ML570 G2
  Processors: Intel Xeon processor 1.4GHz, 1.5GHz, 1.9GHz, 2.0GHz, 2.5GHz, 2.8GHz; Hyper-Threading technology; 400MHz frontside bus; upgradeable to quad processing
  Cache memory: 2MB L3 (2.0GHz, 2.8GHz only); 1MB L3 (1.5GHz, 1.9GHz, 2.0GHz, 2.5GHz); 512KB (1.4GHz)
  Memory: 1024MB PC1600-MHz Registered ECC SDRAM DIMM memory (standard on 2P Rack Models only); 512MB PC1600-MHz Registered ECC SDRAM DIMM memory (standard on 1P Rack Models only); support for a maximum of 32GB
  Expansion slots: Seven 64-bit/100MHz PCI-X slots (four hot-pluggable)
  Network controller: Integrated 10/100 NC3163 Fast Ethernet; Wake On LAN support
  Storage controller: Integrated dual channel Wide Ultra3 SCSI controller; optional integrated Smart Array controller
  Storage: One 1.44MB diskette drive; one 32X Max or faster IDE CD-ROM drive; twelve 1-inch hard drive bays; optional two additional 1-inch hot-pluggable drives

Both models
  Interfaces: Two serial ports; RJ-45 port; parallel port; graphics port; keyboard port; mouse port
  Video: Integrated ATI Rage IIC Video Controller with 4MB Video Memory
  Warranty: Next-business-day, three-year on-site limited warranty. Coverage is for parts, labor, and onsite repair. Pre-Failure Warranty (on hard drives, memory, and processor)


ProLiant ML570 Component Breakdown

Tower Models
1. 1.44-MB Floppy Drive
2. 32x or 40x IDE CD-ROM
3. Front Bezel
4. Wide Ultra2/Ultra3 Hot Plug Drive Cage (12 x 1-inch) (two 6 x 1-inch drive cages ship standard)
5. Diagnostic Lighting
6. 5.25-inch Removable Media Bays
7. Hot Plug Fans
8. Peripheral Board
9. Processors
10. Memory Board

Rack Models
1. Rack handles
2. Sliding Rails
3. 1.44MB Floppy Drive
4. 32x or 40x IDE CD-ROM Drive
5. Diagnostic Lighting
6. Wide Ultra2/Ultra3 Hot Plug Drive Cage (12 x 1-inch) (two 6 x 1-inch drive cages ship standard)
7. 5.25-inch Removable Media Bays


ProLiant ML570 G2 Component Breakdown


ProLiant ML570 Service Considerations


The four front panel LEDs in the rack version of the ProLiant ML570 are located above the drive cage and to the left of the diskette and CD-ROM drives. In the tower version, the location is the same but rotated 90 degrees to the vertical position. They can be used to diagnose a number of problems. In general, green indicates normal operation. An LED that is flashing amber indicates a problem with its associated function:

- Power: The Power LED will be flashing amber if there is a temporary shutdown due to a thermal event. If it is steady amber, the system is in standby and no +5V, +12V or +3.3V power is available. Auxiliary power is supplied to the system and a portion of the system logic may still be active. LEDs will have power and may be used for diagnosis. If the LED is off, no AC power is provided to the system.
- Memory: Flashing amber memory status indicates a processor or memory failure which can be pinpointed by checking the Internal Diagnostic Display (IDD) on the peripheral board (discussed later).
- Fan: Flashing amber fan status indicates a fan failure. LED indicators on the individual fans will enable you to identify the one that is failing.
- Power supply: Flashing amber power supply status indicates a failure. LEDs on the individual supplies will identify which one has failed.
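The front panel LED states described above can be summarized as a lookup, which may help when walking through a diagnosis. This is a study aid sketched under the assumption that the four LEDs behave exactly as listed; the table and function names are hypothetical.

```python
# Meanings taken from the ML570 front panel LED descriptions.
POWER_LED = {
    "green": "Normal operation.",
    "flashing amber": "Temporary shutdown due to a thermal event.",
    "steady amber": "Standby; no +5V, +12V or +3.3V power. Auxiliary power is still present.",
    "off": "No AC power is provided to the system.",
}

FLASHING_AMBER = {
    "memory": "Processor or memory failure; check the Internal Diagnostic Display (IDD).",
    "fan": "Fan failure; check the LEDs on the individual fans.",
    "power supply": "Power supply failure; check the LEDs on the individual supplies.",
}

def led_meaning(led, state):
    """Return the documented meaning of a front panel LED state."""
    led, state = led.lower(), state.lower()
    if led == "power":
        return POWER_LED.get(state, "State not described here.")
    if state == "green":
        return "Normal operation."
    if state == "flashing amber" and led in FLASHING_AMBER:
        return FLASHING_AMBER[led]
    return "State not described here."

print(led_meaning("Fan", "flashing amber"))
```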

An improperly seated component in the interlock chain causes the associated LEDs on the system board to light. There are seven LEDs to monitor seven components: four processor boards, the memory board, the peripheral board and the power supply backplane board. All of the LEDs are extinguished if there are no interlock errors. One or more LEDs are lit when a board is not properly seated.


There is an Internal Diagnostic Display (IDD) on the peripheral board which indicates the failure of a memory module or processor. It displays a two-digit alphanumeric code that corresponds to a specific memory module or processor. The Diagnostic jumper must be removed before the IDD will display a code.

Internal Diagnostic Display (IDD) Indicator Codes

Location of Internal Diagnostic Display (IDD)

- Serial Port B is not installed in the factory. A cable is included in the country kit that can be connected from the peripheral board to the back of the chassis. There is a blank plate installed which can be removed when the connector is installed.
- The WOL feature is only supported by operating systems that support ACPI. At this time, that is only Windows 2000. The WOL feature is enabled through the System Configuration Utility. Use the following steps to enable WOL:
  1. Press the Ctrl and A keys before the Continue message displays. This will take you to the Advanced Mode.
  2. Scroll down to find the Enable WOL selection.
- All Remote Insight Boards, including the new Remote Insight Lights-Out Edition, must be installed in PCI slot 6. Cabling the board to J8 of the system board provides the Lights-Out Edition with full control over the server power state.
- There are no VRM or PPM slots. This machine has On-Chip Voltage Regulation (OCVR). The VRM or PPM is part of the processor cartridge.
- The entire system board tray is a field replaceable unit, including the system board itself.
- When lifting the ProLiant ML570 server, do not handle the server by the bezel because damage to the bezel may result. (Ribs in the plastic may get broken and cause the bezel to vibrate.)
- To use the IRC capability, an external modem must be connected to one of the serial ports.

ProLiant ML750

Standard features of the ProLiant ML750 include:


Processor: 900MHz, 700MHz, 550MHz Pentium III Xeon processors; supports up to eight processors for 8-way symmetric multiprocessing (SMP); five 100MHz front-side buses
Memory: 2048 MB of 100MHz ECC SDRAM DIMM memory (4P Models), max 16GB; 1024 MB of 100MHz ECC SDRAM DIMM memory (2P Models), max 16GB

Capacities

Model                             Processors   Cache   SDRAM
ML750T01 X900 2MB 2048MB (4P)     4            2MB     2GB
ML750T01 X700 2MB 2048MB (4P)     4            2MB     2GB
ML750T01 X700 1MB 1024MB (2P)     2            1MB     1GB

Expansion slots: Eleven hot-pluggable 64-bit PCI slots (9 x 33MHz, 2 x 66MHz); ten 64-bit, one 32-bit
Network controller: Integrated HP NC3134 Fast Ethernet NIC 64 PCI Dual Port 10/100, upgradeable to Gigabit
Storage controllers: Cableless Smart Array 4250ES Controller with optional redundancy
Storage and expansion: Two half-height removable media bays; support for up to 21 one-inch hot-plug Wide Ultra3 SCSI hard drives in three combinable drive cages
Video: Integrated 1280 x 1024 x 256 color on PCI local bus, 2-MB video memory
Form factor: Rack-optimized 14U chassis
Warranty: Global three-year on-site limited warranty for parts and labor with next-business-day response; Pre-Failure Warranty coverage of hard drives, memory and processors


ProLiant ML750 Component Breakdown

Reference  Description
1   2 x 66MHz SCSI/PCI hot-pluggable expansion slots
2   9 x 33MHz PCI hot-pluggable expansion slots
3   Rack rail slots
4   Support for up to eight 500MHz Pentium III processors with redundant Processor Power Modules (one per processor)
5   Rear hot-pluggable redundant fan
6   Rear redundant processor fans
7   Memory expansion board
8   Redundant internal processor fans
9   Two 5-inch removable media bays
10  High speed IDE CD-ROM drive (low-profile)
11  1.44MB diskette drive (low-profile)
12  Front hot-pluggable processor fan
13  Hot-plug drive bay; three internal drive cages standard for 21 x 1-inch Wide-Ultra2 SCSI hot-pluggable hard drives
14  On/Standby power switch
15  Integrated Management Display (IMD)
16  Hot-pluggable redundant I/O fans
17  Smart Array 4250ES Controller (optional redundant array controller shown)


ProLiant ML750 Service Considerations


The following service considerations apply to the ProLiant ML750.
LEDs
The ProLiant ML750 has several sets of LEDs that assist in troubleshooting:
- Front panel (power status, Integrated Management Display, hot-pluggable front fan status)
- I/O board interconnect
- Processor board (eleven LEDs, number lit indicates failure point)
- Fan (top, front and rear)
- PCI Hot Plug
- Power supply
Refer to the Maintenance and Service Guide for details.

Removing and replacing non-hot-pluggable devices
Before removing or replacing a non-hot-pluggable device:
1. Press the power ON/STANDBY switch to Standby. This disables the main power supply output and provides auxiliary power (+5V) to the server. Standby does not remove all power from the server.
2. Verify that the system LED on the front panel (near the power ON/STANDBY switch) is off and the fan noise stops.
3. Disconnect all power cords from the server to disable all power to the server.
For some removal and replacement procedures, you must remove the server from the rack and place it on a sturdy surface.

Installing DIMM memory
- The memory expansion board has 16 DIMM slots divided into eight banks. One bank, 2 DIMMs, must be installed at a time for the server to recognize the added memory. The two DIMMs in each bank must be the same size.
- HP recommends installing DIMMs in order, starting from the lowest number socket. The DIMM socket IDs are labeled on the memory expansion board.

Processors
- Pentium III Xeon 550 MHz, 700 MHz, and 900 MHz processors cannot be mixed in the same server. They all have to be of the same speed.

Removing a processor
- If you remove a processor, you must install a processor terminator board before powering up the server.
- The Processor Power Module (PPM) must be installed before you install the accompanying processor. Attempting to install the PPM afterward could damage the electronic components on the PPM.
- When processors are installed on the second of the two system (processor) buses, a pair of Cache Coherency Accelerators must be installed. If the coherency accelerator memory fails or the modules are not installed properly, the system will initialize only processor bus 1.

Miscellaneous
- There is up to a 40 second delay between power-on and video.
- This machine has a remote-flash redundant ROM which allows recovery in the event of ROM failure.
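The DIMM installation rules for the ML750 memory expansion board (16 sockets, eight banks of two, matching sizes within a bank) lend themselves to a simple pre-installation check. The sketch below is illustrative only: the function name is hypothetical, and it assumes that bank k consists of sockets 2k-1 and 2k, which may not match the socket labels printed on the board.

```python
def check_dimm_banks(sizes_mb):
    """Check a planned DIMM layout against the bank rules.

    sizes_mb lists 16 entries in socket order; 0 means an empty socket.
    Assumption: consecutive socket pairs form a bank.
    """
    if len(sizes_mb) != 16:
        raise ValueError("expected 16 DIMM sockets")
    problems = []
    for bank in range(8):
        a, b = sizes_mb[2 * bank], sizes_mb[2 * bank + 1]
        if (a == 0) != (b == 0):
            problems.append(f"bank {bank + 1}: only one DIMM installed")
        elif a != b:
            problems.append(f"bank {bank + 1}: DIMM sizes differ ({a}MB vs {b}MB)")
    return problems

# One fully populated bank of matching 512MB DIMMs passes the check.
print(check_dimm_banks([512, 512] + [0] * 14))
# A mismatched pair in bank 1 is reported.
print(check_dimm_banks([512, 256] + [0] * 14))
```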


Density-Optimized Servers
HP's DL line is a line of ProLiant servers that are density-optimized for space-constrained, rack-mounted environments. DL server products include:

- ProLiant DL320
- ProLiant DL360
- ProLiant DL380
- ProLiant DL560
- ProLiant DL580
- ProLiant DL590
- ProLiant DL760

Objectives
To demonstrate an awareness of HP density-optimized server products, service personnel should be able to:

Describe the features and characteristics of HP DL servers. Locate configuration and service information relative to each product.


ProLiant DL320 and DL320 G2

The standard features of the ProLiant DL320 include:


Processors
DL320: 1.13GHz, 1GHz, or 800MHz Pentium III FC-PGA, 133MHz Front Side Bus
DL320 G2: 2.26GHz, 2.66GHz, or 3.06GHz Pentium 4 FC-PGA, 533MHz Front Side Bus

Memory
DL320: 128MB PC133 registered ECC SDRAM standard, 2GB maximum
DL320 G2: 128MB PC2100 registered ECC DDR SDRAM, 4GB maximum

Cache memory
DL320: 256KB level 2 ECC cache with 1.0GHz processor; 512KB level 2 ECC cache with 1.13GHz processor
DL320 G2: 512KB level 2 ECC cache

Expansion slot
One full-length 64-bit/33MHz PCI

Network controllers
DL320: Dual integrated full-duplex NC3163 Fast Ethernet 10/100 NICs
DL320 G2: Two embedded NC7760 PCI Gigabit Server Adapters

Storage controller
DL320: Integrated single channel dual port ATA or integrated single channel Wide Ultra2 SCSI
DL320 G2: Integrated Dual Channel Ultra ATA/100 Adapter with integrated ATA RAID 0, 1; optional slotless single channel Wide Ultra3 SCSI controller module

Storage and expansion
DL320: Optional removable 24x, low-profile CD-ROM and 1.44MB disk drive assembly for controlled software updates and maximized in-rack security. Maximum internal storage capacity is up to 80 GB in ATA models (2 x 40 GB 1" ATA/100 non-hot-plug drives) or up to 72.8 GB in SCSI models (2 x 36.4 GB 1" SCSI non-hot-plug drives).
DL320 G2: Optional removable 24x, low-profile CD-ROM and 1.44MB disk drive assembly for controlled software updates and maximized in-rack security. Internal storage capacity of up to 160 GB in ATA models (2 x 80 GB 1" ATA/100 non-hot-plug drives) or up to 72.8 GB in SCSI models (2 x 36.4 GB 1" SCSI non-hot-plug drives).

Interfaces
DL320: Serial port, two RJ-45 ports, graphics port, keyboard port, mouse port, two USB ports
DL320 G2: Serial port, two RJ-45 ports, graphics port, keyboard port, mouse port, two USB ports

Chassis
DL320: 1U form factor rack-mount (1.75-inch)
DL320 G2: 1U form factor rack-mount (1.75-inch)

Video
DL320: Integrated ATI RAGE XL Video Controller with 4 MB SDRAM memory
DL320 G2: Integrated ATI RAGE XL Video Controller with 8 MB SDRAM memory

Warranty
Standard global 3/1/1 next business day (3-year parts, 1-year labor, 1-year on-site)
Extended Pre-Failure Warranty, which covers processors, memory, and hard drives
Warranty upgrades available to 3/3/3 and 4-hour response time


ProLiant DL320 Component Breakdown

Reference  Description
1          Thumb Tabs
2          Power Supply
3          LED Indicators
4          Removable CD-ROM/Diskette Drive Assembly (included in some models)
5          Two 3.5-inch x 1-inch ATA or SCSI non-hot-plug drive bays
6          Fixed Rails
7          Fan (7 total)
8          Processor (populated)
9          DIMM Memory Slots (4 total)
10         Ultra ATA/100 Controller Module (ATA models)
11         Single Channel Wide Ultra2 SCSI Controller Module (SCSI models)
12         64-bit/33MHz PCI Slot


ProLiant DL320 Storage Configuration

Reference  Description
1          Up to two 1-inch height HP ATA or SCSI non-hot-plug hard drives
2          An optional removable CD-ROM/diskette drive assembly, including a low-profile 3.5-inch diskette drive and a low-profile CD-ROM drive


ProLiant DL320 Service Considerations


The following service considerations apply to the ProLiant DL320.
Remote ROM Flash

This capability is available with Windows 2000 and Windows NT only.

Fans

The two rear and three center wall fans are interchangeable; all must be operational for the system to run (there is no redundancy). Single fans are available as spares; the center wall spare has three fans already mounted on it.

A Fan 6 Error indicates an error from one of the power supply fans. When this error occurs, you must replace the entire power supply unit.

PCI Riser Interlock

Although there is no interlock LED, there is an interlock circuit that prevents power-up if the PCI riser board is not seated.

Temperature Events

Trip Caution (43°C): the server saves all running data and then shuts down one minute later.
Trip Deadly (49°C): the server shuts down immediately.

Smart Array Controllers

Installing a Smart Array Controller to manage external SCSI hard drives is the same as installing any other PCI expansion card. If the Smart Array Controller is being used for the internal drives, the existing internal controller module must first be removed.

Drive Activity LED

The ProLiant DL320 drive activity LED does not flash when the Linux operating system is in use. (There will, however, be LED activity on the drives themselves.)

Novell Support

Novell is not supported because it is primarily a file and print server OS. With only two internal drive bays and no parallel port, this would not be a good platform for Novell.

NMI Switch

The non-maskable interrupt (NMI) switch on the system board is for manufacturing use only.

Processor Heatsink

Always apply a new thermal pad and heat sink before reseating the processor. Failure to use a new heat sink may result in damage to the processor. When replacing the processor, remove the plastic cover to expose the adhesive side of the thermal pad on the new heat sink before placing the heat sink on the processor. The system will not continue to operate if the plastic cover is left in place.
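The DL320 temperature events amount to a two-threshold rule. A minimal Python sketch of that rule follows. Only the 43°C and 49°C trip points come from this guide; the function name, the returned strings, and the inclusive (>=) comparison are illustrative assumptions, not documented firmware behavior.

```python
# Illustrative sketch only. Thresholds are taken from the DL320 service
# considerations; names and the inclusive comparison are assumptions.
TRIP_CAUTION_C = 43  # server saves running data, shuts down one minute later
TRIP_DEADLY_C = 49   # server shuts down immediately


def thermal_action(temp_c: float) -> str:
    """Return the documented server response for a given temperature."""
    if temp_c >= TRIP_DEADLY_C:
        return "immediate shutdown"
    if temp_c >= TRIP_CAUTION_C:
        return "save data, shut down after one minute"
    return "normal operation"


if __name__ == "__main__":
    for reading in (38, 43, 50):
        print(reading, "->", thermal_action(reading))
```

The deadly threshold is checked first so that a reading above both trip points is never reported as the milder caution event.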


ProLiant DL360, DL360 G2 and DL360 G3

The standard features of the ProLiant DL360 servers include:


Processor
DL360 G1: 1.26GHz, 1.13GHz, 1GHz, 933MHz, 866MHz, 800MHz or 550MHz processor. Dual processor capability (except for 550MHz processor).
DL360 G2: Intel Xeon 1.4GHz processor with 133MHz front side bus. Dual processor capability.
DL360 G3: Intel Xeon 2.4GHz, 2.8GHz or 3.06GHz processor with 533MHz front side bus. Dual processor capability.

Processor upgrades
DL360 G1: Customers who choose to upgrade from 1GHz and below will require an upgrade kit in addition to the processor option kit.
DL360 G2: Customers who choose to upgrade from 1GHz and below will require an upgrade kit in addition to the processor option kit.
DL360 G3: Option kits available for Intel Xeon 3.06GHz, 2.80GHz, and 2.40GHz processors.

Memory
DL360 G1: 128 MB 133MHz ECC registered SDRAM DIMM memory, expandable to 4GB
DL360 G2: 256 MB 133MHz ECC registered SDRAM DIMM memory, expandable to 4GB
DL360 G3: 512 MB or 1024 MB 266MHz PC2100 DDR SDRAM, expandable to 8GB

Cache memory
DL360 G1: 256KB
DL360 G2: 512KB Level 2
DL360 G3: 512KB Level 2, 1024KB Level 3

Expansion slots
DL360 G1: One 64-bit 33MHz PCI slot; one 32-bit 33MHz PCI slot
DL360 G2: Two full-length expansion PCI slots: 64-bit/66MHz
DL360 G3: Two full-length expansion PCI-X slots: 64-bit/100MHz. Note: One PCI-X slot if redundant power supply installed.

Network controller
DL360 G1: Two integrated HP NC3163 Fast Ethernet NICs
DL360 G2: Two NC7780 PCI-X 10/100/1000-T Server Adapters. Note: 64-bit/133MHz PCI-X bus speeds not supported; will run at 64-bit/66MHz.
DL360 G3: Two integrated NC7781 PCI-X Gigabit NICs

Storage controller
DL360 G1: Integrated dual channel Smart Array Controller
DL360 G2: Smart Array 5i Controller (integrated on system board). Note: External SCSI port not offered.
DL360 G3: Smart Array 5i Plus Controller (integrated on system board). Note: External SCSI port not offered.

Storage and expansion
DL360 G1: Wide Ultra2/Ultra3 SCSI drive cage supports up to two 1-inch hot-plug hard drives. Maximum internal storage 145.6 GB (internal drive cage) (2 x 72.8 GB Wide Ultra3, 1-inch drives). Optional removable CD-ROM/Diskette Drive Assembly.
DL360 G2: Wide Ultra2/Ultra3 SCSI drive cage supports up to two 1-inch hot-plug hard drives. Maximum internal storage 293.6 GB (2 x 146.8 GB Ultra320, 1-inch drives). Optional removable CD-ROM/Diskette Drive Assembly.
DL360 G3: Wide Ultra320 SCSI drive cage supports up to two 1-inch hot-plug hard drives. Maximum internal storage 293.6 GB (2 x 146.8 GB Ultra320, 1-inch drives). Optional removable CD-ROM/Diskette Drive Assembly.

Interfaces
DL360 G1: Serial port, two RJ-45 ports, external SCSI connector, keyboard port, mouse port
DL360 G2: Serial port, two RJ-45 ports, external SCSI connector, keyboard port, mouse port, two USB ports, iLO remote management port
DL360 G3: Serial port, two RJ-45 ports, external SCSI connector, keyboard port, mouse port, two USB ports, iLO remote management port

Chassis
DL360 G1: 1U form factor rackmount (1.75-inch)
DL360 G2: 1U form factor rackmount (1.75-inch)
DL360 G3: 1U form factor rackmount (1.75-inch)

Warranty
DL360 G1: Three-year on-site next-business-day limited global warranty. Extended Pre-Failure Warranty covers Pentium III processors, memory, and hard drives.
DL360 G2: Protected by HP Services, including a three-year, next-business-day on-site limited global warranty and extended Pre-Failure Warranty, which covers processors, memory, and hard drives. Certain restrictions and exclusions apply.
DL360 G3: Protected by HP Services, including a three-year, next-business-day on-site limited global warranty and extended Pre-Failure Warranty, which covers processors, memory, and hard drives. Certain restrictions and exclusions apply.


ProLiant DL360 Component Location

ProLiant DL360 Storage Configuration

Removable CD-ROM/floppy drive assembly
Hot-plug drive bays



ProLiant DL360 Service Considerations


The following service considerations apply to the ProLiant DL360.
CPU and processor board

The DL360 has dual processor capability with all but 550MHz processors. Installing two 550MHz processors will generate a halt and a POST error.

The system automatically detects and configures settings when a processor is added or replaced.

Processor socket 1 must be populated at all times for the system to complete POST.

1.26GHz/1.13GHz SKUs use a different system board than the 1GHz and below SKUs. The new system board is NOT backward compatible with 1GHz and below processors, because the 1.26GHz and 1.13GHz processors have 512K of level 2 cache and require a new VRM (PPM) and socket.

Processor upgrade requirements

To upgrade 550 MHz, 800 MHz, 866 MHz or 933 MHz models to 1.0 GHz, the HP ProLiant DL360 P1000 Upgrade Kit (PN 225352-B21) is required.

To upgrade 550 MHz, 800 MHz, 866 MHz, 933 MHz or 1.0 GHz models to a 1.266 GHz or 1.133 GHz model, the HP ProLiant DL360 P1133/P1126 Upgrade Kit (PN 236122-B21) is required.

Memory

Only PC133MHz ECC registered DIMMs can be used in this server.

Mass storage

External drives support RAID 0 only off the integrated SCSI controller.

The integrated SCSI controller supports only single tape drives, not tape libraries.

If Wide Ultra2 and Wide Ultra3 drives are mixed on the embedded array controller, all drives will operate at Wide Ultra2 speeds.

The shipping pin must be removed before the CD-ROM/floppy drive assembly can be ejected.

The server must be placed in power standby mode before removing the CD-ROM/floppy drive assembly.

The drive assembly bay should always have either the CD-ROM/floppy drive assembly or a bezel blank installed for proper airflow. Failure to do so may result in thermal damage.

Miscellaneous

All Remote Insight Boards must be installed in the 32-bit PCI slot.

To allow LAN access to the Remote Insight Lights-Out Edition (RILOE), a LAN cable must be attached to the RJ-45 connector on the RILOE board. The RJ-45 connector on the rear panel will not provide network access to the RILOE.

The CMOS/NVRAM battery is not soldered down and is replaceable.

Running the server without an expansion board or an expansion slot cover in each of the expansion slots may cause thermal damage.

Some POST error messages are non-standard. Always refer to the Maintenance and Service Guide for POST error messages.

A Unit Identification Switch on the front and back of the server allows easy identification of the server in the rack.
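The DL360 processor upgrade requirements can be read as a small lookup from current and target speed to the required upgrade kit. The Python sketch below encodes only the two rules and part numbers quoted in this guide; the function name, the use of MHz integers, and the `None` result for combinations the guide does not cover are illustrative assumptions. The processor option kit is still needed in addition to whatever this returns.

```python
from typing import Optional

# Part numbers as quoted in the DL360 service considerations.
P1000_KIT = "225352-B21"
P1133_P1126_KIT = "236122-B21"

# Starting speeds (MHz) named in the two upgrade rules.
BELOW_1GHZ = {550, 800, 866, 933}
UP_TO_1GHZ = BELOW_1GHZ | {1000}


def required_upgrade_kit(current_mhz: int, target_mhz: int) -> Optional[str]:
    """Return the upgrade kit part number, or None if the guide lists no rule."""
    if target_mhz == 1000 and current_mhz in BELOW_1GHZ:
        return P1000_KIT
    if target_mhz in (1133, 1266) and current_mhz in UP_TO_1GHZ:
        return P1133_P1126_KIT
    return None  # combination not covered by the rules above


if __name__ == "__main__":
    print(required_upgrade_kit(866, 1000))
    print(required_upgrade_kit(1000, 1266))
```

Returning `None` rather than guessing keeps the sketch honest about cases the guide does not address, such as moving between the 1.133 GHz and 1.266 GHz models.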


ProLiant DL380, DL380 G2 and DL380 G3

The standard features of the ProLiant DL380 G1, G2 and G3 servers include:
Processor
DL380 G1: Pentium III 1GHz, 933MHz, 866MHz, 800MHz, 733MHz or 667MHz. Upgradeable to dual processing.
DL380 G2: Pentium III 1.4GHz, 1.2GHz or 1.13GHz. Upgradeable to dual processing.
DL380 G3: Intel Xeon Processor 3.06GHz, 2.8GHz, 2.4GHz. Upgradeable to dual processing.

Memory
DL380 G1: 128 MB 133MHz ECC SDRAM memory, expandable to 4GB
DL380 G2: 256 MB 133MHz ECC SDRAM memory, expandable to 6GB
DL380 G3: 1024 MB 266MHz PC2100 DDR SDRAM on 3.06GHz models, expandable to 12GB, or 512 MB 200MHz PC2100 DDR SDRAM on 2.8GHz models or lower, expandable to 6GB. Advanced ECC and online spare capable.

Cache memory
DL380 G1: 256KB per processor
DL380 G2: 512KB per processor
DL380 G3: 512KB Level 2, 1024KB Level 3

Expansion slots
DL380 G1: Three 64-bit/33MHz; one 32-bit 33MHz
DL380 G2: Two 64-bit/66MHz hot plug; one 64-bit/33MHz non-hot plug
DL380 G3: Two 64-bit/100MHz hot plug; one 64-bit/133MHz non-hot plug

Network controller
DL380 G1: Embedded HP NC3163 Fast Ethernet 10/100 PCI NIC with Wake on LAN
DL380 G2: Two HP NC3163 Fast Ethernet NIC 64 PCI dual base controller
DL380 G3: Two integrated NC7781 PCI-X Gigabit NICs

Storage controller
DL380 G1: Integrated Smart Array controller
DL380 G2: Integrated Smart Array 5i controller
DL380 G3: Integrated Smart Array 5i+ controller

Storage and expansion
DL380 G1: 1.44MB diskette drive, low-profile 24X max CD-ROM drive. Support for up to six 1-inch Wide Ultra2/Ultra3 hot-plug hard drives: 4 in the standard drive cage; 2 in an optional 2 x 1-inch drive cage.
DL380 G2: 1.44MB diskette drive, low-profile 24X max CD-ROM drive. Support for up to six Ultra3 hot-plug hard drives: five 1-inch drives and one 1.6-inch (for disks or tape).
DL380 G3: 1.44MB diskette drive, low-profile 24X max CD-ROM drive. Support for up to 6 drives with single or dual channel (using either the embedded Smart Array 5i Plus controller or a PCI-based controller).

Interfaces
DL380 G1: Two serial ports, one parallel port, RJ-45 port, external SCSI port (for tape only), keyboard and mouse ports
DL380 G2: One serial port, two USB ports, two RJ-45 ports, external SCSI port (for tape only), keyboard and mouse ports
DL380 G3: One serial port, two USB ports, three RJ-45 ports (one for iLO remote management), external SCSI port (for tape only), keyboard and mouse ports

Graphics
DL380 G1: Integrated ATI Rage IIC video controller with 4MB video memory
DL380 G2: Integrated ATI Rage IIC video controller with 8MB video memory
DL380 G3: Integrated ATI Rage XL video controller with 8MB video memory

Chassis
DL380 G1: 3U form factor rackmount
DL380 G2: 2U form factor rackmount
DL380 G3: 2U form factor rackmount

Warranty
DL380 G1: Three-year on-site limited warranty. Extended Pre-Failure Warranty covers Pentium III processors, memory, and hard drives.
DL380 G2: Three-year on-site limited warranty. Extended Pre-Failure Warranty covers Pentium III processors, memory, and hard drives.
DL380 G3: Three-year on-site limited warranty. Extended Pre-Failure Warranty covers processors, memory, and hard drives.


ProLiant DL380 Component Breakdown

ProLiant DL380 G2/G3 Component Breakdown


ProLiant DL380 G1 Storage Configuration

Ref. #  Description
1       Hot-plug drive cage accommodating four 1-inch height SCSI hot-plug hard drives
2       Two 5.25-inch wide x half-height drives
3       Optional drive cage that supports two 1-inch media devices
4       Low-profile IDE CD-ROM drive
5       Diskette drive

ProLiant DL380 G2/G3 Storage Configuration

Ref. #  Description
1       Support for up to six 1-inch, hot-plug SCSI hard drives
2       Support for one optional 1.6-inch HP Universal Hot-Plug Tape drive with five hot-plug SCSI hard drives installed
3       One bay occupied by a slimline 1.44-MB diskette drive
4       One CD MultiBay occupied by a removable 24X IDE CD-ROM drive

ProLiant DL380 Service Considerations


The following service considerations apply to the ProLiant DL380 and DL380 G2.

Processors and ROM

Processor slot 1 must be populated at all times. If it is necessary to remove the processor from slot 1, install the second processor in slot 1.

Both processors must be the same speed.

Pentium III processors can no longer be down-clocked or up-clocked. Intel now locks in the speed.

The system ROM maintains a primary and a redundant image of the BIOS. If one image is corrupt, POST error 105 ("Current System ROM is corrupt - now booting redundant system ROM") appears. If both ROM images are corrupt, enable disaster recovery mode by setting SW2 positions 1, 4, 5, and 6 to the ON position and rebooting.

Memory

Only PC133MHz ECC registered DIMMs can be used.

IMPORTANT: Do not force the installation of DIMMs. If the alignment does not match, it is probably the wrong type of DIMM.

CMOS and NVRAM Battery

The battery is not soldered onto the board and is replaceable.

Storage

If only one SCSI drive is used, it should be installed in Bay 0.

Wide Ultra2 and Wide Ultra3 drives can be mixed on the embedded controller, but all drives will operate at Wide Ultra2 speeds.
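The mixed-drive rule in the Storage entry is a "slowest device wins" rule. As a minimal Python sketch: the nominal bus rates below (80 MB/s for Wide Ultra2, 160 MB/s for Wide Ultra3) are the usual published figures for those SCSI generations rather than values from this guide, and the function name is illustrative.

```python
# Nominal wide-bus transfer rates in MB/s. These figures are general SCSI
# background, not taken from this guide.
BUS_RATE_MB_S = {"Wide Ultra2": 80, "Wide Ultra3": 160}


def effective_generation(drives):
    """Return the SCSI generation all drives on the controller will run at."""
    if not drives:
        raise ValueError("no drives installed")
    # The slowest drive on the shared bus sets the speed for every drive.
    return min(drives, key=lambda generation: BUS_RATE_MB_S[generation])


if __name__ == "__main__":
    mixed = ["Wide Ultra3", "Wide Ultra2", "Wide Ultra3"]
    print(effective_generation(mixed))
```

A single Wide Ultra2 drive is therefore enough to hold an otherwise Wide Ultra3 drive set at Wide Ultra2 speed.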


ProLiant DL560

The standard features of the ProLiant DL560 include the following:


Processors
Intel Xeon MP 1.5GHz, 1.9GHz, 2.0GHz, 2.5GHz, 2.8GHz
Upgradeable to quad processing

Cache memory
1MB integrated level 3 on 1.5GHz, 1.9GHz and 2.0GHz models
2MB integrated level 3 on 2.5GHz and 2.8GHz models

Memory
200MHz DDR SDRAM with Advanced ECC
Support for a maximum of 12GB

Expansion slots
Three PCI-X slots total:
Two 64-bit 100MHz non-hot plug
One 64-bit 133MHz non-hot plug

Network controller
Two embedded 10/100/1000 HP NC7781

Storage controller
Dual channel Ultra3 controller
Smart Array 5i Plus controller (with 64MB memory)

Storage
One slimline 1.44MB ejectable drive
One slimline 24X IDE ejectable CD-ROM
Up to two internal 1-inch U320 hot-plug hard drives

Interfaces
Serial, mouse, keyboard and video ports
One RJ-45 iLO connector
Two RJ-45 NIC connectors
Two USB ports

Video
Integrated 1280x1024, 16M color Video Controller with 8MB Video Memory

Warranty
Three-year on-site limited warranty: 3 years parts, 3 years labor, and 3 years on-site repair
Pre-Failure Warranty (on hard drives, memory, and processor)


ProLiant DL560 Component Breakdown


ProLiant DL560 Service Considerations


The following service considerations apply to the ProLiant DL560.
Array controller

The DL560 G2 has an embedded array controller (Smart Array 5i) which connects to the two drives on the front of the server. The array controller does not have a connection to support external drives, so customers must install an array controller in an expansion slot to use external SCSI storage.

Fans

When the fans are in a fully redundant configuration, there will be a sudden reduction in fan noise after the server completes POST. All of the fans spin up at power-up and self-test. When the redundant fans spin down, there is a sudden reduction in fan noise. This is normal.

Memory

Install memory in pairs of identical DIMMs.

All DIMMs installed must be the same speed.

Install DIMMs into both slots of the next available memory bank, beginning with bank A, then bank B, and lastly bank C.

POST error codes

207-Memory Configuration Warning - DIMM In DIMM Socket X does not have Primary Width of 4 and only supports standard ECC.

209-Online Spare Memory Configuration - Spare bank is invalid. Mixing of DIMMs with Primary Width of x4 and x8 is not allowed in this mode.
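The DL560 memory rules (identical pairs, one speed throughout, banks filled in the order A, B, C) can be expressed as a simple configuration check. The Python sketch below is illustrative only: the `Dimm` record, the use of a part number to decide whether two DIMMs are "identical", and the wording of the messages are assumptions, and the check does not model the x4/x8 width conditions behind POST errors 207 and 209.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Dimm(NamedTuple):
    part_number: str  # hypothetical identifier used to test "identical"
    speed_mhz: int


BANK_ORDER = ("A", "B", "C")  # fill order given in the guide
Slots = Tuple[Optional[Dimm], Optional[Dimm]]


def check_population(banks: Dict[str, Slots]) -> List[str]:
    """Return a list of rule violations; an empty list means none found."""
    problems: List[str] = []
    seen_empty = False
    speeds = set()
    for name in BANK_ORDER:
        first, second = banks.get(name, (None, None))
        if first is None and second is None:
            seen_empty = True
            continue
        if seen_empty:
            problems.append(f"bank {name} is populated before an earlier bank")
        if first is None or second is None:
            problems.append(f"bank {name} has only one DIMM")
        elif first != second:
            problems.append(f"bank {name} DIMMs are not identical")
        speeds.update(d.speed_mhz for d in (first, second) if d is not None)
    if len(speeds) > 1:
        problems.append("DIMM speeds are mixed")
    return problems


if __name__ == "__main__":
    dimm = Dimm("EXAMPLE-1", 200)
    print(check_population({"A": (dimm, dimm), "B": (dimm, None)}))
```

Collecting every violation, rather than stopping at the first, mirrors how a technician would want to see all population mistakes before reopening the chassis.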


ProLiant DL580 and DL580 G2

The standard features of the ProLiant DL580 and DL580 G2 include the following:
Processors
DL580: Intel Pentium III Xeon processor 900MHz, 700MHz. Upgradeable to quad processing.
DL580 G2: 2.8, 2.5, 2.0, 1.9, 1.6, 1.5, or 1.4 GHz Xeon MP. Upgradeable to quad processing.

Cache memory
DL580: 2MB L2 per processor (900MHz, 700MHz); 1MB L2 per processor (700MHz only)
DL580 G2: 2MB (2.0, 2.8GHz) iL3 or 1MB (2.5, 2.0, 1.9, 1.6, 1.5 GHz) iL3 or 512KB (1.40 GHz) iL3

Memory
DL580: 1024MB PC100MHz Advanced ECC SDRAM (900MHz); 512MB PC100MHz Advanced ECC SDRAM (700MHz). Support for a maximum of 16GB.
DL580 G2: 2048MB 200MHz DDR, Advanced ECC, 4:1 interleaved (2P model); 1024MB 200MHz DDR, Advanced ECC, 4:1 interleaved (1P model). Support for a maximum of 32GB. Online Spare Memory, Single Board Mirrored Memory, Hot-Plug Mirrored Memory.

Expansion slots
DL580: Six total, five available: two 64-bit 66MHz PCI Hot Plug (one available); two 64-bit 33MHz PCI Hot Plug; one 64-bit 33MHz PCI non-hot plug; one 32-bit 33MHz PCI non-hot plug
DL580 G2: Four full-length hot-pluggable 64-bit/100MHz PCI-X slots; two full-length non-hot-pluggable 64-bit/100MHz PCI-X slots (one available, one used for the NIC)

Network controller
DL580: Integrated 10/100 NC3134 Fast Ethernet. Wake On LAN support.
DL580 G2: Integrated HP NC7770 PCI-X Gigabit Server Adapter in a slot

Storage controller
DL580: Integrated dual channel Wide Ultra2 SCSI controller. Optional integrated Smart Array controller.
DL580 G2: Integrated Smart Array 5i Plus Controller (dual channel, Ultra3) with 64-MB total memory on 5i Plus Memory Module. Battery-Backed Write Cache Enabler module on all 2P models (optional on 1P model).

Storage
DL580: One 1.44MB diskette drive; one 32X max or faster IDE CD-ROM drive; four 1-inch hard drive bays
DL580 G2: One 1.44MB diskette drive; 24x IDE CD-ROM drive (slim line), ejectable for security and serviceability; four 1-inch hard drive bays

Interfaces
DL580: One serial port (second available with auxiliary serial connector provided), parallel port, external Wide Ultra2 SCSI, RJ-45 port, keyboard port, mouse port
DL580 G2: One serial port, keyboard port, mouse port, graphics port, iLO remote management RJ-45 port, USB ports (2)

Video
DL580: Integrated ATI RAGE IIC Video Controller with 4MB Video Memory
DL580 G2: Integrated ATI RAGE IIC Video Controller with 8MB Video Memory

Warranty
DL580: Next-business-day, three-year on-site limited warranty. Coverage is for parts, labor, and on-site repair. Pre-Failure Warranty (on hard drives, memory, and processor).
DL580 G2: Next-business-day, three-year on-site limited warranty. Coverage is for parts, labor, and on-site repair. Pre-Failure Warranty (on hard drives, memory, and processor).


ProLiant DL580 Component Breakdown


ProLiant DL580 G2 Component Breakdown


ProLiant DL580 Service Considerations


An improperly seated component in the interlock chain causes the associated LEDs on the system board to light. There are seven LEDs to monitor seven components: four processor boards, the memory board, the peripheral board and the power supply backplane board. All of the LEDs are extinguished if there are no interlock errors. One or more LEDs are lit when a board is not properly seated.

There is an Internal Diagnostic Display (IDD) on the peripheral board which indicates the failure of a memory module or processor. It displays a two-digit alphanumeric code that corresponds to a specific memory module or processor. The Diagnostic jumper must be removed before the IDD will display a code.

Internal Diagnostic Display (IDD) Indicator Codes

Location of Internal Diagnostic Display (IDD)

Serial Port B is not installed in the factory. A cable is included in the country kit that can be connected from the peripheral board to the back of the chassis. There is a blank plate installed which can be removed when the connector is installed.

The WOL feature is only supported by operating systems that support ACPI. At this time, that is only Windows 2000. The WOL feature is enabled through the System Configuration Utility. Use the following steps to enable WOL:

1. Press the Ctrl and A keys before the Continue message displays. This will take you to the Advanced Mode.
2. Scroll down to find the Enable WOL selection.

All Remote Insight Boards, including the new HP Remote Insight Lights-Out Edition, must be installed in PCI slot 6. Cabling the board to J8 of the system board provides the Lights-Out Edition with full control over the server power state.

There are no VRM or PPM slots. This machine has On-Chip Voltage Regulation (OCVR). The VRM or PPM is part of the processor cartridge.

To use the IRC capability, an external modem must be connected to one of the serial ports.

If the server locks up during normal operation because of software or hardware problems, press and hold the power switch for a minimum of four seconds. This causes the system to transition to an off state.

ProLiant DL580 G2 Service Consideration


Optional SCSI cable assembly 288874-B21 is required for connecting a Smart Array Controller in a PCI slot to the internal hard drive backplane. Duplex configuration requires two such kits.


ProLiant DL590/64

The standard features of the ProLiant DL590/64 include the following:


Processors
Up to 4 Intel Itanium 800MHz, 733MHz processors
Upgradeable to quad processing

Cache memory
4MB level 3 (800MHz) or 2MB level 3 (733MHz)

Memory
4GB 100MHz ECC SDRAM DIMMs (800MHz) or 1GB 100MHz ECC SDRAM DIMMs (733MHz)
Support for a maximum of 64GB

Expansion slots
Eleven total, ten available:
Eight 64-bit 66MHz PCI Hot Plug, eight available
Three 64-bit 33MHz PCI Hot Plug, two available

Network controller
Integrated dual port 10/100 NC3134 Fast Ethernet
Upgradeable to quad port Gigabit

Storage controller
Integrated, dual channel, Wide Ultra2 SCSI Smart Array Controller
Optional support of Ultra3 and Fibre Channel in an I/O slot

Storage
Integrated LS-120 Drive
Integrated 24x IDE CD-ROM (slim line) Drive
Four 1-inch hard drive bays

Interfaces
Two serial ports
Parallel port
Two USB ports
RJ-45 port
Keyboard port
Mouse port

Video
Embedded ATI Rage XL Video Controller with 8-MB SDRAM Video Memory

Warranty
Next-business-day, three-year on-site limited warranty. Coverage is for parts, labor, and on-site repair.
Pre-Failure Warranty (on hard drives, memory, and processor)


ProLiant DL590 Component Breakdown


ProLiant DL590 Service Considerations


The following service considerations apply to the ProLiant DL590.
Power Supply Replacement

The actual power supplies and the power supply blanks look very similar. To install a new hot-plug power supply, you must remove a blank. If the server is powered on, make sure that you remove a blank and not the actual power supply. If you mistakenly remove a power supply and it is the only power supply in the server, then the server loses power, resulting in loss of data. (A power supply has a release lever on the left side, whereas a blank has a release tab at the top instead of a release lever.)

Power Source

If you are using a 120-volt AC power source to power the ProLiant DL590/64 server, do not install more than 2 power supply/SPM pairs. Using 3 power supply/SPM pairs requires more current than a 120-volt power source can provide. Consequently, the server will not power up under these conditions.

Memory

HP does not support transferring memory modules from other non-Itanium platforms.

To prevent damage to equipment or loss of information, HP strongly recommends using DIMMs supplied by HP. The ProLiant DL590/64 also supports third-party industry-standard 168-pin PC100 CL2 ECC Registered DIMMs with the following restrictions: TDAL=3 Tclks, TRC=6 Tclks when operating at Tclk = 15 ns (66.67 MHz). Other third-party memory may result in error messages and possible loss of data.

Do not grasp the components of the memory board VRM when removing it. These components are very fragile and can easily break. Hold the memory board VRM by the circuit board only. The ejection mechanism moves the VRM away from the memory board sufficiently to reduce the force needed to complete the memory board VRM removal.

Processor

The server will not power up if the bus-to-core ratio jumpers are set to a speed higher than the slowest processor installed in the server. Failure to properly set the bus-to-core ratio jumpers can cause damage to the server and void the warranty.

To prevent damage to the processor board, always ensure that a processor/PPM blank is installed in any unused processor slot before securing the triple beam in place.

I/O Board

To avoid damaging components when replacing the I/O board, ensure that the I/O board is attached to the subpan along with the System Power Module basket, PCI basket insulator, and I/O board latch release lever assembly.

To avoid damage to the I/O board and sideplane board connector when replacing the sideplane board assembly, ensure that the sideplane board assembly is fully pushed to the rear of the server to fully seat into the connectors before tightening the thumbscrew.

I/O Fans

Never remove more than one of the hot-plug redundant I/O fans at a time while the server is powered up.

Battery Replacement

Loss of BIOS settings occurs when the battery is removed. BIOS settings must be reconfigured whenever the battery is replaced.

As a precaution, place a sheet of paper on the metal surface below the battery. This will prevent the battery from shorting out if it gets dropped on the metal surface.

System Maintenance Switch

When maintenance mode is enabled (the maintenance switch is set to on) and the system is powered up, NVRAM configuration is invalidated.

ProLiant DL740

Standard features of the ProLiant DL740 include:


Processor
Intel Xeon Processor MP 1.5GHz/1MB or 2.0, 2.5, 2.8GHz/2MB processors
Eight processor capability

Memory
133MHz SDRAM Hot Plug RAID Memory (Hot Add, Hot Replace, and Hot Upgrade)
2048MB min on 1.5GHz models (2560MB total)
4096MB min on >1.5GHz models (5120MB total)
64GB max (80GB total)
Note: All models come standard with all five memory cartridges populated and Hot Plug RAID Memory enabled. Total memory consists of addressable memory plus the redundant memory.

Expansion slots
Six 64-bit/100MHz PCI-X hot pluggable

Network controller
Dual integrated (10/100/1000)

Storage controllers
Smart Array 5i

Storage and expansion
Four 1.0" Ultra320 Wide Ultra3 hard drives

Video
Integrated ATI RAGE XL Video Controller with 8-MB SDRAM Video Memory

Form factor
4U (7") rack form factor; standard 19-inch rack-mountable

Management
Integrated Lights-Out Standard (iLO)

Warranty
Limited Warranty includes 3-year parts, 3-year labor, 3-year on-site support
Pre-Failure Warranty
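The paired memory figures quoted for the DL740 (2048MB/2560MB, 4096MB/5120MB, 64GB/80GB) are each in a 4:5 ratio, which is consistent with four cartridges' worth of addressable memory plus one cartridge's worth of redundancy across the five cartridges. The Python sketch below checks that arithmetic. The 4+1 split is an inference from the quoted numbers, not a statement from this guide, and the function name is illustrative.

```python
CARTRIDGES_TOTAL = 5        # all five cartridges populated (from the guide)
CARTRIDGES_ADDRESSABLE = 4  # assumption inferred from the 2048/2560 ratio


def total_installed_mb(addressable_mb: int) -> int:
    """Total installed memory (addressable plus redundant) in MB."""
    return addressable_mb * CARTRIDGES_TOTAL // CARTRIDGES_ADDRESSABLE


if __name__ == "__main__":
    for addressable in (2048, 4096, 64 * 1024):
        print(addressable, "MB addressable ->",
              total_installed_mb(addressable), "MB total")
```

Under this reading, one fifth of the installed memory is given over to redundancy, which is why the "total" figure is always 25 percent larger than the addressable figure.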


ProLiant DL740 Component Breakdown


ProLiant DL740 Service Considerations


The following service considerations apply to the ProLiant DL740.
Processors

Processors cannot be mixed in the same server. All processors in each server must be of the same speed.

Memory

Memory must be populated in banks of ten (five memory cartridges having two DIMMs per cartridge). If running in non-redundant mode, only four cartridges must be populated.

All DIMMs in a bank must have the same part number.

The Hot Add or Hot Upgrade procedures require Operating System Vendor (OSV) and Independent Software Vendor (ISV) support, but the Hot Replace feature is OS and application transparent.

Diagnostic LEDs on the front bezel indicate the condition of individual DIMMs and memory cartridges.

Power

To ensure redundancy of server power, HP requires that 200-240 VAC power be used for all deployments.

Monitors

Monitors larger than 17" may be too heavy for use with rack systems.

Airflow

To ensure proper airflow, blanking panels must be used to fill all empty front-panel U-spaces in the rack. Using a rack without blanking panels results in improper cooling, which can lead to thermal damage.

Racks

When a third-party rack is used, it must be set up to meet specific requirements in order to ensure adequate airflow and to prevent damage to the equipment.


ProLiant DL760 and DL760 G2

Standard features of the ProLiant DL760 G1 and G2 include:


DL760
Processor: 900MHz Pentium III Xeon processor with 2MB L2 cache; 700MHz Pentium III Xeon processor with 1MB or 2MB L2 cache; supports up to eight processors for 8-way symmetric multiprocessing (SMP)
Memory: 2048MB ECC protected 100MHz ECC SDRAM DIMMs (4P models); 1024MB ECC protected 100MHz ECC SDRAM DIMMs (2P models); support for a maximum of 16GB
Expansion slots: Eleven 64-bit hot-plug (8 PCI-X, 3 PCI)
Network controller: Dual 10/100 NC3134 NIC
Storage controllers: Integrated Smart Array Controller: one channel for RAID support with internal drives, one for external tape drive
Storage and expansion: 1.44MB diskette, 24X or greater IDE CD-ROM drive; support for up to four one-inch hot-pluggable Wide Ultra2 SCSI hard drives in three combinable drive cages
Video: Integrated ATI Rage IIC graphics controller with 2MB Synchronous Graphics RAM (SGRAM)

DL760 G2
Processor: 2.8, 2.0GHz Intel Xeon processor with 2MB of L3 cache; 2.5, 2.0, 1.5GHz Intel Xeon processor with 1MB L3 cache; support for up to 8 processors, 4 processors ship standard
Memory: 133MHz ECC SDRAM Hot Plug RAID Memory; 2048MB min on 1.5GHz models (2560MB total); 4096MB min on >1.5GHz models (5120MB total); 64GB max (80GB total)
Expansion slots: Eleven 64-bit hot-plug: ten 64-bit 100MHz PCI-X, one 64-bit 133MHz PCI
Network controller: Integrated NC7770 PCI-X Gigabit
Storage controllers: Integrated Smart Array 5i Controller
Storage and expansion: 1.44MB diskette, 24X or greater IDE CD-ROM drive; support for up to four one-inch hot-pluggable Ultra320 SCSI hard drives
Video: Integrated ATI RAGE XL Video Controller with 8-MB SDRAM Video Memory

DL760 and DL760 G2
Form: Rack-optimized 7U chassis
Warranty: Global three-year on-site limited warranty for parts and labor with next-business-day response; Extended Pre-Failure Warranty (if Insight Manager is installed on the server) covers processors, memory, and hard drives
Upgrade Kit: An upgrade option kit (PN 190756-B21) is available to upgrade any ProLiant 8500 or ProLiant DL760 G1 to a ProLiant DL760 G2

ProLiant DL760 Component Breakdown

Reference  Description
1          Processor/memory module
2          One to eight 550MHz Pentium III Xeon processors with redundant processor power module
3          Integrated Management Display (IMD), optional for Model 1
4          Integrated 1.44MB diskette drive
5          Integrated high speed IDE CD-ROM drive (low-profile)
6          Media module
7          Four 1-inch Wide-Ultra2 SCSI hot-pluggable drive bays
8          Integrated lift handles
9          Two redundant hot-plug power supplies (single power supply on Model 1)
10         I/O module
11         Redundant hot-pluggable fans
12         Eleven hot-plug I/O slots including eight PCI-X and three PCI
13         System interconnect status indicators


ProLiant DL760 G2 Component Breakdown


ProLiant DL760 Service Considerations


The following service considerations apply to the ProLiant DL760.
Serial Number: DL760s that have been upgraded from 8500s will have two serial numbers: the original and one assigned during the upgrade. Both numbers should be provided when placing a service call. The new serial number will be in the format xxxxFGY1xxxx.
Cables: A pre-installed cable for Remote Insight Lights-Out Edition eliminates the need to thread a longer cable through the modules. If you install a Remote Insight Lights-Out Edition board, use the pre-installed cable, not the one that comes with the board. The new Array Bypass cable is longer than previous cables and enables you to connect a Smart Array controller in slot 10 or 11 of the media drawer; in either of these slots the controller can run at 100MHz PCI-X.
Setup: The ROM-Based Setup Utility (RBSU) is housed in the system ROM and is accessed by pressing F9 during system startup. Because RBSU is customized to the unique hardware and software environment of each server, its file size is small, allowing it to run in seconds rather than the minutes required by previous configuration utilities on the hard drive.
PCI vs PCI-X: Conventional PCI and PCI-X adapters are interchangeable. However, a PCI-X adapter on a conventional PCI bus is limited to conventional PCI speeds. Also, each I/O bus will run at the speed of the slowest adapter in its corresponding slots.
Form factor: The ProLiant DL760 is a rack-only form factor and should be installed in an industry-standard, 19-inch rack.
Cooling Requirements: The processor used in the ProLiant DL760 requires increased airflow, which is provided by the HP Rack 9000 Series: the 42U Rack, the 36U Rack, and the 22U Rack. When the ProLiant DL760R is installed in an HP Rack 7000 or 4000 Series, the processor also requires the HP High-Airflow Rack Door Insert.
Moving the ProLiant DL760 Server: The ProLiant DL760 weighs 137 pounds fully assembled. To reduce the risk of personal injury or damage to the equipment, remove the modules and obtain adequate assistance to lift and stabilize the product during installation or removal.
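The PCI vs PCI-X rule (a bus runs at the speed of its slowest adapter, and never faster than the bus itself) can be illustrated as a simple minimum. This is a minimal sketch; the function name and the example clock rates are illustrative assumptions, and it models clock speed only, not the PCI or PCI-X protocol mode.

```python
def effective_bus_speed_mhz(bus_max_mhz: int, adapter_max_mhz: list[int]) -> int:
    """Return the clock rate a shared I/O bus settles on.

    The bus cannot exceed its own maximum, and it drops to the speed of
    the slowest adapter installed in its slots.
    """
    return min([bus_max_mhz, *adapter_max_mhz])


if __name__ == "__main__":
    # A 100MHz PCI-X bus shared by a 100MHz adapter and a 66MHz adapter.
    print(effective_bus_speed_mhz(100, [100, 66]))
    # A 133MHz-capable PCI-X adapter placed on a 33MHz conventional PCI bus.
    print(effective_bus_speed_mhz(33, [133]))
```

In practice this is why a single slow adapter should be placed on a bus of its own where possible.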


ProLiant DL760 Service Considerations (continued)


LEDs: The ProLiant DL760 has several sets of LEDs that assist in troubleshooting:
    Front panel (power status, fan status, Information Management display)
    System interconnect
    Fan
    PCI Hot Plug
    Power supply
When one of the connected components in the interconnect chain is improperly seated in its connector or is missing, the System Interconnect LED associated with the fault origination point will illuminate on the system midplane board and will be displayed on the top access panel. When any of the interconnect LEDs are lit, the front panel power status LED will illuminate amber.

System Interconnect Status Indicators
Indicator  Component            Indicator  Component
1          Emergency shutdown   10         Memory board
2          Processor MP8        11         Processor power module
3          Processor MP7        12         Processor and memory module
4          Processor MP6        13         I/O module and fans
5          Processor MP5        14         Media module
6          Processor MP4        15         SCSI backplane 1
7          Processor MP3        16         SCSI backplane 2
8          Processor MP2        17         Reset
9          Processor MP1

Power Considerations: The ProLiant DL760 comes standard with two 1150/500W Redundant Hot Plug Power Supplies. Each power supply generates 500W in the 110VAC configuration or 1150W in the 220VAC configuration. The power supplies autosense 110VAC or 220VAC and are auto load balancing. They are microprocessor controlled, which allows them to be monitored for advanced health and configuration management.
Installing or replacing memory: Memory can be expanded to a maximum of 16GB. Install SDRAM DIMM modules two at a time in the proper sockets. When installing or replacing memory, you must use only 256MB, 512MB, or 1GB SDRAM DIMMs. Each DIMM of a given bank must be the same size, type, and speed.
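The DL760 memory rules above lend themselves to a quick pre-installation check. The following is a minimal sketch, assuming that a bank is the pair of DIMMs installed together; the function name and message wording are hypothetical, and only DIMM size is checked (type and speed would need the same treatment).

```python
ALLOWED_DIMM_MB = {256, 512, 1024}   # 256MB, 512MB, or 1GB only
MAX_TOTAL_MB = 16 * 1024             # 16GB system maximum


def check_dimm_plan(pairs: list[tuple[int, int]]) -> list[str]:
    """Return a list of rule violations for a planned set of DIMM pairs."""
    problems = []
    for number, (first, second) in enumerate(pairs, start=1):
        if first not in ALLOWED_DIMM_MB or second not in ALLOWED_DIMM_MB:
            problems.append(f"pair {number}: unsupported DIMM size")
        if first != second:
            problems.append(f"pair {number}: DIMMs differ in size")
    if sum(first + second for first, second in pairs) > MAX_TOTAL_MB:
        problems.append("total exceeds 16GB")
    return problems


if __name__ == "__main__":
    print(check_dimm_plan([(512, 512), (1024, 1024)]))  # valid plan
    print(check_dimm_plan([(256, 512)]))                # mismatched pair
```

An empty list means the plan satisfies the size, pairing, and capacity rules stated above.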


ProLiant DL760 Service Considerations (continued)


Processors: The Processor Power Module (PPM) must be installed before you install the accompanying processor. Attempting to install the PPM afterward could damage the electronic components on the PPM. On the host board are three rows of contacts. If a processor is not fully seated, these contacts will not line up and the unit will not function. To ensure proper seating for this sensitive connection, HP developed newly designed ejectors on these processors, which also serve as injectors used to fully seat the processor. As you push down on the levers, they cam down to seat the processor completely.
To remove a processor:
    1. Lift up and rotate the front and rear ejector levers on the processor outward.
    2. Use the tabs to pull out the processor.
If you remove a processor, you must install a processor terminator board before powering up the server. The system will not power up if there are empty slots. If more than 4 processors are installed, NT Enterprise is required. Mixing of PII Xeon and PIII Xeon processors is not supported. Pentium III Xeon 550MHz, 700MHz, or 900MHz processors cannot be mixed in the same server; all processors must be of the same speed. When processors are installed on the second of the two system (processor) buses, a pair of Cache Coherency Accelerators must be installed. If the coherency accelerator memory fails or the modules are not installed properly, the system will initialize only processor bus 1.
Miscellaneous: There is up to a 40-second delay between power on and video. F9 is used to access the ROM-based configuration utility. This machine has a remote-flash redundant ROM, which allows recovery in the event of ROM failure.


ProLiant DL760 G2 Service Considerations


Processors: Customers should only install 4 or 8 of the same processor in the DL760 G2. Processor mixing of different frequencies and cache sizes is not supported on the DL760 G2. Unpredictable behavior may result if processors are mixed. At the very least this will cause the system to run at the slowest processor speed.
Shipping screws: The Processor & Memory Module needs to be re-extended about 2.5 inches to disconnect it from the System Midplane Assembly, and re-bolted with the orange Shipping Screws before shipping the unit. The standard procedure for installing a DL760 G2 into a rack now involves removing the shipping screws prior to removing all modules and power supplies, attaching slides to the empty chassis and rails to the rack, mounting the chassis in the rack, reinstalling the modules, and verifying all critical components are properly seated by reviewing the system status indicator lights. Alert customers and resellers to keep these shipping screws for future use.
PCI Hot Plug: Do not attempt a PCI hot plug procedure if your operating system does not provide PCI hot plug support or if you do not have the appropriate device drivers installed. Failure to take these precautions causes system shutdown and risks data integrity. Hot plug capability is only functional when using a hot plug aware expansion board and after installing:
    The PCI-X Hot Plug device drivers (located on the SmartStart CD and HP website), and
    An operating system that supports PCI-X Hot Plug technology (support levels vary)
Hot-Plug RAID Memory: Hot-Replace capability is operating system independent; Hot Add or Hot Upgrade require operating system and application support. All models come standard with all five memory cartridges populated and Hot Plug RAID Memory enabled. Total memory consists of addressable memory plus the redundant memory. DIMMs are installed in bank pairs of ten. A bank of memory is five DIMMs, one in the corresponding slot across each of the five cartridges, and in order to achieve interleaving performance advantages, memory must be installed two banks at a time (a bank pair). An LED bar is located directly underneath the memory cartridges. This LED bar includes an LED for each of the 40 DIMM slots in the memory subsystem. When a cartridge is removed from the server to replace a failed DIMM, the LED for the failed DIMM remains lit so that it can be matched with the label of the DIMM slot inside the memory cartridge. When a cartridge is removed and the server is therefore running in non-redundant memory mode, any attempt to unlock a second cartridge will not bring down the power to that second cartridge. Instead, audible and visual alarms will indicate the need to relock that second cartridge. If a second cartridge is unlocked and removed from the server while the server is running, the server will fail.
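The bank-pair rule for Hot-Plug RAID Memory can be expressed as a small check on the number of DIMMs installed. This is a minimal sketch built only from the figures above (five cartridges, five DIMMs per bank, ten per bank pair, 40 slots in total); the function name is hypothetical.

```python
CARTRIDGES = 5
DIMMS_PER_BANK = CARTRIDGES            # one DIMM per cartridge
DIMMS_PER_BANK_PAIR = 2 * DIMMS_PER_BANK
TOTAL_DIMM_SLOTS = 40
SLOTS_PER_CARTRIDGE = TOTAL_DIMM_SLOTS // CARTRIDGES


def bank_pairs(dimm_count: int) -> int:
    """Return the number of bank pairs for a valid DIMM count.

    Raises ValueError when the count is not a whole number of bank pairs
    or exceeds the available slots.
    """
    if not 0 < dimm_count <= TOTAL_DIMM_SLOTS:
        raise ValueError("DIMM count must be between 1 and 40")
    if dimm_count % DIMMS_PER_BANK_PAIR:
        raise ValueError("DIMMs must be installed ten at a time (a bank pair)")
    return dimm_count // DIMMS_PER_BANK_PAIR


if __name__ == "__main__":
    print(SLOTS_PER_CARTRIDGE)  # slots in each cartridge
    print(bank_pairs(20))       # two bank pairs
```

A fully populated system therefore holds four bank pairs, with eight DIMM slots in each cartridge.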


Blade Servers
HP's BL line is a line of ProLiant servers that offers power-efficient servers in ultra-dense, space-saving packaging. The offerings in the BL line include:

    ProLiant BL10e
    ProLiant BL20p
    ProLiant BL40p

Objectives
To demonstrate an awareness of HP blade server products, service personnel should be able to:

    Describe the features and characteristics of HP BL servers.
    Locate configuration and service information relative to each product.


ProLiant BL10e

The standard features of the ProLiant BL10e servers include:


ProLiant BL e-Class Server Enclosure (features are the same for the BL10e G1 and BL10e G2)
Enclosure Size: 3U form factor (5.25" high x 17.5" wide x 28.35" deep); bays for up to 20 ProLiant BL10e server blades
Interconnect Tray: RJ-45 Patch Panel (with 40 RJ-45 connectors), or RJ-21 Patch Panel (with 4 RJ-21 connectors), or ProLiant BL e-Class C-GbE interconnect switch (4 Gigabit Ethernet uplinks)
Integrated Administrator Module:
    Local console and remote network access
    Remote power control for enclosure and server blades
    Remotely toggle on/off unit identification LEDs for blades and enclosure
    Monitors/controls enclosure fans, temperature sensors, blade status
    Connects to each blade's serial console
Diagnostic Adapter Interface: Two diagnostic adapter interfaces provide the ProLiant BL10e server blade with diagnostic LEDs, buttons, and the following external ports: mouse, keyboard, video, serial, USB (2)
Warranty: Three-year limited warranty on enclosure and interconnect trays


ProLiant BL10e Server Blade

BL10e G1
Processor: Single ultra-low voltage (ULV) Pentium III 900MHz, 100MHz front side bus
Cache memory: 512KB L2
Memory: 512MB PC133MHz registered ECC SDRAM; expandable to 1GB maximum using a total of two DIMM slots

BL10e G2
Processor: Single ultra-low voltage (ULV) Pentium M 1GHz, 400MHz front side bus
Cache memory: 1MB L2
Memory: 512MB PC2100 registered ECC DDR; expandable to 1GB maximum using a total of two DIMM slots

BL10e G1 and BL10e G2
Expansion slots: None available
Network controller: Two NC3163 10/100 Fast Ethernet NICs 64 with Wake on LAN (WOL)
Storage: 40-GB Ultra ATA/100 5,400 rpm non-hot-plug hard drive, 2.5"
Diagnostic Adapter Interface: Diagnostic port support for local keyboard, video, mouse, diskette drive; also supports USB devices: keyboard, mouse, CD-ROM, floppy disk
Warranty: One-year limited warranty on server blades


ProLiant BL10e Enclosure Component Breakdown

Reference  Description
1          Hot Plug Power Supply (600W)
2          Center wall assembly
3          ProLiant BL e-Class Integrated Administrator module
4          Fan backplane assembly
5          Fan cage
6          Hot Plug fan
7          Enclosure status assembly
8          ProLiant BL e-Class C-GbE Interconnect Switch interconnect tray
9          RJ-21 patch panel interconnect tray
10         RJ-45 patch panel interconnect tray


ProLiant BL10e Blade Server Component Breakdown

Reference  Description
1          ProLiant BL10e server blade
2          ATA hard drive assembly
3          133MHz SDRAM DIMM
4          3.3-V Lithium battery


ProLiant BL10e Service Considerations


The following service considerations apply to the ProLiant BL10e:
Diagnostic Adapter and Hot Plug Capability: Because PS/2 devices do not support hot-plug technology, restart the server blade after attaching the diagnostic adapter. You can hot-add peripheral devices using the diagnostic adapter if the devices support hot-plug capability. USB devices are hot-plug supported and do not require restarting the server blade after attachment.
Memory Requirements: Use only 128-MB, 256-MB, or 512-MB, 72-bit wide, 3.3 V, registered ECC SDRAM. SDRAM can be either 100 or 133 MHz. Use HP SDRAM only.
Graceful Power Shutdown: You can perform a graceful shutdown of a ProLiant BL10e server blade or a ProLiant BL e-Class enclosure by using the Power Off option in the Integrated Administrator. You can also perform a graceful shutdown of a ProLiant BL e-Class enclosure and all the server blades by pressing the enclosure power button on the rear of the enclosure if your operating system is Microsoft Windows 2000. If your operating system is Red Hat Linux, you must have the HP Linux Health driver installed in order for the server blades to shut down gracefully.
Emergency Power Shutdown: You can perform an emergency shutdown of a ProLiant BL10e server blade by pressing and holding the power button on the front of the server blade for four seconds. You can also perform an emergency shutdown of a ProLiant BL e-Class enclosure and all server blades by pressing and holding the enclosure power button for four seconds.
    Note: Performing an emergency shutdown on a server blade may result in the loss of any unsaved data.
The Integrated Administrator performs an emergency shutdown of the enclosure and all server blades only after trying for five minutes to perform a graceful shutdown. If, after five minutes, the Integrated Administrator cannot perform a graceful shutdown on the enclosure and all server blades, it performs an automatic emergency shutdown. Performing an emergency shutdown on the enclosure may result in the loss of any unsaved data on all server blades in that enclosure.
Integrated Administrator Security Settings: Integrated Administrator security settings are assigned to server blade bays, not to server blades. If server blades change locations within the enclosure, Integrated Administrator settings must also be adjusted to ensure accurate security.
Power Supply Replacement: To avoid a thermal event, do not remove a failed power supply until a replacement power supply is available.
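The shutdown escalation described above (try a graceful shutdown for five minutes, then fall back to an emergency shutdown) follows a common timeout pattern. The sketch below illustrates that pattern only; it is not the Integrated Administrator's actual logic, and every name in it (`shut_down`, `is_down`, `request_graceful`, `force_off`) is hypothetical.

```python
import time

GRACEFUL_TIMEOUT_S = 5 * 60  # five minutes, per the text above


def shut_down(is_down, request_graceful, force_off,
              timeout_s=GRACEFUL_TIMEOUT_S, poll_s=1.0,
              clock=time.monotonic, sleep=time.sleep):
    """Request a graceful shutdown; force power off if it does not finish.

    is_down, request_graceful and force_off are caller-supplied callables
    (hypothetical hooks). Returns "graceful" or "emergency".
    """
    request_graceful()
    deadline = clock() + timeout_s
    while clock() < deadline:
        if is_down():
            return "graceful"
        sleep(poll_s)
    # Graceful shutdown did not complete in time: unsaved data may be lost.
    force_off()
    return "emergency"


if __name__ == "__main__":
    # Simulated blade that is already down when first polled.
    print(shut_down(lambda: True, lambda: None, lambda: None))
```

Passing the clock and sleep functions as parameters keeps the sketch testable without waiting five real minutes.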


ProLiant BL20p and BL20p G2

The standard features of the ProLiant BL20p servers include:


BL20p
Processor: Intel Pentium III FC-PGA processor 1.40GHz; upgradeable to dual processing; 512KB Level 2 cache; 133MHz bus
RAM Std/Max: 512MB PC133 ECC SDRAM Std/4GB max
Drive controller: Integrated Smart Array 5i with optional battery-backed write cache
NIC: Three 10/100 NICs, 2 upgradeable to 10/100/1000T; 1 dedicated iLO port

BL20p G2
Processor: 2.8GHz or 3.06GHz Intel Xeon processor; upgradeable to dual processing; 1MB Level 3 cache (3.06GHz only); 512KB Level 2 cache (3.06GHz, 2.8GHz); 533MHz front side bus
RAM Std/Max: 512MB ECC PC2100 DDR Std/8GB max
Drive controller: Integrated Smart Array 5i Plus with optional battery-backed write cache
NIC: Three 10/100/1000 NICs; 1 dedicated iLO port

BL20p and BL20p G2
Hard Drive Bays: Two 3.5" SCSI hot plug drive bays
Slots: No PCI slots; all features are integrated
Chassis: 1U x 6U form factor; plugs vertically into 6U server enclosure
Server mgmt: Integrated Lights-Out
Power: Rack-centralized external shared redundant hot-plug power
Server Blade Enclosure:
    10 bays available: 8 bays for server blades plus 2 outside bays for interconnect modules
    Server blades blind mate into the server blade enclosure backplane for power and data connections
    Up to 6 BL p-Class 6U server blade enclosures fit in a 42U rack
    Server blade management module attached to the back of each server blade enclosure to report events for all servers and provide asset and inventory information


ProLiant BL20p and BL20p G2 Enclosure Component Breakdown


ProLiant BL20p Service Considerations


The following service considerations apply to the ProLiant BL20p:
Setup message response: While installing a Service Pack, the following dialog box may display: "Your computer vendor installed this file on your computer. Do you want this Service Pack to replace this file?" Click the NO button. Do not overwrite Compaq or HP software when prompted while installing a service pack, unless instructed to do so.
Enclosure insertion precaution: When servicing the power and server blade enclosures, be aware that the server blade enclosure and the power enclosure do not have locking mechanisms that prevent them from sliding out of the rack while servicing.
iLO default address: The default address for iLO is 192.168.1.1. Ensure this address is not used before plugging a new blade into the network.
iLO port connect precaution: Do not connect the front iLO ports to a hub. All server blades have the same IP address through the diagnostic port. Multiple blades on a hub make the server blades indistinguishable on the network.
Blade removal procedure: When removing server blades, physically label each server blade to ensure it will be installed back into the same position in the enclosure.
Diagnostic cable connection: Connecting to the diagnostic port with the diagnostic cable automatically disables the iLO connection on the rear of the server blade.
Power up failure after blade insertion: If the server does not automatically power up and POST after being inserted in the enclosure, press and hold the power button on the front of the server blade for at least 6 seconds. If using the Virtual Power Button in iLO, always use the Press and Hold selection when powering up a server blade for the first time.
iLO and blade failure to respond: If both iLO and the blade are not responding, view iLO on an adjacent blade. If the adjacent blade appears normal, reset the out-of-service blade and iLO by operating the release lever and backing the blade out enough to disconnect power entirely from the blade for about 10 seconds. Re-install the blade and view iLO again. If iLO is still inoperable, remove the blade and test it in the diagnostic station (or replace the blade with a spare for testing purposes, or insert the blade into a spare blade slot in the enclosure, if any, and try again).
Common enclosure: The new ProLiant BL20p G2 blade fits into the same enclosure as the BL20p and BL40p blades and shares the same power.
SAN Connectivity: SAN connectivity on the ProLiant BL20p G2 is provided using a Dual Port Fibre Channel Mezzanine Card specifically designed for it. The card cannot be installed in the ProLiant BL20p G1.
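The iLO default address precaution amounts to checking 192.168.1.1 against the addresses already in use before a new blade is connected. This is a minimal sketch using Python's standard `ipaddress` module; the list of in-use addresses is a hypothetical input (for example, taken from DHCP or address-management records), and the function name is an assumption.

```python
import ipaddress

# Default iLO address stated in the service considerations above.
ILO_DEFAULT_ADDRESS = ipaddress.ip_address("192.168.1.1")


def default_address_in_use(addresses_in_use: list[str]) -> bool:
    """Return True if the iLO default address is already taken."""
    taken = {ipaddress.ip_address(address) for address in addresses_in_use}
    return ILO_DEFAULT_ADDRESS in taken


if __name__ == "__main__":
    # Hypothetical inventory of addresses already assigned on the segment.
    inventory = ["192.168.1.10", "192.168.1.1", "192.168.1.25"]
    if default_address_in_use(inventory):
        print("Conflict: free up or avoid 192.168.1.1 before connecting the blade")
```

Comparing parsed address objects rather than raw strings avoids false negatives from formatting differences in the inventory.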


ProLiant BL40p

The standard features of the ProLiant BL40p servers include:


Processor: Xeon MP 2.8GHz, 2.0GHz, 1.5GHz; 400MHz bus; 2MB Level 3 cache (2.8GHz, 2.0GHz only); 1MB Level 3 cache (2.0GHz, 1.5GHz); up to 4 processors
RAM Std/Max: 1GB PC2100 ECC DDR Std/12GB max (2p model); 512MB PC2100 ECC DDR Std/12GB max (1p model); Advanced memory protection with online spare
Drive controller: Integrated Smart Array 5i Plus with optional battery-backed write cache
NIC: Five 10/100/1000T Ethernet PXE-enabled connections; one dedicated iLO port
Hard Drive Bays: Four 3.5" SCSI hot plug drive bays
Slots: Two PCI-X slots for SAN connectivity
Chassis: Plugs vertically into p-Class server enclosure; up to 12 BL40p blades fit in a 42U rack
Server mgmt: Integrated Lights-Out
Power: Rack-centralized external shared redundant hot-plug power
Server Blade Enclosure: Four bays wide x 6U high form factor; plugs vertically into 6U server enclosure
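The 12-blades-per-rack figure follows from the p-Class enclosure data given for the BL20p: up to six enclosures per 42U rack, eight blade bays per enclosure, and a BL40p that is four bays wide. The sketch below works through that arithmetic; treating the BL20p as one bay wide is an assumption, and the resulting BL20p figure is derived here rather than quoted from the guide.

```python
ENCLOSURES_PER_42U_RACK = 6     # from the BL20p enclosure data
BLADE_BAYS_PER_ENCLOSURE = 8    # 10 bays minus 2 interconnect bays
BAYS_PER_BLADE = {
    "BL20p": 1,  # assumption: one bay per BL20p blade
    "BL40p": 4,  # four bays wide, per the table above
}


def max_blades_per_rack(model: str) -> int:
    """Return the maximum number of blades of one model in a 42U rack."""
    per_enclosure = BLADE_BAYS_PER_ENCLOSURE // BAYS_PER_BLADE[model]
    return per_enclosure * ENCLOSURES_PER_42U_RACK


if __name__ == "__main__":
    for model in BAYS_PER_BLADE:
        print(model, max_blades_per_rack(model))
```

Six 6U enclosures occupy 36U, which leaves the remaining rack space for the shared power enclosures.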


ProLiant BL40p Enclosure Component Breakdown


ProLiant BL40p Service Considerations


The following service considerations apply to the ProLiant BL40p:
Blade power up mode: Server blades (by default) may be configured to power up upon insertion; however, this setting can be changed through iLO to manual power-up using the power button. Use the setting on the iLO Rack Settings page called Enable Automatic Power On.
Blade removal procedures: If a server is removed for any reason, ensure a blank is inserted in its place. When removing server blades, physically label each server blade to ensure it will be installed back into the same position in the enclosure.
iLO default address: The default address for the iLO front port on all blades is always 192.168.1.1. Change this port to an unused address before plugging it into a network.


Cluster Line
HP's cluster line offers simple and affordable packaged clusters powered by ProLiant servers and Smart Array technology. The current offerings in the packaged cluster line include:

    ProLiant DL380 G3 Cluster
    ProLiant DL380 G3 Integrated Cluster
    ProLiant DL380 G2 Cluster
    ProLiant CL380

Objectives
To demonstrate an awareness of HP packaged cluster products, service personnel should be able to:

    Describe the features and characteristics of HP packaged cluster servers.
    Locate configuration and service information relative to each product.


ProLiant DL380 G2 and DL380 G3 Packaged Clusters

The standard features of the ProLiant DL380 G2 and DL380 G3 packaged clusters include:

DL380 G2
Servers: Two ProLiant DL380 G2 servers; server features listed under DL380 G2
Packaging: 8U configuration fixture
Storage: Smart Array Cluster storage

DL380 G3
Servers: Two ProLiant DL380 G3 servers; server features listed under DL380 G3
Packaging: 8U configuration fixture; 14U configuration fixture available for racked version
Storage: Smart Array Cluster storage; now supports U320 SCSI 10K and 15K rpm Universal Hard Drives; 4-Port Shared storage module option for the highest level of availability (multipath software included)

DL380 G2 and DL380 G3
Cables: Two VHDCI SCSI cables (one per server); Ethernet crossover cable (cluster heartbeat for MSCS)
Warranty: Three-year on-site limited warranty; Extended Pre-Failure Warranty covers processors, memory, and hard drives


ProLiant DL380 G2 and ProLiant DL380 G3 Packaged Cluster Component Breakdown


ProLiant DL380 G3 Packaged Cluster - Racked Component Breakdown

1. 14U rack
2. Two 2U Servers
3. 4U Shared Storage with 14 1" Hot Plug Drives
4. Open rack space for options (6U total)


ProLiant DL380 G2 and DL380 G3 Packaged Cluster Service Considerations


The following service considerations apply to the ProLiant DL380 G2 and DL380 G3 Packaged Cluster:
Configuration Utility: Do not use the Option ROM Configuration for Arrays (ORCA) utility to configure your servers and storage. The Array Configuration Utility (ACU) on the HP SmartStart for Servers CD must be used.
Server Configuration: Do not power on the storage system before configuring the servers. Configure only one server at a time. During server configuration, the ACU will configure the Smart Array Controller 5i for the server hard drives. The shared storage hard drives will be configured later.
Storage Configuration: You must power on the Smart Array Cluster Storage before powering on the servers. After powering on, wait until the storage system startup complete message appears on the display. (It may take up to two minutes for the system to completely power up.)
Fan Failure: ProLiant DL380 Generation 3 (G3) Servers or ProLiant DL380 Generation 3 (G3) Packaged Clusters may generate Power-On Self-Test (POST) error message 1611, indicating a fan failure. This error may occur if either of the two thumbscrews that secure the system board to the chassis (underneath the fan bracket) is not fully tightened.


ProLiant CL380 Cluster

The standard features of the ProLiant CL380 packaged cluster include the following (all features per server unless specified otherwise):
Processor: Intel Pentium III 1.0 GHz
Cache Memory: 256KB level 2 writeback cache per processor
Upgradeability: Upgradeable to dual processing
Memory: 128MB PC 133MHz registered ECC DRAM; maximum 4GB
Network Controller: Embedded NC3163 Fast Ethernet PCI 10/100 WOL for heartbeat monitoring; NC3123 Fast Ethernet PCI 10/100 for public LAN (occupies one PCI slot per server)
Expansion Slots: Four total, two available; three 64-bit/66MHz PCI 3.3V or universal cards (two available); one 32-bit/33MHz PCI 5V or universal cards (not available)
Storage Adapters: 64-bit dual channel Wide Ultra2 in PCI slot (interface to shared storage); Integrated Smart Array Controller (utilized for server boot)
Storage: 1.44MB diskette drive, 24x IDE CD-ROM, no hard drives; shared internal storage can accommodate six 1" Wide Ultra3 drives; optional non-shared internal storage can accommodate two 1" Ultra2/Ultra3 drives
Shared Storage: One RAID CR3500 controller ships standard, second controller optional (per cluster); up to six drives in cluster server cabinet (per cluster)
Interfaces: One parallel, two serial, mouse, keyboard, external SCSI (tape only), RJ45
Graphics: Integrated ATI RAGE IIC video controller with 4MB memory
Warranty: Three-year limited; pre-failure coverage of processors, memory, hard drives



ProLiant CL380 Cluster Component Breakdown


ProLiant CL380 Service Considerations


The following service considerations apply to the ProLiant CL380:
KVM Firmware: Be sure to update the KVM firmware when a new KVM switch is installed.
SCSI ID Limitation: The total number of shared storage drives that the system can support is limited to 14. This limitation is due to the number of available SCSI IDs and may not reflect the number of drive bays available if external storage is connected to the system. (SCSI IDs 6 and 7 are reserved for controllers.)
System Board Fan Adapter Jumper: When replacing the system board in one of the cluster nodes, be sure to remove the fan adapter jumper, located near Processor Power Module 1, and insert the fan adapter jumper onto the replacement system board. Failure to install this jumper on the system board will prevent the server node from powering up.
ASR-2 Restart: Before configuring ASR-2, verify that the System Configuration Utility and Diagnostics software are installed on the system partition. ASR-2 must have these tools to start HP Utilities after system restart. HP recommends this even if you configure ASR-2 to start the operating system. When you enable ASR-2 to restart into the operating system, Modem Dial-In Status, Network Status, and Modem Dial-Out Status are automatically disabled. In this mode, ASR-2 can page you if a critical error occurs, but you cannot access the server, and the server cannot dial out to a remote workstation.
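The 14-drive limit in the SCSI ID Limitation entry can be reproduced by counting the IDs left over on a shared bus. This is a minimal sketch, assuming a Wide SCSI bus with 16 IDs (0 through 15); the guide states only that IDs 6 and 7 are reserved for controllers, and the variable names are illustrative.

```python
WIDE_SCSI_IDS = range(16)        # assumption: Wide SCSI provides IDs 0-15
RESERVED_FOR_CONTROLLERS = {6, 7}

# IDs left over for shared storage drives once the controllers are excluded.
drive_ids = [scsi_id for scsi_id in WIDE_SCSI_IDS
             if scsi_id not in RESERVED_FOR_CONTROLLERS]

if __name__ == "__main__":
    print(len(drive_ids), "IDs available for drives:", drive_ids)
```

Because the limit comes from addressing rather than from physical bays, adding external drive enclosures does not raise it.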


HP Smart Array Products


Module 6

Introduction
This module gives an overview of HP Smart Array products. Topics include:
    Drive array technology
    RAID and fault tolerance
    HP Smart Array controller features
    HP Smart Array controller service considerations
    HP Smart Array controller configuration utilities

Objectives
To demonstrate knowledge of HP array products and utilities, service personnel should be able to:
    Describe the features and benefits of drive array technology.
    Explain how HP array controllers support RAID and fault tolerance.
    Describe the features and characteristics of current HP array controllers.
    Describe the key features of HP array configuration utilities.
    List general service considerations for array controllers.


Drive array technology


An array is a set of physical disk drives that may be combined or subdivided into logical drives distributed across all disks in the set. The advantages of a drive array implementation are:
Effective high-speed data transfer rates
Ability to handle simultaneous multiple requests
Increased storage capacity
Flexibility in configuring data
High reliability

Drive array configuration information is stored on the drives, on the system board NVRAM, and on the array controller NVRAM. This allows the controller to be changed without requiring reconfiguration. It further allows a set of configured hard drives to be moved from one machine to another without data loss.

Logical volume configuration


Many physical drives can be grouped to create a logical volume. In the following example, seven 4.3GB SCSI drives are configured as a single 30.1GB logical drive. With dual-channel array controllers, up to 14 physical drives can be configured as a single logical drive.

Drive array technology distributes data across a series of drives that have been configured as a single logical volume. This data distribution scheme makes it possible to access data from multiple drives more quickly than from any one physical drive. It also allows the arrayed drives to service multiple requests simultaneously.


Drive Array Features

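[Figure: Fixed disk drive array compared with a single drive. Blocks 1 to 4 are spread across the drives of the array, whereas the single drive holds blocks 1 to 4 in sequence.]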

In addition to having multiple drives logically configured as a single drive, drive arrays provide the following features:
Data striping across multiple drives. A file is divided into a selected number of sectors and then written across a series of drives. Writing (or reading) a file across multiple drives is much faster than using a single drive.
Multiple channels. The drive array has up to four channels that can be used at the same time, thus increasing performance.
Request processing. Because multiple commands can be issued across multiple devices, the commands can be processed at the same time, and the requests are processed in the most logical order (Tagged Command Queuing).


RAID levels supported by HP array controllers


RAID is an acronym applied to drive arrays, first described in a University of California at Berkeley paper entitled A Case for Redundant Arrays of Inexpensive Disks, published in 1987. Although there are several levels defined by the RAID model, HP array controllers support only six.

RAID level   Description
RAID 0       Data Striping without Parity
RAID 1       Disk Mirroring
RAID 2       Complex Error Correction
RAID 3       Parallel-Transfer, Parity Drive
RAID 4       Concurrent Access, Dedicated Parity Drive (Data Guarding)
RAID 5       Concurrent Access, Distributed Parity (Distributed Data Guarding)
RAID 1+0     Disk Mirroring and Data Striping without Parity
RAID ADG     Advanced Data Guarding with Two Sets of Parity

ADG is sometimes called RAID 6. The following pages describe these levels in detail.
NOTE: The HP RAID implementations are achieved at the hardware level. Some operating systems support RAID configurations implemented in software at the operating system level. Software implementations of RAID add overhead to the CPU and are less efficient than hardware implementations.
NOTE: RAID levels 2 and 3 (Complex Error Correction and Parallel-Transfer, Parity Drive) are no longer used in the industry.


RAID Level 0 Data Striping without Parity


In RAID 0, data striping without parity, a file is broken into stripes (or segments) and written across multiple disks. Striping unites multiple physical drives into a single logical drive. The logical drive is arranged so blocks of data are written alternately across all physical drives in the logical array. The number of sectors per block is referred to as the striping factor.

By definition, RAID 0 requires two or more drives for a true stripe set. However, with HP array controllers, a RAID 0 logical volume can be created with a single drive.

Depending on the array controller in use, the striping factor can be modified, usually with the manufacturer's system configuration utility. On HP controllers released before the Smart Array 3100ES, changes to stripe size are data destructive. With the implementation of the Performance Tuning Tool Set (PTTS), the 3100ES and newer controllers allow adjustment of stripe size on the fly. Changes to stripe size on all older array controllers require a complete backup operation before and a restore operation after the modifications have been made.
[Figure: Stripe 1 on Disk 0, Stripe 2 on Disk 1, Stripe 3 on Disk 2]

The above illustration shows how a file is broken into stripes (or segments) and then written across multiple disks. This greatly reduces disk latency (the amount of time a disk head has to wait for the target sector to move under the head). In addition, 100 percent of the disk space is available for data, and overall disk performance is improved.
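The way a striping factor maps logical sectors onto the drives of a RAID 0 array can be sketched in a few lines of Python. This is a minimal illustration only: it assumes 512-byte sectors, a striping factor of 32 sectors, four drives, and a simple round-robin layout. The function name locate_sector is hypothetical and does not represent the controller's firmware logic.

```python
SECTOR_BYTES = 512  # assumed sector size


def locate_sector(logical_sector, striping_factor, drive_count):
    """Return (drive index, sector offset on that drive) for a logical sector.

    Assumes blocks of `striping_factor` sectors are written to the drives
    in simple round-robin order (an illustrative RAID 0 layout).
    """
    block = logical_sector // striping_factor   # which stripe block
    drive = block % drive_count                 # drive that holds the block
    row = block // drive_count                  # earlier blocks on that drive
    offset = row * striping_factor + logical_sector % striping_factor
    return drive, offset


# A striping factor of 32 sectors corresponds to a 16 KB stripe block.
print(32 * SECTOR_BYTES // 1024)      # 16
print(locate_sector(0, 32, 4))        # (0, 0)
print(locate_sector(32, 32, 4))       # (1, 0)
print(locate_sector(130, 32, 4))      # (0, 34)
```

Consecutive blocks land on consecutive drives, which is why a large transfer can keep all drives busy at once.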


Striping Factor

[Figure: 64 KB of host data striped as four 16 KB blocks, one block per drive]

In this figure, the drive array is striping 64 KB of data across a four-drive array with no fault tolerance.

Striping unites multiple physical drives into a single logical drive. The logical drive is arranged so blocks of data are written alternately across all physical drives in the logical array. The number of sectors per block is referred to as the striping factor.

Depending on the array controller in use, the striping factor can be modified with the Array Configuration Utility. Any change to the logical volume geometry (such as striping factor, volume size, or RAID level) may be data destructive. Changes such as these require a complete backup operation before and a restore operation after the modifications have been made.

Example
If the striping factor is 32, intelligence in the array controller writes 32 sectors to one physical drive and 32 sectors to the next physical drive in the array. Cycling continues through the drives until the write is complete. Since a sector is 512 bytes, a striping factor of 32 is equivalent to a stripe block of 16 KB.

Limitations
Data striping is faster than conventional file writing to a single disk; however, there is no fault protection should a drive fail. In the above illustration, if disk 1 should fail, the entire file could not be retrieved, nor could additional information be written to the drives.

As more drives are added to the array, the potential for drive failure rises. For example, calculating the Mean Time Between Failure (MTBF) for a physical disk and then for a RAID 0 implementation yields interesting results.


If the MTBF of a single drive is 200,000 hours, the MTBF of an array with five similar drives is figured as 200,000 divided by 5, for a total array MTBF of 40,000 hours. The number drops simply because there are more physical spindles that are subject to failure. Therefore, a RAID 0 implementation is not suited for fault-tolerant environments.
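The MTBF estimate above can be reproduced with a short Python sketch. It uses the same simple model as the text (single-drive MTBF divided by the number of drives) and assumes identical drives that fail independently. The function name array_mtbf is illustrative.

```python
def array_mtbf(drive_mtbf_hours, drive_count):
    """Approximate MTBF, in hours, of an array with no fault tolerance.

    Simple model from the text: any single drive failure takes the
    array down, so the array MTBF is the drive MTBF divided by the
    number of drives.
    """
    if drive_count < 1:
        raise ValueError("drive_count must be at least 1")
    return drive_mtbf_hours / drive_count


for drives in (1, 2, 5, 14):
    print(drives, "drives:", round(array_mtbf(200_000, drives)), "hours")
```

With five drives the sketch returns 40,000 hours, matching the figure quoted above.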


RAID Level 1 Disk Mirroring


With RAID 1, data is written to two separate mirrored drives. If a drive should fail, the mirrored drive is the safeguard. RAID 1 requires an even number of drives, with a maximum of 30 when connected to dual-channel controllers supporting the Wide-Ultra SCSI-3 or Wide-Ultra2 protocols. Drives must also be added in pairs to achieve a RAID 1 expansion. RAID 1 mirrors the entire data structure on different drives, and allows split seeks. When reading data from the drives, the drive or drives with the requested data nearest to the read/write heads will be read. This improves read performance slightly.
[Figure: Mirrored data. Stripes 1 to 4 are written identically to Disk 0 and Disk 1.]

Limitations
Although RAID 1 is a viable fault-tolerant solution, it is an expensive one because it requires twice as much drive storage (only 50 percent of the total disk space is available for data storage).
NOTE: The HP implementation of drive mirroring is done in hardware. Drive mirroring can also be implemented in software at the operating system level. However, software mirroring adds overhead to the CPU and is often less efficient than hardware mirroring.


RAID Level 4 Data Guarding


In RAID 4, data is striped to multiple drives, its parity sum is calculated, and the parity sum is written to a dedicated parity drive. In this fault-tolerant state, if a drive were to fail, the data stored on that drive could be reconstructed from the parity drive and the remaining data drives and returned when needed.

RAID 4 requires a minimum of three drives and a maximum of 30 (known as an N+1 combination, where N is the number of drives used for data plus an additional drive for parity). Regardless of the number of physical drives used in the array, a single drive is used for parity.
[Figure: File striped across multiple disks, parity sum written to a dedicated drive. Stripes 1 to 3 are on Disks 0 to 2; the parity is on Disk 3, the dedicated parity drive.]

Limitations
The biggest limitation of RAID 4 is the time required to encode the parity information and then access a single dedicated parity drive to store the information. Although RAID 4 provides fault tolerance, it requires a dedicated parity drive.
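The parity mechanism can be illustrated with a short Python sketch. It assumes the parity sum is a byte-wise XOR of the data stripes, which is the conventional parity scheme for RAID 4 and RAID 5; the guide itself does not state the exact calculation the controllers use. The function names and sample data are hypothetical.

```python
from functools import reduce


def parity_block(blocks):
    """XOR equal-length blocks byte by byte to form a parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_block(surviving_blocks, parity):
    """Recompute the one missing data block from the survivors and the parity."""
    return parity_block(list(surviving_blocks) + [parity])


data = [b"AAAA", b"BBBB", b"CCCC"]   # stripes on Disks 0 to 2
parity = parity_block(data)          # stored on the dedicated parity drive

# Simulate the loss of Disk 1 and rebuild its stripe.
recovered = rebuild_block([data[0], data[2]], parity)
print(recovered == data[1])          # True
```

Every write must update the parity block as well, which is why a single dedicated parity drive becomes the bottleneck described above.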


RAID Level 5 Distributed Data Guarding


In RAID 5, data is striped across multiple drives, its parity sum is calculated, and the parity sum is also striped across multiple drives (not a dedicated parity drive). This improves performance because parity accesses are spread across all drives instead of being directed at a single parity drive.
Disk 0      Disk 1      Disk 2      Disk 3
Stripe 1    Stripe 2    Stripe 3    Parity
Stripe 4    Stripe 5    Parity      Stripe 6
Stripe 7    Parity      Stripe 8    Stripe 9
Parity      Stripe 10   Stripe 11   Stripe 12

File striped across multiple disks, parity sum also written across multiple disks

RAID 5 is best suited for I/O-intensive applications and transaction processing, making it an ideal solution for high-performance, fault-tolerant servers.

The biggest limitation of RAID 5 is the increased read time after a drive failure. Regardless of which disk fails, the missing data has to be recalculated from the remaining disks on each read.

RAID 5 has the same drive requirements as RAID 4, except that the space used for parity is distributed across all the drives in the volume.
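The distributed parity placement can be sketched in Python. The rotation order used here (parity moving one drive to the left on each row) is an assumption chosen to match the four-disk illustration above; actual controllers may rotate parity differently. The function name raid5_layout is hypothetical.

```python
def raid5_layout(rows, drive_count):
    """Build a table of stripe labels with parity rotated across the drives."""
    table = []
    stripe = 1
    for row in range(rows):
        # Assumed rotation: parity starts on the last drive and moves left.
        parity_drive = (drive_count - 1 - row) % drive_count
        cells = []
        for drive in range(drive_count):
            if drive == parity_drive:
                cells.append("Parity")
            else:
                cells.append(f"Stripe {stripe}")
                stripe += 1
        table.append(cells)
    return table


for cells in raid5_layout(4, 4):
    print("  ".join(f"{cell:<10}" for cell in cells))
```

Each row of the output lists what Disk 0 through Disk 3 hold for that row, so no single drive carries all of the parity traffic.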


RAID Level 1+0

Disk 0      Disk 1      Disk 2      Disk 3
Chunk 1     Chunk 2     Chunk 1     Chunk 2
Chunk 3     Chunk 4     Chunk 3     Chunk 4

RAID 1+0 is a combination of striping and mirroring data. RAID 1+0 writes data across the drives in the same fashion as RAID 0 and achieves redundancy by mirroring data in a manner similar to RAID 1. Unlike RAID 1, the data disks are also the mirror disks. RAID 1+0 mirrors data back onto the data disks rotated by one drive. An odd number of drives can be used in a RAID 1+0 configuration, whereas RAID 1 requires an even number of drives.

You can continue to access data in a RAID 1+0 configuration with a single drive failure or multiple drive failures. As long as one drive of each mirrored pair is functioning, the set will function.
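The survival rule in the last sentence can be expressed as a small Python check. The pairing used in the example (Disk 0 with Disk 2, Disk 1 with Disk 3) is an assumption read from the four-disk illustration above; the function name array_survives is hypothetical.

```python
def array_survives(mirror_pairs, failed_drives):
    """Return True if every mirrored pair still has at least one working drive."""
    failed = set(failed_drives)
    return all(not (a in failed and b in failed) for a, b in mirror_pairs)


# Assumed pairing: Disks 0 and 2 hold the same chunks, as do Disks 1 and 3.
pairs = [(0, 2), (1, 3)]

print(array_survives(pairs, [0]))       # True: Disk 2 still holds the data
print(array_survives(pairs, [0, 1]))    # True: one drive of each pair remains
print(array_survives(pairs, [0, 2]))    # False: both copies of a pair are lost
```

Whether a second failure is survivable therefore depends on which drive fails, not just on how many have failed.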


RAID 6 Advanced Data Guarding (ADG)


RAID ADG delivers high fault tolerance similar to RAID 1 while keeping capacity utilization high like RAID 5. It protects data from multiple drive failures with an ability to withstand two simultaneous hard drive failures without data loss or downtime. To accomplish this, RAID ADG increases to two the number of sets of parity striped across the disks. This method results in protection for an array with as many as 56 drives while requiring the capacity of only two drives to store parity information.

While RAID ADG provides the dual advantages of increased fault tolerance and high capacity, it does so at the cost of performance that is less than that of other RAID levels. Performance equals that for RAID 5 when reading data but is slower when writing due to the extra parity data.

The decision tree shown below illustrates the factors that should be considered in choosing the RAID level to use in a given situation. RAID ADG is the choice when there is a need for high fault tolerance in an environment that requires high capacity. Although RAID 5 can handle up to 14 drives, HP recommends considering RAID ADG when the number of drives exceeds eight.
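The capacity trade-off between the RAID levels in this module can be compared with a short Python sketch. It relies only on the overheads stated in the text: none for RAID 0, half the drives for mirroring, one drive's capacity for RAID 4 and RAID 5, and two drives' capacity for RAID ADG. The function name and the level labels are illustrative, and the sketch ignores controller-specific limits on drive counts.

```python
def usable_fraction(raid_level, drive_count):
    """Fraction of total drive capacity available for data."""
    if raid_level == "0":
        data_drives = drive_count
    elif raid_level in ("1", "1+0"):
        data_drives = drive_count / 2      # every block is stored twice
    elif raid_level in ("4", "5"):
        data_drives = drive_count - 1      # one drive's worth of parity
    elif raid_level == "ADG":
        data_drives = drive_count - 2      # two drives' worth of parity
    else:
        raise ValueError(f"unsupported RAID level: {raid_level}")
    return data_drives / drive_count


for level in ("0", "1+0", "5", "ADG"):
    print(f"RAID {level:<4} 8 drives: {usable_fraction(level, 8):.1%}"
          f"   14 drives: {usable_fraction(level, 14):.1%}")
```

The parity overhead of RAID ADG shrinks as drives are added, which fits the recommendation to consider it for larger arrays.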


Smart Array controllers


Smart Array controller overview
HP SCSI Managed Array Technology (SMART) Array controllers provide hardware-level RAID support. The SMART Array controller accepts write commands, calculates any parity data, decides where the data and the parity data are to be written, then manages the writing of that data. Taking that overhead away from the operating system and giving it to the array controller speeds read and write operations. This hardware also provides fault-tolerant features that protect data integrity.

Smart Array controller history


Compaq was the pioneer of RAID controllers in Intel-based servers. It started in 1989 with the IDA controller, which used Conner IDE drives. After the release of the SCSI-2 specification, Compaq chose SCSI as the preferred technology for hard disk drives and RAID controllers in 1992. Following is the timeline for the introduction of Compaq/HP Smart Array controllers from 1992 through the present.

Chronology
1992  SMART (EISA / 2 Fast SCSI channels)
1996  SMART 2/E (EISA / 2 Wide Ultra SCSI channels)
1996  SMART 2/P (PCI / 2 Wide Ultra SCSI channels)
1996  SMART 2SL (low cost SMART, 1 Wide Ultra SCSI channel)
1997  SMART 2DH (SMART 2/P with larger 16 MB cache, 2x Wide Ultra SCSI)
1998  Smart Array 3200 (logical drive expansion, RAID conversion, 2x WU2)
1998  Smart Array 3100ES (hot-plug version for ProLiant 6000/7000, 3x Wide Ultra)
1999  Smart Array 4200 (Quad channel Wide Ultra2 SCSI)
1999  Smart Array 4250ES (hot-plug version for ProLiant 8000/ML750, 3x WU)
2000  Smart Array 431 (Single channel, low cost, Ultra3 SCSI)
2000  RAID LC2 (low cost, single channel WU2, data compatible with SMART)
2000  Smart Array 5304 (Quad channel Ultra3, RAID ADG)
2000  Smart Array 5302 (2x Ultra3, upgradeable: RAID ADG, SAN module, 4 channels)
2001  Smart Array 532 (low cost, dual channel Ultra3 SCSI)
2002  Smart Array 5312 (133 MHz PCI-X, dual channel Ultra3, no ADG)
2003  Smart Array 641 (PCI-X to single channel Ultra320 SCSI)
2003  Smart Array 642 (PCI-X to dual channel Ultra320 SCSI)
2003  Smart Array 6402 (PCI-X to dual channel, 128 MB cache, Ultra320 SCSI, ADG)
2003  Smart Array 6404 (PCI-X to quad channel, 256 MB cache, Ultra320 SCSI, ADG)


Smart Array controller features


HP Smart Array controllers have a number of features that enhance their performance, reliability and serviceability. The following is a list of features that are represented within the product line. Consult individual controller specifications for those that apply to a particular model.

Inter-generation data compatibility for ease of migration
Standard configuration and management tools across product line
Automatic data transfer from a failed drive to an online spare
Redundant ROM protection against firmware image corruption
Pre-failure notification of impending hard disk failure
Capacity expansion allows the addition of drives to an existing array
Volume extension increases the space on an existing logical drive
RAID migration allows online reconfiguration to a new level of fault tolerance
Stripe size migration to tune performance


Smart Array 6402/6404


The Smart Array 6400 products provide maximum performance, flexibility, and data protection for HP ProLiant servers through a unique modular design and support for Advanced Data Guarding (RAID ADG).
Ultra320 SCSI technology delivers up to 320 MB/s bandwidth per channel.
The 64-bit, 133 MHz PCI-X interface boosts bandwidth above a 1 GB/s burst transfer rate over the PCI-X bus.

Enhanced RAID engine
Smart Array 6402/6404 controllers support up to 2 TB RAID volumes under some operating systems, including Windows 2000. A two-channel controller can have a maximum of twenty-eight 146 GB hard disk drives for a total of 4 TB of storage.

DDR battery-backed write cache architecture
In the event of a controller failure or server failure, the battery-backed cache can be removed and placed on another controller board; the cached data will be flushed to the disk drives. Cache batteries provide up to four days of battery life. This design offers redundant and replaceable battery packs for increased data protection and better serviceability. There are two cache sizes available: 128 MB and 256 MB. Standard cache size for the SA-6402 is 128 MB and for the SA-6404 is 256 MB. The SA-6402 controller is upgradeable to 256 MB of cache.

Recovery ROM
Recovery ROM provides a redundancy feature that protects from a ROM failure. A new version of firmware can be flashed to the ROM while the controller maintains the last known working version of firmware. If the firmware becomes corrupt, the controller will revert to the previous version of firmware and continue operating.

Internal and External Connectors
The SA-6402 controller has two internal and two external SCSI connectors to support both internal and external drives. Each channel shares an internal and an external connector. For each SCSI channel, you can choose to use the internal connector or the external connector, but you cannot use the internal and external connectors of the same channel at the same time.
The SA-6402 controller uses Very High Density Cable Interconnects (VHDCI) for the external SCSI buses and a high-density 68-pin connector for the internal SCSI buses. The same connectors are found on the StorageWorks Enclosure 4214 and 4314.

Backward compatibility
Upon reaching the limitation of the SMART-2SL, SMART-2DH, SA221, SA3200, SA4200, SA431, SA532 or the SA5300, customers can easily replace them with the SA-6400 to increase data performance, availability and capacity. To replace the existing controller, the customer upgrades the firmware, shuts down the server, and replaces the old controller with the new controller. The new SA-6400 array controllers will seamlessly recognize the disk drives, RAID configuration and data.

Smart Array 641/642


The Smart Array 641 and 642 are both entry-level, sixth-generation Smart Array controllers. The SA642 is essentially the SA641 with an additional SCSI channel. The SA641 has a single channel with an internal port, while the SA642 has two channels with both an internal and an external port. The additional channel allows the SA642 to have up to 20 drives (6 internal and 14 external), while the SA641 is limited to 6 internal drives.

Both controllers use the Ultra320 SCSI protocol, which is backward compatible with Ultra2 and Ultra3 drives. The controllers achieve a maximum bandwidth of 320 MB/s per channel, and with the 64-bit, 133-MHz PCI-X interface, bandwidth over the PCI-X bus can exceed a 1 GB/s burst transfer rate.

The SA641 and SA642 come with 64 MB of DDR memory cache; a 64-MB module upgrade to 128 MB of Battery-Backed Write Cache (BBWC) is available. BBWC provides up to 3 days of redundant battery life and is removable for easy replacement.

Feature Summary
Modular, easy-to-upgrade design lets you optimize performance as needed, from 64-MB to 128-MB battery-backed cache.
Battery-backed cache protects cached data in the event of a power outage, server failure or controller failure, and redundant, replaceable batteries take that protection even further.
Ultra320 SCSI technology delivers high performance and data bandwidth of up to 320 MB/s per channel.
Dual channels (SA-642 only) provide up to 1.5 TB of storage with 20 drives.
Mix-and-match LVD SCSI compatibility protects your investments and lets you deploy drives as needed.
64-bit, 133 MHz PCI-X interface boosts bandwidth above a 1 GB/s burst transfer rate using the PCI-X bus.
64-bit memory addressing supports servers with greater than 4 GB of memory.
Online management features: capacity expansion, RAID level migration, stripe size migration, online spares (global), user selectable read/write cache and user selectable expand and rebuild priority.
Hot plug tape support (AIT100/200, 50/100, 35/70; DAT 20/40)
Multiple logical drives per array
S.M.A.R.T. support (Drive Pre-Failure Warranty)
Auto-Reliability Monitoring (ARM)
Dynamic Sector Repair
Background Parity Initialization

Smart Array 5312


The Smart Array 5312 controller raised the standard to higher performance levels than its predecessors with several enhancements to the memory architecture and RAID engine. Designed and tested with industry-standard ProLiant servers for greater reliability, this controller is ideal for the distributed workgroup server or centralized departmental server. Like other Smart Array controllers, the SA5312 offers complete data compatibility with previous generations of Smart Array controllers for easy data migration and upgradeability.

Feature Summary
Modular, easy-to-upgrade design lets you optimize performance as needed, from 128-MB to 256-MB battery-backed cache.
High-performance, fifth-generation architecture offers the new hardware RAID engine and a new memory architecture for increased performance over previous controllers.
Ultra3 SCSI technology delivers high performance and data bandwidth of up to 160 MB/s per channel.
Dual channels provide the ability to support up to 2 TB with 28 drives.
Mix-and-match LVD SCSI compatibility protects your investments and lets you deploy drives as needed.
Battery-backed cache protects cached data in the event of a power outage, server failure or controller failure, and redundant, replaceable batteries take that protection even further.
64-bit, 133 MHz PCI-X interface boosts bandwidth above a 1 GB/s burst transfer rate over the PCI-X bus.
64-bit memory addressing supports servers with greater than 4 GB of memory.
Online management features: capacity expansion, RAID level migration, stripe size migration, online spares (global), user selectable read/write cache and user selectable expand and rebuild priority.

Note: The Smart Array 5312 is basically a PCI-X version of the SMART 5302. The 5312, however, does not support the ADAM module (required for RAID ADG) and cannot be upgraded to four ports or with a SAN module.


Smart Array 5304 and 5302


The Smart Array 5300 series of high-performance Ultra3 array controllers provides reliable data protection for HP ProLiant servers and offers new levels of flexibility with Advanced Data Guarding (RAID ADG) technology.

Feature Summary
A new hardware RAID engine and new performance memory architecture significantly improve performance over previous controllers.
Modular design allows optimizing performance and increasing capacity from two to four channels with 32, 64, 128, or 256 MB battery-backed cache.
RAID ADG (Advanced Data Guarding) delivers high fault tolerance similar to RAID 1 while keeping capacity utilization high like RAID 5.
    This feature protects data from multiple drive failures while requiring only the capacity of two drives to store parity information. This higher level of protection is ideal where large logical volumes are required.
    RAID ADG can withstand two simultaneous hard drive failures without data loss or downtime - twice as many as RAID 5.
    This is a standard feature of the Smart Array 5304/128 and an option for the SA-5302/64 and SA-5302/32 models. RAID ADG requires a minimum of 64 MB battery-backed cache.
Ultra3 SCSI delivers up to 160 MB/s bandwidth per channel, and up to 4 channels provide the highest storage capacity per PCI slot in the industry.
Mix-and-match LVD SCSI compatibility protects your investments and allows drives to be deployed as needed.
Battery-backed cache protects cached data during power outages, server failure or controller failure, and uses redundant batteries.
A 64-bit, 66 MHz PCI interface boosts bandwidth up to a 533 MB/s total transfer rate.
64-bit memory addressing supports servers with greater than 4 GB of memory.
Online management features: capacity expansion, RAID level migration, stripe size migration, online spares (global), user selectable read/write cache and user selectable expand and rebuild priority.




Recovery ROM
Upgradeable firmware - 2 MB flashable ROM
Support for HP universal hot-plug tape drives
SAN Access, the industry's first integrated SCSI controller and Fibre Channel SAN adapter, offers:
    Centralized, consolidated backup solutions
    Incremental SAN-based primary storage for groups of 5-10 servers driving Direct Attach Storage (SCSI) with Smart Array 5302 Controllers

Notes:
1. The Smart Array 5304 supports RAID ADG as standard. ADG resembles RAID 5 but requires the capacity of two drives for parity and can handle the simultaneous failure of two drives without data loss. The cache module uses AECC technology, which can handle the failure of a single memory chip without causing data loss or system interruption.
2. The SMART 5302 can be upgraded to a 5304 by adding a 2-channel Ultra3 module. The SMART 5302 controller requires a hardware upgrade to enable RAID ADG (a minimum of 64 MB cache and the ADAM module). ADAM is the ADG Activation Module. The batteries of the cache module are redundant.
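The bandwidth figures quoted for these controllers can be checked with simple arithmetic, sketched here in Python. The sketch assumes one transfer per clock cycle and nominal bus clocks of 66.66 MHz and 133.33 MHz; both are assumptions about how the quoted 533 MB/s and 1 GB/s figures are derived, and the function name is hypothetical.

```python
def bus_bandwidth_mb_s(width_bits, clock_mhz):
    """Peak bus transfer rate in MB/s, assuming one transfer per clock."""
    return width_bits / 8 * clock_mhz


# 64-bit, 66 MHz PCI (Smart Array 5300 series and SA-532)
print(round(bus_bandwidth_mb_s(64, 66.66)))     # 533

# 64-bit, 133 MHz PCI-X (Smart Array 5312 and 6400 series)
print(round(bus_bandwidth_mb_s(64, 133.33)))    # 1067

# Aggregate SCSI bandwidth of four Ultra3 channels at 160 MB/s each
print(4 * 160)                                  # 640
```

Under these assumptions, four fully loaded Ultra3 channels (640 MB/s) could in principle exceed what a 64-bit, 66 MHz PCI slot can carry (533 MB/s), so the host bus rather than the SCSI channels would set the upper limit.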


Smart Array 532


The Smart Array 532 Controller (SA-532) is a 64-bit, 66 MHz dual SCSI channel PCI array controller for entry-level, hardware-based fault tolerance. Utilizing both SCSI channels (1 internal and 1 external) of the SA-532 allows up to 28 hard drives to provide up to 2 TB of storage per PCI slot. The SA-532 provides high reliability and increased performance over the Smart Array 431. In addition, the SA-532 is data compatible with all Wide Ultra3 and Wide Ultra2 drives and servers. The SA-532 is supported only in 3.3-volt PCI slots of all Ultra2 and Ultra3 servers - it is not supported in 5-volt PCI slots.

The SA-532 provides performance and capacity for RAID array controllers in entry-level and workgroup HP ProLiant servers or in any ProLiant server where hardware RAID is needed at an entry-level price point.

Feature Summary
Compatibility with all Ultra2 and Ultra3 LVD family products
Recovery ROM protects against ROM corruption
Ultra3 SCSI technology delivers high performance and data bandwidth of up to 160 MB/s per channel
Mix-and-match LVD SCSI compatibility protects your investments and lets you deploy drives as needed
Dual SCSI channels allow for up to 2 TB of storage per server slot
Software consistency among all Smart Array family products: Array Configuration Utility XE (ACU-XE), Array Configuration Utility (ACU), Insight Manager (IM), Array Diagnostic Utility (ADU) and SmartStart
64-bit, 66 MHz PCI interface boosts bandwidth up to a 533 MB/s total transfer rate
64-bit memory addressing supports servers with greater than 4 GB of memory
3.3-volt slot support only (provides the latest in low-voltage, 64-bit support)
32 MB of memory optimizes performance and data throughput
Pre-Failure Warranty support for hard disk drives (requires Insight Manager)

Notes:
1. If a SMART 221 or 2SL is upgraded with a SMART 532, some features (RAID level migration, drive expansion, stripe set migration) cannot be used. Enabling these features requires deleting the existing arrays and creating new arrays.
2. The SMART 532 does NOT support all single ended SCSI devices. The ProLiant Storage enclosures F1, F2, U1, U2 and UE are not supported by the SMART 532.


Smart Array 5i and 5i plus


Embedded Smart Array 5i is an intelligent array controller for entry-level, hardware-based fault tolerance. Dual Ultra3 SCSI channels (one internal and one external) allow support of all Ultra2 and/or Ultra3 SCSI internal hard disk drives as well as up to 14 Ultra2 and/or Ultra3 SCSI external hard disk drives using the StorageWorks Enclosure 4300 Family. Smart Array 5i is a cost-effective alternative to software-based RAID. Smart Array 5i Plus is the upgraded version of Smart Array 5i with 64 MB of cache and the ability to add battery-backed write cache (BBWC).

The Smart Array 5i controller is designed and integrated with the ProLiant DL380 G2 server and is an optional upgrade for the ProLiant ML370 G2 server. Smart Array 5i Plus is standard on the DL 580 G2 server and comes standard with the BBWC Enabler on the 2P models. BBWC Enabler is an optional upgrade for the DL 580 G2 1P Model servers. DL 380 G2 and ML 370 G2 customers will be able to purchase the bundled Smart Array 5i Plus and BBWC Enabler option kit, which will replace the 5i controller.

Feature Summary for Smart Array 5i and 5i Plus
Increased performance over the Integrated Smart Array Controller
Transportable, hardware-based Wide Ultra3 SCSI RAID controller, with 32 MB of memory (64 MB on the Smart Array 5i Plus)
More robust and easier to use than software-based RAID
Compatibility with all Ultra2 and Ultra3 LVD family products and a seamless upgrade to next generation Ultra3 Smart Array controllers
32-bit architecture
Dual SCSI channels (one internal and one external)
Universal hot-plug tape drive support and native SCSI pass through for tape backup
Recovery ROM protects against ROM image corruption

Smart Array 5i vs. Integrated Smart Array controller (ROC)
The main advantages of the Smart Array 5i Controller over ROC include:
Dual Ultra3 SCSI channels (1 internal and 1 external)
32 MB of cache memory used for code, transfer buffers, and non-battery backed read cache
Recovery ROM to protect against ROM image corruption

Note: Because the Integrated Smart Array Controller is embedded on the system board and cannot be removed, you cannot upgrade it to the Smart Array 5i Controller, but you can move any hard drives to a Smart Array 5i controller.

Smart Array 4200


The HP Smart Array 4200 Controller was the first of a new generation of highperformance array controllers. Based on a 64-bit architecture, this 4-channel controller has two major enhancements over previous generations of mainstream Smart Array products: 1. Four channels, providing two customer benefits:

High performance - 320 MB/s parallel SCSI bandwidth High capacity - 4 SCSI channels with (14) 18.2-GB drives per channel externally = 1 TB of external storage.
Note: Requires the use of the StorageWorks Enclosure Model 4214R or 4214T products.

2. I/O architecture benchmarked at up to 4x the current I/Os per second of the award-winning SA-3200 controller. This architecture includes:

Super-scalar RISC processor for greater processing power Split memory architecture for greater internal bandwidth 64-bit PCI standard for greater system bus bandwidth

Feature summary

4 external SCSI ports; 2 internal SCSI ports Over 1 TB of external storage per server slot supported 64-MB1 ECC protected, battery-backed and removable, cache daughter board for up to 4 days of data protection
Note: 56 MB useable for Read/Write cache, 8 MB used for transfer buffer and scripts memory.

Ultra2 SCSI: up to 320 MB total SCSI band-width Performance architecture 64-bit PCI design Data compatible with previous Smart Array family products-seamless upgrade from previous generations of Smart Array family controllers. Online management features: capacity expansion, RAID level migration, stripe size migration, online spares (global), user selectable read/write cache and user selectable expand and rebuild priority

Note: The SMART 4200 is basically a SMART 3200 with two additional SCSI channels. As a result of using 64-bit PCI technology and a 64-bit RISC processor the SMART 4200 has a higher performance than the SMART 3200.
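The capacity, bandwidth and cache figures quoted for the Smart Array 4200 can be cross-checked with simple arithmetic. The sketch below is illustrative only: the constants are taken from the text above, and the function and variable names are made up for this example.

```python
# Figures quoted in this section for the Smart Array 4200.
CHANNELS = 4
DRIVES_PER_CHANNEL = 14
DRIVE_SIZE_GB = 18.2
TOTAL_BANDWIDTH_MB_S = 320
CACHE_TOTAL_MB = 64
CACHE_READ_WRITE_MB = 56
CACHE_BUFFER_MB = 8


def external_capacity_gb(channels: int, drives_per_channel: int, drive_gb: float) -> float:
    """Raw external capacity with every channel fully populated by identical drives."""
    return channels * drives_per_channel * drive_gb


capacity_gb = external_capacity_gb(CHANNELS, DRIVES_PER_CHANNEL, DRIVE_SIZE_GB)
per_channel_mb_s = TOTAL_BANDWIDTH_MB_S / CHANNELS
cache_split_ok = CACHE_READ_WRITE_MB + CACHE_BUFFER_MB == CACHE_TOTAL_MB

print(f"External capacity: {capacity_gb:.1f} GB")
print(f"Bandwidth per channel: {per_channel_mb_s:.0f} MB/s")
print(f"Cache split adds up to {CACHE_TOTAL_MB} MB: {cache_split_ok}")
```

The capacity works out to roughly 1019 GB, which is the "1 TB" quoted above, and the per-channel share of the bandwidth matches the 80 MB/s Wide Ultra2 rate given elsewhere in this module.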


Smart Array 431


Smart Array 431 Controller (SA-431) is an intelligent 64-bit array controller for entry-level hardware-based fault tolerance with support for up to 14 Wide Ultra3 SCSI hard drives. SA-431 provides a cost-effective alternative to software-based RAID. SA-431 is also data compatible with all Wide Ultra3 and Wide Ultra2 drives and servers.

SA-431 has two major advantages over previous generations of entry-level Smart Array controllers:

1. SA-431 uses Wide Ultra3 in the HP LVD Family of products. Since Wide Ultra3 is data compatible with Wide Ultra2 drives and servers, it provides customers investment protection for those products.
2. SA-431 achieves greater ease of use through online capacity expansion, which increases fault tolerant logical drive capacity, and online RAID level migration, which reduces downtime during reconfiguration.

Features summary

Compatibility with all Wide Ultra2 and Wide Ultra3 LVD family products and a seamless upgrade to next generation Wide Ultra3 Smart Array controllers
Software consistency with all Smart Array family products including ACU-XE, ACU, IM, ADU and SmartStart
Wide Ultra3 SCSI with up to 160 MB/s SCSI bandwidth
64-bit PCI bus design
Up to 254.8 GB of storage per server slot
16 MB of controller cache is used to create a high performance engine and optimize data throughput
16 MB of DRAM used for code, transfer buffers, and non-battery backed read cache

Notes:

1. To utilize capacity expansion and RAID level migration features when upgrading or migrating from previous generation Smart Array controllers, the customer must perform a backup of the existing data and recreate logical drives using the Smart Array 431 controller.
2. The SMART 431 does NOT support all single ended SCSI devices (Wide Ultra and Fast Wide). The ProLiant Storage enclosures F1, F2, U1, U2 and UE are not supported by the SMART 431.
3. If a SMART 221 or 2SL is upgraded with a SMART 431, some features (RAID level migration, drive expansion, stripe set migration) cannot be used. Enabling these features requires deleting the existing arrays and creating new arrays.


Integrated Smart Array a.k.a. RAID on a chip (ROC)


The Integrated Smart Array Controller is an intelligent array controller for entry-level, hardware-based fault tolerance with support for all Ultra2 SCSI internal hard drives. The Integrated Smart Array controller provides a cost-effective alternative to software-based RAID. Designed and integrated with Compaq ProLiant DL360, DL380, ML570, DL580, ProLiant 8500, and ProLiant 8500 Data Center solution servers, the Integrated Smart Array controller provides data protection for all server internal storage needs.

Feature summary

Better performance and ease of use than software-based RAID
Standard only on the ProLiant DL360, DL380, DL580, ProLiant 8500, and ProLiant 8500 Data Center Solution servers; also available as an optional upgrade for the ProLiant ML370 and ML570 servers
Embedded hardware RAID controller
Wide Ultra2 SCSI
32-bit architecture
Fault tolerant RAID supported for all internal hard disk drives
Dual Channel Wide Ultra2 SCSI performance; 80 MB/s per channel
   Channel 1: internal disk drive cage
   Channel 2: external for tape drive support including Hot Plug tape options
Supports a maximum of four 1" Wide Ultra2/Ultra3 SCSI Hot Plug drives
8 MB Read Cache (Write Cache not available due to lack of battery backup)
Support for RAID 0, 1+0, and 5 from Channel 1 only. Channel 2 supports standard SCSI operation only.
ROC does not support Tape Libraries or Tape AutoLoaders from either Channel.
Only external tape devices are supported for Channel 2. Internal tape devices are not supported: the Channel 2 external connector is VHDCI.
LVD interface required by Wide Ultra2/Ultra3 SCSI protocol
Supports migration to higher performance PCI Smart Array Controllers
Backward compatible with Fast, Fast-Wide SCSI, and Wide Ultra SCSI-3 devices

Note: The Integrated SMART controller converts the on-board SCSI to RAID protected SCSI. It does not add extra channels. If the RAID features of the Integrated SMART controller are not used, the controller should be removed.


RAID LC2
The HP RAID LC2 Controller is a single channel PCI RAID controller targeted at entry-level and workgroup servers that need hardware RAID. Qualified on the HP ProLiant ML330, ML350 and the DL320 servers, the RAID LC2 Controller offers entry-level RAID functionality. With support for Wide Ultra2 SCSI and up to six internal disk drives, the RAID LC2 Controller provides data compatibility and an upgrade path to all HP Smart Array controllers.

Compatibility with all Wide Ultra2, Wide Ultra3, Ultra320 LVD disk drives (Ultra3 and Ultra320 attached drives will transfer data at a maximum of 80 MB/s)
Seamless upgrade to all HP Smart Array controllers
Wide Ultra2 SCSI: up to 80 MB/s SCSI bandwidth
Single internal channel
8 MB Read Cache
32-bit PCI bus design
Up to 218.4 GB of storage using six 36.4 GB internal disk drives
Note: The ProLiant ML310, ProLiant ML330, and ProLiant DL320 servers use the 36.4 GB Wide Ultra3 or Ultra320 drives.
Up to 880.8 GB of storage using six 146.8 GB internal disk drives
Note: The ProLiant ML330 G2, ProLiant ML350, ProLiant ML350 G2, and ProLiant ML350 G3 servers use the 146.8 GB Ultra320 drives.
Auto Rebuild feature
Online Spare support
Pre-Failure Warranty support for hard disk drives (requires that Insight Manager be installed)

Notes:

1. The RAID LC2 controller is a single channel PCI RAID controller targeted at entry-level and workgroup servers that need low cost hardware RAID. The RAID LC2 controller is data compatible with other Smart Array controllers and can be upgraded without data loss.
2. The RAID LC2 controller does not support ACU and ACU-XE and must be configured before the SmartStart CD is used to install the server.
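The statement that faster drives are limited to 80 MB/s on the RAID LC2 follows from the bus running at the slower of the two interface rates. A minimal sketch of that rule and of the two capacity figures is shown below. The helper names are hypothetical; the Ultra2 and Ultra3 rates are the ones quoted in this module, and the Ultra320 rate is its nominal 320 MB/s rating.

```python
LC2_BUS_MB_S = 80         # Wide Ultra2 limit of the RAID LC2 (from the text)
MAX_INTERNAL_DRIVES = 6   # maximum internal drives on the RAID LC2 (from the text)

# Nominal interface rates in MB/s for the supported drive types.
DRIVE_RATE_MB_S = {"Wide Ultra2": 80, "Wide Ultra3": 160, "Ultra320": 320}


def effective_rate_mb_s(drive_type: str, controller_mb_s: int = LC2_BUS_MB_S) -> int:
    """The slower side of the connection sets the transfer rate."""
    return min(DRIVE_RATE_MB_S[drive_type], controller_mb_s)


def max_capacity_gb(drive_gb: float, drives: int = MAX_INTERNAL_DRIVES) -> float:
    """Raw capacity of a fully populated controller, rounded to 0.1 GB."""
    return round(drives * drive_gb, 1)


for drive_type in DRIVE_RATE_MB_S:
    print(drive_type, "->", effective_rate_mb_s(drive_type), "MB/s")

print(max_capacity_gb(36.4), "GB with 36.4 GB drives")
print(max_capacity_gb(146.8), "GB with 146.8 GB drives")
```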


Smart Array 3200


The Smart Array 3200 Controller is an award-winning workgroup RAID controller. Supporting two internal or two external Ultra2 SCSI connections, this controller replaces the SMART-2DH Array Controller.

Feature summary

Supports Wide-Ultra2 SCSI, a 16-bit, 40 MHz bus with a data transfer rate of 80 MB/s
Has two channels with support for up to 30 drives (15 per channel)
Supports two external Wide-Ultra2 SCSI connections or can be custom configured for internal or external connections using daughter boards
Has a removable Array Accelerator battery-backed 64 MB read/write cache board with ECC (Error Checking and Correcting) memory
Has read ahead caching
Supports hot-plug PCI
Allows multiple logical drives per drive array
Supports RAID 0, 1+0 (also called RAID 10), 1, 4, and 5 fault tolerance options
Supports Wide-Ultra2 SCSI, Wide-Ultra SCSI-3, Fast-Wide SCSI-2, and Fast SCSI-2 hard drives
Allows performance monitoring through Insight Manager
Is available in 32-bit PCI Bus Master interface

Notes: 1. If all cabling is external, remove the daughterboard. 2. Upgrading to firmware 4.44 or higher will reduce the number of array controller failures.
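The 80 MB/s figure for Wide-Ultra2 SCSI follows directly from the bus width and transfer rate given in the feature summary. The short calculation below is a sketch with a made-up function name; it assumes one transfer per cycle at the quoted 40 MHz rate.

```python
def bus_bandwidth_mb_s(bus_width_bits: int, transfer_rate_mhz: float) -> float:
    """Peak bandwidth = bytes per transfer x million transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mhz


# Wide-Ultra2 SCSI as described above: 16-bit bus at 40 MHz.
per_channel_mb_s = bus_bandwidth_mb_s(16, 40)

# Smart Array 3200: two channels with 15 drives each.
max_drives = 2 * 15

print(f"Per-channel peak: {per_channel_mb_s:.0f} MB/s")
print(f"Maximum drives: {max_drives}")
```

Four channels at this per-channel rate give the 320 MB/s total quoted for the four-channel Smart Array 4200.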


Smart Array 4250ES and 3100ES


Smart Array 4250ES Controller is based on innovations in RAID I/O architecture. This cable-free controller supports high performance and availability in ProLiant 8000, 7000, or 6000 servers with Extended SCSI (ES) PCI. Designed specifically for internal drive support in these servers, this cable-free design allows for hot plug, cache coherent fail-over to an optional second Smart Array 4250ES controller.

The Smart Array 3100ES Controller is an innovation for Hot Plug Redundancy of Array Controllers design based on field-proven SMART-2 array architecture. Three Wide-Ultra SCSI-3 channels are provided to match the three SCSI drive banks in the ProLiant 6000 and 7000 Systems (Xeon versions only). Using the Extended SCSI PCI Bus Connector, the Smart Array 3100ES Controller enhances PCI hot plug by distributing the three SCSI channels through the PCI hot plug Extended SCSI connector, resulting in a "cable free" controller environment.

Note: ProLiant 6000 and ProLiant 7000 servers have been discontinued.

Feature summary

Record setting RISC processor architecture enhances controller performance
Hot plug cable-free design provides greater online availability in redundant controller environments
Data compatibility with all previous Smart Array controllers for simpler, more cost effective upgrades
Processing tasks are divided between two engines. One generates fault tolerance information and manages data flow, while the other prepares and sorts array storage commands.
Three SCSI channels provide up to 1.528 TB of high-availability fault tolerant storage using the 72.8-GB Wide Ultra3 hard drives
64-MB ECC-protected and battery-backed removable read/write cache module maximizes I/O performance without sacrificing data integrity
Redundant controller boards enable you to Hot Plug a redundant SA-4250ES controller without bringing down the server
Maximum internal capacity for the ProLiant 8000 server is twenty-one 1 in (2.54 cm) 72.8-GB drives, resulting in 1.528 TB of storage (21 x 72.8 GB 1" Wide Ultra3 drives)

Notes:

1. The SMART 3100ES and 4250ES are specifically designed for the internal drive cages of a ProLiant 6000, 7000, 8000 and ML750. Drive arrays can be spanned over two or three drive cages.
2. No cables are attached to the controller. All SCSI signals are routed through a special ES (extended SCSI) PCI slot. The SMART 4250ES and 3100ES controllers have no support for external drives. Data are compatible with other SMART controller models. Two SMART 4250ES or two 3100ES controllers can be set up as a redundant hot-pluggable pair.



SMART 2/E 2/P 2DH 221 and 2SL Array Controllers


[Figure: SMART-2/E controller. Labels: Internal SCSI Channel 1 (requires punch-out block to route to external drives); SCSI Channel 2.]

The internal card top connector provides connection to SCSI Channel 1. The external card edge connector provides connection to SCSI Channel 2. To connect an external storage system to Channel 1, it is necessary to use the provided cable and punch-out block to pass the internal Fast-Wide bus to an external connection point. A slot cover pass-out point can also be used for systems that lack the punch-out block provision.

The SMART-2SL has two internal connectors (a 50-pin for Fast-SCSI-2 devices and a 68-pin for Fast-Wide or Wide Ultra devices) and one external connector (68-pin). Because the SMART-2SL is a single channel controller, only one of these three connectors may be used at any given time.

It is possible to upgrade from a SMART-2SL to a SMART-2/P, /E, or /DH array controller. However, you may encounter a configuration error when moving drives from the SMART-2SL to the external channel of one of the dual channel controllers. To allow movement of drives from port 1 to port 2 between controllers, make sure the new controller firmware is revision 1.78 or later.

Although the SMART-2/E has the same capabilities as the SMART-2/P, it did not include the Symbios Logic 875 chipset and will not support Wide-Ultra transfer rates.

SMART-2 array controllers with a firmware update support up to 15 drives.

The PCI controllers in the SMART-2 family are bridged controllers and will cause the renumbering of secondary PCI buses when installed on the primary bus in select servers.

WARNING: After adding one of these controllers, it is important to check the configuration of affected controllers on the subsequent buses.
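The firmware requirement above (revision 1.78 or later before moving drives from port 1 to port 2) is a simple minimum-version check. The sketch below shows one way to express it. The function names are hypothetical, and it assumes revisions are written as dot-separated numbers with a two-digit minor part, as in the revisions quoted in this module.

```python
from typing import Tuple


def parse_revision(revision: str) -> Tuple[int, ...]:
    """Split a revision such as '1.78' into comparable integer parts."""
    return tuple(int(part) for part in revision.strip().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """True when the installed firmware is at or above the required revision."""
    return parse_revision(installed) >= parse_revision(minimum)


# Minimum revision quoted above for moving drives between ports.
SMART2_MIN_FOR_PORT_MOVE = "1.78"

for installed in ("1.66", "1.78", "2.08"):
    ok = meets_minimum(installed, SMART2_MIN_FOR_PORT_MOVE)
    print(f"Firmware {installed}: {'OK' if ok else 'update required'}")
```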


The SMART-2/P, SMART-2SL, and the SMART-2DH support Wide-Ultra transfer rates internally only. HP only guarantees Fast-Wide-SCSI-2 transfer rates on external drives connected to these controllers. Wide-Ultra transfer rates are supported when these controllers are connected to a ProLiant/U Storage System.

Both green drive lights on hot-pluggable drives attached to the SMART-2 family of array controllers may illuminate periodically while the server is idle. In most cases, this is a normal condition, and indicates that the controller is performing a test on the drives called Dynamic Sector Repair. This test runs in the background only while the server is idle and does not necessarily mean that there are problems with a drive; it is only a test.

The expansion card on a SMART-2 controller (none on 2SL or 221) contains the battery backed cache. In cases where the server fails, data is kept in the onboard cache for up to four days. It can be written to disk upon restoration of the unit within that period of time. The expansion card can also be transferred to a new controller in cases where the controller itself fails.

The SMART-2 family of controllers is configured with the Array Configuration Utility. With the SMART-2, several configuration limitations of the SMART were overcome, allowing addition of drives, array expansion, and drive configuration without closing the operating system (online). Online array expansion is not supported with the SMART-2SL.

The HP SMART SCSI Array Controller and all models of the SMART-2 SCSI Array Controller support SCSI hard drives only. Connecting any other SCSI device to these controllers will not work and may permanently damage the controller.


Smart Array Controller Features Summary


Array Controller SMART-2/E SMART-2/P SMART-2SL SMART-2DH SMART 221 SMART 3100ES SMART 3200 SMART 4200 SMART 4250ES SA-431 RAID LC2 SA 5i, 5i+ SCSI Level Supported Fast-Wide SCSI-2 (68-pin) Wide Ultra (68-pin)* Wide Ultra (68-pin) Wide Ultra (68-pin) Wide Ultra2 Wide Ultra SCSI-3 Wide Ultra2 Wide Ultra Wide Ultra2 Wide Ultra Wide Ultra2 Wide Ultra Wide Ultra3 Wide Ultra2 Wide Ultra Wide Ultra3 Wide Ultra2 Wide Ultra Ultra3 Wide Ultra2 Wide Ultra(tape only) Ultra3 Wide Ultra2 Ultra3 Wide Ultra2 Wide Ultra Fast Wide Ultra3 Wide Ultra2 Wide Ultra Ultra3 Wide Ultra2 Ultra 320 Ultra3 Ultra2 Ultra 320 Ultra3 Ultra2 Ultra 320 Ultra3 Ultra2 Cache 4MB ECC read-write 4MB ECC read-write 6MB ECC read 16MB ECC readwrite 6MB ECC read 56MB ECC read-write 64MB ECC read-write 64MB ECC read-write 64MB ECC read-write 16MB cache engine 8MB read 32 MB RO 5i 64 MB BBRW 5i+ 32 MB RO 128 MB BBRW (Upgrade to 256 MB) 256 MB BBRW 128 MB BBRW (Upgrade to 256) 64 MB (Upgrade to 128 MB BBRW) 128 MB BBRW (Upgrade to 256) 256 MB BBRW 0, 1, 1+0, 5 0, 1, 1+0, 5 28 28 (Upgrade to 56) 0, 1, 1+0, 5, ADG 0, 1, 1+0, 5 0, 1, 1+0, 5 0, 1, 1+0, 5, ADG 0, 1, 1+0, 5, ADG 56 28 6/28 28 (Upgrade to 56) 56 4 2 2/4 2 (Upgrade to 4) 4 2 2 RAID Levels 0, 1, 4, 5 0, 1, 4, 5 0, 1, 5 0, 1, 4, 5 0, 1, 1+0, 5 0, 1, 1+0, 4, 5 0, 1, 4, 5 0, 1, 1+0, 4, 5 0, 1, 1+0, 4, 5 0, 1, 1+0, 5 0, 1, 1+0, 5 0, 1, 1+0, 5 Max. # Hard Drives 14 14 7 14 12 18 30 56 18/21 14 15 28 #Channels 2 2 1 2 1 3 2 4 3 1 1 2

SA-532 SA-5302

SA-5304 SA-5312 SA 641/642 SA 6402 SA 6404


Array Controller Service Considerations


All Array Controllers
The following service considerations apply to HP array controllers.

During periods of inactivity, drives attached to an array controller will run Dynamic Sector Repair (DSR). This is normal activity.
Make sure to check controller order any time a controller is added or hardware changes are made to the controller or server. Any controller containing bootable drives must be first in the controller order.
The logical geometry of the drives is determined by the operating system selected. Previously stored data will be lost if the operating system setting is changed. To change operating systems, the data must be backed up, and then reinstalled.
Any changes to logical volume size or RAID level may be data destructive. There is no way to recover from data loss in these situations.
All SMART-2 controller families support up to four online spares, except the single channel controllers, which support only two online spares.
All HP array controllers are bus mastering devices. Adding an array controller can free up system resources for other activities as well as increase disk read-write performance.
SCSI IDs are not displayed at POST when the SCSI devices are connected to an array controller.
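The controller order rule above (a controller containing bootable drives must be first) can be written as a simple check. This is only an illustrative sketch: the Controller record and the function name are invented for the example and do not correspond to any HP utility.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Controller:
    name: str
    has_boot_drive: bool


def boot_order_problems(order: List[Controller]) -> List[str]:
    """Return the names of boot controllers that are not first in the order."""
    return [
        controller.name
        for position, controller in enumerate(order)
        if controller.has_boot_drive and position != 0
    ]


# Hypothetical server after a second controller has been added.
order = [Controller("data array", False), Controller("boot array", True)]
problems = boot_order_problems(order)

if problems:
    print("Move to first position:", ", ".join(problems))
else:
    print("Controller order is valid")
```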

Hot-Plug Drive Support


Several of the advantages provided by HP array controllers require hot-plug SCSI drives. Without hot-pluggable drives, the following operations cannot be completed with the drive on-line:

Replacement of a failed drive in a fault tolerant array
Addition of drives and arrays
Expansion of arrays

Although HP supports non-hot-plug drives on all of its array controllers, they are not recommended. One of the primary advantages of array controllers is the ability to recover fully from a drive failure without taking the server off-line. This capability requires the use of hot-plug drives in conjunction with an array controller.


On-Line Spare
Hot-plug drive support enhances the capability of the On-Line Spare drive, as the failed drive may be replaced while the computer is still running, and the array can return to its original configuration. The On-Line Spare is an effective way to return to a fault-tolerant condition after a drive fails.

An On-Line Spare is a redundant physical drive that takes the place of any drive that may fail in a hardware fault tolerant logical volume. An array controller supporting On-Line Spare drives not only has the ability to detect unrecoverable drive errors, but also to initiate a background rebuild to the On-Line Spare. The entire process is managed by the processor on the array controller and is independent of the operating system.

[Figure: On-Line Spare operation with a mirrored pair. Panels: Before Failure, During Failure, After Replacement.]

Once an On-Line Spare is automatically activated and data from the failed drive rebuilt, the failed drive may be replaced. Following replacement, the data on the spare is spooled onto the new drive and the spare is again available to failed volumes.

The On-Line Spare is required to be equal to or larger in size than the drive it is replacing. However, once configured as an On-Line Spare, it may become a replacement for any fault-tolerant logical volume on the array controller.

Rebuild times vary depending on overall array controller activity. With minimal server activity, a 1 GB drive takes approximately 10 to 20 minutes to rebuild on the SMART Array Controller.
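The On-Line Spare sequence described above (spare takes over, failed drive is replaced, spare is released) can be modelled in a few lines. The sketch below is a simplified illustration, not controller firmware: the class and method names are hypothetical, and the rebuild and copy-back steps are reduced to reassigning which drive occupies a slot.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Drive:
    name: str
    size_gb: float


class SparedArray:
    """Tracks which physical drive backs each slot of a fault-tolerant volume."""

    def __init__(self, members: List[Drive], spare: Drive) -> None:
        # The spare must be equal to or larger than any drive it may replace.
        if spare.size_gb < max(drive.size_gb for drive in members):
            raise ValueError("spare is smaller than the drives it protects")
        self.slots: List[Drive] = list(members)
        self.spare: Optional[Drive] = spare  # None while the spare is in use
        self.failed_slot: Optional[int] = None

    def drive_fails(self, slot: int) -> None:
        """The controller activates the spare and rebuilds onto it."""
        if self.spare is None:
            raise RuntimeError("spare is already in use")
        self.failed_slot = slot
        self.slots[slot], self.spare = self.spare, None

    def replace_failed_drive(self, new_drive: Drive) -> None:
        """Data is copied to the new drive and the spare becomes available again."""
        if self.failed_slot is None:
            raise RuntimeError("no failed drive to replace")
        released = self.slots[self.failed_slot]
        self.slots[self.failed_slot] = new_drive
        self.spare, self.failed_slot = released, None


array = SparedArray([Drive("disk0", 18.2), Drive("disk1", 18.2)], Drive("spare", 18.2))
array.drive_fails(1)
after_failure = [drive.name for drive in array.slots]
array.replace_failed_drive(Drive("disk1-new", 18.2))
after_replacement = [drive.name for drive in array.slots]
print(after_failure, after_replacement, array.spare.name)
```

The three states printed correspond to the three panels of the figure: the original pair with an idle spare, the spare standing in for the failed drive, and the original layout restored with the spare idle again.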


Array configuration utilities


HP array utilities enable you to configure arrays controlled by HP controllers and troubleshoot the array should problems occur. Configuration utilities for Smart Array controllers include:

Option ROM Configuration for Arrays (ORCA): an off-line ROM-based configuration utility that runs independent of the operating system. ORCA can be started during the boot process and uses a menu-driven interface for minimal configuration needs by experienced users. ORCA is accessible by pressing F8 after system POST. It allows the user to create and delete logical drives and to set the boot controller order. ORCA does not support drive expansion, RAID level migration, or stripe size migration. Smart Array controllers with ORCA support include:
   all embedded RAID controllers (ROC, 5i, 5i plus, future products)
   all 5th generation Smart Array controllers (532, 5302, 5304, 5312)
   all future Smart Array generations

Array Configuration Utility (ACU): a configuration utility that can be run or installed from SmartStart 5.5 or earlier. ACU has a graphical interface for extensive configuration needs. Wizards are available to support novice users. The Array Configuration Utility (ACU) simplifies array configuration. ACU can be started from within the OS, from the SmartStart CD or from a bootable diskette. Under Windows 2000, Windows NT and Novell NetWare this utility can be started online. The server does not have to be powered down when disks are configured.

Array Configuration Utility XE (ACU-XE): a browser-based utility that has both wizard-based assistance and different operating modes for different skill levels or faster configuration. ACU-XE combines the power of the Internet and the features of ACU to provide local or remote, web-based array configuration and management. ACU-XE has an easy-to-use browser-based interface and allows you to manage all Smart Array controllers as well as StorageWorks RA4100, RA4000 and MSA1000 enclosures from one central location. ACU-XE is also shipped with SmartStart 6.x.
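The ORCA limits described above can be summarized as a small lookup that tells a service engineer whether a planned job can be done from ORCA alone. This is a sketch for study purposes only: the task names and the function are invented here, and only the operations that the text explicitly lists for ORCA are included.

```python
from typing import Iterable

# Operations the text says ORCA allows.
ORCA_SUPPORTED = {
    "create logical drive",
    "delete logical drive",
    "set boot controller order",
}

# Operations the text says ORCA does not support.
ORCA_UNSUPPORTED = {
    "drive expansion",
    "RAID level migration",
    "stripe size migration",
}


def orca_is_sufficient(tasks: Iterable[str]) -> bool:
    """True when every planned task is one that ORCA can perform."""
    planned = set(tasks)
    unknown = planned - ORCA_SUPPORTED - ORCA_UNSUPPORTED
    if unknown:
        raise ValueError(f"not covered by this sketch: {sorted(unknown)}")
    return planned <= ORCA_SUPPORTED


print(orca_is_sufficient(["create logical drive", "set boot controller order"]))
print(orca_is_sufficient(["create logical drive", "RAID level migration"]))
```

When the check fails, the job calls for one of the utilities described above for more extensive configuration needs (ACU or ACU-XE).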


Learning Check
1. Although RAID 5 can handle up to 14 drives, HP recommends considering RAID ADG when the number of drives exceeds eight.
   True
   False

2. Hot plug hard drives support which of the following?
   a. Replacement of a failed drive in a fault tolerant array
   b. Addition of drives and arrays
   c. Expansion of arrays
   d. Replacement of an array controller while the machine is on-line

3. Which of the following are features of the Smart Array 6404?
   a. 32-bit array controller
   b. Supports up to 56 drives
   c. 133 MHz PCI-X
   d. 256 MB cache

4. Which of the following statements about RAID are true?
   a. RAID 1+0 is the least expensive fault tolerant RAID method
   b. RAID 5 stores parity across all drives in the array
   c. RAID 5 is a more expensive solution than RAID 1+0
   d. RAID ADG performance exceeds that of RAID 5

5. HP Smart Array Controllers provide a software level RAID solution.
   True
   False

6. Ultra320 SCSI protocol is backward compatible with Ultra2 and Ultra3 drives.
   True
   False

7. Which of the following Smart Array controllers support RAID ADG?
   a. Smart Array 6404
   b. Smart Array 5312
   c. Smart Array 641
   d. Smart Array 5304

8. The ability to recover fully from a drive failure without taking the server offline requires the use of hot-plug drives with an array controller.
   True
   False

9. If a customer has a failed drive in a RAID 5 set and another drive in pre-failure mode, which should be replaced first?

10. If a customer has two failed drives in a RAID 5 set, what should you do?


Tools and Utilities


Module 7

Introduction
This module gives an overview of various tools and utilities that can aid in servicing HP products. Topics include:

SmartStart
System Erase
ROM-Based Setup Utility (RBSU)
Obtaining current device drivers

Objectives
To use HP tools and utilities, service engineers should be able to:

List the functions performed by SmartStart
Describe key differences between SmartStart 5.x and 6.x
Locate and install a ProLiant Support Pack (PSP)
Compare system erase for SmartStart 5.x and 6.x
Access and use the ROM-Based Setup Utility (RBSU)
List at least 3 sources of HP device drivers


SmartStart
Server Integration Tool
SmartStart is a set of server integration tools and utilities that optimizes platform configuration and simplifies setup of servers. It also provides functionality for integrating operating system installations on ProLiant servers to achieve optimum reliability and performance. Intelligent Manageability features extend the benefits of SmartStart and facilitate consistency and reliability of server deployment and on-going system maintenance.

Server Configuration and Installation
SmartStart provides intelligent server configuration and software installation and tuning assistance via a graphical tool, ensuring a streamlined, optimized and reliable setup of ProLiant servers. This tool enables navigation and a summary screen that tracks details on how the system will be configured. This walk-through graphical interface guides the user through every step of the configuration process, providing maximum ease of use and confidence that the system is configured properly.

Diagnostics and Drivers
SmartStart includes the suite of ProLiant server software from diagnostics to drivers and supports the integration of "off-the-shelf" versions of leading operating system software. SmartStart for Servers is shipped standard with every ProLiant Server. You can easily stay up-to-date with SmartStart releases with one of our flexible subscription services.

SmartStart Functions
SmartStart performs the following functions:

Automatic Hardware Detection
SmartStart automatically detects and configures ProLiant hardware appropriately for the selected software and displays a summary of the configuration and selected parameters to review before any of the software is installed.

Drive Array Configuration
SmartStart configures physical and logical drive volumes and advanced RAID options.

Assisted Operating System Integration
SmartStart assisted install tunes the configuration precisely for the host ProLiant platform and performs the software installation without any further user intervention when the appropriate CD is inserted.

Utilities
SmartStart automatically installs and configures Insight Management Agents. Insight Manager can then be installed on the management workstation directly from the management CD.

ProLiant Support Paqs (PSPs)
ProLiant Support Paqs (PSPs) allow you to manually install or upgrade drivers and utilities from Windows NT, Windows 2000, Windows 2003 or NetWare.

SmartStart Setup
When using SmartStart 5.x to set up a server, there are three installation paths to choose from:

Assisted Integration Path
Replicated Install Path
Manual Configuration Path

Assisted Integration
The Assisted Integration path provides the full hardware and software integration benefits of SmartStart. This path guides the user through the collection of information needed for configuring the hardware and installing the system software, providing validation, online help, and recommended defaults along the way. A summary is available at any time to review the installation settings and is saved for later reference. A server profile diskette is required for Assisted Integration. To create a server profile diskette, create an empty SPD.ini file using Notepad or the edit utility at the command prompt.

Replicated Install
In SmartStart 5.x the Replicated Install path allows the user to replicate saved operating system configurations across multiple servers. Replicated install captures and saves parameters during the installation of supported software. The configuration information is then saved into "profiles". These profiles can be used over and over to accelerate the installation of software. By using replicated install, users save time and gain a consistent way to deploy NT across the enterprise.

SmartStart 6.x does not include a method to perform replicated installations. The SmartStart 6.x deployment process for ProLiant servers configured with RBSU is faster and the interview questions have been streamlined. Performing attended replications for a small number of servers at one time does not require a complicated replicated installation path.

Manual Configuration
The Manual Configuration path allows the user to run the System Configuration Utility manually and follow the installation procedures of the software manufacturer to install the software. However, full integration benefits are only achieved with the Assisted Integration path. This path may be used to install an operating system using CDs which are not SmartStart enabled. It may also be used for installing software from the Software Product CDs, if more flexibility with the installation settings is desired.
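The server profile diskette mentioned under Assisted Integration only needs an empty SPD.ini file. The text creates it with Notepad or the edit utility; the sketch below does the same thing from a script, as an alternative for engineers who prepare diskettes on a workstation. The function name is hypothetical, and the target directory is an assumption (a temporary folder is used here in place of the diskette drive).

```python
import tempfile
from pathlib import Path


def create_server_profile(target_dir: str) -> Path:
    """Create (or empty) SPD.ini in the given directory and return its path."""
    profile = Path(target_dir) / "SPD.ini"
    profile.write_bytes(b"")  # zero-length file, as described in the text
    return profile


# Demonstration in a temporary folder; on a real workstation the target
# would be the diskette drive, for example "A:\\" (assumed drive letter).
with tempfile.TemporaryDirectory() as folder:
    created = create_server_profile(folder)
    created_name = created.name
    created_size = created.stat().st_size
    print(created_name, created_size, "bytes")
```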


ProLiant Support Paqs (PSPs)


ProLiant Support Paqs (PSPs), formerly Compaq Support Paqs (CSPs), are the next generation of Server Support Diskettes (SSDs). PSPs represent operating system specific bundles of optimized drivers, utilities and management agents. They extend the capabilities of the SSD and include the following features:

Management ready
ProLiant Support Paqs simplify system software maintenance by installing management agents as well as the system software drivers and software utilities. Adding management agents eliminates the need to update them from a separate installation program.

Self-installable Smart Components
Unlike SSDs, ProLiant Support Paqs contain smaller, more modular pieces called Smart Components. Each Smart Component is a self-installable binary that provides increased flexibility by enabling it to be distributed and installed individually.

Deployment Utilities
SSDs provided a setup utility for installing the entire contents of the SSD. ProLiant Support Paqs include the new Remote Deployment Utility for installing all Smart Components included in the PSP. The Remote Deployment Utility (RDU) provides version information on currently installed software as well as the software in the PSP.

Obtaining ProLiant Support Paqs
The latest PSPs can be obtained from the following locations:

HP website
HP ActiveUpdate
SmartStart CD

HP Website
The latest PSP deployment utilities, PSPs, and individual components for supported Microsoft Windows and Novell NetWare operating systems are always available on the HP website http://h71025.www7.hp.com/support/swdrivers/index which is accessible from any system with a Web browser and access to the Internet.


HP ActiveUpdate
The latest HP deployment utilities, PSPs, and individual components for Microsoft Windows NT 4.0, Windows 2000, and Novell NetWare are also available from HP ActiveUpdate v2.0. ActiveUpdate is a Web-based client application for Windows systems only. The ActiveUpdate client reduces the time that administrators spend searching the Web for the latest server updates by proactively delivering updates to a centralized software repository. You can obtain the ActiveUpdate client from the HP website: http://h18000.www1.hp.com/products/servers/management/activeupdate/.

NOTE: Although you can use ActiveUpdate to maintain a centralized, network-based software repository for all of the operating systems discussed in this guide, the ActiveUpdate client does not run on Novell NetWare systems. ActiveUpdate requires initial configuration on a Windows-based system.

SmartStart for Servers CD
When Web access is not available or download speeds are too slow, the PSP deployment utilities, PSPs, and individual components for Microsoft Windows NT 4.0, Windows 2000, Novell NetWare 4.2, NetWare 5.1, and NetWare 6.0 can also be obtained from the SmartStart for Servers CD 5.3 or later.

SmartStart Home Page


The SmartStart home page provides the following information:

SmartStart product overview and description
Changes in latest version of SmartStart
Key benefits of SmartStart
Link to the SmartStart product support page
Link to QuickSpecs for SmartStart

The URL for the SmartStart home page is http://h18000.www1.hp.com/products/servers/management/smartstart/index.html

SmartStart New Product Support Pages
From the SmartStart home page you can link to specific versions of SmartStart that provide the following information:

New server products supported
New option products supported
Links to current versions of configuration tools
Links to updated drivers and support software
Links to ROM updates for specific servers
Links to customer advisories

SmartStart CD Contents
SmartStart contains optimized drivers and utilities that give you maximum performance on all leading operating systems. The SmartStart 6.x CD contains:

Support Software
Microsoft Windows 2000/2003
Microsoft Windows NT 4.0
Linux
Novell NetWare

Utilities
ROM Update Utility: provides customized options for updating system, option and hard drive firmware.
Array Configuration Utility (ACU): enables you to configure newly added array controllers and associated storage devices.
Array Diagnostics Utility (ADU): performs device tests on HP array controller hardware.
Insight Diagnostics: performs tests on system components and displays information about a server's hardware and software configuration.
Erase Utility: provides options to clean different areas of the system: attached drives, non-attached drives, BIOS, and non-volatile RAM (NVRAM).

Management CD Contents
An integral piece of the ProLiant Essentials Foundation Pack is HP's suite of Intelligent Manageability products. Visit http://h18013.www1.hp.com/products/servers/management/index.html for more information about HP Management Software. The Management CD includes:
- Insight Manager 7 SP2
- ActiveUpdate v2.0
- Version Control Agent
- Version Control Repository Manager
- Survey
- Management Agents for:
  - IBM OS/2
  - Linux
  - Microsoft Windows
  - Novell NetWare
  - SCO OpenServer
  - SCO UnixWare

Subscription Service
The SmartStart subscription service provides customers 8 new releases of the SmartStart CD and the Management CD for a period of approximately one year from the date of purchase. Order by phone at 1-800-573-1099 or online at http://hp.productorder.com/smartstart/.

SmartStart Server Packs
SmartStart and Insight Manager 7 ship standard with every ProLiant server, packaged in the new ProLiant Essentials Foundation Pack.

SmartStart Request Pack
Customers who have received defective media can request a single replacement Request Pack. In the United States, SmartStart replacement CDs can be ordered by calling 1-800-OK-Compaq (1-800-652-6672). In other countries, customers should contact a Compaq Authorized Supplier or local HP Services Center for request pack availability and ordering information.

SmartStart 6.x Differences


SmartStart 6.x is now part of the ProLiant Essentials Foundation Pack, which is the next generation tool to configure and install operating systems on ProLiant servers. SmartStart 6.x has been redesigned to more efficiently set up and deploy the new generation of ProLiant servers. Following are some key differences between SmartStart 5.x and SmartStart 6.x.

SmartStart 6.x does support
- ML/DL G2 and G3 servers and some ML/DL G1 servers
- RBSU
- ROM Update Utility
- Array Configuration Utility
- Array Diagnostic Utility
- Erase utility
- Insight Diagnostics
- Survey Utility

SmartStart 6.x does not support
- Pre-ML/DL and most ML/DL G1 servers
- SCU
- Replicated installation
- A requirement for a Server Profile Diskette or System Partition

System Erase with SmartStart 5.x and 6.x


System Erase with SmartStart 5.x
Use the System Erase Utility to erase all previous hardware and software configurations, including the network operating system, before using SmartStart 5.x to initialize a server. The System Erase Utility can be run from SmartStart 5.x. The System Erase Utility erases system configuration information from NVRAM, from the hard drives, and from the Smart Array controllers present in the system. When reinitializing a system using SmartStart, be sure to use the System Erase Utility.
WARNING: The System Erase Utility is destructive to all data.

WARNING: If you start a previously configured server with SmartStart and it prompts you to run the System Erase Utility, do not run the System Erase Utility unless you want to clear all existing server configuration and data. The System Erase Utility destroys all configuration information and data. The System Erase Utility completely erases all hard drives.

System Erase with SmartStart 6.x
SmartStart 6.x includes the System Erase Utility. The System Erase Utility provides options to clean different areas of the system: attached drives, non-attached drives, BIOS, and non-volatile RAM (NVRAM). Unlike previous versions of SmartStart, the System Erase Utility for SmartStart 6.3 does not erase the Smart Array controllers. For legacy systems, the Erase Utility is still available for download. To access the latest release of the Erase Utility, go to the Software and Drivers download area on www.hp.com.

Clearing Non-Volatile RAM


Non-volatile RAM (NVRAM) is an EEPROM that stores various data, including the critical error log, system information table (SIT), and status flags. Occasionally, NVRAM becomes corrupted or contains erroneous information. This can result in errors ranging from no video and no POST, to POST errors, to system errors and lockups. Clearing NVRAM can resolve many errors that seem unusual and have no apparent cause. Newer servers replace the System Configuration Utility with a ROM-Based Setup Utility (RBSU). RBSU can also clear NVRAM: select Erase Non-volatile Memory from the Advanced Options menu.


ROM Based Setup Utility (RBSU)


Starting with the ProLiant ML350 update in late 2000 and continuing with Generation 2 and later ProLiant servers, the ROM-Based Setup Utility (RBSU) replaced the System Configuration Utility (SCU) but provides many of the same functions. RBSU is machine-specific and customized for each type of server. RBSU performs a wide variety of configuration activities including the following:
- Viewing system information
- Selecting the operating system (OS)
- Configuring system devices and installed options
- Selecting the primary boot controller

RBSU is updateable and it is resident in ROM. The table below illustrates some of the feature differences between RBSU and SCU:
ROM-Based Setup Utility | System Configuration Utility
Saves changes to NVRAM as they are made | Does not save changes until the user exits
Silent conflict resolution | Displays warnings when conflicts are resolved
Embedded in system ROM; does not use disk | Disk-based; can be installed on system partition
Customized for each server, resulting in smaller, faster utility | Comprehensive utility; one version supports all servers
Configuration oriented and table driven | Device oriented and file driven
Replication utility support with configuration info in RBST table | No direct replication utility support except through configuration backup
Utility update through RBSU ROM flash or physical ROM change | Utility update through new version of the software

Running RBSU
On a 32-bit server:
1. Press the F9 key when prompted during the startup sequence.
2. Modify configuration settings as desired.
3. Exit RBSU by pressing the Escape key at the main menu. The system must be restarted when configuration settings are changed. A confirmation to exit appears on the screen, and the current boot controller is also displayed for reference purposes. To confirm exiting RBSU, press the F10 key. The server restarts using the new configuration settings.

On a 64-bit server:
1. Select System Maintenance from the Boot menu.
2. Select ROM-Based Setup Utility.
3. Modify configuration settings as desired.
4. Exit RBSU by pressing the Escape key. If you have made any changes that require the system to be restarted, a box will appear stating that the system must be restarted. Restart the server. The server powers up using the new configuration settings.

Initial Boot
On initial boot (for a system that has not yet been configured) you will be required to enter the following information:
- Language and Operating System
- Primary boot controller
- Date and time
NOTE: To bypass this step you must insert a Diagnostics ROMpaq diskette into the floppy drive before booting the server. This would enable you to upgrade the ROM or run the SmartStart scripting tools.


RBSU Main Menu

This menu, located on the left-hand side of the screen, allows you to choose which configuration setting to view or modify. The choices are:
- System Options
- PCI Devices
- Standard Boot Order (IPL) (applies only to 32-bit servers)
- Boot Controller Order
- Date and Time
- Automatic Server Recovery (ASR)
- Server Passwords
- Server Asset Text (and IMD Text, which applies only to 64-bit servers)
- Advanced Options
- BIOS Serial Console (applies only to 32-bit servers)
- Utility Language

On the right-hand side of the screen, a window displays basic information about the server. This information includes the server model, serial number, BIOS version, backup BIOS version, memory installed, and processors installed. Pressing the F1 key when any menu option is highlighted will allow you to view a description of that feature.


System Options
Following are the options available from the System Options choice on the main menu:
- OS Selection
- Serial Number
- Embedded COM Port A
- Embedded COM Port B
- Embedded LPT Port
- Integrated Diskette Controller
- NUMLOCK Power-On State
- Embedded NIC Port Pre-Boot Execution Environment (PXE) Support (applies to 32-bit servers only)
- Diskette Write Control
- Diskette Boot Control
- Advanced Memory Protection

PCI Devices
The PCI Devices option displays the configuration settings of the PCI devices installed in the server and allows you to modify the IRQ. Multiple PCI devices can share an interrupt. To disable a device, press Enter while the device is highlighted. A menu will appear with options to change the IRQ, as well as to disable the device. If the device cannot be disabled on your system, only the option to change the IRQ will be available.
IMPORTANT: Disabling a PCI Controller on a server with the PCI hot-plug driver installed will disable all controllers on that PCI driver if the server is running Windows 2000 or Windows .NET. To avoid this issue, remove the controller instead of disabling it.
IMPORTANT: For 64-bit servers, devices can only be viewed, and no changes can be made.

Standard Boot Order


The Standard Boot Order (IPL) option configures the Initial Program Load (IPL) device, and controls the search order the server uses to look for a bootable device.
NOTE: If you enable or disable a device, restart the server to update the list. Devices that have been enabled since the last reboot will not appear on the list.
IMPORTANT: Standard Boot Order (IPL) applies to 32-bit servers only.


Boot Controller Order


The Boot Controller Order option selects which of the installed mass storage devices is used as the primary boot controller. The server attempts to power up with the operating system on this device. The primary boot controller is set to controller 1.
NOTE: If you change the Boot Controller Order in the Option ROM Configuration for Arrays (ORCA) utility, the change will be reflected in this menu.

Date and Time


The Date and Time option sets the system time and date. Enter the date in mm-dd-yyyy (month-day-year) format. Enter the time in 24-hour (hh:mm:ss) format.
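The two entry formats can be checked with a short Python sketch. This is only an illustration of the month-day-year and 24-hour formats. The function name `parse_rbsu_date_time` is hypothetical and is not part of RBSU.

```python
from datetime import datetime


def parse_rbsu_date_time(date_text: str, time_text: str) -> datetime:
    """Parse a month-day-year date and a 24-hour hh:mm:ss time.

    Hypothetical helper for illustration only; RBSU performs its own
    validation in firmware.
    """
    # strptime raises ValueError for malformed or out-of-range values,
    # for example month 13 or hour 24.
    return datetime.strptime(f"{date_text} {time_text}", "%m-%d-%Y %H:%M:%S")


if __name__ == "__main__":
    print(parse_rbsu_date_time("06-30-2003", "14:05:00"))
```

An entry such as 13-01-2003 is rejected because the first field is the month, not the day.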

Automatic Server Recovery


The Automatic Server Recovery (ASR) menu includes options that configure the ASR features. The ASR menu may include the following options:
1. ASR Status
2. ASR Timeout
3. Thermal Shutdown

The ASR Status option is a simple toggle setting that either enables or disables ASR. When set to Disabled, no ASR features function.

The ASR Timeout option sets a timeout limit for resetting a server that is not responding. When the server has not responded in the selected amount of time, it automatically resets.

The Thermal Shutdown option is a simple toggle setting that determines when the server automatically powers down due to dangerous temperatures. When the setting is enabled (default), the Health Driver will initiate a shutdown of the system when the temperature reaches within five degrees of the critical level. When the setting is disabled, the Health Driver shuts down the system at the critical level.
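The Thermal Shutdown behavior can be sketched as a simple threshold rule. The sketch assumes the five-degree figure means five degrees below the critical level. The function names and the example critical temperature of 70 are hypothetical; the guide gives no actual critical values or units.

```python
def shutdown_threshold(critical_temp: float,
                       thermal_shutdown_enabled: bool = True,
                       margin: float = 5.0) -> float:
    """Return the temperature at which a shutdown would be initiated.

    Enabled (the default): shut down `margin` degrees before critical.
    Disabled: shut down only at the critical level itself.
    """
    if thermal_shutdown_enabled:
        return critical_temp - margin
    return critical_temp


def should_shut_down(current_temp: float, critical_temp: float,
                     thermal_shutdown_enabled: bool = True) -> bool:
    """True when the current temperature has reached the threshold."""
    return current_temp >= shutdown_threshold(critical_temp,
                                              thermal_shutdown_enabled)


if __name__ == "__main__":
    # Hypothetical critical level of 70 degrees.
    print(should_shut_down(66.0, 70.0, thermal_shutdown_enabled=True))
    print(should_shut_down(66.0, 70.0, thermal_shutdown_enabled=False))
```

With the setting enabled, the server shuts down earlier, which is why it is the more protective default.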


Server Passwords
The Set Power-On Password option sets a password that controls access to the server during power-up. The server cannot be powered up until the correct password is entered. The Set Power-On Password option uses a simple character string with a maximum of seven characters. To disable or clear the password, enter the password followed by / (slash) when prompted to enter the password.

The Set Admin Password option sets a password to control access to the administrative features of the server. The Set Admin Password option is a simple character string with a maximum of seven characters. To disable or clear the password, enter the password followed by / (slash) when prompted to enter the password.

The Network Server Mode option is a simple toggle setting that sets the server to operate in network server mode. This feature works in conjunction with the power-on password. When set to Disabled, the server operates normally. When it is set to Enabled, the following actions occur:
1. The local keyboard remains locked until the power-on password is entered.
2. The power-on password prompt is bypassed.
3. When a diskette is in the diskette drive, the server does not start unless the power-on password is entered locally.
NOTE: Network Server Mode cannot be enabled until the power-on password has been established.

The Quicklock option is a simple toggle setting that either enables or disables the Quicklock feature. When set to Enabled, the keyboard is locked by pressing the Ctrl+Alt+L keys. The keyboard remains locked until the power-on password is typed.
NOTE: If the power-on password is disabled at the power-on key prompt, the Quicklock feature remains inactive until the password is changed in RBSU.
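The password rules above (a maximum of seven characters, and clearing the password by entering it followed by a slash) can be modeled with a minimal Python sketch. The function names are hypothetical and this is not the firmware's implementation. How the firmware treats a password that itself contains a slash is not described in this guide and is not modeled here.

```python
from typing import Optional, Tuple

MAX_PASSWORD_LENGTH = 7  # stated maximum for both passwords


def is_acceptable_password(candidate: str) -> bool:
    """A new password must be non-empty and at most seven characters."""
    return 0 < len(candidate) <= MAX_PASSWORD_LENGTH


def handle_password_prompt(entry: str,
                           stored: str) -> Tuple[bool, Optional[str]]:
    """Return (access_granted, password_stored_afterwards).

    Entering the current password grants access and keeps it.
    Entering the current password followed by '/' grants access and
    clears it (None means no password is set afterwards).
    """
    if entry == stored:
        return True, stored
    if entry == stored + "/":
        return True, None
    return False, stored


if __name__ == "__main__":
    print(handle_password_prompt("abc123/", "abc123"))
```

Note that the slash only clears the password when the correct password precedes it; a wrong password followed by a slash is simply rejected.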


Server Asset Text


The Server Asset Text menu includes options that customize the system-specific text for the server. This information is reported on the Integrated Management Display (IMD), an option for ProLiant servers. The available options are:
1. Set Server Info Text
2. Set Administrator Info Text
3. Set Service Contact Text
4. Set IMD Custom Text (available only for 64-bit servers)

The Set Server Info Text option defines reference information for the server, including Server Name, Server Asset Tag, Server Primary OS, and Other Text.

The Set Administrator Info Text option defines reference information for the administrator of the server, including Admin Name, Phone Number, Admin Pager Number, and Other Text.

The Set Service Contact Text option defines reference information for the service contact of the server, including Service Name, Phone Number, Pager Number, and Other Text.

The Set IMD Custom Text option (64-bit servers only) includes IMD Idle Screen Text and IMD Custom Menu Screen Text.


Advanced Options
The Advanced Options menu includes options that allow you to configure the advanced features of the server. These include:
- MPS Table Mode (applies only to 32-bit servers)
- Hot-Plug Resources (applies only to 32-bit servers)
- POST Speed Up (applies only to 32-bit servers)
- POST F1 Prompt
- Redundant ROM Selection
- Erase Non-volatile Memory
- Set CPU Corrected
- Wake-On LAN (applies only to 32-bit servers)
- Advanced memory protection
- IDE EDD 3.0 (applies only to 64-bit servers)
- NMI Debug Button (applies only to 32-bit servers)
- Custom POST Message
- Processor Hyper-Threading
- Secondary IDE Channel Support (applies only to the ProLiant ML530 G2 server)

System Partition
SmartStart Automated Installation still creates and populates a System Partition for User Diagnostic Utilities. If a System Partition is available, the system will reboot and automatically run RBSU if you press the F10 key to enter the System Partition, and then select Configure Hardware.

Embedded System Maintenance Utilities are found in some Compaq Generation 2 and later servers and can be run by pressing the F10 key when prompted from the Power-On Self Test (POST) sequence. Systems with a System Maintenance Menu do not have an accessible System Partition, so all system utilities should be run from the System Maintenance Menu.


Device Drivers
Current device drivers are essential for proper operation and can be obtained from:

- SmartStart CD (PSP)
- HP Website: http://www.hp.com/country/us/eng/support.html
- Internet FTP site: ftp.Compaq.com
- Download Facility: 281-518-1418, US and Canada (outside North America, contact your local Geo)
- Online Services:
  - CompuServe (keyword GO COMPAQ)
  - America Online (keyword COMPAQ)
  - Prodigy (keyword COMPAQ)
- Technical Support Center: 1-800-OK-COMPAQ (1-800-652-6672, US and Canada); outside North America, contact your local Geo


Learning Check
1. The streamlined installation process for SmartStart 6.x has eliminated the need for which installation path found in SmartStart 5.x?
_____________________________________________________________

2. What utility provides the means of updating system, option and hard drive firmware?
_____________________________________________________________

3. What component erased by the system erase utility in SmartStart 5.x is not erased by the system erase utility in SmartStart 6.x?
_____________________________________________________________

4. List three sources of ProLiant Support Paqs (PSPs):
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________

5. List four functions of the ROM-Based Setup Utility (RBSU):
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________

6. The RBSU main menu is accessed by pressing which function key during the system boot process?
_____________________________________________________________

7. What utility is used to set the boot controller order?
_____________________________________________________________

8. What RBSU main menu choice includes the options of erasing non-volatile memory and setting advanced memory protection?
_____________________________________________________________


HP Troubleshooting Methodology
Module 8

Introduction
The high degree of interaction between the system, options hardware, operating system, and software can make it difficult to isolate the root cause of a problem. Intermittent problems and problems generated by multiple subsystem malfunctions can be especially difficult to troubleshoot.

Minimizing the time to problem resolution is critical to attaining and maintaining a high level of customer satisfaction by maximizing the availability of HP equipment. Use of this methodology will enable service providers to distinguish themselves in the marketplace by being able to provide this higher level of customer satisfaction.

This methodology provides a logical framework to troubleshoot system problems and reach problem resolution. A logical framework also provides a consistent and solid foundation for other technicians and system engineers to work from when escalation is necessary.

This module presents the HP troubleshooting methodology, used to diagnose and resolve HP system issues. Topics include:
- Troubleshooting prerequisites
- HP troubleshooting methodology overview
- Collecting data
- Evaluating information to isolate mode of failure
- Developing an optimized action plan
- Implementing the action plan
- Evaluating results
- Implementing preventive measures


Objectives
To use the HP troubleshooting methodology, service personnel should be able to:

- Identify the troubleshooting prerequisites.
- Explain the HP troubleshooting methodology for diagnosing and troubleshooting HP systems.
- Explain the importance of collecting data.
- Identify effective techniques for data collection.
- Evaluate information to isolate the specific mode of failure.
- Develop an optimized action plan with possible primary and alternate solutions.
- Implement the action plan.
- Implement preventive measures.

A Learning Check at the end of this module will test your understanding of the information and concepts presented.


Troubleshooting Prerequisites
Observing Safety Precautions
The first step in troubleshooting must always include personal and data safety. Your personal safety is the single most important factor to protect when servicing equipment. Never work under unsafe conditions. If you feel your personal safety may be at risk, contact your service manager immediately. Protect yourself and HP equipment from contact with unintentional live voltage or ESD damage. Observe the following precautions when servicing HP equipment:

- Electrical shock protection
- Physical injury and equipment protection
- Electrostatic discharge awareness and precautions

Electrical Shock Protection
It is critically important to read and abide by the following electrical shock warnings to avoid the risk of personal injury. Contact HP technical assistance if you have any questions regarding electrical shock protection before servicing HP equipment.
WARNING: No one should attempt to make any repairs at the component level or to make any modifications to any printed circuit board. Improper repairs can create a safety hazard.

WARNING: Never disassemble or attempt to repair HP power supplies, UPSs, or monitors. The yoke and deflection coils of a CRT typically have 20 kV to 40 kV applied, and the charge is often held by capacitors. Severe injury could occur by accidental contact with this circuitry.

WARNING: Before servicing system products, disconnect the power cord. In systems that have multiple power supplies, disconnect all the power cords. The high-end systems do not completely shut off with the front panel Power On/Standby switch. The standby position removes power from most of the electronics and the drives, but portions of the power supply and some internal circuitry remain active.

WARNING: Safety interlocks are installed on some HP servers. To avoid the risk of personal injury, do not attempt to permanently defeat the safety interlocks that prevent access to hazardous energy.


Physical Injury and Equipment Protection
HP equipment needs to be handled, installed, removed, and disassembled properly to avoid the risk of personal injury or possible damage to the equipment. Observe the following warnings to protect against personal injury or product damage:
WARNING: Avoid the risk of personal injury by not lifting or moving heavy items without assistance. This includes large monitors, systems, and rack components such as uninterruptible power supplies. For components installed above shoulder height in a rack, have assistance removing the component to a work surface for repair or preventive maintenance. After the work on the system has been completed, have assistance replacing the component in the rack.

WARNING: Before working on high-end systems in a tower form factor that have casters, lock the casters in place to prevent the system from rolling during disassembly.

WARNING: Allow internal components to cool before handling them to prevent the risk of personal injury.

WARNING: To reduce the risk of personal injury or damage to the rack, be sure that:
- The leveling jacks are extended to the floor.
- The full weight of the rack rests on the leveling jacks.
- The stabilizers are attached to the rack if it is a single rack installation.
- The racks are coupled together in multiple rack installations.

Do not overlook this next warning. A fully loaded 42U rack is extremely heavy, with a load capacity of 1,000 pounds.
WARNING: To reduce the risk of personal injury, always ensure that the rack is adequately stabilized before extending a component outside the rack. A rack may become unstable if more than one component is extended for any reason. Extend only one component at a time.


Electrostatic Discharge Awareness and Precautions
Static electricity is an electrical charge at rest. The Triboelectric Effect is the generation of static electricity caused by rubbing two substances together (mechanical friction). Static electricity is generated every time you walk across a carpet or pull tape from a roll. Most of the time you are not aware of it unless the air is dry and you can hear or see the static charge crackle and spark its way to a new location. At humidity levels of 40% or lower, just by moving around, you can build up a static potential in your body of hundreds of volts.

ESD Precautions
Observe electrostatic discharge (ESD) precautions when servicing HP equipment. Every time a system is opened or a part is handled, there is a risk of damaging system components with electrostatic charges or of harming yourself by accidentally coming into contact with live voltage if working on a system with power applied.

When handling boards, use a wrist strap with safety resistance connected to an earth ground. The wrist strap keeps the static charge at near zero volts as it drains off the charge to earth ground. This keeps the chips safe from static charges during handling.

The safety resistance, a 1-MΩ or 2-MΩ resistor, is installed in series between the wrist strap and the earth ground. This is important because if an accidental shock does occur (for example, if you came into contact with the 120 V AC line), the voltage would otherwise push a lot of current through your body and the wrist strap to ground. The resistor in the wrist strap limits this current, because most of the voltage drops across the resistor rather than across the low resistance of your body.

ESDS Precautions
- Maintain and transport electrostatic discharge sensitive (ESDS) components in closed ESD protective packages (bags or containers).
- Keep ESDS items in their original ESD protective containers until they are needed to avoid unnecessary handling.
- Unpack and handle parts only at an ESD approved workstation.
- Tape documentation to the outside of the bag to avoid direct contact with the ESDS item.
- ESDS material returned for use must be packaged in ESD protective containers.
- Do not use any damaged ESD protective packages, that is, bags that are ripped, torn, crumpled, punctured, and so on.
- Keep packing material away from the ESD safe workstation.
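As a rough worked check of the wrist-strap safety resistor described under ESD Precautions, Ohm's law (I = V / R) gives the worst-case current through the strap. The sketch below ignores body resistance, so the result is an upper bound. The function name `strap_current_ma` is hypothetical; the 120 V and 1 MΩ or 2 MΩ figures come from the text above.

```python
def strap_current_ma(voltage_v: float, resistance_ohm: float) -> float:
    """Worst-case current in milliamperes through the safety resistor.

    Applies Ohm's law, I = V / R, and converts amperes to milliamperes.
    Body resistance is ignored, which can only overstate the current.
    """
    return voltage_v / resistance_ohm * 1000.0


if __name__ == "__main__":
    for resistance in (1_000_000.0, 2_000_000.0):
        current = strap_current_ma(120.0, resistance)
        print(f"{resistance / 1e6:.0f} megohm strap: {current:.2f} mA")
```

With 120 V applied, the current is limited to about 0.12 mA through a 1-MΩ resistor and about 0.06 mA through a 2-MΩ resistor.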

Obtaining a Complete System Backup


System data is the most important item to protect on the system. If an action plan produces disastrous results, you must be able to restore the system to its original state. Be sure to back up the contents of the hard drives and the system configuration before executing the action plan. It is also important to document any software parameters that will be changed. A complete system backup includes:

- Operating system backup
- System configuration backup
- Documenting existing software settings

Operating System Backup
If the system contains valuable data, verify that the customer has at least two complete known-good backups of the operating system and data, a copy of the backup software, and a functional tape drive that can read the backup. Two backups ensure complete data recovery in case something happens to the first tape or during the first restore attempt.

Ensure that the customer understands that backups and restores are their responsibility. If the customer is not willing to take responsibility for these actions, do not put yourself in jeopardy. Contact your service manager for further direction.

Do not overlook the importance of verifying this pre-work. Losing a company's data without any means to recover it is truly the worst possible scenario. Avoid it at all costs. Even seemingly trivial circumstances can lead to disastrous consequences if a backup is not available.

System Configuration Backup
Document the system settings, if this has not been done already. If the system configuration will be changed, first obtain a record of the current system configuration settings.

Create a backup.sci file to a diskette before making any changes. To do this, go into the system partition by using the F10 key during the boot up process. Select System Configuration, Configuration Backup, Backup to a System Configuration SCI file, Enter filename (backup.sci), and press Enter to write the data to a diskette.

Print out an Inspect report.


NOTE: Do not underestimate or forget about viruses. Many of their symptoms emulate hardware failures. Be sure that the customer is scanning with the latest version of their antivirus software. New viruses are found every day.


Documenting Existing Software Settings
If software settings will be changed as part of the action plan, first record the existing settings and parameters. If the action plan does not work, the original settings can be restored. If new files will replace old files, first rename the original files so that they can be reused later if the action plan fails. Generally, the file extension can be changed to something distinguishable such as .old or .bad. Record the original and the new name of the files changed.

General Server Shutdown and Startup Procedures
Ask the system administrator to follow HP's recommended general procedures for server shutdown and startup, as listed in the following table:

Operation | Procedure
General server shutdown | 1. Exit applications. 2. Exit operating system. 3. Power down the server. 4. Power down the peripherals.
General server startup | 1. Start up peripherals. 2. Start up server. 3. Look for errors.
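The file-renaming practice described under Documenting Existing Software Settings can be sketched in Python. This is a minimal illustration, not an HP tool. The function name `set_aside` and the file name `driver.cfg` are hypothetical. The function returns the original and new names so that they can be recorded.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def set_aside(original: Path, new_extension: str = ".old") -> Tuple[str, str]:
    """Rename a file by changing its extension and return both names.

    Refuses to overwrite an existing backup so that an earlier
    known-good copy is never lost.
    """
    backup = original.with_suffix(new_extension)
    if backup.exists():
        raise FileExistsError(f"{backup} already exists")
    original.rename(backup)
    # Return (original name, new name) for the service record.
    return str(original), str(backup)


if __name__ == "__main__":
    # Demonstrate in a throwaway directory with a hypothetical file.
    with tempfile.TemporaryDirectory() as work_dir:
        target = Path(work_dir) / "driver.cfg"
        target.write_text("original settings")
        print(set_aside(target))
```

To restore after a failed action plan, rename the backup file to the recorded original name.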


Learning What Is Normal


Before you can begin to identify a system failure, you must understand how the system should operate under normal circumstances. To identify a malfunctioning system, you need to understand what a system requires to operate properly as well as be able to recognize when a system is performing normally. It is necessary to understand items such as when, why, and what LEDs illuminate, in what order system components need to be powered up, the importance of SCSI termination, and the boot load order of files.

Not understanding how a system or subsystem operates can lead to unnecessary part replacement, unneeded software upgrades, as well as wasted time and effort. These things lead to unnecessary downtime and greatly reduced customer satisfaction.

Many tools can help identify normal activity and proper system setup. The manuals and CDs that ship with HP systems and options are good references that describe proper operation. For example, the UPS User Guide illustrates what a particular sequence of LEDs on a UPS indicates. Maintenance and Service Guides provide many system specifications as well as symptoms of failures and troubleshooting hints. QuickFind contains a comprehensive list of error codes, switch settings, and other system and option documentation.

Equally as important as knowing how a system should operate is understanding how a system will react to a failed hardware or software component. Knowledge of the system's dependency on its components includes how the system will perform if component x completely fails and how it will perform if component x partially fails. This ability involves recognizing the warning signs of a failed or failing component. These signs or symptoms may include consistent or intermittent error codes, loss of functionality, or a change in the time the system takes to perform a task.

As you gain experience in troubleshooting systems, you will become familiar with which error codes and symptoms follow which incidents or failures. Familiarity with how a system will react to a failed component and the ability to collect relevant data will significantly aid in identifying the true cause of the problem regardless of what it may be.


HP Troubleshooting Methodology

HP Troubleshooting Methodology Overview


This flowchart illustrates the HP troubleshooting methodology. An overview of each step follows.
Collect Data
1. Ask the right questions.
2. Determine and use the most appropriate tools for each situation.
3. Understand how the machine itself will react in a failure situation.
Result = Identified Mode of Failure

Evaluate the Data to Determine Potential Subsystems Causing Issue
1. Understand the REAL (not reported) mode of failure.
2. Determine which subsystem could cause what happened.
3. Isolate faults to a subsystem (FRU) or software component level.
Result = Isolated to Subsystems

Develop an Optimized Action Plan
1. Identify possible root causes of issue for each potential subsystem.
2. Identify possible solutions for each possible root cause.
3. Prioritize the possible solutions by balancing time/cost it will take to implement each solution against likelihood that the solution will fix the issue or the potential value of the information gained from the failure of the solution.
4. Identify the steps necessary to implement each solution.
5. Compile all the steps into a master plan by eliminating redundancy and ensuring that you are only manipulating one variable at a time.
Result = Have a Plan of Attack

Execute the Action Plan
Carefully execute each step, making sure to apply only one solution or one variable at a time.

Problem Solved?
YES: Implement Preventive Measures
NO: Return to an earlier step along one of the paths listed here.

Return paths shown in the flowchart:
Execute Next Task in Action Plan: Previous task or test did not resolve or identify cause of issue.
Try Alternate Solutions: Eliminate last task. Re-evaluate action plan order.
Re-Diagnose Mode of Failure: Action plan did not resolve expected issue or identify alternate cause of issue.
Need More Information: Data is insufficient to identify problem correctly.
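The flow of the methodology can be pictured as a loop. The following Python sketch is only an illustration of that flow, not an HP tool. The names `troubleshoot`, `collect`, `evaluate`, `plan`, `execute`, and `max_rounds` are hypothetical; each callable stands in for work the technician performs by hand.

```python
from typing import Callable, List, Optional


def troubleshoot(
    collect: Callable[[], dict],
    evaluate: Callable[[dict], Optional[List[str]]],
    plan: Callable[[List[str]], List[str]],
    execute: Callable[[str], bool],
    max_rounds: int = 5,
) -> Optional[str]:
    """Return the task that solved the problem, or None if it is unresolved."""
    data: dict = {}
    for _ in range(max_rounds):
        data.update(collect())            # Collect Data
        subsystems = evaluate(data)       # Evaluate the Data
        if not subsystems:                # Need More Information
            continue
        for task in plan(subsystems):     # Develop an Optimized Action Plan
            if execute(task):             # Execute one task, then ask "Problem Solved?"
                return task               # YES: move on to preventive measures
        # NO for every task: go around again with what was learned
    return None
```

Each pass through the outer loop corresponds to returning to data collection with whatever the previous action plan revealed. The `max_rounds` limit is an assumption added so that the sketch always terminates; in practice this is the point at which the escalation plan applies.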


Step 1 Collect Data


Most of the time involved in troubleshooting a problem is spent in gathering information. To arrive at an accurate problem description, you need to develop these skills and knowledge level:

Ability to ask the right questions
Ability to determine and use the most appropriate tools for each situation
Understanding of how the system will react in a failure scenario

Collecting data includes:

Identifying hardware components in the system
Identifying software components in the system
Asking questions to understand what failed and in what context
Continuing to ask questions to learn as much detail as possible
Gathering failure information such as:
    Stop/Abend/Trap messages
    Insight Manager error conditions
    Critical error log messages
    POST messages
Organizing collected data

Step 2 Evaluate the Data to Determine Potential Subsystems Causing the Issue
After you collect data and identify the symptoms, evaluate all of these facts and symptoms to:

Determine which components could cause what happened.
Isolate faults to a hardware or software subsystem.
Understand the mode of failure.


Step 3 Develop an Optimized Action Plan


After collecting the facts and isolating the specific mode of failure, develop an optimized action plan.

Identify specific root causes for the specified mode of failure.
Identify possible solutions for each possible root cause.
Order the solutions by balancing the time/cost it will take to implement each solution against the likelihood that the solution will fix the issue or by the potential value of the information gained if the solution is inadequate.
Identify the steps necessary to implement each solution.
Compile all the steps into an optimized action plan by eliminating redundancy and ensuring that only one variable is being manipulated at a time.
Incorporate an escalation plan into the master action plan:
    Be prepared to escalate for technical assistance.
    List the order of whom to contact and the information needed by each.

Step 4 Execute the Action Plan


Implement the written optimized action plan. Carefully observe and record the results of each step. Even if the action plan does not solve the problem, it may provide more clues to solving it.

Carefully execute each step.
Apply only one solution or variable at a time.
Observe and record the results of each step including any error messages or changes in functionality.

Step 5 Determine if the Problem Is Solved


Observe the results of each step in each solution and evaluate the results of each step until the problem has been isolated and resolved. If the problem is not resolved:

Collect more data.
Utilize the information gathered from implementation of the action plan.
Evaluate the information.
Develop another optimized action plan.
Implement the optimized action plan.

Repeat these steps as additional information is gathered and new action plans are optimized, executed, and evaluated, until problem resolution is reached.

Step 6 Implement Preventive Measures


As soon as the problem is resolved, look for opportunities to implement preventive measures that keep the problem from occurring again, and look for other ways to increase availability. To implement preventive measures:

Determine the root cause of the problem.
Determine proactive steps that can prevent the problem from recurring.
Devise a system test to verify changes and procedures before implementing them into production.
Implement a new set of procedures, software, and administrative maintenance to attain a higher level of availability.
Perform preventive maintenance, including checking for loose cables, reseating boards, and checking for proper airflow.
Add fault tolerant elements to critical subsystems, where applicable.


Step 1-Collecting Data


The troubleshooting process starts with collecting data. The first piece of information needed is an understanding of the customer's perception of the problem. Ask the right questions and listen to the customer. Next, collect information. This can be done through questioning, observation, problem duplication, and focused observation. This should give you insight into the actual problem. Last, organize the collected data so that it can be easily evaluated.


Understanding the Customer's Reported Problem

The first item to be identified in the troubleshooting process is the customer's reported problem. Listen carefully and ask questions to understand specifically what the customer perceives to be wrong. Equally important is discovering and understanding the customer's perception of the exhibited symptoms and their possible cause or causes. Realize that what the customer has identified as the cause could be very different from the actual cause. Refrain from making assumptions and conclusions. Remain neutral on identifying the true cause of the failure until you have collected enough data to make an impartial evaluation and hypothesis. Once you have an understanding of the customer's perception of the problem, use the appropriate data collection technique to troubleshoot further.

Understanding the True Failure


The next step is to understand the true failure. Troubleshooting cannot progress until the exact problem is identified. Without understanding what is truly wrong, it is impossible to correct it. It is important to get to the root cause of the problem. Symptoms may mask the root cause of the problem. Fixing only a symptom is at best a temporary solution. Whatever caused the symptom to occur in the first place will most likely make it occur again.

A failing component may knock out several more highly visible components. These component failures are effects of the cause. If just the components are replaced without also replacing the component that is knocking them out, these components will soon need to be replaced again. As you proceed, validate the customer's problem while digging for the true failure. It is important that both be addressed.


Data Collection Techniques


The majority of the time spent in effective troubleshooting is spent in the data collection phase searching for the cause and the effect. The cause will have occurred shortly before the appearance of the effect. Find out what happened or changed just before you recognized the problem. Use reference tools to help you understand error messages and results. Data collection techniques include:

Questioning
Observation
Problem duplication
Focused observation and magnification

Questioning

Questioning is a valuable technique, but you should understand that not everything you hear will be one hundred percent accurate. Much of the reply will be perception or from memory. Questioning is useful to understand the customer's perception of the problem and will provide valuable clues that may lead you in the right direction. Information gained from questioning must be validated. Questioning also involves careful listening skills. Do not interrupt a customer and never assume you know what the customer is going to say. Questioning is the art of polite interrogation.

Open-Ended Questions

When beginning your data collection, ask open-ended questions. Open-ended questions are those that will provoke and permit spontaneous and unguided responses. The customer's complete explanation will provide more details than a short answer. A customer may mention something that turns out to be a valuable clue. Ask the customer to explain what happened and listen instead of asking a series of short questions looking for specific details.

The Right Questions

Asking the right questions obviously depends on the immediate issue. First, center your questions on identifying failure symptoms. Once all the symptoms have been identified, the questions should then center on identifying what may have occurred before the symptoms appeared. This line of questioning will help isolate the problem to a subsystem or to a defective field replaceable unit within that subsystem.


Controlled Questioning

Once generic open-ended questioning is finished, use controlled questioning to dig deeper into the situation. The results of the open-ended questioning should provide you with a beginning baseline of what occurred and what symptoms appeared. Controlled questions should be used when one of the answers provided to an open-ended question is either not logical or provides a clue to a malfunctioning subsystem. When a clue is provided, examine further by asking specific, focused, probing questions. The answers to these questions should allow you to ask even more specific and relevant questions until you understand how the system is functioning and can define precisely when and where the error occurs and in what context the error occurs.

The answers will taper off as you tap out the customer's knowledge of the failing system and situation. At this stage, use the appropriate tools and utilities to collect the information that the customer was unable to provide and to validate or invalidate the information.

Look into all the error logs for information.
Look up operating system errors in the appropriate tools. These tools usually offer valuable information on the root cause of the problem and offer suggestions on which items to check.
Probe related components for possible links.
Continue in this manner until you have collected as many facts as possible regarding the condition of the system both before and after the problem occurred.

Examples

Collect enough data to pull the pieces together to see the big picture:

Can you describe the problem in detail?
What happened prior to the point of malfunction?
Look for discrepancies. What does not fit?
Is there a system log, an Inspect report, a network map?
Was there an error message? What was it?


Observation

Observation is an important and useful data collection technique. Using your powers of observation and your senses can provide critical information on failures. If you are in front of the system, observation should always be one of the first techniques used. This can be accomplished by physical inspection. Look for something that is not connected or that is out of place.

Visual Indicators

Look for:

Something that appears wrong
Charred or discolored components
Unconnected plugs or cables
Switches in an incorrect position
Smoke
Physical damage
LED activity

Audible Indicators

Listen for:

Anything that sounds different
Anything that sounds wrong
Beeps
Clicks
Whirring sounds
Grinding sounds

Olfactory Indicators

Sniff for bad or unusual odors. (Acrid electrical odor indicates burnout.)

Tactile Indicators

Carefully touch components to learn if something is cool when it should be warm.
Carefully touch components to learn if something is overheating.
Toggle switches to find out if they click in place or are loose.
Wiggle cables or wires to find out if they are loose.
Press down on boards to find out if they are seated correctly or reseat them.
Use your fingers to detect frayed cables (better than visual observation).

Researching Error Messages


If you have collected error messages, use any manuals, reference guides, or application-supplied help screens to find information on those error messages. QuickFind, HP's website, and Service Advisories are also excellent sources of additional information.

Is It Really an Error?

Be aware that while you are researching error messages you will run across some that are benign. Some error messages are simply informational messages and are not reporting an error condition. The more you learn about operating systems and applications, the more you will be able to recognize what is outside the norm.

Knowing what to expect and how a system functions can quickly resolve calls from customers who believe that something is wrong with their system when it is actually performing as designed. The customer's perception or fear that the system is malfunctioning can quickly be put to rest with an accurate assurance that what they are observing is normal, together with the reason why. Interview the customer to get specifics and, if possible, ask the customer to duplicate the behavior. It may be a user error or oversight, and the customer simply needs some redirection.


Organizing the Collected Data


Once the data has been collected, it should be organized so that it is easily evaluated. The organization of the data for a single problem is called baselining. Data for multiple problems can be organized in a field journal, providing a service history.

Baselining

Baselining is the process of organizing the data collected into the specific configuration of a system. This process will provide a reference point for managing change. A baseline can also be used to compare a malfunctioning system with a functional one. There are two types of baseline:

A complete baseline is a document or set of documents defining all the facts that can be known about the hardware, software, and firmware configuration of a system including its environmental conditions. It also includes the version of diagnostic tools and utilities used on the system. There are no guesses, estimations, or assumptions in a baseline. All details are researched and verified.
A working baseline is the set of facts needed to understand the problem. The more facts that are collected, the more accurate the assessment of the problem will be.

Process of Baselining

Document all the facts you collect about the system. If you can, print a screen image of any error messages that may have been produced for further reference. Create useful drawings such as a diagram indicating which boards are in which slots or a diagram of nodes with their network addresses. As you go through the data collection process of asking questions, making observations, using diagnostic tools, and controlled questioning to gather more detail, document all the facts. Together, these documents, printouts, and drawings are your accurate field notes and create a snapshot of the system before any changes are made.

As you gain experience in creating a baseline, you will automatically gather the information needed to produce a working baseline defining how the system is functioning, precisely when and where the error occurs, and in what context the error occurs.

The baseline can be carefully evaluated in the next step of the HP troubleshooting methodology to determine which subsystems have the potential to produce the symptoms recorded based on the current hardware, software and firmware configuration, and environmental conditions. The troubleshooting direction you take next will be determined by the evaluation of this information. Once solutions are tried and tested, the changes and the results will be compared against this baseline.
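If baselines are kept in electronic form, comparing a malfunctioning system with a functional one can be as simple as listing the recorded facts that differ. The following Python sketch assumes each baseline is stored as a dictionary of item names and verified values. The function name `compare_baselines`, the item names, and the sample values are hypothetical.

```python
def compare_baselines(reference: dict, suspect: dict) -> dict:
    """Return {item: (reference_value, suspect_value)} for every item that differs."""
    differences = {}
    for item in sorted(set(reference) | set(suspect)):
        ref_value = reference.get(item, "<not recorded>")
        sus_value = suspect.get(item, "<not recorded>")
        if ref_value != sus_value:
            differences[item] = (ref_value, sus_value)
    return differences


# Hypothetical field notes for two similarly configured servers.
working = {"system_rom": "version A", "slot_3": "array controller", "ram_mb": 1024}
failing = {"system_rom": "version B", "slot_3": "array controller", "ram_mb": 1024}

for item, (good, bad) in compare_baselines(working, failing).items():
    print(f"{item}: functional system = {good}, malfunctioning system = {bad}")
```

A difference is only a lead to evaluate in the next step, not proof of a root cause. Items missing from one baseline are reported as "<not recorded>" so that gaps in the data are visible rather than silently ignored.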


Field Journal

Many technicians and system engineers maintain notebooks or electronic journals filled with field cases and solutions so they have written records to draw on in the future. Most organize journals into sections, one for each subsystem with subsections for problems and solutions. This is one of the most valuable reference tools because it is filled with real problems the author lived through.

Some create their own journals by creating a template and then binding many copies of this template in one notebook. As each new problem is attacked, relevant information is filled in on the template. This is entirely a personal system; there is no one right way of doing this. What works for one technician may not work for another. The following information is common to many of these journals. The information listed here is only to provide an example.

Pertinent Hardware Items

Model of the system or subsystem
Serial number
Version of the System Configuration Utility
Version of the System Diagnostics
System ROM date or version
Options ROM date or version
Type, quantity, size, speed, and layout of RAM
HP boards and slot locations
Third-party boards and slot locations, if any
Externally connected hardware, if any
POST error code, if any
Other errors reported, if any
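For technicians who keep an electronic journal, the template idea translates directly into a small record type. The following Python sketch is one possible layout and covers only some of the items listed above. The class name `JournalEntry`, its field names, and the helper `entries_for_subsystem` are hypothetical; adapt them to your own system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JournalEntry:
    """One field case: what the system was, what went wrong, what fixed it."""
    model: str
    serial_number: str
    subsystem: str                      # journal section, for example "storage"
    problem: str
    solution: str = ""                  # filled in once the case is resolved
    system_rom: str = ""                # date or version
    boards_by_slot: Dict[int, str] = field(default_factory=dict)
    post_error_codes: List[str] = field(default_factory=list)


def entries_for_subsystem(journal: List[JournalEntry], subsystem: str) -> List[JournalEntry]:
    """Return the past cases filed under one subsystem section."""
    wanted = subsystem.lower()
    return [entry for entry in journal if entry.subsystem.lower() == wanted]
```

Filing each entry under a subsystem mirrors the notebook layout described above and lets you pull up earlier cases when a similar symptom appears.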


Step 2-Evaluating Information to Isolate Mode of Failure


Developing the ability to isolate faults to a subsystem and ultimately to a particular field replaceable unit (FRU) is an important step in gaining the skill set needed to effectively diagnose and resolve HP system problems. Problems are not always self-explanatory. Problems can be chameleons. Techniques that can dig further into the problem or divide it up into manageable pieces are important for effective and efficient problem diagnosis and resolution. Troubleshooting techniques that can be used to effectively troubleshoot and isolate faults in HP products include:

Elimination
Minimum configuration

Once you have evaluated the data, determine if a true failure exists before proceeding.


Elimination
The process of elimination is an important part of troubleshooting. The elimination technique simplifies the variables or FRUs that make up the present configuration by removing suspect FRUs. Remove suspect FRUs to observe how the system operates when they are not part of it. If the system still malfunctions, these FRUs are probably not contributing to the problem. However, if the problem is resolved, one or more of these FRUs is a contributing factor. Eliminating FRUs is also valuable to see if the problem changes once they are removed, thus identifying new potential FRU failures.

Example

Some of the older ROM versions in switchboxes caused jerky pointing device movements and freeze-ups. The firmware can be changed out to correct this. To verify that the switchbox is causing these symptoms, temporarily remove the switchbox from the system. Removing the switchbox and switchbox cables can also be used to isolate miscellaneous video, keyboard, and pointing device problems. If the problem disappears when the switchbox is eliminated, then either the switchbox or the switchbox cables are the problem FRU. Further elimination will isolate the defective part.
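The reasoning behind elimination can be written out as a short procedure. The following Python sketch assumes suspects are removed one at a time and restored between tests, which is one way of applying the technique. The names `eliminate` and `fault_present_without` are hypothetical; the callable stands in for the technician removing the FRU, testing the system, and reinstalling the FRU.

```python
from typing import Callable, Iterable, List


def eliminate(
    suspects: Iterable[str],
    fault_present_without: Callable[[str], bool],
) -> List[str]:
    """Return the suspects whose removal made the fault disappear."""
    contributors = []
    for fru in suspects:
        # If the fault persists with this FRU removed, the FRU is probably
        # not contributing. If the fault disappears, it is a contributing factor.
        if not fault_present_without(fru):
            contributors.append(fru)
    return contributors
```

In the switchbox example, calling `eliminate` with the switchbox and the switchbox cables as separate suspects would correspond to the further elimination that isolates the defective part.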


Minimum Configuration
Minimum configuration is the process of removing every FRU except those the system needs to operate in its minimum configuration. This is a drastic but effective way to eliminate a large quantity of FRUs at once. If the problem still occurs, all those removed components have just been eliminated as contributors to the problem. If the problem goes away, take your time adding components back until the cause or causes are discovered.

Variations on this technique that can be useful in troubleshooting include reducing just memory or just processors to the minimum hardware configuration or just removing all added expansion boards. Reducing the system to just HP components is a fairly common variation that quickly identifies any conflicts with third-party components. This technique is good to use if you have no indication of which FRU may be contributing to the problem. In some cases, removing unnecessary hardware immediately also provides a set of spares if duplicate parts are installed in the system.

Example

Temporarily remove all options installed in the expansion slots, any additional memory DIMMs and processors, and their processor power modules (PPMs). Boot the system back up and test if the problem still occurs.
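The add-back phase of this technique follows a simple pattern: test the minimum configuration, then restore one component per test until the fault returns. The following Python sketch illustrates that pattern. The names `add_back` and `fault_present` and the three result strings are hypothetical; the callable stands in for booting and testing the system in a given configuration.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def add_back(
    minimum: Sequence[str],
    removed: Sequence[str],
    fault_present: Callable[[List[str]], bool],
) -> Tuple[str, Optional[str]]:
    """Test the minimum configuration, then add removed parts back one at a time."""
    config = list(minimum)
    if fault_present(config):
        # The removed components are eliminated as contributors.
        return ("fault in minimum configuration", None)
    for component in removed:
        config.append(component)        # one variable per test
        if fault_present(config):
            return ("fault returned after adding", component)
    return ("fault not reproduced", None)
```

The sketch stops at the first component that brings the fault back. Because a problem can have more than one cause, a technician would normally continue testing the remaining components after dealing with the first one.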


Determining if a Customer's Issue Is a True Failure

After evaluating the data, you may occasionally find that a customer reports a failure when no problem exists. These situations generally fall into two categories. The first is the customer's unrealistic expectation of the system's or option's performance, and the second is the customer's lack of understanding of the method of operation of the system or component.

If the customer expects a system or option to perform to a specific level and it cannot, the customer will view this as a failure. The system or option may not have the capability to perform what the customer expects or may not have the capability to perform to the degree that the customer is expecting. For example, the customer expects the integrated SCSI controller to be able to perform as a SCSI array controller; but because it does not have that capability, it cannot. There is absolutely nothing wrong with the integrated SCSI controller. To resolve this issue, educate the customer. A careful and accurate explanation of what the system or option is capable of performing needs to be relayed. The customer may need to purchase additional equipment or select a different system to attain the functionality or level of performance desired.

Understanding how a system functions when running properly can quickly resolve those calls from customers believing that something is wrong with their system when it is performing properly. The customer's lack of understanding of how a system operates or inaccurate knowledge regarding how a system should operate can lead the customer to believe that a properly functioning system is malfunctioning. For example, if the customer does not understand the significance of the various LED illuminations, the customer may believe that the system is defective when, in fact, the system is functioning normally. The customer's perception that their system is malfunctioning can be resolved with a detailed explanation of why a particular behavior is observed, what it indicates, and why it is normal. Point the customer to documentation, if possible.


Step 3-Developing an Optimized Action Plan


The action plan is a series of possible cures, variables, or solutions to correct the present fault condition of the system. Developing an optimized action plan includes:

Identifying possible root causes
Identifying possible solutions
Planning and scheduling
Identifying potential problems in implementing a solution
Identifying how to test results
Optimizing the action plan

Identifying Possible Root Causes


Once the collected data has been carefully evaluated, certain subsystems will have been identified as having the potential to cause the problem. Take into consideration all symptoms to adequately identify the correct subsystem(s). Based on that determination, the next step is to identify the root cause or causes behind the failure of that subsystem. It is important to accurately identify the true failure as opposed to resolving the symptoms. For example, if the processor power modules (PPMs) fail repeatedly, this is actually a symptom. PPMs typically do not have repeated failures. Something else is the root cause and must be identified so that the true failure can be solved. Replacing PPMs has only a temporary effect. Possible root causes are a faulty processor board, faulty power supply, or line voltage problems.

Identifying Possible Solutions


Identify all possible solutions to each possible root cause of the problem. Try to come up with several alternate plans in case the first plan does not work. Solving complicated problems may require several attempts. Remember, an obvious, easy solution may be all that is required. Try not to make the time-consuming mistake of replacing parts when configuring the system properly is all that is needed. For more complicated problems, analyze the data collected and brainstorm alternate solutions, especially if this is the second attempt at generating an action plan. Make logical conclusions instead of assumptions. The master action plan is a compilation of a series of possible solutions that attempt to solve the problem. Optimize this plan so the most effective or most efficient possible solutions are tried before the more time-consuming and less likely ones. Possible solutions to the problem of the failing PPMs may include replacing the processor board, replacing the power supply, and acquiring a UPS. Having an electrician check out the line voltage is another solution. The failed PPMs need to be replaced, so they are included as a step in the action plan.

Planning and Scheduling


In developing an action plan, you must determine how much time it will take to execute the multiple solutions involved. Scheduling an adequate period of time to execute the action plan is critical to the success of the plan. Knowing the length of time available to execute the action plan will help you to prioritize the solutions. If the downtime window is only one hour, time-consuming solutions are out. List in the action plan all the tools required to perform the steps in the plan. Tools may include a Torx screwdriver, loopback cable, ESD wrist strap, mat, and a flashlight. List other necessary prerequisites that must be completed before the action plan can be executed. The main prerequisite is two complete verified system backups.

Identifying Potential Problems in Implementing a Solution


Even the best solutions can generate problems. Obstacles may include excessive time to implement, excessive cost, excessive labor to execute, possible side effects, and the likelihood of completely and adequately solving the problem. Remain objective about any solution. Being too optimistic about any possible solution's success can cloud its negative aspects. List the negative aspects of the possible solution. This step will be used to rank the solutions in the master action plan.

Identifying How to Test Results


Look for ways the effectiveness of the solution can be adequately measured through testing. If possible, list several tests for each solution. The tests may be identical across many solutions. If the problem is with the re-indexing of a database, then re-indexing may be the only adequate test. Consider whether the customer would be satisfied by the test. If HP diagnostics were failing before, it would be a good test to rerun them after the action plan is executed.
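When the same tests are run before and after the action plan, the two sets of results can be compared directly. The following Python sketch assumes each run is recorded as a dictionary mapping a test name to True (passed) or False (failed). The function name `compare_test_runs` and the category labels are hypothetical.

```python
def compare_test_runs(before: dict, after: dict) -> dict:
    """Group tests by how their pass/fail result changed between two runs."""
    summary = {"fixed": [], "still failing": [], "newly failing": []}
    for test in sorted(before):
        if test not in after:
            continue  # only tests run both times can be compared
        if not before[test] and after[test]:
            summary["fixed"].append(test)
        elif not before[test] and not after[test]:
            summary["still failing"].append(test)
        elif before[test] and not after[test]:
            summary["newly failing"].append(test)
    return summary
```

A test listed under "newly failing" suggests a side effect of the change, which is worth recording even when the original problem is fixed.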


Optimizing the Action Plan


When you optimize the action plan, try to balance the following criteria as you rank the solutions:

Prioritize the possibilities.
Avoid backtracking.
Eliminate redundant steps.
Change one variable at a time and implement one solution at a time.

Prioritizing the Possibilities

After you have listed all the possible solutions to the root causes, prioritize them according to their likelihood of correcting the root cause. Once the most likely solution is selected, continue down the list, selecting the next best proposed solution until they are all in the most sensible order. Take into consideration the side effects the solutions could cause. Weigh the following criteria when prioritizing the potential solutions:

The time it will take to execute
If a possible solution takes a very long time to execute, it may not be a reasonable solution to execute. The cost of downtime is very high and must be kept to a minimum. A solution that involves a great deal of time to execute may be a great solution, but because it is time consuming it may not be ordered as the first solution to act on.

The monetary cost and the amount of work involved
Costly or difficult solutions are usually ordered after less costly or easier ones. These solutions may also require more preparation time. When calculating monetary cost, remember to think about the cost of downtime to the customer.

The value of the information gained from the failure of the solution
Even when a solution fails, valuable information may be gained. This failure may make the problem worse. If it does, you are probably on the right track, and you need just to pick a different variable. New error messages generated by the system may provide the clues you were missing. No change at all may indicate that you have selected the wrong subsystem to troubleshoot. The failure may eliminate a subsystem or FRU as the cause of failure and point you toward a different subsystem as the root cause.
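The criteria above can be combined into a rough ranking. The following Python sketch is one possible way to do the balancing, not a prescribed HP formula. It assumes you can estimate each figure, treats the benefit of a solution as the larger of its likelihood of success and its information value, and divides by the total cost including downtime. The names `Solution`, `priority`, and `rank` and all sample numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Solution:
    name: str
    likelihood: float   # estimated chance of fixing the issue, 0.0 to 1.0
    info_value: float   # estimated value of what a failure would reveal, 0.0 to 1.0
    hours: float        # estimated time to implement
    cost: float         # estimated parts and labor cost, in any consistent unit


def priority(solution: Solution, hourly_downtime_cost: float) -> float:
    """Higher is better: benefit per unit of total cost, downtime included."""
    benefit = max(solution.likelihood, solution.info_value)
    total_cost = solution.cost + solution.hours * hourly_downtime_cost
    return benefit / total_cost if total_cost > 0 else float("inf")


def rank(solutions: List[Solution], hourly_downtime_cost: float) -> List[Solution]:
    """Order solutions from first to try to last to try."""
    return sorted(
        solutions,
        key=lambda s: priority(s, hourly_downtime_cost),
        reverse=True,
    )
```

With estimates of this kind, a quick, inexpensive step such as reseating boards tends to rank ahead of a board replacement even when the replacement is more likely to fix the problem. The scores are only as good as the estimates, so treat the ordering as a starting point and adjust it for side effects and customer constraints.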


Avoiding Backtracking

Backtracking can waste valuable time. Watch for it when you create the action plan and try to eliminate it as much as possible during the optimization phase. An example of backtracking is going to an error log twice to look for different error messages.

Eliminating Redundant Steps

Look at ways to completely eliminate redundant steps. If that is not possible, reduce them as much as possible. Performing redundant steps if not completely necessary is also a waste of your time. Look at ways of linking two solutions together into one master solution if it can be done in a logical way.

Changing One Variable at a Time and Implementing One Solution at a Time

Execute each step of the plan by applying only one solution or by changing only one variable at a time. Understanding exactly what corrected the problem can lead to understanding what may have caused the problem to occur in the first place. When multiple changes are made at one time, it is impossible to know which one solved or modified the problem. Whenever possible, avoid applying multiple changes at one time.
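Compiling several solutions into one master plan while dropping repeated steps can be sketched as follows. This Python example assumes each solution is written as an ordered list of step descriptions and that a step with identical wording only needs to be done once, such as reading an error log. The function name `compile_master_plan` and the sample steps are hypothetical.

```python
from typing import List, Tuple


def compile_master_plan(solutions: List[Tuple[str, List[str]]]) -> List[str]:
    """Merge the steps of prioritized solutions, keeping the first copy of each step."""
    seen = set()
    master = []
    for _name, steps in solutions:      # solutions are already in priority order
        for step in steps:
            if step not in seen:
                seen.add(step)
                master.append(step)
    return master


plan = compile_master_plan([
    ("update firmware", ["verify backups", "read error log", "update ROM"]),
    ("replace board", ["verify backups", "read error log", "swap board"]),
])
print(plan)
```

Steps that must genuinely be repeated, such as powering down before each hardware change, should be worded so that they are not merged. The sketch removes duplicates only; it does not decide which changes can safely be combined, so the one-variable-at-a-time rule still has to be checked by hand.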


Step 4-Executing the Action Plan


Before implementing the action plan, take appropriate precautions and check that any necessary pre-work is complete, such as system and configuration backups. As the action plan is implemented, use caution and put your personal safety first. To safely implement an action plan and achieve problem resolution:

Schedule time with the customer.
Implement the optimized action plan.

Scheduling Time with the Customer


If the system to be worked on is currently in production, work with the system administrator or the person who is responsible for scheduling downtime. Downtime is very expensive to our customers and must be scheduled appropriately to minimize the effect. The service event should be scheduled only when there is a sufficient window available to perform the necessary work and to recover in the event of a catastrophic failure.

Set up a timeframe to execute the action plan, observe, and evaluate the results. Include enough time to implement preventive measures and perform preventive maintenance. Underestimating the time needed to perform the necessary work or to recover from a disaster will result, at best, in rushing through a job or, at worst, in failing to complete a job coupled with customer dissatisfaction. Explain to the customer exactly what the action plan involves, how much time is required to execute it, and the risks associated with it. Depending on the guidelines set out by the service center manager, your service center may also need to be informed. The customer should be familiar with how much time a complete restore will take in a worst-case scenario and be able to estimate the total amount of time required. If the customer selects non-business hours for warranty work to be scheduled, or cannot provide an adequate window to perform the necessary work, contact your service manager for direction.
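The arithmetic behind sizing a service window can be sketched as follows. This is a minimal illustration, not an HP tool: the function name `required_window` and all of the durations are hypothetical, and it assumes the worst case in which the work is completed and a full restore is still required afterward.

```python
from datetime import timedelta

def required_window(execute, observe, preventive, full_restore):
    """Return the downtime window to request from the customer.

    Assumption: in the worst case the planned work is finished and a
    complete restore is still needed, so the restore time is added on top.
    """
    return execute + observe + preventive + full_restore

# Hypothetical estimates for one service event.
window = required_window(
    execute=timedelta(hours=2),        # execute the action plan
    observe=timedelta(minutes=30),     # observe and evaluate the results
    preventive=timedelta(hours=1),     # preventive measures and maintenance
    full_restore=timedelta(hours=4),   # complete restore, worst case
)
print(window)  # 7:30:00
```

If the window the customer can offer is shorter than this total, that is the situation in which the text advises contacting your service manager for direction.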


Implementing the Optimized Action Plan


After the action plan is optimized and all appropriate precautions have been satisfied, it is finally time to implement the action plan.

Take a few minutes to look at the action plan to make sure that any necessary precautions and pre-work have been adhered to and completed and that the steps appear complete and in the correct order. Anything that needs to be completed, fixed, or taken care of should be done now. Make sure that everything needed to execute and complete the action plan is ready. Gather tools needed for system disassembly, necessary spare parts, and configuration utilities, if appropriate. Now it is time to begin.

Be sure to do the following when implementing the optimized action plan:

• Change only one variable at a time.
• Record the output.
• Observe the results.
• Test the solution.


Changing Only One Variable at a Time
Implement each step of the plan by applying only one solution or by changing only one variable at a time. It is important to work with only one change at a time, regardless of whether that change is a modification, an addition, or a deletion of a specified item. By following this simple guideline, you can observe exactly which variable corrected or modified the existing problem or a symptom of the problem. You can also observe which variable appears to have no visible impact on the problem or symptom. This information is equally important for understanding and resolving the system problem.

Understanding exactly what corrected the problem can lead to understanding what may have caused the problem to occur in the first place. Understanding the root cause is important for preventing the problem from recurring and for putting the needed preventive measures in place. If you make multiple changes at one time, you do not know which one solved or modified the problem. Avoid applying multiple changes at one time whenever possible.
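The guideline above can be expressed as a small loop. The sketch below is illustrative only: `apply_one_at_a_time`, the simulated `system` dictionary, and the change names are all hypothetical stand-ins for real service actions and real tests.

```python
def apply_one_at_a_time(changes, problem_present):
    """Apply each change alone and record whether the symptom remains.

    changes: list of (name, apply_fn) pairs; each apply_fn makes exactly one change.
    problem_present: callable that returns True while the symptom persists.
    Returns a list of (name, still_failing) pairs and stops at the first fix.
    """
    log = []
    for name, apply_fn in changes:
        apply_fn()                         # exactly one variable changes here
        still_failing = problem_present()  # test before touching anything else
        log.append((name, still_failing))
        if not still_failing:
            break
    return log

# Simulated system: in this made-up case only the driver update clears the fault.
system = {"cable_reseated": False, "driver_updated": False}
changes = [
    ("reseat SCSI cable", lambda: system.update(cable_reseated=True)),
    ("update driver", lambda: system.update(driver_updated=True)),
]
log = apply_one_at_a_time(changes, lambda: not system["driver_updated"])
print(log)  # [('reseat SCSI cable', True), ('update driver', False)]
```

Because the symptom is checked after every single change, the log shows both which change had no visible effect and which one corrected the problem.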


Recording the Output
On a hard copy of the optimized action plan, draw a column beside the steps. This column will be used to record the results of the steps executed. Immediately following the complete execution of a possible solution, these results or output will be used as input to the evaluation process, which judges if the solution completely solves the problem under every situation. Record the results of each step after it is executed, making sure to include any error messages or additional information collected. For future reference, record the date and time as well as how long it took to complete these steps. If these written records are complete and accurate, they will serve to eliminate future guesswork and possible confusion regarding what happened in what order. If the action plan solved the problem, it is a recorded solution for future use if the problem reappears.

Observing the Results
Carefully observe each step of the action plan as it is implemented. Watch for the occurrence of new symptoms or the elimination of existing ones. Some results are obvious, such as the introduction of informational or error messages, or significant changes in functionality. Other changes may not be as obvious and may require checking system logs for any new events recorded after the change was made. If the action plan calls for a reboot, watch for changes at POST if relevant. If the system has been failing at a certain point, watch if the system can now go past it or if it still fails at the same point. The following are examples of the types of observations that should be recorded on the hard copy of the action plan:

• What are the results of the step? Watch for and record new symptoms, such as error messages or informational messages.
• Did anything change? If so, what? Check system logs. Look for any type of change, no matter how insignificant.
• Was any functionality gained or diminished? Functionality changes are an important indicator of the effectiveness of the action plan.
• Were any errors made in implementing the step? Was more than one variable changed at a time? Watch for and record any mistakes made while executing the step or the action plan.
• Were any steps skipped or completed out of order? Circle the steps not executed and number the true order in which the steps were executed.
• Were any steps accidentally added? Were any steps added intentionally to complete or correct another step? Place check marks against the steps as they are executed to avoid this. If steps had to be added in order to proceed, record why and indicate what step they were added after.
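For technicians who keep their notes electronically, the results column and the observation questions above could be modeled roughly as follows. This is only a sketch of one possible record layout: the class `StepRecord`, its field names, and the sample steps are hypothetical and are not part of any HP tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class StepRecord:
    """One row of the results column on the action plan."""
    number: int                            # position in the written plan
    description: str
    executed_order: Optional[int] = None   # true order of execution; None if skipped
    started: str = ""                      # date and time, as noted by the technician
    minutes_taken: Optional[float] = None  # how long the step took
    messages: List[str] = field(default_factory=list)  # error or informational text
    notes: str = ""                        # functionality changes, mistakes, added steps

def skipped(records):
    """Return the numbers of planned steps that were never executed."""
    return [r.number for r in records if r.executed_order is None]

def out_of_order(records):
    """Return True if the executed steps ran in a different order than planned."""
    done = sorted(
        (r for r in records if r.executed_order is not None),
        key=lambda r: r.executed_order,
    )
    numbers = [r.number for r in done]
    return numbers != sorted(numbers)

# Hypothetical action plan with step 2 left unexecuted.
records = [
    StepRecord(1, "Reseat array controller", executed_order=1, minutes_taken=10),
    StepRecord(2, "Update controller firmware"),
    StepRecord(3, "Run diagnostics", executed_order=2, messages=["no errors reported"]),
]
print(skipped(records))       # [2]
print(out_of_order(records))  # False
```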


Testing the Solution
Once a solution has been completely applied, the action plan should provide for running a test or series of tests to be evaluated later. The tests may include running diagnostic utilities that up to this point have failed and displayed a particular error message. If the problem is with an application or database, it may be necessary to have the customer perform the test and evaluate the results. It is important to observe and record the results of each test executed. Also, notice if the test finishes or stops at a certain point. If a test does not complete, run it again and record those results as well. It is equally important to notice when the solution has caused no visible change to the system or system logs. In that case the solution produced no results, which may warrant reexamination of the subsystem or of other subsystems.


Step 5: Evaluating Results


The written action plan and results need to be evaluated to determine if the solution has adequately resolved the problem. If the results do not adequately correct the problem, go back through the relevant steps in the flowchart until the problem is resolved. This may include additional use of diagnostic tools to collect more data, reevaluate the situation, and develop a new action plan. If the results are satisfactory, preventive measures can be implemented. Evaluating the results is usually straightforward, but it can be difficult when the symptom appears only under specific conditions, for example, only while reindexing a database. Involve the customer as needed in the testing and evaluation processes.

Evaluation Criteria
Consider the following criteria in evaluating if the solution adequately corrects and addresses the problem:

• Are the results logical and consistent?
• Are the results what you expected to see?
• Is there an indication that steps were left out or added, or that other deviations were made in executing the action plan?
• Are there any error messages or new symptoms among the results?
• Do the results indicate if any side effects have been inadvertently introduced?
• Do the results prove that the problem is completely resolved?
• If the problem is only partially resolved, were there any functionality gains or losses or did any error message or symptom change?
• Is the solution only a temporary patch?
• Is the solution actually a workaround?
• Were sufficient tests run to check if the solution works correctly under all conditions?

The result of this evaluation should clearly indicate whether the problem has been resolved. If the results are inconclusive, perform additional tests. Ask the customer to test the solution. It may take several days to determine if the solution actually fixes the problem and does not generate any side effects.


If the Solution Fails To Fix the Problem


Once a solution is implemented, evaluate the results to see if the problem is solved. There will be many times during the troubleshooting process when the problem is not solved. Refer to the HP troubleshooting methodology flowchart to help you determine your next step in troubleshooting a system problem.

When the solution does not fix the problem, keep trying the next variable or task in the optimized action plan until all of them have been implemented and evaluated or until one of them solves the problem. If none of these attempts solves the problem, check whether any errors were made during the execution of any of the solutions. One of the solutions may need to be retried because of errors made during execution. If the evaluation process rules out all of these attempts as solutions, develop another action plan and optimize it. Keep doing this until all of the alternate solutions have been tried and evaluated or until a resolution has been found.

If none of the solutions and variables in the new optimized action plan corrects the problem, back up another step, re-diagnose the mode of failure, and evaluate which subsystem may have caused the failure. Evaluate the new data collected from the results of the failed attempts. Look up any new error messages in the appropriate utility. There may be some valuable clues here. Develop a new optimized action plan based on this new data. Execute and evaluate these solutions until either a resolution is found or all of them have been tried.

If all of these new solutions also fail, go back to the first step and collect more data. Use this new data to determine which subsystem may be causing the failure, then develop an action plan and optimize it. Execute all the variables and solutions in this action plan until either they have all been tried or one of them solves the problem. This cycle repeats until a resolution is found.
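The cycle described in this section can be summarized as a nested loop: work through every solution in the current plan, and when the plan is exhausted, move on to the next plan developed from re-diagnosis or fresh data. The sketch below is a simplified model under that assumption; the function names and the sample solutions are hypothetical, and it does not model the retry of solutions that were executed incorrectly.

```python
def run_plan(plan, try_solution):
    """Try each solution in order; return the one that worked, or None."""
    for solution in plan:
        if try_solution(solution):
            return solution
    return None

def troubleshoot(plans, try_solution):
    """Work through successive action plans until one solution works.

    plans: action plans in the order they are developed, each a list of
    solutions (initial plan, alternate plan, plan after re-diagnosis, ...).
    Returns (winning_solution_or_None, history of (plan, result) pairs).
    """
    history = []
    for plan in plans:
        fixed_by = run_plan(plan, try_solution)
        history.append((plan, fixed_by))
        if fixed_by is not None:
            return fixed_by, history
    return None, history

# Hypothetical example: the first plan fails, the second plan contains the fix.
plans = [
    ["reseat memory", "swap DIMM"],
    ["update system ROM"],
]
fixed_by, history = troubleshoot(plans, lambda s: s == "update system ROM")
print(fixed_by)      # update system ROM
print(len(history))  # 2
```

A return value of `None` corresponds to the point at which the text advises going back to the first step and collecting more data.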


If the Solution Fixes the Problem


As soon as you are satisfied that the problem is resolved, inform the customer. The customer may need to perform some tests to validate that the system now has complete functionality. The customer may need to ask you some pertinent questions to understand what the problem was and how you resolved it. It is important that the customer believes that the problem is totally resolved, and the customer may need to understand how this solution relates to what he or she first believed was the cause of the problem.

When the true problem is resolved, the customer's reported problem also needs to be addressed. The perceived issue is what the customer reported and expects to be resolved. This may involve explaining the actual cause and why it appeared to be something else, running diagnostics on a component to assure the customer that the hardware is functioning properly, or letting the customer recreate the point of failure to confirm that all errors are gone.

For example, a customer calls in with a memory issue and believes it is caused by bad RAM. The customer wants the bad RAM replaced. After some troubleshooting, you discover that the problem is actually software related; there is a badly written DLL file. You will need to explain or demonstrate to the customer that although it appeared to be a bad memory module, the malfunctioning application has a memory leak, which caused the symptoms. Explain until the customer is satisfied that the problem is adequately resolved.

Explaining the true root cause of the problem will maximize customer satisfaction. The customer will not only be very satisfied that the problem is truly resolved, but will feel comfortable that he or she is dealing with a knowledgeable and reputable service provider.

Ask the customer if he or she maintains a system logbook. If there is one, record in it the problem, the time and date, the steps to the solution, and the results. Take this time to implement preventive measures.

Fill out the necessary paperwork to complete the work order and pick up any spare parts that need to be sent back to HP or to your service center. Be sure to pick up all your tools, diskettes, and paperwork. Leave the on-site work area at least as neat as you found it.

Follow up the service event with a telephone call to the customer. Check after a few days to see if the problem has reappeared. This provides you with the opportunity to resolve any new symptoms that may have appeared. Sometimes, resolving one issue produces a side effect that is not discovered until a particular application is run. It is impossible to run every conceivable test. Whether or not there is a new symptom, the customer will be pleased to receive a brief follow-up call from you.


If the Problem Remains Unresolved


If you have tried all the possible solutions you can think of to resolve the problem and are unsure of what your next step should be, contact the appropriate Original Equipment Manufacturer (OEM) for technical assistance. Before you call, gather all your notes about the system to accurately answer the questions that the technical support engineers will ask you. The more you know about the system and the more open you are to receiving help, the quicker the problem can be resolved.

If the problem cannot be resolved during that first telephone conversation, the technical support engineer may recommend a new action plan for you to try or may ask you to collect additional information. If the results of these steps do not resolve the issue, the technical support engineer may escalate the case to second-level support. Most companies have a hierarchy of technical support with support generalists and specialists.

If technical assistance confirms that there is no resolution to the problem, determine the next step. Check with the OEM to see if there is a known workaround to the problem and if there are plans to correct this problem in the next release of the product. Many problems are labeled nuisance problems, and programmers resolve them if time permits before the product deadline. Other problems are unintentionally resolved when another change is made to the program. If the problem is with a particular OEM part, find out if there is another part, either from that OEM or another, that will work properly with the other key components.

Regardless of what you find out, keep the customer in the loop. The customer may be able to live with a workaround until something better is found. The customer will be satisfied that you tried your best and will want to work with you in the future.

Sometimes just leaving and doing something else is an effective approach. Once you have worked on a problem for hours or days and are tired and frustrated, you will simply not be in the best frame of mind for further troubleshooting. New ideas will often come to you when you are not thinking about the problem. Come back at another time and try again with a fresh perspective and different ideas. Of course, this approach has to be approved by your service manager and the customer.


Step 6: Implementing Preventive Measures


Implement corrective or preventive measures to avoid or reduce the likelihood of future occurrences of the problem just resolved. This is a good time to increase the fault tolerance of the system and to look at ways to minimize future system downtime. It also provides an opportunity to check the overall health of the system and to correct potential problems before there is a failure. A little time spent in taking preventive measures now will save downtime later. It is also an excellent opportunity to build a positive reputation, increase customer satisfaction, and create the possibility of repeat business. A customer is likely to trust a recommendation from a technician or engineer. As a result, the customer is more likely to act on a recommendation to upgrade or expand current systems. The following are common-sense ways to prevent problems and to minimize downtime in the future:

• Customer involvement
• High-availability features
• System management features
• Software management
• On-site spare parts
• Service offerings
• Preventive maintenance


Customer Involvement
Recognize that the customer can be a tremendous asset in the prevention of problems. The cause of many problems is operator error, some type of human intervention, or lack of human intervention. The customer is already on-site and has a significant interest in preventing downtime and problems. Most customers would rather be proactive than reactive.

Telling the Customer Where Error Information Can Be Found
Most customers are willing to help if the action items are not time consuming. Some customers want to be as self-sufficient as possible and will pick up any tasks that will assist in this goal. Take a few minutes to show the customer where the error information is recorded on the system and how to read the IMD and error logs. This step empowers the customer to call and schedule service under a pre-failure warranty, which prevents a failed system condition. At the very least, customers who have been shown where to look for error information will want to engage your services the next time they have a service issue, and they will be able to provide accurate and useful data.

Explaining the Resolution
Explain the resolution to the problem that was just solved and write it down for the customer in clear and simple steps. This information will serve as a guide if the problem reappears. Depending on the complexity of the problem, the customer may be able to complete the steps or at least will be able to refer to them when calling for service the next time the problem occurs.

Items the Customer Should Implement
If the data on the system is important and the system does not currently have a scheduled backup routine, advise the customer of the necessity of implementing one. Suggest instituting a complete library of backups with off-site storage as part of a disaster recovery plan. Also, explain that the backups should be periodically tested to verify that they are functional and complete.

A problem logging and resolution notebook is a helpful thing to have beside each system. When properly maintained, it provides a complete history of the system. Problems can then be categorized by failure types, such as hardware failure, operating system error, application error, user error, and malicious problems caused by virus programs or sabotage. Preventive maintenance actions should be recorded. Every hardware and software installation, modification, or removal should be written down as well. System configuration printouts and utility diskettes can also be stored with the resolution notebook. These items can save a great deal of time in the future and ensure accuracy, especially when dealing with future part replacement.

If the problem involved the network cabling or IP addresses, suggest that the customer keep an up-to-date network topology map in an accessible location.

If the system has little available hard drive space, suggest that the customer check for this condition periodically and routinely archive or remove unneeded files. Also, suggest the available options for expanding the system to accommodate more hard drive storage.

If the customer has a DAT drive, explain the value of a scheduled cleaning program. If the customer has a DLT drive, describe what a dropped leader is and how to look for it. Also, explain the importance of tape cartridge label placement. If the label is placed on the exposed surface, it cannot fall off or become lodged inside the tape drive.
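The periodic disk space check suggested above is easy to automate. The following is a minimal sketch using only the Python standard library; the 10 percent threshold and the function names are assumptions chosen for illustration, not an HP recommendation.

```python
import shutil

def free_fraction(path):
    """Return the fraction of the volume containing `path` that is free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total

def needs_cleanup(path, min_free=0.10):
    """Return True if free space has dropped below the chosen threshold.

    min_free=0.10 (10 percent) is an arbitrary example value.
    """
    return free_fraction(path) < min_free

# Check the volume that holds the current directory.
print(f"{free_fraction('.'):.0%} free; cleanup needed: {needs_cleanup('.')}")
```

A customer could run a check like this on a schedule and archive or remove files when it reports that cleanup is needed.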


High-Availability Features
If a server is capable of supporting fault tolerant options and they are not implemented, it may be because the customer is not aware that they are available. Fault tolerant features include:

• Redundant power supplies
• Redundant fans
• RAID array controllers
• On-line spare drives for RAID arrays
• Duplexed SCSI controllers
• Redundant PPMs
• Redundant network adapters
• Off-line backup processors

All of these features are designed to minimize downtime. Check QuickFind for information on which features are available for the customer's system.

Automatic System Recovery (ASR)
If it is not already set up, suggest using the Automatic System Recovery feature to restart a system after a critical hardware or software error occurs. This feature is especially useful when the error occurs while no one is on-site to service the system. ASR requires loading the HP Health Driver and enabling the Automatic System Recovery-2 (ASR-2) feature in the system configuration utility. If a critical error occurs, the system will record the error information in the System Health Logs, reboot the system, and page the system administrator. The system can be configured for automatic recovery or for attended local or remote access to diagnostic and configuration tools.


System Management Features


Insight Manager, Remote Insight Option, and Integration System are all useful management tools. Suggest that the customer implement one or more of these tools to manage the system. A preventive measure for out-of-date drivers is to advise the customer about the version control feature in Insight Manager. This feature checks the versions of HP operating system drivers, Insight Agents, HP Utilities, and firmware on the system, compares them against a database of current software and firmware versions, and indicates whether an upgrade is needed and why. If the customer already has CIM set up, show the customer where to check for items that need updating.

NOTE: The version control update is available by downloading SoftPaq SP0965.exe. SoftPaq SP0965.exe is consistently updated to reflect the latest version.
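Conceptually, a version control check of this kind compares what is installed against a list of current versions. The sketch below illustrates only that comparison; it is not how Insight Manager is implemented, the component names and version numbers are invented, and it assumes purely numeric dotted version strings.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '5.20' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))

def outdated(installed, current):
    """Return {name: (installed_version, current_version)} for items needing an upgrade."""
    return {
        name: (version, current[name])
        for name, version in installed.items()
        if name in current and parse_version(version) < parse_version(current[name])
    }

# Hypothetical inventory and reference database.
installed = {"array controller driver": "5.20", "health driver": "6.10", "system ROM": "4.06"}
current = {"array controller driver": "5.42", "health driver": "6.10", "system ROM": "4.08"}

for name, (have, latest) in outdated(installed, current).items():
    print(f"{name}: installed {have}, current {latest}")
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating version 5.9 as newer than 5.10.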

If the customer has not already implemented an Integration System, explain the advantage of one. Integration System is a network system that acts as a repository of approved system software and configuration standards that can be implemented across distributed systems. Access to the latest software for the update of an Integration System is enabled through a dedicated HP Support Software System on the World Wide Web and through the HP SmartStart Subscription Service with periodic releases of CD updates. Through Integration System Maintenance in Insight Manager, the administrator can compare the latest software versions available via the Internet or CD to those stored on the Integration System and use the information provided to assess the need for any new versions. The administrator can then select the versions desired for download to the Integration System. Once the Integration System is updated, the new software is available for both new SmartStart installations and for update of production systems.


Software Management
Keep abreast of operating system updates and patches. Many customers already do this, but doing so yourself may provide the extra edge needed to solve a problem or understand a conflict. It is vital to weigh the risk of implementing a change against the added functionality it provides. It is advisable to test all changes on a test system first to check for functionality changes before implementing them on a production system.

Ask the customer if any virus protection software is installed on the system, how long ago it was updated, and how frequently it is set to scan. If no virus protection software is installed, or if it is out of date or infrequently used, advise the customer of the need to install it, keep it up to date, and use it. Even though macro viruses have grown exponentially in the last two years, boot-sector viruses still account for four out of the ten most common infections. Boot-sector viruses are a leading cause of corruption on systems running the Microsoft Windows NT operating system.

Suggest subscribing to SmartStart to ensure that the customer has the latest drivers and utilities on-site. The subscription also provides the necessary license to update Insight Manager to the latest version.


On-Site Spare Parts


For problems involving failed hardware, such as a failed hot-pluggable hard drive, suggest to the customer the possibility of purchasing spare parts to have on-site. On-site spare parts provide convenience, decrease downtime, and provide an immeasurable amount of peace of mind. Spare parts to maintain on-site include SCSI controllers, hot-pluggable redundant power supplies, hot-pluggable fans, hot-pluggable drives, SCSI cables, and network adapters.

On-site spare parts enable a customer or technician to take care of a pre-failure warranty or a failure with little or no downtime. Time is saved, sometimes days, that would otherwise be spent waiting for the needed part to arrive.

It is vital to restock spare parts as they are used. For example, if a hot-pluggable hard drive fails and a drive is pulled from a test system's array and used in the production system as a stopgap, remember to install the spare hard drive in the production system when it arrives. It is important to use the spare drive in the production system because it has fewer service hours on it and contains no array information.


Service Offerings
Whenever you see that a customer needs a service plan or contract, advise the customer of the possibility of obtaining service level agreements to enhance the customer's environment. To do this, you will need to be familiar with the various service offerings and contracts that your service center offers. A timely suggestion of a service to fill a customer's need can greatly increase customer satisfaction as well as increase business revenue for your company. An updated listing of warranty upgrades and service offerings (CarePaqs) is available at http://www.compaq.com/services/carepaq/. Service offerings include:

• Spare equipment on-site
• Support contracts
• Operational Management Services


Preventive Maintenance
Here are some suggestions for preventive maintenance measures that can prevent problems from occurring:

• While you have the system cover off, take a few extra moments to get rid of any dust build-up with a can of anti-static air, tighten any loose connections, reseat boards, and inspect any cables for frays. Move the cables away from sources of heat and give them more slack if possible. Check for adequate airflow, and dislodge anything blocking the fans.
• Do not clean connectors with erasers. An eraser removes the gold, causes static discharge, and leaves residue. If connectors need to be cleaned, use isopropyl alcohol or a special cleaning solution applied with a cotton-tipped swab.
• Make sure systems are not positioned tightly up against walls and that there is adequate space around them for proper airflow.
• Move magnetized office items such as magnetized screwdrivers and telephones with electromagnetic ringers away from the system.
• Advise the customer if you find any of these conditions: the system sharing a power line with high-current machines (for example, laser printers, air conditioners, copiers, and coffee machines), ungrounded power strips, or outlets in need of repair.
• Check the adequacy of the power back-up system. Besides having Uninterruptible Power Supply (UPS) protection for the system, consider the power protection requirements for the hubs, bridges, routers, and gateways to avoid network functionality loss. Also, check that no UPS is overloaded.
• Before adding faster or larger hard drives, make sure that a thermal upgrade or power supply upgrade is not necessary. The heat generated by some of the larger and faster hard drives may cause a thermal overload unless there are provisions for additional cooling.
• If a terminator board needs to be removed to add a processor board, give the board to the customer to store. Later, in the event that there is a processor problem, replace the failed processor board with the terminator board to keep the system functioning with the remaining processors until a replacement can be installed.


Learning Check
1. What are the six steps of the HP Troubleshooting Methodology?

2. What is prerequisite to data collection?

3. List four techniques for collecting data.

4. How could maintaining a field journal help you in the future?

5. When should you use open-ended questions?

6. What is the main difference between the elimination technique and the minimum configuration technique?

7. Which is more important, understanding the customer's reported problem or understanding the true failure?

8. What criteria should you consider when optimizing your action plans?

9. Why should you involve the customer in preventive measures?

10. When evaluating results, you should never ask the customer to test the solution. True / False

11. Are the results to a solution always immediately available and visible? Why or why not?

12. If executing the entire action plan did not solve the problem, what is the next step to try?

13. List at least three things you should do after solving the problem.

14. What are the most important considerations for troubleshooting?

15. It is safe to field repair monitors as long as you have been fully trained. True / False


Server Diagnostic Tools


Module 9

Introduction
Tools can serve multiple purposes in the diagnostic methodology. They can assist in the collection of data. They can assist in the evaluation process and in fault isolation between subsystems. They can be an essential part of implementing a troubleshooting action plan.

This module covers features of various diagnostic tools and highlights their use in troubleshooting servers, focusing on how the various tools can serve the data collection process. This module assumes a basic knowledge of the tools. Therefore, only selected portions and advanced aspects of these tools are presented. Tools that are specific to ProLiant servers are also covered.

The tool sections presented are those that are considered the primary sources for the information given in the section. While the suggested tool may not be the only source, it is the most efficient to use in finding the information.

Topics include:

• HP Insight Diagnostics
• Survey
• Array Diagnostic Utility
• Insight Manager
• Remote Insight Lights-Out Edition
• Server Troubleshooting Guide
• Summary of resources and tools


Objectives
To use server diagnostic tools, service personnel should be able to:

• Describe and explain the use of the following diagnostic tools:
    • HP Insight Diagnostics
    • Survey
    • Array Diagnostic Utility (ADU)
    • Insight Manager
    • Remote Insight Lights-Out Edition
    • ROM update utility
• Select the correct tool for the task.
• Interpret the results of the tool reports.
• Use diagnostic tools for troubleshooting and preventive maintenance.


HP Insight Diagnostics
SmartStart Home Page
HP Insight Diagnostics are accessed from the SmartStart CD. They will also be available in online mode, accessible from the operating system. When the online version is available, it will be distributed as a SoftPaq. To access ProLiant server maintenance utilities, boot from the SmartStart CD and choose the Install button. You will see the SmartStart Home page, which offers you the choice to set up the server. Because the server has already been set up, do not select this choice; instead, choose the Maintenance tab to gain access to the diagnostics and utilities on the SmartStart CD.


SmartStart Maintenance Menu


When you click on the Maintenance tab you will see a list of utilities that are available to perform various configuration and diagnostic functions. The Server diagnostics utility is used to display configuration information about the server hardware and software as well as provide the means to test various system components and subsystems. Clicking on the launch server diagnostics utility link brings you to a screen which provides several options to inspect and test the server.


Server diagnostics menu


Selecting the Server diagnostics utility from the Maintenance menu brings you to a submenu with another set of tabs labeled Survey, Test, Status, Log and Help. The Survey tab is automatically selected when the screen displays and provides a detailed view of the system components.

Survey is a utility that has evolved from a command line application to a web-based version and is now a key element of HP Insight Diagnostics that comes on the SmartStart 6.x CD. Survey is therefore available on all Generation 2 and later ProLiant servers.

Survey has a list box that enables you to choose whether it displays a Summary or Advanced (detailed) view of the server components. This illustration shows a Summary view of the system components. You could also select any of several other displays, including Architecture, Memory, and Storage. You could further choose to display All categories at once or an Overview.


Survey utility example - memory
By choosing the Advanced mode of Survey and selecting one of the components, you can obtain detailed information about that component. In the example shown here you can determine a number of facts about the memory on the server system board. For example, four of the six DIMM slots are populated with 256 MB each of ECC memory, for a total of 1 GB. You will also note that the system has been configured for online spare memory.


Quick Test All Devices
Selecting the Test tab provides you with a mechanism for testing the various server components and subassemblies. In the center of the screen you can see that there is a choice of testing All devices or specific individual devices. At the left of the screen you can then select the Type of Test and Test Mode. The choices of test type include Quick, Complete or Custom. The mode may be either Interactive or Unattended. In this example the system board DIMMs 1 and 2 are selected for a quick test.


Test Status
Choosing the test Status tab allows you to keep track of the test progress. In this example, a Noise test is in progress and a Cache test will be run next. The condition legend at the left shows you that the blue dot to the left of the test item indicates the status is unknown at this point.


Test Log
Choosing the test Log tab enables you to see the test results after the test is complete. In this example, both tests were successful, as indicated by the green icon before each test item.


Integrated Management Log (IML)
Also built into this suite of diagnostic utilities is the IML viewer, which was previously a separate utility run under the operating system. The IML gives you details of error conditions generated during or after POST. Included are the severity, class, date, count and description of each error. A check box is provided to allow you to note when the condition causing the error message has been repaired.


Insight Diagnostics Help
A Help tab on the maintenance menu gives you useful information about the diagnostic utilities. In this example you can see a description of the theory of operation for Insight Diagnostics. You will note that the diagnostics can be run in both online and offline mode but that different information is available depending on the mode. A detailed description of Survey tells you about the sessions that enable you to note any changes that may have occurred during the interval between them. A description of the different types of test is also shown on this screen. Here you can see the differences between Quick, Complete and Custom test modes.


Erase Utility
The system erase utility is another feature of the server diagnostics available from the Maintenance menu. Here you can choose to erase all drives, system NVRAM or CMOS on your server. Unlike the previous server erase utility, however, this utility does not erase the Smart Array controller NVRAM.


Array Diagnostics Utility (ADU)


Because array controllers are complex devices, HP provides an Array Diagnostics Utility (ADU) to help administrators quickly identify such problems as an incorrect version of firmware, drives installed in the wrong order, inappropriate error rates, or a failed battery on the array accelerator board. The ADU displays a detailed analysis of the system configuration. If the cause of a problem is still not apparent, the ADU can generate a full report that administrators can fax or email to HP customer service for phone support.

Beginning with SmartStart and Support Software Release 4.10, ADU Version 1.10 replaced the Drive Array Advanced Diagnostics (DAAD) Utility, which is no longer included on the SmartStart CD and will no longer be updated to support future HP array controllers. Instead, the ADU supports new and future HP array controllers, as well as many older array controllers.

ADU is designed to collect all possible information about the array controllers in the system and to offer a list of all detected problems. ADU issues multiple commands to the array controllers to determine if a problem exists. In most cases, enough information is provided to initiate problem resolution immediately.

From the Insight Diagnostics Maintenance menu, you can choose to run the ADU and will see a screen like this one showing the results of the test. On the opening screen you will notice the controller(s) that ADU has detected. By selecting a controller from those listed, you can begin the process of problem diagnosis.


ADU Sample Report


Array Diagnostic Utility Inspection Report
Version 1.20 Revision A (Pass 2)

USER ENTERED INFORMATION:
The user can add text here

Date: Tuesday, January 11, 2000
Time: 7:24

Computer Model: ProLiant 5500
System ROM Version: 10/01/1997

SLOT SUMMARY:
Slot Num   Slot Type   Array Controllers and Host Adapters Detected
--------   ---------   --------------------------------------------
Slot 3     PCI         Smart Array 3200 Controller

SLOT 3 SMART ARRAY 3200 CONTROLLER ERROR REPORT:
No problems detected

The first part of an ADU report gives:
Version
Date
Time
System ROM
Slot and controller identification - Identifies installed array controllers and shows the slots in which they are installed. Unlike DAAD, only filled slots are shown.


Subsystem Information
SUBSYSTEM INFORMATION:
Chassis Serial Num:       D745BRZ10018
This Controller
  Array Serial Number:    P165C0BBFH16VR
  Cache Serial Number:    P19200BBFH1C4E
Other Controller
  Array Serial Number:    Not Available
  Cache Serial Number:    Not Available

Serial number information is available here for the server as well as the array controller.

CONTROLLER IDENTIFICATION:
Configured Logical Drives:   2
Configuration Signature:     0xaceb0678
Adapter Firmware Revision:   '3.08'
Adapter ROM Revision:        '3.08'
Adapter Hardware Revision:   0x01
Boot Block Version:          '3.08'
Drive Present Map:           0x0000000f
External Drive Map:          0x00000000
Board ID:                    0x40320e11
Cable or Config Error:       0x00 (No)
Non-disk map:                0x00000000
Invalid Host RAM Address:    No
CPU Revision:                0x75
CPU to PCI ASIC Rev:         0x03
Cache Controller ASIC Rev:   0xff
PCI to Host ASIC Rev:        0x01
Marketing Revision:          0x41 (Rev A)
Expand Disable Code:         0x01
SCSI Chip Count:             2
Max SCSI ID's per Bus:       16
Big Drive Map:               0x000f 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
Big Ext Drive Map:           0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
Big Non-Disk Drive Map:      0x0080 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

The ROM and firmware revisions for the array controller are available here, as well as the number of logical drives and the maximum number of SCSI IDs per SCSI bus.
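The map fields in this section are hexadecimal values. The following Python sketch shows one way to read them, under the assumption (not stated in this guide) that each map is a bit mask in which bit n stands for SCSI ID n. The function name ids_from_map is hypothetical and is not part of the ADU.

```python
def ids_from_map(bitmap, width=32):
    """Return the bit positions that are set in a map value.

    Assumption: bit n of the map corresponds to SCSI ID n.
    """
    return [bit for bit in range(width) if bitmap & (1 << bit)]


# Drive Present Map from the sample report
present = int("0x0000000f", 16)
print(ids_from_map(present))        # [0, 1, 2, 3]

# First word of the Big Non-Disk Drive Map from the sample report
print(ids_from_map(0x0080, 16))     # [7]
```

Under this assumption the sample controller sees drives at IDs 0 through 3, and the single non-disk device sits at ID 7.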


Logical Drive Status


Logical Drive 1:
Drive Status:               OK
Drive Failure Map:          0x00000000
Blocks to Rebuild:          0
Blocks Re-mapped:           0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
Replaced Drive Map:         0x00000000
Active Spare Map:           0x00000000
Spare Status Flags:         0x00
Spare to Replaced Map:      See Big Spare to Replace Map:
Replaced Marked OK Map:     0x00000000
Media Was Exchanged:        No
Cache Failure:              No
Expand Failure:             0x00
Unit Flags:                 0x00
Big Remap Count:            All Counts Zero
Big Drive Failure Map:      0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
Big Replacement Drive Map:  0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
Big Active Spare Map:       0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
Big Spare to Replace Map:   No spares have replaced any drives
Big Spare Marked OK Map:    0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

The information available here is very similar to the DAAD report.

Drive status - Current condition of the logical drive. As shown, this logical drive is OK.
Blocks to rebuild
Blocks re-mapped
Replaced marked OK map
Media was exchanged
Cache failure
Expand failure


Monitor and Performance Data


MONITOR AND PERFORMANCE DATA:
SCSI Port 1, Drive ID 0

Factory: Serial #, Firmware Rev, and Mfg/Model #:
  57 53 37 30 30 30 31 37 32 32 34 36 00 00 00 00   WS7000172246....
  00 00 00 00 31 2e 35 32 00 00 00 00 43 4f 4d 50   ....1.52....COMP
  41 51 20 20 57 44 45 32 31 37 30 53 20 20 20 20   AQ  WDE2170S
  20 20 20 20 00 00 00 00 00 00 00 00 00 00 00 00       ............
  00 00 00 00                                       ....

Since Power: Serial #, Firmware Rev, and Mfg/Model #:
  57 53 37 30 30 30 31 37 32 32 34 36 00 00 00 00   WS7000172246....
  00 00 00 00 31 2e 35 32 00 00 00 00 43 4f 4d 50   ....1.52....COMP
  41 51 20 20 57 44 45 32 31 37 30 53 20 20 20 20   AQ  WDE2170S
  20 20 20 20 00 00 00 00 00 00 00 00 00 00 00 00       ............
  00 00 00 00                                       ....

Threshold Flags:            0x0000
Serial Number Control:      0x8054
Firmware Revision Control:  0x8248
Mfg/Model Number Control:   0x8268

            Factory           Since Power       Threshold         Control
Serv. Time  00000068          00000015          ffffffff          8184
Read Blks   0000000001280560  00000000004bbd98                    8108
Hrd Read    00000000          00000000          ffffffff          8184
Rtry Read   00000000          00000000          ffffffff          8184

Use this section to determine if a drive is in pre-failure or has failed. This section is important as it allows you to compare service times with various errors, and various errors with each other.

SCSI port x, drive ID x:

Factory and since power
  Serial number
  Firmware revision
  ID data

Factory, since power, and threshold - The Factory column is the one of main concern, as it holds all of the failures (or data) that have ever occurred with the drive. The Since Power column holds failures only since the last power-on (or last driver load) and may register no faults, depending on when the unit was last rebooted.

Service - The Factory service time is the number of minutes (in hexadecimal notation) that the drive has been in use. Use it to determine the age of the drive, as some faults may occur only after a certain number of hours. Also, compare this time with other drives showing similar faults. The faults may be false, or from another source (such as Insight Manager agents).

Rtry read - This entry should report all 0s.
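Because the service time is reported in hexadecimal minutes, it helps to convert it before comparing drives. The Python sketch below is only an illustration of that conversion; the function name service_time is hypothetical, and the input values are the Serv. Time entries from the sample report.

```python
def service_time(hex_minutes):
    """Convert a hexadecimal service-time field to (minutes, hours)."""
    minutes = int(hex_minutes, 16)
    return minutes, minutes / 60.0


# Factory column of the sample report
minutes, hours = service_time("00000068")
print(minutes, round(hours, 2))     # 104 1.73

# Since Power column of the sample report
print(service_time("00000015")[0])  # 21
```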


Monitor and Performance Data (cont'd.)


            Factory           Since Power       Threshold         Control
ECC Read    0000000000000000  0000000000000000  ffffffffffffffff  8188
Write Blks  0000000000112c78  00000000000385f8                    8108
Hrd Write   00000000          00000000          ffffffff          8184
Rtry Write  00000000          00000000          ffffffff          8184
Seeks       0000000000001e11  0000000000000160                    8108
Seek Errs   0000000000000000  0000000000000000  ffffffffffffffff  8188
Spin Cyls   00000007          00000001          ffffffff          8184
Spin Time   0000              0053              ffff              8282
Test 1      ffff              ffff              ffff              0a82
Test 2      000f              000f              ffff              8282
Test 3      0054              0054              ffff              8282
Test 4      0099              0099              ffff              8282
Spare Blks  ffffffff          ffffffff                            0a04
Re-mapped   00000022          00000000          ffffffff          8584
DRQ Tmots   ffff              ffff              ffff              0982
Timeouts    0000              0000              ffff              0182
Rebuilds    0002              0001              ffff              0182

ECC read - Note that this field shows one error. However, when comparing this with the Service time, it is not found to be an issue as the drive has been in service for quite some time.

Rtry write - This field should report all 0s.

Seek errors - This field should report all 0s. An error here would indicate a definite hardware problem with the drive.

Spin time - The Factory spin time column should always be lower than the Threshold column.

Re-mapped - The number of sectors that needed to be remapped due to being bad. An increasing number of remapped sectors indicates that the drive should be replaced. Note that there is a threshold number, and it may vary with different types of drives.

Timeouts - These timeouts occur when the system tries to access the drive, and should rarely (if ever) happen. A relatively small number in relation to the service time of the drive is not a problem, but if the number continues to increase, or timeouts occur on some of the other drives, see if errors are also listed in other places.

Rebuilds - This number will increment if:
  A drive fails, is removed and reinstalled, being rebuilt in the process.
  A failed drive is replaced. The new drive will be rebuilt and the counter incremented.
  A rebuild occurs on the specified drive ID.
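Several of these entries matter mainly when they keep increasing from one report to the next. The following Python sketch illustrates one way to compare the same counters from two ADU reports taken at different times. The function name increased_counters and the dictionary layout are hypothetical. The earlier values come from the Factory column of the sample report, and the later values are invented for illustration.

```python
def increased_counters(earlier, later, watch=("Re-mapped", "Timeouts")):
    """Return counters whose hexadecimal value grew between two reports.

    earlier, later: dicts mapping a counter name to its hex string.
    """
    flagged = {}
    for name in watch:
        before = int(earlier[name], 16)
        after = int(later[name], 16)
        if after > before:
            flagged[name] = (before, after)
    return flagged


# Factory column values from the sample report
earlier = {"Re-mapped": "00000022", "Timeouts": "0000"}
# Hypothetical values from a later report on the same drive
later = {"Re-mapped": "00000025", "Timeouts": "0000"}

print(increased_counters(earlier, later))   # {'Re-mapped': (34, 37)}
```

A growing Re-mapped count, as in this invented example, is the pattern the text describes as a reason to replace the drive.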


Monitor and Performance Data (cont'd.)


            Factory   Since Power  Threshold  Control
Spn Retrs   ffff      ffff         ffff       0982
Fl Rd Recv  0000      0000         ffff       8182
Fl Wt Recv  0000      0000         ffff       0182
Format Err  0000      0000         ffff       0182
POST Err    ffff      ffff         ffff       0982
Drv Nt Ry   00000000  00000000     ffffffff   0184
Reallc Abt  ffffffff  ffffffff     ffffffff   0984
IRQ Gltchs  ffffffff  ffffffff     ffffffff   0984
Bus Flts    00000000  00000000     ffffffff   8184
Hot Plgs    00000001  00000000     ffffffff   0184

Bus flts - Double-digit bus faults are generally acceptable, but triple-digit ones are not.

Bus faults should have been an interesting statistic, but unfortunately system software (Insight Manager agents) is responsible for most of them. If you suspect real SCSI bus problems, compare this value for drives that have similar service times. If the value is the same for these drives, the bus faults were probably due to Insight Manager agents. If the values differ between drives with similar service times, then look at Bd tgt cnt (bad target count). A bad SCSI bus can also result in target (drive) selection errors that will show up there.
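The comparison described above can be sketched in Python. This is an illustration only: the function name differing_bus_faults is hypothetical, the 10 percent tolerance used to decide that two service times are similar is an assumption, and the values for drives 1 and 2 are invented.

```python
def differing_bus_faults(drives, tolerance=0.10):
    """Find drive pairs with similar service time but different bus faults.

    drives: dict mapping drive ID to (service_minutes, bus_faults).
    tolerance: assumed fraction within which service times count as similar.
    """
    pairs = []
    ids = sorted(drives)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            (time_a, faults_a), (time_b, faults_b) = drives[a], drives[b]
            similar = abs(time_a - time_b) <= tolerance * max(time_a, time_b)
            if similar and faults_a != faults_b:
                pairs.append((a, b))
    return pairs


# Drive 0 uses the sample service time (0x68 = 104 minutes); the rest is invented.
drives = {0: (104, 0), 1: (104, 0), 2: (100, 37)}
print(differing_bus_faults(drives))   # [(0, 2), (1, 2)]
```

An empty result suggests the bus faults are agent-related. Any pair returned is a cue to look at the Bd tgt cnt entry for those drives.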

Hot plgs - The hot-plug counter acts as a re-insertion counter. If a drive failed, was pulled out, and reinstalled, the counter would then increment. If a new drive were put in instead, its counter would not change.


Monitor and Performance Data (cont'd.)


            Factory           Since Power       Threshold  Control
Tk Rwt Err  ffff              ffff              ffff       0982
Rmp Wt Err  0000              0000              ffff       0182
Bg Fw Rev   0000000000000000  0000000000000000             0a48
Med Flrs    0000              0000              ffff       0182
Hrdw Errs   0000              0002              ffff       0182
Abt Cmd Fl  0000              0000              ffff       0182
Spn Up Fl   0000              0000              ffff       0182
Bd Tgt Cnt  0000              0000              ffff       0182
Pred Fails  00000000          00000000          00000000   2184

Hrdw errors - This field should report all 0s. In the sample, two errors are shown. These errors could be either transient errors or actual hard errors. Those shown were possibly caused by heat problems or power fluctuations. However, as the number increases, so does the likelihood that the drive actually has hard errors.

Bad tgt cnt (bad target count) - This entry could be an indication of a SCSI bus signal integrity problem. However, we have some drives that we suspect should not be in the system due to thermal and (possibly) power supply considerations. It is possible that they could be a factor here as well. Compare this entry with the Bus flts entry. A bad SCSI bus can result in errors being shown in both the Bus flts and Bad tgt cnt fields.


Error Log Data


DRIVE ERROR LOG:
Error Log Header:
  Parameter Length     = 0x14
  Entry Size           = 0x0014
  Current Entry        = 0x09
  Total Errors Logged  = 0x00000009

Error Log Data:
SCSI  CAM   Sense  Sense
Stat  Stat  Key    Code   Qual  Block(Vl)    Time      Op  Info
----  ----  -----  -----  ----  -----------  --------  --  ----
00    0f    00     00     00    001d77e0(0)  000000dd  28  0000
00    0a    00     00     00    001d77e0(0)  000000dd  28  0000
00    0a    00     00     00    00000000(0)  000000dd  00  0000
00    0a    00     00     00    001d77e0(0)  000000dd  28  0000
02    04    01     18     01    003d1be2(1)  0000012b  28  0000
00    0a    00     00     00    003d9740(0)  0000012b  2a  0000
02    04    02     04     01    00000000(1)  0000012b  2a  0000
02    04    01     10     00    003dc0c0(1)  00000137  28  0000
02    04    01     10     00    00396dd7(1)  0000014f  28  0000

SCSI port x, drive ID x:

Total errors logged
Codes:
  Error
  SCSI stat
  SCSI CAM
  Sense key
  Sense code/qual/block
Time - The time stamp for the error. Use this entry to compare the time stamp on other drives with the same or related errors to find causes that may be external to the drive (thermals, bus errors, and so on).
Op
Info
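The time stamp comparison described above can be illustrated with a short Python sketch that lists the time stamps appearing in the error logs of more than one drive. The function name shared_timestamps and the drive labels are hypothetical. The first drive uses the Time values from the sample log, and the second drive is invented.

```python
from collections import defaultdict


def shared_timestamps(logs):
    """Return time stamps logged by more than one drive.

    logs: dict mapping a drive label to a list of hex time-stamp strings.
    Result: dict mapping the time stamp (as an integer) to the sorted drive labels.
    """
    seen = defaultdict(set)
    for drive, stamps in logs.items():
        for stamp in stamps:
            seen[int(stamp, 16)].add(drive)
    return {t: sorted(d) for t, d in sorted(seen.items()) if len(d) > 1}


logs = {
    # Time column of the sample error log
    "port1-id0": ["000000dd", "0000012b", "00000137", "0000014f"],
    # Hypothetical second drive on the same bus
    "port1-id1": ["0000012b", "00000200"],
}
print(shared_timestamps(logs))   # {299: ['port1-id0', 'port1-id1']}
```

A time stamp shared by several drives points toward a cause outside any single drive, which is the situation the Time entry is meant to expose.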


Accelerator Status
ACCELERATOR STATUS:
Logical Drive Disable Map:   0xfffffffc
Read Cache Size:             28672 KBytes
Posted Write Size:           28672 KBytes
Disable Flag:                0x00
Status:                      0x00000001
Disable Code:                0x0000
Total Memory Size:           57344 KBytes
Battery Count:               3
Battery Status:              0x0007
Parity Read Errors:          0000
Parity Write Errors:         0000
Error Log:                   N/A
Failed Batteries:            0x0000
Board Present:               Yes
Accelerator Failure Map:     0x00000000
Max Error Log Entries:       16
NVRAM Load Status:           0x00
Memory Size Shift Factor:    0x00
Non Battery Backed Memory:   0 KBytes
Memory State:                0x00

Read cache size - The amount of the controller's cache dedicated as a read cache. This value is configured under Controller Settings - Accelerator Ratio in the Array Configuration Utility.

Posted write size - The amount of the controller's cache dedicated as a posted-write cache. It is configured in the same way as the read cache. The read and write caches are mutually exclusive.

Total memory size - Total memory (cache) on the controller.

Battery count - Number of batteries on the controller.

Battery status

Parity read/Parity write errors - Any errors accessing the cache will be indicated here.

Failed batteries (number of)
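Because the read and posted-write caches are mutually exclusive, their sizes can be checked against the total memory size. The Python sketch below is an illustration using the sample values. The function name cache_split is hypothetical, and reporting the split as percentages is an assumption about how the accelerator ratio is usually expressed.

```python
def cache_split(read_kb, write_kb, total_kb):
    """Return (read %, write %, unaccounted KB) for an accelerator."""
    read_pct = round(100 * read_kb / total_kb)
    write_pct = round(100 * write_kb / total_kb)
    unaccounted = total_kb - (read_kb + write_kb)
    return read_pct, write_pct, unaccounted


# Read Cache Size, Posted Write Size, and Total Memory Size from the sample
print(cache_split(28672, 28672, 57344))   # (50, 50, 0)
```

In the sample, the two caches account for all 57344 KBytes, split evenly between read and posted write.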


Physical Drive Identifying Data


PHYSICAL DRIVE IDENTIFICATION:
SCSI Port 1, Drive ID 0
Vendor Id:             COMPAQ
Product Id:            WDE2170S
Product Rev:           1.52
Vendor Specific:       122CWS7000172246
Serial Number:         WS7000172246
SCSI Inquiry Header:   00 00 02 02 fa 00 00 3e
Device Supports:       Tagged Command Queueing
                       Linked Commands
                       Synchronous Data Transfer
                       16-bit Wide Data Transfer
Block Size:            512 bytes/sector
Total Blocks:          4110000 sectors/disk
Reserved Blocks:       1088 reserved sectors/disk
SCSI Inquiry Bits:     0x3E
Stamped for M&P:       yes
Last Failure Reason:   04h - Fail drive command issued
Phys Drive Flags:      0x4d 0x25 0x00
                       Drive present and operational
                       Wide SCSI transfers Enabled
                       Ultra SCSI Enabled
                       S.M.A.R.T. Supported
                       S.M.A.R.T. Enabled
                       Configured as part of Logical Drive
SCSI LUN:              0
Spi Speed Rules:       0x00000000
Physical Connector:    J3 (controller connector attached to drive)
Physical Bay in Box:   0 (number of the physical drive bay in the enclosure)

MODE SENSE:
Header: Page 01: Page 02: Page 03: Page 04: Page 08: Page 09: Page 0a: Page 0c: Page 1c: Page 00:
b3 81 82 03 00 84 00 88 00 89 8a 8c 00 9c 80 00 0a 0e 16 3a 16 00 12 00 0e 06 16 00 0a 0a 10 c4 00 04 00 00 00 00 00 00 00 c0 00 01 01 08 ff 00 74 61 16 00 00 00 00 10 00 00 04 41 00 30 00 02 40 58 1c 00 00 00 00 00 00 00 3e 00 00 75 00 04 20 00 00 00 14 00 00 00 b6 00 00 00 00 00 00 ff 00 00 00 10 00 00 b0 00 00 00 00 00 00 ff 00 00 00 08 00 2d 00 ff 00 00 00 00 00 00 02 00 00 00 00 00 00 00 00 00 00 c1 02 00 00 00
00 00 00 00 00 00 00 00 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00 00 00 00 40 ec 62 00 00 00 01 00 00 00 00

The entries here are self-explanatory and are mentioned for information-gathering purposes only.

SCSI port x, drive ID y:
Vendor ID
Product ID
Product revision
Serial number
Drive capacity
Device supports (lists supported features)
Physical drive flags (such as SMART Enabled)
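The report lists the block size and total block count rather than a capacity figure, so the drive capacity has to be worked out from them. The Python sketch below shows that arithmetic using the Total Blocks and Block Size values from the sample. The function name capacity_bytes is hypothetical.

```python
def capacity_bytes(total_blocks, block_size=512):
    """Return drive capacity in bytes from block count and block size."""
    return total_blocks * block_size


size = capacity_bytes(4110000)
print(size)                     # 2104320000
print(round(size / 1e9, 2))     # 2.1
```

The result, about 2.1 GB in decimal units, is the capacity of the sample WDE2170S drive.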

Bus Parameters: SCSI Port x


SCSI BUS 1 PARAMETERS:
Inquiry Data Valid:     Yes
Inquiry Header:         03 00 02 02 21 00 00 00
Vendor Id:              COMPAQ
Product Id:             PROLIANT 4-7I
Product Rev:            JB34
Installed Drive Map:    0x0000000f
Hot Plug Counts:        All counts are zero
Fan Alert Count:        0x0000
Alarm Status:           0x00 (No Alarms)
Temperature Status:     0x00
Valid Alarm Bits:       0x03
Alarm Count:            0000
Specific Counts:        00000 00000 00000 00000 00000 00000 00000 00000
Connection Info:        0x110a
SCSI Device Rev:        0x02
Fan Status:             0x110a
More Inquiry Data:
  00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
SCSI Device Type:       Unknown
Bus Bitmap:             0x0000007f
Interrupt Count:        00000000
Ultra Bus Faults:       0x00000000
SCSI Initiator ID:      7
SCSI Target ID:         7
Physical Connector:     J3 (controller connector attached to drive)
Big Inst Drive Map:     0x000f 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
Big Bus Map:            0xffff 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
More Connection Info:   0x02 (Single-Ended SCSI bus enabled)

This portion of the ADU report gives general information concerning the device attached to the designated SCSI port:
Installed drive map
Hot plug counts (by drive ID)
Fan alert counts
Alarm status
SCSI device type
Ultra bus faults


ADU Reader
The ADU Reader website is accessed at http://stain.cca.cpqcorp.net/. The Reader menu provides you with a sample report that you can save to a file and then submit for analysis. The menu also has a Help feature that provides access to information about error conditions and report terminology. Another feature of this utility is the Drive Model decoder. Simply feed it the model number and you will get back a display of information about the drive including option and spare part numbers. To use the Reader for report analysis, direct it to the drive location containing the ADU text report previously generated and select Analyze Report. A sample analysis is displayed on the following slides.


Analyzing an ADU Report Controller Info
The first part of an ADU report provides User Information that includes the date and time of the report, the server model and its system ROM version. Controller Information is the next section and contains information on the type of controller and the slot in which it is installed, as well as its firmware revision and the number of logical drives associated with it. The next section is the Controller Error Report. Here you would see a list of errors if any were found. In this example, none were detected.

Analyzing an ADU Report Drive Statistics
Drive Statistics provide information on a number of parameters including:
Service Time - The Factory service time is the number of minutes (in hexadecimal notation) that the drive has been in use. Use it to determine the age of the drive, as some faults may occur only after a certain number of hours. Also, compare this time with other drives showing similar faults. The faults may be false, or from another source (such as Insight Manager agents).
Read Blocks - Number of sectors read as requested.
Write Blocks - Number of sectors written to media.
Seeks - Number of seeks.
Spin Cycle - Number of spin-up cycles.
Spin Time - Spin-up time in tenths of a second.


Re-mapped - The number of sectors that needed to be remapped due to being bad. An increasing number of remapped sectors indicates that the drive should be replaced. Note that there is a threshold number, and it may vary with different types of drives.
Rebuilds - This number will increment if:
  A drive fails, is removed and reinstalled, being rebuilt in the process.
  A failed drive is replaced. The new drive will be rebuilt and the counter incremented.
  A rebuild occurs on the specified drive ID.
Hot-plugged - The hot-plug counter acts as a re-insertion counter. If a drive failed, was pulled out, and reinstalled, the counter would then increment. If a new drive were put in instead, its counter would not change.


Analyzing an ADU Report Drive Error Logs

Analyzing an ADU Report Miscellaneous Info


Analyzing an ADU Report Event Log


ROM Update Utility - Express

ROM Update Utility - Custom


Insight Manager
Next generation of HP management technology
Insight Manager 7 represents the next generation of HP management technology. It incorporates the strengths of Insight Manager (Win32) and Insight Manager XE, and delivers new functionality designed not only to help with system fault, performance, and configuration management, but also to facilitate system software maintenance throughout the server life cycle.

Easy to set up and use
Insight Manager 7 is easy to set up and use. It may be installed on Evo or Deskpro systems running Microsoft Windows 2000 Professional or Windows XP (with Service Pack 1), or on ProLiant servers running Microsoft Windows NT 4 Server, Windows 2000 Server, Windows 2000 Advanced Server, Windows 2000 Professional, or Windows .NET Server 2003.

Accessible through Microsoft Internet Explorer
Insight Manager 7 is accessible through Microsoft Internet Explorer and provides seamless access to the HP Insight Management Agents, the Integrated Lights-Out and the Remote Insight Lights-Out Edition. With Insight Manager 7, critical management information is available from any location accessible via a LAN, WAN, or secure remote connection, so systems administrators have the information and tools that they need when they need them.

Capable of managing a wide variety of systems
Insight Manager 7 is capable of managing a wide variety of systems. It manages HP servers, clusters, desktops, workstations, and portables. It also manages non-HP devices instrumented to the Simple Network Management Protocol (SNMP) or the Distributed Management Interface (DMI). Insight Manager 7 is the perfect management tool for customers with heterogeneous management needs.

Provides Secure Socket Layer (SSL) encryption
Finally, Insight Manager 7 provides Secure Socket Layer (SSL) encryption for data privacy as well as user administration and authentication integrated with local, NT domain, or Windows 2000 Active Directory accounts. It also makes extensive use of RSA Public Key technology to ensure that only authorized users can take advantage of sensitive and potentially data-destructive features. HP wants to ensure that powerful security goes hand in hand with powerful management functionality.


Insight Manager 7 Systems Management Architecture


Insight Manager 7 leverages a distributed architecture that may be broken into three layers.

Insight Manager 7 Management Server
Management Agents
Web-browser User Interface

Insight Manager 7 Management Server
The Insight Manager 7 Management Server sits at the center of the systems management architecture. It aggregates fault, asset, performance, and configuration data from all discovered systems attached to the network. It is also responsible for management tasks conducted against groups of servers, such as SNMP status polling, e-mail and paging notification, and system software update. Finally, Insight Manager 7 discovers and provides linkages to management applications that run at the agent layer. This enables users to access all management capabilities available to them from a single point of access.


Management Agents and Management Processors
Management Agents are applications that typically run on each server within the managed environment. Examples of management agents are the HP Insight Management Agents, the Survey Utility, the Version Control Agent, and the Version Control Repository Manager. Agents perform numerous functions including fault and performance management, configuration management, system software version control and update, policy-based fault recovery, and cluster management.

A management processor provides hardware-based, remote management and administration capabilities for individual servers. The Remote Insight Lights-Out Edition is an example of a management processor. Through its graphical remote console, users may take full control of servers located in secure data centers or servers located in remote offices with no dedicated IT staff. The Remote Insight Lights-Out Edition also provides the ability to remotely power on and power off servers.

Web Browser User Interface
The Web browser serves as the primary user interface for all HP management products. HP has chosen to Web-enable its products in order to offer systems administrators flexibility and mobility. Because users are not tied to a specific management console, they are free to manage from any location with a LAN, WAN, or secure remote connection.

Insight Manager 7 Features


Insight Manager is a client/server application used to remotely manage hardware in a network environment. Insight Manager:
Reports hardware fault conditions (both failure and pre-failure)
Collects data for reporting and graphing
Provides remote control of a server
Uses SNMP to send traps and retrieve data from a client

Insight Manager 7 represents the convergence between Insight Manager (Win32) and Insight Manager XE. Insight Manager 7 provides rapid access to detailed fault and performance information gathered by the HP Management Agents. Insight Manager 7 also introduces powerful new functionality for:
System software maintenance
Configuration of management parameters across groups of managed devices

For more information, visit the HP systems management website at http://h18013.www1.hp.com/products/servers/management/cim7-overview.html

Insight Manager 7 Features


Insight Manager 7 comes equipped with the following capabilities. Users may choose to use some or all of these capabilities depending on the size of their network, the make-up of devices under management, and the geographical distribution of managed devices.

Auto Discovery and Identification
Users may configure Insight Manager 7 to automatically discover and identify HP servers, desktops, workstations, and portables as well as other SNMP and DMI instrumented devices attached to the network. Users desiring greater control over the set of discovered devices may use manual discovery features.

Discovery Filters
Discovery filters allow users to exercise control over the type of devices added to the Insight Manager 7 database. By enabling discovery filters and selecting which devices should be discovered (servers, clients, management processors, etc.), users are able to prevent Insight Manager 7 from discovering devices that are either unmanageable or that fall outside of the systems administrator's responsibility.

Security
Insight Manager 7 provides secure access to management information. Browsing is accomplished using Secure Sockets Layer (SSL) to encrypt all communications between Microsoft Internet Explorer and the management server. Insight Manager 7 also leverages local, NT domain, and Windows 2000 Active Directory accounts for user administration and authentication. Users log in to Insight Manager 7 using their existing Windows user name and password.

Single Login
Insight Manager 7 will automatically authenticate users to HP management applications running on managed devices. This eliminates the need to reauthenticate to the HP Insight Management Agents and other management applications to perform tasks that require administrator-level privileges.

Insight Manager 7 Home Page
The Insight Manager 7 home page provides links to the most frequently used features within Insight Manager 7 along with contextual explanations of those features. It allows users to select a set of devices or events that they would like to view when first browsing into Insight Manager 7. This device or event query is customizable on a per-user basis. It also allows users to perform a device search in order to locate a particular device without searching through lengthy device lists.

Multiple System Version Control and System Software Update
Insight Manager 7, in conjunction with the Version Control Agent and Version Control Repository Manager, introduces a new architecture for version managing
and updating HP system software. The software maintenance architecture allows customers to version manage HP system software based on internally established baselines. It also allows customers to distribute BIOS, driver, and management application updates to multiple servers through a single software update task.

Remote Insight Integration
Insight Manager 7 discovers all Remote Insight Lights-Out Edition (RILOE) boards, Remote Insight Lights-Out Edition II (RILOE II) boards, and Integrated Lights-Out (iLO) management processors running in the managed environment. Users may access RILOE, RILOE II, or iLO from the Insight Manager 7 home page by clicking the status icon in the management processor column. RILOE provides functionality such as graphical remote console, Virtual Power, and Virtual Floppy Drive. To learn more about the Remote Insight Lights-Out Edition, visit the HP Management website at http://www.compaq.com/manage

Blade Server Visual Locator
Insight Manager 7 provides blade server visualization that pinpoints the exact position of blade servers within their enclosure and rack. It also correlates alerts generated by shared infrastructure elements, and associates the Remote Insight Integrated Lights-Out and Integrated Administration management processors with the servers that they manage.

Queries and Tasks
Insight Manager 7 queries and tasks enable group management of HP servers and other devices connected to the network.

Queries are device or event groups based on user-defined criteria (for example, all servers, all important events, or all servers running Windows 2000). Insight Manager 7 automatically updates all queries as new devices are added to the managed network and as new events are saved in the database. Tasks are operations, such as Software Update or SNMP Status Polling, performed against groups of managed devices. All tasks are based on queries and are therefore self-updating. When a new device is added to the managed network, it is automatically added to the appropriate set of tasks. All tasks, including Group Configuration and System Software Update, may be scheduled to run immediately, periodically, or at a specified time in the future.
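The self-updating relationship between queries and tasks can be illustrated with a minimal Python sketch. This is not Insight Manager 7 code; the Device, Query, and Task classes and their fields are hypothetical names chosen for illustration. The point it shows is that a query stores criteria rather than a fixed member list, so a task built on that query picks up newly discovered devices the next time it runs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Device:
    name: str
    kind: str   # hypothetical field, e.g. "server" or "client"
    os: str


@dataclass
class Query:
    """A named group defined by criteria, not by a fixed member list."""
    name: str
    matches: Callable[[Device], bool]

    def run(self, inventory: List[Device]) -> List[Device]:
        # Membership is recomputed on every run, so newly discovered
        # devices appear without editing the query.
        return [d for d in inventory if self.matches(d)]


@dataclass
class Task:
    """An operation performed against whatever the query currently returns."""
    name: str
    query: Query
    action: Callable[[Device], str]

    def execute(self, inventory: List[Device]) -> List[str]:
        return [self.action(d) for d in self.query.run(inventory)]


inventory = [
    Device("web01", "server", "Windows 2000"),
    Device("desk07", "client", "Windows XP"),
]
win2k_servers = Query(
    "All servers running Windows 2000",
    lambda d: d.kind == "server" and d.os == "Windows 2000",
)
poll = Task("Status poll", win2k_servers, lambda d: f"polled {d.name}")

first_run = poll.execute(inventory)
inventory.append(Device("web02", "server", "Windows 2000"))  # newly discovered
second_run = poll.execute(inventory)
```

In this sketch the second run includes web02 even though neither the query nor the task was edited after the device was added.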

Group Configuration
This feature enables administrators to change important configuration settings on groups of ProLiant servers. For example, users may use Group Configuration to change SNMP Settings, Management Agent passwords and security settings, and Version Control Agent settings across multiple devices.

Email and Paging Notification
Insight Manager 7 provides the ability to send both email and paging notifications based on the receipt of a specified event or a change in device status. This gives systems administrators greater mobility and removes the need for constant monitoring of a management console.

Cluster Monitor
The Cluster Monitor provides enhanced monitoring capabilities for Microsoft Cluster Server (MSCS), Tru64 UNIX, OpenVMS, SCO UnixWare 7, and NonStop clustered servers running on ProLiant and AlphaServer systems. The Cluster Monitor navigation pane displays all discovered clusters, and the data pane displays detailed information regarding CPU and disk utilization as well as environmental status on individual cluster nodes. Cluster Monitor will discover and link to the Intelligent Cluster Administrator to allow systems administrators to manage cluster policies, to take cluster resources on and off line, and to replicate cluster settings across multiple MSCS clusters.

Reporting
Insight Manager 7 provides inventory-reporting capabilities. Through a simple report creation wizard, you are able to display asset information across groups of servers. Asset information includes CPU, disk, memory, system, option boards, system software information, and operating system data. In addition to generating default reports, you can create customer-defined report configurations, edit report configurations, and delete report configurations. Insight Manager 7 allows exporting of inventory reports in CSV format for easy importing into most well-known reporting tools. You also have the option to save the source of reports and import the resulting text file into tools such as Microsoft Excel. All users with login access to Insight Manager 7 will have the ability to generate reports.
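Because inventory reports can be exported in CSV format, they can also be post-processed with a short script. The Python sketch below is an illustration only: the column names and sample rows are assumptions, since the actual columns depend on how the report was configured.

```python
import csv
import io
from collections import Counter

# Hypothetical export; a real report's columns depend on its configuration.
SAMPLE_REPORT = """Device Name,Operating System,Memory (MB)
web01,Windows 2000 Server,1024
web02,Windows 2000 Server,2048
db01,Windows NT Server 4,4096
"""


def summarize(report_text):
    """Count devices per operating system and total the installed memory."""
    rows = list(csv.DictReader(io.StringIO(report_text)))
    os_counts = Counter(row["Operating System"] for row in rows)
    total_memory_mb = sum(int(row["Memory (MB)"]) for row in rows)
    return os_counts, total_memory_mb


os_counts, total_memory_mb = summarize(SAMPLE_REPORT)
```

To process a saved report file instead of the embedded sample, pass the file's contents to summarize.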
Language Support
Insight Manager 7 may be installed on English, French, German, Spanish, and Japanese versions of Microsoft Windows 2000 Professional, Windows XP Professional, Microsoft Windows NT Server version 4, Windows .NET Server 2003 Standard and Enterprise Edition, Microsoft Windows 2000 Server, and Microsoft Windows 2000 Advanced Server. Database support also extends to English, French, German, Spanish, and Japanese.

Service Integration
Insight Manager 7 integration with Intelligent Service Link software provides automatic, secure reporting of service events for systems under service contracts directly to HP Customer Support Centers or qualified service providers.

What's New in Insight Manager 7 Service Pack 2


Insight Manager 7 Service Pack 2 introduces a number of important new capabilities designed to improve usability and to deliver stronger integration with other ProLiant Essentials software.

Integration with ProLiant Performance Analyzer (PPA)
The ProLiant Performance Analyzer is the centerpiece of the new ProLiant Essentials Performance Management Pack. It helps systems administrators isolate potential and actual performance bottlenecks by analyzing system hardware for suboptimal configurations and by monitoring key performance counters for irregularities. Insight Manager 7 Service Pack 2 will receive performance events from PPA and launch PPA from the performance column of the device list. The ProLiant Performance Analyzer will install by default with Insight Manager 7 Service Pack 2.

Favorites
This feature of Insight Manager 7 allows users to construct nested collections of queries that represent geographical, business, or functional groupings. Favorite groups aggregate the status of all devices within each level of the nested hierarchy, allowing systems administrators to quickly isolate problems without sorting through large device lists. The status has two parts: a primary status indicator, and a separate indicator that shows whether critical-status devices are present. Favorites are represented by My Favorites under Devices>Queries.

Netserver Data Collection and Inventory Reporting
Insight Manager 7 Service Pack 2 delivers enhanced data collection and inventory reporting for HP Netservers. Customers can now produce inventory reports for their Netservers as they do for their ProLiant servers.

Event Assignment
Insight Manager 7 Service Pack 2 allows systems administrators to assign responsibility for events to individuals on their IT staff. The comment area allows IT staffers to input information related to the event resolution for auditing purposes and for future reference.

Integrated Database Installation
Insight Manager 7 provides a number of enhancements designed to improve ease of use. The Microsoft Desktop Engine (MSDE 2000) installation is now integrated into the overall Insight Manager 7 installation, simplifying initial setup for users who wish to deploy the application on a desktop or manage a small number of systems. The installation process also makes it easy to connect to a remote or local Microsoft SQL Server database.

Insight Manager 7 Home Page


The Insight Manager 7 Home page is designed to provide information management and key functionality at a glance. It is the first page that you will see after logging in to Insight Manager 7. The Insight Manager 7 Home page provides links to the most frequently used features within Insight Manager 7 along with contextual explanations of those features. It allows users to select a set of devices or events that they would like to view when first browsing into Insight Manager 7. On the Insight Manager 7 SP2 Home page, notice the differences from the previous version: the HP logo, the Quicklinks order, and the tabbed user interface. Click the Home tab in the top left corner of the screen and you will notice that the Resource Center Quicklink at the bottom right of the screen temporarily becomes a Did You Know? box.

Device Status bar
Click an underlined number link to view the devices with the associated status. The red, orange, and yellow color-coded status blocks indicate the general health of your network.

Uncleared Events bar
Click an underlined number link to view a list of uncleared events with Major, Minor, or Critical status. The red, orange, and yellow color-coded status blocks indicate the general health of your network.
Device Search
When the home page loads, the cursor is positioned in the Device Search field. The Device Search feature allows you to quickly retrieve details about a device using its name. Enter the name of the device that you want to find, and then click Search to locate it. If an exact match is found, the device page is displayed for that device. If an exact match is not found, the device page displays a list of devices in the database whose names closely resemble the target name. Each name in the list is a hyperlink; clicking a name brings up the device page for that device. If no devices in the database resemble the target device, the device page will indicate that the device was not found.
Note

The search field only allows the following characters to be entered: letters, numbers, tilde, dash, period, underscore, apostrophe, and space.
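The three search outcomes and the character restriction described above can be sketched in Python. This is an illustration, not the product's implementation: the guide does not say how Insight Manager 7 decides that names "closely resemble" each other, so the standard-library difflib matcher and its 0.6 cutoff are stand-in assumptions, as is treating "letters" as ASCII letters.

```python
import difflib
import re

# Letters, numbers, tilde, dash, period, underscore, apostrophe, and space.
# Restricting "letters" to ASCII is an assumption.
ALLOWED = re.compile(r"[A-Za-z0-9~\-._' ]+")


def device_search(target, known_names):
    """Return (outcome, matches); outcome is 'invalid', 'exact',
    'close', or 'not found'."""
    if not ALLOWED.fullmatch(target):
        return "invalid", []
    if target in known_names:
        return "exact", [target]
    # Stand-in similarity measure; the real matching rule is not documented here.
    close = difflib.get_close_matches(target, known_names, n=5, cutoff=0.6)
    if close:
        return "close", close
    return "not found", []


names = ["web01", "web02", "db01"]
exact_result = device_search("web01", names)
close_result = device_search("web0", names)
invalid_result = device_search("web*", names)
missing_result = device_search("zzzzzz", names)
```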

Results from Query
The first time that you log in to Insight Manager 7, the Results From Query section displays the All Server query results. However, you may customize this section by clicking the Configure Me! link, which is located on the Results from Query bar. The Configure Me! link allows you to view only the devices or events in which you are interested. The query results include an Actions menu that allows you to create new queries and tasks, print the query results list, delete devices or events, create reports, ping devices, assign users to events, add comments to an event, or clear events. The query results window also includes a View menu that allows you to choose between a Details view and an Icon view, the ability to sort the query results by column, and the ability to choose which columns you would like to view in the query results table. All columns can be dragged and dropped to any location in the Results from Query section.

Devices and Events
The Devices and Events box explains the difference between devices and events. This box contains a hyperlink to the Overview page, which displays Device Status and Uncleared Event Status. You can also reach the Overview page by clicking the Devices tab from the toolbar. Click the Reports hyperlink and the Reports page is displayed. From here you can Create/Run New Report or use an existing report. You may also reach the Reports page by selecting the Devices tab and then clicking Reports.

Queries
The Queries box provides an explanation of queries and provides separate links for devices, events, clusters, and favorite queries. By clicking devices, events, or clusters, the Queries page will be displayed for whichever link you choose. You will be able to view your own personal queries along with the other queries that you have access to. By clicking favorite queries, the Configure Folders page is displayed listing your folders.

Tasks
The Tasks box provides a link to the Tasks page by clicking the Task link. It also provides links to example tasks. By clicking an example task, the Create/Edit Task page is displayed for the chosen task. The Tasks box is displayed only if you have operator or administrator rights.

Resource Center
The Resource Center box offers links to management-related websites at www.hp.com.

Administration
The Administration box allows you to fine-tune Insight Manager 7 for your environment. The links provided here are to the Automatic Discovery page, the Discovery Filter Configuration page, the Accounts page, and the Protocols page. You may also reach these pages by clicking the Settings icon from the toolbar. Additional messages may display in this section if you have not initiated Discovery. The Administration box is displayed only if you have administrator rights.

Insight Manager 7 Devices - Overview


The Overview page displays the current Device Status and Uncleared Event Status. Click any number link to view the details of the status. Each link runs a query and displays its results.

As you navigate through the console, you will be using some of the most widely used features of Insight Manager 7. The console provides access to a list of devices defined by a predefined or custom query, and allows users to search for devices. See the table below for more details.

To Navigate in the Console


Tabs
Select the Home, Devices, Tools, or Settings tabs to display the menus for the associated functions. The menus are displayed in the left menu frame. Some expand to reveal submenus.

Device Status
Click an underlined number link to view the devices with that status. The red, orange, and yellow color-coded status indicates the general health of your network.

Uncleared Events
Click an underlined number link to view uncleared events with Major, Minor, or Critical status. The red, orange, and yellow color-coded status indicates the general health of your network.

HP Logo
Click the logo to open the site http://www.HP.com.

Support
Click the text Support to open a page of links to various HP support and information sites.

Logout Link
Click the link to exit.

Insight Manager 7 Settings -> Automatic Discovery


The Automatic Discovery page is displayed by selecting the Settings tab. The figure below displays the following:

Expanded and collapsed lists in the menu frame
Larger edit fields for functions, such as IP Address Range
Help icon
Edit fields for entering settings, such as Retries and Timeout
Buttons that initiate an action, such as Execute Discovery Now
Submenus indicated by a plus/minus button next to the menu

The Discovery Process
Discovery is the process of finding and identifying a device at a specific address on the network (IP or IPX), and collecting information about that device. Insight Manager 7 discovers and identifies devices on your network and maintains a database of the information. You can run discovery at any time from the Automatic Discovery page and set your own schedule. You must visit this page at least once to set the initial range used for Discovery before the discovery process can begin.
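The idea of probing every address in a configured range, with a retry count, can be sketched as follows. This Python example is a simplified illustration and not how Insight Manager 7 performs discovery: the "first-last" range notation, the probe callback, and the retry handling are all assumptions, and the probe here is a stand-in that never touches the network.

```python
import ipaddress


def expand_range(spec):
    """Expand 'first-last' (an assumed notation) into individual IPv4 addresses."""
    first_text, last_text = spec.split("-")
    first = ipaddress.IPv4Address(first_text.strip())
    last = ipaddress.IPv4Address(last_text.strip())
    if last < first:
        raise ValueError("range end precedes range start")
    return [ipaddress.IPv4Address(n) for n in range(int(first), int(last) + 1)]


def discover(spec, probe, retries=1):
    """Probe each address up to 1 + retries times; return those that answered."""
    found = []
    for address in expand_range(spec):
        for _attempt in range(1 + retries):
            if probe(str(address)):
                found.append(str(address))
                break
    return found


# Stand-in probe: pretend only two hosts respond.
responders = {"192.168.1.2", "192.168.1.4"}
found = discover("192.168.1.1-192.168.1.5", lambda ip: ip in responders)
```

A real probe would send an SNMP, DMI, or HTTP request and apply the configured timeout; that part is deliberately left out of the sketch.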

Insight Manager 7 Devices -> Queries


A Device Query logically groups devices into a collection based on information in the Insight Manager 7 database. After a query is defined, you can display the results from the Device Queries page or associate it with a management task. You can save an edited or an unedited query as a query with another name. Creating logical groups of devices reduces the number of devices viewed in a particular device query. For example, your organization might have five system administrators who are responsible for 100 different devices in six different buildings. You can create a query for each administrator that includes only their devices, or you can create a query for each building that includes only the devices located in a particular building. Queries are listed by section and by category. In addition to using the queries provided by Insight Manager 7, you can also create, edit, or delete queries, or create categories of queries under each section of queries. Queries must follow specific query naming rules. Complex queries that contain individual device selections or numerous selection criteria take more system resources to run. Keep the query as simple as possible to minimize the performance impacts of individual tasks.
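The per-administrator and per-building grouping in the example above amounts to partitioning the device list on one attribute. A minimal Python sketch, using hypothetical inventory records and field names rather than the actual Insight Manager 7 database schema, might look like this:

```python
from collections import defaultdict

# Hypothetical inventory records; the field names are illustrative only.
devices = [
    {"name": "web01", "building": "Building 1", "admin": "alice"},
    {"name": "web02", "building": "Building 1", "admin": "bob"},
    {"name": "db01", "building": "Building 2", "admin": "alice"},
]


def build_groups(devices, field):
    """Return one named device group per distinct value of `field`."""
    groups = defaultdict(list)
    for device in devices:
        groups[device[field]].append(device["name"])
    return dict(groups)


per_building = build_groups(devices, "building")
per_admin = build_groups(devices, "admin")
```

Grouping on a single attribute also keeps each query simple, which matches the guidance about limiting selection criteria.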

Remote Insight Management


HP currently has three products capable of managing servers remotely:

Remote Insight Lights-Out Edition (RILOE)
Remote Insight Lights-Out Edition II (RILOE II)
Integrated Lights-Out (iLO)

RILOE and RILOE II are PCI-based options that provide remote server management capability. RILOE II has replaced RILOE and offers enhanced performance and greater functionality. Starting in early 2002, HP began to integrate Remote Insight Lights-Out capabilities into ProLiant servers. The ProLiant DL360 G2 was the first server to offer Integrated Lights-Out (iLO), the next generation of HP's technology integrated directly into the server architecture. Standard iLO provides a text-based interface to the customer as an integral part of the server at no extra charge. Advanced iLO has a graphical interface and is available to the customer through the purchase of a license key. This module will focus on the RILOE II and iLO products.

Remote Insight Lights-Out Edition II (RILOE II)


RILOE II Features
RILOE II provides the same lights-out features already available on the RILOE option card, plus the following:

Higher performance remote console
Higher performance graphical remote console capability enables you to work as efficiently and effectively as you would in front of the server using a keyboard or mouse, and it offers the potential of reduced cabling and equipment costs.

Pocket PC access
New Pocket PC access allows anytime, anywhere access and control of ProLiant servers through an iPAQ Pocket PC.

Virtual CD or floppy
Virtual Media capabilities now include the ability for users to virtually insert CD or floppy media from a remote location.

128-bit encryption
Enhanced security now includes 128-bit encryption for the Remote Console connection.

Up to 25 users
User administration features provide the capability to define up to 25 users with customizable access rights.

Supports .NET
Fully compatible with all ProLiant DL and ML servers and now supports Microsoft .NET (when available from vendor) and SuSE Linux operating systems.

Direct access to EMS console
Integration with Microsoft .NET allows access to the EMS console directly from the RILOE II user interface.

RILOE and RILOE II Differences

Processor speed
Remote Insight Lights-Out Edition II has an IBM 405GP PowerPC embedded processor running at 200 MHz for faster remote console performance.

User interface
Remote Insight Lights-Out Edition II features a new tab-based user interface for improved browser navigation.

Security
Remote Insight Lights-Out Edition II uses 128-bit encryption for the remote console for improved security.

Virtual functions
Remote Insight Lights-Out Edition II provides USB-based Virtual Floppy and Virtual CD functionality. This functionality is supported on ProLiant servers with the Remote Insight 30-pin connector, running a USB-supported operating system.

RILOE II Screens

RILOE II/RILOE Login Screen
The RILOE II Login Screen has a distinctly different appearance from that of the RILOE. In either case you will be prompted for a user name and password.

RILOE II Interface
Although the RILOE II provides the same functions as the RILOE with some additional enhancements, the user interface has a different look and feel. As you can see below, there are four tabs along the top for System Status, Remote Console, Virtual Devices, and Administration. Each of these tabs has a set of submenus that display along the left side of the screen when the tab is selected. The Status Summary screen is the first item on the System Status menu and is the first screen to be displayed after the user logs in. The remaining tabs and submenus provide functions similar to the RILOE.

Operational Overview
During normal operation, the RILOE II passes the keyboard and mouse signals to the server and functions as the primary video controller. This configuration allows the following operations to occur:

Transparent substitution of a remote keyboard and mouse for the server keyboard and mouse
Saving of video captures of reset sequences and failure sequences in the RILOE II memory for later replay
Simultaneous transmission of video to the server monitor and to a Remote Console monitor

Accessing the RILOE II for the First Time
RILOE II is preconfigured with a default user name, password, and DNS name. A network settings tag with the preconfigured values is attached to the board. Use these values to access the board remotely from a network client using a standard Web browser.

IMPORTANT: For security reasons, HP recommends changing these default settings after accessing Remote Insight Lights-Out Edition II for the first time.

Default values:
User name: Administrator
Password: The last eight digits of the serial number
DNS name: RIBXXXXXXXXXXXX, where the 12 Xs are the MAC address of RILOE II

NOTE: User names and passwords are case sensitive.

After the default user name and password are verified, the Remote Insight Status Summary screen is displayed. The Remote Insight Status Summary provides general information about the RILOE II, such as the user currently logged on, server name and status, Remote Insight IP address and name, and latest log entry data. The summary home page also shows whether the RILOE II has been configured to use HP Web-based Management and Insight Management Web agents.

Insight Manager 7 link to RILOE
Insight Manager 7 discovers all Remote Insight Lights-Out Edition (RILOE) boards, Remote Insight Lights-Out Edition II (RILOE II) boards, and Integrated Lights-Out (iLO) management processors running in the managed environment. Users may access RILOE, RILOE II, or iLO from the Insight Manager 7 Home page by clicking the status icon in the management processor column. This integration provides the following capabilities:

Insight Manager 7's ability to automatically launch the Lights-Out Configuration Utility on multiple cards simplifies configuration of multiple Lights-Out ports.
Server administration is more efficient and centralized.
Combining Virtual Media features and the SmartStart Scripting Toolkit enables the remote deployment of servers in an unattended fashion.
The ProLiant Essentials Rapid Deployment Value Pack automates deploying and provisioning server software configurations through the RILOE interface.
Insight Agents provide system monitoring and pre-failure alerting through the RILOE II network interface.
Insight Agents on the remote server can be accessed directly from RILOE.

Insight Manager 7 displays RILOE II events
Insight Manager 7 provides options to manage the recovery options of remote servers. The recovery options of Insight Manager 7 also provide the status of RILOE II and access to the diagnostics on RILOE II. In addition to useful information about the RILOE II itself, the status screen provides network information and information about power cable status. Events that are recorded include system resets, ASR, system power loss, user logins to the RILOE II, and unsuccessful login attempts.

RILOE II Survey Report
The RILOE II provides features for proactive system management and efficient troubleshooting of server problems. In addition to the Remote Console, you have access to overall server status information, video replay of previous server resets, and other information gathered by the Survey utility.

RILOE II Global Settings

Session Timeout (minutes): Controls how long a session can remain inactive before the Remote Insight board forces the user to log in again. The default is 15 minutes, and the value can be set up to 120 minutes.
ROM Configuration Utility (F8): Enables or disables the use of the F8 key during POST to access the Remote Insight ROM-Based Configuration Utility.
Emergency Management Services: Enables or disables the use of Windows Server 2003 EMS through the RILOE II.
Bypass reporting of external power cable: Enables or disables reporting by the RILOE II board to the operating system agent of whether the external power cable is connected.
Remote Console Port Configuration: Enables or disables configuring of the port address.
Remote Access with Pocket PC: Enables or disables access to the RILOE II from a Pocket PC.
Remote Console Data Encryption: Enables encryption of Remote Console data. If using a standard telnet client to access the RILOE II board, this setting must be Disabled.
SSL Encryption Strength: Allows you to set a 40-bit or 128-bit cipher strength. The most secure is 128-bit (High).
Current Cipher: Displays the encryption algorithm currently being used to protect data during transmission between the browser and the RILOE II.

Remote Insight HTTP Port: Allows you to change this setting, if required by your environment.
Remote Insight HTTPS Port: Allows you to change this setting, if required by your environment.
Remote Insight Remote Console Port: Allows you to change this setting, if required by your environment.
Host Keyboard: Enables or disables the host keyboard.
Level of Data Returned: Allows you to select the amount of data that is returned to an HTTP identification request from Insight Manager 7.
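A few of the constraints stated in the Global Settings descriptions (the 120-minute session timeout ceiling, the two SSL cipher strengths, and the telnet restriction on Remote Console Data Encryption) can be expressed as a small consistency check. The Python sketch below is illustrative only; the dictionary keys are hypothetical names and do not correspond to any actual RILOE II configuration file or scripting interface.

```python
def check_global_settings(settings):
    """Return a list of problems found in a settings dict (hypothetical keys)."""
    problems = []

    # Documented default is 15 minutes; documented maximum is 120 minutes.
    timeout = settings.get("session_timeout_minutes", 15)
    if timeout > 120:
        problems.append("session timeout exceeds 120 minutes")

    # Only 40-bit and 128-bit cipher strengths are described.
    if settings.get("ssl_strength_bits", 128) not in (40, 128):
        problems.append("SSL strength must be 40 or 128 bits")

    # A standard telnet client requires Remote Console Data Encryption disabled.
    if settings.get("uses_telnet_client") and settings.get("remote_console_encryption"):
        problems.append("disable Remote Console Data Encryption for telnet clients")

    return problems


defaults_ok = check_global_settings({})
too_long = check_global_settings({"session_timeout_minutes": 240})
telnet_conflict = check_global_settings(
    {"uses_telnet_client": True, "remote_console_encryption": True}
)
```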

iLO/RILOE differences

iLO is embedded on the ProLiant server. RILOE is a PCI option card.
iLO integrates system board management and diagnostics functionality with Lights-Out technology. System board management is not part of the RILOE functionality.
iLO does not require any internal or external cables for its operation. RILOE does require internal or external cables for operation.
iLO has Standard and Advanced features. RILOE features are all standard.

iLO Standard and Advanced features

iLO Standard Features
Virtual text remote console
Virtual power button control
Dedicated LAN connectivity
Automatic IP configuration via DHCP/DNS/WINS
Industry standard 128-bit SSL
IML and iLO event logging
Support for up to 12 user accounts

iLO Advanced Features
Virtual graphic remote console
Virtual floppy drive

iLO Status Summary
The iLO Status Summary provides general information about iLO, such as the user currently logged on, server name and status, iLO IP address and name, and latest log entry data. The Status Summary also shows whether iLO has been configured to use HP Web-Based Management and Insight Management Web agents.

iLO Integrated Management Log
The Integrated Management Log (IML) allows you to view logged remote server events. Logged events include all server-specific events recorded by the system health driver, including operating system information and ROM-based POST codes.

Server and iLO diagnostics
As an integrated management processor, iLO monitors the progress of the boot process of the server. The Host Server ROM writes Port 84 codes as it is booting. Integrated Lights-Out records and displays these codes. Selected POST codes are considered to be milestone codes and will have a description associated with them. You may use these milestones and descriptions to determine how far the server progressed through the boot process. HP uses Non-Volatile RAM to store server environment variable information. This information may be useful to HP engineers and advanced customers who have detailed knowledge of HP System Management architecture.
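Using milestone codes to judge boot progress amounts to finding the most recent recorded code that has a description. The Python sketch below illustrates the idea only: the code values and descriptions in the table are invented for the example, because the actual Port 84 milestone codes are defined by the server ROM and are not listed in this guide.

```python
# Hypothetical milestone table; real Port 84 codes and descriptions
# are defined by the server ROM and are not listed in this guide.
MILESTONES = {
    0x10: "Memory initialization started",
    0x40: "Option ROMs scanned",
    0x80: "Boot device selected",
}


def last_milestone(recorded_codes):
    """Return (code, description) of the most recent milestone code, or None."""
    for code in reversed(recorded_codes):
        if code in MILESTONES:
            return code, MILESTONES[code]
    return None


# Codes in the order they were recorded during a boot attempt.
progress = last_milestone([0x01, 0x10, 0x12, 0x40, 0x41])
```

In this invented trace the server passed the 0x40 milestone but never reached 0x80, which would point troubleshooting at the stage between the two.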

Remote Insight Lights-Out Edition (RILOE)


The HP Remote Insight Lights-Out Edition board is a 32-bit PCI-based, single-board computer that can be installed in a server to provide management from a PC using a web browser.

RILOE Features

Hardware-based graphical console
A hardware-based graphical console turns a standard browser on a management PC into a virtual desktop of the host server. This enables full control of the host server's display, keyboard, and mouse even if the server's operating system is not responding or the server is without power.

Browser support for Internet Explorer and Netscape
Browser support enables access to the Remote Insight Board through an integrated HTML Remote Insight menu. This menu resides in the firmware of the Remote Insight Board.

LAN access through onboard network interface card
LAN access enables access to the Remote Insight Board through the network. An integrated 10/100 Ethernet NIC on the Remote Insight Board supports TCP/IP, allowing you to access the Lights-Out Edition through the network without having to use a phone line. The NIC can auto-select between 10 Mb/s and 100 Mb/s.

Server failure alerting
Remote Insight detects when the server has lost power or has been reset by the Automatic Server Recovery (ASR) circuitry after the operating system has stopped
responding. Alerts can be sent to up to 12 management accounts through SNMP traps. SNMP is the network management protocol used by Insight Manager and other industry-standard network management applications.

Reset and failure sequence replay
Video text sequences stored on the Lights-Out Edition allow you to play back, pause, and replay server startup and shutdown sequences. These sequences include all system and operating system error messages and fatal error screens, such as NetWare Abend screens and Windows NT blue screens.

Remote reset
Remote reset enables you to initiate a cold reset from the management PC to bring the host server back on line when it is not responding. This type of restart does not shut down the server operating system gracefully but is useful in situations when the operating system is unresponsive.

Integration with Insight Manager
Integration with Insight Manager provides hardware-based asynchronous manageability, including access to Insight Management Agents, and support for full in-band SNMP management under key operating environments.

User administrator security
To ensure security, RILOE supports up to 12 users with customizable access rights and individual login names and passwords, implements MD5 encrypted password security, and provides event generation for invalid login attempts.

External power even when server is powered down
An external power connector ensures continuous power to the Remote Insight Lights-Out Edition, even during a server power failure or when the server is turned off.

Auto configuration of IP address via DNS/DHCP
RILOE provides automatic network configuration: it comes with a default name and a DHCP client that leases an IP address from the DNS/DHCP server on the network.

Survey
Using industry-standard browsers, RILOE users can access the Survey configuration file, providing the latest server configuration information to assist in the diagnostic process.

Virtual power button and virtual floppy drive
With the virtual power button, authenticated users can remotely turn the host server on or off using any standard browser interface. Virtual floppy drive allows an administrator to remotely reboot a server from a diskette inserted in a remote
management PC from anywhere on the network by capturing and transferring an image of a diskette over the network into the memory of the host server's RILOE. The board then redirects diskette read/write requests to diskette sectors in memory instead of the local diskette drive.

RILOE Option Kit
The option kit includes (clockwise starting on left):
• External power adapter: provides power to the Remote Insight Lights-Out Edition when the server power is off.
• Power cord for the external power adapter.
• Keyboard/mouse adapter cable: allows local and remote use of a keyboard and mouse.
• Virtual power button cables for ProLiant ML and DL servers: allow remote control of the power switch on the host server.
• Virtual power button cables for ProLiant 1850R and ProLiant 8000: allow remote control of the power switch on the host server.
• Network settings tag: contains preconfigured values for the default user name, password, and DNS name. HP recommends changing these default settings after accessing the RILOE board for the first time.

[Figure: RILOE option kit contents, labeled Power cord, Keyboard/mouse adapter cable, External power adapter, Virtual power button cables, and Network settings tag]

Note: PCI slot, cable, and video switch settings vary among servers. In some cases the cable ships with the server; in others, the cable from the RILOE option kit is used. Refer to table 2-1 in the Remote Insight Lights-Out Edition User Guide for the data for your server. You can also find the information on the HP website, as seen below for RILOE II.

Important! All servers support the keyboard/mouse external cable as well as the AC adapter. However, the default configuration always relies on having the internal cable connected so RILOE II can provide the virtual power buttons, Virtual Floppy, and Virtual Media USB applet. Whenever the 16- or 30-pin internal cables are used, the external cables should not be used. Customers frequently try to use the external mouse/keyboard cables together with the internal cables, which causes conflicts with the mouse and keyboard functions.

RILOE LEDs, Switch and Connectors
The Remote Insight Lights-Out Edition external connectors are shown below:

[Figure: RILOE external connectors, labeled Video connector, Keyboard/mouse connector, Power adapter connector, and NIC LEDs]

From left to right they include:

Power adapter connector
Under normal conditions, the Remote Insight Lights-Out Edition takes power from the PCI slot and does not require an external source. However, attaching the board to a separate AC power source using the included AC adapter provides backup power should power within the server fail.

Video connector
The Remote Insight Lights-Out Edition video circuit becomes the primary video. Therefore, the host server monitor should be connected to the Remote Insight Lights-Out Edition.

Keyboard/mouse connector
To provide remote keyboard and mouse control, the keyboard and mouse signals must pass through the Remote Insight Lights-Out Edition.

LAN connector
The network connector provides a full-time 10 Mb/s or 100 Mb/s network connection to the host server. This provides a management PC with access to the host server without the need for separate telephone lines or modem sharing devices.


Two LEDs
Two LEDs are located on the RJ-45 connector to indicate network connectivity:
1. A green LED on the bottom right illuminates when a link is present from the Ethernet hub and flashes to indicate network traffic.
2. An amber LED on the top right illuminates to indicate a 100 Mb/s connection and is off to indicate a 10 Mb/s connection.
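The two LED states can be read as a small decision table. The following Python sketch is illustrative only: the function name, argument values, and returned strings are assumptions made for this example, and treating a dark green LED as "no link" is an inference from the description above rather than a documented state.

```python
def describe_nic_leds(green, amber_on):
    """Interpret the RJ-45 LEDs.

    green    -- "off", "on", or "flashing" (hypothetical encoding)
    amber_on -- True if the amber LED is lit
    """
    if green not in ("off", "on", "flashing"):
        raise ValueError("green must be 'off', 'on', or 'flashing'")
    if green == "off":
        # Inferred: without a lit green LED no link is present,
        # so the amber (speed) LED carries no information.
        return "no link"
    speed = "100 Mb/s" if amber_on else "10 Mb/s"
    activity = "traffic" if green == "flashing" else "no traffic indicated"
    return f"link up at {speed}, {activity}"
```

For example, a flashing green LED with the amber LED lit reads as a 100 Mb/s link carrying traffic.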

[Figure: RILOE board, labeled LEDs, J11, SW3, and J12]

Nine LEDs are located in the upper left corner of the Lights-Out Edition board (viewing it component side up). During the initial boot of the Remote Insight Lights-Out Edition, the LED indicators flash randomly. After the board has booted, LED 7 flashes once a second. If any combination of the LEDs illuminates after the initial boot, it indicates a hardware failure. In this case, try resetting the Remote Insight Lights-Out Edition.

SW3, a four-position DIP switch, is located near the right end of the board. It allows the user to enable and disable video and to put the Lights-Out Edition board into a flash recovery mode.

J11 is the virtual power button 16-pin connector. Refer to the installation instructions in the appropriate server Setup and Installation Guide or Maintenance and Service Guide.

J12 is the virtual power button 4-pin connector. Refer to the installation instructions in the appropriate server Setup and Installation Guide or Maintenance and Service Guide.


RILOE F8 Setup
When the server boots, you are given the option of configuring the RILOE board. Pressing F8 at the prompt displays a menu-driven interface from which you can change information related to the user, network, etc. Exiting the menu returns you to the boot process.

RILOE Remote Console Login
When you set up the RILOE option, you specify the network address of the RILOE board along with the names and passwords for any users. Entering the network address of the RILOE board in a web browser invokes a login screen as shown here.


RILOE Home Page
When you log in to the RILOE, the home page displays information about the board and the server being managed. Notice that the Remote Console menu at the left includes a Remote Console (Frame) selection. It displays Remote Insight information within a frame, so you can keep the RILOE menu in view while observing the activity on the server display.

RILOE Remote Console Frame View
This is a view of the Microsoft Windows 2000 Advanced Server desktop as seen from the RILOE console in frame view. The Remote Console redirects the host server console to the remote client to provide the user with full video and keyboard access.


RILOE Status
The Server Information menu includes a Status selection that displays the screen shown here. It provides status information about both the server and the Remote Insight board.

RILOE Global Settings
Global Settings is a choice under the Administration menu that allows the user to view and modify miscellaneous information, including security and keyboard settings. Among these is the ability to set a time-out limit for the RILOE session. For security reasons, RILOE automatically logs out the user if the session exceeds this period with no activity. Clicking the browser's Refresh button brings up the dialog box that allows the user to log in again. If the Remote Console Port Configuration is set to Auto, the Remote Console Port is enabled only when a Remote Console session is in progress.


RILOE Event Log
The Logs menu allows you to view either the Event Log or the Integrated Management Log. A sample Event Log is shown here. Logged events include major server events, such as a server power outage or server reset, and Remote Insight events, such as a loose cable or an unauthorized login attempt. User actions are also logged, such as server power on/off, power (reset) cycle, virtual floppy activity, and clearing of the event log.

RILOE Integrated Management Log
This slide provides an example of what you would see when invoking the Integrated Management Log from the Logs menu. Logged events include all server-specific events recorded by the health driver (OS information, ROM POST codes, etc.).


RILOE Reset Sequences
Reset Sequences provides the user with video replay capability of critical host server sequences. This includes the previous two boot sequences, with ROM POST messages and OS load information. The user can also view the video sequences leading up to the last host server reset, including any abend information generated by the OS.


Survey Utility
The Survey Utility is an online information-gathering agent that runs on servers, gathering critical hardware and software information from various sources and saving it as a history of multiple sessions. It was developed to enable you to resolve problems without taking the server offline. The Survey Utility now includes a web browser interface that enables remote control of the utility and facilitates transfer of survey information from remote machines to a service provider. This online access is available from supported operating systems. Survey is also available from a tab on the SmartStart-based HP Insight Diagnostics menu.

The Survey Utility is an agent, similar to other HP management agents, and is supported on all ProLiant servers. In addition to its text output file, it gathers up to 10 configuration captures (or sessions) and can report on changes that have occurred to the system hardware or software over time.

Survey captures data as sessions, where a session is defined as an organized group of data describing the configured state of the system at a specific point in time. It keeps up to 10 distinct sessions, organized as 3 distinct types:

• The original session (always session number 2) is the first session sampled, is treated as a master configuration, and will never be overwritten by the utility.
• The checkpoint sessions (session numbers 3 to 10) are the next 8 samples that differ significantly from the previous session. They are maintained in a first in, first out (FIFO) fashion and may be deleted as the number of checkpoints increases. Checkpoints are generated only when something that would not change under normal operation of the server is changed; thus, not all items that change will generate checkpoints.
• The active session (always session number 1) is the last information captured, and is overwritten each time a sample is taken.

The session information is maintained in a file called SURVEY.IDI in the same directory as the executable portion of the program. This file contains all of the binary information captured for every session and can be analyzed locally by the Survey Utility, or it can be sent to another location such as a help center or to HP, where the Survey Utility can generate custom reports on the information.
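The retention rules for the three session types can be modeled in a few lines. The following Python sketch is an illustration only, not the Survey Utility's actual implementation: the class and method names are hypothetical, and the `significant` flag stands in for the utility's own test of whether a sample differs enough to become a checkpoint.

```python
from collections import deque


class SessionStore:
    """Toy model of Survey's session numbering (hypothetical names)."""

    MAX_CHECKPOINTS = 8  # session numbers 3 through 10

    def __init__(self):
        self.original = None  # session 2: never overwritten
        # sessions 3-10: a bounded deque drops the oldest entry first (FIFO)
        self.checkpoints = deque(maxlen=self.MAX_CHECKPOINTS)
        self.active = None    # session 1: overwritten on every sample

    def sample(self, config, significant=False):
        """Record a new sample of the server configuration."""
        if self.original is None:
            self.original = config           # first sample becomes the master
        elif significant:
            self.checkpoints.append(config)  # oldest checkpoint falls off when full
        self.active = config                 # active session holds the latest sample

    def sessions(self):
        """Return {session number: config} using the numbering described above."""
        numbered = {}
        if self.active is not None:
            numbered[1] = self.active
        if self.original is not None:
            numbered[2] = self.original
        for offset, config in enumerate(self.checkpoints):
            numbered[3 + offset] = config
        return numbered
```

In this model, taking ten significant samples after the original leaves only the eight most recent as checkpoints, while session 2 stays untouched.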


Uses for the Survey Utility


Some of the practical uses of Survey are:

• Diagnosis of a server without shutting down the unit.
• Remote server diagnosis, where the customer or field technician may send you the Survey files, or you may use Remote Insight to view the Survey files through your web browser.
• Determining if changes to the server have caused a problem. For example, if the server was working correctly yesterday and has a problem today, you can generate a Survey file that compares the current configuration with the last known good one. Some examples of system changes that may contribute to system failures and that may be detected by using Survey are:
   Were any cards recently added?
   Has memory been added (or subtracted)?
   Have any services or devices stopped that were running in the last known good session?
   Was a ROM upgrade performed?
   Were any hard drives hot-plugged?
• Accessing the Integrated Management Log (IML) on a server that does not have an Integrated Management Display (IMD). After running the Survey Utility, you can view the Integrated Management Log by loading the output of the utility (typically called SURVEY.TXT) into a text viewer such as Microsoft Notepad. The event list follows the system slot information. Once you have opened the file in a text viewer, you can print its contents using the print feature of the viewer.

These examples illustrate the types of failure isolation functions that can be performed with Survey.


Generating a Survey file


The Survey Utility saves its captured information in binary form in the Survey.idi file, and outputs reports to the Survey.txt file. The Survey.idi file cannot be read directly because it is in binary format. Survey uses the data in the .idi file to generate a report whose contents depend on the command line options that you select when running Survey. The report is saved as Survey.txt, which can be viewed with a text editor.

Several common questions arise when using Survey:

Where does Survey reside?
The default location is C:\Compaq\Survey for NT. The default location for Novell is the system directory.

How do I know what sessions exist?
Run Survey -v from the DOS command prompt to get a listing as shown in the diagram above.

How do I view the IDI file?
Survey.idi cannot be read directly because it is in binary format. Survey uses this file to generate Survey.txt. See the command line options section for more details.


Sample Survey Report


Compaq Survey Utility Version 1.11 (BUILD 24) for Windows NT
(C) Copyright Compaq Computer Corporation 1995-97
______________________________________________________________________________
List of Sessions in C:\Compaq\Survey\Survey.IDI
 2) Surveyor-NT 1.11 (BUILD 24) sampled 1/19/1998 21:00:26
 3) Surveyor-NT 1.11 (BUILD 24) sampled 3/26/1998 16:15:27
 4) Surveyor-NT 1.11 (BUILD 24) sampled 3/29/1998 9:50:17
 5) Surveyor-NT 1.11 (BUILD 24) sampled 4/10/1998 12:32:10
 6) Surveyor-NT 1.11 (BUILD 24) sampled 4/13/1998 16:09:34
 7) Surveyor-NT 1.11 (BUILD 24) sampled 4/14/1998 10:51:24
 8) Surveyor-NT 1.11 (BUILD 24) sampled 4/14/1998 13:10:55
 9) Surveyor-NT 1.11 (BUILD 24) sampled 4/14/1998 13:13:03
10)*Surveyor-NT 1.11 (BUILD 24) sampled 4/14/1998 13:18:25
______________________________________________________________________________
Output C:\Compaq\Survey\Survey.IDI session 10 as C:\Compaq\Survey\Survey.TXT
Include comparison with C:\Compaq\Survey\Survey.IDI session 9
Information detail filter set to Checkpoint Level
Sample interval is every 7 days
The next 3 samples are scheduled for:
   Wednesday 4/15/1998 at 12:00:00 hours
   Wednesday 4/22/1998 at 12:00:00 hours
   Wednesday 4/29/1998 at 12:00:00 hours
______________________________________________________________________________
System Components
______________________________________________________________________________
Mass Storage
   Disk Controller 1 ..................... Array
      Array Configuration
         Physical Devices Attached
            SCSI Bus 1
               SCSI Target 1 .................(-) COMPAQ WDE2170S
               Capacity .....................(-) 2,007 MBytes (4110000 Blocks)
               Serial Number ................(-) WS7000177432
               Firmware Version ..............(-) 1.52
   Compaq ProLiant Storage Box 1 ....... COMPAQ PROLIANT 4U6E DB
      Storage Slot 1 ..................(+) [empty]
      . (-) COMPAQ WDE2170S (wide)
______________________________________________________________________________

Excerpt from Survey report using survey -o10,9 fdifference to generate the report.
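When many reports arrive from the field, it can help to pull the session list out of each one programmatically. The Python sketch below assumes one session per line in the form shown in the sample report (number, closing parenthesis, optional asterisk, tool name, the word "sampled", date, time). The pattern is inferred from that single sample, so other Survey versions may need a different expression. This guide does not explain the asterisk, so the sketch only records whether it is present.

```python
import re
from datetime import datetime

# Pattern inferred from the sample listing only (an assumption, not a spec).
SESSION_LINE = re.compile(
    r"^\s*(?P<num>\d+)\)(?P<star>\*?)\s*(?P<tool>.+?)\s+sampled\s+"
    r"(?P<when>\d{1,2}/\d{1,2}/\d{4}\s+\d{1,2}:\d{2}:\d{2})\s*$"
)


def parse_sessions(report_text):
    """Return one dict per session line found in the report text."""
    sessions = []
    for line in report_text.splitlines():
        match = SESSION_LINE.match(line)
        if not match:
            continue  # not a session line (header, separator, component data)
        sessions.append({
            "number": int(match.group("num")),
            "starred": match.group("star") == "*",
            "tool": match.group("tool"),
            "sampled": datetime.strptime(match.group("when"), "%m/%d/%Y %H:%M:%S"),
        })
    return sessions
```

Sorting the result by the `sampled` field gives a quick timeline of when the configuration changed.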


Explanation of Sample Survey Report
This Survey report is very short because two Survey sessions were compared and only the differences were listed, as specified in the command line parameters. By comparing two Survey sessions, one from when the server was operating properly and one from when it was not, it is possible to see what has changed that may have affected the unit.

In the sample, session 10 was the primary session, and it was compared with session 9 for differences. Thus, a plus sign beside an item marks its setting in session 10, and a minus sign marks its setting in session 9.

Session 9 was created when the system was fully functional. Then a hot-plug hard drive was pulled from a storage system, simulating a drive failure, and session 10 was created. When the two sessions were compared, the Survey.txt file showed the drive information from session 9 with minus signs alongside, meaning that these parameters have changed. In this case, the plus sign beside [empty] in the Storage Slot 1 field shows that the drive is no longer there. If the drive had been replaced, the new drive information would be shown with plus signs.
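In a longer comparison report, the changed items can be filtered out by looking for the plus and minus markers. The following Python sketch assumes the markers appear literally as (+) and (-) after a run of dots, as in the sample report; the function name and the default session numbers are illustrative choices, not part of the Survey Utility.

```python
import re

# Marker layout inferred from the sample report: label, dots, (+) or (-), value.
CHANGE = re.compile(
    r"^\s*(?P<label>.*?)\s*\.+\s*\((?P<sign>[+-])\)\s*(?P<value>.*)$"
)


def changed_items(report_text, primary=10, compare=9):
    """Yield (session, label, value) for each line carrying a (+) or (-) marker.

    A (+) line shows the value in the primary session; a (-) line shows the
    value in the comparison session.
    """
    for line in report_text.splitlines():
        match = CHANGE.match(line)
        if match:
            session = primary if match.group("sign") == "+" else compare
            yield session, match.group("label"), match.group("value").strip()
```

Lines without a marker, such as "Disk Controller 1 ..................... Array", are skipped, so the output lists only what differs between the two sessions.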


Using Survey Utility from a Browser


Browser Requirements
Following are the minimum version levels for browsers that will work with Survey:
• Microsoft Internet Explorer 4.01, or
• Netscape Navigator 4.04

Minimum browser requirements include support for tables, frames, Java, JavaScript, and the Java Development Kit. In addition, all of the following options must be enabled:
• Java
• JavaScript
• Accept all cookies

Security
The following login accounts are available for use with the web-enabled Survey Utility:
• Anonymous
• User
• Operator
• Administrator

Operator or Administrator access is necessary to capture a new configuration sample.

Pointing the Browser to the Device Home Page
Survey Utility allows you to view information from a web browser, either locally or remotely, using the following procedure:
1. Determine the address of the target machine for the Survey Utility information that you want to view:
   a. To view data locally, use the URL http://localhost:2301/.
   b. To view data remotely, use the URL http://machine:2301/, where machine is the IP address or the computer name under DNS.
2. Enter the address in the Address field of the browser. The Insight Manager Web-Based Management Device Home page for that machine displays.
3. You may select the Login Account link to log in as another user. Use this option if you need to perform operations that require additional access rights, such as capturing a new configuration sample.
4. Select Survey Utility. The most recent Survey report displays.
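The address used in the procedure above follows a fixed pattern: the machine name or IP address, then port 2301. As a minimal sketch, the Python helper below builds that URL; the function name and its defaults are hypothetical and exist only for this example.

```python
DEFAULT_PORT = 2301  # port shown in the Device Home Page URLs above


def device_home_url(host="localhost", port=DEFAULT_PORT):
    """Build the Device Home Page URL for a local or remote machine.

    host may be an IP address or a computer name resolvable through DNS.
    """
    host = host.strip()
    if not host:
        raise ValueError("host must be an IP address or DNS computer name")
    return f"http://{host}:{port}/"
```

Called with no arguments it returns the local URL; passing a computer name returns the remote form. The result can be pasted into the browser's Address field or handed to Python's standard `webbrowser.open` function.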

Navigation
The default browser view for the Survey Utility contains the following three frames:

1. Title Frame
   Located in the upper left corner of the browser window
   Contains the following links:
   • Help displays the Survey Utility User Guide.
   • Report displays the Survey Utility Report in the data frame.
   • Options displays the Options page for the Survey Utility.
   • Device Home displays the device home page from where the Survey Utility was selected.

2. Navigation Frame
   Located below the Title Frame on the left side of the window
   Contains a tree applet that allows navigation of the Data Frame on the right

3. Data Frame
   Located on the right side of the window
   Used for:
   • Setting configuration options
   • Displaying a Survey Utility Report

Using Survey Utility Options The Options page of the web-enabled Survey Utility displays all captured Survey sessions and allows you to control the utility from a browser to perform the following functions:

• Select Different Configuration History Files
• Select Primary and Compare Session
• Download the Configuration History File
• Select a Report Type
• Generate a New Report
• Capture a New Configuration Sample


SCU diagnostics
On legacy systems (pre-ML/DL), the precursor to RBSU was the System Configuration Utility (SCU), which provided a menu-based interface to certain configuration and diagnostic functions. Following is a brief description of some of the diagnostic capabilities found in SCU.

Test Computer
TEST Menu
The Test Computer menu provides three types of diagnostic routines:

Quick Check Diagnostics
This option runs high-level diagnostics on all detected hardware in the system. However, any errors found with this test would most likely be hard failures and would have already been reported during POST. In such a case, running diagnostics would not be necessary because the POST code will detail the failure. (A listing of POST error codes can be found in Appendix G.)

Automatic Diagnostics
This option is most useful for burn-in testing of all devices. The continuous looping feature continues testing all devices until it is stopped by pressing CTRL+BREAK or, if the Stop on errors option is checked, until it finds an error with a device. Unattended testing without continuous looping runs all tests once and then stops.

Prompted Diagnostics
Use this option to test individual devices. This is most useful in troubleshooting intermittent errors, such as a high number of ECC memory errors or problems with a drive. It can also be used to verify that a particular component has failed and that its replacement is fully operational.


Diagnostic Tests Any of the three modes of Test Computer can perform the following tests:

• Primary Processor Test 100 Series Error Codes. Identifies failures
• Memory Test 200 Series Error Codes. The System Memory Test will
   Write, Read, Compare Test uses static patterns to exercise memory.
   Noise Test checks the integrity of data transfer through data lines.
   Random Data Pattern Test uses random data patterns to exercise
   Random Address Test uses random data patterns written to random
   Random Long Test uses four patterns to exercise long memory. It
• Keyboard Test 300 Series Error Codes.
• Parallel Printer Test 400 Series Error Codes.
• Diskette Drive Test 600 Series Error Codes.
• SMART Array Controller Test The following options are under the
   Drive Monitoring Diagnostic Test
   Controller Diagnostic Test verifies that the hard drive controller can
   Seek Test performs sequential seeks over the hard drive and then
   Read Test performs a random head seek test followed by a test of the
   Select All the Above Tests runs the Drive Monitoring Diagnostic Test,
   Surface Analysis performs multiple write/read/compares on each track
• Serial Test 1100 Series Error Codes.
• Modem Communications Test 1200 Series Error Codes.
• Fixed Disk Drive Test 1700 Series Error Codes.
• Tape Drive Test 1900 Series Error Codes.
• Advanced VGA Board Test 2400 Series Error Codes.
• 32-Bit DualSpeed NetFlex-2 Controller and 32-Bit DualSpeed Token Ring Controller Test 6000 Series Error Codes.
• SCSI Fixed Disk Drive Test 6500 Series Error Codes.
• CD-ROM Drive Test 6600 Series Error Codes.
• SCSI Tape Drive Test 6700 Series Error Codes.
• Server Manager/R Board Test 7000 Series Error Codes.
• Pointing Device Interface Test 8600 Series Error Codes.
• Network Controller(s) Test
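The series numbers above lend themselves to a quick lookup when reading a diagnostic error code. The Python sketch below is a convenience illustration only. It assumes that a code belongs to the series obtained by rounding it down to the nearest hundred (for example, 601 falls in the 600 series), which this guide does not state explicitly, and the function name is hypothetical. Tests listed without a series number are omitted.

```python
# Series numbers copied from the test list above.
ERROR_SERIES = {
    100: "Primary Processor Test",
    200: "Memory Test",
    300: "Keyboard Test",
    400: "Parallel Printer Test",
    600: "Diskette Drive Test",
    1100: "Serial Test",
    1200: "Modem Communications Test",
    1700: "Fixed Disk Drive Test",
    1900: "Tape Drive Test",
    2400: "Advanced VGA Board Test",
    6000: ("32-Bit DualSpeed NetFlex-2 Controller and "
           "32-Bit DualSpeed Token Ring Controller Test"),
    6500: "SCSI Fixed Disk Drive Test",
    6600: "CD-ROM Drive Test",
    6700: "SCSI Tape Drive Test",
    7000: "Server Manager/R Board Test",
    8600: "Pointing Device Interface Test",
}


def lookup_test_name(code):
    """Return the test that owns an error code, or None if the series is unknown.

    Assumption: the series is the code rounded down to the nearest hundred.
    """
    series = (int(code) // 100) * 100
    return ERROR_SERIES.get(series)
```

For instance, `lookup_test_name(601)` returns the Diskette Drive Test, while a code in an unlisted series returns `None`.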


Inspect
[Figure: Inspect menu, listing System, ROM, Keyboard, System Ports, System Storage, Graphics, Memory, Operating System, System Files, Network, System Configuration, Server Health, Miscellaneous, Print, Save to File, Add Comments, and Exit Inspect]

The most useful items on the list are:

• ROM: Determines the ROM revisions of various components, including the system board.
• System Storage: Gathers information concerning drives and mass storage controllers.
• Memory: Identifies the number, type, and position of installed memory modules.
• Network: Determines the I/O address, IRQ, speed, and MAC address of installed NICs.
• Server Health: Displays the health logs, which include Standby Recovery Server status, Critical Error log, Correctable Memory Error log, and the Revisions table that details the system board and riser card revisions.


Test ASR
Use Test ASR to verify a new ASR configuration or troubleshoot an existing one. It causes the following to happen:
1. A test alert will be generated in the Server Health Log.
2. The system will be restarted.
3. The system ROM will check for bad memory.
Depending on the selections made in the ASR options menu, the following may occur:
4. The pager number and message will be dialed.
5. The modem will be set to auto-answer or will dial out to another computer.
6. The system will boot into the configuration partition (if installed) or will boot into the operating system.

Either remotely or from the host, you can run the Diagnostics utilities to view the Server Health Log. Depending on the version of ASR, a successful reboot will generate a page.


Upgrade Firmware
The Upgrade Firmware option starts the ROMPaq Firmware Upgrade Utility, which allows you to upgrade the system and option ROMs in the server. ROM upgrades can fix many issues such as those that can arise when a new card with the latest firmware is installed in a server that has an older system ROM. The symptoms for these issues vary widely and range from solid failures to intermittent problems. It may not be readily apparent that the cause of an issue is ROM-related. Therefore, it is recommended that a system and its options all be upgraded to the latest ROM revisions when issues arise.
WARNING: Powering down a system or otherwise interrupting a ROM upgrade may result in inoperative system boards or components, and may require replacement of the failed part! Some options do have boot-block on the ROM, and in the event of an upgrade interruption or ROM corruption can be re-flashed by powering up the unit with a ROMPaq diskette inserted.

Remote Utilities
The Remote Utilities option allows connection to the computer either through a modem or through a network if enabled. The ASR feature must be configured before using this feature (see Automatic Server Recovery in the System Configuration section). Remote control will be handled through an ANSI terminal emulation program, such as ProComm or Windows HyperTerminal. Once the remote service session is established, the Remote Utilities menu options for uploading or downloading files to the server will be operational.


Integrated Management Display


The Integrated Management Display (IMD) is an LCD panel connected directly to a server's system board. It provides instantaneous information, including the Integrated Management Log, that can increase the serviceability of the system.

The IMD is a standard feature on the current high-end servers (except the 6400R).

• 64-character backlit LCD (16 characters x 4 rows)
• Menu driven
• Four user navigating buttons
• Allows F1 POST entry
• Auxiliary power supply
• Displays POST, system alerts, fan failures, and user information
• Continuous event wrap
• Common design throughout high-end servers

The IMD is off unless AC power is applied to the power supply and +5V AUX is on. When it powers up, the IMD displays:
COMPAQ LCD
MODEL #56022
LCD FIRMWARE 1.9


Initialization
On systems that have an ON/STANDBY switch, when AC power is first applied, the buttons have no effect on the display until the system is powered on. Once the system is powered on, the IMD starts its initialization sequence. The LCD screen clears, it displays the model number and the LCD firmware revision, and then the MAIN MENU appears. The initialization sequence begins displaying:
System Initialization
EISA Initialization
PCI Auto Cfg.
Processors
Video
Memory Test
Cache Test
Memory Initialization
Drive Arrays
Floppy Drive
Option ROMs
SCSI Devices
F10 Prompt

A rotating line appears by each of the above prompts to indicate that section of POST is being executed. When POST completes that section, the rotating line is replaced by a check mark. When the system is powered down, the display indicates System Powered OFF.


Displaying Events
The following is an example of how an event is displayed on the Integrated Management Display:
**001 of 010**
--CAUTION--
03/19/1997 12:54 PM
FAN INSERTED
Main System
Location: System Board
Fan ID: 03
**END OF EVENT**

Advantages of IMD
Because you can access the IMD directly from the server instead of going to the management console, you can achieve a greater level of system uptime and serviceability. For example, if your data center consists of racks of HP servers and Insight Manager on the management console has notified you that one of the servers is down, the IMD can display the user-defined server name, making it easier for you to identify from among all of the data center servers, the server that has gone down. The IMD can also store and display information about the system administrator who services the server. To customize how the IMD displays information, you can set user-definable options without taking the server offline. You can also add custom menu items that provide a more comprehensive status of the server. For example, if you wish to keep track of when the server was last serviced, you can enter this information and later have access to it through the IMD. These functions are currently supported only under Windows NT.


Integrated Management Log Viewer


The HP Integrated Management Log Viewer allows users to view system events either locally or remotely. IML Viewer provides the ability to remotely manipulate system event information with features such as view, filter, sort, print, export and save. Benefits include the ability to:

• View the IML of either local or remote systems
• Obtain a single historical record of recent system events and errors for post-diagnosis review
• View detailed system event information in a readable format
• Save an IML as a binary file so that users can view the saved IML file at a later date or possibly even at a different location
• Filter or sort IML entries to find specific information quickly
• Save the IML to a comma-separated file for viewing at a later date using a third-party application, such as a spreadsheet program
• Print out a hard copy of the IML

Install the IML Viewer from the SSD for either NT or Novell (CPQIML.NLM).

Severity Levels in the IML
Severity levels displayed in the Integrated Management Log are as follows:

Informational
A comprehensive chronicle of past hardware or software system events. This type of event requires no action by the administrator.

Repaired
An action has taken place to fix this system event, and the user has marked this event as repaired.

Caution
A non-critical system error has occurred and may or may not require action by the administrator. However, it is recommended to take action if possible, then mark the event as repaired.

Critical
A system component on the unit has failed and requires action by the administrator. Replace the system component, then mark the event as repaired.
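Because the IML Viewer can save the log as a comma-separated file, the severity levels can be used to filter an exported log outside the viewer. The following Python sketch is a rough illustration under stated assumptions: it supposes the export has a header row with a column named Severity, which this guide does not confirm, and the ranking that places Repaired below Caution is a choice made for this example. Pass a different `severity_column` if the real export uses another header.

```python
import csv
import io

# Severity names from the list above. The ordering is an assumption made
# for this example: Repaired ranks low because the event was already handled.
SEVERITY_ORDER = ["Informational", "Repaired", "Caution", "Critical"]


def events_at_or_above(csv_text, minimum="Caution", severity_column="Severity"):
    """Return the rows whose severity is at least `minimum`.

    csv_text        -- contents of a comma-separated IML export with a header row
    severity_column -- hypothetical header name for the severity field
    """
    rank = {name: index for index, name in enumerate(SEVERITY_ORDER)}
    threshold = rank[minimum]
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        # Unknown or missing severities get rank -1 and are filtered out.
        if rank.get((row.get(severity_column) or "").strip(), -1) >= threshold
    ]
```

With the default threshold, only Caution and Critical events remain, which are the two levels that may call for action by the administrator.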


HP Servers Troubleshooting Guide


The HP Servers Troubleshooting Guide is a comprehensive resource covering troubleshooting information for all ProLiant and TaskSmart servers. Created for both novice and expert users, the guide walks you through the diagnostic process and directs you to specific troubleshooting information. A wide range of problems are covered in the guide, including hardware problems, software problems, and error recovery. The guide also contains lists of error codes and messages from the Power-On Self-Test (POST), Diagnostics, the Integrated Management Log (IML), and Array Diagnostic Utility (ADU). The HP Servers Troubleshooting Guide can be obtained by going to the HP Support Reference Library and downloading document number 161759-007. The Reference Library is linked from the Support website at http://wwss1pro.compaq.com/support/home/index.asp


Learning Check
1. What four tabs are found on the initial SmartStart screen?
_____________________________________________________________
_____________________________________________________________

2. What six pieces of information about the host server are displayed on the SmartStart Home screen?
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________

3. What tab on the Server Diagnostics menu would enable you to determine the location and size of installed memory?
_____________________________________________________________

4. What HP Insight Diagnostics screen provides a list of errors detected during POST?
_____________________________________________________________

5. What three components are erased by the Erase utility?
_____________________________________________________________
_____________________________________________________________

6. After saving an Array Diagnostic Utility (ADU) report, what web-based utility could you use to format and summarize it?
_____________________________________________________________

7. If SCSI bus fault values for drives with similar service times are the same, what is the likely cause?
_____________________________________________________________

8. What feature provides the means to configure the type of systems Insight Manager 7 will discover?
_____________________________________________________________

9. What iLO Advanced features are not found in iLO Standard?
_____________________________________________________________
_____________________________________________________________

10. What key is used during POST to access the Remote Insight ROM-Based Configuration Utility?
_____________________________________________________________

11. RILOE II is preconfigured with a default user name, password, and DNS name. Where are these found?
_____________________________________________________________

12. What iLO screen can be used to determine how far a server progressed during the boot process before failing?
_____________________________________________________________
