IBM Training

Student Notebook
Power Systems for AIX III: Advanced Administration and Problem Determination
Course code AN15 ERC 3.0
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide.
The following are trademarks of International Business Machines Corporation, registered in many
jurisdictions worldwide:
AIX 5L™ AIX 6™ AIX®
DS8000® FlashCopy® HACMP™
Initiate® PartnerWorld® POWER Hypervisor™
Power Systems™ Power® PowerVM®
POWER6® POWER7® Redbooks®
RS/6000® Tivoli®
Windows is a trademark of Microsoft Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Other product and service names might be trademarks of IBM or other companies.
January 2015 edition

The information contained in this document has not been submitted to any formal IBM test and is distributed on an “as is” basis without
any warranty either express or implied. The use of this information or the implementation of any of these techniques is a customer
responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. While
each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will
result elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.
© Copyright International Business Machines Corporation 2009, 2015.

This document may not be reproduced in whole or in part without the prior written permission of IBM.
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
V10.1
Contents
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Course description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Agenda . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Unit 1. Advanced AIX administration overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1

Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
Application outages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3
Maintenance window tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-5
Effective problem management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-7
Before problems occur . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-9
Before problems occur: A few good commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-11
Steps in problem resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-12
Progress and reference codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-15
Reference codes at the IBM Knowledge Center . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-17
Working with AIX Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-18
AIX Support test case data (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-20
AIX Support test case data (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-22
AIX software update hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-23
Relevant documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-25
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-26
Exercise: Problem diagnostic information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-27
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-28
Unit 2. The Object Data Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1

Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
2.1. Introduction to the ODM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3
What is the ODM? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4
Data managed by the ODM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
ODM components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-6
ODM database files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7
Device configuration summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9
Configuration manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10
Location and contents of ODM repositories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-11
How ODM classes act together . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-13
Data not managed by the ODM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-14
Let's review: Device configuration and the ODM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-15
ODM commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-16
Changing attribute values with odmadd and odmdelete . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-18
Changing attribute values with odmchange . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-20
2.2. ODM database files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-21
Software vital product data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-22
Software states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-24
Predefined devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-26
Predefined attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-29
Customized devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-31
Customized attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-33
Additional device object classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-34

ODM and high-level device commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-36
Checkpoint (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-38
Checkpoint (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-39
Exercise: The Object Data Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-40
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-41
Unit 3. Error monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-1

Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-2
3.1. Working with the error log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-3
Error logging components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-4
Generating an error report using SMIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-6
The errpt command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-9
A summary report: errpt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-11
A detailed error report: errpt -a . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-12
Types of disk errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-15
LVM error log entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-17
Maintaining the error log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-19
Exercise: Error monitoring (Part 1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-21
3.2. Error notification and syslogd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-23
Error notification methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-24
Self-made error notification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-25
ODM-based error notification: errnotify . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-27
syslogd daemon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-30
syslogd configuration examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-32
Redirecting syslog messages to error log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-35
Directing error log messages to syslogd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-36
System hang detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-37
Configuring shdaemon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-39
Checkpoint (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-41
Checkpoint (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-42
Exercise: Error monitoring (Part 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-43
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-44
Unit 4. Network Installation Manager basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-1

Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-2
NIM overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-3
Machine roles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-5
Boot process for AIX installation: Tape or CD/DVD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-7
Boot process for AIX installation with NIM (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-9
Boot process for AIX installation with NIM (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-10
NIM objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-11
Listing NIM objects and their attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-13
NIM configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-14
Resource objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-16
Resource objects: lpp_source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-18
Resource objects: SPOT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-20
Resource objects: mksysb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-23
Network objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-25
Machine objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-27
Defining a machine object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-29
Define a client using SMIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-31

NIM operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-34
bos_inst operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-37
More information about NIM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-39
Additional topics in NIM course . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-41
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-42
Exercise: Basic Network Installation Manager configuration . . . . . . . . . . . . . . . . . . . . . . . 4-43
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-44
Unit 5. System initialization: Accessing a boot image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1

Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-2
5.1. System startup process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-3
How does a Power server or LPAR boot? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4
Loading of a boot image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-6
Boot disk and the boot logical volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-8
5.2. Unable to find boot image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-11
Working with bootlists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-12
AIX 7: Bootlist pathid enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-14
Starting System Management Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-15
Working with bootlists in SMS (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-17
Working with bootlists in SMS (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-19
5.3. Corrupted boot logical volume. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-21
Boot device alternatives (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-22
Boot device alternatives (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-24
Accessing a system that will not boot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-25
Booting in maintenance mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-27
Working in maintenance mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-29
How to fix a corrupted BLV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-31
Checkpoint (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-33
Checkpoint (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-34
Exercise: System initialization: Accessing a boot image . . . . . . . . . . . . . . . . . . . . . . . . . . 5-35
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-36
Unit 6. System initialization: rc.boot and inittab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1

Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2
6.1. AIX initialization part 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-3
System software initialization overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-4
rc.boot 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6
rc.boot 2 (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-8
rc.boot 2 (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-10
rc.boot 3 (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-12
rc.boot 3 (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-14
rc.boot summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-16
Fixing corrupted file systems and logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-17
Let’s review: rc.boot (1 of 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-19
6.2. AIX initialization part 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-23
Configuration manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-24
Config_Rules object class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-26
cfgmgr output in the boot log using alog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-28
/etc/inittab file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-30

Boot problem management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-33
Let's review: /etc/inittab file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-36
Checkpoint (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-38
Checkpoint (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-39
Exercise: System initialization: rc.boot and inittab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-40
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-41
Unit 7. LVM metadata and related problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-1

Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-2
7.1. LVM data representation: Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-3
Review: LVM terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-4
LVM identifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-6
LVM data on disk control blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-8
LVM data in the operating system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-10
LVM-related ODM objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-11
7.2. Export and import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-13
Exporting a volume group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-14
Importing a volume group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-16
importvg and duplicate names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-17
importvg and existing logical volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-19
importvg and existing file systems (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-20
importvg and existing file systems (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-21
7.3. LVM metadata details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-23
Contents of the VGDA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-24
VGDA example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-26
The logical volume control block . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-30
How LVM interacts with the ODM and the VGDA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-32
ODM entries for physical volumes (1 of 4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-34
ODM entries for volume groups (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-41
ODM entries for volume groups (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-42
ODM entries for logical volumes (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-43
ODM entries for logical volumes (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-44
7.4. LVM metadata-related problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-45
ODM-related LVM problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-46
Fixing ODM problems (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-48
Fixing ODM problems (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-50
Intermediate level ODM commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-52
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-54
Exercise: LVM metadata and related problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-55
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-56
Unit 8. Disk management procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-1

Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-2
8.1. Failed disks: Mirroring and quorum issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-3
Mirroring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-4
Stale partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-5
Mirroring rootvg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-7
VGDA count . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-10

Quorum not available . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-11
Nonquorum volume groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-13
Forced vary on (varyonvg -f) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-15
Physical volume states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-17
8.2. Disk replacement techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-19
Disk replacement: Starting point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-20
Procedure 1 (1 of 4): Disk mirrored . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-22
Procedure 1 (2 of 4): Disk mirrored with replacepv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-24
Procedure 1 (3 of 4): Disk mirrored without replacepv . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-25
Procedure 1 (4 of 4): Special steps for rootvg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-27
Procedure 2 (1 of 2): Disk still working . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-28
Procedure 2 (2 of 2): Special steps for rootvg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-30
Procedure 3: Disk in missing or removed state . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-32
Procedure 4: Total rootvg failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-34
Procedure 5: Total non-rootvg failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-36
ODM errors from LVM commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-38
Removal of disk without reducevg (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-40
Removal of disk without reducevg (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-41
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-43
Exercise: Disk management procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-44
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-45
Unit 9. Install and cloning techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-1

Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-2
9.1. Alternate disk installation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-3
Topic 1 objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-4
Alternate disk installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-5
Alternate mksysb disk installation (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-7
Alternate mksysb disk installation (2 of 2)
Alternate disk rootvg cloning (1 of 2)
Alternate disk rootvg cloning (2 of 2)
Removing an alternate disk installation
NIM alternate disk migration
Exercise: Install and cloning techniques (Part 1)
9.2. Using multibos
Topic 2 objectives
multibos overview
Active and standby BOS logical volumes
Setting up a standby BOS (1 of 2)
Setting up a standby BOS (2 of 2)
Other multibos operations (1 of 2)
Other multibos operations (2 of 2)
Checkpoint (1 of 2)
Checkpoint (2 of 2)
Exercise: Install and cloning techniques (Part 2)
Unit summary

Unit 10. Advanced backup techniques
Unit objectives
Backup data inconsistency
Ensuring backup data consistency
10.1. LVM mirror-based online backups
Topic 1 objectives
Online JFS backup
Splitting the mirror
Reintegrate a mirror backup copy
Snapshot volume groups (1 of 2)
Snapshot volume groups (2 of 2)
Snapshot volume group commands
Snapshot volume group example
10.2. JFS2 snapshot
Topic 2 objectives
JFS2 snapshot (1 of 2)
JFS2 snapshot (2 of 2)
JFS2 snapshot mechanism (1 of 2)
JFS2 snapshot mechanism (2 of 2)
JFS2 snapshot SMIT menu
Creating snapshots: External
Creating snapshots: Internal
Listing snapshots
Using a JFS2 snapshot to recover
Using a JFS2 external snapshot to back up
Using a JFS2 internal snapshot to back up
JFS2 snapshot space management
10.3. SAN Copy issues
Topic 3 objectives
SAN Copy and file system cache
Use of JFS2 freeze and thaw
Consistency groups
Accessing SAN Copy data
The recreatevg command
Checkpoint
Exercises: Advanced backup techniques
Unit summary

Unit 11. Diagnostics
Unit objectives
When do you need diagnostics?
Where do you run diagnostics?
The diag command
Working with diag (1 of 3)
Working with diag (2 of 3)
Working with diag (3 of 3)
What happens if a device is busy?
Diagnostic modes (1 of 3)
diag: Using task selection
Diagnostic log
Checkpoint
Exercise: Diagnostics
Unit summary

Unit 12. The AIX system dump facility
Unit objectives
System dumps
Types of dumps
Traditional system dump
Traditional dump actions
The dump device
Dump device types
Firmware assisted dump (1 of 3)
Firmware assisted dump algorithm (1 of 2)
Firmware assisted dump algorithm (2 of 2)
The sysdumpdev command
List current dump settings (1 of 2)
List current dump settings (2 of 2)
Configuring the dump type
Set the dump devices (1 of 2)
Set the dump devices (2 of 2)
Copy directory location and copy policy (1 of 2)
Copy directory location and copy policy (2 of 2)
Dump copy failure (1 of 2)
Dump copy failure (2 of 2)
always allow dump flag
Estimate the dump size
Methods of starting a dump
Starting a dump from a TTY
Generating dumps with SMIT
Generating dumps with HMC
Dump image information (1 of 3)
Operator panel codes
Dump problems
Retrieving the dump image
The savecore command
The dumpcheck command
Automatically reboot after a crash
Using kdb to analyze a dump
The snap command
Data collection flags
Control flags
snap examples
Checkpoint
Exercise: The AIX system dump facility
Unit summary

Appendix A. Checkpoint solutions
Appendix B. Command summary
Appendix C. AIX dump code and progress codes
Trademarks
The reader should recognize that the following terms, which appear in the content of this training
document, are official trademarks of IBM or other companies:
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide.
The following are trademarks of International Business Machines Corporation, registered in many
jurisdictions worldwide:
AIX 5L™ AIX 6™ AIX®
DS8000® FlashCopy® HACMP™
Initiate® PartnerWorld® POWER Hypervisor™
Power Systems™ Power® PowerVM®
POWER6® POWER7® Redbooks®
RS/6000® Tivoli®
Windows is a trademark of Microsoft Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Other product and service names might be trademarks of IBM or other companies.
Course description
Power Systems for AIX III: Advanced Administration and Problem
Determination
Duration: 5 days
Purpose
This course provides advanced AIX system administrator skills with a focus
on availability and problem determination. It provides detailed knowledge of
the ODM database, where AIX maintains much of its configuration information.
It shows how to monitor for and deal with AIX problems. There is special
focus on dealing with Logical Volume Manager problems, including
procedures for replacing disks. Several techniques for minimizing the system
maintenance window are covered. While the course includes some AIX 7.1
enhancements, most of the material is applicable to prior releases of AIX.
Audience
This course is an advanced course for AIX system administrators, system
support, and contract support individuals with at least six months of
experience in AIX.
Prerequisites
You should have basic AIX System Administration skills. These skills include:
• Use of the Hardware Management Console (HMC) to activate a logical
partition to run AIX and to access the AIX system console
• Install an AIX operating system from an already configured NIM server
• Implementation of AIX backup and recovery
• Manage additional software and base operating system updates
• Familiarity with management tools such as SMIT
• Understand how to manage file systems, logical volumes, and volume
groups
• Mastery of the UNIX user interface, which includes use of the vi editor,
command execution, input and output redirection, and the use of utilities
such as grep
These skills can be developed through experience or by formal training. The
recommended training course to obtain these prerequisite skills is:
• Power Systems for AIX II: AIX Implementation and Administration AN12
or AX12 and their prerequisites
If the student has AIX system administration skills, but is not familiar with the
LPAR environment, those skills can be obtained by attending the following
course:
• AN11 or AX11 Power Systems Administration I: LPAR Configuration
Objectives
On completion of this course, students should be able to:
• Perform system problem determination and reporting procedures that
include analyzing error logs, creating memory dumps of the system, and
providing needed data to the AIX Support personnel
• Examine and manipulate Object Data Manager databases
• Identify and resolve conflicts between the Logical Volume Manager
(LVM) disk structures and the Object Data Manager (ODM)
• Complete a basic configuration of Network Installation Manager to
provide network boot support for either system installation or booting to
maintenance mode
• Identify various types of boot and disk failures and perform the matching
recovery procedures
• Implement advanced methods such as alternate disk installation,
multibos, and JFS2 snapshots to use a smaller maintenance window
Contents
• Advanced AIX administration overview
• The Object Data Manager
• Error monitoring
• Network Installation Manager basics
• System initialization: Accessing a boot image
• System initialization: rc.boot and inittab
• LVM metadata and related problems
• Disk management procedures
• Install and cloning techniques
• Advanced backup techniques
• Diagnostics
• The AIX system dump facility
Agenda
Day 1
Welcome
Unit 1: Advanced AIX administration overview
Exercise 1: Problem diagnostic information
Unit 2: The Object Data Manager
Exercise 2: The Object Data Manager
(optional) Exercise 2: Object Data Manager, Part 3
Unit 3: Error monitoring
Day 2
Exercise 3: Error monitoring
Unit 4: Network Installation Manager basics
Exercise 4: Basic Network Installation Manager configuration
Unit 5: System initialization: Accessing a boot image
Exercise 5: System initialization: Accessing a boot image
Day 3
Unit 6: System initialization: rc.boot and inittab
Exercise 6: System initialization: rc.boot and inittab
Unit 7: LVM metadata and related problems
Exercise 7: LVM metadata and related problems
(optional) Exercise 7: LVM metadata and related problems, Part 6
Unit 8: Disk management procedures, Topic 1
Exercise 8: Disk management procedures, Part 1
Day 4
Unit 8: Disk management procedures, Topic 2
Exercise 8: Disk management procedures, Parts 2 and 3
Unit 9: Install and cloning techniques, Topic 1
Exercise 9: Install and cloning techniques, Part 1
Unit 9: Install and cloning techniques, Topic 2
Exercise 9: Install and cloning techniques, Part 2
Unit 10: Advanced backup techniques, Topic 1
Exercise 10: Advanced backup techniques, Part 1
(optional) Exercise 10: Advanced backup techniques, Part 2
Exercise 10: Advanced backup techniques, Parts 3 and 4
Day 5
Unit 11: Diagnostics
Exercise 11: Diagnostics
Unit 12: The AIX system dump facility
Exercise 12: The AIX system dump facility
Wrap up / Evaluations
Unit 1. Advanced AIX administration overview
What this unit is about

This unit introduces various AIX administration issues that are related to
problem determination and handling system maintenance and backup
efficiently.
What you should be able to do

After completing this unit, you should be able to:
• List the steps of a basic methodology for problem determination
• List AIX features that help minimize planned downtime or shorten the
maintenance window
• Explain how to find documentation and other key resources that are
needed for problem resolution
How you will check your progress

Accountability:
• Checkpoint questions
• Lab exercise
References
SG24-7910 IBM AIX Version 7.1 Differences Guide (Redbooks)
SG24-5496 Problem Solving and Troubleshooting in AIX 5L (Redbooks)
Unit objectives
IBM Power Systems

• List the steps of a basic methodology for problem
determination
• List AIX features that help minimize planned downtime or
shorten the maintenance window
• Explain how to find documentation and other key resources
that are needed for problem resolution
Figure 1-1. Unit objectives AN153.0
Notes:
Application outages
• Functional or performance
• Avoid unplanned outages with best practices
– Change control
– Data security
– Capacity planning
– High availability design
• Avoid planned outages
– Fail over to a backup server
– Relocate application (LPAR or WPAR mobility)
• Use maintenance windows
– Application that is stopped versus slow activity
– Plan enough time for back-out or recovery
– Minimize time that is needed
• Effective problem determination and recovery
Figure 1-2. Application outages AN153.0
Notes:
Introduction
Providing system availability is a major responsibility of any system administrator. An outage
can be from a functional problem (such as an application or system crash) or a server
performance problem (business is seriously impacted due to poor response times or late jobs).
There are many approaches to dealing with this issue.
Unplanned outages
When most of you think of availability, you think of unplanned outages. Regular hardware and
software maintenance can often avoid these outages. Designing the computing facility to have
redundant components (power, network adapters, network switches, storage, and more) can
make the overall system resilient to the failure of individual components. Performance problems
are often the result of failing to do proper capacity planning, resulting in not enough resources
(memory, processors, network bandwidth, or disk I/O bandwidth) to handle the increased
workload. If there is no change control to manage what work is placed on a system, capacity
planning is even more challenging. Furthermore, uncontrolled changes to a system result in
uncontrolled exposure to possible outages created by those changes, and thus unplanned
outages. Computer viruses and other malicious attacks by computer hackers can also reduce
system availability (in addition to the exposure of losing proprietary information). Good data
security policies are essential.
Even when implementing good policies in these areas, some unplanned outages might still
happen. In these situations, the system administrator needs to have a plan for minimizing the
impact and recovering as quickly as possible. One common approach is to have another
system that can take over the work of the failed system. High Availability Cluster
Multi-Processing (HACMP) provides a system for either concurrent processing by multiple
systems, or an automated fallover to a backup system, thus minimizing the impact of a server
failure. Such server redundancy can be designed to work within a single facility or be divided
between different geographical locations. Obviously, rapid notification of a problem, effective
and prompt diagnosis of the cause, and being able to quickly implement an effective solution all
contribute to a shorter mean time to recovery.
Planned outages
By using change control, the risk that is associated with certain categories of potential
unplanned outages can be managed. The impact of any unexpected problem (resulting from the
change) can be minimized by implementing the changes during planned windows of time. In
addition, there are certain types of changes for which an outage is unavoidable.
Some facilities implement multiple types of maintenance windows. One type would be frequent
short maintenance windows for any administrative work that competes with applications for
resources (performance impact) or has a small chance of causing a functional disruption.
Another type would be a less frequent window in which any reboot of the system or any major
change to the level of the operating system or major subsystems, such as database software,
would be allowed.
Sometimes, the amount of time in a maintenance window is relatively small and the work must
be carefully planned. You also need to allow time to recover if anything goes wrong due to the
maintenance. Pre-staging any needed resources helps expedite the work. Any
approach that can speed recovery after a problem occurs is also useful.
For systems that need to be up 24 hours a day, seven days a week, and every day in the year
(24x7x365), even a short outage cannot be tolerated. In those situations, a method to
non-disruptively move the applications to another system can be invaluable. If an HACMP
cluster solution is already in place to handle unplanned outages, then that can be used to
manually fall over the services to another system while maintenance is being done. Other
solutions are to use Live Partition Mobility or Live Application Mobility.

Maintenance window tasks

• Minimize time that is needed for tasks
– Including time to recover from a failed task
• Operating system maintenance
– Pre-staging of maintenance
– Applying maintenance to alternate rootvg
– Applying maintenance with alternate BLV
– Reboot to use updated alternate
• System backups
– Minimizing rootvg size
– Snapshot techniques for user file systems
Figure 1-3. Maintenance window tasks AN153.0
Notes:
Expediting work in the maintenance window

The quicker maintenance can be completed, the sooner you can boot the system back up and go
home. More importantly, expediting the planned activities allows more time to handle any
problems that might arise.
Operating system maintenance

Ensure that you have on hand whatever materials you need for the job, such as the installation
media. To eliminate the need to handle the media, copy all of the needed filesets to disk storage
before the installation. An NFS or NIM server (provided you have sufficient network bandwidth)
or a software repository on the system that is being updated can be used. If using a software
repository on the system that is being updated, the filesets should be in a file system that is in a
different volume group than the rootvg.
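As a rough sketch of pre-staging, the commands below copy installation images from media into a disk repository and then preview an update from that repository. The device /dev/cd0, the directory /export/lpp_updates, and the run wrapper are assumptions made for illustration. By default the script only prints the commands; set DRY_RUN=0 on an AIX system to execute them.

```shell
#!/bin/sh
# Sketch: pre-stage filesets on disk before the maintenance window.
# MEDIA and REPO are hypothetical values; adjust them for your site.
MEDIA="${MEDIA:-/dev/cd0}"
REPO="${REPO:-/export/lpp_updates}"   # assumed to be a file system outside rootvg
DRY_RUN="${DRY_RUN:-1}"               # 1 = print only, 0 = execute (AIX, as root)

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Copy all installation images from the media to the disk repository.
run bffcreate -d "$MEDIA" -t "$REPO" all
# Rebuild the table of contents (.toc) for the repository directory.
run inutoc "$REPO"
# Preview (-p) what an update from the repository would install.
run install_all_updates -d "$REPO" -p
```

Running the preview ahead of time can surface missing prerequisites before the window opens, when there is still time to obtain them.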
An important technique that can be used is to have alternate storage for the target of the
software update. The updates are not made to the rootvg, but rather to a copy of the rootvg.
There are two advantages. First, no change is being made to the active rootvg. The update can
be done at any time. Then, when a major maintenance window arrives, the system just needs to
be rebooted to make the update take effect. The second advantage is the ease of recovery. If
there are problems with the new level of code, you only need to reboot back to the earlier code level
rather than recover from a mksysb or reject the entire update. The downside is that the system
might need to be rebooted to make the update take effect.
Two techniques can be used. One technique, called multibos, creates an alternate
set of logical volumes that are copies of the rootvg BOS logical volumes. The other technique,
alternate disk installation, creates an alternate volume group that is a clone of the rootvg. In each
case, you would apply the maintenance to the copy and then later reboot to make it effective.
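The two approaches can be sketched with the commands below. This is only an outline under stated assumptions: hdisk1 is a hypothetical spare disk, the run wrapper is an illustrative helper, and in practice you would choose one approach rather than run both. The script prints the commands unless DRY_RUN=0 is set on an AIX system.

```shell
#!/bin/sh
# Sketch: the two "update a copy, then reboot" approaches.
# hdisk1 is a hypothetical spare disk that belongs to no volume group.
TARGET_DISK="${TARGET_DISK:-hdisk1}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print only, 0 = execute (AIX, as root)

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Approach 1: clone rootvg to an alternate disk (creates altinst_rootvg).
run alt_disk_copy -d "$TARGET_DISK"

# Approach 2: multibos. Preview first (-p), then create the standby BOS (-s).
# -X allows file systems to be expanded automatically if space is needed.
run multibos -Xsp
run multibos -Xs

# Display the normal-mode boot list to confirm what the next reboot will use.
run bootlist -m normal -o
```

Both techniques are covered in detail in the unit on install and cloning techniques.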
System backups
Another common maintenance activity is backing up the system. You need to quiesce the
application activity long enough to be sure that there are no inconsistencies in the backup,
unless you have an application that supports fuzzy backups. The term fuzzy backup refers to a
backup in which the application was making changes during the backup. For a specific
transaction, multiple data changes are made. Some of these transaction-related changes are
made before the affected data is backed up, while other changes are made after their data is
backed up. Thus the backup has one piece of data that reflects the transaction and another
piece of data that does not reflect the transaction. The two pieces of data are inconsistent and
such a backup is referred to as fuzzy.
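The inconsistency can be illustrated with a small simulation. The account files and amounts are invented for the example; the script copies two files one at a time while a transfer between them completes in the middle of the copy.

```shell
#!/bin/sh
# Simulated fuzzy backup: a transfer of 100 from account A to account B
# lands between the backup of the first file and the backup of the second.
work=$(mktemp -d)
mkdir "$work/live" "$work/backup"
echo 500 > "$work/live/account_a"
echo 500 > "$work/live/account_b"

# The backup copies the first file...
cp "$work/live/account_a" "$work/backup/account_a"

# ...then the application completes one transaction (move 100 from A to B)...
echo 400 > "$work/live/account_a"
echo 600 > "$work/live/account_b"

# ...and only afterwards does the backup copy the second file.
cp "$work/live/account_b" "$work/backup/account_b"

live_total=$(( $(cat "$work/live/account_a") + $(cat "$work/live/account_b") ))
backup_total=$(( $(cat "$work/backup/account_a") + $(cat "$work/backup/account_b") ))
echo "live total:   $live_total"     # prints 1000
echo "backup total: $backup_total"   # prints 1100
rm -rf "$work"
```

The backup holds the old value of account A and the new value of account B, so the restored data would show 100 that never existed.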
For the rootvg itself, the size of the rootvg should be minimized. It should contain only what is
needed for the OS. All user data and other non-essential files should be backed up and restored
separately. An example would be the standard location of a software repository:
/usr/sys/inst.images. The software repository can be large and yet this common path is
in the /usr file system, which is in the rootvg. Placing the software repository in a separate file
system with its own recovery plan can help reduce backup and recovery time. Another common
example is the /home file system. If users have large amounts of data that is stored in /home,
then mounting a separate file system over /home can speed up working with the rootvg. There are
other file systems, such as /tmp, whose contents might be eliminated from the system
backup. The trick is that these files need to be excluded from the backup during the mksysb
execution (either not mounted or identified in /etc/exclude.rootvg) and then separately
recovered from their own backup. Other user data is in separate user volume groups.
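The following sketch previews the effect of an exclude list. It works on a temporary stand-in for /etc/exclude.rootvg and a made-up file list, and it uses grep -v -f only as an approximation of how the grep-style patterns are applied. The tape device /dev/rmt0 in the final printed command is likewise an assumption.

```shell
#!/bin/sh
# Sketch: preview which paths an exclude list would drop from a mksysb.
work=$(mktemp -d)
exclude="$work/exclude.rootvg"    # stand-in for /etc/exclude.rootvg

# One grep-style pattern per line; paths are written relative to / as ./path.
cat > "$exclude" <<'EOF'
^./tmp/
^./usr/sys/inst.images/
EOF

# A small hypothetical file list in the same ./path form.
cat > "$work/files" <<'EOF'
./etc/hosts
./tmp/scratch.dat
./usr/sys/inst.images/bos.rte
./usr/bin/ls
EOF

# Approximation of the filtering: keep only lines that match no pattern.
kept=$(grep -v -f "$exclude" "$work/files")
echo "$kept"

# On AIX, the backup itself would be run with -e to honor the exclude file
# (printed here, not executed; /dev/rmt0 is a hypothetical tape device).
echo "+ mksysb -e -i /dev/rmt0"
rm -rf "$work"
```

Anchoring each pattern with ^./ keeps it from matching similarly named directories deeper in the tree.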
With the emphasis on separate backups for non-BOS data, there comes a need to minimize
how long the applications need to be quiesced and still have data consistency. One technique
that AIX provides is JFS2 snapshots, which need only a brief quiesce of the application and still
give a consistent picture of the data at a single point in time. Then, you can either use that
snapshot of the data as its own backup, or base an actual backup upon that snapshot (to have
off-site storage of the backup). There are other facilities for doing snapshot captures of data. Some are
part of the storage subsystems and some are part of total storage solutions such as Tivoli
Storage Manager. This course focuses on the facility that is provided with AIX, JFS2 snapshot.
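A minimal outline of an external JFS2 snapshot cycle might look like the following. The file system /data, the snapshot size, the logical volume name /dev/fslv01, and the mount point are all hypothetical; on a real system the snapshot command reports the logical volume it creates. The script prints the commands unless DRY_RUN=0 is set on an AIX system.

```shell
#!/bin/sh
# Sketch: create, use, and remove an external JFS2 snapshot.
FS="${FS:-/data}"                      # hypothetical JFS2 file system
SNAP_SIZE="${SNAP_SIZE:-256M}"         # hypothetical size for the snapshot LV
SNAP_LV="${SNAP_LV:-/dev/fslv01}"      # placeholder for the LV that snapshot creates
SNAP_MNT="${SNAP_MNT:-/mnt/data_snap}" # hypothetical mount point
DRY_RUN="${DRY_RUN:-1}"                # 1 = print only, 0 = execute (AIX, as root)

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create an external snapshot of the file system in a new logical volume.
run snapshot -o snapfrom="$FS" -o size="$SNAP_SIZE"
# List the snapshots that exist for the file system.
run snapshot -q "$FS"
# Mount the snapshot so that its point-in-time contents can be backed up.
run mount -v jfs2 -o snapshot "$SNAP_LV" "$SNAP_MNT"
# ... back up the files under $SNAP_MNT with your usual backup tool ...
run umount "$SNAP_MNT"
# Delete the snapshot when it is no longer needed.
run snapshot -d "$SNAP_LV"
```

The application only needs to be quiesced while the first command runs; the backup can then read from the mounted snapshot while the application continues.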

Effective problem management

• Keep system documentation current
• Keep maintenance up to date
• Use a problem determination methodology
• If an AIX bug:
– Collect problem information
– Open problem report with AIX Support
– Provide snap with information
Figure 1-4. Effective problem management AN153.0
Notes:
Obtaining and documenting information about your system

It is a good idea, whenever you approach a new system, to learn as much as you can about that
system. It is also critical to document not only the physical resources and the devices, but also
how the system was configured (network, LVM, and more). Then, this information is ready when
needed.
Later in the course, some ways are suggested to collect system information.
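As one possible starting point, a short script can capture the output of common configuration commands into a dated report. The choice of commands, the output directory /tmp/control_book, and the collect helper are assumptions for illustration; any command that is missing on the system is noted and skipped.

```shell
#!/bin/sh
# Sketch: collect basic system documentation into one dated text file.
outdir="${CONTROL_BOOK_DIR:-/tmp/control_book}"   # hypothetical location
mkdir -p "$outdir" || exit 1
report="$outdir/$(uname -n)_$(date +%Y%m%d).txt"

collect() {
    # $1 is the command name; note it and move on if the system lacks it.
    echo "===== $* =====" >> "$report"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >> "$report" 2>&1
    else
        echo "(command not available on this system)" >> "$report"
    fi
}

: > "$report"          # start with an empty report
collect uname -a       # system name and operating system
collect oslevel -s     # AIX level, including service pack
collect prtconf        # machine model, processors, memory
collect lsdev -C       # configured devices
collect lspv           # physical volumes
collect lsvg -o        # active volume groups
collect lsfs           # file systems
collect netstat -rn    # routing table
collect lslpp -L       # installed software
echo "Wrote $report"
```

Keeping a copy of the report off the system makes it usable when the system itself cannot be accessed.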
System maintenance
Sometimes code works well under normal testing or production circumstances, but can have
some poor logic that is discovered when faced with an unanticipated situation. Alternatively, it
might be some non-central aspect of the code that is not noticed normally. However, if the
number of facilities that use this code is large enough, then it is probable that one of the facilities
will detect and report the problem soon after release of the new code level. The fix for the code
defect usually comes out in the next released fix pack. Many facilities might not be affected or
concerned about the code defect problem for months until the circumstances arise in which it
represents a problem. By installing newer service packs, a facility can benefit from the
experience of others and avoid known problems.
It is possible that a new fix pack introduces new problems, while solving many old problems.
This course covers some techniques to use in applying fix packs.
Problem determination
If you find yourself impacted by what you believe to be a product defect, you need to obtain
prompt resolution. There is no substitute for the experience of being able to recognize a
situation and remember the details of how you dealt with it the last time a similar problem
occurred. However, many problems are most effectively solved by following a developed
problem determination methodology. This course covers a basic problem determination
methodology.
Problem reporting
When you find yourself impacted by what you believe to be a product defect, you need to
contact AIX Support. Before contacting AIX Support, you should write up a description of the
problem and the surrounding circumstances. When you open a new Problem Management
Report (PMR) with AIX Support, you are expected to provide them with information to assist
them in determining the cause of the problem. The snap command is a common tool to help
collect a large amount of information about the environment. The course materials cover
procedures to report problems.

Before problems occur

• Effective problem determination starts with a good
understanding of the system and its components.
• The more information that you have about the normal
operation of a system, the better:
– System configuration
– Operating system level
– Applications installed
– Baseline performance
– Installation, configuration, and service manuals
[Graphic label: System documentation]
Figure 1-5. Before problems occur AN153.0
Notes:
Obtaining and documenting information about your system

It is a good idea whenever you approach a new system to learn as much as you can about that
system.
It is also critical to document both logical and physical device information so that it is available
when troubleshooting is necessary.
Information that should be documented

It is a good idea to maintain what is commonly referred to as a control book. A control book is
a collection of information that describes various aspects of your system. Having this
information is especially important when the problem prevents you from accessing the system.
Examples of important items that should be determined and recorded include:
- Machine architecture (model, CPU type)
- Physical volumes (type and size of disks)
- Volume groups (names, just a bunch of disks (JBOD) or redundant array of independent
disks (RAID))
- Logical volumes (mirrored or not, which volume group, type)
- File systems (which volume group, what applications)
- Memory (size) and paging spaces (how many, location)

Before problems occur: A few good commands

• lspv      Lists physical volumes, PVID, VG membership
• lscfg     Provides information about system components
• prtconf   Displays system configuration information
• lsvg      Lists the volume groups
• lsps      Displays information about paging spaces
• lsfs      Gives file system information
• lsdev     Provides device information
• getconf   Displays values of system configuration variables
• bootinfo  Displays system configuration information (unsupported)
• snap      Collects system data
Figure 1-6. Before problems occur: A few good commands AN153.0
Notes:
A list of useful commands

The list of commands on the visual provides a starting point for use in gathering key information
about your system.
Sources of additional information

Be sure to check the man pages or the AIX Commands Reference for correct syntax and option
flags to be used with these commands to provide more specific information. There is no man
page or entry in the AIX Commands Reference for the bootinfo command.
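As an illustration, the commands on the visual can be combined into a small script that saves a baseline for the control book. The following is only a sketch: the function name collect_baseline, the output directory layout, and the choice of flags (lsps -a, lsdev -C, lscfg -vp, oslevel -s) are assumptions made for this example, so confirm each flag against the man pages for your AIX level.

```shell
#!/bin/sh
# Sketch: save the output of several inventory commands into one directory.
# collect_baseline and the directory layout are hypothetical choices.

collect_baseline() {
    outdir=${1:-/tmp/baseline.$(date +%Y%m%d)}
    mkdir -p "$outdir" || return 1

    # One "label|command" pair per line.
    while IFS='|' read -r label cmd; do
        [ -n "$label" ] || continue
        set -- $cmd
        # Record a note instead of failing when a command is not installed
        # (for example, when the sketch is tried on a non-AIX host).
        if command -v "$1" >/dev/null 2>&1; then
            $cmd > "$outdir/$label.out" 2>&1 </dev/null || true
        else
            echo "skipped: $1 not found" > "$outdir/$label.out"
        fi
    done <<EOF
lspv|lspv
lsvg|lsvg
lsps|lsps -a
lsfs|lsfs
lsdev|lsdev -C
lscfg|lscfg -vp
prtconf|prtconf
oslevel|oslevel -s
EOF
    echo "$outdir"
}
```

Running collect_baseline /var/adm/controlbook/today (a hypothetical path) writes one .out file per command. Keeping dated copies makes it possible to compare the configuration before and after a problem appears.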
Steps in problem resolution
1. Identify the problem
2. Talk to users to define the problem
3. Collect system data
4. Resolve the problem
Figure 1-7. Steps in problem resolution AN153.0
Notes:
The start-to-finish method

The start-to-finish method for resolving problems consists primarily of the following four major
components:
- Identify the problem
- Talk to users to define the problem
- Collect system data
- Resolve (fix) the problem
Step 1: Identify the problem

The first step in problem resolution is to find out what the problem is. It is important to
understand exactly what the users of the system perceive the problem to be.
A clear description of the problem typically gives clues as to the cause of the problem and aids
in the choice of troubleshooting methods to apply.

Step 2: Gathering more detail

Anyone who uses the system might identify a problem. If a problem is reported, gather
information from users to get more details to develop a clear picture of what happened.
The users might be data entry staff, programmers, system administrators, technical support
personnel, management, application developers, operations staff, network users, and so forth.
Suggested questions
- What is the problem?
- What is the system doing (or not doing)?
- How did you first notice the problem?
- When did it happen?
- Have any changes been made recently?
Keep them talking until the picture is clear. Ask many questions to be able to get the entire
history of the problem.
Step 3: Collect system data

Some information about the problem might be found from the users during the process of
defining the problem.
By using various commands, such as lsdev, lspv, lsvg, lslpp, lsattr, and others, you
can gather further information about the system configuration.
Gather other relevant information by using available error reporting facilities, determining the
state of the operating system, checking for the existence of a system dump, and inspecting the
various available log files.
- How is the machine configured?
- What errors are being produced?
- What is the state of the OS?
- Is there a system dump?
- What log files exist?
SMIT logs
If SMIT was used, extra logs might provide further information. The SMIT log files are normally
contained in the home directory of the root user. One is named smit.log, by default.
Step 4: Resolve the problem

After all the information is gathered, determine the procedures necessary to solve the problem.
Keep a log of all actions you take in trying to determine the cause of the problem, and any actions
you take to correct the problem.
- Use the information that you gathered
- Keep a log of actions taken to correct the problem
- Use the tools available: command documentation, downloadable fixes, and updates
- Contact IBM Support, if necessary
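One simple way to keep the action log described above is a plain-text file with one timestamped line per action. The sketch below assumes a hypothetical helper named log_action and a hypothetical log location; neither is part of AIX.

```shell
# Sketch: append timestamped entries to a plain-text action log.
# ACTION_LOG and log_action are hypothetical names chosen for this example.
ACTION_LOG=${ACTION_LOG:-/tmp/problem_actions.log}

log_action() {
    # All arguments form the free-form description of what was done or seen.
    printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(id -un)" "$*" >> "$ACTION_LOG"
}
```

For example, log_action "ran lsvg -o; rootvg is varied on" records who did what and when, which is useful later when the history has to be given to IBM Support.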
Resources for problem solving

Various resources, such as the documentation for individual commands, are available to assist
you in solving problems with AIX systems.
The IBM Knowledge Center is a website that serves as a central place for information on
POWER servers and AIX. A message database is available to search on error numbers, error
identifiers, and display codes (LED values). The website also contains FAQs, how-to’s and
more.
IBM Knowledge Center

The IBM Knowledge Center can be found at the following link:
http://ibm.com/support/knowledgecenter

Progress and reference codes
• Progress codes
– Checkpoint during a process such as boot, shutdown, or dump
• System reference codes (SRCs)
– Error codes for problems in hardware, firmware, or operating system
• Service request numbers (SRNs)
– Indicate the detecting component and the error condition detected
• Obtained from:
– Front panel of system enclosure
– HMC or IVM (for logically partitioned systems)
– Operator console message or diagnostics (diag utility)
Figure 1-8. Progress and reference codes AN153.0
Notes:
Introduction
AIX provides progress and error indicators (display codes) during the boot process. These
display codes can be useful in resolving startup problems. Depending on the hardware platform,
the codes are displayed on the console and the operator panel.
Operator panel
For non-LPAR systems, the operator panel is an LED display on the front panel. Beginning with
the early POWER4 models, the Power Systems can be divided into multiple Logical Partitions
(LPARs). In this case, a system-wide LED display still exists on the front panel. However, the
operator panel for each LPAR is displayed on the screen of the Hardware Management Console
(HMC). The HMC is a separate system that is required when running multiple LPARs.
Regardless of where the codes are displayed, they are sometimes referred to as LED Display Codes.
Progress codes and other reference codes

Reference codes can have various sources:
- Diagnostics:
Diagnostics or error log analysis can provide Service Request Numbers (SRNs) which can
be used to determine the source of a hardware or operating system problem.
- Hardware initialization:
System firmware sends boot status codes (called firmware checkpoints) to the operator
panel. When the console is initialized, the firmware can also send 8-digit error codes to the
console.
- AIX initialization:
The rc.boot script and the device configuration methods send progress and error codes to
the operator panel.
Codes from the hardware/firmware or from AIX initialization scripts fall into two categories:
- Progress Codes indicate the stages in the initial program load (IPL) or boot sequence. They
do not necessarily indicate a problem unless the sequence permanently stops on a single
code or a rotating sequence of codes.
- System Reference Codes (SRC) indicate that a problem originated in hardware, Licensed
Internal Code (firmware), or in the operating system.

Reference codes at the IBM Knowledge Center
Figure 1-9. Reference codes at the IBM Knowledge Center AN153.0
Notes:
Documentation
Note
All information on websites and their design is based on what is available at the time of this
course revision. Website URLs and the design of the related web pages often change.
Reference codes and their meanings can be found at http://ibm.com/support/knowledgecenter
under the particular server with which you are working (though most codes are the same,
regardless of the server model).
Working with AIX Support
• Have the needed information ready:
– Name, phone #, customer #
– Machine type, model, and serial #
– AIX version, release, technology level (TL), and service pack (SP)
– Problem description, including error codes
– Severity level: Critical, significant impact, some impact, minimal
• 1-800-IBM-SERV (1-800-426-7378)
• Level 1 collects information and assigns a PMR number
• Problem is routed to the Level 2 team responsible for the product
• You might be asked to collect additional information to upload
• You might be asked to update to a specific TL or SP
– APAR for your problem already addressed
– Need to have a standard environment for them to investigate
Figure 1-10. Working with AIX Support AN153.0
Notes:
If you believe that your problem is the result of a system defect, you can call AIX Support to
request assistance. Before you call 1-800-IBM-SERV, it is a good idea to have certain
information ready. They want to verify your name against a list of names that are associated
with your customer number, and validate that your customer number has support for the product
in question. They also need to know some details about the hardware and software
environment in which the problem is occurring. The details might include your MTMS (machine
type, model, serial), your AIX OS level, and the level of any other relevant software. You need to
explain your problem, providing as much detail as possible, especially any error messages or
codes.
The Level 1 Support personnel need to identify the priority of your problem.
- Severity level 1 (critical) indicates that the function does not work, your business is severely
impacted, there is no work-around, and that there needs to be an immediate solution. For
severity level 1, you are expected to be available 24x7 until the problem is resolved.
- Severity level 2 (significant impact) indicates that the function is usable but is limited in a
way that your business is severely impacted.

- Severity level 3 (some impact) indicates that the program is usable with less significant
features (not critical to operations) unavailable.
- Severity level 4 (minimal impact) indicates that the problem causes little impact on
operations, or a reasonable circumvention to the problem was implemented.
Level 1 Support assigns you a PMR number (a PMR and branch number combination) for
tracking purposes. In the future, each time you call about this problem, you should have the
PMR and branch numbers at hand.
When the basic information is collected, you are passed to Level 2 Support for the product area
for which you are having a problem. They work with you in investigating the nature and cause of
your problem. They search the support database to see whether it is a known problem that is
either already being worked on or has a solution that is already developed. In many cases, they
request that you update to a specific technology level (TL) and service pack (SP) that already
includes the fix.
If they do not have a fix, you might be asked to update your system and determine whether the
problem still exists. If the problem still exists, they now have a known software environment to
work with. They often ask for a complete set of information from your system to be collected and
uploaded to their server to support their investigation. The basic tool for collecting your system
information is the snap command.
AIX Support test case data (1 of 2)
Run the following (or very similar) commands to gather snap information:
# snap -a
Copy any extra data to the /tmp/ibmsupt/testcase or the /tmp/ibmsupt/other directory.
# snap -c
This step creates /tmp/ibmsupt/snap.pax.Z.
# cd /tmp/ibmsupt
# mv snap.pax.Z PMR#.b<branch#>.c<country#>.snap.pax.Z
Figure 1-11. AIX Support test case data (1 of 2) AN153.0
Notes:
Overview of the snap command

The snap command is used to gather system configuration information useful in identifying and
resolving system problems.
The snap command can also be used to compress the snap information that is gathered into a
pax file. The file can then be written to a device such as tape or DVD, or transmitted to a remote
system.
Refer to the man page for snap or the corresponding entry in the AIX Commands Reference
manual for detailed information about the snap command and its various flags.
Command sequence that is shown on the visual

As illustrated on the visual, the -a flag of the snap command should be used to gather all
system configuration information that can be gathered by using snap. The output of this
command is written to the /tmp/ibmsupt directory.

Next, you should place any additional testcase data that you feel might be helpful in resolving
the problem into either the /tmp/ibmsupt/other or /tmp/ibmsupt/testcase directory.
This additional information is then included (together with the information gathered directly by
snap) into the compressed pax file that is created in the next step in this command sequence.
As shown, the -c flag of the snap command should then be used to create a compressed pax
file that contains all files that are contained in the /tmp/ibmsupt directory. The output file is
/tmp/ibmsupt/snap.pax.Z.
Next, the /tmp/ibmsupt/snap.pax.Z output file should be renamed by using the mv
command to indicate the PMR number, branch number, and country number that is associated
with the data in the file. For example, if the PMR number is 12345, the branch number is 567,
and the country number is 890, the file should be renamed 12345.b567.c890.snap.pax.Z.
(The country code for the United States is: 000).
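The renaming rule can be sketched as a small shell helper. The function names snap_name and package_snap are hypothetical; only the snap -c flag, the /tmp/ibmsupt/snap.pax.Z path, and the naming pattern come from the visual. The package_snap function is meant to be run on AIX as root, after snap -a has completed and any extra data has been copied.

```shell
# Sketch: build the upload file name from the PMR, branch, and country numbers.
# snap_name and package_snap are hypothetical helper names.
snap_name() {
    # usage: snap_name <pmr> <branch> <country>
    [ $# -eq 3 ] || { echo "usage: snap_name pmr branch country" >&2; return 1; }
    printf '%s.b%s.c%s.snap.pax.Z\n' "$1" "$2" "$3"
}

package_snap() {
    # usage: package_snap <pmr> <branch> <country>
    target=$(snap_name "$@") || return 1
    snap -c || return 1                       # creates /tmp/ibmsupt/snap.pax.Z
    mv /tmp/ibmsupt/snap.pax.Z "/tmp/ibmsupt/$target"
}
```

With the numbers from the example above, snap_name 12345 567 890 prints 12345.b567.c890.snap.pax.Z.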
AIX Support test case data (2 of 2)
Upload the information that has been captured:
# ftp testcase.software.ibm.com
User: anonymous
Password: <your email address>
ftp> cd /toibm/aix
ftp> bin
ftp> put PMR#.b<branch#>.c<country#>.snap.pax.Z
ftp> quit
Figure 1-12. AIX Support test case data (2 of 2) AN153.0
Notes:
Uploading data to AIX Support

AIX Support provides an anonymous FTP server for receiving your testcase data. The host
name for that server is: testcase.software.ibm.com.
When you log in to the server, change directory to /toibm/aix.
Be sure to transfer the file as binary to avoid an undesirable attempt by FTP to convert the
contents of the file.
Then, just put your file on the server and notify your support contact that the data is there.
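If the upload is repeated often, the FTP dialog shown on the visual can be generated rather than typed. The following sketch only prints the command sequence; the helper name ftp_upload_commands is hypothetical, and piping its output to ftp -n (which turns off automatic login) is one possible way to use it, not a procedure prescribed by AIX Support.

```shell
# Sketch: print the FTP command sequence from the visual for a given file.
# ftp_upload_commands is a hypothetical helper; it does not contact any server.
ftp_upload_commands() {
    # usage: ftp_upload_commands <email-address> <file>
    cat <<EOF
user anonymous $1
cd /toibm/aix
binary
put $2
quit
EOF
}

# Possible use:
#   ftp_upload_commands me@example.com 12345.b567.c890.snap.pax.Z |
#       ftp -n testcase.software.ibm.com
```

The binary line corresponds to the bin command on the visual and keeps FTP from converting the contents of the compressed file.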

AIX software update hierarchy
• Version and release (oslevel)
– Requires new license and migration installation
• Fileset updates (lslpp -L shows mod and fix levels)
– Collected changes to files in a fileset
– Related to APARs and PTFs
– Only need to apply the new fileset
• Fix bundles
– Collections of fileset updates
• Technology level and maintenance level (oslevel -r)
– Fix bundle of enhancements and fixes
• Service packs (oslevel -s)
– Fix bundle of important fixes
• Interim fixes
– Special situation code replacements
– Used when the delay for normal PTF packaging is too long
– Managed with the interim fix (efix) manager, the emgr command
Figure 1-13. AIX software update hierarchy AN153.0
Notes:
Version, release, mod, and fix

The oslevel command, by default, displays the version and release of the AIX operating
system. Changing version and release requires a new license and a disruption to the system
(such as rebooting to installation and maintenance to do a migration installation). The mod and
fix levels in the oslevel -s output are normally displayed as zeros. The mod level that is
displayed in the oslevel output should reflect the technology level.
The mod and fix levels are used to reflect changes to the many individual filesets that make up
the operating system. You can see the mod and fix levels in the output of the lslpp -L
command. These changes require the administrator to install a Program Temporary Fix (PTF) in
the form of a fix fileset. A fix fileset can resolve one or more problems or APARs (Authorized
Program Analysis Reports).
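To make the four-part level concrete, the sketch below splits a fileset level of the form version.release.mod.fix, which is the form that lslpp -L reports. The helper name split_level is hypothetical and the level 7.1.3.15 is sample data, not output taken from a real system.

```shell
# Sketch: split a fileset level of the form version.release.mod.fix.
# split_level is a hypothetical helper name.
split_level() {
    # usage: split_level <level>, for example: split_level 7.1.3.15
    echo "$1" | awk -F. '
        NF == 4 { printf "version=%s release=%s mod=%s fix=%s\n", $1, $2, $3, $4; ok = 1 }
        END     { exit (ok ? 0 : 1) }'
}
```

For the sample level, split_level 7.1.3.15 prints version=7 release=1 mod=3 fix=15. A string without exactly four parts produces no output and a nonzero return code.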
Fix bundles
It is useful to collect many accumulated PTFs together and test them together. Then, they can
be used as a base line for a new cycle of enhancements and corrections. By testing them
together, it is often possible to find unexpected interactions between them.
There are two types of AIX fix bundles.
- One type of fix bundle is a Technology Level (TL) update (formerly known as Maintenance
Level or ML). A TL is a major fix bundle that not only includes many fixes for code problems,
but also includes minor functional enhancements. You can identify the current AIX
technology level by running the oslevel -r command.
- Another type of bundling is a Service Pack (SP). A Service Pack is released more frequently
than a Technology Level (between TL releases) and usually contains only the needed fixes.
You can identify the current AIX technology level and service pack by running the
oslevel -s command.
For the oslevel command to reflect a new TL or SP, all related fileset fixes must be installed.
If a single fileset update in the fix bundle is not installed, the TL or SP level is not changed.
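The output of oslevel -s is commonly a single string of four fields separated by hyphens, for example 7100-03-04-1441 (base level, technology level, service pack, and a build date). That layout and the sample value are assumptions for this sketch, as is the helper name parse_oslevel; check the output on your own system before relying on it.

```shell
# Sketch: split an "oslevel -s" style string into its fields.
# parse_oslevel is a hypothetical helper; the field layout is an assumption.
parse_oslevel() {
    # usage: parse_oslevel "$(oslevel -s)"
    old_ifs=$IFS
    IFS=-
    set -- $1
    IFS=$old_ifs
    [ $# -eq 4 ] || return 1
    echo "base=$1 tl=$2 sp=$3 build=$4"
}
```

For the sample string, parse_oslevel 7100-03-04-1441 prints base=7100 tl=03 sp=04 build=1441. Comparing the tl and sp fields before and after an update is a quick check that every fileset in the bundle was installed.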
Interim fixes
On rare occasions, a customer has an urgent situation that needs fixes for a problem so quickly
that they cannot wait for the formal PTF to be released. In those situations, a developer might
place one or more individual file replacements on an FTP server and allow the system
administrator to download and install them. Originally, it would involve manually copying the
new files over the old files. But this created problems, especially in identifying the state of a
system that later experienced other (possibly related) problems or in backing out the changes.
Today, there is a better methodology for managing these interim fixes, which uses the interim fix
(efix) manager, the emgr command. Security alerts often provide interim fixes for the identified
security exposure. Depending upon your own risk analysis, you might immediately use the interim
fix, or wait for the next service pack (which will include these security fixes).
The syntax and use of the interim fix management command were covered in the prerequisite course.

Relevant documentation
• IBM Systems Product Information entry page:

• IBM Redbooks home http://www.redbooks.ibm.com

Figure 1-14. Relevant documentation AN153.0
Notes:
IBM Knowledge Center

Most software and hardware documentation for AIX and POWER-based systems can be
accessed online from the IBM Knowledge Center: http://ibm.com/support/knowledgecenter.
The information from the IBM Knowledge Center is available both in online form and as
downloadable PDF files.
IBM Redbooks
Redbooks can be viewed, downloaded, or ordered from the IBM Redbooks website:
http://www.redbooks.ibm.com
Checkpoint
1. What are the four major problem determination steps?
2. Who should provide information about system problems?
3. True or False: If there is a problem with the software, it is
necessary to get the next release of the product to resolve
the problem.
4. True or False: Documentation can be viewed or downloaded
from the IBM website.
Figure 1-15. Checkpoint AN153.0
Notes:

Exercise: Problem diagnostic information
• Obtain configuration information about your system
• Use the IBM Knowledge Center to find
reference code information
• Create, compress, and rename a snap file
for upload to AIX Support
Figure 1-16. Exercise: Problem diagnostic information AN153.0
Notes:
Unit summary
Having completed this unit, you should be able to:

• List the steps of a basic methodology for problem
determination
• List AIX features that help minimize planned downtime or
shorten the maintenance window
• Explain how to find documentation and other key resources
that are needed for problem resolution
Figure 1-17. Unit summary AN153.0
Notes:

Unit 2. The Object Data Manager

This unit describes the structure of the Object Data Manager (ODM). It
shows the use of the ODM command-line interface and explains the role of
the ODM in device configuration. Specific information about the function and
content of the most important ODM files is also presented.

• Describe the structure of the ODM
• Use the ODM command-line interface
• Explain the role of the ODM in device configuration
• Describe the function of the most important ODM files

Accountability:
• Lab exercise
References
Online AIX Version 7.1 Command Reference volumes 1-6
Online AIX Version 7.1 General Programming Concepts: Writing
and Debugging Programs
Online AIX Version 7.1 Technical Reference: Kernel and
Subsystems
Note: References listed as online are available through the IBM Knowledge
Center at the following address: http://ibm.com/support/knowledgecenter.
Unit objectives

Notes:
Importance of this unit

The ODM is an important component of AIX and is one major feature that distinguishes AIX
from other UNIX systems. This unit describes the structure of the ODM and explains how you
can work with ODM files by using the ODM command-line interface.
It is also important that you, as an AIX system administrator, understand the role of the ODM
during device configuration. Thus, explaining the role of the ODM in this process is another
major objective of this unit.

2.1. Introduction to the ODM
What is the ODM?
• The Object Data Manager (ODM) is a database that is
intended for storing system information
• Physical and logical device information is stored and
maintained by using objects with associated
characteristics
Figure 2-2. What is the ODM? AN153.0
Notes:

Data managed by the ODM
[Figure: the ODM in the center, surrounded by the data that it manages: Devices, Software,
System Resource Controller, SMIT menus and panels, TCP/IP configuration, Error Log and Dump, NIM]
Figure 2-3. Data managed by the ODM AN153.0
Notes:
System data that is managed by ODM

The ODM manages the following system data:
- *Device configuration data
- *Software Vital Product Data (SWVPD)
- System Resource Controller (SRC) data
- TCP/IP configuration data
- Error log and dump information
- NIM (Network Installation Manager) information
- SMIT menus and commands
Emphasis in this unit

The main emphasis in this unit is on the use of the ODM to store and manage information for
devices and software products (software vital product data). During the course, many other
ODM classes are described.
ODM components
uniquetype        attribute   deflt    values
tape/scsi/scsd    block_size  none     0-2147483648,1
disk/scsi/osdisk  pvid        none
tty/rs232/tty     login       disable  enable, disable, ...
Figure 2-4. ODM components AN153.0
Notes:
Completing the drawing on the visual

The drawing on the visual identifies the basic components of ODM, but some terms were
intentionally omitted from the drawing. Your instructor completes this drawing during the lecture.
Complete your own copy of the drawing by writing in the terms that are supplied by your
instructor.
ODM data format

For security reasons, the ODM data is stored in binary format. To work with ODM files, you must
use the ODM command-line interface. It is not possible to update ODM files with an editor.

ODM database files
Predefined device information            PdDv, PdAt, PdCn
Customized device information            CuDv, CuAt, CuDep, CuDvDr, CuVPD, Config_Rules
Software vital product data              history, inventory, lpp, product
SMIT menus                               sm_menu_opt, sm_name_hdr, sm_cmd_hdr, sm_cmd_opt
Error log, alog, and dump information    SWservAt
System resource controller               SRCsubsys, SRCsubsvr, ...
Network Installation Manager (NIM)       nim_attr, nim_object, nim_pdattr
Figure 2-5. ODM database files AN153.0
Notes:
Major ODM files

The table in the visual summarizes the major ODM files in AIX. As you can see, the files that are
listed in this table are placed into several different categories.
Current focus
This unit concentrates on ODM classes that are used to store device information and software
product data. This section focuses on ODM classes that store device information.
Predefined and customized device information

The first two rows in the table on the visual indicate that some ODM classes contain predefined
device information and that others contain customized device information. What is the
difference between these two types of information?
Predefined device information describes all supported devices. Customized device information
describes all devices that are defined on the system.
It is important that you understand the difference between these two information classifications.
The classes are described in more detail in the next topic of this unit.

Device configuration summary
Predefined databases: PdDv, PdCn, PdAt
Configuration manager (cfgmgr): Config_Rules
Customized databases: CuDep, CuDv, CuAt, CuDvDr, CuVPD
Figure 2-6. Device configuration summary AN153.0
Notes:
ODM classes that are used during device configuration

The visual shows the ODM object classes that are used during the configuration of a device.
Roles of cfgmgr and the Config_Rules ODM object class

When an AIX system boots, the Configuration Manager (cfgmgr) is responsible for configuring
devices. One ODM object class that the cfgmgr uses to determine the correct sequence when
configuring devices is Config_Rules. This ODM object class also contains information about
various method files that are used for device management.
Configuration manager
Predefined ("Plug and Play"): PdDv, PdAt, PdCn
Config_Rules: read by cfgmgr
Customized: CuDv, CuAt, CuDep, CuDvDr, CuVPD
Methods: Define, Configure, Change, Unconfigure, Undefine
Device driver: Load, Unload
Figure 2-7. Configuration manager AN153.0
Notes:
Importance of Config_Rules object class

Although cfgmgr gets credit for managing devices (adding, deleting, changing, and so forth), it
is the programs, called methods, which are defined in the predefined devices object class that
do the actual work. The Config_Rules object class defines the order in which cfgmgr
examines various buses to look for attached devices to configure.

Location and contents of ODM repositories
/etc/objrepos: CuDv, CuAt, CuDep, CuDvDr, CuVPD, Config_Rules, history, inventory, lpp,
product, nim_*, SWservAt, SRC*
/usr/lib/objrepos (constant for machines of same architecture): PdDv, PdAt, PdCn, history,
inventory, lpp, product, sm_*
/usr/share/lib/objrepos (constant for all machines): history, inventory, lpp, product
Figure 2-8. Location and contents of ODM repositories AN153.0
Notes:
Introduction
Originally, the three parts of the ODM were designed to support diskless, dataless, and other
workstations. The ODM object classes are held in three repositories. Each of these repositories
is described in the material that follows.
/etc/objrepos
The purpose of this location is to hold information that is expected to vary from machine to
machine. It contains the part of the product that cannot be shared among machines. Each client
must have its own copy. Most of this software is associated with the configuration of the
machine or product, so each machine requires its own copy.
One example is the customized device information. For example, the location of a device or the
overrides to the default attributes can be expected to vary.
This repository contains the customized devices object classes and the four object classes that
are used by the Software Vital Product Database (SWVPD) for the / (root) part of the installable
software product. The root part of the software contains files that must be installed on the target
system. For example, any configuration files that are used by the programs would be in the root
part.
To access information in the other directories, this directory contains symbolic links to the
predefined devices object classes. The links are needed because the ODMDIR variable points to
only /etc/objrepos.
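The link arrangement can be checked from the shell. The sketch below reports in which repositories an object class file is found and whether the entry is a symbolic link. The helper name find_odm_class and the ODM_ROOT variable are hypothetical; ODM_ROOT exists only so that the sketch can be tried against a scratch directory instead of the live repositories.

```shell
# Sketch: show where an ODM object class file lives and whether the entry
# is a symbolic link. find_odm_class and ODM_ROOT are hypothetical names.
find_odm_class() {
    # usage: find_odm_class <class>, for example: find_odm_class PdDv
    class=$1
    root=${ODM_ROOT:-}
    for repo in /etc/objrepos /usr/lib/objrepos /usr/share/lib/objrepos; do
        path=$root$repo/$class
        if [ -L "$path" ]; then
            echo "$repo/$class: symbolic link"
        elif [ -f "$path" ]; then
            echo "$repo/$class: file"
        fi
    done
}
```

Given the description above, a predefined class such as PdDv would be expected to appear as a symbolic link in /etc/objrepos and as a file in /usr/lib/objrepos, while a customized class such as CuDv would appear only as a file in /etc/objrepos.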
/usr/lib/objrepos
This repository contains the predefined devices object classes, SMIT menu object classes, and
the four object classes that are used by the SWVPD for the /usr part of the installable software
product. The object classes in this repository can be shared across the network by /usr clients,
dataless and diskless workstations. Software that is installed in the /usr part can be shared
among several machines with compatible hardware architectures.
/usr/share/lib/objrepos
Contains the four object classes that are used by the SWVPD for the /usr/share part of the
installable software product. The /usr/share part of a software product contains files that are not
hardware-dependent. They can be shared among several machines, even if the machines have
a different hardware architecture. An example is terminfo files that describe terminal
capabilities. As terminfo is used on many UNIX systems, terminfo files are part of the
/usr/share part of a system product.
lslpp options
The lslpp command can list the software that is recorded in the ODM. When run with the -l
(lowercase L) flag, it lists each of the locations (/, /usr/lib, /usr/share/lib) where it finds the fileset.
If you are not concerned with these distinctions, it can be distracting. Alternatively, you can run
lslpp -L, which reports each fileset one time, without distinguishing between the root, usr, and
share portions.
When should you be concerned about private versus shared ODM repositories?
Most of you do not deal with diskless and dataless servers, so the distinctions for the device
objects generally do not concern you; every machine has all three repositories local for those
objects.
If you are working with Workload Partitions (WPARs), then the different software object class
repositories are a major concern. A WPAR would have the private or root portion of the software
in its private file systems. The other object repositories for the software would be maintained in
global environment file systems, which would be shared among all WPARs, with read-only
mounts. For details, attend the course that teaches AIX Workload Partitions.

How ODM classes act together

# cfgmgr
PdDv:
type = "14106902"
class = "adapter"
subclass = "pci"
prefix = "ent"
...
DvDr = "pci/goentdd"
Define = "/usr/lib/methods/define_rspc"
Configure = "/usr/lib/methods/cfggoent"
...
uniquetype = "adapter/pci/14106902"
CuDv:
name = "ent1"
status = 1
chgstatus = 2
ddins = "pci/goentdd"
location = "02-08"
parent = "pci2"
connwhere = "8"
PdDvLn = "adapter/pci/14106902"
# chdev -l ent1 -a jumbo_frames=yes
PdAt:
uniquetype = "adapter/pci/14106902"
attribute = "jumbo_frames"
deflt = "no"
values = "yes,no"
...
CuAt:
name = "ent1"
attribute = "jumbo_frames"
value = "yes"
type = "R"
...
Figure 2-9. How ODM classes act together AN153.0
Notes:
Interaction of ODM classes

The visual and the notes summarize how ODM classes act together.
- For a particular device to be defined in AIX, the device type must be defined in ODM class
PdDv.
- A device is defined either by cfgmgr (if the device is detectable) or by the mkdev
command. Both commands use the define method to generate an instance in ODM class
CuDv. The configure method is used to load a specific device driver and to generate an
entry in the /dev directory.
Notice the link PdDvLn from CuDv back to PdDv.
- Default attribute values are stored only in PdAt. In the example of a gigabit Ethernet adapter,
the default for jumbo_frames is no, so jumbo frames are not used unless you change the
attribute. If you change an attribute, for example, jumbo_frames to yes, an object with the
nondefault value is created in CuAt.
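The lookup order described in the last bullet can be sketched in shell. This sketch works on plain-text copies of the two stanzas from the visual, not on a live ODM; the helper names stanza_field and effective_value and the sample file names are assumptions made for illustration, not AIX commands. On an AIX system, lsattr -E -l ent1 -a jumbo_frames reports the effective value directly.

```shell
#!/bin/sh
# Sample data copied from the visual (stanza text, not a live ODM).
tmpdir=${TMPDIR:-/tmp}/odm_sketch.$$
mkdir -p "$tmpdir"
cat > "$tmpdir/PdAt.txt" <<'EOF'
PdAt:
uniquetype = "adapter/pci/14106902"
attribute = "jumbo_frames"
deflt = "no"
values = "yes,no"
EOF
cat > "$tmpdir/CuAt.txt" <<'EOF'
CuAt:
name = "ent1"
attribute = "jumbo_frames"
value = "yes"
type = "R"
EOF

# stanza_field FILE KEY (hypothetical helper): print the unquoted value
# of the first "KEY = ..." line in FILE. Values with spaces are not handled.
stanza_field() {
    awk -v key="$2" '$1 == key && $2 == "=" {
        v = $3; gsub(/"/, "", v); print v; exit }' "$1"
}

# effective_value (hypothetical helper): use the CuAt value when a
# customized object exists, otherwise fall back to the PdAt default.
effective_value() {
    v=$(stanza_field "$tmpdir/CuAt.txt" value)
    if [ -n "$v" ]; then
        echo "$v"
    else
        stanza_field "$tmpdir/PdAt.txt" deflt
    fi
}

effective_value
```

With the sample data, the customized value yes wins over the default no. If the CuAt sample is emptied, the same call falls back to the PdAt default.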
Data not managed by the ODM

File system information: ?
User/security information: ?
Queues and queue devices: ?
Figure 2-10. Data not managed by the ODM AN153.0
Notes:
Completion of this page

The visual identifies some types of system information that the ODM does not manage, but the
names of the files that store these types of information are intentionally omitted from the visual.
Your instructor completes this visual during the lecture. Complete your own copy of the visual by
writing in the file names that are supplied by your instructor.

Let's review: Device configuration and the ODM

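[Visual: fill-in diagram. A command (1. _______) moves a device through the states Undefined,
Defined, and Available. Blanks 2 and 3 label the ODM classes, blank 4 (D____ D____) sits in the
AIX kernel, and blank 5 (/____/_____) is what applications access.]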
Figure 2-11. Let's review: Device configuration and the ODM AN153.0
Notes:
Instructions
Answer the following questions by writing the answers on the picture in the visual. If you are
unsure about a question, skip it.
1) Which command configures devices in an AIX system? Note: It is not an ODM
command.
2) Which ODM class contains all devices that your system supports?
3) Which ODM class contains all devices that are configured in your system?
4) Which programs are loaded into the AIX kernel to control access to the devices?
5) If you have a configured tape drive rmt1, which special file do applications access to
work with this device?
ODM commands
Object class: odmcreate, odmdrop
Descriptors: odmshow
uniquetype        attribute   deflt    values
tape/scsi/scsd    block_size  none     0-2147483648,1
disk/scsi/osdisk  pvid        none
tty/rs232/tty     login       disable  enable, disable, ...
Objects: odmadd, odmchange, odmdelete, odmget
Figure 2-12. ODM commands AN153.0
Notes:
Introduction
Different commands are available for working with each of the ODM components: object
classes, descriptors, and objects.
Commands for working with ODM object classes

- Creating object classes
You can create ODM classes by using the odmcreate command. This command has
the following syntax:
odmcreate descriptor_file.cre
The file descriptor_file.cre contains the class definition for the corresponding
ODM class. Usually, these files have the suffix .cre. The exercise for this unit contains
an optional part that shows how to create self-defined ODM object classes.

- Deleting object classes

To delete an entire ODM class, use the odmdrop command. The odmdrop command
has the following syntax:
odmdrop -o object_class_name
The name object_class_name is the name of the ODM class you want to remove. Be
careful with this command. It removes the complete class immediately.
Command to view ODM descriptors

To view the underlying structure of an object class, use the odmshow command. The odmshow
command has the following syntax:
odmshow object_class_name
The visual shows an extract from ODM class PdAt, where four descriptors are shown
(uniquetype, attribute, deflt, and values).
Commands for working with ODM objects

Usually, system administrators work with ODM objects. The odmget command retrieves object
information from an existing object class. To add new objects, use the odmadd command. To
delete objects, use the odmdelete command. To change objects, use the odmchange
command. Working on the object level is explained in more detail on the following pages.
The ODMDIR environment variable

All ODM commands use the ODMDIR environment variable, which is set in the file
/etc/environment. The default value of ODMDIR is /etc/objrepos.
Changing attribute values with odmadd and odmdelete
• The odmdelete and odmadd commands can be used to change attributes in objects.
1. Create a file with the object to change.
2. Edit the file and change the attribute value.
3. Delete the existing object.
4. Add the new object.
# odmget -q "uniquetype=tape/scsi/scsd and attribute=block_size" PdAt > file
# vi file
PdAt:
uniquetype = "tape/scsi/scsd"
attribute = "block_size"
deflt = "512"        (change deflt to 512)
values = "0-2147483648,1"
width = ""
type = "R"
generic = "DU"
rep = "nr"
nls_index = 6
# odmdelete -o PdAt -q "uniquetype=tape/scsi/scsd and attribute=block_size"
# odmadd file
Figure 2-13. Changing attribute values with odmadd and odmdelete AN153.0
Notes:
Command sequence on the visual

The odmget command in the example on the visual retrieves all the records from the PdAt
object class, where uniquetype is equal to tape/scsi/scsd and attribute is equal to
block_size. In this instance, only one record should be matched. The information is
redirected into a file that can be changed by using an editor.
In this example, the default value for the attribute block_size is changed to 512.
Note: Before the object with the new value of 512 can be added into the ODM, the old object
(which had the block_size default set to a null value) must be deleted. Otherwise, you would
have two objects that describe the same attribute in the database. The ODM uses the first object
that it finds, so the results might be confusing. For this reason, it is important to delete an
entry before adding a replacement record.
The final operation is to add the object that is defined in the file into the ODM.
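Step 2 (editing the file) can also be done without an interactive editor. The following sketch applies the change with sed to a sample copy of the stanza. The file names block_size.stanza and block_size.new are assumptions chosen for illustration, and the sample stanza is abbreviated. The odmdelete and odmadd commands from the visual appear only as comments because they change the live ODM.

```shell
#!/bin/sh
# Sample stanza as odmget might write it (abbreviated copy of the visual,
# with the original null default).
work=${TMPDIR:-/tmp}/odm_edit.$$
mkdir -p "$work"
cat > "$work/block_size.stanza" <<'EOF'
PdAt:
uniquetype = "tape/scsi/scsd"
attribute = "block_size"
deflt = ""
values = "0-2147483648,1"
EOF

# Replace only the deflt line; all other descriptors stay untouched.
# Leading white space before the descriptor name is tolerated.
sed 's/^\([[:space:]]*deflt = \)".*"/\1"512"/' \
    "$work/block_size.stanza" > "$work/block_size.new"

# Show the edited line for a visual check before touching the ODM.
grep 'deflt' "$work/block_size.new"

# On AIX, the remaining steps from the visual would then be:
#   odmdelete -o PdAt -q "uniquetype=tape/scsi/scsd and attribute=block_size"
#   odmadd "$work/block_size.new"
```

Keeping the original file next to the edited copy gives you a record to add back with odmadd if the change has to be reverted.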

Need to use ODM commands

The ODM objects are stored in a binary format; that means you need to work with the ODM
commands to query or change any objects.
Possible queries
As with any database, you can create queries for records that match certain criteria. The tests
are on the values of the descriptors of the objects. A number of tests can be done:
=     Equal
!=    Not equal
>     Greater than
>=    Greater than or equal to
<     Less than
<=    Less than or equal to
like  Similar to; finds patterns in character string data
For example, to search for records where the value of the lpp_name descriptor begins with
bosext1., you would use the criterion lpp_name like bosext1.*
Tests can be linked together by using normal Boolean operations, as shown in the following
example:
uniquetype=tape/scsi/scsd and attribute=block_size
In addition to the * wildcard, a ? can be used as a wildcard character.
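As a sketch of how such criteria are passed on the command line, the following fragment builds the two example criteria and hands them to odmget only when that command is present (that is, on an AIX system). The helper name and_criteria is a hypothetical convenience, not part of the ODM command set. The product class is used for the lpp_name test because that class carries the lpp_name descriptor in the SWVPD visual of this unit.

```shell
#!/bin/sh
# and_criteria (hypothetical helper): join any number of tests with "and".
and_criteria() {
    out=$1
    shift
    for t in "$@"; do
        out="$out and $t"
    done
    echo "$out"
}

# Single quotes keep the shell from expanding the * wildcard.
q1='lpp_name like bosext1.*'
q2=$(and_criteria 'uniquetype=tape/scsi/scsd' 'attribute=block_size')
echo "$q2"

# Run the queries only where the ODM commands exist.
if command -v odmget >/dev/null 2>&1; then
    odmget -q "$q1" product
    odmget -q "$q2" PdAt
fi
```

Always quote the criteria string when passing it to -q, so that the whole test reaches the command as one argument.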
Changing attribute values with odmchange

• The odmchange command modifies all objects that satisfy the search criteria.
1. Create a file with the object to change.
2. Edit the file and change the attribute value.
3. Use odmchange to delete the existing object and add the new object.
• Syntax:
odmchange -o ObjectClass [ -q criteria] input_file
# odmget -q "uniquetype=tape/scsi/scsd and attribute=block_size" PdAt > file
# vi file
PdAt:
uniquetype = "tape/scsi/scsd"
attribute = "block_size"
deflt = "512"        (change deflt to 512)
values = "0-2147483648,1"
width = ""
type = "R"
generic = "DU"
rep = "nr"
nls_index = 6
# odmchange -o PdAt -q "uniquetype=tape/scsi/scsd and attribute=block_size" file
Figure 2-14. Changing attribute values with odmchange AN153.0
Notes:
Another way of changing attribute values

The steps on this visual show how the odmchange command can be used, instead of the
odmdelete and odmadd steps from the previous example, to modify attribute values.

2.2. ODM database files
Software vital product data

product:
lpp_name = "bos.rte.printers"
comp_id = "5765-G6200"
update = 0
cp_flag = 2359571
fesn = "0000"
name = "bos"
state = 5
ver = 7
rel = 1
mod = 0
fix = 0
ptf = ""
media = 0
sceded_by = ""
fixinfo = ""
prereq = "*coreq bos.rte 7.1.0.0"
description = ""
supersedes = ""
inventory:
lpp_id = 38
private = 0
file_type = 0
format = 1
loc0 = "/etc/qconfig"
loc1 = ""
loc2 = ""
size = 0
checksum = 0
...
history:
lpp_id = 38
event = 2
ver = 7
rel = 1
mod = 0
fix = 0
ptf = ""
corr_svn = ""
cp_mod = ""
cp_fix = ""
login_name = "root"
state = 1
time = 1310159341
comment = ""
lpp:
name = "bos.rte.printers"
size = 0
state = 5
cp_flag = 2359571
group = ""
magic_letter = "I"
ver = 7
rel = 1
mod = 0
fix = 0
description = "Front End Printer Support"
lpp_id = 38
Figure 2-15. Software vital product data AN153.0
Notes:
Role of the installp command

Whenever a product or update is installed in AIX, the installp command uses the ODM to
maintain the Software Vital Product Database (SWVPD).

Contents of SWVPD

The following information is part of the SWVPD:
• The name of the software product (for example, bos.rte.printers)
• The version, release, modification, and fix level of the software product (for example,
6.1.5.2 or 7.1.3.3)
• The fix level, which contains a summary of fixes that are implemented in a product
• Any Program Temporary Fix (PTF) installed on the system
• The state of the software product:
- Available (state = 1)
- Applying (state = 2)
- Applied (state = 3)
- Committing (state = 4)
- Committed (state = 5)
- Rejecting (state = 6)
- Broken (state = 7)
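The numeric codes in this list are what appears in the state descriptor, for example state = 5 in the lpp object on the visual. A small lookup makes such output easier to read. In the sketch below, the function name swvpd_state_name is hypothetical, and the mapping is taken only from the list above.

```shell
#!/bin/sh
# swvpd_state_name (hypothetical helper): translate a numeric SWVPD
# state code into the state name from the list in the notes.
swvpd_state_name() {
    case $1 in
        1) echo "Available" ;;
        2) echo "Applying" ;;
        3) echo "Applied" ;;
        4) echo "Committing" ;;
        5) echo "Committed" ;;
        6) echo "Rejecting" ;;
        7) echo "Broken" ;;
        *) echo "Unknown ($1)" ;;
    esac
}

# The lpp object for bos.rte.printers on the visual has state = 5.
swvpd_state_name 5
```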
SWVPD classes
The Software Vital Product Data is stored in the following ODM classes:
lpp        The lpp object class contains information about the installed software
           products, including the current software product state and description.
inventory  The inventory object class contains information about the files that
           are associated with a software product.
product    The product object class contains product information about the
           installation and updates of software products and their prerequisites.
history    The history object class contains historical information about the
           installation and updates of software products.
Software states
Applied
• Only possible for PTFs or updates
• Previous version stored in /usr/lpp/Package_Name
• Rejecting update recovers to saved version
• Committing update deletes previous version
Committed
• Removing committed software is possible
• No return to previous version
Applying, Committing, Rejecting, Deinstalling
If installation was not successful:
1. installp -C
2. smit maintain_software
Broken
• Cleanup failed
• Remove software and reinstall
Figure 2-16. Software states AN153.0
Notes:
Introduction
The AIX software vital product database uses software states that describe the status of an
installation or update package.
The applied and committed states

When installing a Program Temporary Fix (PTF) or update package, you can install the software
into an applied state. Software in an applied state contains the newly installed version (which is
active) and a backup of the old version (which is inactive). Software in the applied state gives
you the opportunity to test the new software. If it works as expected, you can commit the
software, which removes the old version. If it does not work as planned, you can reject the
software, which removes the new software and reactivates the old version. Install packages
cannot be applied. These packages are always in the committed state.
If you would like to return to the old version after a product is committed, you must remove the
current version and reinstall the old version.

States indicating installation problems

If an installation does not complete successfully, for example, if the power fails during the
installation, you might find software states like applying, committing, rejecting, or deinstalling. To
recover from this failure, run the command installp -C or use the SMIT fastpath
smit maintain_software. Select Clean Up After Failed or Interrupted Installation when
working in SMIT.
The broken state

After a cleanup of a failed installation, you might detect a broken software status. In this case,
the only way to recover from the failure is to remove and reinstall the software package.
Predefined devices
PdDv:
type = "scsd"
class = "tape"
subclass = "scsi"
prefix = "rmt"
...
base = 0
...
detectable = 1
...
led = 0
setno = 54
msgno = 0
catalog = "devices.cat"
DvDr = "tape"
Define = "/etc/methods/define"
Configure = "/etc/methods/cfgsctape"
Change = "/etc/methods/chggen"
Unconfigure = "/etc/methods/ucfgdevice"
Undefine = "/etc/methods/undefine"
Start = ""
Stop = ""
...
Figure 2-17. Predefined devices AN153.0
Notes:
The predefined devices (PdDv) object class

The predefined devices (PdDv) object class contains entries for all devices supported on the
system. A device that is not part of this ODM class cannot be defined or configured on an AIX
system. Key attributes of objects in this class are described in the following paragraphs.
type
Type specifies the product name or model number, for example, scsd.
class
Specifies the functional class name. A functional class is a group of device instances that share
a high-level function. For example, tape is a functional class name that represents all tape
devices.

subclass
Device classes are grouped into subclasses. The subclass scsi specifies all tape devices that
can be attached to a SCSI interface.
prefix
Prefix specifies the Assigned Prefix in the customized database, which is used to derive the
device instance name and /dev name. For example, rmt is the prefix name that is assigned to
tape devices. Names of tape devices would then look like rmt0, rmt1, or rmt2.
base
This descriptor specifies whether a device is a base device or not. A base device is any device
that forms part of a minimal base system. During system boot, a minimal base system is
configured to allow access to the root volume group (rootvg) and hence to the root file system.
This minimal base system can include, for example, a SCSI hard disk. The device that is shown
on the visual is not a base device.
This flag is also used by the bosboot and savebase commands, which are introduced later in
this course.
detectable
Detectable specifies whether the device instance is detectable or undetectable by cfgmgr
when it is powered on and attached to the system. A value of 1 means that the device is
detectable, and a value of 0 that it is not (for example, a printer or tty).
led
Led indicates the value that is displayed on the LEDs when the configure method runs. The
value that is stored is decimal, but the value that is shown on the LEDs is hexadecimal (2418 is
972 in hex).
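The decimal-to-hexadecimal conversion mentioned above can be checked with printf. The function name led_hex is an assumption made for this sketch.

```shell
#!/bin/sh
# led_hex (hypothetical helper): convert the decimal led value stored
# in PdDv to the hexadecimal value that appears on the LED display.
led_hex() {
    printf '%X\n' "$1"
}

# The example from the notes: decimal 2418.
led_hex 2418
```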
setno, msgno
Each device has a specific description (for example, SCSI Tape Drive) that is shown when the
lsdev command is used to list the device attributes. The setno and msgno descriptors are
used to look up the description in a message catalog.
catalog
Catalog identifies the filename of the National Language Support (NLS) catalog. The LANG
variable on a system controls the catalog file to use to show a message. For example, if LANG is
set to en_US, the catalog file /usr/lib/nls/msg/en_US/devices.cat is used. If LANG is
de_DE, catalog /usr/lib/nls/msg/de_DE/devices.cat is used.
DvDr
DvDr identifies the name of the device driver that is associated with the device (for example,
tape). Usually, device drivers are stored in directory /usr/lib/drivers. Device drivers are
loaded into the AIX kernel when a device is made available.
Define
Define names the define method that is associated with the device type. This program is called
when a device is brought into the defined state.
Configure
Configure names the configure method that is associated with the device type. This program is
called when a device is brought into the available state.
Change
Change names the change method that is associated with the device type. This program is
called when a device attribute is changed through the chdev command.
Unconfigure
Unconfigure names the unconfigure method that is associated with the device type. This
program is called when a device is unconfigured by rmdev -l <device name>.
Undefine
Undefine names the undefine method that is associated with the device type. This program is
called when a device is undefined by rmdev -d -l <device name>.
Start, stop
Few devices support a stopped state (only logical devices). A stopped state means that the
device driver is loaded, but no application can access the device. These two attributes name the
methods to start or stop a device.
uniquetype
uniquetype is a key that other object classes reference. Objects use this descriptor as a pointer
back to the device description in PdDv. The key is a concatenation of the class, subclass, and
type values.
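The concatenation can be illustrated with two small shell functions. Both function names are hypothetical helpers for this sketch; the sample values come from the visuals in this unit.

```shell
#!/bin/sh
# make_uniquetype (hypothetical helper): build class/subclass/type.
make_uniquetype() {
    echo "$1/$2/$3"
}

# split_uniquetype (hypothetical helper): break a uniquetype back into
# its three parts.
split_uniquetype() {
    echo "$1" | awk -F/ '{ printf "class=%s subclass=%s type=%s\n", $1, $2, $3 }'
}

make_uniquetype tape scsi scsd
split_uniquetype adapter/pci/14106902
```

The three parts are the same class, subclass, and type values that the PdDv object carries in its own descriptors.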

Predefined attributes
PdAt:
uniquetype = "tape/scsi/scsd"
attribute = "block_size"
deflt = ""
values = "0-2147483648,1"
...
PdAt:
uniquetype = "disk/scsi/osdisk"
attribute = "pvid"
deflt = "none"
values = ""
...
PdAt:
uniquetype = "tty/rs232/tty"
attribute = "term"
deflt = "dumb"
values = ""
...
Figure 2-18. Predefined attributes AN153.0
Notes:
The predefined attribute (PdAt) object class

The predefined attribute (PdAt) object class contains an entry for each existing attribute for
each device that is represented in the PdDv object class. An attribute is any device-dependent
information, such as interrupt levels, bus I/O address ranges, baud rates, parity settings, or
block sizes.
The extract from PdAt on the visual shows three attributes and their default values:
block_size, pvid (physical volume identifier), and term (terminal type).
The meanings of the key fields that are shown on the visual are described in the paragraphs
that follow.
uniquetype
This descriptor is used as a pointer back to the device defined in the PdDv object class.
attribute
Attribute identifies the name of the attribute. This is the name that can be passed to the
mkdev or chdev command. For example, to change the terminal type for tty0 from the default
of dumb to ibm3151, you can run the following command:
# chdev -l tty0 -a term=ibm3151
deflt
deflt identifies the default value for an attribute. Nondefault values are stored in CuAt.
values
Values identifies the possible values that can be associated with the attribute name. For
example, allowed values for the block_size attribute range from 0 to 2147483648, with an
increment of 1.
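The check that a values descriptor implies can be sketched as follows. The function value_allowed is a hypothetical helper that handles only the two forms shown in this unit (a range with an increment, and a comma-separated list); it is not how chdev is implemented. On AIX, lsattr -R -l <device name> -a <attribute name> reports the acceptable values for a real device.

```shell
#!/bin/sh
# value_allowed VALUES CANDIDATE (hypothetical helper): succeed when
# CANDIDATE fits VALUES. Two forms are handled:
#   "low-high,increment"   for example "0-2147483648,1"
#   "a,b,c"                for example "yes,no"
value_allowed() {
    values=$1
    candidate=$2
    case $values in
        [0-9]*-[0-9]*,[0-9]*)
            low=${values%%-*}
            rest=${values#*-}
            high=${rest%%,*}
            incr=${rest#*,}
            # A range accepts only unsigned decimal numbers.
            case $candidate in
                ''|*[!0-9]*) return 1 ;;
            esac
            [ "$incr" -gt 0 ] &&
            [ "$candidate" -ge "$low" ] &&
            [ "$candidate" -le "$high" ] &&
            [ $(( (candidate - low) % incr )) -eq 0 ]
            ;;
        *)
            # List form: look for ",candidate," inside ",a,b,c,".
            case ",$values," in
                *",$candidate,"*) return 0 ;;
                *) return 1 ;;
            esac
            ;;
    esac
}

value_allowed "0-2147483648,1" 512 && echo "512 ok"
value_allowed "yes,no" maybe || echo "maybe rejected"
```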

Customized devices
CuDv:
name = "ent1"
status = 1
chgstatus = 2
ddins = "pci/goentdd"
location = "02-08"
parent = "pci2"
connwhere = "8"
PdDvLn = "adapter/pci/14106902"
CuDv:
name = "hdisk2"
status = 1
chgstatus = 2
ddins = "scdisk"
location = "01-08-01-8,0"
parent = "scsi1"
connwhere = "8,0"
PdDvLn = "disk/scsi/scsd"
Figure 2-19. Customized devices AN153.0
Notes:
The customized devices (CuDv) object class

The customized devices (CuDv) object class contains entries for all device instances that are
defined in the system. An object is defined when a define method in PdDv is run and an entry is
created in CuDv object class. A defined device object might or might not have a corresponding
actual device that is attached to the system.
The CuDv object class contains objects that provide device and connection information for each
device. Each device has a unique logical name. The customized database is updated both
during system boot and at run time, to define new devices, remove undefined devices, and
update the information for a device.
The key descriptors in CuDv are described in the next few paragraphs.
name
A customized device object for a device instance is assigned a unique logical name to
distinguish the device from other devices. The visual shows two devices, an Ethernet adapter
ent1 and a disk drive hdisk2.
status
Status identifies the status of the device instance. Possible values are:
- status = 0 - Defined
- status = 1 - Available
- status = 2 - Stopped
chgstatus
This flag tells whether the device instance was altered since the last system boot. The
diagnostics facility uses this flag to validate system configuration. The flag can take these
values:
- chgstatus = 0 - New device
- chgstatus = 1 - Don't care
- chgstatus = 2 - Same
- chgstatus = 3 - Device is missing
ddins
This descriptor typically contains the same value as the Device Driver Name descriptor in the
predefined devices (PdDv) object class. It specifies the name of the device driver that is loaded
into the AIX kernel.
location
Identifies the AIX location of a device. The location code is a path from the system unit through
the adapter to the device. When a hardware problem occurs, technical support uses the location
code to identify the failing device.
parent
Identifies the logical name of the parent device. For example, the parent device of hdisk2 is
scsi1.
connwhere
Identifies the specific location on the parent device where the device is connected. For
example, the device hdisk2 uses the SCSI address 8,0.
PdDvLn
Provides a link to the device instance's predefined information through the uniquetype
descriptor in the PdDv object class.

Customized attributes
CuAt:
name = "ent1"
attribute = "jumbo_frames"
value = "yes"
...
CuAt:
name = "hdisk2"
attribute = "pvid"
value = "00c35ba0816eafe50000000000000000"
...
Figure 2-20. Customized attributes AN153.0
Notes:
The customized attribute (CuAt) object class

The Customized Attribute (CuAt) object class contains customized device-specific attribute
information.
Devices that are represented in the customized devices (CuDv) object class have attributes that
are found in the predefined attribute (PdAt) object class and the CuAt object class. The CuAt
object class has an entry for each attribute that takes a customized value, and each CuAt entry
describes the current value of that attribute. Attributes that take the default value are found
only in the PdAt object class.
Examples on visual
The sample CuAt entries on the visual show two attributes that have customized values. The
attribute jumbo_frames was changed to yes. The attribute pvid shows the physical volume
identifier that was assigned to disk hdisk2.
Additional device object classes

PdCn:
uniquetype = "adapter/pci/sym875"
connkey = "scsi"
connwhere = "1,0"
PdCn:
uniquetype = "adapter/pci/sym875"
connkey = "scsi"
connwhere = "2,0"
CuDvDr:
resource = "devno"
value1 = "36"
value2 = "0"
value3 = "hdisk3"
CuDvDr:
resource = "devno"
value1 = "36"
value2 = "1"
value3 = "hdisk2"
CuDep:
name = "rootvg"
dependency = "hd6"
CuDep:
name = "datavg"
dependency = "lv01"
CuVPD:
name = "hdisk2"
vpd_type = 0
vpd = "*MFIBM *TM\n\
HUS151473VL3800 *F03N5280
*RL53343341*SN009DAFDF*ECH17923D
*P26K5531 *Z0\n\
000004029F00013A*ZVMPSS43A
*Z20068*Z307220"
Figure 2-21. Additional device object classes AN153.0
Notes:
PdCn
The predefined connection (PdCn) object class contains connection information for adapters
(sometimes called intermediate devices). This object class also includes predefined dependency
information. For each connection location, there are one or more objects that describe the
subclasses of devices that can be connected.
The sample PdCn objects on the visual indicate where the devices that belong to the SCSI
subclass can be attached.
CuDep
The customized dependency (CuDep) object class describes device instances that depend on
other device instances. This object class describes only the dependence links between logical
devices and physical devices, and the dependence links between logical devices.
Physical dependencies of one device on another device are recorded in the customized devices
(CuDv) object class.

The sample CuDep objects on the visual show the dependencies between logical volumes and
the volume groups they belong to.
CuDvDr
The customized device driver (CuDvDr) object class is used to create the entries in the /dev
directory. Applications use these special files to access a device driver that is part of
the AIX kernel. The attribute value1 is called the major number and is a unique key for a
device driver. The attribute value2 is the minor number; it identifies a device instance or
specifies a certain operating mode of a device driver.
The sample CuDvDr objects on the visual reflect the device driver for disk drives hdisk2 and
hdisk3. The major number 36 specifies the driver in the kernel. In the example, the minor
numbers 0 and 1 specify two different instances of disk drives, both using the same device driver.
For other devices, the minor number can represent different modes in which the device can be
used. For example, looking at a tape drive, the operating mode 0 would specify a rewind on
close. The operating mode 1 would specify no rewind on close.
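To make the major and minor numbers easier to see, the following sketch reads a plain-text copy of the two CuDvDr stanzas from the visual and prints one line per device. The function name devno_table and the sample file are assumptions for illustration; the sketch does not query a live ODM.

```shell
#!/bin/sh
# Sample CuDvDr stanzas copied from the visual.
sample=${TMPDIR:-/tmp}/cudvdr.$$
cat > "$sample" <<'EOF'
CuDvDr:
resource = "devno"
value1 = "36"
value2 = "0"
value3 = "hdisk3"

CuDvDr:
resource = "devno"
value1 = "36"
value2 = "1"
value3 = "hdisk2"
EOF

# devno_table FILE (hypothetical helper): print "name major minor"
# for every object whose resource descriptor is "devno".
devno_table() {
    awk '
        { gsub(/"/, "") }                       # strip the quotes
        $1 == "resource" { is_devno = ($3 == "devno") }
        $1 == "value1"   { major = $3 }
        $1 == "value2"   { minor = $3 }
        $1 == "value3" && is_devno { print $3, major, minor }
    ' "$1"
}

devno_table "$sample"
```

On a running system, the same two numbers are shown in the output of ls -l for the special file in /dev, in the column where a regular file shows its size.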
CuVPD
The customized vital product data (CuVPD) object class contains vital product data
(manufacturer of device, engineering level, part number, and so forth) that is useful for technical
support. When an error occurs with a specific device, the vital product data is shown in the error
log.
ODM and high-level device commands

• Listing objects in the Predefined and Customized object classes:
– List the PdDv object class:
# lsdev -P [-c <class>] [-s <subclass>] [-t <type>]
– List the CuDv object class:
# lsdev -C [-l <device name>] [-c <class>] [-s <subclass>] [-t <type>]
• Listing default and effective attributes:
– List default attributes from PdAt object class:
# lsattr -D -c <class> -s <subclass> -t <type> [-a <attribute>]
# lsattr -D -l <device name> [-a <attribute>]
– List an enumeration or range of acceptable attribute values:
# lsattr -R -l <device name> -a <attribute name>
– List effective attributes (PdAt and overrides in CuAt):
# lsattr -E -l <device name> [-a <attribute>]
Figure 2-22. ODM and high-level device commands AN153.0
Notes:
Most of the time the information in the ODM device database is accessed and managed by
using high-level commands. Understanding the object classes and their roles helps when using
these commands.
The lsdev command has options that control which ODM object class you list.
To see the objects in the predefined device (PdDv) object class, use the -P flag. If you want to
control the output, you can optionally qualify the command with any combination of the three
key descriptors: class, subclass, and type.
To see objects in the customized device (CuDv) object class, use the -C flag. To control the
output, you can either specify a particular device (by using its logical device name) or you can
use any combination of the PdDv object class key descriptors.
Here is an example of specifying a particular device:
# lsdev -l hdisk0

The most common PdDv descriptor qualification is the class. Thus, it is common to enter
commands such as:
# lsdev -Cc disk
# lsdev -Cc adapter
The lsattr command also has options that control which ODM object classes it uses.
To see the default attribute values, which are stored in the predefined attributes (PdAt) object
class, use the -D flag. You must uniquely identify the object by either:
• Specifying the class, subclass, and type for the object
• Specifying the logical device name of a customized device that is related to the PdAt
object
The effective attributes are either the attributes in the Customized Attributes (CuAt) object class
for the specified device, or the default attribute value from the related PdAt object. The CuAt
object class has entries for attributes that are different from their default values in PdAt. You
must specify a particular device by providing the logical device name of that device.
When you use the chdev command to modify an attribute value, the command logic does not let
you enter unacceptable values. It knows what is allowed by examining the values descriptor for
the attribute in the PdAt object class. If you get an exception message when you attempt to set
an attribute value, it is useful to know what is acceptable. The lsattr command displays this
information when you use the -R (range) flag. The -R option requires that the attribute name is
specified in addition to the logical name of the device for which you are attempting to modify
that attribute.
Checkpoint (1 of 2)
1. True or False: The CuAt ODM object class contains an entry for
each attribute for each supported device.
2. True or False: The DvDr attribute in the PdDv ODM object class
identifies the program that is loaded into the kernel when the device
is made available.
3. True or False: The configure attribute in the CuDv ODM object class identifies the program
that runs to bring a device to the available state.
Figure 2-23. Checkpoint (1 of 2) AN153.0
Notes:

Checkpoint (2 of 2)
4. True or False: The /etc/objrepos ODM repository holds object classes that are specific to a
system.
5. True or False: A defined device has an entry in CuDv, but cannot be used at this time.
6. True or False: An available device has its device driver loaded into the
kernel and a device file created in /dev (if applicable).
Notes:
Exercise: The Object Data Manager

• Review the device configuration ODM classes
• Modify a device attribute’s default value
• Create self-defined ODM classes (Optional)
Figure 2-25. Exercise: The Object Data Manager AN153.0
Notes:

Unit summary
Notes:
The ODM is made from object classes, which are broken into individual objects and descriptors.
AIX offers a command-line interface to work with the ODM files.
The device information is held in the customized and the predefined databases (Cu*, Pd*).
Unit 3. Error monitoring

This unit covers techniques in monitoring for problems and how to automate
responses to those problems. Topics include an overview of the AIX Error
Log facility (and how it can interact with the syslogd daemon), and the
system hang (shdaemon) monitoring facility.

• Analyze error log entries
• Identify and maintain the error logging components
• Describe different error notification methods
• Log system messages using the syslogd daemon
• Monitor and take actions for hang conditions using shdaemon

Accountability:
• Lab exercise
References
Online AIX Version 7.1 General Programming Concepts: Writing
and Debugging Programs (Chapter 5. Error-Logging
Overview)

Unit objectives
• Log system messages using the syslogd daemon
• Monitor and take actions for hang conditions using shdaemon
Notes:

3.1. Working with the error log
Error logging components

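[Visual: error logging components. Kernel side: a kernel module calls errsave(), which writes to
/dev/error (timestamp). User side: an application calls errlog(); the errlogger command; the
error daemon (/usr/lib/errdemon), stopped with errstop; the error log (/var/adm/ras/errlog),
cleared with errclear; the ODM errnotify class and error notify methods (e-mail, console); and
errpt, which produces formatted output for SMIT and diagnostics.]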
Figure 3-2. Error logging components AN153.0
Notes:
Detection of an error
The error logging process begins when an operating system module detects an error. The
segment of code that detects errors then sends error information to either the errsave() kernel
service or the errlog() application subroutine, where the information is then written to the
/dev/error special file. This process then adds a time stamp to the collected data. The
errdemon daemon constantly checks the /dev/error file for new entries, and when new data
is written, the daemon conducts a series of operations.
Creation of error log entries

The errdemon daemon collects more data from other parts of the system before writing the
information to the error log. There is an Error Record Template (/var/adm/ras/errtmplt)
which identifies what information is needed. For example, if the error signifies a
hardware-related problem and hardware vital product data (VPD) exists, the daemon retrieves
the VPD from the ODM.

When you use the errpt command (from the command line or SMIT), the error log is formatted
according to the error record template and presented in a report. Most
entries in the error log are attributable to hardware and software problems, but informational
messages can also be logged, for example, by a system administrator who uses the
errlogger command.
The errlogger command

The errlogger command allows the system administrator to record messages of up to 1024
bytes in the error log. Whenever you do a maintenance activity, such as clearing entries from
the error log, replacing hardware, or applying a software fix, it is a good idea to record this
activity in the system error log.
The following example illustrates use of the errlogger command:
# errlogger system hard disk '(hdisk0)' replaced.
This message is listed as part of the error log.
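The following sketch wraps errlogger so that an overlong message is rejected rather than passed on. It is an illustration only: log_maintenance, ERRLOGGER, and MAX_MSG_BYTES are hypothetical names, and the 1024-byte limit is taken from the notes above.

```shell
# Sketch only: log_maintenance is a hypothetical helper name.
# ERRLOGGER defaults to the AIX errlogger command; set ERRLOGGER=echo
# to try the function on a system without errlogger.
ERRLOGGER=${ERRLOGGER:-errlogger}
MAX_MSG_BYTES=1024   # limit stated in the notes above

log_maintenance() {
    msg=$*
    # Count bytes without a trailing newline; $(( )) strips any padding
    # that wc adds on some systems.
    len=$(( $(printf '%s' "$msg" | wc -c) ))
    if [ "$len" -gt "$MAX_MSG_BYTES" ]; then
        echo "message is $len bytes; limit is $MAX_MSG_BYTES" >&2
        return 1
    fi
    "$ERRLOGGER" "$msg"
}
```

A possible use after replacing a disk:
# log_maintenance "system hard disk (hdisk0) replaced"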
The errclear command

You can selectively delete records from the log with the errclear command. The selection
criteria are the same as those for selectively reporting entries with errpt.
The errnotify methods

Later this unit presents details on the option to define an errnotify method to be run anytime
the errdemon processes certain specified error records. The actions taken by the method
program or script might include such actions as sending email, writing to the console, or
triggering diagnostics.

Generating an error report using SMIT
# smit errpt
Generate an Error Report
Type or select values in entry fields.

Press Enter AFTER making all desired changes.
[Entry Fields]
CONCURRENT error reporting? no
Type of Report summary +
Error CLASSES (default is all) [] +
Error TYPES (default is all) [] +
Error LABELS (default is all) [] +
Error ID's (default is all) [] +
Resource CLASSES (default is all) []
Resource TYPES (default is all) []
Resource NAMES (default is all) []
SEQUENCE numbers (default is all) []
STARTING time interval []
ENDING time interval []
Show only Duplicated Errors [no]
Consolidate Duplicated Errors [no]
LOGFILE [/var/adm/ras/errlog]
TEMPLATE file [/var/adm/ras/errtmplt]
MESSAGE file []
FILENAME to send report to (default is stdout) []
Figure 3-3. Generating an error report using SMIT AN153.0
Notes:
Overview
The SMIT fastpath smit errpt takes you to the screen used to generate an error report. Any user
can use this screen. As shown on the visual, the screen includes a number of fields that can be
used for report specifications.
CONCURRENT error reporting?

Yes means that errors are displayed or printed as they are entered into the error
log (similar to tail -f).
Type of report
Summary, intermediate, and detailed reports are available. Detailed reports give
comprehensive information. Intermediate reports display most of the error information.
Summary reports contain concise descriptions of errors.

Error classes

Values are H (hardware), S (software), O (operator messages that are created with
errlogger), and U (undetermined). You can specify more than one error class.
Error types
Valid error types include:
- PEND: The loss of availability of a device or component is imminent.
- PERF: The performance of the device or component has degraded to below an acceptable
level.
- TEMP: Recovered from condition after several attempts.
- PERM: Unable to recover from error condition. Error types with this value are usually the
most severe errors and imply that you have a hardware or software defect. Error types other
than PERM usually do not indicate a defect, but they are recorded so that the
diagnostic programs can analyze them later.
- UNKN: Severity of the error cannot be determined.
- INFO: The error type is used to record informational entries.
Error labels
An error label is the mnemonic name that is used for an error ID.
Error IDs
An error ID is a 32-bit hexadecimal code that is used to identify a particular failure.
Resource classes
Means device class for hardware errors (for example, disk).
Resource types
Indicates device type for hardware (for example, 355 MB).
Resource names
Provides common device name (for example hdisk0).
Starting and ending time interval

The format mmddhhmmyy can be used to select only errors from the log that have a time stamp
between the two values.
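As a small illustration of the mmddhhmmyy format, the sketch below prints the current time in that form and checks whether a given string looks like a valid time stamp. The function names errpt_stamp and is_errpt_stamp are hypothetical, and the check is deliberately loose (it does not know the number of days in each month).

```shell
# Sketch only: errpt_stamp and is_errpt_stamp are hypothetical helper names.

# Print the current time as mmddhhmmyy.
errpt_stamp() {
    date +%m%d%H%M%y
}

# Succeed if $1 is ten digits with a plausible month, day, hour, and minute.
is_errpt_stamp() {
    case $1 in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) return 1 ;;
    esac
    # Prefixing 1 and subtracting 100 avoids octal parsing of values like 09.
    mm=$(( 1$(printf '%s' "$1" | cut -c1-2) - 100 ))
    dd=$(( 1$(printf '%s' "$1" | cut -c3-4) - 100 ))
    hh=$(( 1$(printf '%s' "$1" | cut -c5-6) - 100 ))
    mi=$(( 1$(printf '%s' "$1" | cut -c7-8) - 100 ))
    [ "$mm" -ge 1 ] && [ "$mm" -le 12 ] &&
    [ "$dd" -ge 1 ] && [ "$dd" -le 31 ] &&
    [ "$hh" -le 23 ] && [ "$mi" -le 59 ]
}
```

On the command line, the same format is used with the errpt -s (start) and -e (end) flags. For example, to list the entries logged on October 10, 2007:
# errpt -s 1010000007 -e 1010235907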

Show only duplicated errors

Yes reports only those errors that are exact duplicates of previous errors that are generated
during the interval of time specified. The default time interval is 10000 milliseconds (10 seconds). This value can
be changed with the errdemon -t command. The default for the Show only Duplicated
Errors option is no.
Consolidate duplicated errors

Yes reports only the number of duplicate errors and time stamps of the first and last occurrence
of that error. The default for the Consolidate Duplicated Errors option is no.
File name to send reports to

The report can be sent to a file. The default is to send the report to stdout.

The errpt command
• Summary report
– # errpt
• Intermediate report
– # errpt -A
• Detailed report
– # errpt -a
• Summary report of all hardware errors
– # errpt -d H
• Detailed report of all software errors
– # errpt -a -d S
• Concurrent error logging ("Real-time" error logging)
– # errpt -c > /dev/console
Figure 3-4. The errpt command AN153.0
Notes:
Types of reports available

The errpt command generates a report of logged errors. Three different layouts can be
produced, depending on the option that is used:
- A summary report gives an overview (default).
- An intermediate report displays only the values for the LABEL, Date/Time, Type, Resource
Name, Description, and Detailed Data fields. Use the option -A to specify an intermediate
report.
- A detailed report shows a detailed description of all the error entries. Use the option -a to
specify a detailed report.

The -d option
The -d option (flag) can be used to limit the report to a particular class of errors. Two examples
illustrating use of this flag are shown on the visual:
- The command errpt -d H specifies a summary report of all hardware (-d H) errors.
- The command errpt -a -d S specifies a detailed report (-a) of all software (-d S) errors.
Input file that is used

The errpt command queries the error log file to produce the error report. The default error log
file is /var/adm/ras/errlog.
The -c option
If you want to display the error entries concurrently, that is, at the time they are logged, you
must run errpt -c. In the example on the visual, the output is directed to the system console.
The -D flag
Duplicate errors can be consolidated by using errpt -D. When used with the -a option,
errpt -D reports only the number of duplicate errors and the time stamp for the first and last
occurrence of the identical error.
The -P flag
Shows only errors that are duplicates of the previous error. The -P flag applies only to duplicate
errors generated by the error log device driver.
Additional information
The errpt command has many options. Refer to your AIX Commands Reference (or the man
page for errpt) for a complete description.

A summary report: errpt
# errpt
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
192AC071 1010130907 T O errdemon ERROR LOGGING TURNED OFF

C6ACA566 1010130807 U S syslog MESSAGE REDIRECTED FROM SYSLOG
A6DF45AA 1010130707 I O RMCdaemon The daemon is started.
2BFA76F6 1010130707 T S SYSPROC SYSTEM SHUTDOWN BY USER
9DBCFDEE 1010130707 T O errdemon ERROR LOGGING TURNED ON
192AC071 1010123907 T O errdemon ERROR LOGGING TURNED OFF
AA8AB241 1010120407 T O OPERATOR OPERATOR NOTIFICATION
2BFA76F6 1010094907 T S SYSPROC SYSTEM SHUTDOWN BY USER
EAA3D429 1010094207 U S LVDD PHYSICAL PARTITION MARKED STALE
EAA3D429 1010094207 U S LVDD PHYSICAL PARTITION MARKED STALE
F7DDA124 1010094207 U H LVDD PHYSICAL VOLUME DECLARED MISSING
Error Type:
• P: Permanent, Performance, or Pending
• T: Temporary
• I: Informational
• U: Unknown
Error Class:
• H: Hardware
• S: Software
• O: Operator
• U: Undetermined
Figure 3-5. A summary report: errpt AN153.0
Notes:
Content of summary report

By default, the errpt command creates a summary report that gives an overview of the
different error entries. One line per error is fine to get a feel for what is there, but you need more
details to understand problems.
Need for detailed report

The example shows different hardware and software errors that occurred. To get more
information about these errors, you must create a detailed report.
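Before moving to a detailed report, it can help to see how the summary entries are distributed. The sketch below tallies summary lines by the T (type) and C (class) columns. It is an illustration only: summarize_errpt is a hypothetical helper name, and the function assumes the column layout shown on the visual.

```shell
# Sketch only: summarize_errpt is a hypothetical helper name.
# It reads errpt summary-report text on standard input.
summarize_errpt() {
    awk '
        $1 == "IDENTIFIER" { next }          # skip the header line
        NF >= 5 { count[$3 " " $4]++ }       # $3 = type, $4 = class
        END { for (k in count) print k, count[k] }
    ' | sort
}
```

Each output line has the form "type class count". On an AIX system, the function could be used as follows:
# errpt | summarize_errpt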

A detailed error report: errpt -a
LABEL: LVM_SA_PVMISS
IDENTIFIER: F7DDA124
Date/Time: Wed Oct 10 09:42:20 CDT 2007

Sequence Number: 113
Machine Id: 00C35BA04C00
Node Id: rt1s3vlp2
Class: H
Type: UNKN
WPAR: Global
Resource Name: LVDD
Resource Class: NONE
Resource Type: NONE
Location:
Description
PHYSICAL VOLUME DECLARED MISSING
Probable Causes
POWER, DRIVE, ADAPTER, OR CABLE FAILURE
Detail Data
MAJOR/MINOR DEVICE NUMBER
8000 0011 0000 0001
SENSE DATA
00C3 5BA0 0000 4C00 0000 0115 7F54 BF78 00C3 5BA0 7FCF 6B93 0000 0000 0000 0000
Figure 3-6. A detailed error report: errpt -a AN153.0
Notes:
Content of detailed error report

As previously mentioned, detailed error reports are generated by running the errpt -a
command. The first half of the information that is displayed is obtained from the ODM (CuDv,
CuAt, CuVPD) and is useful because it shows clearly which part causes the error entry. The
next few fields explain probable reasons for the problem, and actions that you can take to
correct the problem.
The last field, SENSE DATA, is a detailed report about which part of the device is failing. For
example, with disks, it might display which sector on the disk is failing. IBM support might use
this information to analyze the problem.

Interpreting error classes and types

The values that are shown for error class and error type provide information that is useful in
understanding a particular problem:
- The combination of an error class value of H and an error type value of PERM indicates that
the system encountered a problem with a piece of hardware and could not recover from it.
- The combination of an error class value of H and an error type value of PEND indicates that a
piece of hardware might become unavailable soon due to the numerous errors detected by
the system.
- The combination of an error class value of S and an error type of PERM indicates that the
system encountered a problem with software and could not recover from it.
- The combination of an error class value of S and an error type of TEMP indicates that the
system encountered a problem with software. After several attempts, the system was able
to recover from the problem.
- An error class value of O indicates that an informational message was logged.
- An error class value of U indicates that the error class could not be determined.
Link between error log and diagnostics

There is a link between the error log and diagnostics. Error reports include the diagnostic
analysis for errors that were analyzed. Diagnostics, and the diagnostic tool diag, are covered in
a later unit.
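The class and type combinations listed earlier in these notes can be sketched as a simple lookup. The helper name interpret_error is hypothetical, and the wording of each message is a paraphrase of the notes, not output from any AIX command.

```shell
# Sketch only: interpret_error is a hypothetical helper name.
# Usage: interpret_error CLASS TYPE
interpret_error() {
    class=$1 type=$2
    case "$class/$type" in
        H/PERM) echo "hardware problem; system could not recover" ;;
        H/PEND) echo "hardware might become unavailable soon" ;;
        S/PERM) echo "software problem; system could not recover" ;;
        S/TEMP) echo "software problem; recovered after several attempts" ;;
        O/*)    echo "informational message was logged" ;;
        U/*)    echo "error class could not be determined" ;;
        *)      echo "no interpretation listed for class $class, type $type" ;;
    esac
}
```

For the entry on the visual (Class: H, Type: UNKN), the function falls through to the last branch, which is a reminder that the list in the notes covers only the most common combinations.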


Types of disk errors

Error Label              Error Type   Recommendations
DISK_ERR1                P            Failure of physical volume media
                                      Action: Replace device as soon as possible
DISK_ERR2, DISK_ERR3     P            Device does not respond
                                      Action: Check power supply
DISK_ERR4                T            Error that is caused by bad block or occurrence
                                      of a recovered error
                                      Rule of thumb: If disk produces more than one
                                      DISK_ERR4 per week, replace the disk
SCSI_ERR* (SCSI_ERR10)   P            SCSI communication problem
                                      Action: Check cable, SCSI addresses, terminator

Error types: P = Permanent, T = Temporary
Figure 3-7. Types of disk errors AN153.0
Notes:
Common disk errors

The following list explains the most common disk errors you should know about:
- DISK_ERR1 is caused by wear and tear of the disk. Remove the disk as soon as possible
from the system and replace it with a new one. Follow the procedures that you learned
earlier in this course.
- DISK_ERR2 and DISK_ERR3 error entries are usually caused by a loss of electrical power.
- DISK_ERR4 indicates bad blocks on the disk. Do not panic if you get a few entries in the log
of this type of an error. What you should be aware of is the number of DISK_ERR4 errors
and their frequency. The more you get, the closer you are getting to a disk failure. You want
to prevent a disk failure before it happens, so monitor the error log closely.
- Sometimes SCSI errors are logged, mostly with the LABEL SCSI_ERR10. They indicate that
the SCSI controller is not able to communicate with an attached device. In this case, check
the cable (and the cable length), the SCSI addresses, and the terminator.
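The DISK_ERR4 rule of thumb (more than one entry per week) can be checked with a short filter. The sketch below is an assumption-laden illustration: flag_noisy_disks is a hypothetical helper name, and it expects errpt summary lines on standard input that are already limited to one error label and one week.

```shell
# Sketch only: flag_noisy_disks is a hypothetical helper name.
# Usage: flag_noisy_disks [THRESHOLD]   (THRESHOLD defaults to 1)
# Prints each resource name that appears more than THRESHOLD times.
flag_noisy_disks() {
    threshold=${1:-1}
    awk -v max="$threshold" '
        $1 == "IDENTIFIER" { next }          # skip the header line
        NF >= 5 { n[$5]++ }                  # $5 = RESOURCE_NAME
        END { for (r in n) if (n[r] > max) print r, n[r] }
    ' | sort
}
```

On an AIX system, the input could come from errpt with the -J flag (select by error label) and the -s flag (start time in mmddhhmmyy format), where the start time is one week in the past:
# errpt -J DISK_ERR4 -s 1003000007 | flag_noisy_disks 1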

DISK_ERR5 errors
An infrequent error is DISK_ERR5. It is the catch-all (that is, the problem does not match any of
the other DISK_ERRx symptoms). You need to investigate further by running the diagnostic
programs that can detect and produce more information about the problem.

LVM error log entries

Error Label                              Class and Type   Recommendations
LVM_BBEPOOL, LVM_BBERELMAX, LVM_HWFAIL   S,P              No more bad block relocation
                                                          Action: Replace disk as soon as possible
LVM_SA_STALEPP                           S,P              Stale physical partition
                                                          Action: Check disk, synchronize data (syncvg)
LVM_SA_QUORCLOSE                         H,P              Quorum lost, volume group closing
                                                          Action: Check disk, consider working
                                                          without quorum

Error Classes: H = Hardware, S = Software
Error Types: P = Permanent, T = Temporary
Figure 3-8. LVM error log entries AN153.0
Notes:
Important LVM error codes

The visual shows some important LVM error codes that you should know. All of these errors are
permanent errors that cannot be recovered. Often these errors accompany hardware errors
such as those shown on the previous page.
Immediate response to errors

Errors, such as those shown on the visual, require your immediate intervention.
Categories of LVM error labels:

LVM_BBEPOOL, LVM_BBERELMAX, LVM_HWFAIL:
No more bad block relocation
Action: Replace disk as soon as possible.

LVM_SA_STALEPP
Stale physical partition
Action: Check disk, synchronize data (syncvg).
LVM_SA_QUORCLOSE
Quorum lost, volume group closing
Action: Check disk, consider working without quorum.

Maintaining the error log
# smit errdemon
Change / Show Characteristics of the Error Log

[Entry Fields]
*Maximum LOGSIZE [1048576] #
Memory Buffer Size [32768] #
...
# smit errclear
Clean the Error Log

[Entry Fields]
Remove entries older than this number of days [30] #
Error CLASSES (default is all) [ ] +
Error TYPES (default is all) [ ] +
...
Resource CLASSES (default is all) [ ]
...
==> Use the errlogger command as a reminder <==

Figure 3-9. Maintaining the error log AN153.0
Notes:
Changing error log attributes

To change error log attributes like the error log filename, the internal memory buffer size, and
the error log file size, use the SMIT fastpath smit errdemon. The error log file is implemented
as a ring. When the file reaches its limit, the oldest entry is removed to allow adding a new one.
The command that SMIT runs is the errdemon command. See your AIX Commands Reference
for a listing of the different options.
Cleaning up error log entries

To clean up error log entries, use the SMIT fastpath smit errclear. For example, after
removing a bad disk that caused error log entries, you should remove the corresponding error
log entries regarding the bad disk. The errclear command is part of the fileset
bos.sysmgt.serv_aid.

Entries in /var/spool/cron/crontabs/root use errclear to remove software and
hardware errors. Software and operator errors are purged after 30 days; hardware errors are
purged after 90 days.
Using errlogger to create reminders

Follow the suggestion at the bottom of the visual. Whenever an important system event takes
place, for example, the replacement of a disk, log this event with the errlogger command.
Full list of characteristics of the error log

The first SMIT screen that is shown in the visual is not complete. The complete screen is:
* Maximum LOGSIZE [1048576] #
Memory BUFFER SIZE [32768] #
Duplicate Error Detection [true] +
Duplicate Time Interval in milliseconds [10000] #
Duplicate error maximum [1000] #

Exercise: Error monitoring (Part 1)
• Part 1: Work with the error log
Figure 3-10. Exercise: Error monitoring (Part 1) AN153.0
Notes:
Goals for this part of the exercise

The first part of this exercise has you work with the AIX error logging facility.
After completing this part of the exercise, you should be able to:
- Determine what errors are logged on your machine
- Generate different error reports
- Start concurrent error notification

3.2. Error notification and syslogd

Error notification methods

• ODM-based error notification: /etc/objrepos/errnotify
• Concurrent error logging: errpt -c > /dev/console
• Self-made error notification
Figure 3-11. Error notification methods AN153.0
Notes:
What is error notification?

Implementing error notification means taking steps that cause the system to inform you
whenever an error is posted to the error log.
Ways to implement error notification

There are different ways to implement error notification:
- Concurrent error logging is the easiest way to implement error notification. If you run
errpt -c, each error is reported when it occurs. By redirecting the output to the console,
an operator is informed about each new error entry.
- Self-made error notification is also an easy way to implement error notification. You write
a shell procedure that regularly checks the error log.
- ODM-based error notification: The errdemon program uses the ODM class errnotify for
error notification. How to work with errnotify is discussed later in this topic.

Self-made error notification
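#!/usr/bin/ksh
# Take a first snapshot of the error report
errpt > /tmp/errlog.1
while true
do
    sleep 60 # Let's sleep one minute
    # Take a second snapshot
    errpt > /tmp/errlog.2
    # Compare the two files.
    # If no difference, let's sleep again
    cmp -s /tmp/errlog.1 /tmp/errlog.2 && continue
    # Files are different: Let's inform the operator:
    print "Operator: Check error log " > /dev/console
    # Keep the new snapshot so that each change is reported only once
    cp /tmp/errlog.2 /tmp/errlog.1
done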
Figure 3-12. Self-made error notification AN153.0
Notes:
Implementing self-made error notification

It is easy to implement self-made error notification by using the errpt command. The sample
shell script on the visual shows how this error notification can be done.
Example on visual
The procedure on the visual shows an easy but effective way of implementing error notification.
- The first errpt command generates a file /tmp/errlog.1.
- The construct while true implements an infinite loop that never ends.
- In the loop, the first action is to sleep 1 minute.
- The second errpt command generates a second file /tmp/errlog.2.

- The two files are compared by using the command cmp -s (silent compare that means no
output is reported). If the files are not different, it jumps back to the beginning of the loop
(continue), and the process sleeps again.
- If there is a difference, a new error entry is posted to the error log. In this case, the operator
is informed that a new entry is in the error log. Instead of print you might use the mail
command to inform another person.
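The notification can be made more useful by reporting which entries are new. The sketch below compares two snapshot files and prints only the added entries. It rests on stated assumptions: new_entries is a hypothetical helper name, errpt is assumed to list the newest entries first after one header line, and no entries are assumed to have been removed from the log between the two snapshots.

```shell
# Sketch only: new_entries is a hypothetical helper name.
# Usage: new_entries OLD_SNAPSHOT NEW_SNAPSHOT
new_entries() {
    old=$1 new=$2
    # $(( )) strips any padding that wc adds on some systems.
    n_old=$(( $(wc -l < "$old") ))
    n_new=$(( $(wc -l < "$new") ))
    [ "$n_new" -gt "$n_old" ] || return 0
    # Drop the header, then keep only the lines added since the old snapshot.
    sed 1d "$new" | head -n "$(( n_new - n_old ))"
}
```

In a polling loop like the one on the visual, the output could be mailed instead of printing a fixed message:
# new_entries /tmp/errlog.1 /tmp/errlog.2 | mail -s "New error log entries" root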

ODM-based error notification: errnotify
errnotify:
en_pid = 0
en_name = "sample"
en_persistenceflg = 1
en_label = ""
en_crcid = 0
en_class = "H"
en_type = "PERM"
en_alertflg = ""
en_resource = ""
en_rtype = ""
en_rclass = "disk"
en_method = "errpt -a -l $1 | mail -s DiskError root"
Figure 3-13. ODM-based error notification: errnotify AN153.0
Notes:
The error notification object class

The Error Notification object class specifies the conditions and actions to be taken when errors
are recorded in the system error log. The user specifies these conditions and actions in an Error
Notification object.
Each time an error is logged, the error notification daemon determines whether the error log
entry matches the selection criteria of any of the Error Notification objects. If matches exist, the
daemon runs the programmed action, also called a notify method, for each matched object.
The Error Notification object class is in the /etc/objrepos/errnotify file. Error Notification
objects are added to the object class by using ODM commands.
Example on visual
The example on the visual shows an object that creates a mail message to root whenever a disk
error is posted to the log.
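One way to add such an object is to put the stanza in a file and load it with odmadd. The sketch below only writes the stanza file. The function name write_errnotify_stanza and the file name used afterward are hypothetical, and descriptors that the visual shows with empty values are left out here on the assumption that they can be omitted.

```shell
# Sketch only: write_errnotify_stanza is a hypothetical helper name.
# Usage: write_errnotify_stanza FILE
write_errnotify_stanza() {
    # The quoted 'EOF' keeps $1 literal inside the stanza, so that the
    # error daemon, not this shell, substitutes the sequence number.
    cat > "$1" <<'EOF'
errnotify:
        en_name = "sample"
        en_persistenceflg = 1
        en_class = "H"
        en_type = "PERM"
        en_rclass = "disk"
        en_method = "errpt -a -l $1 | mail -s DiskError root"
EOF
}
```

On an AIX system, the file could then be loaded, checked, and later removed with the ODM commands:
# write_errnotify_stanza /tmp/en_sample.add
# odmadd /tmp/en_sample.add
# odmget -q "en_name=sample" errnotify
# odmdelete -q "en_name=sample" -o errnotify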

List of descriptors
Here is a list of all descriptors for the errnotify object class:
en_alertflg Identifies whether the error is alertable. This descriptor is provided for
use by alert agents with network management applications. The
values are TRUE (alertable) or FALSE (not alertable).
en_class Identifies the class of error log entries to match. Valid values are H
(hardware errors), S (software errors), O (operator messages), and U
(undetermined).
en_crcid Specifies the error identifier that is associated with a particular error.
en_dup Identifies whether the kernel identified the error as a duplicate. TRUE
indicates that it is a duplicate error.
en_err64 Identifies the environment of the error. TRUE indicates that the error is
from a 64-bit environment.
en_label Specifies the label that is associated with a particular error identifier as
defined in the output of errpt -t (show templates).
en_method Specifies a user-programmable action, such as a shell script or a
command string to be run when an error matching the selection criteria
of this Error Notification object is logged. The error notification daemon
uses the sh -c command to run the notify method.
The following keywords are passed to the method as arguments:
$1 Sequence number from the error log entry
$2 Error ID from the error log entry
$3 Class from the error log entry
$4 Type from the error log entry
$5 Alert flags from the error log entry
$6 Resource name from the error log entry
$7 Resource type from the error log entry
$8 Resource class from the error log entry
$9 Error label from the error log entry
en_name Uniquely identifies the object
en_persistenceflg Designates whether the Error Notification object should be removed
when the system is restarted. 0 means removed at boot time; 1 means
persists through boot.
en_pid Specifies a process ID for use in identifying the Error Notification
object. Objects that have a PID specified should have the
en_persistenceflg descriptor set to 0.

en_rclass Identifies the class of the failing resource. For hardware errors, the
resource class is the device class (see PdDv). Not used for software
errors.
en_resource Identifies the name of the failing resource. For hardware errors, the
resource name is the device name. Not used for software errors.
en_rtype Identifies the type of the failing resource. For hardware errors, the
resource type is the device type (see PdDv). Not used for software
errors.
en_symptom Enables notification of an error that accompanies a symptom string
when set to TRUE.
en_type Identifies the severity of error log entries to match. Valid values are:
INFO: Informational
PEND: Impending loss of availability
PERM: Permanent
PERF: Unacceptable performance degradation
TEMP: Temporary
UNKN: Unknown
Values for en_alertflg:
TRUE: Matches alertable errors
FALSE: Matches non-alertable errors
Values for en_persistenceflg:
0: Removes the Error Notification object at system restart
non-zero: Retains the Error Notification object at system restart
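The positional arguments that are passed to a notify method ($1 through $9, as listed under en_method) can be turned into a one-line message. The sketch below is an illustration only: notify_summary is a hypothetical helper name, and the output format is an arbitrary choice.

```shell
# Sketch only: notify_summary is a hypothetical helper name. The nine
# arguments follow the order in which they are passed to a notify method:
# sequence number, error ID, class, type, alert flags, resource name,
# resource type, resource class, error label.
notify_summary() {
    seq=$1 id=$2 class=$3 type=$4 rname=$6 label=$9
    printf 'errlog seq=%s id=%s label=%s class=%s type=%s resource=%s\n' \
        "$seq" "$id" "$label" "$class" "$type" "$rname"
}
```

If the function is saved in a script (for example, a hypothetical /usr/local/bin/notify.sh that ends with notify_summary "$@" | mail -s ErrorLog root), the errnotify object would pass the arguments explicitly:
en_method = "/usr/local/bin/notify.sh $1 $2 $3 $4 $5 $6 $7 $8 $9"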

syslogd daemon

/etc/syslog.conf:
daemon.debug /tmp/syslog.debug

/tmp/syslog.debug:
inetd[16634]: A connection requires tn service
inetd[16634]: Child process 17212 has ended

Provide debug information:
# stopsrc -s inetd
# startsrc -s inetd -a "-d"
Figure 3-14. syslogd daemon AN153.0
Notes:
Function of syslogd
The syslogd daemon logs system messages from different software components (kernel,
daemon processes, system applications).
The /etc/syslog.conf configuration file

When started, syslogd reads the configuration file /etc/syslog.conf. Whenever you
change this configuration file, you need to refresh the syslogd subsystem:
# refresh -s syslogd

Example on visual

The visual shows a configuration that is often used when a daemon process causes a problem.
The following line is placed in /etc/syslog.conf and indicates that facility daemon should
be monitored/controlled:
daemon.debug /tmp/syslog.debug
The line that is shown also specifies that all messages with the priority level debug and higher
should be written to the file /tmp/syslog.debug. This file must already exist; syslogd does not create it.
The daemon process that causes problems (in the example, inetd) is started with option -d
to provide debug information. The syslogd daemon collects the debug information and writes
the information to the log file /tmp/syslog.debug.
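Because each log file must exist before syslogd writes to it, a small helper can create any missing files that a configuration file names. The sketch below is an illustration under assumptions: ensure_syslog_files is a hypothetical helper name, and it treats the first field after the selector that begins with a slash as the destination file.

```shell
# Sketch only: ensure_syslog_files is a hypothetical helper name.
# Usage: ensure_syslog_files CONFIG_FILE
ensure_syslog_files() {
    awk '
        /^[ \t]*#/ { next }                       # skip comment lines
        {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^\//) { print $i; break }   # first path-like field
        }
    ' "$1" |
    while read -r path; do
        case $path in
            /dev/*) continue ;;                   # leave device files alone
        esac
        # Create an empty file only if nothing exists at that path.
        [ -e "$path" ] || : > "$path"
    done
}
```

On an AIX system, the helper would be followed by a refresh of the subsystem:
# ensure_syslog_files /etc/syslog.conf
# refresh -s syslogd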

syslogd configuration examples

/etc/syslog.conf:
auth.debug /dev/console          All security messages to the system console
mail.debug /tmp/mail.debug       Collect all mail messages in /tmp/mail.debug
daemon.debug /tmp/daemon.debug   Collect all daemon messages in /tmp/daemon.debug
*.debug;mail.none @server        Send all messages, except mail messages, to host server

After changing /etc/syslog.conf:
# refresh -s syslogd
Figure 3-15. syslogd configuration examples AN153.0
Notes:
Examples on visual
The visual shows some examples of syslogd configuration entries that might be placed in
/etc/syslog.conf:
- The following line specifies that all security messages are directed to the system console:
auth.debug /dev/console
- The following line specifies that all mail messages are collected in the file
/tmp/mail.debug:
mail.debug /tmp/mail.debug
- The following line specifies that all messages produced from daemon processes are
collected in the file /tmp/daemon.debug:
daemon.debug /tmp/daemon.debug

- The following line specifies that all messages, except messages from the mail subsystem,
are sent to the syslogd daemon on the host server:
*.debug;mail.none @server
If this example and the preceding example appear in the same /etc/syslog.conf file,
messages sent to /tmp/daemon.debug are also sent to the host server.
General format of /etc/syslog.conf entries

As you see, the general format for entries in /etc/syslog.conf is:
selector action
The selector field names a facility and a priority level. Separate facility names with a comma (,).
Separate the facility and priority level portions of the selector field with a period (.). Separate
multiple entries in the same selector field with a semicolon (;). To select all facilities, use an
asterisk (*).
The action field identifies a destination (file, host, or user) to receive the messages. If routed to
a remote host, the remote system handles the message as indicated in its own configuration
file. To display messages on a user's terminal, the destination field must contain the name of a
valid, logged-in system user. If you specify an asterisk (*) in the action field, a message is sent
to all logged-in users.
Facilities
Use the following system facility names in the selector field:
kern Kernel
user User level
mail Mail subsystem
daemon System daemons
auth Security or authorization
syslog syslogd messages
lpr Line-printer subsystem
news News subsystem
uucp uucp subsystem
* All facilities
Priority levels
Use the following levels in the selector field. Messages of the specified level and all levels
above it are sent as directed.
emerg Specifies emergency messages. These messages are not distributed to all users.

alert Specifies important messages such as serious hardware errors. These messages
are distributed to all users.
crit Specifies critical messages, not classified as errors, such as improper login
attempts. These messages are sent to the system console.
err Specifies messages that represent error conditions.
warning Specifies messages for abnormal, but recoverable conditions.
notice Specifies important informational messages.
info Specifies information messages that are useful in analyzing the system.
debug Specifies debugging messages. If you are interested in all messages of a certain
facility, use this level.
none Excludes the selected facility.
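The rule that a selector matches messages of the specified level and all levels above it can be sketched as a comparison of ranks. The helper names level_rank and would_log are hypothetical. The ranks follow the order of the list above, with emerg as the highest level (rank 0) and debug as the lowest (rank 7).

```shell
# Sketch only: level_rank and would_log are hypothetical helper names.

# Print the rank of a priority level; fail for an unknown name.
level_rank() {
    case $1 in
        emerg)   echo 0 ;;
        alert)   echo 1 ;;
        crit)    echo 2 ;;
        err)     echo 3 ;;
        warning) echo 4 ;;
        notice)  echo 5 ;;
        info)    echo 6 ;;
        debug)   echo 7 ;;
        *)       return 1 ;;
    esac
}

# would_log CONFIGURED MESSAGE
# Succeed if a message at level MESSAGE is selected by a rule whose
# selector names level CONFIGURED. Return 2 for an unknown level name.
would_log() {
    conf=$(level_rank "$1") || return 2
    msg=$(level_rank "$2") || return 2
    [ "$msg" -le "$conf" ]
}
```

For example, would_log debug err succeeds (a daemon.debug rule also receives err messages), while would_log err debug fails.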
Refreshing the syslogd subsystem

As previously mentioned, after changing /etc/syslog.conf, you must refresh the syslogd
subsystem to have the change take effect. Use the command:
# refresh -s syslogd

Redirecting syslog messages to error log

/etc/syslog.conf:
*.debug errlog     Redirect all syslog messages to error log

# errpt
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
...
...
Figure 3-16. Redirecting syslog messages to error log AN153.0
Notes:
Consolidating error messages

Some applications use syslogd for logging errors and events. Some administrators find it
desirable to list all errors in one report.
Redirecting messages from syslogd to the error log

The visual shows how to redirect messages from syslogd to the error log.
By setting the action field to errlog, all messages are redirected to the AIX error log.

Directing error log messages to syslogd

errnotify:
en_name = "syslog1"
en_persistenceflg = 1
en_method = "logger Error Log: `errpt -l $1 | grep -v TIMESTAMP`"

errnotify:
en_name = "syslog1"
en_method = "logger Error Log: $(errpt -l $1 | grep -v TIMESTAMP)"

Direct the new error entry (-l $1 selects it by sequence number) to syslogd.
Do not show the error log header (grep -v) or (tail -1).

errnotify:
en_name = "syslog1"
en_method = "errpt -l $1 | tail -1 | logger -t errpt -p daemon.notice"
Figure 3-17. Directing error log messages to syslogd AN153.0
Notes:
Using the logger command

You can direct error log events to syslogd by using the logger command with the errnotify
ODM class.
Command substitution
You need to use command substitution (or pipes) to pass the errpt output to the logger command.
The first two examples on the visual illustrate the two ways to do command substitution in a
Korn shell environment:
- Using the `UNIX-command` syntax (with backquotes) - shown in the first example on the
visual
- Using the newer $(UNIX-command) syntax - shown in the second example on the visual
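The following sketch shows that the two substitution forms, and the pipe form from the third example, produce the same text. Because errpt exists only on AIX, a stand-in function (fake_errpt, an assumption) prints a header line and one made-up entry; the identifier and timestamp values are invented for illustration.

```shell
# Stand-in for `errpt -l <sequence>`: one header line, then one entry.
# All field values are made up for this example.
fake_errpt() {
    printf '%s\n' 'IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION'
    printf '%s\n' "A1B2C3D4   0101120015 P H hdisk0         SAMPLE ENTRY $1"
}

# Old-style backquotes.
msg_backquote=`fake_errpt 42 | grep -v TIMESTAMP`

# Newer $( ) form: same result, easier to read and to nest.
msg_dollar=$(fake_errpt 42 | grep -v TIMESTAMP)

# Dropping the header with tail instead of grep (the visual writes tail -1).
msg_pipe=$(fake_errpt 42 | tail -n 1)

echo "Error Log: $msg_dollar"
```

In an errnotify method, $1 is replaced by the error log sequence number, so the real command would be errpt -l $1 instead of the stand-in.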

System hang detection

• System hangs:
– High priority process
– Other
• What does shdaemon do?
– Monitors the system's ability to run processes
– Takes specified action if threshold is crossed
• Actions:
– Logs error in the error log
– Displays a warning message on the console
– Launches recovery login on a console
– Launches a command
– Automatically reboots the system
Figure 3-18. System hang detection AN153.0
Notes:
Types of system hangs

shdaemon can help to recover from certain types of system hangs.
- High priority process
The system might appear to be hung if some applications adjusted their process or thread
priorities so high that regular processes are not scheduled. In this case, work is still being
done, but only by the high priority processes. As currently implemented, shdaemon
specifically addresses this type of hang.
- Other
Other types of hangs can be caused by various problems, such as system thrashing, kernel
deadlock, or the kernel running in a tight loop. In these cases, no (or little) meaningful work
gets done. shdaemon can help with some of these problems.

What does shdaemon do?

If enabled, shdaemon monitors the system to see whether any process with a process priority
number higher than a set threshold was run during a set timeout period. Remember that a
higher process priority number indicates a lower priority on the system. In effect, shdaemon
monitors to see whether lower priority processes are being scheduled.
shdaemon runs at the highest priority (priority number = 0), so that it is always able to get CPU
time, even if a process is running at a high priority.
Actions
If lower priority processes are not being scheduled, shdaemon performs the specified action.
Each action can be individually enabled and has its own configurable priority and timeout
values. There are five actions available:
- Log error in the error log
- Display a warning message on a console
- Start a recovery login on a console
- Start a command
- Automatically reboot the system
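The detection rule described above can be modeled in a few lines. This is only a sketch of the decision, not how shdaemon is implemented: the function name hang_suspected and the list of observed priority numbers are assumptions made for illustration.

```shell
# hang_suspected THRESHOLD PRIO...
# Succeeds (status 0) when none of the priority numbers observed during the
# timeout window is greater than THRESHOLD, that is, when no lower-priority
# process was scheduled.
hang_suspected() {
    threshold=$1
    shift
    for prio in "$@"; do
        if [ "$prio" -gt "$threshold" ]; then
            return 1    # a lower-priority process ran: no hang
        fi
    done
    return 0
}

# Only priority numbers 20 and 40 ran in the window; threshold 60 was never exceeded.
if hang_suspected 60 20 40; then
    echo "possible hang: take the configured action"
fi
```

Remember that a larger priority number means a lower priority, so the test is "did anything above the threshold number get CPU time".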

Configuring shdaemon
# shconf -E -l prio
sh_pp      disable       Enable Process Priority Problem
pp_errlog  disable       Log Error in the Error Logging
pp_eto     2             Detection Time-out
pp_eprio   60            Process Priority
pp_warning enable        Display a warning message on a console
pp_wto     2             Detection Time-out
pp_wprio   60            Process Priority
pp_wterm   /dev/console  Terminal Device
pp_login   enable        Launch a recovering login on a console
pp_lto     2             Detection Time-out
pp_lprio   100           Process Priority
pp_lterm   /dev/console  Terminal Device
pp_cmd     disable       Launch a command
pp_cto     2             Detection Time-out
pp_cprio   60            Process Priority
pp_cpath   /home/unhang  Script
pp_reboot  disable       Automatically REBOOT system
pp_rto     5             Detection Time-out
pp_rprio   39            Process Priority
Figure 3-19. Configuring shdaemon AN153.0
Notes:
Introduction
shdaemon configuration information is stored as attributes in the SWservAt ODM object class.
Configuration changes take effect immediately and survive across reboots.
Use shconf (or smit shd) to configure or display the current configuration of shdaemon.
The values that are shown in the visual are the default values.
Enabling shdaemon
At least two parameters must be modified to enable shdaemon:
- Enable priority monitoring (sh_pp)
- Enable one or more actions (pp_errlog, pp_warning, and so forth)

When enabling shdaemon, shconf does the following steps:

- Modifies the SWservAt parameters
- Starts shdaemon
- Modifies /etc/inittab so that shdaemon is started on each system boot
Action attributes
Each action has its own attributes, which set the priority and timeout thresholds and define the
action to be taken. The timeout attribute unit of measure is in minutes.
Example
By changing the shconf attributes, you can enable, disable, and modify the behavior of the
facility. In the following example, shdaemon is enabled to monitor process priority
(sh_pp=enable), and several actions are enabled:
- Enable shdaemon to monitor process priority:
# shconf -l prio -a sh_pp=enable
- Log an error in the error log:
# shconf -l prio -a pp_errlog=enable
Every 2 minutes (pp_eto=2), shdaemon checks to see whether any process ran with a
process priority number greater than 60 (pp_eprio=60). If not, shdaemon logs an error to
the error log.
- Display a warning message on a console:
# shconf -l prio -a pp_warning=enable (default value)
Every 2 minutes (pp_wto=2), shdaemon checks to see whether any process ran with a
process priority number greater than 60 (pp_wprio=60). If not, shdaemon sends a warning
message to the console specified by pp_wterm.
- Run a command:
# shconf -l prio -a pp_cmd=enable -a pp_cto=5
Every 5 minutes (pp_cto=5), shdaemon checks to see whether any process ran with a process
priority number greater than 60 (pp_cprio=60). If not, shdaemon runs the command that is
specified by pp_cpath (in this case, /home/unhang).
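If you want to keep such settings in a script, one cautious approach is to print the shconf commands first and review them before running anything. The helper name shconf_cmd is an assumption; the flags and attributes are the ones used in the example above.

```shell
# shconf_cmd ATTR=VALUE...
# Print (do not run) the shconf command that would set the given attributes.
shconf_cmd() {
    cmd="shconf -l prio"
    for pair in "$@"; do
        cmd="$cmd -a $pair"
    done
    printf '%s\n' "$cmd"
}

shconf_cmd sh_pp=enable
shconf_cmd pp_errlog=enable
shconf_cmd pp_cmd=enable pp_cto=5
```

On an AIX system, you could run the printed lines as root once you are satisfied with them, then confirm the result with shconf -E -l prio.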

Checkpoint (1 of 2)
1. Which command generates error reports?
2. Which flag of this command is used to generate a detailed error report?
3. Which type of disk error indicates bad blocks?
4. What does the errclear command do?
Notes:

Checkpoint (2 of 2)
5. What does the errlogger command do?
6. What does the following line in /etc/syslog.conf indicate?

*.debug errlog
7. What does the descriptor en_method in errnotify indicate?
Notes:

Exercise: Error monitoring (Part 2)

• Part 2, section 1: Work with syslogd
• Part 2, section 2: Perform error notification with errnotify
Figure 3-22. Exercise: Error monitoring (Part 2) AN153.0
Notes:

Unit summary

• Log system messages using the syslogd daemon
• Monitor and take actions for hang conditions using
shdaemon
Notes:

Unit 4. Network Installation Manager basics

This unit provides an introduction to using the Network Installation Manager
(NIM) to network boot an AIX client system. It covers the basic installation
and configuration of NIM for supporting client installation or booting to
maintenance mode.

• Configure an AIX partition for use as a NIM master
• Set up NIM to support the installation of AIX onto a client

Accountability:
• Lab exercises
References
Online AIX Version 7.1 Installation and migration
SG24-7296 NIM from A to Z in AIX 5L (Redbooks)
http://www.redbooks.ibm.com
Unit objectives

Notes:

NIM overview
• AIX software administration over the network:
– Install
– Update
– Maintain
• Eliminate tape or CD/DVD at each system
• Distribute installation load
• Support for push or pull installations
• NIM administrative tools
– Command line interface
– SMIT
(Diagram: a NIM master and NIM server with three clients, one of which is also a NIM server.
PUSH installation: initiated by master. PULL installation: requested by client.)
Figure 4-2. NIM overview AN153.0
Notes:
Purpose of NIM
NIM provides centralized AIX software administration for multiple machines over the network.
NIM supports full AIX operating system installation, installing or updating individual packages,
and doing software maintenance.
Advantages
NIM provides several advantages:
- Provides one central point for AIX software administration for all the NIM clients
- Eliminates the need to walk a CDROM/DVD or tape to each system and the need for a tape
drive or CDROM/DVD drive at every system
- Installations can be initiated from the master machine (push) or from the client (pull)
- The installation load can be distributed. The NIM master machine can be configured as the
server for all the filesets to be installed. However, you can also configure one or more client
machines to act as servers to distribute the load if you have many clients.
NIM administrative tools

You can manage your NIM environment with:
- Command-line
The command-line gives you complete control, but the number of options that are needed
can be daunting. Still, if you want to script NIM operations, you must use the command-line.
The basic NIM commands are:
• nimconfig - Configure NIM master
• nim - Perform NIM operations from the master
• nimclient - Perform NIM operations from a client
• niminit - Configure NIM client
• lsnim - List information about NIM objects
- SMIT
There are basically two paths into SMIT’s NIM interface:
• smit nim - Configure master and client machines and perform all NIM operations.
• smit eznim - This fastpath provides a simplified environment to configure machines
and perform some basic NIM operations. This path can be a good starting point for a
new NIM system administrator.
As you become familiar with the NIM environment, you might find that you use a combination of
methods. For example, you might use the command line to list NIM status and perform simple
NIM operations, while using SMIT for more complex operations or for operations that you do not
perform frequently.

Machine roles
• Master
– File sets:
• bos.sysmgt.nim.master
• bos.sysmgt.nim.client
• bos.sysmgt.nim.spot
• Stores NIM database
– NIM administration
– Can initiate push installations to NIM clients
– AIX version >= all other NIM machines
• Client
– File sets:
• bos.sysmgt.nim.client
– Can initiate pull installations from a server
• Server
– Any machine, master, or client
– Serves NIM resources to clients, thus requires adequate disk space and
throughput
Figure 4-3. Machine roles AN153.0
Notes:
The three basic roles that a machine can have in the NIM environment are master, client, and
resource server. There can be only one master machine in a NIM environment. All other
machines are clients. Any machine, master or client, can be a resource server.
NIM software
All machines in the NIM environment must install bos.sysmgt.nim.client. The master
machine must also install bos.sysmgt.nim.master and bos.sysmgt.nim.spot.
Master
The NIM master manages all other machines that participate in the NIM environment. The NIM
database is stored on the NIM master. The NIM master is fundamental for all of the operations
in the NIM environment and must be set up and operational before performing any NIM
operations. The master can initiate a software installation to a client, which is called a push
installation.
Also, the NIM master is the only machine that is given the permissions and ability to run NIM
operations on other machines within the NIM environment. The master uses rsh or nimsh to run
commands remotely on clients, which allows it to install to a number of clients with one NIM
operation.
The master requires the filesets of bos.sysmgt.nim.master, bos.sysmgt.nim.client,
and bos.sysmgt.nim.spot. It is also required to have its AIX operating system software at a
level that is equal to or higher than any of the clients that it is serving.
Client
All other machines in a NIM environment are clients. Clients can request a software installation
from a server machine (pull installation). The client requires the fileset of
bos.sysmgt.nim.client.
Server
The master can configure any machine, the master, or a client, as a server for a particular
software resource. Most often, the master is also the server. However, if your environment has
many nodes or consists of a complex network environment, you might want to configure some
nodes to act as servers to improve installation performance.
Servers must have adequate disk space for the resources they are providing. They also need
network connections to the client machines they serve and sufficient bandwidth to respond to
the expected volume.

Boot process for AIX installation: Tape or CD/DVD

1 Load boot image           Boot image is on removable media
2 Execute boot image
3 Configure devices         Using programs on removable media
4 Install system files      Backup archive is on removable media
Figure 4-4. Boot process for AIX installation: Tape or CD/DVD AN153.0
Notes:
To understand how NIM works, you need to understand what happens when AIX is installed on
a system.
Power on or partition activation

If using a POWER server with a single operating system, the machine must be booted or reset
to install the AIX Base Operating System (BOS). If using a server that is logically partitioned,
then you must activate the AIX partition from the HMC to install an AIX BOS.
Load boot image into memory

The machine's Initial Program Load (IPL) Read Only Memory (ROM) locates a boot image and
loads the image into memory. The boot image contains a miniature runtime environment (the
kernel and a file system that contains libraries and key programs).
Where is the boot image?

When booting from a hard disk, the boot image is retrieved from the system's hard disk. When a
machine is being installed for the first time, it obviously cannot retrieve a boot image from the
hard disk. Traditionally, the boot image would need to be available on the tape or CD/DVD.
Transfer control to mini-runtime environment

Control is passed to the kernel, and the file system in the boot image is mounted from memory.
Start boot script and configure devices that are needed for installation
The kernel initializes and eventually runs the boot script (rc.boot), which configures devices
that are needed for the installation such as keyboards, displays, and disks.
Configuring devices
To keep the boot image small, not all of the software needed to configure devices is included in
the boot image. These additional files are contained in a small /usr directory tree that is called
a Shared Product Object Tree or SPOT. The boot script mounts the /usr directory tree on
/SPOT in the memory file system. The SPOT is mounted directly from the CDROM/DVD.
Note: Since tape devices do not support file system operations, the SPOT files are included in
the boot image in the case of booting from a tape drive.
Installation script
After the devices are configured, rc.boot starts the BOS installation program (bi_main), and
installs AIX from the installation images on the tape or CD/DVD.

Boot process for AIX installation with NIM (1 of 2)

1 Load boot image           Boot image from NIM server
(Diagram: the client sends a bootp request over en0 to bootpd on the NIM server, which looks up
the boot file name in /etc/bootptab. The client then uses tftp to fetch the boot image file.)
2 Execute boot image
Figure 4-5. Boot process for AIX installation with NIM (1 of 2) AN153.0
Notes:
Using NIM to boot over the network is essentially the same as booting from CD or tape, except
that the boot file (SPOT file) and installation images come from the server system over the
network.
Load boot image into memory

If the client system is booting from the network, the IPL ROM sends a bootp request to the NIM
server for the name of a boot file. The NIM server then uses the /etc/bootptab file to
determine the boot file name and returns that name to the client system. Finally, the client
system uses tftp to download the boot file from the NIM server over the network.
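To make the lookup step concrete, the sketch below extracts the boot file name for one host from a bootptab-style file. It assumes the usual bootptab layout of colon-separated tag=value fields with bf= naming the boot file; the function name bootfile_for and the sample entry (host name and addresses) are invented for illustration and are not copied from a real NIM server.

```shell
# bootfile_for HOST FILE: print the bf= (boot file) value of HOST's entry in FILE.
bootfile_for() {
    awk -F: -v host="$1" '
        $1 == host {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^bf=/) { print substr($i, 4); exit }
        }' "$2"
}

# Hypothetical sample entry written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# hypothetical entry, not copied from a real system
client1:bf=/tftpboot/client1:ip=10.31.192.10:ht=ethernet:sa=10.31.192.2:sm=255.255.240.0:
EOF
bootfile_for client1 "$sample"
rm -f "$sample"
```

On a NIM server, the file to inspect would be /etc/bootptab, and the name returned is the file that the client then requests with tftp.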
Boot process for AIX installation with NIM (2 of 2)

3 Configure devices         Using programs on NIM server
(Diagram: the client NFS-mounts the SPOT over en0 and accesses programs in the spot: ./usr
directory tree on the NIM server.)
4 Install system files      Backup archive is on NIM server
(Diagram: the client mounts the lppsource over en0 and accesses backup archives from the
lppsource: filesets on the NIM server.)
Figure 4-6. Boot process for AIX installation with NIM (2 of 2) AN153.0
Notes:
Start the boot script and configure devices that are needed for
installation
When booting over the network, the SPOT is mounted from the NIM server with the Network
File System (NFS).
Start installation script

When booting over the network, the installation script installs AIX from installation images that
are NFS mounted from the NIM server.

NIM objects
• NIM objects are stored in ODM
• Object classes
– Networks
– Machines
– Resources
• Group objects
– mac_group
– res_group
Figure 4-7. NIM objects AN153.0
Notes:
NIM is made up of various components, called objects. There are three classes of objects:
machines, networks, and resources.
All information about the NIM environment is stored in Object Data Manager (ODM) databases
on the NIM master system.
Network objects
Network objects are objects in the NIM database that represent information about each local
area network (LAN) that is part of the NIM environment. These objects and some of their
attributes reflect the physical characteristics of the network. NIM network objects are not used
to perform management tasks in the overall network environment; they are only used to
represent the physical network topology of the NIM environment. In other words, if something
changes in the physical network environment, you must also remember to change it in the NIM
database.
The types of networks that are supported by NIM are: Token-Ring, Ethernet, ATM, FDDI, HFI,
and generic. These network types are represented as network objects in the NIM environment.
Machine objects
Machine objects represent the machines in the NIM environment that are managed by NIM.
Resource objects
All operations on clients in the NIM environment require one or more NIM resources. NIM
resource objects represent the files, directories, and devices that are used to support each type
of NIM operation. Some resources are AIX filesets (or devices that contain filesets) that can be
installed on a client machine. Other resources are scripts or configuration files that are used in
the installation process.
The location and other attributes for these resources are stored as resource objects in the NIM
database.
Group objects
NIM supports two types of group objects:
- mac_group
A machine group is a group of machine objects. You can use a machine group to simplify
performing a NIM operation on multiple machines.
- res_group
A resource group is a group of resource objects. If you have a set of resources that you
typically want to use at the same time, you can create a resource group to simplify allocating
those resources.

Listing NIM objects and their attributes

• To list all defined NIM objects

# lsnim
master       machines    master
boot         resources   boot
nim_script   resources   nim_script
ent0         networks    ent
...
• To list attributes of a NIM object

# lsnim -l <object_name>
# lsnim -l ent0
ent0:
   class      = networks
   type       = ent
   Nstate     = ready for use
   prev_state = information is missing from this object's definition
   net_addr   = 10.31.192.0
   snm        = 255.255.240.0
   routing1   = default 10.31.192.1
Figure 4-8. Listing NIM objects and their attributes AN153.0
Notes:
The lsnim command is used to list various types of NIM information. You have the opportunity
to experiment with lsnim in the exercise.
Listing objects and attributes

When used without any argument, lsnim displays all the currently defined NIM objects.
Using the -l flag, you can get a long listing of an individual object.
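Because the default lsnim listing has three columns (object name, class, and type), it is easy to filter in a script. The sketch below works on sample lines copied from the visual so that it runs anywhere; the function name objects_of_class is an assumption.

```shell
# objects_of_class CLASS: read lsnim-style lines (name class type) on stdin
# and print the names whose second column equals CLASS.
objects_of_class() {
    awk -v class="$1" '$2 == class { print $1 }'
}

# Sample lines taken from the visual; on a NIM master you would pipe in lsnim.
lsnim_sample='master      machines   master
boot        resources  boot
nim_script  resources  nim_script
ent0        networks   ent'

printf '%s\n' "$lsnim_sample" | objects_of_class resources
```

On a NIM master, the equivalent pipeline would be lsnim | objects_of_class resources.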
NIM configuration
• Configure master
– Install master NIM file sets
– Run nimconfig
• Define resources
– Create real resource with full path
– Create resource object to represent
• Define networks
– How do clients on networks access the master?
• Define clients
– Able to relate network address of the client with object name
• Allocate resources to clients
– Different operations need different resources
• NIM operations on clients
– Setting up for operation
– Initiating operation
Figure 4-9. NIM configuration AN153.0
Notes:
Installing NIM
The NIM filesets that need to be installed on a machine that is designated to act as NIM master
are:
- bos.sysmgt.nim.client
- bos.sysmgt.nim.master
- bos.sysmgt.nim.spot
Configure master
Configuring the master machine consists of installing the master filesets and running
nimconfig. You must specify the primary network interface and a NIM network name for the
network that is attached to the primary interface. Several optional attributes can be specified.

nimconfig creates the NIM database and the /etc/niminfo configuration file. It also starts
the NIM daemon (nimesis) and creates an entry in /etc/inittab so that nimesis is started on
every boot of the master machine.
Create NIM objects

Next, you need to create the NIM objects:
- resources
Specify the directories and files that NIM needs.
- networks
The master’s primary network was configured with nimconfig. Some of the clients might
be connected to separate networks or subnets. You need to define these networks and
routes for the master to communicate with all the clients. Also, define routes for any servers
to communicate with their clients.
- clients
Specify the client machines that you are installing by using NIM.
Allocate resources
After the resource and machine objects are defined, you need to decide what operation you
want to perform on your client machine. Different resources are needed for each operation.
Next, you need to allocate the resources to your client. Allocation identifies which resource
objects are used to implement the client operation. There are two ways to allocate a resource:
- Use the nim -o allocate operation (or SMIT) to relate the resource to the machine
- Use SMIT, which prompts for the resources to allocate as part of the machine operation
definition
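The command-line form of allocation can be sketched as follows. The script defaults to a dry run that only prints the nim command, so it is safe to try on any system. The helper names run and allocate_resources, the DRY_RUN variable, and the client name client1 are assumptions; the resource names are the ones that appear on the lpp_source and SPOT visuals.

```shell
DRY_RUN=${DRY_RUN:-1}   # 1 = print the command only; set to 0 on a real NIM master

# run CMD ARG...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# allocate_resources CLIENT TYPE=NAME...
# Builds "nim -o allocate -a TYPE=NAME ... CLIENT".
allocate_resources() {
    client=$1
    shift
    n=$#
    while [ "$n" -gt 0 ]; do
        set -- "$@" -a "$1"   # append "-a TYPE=NAME" for the first remaining pair
        shift                 # then drop the bare pair from the front
        n=$((n - 1))
    done
    run nim -o allocate "$@" "$client"
}

allocate_resources client1 spot=spot71-03-01 lpp_source=aix71-03-01
```

Which resources an operation needs depends on the operation, so treat the spot and lpp_source pair here as one plausible combination rather than a fixed rule.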
Perform the operation on the client

Some operations that are performed on a client include installing an operating system, installing
maintenance, and supporting a maintenance boot or a diagnostic boot.
Two of the phases that are related to an operation are:
- The NIM setup in which the NIM server is configured to support the task you want to perform
on the client
- The initiation of that task
The task can be initiated from the client; or it can be initiated from the NIM master if the client is
configured as a NIM client.
Resource objects
• Object types
– boot          Represents the network boot image resource
– nim_script    Directory for customization scripts that are created by NIM
– spot          Shared Product Object Tree - equivalent to /usr file system
– lpp_source    Source device for software product images
– bosinst_data  Configuration file that is used during base system installation
– image_data    Configuration file that is used during base system installation
– mksysb        A mksysb image
– script        A user created script that is executed on a client to perform customization
– resolv_conf   Configuration file for name server information
– ... (additional resource types)
• Attributes
– location            Directory path
– server              Machine which serves this resource
– Rstate, prev_state  Status attributes
– ... (additional attributes)
Figure 4-10. Resource objects AN153.0
Notes:
Resources are the files and directories that NIM uses to install software on the clients.
Resource types
Resource types identify the different types of files that are used by NIM. For example:
- An lpp_source resource is a directory that contains the product images to be installed
- A spot resource contains the files that are used during the boot operation
- A script resource is a user definable script that can be used to customize a newly
installed client
- A mksysb resource is a backup image that can be used to install a client

Resource attributes

Attributes for resources identify where the resource can be found, its status, and so forth:
- location defines the directory path to the resource
- server identifies which machine serves the resource
- Rstate indicates whether a resource is available for clients to use
- prev_state indicates the previous value of Rstate
More resource types and attributes

There are a number of different resource types, each having its own set of attributes. lsnim is
probably the easiest way to get information about NIM attributes.
Resource objects: lpp_source

• lpp_source
– Directory containing software product images
– Supports NIM install operations (bos_inst and cust)
– Also used for creation of SPOT resource
• Defining an lpp_source:
# nim -o define -t lpp_source
      -a server=<machine>
      -a location=<directory>
      [ optional attributes ]
      <lppsource_name>
• # smit nim_mkres
(Diagram: lppsource directories aix71-03-01 and aix71-03-03 containing bos filesets.)
Figure 4-11. Resource objects: lpp_source AN153.0
Notes:
lpp_source
When a resource of this type is defined, it represents a directory in which software product
images are stored. lpp_source resources are used to support NIM installation operations. An
lpp_source can also be used as the source for the creation of a SPOT.
When you perform a NIM installation operation and allocate an lpp_source resource to the
client, NIM NFS mounts the lpp_source directory on the client. Then, it invokes the
installp command on the client to install from the directory. When installp finishes, NIM
automatically unmounts the resource.
simages attribute
This attribute is used to indicate that an lpp_source resource contains the set of installable
images to which NIM requires access to perform its basic functions. This basic set of images is
referred to as support images or simages. NIM automatically manages the use of this attribute
as part of the management of an lpp_source.

NIM adds this attribute to the definition of an lpp_source when it provides the required
simages, and NIM removes this attribute from the object's definition if a required image
becomes unavailable.
Some NIM operations require access to an lpp_source that has this attribute as part of its
definition, so having this attribute can be important. Perform the check operation on the
lpp_source to have NIM check to see whether the simages requirement was fulfilled. If it has,
NIM adds this attribute to the lpp_source definition.
Defining an lpp_source resource

You can use the command line or SMIT to define an lpp_source.
The visual shows how the required attributes would be specified on the command line.
Required attributes are:
- server=<machine>
NIM name for the machine that serves this resource.
- location=<directory>
Directory where the lpp_source files are located.
There are a number of optional attributes, including:
- source=<directory>
If you already have a directory that contains the software images, the source attribute is
not required. If you want NIM to create a directory and populate it for you, the source
attribute specifies the directory or device that contains the software images to be copied into
the lpp_source directory.
- packages=<package_list>
Use the packages attribute if you want NIM to copy only specific packages from the source.
The final argument is the name of the NIM object:
- <lppsource_name>
The last argument on the nim command line is the name of the object you are operating on.
In this case, it is the name of the lpp_source resource that is created.
More lpp_source information:
- If you add or remove an installable image from the lpp_source, perform the check
operation on that object so that NIM rebuilds the .toc (table of contents) file, which is in the
lpp_source directory. The installp command uses the .toc to determine which
images are available.
- Starting in AIX 5.3, there is an update operation, which updates an lpp_source resource
by adding and removing packages. Before this operation existed, you would copy packages
into an lpp_source directory or remove packages from it, and then run nim -o check to
update the lpp_source attributes. Alternatively, you could use SMIT to add packages to an
lpp_source through the smit nim_bffcreate fast path. However, this SMIT function does
not check to see whether the lpp_source is allocated or locked, nor does it update the
simages attribute when finished. The update operation was created to address this
situation.
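Putting the required and optional attributes together, the following sketch prints the command to define an lpp_source, followed by the check operation that the notes recommend. Nothing is executed. The function name lpp_source_cmds and the /export/nim path are assumptions chosen for illustration; the object name aix71-03-01 is taken from the visual.

```shell
# lpp_source_cmds NAME SERVER LOCATION [SOURCE]
# Print the nim commands to define an lpp_source and then check it.
lpp_source_cmds() {
    name=$1 server=$2 location=$3 src=${4:-}
    cmd="nim -o define -t lpp_source -a server=$server -a location=$location"
    # source= is needed only when NIM must copy the images into LOCATION
    if [ -n "$src" ]; then
        cmd="$cmd -a source=$src"
    fi
    printf '%s\n' "$cmd $name"
    printf '%s\n' "nim -o check $name"
}

# Directory already populated: no source attribute.
lpp_source_cmds aix71-03-01 master /export/nim/lpp_source/aix71-03-01

# Let NIM copy the images from installation media (device name is an example).
lpp_source_cmds aix71-03-01 master /export/nim/lpp_source/aix71-03-01 /dev/cd0
```

Printing the commands first makes it easy to review the attribute list before running it on the NIM master.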
Resource objects: SPOT

• SPOT
– /usr directory tree that is used during network boot
– Matching network boot images generated:
/tftpboot/<SPOT_name>.<Platform>.<Kernel>.<Network>
• Defining a SPOT
# nim -o define -t spot
      -a server=<machine>
      -a location=<directory>
      -a source=<lpp_source_name>
      [ optional attributes ]
      <SPOT_name>
(Diagram: an lppsource and the SPOT directories spot71-03-01 and spot71-03-03, each holding a
usr tree with bin, include, lib, and etc.)
Figure 4-12. Resource objects: SPOT AN153.0
Notes:
Components
• A /usr file system
A Shared Product Object Tree (SPOT) is a directory that contains AIX code that is equivalent in
content to the code that is in the /usr file system. The NIM SPOT creation process restores
files from AIX filesets into the SPOT directory.
The SPOT is NFS-mounted on a booting client to provide necessary device support for the boot
process.
• Boot image
As part of the creation of a SPOT resource, NIM also creates network boot images. The
network boot images are constructed in /tftpboot on the same machine in which the SPOT
is created. The boot images are constructed with code from the newly created SPOT. The boot
images are also sometimes called SPOT files. The boot image file is transferred to the client
system with the TFTP protocol, after the client obtains the boot file name through BOOTP.

Since one SPOT can potentially support several types of machines, several boot image files
can be created. The naming convention identifies each boot image as:
<spot_name>.<Platform>.<Kernel>.<Network>, where:
- <Platform> identifies which architecture this boot image supports: chrp, rspc, and so forth
- <Kernel> specifies whether this boot image contains a multi-processor (mp), 64-bit (64) or
uni-processor (up) kernel.
- <Network> identifies the network type: ent, tok, and so forth
These days, the only combinations that most administrators work with are chrp.mp.ent and
chrp.64.ent.
During a network boot, the boot image is transferred over the network and loaded into the
client’s memory.
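The naming convention can be expressed as a pair of small helpers: one composes the path under /tftpboot, and the other splits a path back into its four fields. The function names are assumptions; the SPOT name spot71-03-01 comes from the visual. The split works from the right-hand side so that a SPOT name containing dots would still be handled.

```shell
# boot_image_path SPOT PLATFORM KERNEL NETWORK
boot_image_path() {
    printf '/tftpboot/%s.%s.%s.%s\n' "$1" "$2" "$3" "$4"
}

# boot_image_fields PATH: print "SPOT PLATFORM KERNEL NETWORK".
boot_image_fields() {
    base=${1##*/}                            # drop the directory part
    network=${base##*.};  rest=${base%.*}    # last field: network type
    kernel=${rest##*.};   rest=${rest%.*}    # next: kernel type
    platform=${rest##*.}; spot=${rest%.*}    # next: platform; remainder: SPOT name
    printf '%s %s %s %s\n' "$spot" "$platform" "$kernel" "$network"
}

boot_image_path spot71-03-01 chrp mp ent
boot_image_fields /tftpboot/spot71-03-01.chrp.64.ent
```

This kind of parsing can be handy when checking which boot images exist in /tftpboot for a given SPOT.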
• /tftpboot
It is good practice to make /tftpboot a separate file system. As a separate file system, it
removes the risk of filling the root file system. If you are supporting multiple AIX versions on
multiple machine types or multiple network types, this directory can get large.
Defining a SPOT resource on the command line

The visual shows the nim syntax to define a SPOT. The -t flag identifies the type of object you
want to define. In addition, you must specify the following required attributes:
- server=<machine>
- location=<directory>
Directory (on the server) where the SPOT files are located.
- source=<lpp_source_name>
The source attribute points to the location of the files that are used to create the SPOT
resource. The resource can be an existing lpp_source resource, a device name (for
example: /dev/cd0) or a directory that contains the source filesets that are used to create
the SPOT. Most commonly, the lpp_source resource is created first and then the spot is
created from the lpp_source.
- <spot_name>
The last argument on the nim command line is the name of the object you are operating on.
In this case, the name of the SPOT resource that is created.
Optional attributes
There can be a number of optional attributes, including:
- installp_flags=<flags>
NIM calls installp to create the SPOT. By default, NIM uses the -agX flags when calling
installp. You can use installp_flags to specify the options you require.
- auto_expand={yes|no}
Indicates that file systems should be automatically expanded if more space is needed.
Defining a SPOT with SMIT

The visual shows the SMIT fast path for defining resource objects. SMIT opens with a window
where you can select which type of resource you want to define. After you select a resource
type, SMIT opens a window with the necessary fields to specify the resources and attributes for
that type of object, in this case, a SPOT.

Resource objects: mksysb

IBM Power Systems
• mksysb
– Identifies a mksysb system backup image file
– Used for bos_inst operations
• Defining a mksysb
# nim -o define -t mksysb
-a server=<machine>
-a location=<mksysb_path>
[ optional attributes ]
<mksysb_name>
Figure 4-13. Resource objects: mksysb AN153.0
Notes:
mksysb
A mksysb resource represents a system backup image file that is created by using the mksysb
command. A mksysb resource can be used as the source of the BOS runtime files when a
bos_inst is performed.
Defining a mksysb resource

You can use the command line or SMIT to define a mksysb. You can use an existing mksysb
image, or you can have nim create one for you. (nim calls mksysb to create the new backup.)
Required attributes are:
- server=<machine>
- location=<mksysb_path>
If the system backup image exists, enter the file name of the image. If you are creating the
system backup image as part of this operation, enter the name of the file that you want to
create.
There are a number of optional attributes, including:
- mk_image={yes|no}
If the backup file exists, specify no (the default). If you want nim to create a new backup file,
specify yes.
- source=<machine_name>
If you want nim to create a backup image for you, specify the NIM name of the machine you
want to back up.
- mksysb_flags=<value>
You can use this attribute to specify optional flags for the mksysb command, if needed.
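The two cases described above (an existing image, or an image that nim creates from a client) can be sketched as follows. The function only prints the command. The names master, lpar1, lpar1_mksysb, and the image path are hypothetical.

```shell
# Print (do not run) a nim command that defines a mksysb resource.
# Arguments: server, image path, resource name, [NIM client to back up].
# With a fourth argument, nim is asked to create the image (mk_image=yes)
# from that client; without it, the image file must already exist.
mksysb_define_cmd() {
    cmd="nim -o define -t mksysb -a server=$1 -a location=$2"
    if [ -n "${4:-}" ]; then
        cmd="$cmd -a mk_image=yes -a source=$4"
    fi
    printf '%s %s\n' "$cmd" "$3"
}

# Existing backup image (hypothetical path and names).
mksysb_define_cmd master /export/mksysb/lpar1.mksysb lpar1_mksysb
# New backup image created from the NIM client lpar1.
mksysb_define_cmd master /export/mksysb/lpar1.mksysb lpar1_mksysb lpar1
```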

Network objects
IBM Power Systems
• Object types
– ent Ethernet network
– fddi FDDI network
– tok Token ring network
– atm ATM network (no network boot capability)
– hfi Host fabric interface network
– generic Generic network (no network boot capability)
• Attributes
– net_addr Network address for a network
– snm Subnetmask for a network
– routing<X> Routing information for a network
– Nstate, prev_state Status attributes
Figure 4-14. Network objects AN153.0
Notes:
To perform certain NIM operations, the NIM master must be able to supply information
necessary to configure client network interfaces. The NIM master must also be able to verify
that client machines can access all the resources that are provided by the NIM server. To avoid
the extra work of repeatedly specifying network information for each individual client, NIM
network objects are used to represent the networks in a NIM environment.
Network types
NIM supports the network types that are shown in the visual, plus a generic type. Network boot
support is provided for Ethernet, Token-Ring, FDDI and HFI. Network boot operations are not
supported on ATM or generic networks. NIM supports both standard Ethernet and IEEE 802.3
Ethernet networks.
Network attributes
Network attributes include the network address, subnet mask, routes, and status. The Nstate
attribute indicates whether the object definition of the network is complete. NIM requires that all
networks be able to communicate with the NIM master, either with the master directly
connected to them or by having a NIM route to a network to which the master connects.
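The net_addr value of a network object is the network address, which follows from any host address on that network and the subnet mask (snm). As an illustrative sketch (not part of NIM itself), the following shell function derives it by ANDing the two values octet by octet. The addresses are example values.

```shell
# Compute a network address from an IPv4 address and a subnet mask.
# The function body runs in a subshell so the IFS change stays local.
net_addr() (
    IFS=.
    # Split both dotted-decimal arguments into eight positional parameters.
    set -- $1 $2
    printf '%d.%d.%d.%d\n' "$(($1 & $5))" "$(($2 & $6))" "$(($3 & $7))" "$(($4 & $8))"
)

net_addr 192.168.1.57 255.255.255.0
```

For the example values, the result is the address that would be given as net_addr, with 255.255.255.0 as snm.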
Routing
NIM routing information represents standard TCP/IP routing information for the networks that
are part of a NIM environment. This information defines the gateways that are used to establish
communication between the master machine and the clients.
The routing<X> attribute defines a route and includes:
- A destination (default or a NIM network name)
- A gateway address
If needed, multiple routes can be created and are numbered routing1, routing2, and so forth.
More attributes
There are a number of other attributes for each network object. lsnim is probably the easiest
way to get information about NIM attributes.
Other network information

The ring_speed (for token-ring) and cable_type (for Ethernet) are not attributes of the
network objects. They are attributes of the machine objects.

Machine objects
IBM Power Systems
• Object types
– master
– standalone
– diskless
– dataless
• Attributes
– platform Architecture
– netboot_kernel up or mp
– if<X> Network interface information
– serves Resource served by this machine
– Cstate, prev_state, Mstate Status attributes
Figure 4-15. Machine objects AN153.0
Notes:
NIM supports four types of machines: the master type and three types of clients: standalone,
diskless, and dataless.
Master
The master machine is defined by installing the master fileset, and then performing some quick
configuration. There can be only one master in the NIM environment. After a machine is defined
as the master, it can participate in NIM operations.
Standalone clients
Standalone clients have local disk resources. They are installed from the NIM server, but after
installation, they boot and operate from their local disks.
Diskless clients
Diskless clients have no disks of their own. They run entirely by using resources from the NIM
server.
Dataless clients
Dataless clients use a local disk only for paging space and the /tmp and /home file
systems. All of the other storage is provided over the network by the NIM server.
Machine attributes
Each machine object is one of the four machine object types. Additionally, machine
objects store other attributes about the machine. The visual shows a few of them:
- The platform attribute describes the machine architecture (chrp, rspc, and so forth).
- netboot_kernel indicates which type of kernel is required, uni-processor (up),
multi-processor (mp), or 64-bit kernel (64).
- if<X> is used to provide information about a machine’s network interfaces. If there are
multiple interfaces, they are numbered: if1, if2, and so forth. This attribute includes the
NIM network this interface connects to, the host name, the MAC address, and the network
type.
- The serves attribute identifies resources served by this machine. If the machine serves
several resources, there is a serves attribute for each resource.
- Cstate indicates the NIM operation that is being performed on a machine or that no NIM
operations are currently being performed.
- prev_state shows the previous Cstate.
- Mstate shows the execution state for a machine.
Note
NIM attempts to keep the value of this attribute synchronized with the machine's execution state,
but NIM does not guarantee its accuracy. Perform the check operation on the machine for NIM to
attempt to determine the machine's execution state.
More attributes
There are a number of other attributes for each machine object. lsnim is probably the easiest
way to get information about NIM attributes.
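When scripting around lsnim, it is often enough to pull a single attribute out of the long listing. The following is a minimal sketch that assumes the usual "name = value" stanza layout of lsnim -l output. The sample text is illustrative only and is not captured from a real system.

```shell
# Extract one attribute value from "lsnim -l"-style text on stdin.
# Intended usage on a NIM master: lsnim -l <object> | nim_attr <attribute>
nim_attr() {
    awk -v key="$1" '
        {
            pos = index($0, "=")
            if (pos == 0) next                 # skip lines without "="
            name = substr($0, 1, pos - 1)
            value = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)  # trim surrounding blanks
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == key) print value
        }'
}

# Illustrative sample; real output contains more attributes.
sample='lpar1:
   class          = machines
   type           = standalone
   platform       = chrp
   netboot_kernel = mp
   if1            = net1 lpar1 0 ent0
   Cstate         = ready for a NIM operation'

printf '%s\n' "$sample" | nim_attr Cstate
```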

Defining a machine object

IBM Power Systems
• # nim -o define -t standalone -a platform=<PlatformType>

-a netboot_kernel=<NetbootKernelType>
-a if1=<InterfaceDescription>
-a net_definition=<DefinitionName>
-a cable_type1=<TypeValue>
<MachineName>
• Examples:
# nim -o define -t standalone -a if1="network1 lpar1 0 ent0"
-a cable_type1="N/A" -a connect=nimsh
-a platform=chrp -a netboot_kernel=mp lpar1
# smit nim
Perform NIM Administrative Tasks
Manage Machines
Define a Machine
<provide hostname of client>
Figure 4-16. Defining a machine object AN153.0
Notes:
Follow these steps to add a client and its network information with SMIT:
1. On the NIM master, add a standalone client to the NIM environment by using SMIT
(nim_mkmac is the fast path).
2. Specify the host name of the client.
The client host name is the name that the IP address of this machine's installation adapter
resolves to. By default, this name also becomes the host name of this client when
the client is installed. If using DNS, enter the fully qualified host name here. For example,
lpar1.my.company.com.
3. The next SMIT screen that is displayed depends on whether NIM already has
information about the client's network. Supply the values for the required fields or accept
the defaults. Use the help information and the LIST option to help you specify the correct
values to add the client machine.
For example, the command line might look like:

# nim -o define -t standalone -a if1="net1 lpar1 0 ent0" lpar1
The quoted if1 value in the example has multiple space-delimited fields, as follows:
• net1 is the network object name
• lpar1 is the host name
• 0 is the placeholder for the MAC address
• ent0 is the physical adapter that is used by the client to reach the master
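The field layout listed above can be checked with a short sketch that splits an if1 value and labels each field. The labels (network, host, mac, adapter) are chosen here for illustration and are not NIM keywords.

```shell
# Split an if1 value into its four space-delimited fields.
parse_if1() {
    # Intentionally unquoted so that the shell splits the value on blanks.
    set -- $1
    printf 'network=%s host=%s mac=%s adapter=%s\n' "$1" "$2" "$3" "$4"
}

parse_if1 "net1 lpar1 0 ent0"
```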
If using SMIT, the sequence of menu items to the matching dialog panel would be:
# smit nim
Perform NIM Administrative Tasks
Manage Machines
Define a Machine
<provide hostname of client>
The resulting dialog panel is shown in the next visual.

Define a client using SMIT

IBM Power Systems
Define a Machine
* NIM Machine Name [lpar1]
* Machine Type [standalone] +
* Hardware Platform Type [chrp] +
Kernel to use for Network Boot [mp] +
Communication Protocol used by client [nimsh] +
Primary Network Install Interface
* Cable Type N/A +
Network Speed Setting [] +
Network Duplex Setting [] +
* NIM Network network1
* Host Name lpar1
Network Adapter Hardware Address [0]
Network Adapter Logical Device Name [ent0]
IPL ROM Emulation Device [] +/
CPU Id []
Machine Group [] +
Comments []
Figure 4-17. Define a client using SMIT AN153.0
Notes:
NIM machine name/hostname

There are two names that are given to your client: a NIM name and a host name. The NIM
name is what is used when performing operations on this client. The host name becomes the
system-wide host name of this client and is also the name that is associated with the client's
adapter that NIM uses to do the client installation. In this example, a short name was used on
the prior panel. Hence, the NIM name and host name are identical. If a long name was used on
the prior panel, then the long name would be used for the host name and the short name for the
NIM Name. For example, if lpar1.my.company.com was used on the prior panel, then the
host name would be lpar1.my.company.com and the NIM name would be lpar1.
Machine type
The standalone machine type is the only type that is used now.
Hardware platform type

You can choose between chrp, rspc, or the old classical rs6k. Because the chrp architecture has
been available since the mid-1990s, most systems use it today. To check which architecture
your client is using, run the command:
getconf -a | grep MACHINE_ARCHITECTURE
On older AIX release levels, try the bootinfo -p command.
Kernel type
If a client machine is running the 64-bit kernel, then mp or 64 should be chosen. However, if the
client is running the 32-bit kernel, either the up or mp kernel can be chosen. To determine which
kernel the client is running, run the ls -l /usr/lib/boot/unix command and notice whether it
is linked to the 64, up, or mp kernel in that same directory. Also, the getconf -a command can
be run to determine whether the machine can run an mp kernel. An MP_CAPABLE setting of 1 means
yes. On older releases, run the bootinfo -z command to find out whether the machine can
handle mp. A setting of 1 again means yes. Starting with version 6.1, AIX uses only a 64-bit
kernel.
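The link check described above can be scripted. The following sketch maps the target of the /usr/lib/boot/unix link to a netboot_kernel value. It assumes that the kernel files are named unix_64, unix_mp, and unix_up; verify the names on your own release before relying on it.

```shell
# Map the target of the /usr/lib/boot/unix link to a netboot_kernel value.
# Argument: link target path or file name (assumed names: unix_64, unix_mp, unix_up).
kernel_from_link() {
    case "${1##*/}" in
        unix_64) echo 64 ;;
        unix_mp) echo mp ;;
        unix_up) echo up ;;
        *)       echo unknown; return 1 ;;
    esac
}

# On a client, the argument could be taken from the last field of:
#   ls -l /usr/lib/boot/unix
kernel_from_link /usr/lib/boot/unix_64
```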
Communication protocol
Either the less secure remote shell protocol (rsh) or the newer NIM service handler protocol
(nimsh) can be used. nimsh is available in AIX 5.3 and later versions of AIX.
Note
Each client can have a different setting.
Cable type
Most configurations today are set to N/A (not applicable), because modern adapters autosense
the connection type or support only a single type (such as twisted pair or fiber). To check the
cable type, run the lsattr -El entX command and see whether a cable_type attribute is
listed. If not, then setting it to N/A should work. If running twisted-pair cable, then setting it
to tp should work.
Network speed/duplex
These settings are only used when performing a push boot operation on the client. If not set, the
current SMS speed/duplex settings for your installation adapter are used.
NIM network
The NIM network is the network to which the client is assigned.

Hardware address

The hardware address is the MAC address of the client. It is only needed for BOOTP broadcast
operations. This MAC address, if ever needed, can be retrieved by looking at your client's
Remote IPL SMS menus.
Logical device name

The logical device name is the name of the physical network adapter over which you plan to
install. For example, it might be ent0 or ent1. This adapter receives the host name in the Host
Name field when the client is installed.
IPL ROM emulation

The IPL ROM emulation is only set for machines that do not support network boot. See online
documentation for details.
CPU_ID
The CPU_ID is the machine ID that is returned by running the uname -m command on the client. It
is used to uniquely identify this client in the future. You do not have to set the CPU_ID; NIM
configures it.
Machine group
You can assign a client to a machine group.
Command line
The equivalent NIM command for the operation is:
# nim -o define -t standalone -a if1="network1 lpar1 0 ent0"
-a cable_type1="N/A" -a connect=nimsh
-a platform=chrp -a netboot_kernel=mp lpar1
For more information, use the lsnim -q define -t standalone command or the nim
man page.
NIM operations
IBM Power Systems
• Operations on clients
– bos_inst
• rte
• mksysb
– cust
– maint
– diag
– maint_boot
• Procedure
– Allocate resources to clients (for intended operation)
– Perform operation
– Deallocate resources
• Other NIM object operations
– define, change, remove, allocate, deallocate, maint,
lslpp, lppchk, check, and so forth
Figure 4-18. NIM operations AN153.0
Notes:
Operations on clients
NIM supports several different types of operations to install and manage software on NIM
clients. In addition, there are operations to manage the NIM objects themselves.
The client operations are:
- bos_inst
Installs AIX on a client.
- cust and maint
Updates and maintains AIX software.
- diag
Prepares resources for a client to be network-booted into diagnostics mode.
- maint_boot
Boots a client to maintenance mode over the network.

bos_inst
A bos_inst operation is used to perform a Basic Operating System (BOS) installation on a
client. There are two types of bos_inst operations: rte and mksysb.
bos_inst: rte installations

An rte installation instructs the BOS installation process to install AIX from the images in the
lpp_source resource that is specified for the operation.
The default bos_inst operation is rte (runtime environment).
bos_inst: mksysb installations

A mksysb bos_inst operation installs the client from a mksysb resource. A mksysb resource
is a system backup image that is created by using the mksysb command (or the SMIT or
WebSM interfaces to the mksysb command).
Installing a system from backup reduces, and often eliminates, repetitive installation and
configuration tasks. For example, a backup installation can copy optional software that is
installed on the source system, in addition to the Base Operating System. The backup image
also transfers many user configuration settings.
If you have many clients with the same software configuration, you can use one mksysb image
as the source to install all of them.
bos_inst customization
The NIM installation process allows you to run a customization script after AIX is installed on the
system. To run a script, allocate a script resource to the client before performing the
bos_inst. That script can be used to perform such customization as setting passwords,
changing network addresses, and so forth.
cust
This NIM operation performs software customization on a running NIM client. You can use the
cust operation to:
- Update existing software
- Install more software
- Run a customization script
maint
This NIM operation performs software maintenance operations on clients, such as committing
applied software and removing software.
diag
This NIM operation enables the client to boot to diagnostics over the network.
maint_boot
This operation enables the client to boot to maintenance mode over the network.
Procedure for operations

To perform a NIM operation on a client machine, a number of steps must be performed:
1. Allocate the required resources to the client machine.
- Allocating the resource makes the resources available to the client. You can
explicitly allocate the resources before you perform the NIM operation, or you can
allocate the resources at the same time you perform the NIM operation.
- Allocation usually involves NFS exporting the resource’s directory so the client can
NFS mount it over the network.
- The initial boot image is transferred by using tftp. To provide this network boot
image, an entry is created in the /etc/bootptab file and files are created in the
/tftpboot directory.
2. Perform the operation.
3. Deallocate resources.
- While a resource is allocated to a client, the resource is locked to block any
changes. After the operation completes, the resources should be deallocated from
the machine so they can be freed again for updates or changes.
Other NIM object operations

In addition to operations that directly affect NIM clients, a number of NIM operations can be
used to manage NIM objects. In addition to the obvious (define, change, remove,
allocate, and deallocate), you can also:
- Update or add software to a spot or lpp_source resource.
(cust operation)
- Perform software maintenance on a spot or lpp_source resource.
(maint operation)
- List LPP information in a resource.
(lslpp operation)
- Verify software packages in a spot or lpp_source resource.
(lppchk operation)
- Check the status of a NIM object.
(check operation) The actual tasks performed by the check operation differ depending on
which type of object you are operating on.

bos_inst operation
IBM Power Systems
• Command line
# nim -o bos_inst
-a lpp_source=<lpp_res_name>
-a spot=<SPOT_name>
-a source={rte|mksysb}
-a mksysb=<mksysb_name>
-a boot_client={yes|no}
[optional attributes]
<client_name>
• # smit nim_bosinst
Figure 4-19. bos_inst operation AN153.0
Notes:
bos_inst
Configuring NIM to perform a bos_inst can be done from the command line or through SMIT.
There are two steps: allocating resources to the client and enabling the bos_inst. It is also
possible to combine these steps into one command:
# nim -o bos_inst -a lpp_source=<lpp_res_name> -a spot=<spot_name>
[additional resources] [-a source={rte|mksysb}] [additional attributes]
<client_name>
If you use SMIT to enable a bos_inst, SMIT opens a series of windows to prompt you for the
required information and then displays a window where you can set more optional attributes.
Required information
The required information for a bos_inst operation is:
- <client_name>
The last argument specifies the NIM object that you want to operate on. In this case, the
NIM object is the target client machine that you want to install.
- spot=<spot_name>
Specifies the SPOT resource that you want to use.
- lpp_source=<lpp_res_name>
Specifies the lpp_source resource that you want to use for the installation. In
AIX 5.3 and later, this attribute is not required for a mksysb installation.
Optional information
Optional attributes include:
- source={rte|mksysb}
mksysb=<mksysb_name>
If you do not specify the source attribute, nim performs an rte bos_inst. If you set
source=mksysb, then you must use the mksysb attribute to specify the name of the mksysb
resource you want to use.
Note
In most cases, you must still include an lpp_source resource, even if you are doing a mksysb
installation. If a mksysb is created that includes all devices, you do not need to specify an
lpp_source.
- boot_client={yes|no}
When set to yes, the master attempts to reboot the client machine automatically for
reinstallation. For this option to succeed, the client must be running and initialized as a NIM
client or have rhosts permissions that are granted to the master. If set to no, the server is
configured to support the network boot. The actual boot would need to be initiated later.
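The attribute rules above (source defaults to rte, and source=mksysb requires the mksysb attribute) can be sketched as a small command builder. The function only prints the command. It always passes lpp_source and sets boot_client=no, which are choices made for this sketch, and the resource names are hypothetical.

```shell
# Print (do not run) a nim bos_inst command.
# Arguments: client, SPOT, lpp_source, [source: rte or mksysb], [mksysb resource].
bos_inst_cmd() {
    client=$1
    spot=$2
    lpp=$3
    src=${4:-rte}
    image=${5:-}
    cmd="nim -o bos_inst -a spot=$spot -a lpp_source=$lpp -a source=$src"
    case "$src" in
        rte) ;;
        mksysb)
            if [ -z "$image" ]; then
                echo "source=mksysb requires a mksysb resource name" >&2
                return 1
            fi
            cmd="$cmd -a mksysb=$image" ;;
        *)
            echo "source must be rte or mksysb" >&2
            return 1 ;;
    esac
    # boot_client=no only prepares the master; the client is booted later.
    printf '%s -a boot_client=no %s\n' "$cmd" "$client"
}

# Hypothetical resource names.
bos_inst_cmd lpar1 spot_aix71 lpp_aix71 rte
bos_inst_cmd lpar1 spot_aix71 lpp_aix71 mksysb lpar1_mksysb
```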

More information about NIM

IBM Power Systems
• Documentation
– NIM from A to Z in AIX 5L
(http://www.redbooks.ibm.com/ )
– AIX Version 7.1 Installation and migration guide
• IBM training class (AN220)

– AIX Network Installation Manager (NIM)
(http://www.ibm.com/services/learning )
• EZ NIM
– nim_master_setup
– nim_client_setup
Figure 4-20. More information about NIM AN153.0
Notes:
More information about NIM

NIM is a powerful tool; it can be used in many different ways.
This topic introduced some basic NIM concepts and terminology. If you plan to use NIM in your
cluster, you should get more information so that you can use NIM most effectively.
Documentation and Redbooks

The following books provide in-depth information about using NIM:
- AIX Version 7.1 Installation and migration
- SG24-7296 NIM from A to Z in AIX 5L (Redbooks: http://www.redbooks.ibm.com/)
Classes
You should also consider the following class.
- AN220 - AIX Network Installation Management (NIM)
(IBM Learning Services training course:
http://www.ibm.com/services/learning/index.html)

Additional topics in NIM course

IBM Power Systems
• Push operations and unattended installations
• lpp_source and SPOT management issues
• Problem determination
• Customization scripts
• Resource creation (lpp_source, mksysb) options
• Group definitions
• Client software maintenance and bundles
• Alternate disk migration
• Security and networking issues
• NIM based backup, recovery, and cloning
Figure 4-21. Additional topics in NIM course AN153.0
Notes:
Checkpoint
IBM Power Systems
1. True or False: NIM can be used to fix an LPAR that fails to boot because of a problem with
the /etc/inittab.
2. True or False: The lsnim command can be used to display information about NIM objects.
3. True or False: A NIM client cannot be a resource server.
4. True or False: An lpp_source resource contains software to be installed.
Notes:

Exercise: Basic Network Installation Manager
configuration
IBM Power Systems
• Configure an LPAR to be a NIM master and server
• Define a NIM client machine and set up for a BOS installation
Figure 4-23. Exercise: Basic Network Installation Manager configuration AN153.0
Notes:
Unit summary
IBM Power Systems

Notes:

Unit 5. System initialization: Accessing a boot image

This unit describes the boot process up to the point of loading the boot logical
volume. It describes the content of the boot logical volume and how it can be
re-created, if it is corrupted.

• Describe the boot process through to the loading of the boot logical
volume
• Describe the contents of the boot logical volume
• Re-create the boot logical volume on a system that fails to boot
• Adjust the bootlist for the wanted order of search

Accountability:
• Lab exercise
References
Online AIX Version 7.1 Operating system and device management
Unit objectives
IBM Power Systems

• Describe the boot process through to the loading of the boot
logical volume
• Re-create the boot logical volume on a system that fails to
boot
Notes:

5.1. System startup process
How does a Power server or LPAR boot?

IBM Power Systems
Boot step                                   Possible failures
Check and initialize the hardware (POST)    Hardware error (only for physical server power-on)
Locate boot image using the boot list       Unable to find any boot image
Load and pass control to boot image         Boot image that is corrupted
Start AIX software initialization
Figure 5-2. How does a Power server or LPAR boot? AN153.0
Notes:
Check and initialize hardware (POST)

After powering on a machine, the hardware is checked and initialized. This phase is called the
Power On Self-Test (POST). The goal of the POST is to verify the functions of the hardware.
Locate and load the boot image

After the POST is complete, a boot image is located from the bootlist and is loaded into
memory. During a normal boot, the location of the boot image is usually a disk. Besides disks,
the boot image can be loaded from tape, CD-ROM/DVD, or the network, which is typically the
case when booting into maintenance mode. If working with the Network Installation Manager
(NIM), the boot image is loaded over the network.
To use a different boot location, you must invoke the appropriate bootlist by pressing function
keys during the boot process. There is more information on bootlists, later in the unit.

Last steps

Passing control to the operating system means that the AIX kernel (which was loaded from the
boot image) takes over from the system firmware that was used to find and load the boot image.
The operating system is then responsible for completing the boot sequence. The components of
the boot image are discussed later in this unit.
All devices are configured during the boot process. The configuration is done in different phases
of the boot by the cfgmgr utility.
Towards the end of the boot sequence, the init process is started and processes the
/etc/inittab file.
Loading of a boot image

IBM Power Systems
[Visual: Firmware; boot devices (1) CD-ROM/DVD, (2) Disk, (3) Network; Boot Logical Volume (hd5)
on hdisk0; boot controller; RAM]
Figure 5-3. Loading of a boot image AN153.0
Notes:
Introduction
This visual shows how the boot logical volume is found during the AIX boot process. Machines
use one or more bootlists to identify a boot device. The bootlist is part of the firmware.
Bootstrap code
Power Systems can manage several different operating systems. The hardware is not bound to
the software. System firmware reads the boot list to locate the boot device.
The Open Firmware's load method loads the AIX boot image. It reads the boot image as a
whole from the boot device. Then, the SOFTROS code (aixmon_chrp) processes the loaded
boot image, uncompresses it, and relocates it to a different region.
The Open Firmware locates the boot image by using the Partition Table Entries (PTE) on the boot
disk. The PTEs describe the location and size of the boot image on the disk.

Compression of boot image

To save disk space, the boot image is compressed on the disk. During the boot process, the
boot image is uncompressed and the AIX kernel gets boot control.
Boot disk and the boot logical volume

IBM Power Systems
[Visual: layout of the boot disk: boot record; VGDA; Boot Logical Volume (BLV - hd5) containing
SOFTROS (aixmon_chrp), compressed kernel, compressed RAM file system, and base ODM; rest of the
root disk (hd2, hd4, hd9var, and so forth)]
Figure 5-4. Boot disk and the boot logical volume AN153.0
Notes:
Contents of the boot logical volume

The boot logical volume contains four components: the SOFTROS, a compressed AIX kernel, a
mountable compressed RAM file system, and a reduced ODM.
Boot record
The boot record is not used to decide how to load the boot image. Instead, the Partition Table
Entry (PTE) table and the ELF header of the boot image determine how the image is loaded into
memory.
Boot logical volume

The Boot Logical Volume (BLV) is a logical volume on the boot disk, which contains the boot
image. The BLV is part of rootvg and has a logical volume type attribute of boot. The logical
name is typically hd5, but can be anything. The boot image includes:

- SOFTROS
- Compressed kernel
- Compressed RAM file system
- Base ODM
SOFTROS
The SOFTROS program, aixmon_chrp, processes the loaded boot image and uncompresses
the compressed kernel and compressed RAM file system. It then relocates the boot image to a
different region.
Kernel
The kernel initializes itself and then runs /etc/init in the RAM file system. The RAM file
system version of init is a specialized version (/usr/lib/boot/ssh on the boot disk root file
system) and is used in phases 1 and 2 of the AIX initialization process.
Note: The kernel that is loaded from the boot logical volume is never replaced during the boot
process; the same kernel is used in multiuser mode. If you need a new kernel, you must
re-create the boot logical volume with the new kernel.
RAM file system

The RAM file system is a reduced or miniature root file system that is loaded into memory and
used as if it were a disk-based file system. The RAM file system is used during phases 1 and 2
of the AIX initialization. The contents of the RAM file system are slightly different depending on
the type of system boot:
Type of boot                            Contents of RAM file system
Boot from system hard disk              Programs and data necessary to access rootvg and start
                                        up the rest of AIX.
Boot from the Installation CD or DVD    Programs and data necessary to install AIX or do software
                                        maintenance.
Boot from Diagnostics CD or DVD         Programs and data necessary to run standalone
                                        diagnostics.
Base ODM
The boot logical volume contains a reduced copy of the ODM. During the boot process, many
devices are configured before hd4 is available. For these devices, the corresponding ODM files
must be stored in the boot logical volume.
Booting from other devices

It is also possible to boot from a CD, DVD, a tape, or the network.
Boot device     Description
Disk            When booting from a hard disk, the boot image is placed into a separate logical
                volume (the BLV).
CD/DVD, Tape    When booting from CD, DVD or tape, the boot image is included at the beginning of
                the media, but is not in a separate logical volume.
Network         When booting from the network, the boot image is loaded into memory over the
                network (located by using the bootp protocol and transferred by using tftp) from
                a file system on the server.
The focus in this lesson is booting from a hard disk.

5.2. Unable to find boot image
Working with bootlists

IBM Power Systems
• Normal bootlist
# bootlist -m normal hdisk0 hdisk1
# bootlist -m normal -o
hdisk0 blv=hd5 pathid=0
• Customized service bootlist (numeric 6 key)

# bootlist -m service -o
• Default bootlist (numeric 5 key)

> Hard coded in firmware
cd0
hdisk0 blv=hd5
ent0
Figure 5-5. Working with bootlists AN153.0
Notes:
Introduction
You can use the bootlist or diag command from the command line to change or display the
bootlists. You can also use the System Management Services (SMS) programs. SMS is
covered later in this unit.
bootlist command
The bootlist command is the easiest way to change the bootlist. The first example shows
how to change the bootlist for a normal boot. In this example, the system can be booted from
either hdisk0 or hdisk1. To query the bootlist, you can use the bootlist -o option.
The blv=hd5 part of the bootlist entry is to identify which boot logical volume to use on that
listed disk.
The second example shows how to display the customizable service bootlist.
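When checking why a system cannot find a boot image, it can help to confirm that a particular disk is actually in the bootlist. The following is a minimal sketch that reads bootlist -o style output on standard input. The sample line is taken from the visual, and the function name is made up for this example.

```shell
# Succeed if device $1 appears as the first field of any line of
# "bootlist -m <mode> -o" output read from stdin.
in_bootlist() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }'
}

# Intended usage on AIX: bootlist -m normal -o | in_bootlist hdisk1
sample='hdisk0 blv=hd5 pathid=0'
if printf '%s\n' "$sample" | in_bootlist hdisk1; then
    echo "hdisk1 is in the bootlist"
else
    echo "hdisk1 is not in the bootlist"
fi
```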

With the bootlist command, you can also specify the IP parameters to use when specifying a
network adapter. For example:
# bootlist -m service ent0 gateway=192.168.1.1 bserver=192.168.10.3
client=192.168.1.57
Using the service bootlist in this way, you can boot to maintenance or diagnostics mode by using
a NIM server without having to use SMS to specify the network adapter as the boot device.
Types of bootlists
The normal bootlist is used during a normal boot.
The default bootlist (hardcoded in the firmware) is used when numeric 5 is pressed during the
boot sequence.
Most machines, in addition to the default bootlist and the customized normal bootlist, allow for a
customized service bootlist. The service bootlist is set by using mode service with the
bootlist command. The service bootlist is used when the numeric 6 key is pressed during
boot.
Here is a list that summarizes the boot modes and the manual keys that are associated with
them:
• Numeric 1: Start an SMS (System Management Services) mode boot.
• Numeric 5: Start a service mode boot that uses the default service bootlist.
The default service bootlist is:
cd0
hdisk0 blv=hd5
ent0
• Numeric 6: Start a service mode boot that uses the customized service bootlist.
Behavior can vary among Power Systems models. Refer to the documentation for your specific model at:
http://ibm.com/support/knowledgecenter. Look for your model under Power Systems.
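The key summary above can be written as a small lookup. This is only an illustrative sketch: boot_key_action is a hypothetical helper, and the keys themselves can differ by model and firmware level.

```shell
#!/bin/sh
# Sketch: map the numeric key pressed during boot to the resulting action,
# following the summary list in this unit. Keys can vary by firmware.
boot_key_action() {
    case "$1" in
        "") echo "normal boot, normal bootlist" ;;
        1)  echo "SMS mode boot" ;;
        5)  echo "service mode boot, default bootlist" ;;
        6)  echo "service mode boot, customized service bootlist" ;;
        *)  echo "not covered in this summary" ;;
    esac
}

boot_key_action 6
```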
AIX 7: Bootlist pathid enhancements
• The bootlist command now allows specification of the pathid of a
device when setting the bootpath:
# bootlist -m normal hdisk0 blv=hd5 pathid=0
• The pathid argument can be repeated for multiple paths in the desired
order:
# bootlist -m normal hdisk0 blv=hd5 pathid=0 pathid=1
or
# bootlist -m normal hdisk0 blv=hd5 pathid=0,1
• The bootlist command now shows the pathid with the device:
# bootlist -m normal -o
Figure 5-6. AIX 7: Bootlist pathid enhancements AN153.0
Notes:
The pathid argument gives you the ability to operate at the level of individual paths. In the past, you had
to selectively delete and reconfigure device paths to generate bootlists on systems with MPIO
disks. The operation can now be done with a single command.
There were situations where the bootlist was too long. When the bootlist specifies disks without
any pathid restriction, each path takes an entry in the bootlist. The bootlist has a limited
capacity. Exceeding that capacity can leave no room for an alternate disk. Use of the
pathid specification can avoid this type of problem.
It is important to remember that the bootlist command preserves the order in which paths are specified.
If you want the bootlist to be set to boot from paths 1, 0, and 2, use the pathid=1,0,2
argument.
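Because path order matters, a script that builds the argument should keep the order it is given. The sketch below assumes a hypothetical helper named pathid_arg (not part of AIX) and only composes the command text; it does not run bootlist.

```shell
#!/bin/sh
# Sketch: build a pathid argument that preserves the caller's path order.
# pathid_arg is a hypothetical helper, not part of AIX.
pathid_arg() {
    list=""
    for id in "$@"; do
        # Append with a comma separator, keeping the given order.
        list="${list:+$list,}$id"
    done
    printf 'pathid=%s\n' "$list"
}

# Compose (but do not run) the command for paths 1, 0, and 2 on hdisk0.
echo "bootlist -m normal hdisk0 blv=hd5 $(pathid_arg 1 0 2)"
```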

Starting System Management Services
• During AIX partition activation

• Press numeric 1 or specify SMS on HMC activate
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
1 = SMS Menu 5 = Default Boot List

8 = Open Firmware Prompt 6 = Stored Boot List
Memory Keyboard Network SCSI

...
Figure 5-7. Starting System Management Services AN153.0
Notes:
Booting to SMS
If you cannot boot AIX because the bootlist needs correcting, then you need to use the System
Management Services (SMS) to modify the bootlist. The SMS programs are integrated into the
hardware (they are in NVRAM).
The visual shows how to start the System Management Services. During system boot, shortly
before the firmware looks for a boot image, it discovers some basic hardware on the system.
Then, the LED usually displays a value of E1F1. As the devices are discovered, either a text
name or graphic icon for the resource displays on the screen. The second device that is
discovered is usually the keyboard. When the keyboard is discovered, a unique double beep
tone is usually sounded. After the keyboard is discovered, the system is ready to accept input
that overrides the default behavior of conducting a normal boot. However, after the last icon or name
is displayed, the system starts to use the bootlist to find the boot image, and it is too late to
intervene. One of the keyboard actions available during this brief period is pressing the
numeric 1 key to request that the system boot to SMS.
SMS on LPAR systems

To start SMS by using the Advanced option for Power On:
Activate the partition by using the SMS boot mode. To do so, click the Advanced
button when activating the partition. In the Boot Mode drop-down list, select SMS. Remember
to open a terminal window, if one is not already open. The partition stops
at the SMS menu.

Working with bootlists in SMS (1 of 2)
System Management Services

Main Menu
1. Select Language
2. Setup Remote IPL (Initial Program Load)
3. Change SCSI Settings
4. Select Console
5. Select Boot Options
===> 5

Multiboot
1. Select Install/Boot Device
2. Configure Boot Device Order
3. Multiboot Startup <OFF>
===> 2

Configure Boot Device Order
1. Select 1st Boot Device
2. Select 2nd Boot Device
3. Select 3rd Boot Device
4. Select 4th Boot Device
5. Select 5th Boot Device
6. Display Current Setting
7. Restore Default Setting
===> 1

Select Device Type
1. Diskette
2. Tape
3. CD/DVD
4. IDE
5. Hard Drive
6. Network
7. None
8. List All Devices
===> 8
Figure 5-8. Working with bootlists in SMS (1 of 2) AN153.0
Notes:
Working with the bootlist

The System Management Service Main Menu lists:
1. Select Language
2. Setup Remote IPL (Initial Program Load)
3. Change SCSI Settings
4. Select Console
5. Select Boot Options
In the System Management Service menu, pick Select Boot Options to work with the
bootlist.
The next screen is the Multiboot menu that lists:
1. Select Install/Boot Device
2. Configure Boot Device Order
3. Multiboot Startup <OFF>
With option 1, you select a specific device to boot from right now. With option 2, you can modify
the customized bootlists. Option 3 is a toggle that has the system stop at this Multiboot menu
every time it boots, or continue with the normal boot sequence.
The focus here is the second option, used to modify the customized bootlist. The Configure
Boot Device Order panel lists:
1. Select 1st Boot Device
2. Select 2nd Boot Device
3. Select 3rd Boot Device
4. Select 4th Boot Device
5. Select 5th Boot Device
6. Display Current Setting
7. Restore Default Setting
You can either list or modify the bootlist. Select the position in the bootlist that you want to
modify. SMS then lists the device types, which you use to obtain a list of devices to select from:
1. Diskette
2. Tape
3. CD/DVD
4. IDE
5. Hard Drive
6. Network
7. None
8. List All Devices
Select the device type. If there are not many bootable devices, it is sometimes easier to use the
List All Devices option.
Finally, you would select a specific device to place in that position of the bootlist, as illustrated
on the next visual.
It is important to understand that when SMS is used to modify the bootlist, both the normal
bootlist and the service bootlist are modified. If you want them to differ, you need to
customize them later when you have a command prompt (such as in multiuser mode).

Working with bootlists in SMS (2 of 2)
Select Device
Device  Current   Device
Number  Position  Name
1.      -         IBM 10/100/1000 Base-TX PCI-X Adapter
                  ( loc=U789D.001.DQDWAYT-P1-C5-T1 )
2.      -         SAS 73407 MB Harddisk, part=2 (AIX 7.1.0)
                  ( loc=U789D.001.DQDWAYT-P3-D1 )
3.      1         SATA CD-ROM
                  ( loc=U789D.001.DQDWAYT-P1-T3-L8-L0 )
4.      None
===> 2

Select Task
SAS 73407 MB Harddisk, part=2 (AIX 7.1.0)
( loc=U789D.001.DQDWAYT-P3-D1 )
1. Information
2. Set Boot Sequence: Configure as 1st Boot Device
===> 2

Current Boot Sequence
1. SAS 73407 MB Harddisk, part=2 (AIX 7.1.0)
2. None
3. None
4. None
Figure 5-9. Working with bootlists in SMS (2 of 2) AN153.0
Notes:
Selecting bootlist devices

You are presented with a list of devices to select from. For example:
1. - IBM 10/100/1000 Base-TX PCI-X Adapter
( loc=U789D.001.DQDWAYT-P1-C5-T1 )
2. - SAS 73407 MB Harddisk, part=2 (AIX 7.1.0)
3. 1 SATA CD-ROM
( loc=U789D.001.DQDWAYT-P1-T3-L8-L0 )
4. None
For each position in the bootlist, you can select a device. The location code that is provided with
each device in the list helps you to uniquely identify devices that otherwise might be confused.
Next, you are presented with a Select Task panel that provides the following options:
1. Information
2. Set Boot Sequence: Configure as 1st Boot Device
After you select a device, choose the Set Boot Sequence task to confirm that selection.

You can repeat this action for each position in the bootlist. To clear a position,
select None as the device for that position.
Exiting out of SMS always triggers a boot attempt. If you did not specify a particular device for
this boot, it uses the bootlist set in SMS.

5.3. Corrupted boot logical volume
Boot device alternatives (1 of 2)
• Boot device is either:

– First one found with a boot image in the bootlist
– Device specified in SMS Select Install/Boot Device
• If the boot device is removable media (CD, DVD, tape):

– Boots to the Install and Maintenance menu
• If the boot device is a network adapter:

– Boot result depends on NIM configuration for client machine:
• nim -o bos_inst: Install and Maintenance menu
• nim -o maint_boot: Maintenance menu
• nim -o diag: Diagnostic menu
Figure 5-10. Boot device alternatives (1 of 2) AN153.0
Notes:
Boot alternatives
The system boots from the first device in the designated bootlist on which it finds a boot image.
Whenever the effective boot device is bootable media, such as a mksysb tape/CD/DVD or
installation media, the system boots to the Installation and Maintenance menu.
If the boot device is a network adapter, the mode of boot depends on the configuration of the
NIM server that services the network boot request. If the NIM server is configured to support an
AIX installation or a mksysb recovery, then the system boots to Install and Maintenance. If
the NIM server is configured to serve out a maintenance image, then the system boots to a
Maintenance menu (a submenu of Installation and Maintenance). If the NIM server is
configured to serve out a diagnostic image, then the system boots to diagnostic mode.
There are other ways to boot to a diagnostic utility. If the boot device is a CD/DVD drive that
holds a diagnostic CD/DVD, the system boots into that diagnostic utility. If a service mode boot is
requested and the boot device is a hard disk with a boot logical volume, then the system
boots into the diagnostic utilities.

The system can be signaled which bootlist to use during the boot process. The default is to use
the normal bootlist and boot in a normal mode. The choice can be changed during a window of
opportunity between when the system discovers the keyboard and when it commits to the
default boot mode. The signal can be generated from the system console (HMC virtual terminal)
or from a service processor attached workstation (such as an HMC).
The keyboard signal that is used can vary from firmware to firmware. The most common is a
numeric 5 to indicate that the firmware should use the default bootlist and a numeric 6 to
indicate that the firmware should use the customizable service bootlist. Either of these special
keyboard signals results in a service mode boot, which can cause a boot to diagnostic mode
when booting off a boot logical volume on your hard disk.
With an HMC, you can specify which signal to send as part of the LPAR activation. Even if you
forget to override the default boot mode (usually normal to multiuser), you can still use the
virtual console keyboard to override the action after the keyboard is discovered.
Boot device alternatives (2 of 2)
• If the boot device is a disk:

– Boot depends on use of service mode:
• Normal mode boot - Boot to multiuser
• Service mode boot - Diagnostic menu
• Two types of service mode boots:
– Requesting default service bootlist (key 5)
– Requesting customized service bootlist (key 6)
• HMC advanced boot options support all of the above

– Normal boot
– Diagnostic with default bootlist
– Diagnostic with stored bootlist
Figure 5-11. Boot device alternatives (2 of 2) AN153.0
Notes:
Booting off a disk with a boot logical volume (BLV)

When the boot device is a disk on your system, the disk must have a valid boot logical volume
for the boot to succeed. The result of the boot depends upon the mode of the boot. If booting in normal
mode, the system is booted up into multiuser mode (the default run level in the inittab). If doing
a service mode boot (that uses either the default bootlist or the customizable service bootlist),
then the system runs a diagnostics program and presents a diagnostics menu.
When using the HMC advanced activation options, you can set the mode of your boot and, if
service mode, which bootlist to use: default or stored (customized service).
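The outcomes described in the two "Boot device alternatives" visuals can be summarized as a decision table. The sketch below is illustrative only: boot_result is a hypothetical helper, and the argument names (disk, media, network, and the NIM operation names) are chosen for this example.

```shell
#!/bin/sh
# Sketch: expected boot result for a boot device type and boot mode,
# following the two "Boot device alternatives" visuals.
# boot_result is a hypothetical helper; its argument names are chosen
# for this example only.
#   $1 = device type: disk | media | network
#   $2 = boot mode (disk) or NIM operation (network)
# Note: "media" means installation or mksysb media. A diagnostic CD/DVD
# boots into the diagnostic utility instead.
boot_result() {
    case "$1:$2" in
        disk:normal)        echo "multiuser" ;;
        disk:service)       echo "diagnostic menu" ;;
        media:*)            echo "Install and Maintenance menu" ;;
        network:bos_inst)   echo "Install and Maintenance menu" ;;
        network:maint_boot) echo "Maintenance menu" ;;
        network:diag)       echo "diagnostic menu" ;;
        *)                  echo "unknown" ;;
    esac
}

boot_result disk service
boot_result network maint_boot
```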

Accessing a system that will not boot
HMC Advanced Activate options: Default bootlist
Boot the system from:
• BOS CD/DVD
• Tape
• Network device (NIM)
Select maintenance mode
Maintenance
1. Access a Root Volume Group
2. Copy a System Dump to Media
3. Access Advanced Maintenance
4. Install from a System Backup
Perform corrective actions
Recover data
Figure 5-12. Accessing a system that will not boot AN153.0
Notes:
Introduction
The visual shows an overview of how to access a system that will not boot normally.
Maintenance mode can be started from an AIX CD/DVD, an AIX bootable tape (like a mksysb),
or a network device that can access a NIM master. The device that holds the boot media
must be listed in the bootlist that is used.
Boot into maintenance mode

To boot into maintenance mode:
- AIX 5.3, AIX 6.1 and AIX 7.1 systems support the bootlist command and booting from a
mksysb tape, but the tape device is, by default, not part of the boot sequence.
- If planning to boot off media in an LPAR environment, check that the device adapter slot is
allocated to the LPAR in question. If not, you might need to update the partition profile to
allocate that device. If the device is allocated to another LPAR, then you need to first
deallocate it from that other LPAR. Use a dynamic LPAR operation on the HMC to allocate
that slot.
- If using the default bootlist, the sequence is fixed and the CD/DVD drive is the first practical
device.
- If you are not using SMS for this boot and are using a tape drive or a network adapter as
your boot device, then you need to use one of the customizable bootlists. In this situation, it
is usually the service bootlist.
Verify your bootlist, but do not forget that some machines do not have a service bootlist.
Check that your boot device is part of the bootlist:
# bootlist -m service -o
- If you want to boot from your internal tape device, you need to change the bootlist because
the tape device by default is not part of the bootlist. For example:
# bootlist -m service rmt0 hdisk0
- Whichever bootlist you are using, insert the boot media (either tape or CD/DVD) into the
drive.
- Power on the system (or activate the LPAR). The system begins booting from the
installation media. After several minutes, c31 is displayed in the LED/LCD panel (or as the
reference code on the HMC display). c31 means that the software is prompting on the
console for input (normally to select the console device and then select the language). For
an LPAR, you need to have the virtual console started to interact with the prompts.
- Normally, you are prompted to select the console device and then select the language. After
making these selections, you see the Installation and Maintenance menu.
For partitioned systems with an HMC, you would normally use the HMC to access SMS and
then select the bootable device, which would bypass the use of a bootlist.
You can also use a NIM server to boot to maintenance. You would need to place your system’s
network adapter in your customized service bootlist before any other bootable devices. Or, use
SMS to specifically request boot over that adapter (the latter option is most common). Here is
an example of setting the service boot list:
# bootlist -m service ent0 gateway=192.168.1.1 \
    bserver=192.168.10.3 client=192.168.1.57
You would also need to set up the NIM server to provide a boot image for doing a maintenance
boot. For example, at the NIM server:
# nim -o maint_boot -spot <spotname> <client machine object name>
Use the correct installation media or SPOT

Be careful to use the correct AIX installation CD/DVD (or NIM SPOT, or mksysb tape) to boot
your machine. For example, you should not boot an AIX 7 installed machine with an AIX 6
installation CD/DVD. You must match the version, release, and maintenance level. The
same applies to the NIM SPOT level when using a network boot with NIM as the server of the
boot image. A common error that you might experience, if there is a mismatch, is an infinite
loop of /etc/getrootfs errors when trying to access the rootvg in maintenance mode.
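A level check before booting can be as simple as a string comparison. The sketch below assumes that both levels are written in the form that oslevel -r prints (for example 7100-03). The helper same_level is hypothetical, the sample values are illustrative, and how you obtain the level of the media or SPOT is left out.

```shell
#!/bin/sh
# Sketch: compare the level of the installed system with the level of the
# boot media or SPOT. Assumption: both are given in the form printed by
# "oslevel -r", for example 7100-03. same_level is a hypothetical helper.
same_level() {
    if [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "MISMATCH: system is $1, boot image is $2"
        return 1
    fi
}

# Illustrative values only.
same_level 7100-03 7100-03
same_level 7100-03 6100-09 || echo "choose different media or SPOT"
```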

Booting in maintenance mode
Define the System Console

Welcome to Base Operating System
Installation and Maintenance
Type the number of your choice and press Enter.
Choice is indicated by >>>.
1 Start Install Now with Default Settings

2 Change/Show Installation Settings and Install
>>> 3 Start Maintenance Mode for System Recovery
4 Configure Network Disks (iSCSI)
5 Select Storage Adapters
>>> Choice [1]: 3
Maintenance
>>> 1 Access a Root Volume Group

2 Copy a System Dump to Removable Media
3 Access Advanced Maintenance Functions
4 Erase Disks
>>> Choice [1]: 1
Figure 5-13. Booting in maintenance mode AN153.0
Notes:
First steps
When booting in maintenance mode, you first must identify the system console that is used: for
example, your virtual console (vty), graphic console (lft), or serial attached console (tty that is
attached to the S1 port).
After selecting the console, the Installation and Maintenance menu is shown:
1 Start Install Now with Default Settings
2 Change/Show Installation Settings and Install
3 Start Maintenance Mode for System Recovery
To work in maintenance mode, use selection 3 to start the Maintenance menu:

1 Access a Root Volume Group
2 Copy a System Dump to Removable Media
3 Access Advanced Maintenance Functions
4 Erase Disks
6 Install from a System Backup
In a network boot that uses NIM, the console goes straight to the maintenance menu.
From this point, access the rootvg to run any system recovery steps that might be necessary.

Working in maintenance mode
Access a Root Volume Group
Type the number for a volume group to display the logical volume information
and press Enter.
1) Volume Group 00c35ba000004c00000001153ce1c4b0 contains these disks:

hdisk1 70006 02-08-00 hdisk0 70006 02-08-00
Choice: 1
Volume Group Information
-----------------------------------------------------------------------------
Volume Group ID 00c35ba000004c00000001153ce1c4b0 includes the following
logical volumes:
hd5 hd6 hd8 hd4 hd2 hd9var

hd3 hd1 hd10opt
-----------------------------------------------------------------------------
1) Access this Volume Group and start a shell
2) Access this Volume Group and start a shell before mounting filesystems
99) Previous Menu
Choice [99]: 1
Figure 5-14. Working in maintenance mode AN153.0
Notes:
Select the correct volume group

When accessing the rootvg in maintenance mode, you need to select the volume group that is
the rootvg. The Access a Root Volume Group panel displays all detected volume groups and
the disks that comprise these volume groups. Only the volume group IDs are shown and not the
names of the volume groups. Check with your system documentation that you select the correct
disk. Do not rely too much on the physical volume name but instead rely more on the PVID,
VGID, or SCSI ID.
After you select the volume group, the panel lists the logical volumes that are contained in the
volume group. Confirm that the rootvg is selected. Two selections are then offered:
- Access this Volume Group and start a shell
- Access this Volume Group and start a shell before mounting file systems
Access this volume group and start a shell

When you choose this selection, the rootvg is activated (with the varyonvg command), and all
file systems that belong to the rootvg are mounted. A shell is started which can be used to run
any system recovery steps.
Typical scenarios where this selection must be chosen are:
- Changing a forgotten root password
- Re-creating the boot logical volume
- Changing a corrupted bootlist
Access this volume group and start a shell before mounting file systems
When you choose this selection, the rootvg is activated, but the file systems that belong to the
rootvg are not mounted.
A typical scenario where this selection is chosen is when a corrupted file system needs repair by
the fsck command. Repairing a corrupted file system is only possible if the file system is not
mounted.
Another scenario might be a corrupted hd8 transaction log. Any changes that take place in the
superblock or i-nodes are stored in the log logical volume. When these changes are written to
disk, the corresponding transaction logs are removed from the log logical volume.
The logform command reinitializes a corrupted transaction log, which is possible only when
no file systems are mounted. After initializing the log device, you need to do a file system repair
for all file systems that use this transaction log. You must explicitly specify the file system type
(JFS or JFS2):
# logform -V jfs2 /dev/hd8
# fsck -y -V jfs2 /dev/hd1
# fsck -y -V jfs2 /dev/hd9var
# fsck -y -V jfs2 /dev/hd10opt
# exit
Keep in mind that the US keyboard layout is in effect. You can enable command-line recall with
the commands set -o emacs or set -o vi.
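When several file systems share the log, it is easy to miss one. The following sketch only prints a repair plan for review and runs nothing. The logical volume list is an assumption taken from the Volume Group Information panel shown in the visual, JFS2 is assumed for all of them, and print_repair_plan is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print the log reinitialization and file system checks for the
# rootvg file systems. The list of logical volumes is taken from the
# "Volume Group Information" panel in the visual and is an assumption
# about your system; verify it before use. Nothing is executed here:
# the commands are only printed for review.
log_lv=hd8
fs_lvs="hd4 hd2 hd9var hd3 hd1 hd10opt"

print_repair_plan() {
    echo "logform -V jfs2 /dev/$log_lv"
    for lv in $fs_lvs; do
        echo "fsck -y -V jfs2 /dev/$lv"
    done
}

print_repair_plan
```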

How to fix a corrupted BLV
1 Boot to maintenance mode from bootable media: CD/DVD, tape, or NIM
2 Select volume group that contains hd5.
  Maintenance
  1 Access a Root Volume Group
3 Rebuild BLV.
  # bosboot -ad /dev/hdisk0
  # sync
  # sync
  # reboot
Figure 5-15. How to fix a corrupted BLV AN153.0
Notes:
Maintenance mode
If the boot logical volume is corrupted (for example, bad blocks on a disk might cause a
corrupted BLV), the machine will not boot.
To fix this situation, you must boot your machine in maintenance mode, from a CD/DVD or tape.
If NIM is set up for a machine, you can also boot the machine from a NIM master in
maintenance mode. NIM is actually a common way to do special boots in a logical partition
environment.
Re-creating the boot logical volume

After booting from CD/DVD, tape, or NIM, an Installation and Maintenance menu is shown and
you can start the maintenance mode. After accessing the rootvg, you can repair the boot logical
volume with the bosboot command. You need to specify the corresponding disk device, for
example hdisk0:
# bosboot -ad /dev/hdisk0
# sync
# sync
# reboot
The sync commands flush any file data in memory cache to disk. While you would normally use
a shutdown command, in maintenance mode it is appropriate to use the reboot command.
The bosboot command requires that the boot logical volume (hd5) exists and is valid. The boot
logical volume might be deleted by mistake or the LVCB of the boot logical volume might be
damaged. If you need to re-create the BLV from scratch, the following steps should be followed:
1. Boot your machine in maintenance mode: from CD/DVD or tape (numeric 5), or use
numeric 1 to access System Management Services (SMS) and select the boot device.
2. Remove the old hd5 logical volume, if it exists.
# rmlv hd5
3. Clear the boot record at the beginning of the disk.
# chpv -c hdisk0
4. Create an hd5 logical volume in rootvg: one physical partition in size, with outer
edge as the intra-physical volume policy. Specify boot as the logical volume type.
# mklv -y hd5 -t boot -a e rootvg 1
5. Run the bosboot command as described on the visual.
6. Check the actual bootlist.
# bootlist -m normal -o
7. Write data immediately to disk.
# sync
# sync
8. Reboot the system.
# reboot
By using the internal command ipl_varyon -i, you can check the state of the boot record.
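The re-creation steps above can be collected into one reviewable plan. This is a sketch under stated assumptions, not a supported procedure: the boot disk is assumed to be hdisk0, run and recreate_blv are hypothetical helpers, and with DRY_RUN=1 (the default here) each command is printed and nothing is executed.

```shell
#!/bin/sh
# Sketch: the BLV re-creation steps collected into one reviewable plan.
# Assumptions: the boot disk is hdisk0; run and recreate_blv are
# hypothetical helpers. With DRY_RUN=1 (the default here) every command
# is printed, not executed.
DRY_RUN=${DRY_RUN:-1}
disk=${1:-hdisk0}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recreate_blv() {
    run rmlv hd5                           # step 2: remove old hd5, if it exists
    run chpv -c "$disk"                    # step 3: clear the boot record
    run mklv -y hd5 -t boot -a e rootvg 1  # step 4: create the new hd5
    run bosboot -ad "/dev/$disk"           # step 5: rebuild the boot image
    run bootlist -m normal -o              # step 6: check the bootlist
    run sync                               # step 7: flush data to disk
    run sync
}

recreate_blv
```

The reboot in step 8 is deliberately left out so that the printed plan can be reviewed, and the bootlist checked, before the system restarts.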

Checkpoint (1 of 2)
1. True or False: You must have AIX loaded on your system to use the
System Management Services programs.
2. Your AIX system is powered off. AIX is installed on hdisk1 but the
bootlist is set to boot from hdisk0. How can you fix the problem and
make the machine boot from hdisk1?
3. Your machine is booted and is at the # prompt. What is the
command that displays the normal bootlist?
4. Your machine is booted and is at the # prompt. How might you
change the normal bootlist?
Notes:
Checkpoint (2 of 2)
5. What command is used to build a new boot image and write it to the
boot logical volume?
6. What script controls the boot sequence?
7. True or False: During the AIX boot process, the AIX kernel is loaded
from the root file system.
8. How do you boot an AIX machine into maintenance mode?
Notes:

Exercise: System initialization: Accessing a boot image
• Identify information on your system
• Prepare NIM to support booting to maintenance mode
• Boot to maintenance mode
• Repair a corrupted boot logical volume
• Work with multi-path bootlists
Figure 5-18. Exercise: System initialization: Accessing a boot image AN153.0
Notes:
Unit summary

• Describe the boot process through to the loading of the boot
logical volume
• Re-create the boot logical volume on a system that is failing
to boot
Notes:

Unit 6. System initialization: rc.boot and inittab

This unit describes the final stages of the boot process and outlines how
devices are configured for the system.
It also describes common boot errors and how to analyze them to fix boot
problems.

• Identify the steps in system initialization from loading the boot image to
boot completion
• Identify how devices are configured during the boot process
• Analyze and solve boot problems

Accountability:
• Lab exercise
Unit objectives

• Identify the steps in system initialization from loading the boot
image to boot completion
Notes:
There are many reasons for boot failures. The hardware might be damaged or, because of user
errors, the operating system might be unable to complete the boot process.
A good knowledge of the AIX boot process is a prerequisite for all AIX system administrators.

6.1. AIX initialization part 1
System software initialization overview
Load kernel and pass control
Restore RAM file system from boot image (/: etc dev mnt usr)
Start init process (from RAMFS)
  rc.boot 1: Configure base devices
  rc.boot 2: Activate rootvg
Start "real" init process (from rootvg)
  /etc/inittab
  rc.boot 3: Configure remaining devices
Figure 6-2. System software initialization overview AN153.0
Notes:
Boot sequence
The visual shows the boot sequence after loading the AIX kernel from the boot image.
The AIX kernel gets control and executes the following steps:
1. The kernel restores a RAM file system into memory by using information that is provided
in the boot image. At this stage, the rootvg is not available, so the kernel needs to work
with commands provided in the RAM file system. You can consider this RAM file system
as a small AIX operating system.
2. The kernel starts the init process that was provided in the RAM file system (not from
the root file system). This init process runs a boot script rc.boot.
3. rc.boot controls the boot process. In the first phase, (it is called by init with
rc.boot 1), the base devices are configured. In the second phase (rc.boot 2), the
rootvg is activated (or varied on).

4. In rc.boot 2, after the rootvg is activated, the file systems from rootvg are mounted
over the RAM file system. When rc.boot 2 ends, the init from the root file system (hd4)
replaces the init that was started from the boot image.
5. This init processes the /etc/inittab file. From this file, rc.boot is called a third
time (rc.boot 3) and all remaining devices are configured.
rc.boot 1
(rootvg is not active)
Process 1: init (failure LED: F05, c06)
rc.boot 1
  restbase: boot image ODM -> RAM file system ODM (LED 510; failure LED: 548)
  cfgmgr -f
  bootinfo -b (LED 511)
Devices to activate rootvg are configured!
Figure 6-3. rc.boot 1 AN153.0
Notes:
rc.boot phase 1 actions

The init process that is started from the RAM file system runs the boot script rc.boot 1. If init
fails for some reason (for example, a bad boot logical volume), c06 is shown on the LED
display. The most common defect in the boot logical volume is missing device drivers; this
defect is solved by rebuilding the boot image with the needed device drivers included.
The following steps are run when rc.boot 1 is called:
1. The restbase command is called which copies the ODM from the boot image into the
RAM file system. After this step, an ODM is available in the RAM file system. The LED
shows 510 (DEV CFG 1 START) if restbase completes successfully; otherwise, LED
548 (RESTBASE FAILED) is shown.
2. When restbase completes successfully, the configuration manager, cfgmgr, is run
with the option -f (first). cfgmgr reads the Config_Rules object class and runs all
methods that are stored under phase=1. Phase 1 configuration methods result in the
configuration of base devices into the system, so that the rootvg can be activated in the
next rc.boot phase.
3. Base devices are all devices that are necessary to access the rootvg. If the rootvg is
stored on hdisk0, all devices from the system board to the disk itself must be
configured to be able to access the rootvg.
4. At the end of rc.boot 1, the system determines the last boot device (used to establish
the /dev/ipldevice link) by calling bootinfo -b. The LED shows 511 (DEV CFG 1
END), followed by 553 (PHASE 1 COMPLETE).
rc.boot 2 (1 of 2)
rc.boot 2 (LED 551)
  ipl_varyon of rootvg (failure LED: 552, 554, 556)
  fsck -f /dev/hd4 (failure LED: 555)
  mount /dev/hd4 / (LED 517; failure LED: 557)
  fsck -f /dev/hd2
  mount /usr (LED 517; failure LED: 518)
  fsck -f /dev/hd9var
  mount /var (LED 517; failure LED: 518)
  copycore (if dump, copy)
  umount /var
  swapon /dev/hd6
rootvg: hd4 (/), hd2 (/usr), hd9var (/var), hd6
RAM file system: / with dev etc mnt usr var
Figure 6-4. rc.boot 2 (1 of 2) AN153.0
Notes:
rc.boot phase 2 actions (part 1)

rc.boot is run for the second time and is passed the parameter 2. The LED shows 551
(VARYON_IPLDEV). The following steps take place in this boot phase:
1. The rootvg is varied on with ipl_varyon, a special version of the varyonvg command that is
designed to handle rootvg. If ipl_varyon completes successfully, 517 (MOUNT ROOT)
is shown on the LED. Otherwise, one of the following codes is shown, and the boot
process stops:
- 552 (IPLVARYON ERROR)
- 554 (UNKNOWN BOOT DISK)
- 556 (LVM_QUERY ERROR)
2. fsck checks the root file system, hd4. The option -f means that the file system is
checked only if it was not unmounted cleanly during the last shutdown. This check
improves the boot performance. If the check fails, LED 555 (FSCK ERROR) is shown.

3. Afterward, /dev/hd4 is mounted directly onto the root (/) in the RAM file system. If the
mount fails, for example due to a corrupted JFS log, the LED 557 (ROOT MNT FAILED)
is shown and the boot process stops.
4. Next, /dev/hd2 is checked and mounted (again with option -f, it is checked only if the
file system was not unmounted cleanly). If the mount fails, LED 518 (/USR MOUNT
FAILED) is displayed and the boot stops.
5. Then, the /var file system is checked and mounted. This check is necessary at this
stage because the copycore command checks if a memory dump occurred. If a
memory dump exists in a paging space device, it is copied from the memory dump
device, /dev/hd6, to the copy directory that is by default the directory /var/adm/ras.
/var is unmounted afterward. If the /var mount fails, LED 518 (/VAR MOUNT FAILED)
is displayed and the boot stops.
6. The primary paging space /dev/hd6 is made available.
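For problem determination, the LED codes named in the rc.boot 1 and rc.boot 2 notes can be kept in a small lookup. The sketch below is only a convenience: led_meaning is a hypothetical helper, and the table holds just the codes that appear in this unit, not a complete reference.

```shell
#!/bin/sh
# Sketch: look up the LED codes named in the rc.boot 1 and rc.boot 2 notes.
# led_meaning is a hypothetical helper; the table holds only the codes
# that appear in this unit and is not a complete reference.
led_meaning() {
    case "$1" in
        c06) echo "init failed (for example, bad boot logical volume)" ;;
        510) echo "DEV CFG 1 START" ;;
        511) echo "DEV CFG 1 END" ;;
        517) echo "MOUNT ROOT" ;;
        518) echo "/USR or /VAR MOUNT FAILED" ;;
        548) echo "RESTBASE FAILED" ;;
        551) echo "VARYON_IPLDEV" ;;
        552) echo "IPLVARYON ERROR" ;;
        553) echo "PHASE 1 COMPLETE" ;;
        554) echo "UNKNOWN BOOT DISK" ;;
        555) echo "FSCK ERROR" ;;
        556) echo "LVM_QUERY ERROR" ;;
        557) echo "ROOT MNT FAILED" ;;
        *)   echo "not listed in this unit"; return 1 ;;
    esac
}

led_meaning 552
```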
Special root syntax in RAMFS

After the disk-based root file system is mounted over the RAMFS, the syntax that is used in
rc.boot to access the RAMFS files is different:
• RAMFS files are accessed by using a prefix of /../. For example, to access the fsck
command in the RAMFS (before the /usr file system is mounted), rc.boot uses
/../usr/sbin/fsck.
• Disk-based files are accessed by using normal AIX file syntax. For example, to access
the fsck command on the disk (after the /usr file system is mounted) rc.boot uses
/usr/sbin/fsck.
Note
This syntax works only during the boot process. If you boot from the CD/DVD into maintenance
mode and need to mount the root file system manually, you need to mount it over another directory,
such as /mnt. Otherwise, you are unable to access the RAMFS files.
rc.boot 2 (2 of 2)
[Visual: rc.boot 2 flow, continued. swapon /dev/hd6; mergedev copies the RAM /dev files to disk; cp /../etc/objrepos/Cu* /etc/objrepos copies the RAM ODM files to disk; mount /var; the boot messages are copied to alog; the kernel removes the RAMFS.]
Notes:

After the paging space /dev/hd6 is available, the following tasks are run in rc.boot 2:
1. To understand this step, remember two things:
- /dev/hd4 is mounted onto root(/) in the RAM file system.
- In rc.boot 1, cfgmgr runs and all base devices are configured. This configuration
data writes into the ODM of the RAM file system.
Now, mergedev is called and all /dev files from the RAM file system are copied to disk.
2. All customized ODM files from the RAM file system ODM are copied to disk as well. At
this stage, both ODMs (in hd5 and hd4) are in sync.
3. The /var file system (hd9var) is mounted.
4. All messages during the boot process are copied into a special file. You must use the
alog command to view this file:

# alog -t boot -o

Because no console is available at this stage, all boot information is collected in this file.
When rc.boot 2 is finished, the /, /usr, and /var file systems in rootvg are active.
Final stage
At this stage, the AIX kernel removes the RAM file system (returns the memory to the free
memory pool) and starts the init process from the root (/) file system in rootvg.
rc.boot 3 (1 of 2)
[Visual: rc.boot 3 flow. init (process 1) reads /etc/inittab and runs /sbin/rc.boot 3; fsck -f /dev/hd3; mount /tmp; syncvg rootvg &; cfgmgr -p2 (normal) or cfgmgr -p3 (service) runs the Config_Rules methods for phase=2 or phase=3 from the ODM in /etc/objrepos; cfgcon (c31, c32, c33, c34); rc.dt boot; savebase updates the ODM in hd5. LED codes shown in the visual: 553, 517, 518. From here on, you work with rootvg.]
Notes:

If rc.boot phase 2 completes, as indicated by LED 553 (BOOT PHASE 1 COMPLETE), you can
assume that rc.boot phase 3 started. At this boot stage, the /etc/init process is started. It
reads the /etc/inittab file and runs the commands line by line. It runs rc.boot for the third
time, passing the argument 3, which indicates the last boot phase.
rc.boot 3 runs the following tasks:
1. The /tmp file system is checked and mounted.
2. The rootvg is synchronized by syncvg rootvg. If rootvg contains any stale partitions
(for example, a disk that is part of rootvg was not active), these partitions are updated
and synchronized. syncvg is started as a background job.
3. The configuration manager is called again. If the boot mode is normal, the cfgmgr is
called with option -p2 (phase 2). If the boot mode is service, the cfgmgr is called with
option -p3 (phase 3).

4. The configuration manager reads the ODM class Config_Rules and runs all
methods for either phase=2 or phase=3. All remaining devices that are not base devices are
configured in this step.
5. cfgcon configures the console. The numbers c31, c32, c33, or c34 are displayed
depending on the type of console:
- c31: Console not yet configured. Provides instruction to select a console.
- c32: Console is a lft (graphic display) terminal.
- c33: Console is a tty.
- c34: Console is a file on the disk.
If CDE is specified in /etc/inittab, the CDE is started and you get a graphical boot
on the console.
6. To synchronize the ODM in the boot logical volume with the ODM from the root (/) file
system, savebase is called.
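Two decisions in the steps above are simple lookups: which cfgmgr flag is used for a given boot mode, and what each console code means. The sketch below restates them in POSIX shell. The function names are hypothetical and the wording of the descriptions follows these notes; it does not call cfgmgr or cfgcon.

```shell
#!/bin/sh
# Hypothetical helper: cfgmgr flag used by rc.boot 3 for a given boot mode.
cfgmgr_phase_flag() {
    case "$1" in
        normal)  echo "-p2" ;;
        service) echo "-p3" ;;
        *) echo "unknown boot mode: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: meaning of the console codes displayed by cfgcon.
console_code() {
    case "$1" in
        c31) echo "console not yet configured" ;;
        c32) echo "console is a lft (graphic display) terminal" ;;
        c33) echo "console is a tty" ;;
        c34) echo "console is a file on the disk" ;;
        *) echo "unknown console code: $1" >&2; return 1 ;;
    esac
}

echo "normal boot runs: cfgmgr $(cfgmgr_phase_flag normal)"
echo "c33 means: $(console_code c33)"
```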
rc.boot 3 (2 of 2)
[Visual: rc.boot 3 flow, continued. savebase updates the ODM in hd5 from /etc/objrepos; syncd 60; errdemon; turn off LEDs; rm /etc/nologin; if any device has chgstatus=3 in CuDv, the message "A device that was previously detected could not be found. Run "diag -a"." is displayed; "System initialization is completed."; execute the next line in /etc/inittab.]
Notes:

After the ODMs are synchronized again, the following steps take place:
1. The syncd daemon is started. All data that is written to disk is first stored in a cache in
memory before writing it to the disk. The syncd daemon writes the data from the cache
every 60 seconds to the disk.
Another daemon process, the errdemon daemon, is started. This process allows errors
that are triggered by applications or the kernel to be written to the error log.
2. The LED display is turned off.
3. If the file /etc/nologin exists, it is removed. If a system administrator creates this file,
a login to the AIX machine is not possible. During the boot process, /etc/nologin is
removed.

4. If devices exist that are flagged as missing in CuDv (chgstatus=3), a message is
displayed on the console. For example, this message is displayed if external devices are
not powered on during system boot.
5. The last message, System initialization completed, is written to the console.
rc.boot 3 is finished. The init process runs the next command in /etc/inittab.
rc.boot summary
Command     Executed from      Primary actions                     Phase (Config_Rules)
rc.boot 1   RAM file system    restbase                            1
            (/dev/ram0)        cfgmgr -f
rc.boot 2   RAM file system    ipl_varyon
            (/dev/ram0)        Mount /, /usr, /var file systems
                               mergedev
                               Copy ODM files
rc.boot 3   rootvg             mount /tmp                          2=normal or
                               cfgmgr -p2 or cfgmgr -p3            3=service
                               savebase
Figure 6-8. rc.boot summary AN153.0
Notes:
Summary
During rc.boot 1, all base devices are configured. This configuration is done by cfgmgr
-f, which runs all phase 1 methods from Config_Rules.
During rc.boot 2, the rootvg is varied on. All /dev files and the customized ODM files from
the RAM file system are merged to disk.
During rc.boot 3, cfgmgr -p2 (normal boot) or cfgmgr -p3 (service boot) configures all
remaining devices. The configuration manager
reads the Config_Rules class and runs the corresponding methods. To synchronize the ODMs,
savebase is called, which writes the ODM from the disk back to the boot logical volume.

Fixing corrupted file systems and logs
• Boot to maintenance mode.

• Access rootvg without mounting file systems.
• Rebuild file system log and run fsck.

# fsck -y -V jfs2 /dev/hd11admin
Figure 6-9. Fixing corrupted file systems and logs AN153.0
Notes:
JFS log or JFS2 log corrupted?

To fix a corrupted JFS or JFS2 log, boot in maintenance mode and access the rootvg, but do not
mount the file systems. In the maintenance shell, run the logform command and then run a file
system check on every file system that uses this JFS or JFS2 log. Keep in mind which file system
type your rootvg uses: JFS or JFS2.
For JFS:
# logform -V jfs /dev/hd8
# fsck -y -V jfs /dev/hd1
# fsck -y -V jfs /dev/hd9var
# fsck -y -V jfs /dev/hd10opt
# fsck -y -V jfs /dev/hd11admin
exit
For JFS2:
# logform -V jfs2 /dev/hd8
# fsck -y -V jfs2 /dev/hd1
# fsck -y -V jfs2 /dev/hd9var
# fsck -y -V jfs2 /dev/hd10opt
# fsck -y -V jfs2 /dev/hd11admin
exit
The logform command initializes a new JFS transaction log and can result in loss of data
because JFS transactions can be destroyed. Your machine will boot after the JFS log is
repaired.
JFS log corruption typically happens when the system crashes or is shut down abruptly
by the administrator.
The JFS log recovery that is described does not ensure that disk updates in process are
completed. Determining what was processed and what needs reprocessing is the responsibility
of the applications by using their transaction logs and any checkpoint processing that was
completed.
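Because logform destroys the existing log, it can help to write out the full repair sequence and review it before typing anything in the maintenance shell. The following POSIX shell sketch only prints the commands; it runs neither logform nor fsck. The function name print_log_repair is hypothetical, and the list of logical volumes is whatever you pass in.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the log repair sequence described in the notes.
# $1 = file system type (jfs or jfs2)
# $2 = log device (for example /dev/hd8)
# remaining arguments = logical volumes whose file systems use that log
print_log_repair() {
    fstype=$1
    logdev=$2
    shift 2
    case "$fstype" in
        jfs|jfs2) ;;
        *) echo "unsupported file system type: $fstype" >&2; return 1 ;;
    esac
    # Rebuild the log first, then check each file system that uses it.
    echo "logform -V $fstype $logdev"
    for lv in "$@"; do
        echo "fsck -y -V $fstype $lv"
    done
}

print_log_repair jfs2 /dev/hd8 /dev/hd1 /dev/hd9var /dev/hd10opt /dev/hd11admin
```

Printing instead of executing keeps the destructive step under the administrator's control.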

Let’s review: rc.boot (1 of 3)
[Visual: rc.boot 1 flow diagram with five blank boxes, numbered (1) through (5), to be filled in.]
Figure 6-10. Let’s review: rc.boot (1 of 3) AN153.0
Notes:
Instructions
Using the following questions, put the solutions into the visual.
1. What calls rc.boot 1? Is it:
• /etc/init from hd4
• /etc/init from the RAMFS in the boot image
2. Which command copies the ODM files from the boot image into the RAM file system?
3. Which command triggers the execution of all phase 1 methods in Config_Rules?
4. Which ODM files contain the devices that are configured in rc.boot 1?
• ODM files in hd4
• ODM files in RAM file system
5. How can you determine the last boot device?
Let’s review: rc.boot (2 of 3)
[Visual: rc.boot 2 flow diagram with eight blank boxes, numbered (1) through (8), to be filled in. LED 557 is marked in the diagram.]
Notes:
Instructions
Order the following eight expressions in the correct sequence.
- Turn on paging
- Merge RAM /dev files.
- Copy boot messages to alog
- Activate rootvg
- Mount /var; copy memory dump; unmount /var
- Mount /dev/hd4 onto / in RAMFS
- Copy RAM ODM files
- Finally, answer the following question. Put the answer in box 8:
Your system stops booting with an LED 557. Which command failed?

Let’s review: rc.boot (3 of 3)
[Visual: rc.boot 3 flow diagram with blanks to complete:]
- From which file is rc.boot 3 started? _________________
- /sbin/rc.boot 3
- fsck -f _______
- mount ________
- s_______ ________ &
- ________ -p2
- ________ -p3
- Start Console: ______
- Start CDE: _________
- Update ODM in BLV: _________
- sy____ ___
- err_______
- Turn off ____
- rm _________
- Missing devices? _________=3 in ______ ?
- Execute next line in _____________
Notes:
Instructions
Complete the missing information in the picture.
Your instructor reviews the activity with you.
6.2. AIX initialization part 2
[Visual: cfgmgr reads Config_Rules and the predefined database (PdDv, PdAt, PdCn), runs the device methods (Define, Configure, Change, Unconfigure, Undefine), loads or unloads device drivers, and updates the customized database (CuDv, CuAt, CuDep, CuDvDr, CuVPD).]
Figure 6-13. Configuration manager AN153.0
Notes:
When the configuration manager is run

During system boot, the configuration manager is run to configure all devices that are detected
and any device whose device information is stored in the configuration database. At run time,
you can configure a specific device by directly running the cfgmgr command.
If you encounter problems during the configuration of a device, use cfgmgr -v. With this
option, cfgmgr shows the devices as they are configured.
Automatic configuration
The configuration manager automatically detects many devices. For this configuration to occur,
device entries must exist in the predefined device object classes. The configuration manager
uses the methods from PdDv to manage the device state, for example, to bring a device into the
defined or available state.

Installing new device support

cfgmgr can be used to install new device support. If you run cfgmgr with the -i flag, the
command attempts to install device software support for each newly detected device.
High-level device commands like mkdev run methods and allow the user to add, delete, show,
or change devices and their attributes.
Define method
When a device is defined through its define method, the information from the predefined
database for that type of device is used to create the information that describes the
device-specific instance. This device-specific information is then stored in the customized
database.
Configure method steps

The process of configuring a device is often device-specific. The configure method for a kernel
device must:
1. Load the device driver into the kernel
2. Pass device-dependent information that describes the device instance to the driver
3. Create a special file for the device in the /dev directory
Many devices, such as logical volumes or volume groups, are not physical devices. These
devices are pseudodevices. For this type of device, the configured state is not as meaningful.
However, it still has a configuration method that marks the device as configured or runs more
complex operations to determine whether any devices are attached to it.
Configuration order
The configuration process requires that a device is defined or configured before a device
attached to it can be defined or configured. At system boot time, the configuration manager
configures the system in a hierarchical fashion. First, the system board is configured,
then the buses, then the adapters that are attached, and finally the devices that are connected to the
adapters. The configuration manager then configures any pseudodevices (volume groups,
logical volumes, and so forth) that need to be configured.
Config_Rules object class
phase  seq  boot  rule
cfgmgr -f:
1      10   0     /etc/methods/defsys
1      12   0     /usr/lib/methods/deflvm
cfgmgr -p2 (normal boot):
2      10   0     /etc/methods/defsys
2      12   0     /usr/lib/methods/deflvm
2      19   0     /etc/methods/ptynode
2      20   0     /etc/methods/startlft
cfgmgr -p3 (service boot):
3      10   0     /etc/methods/defsys
3      12   0     /usr/lib/methods/deflvm
3      19   0     /etc/methods/ptynode
3      20   0     /etc/methods/startlft
3      25   0     /etc/methods/starttty
Figure 6-14. Config_Rules object class AN153.0
Notes:
Introduction
The Config_Rules ODM object class is used by cfgmgr during the boot process. The phase
attribute determines when the respective method is called.
Phase 1
All methods with phase=1 are run when cfgmgr -f is called. The first method that is started
is /etc/methods/defsys, which is responsible for the configuration of all base devices. The
second method /usr/lib/methods/deflvm loads the logical volume device driver (LVDD)
into the AIX kernel.
If you have devices that must be configured in rc.boot 1, that means before the rootvg is
active, you need to place phase 1 configuration methods into Config_Rules. A bosboot is
required afterward.

Phase 2
All methods with phase=2 are run when cfgmgr -p2 is called. This action takes place in the
third rc.boot phase for a normal boot. The seq attribute controls the sequence of the
execution: The lower the value, the higher the priority.
Phase 3
All methods with phase=3 are run when cfgmgr -p3 is called. This action takes place in the
third rc.boot phase for a service boot.
Sequence number
Each configuration method has an associated sequence number. When running the methods
for a particular phase, cfgmgr sorts the methods based on the sequence number. The methods
are then started, one by one, starting with the smallest sequence number. Methods with a
sequence number of zero are started last after the methods with nonzero sequence numbers.
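The ordering rule described above (ascending sequence numbers, with zero sorted last) can be sketched with standard tools. This POSIX shell example is an illustration of the rule only, not of how cfgmgr is implemented. The function name order_methods is hypothetical, and /hypothetical/method_last is a made-up rule that is used to show the zero case.

```shell
#!/bin/sh
# Hypothetical helper: read "seq rule" lines on stdin and print the rules in the
# order described in the notes: nonzero sequence numbers ascending, zero last.
order_methods() {
    # Prefix each line with 0 (nonzero seq) or 1 (seq of zero), sort on that
    # flag and then on seq, and finally print only the rule.
    awk '{ flag = ($1 == 0) ? 1 : 0; print flag, $1, $2 }' |
        sort -k1,1n -k2,2n |
        awk '{ print $3 }'
}

printf '%s\n' \
    '12 /usr/lib/methods/deflvm' \
    '0 /hypothetical/method_last' \
    '10 /etc/methods/defsys' | order_methods
```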
Boot mask
Each configuration method has an associated boot mask:
- If the boot_mask is zero, the rule applies to all types of boot.
- If the boot_mask is nonzero, the rule applies only to the boot type specified. For example,
if boot_mask = DISK_BOOT, the rule is used only for boots from disk, whereas rules with
NETWORK_BOOT apply when booting through the network.
cfgmgr output in the boot log using alog
# alog -t boot -o
-------------------------------------------------------
attempting to configure device 'sys0'
invoking /usr/lib/methods/cfgsys_rspc -l sys0
return code = 0
******* stdout *******
bus0
******* no stderr *****
-------------------------------------------------------
attempting to configure device 'bus0'
invoking /usr/lib/methods/cfgbus_pci bus0
return code = 0
******** stdout *******
bus1, scsi0
****** no stderr ******
-------------------------------------------------------
invoking /usr/lib/methods/cfgbus_isa bus1
return code = 0
******** stdout ******
fda0, ppa0, sa0, sioka0, kbd0
****** no stderr *****
Figure 6-15. cfgmgr output in the boot log using alog AN153.0
Notes:
The boot log

Because no console is available during the boot phase, the boot messages are collected in a
special file, which, by default, is /var/adm/ras/bootlog. As shown in the visual, you must
use the alog command to view the contents of this file.
To view the boot log, run the command as shown, or use the smit alog fastpath.

If you have boot problems, it is always a good idea to check the boot alog file for potential boot
error messages. All output from cfgmgr is shown in the boot log with other information that is
produced in the rc.boot script.
The default boot log file size is 128 KB. If you want to increase the size of the boot log, for
example to 256 KB, run the following command:
# print "Resizing boot log" | alog -C -t boot -s 262144
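The size that is passed to alog -s in the command above is in bytes, so 256 KB becomes 256 x 1024 = 262144. A small POSIX shell sketch of that arithmetic follows. The function name kb_to_bytes is hypothetical, and the alog command is only printed, not run.

```shell
#!/bin/sh
# Hypothetical helper: convert a size in KB to the byte count used with alog -s.
kb_to_bytes() {
    echo $(( $1 * 1024 ))
}

size=$(kb_to_bytes 256)
# Print the resulting command for review instead of running it.
echo "alog -C -t boot -s $size"
```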
/etc/inittab file
init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot
powerfail::powerfail:/etc/rc.powerfail 2>&1 | alog -tboot > /dev/console #
mkatmpvc:2:once:/usr/sbin/mkatmpvc >/dev/console 2>&1
atmsvcd:2:once:/usr/sbin/atmsvcd >/dev/console 2>&1
tunables:23456789:wait:/usr/sbin/tunrestore -R > /dev/console 2>&1 # Set tunab
securityboot:2:bootwait:/etc/rc.security.boot > /dev/console 2>&1
rc:23456789:wait:/etc/rc 2>&1 | alog -tboot > /dev/console # Multi-User checks
rcemgr:23456789:once:/usr/sbin/emgr -B > /dev/null 2>&1
fbcheck:23456789:wait:/usr/sbin/fbcheck 2>&1 | alog -tboot > /dev/console # ru
srcmstr:23456789:respawn:/usr/sbin/srcmstr # System Resource Controller
rctcpip:23456789:wait:/etc/rc.tcpip > /dev/console 2>&1 # Start TCP/IP daemons
mkcifs_fs:2:wait:/etc/mkcifs_fs > /dev/console 2>&1
sniinst:2:wait:/var/adm/sni/sniprei > /dev/console 2>&1
rcnfs:23456789:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
cron:23456789:respawn:/usr/sbin/cron
piobe:2:wait:/usr/lib/lpd/pioinit_cp >/dev/null 2>&1 # pb cleanup
cons:0123456789:respawn:/usr/sbin/getty /dev/console
qdaemon:23456789:wait:/usr/bin/startsrc -sqdaemon
writesrv:23456789:wait:/usr/bin/startsrc -swritesrv
uprintfd:23456789:respawn:/usr/sbin/uprintfd
shdaemon:2:off:/usr/sbin/shdaemon >/dev/console 2>&1 # High availability
• Do not use an editor to change /etc/inittab

• Use mkitab, chitab, rmitab instead
Figure 6-16. /etc/inittab file AN153.0
Notes:
Purpose of /etc/inittab
The /etc/inittab file supplies information for the init process. Note how the rc.boot script
is run out of the inittab file to configure all remaining devices in the boot process.
Modifying /etc/inittab
Do not use an editor to change the /etc/inittab file. One small mistake in /etc/inittab
can prevent your machine from booting. Instead, use the commands mkitab, chitab, and rmitab to edit
/etc/inittab. The advantage of these commands is that they always guarantee a
non-corrupted /etc/inittab file. If your machine stops booting with an LED 553, this code
indicates a bad /etc/inittab file in most cases.

Consider the following examples:

- To add a line to /etc/inittab, use the mkitab command. For example:
# mkitab "myid:2:once:/usr/local/bin/errlog.check"
- To change /etc/inittab so that init ignores the line tty1, use the chitab command:
# chitab "tty1:2:off:/usr/sbin/getty /dev/tty1"
- To remove the line tty1 from /etc/inittab, use the rmitab command. For example:
# rmitab tty1
Viewing /etc/inittab
The lsitab command can be used to view the /etc/inittab file. For example:
# lsitab dt
dt:2:wait:/etc/rc.dt
If you issue lsitab -a, the complete /etc/inittab file is shown.
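Each /etc/inittab entry shown above has four colon-separated fields: identifier, run levels, action, and command. The following POSIX shell sketch splits one entry into those fields, which can be useful when reviewing lsitab output. The function name parse_inittab_line is hypothetical, and the sketch does no validation of the action keyword.

```shell
#!/bin/sh
# Hypothetical helper: split one inittab entry into its four fields.
# Only the first three colons are treated as separators, so a command that
# itself contains colons is kept intact.
parse_inittab_line() {
    line=$1
    id=${line%%:*};        rest=${line#*:}
    runlevels=${rest%%:*}; rest=${rest#*:}
    action=${rest%%:*};    cmd=${rest#*:}
    printf 'id=%s\nrunlevels=%s\naction=%s\ncommand=%s\n' \
        "$id" "$runlevels" "$action" "$cmd"
}

parse_inittab_line 'brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1'
```

For the brc entry, the run level field is empty, which the sketch reports as an empty value.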
telinit and run levels

Use the telinit command to signal the init daemon:
- To tell the init daemon to reread the /etc/inittab file, use:
# telinit q
- To tell the init daemon to reset the environment to match a different (or same) run level, use:
# telinit n (Where n is the wanted run level.)
- To query the current run level, use:
# who -r

Boot problem management
Symptoms or LED code: AA060011, message: Can’t find OS Image
  Possible causes: Bad bootlist; damaged BLV
  User action: Boot SMS, update bootlist. Boot to maintenance, access the rootvg, and re-create the BLV:
  # bosboot -ad /dev/hdiskx
Symptoms or LED code: 551, 555, 557
  Possible causes: File system or log corrupted; rootvg locked (only if 551)
  User action: Rebuild journal log and fsck the file systems. Unlock rootvg (chvg -u rootvg)
Symptoms or LED code: 552, 554, 556
  Possible causes: File system superblock corrupted; reduced ODM corrupted
  User action: Rebuild journal log and fsck the file systems, or recover superblock from secondary. If that fails, recover from mksysb
Symptoms or LED code: 553
  Possible causes: Corrupt /etc/inittab; /etc/environment missing or corrupt
  User action: Access the rootvg. Check /etc/inittab (empty, missing or corrupt?). Check /etc/environment
Symptoms or LED code: 523 - 534
  Possible causes: ODM files missing
  User action: ODM files are missing or inaccessible. Restore missing files from a system backup
Symptoms or LED code: 518, 517
  Possible causes: Failed or hung file system mount (/usr, /var, /tmp)
  User action: Check /etc/filesystems. Check network (if remote mount), file systems (fsck) and hardware.
Figure 6-17. Boot problem management AN153.0
Notes:
Introduction
The visual shows some common boot errors that might happen during the AIX software boot
process.
Bootlist wrong?
If the bootlist is wrong, the system cannot boot. This problem is easy to fix. Boot in SMS and
select the correct boot device. Keep in mind that only hard disks with boot records are shown as
selectable boot devices.
/etc/inittab corrupted? /etc/environment corrupted?

An LED of 553 usually indicates a corrupted /etc/inittab file, but in some cases a bad
/etc/environment might also lead to a 553 LED. To fix this problem, boot in maintenance
mode and check both files. Consider using a mksysb to retrieve these files from a backup tape.
Boot logical volume or boot record corrupted?

The next thing to try if your machine does not boot is to check the boot logical volume.
To fix a corrupted boot logical volume, boot in maintenance mode and use the bosboot
command:
# bosboot -ad /dev/hdiskx
JFS log or JFS2 log corrupted?

To fix a corrupted JFS or JFS2 log, boot in maintenance mode and access the rootvg, but do not
mount the file systems. In the maintenance shell, run the logform command and do a file
system check for all file systems that use this JFS or JFS2 log. Keep in mind what file system
type your rootvg had: JFS or JFS2.
The logform command initializes a new JFS transaction log and might result in loss of data
because JFS transactions might be destroyed. Your machine will boot after the JFS log is
repaired.
Superblock corrupted?
Another thing that you can try is to check the superblocks of your rootvg file systems. If you boot
in maintenance mode and you get error messages like Not an AIX file system or Not a
recognized file system type, it is probably due to a corrupted superblock in the file system.
Each file system has two superblocks. Running fsck should automatically recover the primary
superblock by copying from the backup superblock. The following steps are provided in case
you need to do this recovery manually.
For JFS, the primary superblock is in logical block 1 and a copy is in logical block 31. To
manually copy the superblock from block 31 to block 1 for the root file system (in this example),
run the following command:
# dd count=1 bs=4k skip=31 seek=1 if=/dev/hd4 of=/dev/hd4
For JFS2, the locations are different. To manually recover the primary superblock from the
backup superblock for the root file system (in this example), run the following command:
# dd count=1 bs=4k skip=15 seek=8 if=/dev/hd4 of=/dev/hd4
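The two dd commands above differ only in the block numbers. With a 4 KB block size, skip and seek translate into byte offsets (for JFS, 31 x 4096 = 126976 and 1 x 4096 = 4096). The POSIX shell sketch below prints the command and the offsets for a given file system type without running dd. The function name superblock_restore_cmd is hypothetical, and the block numbers are taken from these notes.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the dd command that copies the backup
# superblock over the primary one. Nothing is written to disk.
# $1 = jfs or jfs2, $2 = logical volume device
superblock_restore_cmd() {
    bs=4096
    case "$1" in
        jfs)  backup=31; primary=1 ;;
        jfs2) backup=15; primary=8 ;;
        *) echo "unsupported file system type: $1" >&2; return 1 ;;
    esac
    echo "# copies $bs bytes from byte offset $((backup * bs)) to byte offset $((primary * bs))"
    echo "dd count=1 bs=4k skip=$backup seek=$primary if=$2 of=$2"
}

superblock_restore_cmd jfs /dev/hd4
superblock_restore_cmd jfs2 /dev/hd4
```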
rootvg locked?
Many LVM commands place a lock into the ODM to prevent other commands from working at
the same time. If a lock remains in the ODM due to a crash of a command, it might lead to a
hanging system.
To unlock the rootvg, boot in maintenance mode and access the rootvg with the file systems
mounted. Run the following command to unlock the rootvg:
# chvg -u rootvg

ODM files missing?

If you see LED codes in the range 523 - 534, ODM files are missing on your machine. Use a
mksysb tape of the system to restore the missing files.
Mount of /usr or /var failed?

An LED of 518 indicates that the mount of the /usr or /var file system failed. If /usr is
mounted from a network, check the network connection. If /usr or /var is locally mounted,
use fsck to check the consistency of the file system. If fsck does not help, check the
hardware by running diagnostics from the Diagnostics CD.
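The LED-to-action mapping from this topic can be condensed into one lookup for quick triage. The following POSIX shell sketch only restates the user actions that are listed in the boot problem management visual. The function name boot_led_action is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: suggest the first user action for a boot LED code,
# following the boot problem management table in these notes.
boot_led_action() {
    case "$1" in
        551) echo "Rebuild journal log and fsck the file systems; if rootvg is locked, run chvg -u rootvg" ;;
        555|557) echo "Rebuild journal log and fsck the file systems" ;;
        552|554|556) echo "Rebuild journal log and fsck the file systems, or recover the superblock; if that fails, recover from mksysb" ;;
        553) echo "Check /etc/inittab and /etc/environment" ;;
        52[3-9]|53[0-4]) echo "Restore missing ODM files from a system backup" ;;
        517|518) echo "Check /etc/filesystems, the network (if remote mount), the file systems (fsck), and hardware" ;;
        *) echo "No entry for LED $1 in this table"; return 1 ;;
    esac
}

for led in 553 530 518; do
    printf '%s: %s\n' "$led" "$(boot_led_action "$led")"
done
```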
Let's review: /etc/inittab file
init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3
rc:2:wait:/etc/rc
fbcheck:2:wait:/usr/sbin/fbcheck
srcmstr:2:respawn:/usr/sbin/srcmstr
rctcpip:2:wait:/etc/rc.tcpip
rcnfs:2:wait:/etc/rc.nfs
qdaemon:2:wait:/usr/bin/startsrc -sqdaemon
dt:2:wait:/etc/rc.dt
tty0:2:off:/usr/sbin/getty /dev/tty1
myid:2:once:/usr/local/bin/errlog.check
Figure 6-18. Let's review: /etc/inittab file AN153.0
Notes:
Instructions
Answer the following questions as they relate to the /etc/inittab file shown in the visual:
1. Which process does the init process start only one time?
The init process does not wait for the initialization of this process.
2. Which process is involved in print activities on an AIX system?
3. Which line does the init process ignore?
4. Which line determines that multiuser mode is the initial run level of the system?

5. Where is the System Resource Controller started?
6. Which line controls network processes?
7. Which component allows the execution of programs at a certain date or time?
8. Which line runs /etc/firstboot, if it exists?
9. Which script controls starting of the CDE desktop?
10. Which line is run in all run levels?
11. Which line takes care of varying on the volume groups, activating paging spaces, and
mounting file systems that are activated during boot?
Checkpoint (1 of 2)
1. From where is rc.boot 3 run?
2. Your system stops booting with LED 557. In which rc.boot phase
does the system stop?
3. What are some reasons for this problem (LED 557)?
4. Which ODM file is used by the cfgmgr during boot to configure the
devices in the correct sequence?
Checkpoint (2 of 2)
5. What is the likely cause if your system stops booting with LED 553?
6. What does the line init:2:initdefault: in /etc/inittab mean?
Exercise: System initialization: rc.boot and inittab

• Repair a corrupted log logical volume
• Analyze and fix a boot failure
• Explore the rc.boot script
Figure 6-21. Exercise: System initialization: rc.boot and inittab AN153.0
Unit summary

• Identify the steps in system initialization from loading the boot
image to boot completion
Notes:
Highlights
- After the boot image is loaded into RAM, the rc.boot script is run three times to configure
the system.
- During rc.boot 1, devices to vary on the rootvg are configured.
- During rc.boot 2, the rootvg is varied on.
- In rc.boot 3, the remaining devices are configured.
- The init process initiates processes that are defined in the /etc/inittab file.
Unit 7. LVM metadata and related problems

This unit explains how metadata concepts are important in understanding
and working with AIX logical volume manager problems.

• Explain where LVM metadata information is stored
• Use importvg and exportvg to manage LVM metadata
• Solve ODM-related LVM problems

Accountability:
• Lab exercise
References
SG24-5422-00 AIX Logical Volume Manager from A to Z: Introduction and
Concepts (Redbooks)
SG24-5433-00 AIX Logical Volume Manager from A to Z: Troubleshooting
and Commands (Redbooks)
GG24-4484-00 AIX Storage Management (Redbooks)
7.1. LVM data representation: Overview
Review: LVM terms
[Visual: a volume group that contains physical volumes, which are divided into physical partitions, and a logical volume, which is made up of logical partitions.]
Figure 7-2. Review: LVM terms AN153.0
Notes:
Introduction
This visual and the associated student notes provide a review of basic LVM terms.
Volume groups, physical volumes, and physical partitions

A volume group (VG) consists of one or more physical volumes (PV) that are divided into
physical partitions (PP). When a volume group is created, a physical partition size must be
specified. This physical partition size is the smallest allocation unit for the LVM. The physical
partition size is specified in units of megabytes from 1 (1 MB) through 1024 (1 GB) for normal
or big volume groups (more on the physical partition size later). The physical partition size for
scalable volume groups can be up to 128 GB. The physical partition size must be equal to a
power of 2 (for example, 1, 2, 4, 8). The default physical partition size for normal and big volume
groups is the lowest value that can be used to remain within a limitation of 1016 physical
partitions per physical volume. The default value for scalable volume groups (introduced in AIX
5.3) is the lowest value that can be used to accommodate 2040 physical partitions per physical
volume.
For scalable volume groups, the maximum number of physical partitions is no longer defined on
a per-disk basis but applies to the entire volume group. A scalable volume group can hold up
to 2,097,152 (2048 K) physical partitions.
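The default physical partition size rule above amounts to doubling the size until the disk fits within the partition limit. The following POSIX shell sketch shows the arithmetic only. The function name default_pp_size_mb is hypothetical, the 70000 MB disk is an example value, and the calculation ignores the space that the LVM reserves on the disk, so real defaults might differ.

```shell
#!/bin/sh
# Hypothetical helper: smallest power-of-2 PP size (in MB) that keeps the
# number of physical partitions on one disk within a limit.
# $1 = disk size in MB
# $2 = PP limit per physical volume (1016 or 2040, as in the notes)
default_pp_size_mb() {
    disk_mb=$1
    limit=$2
    pp=1
    while [ $(( disk_mb / pp )) -gt "$limit" ]; do
        pp=$(( pp * 2 ))
    done
    echo "$pp"
}

echo "normal or big VG: $(default_pp_size_mb 70000 1016) MB"
echo "scalable VG:      $(default_pp_size_mb 70000 2040) MB"
```

For the example disk, 64 MB partitions would give 1093 partitions, which exceeds 1016, so the sketch moves on to 128 MB for the normal or big case.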
Logical volumes and logical partitions

The LVM provides logical volumes (LVs) that can be created, extended, moved, and deleted at
run time. A logical volume can span multiple disks, which is one of the biggest advantages of
the LVM.
Logical volumes contain the JFS and JFS2 file systems, paging spaces, journal logs, the boot
logical volumes, or nothing (when used as a raw logical volume).
Logical volumes are divided into logical partitions (LPs), where each logical partition is
associated with at least one physical partition.
LVM identifiers
Goal: Unique worldwide identifiers for volume groups, hard disks, and logical volumes

# lsvg rootvg | grep IDENT
... VG IDENTIFIER: 00c35ba000004c00000001157f54bf78          (32 bytes long)
# lspv
hdisk0 00c35ba07b2e24f0 rootvg active                        (32 bytes long; 16 are shown)
# lsattr -El hdisk# -a unique_id
unique_id 3321360050768019102C0F000000000006E2904214503IBMfcp
# lslv hd4 | grep IDENT
LV IDENTIFIER: 00c35ba000004c00000001157f54bf78.4 ...        (VGID.minor number)
# uname -m
00C35BA04C00
Figure 7-3. LVM identifiers AN153.0
Notes:
Use of identifiers
The LVM uses identifiers for disks, volume groups, and logical volumes. As volume groups can
be exported and imported between systems, these identifiers must be unique worldwide.
AIX generated identifiers are based on the CPU ID of the creating host and a time stamp.
Volume group identifiers

As shown on the visual, volume group identifiers (VGIDs) have a length of 32 bytes.

Disk identifiers

Disk identifiers have a length of 32 bytes, but currently the last 16 bytes are unused and are all
set to 0 in the ODM. Notice that, as shown on the visual, only the first 16 bytes of this identifier
are displayed in the output of the lspv command.
In a SAN environment, path management needs to have a method for identifying a disk that is
discovered over two different paths. Some storage solutions in an AIX environment use the
PVID for this purpose. Other storage solutions use an IEEE volume identifier (ieee_volname)
or a unique device identifier, or UDID (unique_id), for this purpose. Each of these identifiers is
stored as an attribute of the disk in the ODM.
The PVID attribute is created the first time that a disk is assigned to a volume group.
If you ever must manually update the disk identifiers in the ODM, do not forget to add 16 zeros
to the physical volume ID.
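The padding rule can be sketched in a few lines of shell. This is only an illustration of the string handling; the function name pad_pvid is hypothetical, and the sample value is the PVID shown for hdisk0 on the visual.

```shell
#!/bin/sh
# Hypothetical helper (not an AIX command): extend the 16-character PVID
# that lspv displays to the 32-character form that is stored in the ODM.
pad_pvid() {
    short=$1
    if [ "${#short}" -ne 16 ]; then
        echo "pad_pvid: expected 16 characters, got ${#short}" >&2
        return 1
    fi
    echo "${short}0000000000000000"
}

pad_pvid 00c35ba07b2e24f0
```

The length check is there so that a PVID that is already padded, or a truncated one, is rejected rather than silently written with the wrong length.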
Logical volume identifiers

The logical volume identifiers consist of the volume group identifier, a period, and the minor
number of the logical volume.
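As an illustration of this structure, the following sketch splits the LVID of hd4 from the visual into its two parts by using shell parameter expansion. The function names lvid_vgid and lvid_minor are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: split an LVID of the form VGID.minor_number.
lvid_vgid() {
    echo "${1%.*}"      # everything before the last period
}
lvid_minor() {
    echo "${1##*.}"     # everything after the last period
}

lvid="00c35ba000004c00000001157f54bf78.4"
echo "VGID:         $(lvid_vgid "$lvid")"
echo "Minor number: $(lvid_minor "$lvid")"
```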
LVM data on disk control blocks
• Volume Group Descriptor Area (VGDA)

– Most important data structure of LVM
– Global to the volume group (same on each disk)
– One or two copies per disk
• Volume Group Status Area (VGSA)

– Tracks the state of mirrored copies
– One or two copies per disk
• Logical Volume Control Block (LVCB)

– Historically occupied the first 512 bytes of each logical volume
– Contains logical volume attributes (policies, number of copies)
– Scalable volume groups: The information is merged into VGDA
Figure 7-4. LVM data on disk control blocks AN153.0
Notes:
Disk control blocks that are used by LVM

The LVM uses three different disk control blocks:
- The Volume Group Descriptor Area (VGDA) is the most important data structure of the LVM.
A redundant copy is kept on each disk that is contained in a volume group. Each disk
contains the complete allocation information of the entire volume group.
- The Volume Group Status Area (VGSA) tracks the status of all physical volumes in the
volume group (active or missing) and the state of all allocated physical partitions in the
volume group (active or stale). Each disk in a volume group contains a VGSA.
- The Logical Volume Control Block (LVCB) traditionally is in the first 512 bytes of each logical
volume. If raw devices are used (for example, many database systems use raw logical
volumes), be careful that these programs do not destroy the LVCB. However, the LVCB is not
kept at this location in scalable volume groups, but instead is kept in the same reserved disk
area as the VGDA. Also, the administrator of a big volume group can use the -T O option of
the mklv command to request that the LVCB is not stored at the beginning of the logical
volume, but instead as part of the VGDA.
VGSA for scalable volume groups

The VGSA for scalable volume groups consists of three areas: Physical Volume Missing Area
(PVMA), Mirror Write Consistency Dirty Bit Area (MWC_DBA), and PP Status Area (PPSA).
- Physical Volume Missing Area: The PVMA tracks if any of the disks are missing
- MWC dirty bit area: The MWC_DBA holds the status for each logical volume if passive
mirror write consistency is used
- PP status area: The PPSA logs any stale PPs
The overall size that is reserved for the VGSA is independent of the configuration parameters of
the scalable volume group and stays constant. However, the size of the contained PPSA
changes in proportion to the configured maximum number of PPs.
LVCB-related considerations
For normal volume groups, the LVCB is in the first block of the user data within the logical
volume. Big volume groups keep more LVCB information in the VGDA. The LVCB structure on
the first logical volume user block and the LVCB structure within the VGDA are similar but not
identical. If a logical volume in a big volume group was created with the -T O option of the
mklv command, no LVCB occupies the first block of that logical volume. With scalable volume
groups, logical
volume control information is no longer stored on the first user block of any logical volume.
Therefore, no precautions need to be taken when using raw logical volumes because there is
no longer a need to preserve the information that is held by the first 512 bytes of the logical
device.
LVM data in the operating system
• AIX files
– /etc/vg/vgVGID: Handle to the VGDA copy in memory
– /dev/hdiskX: Special file for a disk
– /dev/VGname: Special file for administrative access to a volume group
– /dev/LVname: Special file for a logical volume
– /etc/filesystems: Used by the mount command to associate logical volume name, file system log, and mount point
• Object Data Manager (ODM)
– Metadata on physical volumes, volume groups, and logical volumes
– CuDv, CuAt, CuDvDr, CuDep
Figure 7-5. LVM data in the operating system AN153.0
Notes:
LVM information that is stored in the ODM

Physical volumes, volume groups, and logical volumes are handled as devices in AIX. Every
physical volume, volume group, and logical volume is defined in the customized object classes
in the ODM.
LVM information that is stored in AIX files

As shown on the visual, many AIX files also contain LVM-related data.
The kernel always stores the VGDA in memory to increase performance. This technique is
called a memory-mapped file. The handle is always a file in the /etc/vg directory. This
filename always reflects the volume group identifier.

LVM-related ODM objects
• CuDv: Identifies as devices:

– Volume groups, physical volumes, logical volumes
• CuAt: Attributes for each LVM entity, including:

– Physical volume’s PVID value (pvid)
– Logical volume’s LVID value (lvserial_id)
– Volume Group’s VGID value (vgserial_id)
– Volume group’s PVIDs (pv) – one for each PV
• CuDep: One object per logical volume dependency
• CuDvDr: Device driver information:

– Object for each volume group, physical volume, and logical volume
Figure 7-6. LVM-related ODM objects AN153.0
Notes:
Overview
The LVM metadata that is maintained in the ODM database has a large overlap with the
information maintained in the VGDA and LVCB control blocks on disk. Yet, there is information
in the control blocks (such as the mapping of logical partitions) that is not kept in the ODM.
There is also information (such as device drivers and logical names) that is not kept in the
control blocks. Each metadata location plays a special role. There are mechanisms to ensure
that the information does not conflict.
LVM-related ODM object classes

The visual provides an overview of the ODM objects. Each of these objects is covered in more
detail later in the unit.
7.2. Export and import
Exporting a volume group
[Diagram: system moon with disk hdisk9, which belongs to volume group myvg]
To export a volume group:
1. Unmount all file systems from the volume group:
# umount /dev/lv10
# umount /dev/lv11
2. Vary off the volume group:
# varyoffvg myvg
3. Export the volume group:
# exportvg myvg
The complete volume group is removed from the ODM
Figure 7-7. Exporting a volume group AN153.0
Notes:
The scenario
The exportvg and importvg commands can be used to fix ODM problems. These
commands also provide a way to transfer data between different AIX systems. This visual
provides an example of how to export a volume group.
The disk, hdisk9, is connected to the system moon. This disk belongs to the myvg volume
group. This volume group needs to be transferred to another system.
Procedure to export a volume group

Do the following steps to export the volume group:
1. Unmount all file systems from the volume group. In the example, there are three logical
volumes in myvg: lv10, lv11, and loglv01. The loglv01 logical volume is the JFS log
device for the file systems in myvg; it is closed when all file systems that use it are
unmounted.
2. When all logical volumes are closed, use the varyoffvg command to vary off the
volume group.
3. Finally, export the volume group with the exportvg command. After this point, the
complete volume group (including all file systems and logical volumes) is removed from
the ODM.
After exporting the volume group, the disks in the volume group can be transferred to
another system.
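The order of the three steps can be sketched as a dry run. The function export_plan is hypothetical; it only prints the commands in the order described above and does not run them, so the output can be reviewed before anything is changed.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the export steps for a volume group.
# $1 is the volume group name; the remaining arguments are the logical
# volumes that hold mounted file systems.
export_plan() {
    vg=$1
    shift
    for lv in "$@"; do
        echo "umount /dev/$lv"      # step 1: unmount each file system
    done
    echo "varyoffvg $vg"            # step 2: vary off the volume group
    echo "exportvg $vg"             # step 3: remove it from the ODM
}

export_plan myvg lv10 lv11
```

The log logical volume (loglv01 in the example) is not listed because it is closed automatically when the file systems that use it are unmounted.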
Importing a volume group
[Diagram: system mars with disk hdisk3, which belongs to volume group myvg]
To import a volume group:
1. Configure the disks.
2. Import the volume group:
# importvg -y myvg hdisk3
3. Mount the file systems:
# mount /dev/lv10
# mount /dev/lv11
The complete volume group is added to the ODM
Figure 7-8. Importing a volume group AN153.0
Notes:
Procedure to import a volume group

To import a volume group into a system, for example into a system that is named mars, do the
following steps:
1. Connect all disks (in the example there is only one disk) and reboot the system so that
cfgmgr configures the added disks.
2. You need to specify only one disk (either hdisk# or the PVID) in the importvg command.
Because all disks contain the same VGDA information, the system can determine this
information by querying any VGDA from any disk in the volume group.
If you do not specify the -y flag, the command generates a new volume group name.
The importvg command generates new ODM entries.
The volume group is automatically varied on.
3. Finally, mount the file systems.

importvg and duplicate names
• Avoid duplicate logical volume names and file system names between systems
– Avoid default names such as fslv00 and lv01
– Use functionally meaningful names such as db2pay00
• importvg generates a new logical volume name for a duplicate
• importvg does not create an /etc/filesystems entry for a duplicate label (mount point)
Figure 7-9. importvg and duplicate names AN153.0
Notes:
Duplicate names during importvg

If a logical volume name or a file system name (label) already exists on the system to which you
are importing a volume group, you run into problems. The best way to avoid this situation is to have
a naming convention for your logical volume and file system names that ensures uniqueness
across systems. The most common reason for duplicates is the acceptance and use of the
AIX default names.
Duplicate logical volume names

If you are importing a volume group with logical volume names that already exist on the system,
the importvg command renames the logical volumes from the volume group that is being
imported.
Duplicate file system names and /etc/filesystems stanzas

Normally the importvg command creates new stanzas in /etc/filesystems for file
systems in the imported volume group. If importvg finds that the new file system’s label
duplicates the label of an existing stanza, it provides an error message and does not create the
new stanza.
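A check for duplicate labels before an import can be sketched as follows. The function duplicate_labels is hypothetical, and how the two lists are collected on a real system is outside the scope of this sketch; the sample labels are taken from the datavg and myvg example that follows in this unit.

```shell
#!/bin/sh
# Hypothetical helper: print every label (mount point) that appears in both
# newline-separated lists.
# $1: labels already defined on the target system
# $2: labels in the volume group that is about to be imported
duplicate_labels() {
    {
        printf '%s\n' "$1" | sort -u    # remove repeats within each list first
        printf '%s\n' "$2" | sort -u
    } | sort | uniq -d                  # lines seen twice are in both lists
}

existing="/home/sarah
/home/michael"
incoming="/home/peter
/home/michael"

duplicate_labels "$existing" "$incoming"
```

Any label that the sketch prints is one for which importvg would not create a new /etc/filesystems stanza.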

importvg and existing logical volumes
[Diagram: system mars with hdisk2 (datavg) and the imported hdisk3 (myvg)]
importvg: changing LV name lv10 to fslv00
importvg: changing LV name lv11 to fslv01
importvg can also accept the PVID in place of the hdisk name
Figure 7-10. importvg and existing logical volumes AN153.0
Notes:
Renaming logical volumes

If you are importing a volume group with logical volume names that already exist on the system,
the importvg command renames the logical volumes from the volume group that is being
imported.
In the example, the logical volumes /dev/lv10 and /dev/lv11 exist in both volume groups. During the
importvg command, the logical volumes from myvg are renamed to /dev/fslv00 and
/dev/fslv01.
importvg and existing file systems (1 of 2)
datavg (existing)               myvg (imported)
/dev/lv10: /home/sarah          /dev/lv23: /home/peter
/dev/lv11: /home/michael        /dev/lv24: /home/michael
/dev/loglv00: log device        /dev/loglv01: log device
Warning: mount point /home/michael already exists in /etc/filesystems
# umount /home/michael
# mount -o log=/dev/loglv01 /dev/lv24 /home/michael
Figure 7-11. importvg and existing file systems (1 of 2) AN153.0
Notes:
Using umount and mount

If a file system (for example, /home/michael) already exists on a system, you run into problems
when you mount the imported file system that has the same mount point.
One method to get around this problem is to:
1. Unmount the file system that exists on the system (for example, /home/michael from
datavg).
2. Mount the imported file system. You must specify the:
- Log device (-o log=/dev/loglv01)
- Logical volume name (/dev/lv24)
- Mount point (/home/michael)
If the file system type is jfs2, you must specify the type as well (-V jfs2). You can get
this information by running the command: getlvcb -AT lv24
Another method is to add a stanza to the /etc/filesystems file.

importvg and existing file systems (2 of 2)
# vi /etc/filesystems
/home/michael:
        dev = /dev/lv11
        vfs = jfs
        log = /dev/loglv00
        mount = false
        options = rw
        account = false
/home/michael_moon:
        dev = /dev/lv24
        vfs = jfs
        log = /dev/loglv01
        mount = false
        options = rw
        account = false
# mount /home/michael
# mount /home/michael_moon
Mount point must exist
[Diagram: datavg with /dev/lv10 (/home/sarah), /dev/lv11 (/home/michael), and /dev/loglv00 (log device); hdisk3 (myvg) with /dev/lv23 (/home/peter), /dev/lv24 (/home/michael), and /dev/loglv01 (log device)]
Figure 7-12. importvg and existing file systems (2 of 2) AN153.0
Notes:
Create a stanza in /etc/filesystems

If you need both file systems (the imported and the one that exists) mounted at the same time,
you must create a new stanza in /etc/filesystems. In the example, a second stanza is
created for the imported logical volume, /home/michael_moon. The fields in the new stanza
are:
- dev specifies the logical volume, in the example /dev/lv24.
- vfs specifies the file system type, in the example a journaled file system.
- log specifies the JFS log device for the file system.
- mount specifies whether this file system should be mounted by default. The value false
specifies the file system is not mounted during boot. The value true indicates that a file
system should be mounted during the boot process.
- options specifies that this file system should be mounted with read and write access.
- account specifies whether the accounting system processes the file system. A value of
false indicates no accounting.
Before mounting the file system /home/michael_moon, the corresponding mount point must
be created.
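To illustrate how the stanza fields relate to a mount point, the following sketch looks up one field in one stanza. The function stanza_field is hypothetical, the stanza text is the example from the visual, and the sketch assumes the simple "field = value" layout shown there (comment lines and values that contain spaces are not handled).

```shell
#!/bin/sh
# Hypothetical helper: print the value of one field from one stanza.
# $1: stanza name (the mount point), $2: field name; stanza text on stdin.
stanza_field() {
    awk -v stanza="$1:" -v field="$2" '
        # A stanza header is a single word that ends with a colon.
        NF == 1 && $1 ~ /:$/ { in_stanza = ($1 == stanza); next }
        # Inside the wanted stanza, print the value of "field = value".
        in_stanza && $1 == field && $2 == "=" { print $3 }
    '
}

sample='/home/michael:
        dev = /dev/lv11
        vfs = jfs
        log = /dev/loglv00
        mount = false
/home/michael_moon:
        dev = /dev/lv24
        vfs = jfs
        log = /dev/loglv01
        mount = false'

echo "dev for /home/michael_moon: $(printf '%s\n' "$sample" | stanza_field /home/michael_moon dev)"
echo "log for /home/michael_moon: $(printf '%s\n' "$sample" | stanza_field /home/michael_moon log)"
```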

7.3. LVM metadata details
Contents of the VGDA
• Header time stamp
– Updated when volume group is changed
• Physical volume list
– PVIDs only (no physical volume names)
– VGDA count and physical volume state
• Logical volume list
– LVIDs and logical volume names
– Number of copies
• Physical partition map
– Maps logical partitions to physical partitions
• Trailer time stamp
– Must contain same value as header time stamp
Figure 7-13. Contents of the VGDA AN153.0
Notes:
Introduction
The table in the visual shows the contents of the VGDA. The individual items that are listed are
discussed in the paragraphs that follow.
Time stamps
The time stamps are used to check whether a VGDA is valid. If the system crashes while
changing the VGDA, the time stamps differ. The VGDA is marked invalid the next time the
volume group is varied on. The most current intact VGDA is used to overwrite the other VGDAs
in the volume group.

Physical volume list
The VGDA contains the physical volume list. No disk names are stored; only the unique disk
identifiers (PVIDs) are used. For each disk, the number of VGDAs on the disk and the physical
volume state are stored.
Logical volume list

The VGDA contains a record of the logical volumes that are part of the volume group. It stores
the logical volume identifiers and the corresponding logical volume names. Additionally, the
number of copies is stored for each logical volume.
Physical partition map

The most important data structure is the physical partition map. It maps each logical partition to
a physical partition. The size of the physical partition map is determined at volume group
creation time.
VGDA example
# lqueryvg -p hdisk1 -At
Max LVs: 256
PP Size: 20 1: ____________
Free PPs: 12216

LV count: 3 2: ____________
PV count: 1 3: ____________
Total VGDAs: 2 4: ____________
MAX PPs per PV: 32768

MAX PVs: 1024
5: ____________
Logical:
00c35ba000004c00000001157fcf6bdf.1 lv00 1
00c35ba000004c00000001157fcf6bdf.2 lv01 1
00c35ba000004c00000001157fcf6bdf.3 lv02 1
Physical: 00c35ba07fcf6b93 2 0
6: ____________ 7: ____________
Figure 7-14. VGDA example AN153.0
Notes:
The lqueryvg command

The lqueryvg command is a low-level command that shows an extract from the VGDA on a
specified disk, for example, hdisk1.
In the command that is shown on the visual, -p hdisk1 reads the VGDA on hdisk1; -A
displays all available information; and -t displays descriptive tags.
The visual shows only selected fields from the report; a complete example output is below in
these notes.
Interpreting lqueryvg output

As an exercise in interpreting the output of lqueryvg, match each of the following expressions
to the appropriate numbered location on the visual.
a. VGDA count on this disk
b. Two VGDAs in volume group

c. Three logical volumes in volume group
d. PP size = 2^20 (2 to the 20th power) bytes, or 1 MB (for this volume group)
e. LVIDs (VGID.minor_number)
f. One physical volume in volume group
g. PVIDs
Output of lqueryvg on AIX 7.1

The output of lqueryvg on recent AIX versions gives more information than shown in the
example on the visual. An example of lqueryvg (for the rootvg disk) output from an AIX 7.1
system is:
Max LVs: 256
PP Size: 24
Free PPs: 512
LV count: 12
PV count: 2
Total VGDAs: 3
Conc Allowed: 0
MAX PPs per PV 1016
MAX PVs: 32
Quorum (disk): 1
Quorum (dd): 1
Auto Varyon ?: 1
Conc Autovaryo 0
Varied on Conc 0
Logical: 00f6060300004c000000012d097cb46a.1 hd5 1
00f6060300004c000000012d097cb46a.2 hd6 1
00f6060300004c000000012d097cb46a.3 hd8 1
00f6060300004c000000012d097cb46a.4 hd4 1
00f6060300004c000000012d097cb46a.5 hd2 1
00f6060300004c000000012d097cb46a.6 hd9var 1
00f6060300004c000000012d097cb46a.7 hd3 1
00f6060300004c000000012d097cb46a.8 hd1 1
00f6060300004c000000012d097cb46a.9 hd10opt 1
00f6060300004c000000012d097cb46a.10 hd11admin 1
00f6060300004c000000012d097cb46a.11 lg_dumplv 1
00f6060300004c000000012d097cb46a.12 livedump 1
Physical: 000bf81121b8ef00 2 0
00f606036452e4f9 1 0
Total PPs: 1022
LTG size: 128
HOT SPARE: 0
AUTO SYNC: 0
VG PERMISSION: 0
SNAPSHOT VG: 0
IS_PRIMARY VG: 0
PSNFSTPP: 4352
VARYON MODE: 0
VG Type: 0
Max PPs: 32512
Mirror Pool St n
Sys Mgt Mode: 0
VG Reserved: 1

PV RESTRICTION 0
Infinite Retry: 2
Varyon State: 0
Disk Block Size 512
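Because lqueryvg reports the PP size as an exponent of 2, a small conversion helps when reading the output. The following sketch uses hypothetical function names and the two values from this unit: PP Size 20 on the visual and PP Size 24 in the AIX 7.1 example. It assumes an exponent of at least 20 (1 MB).

```shell
#!/bin/sh
# Hypothetical helpers: convert the "PP Size" exponent that lqueryvg
# reports into bytes and into megabytes.
pp_size_bytes() {
    echo $((1 << $1))
}
pp_size_mb() {
    echo $(( (1 << $1) / 1048576 ))
}

echo "PP Size 20 = $(pp_size_bytes 20) bytes = $(pp_size_mb 20) MB"
echo "PP Size 24 = $(pp_size_bytes 24) bytes = $(pp_size_mb 24) MB"

# Total PPs (1022 in the AIX 7.1 example) times the PP size gives the
# capacity that is covered by physical partitions.
echo "1022 PPs of $(pp_size_mb 24) MB = $((1022 * $(pp_size_mb 24))) MB"
```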
The logical volume control block
# getlvcb -AT hd2

AIX LVCB
intrapolicy = c
copies = 1
interpolicy = m
lvid = 00c35ba000004c00000001157f54bf78.5
lvname = hd2
label = /usr
machine id = 35BA04C00
number lps = 102
relocatable = y
strict = y
stripe width = 0
stripe size in exponent = 0
type = jfs2
upperbound = 32
fs =
vfs=jfs2:log=/dev/hd8:mount=automatic:type=bootfs:vol=/usr:
free=false:quota=no
time created = Wed Jul 16 02:17:50 2014
time modified = Wed Jul 16 23:31:00 2014
Figure 7-15. The logical volume control block AN153.0
Notes:
The logical volume control block (LVCB) and the getlvcb command
The LVCB stores attributes of a logical volume. The getlvcb command queries an LVCB.

In the example, the logical volume hd2 has the following characteristics:
- Intra-physical volume policy, which specifies what strategy is used for choosing physical
partitions on a physical volume (intrapolicy = c, for center). The five general strategies are
edge (sometimes called outer-edge), inner-edge, middle (sometimes called outer-middle),
inner-middle, and center.
- Number of copies (copies = 1, which means no mirroring)
- Inter-physical volume policy, which specifies the number of physical volumes to extend
across (interpolicy = m, for minimum)
- Logical volume identifier (lvid)
- Logical volume name (lvname = hd2)
- Number of logical partitions (number lps = 102)
- Can the partitions be reorganized? (relocatable = y)
- Each mirror copy on a separate disk (strict = y)
- Number of disks that are involved in striping (stripe width)
- Stripe size (stripe size in exponent)
- Logical volume type (type = jfs2)
- File system information (fs)
- Creation and last update time (time created, time modified)
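The fs field packs several key=value pairs into one colon-separated string. The following sketch shows one way to pick a single value out of it; the function fs_field is hypothetical, and the string is the fs value from the example report for hd2.

```shell
#!/bin/sh
# Hypothetical helper: extract one key from the colon-separated fs string
# that getlvcb reports. $1: the fs string, $2: the key to look up.
fs_field() {
    printf '%s\n' "$1" | tr ':' '\n' |
        awk -v key="$2" 'index($0, key "=") == 1 { print substr($0, length(key) + 2) }'
}

fs='vfs=jfs2:log=/dev/hd8:mount=automatic:type=bootfs:vol=/usr:free=false:quota=no'

echo "file system type: $(fs_field "$fs" vfs)"
echo "log device:       $(fs_field "$fs" log)"
```

These are the same two values (file system type and log device) that are needed when an imported file system is mounted manually.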
How LVM interacts with the ODM and the VGDA
[Diagram: the high-level commands (mkvg, extendvg, mklv, crfs, chfs, rmlv, reducevg, ...) match IDs by name in the ODM, change the VGDA and LVCB by using low-level commands, and update the ODM and /etc/filesystems; importvg and exportvg act on the ODM]
Figure 7-16. How LVM interacts with the ODM and the VGDA AN153.0
Notes:
High-level commands
Most of the LVM commands that are used when working with volume groups, physical volumes,
or logical volumes are high-level commands. These high-level commands (like mkvg,
extendvg, mklv, and others that are listed on the visual) are implemented as executable code
or shell scripts and use names to reference a certain LVM object. The ODM is consulted to
match a name, for example, rootvg or hdisk0, to an identifier.
Interaction with disk control blocks and the ODM

The high-level commands call intermediate or low-level commands that query or change the
disk control blocks VGDA or LVCB. Additionally, the ODM must be updated; for example, to add
a new logical volume. The high-level commands contain signal handlers to clean up the
configuration if the program is stopped abnormally. However, kill -9 cannot be caught by a
signal handler. If a system crashes, or if a high-level command is ended by kill -9, the system
can end up in a situation where the VGDA/LVCB and the ODM are not in sync. The same
situation can occur when low-level commands are used
incorrectly.

The importvg and exportvg commands

The visual shows two important commands that are explained in detail later. The command
importvg imports a complete new volume group based on a VGDA and LVCB on a disk. The
command exportvg removes a complete volume group from the ODM.
VGDA and LVCB corruption

The focus in this course is on situations where the ODM is corrupted and you assume that the
LVM control data (for example, the VGDA or the LVCB) is correct. If an attempted execution of
LVM commands (for example, lsvg, varyonvg, or reducevg) fails with a core dump, that might
be an indication that the LVM control data on one of the disks is corrupted. In this situation, do
not attempt to resync the ODM by using the procedures that are covered in this unit. They do not
solve the problem, and the corruption might even be propagated to other disks in the volume
group, making the situation worse. In most cases, you need to recover from a volume group
backup. If recovery from backup is not a viable option, it is suggested that you work with AIX
Support in dealing with the problem.
ODM entries for physical volumes (1 of 4)
# odmget -q "name like hdisk[02]" CuDv
CuDv:
name = "hdisk0"
status = 1
chgstatus = 2
ddins = "scsidisk"
location = ""
parent = "vscsi0"
connwhere = "810000000000"
PdDvLn = "disk/vscsi/vdisk"
CuDv:
name = "hdisk2"
status = 1
chgstatus = 0
ddins = "scdisk"
location = "01-08-01-8,0"
parent = "scsi1"
connwhere = "8,0"
Figure 7-17. ODM entries for physical volumes (1 of 4) AN153.0
Notes:
CuDv entries for physical volumes

The CuDv object class contains information about each physical volume.

Key attributes
Remember the most important attributes:
- status = 1 means that the disk is available
- chgstatus = 2 means that the status did not change since last reboot
- location specifies the location code of the device
- parent specifies the parent device
Physical versus virtual disks

The two disks have different device drivers and different Predefined Device object class links.
hdisk2 is a physical disk that was directly allocated to the logical partition from which this
example came. hdisk0 is a virtual disk that is mapped through the Advanced Power Virtualization
feature to a backing physical disk that is allocated to a Virtual I/O Server partition on the same
machine.
The virtual disk does not have an AIX location code. Rather, its location is the physical location
code of its parent virtual SCSI adapter (vscsi0) supplemented with the LUN number for the
backing device that is recorded in the connwhere field. The physical location code of the parent
adapter is recorded in the CuVPD object for the adapter.
# odmget -q "name=hdisk1" CuAt | egrep -p "pvid|unique_id"
CuAt:
name = "hdisk1"
attribute = "unique_id"
value = "3321360050768019102C0F000000000006E2A04214503IBMfcp"
type = "R"
generic = "D"
rep = "nl"
nls_index = 79
CuAt:
name = "hdisk1"
attribute = "pvid"
value = "00f606036452e56a0000000000000000"
type = "R"
generic = "D"
rep = "s"
nls_index = 2
Notes:
The pvid attribute

The disk’s most important attribute is its PVID.
The PVID has a length of 32 bytes, where the last 16 bytes are set to zeros in the ODM.
Whenever you must manually update a PVID in the ODM, you must specify the complete
32-byte PVID of the disk.
The pvid attribute is usually assigned the first time that the disk is added to a volume group.
The unique_id attribute

When working with a disk that is a LUN accessed by way of the storage area network, the
unique_id is often an important identifier.

Other information that is stored in CuAt

Other attributes of physical volumes (for example, the reserve_policy or the queue_depth)
can be stored in CuAt.
Other methods of displaying attribute information that is stored in CuAt

In the visual, egrep -p was used to pick out the stanzas for the objects of interest. Alternatively,
the odmget query restriction allows multiple descriptors:
# odmget -q "name=hdisk1 and attribute=pvid" CuAt
# odmget -q "name=hdisk1 and attribute=unique_id" CuAt
Normally, the easier way to obtain attribute information is to use the lsattr command:
# lsattr -E -l hdisk1 -a pvid
# lsattr -E -l hdisk1 -a unique_id
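When odmget output was saved to a file or captured in a script, the value of one attribute can be picked out of the stanzas with awk. The following is a minimal sketch that works on the sample stanzas from the visual; the function odm_value is hypothetical, and values that contain spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one attribute from odmget-style
# CuAt stanzas that are read from standard input. $1: attribute name.
odm_value() {
    awk -v want="$1" '
        { gsub(/"/, "") }                          # drop the quotes odmget prints
        $1 == "attribute" { hit = ($3 == want) }   # remember which stanza we are in
        $1 == "value" && hit { print $3 }
    '
}

sample='CuAt:
name = "hdisk1"
attribute = "unique_id"
value = "3321360050768019102C0F000000000006E2A04214503IBMfcp"
CuAt:
name = "hdisk1"
attribute = "pvid"
value = "00f606036452e56a0000000000000000"'

pvid=$(printf '%s\n' "$sample" | odm_value pvid)
echo "PVID in the ODM:         $pvid"
echo "PVID as lspv displays it: ${pvid%0000000000000000}"
```

The sketch relies on the attribute line appearing before the value line in each stanza, as it does in the sample output.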
# odmget -q "name=hdisk1" CuDv
CuDv:
name = "hdisk1"
status = 1
chgstatus = 2
ddins = "scsidisk"
location = "31-T1-01"
parent = "fscsi0"
connwhere = "W_0"
PdDvLn = "disk/fcp/mpioosdisk"
# lsattr -El hdisk1 | egrep "ww_name|lun_id"
lun_id 0x1000000000000 Logical Unit Number ID FALSE

ww_name 0x500507680140581e FC World Wide Name FALSE
# lscfg -l hdisk1
hdisk1 U8233.E8B.100603P-V16-C31-T1-W500507680140581E-L1000000000000
MPIO IBM 2145 FC Disk
Notes:
For LUNs that are accessed over Fibre Channel, the location field identifies the parent FC adapter;
the connwhere field has a placeholder value of W_0, which indicates that the disk identity is
stored in the ww_name attribute of the disk.
The physical location code consists of the location code of the parent adapter, followed by the
ww_name and the LUN ID (obtained from the lun_id attribute of the disk).
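The composition of the physical location code can be sketched with the values from the visual. The function fc_location_code is hypothetical, and the adapter location code used here (U8233.E8B.100603P-V16-C31-T1) is an assumption that is derived from the lscfg output by removing the -W and -L parts.

```shell
#!/bin/sh
# Hypothetical helper: build the physical location code of an FC disk.
# $1: physical location code of the parent adapter port (assumed)
# $2: ww_name attribute, $3: lun_id attribute (as lsattr prints them)
fc_location_code() {
    # Remove the 0x prefix and convert the hexadecimal digits to uppercase.
    ww=$(printf '%s' "${2#0x}" | tr 'abcdef' 'ABCDEF')
    lun=$(printf '%s' "${3#0x}" | tr 'abcdef' 'ABCDEF')
    echo "$1-W$ww-L$lun"
}

fc_location_code U8233.E8B.100603P-V16-C31-T1 0x500507680140581e 0x1000000000000
```

For the sample values, the result matches the location code that lscfg -l hdisk1 reports on the visual.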

# odmget -q "value3 like hdisk[03]" CuDvDr

CuDvDr:
resource = "devno"
value1 = "17"
value2 = "0"
value3 = "hdisk0"
CuDvDr:
resource = "devno"
value1 = "36"
value2 = "0"
value3 = "hdisk3"
# ls -l /dev/hdisk[03]
brw------- 1 root system 17, 0 Jul 16 15:13 /dev/hdisk0
brw------- 1 root system 36, 0 Jul 16 04:21 /dev/hdisk3
Notes:
Major and minor numbers

The ODM class CuDvDr is used to store the major and minor numbers of the devices. The
output that is shown on the visual indicates that CuDvDr has the major number 17 (value1)
and the minor number 0 (value2) for hdisk0.
The major numbers for the two disks are different because hdisk0 is a virtual disk, served from
a Virtual I/O Server partition, while hdisk3 is a physical disk allocated to this logical partition.
Special files
Applications or system programs use the special files to access a certain device. For example,
the visual shows special files that are used to access hdisk0 (/dev/hdisk0) and hdisk3
(/dev/hdisk3).
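To compare a special file with its CuDvDr object, the major and minor numbers can be read from the ls -l output. The following sketch uses the hdisk0 line from the visual; the function devno_from_ls is hypothetical and assumes the field layout shown there.

```shell
#!/bin/sh
# Hypothetical helper: read one "ls -l" line for a device special file from
# standard input and print "major minor".
devno_from_ls() {
    awk '{ sub(/,$/, "", $5); print $5, $6 }'   # field 5 is "major,", field 6 is minor
}

line='brw------- 1 root system 17, 0 Jul 16 15:13 /dev/hdisk0'
devno=$(printf '%s\n' "$line" | devno_from_ls)
echo "special file: $devno"

# value1 and value2 from the CuDvDr object for hdisk0 on the visual.
odm_devno="17 0"
if [ "$devno" = "$odm_devno" ]; then
    echo "special file and CuDvDr agree"
else
    echo "mismatch between special file and CuDvDr"
fi
```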

ODM entries for volume groups (1 of 2)
# odmget -q "name=rootvg" CuDv

CuDv:
name = "rootvg"
status = 0
chgstatus = 1
ddins = ""
location = ""
parent = ""
connwhere = ""
PdDvLn = "logical_volume/vgsubclass/vgtype"
# odmget -q "name=rootvg" CuAt

CuAt:
name = "rootvg"
attribute = "vgserial_id"
value = "00c35ba000004c00000001157f54bf78"
type = "R"
generic = "D"
rep = "n"
nls_index = 637
Output continued on next visual
Figure 7-21. ODM entries for volume groups (1 of 2) AN153.0
Notes:
CuDv entries for volume groups

Information indicating the existence of a volume group is stored in CuDv, which means all
volume groups must have an object in this class. The visual shows an example of a CuDv entry
for rootvg.
VGID
One of the most important pieces of information about a volume group is the VGID. As shown
on the visual, this information is stored in CuAt.
Disks belonging to a volume group

An entry for each disk that belongs to a volume group is stored in CuAt. An example is shown
on the next visual.
ODM entries for volume groups (2 of 2)
# odmget -q "name=rootvg" CuAt

...
CuAt:
name = "rootvg"
attribute = "timestamp"
value = "470a1bc9243ed693"
type = "R"
generic = "DU"
rep = "s"
nls_index = 0
CuAt:
name = "rootvg"
attribute = "pv"
value = "00c35ba07b2e24f00000000000000000"
type = "R"
generic = ""
rep = "sl"
nls_index = 0
Figure 7-22. ODM entries for volume groups (2 of 2) AN153.0
Notes:
Disks belonging to a volume group

The CuAt object class contains an object for each disk that belongs to a volume group. The
visual shows an example of a CuAt object for a disk in rootvg.
Length of PVID
Remember that the PVID is a 32-digit field, where the last 16 digits are set to zeros.

ODM entries for logical volumes (1 of 2)
# odmget -q "name=hd2" CuDv

CuDv:
name = "hd2"
status = 0
chgstatus = 1
ddins = ""
location = ""
parent = "rootvg"
connwhere = ""
PdDvLn = "logical_volume/lvsubclass/lvtype"
# odmget -q "name=hd2" CuAt

CuAt:
name = "hd2"
attribute = "lvserial_id"
value = "00c35ba000004c00000001157f54bf78.5"
type = "R"
generic = "D"
rep = "n"
nls_index = 648
Other attributes include intra, stripe_width, type, and so on.
Figure 7-23. ODM entries for logical volumes (1 of 2) AN153.0
Notes:
CuDv entries for logical volumes

The CuDv object class contains an entry for each logical volume.
Attributes of a logical volume

Attributes of a logical volume, for example, its LVID (lvserial_id), are stored in the object
class CuAt. Other attributes that belong to a logical volume are the intra-physical policy
(intra), stripe_width, type, size, and label.
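In the sample output, the lvserial_id of hd2 is the vgserial_id of rootvg followed by a period and a number. The following sketch splits an LVID into those two parts. The function names are hypothetical, and treating the suffix as the logical volume minor number is an assumption that is based on this sample only.

```shell
# lvid_vgid LVID (hypothetical helper): print the part before the last period.
lvid_vgid() {
    printf '%s\n' "${1%.*}"
}

# lvid_minor LVID (hypothetical helper): print the part after the last period.
lvid_minor() {
    printf '%s\n' "${1##*.}"
}

lvid=00c35ba000004c00000001157f54bf78.5   # lvserial_id of hd2 from the visual
echo "VGID:   $(lvid_vgid "$lvid")"
echo "suffix: $(lvid_minor "$lvid")"
```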
ODM entries for logical volumes (2 of 2)

IBM Power Systems
# odmget -q "value3=hd2" CuDvDr

CuDvDr:
resource = "devno"
value1 = "10"
value2 = "5"
value3 = "hd2"
# ls -l /dev/hd2
brw-rw---- 1 root system 10, 5 08 Jul 16 23:21 /dev/hd2
# odmget -q "dependency=hd2" CuDep

CuDep:
name = "rootvg"
dependency = "hd2"
Figure 7-24. ODM entries for logical volumes (2 of 2) AN153.0
Notes:
CuDvDr logical volume objects
Each logical volume has an object in CuDvDr that is used to create the special file entry for that
logical volume in /dev. As an example, the sample output on the visual shows the CuDvDr
object for hd2 and the corresponding /dev/hd2 (major number 10, minor number 5) special file
entry in the /dev directory.
CuDep logical volume entries

The ODM class CuDep (customized dependencies) stores dependency information for software
devices. For example, the sample output on the visual indicates that the logical volume hd2 is
contained in the rootvg volume group.
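When you compare ODM objects with special files, it can help to pull single fields out of the odmget output. The following is a minimal sketch that reads stanza text on standard input; odm_field is a hypothetical helper, and it assumes that the value contains no blanks. The sample text is the CuDvDr object for hd2 from the visual.

```shell
# odm_field FIELD (hypothetical helper): read odmget-style stanza text on
# stdin and print the unquoted value of every "FIELD = value" line.
odm_field() {
    awk -v key="$1" '$1 == key && $2 == "=" { v = $3; gsub(/"/, "", v); print v }'
}

sample='CuDvDr:
resource = "devno"
value1 = "10"
value2 = "5"
value3 = "hd2"'

major=$(printf '%s\n' "$sample" | odm_field value1)
minor=$(printf '%s\n' "$sample" | odm_field value2)
echo "expected device number for hd2: $major, $minor"
```

On a live system, the same function could be fed from odmget, and the result compared with the major and minor numbers that ls -l shows for the special file.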

7.4. LVM metadata-related problems
ODM-related LVM problems

IBM Power Systems
[Diagram: high-level commands update the VGDA and LVCB on disk and the ODM; the commands use a signal handler and a lock]
What can cause problems?

• kill -9, shutdown, system crash
• Improper use of low-level commands
• Hardware changes without or with wrong software actions
• Full root file system
Figure 7-25. ODM-related LVM problems AN153.0
Notes:
Normal functioning of high-level commands

As already mentioned, most of the time administrators use high-level commands to create or
update volume groups or logical volumes. These commands use signal handlers to set up a
proper cleanup in case of an interruption. Additionally, LVM commands use a locking
mechanism to block other commands while a change is in progress.
Causes of problems
The signal handlers that are used by high-level LVM commands do not work with a kill -9, a
system shutdown, or a system crash. You might end up in a situation where the VGDA is updated, but
the change was not stored in the ODM.
Problems might also occur because of the improper use of low-level commands or hardware
changes.

Another common problem is ODM corruption when LVM operations are run while the root file
system (which contains /etc/objrepos) is full. Always check the root file system free space
before attempting LVM recovery operations.
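The free-space check can be scripted. The following is a minimal sketch that uses the POSIX output format of df; the function names and the 10240 KB threshold are assumptions for illustration, not values from this course.

```shell
# root_free_kb (hypothetical helper): free space of the root file system in
# KB, taken from the fourth column of the POSIX (-P) df output.
root_free_kb() {
    df -Pk / | awk 'NR == 2 { print $4 }'
}

# root_has_space MIN_KB: succeed only if / has at least MIN_KB free.
root_has_space() {
    [ "$(root_free_kb)" -ge "$1" ]
}

# 10240 KB is an arbitrary example threshold.
if root_has_space 10240; then
    echo "root file system has enough free space"
else
    echo "root file system is nearly full: free space before changing LVM data"
fi
```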
Fixing ODM problems (1 of 2)

IBM Power Systems
If the ODM problem is not in the rootvg, for example in volume group homevg, do the following:
# varyoffvg homevg
# exportvg homevg
  Remove complete volume group from the ODM
# importvg -y homevg hdiskX
  Import volume group and create new ODM objects
Figure 7-26. Fixing ODM problems (1 of 2) AN153.0
Notes:
Determining which volume group has the problem

If you detect ODM problems, you must determine whether the volume group with the problem is
the rootvg or not. Because the rootvg cannot be varied off, the procedure that is given here
applies only to non-rootvg volume groups.
Steps in ODM repair procedure (for problem not in rootvg)

1. In the first step, you vary off the volume group, which requires that all file systems be
unmounted first. To vary off a volume group, use the varyoffvg command.
2. In the next step, you export the volume group by using the exportvg command. This
command removes the complete volume group from the ODM. The exportvg command
does not touch the VGDA and LVCB.
3. In the last step, you import the volume group by using the importvg command. Specify the
volume group name with option -y. Otherwise, AIX creates a new volume group name.

You need to specify only one intact physical volume of the volume group that you import.
The importvg command reads the VGDA and LVCB on that disk and creates new ODM
objects.
This procedure does not allow the data to be used while repairing the corruption, even if the file
systems are mounted and are accessible despite the problem. The logical volumes must be
closed to vary the volume group offline.
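The three steps can be summarized in a small planning function. The following sketch only prints the commands so that they can be reviewed first; it does not run them. The name odm_repair_plan is hypothetical.

```shell
# odm_repair_plan VG DISK (hypothetical helper): print the export/import
# sequence for a non-rootvg volume group. rootvg is refused because it
# cannot be varied off.
odm_repair_plan() {
    vg=$1
    disk=$2
    if [ "$vg" = "rootvg" ]; then
        echo "rootvg cannot be varied off or exported" >&2
        return 1
    fi
    echo "varyoffvg $vg"
    echo "exportvg $vg"
    echo "importvg -y $vg $disk"
}

odm_repair_plan homevg hdiskX
```

Printing the plan instead of running it keeps the sketch harmless. Remember that the file systems of the volume group must be unmounted before the first command can succeed.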
Fixing ODM problems (2 of 2)

IBM Power Systems
If the ODM problem is in the rootvg, try using the rvgrecover procedure:
PV=hdisk0
VG=rootvg
cp /etc/objrepos/CuAt /etc/objrepos/CuAt.$$
cp /etc/objrepos/CuDep /etc/objrepos/CuDep.$$
cp /etc/objrepos/CuDv /etc/objrepos/CuDv.$$
cp /etc/objrepos/CuDvDr /etc/objrepos/CuDvDr.$$
lqueryvg -Lp $PV | awk '{print $2}' | while read LVname
do
odmdelete -q "name=$LVname" -o CuAt
odmdelete -q "name=$LVname" -o CuDv
odmdelete -q "value3=$LVname" -o CuDvDr
done
odmdelete -q "name=$VG" -o CuAt
odmdelete -q "parent=$VG" -o CuDv
odmdelete -q "name=$VG" -o CuDv
odmdelete -q "name=$VG" -o CuDep
odmdelete -q "dependency=$VG" -o CuDep
odmdelete -q "value1=10" -o CuDvDr
odmdelete -q "value3=$VG" -o CuDvDr
importvg -y $VG $PV # ignore lvaryoffvg errors
varyonvg $VG
• Uses odmdelete to export rootvg
• Uses importvg to import rootvg
Figure 7-27. Fixing ODM problems (2 of 2) AN153.0
Notes:
Problems in rootvg
For ODM problems in rootvg, finding a solution is more difficult because rootvg cannot be varied
off or exported. However, it might be possible to fix the problem by using one of the techniques
that are described next.
The rvgrecover procedure

If you detect ODM problems in rootvg, you can try running a procedure that is called
rvgrecover. You might want to code this procedure in a script (shown on the visual) in /bin
and make it executable.
The rvgrecover procedure removes all ODM entries that belong to your rootvg by using
odmdelete, which is the same way exportvg works.
After deleting all ODM objects from rootvg, it imports the rootvg by reading the VGDA and LVCB
from the boot disk. The result is that there are new ODM objects for your rootvg.

RAM disk maintenance mode

With the rootvg, the corruption problem might prevent a normal boot to multiuser mode. Thus,
you might need to handle this situation in RAM Disk Maintenance Mode (boot into Maintenance
mode from the CD/DVD or NIM). Before attempting this procedure, you should make sure that
you have a current mksysb backup.
Use the steps in the following table to recover the rootvg volume group after booting to
maintenance mode and mounting the file systems. These steps are similar to the steps in the
rvgrecover script that is shown on the visual.
Step  Action
1     Delete all of the ODM information about logical volumes.
      Get the list of logical volumes from the VGDA of the physical volume.
      # lqueryvg -p hdisk0 -L | awk '{print $2}' \
      | while read LVname; do
      > odmdelete -q "name=$LVname" -o CuAt
      > odmdelete -q "name=$LVname" -o CuDv
      > odmdelete -q "value3=$LVname" -o CuDvDr
      > done
2     Delete the volume group information from ODM.
      # odmdelete -q "name=rootvg" -o CuAt
      # odmdelete -q "parent=rootvg" -o CuDv
      # odmdelete -q "name=rootvg" -o CuDv
      # odmdelete -q "name=rootvg" -o CuDep
      # odmdelete -q "dependency=rootvg" -o CuDep
      # odmdelete -q "value1=10" -o CuDvDr
      # odmdelete -q "value3=rootvg" -o CuDvDr
3     Add the volume group that is associated with the physical volume back to the ODM.
      # importvg -y rootvg hdisk0
4     Re-create the device configuration database in the ODM from the information
      on the physical volume.
      # varyonvg -f rootvg
This procedure assumes that hdisk0 is part of rootvg.
In CuDvDr:
value1 = major number
value2 = minor number
value3 = object name for major/minor number
rootvg always has value1 = 10.
The steps can also be used to recover other volume groups by substituting the appropriate
physical volume and volume group information. It is suggested that the steps in this example
are put into a script.
Intermediate level ODM commands

IBM Power Systems
• High-level LVM commands might not be a viable option

– ODM corruption prevents high-level commands from running
– varyoffvg and exportvg disrupt availability
• redefinevg -d <hdisk#> <vgname>

– Identifies and reenters physical volume data for the volume group in
the ODM
– Checks for inconsistencies between LVM data areas and ODM
– Recovers some, but not all of the logical volume data
• synclvodm <vgname>
– Synchronizes the VGDA, LVCB, ODM, and special device files
– Volume group must be active
– First run the redefinevg command if ODM does not have the
minimum required information about the volume group
Figure 7-28. Intermediate level ODM commands AN153.0
Notes:
Overview
There are situations where you are unable to run the exportvg or importvg commands
because they depend on finding a minimal level of information in the ODM. Even if these
high-level LVM commands can be run, they require that the volume group is offline, which would
be disruptive. In these situations, it is useful to know some intermediate level LVM commands.
These commands are primarily intended to be used by the high-level LVM commands, but they
can be useful in solving tough problems.
The synclvodm command

Syntax: synclvodm <VG> [<LV> ...]
Use of the synclvodm command is yet another way that you might be able to fix ODM
problems in rootvg. If, for some reason, the ODM is not consistent with on-disk information, the
synclvodm command can be used to resynchronize the database. It synchronizes or rebuilds
the LVCB, the ODM, and the VGDAs. The volume group must be active for the
resynchronization to occur. If logical volume names are specified, only the information that is
related to those logical volumes is updated.
The synclvodm command, by itself, can do a fairly complete job of resynchronizing the ODM
with the LVM data areas on the disk. It will also synchronize the information between the LVM
data areas. As such, it can worsen a situation where only one disk in the volume group contains
corrupted data areas. The command can be restricted to synchronizing only specific logical
volumes. Otherwise, it synchronizes all logical volumes. The synclvodm command depends
upon a minimal amount of information in the ODM; most importantly, the ODM needs to know
the volume group name plus the physical volume and logical volume memberships.
The redefinevg command

The redefinevg command redefines the set of physical volumes of the volume group in the
device configuration database. If inconsistencies occur between the physical volume
information in the ODM and the on-disk metadata, the redefinevg command determines
which physical volumes belong to the specified volume group and reenters this information in
the ODM. The redefinevg command checks for inconsistencies by reading the reserved
areas of all the configured physical volumes that are attached to the system.
It is sometimes necessary to run the redefinevg command to obtain the minimum information
about the volume group. It creates new ODM objects for the provided volume group name and it
uses the LVM data areas in the specified disk to obtain the correct LVM information. The
redefinevg command is not designed to fully rebuild all of the logical volume information.
Thus, after running the redefinevg command, it is often necessary to run the synclvodm
command to obtain the rest of the logical volume information.
These commands can be run with the volume group online. The ODM corruption might prevent
any attempt to vary the volume group offline.
Using chdev for the PVID

The chdev command accepts an attribute of pvid=clear (to delete the PVID) and
pvid=yes (to create a PVID). While clearing and re-creating a PVID can be useful in some
circumstances, it is generally recommended that any problem in the ODM is resolved by setting
its value to match what is stored on the disk. For example, the exportvg and importvg
commands might be used.
If there is no PVID set, either on the disk or in the ODM, then the PVID is normally established
when that disk becomes a member of a volume group (mkvg, extendvg).
Checkpoint
IBM Power Systems
1. True or False: All LVM information is stored in the ODM.
2. True or False: You detect that a physical volume hdisk1 that

is contained in your rootvg is missing in the ODM. This
problem can be fixed by exporting and importing rootvg.
Notes:

Exercise: LVM metadata and related problems

IBM Power Systems
• Export and import a volume group

• Analyze import messages
• Fix LVM ODM problems using exportvg and
importvg
• Fix LVM ODM problems using rvgrecover
• Use intermediate LVM commands
• Manually fix an LVM ODM problem (optional)
Figure 7-30. Exercise: LVM metadata and related problems AN153.0
Notes:
Unit summary
IBM Power Systems

Notes:
The LVM information is held in a number of different places on the disk, including the ODM and
the VGDA.
ODM-related problems might be solved by:
• exportvg and importvg (non-rootvg volume groups)
• rvgrecover (rootvg)
• LVM intermediate commands
• Manually fixing by using ODM commands.

Unit 8. Disk management procedures

This unit describes different disk management procedures:
• Managing quorum with mirrored logical volumes
• Disk replacement procedures
• Procedures to solve problems that are caused by an incorrect disk
replacement

• Manage volume group quorum issues
• Explain the physical volume states that are used by the LVM
• Replace a disk under different circumstances
• Recover from a total volume group failure

Accountability:
• Lab exercise
References
GG24-4484 AIX Storage Management (Redbooks)
SG24-5432 AIX Logical Volume Manager from A to Z: Introduction and
Concepts (Redbooks)
SG24-5433 AIX Logical Volume Manager from A to Z: Troubleshooting
and Commands (Redbooks)
Unit objectives
IBM Power Systems

Notes:

8.1. Failed disks: Mirroring and quorum issues
Mirroring
IBM Power Systems
[Diagram: a mirrored logical volume whose logical partitions each have copies on hdisk0, hdisk1, and hdisk2]
VGSA:
LP    PP1         PP2         PP3
5     hdisk0, 5   hdisk1, 8   hdisk2, 9
Figure 8-2. Mirroring AN153.0
Notes:
Using mirroring to increase availability

The visual shows a mirrored logical volume, where each logical partition is mirrored to three
physical partitions. In this example, each of the physical partitions that are related to a given
logical partition is on a separate physical volume. More than three copies are not possible.
If one of the disks fails, there are at least two copies of the data available. That means mirroring
is used to increase the availability of a system or a logical volume.
Role of VGSA
The information about the mirrored partitions is stored in the VGSA, which is contained on each
disk. In the example that is shown on the visual, logical partition 5 points to physical partition 5
on hdisk0, physical partition 8 on hdisk1, and physical partition 9 on hdisk2.

Stale partitions
IBM Power Systems
[Diagram: a mirrored logical volume with copies on hdisk0, hdisk1, and hdisk2; the copy on hdisk2 contains a stale partition]
After the repair of hdisk2:

• varyonvg VGName (calls syncvg -v VGName)
• Only stale partitions are updated
Figure 8-3. Stale partitions AN153.0
Notes:
How data becomes stale

If a disk that contains a mirrored logical volume (such as hdisk2 on the visual) fails, the data on
the failed disk becomes stale (not current, not up-to-date).
How state information is kept

State information (active or stale) is kept for each physical partition. A physical volume is
shown as stale (which can be seen with the command lsvg VGName) if it has at least one
stale partition.
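A script can pick the stale partition count out of the lsvg output. The following is a minimal sketch; stale_pps is a hypothetical helper, and the sample line is an assumption about the layout of the "STALE PVs" and "STALE PPs" fields, with illustrative numbers.

```shell
# stale_pps (hypothetical helper): read `lsvg VGName` output on stdin and
# print the number that follows the "STALE PPs:" label.
stale_pps() {
    awk '{
        for (i = 1; i + 2 <= NF; i++)
            if ($i == "STALE" && $(i + 1) == "PPs:") print $(i + 2)
    }'
}

# Assumed layout of the relevant lsvg line; the counts are illustrative.
sample='STALE PVs:          1                        STALE PPs:      12'
printf '%s\n' "$sample" | stale_pps
```

On a live system the function would be used as: lsvg VGName | stale_pps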
Updating stale partitions

If a disk with stale partitions is repaired (for example, after a power failure), you should run the
varyonvg command, which starts the syncvg command to synchronize the stale partitions.
The syncvg command is started as a background job that updates all stale partitions from the
volume group.
Always use the varyonvg command to update stale partitions. After a power failure, a disk
forgets its reservation. The syncvg command cannot reestablish the reservation, whereas
varyonvg does before calling syncvg. The term reservation means that a disk is reserved for
one system. The disk driver puts the disk in a state where you can work with the disk and at the
same time the control LED of the disk turns on.
The varyonvg command works even if the volume group is already varied on or if the volume
group is the rootvg.

Mirroring rootvg
IBM Power Systems
[Diagram: mirrorvg copies the rootvg logical volumes (hd9var, hd8, hd5, ..., hd1) from hdisk0 to hdisk1]
1. bootinfo -B hdisk1
2. extendvg
3. mirrorvg -m
4. bosboot -a
5. bootlist
6. bootinfo -b
• Make a copy of all rootvg logical volumes using mirrorvg and place copies on the second disk
• Execute bosboot and change your bootlist
Figure 8-4. Mirroring rootvg AN153.0
Notes:
Reason to mirror rootvg

What is the reason to mirror the rootvg?
If your rootvg is on one disk, that disk is a single point of failure: if it fails, your
machine is no longer available.
If you mirror rootvg to a second (or third) disk, and one disk fails, you still have another disk that
contains the mirrored rootvg. You increase the availability of your system.
Procedure for mirroring rootvg

The following steps show how to mirror the rootvg.
1. Select a disk for the mirror copies. It needs to be large enough to hold these copies, plus
enough room to handle future growth.
# bootinfo -s hdisk1
Check that the disk is bootable, since it holds a boot logical volume.
# bootinfo -B hdisk1
Any returned value other than 1 indicates that the disk is not bootable.
2. If the disk is not part of the rootvg, add the new disk to the volume group (for example,
hdisk1):
# extendvg [ -f ] rootvg hdisk1
3. Use the mirrorvg command to mirror all of the logical volumes in the rootvg to the new
disk. The mirrorvg command, by default, disables quorum and mirrors the existing
logical volumes in the specified volume group. Changes to the volume group quorum
attribute are effective immediately without having to vary off and then vary on the volume
group. By default, it will also synchronize the copies; though, you might suppress
synchronization by using the -s flag. You should use the exact mapping option (-m) to
ensure that the mirror copy of the boot logical volume (hd5) is allocated contiguous
physical partitions. To mirror rootvg, use the command:
# mirrorvg -m rootvg hdisk1
Restrictions:
- You cannot use the mirrorvg command on a snapshot volume group
- You cannot use the mirrorvg command on a volume group that has an active
firmware assisted dump logical volume
- You cannot use the mirrorvg command if ALL of the following conditions exist:
• The target system is a logical partition (LPAR).
• A copy of the boot logical volume (by default, hd5) is on the failed physical
volume.
• The replacement physical volume's adapter was dynamically configured into the
LPAR since the last cold start.
An alternative to running mirrorvg is to separately run the component tasks:
- If you use one mirror disk, be sure that a quorum is not required for vary on:
# chvg -Qn rootvg
- Add the mirrors for all rootvg logical volumes:
# mklvcopy hd1 2 hdisk1
# mklvcopy hd9var 2 hdisk1
# mklvcopy hd10opt 2 hdisk1
# mklvcopy hd11admin 2 hdisk1

- If you have other logical volumes in your rootvg, be sure to create copies for them as
well.
- Now, synchronize the new copies that you created:
# syncvg -v rootvg
4. To be able to boot from the different disks, run bosboot:
# bosboot -a
As hd5 is mirrored, there is no need to do it for each disk.
5. Update the bootlist. If a disk fails, you must be able to boot from another disk.
# bootlist -m service hdisk1 hdisk0
6. Check that the system boots from the first boot disk.
# bootinfo -b
VGDA count
IBM Power Systems
Two-disk volume group (PV1, PV2):
• Loss of PV1: Only 33% of VGDAs available (no quorum)
• Loss of PV2: 66% of VGDAs available (quorum)
Three-disk volume group (PV1, PV2, PV3):
• Loss of 1 PV: 66% of VGDAs still available (quorum)
Figure 8-5. VGDA count AN153.0
Notes:
Reservation of space for VGDAs

Each disk that is contained in a volume group contains at least one VGDA. The LVM always
reserves space for two VGDAs on each disk.
Volume groups that have two disks

If a volume group consists of two disks, one disk contains two VGDAs, the other disk contains
only one (as shown on the visual). If the disk with the two VGDAs fails, you have only 33% of
VGDAs available, that means you have less than 50% of VGDAs. In this case, the quorum,
which means that more than 50% of VGDAs must be available, is not fulfilled.
Volume groups that have more than two disks

If a volume group consists of more than two disks, each disk contains one VGDA. If one disk
fails in a volume group with three disks, you still have 66% of VGDAs available and the quorum
is fulfilled.
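The percentages in these examples follow from simple arithmetic. The following sketch reproduces them for the two-disk case, where three VGDAs exist in total. The function names are hypothetical, and the percentage is rounded down to a whole number.

```shell
# vgda_percent AVAILABLE TOTAL: whole-number percentage of reachable VGDAs.
vgda_percent() {
    echo $(( $1 * 100 / $2 ))
}

# has_quorum AVAILABLE TOTAL: succeed if more than 50% of VGDAs are available.
has_quorum() {
    [ $(( $1 * 2 )) -gt "$2" ]
}

# Two-disk volume group: one VGDA left, then two VGDAs left.
for avail in 1 2; do
    if has_quorum "$avail" 3; then state="quorum"; else state="no quorum"; fi
    echo "$avail of 3 VGDAs: $(vgda_percent "$avail" 3)% ($state)"
done
```

Comparing twice the available count with the total avoids fractions and makes an exact 50% fail the check, which matches the "more than 50%" rule.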

Quorum not available

IBM Power Systems
datavg: hdisk1 (two VGDAs), hdisk2 (one VGDA)
If hdisk1 fails, datavg has no quorum
# varyonvg datavg
FAILS
Closed during operation:
• No more access to logical volumes
• LVM_SA_QUORCLOSE in error log
Figure 8-6. Quorum not available AN153.0
Notes:
Introduction
What happens if quorum checking is enabled for a volume group and a quorum is not available?
Consider the following example (illustrated on the visual and discussed in the following
paragraphs): In a two-disk volume group datavg, the disk hdisk1 is not available due to a
hardware defect. hdisk1 is the disk that contains the two VGDAs; that means the volume group
does not have a quorum of VGDAs.
Result if volume group not varied on

If the volume group is not varied on and the administrator tries to vary on datavg, the varyonvg
command fails.
Volume group that is already varied on

If the volume group is already varied on when quorum is lost, the LVM deactivates the volume
group. There is no access to any logical volume that is part of this volume group. The system
sometimes shows strange behavior. This situation is posted to the error log, which shows an
error entry LVM_SA_QUORCLOSE. After losing the quorum, the volume group might still be listed
as active (as seen with the lsvg -o command). However, all application data access and LVM
functions requiring data access to the volume group fail. The volume group is dropped from the
active list as soon as the last logical volume is closed. If you close the logical volumes with
fuser -k /dev/LVname or umount /dev/LVname, no data is written to the disk.

Nonquorum volume groups

IBM Power Systems
• With single mirroring, always disable the quorum:

– chvg -Qn datavg
• Additional considerations for rootvg:

– chvg -Qn rootvg
– bosboot -ad /dev/hdiskX
• Turning off the quorum checking:

– Requires 100% VGDAs for normal varyonvg
– Allows the volume group to stay active if quorum is lost
Figure 8-7. Nonquorum volume groups AN153.0
Notes:
Loss of quorum in a nonquorum volume group

When a nonquorum volume group loses its quorum, it is not deactivated. It stays active until it
loses all of its physical volumes.
Recommendations when using single mirroring

When working with single mirroring, always disable quorum checking with the command:
chvg -Qn (and VGname as the argument). In AIX 6 and later, the change in quorum checking is
effective immediately. In older versions of AIX, you need to vary off and vary on the volume
group for the change to take effect.
The mirrorvg command now automatically disables quorum checking for a mirrored volume
group.
Recommendations for rootvg

When turning off the quorum checking for rootvg, you must do a bosboot (or a savebase) to
reflect the change in the ODM in the boot logical volume. In versions of AIX before AIX 6, you
then need to reboot the machine, because the change takes effect only at varyonvg.
Varying on a nonquorum volume group

It is important that you know that turning off the quorum checking does not allow a varyonvg
without a quorum. It just prevents the closing of an active volume group when losing its quorum.

Forced vary on (varyonvg -f)

IBM Power Systems
datavg: hdisk1 (two VGDAs), hdisk2 (one VGDA)
# varyonvg datavg
Fails (even when quorum is disabled)
Check the reason for the failure (cable, adapter, power) before doing the following:
# varyonvg -f datavg
Failure accessing hdisk1.
Set PV STATE to removed.
Volume group datavg is varied on.
Figure 8-8. Forced vary on (varyonvg -f) AN153.0
Notes:
When normal vary on might fail

If the quorum of VGDAs is not available during vary on, the varyonvg command fails, even
when quorum is disabled. In fact, when quorum is disabled, the varyonvg command requires
that 100% of the VGDAs be available instead of 51%.
Doing a forced vary on

Before doing a forced vary on (varyonvg -f), always check the reason for the failure. If the
physical volume appears to be permanently damaged, use a forced varyonvg.
All physical volumes that are missing during the forced vary on are changed to the removed
physical volume state. The removed state means that all the VGDA and VGSA copies are
removed from these physical volumes. When finished, these physical volumes no longer take
part in quorum checking, nor do they become active within the volume group until you return
them to the volume group.
Change in VGDA distribution

In the example on the visual, the active disk hdisk2 becomes the disk with the two VGDAs. The
number of VGDAs does not change, even if the failed disk can be brought back.
Quorum checking on
With quorum checking on, you always need > 50% of the VGDAs available (except to vary on
rootvg).
Quorum checking off

With quorum checking off, you must distinguish between an already active volume group and
varying on a volume group.
An active volume group is kept open provided there is at least one VGDA available.
Set MISSINGPV_VARYON=true in /etc/environment if a volume group needs to be varied
on with missing disks at boot time.
When using varyonvg -f or if MISSINGPV_VARYON=true, you take full responsibility for the
volume group integrity.
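The rules for quorum checking on and off can be written down as two small tests. The following is a sketch of the rules as stated in these notes only: the function names are hypothetical, and the exception for rootvg, forced vary on, and MISSINGPV_VARYON are deliberately not modeled.

```shell
# can_varyon QUORUM AVAILABLE TOTAL
#   QUORUM is "on" or "off" (quorum checking for the volume group).
#   Succeeds if a normal (unforced) varyonvg is expected to work.
can_varyon() {
    case $1 in
        on)  [ $(( $2 * 2 )) -gt "$3" ] ;;   # more than 50% of the VGDAs
        off) [ "$2" -eq "$3" ] ;;            # 100% of the VGDAs
        *)   return 2 ;;                     # unknown setting
    esac
}

# stays_active QUORUM AVAILABLE TOTAL
#   Succeeds if an already active volume group is expected to stay open.
stays_active() {
    case $1 in
        on)  [ $(( $2 * 2 )) -gt "$3" ] ;;   # quorum must be kept
        off) [ "$2" -ge 1 ] ;;               # at least one VGDA is left
        *)   return 2 ;;
    esac
}

# datavg from the visual: hdisk1 (two VGDAs) failed, one of three VGDAs left.
can_varyon off 1 3   || echo "normal varyonvg fails, quorum checking off"
stays_active off 1 3 && echo "an active nonquorum volume group stays open"
```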

Physical volume states

IBM Power Systems
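[Diagram: physical volume state transitions]
• varyonvg VGName: an accessible disk becomes active; an inaccessible disk becomes missing
• varyonvg -f VGName: a missing disk becomes removed
• missing to active: hardware repair followed by varyonvg VGName
• removed to active: hardware repair followed by chpv -v a hdiskX and varyonvg VGName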
Figure 8-9. Physical volume states AN153.0
Notes:
Introduction
This page introduces physical volume states (not device states). Physical volume states can be
displayed with lsvg -p VGName.
Active state
If a disk can be accessed when a volume group is varied on with the command, varyonvg, it
gets a physical volume state of active.
Missing state
If a disk cannot be accessed during a varyonvg, but quorum is available, the failing disk gets a
physical volume state missing. If the disk can be repaired, for example, after a power failure,
you must run a varyonvg VGName to bring the disk into the active state again. Any stale
partitions are synchronized.
Removed state
If a disk cannot be accessed during a varyonvg and the quorum of disks is not available, you
can run the command, varyonvg -f VGName, and force the volume group online.
The failing disk gets a physical volume state of removed, and it is not used for quorum checks
any longer.
Recovery after repair

If you are able to repair the disk (for example, after a power failure), running a varyonvg alone
does not bring the disk back into the active state. It maintains the removed state.
At this stage, you must announce the fact that the failure is over by using the following
command:
# chpv -va hdiskX
This command defines the disk hdiskX as active.
You must do a varyonvg VGName afterward to synchronize any stale partitions.
The chpv -vr command

The opposite of chpv -va is chpv -vr, which brings the disk into the removed state. This
command works only when all logical volumes are closed on the disk that is defined as
removed. Additionally, chpv -vr does not work if the quorum is lost after removing the disk.
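To find the disks that still need attention, a script can filter the lsvg -p output for physical volumes that are not in the active state. The following is a minimal sketch; inactive_pvs is a hypothetical helper, and the sample text is an assumed layout with illustrative numbers (a volume group name line, a header line, then one line per disk).

```shell
# inactive_pvs (hypothetical helper): read `lsvg -p VGName` output on stdin
# and print the name and state of every disk whose state is not "active".
# The first two lines (volume group name and column header) are skipped.
inactive_pvs() {
    awk 'NR > 2 && $2 != "active" { print $1, $2 }'
}

# Assumed output layout; the partition counts are illustrative.
sample='datavg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk1            removed           542         20          00..00..00..00..20
hdisk2            active            542         20          00..00..00..00..20'

printf '%s\n' "$sample" | inactive_pvs
```

A disk that is reported as removed needs chpv -va followed by varyonvg after the repair, whereas a disk that is reported as missing needs only varyonvg.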

8.2. Disk replacement techniques
Disk replacement: Starting point

IBM Power Systems
A disk must be replaced ...
Disk mirrored?        Yes -> Procedure 1
  No
Disk still working?   Yes -> Procedure 2
  No
Volume group lost?    No  -> Procedure 3
  Yes -> Procedure 4 (rootvg) or Procedure 5 (not rootvg)
Figure 8-10. Disk replacement: Starting point AN153.0
Notes:
Reasons to replace a disk

Many reasons might require the replacement of a disk, for example:
- Disk is too small
- Disk is too slow
- Disk produces many DISK_ERR4 error log entries
Flowchart
Before starting the disk replacement, always follow the flowchart that is shown in the visual.
This flowchart helps you whenever you must replace a disk.
1. If the disk that must be replaced is mirrored onto another disk, follow procedure 1
2. If a disk is not mirrored, but still works, follow procedure 2
3. If you are sure that a disk failed and you are not able to repair the disk:
- If the volume group can be varied on (normal or forced), use procedure 3
- If the volume group is lost after the disk failure, which means that the volume group
cannot be varied on (either normal or forced):
• If the volume group is rootvg, follow procedure 4
• If the volume group is not rootvg, follow procedure 5
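The flowchart can be expressed as a short decision function. The following is a minimal sketch; choose_procedure is a hypothetical name, and each argument is the answer (yes or no) to one question in the flowchart.

```shell
# choose_procedure MIRRORED WORKING VG_LOST IS_ROOTVG   (each "yes" or "no")
# Prints the number of the disk replacement procedure to follow.
choose_procedure() {
    if [ "$1" = yes ]; then echo 1; return; fi   # disk is mirrored
    if [ "$2" = yes ]; then echo 2; return; fi   # not mirrored, still working
    if [ "$3" = no ];  then echo 3; return; fi   # failed, VG can be varied on
    if [ "$4" = yes ]; then echo 4; else echo 5; fi   # VG lost: rootvg or not
}

# A failed, unmirrored disk in a lost non-rootvg volume group:
choose_procedure no no yes no
```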
Procedure 1 (1 of 4): Disk mirrored

IBM Power Systems
• The replacepv command simplifies procedure 1
• Use of replacepv has restrictions:

– Not rootvg
– Snapshot volume group mechanism not being used
– Replacement physical volume at least as large as failed physical
volume
– Both physical volumes can be on system at the same time
• Otherwise, use a variation without the replacepv command
Figure 8-11. Procedure 1 (1 of 4): Disk mirrored AN153.0
Notes:
When to use this procedure

Use procedure 1 when the disk that must be replaced is mirrored.
Disk state
This procedure requires that the disk state of the failed disk is either missing or removed. Use
the command, lspv hdiskX, to check the state of your physical volume. If the disk is still in
the active state, you cannot remove any copies or logical volumes from the failing disk. In this
case, one way to bring the disk into a removed or missing state is to run the reducevg -d
command or to do a varyoffvg and a varyonvg on the volume group by rebooting the
system.

Alternative approaches

The two main alternatives for this procedure are to use the replacepv command or to not use
that command. The replacepv command greatly simplifies the procedure.
The restrictions are:
- The volume group cannot be rootvg.
- The snapshot volume group mechanism must not be in use.
- The replacement physical volume must be at least as large as the failed physical volume.
- Both physical volumes must be on the system at the same time. In other words, you cannot
remove the failed disk and then place the new disk in the same position.
Procedure 1 (2 of 4): Disk mirrored with replacepv

IBM Power Systems
1. Provide a replacement disk
2. If a new disk, discover and configure:
# cfgmgr
Disk discovered as: hdiskY
3. Run replacepv:
# replacepv hdiskX hdiskY
4. Remove the failed disk from the ODM:
# rmdev -l hdiskX -d
Figure 8-12. Procedure 1 (2 of 4): Disk mirrored with replacepv AN153.0
Notes:
The replacepv command greatly simplifies the procedure.
1. Provide a replacement disk. It can be an unused disk, already known to AIX. Otherwise,
you need to provide a new disk. There are many ways to provide a disk that is new to
AIX:
- Directly allocate a PCI storage adapter to the LPAR. If the adapter does not have an
available disk, one needs to be provided through a hot add (for a local disk) or by zoning
a LUN (for a Fibre Channel adapter).
- Use PowerVM to provision a virtual SCSI disk.
2. Discover the new disk by running the cfgmgr command.
3. Run the replacepv command to allocate physical partitions on the replacement disk for the
problem disk. Effectively, the new disk replaces the failing disk in the mirroring
configuration. In the example, hdiskX is the failing disk.
4. Remove the failing disk from the ODM with the rmdev command.
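The steps above can be wrapped in a small script. This is a sketch only: the run helper, the DRY_RUN variable, and the function name replace_mirrored_disk are assumptions introduced here, and by default the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch of procedure 1 with replacepv. DRY_RUN defaults to 1, so the
# commands are only printed; set DRY_RUN=0 on a real AIX system to run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# replace_mirrored_disk is a hypothetical wrapper name.
#   $1 failed disk (hdiskX in the visual)
#   $2 replacement disk (hdiskY in the visual)
replace_mirrored_disk() {
    run cfgmgr &&
    run replacepv "$1" "$2" &&
    run rmdev -l "$1" -d
}

replace_mirrored_disk hdiskX hdiskY
```

Chaining the steps with && stops the sequence at the first failing command, so the failed disk is not removed from the ODM unless replacepv succeeded.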

Procedure 1 (3 of 4): Disk mirrored without replacepv

1. Remove all copies from the disk:
# unmirrorvg vg_name hdiskX
2. Remove the disk from the volume group:
# reducevg vg_name hdiskX
3. Remove the disk from the ODM:
# rmdev -l hdiskX -d
4. Provide a replacement disk.
If a new disk, discover and configure:
# cfgmgr
5. Add the new disk to the volume group:
# extendvg vg_name hdiskY
6. Create new copies:
# mirrorvg vg_name hdiskY
Figure 8-13. Procedure 1 (3 of 4): Disk mirrored without replacepv AN153.0
Notes:
The goal of each disk replacement is to remove all logical volumes from a disk.
1. Remove all logical volume copies from the disk. Use either the SMIT fastpath smit
unmirrorvg or the unmirrorvg command as shown in the visual. These commands
unmirror each logical volume that is mirrored on the disk.
If unmirrored logical volumes remain on the disk, you must either move them to
another disk (migratepv), or remove them if the disk cannot be accessed (rmlv).
2. If the disk is empty, remove the disk from the volume group. Use SMIT fastpath smit
reducevg or the reducevg command.
3. After the disk is removed from the volume group, you can remove it from the ODM. Use
the rmdev command as shown in the visual.
4. Use a hot-swap procedure to replace the failed or failing disk. (In older machines, disk
replacement would effectively require the system to be shut down for the procedure).
Run cfgmgr to discover and configure the new disk.
5. Add the new disk to the volume group. Use either the SMIT fastpath
smit extendvg or the extendvg command.
6. Finally, create new copies for each logical volume on the new disk. Use either the SMIT
fastpath smit mirrorvg or the mirrorvg command. If synchronization was
suppressed during mirroring, then remember to eventually synchronize the volume
group (or each logical volume), with the syncvg command.
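For reference, the six steps can be printed as one command list. This is a minimal sketch: plan_without_replacepv is a hypothetical name, and the function only prints the commands so that they can be reviewed before anything is run.

```shell
#!/bin/sh
# Prints (does not run) the command sequence for procedure 1 without replacepv.
# plan_without_replacepv is a hypothetical name.
#   $1 volume group, $2 failed disk, $3 replacement disk
plan_without_replacepv() {
    cat <<EOF
unmirrorvg $1 $2
reducevg $1 $2
rmdev -l $2 -d
cfgmgr
extendvg $1 $3
mirrorvg $1 $3
EOF
}

plan_without_replacepv vg_name hdiskX hdiskY
```

If synchronization was suppressed during mirrorvg, a syncvg step would follow the last command, as described in step 6.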

Procedure 1 (4 of 4): Special steps for rootvg

• Before the reducevg step:
1. Remove the failed disk from the bootlist:
# bootlist -m normal hdisk1
2. Ensure primary dump logical volume is on the good disk:
# mklv -t sysdump -y dump rootvg 64 hdisk0
# sysdumpdev -P -p /dev/dump
• mirrorvg step and after:
1. Exact mapping for mirrorvg:
# mirrorvg -m rootvg hdiskX
2. Rebuild the boot image:
# bosboot -a
3. Add the new disk to the boot list:
# bootlist -m normal hdisk1 hdiskX
Figure 8-14. Procedure 1 (4 of 4): Special steps for rootvg AN153.0
Notes:
Special steps for rootvg

The rootvg has special considerations because it contains the boot logical volume and the
dump device.
The bootlist must be updated so that the replacement disk takes the place of the bad disk, and
the boot image must be rebuilt with the bosboot command.
The main reason for exact mapping (mirrorvg -m) is to be sure that the boot logical volume has
contiguous allocations.
If a dedicated dump device is being used, it is common for it to not be mirrored. If the dump
logical volume is on the failing disk, then it should be redefined on the good disk instead.
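The rootvg additions can be sketched as two small helpers, one for the step before reducevg and one for the steps from mirrorvg onward. The function names are hypothetical, the disk names are passed in by the caller, and the dump device handling is left out because it depends on where the dump logical volume currently resides.

```shell
#!/bin/sh
# Prints the extra rootvg commands around the mirrored-disk procedure.
# Function names are hypothetical; nothing is executed.

# Before reducevg: boot only from the surviving disk.
#   $1 surviving (good) disk
rootvg_before_reducevg() {
    echo "bootlist -m normal $1"
}

# From mirrorvg onward: mirror with exact mapping, rebuild the boot image,
# and put both disks in the bootlist.
#   $1 surviving (good) disk, $2 replacement disk
rootvg_from_mirrorvg() {
    echo "mirrorvg -m rootvg $2"
    echo "bosboot -a"
    echo "bootlist -m normal $1 $2"
}

rootvg_before_reducevg hdisk1
rootvg_from_mirrorvg hdisk1 hdiskX
```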
Procedure 2 (1 of 2): Disk still working

1. Connect the new disk to the system.
2. Add new disk to volume group:
# extendvg vg_name hdiskY
3. Migrate old disk to new disk: (*)
# migratepv hdiskX hdiskY
4. Remove old disk from volume group:
# reducevg vg_name hdiskX
5. Remove old disk from ODM:
# rmdev -l hdiskX -d
(*) : Is the disk in rootvg? See next visual for further considerations
Figure 8-15. Procedure 2 (1 of 2): Disk still working AN153.0
Notes:

Procedure 2 applies to a disk replacement where the disk is unmirrored but can still be accessed.
If the disk that must be replaced is in rootvg, follow the instructions on the next visual.
The goal and how to do it

The goal is the same as always. Before you can replace a disk, you must remove everything
from the disk.
1. Shut down your system if you need to physically attach a new disk to the system. Boot
the system so that cfgmgr configures the new disk.
2. Add the new disk to the volume group. Use either the SMIT fastpath
smit extendvg or the extendvg command.
3. Before running the next step, it is necessary to distinguish between the rootvg and a
non-rootvg volume group.
- If the disk that is replaced is in rootvg, execute the steps that are shown on the next
visual Procedure 2 (2 of 2): Special Steps for rootvg.
- If the disk that is replaced is not in rootvg, use the migratepv command:
# migratepv hdisk_old hdisk_new
This command moves all logical volumes from one disk to another. You can do the
migratepv during normal system activity. The command migratepv requires that
the disks are in the same volume group.
4. If the old disk was migrated, remove it from the volume group. Use either the SMIT
fastpath smit reducevg or the reducevg command.
5. If you need to remove the disk from the system, remove it from the ODM with the rmdev
command as shown. Finally, remove the physical disk from the system.
Procedure 2 (2 of 2): Special steps for rootvg

1. Connect new disk to system
2. Add new disk to volume group
3. Disk contains hd5?
# migratepv -l hd5 hdiskX hdiskY
# bosboot -ad /dev/hdiskY
# chpv -c hdiskX
# bootlist -m normal hdiskY
Migrate old disk to new disk
4. Remove old disk from volume group
5. Remove old disk from ODM
Figure 8-16. Procedure 2 (2 of 2): Special steps for rootvg AN153.0
Notes:
Extra steps for rootvg

Procedure 2 requires some additional steps if the disk that must be replaced is in rootvg.
1. Connect the new disk to the system as described in procedure 2.
2. Add the new disk to the volume group. Use smit extendvg or the extendvg
command.
3. This step requires special considerations for rootvg:
- Check whether your disk contains the boot logical volume. The default location for
the boot logical volume is /dev/hd5.
Use the command, lspv -l, to check the logical volumes on the disk that must be
replaced.
If the disk contains the boot logical volume, migrate the logical volume to the new
disk and update the boot logical volume on the new disk. To avoid a potential boot
from the old disk, clear the old boot record by using the chpv -c command. Then,
change your bootlist:
# migratepv -l hd5 hdiskX hdiskY
# bosboot -ad /dev/hdiskY
# chpv -c hdiskX
# bootlist -m normal hdiskY
If the disk contains the primary dump device, you must deactivate the dump before
migrating the corresponding logical volume:
# sysdumpdev -p /dev/sysdumpnull
- Migrate the complete old disk to the new one:
# migratepv hdiskX hdiskY
If you deactivated the primary dump device, you must activate it again:
# sysdumpdev -p /dev/hdX
4. After the disk is migrated, remove it from the rootvg volume group.
# reducevg rootvg hdiskX
5. If the disk must be removed from the system, remove it from the ODM (use the rmdev
command), shut down AIX, and remove the disk from the system afterward.
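The hd5 check in step 3 lends itself to a small script. The sketch below reads lspv -l output on standard input and prints the matching commands; plan_rootvg_migration is a hypothetical name, the sample input is illustrative rather than captured from a real system, and the dump device handling is omitted.

```shell
#!/bin/sh
# Reads "lspv -l <old disk>" output on stdin and prints the migration
# commands for a rootvg disk. plan_rootvg_migration is a hypothetical name.
#   $1 old disk, $2 new disk
plan_rootvg_migration() {
    old=$1
    new=$2
    # hd5 is the default boot logical volume; look for it in column 1.
    if awk '$1 == "hd5" { found = 1 } END { exit !found }'; then
        echo "migratepv -l hd5 $old $new"
        echo "bosboot -ad /dev/$new"
        echo "chpv -c $old"
        echo "bootlist -m normal $new"
    fi
    echo "migratepv $old $new"
    echo "reducevg rootvg $old"
}

# Illustrative input only; real lspv -l output has more rows.
printf '%s\n' \
    'hdiskX:' \
    'LV NAME   LPs   PPs   DISTRIBUTION          MOUNT POINT' \
    'hd5       1     1     01..00..00..00..00    N/A' \
    'hd4       2     2     00..02..00..00..00    /' |
    plan_rootvg_migration hdiskX hdiskY
```

On a live system the input would come from lspv -l for the disk that is being replaced.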
Procedure 3: Disk in missing or removed state

1. Identify all logical volumes and file systems on failing disk:
# lspv -l hdiskY
2. Unmount all file systems on failing disk:
# umount /dev/lv_name
3. Remove all file systems from failing disk:
# rmfs filesystem
4. Remove all logical volumes from failing disk:
# rmlv logical-volume
5. Remove disk from volume group:
# reducevg vg_name hdiskY
6. Remove disk from system:
# rmdev -l hdiskY -d
7. Add new disk to volume group:
# extendvg vg_name hdiskZ
8. Re-create all logical volumes and file systems on new disk:
# mklv -y lv_name
# smit crfs
9. Restore file systems from backup:
# restore -rvqf /dev/rmt0

Disk states shown in the visual:
# lspv hdiskY
...
PV STATE: removed
# lspv hdiskY
...
PV STATE: missing
Figure 8-17. Procedure 3: Disk in missing or removed state AN153.0
Notes:

Procedure 3 applies to a disk replacement where a disk cannot be accessed but the volume
group is intact. The failing disk is in a physical volume state (not a device state) of either missing
(normal varyonvg worked) or removed (forced varyonvg was necessary to bring the volume group
online).
If the failing disk is in an active state (this state is not a device state), this procedure does not
work. In this case, one way to bring the disk into a removed or missing state is to run the
reducevg -d command or to do a varyoffvg and a varyonvg on the volume group by
rebooting the system. The reboot is necessary because you cannot vary off a volume group
with open logical volumes. Because the failing disk is active, there is no way to unmount file
systems.
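The state check can be scripted by reading the PV STATE line from lspv output. This is a sketch under the assumption that the line has the form shown in the visual ("PV STATE:" followed by the state); pv_state and procedure3_applies are hypothetical helper names, and the sample input is illustrative.

```shell
#!/bin/sh
# Reads "lspv hdiskX" output on stdin and prints the PV STATE value.
# pv_state and procedure3_applies are hypothetical helper names.
pv_state() {
    sed -n 's/^PV STATE:[[:space:]]*\([^[:space:]]*\).*$/\1/p'
}

# Succeeds only for the two states that procedure 3 requires.
procedure3_applies() {
    case $(pv_state) in
        missing|removed) return 0 ;;
        *) return 1 ;;
    esac
}

# Illustrative input only; on AIX the input would come from: lspv hdiskY
printf '%s\n' 'PHYSICAL VOLUME:    hdiskY' 'PV STATE:           removed' | pv_state
```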

Procedure steps

If the failing disk is in a missing or removed state, start the procedure:
1. Identify all logical volumes and file systems on the failing disk. Use commands like
lspv, lslv, or lsfs to provide this information. These commands work on a failing
disk.
2. If there are mounted file systems on logical volumes on the failing disk, you must
unmount them. Use the umount command.
3. Remove all file systems from the failing disk with smit rmfs or the rmfs command. If
you remove a file system, the corresponding logical volume and the stanza in
/etc/filesystems are removed as well.
4. Remove the remaining logical volumes (the logical volumes that are not associated with
a file system) from the failing disk with smit rmlv or the rmlv command.
5. Remove the disk from the volume group, with the reducevg command or the SMIT
fastpath smit reducevg.
6. Remove the disk from the ODM and from the system with the rmdev command.
7. Add the new disk to the system and extend your volume group. Use the SMIT fastpath
smit extendvg or the extendvg command.
8. Re-create all logical volumes and file systems that were removed due to the disk failure.
Use smit mklv, smit crfs, or the commands directly.
9. Due to the total disk failure, you lost all data on the disk. This data must be restored,
either by the restore command or any other tool you use to restore data (for example,
Tivoli Storage Manager) from a previous backup.
Procedure 4: Total rootvg failure

1. Replace bad disk
2. Boot in maintenance mode
3. Restore from a mksysb image
4. Import each volume group into the new ODM (importvg) if needed
(Diagram labels: rootvg on hdiskX; rootvg on hdiskX and hdiskY, where the failed disk contains OS logical volumes; datavg on hdiskZ; mksysb)
Figure 8-18. Procedure 4: Total rootvg failure AN153.0
Notes:

Procedure 4 applies to a total rootvg failure.
This situation might come up when your rootvg consists of one disk that fails. Or, your rootvg is
installed on two disks and the disk that contains operating system logical volumes (for
example, /dev/hd4) fails.
Procedure steps
Follow these steps:
1. Replace the bad disk
2. Boot your system in maintenance mode
3. Restore your system from a mksysb
If any rootvg file systems were not mounted when the mksysb was made, those file
systems are not included on the backup image. You need to create and restore those file
systems as a separate step.
4. Import any user volume groups after restoring the mksysb. For example:
# importvg -y datavg hdisk9
Only one disk from the volume group (hdisk9 in the example) needs to be selected.
Export and import of volume groups is discussed in more detail in the next topic.
Procedure 5: Total non-rootvg failure

1. Export the volume group from the system:
# exportvg vg_name
2. Check /etc/filesystems
3. Remove bad disk from ODM and the system:
# rmdev -l hdiskX -d
4. Connect the new disk
5. If volume group backup is available (savevg):
# restvg -f /dev/rmt0 hdiskY
6. If no volume group backup is available: Re-create ...
- Volume group (mkvg)
- Logical volumes and file systems (mklv, crfs)
7. Restore data from a backup:
# restore -rqvf /dev/rmt0
Figure 8-19. Procedure 5: Total non-rootvg failure AN153.0
Notes:

Procedure 5 applies to a total failure of a non-rootvg volume group. This situation might come
up if your volume group consists of only one disk that fails. Before starting this procedure, make
sure that the problem is not just a temporary disk failure (for example, a power failure).
Procedure steps
Follow these steps:
1. To fix this problem, export the volume group from the system. Use the command
exportvg as shown. During the export of the volume group, all ODM objects that are
related to the volume group are deleted.
2. Check your /etc/filesystems. There should be no references to logical volumes or file
systems from the exported volume group.

3. Remove the bad disk from the ODM (use rmdev as shown). Shut down your system and
remove the physical disk from the system.
4. Connect the new drive and boot the system. The cfgmgr command configures the new disk.
5. If you have a volume group backup available (created by the savevg command), you
can restore the complete volume group with the restvg command (or the SMIT
fastpath smit restvg). All logical volumes and file systems are recovered.
If you have more than one disk that should be used during restvg, you must specify
these disks:
# restvg -f /dev/rmt0 hdiskY hdiskZ
The savevg and restvg commands will be discussed in a future unit.
6. If you have no volume group backup available, you must re-create everything that was
part of the volume group.
Re-create:
- The volume group with mkvg or smit mkvg
- All logical volumes with mklv or smit mklv
- All file systems with crfs or smit crfs
7. Finally, restore the lost data from backups, for example with the restore command or
any other tool you use to restore data in your environment.
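The branch between steps 5 and 6 can be sketched as follows. The function name plan_vg_recovery is hypothetical and the output is a printed plan, not an executed one; the physical disk swap and the reboot between the rmdev and the restore steps are not represented.

```shell
#!/bin/sh
# Prints the recovery commands for a totally failed non-rootvg volume group.
# plan_vg_recovery is a hypothetical name.
#   $1 volume group, $2 failed disk, $3 new disk
#   $4 device that holds a savevg backup, or empty if none exists
plan_vg_recovery() {
    echo "exportvg $1"
    echo "rmdev -l $2 -d"
    if [ -n "${4:-}" ]; then
        echo "restvg -f $4 $3"
    else
        echo "mkvg -y $1 $3"
        echo "# re-create logical volumes (mklv) and file systems (crfs)"
        echo "# then restore the data from a backup"
    fi
}

plan_vg_recovery datavg hdiskX hdiskY /dev/rmt0
```

Calling the function with an empty fourth argument prints the manual re-creation path instead of the restvg command.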
ODM errors from LVM commands

# lsvg -p datavg
unable to find device id ...734...
ODM failure in device configuration database

Analyze failure:
1. Typing error in the command?
2. Analyze the ID of the device: Which physical volume or logical volume causes problems?

ODM problem in rootvg?
- Yes: rvgrecover
- No: Export and import volume group
Figure 8-20. ODM errors from LVM commands AN153.0
Notes:
ODM failure
After an incorrect disk replacement, you might detect ODM failures. For example, when running
the command lsvg -p datavg, a typical error message might be:
unable to find device id 00837734 in device configuration database
In this case, a device might not be found in the ODM.
Analyze the failure

Before trying to fix it, check the command that you typed in. Maybe it just contains a typing
error.
Find out what device corresponds to the ID that is shown in the error message.

Fix the ODM problem

Two ways to fix an ODM problem:
- If the ODM problem is related to the rootvg, run the rvgrecover procedure.
- If the ODM problem is not related to the rootvg, export the volume group with the exportvg
command and import it again with the importvg command.
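The two repair paths can be written as a small decision helper. This is an illustrative sketch: odm_fix is a hypothetical name, and rvgrecover refers to the procedure named in this course rather than to a command that is assumed to be installed.

```shell
#!/bin/sh
# Prints the suggested ODM repair for a volume group.
# odm_fix is a hypothetical name; rvgrecover refers to the course procedure.
#   $1 volume group, $2 any disk that belongs to the volume group
odm_fix() {
    if [ "$1" = "rootvg" ]; then
        echo "run the rvgrecover procedure"
    else
        echo "exportvg $1"
        echo "importvg -y $1 $2"
    fi
}

odm_fix datavg hdisk4
```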
Removal of disk without reducevg (1 of 2)

VGDA:       physical:   ODM: CuAt:
...221...   hdisk4      pvid= ...221...
...555...   hdisk5      pvid= ...555...
                        datavg pv= ...221...

# rmdev -l hdisk5 -d

VGDA:       physical:   ODM: CuAt:
...221...   hdisk4      pvid= ...221...
...555...               object deleted

• ODM is in conflict with itself and with the VGDA
• Unable to varyonvg or reducevg due to this problem
Figure 8-21. Removal of disk without reducevg (1 of 2) AN153.0
Notes:
The problem
A frequent error occurs when the administrator removes a disk from the ODM (by running
rmdev) and then physically removes the disk from the system, without first running the reducevg
command to remove volume group references to that disk (in the VGDA and in the ODM).
The VGDA stores information about all physical volumes of the volume group. ODM disk
references include the physical volume attributes for the volume group.
Throughout this course, the physical volume ID (PVID) is abbreviated in the visuals for
simplicity. The physical volume ID is really 32 characters.
The result of this mistake is that the volume group cannot be varied on. If you try to use
reducevg after the fact, it fails, since the command requires that the volume group is active.

Removal of disk without reducevg (2 of 2)

1. Repair the ODM enough to varyonvg
• Two options to repair the ODM:
– Use ODM commands (such as odmdelete)
– Use exportvg and importvg (using hdisk4)
2. # varyonvg datavg
Succeeds but reports errors due to VGDA reference to PVID
# lsvg -l datavg also complains:
• Unable to find device id ...555...
• ...555... missing
3. Use reducevg to remove disk reference from VGDA
# reducevg datavg hdisk5
• Fails: unable to find physical volume hdisk5
# reducevg datavg ...555...
• Succeeds
Figure 8-22. Removal of disk without reducevg (2 of 2) AN153.0
Notes:
The fix
Before fixing the problem, be sure that you have the PVID for the removed disk.
The problem can be fixed by running the reducevg command, but the volume group needs to
be active. The varyonvg command does not work if the volume group has a PVID value that
cannot be resolved to a disk.
You might use the odmdelete command to remove the bad PVID attribute object, but this
action is not as simple as it sounds and a mistake might make matters worse. An easier way to
clean up the bad ODM reference is to export the volume group and then import the volume
group by using the VGDA on the remaining disk.
After the volume group is active, you can then use the reducevg command to properly remove
the bad PVID reference from the VGDA. Instead of specifying the disk name, the PVID of the
removed disk is specified. If you did not earlier record the PVID, then you need to obtain it from
the VGDA itself.
To obtain the PVID of the removed disk from the VGDA, use the command:
# lqueryvg -p hdisk4 -At (Use any disk from the volume group.)
You need to compare this output with the lsvg -p datavg output to identify which PVID is for
the missing disk.
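The comparison itself is a simple set difference. The sketch below assumes that the PVIDs have already been extracted from the two command outputs by hand; missing_pvids is a hypothetical name and the PVIDs in the example call are made up.

```shell
#!/bin/sh
# Prints every PVID from stdin that is not among the arguments.
# missing_pvids is a hypothetical name.
#   stdin: PVIDs recorded in the VGDA, one per line
#   args:  PVIDs of the disks that the system still knows about
missing_pvids() {
    while read -r pvid; do
        found=no
        for known in "$@"; do
            if [ "$pvid" = "$known" ]; then
                found=yes
            fi
        done
        if [ "$found" = "no" ]; then
            echo "$pvid"
        fi
    done
    return 0
}

# Made-up PVIDs for illustration: the VGDA lists two, the system knows one.
printf '%s\n' 0000aaaa00000221 0000aaaa00000555 | missing_pvids 0000aaaa00000221
```

The PVID that is printed is the value to pass to reducevg in place of the disk name.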

Checkpoint
1. Although everything seems to be working fine, you detect
error log entries for disk hdisk0 in your rootvg. The disk is
not mirrored to another disk. You decide to replace this disk.
Which procedure would you use to migrate this disk?
2. You detect an unrecoverable disk failure in volume group
datavg. This volume group consists of two disks that are
completely mirrored. Because of the disk failure you are not
able to vary on datavg. How do you recover from this
situation?
3. After disk replacement, you recognize that a disk has been
removed from the system but not from the volume group.
How do you fix this problem?
Notes:
Exercise: Disk management procedures

• Work with LVM mirroring and quorum
• rootvg disk replacement
• User volume group disk replacement procedure
Figure 8-24. Exercise: Disk management procedures AN153.0
Notes:

Unit summary

Notes:
Different procedures are available that can be used to fix disk problems under any
circumstance:
Procedure 1: Mirrored disk
Procedure 2: Disk still working (rootvg specials)
Procedure 3: Total disk failure
Procedure 4: Total rootvg failure
Procedure 5: Total non-rootvg failure
The exportvg and importvg commands can be used to easily transfer volume groups
between systems.
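As a study aid, the choice between the five procedures can be sketched as a decision function. This is a simplification based on the procedure descriptions in this unit, not a complete decision flow: choose_procedure and its yes/no arguments are assumptions made for illustration, and a mirrored volume group that cannot be varied on normally may first need a forced varyonvg.

```shell
#!/bin/sh
# Suggests a disk replacement procedure number (1 to 5).
# choose_procedure is a hypothetical name; inputs are supplied by the caller.
#   $1 yes/no: is the failing disk mirrored?
#   $2 yes/no: can the failing disk still be accessed?
#   $3 yes/no: can the volume group be varied on (normal or forced)?
#   $4 volume group name
choose_procedure() {
    if [ "$1" = "yes" ]; then
        echo 1
    elif [ "$2" = "yes" ]; then
        echo 2
    elif [ "$3" = "yes" ]; then
        echo 3
    elif [ "$4" = "rootvg" ]; then
        echo 4
    else
        echo 5
    fi
}

choose_procedure no yes yes rootvg
```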
Unit 9. Install and cloning techniques

This unit describes techniques to reduce the size of a maintenance window.
Specific techniques are taught for installing system updates while cloning the
rootvg.

• Use alternate disk installation techniques to update AIX
• Use multibos to update AIX

Accountability:
• Lab exercise
Reference
SC24-7910 AIX Version 7.1 Differences Guide (Redbooks)
SC23-6742 AIX Version 7.1 Understanding the Diagnostic Subsystem for AIX
http://www.ibm.com/developerworks/aix/library/au-alt_disk_copy
Unit objectives

Notes:

9.1. Alternate disk installation
Topic 1 objectives
After completing this topic, you should be able to:

• Install a mksysb onto an alternate disk
• Clone an existing rootvg to an alternate disk
• Remove an alternate disk
Figure 9-2. Topic 1 objectives AN153.0
Notes:

Alternate disk installation

# smit alt_install

Installing a mksysb on another disk:
# smit alt_mksysb
-OR-
# alt_disk_mksysb

Cloning the running rootvg to another disk:
# smit alt_clone
-OR-
# alt_disk_copy
Figure 9-3. Alternate disk installation AN153.0
Notes:
Benefits of alternate disk techniques

An alternate disk installation installs the operating system while the system is still up and
running, which reduces installation or upgrade downtime considerably. It also allows large
facilities to better manage an upgrade because systems can be installed over a longer period.
The systems keep running at the previous version during that period, and the switch to the
newer version can then happen on all of them at the same time.
When to use alternate disk techniques

Alternate disk installation can be used in one of two ways:
- Installing a mksysb image on another disk
- Cloning the current running rootvg to an alternate disk
Filesets
An alternate disk installation uses the following filesets:
- bos.alt_disk_install.boot_images must be installed for alternate disk mksysb
installations
- bos.alt_disk_install.rte must be installed for rootvg cloning and alternate disk
mksysb installations
How to use alternate disk techniques

All modes of alternate disk installations are available through the SMIT fastpath:
smit alt_install.
To focus on installing a new image on an alternate disk, you can either use the SMIT fastpath:
smit alt_mksysb or directly run the command, alt_disk_mksysb.
To focus on cloning the running rootvg to an alternate disk, you can either use the SMIT
fastpath: smit alt_clone or directly run the command, alt_disk_copy.

Alternate mksysb disk installation (1 of 2)

(Diagram: hdisk0 holds rootvg (AIX 6.1); an AIX 7.1 mksysb is installed on hdisk1)
# alt_disk_mksysb -m /dev/rmt0 -d hdisk1
• Example installs an AIX 7.1 mksysb on hdisk1
• Bootlist is set to alternate disk (hdisk1)
• Changing the bootlist allows you to boot different AIX levels
(hdisk0 boots AIX 6.1, hdisk1 boots AIX 7.1)
Figure 9-4. Alternate mksysb disk installation (1 of 2) AN153.0
Notes:
Introduction
An alternate mksysb installation involves installing a mksysb image that was created from
another system onto an alternate disk of the target system.
Example
In the example, an AIX 7.1 mksysb tape image is installed on an alternate disk, hdisk1 by
running the following command:
# alt_disk_mksysb -m /dev/rmt0 -d hdisk1
The system now contains two rootvgs on different disks. In the example, one rootvg has an AIX
6.1 (hdisk0), one has an AIX 7.1 (hdisk1).
Which disk does the system use to boot?

The alt_disk_mksysb command changes the bootlist by default. During the next reboot, the
system will boot from the new rootvg. If you do not want to change the bootlist, use the option
-B of alt_disk_mksysb.
By changing the bootlist, you determine which AIX version you want to boot.
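The effect of the -B option can be shown with a small command builder. This is a sketch: alt_mksysb_cmd is a hypothetical name, and the function only prints the command line that would be used.

```shell
#!/bin/sh
# Prints an alt_disk_mksysb command line, with -B when the bootlist
# must stay unchanged. alt_mksysb_cmd is a hypothetical name.
#   $1 mksysb device or image, $2 target disk
#   $3 yes/no: keep the current bootlist?
alt_mksysb_cmd() {
    cmd="alt_disk_mksysb -m $1 -d $2"
    if [ "$3" = "yes" ]; then
        cmd="$cmd -B"
    fi
    echo "$cmd"
}

alt_mksysb_cmd /dev/rmt0 hdisk1 yes
```

With -B, the next reboot still uses the original rootvg until the bootlist is changed explicitly.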
Filesets within the mksysb being installed

The mksysb image that is used for the installation must be created on a system that has either:
- The same hardware configuration as the target system.
- All the device and kernel support installed that a different machine type or platform requires. In
this case, the following filesets must be contained in the mksysb:
• devices.*
• bos.mp
• bos.up
• bos.64bit
alt_disk_mksysb options
The alt_disk_mksysb command has the following options:
-m device
-d target-disks
-B (Do not change the bootlist.)
-i image.data
-s script
-R resolv.conf
-p platform
-L mksysb_level
-n (Remain a NIM client.)
-P phase
-c console
-r (Reboot after installation.)
-k (Keep mksysb device customization.)
-y (Import non-rootvg volume groups.)

Alternate mksysb disk installation (2 of 2)

# smit alt_mksysb
Install mksysb on an Alternate Disk

[Entry Fields]
* Target Disk(s) to install [hdisk1] +
* Device or image name [/dev/rmt0] +
Phase to execute all +
image.data file [] /
Customization script [] /
Set bootlist to boot from this disk
on next reboot? yes +
Reboot when complete? no +
Verbose output? no +
Debug output? no +
resolv.conf file [] /
Figure 9-5. Alternate mksysb disk installation (2 of 2) AN153.0
Notes:
SMIT panel example

The alternate disk installation function can also be run from the smit dialog panel.
When you run smit alt_mksysb, you get the SMIT menu shown on the visual.
Alternate disk rootvg cloning (1 of 2)

(Diagram: rootvg (AIX 7.1 TL01) on hdisk0 is cloned to hdisk1, where the clone is updated to AIX 7.1 TL03)
# alt_disk_copy -b update_all -l /dev/cd0 -d hdisk1
• Example creates a copy of the current rootvg on hdisk1
• Installs a technology level on the clone (AIX 7.1 TL03)
• Changing the bootlist allows you to boot different AIX levels
(hdisk0 boots AIX 7.1 TL01, hdisk1 boots AIX 7.1 TL03)
Figure 9-6. Alternate disk rootvg cloning (1 of 2) AN153.0
Notes:
Benefits of cloning rootvg

Cloning the rootvg to an alternate disk can have many advantages. One advantage is having an
online backup available if a disk fails. Another benefit of rootvg cloning is in applying new
maintenance levels or updates. A copy of the rootvg is made to an alternate disk (in the
example hdisk1) followed by the installation of a technology level on that copy. The active
system runs uninterrupted during this time. When the system is rebooted, it will boot from the
newly updated rootvg for testing. If the technology level causes problems, the old rootvg can be
used by resetting the bootlist and rebooting.
Example
In the example, alt_disk_copy -b update_all -l /dev/cd0 -d hdisk1, the rootvg that
is on hdisk0 is cloned to the alternate disk hdisk1. Additionally, a new technology level is
applied to the cloned version of AIX.
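The clone, test, and fallback cycle described above can be summarized as a printed plan. This is a minimal sketch: plan_clone_update is a hypothetical name, and the fallback lines reflect the statement that the old rootvg can be used again by resetting the bootlist and rebooting.

```shell
#!/bin/sh
# Prints the clone-and-update command plus the fallback to the original rootvg.
# plan_clone_update is a hypothetical name; nothing is executed.
#   $1 original disk, $2 target disk
#   $3 directory or device with the update images
plan_clone_update() {
    echo "alt_disk_copy -b update_all -l $3 -d $2"
    echo "shutdown -Fr"
    echo "# fallback if the new level causes problems:"
    echo "bootlist -m normal $1"
    echo "shutdown -Fr"
}

plan_clone_update hdisk0 hdisk1 /dev/cd0
```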

Alternate disk rootvg cloning (2 of 2)

# smit alt_clone
Clone the rootvg to an Alternate Disk
[Entry Fields]
* Target Disk(s) to install [hdisk1] +
Phase to execute all +
image.data file [] /
Exclude list [] /
Bundle to install [update_all] +
-OR-
Fileset(s) to install []
Fix bundle to install []
-OR-
Fixes to install []
Directory or Device with images [/dev/cd0]
(required if filesets, bundles or fixes used)
installp Flags
COMMIT software updates? yes +
SAVE replaced files? no +
AUTOMATICALLY install requisite software? yes +
EXTEND file systems if space needed? yes +
OVERWRITE same or newer versions? no +
VERIFY install and check file sizes? no +
ACCEPT new license agreements? no +
Customization script [] /
Set bootlist to boot from this disk
on next reboot? yes +
Reboot when complete? no +
Verbose output? no +
Debug output? no +
Figure 9-7. Alternate disk rootvg cloning (2 of 2) AN153.0
Notes:
Example with SMIT

The SMIT fastpath for alternate disk rootvg cloning is smit alt_clone.
The target disk in the example is hdisk1, which means that the rootvg is copied to that disk. If you
specify a bundle, a fileset, or a fix, then the installation or the update takes place on the clone,
not in the original rootvg.
By default, the bootlist is set to the new disk.
Changing the bootlist lets you boot from the original rootvg or the cloned rootvg.
Removing an alternate disk installation

(Diagram: Original hdisk0, rootvg (AIX 7.1 TL01); Clone, rootvg (AIX 7.1 TL03))

# shutdown -Fr
# lsvg
rootvg
altinst_rootvg
# alt_rootvg_op -X

# bootlist -m normal hdisk1
# shutdown -Fr
# lsvg
rootvg
old_rootvg
# alt_rootvg_op -X old_rootvg

• alt_rootvg_op -X removes the volume group definition from the ODM
• Do not use exportvg to remove the alternate volume group
Figure 9-8. Removing an alternate disk installation AN153.0
Notes:
Removing the alternate rootvg

If you created an alternate rootvg with alt_disk_mksysb or alt_disk_copy, but no
longer want to use it, first boot your system from the original disk (in the example, hdisk0) and
then use alt_rootvg_op.
When running lsvg to list the volume groups in the system, the alternate rootvg is shown with
the name altinst_rootvg.
To remove the alternate rootvg, do not use the exportvg command. Run the command:
# alt_rootvg_op -X
This command removes the altinst_rootvg definition from the ODM database.
If you run exportvg by accident, you must re-create the /etc/filesystems file before
rebooting the system. The system will not boot without a correct /etc/filesystems.

Removing the original rootvg

If you created an alternate rootvg with alt_disk_mksysb or alt_disk_copy, and no
longer want to use the original disk, first boot your system from the cloned disk and then use the
alt_rootvg_op command to remove it.
When running lsvg to list the volume groups in the system, the original rootvg is shown with
the name old_rootvg.
To remove the original rootvg, do not use the exportvg command. Run the command:
# alt_rootvg_op -X old_rootvg
This command removes the old_rootvg definition from the ODM database.
If you run exportvg by accident, you must re-create the /etc/filesystems file before
rebooting the system. The system will not boot without a correct /etc/filesystems.
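Which form of the command applies depends on the volume group name that lsvg reports. The following sketch reads lsvg output on standard input and prints the matching removal command; alt_cleanup_cmd is a hypothetical name and the sample input is illustrative.

```shell
#!/bin/sh
# Reads "lsvg" output (one volume group per line) on stdin and prints
# the matching removal command. alt_cleanup_cmd is a hypothetical name.
alt_cleanup_cmd() {
    while read -r vg; do
        case $vg in
            altinst_rootvg) echo "alt_rootvg_op -X" ;;
            old_rootvg) echo "alt_rootvg_op -X old_rootvg" ;;
        esac
    done
}

# Illustrative input: the system was booted from the cloned disk.
printf '%s\n' rootvg old_rootvg | alt_cleanup_cmd
```

Nothing is printed when neither altinst_rootvg nor old_rootvg is present, so the helper never suggests exportvg.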
NIM alternate disk migration

• alt_disk_copy does not support migrating to a new version or release of AIX
• nimadm uses a NIM server to migrate to an alternate disk
(Diagram: on NIM client lpar1, the rootvg (AIX 6.1) on hdisk0 is cloned to hdisk1 and migrated to AIX 7.1 through the NIM server)
# nimadm -c lpar1 -s spot1 -l lpp1 -d "hdisk1" -Y
Figure 9-9. NIM alternate disk migration AN153.0
Notes:
What is nimadm?
The nimadm command (Network Install Manager Alternate Disk Migration) creates a copy of
rootvg to a free disk (or disks) and simultaneously migrates it to a new version or release level
of AIX. The nimadm command uses NIM resources to perform this function.
Advantages of nimadm
There are several advantages to using the nimadm command over a conventional migration:
- Reduced downtime. The migration is done while the system is up and functioning normally.
There is no requirement to boot from installation media, and most of the processing occurs on
the NIM master.
- The nimadm command facilitates quick recovery in the event of migration failure. Since the
nimadm command uses alt_disk_install to create a copy of rootvg, all changes are
done to the copy (altinst_rootvg). In the event of serious migration installation failure, the
failed migration is cleaned up and there is no need for the administrator to take further
action. In the event of a problem with the new (migrated) level of AIX, the system can be
quickly returned to the pre-migration operating system by booting from the original disk.
- The nimadm command allows a high degree of flexibility and customization in the migration
process. This process is done with the use of optional NIM customization resources:
image_data, bosinst_data, exclude_files, pre-migration script,
installp_bundle, and post-migration script.
Details of using NIM to do an alternate disk migration are not covered in this course.
Exercise: Install and cloning techniques (Part 1)

• Clone the existing rootvg
• Apply a new service pack
• Alternate booting between different levels
Figure 9-10. Exercise: Install and cloning techniques (Part 1) AN153.0
9.2. Using multibos
Topic 2 objectives

• Clone an active BOS to a standby BOS
• Customize a standby BOS
• Alternate booting between an active BOS and a standby BOS
• Mount a standby BOS
• Start a standby BOS shell
multibos overview
• Two alternate AIX base operating systems (BOS) in a single rootvg
• Standby BOS created as copy of active BOS
• Modify standby BOS without affecting active BOS
– Apply maintenance to the standby BOS
– Mount and modify the standby BOS
– Start an interactive shell working in the standby BOS
• Can alternate on reboot which BOS is active
Figure 9-12. multibos overview AN153.0
Notes:
Overview
The main purpose of using multibos is to have the type of alternate BOS (base operating
system) capabilities that are available with the alternate disk technology, without having to use
another disk. The operating system filesets do not occupy enough space to justify allocating
another entire disk for that purpose. With multibos, you can have the two BOS versions on
the same disk.
This task is accomplished by creating copies of the base operating system logical volumes that
an OS update affects (the active BOS), each with a different file name path. These copies reside
in the same rootvg.
Another advantage to multibos is that it does not need as much space as the cloning
operation, since it does not need to clone all the logical volumes in the rootvg.
After you create the alternate BOS, changes, such as applying maintenance, can be made to
these copies, without changing the AIX version in the active BOS. In addition to applying
maintenance, you can access and make configuration changes to the standby BOS through two
techniques: mounting the standby BOS and starting an interactive shell (chroot) for the
standby BOS.
When you want to test the standby BOS, you reboot from the standby copy of the boot logical
volume (BLV). If there is a problem with the changes that were made, configure the bootlist to
use the original BLV and reboot to return to the original version of the BOS.

Active and standby BOS logical volumes

Active BOS: / (hd4), BLV (hd5), jfslog (hd8), home (hd1), opt (hd10opt), usr (hd2), var (hd9var), tmp (hd3)
Standby BOS: bos_inst, if mounted (bos_hd4), opt (bos_hd10opt), usr (bos_hd2), var (bos_hd9var), BLV (bos_hd5), jfslog (bos_hd8)
Figure 9-13. Active and standby BOS logical volumes AN153.0
Notes:
Standby BOS structure

The standby BOS needs to mimic the file system structure of the active BOS, but must not
replace the active file systems. To handle this requirement, multibos creates a logical volume
to match each of the BOS logical volumes, including not only the file systems, but also the
JFS logs and the boot logical volume. The names are formed by adding the prefix bos_ to the
standard logical volume names. For the standby BOS file systems, the file system
mount point is changed to have a root path of /bos_inst/.
If the standby BOS is mounted, then you use this modified path (beginning with
/bos_inst). If you use the chroot shell access or if you reboot to make the standby BOS the
active BOS, then the (formerly standby BOS) file systems have a root path of /.
Setting up a standby BOS (1 of 2)

• multibos -s -X
• Pre-validate that there is sufficient rootvg free space
• Uses default image.data (can customize with -i)
• Special logical volumes and file systems are created for the standby OS
– bos_<lvname>
– /bos_inst/<mount point>
Figure 9-14. Setting up a standby BOS (1 of 2) AN153.0
Notes:
multibos space prerequisite

Because multibos needs sufficient space in rootvg to replicate the BOS logical volumes, you
must ensure that there is enough free space in the rootvg. Display the current space that is
used by these BOS logical volumes. Remember that user-defined logical volumes, even if they are
in the rootvg, are not cloned. Then, check that there is enough space on the rootvg disk. The clone, by
default, uses the default /image.data file, so the cloned logical volumes are placed on the
same disk as the source logical volumes. If you need to obtain space by extending the volume
group, then you need to customize the image.data file that is used.
The creation of the standby BOS requires extra space in the active BOS during the operation.
You should allow the multibos command to increase the size of file systems as needed (by
using the -X flag).
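The space check and the creation of the standby BOS might be sketched as follows. This is a minimal sketch, not a tested procedure. The run helper and the DRY_RUN variable are illustrative assumptions and are not part of multibos. By default, the script only prints the commands. The preview step uses the multibos -p flag.

```shell
#!/bin/sh
# Sketch only. DRY_RUN and run() are illustrative helpers, not AIX features.
# With DRY_RUN=1 (the default), each command is printed instead of executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_standby_bos() {
    run lsvg rootvg        # note the FREE PPs value for rootvg
    run lsvg -l rootvg     # sizes of the BOS logical volumes to be copied
    run multibos -Xsp      # preview the setup; nothing is created
    run multibos -Xs       # create the standby BOS, expanding file systems as needed
}

setup_standby_bos
```

Review the preview output before you run the script with DRY_RUN=0 on a real system.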

image.data customization

To change characteristics of the cloned rootvg logical volumes or file systems, create a copy of
the image.data file, edit the copy, and then direct multibos to use the edited copy (by using
the -i flag).
For example, if you want the cloned logical volumes to be placed on a disk that was added to
the rootvg, follow these steps:
i. Run the mkszfile command to obtain a current capture of the characteristics.
ii. Copy the created /image.data file to a different name, and edit it to specify that the
cloned logical volumes should be placed on the additional disk.
iii. Point to the new file by running the command: multibos -i <image.data copy> -Xs
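The three steps above might be scripted as follows. This is a hedged sketch: the copy name /tmp/image.data.multibos is a hypothetical choice, and the run helper and DRY_RUN variable are illustrative assumptions. The edit itself (pointing the cloned logical volumes at the added disk) is left to the administrator in an editor.

```shell
#!/bin/sh
# Sketch only. DRY_RUN and run() are illustrative helpers, not AIX features.
DRY_RUN=${DRY_RUN:-1}
IMAGE_DATA_COPY=/tmp/image.data.multibos   # hypothetical name for the edited copy

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

custom_image_data_setup() {
    run mkszfile                               # step i: capture current characteristics in /image.data
    run cp /image.data "$IMAGE_DATA_COPY"      # step ii: work on a copy, not the original
    run "${EDITOR:-vi}" "$IMAGE_DATA_COPY"     # step ii: edit the placement of the cloned logical volumes
    run multibos -i "$IMAGE_DATA_COPY" -Xs     # step iii: create the standby BOS from the edited copy
}

custom_image_data_setup
```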
Which logical volumes are cloned?

The multibos facility does not clone all the logical volumes in the rootvg, unlike the
alternate disk facility. Some of the system defined logical volumes and all user-defined
logical volumes are accessed in common between the active BOS and the standby BOS.
The logical volumes that are cloned are:
• /dev/hd5 (BLV)
• /dev/hd4 (root file system)
• /dev/hd2 (/usr)
• /dev/hd9var (/var)
• /dev/hd10opt (/opt)
Setting up a standby BOS (2 of 2)

• Copies BOS file systems - backup and restore
• Non-BOS logical volumes are shared
• Optional post-creation customization script
• Bootlist updated (-t blocks)

– First: Standby BOS
– Second: Active BOS
Figure 9-15. Setting up a standby BOS (2 of 2) AN153.0
Notes:
Tasks of multibos standby BOS creation

The multibos command, when requested to create a standby BOS, will:
• Collect the metadata information about the rootvg.
• Create and define the standby logical volumes and file systems.
• Use the backup and restore commands to copy the files from the active BOS file
systems to the standby file systems.
• Set the bootlist to have the standby BOS BLV first and the active BOS BLV second.
• Run a post-creation customization script, if provided by the administrator.

Other multibos operations (1 of 2)

• Customizing a standby BOS
– multibos -c { -a | -b <bundle> | -f <fixlist> } -l device
– Can combine with standby BOS creation
• Mounting and unmounting a standby BOS
– multibos -m
– Mounts to /bos_inst/
– multibos -u
Figure 9-16. Other multibos operations (1 of 2) AN153.0
Notes:
Customizing standby BOS

You can use the multibos customization operation, with the -c flag, to update the standby
BOS. The customization operation requires a source for the fix filesets (-l device or directory
flag) and at least one installation option (installation by bundle, installation by fix, or update_all).
The customization operation performs the following steps:
1) The standby BOS file systems are mounted, if not already mounted.
2) If you specify an installation bundle with the -b flag, the installation bundle is
installed by using the geninstall utility. The installation bundle syntax should
follow geninstall conventions. If you specify the -p preview flag, geninstall
does a preview operation.
3) If you specify a fix list, with the -f flag, the fix list is installed by using the instfix
utility. The fix list syntax should follow instfix conventions. If you specify the -p
preview flag, then instfix does a preview operation.
4) If you specify the update_all function, with the -a flag, it is done by using the
install_all_updates utility. If you specify the -p preview flag, then
install_all_updates does a preview operation. Note: It is possible to do one,
two, or all three of the installation options during a single customization operation.
5) The standby boot image is created and written to the standby BLV by using the AIX
bosboot command. You can block this step with the -N flag. Use the -N flag only
if you are an experienced administrator and have a good understanding of the
AIX boot process.
6) Upon exit, if standby BOS file systems were mounted in step 1, they are unmounted.
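As an illustration of the customization operation, the following sketch previews and then applies an update_all to the standby BOS. The directory /updates is a hypothetical location for the fix filesets, and the run helper and DRY_RUN variable are illustrative assumptions. By default, the commands are only printed.

```shell
#!/bin/sh
# Sketch only. DRY_RUN and run() are illustrative helpers, not AIX features.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

customize_standby_bos() {   # customize_standby_bos <device or directory with the updates>
    run multibos -Xacp -l "$1"   # preview: install_all_updates reports what it would do
    run multibos -Xac -l "$1"    # apply the updates to the standby BOS only
}

customize_standby_bos /updates   # /updates is a hypothetical directory
```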
Mounting and unmounting standby BOS

It is possible to access and modify the standby BOS by mounting its file systems over the
standby BOS file system mount points. The multibos mount operation, with the -m flag,
mounts all standby BOS file systems in the appropriate order.
The multibos unmount operation, with the -u flag, unmounts all standby BOS file systems in
the appropriate order.
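A mount, inspect, and unmount cycle might look like the following sketch. The file that is listed is only an example of a path under /bos_inst, and the run helper and DRY_RUN variable are illustrative assumptions. By default, the commands are only printed.

```shell
#!/bin/sh
# Sketch only. DRY_RUN and run() are illustrative helpers, not AIX features.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

inspect_standby_bos() {
    run multibos -m                    # mount the standby BOS file systems under /bos_inst
    run ls -l /bos_inst/usr/bin/ls     # standby files are reached through the /bos_inst prefix
    run multibos -u                    # unmount the standby BOS file systems again
}

inspect_standby_bos
```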

Other multibos operations (2 of 2)

• Standby BOS shell
– multibos -S
– Exit returns to active shell environment
• Booting to either standby BOS or active BOS
– bootlist -m normal hdisk# blv#
– shutdown -Fr
• Removing a standby BOS

– multibos -R
Figure 9-17. Other multibos operations (2 of 2) AN153.0
Notes:
Standby BOS shell

You can start a limited interactive chroot shell with standby BOS file systems by using the
multibos -S command. This shell accesses the standby files by using standard paths. For
example, /bos_inst/usr/bin/ls maps to /usr/bin/ls within the shell. The active BOS
files are not visible outside of the shell, unless they are mounted over the standby file systems.
Limit shell operations to changing data files, and do not make persistent changes to the kernel,
process table, or other operating system structures. Use the BOS shell only if you are
experienced with the chroot environment.
The multibos shell operation performs the following steps:
1. The standby BOS file systems are mounted, if they are not already.
2. The chroot utility is called to start an interactive standby BOS shell. The shell runs until an
exit occurs.
3. If standby BOS file systems were mounted in step 1, they are unmounted.
Alternate boot
The bootlist command supports multiple BLVs. As an example, to boot from disk hdisk0 and
BLV bos_hd5, you would enter the command:
# bootlist -m normal hdisk0 blv=bos_hd5
After the system is rebooted from the standby BOS, the standby BOS logical volumes are
mounted over the usual BOS mount points, such as /, /usr, and /var. The set of BOS
objects (the BLV, logical volumes, and file systems) that is currently booted is considered
the active BOS, regardless of logical volume names. The previously active BOS becomes the
standby BOS in the existing boot environment.
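Switching to the standby BOS while keeping the original BLV as a fallback entry might be sketched as follows. The disk and BLV names are the ones used in this unit's examples; the run helper and DRY_RUN variable are illustrative assumptions. By default, the commands are only printed, which matters here because the last step reboots the system.

```shell
#!/bin/sh
# Sketch only. DRY_RUN and run() are illustrative helpers, not AIX features.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

boot_standby_bos() {   # boot_standby_bos <disk> <BLV to boot first> <fallback BLV>
    run bootlist -m normal -o                            # display the current normal boot list
    run bootlist -m normal "$1" blv="$2" "$1" blv="$3"   # first entry, then fallback entry
    run bootlist -m normal -o                            # confirm the new boot list
    run shutdown -Fr                                     # reboot; do not run this casually
}

boot_standby_bos hdisk0 bos_hd5 hd5
```

To return to the original BOS, the same function can be called with the two BLV names swapped.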
Some sites have been unable to alternate the BLV. When they try to set the bootlist to the
standby BLV, they receive the following error:
0514-226 bootlist: Invalid attribute value for blv
This error is an indication that either the BLV is corrupted or the ODM entry for it is corrupted. A
suggested solution is to rebuild the standby BLV. This solution requires a special bosboot flag:
# bosboot -sd /dev/ipldevice -M standby -l bos_hd5
Removing the standby BOS

The remove operation, with the -R flag, deletes all standby BOS objects, such as the BLV,
logical volumes, and file systems.
You can use the remove operation to make room for a new standby BOS, or to clean up a failed
multibos installation. The remove operation performs standby tag verification on each object
before removing it. The remove operation acts only on BOS objects that multibos created,
regardless of name or label. You always have the option of removing extra BOS objects by
using standard AIX utilities, such as rmlv, rmfs, and rmps.
The multibos remove operation does the following steps:
1) All boot references to the standby BLV are removed.
2) The bootlist is set to the active BLV. You can skip this step by using the -t flag.
3) Any mounted standby BOS file systems are unmounted.
4) Standby file systems are removed.
5) Remaining standby logical volumes are removed.

Checkpoint (1 of 2)
1. Name the two ways alternate disk installation can be used.
2. What are the advantages of alternate disk rootvg cloning?
3. Why should you not use exportvg with an alternate disk volume group?
Checkpoint (2 of 2)
4. True or False: multibos provides for booting between alternate operating system environments within a single rootvg.
5. True or False: A standby BOS can only be accessed by changing the bootlist and then rebooting.
6. True or False: multibos requires cloning all of the logical volumes in the active rootvg.
Exercise: Install and cloning techniques (Part 2)

• Create and work with an alternate rootvg
• Create and work with standby BOS using multibos
Figure 9-20. Exercise: Install and cloning techniques (Part 2) AN153.0
Unit summary
Notes:
Alternate disk installation techniques are available:
- Installing a mksysb onto an alternate disk
- Cloning the current rootvg onto an alternate disk
An alternate BOS can be created and maintenance can be applied to it.

Unit 10. Advanced backup techniques

This unit describes techniques to ensure data integrity and consistency while
doing online backups.

• Explain factors that are related to online backup consistency
• Use JFS split mirror to back up file system data
• Use a snapshot volume group to back up file system data
• Use JFS2 snapshot to back up file system data
• Explain AIX considerations in using SAN Copy facilities

Accountability:
• Lab exercise
Backup data inconsistency

• Applications can have multiple data updates per transaction
• Failure to capture all related updates results in inconsistent backups
• Application can use transaction logs to re-establish integrity during recovery
– Otherwise, the backup needs to have consistency

Transaction    Data states
(initial)      X0, Y0
Write X1       X1, Y0
(backup)       captures X1, Y0
Write Y1       X1, Y1
Figure 10-2. Backup data inconsistency AN153.0
Notes:
Backing up data while a file system is active can lead to data consistency problems. The
backup utility copies files sequentially while applications might still be updating their
contents. For a collection of related updates, the backup utility can copy one piece of data
after it is updated, but copy the other related data before it is updated. The result can be a
backup where two pieces of data are not consistent.
Some applications, especially database engines, record the progress of related updates in a
transaction log. During the application recovery process, the log identifies transactions where
not all related updates were confirmed. The recovery process then backs out each such transaction,
removing any of its updates that were captured in the backup.
If an application does not have this type of recovery logic, then use of the inconsistent backup
can result in serious problems. In that situation, you need to have a way to ensure that the
backup has consistency.
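The inconsistency can be reproduced with a toy model in which two ordinary files stand in for the related data items X and Y. This is only an illustration of the timing problem under that assumption; the directory and file names are hypothetical and no AIX backup command is involved.

```shell
#!/bin/sh
# Toy model: files x and y stand in for two related data items of one transaction.
WORK=${TMPDIR:-/tmp}/backup_demo.$$
mkdir -p "$WORK/live" "$WORK/backup"

echo X0 > "$WORK/live/x"      # state before the transaction: X0, Y0
echo Y0 > "$WORK/live/y"

echo X1 > "$WORK/live/x"      # the transaction writes X1 ...
cp "$WORK/live/x" "$WORK/live/y" "$WORK/backup/"   # ... the backup copies both files mid-transaction ...
echo Y1 > "$WORK/live/y"      # ... and only then does the transaction write Y1

echo "live:   $(cat "$WORK/live/x"), $(cat "$WORK/live/y")"
echo "backup: $(cat "$WORK/backup/x"), $(cat "$WORK/backup/y")"
```

The backup holds a new X together with an old Y, a combination that the application never intended to exist.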
Ensuring backup data consistency

• Offline backup: Integrity is assured by stopping application and unmounting the file systems
• Online backup: Integrity is assured by quiescing application processing during backup
– Stops writing data to file system for new transactions
– Completes writes for previously started transactions
• Problem: Time that is needed to backup is often too long to have application stopped or quiesced
• Solution: Provide a quick way to capture a stable data state, thus requiring only a brief quiesce
Figure 10-3. Ensuring backup data consistency AN153.0
Notes:
Traditionally, the best way to ensure that the data is consistent is to stop the application and
unmount the file system, followed by running a backup by inode. This procedure ensures that
there are no updates during the backup and that all of the file system’s data is flushed to disk. If a
backup takes a long time, having the application down for a long period can be unacceptable.
Some applications can be quiesced. In this state, either new transactions are not accepted or
they are only processed in user space without writing the updates to the file system. Either way,
the backup of the mounted file system can proceed without any file system activity from the
quiesced application. Again, if the backup takes a long time, being quiesced for a long period
might still be unacceptable.
The solution is to use the quiesced state to quickly capture the state of the file system. After
the capture, ongoing updates to the file system do not affect the captured state. A method for
capturing the file system state might run for only a few seconds. Such a short time in a
quiesced state is often acceptable.

10.1. LVM mirror-based online backups
Topic 1 objectives

• Create a split mirror of a JFS file system
• Back up data from the copy that was split off
• Reintegrate the split copy with the remaining mirror copies
Online JFS backup

[Diagram: file system /fs1 with three mirror copies (Copy 1, Copy 2, Copy 3) and its jfslog]
# lsvg -l newvg
newvg:
LV NAME    TYPE    LPs  PPs  PVs  LV STATE    MOUNT POINT
loglv00    jfslog  1    3    3    open/syncd  N/A
lv03       jfs     1    3    3    open/syncd  /fs1
Figure 10-5. Online JFS backup AN153.0
Notes:
Requirements
By splitting a mirror, you can back up the copy of the mirror that is not changing while the other
mirrors remain online.
To use this technique, it is best to have three copies of your data. One copy is split off
for the backup, while the other one or two copies continue to provide redundancy for the online
portion of the logical volume.
You are also required to mirror the journal log for the file system.
The output from lsvg -l indicates that the logical volume and the log are both mirrored.
# lsvg -l newvg
newvg:
LV NAME    TYPE    LPs  PPs  PVs  LV STATE    MOUNT POINT
loglv00    jfslog  1    3    3    open/syncd  N/A
lv03       jfs     1    3    3    open/syncd  /fs1
Splitting the mirror

[Diagram: file system /fs1 with Copy 1, Copy 2, and Copy 3 plus its jfslog; one copy is presented as the file system /backup]
# chfs -a splitcopy=/backup -a copy=3 /fs1
Figure 10-6. Splitting the mirror AN153.0
Notes:
Using chfs to split a mirror

The chfs command is used to split the mirror to form a snapshot of a JFS file system.
# chfs -a splitcopy=/backup -a copy=3 /fs1
This command creates a read-only file system that is called /backup that can be accessed to
do a backup. The journal log logical volume that is associated with the file system you are
splitting must also be mirrored.

Example
# lsvg -l newvg
newvg:
LV NAME      TYPE  LPs  PPs  PVs  LV STATE    MOUNT POINT
lv03         jfs   1    3    3    open/stale  /fs1
lv03copy00   jfs   0    0    0    open/syncd  /backup
The /fs1 file system still contains three physical partitions, but the mirror is now stale. The
stale copy is now accessible by the newly created read-only file system /backup. That file
system is on a newly created logical volume, lv03copy00. This logical volume is not
synchronized and is considered stale. Also, it does not indicate any logical partitions (since the
logical partitions really belong to lv03).
You can look at the content and interact with the /backup file system just like any other
read-only file system.
Reintegrate a mirror backup copy

[Diagram: file system /backup is removed; syncvg resynchronizes Copy 3 with Copy 1 and Copy 2 of /fs1, and the jfslog]
# unmount /backup
# rmfs /backup
Figure 10-7. Reintegrate a mirror backup copy AN153.0
Notes:
Reintegrate the backup copy

To reintegrate the snapshot into the file system, unmount the /backup file system and then
remove it with the rmfs command.
The third copy automatically resynchronizes and comes online. The file system for the split copy
is removed.
The downside to this method is that all partitions in the split-off copy are considered stale and they all
must be resynchronized when the copy is rejoined. For large file systems, this process can take some time,
during which the application must compete with the syncvg operation for access to the data.
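The whole split, back up, and reintegrate cycle might be sketched as follows, using the names from this topic's examples. It is a sketch under assumptions: the tape device /dev/rmt0 and the backup_by_name helper (a by-name backup fed by find) are illustrative choices, as are the run helper and DRY_RUN variable. By default, the commands are only printed.

```shell
#!/bin/sh
# Sketch only. DRY_RUN, run(), and backup_by_name() are illustrative helpers.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

backup_by_name() {   # backup_by_name <directory> <device>
    (cd "$1" && find . -print | backup -i -q -f "$2")
}

split_mirror_backup() {   # split_mirror_backup <file system> <copy> <split mount point> <device>
    run chfs -a splitcopy="$3" -a copy="$2" "$1"   # split the copy off as a read-only file system
    run backup_by_name "$3" "$4"                   # back up the frozen copy
    run umount "$3"                                # reintegration: unmount ...
    run rmfs "$3"                                  # ... and remove; the copy then resynchronizes
}

split_mirror_backup /fs1 3 /backup /dev/rmt0
```

If the application is quiesced, it needs to stay quiesced only for the chfs step; the backup itself runs against the split copy.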

Snapshot volume groups (1 of 2)

• All logical volumes must be mirrored on disks that contain only those mirrors
• Ensure that there are no stale copies
• Use the splitvg command to split a mirrored copy into a snapshot volume group
– Uses the recreatevg command to implement
• The split copy becomes a new volume group, called a snapshot volume group, with its own VGname
• New logical volumes and mount points are created in the snapshot volume group
• The snapshot file systems are not automatically mounted
Figure 10-8. Snapshot volume groups (1 of 2) AN153.0
Notes:
How it works
Snapshot support for a mirrored volume group is provided to split a mirrored copy of a fully
mirrored volume group into a snapshot volume group.
Ensure that there are no stale copies in the original volume group. The splitvg command
rejects a split in which the only remaining non-stale copy is on a disk to be split, unless you use
the force (-f) option.
When the volume group is split, the original volume group does not use the disks that are now
part of the snapshot volume group.
The splitvg command uses the recreatevg command to implement the split. This method
is a different technique from the JFS split mirror. It creates a new volume group with new file
system and logical volume names.
Snapshot volume groups (2 of 2)

• Physical partition changes in both volume groups are tracked
– Writes to a physical partition in the original volume group cause a corresponding physical partition in the snapshot volume group to be marked stale
– Writes to a physical partition in the snapshot volume group cause that physical partition to be marked stale
• Use the joinvg command to rejoin the volume groups
– The stale physical partitions are resynchronized
– The user sees the same data in the rejoined volume group as was in the original volume group before the rejoin
Figure 10-9. Snapshot volume groups (2 of 2) AN153.0
Notes:
Both volume groups track changes in physical partitions within the volume group. When the
snapshot volume group is rejoined with the original volume group, the synchronization needs to
occur on only the subset of physical partitions that were touched during the split period. This
method is much faster and has less performance impact than resynchronizing all physical
partitions, as is needed with the JFS split copy function.
Physical partition changes in both volume groups are tracked. Writes to a physical partition in
the original volume group cause a corresponding physical partition in the snapshot volume
group to be marked stale. Writes to a physical partition in the snapshot volume group cause
that physical partition to be marked stale.
To rejoin the volume groups, use the joinvg command. The stale physical partitions are
included in the original mirroring and the stale copies are automatically resynchronized.
The user sees the same data in the rejoined volume group as was in the original volume group
before the volume group is rejoined. In other words, the third copy shows the data changes that
occurred in the original volume group during the period it was split off.

Snapshot volume group commands

splitvg [ -y SnapVGname ] [ -c copy ] [ -f ] [ -i ] VGname
-y Specifies the name of the snapped volume group
-c Specifies which mirror to use (1, 2 or 3)
-f Forces the split even if there are stale partitions
-i Creates an independent volume group which cannot be rejoined into the original

joinvg [ -f ] VGname
-f Forces the join when disks in the snapshot volume group are missing or removed
Figure 10-10. Snapshot volume group commands AN153.0
Notes:
The splitvg command

The splitvg command splits a single mirror copy of a fully mirrored volume group into a
snapshot volume group. The original volume group stops using the disks that are now
part of the snapshot volume group. Both volume groups monitor the writes within the volume
group so that when the snapshot volume group is rejoined with the original volume group,
consistent data is maintained across the rejoined mirror copies.
The joinvg command

The joinvg command joins a snapshot volume group that was created with the splitvg
command back into its original volume group. The snapshot volume group is deleted and the
disks are reactivated in the original volume group. A background process resynchronizes
any stale partitions.
Snapshot volume group example

Example: File system /data is in the datavg volume group

• These commands split the volume group, create a backup of the /data file system, and then rejoin the snapshot volume group with the original.
1. splitvg -y snapvg datavg
The volume group datavg is split and the volume group snapvg is created. The mount point /fs/data is created.
2. backup -f /dev/rmt0 /fs/data
An i-node based backup of the unmounted file system /fs/data is created on tape.
3. joinvg datavg
snapvg is rejoined with the original volume group and synced in the background.
Figure 10-11. Snapshot volume group example AN153.0
Notes:
The splitvg command creates a separate, point-in-time snapshot volume group. The splitvg
command fails if any of the disks to be split are not active within the original volume group.
This volume group can be used to do the backup or other operations. In the example, the
backup command backs up one of the renamed file systems by inode (unmounted). You can
also mount the file system and back up by name instead.
Later, the joinvg command is used to rejoin the snapshot volume group to the original volume group.
In the event of a system crash or loss of quorum while running this command, the joinvg
command must be run to rejoin the disks back to the original volume group.
You must have root authority to run these commands.
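The by-name alternative that is mentioned above might be sketched as follows, with the volume group and mount point names from the example. The tape device, the backup_by_name helper, the run helper, and the DRY_RUN variable are illustrative assumptions. By default, the commands are only printed.

```shell
#!/bin/sh
# Sketch only. DRY_RUN, run(), and backup_by_name() are illustrative helpers.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

backup_by_name() {   # backup_by_name <directory> <device>
    (cd "$1" && find . -print | backup -i -q -f "$2")
}

snapvg_backup() {   # snapvg_backup <original VG> <snapshot VG> <snapshot mount point> <device>
    run splitvg -y "$2" "$1"       # split one mirror copy into the snapshot volume group
    run mount "$3"                 # snapshot file systems are not mounted automatically
    run backup_by_name "$3" "$4"   # back up the point-in-time copy by name
    run umount "$3"
    run joinvg "$1"                # rejoin; stale partitions resynchronize in the background
}

snapvg_backup datavg snapvg /fs/data /dev/rmt0
```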

10.2. JFS2 snapshot
Topic 2 objectives

• Create either an internal or external JFS2 snapshot
• List existing JFS2 snapshots
• Recover lost or corrupted files from a JFS2 snapshot
• Remove a JFS2 snapshot
• Increase the size of an external JFS2 snapshot
JFS2 snapshot (1 of 2)
• A point-in-time image of a JFS2 file system

– Source file system is called the snapped file system (snappedFS)
– Snapshot creation is quick and requires little space
– It can have multiple snapshots for a single snappedFS, each taken at a
different point in time
• A snapshot image of a JFS2 file system can be used to:

– Restore files from a known point in time
– Access files or directories as they were at the time of the snapshot
– Back up a mounted snapshot to tape, DVD or a remote server
Figure 10-13. JFS2 snapshot (1 of 2) AN153.0
Notes:
JFS2 snapshot
A point-in-time image for a JFS2 file system is called a snapshot. The file system that is the
source of this point-in-time image is referred to as the snapped file system or snappedFS.
The snapshot view of the data remains static and retains the same security permissions that the
original snappedFS had when the snapshot was made. Also, a JFS2 snapshot can be created
without unmounting the file system, or quiescing the file system (though it is advisable for some
applications to briefly quiesce during the snapshot).
The snapshot can be used to access files or directories as they existed when the snapshot was
taken, or to create a backup of the file system as of that point in time.
JFS2 snapshot (2 of 2)
• Snapshot stays stable while snappedFS is changing
• Using snapshot reduces application downtime
– Automatically freezes I/O while snapshot is created
– If intolerant of fuzzy backups, briefly quiesce the application
• A snapshot typically needs 2% - 6% of snappedFS space requirements. There are two options:
– Separate logical volume (PPsize unit of allocation)
– Allocate space out of snappedFS (called an internal snapshot)
• At snapshot creation, only structure information is included
• When a write or delete occurs in the snappedFS, the affected blocks are copied into existing snapshots
Figure 10-14. JFS2 snapshot (2 of 2) AN153.0
Notes:
How the JFS2 snapshot works

During creation of a snapshot, the snappedFS I/O is momentarily frozen, and all new writes are
blocked. This process ensures that the snapshot really is a consistent view of the file system at
the time of snapshot.
When a snapshot is initially created, only structure information is included. When a write or
delete occurs, then the affected blocks are copied into the snapshot file system.
Every read of the snapshot requires a lookup to determine whether the block needed should be
read from the snapshot or from the snappedFS. For instance, the block is read from the
snapshot file system if the block was changed since the snapshot took place. If the block is
unchanged since the snapshot, it is read from the snappedFS.
There are two types of JFS2 snapshots: internal and external. A JFS2 internal snapshot uses
space within the snappedFS. A JFS2 external snapshot is created in a separate logical volume
from the file system. The external snapshot can be mounted separately from the file system at
its own unique mount point. A file system can use either internal or external snapshots; it cannot
mix the different types.
Space requirements for a snapshot

Typically, a snapshot needs 2-6% of the space that is needed for the snappedFS. In the case of
a highly active snappedFS, this estimate can rise to 15%. This space is needed when a block in the
snappedFS is either written to or deleted, because the original block is then copied to the
snapshot. Blocks that belong to new files written after the snapshot
was taken are not copied to the snapshot, because they did not exist at the time of the snapshot
and are not relevant to it.
If the snapshot runs out of space, all snapshots that are associated with the snappedFS are
discarded and an entry is made in the AIX error log. If a snapshot file system fills up before a
backup is taken, the backup is not complete and must be rerun from a new snapshot. The new
snapshot might need a larger size to allow for changes in the snappedFS.
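The percentages above translate into sizes as in the following sketch. The 10240 MB file system size is a hypothetical example, and snap_size_mb is an illustrative helper that rounds up to a whole megabyte; it is arithmetic only and does not query a real file system.

```shell
#!/bin/sh
# Estimate snapshot space as a percentage of the snappedFS size, rounded up to a whole MB.
snap_size_mb() {   # snap_size_mb <snappedFS size in MB> <percent>
    echo $(( ($1 * $2 + 99) / 100 ))
}

FS_MB=10240   # hypothetical snappedFS size (10 GB)

echo "typical, low end (2%):   $(snap_size_mb "$FS_MB" 2) MB"
echo "typical, high end (6%):  $(snap_size_mb "$FS_MB" 6) MB"
echo "highly active FS (15%):  $(snap_size_mb "$FS_MB" 15) MB"
```

For an external snapshot, the logical volume is allocated in whole physical partitions, so the actual allocation can be somewhat larger than the estimate.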
JFS2 snapshot mechanism (1 of 2)

[Diagram: snappedFS (inode1, inode2) above snapshot (inode1, inode2). Initially, the snapshot only points to data extents in snappedFS.]
Figure 10-15. JFS2 snapshot mechanism (1 of 2) AN153.0
Notes:
Data blocks in snappedFS

The diagram, at the top, shows two inodes anchoring file data blocks. The inode accesses the
data blocks through a binary tree structure.
Data blocks in JFS2 snapshot

The diagram, at the bottom, shows the structure that is initially created in a JFS2 snapshot. The
snapshot has the metadata, but all of the pointers refer to the snappedFS data blocks. Thus, the
snapshot requires little space. Initially, data that is retrieved from a mounted snapshot is
identical to the current data in the snappedFS.

JFS2 snapshot mechanism (2 of 2)

IBM Power Systems
[Diagram: the snappedFS (inode1, inode2) above the snapshot (inode1, inode2). The original of the modified data is copied to the snapshot.]
Figure 10-16. JFS2 snapshot mechanism (2 of 2) AN153.0
Notes:
Data blocks in snappedFS after data changes

In the diagram in the visual, some of the data blocks were modified. Because the kernel file
system logic knows that there is a snapshot for this file system, it copies the original data blocks
to the snapshot before modifying (or deleting) those data blocks in the snappedFS.
Data blocks in JFS2 snapshot after data changes

The diagram at the bottom of the visual shows that the inode tree structure points to the copies
of the original data (now stored in the snapshot) rather than referring to the snappedFS data
blocks. This process ensures that access to the snapshot always returns the original data (from
the time the snapshot was created) for the snappedFS.
JFS2 snapshot SMIT menu

IBM Power Systems
# smit jfs2
Enhanced Journaled File Systems
Move cursor to desired item and press Enter.
...
List Snapshots for an Enhanced Journaled File System
Create Snapshot for an Enhanced Journaled File System
Mount Snapshot for an Enhanced Journaled File System
Remove Snapshot for an Enhanced Journaled File System
Unmount Snapshot for an Enhanced Journaled File System
Change Snapshot for an Enhanced Journaled File System
Rollback an Enhanced Journaled File System to a Snapshot
Figure 10-17. JFS2 snapshot SMIT menu AN153.0
Notes:
The various JFS2 snapshot operations can be done from SMIT. The SMIT JFS2 menu includes
many items that relate to JFS2 snapshots.
An example with only the menu items for snapshot is shown in the visual.

Creating snapshots: External

IBM Power Systems
• Creating an external snapshot for a JFS2 file system that is already mounted:
– Using a new logical volume
# snapshot -o snapfrom=snappedFS -o size=Size
or
# smit crsnapj2
– Using an existing logical volume
# snapshot -o snapfrom=snappedFS snapshotLV
or
# smit crsnapj2lv
• Creating an external snapshot as part of the mount option
# mount -o snapto=/snapshotLV snappedFS MountPoint
Figure 10-18. Creating snapshots: External AN153.0
Notes:
Creating an external snapshot on a new logical volume for a JFS2 file system that is already mounted
When creating a new external snapshot, you must provide the size of the logical volume
allocation (unless using a pre-existing logical volume).
If you want to create a snapshot for a mounted JFS2 file system, you can use the following
method:
• To create a snapshot in a new logical volume, specifying the size:
# snapshot -o snapfrom=snappedFS -o size=Size
For example:
# snapshot -o snapfrom=/home/myfs -o size=16M
This command creates a 16 MB logical volume and creates a snapshot for the
/home/myfs file system on the newly created logical volume.
Creating an external snapshot on an existing logical volume for a JFS2 file system that is already mounted
If you want to control details of the logical volume that holds an external snapshot, you can use
the following method:
• To create a snapshot that uses an existing logical volume:
# snapshot -o snapfrom=snappedFS snapshotLV
For example:
# snapshot -o snapfrom=/home/myfs /dev/mysnaplv
This command creates a snapshot for the /home/myfs file system on the /dev/mysnaplv
logical volume, which must already exist.
Creating an external snapshot for a JFS2 file system that is not mounted
The mount option, -o snapto=/snapshotLV, can be used to create an external snapshot for a
JFS2 file system that is not currently mounted:
# mount -o snapto=/snapshotLV snappedFS MountPoint
If the snapto value starts with a slash, then it is assumed to be a special device file for an
existing logical volume where the snapshot should be created. For example:
# mount -o snapto=/dev/mysnaplv /dev/fslv00 /home/myfs
This command mounts the file system that is contained on /dev/fslv00 at the mount
point /home/myfs and then creates a snapshot for the /home/myfs file system
in the logical volume /dev/mysnaplv.

Creating snapshots: Internal

IBM Power Systems
• Creating an internal snapshot for a JFS2 file system that is already mounted:
# snapshot -o snapfrom=snappedFS -n snapshotName
or
# smit crintsnapj2
• Creating an internal snapshot as part of the mount option
# mount -o snapto=snapshotName snappedFS MountPoint
• Internal snapshot attribute must be set to yes on creation of the file system:
# smit crfs
(in dialog panel: Allow Internal Snapshots[yes])
or
# crfs -a isnapshot=yes
Figure 10-19. Creating snapshots: Internal AN153.0
Notes:
Creating an internal snapshot for a JFS2 file system that is already mounted
If you want to create an internal snapshot for a mounted JFS2 file system, you can use the
following method:
• To create an internal snapshot, specify a snapshot name:
# snapshot -o snapfrom=snappedFS -n snapshotname
For example:
# snapshot -o snapfrom=/home/myfs -n mysnap
This command creates a snapshot named mysnap that is internal to the
snappedFS /home/myfs.
Creating an internal snapshot for a JFS2 file system that is not mounted
The mount option, -o snapto=snapshotname, can be used to create an internal snapshot for a
JFS2 file system that is not currently mounted:
# mount -o snapto=snapshotname snappedFS MountPoint
If the snapto value starts with a slash, then it is assumed to be a special device file for an
existing logical volume where the snapshot should be created. If the snapto value does not
start with a slash, then it is assumed to be the name of an internal snapshot to be created.
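The leading-slash rule can be expressed as a small check. This is an illustrative sketch only: snapto_kind is a hypothetical helper name, and it merely mirrors the rule stated above rather than calling mount.

```shell
# Classify a snapto value the way the text describes the mount option:
# a leading slash means the device file of an existing logical volume
# (external snapshot); anything else is taken as an internal snapshot name.
snapto_kind() {
    case "$1" in
        "") echo invalid; return 1 ;;
        /*) echo external ;;
        *)  echo internal ;;
    esac
}

snapto_kind /dev/mysnaplv    # prints: external
snapto_kind mysnap           # prints: internal
```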
Internal JFS2 snapshot considerations:

First, it is important to know that internal snapshots cannot be used unless the file system was
enabled to support them at file system creation.
• To enable the file system to support internal snapshots (at creation time only):
# crfs -a isnapshot=yes ....
Internal snapshots are preserved when the logredo command runs on a JFS2 file system
with an internal snapshot.
Internal snapshots are removed if the fsck command needs to modify a JFS2 file system to
repair it.
If an internal snapshot runs out of space, or if a write to an internal snapshot fails, all internal
snapshots for that snappedFS are marked invalid. Further access to the internal snapshots fails.
These failures write an entry to the error log.
Internal snapshots are not separately mountable.
Internal snapshots are not compatible with AIX releases before AIX 6.1. A JFS2 file system that
is created to support internal snapshots cannot be modified on an earlier release of AIX.
A JFS2 file system with internal snapshots cannot be defragmented.

Listing snapshots
IBM Power Systems
# smit lssnap (and select file system from list)

-OR-
# snapshot -q /home/myfs2
Snapshots for /home/myfs2

Current Name Time
mysnap Wed 19 Nov 08:44:33 2014
mysnap2 Fri 21 Nov 09:33:33 2014
* mysnap3 Mon 24 Nov 14:03:18 2014
# snapshot -q /home/myfs
Snapshots for /home/myfs

Current Location 512-blocks Free Time
* /dev/fslv06 262144 261376 Tue May 6 18:15:11 2014
Figure 10-20. Listing snapshots AN153.0
Notes:
The snapshot -q option can be used to display the snapshots that are related to the specified
file system.
If the file system uses internal snapshots, then the report provides the snapshot names and
creation times. The * indicates the current snapshot.
# snapshot -q /home/myfs2
Snapshots for /home/myfs2

Current Name Time
mysnap Wed 19 Nov 08:44:33 2014
mysnap2 Fri 21 Nov 09:33:33 2014
* mysnap3 Mon 24 Nov 14:03:18 2014
If the file system uses external snapshots, then the report provides, for each snapshot, the
logical volume special device file, the snapshot size, how much space is free in the snapshot,
and the creation time.
# snapshot -q /home/myfs
Snapshots for /home/myfs

Current Location 512-blocks Free Time
* /dev/fslv06 262144 261376 Tue May 6 18:15:11 2014
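The external report gives the total and free space in 512-byte blocks, so the space that is consumed is the difference between the two columns. The following sketch post-processes that report with awk. The function name snapshot_used_blocks is hypothetical, and the sample input is the report from the visual, not live output.

```shell
# Read `snapshot -q` output for external snapshots on standard input and
# print, for each snapshot, the device name and the 512-byte blocks in use.
snapshot_used_blocks() {
    awk '{
        # The device is the first field that starts with /dev/; the two
        # fields after it are the total and free 512-byte block counts.
        for (i = 1; i <= NF - 2; i++) {
            if ($i ~ /^\/dev\//) {
                print $i, $(i + 1) - $(i + 2)
                break
            }
        }
    }'
}

# Sample report text, as shown in the visual.
printf '%s\n' \
    'Snapshots for /home/myfs' \
    'Current Location 512-blocks Free Time' \
    '* /dev/fslv06 262144 261376 Tue May 6 18:15:11 2014' |
    snapshot_used_blocks     # prints: /dev/fslv06 768
```

On an AIX system, the same function could be fed directly, for example with snapshot -q /home/myfs | snapshot_used_blocks.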

Using a JFS2 snapshot to recover

IBM Power Systems
• Recover entire file system to point of snapshot creation:

# umount /home/myfs
# rollback /home/myfs /dev/mysnaplv (for external)
# rollback -n mysnap /home/myfs (for internal)
• Recover individual files from JFS2 snapshot image:
– Mount the snapshot (if external):

# mount -v jfs2 -o snapshot /dev/mysnaplv /mntsnapshot
– Change to the directory that contains the snapshot:

# cd /mntsnapshot
# cd /home/myfs/.snapshot/mysnap (if internal)
– Copy the accurate file to overwrite the corrupted one:

# cp myfile /home/myfs (Copies only the file named myfile)
Figure 10-21. Using a JFS2 snapshot to recover AN153.0
Notes:
Rollback
The rollback command is an interface to revert a JFS2 file system to a point-in-time
snapshot. The file system that is named by the snappedFS parameter must be unmounted before
the rollback command is run, and it remains inaccessible while the command runs. Any snapshots
that were taken after the specified snapshot (snapshotObject for external or snapshotName for
internal) are removed. The associated logical volumes are also removed for external snapshots.
Recover individual files

If you want to restore individual files back to their original state, then you can mount the
snapshot and then manually copy the files back over. If the snapshot is internal, then no mount
is necessary. Instead, you need to explicitly specify the path to the snapshot
(/snappedFS-mount-point/.snapshot/snapshot-name) on a change directory command.
As with any file copying, be careful about changing the nature of the file (for example,
ownership, permission, and sparseness). Using the backup and restore utilities to implement
a copy of files is often a safer technique.
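The path convention for internal snapshots can be captured in a small helper. This is a sketch under the naming rule given above; internal_snapshot_path is a hypothetical function name, and it only builds the path without checking that the snapshot exists.

```shell
# Build the path of an internal snapshot from the snappedFS mount point
# and the snapshot name: <mount-point>/.snapshot/<snapshot-name>
internal_snapshot_path() {
    # ${1%/} drops a trailing slash so that "/home/myfs/" also works.
    echo "${1%/}/.snapshot/$2"
}

internal_snapshot_path /home/myfs mysnap    # prints: /home/myfs/.snapshot/mysnap
```

A single file could then be copied back with a command such as cp -p "$(internal_snapshot_path /home/myfs mysnap)/myfile" /home/myfs, where the -p flag asks cp to preserve modes and modification time where permitted.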

Using a JFS2 external snapshot to back up

IBM Power Systems
• The JFS2 snapshot can be a stable source for backup to

media
• Mount the external snapshot and use relative path backup:

# mount -v jfs2 -o snapshot /dev/mysnaplv /mntsnapshot
# cd /mntsnapshot
# find . | backup -i -f /servermnt/backup52
• To create snapshot and backup in one operation:

# backsnap -m MountPoint -s Size BackupOptions snappedFS
For example:
# backsnap -m /mntsnapshot -s size=16M -i -f /dev/rmt0 /home/myfs
Figure 10-22. Using a JFS2 external snapshot to back up AN153.0
Notes:
Using an existing external snapshot to do a backup

For an external snapshot, you first need to mount the snapshot. Then, change to the mount point
and run a backup by name that uses relative paths.
Creating an external snapshot and backup in one operation

The backsnap command provides an interface to create a snapshot for a JFS2 file system and
back up the snapshot. The command syntax for an external snapshot is:
# backsnap -m MountPoint -s Size BackupOptions snappedFS
For example:
# backsnap -m /mntsnapshot -s size=16M -i -f /dev/rmt0 /home/myfs
This command creates a 16 MB logical volume and creates a snapshot for the /home/myfs file
system on the newly created logical volume. It then mounts the snapshot logical volume on
/mntsnapshot. The remaining arguments are passed to the backup command. In this
example, the files and directories in the snapshot are backed up by name (-i) to /dev/rmt0.

Using a JFS2 internal snapshot to back up

IBM Power Systems
• cd to internal snapshot and use relative path backup:

# cd /home/myfs/.snapshot/mysnap
# find . | backup -i -f /servermnt/backup52
• To create snapshot and backup in one operation:

# backsnap -n snapshotname BackupOptions snappedFS
For example:
# backsnap -n mysnap -i -f/dev/rmt0 /home/myfs
Figure 10-23. Using a JFS2 internal snapshot to back up AN153.0
Notes:
Using an existing internal snapshot to do a backup

For an internal snapshot, you need to know only the hidden directory name for accessing the
snapshot. Change to that directory and run a backup by name that uses relative paths.
Creating an internal snapshot and backup in one operation

The backsnap command provides an interface to create a snapshot for a JFS2 file system and
back up the snapshot. The command syntax for an internal snapshot is:
# backsnap -n snapshotname BackupOptions snappedFS
For example:
# backsnap -n mysnap -i -f/dev/rmt0 /home/myfs
This command creates an internal snapshot that is named mysnap for the /home/myfs file
system. No separate logical volume or mount point is needed. The remaining arguments are
passed to the backup command. In this example, the files and directories in the snapshot are
backed up by name (-i) to /dev/rmt0.

JFS2 snapshot space management

IBM Power Systems
• List snapshots for the snappedFS
# snapshot -q snappedFS
• External snapshot:
– The snapshot report identifies the size and amount of free space
– If the snapshot needs more space:
# snapshot -o size=+1 snapshotLV
• Internal snapshot:
– Shares logical volume with the snappedFS
# df -m snappedFS
– If snappedFS is out of space, try to free up space, possibly by deleting old snapshots
# snapshot -d -n snapshot_name snappedFS
Figure 10-24. JFS2 snapshot space management AN153.0
Notes:
It is useful to be able to identify situations where a snapshot is growing large. If a snapshot runs
out of space, then all snapshots are invalidated and become unusable. With internal
snapshots, the snapshots can also contribute to the entire file system running out of space.
To monitor an external snapshot, use the query option of the snapshot command. An
alternative would be to mount the snapshot and use the df command, but that is more
complicated.
If an external snapshot needs more room, you can dynamically increase the size of the
snapshot logical volume by using the size option of the snapshot command.
For an internal snapshot, there is no mechanism for identifying the space usage of the
snapshots. Instead, you monitor the size of the snappedFS.
When a file system is running out of space, one way to free space is to delete old snapshots.
Keeping many generations of snapshots can be useful, but it can also be expensive in terms of
space usage.
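Monitoring an external snapshot comes down to comparing the free and total block counts from the query report against a threshold. The following is a minimal sketch: snapshot_low_on_space is a hypothetical helper, and the 20% threshold is an arbitrary illustration, not a recommended value.

```shell
# Succeed (exit status 0) when the free space of a snapshot is below a
# minimum percentage.
# $1: total 512-byte blocks, $2: free 512-byte blocks, $3: minimum free percent
snapshot_low_on_space() {
    total=$1
    free=$2
    min_pct=$3
    # Integer form of: free / total < min_pct / 100
    [ $(( free * 100 )) -lt $(( total * min_pct )) ]
}

# Values from the sample report: 262144 blocks in total, 261376 free.
if snapshot_low_on_space 262144 261376 20; then
    echo "grow the snapshot logical volume"
else
    echo "enough free space"
fi
```

With the sample values, the snapshot is almost empty, so the check reports that there is enough free space.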
10.3. SAN Copy issues
Topic 3 objectives
IBM Power Systems

• Explain potential problems in using SAN Copy
• Explain use of the JFS2 freeze and resume function
• Explain methods for accessing SAN Copy target LUNs
Notes:

SAN Copy and file system cache

IBM Power Systems
• SAN Copy uses the storage subsystem contents

• Even with quiesced application, file system updates can be
cached in memory and unwritten to the storage subsystem
• Need to stop file system activity and flush memory cache
• Unmounting the file system would work, but online alternative
is needed
             Kernel cache   Storage subsystem
Transaction  X0, Y0         X0, Y0
Write X1     X1, Y0         X1, Y0
Write Y1     X1, Y1         Update only in memory
Figure 10-26. SAN Copy and file system cache AN153.0
Notes:
Use of copy services that are provided by SAN-attached storage subsystems is fairly common
and sometimes referred to as SAN Copy. These copy services make a point-in-time exact copy
of the contents of a LUN as seen by the storage subsystem controller. Not only can they provide
a point in time copy of a LUN, but this activity does not depend on any host system resources.
However, potential problems can result because the copy sees only the data as it exists in the
storage subsystem.
Normally, when an application writes data, it receives confirmation of the write when AIX caches
the data in memory. Later, various AIX mechanisms flush that data to disk storage. When a SAN
Copy is initiated, some of the transaction-related updates can still be in AIX kernel memory
rather than in the storage subsystem. The SAN Copy might then have inconsistent data, even if
the application was quiesced before taking the copy.
To avoid this problem, you need to ensure that none of the related data updates are cached in
AIX memory at the time of the SAN Copy. Unmounting the file system is generally not an
acceptable solution given the disruption to the application.
Use of JFS2 freeze and thaw

IBM Power Systems
1. JFS2 freeze stops processing application write requests and then flushes cached data to disk
# sync; chfs -a freeze=<timeout in sec> <FSname>
2. Use SAN Copy command for related LUNs
3. When SAN Copy completes, thawing the file system resumes processing of application write requests
# chfs -a freeze=off <FSname>
• JFS2 freeze is not needed if the quiesced application:

– Uses Direct I/O or Concurrent I/O for the concerned files
– Issues fsync() to flush the file data after quiescing
Figure 10-27. Use of JFS2 freeze and thaw AN153.0
Notes:
AIX provides a JFS2 file system freeze capability. It stops processing new file system I/O
requests and then flushes out all memory cached file system data to the physical volume.
After the application is quiesced and the file system is frozen, a SAN Copy captures
consistent data.
After the SAN Copy completes, you can then thaw the file system and resume application
processing.
This procedure is only needed when the application allows AIX to cache writes and to decide
when to flush the cached data. There are two situations where the freeze mode is not needed.
- The application processes the file by using Direct I/O (DIO). With DIO, writes are
synchronous and go directly to storage without any caching in kernel memory. Concurrent
I/O always uses DIO.
- The application calls the synchronous fsync() system call for its output files, forcing AIX
to flush all cached data for that file and returning to the application when that is completed.

The chfs freeze attribute requires a value that specifies a timeout period. If the file system is
not explicitly thawed (again using the chfs command) within that timeout period, the file system
is automatically thawed. This attribute is intended to avoid permanent file system freezes, and
the timeout should be set to a time period that is much longer than you expect your SAN Copy
to require.
The sync command is run immediately before the freeze request because, for large amounts
of cached data, the sync command is much more efficient in finding and flushing that data than
the freeze function. The freeze function then needs to handle only the data that was cached
after the sync, which should be a small amount of data.
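The freeze, copy, and thaw sequence can be wrapped so that the thaw is always attempted. The following is a dry-run sketch, not a production script: freeze_copy_thaw and run are hypothetical helper names, and SAN_COPY_CMD is a placeholder for whatever command your storage subsystem provides. With DRY_RUN=1 (the default here), the commands are only printed.

```shell
# DRY_RUN=1 prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
# Hypothetical placeholder for the storage vendor's copy command.
SAN_COPY_CMD=${SAN_COPY_CMD:-san_copy_placeholder}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: file system to freeze, $2: freeze timeout in seconds
freeze_copy_thaw() {
    fs=$1
    timeout=$2
    run sync                                   # flush most cached data first
    run chfs -a freeze="$timeout" "$fs" || return 1
    run "$SAN_COPY_CMD"
    rc=$?
    # Thaw explicitly, even if the copy failed, rather than waiting for
    # the timeout to expire.
    run chfs -a freeze=off "$fs"
    return $rc
}

freeze_copy_thaw /home/myfs 120
```

In dry-run mode, the call prints the four commands in order: sync, the chfs freeze request, the placeholder copy command, and the chfs thaw request.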
Consistency groups
IBM Power Systems
• When multiple LUNs contain related information, sequential
SAN Copy of those LUNs can result in inconsistencies
• The storage subsystem should define related LUNs in the
same consistency group
• SAN Copy ensures a point in time copy across all LUNs in the
consistency group
• A SAN Copy of a file system log that is not consistent with the
SAN Copy of the file system data results in metadata
corruption
• All file systems that use a given file system journal log must
be in same consistency group as the journal log
– JFS2 in-line logs, or
– Dedicated logs for each file system
Figure 10-28. Consistency groups AN153.0
Notes:
While the previously discussed techniques can ensure the consistency of a point-in-time copy of
a single LUN, when multiple LUNs are interrelated you are faced with new issues. Normally,
each LUN would be SAN Copied separately and each copy would be at a different point in time.
Because the copies reflect different points in time, related data can be inconsistent between
them.
When the storage subsystem defines LUNs as belonging to a common consistency group, the
entire consistency group is copied at the same point-in-time. This procedure ensures data
consistency.
Of special concern is the relationship between a file system and its journal log. If the file
system and its log are on different LUNs and you do not ensure consistency between them, then
you essentially have metadata corruption that can make that file system and log combination
unusable.
You can also have a problem if multiple file systems share a log and some of the file systems are
not included in the consistency group. Later access of the log can then be incompatible with the
state of those other file systems. Thus, file systems that are copied with SAN Copy should either
each have their own dedicated journal log or use JFS2 in-line journal logs.

If the LUN is one of many physical volumes in a volume group that is being backed up,
all of the LUNs in the volume group should be included in the same consistency group.
Accessing SAN Copy data

IBM Power Systems
• SAN Copy target LUN is an exact copy of original disk,
including VGDA, LVCB, VGID, and PVID
• If using for rootvg recovery, boot from the target LUN
• If accessing the entire user volume group on a different AIX
system:
– Import using the importvg command
– Run varyonvg, fsck and mount file systems
• If accessing a user volume group on the same system:
– Import using the recreatevg command
– Run fsck and mount file systems
Figure 10-29. Accessing SAN Copy data AN153.0
Notes:
SAN Copy creates exact duplicates of the physical volumes, rather than a backup image to be
restored. For an AIX system to access the disk, the disk needs to be discovered (zoned to that
host and detected, by way of cfgmgr, by that host) and then imported into the ODM.
If it is to act as the rootvg of that system, it must be designated as the boot device before
booting that host.
User volume groups can be accessed to either directly recover contents from the copy, or to
enable a backup utility to create a backup of the copied volume group. In either case, the PVID
on the disk (or disks) should be changed to avoid issues of duplicate PVIDs.
If accessing the entire volume group from a system that is different from the original system,
use the importvg command on any disk in the consistency group for the volume group. Then,
vary online, run a file system check, and mount the file systems of interest. To avoid possible
future PVID conflicts, you should consider changing the PVID on the disks after importvg is
completed. Changing the PVID can be accomplished by using the chdev command as follows:
# chdev -l hdisk# -a pv=clear
# chdev -l hdisk# -a pv=yes

When accessing from the same system (it is assumed that the original volume group still exists)
or accessing a subset of the physical volumes in the volume group, use the recreatevg
command. Then, do a file system check and a mount of the file systems that are of interest. The
recreatevg command has special abilities to selectively restore only the logical volumes that
are on the specified disks. The recreatevg command can automatically change the PVIDs.
The recreatevg command

IBM Power Systems
recreatevg [-f][-y vgname] [-L label prefix] [-Y LV prefix] PVs

-y Specifies the name to use for the volume group
-L Parent directory for new file system mount points
-Y Prefix for new logical volume names
-f Force creation even with missing physical volumes
• Handles conflicts with still active volume group
• recreatevg -y oldvg -L /old -Y XX hdisk5 hdisk6

– Original names: /dev/myfs /myfs
– New names: /dev/XXmyfs /old/myfs
Figure 10-30. The recreatevg command AN153.0
Notes:
The recreatevg command is specially designed to handle the import of volume group copies
to the same system from which they were copied.
One way in which it differs from just using importvg is the creation of a new VGID and new
PVIDs. Another major difference is that you can specify prefixes to be used when creating new
file system names and logical volume names, which avoids conflicts with the original names.
As seen in the visual, the -L option is used to create a prefix to the file system name, which
becomes a common parent directory to all of the file system mount points. The -Y option is
used to create a prefix for the logical volume names.
It is important that you specify all disks that belong to the volume group, as arguments to the
command, when trying to access the entire volume group.
Alternatively, you can specify only a subset of the disks. The recreatevg command then
re-creates only the logical volumes that are on those disks.
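The effect of the two prefixes can be previewed before running the command. This sketch only reproduces the naming pattern shown in the visual (original /dev/myfs and /myfs become /dev/XXmyfs and /old/myfs). The function name preview_recreatevg_names is hypothetical, and the real names are always assigned by recreatevg itself.

```shell
# Preview the names that result from the -Y and -L prefixes, following
# the example in the visual.
# $1: value given to -Y (logical volume prefix)
# $2: value given to -L (parent directory for mount points)
# $3: original logical volume name, for example myfs
# $4: original mount point, for example /myfs
preview_recreatevg_names() {
    lv_prefix=$1
    mnt_prefix=$2
    lv=$3
    mnt=$4
    echo "/dev/${lv_prefix}${lv} ${mnt_prefix}${mnt}"
}

preview_recreatevg_names XX /old myfs /myfs    # prints: /dev/XXmyfs /old/myfs
```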

Checkpoint
IBM Power Systems
1. True or False: The creation of a snapshot volume group marks all copies in the snapshot as stale.
2. True or False: The creation of a JFS split copy marks all of the split mirror copies as stale.
3. True or False: After the creation of a JFS split mirror copy, the administrator needs to mount the new file system to be able to access the split copy.
4. To access a SAN Copy of an active volume group on the source system, use the command:
a. joinvg
b. importvg
c. recreatevg
Notes:
Exercises: Advanced backup techniques

IBM Power Systems
• Use a snapshot volume group
• Use JFS split copy (optional)
• Use JFS2 snapshots
• Use a file system as a recovery source
Figure 10-32. Exercises: Advanced backup techniques AN153.0
Notes:

Unit summary
IBM Power Systems

Notes:
Unit 11. Diagnostics

This unit is an overview of diagnostics available in AIX.

After completing this unit, you should be able to:
• Use the diag command to diagnose hardware
• List the different diagnostic program modes

Accountability:
• Lab exercise
References
Online AIX Version 7.1 Understanding the Diagnostic Subsystem for AIX
Unit objectives
IBM Power Systems
After completing this appendix, you should be able to:

Notes:

When do you need diagnostics?

IBM Power Systems
[Diagram: diagnostics are available from bos.diag, from CD/DVD, or from a NIM master. Diagnostics are needed for a hardware error in the error log, a machine that does not boot, or strange system behavior.]
Figure 11-2. When do you need diagnostics? AN153.0
Notes:
Introduction
The lifetime of hardware is limited. Broken hardware leads to hardware errors in the error log, to
systems that will not boot, or to strange system behavior.
The diagnostic package helps you to analyze your system and discover hardware that is
broken. Additionally, the diagnostic package provides information to service representatives that
allows fast error analysis.
Sources for diagnostic programs

Diagnostics are available from different sources:
- A diagnostic package is shipped and is installed with your AIX operating system.
Diagnostics are packaged into separate software packages and filesets. The base
diagnostics support is contained in the package bos.diag. The individual device support is
packaged in separate devices.[type].[deviceid] packages.

The bos.diag package is split into the following filesets:

- bos.diag.rte contains the Controller and other base diagnostic code
- bos.diag.util contains the Service Aids and Tasks
- bos.diag.com contains the diagnostic libraries, kernel extensions, and
development header files
- bos.diag.ecc contains the inventory scout ECC client
- Diagnostic CDs are available for you to diagnose a system that does not have AIX installed.
Normally, the diagnostic CD is not shipped with the system.
- Diagnostic programs can be loaded from a NIM master. This master holds and maintains
different resources, for example, a diagnostic package. This package can be loaded through
the network to a NIM client that is used to diagnose the client machine.

Where do you run diagnostics?

IBM Power Systems
• No diag on virtual devices

• VIOS CLI: diagmenu or run AIX diag under oem_setup_env
[Diagram: a Virtual I/O Server owns the physical adapter and the physical storage (hdisk) and exports virtual target devices (VTD) through VSCSI server virtual adapters. Each client partition reaches its virtual SCSI disk through a VSCSI client virtual adapter, using the VSCSI protocol across the Hypervisor.]
Figure 11-3. Where do you run diagnostics? AN153.0
Notes:
Diagnostics are done on physical devices. It is fairly common to have logical partitions that see
only virtual devices: virtual Ethernet, virtual SCSI, virtual Fibre Channel. The diag utilities do
not diagnose virtual devices.
In a virtualized environment, the physical devices are allocated to the virtual I/O servers (VIOS).
If a client LPAR cannot access a device, the administrator needs to identify the VIOS providing
access and run the diagnostics at the VIOS.
The VIOS command-line interface (CLI) equivalent of the AIX diag command is the diagmenu
command. The alternative is to create a root AIX subshell with the oem_setup_env command
and run the AIX command in that shell.

The diag command

IBM Power Systems
[Diagram: diag reads the AIX error log (auto diagnose) and reports the test result.]
• diag allows testing of a device, if it is not busy
• diag allows analyzing the error log
Figure 11-4. The diag command AN153.0
Notes:
Overview of the diag command

Whenever you detect a hardware problem, for example, a communication adapter error in the
error log, use the diag command to diagnose the hardware.
The diag command can test a device, if the device is not busy. If any AIX process is using a
device, the diagnostic programs cannot test it; they must have exclusive use of the device to be
tested. Methods that are used to test devices that are busy are introduced later in this unit.
The diag command analyzes the error log to fully diagnose a problem if run in the correct
mode. It provides information that is useful for the service representative, for example Service
Request Numbers (SRN) or probable causes.
There is a relationship between the AIX error log and diagnostics. When the errpt command is
used to display an error log entry, diagnostic results that are related to that entry are also
displayed.

Working with diag (1 of 3)

IBM Power Systems
# diag
FUNCTION SELECTION 801002
Move cursor to selection, then press Enter.
Diagnostic Routines
This selection will test the machine hardware. Wrap plugs and
other advanced functions will not be used.
Advanced Diagnostics Routines
This selection will test the machine hardware. Wrap plugs and
other advanced functions will be used.
Task Selection (Diagnostics, Advanced Diagnostics, Service Aids, etc.)
This selection will list the tasks supported by these procedures.
Once a task is selected, a resource menu may be presented showing
all resources supported by the task.
Resource Selection
This selection will list the resources in the system that are supported
by these procedures. Once a resource is selected, a task menu will
be presented showing all tasks that can be run on the resource(s).
Figure 11-5. Working with diag (1 of 3) AN153.0
Notes:
Introduction to diag menus

The diag command is menu-driven, and offers different ways to test hardware devices or the
complete system. One method to test hardware devices with diag is:
Start the diag command. A welcome screen appears, which is not shown on the visual. After
pressing Enter, the FUNCTION SELECTION menu is shown.
If Diagnostic Routines or Advanced Diagnostics Routines is selected, then the Diagnostic
Mode Selection menu is displayed, to determine whether System Verification or Problem
Determination should be run.
If the Task Selection menu is selected, then the following happens:
- The Diagnostic Controller displays a list of Tasks that are available for the system.
- After a task is selected, a Resource Selection menu appears if the selected task supports
a resource selection. After selection of a resource, the task is called with the selected
resource name as a command-line argument.

- If the selected task does not support resource selection, then the task is started.
If the Resource Selection menu is selected, then the following happens:
- The Diagnostic Controller displays a list of resources available on the system.
- After a resource is selected, a Task Selection menu will appear containing the commonly
supported tasks for each selected resource. After selection of a task, the task is started.

# diag
Diagnostic Routines
other advanced functions will not be used.
...
DIAGNOSTIC MODE SELECTION 801003
System Verification
This selection will test the system, but will not analyze the error
log. Use this option to verify that the machine is functioning
correctly after completing a repair or an upgrade.
Problem Determination
This selection tests the system and analyzes the error log
if one is available. Use this option when a problem is
suspected on the machine.
Notes:
Working with Diagnostic Routines

Select Diagnostic Routines or Advanced Diagnostics Routines to test hardware devices.
The next menu is DIAGNOSTIC MODE SELECTION. Here you have two selections:
• System Verification tests the hardware without analyzing the error log. This option is
used after a repair to test the new component. If a part is replaced due to an error log
analysis, the service provider must log a repair action to reset error counters and
prevent the problem from being reported again. Running Advanced Diagnostics
Routines (in the FUNCTION SELECTION menu) in System Verification mode will log a
repair action.
• Problem Determination tests hardware components and analyzes the error log. Use this
selection when you suspect a problem on a machine. Do not use this selection after you
repair a device, unless you remove the error log entries of the broken device.

DIAGNOSTIC SELECTION 801006
From the list below, select any number of resources by moving the
cursor to the resource and pressing 'Enter'.
To cancel the selection, press 'Enter' again.
To list the supported tasks for the resource highlighted, press 'List'.
Once all selections have been made, press 'Commit'.

To avoid selecting a resource, press 'Previous Menu'.
All Resources
This selection will select all the resources currently displayed.
sysplanar0 System Planar
U7311.D20.107F67B-
sisscsia0 P1-C04 PCI-XDDR Dual Channel Ultra320 SCSI
Adapter
+ hdisk2 P1-C04-T2-L8-L0 16 Bit LVD SCSI Disk Drive (73400 MB)
hdisk3 P1-C04-T2-L9-L0 16 Bit LVD SCSI Disk Drive (73400 MB)
ses0 P1-C04-T2-L15-L0 SCSI Enclosure Services Device
L2cache0 L2 Cache
...
Notes:
Selecting a device to test

In the next diag menu, select the hardware devices that you want to test. If you want to test the
complete system, select All Resources. If you want to test selected devices, press Enter to
select any device, then press F7 to commit your actions. In the example, hdisk2 is selected.
If you press F4 (List), diag presents tasks the selected devices support, for example:
- Run diagnostics
- Run error log analysis
- Change hardware vital product data
- Display hardware vital product data
- Display resource attributes
To start diagnostics, press F7 (Commit).

What happens if a device is busy?
ADDITIONAL RESOURCES ARE REQUIRED FOR TESTING 801011
No trouble was found. However, the resource was not tested because
the device driver indicated that the resource was in use.
The resource needed is

- hdisk2 16 Bit LVD SCSI Disk Drive (73400 MB)
U7311.D20.107F67B-P1-C04-T2-L8-L0
To test this resource, you can do one of the following:

Free this resource and continue testing.
Shut down the system and reboot in Service mode.
Testing should stop.

The resource is now free and testing can continue.
Figure 11-8. What happens if a device is busy? AN153.0
Notes:
If the device is busy

If a device is busy, which means the device is in use, you cannot test the device or analyze the
error log.
The example in the visual shows that the disk drive was selected to test, but the resource was
not tested because the device was in use. To test the device, either free the resource or run
diagnostics in another diagnostic mode.

Diagnostic modes (1 of 3)
Concurrent mode:
• Execute diag during normal system operation
• Limited testing of components
# diag
Maintenance mode:
• Execute diag during single-user mode
• Extended testing of components
# shutdown -m
Password:
# diag
Figure 11-9. Diagnostic modes (1 of 3) AN153.0
Notes:
Diagnostic modes
Three different diagnostic modes are available:
- Concurrent mode
- Maintenance (single-user) mode
- Service (standalone) mode (covered on the next visual).
Concurrent mode
Concurrent mode provides a way to run online diagnostics on some of the system resources
while the system is running normal system activity. Certain devices can be tested, for example,
a tape device that is not in use, but the number of resources that can be tested is limited.
Devices that are in use cannot be tested.

Maintenance (single-user) mode

To expand the list of devices that can be tested, one method is to take down the system to
maintenance mode by using the command, shutdown -m.
Enter the root password when prompted, and run the diag command in the shell.
All programs, except the operating system itself, are stopped. All user volume groups are
inactive, which extends the number of devices that can be tested in this mode.

Service (standalone) mode using CD/DVD or hard drive
• Insert diagnostics CD/DVD, or ensure that drive is empty
• Shut down your system
# shutdown
• Boot system in service mode
– Start LPAR with boot mode of: Diagnostic with default bootlist
• diag is started automatically
Notes:
Standalone mode
But what do you do if your system does not boot or if you must test a system without AIX
installed on the system? In this case, you must use the standalone mode.
Standalone mode offers the greatest flexibility. You can test systems that do not boot or that
have no operating system installed (the latter requires a diagnostic CD/DVD).
Starting standalone diagnostics

Follow these steps to start diagnostics in standalone mode:
1. If you have a diagnostic CD/DVD, insert it into the system.
2. Shut down the AIX system. If this machine is a server in the manufacturing default
configuration (single partition) and you are not using an HMC, next power down the
server from the operator panel. If running in a partitioned environment, the
firmware should shut down the partition after AIX reaches a halt state.

3. Boot your AIX system. If in manufacturing default configuration, you can power on the
server from the operator panel. If in a partitioned system, you would use the HMC to
start the LPAR.
4. If starting a partition with the HMC, you would specify a boot mode of Diagnostic with
Default Bootlist. If using the manufacturing default configuration with an attached
console, see the paragraph on using the console keyboard to control the boot mode.
Either method boots the machine in service mode.
5. If the CD/DVD drive has a diagnostic CD/DVD mounted, the diagnostic program boots
from that device. If there is nothing in the CD/DVD drive, then it will boot off the hard
disk, running the diagnostic program on that disk.
6. Now, you can run one of the diagnostic routines.
Using keyboard to control boot mode

After the system discovers the keyboard (you hear a beep) and before the system uses a
particular bootlist, you can press a key to control the mode and bootlist.
On older pSeries models, the attached graphic console used function keys (such as F5 or F6)
to signal the wanted mode. In current models, the screen prompts you to use the matching
numeric keys (such as numeric 5 or 6) instead. The rest of the text in this unit uses the function
key terminology to refer to numeric 5 (F5) and numeric 6 (F6).
Both F5 and F6 cause the system to run a service mode boot.
F5 uses the system default (non-customizable) bootlist. It lists the diskette drive, CD drive, hard
disk, and network adapter (in that order).
F6 uses the customizable service bootlist, which can be set with the bootlist command,
SMS, or the diag utility.
If the first successfully bootable device in the selected bootlist (normal, F5 or F6) is a CD/DVD
drive with a diagnostic CD/DVD loaded, the system will boot into diagnostic mode.
If you are doing a service mode boot and the first successfully bootable device in the selected
bootlist is a hard disk, then the system will boot into diagnostic mode from that hard disk.
If the first successfully bootable device in the selected bootlist is installation media (AIX
installation CD/DVD or mksysb tape/CD/DVD), then the system will boot into Installation and
Maintenance mode.
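The rules above can be summarized as a small decision function. The following Python sketch is illustrative only: the function names and the labels used for keys and devices are hypothetical, and the outcome of a normal boot from hard disk ("normal boot") is an assumption that the text does not state explicitly.

```python
def bootlist_used(key):
    """Return the bootlist selected by the key pressed (None = no key)."""
    bootlists = {
        None: "normal bootlist",
        "F5": "system default (non-customizable) bootlist",
        "F6": "customizable service bootlist",
    }
    if key not in bootlists:
        raise ValueError("unknown key: %r" % (key,))
    return bootlists[key]


def boot_result(key, first_bootable):
    """Return the mode that the system boots into.

    key            -- None for a normal boot, "F5" or "F6" for a service boot
    first_bootable -- first successfully bootable device in the selected
                      bootlist: "diag_cd", "hard_disk", or "install_media"
    """
    bootlist_used(key)  # validates the key
    service_boot = key in ("F5", "F6")
    if first_bootable == "diag_cd":
        # Applies to the normal, F5, and F6 bootlists alike.
        return "diagnostics"
    if first_bootable == "install_media":
        return "installation and maintenance"
    if first_bootable == "hard_disk":
        if service_boot:
            return "diagnostics"
        return "normal boot"  # assumption: not spelled out in the text
    raise ValueError("unknown device: %r" % (first_bootable,))
```

For example, boot_result("F5", "hard_disk") gives "diagnostics", which corresponds to running the disk-based diagnostic program in service mode.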

Service (standalone) mode using NIM
• NIM Master: diag operation on machine and assign SPOT
• Shut down your system
# shutdown
• HMC: Boot LPAR to SMS and configure for network boot
• Network boot your LPAR
• diag is started automatically
Notes:
Using NIM to boot to standalone diagnostic mode

Assuming that the network adapter itself is not the problem, you can also boot to standalone
diagnostic mode with a network boot from a NIM server.
The NIM server must first be set up with a SPOT resource assigned to your machine object.
Then, you need to apply a NIM diag operation to the machine object for your machine to
prepare NIM to serve a diagnostics program rather than a mksysb or BOS filesets for
installation.
Next, you boot the machine to SMS, use SMS to set up the IP parameters (designating the NIM
server as the boot server) and then select the network adapter as the boot device.

diag: Using task selection
# diag

...
Task Selection (Diagnostics, Advanced Diagnostics, Service Aids, etc.)
This selection will list the tasks supported by these procedures.
Once a task is selected, a resource menu may be presented showing
all resources supported by the task.
...
• Run Diagnostics
• Run Error Log Analysis
• Run Exercisers
• Display or Change Diagnostic Run Time Options
• Add Resource to Resource List
• Automatic Error Log Analysis and Notification
• Backup and Restore Media
• Change Hardware Vital Product Data
• Configure Platform Processor Diagnostics
• Delete Resource from Resource List
• Disk Maintenance
• Display Configuration and Resource List
… and more
Figure 11-12. diag: Using task selection AN153.0
Notes:
Other tasks
The diag command offers a wide range of other hardware-related tasks. All these
tasks can be found after starting the diag main menu and selecting Task Selection.
The tasks that are offered depend on the hardware (resources) in the system. For example, if your system has a
service processor, there is a service processor maintenance task, which you do not find on
machines without a service processor. On some systems, you find tasks to maintain RAID and
SSA storage systems.

Example list of tasks

Following is a list of tasks available on a Power p750 running AIX 7.1:
- Run Diagnostics
- Run Error Log Analysis
- Run Exercisers
- Display or Change Diagnostic Run Time Options
- Add Resource to Resource List
- Automatic Error Log Analysis and Notification
- Backup and Restore Media
- Change Hardware Vital Product Data
- Configure Platform Processor Diagnostics
- Delete Resource from Resource List
- Disk Maintenance
- Display Configuration and Resource List
- Display Firmware Device Node Information
- Display Hardware Error Report
- Display Hardware Vital Product Data
- Display Multipath I/O (MPIO) Device Configuration
- Display Previous Diagnostic Results
- Display Resource Attributes
- Display Service Hints
- Display Software Product Data
- Display or Change Bootlist
- Gather System Information
- Hot Plug Task
- Log Repair Action
- Microcode Tasks
- RAID Array Manager
- Update Disk Based Diagnostics

Diagnostic log
# /usr/lpp/diagnostics/bin/diagrpt -r
ID DATE/TIME T RESOURCE_NAME DESCRIPTION
DC00 Mon Oct 08 16:13:06 I diag Diagnostic Session was started
DAE0 Mon Oct 08 16:10:38 N hdisk2 The device could not be tested
DA00 Mon Oct 08 16:05:11 N sysplanar0 No Trouble Found
DA00 Mon Oct 08 16:05:05 N sisscsia0 No Trouble Found
# /usr/lpp/diagnostics/bin/diagrpt -a
IDENTIFIER: DC00
Date/Time: Mon Oct 08 16:13:06
Sequence Number: 15
Event type: Informational Message
Resource Name: diag
Diag Session: 327726
Description: Diagnostic Session was started.
----------------------------------------------------------------------------
IDENTIFIER: DAE0
Date/Time: Mon Oct 08 16:10:38
Sequence Number: 14
Event type: Error Condition
Resource Name: hdisk2
Resource Description: 16 Bit LVD SCSI Disk Drive
Location: U7311.D20.107F67B-P1-C04-T2-L8-L0
Figure 11-13. Diagnostic log AN153.0
Notes:
Diagnostic log
When diagnostics are run in online or single user mode, the information is stored into a
diagnostic log. The binary file is called /var/adm/ras/diag_log. The command,
/usr/lpp/diagnostics/bin/diagrpt, is used to read the content of this file.
Report fields
The ID column identifies the event that was logged. In the example in the visual, DC00, DAE0, and
DA00 are shown. DC00 indicates that the diagnostic session was started, DAE0 indicates that the
device could not be tested, and DA00 indicates No Trouble Found (NTF).
The T column indicates the type of entry in the log. I is for informational messages. N is for No
Trouble Found. S shows the Service Request Number (SRN) for the error that was found. E is
for an Error Condition.
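As an illustration of the report fields, the following Python sketch splits the short-form lines from the visual into fields and translates the T column. It is a sketch under assumptions: the function name is hypothetical, and the parsing assumes that the columns are separated by white space exactly as they appear in the visual, which might not hold for all diagrpt output.

```python
# Meaning of the T column, as described in the notes above.
EVENT_TYPES = {
    "I": "Informational message",
    "N": "No Trouble Found",
    "S": "Service Request Number (SRN)",
    "E": "Error Condition",
}


def parse_diagrpt_line(line):
    """Split one short-form report line into its fields."""
    # Expected order: ID, weekday, month, day, time, T, resource, description
    fields = line.split(None, 7)
    if len(fields) < 8:
        raise ValueError("unexpected line format: %r" % (line,))
    ident, weekday, month, day, time, type_code, resource, description = fields
    return {
        "id": ident,
        "timestamp": " ".join((weekday, month, day, time)),
        "type": type_code,
        "type_meaning": EVENT_TYPES.get(type_code, "unknown"),
        "resource": resource,
        "description": description,
    }


# Sample lines copied from the visual.
SAMPLE = """\
DC00 Mon Oct 08 16:13:06 I diag Diagnostic Session was started
DAE0 Mon Oct 08 16:10:38 N hdisk2 The device could not be tested
DA00 Mon Oct 08 16:05:11 N sysplanar0 No Trouble Found
DA00 Mon Oct 08 16:05:05 N sisscsia0 No Trouble Found
"""

entries = [parse_diagrpt_line(line) for line in SAMPLE.splitlines()]
```

Each element of entries is a dictionary, so a later step could, for example, select only the entries for a given resource name.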

Checkpoint
1. What diagnostic modes are available?

a. Concurrent
b. Maintenance
c. Service (standalone)
d. All of the above
2. How can you diagnose a communication adapter that is
used during normal system operation?
Notes:

Exercise: Diagnostics
• Run diagnostics in multi-user mode
• Run diagnostics in single user mode
• Run diagnostics in service mode from a disk
• Boot to diagnostics using an external boot image
(NIM server)
Figure 11-15. Exercise: Diagnostics AN153.0
Notes:

Unit summary
Notes:
Unit 12. The AIX system dump facility

This unit explains how to maintain the AIX system dump facility and how to
obtain a system dump.

• Determine the system dump configuration of a system
• Describe the traditional dump mechanism
• Describe the firmware assisted dump mechanism
• Determine the status of the last system dump
• Access the dump image
• Package a system dump and other information with the snap command

Accountability:
• Lab exercise
References
Online AIX Version 7.1 Kernel Extensions and Device Support
Programming Concepts (Chapter 16. Debug Facilities)
(section on System Startup)
Unit objectives
• Package a system dump and other information with the snap
command
Notes:
Importance of this unit

If the AIX kernel crashes, routines that are used to create a system dump are started. This
dump can be used to analyze the cause of the system crash.
As an administrator, you need to know what a dump is, how the AIX dump facility is maintained,
and how a dump can be obtained.
You also need to know how to use the snap command to package the dump before sending it to
IBM.

System dumps
• What is a system dump?
• What is a system dump used for?
Figure 12-2. System dumps AN153.0
Notes:
What is a system dump?

A system dump is a snapshot of the operating system state at the time of a crash or a manually
initiated dump. When a manually initiated or unexpected system halt occurs, the system dump
facility automatically copies selected areas of kernel data to the primary (or secondary) dump
device. These areas include kernel memory and other areas that are registered in a structure
that is called the master dump table by kernel modules or kernel extensions.
What is a system dump used for?

The system dump facility provides a mechanism to capture sufficient information about the AIX
kernel for later expert analysis. After the preserved image is written to disk, the system is
rebooted and returned to production. The dump is then typically submitted to IBM for analysis.
Types of dumps
• Traditional:
– AIX generates dump before halt
• Firmware assisted (fw-assist):
– POWER firmware generates dump in parallel with AIX halt process
– Defaults to same scope of memory as traditional
– Can request a full system dump
• Live dump facility:
– Selective dump of registered components without need for a system
restart
– Can be initiated by software or by operator
– Controlled by livedumpstart and dumpctrl
– Written to a file system rather than a dump device
Figure 12-3. Types of dumps AN153.0
Notes:
Overview
In addition to the traditional dump function, AIX 6 introduced two new types of dumps, firmware
assisted dumps and the live dump facility.
Traditional dumps
Traditionally, AIX alone handled system dump generation and the only way to get a dump was
to halt the system either due to a crash or through operator request. In a logical partition, AIX
dumps only the memory that is allocated to that partition.
Firmware assisted dumps (fw-assist)

With AIX 6.1 (or later) and POWER6 (or later) hardware, you can configure the dump facility to
have the firmware of the hardware platform handle the dump generation. The main advantage
of this is that the operating system can start its reboot while the firmware handles the dumping
of the memory contents.
In its default mode, it captures the same scope of memory as the traditional dump, but it can be
configured for a full memory dump.
If for some reason (such as memory restrictions), a configured or requested firmware assisted
dump is not possible, then the traditional dump facility is started.
Live dump facility

AIX 6.1 also introduced a new live dump capability. If a system component is designed to use
this facility, a restricted scope dump of the related memory can be captured without the need to
halt the system.
If an individual component is having problems (such as being hung), a livedumpstart
command can be run to dump the needed diagnostic information.
The management of live dumps (such as enabling a component or controlling the dump
directory) is handled with the dumpctrl command.
The use and management of live dumps require a knowledge of system components, which is
beyond the scope of this class. Use these commands under the direction of the AIX Support
Line personnel.
Traditional system dump
• The dump routine is started when the system encounters a fatal error
– Can also be started manually
[Diagram: a selective copy of kernel code and data, kernel extension code and data, and partial
data from running applications is written to the primary dump device or the secondary dump device.]
Figure 12-4. Traditional system dump AN153.0
Notes:
Contents of an AIX system dump

Typically, an AIX system dump includes all of the information that is needed to determine the
nature of the problem. The dump contains:
- Operating system (kernel) code and data
- Some data from the current running application on each CPU
- Most of the kernel extension code and data
The dump facility cannot cause a page fault, so only what is in physical memory can be included
in the dump image. Normally the memory that is excluded is not a problem since most of the
kernel data structures are in memory. The process and thread tables are pinned, and the
uthread and ublock structures of the running threads are pinned as well. The traditional dump
image is not a copy of the entire contents of physical memory at the time of the crash, but rather
a selective copy.

Traditional dump actions
• Ignore all maskable interrupts on the current processor

• Tell other processors that the dump routine has been started
• Stop error logging and trace event recording
• Display 0c9 or 0c2 on virtual operator panel
• Process the master dump table to gather data
– Write data to the primary dump device, using a parallel compression
algorithm if enough processors are available
– If a failure occurs and a secondary device is configured, write data to
the secondary device using a serial compression algorithm
• Write dump information to NVRAM
• Return to the calling function with a dump status code
– Calling function can initiate a reboot
Figure 12-5. Traditional dump actions AN153.0
Notes:
Traditional dump sequence

The sequence on the visual shows the traditional dump sequence that is used by AIX V5.3
TL05 and newer versions.
The dump routine is started on a single logical CPU, and informs the other CPUs in the AIX
instance of the dump. To speed up the dump sequence, the algorithm uses other available
logical CPUs to assist with tasks in the overall dump sequence. The faulting CPU is used to
gather the data to be included in the dump, while other CPUs are used to compress the data,
and write it out to the dump device. A minimum of three processors must be configured and
available in the AIX instance for parallel dump compression to be used. Up to eight processors
are used for data compression. If the AIX instance has only one or two logical processors
available, then a serial dump compression algorithm is used.
The first step is to disable all maskable interrupts. The main implication of this is that the dump
routine can access only code and data that is resident in physical memory, since page faults
can no longer be resolved.
The second step is to inform the other processors on the system about the dump so that a
consistent snapshot of memory can be written to the dump image.
The dump routine then disables error logging, which allows the most recent error that was
recorded in NVRAM to be preserved instead of being overwritten. The most recent error is likely
more indicative of why the dump routine started.
The routine then arranges for a value to be shown on the operator panel that indicates whether
the dump was initiated by the system or started manually by the system administrator.
After these initial steps are taken, the dump routine then proceeds with the main task, which is
processing the master dump table to determine which areas of memory should be written to the
dump image.
If a failure occurs while writing data to the primary dump device, and a secondary dump device
is defined, then the dump routine fails over to the secondary dump device. The dump restarts
processing of the master dump table in an attempt to write a complete dump image to the
secondary device. When a failover to the secondary device occurs, the dump sequence uses a
serial dump algorithm instead of the parallel algorithm.
When the dump routine finishes (successfully or not), it returns to the calling function, which in
most cases automatically reboots the operating system (the default setting).
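The choice between the parallel and serial compression algorithms described in these notes can be expressed as a short rule. The following Python sketch is illustrative only; the function and parameter names are hypothetical and it models only the decision, not the dump itself.

```python
def dump_compression(logical_cpus, writing_to_secondary=False):
    """Return which compression algorithm the traditional dump uses.

    logical_cpus         -- logical processors configured and available
                            in the AIX instance
    writing_to_secondary -- True after a failover to the secondary device
    """
    if logical_cpus < 1:
        raise ValueError("at least one logical CPU is required")
    if writing_to_secondary:
        # A failover to the secondary device always uses the serial algorithm.
        return "serial"
    # Parallel compression needs a minimum of three processors; per the
    # notes, up to eight processors are then used for data compression.
    return "parallel" if logical_cpus >= 3 else "serial"
```

For example, an instance with two logical processors always gets serial compression, and an instance with sixteen gets parallel compression on the primary device but serial compression after a failover.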

The dump device
• Two dump devices can be configured: primary and secondary

• The default primary dump device configuration depends on
the amount of memory in the partition at installation time
– If platform supports firmware assisted dump, then a dedicated primary
dump device is created regardless of the amount of memory
• By default, the secondary dump device is not configured
– The special device /dev/sysdumpnull
Partition memory amount    Default primary dump device    Size
< 4 GB                     /dev/hd6                       512 MB
>= 4 GB and < 12 GB        /dev/lg_dumplv                 1 GB
>= 48 GB                   /dev/lg_dumplv                 4 GB
Figure 12-6. The dump device AN153.0
Notes:
Primary dump device

The default primary dump device that is configured depends on the amount of memory in the
partition when AIX is installed. An AIX instance with less than 4 GB of memory has the default
paging space (/dev/hd6) configured as the primary dump device. A system with 4 GB of
memory or more has a dedicated logical volume (/dev/lg_dumplv) configured as the primary
dump device. The size of the logical volume also depends on the amount of memory in the
partition.
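The selection of the default primary dump device can be sketched as a lookup on the partition memory. The following Python sketch is illustrative only: the function name is hypothetical, it ignores the firmware assisted dump case mentioned on the visual, and it covers only the rows that appear in the table as reproduced here. For memory sizes from 12 GB up to 48 GB the table shows no row, so the sketch returns no size for that range rather than guessing one.

```python
def default_primary_dump_device(memory_gb):
    """Return (device, size) for the default primary dump device.

    size is None where the table in this unit lists no value.
    """
    if memory_gb <= 0:
        raise ValueError("memory_gb must be positive")
    if memory_gb < 4:
        # Small partitions dump to the default paging space.
        return ("/dev/hd6", "512 MB")
    if memory_gb < 12:
        return ("/dev/lg_dumplv", "1 GB")
    if memory_gb >= 48:
        return ("/dev/lg_dumplv", "4 GB")
    # 12 GB up to 48 GB: a dedicated logical volume is used (4 GB or more),
    # but its size is not shown in the table here.
    return ("/dev/lg_dumplv", None)
```

On a running system, the configured devices should be checked with the system dump commands covered later in this unit rather than inferred from the memory size.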
Secondary dump device

By default, no secondary dump device is configured, which is indicated by using the special
device /dev/sysdumpnull. If a system has sufficient disk space available, you should
configure a secondary dump device.
To improve the chances of getting a dump, the secondary device should be on as different a
hardware path from the primary dump device as possible, for example, different physical
devices and adapters. A logical volume on the same physical disk as the primary would be
worthless. It would be better to increase the size of the primary.

Dump device types
• What can be configured as a dump device?

– Paging space
– Logical volume
– Tape drive
– DVD-RAM drive
• Considerations:
– If using paging space:
• Use only /dev/hd6 for primary dump device
• Secondary device can be any paging space in rootvg
– If using logical volumes:
• Primary dump device must be in rootvg
• Secondary device can be in any volume group
– Mirrored paging space can be used
– Dump to DVD-RAM or tape does not span multiple volumes
Figure 12-7. Dump device types AN153.0
Notes:
What can be used as a dump device?

A paging space device can be used as a dump device. When the system crashes, the system
does not need the data in the paging space since the system is not going to continue running
and does not need to resolve page faults. Therefore, a paging space logical volume can be
used to hold the dump data. The dump data must be saved before the paging space can be
used again during the next reboot. When the system recognizes that a dump has been written, the
data is copied to a file in the copy directory (/var/adm/ras by default) before bringing the
paging space online.
A dedicated logical volume can be used as the dump device. The logical volume can be
mirrored if required.
Any supported tape drive can be configured as a dump device. However, remember that (at
least in theory) a system crash might occur at any time, and the dump routine expects to be
able to write to the dump device. If the tape drive has no media, or the media in the tape drive is
read-only, then the dump routine has a problem. AIX V5.3 added support for configuring a
DVD-RAM drive as a dump device. Use of a DVD-RAM drive has the same issues as a tape
drive. That is, the dump routine expects to find writable media in the dump device, so there are
problems if no media is in the drive, or the media is read-only.
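The placement considerations listed on the visual can be checked mechanically. The following Python sketch is illustrative only: the function name, the argument names, and the labels used for device kinds are hypothetical, and it encodes only the rules stated on the visual.

```python
def dump_device_problems(role, kind, name=None, volume_group=None):
    """Return a list of rule violations for a proposed dump device.

    role -- "primary" or "secondary"
    kind -- "paging", "lv", "tape", or "dvdram"
    """
    if role not in ("primary", "secondary"):
        raise ValueError("unknown role: %r" % (role,))
    problems = []
    if kind == "paging":
        if role == "primary" and name != "/dev/hd6":
            problems.append("primary paging space dump device must be /dev/hd6")
        if role == "secondary" and volume_group != "rootvg":
            problems.append("secondary paging space dump device must be in rootvg")
    elif kind == "lv":
        if role == "primary" and volume_group != "rootvg":
            problems.append("primary logical volume dump device must be in rootvg")
        # A secondary logical volume can be in any volume group.
    elif kind in ("tape", "dvdram"):
        # No placement rule; writable media must be loaded, and the dump
        # does not span multiple volumes.
        pass
    else:
        raise ValueError("unknown device kind: %r" % (kind,))
    return problems
```

An empty list means that none of the listed considerations is violated; it does not guarantee that the device is large enough to hold a dump.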

Firmware assisted dump (1 of 3)
• The traditional dump method relies on the operating system to
write the data to the dump device
– However, normally the dump routine is invoked because the OS has a
problem
• Firmware assisted dump allows writing of the dump data to be
deferred until the OS is booting
• Introduced with AIX 6.1 TL6 when running on POWER6 or
newer hardware
– Only uses primary dump device, which must be a dedicated logical
volume in rootvg
– Partition must meet minimum memory requirement, dependent on the
number of logical CPUs
• Firmware assisted dump is the default in AIX 7.1 when the
platform supports it
Figure 12-8. Firmware assisted dump (1 of 3) AN153.0
Notes:
Why use a firmware assisted dump?

One potential issue with the traditional dump method is that it relies on the failing operating
system to still be working enough for the dump data to be written out to the dump device.
Depending on the nature of the problem, the operating system might not be able to write data to
the dump device.
To get around this problem, the firmware assisted dump feature is available when using
POWER6 or newer processor-based hardware. The firmware assisted dump allows the writing
of dump data to be deferred, and handled by the booting operating system.
AIX 7.1 is configured to use firmware assisted dump as the preferred method when running on
POWER6 or newer hardware. AIX 6.1 continues to use the traditional dump mechanism by
default, even when running on hardware that supports firmware assisted dump.
• Firmware is used to freeze the partition memory

– Allows writing of the dump data to be deferred
• The dump data is written by the booting operating system
– In other words, an instance that has not failed
• Once data in a logical memory block (LMB) has been written
to the dump device, the LMB is released back to the booting
OS for normal use
• Main objective of firmware assisted dump is to minimize the
work that is done by the failing OS instance
– Secondary benefit is a performance improvement in overall recovery
time
Notes:
Objectives of a firmware assisted dump

When a firmware assisted dump is initiated, the failing operating system determines the data
that needs to be included in the dump, and then requests that the platform firmware freeze this
area of physical memory. Writing of the dump data is deferred until the operating system is
booted again.
The main objective of the firmware assisted dump feature is to minimize the amount of work that
is done by the failing operating system. A secondary benefit of the feature is that the overall
recovery time can be reduced, since the dump data can be written to the dump device in parallel
with the normal boot of the operating system.

• Two types of firmware assisted dump are supported:

– Selective memory dump: Similar to traditional dump, where the dump
tables are used to determine the data to be dumped
– Full memory dump: The entire contents of the LPAR’s physical
memory are dumped, no interaction with AIX required
• Both modes of firmware assisted dump write whole pages to
the dump device
– Traditional system dump writes data byte ranges, not pages
– As a result, the dump image format is different from that used by
traditional dump
Notes:
Selective memory dump

A selective memory dump is the default configuration when firmware assisted dump is
configured. In this case, the dump routines process the master dump table to determine which
areas of memory to include in the dump.
Full memory dump

A full memory dump can be configured, which bypasses the processing of the master dump
table, and instead arranges for the full contents of physical memory of the partition to be written
to the dump device.
The data that is eventually written to the dump device is a sequence of whole pages that contain
the data to dump. So, the resulting dump image does not correspond directly to the dump data
areas that were initially listed in the master dump table.
Firmware assisted dump algorithm (1 of 2)
• Ignore all maskable interrupts

• Stop the other CPUs in the partition
• Display 00cb on the virtual operator panel
• If selective memory dump is configured, process the master
dump table to determine which logical memory blocks contain
data to be preserved and which do not
– Initial assumption is that all LMBs need to be preserved
• Invoke POWER Hypervisor (PHYP) to logically power off
partition but preserve contents of LMBs
• PHYP copies information to an area of memory that indicates
to the booting OS that firmware assisted dump is in progress
Figure 12-11. Firmware assisted dump algorithm (1 of 2) AN153.0
Notes:
Dump sequence
The dump sequence for a firmware assisted dump is different from the dump sequence of a
traditional dump.
When the dump routine is started, as with the traditional dump sequence, it ignores all
maskable interrupts, stops the other CPUs in the AIX instance, and displays a value on the virtual
operator panel.
When the operating system is first booted and the firmware assisted dump mechanism is
configured, an assumption is made that a full memory dump is used.
If a selective memory dump is configured, the dump routine processes the master dump table
(MDT) to determine the memory blocks being used for the data to include in the dump. This
information is used to update the table in physical memory that initially assumed that a full
memory dump would take place.

The dump routine then invokes the POWER Hypervisor (PHYP) to freeze the partition memory
and reboot the partition. PHYP copies data to a reserved area of memory to indicate to the AIX
bootloader (SoftROS) that a firmware assisted dump is in progress.
Firmware assisted dump algorithm (2 of 2)

• PHYP boots the operating system in the LPAR
• The AIX bootloader (SoftROS) notices that a firmware
assisted dump is in progress
• SoftROS starts to write dump data to the dump device,
freeing each LMB when all the relevant data has been written
– Also releases any LMBs that were marked by the dump routine as not
required for a selective memory dump
• Once sufficient LMBs have been freed, SoftROS starts AIX and
passes control to it
• AIX handles writing the remaining dump data to the dump
device (and releasing the LMBs for normal usage) in parallel
with the normal boot sequence
Figure 12-12. Firmware assisted dump algorithm (2 of 2) AN153.0
Notes:
Dump sequence (continued)

When PHYP initiates the boot sequence, the AIX bootloader, otherwise known as SoftROS,
notes that a firmware assisted dump is in progress. Then, SoftROS starts copying data from
memory to the dump device. As memory is copied, it is unfrozen, and marked as available.
When sufficient memory is freed, SoftROS starts the full AIX operating system. AIX continues to
write the remaining data to the dump device in parallel with the normal boot tasks by unfreezing
the memory and marking it as available when the data is written out.
The result is that the dump image is written to the dump device by a fresh instance of the
operating system, rather than by the failing instance as in the case of a traditional dump.

The sysdumpdev command

• The sysdumpdev command is used for many dump-related
purposes:
– List the current dump settings
– Set the primary and secondary dump devices
– Enable and disable the always allow dump setting
– Set the copy directory location and policy
– Estimate the size of the dump
– Show system dump image information
– Change the dump type from traditional to fw-assisted (for AIX 6 and
newer when running on POWER6 and newer hardware)
• Configuration is stored in the SWservAt ODM object class
Figure 12-13. The sysdumpdev command AN153.0
Notes:
Introduction
Many aspects of dump device configuration and monitoring are accomplished by using the
sysdumpdev command. The command is used to do the tasks that are listed on the visual.
The system dump configuration settings are stored in the SWservAt ODM object class.
List current dump settings (1 of 2)

• The -l flag of sysdumpdev lists the current dump settings
# sysdumpdev -l
primary              /dev/lg_dumplv
secondary            /dev/sysdumpnull
copy directory       /var/adm/ras
forced copy flag     TRUE
always allow dump    FALSE
dump compression     ON
type of dump         traditional
Figure 12-14. List current dump settings (1 of 2) AN153.0
Notes:
Overview
The -l flag of the sysdumpdev command lists the current dump configuration. The command
lists the current primary and secondary dump device settings. These settings are used until the
system reboots, or the sysdumpdev command is started again to change the configuration of
the primary or secondary devices.
The copy directory and forced copy flag settings are only relevant if a paging space
device is used as one of the dump devices. These settings are covered a little later in this unit.
The last line of output that is shown on the visual, indicating the type of dump, is displayed
when running AIX 6 or newer.
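The same settings can be read from a script. The following is a minimal sketch, assuming the output layout shown on the visual (a label at the start of the line and the value as the last field). The function name dump_setting is hypothetical and is not part of AIX.

```shell
# dump_setting LABEL
# Reads `sysdumpdev -l` style output on stdin and prints the value for the
# given label, for example "primary" or "type of dump".
# Assumption: the value is the last whitespace-separated field on the line.
dump_setting() {
    awk -v label="$1" '
        { sub(/^[ \t]+/, "") }            # ignore any leading whitespace
        index($0, label) == 1 {           # line begins with the label
            print $NF
            exit
        }'
}

# Typical use on an AIX system:
#   sysdumpdev -l | dump_setting "type of dump"
```

Matching on the start of the line keeps "primary" from also matching the "secondary" line.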

List current dump settings (2 of 2)

• The type of dump value will either be traditional or
fw-assisted
– If the value is fw-assisted, then the value of the full memory dump
attribute indicates if a full or partial memory dump is configured
# sysdumpdev -l
primary              /dev/lg_dumplv
secondary            /dev/sysdumpnull
copy directory       /var/adm/ras
forced copy flag     TRUE
always allow dump    FALSE
dump compression     ON
type of dump         fw-assisted
full memory dump     disallow
Figure 12-15. List current dump settings (2 of 2) AN153.0
Notes:
Full memory dump display

If AIX was configured for firmware assisted dump, another line that shows the dump mode is
visible in the sysdumpdev output.
Configuring the dump type

• The -t flag of sysdumpdev specifies the type of dump
– Allowable values are traditional or fw-assisted
– Firmware assisted dump is supported by AIX 6.1 or newer when
running on POWER6 or newer hardware
• A reboot is required when changing from traditional to
fw-assisted
– Not required when changing to traditional
• The -f flag of sysdumpdev specifies the mode of firmware
assisted dump
– Allowable values are:
• disallow - Full memory dump is not permitted (in other words, selective copy
is performed)
• allow - Full memory dump is only performed if the operating system cannot
handle the dump request
• require - Full memory dump will always be performed
Figure 12-16. Configuring the dump type AN153.0
Notes:
Dump type
The dump type is selected by using the -t flag of the sysdumpdev command. Allowable values
are traditional and fw-assisted. When changing from traditional to fw-assisted,
a reboot is required for the change to take place. The reboot is required because the firmware
assisted dump facility must reserve an area of memory for use in communication between
PHYP and SoftROS during system reboot when a firmware assisted dump is in progress.
Changing from fw-assisted to traditional does not require a reboot, as the existing
reserved memory can be released.
Firmware assisted dump modes

The mode of firmware assisted dump is selected by using the -f flag of the sysdumpdev
command. Allowable values are disallow, allow, and require.
The disallow value specifies that the full memory system dump mode is not allowed; in other
words, the firmware assisted dump uses the selective memory mode.
The allow value specifies that the full memory system dump mode is allowed but is used only
when the operating system cannot properly handle the dump request.
The require value specifies that the full memory system dump mode is allowed and is always
used.
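As a quick reference, the three modes can be summarized in a small helper. This is a sketch only: the function name fad_mode_meaning is hypothetical, and the descriptions restate the notes above. The commented commands show how the -t and -f flags described in this topic would typically be combined.

```shell
# fad_mode_meaning MODE
# Prints a one-line description of a full memory dump mode accepted by
# `sysdumpdev -f`. Returns 1 for any other value.
fad_mode_meaning() {
    case "$1" in
        disallow) echo "selective memory dump only" ;;
        allow)    echo "full memory dump only if the OS cannot handle the dump request" ;;
        require)  echo "full memory dump always" ;;
        *)        echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Switching a partition to firmware assisted dump (run as root on AIX):
#   sysdumpdev -t fw-assisted    # takes effect after the next reboot
#   sysdumpdev -f disallow       # keep the selective memory dump
#   sysdumpdev -l                # check "type of dump" and "full memory dump"
```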
Set the dump devices (1 of 2)

• Two primary and two secondary devices are stored in the
ODM
• The tprimary and tsecondary ODM object values are used
when AIX boots
– Configured from /sbin/rc.boot by sysdumpdev -q
• The primary and secondary ODM objects reflect the current
settings
– What AIX uses from now on until the value is changed or the OS is
shut down
• The -p flag of sysdumpdev sets the primary ODM object
• The -s flag of sysdumpdev sets the secondary ODM object
• Adding the -P flag also changes the tprimary or tsecondary
ODM object
Figure 12-17. Set the dump devices (1 of 2) AN153.0
Notes:
ODM objects for dump devices

The SWservAt ODM object class that is used to store the dump configuration of the system has
two objects that are related to the primary dump device, and two objects for the secondary
device.
The ODM objects that are called tprimary and tsecondary contain the configuration of the
primary and secondary devices that are used when the system boots. The primary and secondary
dump devices are set during the boot sequence by the /sbin/rc.boot script, when it runs the
command sysdumpdev -q.
The primary and secondary ODM objects reflect the current settings of the primary and
secondary dump devices. For most systems, the value of the tprimary object is the same as the
value of the primary object; and the value of the tsecondary object is the same as the value of
the secondary object.
The ODM objects for each device can be changed individually by using the
sysdumpdev command. Using the -p flag allows the primary ODM object to be changed. Using
the -s flag allows the secondary object to be changed. Using the -P flag in addition to either -p
or -s changes the tprimary or tsecondary object in addition to the primary or secondary ODM
object. This value changes the device that the system uses at reboot.
Set the dump devices (2 of 2)

• The sysdumpdev -l command shows the current device
information
– Since this is what will be used if the system crashes
– Normally the current device is the same as the device used at reboot
• One use of this facility is to allow the configuration of a
primary or secondary device that is not in rootvg
• For example:
– Primary dump device is configured as /dev/hd6 from
/sbin/rc.boot during boot sequence - only rootvg is active at this
stage
– Primary dump device is then configured as /dev/lv_nonrootvg by
the script that is called from inittab later in boot once other volume
groups are active
Figure 12-18. Set the dump devices (2 of 2) AN153.0
Notes:
Multiple ODM objects for a dump device

The output of the sysdumpdev -l command shows the current dump configuration: the devices
and settings that are used if the system crashes right now. For most systems, the tprimary
object has the same value as the primary object, and the same is true for the secondary device
information.
One reason multiple ODM objects are used to configure each dump device is that it allows for
more flexible configurations, such as a primary dump device that is not in the
rootvg volume group. This scenario can be useful on systems that have only small amounts of
internal storage, but a large amount of memory that requires a large dump device to be
configured. For example, a system might be configured to have the default paging space
/dev/hd6 as the primary dump device. This device would be configured as the primary dump
device during the boot sequence, and would be used if there was a crash at that point. Later in
the boot sequence (after the other volume groups are varied on), a script is called from inittab.
This script configures a larger dedicated logical volume (/dev/lv_nonrootvg) that is in
another volume group to be used as the primary dump device. However, since this logical
volume is not in the rootvg volume group, it cannot be configured as the permanent primary
dump device. It is used as the primary dump device from when it is configured until the system
crashes, reboots, or the sysdumpdev command is run again to change the primary dump
device setting.
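The script that is called from inittab could look something like the following sketch. The function name set_late_dump_device and the SYSDUMPDEV override variable are illustrative assumptions; only the sysdumpdev -p command itself comes from this topic. The -P flag is deliberately not used, so the boot-time (tprimary) setting remains the rootvg device.

```shell
# Command to run. It can be overridden so that the logic can be tried on a
# system without AIX (an assumption made for this sketch).
SYSDUMPDEV=${SYSDUMPDEV:-sysdumpdev}

# set_late_dump_device DEVICE
# Makes DEVICE the primary dump device for the running system only.
# If the device node does not exist (for example, because its volume group
# was not varied on), the current primary dump device is left unchanged.
set_late_dump_device() {
    lv=$1
    if [ ! -e "$lv" ]; then
        echo "$lv not available; keeping current primary dump device" >&2
        return 1
    fi
    "$SYSDUMPDEV" -p "$lv"
}

# Called from the inittab script once the other volume groups are active:
#   set_late_dump_device /dev/lv_nonrootvg
```

Checking for the device first means that a failed varyon leaves the rootvg dump device in place rather than producing an error at every boot.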
Copy directory location and copy policy (1 of 2)

• A dump that is written to a paging space device is copied to
the copy directory during the boot sequence
– Otherwise, the dump will be overwritten when paging space is enabled
• /sbin/rc.boot performs an explicit mount of /var, and uses
the copycore command to copy the dump image
• The default copy directory location is /var/adm/ras
• If you change the copy directory:
– The new location must be a directory on a file system in rootvg, since
only rootvg is active when the dump is copied
– The copycore command takes care of mounting the file system if it is
something other than /var
Figure 12-19. Copy directory location and copy policy (1 of 2) AN153.0
Notes:
Copy directory
The copy directory location is only relevant when a paging space device is used as a dump
device.
When the system reboots after a crash, if the dump was written to a paging space device it must
be copied somewhere before the paging space can be activated. The copy directory is the
location where the dump is copied. By default, it is set to /var/adm/ras.
When a dump in a paging space device is detected at system boot time, only the root volume
group is active and no file systems are mounted. The /sbin/rc.boot script does an explicit
mount of the /var file system before running the copycore command to copy the dump from
the paging space into the copy directory. After the dump is copied, the /var file system is
unmounted before the script continues.
If you change the location of the copy directory, it must be to another location that is contained
within the rootvg volume group. rootvg is the only active volume group at the stage of the boot
sequence when the copy must be done.

Copy directory location and copy policy (2 of 2)

• The copy directory location is changed using the
-D or -d flag of sysdumpdev
– For example: sysdumpdev -D /var/adm/sysdumps
– Each flag specifies a different policy for dealing with failure situations
• The flag that is used to set the copy directory location
determines the value for the forced copy flag field that is
shown in the output of sysdumpdev -l
• If the -D flag is used to set the directory location, then the
forced copy flag value is TRUE
• If the -d flag is used to set the directory location, then the
forced copy flag value is FALSE
Figure 12-20. Copy directory location and copy policy (2 of 2) AN153.0
Notes:
Changing the copy directory location

The copy directory location can be changed by using either the -D (uppercase D) or -d
(lowercase d) flag of the sysdumpdev command.
Each flag sets a different value for the forced copy flag attribute, which is shown in the
output of the sysdumpdev -l command. The value of forced copy flag determines what happens
if the copy of the dump to the copy directory fails for any reason.
Dump copy failure (1 of 2)

• If the copy of the dump image fails and the forced copy
flag value is FALSE:
– The boot sequence continues (in other words, the dump is ignored)
– Dump image will likely be corrupted when paging space is activated
• If the copy of the dump image fails and the forced copy
flag value is TRUE:
– A menu is displayed on the console that allows you to copy the dump
to removable media, or to ignore the dump
– AIX 5.3 and above support tape and DVD-RAM devices
– The boot sequence waits until you interact with the menu
• The output of sysdumpdev -L indicates whether either of
these dump copy failure situations has occurred
Figure 12-21. Dump copy failure (1 of 2) AN153.0
Notes:
What happens if the dump copy fails?

If the forced copy flag setting is FALSE and the copy of the dump to the copy directory fails,
then the failure is ignored and the boot sequence continues as normal. The dump is probably
lost, since it is likely to be overwritten when the paging space is activated.
If the forced copy flag setting is TRUE and the copy of the dump to the copy directory fails for
any reason, then the boot sequence displays a menu on the system console. You can specify
that the dump be copied to removable media, or ignore the dump and continue. The dump can
be copied to tape or DVD-RAM. The boot sequence does not continue until an option from the
menu is selected, even if no supported devices are present on the machine.
The output of the sysdumpdev -L command, used to view information about the most recent
dump in a partition, indicates which of these situations (if any) occurred.
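The two policies can be summarized as a simple lookup. This is a sketch only: the function name copy_failure_action is hypothetical, and the descriptions restate the behavior that is described above.

```shell
# copy_failure_action VALUE
# VALUE is the "forced copy flag" setting (TRUE or FALSE) as shown by
# `sysdumpdev -l`. Prints what the boot sequence does if copying the dump
# from paging space to the copy directory fails.
copy_failure_action() {
    case "$1" in
        TRUE)  echo "display copy menu and wait for operator" ;;
        FALSE) echo "ignore dump and continue boot" ;;
        *)     echo "unexpected value: $1" >&2; return 1 ;;
    esac
}

# The value is chosen by the flag that sets the copy directory:
#   sysdumpdev -D /var/adm/sysdumps    # forced copy flag becomes TRUE
#   sysdumpdev -d /var/adm/sysdumps    # forced copy flag becomes FALSE
```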

Dump copy failure (2 of 2)

• Menu shows available devices for copying
Copy a System Dump to Removable Media
The system dump is 117215744 bytes and will be copied from /dev/hd6
to media inserted into the device from the list below.
Please make sure that you have sufficient blank, formatted
media before you continue.
Step One: Insert blank media into the chosen device.
Step Two: Type the number for that device and press Enter.
        Device Type          Path Name
>>> 1   tape/scsi/scsd       /dev/rmt0
    88  Help ?
    99  Exit -- Warning, the dump will be lost!
>>> Choice[1]:
Figure 12-22. Dump copy failure (2 of 2) AN153.0
Notes:
Menu that might be shown if dump copy fails

The visual shows an example of the menu that is shown on the system console if the dump
copy from paging space into the copy directory fails and the forced copy flag attribute is
TRUE.
always allow dump flag

• The always allow dump value indicates whether special key
sequences on a native (USB or serial) keyboard will start a
dump
• Default value is FALSE
– No special key sequence can initiate a dump
• When the value is TRUE, the following special key sequences
on a native keyboard are recognized:
– Ctrl-Alt-Numpad1 initiates a dump to the primary dump device
– Ctrl-Alt-Numpad2 initiates a dump to the secondary dump device
• The value is set TRUE using the -K flag of sysdumpdev, and
set FALSE using -k
Figure 12-23. always allow dump flag AN153.0
Notes:
How a dump can be manually initiated with the always allow dump flag
The value of the always allow dump flag affects how a system dump can be
manually initiated.
For systems with AIX 6.1 or newer, the flag controls whether a special key sequence entered on
a native console keyboard initiates a system dump. A native console keyboard is either a USB
keyboard that is used with a graphics adapter in an LFT console configuration, or a keyboard
that is configured as part of a physical terminal device (such as a VT220 or IBM 3153) attached
to a physical serial port. On a system with a Hardware Management Console (HMC), the
integrated serial ports are disabled. The key sequence is not recognized when generated on the
virtual console that is provided by the HMC.

Estimate the dump size

• Dump compression is always enabled in AIX 6.1 and above
• The -e flag of sysdumpdev prints an estimate of the dump
image size if the system were to crash right now
# sysdumpdev -e
0453-041 Estimated dump size in bytes: 326107136
– Value that is displayed is the estimated number of bytes to be written
to the dump device
– Takes dump type and compression into account
• When using traditional dump or selective copy firmware
assisted dump, the size of the dump image fluctuates with
system load
– More active threads/processes means more data must be written
– Size shrinks as the system becomes idle
Figure 12-24. Estimate the dump size AN153.0
Notes:
Estimating the dump size

The -e flag of the sysdumpdev command prints an estimate of the size the dump image would
be if the system were to crash right now. The estimate is given in bytes. It takes the current
dump mode into account, along with the amount of compression that can be expected for the
configured dump mode.
When using traditional dump or selective copy firmware assisted dump, the size of the dump
image fluctuates with system load. If an AIX instance has many thousands of active processes
and threads, all of that information from the process and thread tables must be included in the
dump image. So, obviously the dump would be bigger than if the instance had a much smaller
number of processes and threads.
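Because the estimate fluctuates, it is useful to compare it against the size of the dump device with some headroom. The following is a minimal sketch. The function names and the default 20 percent margin are assumptions made for illustration, and the device size in bytes must be obtained separately.

```shell
# estimate_bytes
# Reads `sysdumpdev -e` output on stdin and prints the estimated byte count.
estimate_bytes() {
    awk '/Estimated dump size in bytes:/ { print $NF; exit }'
}

# dump_fits ESTIMATE_BYTES DEVICE_BYTES [MARGIN_PERCENT]
# Succeeds (exit status 0) if the device can hold the estimate plus a
# safety margin. The default margin of 20 percent is an arbitrary choice
# for this sketch, not an IBM recommendation.
dump_fits() {
    awk -v est="$1" -v dev="$2" -v pct="${3:-20}" \
        'BEGIN { exit !(dev >= est * (100 + pct) / 100) }'
}

# Typical use on an AIX system:
#   est=$(sysdumpdev -e | estimate_bytes)
#   dump_fits "$est" "$device_bytes" || echo "dump device might be too small"
```

Running the check at a time of peak load gives a more conservative answer than running it on an idle system.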
Methods of starting a dump

• Automatic invocation of dump routines by system
• Using the sysdumpstart command or SMIT
– Option: -p (send to primary dump device)
– Option: -s (send to secondary dump device)
– Option: -t (use traditional dump)
– Option: -f (select scope of dump)
• Using a special key sequence on the LFT
<Ctrl-Alt-NUMPAD1> (to primary dump device)
<Ctrl-Alt-NUMPAD2> (to secondary dump device)
• Using the Hardware Management Console (HMC)
– Restart LPAR with the Dump option
• Using the remote reboot facility
Figure 12-25. Methods of starting a dump AN153.0
Notes:
Ways to obtain a system dump

The system can automatically initiate a system dump. In addition, there are several ways for a
user to start a system dump. The most appropriate method to use depends on the condition of
the system.
Automatic invocation of dump routines

If there is a kernel panic, the system automatically dumps the contents of real memory to the
primary dump device.
Using the sysdumpstart command or SMIT

One method a superuser can use to start a dump is to run the sysdumpstart command or
start it through SMIT (fastpath smit dump).
The -p flag of sysdumpstart is used to specify a dump to the primary dump device.

The -s flag of sysdumpstart is used to specify a dump to the secondary dump device.
The -t flag of sysdumpstart is used to change the default type from fw-assisted to traditional.
The -f flag of sysdumpstart is used to change the scope of the dump (interacts with the
configuration set up with sysdumpdev):
- disallow - Do not allow a full memory dump
- require - Require a full memory dump
Using a special key sequence

If the system has halted, but the keyboard still accepts input, a dump to the primary dump device
can be forced by pressing the <Ctrl-Alt-NUMPAD1> key sequence on the LFT keyboard. The
key combination <Ctrl-Alt-NUMPAD2> on the LFT can be used to initiate a system dump to the
secondary dump device. This method can be used only when your machine's mode switch (if
your machine has such a switch) is set to the Service position or the Always Allow System
Dump option is set to true. The Always Allow System Dump option can be set to true by
using SMIT or by using sysdumpdev -K.
Using the Hardware Management Console

In an LPAR environment, a dump can be initiated from the Hardware Management Console
(HMC) by choosing Dump from the Restart Options (accessed through the Restart Partition
menu selection in the Server Management application). The Dump option is the equivalent of
pressing the physical Reset button on a non-LPAR system. The partition initiates a system
dump to the primary dump device if configured to do that. Otherwise, the partition reboots.
Using the remote reboot facility

The remote reboot facility can also be used to obtain a system dump.
Obtaining a useful system dump

Bear in mind that if your system is still operational, a dump that is taken now does not help with
problem determination. A relevant dump is one taken at the time of the system halt.
Starting a dump from a TTY

S1
login: #dump#>1
Add a TTY
...
REMOTE Reboot ENABLE: dump
REMOTE Reboot STRING: #dump#
...
Figure 12-26. Starting a dump from a TTY AN153.0
Notes:
The remote reboot facility

The remote reboot facility allows the system to be rebooted through a native (integrated) serial
port. The system is rebooted when the reboot_string is received at the port. This facility is
useful when the system does not otherwise respond but can service serial port interrupts.
Remote reboot can be enabled on only one native serial port at a time.
An important feature of the remote reboot facility is that it can be configured to obtain a system
dump before rebooting.

Configuring the remote reboot facility

Two native serial port attributes control the operation of remote reboot:
- reboot_enable
- reboot_string
Use of these attributes is discussed in the following paragraphs.
reboot_enable
The value of this attribute (referred to as REMOTE Reboot ENABLE in SMIT) indicates whether
this port is enabled to reboot the machine by the remote reboot_string, and if so, whether to
take a system dump before rebooting:
- no - Indicates that remote reboot is disabled
- reboot - Indicates that remote reboot is enabled
- dump - Indicates that remote reboot is enabled, and, before rebooting, a system dump is
taken on the primary dump device
reboot_string
This attribute (referred to as REMOTE Reboot STRING in SMIT) specifies the remote
reboot_string that the serial port scans for when the remote reboot feature is enabled.
When the remote reboot feature is enabled, and the reboot_string is received on the port, a
'>' character is transmitted, and the system is ready to reboot. If a '1' character is received, the
system is rebooted (and a system dump might be started, depending on the value of the
reboot_enable attribute); any character other than '1' aborts the reboot process. The
reboot_string has a maximum length of 16 characters and must not contain a space, colon,
equal sign, null, new line, or Ctrl-\ character.
Enabling remote reboot

Remote reboot can be enabled through SMIT or the command line. For SMIT, the path System
Environments > Manage Remote Reboot Facility can be used for a configured TTY.
Alternatively, when configuring a new TTY, remote reboot can be enabled from the Add a TTY
or Change/Show Characteristics of a TTY menus. These menus are accessed through the
path Devices > TTY.
From the command line, the mkdev or chdev command is used to enable remote reboot.
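Before setting the attribute, a candidate reboot_string can be checked against the restrictions listed above. This is a sketch only: the function name valid_reboot_string is hypothetical, and rejecting an empty string is an assumption of this sketch rather than a documented rule. The device name tty0 in the comment is an example.

```shell
# valid_reboot_string STRING
# Succeeds if STRING is 1 to 16 characters long and contains no space,
# colon, equal sign, newline, or Ctrl-\ character.
# (A null character cannot occur inside a shell string, so it is not tested.)
valid_reboot_string() {
    s=$1
    [ -n "$s" ] || return 1              # empty string rejected (assumption)
    [ "${#s}" -le 16 ] || return 1       # maximum length is 16 characters
    fs=$(printf '\034')                  # Ctrl-\ is ASCII 28 (octal 034)
    nl='
'
    case "$s" in
        *" "*|*:*|*=*|*"$nl"*|*"$fs"*) return 1 ;;
    esac
    return 0
}

# Enabling remote reboot with a dump on an existing TTY (run as root):
#   valid_reboot_string '#dump#' &&
#       chdev -l tty0 -a reboot_enable=dump -a reboot_string='#dump#'
```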
Generating dumps with SMIT

# smit dump
System Dump
Move cursor to desired item and press Enter
Show Current Dump Devices
Show Information About the Previous System Dump
Show Estimated Dump Size
Change the Type of Dump
Change the Full Memory Dump Mode
Change the Primary Dump Device
Change the Secondary Dump Device
Change the Directory to which Dump is Copied on Boot
Start a Dump to the Primary Dump Device
Start a Traditional System Dump to the Secondary Dump Device
Copy a System Dump from a Dump Device to a File
Always ALLOW System Dump
Check Dump Resources Utility
Change/Show Global System Dump Properties
Change/Show Dump Attributes for a Component
Change Dump Attributes for multiple Components
Figure 12-27. Generating dumps with SMIT AN153.0
Notes:
Using the SMIT dump interface

You can use the SMIT dump interface to work with the dump facility.
The Always ALLOW System Dump option

An important item on the menu that is shown on the visual is Always ALLOW System Dump. If
you set this option to yes, the <Ctrl-Alt-NUMPAD1> and <Ctrl-Alt-NUMPAD2> key
sequences start a dump even when the key mode switch is in Normal position.

V10.0
Student Notebook
Uempty The SMIT dump menu

The menu items that show or change the dump information use the sysdumpdev command.
Generating dumps with HMC

Figure 12-28. Generating dumps with HMC AN153.0
Notes:
If an HMC is used to manage the LPAR, you can use the HMC GUI (or the chsysstate
command) to trigger a dump of the operating system.
In the GUI, select the LPAR and then, from the tasks menu, select Operations >
Restart. The resulting window is shown in the visual. Clicking the Dump button signals
the partition to perform a reset that initiates a dump.
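For the command-line alternative, the following sketch builds the HMC command rather than running it. It assumes that the dumprestart operation of chsysstate is the equivalent of the Dump option in the GUI, which should be verified against the documentation for the HMC level in use. The function name hmc_dump_cmd and the example system and partition names are hypothetical.

```shell
# hmc_dump_cmd MANAGED_SYSTEM LPAR_NAME
# Prints the chsysstate command line that would request a dump and restart
# of the named partition. The command is printed, not executed, so that it
# can be reviewed before it is run on the HMC.
hmc_dump_cmd() {
    if [ $# -ne 2 ]; then
        echo "usage: hmc_dump_cmd MANAGED_SYSTEM LPAR_NAME" >&2
        return 1
    fi
    printf 'chsysstate -r lpar -m %s -o dumprestart -n %s\n' "$1" "$2"
}

# Example with made-up names:
#   hmc_dump_cmd sys1 lpar1
```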

Dump image information (1 of 3)

• Use sysdumpdev -L to show information about the most
recent dump that occurred
• Output includes dump status value
– A value of 0 means the dump completed successfully
# sysdumpdev -L
0453-039
Device name: /dev/lg_dumplv
Major device number: 10
Minor device number: 11
Size: 53626368 bytes
Uncompressed Size: 674379157 bytes
Date/Time: Sun Jan 12 18:03:06 PST 2014
Dump status: 0
Type of dump: fw-assisted
dump completed successfully
Figure 12-29. Dump image information (1 of 3) AN153.0
Notes:
Displaying recent dump information

The sysdumpdev -L command displays information about the most recent system dump to
occur on the partition. One important field of the output is the Dump status field. A value of zero
indicates that the dump image was written to the dump device successfully.
The output also includes useful information such as the dump device that was used, and the
date and time of the crash.
Since dump compression is enabled by default (and cannot be disabled) in AIX 6.1 and above,
two lines of size information are displayed. The Size information indicates the number of bytes
of (compressed) data that is written to the dump device. The Uncompressed Size information
indicates the number of bytes of data originally in the dump image.
The Type of dump value is only displayed on AIX 6.1 and above.
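The two size fields can be used to see how well a dump compressed. The following is a minimal sketch, assuming the field labels shown on the visual. The function name dump_ratio is hypothetical.

```shell
# dump_ratio
# Reads `sysdumpdev -L` output on stdin and prints the ratio of
# "Uncompressed Size" to "Size", rounded to one decimal place.
# Exits with status 1 if no usable "Size" line is found.
dump_ratio() {
    awk '
        /^[ \t]*Uncompressed Size:/ { raw  = $3 }
        /^[ \t]*Size:/              { comp = $2 }
        END {
            if (comp > 0) printf "%.1f\n", raw / comp
            else          exit 1
        }'
}

# Typical use on an AIX system:
#   sysdumpdev -L | dump_ratio
```

For the values on the visual (674379157 bytes compressed to 53626368 bytes), the ratio is about 12.6 to 1.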
Dump image information (2 of 3)
• For a dump written to paging space, sysdumpdev -L shows
where the image was copied
# sysdumpdev -L
0453-039
Device name: /dev/hd6
Dump status: 0
Type of dump: traditional
Dump copy filename: /var/adm/ras/vmcore.0.BZ
Notes:
Dump copy filename

If the dump was to paging space, and the dump copy was successful, then the output of
sysdumpdev -L shows where the dump was copied to.

V10.0
Student Notebook
Uempty

IBM Power Systems
• Output shows if dump copy failed and if menu was presented

on the console
# sysdumpdev –L
0453-039
Device name: /dev/hd6

Dump status: 0
Type of dump: traditional
0481-195 Failed to copy the dump from /dev/hd6 to /var/adm/ras.
0481-198 Allowed the customer to copy the dump to external media.
Notes:
Display if dump copy fails

If the dump was to paging space, and the dump copy failed, then the output of sysdumpdev -L
indicates the failure. For example:
0481-195 Failed to copy the dump from /dev/hd6 to /var/adm/ras.
If the forced copy flag value is TRUE, then the boot sequence pauses and displays the
menu on the system console. This action displays the last line of output that is shown on the
visual:
0481-198 Allowed the customer to copy the dump to external media.
This message does not indicate that the dump was copied to external media. It just means that
the menu was shown on the console.
If the forced copy flag value is FALSE, then the boot sequence ignores the dump and
continues as normal. The output of sysdumpdev -L shows only that the copy failed.
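The three cases (copied, copy failed with the menu shown, copy failed and ignored) can be told apart from the message lines. This is a sketch that assumes the message numbers and the Dump copy filename label appear at the start of a line, as on the visuals. The function name copy_outcome is hypothetical.

```shell
# copy_outcome
# Reads `sysdumpdev -L` output on stdin and prints one line that says what
# happened to a dump that was written to paging space.
copy_outcome() {
    awk '
        /^[ \t]*Dump copy filename:/ { file = $NF }
        /^[ \t]*0481-195/            { failed = 1 }   # copy to directory failed
        /^[ \t]*0481-198/            { menu = 1 }     # menu shown on console
        END {
            if (file != "")  print "copied to " file
            else if (menu)   print "copy failed; menu was shown"
            else if (failed) print "copy failed; dump ignored"
            else             print "no copy information"
        }'
}
```

As the notes point out, "menu was shown" does not mean that the dump reached external media.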
Operator panel codes

• Dump status and progress codes can be shown in the virtual
operator panel in the HMC GUI
– Dump status value can be examined in output of sysdumpdev -L
Operator     Description                                           Dump status
panel code                                                         value
00c0         Dump completed successfully                            0
00c8         No dump device defined                                -1
00c4         Dump device too small                                 -2
00c5         Dump crashed or did not start                         -3
00c1         I/O error on dump device during dump                  -4
00c9         System initiated dump in progress                     N/A
00c2         User initiated dump in progress                       N/A
00cb         Firmware assisted dump in progress (POWER6 only)      N/A
00c6         User initiated dump in progress to secondary device   N/A
00cc         Dump has switched to secondary device                 N/A
0c20         System is in the kernel debugger                      N/A
Figure 12-32. Operator panel codes AN153.0
Notes:
Dump progress codes

The column on the left side of the table in the visual shows the different codes that might be
displayed on the operator panel while a dump is in progress, or when a dump terminated. The
first five codes are defined as terminal codes, indicating these codes are displayed when the
dump routine finishes. The other codes are intermediate codes, since when they are displayed
the dump routine is still running.
The dump status value is a number that indicates whether the dump completed successfully.
The sysdumpdev -L command displays the dump status value. It is also included in the error
log entry with the DUMP_STATS label that is created when the system reboots after the crash.
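The terminal codes in the table can be turned into a lookup for use in scripts. This is a sketch only: the function name dump_status_text is hypothetical, and the text is taken from the table on the visual.

```shell
# dump_status_text STATUS
# Prints the operator panel code and description for a dump status value
# as reported by `sysdumpdev -L`. Returns 1 for values not in the table.
dump_status_text() {
    case "$1" in
        0)  echo "00c0 dump completed successfully" ;;
        -1) echo "00c8 no dump device defined" ;;
        -2) echo "00c4 dump device too small" ;;
        -3) echo "00c5 dump crashed or did not start" ;;
        -4) echo "00c1 I/O error on dump device during dump" ;;
        *)  echo "unknown dump status: $1" >&2; return 1 ;;
    esac
}
```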

Dump problems
• Dump problems are indicated by a status value that is not 0
• Dump device might not be large enough (00c4, status = -2)
– Increase the size of the device if possible
• Problem with device or adapter (00c1, status = -4)
– Run diagnostics to test hardware
– Consider moving dump device to another location
• System crashed during boot sequence (00c8, status = -1)
– Enable debugging information during boot
• When no valid dump image is created:
– Check error log entries for information
– Might need to resort to using the KDB debugger
Figure 12-33. Dump problems AN153.0
Notes:
Identifying dump problems

The most common problem with system dumps is that the dump device is too small. The 00c4
code on the operator panel indicates this problem, as does a dump status value of -2 in both
the output of sysdumpdev -L and the error log. The solution is to increase the size of the
dump device if possible. If the dump device cannot be enlarged, consider moving the dump
device to another location.
A dump status value of -4 indicates a problem with the physical device or adapter that is used
for the dump device. You should run diagnostics to test the device if possible, and consider
configuring a different device to act as the dump device.
A dump status value of -1 indicates that the system crashed when no dump device was
defined. The most likely cause of this is that the system crashed during the boot sequence
before the dump device was configured. To detect why the system crashed, you should enable
debugging output during the boot sequence.
If no dump image is obtained, you should check the error log for information. There might still be
error log entries that are related to the crash. When the dump fails to complete successfully,
there might be a partial dump created. The partial dump might or might not be useful, since it
depends on what is present in the dump image, and what is missing. A partial dump is indicated
when the Size field is greater than zero in the sysdumpdev -L output, and the dump status
value is not zero.
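The rule for recognizing a partial dump can be written down as a short decision function. This is a sketch only: classify_dump is a hypothetical helper, and it assumes you have already read the Size and dump status values from the sysdumpdev -L output yourself.

```shell
# classify_dump SIZE STATUS
# Hypothetical helper: applies the rule from the notes to the Size field and
# the dump status value taken from sysdumpdev -L output.
classify_dump() {
    size=$1
    status=$2
    if [ "$status" -eq 0 ]; then
        echo "complete"      # status 0: dump completed successfully
    elif [ "$size" -gt 0 ]; then
        echo "partial"       # non-zero status, but some data was written
    else
        echo "none"          # non-zero status and nothing was written
    fi
}
```

A result of partial means the image might still be worth keeping, although whether it is useful depends on what was written before the dump stopped.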

Retrieving the dump image
• A dump image in paging space is copied to the copy directory at boot time
– The copycore command is invoked from /sbin/rc.boot
• The copycore command:
– Uses the savecore command to perform the copy
– Arranges for copydumpmenu to run if the copy fails and the forced
copy flag value is TRUE
• If dump image is in a dedicated logical volume, no action is
taken at boot time
– Dump image can be retrieved manually using the savecore
command
• Use the dd command to retrieve a dump that is written to tape
or DVD-RAM media
Figure 12-34. Retrieving the dump image AN153.0
Notes:
Recovering the dump image at boot time

If the dump was written to paging space, then the system automatically copies the dump to the
copy directory before activating the paging space during the boot sequence. The /sbin/rc.boot
script runs the copycore command to copy the dump to the copy directory.
The copycore command uses the savecore command to do the actual copying of the dump.
If the copy fails for any reason, and the forced copy flag is TRUE, then the copycore
command arranges for the copydumpmenu command to display a menu on the system
console that allows the dump to be copied to external media.
If the dump was written to a dedicated logical volume, then the system does not copy the dump
when the system reboots. If you want to examine the dump, it must be copied to the file system
by using the savecore command.
If the dump was written to tape or DVD-RAM media, it must be copied back into the file system
by using the dd command.
The savecore command
• Use savecore to copy a dump image from a logical volume
– Usage: savecore directory
• The command looks in the directory for a file that is called bounds and if
no bounds file exists:
– The dump image is named vmcore.0.BZ, and a new bounds file is created that
contains the character 1
– /unix file copied and named vmunix.0
• If a bounds file is found:
– All previous dump images are removed, apart from the most recent
– The current dump image is named vmcore.N.BZ, and /unix named vmunix.N,
where N is the value from bounds file
– Value in bounds file is incremented for use in next dump copy
• If the dump image has already been retrieved, savecore reports an
error
– Can override this by using the -f flag to force the copy
Figure 12-35. The savecore command AN153.0
Notes:
What does the savecore command do?

The savecore command is used to retrieve a dump image from a dedicated logical volume
dump device. The command is also used by the copycore command to start the dump copy
into the copy directory at boot time. The normal command usage is to run savecore with the
name of a directory into which the dump should be copied.
The savecore command looks in the specified directory for a file called bounds. If no file called
bounds exists, then the dump image is copied to a file called vmcore.0.BZ. If savecore is run
from the command-line by the user, then it also copies the /unix file into the specified
directory, and names it vmunix.0. If copycore runs the savecore command, then the /unix
file is not copied. The savecore command then creates a file called bounds in the specified
directory, containing the character 1. If a file called bounds is found, then the
savecore command reads the file, and uses the number that it contains as a component of the
name of the dump file that is created. The number in the bounds file is then incremented and
the file updated. For example, suppose that a directory has a file called bounds that
contains the value 2. In this case, the savecore command would copy the dump image and
name it vmcore.2.BZ, and then update the bounds file to contain the value 3.
When the savecore command copies the dump image, it marks the dump on the dump device as
copied. If the savecore command is run on the dump device again, it fails with an error message
that the dump is no longer valid. You can override this check by using the -f flag of savecore.
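The bounds bookkeeping can be illustrated with a small model. This sketch is not the savecore implementation and copies no dump data; next_dump_name is a hypothetical function that only reproduces the naming and counter behavior described in these notes.

```shell
# next_dump_name DIR
# Hypothetical model of the bounds handling: prints the name that the next
# dump image would receive and advances the counter stored in DIR/bounds.
next_dump_name() {
    dir=$1
    n=0
    if [ -f "$dir/bounds" ]; then
        # Use the number recorded by the previous copy
        read -r n < "$dir/bounds" || true
    fi
    echo "vmcore.$n.BZ"
    # Store the number to use for the next dump copy
    echo $((n + 1)) > "$dir/bounds"
}
```

Run against an empty directory, the function prints vmcore.0.BZ and leaves a bounds file containing 1. Run against a directory whose bounds file contains 2, it prints vmcore.2.BZ and updates bounds to 3.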
The dumpcheck command
• The dumpcheck command checks to see that the dump device is large enough for the current
estimated dump size
– If the dump device is paging space, the command also checks that the
copy directory has sufficient free space
– Command is in /usr/lib/ras
• The command creates an error log entry if a problem is
detected
• The root user has a cron job that is scheduled to run
dumpcheck at 3:00pm every day
– Might need to change timing to when system is expected to be most
heavily loaded
Figure 12-36. The dumpcheck command AN153.0
Notes:
Is the dump device large enough?

The dumpcheck command is in the /usr/lib/ras directory. This command checks the dump
configuration to ensure that enough disk resources are configured to handle storing the dump
image in the event of a system dump. The command is run as a root user cron job at 3:00pm
each day. It logs an error if either:
- The largest dump device is too small to receive the dump.
- There is insufficient space in the copy directory file system when the dump device is a
paging space.
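The two conditions that dumpcheck reports can be modeled as simple size comparisons. The sketch below is an illustration of the logic only, not how dumpcheck is implemented: check_dump_space is a hypothetical function, all sizes are assumed to be in bytes, and the estimate is assumed to come from a source such as sysdumpdev -e.

```shell
# check_dump_space ESTIMATE_BYTES DEVICE_BYTES [COPYDIR_FREE_BYTES]
# Hypothetical model of the two checks. Prints one line per problem found
# and returns non-zero if any problem exists. Pass the third argument only
# when the dump device is a paging space.
check_dump_space() {
    estimate=$1
    device=$2
    copydir_free=${3:-}
    rc=0
    if [ "$device" -lt "$estimate" ]; then
        echo "dump device too small"
        rc=1
    fi
    if [ -n "$copydir_free" ] && [ "$copydir_free" -lt "$estimate" ]; then
        echo "insufficient space in copy directory"
        rc=1
    fi
    return $rc
}
```

Because the estimated dump size grows with system load, the same configuration can pass at a quiet time and fail at a busy one, which is why the slide suggests moving the cron job to the period of heaviest load.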

Automatically reboot after a crash

# smit chgsys
Change/Show Characteristics of Operating System

...
Maximum number of PROCESSES allowed per user [128] +#
Maximum number of pages in block I/O BUFFER CACHE [20] +#
...
Automatically REBOOT system after a crash false +
...
Enable full CORE dump false +
OR
# chdev -l sys0 -a autorestart=true
Figure 12-37. Automatically reboot after a crash AN153.0
Notes:
Specifying automatic reboot using SMIT

If you want your system to reboot automatically after a dump, you must set the kernel parameter
autorestart to true. This can be done with the SMIT fastpath smit chgsys.
The corresponding menu item is Automatically REBOOT system after a crash. The
default value is true in AIX V5.2 and later.
Specifying automatic reboot using the chdev command

The following command can also be used to enable automatic reboot after a crash:
# chdev -l sys0 -a autorestart=true
Checking the size of /var

If you specify an automatic reboot, you should verify that the /var file system is large enough
to store a system dump.
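After changing the attribute, you might want to confirm the current setting from a script. The following is a minimal sketch, assuming that lsattr -El sys0 -a autorestart prints the attribute name in the first column and its value in the second; autorestart_value is a hypothetical helper name.

```shell
# autorestart_value
# Hypothetical helper: reads lsattr-style attribute lines on standard input
# and prints the value column of the autorestart attribute.
# Intended use on AIX:
#   lsattr -El sys0 -a autorestart | autorestart_value
autorestart_value() {
    awk '$1 == "autorestart" { print $2 }'
}
```

Reading from standard input keeps the parsing separate from the AIX command, so the helper can be tried with a sample line on any system.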
Using kdb to analyze a dump

/unix (Kernel)
/var/adm/ras/vmcore.x (Dump file)
# uncompress /var/adm/ras/vmcore.x.Z
OR
# dmpuncompress /var/adm/ras/vmcore.x.BZ
# kdb /var/adm/ras/vmcore.x /unix
> status
> stat
(further subcommands for analyzing)
> quit
/unix kernel must be the same as on the failing machine
Figure 12-38. Using kdb to analyze a dump AN153.0
Notes:
Function of the kdb command

The kdb command is an interactive tool that is used for operating system analysis. Typically,
kdb is used to examine kernel dumps in a system postmortem state. However, a live running
system can also be examined with kdb. Because the operating system is dynamic,
various tables and structures often change while they are being examined, which
precludes extensive analysis of a running system.
Examining an active system

To examine an active system, you would run the kdb command without any arguments.
Analyzing a system dump

For a dead system, a dump is analyzed by using the kdb command with file name arguments,
as illustrated on the visual.

To use kdb, the vmcore file must first be uncompressed. If the dump file has a .Z suffix,
which indicates the older compressed format, use the uncompress command. Starting in
AIX 6.1, the dump file ends in a .BZ suffix and you must use the dmpuncompress command
to process this file. If you want to leave the original compressed file intact (rather than
replacing it with the uncompressed file), then use the -p option of the dmpuncompress command.
# uncompress /var/adm/ras/vmcore.x.Z
or
# dmpuncompress /var/adm/ras/vmcore.x.BZ
When the dump is uncompressed, you would analyze it with the kdb command.
# kdb /var/adm/ras/vmcore.x /unix
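The choice of decompression command depends only on the file suffix, which a script can test before calling kdb. This is a small sketch; dump_uncompress_cmd is a hypothetical helper that only prints the command name and does not run it.

```shell
# dump_uncompress_cmd FILE
# Hypothetical helper: prints the name of the command needed to uncompress
# a dump file, based on its suffix, or "none" if it is not compressed.
dump_uncompress_cmd() {
    case "$1" in
        *.BZ) echo "dmpuncompress" ;;   # AIX 6.1 and later
        *.Z)  echo "uncompress" ;;      # older compressed format
        *)    echo "none" ;;            # already uncompressed
    esac
}
```

For example, dump_uncompress_cmd /var/adm/ras/vmcore.0.BZ prints dmpuncompress.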
Potential problems when using kdb

If the copy of /unix does not match the dump file, the following output appears on the screen:
WARNING: dumpfile does not appear to match namelist
If the dump itself is corrupted in some way, then the following message appears on the screen:
...
dump /var/adm/ras/vmcore.x corrupted
Useful subcommands
Examining a system dump requires an in-depth knowledge of the AIX kernel. However, two
subcommands that might be useful to you are:
- The subcommand status displays the processes/threads that were active on the CPUs
when the crash occurred
- The subcommand stat shows the machine status when the dump occurred
To exit the kdb debug program, type quit at the > prompt.
Creating a sample system dump

The following example stops your running machine and creates a system dump:
# cat /unix > /dev/mem
Do not run this command in your production environment.
The LEDs displayed are 888, 102, 300, 0C0:
- Refer to earlier material about the 888 code
- LED 102 indicates that “a dump has occurred”
- LED 300 stands for crash code “Data Storage Interrupt (DSI)”
- LED 0C0 means “Dump completed successfully”
The snap command
• Used to gather information about the operating system for debug purposes
– Information is gathered in subdirectories under /tmp/ibmsupt
– Directory location can be changed using the -d flag
– Various flags available to specify what information to collect
• Depending on the options that are used, the command will:
– Estimate the amount of space that is required for the information to be
gathered
– Collect information in a temporary directory
– Create a compressed pax archive of the contents of the temporary
directory and optionally write the archive to tape or DVD-RAM
Figure 12-39. The snap command AN153.0
Notes:
Overview
The snap command collects system configuration information for problem determination
purposes.
By default, the command creates the /tmp/ibmsupt directory, and other subdirectories under
/tmp/ibmsupt depending on the data that is collected. The -d flag can be used to change the
default directory that is used to collect the data. The snap command has control options that
determine what the command does, and data collection options that determine the type of
system information that is collected.
By default, the command checks the file system that is used for data collection to ensure that
enough free space is available. It gathers the system configuration information based on the
flags that are specified, and then optionally either copies the results to media or creates a
compressed pax archive file.
The intent of the snap command is to serve as a single interface for every step that is required
to package information for transmission to a support group. While some of the options might
seem simple, they do help to prevent simple mistakes from occurring. They also help those
administrators who are not familiar with basic UNIX commands like pax and compress. This
helps maximize the chances that the snap images sent to IBM for analysis are
correct and contain valid data.
Data collection flags

Flag  Description                                  Directory
-a    Gather all information (apart from HACMP)    /tmp/ibmsupt
-D    Gather dump information                      /tmp/ibmsupt/dump
-g    Gather general information                   /tmp/ibmsupt/general
-k    Gather kernel information                    /tmp/ibmsupt/kernel
-A    Gather asynchronous TTY information          /tmp/ibmsupt/async
-e    Gather HACMP information                     /tmp/ibmsupt/hacmp
-f    Gather file system information               /tmp/ibmsupt/filesys
-l    Gather programming language information      /tmp/ibmsupt/lang
-L    Gather LVM information                       /tmp/ibmsupt/lvm
-n    Gather NFS information                       /tmp/ibmsupt/nfs
-t    Gather TCP/IP information                    /tmp/ibmsupt/tcpip
-p    Gather printer information                   /tmp/ibmsupt/printer
-S    Gather security information when -g is used  /tmp/ibmsupt/general
-@    Gather WPAR information                      /tmp/ibmsupt/wlm
Figure 12-40. Data collection flags AN153.0
Notes:
snap data collection flags

The table on the visual shows many of the data collection flags supported by the snap
command. Each entry in the table describes the type of data that is collected when the flag is
used, and lists the name of the subdirectory that is created to hold the information. See the AIX
documentation for the snap command for a full description of the data collection options
available.

Control flags

Flag          Description
-c            Create a compressed pax archive file
-d directory  Collect system information in specified directory instead of /tmp/ibmsupt
-o device     Create a pax archive on the specified device
-N            Suppress checking the amount of space that is required to collect information
-r            Remove collected information from the directory /tmp/ibmsupt
Figure 12-41. Control flags AN153.0
Notes:
snap control flags

The -c flag causes snap to create a compressed pax image, /tmp/ibmsupt/snap.pax.Z,
of all files in the /tmp/ibmsupt directory tree, or other directory tree that is an argument with
the -d flag. The -d flag causes snap to use the specified directory for data collection instead of
/tmp/ibmsupt. Running snap -o device causes snap to gather the system information,
and then write it out to device by using the pax command. The -r flag tells snap to remove
the files in the data collection directory. The -d flag is the only other flag that can be used with
the -r flag.
Normally the snap command does two passes of data collection. In the first pass, snap runs all
the commands to gather system information, but counts only the number of bytes produced - it
does not save the results. After all commands are run once, the snap command knows how
much space it requires to gather all the system information it was instructed to collect. If there is
sufficient space on the file system that is used for the temporary directory, then snap runs all
the data collection commands again, but this time it stores the output in the collection repository.
If snap is run with the -N flag, it skips the first pass to calculate the amount of space that is
required, and collects the data immediately. Using this flag can cut the time that is taken to run
the snap command in half, but the risk is that you can run out of space on the file system without
warning.

snap examples
• Example 1:
snap -a -c -d /some/directory
• Example 2:
snap -Dkg -o /dev/rmt0
• Recommendation is to use -a for initial data collection
– Minimum of -Dkg when collecting information about a system dump
• Other considerations:
– Most data collection options append information to files in the
/tmp/ibmsupt directory structure
– Depending on the AIX version, some options might not be available
– /tmp/ibmsupt/other and /tmp/ibmsupt/testcase can be
used to supply additional information
Figure 12-42. snap examples AN153.0
Notes:
Examples
In the first example, snap -a gathers all the system information, uses the directory
/some/directory to store the information (-d /some/directory), and creates a
compressed pax archive of the information that is collected (the -c flag).
In the second example, snap captures the dump (-D flag), kernel (-k flag) and general
information (-g flag), and writes the information as a pax archive to the device /dev/rmt0.
The data is collected in /tmp/ibmsupt before being written to the tape.
You should always use the -a data collection option with snap when gathering problem
determination information on a system for the first time. One thing to watch out for with the
snap command is that many of the functions append data to existing files. If the snap
command is run multiple times without cleaning the temporary directory, some of the collected
data files have multiple sets of information, but collected at different times. Always make sure
that you are looking at the most recent set of data in a file. The most recent set of data is at the
end of the file.
The subdirectories other and testcase can be used to supply test case data and programs as
part of the snap package. Run snap -a to collect system data, then place files into the other or
testcase directories, then run snap -c or snap -o device to create the package.
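The three steps for adding a test case to a snap package can be sketched as a dry-run script. This is an illustration under stated assumptions: package_snap_with_testcase and the RUN variable are hypothetical names, and the default collection directory /tmp/ibmsupt is assumed. By default the script only prints the commands, so it is safe to try anywhere.

```shell
# RUN=echo (the default here) only prints each command.
# Set RUN to an empty value on an AIX system to actually run the commands.
RUN=${RUN-echo}

# package_snap_with_testcase FILE
# Hypothetical helper: collects system data, adds FILE to the testcase
# subdirectory, and then builds the compressed pax archive.
package_snap_with_testcase() {
    testcase_file=$1
    $RUN snap -a                                     # collect all system data
    $RUN cp "$testcase_file" /tmp/ibmsupt/testcase/  # add the test case file
    $RUN snap -c                                     # create snap.pax.Z
}
```

Keeping the commands behind a RUN prefix lets you review the exact sequence before committing to a full data collection, which can take a long time on a large system.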

Checkpoint
1. True or False: The savecore command shows information about the last system dump.
2. True or False: The snap command is used to gather information to be submitted for
problem investigation.
3. True or False: The dumpcheck command indicates whether the copy directory has
insufficient disk space.
4. True or False: By default, AIX 7.1 is configured to use firmware assisted dump when
the platform supports it.
Notes:
Exercise: The AIX system dump facility
• Work with the AIX Dump Facility
• Work with a dedicated dump logical volume
• Generate a firmware assisted dump
• Initiate a dump from the HMC
Figure 12-44. Exercise: The AIX system dump facility AN153.0
Notes:

Unit summary
• Package a system dump and other information with the snap command
Notes:
When a dump occurs, kernel and system data are copied to the primary dump device.
By default, the system has a primary dump device (/dev/hd6) and a secondary device
(/dev/sysdumpnull).
During reboot, the dump is copied to the copy directory (/var/adm/ras).
A system dump should be retrieved from the system by using the snap command.
The Support Center uses the kdb debugger to examine the dump.
Appendix A. Checkpoint solutions
Unit 1, "Advanced AIX administration overview"
Solutions for Figure 1-15, "Checkpoint," on page 1-26
Checkpoint solutions
1. What are the four major problem determination steps?
The answer is identify the problem, talk to users (to further define the
problem), collect system data, and resolve the problem.
2. Who should provide information about system problems?
The answer is always talk to the users about such problems to be able
to gather as much information as possible.
3. True or False: If there is a problem with the software, it is necessary to
get the next release of the product to resolve the problem.
The answer is false: in most cases, it is only necessary to apply fixes or
upgrade microcode.
4. True or False: Documentation can be viewed or downloaded from the
IBM website.
The answer is true.
Unit 2, "The Object Data Manager"
Solutions for Figure 2-23, "Checkpoint (1 of 2)," on page 2-38
Checkpoint solutions (1 of 2)
1. True or False: The CuAt ODM object class contains an entry for
each attribute for each supported device.
The answer is false. It is the PdAt ODM object class. The CuAt
object class only contains attributes that are different from the
default value.
2. True or False: The DvDr attribute in the PdDv ODM object class
identifies the program that is loaded into the kernel when the device
is made available.
The answer is true.
3. True or False: The configure attribute in the CuDv ODM object
class identifies the program that runs to bring a device to the
available state.
The answer is false. It is in the PdDv ODM object class.
Solutions for Figure 2-24, "Checkpoint (2 of 2)," on page 2-39
4. True or False: The /etc/objrepos ODM repository
holds object classes that are specific to a system.
The answer is true.
5. True or False: A defined device has an entry in CuDv, but
cannot be used at this time.
The answer is true.
6. True or False: An available device has its device driver
loaded into the kernel and a device file created in /dev (if
applicable).
The answer is true.
Unit 3, "Error monitoring"
1. Which command generates error reports?
The answer is the errpt command.
2. Which flag of this command is used to generate a detailed error
report?
The answer is errpt -a generates a detailed report.
3. Which type of disk error indicates bad blocks?
The answer is DISK_ERR4.
4. What does the errclear command do?
The answer is it clears entries from the error log.
Solutions for Figure 3-21, "Checkpoint (2 of 2)," on page 3-42
5. What does the errlogger command do?
The answer is it is used by root to add entries into the error log.
6. What does the following line in /etc/syslog.conf indicate?
*.debug errlog
The answer is all syslogd entries are directed to the error log.
7. What does the descriptor en_method in errnotify indicate?
The answer is it specifies a program or command to be run when an
error matching the selection criteria is logged.
Unit 4, "Network Installation Manager basics"
1. True or False: NIM can be used to fix an LPAR that fails to
boot because of a problem with the /etc/inittab.
The answer is true, by using the maint_boot operation.
2. True or False: The lsnim command can be used to display
information about NIM objects.
The answer is true.
3. True or False: A NIM client cannot be a resource server.
The answer is false.
4. True or False: An lpp_source resource contains software to
be installed.
The answer is true.
Unit 5, "System initialization: Accessing a boot image"
1. True or False: You must have AIX loaded on your system to use the System
Management Services programs.
The answer is false: SMS is part of the built-in firmware.
2. Your AIX system is powered off. AIX is installed on hdisk1 but the bootlist is set to
boot from hdisk0. How can you fix the problem and make the machine boot from
hdisk1?
The answer is you need to boot the SMS programs and set the new boot list to
include hdisk1.
3. Your machine is booted and is at the # prompt. What is the command that
displays the normal bootlist?
The answer is # bootlist -m normal -o.
4. Your machine is booted and is at the # prompt. How might you change the
normal bootlist?
The answer is # bootlist -m normal device1 device2.

5. What command is used to build a new boot image and write it to the
boot logical volume?
The answer is bosboot -ad /dev/hdiskx.
6. What script controls the boot sequence?
The answer is rc.boot.
7. True or False: During the AIX boot process, the AIX kernel is loaded
from the root file system.
The answer is false: the AIX kernel is loaded from hd5.
8. How do you boot an AIX machine into maintenance mode?

The answer is you need to boot from an AIX CD, mksysb, or NIM
server.

Unit 6, "System initialization: rc.boot and inittab"
1. From where is rc.boot 3 run?
The answer is from the /etc/inittab file in rootvg.
2. Your system stops booting with LED 557. In which rc.boot
phase does the system stop?
The answer is rc.boot 2.
3. What are some reasons for this problem (LED 557)?
The answer is corrupted JFS log or damaged file system.
4. Which ODM file is used by the cfgmgr during boot to
configure the devices in the correct sequence?
The answer is Config_Rules.
5. What is the likely cause if your system stops booting with LED 553?
The answer is there is a problem with processing /etc/inittab.
6. What does the line init:2:initdefault: in /etc/inittab mean?
The answer is this line is used by the init process to determine the
initial run level (2=multiuser).
Unit 7, "LVM metadata and related problems"
1. True or False: All LVM information is stored in the ODM.
The answer is false: Information is also stored in other AIX files and in disk
control blocks (like the VGDA and LVCB).
2. True or False: You detect that a physical volume hdisk1 that
is contained in your rootvg is missing in the ODM. This
problem can be fixed by exporting and importing rootvg.
The answer is false: Use the rvgrecover procedure instead. This script
creates a complete set of new rootvg ODM entries.
Unit 8, "Disk management procedures"
1. Although everything seems to be working fine, you detect error log
entries for disk hdisk0 in your rootvg. The disk is not mirrored to
another disk. You decide to replace this disk. Which procedure would
you use to migrate this disk?
The answer is procedure 2: Disk still working. There are some
additional steps necessary for hd5 and the primary dump device
hd6.
2. You detect an unrecoverable disk failure in volume group datavg.
This volume group consists of two disks that are completely mirrored.
Because of the disk failure you are not able to vary on datavg. How do
you recover from this situation?
The answer is forced varyon: varyonvg -f datavg. Use
procedure 1 for mirrored disks.
3. After disk replacement, you find that a disk has been removed from the
system but not from the volume group. How do you fix this problem?
The answer is repair the ODM, for example through exportvg and
importvg. Execute reducevg using the PVID instead of disk name.
Unit 9, "Install and cloning techniques"
1. Name the two ways alternate disk installation can be used.
The answer is installing a mksysb image on another disk and cloning
the current running rootvg to an alternate disk.
2. What are the advantages of alternate disk rootvg cloning?
The answer is it creates an online backup and allows maintenance and
updates to software on the alternate disk, which helps to minimize
downtime.
3. Why should you not use exportvg with an alternate disk volume
group?
The answer is this removes rootvg related entries from
/etc/filesystems.
4. True or False: multibos provides for booting between alternate
operating system environments within a single rootvg.
The answer is True.
5. True or False: A standby BOS can only be accessed by changing the
bootlist and then rebooting.
The answer is False.
6. True or False: multibos requires cloning all of the logical volumes in
the active rootvg.
The answer is False.
Unit 10, "Advanced backup techniques"
1. True or False: The creation of a snapshot volume group marks all
copies in the snapshot as stale.
2. True or False: The creation of a JFS split copy marks all of the split
mirror copies as stale.
The answer is true.
3. True or False: After the creation of a JFS split mirror copy, the
administrator needs to mount the new file system to be able to access
the split copy.
4. To access a SAN Copy of an active volume group on the source
system, use the command:
a. joinvg
b. importvg
c. recreatevg
The answer is recreatevg.
Unit 11, "Diagnostics"
1. What diagnostic modes are available?
a. Concurrent
b. Maintenance
c. Service (standalone)
d. All of the above
The answer is all of the above.
2. How can you diagnose a communication adapter that is
used during normal system operation?
The answer is use either maintenance or service mode.
Unit 12, "The AIX system dump facility"
1. True or False: The savecore command shows information about
the last system dump.
The correct answer is false.
2. True or False: The snap command is used to gather information to
be submitted for problem investigation.
The correct answer is true.
3. True or False: The dumpcheck command indicates whether the
copy directory has insufficient disk space.
4. True or False: By default, AIX 7.1 is configured to use firmware
assisted dump when the platform supports it.
Appendix B. Command summary
Startup, Logoff, and Shutdown

<Ctrl>d (exit) Log off the system (or the current shell).
shutdown Shuts down the system by disabling all processes. If in single-user
mode, you can use the -F option for fast shutdown. The -r option reboots the
system. This command requires the user to be root or a member of the
shutdown group.
Directories
mkdir Make directory
cd Change the directory. The default is $HOME directory.
rmdir Remove a directory (beware of files that start with “.”).
rm Remove file; -r option removes directory and all files and
subdirectories recursively.
pwd Print working directory: shows name of current directory
ls List files
-a (all)
-l (long)
-d (directory information)
-r (reverse alphabetic order)
-t (time changed)
-C (multi-column format)
-R (recursively)
-F (places / after each directory name and * after each exec file)
Files: Basic
cat List files contents (concatenate). cat can open a new file with
redirection, for example, cat > newfile. Use <Ctrl>d to end
input.
chmod Change the permission mode for files or directories.
• chmod who[=+-]permissions files or directories
• (r,w,x = permissions and u, g, o, a = who)
• Can use + or - to grant or revoke specific permissions
• Can also use numerics, 4 = read, 2 = write, 1 = execute
• Can combine them, first - user, next - group, last - other
• For example, chmod 746 file1 is user = rwx, group = r, other
= rw
chown Change owner of a file, for example, chown owner file
chgrp Change group of files
cp Copy file
mv Move or rename file
pg List file content by screen (page)
• h (help)
• q (quit)
• <cr> (next page)
• f (skip one page)
• l (next line)
• d (next 1/2 page)
• $ (last page)
• p (previous file)
• n (next file)
• . (redisplay current page)
• /string (find string forward)
• ?string (find string backward)
• -# (move backward # pages)
• +# (move forward # pages)
. Current directory
.. Parent directory
rm Remove (delete) files (-r option removes directory and all files and
subdirectories)
head Print first several lines of a file
tail Print last several lines of a file
wc Report number of lines (-l), words (-w), characters (-c) in files, no
options gives lines, words, and characters
su Switch user
id Displays your user ID environment, user name and ID, group names
and IDs
tty Displays the device that is active. Useful for X Windows, where
several pts devices can be created and it helps to know which one
you have active. who am i does the same.
Files: Advanced
awk Programmable text editor / report writer
banner Display banner (can redirect to another terminal nn with
> /dev/ttynn)
cal Calendar (cal month year)
cut Cut out specific fields from each line of a file.

diff Differences between two files
find Find files anywhere on disks. Specify location by path (searches all
subdirectories under specified directory).
• -name fl (file names match fl criteria)
• -user ul (files that user ul owns)
• -size +n (or -n) (files larger (or smaller) than n blocks)
• -mtime +x (-x) (files that are modified more (less) than x days
ago)
• -perm num (files whose access permissions match num)
• -exec (Run a command with results of find command)
• -ok (Run a command interactively with results of find
command)
• -o (logical or)
• -print (display results. Usually included.)
find syntax: find path expression action
For example:
• find / -name "*.txt" -print
• find / -name "*.txt" -exec ls -l {} \;
(Runs ls -l where names found are substituted for {}.
; indicates the end of the command to be run, and \ removes its usual
interpretation by the shell as a command terminator.)
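The find expressions above can be tried safely against a small scratch tree. This is a sketch under stated assumptions: the directory layout, file names, and variable names are made up for illustration, and mktemp is assumed to be available.

```shell
# Build a small scratch tree to search (assumes mktemp exists).
workdir=$(mktemp -d)
mkdir "$workdir/sub"
touch "$workdir/a.txt" "$workdir/sub/b.txt" "$workdir/c.log"

# -name: quote the pattern so that find, not the shell, expands it.
txt_count=$(find "$workdir" -name "*.txt" -print | wc -l | tr -d ' ')

# -exec: run ls -l once per match; {} is replaced by each name found
# and \; marks the end of the command.
find "$workdir" -name "*.txt" -exec ls -l {} \;

# -o: logical or, grouped with escaped parentheses.
both_count=$(find "$workdir" \( -name "*.txt" -o -name "*.log" \) -print | wc -l | tr -d ' ')

echo "txt files: $txt_count, txt or log files: $both_count"
rm -r "$workdir"
```

Starting the search at a specific directory rather than / keeps the run time short and avoids permission errors from directories the user cannot read.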
grep Search for pattern, for example, grep pattern files.
pattern can include regular expressions.
• -c (count lines with matches, but do not list them)
• -l (list files with matches, but do not list the matching lines)
• -n (list line numbers with lines)
• -v (list lines that do not match the pattern)
Expression metacharacters:
• [ ] matches any single character inside.
• With a - in [ ] matches a range of characters
• ^ matches BOL when ^ begins the pattern.
• $ matches EOL when $ ends the pattern.
• . matches any single character. (same as ? in shell)
• * matches 0 or more occurrences of the preceding character.
(Note: ".*" is the same as "*" in the shell).
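The grep options and metacharacters listed above can be seen at work on a small sample file. The file contents and variable names below are invented for illustration, and mktemp is assumed to be available.

```shell
# Sample data in a scratch directory (assumes mktemp exists).
workdir=$(mktemp -d)
printf 'root:system\nguest:usr\nadm:adm\nrootkit:staff\n' > "$workdir/sample"

# ^ anchors the pattern to the beginning of the line; -c counts matching lines.
root_count=$(grep -c '^root' "$workdir/sample")

# $ anchors the pattern to the end of the line; -n prefixes the line number.
numbered=$(grep -n 'adm$' "$workdir/sample")

# -v selects the lines that do NOT match the pattern.
non_matching=$(grep -v 'root' "$workdir/sample")

# . matches any single character: exactly three characters before the colon.
three_char_names=$(grep -c '^...:' "$workdir/sample")

# [ ] with a - matches a range: lines that begin with a letter from a to g.
a_to_g=$(grep -c '^[a-g]' "$workdir/sample")

echo "$root_count $numbered $three_char_names $a_to_g"
echo "$non_matching"
rm -r "$workdir"
```

Single quotation marks around each pattern keep the shell from interpreting $, *, and [ ] before grep sees them.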
sed Stream (text) editor that is used for editing flat files
sort Sort and merge files
-r (reverse order); -u (keep only unique lines)

Editors
ed Line editor
vi Screen editor
INed LPP editor
emacs Screen editor +
Shells, redirection, and pipelining

< (read) Redirect standard input, for example, command < file reads input
for command from file.
> (write) Redirect standard output, for example, command > file writes
output for command to file overwriting contents of file.
>> (append) Redirect standard output, for example, command >> file appends
output for command to the end of file.
2> Redirect standard error (to append standard error to a file, use
command 2>> file). Combined redirection examples:
• command < infile > outfile 2> errfile
• command >> appendfile 2>> errfile < infile
; Command terminator that is used to string commands on single line
| Pipe information from one command to the next command. For
example, ls | cpio -o > /dev/fd0 passes the results of the
ls command to the cpio command.
\ Continuation character to continue a command on a new line; the shell
prompts with > for the continuation
tee Reads standard input and writes it to both standard output and a
file, for example,
ls | tee ls.save | sort results in ls output being saved to ls.save and
piped to the sort command
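The redirection and pipeline operators above can be combined as in the following sketch. All file names are hypothetical and live in a scratch directory created with mktemp (assumed to be available).

```shell
workdir=$(mktemp -d)

echo "banana" > "$workdir/out.txt"     # > creates or overwrites the file
echo "apple" >> "$workdir/out.txt"     # >> appends to the end of the file

# 2> captures standard error; the directory is deliberately nonexistent,
# and || true keeps the failure from stopping a script run with set -e.
ls "$workdir/no_such_dir" > "$workdir/ls.out" 2> "$workdir/ls.err" || true
err_text=$(cat "$workdir/ls.err")

# < supplies standard input; tee saves a copy while passing the data on.
first_sorted=$(sort < "$workdir/out.txt" | tee "$workdir/sorted.save" | sed -n 1p)
saved_copy=$(cat "$workdir/sorted.save")

# ; strings two commands on a single line.
line_count=$(wc -l < "$workdir/out.txt" | tr -d ' '); echo "lines: $line_count"

echo "first after sort: $first_sorted"
rm -r "$workdir"
```

Because tee sits in the middle of the pipeline, sorted.save holds the full sorted output even though only the first line is kept by the final command.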
Metacharacters
* Any number of characters (0 or more)
? Any single character
[abc] Any single character from the list
[a-c] Any single character from the range
! Not any of the following characters (for example, [!abc])
; Command terminator that is used to string commands on a single line
& The preceding command is to be run in background mode

# Comment character
\ Removes special meaning (no interpretation) of the following
character
' Removes special meaning (no interpretation) of characters in
quotation marks
" Interprets only $, backquote, and \ characters between the quotation
marks
` Used to set variable to results of a command.
For example, now=`date` sets the value of now to current results of
the date command
$ Preceding variable name indicates the value of the variable
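A short sketch of how the file-name metacharacters and the quoting characters behave. The file names and variable names are invented for illustration, and mktemp is assumed to be available.

```shell
# Scratch directory with a known set of file names (assumes mktemp exists).
start_dir=$PWD
workdir=$(mktemp -d)
touch "$workdir/a1" "$workdir/b1" "$workdir/c12" "$workdir/d1"
cd "$workdir" || exit 1

all_files=$(echo *)         # * matches any number of characters
two_chars=$(echo ?1)        # ? matches exactly one character
in_range=$(echo [a-c]*)     # [a-c] matches one character from the range
not_listed=$(echo [!abc]*)  # [!abc] matches one character not in the list

name="AIX"
single='$name'     # single quotation marks: no interpretation at all
double="$name"     # double quotation marks: $ is still interpreted
escaped=\$name     # backslash removes the special meaning of $
year=`date +%Y`    # backquotes: set variable to the output of a command

echo "$all_files | $two_chars | $in_range | $not_listed"
echo "$single | $double | $escaped | $year"

cd "$start_dir" || exit 1
rm -r "$workdir"
```

The difference between the single and double forms matters most when a string contains $ or a backquote that should be passed through unchanged.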
Physical and logical storage

chfs Changes file system attributes such as mount point, permissions, and
size
compress Reduces the size of the specified file by using the adaptive LZ
algorithm
crfs Creates a file system within a previously created logical volume
extendlv Extends the size of a logical volume
extendvg Extends a volume group by adding a physical volume
fsck Checks for file system consistency, and allows interactive repair of file
systems
fuser Lists the process numbers of local processes that use the files that
are specified
lsattr Lists the attributes of the devices that are known to the system
lscfg Gives detailed information about the AIX system hardware
configuration
lsdev Lists the devices that are known to the system
lsfs Displays characteristics of the specified file system such as mount
points, permissions, and file system size
lslv Shows you information about a logical volume
lspv Shows you information about a physical volume in a volume group
lsvg Shows you information about the volume groups in your system
lvmstat Controls LVM statistic gathering
migratepv Used to move physical partitions from one physical volume to another
migratelp Used to move logical partitions to other physical disks
mkdev Configures a device

mkfs Makes a new file system on the specified device

mklv Creates a logical volume
mkvg Creates a volume group
mount Instructs the operating system to make the specified file system
available for use from the specified point
quotaon Starts the disk quota monitor
rmdev Removes a device
rmlv Removes logical volumes from a volume group
rmlvcopy Removes copies from a logical volume
umount Unmounts a file system from its mount point
uncompress Restores files compressed by the compress command to their
original size
unmount The same function as the umount command
varyoffvg Deactivates a volume group so that it cannot be accessed
varyonvg Activates a volume group so that it can be accessed
Variables
= Set a variable (for example, d="day" sets the value of d to "day").
Can also set the variable to the results of a command by the `
character. For example, now=`date` sets the value of now to the
current result of the date command.
HOME Home directory
PATH Path to be checked
SHELL Shell to be used
TERM Terminal being used
PS1 Primary prompt characters, usually $ or #
PS2 Secondary prompt characters, usually >
$? Return code of the last command run
set Displays current local variable settings
export Exports variable so the child process can inherit the variable
env Displays inherited variables
echo Echo a message (for example, echo HI or echo $d).
Can turn off carriage returns with \c at the end of the message.
Can print a blank line with \n at the end of the message.
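The following sketch shows the difference between a local variable and an exported one, and how $? is read. The variable names other than d and now are assumptions made for this example.

```shell
d="day"                   # set a variable (no spaces around =)
now=`date`                # set a variable to the result of a command
local_only="shell only"   # never exported, so children cannot see it
export d                  # exported variables are inherited by child processes

# The child shell sees d but not local_only.
child_view=$(sh -c 'echo "d=$d local_only=$local_only"')
echo "$child_view"

# $? holds the return code of the last command run.
rc=0
sh -c 'exit 3' || rc=$?
echo "return code: $rc"
echo "now: $now"
```

Running set in the parent shell would list both d and local_only, while env would list only d.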

Tapes and diskettes

dd Reads a file in, converts the data (if required), and copies the file out
fdformat Formats diskettes or read/write optical media disks
flcopy Copies information to and from diskettes
format AIX command to format a diskette
backup Backs up individual files
• -i reads file names from standard input
• -v lists files as they are backed up
• For example, backup -iv -f/dev/rmt0 file1, file2
• -u backup file system at specified level; For example, backup
-level -u filesystem
Can pipe list of files to be backed up into command, for example,
find . -print | backup -ivf/dev/rmt0 where you are in
directory to be backed up.
mksysb Creates an installable image of the root volume group
restore Restores files from backup
• -x restores files that are created with backup -i
• -v lists files as they are restored
• -T lists files that are stored on tape or diskette
• -r restores file system that is created with backup -level -u
For example, restore -xv -f/dev/rmt0
cpio Copies to and from an I/O device. Overwrites all data previously on
tape or diskette. For input, must be able to place files in the same
relative (or absolute) path name as when copied out (can determine
path names with -it option). For input, if file exists, compares last
modification date and keeps most recent (can override with -u
option).
• -o (output)
• -i (input),
• -t (table of contents)
• -v (verbose),
• -d (create needed directory for relative path names)
• -u (unconditional to override last modification date)
For example, cpio -o > /dev/fd0 or
cpio -iv file1 < /dev/fd0
tapechk Runs simple consistency checking for streaming tape drives
tcopy Copies information from one tape device to another
tctl Sends commands to a streaming tape device
tar Alternative utility to back up and restore files

pax Alternative utility to cpio and tar commands
Transmitting
mail Send and receive mail. With user ID sends mail to user ID. Without
user ID, displays your mail. When processing your mail, at the ?
prompt for each mail item, you can:
• d - delete
• s - append
• q - quit
• enter - skip
• m - forward
mailx Upgrade of mail
uucp Copy file to other UNIX systems (UNIX to UNIX copy)
uuto/uupick Send and retrieve files to public directory
uux Run on remote system (UNIX to UNIX execute)
System administration
df Display file system usage
installp Install program
kill PID Stop the process with the given process ID (PID) (find the PID by using ps);
kill -9 PID kills the process unconditionally
mount Associate logical volume to a directory;
For example, mount device directory
ps -ef Shows process status (ps -ef)
umount Disassociate file system from directory
smit System management interface tool
Miscellaneous
banner Displays banner
date Displays current date and time
newgrp Change active groups
nice Assigns lower priority to following command (for example,
nice ps -f)
passwd Modifies current password
sleep n Sleep for n seconds

stty Show or set terminal settings

touch Create a zero-length file
xinit Initiate X Windows
wall Sends message to all logged in users
who List users who are currently logged in (who am i identifies this user)
man,info Displays manual pages
System files
/etc/group List of groups
/etc/motd Message of the day, which is displayed at login
/etc/passwd List of users and signon information. The password field is shown as !;
editing the file to remove the ! prevents password checking
/etc/profile System-wide user profile that is executed at login, can override
variables by resetting in the user's .profile file
/etc/security Directory not accessible to normal users
/etc/security/environ User environment settings
/etc/security/group Group attributes
/etc/security/limits User limits
/etc/security/login.cfg Login settings
/etc/security/passwd User passwords
/etc/security/user User attributes, password restrictions
Shell programming summary
Variables
var=string Set variable to equal string. (NO SPACES). Spaces must be enclosed
by double quotation marks. Special characters in string must be
enclosed by single quotation marks to prevent substitution. Piping (|),
redirection (<, >, >>), and & symbols are not interpreted.
$var Gives value of var in a command
echo Displays value of var, for example, echo $var
HOME = Home directory of user
MAIL = Mail file name
PS1 = Primary prompt characters, usually "$" or "#"
PS2 = Secondary prompt characters, usually ">"

PATH = Search path

TERM = Terminal type
export Exports variables to the environment
env Displays environment variables settings
${var:-string} Gives value of var in a command, if var is null, uses string instead
$1 $2 $3... Positional parameters for the arguments that are passed into the shell script
$* Used for all arguments that are passed into shell script
$# Number of arguments that are passed into shell script
$0 Name of shell script
$$ Process ID (PID)
$? Last return code from a command
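The special variables above can be illustrated with a shell function, which receives positional parameters the same way a script does. The function name show_args and the other variable names are hypothetical.

```shell
# Hypothetical function standing in for a script: $#, $1, $2, and $*
# behave the same inside a function as inside a script.
show_args() {
    echo "count=$# first=$1 second=$2 all=$*"
}
args_report=$(show_args alpha beta gamma)

# ${var:-string}: greeting is null, so the string is used instead.
greeting=""
with_default=${greeting:-hello}

# $? is the return code of the last command; $$ is this shell's PID.
sh -c 'exit 0'
last_rc=$?

echo "$args_report"
echo "default=$with_default rc=$last_rc pid=$$"
```

${var:-string} leaves greeting itself unchanged; only the expansion is replaced by the default.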
Commands
# Comment designator
&& Logical-and. Run command after && only if the command that precedes &&
succeeds (return code = 0)
|| Logical-or. Run command after || only if the command that precedes
|| fails (return code <> 0)
exit n Used to pass return code n from shell script, passed as variable
$? to parent shell
expr Arithmetic expressions
Syntax: "expr expression1 operator expression2"
Operators: + - \* (multiply) / (divide) % (remainder)
for loop for n (or: for variable in $*); for example:
do
command
done
if-then-else if test expression
then command
elif test expression
then command
else command
fi
read Read from standard input
shift Shifts arguments 1-9 one position to the left and decrements number
of arguments

test Used for conditional test, has two formats.

if test expression (for example, if test $# -eq 2)
if [ expression ]
(for example, if [ $# -eq 2 ]) (spaces required)
Integer operators:
-eq (=) -lt (<) -le (<=)
-ne (<>) -gt (>) -ge (>=)
String operators:
= != (not eq.) -z (zero length)
File status (for example, -opt file1)
• -f (ordinary file)
• -r (readable by this process)
• -w (writable by this process)
• -x (executable by this process)
• -s (Nonzero length)
while loop while test expression
do
command
done
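The constructs in this section can be combined as in the following sketch. The function names classify and count_with_shift, and all variable names, are hypothetical and chosen only for this example.

```shell
# if / elif / else with the [ expression ] form of test (spaces required).
classify() {
    if [ "$1" -lt 0 ]
    then echo negative
    elif [ "$1" -eq 0 ]
    then echo zero
    else echo positive
    fi
}

# for loop over a list, summing with expr (operands separated by spaces).
total=0
for n in 1 2 3 4
do
    total=`expr $total + $n`
done

# while loop with shift: consume the arguments one at a time.
count_with_shift() {
    seen=0
    while [ $# -gt 0 ]
    do
        seen=`expr $seen + 1`
        shift
    done
    echo "$seen"
}

# && runs the next command only on success, || only on failure.
[ -d / ] && root_check="is a directory"
[ -f / ] || file_check="not an ordinary file"

# The multiply operator must be escaped so the shell does not expand it.
product=`expr 6 \* 7`

echo "$total $product $root_check, $file_check"
```

For example, classify 0 prints zero, and count_with_shift a b c prints 3.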
Miscellaneous
sh Run shell script in the sh shell
-x (execute step-by-step, used for debugging shell scripts)
vi Editor
Entering vi
vi file Edits the file named file
vi file file2 Edit files consecutively (through :n)
.exrc File that contains the vi profile
wm=nn Sets wrap margin to nn. Can enter a file other than at first line by
adding + (last line), +n (line n), or +/pattern (first occurrence of
pattern).
vi -r Lists saved files
vi -r file Recover file that is named file from crash
:n Next file in stack
:set all Show all options
:set nu Display line numbers (off when set nonu)

:set list Display control characters in file

:set wm=n Set wrap margin to n
:set showmode Sets display of "INPUT" when in input mode
Read, write, exit

:w Write buffer contents
:w file2 Write buffer contents to file2
:w >> file2 Write buffer contents to end of file2
:q Quit editing session
:q! Quit editing session and discard any changes
:r file2 Read file2 contents into buffer after current cursor
:r! com Read results of shell command com after current cursor
:! Escape to the shell to run a command (filter through command)
:wq or ZZ Write and quit edit session
Units of measure
h, l Character left, character right
k or <Ctrl>p Move cursor to character above cursor
j or <Ctrl>n Move cursor to character below cursor
w, b Word right, word left
^, $ Beginning, end of current line
<CR> or + Beginning of next line
- Beginning of previous line
G Last line of buffer
Cursor movements
Can precede cursor movement commands (including cursor arrow) with number of times to repeat,
for example, 9--> moves right 9 characters.
0 Move to first character in line
$ Move to last character in line
^ Move to first nonblank character in line
fx Move right to character x
Fx Move left to character x

tx Move right to character that precedes character x

Tx Move left to character that follows character x
; Find next occurrence of x in same direction
, Find next occurrence of x in opposite direction
w Tab word (nw = n tab word) (punctuation is a word)
W Tab word (nW = n tab word) (ignore punctuation)
b Backtab word (punctuation is a word)
B Backtab word (ignore punctuation)
e Tab to ending character of next word (punctuation is a word)
E Tab to ending character of next word (ignore punctuation)
( Move to beginning of current sentence
) Move to beginning of next sentence
{ Move to beginning of current paragraph
} Move to beginning of next paragraph
H Move to first line on screen
M Move to middle line on screen
L Move to last line on screen
<Ctrl>f Scroll forward one screen (three lines overlap)
<Ctrl>d Scroll forward 1/2 screen
<Ctrl>b Scroll backward one screen (0 line overlap)
<Ctrl>u Scroll backward 1/2 screen
G Go to last line in file
nG Go to line n
<Ctrl>g Display current line number
Search and replace

/pattern Search forward for pattern
?pattern Search backward for pattern
n Repeat find in the same direction
N Repeat find in the opposite direction
Adding text
a Add text after the cursor (end with <esc>)

A Add text at end of current line (end with <esc>)

i Add text before the cursor (end with <esc>)
I Add text before first nonblank character in current line
o Add line after current line
O Add line before current line
<esc> Return to command mode
Deleting text
<Ctrl>w Undo entry of current word
@ Delete the insert on this line
x Delete current character
dw Delete to end of current word (observe punctuation)
dW Delete to end of current word (ignore punctuation)
dd Delete current line
D Erase to end of line (same as d$)
d) Delete current sentence
d} Delete current paragraph
dG Delete current line through end of buffer
d^ Delete to the beginning of line
u Undo last change command
U Restore current line to original state before modification
Replacing text
ra Replace current character with a
R Replace all characters that are written over until <esc> is entered
s Delete current character and append text until <esc>
:s/s1/s2 Replace s1 with s2 (in the same line only)
S Delete all characters in the line and append text
cc Replace all characters in the line (same as S)
ncx Delete n text objects of type x (w, b = words, ) = sentences, } =
paragraphs, $ = end-of-line, ^ = beginning of line) and enter append
mode
C Replace all characters from cursor to end of line

Moving text
p Paste last deleted text after cursor (xp will transpose 2
characters)
P Paste last deleted text before cursor
nYx Yank n text objects of type x (w, b = words, ) = sentences, } =
paragraphs, $ = end of line; no "x" indicates lines). Can then paste
them with p command. Yank does not delete the original.
"ayy Can use named registers for moving, copying, cut/paste with "ayy for
register a (use registers a-z); can then paste them with the "ap command.
Miscellaneous
. Repeat last command
J Join current line with next line

Appendix C. AIX dump code and progress codes
This appendix is an extract from the AIX 4.3 Messages Guide and Reference.
0c0 - 0cc
0c0 A user-requested dump completed successfully.
0c1 An I/O error occurred during the dump.
0c2 A user-requested dump is in progress. Wait at least 1 minute for the dump to
complete.
0c4 The dump ran out of space. Partial dump is available.
0c5 The dump failed due to an internal failure. A partial dump might exist.
0c7 Progress indicator. Remote dump is in progress.
0c8 The dump device is disabled. No dump device configured.
0c9 A system-initiated dump started. Wait at least 1 minute for the dump to
complete.
0cc (AIX 4.2.1 and later) An error occurred writing to the primary dump device. It
switched over to the secondary.
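When a dump code is read from the operator panel, a small lookup can turn it into the meaning listed above. The following is only a sketch: the function name dump_status is hypothetical, and the message texts are shortened from this table rather than taken from any AIX command.

```shell
# Hypothetical helper: map a dump LED code to its meaning from the table above.
# Returns 1 for any code that is not in the 0c0 - 0cc table.
dump_status() {
    case "$1" in
        0c0) echo "user-requested dump completed successfully" ;;
        0c1) echo "I/O error occurred during the dump" ;;
        0c2) echo "user-requested dump in progress" ;;
        0c4) echo "dump ran out of space; partial dump available" ;;
        0c5) echo "dump failed due to an internal failure" ;;
        0c7) echo "remote dump in progress" ;;
        0c8) echo "dump device disabled; no dump device configured" ;;
        0c9) echo "system-initiated dump started" ;;
        0cc) echo "switched over to the secondary dump device" ;;
        *)   echo "not a dump code in this table"; return 1 ;;
    esac
}

dump_status 0c4
```

The nonzero return code for unknown codes lets a calling script distinguish dump codes from the progress codes in the sections that follow.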
100 - 195
100 Progress indicator. BIST completed successfully.
101 Progress indicator. Initial BIST started following system reset.
102 Progress indicator. BIST started following power-on reset.
103 BIST could not determine the system model number.
104 BIST could not find the common on-chip processor bus address.
105 BIST could not read from the on-chip sequencer EPROM.
106 BIST detected a module failure.
111 On-chip sequencer stopped. BIST detected a module error.
112 Checkstop occurred during BIST and checkstop results could not be logged
out.
113 The BIST checkstop count equals 3 which means three unsuccessful system
restarts. System halts.
120 Progress indicator. BIST started CRC check on the EPROM.
121 BIST detected a bad CRC on the on-chip sequencer EPROM.
122 Progress indicator. BIST started a CRC check on the EPROM.
123 BIST detected a bad CRC on the on-chip sequencer NVRAM.

124 Progress indicator. BIST started a CRC check on the on-chip sequencer
NVRAM.
125 BIST detected a bad CRC on the time-of-day NVRAM.
126 Progress indicator. BIST started a CRC check on the time-of-day NVRAM.
127 BIST detected a bad CRC on the EPROM.
130 Progress indicator. BIST presence test started.
140 BIST was unsuccessful. The system halts.
143 Invalid memory configuration
151 Progress indicator. BIST started.
152 Progress indicator. BIST started direct-current logic self-test (DCLST) code.
153 Progress indicator. BIST started.
154 Progress indicator. BIST started array self-test (AST) test code.
160 BIST detected a missing early power-off warning (EPOW) connector.
161 The Bump quick I/O tests failed.
162 The JTAG tests failed.
164 BIST encountered an error while reading low NVRAM.
165 BIST encountered an error while writing low NVRAM.
166 BIST encountered an error while reading high NVRAM.
167 BIST encountered an error while writing high NVRAM.
168 BIST encountered an error while reading the serial input/output register.
169 BIST encountered an error while writing the serial input/output register.
180 Progress indicator. The BIST checkstop logout is in progress.
182 BIST COP bus is not responding.
185 Checkstop occurred during BIST.
186 System logic-generated checkstop (Model 250 only).
187 BIST was unable to identify the chip release level in the checkstop logout
data.
195 Progress indicator. The BIST checkstop logout completed.
200 - 299, 2e6-2e7

200 Key mode switch is in the secure position.
201 Checkstop occurred during system restart. If a 299 LED was shown before,
re-create the boot logical volume (bosboot).
202 Unexpected machine check interrupt, system halts
203 Unexpected data storage interrupt, system halts
204 Unexpected instruction storage interrupt, system halts
205 Unexpected external interrupt, system halts
206 Unexpected alignment interrupt, system halts
207 Unexpected program interrupt, system halts
208 Machine check due to an L2 uncorrectable ECC, system halts
209 Reserved, system halts
210 Unexpected supervisor call (SVC) 1000 interrupt, system halts
211 IPL ROM CRC miscompare occurred during system restart, system halts
212 POST found processor to be bad, system halts
213 POST failed. No good memory could be detected, the system halts.
214 An I/O planar failure was detected. The power status register, the time-of-day
clock, or NVRAM on the I/O planar failed. The system halts.
215 Progress indicator. The level of voltage that is supplied to the system is too
low to continue a system restart.
216 Progress indicator. The IPL ROM code is being uncompressed into memory
for execution.
217 Progress indicator. The system encountered the end of the boot devices list.
The system continues to loop through the boot devices list.
218 Progress indicator. POST is testing for 1MB of good memory.
219 Progress indicator. POST bit map is being generated.
21c L2 cache was not detected as part of systems configuration (when LED
persists for 2 seconds).
220 Progress indicator. IPL control block is being initialized.
221 An NVRAM CRC miscompare occurred while loading the operating system
with the key mode switch in Normal position. System halts.
222 Progress indicator. Attempting a Normal-mode system restart from the
standard I/O planar-attached devices. System tries to restart.
223 Progress indicator. Attempting a Normal-mode system restart from the
SCSI-attached devices that are specified in the NVRAM list.
224 Progress indicator. Attempting a Normal-mode system restart from the 9333
High-Performance Disk Drive Subsystem.
225 Progress indicator. Attempting a Normal-mode system restart from the
bus-attached internal disk.
226 Progress indicator. Attempting a Normal-mode system restart from Ethernet.

227 Progress indicator. Attempting a Normal-mode system restart from token ring.
228 Progress indicator. Attempting a Normal-mode system restart by using the
expansion code devices list, but cannot restart from any of the devices in the
list.
229 Progress indicator. Attempting a Normal-mode system restart from devices in
NVRAM boot devices list, but cannot restart from any of the devices in the list.
System tries to restart.
22c Progress indicator. Attempting a Normal-mode IPL from FDDI specified in the
NVRAM device list.
230 Progress indicator. Attempting a Normal-mode system restart from Family 2
Feature ROM specified in the IPL ROM default devices list.
231 Progress indicator. Attempting a Normal-mode system restart from Ethernet
that is specified by selection from ROM menus.
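232 Progress indicator. Attempting a Normal-mode system restart from the
standard I/O planar-attached devices that are specified in the IPL ROM
default device list.
233 Progress indicator. Attempting a Normal-mode system restart from the
SCSI-attached devices that are specified in the IPL ROM default device list.
234 Progress indicator. Attempting a Normal-mode system restart from the 9333
High-Performance Disk Drive Subsystem that is specified in the IPL ROM
default device list.
235 Progress indicator. Attempting a Normal-mode system restart from the
bus-attached internal disk that is specified in the IPL ROM default device list.
236 Progress indicator. Attempting a Normal-mode system restart from
Ethernet that is specified in the IPL ROM default device list.
237 Progress indicator. Attempting a Normal-mode system restart from the token
ring that is specified in the IPL ROM default device list.
238 Progress indicator. Attempting a Normal-mode system restart from the
token-ring that is specified by selection from ROM menus.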
239 Progress indicator. A Normal-mode menu selection failed to boot.
23c Progress indicator. Attempting a Normal-mode IPL from FDDI in IPL ROM
device list.
240 Progress indicator. Attempting a Service-mode system restart from the
Family 2 Feature ROM specified in the NVRAM boot devices list.
241 Attempting a Normal-mode system restart from devices that are specified in
NVRAM bootlist.

242 Progress indicator. Attempting a Service-mode system restart from the
standard I/O planar-attached devices that are specified in the NVRAM boot
devices list.
243 Progress indicator. Attempting a Service-mode system restart from the
SCSI-attached devices that are specified in the NVRAM boot devices list.
244 Progress indicator. Attempting a Service-mode system restart from the 9333
High-Performance Disk Drive Subsystem that is specified in the NVRAM boot
devices list.
245 Progress indicator. Attempting a Service-mode system restart from the
bus-attached internal disk that is specified in the NVRAM boot devices list.
246 Progress indicator. Attempting a Service-mode system restart from
Ethernet that is specified in the NVRAM boot devices list.
247 Progress indicator. Attempting a Service-mode system restart from
Token-Ring that is specified in the NVRAM boot devices list.
248 Progress indicator. Attempting a Service-mode system restart by using the
expansion code that is specified in the NVRAM boot devices list.
249 Progress indicator. Attempting a Service-mode system restart from devices in
NVRAM boot devices list, but cannot restart from any of the devices in the list.
250 Progress indicator. Attempting a Service-mode system restart from the
Family 2 Feature ROM specified in the IPL ROM default devices list.
251 Progress indicator. Attempting a Service-mode system restart from Ethernet
by selection from ROM menus.
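252 Progress indicator. Attempting a Service-mode system restart from the
standard I/O planar-attached devices that are specified in the IPL ROM
default devices list.
253 Progress indicator. Attempting a Service-mode system restart from the
SCSI-attached devices that are specified in the IPL ROM default devices list.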
254 Progress indicator. Attempting a Service-mode system restart from the 9333
High-Performance Subsystem devices that are specified in the IPL ROM
default devices list.
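255 Progress indicator. Attempting a Service-mode system restart from the
bus-attached internal disk that is specified in the IPL ROM default devices list.
256 Progress indicator. Attempting a Service-mode system restart from
Ethernet that is specified in the IPL ROM default devices list.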
257 Progress indicator. Attempting a Service-mode system restart from the token
ring that is specified in the IPL ROM default devices list.
258 Progress indicator. Attempting a Service-mode system restart from the token
ring that is specified by selection from ROM menus.
259 Progress indicator. Attempting a Service-mode system restart from FDDI
specified by the operator.
260 Progress indicator. Menus are being displayed on the local display or terminal
that is connected to your system. The system waits for input from the
terminal.
261 No supported local system display adapter was found. The system waits for a
response from an asynchronous terminal on serial port 1.
262 No local system keyboard was found.
263 Progress indicator. Attempting a Normal-mode system restart from the Family
2 Feature ROM specified in the NVRAM boot devices list.
269 Progress indicator. Cannot boot system, end of bootlist reached.
270 Progress indicator. Ethernet/FDX 10 Mbps MC adapter POST is running.
271 Progress indicator. Mouse and mouse port POST are running.
272 Progress indicator. Tablet port POST is running.
276 Progress indicator. A 10/100 Mbps Ethernet MC adapter POST is running.
277 Progress indicator. Auto Token Ring LAN streamer MC 32 adapter POST is
running.
278 Progress indicator. Video ROM scan POST is running.
279 Progress indicator. FDDI POST is running.
280 Progress indicator. 3Com Ethernet POST is running.
281 Progress indicator. Keyboard POST is running.
282 Progress indicator. Parallel port POST is running.
283 Progress indicator. Serial port POST is running.
284 Progress indicator. POWER Gt1 graphics adapter POST is running.
286 Progress indicator. Token Ring adapter POST is running.
287 Progress indicator. Ethernet adapter POST is running.
288 Progress indicator. Adapter slot cards are being queried.
290 Progress indicator. I/O planar test started.
291 Progress indicator. Standard I/O planar POST is running.
292 Progress indicator. SCSI POST is running.
293 Progress indicator. Bus-attached internal disk POST is running.
294 Progress indicator. TCW SIMM in slot J is bad.
295 Progress indicator. Color Graphics Display POST is running.
296 Progress indicator. Family 2 Feature ROM POST is running.

297 Progress indicator. System model number could not be determined. System
halts.
298 Progress indicator. Attempting a warm system restart.
299 Progress indicator. IPL ROM passed control to loaded code.
2e6 Progress indicator. A PCI Ultra/Wide differential SCSI adapter is being
configured.
2e7 An undetermined PCI SCSI adapter is being configured.
500 - 599, 5c0 - 5c6

500 Progress indicator. Querying standard I/O slot.
501 Progress indicator. Querying card in slot 1.
510 Progress indicator. Starting device configuration.
511 Progress indicator. Device configuration completed.
512 Progress indicator. Restoring device configuration from media.
513 Progress indicator. Restoring BOS installation files from media.
516 Progress indicator. Contacting server during network boot.
517 Progress indicator. The / (root) and /usr file systems are being mounted.
518 Mount of the /usr file system was not successful. System halts.
520 Progress indicator. BOS configuration is running.
521 The /etc/inittab file was incorrectly modified or is damaged. The
configuration manager was started from the /etc/inittab file with invalid
options. System halts.
522 The /etc/inittab file was incorrectly modified or is damaged. The
configuration manager was started from the /etc/inittab file with conflicting
options. System halts.
523 The /etc/objrepos file is missing or inaccessible.
524 The /etc/objrepos/Config_Rules file is missing or inaccessible.
525 The /etc/objrepos/CuDv file is missing or inaccessible.
526 The /etc/objrepos/CuDvDr file is missing or inaccessible.

527 You cannot run Phase 1 now. The /sbin/rc.boot file was probably incorrectly
modified or is damaged.
528 The /etc/objrepos/Config_Rules file was incorrectly modified or is
damaged, or a program that is specified in the file is missing.
529 There is a problem with the device that contains the ODM database or the
root file system is full.
530 The savebase command was unable to save information about the base
customized devices onto the boot device during Phase 1 of system boot.
System halts.
531 The /usr/lib/objrepos/PdAt file is missing or inaccessible. System halts.
532 There is not enough memory for the configuration manager to continue.
System halts.
533 The /usr/lib/objrepos/PdDv file was incorrectly modified or is damaged, or a
program that is specified in the file is missing.
534 The configuration manager is unable to acquire a database lock. System
halts.
535 A HIPPI diagnostics interface driver is being configured.
damaged. System halts.
damaged. System halts.
538 Progress indicator. The configuration manager is passing control to a
configuration method.
539 Progress indicator. The configuration method finished and control was
returned to the configuration manager.
540 Progress indicator. Configuring child of IEEE-1284 parallel port.
544 Progress indicator. An ECP peripheral configure method is running.
545 Progress indicator. A parallel port ECP device driver is being configured.
546 IPL cannot continue due to an error in the customized database.
547 Rebooting after error recovery (LED 546 precedes this LED).
548 restbase failure.
549 Console could not be configured for the “Copy a System Dump” menu.
550 Progress indicator. ATM LAN emulation device driver is being configured.
551 Progress indicator. A varyonvg operation of the rootvg is in progress.
552 The ipl_varyon command failed with a return code not equal to 4, 7, 8 or 9
(ODM or malloc failure). System is unable to vary on the rootvg.

553 The /etc/inittab file was incorrectly modified or is damaged. Phase 1 boot is
completed and the init command started.
554 The IPL device could not be opened or a read failed (hardware not configured
or missing).
555 The fsck -fp /dev/hd4 command on the root file system failed with a nonzero
return code.
556 LVM subroutine error from ipl_varyon.
557 The root file system could not be mounted. The problem is usually due to bad
information on the log logical volume (/dev/hd8) or the boot logical volume
(hd5) is damaged.
558 Not enough memory is available to continue system restart.
559 Less than 2 MB of good memory are left for loading the AIX kernel. System
halts.
560 Unsupported monitor is attached to the display adapter.
561 Progress indicator. The TMSSA device is being identified or configured.
565 Configuring the MWAVE subsystem.
566 Progress indicator. Configuring Namkan twinax common card.
567 Progress indicator. Configuring High-Performance Parallel Interface (HIPPI)
device driver (fpdev).
568 Progress indicator. Configuring High-Performance Parallel Interface (HIPPI)
device driver (fphip).
569 Progress indicator. FCS SCSI protocol device is being configured.
570 Progress indicator. A SCSI protocol device is being configured.
571 HIPPI common functions driver is being configured.
572 HIPPI IPI-3 master mode driver is being configured.
573 HIPPI IPI-3 slave mode driver is being configured.
574 HIPPI IPI-3 user-level interface is being configured.
575 A 9570 disk-array driver is being configured.
576 Generic async device driver is being configured.
577 Generic SCSI device driver is being configured.
578 Generic common device driver is being configured.
579 Device driver is being configured for a generic device.
580 Progress indicator. A HIPPI-LE interface (IP) layer is being configured.
581 Progress indicator. TCP/IP is being configured. The configuration method for
TCP/IP is being run.
582 Progress indicator. Token ring data link control (DLC) is being configured.
583 Progress indicator. Ethernet data link control (DLC) is being configured.
584 Progress indicator. IEEE Ethernet (802.3) data link control (DLC) is being
configured.
585 Progress indicator. SDLC data link control (DLC) is being configured.
586 Progress indicator. X.25 data link control (DLC) is being configured.
587 Progress indicator. Netbios is being configured.
588 Progress indicator. Bisync read-write (BSCRW) is being configured.
589 Progress indicator. SCSI target mode device is being configured.
590 Progress indicator. Diskless remote paging device is being configured.
591 Progress indicator. Logical Volume Manager device driver is being
configured.
592 Progress indicator. An HFT device is being configured.
593 Progress indicator. SNA device driver is being configured.
594 Progress indicator. Asynchronous I/O is being defined or configured.
595 Progress indicator. X.31 pseudo device is being configured.
596 Progress indicator. SNA DLC/LAPE pseudo device is being configured.
597 Progress indicator. Outboard communication server (OCS) is being
configured.
598 Progress indicator. OCS hosts are being configured during system reboot.
599 Progress indicator. FDDI data link control (DLC) is being configured.
5c0 Progress indicator. Streams-based hardware driver being configured.
5c1 Progress indicator. Streams-based X.25 protocol stack being configured.
5c2 Progress indicator. Streams-based X.25 COMIO emulator driver being
configured.
5c3 Progress indicator. Streams-based X.25 TCP/IP interface driver being
configured.
5c4 Progress indicator. FCS adapter device driver being configured.
5c5 Progress indicator. SCB network device driver for FCS is being configured.
5c6 Progress indicator. AIX SNA channel being configured.
c00 - c99
c00 AIX Install/Maintenance loaded successfully.
c01 Insert the AIX Install/Maintenance diskette.
c02 Diskettes inserted out of sequence.
c03 Wrong diskette inserted.

c04 Irrecoverable error occurred.

c05 Diskette error occurred.
c06 The rc.boot script is unable to determine the type of boot.
c07 Insert next diskette.
c08 RAM file system started incorrectly.
c09 Progress indicator. Writing to or reading from diskette.
c10 Platform-specific bootinfo is not in boot image.
c20 Unexpected system halt occurred. System is configured to enter the kernel
debug program instead of doing a system dump. Enter bosboot -D for
information about kernel debugger enablement.
c21 The ifconfig command was unable to configure the network for the client
network host.
c25 Client did not mount remote mini root during network installation.
c26 Client did not mount the /usr file system during the network boot.
c29 System was unable to configure the network device.
c31 If a console is not configured, the system pauses with this value and then
displays instructions for choosing a console.
c32 Progress indicator. Console is a high-function terminal.
c33 Progress indicator. Console is a tty.
c34 Progress indicator. Console is a file.
c40 Extracting data files from media.
c41 Could not determine the boot type or device.
c42 Extracting data files from diskette.
c43 Could not access the boot or installation tape.
c44 Initializing installation database with target disk information.
c45 Cannot configure the console. The cfgcon command failed.
c46 Normal installation processing.
c47 Could not create a PVID on a disk. The chgdisk command failed.
c48 Prompting you for input. BosMenus is being run.
c49 Could not create or form the JFS log.
c50 Creating rootvg on target disk.
c51 No paging devices were found.
c52 Changing from RAM environment to disk environment.
c53 Not enough space in /tmp to do a preservation installation. Make /tmp larger.
c54 Installing either BOS or additional packages.
c55 Could not remove the specified logical volume in a preservation installation.
c56 Running user-defined customization.
c57 Failure to restore BOS.
c58 Displaying message to turn the key.
c59 Could not copy either device special files, device ODM, or volume group
information from RAM to disk.
c61 Failed to create the boot image.
c70 Problem mounting diagnostic CD disk in stand-alone mode.
c99 Progress indicator. The diagnostic programs completed.
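
When a system stalls on one of these values, it can help to look the code up quickly from a saved copy of this table. The following is a minimal sketch only, not an AIX-supplied tool. It assumes that the table has been saved to a plain-text file (the name led_codes.txt and the function name led_lookup are hypothetical) with each entry joined onto a single line in the form "code description".

```shell
#!/bin/sh
# Hypothetical helper: look up a progress/LED code in a saved copy of this table.
# Assumption: the file holds one entry per line, "<code> <description>",
# for example: "551 Progress indicator. A varyonvg operation ...".
# The file name below is an assumption; override it with LED_TABLE.

LED_TABLE="${LED_TABLE:-./led_codes.txt}"

led_lookup() {
    # $1 = value shown on the operator panel, for example 552 or c31
    code=$(printf '%s' "$1" | tr 'A-Z' 'a-z')   # accept C31 as well as c31
    awk -v code="$code" '
        tolower($1) == code {
            sub(/^[^ ]+ +/, "")   # drop the code, keep the description
            print
            found = 1
            exit
        }
        END { if (!found) exit 1 }   # nonzero status when the code is not listed
    ' "$LED_TABLE"
}
```

For example, after sourcing the file, led_lookup 552 prints the description for code 552, and the function returns a nonzero status for a value that is not in the saved table.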

