
Slide 1

2006 Hewlett-Packard Development Company, L.P.


The information contained herein is subject to change without notice
Top Ten Performance Tips for HP-UX 11i v2 on HP Integrity Servers
June 2006
Simplifying Integrity Performance Team
BCS Transition Engineering and Consulting



Slide 2

May 31, 2006 2 2006 Hewlett-Packard Development Company, L.P.
What Will Help Customers Get the
Best Out-of-the-Box Performance?
This list covers both operating system and application
tuning
Gathered from direct discussions with HP-UX labs
These all apply to HP-UX on Integrity servers
some apply to HP-UX on PA-RISC as well
Performance tuning is an art; not every issue
will apply to every customer
These ten represent SOME of the areas that will help
customers but not everything
The order of listing does not imply priority



Slide 3

Tip #1: Use the Latest HP Compilers
Best performance will be achieved using the HP
Integrity compilers for HP-UX, rather than:
using open source compilers
executing PA-RISC binaries with the ARIES translator
Use current versions of the HP compilers
Download or order from DSPP:
www.hp.com/go/acc
www.hp.com/go/fortran
www.hp.com/go/java
Free for registered DSPP partners


HP continues to evolve HP-UX, its compilers, and development tools on HP Integrity servers for
improved performance.
Improvements are being added to the HP compilers with every release.
Some customers have been surprised to find that they are running PA-RISC binaries on their Integrity
servers without realizing it. You can use the file command on a binary to determine whether it is a
PA-RISC binary or a native Integrity binary.
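The file command is the check the slide recommends. As a rough illustration of what such a check looks at, the sketch below reads the ELF header of a binary and reports its machine type. The e_machine constants come from the ELF specification; the function names are hypothetical, and the sketch only distinguishes ELF binaries (anything else, such as a script or a non-ELF object format, is reported as "not ELF").

```python
import struct

# e_machine values defined by the ELF specification.
EM_PARISC = 15  # HP PA-RISC
EM_IA_64 = 50   # Intel Itanium (IA-64)


def classify_binary(header: bytes) -> str:
    """Classify an executable from the first 20 bytes of the file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        # Scripts and non-ELF object formats end up here.
        return "not ELF"
    # e_ident[5] (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    if machine == EM_IA_64:
        return "native Integrity (IA-64)"
    if machine == EM_PARISC:
        return "PA-RISC"
    return f"other ELF (e_machine={machine})"


def classify_file(path: str) -> str:
    """Read just enough of the file at 'path' to classify it."""
    with open(path, "rb") as f:
        return classify_binary(f.read(20))
```

Calling classify_file on each executable of an application gives a quick inventory of which binaries would run under the ARIES translator rather than natively.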

Slide 4

Performance Improvements over Time with New Compiler Releases
[Chart: application performance improvement (vertical axis, 0% to 60%) across compiler releases (horizontal axis: May-02, Oct-02, Jun-03, Mar-04, Dec-04, Sep-05, Jun-06); series: Integer, Technical, Commercial]
Application performance normalized to same hardware


The Integer and Technical results use the base options of +O4, Profile feedback, and large pages.
The Commercial result corresponds to a large commercial application built with +O2, Profile
feedback, and large pages.
The points on the chart correspond to different compiler releases.

Slide 5

Tip #2: Optimize, Optimize, Optimize!
Take advantage of the HP Integrity compiler optimizations
Can optimize selectively or across an entire application
Use Profile-Based Optimization (PBO)
+Oprofile=collect
Invoke Interprocedural Optimizer (IPO)
Can be used with level 2, 3, or 4 (for example, +O2 -ipo)
Trade-off - performance versus the ability to debug
Use the latest version of the Caliper performance tool at:
http://www.hp.com/go/caliper
Optimizing Itanium-based Applications (April 2006)
http://h21007.www2.hp.com/dspp/files/unprotected/Itanium/OptimizingApps-Itanium.pdf


Profile-based optimization (PBO) is a set of performance-improving code transformations that make use of an
execution profile gathered for an application. There are three steps to PBO:
1. Instrumentation: compile the program with profiling turned on
2. Data collection: run the program with representative data to collect execution profile statistics
3. Optimization: generate optimized code based on the profile

Note that instrumented programs run slower and should only be used to collect statistics for profile-based
optimization.
Better alias information and inlining improve optimization.
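The three PBO steps can be laid out as a command sequence. The sketch below is a dry-run helper that only builds the commands and does not execute them. Only +Oprofile=collect appears on the slide; the cc driver name, the +Oprofile=use option for the final compile, and the file names are assumptions to verify against your compiler's documentation.

```python
def pbo_steps(source, program, run_args, opt_level="+O2"):
    """Return the three PBO phases as (label, argv) pairs without running them.

    Assumptions (not from the slides): the 'cc' driver name and the
    '+Oprofile=use' option used in the final compile.
    """
    return [
        # 1. Instrumentation: compile with profiling turned on.
        ("instrument", ["cc", opt_level, "+Oprofile=collect", "-o", program, source]),
        # 2. Data collection: run the instrumented program on representative input.
        ("collect", ["./" + program, *run_args]),
        # 3. Optimization: recompile using the collected profile.
        ("optimize", ["cc", opt_level, "+Oprofile=use", "-o", program, source]),
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for label, argv in pbo_steps("app.c", "app", ["representative-input.dat"]):
        print(f"{label:10s} {' '.join(argv)}")
```

Keeping the instrumented build as a separate phase makes it harder to ship the slower instrumented binary by mistake.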


Slide 6

Performance Gains Through Optimization
[Chart: performance gain (vertical axis, 0% to 80%) by compiler and options: gcc4.0 O2, gcc4.0 O3, gcc4.1 O2, gcc4.1 O3, HP DD64 -O, HP DD64 -O PBO, HP DD64 -ipo, HP DD64 -ipo -PBO, HP DD32 Base]
Compiler optimizations deliver greater performance on HP Integrity than on other architectures



Slide 7

Tip #3: Investigate Memory
Requirements
Memory requirements for HP-UX on Integrity
should be about the same as on PA-RISC.
Differences may occur, depending on the applications
being run or when comparing to other UNIX operating
systems
Read the new article on DSPP, entitled Memory
Usage on HP-UX Integrity Servers
May see increases in code size with little system impact;
larger binaries on disk do not imply more memory
May need more memory if moving applications from 32-bit to
64-bit, or using lots of pointers in C++
http://h21007.www2.hp.com/dspp/files/unprotected/hpux/Itanium-Memory-Usage.pdf


The article covers three topics: Code expansion, data expansion, and object file expansion.
Expect an increase in code size compared to PA-RISC. This increase is a trade-off of using some of
the new performance features of Itanium processors. The effect on overall system memory usage
should be minor, because one copy of code is typically shared in memory by all instances of a
program.
Data size can also increase if, for example, a 32-bit application is migrated to the 64-bit
programming model.
Top and other tools may report huge Virtual Address spaces, especially for stacks, but these do not
affect actual memory usage.

Slide 8

Tip #4: Use Large Pages
The ability to dynamically change page sizes at run
time is a competitive advantage for HP-UX
Goal is to reduce the number of data TLB misses
This can be done globally by increasing:
vps_ceiling, vps_chatr_ceiling, vps_pagesize
Or it can be done by process (up to vps_chatr_ceiling):
chatr +pd 1M sets data page size maximum to 1 MB
Can change and rerun to find the best page size for
each application in powers of 4 (4 K, 16 K, 64 K, ...)


To ensure that chatr works properly on all versions of HP-UX, the recommended procedure is:
1. Terminate all processes running the program.
2. Make a copy of the program file.
3. Run the chatr command on the copy.
4. Copy the file that the chatr command was run on back over the original program file.
5. Run the program.
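The copy-and-replace procedure and the powers-of-4 page size sweep can be scripted. The following is a minimal sketch, not an HP-supplied tool: the function names are hypothetical, terminating the running processes and restarting the program are left to the caller, and the chatr invocation mirrors the "chatr +pd 1M" form shown on the slide with the file name appended.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def page_size_ladder(max_kb=1024):
    """Page sizes in KB stepping by powers of 4: 4, 16, 64, ... up to max_kb."""
    sizes, kb = [], 4
    while kb <= max_kb:
        sizes.append(kb)
        kb *= 4
    return sizes


def chatr_size_label(kb):
    """Format a size in KB the way the slide writes it, e.g. '64K' or '1M'."""
    return f"{kb}K" if kb < 1024 else f"{kb // 1024}M"


def chatr_via_copy(program, page_size, run=subprocess.run):
    """Run 'chatr +pd <page_size>' on a copy, then copy the result back.

    The caller must first terminate all processes running the program
    and restart the program afterwards.
    """
    program = Path(program)
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / program.name
        shutil.copy2(program, work)                       # make a copy
        run(["chatr", "+pd", page_size, str(work)], check=True)  # chatr the copy
        shutil.copy2(work, program)                       # copy back over the original
```

A tuning loop would call chatr_via_copy once per entry of page_size_ladder(), rerun the workload, and keep the size with the best measured result. The run parameter exists so the sequence can be rehearsed with a stub on a machine that has no chatr.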

Slide 9
Tip #5: Get All of the Latest HP-UX
11i v2 Performance Patches
Some critical performance patches:
PHKL_33583 high memory pressure/page synchronization
PHKL_33368 JFS direct I/O performance
Some key performance enhancements:
Threads
PHCO_33675 pthread cumulative patch
PHKL_34032 ksleep cumulative patch
High-resolution timers
Six patches to help customers migrating to HP-UX

HP continuously analyzes application performance on HP-UX and offers HP-UX patches that improve throughput,
responsiveness, and behavior. This list is an attempt to document some known patch/performance relationships
and the suggested remedy. It is meant to be used as a quick check when a system is experiencing performance
problems. This is not a complete list, as new patches continue to be introduced.
Many HP-UX 11i v2 patches are the same for both PA-RISC and Integrity servers.
PHKL_33583 - High memory pressure is seen and some pages of physical memory are never used by the kernel.
This problem is only seen on Integrity servers. This patch provides a page cache synchronization fix.
PHKL_33368 - Direct I/O reads after buffered I/O writes on a large file take a long time. This patch provides
JFS3.5 direct I/O performance improvement.
Patch PHCO_33675 caused a problem in one situation with Java applications, so it is no longer recommended
on Integrity servers on which pthreads applications, such as Java, intermittently abort or exhibit other unexpected
behavior. PHCO_34718 is planned to supersede PHCO_33675 at some point in 2006 and correct this
problem.
Patch PHKL_34032 - Higher-resolution timers have been requested by customers porting from Tru64 UNIX and
AIX. A new resolution of 1 ms will be provided for usleep, nanosleep, and setitimer in a series of patches for
HP-UX 11i v2. These six patches are numbered from PHKL_34356 through PHKL_34361, inclusive, and will go with
patch PHKL_34032 to address this need.
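One way to see what sleep resolution a system actually delivers is to request a short sleep repeatedly and measure how long it takes. The sketch below does this from Python; it is only an approximation, because which operating system primitive time.sleep uses depends on the Python version and platform, and interpreter overhead is included in the measurement.

```python
import time


def measure_sleep_ms(requested_ms=1.0, samples=20):
    """Return the median observed duration, in milliseconds, of a short sleep."""
    observed = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(requested_ms / 1000.0)
        observed.append((time.perf_counter() - start) * 1000.0)
    observed.sort()
    return observed[len(observed) // 2]


if __name__ == "__main__":
    print(f"requested 1.0 ms, median observed {measure_sleep_ms():.2f} ms")
```

A median close to the requested 1 ms suggests fine-grained timers are in effect; a much larger value suggests requests are being rounded up to a coarser tick.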
Slide 10

Locating Performance Patches
Search for performance-specific patches
ITRC (patches released to customers)
http://www1.itrc.hp.com/service/patch/mainPage.do
Use TOUR V3.0 for networking patches
(Transport Optional Upgrade Release)
Performance enhancements and bug fixes
http://software.hp.com (search on TOUR)


TOUR V3.0
TOUR packages are designed for releasing optional enhancements and bug fixes that some
customers may want. Many customers may not need these enhancement features. TOUR release
notes are available at http://docs.hp.com. Doing a keyword search on "TOUR" (all in uppercase)
will locate all these TOUR release notes.
Note that many of the TOUR documents are labeled for TOUR V2 but are still relevant to TOUR V3.
In TOUR 3.0, HP has included a NOSYNC enhancement. This feature is only beneficial to systems
using the link aggregation product (HP APA) or 10 gigabit links.


Slide 11

Tip #6: Use Kernel Threads (1x1)
MxN (Kernel and User threads) versus 1x1 (Kernel)
Different operating system versions have different defaults
for threads, with different performance implications:
in HP-UX 11i v2, the default was originally MxN
in all HP-UX 11i v2 updates, the default changed from MxN to 1x1
Stick to 1x1 threads for best performance
Install the performance patches for the pthread library
(see Tip #5)
Make sure you set all the right environment variables
for threads, described in:
POSIX Threads on HP-UX 11i: HP-UX 11i v2 Update 2
http://devresource.hp.com/drc/topics/hpux_hpux.jsp#a095b8d8480264033


The key to using 1x1 threads is to NOT set the environment variables for MxN. This is described in
the paper listed here.
Try to use process private mutexes and condition variables rather than process shared.


Slide 12
Tip #7: On Cell-based Systems,
Use Cell Local Memory and psets
to Improve Memory Latency
Optimize performance by minimizing memory
accesses across cell board boundaries (within
the same hard partition).
Cells fully populated with same-size DIMMs give
optimal bandwidth.
For applications such as BI on a big system, allocate
up to 70% of memory as cell local.
Bring up Oracle BI applications which use Parallel
Query (PQ) slaves with a scattering policy of
round-robin (mpsched RR).
Use psets to separate applications and improve
their memory locality.

From the HP-UX 11i v2 Release Notes for the subject of cell local memory, found at:
http://docs.hp.com/en/5990-8153/ch12s02.html
This feature can improve system and application performance when the memory of the system is appropriately configured to
the proper balance between interleaved and cell local memory for the particular work load running on the system. Further
performance improvements are possible if applications are modified to advise the operating system of the usage model for
the memory they request.
This feature can degrade performance if the system memory configuration does not match the work load on the system: for
example, if the workload largely requires interleaved memory but the system has been configured with mostly cell local
memory.
This feature can also degrade performance if multithreaded applications have their threads distributed across multiple
locality domains while their memory is allocated cell local.
Refer to the white paper on ccNUMA:
http://docs.hp.com/en/4913/ccNUMA_White_Paper.pdf

For best performance, consider putting the application and database layers in separate psets.
With Oracle, consider putting the Oracle log writer in its own pset.

Slide 13

Tip #8: Monitor, Profile, and Tune
Java Applications with the HP
Free Java Performance Tools
HPjconfig and JavaOOB
Configure your system for Java workloads:
kernel parameters and latest OS patches.
HPjmeter
Profile your application using the -Xeprof option to
collect detailed performance metrics. Then run
HPjmeter to view, navigate, and drill down to
discover your performance bottlenecks.
Use HPjmeter 2.0 in your production environment
to monitor your application's performance and
resource utilization, and to set up custom alerts.



Slide 14

Free HP Java Performance Tools
HPjtune
Use -Xverbosegc to collect detailed metrics on
memory use and garbage collection (GC)
performance. Then use HPjtune to view results,
discover inefficient GC behaviors, and compare
and tune your heap sizes and GC algorithms.
For more info on Java performance:
Webcasts: Learn how to use Java tools:
www.presentationselect.com/hpinvent/archivec.asp?ctg=JAV
HP Programmer's Guide for Java:
www.hp.com/products1/unix/java/infolibrary/prog_guide/index.html
HP-UX Performance Tuning Java Web site:
http://h21007.www2.hp.com/dspp/tech/tech_TechDocumentDetailPage_IDX/1,1701,1602,00.html


When an application does its work, it frequently creates and uses new Java objects. When the JVM heap
memory into which new objects are placed becomes full, a Garbage Collection occurs. However, if the
garbage collector is doing long collections at frequent intervals, it can play havoc with application
performance. The -Xverbosegc Java command line option prints out detailed information about the spaces
within the Java Heap before and after garbage collection.
The size of the heap determines the frequency and duration of garbage collections. The JVM command-line
options to set the initial and maximum heap sizes are the -Xms and -Xmx options, respectively. A third option
(available with Java 2 HotSpot JVM only), -Xmn, configures the New generation heap size, where newly
created objects are stored.
You should tune each application individually to get the best performance.
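Because each application needs its own heap settings, it can help to generate the JVM command line from explicit numbers instead of editing it by hand. The sketch below assembles the options named above (-Xms, -Xmx, -Xmn, -Xverbosegc); the helper name, the megabyte units, and the example sizes are assumptions for illustration, not recommended values.

```python
def java_gc_command(main_class, initial_mb, max_mb, new_mb=None):
    """Build a java command line with heap sizing and verbose GC output."""
    if initial_mb > max_mb:
        raise ValueError("initial heap (-Xms) cannot exceed maximum heap (-Xmx)")
    argv = ["java", f"-Xms{initial_mb}m", f"-Xmx{max_mb}m"]
    if new_mb is not None:
        if new_mb >= max_mb:
            raise ValueError("new generation (-Xmn) must be smaller than -Xmx")
        argv.append(f"-Xmn{new_mb}m")
    # -Xverbosegc prints heap details before and after each collection.
    argv += ["-Xverbosegc", main_class]
    return argv


if __name__ == "__main__":
    # Hypothetical sizes and class name; tune per application.
    print(" ".join(java_gc_command("com.example.Main", 512, 1024, new_mb=256)))
```

Running the same workload with several generated command lines and comparing the collected GC output in HPjtune is one way to carry out the per-application tuning described above.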

Slide 15

Tip #9: Application Specific Tips
With SAP, minimize the number of work processes;
do not allocate more than you need. Also do not
oversize the buffers.
With Oracle, use the SCHED_NOAGE parameter for
best performance in I/O intensive environments.
With Apache, the default tuning recommendations for
v1 on apache.org can lead to issues with copy
avoidance and the use of sendfile. This is fixed in
Apache v2.


See Appendix A of the Oracle white paper on tuning for HP-UX at:
http://www.oracle.com/technology/products/database/clustering/pdf/11iRACBM2.pdf
The parameter hpux_sched_noage should be set to 178.
With Apache v1, you should change the default so that you are not using memory mapped files.

Slide 16
Tip #10: Integrity Virtual Machines
Guest Operating Systems
When installing a guest operating system, make sure
you install the HPVM-Guest bundle that includes a
performance tuning script, /sbin/init.d/hpvmguest.
The script does several things to improve I/O and
network performance and behavior of the guest:
disables I/O forwarding
disables TOPS (Thread-Optimized Packet Scheduling)
extends the default SCSI disk timeout value
extends the timeout settings for the mpt SCSI driver

The HPVM host installation package (T2767AC) contains a depot for installation in HP-UX guests. This depot can be found
on every HPVM host in:
/opt/hpvm/guest-images/hpux/hpvm_guest_depot.sd
This depot contains a single bundle:
HPVM-Guest A.01.10 Integrity VM Guest
It is technically not necessary to have the bundle installed on guests, but it is HIGHLY RECOMMENDED. The bundle
includes HPVM tools such as hpvminfo and hpvmcollect, and a set of kernel tunes that improve I/O and network
performance and behavior of the guest.
The tuning is performed via an rc script, /sbin/init.d/hpvmguest. The script's actions include:
- disabling I/O forwarding
- disabling Thread-Optimized Packet Scheduling (TOPS)
- extending the default SCSI disk timeout value
- extending the timeout settings for the mpt SCSI driver
You can find out and potentially alter the new settings one by one in the file /etc/rc.config.d/hpvmguest.
Warning: The usual restrictions apply when making changes to kernel tunables. The presence of a tunable in this file does
not imply that it is supported for customers to change it.
Slide 17

Honorable Mention: Other Tips
Use a transaction monitor like Tuxedo to multiplex
down the number of processes.
select() will run slowly on large systems, due
to the number of devices that must be polled.
Improve this by using event ports instead.
Run kcweb or kctune to check your kernel tuning
for things like maximum stack size.
If at all possible, test on the same size server you
will implement on. Applications that run well on a
small server may have problems when combined
with other applications on a large server.


Event ports are described in the poll(7) manpage. For more information, refer to the following Web site:
http://docs.hp.com/en/B2355-60105/poll.7.html
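The general idea behind replacing select is to register descriptors once and then wait for events, instead of rebuilding and rescanning a descriptor set on every call. The sketch below shows that pattern with Python's standard selectors module, whose DefaultSelector picks the most efficient mechanism the platform's Python build exposes. It illustrates the pattern only; whether a given HP-UX Python build maps this onto event ports is an assumption this sketch does not verify.

```python
import selectors
import socket


def wait_for_readable(socks, timeout=1.0):
    """Register each socket once and return those that become readable."""
    with selectors.DefaultSelector() as sel:
        for s in socks:
            sel.register(s, selectors.EVENT_READ)
        # select() returns (key, events) pairs for descriptors that are ready.
        return [key.fileobj for key, _ in sel.select(timeout)]


if __name__ == "__main__":
    a, b = socket.socketpair()
    b.send(b"ping")
    ready = wait_for_readable([a])
    print("readable:", len(ready))
    a.close()
    b.close()
```

In a long-running server the selector would be created once and reused across iterations, which is where the savings over select come from.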





Slide 18

Itanium is a trademark or registered trademark of Intel Corporation in the U.S.
and other countries and is used under license.
Java is a US trademark of Sun Microsystems, Inc.
Oracle is a registered US trademark of Oracle Corporation, Redwood City, California.
