October 2011
IBM Systems Group
Agenda
The importance of IO tuning
Disk basics and performance overview
AIX IO stack
Data layout
Characterizing application IO
Disk performance monitoring tools
Testing disk subsystem performance
Tuning
Memory access takes 540 CPU cycles
Disk access takes 20 million CPU cycles, or 37,037 memory accesses
System bottlenecks are being pushed to the disk
Disk subsystems are using cache to improve IO service times
Customers now spend more on storage than on servers
Disk drive trends:

                           2002    2010   Trend
Capacity (GB)                73     450    +35%
Max Sustained DR (MB/s)      75     180    +15%
Read Seek (ms)              3.6     3.4     -1%
Performance metrics
Disk metrics
  MB/s
  IOPS, with a reasonable service time
Application metrics
  Response time
  Batch job run time
System metrics
  CPU, memory and IO
Size for your peak workloads
Size based on maximum sustainable thruputs
Bandwidth and thruput sometimes mean the same thing, sometimes not
For tuning, it's good to have a short-running job that's representative of your workload
Performance metrics
Use a relevant metric for testing
  It should be tied to business costs, benefits or requirements
    Batch job run time
    Maximum or sufficient application transactions/second
    Query run time
  Metrics that typically are not so relevant
    Application transaction time if < a few seconds
  Metrics indicating bottlenecks (CPU, memory, network, disk)
    Important if the application metric goal isn't met
Be aware of IO from other systems affecting the performance of shared disk
If benchmarking two systems, be sure the disk setups are apples to apples and you're not really comparing disk subsystem performance
Disk performance
ZBR geometry
Interface types: ATA, SATA, SCSI, FC, SAS
Disk performance
When do you have a disk bottleneck?
  Random workloads
    Reads average > 15 ms
    With write cache, writes average > 2.5 ms
  Sequential workloads
    Two sequential IO streams on one disk
    You need more thruput
[Chart: IO service time (ms) vs IOPS for a 15,000 RPM disk]
What is %iowait?
A misleading indicator of disk performance
  A type of CPU idle: the percent of time the CPU is idle and waiting on an IO so it can do some more work
High %iowait does not necessarily indicate a disk bottleneck
  Your application could simply be IO intensive, e.g. a backup
  You can make %iowait go to 0 by adding CPU-intensive jobs
Low %iowait does not necessarily mean you don't have a disk bottleneck
  The CPUs can be busy while IOs are taking unreasonably long times
If disk IO service times are good, you aren't getting the performance you need, and you have significant %iowait, consider using SSDs or RAM disk
  This improves performance by potentially reducing %iowait to 0
[Chart: IO performance of SSD relative to HDD (HDD = 1X)]
RAM disk
Use system RAM to create a virtual disk
Data is lost in the event of a reboot or system crash
IOs complete with RAM latencies
For file systems, it takes away from file system cache
  Taking from one pocket and putting it into another
A raw disk or file system only - no LVM support
# mkramdisk 16M
/dev/rramdisk0
# mkfs -V jfs2 /dev/ramdisk0
mkfs: destroy /dev/ramdisk0 (yes)? y
File system created successfully.
16176 kilobytes total disk space.
Device /dev/ramdisk0:
  Standard empty filesystem
  Size: 32352 512-byte (DEVBLKSIZE) blocks
# mkdir /ramdiskfs
# mount -V jfs2 -o log=NULL /dev/ramdisk0 /ramdiskfs
# df -m /ramdiskfs
Filesystem      MB blocks    Free  %Used  Iused  %Iused  Mounted on
/dev/ramdisk0       16.00   15.67     3%      4      1%  /ramdiskfs
The AIX IO stack (diagram) - layers and notes:
  File systems: JFS, JFS2, NFS, other
  Multi-path IO driver (optional)
  Disk device drivers - queues exist for both adapters and disks
  Adapter device drivers - use DMA for IO
  Disk subsystem (optional) - has read and write cache
  Disk - has memory to store commands/data: a read cache or memory area used for IO, plus write cache
IOs can be coalesced (good) or split up (bad) as they go thru the IO stack
  IOs adjacent in a file/LV/disk can be coalesced
  IOs greater than the maximum IO size supported will be split up
Data layout
Data layout affects IO performance more than any tunable IO parameter
Good data layout avoids dealing with disk hot spots
  An ongoing management issue and cost
Data layout must be planned in advance
  Changes are often painful
iostat and filemon can show unbalanced IO
Best practice: evenly balance IOs across all physical disks
Random IO best practice:
  Spread IOs evenly across all physical disks
  For disk subsystems:
    Create RAID arrays of equal size and RAID level
    Create VGs with one LUN from every array
    Spread all LVs across all PVs in the VG
  The SVC can do this automatically, and the XIV does
[Diagram: five RAID arrays (1-5), each presented as a LUN or logical disk and used as a PV]
VG datavg:

# mklv lv1 -e x hdisk1 hdisk2 ... hdisk5
# mklv lv2 -e x hdisk3 hdisk1 ... hdisk4
...
Use a random order for the hdisks for each LV
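To sanity-check the spread afterwards, the LP-to-PP mapping can be listed; a minimal sketch, reusing lv1 and hdisk1 from the example above:

# lslv -m lv1        shows each logical partition and the PV(s) holding its physical partition(s)
# lspv -l hdisk1     shows which LVs have partitions on a given PV, and how many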
Data layout
Best practices for VGs and LVs:
  Use Big or Scalable VGs
    Both support having no LVCB header on LVs (only important for raw LVs)
      LVCBs can lead to issues with IOs split across physical disks
    Big VGs require the mklv -T O option to eliminate the LVCB (sketch after this list)
    Scalable VGs have no LVCB
    Only Scalable VGs support mirror pools (AIX 6100-02)
  For JFS2, use inline logs
  For JFS, one log per file system provides the best performance
  If using LVM mirroring, use active MWC
    Passive MWC creates more IOs than active MWC
  Use RAID in preference to LVM mirroring
    Reduces IOs, as there are no additional writes for MWC
  Use PP striping in preference to LV striping
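A minimal sketch of these choices; the hdisk names, 128 MB PP size, 10-LP size and file system names are illustrative only, not recommendations:

# mkvg -S -y datavg -s 128 hdisk4 hdisk5 hdisk6 hdisk7     -S creates a Scalable VG (no LVCB, mirror pool capable)
# mklv -y datalv -e x datavg 10                            -e x spreads the LPs across all PVs (PP striping)
# crfs -v jfs2 -d datalv -m /data -A yes -a logname=INLINE JFS2 file system with an inline log
# mkvg -B -y bigvg -s 128 hdisk8 hdisk9                    -B creates a Big VG instead
# mklv -y rawlv -T O -e x bigvg 10                         -T O omits the LVCB on a Big VG (useful for raw LVs)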
LVM limits
                Standard VG      Big VG      Scalable VG
Max PVs/VG               32         128             1024
Max LVs/VG              256         512             4096
Max PPs/VG           32,512     130,048        2,097,152
Max LPs/LV           32,512     130,048        2,097,152
Max PPs per VG and max LPs per LV restrict your PP size
Use a PP size that allows for growth of the VG (worked example below)
Use a PP size that allows your LVs to be spread across all PVs
  Unless your disk subsystem ensures your LVs are spread across all physical disks
Valid LV strip sizes range from 4 KB to 128 MB, in powers of 2, for striped LVs
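A worked example with hypothetical numbers: 1,600 GB of total LUN space with a 32 MB PP size needs 51,200 PPs, which exceeds the 32,512 PP limit of a Standard VG but fits easily in a Big or Scalable VG; at a 64 MB PP size the same space needs only 25,600 PPs, leaving room for the VG to grow before the limit is reached.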
Application IO characteristics
Random IO
  Typically small (4-32 KB)
  Measure and size with IOPS
  Usually disk actuator limited
Sequential IO
  Typically large (32 KB and up)
  Measure and size with MB/s
  Usually limited on the interconnect to the disk actuators
To determine application IO characteristics, use filemon:
# filemon -o /tmp/filemon.out -O lf,lv,pv,detailed -T 500000; sleep 90; trcstop
Check for trace buffer wraparounds, which may invalidate the data; if they occur, run filemon with a larger -T value or a shorter sleep
Using filemon
Look at the PV summary report
  Look for balanced IO across the disks
    Lack of balance may be a data layout problem
    Depends upon the PV to physical disk mapping
    The LVM mirroring scheduling policy also affects balance for reads
  IO service times in the detailed report are more definitive on data layout issues
    Dissimilar IO service times across PVs indicate IOs are not balanced across physical disks
Look at the most active LVs report
  Look for busy file system logs
  Look for file system logs serving more than one file system
At AIX 6.1, filemon also has reports showing the processes/threads doing IO to files
Using iostat
Use a meaningful interval, 30 seconds to 15 minutes
The first report is statistics since system boot (only if sys0's attribute iostat=true)
Examine IO balance among hdisks
Look for bursty IO (based on the syncd interval)
Useful flags:
  -T  Puts a time stamp on the data
  -a  Adapter report (IOs for an adapter), for both physical and virtual adapters
  -m  Disk path report (IOs down each disk path)
  -s  System report (overall IO)
  -A or -P  For standard AIO or POSIX AIO
  -D  For hdisk queues and IO service times
  -R  To reset min and max values for each interval
  -l  Puts data on one line (better for scripts)
  -p  For tape statistics
  -f/-F  For file system statistics (AIX 6.1 TL1)
Using iostat
# iostat <interval> <count>       For individual disk and system statistics

tty:   tin    tout    avg-cpu:  % user  % sys  % idle  % iowait
      24.7    71.3                 8.3    2.4    85.6       3.6

Disks:    % tm_act     Kbps     tps   Kb_read   Kb_wrtn
hdisk0         2.2     19.4     2.6       268       894
hdisk1         5.0    231.8    28.1      1944     11964
hdisk2         5.4    227.8    26.9      2144     11524
hdisk3         4.0    215.9    24.8      2040     10916
...

# iostat -ts <interval> <count>   For total system statistics

System configuration: lcpu=4 drives=2 ent=1.50 paths=2 vdisks=2

tty:   tin      tout    avg-cpu:  % user  % sys  % idle  % iowait  physc  % entc
       0.0    8062.0                 0.0    0.4    99.6       0.0    0.0     0.7
            Kbps     tps   Kb_read   Kb_wrtn
            82.7    20.7       248         0
       0.0   13086.5                0.0    0.4    99.5       0.0    0.0     0.7
            Kbps     tps   Kb_read   Kb_wrtn
            80.7    20.2       244         0
       0.0   16526.0                0.0    0.5    99.5       0.0    0.0     0.8
Using iostat
# iostat -f <interval> <count>

FS Name:               % tm_act     Kbps      tps   Kb_wrtn
/                             -      85.7    113.3         0
/usr                          -     961.1    274.1         0
/var                          -       0.0      0.0         0
/tmp                          -       0.0      0.0         0
/home                         -       0.0      0.0         0
/admin                        -       0.0      0.0         0
/proc                         -       7.6     17.3         0
/opt                          -       0.0      0.0         0
/var/adm/ras/livedum          -       0.0      0.0         0
/oracle                       -       2.2     22.9         6
/staging                      -       0.0      0.0         0
/ggs                          -       0.0      0.0         0
Using iostat
# iostat -DRTl <interval> <count>
Disks:        xfers                                   read                                      write                                     queue                                          time
        %tm_act   bps     tps   bread   bwrtn   rps  avgserv minserv maxserv timeouts fails  wps  avgserv minserv maxserv timeouts fails  avgtime mintime maxtime avgwqsz avgsqsz sqfull
hdisk41    4.6   89.8K    5.7   24.8K   65.0K    3.0    8.5     0.2    28.9      0      0    2.6     9.4     0.4   233.2      0      0      0.0     0.0     0.0     0.0     0.0    0.0   04:52:25
hdisk44   21.6  450.2K   52.0  421.5K   28.7K   51.5    4.3     0.2    39.0      0      0    0.6     5.9     0.5    30.9      0      0      0.0     0.0     0.0     0.0     0.0    0.0   04:52:25
hdisk42    6.6   57.3K    6.8   42.3K   15.0K    5.2   10.9     0.2    32.7      0      0    1.6     7.0     0.3    22.4      0      0      0.0     0.0     0.0     0.0     0.0    0.0   04:52:25
hdisk43   37.2  845.5K  101.4  818.2K   27.3K   99.9    4.0     0.2    47.6      0      0    1.5    17.2     0.4   230.2      0      0      0.0     0.0     0.0     0.0     0.0    0.0   04:52:25
hdisk37   94.4  700.0K    2.2     0.0  700.0K    0.0    0.0     0.0     0.0      0      0    2.2     1.1S  117.9     4.1S     0      0      0.0     0.0     0.1     0.0     0.1    0.0   04:52:25
hdisk53   23.5  296.2K   35.5  269.5K   26.8K   32.9    7.7     0.2    47.0      0      0    2.6     2.5     0.4    27.7      0      0      0.0     0.0     0.0     0.0     0.0    0.0   04:52:25
hdisk51   32.5  471.2K   55.6  445.5K   25.7K   54.4    6.7     0.2    58.8      0      0    1.2     3.1     0.4    13.0      0      0      0.0     0.0     0.1     0.0     0.0    0.0   04:52:25
hdisk56   19.5  178.0K   20.7  122.3K   55.7K   14.9    9.8     0.2    55.0      0      0    5.7    55.8     0.4   318.9      0      0      2.8     0.0   194.4     0.0     0.0    0.6   04:52:25
hdisk48   18.0  149.6K   18.0  101.0K   48.6K   12.3   10.6     0.2    38.5      0      0    5.7    19.0     0.4   250.2      0      0      0.0     0.0     3.7     0.0     0.0    0.3   04:52:25
hdisk46   12.9  167.4K   19.8  156.7K   10.6K   19.1    6.8     0.2    37.5      0      0    0.7     4.4     0.4    17.0      0      0      0.0     0.0     0.0     0.0     0.0    0.0   04:52:25
hdisk57   55.2  608.8K   71.1  574.4K   34.4K   69.5    8.9     0.2   118.3      0      0    1.6    10.1     0.4   216.3      0      0      0.0     0.0     0.0     0.0     0.0    0.0   04:52:25
hdisk55   13.4  244.9K   29.8  234.0K   10.9K   28.6    4.8     0.2    36.9      0      0    1.3     2.6     0.4    22.3      0      0      0.0     0.0     0.0     0.0     0.0    0.0   04:52:25
hdisk50   48.6  616.7K   73.3  575.5K   41.2K   70.3    7.9     0.2    84.5      0      0    3.1     5.7     0.4    40.1      0      0      0.0     0.0     0.0     0.0     0.0    0.0   04:52:25
hdisk52   14.5  174.2K   20.6  116.0K   58.1K   14.2    7.7     0.2    36.9      0      0    6.5    10.7     0.4   270.1      0      0      0.0     0.0     0.0     0.0     0.0    0.0   04:52:25
Shows average IO service times for reads and writes, IO rates, IOPS (tps) and time spent in the hdisk driver queue
One can calculate the R/W ratio and the average IO size
Time spent in the queue indicates increasing queue_depth may improve performance
  sqfull = number of times the hdisk driver's service queue was full
  avgserv = average IO service time
  avgsqsz = average service queue size
    This can't exceed queue_depth for the disk
  avgwqsz = average wait queue size
    IOs waiting to be sent to the disk
  If avgwqsz is often > 0, then increase queue_depth
  If sqfull in the first report is high, then increase queue_depth
Using sar
sar -d formerly reported zeros for avwait and avserv
The avque definition changed in AIX 5.3

# sar -d 1 2

AIX sq1test1 3 5 00CDDEDC4C00    06/22/04
System configuration: lcpu=2 drives=1 ent=0.30

10:01:37   device   %busy   avque   r+w/s   Kbs/s   avwait   avserv
10:01:38   hdisk0     100    36.1     363   46153     51.1      8.3
10:01:39   hdisk0      99    38.1     350   44105     58.0      8.5
Average    hdisk0      99    37.1     356   45129     54.6      8.4
avque - average number of IOs in the wait queue
  Waiting to get sent to the disk (the disk's queue is full)
  Values > 0 indicate increasing queue_depth may help performance
  Before AIX 5.3 it meant the number of IOs in the disk queue
avwait - time (ms) waiting in the wait queue
avserv - IO service time (ms) when sent to the disk
Using lvmstat
Provides IO statistics for LVs, VGs and PPs
Useful for SSD data placement, and for finding busy LVs and PPs
You must enable data collection first for a VG:
# lvmstat -e -v <vgname>
# lvmstat -sv rootvg <interval length> <number of intervals>
Logical Volume    iocnt   Kb_read   Kb_wrtn     Kbps
hd8                 212         0       848    24.00
hd4                  11         0        44     0.23
hd2                   3        12         0     0.01
hd9var                2         0         8     0.01
..
hd8                   3         0        12     8.00
.
hd8                  12         0        48    32.00
hd4                   1         0         4     2.67

# lvmstat -l lv00 1
Log_part  mirror#   iocnt   Kb_read   Kb_wrtn      Kbps
1         1         65536     32768         0      0.02
2         1         53718     26859         0      0.01
Log_part  mirror#   iocnt   Kb_read   Kb_wrtn      Kbps
2         1          5420      2710         0  14263.16
Log_part  mirror#   iocnt   Kb_read   Kb_wrtn      Kbps
3         1          4449      2224         0  13903.12
2         1           979       489         0   3059.38
Using NMON
# nmon - then press a for all adapters or ^ for FC adapters
An easy way to monitor adapter thruput
NMON can also be used to create Excel graphs showing IO over time
  Plus CPU, memory, and network IO data
Testing thruput
Sequential IO

Test sequential read thruput from a device:
# timex dd if=<device> of=/dev/null bs=1m count=100

# timex dd if=/dev/rhdisk20 of=/dev/null bs=1m count=1024
1024+0 records in.
1024+0 records out.
real 3.44
user 0.00
sys  0.17

1024 MB / 3.44 s = 297.7 MB/s

Test sequential write thruput to a device:
# timex dd if=/dev/zero of=<device> bs=1m count=100

Note that /dev/zero writes the null character, so writing this character to files in a file system will result in sparse files
For file systems, either create a file first, or use the lptest command to generate one, e.g.:
# lptest 127 32 > 4kfile
[Benchmark output excerpt: random IO test, procs=20, read=100%, bs=4 KB]
The AIX IO stack - where the queues and buffers are:
  File systems (JFS, JFS2, NFS, other)
  VMM
  LVM (LVM device drivers) - disk buffers (pbufs) at this layer
  Multi-path IO driver (optional)
  Disk device drivers - hdisk queue_depth
  Adapter device drivers - adapter num_cmd_elems
  Disk subsystem (optional)
  Disk - write cache
Tuning IO buffers
# vmstat -v | tail -5      <-- only the last 5 lines are needed
      0 pending disk I/Os blocked with no pbuf
      0 paging space I/Os blocked with no psbuf
   8755 filesystem I/Os blocked with no fsbuf
      0 client filesystem I/Os blocked with no fsbuf
   2365 external pager filesystem I/Os blocked with no fsbuf
The first field is a count of IOs delayed since boot due to a lack of the specified buffer
  For pbufs, use lvmo to increase pv_pbuf_count (see the next slide)
  For psbufs, stop paging or add paging spaces
  For filesystem fsbufs, increase numfsbufs with ioo
  For external pager fsbufs, increase j2_dynamicBufferPreallocation with ioo (example below)
  For client filesystem fsbufs, increase nfso's nfs_v3_pdts and nfs_v3_vm_bufs (or the NFS4 equivalents)
Run # ioo -FL to see defaults, current settings and what's required to make the changes take effect
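A minimal sketch of the fsbuf tunables mentioned above; the values are illustrative only, and numfsbufs may be a restricted tunable at newer AIX levels, so check ioo -FL first:

# ioo -FL                                       list defaults, current values, and what's needed for changes to take effect
# ioo -p -o j2_dynamicBufferPreallocation=32    external pager fsbufs (JFS2); -p also persists the change across reboots
# ioo -p -o numfsbufs=1024                      JFS fsbufs; takes effect as file systems are remounted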
To increase a VG's pbufs dynamically:
# lvmo -v <vgname> -o pv_pbuf_count=<new value>
pv_min_pbuf is tuned via ioo or lvmo
Changes to pv_pbuf_count via lvmo are dynamic
  Increase the value, collect statistics and change again if necessary
See # lvmo -L
Attributes with user_settable=True can be changed
List the allowable values for an attribute with # lsattr -Rl <device> -a <attribute>
# lsattr -Rl hdisk10 -a queue_depth
1...256 (+1)
To change an attribute use # chdev -l <device> -a <attribute>=<new value> -P
Then reboot; or, if the device is not in use, omit the -P so the change is immediately effective
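For example, a sketch using the hdisk10 and queue_depth attribute from above; the value 64 is illustrative only, and the right setting depends on the disk subsystem and the measured queuing:

# lsattr -El hdisk10 -a queue_depth          current setting
# chdev -l hdisk10 -a queue_depth=64 -P      deferred change; takes effect at the next reboot
# chdev -l hdisk10 -a queue_depth=64         immediate change, only if the disk is not in use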
The max_xfer_size attribute also controls a DMA memory area used to hold data for transfer; at the default it is 16 MB
Changing it to any other allowable value increases the DMA area to 128 MB and increases the adapter's bandwidth
  Often changed to 0x200000
This can cause a problem if there isn't enough memory on the PHB chips in the IO drawer when too many adapters/devices share the PHB
  Make the change and reboot - check for Defined devices or errors in the error log, and change back if necessary (example below)
For NPIV and virtual FC adapters, the DMA memory area is 128 MB at 6.1 TL2 or later
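A sketch of the change on a physical FC adapter; fcs0 and the num_cmd_elems value are illustrative, so verify the allowable values first:

# lsattr -Rl fcs0 -a max_xfer_size           list the allowable values
# chdev -l fcs0 -a max_xfer_size=0x200000 -P deferred; takes effect after reboot
# chdev -l fcs0 -a num_cmd_elems=1024 -P     often raised at the same time; treat as a shared resource under VIO
# errpt | head                               after the reboot, check for adapter errors or Defined devices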
This indicates that we queued 128 IOs to the hdisk driver; if queue_depth is 128, we filled the queue
# iostat -D hdisk10

System configuration: lcpu=4 drives=35 paths=35 vdisks=2

hdisk10        xfer:  %tm_act      bps      tps     bread     bwrtn
                          0.1     1.4K      0.2     442.6     940.9
               read:      rps  avgserv  minserv  maxserv  timeouts
                          0.0      4.6      0.3     67.9         0
              write:      wps  avgserv  minserv  maxserv  timeouts
                          0.2      8.2      0.5    106.8         0
              queue:  avgtime  mintime  maxtime  avgwqsz  avgsqsz
                          5.4      0.0    579.5      0.0      0.0
The sqfull value represents the number of times we filled the queue per second
Non-zero values indicate we filled the queue
VIO
The VIO Server (VIOS) uses multi-path IO code for the attached disk subsystems
The VIO client (VIOC) always uses SCSI MPIO if accessing storage thru two VIOSs
  In this case only entire LUNs are served to the VIOC
Set the queue_depth at the VIOC equal to the queue_depth at the VIOS for the LUN
If you increase a vFC adapter's num_cmd_elems, also do it on the real FC adapter
  Preferably treat the real FC adapter's num_cmd_elems as a shared resource
The VSCSI adapter has a queue also
  To avoid queuing on the VSCSI adapter: Max LUNs per VSCSI adapter = INT(510/(Q+3))
    where Q is the queue depth of the LUNs, assuming all are the same (worked example below)
One can monitor adapters with NMON in the oem_setup_env shell
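As a worked example, assuming every LUN on the adapter uses the same hypothetical queue_depth: with Q=32, INT(510/(32+3)) = INT(14.57) = 14, so up to 14 LUNs can share one VSCSI adapter without queuing on the adapter; with Q=20 the limit is INT(510/23) = 22 LUNs.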
Read ahead
Read ahead detects that we're reading sequentially and gets the data before the application requests it
  Reduces %iowait
  Too much read ahead means you do IO that you don't need
Operates at the file system layer - sequentially reading files
  Set maxpgahead for JFS and j2_maxPgReadAhead for JFS2 (example below)
  Values of 1024 for max page read ahead are not unreasonable
Disk subsystems read ahead too - when sequentially reading disks
  Improves IO service time and thruput
  Tunable on DS4000; fixed on ESS, DS6000, DS8000 and SVC
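A sketch of raising the JFS2 read-ahead maximum with ioo; 1024 pages is just the illustrative value mentioned above, and -p persists the change across reboots:

# ioo -L j2_maxPgReadAhead            show current, default and range
# ioo -p -o j2_maxPgReadAhead=1024    raise the JFS2 sequential read-ahead maximum (in 4 KB pages)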
Write behind
Initiates writes from file system cache before syncd does it
Write behind tuning for sequential writes to a file:
  Tune numclust for JFS
  Tune j2_nPagesPerWriteBehindCluster for JFS2
  These represent 16 KB clusters; larger values allow IO to be coalesced
  When the specified number of sequential 16 KB clusters are updated, the IO to disk starts rather than waiting for syncd
Write behind tuning for random writes to a file:
  Tune maxrandwrt for JFS
  Tune j2_maxRandomWrite and j2_nRandomCluster for JFS2
    The maximum number of random writes allowed to accumulate to a file before additional IOs are flushed; the default is 0, or off
    j2_nRandomCluster specifies the number of clusters apart two consecutive writes must be in order to be considered random
If you have bursty IO, consider using write behind to smooth out the IO rate (sketch below)
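A sketch of the JFS2 write-behind tunables with ioo; the values are illustrative and the right settings depend on how bursty the write workload is:

# ioo -L j2_nPagesPerWriteBehindCluster          show current, default and range first
# ioo -p -o j2_nPagesPerWriteBehindCluster=64    sequential write behind: larger clusters let more IO be coalesced
# ioo -p -o j2_maxRandomWrite=128                random write behind: writes allowed to accumulate per file (0 = off)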
j2_syncPageLimit
Overrides j2_syncPageCount when a threshold is reached. This is to guarantee that sync will eventually complete for a given file. Not applied if j2_syncPageCount is off. Default: 16, Range: 1-65536, Type: Dynamic, Unit: Numeric
If application response times are impacted by syncd, try j2_syncPageCount settings from 256 to 1024. Smaller values improve short term response times, but still result in larger syncs that impact response times over larger intervals. These will likely require a lot of experimentation, and detailed analysis of IO behavior. Does not apply to mmap() memory mapped files. May not apply to shmat() files (TBD)
Mount options
Release behind: rbr, rbw and rbrw
  Says to throw the data out of file system cache
  rbr is release behind on read
  rbw is release behind on write
  rbrw is both
  Applies to sequential IO only
DIO: Direct IO
  Bypasses file system cache
  No file system read ahead
  No lrud or syncd overhead
  No double buffering of data
  Half the kernel calls to do the IO
  Half the memory transfers to get the data to the application
  Requires that the application be written to use DIO
CIO: Concurrent IO
  The same as DIO but with no i-node locking
The AIX IO stack (for reference):
  File systems (JFS, JFS2, NFS, other)
  VMM
  LVM (LVM device drivers)
  Multi-path IO driver
  Disk device drivers
  Adapter device drivers
  Disk subsystem (optional)
  Disk - write cache, and read cache or memory area used for IO
Mount options
Direct IO
  IOs must be aligned on file system block boundaries
  IOs that don't adhere to this will dramatically reduce performance
  Avoid large-file-enabled JFS file systems - the block size is 128 KB after the first 4 MB of a file
Mount options
Concurrent IO for JFS2 (not JFS):
# mount -o cio
# chfs -a options=rw,cio <file system>
Assumes that the application ensures data integrity for multiple simultaneous IOs to a file
  Changes to meta-data are still serialized
I-node locking: when two threads (one of which is a write) doing IO to the same file are at the file system layer of the IO stack, reads will be blocked while the write proceeds
Provides raw LV performance with file system benefits
Requires an application designed to use CIO
For file system maintenance (e.g. restoring a backup) one usually mounts without cio during the maintenance
Some applications now make CIO/DIO calls directly without requiring cio/dio mounts; in that case don't use the mount options
  Important for times when alignment requirements aren't met, or when file system read ahead helps (like during backups)
IO Pacing
IO pacing causes the CPU to do something else once a specified amount of IO to a file is in process
Turning it off improves backup times and thruput
Turning it on ensures that no process hogs the CPU for IO, and ensures good keyboard response on systems with heavy IO workloads
Default is on, with minpout=4096 and maxpout=8193
Originally used to avoid HACMP's dead man switch
  The old HACMP recommended values of 33 and 24 significantly inhibit thruput, but are reasonable for uniprocessors with non-cached disk
Changed via:
# chdev -l sys0 -a maxpout=<new value> -a minpout=<new value>

IO pacing can also be set per file system via the mount command:
# mount -o minpout=256 -o maxpout=513 /myfs
Asynchronous IO
Asynchronous IO is automatically turned on at AIX 6.1
AIO kernel threads automatically exit after aio_server_inactivity seconds
AIO kernel threads are not used for AIO to raw LVs or CIO-mounted file systems
Usually only aio_maxservers and aio_maxreqs need to be changed
  Defaults are 21 per logical CPU and 8K respectively
  Set via ioo
  Some may want to adjust minservers for heavy AIO use
maxservers is the maximum number of AIOs that can be processed at any one time
maxreqs is the maximum number of AIO requests that can be handled at one time, and is a total for the system (they are queued to the AIO kernel threads)
Typical values (see the ioo example after the table):
            minservers   maxservers   maxreqs
Default              3           10      4096
OLTP               200          800     16384
SAP                400         1200     16384
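A sketch of applying the OLTP-style values from the table with ioo; treat them as starting points rather than prescriptions, and verify the effect afterwards:

# ioo -p -o aio_minservers=200 -o aio_maxservers=800 -o aio_maxreqs=16384
# iostat -A 30 4        then watch avgc/maxg against maxr and maxservers to see whether the limits are approached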
Asynchronous IO tuning
Use iostat -A to monitor AIO (or -P for POSIX AIO)
# iostat -A <interval> <number of intervals>

System configuration: lcpu=4 drives=1 ent=0.50

aio:  avgc  avfc  maxg  maxf  maxr   avg-cpu:  %user  %sys  %idle  %iow  physc  %entc
        25     6    29    10  4096              30.7  36.3   15.1  17.9    0.0   81.9

Disks:    % tm_act      Kbps     tps   Kb_read   Kb_wrtn
hdisk0       100.0   61572.0   484.0      8192     53380
avgc - average global non-fastpath AIO request count per second for the specified interval
avfc - average fastpath AIO request count per second for the specified interval, for IOs to raw LVs (doesn't include CIO fast path IOs)
maxg - maximum non-fastpath AIO request count since the last time this value was fetched
maxf - maximum fastpath request count since the last time this value was fetched
maxr - maximum AIO requests allowed - the AIO device maxreqs attribute
If maxg or maxf gets close to maxr or maxservers, then increase maxreqs or maxservers
FC protocol device attributes (from lsattr; the last column is user_settable):
  How this adapter is CONNECTED            False
  Dynamic Tracking of FC Devices           True
  FC Fabric Event Error RECOVERY Policy    True
  Adapter SCSI ID                          False
  FC Class for Fabric                      True
QUESTIONS? COMMENTS?