© 2009 by Vijaykumar ECE565 Lecture Notes: Chapter 6
I/O Overlap

I/O overlaps with computation in complicated ways

(Diagram: USER, OS, and I/O activity over time; job 1 issues an I/O request, the OS starts the device and runs jobs 2 and 3, and an I/O interrupt signals that job 1's request is done.)

I/O Performance

time_job = time_cpu + time_io - time_overlap
• e.g., 10 = 10 + 4 - 4
• speed up CPU by 2x: what is time_job?
• time_job = 5 + 4 - 4 = 5 (best case, full overlap)
• time_job = 5 + 4 - 0 = 9 (worst case, no overlap)
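The overlap arithmetic above can be checked with a short sketch (the function name is ours; the values are the ones from the example):

```python
# Sketch of the overlap model: time_job = time_cpu + time_io - time_overlap.
def time_job(t_cpu, t_io, t_overlap):
    return t_cpu + t_io - t_overlap

assert time_job(10, 4, 4) == 10   # the original example
print(time_job(5, 4, 4))          # best case after 2x CPU speedup -> 5
print(time_job(5, 4, 0))          # worst case, no overlap -> 9
```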
Device Characteristics

behavior
• input - read once
• output - write once
• storage - read many times; usually write

partner
• human
• machine

data rate
• peak transfer rate

Device            I or O?   Partner   Data Rate (KB/s)
mouse             I         human     0.01
graphics display  O         human     60,000
modem             I/O       machine   2-8
LAN               I/O       machine   500-6000
tape              storage   machine   2000
disk              storage   machine   2000-10,000
Disk Parameters

sectors per surface: 32 typical
• sector: # —gap—data+ECC—
• fixed length sectors (except IBM)
• typically fixed sectors per track
• recently constant bit density

Disk Operations

seek: move head to track
• avg seek time = (Σ i=1..n seek(i)) / n
• n is # of tracks, seek(i) is the time to seek to the ith track

rotational latency: wait for sector
• avg rotational latency = 0.5 revolution at 3600 RPM = 8.3 ms

transfer rate
• typically 1-4 MB per second
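A minimal sketch of the access-time model above; the function is ours, and the 8 ms seek, 2 MB/s transfer rate, and 4-KB request size are illustrative values picked from the ranges on the slide:

```python
# Disk access time = seek + avg rotational latency + transfer time.
def disk_access_ms(seek_ms, rpm, transfer_mb_s, request_kb):
    rotational_ms = 0.5 * 60_000 / rpm                     # half a revolution
    transfer_ms = request_kb / 1024 / transfer_mb_s * 1000
    return seek_ms + rotational_ms + transfer_ms

# 8 ms seek, 3600 RPM (8.3 ms rotational), 2 MB/s, 4-KB request:
print(round(disk_access_ms(8, 3600, 2, 4), 2))             # -> 18.29 ms
```

Note how, at these parameters, seek and rotational latency dominate the transfer time by a wide margin.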
Alternatives to Disks

DRAMs
• SSD - solid state disk
  • standard disk interface
  • DRAM and battery backup
• ES - expanded storage
  • software controlled cache
  • large (4K) blocks
+ no seek time
+ fast transfer rate
– cost

FLASH memory
+ no seek time
+ fast transfer
+ non-volatile
– bulk erase before write
– slow writes
– “wears” out over time
write-many
• expensive, slow

screen has many scan lines, each of which has many pixels
• phosphor acts as a capacitor - refresh 30-60 times/second
Graphics Displays - Frame Buffer

(Diagram: CPU -> Memory -> Frame Buffer -> CRT; the CPU updates the frame buffer at about 0.2 MB/s, while the frame buffer feeds the CRT at about 30 MB/s.)

frame buffer stores bit map
• one entry per pixel
• black - 1 bit per pixel
• gray-scale - 4-8 bits per pixel
• color (RGB) - 8 bits per color
• typical size 1560 x 1280 pixels
• black and white: 250 KB
• color (RGB): 5.7 MB
color map: frame buffer stores color map index
• color map translates index to full 24-bit color
• (Diagram: each frame-buffer entry is an 8-bit index, e.g. 17 at pixel (X0, Y0); a 256 x 24 color map expands it to a 24-bit RGB value such as 120 014 074 for the CRT.)
• 1560 x 1280 with 256-entry color map - factor of 3 reduction
• but the frame buffer is read as well

BIT BLTs: bit block transfers
• read-modify-write operations
• e.g., read-xor-write
• used for cursors etc.

open question
• OS only?
• or direct user access? protection?
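The read-xor-write trick can be sketched on a toy 1-bit-per-pixel buffer (the function and sizes are illustrative, not from the slides); XORing the same mask twice restores the original pixels, which is why it suits cursors:

```python
# BITBLT sketch: read-modify-write with xor on a 1-bit-per-pixel buffer.
def xor_blit(frame, mask, x0, y0):
    for dy, row in enumerate(mask):
        for dx, bit in enumerate(row):
            frame[y0 + dy][x0 + dx] ^= bit   # read, xor, write back

frame = [[0] * 8 for _ in range(8)]
cursor = [[1, 1], [1, 0]]
xor_blit(frame, cursor, 2, 2)    # draw the cursor
xor_blit(frame, cursor, 2, 2)    # xor again: cursor erased
assert all(p == 0 for row in frame for p in row)
```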
Frame Buffer Implementation

1560 x 1280 RGB display
• bandwidth required = 1560 x 1280 x 24 bits x 30/s = 171 MB/s

Other Issues in Displays

double buffering
• duplicate frame buffer
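The 171 MB/s figure follows directly from the pixel count, color depth, and refresh rate (taking 1 MB = 2^20 bytes):

```python
# Refresh bandwidth for a 1560 x 1280, 24-bit, 30 Hz display.
bits_per_frame = 1560 * 1280 * 24
bytes_per_second = bits_per_frame * 30 // 8
print(round(bytes_per_second / 2**20))   # -> 171 MB/s
```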
Networks

Terminal networks
• machine-terminal
• star - point-to-point
• 0.3-19 Kbits/s, RS232 protocol

LANs
• machine-machine
• bus, ring, star
• 0.1-100 Mbits/s, < 10 km
• ethernet

Long haul networks
• machine-machine
• irregular structure - point to point
• 50-2000 Kbits/s, > 10 km
• Internet
LAN

E.g., Ethernet
• one-write bus with collisions and exponential backoff
• within building
• 10 Mb/s

Now ethernet is
• point to point to clients (switched network)
• with hubs
• client s/w unchanged
• 100 Mb/s

ATM - Asynchronous Transfer Mode
• phone companies use it for long-haul networks (packet-switched)
• not a viable LAN yet
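Ethernet's collision handling can be sketched as truncated binary exponential backoff; the cap of 10 doublings matches classic Ethernet, but the function itself is an illustrative sketch, not from the slides:

```python
import random

# After the k-th collision, a station waits a random number of slot times
# in [0, 2^min(k, 10) - 1]; the window doubles with each new collision.
def backoff_slots(collisions, rng=random):
    k = min(collisions, 10)
    return rng.randrange(2 ** k)

assert all(0 <= backoff_slots(3) < 8 for _ in range(1000))   # 2^3 = 8 slots
```

Randomizing the wait makes it unlikely that the same two stations collide again, and doubling the window adapts to how congested the bus is.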
I/O System Architecture

(Diagram: CPU with cache on the CPU-memory bus, shared with memory, an IOP, and a frame buffer driving the CRT; the IOP bridges to an I/O bus carrying two disk controllers and a network interface.)

Buses

Option                         High performance   Low cost
address/data lines separate?   yes                no
data lines                     wider              narrower
transfer size                  multiple words     single word
bus masters                    multiple           one
split transactions?            yes                no
clocking                       synchronous        asynchronous
CPU Interface

(Diagrams: two options - the I/O bus attached below the CPU-memory bus, or I/O connected directly to the cache.)

• I/O bus
  + industry standard
  – slower than memory bus
  – indirection through I/O processor
Bus Switching Methods

circuit-switched buses
• bus is held until request is complete
• simple protocol
• latency of device affects bus utilization

split transaction or packet-switched (or pipelined)
• bus is released after request is initiated
• others use the bus until reply comes back
• complex bus control
• better utilization of bus

Standard I/O Buses

                   S bus       MicroChannel   PCI-Xpress   SCSI
data width         32 bits     32             32-64        8-16
clock              16-25 MHz   asynch         256          10/asynch
# masters          multiple    multiple       multiple     multiple
b/w, 32-bit read   33 MB/s     20             150+         20 or 6
b/w, peak          89          75             800+         20 or 6
Communicating with I/O Processors

I/O control
• memory mapped
  • ld/st to “special” addresses => operations occur
  • protected by virtual memory
• I/O instructions
  • special instructions initiate I/O operations
  • protected by privileged instructions

I/O completion
• polling
  • wait for status bit to change
  • periodic checking
• interrupt
  • I/O completion interrupts CPU
1 user program sets up a table in memory with the I/O request (pointer to channel program), then executes a syscall
2 OS checks for protection, then executes the “start subchannel” instr

Disk Arrays

redundant arrays of inexpensive disks (RAIDs)
Extensions to conventional disks

fixed head disk
• head per track, head does not seek
• seek time eliminated
• rotational latency unchanged
• low track density
• not economical

parallel transfer disk
• read from multiple surfaces at the same time
• difficulty in locking onto different tracks on multiple surfaces
• lower cost alternatives possible (disk arrays)

increasing disk density
• an on-going process
• requires increasingly sophisticated lock-on control
• increases cost
• schedule simultaneous I/O requests to reduce latency
• e.g., schedule the request with the shortest seek time

(Figure: striping examples across four disks - a block striped as B0 B1 B2 B3, and blocks A2, B2, C2, D2 placed on separate disks.)
Disk Arrays

independent addressing
• s/w or user distributes data
• load balancing an issue

fine-grain striping
• stripe unit: one bit, one byte, one sector
• #disks x stripe unit evenly divides smallest accessible data
• perfect load balance; only one request served at a time
• effective transfer rate approx N times better than single disk
• access time can go up, unless synchronized disks

coarse-grain striping
• data transfer parallelism for large requests
• concurrency for small requests

must consider workload to determine stripe size
Redundant Array of Inexpensive Disks - RAID

level 3: hard error detection and parity (e.g., D=4, C=1)
• key: failed disk is easily identified by controller
• no need for special code to identify failed disk
• striped data - N data disks and 1 parity disk
• because failed disk is known, parity is enough for recovery

level 4: intra-group parallelism
• coarse-grain striping
• like level 3 + ability to do more than one small I/O at a time
• write must update data disk and parity disk

level 5: rotated parity to parallelize writes
• parity spread out across disks in a group
• different updates of parities go to different disks

level 6: two-dimensional array
• data is arranged as a two-dimensional array
• with row and column parities
• tolerates more than 1 failure
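The point that parity alone suffices once the failed disk is known can be sketched with XOR over toy 4-byte blocks (D = 4 data disks, C = 1 parity disk, as above; the helper is ours):

```python
# RAID 3/5 parity: the parity block is the XOR of the data blocks, so any
# one known-failed block is rebuilt by XORing the survivors with the parity.
def parity(blocks):
    out = bytes(len(blocks[0]))
    for b in blocks:
        out = bytes(x ^ y for x, y in zip(out, b))
    return out

data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]       # D = 4 data disks
p = parity(data)                                  # C = 1 parity disk
rebuilt = parity([data[0], data[2], data[3], p])  # disk 1 has failed
assert rebuilt == data[1]
```

This is exactly why the controller must be able to identify the failed disk: XOR tells you the missing value, not which disk it came from.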
Better option: I/O is a shared resource and sees requests from many jobs, so if the jobs are independent enough, I/O requests will be random enough that we can use queuing theory (ECE 600, 547)

Think of I/O as a queuing system
• requests enter the queue at a certain rate
• wait for service
• service takes a certain time
• requests leave the system at a certain rate
• we can calculate the response time for each request

Little’s Law
• rate = avg. # in system / avg. response time
• applies to any queue in equilibrium
• (Diagram: arrivals -> queue -> server)
I/O Performance

Total time in system = time in queue + time in service
• total time is the response time - that’s what matters
• service rate = 1/time to serve
• utilization = arrival rate/service rate
• length of system = length of queue + avg. # of jobs in service

note that Little’s law can be applied to individual components
• server: # in server = arrival rate x time in service
• queue: queue length = arrival rate x time in queue
I/O Performance

time in queue = queue length x service time + util x average residual time
time in queue = (service time x (1 + C) x util) / (2 x (1 - util))
• C is the squared coefficient of variation of the service time; C = 1 for exponentially distributed service times

avoid bottlenecks in I/O system

designing an I/O system
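A sketch of the queue-time formula above (assuming C = 1, i.e. exponential service times; the function is ours):

```python
# time_in_queue = service_time x (1 + C) x util / (2 x (1 - util))
def time_in_queue(service_time, util, C=1.0):
    return service_time * (1 + C) * util / (2 * (1 - util))

# At 50% utilization with C = 1, queue time equals one service time:
print(time_in_queue(0.01, 0.5))   # -> 0.01 s
```

Evaluating it near util = 1 shows why high utilization is so costly: the (1 - util) denominator makes the queue time blow up.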
I/O Performance

Example: 500-MIPS CPU with 10,000 instructions per I/O; memory delivers 16 bytes per 100 ns; 16-KB I/O requests; 20-MB/s SCSI-2 buses

• CPU limit - 500 MIPS/10,000 = 50,000 IOPS
• memory limit - (16 B/100 ns)/16 KB = 10,000 IOPS
• SCSI-2 transfer time = 16 KB/20 MB/s = 0.8 ms
• SCSI-2 limit - 1/(1 ms + 0.8 ms) = 556 IOPS per bus
• minimum SCSI-2 buses for 100 2-GB disks = 100/15 = 7 (at most 15 disks per bus)
• max IOPS for 2 SCSI-2 buses = 2 x 556 = 1112
• max IOPS for 7 SCSI-2 buses = 7 x 556 = 3892
• number of disks per SCSI bus at full b/w = 556/67 = 8
• number of SCSI buses for 25 8-GB disks = 25/8 = 4 (rounding up)
• number of SCSI buses for 100 2-GB disks = 100/8 = 13

so we have
• 25 8-GB disks with 2 or 4 SCSI strings
• 100 2-GB disks with 7 or 13 SCSI strings
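The design arithmetic on these slides, redone as a script; the 1 ms per-I/O overhead behind 1/(1 + 0.8) and the 67 IOPS per disk are the slides' own assumptions:

```python
# I/O system design limits, following the slides' numbers.
cpu_iops = 500_000_000 // 10_000              # 500 MIPS / 10,000 instr per I/O
mem_bytes_per_s = 16 * 1_000_000_000 // 100   # 16 bytes every 100 ns
mem_iops = mem_bytes_per_s // 16_000          # one 16-KB request per I/O
scsi_transfer_ms = 16_000 / 20_000            # 16 KB at 20 MB/s = 0.8 ms
scsi_iops = 1000 / (1 + scsi_transfer_ms)     # 1 ms overhead + transfer
disks_per_bus = int(scsi_iops // 67)          # at 67 IOPS per disk

print(cpu_iops, mem_iops, round(scsi_iops), disks_per_bus)
# -> 50000 10000 556 8
```

The smallest of these limits (here, memory at 10,000 IOPS among the system-wide ones, and 556 IOPS per SCSI-2 string) is the bottleneck the rest of the design has to work around.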
I/O Performance

We assumed 100% utilization for some of the components
• but queuing delay worsens severely at high utilization
• so we need to limit utilization - rules of thumb:
  • I/O bus < 75%
  • disk string < 40%
  • disk arm < 60%
  • disk < 80%
• recalculate performance based on these limits

Unix File System Performance

cache files in memory
• memory is much faster than disks

file cache is key to I/O performance
• OS parameters - cache size, write policy
• asynchronous writes => processor continues
• coherence in client/server systems