Introduction
Major Topics
Hardware-software interface
Machine language and assembly language programming
Processor design
Pipelined processor design
Memory hierarchy
Virtual memory & operating systems support
I/O devices
Course Information
Instructor: Yosi Keller
E-mail: yosi.keller@gmail.com
Office: Room 427
Office Hours: by appointment
TA: Tal Darom
Website: ask Tal
Grading
Homework Sets: 0%
Final: 100%
Why use electronics
  Electrons are easy to move / control
  Easier than the current alternatives
Result is that we move information / not real physical stuff
  Think phone, email, fax, TV, WWW, etc.
The calculating section of Difference Engine No. 2 has 4,000 moving parts (excluding the printing mechanism) and weighs 2.6 tons. It is seven feet high, eleven feet long, and eighteen inches deep.
Electronics
Building electronics:
  Started with tubes, then miniature tubes
  Transistors, then miniature transistors
  Components were getting cheaper and more reliable, but
    There is a minimum cost per component (storage, handling)
    Total system cost was proportional to complexity
Integrated circuits changed that
  Devices that integrate multiple transistors
  Print a circuit, like you print a picture
  Create components in parallel
  Cost no longer depends on the number of devices
Sense of Scale
What fits on a chip today?
Mainstream logic chip
  10 mm on a side (100 mm^2)
  90 nm drawn gate length
  210 nm wire pitch
  10 wire levels
For comparison
  32b RISC integer processor
    1K x 2K wire grids
    1100 processors
  SRAM
    About 4 x 4 grids / bit
    138 M SRAM cells
  DRAM
    1 x 2 grids / bit
    1.1 B cells
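The capacity figures above can be roughly reproduced from the chip side and the wire pitch. The sketch below is only a back-of-the-envelope check: it assumes that a "grid" is one wire pitch by one wire pitch and that the whole die is usable, neither of which the slides state. The function name `units_that_fit` is made up for illustration.

```python
# Back-of-the-envelope check of the chip-capacity figures.
# Assumption (not stated in the slides): a "grid" is one wire pitch by one
# wire pitch, and the whole die area is usable.
CHIP_SIDE_NM = 10 * 1_000_000   # 10 mm expressed in nanometres
WIRE_PITCH_NM = 210             # wire pitch from the slide

tracks_per_side = CHIP_SIDE_NM / WIRE_PITCH_NM   # wire tracks along one side
total_grids = tracks_per_side ** 2               # grid squares on the die


def units_that_fit(grids_per_unit):
    """Return how many units fit if each occupies grids_per_unit grid squares."""
    return total_grids / grids_per_unit


processors = units_that_fit(1024 * 2048)   # 32b RISC core: 1K x 2K grids
sram_cells = units_that_fit(4 * 4)         # SRAM: about 4 x 4 grids per bit
dram_cells = units_that_fit(1 * 2)         # DRAM: 1 x 2 grids per bit

print(f"32b RISC processors: {processors:,.0f}")
print(f"SRAM cells: {sram_cells / 1e6:,.0f} M")
print(f"DRAM cells: {dram_cells / 1e9:.2f} B")
```

Under these assumptions the results land within a few percent of the slide's rounded numbers.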
Technology Scaling
Chip density doubles every 3 years
What can you do with this?
[Figure labels: 1998, 2004, 2010]
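As a small worked example of the doubling rule, the sketch below computes the relative density for the three years that label the figure, taking 1998 as the baseline. The function name `density_growth` is hypothetical, and the exact 3-year doubling period is the slide's rule of thumb rather than a measured value.

```python
def density_growth(years, doubling_period_years=3.0):
    """Relative density after `years`, if density doubles every period."""
    return 2.0 ** (years / doubling_period_years)


BASE_YEAR = 1998
for year in (1998, 2004, 2010):
    factor = density_growth(year - BASE_YEAR)
    print(f"{year}: {factor:.0f}x the {BASE_YEAR} density")
```

Twelve years at this rate is four doublings, so the same chip area holds 16 times as many devices.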
1. Too many applications to cast all into hardware logic
   Takes too long to finish the design
2. How do you make sure it works?
   Verification problem
Only way to survive complexity:
  Hide complexity in general-purpose components
  Reuse components
Microprocessor Complexity
Model has hidden the scaling of technology
Efficiently transformed transistors to performance
  8008: 3,500 transistors, ran at 200 kHz (1972)
  Pentium 4: 42M transistors, runs at 3+ GHz (2003)
Performance changed from 0.06 MIPS to >1,000 MIPS
It is still useful to look inside the box
  Understand limitations of the programmer's model
  Understand strange performance issues
Efficiency and performance issues will become more important
Is (x + y) + z = x + (y + z)?
Unsigned & Signed Ints: Yes!
Floats:
  (1e20 + -1e20) + 3.14 --> 3.14
  1e20 + (-1e20 + 3.14) --> ??
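The floating-point question can be tried directly. The sketch below uses Python floats, which are IEEE 754 double precision on common platforms; the variable names are only for illustration. Python integers are arbitrary precision, so the fixed-width integer case is not reproduced here.

```python
x, y, z = 1e20, -1e20, 3.14

left = (x + y) + z    # x and y cancel exactly first, then 3.14 is added
right = x + (y + z)   # 3.14 is far below the spacing of doubles near 1e20,
                      # so y + z rounds back to -1e20 before x is added

print("(x + y) + z =", left)
print("x + (y + z) =", right)
print("associative here?", left == right)
```

The second grouping loses the small term entirely, which is why floating-point addition is not associative in general.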
Security
Class Goal
Provide a better understanding of modern digital systems design
  These systems almost always have a programmable processor
  Processors are a good example of a complex system
    Pipelining and caches
Tie the hardware with the software
  Most people use processors and don't build them
  Interaction of HW and SW is fundamental to computer systems
  Write better software
Provide a foundation for other classes in systems
  Networking, OS, Compilers, Embedded Systems, etc.
  Understand capabilities of Compilers, OS
Personal Computer
Computer
  Processor
    Control (brain)
    Datapath
  Memory (where programs, data live when running)
  Devices
    Input
    Output
      Display, Printer
PC Motherboard
Input/output interfaces
Power reg/supply
PCI slots
AGP
Memory controller
Pentium 4 socket
DIMM slots
COMPUTER ARCHITECTURE, Lecture 1
PC System
Pentium 4
  2.66 GHz
  8 KB data cache, 12 KB instruction cache
  512 KB L2 cache
  533 MHz system bus
  68 Watts
Memory system
  4 DDR DIMM slots
  Up to 4 GB
I/O interfaces
  Ethernet
  USB
  Serial ATA (disk)
  Serial port
  Parallel port
  Firewire
ARM processor
PS2 Motherboard
64 b MIPS CPU, 300 MHz
  Behavioral synthesis, geometry processing, main system control
Rendering, texture, framebuffer ops
R3000 CPU (120K transistors)
R3010 FPU
32 KB instruction cache
32 KB data cache
256 KB secondary cache
Memory controller chips
[Figure labels: Applications, Interfaces, API, Link, ISA, Machine Organization, IR, Regs, Technology]
Transaction/web processing: ??
Multimedia processing: ??
Embedded control: ??
Standalone networked
IO integration & system software become more critical
Parallelism
Data-level (DLP): same operation on every element of a data sequence
Instruction-level (ILP): independent instructions within sequential program
Thread-level (TLP): parallel tasks within one program
Multi-programming: independent programs
Pipelining
Predictability
Control-flow direction, memory references, data values
Magnetic Disks
60% to 100% increase in density
IO/networking
Little improvement in latency
Large improvements in bandwidth through fast/wide signaling
2001
  64x more devices since 1992
  4x faster devices
1980s
  5K - 500K transistors
  Single-chip, pipelined CPUs
  On-chip memory possible
  Simple, hard-wired control
  Simple instruction sets
  Small on-chip caches
1990s
  1M - 64M transistors, 64b CPUs
  Complex control to exploit instruction-level parallelism
  Deep pipelines
  Multi-level caches
2000s
  100M - 5B transistors
  Slow wires, power consumption, design complexity, memory latency, IO bottlenecks
  Multiprocessors & parallel systems
  Support & programming for parallelism?
Iterative Process
[Figure labels: Sort, Bad ideas, Mediocre ideas]
Metrics of Efficiency
Desktop computing ($500 - $3K)
Metrics: ??
Prominent processors: Intel Pentium, AMD Athlon, PowerPC G5
Performance Metrics
Plane              DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747         6.5 hours     610 mph    470          286,700
BAC/Sud Concorde   3 hours       1350 mph   132          178,200
Metric is independent of exact number of tasks executed
Important when we have many tasks to run
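The airplane comparison can be recomputed to show how the two metrics rank the planes differently. This is a small sketch using the figures from the table; the dictionary layout and the helper name `throughput_pmph` are illustrative choices, and throughput is taken as passengers times speed, which matches the table's pmph column.

```python
planes = {
    "Boeing 747": {"hours": 6.5, "speed_mph": 610, "passengers": 470},
    "Concorde": {"hours": 3.0, "speed_mph": 1350, "passengers": 132},
}


def throughput_pmph(plane):
    """Throughput in passenger-miles per hour: passengers times speed."""
    return plane["passengers"] * plane["speed_mph"]


for name, plane in planes.items():
    print(f"{name}: latency {plane['hours']} h, "
          f"throughput {throughput_pmph(plane):,} pmph")

# Latency favours the Concorde, throughput favours the 747.
latency_ratio = planes["Boeing 747"]["hours"] / planes["Concorde"]["hours"]
throughput_ratio = (throughput_pmph(planes["Boeing 747"])
                    / throughput_pmph(planes["Concorde"]))
print(f"Concorde is {latency_ratio:.2f}x faster per trip")
print(f"747 has {throughput_ratio:.2f}x the throughput")
```

Which plane "performs better" therefore depends on whether one trip or many passengers matter.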
Examples
Latency metric: program execution time in seconds

\[
\text{CPUtime} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Cycles}}{\text{Program}} \times \frac{\text{Seconds}}{\text{Cycle}} = IC \times CPI \times CCT
\]

IC = instruction count, CPI = cycles per instruction, CCT = clock cycle time

Your system architecture can affect all of them
  CPI: memory latency, IO latency, ...
  CCT: cache organization, ...
  IC: OS overhead, ...
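To make the CPU time relation concrete, here is a minimal sketch that multiplies the three factors. The program size, CPI, and clock rate are made-up numbers chosen only for illustration, and the function name `cpu_time` is hypothetical.

```python
def cpu_time(instruction_count, cpi, clock_cycle_time_s):
    """CPU time in seconds: instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time_s


# Hypothetical program: 1 billion instructions, CPI of 2.0, 1 GHz clock.
IC = 1_000_000_000
CPI = 2.0
CCT = 1.0 / 1e9   # clock cycle time in seconds for a 1 GHz clock

baseline = cpu_time(IC, CPI, CCT)
# Halving CPI (for example through fewer memory stalls) halves CPU time,
# as long as IC and CCT stay the same.
improved = cpu_time(IC, CPI / 2, CCT)

print(f"baseline: {baseline:.2f} s")
print(f"improved: {improved:.2f} s")
```

Because the three factors multiply, a change that improves one of them only helps if it does not worsen the others by more.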
Bandwidth metrics:
Network bandwidth: 1 Gb/s Ethernet
Database server throughput: 10^6 transactions/sec
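\[
\text{Cycles} = \sum_{i=1}^{n} CPI_i \times I_i
\]

Instruction Frequency

\[
CPI = \sum_{i=1}^{n} CPI_i \times F_i
\quad \text{where} \quad
F_i = \frac{I_i}{\text{instruction count}}
\]

Here \(I_i\) is the number of executed instructions of class \(i\), \(CPI_i\) is the cycles per instruction for that class, and \(n\) is the number of instruction classes.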
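The frequency-weighted CPI can be sketched in a few lines. The instruction classes, counts, and per-class CPI values below are hypothetical, picked only to make the arithmetic easy to follow; `average_cpi` is an illustrative name.

```python
def average_cpi(counts, cpi_per_class):
    """Average CPI as the sum over classes of CPI_i * F_i, F_i = I_i / total."""
    total = sum(counts.values())
    return sum(cpi_per_class[name] * counts[name] / total for name in counts)


# Hypothetical instruction mix (I_i) and per-class cycle costs (CPI_i).
counts = {"alu": 50, "load": 20, "store": 10, "branch": 20}
cpi_per_class = {"alu": 1, "load": 2, "store": 2, "branch": 2}

cpi = average_cpi(counts, cpi_per_class)
total_cycles = sum(cpi_per_class[name] * counts[name] for name in counts)

print(f"average CPI: {cpi:.2f}")
print(f"total cycles: {total_cycles}")
```

Multiplying the average CPI by the total instruction count gives back the total cycle count, which is a quick consistency check on the two formulas.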
A is Faster than B?
Given the CPUtime for machines A and B, "A is X times faster than B" means:

\[
X = \frac{\text{CPUtime}_B}{\text{CPUtime}_A}
\]

Example: CPUtime_A = 3.4 sec and CPUtime_B = 5.3 sec, then
A is 5.3/3.4 = 1.56 times faster than B, or 56% faster

For bandwidth metrics:

\[
X = \frac{\text{BandWidth}_A}{\text{BandWidth}_B}
\]
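The two ratios point in opposite directions because lower time is better while higher bandwidth is better. A minimal sketch, with hypothetical function names and made-up bandwidth figures, using the CPU times from the example:

```python
def speedup_from_times(time_a, time_b):
    """How many times faster A is than B when lower time is better."""
    return time_b / time_a


def speedup_from_bandwidth(bandwidth_a, bandwidth_b):
    """How many times faster A is than B when higher bandwidth is better."""
    return bandwidth_a / bandwidth_b


x_time = speedup_from_times(3.4, 5.3)
print(f"A is {x_time:.2f} times faster than B "
      f"({(x_time - 1) * 100:.0f}% faster)")

# Hypothetical bandwidths in Gb/s, only to show the direction of the ratio.
x_bw = speedup_from_bandwidth(10.0, 4.0)
print(f"A has {x_bw:.1f} times the bandwidth of B")
```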
Performance Sensitivity
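Definitions:
  \(C_i\): contributor i to performance
  \(P\): performance
  \(\Delta x\): absolute change in x
  \(\Delta x / x\): relative change in x
Sensitivity of P to change in \(C_i\):
  to absolute change in \(C_i\):
  \[
  \frac{\partial P(C_1, \ldots, C_i, \ldots, C_N)}{\partial C_i}
  \]
  to relative change in \(C_i\):
  \[
  \frac{\partial P(C_1, \ldots, C_i, \ldots, C_N)}{\partial C_i / C_i}
  \]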
Relative Sensitivity
Relative sensitivity of P to relative changes in Ci versus its relative sensitivity to relative changes in Cj:
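\[
\frac{\dfrac{\partial P(C_1, \ldots, C_i, \ldots, C_N)}{\partial C_i / C_i}}
     {\dfrac{\partial P(C_1, \ldots, C_i, \ldots, C_N)}{\partial C_j / C_j}}
=
\frac{\dfrac{\partial P(C_1, \ldots, C_i, \ldots, C_N)}{\partial C_i} \, C_i}
     {\dfrac{\partial P(C_1, \ldots, C_i, \ldots, C_N)}{\partial C_j} \, C_j}
\]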
Solution: relSens = C1 / C2, so the relative benefit of increasing C1 grows. Underlying theory: Amdahl's law.
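The relative sensitivity ratio can be estimated numerically with small relative perturbations. The sketch below assumes a performance function P = C1 + C2; that choice is not taken from the slides and is only one function for which the ratio works out to C1 / C2. The names `relative_sensitivity` and `total_time` are illustrative.

```python
def relative_sensitivity(perf, contributors, i, j, rel_step=1e-6):
    """Ratio of P's sensitivity to relative changes in C_i versus C_j.

    Uses forward finite differences as an approximation of the partial
    derivatives.
    """
    def sensitivity_to_relative_change(k):
        bumped = list(contributors)
        delta = contributors[k] * rel_step          # absolute change in C_k
        bumped[k] += delta
        d_perf = perf(bumped) - perf(contributors)  # resulting change in P
        return d_perf / (delta / contributors[k])   # dP / (dC_k / C_k)

    return (sensitivity_to_relative_change(i)
            / sensitivity_to_relative_change(j))


def total_time(c):
    """Assumed performance function for illustration: P = C1 + C2."""
    return c[0] + c[1]


c1, c2 = 8.0, 2.0
rel_sens = relative_sensitivity(total_time, [c1, c2], 0, 1)
print(f"relSens = {rel_sens:.4f}, C1 / C2 = {c1 / c2:.4f}")
```

For this assumed P, the component that already dominates is the one whose relative change matters most, which is the intuition behind Amdahl's law.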
[Table: which aspects affect Inst Count, CPI, and Clock Rate; row labels and cell marks not recoverable]
Evaluating Performance
What do we mean by performance?
How do we select benchmark programs?
How do we summarize performance across a suite of programs?
  When to use the different types of means
  Statistics for architects
Examples:
SPEC CPU integer/floating-point suites
TPC transaction processing benchmarks
If you know your exact workload (benchmarks & relative frequencies), this is the right way to summarize performance.
Based on slides by C. Kozyrakis
Next few slides: basic tools for statistics for computer architects
How to observe large collections of experiment results
How to represent large collections of experiment results
[Figure labels: Sample, n; Statistics]
Basic Assumptions
Measurements are repeatable
Same program + input gives same performance
Valid for most programs/machines; worth verifying
Watch out for non-deterministic programs
Number of benchmarks in suite (sample size) is large enough to yield good conclusions
Confidence intervals help verify this
[Figure: lognormal distribution on a logarithmic axis from .001 to 10, with the geometric mean (GM) marked]
AVERAGE:
\[
AM = \mu = \frac{1}{N} \sum_{i=1}^{N} x_i = \text{Arithmetic Mean}
\]
VARP:
\[
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
\]
SDEVP:
\[
\sigma = \sqrt{\sigma^2}
\]
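The three population statistics can be computed directly from their definitions. This is a minimal sketch; the function name `population_stats` and the sample measurements are made up for illustration.

```python
import math


def population_stats(xs):
    """Return (mean, population variance, population standard deviation)."""
    n = len(xs)
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / n   # divide by N, not N - 1
    return mean, variance, math.sqrt(variance)


# Hypothetical measurements, e.g. run times of one benchmark in seconds.
measurements = [2, 4, 4, 4, 5, 5, 7, 9]
mean, variance, std_dev = population_stats(measurements)
print(f"mean = {mean}, variance = {variance}, std dev = {std_dev}")
```

Python's standard `statistics` module offers `pvariance` and `pstdev` for the same population definitions, which can serve as a cross-check.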
[Figure: normal distribution with μ + σ, μ + 2σ, and μ + 3σ marked; about 68% of values lie within μ ± σ, 95% within μ ± 2σ, and 99.7% within μ ± 3σ]
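The 68%, 95%, and 99.7% figures follow from the normal distribution and can be checked with the error function. This sketch assumes an exactly normal distribution; the function name `fraction_within` is illustrative.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k) * 100:.1f}%")
```

Real benchmark measurements are often not normally distributed, so these percentages should be treated as a guide rather than a guarantee.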