
Computer Architecture

Based on slides by C. Kozyrakis


Introduction

What Computer Architecture is About


Many different views:
  How to build programmable digital systems
  Introduction to processor architecture
  Understanding why your programs sometimes run much more slowly than you expect, or don't run at all
Bottom line: digital systems are ubiquitous
  Processors are one of the most common idioms in digital design
  Can't avoid them these days: they are in your computers, TVs, cars, phones, door locks, ...

It pays to understand how they work


To understand what they can and can't do


Major Topics
Hardware-software interface
Machine language and assembly language programming
Processor design
Pipelined processor design
Memory hierarchy
Virtual memory & operating systems support
I/O devices

Instruction sets: RISC vs. CISC


Course Information
Instructor: Yosi Keller
E-mail: yosi.keller@gmail.com
Office: Room 427
Office Hours: by appointment
TA: Tal Darom


Other Course Info


Course text: Computer Organization & Design, 4th Edition, by D. Patterson & J. Hennessy
  The companion CD includes manuals, appendices, simulators, CAD tools, ...

Website: ask Tal

Online material: Google

Grading:
  Homework Sets  0%
  Final          100%


Lecture 1: Introduction to Programmable Digital Systems


Current State of the World


Electronic systems dominate almost everything, and most of these systems use processors and memory
Why? Break this question into three questions:
  Why electronics?
  Why use ICs to build electronics?
  Why use processors in ICs?

Why use electronics?
  Electrons are easy to move and control, easier than the current alternatives
  Result: we move information, not real physical stuff
  Think phone, email, fax, TV, WWW, etc.


Mechanical Alternative to Electronics


[Picture of a version of the Babbage Difference Engine built by the Science Museum, UK]

The calculating section of Difference Engine No. 2 has 4,000 moving parts (excluding the printing mechanism) and weighs 2.6 tons. It is seven feet high, eleven feet long, and eighteen inches deep.


Electronics
Building electronics:
  Started with tubes, then miniature tubes
  Transistors, then miniature transistors
  Components were getting cheaper and more reliable, but...
    There is a minimum cost per component (storage, handling, ...)
    Total system cost was proportional to complexity

Integrated circuits changed that:
  Devices that integrate multiple transistors
  Printed as a circuit, like you print a picture
    Creates components in parallel
    Cost no longer depends on # of devices

What happens as resolution goes up?


The Famous Moore's Law


Devices get smaller
  Get more devices on a chip
  Devices get faster
Initial graph from the 1965 paper
  Prediction: 2x per year
  Not too many data points
Has slowed down to 2x every 1.5 to 2 years
Is Moore's Law really a law?
What does it say about performance?


Sense of Scale
What fits on a chip today? A mainstream logic chip:
  10mm on a side (100mm², about 47,000 wire pitches across)
  90nm drawn gate length
  210nm wire pitch
  10 wire levels

For comparison:
  32b RISC integer processor: about 1K x 2K wire grids, so ~1,100 processors fit
  SRAM: about 4 x 4 grids per bit, so ~138M SRAM cells fit
  DRAM: about 1 x 2 grids per bit, so ~1.1B cells fit

[Figure: chip floorplan showing a 64b FP processor and a 32b RISC processor at scale]


Technology Scaling

[Figure: technology scaling illustrated with designs from 1998, 2004, and 2010]

Chip density doubles every 3 years
What can you do with this?
More devices, harder to design


The Complexity Problem


Complexity is the limiting factor in modern chip design
Two problems:

1. How do you make use of all that space?
   The uber-appliance: cellphone, PDA, iPod, mobile TV, video camera, ...
   Too many applications to cast them all into hardware logic
   Takes too long to finish the design

2. How do you make sure it works? The verification problem

The only way to survive complexity:
   Hide complexity in general-purpose components
   Reuse components


Programmable Components aka Processors


An old approach to solving the complexity problem:
  Build a generic device and customize it with memory (a program)
  The best way to do this is with a general-purpose processor
Processor complexity grows with technology, but the software model stays roughly the same
  C, C++, and Java run on Pentium 2, 3, and 4
  True for sequential programs

This is getting much tougher to do
  Recent hardware developments require software model changes: multi-core processors


Microprocessor Complexity
The programming model has hidden the scaling of technology, efficiently transforming transistors into performance:
  8080: 3,500 transistors, ran at 200kHz (1975)
  Pentium 4: 42M transistors, runs at 3+GHz (2003)
  Performance changed from 0.06 MIPS to >1,000 MIPS
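As a back-of-the-envelope check, using only the figures above, the transistor counts imply a doubling time of about two years:

$$\frac{42\times 10^{6}}{3{,}500} = 12{,}000 \approx 2^{13.6}
\qquad\Rightarrow\qquad
\frac{2003-1975}{13.6} \approx 2.1\ \text{years per doubling}$$

which is consistent with the 1.5 to 2 year rate quoted on the Moore's Law slide.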



Key to Complexity: Nice Interfaces


Use abstraction to hide complexity
Define an interface that allows people to use features without needing to understand all the implementation details
Works for hardware and software
Stable interfaces allow people to optimize below and above them

The layered stack:
  Applications
  C, C++
  Instruction Set Architecture
  Functional Units
  Logic Gates
  Transistors


But I Never Want to Build Hardware


Why should I care about how a computer works?
And why should I have to learn about assembly code? No one codes in assembly any more, right?
  Unfortunately, that is not correct
  E.g. compilers, operating system kernels
  E.g. embedded systems, video games

It is still useful to look inside the box
  Understand the limitations of the programmer's model
  Understand strange performance issues
    Efficiency and performance issues will become more important
  Helps you when things go wrong


Reality #1: Ints are not Integers, Floats are not Reals


Examples:

Is x² ≥ 0?
  Floats: Yes!
  32b Ints:
    40,000 * 40,000 --> 1,600,000,000
    50,000 * 50,000 --> ??

Is (x + y) + z = x + (y + z)?
  Unsigned & Signed Ints: Yes!
  Floats:
    (1e20 + -1e20) + 3.14 --> 3.14
    1e20 + (-1e20 + 3.14) --> ??
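Both effects are easy to reproduce. A minimal C sketch (the values in the comments assume 32-bit ints and IEEE-754 doubles, which is typical but not guaranteed by the C standard):

```c
#include <stdio.h>

int main(void) {
    /* 50,000^2 = 2.5e9 exceeds INT_MAX (~2.15e9): signed overflow.
       This is undefined behavior; it typically wraps to a negative value. */
    int a = 40000 * 40000;   /* 1,600,000,000: fits in 32 bits */
    int b = 50000 * 50000;   /* typically -1,794,967,296       */
    printf("%d %d\n", a, b);

    /* Floating-point addition is not associative. */
    double x = 1e20, y = -1e20, z = 3.14;
    printf("%g\n", (x + y) + z);   /* 3.14                        */
    printf("%g\n", x + (y + z));   /* 0: the 3.14 is lost in 1e20 */
    return 0;
}
```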


Reality #2: You've got to know assembly


Chances are, you'll never write a program in assembly
  Compilers are much better & more patient than you are
Understanding assembly is key to the machine-level execution model:
  Behavior of programs in the presence of bugs
    The high-level language model breaks down (see the sketch below)
  Tuning program performance
    Understanding sources of program inefficiency
  Implementing system software
    A compiler has machine code as its target
    Operating systems must manage process state
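As an illustration of the high-level model breaking down, consider this hypothetical off-by-one bug (a sketch: whether `safe` is actually corrupted depends on the stack layout the compiler chooses):

```c
#include <stdio.h>

int main(void) {
    int safe = 42;
    int buf[4];

    /* Bug: writes buf[0]..buf[4], one element past the end.
       C says nothing useful about the result; at the machine level,
       buf[4] may land on top of 'safe' in the stack frame. */
    for (int i = 0; i <= 4; i++)
        buf[i] = 0;

    printf("safe = %d\n", safe);   /* may print 0 instead of 42 */
    return 0;
}
```

Only the machine-level view (stack frames, memory layout) explains what this program actually does.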


Reality #3: Memory Matters


Memory is not unbounded
  It must be allocated and managed
  Many applications are memory dominated
Memory referencing bugs are especially pernicious
  Effects are distant in both time and space
  Security implications
Memory performance is not uniform
  Cache and virtual memory can greatly affect program performance
  Adapting a program to the characteristics of the memory system can lead to major speed improvements
    10x to 100x in several cases (see the sketch below)
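A classic illustration of non-uniform memory performance is loop traversal order. The sketch below assumes a matrix too large for the cache; the row-major loop commonly runs several times faster than the column-major one, even though both do identical arithmetic:

```c
#include <stdio.h>

#define N 2048
static double a[N][N];

/* Row-major traversal: consecutive accesses fall in the same cache
   line, exploiting spatial locality. */
double sum_by_rows(void) {
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* Column-major traversal: each access touches a different cache line,
   so the same computation can run many times slower. */
double sum_by_cols(void) {
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

int main(void) {
    printf("%f %f\n", sum_by_rows(), sum_by_cols());
    return 0;
}
```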


Class Goal
Provide a better understanding of modern digital systems design
  These systems almost always have a programmable processor
  Processors are a good example of a complex system
    Pipelining and caches
Tie the hardware to the software
  Most people use processors and don't build them
  The interaction of HW and SW is fundamental to computer systems
  Write better software
Provide a foundation for other classes in systems
  Networking, OS, Compilers, Embedded Systems, etc.
  Understand the capabilities of compilers and operating systems


What is a Computer System?


Depends (a little) on the type of computer system
We probably mostly think about PC systems


What is a Computer System?


Actually, most computers look like this


5 components of any Computer

[Figure: a personal computer, annotated with the five components]

Computer:
  Processor
    Control (the brain)
    Datapath
  Memory (where programs and data live when running)
  Devices
    Input: keyboard, mouse
    Disk (where programs and data live when not running)
    Output: display, printer


What is in a Computer System?


Each system is different, but they generally have similar parts:
  Must have: processor, memory, an interface to the outside world (I/O)
  Generally have: cache memory, system bus, memory controller, I/O bus


Example Processor Based Systems


MIPS processor board
  MIPS = Microprocessor without Interlocked Pipeline Stages
Other examples: DSP boards, PC board, digital cell phone, game console


PC Motherboard
[Motherboard photo, labeled: input/output interfaces, power regulation/supply, PCI slots, AGP, memory controller, Pentium 4 socket, DIMM slots]


PC System
Pentium 4, 2.66 GHz
  8 KB data cache, 12 KB instruction cache
  512 KB L2 cache
  533 MHz system bus
  68 Watts
Memory system
  4 DDR DIMM slots, up to 4 GB
I/O interfaces
  Ethernet, USB, Serial ATA (disk), serial port, parallel port, FireWire


Digital Cell Phone (Nokia 8260) Front Side


Battery: 900 mAh
  3.5 hr talk time at ~1 W
  8 days standby at ~1 mW
ARM processor


Digital Cell Phone (Nokia 8260) Back Side


PS2 Motherboard

[Board photo, labeled:]
  64b MIPS CPU, 300 MHz: behavioral synthesis, geometry processing, main system control
  Graphics: rendering, texture, framebuffer ops
  R3000 CPU (120K transistors), R3010 FPU, 32 KB instruction cache, 32 KB data cache, 256 KB secondary cache, memory controller chips
  32b MIPS CPU, 34 MHz: IO processing, PS1 emulation



What do Computer Architects Do?


[Diagram: the computer architect mediates between applications and software requirements, technology, machine organization, and measurement & analysis, working through interfaces such as the API, ISA, links, I/O channels, IR, and registers]

The science/art of constructing efficient systems for computing tasks


Application: Constraints & Opportunities


Applications drive machine balance:
  Scientific computations: floating-point performance, main memory bandwidth
  Transaction/web processing: ??
  Multimedia processing: ??
  Embedded control: ??

Architecture concepts typically exploit application behavior



Applications Change over Time


Data-sets & memory requirements grow larger
  Cache & memory architecture become more critical
Standalone --> networked
  IO integration & system software become more critical
Single task --> multiple tasks
  Parallel architectures become critical
Limited IO requirements --> rich IO requirements
  60s: tapes & punch cards
  70s: character-oriented displays
  80s: video displays, audio, hard disks
  90s: 3D graphics, networking, high-quality audio
  00s: real-time video, immersion, ...


Application Properties to Exploit in Computer Design


Locality in memory/IO references
  Programs work on a subset of instructions/data at any point in time
  Both spatial and temporal locality

Parallelism (see the sketch after this list)
  Data-level (DLP): same operation on every element of a data sequence
  Instruction-level (ILP): independent instructions within a sequential program
  Thread-level (TLP): parallel tasks within one program
  Multi-programming: independent programs
  Pipelining

Predictability
  Control-flow direction, memory references, data values
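To make the first two forms concrete, a minimal C sketch (hypothetical functions, not from the text):

```c
#include <stdio.h>

/* Data-level parallelism (DLP): the same operation applies independently
   to every element, so iterations can be vectorized or run in parallel. */
void scale(float *v, float s, int n) {
    for (int i = 0; i < n; i++)
        v[i] *= s;                  /* no dependence across iterations */
}

/* Instruction-level parallelism (ILP): u and v do not depend on each
   other, so a pipelined/superscalar CPU can overlap their execution. */
int ilp_example(int x, int y) {
    int u = x + 1;                  /* independent of v */
    int v = y * 3;                  /* independent of u */
    return u + v;                   /* depends on both  */
}

int main(void) {
    float data[4] = { 1, 2, 3, 4 };
    scale(data, 2.0f, 4);
    printf("%g %d\n", data[0], ilp_example(1, 2));   /* 2 8 */
    return 0;
}
```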


Technology Trends & Constraints: Yearly Improvement


Integrated circuits: logic
  60% more devices per chip
  15% faster devices
  Long wires don't improve

Integrated circuits: DRAM
  60% more devices per chip
  7% reduction in latency
  14% increase in bandwidth

Magnetic disks
  60% to 100% increase in density

IO/networking
  Little improvement in latency
  Large improvements in bandwidth through fast/wide signaling

[Figure: die photos from 1992, 1995, 1998, and 2001: 64x more devices and 4x faster devices since 1992]


Changes in Technology & Applications lead to Changes in Architecture


1970s
  Multi-chip CPUs
  Semiconductor memory very expensive
  Complex instruction sets (good code density)
  Microcoded control

1980s
  5K - 500K transistors
  Single-chip, pipelined CPUs
  On-chip memory possible
  Simple, hard-wired control
  Simple instruction sets
  Small on-chip caches

1990s
  1M - 64M transistors, 64b CPUs
  Complex control to exploit instruction-level parallelism
  Deep pipelines
  Multi-level caches

2000s
  100M - 5B transistors
  Slow wires, power consumption, design complexity, memory latency, IO bottlenecks, ...
  Multiprocessors & parallel systems
  Support & programming for parallelism?

Keeps computer architecture interesting and challenging



Architects use a Quantitative Approach

An iterative process, using tools that help us analyze, estimate, and compare efficiency:
  New concepts are created, then sorted:
    Good ideas: worth implementing
    Mediocre ideas
    Bad ideas


Metrics of Efficiency
Desktop computing ($500 - $3K)
  Metrics: ??
  Prominent processors: Intel Pentium, AMD Athlon, PowerPC G5

Server computing ($3K - $1M)
  Metrics: ??
  Prominent processors: IBM Power5, Sun UltraSPARC, AMD Opteron

Embedded computing ($10 - $500)
  Metrics: ??
  Prominent processors: ARM, MIPS, Motorola 68K, many others

Diversity in requirements leads to diversity in architectures


Performance Metrics
Plane              DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747         6.5 hours     610 mph    470          286,700
BAD/Sud Concorde   3 hours       1350 mph   132          178,200

Latency (or execution time, or response time)
  Wall-clock time to complete a task
  Important if all we have to run is a single task, or a time-critical one

Bandwidth (or throughput, or execution rate)
  Number of tasks completed per unit of time
    Bandwidth = total amount of work / total execution time
  The metric is independent of the exact number of tasks executed
  Important when we have many tasks to run


Examples
Latency metric: program execution time in seconds

$$\text{CPUtime} = \frac{\text{Seconds}}{\text{Program}}
= \frac{\text{Cycles}}{\text{Program}} \times \frac{\text{Seconds}}{\text{Cycle}}
= \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}
= IC \times CPI \times CCT$$

Your system architecture can affect all three:
  IC (instruction count): OS overhead, ...
  CPI (cycles per instruction): memory latency, IO latency, ...
  CCT (clock cycle time): cache organization, ...

Bandwidth metrics:
  Network bandwidth: 1 Gb/s Ethernet
  Database server throughput: 10^6 transactions/sec
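A quick worked example with hypothetical numbers, just to exercise the formula: a program that executes $10^9$ instructions with CPI = 1.5 on a 2 GHz clock (CCT = 0.5 ns) takes

$$\text{CPUtime} = 10^{9} \times 1.5 \times 0.5\,\text{ns} = 0.75\ \text{s}$$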


Cycles Per Instruction


Average cycles per instruction:

$$CPI(\text{machine, program}) = \frac{\text{total number of clock cycles}}{\text{number of instructions executed}}$$

$$\text{CPUtime} = \text{CycleTime} \times \sum_{i=1}^{n} CPI_i \times I_i$$

Instruction frequency:

$$CPI = \sum_{i=1}^{n} CPI_i \times F_i \qquad\text{where}\qquad F_i = \frac{I_i}{\text{instruction count}}$$
Invest Resources where time is Spent!


Example: Calculating CPI


Base machine (Reg / Reg), typical mix in code:

Op       Freq   Cycles   CPI(i)   (% Time)
ALU      50%    1        0.5      (33%)
Load     20%    2        0.4      (27%)
Store    10%    2        0.2      (13%)
Branch   20%    2        0.4      (27%)
                Total    1.5
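A small C sketch that reproduces the table (frequencies and cycle counts taken from the rows above):

```c
#include <stdio.h>

int main(void) {
    /* Instruction mix from the example: per-class frequency and cycles */
    const char *op[] = { "ALU", "Load", "Store", "Branch" };
    double freq[]    = { 0.50, 0.20, 0.10, 0.20 };
    double cycles[]  = { 1, 2, 2, 2 };

    /* CPI = sum over classes of freq_i * cycles_i */
    double cpi = 0.0;
    for (int i = 0; i < 4; i++)
        cpi += freq[i] * cycles[i];
    printf("CPI = %.2f\n", cpi);          /* 1.50 */

    /* Fraction of execution time spent in each class */
    for (int i = 0; i < 4; i++)
        printf("%-6s %3.0f%% of time\n", op[i],
               100.0 * freq[i] * cycles[i] / cpi);
    return 0;
}
```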


A is Faster than B?
Given the CPUtime for machines A and B, "A is X times faster than B" means:

$$X = \frac{\text{CPUtime}_B}{\text{CPUtime}_A}$$

Example: if CPUtime_A = 3.4 sec and CPUtime_B = 5.3 sec, then A is 5.3/3.4 = 1.55 times faster than B, or 55% faster

If you start with bandwidth metrics of performance, use the inverse ratio:

$$X = \frac{\text{Bandwidth}_A}{\text{Bandwidth}_B}$$

Speedup and Amdahl's Law


Speedup = CPUtime_old / CPUtime_new
Given an optimization x that accelerates fraction f_x of a program by a factor of S_x, how much is the overall speedup?

$$\text{Speedup} = \frac{\text{CPUtime}_{old}}{\text{CPUtime}_{new}}
= \frac{\text{CPUtime}_{old}}{\text{CPUtime}_{old}\left[(1-f_x) + \frac{f_x}{S_x}\right]}
= \frac{1}{(1-f_x) + \frac{f_x}{S_x}}$$

Lessons from Amdahl's law (see the sketch below):
  Make common cases fast: as f_x --> 1, speedup --> S_x
  But don't over-optimize the common case: as S_x --> infinity, speedup --> 1 / (1 - f_x)
    Speedup is limited by the fraction of the code that can be accelerated
    The uncommon case will eventually become the common one
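A minimal C sketch of the law, with assumed values chosen to show both lessons:

```c
#include <stdio.h>

/* Amdahl's law: overall speedup when fraction f of execution time
   is accelerated by a factor s. */
double amdahl(double f, double s) {
    return 1.0 / ((1.0 - f) + f / s);
}

int main(void) {
    printf("%.2f\n", amdahl(0.80, 10.0));   /* 3.57                  */
    printf("%.2f\n", amdahl(0.80, 1e9));    /* ~5.00 = 1/(1 - 0.80): */
    return 0;                               /* the S_x -> inf limit  */
}
```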


Amdahl's Law Example


If S_x = 100, what is the overall speedup as a function of f_x?

[Chart: "Speedup vs Optimized Fraction": speedup (0 to 100) against fraction of code optimized (0 to 1); the curve stays low until f_x approaches 1, then rises steeply toward 100]


Performance Sensitivity
Definitions:
  C_i : contributor i to performance
  P : performance
  Δx : absolute change in x
  Δx / x : relative change in x

Sensitivity of P to a change in C_i:

  to an absolute change in C_i:
  $$\frac{\partial P(C_1,\ldots,C_i,\ldots,C_N)}{\partial C_i}$$

  to a relative change in C_i:
  $$\frac{\partial P(C_1,\ldots,C_i,\ldots,C_N)}{\partial C_i}\, C_i$$

Relative Sensitivity

Relative sensitivity of P to relative changes in C_i versus its relative sensitivity to relative changes in C_j:

$$relSens = \frac{\dfrac{\partial P(C_1,\ldots,C_i,\ldots,C_N)}{\partial C_i}\, C_i}{\dfrac{\partial P(C_1,\ldots,C_j,\ldots,C_N)}{\partial C_j}\, C_j}$$


Effect of Changes on Relative Sensitivity: Example


Let P = C_1 + C_2 (larger P is better)
How does increasing C_1 affect the relative benefit of increasing C_1 rather than C_2 by the same percentage?

Solution: relSens = C_1 / C_2, so the relative benefit of increasing C_1 grows. Underlying theory: Amdahl's law.
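Filling in the one step the slide skips: with $P = C_1 + C_2$, both partial derivatives equal 1, so the ratio from the previous slide reduces to

$$relSens = \frac{\dfrac{\partial P}{\partial C_1}\, C_1}{\dfrac{\partial P}{\partial C_2}\, C_2} = \frac{1 \cdot C_1}{1 \cdot C_2} = \frac{C_1}{C_2}$$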


Aspects of CPU Performance


$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}}
= \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

                Inst Count   CPI   Clock Rate
Program             X         X
Compiler            X         X
Inst. Set           X         X
Organization                  X        X
Technology                             X


Evaluating Performance
What do we mean by performance?
How do we select benchmark programs?
How do we summarize performance across a suite of programs?
  When to use the different types of means
  Statistics for architects


Choosing Benchmark Programs


Criteria
  Representative of real workloads in some way
  Hard to cheat (i.e. to get deceptively good performance that will never be seen in real life)

Best solution: run substantial, real-world programs
  Representative because they are real
  Improvements on these programs = improvements in the real world
  But they require more effort than toy benchmarks

Examples:
  SPEC CPU integer/floating-point suites
  TPC transaction processing benchmarks


How do you summarize performance?


Combining different benchmark results into 1 number: sometimes misleading, always controversial... and inevitable
  Arithmetic mean: for times

Statistics for architects: treat benchmark suites as samples of a population
  Distributions
  Confidence intervals


(Weighted) Arithmetic Mean


$$\text{WAM} = \frac{1}{n}\sum_{i=1}^{n} \text{Weight}_i \times \text{Time}_i$$

              Machine A   Machine B   Speedup (B over A)
Prog. 1 (sec)    1           10            0.1
Prog. 2 (sec)    1000        100           10
Mean (50/50)     500.5       55            9.1
Mean (75/25)     250.75      32.5          7.7
If you know your exact workload (benchmarks & relative frequencies), this is the right way to summarize performance.
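A small C sketch of the computation (times from the table; here the weights are written as fractions that sum to 1, which absorbs the 1/n factor):

```c
#include <stdio.h>

/* Weighted arithmetic mean of execution times; weights sum to 1. */
double wam(const double *w, const double *t, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += w[i] * t[i];
    return sum;
}

int main(void) {
    double timesA[] = { 1.0, 1000.0 };
    double timesB[] = { 10.0, 100.0 };
    double w[]      = { 0.75, 0.25 };        /* the 75/25 workload */

    double a = wam(w, timesA, 2), b = wam(w, timesB, 2);
    printf("A: %.2f  B: %.2f  speedup (B over A): %.1f\n",
           a, b, a / b);                     /* 250.75  32.50  7.7 */
    return 0;
}
```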

Statistics for Architects


Means are nice, but they don't tell you the whole truth
  More info when you run 1,000 programs on one machine
  More info when you run one program on 1,000 machine configurations

Next few slides: basic statistics tools for computer architects
  How to observe large collections of experiment results
  How to represent large collections of experiment results


Populations and Samples


Population: the set of observations measured for ALL members of a group
  Forms a distribution
  Uncertainty: individual measurement errors

Sample: a subset of the population
  Compute statistics on it
  Extra uncertainty: small samples or selection bias

[Diagram: a population (size N, described by parameters) is sampled into a sample (size n, described by statistics); sample size and representativeness determine how well we can estimate the population mean and std. dev. and form a confidence interval for the mean]



Basic Assumptions
Measurements are repeatable
  Same program + input gives the same performance
  Valid for most programs/machines, but worth verifying
  Watch out for non-deterministic programs

Choice of input doesn't change the relative performance of different machines
  Usually true... counterexample?

The number of benchmarks in the suite (sample size) is large enough to yield good conclusions
  Confidence intervals help verify this

Benchmarks are representative and not a biased sample
  Can only be addressed qualitatively

Data Distributions with the Same Arithmetic Mean

[Figure: six distributions plotted on a log scale from .001 to 1000, with the geometric mean (GM) marked]

  Multi-modal (here, left-skewed): awful, but hope
  Right-skewed: uncertain
  Uniform: OK, not much central tendency
  Symmetric triangular: good, more central tendency
  Normal (+): symmetric; terrific! The statistics toolkit applies
  Lognormal (*): log-symmetric; terrific! The statistics toolkit applies

General Distribution Descriptions


Mean: a measure of central tendency, the 1st moment
Variance: a measure of dispersion, the 2nd moment
Standard deviation: a measure of dispersion, on the same scale as the mean

$$\text{AVERAGE}: \quad AM = \mu = \frac{1}{N}\sum_{i=1}^{N} x_i \quad \text{(arithmetic mean)}$$

$$\text{VARP}: \quad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$

$$\text{SDEVP}: \quad \sigma = \sqrt{\sigma^2}$$
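A direct C translation of the three population formulas, over a small made-up data set:

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    double x[] = { 2, 4, 4, 4, 5, 5, 7, 9 };   /* assumed sample data */
    int n = sizeof x / sizeof x[0];

    /* AVERAGE: mu = (1/N) * sum(x_i) */
    double mu = 0.0;
    for (int i = 0; i < n; i++) mu += x[i];
    mu /= n;

    /* VARP: sigma^2 = (1/N) * sum((x_i - mu)^2) */
    double var = 0.0;
    for (int i = 0; i < n; i++) var += (x[i] - mu) * (x[i] - mu);
    var /= n;

    /* SDEVP: sigma = sqrt(sigma^2) */
    printf("mean %.2f  variance %.2f  stddev %.2f\n", mu, var, sqrt(var));
    return 0;            /* prints: mean 5.00  variance 4.00  stddev 2.00 */
}
```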


The Familiar Normal (Gaussian) Distribution


Arises from a large number of small additive effects
Completely specified by the mean μ and standard deviation σ
Familiar, useful properties (never automatically assume normal, but hope):
  68% of values within μ ± σ
  95% within μ ± 2σ
  99.7% within μ ± 3σ
Symmetric around the mean = an intuitive measure of central tendency

[Figure: bell curve over 0.0 to 4.0 with μ ± σ, μ ± 2σ, and μ ± 3σ marked, covering 68%, 95%, and 99.7% of the area]
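For reference, the density behind the figure, in terms of the μ and σ defined above (a standard formula, not on the original slide):

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$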
