
CS 704 Advanced Computer Architecture

Lecture 2
Quantitative Principles

A detailed discussion of computer performance, the key to quantitative design and analysis
MAC/VU-Advanced Computer Architecture

Lecture 2 - Performance

Today's Topics
- Recap of Lecture 1
- Growth in processor performance
- Price-performance design
- CPU performance metrics
- CPU benchmark suites
- Summary

Recap of Lecture 1
Computer Systems:

- Architecture refers to those attributes of a computer visible to a programmer or compiler writer, e.g., instruction set, addressing techniques and I/O mechanisms.
- Organization refers to how the features of a computer are implemented, e.g., whether control signals are generated using a finite state machine (FSM) or microprogramming.

Recap of Lecture 1
Computer Development:
- Academically, modern computer development had its infancy in 1944-49.
- Commercially, the first machine was built by the Eckert-Mauchly Computer Corporation in 1949.
- Technological developments, from vacuum tubes to VLSI circuits, dynamic memory and network technology, gave birth to four different generations of computers.
- Microprocessors and PCs were introduced from 1971 onward.

Recap of Lecture 1
Design Perspectives:
- Processor: ISA, ILP and cache
- Memory hierarchy: multilevel cache and virtual memory
- Input/output and storage
- Multiprocessors and networks

Recap of Lecture 1
Computer Design Cycle:

Computer design and development has been under the influence of
- technology,
- performance, and
- cost.

The decisive factors for rapid changes in computer development have been performance enhancements, price reductions and functional improvements.

Growth in Processor Performance

- Supercomputers and mainframes, costing millions of dollars and occupying very large spaces, were the prevailing form of computing in the 1960s; they were replaced by relatively low-cost, smaller minicomputers in the 1970s.
- In the 1980s, very low-cost microprocessor-based desktop machines were introduced in the form of the personal computer (PC) and the workstation.

Growth in Processor Performance

- Growth in processor performance since the mid-1980s has been substantially higher than in earlier years.
- Prior to the mid-1980s, microprocessor performance grew by about 35% per year on average.
- By 2001, growth had risen to about 1.58x per year (about 58% per year).
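
To get a feel for how far apart the two growth rates quoted above (about 35% per year and about 1.58x per year) drift over time, the following minimal sketch compounds each rate. The 15-year window and the function name `relative_performance` are assumptions chosen for illustration; they do not come from the slides.

```python
def relative_performance(annual_growth: float, years: int) -> float:
    """Performance after `years` of compound growth, relative to the start."""
    return (1.0 + annual_growth) ** years


# Assumed window of 15 years (for example 1986 to 2001); not from the slides.
years = 15
slow = relative_performance(0.35, years)  # growth rate before the mid-1980s
fast = relative_performance(0.58, years)  # growth rate quoted for 2001
gap = fast / slow                         # how far the faster curve pulls ahead

print(f"35%/year over {years} years: {slow:.0f}x")
print(f"58%/year over {years} years: {fast:.0f}x")
print(f"gap between the two curves: {gap:.1f}x")
```

Under these assumptions the faster curve ends up roughly an order of magnitude ahead, which is why a seemingly modest change in the annual rate matters so much.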

Growth in Processor Performance

[Figure: performance relative to MIPS (vertical axis, 0 to 1600) versus year (horizontal axis, 1984 to 2000), with data points labeled MIPS R2000, IBM Power1, HP 9000, DEC Alpha and Intel P-III.]

Price-Performance Design
- Technology improvements are used to lower cost and increase performance.
- The relationship between cost and price is a complex one.
- The cost is the total amount spent to produce a product.
- The price is the amount for which a finished good is sold.

Price-Performance Design

The cost passes through different stages before it becomes price.


A small change in cost may have a big impact on price

Price vs. Cost

Manufacturing costs: total amount spent to produce a component
- Component cost: cost at which the components are available to the designer; it ranges from 40% to 50% of the list price of the product.
- Recurring costs: labor, purchasing, scrap, warranty; 4% to 16% of list price.
- Gross margin (non-recurring costs): R&D, marketing, sales, equipment, rental, maintenance, financing cost, pre-tax profits, taxes.

Price vs. Cost

List price: the amount for which the finished good is sold; it includes an average discount of 15% to 35% of the list price, given as volume discounts and/or retailer markup.

Price vs. Cost .. Price-Performance Design Contd

[Figure: stacked bar chart showing, for Mini, W/S and PC, the share of list price (0% to 100%) taken by Component Costs, Direct Costs, Gross Margin and Average Discount.]
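
The remark that a small change in cost may have a big impact on price can be illustrated with a rough sketch. It assumes the component cost stays a fixed share of the list price; the 45% share (the midpoint of the 40% to 50% range quoted above), the rupee-free example amounts and the function name `list_price` are all hypothetical.

```python
def list_price(component_cost: float, component_share: float) -> float:
    """List price implied when component cost is a fixed share of list price."""
    return component_cost / component_share


share = 0.45                       # assumed: midpoint of the 40%-50% range
base = list_price(450.0, share)    # hypothetical component cost of 450
bumped = list_price(460.0, share)  # the same product with 10 more in components

print(f"base list price:   {base:.2f}")
print(f"bumped list price: {bumped:.2f}")
print(f"price increase per unit of added cost: {(bumped - base) / 10:.2f}")
```

With a fixed share, every unit of added component cost is multiplied by 1/share on its way to the list price, so the price moves by more than twice the cost change in this example.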

Cost-effective IC Design:
Price-Performance Design

- Yield: percentage of manufactured components surviving testing
- Volume: higher volume increases manufacturing efficiency, hence decreases the list price, and improves purchasing efficiency
- Feature size: the minimum size of a transistor or wire in either the x or y direction

Cost-effective IC Design:
Price-Performance Design

The reduction in feature size from 10 microns in 1971 to 0.18 microns in 2001 has resulted in:
- Quadratic rise in transistor count
- Linear increase in performance
- Growth from 4-bit to 64-bit microprocessors
- Desktops replacing time-sharing machines

Cost of Integrated Circuits

Manufacturing stages: integrated circuit manufacturing passes through many stages:
- Wafer growth and testing
- Chopping the wafer into dies
- Packaging the dies into chips
- Testing the chips

Cost of Integrated Circuits

Die: the square area of the wafer containing the integrated circuit. Note that when dies are fitted on the wafer, a small wafer area around the periphery goes to waste.

Cost of a die: determined from the cost of a wafer, the number of dies that fit on a wafer, and the percentage of dies that work, i.e., the die yield.

Dies of Integrated Circuits


Cost of Integrated Circuits

The cost of an integrated circuit is the ratio of the total cost (the sum of the die cost, die testing cost, packaging cost and final testing cost) to the final test yield.

Calculating Integrated Circuits Costs

$$ \text{Cost of IC} = \frac{\text{die cost} + \text{die testing cost} + \text{packaging cost} + \text{final testing cost}}{\text{final test yield}} $$

Cost of Integrated Circuits

The cost of a die is the ratio of the cost of the wafer to the product of the dies per wafer and the die yield.

Calculating Integrated Circuits Costs

$$ \text{Cost of die} = \frac{\text{Cost of wafer}}{\text{dies per wafer} \times \text{die yield}} $$

Cost of Integrated Circuits

The number of dies per wafer is determined by dividing the wafer area (minus the wasted wafer area near the round periphery) by the die area.

Calculating Integrated Circuits Costs

$$ \text{Dies per wafer} = \frac{\pi \times (\text{wafer diameter}/2)^2}{\text{die area}} - \frac{\pi \times \text{wafer diameter}}{\sqrt{2 \times \text{die area}}} $$

Example: Calculating Number of Dies

For a die 0.7 cm on a side, find the number of dies per wafer for a wafer of 30 cm diameter.

Answer: [wafer area / die area] - dies lost to the wasted wafer area at the edge

$$ \frac{\pi \times (30/2)^2}{0.49} - \frac{\pi \times 30}{\sqrt{2 \times 0.49}} \approx 1347 \text{ dies} $$
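
The dies-per-wafer calculation can be checked with a short sketch. It assumes the usual form of the formula, with a factor of pi in both terms and a square root in the edge-loss term; the function name `dies_per_wafer` is chosen here for illustration.

```python
import math


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> int:
    """Whole dies per wafer: wafer area / die area, minus dies lost at the round edge."""
    gross = math.pi * (wafer_diameter_cm / 2) ** 2 / die_area_cm2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return int(gross - edge_loss)  # truncate: a partial die is not usable


die_side_cm = 0.7
print(dies_per_wafer(30, die_side_cm ** 2))  # 1347, matching the slide
```

The result is truncated rather than rounded because only complete dies count.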


Calculating Die Yield

- Die yield is the fraction or percentage of good dies on a wafer.
- Wafer yield accounts for completely bad wafers, which need not be tested.
- Die yield depends on the defect density (defects per unit area) and on a parameter α, which depends on the number of masking levels; a good estimate for CMOS is α = 4.0.

Calculating Integrated Circuits Costs

$$ \text{Die yield} = \text{Wafer yield} \times \left(1 + \frac{\text{defects per unit area} \times \text{die area}}{\alpha}\right)^{-\alpha} $$

Example: the yield of a die 0.7 cm on a side (die area 0.49 cm^2), with a defect density of 0.6/cm^2, taking α = 4.0 and a wafer yield of 100%, is

$$ \left(1 + \frac{0.6 \times 0.49}{4.0}\right)^{-4} = 0.75 $$
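
The cost formulas on these slides chain together: die yield feeds die cost, which feeds the cost of the packaged IC. The sketch below wires them up under the assumption that α = 4.0 and wafer yield is 100% unless stated; the function names and the wafer cost, testing cost and packaging cost in the usage lines are hypothetical values, not figures from the lecture.

```python
def die_yield(defects_per_cm2: float, die_area_cm2: float,
              wafer_yield: float = 1.0, alpha: float = 4.0) -> float:
    """Fraction of good dies, using the (1 + D*A/alpha)^(-alpha) model."""
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


def die_cost(wafer_cost: float, dies_per_wafer: int, yield_fraction: float) -> float:
    """Cost of one good die: wafer cost spread over the dies that work."""
    return wafer_cost / (dies_per_wafer * yield_fraction)


def ic_cost(die: float, die_test: float, packaging: float,
            final_test: float, final_test_yield: float) -> float:
    """Cost of one packaged, tested chip."""
    return (die + die_test + packaging + final_test) / final_test_yield


y = die_yield(0.6, 0.49)          # the slide's example, about 0.75
d = die_cost(5000.0, 1347, y)     # hypothetical wafer cost of 5000
chip = ic_cost(d, 1.0, 2.0, 1.0, 0.95)  # hypothetical test/packaging costs
print(f"die yield {y:.2f}, die cost {d:.2f}, IC cost {chip:.2f}")
```

Keeping each formula in its own function makes it easy to see which input (defect density, die area, wafer cost) dominates the final cost.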

Price-Performance Design

- Time to run the task: execution time, response time, latency
- Throughput or bandwidth: tasks per day, hour, week, sec, ns

Price-Performance Design

Example: carrying 2400 passengers from Lahore to Islamabad.
The train completes the task in 4.0 hours, while the airplane completes the same task in 6.0 hours; i.e., the airplane completes only 66.67% of the task in the same time. The throughput, and hence the performance, of the train is therefore 50% higher than that of the airplane.

Price-Performance Design: Example

Vehicle | Time Lah to Isb | Passengers/trip | Time to complete job | Execution time/person | Cost/person | Cost-performance
Train | 4.0 hours | 2400 | 4.0 hours | 6.0 sec | 300 Rs. | 300 x 6 = 1,800 Rs-sec/person
Plane | 45 min | 300 | 45 x 8 min = 6.0 hours | 9.0 sec | 3000 Rs. | 3000 x 9 = 27,000 Rs-sec/person

- The plane is about 5.3 times faster per trip (45 min vs. 4.0 hours) but takes 50% more time to complete the job, i.e., it has lower throughput; thus the performance of the train is 50% better than that of the plane.
- The time per person and the cost per person of the train are less than those of the plane; thus the cost-performance ratio of train to plane is 1:15.
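
The train and plane comparison can be reproduced with a small sketch that separates latency (time per trip) from throughput (time to finish the whole job). Like the table, it counts one-way trips only; the `Vehicle` class and the function names are introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    name: str
    trip_hours: float         # one-way trip time, Lahore to Islamabad
    passengers_per_trip: int
    fare_rs: float            # cost per person


def job_hours(v: Vehicle, passengers: int) -> float:
    """Time to move all passengers, counting one-way trips only (as the table does)."""
    trips = -(-passengers // v.passengers_per_trip)  # ceiling division
    return trips * v.trip_hours


def seconds_per_person(v: Vehicle, passengers: int) -> float:
    """Execution time per person: total job time divided by people moved."""
    return job_hours(v, passengers) * 3600 / passengers


def cost_performance(v: Vehicle, passengers: int) -> float:
    """Cost x time per person, in Rs-sec/person (lower is better)."""
    return v.fare_rs * seconds_per_person(v, passengers)


train = Vehicle("Train", 4.0, 2400, 300)
plane = Vehicle("Plane", 0.75, 300, 3000)
for v in (train, plane):
    print(v.name, job_hours(v, 2400), seconds_per_person(v, 2400),
          cost_performance(v, 2400))
```

The plane wins on latency for a single trip, but the train wins on throughput and on cost-performance for the whole job.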

Metrics of Performance

Levels of the system, from top to bottom:
- Application
- Programming language
- Compiler
- Instruction set architecture
- Datapath, control
- Function units
- Transistors, wires, pins, I/O

Metrics, from the top level to the bottom level:
- Answers per month, operations per second
- MIPS: millions of instructions per second; MFLOPS: millions of FP operations per second
- Megabytes per second
- Cycles per second (clock rate)

Aspects of CPU Performance

$$ \text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}} $$

The three factors are instruction count (Inst Count), CPI and clock rate. They are affected by:
- Program
- Compiler
- Inst. Set.
- Organization
- Technology
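
As a quick illustration of the CPU time equation, the sketch below multiplies instruction count by CPI and divides by clock rate (seconds per cycle is the inverse of clock rate). The instruction count, CPI and clock rate used in the call are hypothetical.

```python
def cpu_time_seconds(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instructions x cycles/instruction x seconds/cycle."""
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instruction_count * cpi * seconds_per_cycle


# Hypothetical program: 1e9 instructions, CPI of 1.5, 500 MHz clock.
print(cpu_time_seconds(1e9, 1.5, 500e6))
```

Halving any one factor (fewer instructions, lower CPI, or a shorter cycle) halves the CPU time, provided the other two factors stay unchanged.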

Cycles Per Instruction

Cycles per instruction:

$$ \text{CPI} = \frac{\text{CPU clock cycles for program}}{\text{Instruction count}} = \frac{\text{CPU time} \times \text{Clock rate}}{\text{Instruction count}} $$

Instruction frequency: for an instruction mix, the relative frequency of occurrence of different types of instructions is given as

$$ F_{IC_i} = \frac{IC_i}{\text{Total instruction count}} $$

Average cycles per instruction:

$$ \text{CPI} = \frac{1}{\text{Instruction count}} \sum_{i=1}^{n} IC_i \times CPI_i = \sum_{i=1}^{n} F_{IC_i} \times CPI_i $$

Example: Calculating average CPI

Base Machine (Reg / Reg)

Op | Freq | Cycles | CPI(i) | (% Time)
ALU | 50% | 1 | 0.5 | (33%)
Load | 20% | 2 | 0.4 | (27%)
Store | 10% | 2 | 0.2 | (13%)
Branch | 20% | 2 | 0.4 | (27%)
Total | | | 1.5 |
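
The base-machine example can be recomputed directly from the instruction mix. This sketch takes the frequencies and cycle counts from the table; the dictionary layout and the function names `average_cpi` and `time_share` are illustrative choices.

```python
# Instruction mix from the table: op -> (relative frequency, cycles)
mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}


def average_cpi(mix: dict) -> float:
    """Weighted sum of per-class cycle counts: sum of F_i x CPI_i."""
    return sum(freq * cycles for freq, cycles in mix.values())


def time_share(mix: dict) -> dict:
    """Fraction of execution time spent in each instruction class."""
    total = average_cpi(mix)
    return {op: freq * cycles / total for op, (freq, cycles) in mix.items()}


print(f"average CPI = {average_cpi(mix):.2f}")
for op, share in time_share(mix).items():
    print(f"{op}: {share:.0%} of time")
```

The time shares show that ALU operations, although half of all instructions, account for only a third of the execution time.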

Cycles Per Instruction

Arithmetic mean time:

$$ \frac{1}{n} \sum_{i=1}^{n} \text{Time}_i $$

Weighted arithmetic mean time:

$$ \sum_{i=1}^{n} w_i \times \text{Time}_i $$

Geometric mean time:

$$ \sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i} $$
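
The three ways of summarizing execution times can be sketched as follows. The weights are assumed to sum to 1, and the sample times and ratios are hypothetical; `math.prod` requires Python 3.8 or later.

```python
import math


def arithmetic_mean(times: list) -> float:
    """(1/n) x sum of the times."""
    return sum(times) / len(times)


def weighted_arithmetic_mean(times: list, weights: list) -> float:
    """Sum of w_i x Time_i; the weights are assumed to sum to 1."""
    return sum(w * t for w, t in zip(weights, times))


def geometric_mean(ratios: list) -> float:
    """n-th root of the product of execution time ratios."""
    return math.prod(ratios) ** (1.0 / len(ratios))


times = [2.0, 4.0, 12.0]  # hypothetical execution times in seconds
print(arithmetic_mean(times))
print(weighted_arithmetic_mean(times, [0.5, 0.25, 0.25]))
print(geometric_mean([2.0, 8.0]))  # hypothetical ratios to a reference machine
```

The geometric mean is applied to ratios (times normalized to a reference machine) rather than to raw times, which is why it takes a different input here.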

Summary: Price-Performance Design

Computer cost: the total cost of manufacturing a computer is distributed among different parts of the system, such as the cabinet, processor board and I/O devices.

Performance: time is the key measurement of performance. To compare the performance of two designs X and Y, the ratio

$$ n = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{\text{Performance}_X}{\text{Performance}_Y} $$

determines how many times faster machine X is than machine Y, since performance is the inverse of execution time.

Instruction Execution Rate - MIPS

MIPS specifies performance inversely to execution time. For a given program:

$$ \text{MIPS} = \frac{\text{instruction count}}{\text{execution time} \times 10^6} $$

- MIPS cannot be calculated from the instruction mix alone.
- Relative MIPS for a machine M is defined with respect to some reference machine as:

$$ \text{RMIPS} = \frac{\text{Performance}_M}{\text{Performance}_{reference}} \times \text{MIPS}_{reference} = \frac{\text{Time}_{reference}}{\text{Time}_M} \times \text{MIPS}_{reference} $$

- MFLOPS is defined for floating-point-intensive programs as millions of floating-point operations per second.
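
The MIPS and relative MIPS definitions translate into two one-line functions. The instruction count, execution times and reference rating in the usage lines are hypothetical, and the function names are chosen for this sketch.

```python
def mips(instruction_count: float, execution_time_s: float) -> float:
    """Native MIPS: instructions executed per second, in millions."""
    return instruction_count / (execution_time_s * 1e6)


def relative_mips(time_reference_s: float, time_m_s: float,
                  mips_reference: float) -> float:
    """MIPS of machine M relative to a reference machine's rating."""
    return (time_reference_s / time_m_s) * mips_reference


# Hypothetical program: 50 million instructions finishing in 2 seconds.
print(mips(50e6, 2.0))
# Hypothetical comparison: reference machine rated at 1 MIPS takes 10 s,
# machine M takes 2 s on the same program.
print(relative_mips(10.0, 2.0, 1.0))
```

Relative MIPS depends only on measured execution times and the reference rating, so it does not require counting the instructions executed on machine M.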

CPU Benchmark Suites

- Performance comparison: comparing the execution time of the same workload running on two machines without running the actual programs.
- Benchmarks: programs specifically chosen to measure performance.
- Five levels of programs, in decreasing order of accuracy:
  - Real applications
  - Modified applications
  - Kernels
  - Toy benchmarks
  - Synthetic benchmarks

SPEC: System Performance Evaluation Cooperative (SPECmarks)

- First round, 1989: 10 programs yielding a single number
- Second round, 1992: SPECint92 (6 integer programs) and SPECfp92 (14 floating-point programs)
- Third round, 1995: new set of programs, SPECint95 (8 integer programs) and SPECfp95 (10 floating-point programs)
  - benchmarks useful for 3 years
  - single flag setting for all programs: SPECint_base95, SPECfp_base95

Summary: Designing and performance comparison

Designing to Last through Trends

Technology | Capacity | Speed
Logic | 2x in 3 years | 2x in 3 years
DRAM | 4x in 3 years | 2x in 10 years
Disk | 4x in 3 years | 2x in 10 years

6 yrs to graduate => 16X CPU speed, DRAM/Disk size

- Time to run the task: execution time, response time, latency
- Tasks per day, hour, week, sec, ns: throughput, bandwidth

"X is n times faster than Y" means

$$ n = \frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)} $$

Summary .. Contd

CPI Law:

$$ \text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}} $$

- Execution time is the REAL measure of computer performance!
- Good products are created when there are good benchmarks and good ways to summarize performance.
- Die cost goes roughly with (die area)^4.

Summary .. Contd

- For better or worse, benchmarks shape a field.
- Good products are created when there are good benchmarks and good ways to summarize performance.
- Since sales are in part a function of performance relative to the competition, companies invest in improving the product as reported by the performance summary.
- If the benchmarks or summary are inadequate, a company must choose between improving the product for real programs and improving it to get more sales; sales almost always wins!
- Execution time is the measure of computer performance!
