Lecture 2: Performance Evaluation Methodology and Dependability
Chapter 1 (1.7~1.9)

What Does Performance Mean?

Response time
    Measure one task
    e.g. A simulation program finishes in 20 minutes
Throughput
    Measure many tasks
    e.g. A web server serves 5 million requests per second
Other metrics
    MIPS (millions of instructions per second)
    MFLOPS (millions of floating-point operations per second)
    BIPS, GFLOPS
    Clock frequency

Quantitative Definitions

Use response time or execution time
    Performance is 1/[Execution time]
    Performance is 1/CPI (cycles per instruction)
    Performance is IPC (instructions per cycle)
    Elapsed time vs. CPU time
        Elapsed: latency to complete a task
        CPU: user + system, no waiting time
Use throughput
    Performance is # tasks per second or hour

Performance Comparison

X is n times faster than Y:
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$
n: speedup if we are considering an enhancement, optimization, etc.
Improve performance: decrease execution time, increase throughput
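The "n times faster" ratio can be sketched in a few lines of Python. This is a minimal illustration only: the function name `speedup` and the 16-minute timing are hypothetical, chosen to pair with the 20-minute simulation example from the slides.

```python
def speedup(exec_time_y: float, exec_time_x: float) -> float:
    """Return n such that X is n times faster than Y.

    n = Execution time_Y / Execution time_X = Performance_X / Performance_Y
    """
    if exec_time_x <= 0 or exec_time_y <= 0:
        raise ValueError("execution times must be positive")
    return exec_time_y / exec_time_x


# Hypothetical timings: the baseline (Y) takes 20 minutes,
# the optimized version (X) takes 16 minutes.
n = speedup(exec_time_y=20.0, exec_time_x=16.0)
print(f"X is {n:.2f} times faster than Y")
```

Both times must be in the same unit; the ratio itself is unitless.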

Performance of Computers

Performance is defined for a given program and a given machine. How about the machine alone? Need benchmark programs:
    Real applications: scientific programs, compilers, text-processing software, image processing
    Modified applications: providing portability and focus
    Kernels: good to isolate performance of individual features
        Lmbench: measure latency and bandwidth of memory, file system, networking, etc.
    Toy benchmarks
    Synthetic benchmarks: matching average execution profile

Benchmark Suite

A benchmark suite is a collection of benchmarks with a variety of applications
    Alleviating weakness of a single benchmark
    More representative for computer designers to evaluate their design
Categories of benchmark suites
    Desktop benchmarks: CPU, memory, and graphics performance
    Server benchmarks: throughput-oriented, I/O and OS intensive
    Embedded benchmarks: measuring the ability to meet deadlines and save power

SPEC CPU Benchmark

SPEC: Standard Performance Evaluation Corporation (www.spec.org)
CPU-intensive benchmark for evaluating processor performance of workstation
Five generations: SPEC89, SPEC92, SPEC95, SPEC2000, SPEC2006
Two types of programs: INT (12) and FP (17)

Other SPEC Benchmarks

Graphics and Workstation: SPECviewperf12, SPECwpc and SPECapc
High Performance Computing: SPEC ACCEL, SPEC MPI2007, SPEC OMP2012
Java Client/Server: SPECjbb2013
Network File System: SPECsfs2014
Power: SPECpower_ssj2008
Virtualization: SPECvirt_sc2013

TPC Benchmarks

TPC: Measuring the ability of a system to handle transactions (www.tpc.org)
TPC-C, TPC-E: online transaction processing (OLTP) benchmark (for bank/brokerage)
TPC-H, TPC-DS: decision support
TPC-VMS: Virtualization

Embedded Benchmark

EEMBC (Embedded Microprocessor Benchmark Consortium) benchmarks (www.eembc.org)
Based on kernel performance
Classes: telecommunications, networking, digital media, Java, automotive/industrial, consumer, and office equipment products

Summarizing Performance

Given the performance of a set of programs, how to evaluate the performance of machines?

                A       B       C
P1 (secs)       1       10      20
P2 (secs)       1000    100     20
Total (secs)    1001    110     40

Which computer is the best one?

Metric 1: Arithmetic Mean

Total execution time / (number of programs)
$$\frac{1}{n} \sum_{i=1}^{n} \text{Time}_i$$
Simple and intuitive
Representative if the user runs the programs an equal number of times
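As a quick check of the arithmetic mean on the three-machine table, the following sketch computes the total and mean execution time per machine. The dictionary layout and the helper name `arithmetic_mean` are illustrative choices, not part of the lecture.

```python
# Execution times in seconds for programs P1 and P2, taken from the table.
times = {
    "A": [1, 1000],
    "B": [10, 100],
    "C": [20, 20],
}


def arithmetic_mean(values):
    """Total execution time divided by the number of programs."""
    return sum(values) / len(values)


totals = {machine: sum(t) for machine, t in times.items()}
means = {machine: arithmetic_mean(t) for machine, t in times.items()}

for machine in times:
    print(f"{machine}: total = {totals[machine]} s, mean = {means[machine]} s")
```

By this metric C has the lowest mean time, which matches the Total row of the table.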

Metric 2: Weighted Arithmetic Mean

Give (different) weights to different programs
$$\sum_{i=1}^{n} \text{Weight}_i \times \text{Time}_i, \qquad \sum_{i=1}^{n} \text{Weight}_i = 1$$
Considering the frequencies of programs in the workload

Metric 3: Geometric Mean

Based on relative performance to a reference machine
$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
Relative performance is consistent with different reference machines
$$\frac{\text{Geometric mean}(X_i)}{\text{Geometric mean}(Y_i)} = \text{Geometric mean}\left(\frac{X_i}{Y_i}\right)$$
If C is 2x faster than B (using B as the reference), B is 2x faster than A (A as the reference), then C is 4x faster than A (A as the reference)
SPECRate uses geometric mean
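The two metrics can be sketched as small Python helpers. The function names and the 90%/10% workload weights are assumptions made for illustration; the execution times come from the earlier A/B table. The last lines check the property that the ratio of geometric means equals the geometric mean of the ratios.

```python
import math


def weighted_arithmetic_mean(times, weights):
    """Sum of Weight_i * Time_i; the weights must sum to 1."""
    if len(times) != len(weights):
        raise ValueError("times and weights must have the same length")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * t for w, t in zip(weights, times))


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and positive")
    return math.prod(values) ** (1.0 / len(values))


# Execution times (secs) of P1 and P2 on machines A and B.
times_a = [1, 1000]
times_b = [10, 100]

# Hypothetical workload: P1 accounts for 90% of the runs, P2 for 10%.
weights = [0.9, 0.1]
wam_a = weighted_arithmetic_mean(times_a, weights)
wam_b = weighted_arithmetic_mean(times_b, weights)
print(f"Weighted mean: A = {wam_a:.1f} s, B = {wam_b:.1f} s")

# Ratio of geometric means versus geometric mean of per-program ratios.
ratio_of_means = geometric_mean(times_b) / geometric_mean(times_a)
mean_of_ratios = geometric_mean([b / a for a, b in zip(times_a, times_b)])
print(f"ratio of means = {ratio_of_means:.3f}, mean of ratios = {mean_of_ratios:.3f}")
```

`math.prod` requires Python 3.8 or later; for long lists of ratios, summing logarithms would be a more numerically robust alternative.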

SPECRate
[Figure not recoverable]

Example

How to get average of normalized execution time?
Recall the previous example

                A       B       C
P1 (secs)       1       10      20
P2 (secs)       1000    100     20
Total (secs)    1001    110     40

Arithmetic mean of execution time: B is 9.1x faster than A, C is 25x faster than A
BUT arithmetic mean of normalized execution time? Might get conflicting results
Geometric mean of normalized execution time: consistent results. A and B are equally fast, and C takes roughly 40% less time than A (about 1.6x faster)
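The conflict can be reproduced numerically. The sketch below normalizes the table's execution times to a reference machine and compares the arithmetic and geometric means under two different references; the helper names are illustrative and not from the lecture.

```python
import math

# Execution times in seconds for P1 and P2, from the example table.
times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}


def normalized(machine, reference):
    """Per-program execution time of `machine` relative to `reference`."""
    return [t / r for t, r in zip(times[machine], times[reference])]


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    return math.prod(values) ** (1.0 / len(values))


for ref in ("A", "B"):
    am = {m: arithmetic_mean(normalized(m, ref)) for m in times}
    gm = {m: geometric_mean(normalized(m, ref)) for m in times}
    print(f"reference {ref}:")
    print("  arithmetic mean:", {m: round(v, 3) for m, v in am.items()})
    print("  geometric mean: ", {m: round(v, 3) for m, v in gm.items()})
```

With A as the reference the arithmetic mean ranks A ahead of B, while with B as the reference it ranks B ahead of A. The geometric mean gives the same relative ordering in both cases.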

Amdahl's Law

We know about performance: defining, measuring, and summarizing
How to maximize performance gains from the beginning in our design?
Principle: Make the Common Case Fast!

Amdahl's Law

Predict the overall speedup from the local speedup of an enhancement, provided the fraction of time the enhancement is used is known.
Local speedup is related to design and optimization objectives, such as doubling the CPU frequency or halving cache latency

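Amdahl's Law

$$\text{Execution time}_{new} = \text{Execution time}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right]$$
$$\text{Speedup}_{overall} = \frac{\text{Execution time}_{old}}{\text{Execution time}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$

Amdahl's Law Application

Objective: improve performance of a graphics engine
    Choice one: Speed up FP square root by 10x
    Choice two: Speed up all FP instructions by 1.6x
    Assume 20% of instructions are FP square root, 50% are FP instructions overall
Ask: Which choice is better?
The answer is: choice two. Overall speedup is 1/(0.8 + 0.2/10) ≈ 1.22 for choice one and 1/(0.5 + 0.5/1.6) ≈ 1.23 for choice two.
Implication: Optimize for the common case first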
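A minimal Python sketch of the overall-speedup formula, applied to the graphics engine question. The function name `amdahl_speedup` is illustrative, and the sketch assumes the 20% and 50% figures can be treated as fractions of execution time.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Choice one: FP square root (assumed 20% of execution time) sped up 10x.
choice_one = amdahl_speedup(fraction_enhanced=0.20, speedup_enhanced=10.0)

# Choice two: all FP instructions (assumed 50% of execution time) sped up 1.6x.
choice_two = amdahl_speedup(fraction_enhanced=0.50, speedup_enhanced=1.6)

print(f"Choice one: {choice_one:.4f}x")
print(f"Choice two: {choice_two:.4f}x")
print("Better choice:", "two" if choice_two > choice_one else "one")
```

The smaller local speedup wins here because it applies to a larger share of the work, which is the point of making the common case fast.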

CPI and IPC

CPI: Average number of cycles spent for each instruction
$$\text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{Instruction count}}$$
IPC: Average number of instructions that can be finished in one cycle
$$\text{IPC} = \frac{\text{Instruction count}}{\text{CPU clock cycles for a program}}$$

CPU Time Equation

$$\text{CPU time} = \text{CPU clock cycles} \times \text{cycle time}$$
$$\text{CPU clock cycles} = \text{Instruction count} \times \text{CPI}$$
$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{cycle time}$$
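The relationships between CPI, IPC, and CPU time can be sketched as follows. All numbers are hypothetical (one billion instructions, two billion cycles, a 1 GHz clock), and the function names are illustrative.

```python
def cpi(clock_cycles: float, instruction_count: float) -> float:
    """Average cycles per instruction."""
    return clock_cycles / instruction_count


def ipc(clock_cycles: float, instruction_count: float) -> float:
    """Average instructions per cycle (the reciprocal of CPI)."""
    return instruction_count / clock_cycles


def cpu_time(instruction_count: float, cpi_value: float, cycle_time_s: float) -> float:
    """CPU time = Instruction count x CPI x cycle time."""
    return instruction_count * cpi_value * cycle_time_s


# Hypothetical program: 1e9 instructions taking 2e9 cycles on a 1 GHz clock.
instruction_count = 1e9
clock_cycles = 2e9
cycle_time_s = 1e-9  # 1 GHz clock

program_cpi = cpi(clock_cycles, instruction_count)
program_ipc = ipc(clock_cycles, instruction_count)
seconds = cpu_time(instruction_count, program_cpi, cycle_time_s)

print(f"CPI = {program_cpi:.2f}, IPC = {program_ipc:.2f}, CPU time = {seconds:.2f} s")
```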

Equation Based on Instruction Types

$$\text{CPU time} = \text{CPU clock cycles} \times \text{Clock cycle time}$$
$$\text{CPU clock cycles} = \sum_{i=1}^{n} IC_i \times CPI_i$$
$$\text{CPU time} = \left( \sum_{i=1}^{n} IC_i \times CPI_i \right) \times \text{Clock cycle time}$$
$$\text{CPI} = \sum_{i=1}^{n} \text{Instruction frequency}_i \times CPI_i$$

Make Design Choice Using CPU Time Equation

             FP      FPSQR   Other
Frequency    25%     2%      75%
CPI          4.0     20      1.33

Alternative 1: CPI_FPSQR 20 → 2
Alternative 2: CPI_FP 4 → 2.5
Which one is better? Calculate speedups.
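One way to work the design choice is sketched below. It assumes FPSQR instructions are a subset of the FP instructions (the frequencies would otherwise sum to 102%), and that instruction count and clock cycle time are unchanged, so the speedup is the ratio of the CPIs. Variable names are illustrative.

```python
# Frequencies and CPIs from the table.
freq_fp, cpi_fp = 0.25, 4.0          # all FP instructions (assumed to include FPSQR)
freq_other, cpi_other = 0.75, 1.33   # non-FP instructions
freq_fpsqr, cpi_fpsqr = 0.02, 20.0   # FP square root only

# Original CPI = sum of Instruction frequency_i x CPI_i.
cpi_original = freq_fp * cpi_fp + freq_other * cpi_other

# Alternative 1: CPI of FPSQR drops from 20 to 2, saving (20 - 2) cycles
# on 2% of the instructions.
cpi_alt1 = cpi_original - freq_fpsqr * (cpi_fpsqr - 2.0)

# Alternative 2: average CPI of all FP instructions drops from 4 to 2.5.
cpi_alt2 = freq_fp * 2.5 + freq_other * cpi_other

# Same instruction count and cycle time, so speedup = CPI_old / CPI_new.
speedup_alt1 = cpi_original / cpi_alt1
speedup_alt2 = cpi_original / cpi_alt2

print(f"Original CPI: {cpi_original:.4f}")
print(f"Alternative 1: CPI = {cpi_alt1:.4f}, speedup = {speedup_alt1:.3f}")
print(f"Alternative 2: CPI = {cpi_alt2:.4f}, speedup = {speedup_alt2:.3f}")
```

Under these assumptions Alternative 2 gives the slightly larger speedup.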

Dependability

Module reliability
    Mean time to failure (MTTF)
    Mean time to repair (MTTR)
    Mean time between failures (MTBF) = MTTF + MTTR
    Availability = MTTF / MTBF

Dependability Example

A disk system
    10 disks, each 1M-hour MTTF
    1 ATA controller, 500K-hour MTTF
    1 power supply, 200K-hour MTTF
    1 fan, 200K-hour MTTF
    1 ATA cable, 1M-hour MTTF
System MTTF
$$\text{Failure rate} = 10 \times \frac{1}{1\text{M}} + \frac{1}{500\text{K}} + \frac{1}{200\text{K}} + \frac{1}{200\text{K}} + \frac{1}{1\text{M}} = \frac{10 + 2 + 5 + 5 + 1}{1\text{M hours}} = \frac{23}{1\text{M hours}}$$
$$\text{MTTF} = \frac{1}{\text{Failure rate}} = \frac{1\text{M hours}}{23} = 43.5\text{K hours} \approx 5 \text{ years}$$
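The disk system calculation can be reproduced with a short sketch. It assumes component failures are independent with constant failure rates, so the rates simply add. The 24-hour MTTR used for the availability figure is hypothetical and not from the slides.

```python
HOURS_PER_YEAR = 24 * 365

# (count, MTTF in hours) for each component type in the disk system.
components = [
    (10, 1_000_000),  # disks
    (1, 500_000),     # ATA controller
    (1, 200_000),     # power supply
    (1, 200_000),     # fan
    (1, 1_000_000),   # ATA cable
]

# Failure rates of independent components add up.
failure_rate = sum(count / mttf for count, mttf in components)
system_mttf = 1.0 / failure_rate


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Availability = MTTF / MTBF, where MTBF = MTTF + MTTR."""
    return mttf_hours / (mttf_hours + mttr_hours)


print(f"Failure rate: {failure_rate * 1e6:.0f} per million hours")
print(f"System MTTF: {system_mttf:.0f} hours (~{system_mttf / HOURS_PER_YEAR:.1f} years)")

# Hypothetical repair time of 24 hours.
print(f"Availability with 24 h MTTR: {availability(system_mttf, 24.0):.5f}")
```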

Copyright 2012, Elsevier Inc. All rights reserved.
