
CS701 High Performance Computing

Programming Assignments 3

Submitted to
Basavaraj Talawar
Dept. of Computer science & Engg.

Prepared By:
Himanshu Patel
Rahul Kushwah
(M.Tech - I )

INDEX
1. CACTI
1.1 Introduction
1.2 Results
2. Network-on-Chip Simulation
2.1 Introduction
2.2 64-node Butterfly network
2.3 8x8 2D Mesh
2.4 8x8 2D Torus


1. CACTI
1.1. Introduction
CACTI is an integrated cache access time, cycle time, area, leakage, and dynamic power model for
uniform and non-uniform cache architectures.
By integrating all these models together, users can have confidence that trade-offs between time,
power and area are all based on the same assumptions and, hence, are mutually consistent. CACTI is
intended for use by computer architects to better understand the performance trade-offs inherent in
different cache sizes and organizations.
CACTI 6.0 improves upon prior versions of the tool by focusing on interconnect design for large
caches. In addition to strengthening existing analytical models of the tool for dominant cache
components, CACTI 6.0 introduces two major extensions:
(i) The ability to model Non-Uniform Cache Access (NUCA)
(ii) The ability to model different types of wires (RC-based wires with different power,
delay, and area characteristics, and differential low-swing buses).

1.2. Results
Since CACTI 6.0 does not require command-line parameters, input parameters are provided in
a cache.cfg file. We modified the cache.cfg file with the given data and ran CACTI to obtain the
results. The answers are given below:
1. How many total sets per bank are there?
Sets per bank = cache size / (banks * set associativity * line size)
              = 16384 / (4 * 2 * 64)
              = 32
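The arithmetic above can be reproduced with a minimal sketch. The function name and argument names are our own labels for illustration; they are not CACTI parameters.

```python
def sets_per_bank(cache_bytes, banks, associativity, line_bytes):
    """Sets in each bank, using the formula quoted in the answer above."""
    bytes_per_set_across_banks = banks * associativity * line_bytes
    if cache_bytes % bytes_per_set_across_banks != 0:
        raise ValueError("cache size is not a whole multiple of banks * ways * line size")
    return cache_bytes // bytes_per_set_across_banks


# Values used in this assignment: 16 KB cache, 4 banks, 2-way, 64-byte lines.
result = sets_per_bank(16384, 4, 2, 64)
print(result)
```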

2. What is the default Vdd value?


Default Vdd Value = 0.9

3. What are the components of the access time parameter in Cacti? What was the value
in your case?
Time Components:
Data side (with Output driver) (ns): 0.17047
    H-tree input delay (ns): 0.0281967
    Decoder + wordline delay (ns): 0.0567766
    Bitline delay (ns): 0.0168885
    Sense Amplifier delay (ns): 0.03
    H-tree output delay (ns): 0.0274759
Tag side (with Output driver) (ns): 0.102964
    H-tree input delay (ns): 0.00906248
    Decoder + wordline delay (ns): 0.0309153
    Bitline delay (ns): 0.00609368
    Sense Amplifier delay (ns): 0.03
    Comparator delay (ns): 0.0116144
    H-tree output delay (ns): 0.0118907
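As a quick sanity check, the sub-delays can be summed and compared with the reported side totals. The sketch below assumes the values pair with the labels in the order listed (first value with "Data side", and so on); the dictionary keys are our own shorthand. With that pairing, the listed sub-delays add up to slightly less than each side total, and the output reproduced here does not break down the remainder.

```python
# Reported totals and sub-delays in ns, paired with labels in listed order (assumption).
data_side_total = 0.17047
data_parts = {
    "htree_in": 0.0281967,
    "decoder_wordline": 0.0567766,
    "bitline": 0.0168885,
    "sense_amp": 0.03,
    "htree_out": 0.0274759,
}

tag_side_total = 0.102964
tag_parts = {
    "htree_in": 0.00906248,
    "decoder_wordline": 0.0309153,
    "bitline": 0.00609368,
    "sense_amp": 0.03,
    "comparator": 0.0116144,
    "htree_out": 0.0118907,
}


def remainder(total, parts):
    """Part of the reported total not covered by the listed sub-delays."""
    return total - sum(parts.values())


data_remainder = remainder(data_side_total, data_parts)
tag_remainder = remainder(tag_side_total, tag_parts)
print(f"data side: listed sum {sum(data_parts.values()):.7f} ns, remainder {data_remainder:.7f} ns")
print(f"tag side:  listed sum {sum(tag_parts.values()):.7f} ns, remainder {tag_remainder:.7f} ns")
```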

4. Amongst the Data array and the Tag array, which consumed the most dynamic and
leakage power? What are the values?
Power Components:
Data array:
    Total dynamic read energy/access (nJ): 0.0040165
    Total leakage read/write power all banks at maximum frequency (mW): 2.25506
Tag array:
    Total dynamic read energy/access (nJ): 0.00061173
    Total leakage read/write power all banks at maximum frequency (mW): 0.176398
So in our case the Data array consumes the most in both categories: its dynamic read energy per
access is 0.0040165 nJ (against 0.00061173 nJ for the Tag array), and its leakage power is
2.25506 mW (against 0.176398 mW for the Tag array).
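The comparison can be made explicit with a short sketch that uses only the four values reported above. The dictionary layout and helper names are ours, chosen for illustration.

```python
# Values copied from the CACTI output quoted above.
power = {
    "data array": {"dynamic_nj": 0.0040165, "leakage_mw": 2.25506},
    "tag array": {"dynamic_nj": 0.00061173, "leakage_mw": 0.176398},
}


def dominant(metric):
    """Name of the array with the larger value for the given metric."""
    return max(power, key=lambda array: power[array][metric])


def data_to_tag_ratio(metric):
    """How many times larger the data array value is than the tag array value."""
    return power["data array"][metric] / power["tag array"][metric]


for metric in ("dynamic_nj", "leakage_mw"):
    print(metric, dominant(metric), round(data_to_tag_ratio(metric), 2))
```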
5. How large (in mm2) is the 16KB cache?
Area Components:
Data array: Area (mm2): 0.0580618
Tag array: Area (mm2): 0.00560037
Cache height x width (mm): 0.294739 x 0.226968
So the area of the cache = 0.294739 mm x 0.226968 mm
= 0.066896321 mm2
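A minimal sketch of the same area calculation follows, with the sum of the two array areas shown for comparison. The variable names are ours. The height x width footprint comes out somewhat larger than the two arrays added together; the output quoted here does not itemise the difference.

```python
# Dimensions and per-array areas copied from the CACTI output above.
height_mm = 0.294739
width_mm = 0.226968
data_array_mm2 = 0.0580618
tag_array_mm2 = 0.00560037

footprint_mm2 = height_mm * width_mm            # cache height x width
arrays_mm2 = data_array_mm2 + tag_array_mm2     # data + tag arrays only

print(f"footprint:   {footprint_mm2:.9f} mm2")
print(f"data + tag:  {arrays_mm2:.8f} mm2")
```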

2. Network-on-Chip Simulation
2.1. Introduction
We used the BookSim simulator for the Network-on-Chip simulation. BookSim is a
cycle-accurate interconnection network simulator. Originally developed for and introduced with the
Principles and Practices of Interconnection Networks book, its functionality has since been
continuously extended. The current major release, BookSim 2.0, supports a wide range of topologies
such as mesh, torus and flattened butterfly networks, provides diverse routing algorithms and includes
numerous options for customizing the network's router microarchitecture.

Simulation Result
We used two simulators: Booksim1 and Booksim2 (https://github.com/booksim/). For each simulator
we plotted five results, with the average packet latency on the Y-axis against the injection rate
(flits/node/cycle) on the X-axis. We used two synthetic traffic patterns, (1) Transpose and
(2) Uniform, so we have two graphs for each topology:
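For illustration, an injection-rate sweep like the ones reported below could be prepared by generating one configuration text per rate. This is only a sketch: the key names (topology, k, n, routing_function, traffic, injection_rate) follow BookSim 2.0 configuration conventions as we understand them, the routing choice "dor" is our assumption, and the report does not list the exact settings that were used, so everything here should be checked against the simulator's manual.

```python
def booksim_config(topology, k, n, routing_function, traffic, injection_rate):
    """Render a configuration text in 'key = value;' form (assumed BookSim 2.0 style)."""
    settings = [
        ("topology", topology),
        ("k", k),                      # routers per dimension (assumed meaning)
        ("n", n),                      # number of dimensions (assumed meaning)
        ("routing_function", routing_function),
        ("traffic", traffic),
        ("injection_rate", injection_rate),
    ]
    return "".join(f"{key} = {value};\n" for key, value in settings)


# Hypothetical sweep for the 8x8 mesh with transpose traffic.
rates = [0.02, 0.04, 0.06, 0.08, 0.1]
configs = {rate: booksim_config("mesh", 8, 2, "dor", "transpose", rate) for rate in rates}
print(configs[0.02])
```

Each generated text would then be written to a file and passed to the simulator, one run per injection rate.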

2.2. 64-node Butterfly network


Traffic = Transpose

Butterfly network, traffic = Transpose
injection rate    avg. packet latency (booksim1)    avg. packet latency (booksim2)
0.1               497.251                           15.6403
0.2               839.049                           16.6759
0.3               453.459                           407.465
0.4               589.216                           615.42
0.5               678.706                           756.035

[Figure: Transpose Butterfly. Average packet latency vs. injection rate for booksim1 and booksim2.]

Traffic = Uniform

Butterfly network, traffic = Uniform
injection rate    avg. packet latency (booksim1)    avg. packet latency (booksim2)
0.1               26.5263                           15.6014
0.2               552.354                           15.8888
0.3               1100.82                           16.3502
0.4               589.152                           17.1356
0.5               675.053                           18.8178

[Figure: Uniform Butterfly. Average packet latency vs. injection rate for booksim1 and booksim2.]
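One simple way to read a saturation point off such a sweep is to look for the first injection rate at which latency exceeds some multiple of the latency at the lowest measured rate. The sketch below applies that heuristic to the booksim2 butterfly results; the factor of 3 and the function name are our own choices, and the heuristic assumes the lowest measured rate is still below saturation.

```python
def saturation_rate(rates, latencies, factor=3.0):
    """First rate whose latency exceeds factor x the lowest-rate latency, else None."""
    baseline = latencies[0]
    for rate, latency in zip(rates, latencies):
        if latency > factor * baseline:
            return rate
    return None


rates = [0.1, 0.2, 0.3, 0.4, 0.5]
booksim2_transpose = [15.6403, 16.6759, 407.465, 615.42, 756.035]
booksim2_uniform = [15.6014, 15.8888, 16.3502, 17.1356, 18.8178]

print("transpose:", saturation_rate(rates, booksim2_transpose))
print("uniform:  ", saturation_rate(rates, booksim2_uniform))
```

Under this heuristic the booksim2 transpose curve crosses the threshold at 0.3, while the booksim2 uniform curve never crosses it within the measured range.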

2.3. 8x8 2D Mesh
Traffic = Transpose

8x8 2D Mesh, traffic = Transpose
injection rate    avg. packet latency (booksim1)    avg. packet latency (booksim2)
0.02              36.3802                           27.1151
0.04              36.6855                           27.26
0.06              38.0634                           27.2258
0.08              40.4777                           27.2277
0.1               48.1659                           27.3722

[Figure: Transpose (Mesh). Average packet latency vs. injection rate for booksim1 and booksim2.]

Traffic = Uniform

8x8 2D Mesh, traffic = Uniform
injection rate    avg. packet latency (booksim1)    avg. packet latency (booksim2)
0.02              36.0799                           27.031
0.04              37.2644                           27.0415
0.06              38.6514                           27.0928
0.08              40.9062                           27.1794
0.1               45.5958                           27.7941

[Figure: Uniform (Mesh). Average packet latency vs. injection rate for booksim1 and booksim2.]
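To put a number on how much latency grows across the mesh sweep (injection rate 0.02 to 0.1), the relative increase from the first to the last measurement can be computed for each series. This is a sketch using the values reported above; the helper name and dictionary keys are ours.

```python
def relative_increase(latencies):
    """Fractional growth from the first to the last measurement in a sweep."""
    return (latencies[-1] - latencies[0]) / latencies[0]


# Average packet latency at injection rates 0.02, 0.04, 0.06, 0.08, 0.1.
mesh = {
    ("transpose", "booksim1"): [36.3802, 36.6855, 38.0634, 40.4777, 48.1659],
    ("transpose", "booksim2"): [27.1151, 27.26, 27.2258, 27.2277, 27.3722],
    ("uniform", "booksim1"): [36.0799, 37.2644, 38.6514, 40.9062, 45.5958],
    ("uniform", "booksim2"): [27.031, 27.0415, 27.0928, 27.1794, 27.7941],
}

for (traffic, simulator), series in mesh.items():
    print(f"{traffic:9s} {simulator}: {100 * relative_increase(series):.1f}%")
```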

2.4. 8x8 2D Torus
Traffic = Transpose

8x8 2D Torus, traffic = Transpose
injection rate    avg. packet latency (booksim1)    avg. packet latency (booksim2)
0.15              725.894                           301.642
0.2               379.23                            438.113
0.25              406.066                           233.349
0.3               476.48                            670.639
0.35              562.457                           771.829

[Figure: 8x8 Torus, traffic = Transpose. Average packet latency vs. injection rate for booksim1 and booksim2.]

Traffic = Uniform

8x8 2D Torus, traffic = Uniform
injection rate    avg. packet latency (booksim1)    avg. packet latency (booksim2)
0.15              870.749                           33.6559
0.2               504.206                           36.16
0.25              619.446                           44.197
0.3               679.416                           435.977
0.35              712.926                           679.11

[Figure: 8x8 Torus, traffic = Uniform. Average packet latency vs. injection rate for booksim1 and booksim2.]
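Average latency is usually expected to rise, or at least not fall, as the injection rate increases, so one quick check on a sweep is whether each latency series is non-decreasing. The sketch below runs that check on the torus results; the helper name is ours, and a failed check only flags a series for a second look, it does not by itself say what happened in the run.

```python
def is_nondecreasing(values):
    """True if every value is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Average packet latency at injection rates 0.15, 0.2, 0.25, 0.3, 0.35.
torus = {
    ("transpose", "booksim1"): [725.894, 379.23, 406.066, 476.48, 562.457],
    ("transpose", "booksim2"): [301.642, 438.113, 233.349, 670.639, 771.829],
    ("uniform", "booksim1"): [870.749, 504.206, 619.446, 679.416, 712.926],
    ("uniform", "booksim2"): [33.6559, 36.16, 44.197, 435.977, 679.11],
}

for (traffic, simulator), series in torus.items():
    status = "non-decreasing" if is_nondecreasing(series) else "not monotonic"
    print(f"{traffic:9s} {simulator}: {status}")
```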
