Programming Assignments 3
Submitted to
Basavaraj Talawar
Dept. of Computer Science & Engg.
Prepared By:
Himanshu Patel
Rahul Kushwah
(M.Tech - I )
INDEX
1. CACTI
1.1 Introduction
1.2 Results
2. Network-on-Chip Simulation
2.1 Introduction
2.2 64 node Butterfly network
2.3 8x8 2D Mesh
2.4 8x8 2D Torus
1. CACTI
1.1. Introduction
CACTI is an integrated cache access time, cycle time, area, leakage, and dynamic power model for
uniform and non-uniform cache architectures.
By integrating all these models together, users can have confidence that trade-offs between time,
power and area are all based on the same assumptions and, hence, are mutually consistent. CACTI is
intended for use by computer architects to better understand the performance trade-offs inherent in
different cache sizes and organizations.
CACTI 6.0 improves upon prior versions of the tool by focusing on interconnect design for large
caches. In addition to strengthening existing analytical models of the tool for dominant cache
components, CACTI 6.0 introduces two major extensions:
(i) The ability to model Non-Uniform Cache Access (NUCA)
(ii) The ability to model different types of wires (RC-based wires with different power,
delay, and area characteristics, and differential low-swing buses).
1.2. Results
CACTI 6.0 does not take command-line parameters; its input parameters are provided in
a cache.cfg file. We modified the cache.cfg file with the given data and ran CACTI to get the
results. The answers to the assignment questions follow:
1. How many total sets per bank are there?
Sets per bank= cache size/(banks*set associativity * line size)
Sets per bank = 16384/(4*2*64)
= 32
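The calculation above can be checked with a short Python sketch. The function name and its arguments are illustrative and not part of CACTI; it assumes the cache capacity is divided evenly among the banks and that 16 KB means 16384 bytes.

```python
def sets_per_bank(cache_bytes, banks, associativity, line_bytes):
    """Number of sets in one bank, assuming an even split across banks."""
    bytes_per_set = associativity * line_bytes
    total_sets, remainder = divmod(cache_bytes, bytes_per_set)
    if remainder != 0 or total_sets % banks != 0:
        raise ValueError("cache size does not divide evenly into banks and sets")
    return total_sets // banks


# Values used in this report: 16 KB cache, 4 banks, 2-way, 64-byte lines.
print(sets_per_bank(16 * 1024, 4, 2, 64))  # 32
```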
3. What are the components of the access time parameter in Cacti? What was the value
in your case?
Time Components:
Data side (with Output driver) (ns): 0.17047
    H-tree input delay (ns): 0.0281967
    Decoder + wordline delay (ns): 0.0567766
    Bitline delay (ns): 0.0168885
    Sense Amplifier delay (ns): 0.03
    H-tree output delay (ns): 0.0274759
Tag side (with Output driver) (ns): 0.102964
    H-tree input delay (ns): 0.00906248
    Decoder + wordline delay (ns): 0.0309153
    Bitline delay (ns): 0.00609368
    Sense Amplifier delay (ns): 0.03
    Comparator delay (ns): 0.0116144
    H-tree output delay (ns): 0.0118907
4. Amongst the Data array and the Tag array, which consumed the most dynamic and
leakage power? What are the values?
Power Components:
Data array: Total dynamic read energy/access (nJ): 0.0040165
Total leakage read/write power all banks at maximum frequency (mW): 2.25506
Tag array: Total dynamic read energy/access (nJ): 0.00061173
Total leakage read/write power all banks at maximum frequency (mW): 0.176398
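So in our case the Data array consumes the most of both: its total dynamic read energy/access is
0.0040165 nJ against 0.00061173 nJ for the Tag array, and its total leakage read/write power of all
banks at maximum frequency is 2.25506 mW against 0.176398 mW for the Tag array.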
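As a rough illustration of how large the gap is, the reported values can be compared as ratios. This is only a sketch over the numbers listed above; the dictionary layout and the helper name are our own.

```python
# Values copied from the CACTI output reported above.
power = {
    "data": {"dynamic_nj": 0.0040165, "leakage_mw": 2.25506},
    "tag": {"dynamic_nj": 0.00061173, "leakage_mw": 0.176398},
}


def data_to_tag_ratio(metric):
    """How many times larger the data array value is than the tag array value."""
    return power["data"][metric] / power["tag"][metric]


for metric in ("dynamic_nj", "leakage_mw"):
    print(f"{metric}: data/tag = {data_to_tag_ratio(metric):.2f}")
```

Both ratios come out well above 1, which agrees with the conclusion that the Data array dominates.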
5. How large (in mm2) is the 16KB cache?
Area Components:
Data array: Area (mm2): 0.0580618
Tag array: Area (mm2): 0.00560037
Cache height x width (mm): 0.294739 x 0.226968
So the area of the cache is = 0.294739 x 0.226968 = 0.066896321 mm2
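The area figure can be reproduced with a few lines of Python. This is a minimal sketch using the reported height, width and array areas; the function name is illustrative, and the comparison with the sum of the two arrays is our own addition, not a CACTI output.

```python
def cache_area_mm2(height_mm, width_mm):
    """Bounding-box area of the cache from its reported height and width."""
    return height_mm * width_mm


area = cache_area_mm2(0.294739, 0.226968)
array_area = 0.0580618 + 0.00560037  # data array + tag array, in mm2

print(f"cache area      = {area:.9f} mm2")
print(f"data + tag area = {array_area:.9f} mm2")
```

The bounding-box area is slightly larger than the sum of the Data and Tag array areas.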
2. Network-on-Chip Simulation
2.1. Introduction
We used the BookSim simulator for the Network-on-Chip simulation. BookSim is a
cycle-accurate interconnection network simulator. Originally developed for and introduced with the
Principles and Practices of Interconnection Networks book, its functionality has since been
continuously extended. The current major release, BookSim 2.0, supports a wide range of topologies
such as mesh, torus and flattened butterfly networks, provides diverse routing algorithms and includes
numerous options for customizing the network's router microarchitecture.
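BookSim reads its settings from a configuration file of "key = value;" lines, so an injection-rate sweep needs one configuration per data point. The sketch below generates such fragments in Python. The helper name is hypothetical, and only the topology, size, traffic and injection rate keys are emitted; a real run also needs a routing function and router parameters, which this report does not list, so they are left out rather than guessed.

```python
def booksim_config(topology, k, n, traffic, injection_rate):
    """Return a partial BookSim-style configuration as text.

    Only the settings varied in this report are emitted; the routing
    function and router parameters must be added before a real run.
    """
    lines = [
        f"topology = {topology};",
        f"k = {k};",
        f"n = {n};",
        f"traffic = {traffic};",
        f"injection_rate = {injection_rate};",
    ]
    return "\n".join(lines) + "\n"


# Example: the five injection rates used for the 8x8 mesh with transpose traffic.
rates = [0.02, 0.04, 0.06, 0.08, 0.1]
configs = {rate: booksim_config("mesh", 8, 2, "transpose", rate) for rate in rates}
print(configs[0.02])
```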
Simulation Result
We used two simulators: Booksim1 and Booksim2 (https://github.com/booksim/). For each simulator
we plotted a graph of five results, with the average packet latency on the Y-axis against the
injection rate (flits/node/cycle) on the X-axis. We used two synthetic traffic patterns: 1) Transpose
and 2) Uniform. So we have two graphs for each topology:
2.2. 64 node Butterfly network
[Figure: Transpose Butterfly. Average packet latency vs. injection rate for booksim1 and booksim2.]
Traffic = Uniform
[Figure: Uniform Butterfly. Average packet latency vs. injection rate for booksim1 and booksim2.]
2.3. 8x8 2D Mesh
8x8 2D Mesh, Traffic = Transpose
Average packet latency, transpose (mesh):
injection rate    booksim1    booksim2
0.02              36.3802     27.1151
0.04              36.6855     27.26
0.06              38.0634     27.2258
0.08              40.4777     27.2277
0.1               48.1659     27.3722
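To put a number on the trend in the transpose results for the mesh, the relative increase in latency between the lowest and highest injection rate can be computed. This is a minimal sketch that reads the latencies listed above as booksim1/booksim2 pairs per injection rate; the helper name is our own.

```python
rates = [0.02, 0.04, 0.06, 0.08, 0.1]
latency = {
    "booksim1": [36.3802, 36.6855, 38.0634, 40.4777, 48.1659],
    "booksim2": [27.1151, 27.26, 27.2258, 27.2277, 27.3722],
}


def relative_increase(values):
    """Fractional growth from the first to the last measurement."""
    return (values[-1] - values[0]) / values[0]


for name, values in latency.items():
    print(f"{name}: {100 * relative_increase(values):.1f}% increase "
          f"from rate {rates[0]} to {rates[-1]}")
```

With these values the booksim1 latency rises by about 32% over the sweep, while the booksim2 latency changes by less than 1%.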
[Figure: Transpose (Mesh). Average packet latency vs. injection rate for booksim1 and booksim2.]
8x8 2D Mesh, Traffic = Uniform
Average packet latency, uniform (mesh):
injection rate    booksim1
0.02              36.0799
0.04              37.2644
0.06              38.6514
0.08              40.9062
0.1               45.5958
[Figure: Uniform (Mesh). Average packet latency vs. injection rate for booksim1 and booksim2.]
2.4. 8x8 2D Torus
Traffic = Transpose
Average packet latency, transpose (torus):
injection rate    booksim1    booksim2
0.15              725.894     301.642
0.2               379.23      438.113
0.25              406.066     233.349
0.3               476.48      670.639
0.35              562.457     771.829
[Figure: 8x8 torus, traffic = transpose. Average packet latency vs. injection rate for booksim1 and booksim2.]
Traffic = Uniform
injection rate: 0.15, 0.2, 0.25, 0.3, 0.35
[Figure: 8x8 torus, traffic = uniform. Average packet latency vs. injection rate for booksim1 and booksim2.]