
14:332:331 Computer Architecture

Quiz #4 SOLUTIONS
1. Problem 1 (30 points)

a) In order to create the illusion of infinite and fast memory, what are the two principles memory
architectures rely on today and what do they mean? (6 points)

There are two principles that architects rely on:


1) The principle of spatial locality: if an item is requested by the pipeline, chances are its neighbors will be requested soon. This is true, for example, in arrays. That is why hard drives transfer pages, rather than words, to RAM (main memory), and why blocks in a cache tend to be multi-word.
2) The principle of temporal locality: if an item in memory is referenced now, it tends to be referenced again soon. This is why items are brought closer to the processor (i.e., into the cache), where access is faster, and why, when the cache is full, it is the oldest (least recently used) item that is evicted. A short illustration of both principles follows.
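
To make this concrete, here is a small Python sketch (purely illustrative, not part of the quiz): the sequential array walk exhibits spatial locality, while the running total exhibits temporal locality.

# Toy example: spatial vs. temporal locality
data = list(range(1024))   # contiguous array in memory

s = 0
for x in data:             # neighboring elements accessed one after another -> spatial locality
    s += x                 # the running total s is re-referenced every iteration -> temporal locality
print(s)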

b) Draw the block diagram of the hierarchy of multiple memory levels starting with the four-core
CPU, which has a shared L3 cache. Which is the fastest, and which is the slowest element of memory
in this case? Which is the most expensive and the cheapest in terms of $/byte? (12 points)

Speed: (Fastest excluding the register files) L1 Cache > L2 Cache > L3 Cache > Main Memory > Disk
(slowest)
Cost/byte: (Most expensive) L1 Cache > L2 Cache > L3 Cache > Main Memory > Disk (Least
expensive)

c) In a MIPS architecture, a 4-word/block direct-mapped cache has a miss and needs to retrieve its
block from RAM. The main memory is 4 banks wide and interleaved, and the processor-memory data
bus is 32 bits wide. Each word's memory address takes 1 cycle to send, each memory bank takes 8
cycles to be read, and each data word takes 1 cycle to be sent back. What is the miss penalty in this
case (how many cycles in total does this RAM take to return the requested block to the processor)? Draw
and explain. (12 points)

It takes 1 cycle to send the block address, 8 cycles to read all four interleaved banks in parallel, and 4 × 1 cycles to return the four data words, which is 13 cycles in total.
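
A minimal check of this count in Python, using the timing parameters given in the problem; the key assumption is that the interleaved banks are read simultaneously.

# Miss penalty for a 4-word block from 4-bank interleaved RAM
addr_send_cycles = 1    # one cycle to send the block address
bank_read_cycles = 8    # all four interleaved banks are read in parallel
words_per_block  = 4
word_xfer_cycles = 1    # one word per cycle over the 32-bit bus

miss_penalty = addr_send_cycles + bank_read_cycles + words_per_block * word_xfer_cycles
print(miss_penalty)     # 13 cycles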
2. Problem 2 (30 points)

Consider the cache shown in Figure 1. Dirty and reference (use) bits for each block are not
shown but play a role in this problem. There are 32 data bits per block.
a) Explain what type of cache this is and how it works, including the role of the valid, index,
reference and tag bits. (10 points)

It is a 4-way set-associative cache organization. When the CPU tries to read data at
address A from this cache, it first uses the index bits (part of A) to locate the specific cache
set where the desired block of data may be stored. After finding the cache set, it uses the valid bit
of each data block to tell whether the data in that block is valid. The cache control then compares the
tag bits (part of A) with the tag bits (identifying which memory location corresponds to that
particular block in the set) of the four valid data blocks, using four comparators. If there is a
match, it is a cache hit; an OR gate raises the Hit line when any of the four AND gates raises its
output. Otherwise, it is a cache miss.
The control of the multiplexer is tied to the four outputs of the AND gates. The multiplexer
outputs the word from the block that matched the desired address, sending it to the pipeline. The
role of each AND gate is to ensure that a hit only occurs when both conditions (matching tag bits and
a set valid bit) are met. A short code sketch of this lookup follows below.
Figure 1
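
Below is a hedged Python sketch of the lookup just described. The field widths (OFFSET_BITS, INDEX_BITS) and the dictionary layout of each way are assumptions chosen for illustration, not values given in the problem.

OFFSET_BITS, INDEX_BITS = 2, 8   # assumed widths for the example

def lookup(cache, addr):
    """cache[set_index] holds the four ways of one set as dicts with 'valid', 'tag', 'data'."""
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)   # index bits select the set
    tag   = addr >> (OFFSET_BITS + INDEX_BITS)                # remaining high bits are the tag
    for way in cache[index]:                                  # the four comparators work in parallel
        # the AND gate: a hit needs a set valid bit AND a matching tag
        if way['valid'] and way['tag'] == tag:
            return True, way['data']                          # multiplexer forwards this block's word
    return False, None                                        # no way matched -> cache miss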

b) What happens when this cache is full, the pipeline needs data to be placed in cache and
the block to be evicted is dirty? What happens with the pipeline in case of a cache miss?
(10 points)

First, after reading the data block from memory, the cache locates the specific set for
that block based on the index bits (part of the memory address). According to the block
replacement algorithm, it evicts one data block from that cache set. The data block
being evicted is the least recently used (oldest) block in the set (as determined by the reference/use
bit). Then the new memory data block is placed in the set, and the valid and use bits are updated.

If the block to be evicted is dirty, it first has to be written back to main memory to keep the
data consistent. The cache uses a write-back policy, which ensures that data is
written back only when the block to be replaced is dirty (i.e., it was modified since it was read
from main memory). This must be done before the block is replaced by the new one from
RAM that the pipeline needs.

In case of a cache miss, the pipeline has to stall and wait for the data to be read from main memory.
If it is an instruction miss, then after the desired data is read back, the control block re-fetches
the instruction. A hedged code sketch of this miss-handling path follows.
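
The sketch below restates that path in Python. It is purely illustrative: the use bit stands in for an LRU decision, and write_back()/read_block() are hypothetical stand-ins for the memory interface, not a real API.

def handle_miss(cache_set, new_tag, memory):
    # choose a victim: prefer a block whose use (reference) bit is 0, i.e. the oldest
    victim = next((w for w in cache_set if not w['use']), cache_set[0])
    if victim['valid'] and victim['dirty']:
        memory.write_back(victim['tag'], victim['data'])   # dirty block goes back to RAM first
    victim.update(tag=new_tag,
                  data=memory.read_block(new_tag),         # bring the requested block from RAM
                  valid=True, dirty=False, use=True)       # refresh the status bits
    return victim['data']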

c) Assuming that the address is 30 bits wide (excluding the last 2 byte-offset bits), divided
between index bits b and tag bits t, what is the total size of the cache C as a function of
b and t (remember there are also valid, dirty and use bits)? (10 points)

Because the block offset is 2 bits, the cache block size is 4 bytes (one 32-bit word).
The number of sets in this cache is 2^b (because it has b index bits).
The size of each set is four blocks, where each block holds the tag portion plus the data portion plus
the valid, use and dirty bits:
4 × [(tag bits + valid bit + dirty bit + use bit)/8 + 4] bytes = 4 × [(t + 3)/8 + 4] bytes
Total cache size: number of sets × set size = 2^b × 4 × [(t + 3)/8 + 4] = 2^(b+2) × [(t + 3)/8 + 4] bytes
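
As a quick check of the formula, a small Python helper; the split b = 10, t = 20 is an arbitrary example (any b + t = 30 works).

def cache_size_bytes(b, t):
    bytes_per_block = (t + 3) / 8 + 4        # tag, valid, dirty and use bits, plus 4 data bytes
    return (2 ** b) * 4 * bytes_per_block    # 2^b sets, 4 blocks (ways) per set

print(cache_size_bytes(10, 20))              # 1024 * 4 * 6.875 = 28160.0 bytes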

3. Problem 3 (40 points)


Consider the diagram in Figure 2.

Figure 2
a) Draw over Figure 2 to indicate how the CPU switches between different applications running
at the same time. Can the applications use the same virtual address space? Can they use the
same physical address space? (8 points)
The CPU uses the Page Table Register, into which it writes a pointer to the first line of the Page Table of the
application that is to be executed. Yes, the applications can use the same virtual address space.
However, each application has its own Page Table, so each virtual address is translated differently, to
a physical address in the space allocated to that particular application.
Different applications cannot use the same physical address space, since one application might corrupt
the data of another application. There is one exception to the rule, namely applications that use
shared memory, allowing them to exchange data asynchronously.

b) How is a virtual address mapped to a physical one and what are the roles of the valid, dirty and
reference bits in the Page Table? (8 points)
A virtual address is divided into two parts: the virtual page number (20 bits) and the page offset (the least
significant 12 bits of the virtual address). The page table is used to look up the corresponding physical
page number. The page offset bits are left unchanged and are concatenated to the end of the physical
page number to produce the physical address (a minimal translation sketch follows the bit descriptions below).
Valid bit: It is used in each page table entry. If the bit is 0, the page is not present in main
memory, rather it is located in the hard drive and a page fault occurs; if the bit is 1, the page is in
memory and the entry contains the physical page number.
Reference bit (or use bit): It is used in each page table entry. It is set whenever that page is
accessed, and periodically reset to 0 by the OS. The reference bit is thus a statistical indicator of page
usage, and used when it is necessary to evict a page from the page table (the least recently used has to
go).
Dirty bit: It is used in each page table entry. It is set when any word in a page is modified by an
application relative to the value loaded from the hard drive. In this case the page has to first be written back to the
hard drive before being replaced.
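
A minimal Python sketch of this translation, assuming 4 KB pages (12 offset bits) and a simple dictionary page table; the entry layout is illustrative, not a real OS structure.

PAGE_OFFSET_BITS = 12

def translate(vaddr, page_table):
    vpn    = vaddr >> PAGE_OFFSET_BITS                  # virtual page number (upper 20 bits)
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)      # page offset passes through unchanged
    entry  = page_table.get(vpn)
    if entry is None or not entry['valid']:
        raise RuntimeError("page fault: the page is on disk and the OS must bring it in")
    entry['ref'] = 1                                    # reference bit: the page was just used
    return (entry['ppn'] << PAGE_OFFSET_BITS) | offset  # physical page number concatenated with offset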

c) What happens with the pipeline when the physical page is not in physical memory (RAM)?
Remember that the processor is shared between different applications. (8 points)

It will cause a page fault exception (a miss penalty of millions of cycles), and the exception is
handled by the OS as follows (a hedged code sketch of these steps appears after the list):
1. The CPU starts processing another application after saving the PC of the instruction that caused the
page fault;
2. The OS looks up the page table entry using the virtual address and finds the location of that page on the hard
disk;
3. It chooses a physical page (in main memory) to replace; if the chosen page is dirty, it must first be
written back to disk before bringing the new page into memory;
4. It starts a read to get the referenced page from disk into the chosen physical page;
5. It updates the Page Table and turns the Valid bit to 1 for that address;
6. The CPU resumes the execution of the application that had the page fault.
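
The sketch below mirrors those six steps in Python; scheduler and disk are hypothetical stand-ins for OS services, and the victim choice simply picks a resident page whose reference bit is 0.

def handle_page_fault(faulting_pc, vpn, page_table, disk, scheduler):
    scheduler.switch_to_another_app(saved_pc=faulting_pc)        # 1. run another application meanwhile
    disk_location = page_table[vpn]['disk_location']             # 2. find the page's location on disk
    victim = next(e for e in page_table.values()                 # 3. choose a resident page to replace
                  if e['valid'] and not e['ref'])
    if victim['dirty']:
        disk.write_page(victim['disk_location'], victim['ppn'])  #    dirty victim goes back to disk first
    victim['valid'] = 0
    disk.read_page(disk_location, victim['ppn'])                 # 4. read the referenced page into that frame
    page_table[vpn].update(ppn=victim['ppn'], valid=1,           # 5. update the Page Table, set Valid to 1
                           ref=1, dirty=0)
    scheduler.resume(faulting_pc)                                # 6. resume the application that faulted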

d) A multi-core 3-level cache processor has the following organization: L1 private cache hit time 2
ns, misses once every 80 instructions; L2 private cache hit time 8 ns, miss rate 4%; L3 shared
cache hit latency 20 ns, miss rate 8%; and main memory hit latency 400 ns. What is the
Average Memory Access Time (AMAT) under these conditions? (8 points)

L1 miss rate: 1/80 = 1.25%


L2 miss rate: 4%
L3 miss rate: 8%
AMAT = L1 hit time + L1 miss rate × [L2 hit time + L2 miss rate × (L3 hit time + L3 miss rate × Mem latency)]
= 2 + 1.25% × [8 + 4% × (20 + 8% × 400)] = 2.126 ns
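
A worked check of this computation in Python (times in ns, rates as fractions):

l1_hit, l1_miss_rate = 2, 1 / 80   # one miss per 80 instructions = 1.25%
l2_hit, l2_miss_rate = 8, 0.04
l3_hit, l3_miss_rate = 20, 0.08
mem_latency = 400

amat = l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * (l3_hit + l3_miss_rate * mem_latency))
print(amat)                        # 2 + 0.0125 * (8 + 0.04 * 52) = 2.126 ns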

e) What computer architectural problem do 3-D chips attempt to solve? Give 2 benefits and 1
drawback of 3-D chips and explain. (8 points)

3D chips attempt to solve the processor-memory gap problem. That is, they attempt to make the RAM
faster and reduce miss penalties by integrating RAM on the chip (thus the name 3D chip).
Benefits (any 2 of):
1. Higher packing density due to the addition of a third dimension;
2. Higher performance due to the reduced average interconnect length (the RAM is much closer to the
processor than if it were on the motherboard);
3. Higher performance due to faster core-to-RAM communication inside the chip.
4. Faster access due to a significantly larger number of processor-to-memory buses.
Drawbacks (any 1 of):
1. Because of the high packing density, effective cooling of the processor becomes an important
problem; air cooling is no longer an option, so water cooling (piping water directly through the
3D chip) is used.
2. More expensive to manufacture.
