
TRANSLATION LOOKASIDE BUFFER (TLB)

A TLB is a table used in a virtual memory system that lists the physical page
number associated with each virtual page number. A TLB is used in conjunction
with a cache whose tags are based on virtual addresses. The virtual address is
presented simultaneously to the TLB and to the cache so that cache access and
virtual-to-physical address translation can proceed in parallel (the
translation is done "on the side"). If the requested address is not cached,
the physical address is used to locate the data in main memory. The
alternative would be to place the translation table between the cache and main
memory, so that it is activated only after a cache miss.
The TLB is a CPU cache that the memory management hardware uses to speed up
virtual address translation. Some key features of the TLB are:

 It holds the most recent translations

 Small, fully associative cache

 Avoids retranslation

The figure shows how the various parts of a multilevel memory system typically
realize the address-translation ideas just discussed. The input address Av is a
virtual address consisting of a (virtual) base address Bv concatenated with a
displacement D. Av contains an effective address computed in accordance with
some program-defined addressing mode (direct, indirect, indexed, and so on)
for the memory item being accessed. It can also contain system-specific control
information, such as a segment address, as we will see later. The real
address BR = f(Bv) assigned to Bv is stored in a memory map somewhere in the
memory system; this map can be quite large. To speed up the mapping process,
part (or occasionally all) of the memory map is placed in a small high-speed
memory in the CPU called a translation look-aside buffer (TLB). The TLB's
input is thus the base-address part Bv of Av; its output is the corresponding real
base address BR. This address is then concatenated with the D part of Av to
obtain the full physical address AR. If the virtual address Bv does not currently
have an entry in the TLB, then the part of the memory map that contains Bv is first
transferred from external memory into the TLB. Hence the TLB itself forms
a cache-like level within a multilevel storage system for memory maps.
For this reason, the TLB is sometimes referred to as an address cache.
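As a rough illustration of this flow, here is a minimal Python sketch, assuming a 4 KB page size and plain dictionaries standing in for the TLB and the in-memory map (the names, sizes, and sample mappings are illustrative, not taken from the figure):

```python
# Minimal sketch of TLB-assisted address translation (illustrative only; the
# dictionary-based TLB and memory_map are assumptions, not hardware structures).

PAGE_BITS = 12                      # assume 4 KB pages
PAGE_SIZE = 1 << PAGE_BITS

tlb = {}                                   # virtual page number -> physical frame number
memory_map = {0x0A: 0x3F, 0x0B: 0x40}      # full memory map, assumed to live in memory

def translate(virtual_address):
    bv = virtual_address >> PAGE_BITS          # base-address (page) part of Av
    d = virtual_address & (PAGE_SIZE - 1)      # displacement D within the page
    if bv not in tlb:                          # TLB miss: load the mapping from the map
        tlb[bv] = memory_map[bv]
    br = tlb[bv]                               # real base address BR = f(Bv)
    return (br << PAGE_BITS) | d               # AR = BR concatenated with D

print(hex(translate(0x0A123)))                 # -> 0x3f123
```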



Segmentation

The virtual address space is divided into logical, variable-length units, or
segments. Physical memory isn't really divided or partitioned into anything.
When a segment needs to be copied into physical memory, the operating system
looks for a chunk of free memory large enough to store the entire segment. Each
segment has a base address, indicating where it is located in memory, and a
bounds limit, indicating its size. Each program, consisting of multiple segments,
now has an associated segment table instead of a page table. This segment table
is simply a collection of the base/bounds pairs for each segment.

Formally, a segment is a set of logically related, contiguous words. A word in a
segment is referred to by specifying a base address (the segment address) and a
displacement within the segment. A program and its data can be viewed as a
collection of linked segments. The links arise from the fact that a program
segment uses, or calls, other segments. Some computers have a memory
management technique that allocates main memory M1 by segments alone.
When a segment not currently resident in M1 is required, the entire segment is
transferred from secondary memory M2. The physical addresses assigned to the
segments are kept in a memory map called a segment table (which can itself be
a relocatable segment).
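The base/bounds idea can be sketched in a few lines of Python; the SegmentTableEntry layout, the sample values, and the exception used for a bounds violation are assumptions made for the example:

```python
# Minimal sketch of segment-table translation with a bounds check
# (the entry layout and the fault handling are assumptions).

from dataclasses import dataclass

@dataclass
class SegmentTableEntry:
    base: int     # physical base address of the segment
    bounds: int   # segment size in words

segment_table = [SegmentTableEntry(base=4000, bounds=700),
                 SegmentTableEntry(base=8200, bounds=120)]

def translate(segment_index, displacement):
    entry = segment_table[segment_index]
    if displacement >= entry.bounds:          # protection: reject out-of-range accesses
        raise MemoryError("segment bounds violation")
    return entry.base + displacement          # physical address

print(translate(0, 25))    # -> 4025
```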

The main advantages of segmentation:

 Segment boundaries correspond to natural program and data boundaries.
Consequently, information that is shared among different users is often
organized into segments.

 Because of their logical independence, a program segment can be changed or
recompiled at any time without affecting other segments.

 Certain properties of programs, such as the scope (range of definition) of a
variable and access rights, are naturally specified by segment. These
properties require that accesses to segments be checked to protect against
unauthorized use; this protection is most easily implemented when the
units of allocation are segments.

 Certain segment types, stacks and queues for instance, vary in length
during program execution. Segmentation varies the region assigned to
such a segment as it expands and contracts, thus efficiently using the
available memory space.

The main disadvantage of segmentation


Because segments can be of different lengths, segmentation requires a relatively
complex allocation method to avoid excessive fragmentation of main-memory space.
This problem is alleviated by combining segmentation with paging, as discussed later.

Paging

The basic idea behind paging is quite simple: allocate physical memory to
processes in fixed-size chunks (page frames) and keep track of where the
various pages of the process reside by recording information in a page table.
Every process has its own page table that typically resides in main memory, and
the page table stores the physical location of each virtual page of the process.
The page table has N rows, where N is the number of virtual pages in the
process. If there are pages of the process currently not in main memory, the
page table indicates this by setting a valid bit to 0; if the page is in main
memory, the valid bit is set to 1. Therefore, each entry of the page table has two
fields: a valid bit and a frame number.

Process memory is divided into these fixed-size pages, resulting in potential
internal fragmentation when the last page is copied into memory. The process
may not actually need the entire page frame, but no other process may use it.
Therefore, the unused memory in this last frame is effectively wasted.

Now that you understand what paging is, we will discuss how it works. When a
process generates a virtual address, the operating system must dynamically
translate this virtual address into the physical address in memory at which the
data actually resides. (For purposes of simplicity, let's assume we have no cache
memory for the moment.) For example, from a program viewpoint, we see the
final byte of a 10-byte program as address 9, assuming 1-byte instructions and
1-byte addresses, and a starting address of 0. However, when actually loaded
into memory, the logical address 9 (perhaps a reference to the label X in an
assembly language program) may actually reside in physical memory location
1239, implying the program was loaded starting at physical address 1230. There
must be an easy way to convert the logical, or virtual, address 9 to the physical
address 1239.

To accomplish this address translation, a virtual address is divided into two
fields: a page field, identifying the page on which the data resides, and an
offset field, representing the offset within that page where the requested data
is located. This address translation process is similar to the process we used
when we divided main memory addresses into fields for the cache mapping
algorithms. And similar to cache blocks, page sizes are usually powers of 2;
this simplifies the extraction of page numbers and offsets from
virtual addresses.
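A tiny sketch of that extraction, assuming a 1 KB page size chosen purely for illustration:

```python
# With a power-of-2 page size, page number and offset are simple shifts and masks.

PAGE_SIZE = 1024                       # 2**10 bytes per page (assumed example size)

def split(virtual_address):
    page_number = virtual_address >> 10          # high-order bits
    offset = virtual_address & (PAGE_SIZE - 1)   # low-order 10 bits
    return page_number, offset

print(split(3081))    # -> (3, 9): page 3, offset 9
```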
To access data at a given virtual address, the system performs the following
steps:

 Extract the page number from the virtual address.
 Extract the offset from the virtual address.
 Translate the page number into a physical page frame number by accessing
the page table:
   o Look up the page number in the page table (using the virtual page number
     as an index).
   o Check the valid bit for that page.
   o If the valid bit = 0, the system generates a page fault and the operating
     system must intervene to:
       - Locate the desired page on disk.
       - Copy the desired page into a free page frame in main memory.
       - Update the page table.
       - Resume execution of the process that caused the page fault, continuing
         with the valid bit = 1 case below.
   o If the valid bit = 1, the page is in memory:
       - Replace the virtual page number with the actual frame number.
       - Access the data at the given offset in the physical page frame by adding
         the offset to the frame number for the given virtual page.
Please note that if a process has free frames in main memory when a page fault
occurs, the newly retrieved page can be placed in any of those free frames.
However, if the memory allocated to the process is full, a victim page must be
selected. The replacement algorithms used to select a victim are quite similar to
those used in cache. FIFO, Random, and LRU are all potential replacement
algorithms for selecting a victim page.
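The following Python sketch walks through the lookup described above, assuming a list-based page table and a made-up load_from_disk helper; victim selection is omitted, so the free-frame list is simply consumed in order:

```python
# Illustrative page-table lookup with a valid bit and a very simple page-fault
# path; load_from_disk and the free-frame list are assumed helpers.

PAGE_BITS = 10
page_table = [{"valid": 0, "frame": None} for _ in range(8)]   # one entry per virtual page
free_frames = [5, 6, 7]

def load_from_disk(page_number, frame):
    print(f"loading virtual page {page_number} into frame {frame}")

def access(virtual_address):
    page = virtual_address >> PAGE_BITS
    offset = virtual_address & ((1 << PAGE_BITS) - 1)
    entry = page_table[page]
    if entry["valid"] == 0:                       # page fault
        frame = free_frames.pop(0)                # victim selection omitted for brevity
        load_from_disk(page, frame)
        entry["frame"], entry["valid"] = frame, 1 # update the page table
    return (entry["frame"] << PAGE_BITS) | offset # physical address

print(access(2 * 1024 + 37))   # faults once, then prints 5157 (frame 5, offset 37)
```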

Paging Combined with Segmentation



Paging is not the same as segmentation. Paging is based on a purely physical
value: the program and main memory are divided up into chunks of the same
physical size. Segmentation, on the other hand, allows logical portions of the
program to be divided into variable-sized partitions. With segmentation, the
user is aware of the segment sizes and boundaries; with paging, the user is
unaware of the partitioning. Paging is easier to manage: allocation, freeing,
swapping, and relocating are easy when everything is the same size. However,
pages are typically smaller than segments, which means more overhead (in terms
of resources to both track and transfer pages). Paging eliminates external
fragmentation, whereas segmentation eliminates internal fragmentation.
Segmentation has the ability to support sharing and protection, both of which
are very difficult to do with paging.

Paging and segmentation both have their advantages; however, a system does
not have to use one or the other. These two approaches can be combined in an
effort to get the best of both worlds. In a combined approach, the virtual address
space is divided into segments of variable length, and the segments are divided
into fixed-size pages. Main memory is divided into frames of the same size.

When segmentation is used with paging, a virtual address has three components:
a segment index SI, a page index PI, and a displacement (offset) D. The
memory map then consists of one or more segment tables and page tables. For
fast address translation, two TLBs can be used, as shown in the figure: one for
segment tables and one for page tables. As discussed earlier, the TLBs serve as
fast caches for the memory maps. Every virtual address Av generated by a
program goes through a two-stage translation process. First, the segment index
SI is used to read the current segment table to obtain the base address PB of the
required page table. This base address is combined with the page index PI
(which is just a displacement within the page table) to produce a page-table
entry address, which is then used to access the page table. The result is a real
page address, that is, a page frame number, which can be combined with the
displacement part D of Av to give the final (real) address AR. This system, as
depicted in the figure, is very flexible.



All the various memory maps can be treated as paged segments
and can be relocated anywhere in the physical memory space.

Combined segmentation and paging is very advantageous because it allows for
segmentation from the user's point of view and paging from the system's point
of view.
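A rough sketch of the two-stage translation, using toy dictionary tables and made-up field widths (4-bit segment index, 6-bit page index, 10-bit displacement); none of these values come from the figure:

```python
# Two-stage (segment table + page table) translation in miniature;
# the field widths and table contents are assumed example values.

SEG_BITS, PAGE_BITS, OFFSET_BITS = 4, 6, 10

segment_table = {1: "pt1"}                       # segment index SI -> page table PB
page_tables = {"pt1": {3: 0x2A}}                 # page index PI -> page frame number

def translate(av):
    si = av >> (PAGE_BITS + OFFSET_BITS)                 # segment index
    pi = (av >> OFFSET_BITS) & ((1 << PAGE_BITS) - 1)    # page index
    d = av & ((1 << OFFSET_BITS) - 1)                    # displacement
    pb = segment_table[si]                               # first stage: segment table
    frame = page_tables[pb][pi]                          # second stage: page table
    return (frame << OFFSET_BITS) | d                    # real address AR

av = (1 << 16) | (3 << 10) | 0x155
print(hex(translate(av)))        # -> 0xa955 (frame 0x2a, offset 0x155)
```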

Page size

The page size Sp has a big impact on both storage utilization and the effective
memory data-transfer rate. Consider first the influence of Sp on the space-
utilization factor u defined earlier. If Sp is too large, excessive internal
fragmentation results; if it is too small, the page tables become very large and
tend to reduce space utilization. A good value of Sp should achieve a balance
between these two extremes. Let Ss denote the average segment size in words. If
Ss >> Sp, the last page assigned to a segment should contain about Sp/2 words.
The size of the page table associated with each segment is approximately Ss/Sp
words, assuming each entry in the table is a word. Hence the memory space
overhead associated with each segment is approximately

S = Sp/2 + Ss/Sp

The space utilization u is

u = Ss/(Ss + S)

The optimum page size Sp(opt) can be defined as the value of Sp that maximizes
u or, equivalently, that minimizes S. Differentiating S with respect to Sp, we
obtain

dS/dSp = 1/2 - Ss/Sp^2

S is a minimum when dS/dSp = 0, from which it follows that

Sp(opt) = sqrt(2Ss)



The optimum space utilization is

u(opt) = Ss/(Ss + sqrt(2Ss)) = 1/(1 + sqrt(2/Ss))
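For a sense of scale, a short computed example; the 8192-word average segment size is an assumed figure chosen only to make the arithmetic come out evenly:

```python
# Worked example of the optimum page size formulas; Ss = 8192 words is assumed.
import math

Ss = 8192                                   # average segment size in words
Sp_opt = math.sqrt(2 * Ss)                  # optimum page size = sqrt(2*Ss) = 128 words
overhead = Sp_opt / 2 + Ss / Sp_opt         # S = Sp/2 + Ss/Sp = 128 words
u_opt = Ss / (Ss + overhead)                # about 0.985, i.e. roughly 1.5% overhead

print(Sp_opt, overhead, round(u_opt, 3))    # 128.0 128.0 0.985
```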

MEMORY ALLOCATION

The various levels of a memory system are divided into sets of contiguous
locations, variously called regions, segments, or pages, which store blocks of
data. Blocks are swapped automatically among the levels in order to minimize
the access time seen by the processor. Swapping generally occurs in response to
processor requests (demand swapping). However, to avoid making a processor
wait while a requested item is being moved to the fastest level of memory M1,
some kind of anticipatory swapping must be implemented, which implies
transferring blocks to M1 in anticipation that they will be required soon. Good
short-range prediction of access-request patterns is possible because of locality
of reference.

The placement of blocks of information in a memory system is called
memory allocation and is the topic of this section. The method of selecting the
part of M1 in which an incoming block K is to be placed is the replacement
policy. Simple replacement policies assign K to M1 only when an unoccupied
or inactive region of sufficient size is available. More aggressive policies
preempt occupied blocks to make room for K. In general, successful memory
allocation methods result in a high hit ratio and a low average access time. If the
hit ratio is low, an excessive amount of swapping between memory levels
occurs, a phenomenon known as thrashing. Good memory allocation also
minimizes the amount of unused or underused space in M1.
The information needed for allocation within a two-level hierarchy (M1, M2)
(unless otherwise stated, we will assume the main-memory/secondary-memory
hierarchy) can be held in a memory map that contains the following information:
• Occupied space list for M1. Each entry of this list specifies a block name,
the (base) address of the region it occupies, and, if variable, the block
size. In systems using preemptive allocation, additional information is
associated with each block to determine when and how it can be
preempted.
• Available space list for M1. Each entry of this list specifies the address of
an unoccupied region and, if necessary, its size.
• Directory for M2. This list specifies the unit(s) that contain the
directories for all the blocks associated with the current programs. These
directories, in turn, define the regions of the M2 space to which each
block is assigned.
When a block is transferred from M2 to M1, the memory management system
makes an appropriate entry in the occupied space list. When the block is no
longer required in M1, it is deallocated and the region it occupies is transferred
from the occupied space list to the available space list. A block is deallocated
when a program using it terminates execution or when the block is replaced to
make room for one with higher priority.
Many preemptive and nonpreemptive algorithms have been developed for
dynamic memory allocation.
Nonpreemptive allocation. Suppose a block Ki of ni words is to be transferred
from M2 to M1. If none of the blocks already occupying M1 can be preempted
(overwritten or moved) by Ki, then it is necessary to find or create an "available"
region of ni or more words to accommodate Ki. This process is termed
nonpreemptive allocation. The problem is more easily solved in a paging system,
where all blocks (pages) have size Sp words and M1 is divided into fixed Sp-word
regions (page frames). The memory map (page table) is searched for an
available page frame; if one is found, it is assigned to the incoming block Ki.
This easy allocation method is the principal reason for the widespread use of
paging. If memory space is divisible into regions of variable length, however,
then it becomes more difficult to allocate incoming blocks efficiently.
Two widely used algorithms for nonpreemptive allocation of variable-sized
blocks (unpaged segments, for example) are first fit and best fit.
The first-fit method scans the memory map sequentially until an available
region Rj of ni or more words is found, where ni is the size of the incoming
block Ki; it then allocates Ki to Rj. The best-fit approach requires searching the
memory map completely and assigning Ki to an available region Rj of nj >= ni
words such that nj - ni is minimized.
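Both algorithms are easy to sketch over an available-space list; the (address, size) tuple representation and the sample regions are assumptions made for the example:

```python
# Illustrative first-fit and best-fit selection over an available-space list.

available = [(1000, 300), (2000, 120), (5000, 800)]   # (base address, size in words)

def first_fit(n):
    # scan sequentially, take the first region with at least n words
    return next((r for r in available if r[1] >= n), None)

def best_fit(n):
    # scan the whole list, take the region that minimizes leftover space
    candidates = [r for r in available if r[1] >= n]
    return min(candidates, key=lambda r: r[1] - n, default=None)

print(first_fit(100))   # -> (1000, 300): first region big enough
print(best_fit(100))    # -> (2000, 120): tightest fit
```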

Preemptive allocation. Nonpreemptive allocation cannot make efficient
use of memory in all situations. Memory overflow, that is, rejection of a
memory allocation request due to insufficient space, can be expected to occur
with M1 only partially full. Much more efficient use of the
available memory space is possible if the occupied space
can be reallocated to make room for incoming blocks.
Reallocation may be done in two ways:
• The blocks already in M1 can be relocated within
M1 to create a gap large enough for the incoming block.
• One or more occupied regions can be made available by deallocating the
blocks they contain. This method requires a rule (a replacement policy) for
selecting the blocks to be deallocated and replaced.

Deallocation requires that a distinction be made between "dirty" blocks,
which have been modified since being loaded into M1, and "clean" blocks,
which have not been modified. Blocks of instructions remain clean, whereas
blocks of data become dirty. To replace a clean block, the memory
management system can simply overwrite it with the new block and update
its entry in the memory map. Before a dirty block is overwritten, it must be
copied back to M2, which involves a slow transfer.
Relocation of the blocks already occupying M1 can be done by a method
called compaction. The blocks currently in M1 are compressed into a single
contiguous group at one end of the memory. This creates an available region
of maximum size. Once the memory is compacted, incoming blocks are assigned
to contiguous regions at the unoccupied end.
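A small sketch of compaction, with the occupied blocks represented as a list of dictionaries (an assumed layout, not how a real allocator stores them):

```python
# Compaction: occupied blocks are slid to the low end of memory, leaving one
# maximal free region at the other end.

blocks = [{"name": "A", "base": 0,    "size": 300},
          {"name": "B", "base": 700,  "size": 200},
          {"name": "C", "base": 1500, "size": 100}]
MEMORY_SIZE = 2000

def compact(blocks):
    next_base = 0
    for b in sorted(blocks, key=lambda b: b["base"]):
        b["base"] = next_base          # slide block down to the next free address
        next_base += b["size"]
    return next_base                   # start of the single free region

free_start = compact(blocks)
print(blocks)                          # A at 0, B at 300, C at 500
print(MEMORY_SIZE - free_start)        # 1400 words free in one contiguous region
```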

REPLACEMENT POLICIES

CACHE MEMORY

These are small, fast memories placed between the processor and the main
memory. Caches are faster than main memory. Small cache memories are
intended to provide fast memory retrieval without sacrificing the size of
memory. The cache contains a copy of certain portions of main memory.
A memory read or write operation is first checked against the cache; if the
desired location's data is available in the cache, it is used by the CPU directly.
Otherwise, a block of words is read from main memory into the cache and the
word is used by the CPU from the cache. Since the cache has limited space, a
portion of it, called a slot, needs to be vacated for this incoming block. The
contents of the vacated block are written back to main memory at the position
they belong to. The reason for bringing a whole block of words into the cache is,
once again, locality of reference. We expect that the next few addresses will be
close to this address, and therefore the block of words is transferred from main
memory to the cache. Thus, for a word that is not in the cache, the access time is
slightly more than the access time of main memory without a cache. But,
because of locality of reference, the next few words may be in the cache, thus
enhancing the overall speed of memory references. For example, if a memory
read cycle takes 100 ns and a cache read cycle takes 20 ns, then for four
continuous references (the first brings the main memory contents into the cache
and the next three are read from the cache):

The time taken with cache = (100 + 20) + (20 x 3)
                          = 120 + 60 = 180 ns
(100 + 20 for the first read operation; 20 each for the last three read operations)

Time taken without cache = 100 x 4 = 400 ns

CACHE ORGANIZATION

The figure below shows the principal components of a cache. Memory words are
stored in a cache data memory and are grouped into small pages called cache
blocks or lines. The contents of the cache's data memory are thus copies of a set
of main-memory blocks. Each cache block is marked with its block address,
referred to as a tag, so the cache knows to what part of the memory space the
block belongs. The collection of tag addresses currently assigned to the cache,
which can be noncontiguous, is stored in a special memory, the cache tag
memory or directory.

There are two basic organizations of cache memory:

 Look aside design

 Look through design



LOOK ASIDE DESIGN

In the look-aside design, the cache and the main memory are directly connected
to the system bus. In this design the CPU initiates a memory access by placing a
(real) address Ai on the memory address bus at the start of a read (load) or write
(store) cycle. The cache M1 immediately compares Ai to the tag addresses
currently residing in its tag memory. If a match is found in M1, that is, a cache
hit occurs, the access is completed by a read or write operation executed in the
cache; main memory M2 is not involved. If no match with Ai is found in the
cache, that is, a cache miss occurs, then the desired access is completed by a
read or write operation directed to M2. In response to a cache miss, a block
(line) Bj that includes the target address Ai is transferred from M2 to M1. This
transfer is fast, taking advantage of the small block size and fast RAM access
methods, which allow the cache block to be filled in a single short burst. The
cache implements some replacement policy, such as LRU, to determine where to
place an incoming block. When necessary, the cache block replaced by Bj in M1
is saved in M2. Note that cache misses, even though they are infrequent, result
in block transfers between M1 and M2 that tie up the system bus, making it
unavailable for other uses such as I/O operations.

LOOK THROUGH DESIGN

A faster, but more costly, organization called a look-through cache appears in
the figure. The CPU communicates with the cache via a separate (local) bus that
is isolated from the main system bus. The system bus is available for use by
other units, such as I/O controllers, to communicate with main memory. Hence
cache accesses and main-memory accesses not involving the CPU can proceed
concurrently. Unlike the look-aside case, with a look-through cache the CPU
does not automatically send all memory requests to main memory; it does so
only after a cache miss. A look-through cache also allows the local bus linking
M1 and M2 to be wider than the system bus, thus speeding up cache-to-main-memory
transfers.



CACHE OPERATION

Read Policy

The figure shows the relationship between the data stored in the cache M1 and
the data stored in main memory M2. Here a cache block (line) size of 4 bytes is
assumed. Each memory address is 12 bits long, so the 10 high-order bits form
the tag or block address, and the 2 low-order bits define a displacement address
within the block. When a block is assigned to M1's data memory, its tag is also
placed in M1's tag memory. The figure shows the contents of two blocks assigned
to the cache data memory; note the locations of the same blocks in main memory.
To read the shaded word, its address Ai = 101111000110 is sent to M1, which
compares Ai's tag part to its stored tags and finds a match (a hit). The stored tag
pinpoints the corresponding block in M1's data memory, and the 2-bit
displacement is used to output the target word to the CPU.
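The lookup in this example can be mimicked in a few lines; the stored data bytes are made up, but the 10-bit tag / 2-bit displacement split follows the text:

```python
# Sketch of the read lookup from the example above: 12-bit address, 4-byte lines,
# so the top 10 bits are the tag and the bottom 2 bits the displacement.

tag_memory = {0b1011110001: 0}                 # tag -> index of block in data memory
data_memory = [[0x12, 0x34, 0x56, 0x78]]       # block contents (assumed example bytes)

def read(address):                             # address is a 12-bit integer
    tag = address >> 2
    displacement = address & 0b11
    if tag in tag_memory:                      # hit: deliver the word from the cache
        return data_memory[tag_memory[tag]][displacement]
    return None                                # miss: the block must be fetched from M2

print(hex(read(0b101111000110)))               # displacement 2 -> 0x56
```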

Write Policy:
A cache write operation employs the same addressing technique. The data in the
cache and in main memory can be written by processors or by input/output
devices. The main problems faced in writing with cache memories are:

The contents of the cache and of main memory can be altered by more than one
device; for example, the CPU can write to the cache while an input/output
module writes directly to main memory. This can result in inconsistencies
between the values in the cache and in main memory.

In the case of multiple CPUs with different caches, a word altered in one cache
invalidates the copies of that word held in the other caches.

The suggested techniques for writing in systems with caches are:

Write through: Write the data to the cache as well as to main memory. The other
CPU-cache combinations (in a multiprocessor system) have to watch traffic to
the main memory and make suitable amendments to the contents of their caches.
The disadvantage of this technique is that a bottleneck is created due to the
large number of accesses to main memory by the various CPUs.

Write block (also known as write back): In this method updates are made only
in the cache, and a bit called the update bit is set.
Only a block whose update bit is set is written back to main memory when it is
replaced. But here all accesses to main memory, whether from other CPUs or
input/output modules, need to go through the cache, resulting in complex circuitry.
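The difference between the two policies can be seen in a miniature sketch; the dictionary-based cache and memory are stand-ins, not a hardware model:

```python
# Contrast of write through and write block (write back) in miniature.

memory = {0x10: 7}
cache = {0x10: {"data": 7, "update": 0}}       # update bit marks a block as dirty

def write_through(addr, value):
    cache[addr]["data"] = value
    memory[addr] = value                        # every write also goes to main memory

def write_block(addr, value):
    cache[addr]["data"] = value
    cache[addr]["update"] = 1                   # defer the memory update

def evict(addr):                                # on replacement, flush only dirty blocks
    if cache[addr]["update"]:
        memory[addr] = cache[addr]["data"]
    del cache[addr]

write_block(0x10, 99)
print(memory[0x10])    # still 7: main memory is stale until eviction
evict(0x10)
print(memory[0x10])    # 99
```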

ADDRESS MAPPING TECHNIQUES

Address mapping is defined as the smallest unit of addressed data that can be
mapped independently to an area of the virtual address space.

There are three common mapping techniques in cache memory:

 Direct mapping

 Associative mapping

 Set-associative mapping

Direct Mapping: In this mapping each block of main memory is mapped to one
fixed slot of the cache only. For example, if a cache has four slots, then main
memory blocks 0, 4, 8, 12, 16, ... can be found only in slot 0; blocks 1, 5, 9, 13,
17, ... in slot 1; blocks 2, 6, 10, 14, 18, ... in slot 2; and blocks 3, 7, 11, 15,
19, ... in slot 3. This can be defined mathematically as

Cache slot number = (Block number of main memory) Modulo (Total number of slots
in cache)
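A quick check of this formula for the four-slot example above:

```python
# Direct mapping: slot = block number mod number of slots (four-slot cache assumed).
SLOTS = 4
for block in (0, 1, 13, 18):
    print(block, "->", block % SLOTS)   # 0->0, 1->1, 13->1, 18->2
```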

ADVANTAGE

 In this technique, it can easily be determined whether a block is in the cache
or not.
 This is a simple technique.

DISADVANTAGE

 In this scheme, suppose two words that are referenced alternately and
repeatedly fall in the same slot; then these two blocks will be swapped in and
out of the cache continually, resulting in reduced efficiency of the cache.

Associative Mapping: In associative mapping any block of main memory can be
mapped onto any location of the cache. The main difficulty here is to determine
whether a given block is in the cache or not; this determination is carried out by
comparing the block's address with all the stored tags simultaneously.



The main disadvantage of this mapping is the complex circuitry required to
examine all the cache slots in parallel to determine the presence or absence of a
block in cache.

Set Associative Mapping: This is a compromise between the above two types of
mapping, obtaining the advantages of both direct and associative mapping. The
cache is divided into a number of sets, say A. Direct mapping is used to map a
main memory block to one of the A sets, and within that set the block can be
assigned to any slot.
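A sketch of a set-associative lookup, assuming a toy cache with two sets of two slots each (the geometry and the missing replacement policy are simplifications):

```python
# k-way set-associative lookup: the set is chosen by direct mapping (block number
# mod number of sets), and the tag is searched associatively within that set.

NUM_SETS = 2
cache = [[None, None] for _ in range(NUM_SETS)]   # each set holds up to 2 block numbers

def lookup(block_number):
    s = block_number % NUM_SETS                    # direct-mapped choice of set
    if block_number in cache[s]:                   # associative search within the set
        return "hit"
    # miss: place the block in a free slot of the set (replacement policy omitted)
    slot = cache[s].index(None) if None in cache[s] else 0
    cache[s][slot] = block_number
    return "miss"

print(lookup(6), lookup(8), lookup(6))   # miss miss hit (6 and 8 share set 0)
```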

Associative Memories

In associative memories any stored item can be accessed directly by using the
contents of the item in question, such as a person's name or an account number,
as an address. Associative memories are also known as content addressable
memories (CAMs). The entity chosen to address the memory is known as the key.

The figure alongside shows the structure of a simple associative memory. The
information is stored in a CAM as fixed-length words. Any field of the word may
be chosen as the key field. The desired key is indicated by the mask register. The
key is then compared simultaneously with all stored words. The words that
match the key raise a match signal, which enters a select circuit. The select
circuit in turn enables access to the required data field. In case more than one
entry matches the key, it is the responsibility of the select circuit to determine
which data field is to be read; for example, the select circuit may read out all the
matching entries in a predetermined order. Each word is provided with its own
match circuit because all the words in the memory must compare their keys with
the desired key simultaneously.



The match and select circuits thus make associative memories much more
complex and expensive than conventional memory. VLSI technology has made
associative memories economically feasible, but even now cost considerations
limit their application to relatively small amounts of information that need to be
accessed very rapidly.
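A minimal software model of a masked CAM search; the 4-bit word size and the convention that a mask bit of 1 means "compare this bit" are assumptions of the sketch:

```python
# Toy model of a CAM search with a mask register: only the masked-in bits of the
# key participate in the comparison, and all words are checked "simultaneously".

words = [0b1010, 0b1100, 0b1011, 0b0100]       # stored fixed-length words

def cam_search(key, mask):
    # mask bit = 1 means "compare this bit position" (an assumption of this model)
    return [i for i, w in enumerate(words) if (w & mask) == (key & mask)]

print(cam_search(0b1000, 0b1100))   # -> [0, 2]: words whose two high bits are 10
```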

Associative memory cell

The logic circuit for a 1-bit associative memory cell appears in Figure below.
The cell comprises a D flip-flop for data storage, a match circuit (the
EXCLUSIVE-NOR gate) for comparing the flip-flop's contents to an external
data bit D, and circuits for reading from and writing into the cell. The results of
a comparison appear on the match output M, where M = 1 denotes a match and
M = 0 denotes no match. The cell is selected or addressed for both read and
write operations by setting the select line S to 1. New data is written into the
cell by setting the write enable line WE to 1, which in turn enables the D flip-
flop's clock input CK. The stored data is read out via the Q line. The mask
control line MK is activated (MK =1) to force the match line M to 0
independently of the data stored in the D flip-flop; MK also disables the input
circuits of the flip-flop by forcing CK to 0. A cell like that of the figure can be
realized with about 10 transistors, far more than the single transistor required
for a dynamic RAM cell. This high hardware cost is the main reason that large
associative memories are rarely used outside caches.

Structure versus Performance

We next examine some additional aspects of cache design: the types of
information to store in the cache, the cache's dimensions and control methods,
and the impact of the cache's design on its performance.

CACHE TYPES



Caches are distinguished by the kinds of information they store. An instruction
or I-cache stores instructions only, while a data or D-cache stores data only.
Separating the stored data in this way recognizes the different access behaviour
patterns of instructions and data. For example, programs tend to involve few
write accesses, and they often exhibit more temporal and spatial locality than
the data they process. A cache that stores both instructions and data is referred
to as unified. A split cache, on the other hand, consists of two associated but
largely independent units: an I-cache for instructions and a D-cache for data.
While a unified cache is simpler, a split cache makes it possible to access
programs and data concurrently. A split cache can also be designed to manage
its I- and D-cache components differently.

PERFORMANCE

The cache is the fastest component in the memory hierarchy, so it is desirable to
make the average memory access time tA seen by the CPU as close as possible
to the access time tA1 of the cache. To achieve this goal, M1 should satisfy a very
high percentage of all memory references; that is, the cache hit ratio H should
be almost one. A high hit ratio is possible because of the locality-of-reference
property discussed earlier. We have tA = tA1 + (1 - H)tB, where tB is the block-
transfer time from M2 to M1. The block size is small enough that, with a
sufficiently wide M2-to-M1 data bus, a block can be loaded into the cache in a
single main-memory read operation, making tB = tA2, the main-memory access
time. Hence we can roughly estimate cache performance with the equation

tA = tA1 + (1 - H)tA2

Consider a k-way set-associative cache M1 defined by the following parameters:
the number of sets s1, the number of blocks (lines) per set k, and the number of
bytes per block (also called the line size) p1. Recall that the cache is fully
associative when s1 = 1 and is direct-mapped when k = 1. The number of bytes
stored in the cache's data memory, usually referred to as the cache size S1, is
given by the following formula:

S1 = k x s1 x p1

or, in words, cache size = number of blocks (lines) per set x number of sets x
number of bytes per block.
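A quick numeric check of both formulas, using assumed parameter values rather than figures from the notes:

```python
# Worked example of the cache-size and average-access-time formulas above;
# the parameter values are assumed examples.

k, s1, p1 = 4, 128, 32           # 4-way, 128 sets, 32-byte lines
S1 = k * s1 * p1
print(S1)                        # 16384 bytes = 16 KB cache

tA1, tA2, H = 5, 60, 0.95        # cache 5 ns, main memory 60 ns, 95% hit ratio
tA = tA1 + (1 - H) * tA2
print(tA)                        # 8.0 ns average access time
```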

