Professional Documents
Culture Documents
The
Memory System
Overview
Basic Concepts
The maximum size of the memory that can be used in any computer is
determined by the addressing scheme.
16-bit addresses = 216 = 64K memory locations
Byte address
Byte address
2 -4
2 -4
2 -3
2- 2
2 - 1
2 - 4
2- 1
2 - 2
2 -3
2 -4
Traditional Architecture
Processor
k-bit
address bus
Memory
MAR
n-bit
data bus
MDR
Up to 2k addressable
locations
Word length =n bits
Control lines
( R / W , MFC, etc.)
Basic Concepts
Semiconductor RAM
Memories
Internal Organization of
Memory Chips
b7
b7
b1
b1
b0
b0
W0
FF
A0
A2
A1
W1
FF
Address
decoder
Memory
cells
A3
W15
Sense / Write
circuit
b1
Sense / Write
circuit
b0
R/W
CS
A Memory Chip
5-bit row
address
W0
W1
5-bit
decoder
32 32
memory cell
array
W31
10-bit
address
Sense/ Write
circuitry
32-to-1
output multiplexer
and
input demultiplexer
5-bit column
address
Data
input/output
R/ W
CS
Static Memories
T1
T2
Word line
Bit lines
Vsupply
T3
T4
T1
T2
X
T5
T6
Word line
Bit lines
Static Memories
Figure 5.5. An example of a CMOS memory cell.
Asynchronous DRAMs
Static RAMs are fast, but they cost more area and are more expensive.
Dynamic RAMs (DRAMs) are cheap and area efficient, but they can not
retain their state indefinitely need to be periodically refreshed.
Bit line
Word line
T
C
A20 - 9 A 8 -
Row
decoder
4096 (512 8)
cell array
Sense / Write
circuits
Column
address
latch
CA S
CS
R/ W
Column
decoder
D7
D0
Synchronous DRAMs
Row
address
latch
Row
decoder
Cell array
Column
address
counter
Column
decoder
Read/Write
circuits & latches
Row/Column
address
Clock
RA S
CA S
R/ W
Mode register
and
timing control
Data input
register
Data output
register
CS
Data
Synchronous DRAMs
Clock
R/ W
RA S
CA S
Address
Data
Row
Col
D0
D1
D2
D3
Synchronous DRAMs
DDR SDRAM
Double-Data-Rate SDRAM
Standard SDRAM performs all actions on the rising
edge of the clock signal.
DDR SDRAM accesses the cell array in the same
way, but transfers the data on both edges of the
clock.
The cell array is organized in two banks. Each can
be accessed separately.
DDR SDRAMs and standard SDRAMs are most
efficiently used in applications where block transfers
are prevalent.
A0
A1
A19
A20
2-bit
decoder
512K 8
memory chip
D31-24
D23-16
D 15-8
D7-0
19-bit
address
8-bit data
input/output
Chip select
Figure 5.10. Organization of a 2M 32 memory module using 512K 8 static memory chips.
Memory System
Considerations
Memory Controller
Row/Column
address
Address
RA S
R/ W
Request
Memory
controller
Processor
CA S
R/ W
CS
Clock
Clock
Data
Memory
Read-Only Memories
Read-Only-Memory
Flash Memory
Similar to EEPROM
Difference: only possible to write an entire
block of cells instead of a single cell
Low power
Use in portable equipment
Implementation of such modules
Flash cards
Flash drives
Registers
Increasing
size
Primary L1
cache
Increasing Increasing
speed cost per bit
SecondaryL2
cache
Main
memory
Magnetic disk
secondary
memory
Cache Memories
Cache
What is cache?
Page 315
Why we need it?
Locality of reference (very important)
- temporal
- spatial
Cache block cache line
Cache
Processor
Cache
Replacement algorithm
Hit / miss
Write-through / Write-back
Load through
Main
memory
Memory Hierarchy
Main Memory
CPU
I/O Processor
Cache
Magnetic
Disks
Magnetic Tapes
30 / 19
Cache
Memory
High speed (towards CPU speed)
CPU
Hit
Cache
(Fast)
Cache
Main
Memory
(Slow)
Mem
31 / 19
Cache Memory
CPU
30-bit Address
Cache
1 Mword
Main
Memory
1 Gword
32 / 19
Cache Memory
00000
00001
FFFFF
Cache
00000000
00000001
3FFFFFFF
Main
Memory
Main
memory
Block 0
Direct Mapping
Block 1
tag
Cache
Block 127
Block 0
Block 128
Block 1
Block 129
Block 127
Block 255
Block 256
Block 257
Block 4095
Tag
Block
Word
Direct
Mapping
Address
000 00500
00000
Cache
00500
000 0 1 A 6
What happens
when Address
= 100 00500
Tag Data
00900
080 4 7 C C
01400
150 0 0 0 5
000 0 1 A 6
FFFFF
Compare
20
10
16
Bits
Bits Bits
(Addr) (Tag) (Data)
Match
No match
35 / 19
Direct
Mapping with Blocks
Address
000 0050 0
00000
Block Size = 16
Cache
00500
01A6
000
00501
0254
00900
47CC
080
00901
A0B4
01400
0005
150
01401
5C04
FFFFF
Tag Data
000 0 1 A 6
Compare
20
10
16
Bits
Bits Bits
(Addr) (Tag) (Data)
Match
No match
36 / 19
Direct Mapping
Tag
Block
Word
11101,1111111,1100
Tag: 11101
Block: 1111111=127, in the 127th block of the
cache
Word:1100=12, the 12th word of the 127th
block in the cache
Associative Mapping
Main
memory
Block 0
Block 1
Cache
tag
Block 0
tag
Block 1
Block i
tag
Block 127
Block 4095
Tag
Word
12
Associative Memory
Cache Location
00000000
00000001
00012000
08000000
15000000
3FFFFFFF
00000 Cache
00001
00012000
15000000
FFFFF
08000000
Address (Key)
Main
Memory
Data
39 / 19
Associative Mapping
Address
00012000
Can have
any number
of locations
Cache
00012000
01A6
Data
How many
comparators?
15000000
0005
08000000
47CC
30 Bits
(Key)
16 Bits
(Data)
01A6
40 / 19
Associative Mapping
Tag
Word
12
111011111111,1100
Tag: 111011111111
Word:1100=12, the 12th word of a block in the
cache
Main
memory
Block 0
Set-Associative Mapping
Block 1
Cache
tag
Set 0
tag
tag
Set 1
tag
tag
Set 63
tag
Block 0
Block 63
Block 1
Block 64
Block 2
Block 65
Block 3
Block 127
Block 126
Block 128
Block 127
Block 129
Block 4095
Set
Word
Set-Associative
Mapping
Address
000 00500
2-Way Set Associative
00000
Cache
00500
000 0 1 A 6
010 0 7 2 1
Tag1 Data1
00900
080 4 7 C C 000 0 8 2 2
01400
150 0 0 0 5
Tag2 Data2
000 0 1 A 6 010 0 7 2 1
000 0 9 0 9
FFFFF
Compare
20
10
16
10
16
Bits
Bits Bits
Bits Bits
(Addr) (Tag) (Data) (Tag) (Data)
Match
Compare
No match
43 / 19
Set-Associative Mapping
Tag
Set
Word
111011,111111,1100
Tag: 111011
Set: 111111=63, in the 63th set of the cache
Word:1100=12, the 12th word of the 63th set
in the cache
Replacement Algorithms
Replacement
Algorithms
For Associative & Set-Associative Cache
Which location should be emptied when the cache
is full and a miss occurs?
First In First Out (FIFO)
Least Recently Used (LRU)
Valid Bit
46 / 19
Replacement Algorithms
CPU
Reference
Cache
FIFO
Miss
Miss
Miss
Hit
Miss
Miss
Miss
Hit
Hit
Miss
A
B
A
B
C
A
B
C
A
B
C
D
E
B
C
D
E
A
C
D
E
A
C
D
E
A
C
D
E
A
F
D
47 / 19
Replacement Algorithms
CPU
Reference
Cache
LRU
Miss
Miss
Miss
Hit
Miss
Miss
Hit
Hit
Hit
Miss
B
A
C
B
A
A
C
B
D
A
C
B
E
D
A
C
A
E
D
C
D
A
E
C
C
D
A
E
F
C
D
A
48 / 19
Performance
Considerations
Overview
Interleaving
ABR
DBR
Module
0
k bits
m bits
Module
Address in module
ABR DBR
Module
i
m bits
k bits
Address in module
Module
MM address
MM address
ABR DBR
ABR DBR
ABR DBR
Module
0
Module
i
Module
k
2 - 1
ABR DBR
Module
n- 1
Tave=hC+(1-h)M
Tave: average access time experienced by the processor
h: hit rate
M: miss penalty, the time to access information in the main
memory
C: the time to access information in the cache
Example:
Assume that 30 percent of the instructions in a typical program
perform a read/write operation, which means that there are 130
memory accesses for every 100 instructions executed.
h=0.95 for instructions, h=0.9 for data
C=10 clock cycles, M=17 clock cycles, interleaved memory
Time without cache
130x10
= 5.04
Time with cache
100(0.95x1+0.05x17)+30(0.9x1+0.1x17)
The computer with the cache performs five times better
Other Enhancements
Virtual Memories
Overview
Overview
Memory
Management
Unit
Address Translation
All programs and data are composed of fixedlength units called pages, each of which
consists of a block of words that occupy
contiguous locations in the main memory.
Page cannot be too small or too large.
The virtual memory mechanism bridges the
size and speed gaps between the main
memory and secondary storage similar to
cache.
Data 2
Code
Data
Heap
Stack
Stack 1
Heap 1
Code 1
Stack 2
Prog 1
Virtual
Address
Space 1
Prog 2
Virtual
Address
Space 2
Data 1
Heap 2
Code 2
OS code
Translation Map 1
OS data
Translation Map 2
OS heap &
Stacks
Page table
Virtual
page
number
Valid
bits
Other
flags
Main memory
Address Translation
Offset
Page frame
Offset
+
PAGE TABLE
Control
bits
Page frame
in memory
Address Translation
TLB
Offset
TLB
Virtual page
number
No
Control
bits
Page frame
in memory
=?
Yes
Miss
Hit
Page frame
Offset
TLB
Memory Management
Requirements
Multiple programs
System space / user space
Protection (supervisor / user state, privileged
instructions)
Shared pages
Secondary Storage
Disk
Disk drive
Disk controller
Sector 3, trackn
Sector 0, track 1
Sector 0, track 0
Sector header
Following the data, there is an errorcorrection code (ECC).
Formatting process
Difference between inner tracks and outer
tracks
Access time seek time / rotational delay
(latency time)
Data buffer/cache
Disk Controller
Processor
Main memory
System bus
Disk controller
Disk drive
Disk drive
Disk Controller
Seek
Read
Write
Error checking
Aluminum
Optical Disks
Pit
Acrylic
Label
Polycarbonate plastic
Land
(a) Cross-section
Pit
Land
Reflection
Reflection
No reflection
Source
Detector
Source
Detector
Source
Detector
0 1 0 0
1 0 0 0 0
1 0 0 0 1
0 0 1 0 0
1 0
Optical Disks
CD-ROM
CD-Recordable (CD-R)
CD-ReWritable (CD-RW)
DVD
DVD-RAM
File
mark
File
mark
File gap
Record
Record
gap
Record
Record
gap
7 or 9
bits
Homework
Outer loop excluding Inner loop: (outer loop word size-inner loop
word size)x10x1
Inner loop: inner loop word sizex20x10x1