
CPS 104 Computer Organization and Programming
Lecture 24: Memory

March 15, 2004 Gershon Kedem http://kedem.duke.edu/cps104/Lectures


Admin.

Homework 6 is posted. Please start ASAP!


Review: Memory Technology

Random Access:
- "Random" is good: access time is the same for all locations
- DRAM: Dynamic Random Access Memory
  - High density, low power, cheap, slow
  - Dynamic: needs to be refreshed regularly
  - Used for main memory
- SRAM: Static Random Access Memory
  - Low density, high power, expensive, fast
  - Static: content lasts "forever" (until power is lost)
  - Used for caches

Not-so-random Access Technology:
- Access time varies from location to location and from time to time
- Examples: disk, CDROM

Sequential Access Technology:
- Access time is linear in location (e.g., tape)

Review: Random Access Memory (RAM) Technology

Why do computer professionals need to know about RAM technology?
- Processor performance is usually limited by memory latency and bandwidth.
- Latency: the time it takes to access a word in memory.
- Bandwidth: the average rate of access to memory (words/sec); see the example after this list.
- As IC densities increase, lots of memory will fit on the processor chip, so on-chip memory can be tailored to specific needs:
  - Instruction cache
  - Data cache
  - Write buffer

What makes RAM different from a bunch of flip-flops?
- Density: RAM is much denser.
- Speed: RAM access is slower than flip-flop (register) access.
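An illustrative calculation (numbers assumed, not from the slide): a memory that returns one 8-byte word 100 ns after it is asked has a latency of 100 ns; if it can complete one such access every 100 ns, its bandwidth is 8 B / 100 ns = 80 MB/s. Wider buses and overlapped (interleaved) accesses raise bandwidth without improving latency.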

Technology Trends
         Capacity        Speed
Logic:   2x in 3 years   2x in 3 years
DRAM:    4x in 3 years   1.4x in 10 years
Disk:    4x in 3 years   1.4x in 10 years

Year   DRAM Size   Cycle Time
1980   64 Kb       250 ns
1983   256 Kb      220 ns
1986   1 Mb        190 ns
1989   4 Mb        165 ns
1992   16 Mb       145 ns
1995   64 Mb       120 ns

(1000:1 growth in capacity, but only about 2:1 improvement in cycle time.)

Static RAM Cell


6-Transistor SRAM Cell

[Figure: 6-T SRAM cell storing a 0/1 pair; the word line (row select) connects the cell to the complementary bit and bit-bar lines.]

Write:
1. Drive the bit lines (bit = 1, bit-bar = 0, or the reverse)
2. Select the row

Read:
1. Precharge bit and bit-bar to Vdd (set both to 1)
2. Select the row
3. The cell pulls one of the two lines low (toward 0)
4. The sense amp on the column detects the difference between bit and bit-bar

Typical SRAM Organization: 16-word x 4-bit


[Figure: 16-word x 4-bit SRAM array. Address bits A0-A3 feed an address decoder that selects one of 16 word lines (Word 0 ... Word 15). Each of the four bit columns has a write driver/precharger at the top (inputs Din 0-3, WrEn, Precharge) and a sense amp at the bottom (outputs Dout 0-3).]

Logic Diagram of a Typical SRAM


[Figure: 2^N-word x M-bit SRAM block with an N-bit address bus A, control inputs WE_L and OE_L, and an M-bit data bus D.]

- Write Enable is usually active low (WE_L).
- Din and Dout are combined into one pin (D) to save pins:
  - A new control signal, Output Enable (OE_L), is needed.
  - WE_L asserted (Low), OE_L deasserted (High): D serves as the data input pin.
  - WE_L deasserted (High), OE_L asserted (Low): D serves as the data output pin.
  - Both WE_L and OE_L asserted: the result is unknown. Don't do that!


Typical SRAM Timing


[Figure: 2^N-word x M-bit SRAM block (address A, WE_L, OE_L, data D) with timing waveforms.
Write timing: the write address and Data In on D must satisfy the write setup time and write hold time around the WE_L pulse.
Read timing: after a read address is applied and OE_L is asserted, Data Out becomes valid on D after the read access time; until then D is high-Z or junk.]


1-Transistor Memory Cell (DRAM)

[Figure: 1-T DRAM cell; a row-select line gates an access transistor between the storage capacitor and the bit line.]

Write:
1. Drive the bit line
2. Select the row

Read:
1. Precharge the bit line to Vdd (1)
2. Select the row
3. The cell and the bit line share charge
   - Very small voltage change on the bit line (can detect a change of ~1 million electrons)
4. Sense (fancy sense amp)
5. Write: restore the value

Refresh:
1. Just do a dummy read to every cell.
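A rough illustration (typical figures assumed, not from the slide): if every row of a 512-row DRAM must be refreshed at least once every 64 ms, one dummy row read is needed about every 64 ms / 512 = 125 µs, so refresh steals only a small fraction of the available cycles.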


Introduction to DRAM
Dynamic RAM (DRAM):
- Refresh required
- Very high density
- Low power (0.1-0.5 W active, 0.25-10 mW standby)
- Low cost per bit
- Pin sensitive (few pins):
  - Output Enable (OE_L)
  - Write Enable (WE_L)
  - Row address strobe (RAS)
  - Column address strobe (CAS)

[Figure: N-bit cell array with log2(N) multiplexed row/column address pins, sense amps, and column select, plus OE_L and WE_L.]


Classical DRAM Organization (square)


[Figure: square RAM cell array. The row decoder takes the row address and drives the word (row) select lines; the bit (data) lines run down the columns into the sense amps, column selector, and I/O circuits, which are driven by the column address and produce the data output. Each intersection represents a 1-T DRAM cell.]

Row and column address together:
- Select 1 bit at a time

Typical DRAM Organization

Typical DRAMs access multiple bits in parallel:
- Example: 2 Mb DRAM = 256K x 8 = 512 rows x 512 cols x 8 bits
- Row and column addresses are applied to all 8 planes in parallel

[Figure: eight 256 Kb planes (Plane 0 ... Plane 7), each 512 rows x 512 cols; plane i drives data bit D<i> (D<0> ... D<7>).]


Logic Diagram of a Typical DRAM


[Figure: 256K x 8 DRAM with a 9-bit multiplexed address bus A, control inputs RAS_L, CAS_L, WE_L, OE_L, and an 8-bit data bus D.]

- Control signals (RAS_L, CAS_L, WE_L, OE_L) are all active low.
- Din and Dout are combined (D):
  - WE_L asserted (Low), OE_L deasserted (High): D serves as the data input pin.
  - WE_L deasserted (High), OE_L asserted (Low): D serves as the data output pin.
- Row and column addresses share the same pins (A); see the sketch below:
  - RAS_L goes low: pins A are latched in as the row address.
  - CAS_L goes low: pins A are latched in as the column address.
  - RAS/CAS are edge-sensitive.
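A minimal sketch (hypothetical names, not from the slides) of how a memory controller could split the 18-bit word address of this 256K x 8 part (512 rows x 512 cols) into the row and column halves that are multiplexed over the 9 address pins A:

    #include <stdint.h>

    /* Hypothetical illustration: split a 256K (2^18) word address into the
     * 9-bit row and 9-bit column that share the DRAM's address pins A.    */
    #define ROW_BITS 9
    #define COL_BITS 9

    typedef struct {
        uint16_t row;   /* driven onto A when RAS_L falls */
        uint16_t col;   /* driven onto A when CAS_L falls */
    } dram_addr_t;

    dram_addr_t split_address(uint32_t addr)      /* addr in 0 .. 2^18 - 1 */
    {
        dram_addr_t a;
        a.col = addr & ((1u << COL_BITS) - 1);               /* low 9 bits  */
        a.row = (addr >> COL_BITS) & ((1u << ROW_BITS) - 1); /* next 9 bits */
        return a;
    }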

DRAM Write Timing

Every DRAM access begins with:
- The assertion of RAS_L
- 2 ways to write: early or late relative to CAS

[Figure: DRAM write-cycle waveforms for the 256K x 8 DRAM, showing RAS_L, CAS_L, WE_L, OE_L, the multiplexed row and column addresses on A, Data In on D, and the write access time, all within one DRAM write cycle time.]

- Early write cycle: WE_L asserted before CAS_L
- Late write cycle: WE_L asserted after CAS_L

DRAM Read Timing

Every DRAM access begins with:
- The assertion of RAS_L
- 2 ways to read: early or late relative to CAS

[Figure: DRAM read-cycle waveforms for the 256K x 8 DRAM, showing RAS_L, CAS_L, WE_L, OE_L, the multiplexed row and column addresses on A, the read access time, the output-enable delay, and Data Out on D (high-Z when not driven), all within one DRAM read cycle time.]

- Early read cycle: OE_L asserted before CAS_L
- Late read cycle: OE_L asserted after CAS_L

Increasing Bandwidth - Interleaving


Access pattern without interleaving:

[Figure: CPU connected to a single memory. The access for D2 cannot start until D1 is available, so accesses are fully serialized.]

Access pattern with 4-way interleaving:

[Figure: CPU connected to Memory Banks 0-3. Accesses to Bank 0, Bank 1, Bank 2, and Bank 3 are started back-to-back; by the time Bank 3 has been started, Bank 0 can be accessed again.]
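A minimal sketch (assuming low-order interleaving and an illustrative word size, not from the slides) of how consecutive words map onto the four banks, which is what lets sequential accesses overlap:

    #include <stdint.h>

    #define NUM_BANKS  4   /* 4-way interleaving, as in the figure   */
    #define WORD_BYTES 8   /* assumed word size for the illustration */

    /* Consecutive words land in consecutive banks ...                */
    unsigned bank_of(uint32_t byte_addr)
    {
        return (byte_addr / WORD_BYTES) % NUM_BANKS;
    }

    /* ... and this is the word's position within its bank.           */
    uint32_t index_in_bank(uint32_t byte_addr)
    {
        return (byte_addr / WORD_BYTES) / NUM_BANKS;
    }

With this mapping a sequential stream touches banks 0, 1, 2, 3, 0, ... so a new access can be started each cycle even though each individual bank stays busy for several cycles.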

SPARCstation 20s Memory System Overview


[Figure: SPARCstation 20 memory system. The processor module (Mbus module) contains the SuperSPARC processor (register file, instruction cache, data cache) and an external cache, and sits on the 64-bit-wide processor bus (Mbus). A memory controller bridges to the memory bus (SIMM bus), a 128-bit-wide datapath connecting Memory Modules 0-7.]


Fast Memory Systems: DRAM specific

- Multiple column (CAS) accesses within one row access: several names (page mode)
  - 64 Mbit DRAM: cycle time = 100 ns, page-mode access = 20 ns
- New DRAMs?
  - Synchronous DRAM: provide a clock signal to the DRAM; transfers are synchronous to the system clock
  - RAMBUS: reinvent the DRAM interface (Intel will use it)
    - Each chip is a module rather than a slice of memory
    - Short bus between CPU and chips
    - Does its own refresh
    - Variable amount of data returned
    - 1 byte / 2 ns (500 MB/s per chip)


Summary of Memory Technology

DRAM is slow but cheap and dense:
- Good choice for presenting the user with a BIG memory system
- Uses one transistor per cell; must be refreshed

SRAM is fast but expensive and not very dense:
- Good choice for providing the user with FAST access time
- Uses six transistors per cell; holds state as long as power is supplied

GOAL:
- Present the user with large amounts of memory using the cheapest technology
- Provide access at the speed offered by the fastest technology

Next: Caches

Who Cares about Memory Hierarchy?


The CPU-Memory Gap

[Figure: performance (10^6 cycles/sec, log scale from 1 to 1000) plotted against year, 1980-2000. CPU performance climbs steeply while DRAM performance stays nearly flat, so the gap widens every year.]

The CPU-Memory Speed Gap

To illustrate the problem, consider typical delays (measured in ns):
- Clock period: 0.3 ns
- Instructions: 1-4 instructions/clock (4-way superscalar)
- On-chip small, fast SRAM (Level-1 cache): 0.3-0.6 ns (1-2 clocks)
- On-chip large, fast SRAM (Level-2 cache): 4-6 ns (12-18 clocks)
- Off-chip large, fast SRAM (Level-3 cache): 7-14 ns (20-40 clocks)
- Off-chip large, slow DRAM (main memory): 90-120 ns (270-360 clocks)

Question: How often does the computer access memory?
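A rough answer (typical figures assumed, not from the slide): every instruction must be fetched from memory, and roughly a quarter to a third of instructions are loads or stores, so the processor generates more than one memory reference per instruction, and several per clock at 4 instructions/clock. Without caches, each reference would pay the 270-360 clock DRAM latency and the processor would sit idle almost all of the time.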



The Motivation for Caches


[Figure: memory system; the processor talks to a cache, which talks to DRAM.]

Motivation:
- Large memories (DRAM) are slow
- Small memories (SRAM) are fast

Make the average access time shorter by:
- Servicing most accesses from a small, fast memory
- This also reduces the bandwidth required of the large memory


Levels of the Memory Hierarchy


Capacity, access time, cost, and staging transfer unit per level:
- CPU registers: 100s of bytes, <10s of ns; transfer unit: instruction operands, 1-8 bytes (managed by the program/compiler)
- Cache: KBytes, 10-100 ns, ~$.0005/bit; transfer unit: blocks, 8-128 bytes (managed by the cache controller)
- Main memory: MBytes, 100-200 ns, ~$10^-7/byte; transfer unit: pages, 512-8K bytes (managed by the OS)
- Disk: 10-100 GBytes, ms, ~$10^-8 - 10^-10/byte; transfer unit: files, MBytes (managed by the user/operator)
- Tape: infinite capacity, sec-min access, ~$10^-11

Upper levels (toward the CPU) are smaller and faster; lower levels are larger and slower.

The Principle of Locality


[Figure: probability of reference plotted against the address space (0 to 2^n); references cluster in a few small regions.]

The Principle of Locality:
- Programs access a relatively small portion of the address space at any instant of time.
- Example: 90% of the time is spent in 10% of the code.

Two different types of locality:
- Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon.
- Spatial locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon.
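A minimal C sketch (not from the slides) showing both kinds of locality in one loop:

    /* 'sum' and 'i' are touched on every iteration (temporal locality);
     * a[0], a[1], ... sit at adjacent addresses (spatial locality).     */
    double sum_array(const double *a, int n)
    {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }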

Memory Hierarchy: Principles of Operation

At any given time, data is copied between only 2 adjacent levels:
- Upper level (cache): the one closer to the processor
  - Smaller, faster, and uses more expensive technology
- Lower level (memory): the one further away from the processor
  - Bigger, slower, and uses less expensive technology

Block:
- The minimum unit of information that can either be present or not present in the two-level hierarchy

[Figure: Blk X in the upper level (cache) and Blk Y in the lower level (memory), with blocks moving to and from the processor between the two levels.]


Memory Hierarchy: Terminology

Hit: the data appears in some block in the upper level (example: Block X)
- Hit Rate: the fraction of memory accesses found in the upper level
- Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss

Miss: the data must be retrieved from a block in the lower level (Block Y)
- Miss Rate = 1 - Hit Rate
- Miss Penalty = time to replace a block in the upper level + time to deliver the block to the processor

Hit Time << Miss Penalty
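These terms combine into the average memory access time (a standard formula; the numbers below are illustrative, not from the slide):

AMAT = Hit Time + Miss Rate x Miss Penalty

For example, with a 1-cycle hit time, a 5% miss rate, and a 100-cycle miss penalty, AMAT = 1 + 0.05 x 100 = 6 cycles, which is why Hit Time << Miss Penalty only pays off when the miss rate is kept small.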


[Figure: same two-level diagram as the previous slide: Blk X in the upper level (cache), Blk Y in the lower level (memory), with data moving to and from the processor.]
