
CPS 104 Computer Organization and Programming
Lecture 24: Memory

March 15, 2004 Gershon Kedem http://kedem.duke.edu/cps104/Lectures


Admin.

Homework 6 is posted. Please start ASAP!


Review: Memory Technology

Random Access:
- "Random" is good: access time is the same for all locations
- DRAM: Dynamic Random Access Memory
  - High density, low power, cheap, slow
  - Dynamic: needs to be refreshed regularly
  - Used for main memory
- SRAM: Static Random Access Memory
  - Low density, high power, expensive, fast
  - Static: content lasts "forever" (until power is lost)
  - Used for caches

Not-so-random Access Technology:
- Access time varies from location to location and from time to time
- Examples: disk, CDROM

Sequential Access Technology:
- Access time is linear in location (e.g., tape)

Review: Random Access Memory (RAM) Technology

Why do computer professionals need to know about RAM technology?
- Processor performance is usually limited by memory latency and bandwidth.
- Latency: the time it takes to access a word in memory.
- Bandwidth: the average rate of access to memory (words/sec); see the example after this list.
- As IC densities increase, lots of memory will fit on the processor chip, so on-chip memory can be tailored to specific needs:
  - Instruction cache
  - Data cache
  - Write buffer

What makes RAM different from a bunch of flip-flops?
- Density: RAM is much denser.
- Speed: RAM access is slower than flip-flop (register) access.
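An illustrative calculation (numbers assumed, not from the slide): a memory that returns one 8-byte word 100 ns after it is asked has a latency of 100 ns; if it can complete one such access every 100 ns, its bandwidth is 8 B / 100 ns = 80 MB/s. Wider buses and overlapped (interleaved) accesses raise bandwidth without improving latency.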

Technology Trends
         Capacity        Speed
Logic:   2x in 3 years   2x in 3 years
DRAM:    4x in 3 years   1.4x in 10 years
Disk:    4x in 3 years   1.4x in 10 years

Year   DRAM Size   Cycle Time
1980   64 Kb       250 ns
1983   256 Kb      220 ns
1986   1 Mb        190 ns
1989   4 Mb        165 ns
1992   16 Mb       145 ns
1995   64 Mb       120 ns

(1000:1 growth in capacity, but only about 2:1 improvement in cycle time.)

Static RAM Cell


6-Transistor SRAM Cell

[Figure: 6-T SRAM cell storing a 0/1 pair; the word line (row select) connects the cell to the complementary bit and bit-bar lines.]

Write:
1. Drive the bit lines (bit = 1, bit-bar = 0, or the reverse)
2. Select the row

Read:
1. Precharge bit and bit-bar to Vdd (set both to 1)
2. Select the row
3. The cell pulls one of the two lines low (toward 0)
4. The sense amp on the column detects the difference between bit and bit-bar

Typical SRAM Organization: 16-word x 4-bit


[Figure: 16-word x 4-bit SRAM array. Address bits A0-A3 feed an address decoder that selects one of 16 word lines (Word 0 ... Word 15). Each of the four bit columns has a write driver/precharger at the top (inputs Din 0-3, WrEn, Precharge) and a sense amp at the bottom (outputs Dout 0-3).]

Logic Diagram of a Typical SRAM


[Figure: 2^N-word x M-bit SRAM block with an N-bit address bus A, control inputs WE_L and OE_L, and an M-bit data bus D.]

- Write Enable is usually active low (WE_L).
- Din and Dout are combined into one pin (D) to save pins:
  - A new control signal, Output Enable (OE_L), is needed.
  - WE_L asserted (Low), OE_L deasserted (High): D serves as the data input pin.
  - WE_L deasserted (High), OE_L asserted (Low): D serves as the data output pin.
  - Both WE_L and OE_L asserted: the result is unknown. Don't do that!


Typical SRAM Timing


[Figure: 2^N-word x M-bit SRAM block (address A, WE_L, OE_L, data D) with timing waveforms.
Write timing: the write address and Data In on D must satisfy the write setup time and write hold time around the WE_L pulse.
Read timing: after a read address is applied and OE_L is asserted, Data Out becomes valid on D after the read access time; until then D is high-Z or junk.]


1-Transistor Memory Cell (DRAM)

[Figure: 1-T DRAM cell; a row-select line gates an access transistor between the storage capacitor and the bit line.]

Write:
1. Drive the bit line
2. Select the row

Read:
1. Precharge the bit line to Vdd (1)
2. Select the row
3. The cell and the bit line share charge
   - Very small voltage change on the bit line (can detect a change of ~1 million electrons)
4. Sense (fancy sense amp)
5. Write: restore the value

Refresh:
1. Just do a dummy read to every cell.
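A rough illustration (typical figures assumed, not from the slide): if every row of a 512-row DRAM must be refreshed at least once every 64 ms, one dummy row read is needed about every 64 ms / 512 = 125 µs, so refresh steals only a small fraction of the available cycles.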


Introduction to DRAM
Dynamic RAM (DRAM):
- Refresh required
- Very high density
- Low power (0.1-0.5 W active, 0.25-10 mW standby)
- Low cost per bit
- Pin sensitive (few pins):
  - Output Enable (OE_L)
  - Write Enable (WE_L)
  - Row address strobe (RAS)
  - Column address strobe (CAS)

[Figure: N-bit cell array with log2(N) multiplexed row/column address pins, sense amps, and column select, plus OE_L and WE_L.]


Classical DRAM Organization (square)


[Figure: square RAM cell array. The row decoder takes the row address and drives the word (row) select lines; the bit (data) lines run down the columns into the sense amps, column selector, and I/O circuits, which are driven by the column address and produce the data output. Each intersection represents a 1-T DRAM cell.]

Row and column address together:
- Select 1 bit at a time

Typical DRAM Organization

Typical DRAMs access multiple bits in parallel:
- Example: 2 Mb DRAM = 256K x 8 = 512 rows x 512 cols x 8 bits
- Row and column addresses are applied to all 8 planes in parallel

[Figure: eight 256 Kb planes (Plane 0 ... Plane 7), each 512 rows x 512 cols; plane i drives data bit D<i> (D<0> ... D<7>).]


Logic Diagram of a Typical DRAM


[Figure: 256K x 8 DRAM with a 9-bit multiplexed address bus A, control inputs RAS_L, CAS_L, WE_L, OE_L, and an 8-bit data bus D.]

- Control signals (RAS_L, CAS_L, WE_L, OE_L) are all active low.
- Din and Dout are combined (D):
  - WE_L asserted (Low), OE_L deasserted (High): D serves as the data input pin.
  - WE_L deasserted (High), OE_L asserted (Low): D serves as the data output pin.
- Row and column addresses share the same pins (A); see the sketch below:
  - RAS_L goes low: pins A are latched in as the row address.
  - CAS_L goes low: pins A are latched in as the column address.
  - RAS/CAS are edge-sensitive.
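A minimal sketch (hypothetical names, not from the slides) of how a memory controller could split the 18-bit word address of this 256K x 8 part (512 rows x 512 cols) into the row and column halves that are multiplexed over the 9 address pins A:

    #include <stdint.h>

    /* Hypothetical illustration: split a 256K (2^18) word address into the
     * 9-bit row and 9-bit column that share the DRAM's address pins A.    */
    #define ROW_BITS 9
    #define COL_BITS 9

    typedef struct {
        uint16_t row;   /* driven onto A when RAS_L falls */
        uint16_t col;   /* driven onto A when CAS_L falls */
    } dram_addr_t;

    dram_addr_t split_address(uint32_t addr)      /* addr in 0 .. 2^18 - 1 */
    {
        dram_addr_t a;
        a.col = addr & ((1u << COL_BITS) - 1);               /* low 9 bits  */
        a.row = (addr >> COL_BITS) & ((1u << ROW_BITS) - 1); /* next 9 bits */
        return a;
    }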

DRAM Write Timing

Every DRAM access begins with:
- The assertion of RAS_L
- 2 ways to write: early or late relative to CAS

[Figure: DRAM write-cycle waveforms for the 256K x 8 DRAM, showing RAS_L, CAS_L, WE_L, OE_L, the multiplexed row and column addresses on A, Data In on D, and the write access time, all within one DRAM write cycle time.]

- Early write cycle: WE_L asserted before CAS_L
- Late write cycle: WE_L asserted after CAS_L

DRAM Read Timing

Every DRAM access begins with:
- The assertion of RAS_L
- 2 ways to read: early or late relative to CAS

[Figure: DRAM read-cycle waveforms for the 256K x 8 DRAM, showing RAS_L, CAS_L, WE_L, OE_L, the multiplexed row and column addresses on A, the read access time, the output-enable delay, and Data Out on D (high-Z when not driven), all within one DRAM read cycle time.]

- Early read cycle: OE_L asserted before CAS_L
- Late read cycle: OE_L asserted after CAS_L

Increasing Bandwidth - Interleaving


Access pattern without interleaving:

[Figure: CPU connected to a single memory. The access for D2 cannot start until D1 is available, so accesses are fully serialized.]

Access pattern with 4-way interleaving:

[Figure: CPU connected to Memory Banks 0-3. Accesses to Bank 0, Bank 1, Bank 2, and Bank 3 are started back-to-back; by the time Bank 3 has been started, Bank 0 can be accessed again.]
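A minimal sketch (assuming low-order interleaving and an illustrative word size, not from the slides) of how consecutive words map onto the four banks, which is what lets sequential accesses overlap:

    #include <stdint.h>

    #define NUM_BANKS  4   /* 4-way interleaving, as in the figure   */
    #define WORD_BYTES 8   /* assumed word size for the illustration */

    /* Consecutive words land in consecutive banks ...                */
    unsigned bank_of(uint32_t byte_addr)
    {
        return (byte_addr / WORD_BYTES) % NUM_BANKS;
    }

    /* ... and this is the word's position within its bank.           */
    uint32_t index_in_bank(uint32_t byte_addr)
    {
        return (byte_addr / WORD_BYTES) / NUM_BANKS;
    }

With this mapping a sequential stream touches banks 0, 1, 2, 3, 0, ... so a new access can be started each cycle even though each individual bank stays busy for several cycles.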

SPARCstation 20s Memory System Overview


[Figure: SPARCstation 20 memory system. The processor module (Mbus module) contains the SuperSPARC processor (register file, instruction cache, data cache) and an external cache, and sits on the 64-bit-wide processor bus (Mbus). A memory controller bridges to the memory bus (SIMM bus), a 128-bit-wide datapath connecting Memory Modules 0-7.]


Fast Memory Systems: DRAM specific

- Multiple column (CAS) accesses within one row access: several names (page mode)
  - 64 Mbit DRAM: cycle time = 100 ns, page-mode access = 20 ns
- New DRAMs?
  - Synchronous DRAM: provide a clock signal to the DRAM; transfers are synchronous to the system clock
  - RAMBUS: reinvent the DRAM interface (Intel will use it)
    - Each chip is a module rather than a slice of memory
    - Short bus between CPU and chips
    - Does its own refresh
    - Variable amount of data returned
    - 1 byte / 2 ns (500 MB/s per chip)


Summary of Memory Technology

DRAM is slow but cheap and dense:
- Good choice for presenting the user with a BIG memory system
- Uses one transistor per cell; must be refreshed

SRAM is fast but expensive and not very dense:
- Good choice for providing the user with FAST access time
- Uses six transistors per cell; holds state as long as power is supplied

GOAL:
- Present the user with large amounts of memory using the cheapest technology
- Provide access at the speed offered by the fastest technology

Next: Caches

Who Cares about Memory Hierarchy?


The CPU-Memory Gap

[Figure: performance (10^6 cycles/sec, log scale from 1 to 1000) plotted against year, 1980-2000. CPU performance climbs steeply while DRAM performance stays nearly flat, so the gap widens every year.]

The CPU-Memory Speed Gap

To illustrate the problem, consider typical delays (measured in ns):
- Clock period: 0.3 ns
- Instructions: 1-4 instructions/clock (4-way superscalar)
- On-chip small, fast SRAM (Level-1 cache): 0.3-0.6 ns (1-2 clocks)
- On-chip large, fast SRAM (Level-2 cache): 4-6 ns (12-18 clocks)
- Off-chip large, fast SRAM (Level-3 cache): 7-14 ns (20-40 clocks)
- Off-chip large, slow DRAM (main memory): 90-120 ns (270-360 clocks)

Question: How often does the computer access memory?
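A rough answer (typical figures assumed, not from the slide): every instruction must be fetched from memory, and roughly a quarter to a third of instructions are loads or stores, so the processor generates more than one memory reference per instruction, and several per clock at 4 instructions/clock. Without caches, each reference would pay the 270-360 clock DRAM latency and the processor would sit idle almost all of the time.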



The Motivation for Caches


[Figure: memory system; the processor talks to a cache, which talks to DRAM.]

Motivation:
- Large memories (DRAM) are slow
- Small memories (SRAM) are fast

Make the average access time shorter by:
- Servicing most accesses from a small, fast memory
- This also reduces the bandwidth required of the large memory


Levels of the Memory Hierarchy


Capacity, access time, cost, and staging transfer unit per level:
- CPU registers: 100s of bytes, <10s of ns; transfer unit: instruction operands, 1-8 bytes (managed by the program/compiler)
- Cache: KBytes, 10-100 ns, ~$.0005/bit; transfer unit: blocks, 8-128 bytes (managed by the cache controller)
- Main memory: MBytes, 100-200 ns, ~$10^-7/byte; transfer unit: pages, 512-8K bytes (managed by the OS)
- Disk: 10-100 GBytes, ms, ~$10^-8 - 10^-10/byte; transfer unit: files, MBytes (managed by the user/operator)
- Tape: infinite capacity, sec-min access, ~$10^-11

Upper levels (toward the CPU) are smaller and faster; lower levels are larger and slower.

The Principle of Locality


[Figure: probability of reference plotted against the address space (0 to 2^n); references cluster in a few small regions.]

The Principle of Locality:
- Programs access a relatively small portion of the address space at any instant of time.
- Example: 90% of the time is spent in 10% of the code.

Two different types of locality:
- Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon.
- Spatial locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon.
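A minimal C sketch (not from the slides) showing both kinds of locality in one loop:

    /* 'sum' and 'i' are touched on every iteration (temporal locality);
     * a[0], a[1], ... sit at adjacent addresses (spatial locality).     */
    double sum_array(const double *a, int n)
    {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }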

Memory Hierarchy: Principles of Operation

At any given time, data is copied between only 2 adjacent levels:
- Upper level (cache): the one closer to the processor
  - Smaller, faster, and uses more expensive technology
- Lower level (memory): the one further away from the processor
  - Bigger, slower, and uses less expensive technology

Block:
- The minimum unit of information that can either be present or not present in the two-level hierarchy

[Figure: Blk X in the upper level (cache) and Blk Y in the lower level (memory), with blocks moving to and from the processor between the two levels.]


Memory Hierarchy: Terminology

Hit: the data appears in some block in the upper level (example: Block X)
- Hit Rate: the fraction of memory accesses found in the upper level
- Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss

Miss: the data must be retrieved from a block in the lower level (Block Y)
- Miss Rate = 1 - Hit Rate
- Miss Penalty = time to replace a block in the upper level + time to deliver the block to the processor

Hit Time << Miss Penalty
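These terms combine into the average memory access time (a standard formula; the numbers below are illustrative, not from the slide):

AMAT = Hit Time + Miss Rate x Miss Penalty

For example, with a 1-cycle hit time, a 5% miss rate, and a 100-cycle miss penalty, AMAT = 1 + 0.05 x 100 = 6 cycles, which is why Hit Time << Miss Penalty only pays off when the miss rate is kept small.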


[Figure: same two-level diagram as the previous slide: Blk X in the upper level (cache), Blk Y in the lower level (memory), with data moving to and from the processor.]
