CPS104 Lec24.1
GK Spring 2004
Admin.
Random Access:
  - "Random" is good: access time is the same for all locations
  - DRAM: Dynamic Random Access Memory
      - High density, low power, cheap, slow
      - Dynamic: needs to be refreshed regularly
      - Used for main memory
  - SRAM: Static Random Access Memory
      - Low density, high power, expensive, fast
      - Static: content will last "forever" (until power is lost)
      - Used for caches
"Not-so-random" Access Technology:
  - Access time varies from location to location and from time to time
  - Examples: disk, CD-ROM
Sequential Access Technology:
  - Access time is linear in location (e.g., tape)
Why do computer professionals need to know about RAM technology?
  - Processor performance is usually limited by memory latency and bandwidth.
  - Latency: the time it takes to access a word in memory.
  - Bandwidth: the average speed of access to memory (words/sec).
  - As IC densities increase, lots of memory will fit on the processor chip:
      - Instruction cache
      - Data cache
      - Write buffer
What makes RAM different from a bunch of flip-flops?
  - Density: RAM is much denser.
  - Speed: RAM access is slower than flip-flop (register) access.
Technology Trends
            Capacity          Speed
Logic:      2x in 3 years     2x in 3 years
DRAM:       4x in 3 years     1.4x in 10 years
Disk:       4x in 3 years     1.4x in 10 years
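To compare these trends on a common footing, each "Nx in Y years" figure can be converted to a compound per-year growth factor. The sketch below assumes smooth compound growth; the helper name `annual_factor` is made up for illustration.

```python
def annual_factor(total_factor, years):
    """Per-year growth factor that compounds to total_factor over `years` years."""
    return total_factor ** (1.0 / years)

# Figures taken from the trends table; compound growth is an assumption.
trends = {
    "DRAM capacity (4x in 3 years)": annual_factor(4, 3),
    "Logic speed (2x in 3 years)": annual_factor(2, 3),
    "DRAM speed (1.4x in 10 years)": annual_factor(1.4, 10),
}

for name, factor in trends.items():
    print(f"{name}: about {100 * (factor - 1):.1f}% per year")
```

Under that assumption, DRAM capacity grows by well over half each year while DRAM speed improves by only a few percent per year, which is the gap the rest of the lecture is concerned with.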
[Figure: SRAM cell connected to a pair of complementary bit lines, written here as bit and bit_bar]
Write:
  1. Drive bit lines (bit = 1, bit_bar = 0)
  2. Select row
Read:
  1. Precharge bit and bit_bar to Vdd (set to 1)
  2. Select row
  3. Cell pulls one line low (pulls to 0)
  4. Sense amp on column detects the difference between bit and bit_bar
[Figure: SRAM organization. An address decoder driven by A0-A3 selects one word line (Word 1 through Word 15 are labeled); each of four columns of SRAM cells feeds a sense amp that drives Dout 3, Dout 2, Dout 1, and Dout 0; WrEn is the write enable input.]
[Figure: SRAM logic symbol with inputs WE_L and OE_L]
Write Enable is usually active low (WE_L).
Din and Dout are combined to save pins:
  - A new control signal, output enable (OE_L), is needed
  - WE_L is asserted (Low), OE_L is deasserted (High):
      - D serves as the data input pin
  - WE_L is deasserted (High), OE_L is asserted (Low):
      - D is the data output pin
  - Both WE_L and OE_L are asserted:
      - Result is unknown. Don't do that!!!
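The role of the shared D pin under the two active-low controls can be written as a small truth table. This is only an illustrative sketch: the function name `d_pin_role` is made up, and treating the "neither asserted" case as high impedance is an assumption (the slide does not spell it out).

```python
def d_pin_role(we_l, oe_l):
    """Role of the shared D pin for active-low WE_L and OE_L (0 = asserted)."""
    writing = (we_l == 0)
    reading = (oe_l == 0)
    if writing and reading:
        return "unknown"   # both asserted: the slide says not to do this
    if writing:
        return "input"     # D carries data into the SRAM
    if reading:
        return "output"    # D carries data out of the SRAM
    return "high-z"        # assumption: with neither asserted, D is not driven

for we_l in (0, 1):
    for oe_l in (0, 1):
        print(f"WE_L={we_l} OE_L={oe_l} -> D is {d_pin_role(we_l, oe_l)}")
```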
[Figure: SRAM timing diagrams for signals D, A, OE_L, and WE_L]
Write Timing: the write setup time and write hold time are marked relative to WE_L.
Read Timing: for each read address, D goes from High Z to junk and then to valid Data Out.
[Figure: DRAM cell with a row select line and a single bit line]
Write:
  1. Drive bit line
  2. Select row
Read:
  1. Precharge bit line to Vdd (1)
  2. Select row
  3. Cell and bit line share charges
      - Very small voltage changes on the bit line
  4. Sense (sense amp)
      - Can detect changes of ~1 million electrons
  5. Write: restore the value
Refresh:
  1. Just do a dummy read to every cell.
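Why the voltage change in the charge-sharing step is so small follows from charge conservation between the cell capacitor and the much larger bit-line capacitance. The sketch below is illustrative only: the supply voltage and both capacitance values are assumed numbers, not figures from the lecture.

```python
def shared_voltage(v_cell, v_bitline, c_cell, c_bitline):
    """Voltage after the cell and the bit line share charge (charge conservation)."""
    return (c_cell * v_cell + c_bitline * v_bitline) / (c_cell + c_bitline)

VDD = 2.5            # volts (assumed)
C_CELL = 30e-15      # farads (assumed cell capacitance)
C_BITLINE = 300e-15  # farads (assumed bit-line capacitance, 10x the cell)

# Bit line precharged to Vdd, as in the read steps.
stored_one = shared_voltage(VDD, VDD, C_CELL, C_BITLINE)
stored_zero = shared_voltage(0.0, VDD, C_CELL, C_BITLINE)

print(f"cell held 1: bit line at {stored_one:.3f} V")
print(f"cell held 0: bit line at {stored_zero:.3f} V")
print(f"swing the sense amp must detect: {stored_one - stored_zero:.3f} V")
```

With these assumed values the bit line moves by only a fraction of a volt, and the read also disturbs the charge in the cell, which is why the value has to be restored afterwards.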
Introduction to DRAM
[Figure: DRAM cell array of sqrt(N) x sqrt(N) cells with a row decoder; the address is log2 N bits wide]
Dynamic RAM (DRAM):
  - Refresh required
  - Very high density
  - Low power (.1 - .5 W active, .25 - 10 mW standby)
  - Low cost per bit
  - Pin sensitive (few pins):
      - Output Enable (OE_L)
      - Write Enable (WE_L)
      - Row address strobe (ras)
      - Col address strobe (cas)
[Figure: DRAM array organization; labels: row address, column address, data]
Typical DRAMs: access multiple bits in parallel
  - Example: 2 Mb DRAM = 256K x 8 = 512 rows x 512 cols x 8 bits
  - Row and column addresses are applied to all 8 planes in parallel
[Figure: eight 256 Kb DRAM planes, Plane 0 through Plane 7, each supplying one data bit D<0> through D<7>]
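For the 256K x 8 example, a word address needs 18 bits: 9 for one of the 512 rows and 9 for one of the 512 columns. A minimal sketch of that split follows; placing the row in the upper bits is an assumption made here for illustration, and `split_address` is a made-up name.

```python
ROW_BITS = 9   # 512 rows
COL_BITS = 9   # 512 columns

def split_address(addr):
    """Split an 18-bit word address into (row, column).

    Assumes the row is held in the upper 9 bits; the slide does not say
    which half of the address selects the row.
    """
    if not 0 <= addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address out of range for a 256K-word DRAM")
    row = addr >> COL_BITS
    col = addr & ((1 << COL_BITS) - 1)
    return row, col

print(split_address(0))
print(split_address(512))
print(split_address(256 * 1024 - 1))
```

The same (row, column) pair is applied to all 8 planes, so one access returns one bit from each plane, i.e. one byte.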
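[Figure: logic symbol of a 256K x 8 DRAM with 9 address pins (A), data pins (D), and control inputs RAS_L, CAS_L, WE_L, OE_L]
Control signals (RAS_L, CAS_L, WE_L, OE_L) are all active low.
Din and Dout are combined (D):
  - WE_L is asserted (Low), OE_L is deasserted (High):
      - D serves as the data input pin
  - WE_L is deasserted (High), OE_L is asserted (Low):
      - D is the data output pin
Row and column addresses share the same pins (A):
  - RAS_L goes low: pins A are latched in as the row address
  - CAS_L goes low: pins A are latched in as the column address
  - RAS/CAS are edge-sensitive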
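The address multiplexing can be pictured as two latches fed by the same pins, one loaded on each falling strobe edge. The class below is a toy behavioral model with made-up names, not a description of any real part.

```python
class MultiplexedAddress:
    """Toy model: the same address pins are latched as row, then as column."""

    def __init__(self):
        self.row = None
        self.col = None

    def ras_falls(self, pins):
        """Falling edge of RAS_L: latch the pins as the row address."""
        self.row = pins
        self.col = None   # a new access starts with RAS_L

    def cas_falls(self, pins):
        """Falling edge of CAS_L: latch the pins as the column address."""
        if self.row is None:
            raise RuntimeError("CAS_L asserted before RAS_L")
        self.col = pins
        return self.row, self.col

dram = MultiplexedAddress()
dram.ras_falls(0x1A3)          # row address on pins A
print(dram.cas_falls(0x045))   # column address on the same pins
```

Sharing the pins halves the address pin count (9 instead of 18 here) at the cost of presenting the address in two steps.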
Every DRAM access begins at:
  - The assertion of RAS_L
  - 2 ways to write: early or late relative to CAS
[Figure: write timing for the 256K x 8 DRAM, showing RAS_L, CAS_L, A, OE_L, WE_L, and D over two DRAM WR cycle times. In each cycle A carries the row address, then the column address, then junk; D carries Data In around the write and junk otherwise; the WR access time is marked in both cycles.]
Early Wr Cycle: WE_L asserted before CAS_L
Late Wr Cycle: WE_L asserted after CAS_L
Every DRAM access begins at:
  - The assertion of RAS_L
  - 2 ways to read: early or late relative to CAS
[Figure: read timing for the 256K x 8 DRAM, showing RAS_L, CAS_L, A, WE_L, OE_L, and D over two DRAM read cycle times. In each cycle A carries the row address, then the column address, then junk; D goes from High Z to junk to Data Out; the read access time and the output enable delay are marked.]
Early Read Cycle: OE_L asserted before CAS_L
Late Read Cycle: OE_L asserted after CAS_L
[Figure: successive accesses to memory banks 0 through 3 (Access Bank 0, Access Bank 1, Access Bank 2, Access Bank 3), after which we can access Bank 0 again]
[Figure: processor module (Mbus module) containing a SuperSPARC processor, with its register file, instruction cache, and data cache, and an external cache]
Multiple CAS accesses within one row (a single RAS): several names (page mode)
  - 64 Mbit DRAM: cycle time = 100 ns, page mode = 20 ns
New DRAMs?
  - Synchronous DRAM: provide a clock signal to the DRAM; transfers are synchronous to the system clock
  - RAMBUS: reinvent the DRAM interface (Intel will use it)
      - Each chip is a module rather than a slice of memory
      - Short bus between CPU and chips
      - Does its own refresh
      - Variable amount of data returned
      - 1 byte / 2 ns (500 MB/s per chip)
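The benefit of page mode shows up when several words are read from the same row. The sketch below uses the 100 ns and 20 ns figures from the slide and assumes, as a simplification, that the first access costs a full cycle and each later access in the same row costs one page-mode time; `burst_time_ns` is a made-up helper.

```python
CYCLE_NS = 100      # full DRAM cycle time (from the slide)
PAGE_MODE_NS = 20   # page-mode access time (from the slide)

def burst_time_ns(n_words, page_mode=True):
    """Time to read n_words from one row.

    Assumption: the first access pays a full cycle; with page mode, each
    further access to the same row pays only the page-mode time.
    """
    if n_words <= 0:
        return 0
    if not page_mode:
        return n_words * CYCLE_NS
    return CYCLE_NS + (n_words - 1) * PAGE_MODE_NS

for n in (1, 4, 8):
    print(n, burst_time_ns(n, page_mode=False), burst_time_ns(n))
```

Under these assumptions the advantage grows with the length of the run, which is why page mode pairs well with block transfers into a cache.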
DRAM is slow but cheap and dense:
  - Good choice for presenting the user with a BIG memory system
  - Uses one transistor per bit; must be refreshed.
SRAM is fast but expensive and not very dense:
  - Good choice for providing the user FAST access time.
  - Uses six transistors per bit; holds state as long as power is supplied.
GOAL:
  - Present the user with large amounts of memory using the cheapest technology.
  - Provide access at the speed offered by the fastest technology.
Next: Caches
[Figure: performance on a log scale (1, 10, 100) versus year, 1980 to 2000, with a curve labeled DRAM]
Clock period: 0.3 ns
  - Instructions: 1-4 instructions/clock (4-way super-scalar)
  - On-chip small-fast SRAM (Level-1 cache): 0.3-0.6 ns (1-2 clocks)
  - On-chip large-fast SRAM (Level-2 cache): 4-6 ns (12-18 clocks)
  - Off-chip large-fast SRAM (Level-3 cache): 7-14 ns (20-40 clocks)
  - Off-chip large-slow DRAM (main memory): 90-120 ns (270-360 clocks)
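The clock counts in this list are just the latencies divided by the clock period. The sketch below assumes a 3 GHz clock (a period of 1/3 ns, which rounds to the 0.3 ns quoted); that exact frequency is an assumption chosen because it reproduces most of the clock counts listed.

```python
CLOCK_GHZ = 3.0   # assumed: period of 1/3 ns, rounded to 0.3 ns on the slide

def ns_to_clocks(latency_ns, clock_ghz=CLOCK_GHZ):
    """Convert a latency in nanoseconds to processor clock cycles."""
    return latency_ns * clock_ghz

# Latency ranges in ns, taken from the list above.
levels = {
    "Level-2 cache": (4, 6),
    "Level-3 cache": (7, 14),
    "Main memory (DRAM)": (90, 120),
}

for name, (low_ns, high_ns) in levels.items():
    print(f"{name}: {ns_to_clocks(low_ns):.0f}-{ns_to_clocks(high_ns):.0f} clocks")
```

A main-memory access therefore costs a few hundred clocks, during which a 4-way superscalar processor could otherwise have issued on the order of a thousand instructions.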
[Figure: Processor, Cache, DRAM connected in a chain]
Motivation:
  - Large memories (DRAM) are slow
  - Small memories (SRAM) are fast
Make the average access time shorter by:
  - Servicing most accesses from a small, fast memory.
Reduce the bandwidth required of the large memory.
Upper Level (faster)

Level        Staging transfer unit   Managed by          Transfer size
Registers
             Instr. operands         prog./compiler      1-8 bytes
Cache
             Blocks                  cache controller    8-128 bytes
Memory
             Pages                   OS                  512-8K bytes
Disk
             Files                   user/operator       Mbytes
Tape

Lower Level (larger)
[Figure: references plotted across the address space, from 0 to 2^n]
The Principle of Locality:
  - Programs access a relatively small portion of the address space at any instant of time.
  - Example: 90% of time in 10% of the code
Two Different Types of Locality:
  - Temporal Locality (Locality in Time): if an item is referenced, it will tend to be referenced again soon.
  - Spatial Locality (Locality in Space): if an item is referenced, items whose addresses are close by tend to be referenced soon.
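The effect of locality can be seen by counting how many accesses could be served from a small, fast memory that holds a few recently used blocks. The simulator below is a deliberately simple sketch: the block size, the number of blocks, and the least-recently-used replacement policy are all assumptions made for illustration.

```python
from collections import OrderedDict

def hit_rate(addresses, block_words=8, num_blocks=4):
    """Fraction of accesses found in a tiny fast memory of recently used blocks.

    Assumptions: any block may occupy any slot, and the least recently used
    block is evicted when the fast memory is full.
    """
    resident = OrderedDict()   # block number -> None, oldest first
    hits = 0
    for addr in addresses:
        block = addr // block_words
        if block in resident:
            hits += 1
            resident.move_to_end(block)        # mark as most recently used
        else:
            resident[block] = None             # bring the whole block in
            if len(resident) > num_blocks:
                resident.popitem(last=False)   # evict the least recently used
    return hits / len(addresses)

sequential = list(range(64))               # spatial locality: neighbors follow
repeated = [0, 1, 2, 3] * 16               # temporal locality: same items again
scattered = [i * 64 for i in range(64)]    # neither kind of locality

for name, trace in [("sequential", sequential),
                    ("repeated", repeated),
                    ("scattered", scattered)]:
    print(f"{name}: hit rate {hit_rate(trace):.3f}")
```

The traces with locality are served almost entirely from the fast memory, while the scattered trace never hits, even though all three make the same number of accesses.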
At any given time, data is copied between only 2 adjacent levels:
  - Upper Level (Cache): the one closer to the processor
  - Lower Level (Memory): the one further away from the processor
Block:
  - The minimum unit of information that can either be present or not present in the two-level hierarchy
[Figure: the upper level (cache) holds Blk X and exchanges data "To Processor" and "From Processor"; Blk Y resides in the lower level]
Hit: data appears in some block in the upper level (example: Block X)
  - Hit Rate: the fraction of memory accesses found in the upper level
  - Hit Time: time to access the upper level, which consists of
      RAM access time + time to determine hit/miss
Miss: data needs to be retrieved from a block in the lower level (Block Y)
  - Miss Rate = 1 - (Hit Rate)
  - Miss Penalty = time to replace a block in the upper level +
      time to deliver the block to the processor
[Figure: same two-level diagram, with Blk Y brought from the lower level into the upper level]
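These three quantities are commonly combined into an average access time: every access pays the hit time, and the fraction that misses pays the miss penalty on top. The formula is the standard one rather than something stated on the slide, and the numbers below are assumed values for illustration.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average access time = hit time + miss rate * miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty

HIT_TIME_NS = 1.0        # assumed time to access the upper level
MISS_PENALTY_NS = 100.0  # assumed time to fetch a block from the lower level

for hit_rate in (0.90, 0.95, 0.99):
    miss_rate = 1.0 - hit_rate   # Miss Rate = 1 - (Hit Rate)
    t = average_access_time(HIT_TIME_NS, miss_rate, MISS_PENALTY_NS)
    print(f"hit rate {hit_rate:.2f}: average access time {t:.1f} ns")
```

Because the miss penalty is so much larger than the hit time, small improvements in hit rate have a large effect on the average.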