Professional Documents
Culture Documents
N. B. Dodge 11/13
N. B. Dodge 11/13
N. B. Dodge 11/13
N. B. Dodge 11/13
N. B. Dodge 11/13
Types of Memory
As we have just seen, even in the everyday PC, the use of
sophisticated memory management is common.
This means that there are four kinds of memory in the modern PC
or workstation computer: Registers, cache, DRAM, and the disk
or HDD (or SSM). And this does not count CDs, DVDs, Zip
drives, thumb drives (flash EPROM), or floppy disks!
The challenge to the computer engineer is to mesh the first five
storage media and to make the use of them transparent that is,
invisible to the user, who will appear to have massive amounts of
high-speed, cheap memory available to solve any problem.
Before we discuss how to manage this extremely challenging
engineering problem, we will discuss the types of memory that are
used and learn a little about them.
6
N. B. Dodge 11/13
Registers
D Q
CR
D FF
32-Bit Reg.
Register Block
N. B. Dodge 11/13
N. B. Dodge 11/13
L1 Cache
L1 cache (level 1 cache) is SRAM memory that is very close to
the CPU. For example, it is next to the ALU in most processors.
L1 cache is basically sets of D FFs but many more than in the
CPU register block.
For example, a typical register block might have 16-32 registers of
4 or 8 bytes each for a total of 64-128 bytes of storage. The Intel
Quad-core cache, on the other hand, has 512 Kbytes the
equivalent of 4,000,000 flip-flops.
Access speed of L1 cache is slower, however, due to the complex
arrangement of data buses which is necessary to access specific
bytes in the L1 memory array. It is typically about 1/2-1/3 as fast
as CPU registers in terms of load/store cycle.
9
N. B. Dodge 11/13
L1 Cache (Continued)
D Q
CR
D FF
32-Bit Reg.
L1 Cache
N. B. Dodge 11/13
L2/3 Cache
The level-2 cache, although on the CPU chip, is on the
opposite side of the chip. L2 cache is also SRAM.
L2 (often now called L3 cache) cache is much greater
than L1 cache, since more real estate is devoted to
memory. Both Intel and AMD multi-cores typically
have 8-15 Mbyte cache, shared.
Due to even more elaborate bus arrangements and the
fact that L2/3 cache is not as close to the CPU,
load/store access is > L1 cache, but still << DRAM.
11
N. B. Dodge 11/13
L2 cache physically
located on P-IV chip.
12
N. B. Dodge 11/13
N. B. Dodge 11/13
Parameter
DRAM
Very fast
High; ~16
transistors
per storage cell
High
Speed
Fast
Low; 1
transistor
per cell
Very low
Virtually
none
Very low
Excessive
High
14
Complexity
Power Used
Heat
Generated
Cost
N. B. Dodge 11/13
DRAM Memory
The term DRAM stands for dynamic random-access memory
(pronounced D-ram, not dram). This means that the title
above is actually redundant!
DRAM is electronic memory that is capable of very fast access
(load or store), but is not as fast as cache. One exception is
Rambus memory, a special DRAM memory whose
manufacturer has announced cache-speed products (up to 7.2
GHz!). It is very expensive, however.
The simple construction of DRAM makes it ideal in modern,
workstation-based computing, where most users have their own
computer system (PC, Mac, Sun, etc.).
DRAM consists of a simple charge-storage device (stored charge =
1), with a switch to store/test the charge. Only a single
transistor is required for a DRAM bit cell.
15
N. B. Dodge 11/13
DRAM (Continued)
The term dynamic in DRAM is due to the fact that the
memory is not truly a flip-flop; it is not static. DRAM
remembers a 1 by storing charge on a capacitor.
Capacitors, however, are not perfect storage elements
the charge leaks off after a short time. Thus the DRAM
element is dynamic its memory lifetime is limited
and it must have its memory refreshed periodically.
On the next several slides, we explore the way DRAM is
constructed and the odd way that it must be treated to
be sure that it retains its memory.
16
N. B. Dodge 11/13
CMOS
transistor
Word
line
Capacitor
Ground
N. B. Dodge 11/13
+V (= logic 1)
turns on
transistor
Word
line
Current
CMOS
transistor
Capacitor
charges
Bit line
+V (= logic 1)
turns on
transistor
Word
line
Current
CMOS
transistor
Ground
0V (= logic 0)
Capacitor
discharges
Ground
N. B. Dodge 11/13
+V (= logic 1)
turns on
transistor
Word
line
Current
CMOS
transistor
Capacitor
charged
Bit line
No current flow
+V (= logic 1)
turns on
transistor
CMOS
transistor
Word
line
Ground
logic 0 sensed
Capacitor
has no charge
Ground
To read, or sense the value of the DRAM cell, the word line once again has a
voltage applied to it, which turns on the transistor. If the capacitor is charged,
current flows OUT of the transistor, and this current is sensed and amplified,
showing that a 1 is present. If the capacitor is discharged, no current flows,
so that the sensing element determines that a logic 0 is present.
19
N. B. Dodge 11/13
N. B. Dodge 11/13
Word line
activated
Word
line
Current
CMOS
transistor
+0
Capacitor
discharges
Bit line
Current
Word line
reactivated
CMOS
transistor
0+
Word
line
Capacitor
recharged
Ground
Logic 1 rewritten by
applying +V to bit line
Ground
N. B. Dodge 11/13
Exercise 1
1. Rank these memories by speed: L2 cache, DRAM, L1
cache, registers, and hard disk drives.
2. A DRAM memory chip is accessed and a bit read out.
The bit that is read is a 1. What happens now?
3. That same memory bit is then left alone (i.e., not
accessed by its addressing mechanism for either read
or write) for several milliseconds. What happens
next?
22
N. B. Dodge 11/13
N. B. Dodge 11/13
Rotation
Current flow
N. B. Dodge 11/13
26
Recording head
N. B. Dodge 11/13
27
N. B. Dodge 11/13
N. B. Dodge 11/13
HDD Package
N. B. Dodge 11/13
Where:
Seek time = time for the positioning arm to move the head from its
present track to the track where the load/store data is located.
Rotational time = time for the requested sector to rotate underneath
the read/write head after the head is positioned over the track.
Transfer time = time for data transfer from disk to main memory.
Controller delay = time to set up transfer in the HDD electronic
interface.
30
N. B. Dodge 11/13
N. B. Dodge 11/13
32
N. B. Dodge 11/13
N. B. Dodge 11/13
N. B. Dodge 11/13
L2/3
Cache
DRAM
HDD
Registers
8-64 Kbytes
200 ps
160-2000 Gbytes
~10-20 ms
N. B. Dodge 11/13
N. B. Dodge 11/13
Cache Utilization
Cache designers make use of the principles of temporal and spatial
locality to assure that the most-probably needed instructions and
data are available to the computer in cache (to speed execution).
Special hardware is designed to manage cache content with the
goal of forecasting upcoming instructions and data required by the
processor during program execution and moving it from slower
DRAM into cache.
This hardware has two special goals: (1) examining the currentlyexecuting process and predicting instruction and data need, and
(2) moving the required information from DRAM to cache in a
timely manner to foresee that anticipated need.
37
N. B. Dodge 11/13
38
N. B. Dodge 11/13
N. B. Dodge 11/13
Cache Diagram
CPU Package
CPU
L1
Cache
L2
Cache
DRAM
HDD
Registers
Cache management hardware
includes subsystems to predict
usage and move data or
instructions from DRAM to
cache as appropriate. A cache
miss will initiate DRAM access
for transfer to cache.
40
N. B. Dodge 11/13
Summary
Modern memory management maximizes the speed of
computer processing while keeping system cost
reasonable for the user.
This approach uses a small amount of very fast SRAM
cache memory which are physically near the computer,
a substantial amount of DRAM, which is still very fast,
as the main working memory, and HDD or flash
memory for large program storage.
Effective, (but complex) hardware and software have
been developed to manage this memory hierarchy and
maximize its effectiveness.
41
N. B. Dodge 11/13
Exercise 2
1. Each small area of cache (say, 1K byte) represents a
much larger area (say, 1 Mbyte) in DRAM. If an
instruction, for example, is supposed to reside in a
given Mbyte of DRAM, the corresponding cache
extent is searched. Assume that, according to the
validity indicator, the correct instruction is NOT in
cache. What now?
2. Give simple definitions of the principles of temporal
and spatial locality.
42
N. B. Dodge 11/13
N. B. Dodge 11/13
Memory
In 2008, a completely new kind of circuit element (that
was predicted in 1971), the memristor, was developed.
Memristor circuit elements can retain a state (i.e.,
memory) even when power is off. They could eventually
replace flash EPROM, and perhaps DRAM. Thousand
Gbyte main memories are possible.
Memristors remember multiple states (not just ones and
zeros). Thus a memristor memory might eventually
remember like a human neuron, leading to neuraltype processors in the long term.
45
N. B. Dodge 11/13
Memory (2)
Time frame for new memristor devices:
46
N. B. Dodge 11/13
Memory (3)
Another new memory type is Phase Change Memory
(PCM). PCM stores 0s or 1s as follows:
The material is heated to a high temperature, producing an
amorphous crystalone that has a disorganized structure, with
high conductivity. This is a 1.
To write a 0, the material is heated to a lower temperature,
creating an organized crystal structure with low conductivity.
N. B. Dodge 11/13
Memory (4)
Other new memory types in development include:
Magnetic RAM (MRAM) Uses tunneling resistance that
depends on the relative magnetization of ferromagnetic electrodes
(very scalable, potential for high speed). Much work needed to
reduce geometries below 20 nm.
Resistive RAM (ReRAM) Varies resistance according to applied
voltage. Nonvolatile, low power, high density. Materials research
has improved the outlook, but production cost and reliability are
problems.
N. B. Dodge 11/13
The CPU
Intel and AMD abandoned the GHz race in CPUs
years ago. In the early 2000s, Intel stated a goal of a
10 GHz CPU by 2010which clearly didnt happen.
Multiple CPUs became the performance enhancer.
The standard is now 4- and 6- core CPUs. Intel Xeon
server CPUs are 8-core.
With the new Broadwell architecture at the 14 nm
node, Intel is now said to be planning an 18-core CPU,
with 8- and 10-core performance CPUs as well. Speed
is inching upmaybe to 5 GHz in the late teens.
49
N. B. Dodge 11/13
N. B. Dodge 11/13
Three-D Technology
As feature size gets
smaller (22 nm14 nm),
3-D manufacturing
processes continue to
improve.
Generically referred to as
FINFET due to the 3-d
fin. Intel calls it trigate.
At right, comparison of
transistor sizes in 22 and
14 nm processes.
51
N. B. Dodge 11/13
Multi-Core Advance
Intel has announced a 72-core
CPU, Knights Landing.
Available in early 2015, will
have up to 16GB DRAM, up to
500GB/sec of memory
bandwidth, plus up to 384GB
of DDR4-2400 mainboard
memory. KL will use the 14nm
process. With a promise of 3 Intel Knights
Landing,
teraflops (double precision)
with 72 Pentium
per socket it will almost
cores with 64-bit
certainly be used to build some support.
monster x86 supercomputers.
52
N. B. Dodge 11/13
Stampede
New supercomputer at the University of Texas.
Uses several thousand Dell Zeus servers, each with
dual 8-core Intel Xeon processors.
Each Zeus server uses several Knights Corner chips, a
precursor to Knights Landing. Has 522,080 cores.
Knights Corner also uses modified Pentium-era cores.
Has 270 Tbytes (yes, that TERA bytes) of DRAM.
Has 14 Petabytes of storage memory.
Peak performance = 9,600,000,000,000,000 floating
point operations per second (9.6 petaflops).
Exaflops are on the way!!
53
N. B. Dodge 11/13
In the last few years, the Intel CPU family has gone from
Sandy Bridge (32 nm) Ivy Bridge (22 nm,
FinFET) Haswell (Optimized FinFET, 22 nm)
Broadwell (14 nm). The next generation is termed
Skylake (14 nm, new architecture, optimized FinFET).
Both Broadwell and Skylake will use much less power,
especially in the mobile-computing variants.
* Intel has odd CPU Family names, like Apple OS X updates Leopard, Snow Leopard, Lion.
N. B. Dodge 11/13
Lecture # 21: Memory Management in Modern Computers
N. B. Dodge 11/13
Tick-Tock
Intel is said to follow a Tick-Tock design strategy.
The Tick phase is shrinking the minimum feature size.
Tock is introducing a new microarchitecture. Thus:
56
Tick
Tock
Tick
Tock
Tick
Tock
Skylake (Mid-2015)
Tick
N. B. Dodge 11/13
N. B. Dodge 11/13
N. B. Dodge 11/13
Windows (Continued)
Rumors have stated that Windows 10 may be free.
Probably not! However, W10 may be offered at a steep
discount to Windows XP users to get them to ditch
their 13-year-old operating system. Maybe even a
similar carrot to Vista users.
Other rumors (more likely) state that W10 is the last
Windows update.
An OS that can be more easily updated (a la Apple)
may be next. It will be interesting to see Microsoft OS,
TNG (the next generation) and its features.
59
N. B. Dodge 11/13
Samsung 105 4K TV
N. B. Dodge 11/13
3-D Printing
N. B. Dodge 11/13