Levels of Program Code
• High-level language
– Level of abstraction closer to the problem domain
– Provides for productivity and portability
• Assembly language
– Textual representation of instructions
• Hardware representation
– Binary digits (bits)
– Encoded instructions and data
Components of a Computer
The BIG Picture
• Same components for all kinds of computer
– Desktop, server, embedded
• Input/output includes
– User-interface devices
• Display, keyboard, mouse
– Storage devices
• Hard disk, CD/DVD, flash
– Network adapters
• For communicating with other computers
The Von Neumann Computer
The Computer
• Stored-program concept – storing programs in memory as numbers – proposed by
John von Neumann; Eckert and Mauchly worked on engineering the concept.
Defining performance as the inverse of execution time:
Performance_X / Performance_Y = Execution time_Y / Execution time_X = n
("X is n times faster than Y")
• CPU time
Time spent processing a given job
• Discounts I/O time, other jobs’ shares
Different programs are affected differently by CPU and
system performance
CPU Clocking
Operation of digital hardware is governed by a constant-rate clock.
[Figure: clock waveform showing the clock period (one clock cycle); data transfer
and computation occur within the cycle, and state is updated at the clock edge.]
Performance improved by
Reducing number of clock cycles (good algorithm or hardware design)
Increasing clock rate (good technology)
Hardware designer must often trade off clock rate against cycle count
Performance Summary
Performance depends on
Algorithm: affects IC, possibly CPI
Programming language: affects IC, CPI
Compiler: affects IC, CPI
Instruction set architecture: affects IC, CPI, Tc
CPU Time = (Instructions / Program) x (Clock cycles / Instruction) x (Seconds / Clock cycle)
Earlier measures:
MIPS – Millions of Instructions Per Second
MFLOPS – Millions of FLoating-point Operations Per Second
CPI – Cycles Per Instruction
IPC – Instructions Per Cycle = 1/CPI
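The CPU-time equation above can be checked numerically. A minimal sketch with made-up numbers (the function name and workload are ours, for illustration only):

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU Time = Instructions x (Cycles / Instruction) x (Seconds / Cycle)."""
    return instruction_count * cpi / clock_rate_hz

# Hypothetical workload: 10 billion instructions, CPI = 2.0, 4 GHz clock.
t = cpu_time(10e9, 2.0, 4e9)
print(t)      # 5.0 seconds
mips = 10e9 / (t * 1e6)
print(mips)   # 2000.0 (millions of instructions per second)
```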
• Computer
A device that accepts input, processes data, stores data, and produces output,
all according to a series of stored instructions.
• Software
A computer program that tells the computer how to perform particular tasks.
• Hardware
Includes the electronic and mechanical devices that process the data; refers to
the computer as well as peripheral devices.
• Network
Two or more computers and other devices that are connected, for the purpose of
sharing data and programs.
• Peripheral devices
Used to expand the computer’s input, output and storage capabilities.
Basic Terminology
• Input
• Whatever is put into a computer system.
• Data
• Refers to the symbols that represent facts, objects, or ideas.
• Information
• The results of the computer storing data as bits and bytes; the words, numbers,
sounds, and graphics.
• Output
• Consists of the processing results produced by a computer.
• Processing
• Manipulation of the data in many ways.
• Memory
• Area of the computer that temporarily holds data waiting to be processed,
stored, or output.
• Storage
• Area of the computer that holds data on a permanent basis when it is not
immediately needed for processing.
…Key Terminology
• Control path
• Datapath
• ALU, FPU, GPU etc.
• CPU design
• Hardware description language
• Pipelining
• Cache
• Von Neumann architecture
• Multi-core (computing)
• Superscalar
• Vector processor
• Instruction set
• Out-of-order execution
• Microcontroller
• Register renaming
• Dataflow architecture
• Stream processing
• Multi-threading
• RISC, CISC
• Instruction-level parallelism (ILP)
• Addressing modes
Multiprocessors
• Multicore microprocessors
More than one processor per chip
NOTEBOOK COMPUTER
• Compact form of personal computer (laptop)
• Advantage is portability
WORK STATIONS
• More computational power than PC
• Costlier
• Used to solve complex problems that arise in engineering applications
(graphics, CAD/CAM, etc.)
ENTERPRISE SYSTEM (MAINFRAME)
• More computational power
• Larger storage capacity
• Used for business data processing in large organization
• Commonly referred to as servers or supercomputers
SERVER SYSTEM
• Supports large volumes of data that frequently need to be accessed or
modified
• Supports request–response operation
SUPER COMPUTERS
• Faster than mainframes
• Perform large-scale numerical and algorithmic calculations
• Used for aircraft design and testing, military applications and weather
forecasting
Computing Systems
Computers have two kinds of components:
• Hardware, consisting of its physical devices
(CPU, memory, bus, storage devices, ...)
• Software, consisting of the programs it has
(Operating system, applications, utilities, ...)
The Big Picture
[Figure: the five classic components – Input and Output units, Memory, and a
Processor consisting of Control and ALU.]
FUNCTIONAL UNITS OF COMPUTER
• Input Unit
• Output Unit
• Memory
• Bus Structure
Structure - Top Level
[Figure: top-level structure – the Computer comprises the Central Processing
Unit, Main Memory, Input/Output, and the Systems Interconnection between them;
Peripherals and Communication lines attach from outside.]
Structure - The CPU
[Figure: the CPU comprises Registers, the Arithmetic and Logic Unit, the
Control Unit, and an Internal CPU Interconnection; the CPU connects to the rest
of the computer over the System Bus.]
Structure - The Control Unit
[Figure: the Control Unit comprises Sequencing Logic, Control Unit Registers
and Decoders, and Control Memory; it connects to the ALU and registers over the
CPU’s internal bus.]
Function
ALL computer functions are:
– Data PROCESSING
– Data STORAGE
– Data MOVEMENT (Data = Information)
– CONTROL (coordinates how information is used)
• NOTHING ELSE!
INPUT UNIT:
• Converts external-world data to a binary format that the CPU can understand
OUTPUT UNIT:
• Converts binary-format data to a format that people can understand
CPU
• The “brain” of the machine
• The control unit (CU) provides control signals with the appropriate timing, which in turn control the execution process
[Figure: timing example – a control signal at time step T2 enables register R2;
later steps such as T4 follow.]
Each basic step is executed in one clock cycle.
MEMORY
• Two types: RAM (read/write memory) and ROM (read-only memory)
• ROM is used to store data and programs that are not going to change.
• Memory Hierarchy
• Main Memory
• Auxiliary Memory
• Associative Memory
• Cache Memory
• Virtual Memory
Auxiliary memory
[Figure: auxiliary memory (magnetic tapes and magnetic disks) connects through
an I/O processor to main memory; the CPU communicates with main memory directly
and through the cache memory.]
MEMORY HIERARCHY
[Figure: memory hierarchy pyramid – Register, Cache, Main Memory, Magnetic
Disk, Magnetic Tape; speed increases toward the top, size increases toward the
bottom.]
The memory unit that directly communicates with the CPU is called the main memory.
The memory hierarchy system consists of all storage devices employed in a computer system.
Memory Hierarchy
• CPU logic is usually faster than the main-memory access time
(processing speed is limited primarily by the speed of main memory)
• The typical access-time ratio between cache and main memory
is about 1 to 7–10
[Figure: six-transistor static RAM cell – cross-coupled inverters (M1–M4) hold
Q and !Q; access transistors M5 and M6, controlled by the word line WL, connect
the cell to the bit lines BL and !BL.]
Dynamic RAM (DRAM)
Each cell stores a bit with a capacitor and a transistor.
Small cells (1 to 3 FETs/cell) – so more bits/chip
Periodic refresh required
Sensitive to disturbances
Slower – so used for main memories
Not typically compatible with standard CMOS logic processes
Data stored as charge in a capacitor can be retained only for a limited time,
due to the leakage current which eventually removes or modifies the charge.
Assuming that the voltage on the fully charged storage capacitor is V = 2.5 V,
and that the leakage current is I = 40 pA, the time to discharge the
capacitor C = 20 fF to half of the initial voltage can be estimated as
t = C·ΔV / I = (20 fF × 1.25 V) / (40 pA) ≈ 625 μs
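The retention estimate t = C·ΔV / I is easy to reproduce; the values are the ones on the slide, the helper name is ours:

```python
def discharge_time(c_farads, delta_v, i_leak_amps):
    """Time for leakage current I to remove the charge C * dV from the cell capacitor."""
    return c_farads * delta_v / i_leak_amps

# Slide values: C = 20 fF, discharge from 2.5 V down to 1.25 V, I = 40 pA.
t = discharge_time(20e-15, 1.25, 40e-12)
print(t)  # about 0.000625 s (625 microseconds) -> hence the periodic refresh
```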
ADVANTAGES OF DRAM:
• The size of a DRAM cell is on the order of 8F²,
where F is the smallest feature size in a given technology.
For F = 0.2 μm the cell size is 8 × (0.2 μm)² = 0.32 μm²
• No static power is dissipated for storing charge in a capacitance
• SDRAM : Synchronous DRAM
• SDR SDRAM : Single Data Rate SDRAM
• DDR SDRAM : Double Data Rate SDRAM
• DDR2 SDRAM : an evolution over DDR SDRAM
• DDR3 SDRAM : improvement over DDR2
• RLDRAM : Reduced‐latency DRAM
[Table: comparison of SRAM and DRAM by transistors per bit, access time,
persistence, sensitivity to disturbance, cost, and typical applications.]
Memory Organized as 4 x 16 and 1 x 64 Arrays
[Figure: the same 64 bits arranged as a 4-column × 16-row array and as a
1-column × 64-row array; row and column addresses select an individual cell.]
Block Diagram of a Read-write Memory
[Figure: an address decoder driven by the address bus selects a row of the
memory array; the data bus carries data in and out under the control of the
Read and Write signals. Companion figures show the read and write operations
for an 8 × 8 memory.]
16K x 8 Static RAM
[Figure: a 16K × 8 RAM with address lines A0–A13, data input/output lines, and
control inputs CS (chip select), WE (write enable) and OE (output enable);
related figures show a tri-state buffer, a D latch and a D flip-flop.]
Horizontal expansion
Two 8 × 2 chips placed side by side form an 8 × 4 memory: for every address
A2A1A0 (000 through 111), chip #1 supplies data bits D3D2 and chip #2 supplies
data bits D1D0.
Memory Subsystems Organization
Two or more memory chips can be combined to create more locations
(two 8 × 2 chips can create a 16 × 2 memory).
Address lines: 4; bits per address: 2.
Vertical expansion
The high-order address bit A3 acts as the chip select: addresses A3A2A1A0 =
0000–0111 (0–7, A3 = 0) fall in the first chip, and addresses 1000–1111
(8–15, A3 = 1) fall in the second chip.
MEMORY ADDRESS MAP
Pictorial representation of the assigned address space for each chip in the system
- The low-order lines in the address bus select the byte within a chip, and the
remaining lines select a particular chip through its chip-select inputs
RAM: 512 bytes (512 × 8) built from 128 × 8 chips
Number of chips required: (512 × 8) / (128 × 8) = 4 chips
[Figure: the high-order lines of the 10-line address bus feed a decoder whose
outputs drive the chip-select (CS1) inputs of the four 128 × 8 RAM chips; the
low-order lines (AD7) select the byte within a chip; RD, WR and the data bus
are common to all chips.]

Component | Hex address range
RAM 1     | 0000 – 007F
RAM 2     | 0080 – 00FF
RAM 3     | 0100 – 017F
RAM 4     | 0180 – 01FF
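The four-chip map is just an address-bit split: the high-order bits select the chip through the decoder, and the low seven bits select the byte inside a 128-byte chip. A small sketch of that decode (the function name is ours; addresses are the 9 significant bits of the 512-byte space):

```python
def decode(addr):
    """Split a 9-bit address (0x000-0x1FF) into (chip number, byte offset)."""
    assert 0 <= addr <= 0x1FF
    chip = addr >> 7          # high-order bits feed the decoder / chip selects
    offset = addr & 0x7F      # low 7 bits select the byte within a 128-byte chip
    return chip, offset

print(decode(0x000))  # (0, 0)    -> RAM 1, first byte
print(decode(0x07F))  # (0, 127)  -> RAM 1, last byte
print(decode(0x080))  # (1, 0)    -> RAM 2, first byte
print(decode(0x1FF))  # (3, 127)  -> RAM 4, last byte
```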
Tradeoff between size, speed and cost, exploiting the principle of locality.
1. Register
Fastest memory element, but small storage; very expensive
2. Cache
Fast and small compared to main memory; acts as a buffer between the CPU
and main memory: it contains the most recently used memory locations
(both addresses and contents are recorded here)
3. Main memory is the RAM of the system
4. Disk storage - HDD
Memory Hierarchy Design
[Table: comparison between different types of memory – Register, Cache, Main
Memory, HDD.]
Temporal Locality
Information used recently is likely to be used again in the near future.
Spatial Locality
If a word is accessed, adjacent (nearby) words are likely to be accessed soon.
(e.g. related data items are usually stored together; instructions are executed sequentially)
Cache
- The property of locality of reference is what makes cache memory systems work
- Cache is a fast, small-capacity memory that should hold the information
most likely to be accessed
[Figure: the CPU accesses the cache memory, which is backed by main memory.]
Cache Memory
• High speed (towards CPU speed)
• Small size (power & cost)
[Figure: the CPU first accesses the fast cache; a hit is served by the cache,
while a miss is forwarded to the slow main memory.]
Memory Access
All memory accesses are directed first to the cache.
If the word is in the cache, access the cache to provide it to the CPU.
If the word is not in the cache, bring a block (or a line) including that word
into the cache, replacing a block that is there now.
Q. If a new block is to replace one of the old blocks, which one should we choose?
Replacement algorithm
Hit / miss
Write-through / write-back
Load-through
Cache
• Every address reference goes first to the cache;
if the desired data is not in the cache, we have a cache miss;
if the desired data is in the cache, we have a cache hit.
Average access time: Te = h·Tc + (1 − h)·Tm
(h = hit ratio, Tc = cache access time, Tm = main-memory access time)
[Figure: associative memory (CAM) – an n-bit argument register (A) and key
register (K) feed an m-word × n-bit array with match logic; an m-bit match
register (M) flags the matching words.]
- Compare each word in the CAM in parallel with the content of A (argument register)
- If CAM word[i] = A, then M(i) = 1
- Read by sequentially accessing the CAM for the words with M(i) = 1
- K (key register) provides a mask for choosing a particular field or key in the argument A
(only those bits of the argument that have 1’s in the corresponding positions of K are compared)
[Figure: one cell of the associative memory – an R-S flip-flop Fij with write
and read inputs and match logic driving Mi.]
MATCH LOGIC
[Figure: match logic for word i – each stored bit is compared with argument bit
Aj under key bit Kj, and the per-bit results combine to produce Mi.]
Back on Cache Memory
[Figure: a cache of 2^20 locations (addresses 00000–FFFFF hex) holding words
from a main memory of 2^30 locations (addresses 00000000–3FFFFFFF hex).]
Associative mapping
Direct mapping
Set-associative mapping
Associative mapping
Any block of Main memory can potentially reside in any cache block position.
This is the most flexible mapping method.
Direct mapping
A particular block of main memory can be brought to a particular block of cache memory.
So, it is not flexible.
Set-associative mapping
Blocks of cache are grouped into sets, and the mapping allows a block of main memory to
reside in any block of a specific set.
Associative Mapping
- Any block location in the cache can store any block of memory -> most flexible
- The mapping table is implemented in an associative memory -> fast, very expensive
- The mapping table stores both the address and the content of the memory word
[Figure: CAM holding (address, data) pairs searched through the argument
register – e.g. address 01000 -> data 3450, 02777 -> 6710, 22235 -> 1234
(octal addresses).]
Associative Mapping
[Figure: associative lookup – the address key (e.g. 68212) is compared in
parallel with every stored address; each entry pairs a 15-bit key with 12 bits
of data (entries shown include 15830 and 08993, with data words 0005, 01A6 and
47CC). How many comparators? One per word, since all entries are searched in
parallel.]
Associative Mapping
[Figure: associative mapping – any word of main memory (addresses
00000000–3FFFFFFF) can occupy any cache location (00000–FFFFF); each cache
entry stores the full memory address as key together with the data. Memory
words 00012000, 08000000 and 15000000 are shown cached at arbitrary locations.]
Direct Mapping
- Each memory block has only one place to load in the cache
- The mapping table is made of RAM instead of CAM
- An n-bit memory address consists of 2 parts: k bits of Index field and (n − k) bits of Tag field
- n-bit addresses are used to access main memory; the k-bit index is used to access the cache
[Figure: main memory of 32K × 12 (15-bit address, octal 00000–77777) and cache
memory of 512 × 12 (9-bit address, octal 000–777); data words are 12 bits in
both.]
Direct Mapping
Direct Mapping Cache Organization
[Figure: cache index 777 holds tag 02 and data 6710, caching main-memory word
02777; neighbouring memory words 01777 (4560) and 02000 (5670) are also shown,
and 01777 maps to the same index 777 under tag 01.]
Each word in the cache consists of a data word and a memory tag.
When a new word is first brought into the cache, its tag bits are stored alongside the data bits.
Operation
- The CPU generates a memory request with address (TAG; INDEX)
- Access the cache using INDEX (the index field is used as the cache address); read (tag; data)
- Compare the TAG field of the CPU address with the tag of the cache word
- If both match -> Hit: provide Cache[INDEX](data) to the CPU
- If they do not match -> Miss:
• M[tag; INDEX] <- Cache[INDEX](data)
• Cache[INDEX] <- (TAG; M[TAG; INDEX])
• CPU <- Cache[INDEX](data)
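The hit/miss procedure can be replayed as a tiny simulation. This model is ours, not code from the slides: a dict stands in for main memory M, and each cache line holds one (tag, data) pair:

```python
class DirectMappedCache:
    """Each index holds one (tag, data) pair; address = (tag, index)."""
    def __init__(self, num_lines, memory):
        self.lines = [None] * num_lines   # each entry: (tag, data) or None
        self.memory = memory              # backing store: address -> data

    def read(self, addr):
        tag, index = divmod(addr, len(self.lines))
        entry = self.lines[index]
        if entry is not None and entry[0] == tag:
            return entry[1], "hit"
        # Miss: fetch the word from memory, replacing whatever was at this index.
        data = self.memory[addr]
        self.lines[index] = (tag, data)
        return data, "miss"

mem = {addr: addr * 10 for addr in range(32)}   # toy main memory
cache = DirectMappedCache(num_lines=8, memory=mem)
print(cache.read(5))    # (50, 'miss')
print(cache.read(5))    # (50, 'hit')
print(cache.read(13))   # (130, 'miss') -> same index 5, different tag, evicts 5
print(cache.read(5))    # (50, 'miss')  -> conflict miss
```

The last line shows the weakness of direct mapping: two addresses sharing an index keep evicting each other even though the rest of the cache is empty.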
Direct Mapping
[Figure: direct-mapped lookup – a 20-bit address splits into tag and index; the
indexed cache entry (e.g. tag 000, data 01A6 for address 00500) is compared
with the tag of the address: match -> hit, no match -> miss. Address 00900
(tag 080, data 47CC) maps to the same line under a different tag. Each entry
holds a 10-bit tag and 16 bits of data.]
Direct Mapping
Block j of main memory maps onto block (j modulo 128) of the cache.
Ex. address 11101,1111111,1100:
Tag: 11101
Block: 1111111 = 127, i.e. the 127th block of the cache
Word: 1100 = 12, i.e. the 12th word of the 127th block in the cache
[Figure: main-memory blocks 0 … 4095 mapping onto the 128 blocks of the cache.]
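Splitting the address into fields is mechanical: with 128 cache blocks (7 bits) and 16 words per block (4 bits), the remaining high-order bits form the tag. A quick check of the example's numbers (the helper name is ours):

```python
def split_address(addr, block_bits=7, word_bits=4):
    """Split an address into (tag, block, word) fields."""
    word = addr & ((1 << word_bits) - 1)
    block = (addr >> word_bits) & ((1 << block_bits) - 1)
    tag = addr >> (word_bits + block_bits)
    return tag, block, word

addr = 0b1110111111111100            # 11101 | 1111111 | 1100
print(split_address(addr))           # (29, 127, 12): tag 11101, block 127, word 12
```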
Direct Mapping (with block)
[Figure: direct mapping with a block size of 16 – cache entries with tags 000,
080 and 150 hold the blocks beginning at addresses 00500 (data 01A6, 0254, …),
00900 (47CC, A0B4, …) and 01400 (0005, 5C04, …); the 10-bit stored tag is
compared with the tag field of the 20-bit address for match / no match. Each
entry holds a 10-bit tag and 16 bits of data.]
Set Associative Mapping
- Each memory block has a set of locations in the cache where it may load
Operation
- The CPU generates a memory address (TAG; INDEX)
- Access the cache with INDEX; a cache word = ((tag 0, data 0); (tag 1, data 1))
- Compare TAG with tag 0 and then tag 1
- If tag i = TAG -> Hit, CPU <- data i
- If tag i ≠ TAG for all i -> Miss:
Replace either (tag 0, data 0) or (tag 1, data 1).
Assuming (tag 0, data 0) is selected for replacement:
M[tag 0, INDEX] <- Cache[INDEX](data 0)
Cache[INDEX](tag 0, data 0) <- (TAG, M[TAG, INDEX])
CPU <- Cache[INDEX](data 0)
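A sketch of the 2-way operation above (our illustrative model): each set holds two (tag, data) ways, a hit can come from either way, and on a miss an empty way is filled first; otherwise way 0 is replaced, matching the slide's simplifying assumption:

```python
class TwoWaySetAssociativeCache:
    """num_sets sets, each holding two (tag, data) ways."""
    def __init__(self, num_sets, memory):
        self.sets = [[None, None] for _ in range(num_sets)]
        self.memory = memory
        self.num_sets = num_sets

    def read(self, addr):
        tag, index = divmod(addr, self.num_sets)
        ways = self.sets[index]
        for entry in ways:
            if entry is not None and entry[0] == tag:
                return entry[1], "hit"
        data = self.memory[addr]
        if ways[0] is None:
            ways[0] = (tag, data)
        elif ways[1] is None:
            ways[1] = (tag, data)
        else:
            ways[0] = (tag, data)    # slide's assumption: replace (tag 0, data 0)
        return data, "miss"

mem = {addr: addr * 10 for addr in range(64)}
cache = TwoWaySetAssociativeCache(num_sets=8, memory=mem)
print(cache.read(5))     # (50, 'miss')
print(cache.read(13))    # (130, 'miss') -> same set 5, fills the second way
print(cache.read(5))     # (50, 'hit')   -> both blocks coexist
print(cache.read(13))    # (130, 'hit')  -> unlike direct mapping, no conflict
```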
Set Associative Mapping
[Figure: main memory of 4096 blocks mapped into a cache of 128 blocks organized
as 64 sets of two blocks each (Set 0 holds cache blocks 0 and 1, Set 1 holds
blocks 2 and 3, …); each cache block carries a tag.]
The address is divided into Tag | Set | Word fields:
- Word (4 bits): selects one of the 16 = 2^4 words in a block
- Set (6 bits): points to a particular set in the cache (128/2 = 64 = 2^6 sets)
- Tag (6 bits): checks whether the desired block is present (4096/64 = 64 = 2^6
memory blocks map to each set)
Set-associative-mapped cache with two blocks per set.
2-Way Set Associative
[Figure: a 20-bit address selects a set (e.g. address 00500); both ways of the
set are compared in parallel – here (tag 000, data 01A6) and (tag 010, data
0721) – producing match / no match. Each way stores a 10-bit tag and 16 bits
of data.]
Replacement Algorithms
• Difficult to determine which blocks to kick out
• The cache controller tracks references to all blocks as
computation proceeds.
• Increase / clear track counters when a hit/miss occurs
Which Block should be Replaced on a Cache Miss?
Two primary strategies for full and set associative caches
– Random – candidate blocks are randomly selected
Some systems generate pseudo random block numbers, to get reproducible
behavior useful for debugging
– LRU (Least Recently Used) – to reduce the chance of throwing out information
that will soon be needed again, the block replaced is the one that has gone
unused the longest, i.e. the least-recently used one.
Replacement Algorithms
• For Associative & Set-Associative Cache
Which location should be emptied when the cache
is full and a miss occurs?
First In First Out (FIFO)
Least Recently Used (LRU)
• Distinguish an Empty location from a Full one
Valid Bit
Replacement Algorithms
CPU reference:  A    B    C    A    D    E    A    D    C    F
Result:         Miss Miss Miss Hit  Miss Miss Miss Hit  Hit  Miss
Cache (FIFO):   A    A    A    A    A    E    E    E    E    E
                     B    B    B    B    B    A    A    A    A
                          C    C    C    C    C    C    C    F
                                    D    D    D    D    D    D
First in: A, then B, then C, then D.
Hit Ratio = 3 / 10 = 0.3
Replacement Algorithms
CPU reference:  A    B    C    A    D    E    A    D    C    F
Result:         Miss Miss Miss Hit  Miss Miss Hit  Hit  Hit  Miss
Cache (LRU,     A    B    C    A    D    E    A    D    C    F
most recent          A    B    C    A    D    E    A    D    C
on top):                  A    B    C    A    D    E    A    D
                                    B    C    C    C    E    A
Hit Ratio = 4 / 10 = 0.4
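Both policies can be replayed in code. This simulation is ours; with a 4-entry cache and the reference string A B C A D E A D C F it reproduces the hit ratios worked out above (0.3 for FIFO, 0.4 for LRU):

```python
from collections import OrderedDict, deque

def simulate(refs, capacity, policy):
    """Count the hit ratio of a reference string under 'fifo' or 'lru' replacement."""
    hits = 0
    if policy == "fifo":
        cache, order = set(), deque()
        for r in refs:
            if r in cache:
                hits += 1                       # FIFO order is unchanged on a hit
            else:
                if len(cache) == capacity:
                    cache.discard(order.popleft())  # evict the oldest block
                cache.add(r)
                order.append(r)
    else:  # lru
        cache = OrderedDict()
        for r in refs:
            if r in cache:
                hits += 1
                cache.move_to_end(r)            # mark as most recently used
            else:
                if len(cache) == capacity:
                    cache.popitem(last=False)   # evict the least recently used
                cache[r] = True
    return hits / len(refs)

refs = list("ABCADEADCF")
print(simulate(refs, 4, "fifo"))   # 0.3
print(simulate(refs, 4, "lru"))    # 0.4
```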
Virtual Memory
Gives the programmer the illusion that the system has a very large memory,
even though the computer actually has a relatively small main memory.
Address Mapping
Memory mapping table for virtual address -> physical address
[Figure: the virtual address goes to the memory mapping table, which produces
the physical address used to access main memory; the memory table and main
memory each have a buffer register.]
ADDRESS MAPPING
Address space and memory space are each divided into fixed-size groups of
words called pages (address space) and blocks (memory space).
Ex: group (page) size of 1K words;
address space N = 8K = 2^13 -> 8 pages (Page 0 … Page 7);
memory space M = 4K = 2^12 -> 4 blocks (Block 0 … Block 3).
Organization of the memory mapping table in a paged system
[Figure: a 13-bit virtual address (e.g. 1010101010011) splits into a 3-bit page
number and a 10-bit line number. The page number indexes an 8-entry table whose
entries hold a block number and a presence bit: page 001 -> block 11,
010 -> 00, 101 -> 01, 110 -> 10 (presence bit 1); the remaining pages are
absent. The block number joined to the line number (e.g. 01 0101010011) is
loaded into the main-memory address register; data passes through the MDR. An
alternative organization uses an associative memory of (page no., block no.)
pairs, searched with the page number in the key register.]
Page Fault
The referenced page number cannot be found in the page table, i.e. the page is not present in main memory.
PAGE REPLACEMENT ALGORITHMS
• Counters
- For each page table entry - time-of-use register
- Incremented for every memory reference
- Page with the smallest value in time-of-use register is replaced
• Stack
- Stack of page numbers
- Whenever a page is referenced its page number is removed from the stack and
pushed on top
- Least recently used page number is at the bottom
Reference string: 4 7 0 1 2 7 1
[Stack snapshots, top to bottom: after 4 7 0 1 2 the stack is 2 1 0 7 4; after
the next reference (7) it becomes 7 2 1 0 4.]
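The paged translation described earlier (3-bit page number, 10-bit line number, presence check) can be sketched directly; the table contents mirror the figure, and the function name is ours:

```python
LINE_BITS = 10

# Page table from the figure: page number -> block number (present pages only).
page_table = {0b001: 0b11, 0b010: 0b00, 0b101: 0b01, 0b110: 0b10}

def translate(virtual_addr):
    """Map a 13-bit virtual address to a 12-bit physical address, or page-fault."""
    page = virtual_addr >> LINE_BITS
    line = virtual_addr & ((1 << LINE_BITS) - 1)
    if page not in page_table:
        raise LookupError(f"page fault: page {page:03b} is not in main memory")
    return (page_table[page] << LINE_BITS) | line

va = 0b1010101010011                  # page 101, line 0101010011 (slide example)
print(f"{translate(va):012b}")        # 010101010011 -> block 01 + same line
```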
TIMING AND CONTROL
Control unit of the basic computer
[Figure: the 16-bit instruction register (IR) supplies bits 12–14 to a 3 × 8
decoder producing D0–D7 and bit 15 as I; bits 0–11 are other inputs to the
control logic gates. A 4 × 16 decoder generates the timing signals T0–T15.
D0–D7, I, T0–T15 and the other inputs feed the control logic gates, which
produce the control outputs.]