Digital and Computer Organization
Amity University
Syllabus
Course Objective:
The student will develop an understanding of the underlying operation of a modern digital
computer and will identify and understand the various "building blocks" from which a modern
computer is constructed. The student will learn to develop simple assembly language programs
and understand the underlying operations of more complex programs using Intel's 8085
Microprocessor.
Course Contents:
Module I: Digital Logic Fundamentals
Boolean Algebra: Basic Functions, Manipulating Boolean functions, Basic Combinational Logic:
Adder/Subtractor, Decoders, Encoders, Multiplexers, Memory, Basic Sequential Circuits:
Flip-flops, Registers, Counters.
Module II
General Computer Architecture
Block Diagram of typical Computer, Memory Section, Input/Output Section, CPU, Registers,
Arithmetic Unit, Instruction handling Areas, Stacks
Micro operations: Register Transfer, Bus and Memory Transfer, Arithmetic Micro operations,
Logic Micro operations, Shift Micro operations, Arithmetic Logic Shift Unit
Module III
Basic Computer Organization and Design: Instruction Codes, Operation code, Timing and
Control, Instruction Cycle, Memory Reference Instructions, Input Output Instructions and
Interrupts
Control Memory: Control Word, Microinstruction, Microprogramming, Control Memory,
Hardwired
Module IV
Central Processing Unit: General Register Organization, Stack Organization, Instruction
Formats, Addressing Modes, RISC, CISC
Pipelining and Vector Processing: Parallel Processing, Pipelining, Arithmetic Pipeline,
Instruction Pipeline, Vector Processing, Array Processors
Module V
Input Output Organization: I/O Interface, Asynchronous Data Transfer, Modes of Transfer,
Priority Interrupt, DMA, IOP, Serial Communication
Memory Organization: Associative Memory, Cache Memory, Virtual Memory
Module VI
Introduction to Microprocessor: Machine Language, Assembly Language, Assembler, High
Level Language, Compiler, Interpreter, Internal Architecture of the 8085.
Text:
Computer System Architecture, M. M. Mano, Pearson Education.
References:
Computer Architecture and Organization, J. P. Hayes, TMH.
Introduction to Microprocessors: Software, Hardware, Programming, Lance A. Leventhal.
Table of Contents
Syllabus
Chapter 1 : Digital Logic Fundamentals
1.1 Overview
1.2 Boolean Algebra
1.3 Combinatorial Logic Circuit
1.4 Multiplexers
1.5 Decoders and Encoders
1.6 Memory
1.7 Flip-Flop
1.8 Sequential circuit
1.9 Register
1.10 Counters
1.11 Digital Circuit
MULTIPLE CHOICE QUESTIONS
Chapter 2 : Introduction to computer organization
2.1 Functional Units
2.2 System Buses
2.3 Memory Subsystem Organization
2.4 I/O Subsystem Organization and Interfacing
End chapter quizzes
Chapter 3 : Programming the Basic computer
3.1 Introduction
3.2 The level of programming languages
3.3 Translators
MULTIPLE CHOICE QUESTIONS
Chapter 4 : CPU ORGANIZATION
4.1 General Register Organization
4.2 Stack Organization
4.3 Register Organization
4.3.1 Program Visible Register
4.4 Status and Control Registers
4.5 Register Transfer Language
4.6 Register Transfer
4.7 Bus and Memory Transfer
4.8 Arithmetic Micro operations
4.9 Logic Microoperation
4.10 Shift Micro operations
4.11 Control Unit
4.11.1 Structure of control unit
4.11.2 Hardwired Control Unit
4.11.3 A Micro-programmed Control Unit
4.12 Case Study: 8085 Microprocessor
MULTIPLE CHOICE QUESTIONS
Chapter 5 : Computer Arithmetic
5.1 Introduction
5.2 Number Systems
5.3 Addition and Subtraction
5.4 Multiplication
5.4.1 Booth's Algorithm for multiplication
5.5 Division
5.6 Floating Point
5.6.1 Normalization
End chapter quizzes
Chapter 6 : Input-Output Organization
INTRODUCTION
6.1 PERIPHERAL DEVICES
6.1.1 Keyboard
6.1.2 Pointing devices
6.1.3 High-degree of freedom input devices
6.1.4 Imaging and Video input devices
6.1.5 Medical Imaging
6.2 OUTPUT DEVICES
6.3 INPUT/OUTPUT INTERFACE
6.3.1 BUS AND INTERFACE MODULES
6.3.2 CONNECTION OF I/O BUS
6.4 ASYNCHRONOUS DATA TRANSFER
6.4.1 Synchronous and Asynchronous Operations
6.4.2 Asynchronous Data Transfer
6.4.3 ASYNCHRONOUS SERIAL TRANSFER
6.5 DIRECT MEMORY ACCESS
6.5.1 DMA I/O OPERATION
6.5.2 CYCLE STEALING
6.6 PRIORITY INPUT/OUTPUT PROCESSOR
Multiple choice Questions
Chapter 7 : Memory Organization
7.1 Memory Hierarchy
7.2 Main Memory
7.2.1 RAM and ROM Chips
7.2.2 Memory Address maps
7.3 Auxiliary Memory
7.3.1 Magnetic Tapes
7.3.2 Magnetic Disks
7.3.3 RAID
7.4 Associative memory
7.5 Cache memory
7.6 Virtual Memory Concept
7.6.1 Address Mapping
7.6.2 Associative memory page table
Multiple choice questions
Chapter 8 : Pipeline and Vector Processing
8.1 Parallel Processing
8.2 Pipelining
8.3 Arithmetic Pipeline
8.4 Instruction Pipeline
8.5 RISC Pipeline
8.6 Vector Processing
8.7 Array Processors
8.8 Reduced Instruction set computer: RISC vs CISC
End Chapter quizzes
Chapter 1 : Digital Logic Fundamentals
1.1 Overview
Gates, latches, memories and other logic components are used to design computer
systems and their subsystems.
Neither combinatorial logic nor sequential logic is better than the other; in practice, both
are used as appropriate in circuit design.
This chapter introduces the basic Boolean algebra functions and examines the
fundamental methods used to combine, manipulate and transform these functions.
AND
x y   out = xy
0 0   0
0 1   0
1 0   0
1 1   1
[Timing diagram: waveforms X(t) = 0 0 1 1, Y(t) = 0 1 0 1 and the resulting out(t).]
OR
x y   out = x + y
0 0   0
0 1   1
1 0   1
1 1   1
[Timing diagram: waveforms X(t) = 0 0 1 1, Y(t) = 0 1 0 1 and the resulting out(t) = 0 1 1 1.]
XOR
x y   out = x xor y
0 0   0
0 1   1
1 0   1
1 1   0
[Timing diagram: X(t) = 0 0 1 1, Y(t) = 0 1 0 1, out(t) = x(t) xor y(t) = 0 1 1 0.]
Only the number of inputs that are 1 matters.
General rule: the output is equal to 1 if an odd number of input values are 1, and 0 if an
even number of input values are 1.
NOT
x   x'
0   1
1   0
[Timing diagram: x(t) = 0 1 0 1 and its complement x'(t) = 1 0 1 0.]
XNOR
x y   out = x xnor y
0 0   1
0 1   0
1 0   0
1 1   1
[Timing diagram: X(t) = 0 0 1 1, Y(t) = 0 1 0 1, out(t) = x(t) xnor y(t) = 1 0 0 1.]
The output value is the complemented output of an XOR function.
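The gate functions above can be checked mechanically. A minimal Python sketch (the function names are ours, chosen for readability, not part of any library):

```python
# Basic gates expressed on 0/1 values.
def AND(x, y):  return x & y
def OR(x, y):   return x | y
def XOR(x, y):  return x ^ y
def XNOR(x, y): return 1 - (x ^ y)
def NOT(x):     return 1 - x

# Reproduce the truth tables above, one row per input combination.
for x in (0, 1):
    for y in (0, 1):
        print(x, y, AND(x, y), OR(x, y), XOR(x, y), XNOR(x, y))

# XNOR is the complement of XOR on every row.
assert all(XNOR(x, y) == NOT(XOR(x, y)) for x in (0, 1) for y in (0, 1))
```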
Combinatorial Logic Circuit
A circuit that implements the function xy' + yz.
[Circuit figure: inputs x, y, z; one AND gate forms xy', a second forms yz, and an OR gate combines them.]

DeMorgan's Law
(ab)' = a' + b'
(a+b)' = a'b'

A property for generating equivalent functions; it may allow the simplification of complex
functions, which in turn allows a simpler design.

x y z   x'y'   x'z'   yz'   x'y' + x'z' + yz'
0 0 0   1      1      0     1
0 0 1   1      0      0     1
0 1 0   0      1      1     1
0 1 1   0      0      0     0
1 0 0   0      0      0     0
1 0 1   0      0      0     0
1 1 0   0      0      1     1
1 1 1   0      0      0     0
Karnaugh Map (K-map)
The rows and columns of the K-map correspond to the possible values of the function's
inputs.
Each cell in the K-map represents a minterm (i.e. a three-variable function has: x'y'z',
x'y'z, x'yz', x'yz, xy'z', xy'z, xyz' and xyz).
Gray Code
The 1-bit Gray code serves as the basis for the 2-bit Gray code, the 2-bit Gray code is the
basis for the 3-bit Gray code, etc.
Gray code sequences are cycles: 000 -> 001 -> 011 -> 010 -> 110 -> 111 -> 101 -> 100 -> 000
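The reflect-and-prefix construction described above (each code built from the one-bit-shorter code) can be sketched as follows; `gray_code` is an illustrative helper, not a standard function:

```python
def gray_code(n):
    """Build the n-bit Gray code by reflect-and-prefix: prefix the
    (n-1)-bit sequence with 0, then append it reversed, prefixed with 1."""
    codes = ['0', '1']                     # the 1-bit Gray code
    for _ in range(n - 1):
        codes = ['0' + c for c in codes] + ['1' + c for c in reversed(codes)]
    return codes

print(gray_code(3))
# ['000', '001', '011', '010', '110', '111', '101', '100']

# Adjacent codes (cyclically) differ in exactly one bit.
seq = gray_code(3)
for a, b in zip(seq, seq[1:] + seq[:1]):
    assert sum(x != y for x, y in zip(a, b)) == 1
```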
g1: x'y'z' + x'y'z = x'y'(z' + z) = x'y'
g3: x'y'z' + x'yz' = x'z'(y' + y) = x'z'
To derive a minimal expression we must select the fewest groups that cover all active
minterms (1s).
(xy' + yz)' = x'y' + x'z' + yz'

x y z   x'y' + x'z' + yz'
0 0 0   1
0 0 1   1
0 1 0   1
0 1 1   0
1 0 0   0
1 0 1   0
1 1 0   1
1 1 1   0
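The equivalence in the table can be verified exhaustively. A short sketch, assuming the function is (xy' + yz)' and the simplified sum-of-products form is x'y' + x'z' + yz':

```python
from itertools import product

# f(x,y,z) = xy' + yz ; g is the proposed simplified form of f's complement.
f = lambda x, y, z: (x and not y) or (y and z)
g = lambda x, y, z: (not x and not y) or (not x and not z) or (y and not z)

# Check complement equality on all 8 input combinations.
for x, y, z in product((0, 1), repeat=3):
    assert (not f(x, y, z)) == bool(g(x, y, z))
print("(xy' + yz)' == x'y' + x'z' + yz' for all 8 inputs")
```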
Consider four binary data inputs as inputs of a multiplexer. Two select signals
determine which of the four inputs is passed to the output.
Figure (a) presents the internal structure of a four-input multiplexer; (b) and (c) present
the multiplexer schematic representation with an active high enable signal (b) and an
active low enable signal (c).
Multiplexer schematic representation with active high enable signal
Multiplexer schematic representation with active low enable signal
A 4-to-1 multiplexer made of 2-to-1 multiplexers
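The behaviour of the 4-to-1 multiplexer, and its construction from 2-to-1 multiplexers, can be sketched as follows (the function names are illustrative):

```python
def mux2(a, b, s):
    """2-to-1 multiplexer: output a when s = 0, b when s = 1."""
    return b if s else a

def mux4(d, s1, s0):
    """4-to-1 multiplexer: pass input d[2*s1 + s0] to the output."""
    return d[2 * s1 + s0]

def mux4_from_mux2(d, s1, s0):
    """The same 4-to-1 mux built from three 2-to-1 muxes, as in the
    figure: s0 drives the first level, s1 drives the second."""
    return mux2(mux2(d[0], d[1], s0), mux2(d[2], d[3], s0), s1)

# The two constructions agree for every select combination.
data = [0, 1, 1, 0]
for s1 in (0, 1):
    for s0 in (0, 1):
        assert mux4(data, s1, s0) == mux4_from_mux2(data, s1, s0)
```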
Decoders
For example, a decoder with three inputs and eight outputs will activate output 6
whenever the input values are 110.
Figure (a) shows a two to four decoder internal structure, (b) and (c) show its schematic
representation with active high enable signal and active low enable signal
For inputs S1S0 = 00, 01, 10 and 11, outputs 0, 1, 2 and 3 respectively are active.
Variants:
have active low outputs (the selected output has a value 0 and all the other
outputs have a value 1)
output all 0 when not enabled instead of state Z (the ones in the figure).
Encoders
The encoder is the exact opposite of the decoder.
It receives 2^n inputs and outputs an n-bit value corresponding to the one input that has a
value of 1.
A 4-to-2 encoder and its schematic representations are presented in (a), (b) and (c).
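A behavioural sketch of the decoder and encoder described above (sizes and names are illustrative):

```python
def decoder(s, n=3, enable=1):
    """n-to-2^n decoder: drive exactly one of 2^n active-high outputs;
    all outputs are 0 when not enabled."""
    if not enable:
        return [0] * (1 << n)
    return [1 if i == s else 0 for i in range(1 << n)]

def encoder(lines):
    """2^n-to-n encoder: return the index of the single active input."""
    assert lines.count(1) == 1, "exactly one input may be active"
    return lines.index(1)

out = decoder(6)                 # input 110 activates output 6
assert out[6] == 1 and sum(out) == 1
assert encoder(out) == 6         # the encoder is the decoder's inverse
```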
1.6 Memory
The memory unit is an essential component in any digital computer, since it is needed for storing
programs and data. Most general-purpose computers would run more efficiently if they were
equipped with additional storage devices beyond the capacity of the main memory. The memory
unit that communicates directly with the CPU is called the main memory. Devices that
provide backup storage are called auxiliary memory. The most common auxiliary devices are
magnetic disks and tapes; they are used for storing system programs, large data files and other
backup information. Only programs and data currently needed by the processor reside in main
memory. All other information is stored in auxiliary memory and transferred to main
memory when needed.
The memory hierarchy system consists of all storage devices employed in a computer
system, from the slow but high-capacity auxiliary memory, to a relatively faster main memory, to
an even smaller and faster cache memory accessible to the high-speed processing logic. The goal
of the memory hierarchy is to obtain the highest possible access speed while minimizing the total
cost of the memory system.
1.7 Flip-Flop
Every digital circuit is likely to contain a combinational circuit, but most systems encountered
also include storage elements, which are described in terms of sequential circuits. The most
common type of sequential circuit is synchronous. The storage elements employed in clocked
sequential circuits are called flip-flops. A flip-flop is a binary cell capable of storing one bit of
information.
Clocked D Flip-Flop
Clock Pulse Definition
Master-Slave Flip-Flop
T Flip-Flop
JK Flip-Flop
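The flip-flop types listed above can be modelled behaviourally, one clocked update per call. A sketch (the class names are ours):

```python
class DFlipFlop:
    """Clocked D flip-flop: on each clock pulse, Q takes the value of D."""
    def __init__(self):
        self.q = 0
    def clock(self, d):
        self.q = d
        return self.q

class JKFlipFlop:
    """JK flip-flop: JK = 00 hold, 10 set, 01 reset, 11 toggle."""
    def __init__(self):
        self.q = 0
    def clock(self, j, k):
        if j and k:
            self.q ^= 1       # toggle
        elif j:
            self.q = 1        # set
        elif k:
            self.q = 0        # reset
        return self.q

class TFlipFlop:
    """T flip-flop: toggle when T = 1 (a JK flip-flop with J = K = T)."""
    def __init__(self):
        self.jk = JKFlipFlop()
    def clock(self, t):
        return self.jk.clock(t, t)

jk = JKFlipFlop()
print(jk.clock(1, 0), jk.clock(1, 1))   # set, then toggle: 1 0
```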
1.8 Sequential circuit
[Figure: (a) state table notation — input x, present state y, next state Y and output z, with
transitions labeled input/output (x/z); (b) a state diagram with states A, B, C and D and
transitions labeled 0/0, 0/1, 1/0 and 1/1.]
1.9 Register
A register is a group of flip-flops, with each flip-flop capable of storing one bit of information.
An n-bit register has a group of n flip-flops and is capable of storing any binary information of
n bits. In addition to flip-flops, a register may have combinational gates that perform certain
data-processing tasks. The flip-flops hold the binary information and the gates control when and
how new information is transferred into the register.
D and Q are both sets of lines, with the number of lines equal to the width of each register.
There are often multiple address ports, as well as additional data ports.
1.10 Counters
A register that goes through a predetermined sequence of states upon the application of input
pulses is called a counter. The input pulses may be clock pulses or may originate from external
sources. They may occur at uniform intervals of time or at random. Counters are found in almost
all equipment containing digital logic.
Binary counter
Count-down counter: a binary counter with reverse count; starts from 15 and goes down.
In a count-down counter the least significant bit is complemented with every count
pulse. Any other bit is complemented if the previous bit goes from 0 to 1.
We can use the same counter design with negative-edge flip-flops to make a count-down
counter.
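A behavioural sketch of a 4-bit binary counter and its count-down variant (the class is illustrative, not a standard component):

```python
class BinaryCounter:
    """4-bit binary counter; direction=-1 gives the count-down variant."""
    def __init__(self, start=0, direction=1):
        self.value = start
        self.direction = direction
    def pulse(self):
        # Modulo 16 models the 4-bit wrap-around in both directions.
        self.value = (self.value + self.direction) % 16
        return self.value

down = BinaryCounter(start=15, direction=-1)
print([down.pulse() for _ in range(4)])   # [14, 13, 12, 11]
```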
ANSWERS
1. D, 2.C,3.B,4.A,5.B,6.B,7.C,8.A,9.A ,10.B
Chapter 2 : Introduction to computer organization
Memory
The goal of the memory hierarchy is to obtain the highest possible access speed while
minimizing the total cost of the memory system.
2.3.2. Memory Location: Address Map
Address space assignment to each memory chip. Example: 512 bytes of RAM and 512 bytes of ROM.

Component   Hexa address   Address bus (lines 10 9 8 7 6 5 4 3 2 1)
RAM 1       0000 - 007F    0 0 0 x x x x x x x
RAM 2       0080 - 00FF    0 0 1 x x x x x x x
RAM 3       0100 - 017F    0 1 0 x x x x x x x
RAM 4       0180 - 01FF    0 1 1 x x x x x x x
ROM         0200 - 03FF    1 x x x x x x x x x
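The address map above can be expressed as a decoding function. This sketch assumes the 10-line address bus shown, with line 10 as the most significant bit:

```python
def select_chip(addr):
    """Decode a 10-bit address (0x000-0x3FF) per the map above:
    line 10 set selects the ROM; otherwise lines 9-8 pick one of four
    128-byte RAM chips, and the low 7 'x' lines address within it."""
    assert 0 <= addr <= 0x3FF
    if addr & 0x200:                  # line 10 = 1 -> ROM
        return 'ROM', addr & 0x1FF    # 9-bit offset within the ROM
    chip = (addr >> 7) & 0b11         # lines 9-8 select the RAM chip
    return f'RAM {chip + 1}', addr & 0x7F

assert select_chip(0x0000) == ('RAM 1', 0)
assert select_chip(0x00FF) == ('RAM 2', 0x7F)
assert select_chip(0x0200) == ('ROM', 0)
assert select_chip(0x03FF) == ('ROM', 0x1FF)
```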
Auxiliary Memory
o Information organized on magnetic tapes
o Auxiliary memory holds programs and data for future use
o Non-volatile data remains even when the memory device is taken off-power
o Slower access rates
o Greater storage capacity
Associative Memory
o Accessed by the content of the data rather than by an address
o Also called Content Addressable Memory (CAM)
o Used in very high speed searching applications
o Much faster than RAM in virtually all search applications
o Higher costs
[Figure: associative memory organization — an argument register (A) holds the search key
and a match register flags the words whose content matches it.]
Cache Memory
o Cache is a fast, small-capacity memory that should hold the information most
likely to be accessed
o The property of Locality of Reference makes cache memory systems work
o Locality of Reference - the references to memory at any given time interval
tend to be confined within a localized area. This area contains a set of
information and the membership changes gradually as time goes by.
Temporal Locality - information that will be used in the near future is
likely to be in use already
Spatial Locality - if a word is accessed, adjacent (near) words are likely
to be accessed soon
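Locality of reference is what makes a cache pay off. A toy direct-mapped cache (the sizes are illustrative) shows a high hit rate on a loop that revisits a small region:

```python
class DirectMappedCache:
    """Direct-mapped cache: 8 lines of 4 words each (illustrative sizes)."""
    LINES, WORDS_PER_LINE = 8, 4

    def __init__(self):
        self.tags = [None] * self.LINES
        self.hits = self.accesses = 0

    def access(self, addr):
        self.accesses += 1
        block = addr // self.WORDS_PER_LINE     # spatial locality: whole block
        line = block % self.LINES
        tag = block // self.LINES
        if self.tags[line] == tag:
            self.hits += 1                      # temporal locality: reuse
        else:
            self.tags[line] = tag               # miss: fetch the block

cache = DirectMappedCache()
for _ in range(10):                 # revisit the same small region repeatedly
    for addr in range(16):
        cache.access(addr)
print(f"hit rate: {cache.hits / cache.accesses:.2f}")   # only 4 cold misses
```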
2.4 I/O Subsystem Organization and Interfacing
2.4.1. A Model of I/O Subsystem Organization
User Processes
Directory Management
File System
Logical I/O / Communication Architecture
Physical Organization
Device I/O
Hardware
Provides a method for transferring information between internal storage (such as memory
and CPU registers) and external I/O devices
Resolves the differences between the computer and peripheral devices
Peripherals - Electromechanical Devices
CPU or Memory - Electronic Device
[Figure: I/O interface unit. The CPU attaches through a bidirectional data bus (via data bus
buffers) and the chip select (CS), register select (RS1, RS0), I/O read (RD) and I/O write
(WR) lines; timing and control logic gates these onto an internal bus that connects four
registers — Port A data register, Port B data register, control register and status register —
which in turn connect to the I/O device.]
ANSWERS
1. c, 2.d, 3.a, 4.a, 5.b, 6.d, 7.c, 8.b, 9.d, 10. c
Chapter 3 : Programming the Basic computer
3.1 Introduction:
A total computer system includes both hardware and software. Hardware consists of the physical
components and all associated equipment. Software refers to the programs that are written for the
computer. It is possible to be familiar with various aspects of computer software without being
concerned with how the computer hardware operates. It is also possible to design parts of the
hardware without a knowledge of its software capabilities. A program written by a user may be
either dependent on or independent of the physical computer that runs it. A programming
language is therefore an interface between the computer and the programmer: a set of
instructions written in a language that is translated into a machine-compatible language.
1. PASCAL
Designed for teaching structured programming.
2. C
Designed to operate computer at low-level using high level language.
More readable and better structured than Assembly language.
Similar language C++ is popular in game programming.
3. Java
Includes some syntax of C.
Java programs can be called from HTML document or run directly.
The computer needs a Java Virtual Machine to execute Java programs, e.g. Yahoo
Snooker.
4. COBOL
Designed for business applications
Wordy language
Readable for beginners
e.g. multiply hourly-rate by hours-worked giving gross-pay
i.e. gross-pay = hourly-rate * hours-worked. But it can be clumsy for
long programs.
5. BASIC
Designed to provide students with an easy-to-learn language.
Visual Basic
A version of BASIC
Specialized for developing Windows applications
Visual Basic.NET
Newest version of Visual Basic
Supports all Web-based features
6. Script Languages
Interpreted and processed by a software (e.g. Browser)
e.g. VBScript and JavaScript
have similar syntax as Visual Basic and Java respectively
Interpreted and processed by Web browser
3.3 Translators
Translation involves three elements: a source program, a translator and an object program. The
source program is a text file used for storing source code, which must be translated into machine
instructions before execution. The translator is software that converts source code into machine
instructions; it is also used to discover syntax errors in a program. The object program is a binary
file (e.g. an EXE) used for storing the machine instructions executed at run time.
Types of translators:
3.3.1. Assembler
3.3.2. Compiler
3.3.3. Interpreter
3.3.1. Assemblers
An assembler is software used to translate an assembly language program into machine
instructions, producing one machine instruction for each assembly language statement. It
translates the whole source program before execution and stores the result in an object program
(during execution, only the object program is needed).
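A toy one-pass assembler illustrates the one-statement-to-one-instruction mapping. The opcode values below are from the Intel 8085 instruction set, but the `assemble` helper and its tiny statement syntax are our own invention, not a real tool:

```python
# A toy assembler for a tiny subset of 8085 mnemonics.
OPCODES = {
    'MVI A':   0x3E,  # MVI A, d8 : load an immediate byte into A
    'MOV B,A': 0x47,  # MOV B, A  : copy the accumulator into B
    'ADD B':   0x80,  # ADD B     : A <- A + B
    'HLT':     0x76,  # HLT       : halt
}

def assemble(source):
    """Translate assembly statements into machine bytes: one opcode
    per statement, plus an immediate data byte for MVI."""
    code = []
    for line in source:
        if line.startswith('MVI A'):
            operand = int(line.split(',')[1], 16)
            code += [OPCODES['MVI A'], operand]
        else:
            code.append(OPCODES[line])
    return code

program = ['MVI A,05', 'MOV B,A', 'ADD B', 'HLT']   # computes A = 5 + 5
print([hex(b) for b in assemble(program)])
# ['0x3e', '0x5', '0x47', '0x80', '0x76']
```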
3.3.2 Compilers
3.3.2.1. Elements of a Machine Instruction: each instruction must contain the information
required by the CPU for execution. The elements of a machine instruction are:
3.3.2.2. Source and result operands (the values of these operands are what matter):
Virtually all arithmetic and logic operations are either unary (one operand) or binary (two
operands). The result of an operation must be stored, suggesting a third address. Finally,
after the completion of an instruction, the next instruction must be fetched, and its
address is needed.
This line of reasoning suggests that an instruction could be required to contain four address
references: two operands, one result, and the address of the next instruction. In practice,
the address of the next instruction is handled by the program counter; therefore most
instructions have one, two or three operand addresses.
Three-address instruction formats are not common, because they require a relatively long
instruction format to hold three address references.
Number of Addresses
Numbers:
o Integer or fixed point
o Floating point (real numbers)
o Decimal
Characters:
o International Reference Alphabet (IRA) which is referred to as ASCII in the USA,
o EBCDIC character set used in IBM 370 machines.
Logical Data:
Boolean data. 1 = true, 0 = false.
Types of Operations
A set of general types of operations is as follows:
Data transfer: Move, Store, Load (Fetch), Exchange, Clear (Reset), set, Push, Pop.
Arithmetic: Add, Subtract, Multiply, Divide, Absolute, Negate, Increment, Decrement.
Logical: AND, OR, NOT, XOR, Test, Compare, Shift, Rotate, Set control variables.
Conversion: Translate, Convert.
I/O: Input (Read), Output (Write), Start I/O, Test I/O.
Transfer of Control: Jump (Branch), Jump Conditional, Jump to Subroutine, Return,
Execute, Skip, Skip Conditional, Halt, Wait (Hold), No Operation.
System Control: instructions that can only be executed while the processor is in a
privileged state, or is executing a program in a special privileged area of memory.
These instructions are reserved for the use of the operating system.
Instruction Length:
Affected by and affects:
Memory size
Memory organization
Bus structure
CPU complexity
CPU speed
Trade-off between a powerful instruction repertoire and saving space
Allocation of Bits:
Number of addressing modes
Number of operands
Register versus memory
Number of register sets
Address range
Address granularity
Variable-Length Instructions:
1. Due to varying number of operands,
2. Due to varying length of opcode in some CPU's.
1. The speed at which the CPU processes data is measured in:
a) Megahertz
b) Gigahertz
c) Nanoseconds
d) A and B
3. Which register holds the current instruction being processed?
a) Program counter
b) Instruction register
c) Control Unit
d) Arithmetic Logic Unit
Answers
1.a , 2.b, 3.b, 4.a, 5.b 6.a , 7.a ,8.a, 9. a,10.d
Chapter 4 : CPU ORGANIZATION
[Figure: general register organization — a clock and an external input feed registers R1
through R7; load lines (7 lines) from a 3x8 decoder, driven by SELD, choose the destination
register; two multiplexers, driven by SELA and SELB, place source registers on the A bus
and B bus; OPR selects the ALU operation and the ALU output is routed back to the
registers.]
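One register-transfer step in the organization shown (SELA and SELB select the source registers, OPR the ALU function, SELD the destination) can be sketched as follows; the encodings and names are illustrative:

```python
# Register set R1..R7 plus the external input line (select code 0).
regs = [0] * 8          # regs[0] stands in for the external input

ALU = {                  # OPR codes (illustrative encodings)
    'ADD': lambda a, b: (a + b) & 0xFF,
    'SUB': lambda a, b: (a - b) & 0xFF,
    'AND': lambda a, b: a & b,
    'OR':  lambda a, b: a | b,
}

def micro_op(seld, sela, selb, opr):
    """One register-transfer step: R[SELD] <- R[SELA] OPR R[SELB]."""
    a_bus, b_bus = regs[sela], regs[selb]   # the two mux outputs
    regs[seld] = ALU[opr](a_bus, b_bus)     # decoder loads the result

regs[1], regs[2] = 7, 3
micro_op(3, 1, 2, 'ADD')   # R3 <- R1 + R2
assert regs[3] == 10
```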
o Stack - A storage device that stores information in such a manner that the
item stored last is the first item retrieved. Hence, it is also called a
last-in first-out (LIFO) list.
o Use of Stack
Temporary storage during program runtime
Not fixed in size: grows and shrinks using push and pop
operations
Not typed: elements of list not defined as one type
Subroutines (functions, procedures)
Linkage: saving a link back to the caller (address of
calling routine)
Data: return value, parameters, local variables
Other saved information
Managed at the assembly language level
Rules for compiler, AL programmer
o Stack Pointer (SP) - A register that holds the address of the top item in
the stack. SP always points at the top item in the stack.
o Push - Operation to insert an item into the stack.
o Pop - Operation to retrieve an item from the stack.
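The push and pop operations above can be sketched against a small memory array with a stack pointer. The downward-growing convention used here is one common choice, not the only one:

```python
# A memory-resident stack with a stack pointer (SP).
# The stack grows toward lower addresses; SP points at the top item.
memory = [0] * 16
sp = len(memory)          # empty stack: SP sits just past the last slot

def push(item):
    global sp
    sp -= 1               # move SP down, then store at the new top
    memory[sp] = item

def pop():
    global sp
    item = memory[sp]     # read the top item (last in, first out)
    sp += 1
    return item

push(10); push(20); push(30)
assert pop() == 30 and pop() == 20 and pop() == 10
```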
Programmer Visible Registers: These registers can be used by machine or assembly language
programmers to minimize the references to main memory.
Status and Control Registers: These registers cannot be used by the programmers but are used
to control the CPU or the execution of a program.
These registers can be accessed using machine language. In general we encounter four types of
programmer visible registers.
General Purpose Registers
Data registers
Address Registers
Condition Codes Registers
The general-purpose registers are used for various functions desired by the processor. A true
general-purpose register can contain an operand, or can be used to calculate the address of an
operand, for any operation code of an instruction. But trends in today's machines show a drift
towards dedicated registers; for example, some registers may be dedicated to floating-point
operations. In some machines there is a distinct separation between data and address registers.
The data registers are used only for storing intermediate results or data; they are not used for
calculating the address of an operand.
An address register may be a general-purpose register, but some dedicated address registers are
also used in several machines. Examples of dedicated address registers can be:
In the case of specialized register the number of bits needed for register specific details are
reduced as here we need to specify only few registers out of a set of registers. However, this
specialization does not allow much flexibility to the programmer.
Another issue related to register set design is the number of general-purpose registers, or data
and address registers, to be provided in a microprocessor. The number of registers also affects the
instruction design, as it determines the number of bits needed in an instruction to specify a
register reference. In general, it has been found that the optimum number of registers in a CPU is
in the range 8 to 32. If the number of registers falls below this range, more memory references
per instruction will be needed on average, as some of the intermediate results then have to be
stored in memory. On the other hand, if the number of registers goes above 32, there is no
appreciable reduction in memory references.
One major decision to be taken in designing the status and control register organization is
how to allocate control information between registers and memory. Generally, the first few
hundred or thousand words of memory are allocated for storing control information. It is the
responsibility of the designer to determine how much control information should be kept in
registers and how much in memory.
4.5 Register Transfer Language
Rather than specifying a digital system in words, a specific notation is used, register
transfer language
For any function of the computer, the register transfer language can be used to describe
the (sequence of) microoperations
Register transfer language:
A symbolic language which is a convenient tool for describing the internal
organization of digital computers and can also be used to facilitate the design process
of digital systems.
Registers are designated by capital letters, sometimes followed by numbers (e.g., A, R13,
IR)
Often the names indicate function:
MAR - memory address register
PC - program counter
IR - instruction register
Registers and their contents can be viewed and represented in various ways
A register can be viewed as a single entity:
Registers may also be represented showing the bits of data they contain
However, most systems implement only four of these: AND, OR, XOR, and
Complement/NOT.
The others can be created from combinations of these.
List of Logic Microoperations
- 16 different logic operations with 2 binary vars.
- n binary vars functions
Truth tables for 16 functions of 2 variables and the corresponding 16 logic micro-
operations
Application of logic Microoperation
Logic microoperations can be used to manipulate individual bits or a portions of a word
in a register
Consider the data in a register A. In another register, B, is bit data that will be used to
modify the contents of A
Selective-set          A ← A + B       (OR)
Selective-complement   A ← A ⊕ B
Selective-clear        A ← A ∧ B'
Mask (Delete)          A ← A ∧ B
Clear                  A ← A ⊕ B
Insert                 A ← (A ∧ B) + C
Compare                A ← A ⊕ B
Logic microoperations specify binary operations for strings of bits stored in registers. These
operations consider each bit of the register separately and treat them as binary variables. For
example, the exclusive-OR microoperation on the contents of two registers R1 and R2 is
symbolized by the statement
P: R1 ← R1 ⊕ R2
It specifies a logic microoperation to be executed on the individual bits of the registers provided
that the control variable P = 1.
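As an illustration (the operand values are ours), the selective microoperations map directly onto bitwise operators, with B acting as the control word that marks which bits of A to affect:

```python
A = 0b10100011
B = 0b00001111    # control word: 1s mark the bits of A to affect

selective_set        = A | B            # set bits of A where B has 1s
selective_complement = A ^ B            # flip bits of A where B has 1s
selective_clear      = A & ~B & 0xFF    # clear bits of A where B has 1s
mask                 = A & B            # delete bits of A where B has 0s
```

Each operator acts on all eight bit positions at once, just as the hardware treats each bit of the register as an independent binary variable.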
Figure 1 is a block diagram showing the internal organization of a hard-wired control unit for our
simple computer. Input to the controller consists of the 4-bit opcode of the instruction currently
contained in the Instruction Register and the negative flag from the accumulator. The controller's
output is a set of 16 control signals that go out to the various registers and to the memory of the
computer, in addition to a HLT signal that is activated whenever the leading bit of the op-code is
one. The controller is composed of the following functional units: A ring counter, an instruction
decoder, and a control matrix.
The ring counter provides a sequence of six consecutive active signals that cycle continuously.
Synchronized by the system clock, the ring counter first activates its T0 line, then its T1 line, and
so forth. After T5 is active, the sequence begins again with T0. Figure 2 shows how the ring
counter might be organized internally. The instruction decoder takes its four-bit input from the
op-code field of the instruction register and activates one and only one of its 8 output lines. Each
line corresponds to one of the instructions in the computer's instruction set. Figure 3 shows the
internal organization of this decoder. The most important part of the hard-wired controller is the
control matrix. It receives input from the ring counter and the instruction decoder and provides
the proper sequence of control signals.
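A rough behavioral sketch of the ring counter and instruction decoder (sizes follow the text: six timing signals and eight decoded instruction lines; the code itself is our illustration):

```python
def ring_counter(n=6):
    """Yield T0..T(n-1) forever: exactly one timing signal active per clock."""
    t = 0
    while True:
        yield t
        t = (t + 1) % n

def instruction_decoder(opcode):
    """Activate one and only one of 8 output lines for a 3-bit opcode value."""
    lines = [0] * 8
    lines[opcode & 0b111] = 1
    return lines

rc = ring_counter()
timing = [next(rc) for _ in range(8)]   # T0..T5, then the sequence wraps
```

The control matrix would combine one active timing line with one active decoder line to select which control signals to assert on each clock beat.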
As we have seen, the controller causes instructions to be executed by issuing a specific set of
control signals at each beat of the system clock. Each set of control signals issued causes one
basic operation (micro-operation), such as a register transfer, to occur within the data path
section of the computer. In the case of a hard-wired control unit the control matrix is responsible
for sending out the required sequence of signals.
Addresses provided to the control ROM come from a micro-counter register, which is analogous
to the external machine's program counter. The micro-counter, in turn, receives its input from a
multiplexer which selects from : (1) the output of an address ROM, (2) a current-address
incrementer, or (3) the address stored in the next-address field of the current microinstruction.
The 8085 is an 8-bit general-purpose microprocessor that can address 64K bytes of
memory. It has 40 pins and uses +5V for power. It can run at a maximum frequency of 3
MHz. The pins on the chip can be grouped into 6 groups:
Address Bus.
Data Bus.
Control and Status Signals.
Power supply and frequency.
Externally Initiated Signals.
Serial I/O ports.
The Address and Data Busses
The address bus has 8 signal lines, A8 to A15, which are unidirectional. The other 8 address
bits are multiplexed (time-shared) with the 8 data bits. So, the bits AD0 to AD7 are
bidirectional and serve as A0 to A7 and D0 to D7 at the same time. During the execution of
an instruction, these lines carry the address bits during the early part; then, during the later
parts of the execution, they carry the 8 data bits. In order to separate the address from the
data, we can use a latch to save the value before the function of the bits changes.
There are 3 important pins in the frequency control group. X1 and X2 are the inputs from
the crystal or clock-generating circuit. The frequency is internally divided by 2. So, to run
the microprocessor at 3 MHz, a clock running at 6 MHz should be connected to the X1
and X2 pins.
CLK (OUT): An output clock pin to drive the clock of the rest of the system.
Steps For Fetching an Instruction
Let's assume that we are trying to fetch the instruction at memory location 2005. That
means that the program counter is now set to that value. The following is the sequence of
operations:
The program counter places the address value on the address bus and the
controller issues a RD signal.
The memory's address decoder gets the value and determines which
memory location is being accessed.
The value in the memory location is placed on the data bus.
The value on the data bus is read into the instruction decoder inside the
microprocessor.
After decoding the instruction, the control unit issues the proper control
signals to perform the operation.
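The fetch sequence can be mimicked with a toy memory model (the stored opcode byte, 3EH, is our example value; on the 8085 it happens to be MVI A):

```python
memory = {0x2005: 0x3E}        # assume an opcode byte stored at 2005H

def fetch(mem, pc):
    """PC supplies the address; memory returns the byte on the data bus."""
    opcode = mem[pc]           # address decoded, value placed on the data bus
    return opcode, pc + 1      # PC advances to the next location

opcode, pc = fetch(memory, 0x2005)
```

In the real processor the returned byte lands in the instruction register, where the decoder and control unit take over.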
2. To accomplish a task a computer has to process data in three stages. They are:
3. The CPU is also known as:
A) The Brain
B) The Processor
C) The Central Processing Unit
D) All of the above
4. Main Memory holds data and instructions being processed by the computer and is this
memory is accessible by the CPU.
a) True
b) False
5. It is difficult to classify computer systems on the basis of their system performance ,as newer,
smaller computer systems outperform their larger models of yesteryear.
a) True
b) False
7. Which smaller unit of the CPU directs and coordinates all activities within it, determines
the sequence in which instructions are executed, and sends instruction sequences to the other units?
a) CU
b) ALU
c) PROCESSOR
d) All of the above
8. The CPU's primary responsibility is the movement of data and instructions from itself to main
memory and the ALU and back. Arrange the CU's execution of an instruction in the correct order by
placing the execution instruction's letter in the box provided:
a) Send data to memory unit after processing
b) Fetches data required by the instruction from memory
c) Fetches the instruction from memory
d) Decode the instruction
e) Send data and instruction to the ALU for processing
9. Which smaller CPU unit contains registers (temporary storage locations that hold a single
instruction or data item needed immediately and frequently)?
a) CU
b) ALU
c) PROCESSOR
d) All of the above
10. Program counter (PC) and instruction register (IR) are examples of registers:
a) True
b) False
Answers
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Chapter 5 : Computer Arithmetic
5.1 INTRODUCTION
Because electronic logic deals with currents that are on or off, it has been found convenient to
represent quantities in binary form to perform arithmetic on a computer. Thus, instead of having
ten different digits, 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9, in binary arithmetic, there are only two different
digits, 0 and 1, and when moving to the next column, instead of the digit representing a quantity
that is ten times as large, it only represents a quantity that is two times as large. Thus, the first
few numbers are written in binary as follows:
Decimal Binary
Zero 0 0
One 1 1
Two 2 10
Three 3 11
Four 4 100
Five 5 101
Six 6 110
Seven 7 111
Eight 8 1000
Nine 9 1001
Ten 10 1010
Eleven 11 1011
Twelve 12 1100
The addition and multiplication tables for binary arithmetic are very small, and this makes it
possible to use logic circuits to build binary adders.
 + | 0   1        x | 0   1
---+-------      ---+-------
 0 | 0   1        0 | 0   0
 1 | 1  10        1 | 0   1
Thus, from the table above, when two binary digits, A and B are added, the carry bit is simply (A
AND B), while the last digit of the sum is more complicated; ((A AND NOT B) OR ((NOT A)
AND B)) is one way to express it.
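The carry and sum expressions above can be checked directly (a one-bit half adder; the function name is ours):

```python
def half_add(a, b):
    """One-bit binary addition, per the tables above."""
    carry = a & b                          # carry = A AND B
    s = (a & ~b & 1) | (~a & b & 1)        # sum = (A AND NOT B) OR ((NOT A) AND B)
    return s, carry
```

Chaining such cells, with a carry input added to each stage, gives the full adders from which multi-bit binary adders are built.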
X = Σi xi × 10^i
Binary Number System
Hexadecimal Notation:
common ground between computer and human
Uses 16 digits: 0, 1, 2, ..., 9, A, B, C, D, E, F
1A₁₆ = (1 × 16¹) + (A × 16⁰)
= (1 × 16) + (10 × 1) = 26₁₀
Convert group of four binary digits to/from one hexadecimal digit,
0000=0; 0001=1; 0010=2; 0011=3; 0100=4; 0101=5; 0110=6; 0111=7; 1000=8;
1001=9; 1010=A; 1011=B; 1100=C; 1101=D; 1110=E; 1111=F;
e.g.
1101 1110 0001 . 1110 1101 = DE1.ED
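The nibble-by-nibble mapping can be reproduced in a couple of lines (the input string is our example):

```python
bits = "110111100001"   # 1101 1110 0001
# convert each group of four binary digits to one hexadecimal digit
hex_digits = "".join(format(int(bits[i:i + 4], 2), "X")
                     for i in range(0, len(bits), 4))
```

Each 4-bit group is read as an integer 0-15 and printed as the matching hex digit, exactly as in the table above.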
Sign-Magnitude
Left most bit is sign bit
0 means positive
1 means negative
+18 = 00010010
-18 = 10010010
Problems
Need to consider both sign and magnitude in arithmetic
Two representations of zero (+0 and -0)
Two's Complement
+3 = 00000011
+2 = 00000010
+1 = 00000001
+0 = 00000000
-1 = 11111111
-2 = 11111110
-3 = 11111101
Benefits
One representation of zero
Arithmetic works easily (see later)
Negating is fairly easy (2's complement operation)
3 = 00000011
Boolean complement gives 11111100
Add 1 to LSB: 11111101
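The invert-and-add-one rule is easy to verify for an 8-bit register (a small sketch, not from the text):

```python
def twos_complement(value, bits=8):
    """Negate by inverting all bits and adding 1 to the LSB."""
    return (~value + 1) & ((1 << bits) - 1)

neg3 = twos_complement(3)    # 00000011 -> 11111101
```

Applying the operation twice returns the original value, which is why negation is its own inverse in two's complement.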
5.4 Multiplication
Complex
Work out partial product for each digit
Take care with place value (column)
Add partial products
Multiplication Example
(unsigned numbers e.g.)
      1011     Multiplicand (11 dec)
    x 1101     Multiplier   (13 dec)
      1011     Partial products: if the multiplier bit is 1,
     0000      copy the multiplicand (at its place value);
    1011       otherwise the partial product is zero
   1011
  10001111     Product (143 dec)
Note: need double length result
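The shift-and-add procedure above can be expressed directly (our sketch; the double-length product falls out naturally in Python's unbounded integers):

```python
def multiply(multiplicand, multiplier):
    """Add a shifted copy of the multiplicand for each 1 bit of the multiplier."""
    product = 0
    shift = 0
    while multiplier:
        if multiplier & 1:                    # multiplier bit is 1: copy the
            product += multiplicand << shift  # multiplicand at its place value
        multiplier >>= 1                      # otherwise the partial product is zero
        shift += 1
    return product

product = multiply(0b1011, 0b1101)            # 11 x 13
```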
5.4.1 Booth's Algorithm for Multiplication
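The section names Booth's algorithm without a listing; a minimal sketch of the standard algorithm follows (register widths and variable names are our choices). It scans the multiplier bit pair (q0, q-1), subtracting the multiplicand on a 1-0 boundary and adding it on a 0-1 boundary, with an arithmetic right shift of the combined A,Q pair after each step:

```python
def booth_multiply(m, q, bits=8):
    """Booth's multiplication of two bits-wide two's-complement numbers."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a, q_prev = 0, 0
    q &= mask
    for _ in range(bits):
        if (q & 1, q_prev) == (1, 0):
            a = (a - m) & mask        # 1-0 boundary: A <- A - M
        elif (q & 1, q_prev) == (0, 1):
            a = (a + m) & mask        # 0-1 boundary: A <- A + M
        # arithmetic right shift of the combined A,Q register pair
        q_prev = q & 1
        q = (q >> 1) | ((a & 1) << (bits - 1))
        a = (a >> 1) | (a & sign)     # replicate the sign bit
    return (a << bits) | q            # double-length product

p = booth_multiply(0b1011, 0b1101)    # 11 x 13 = 143
```

Because runs of 1s in the multiplier collapse into one subtraction and one addition, Booth's method also handles negative multipliers without a separate sign fix-up.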
5.5 Division
More complex than multiplication
However, can utilize most of the same hardware.
Based on long division
Division of Unsigned Binary Integers
               00001101    Quotient
Divisor 1011 ) 10010011    Dividend
               1011
               001110      Partial
                 1011      remainders
                001111
                  1011
                   100     Remainder
Flowchart for Unsigned Binary division
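The long-division steps above translate into a simple restoring-style loop (our sketch; one quotient bit is produced per step):

```python
def divide(dividend, divisor):
    """Unsigned binary long division: shift in one dividend bit per step."""
    quotient, remainder = 0, 0
    for i in range(dividend.bit_length() - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)  # bring down a bit
        quotient <<= 1
        if remainder >= divisor:      # divisor fits the partial remainder
            remainder -= divisor
            quotient |= 1
    return quotient, remainder

q, r = divide(0b10010011, 0b1011)     # 147 / 11
```

The compare-and-subtract inside the loop is exactly the trial subtraction the hardware flowchart performs on each cycle.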
Real Numbers
Numbers with fractions
Could be done in pure binary
1001.1010 = 2^3 + 2^0 + 2^-1 + 2^-3 = 9.625
Where is the binary point?
Fixed?
Very limited
Moving?
How do you show where it is?
The relative magnitudes (order) of the numbers do not change. Can be treated as integers
for comparison.
5.6.1 Normalization
FP numbers are usually normalized
i.e. exponent is adjusted so that leading bit (MSB) of mantissa is 1
Since it is always 1 there is no need to store it
(cf. scientific notation, where numbers are normalized to give a single digit before
the decimal point,
e.g. 3.123 x 10^3)
FP Ranges
Expressible Numbers
IEEE 754
Standard for floating point storage
32 and 64 bit standards
8 and 11 bit exponent respectively
Extended formats (both mantissa and exponent) for intermediate results
Representation: sign, exponent, fraction
0: 0, 0, 0
-0: 1, 0, 0
Plus infinity: 0, all 1s, 0
Minus infinity: 1, all 1s, 0
NaN: 0 or 1, all 1s, ≠ 0
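These special encodings can be inspected by unpacking the raw 32-bit pattern (a standard-library sketch; field widths follow the 32-bit format above):

```python
import struct

def decompose(x):
    """Split an IEEE 754 single-precision value into (sign, exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
```

For example, infinity comes back with an all-1s exponent and zero fraction, and NaN with an all-1s exponent and nonzero fraction, matching the table above.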
FP Arithmetic +/-
Check for zeros
Align significands (adjusting exponents)
Add or subtract significands
Normalize result
FP Arithmetic x/÷
Check for zero
Add/subtract exponents
Multiply/divide significands (watch sign)
Normalize
Round
All intermediate results should be in double length storage
Floating Point Multiplication
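Following the x/÷ steps above, here is a toy multiplication of two normalized numbers (-1)^s × m × 2^(e-bias), with the significand m held as a real value in [1, 2) for simplicity (our modeling choice, not a bit-level implementation):

```python
def fp_multiply(s1, e1, m1, s2, e2, m2, bias=127):
    """Multiply (s1, e1, m1) by (s2, e2, m2): XOR signs, add exponents,
    multiply significands, then normalize."""
    sign = s1 ^ s2
    exponent = e1 + e2 - bias      # both exponents carry a bias, so remove one
    mantissa = m1 * m2
    if mantissa >= 2.0:            # normalize: shift right, bump the exponent
        mantissa /= 2.0
        exponent += 1
    return sign, exponent, mantissa

result = fp_multiply(0, 127, 1.5, 0, 127, 1.5)   # 1.5 x 1.5 = 2.25
```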
Q4. What is the result of adding the following two positive binary bit strings?
101101.101
+101100.0010
a).1000001.1110
b).1000001.1010
c).1000001.1000
d).1000001.1100
Q5.What name is used to describe the bit of lowest magnitude within a byte or bit string?
a). radix point
b). low order bit
c). high order bit
d). op code
Q6. What is the best way to write the value '7654' and make it clear to the reader that the number
should be interpreted as a hexadecimal value?
a). 0x7654
b). 7654B
c). 7654H
d). \7654
Q8. What coding format encodes a real number as a mantissa multiplied by a power (exponent)
of two?
a). Binary
b). excess notation
c). floating point
d). two's complement
Q9. Which of the following does not result from floating point math operations?
a). Underflow
b). Overflow
c). Truncation
d). Two's complement
Q10. How many bits of information can each memory cell in a computer chip hold?
a). 0 bit
b). 1 bit
c). 8 bits
Answers
1.a ,2. a ,3.c, 4.d, 5.a, 6.a, 7.a, 8.a, 9.d, 10.c
Chapter 6 : Input-Output Organization
INTRODUCTION
In computing, input/output, or I/O, refers to the communication between an information
processing system (such as a computer), and the outside world possibly a human, or another
information processing system. Inputs are the signals or data received by the system, and outputs
are the signals or data sent from it. The term can also be used as part of an action; to "perform
I/O" is to perform an input or output operation. I/O devices are used by a person (or other
system) to communicate with a computer. For instance, keyboards and mice are considered
input devices of a computer, while monitors and printers are considered output devices of a
computer. Devices for communication between computers, such as modems and network cards,
typically serve for both input and output.
Note that the designation of a device as either input or output depends on the perspective.
Mice and keyboards take as input physical movement that the human user outputs and convert
it into signals that a computer can understand. The output from these devices is input for the
computer. Similarly, printers and monitors take as input signals that a computer outputs. They
then convert these signals into representations that human users can see or read. (For a human
user the process of reading or seeing these representations is receiving input.)
In computer architecture, the combination of the CPU and main memory (i.e. memory that the
CPU can read and write to directly, with individual instructions) is considered the brain of a
computer, and from that point of view any transfer of information from or to that combination,
for example to or from a disk drive, is considered I/O. The CPU and its supporting circuitry
provide memory-mapped I/O that is used in low-level computer programming in the
implementation of device drivers.
INPUT-OUTPUT ORGANIZATION
Peripheral Devices
Input-Output Interface
Asynchronous Data Transfer
Modes of Transfer
Priority Interrupt
Direct Memory Access
Input-Output Processor
Serial Communication
6.1.1 Keyboard
Computer keyboard
Keyer
Chorded keyboard
LPFK
Webcam
Image scanner
Fingerprint scanner
Barcode reader
3D scanner
Laser rangefinder
o Computed tomography
o Magnetic resonance imaging
o Positron emission tomography
o Medical ultrasonography
6.2 OUTPUT DEVICES
An output device is any piece of computer hardware equipment used to communicate the
results of data processing carried out by an information processing system (such as a
computer) to the outside world. In computing, input/output, or I/O, refers to the
communication between an information processing system (such as a computer), and the
outside world. Inputs are the signals or data sent to the system, and outputs are the signals or
data sent by the system to the outside.
The most common input devices used by the computer are the keyboard and mouse. The
keyboard allows the entry of textual information while the mouse allows the selection of a
point on the screen by moving a screen cursor to the point and pressing a mouse button. The
most common outputs are monitors and speakers
Provides a method for transferring information between internal storage (such as memory
and CPU registers) and external I/O devices
Resolves the differences between the computer and peripheral devices
Peripherals - Electromechanical Devices
CPU or Memory - Electronic Device
Data Transfer Rate
Peripherals - Usually slower
CPU or Memory - Usually faster than peripherals
Some kinds of Synchronization mechanism may be needed
Unit of Information
Peripherals - Byte, Block
CPU or Memory - Word
Data representations may differ
Commands
6.3.2. CONNECTION OF I/O BUS
* I/O BUS is for information transfers between CPU and I/O devices through their I/O interface
* Many computers use a common single bus system for both memory and I/O interface units
- Use one common bus but separate control lines for each function
- Use one common bus with common control lines for both functions
* Some computer systems use two separate buses, one to communicate with memory and the
other with I/O interfaces
- Communication between CPU and all interface units is via a common I/O Bus
- An interface connected to a peripheral device may have a number of data registers , a control
register, and a status register
- Function code and sense lines are not needed (Transfer of data, control, and status information
is always via the common I/O Bus)
Isolated I/O
Separate I/O read/write control lines in addition to memory read/write control lines.
Memory-mapped I/O
A single set of read/write control lines(no distinction between memory and I/O transfer)
- Memory and I/O addresses share the common address space
-> reduces memory address range available
- No specific input or output instruction
-> The same memory reference instructions can be used for I/O transfers
- Information in each port can be assigned a meaning depending on the mode of operation of the
I/O device. Port A = Data; Port B = Command; Port C = Status
Asynchronous data transfer between two independent units requires that control signals be
transmitted between the communicating units to indicate the time at which data is being
transmitted.
Strobe pulse: A strobe pulse is supplied by one unit to indicate to the other unit when the transfer
has to occur.
Handshaking: A control signal accompanies each data item being transmitted to indicate the
presence of data. The receiving unit responds with another control signal to acknowledge receipt
of the data.
* The strobe may be activated by either the source or the destination unit
Source-Initiated Strobe for Data Transfer
6.4.2.2. HANDSHAKING
Strobe Methods
1. Source-Initiated
The source unit that initiates the transfer has no way of knowing whether the destination
unit has actually received data
2. Destination-Initiated
The destination unit that initiates the transfer has no way of knowing whether the source has
actually placed the data on the bus. To solve this problem, the HANDSHAKE method introduces
a second control signal that provides a reply to the unit that initiates the transfer.
FIRST-IN-FIRST-OUT(FIFO) BUFFER
* Input data and output data at two different rates
* Output data are always in the same order in which the data entered the buffer.
* Useful in some applications when data is transferred asynchronously
4 x 4 FIFO Buffer (four 4-bit registers Ri),
4 control registers (flip-flops Fi, one associated with each register Ri)
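The FIFO behavior, filling at one rate and draining in arrival order, can be sketched with a deque (our illustration, not the register-level circuit):

```python
from collections import deque

fifo = deque()
for word in [0b0001, 0b0010, 0b0100, 0b1000]:  # producer fills at its own rate
    fifo.append(word)

drained = [fifo.popleft() for _ in range(4)]   # consumer reads in entry order
```

Output data come back in exactly the order the data entered the buffer, which is what makes the FIFO useful for asynchronous transfers between units running at different rates.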
MODES OF TRANSFER - PROGRAM-CONTROLLED I/O
3 different data transfer modes between the central computer (CPU or memory) and
peripherals:
1. Program-Controlled I/O
2. Interrupt-Initiated I/O
3. Direct Memory Access (DMA)
PRIORITY INTERRUPT
Priority
- Determines which interrupt is to be served first when two or more requests are made
simultaneously, and also determines which interrupts are permitted to interrupt the computer
while another is being serviced.
- Higher priority interrupts can make requests while servicing a lower priority interrupt .
Interrupt Register:
- Each bit is associated with an Interrupt Request from different Interrupt Source - different
priority level.
- Each bit can be cleared by a program instruction.
Mask Register:
- Mask Register is associated with Interrupt Register.
- Each bit can be set or cleared by an Instruction.
INTERRUPT PRIORITY ENCODER
Determines the highest-priority interrupt when more than one interrupt takes place.
Priority Encoder Truth table
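A behavioral sketch of the encoder (we assume bit 0 is the highest priority; the function name is ours):

```python
def priority_encode(requests):
    """Return the index of the highest-priority active request, or None."""
    for i, active in enumerate(requests):  # scan from highest priority down
        if active:
            return i
    return None                            # no interrupt pending
```

The hardware version produces the same answer combinationally, with all request lines examined at once rather than in a loop.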
INTERRUPT CYCLE
At the end of each Instruction cycle :
- CPU checks IEN and IST.
- If IEN · IST = 1, the CPU enters the interrupt cycle:
SP ← SP - 1     Decrement the stack pointer.
M[SP] ← PC      Push PC onto the stack.
INTACK ← 1      Enable interrupt acknowledge.
PC ← VAD        Transfer the vector address to PC.
IEN ← 0         Disable further interrupts.
Go to Fetch to execute the first instruction in the interrupt service routine.
INTERRUPT SERVICE ROUTINE
d. Issue a GO command
Upon receiving a GO command, the DMA performs the I/O operation as follows, independently
of the CPU.
[1] Input Device ← R (Read control signal).
[2] Buffer (DMA Controller) ← Input Byte; assembles the bytes into a word until the word is
full.
[4] M ← memory address, W (Write control signal).
[5] Address Reg ← Address Reg + 1; WC (Word Counter) ← WC - 1.
[6] If WC = 0, then interrupt to acknowledge done; else go to [1].
Output
[1] M ← M Address, R; M Address Reg ← M Address Reg + 1; WC ← WC - 1.
[2] Disassemble the word.
[3] Buffer ← one byte; Output Device ← W, for all disassembled bytes.
[4] If WC = 0, then interrupt to acknowledge done; else go to [1].
Cycle Steal
- The CPU is usually much faster than I/O (DMA); thus the CPU uses most of the memory
cycles.
- The DMA controller steals memory cycles from the CPU.
- For those stolen cycles, the CPU remains idle.
- For a slow CPU, the DMA controller may steal most of the memory cycles, which may
cause the CPU to remain idle for a long time.
DMA TRANSFER
6.6 PRIORITY INPUT/OUTPUT PROCESSOR
Channel
- Processor with direct memory access capability that communicates with I/O devices.
- Channel accesses memory by cycle stealing.
- Channel can execute a Channel Program.
iii. Each CCW specifies the parameters needed by the channel to control the
I/O devices and perform data transfer operations.
- CPU initiates the channel by executing a channel I/O class instruction; once initiated, the
channel operates independently of the CPU.
CHANNEL / CPU COMMUNICATION
6. Transmitter register
a).Accepts a data byte through data bus.
b). Accepts a data byte through memory bus.
c). Accepts a data byte through address bus.
7. The larger the RAM of a computer, the faster its processing speed is, since it eliminates:
a). need for external memory b). need for ROM
c).frequent disk I/Os. d). need for wider data path
8. A group of signal lines used to transmit data in parallel from one element of a computer to
another is
a).Control Bus b).Address Bus
c). Databus d). Network
9. The basic unit within a computer store capable of holding a single unit of Data is
a). register b).ALU
c).Control unit d). store location
Answers
1.a, 2.b,3.c,4.b,5.d,6.a, 7.b,8.c,9.b,10.d
Chapter 7 : Memory Organization
The memory unit is an essential component in any digital computer since it is needed for storing
programs and data. Most general-purpose computers would run more efficiently if they were
equipped with additional storage devices beyond the capacity of main memory. The memory
unit that communicates directly with the CPU is called the MAIN MEMORY. Devices that
provide backup storage are called AUXILIARY MEMORY. The most common auxiliary devices are
magnetic disks and tapes; they are used for storing system programs, large data files, and other
backup information. Only the programs and data currently needed by the processor reside in main
memory. All other information is stored in auxiliary memory and transferred to main
memory when needed.
The memory hierarchy system consists of all storage devices employed in a computer
system, from the slow but high-capacity auxiliary memory to a relatively faster main memory, to
an even smaller and faster cache memory accessible to the high-speed processing logic. The goal
of the memory hierarchy is to obtain the highest possible access speed while minimizing the total
cost of the memory system.
7.3.3 RAID
RAID is an acronym first defined by David A. Patterson, Garth A. Gibson and Randy Katz at
the University of California, Berkeley in 1987 to describe a Redundant Array of Inexpensive
Disks a technology that allowed computer users to achieve high levels of storage reliability from
low-cost and less reliable PC-class disk-drive components, via the technique of arranging the
devices into arrays for redundancy. More recently, marketers representing industry RAID
manufacturers reinvented the term to describe a Redundant Array of Independent Disks as a
means of disassociating a "low cost" expectation from RAID technology.
"RAID" is now used as an umbrella term for computer data storage schemes that can divide and
replicate data among multiple hard disk drives. The different Schemes/architectures are named by
the word RAID followed by a number, as in RAID 0, RAID 1, etc. RAID's various designs all
involve two key design goals: increased data reliability or increased input/output performance.
When multiple physical disks are set up to use RAID technology, they are said to be in a RAID
array. This array distributes data across multiple disks, but the array is seen by the computer user
and operating system as one single disk. RAID can be set up to serve several different purposes.
Purpose and basics: Redundancy is achieved by either writing the same data to multiple drives
(known as mirroring), or writing extra data (known as parity data) across the array, calculated
such that the failure of one (or possibly more, depending on the type of RAID) disks in the array
will not result in loss of data. A failed disk may be replaced by a new one, and the lost data
reconstructed from the remaining data and the parity data. Organizing disks into a redundant array
decreases the usable storage capacity. For instance, a 2-disk RAID 1 array loses half of the total
capacity that would have otherwise been available using both disks independently, and a RAID 5
array with several disks loses the capacity of one disk. Other types of RAID arrays are arranged so
that they are faster to write to and read from than a single disk.There are various combinations of
these approaches giving different trade-offs of protection against data loss, capacity, and speed.
RAID levels 0, 1, and 5 are the most commonly found, and cover most requirements.
RAID can involve significant computation when reading and writing information. With
traditional "real" RAID hardware, a separate controller does this computation. In other cases the
operating system or simpler and less expensive controllers require the host computer's processor
to do the computing, which reduces the computer's performance on processor-intensive tasks
(see "Software RAID" and "Fake RAID" below). Simpler RAID controllers may provide only
levels 0 and 1, which require less processing.
RAID systems with redundancy continue working without interruption when one (or possibly
more, depending on the type of RAID) disks of the array fail, although they are then vulnerable
to further failures. When the bad disk is replaced by a new one the array is rebuilt while the
system continues to operate normally. Some systems have to be powered down when removing
or adding a drive; others support hot swapping, allowing drives to be replaced without powering
down. RAID with hot-swapping is often used in high availability systems, where it is important
that the system remains running as much of the time as possible.
Principles: RAID combines two or more physical hard disks into a single logical unit by using
either special hardware or software. Hardware solutions often are designed to present themselves
to the attached system as a single hard drive, so that the operating system would be unaware of the
technical workings. For example, if you configure a 1 TB RAID 5 array using three 500 GB
hard drives in hardware RAID, the operating system is simply presented with a "single"
1 TB disk. Software solutions are typically implemented in the operating system and would present
the RAID drive as a single drive to applications running upon the operating system.
There are three key concepts in RAID: mirroring, the copying of data to more than one disk;
striping, the splitting of data across more than one disk; and error correction, where redundant
data is stored to allow problems to be detected and possibly fixed (known as fault tolerance).
Different RAID levels use one or more of these techniques, depending on the system
requirements. RAID's main aim can be either to improve reliability and availability of data,
ensuring that important data is available more often than not (e.g. a database of customer orders),
or merely to improve the access speed to files (e.g. for a system that delivers video on demand
TV programs to many viewers).
Compare each word in the CAM in parallel with the content of A (Argument Register):
- If CAM Word[i] = A, then M(i) = 1
- Read by sequentially accessing the CAM for each word i with M(i) = 1
- K (Key Register) provides a mask for choosing a particular field or key in the argument in
A (only those bits in the argument that have 1s in their corresponding positions of K are
compared).
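The masked parallel compare can be modeled in a few lines (the word values are our examples):

```python
def cam_match(words, argument, key_mask):
    """Set M(i)=1 for every stored word equal to A in the unmasked positions."""
    return [1 if (w & key_mask) == (argument & key_mask) else 0 for w in words]

match = cam_match([0b1010, 0b0110, 0b1110], argument=0b1010, key_mask=0b1100)
```

Only the bit positions where the key register K holds a 1 take part in the comparison; the rest of each word is ignored.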
Organization of CAM
The cache is a small amount of high-speed memory, usually with a memory cycle time
comparable to the time required by the CPU to fetch one instruction. The cache is usually filled
from main memory when instructions or data are fetched into the CPU. Often the main memory
will supply a wider data word to the cache than the CPU requires, to fill the cache more rapidly.
The amount of information which is replaced at one time in the cache is called the line size for
the cache. This is normally the width of the data bus between the cache memory and the main
memory. A wide line size for the cache means that several instruction or data words are loaded
into the cache at one time, providing a kind of prefetching for instructions or data. Since the
cache is small, the effectiveness of the cache relies on the following properties of most programs:
Spatial locality -- most programs are highly sequential; the next instruction usually comes
from the next memory location.
Data is usually structured, and data in these structures normally are stored in contiguous
memory locations.
Short loops are a common program structure, especially for the innermost sets of nested
loops. This means that the same small set of instructions is used over and over.
Generally, several operations are performed on the same data values, or variables.
When a cache is used, there must be some way in which the memory controller determines
whether the value currently being addressed in memory is available from the cache. There are
several ways that this can be accomplished. One possibility is to store both the address and the
value from main memory in the cache, with the address stored in a type of memory called
associative memory or, more descriptively, content addressable memory.
An associative memory, or content addressable memory, has the property that when a value is
presented to the memory, the address of the value is returned if the value is stored in the
memory, otherwise an indication that the value is not in the associative memory is returned. All
of the comparisons are done simultaneously, so the search is performed very quickly. This type
of memory is very expensive, because each memory location must have both a comparator and a
storage element. A cache memory can be implemented with a block of associative memory,
together with a block of ``ordinary'' memory. The associative memory would hold the address of
the data stored in the cache, and the ordinary memory would contain the data at that address.
Such a cache memory might be configured as shown in Figure.
Figure: A cache implemented with associative memory
If the address is not found in the associative memory, then the value is obtained from main
memory. Associative memory is very expensive, because a comparator is required for every
word in the memory, to perform all the comparisons in parallel. A cheaper way to implement a
cache memory, without using expensive associative memory, is to use direct mapping. Here, part
of the memory address (usually the low order digits of the address) is used to address a word in
the cache. This part of the address is called the index. The remaining high-order bits in the
address, called the tag, are stored in the cache memory along with the data. For example, if a
processor has an 18 bit address for memory, and a cache of 1 K words of 2 bytes (16 bits) length,
and the processor can address single bytes or 2 byte words, we might have the memory address
field and cache organized as in Figure .
Figure: A direct mapped cache configuration
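The field widths in this example follow from the figures given in the text: an 18-bit byte address, a 1K-word cache, and 2-byte words give a 7-bit tag, a 10-bit index, and a 1-bit byte select. A minimal sketch (the function name and example address are illustrative):

```python
# Split an 18-bit byte address for a direct-mapped cache of 1K words
# (2 bytes per word): 7 tag bits, 10 index bits, 1 byte-select bit.

INDEX_BITS = 10   # 1K words -> 10-bit index
BYTE_BITS = 1     # byte-addressable 2-byte words -> 1 byte-select bit
TAG_BITS = 18 - INDEX_BITS - BYTE_BITS   # 7 tag bits remain

def split_address(addr):
    byte = addr & 1
    index = (addr >> BYTE_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BYTE_BITS + INDEX_BITS)
    return tag, index, byte

tag, index, byte = split_address(0b1010101_1100110011_0)
print(tag, index, byte)  # → 85 819 0
```

On an access, the index selects one cache word and the stored tag is compared with the tag field of the address; equality means a hit.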
This was, in fact, the way the cache is organized in the PDP-11/60. In the 11/60, however, there
are 4 other bits used to ensure that the data in the cache is valid. 3 of these are parity bits; one for
each byte and one for the tag. The parity bits are used to check that a single bit error has not
occurred to the data while in the cache. A fourth bit, called the valid bit is used to indicate
whether or not a given location in cache is valid. In the PDP-11/60 and in many other processors,
the cache is not updated if memory is altered by a device other than the CPU (for example when
a disk stores new data in memory). When such a memory operation occurs to a location which
has its value stored in cache, the valid bit is reset to show that the data is ``stale'' and does not
correspond to the data in main memory. As well, the valid bit is reset when power is first applied
to the processor or when the processor recovers from a power failure, because the data found in
the cache at that time will be invalid. In the PDP-11/60, the data path from memory to cache was
the same size (16 bits) as from cache to the CPU. (In the PDP-11/70, a faster machine, the data
path from the CPU to cache was 16 bits, while from memory to cache was 32 bits which means
that the cache had effectively prefetched the next instruction, approximately half of the time).
The amount of information (instructions or data) stored with each tag in the cache is called the
line size of the cache. (It is usually the same size as the data path from main memory to the
cache.) A large line size allows the prefetching of a number of instructions or data words. All
items in a line of the cache are replaced in the cache simultaneously, however, resulting in a
larger block of data being replaced for each cache miss.
The MIPS R2000/R3000 had a built-in cache controller which could control a cache up to 64K
bytes. For a similar 2K word (or 8K byte) cache, the MIPS processor would typically have a
cache configuration as shown in Figure . Generally, the MIPS cache would be larger (64Kbytes
would be typical, and line sizes of 1, 2 or 4 words would be typical).
Figure: One possible MIPS cache organization
A characteristic of the direct mapped cache is that a particular memory address can be mapped
into only one cache location. Many memory addresses are mapped to the same cache location (in
fact, all addresses with the same index field are mapped to the same cache location.) Whenever a
``cache miss'' occurs, the cache line will be replaced by a new line of information from main
memory at an address with the same index but with a different tag.
Note that if the program ``jumps around'' in memory, this cache organization will likely not be
effective because the index range is limited. Also, if both instructions and data are stored in
cache, it may well happen that both map into the same area of cache, and may cause each other
to be replaced very often. This could happen, for example, if the code for a matrix operation and
the matrix data itself happened to have the same index values.
A more interesting configuration for a cache is the set associative cache, which uses a set
associative mapping. In this cache organization, a given memory location can be mapped to
more than one cache location. Here, each index corresponds to two or more data words, each
with a corresponding tag. A set associative cache with n tag and data fields is called an ``n-way
set associative cache''. Usually n = 2^k, for k = 1, 2, 3, is chosen for a set associative cache (k =
0 corresponds to direct mapping). Such n-way set associative caches allow interesting tradeoff
possibilities; cache performance can be improved by increasing the number of ``ways'', or by
increasing the line size, for a given total amount of memory. An example of a 2-way set
associative cache is shown in Figure , which shows a cache containing a total of 2K lines, or 1 K
sets, each set being 2-way associative. (The sets correspond to the rows in the figure.)
When a cache miss occurs at a given index, one of the n entries in the set must be replaced.
Three common replacement strategies are:
Random -- the location for the value to be replaced is chosen at random from all n of the
cache locations at that index position. In a 2-way set associative cache, this can be
accomplished with a single modulo 2 random variable obtained, say, from an internal
clock.
First in, first out (FIFO) -- here the first value stored in the cache, at each index position,
is the value to be replaced. For a 2-way set associative cache, this replacement strategy
can be implemented by setting a pointer to the previously loaded word each time a new
word is stored in the cache; this pointer need only be a single bit. (For set sizes > 2, this
algorithm can be implemented with a counter value stored for each ``line'', or index in the
cache, and the cache can be filled in a ``round robin'' fashion).
Least recently used (LRU) -- here the value which was actually used least recently is
replaced. In general, it is more likely that the most recently used value will be the one
required in the near future. For a 2-way set associative cache, this is readily implemented
by adding a single ``USED'' bit to each cache location: when a value is accessed, the
USED bit of the other word at that index is set, and the bit of the accessed word is reset.
The value to be replaced is then the value with the USED bit set. For an n-way set
associative cache, this strategy can be implemented by storing a modulo n counter with
each data word. (It is an interesting exercise to determine exactly what must be done in
this case. The required circuitry may become somewhat complex, for large n.)
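As a sketch of the USED-bit idea for a 2-way set associative cache, the following Python models a single set; the class and method names are illustrative, and bare tags stand in for full cache lines.

```python
# LRU replacement in one set of a 2-way set-associative cache, using a
# single bit per set: accessing way w marks the OTHER way as the one
# to replace next (the least recently used way).

class TwoWaySet:
    def __init__(self):
        self.tags = [None, None]
        self.lru = 0          # index of the least-recently-used way

    def access(self, tag):
        """Return True on a hit; on a miss, replace the LRU way."""
        if tag in self.tags:
            way = self.tags.index(tag)
            hit = True
        else:
            way = self.lru     # victim is the least recently used way
            self.tags[way] = tag
            hit = False
        self.lru = 1 - way     # the other way is now least recently used
        return hit

s = TwoWaySet()
print(s.access('A'), s.access('B'), s.access('A'), s.access('C'))
# → False False True False: misses for A and B, a hit for A, then C
#   replaces B, since A was used more recently
```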
Cache memories normally allow one of two things to happen when data is written into a memory
location for which there is a value stored in cache:
Write through cache -- both the cache and main memory are updated at the same time.
This may slow down the execution of instructions which write data to memory, because
of the relatively longer write time to main memory. Buffering memory writes can help
speed up memory writes if they are relatively infrequent, however.
Write back cache -- here only the cache is updated directly by the CPU; the cache
memory controller marks the value so that it can be written back into memory when the
word is removed from the cache. This method is used because a memory location may
often be altered several times while it is still in cache without having to write the value
into main memory. This method is often implemented using an ``ALTERED'' bit in the
cache. The ALTERED bit is set whenever a cache value is written into by the processor.
Only if the ALTERED bit is set is it necessary to write the value back into main memory
(i.e., only values which have been altered must be written back into main memory). The
value should be written back immediately before the value is replaced in the cache.
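A minimal sketch of the write back policy with an ALTERED bit might look like this (class name, addresses, and data values are illustrative):

```python
# Write-back cache line with an ALTERED (dirty) bit: writes update only
# the cache; main memory is updated at eviction, and only if the line
# was altered while it was cached.

class WriteBackLine:
    def __init__(self, tag, data):
        self.tag, self.data, self.altered = tag, data, False

    def write(self, data):
        self.data = data
        self.altered = True    # mark the line dirty

    def evict(self, memory):
        """Write the line back to memory only if it was altered."""
        if self.altered:
            memory[self.tag] = self.data
        self.altered = False

memory = {0x40: 1}
line = WriteBackLine(0x40, memory[0x40])
line.write(2)
line.write(3)          # altered twice without touching main memory
print(memory[0x40])    # → 1: memory not yet updated
line.evict(memory)
print(memory[0x40])    # → 3: written back once, at eviction
```

Note that the two intermediate values never reach main memory; this is exactly the saving the text describes for locations altered several times while in cache.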
Address space and memory space are each divided into fixed-size groups of words called blocks
or pages (for example, groups of 1K words).
Organization of the memory mapping table in a paged system
Page Table
A straightforward design is an n-entry table in memory, which gives inefficient storage space
utilization, since n - m entries of the table are empty.
A more efficient method is an m-entry page table made of an associative memory of m words,
each word holding a (Page Number : Block Number) pair.
Figure: An associative-memory page table. The virtual address is split into a page number and a
line number; the argument register holds the page number (e.g. 1 0 1), the key register masks the
page-number field, and each word of the associative memory holds a (Page Number : Block
Number) pair.
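The associative page table lookup can be sketched as follows, assuming 1K-word pages as above; the table contents and helper name are illustrative, and a loop stands in for the parallel associative search.

```python
# Sketch of an m-entry associative page table: each entry holds a
# (page number, block number) pair for a page resident in memory.
# A lookup that finds no matching entry is a page fault.

PAGE_SIZE = 1024   # 1K words per page/block

def translate(page_table, page, line):
    """Map a virtual (page, line) address to a physical address."""
    for entry_page, block in page_table:
        if entry_page == page:           # associative match on page no.
            return block * PAGE_SIZE + line
    return None                           # page fault

table = [(0b101, 0b00), (0b001, 0b11), (0b010, 0b01)]  # illustrative
print(translate(table, 0b101, 57))   # → 57: page 5 maps to block 0
print(translate(table, 0b111, 57))   # → None: page not resident (fault)
```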
Page Fault
1. Trap to the OS.
2. Check that the page reference was legal and determine the location of the page on the
backing store (disk).
3. Save the registers and program state for the other user.
4. Correct the page tables (the desired page is now in memory).
5. Restore the user registers, program state, and new page table, then resume the interrupted
instruction.
Processor architecture should provide the ability to restart any instruction after a page fault.
4. How many bits of information can each memory cell in a computer chip hold?
a) 0 bit
b) 1 bit
c) 8 bits
5. What type of computer chips are said to be volatile?
a) ROM Chips
b) RAM Chips
c) DRAM Chips
6. The interface between level-2 (operating system) and level-1 (Microprogram) of a computer
design is called:
a).Computer architecture
b).Virtual machine interface
c).User interface
d).All of the above
9. A microcomputer has a primary memory of 640K. What is the exact number of bytes contained
in this memory?
Answers
The sequence of instructions read from memory constitutes an instruction stream. The operations
performed on the data in the processor constitutes a data stream. Parallel processing may occur in
the instruction stream, in the data stream, or in both. Flynn's classification divides computers into
four major groups as follows:
SISD represents the organization of a single computer containing a control unit, a processor unit
and a memory unit. Instructions are executed sequentially, and the system may or may not have
internal parallel processing capabilities. Parallel processing in this case may be achieved by
means of multiple functional units or by pipeline processing.
SIMD represents an organization that includes many processing units under the supervision of a
common control unit. All processors receive the same instruction from the control unit but operate
on different items of data. The shared memory unit must contain multiple modules so that it
can communicate with all the processors simultaneously. The MISD structure is only of theoretical
interest since no practical system has been constructed using this organization. MIMD
organization refers to computer systems capable of processing several programs at the same time.
Most multiprocessor and multicomputer systems can be classified in this category.
Flynn's classification depends on the distinction between the performance of the control unit and
the data-processing unit. It emphasizes the behavior characteristics of the computer system
rather than its operational and structural interconnections. One type of parallel processing that
does not fit Flynn's classification is pipelining.
8.2 Pipelining
Pipelining is a technique of decomposing a sequential process into suboperations, with each
subprocess being executed in special dedicated segment that operates concurrently with all other
segments. A pipeline can be visualized as a collection of processing segments through which
binary information flows.
Each segment performs partial processing dictated by the way the task is partitioned. The result
obtained from the computation in each segment is transferred to the next segment in the pipeline.
The final result is obtained after the data have passed through all segments. The name "pipeline"
implies a flow of information analogous to an industrial assembly line. It is characteristic of
pipelines that several computations can be in progress in distinct segments at the same time. The
overlapping of computation is made possible by associating a register with each segment in the
pipeline. The registers provide isolation between each segment so that each can operate on
distinct data simultaneously.
Perhaps the simplest way of viewing the pipeline structure is to imagine that each segment
consists of an input register followed by a combinational circuit. The register holds the data and
the combinational circuit performs the suboperation in the particular segment. The output of the
combinational circuit in a given segment is applied to the input register of the next segment. A
clock is applied to all registers after enough time has elapsed to perform all segment activity. In
this way the information flows through the pipeline one step at a time.
The pipeline organization will be demonstrated by means of a simple example. Suppose that we
want to perform the combined multiply and add operations with a stream of numbers.
Ai*Bi + Ci
for i = 1, 2, 3, . . . , 7
Each suboperation is to be implemented in a segment within a pipeline. Each segment has one or
two registers and a combinational circuit as shown in Figure 8.2. R1 through R5 are registers that
receive new data with every clock pulse. The multiplier and adder are combinational circuits. The
suboperations performed in each segment of the pipeline are as follows:
R1 <-- Ai, R2 <-- Bi        Input Ai and Bi
R3 <-- R1*R2, R4 <-- Ci     Multiply and input Ci
R5 <-- R3 + R4              Add Ci to product
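The three suboperations can be simulated directly. In the following sketch, register names follow the text and all registers are clocked once per loop iteration; the function name and the two drain pulses at the end are illustrative details of the simulation, not of the hardware.

```python
# Simulate the three-segment pipeline computing Ai*Bi + Ci.
# One loop iteration models one clock pulse applied to all registers.

def pipeline(A, B, C):
    n = len(A)
    R1 = R2 = R3 = R4 = None
    results = []
    for t in range(n + 2):               # n pulses to feed, 2 to drain
        # All registers are clocked together: compute next values first.
        new_R5 = R3 + R4 if R3 is not None else None   # segment 3: add
        new_R3 = R1 * R2 if R1 is not None else None   # segment 2: multiply
        new_R4 = C[t - 1] if 1 <= t <= n else None     # Ci enters segment 2
        new_R1 = A[t] if t < n else None               # segment 1: input Ai
        new_R2 = B[t] if t < n else None               # segment 1: input Bi
        R1, R2, R3, R4 = new_R1, new_R2, new_R3, new_R4
        if new_R5 is not None:
            results.append(new_R5)
    return results

print(pipeline([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # → [11, 18, 27]
```

The first result appears after three pulses (the pipeline fill time); after that, one result emerges per pulse, which is the source of the pipeline's speedup.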
Consider a computer with an instruction fetch unit and an instruction execution unit designed to
provide a two-segment pipeline. The instruction fetch segment can be implemented by means of
a first-in, first-out (FIFO) buffer. This is a type of unit that forms a queue rather than a stack.
Whenever the execution unit is not using memory, the control increments the program counter
and uses its address value to read consecutive instructions from memory. The instructions are
inserted into the FIFO buffer so that they can be executed on a first-in, first-out basis. Thus an
instruction stream can be placed in a queue, waiting for decoding and processing by the
execution segment. The instruction stream queuing mechanism provides an efficient way for
reducing the average access time to memory for reading instructions. Whenever there is space in
the FIFO buffer, the control unit initiates the next instruction fetch phase. The buffer acts as a
queue from which control then extracts the instructions for the execution unit.
Computers with complex instructions require other phases in addition to the fetch and execute to
process an instruction completely. In the most general case, the computer needs to process each
instruction with the following sequence of steps.
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.
There are certain difficulties that will prevent the instruction pipeline from operating at its
maximum rate. Different segments may take different times to operate on the incoming
information. Some segments are skipped for certain operations. For example, a register mode
instruction does not need an effective address calculation. Two or more segments may require
memory access at the same time, causing one segment to wait until another is finished with the
memory. Memory access conflicts are sometimes resolved by using two memory buses for
accessing instructions and data in separate modules. In this way, an instruction word and a data
word can be read simultaneously from two different modules.
The design of an instruction pipeline will be most efficient if the instruction cycle is divided into
segments of equal duration. The time that each step takes to fulfill its function depends on the
instruction and the way it is executed.
Vector Operations
Many scientific problems require arithmetic operations on large arrays of numbers. These
numbers are usually formulated as vectors and matrices of floating-point numbers. A vector is
an ordered set of a one-dimensional array of data items. A vector V of length n is represented as
a row vector by V = [V1 V2 V3 ... Vn]. It may be represented as a column vector if the data items
are listed in a column. A conventional sequential computer is capable of processing operands one
at a time. Consequently, operations on vectors must be broken down into single computations
with subscripted variables. The element Vi of vector V is written as V(I) and the index I refers to
a memory address or register where the number is stored. To examine the difference between a
conventional scalar processor and a vector processor, consider the following Fortran DO loop:
DO 20 I = 1, 100
20 C(I) = B(I) + A(I)
This is a program for adding two vectors A and B of length 100 to produce a vector C. This is
implemented in machine language by the following sequence of operations.
Initialize I = 0
20 Read A(I)
Read B(I)
Store C(I) = A(I) + B(I)
Increment I = I + 1
If I <= 100 go to 20
Continue
This constitutes a program loop that reads a pair of operands from arrays A and B and performs a
floating-point addition. The loop control variable is then updated and the steps repeat 100 times.
A computer capable of vector processing eliminates the overhead associated with the time it
takes to fetch and execute the instructions in the program loop. It allows operations to be
specified with a single vector instruction of the form
C(1:100) = A(1:100) + B(1:100)
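The two forms can be contrasted in a short sketch: scalar_add mirrors the explicit machine-language loop above, with its visible loop control, while vector_add plays the role of the single vector instruction. Both function names are illustrative.

```python
# Scalar loop vs. vector operation for C(1:100) = A(1:100) + B(1:100).

def scalar_add(A, B):
    """Element-by-element loop, as in the machine-language sequence:
    each iteration carries the read/store/increment/branch overhead."""
    C = [0] * len(A)
    I = 0
    while I < len(A):
        C[I] = A[I] + B[I]
        I += 1
    return C

def vector_add(A, B):
    """One 'vector instruction': no per-element loop control is visible."""
    return [a + b for a, b in zip(A, B)]

A = list(range(1, 101))      # 1, 2, ..., 100
B = list(range(100, 0, -1))  # 100, 99, ..., 1
assert scalar_add(A, B) == vector_add(A, B) == [101] * 100
```

In real vector hardware the saving is not merely notational: the fetch, decode, and branch overhead of the loop is paid once for the whole vector instead of once per element.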
Vector Instructions
An SIMD array processor is a computer with multiple processing units operating in parallel. The
processing units are synchronized to perform the same operation under the control of a common
control unit, thus providing a single instruction stream, multiple data stream organization. A
general block diagram of an array processor is shown in the diagram below. It contains a set of
identical processing elements (PEs), each having a local memory M. Each processor includes an
ALU, a floating-point arithmetic unit, and working registers. The master control unit controls the
operations in the processor elements. The main memory is used for storage of the program. The
function of the master control unit is to decode the instructions and determine how the
instruction is to be executed. Scalar and program control instructions are directly executed
within the master control unit. Vector instructions are broadcast to all PEs simultaneously. Each
PE uses operands stored in its local memory. Vector operands are distributed to the local
memories prior to parallel execution of the instruction.
8.8 Reduced Instruction set computer: RISC VS CISC
Many computers have instruction sets that include more than 100 and sometimes even
more than 200 instructions. These computers also employ a variety of data types and a
large number of addressing modes. The trend toward computer hardware complexity was
influenced by various factors, such as upgrading existing models to provide more customer
applications, adding instructions that facilitate the translation from high-level language into
machine language programs, and striving to develop machines that move functions from
software implementation into hardware implementation. A computer with a large number
of instructions is classified as a complex instruction set computer, abbreviated CISC.
CISC Characteristics
1. A large number of instructions-typically from 100 to 250 instructions.
2. Some instructions that perform specialized tasks and are used infrequently.
3. A large variety of addressing modes-typically from 5 to 20 different modes.
4. Variable-length instruction formats
5. Instructions that manipulate operands in memory
RISC Characteristics
1. Relatively few instructions.
2. Relatively few addressing modes.
3. Memory access limited to load and store instructions.
4. All operations done within the registers of the CPU.
Q3. Which pipeline is used to implement floating-point operations, multiplication of fixed-
point numbers, and similar computations encountered in scientific problems?
a) Instruction Pipeline
b) Arithmetic Pipeline
c) RISC Pipeline
d) CISC Pipeline
Q4. SIMD does not include
a) ALU
b) Processing Element
c) Local memory
d) Control Unit
Q10. A measure used to evaluate supercomputers by their ability to perform a given
number of floating-point operations per second is referred to as
a) Flops
b) Bytes
c) Bits
d) Hz
Answers
1. (a), 2. (b), 3. (b), 4. (d), 5. (a), 6. (d), 7. (a), 8. (c), 9. (d), 10. (a)