
ABI 302

Digital and Computer Organization

ACeL, Amity University

The student will learn to develop simple assembly language programs and understand the
underlying operations of more complex programs using Intel's 8085 Microprocessor.
Syllabus
Course Objective:
The student will develop an understanding of the underlying operation of a modern digital
computer, identify and understand the various "building blocks" from which a modern computer
is constructed. The student will learn to develop simple assembly language programs and
understand the underlying operations of more complex programs using Intel's 8085
Microprocessor.

Course Contents:
Module I: Digital Logic Fundamentals
Boolean Algebra: Basic Functions, Manipulating Boolean functions, Basic Combinational Logic:
Adder/Subtractor, Decoders, Encoders, Multiplexers, Memory, Basic Sequential Circuits: Flip-
flops, Registers, Counters.

Module II
General Computer Architecture
Block Diagram of typical Computer, Memory Section, Input/Output Section, CPU, Registers,
Arithmetic Unit, Instruction handling Areas, Stacks
Micro operations: Register Transfer, Bus and Memory Transfer, Arithmetic Micro operations,
Logic Micro operations, Shift Micro operations, Arithmetic Logic Shift Unit

Module III
Basic Computer Organization and Design: Instruction Codes, Operation code, Timing and
Control, Instruction Cycle, Memory Reference Instructions, Input Output Instructions and
Interrupts
Control Memory: Control Word, Microinstruction, Microprogramming, Control Memory,
Hardwired
Module IV
Central Processing Unit: General Register Organization, Stack Organization, Instruction
Formats, Addressing Modes, RISC, CISC
Pipelining and Vector Processing: Parallel Processing, Pipelining, Arithmetic Pipeline,
Instruction Pipeline, Vector Processing, Array Processors

Module V
Input Output Organization: I/O Interface, Asynchronous Data Transfer, Modes of Transfer,
Priority Interrupt, DMA, IOP, Serial Communication
Memory Organization: Associative Memory, Cache Memory, Virtual Memory

Module VI
Introduction to Microprocessor: Machine Language, Assembly Language, Assembler, High
Level Language, Compiler, Interpreter, Internal Architecture of the 8085.

Text:
Computer System Architecture, M. M. Mano, Pearson Education.

References:
Computer Architecture and Organization, J. P. Hayes, TMH.
Introduction to Microprocessors: Software, Hardware, Programming, Lance A. Leventhal.
Table of Contents
Syllabus .......................................................................................................................................... 2
Chapter 1 : Digital Logic Fundamentals .................................................................................... 7
1.1 Overview ............................................................................................................................... 7
1.2 Boolean Algebra .................................................................................................................... 7
1.3 Combinatorial Logic Circuit ............................................................................................... 11
1.4 Multiplexers ....................................................................................................................... 16
1.5 Decoders and Encoders ....................................................................................................... 18
1.6 Memory ............................................................................................................................... 20
1.7 Flip-Flop .............................................................................................................................. 21
1.8 Sequential circuit ................................................................................................................. 23
1.9 Register................................................................................................................................ 25
1.10 Counters ........................................................................................................................... 26
1.11 Digital Circuit ................................................................................................................... 27
MULTIPLE CHOICE QUESTIONS ........................................................................................ 28
Chapter 2 : Introduction to computer organization ................................................................ 30
2.1 Functional Units .................................................................................................................. 30
2.2.System Buses....................................................................................................................... 31
2.3 Memory Subsystem Organization ....................................................................................... 32
2.4 I/O Subsystem Organization and Interfacing .................................................................... 36
End chapter quizzes ................................................................................................................... 37
Chapter 3 : Programming the Basic computer ........................................................................ 40
3.1 Introduction: ....................................................................................................................... 40
3.2 The level of programming languages .................................................................................. 40
3.3 Translators ........................................................................................................................... 45
MULTIPLE CHOICE QUESTIONS ........................................................................................ 52
Chapter 4 : CPU ORGANIZATION........................................................................................ 54
4.1 General Register Organization ......................................................................................... 54
4.2 Stack Organization ............................................................................................................. 56
4.3 Register Organization .......................................................................................................... 57
4.3.1 Program Visible Register.............................................................................................. 57
4.4 Status and Control Registers ............................................................................................. 58
4.5 Register Transfer Language .............................................................................................. 60
4.6 Register Transfer ................................................................................................................. 60
4.7 Bus And Memory Transfer ........................................................................................... 61
4.8 Arithmetic Micro operations ............................................................................................ 63
4.9 Logic Microoperation ......................................................................................................... 64
4.10 Shift Micro operations...................................................................................................... 65
4.11 Control Unit....................................................................................................................... 66
4.11.1 Structure of control unit .............................................................................................. 66
4.11.2 Hardwired Control Unit .............................................................................................. 67
4.11.3 A Micro-programmed Control Unit .......................................................................... 68
4.12 Case Study: 8085 Microprocessor .................................................................................... 70
MULTIPLE CHOICE QUESTIONS ........................................................................................ 72
Chapter 5 : Computer Arithmetic ............................................................................................. 75
5.1 INTRODUCTION ............................................................................................................... 75
5.2 Number Systems ................................................................................................................ 76
5.3 Addition and Subtraction .................................................................................................... 79
5.4 Multiplication ................................................................................................................... 80
5.4.1 Booths Algorithm for multiplication ........................................................................ 81
5.5 Division .............................................................................................................................. 81
5.6 Floating Point ...................................................................................................................... 83
5.6.1 Normalization .............................................................................................................. 85
End chapter quizzes ................................................................................................................... 88
Chapter 6 : Input-Output Organization .................................................................................. 91
INTRODUCTION ..................................................................................................................... 91
6.1 PERIPHERAL DEVICES ................................................................................................. 92
6.1.1 Keyboard ...................................................................................................................... 92
6.1.2. Pointing devices ........................................................................................................... 92
6.1.3. High-degree of freedom input devices ....................................................................... 93
6.1.4. Imaging and Video input devices ............................................................................... 93
6.1.5. Medical Imaging ......................................................................................................... 93
6.2 OUTPUT DEVICES ........................................................................................................... 94
6.3 INPUT/OUTPUT INTERFACE ......................................................................................... 94
6.3.1 BUS AND INTERFACE MODULES ..................................................................... 95
6.3.2. CONNECTION OF I/O BUS ................................................................................... 96
6.4 ASYNCHRONOUS DATA TRANSFER .......................................................................... 98
6.4.1. Synchronous and Asynchronous Operations ............................................................... 98
6.4.2 Asynchronous Data Transfer ........................................................................................ 99
6.4. ASYNCHRONOUS SERIAL TRANSFER................................................................... 101
6.5 DIRECT MEMORY ACCESS ...................................................................................... 110
6.5.1. DMA I/O OPERATION .......................................................................................... 111
6.5.2. CYCLE STEALING................................................................................................. 111
6.6 PRIORITY INPUT/OUTPUT PROCESSOR .................................................................. 113
Multiple choice Questions....................................................................................................... 114
Chapter 7 : Memory Organization ......................................................................................... 117
7.1 Memory Hierarchy ............................................................................................................ 117
7.2 Main Memory.................................................................................................................... 118
7.2.1 RAM and ROM Chips ................................................................................................ 118
7.2.2 Memory Address maps ............................................................................................... 119
7.3. Auxiliary Memory ............................................................................................................ 121
7.3.1 Magnetic Tapes........................................................................................................... 121
7.3.2 Magnetic Disks .......................................................................................................... 122
7.3.3 RAID .......................................................................................................................... 122
7.4 Associative memory ......................................................................................................... 124
7.5 Cache memory................................................................................................................... 126
7.6 Virtual Memory Concept .................................................................................................. 133
7.6.1 Address Mapping ........................................................................................................ 134
7.6.2 Associative memory page table .................................................................................. 135
Multiple choice questions........................................................................................................ 137
Chapter 8 : Pipeline and Vector Processing ........................................................................... 139
8.1 Parallel Processing ............................................................................................................ 139
8.2 Pipelining .......................................................................................................................... 141
8.3 Arithmetic Pipeline ........................................................................................................... 142
8.4 Instruction Pipeline ........................................................................................................... 144
8.5 RISC Pipeline .................................................................................................................... 145
8.6 Vector Processing .............................................................................................................. 146
8.7 Array Processors ............................................................................................................... 149
8.8 Reduced Instruction set computer: RISC VS CISC .......................................................... 151
End Chapter quizzes:............................................................................................................... 152
Chapter 1 : Digital Logic Fundamentals

1.1 Overview

Gates, latches, memories and other logic components are used to design computer
systems and their subsystems.

A good understanding of digital logic is necessary in order to learn the fundamentals of
computing systems organization and architecture.

There are two types of digital logic:

Combinatorial logic: the output is a function of the current inputs only.

Sequential logic: the output is a function of the current inputs, previous inputs and
previous outputs.

Neither combinatorial logic nor sequential logic is better than the other. In practice, both
are used as appropriate in circuit design.

1.2 Boolean Algebra


A Boolean algebra value can be either true or false.

Digital logic uses 1 to represent true and 0 to represent false.

This section introduces the basic Boolean algebra functions and examines the
fundamental methods used to combine, manipulate and transform these functions.
AND

x y   out = xy
0 0   0
0 1   0
1 0   0
1 1   1

[Figure: AND gate symbol and timing diagram; out(t) = x(t) AND y(t).]

The output is 1 only if every input has the value 1.

More than two values can be AND-ed together. For example, xyz = 1 only if x=1, y=1
and z=1.

OR

x y   out = x+y
0 0   0
0 1   1
1 0   1
1 1   1

[Figure: OR gate symbol and timing diagram; out(t) = x(t) OR y(t).]

The output is 1 if at least one input has the value 1.

More than two values can be OR-ed together: x+y+z = 1 if any of x, y or z is 1.

XOR (Exclusive OR)

x y   out = x XOR y
0 0   0
0 1   1
1 0   1
1 1   0

[Figure: XOR gate symbol and timing diagram; out(t) = x(t) XOR y(t).]

The number of inputs that are 1 matters.

More than two values can be XOR-ed together.

General rule: the output is equal to 1 if an odd number of input values are 1, and 0 if an
even number of input values are 1.
NOT

x   x'
0   1
1   0

[Figure: inverter symbol and timing diagram; x'(t) is the complement of x(t).]

This function operates on a single Boolean value.

Its output is the complement of its input: an input of 1 produces an output of 0, and an
input of 0 produces an output of 1.

XNOR

x y   out = x XNOR y
0 0   1
0 1   0
1 0   0
1 1   1

[Figure: XNOR gate symbol and timing diagram; out(t) = x(t) XNOR y(t).]

The output is the complemented output of an XOR function.
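The five gates above can be sketched as one-line Python functions; running them reproduces each truth table. A minimal behavioural sketch (the function names are my own, chosen to match the gate names):

```python
# Behavioural sketch of the basic Boolean functions described above.
def AND(x, y):  return x & y        # 1 only when both inputs are 1
def OR(x, y):   return x | y        # 1 when at least one input is 1
def XOR(x, y):  return x ^ y        # 1 when an odd number of inputs are 1
def NOT(x):     return 1 - x        # complement of a single input
def XNOR(x, y): return NOT(XOR(x, y))  # complemented XOR

# Print the combined truth table for the two-input gates.
for x in (0, 1):
    for y in (0, 1):
        print(x, y, AND(x, y), OR(x, y), XOR(x, y), XNOR(x, y))
```

Each printed row matches the corresponding row of the truth tables above.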

1.3 Combinatorial Logic Circuit

[Figure: combinatorial logic circuit that implements the function xy' + yz.]

DeMorgan's Law

(ab)' = a' + b'
(a+b)' = a'b'

DeMorgan's law is a property for generating equivalent functions:

It allows conversion of an AND function to an equivalent OR function and vice-versa.

It may allow the simplification of complex functions, which in turn allows a simpler design.

It is useful in generating the complement of a function.

Generating the complement of a function using DeMorgan's law:

(xy' + yz)' = (xy')'(yz)' = (x' + y)(y' + z') = x'y' + x'z' + yy' + yz' (because yy' = 0) =>
(xy' + yz)' = x'y' + x'z' + yz'

x y z   x'y'   x'z'   yz'   x'y' + x'z' + yz'

0 0 0   1      1      0     1

0 0 1   1      0      0     1

0 1 0   0      1      1     1

0 1 1   0      0      0     0

1 0 0   0      0      0     0

1 0 1   0      0      0     0

1 1 0   0      0      1     1

1 1 1   0      0      0     0
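The DeMorgan complement derived above can be checked exhaustively in Python; the function names `f` and `f_complement` are illustrative, not part of the original text:

```python
from itertools import product

# Check the DeMorgan's-law derivation above:
# (xy' + yz)' should equal x'y' + x'z' + yz' on every input combination.
def f(x, y, z):
    """The original function xy' + yz."""
    return (x and not y) or (y and z)

def f_complement(x, y, z):
    """The complement derived via DeMorgan's law: x'y' + x'z' + yz'."""
    return ((not x) and (not y)) or ((not x) and (not z)) or (y and not z)

for x, y, z in product((0, 1), repeat=3):
    assert (not f(x, y, z)) == bool(f_complement(x, y, z))
print("complement verified on all 8 rows")
```

The assertion passing for all eight rows reproduces the truth table above.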
Karnaugh Map (K map)

Method for minimizing logic

Is used to represent the values of a function for different input values

The rows and columns of the K-map correspond to the possible values of the function's
input

Each cell in the K-map represents a minterm (i.e. a three-variable function has the
minterms x'y'z', x'y'z, x'yz', x'yz, xy'z', xy'z, xyz' and xyz).

Gray Code

The Gray code depends on the number of bits in its value.

The 1-bit Gray code serves as the basis for the 2-bit Gray code, the 2-bit Gray code is the
basis for the 3-bit Gray code, etc.

Gray code sequences are cycles: 000 -> 001 -> 011 -> 010 -> 110 -> 111 -> 101 -> 100 -
> 000

Adjacent values differ by only one bit
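The reflect-and-prefix construction described above (each n-bit code built from the (n-1)-bit code) can be sketched as:

```python
# Build the n-bit Gray code from the (n-1)-bit code: prefix the previous
# sequence with 0, then append its reflection prefixed with 1.
def gray_code(n):
    if n == 1:
        return ["0", "1"]
    prev = gray_code(n - 1)
    return ["0" + c for c in prev] + ["1" + c for c in reversed(prev)]

codes = gray_code(3)
print(codes)  # ['000', '001', '011', '010', '110', '111', '101', '100']

# Adjacent values (including the wrap-around) differ in exactly one bit.
for a, b in zip(codes, codes[1:] + codes[:1]):
    assert sum(x != y for x, y in zip(a, b)) == 1
```

The 3-bit output matches the cycle listed above, and the final loop confirms the one-bit-difference property.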


K-map Example

Let's consider (xy' + yz)' = x'y' + x'z' + yz'

Group together the 1s in the map:

g1: x'y'z' + x'y'z = x'y'(z' + z) = x'y'

g2: x'yz' + xyz' = yz'(x' + x) = yz'

g3: x'y'z' + x'yz' = x'z'(y' + y) = x'z'

To derive a minimal expression we must select the fewest groups that cover all active
minterms (1s). Groups g1 and g2 together cover all four 1s, so g3 is redundant:

(xy' + yz)' = x'y' + yz'

x y z   x'y' + yz'

0 0 0   1

0 0 1   1

0 1 0   1

0 1 1   0

1 0 0   0

1 0 1   0

1 1 0   1

1 1 1   0
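The claim that groups g1 and g2 alone cover all the 1s can be verified by comparing the minimal cover against the full three-term sum (function names are illustrative):

```python
from itertools import product

# Check that the K-map minimal cover x'y' + yz' agrees with the full
# three-term sum x'y' + x'z' + yz' on every input (so group g3 is redundant).
def full(x, y, z):
    return ((not x) and (not y)) or ((not x) and (not z)) or (y and not z)

def minimal(x, y, z):
    return ((not x) and (not y)) or (y and not z)

for x, y, z in product((0, 1), repeat=3):
    assert bool(full(x, y, z)) == bool(minimal(x, y, z))
print("minimal cover verified")
```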

K-map for more complex function


wxyz + wxyz + wxyz + wxyz + wxyz + wxyz
The final minimized function is:
xz + wx + wxyz
1.4 Multiplexers

A multiplexer is a selector: it chooses one of its data inputs and passes it to the output,
according to some other selection inputs.

Consider four binary data inputs as inputs of a multiplexer. Two select signals will
determine which of the four inputs is passed to the output.

Figure (a) presents the internal structure of a four-input multiplexer; (b) and (c) present
the multiplexer schematic representation with an active high enable signal (b) and an
active low enable signal (c).

Multiplexers can be cascaded to select from a large number of inputs, e.g. a 4-to-1
multiplexer made of 2-to-1 multiplexers.
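The select behaviour of a 4-to-1 multiplexer can be sketched in Python. This is a behavioural model (not the gate-level structure of figure (a)); the function name and the enable default are my own:

```python
# 4-to-1 multiplexer sketch: select bits s1 s0 pick one of four data inputs.
def mux4(d0, d1, d2, d3, s1, s0, enable=1):
    if not enable:                 # disabled: output forced to 0 in this sketch
        return 0
    return [d0, d1, d2, d3][s1 * 2 + s0]   # s1 s0 form a 2-bit index

# Select data input d2 with s1 s0 = 10:
print(mux4(0, 0, 1, 0, 1, 0))  # 1
```

Cascading (two 2-to-1 stages) would compute the same index one select bit at a time.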

1.5 Decoders and Encoders

A decoder accepts a binary value as input and decodes it.

It has n inputs and 2^n outputs, numbered from 0 to 2^n - 1.

Each output represents one minterm of the inputs.

The output corresponding to the value of the n inputs is activated.

For example, a decoder with three inputs and eight outputs will activate output 6
whenever the input values are 110.

Figure (a) shows a two-to-four decoder internal structure; (b) and (c) show its schematic
representation with an active high enable signal and an active low enable signal.

For inputs S1 S0 = 00, 01, 10 and 11, the outputs 0, 1, 2 and 3 respectively are active.

As with the multiplexer, the enable signal can tri-state all outputs.


Decoders can have active high or active low enable signals.

Variants:

have active low outputs (the selected output has a value 0 and all the other
outputs have a value 1)

output all 0 when not enabled instead of state Z (the ones in the figure).

Encoders

The encoder is the exact opposite of the decoder.

It receives 2^n inputs and outputs an n-bit value corresponding to the one input that has
a value of 1.

A 4-to-2 encoder and its schematic representations are presented in (a), (b) and (c).
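A behavioural sketch of an n-to-2^n decoder and the opposite encoder (function names are illustrative; the encoder assumes exactly one input line is active):

```python
# Decoder: activate exactly the output line whose index equals the n-bit input.
def decode(value, n):
    return [1 if i == value else 0 for i in range(2 ** n)]

# Encoder: return the index of the single active input line.
def encode(lines):
    return lines.index(1)

outputs = decode(6, 3)     # a 3-to-8 decoder with input 110
print(outputs)             # only output 6 is active
print(encode(outputs))     # 6 -- the encoder recovers the input value
```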

1.6 Memory

The memory unit is an essential component in any digital computer, since it is needed for
storing programs and data. Most general purpose computers would run more efficiently if they
were equipped with additional storage devices beyond the capacity of the main memory. The
memory unit that communicates directly with the CPU is called the main memory. Devices that
provide backup storage are called auxiliary memory. The most common auxiliary devices are
magnetic disks and tapes; they are used for storing system programs, large data files and other
backup information. Only the programs and data currently needed by the processor reside in
main memory. All other information is stored in auxiliary memory and transferred to the main
memory when needed.

The memory hierarchy system consists of all storage devices employed in a computer
system, from the slow but high-capacity auxiliary memory, to a relatively faster main memory,
to an even smaller and faster cache memory accessible to the high-speed processing logic. The
goal of the memory hierarchy is to obtain the highest possible access speed while minimizing
the total cost of the memory system.
1.7 Flip-Flop

Every digital circuit is likely to have a combinational circuit; most systems encountered also
include storage elements, which are described in terms of sequential circuits. The most common
type of sequential circuit is the synchronous one. The storage elements employed in clocked
sequential circuits are called flip-flops. A flip-flop is a binary cell capable of storing one bit of
information.

[Figures: Basic RS Flip-Flop (NAND); Clocked D Flip-Flop; Clock Pulse Definition;
Master-Slave Flip-Flop; T Flip-Flop; JK Flip-Flop.]
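The storage behaviour of a clocked D flip-flop can be sketched as follows. This is a behavioural model only (set-up/hold timing and the master-slave internals are ignored; the class name is my own):

```python
# A clocked D flip-flop sketch: the stored bit Q changes only when the
# circuit is clocked, which is what makes flip-flops usable as storage
# elements in synchronous sequential circuits.
class DFlipFlop:
    def __init__(self):
        self.q = 0                 # the single stored bit

    def clock(self, d):
        """On a clock edge, capture input D into the stored bit Q."""
        self.q = d
        return self.q

ff = DFlipFlop()
ff.clock(1)
print(ff.q)  # 1 -- the bit is held until the next clock edge
ff.clock(0)
print(ff.q)  # 0
```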

1.8 Sequential Circuit

A sequential circuit is an interconnection of flip-flops and gates. The gates by themselves
constitute a combinational circuit, but when included with the flip-flops, the overall circuit is
classified as a sequential circuit.
State Tables and State Diagrams

[Figure: (a) a state table listing present state y, input x, next state Y and output z;
(b) the corresponding state diagram, with edges labelled x/z (input/output) and nodes
labelled with the present state.]

Sequential Circuit Example

(a) State table (entries are next state/output for input x = 0 and x = 1):

Present state   x = 0   x = 1
A               D/0     C/1
B               B/1     A/0
C               C/1     D/0
D               A/0     B/1

(b) [Figure: the corresponding state diagram over states A, B, C, D, with edges
labelled x/z.]
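The state table in (a) can be encoded and simulated directly; the dictionary below is a transcription of that table (the `run` helper is my own):

```python
# State table as a dictionary: (present state, input x) -> (next state, output z).
TABLE = {
    ("A", 0): ("D", 0), ("A", 1): ("C", 1),
    ("B", 0): ("B", 1), ("B", 1): ("A", 0),
    ("C", 0): ("C", 1), ("C", 1): ("D", 0),
    ("D", 0): ("A", 0), ("D", 1): ("B", 1),
}

def run(start, inputs):
    """Apply an input sequence, returning the final state and output sequence."""
    state, outputs = start, []
    for x in inputs:
        state, z = TABLE[(state, x)]
        outputs.append(z)
    return state, outputs

print(run("A", [1, 0, 1]))  # ('D', [1, 1, 0])
```

Starting in A with inputs 1, 0, 1 the machine moves A -> C -> C -> D, emitting 1, 1, 0.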
1.9 Register
A register is a group of flip-flops, with each flip-flop capable of storing one bit of information.
An n-bit register has a group of n flip-flops and is capable of storing any binary information of
n bits. In addition to flip-flops, a register may have combinational gates that perform certain
data-processing tasks. The flip-flops hold the binary information and the gates control when
and how new information is transferred into the register.

4-Bit Parallel Register

D and Q are both sets of lines, with the number of lines equal to the width of each register.
There are often multiple address ports, as well as additional data ports.
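A behavioural sketch of the 4-bit parallel register, with a load input standing in for the gates that control when new data is transferred in (class and parameter names are my own):

```python
# 4-bit parallel register sketch: on a clock pulse with load asserted, all
# four D inputs are captured at once; otherwise the contents are held.
class Register4:
    def __init__(self):
        self.q = [0, 0, 0, 0]      # one stored bit per flip-flop

    def clock(self, d, load=1):
        if load:                   # the gates controlling data transfer
            self.q = list(d)
        return self.q

r = Register4()
r.clock([1, 0, 1, 1])
print(r.q)                   # [1, 0, 1, 1]
r.clock([0, 0, 0, 0], load=0)
print(r.q)                   # unchanged: [1, 0, 1, 1]
```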
1.10 Counters
A register that goes through a predetermined sequence of states upon the application of input
pulses is called a counter. The input pulses may be clock pulses or may originate from external
sources. They may occur at uniform intervals of time or at random. Counters are found in
almost all equipment containing digital logic.

Binary counter

Count-down counter: a binary counter with a reverse count, starting from 15 and counting
down.

In a count-down counter the least significant bit is complemented with every count
pulse. Any other bit is complemented if the previous bit goes from 0 to 1.

We can use the same counter design with negative-edge flip-flops to make a count-down
counter.

A BCD counter starts from 0 and ends at 9.

State diagram of Binary counter.
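The counting behaviour above can be sketched as a small up/down counter model; modulo-16 arithmetic stands in for the flip-flop toggling (class and method names are my own):

```python
# 4-bit binary up/down counter sketch: count modulo 16 in either direction,
# matching the count-down behaviour described above (15, 14, ..., 0, 15, ...).
class Counter:
    def __init__(self, bits=4, start=0):
        self.mask = (1 << bits) - 1        # e.g. 0b1111 for a 4-bit counter
        self.value = start & self.mask

    def pulse(self, down=False):
        """Apply one count pulse, up or down, wrapping around at the ends."""
        self.value = (self.value + (-1 if down else 1)) & self.mask
        return self.value

c = Counter(start=15)
print(c.pulse(down=True))   # 14
c2 = Counter(start=0)
print(c2.pulse(down=True))  # 15 -- wraps around, like the hardware counter
```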


1.11 Digital Circuit
MULTIPLE CHOICE QUESTIONS

1. Which of the following is not a Boolean Algebra function?


a) AND
b) OR
c) XAND
d) XNOR
2. Which of the following is a De Morgan law?
a) a+b = ab
b) (a-b) = ab
c) (ab)' = a' + b'
d) a+b = a + b
3. Transistors are made up of?
a) Lithium
b) Neon
c) Silicon
d) Nickel
4. How many types of memory are there?
a) 1
b) 2
c) 3
d) 4
5. Which of the following is not a counter?
a) Century counter
b) Ring Counter
c) Johnson Counter
d) Decade Counter
6. What is the main advantage of synchronous logic?
a) Complexity
b) Simplicity
c) Serial Port
d) Regularity
7. How many types of counters are there?
a) 6 type
b) 7 type
c) 8 type
d) 9 type
8. Asynchronous & Synchronous counter is created from?
a) Decoder
b) Encoder
c) JK flip Flop
d) D Flip flop
9. Which of the following is not a type of register?
a) Active Register
b) Data Register
c) Address Register
d) Vector Register

10. Which of the following is a type of register?

a) Speed Register
b) Shift Register
c) User- accessible Register
d) Constant Register

ANSWERS

1. c, 2. c, 3. c, 4. a, 5. a, 6. b, 7. c, 8. c, 9. a, 10. b
Chapter 2 : Introduction to computer organization

2.1 Functional Units


CPU - Central Processing Unit, which includes:
o ALU - Arithmetic and Logic Unit
o Set of registers (including the program counter, which keeps the address of the
next instruction to be performed)
o Control Unit - decodes instructions and controls internal data movement
among the different CPU parts
o Interface Unit - moves data between the CPU and other hardware components

Memory

o The Main Memory - Primary Storage or RAM (Random Access Memory)

Holds data and pieces of program code (instructions)
Consists of a large number of cells, each individually addressable.
A cell is called a byte and consists of 8 bits (binary elements holding
0 or 1)
Usually a word consists of 4 bytes.
In most cases volatile memory
o The Multilevel Cache - an intermediate stage between the ultra-fast registers
(inside the CPU) and the much slower main memory.
The most actively used information in the main memory is duplicated
in the cache, which is faster but of much lesser capacity.
2.2 System Buses
2.2.1 Memory Bus
The memory bus is the set of wires that is used to carry memory addresses and data to and from
the system RAM. The memory bus in most PCs is also shared with the processor bus, connecting
the system memory to the processor and the system chipset. The memory bus is part of the PC's
hierarchy of buses, which are high-speed communications channels used within the computer to
transfer information between its components.
The memory bus is made up of two parts:
Data bus: carries actual memory data within the PC
Address bus: selects the memory address that the data will come from or go to on a read or
write

2.2.2 I/O Bus


The I/O bus facilitates communication between CPU and all the interface units.
I/O bus is of two types:
Isolated I/O
o Separate I/O read/write control lines in addition to memory read/write control
lines
o Separate (isolated) memory and I/O address spaces
o Distinct input and output instructions
Memory-mapped I/O
o A single set of read/write control lines (no distinction between memory and I/O
transfer)
o Memory and I/O addresses share the common address space which reduces
memory address range available
o No specific input or output instruction, as a result the same memory reference
instructions can be used for I/O transfers
o Considerable flexibility in handling I/O operations
2.3 Memory Subsystem Organization
2.3.1. Memory Hierarchy

The purpose of the memory hierarchy is to obtain the highest possible access speed while
minimizing the total cost of the memory system.
2.3.2. Memory Location: Address Map
An address space is assigned to each memory chip. Example: 512 bytes of RAM and 512
bytes of ROM.

Component   Hexa address   Address bus
                           10 9 8 7 6 5 4 3 2 1
RAM 1       0000 - 007F     0 0 0 x x x x x x x
RAM 2       0080 - 00FF     0 0 1 x x x x x x x
RAM 3       0100 - 017F     0 1 0 x x x x x x x
RAM 4       0180 - 01FF     0 1 1 x x x x x x x
ROM         0200 - 03FF     1 x x x x x x x x x
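The address map above can be turned into a small chip-select function: address line 10 distinguishes ROM from RAM, and lines 9-8 pick one of the four RAM chips. A sketch only (the decoding in hardware is done by the chip-select logic, not software):

```python
# Decode a 10-bit address according to the address map above.
def select_chip(address):
    if address >> 9 & 1:                 # line 10 (bit index 9) set -> ROM
        return "ROM"
    # lines 9-8 (bit indices 8-7) select one of the four 128-byte RAM chips
    return "RAM " + str((address >> 7 & 0b11) + 1)

print(select_chip(0x0000))  # RAM 1
print(select_chip(0x00A0))  # RAM 2
print(select_chip(0x0150))  # RAM 3
print(select_chip(0x0200))  # ROM
```

The remaining lines 1-7 (marked x in the table) address a byte inside the selected RAM chip; lines 1-9 do the same inside the ROM.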

2.3.3. Types of Memory


Main Memory
o Consists of RAM and ROM chips
o Holds instructions and data for the programs in execution
o Volatile
o Fast access
o Small storage capacity

[Figure: typical RAM chip - two chip select inputs (CS1, CS2), read (RD) and write (WR)
controls, a 7-bit address input (AD 7) and an 8-bit data bus, for a 128 x 8 RAM.]

[Figure: typical ROM chip - chip select inputs (CS1, CS2), a 9-bit address input (AD 9)
and an 8-bit data bus, for a 512 x 8 ROM.]

Auxiliary Memory
o Information organized on magnetic tapes
o Auxiliary memory holds programs and data for future use
o Non-volatile data remains even when the memory device is taken off-power
o Slower access rates
o Greater storage capacity

Associative Memory
o Accessed by the content of the data rather than by an address
o Also called Content Addressable Memory (CAM)
o Used in very high speed searching applications
o Much faster than RAM in virtually all search applications
o Higher costs
[Figure: CAM hardware organization - an argument register (A) and a key register (K)
feed an associative memory array and logic of m words with n bits per word; a match
register (M) flags the matching words; input, read and write controls.]

Cache Memory

o Cache is a fast, small-capacity memory that holds the information most likely
to be accessed
o The property of locality of reference is what makes cache memory systems work
o Locality of Reference - the references to memory at any given time interval
tend to be confined within a localized area. This area contains a set of
information, and the membership changes gradually as time goes by.
Temporal Locality - information which will be used in the near future is
likely to be in use already
Spatial Locality - if a word is accessed, adjacent (nearby) words are likely
to be accessed soon
2.4 I/O Subsystem Organization and Interfacing
2.4.1. A Model of I/O Subsystem Organization

[Figure: layered model of the I/O subsystem - user processes at the top, then directory
management, the file system, logical I/O / communication architecture, physical
organization, device I/O, scheduling and control, and the hardware at the bottom; the
columns correspond to a local peripheral device, a communications port and the file
system.]

2.4.2. I/O Interface

Provides a method for transferring information between internal storage (such as memory
and CPU registers) and external I/O devices
Resolves the differences between the computer and peripheral devices
Peripherals - Electromechanical Devices
CPU or Memory - Electronic Device

Data Transfer Rate

o Peripherals - usually slower
o CPU or Memory - usually faster than peripherals; some synchronization
mechanism may be needed

Unit of Information
o Peripherals - byte, block, etc.
o CPU or Memory - word

[Figure: I/O interface unit - a bidirectional data bus connects the CPU, through bus
buffers and the internal bus, to four registers: the Port A and Port B I/O data registers,
a control register and a status register. The chip select (CS), register select (RS1, RS0),
I/O read (RD) and I/O write (WR) lines, decoded by the timing and control block,
determine which register is accessed; the ports connect to the I/O device.]

End chapter quizzes


Q1. The control unit basically
a) Decode instruction
b) Moves the data
c) It is a logical unit
d) Used to save the data in register

Q2. CPU does not include


a) ALU
b) Registers
c) Control Unit
d) System Bus
Q3. Memory bus is made up of two parts
a) Data Bus and Address bus
b) Data bus and I/O bus
c) Main memory and address bus
d) Address bus and I/O bus

Q4. RAM and ROM chips are, respectively,


a) Volatile and Non volatile
b) Non volatile and Volatile
c) Both are volatile
d) Both are non volatile

Q5. The I/O interface works for


a) Synchronization
b) Transferring information
c) Managing memory
d) Process management

Q6. Special high-speed memory is known as


a) Auxiliary memory
b) Main memory
c) Associative memory
d) Cache memory

Q7. Memory is assigned with the help of


a) Memory location
b) Memory management
c) Memory address map
d) By assignment
Q8. How many parts does the CPU (Central Processing Unit) have?
a) 5 parts
b) 4 parts
c) 3 parts
d) 2 parts

Q9. Associative Memory is also known as


a) RAM
b) ROM
c) CAD
d) CAM

Q10. What is the basic function of I/O?


a) Connect to CPU
b) Connect to Memory
c) Connect to Output Devices
d) Connect to ROM

ANSWERS
1. c, 2.d, 3.a, 4.a, 5.b, 6.d, 7.c, 8.b, 9.d, 10. c
Chapter 3 : Programming the Basic computer

3.1 Introduction:
A total computer system includes both hardware and software. Hardware consists of the physical
components and all associated equipment. Software refers to the programs that are written for the
computer. It is possible to be familiar with various aspects of computer software without being
concerned with how the computer hardware operates. It is also possible to design parts of the
hardware without a knowledge of its software capabilities. A program written by a user may be
either dependent on or independent of the physical computer that runs it. A programming
language is therefore an interface between the computer and the programmer: a set of instructions
written in a language that is translated into a machine-compatible language.

3.2 The level of programming languages

3.2.1 Low level Programming Language:


3.2.1.1. First generation language.
3.2.1.2. Second generation language.
Low-level programming languages include machine languages and assembly
languages. Both are machine dependent: programs written in low-level
languages for an IBM PC cannot run on a Macintosh computer. Low-level
languages are used to control the hardware directly; for example, driver
programs are written in low-level languages. Programs written in low-level
languages are also usually small in size and efficient.

3.2.2. High level Programming Language


1. Third Generation Language (3GL)
2. Fourth Generation Language (4GL)

3.2.3. Comparison : High & Low Level languages


3.2.1.1 First generation languages
The first generation programming languages are machine languages:
Defined by the hardware design of the computer
Machine dependent.
Instructions are formed from binary digits (0 or 1), corresponding to the ON or OFF state
of an electric circuit.
A machine instruction has two parts: op code and operand
1. Op code: the operation to be performed, acting like a verb - increment, add, copy, etc.
2. Operand: acts like a noun - the data the operation works on. For example, when
incrementing the value of A, A is the operand. It can be a value or the address of a
memory location. (Remark: in machine language a variable takes the form of an
address, i.e. address = variable.)
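The op-code/operand split above can be sketched in a few lines. This is an illustrative decoder for a hypothetical 8-bit format (4-bit op code in the upper half, 4-bit operand in the lower half); the mnemonic table is invented, not from the text.

```python
# Hypothetical op-code table for a toy 8-bit machine (illustrative only).
OPCODES = {0b0001: "INCREMENT", 0b0010: "ADD", 0b0011: "COPY"}

def decode(instruction):
    """Split an 8-bit instruction into its op code (verb) and operand (noun)."""
    opcode = (instruction >> 4) & 0xF   # upper nibble: operation to perform
    operand = instruction & 0xF         # lower nibble: value or memory address
    return OPCODES.get(opcode, "?"), operand

print(decode(0b0001_0101))  # INCREMENT the contents of address 5
```

Here the operand 5 plays the role of "A" in the remark above: in machine language it is simply an address.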

3.2.1.2. Second generation languages


Assembly languages: the user employs symbols (letters, numerals, or special
characters) for the operation part, the address part, and the other parts of the instruction
code. Each symbolic instruction can be translated into one binary-coded instruction.
This translation is done by a special program called an assembler. Because an
assembler translates the symbols, this type of symbolic program is referred to as an
assembly language program and the language is assembly language.
Instructions are 1:1 with machine instructions.
Symbolic instruction codes (or mnemonics):
1. Symbolic instruction codes replace binary digits.
2. They are meaningful abbreviations that substitute for the op code.
3. e.g. JMP A means jump to the address represented by A.
3.2.3 High-Level Programming Languages
Machine-independent
Third generation languages or above
More like human languages
Let programmers concentrate on the logic rather than the machine architecture.
These are special languages developed to reflect the procedures used
in the solution of a problem rather than be concerned with computer
hardware behavior, e.g. FORTRAN.

3.2.3.1. Third generation language (3GL)


More human-like than assembly language
Meet demands of efficiency and effectiveness
Examples: C, Fortran, COBOL, BASIC, PASCAL, etc.
Each high-level language statement is translated into many machine
instructions.

3.2.3.2. Fourth generation language (4GL)


Each 4GL instruction represents many 3GL statements.
Typical example: SQL (Structured Query Language).
e.g. SELECT * FROM student ORDER BY class, class_no;
will display all the records in table student in ascending order of class and
class_no.
If the same task is written in a 3GL, instructions have to be written to:
sort the records in ascending order of class and class_no; get each record, test
for end of file, and put each item on the screen; go back and repeat the operation
until there are no more records to process.
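The 3GL steps above can be sketched in Python against some hypothetical student records (field names mirror the SQL example; the data is invented for illustration):

```python
# Hypothetical student records matching the SQL example's table.
students = [
    {"class": "2B", "class_no": 7, "name": "Lee"},
    {"class": "1A", "class_no": 3, "name": "Chan"},
    {"class": "1A", "class_no": 1, "name": "Wong"},
]

# Step 1: sort the records in ascending order of class and class_no.
students.sort(key=lambda r: (r["class"], r["class_no"]))

# Steps 2-3: get each record and display it until no more records remain.
for record in students:
    print(record["class"], record["class_no"], record["name"])
```

A single SQL statement replaces all of this explicit sorting and looping, which is the point of the 4GL comparison.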
3.2.3. Comparing Low & High-level Languages (1)
1. Advantages of high-level language

Advantage of HIGH-level language            Reason
1. More effective                           Designed to solve problems
2. Easier to learn, write, read             Instructions are in an English-like
   and modify                               format
3. Programs are shorter in length           One high-level language statement takes
                                            the place of many machine instructions
4. Portable (can be used on different       Machine-independent
   computers with little modification)

2. Advantages of low-level language

Advantage of LOW-level language             Reason
1. More efficient                           Hardware features, such as bus width,
                                            are fully made use of
2. Requires smaller memory space to run     Fewer low-level instructions are needed
                                            to do the same task (no redundant
                                            instructions produced by a translator)
3. Precise control of hardware              Some low-level instructions are designed
                                            specifically for controlling the
                                            hardware at a low level

Examples of High Level Languages

Designed to solve different problems.


Ranging from business to games .
Some common programming languages:

1. PASCAL
Designed for teaching structured programming.

2. C
Designed to operate computer at low-level using high level language.
More readable and better structured than Assembly language.
Similar language C++ is popular in game programming.
3. Java
Includes some syntax of C.
Java programs can be called from HTML document or run directly.
Computer needs Java Virtual Machine to execute Java Programs e.g. Yahoo
Snooker .

4. COBOL
Designed for business applications
Wordy language
Readable for beginners
e.g. multiply hourly-rate by hours-worked giving gross-pay
i.e. gross-pay = hourly-rate * hours-worked. But it can be clumsy for
long programs.

5. BASIC
Designed to provide students with an easy-to-learn language.

Visual Basic
A version of BASIC
Specialized for developing Windows applications

Belong to Rapid Application Development (RAD)


Programming can be done by drag-and-drop
Programmers can quickly build an application

Visual Basic.NET
Newest version of Visual Basic
Supports all Web-based features
6. Script Languages
Interpreted and processed by software (e.g. a Web browser)
e.g. VBScript and JavaScript
have syntax similar to Visual Basic and Java respectively

Instructions are embedded in HTML documents


Increasing the function of HTML
Making Web pages more interactive

3.3 Translators
Translation involves three components: the source program, the translator, and the object
program. The source program is a text file used for storing source code, which must be translated
into machine instructions before execution. The translator is software that converts source code
into machine instructions; it is also used to discover syntax errors in a program. The object
program is a binary file (e.g. an EXE) used for storing the machine instructions run at execution time.

Types of translators:
3.3.1. Assembler
3.3.2. Compiler
3.3.3. Interpreter

3.3.1. Assemblers
An assembler is software that translates an assembly language program into machine
instructions, producing one machine instruction for each assembly language statement. It
translates the whole source program before execution and stores the result in an object program
(during execution, only the object program is needed).
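The one-statement-to-one-instruction translation an assembler performs can be sketched as follows. The mnemonic table and 8-bit encoding are invented for illustration; a real assembler also handles labels and directives.

```python
# Invented mnemonic table: each mnemonic maps to a 4-bit op code.
MNEMONICS = {"LDA": 0b0001, "ADD": 0b0010, "STA": 0b0011, "JMP": 0b0100}

def assemble(source):
    """Translate each 'MNEMONIC address' line into one 8-bit machine word."""
    program = []
    for line in source.strip().splitlines():
        mnemonic, address = line.split()
        # One machine instruction per assembly statement: op code | operand.
        word = (MNEMONICS[mnemonic] << 4) | (int(address) & 0xF)
        program.append(word)
    return program

object_program = assemble("LDA 4\nADD 5\nSTA 4")
print([format(w, "08b") for w in object_program])
```

The resulting list plays the role of the object program: the whole source is translated once, before execution.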
3.3.2 Compilers

Compiler (e.g. Visual Basic Make EXE)


Software
Translates a high-level language program into machine instructions
Translates the whole source program before execution
Stores the results in an object program

3.3.3. Interpreters (e.g. Visual Basic Run)


Software that translates ONE high-level language program statement at a
time, sends it to the computer for execution, and then proceeds to the NEXT
statement.
No object program is produced; when the program is run, both the
interpreter and the source code must be present.
An interpreted program runs slower than a compiled one because time is
spent on translation.
Liked by many programmers as it is more convenient for testing.

3.3.4. Comparison of Translators

Assembler / Compiler                                    Interpreter

Assembler: translates low-level language into           Translates high-level language
machine language; Compiler: translates high-level       into machine language
language into machine language
A new file, called the object program, is created       No new file is produced
Programs run faster                                     Programs run slower
Translates the whole source program                     Translates and executes one
                                                        statement at a time
Less convenient for the programmer: every change        More convenient for the
in the source program needs recompilation               programmer
The programmer can hide the source program              The source program must be
                                                        present
The translator is not necessary at run time             The translator is necessary
                                                        at run time
3.3.5. Instruction sets
An instruction set is the complete collection of instructions that are understood (executed) by a
CPU. It is machine code - binary code - and is usually represented by assembly codes.

3.3.5.1. Elements of a Machine Instruction: each instruction must contain the information
required by the CPU for execution. The elements of a machine instruction are:

Operation Code: Specifies the operation to be performed. The operation is specified by


a binary code, known as the operation code, or opcode.
Source operand reference: The operation may involve one or more source operands;
that is, operands that are the inputs for the operation.
Result operand reference: The operation may produce a result.
Next instruction reference: This tells the CPU where to fetch the next instruction after
the execution of this instruction is complete.

3.3.5.2. Source and result operands: an operand may reside in any of the following areas:

Main (or virtual) memory: The memory address must be supplied.


CPU register: CPU contains one or more registers that may be referenced by machine
instructions. If only one register exists, reference to it may be implicit. If more than one
register exists, then each register is assigned a unique number, and the instruction must
contain the number of the desired register.
I/O device: The instruction must specify the I/O module or device for the operation. If
memory-mapped I/O is used, this is just another memory address.
Each instruction is represented by a sequence of bits. The instruction is divided into fields,
corresponding to the constituent elements of the instruction.
3.3.5.3 Instruction Types

Example: High level language statement: X = X + Y


If we assume a simple set of machine instructions, this operation could be accomplished
with three instructions: (assume X is stored in memory location 624, and Y in memory
loc. 625.)
1. Load a register with the contents of memory location 624.
2. Add the contents of memory location 625 to the register,
3. Store the contents of the register in memory location 624.
As seen, a simple "C" (or BASIC) instruction, may require 3 machine instructions.
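The three-instruction sequence for X = X + Y can be traced with a small simulation. A dictionary stands in for memory and one variable stands in for the register; the sample data values are assumptions, only the addresses 624 and 625 come from the text.

```python
# Memory locations 624 (X) and 625 (Y) hold sample values (10 and 32).
memory = {624: 10, 625: 32}

acc = memory[624]        # 1. Load the register with the contents of 624 (X)
acc = acc + memory[625]  # 2. Add the contents of 625 (Y) to the register
memory[624] = acc        # 3. Store the register back in location 624

print(memory[624])  # X now holds X + Y, i.e. 42
```

Each of the three Python lines in the middle corresponds to one machine instruction, which is exactly the 1-to-3 expansion the text describes.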

The instructions fall into one of the following four categories:


Data processing: Arithmetic and logic instructions.
Data storage: Memory instructions.
Data movement: I/O instructions.
Control: Test and branch instructions.

What is the maximum number of addresses one might need in an instruction?

Virtually all arithmetic and logic operations are either unary (one operand) or binary (two
operands). The result of an operation must be stored, suggesting a third address. Finally,
after the completion of an instruction, the next instruction must be fetched, and its
address is needed.

This line of reasoning suggests that an instruction could be required to contain 4 address
references: two operands, one result, and the address of the next instruction. In practice,
the address of the next instruction is handled by the program counter; therefore most
instructions have one, two or three operand addresses.
Three-address instruction formats are not common, because they require a relatively long
instruction format to hold three address references.

Number of Addresses

Instruction Set Design


The fundamental design issues in designing an instruction set are:
Operation Repertoire: How many and which operations to provide, and how complex
operations should be.
Data Types: The various types of data upon which operations are performed.
Instruction Format: Instruction length (in bits), number of addresses, size of various
fields, and so on.
Registers: Number of CPU registers that can be referenced by instructions and their use.
Addressing: The mode or modes by which the address of an operand is specified.
These issues are highly interrelated and must be considered together in designing an instruction
set.

3.3.6. Types of Operands

3.3.6.1. Most important general categories of data are:


Addresses
Numbers
Characters
Logical data.

Numbers:
o Integer or fixed point
o Floating point (real numbers)
o Decimal

Characters:
o International Reference Alphabet (IRA) which is referred to as ASCII in the USA,
o EBCDIC character set used in IBM 370 machines.

Logical Data:
Boolean data. 1 = true, 0 = false.

Types of Operations
A set of general types of operations is as follows:
Data transfer: Move, Store, Load (Fetch), Exchange, Clear (Reset), set, Push, Pop.
Arithmetic: Add, Subtract, Multiply, Divide, Absolute, Negate, Increment, Decrement.
Logical: AND, OR, NOT, XOR, Test, Compare, Shift, Rotate, Set control variables.
Conversion: Translate, Convert.
I/O: Input (Read), Output (Write), Start I/O, Test I/O.
Transfer of Control: Jump (Branch), Jump Conditional, Jump to Subroutine, Return,
Execute, Skip, Skip Conditional, Halt, Wait (Hold), No Operation.
System Control: Instructions that can only be executed while the processor is in a
privileged state, or is executing a program in a special privileged area of memory.
These instructions are reserved for the use of the operating system.

3.3.7. Instruction Formats


Includes opcode
Includes (implicit or explicit) operand(s)
Usually more than one instruction format in an instruction set

Instruction Length:
Affected by and affects:
Memory size
Memory organization
Bus structure
CPU complexity
CPU speed
Trade off between powerful instruction repertoire and saving space

Allocation of Bits:
Number of addressing modes
Number of operands
Register versus memory
Number of register sets
Address range
Address granularity
Variable-Length Instructions:
1. Due to varying number of operands,
2. Due to varying length of opcode in some CPU's.

MULTIPLE CHOICE QUESTIONS

1. The speed at which the CPU processes data is measured in:
a) Megahertz
b) Gigahertz
c) Nanoseconds
d) A and B

2. Machine cycles are measured in nanoseconds or picoseconds.


a) True
b) False

3. Which register example holds the address of the current instruction being processed?
a) Program counter
b) Instruction register
c) Control Unit
d) Arithmetic Logic Unit

4. What is a machine cycle :


a) Data measured in megahertz
b) Data measured in gigahertz
c) A sequence of instruction perform to execute one program instruction
d) All of the Above
5. Most of the computers available today are known as
a). 3rd generation computers b). 4 th generation computers
c).5th generation computers d). 6th generation computers
6. What difference does the 5th generation computer have from other generation computers?
a). Technological advancement b). Scientific code
c). Object Oriented Programming d). All of the above

7.Which of the following language is translated in to pseudo code?


a). Assembly .language b). FORTRAN
c).PASCAL d). BASIC

8. Which translator translate high level into machine language


a). compiler and interpreter
b). compiler and assembler
c) assembler and interpreter

9.Which translator translate the whole source program before execution


a) compiler
b) assembler
c) interpreter

10. Instruction format does not include


a) Includes opcode
b) Includes (implicit or explicit) operand(s)
c) Usually more than one instruction format in an instruction set
d) Include only operators

Answers
1.a , 2.b, 3.b, 4.a, 5.b 6.a , 7.a ,8.a, 9. a,10.d
Chapter 4 : CPU ORGANIZATION

4.1 General Register Organization


In a computational device, memory access is the most time-consuming process, so it is
more convenient and more efficient to store intermediate values in processor registers.

Figure: Bus organization of seven CPU registers (R1-R7) - a clock and seven
load lines feed the registers; multiplexers MUX A and MUX B, controlled by
SELA and SELB, place two register values (or the external input) on bus A and
bus B, which form the ALU inputs; a 3 x 8 decoder driven by SELD routes the
ALU output back to the destination register; OPR selects the ALU operation.

The overall layout includes:

o Bus System
MUX A and MUX B select one of the 7 registers or the external data
input, via SELA and SELB
Buses A and B form the inputs to a common ALU
The 3 x 8 decoder selects the register (by SELD) that receives the
information from the ALU
o Control Word
Format: 14 binary selection inputs, with 3 bits for SELA, 3 for
SELB, 3 for SELD, and 5 for OPR

Encoding of register selection fields

o ALU - an operation is selected by the ALU operation selector (OPR).
The result of a microoperation is directed to a destination register selected
by a decoder (SELD).
Example of CPU Microprograms
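The 14-bit control word described above can be packed and unpacked with a few shifts and masks. The field widths (3 + 3 + 3 + 5) follow the text; the ordering of the fields within the word is an assumption made for illustration.

```python
# Assumed layout: | SELA (3) | SELB (3) | SELD (3) | OPR (5) | = 14 bits.
def pack(sela, selb, seld, opr):
    """Assemble a 14-bit control word from its four selection fields."""
    return (sela << 11) | (selb << 8) | (seld << 5) | opr

def unpack(word):
    """Recover (SELA, SELB, SELD, OPR) from a 14-bit control word."""
    return ((word >> 11) & 0b111, (word >> 8) & 0b111,
            (word >> 5) & 0b111, word & 0b11111)

word = pack(sela=2, selb=3, seld=1, opr=0b00101)
print(format(word, "014b"))  # 14-bit control word
print(unpack(word))          # fields recovered
```

Pack-then-unpack is a useful sanity check that the field boundaries are consistent.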

4.2 Stack Organization

o Stack - A storage device that stores information in such a manner that the
item stored last is the first item retrieved. Hence, also called last-in first-
out (LIFO) list.
o Use of Stack
Temporary storage during program runtime
Not fixed in size: grows and shrinks using push and pop
operations
Not typed: elements of list not defined as one type
Subroutines (functions, procedures)
Linkage: saving a link back to the caller (address of
calling routine)
Data: return value, parameters, local variables
Other saved information
Managed at the assembly language level
Rules for compiler, AL programmer
o Stack Pointer (SP) - A register that holds the address of the top item in
the stack. SP always points at the top item in the stack.
o Push - Operation to insert an item into the stack.
o Pop - Operation to retrieve an item from the stack.
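The stack pointer, push, and pop behavior described above can be sketched over a small array. The 8-slot memory and the convention that the stack grows toward lower addresses are assumptions; the text fixes only the LIFO behavior and the role of SP.

```python
# A tiny memory; SP starts just past the last slot, meaning "stack empty".
memory = [0] * 8
SP = len(memory)

def push(item):
    """Insert an item: decrement SP, then write at the new top."""
    global SP
    SP -= 1
    memory[SP] = item

def pop():
    """Retrieve the top item (SP always points at it), then increment SP."""
    global SP
    item = memory[SP]
    SP += 1
    return item

push(10); push(20)
print(pop(), pop())  # last in, first out: 20 then 10
```

Note that push and pop only move SP; the old values are left behind in memory, which is why SP alone defines what counts as the stack.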

4.3 Register Organization


One of the key differences among various computers is the difference in their register sets. Some
computers have very large register sets while others have smaller ones. On the whole, from a
user's point of view the register set can be classified under two basic categories:

Programmer Visible Registers: These registers can be used by machine or assembly language
programmers to minimize references to main memory.

Status and Control Registers: These registers cannot be used by programmers but are used
to control the CPU or the execution of a program.

4.3.1 Program Visible Register

These registers can be accessed using machine language. In general we encounter four types of
programmer visible registers.
General Purpose Registers
Data registers
Address Registers
Condition Codes Registers

The general purpose registers are used for various functions desired by the processor. A true
general purpose register can contain operand or can be used for calculation of address of operand
for any operation code of an instruction. But trends in today's machines show drift towards
dedicated registers, for example, some registers may be dedicated to floating point operations. In
some machines there is a distinct separation between data and address registers. The data
registers are used only for storing intermediate results or data. These data registers are not used
for calculation of address of the operand.
An address register may be a general-purpose register, but some dedicated address registers are
also used in several machines. Examples of dedicated address registers can be:

Segment Pointer : Used to point out a segment of memory.


Index Register : These are used for index addressing scheme.
Stack Pointer (when programmer visible stack addressing is used.)

In the case of specialized register the number of bits needed for register specific details are
reduced as here we need to specify only few registers out of a set of registers. However, this
specialization does not allow much flexibility to the programmer.

Another issue related to register set design is the number of general-purpose registers, or data
and address registers, to be provided in a microprocessor. The number of registers also affects
the instruction design, as it determines the number of bits needed in an instruction to specify a
register reference. In general, it has been found that the optimum number of registers in a CPU is
in the range 8 to 32. If the register count falls below this range, more memory references per
instruction will be needed on average, as some intermediate results must then be stored in
memory. On the other hand, if the number of registers goes above 32, there is no appreciable
reduction in memory references.

4.4 Status and Control Registers


For the control of various operations several registers are used. These registers cannot be used in
data manipulation; however, the content of some of them can be read by the programmer. Some
of the control registers for a von Neumann machine are the Program Counter (PC), the Memory
Address Register (MAR) and the Data Register. Almost all CPUs have a status register, a part of
which may be programmer visible. A register formed by condition codes is called a condition
code register. Some of the commonly used flags or condition codes in such a register are:
Sign flag: Indicates whether the sign of the previous arithmetic operation was positive (0) or
negative (1).
Zero flag: Set if the result of the last arithmetic operation was zero.
Carry flag: Set if a carry results from the addition of the highest-order bits, or a borrow is taken
on subtraction of the highest-order bits.
Equal flag: Set if a logic comparison operation finds that both of its operands are equal.
Overflow flag: Indicates the condition of arithmetic overflow.
Interrupt flag: Used for enabling or disabling interrupts.
Supervisor flag: Used in certain computers to determine whether the CPU is executing in
supervisor or user mode. When the CPU is in supervisor mode it is allowed to execute certain
privileged instructions. In most CPUs, on encountering a subroutine call or interrupt handling
routine, it is desirable that status information such as condition codes and other register
information be stored on initiation of these routines and restored at their end. The register, often
known as the Program Status Word (PSW), contains the condition codes plus other status
information. There can be several other status and control registers, such as an interrupt vector
register in machines using vectored interrupts, or a stack pointer if a stack is used to implement
subroutine calls. Status and control register design is also dependent on operating system (OS)
support; a functional understanding of the OS helps in tailoring the register organization. In fact,
some control information is only of specific use to the operating system.
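The most common condition codes can be computed for a sample 8-bit addition. The flag definitions follow the list above; the 8-bit width and two's-complement interpretation are assumptions made to keep the sketch concrete.

```python
def add_flags(a, b):
    """Add two 8-bit values and report the resulting condition codes."""
    total = a + b
    result = total & 0xFF                  # keep only 8 bits
    flags = {
        "carry": total > 0xFF,             # carry out of the highest-order bit
        "zero": result == 0,               # result of the operation was zero
        "sign": (result & 0x80) != 0,      # top bit set: negative in 2's comp.
        # Overflow: both operands share a sign but the result's sign differs.
        "overflow": ((a ^ result) & (b ^ result) & 0x80) != 0,
    }
    return result, flags

print(add_flags(0x7F, 0x01))  # 127 + 1 overflows the signed 8-bit range
```

Note how 0x7F + 0x01 sets the sign and overflow flags but not carry: carry tracks unsigned wrap-around, overflow tracks signed wrap-around.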

One major decision to be taken for designing status and control registers organization is :
how to allocate control information between registers and the memory. Generally first few
hundred or thousands words of memory are allocated for storing control information. It is the
responsibility of the designer to determine how much control information should be in registers
and how much in memory.
4.5 Register Transfer Language

Rather than specifying a digital system in words, a specific notation is used, register
transfer language
For any function of the computer, the register transfer language can be used to describe
the (sequence of) microoperations
Register transfer language:
A symbolic language which is a convenient tool for describing the internal
organization of digital computers and can also be used to facilitate the design process
of digital systems.
Registers are designated by capital letters, sometimes followed by numbers (e.g., A, R13,
IR)
Often the names indicate function:
MAR - memory address register
PC - program counter
IR - instruction register
Registers and their contents can be viewed and represented in various ways
A register can be viewed as a single entity:

Registers may also be represented showing the bits of data they contain

4.6 Register Transfer


Computer registers are designated by capital letters that denote the function of the register, e.g.
the register that holds an address for the memory unit is usually called a memory address register
and is designated by the name MAR. Other designations for registers are PC (for program
counter), IR (for instruction register), and R1 (for processor register). The individual flip-flops in
an n-bit register are numbered in sequence from zero through n-1, starting with zero in the right-
most position and increasing toward the left.
Block diagram of register

Copying the contents of one register to another is a register transfer


A register transfer is indicated as R2 ← R1
In this case the contents of register R1 are copied (loaded) into register R2:
a simultaneous transfer of all bits from the source R1 to the destination register
R2 during one clock pulse.
Note that the transfer is non-destructive; i.e. the contents of R1 are not altered by
copying (loading) them into R2.

4.7 Bus And Memory Transfer


A bus is a path (a group of wires) over which information is transferred, from any of several
sources to any of several destinations.
From a register to the bus: BUS ← R

Bus system for four registers


Transfer From Bus to a Destination Register
4.8 Arithmetic Micro operations

The basic arithmetic microoperations are


Addition
Subtraction
Increment
Decrement
The additional arithmetic microoperations are
Add with carry
Subtract with borrow
Transfer/Load
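The arithmetic microoperations listed above can be sketched as pure functions over 8-bit registers (the 8-bit width is an assumption; the text does not fix a register size):

```python
MASK = 0xFF  # 8-bit registers

def add(a, b, cin=0):
    """Addition (optionally with carry-in); returns 8-bit sum and carry-out."""
    total = a + b + cin
    return total & MASK, total >> 8

def sub(a, b, borrow=0):
    """Subtraction with borrow, via 2's-complement addition."""
    total = a + ((~b) & MASK) + (1 - borrow)
    return total & MASK, 1 - (total >> 8)   # second value is the borrow-out

increment = lambda a: (a + 1) & MASK
decrement = lambda a: (a - 1) & MASK

print(add(250, 10))  # wraps around and produces a carry-out
print(sub(5, 3))     # no borrow needed
```

Subtraction reuses the adder by complementing the subtrahend, which is how hardware subtract-with-borrow is typically realized.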
4.9 Logic Microoperation

Logic microoperations specify binary operations on the strings of bits in registers.

They are bit-wise operations, i.e. they work on the individual bits of
data. They are useful for bit manipulations on binary data and for making logical
decisions based on bit values.
There are, in principle, 16 different logic functions that can be defined over two
binary input variables.

However, most systems implement only four of these: AND (∧), OR (∨), XOR (⊕), and
Complement/NOT.
The others can be created from combinations of these.
List of Logic Microoperations
- With 2 binary variables there are 16 different logic operations.
- With n binary variables there are 2^(2^n) functions.
Truth tables for the 16 functions of 2 variables and the corresponding 16 logic micro-
operations
Application of logic Microoperation
Logic microoperations can be used to manipulate individual bits or portions of a word
in a register.
Consider the data in a register A. Another register, B, holds the bit data that will be
used to modify the contents of A.

Selective-set          A ← A + B
Selective-complement   A ← A ⊕ B
Selective-clear        A ← A B'
Mask (Delete)          A ← A B
Clear                  A ← A ⊕ B
Insert                 A ← (A B) + C
Compare                A ← A ⊕ B
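These bit-manipulation microoperations map directly onto Python's bitwise operators; here is a sketch on 8-bit registers, where B acts as the mask selecting which bits of A are affected (the sample values are assumptions):

```python
A, B = 0b1010_0011, 0b0000_1111  # A is the data; B selects the low nibble

selective_set        = A | B          # set the bits of A where B has 1s
selective_complement = A ^ B          # complement those bits of A
selective_clear      = A & ~B & 0xFF  # clear those bits of A
mask                 = A & B          # keep only the bits selected by B

print(format(selective_set, "08b"))    # 10101111
print(format(selective_clear, "08b"))  # 10100000
```

The bits of A where B holds 0 are untouched in every case, which is what makes these "selective" operations.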

4.10 Shift Micro operations

Logic microoperations specify binary operations on strings of bits stored in registers. These
operations consider each bit of the register separately and treat them as binary variables. For
example, the exclusive-OR microoperation on the contents of two registers R1 and R2 is
symbolized by the statement

P: R1 ← R1 ⊕ R2

It specifies a logic microoperation to be executed on the individual bits of the registers, provided
that the control variable P = 1.

a).There are three types of shifts


Logical shift
Circular shift
Arithmetic shift

b).What differentiates them is the information that goes into the serial input
c). A right shift operation

A left shift operation
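The three shift types differ only in what enters the vacated serial-input position, as the text notes. A sketch of the right-shift variants on an 8-bit register (the width is an assumption):

```python
WIDTH, MASK = 8, 0xFF

def logical_right(r):
    """Logical shift: a 0 enters the leftmost bit position."""
    return r >> 1

def circular_right(r):
    """Circular shift (rotate): the outgoing LSB re-enters as the MSB."""
    return ((r >> 1) | (r << (WIDTH - 1))) & MASK

def arithmetic_right(r):
    """Arithmetic shift: the sign bit is replicated to preserve the sign."""
    sign = r & 0x80
    return (r >> 1) | sign

r = 0b1000_0011
print(format(logical_right(r), "08b"))     # 01000001
print(format(circular_right(r), "08b"))    # 11000001
print(format(arithmetic_right(r), "08b"))  # 11000001
```

For this particular input the circular and arithmetic results coincide (the rotated-in LSB equals the sign bit); a value like 0b1000_0010 would separate them.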

4.11 Control Unit


The control unit can be thought of as the brain of the CPU. Based on the instructions it decodes,
it controls how the other parts of the CPU, and in turn the rest of the computer system, should
work so that the instruction is executed correctly. There are two types of control units. The first
type is the hardwired control unit; hardwired control units are constructed using digital circuits
and once formed cannot be changed. The other type is the microprogrammed control unit, which
decodes and executes instructions by means of executing microprograms.

4.11.1 Structure of control unit


Control Unit Block Diagram
4.11.2 Hardwired Control Unit

Figure 1 is a block diagram showing the internal organization of a hard-wired control unit for our
simple computer. Input to the controller consists of the 4-bit opcode of the instruction currently
contained in the Instruction Register and the negative flag from the accumulator. The controller's
output is a set of 16 control signals that go out to the various registers and to the memory of the
computer, in addition to a HLT signal that is activated whenever the leading bit of the op-code is
one. The controller is composed of the following functional units: A ring counter, an instruction
decoder, and a control matrix.

The ring counter provides a sequence of six consecutive active signals that cycle continuously.
Synchronized by the system clock, the ring counter first activates its T0 line, then its T1 line, and
so forth. After T5 is active, the sequence begins again with T0. Figure 2 shows how the ring
counter might be organized internally. The instruction decoder takes its four-bit input from the
op-code field of the instruction register and activates one and only one of its 8 output lines. Each
line corresponds to one of the instructions in the computer's instruction set. Figure 3 shows the
internal organization of this decoder. The most important part of the hard-wired controller is the
control matrix. It receives input from the ring counter and the instruction decoder and provides
the proper sequence of control signals.
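The two sequencing parts of the controller described above, the six-state ring counter and the one-hot instruction decoder, can be sketched in software. The state counts follow the text; using a 3-bit op-code input for the 8-line decoder (the fourth bit being the HLT bit) is an assumption, and this is an illustrative model, not the hardware design.

```python
def ring_counter(states=6):
    """Yield T0, T1, ..., T5 as one-hot tuples, cycling continuously."""
    t = 0
    while True:
        yield tuple(1 if i == t else 0 for i in range(states))
        t = (t + 1) % states  # after T5, the sequence begins again with T0

def decode(opcode, lines=8):
    """Activate one and only one of 8 output lines for a 3-bit op-code."""
    return tuple(1 if i == opcode else 0 for i in range(lines))

rc = ring_counter()
print(next(rc))       # T0 active
print(next(rc))       # T1 active
print(decode(0b101))  # output line 5 active
```

In the hardware, the control matrix ANDs and ORs these timing and decoder lines to produce each control signal; here they are simply tuples a caller could combine the same way.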

Figure 1.Block diagram of hard-wired control unit


Figure 2. The internal organization of Ring Counter

Figure 3.The internal organization of hard-wired Instruction Decoder

4.11.3 A Micro-programmed Control Unit

As we have seen, the controller causes instructions to be executed by issuing a specific set of
control signals at each beat of the system clock. Each set of control signals issued causes one
basic operation (micro-operation), such as a register transfer, to occur within the data path
section of the computer. In the case of a hard-wired control unit the control matrix is responsible
for sending out the required sequence of signals.

An alternative way of generating the control signals is that of micro-programmed control. In


order to understand this method it is convenient to think of sets of control signals that cause
specific micro-operations to occur as being "microinstructions" that could be stored in a memory.
Each bit of a microinstruction might correspond to one control signal. If the bit is set it means
that the control signal will be active; if cleared the signal will be inactive. Sequences of
microinstructions could be stored in an internal "control" memory. Execution of a machine
language instruction could then be caused by fetching the proper sequence of microinstructions
from the control memory and sending them out to the data path section of the computer. A
sequence of microinstructions that implements an instruction on the external computer is known
as a micro-routine. The instruction set of the computer is thus determined by the set of micro-
routines, the "microprogram," stored in the controller's memory. The control unit of a
microprogram-controlled computer is essentially a computer within a computer.

Figure 4. A Microprogrammed Control Unit for the Simple Computer


Figure 4 is a block diagram of a micro-programmed control unit that may be used to implement
the instruction set of the computer we described above. The heart of the controller is the 32 x 24
control ROM, in which up to 32 microinstructions, each 24 bits long, can be stored. Each is
composed of two main fields: a 16-bit wide control signal field and an 8-bit wide next-address
field. Each bit in the control signal field corresponds to one of the control signals discussed
above. The next-address field contains bits that determine the address of the next
microinstruction to be fetched from the control ROM. We shall see the details of how these bits
work shortly. Words selected from the control ROM feed the microinstruction register. This 24-
bit wide register is analogous to the outer machine's instruction register. Specifically, the leading
16 bits (the control-signal field) of the microinstruction register are connected to the control-
signal lines that go to the various components of the external machine's data path section.

Addresses provided to the control ROM come from a micro-counter register, which is analogous
to the external machine's program counter. The micro-counter, in turn, receives its input from a
multiplexer which selects from: (1) the output of an address ROM, (2) a current-address
incrementer, or (3) the address stored in the next-address field of the current microinstruction.

4.12 Case Study: 8085 Microprocessor

The 8085 is an 8-bit general-purpose microprocessor that can address 64 KB of
memory. It has 40 pins and uses a +5 V power supply. It can run at a maximum frequency of
3 MHz. The pins on the chip can be grouped into 6 groups:
Address Bus.
Data Bus.
Control and Status Signals.
Power supply and frequency.
Externally Initiated Signals.
Serial I/O ports.
The Address and Data Busses

The address bus has 8 signal lines, A8-A15, which are unidirectional. The other 8 address
bits are multiplexed (time-shared) with the 8 data bits. So, the lines AD0-AD7 are bi-
directional and serve as both A0-A7 and D0-D7. During the execution of
an instruction, these lines carry the address bits during the early part of the machine
cycle, and the 8 data bits during the later part. In order to separate the address from the
data, we can use a latch to save the address before the function of the lines changes.

The Control and Status Signals

There are 4 main control and status signals. These are:


ALE: Address Latch Enable. This signal is a pulse that becomes 1 when the
AD0-AD7 lines have an address on them, and returns to 0 after that. This
signal can be used to enable a latch to save the address bits from the AD
lines.
RD: Read. Active low.
WR: Write. Active low.
IO/M: This signal specifies whether the operation is a memory operation
(IO/M=0) or an I/O operation (IO/M=1).
S1 and S0: Status signals that specify the kind of operation being
performed. Usually unused in small systems.

Frequency Control Signals

There are 3 important pins in the frequency control group. X1 and X2 are the inputs from
the crystal or clock-generating circuit. The frequency is internally divided by 2. So, to run
the microprocessor at 3 MHz, a clock running at 6 MHz should be connected to the X1
and X2 pins.

CLK (OUT): An output clock pin to drive the clock of the rest of the system.
Steps For Fetching an Instruction

Let's assume that we are trying to fetch the instruction at memory location 2005. That
means that the program counter is now set to that value. The following is the sequence of
operations:
The program counter places the address value on the address bus and the
controller issues a RD signal.
The memory's address decoder gets the value and determines which
memory location is being accessed.
The value in the memory location is placed on the data bus.
The value on the data bus is read into the instruction decoder inside the
microprocessor.
After decoding the instruction, the control unit issues the proper control
signals to perform the operation.

MULTIPLE CHOICE QUESTIONS

1. Which register holds the instruction itself?


A) Program Counter
B) Instruction Register
C) Control Unit
D) Arithmetic Logic unit

2. To accomplish a task a computer has to process data in three stages. They are:
3. The CPU is also known as:
A) The Brain
B) The Processor
C) The Central Processing Unit
D) All of the above

4. Main Memory holds data and instructions being processed by the computer, and this
memory is directly accessible by the CPU.
a) True
b) False

5. It is difficult to classify computer systems on the basis of their system performance, as newer,
smaller computer systems outperform larger models of yesteryear.
a) True
b) False

6.The two smaller units of the processor are CU and ALU


a). True
b). False

7. Which smaller unit of the CPU directs and coordinates all activities within it, determines
the sequence in which instructions are executed, and sends the instruction sequence to the other units?
a) CU
b) ALU
c) PROCESSOR
d) All of the above

8. The CPU's primary responsibility is the movement of data and instructions between itself,
main memory, and the ALU. Arrange the CU's execution of an instruction in the correct order by
placing the letters in the boxes provided:
a) Send data to memory unit after processing
b) Fetches data required by the instruction from memory
c) Fetches the instruction from memory
d) Decode the instruction
e) Send data and instruction to the ALU for processing
9. Which smaller CPU unit contains registers - temporary storage locations that hold a single
instruction or data item needed immediately and frequently?
a) CU
b) ALU
c) PROCESSOR
d) All of the above

10. Program counter (PC) and instruction register (IR) are examples of registers:
a) True
b) False

Answers

1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Chapter 5 : Computer Arithmetic

5.1 INTRODUCTION
Because electronic logic deals with currents that are on or off, it has been found convenient to
represent quantities in binary form to perform arithmetic on a computer. Thus, instead of having
ten different digits, 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9, in binary arithmetic, there are only two different
digits, 0 and 1, and when moving to the next column, instead of the digit representing a quantity
that is ten times as large, it only represents a quantity that is two times as large. Thus, the first
few numbers are written in binary as follows:

Decimal Binary
Zero 0 0
One 1 1
Two 2 10
Three 3 11
Four 4 100
Five 5 101
Six 6 110
Seven 7 111
Eight 8 1000
Nine 9 1001
Ten 10 1010
Eleven 11 1011
Twelve 12 1100

The addition and multiplication tables for binary arithmetic are very small, and this makes it
possible to use logic circuits to build binary adders.
 + | 0  1          * | 0  1
 ----------        ----------
 0 | 0  1          0 | 0  0
 1 | 1 10          1 | 0  1

Thus, from the table above, when two binary digits, A and B are added, the carry bit is simply (A
AND B), while the last digit of the sum is more complicated; ((A AND NOT B) OR ((NOT A)
AND B)) is one way to express it.
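These two expressions can be checked exhaustively with a short script (Python is used here, and in the sketches that follow, purely as illustration; the hardware itself is built from logic gates):

```python
# Half adder: carry = (A AND B); sum = (A AND NOT B) OR ((NOT A) AND B),
# which is simply A XOR B.
def half_adder(a, b):
    carry = a & b
    s = (a & (1 - b)) | ((1 - a) & b)   # the XOR expression from the text
    return carry, s

for a in (0, 1):
    for b in (0, 1):
        print(a, "+", b, "->", half_adder(a, b))
```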

5.2 Number Systems


The ALU does calculations with binary numbers
Decimal number system
Uses 10 digits (0,1,2,3,4,5,6,7,8,9)
In the decimal system, a number such as 83 means
83 = (8x10) + 3
4728 = (4x1000)+(7x100)+(2x10)+8
Base or radix of 10: each digit in the number is multiplied by 10 raised to a power
corresponding to that digit's position
E.g. 83 = (8x10^1)+(3x10^0)
4728 = (4x10^3)+(7x10^2)+(2x10^1)+(8x10^0)

Decimal number system


Fractional values, e.g.
472.83 = (4x10^2)+(7x10^1)+(2x10^0)+(8x10^-1)+(3x10^-2)
In general, for the decimal representation of
X = { ... x2 x1 x0 . x-1 x-2 x-3 ... }

X = the sum over all i of (xi x 10^i)
Binary Number System

Uses only two digits, 0 and 1


It is base or radix of 2
Each digit has a value depending on its position:
(10)2 = (1x2^1)+(0x2^0) = (2)10
(11)2 = (1x2^1)+(1x2^0) = (3)10
(100)2 = (1x2^2)+(0x2^1)+(0x2^0) = (4)10
(1001.101)2 = (1x2^3)+(0x2^2)+(0x2^1)+(1x2^0)+(1x2^-1)+(0x2^-2)+(1x2^-3) = (9.625)10

Decimal to Binary conversion

Integer and fractional parts are handled separately,


Integer part is handled by repeating division by 2
Factional part is handled by repeating multiplication by 2

E.g. convert decimal 11.81 to binary
Integer part 11
Factional part .81

Decimal to Binary conversion example

e.g. 11.81 converts to 1011.11001 (approx.)


11/2 = 5 remainder 1
5/2 = 2 remainder 1
2/2 = 1 remainder 0
1/2 = 0 remainder 1
Binary number 1011
.81x2 = 1.62 integral part 1
.62x2 = 1.24 integral part 1
.24x2 = 0.48 integral part 0
.48x2 = 0.96 integral part 0
.96x2 = 1.92 integral part 1
Binary number .11001 (approximate)
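The repeated-division and repeated-multiplication procedure above can be sketched directly (an illustrative Python function; the name `dec_to_bin` and the fixed 5-bit fraction precision are our own choices):

```python
def dec_to_bin(value, frac_bits=5):
    """Convert a non-negative decimal number to a binary string.
    Integer part: repeated division by 2 (remainders become bits).
    Fraction part: repeated multiplication by 2 (integral parts become bits)."""
    integer, fraction = int(value), value - int(value)
    int_digits = ""
    while integer > 0:
        int_digits = str(integer % 2) + int_digits   # remainder is the next bit
        integer //= 2
    frac_digits = ""
    for _ in range(frac_bits):                        # stop at a fixed precision
        fraction *= 2
        frac_digits += str(int(fraction))             # integral part is the next bit
        fraction -= int(fraction)
    return (int_digits or "0") + "." + frac_digits

print(dec_to_bin(11.81))   # 1011.11001 (approximate)
```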

Hexadecimal Notation:
common ground between computer and human
Uses 16 digits (0,1,...,9,A,B,C,D,E,F)
(1A)16 = (1x16^1)+(Ax16^0)
= (1 x 16^1)+(10 x 16^0) = (26)10
Convert each group of four binary digits to/from one hexadecimal digit:
0000=0; 0001=1; 0010=2; 0011=3; 0100=4; 0101=5; 0110=6; 0111=7; 1000=8;
1001=9; 1010=A; 1011=B; 1100=C; 1101=D; 1110=E; 1111=F;
e.g.
1101 1110 0001 . 1110 1101 = DE1.ED
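The four-bits-per-hex-digit grouping can be sketched as follows (an illustrative Python helper; the function name is our own):

```python
HEX_DIGITS = "0123456789ABCDEF"

def bin_to_hex(int_bits, frac_bits=""):
    # Pad the integer part on the left and the fraction part on the right,
    # so both split cleanly into groups of four bits.
    int_bits = "0" * ((-len(int_bits)) % 4) + int_bits
    frac_bits = frac_bits + "0" * ((-len(frac_bits)) % 4)
    ints = "".join(HEX_DIGITS[int(int_bits[i:i + 4], 2)]
                   for i in range(0, len(int_bits), 4))
    fracs = "".join(HEX_DIGITS[int(frac_bits[i:i + 4], 2)]
                    for i in range(0, len(frac_bits), 4))
    return ints + ("." + fracs if fracs else "")

print(bin_to_hex("110111100001", "11101101"))   # DE1.ED
```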

Integer Representation (storage)


Only have 0 & 1 to represent everything
Positive numbers stored in binary
e.g. 41=00101001
No minus sign
No period
How do we represent negative numbers?
Sign-Magnitude
Two's complement

Sign-Magnitude
Leftmost bit is the sign bit
0 means positive
1 means negative
+18 = 00010010
-18 = 10010010
Problems
Need to consider both sign and magnitude in arithmetic
Two representations of zero (+0 and -0)

Two's Complement (representation)

+3 = 00000011
+2 = 00000010
+1 = 00000001
+0 = 00000000
-1 = 11111111
-2 = 11111110
-3 = 11111101

Benefits
One representation of zero
Arithmetic works easily (see later)
Negating is fairly easy (the two's complement operation)
3 = 00000011
Boolean complement gives 11111100
Add 1 to LSB: 11111101
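The invert-and-add-one rule can be sketched as follows (illustrative Python; an 8-bit word size is assumed):

```python
def twos_complement(value, bits=8):
    """Two's-complement encoding of an integer as a bit string."""
    return format(value & ((1 << bits) - 1), f"0{bits}b")

def negate(bits):
    """Negate: invert every bit, then add 1 to the LSB (wrap at the word size)."""
    n = len(bits)
    inverted = int(bits, 2) ^ ((1 << n) - 1)   # Boolean complement
    return format((inverted + 1) & ((1 << n) - 1), f"0{n}b")

print(twos_complement(3))    # 00000011
print(negate("00000011"))    # 11111101  (i.e. -3)
print(twos_complement(-3))   # 11111101
```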

5.3 Addition and Subtraction


Normal binary addition
   0011        0101        1100
 + 0100      + 0100      + 1111
 ------      ------      ------
   0111        1001       11011
           (overflow)
Monitor the sign bit for overflow (the sign bit changes when adding two positive numbers or two
negative numbers).
Subtraction: take the two's complement of the subtrahend, then add it to the minuend,
i.e. a - b = a + (-b)
So we only need addition and complement circuits
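A sketch of how addition plus complementing gives subtraction, including the sign-bit overflow test described above (illustrative Python; an 8-bit word is assumed, so the 4-bit overflow example above does not overflow here):

```python
def add8(a, b):
    """8-bit two's-complement addition; returns (result bits, overflow flag).
    Overflow: both operands share a sign but the result's sign differs."""
    result = (a + b) & 0xFF
    sa, sb, sr = (a >> 7) & 1, (b >> 7) & 1, (result >> 7) & 1
    return format(result, "08b"), sa == sb and sa != sr

def sub8(a, b):
    # a - b = a + (two's complement of b): only an adder and a complementer
    return add8(a, (~b + 1) & 0xFF)

print(add8(0b0101, 0b0100))   # 5 + 4 = 9: no overflow in 8 bits
print(sub8(18, 3))            # 18 - 3 = 15
```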

5.4 Multiplication
Complex
Work out partial product for each digit
Take care with place value (column)
Add partial products

Multiplication Example
(unsigned numbers, e.g.)
        1011   Multiplicand (11 dec)
      x 1101   Multiplier (13 dec)
        1011   Partial products
       0000    Note: if a multiplier bit is 1, copy the
      1011     multiplicand (shifted to its place value);
     1011      otherwise use zero
    10001111   Product (143 dec)
Note: need a double-length result
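The partial-product procedure above can be sketched as a shift-and-add loop (illustrative Python):

```python
def multiply_unsigned(multiplicand, multiplier, bits=4):
    """Shift-and-add multiplication of unsigned operands: for each 1 bit
    of the multiplier, add the multiplicand shifted to that place value."""
    product = 0
    for position in range(bits):
        if (multiplier >> position) & 1:            # multiplier bit is 1:
            product += multiplicand << position     # add shifted multiplicand
    return product                                  # double-length result

print(multiply_unsigned(0b1011, 0b1101))   # 143, i.e. 10001111
```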
5.4.1 Booth's Algorithm for Multiplication

Example of Booth's Algorithm
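Since the worked figure is not reproduced here, the following is a sketch of one common textbook formulation of Booth's algorithm (illustrative Python; the register names A, Q, Q-1 follow the usual presentation, and the 4-bit width is an assumption):

```python
def booth_multiply(m, q, bits=4):
    """Booth's algorithm for signed multiplication (one common formulation).
    Scan multiplier bit pairs (Q0, Q-1): 10 -> A = A - M, 01 -> A = A + M,
    then arithmetic-shift-right the combined (A, Q, Q-1) register."""
    mask = (1 << bits) - 1
    A, Q, Q_1 = 0, q & mask, 0
    M = m & mask
    for _ in range(bits):
        pair = (Q & 1, Q_1)
        if pair == (1, 0):
            A = (A - M) & mask
        elif pair == (0, 1):
            A = (A + M) & mask
        # Arithmetic shift right of (A, Q, Q-1): A's sign bit is replicated.
        Q_1 = Q & 1
        Q = ((Q >> 1) | ((A & 1) << (bits - 1))) & mask
        A = ((A >> 1) | (A & (1 << (bits - 1)))) & mask
    result = (A << bits) | Q
    if result & (1 << (2 * bits - 1)):      # interpret the double-length
        result -= 1 << (2 * bits)           # result as a signed value
    return result

print(booth_multiply(3, -4))    # -12
print(booth_multiply(-3, -4))   # 12
```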

5.5 Division
More complex than multiplication
However, can utilize most of the same hardware.
Based on long division
Division of Unsigned Binary Integers

              00001101     Quotient
Divisor 1011 ) 10010011    Dividend
               1011
               001110      Partial
                 1011      remainders
               001111
                 1011
                  100      Remainder
Flowchart for Unsigned Binary division
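Since the flowchart is not reproduced here, the restoring long-division procedure it describes can be sketched as follows (illustrative Python):

```python
def divide_unsigned(dividend, divisor, bits=8):
    """Restoring long division of unsigned integers, one bit at a time,
    working from the most significant bit of the dividend downwards."""
    remainder, quotient = 0, 0
    for i in range(bits - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)  # bring down a bit
        quotient <<= 1
        if remainder >= divisor:        # divisor "goes into" partial remainder
            remainder -= divisor
            quotient |= 1
    return quotient, remainder

q, r = divide_unsigned(0b10010011, 0b1011)   # 147 / 11
print(q, r)                                  # 13 4, i.e. 00001101 remainder 100
```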

Real Numbers
Numbers with fractions
Could be done in pure binary
1001.1010 = 2^3 + 2^0 + 2^-1 + 2^-3 = 9.625
Where is the binary point?
Fixed?
Very limited
Moving?
How do you show where it is?

5.6 Floating Point


+/- significand x 2^exponent
The point is actually fixed between the sign bit and the body of the mantissa
The exponent indicates the place value (point position)
Floating Point Examples

Signs for Floating Point

Exponent is in excess or biased notation

e.g. Excess (bias) 127 means


8 bit exponent field
Pure value range 0-255
Subtract 127 to get correct value
Range -127 to +128

The relative magnitudes (order) of the numbers do not change. Can be treated as integers
for comparison.
5.6.1 Normalization
FP numbers are usually normalized
i.e. exponent is adjusted so that leading bit (MSB) of mantissa is 1
Since it is always 1 there is no need to store it
(c.f. scientific notation, where numbers are normalized to give a single digit before
the decimal point,
e.g. 3.123 x 10^3)

FP Ranges

For a 32 bit number


8-bit exponent
+/- 2^256, approx. 1.5 x 10^77
Accuracy
The effect of changing the LSB of the mantissa:
23-bit mantissa: 2^-23, approx. 1.2 x 10^-7
About 6 decimal places

Expressible Numbers
IEEE 754
Standard for floating point storage
32 and 64 bit standards
8 and 11 bit exponent respectively
Extended formats (both mantissa and exponent) for intermediate results
Representation: sign, exponent, fraction
0: 0, 0, 0
-0: 1, 0, 0
Plus infinity: 0, all 1s, 0
Minus infinity: 1, all 1s, 0
NaN: 0 or 1, all 1s, != 0
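These field layouts can be inspected by unpacking a number's 32-bit encoding (illustrative Python; `decompose_float32` is our own helper name):

```python
import struct

def decompose_float32(x):
    """Split a number's IEEE 754 single-precision encoding into its
    sign, biased (excess-127) exponent, and 23-bit fraction fields."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31
    exponent = (raw >> 23) & 0xFF          # excess-127
    fraction = raw & 0x7FFFFF              # implicit leading 1 not stored
    return sign, exponent, fraction

print(decompose_float32(1.0))            # (0, 127, 0): exponent 127 means 2^0
print(decompose_float32(-2.5))           # (1, 128, 2097152): -1.25 x 2^1
print(decompose_float32(float("inf")))   # (0, 255, 0): all-1s exponent
```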

FP Arithmetic +/-
Check for zeros
Align significands (adjusting exponents)
Add or subtract significands
Normalize result

FP Arithmetic x and /
Check for zero
Add/subtract exponents
Multiply/divide significands (watch sign)
Normalize
Round
All intermediate results should be in double length storage
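The align/add/normalize steps for +/- can be sketched on bare (significand, exponent) pairs (a deliberately simplified Python illustration: a 24-bit integer significand is assumed, and the zero checks and rounding steps listed above are omitted):

```python
def fp_add(m1, e1, m2, e2):
    """Sketch of floating-point addition: align the significand with the
    smaller exponent, add the significands, then renormalize."""
    if e1 < e2:                       # make operand 1 the larger-exponent one
        m1, e1, m2, e2 = m2, e2, m1, e1
    m2 >>= (e1 - e2)                  # align: shift smaller significand right
    m, e = m1 + m2, e1
    while m >= (1 << 24):             # renormalize a 24-bit significand
        m >>= 1
        e += 1
    return m, e

print(fp_add(1 << 23, 0, 1 << 23, 0))   # (8388608, 1)
```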
Floating Point Multiplication

Floating Point Division


End chapter quizzes
Q1. The Binary Number system uses the radix of
a) 2
b) 3
c) 1
d) 0
Q2. (41)10 is equivalent to
a) (101001)2
b) (.10111)2
c) (101010)2
d) (156)2

Q3. (0.1011)2 is equivalent to


a) (41)10
b) (41)8
c) (0.6785)10
d) (0.6875)10

Q4. What is the result of adding the following two positive binary bit strings?
101101.101
+101100.0010
a).1000001.1110
b).1000001.1010
c).1000001.1000
d).1000001.1100

Q5.What name is used to describe the bit of lowest magnitude within a byte or bit string?
a). radix point
b). low order bit
c). high order bit
d). op code
Q6. What is the best way to write the value '7654' and make it clear to the reader that the number
should be interpreted as a hexadecimal value?
a). 0x7654
b). 7654B
c). 7654H
d). \7654

Q7.What is the numeric range of an eight bit unsigned binary number?


a). 0..7
b). 1..8
c). 0..255
d). 1..256

Q8. What coding format encodes a real number as a mantissa multiplied by a power (exponent)
of two?
a). Binary
b). excess notation
c). floating point
d). two's complement

Q9. Which of the following does not result from floating point math operations?
a). Underflow
b). Overflow
c). Truncation
d). Two's complement

Q10. How many bits of information can each memory cell in a computer chip hold?
a). 0 bit
b). 1 bit
c). 8 bits
Answers
1.a, 2.a, 3.d, 4.d, 5.b, 6.a, 7.c, 8.c, 9.d, 10.c
Chapter 6 : Input-Output Organization

INTRODUCTION
In computing, input/output, or I/O, refers to the communication between an information
processing system (such as a computer), and the outside world possibly a human, or another
information processing system. Inputs are the signals or data received by the system, and outputs
are the signals or data sent from it. The term can also be used as part of an action; to "perform
I/O" is to perform an input or output operation. I/O devices are used by a person (or other
system) to communicate with a computer. For instance, keyboards and mice are considered
input devices of a computer, while monitors and printers are considered output devices of a
computer. Devices for communication between computers, such as modems and network cards,
typically serve for both input and output.

Note that the designation of a device as either input or output depends on the perspective.
Mice and keyboards take as input physical movement that the human user outputs and convert
it into signals that a computer can understand. The output from these devices is input for the
computer. Similarly, printers and monitors take as input signals that a computer outputs. They
then convert these signals into representations that human users can see or read. (For a human
user the process of reading or seeing these representations is receiving input.)

In computer architecture, the combination of the CPU and main memory (i.e. memory that the
CPU can read and write to directly, with individual instructions) is considered the brain of a
computer, and from that point of view any transfer of information from or to that combination,
for example to or from a disk drive, is considered I/O. The CPU and its supporting circuitry
provide memory-mapped I/O that is used in low-level computer programming in the
implementation of device drivers.
INPUT-OUTPUT ORGANIZATION

Peripheral Devices
Input-Output Interface
Asynchronous Data Transfer
Modes of Transfer
Priority Interrupt
Direct Memory Access
Input-Output Processor
Serial Communication

6.1 PERIPHERAL DEVICES

6.1.1 Keyboard

A keyboard is a human interface device which is represented as a layout of buttons. Each


button, or key, can be used to either input a linguistic character to a computer, or to call upon a
particular function of the computer. Traditional keyboards use spring-based buttons, though
newer variations employ virtual keys, or even projected keyboards.

Examples of types of keyboards include:

Computer keyboard
Keyer
Chorded keyboard
LPFK

6.1.2. Pointing devices


A pointing device is any human interface device that allows a user to input spatial data to a
computer. In the case of mice and touch screens, this is usually achieved by detecting movement
across a physical surface. Analog devices, such as 3D mice, joysticks, or pointing sticks,
function by reporting their angle of deflection. Movements of the pointing device are echoed on
the screen by movements of the cursor, creating a simple, intuitive way to navigate a computer's
GUI.

6.1.3. High-degree of freedom input devices


Some devices allow many continuous degrees of freedom as input. These can be used as pointing
devices, but are generally used in ways that don't involve pointing to a location in space, such as
the control of a camera angle in 3D applications. These kinds of devices are typically used
in CAVEs, where input that registers all six degrees of freedom (6DOF) is required.

6.1.4. Imaging and Video input devices


Video input devices are used to digitize images or video from the outside world into the
computer. The information can be stored in a multitude of formats depending on the user's
requirement.

Webcam
Image scanner
Fingerprint scanner
Barcode reader
3D scanner
Laser rangefinder

6.1.5. Medical Imaging

o Computed tomography
o Magnetic resonance imaging
o Positron emission tomography
o Medical ultrasonography
6.2 OUTPUT DEVICES

An output device is any piece of computer hardware equipment used to communicate the
results of data processing carried out by an information processing system (such as a
computer) to the outside world. In computing, input/output, or I/O, refers to the
communication between an information processing system (such as a computer), and the
outside world. Inputs are the signals or data sent to the system, and outputs are the signals or
data sent by the system to the outside.

The most common input devices used by the computer are the keyboard and mouse. The
keyboard allows the entry of textual information while the mouse allows the selection of a
point on the screen by moving a screen cursor to the point and pressing a mouse button. The
most common outputs are monitors and speakers

o Card Puncher, Paper Tape Puncher


o CRT
o Printer (Impact, Ink Jet, Laser, Dot Matrix) Plotter
o Analog
o Voice

6.3 INPUT/OUTPUT INTERFACE

Provides a method for transferring information between internal storage (such as memory
and CPU registers) and external I/O devices
Resolves the differences between the computer and peripheral devices
Peripherals - Electromechanical Devices
CPU or Memory - Electronic Device
Data Transfer Rate
Peripherals - Usually slower
CPU or Memory - Usually faster than peripherals
Some kind of synchronization mechanism may be needed
Unit of Information
Peripherals - Byte, Block
CPU or Memory - Word
Data representations may differ

6.3.1 BUS AND INTERFACE MODULES

Each peripheral has an interface module associated with it. The interface:

- Decodes the device address (device code)


- Decodes the commands (operation)
- Provides signals for the peripheral controller
- Synchronizes the data flow and supervises
the transfer rate between peripheral and CPU or Memory

Typical I/O instruction format:

Op code | Device address | Function code

Commands
6.3.2. CONNECTION OF I/O BUS

6.3.2.1 Connection of I/O Bus to CPU

6.3.2.2.Connection of I/O to One interface

6.3.3. I/O BUS AND MEMORY BUS


Functions of Buses

MEMORY BUS is for information transfers between CPU and the MM

* I/O BUS is for information transfers between CPU and I/O devices through their I/O interface
* Many computers use a common single bus system for both memory and I/O interface units

- Use one common bus but separate control lines for each function

- Use one common bus with common control lines for both functions

* Some computer systems use two separate buses, one to communicate with memory and the
other with I/O interfaces

- Communication between CPU and all interface units is via a common I/O Bus

- An interface connected to a peripheral device may have a number of data registers, a control
register, and a status register

- A command is passed to the peripheral by sending it to the appropriate interface register

- Function code and sense lines are not needed (Transfer of data, control, and status information
is always via the common I/O Bus)

6.3.4. ISOLATED vs MEMORY MAPPED I/O


Isolated I/O

Separate I/O read/write control lines in addition to memory read/write control lines.

- Separate (isolated) memory and I/O address spaces.

- Distinct input and output instructions.

Memory-mapped I/O

A single set of read/write control lines(no distinction between memory and I/O transfer)
- Memory and I/O addresses share the common address space
-> reduces memory address range available
- No specific input or output instruction
-> The same memory reference instructions can be used for I/O transfers

- Considerable flexibility in handling I/O operations


6.3.5. I/O INTERFACE

6.3.6. Programmable Interface

- Information in each port can be assigned a meaning depending on the mode of operation of the
I/O device. Port A = Data; Port B = Command; Port C = Status

- The CPU initializes (loads) each port by transferring a byte to the Control Register.

This allows the CPU to define the mode of operation of each port.

Programmable Port: By changing the bits in the control register, it is possible to


change the interface characteristics.

6.4 ASYNCHRONOUS DATA TRANSFER

6.4.1. Synchronous and Asynchronous Operations


Synchronous - All devices derive the timing information from common clock line.

Asynchronous - No common clock.


6.4.2 Asynchronous Data Transfer

Asynchronous data transfer between two independent units requires that control signals be
transmitted between the communicating units to indicate the time at which data is being
transmitted.

Two Asynchronous Data Transfer Methods

Strobe pulse: A strobe pulse is supplied by one unit to indicate to the other unit when the transfer
has to occur.

Handshaking: A control signal accompanies each item of data being transmitted to indicate the
presence of data. The receiving unit responds with another control signal to acknowledge receipt
of the data.

6.4.2.1 STROBE CONTROL

* Employs a single control line to time each transfer

* The strobe may be activated by either the source or the destination unit
Source-Initiated Strobe for Data Transfer

Destination-Initiated Strobe for Data Transfer

6.4.2.2. HANDSHAKING

Strobe Methods

1. Source-Initiated

The source unit that initiates the transfer has no way of knowing whether the destination
unit has actually received data

2. Destination-Initiated

The destination unit that initiates the transfer has no way of knowing whether the source has
actually placed the data on the bus. To solve this problem, the HANDSHAKE method introduces
a second control signal to provide a reply to the unit that initiates the transfer.

SOURCE-INITIATED TRANSFER USING HANDSHAKE


* Allows arbitrary delays from one state to the next

* Permits each unit to respond at its own data transfer rate

* The rate of transfer is determined by the slower unit

DESTINATION-INITIATED TRANSFER USING HANDSHAKE

6.4.3. ASYNCHRONOUS SERIAL TRANSFER

Four Different Types of Transfer


Asynchronous serial transfer

Synchronous serial transfer

Asynchronous parallel transfer

Synchronous parallel transfer


Asynchronous Serial Transfer
- Employs special bits which are inserted at both ends of the character code.
- Each character consists of three parts: start bit, data bits, stop bits.

A character can be detected by the receiver from the knowledge of 4 rules:

- When data are not being sent, the line is kept in the 1-state (idle state).
- The initiation of a character transmission is detected by a Start Bit, which is always a 0.
- The character bits always follow the Start Bit.
- After the last character bit, a Stop Bit is detected when the line returns to the 1-state for at least
1 bit time.
The receiver knows in advance the transfer rate of the bits and the number of information bits to
expect.
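The framing rules above can be sketched as follows (illustrative Python; one start bit, eight data bits transmitted LSB first, and one stop bit are assumed):

```python
def frame_byte(data, data_bits=8):
    """Frame one character for asynchronous serial transmission:
    start bit (0), data bits LSB first, stop bit (1)."""
    bits = [0]                                            # start bit: line goes low
    bits += [(data >> i) & 1 for i in range(data_bits)]   # LSB transmitted first
    bits += [1]                                           # stop bit: back to 1-state
    return bits

def deframe(bits, data_bits=8):
    assert bits[0] == 0 and bits[-1] == 1, "bad start/stop bit"
    value = 0
    for i, b in enumerate(bits[1:1 + data_bits]):
        value |= b << i
    return value

frame = frame_byte(ord("A"))    # character 0x41
print(frame)                    # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
print(deframe(frame))           # 65
```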

UNIVERSAL ASYNCHRONOUS RECEIVER-TRANSMITTER


- UART
Transmitter Register
- Accepts a data byte(from CPU) through the data bus
- Transferred to a shift register for serial transmission
Receiver
- Receives serial information into another shift register
- Complete data byte is sent to the receiver register
Status Register Bits
- Used for I/O flags and for recording errors
Control Register Bits.
Define baud rate, no. of bits in each character, whether to generate and check parity, and no. of
stop bits.

FIRST-IN-FIRST-OUT (FIFO) BUFFER
* Input data and output data can flow at two different rates
* Output data are always in the same order in which the data entered the buffer
* Useful in some applications when data is transferred asynchronously
4 x 4 FIFO Buffer: 4 4-bit registers Ri, and
4 control registers (flip-flops Fi), one associated with each Ri
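The behaviour of such a buffer, writes and reads at different rates but strict arrival order, can be modelled as follows (illustrative Python; the class name and depth parameter are our own choices):

```python
from collections import deque

class FIFOBuffer:
    """Behavioral model of a small FIFO: the input and output sides may
    proceed at different rates, but data leaves in arrival order."""
    def __init__(self, depth=4):
        self.depth = depth
        self.slots = deque()

    def write(self, word):
        if len(self.slots) >= self.depth:
            return False              # buffer full: input side must wait
        self.slots.append(word)
        return True

    def read(self):
        # Returns None when the buffer is empty (output side must wait).
        return self.slots.popleft() if self.slots else None

fifo = FIFOBuffer(depth=4)
for w in (0b0001, 0b0010, 0b0100):
    fifo.write(w)
print(fifo.read(), fifo.read())   # 1 2  (same order as written)
```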
MODES OF TRANSFER - PROGRAM-CONTROLLED I/O

3 different Data Transfer Modes between the central computer (CPU or Memory) and
peripherals;

1. Program-Controlled I/O

2. Interrupt-Initiated I/O

3. Direct Memory Access (DMA)

Program-Controlled I/O(Input Dev to CPU)

MODES OF TRANSFER - INTERRUPT INITIATED I/O & DMA

- Polling takes valuable CPU time.


- Open communication only when some data has to be passed -> Interrupt.
- I/O interface, instead of the CPU, monitors the I/O device.
- When the interface determines that the I/O device is ready for data transfer, it generates an
Interrupt Request to the CPU.
- Upon detecting an interrupt, CPU stops momentarily the task it is doing, branches to the
service routine to process the data transfer, and then returns to the task it was performing.
DMA (Direct Memory Access)
- Large blocks of data transferred at a high speed to or from high speed devices, magnetic drums,
disks, tapes, etc.
- DMA controller : Interface that provides I/O transfer of data directly to and from the
memory and the I/O device.
- CPU initializes the DMA controller by sending a memory address and the number of words to
be transferred.
- Actual transfer of data is done directly between the device and memory through DMA
controller.
-> Freeing CPU for other tasks

PRIORITY INTERRUPT
Priority
- Determines which interrupt is to be served first when two or more requests are made
simultaneously. And also determines which interrupts are permitted to interrupt the computer
while another is being serviced.
- Higher priority interrupts can make requests while servicing a lower priority interrupt .

Priority Interrupt by Software(Polling)


- Priority is established by the order of polling the devices(interrupt sources) and flexible
since it is established by software.
- Low cost since it needs a very little hardware.
- Very slow .

Priority Interrupt by Hardware


Requires a priority interrupt manager which accepts all the interrupt requests and determines
the highest priority request. Fast, since identification of the highest priority interrupt request is
done by hardware, and since each interrupt source has its own interrupt vector to
access its own service routine directly.
HARDWARE PRIORITY INTERRUPT - DAISY CHAIN
- Interrupt request from any device (one or more)
-> CPU responds with INTACK <- 1
-> Any device that receives INTACK = 1 at its PI input puts its VAD on the bus.
Among the interrupt-requesting devices, the device physically closest to the CPU gets
INTACK = 1, and it blocks INTACK from propagating to the next device.
One stage of the daisy-chain priority arrangement

PARALLEL PRIORITY INTERRUPT


IEN: Set or cleared by the instructions ION or IOF.
IST: Indicates that an unmasked interrupt has occurred. INTACK enables the tri-state bus buffer
to load the VAD generated by the Priority Logic.

Interrupt Register:
- Each bit is associated with an Interrupt Request from different Interrupt Source - different
priority level.
- Each bit can be cleared by a program instruction.

Mask Register:
- Mask Register is associated with Interrupt Register.
- Each bit can be set or cleared by an Instruction.
INTERRUPT PRIORITY ENCODER
Determines the highest priority interrupt when more than one interrupts take place.
Priority Encoder Truth table

INTERRUPT CYCLE
At the end of each instruction cycle:
- CPU checks IEN and IST.
- If IEN IST = 1, CPU -> Interrupt Cycle.
SP <- SP - 1      Decrement stack pointer.
M[SP] <- PC       Push PC onto stack.
INTACK <- 1       Enable interrupt acknowledge.
PC <- VAD         Transfer vector address to PC.
IEN <- 0          Disable further interrupts.
Go to Fetch to execute the first instruction in the interrupt service routine.
INTERRUPT SERVICE ROUTINE

Initial and Final Operations


Each interrupt service routine must have an initial and final set of operations for controlling the
registers in the hardware interrupt system.
Initial and Final Operations
6.5 DIRECT MEMORY ACCESS
* Block of data transfer from high speed devices, Drum, Disk, Tape.
* DMA controller - Interface which allows I/O transfer directly between Memory and Device,
freeing CPU for other tasks.
* CPU initializes DMA Controller by sending memory address and the block size (number of
words) .

CPU bus signals for DMA transfer

Block diagram of DMA controller


6.5.1. DMA I/O OPERATION
Starting an I/O

- CPU executes instruction to


a. Load Memory Address Register

b. Load Word Counter

c. Load Function (Read or Write) to be performed

d. Issue a GO command

Upon receiving a GO command, the DMA controller performs the I/O operation as follows,
independently of the CPU.
Input
[1] Input Device <- R (Read control signal).
[2] Buffer (DMA Controller) <- Input Byte; bytes are assembled into a word until the word is
full.
[3] M <- Memory Address, W (Write control signal).
[4] Address Reg <- Address Reg + 1; WC (Word Counter) <- WC - 1.
[5] If WC = 0, then interrupt to acknowledge done; else go to [1].
Output
[1] M <- Memory Address, R; Memory Address Reg <- Memory Address Reg + 1; WC <- WC - 1.
[2] Disassemble the word.
[3] Buffer <- one byte; Output Device <- W, for all disassembled bytes.
[4] If WC = 0, then interrupt to acknowledge done; else go to [1].

6.5.2. CYCLE STEALING


While DMA I/O takes place, CPU is also executing instructions DMA Controller and CPU both
access Memory -> Memory Access Conflict .
Memory Bus Controller
- Coordinating the activities of all devices requesting memory access
- Priority System
Memory accesses by CPU and DMA Controller are interwoven, with the top priority given to
DMA Controller
-> Cycle Stealing

Cycle Steal
- The CPU is usually much faster than I/O (DMA), thus the CPU uses most of the memory
cycles.
- The DMA controller steals memory cycles from the CPU.
- For those stolen cycles, the CPU remains idle.
- For a slow CPU, the DMA controller may steal most of the memory cycles, which may
cause the CPU to remain idle for a long time.

DMA TRANSFER
6.6 INPUT/OUTPUT PROCESSOR (IOP)

Channel
- A processor with direct memory access capability that communicates with I/O devices.
- The channel accesses memory by cycle stealing.
- The channel can execute a channel program.

i. Stored in main memory.

ii. Consists of Channel Command Words (CCWs).

iii. Each CCW specifies the parameters needed by the channel to control the
I/O devices and perform data transfer operations.

- The CPU initiates the channel by executing a channel I/O class instruction; once initiated, the channel operates independently of the CPU.
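A hypothetical sketch of a channel executing a channel program (the CCW fields and the `run_channel_program` helper are illustrative assumptions, not a real channel's format):

```python
from collections import namedtuple

# Channel Command Word: operation, memory address, byte count (illustrative)
CCW = namedtuple("CCW", "op addr count")

def run_channel_program(program, memory, device_data):
    """Execute CCWs in order; a 'read' CCW moves device bytes into memory
    starting at the CCW's address, independently of the CPU."""
    pos = 0
    for ccw in program:
        if ccw.op == "read":
            for i in range(ccw.count):
                memory[ccw.addr + i] = device_data[pos]
                pos += 1
    return memory

mem = run_channel_program([CCW("read", 0, 2), CCW("read", 8, 1)],
                          {}, device_data=[7, 8, 9])
```

Each CCW carries everything the channel needs, which is why the CPU can hand off the whole program and continue with other work.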
CHANNEL / CPU COMMUNICATION

Multiple choice Questions

1. In a programmable interface, each port contains

a) Port A = data; Port B = command; Port C = status
b) Port A = data; Port B = status; Port C = command
c) Port A = command; Port B = data; Port C = status

2. The number and types of asynchronous data transfer methods are

a) 3: strobe, stroke, and handshaking
b) 2: strobe and handshaking
c) 1: handshaking

3. The four different types of serial/parallel transfer are

a) asynchronous serial and parallel transfer
b) synchronous serial and asynchronous parallel transfer
c) asynchronous serial, synchronous serial, asynchronous parallel, and synchronous parallel

4. The status register is used for

a) I/O flags and receiving information
b) I/O flags and recording errors
c) recording errors and receiving information

5. Which one is incorrect?

a) The CPU is usually much faster than I/O (DMA).
b) The DMA controller steals memory cycles from the CPU.
c) For those stolen cycles, the CPU remains idle.
d) For a slow CPU, the DMA controller may steal most of the memory cycles, which may cause the CPU to remain idle for a few seconds.

6. The transmitter register
a) accepts a data byte through the data bus.
b) accepts a data byte through the memory bus.
c) accepts a data byte through the address bus.
7. The larger the RAM of a computer, the faster its processing speed, since it eliminates
a) the need for external memory b) the need for ROM
c) frequent disk I/Os d) the need for a wider data path
8. A group of signal lines used to transmit data in parallel from one element of a computer to
another is
a) Control bus b) Address bus
c) Data bus d) Network
9. The basic unit within a computer store capable of holding a single unit of data is
a) register b) ALU
c) control unit d) store location

10. A device used to bring information into a computer is

a) ALU b) Input device
c) Control unit d) Output device

Answers
1.a, 2.b, 3.c, 4.b, 5.d, 6.a, 7.c, 8.c, 9.d, 10.b
Chapter 7 : Memory Organization

7.1 Memory Hierarchy

The memory unit is an essential component of any digital computer, since it is needed for storing programs and data. Most general-purpose computers would run more efficiently if they were equipped with additional storage devices beyond the capacity of main memory. The memory unit that communicates directly with the CPU is called the MAIN MEMORY. Devices that provide backup storage are called AUXILIARY MEMORY. The most common auxiliary devices are magnetic disks and tapes; they are used for storing system programs, large data files, and other backup information. Only the programs and data currently needed by the processor reside in main memory. All other information is stored in auxiliary memory and transferred to main memory when needed.

The memory hierarchy system consists of all storage devices employed in a computer system, from the slow but high-capacity auxiliary memory, to a relatively faster main memory, to an even smaller and faster cache memory accessible to the high-speed processing logic. The goal of the memory hierarchy is to obtain the highest possible access speed while minimizing the total cost of the memory system.

Memory Hierarchy in computer system


A very-high-speed memory called cache memory is used to increase the speed of processing by making current programs and data available to the CPU at a rapid rate. The cache memory is employed in the system to compensate for the speed differential between main memory access time and processor logic.

7.2 Main Memory


The main memory is the central storage unit in a computer system. It is a relatively large and fast memory used to store programs and data during computer operation. The principal technology used for main memory is based on semiconductor integrated circuits. Integrated-circuit RAM chips are available in two possible operating modes: static and dynamic. The static RAM is easier to use and has shorter read and write cycles.
The dynamic RAM offers reduced power consumption and larger storage capacity in a single memory chip compared to static RAM.

7.2.1 RAM and ROM Chips


Most of the main memory in a general-purpose computer is made up of RAM integrated-circuit chips, but a portion of the memory may be constructed with ROM chips. Originally, RAM was used to refer to random-access memory, but now it is used to designate a read/write memory, to distinguish it from read-only memory, although ROM is also random access. RAM is used for storing the bulk of the programs and data that are subject to change. ROM is used for storing programs that are permanently resident in the computer and for tables of constants that do not change in value once the production of the computer is completed. Among other things, the ROM portion is used to store an initial program called a bootstrap loader, whose function is to start up the computer's operating system software. Since RAM is volatile, its contents are destroyed when power is turned off; the contents of ROM, on the other hand, remain unchanged after the power is turned off and on again.

7.2.2 Memory Address maps


The designer of a computer system must calculate the amount of memory required for a particular application and assign it to either RAM or ROM. The interconnection between memory and processor is then established from knowledge of the size of memory needed and the types of RAM and ROM chips available. The addressing of memory can be established by means of a table that specifies the memory addresses assigned to each chip. The table, called a memory address map, is a pictorial representation of the assigned address space for each chip in the system.
Example: 512 bytes RAM and 512 bytes ROM

Memory Connection to CPU


1. RAM and ROM chips are connected to a CPU through the data and address buses.
2. The low-order lines in the address bus select the byte within the chips and other lines in
the address bus select a particular chip through its chip select inputs.
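For the 512-byte RAM plus 512-byte ROM example above, a 10-bit address suffices: in this sketch (names and the bit assignment are illustrative assumptions), the high-order bit acts as the chip select and the low nine bits select the byte within the chosen chip.

```python
def decode(addr):
    """Split a 10-bit address into (chip, offset) per the memory address map:
    bit 9 selects the chip (0 -> RAM, 1 -> ROM), bits 0-8 select the byte."""
    assert 0 <= addr < 1024, "10-bit address space"
    chip = "ROM" if addr & 0x200 else "RAM"  # bit 9 is the chip select
    offset = addr & 0x1FF                    # bits 0-8 address the byte
    return chip, offset

chip, offset = decode(512)   # first ROM location
```

A real memory address map may place the chips differently; the point is that chip-select logic is just address decoding.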
7.3. Auxiliary Memory
The most common auxiliary memory devices used in computer systems are magnetic disks and tapes. Other components used, but not as frequently, are magnetic drums, magnetic bubble memory, and optical disks. To understand fully the physical mechanism of auxiliary memory devices, one must have knowledge of magnetic, electronic, and electromechanical systems.

7.3.1 Magnetic Tapes


A magnetic tape transport consists of the electrical, mechanical, and electronic components that provide the parts and control mechanism for a magnetic tape unit. The tape itself is a strip of plastic coated with a magnetic recording medium. Bits are recorded as magnetic spots on the tape along tracks. Usually, seven or nine bits are recorded simultaneously to form a character, together with a parity bit. Read/write heads are mounted one per track so that data can be recorded and read as a sequence of characters.
7.3.2 Magnetic Disks
A magnetic disk is a circular plate constructed of metal or plastic coated with magnetizable material. Often both sides of the disk are used, and several disks may be stacked on one spindle, with read/write heads available on each surface. All disks rotate together at high speed and are not stopped or started for access purposes. Bits are stored on the magnetized surface in spots along concentric circles called tracks. The tracks are commonly divided into sections called sectors. In most systems, the minimum quantity of information that can be transferred is a sector.

7.3.3 RAID

RAID is an acronym first defined by David A. Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987 to describe a Redundant Array of Inexpensive Disks, a technology that allowed computer users to achieve high levels of storage reliability from low-cost and less reliable PC-class disk-drive components, via the technique of arranging the devices into arrays for redundancy. More recently, marketers representing industry RAID manufacturers reinvented the term to describe a Redundant Array of Independent Disks, as a means of disassociating a "low cost" expectation from RAID technology.

"RAID" is now used as an umbrella term for computer data storage schemes that can divide and
replicate data among multiple hard disk drives. The different Schemes/architectures are named by
the word RAID followed by a number, as in RAID 0, RAID 1, etc. RAID's various designs all
involve two key design goals: increased data reliability or increased input/output performance.
When multiple physical disks are set up to use RAID technology, they are said to be in a RAID
array. This array distributes data across multiple disks, but the array is seen by the computer user
and operating system as one single disk. RAID can be set up to serve several different purposes.

Purpose and basics: Redundancy is achieved by either writing the same data to multiple drives
(known as mirroring), or writing extra data (known as parity data) across the array, calculated
such that the failure of one (or possibly more, depending on the type of RAID) disks in the array
will not result in loss of data. A failed disk may be replaced by a new one, and the lost data
reconstructed from the remaining data and the parity data. Organizing disks into a redundant array
decreases the usable storage capacity. For instance, a 2-disk RAID 1 array loses half of the total
capacity that would have otherwise been available using both disks independently, and a RAID 5
array with several disks loses the capacity of one disk. Other types of RAID arrays are arranged so
that they are faster to write to and read from than a single disk.There are various combinations of
these approaches giving different trade-offs of protection against data loss, capacity, and speed.
RAID levels 0, 1, and 5 are the most commonly found, and cover most requirements.
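The parity idea above can be illustrated in a few lines of Python (an editorial sketch; real RAID 5 also rotates parity across drives, which this omits): the parity block is the byte-wise XOR of the data blocks, so any single failed block is rebuilt from the survivors plus parity.

```python
from functools import reduce

def parity(blocks):
    """XOR byte-wise across equal-length data blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def rebuild(surviving_blocks, parity_block):
    """Recover the one missing data block: XOR the survivors with parity."""
    return parity(surviving_blocks + [parity_block])

d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\x0f\x0e"
p = parity([d0, d1, d2])        # parity written across the array
lost = rebuild([d0, d2], p)     # pretend the drive holding d1 failed
```

XOR works here because XOR-ing a value in twice cancels it out, which is exactly the property the text relies on when it says lost data is "reconstructed from the remaining data and the parity data".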

RAID can involve significant computation when reading and writing information. With
traditional "real" RAID hardware, a separate controller does this computation. In other cases the
operating system or simpler and less expensive controllers require the host computer's processor
to do the computing, which reduces the computer's performance on processor-intensive tasks
(see "Software RAID" and "Fake RAID" below). Simpler RAID controllers may provide only
levels 0 and 1, which require less processing.

RAID systems with redundancy continue working without interruption when one (or possibly
more, depending on the type of RAID) disks of the array fail, although they are then vulnerable
to further failures. When the bad disk is replaced by a new one the array is rebuilt while the
system continues to operate normally. Some systems have to be powered down when removing
or adding a drive; others support hot swapping, allowing drives to be replaced without powering
down. RAID with hot-swapping is often used in high availability systems, where it is important
that the system remains running as much of the time as possible.

Principles: RAID combines two or more physical hard disks into a single logical unit by using
either special hardware or software. Hardware solutions often are designed to present themselves
to the attached system as a single hard drive, so that the operating system would be unaware of the
technical workings. For example, if you configure a 1TB RAID 5 array using three 500GB
hard drives in hardware RAID, the operating system is simply presented with a "single"
1TB disk. Software solutions are typically implemented in the operating system and would present
the RAID drive as a single drive to applications running upon the operating system.

There are three key concepts in RAID: mirroring, the copying of data to more than one disk;
striping, the splitting of data across more than one disk; and error correction, where redundant
data is stored to allow problems to be detected and possibly fixed (known as fault tolerance).
Different RAID levels use one or more of these techniques, depending on the system
requirements. RAID's main aim can be either to improve reliability and availability of data,
ensuring that important data is available more often than not (e.g. a database of customer orders),
or merely to improve the access speed to files (e.g. for a system that delivers video on demand
TV programs to many viewers).

7.4 Associative memory


Many data-processing applications require searching for items in a table stored in memory. The established way to search a table is to store all items where they can be addressed in sequence. The search procedure is a strategy for choosing a sequence of addresses, reading the contents of memory at each address, and comparing the information read with the item being searched for, until a match occurs.
The number of accesses to memory depends on the location of the item and the efficiency of the search algorithm.
The time required to find an item stored in memory can be reduced considerably if stored data can be identified for access by the content of the data itself rather than by an address. A memory unit accessed by content is called an associative memory or content-addressable memory (CAM).

Compare each word in the CAM in parallel with the content of A (the Argument Register):
- If CAM Word[i] = A, then M(i) = 1.
- A read is performed by sequentially accessing the CAM for each word i with M(i) = 1.
- K (the Key Register) provides a mask for choosing a particular field or key in the argument A (only those bits of the argument that have 1s in the corresponding positions of K are compared).
Organization of CAM
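The masked match can be sketched directly (a Python loop standing in for what the hardware does in parallel; the bit patterns are illustrative):

```python
def cam_match(words, A, K):
    """Return the match list M: M[i] = 1 iff word[i] agrees with the
    argument register A at every bit position where the key register K
    holds a 1 (the parallel compare described above, done serially here)."""
    return [1 if (w & K) == (A & K) else 0 for w in words]

words = [0b10100, 0b00111, 0b01000, 0b10101, 0b11010]
A = 0b10111          # argument register
K = 0b11100          # key register: compare only the three high-order bits
M = cam_match(words, A, K)
```

Only words whose masked field equals A's masked field set their M bit, so a subsequent read can visit just the matching locations.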

7.5 Cache memory

The cache is a small amount of high-speed memory, usually with a memory cycle time
comparable to the time required by the CPU to fetch one instruction. The cache is usually filled
from main memory when instructions or data are fetched into the CPU. Often the main memory
will supply a wider data word to the cache than the CPU requires, to fill the cache more rapidly.
The amount of information which is replaced at one time in the cache is called the line size for
the cache. This is normally the width of the data bus between the cache memory and the main
memory. A wide line size for the cache means that several instruction or data words are loaded
into the cache at one time, providing a kind of prefetching for instructions or data. Since the
cache is small, the effectiveness of the cache relies on the following properties of most programs:

Spatial locality -- most programs are highly sequential; the next instruction usually comes
from the next memory location.

Data is usually structured, and data in these structures normally are stored in contiguous
memory locations.

Short loops are a common program structure, especially for the innermost sets of nested
loops. This means that the same small set of instructions is used over and over.

Generally, several operations are performed on the same data values, or variables.

When a cache is used, there must be some way in which the memory controller determines
whether the value currently being addressed in memory is available from the cache. There are
several ways that this can be accomplished. One possibility is to store both the address and the
value from main memory in the cache, with the address stored in a type of memory called
associative memory or, more descriptively, content addressable memory.

An associative memory, or content addressable memory, has the property that when a value is
presented to the memory, the address of the value is returned if the value is stored in the
memory, otherwise an indication that the value is not in the associative memory is returned. All
of the comparisons are done simultaneously, so the search is performed very quickly. This type
of memory is very expensive, because each memory location must have both a comparator and a
storage element. A cache memory can be implemented with a block of associative memory,
together with a block of ``ordinary'' memory. The associative memory would hold the address of
the data stored in the cache, and the ordinary memory would contain the data at that address.
Such a cache memory might be configured as shown in Figure.
Figure: A cache implemented with associative memory

If the address is not found in the associative memory, then the value is obtained from main
memory. Associative memory is very expensive, because a comparator is required for every
word in the memory, to perform all the comparisons in parallel. A cheaper way to implement a
cache memory, without using expensive associative memory, is to use direct mapping. Here, part
of the memory address (usually the low order digits of the address) is used to address a word in
the cache. This part of the address is called the index. The remaining high-order bits in the
address, called the tag, are stored in the cache memory along with the data. For example, if a
processor has an 18 bit address for memory, and a cache of 1 K words of 2 bytes (16 bits) length,
and the processor can address single bytes or 2 byte words, we might have the memory address
field and cache organized as in Figure .
Figure: A direct mapped cache configuration
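The 18-bit-address example above splits into 1 byte-select bit, 10 index bits (1K cache words), and 7 tag bits; a minimal sketch of that split and the hit test (field names and the dictionary cache are editorial assumptions):

```python
def split_address(addr):
    """Split an 18-bit byte address into (tag, index, byte) fields for a
    1K-word direct-mapped cache of 2-byte words."""
    byte  = addr & 0x1              # which byte within the 2-byte word
    index = (addr >> 1) & 0x3FF     # 10 bits select one of 1K cache words
    tag   = (addr >> 11) & 0x7F     # remaining 7 high-order bits are the tag
    return tag, index, byte

def lookup(cache, addr):
    """cache maps index -> (tag, word); return the word on a hit, else None."""
    tag, index, _ = split_address(addr)
    entry = cache.get(index)
    return entry[1] if entry and entry[0] == tag else None

cache = {5: (3, "word-at-0x180A")}
hit  = lookup(cache, (3 << 11) | (5 << 1))   # same tag and index -> hit
miss = lookup(cache, (4 << 11) | (5 << 1))   # same index, different tag
```

Note how two addresses with the same index but different tags compete for the same cache word, which is the direct-mapped limitation discussed below.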

This was, in fact, the way the cache is organized in the PDP-11/60. In the 11/60, however, there
are 4 other bits used to ensure that the data in the cache is valid. 3 of these are parity bits; one for
each byte and one for the tag. The parity bits are used to check that a single bit error has not
occurred to the data while in the cache. A fourth bit, called the valid bit is used to indicate
whether or not a given location in cache is valid. In the PDP-11/60 and in many other processors,
the cache is not updated if memory is altered by a device other than the CPU (for example when
a disk stores new data in memory). When such a memory operation occurs to a location which
has its value stored in cache, the valid bit is reset to show that the data is ``stale'' and does not
correspond to the data in main memory. As well, the valid bit is reset when power is first applied
to the processor or when the processor recovers from a power failure, because the data found in
the cache at that time will be invalid. In the PDP-11/60, the data path from memory to cache was
the same size (16 bits) as from cache to the CPU. (In the PDP-11/70, a faster machine, the data
path from the CPU to cache was 16 bits, while from memory to cache was 32 bits which means
that the cache had effectively prefetched the next instruction, approximately half of the time).
The amount of information (instructions or data) stored with each tag in the cache is called the
line size of the cache. (It is usually the same size as the data path from main memory to the
cache.) A large line size allows the prefetching of a number of instructions or data words. All
items in a line of the cache are replaced in the cache simultaneously, however, resulting in a
larger block of data being replaced for each cache miss.

The MIPS R2000/R3000 had a built-in cache controller which could control a cache up to 64K
bytes. For a similar 2K word (or 8K byte) cache, the MIPS processor would typically have a
cache configuration as shown in Figure . Generally, the MIPS cache would be larger (64Kbytes
would be typical, and line sizes of 1, 2 or 4 words would be typical).
Figure: One possible MIPS cache organization

A characteristic of the direct mapped cache is that a particular memory address can be mapped
into only one cache location. Many memory addresses are mapped to the same cache location (in
fact, all addresses with the same index field are mapped to the same cache location.) Whenever a
``cache miss'' occurs, the cache line will be replaced by a new line of information from main
memory at an address with the same index but with a different tag.

Note that if the program ``jumps around'' in memory, this cache organization will likely not be
effective because the index range is limited. Also, if both instructions and data are stored in
cache, it may well happen that both map into the same area of cache, and may cause each other
to be replaced very often. This could happen, for example, if the code for a matrix operation and
the matrix data itself happened to have the same index values.

A more interesting configuration for a cache is the set associative cache, which uses a set
associative mapping. In this cache organization, a given memory location can be mapped to
more than one cache location. Here, each index corresponds to two or more data words, each
with a corresponding tag. A set associative cache with n tag and data fields is called an ``n-way
set associative cache''. Usually n = 2^k ways, for k = 1, 2, or 3, are chosen for a set
associative cache (k = 0 corresponds to direct mapping). Such n-way set associative caches allow interesting tradeoff
possibilities; cache performance can be improved by increasing the number of ``ways'', or by
increasing the line size, for a given total amount of memory. An example of a 2-way set
associative cache is shown in Figure , which shows a cache containing a total of 2K lines, or 1 K
sets, each set being 2-way associative. (The sets correspond to the rows in the figure.)

Figure: A set-associative cache organization


In a 2-way set associative cache, if one data word is empty for a read operation corresponding to
a particular index, then it is filled. If both data words are filled, then one must be overwritten by
the new data. Similarly, in an n-way set associative cache, if all n data and tag fields in a set are
filled, then one value in the set must be overwritten, or replaced, in the cache by the new tag and
data values. Note that an entire line must be replaced each time. The most common replacement
algorithms are:

Random -- the location for the value to be replaced is chosen at random from all n of the
cache locations at that index position. In a 2-way set associative cache, this can be
accomplished with a single modulo 2 random variable obtained, say, from an internal
clock.
First in, first out (FIFO) -- here the first value stored in the cache, at each index position,
is the value to be replaced. For a 2-way set associative cache, this replacement strategy
can be implemented by setting a pointer to the previously loaded word each time a new
word is stored in the cache; this pointer need only be a single bit. (For set sizes > 2, this
algorithm can be implemented with a counter value stored for each ``line'', or index in the
cache, and the cache can be filled in a ``round robin'' fashion).
Least recently used (LRU) -- here the value which was actually used least recently is
replaced. In general, it is more likely that the most recently used value will be the one
required in the near future. For a 2-way set associative cache, this is readily implemented
by setting a special bit called the ``USED'' bit for the other word when a value is accessed
while the corresponding bit for the word which was accessed is reset. The value to be
replaced is then the value with the USED bit set. This replacement strategy can be
implemented by adding a single USED bit to each cache location. The LRU strategy
operates by setting a bit in the other word when a value is stored and resetting the
corresponding bit for the new word. For an n-way set associative cache, this strategy can
be implemented by storing a modulo n counter with each data word. (It is an interesting
exercise to determine exactly what must be done in this case. The required circuitry may
become somewhat complex, for large n.)
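The single-USED-bit LRU scheme for a 2-way set can be made concrete with a small sketch of one set (class and method names are editorial; real hardware does this with comparators, not Python):

```python
class TwoWaySet:
    """One set of a 2-way set associative cache with USED-bit LRU:
    accessing a way resets its USED bit and sets the other way's bit,
    so the way with USED = 1 is the replacement victim."""

    def __init__(self):
        self.tags = [None, None]
        self.data = [None, None]
        self.used = [0, 0]

    def access(self, tag, value):
        """Return True on a hit; on a miss, fill an empty way or replace
        the LRU way (the one whose USED bit is set)."""
        for way in (0, 1):
            if self.tags[way] == tag:
                self._touch(way)
                return True
        way = self.tags.index(None) if None in self.tags else self.used.index(1)
        self.tags[way], self.data[way] = tag, value
        self._touch(way)
        return False

    def _touch(self, way):
        self.used[way] = 0        # most recently used
        self.used[1 - way] = 1    # the other way becomes the LRU candidate

s = TwoWaySet()
s.access(1, "a"); s.access(2, "b")   # fill both ways
s.access(1, "a")                     # hit: tag 2's way is now LRU
s.access(3, "c")                     # miss: replaces the way holding tag 2
```

For n > 2 the single bit no longer suffices, which is why the text suggests a modulo-n counter per line.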

Cache memories normally allow one of two things to happen when data is written into a memory
location for which there is a value stored in cache:
Write through cache -- both the cache and main memory are updated at the same time.
This may slow down the execution of instructions which write data to memory, because
of the relatively longer write time to main memory. Buffering memory writes can help
speed up memory writes if they are relatively infrequent, however.
Write back cache -- here only the cache is updated directly by the CPU; the cache
memory controller marks the value so that it can be written back into memory when the
word is removed from the cache. This method is used because a memory location may
often be altered several times while it is still in cache without having to write the value
into main memory. This method is often implemented using an ``ALTERED'' bit in the
cache. The ALTERED bit is set whenever a cache value is written into by the processor.
Only if the ALTERED bit is set is it necessary to write the value back into main memory
(i.e., only values which have been altered must be written back into main memory). The
value should be written back immediately before the value is replaced in the cache.
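The write-back policy with its ALTERED (dirty) bit can be sketched as follows (an illustrative model, not any particular controller; the dictionary memory and class names are assumptions):

```python
class WriteBackCache:
    """Write-back sketch: CPU writes update only the cache and set the
    ALTERED bit; main memory is written only when an altered value is
    evicted, so repeated writes cost one memory write at most."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}             # addr -> [value, altered_bit]

    def write(self, addr, value):
        self.lines[addr] = [value, 1]   # set ALTERED; memory untouched

    def evict(self, addr):
        value, altered = self.lines.pop(addr)
        if altered:                     # write back only altered values
            self.memory[addr] = value

mem = {10: 0}
cache = WriteBackCache(mem)
cache.write(10, 99)
before = mem[10]     # memory still holds the stale 0
cache.evict(10)
after = mem[10]      # eviction writes the altered value back
```

A write-through cache would instead update `mem` inside `write`, trading extra memory traffic for a memory that is never stale.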

7.6 Virtual Memory Concept


In a memory hierarchy system, programs and data are first stored in auxiliary memory. Portions of a program or data are brought into main memory as they are needed by the CPU. Virtual memory is a concept used in some large computer systems that permits the user to construct programs as though a large memory space were available, equal to the totality of auxiliary memory. Each address that is referenced by the CPU goes through an address mapping from the so-called virtual address to a physical address in memory. Virtual memory is used to give the programmer the illusion that the system has a very large memory, even though the computer actually has a relatively small main memory.
7.6.1 Address Mapping

A memory mapping table translates each virtual address into a physical address.

The address space and the memory space are each divided into fixed-size groups of words called pages and blocks, respectively (for example, groups of 1K words).
Organization of the memory mapping table in a paged system

7.6.2 Associative memory page table


Assume that the number of blocks in memory = m and the number of pages in the virtual address space = n.

Page Table

- Straightforward design: an n-entry table in memory. Storage space utilization is inefficient, since n - m entries of the table are empty.
- A more efficient method is an m-entry page table, made of an associative memory of m words, each holding a (Page Number : Block Number) pair.
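The m-entry page table's behaviour can be sketched with a dictionary standing in for the associative memory (names and the 1K-word page size are illustrative assumptions):

```python
def translate(page_table, virtual_addr, page_size=1024):
    """Map a virtual address to a physical address via an m-entry page
    table holding only resident (page : block) pairs; a lookup miss
    signals a page fault (returned here as None)."""
    page, offset = divmod(virtual_addr, page_size)
    block = page_table.get(page)     # associative search by page number
    if block is None:
        return None                  # page fault: page not in memory
    return block * page_size + offset

page_table = {5: 1, 0: 3}                      # m resident pages only
phys  = translate(page_table, 5 * 1024 + 40)   # page 5 -> block 1
fault = translate(page_table, 2 * 1024)        # page 2 is not resident
```

Only the m resident pages occupy table entries, which is exactly the storage saving over the n-entry design described above.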

Figure: An associative-memory page table. The argument register holds the page number of the virtual address, the key register masks the page-number field, and each associative-memory word holds a (page number : block number) pair.
Page Fault

1. Trap to the OS

2. Save the user registers and program state

3. Determine that the interrupt was a page fault

4. Check that the page reference was legal and determine the location of the page on the
backing store(disk)

5. Issue a read from the backing store to a free frame

a. Wait in a queue for this device until serviced

b. Wait for the device seek and/or latency time

c. Begin the transfer of the page to a free frame

6. While waiting, the CPU may be allocated to some other process

7. Interrupt from the backing store (I/O completed)

8. Save the registers and program state for the other user

9. Determine that the interrupt was from the backing store

10. Correct the page tables (the desired page is now in memory)

11. Wait for the CPU to be allocated to this process again

12. Restore the user registers, program state, and new page table, then resume the interrupted
instruction.
Processor architecture should provide the ability to restart any instruction after a page fault.

Multiple choice questions


1. The contents of which chips are lost when the computer is switched off?
a) ROM Chips
b) RAM Chips
c) DRAM Chips

2. Which chips are responsible for storing permanent data and instructions?


a) ROM Chips
b) RAM Chips
c) DRAM Chips

3. Which memory is used to speed up processing?


a) Associative memory
b) Main memory
c) Cache memory

4. How many bits of information can each memory cell in a computer chip hold?
a) 0 bits
b) 1 bit
c) 8 bits
5. What type of computer chips are said to be volatile?
a) ROM chips
b) RAM chips
c) DRAM chips

6. The interface between level-2 (operating system) and level-1 (Microprogram) of a computer
design is called:
a).Computer architecture
b).Virtual machine interface
c).User interface
d).All of the above

7. Which memory is the actual working memory?

a) SAM b) ROM
c) RAM d) TAB memory

8.Which of the following is secondary memory device

a). ALU b).keyboard c). Disk d) All of the above

9. A microcomputer has a primary memory of 640K. What is the exact number of bytes contained
in this memory?

a) 640 × 1000 b) 640 × 100 c) 640 × 1024 d) either b or c

10. How many bits can be stored in an 8K RAM?

a) 8000 b) 8192 c) 4000 d) 4096

Answers

1.b, 2.a, 3.c, 4.b, 5.b, 6.b, 7.c, 8.c, 9.c, 10.b


Chapter 8 : Pipeline and Vector Processing

8.1 Parallel Processing


Parallel processing is a term used to denote a large class of techniques that provide simultaneous data-processing tasks for the purpose of increasing the computational speed of a computer. The purpose of parallel processing is to speed up the computer's processing capability and increase its throughput, that is, the amount of processing that can be accomplished during a given interval of time. The amount of hardware increases with parallel processing, and with it the cost of the system increases.

Figure 8.1: Processor with multiple functional units

Parallel processing at a higher level of complexity can be achieved by having a multiplicity of
functional units that perform identical or different operations simultaneously. Parallel processing
is established by distributing the data among the multiple functional units. For example, the
arithmetic, logic, and shift operations can be separated into three units and the operands diverted
to each unit under the supervision of a control unit. The normal operation of a computer is to
fetch instructions from memory and execute them in the processor.

The sequence of instructions read from memory constitutes an instruction stream. The operations
performed on the data in the processor constitutes a data stream. Parallel processing may occur in
the instruction stream, in the data stream, or in both. Flynn's classification divides computers into
four major groups as follows:

1. Single instruction stream, single data stream (SISD)


2. Single instruction stream, multiple data stream (SIMD)
3. Multiple instruction stream, single data stream (MISD)
4. Multiple instruction stream, multiple data stream (MIMD)

SISD represents the organization of a single computer containing a control unit, a processor unit,
and a memory unit. Instructions are executed sequentially, and the system may or may not have
internal parallel processing capabilities. Parallel processing in this case may be achieved by
means of multiple functional units or by pipeline processing.

SIMD represents an organization that includes many processing units under the supervision of a
common control unit. All processors receive the same instruction from the control unit but
operate on different items of data. The shared memory unit must contain multiple modules so
that it can communicate with all the processors simultaneously. The MISD structure is only of
theoretical interest, since no practical system has been constructed using this organization. The
MIMD organization refers to computer systems capable of processing several programs at the
same time. Most multiprocessor and multicomputer systems can be classified in this category.

Flynn's classification depends on the distinction between the performance of the control unit and
the data-processing unit. It emphasizes the behavior characteristics of the computer system
rather than its operational and structural interconnections. One type of parallel processing that
does not fit Flynn's classification is pipelining.
8.2 Pipelining
Pipelining is a technique of decomposing a sequential process into suboperations, with each
subprocess being executed in a special dedicated segment that operates concurrently with all other
segments. A pipeline can be visualized as a collection of processing segments through which
binary information flows.
Each segment performs partial processing dictated by the way the task is partitioned. The result
obtained from the computation in each segment is transferred to the next segment in the pipeline.
The final result is obtained after the data have passed through all segments. The name "pipeline"
implies a flow of information analogous to an industrial assembly line. It is characteristic of
pipelines that several computations can be in progress in distinct segments at the same time. The
overlapping of computation is made possible by associating a register with each segment in the
pipeline. The registers provide isolation between each segment so that each can operate on
distinct data simultaneously.
Perhaps the simplest way of viewing the pipeline structure is to imagine that each segment
consists of an input register followed by a combinational circuit. The register holds the data and
the combinational circuit performs the suboperation in the particular segment. The output of the
combinational circuit in a given segment is applied to the input register of the next segment. A
clock is applied to all registers after enough time has elapsed to perform all segment activity. In
this way the information flows through the pipeline one step at a time.
The pipeline organization will be demonstrated by means of a simple example. Suppose that we
want to perform the combined multiply and add operation on a stream of numbers:

Ai * Bi + Ci

for i = 1, 2, 3, . . . , 7
Each suboperation is to be implemented in a segment within a pipeline. Each segment has one or
two registers and a combinational circuit as shown in Figure 8.2. R1 through R5 are registers that
receive new data with every clock pulse. The multiplier and adder are combinational circuits. The
suboperations performed in each segment of the pipeline are as follows:
R1 <-- Ai, R2 <-- Bi          Input Ai and Bi
R3 <-- R1 * R2, R4 <-- Ci     Multiply and input Ci
R5 <-- R3 + R4                Add Ci to product

Figure 8.2: Example of pipeline processing
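The clock-by-clock behavior of these three segments can be simulated directly. The following sketch is my own illustration (the data values are arbitrary, not from the text): on every clock pulse all registers load simultaneously, so three different items of the stream are in flight at once.

```python
# Simulation of the three-segment pipeline that computes Ai*Bi + Ci for i = 1..7.
# Sample data (arbitrary values chosen for illustration).
A = [1, 2, 3, 4, 5, 6, 7]
B = [7, 6, 5, 4, 3, 2, 1]
C = [10, 20, 30, 40, 50, 60, 70]

# Registers between segments; None means the register holds no valid data yet.
R1 = R2 = R3 = R4 = R5 = None
results = []

for clock in range(len(A) + 2):          # two extra pulses to drain the pipeline
    # All registers load on the same clock edge, so the new values are
    # computed from the old register contents before anything is overwritten.
    new_R5 = R3 + R4 if R3 is not None else None             # segment 3: add
    new_R3 = R1 * R2 if R1 is not None else None             # segment 2: multiply
    new_R4 = C[clock - 1] if 1 <= clock <= len(C) else None  # segment 2: input Ci
    if clock < len(A):                                       # segment 1: input Ai, Bi
        new_R1, new_R2 = A[clock], B[clock]
    else:
        new_R1 = new_R2 = None
    R1, R2, R3, R4, R5 = new_R1, new_R2, new_R3, new_R4, new_R5
    if R5 is not None:
        results.append(R5)

print(results)   # one completed Ai*Bi + Ci per clock once the pipe is full
```

Note that the first result emerges after three clock pulses (the pipeline latency), and thereafter one result is completed per pulse.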

8.3 Arithmetic Pipeline


Pipeline arithmetic units are usually found in very high speed computers. They are used to
implement floating-point operations, multiplication of fixed-point numbers, and similar
computations encountered in scientific problems. A pipeline multiplier is essentially an array
multiplier as described in Fig. 10-10, with special adders designed to minimize the carry
propagation time through the partial products. Floating-point operations are easily decomposed
into suboperations, as we will now demonstrate with an example of a pipeline unit for
floating-point addition and subtraction.
Floating Point Arithmetic Pipeline:

a. Suitable for a pipeline, as it consists of a series of steps


b. Can implement FP ADD algorithm using the following pipeline

Fixed Point Arithmetic Pipeline:

a. Can convert the ripple-carry adder as follows:

b. Can be used for both ADD & SUB
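The classic four-segment floating-point ADD pipeline divides the work into: (1) compare the exponents, (2) align the mantissas, (3) add the mantissas, and (4) normalize the result. The sketch below is my own illustration of those four suboperations as sequential functions (the decimal representation and function names are assumptions, not from the text); in hardware each step would be a separate segment with its own register.

```python
# The four suboperations of a floating-point addition pipeline,
# using a decimal representation mantissa * 10**exponent with 0.1 <= |m| < 1.
def fp_add(m1, e1, m2, e2):
    # Segment 1: compare the exponents and pick the larger one.
    diff = e1 - e2
    exp = max(e1, e2)
    # Segment 2: align the mantissa belonging to the smaller exponent.
    if diff > 0:
        m2 = m2 / 10**diff
    elif diff < 0:
        m1 = m1 / 10**(-diff)
    # Segment 3: add the aligned mantissas.
    m = m1 + m2
    # Segment 4: normalize the result so that 0.1 <= |m| < 1 again.
    while abs(m) >= 1:
        m, exp = m / 10, exp + 1
    while m != 0 and abs(m) < 0.1:
        m, exp = m * 10, exp - 1
    return m, exp

# 0.9504 * 10**3 + 0.8200 * 10**2 -> approximately 0.10324 * 10**4
print(fp_add(0.9504, 3, 0.8200, 2))
```

With four values in flight, four different additions can occupy the four segments at the same time.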


8.4 Instruction Pipeline
Pipeline processing can occur not only in the data stream but in the instruction stream as well.
An instruction pipeline reads consecutive instructions from memory while previous instructions
are being executed in other segments. This causes the instruction fetch and execute phases to
overlap and perform simultaneous operations. One possible complication associated with such a
scheme is that an instruction may cause a branch out of sequence. In that case the pipeline must
be emptied and all the instructions that have been read from memory after the branch instruction
must be discarded.

Consider a computer with an instruction fetch unit and an instruction execution unit designed to
provide a two-segment pipeline. The instruction fetch segment can be implemented by means of
a first-in, first-out (FIFO) buffer. This is a type of unit that forms a queue rather than a stack.
Whenever the execution unit is not using memory, the control increments the program counter
and uses its address value to read consecutive instructions from memory. The instructions are
inserted into the FIFO buffer so that they can be executed on a first-in, first-out basis. Thus an
instruction stream can be placed in a queue, waiting for decoding and processing by the
execution segment. The instruction stream queuing mechanism provides an efficient way for
reducing the average access time to memory for reading instructions. Whenever there is space in
the FIFO buffer, the control unit initiates the next instruction fetch phase. The buffer acts as a
queue from which control then extracts the instructions for the execution unit.
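The fetch/queue/execute interplay described above can be sketched with a standard FIFO data structure. This is my own illustration (the program, buffer size, and mnemonics are invented for the example, not taken from the text):

```python
# Two-segment pipeline: a fetch unit fills a FIFO instruction queue,
# and the execution unit drains it in first-in, first-out order.
from collections import deque

program = ["LOAD A", "ADD B", "STORE C", "LOAD D", "ADD E", "STORE F"]
fifo = deque()          # the instruction queue between fetch and execute
pc = 0                  # program counter
executed = []

while pc < len(program) or fifo:
    # Fetch segment: whenever there is space in the buffer, the control
    # increments the program counter and fetches the next instruction.
    if pc < len(program) and len(fifo) < 4:   # buffer holds 4 instructions
        fifo.append(program[pc])
        pc += 1
    # Execute segment: extract the next instruction from the queue.
    if fifo:
        executed.append(fifo.popleft())

print(executed)         # instructions leave the queue in program order
```

The point of the queue is that fetching can run ahead of execution whenever the execution unit is not using memory, hiding instruction-fetch latency.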

Computers with complex instructions require other phases in addition to the fetch and execute to
process an instruction completely. In the most general case, the computer needs to process each
instruction with the following sequence of steps.
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.
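The benefit of overlapping these six steps can be seen in a space-time schedule. The sketch below is my own (the two-letter stage abbreviations are an assumption): with one instruction entering the pipeline per clock cycle, step k of instruction i occupies clock cycle i + k + 1.

```python
# Space-time schedule for the six-step instruction cycle:
# fetch instruction, decode, calculate address, fetch operands, execute, store result.
steps = ["FI", "DI", "CA", "FO", "EX", "SR"]
n_instructions = 4

# schedule[i][k] = clock cycle in which instruction i performs step k
schedule = [[i + k + 1 for k in range(len(steps))] for i in range(n_instructions)]
for i, row in enumerate(schedule):
    print(f"instruction {i + 1}:", dict(zip(steps, row)))

# A k-segment pipeline finishes n instructions in k + (n - 1) cycles,
# versus n * k cycles without any overlap.
total = len(steps) + (n_instructions - 1)
print("pipelined cycles:", total)
```

Here four instructions complete in 9 cycles instead of the 24 a strictly sequential machine would need.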

There are certain difficulties that will prevent the instruction pipeline from operating at its
maximum rate. Different segments may take different times to operate on the incoming
information. Some segments are skipped for certain operations. For example, a register mode
instruction does not need an effective address calculation. Two or more segments may require
memory access at the same time, causing one segment to wait until another is finished with the
memory. Memory access conflicts are sometimes resolved by using two memory buses for
accessing instructions and data in separate modules. In this way, an instruction word and a data
word can be read simultaneously from two different modules.

The design of an instruction pipeline will be most efficient if the instruction cycle is divided into
segments of equal duration. The time that each step takes to fulfill its function depends on the
instruction and the way it is executed.

8.5 RISC Pipeline


Among the characteristics attributed to RISC is its ability to use an efficient instruction pipeline.
The simplicity of the instruction set can be utilized to implement an instruction pipeline using a
small number of suboperations, with each being executed in one clock cycle. Because of the
fixed-length instruction format, the decoding of the operation can occur at the same time as the
register selection. All data manipulation instructions have register-to-register operations. Since
all operands are in registers, there is no need for calculating an effective address or fetching
operands from memory. Therefore, the instruction pipeline can be implemented with two or three
segments. One segment fetches the instruction from program memory, and the other segment
executes the instruction in the ALU. A third segment may be used to store the result of the ALU
operation in a destination register.
The data transfer instructions in RISC are limited to load and store instructions. These
instructions use register indirect addressing. They usually need three or four stages in the
pipeline. To prevent conflicts between a memory access to fetch an instruction and to load or
store an operand, most RISC machines use two separate buses with two memories: one for
storing the instructions and the other for storing the data. The two memories can sometimes
operate at the same speed as the CPU clock and are referred to as cache memories.
It is not possible to expect that every instruction be fetched from memory and executed in one
clock cycle. What is done, in effect, is to start each instruction with each clock cycle and to
pipeline the processor to achieve the goal of single-cycle instruction execution. The advantage of
RISC over CISC (complex instruction set computer) is that RISC can achieve pipeline segments
requiring just one clock cycle, while CISC uses many segments in its pipeline, with the longest
segment requiring two or more clock cycles.
Another characteristic of RISC is the support given by the compiler that translates the high-level
language program into a machine language program. Instead of designing hardware to handle the
difficulties associated with data conflicts and branch penalties, RISC processors rely on the
efficiency of the compiler to detect and minimize the delays encountered with these problems.

8.6 Vector Processing


There is a class of computational problems that are beyond the capabilities of conventional
computers. These problems are characterized by the fact that they require a vast number of
computations that will take a conventional computer days or even weeks to complete. In many
science and engineering applications, the problems can be formulated in terms of vectors and
matrices that lend themselves to vector processing. Computers with vector processing capabilities
are in demand in specialized applications. The following are representative application areas
where vector processing is of the utmost importance.

1. Long-range weather forecasting
2. Petroleum explorations
3. Seismic data analysis
4. Medical diagnosis
5. Aerodynamics and space flight simulations
6. Artificial intelligence and expert systems
7. Mapping the human genome
8. Image processing
Without sophisticated computers, many of the required computations cannot be completed within
a reasonable amount of time. To achieve the required level of high performance it is necessary to
utilize the fastest and most reliable hardware and apply innovative procedures from vector and
parallel processing techniques.

Vector Operations
Many scientific problems require arithmetic operations on large arrays of numbers. These
numbers are usually formulated as vectors and matrices of floating-point numbers. A vector is
an ordered set of a one-dimensional array of data items. A vector V of length n is represented as
a row vector by V = [V1 V2 V3 ... Vn]. It may be represented as a column vector if the data items
are listed in a column. A conventional sequential computer is capable of processing operands one
at a time. Consequently, operations on vectors must be broken down into single computations
with subscripted variables. The element Vi of vector V is written as V(I), and the index I refers to
a memory address or register where the number is stored. To examine the difference between a
conventional scalar processor and a vector processor, consider the following Fortran DO loop:
DO 20 I = 1, 100
20 C(I) = B(I) + A(I)
This is a program for adding two vectors A and B of length 100 to produce a vector C. This is
implemented in machine language by the following sequence of operations.
Initialize I = 0
20 Read A(I)
Read B(I)
Store C(I) = A(I) + B(I)
Increment I = I + 1
If I <= 100 go to 20
Continue

This constitutes a program loop that reads a pair of operands from arrays A and B and performs a
floating-point addition. The loop control variable is then updated and the steps repeat 100 times.
A computer capable of vector processing eliminates the overhead associated with the time it
takes to fetch and execute the instructions in the program loop. It allows operations to be
specified with a single vector instruction of the form
C(1:100) = A(1:100) + B(1:100)
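The contrast between the scalar loop and the single vector operation can be sketched as follows. This is my own illustration (the data values are arbitrary), with a list comprehension standing in for the vector instruction:

```python
# Scalar loop versus a single vector operation for C(1:100) = A(1:100) + B(1:100).
A = list(range(1, 101))
B = list(range(100, 0, -1))

# Scalar processor: one subscripted addition per iteration, plus the
# explicit loop-control overhead (increment, compare, branch) of the DO loop.
C_scalar = [0] * 100
I = 0
while I < 100:
    C_scalar[I] = A[I] + B[I]
    I += 1

# Vector processor: the entire operation specified as one instruction;
# the comprehension stands in for the hardware vector add.
C_vector = [a + b for a, b in zip(A, B)]

print(C_vector == C_scalar)   # same result, none of the per-element loop control
```

The vector form not only removes the loop-control instructions but also exposes the 100 independent additions to the pipelined or parallel arithmetic units.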

Vector Instructions

Vector Instruction format


Pipeline for Inner Product
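The inner product Σ Ai*Bi can exploit a pipelined adder by keeping several partial sums in flight at once. The sketch below is my own rendering of the standard scheme (not taken from the missing figure): with a four-segment adder, products are distributed round-robin into four interleaved partial sums, which are merged at the end.

```python
# Inner product with a four-segment pipelined adder: each of the four
# partial sums absorbs every fourth product, mimicking four additions
# simultaneously in progress in the adder pipeline.
A = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
B = [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]

partials = [0.0, 0.0, 0.0, 0.0]      # one partial sum per adder segment
for i, (a, b) in enumerate(zip(A, B)):
    partials[i % 4] += a * b         # product i joins partial sum i mod 4

inner_product = sum(partials)        # final merge of the four partial sums
print(inner_product)
```

In hardware the final merge is done by feeding the four partial sums back through the same pipelined adder until a single sum remains.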

8.7 Array Processors


An array processor is a processor that performs computations on large arrays of data. The term is
used to refer to two different types of processors. An attached array processor is an auxiliary
processor attached to a general-purpose computer. It is intended to improve the performance of
the host computer in specific numerical computation tasks. An SIMD array processor is a processor
that has a single-instruction, multiple-data organization. It manipulates vector instructions by means
of multiple functional units responding to a common instruction. Although both types of array
processors manipulate vectors, their organization is different.

Attached Array Processor


An attached array processor is designed as a peripheral for a conventional host computer, and its
purpose is to enhance the performance of the computer by providing vector processing for complex
applications. It achieves high performance by means of parallel processing with multiple
functional units. It includes an arithmetic unit containing one or more pipelined floating-point
adders and multipliers. The array processor can be programmed by the user to accommodate a
variety of complex arithmetic problems.
Attached array processor with host computer

SIMD Array Processor

An SIMD array processor is a computer with multiple processing units operating in parallel. The
processing units are synchronized to perform the same operation under the control of a common
control unit, thus providing a single instruction stream, multiple data stream organization. A
general block diagram of an array processor is shown in the diagram below. It contains a set of
identical processing elements (PEs), each having a local memory M. Each processor includes an
ALU, a floating-point arithmetic unit, and working registers. The master control unit controls the
operations in the processor elements. The main memory is used for storage of the program. The
function of the master control unit is to decode the instructions and determine how each
instruction is to be executed. Scalar and program control instructions are directly executed
within the master control unit. Vector instructions are broadcast to all PEs simultaneously.
Each PE uses operands stored in its local memory. Vector operands are distributed to the local
memories prior to the parallel execution of the instruction.
8.8 Reduced Instruction set computer: RISC VS CISC
Many computers have instruction sets that include more than 100 and sometimes even
more than 200 instructions. These computers also employ a variety of data types and a
large number of addressing modes. The trend toward computer hardware complexity was
influenced by various factors, such as upgrading existing models to provide more customer
applications, adding instructions that facilitate the translation from high-level language into
machine language programs, and striving to develop machines that move functions from
software implementation into hardware implementation. A computer with a large number
of instructions is classified as a complex instruction set computer, abbreviated CISC.

CISC Characteristics
1. A large number of instructions-typically from 100 to 250 instructions.
2. Some instructions that perform specialized tasks and are used infrequently.
3. A large variety of addressing modes-typically from 5 to 20 different modes.
4. Variable-length instruction formats
5. Instructions that manipulate operands in memory
RISC Characteristics
1. Relatively few instructions.
2. Relatively few addressing modes.
3. Memory access limited to load and store instructions.
4. All operations done within the registers of the CPU.
5. Fixed-length, easily decoded instruction format.
6. Single-cycle instruction execution.
7. Hardwired rather than microprogrammed control.

End Chapter quizzes:


Q1. Which technique uses special hardware to detect a conflict and then avoid it by routing
the data through special paths between pipeline segments?
a) Operand forwarding
b) Delayed load
c) Hardware interlocks
d) Delayed branch

Q2. In which architecture are data transfers limited to load and store instructions?


a) CISC
b) RISC
c) SIMD
d) MISD

Q3. Which pipeline is used to implement floating-point operations, multiplication of
fixed-point numbers, and similar computations encountered in scientific problems?
a) Instruction Pipeline
b) Arithmetic Pipeline
c) RISC Pipeline
d) CISC Pipeline
Q4. Which of the following is not included within each SIMD processing element?
a) ALU
b) Processing Element
c) Local memory
d) Control Unit

Q5. Pipelining is a technique of decomposing a __________ into __________, with each
subprocess being executed in a special dedicated segment.
a) sequential process and suboperations
b) Suboperations and sequential process
c) Instructions and sequential process
d) Sequential and segmentation

Q6. A flow of information analogous to an industrial assembly line characterizes


a) Parallel processing
b) Vector processing
c) Array processor
d) Pipelining

Q7. Parallel processing can be achieved with the help of


a) Multiple functional unit
b) Synchronization
c) Stream lining the data
d) Execution of instruction

Q8. To evaluate the expression A - B/C in a pipeline, the number of registers required is


a) 3
b) 4
c) 5
d) 2
Q9. Flynn's classification divides computers into how many major groups?
a) 5
b) 6
c) 3
d) 4

Q10. The measure used to evaluate supercomputers by their ability to perform a given
number of floating-point operations per second is referred to as
a) Flops
b) Bytes
c) Bits
d) Hz

Answers
1. (a), 2. (b), 3. (b), 4. (d), 5. (a), 6. (d), 7. (a), 8. (c), 9. (d), 10. (a)
