
Embedded Systems

Purushotam Shrestha

Chapter 2: Hardware and Software Design Issues


2.1 Hardware Design Issues
2.1.1 Combinational and Sequential Logic
Combinational Logic Circuits
A combinational logic circuit produces its output from the current values of its inputs alone; no memory or storage of
previous history is required. Note that the output appears only after a certain time, the propagation
delay, has elapsed since the inputs were applied. The output may therefore not change immediately, but this delay
should not be confused with memory.
Figure: Combinational Logic (Inputs -> Combinational Logic -> Outputs)
A combinational circuit is represented by a truth table, which lists every combination of inputs and the
corresponding outputs.
Examples are adder circuits, multiplexers, decoders and comparators.
Combinational design
A generalized design procedure is explained below:
1. Use the problem description to derive the truth table containing the inputs and corresponding outputs.
2. Use K-maps to find a logic expression, or use a pre-defined function.
3. Use Boolean algebra if further simplification is required.
4. Draw the circuit and simulate/implement it using the necessary gates.
5. Testing and re-design may be required.
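
The procedure above can be tried out in software before any gates are drawn. The following is a minimal sketch in Python, assuming a 1-bit full adder as the problem (adders are listed among the examples, but the choice of circuit and all function names here are illustrative). It builds the truth table from the arithmetic specification and checks that a candidate gate-level expression, such as one read off a K-map, reproduces it.

```python
from itertools import product


def full_adder_spec(a, b, cin):
    """Specification: add three 1-bit inputs arithmetically."""
    total = a + b + cin
    return total & 1, total >> 1  # (sum bit, carry-out bit)


def full_adder_gates(a, b, cin):
    """Candidate gate-level expressions (XOR for sum, majority for carry)."""
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (a & cin)
    return s, cout


def truth_table(fn, n_inputs):
    """Map every input combination to the outputs produced by fn."""
    return {bits: fn(*bits) for bits in product((0, 1), repeat=n_inputs)}


spec_table = truth_table(full_adder_spec, 3)
gate_table = truth_table(full_adder_gates, 3)

# Step 5 (testing): the gate-level design must match the specification.
design_is_correct = spec_table == gate_table
```

If the two tables differ, the differing rows point to the input combinations for which the logic expression has to be re-derived.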

Sequential Logic Circuits


A sequential logic circuit gives output as a function of current as well as previous inputs. Because previous
inputs are involved, some sort of memory or storage is needed in these circuits; the memory is provided by a
feedback mechanism. A sequential circuit stays in a state and changes from one state to another based on input
values. A state can be considered as the combination of the values stored in the memory elements. The same input may
give different outputs depending upon the state of the circuit.

Figure: Sequential Logic (combinational logic with feedback through sequential logic elements: flip-flops)

At the heart of a sequential circuit are the flip-flops, which provide the memory-based functions,
while some processing may be done by additional combinational logic.
Examples are JK flip-flops, counters and registers.
A sequential circuit is represented by a state diagram, consisting of states (circles) and transitions (lines)
between states triggered by input control, or by a state table listing input, present state, output, next state
and required excitations.

Figure: State Diagram

Sequential design
1. Use the problem description to derive the state diagram/table consisting of present states, inputs, next
states and corresponding outputs.
2. If N is the number of states, at least ceil(log2 N) flip-flops are required. Choose a particular flip-flop type on
the basis of availability, cost, required flexibility etc.
3. Find the flip-flop inputs that change the current state into the next state for each flip-flop, using flip-flop
excitation tables. The guidelines for combinational design apply here.
4. Draw the circuit and simulate/implement it using the necessary gates.
5. Testing and re-design may be required.
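
As an illustration of steps 2 and 3, here is a minimal Python sketch. It assumes a 4-state up counter with an enable input, built from two D flip-flops; the counter, the enable input and the function names are all assumptions chosen for the example, not a circuit taken from the text. For a D flip-flop the excitation input equals the desired next state, so the next-state equations are also the flip-flop input equations.

```python
def flip_flops_needed(n_states):
    """Smallest number of flip-flops that can encode n_states states."""
    if n_states < 1:
        raise ValueError("a circuit needs at least one state")
    # (n - 1).bit_length() equals ceil(log2 n) for n >= 2; use 1 for n == 1.
    return max(1, (n_states - 1).bit_length())


def next_state(q1, q0, en):
    """D inputs for a 2-bit up counter: count when en is 1, else hold."""
    d0 = q0 ^ en
    d1 = q1 ^ (q0 & en)
    return d1, d0


def run(inputs, state=(0, 0)):
    """Apply one input value per clock edge and record every state visited."""
    trace = [state]
    for en in inputs:
        state = next_state(*state, en)
        trace.append(state)
    return trace


trace = run([1, 1, 0, 1, 1])
# trace is [(0, 0), (0, 1), (1, 0), (1, 0), (1, 1), (0, 0)]
```

The third input is 0, so the state (1, 0) is held for one clock: the same circuit gives a different next state for the same present state depending on the input, and different outputs for the same input depending on the state.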

2.1.2 Custom Single Purpose Processor Design


A single purpose processor is characterized by:
A very specific, single task; one program.
A hardware implementation that usually requires no program memory. Even if a program exists, it has few
instructions, enabling a direct hardware implementation.
Single purpose processors have the advantage that they are in general:

Faster
Smaller
Lower in power consumption

The disadvantages are:

High NRE: the non-recurring engineering effort may be higher, implying higher cost.
Less flexibility: a processor of this type may be useless when required to perform a different task.

Examples:
Timers, whose sole purpose is to decrement a loaded value (time) to zero and give a signal.
LED and LCD display drivers, which take in certain bit values and compute specific bit patterns suitable for the
devices.
Motor control circuits, which generate driving signals in response to a command.
A custom single purpose processor is required for a non-standard task, one that a designer needs but that is not
commercially available.

Design
A custom single purpose processor is designed to meet a non-standard, specific customer/application need.
Generally, a single purpose processor consists of a controller and a datapath.

Fig: Single Purpose Processor Architecture


Controller: The controller consists of the circuitry that controls the actions of the functional units: it determines
which operation to perform by establishing paths between the units and selecting hardware blocks for computations. For
this it takes in external control signals, generates control signals for the registers and other functional
units, and gives out status and other control signals.
Datapath: The datapath block consists of the registers, ALUs, interconnection buses and multiplexers required for
handling data: moving data between registers and memory, performing computations on data, and feeding data to
and taking results from functional units. The operations are carried out on the basis of the control signals
provided by the controller.
A general procedure for designing a custom single purpose processor involves the following steps:
Specification: Before starting any design, the requirements must be clear: what the processor has to do.
Identify the inputs and outputs.
Algorithm: The processor processes inputs and gives outputs. The algorithm describes how it does the processing;
a flowchart may also do the job. The processing may require basic arithmetic operations, logical operations or
combinations of these. Develop the algorithm and verify it. A control mechanism is required to
carry out those operations in a certain sequence in order to give the desired results.
Finite State Machine with Datapath (FSMD): An FSMD is a complex state diagram in which states and arcs may
include arithmetic and logical expressions, which may use external control inputs and outputs as well as
variables; it describes the design at the register transfer level. Use the algorithm to construct an FSMD, or
construct it directly if possible. It resembles a flowchart containing the expressions. The FSMD shows the number
of states required to perform the task at hand.
Datapath: Use a suitable register for each variable, whether an input, an output or an intermediate result. For each
type of operation/computation, use a functional block, for example an adder for addition. An ALU has several
functional units, but we are designing a custom processor, not a general purpose one, so include only the units
that are needed. Define the interconnections between the registers and functional units.
Finite State Machine (FSM): The FSM is for the controller. Assign a binary code to each state in the FSMD.
Identify the control signals required by the datapath to carry out the operations in the sequence and manner
given by the specification. There may be inputs to the state machine, either external or generated by the datapath.
Based on these inputs and the current state, the FSM generates the control signals as outputs. For example, if there
is a register load operation in the FSMD, the signal line controlling the load operation is identified in the
datapath, labeled as an output variable of the controller and included in the state diagram.
Controller: The controller design is essentially a sequential circuit design based on the above FSM. Use
combinational and sequential logic elements to design the controller. A state table with present state,
inputs, next state and outputs may be helpful at this stage.
Implementation and Testing: After the design is completed, simulation helps to catch errors in the
design. Iterative review and simulation can reduce and eliminate the errors. The actual hardware implementation
can then be done. The hardware should be tested before it is put to use.
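
The split between controller and datapath can be sketched in software before committing to hardware. The following Python model is a minimal sketch, assuming the countdown timer mentioned among the examples as the task; the state names (IDLE, LOAD, COUNT, DONE), the signal names (load, dec, done) and the cycle-by-cycle simulation loop are all assumptions made for illustration.

```python
class TimerDatapath:
    """Datapath: one count register, a decrementer and a zero comparator."""

    def __init__(self):
        self.count = 0

    def clock(self, load, dec, value):
        """Update the register on a clock edge according to control signals."""
        if load:
            self.count = value
        elif dec:
            self.count -= 1

    @property
    def is_zero(self):
        """Status signal fed back to the controller."""
        return self.count == 0


def controller(state, start, is_zero):
    """FSM for the controller: returns (next_state, load, dec, done)."""
    if state == "IDLE":
        return ("LOAD" if start else "IDLE", 0, 0, 0)
    if state == "LOAD":
        return ("COUNT", 1, 0, 0)
    if state == "COUNT":
        if is_zero:
            return ("DONE", 0, 0, 0)
        return ("COUNT", 0, 1, 0)
    if state == "DONE":
        return ("IDLE", 0, 0, 1)
    raise ValueError(f"unknown state {state!r}")


def run_timer(value, max_cycles=1000):
    """Simulate until done is asserted; return the cycle in which it occurs."""
    datapath = TimerDatapath()
    state = "IDLE"
    for cycle in range(max_cycles):
        state_next, load, dec, done = controller(state, 1, datapath.is_zero)
        if done:
            return cycle
        datapath.clock(load, dec, value)
        state = state_next
    raise RuntimeError("timer did not finish")


cycles_for_3 = run_timer(3)
```

In this model the controller never touches the data itself: it only sees the status signal is_zero and only emits load, dec and done, which mirrors the FSM step of the procedure. Four states need two flip-flops in the controller.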

2.1.3 Optimization
Optimization is the process of maximizing the output/efficiency of a system for the available, often limited, resources.
Once the design phase is completed, the whole process should be reviewed for optimization opportunities. The
custom single purpose processor can be optimized stage-wise as follows:
Original Program/Algorithm:
The areas of improvement may be in
Size of variables: the size of a variable directly determines the size of registers and interconnection buses; reduce
the size if possible.
Number of computations: multiple computations may be reduced. A subtraction of the value 1 may be replaced by a
decrement, reducing complexity. Approximation can also reduce the complexity.
Operations used: multiplication and division hardware is costly; replace it with other operations where possible.
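
Two of the points above can be made concrete with a minimal Python sketch. The constant 10 and the function names are assumptions chosen for illustration: the first function gives the register width that a variable with a known maximum actually needs, and the second replaces a multiplication by a constant with two shifts and one addition, which need only wiring and an adder in hardware.

```python
def register_width(max_value):
    """Number of bits needed to hold unsigned values 0..max_value."""
    if max_value < 0:
        raise ValueError("unsigned values only")
    return max(1, max_value.bit_length())


def times_ten_shift_add(x):
    """x * 10 computed as x * 8 + x * 2, using shifts instead of a multiplier."""
    return (x << 3) + (x << 1)


width_for_percent = register_width(100)  # a 0..100 value fits in 7 bits
same_as_multiply = all(times_ten_shift_add(x) == 10 * x for x in range(256))
```

Whether such a replacement pays off depends on the constant involved; a multiplication by an arbitrary variable cannot be reduced this way.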
The FSMD:
Merge states: A state whose outgoing transition has a constant condition can be eliminated, since the transition
taken is already known. Two states with independent operations can be merged into one. For example, two
register-load operations can be performed in a single state if there are two registers available.
Split states: If a state contains a complex operation, it can be split into states with simpler operations, which
implies simpler and less hardware. Instead of adding 4 numbers at once, each can be added one by one to a sum
initialized to zero.
The Datapath:
Reuse of hardware units: It is not necessary to use a separate hardware unit for each operation. If the operations
are the same and are not carried out simultaneously, the hardware unit can be shared. Repetitive and sequential
(non-simultaneous) operations like additions can be performed in a single hardware unit.
Use of multifunctional units: A multifunction unit like an ALU can be used for arithmetic and logical operations, the
function being selected as required. A single register with right and left shift capability can be used for both
operations instead of using two registers.
The controller:
Number of states: the controller involves states and transitions between the states. Its optimization follows
directly from the FSM. Similar techniques of state minimization and simplification can be applied.

2.2 Software Design Issues


2.2.1 Basic Architecture
When software is involved, the processor needs to be programmable. A processor executing a program has a
controller and a datapath that differ from those of a single purpose processor: the datapath is capable of
multiple functions, which are selected by the controller in the sequence and manner dictated by the program.
Since a program is required, the processor needs memory for storing the program and control logic to read from
and write to the memory.
The controller gets an instruction from memory, decodes it and generates appropriate control signals for the
datapath.
Generalization, program and storage are the key aspects of the basic architecture.

Fig: Architecture of a General Purpose or programmable processor

Other Architectural Aspects


Since memory is involved, the architecture may be Harvard (separate program and data memories) or Princeton (a
single shared memory).
The word length of the processor, and the widths of the registers and interconnection buses.
The clock frequency, which should accommodate the propagation delay inherent in the devices.
Pipelining and multi-core architectures, which processors may use for enhancing speed.
Superscalar and VLIW


Both the superscalar and VLIW (very long instruction word) architectures involve multiple execution units.
Independent instructions can be sorted and assigned to the appropriate execution units simultaneously so that
they can be executed in parallel, thus increasing performance. By independent instructions we mean that one
instruction does not have to wait for the results of another, and that the instructions do not write to the same
location at the same time. Dependent instructions must be executed sequentially.
The sorting and scheduling of the instructions can be done at compile time or at run time.
A superscalar architecture schedules the independent instructions at run time using specialized hardware, a
process known as dynamic scheduling.
In VLIW, the scheduling takes place at compile time: long instruction words are prepared, each of which
is later assigned to the execution units. This requires simpler hardware.
Pipelining:
A method of executing tasks by breaking them into well-defined steps and executing the different steps of
successive tasks simultaneously in order to increase throughput.
Example:
Execution of an instruction involves
Instruction Fetch
Decode
Execute stages.
Each stage is carried out by separate hardware. The first instruction is fetched and goes to the next stage. Now
the fetching hardware is free, so another instruction can be fetched. The first instruction is decoded and moved
to the execute stage. The second instruction is sent to the decode stage. Now a third instruction can be fetched.
Three instructions are in the process of execution or, one can say, in the pipeline. Once the first instruction
has been executed, one instruction completes every cycle, whereas a non-pipelined processor takes 3 cycles for each
instruction. Hence speed is gained.
Time cycle          1    2    3    4    5    6
Instruction no 1    IF   DE   EX
Instruction no 2         IF   DE   EX
Instruction no 3              IF   DE   EX
Instruction no 4                   IF   DE   EX
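
The gain can be quantified with a minimal Python sketch, assuming an ideal pipeline in which every stage takes one cycle and no instruction ever has to wait for another (no stalls or hazards); the function names are illustrative. With k stages and n instructions, the non-pipelined processor needs n * k cycles, while the pipelined one needs k cycles for the first instruction and one more cycle for each further instruction.

```python
STAGES = ("IF", "DE", "EX")


def cycles_non_pipelined(n_instructions, n_stages=len(STAGES)):
    """Each instruction occupies the processor for all of its stages."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions, n_stages=len(STAGES)):
    """Fill the pipeline once, then complete one instruction per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def schedule(n_instructions, stages=STAGES):
    """Cycle (counted from 1) in which each instruction is in each stage."""
    return [
        {stage: i + s + 1 for s, stage in enumerate(stages)}
        for i in range(n_instructions)
    ]


plan = schedule(4)
# plan[0] is {"IF": 1, "DE": 2, "EX": 3}; plan[3] is {"IF": 4, "DE": 5, "EX": 6}
speedup = cycles_non_pipelined(4) / cycles_pipelined(4)  # 12 / 6
```

For long instruction sequences the ratio approaches the number of stages, which is the upper bound for this idealized model.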

2.2.2 Operations:
In order to carry out a task, a processor performs computational and control operations. The operations can be
broadly categorized into two groups:
Datapath operations and
Control operations
Datapath Operations:
The datapath, being the data processing unit, carries out arithmetic and logical operations on data and moves data
into and out of the actual computational unit. The following are the main operations carried out by
the datapath:
Load/Read operations:
The data to be processed are loaded into the ALU registers, either from memory or from input registers connected
to sensors and other input modules. Example: MOV operations.
ALU operations:
The ALU can perform many different arithmetic and logical operations. The data are loaded into the ALU,
one of the several processing functions is selected using appropriate values on the control lines, and the results
appear in the output register. Examples: ADD, SUB, AND operations. Depending upon the application of the
processor, the operations may be different, but they are basically computational in nature.
Store/Write operations:
The outputs of the computations are written into memory for further computations or loaded into output
registers that interface with the external world through some additional stages. Examples: MOV, PUSH operations.
Controller operations:
The datapath needs control signals in order to carry out its operations. The controller is responsible for
providing these signals based upon the program instruction. In general, the controller repeatedly performs the
following sequence of operations:
Fetch instruction: The controller gets the instruction from the memory address pointed to by the program counter,
which always points to the address of the next instruction. Once the instruction is fetched, the program counter is
increased by 1 or a jump address is loaded. The fetched opcode is loaded into the instruction register for decoding.
Decode instruction: The value loaded into the instruction register is decoded to find out what the instruction means.
Each value of an opcode is unique and is decoded using logic circuits to generate control signals that activate/
deactivate or enable/disable registers, a function of the ALU etc.
Fetch operands: Once an instruction is decoded, operands are required to operate on. The operand fetch is a
data movement process between datapath registers and memory. The registers and memory addresses are
determined by the addressing modes. Some instructions may not require data for their operation, like subroutine
return (RET) and no operation (NOP).
Execute: The execute phase involves passing data to the actual processing unit, like the ALU, and selecting its
function. The unit gives the results according to the function selected.
Store results: The output might be required for further computations and needs to be stored in memory or
other registers. So the processed data is moved from the registers in the datapath to the specified memory address
or registers.

Fig: Operations / Instruction Execution

2.2.3 Programmer's view


Hardware details are not a concern for a programmer. Rather than how the components of the processor are
interconnected, a programmer is interested in what the given processor can be programmed to do. A programmer
sees a processor as a system of the following:
Instruction Set
A processor is bound to its instruction set; it cannot perform operations outside it. A person programming a
processor must know the instructions the processor can execute. The instructions may be arithmetic and
logical, data transfer or branching instructions. Incomplete knowledge renders both the program and the processor
executing it inefficient.
The instruction set may be RISC or CISC.
The processor may be programmed using:
A structured high level language: easier for the programmer, but the program size is larger
A low level assembly language: harder for the programmer, but the program size is smaller
Memory
Another concern for a programmer is the availability of memory. The size of the program cannot exceed the size
of the memory an embedded system has for program storage. The speed and efficiency of program execution
depend upon the number of registers, the memory closest to the processor. The width of the word stored by the
memory/registers is also a key point: a floating point number requires larger words and larger registers. Again,
the hardware details of how the memory is built are not relevant to the programmer.
Addressing modes
The addressing modes determine how, or from where, the necessary operands are fetched. Some common
modes are:
Immediate: the operand itself follows immediately after the opcode
Register direct: the operand is in the specified register
Register indirect: the operand is in memory, at the address contained in the specified register
Direct: the operand is in memory, at the address given by the value immediately after the opcode
Indirect: the operand is in memory, at the address stored in the memory location whose address follows
immediately after the opcode
Addressing modes are logical features of the instruction set.
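
The five modes listed above can be compared side by side in a minimal Python sketch. The mode names used as strings, the register names R0 and R1, and the memory contents are all assumptions made for illustration; `field` stands for whatever follows the opcode in the instruction.

```python
def fetch_operand(mode, field, registers, memory):
    """Return the operand selected by the addressing mode."""
    if mode == "immediate":
        return field                      # the field is the operand itself
    if mode == "register_direct":
        return registers[field]           # the field names a register
    if mode == "register_indirect":
        return memory[registers[field]]   # the register holds an address
    if mode == "direct":
        return memory[field]              # the field is an address
    if mode == "indirect":
        return memory[memory[field]]      # the field points to an address
    raise ValueError(f"unknown addressing mode {mode!r}")


registers = {"R0": 7, "R1": 2}
memory = [10, 20, 30, 1, 50]  # addresses 0..4

operands = {
    "immediate": fetch_operand("immediate", 5, registers, memory),
    "register_direct": fetch_operand("register_direct", "R0", registers, memory),
    "register_indirect": fetch_operand("register_indirect", "R1", registers, memory),
    "direct": fetch_operand("direct", 4, registers, memory),
    "indirect": fetch_operand("indirect", 3, registers, memory),
}
```

The number of memory reads grows from zero (immediate, register direct) to one (register indirect, direct) to two (indirect), which is one reason the choice of mode affects execution speed.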
Input/Output System
The input/output system determines how the system interacts with the outside world. The program takes in some
input and delivers output through a number of ports. The ports may require preparation, some lines of code,
before they can actually perform the desired input or output task. Input and output may be handled by round-robin
polling, or an interrupt-driven feature may be available.
Interrupts
An interrupt is a signal generated by a peripheral device to get the attention of the processor while it is busy
executing other instructions. Once an interrupt is issued, the processor suspends whatever it is doing and starts
the interrupt subroutine. The programmer has to know what interrupts are available and how to enable or disable
them. If there are multiple interrupts, the precedence of the interrupts is also important.
Operating System
There might be an operating system that provides low-level services like memory read/write management, I/O
interfacing, task scheduling etc. The operating system makes it easy for the programmer to handle those
cumbersome tasks.

2.2.4 Development environment


The development environment refers to the hardware and software elements required to write and debug programs,
transfer the program to the processor and test the whole system under design.
Development computer
Program development takes place on a separate processor/computer, the host processor.
The host contains the software systems that allow program writing, compiling, assembling and debugging for specific
types of controllers. There may also be an emulator that mimics the target device so that the program can be
tested on a system that behaves like the actual hardware.
The host may also include circuit design and analysis software packages for the hardware design phase. The
output may be a PCB layout file for circuit board fabrication.
Target
The program is developed for a target processor, into which the tested program is downloaded or burned. The
target processor runs the program and does some useful work.
Hardware testing and debugging tools:
Before downloading the program: instruction set simulation, emulation
After downloading into the actual device: digital multimeters, oscilloscopes, logic analysers, function generators
IDE (Integrated Development Environment):
A software package which provides a source code editor (text editor), compiler/cross-compiler, linker, debugger,
emulators and a programmer/downloader, e.g. MPLAB provided by Microchip.
Work in an IDE typically proceeds as follows:
Start with a project
Choice of processor/controller
Programming
Compiling/assembling
Debugging
Testing on an emulator
Program burning

Fig: Development Environment

2.2.5 Application specific Instructions Set Processors


ASIPs are designed to be used for a particular domain of similar applications, for example a digital signal
processor that can be used in image processing and speech recognition. Within
the domain, processors may be different but they share the same or nearly the same architectural features.
The need for ASIPs arises from the inefficiency of general purpose processors (GPPs) for special tasks. It is not
that a GPP cannot be programmed for a task that is done by an ASIP, but it would take a lot of extra programming,
and the word size of the processor and other supporting hardware may not be enough. A good example is the graphics
card, which contains a special processor to meet the processing required by some games and by graphics designers
and architects using CAD software.
There are many advantages of using an ASIP as compared to GPPs and single purpose processors (SPPs). The following
are some major ones:
Speed: The hardware of an ASIP is specially tailored to execute the application specific instructions. For example,
an image processor may have an instruction for differentiating the input values, or a communication interface
circuit may have an instruction for recognizing a bit pattern; the hardware is designed to implement the
instruction, resulting in faster processing.

Reprogrammability: ASIPs can be reprogrammed. The scope of the programs executed by an ASIP may be
limited to a particular class of applications, but the feature of reprogrammability still gives flexibility, though
limited, for upgrades and modifications. This may save time and cost.
NRE: ASIPs are not designed for a specific single task; they are designed for a class of tasks. Unlike a single
purpose processor, they can be reprogrammed when the application requirement changes. Thus the cost of the
engineering work for producing an ASIP can be distributed over several applications, which lowers the overall cost.
For the same task, an ASIP can therefore be cheaper than a single purpose processor.
Power Consumption: Compared to a GPP, an ASIP may consume less power. An ASIP is designed to execute
specific tasks; it does not contain the additional hardware components that GPPs need in order to
possess generality. Less hardware implies less power.
The importance of ASIPs cannot be overlooked given the increasing use of microprocessor-controlled
systems. The availability of hardware units, including processors, as HDL (Hardware Description Language)
descriptions allows one to implement a processor in ASIP form.

2.2.6 Selecting a Processor


The basis for selection is the requirement: what the processor needs to do. A processor may be selected on the
basis of
Speed: How fast a processor can compute has always been a point of high interest for designers and
developers. Although there is always a demand for faster processors, the need is not the same everywhere; the
application determines the requirement. A data logging application that records temperature every 5 minutes might
not require a fast processor, but an X-ray machine in the emergency ward should have a faster one.
One measure of speed is the clock frequency of the processor, but the number of clock cycles
taken to execute an instruction should also be taken into account.
MIPS, Million Instructions Per Second, is also used to measure speed. A benchmark defined in 1984 is
Dhrystone, reported in Dhrystones per second and normalized so that
1 MIPS = 1757 Dhrystones per second
EEMBC, the EDN Embedded Microprocessor Benchmark Consortium, publishes various benchmarks for embedded processors.
Instruction Set: The instruction set defines what a processor can do. Based on the task to be performed, an
additional set of instructions or a totally different set of instructions may be required. In a robotic system the
processors for driving a motor and for analyzing the environment perform different tasks, so their
instruction sets may be different. The instruction set may be RISC or CISC.
Bit/Word width: the size of the data the processor can handle, i.e. the width of the registers around the processor.
If the processor needs to process floating point data, the word length that works for integer data would
not suffice. A narrower option may work but takes more time.
Power consumption: Power consumption may not be a deciding factor for a fixed system, but when it comes to a
portable handheld device it becomes a crucial point. Both the standby and the peak power consumption of the
processor are to be considered. For example, the power consumed by a mobile phone in both standby/sleep and
active modes should be low, as the device is handheld and is carried around by the user.
Prior Experience: While working on a project, a designer would choose a processor with which they have
experience. The availability of a development environment and software libraries for a processor also
contributes to the preference.
Size: The actual physical size of the processor may impact the design when the trend is towards slim smart
devices. Everything, including the processor, is required to be small. The processor for a desktop system can
have a less stringent size requirement, while a tablet design prefers the smallest available size. A smaller
processor is often also faster and lower in power consumption.
Cost: The price of the processor is the ultimate selection factor. The available project budget may not
accommodate an expensive processor. Also, if the system is to be produced and sold as a market product,
keeping the cost low is preferred, even though some performance trade-offs are required. It is not a good idea
to use a processor whose cost exceeds the total cost of the remaining hardware and software.
Other factors, such as the type/version and the number of registers available, may also be used as selection criteria.
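
The remark under Speed, that clock frequency alone does not decide how fast a processor is, can be checked with a minimal Python sketch. The two processors below and their figures are hypothetical, and the model assumes a fixed average number of clock cycles per instruction; the conversion from Dhrystones per second uses the factor 1757 given above.

```python
DHRYSTONES_PER_SECOND_AT_1_MIPS = 1757


def execution_time_s(instruction_count, cycles_per_instruction, clock_hz):
    """Time to run a program: instructions * cycles per instruction / clock."""
    return instruction_count * cycles_per_instruction / clock_hz


def dhrystone_mips(dhrystones_per_second):
    """Convert a Dhrystone score to MIPS using 1 MIPS = 1757 Dhrystones/s."""
    return dhrystones_per_second / DHRYSTONES_PER_SECOND_AT_1_MIPS


# Hypothetical processors running the same 1,000,000-instruction program.
time_a = execution_time_s(1_000_000, cycles_per_instruction=4, clock_hz=100e6)
time_b = execution_time_s(1_000_000, cycles_per_instruction=1, clock_hz=50e6)

b_is_faster = time_b < time_a  # True although B has half the clock frequency
rating = dhrystone_mips(175_700)
```

In practice the instruction count also differs between instruction sets (RISC or CISC), so a comparison across processors should use the same benchmark program rather than the same instruction count.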
2.2.7 General Purpose Processor Design
A general purpose processor, GPP, is characterized by its ability to be reprogrammed for a wide variety of
applications. A GPP is designed to execute generalized, basic instructions which can be used to write programs
that perform different tasks. A lot of effort is put into the design phase in order to generalize the processor so
that it can be programmed for different situations. The high design cost, i.e. NRE, is acceptable because the GPP
is produced in large numbers, distributing the cost and reducing the price per unit.
The design of a general purpose processor requires the list of all the operations it is to perform. We call the
list the instruction set. The table below shows a list of operations for a very simple general purpose processor.
There are 7 operations, and they are assigned binary values which will be used to decode an instruction.

Instruction    Binary Value
Load A         001
Load B         010
A OR B         011
A AND B        100
A + B          101
A - B          110
A + 1          111

The general purpose processor consists of
a datapath to carry out the operations
a controller to control the operations of the datapath
a memory to store instructions/program

The datapath hardware contains the functional units, like adders, logical units and shifters, that are required to
perform the various operations.
The controller fetches an instruction from the program memory, decodes it and provides appropriate signals to the
datapath. In doing so, the controller accepts various status signals, like overflow, carry and zero, which are
generated by the datapath while carrying out the operations.
FSMD: The instructions that a general purpose processor is to execute and the different states that it goes through
are all summarized in an FSMD. The diagram contains the actual expressions with the unique variables. The designs for
both the datapath and the controller are derived from the FSMD.
Datapath:
The design of the datapath follows from the instruction set. Each different expression required by the
instructions, as shown in the FSMD, is carried out by a separate functional unit. The datapath must contain the
functional units required by the instructions. The operation of a functional unit is activated by control signals
generated by the controller.
Another way is to let every functional unit process the common inputs and to select one of their outputs using a
multiplexer. If A and B are two inputs, all the operations are performed on them: A+B, A-B, A AND B etc. All the
outputs are available, but only one of them is selected, the selection being based upon the instruction being
executed.
The number of registers is determined by the nature of the operations defined in the instructions. Usually,
individual registers are required for the unique variables contained in the expressions. Extra registers can be added
to facilitate computations.
The datapath uses the control signals to execute the instructions. How the control signals are generated is not the
concern of datapath design; the concern is what control signals are required. Clearly define the required
control signals.
The interconnection between functional units, the registers and other units should also be defined.
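
The multiplexer-based alternative described for the datapath can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the registers are taken to be 8 bits wide, the select lines are taken to carry the 3-bit opcode directly, and the function name is illustrative. Every functional unit computes its result from the common inputs A and B, and the select value picks exactly one of them.

```python
WIDTH_MASK = 0xFF  # assumed 8-bit registers


def datapath_output(a, b, select):
    """Compute all functional-unit outputs, then select one (the multiplexer)."""
    unit_outputs = {
        0b011: a | b,                  # A OR B
        0b100: a & b,                  # A AND B
        0b101: (a + b) & WIDTH_MASK,   # A + B
        0b110: (a - b) & WIDTH_MASK,   # A - B
        0b111: (a + 1) & WIDTH_MASK,   # A + 1
    }
    return unit_outputs[select]


results = [datapath_output(12, 10, op) for op in (0b011, 0b100, 0b101, 0b110, 0b111)]
# results is [14, 8, 22, 2, 13]
```

This arrangement keeps the control simple, since only the multiplexer select lines need to be driven, at the price of every functional unit working on every instruction.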
Controller: The controller is a finite state machine that goes from one state to another, and its design follows the
sequential design procedure. The state diagram is different but the procedure is more or less the same. It is
responsible for generating control signals for the datapath and generally cycles through the following states:

FETCH: IR = M[PC]; PC = PC + 1
DECODE: select the EXECUTE branch from the opcode
EXECUTE, one branch per opcode:

Opcode    Instruction    Operation
001       Load A         Load value into Reg-A from M
010       Load B         Load value into Reg-B from M
011       A OR B         O/P = A OR B
100       A AND B        O/P = A AND B
101       A + B          O/P = A + B
110       A - B          O/P = A - B
111       A + 1          O/P = A + 1

Figure: A FSMD for a General Purpose Processor

Fetch: reads the memory location whose address is contained in the Program Counter (PC) and loads the Instruction
Register (IR) with the contents of that location.
Decode: takes the opcode bits from the IR and decodes what is to be done according to the current instruction.
Execute: provides appropriate control signals to the datapath, which does the computational work.
The controller uses the following special purpose registers to hold data such as memory addresses and instruction
codes (opcodes):
The Program Counter (PC):
The program counter holds the address of the memory location from which the next instruction is fetched. The address
is normally calculated by increasing the previous value by 1. If a branch instruction is encountered, the address
calculated from the branch instruction is loaded instead.
The output of the PC is connected to the memory address lines.
A control signal is required to update the PC. When this line is activated, a new value is loaded into the PC or
the PC is incremented.
The Instruction Register (IR):
The instruction register gets the instruction bits from the memory location pointed to by the PC. It is connected to
the memory data lines.
A control signal is required to load bits from memory into the IR. When this line is activated, the contents of
the memory location are transferred to the IR.
The bits in the IR and the values of the state are used to generate control signals. The control signals depend
upon the instruction to execute. To add two numbers, move operations and an add signal might be required;
each of these signals must be generated by the FSM using the state values and the IR bits.
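
The fetch, decode and execute cycle of the controller can be exercised with a minimal Python simulation that uses the opcode values assigned in the instruction table of this section. Several things are assumptions made for the sketch: each instruction word is modeled as an (opcode, address) pair, the address is used only by Load A and Load B, program and data are kept in separate lists, registers are 8 bits wide, and execution stops when the PC runs past the end of the program.

```python
LOAD_A, LOAD_B, OR, AND, ADD, SUB, INC = (
    0b001, 0b010, 0b011, 0b100, 0b101, 0b110, 0b111
)
WIDTH_MASK = 0xFF  # assumed 8-bit registers


def run(program, data):
    """Run a list of (opcode, address) words; return the final register values."""
    pc = 0
    reg_a = reg_b = output = 0
    while pc < len(program):
        ir = program[pc]          # FETCH: IR = M[PC]
        pc += 1                   #        PC = PC + 1
        opcode, address = ir      # DECODE: split the instruction word
        if opcode == LOAD_A:      # EXECUTE: one branch per opcode
            reg_a = data[address]
        elif opcode == LOAD_B:
            reg_b = data[address]
        elif opcode == OR:
            output = reg_a | reg_b
        elif opcode == AND:
            output = reg_a & reg_b
        elif opcode == ADD:
            output = (reg_a + reg_b) & WIDTH_MASK
        elif opcode == SUB:
            output = (reg_a - reg_b) & WIDTH_MASK
        elif opcode == INC:
            output = (reg_a + 1) & WIDTH_MASK
        else:
            raise ValueError(f"undefined opcode {opcode:03b}")
    return {"A": reg_a, "B": reg_b, "OUT": output, "PC": pc}


program = [(LOAD_A, 0), (LOAD_B, 1), (ADD, 0)]
final = run(program, data=[12, 10])
# final is {"A": 12, "B": 10, "OUT": 22, "PC": 3}
```

Changing only the program list, for example replacing ADD by SUB, changes the result without any change to the simulated hardware, which is the defining property of a general purpose processor.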

Figure: A very basic general purpose processor
Blocks shown: Controller (control circuit: state machine, IR, PC, address calculator), Memory, Datapath (registers,
ALU with multiple functional units).
Signals shown: control signals, status values, data to be stored, output data.
