Chapter 1
1.1 Introduction
Microprocessors and microcontrollers have traditionally been designed around two
philosophies: Complex Instruction Set Computer (CISC) and Reduced Instruction Set
Computer (RISC).
The CISC concept is an approach to Instruction Set Architecture (ISA) design that
emphasizes doing more with each instruction, using a wide variety of addressing modes
and a variable number of operands in various locations. As a result, the
instructions are of widely varying lengths and execution times, demanding a very
complex control unit that occupies a large area on the chip.
On the other hand, a RISC processor works with a reduced number of instructions,
a fixed instruction length, more general-purpose registers, a load-store architecture and
simplified addressing modes. Together these make individual instructions execute faster,
yield a net gain in performance and give an overall simpler design. These features make
the RISC design well suited to a powerful trend in the embedded processor market:
the "system-on-a-chip".
Features which are generally found in RISC designs are:
1) Uniform instruction encoding (for example, the op-code is always in the same bit
position in each instruction, which is always one word long), which allows faster decoding.
2) A homogeneous register set, allowing any register to be used in any context and
simplifying compiler design.
3) Load and store architecture, which means that the only operations that interact with
memory are load and store.
4) Simple addressing modes (complex addressing modes are replaced by sequences of
simple arithmetic instructions).
5) Few data types supported in hardware (for example, some CISC machines had
instructions for dealing with byte strings; others had support for polynomials and complex
numbers. Such instructions are unlikely to be found on a RISC machine).
Also, the processor uses a 5-stage pipeline architecture with the following stages:
- Fetch.
- Decode.
- Execute.
- Memory access.
- Write back.
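The overlap between the five stages listed above can be illustrated with a small behavioural sketch in Python. It assumes an ideal pipeline with no stalls or hazards, which the text does not state; the stage abbreviations and the function name pipeline_schedule are hypothetical names chosen for illustration only.

```python
# Ideal 5-stage pipeline occupancy (assumption: no stalls, no hazards).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # fetch, decode, execute, memory access, write back

def pipeline_schedule(num_instructions):
    """Return one dict per clock cycle mapping each stage name to the
    index of the instruction occupying it, or None if the stage is empty."""
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            instr = cycle - depth  # instruction that entered the pipeline `depth` cycles ago
            row[stage] = instr if 0 <= instr < num_instructions else None
        schedule.append(row)
    return schedule

if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_schedule(3), start=1):
        cells = " ".join(
            f"{stage}={'-' if instr is None else 'I' + str(instr)}"
            for stage, instr in row.items()
        )
        print(f"cycle {cycle}: {cells}")
```

Under these assumptions, n instructions finish in n + 4 cycles instead of 5n, which is the usual argument for pipelining.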
1.2 Motivation
Samiappa Sakthikumaran, S. Salivahanan and V. S. Kanchana Bhaaskaran, in "16-Bit RISC
processor design for convolution application", present a 16-bit processor that supports
27 operations but is non-pipelined.
The power dissipation of this non-pipelined 16-bit convolution processor is high
compared with a pipelined design. Its speed is also 200 MHz, which is low.
The high power dissipation and low speed of that processor motivate me to
target a 3-stage or 5-stage pipeline, to lower the power dissipation and raise the
speed, using the 180 nm SCL CMOS technology.
Chapter 2
2.1 Literature survey
The problem of designing a processor and minimizing its power dissipation through
various power minimization techniques has been addressed by many authors. A brief overview of
their work is given below:
Kui Yi and Yue-hua Ding [1] demonstrate the function and working theory of the RISC
CPU instruction decoder module, which includes the register file and the write-back of data to the register file.
Verilog Digital System Design by Zainalabedin Navabi [2] demonstrates Verilog
coding as well as the design algorithms for designing in Verilog. It was also used to study how the
control signals pass from the control unit to the datapath unit to control the data flow.
Samiappa Sakthikumaran, S. Salivahanan and V. S. Kanchana Bhaaskaran, "16-Bit RISC
processor design for convolution application" [3], present the design of a single-cycle,
16-bit, non-pipelined RISC processor aimed at convolution applications. Novel adder and
multiplier structures have been employed in the RISC architecture. The processor has been
designed to execute an instruction set comprising 27 instructions
in total. The Verilog HDL components were mapped to a SAED 90 nm ASIC
standard cell library.
The total power dissipation was found to be 7.44 mW. To reduce power
consumption, a low-power design technique called clock gating was
employed. Clock gating adds logic to the circuitry to prune the clock
tree, disabling portions of the circuitry so that their flip-flops do not
change state unnecessarily. This technique reduces the power consumption
to 3.04 mW.
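As a quick check of what these two figures imply, the relative saving can be worked out as below. This is only arithmetic on the numbers quoted above; the function name percent_reduction is a hypothetical helper, not something taken from [3].

```python
def percent_reduction(before_mw, after_mw):
    """Relative power saving, in percent, when going from before_mw to after_mw."""
    return 100.0 * (before_mw - after_mw) / before_mw

if __name__ == "__main__":
    # 7.44 mW without clock gating, 3.04 mW with clock gating (figures quoted from [3]).
    saving = percent_reduction(7.44, 3.04)
    print(f"Clock gating saves about {saving:.1f}% of the total power")
```

With the reported values this comes to roughly 59% of the original power.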
"Design of a 16 Bit RISC Processor", Indian Journal of Science and
Technology, Vol 8(20), DOI: 10.17485/ijst/2015/v8i20/78320, August
2015 [4] concludes that the RISC processor consumes 68.9 mW of power
for the execution of the AND instruction with a delay of 1600 ns, and
77.6 mW for the execution of the ADD instruction with a delay of
1900 ns, using the 180 nm TSMC technology file.
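To compare the two instructions on a single figure, the power and delay values can be combined into an energy per instruction. The sketch below assumes that the reported power is the average power drawn over the reported delay, which [4] as summarized here does not state explicitly; energy_nj is a hypothetical helper name.

```python
def energy_nj(power_mw, delay_ns):
    """Energy in nanojoules, assuming power_mw is the average power over delay_ns.
    mW * ns gives picojoules, so divide by 1000 to get nanojoules."""
    return power_mw * delay_ns / 1000.0

if __name__ == "__main__":
    for name, power_mw, delay_ns in [("AND", 68.9, 1600), ("ADD", 77.6, 1900)]:
        print(f"{name}: {energy_nj(power_mw, delay_ns):.2f} nJ")
```

Under that assumption the AND instruction costs about 110.24 nJ and the ADD instruction about 147.44 nJ.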
Chapter 3
Proposed Approach
Instruction fetch.
Instruction decode.
Execute.
Memory access.
Write back.
Program Counter:
DATA MEMORY:
Data memory is the storage device that stores data from the
executed program, such as constants, variables and addresses.
Normally, the data stored here is not frequently used, because
accessing memory is slow and therefore slows down program execution.
INSTRUCTION MEMORY:
REGISTER FILE:
ALU:
The ALU is the unit that manipulates data, performing operations such as
addition, subtraction, logical AND, logical OR and many more.
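The ALU behaviour described above can be modelled with a short Python sketch. It assumes a 16-bit data width with wrap-around on overflow and covers only the four operations named in the text; the operation names and the function alu are hypothetical and do not reflect the actual Verilog design.

```python
MASK16 = 0xFFFF  # assumption: 16-bit data path, results wrap around

def alu(op, a, b):
    """Return the 16-bit result of applying operation `op` to operands a and b."""
    if op == "ADD":
        result = a + b
    elif op == "SUB":
        result = a - b
    elif op == "AND":
        result = a & b
    elif op == "OR":
        result = a | b
    else:
        raise ValueError(f"unsupported operation: {op}")
    return result & MASK16  # keep only the low 16 bits

if __name__ == "__main__":
    print(hex(alu("ADD", 0x00F0, 0x000F)))
    print(hex(alu("SUB", 0x0000, 0x0001)))
```

Masking the result is what makes subtraction of a larger value produce the two's-complement bit pattern rather than a negative Python integer.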
CONTROL UNIT:
R-Type:
The R-Type instruction format is mainly used for register-type
operations such as addition, subtraction, ORing, ANDing etc. The
format of the instruction is as follows:
Rs (B20-16): This is the first source register. A source register is a
register that holds one of the arguments of the operation.
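A decoder could pull the Rs field out of an instruction word as sketched below. This assumes that B20-16 means bits 20 down to 16 of a 32-bit word, with bit 0 as the least significant bit; the helper names field and rs_field are hypothetical, and the other R-Type fields are not shown because the text does not list them.

```python
def field(word, high, low):
    """Extract bits high..low (inclusive) of word, with bit 0 as the least significant bit."""
    width = high - low + 1
    return (word >> low) & ((1 << width) - 1)

def rs_field(instruction):
    """First source register number, taken from bits 20..16 (assumed layout)."""
    return field(instruction, 20, 16)

if __name__ == "__main__":
    word = 0b10101 << 16  # an instruction word with Rs = 21 and all other bits zero
    print(rs_field(word))
```

A 5-bit field can select one of 32 registers.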
I-type:
J-type:
Chapter 4
Hardware and Software requirements
4.1 Xilinx ISE:
Algorithmic level (much like C code with if, case and loop statements).
(ii)
(iii)
(iv)
The language also defines constructs that can be used to control the input and output of
simulation.
More recently, Verilog has been used as an input for synthesis programs, which generate a
gate-level description (a netlist) for the circuit. Some Verilog constructs are not synthesizable.
Also, the way the code is written will greatly affect the size and speed of the synthesized
circuit. Most readers will want to synthesize their circuits, so non-synthesizable constructs
should be used only for test benches. These are program modules used to generate the I/O needed
to simulate the rest of the design. The words "not synthesizable" will be used for examples
and constructs as needed that do not synthesize.
4.3 Synopsys:
Synopsys Verification Continuum is a comprehensive verification
platform built from the industry's fastest engines for virtual prototyping,
static and formal verification, simulation, emulation, FPGA-based
prototyping and debug. Verification Continuum features Unified Compile
based on VCS for a simulation-like use model throughout the verification
flow, enabling faster design bring-up, seamless transitions between
simulation, emulation and prototyping. It also delivers Unified Debug
with Verdi to provide a debug continuum across all domains and
abstraction levels enabling dramatic increases in debug efficiency. The
Synopsys Verification Continuum also includes comprehensive planning
and coverage as well as multi-platform verification IP solutions. This
platform is collectively complemented by low power and analog/mixed-signal technology, integration and flows.
Chapter 5
Background
(16 bit RISC CPU)
16-bit Accumulator.
3-stage pipeline.
ISA specifies the address lines and the opcode among the 16 bits.
Fetch:
The next instruction is fetched from the memory address that is
currently stored in the program counter (PC), and stored in
the instruction register (IR). At the end of the fetch operation, the PC
points to the next instruction that will be read at the next cycle.
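The fetch step described above can be sketched as a small behavioural model in Python. It assumes word-addressed memory, a PC that advances by one per instruction and the 12-bit address width mentioned in this chapter; the class name FetchUnit and its attribute names are hypothetical.

```python
ADDR_MASK = 0x0FFF  # assumption: 12-bit address space, so the PC wraps at 4096

class FetchUnit:
    """Behavioural model of the fetch stage: IR <- memory[PC], then PC <- PC + 1."""

    def __init__(self, memory):
        self.memory = memory  # list of 16-bit instruction words
        self.pc = 0           # program counter
        self.ir = 0           # instruction register

    def fetch(self):
        self.ir = self.memory[self.pc]          # read the word the PC points to
        self.pc = (self.pc + 1) & ADDR_MASK     # PC now points to the next instruction
        return self.ir

if __name__ == "__main__":
    unit = FetchUnit([0x1234, 0xABCD])
    print(hex(unit.fetch()), unit.pc)
    print(hex(unit.fetch()), unit.pc)
```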
Decode:
Execute:
The control unit of the CPU passes the decoded information as a
sequence of control signals to the relevant function units of the CPU to
perform the actions required by the instruction such as reading values
from registers, passing them to the ALU to perform mathematical or logic
functions on them, and writing the result back to a register. If the ALU is
involved, it sends a condition signal back to the CU. The result generated
by the operation is stored in the main memory, or sent to an output
device. Based on the condition of any feedback from the ALU, Program
Counter may be updated to a different address from which the next
instruction will be fetched.
-Accumulator.
-Program Counter.
-Instruction register.
-Arithmetic and logical unit.
-Control Unit.
The 4-bit opcode goes to the control unit. For each opcode,
the control unit sends the control signals needed to execute the
corresponding operation. The CPU has 16-bit data lines and 12-bit address
lines, and its ALU performs operations such as addition, subtraction,
ANDing, ORing, NOT, left shift, right shift and passing the left-hand
signal. The instruction register specifies the bits for the opcode and
the address.
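The split of a 16-bit instruction word into opcode and address can be sketched as follows. The text gives the field widths (4 and 12 bits) but not their positions, so placing the opcode in the upper four bits is an assumption; decode and encode are hypothetical helper names.

```python
def decode(word):
    """Split a 16-bit instruction word into (opcode, address).
    Assumed layout: opcode in bits 15..12, address in bits 11..0."""
    opcode = (word >> 12) & 0xF
    address = word & 0x0FFF
    return opcode, address

def encode(opcode, address):
    """Inverse of decode under the same assumed layout."""
    return ((opcode & 0xF) << 12) | (address & 0x0FFF)

if __name__ == "__main__":
    opcode, address = decode(0x3ABC)
    print(opcode, hex(address))
```

A 4-bit opcode allows at most 16 distinct instructions, and a 12-bit address reaches 4096 memory locations.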
It also has a 3-stage pipeline consisting of fetch, decode and execute.
The various control signals are as follows:
-
3. Load Immediate :
4. Store Immediate :
5. Shift operations:
LSHIFT: AC << 1
Data in the accumulator is left shifted by 1.
RSHIFT: AC >> 1
Data in the accumulator is right shifted by 1.
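The two shift operations can be modelled on a 16-bit accumulator as shown below. The sketch assumes logical shifts, with the bit shifted out discarded and a zero shifted in, which the text does not specify; lshift and rshift are hypothetical function names.

```python
MASK16 = 0xFFFF  # the accumulator is 16 bits wide

def lshift(ac):
    """AC << 1: shift left by one; the old bit 15 is discarded (assumption)."""
    return (ac << 1) & MASK16

def rshift(ac):
    """AC >> 1: logical shift right by one; a zero enters at bit 15 (assumption)."""
    return (ac & MASK16) >> 1

if __name__ == "__main__":
    ac = 0x8001
    print(hex(lshift(ac)), hex(rshift(ac)))
```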
1)
2)
DURATION
PROGRESS
REFERENCES
[1]
Kui Yi and Yue-hua Ding, "32-bit RISC CPU Based on MIPS Instruction Decoder Module Design", published in Web Mining and
Web-based Application, Sept. 2009.
[2]
[3]
Samiappa Sakthikumaran, S. Salivahanan, V. S. Kanchana Bhaaskaran, "16-Bit RISC processor design for convolution application", Recent Trends in Information
Technology (ICRTIT), 2011 International Conference on.
[4]
[5]
[6]
Samir Palnitkar, "Verilog HDL: A Guide to Digital Design and
Synthesis", 1996.
[7]
[8]
Mrs. Rupali Balpande, Mrs. Rashmi Keote, "Design of FPGA based Instruction Fetch
& Decode Module of 32-bit RISC (MIPS) Processor". 2011 IEEE.