
Problem Definitions and Objectives:

- Simulate and verify the 32-bit RISC CPU, which supports operations such as addition and subtraction, using the Xilinx tool.
- Design the 32-bit RISC CPU using the 180 nm SCL CMOS technology file in Synopsys.
- Compare the power performance of the 3-stage and 5-stage pipelines after the design of the RISC CPU.
- Add multiplication and division modules to the design of the 32-bit RISC CPU.
- Increase the performance per unit power of the RISC CPU.

Chapter 1
1.1 Introduction
Microprocessors and microcontrollers have traditionally been designed around two
philosophies: Complex Instruction Set Computer (CISC) and Reduced Instruction Set
Computer (RISC).
The CISC concept is an approach to Instruction Set Architecture (ISA) design that
emphasizes doing more with each instruction, using a wide variety of addressing modes and
a variable number of operands in various locations. As a result, instructions vary widely in
length and execution time, which demands a very complex control unit that occupies a large
area on the chip.
On the other hand, a RISC processor uses a reduced number of instructions, a fixed
instruction length, more general-purpose registers, a load-store architecture and simplified
addressing modes. These make individual instructions execute faster, achieve a net gain in
performance and give an overall simpler design. These features make RISC designs well
suited to a powerful trend in the embedded processor market: the system-on-a-chip (SoC).
Features generally found in RISC designs are:
1) Uniform instruction encoding (for example, the opcode is always in the same bit
position in each instruction, and each instruction is always one word long), which allows
faster decoding.
2) A homogeneous register set, allowing any register to be used in any context and
simplifying compiler design.
3) A load-store architecture, which means that the only operations that access memory
are loads and stores.
4) Simple addressing modes (complex addressing modes are replaced by sequences of
simple arithmetic instructions).
5) Few data types supported in hardware (for example, some CISC machines had
instructions for dealing with byte strings, and others supported polynomials and complex
numbers; such instructions are unlikely to be found on a RISC machine).
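Features 3 and 4 can be illustrated with a small behavioral model. The Python sketch below is an illustration, not part of the proposed design: it shows a memory-to-memory addition that a CISC machine might encode as a single instruction being expressed on a load-store machine as four simple instructions. The mnemonics (`lw`, `sw`, `add`), the eight registers `r0` to `r7` and the dictionary-based memory are assumptions made for the example.

```python
# Minimal behavioral model of a load-store machine (illustrative only).
# Only lw and sw touch memory; add operates purely on registers.

def run_program(program, memory):
    """Execute a list of (op, *operands) tuples; return the register values."""
    regs = {f"r{i}": 0 for i in range(8)}  # hypothetical 8-register file
    for op, *args in program:
        if op == "lw":                     # lw rd, addr     -> rd = memory[addr]
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "sw":                   # sw rs, addr     -> memory[addr] = rs
            rs, addr = args
            memory[addr] = regs[rs]
        elif op == "add":                  # add rd, rs, rt  -> rd = rs + rt (32-bit wrap)
            rd, rs, rt = args
            regs[rd] = (regs[rs] + regs[rt]) & 0xFFFFFFFF
        else:
            raise ValueError(f"unknown op: {op}")
    return regs


# A CISC-style "mem[0x08] = mem[0x00] + mem[0x04]" becomes four simple instructions.
program = [
    ("lw", "r1", 0x00),
    ("lw", "r2", 0x04),
    ("add", "r3", "r1", "r2"),
    ("sw", "r3", 0x08),
]
memory = {0x00: 7, 0x04: 5, 0x08: 0}
regs = run_program(program, memory)
print(memory[0x08])  # 12
```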
The processor also uses a 5-stage pipeline architecture:
- Fetch.
- Decode.
- Execute.
- Memory access.
- Write back.
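The benefit of the five stages listed above can be sketched by printing an ideal pipeline timing chart. This Python sketch assumes every stage takes one clock cycle and that there are no hazards or stalls (a real pipeline must handle these with forwarding or stall logic); the abbreviations IF, ID, EX, MEM and WB are the conventional stage names, not names taken from this design. Under these assumptions, n instructions finish in 5 + n - 1 cycles instead of 5n.

```python
# Ideal (hazard-free) pipeline timing for a five-stage pipeline.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(n_instructions, stages=STAGES):
    """Return (total_cycles, rows); rows[i][c] is the stage of instruction i in cycle c."""
    k = len(stages)
    total_cycles = k + n_instructions - 1   # fill the pipe, then one result per cycle
    rows = []
    for i in range(n_instructions):
        row = ["  . "] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = f"{name:>4}"        # instruction i is in stage s during cycle i + s
        rows.append(row)
    return total_cycles, rows


cycles, rows = pipeline_chart(4)
for i, row in enumerate(rows):
    print(f"I{i + 1}", "".join(row))
print("cycles:", cycles)  # 8 (an unpipelined 5-cycle-per-instruction design would need 20)
```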

1.2 Motivation
The paper "16-Bit RISC processor design for convolution application" by Samiappa
Sakthikumaran, S. Salivahanan and V. S. Kanchana Bhaaskaran [3] presents a 16-bit
processor that supports 27 operations but is not pipelined.

Fig. 16 bit NON-pipelined RISC processor

Being non-pipelined, this 16-bit convolution-application processor has high power
dissipation compared with a pipelined design. Its speed of 200 MHz is also comparatively
low.
This high power dissipation and low speed motivated me to target a 3-stage or 5-stage
pipeline, in order to lower the power dissipation and achieve a higher speed, using 180 nm
SCL CMOS technology.

Chapter 2
2.1 Literature survey
The problem of designing a processor and minimizing its power dissipation through
various power-minimization techniques has been addressed by many authors; a brief overview
of their work is given below:
Kui Yi and Yue-hua Ding [1] describe the function and working principle of a RISC
CPU instruction decoder module, which includes the register file and the write-back of data
to the register file.
Verilog Digital System Design by Zainalabedin Navabi [2] covers Verilog coding as well
as design algorithms implemented in Verilog. It was also used to study how control signals
pass from the control unit to the datapath unit to control the data flow.
Samiappa Sakthikumaran, S. Salivahanan and V. S. Kanchana Bhaaskaran, in "16-Bit RISC
processor design for convolution application" [3], present the design of a single-cycle,
16-bit, non-pipelined RISC processor for convolution applications. Novel adder and
multiplier structures were employed in the RISC architecture. The processor was designed
to execute an instruction set of 27 instructions in total, and its Verilog HDL components
were mapped to a SAED 90 nm ASIC standard-cell library.
The total power dissipation was found to be 7.44 mW. To reduce power consumption, a
low-power design technique called clock gating was employed. Clock gating adds logic to the
circuitry to prune the clock tree, disabling portions of the circuit so that their flip-flops
do not change state unnecessarily. This technique reduced the power consumption to
3.04 mW.
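The saving reported for clock gating in [3] can be expressed as a relative reduction. The first line below is plain arithmetic on the two reported figures. The second is the standard first-order CMOS dynamic-power model, added here only as a sketch of why gating helps (it lowers the switching activity of the gated clock tree and flip-flops); it is not the analysis performed in [3].

$$\frac{7.44\ \text{mW} - 3.04\ \text{mW}}{7.44\ \text{mW}} = \frac{4.40}{7.44} \approx 0.591 = 59.1\%$$

$$P_{\text{dyn}} = \alpha \, C \, V_{DD}^{2} \, f$$

where $\alpha$ is the switching activity factor, $C$ the switched capacitance, $V_{DD}$ the supply voltage and $f$ the clock frequency.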
Design of a 16 Bit RISC Processor, Indian Journal of Science and Technology, Vol 8(20),
DOI: 10.17485/ijst/2015/v8i20/78320, August 2015 [4] concludes that the RISC processor
consumes 68.9 mW of power for the execution of the AND instruction with a delay of
1600 ns, and 77.6 mW for the execution of the ADD instruction with a delay of 1900 ns,
using the 180 nm TSMC technology file.
The processor has been custom designed, and the use of appropriate logic styles for its
various functional blocks can also be attempted in striving for low-power operation.
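The power and delay figures from [4] can be combined into an approximate energy per instruction, $E = P \times t$. This is a sketch that assumes the reported delay is the time to execute one instruction and the reported power is the average over that interval; [4] itself reports only power and delay.

$$E_{\text{AND}} = 68.9\ \text{mW} \times 1600\ \text{ns} = 110.24\ \text{nJ}$$

$$E_{\text{ADD}} = 77.6\ \text{mW} \times 1900\ \text{ns} = 147.44\ \text{nJ}$$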
Cosmin Cernazanu-Glavan, Alexandru Amaricai and Marius Marcu, in "Direct FPGA-based
power profiling for a RISC processor" [5], investigate the possibility of creating an
energy profile of a RISC processor instruction set in the prototyping phase, using an FPGA
implementation and physical measurements. To determine the power consumption at the
instruction level, several programs were developed and run on the processor implemented
on the FPGA.
The experiments focused on the following groups of instructions: arithmetic and logic
(ALU) instructions, memory access instructions, control instructions, and compare and move
instructions. The main goal of their work is to investigate the correlation between the
dynamic power consumption of a RISC processor design implemented in different technologies
(FPGA vs. ASIC) and manufacturing processes, called the power technology gap. The achieved
correlation coefficient between the FPGA 45 nm physical power measurements and the ASIC
45 nm power estimates is 86.39%.

Chapter 3
Proposed Approach

(32 bit RISC CPU)


This work covers the design and verification of a 32-bit RISC CPU using 180 nm SCL CMOS
technology in the Synopsys tool. The specifications for the 32-bit RISC CPU design are as
follows:
- Harvard architecture.
- 32-bit data bus.
- $2^{26} \times 32$ memory (RAM).
- 32-bit ALU.
- 3 instruction types (R-type, I-type, J-type).
- Load-store architecture.
- 5-stage pipeline:
  - Instruction fetch.
  - Instruction decode.
  - Execute.
  - Memory access.
  - Write back.
- Power dissipation < 800 mW.
- Speed = 250 MIPS.
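The speed and power targets can be related with simple arithmetic. The sketch below assumes an ideal pipeline that completes one instruction per clock cycle (CPI = 1, no stalls); that assumption is not part of the specification.

$$f_{\text{clk}} = 250\ \text{MIPS} \times \text{CPI} = 250\ \text{MHz}, \qquad T_{\text{clk}} = \frac{1}{250\ \text{MHz}} = 4\ \text{ns}$$

$$E_{\text{instr}} \le \frac{800\ \text{mW}}{250 \times 10^{6}\ \text{instr/s}} = 3.2\ \text{nJ/instr}, \qquad \frac{250\ \text{MIPS}}{0.8\ \text{W}} = 312.5\ \text{MIPS/W}$$

The last figure, taken at the 800 mW limit, gives a baseline for the performance-per-unit-power objective: at the same speed, any power saved below 800 mW raises it.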

3.1 FUNCTIONAL UNITS:

Program Counter:

The program counter (PC) contains the address of the next instruction to be executed.
This address is applied to the program (instruction) memory to access the corresponding
instruction. Normally, the PC is incremented after every instruction so that it points to
the next address, unless a flow-control instruction (such as a branch or jump) is executed,
which modifies the contents of the PC.
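The PC update rule described above can be sketched as a next-PC selector. This Python sketch assumes byte addresses with 4-byte instructions (so sequential execution adds 4) and MIPS-style branch offsets counted in words relative to PC + 4; the parameter names `branch_taken`, `branch_offset`, `jump` and `jump_target` are hypothetical stand-ins for the control signals of a real design.

```python
MASK32 = 0xFFFFFFFF


def next_pc(pc, branch_taken=False, branch_offset=0, jump=False, jump_target=0):
    """Select the next PC value (byte addresses, 4-byte instructions assumed)."""
    pc_plus_4 = (pc + 4) & MASK32
    if jump:
        # Flow control: the jump target replaces the PC contents.
        return jump_target & MASK32
    if branch_taken:
        # Signed word offset relative to PC + 4, converted to bytes by << 2.
        return (pc_plus_4 + (branch_offset << 2)) & MASK32
    # Normal case: point to the next sequential instruction.
    return pc_plus_4


print(hex(next_pc(0x100)))                                   # 0x104
print(hex(next_pc(0x100, branch_taken=True, branch_offset=-2)))  # 0xfc
```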

Fig. Block diagram of 32 bit RISC processor

ADDRESS BUS AND DATA BUS:

A bus simplifies the movement of data from point to point in a computer. A bus is
analogous to a highway, and the devices are analogous to the junctions that connect to it.
Sharing an address bus and a data bus among devices reduces the number of interconnecting
wires in a computer, but it introduces complexity: in a bused system, only one
point-to-point communication can happen at a time. Careful synchronization is therefore
needed, and each bus access must last long enough for the data to be received safely.

DATA MEMORY:

Data memory is the storage device that holds the data of the executing program, such
as constants, variables and addresses. Frequently used data is normally not kept here,
because memory access is slow and would make program execution slower; such values are
usually held in registers.

INSTRUCTION MEMORY:

The instruction memory holds the program. The processor retrieves each instruction from
it at the address supplied by the program counter, determines what actions the instruction
dictates, and carries out those actions.

REGISTER FILE:

A register file is an array of processor registers in a central processing unit (CPU).
It has two read ports, read reg1 and read reg2, as well as two write ports, write reg1 and
write reg2.

Fig. I/O of Register file
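A behavioral sketch of the register file helps show how the ports are used. The Python class below assumes 32 registers of 32 bits (the register fields of the instruction formats are 5 bits wide) and a write-enable input named `reg_write`, which is a hypothetical name. It models the two read ports and one write port; the second write port listed above would correspond to a second `write` call in the same cycle, with some priority rule if both target the same register.

```python
class RegisterFile:
    """32 x 32-bit register file with two read ports (behavioral sketch)."""

    def __init__(self, n_regs=32):
        self.regs = [0] * n_regs

    def read(self, read_reg1, read_reg2):
        # Both read ports return the current contents of the addressed registers.
        return self.regs[read_reg1], self.regs[read_reg2]

    def write(self, write_reg, write_data, reg_write=True):
        # A write takes effect only when the write enable is asserted.
        if reg_write:
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.write(5, 42)
rf.write(6, 7, reg_write=False)  # ignored: write enable deasserted
print(rf.read(5, 6))             # (42, 0)
```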

ARITHMETIC AND LOGICAL OPERATION:

The ALU is the unit that manipulates data, performing operations such as addition,
subtraction, logical AND, logical OR and many more.

Fig. Block of ALU
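The ALU behavior can be summarized as a function from an operation and two operands to a result. This Python sketch covers only the operations named above and assumes the operands are already 32-bit unsigned values. The zero flag is an added assumption, based on its common use for branch decisions, rather than something specified here.

```python
MASK32 = 0xFFFFFFFF


def alu(op, a, b):
    """Return (result, zero_flag) for a 32-bit ALU operation on unsigned operands."""
    if op == "add":
        result = (a + b) & MASK32      # carry out of bit 31 is discarded
    elif op == "sub":
        result = (a - b) & MASK32      # two's-complement wrap-around
    elif op == "and":
        result = a & b
    elif op == "or":
        result = a | b
    else:
        raise ValueError(f"unsupported ALU operation: {op}")
    return result, result == 0


print(alu("sub", 5, 5))  # (0, True)
```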

CONTROL UNIT:

Control logic is among the most important modules that make up a processor. It
controls the sequencing and datapath flow of each instruction. When an instruction is
executed, its opcode is fetched and decoded, and the control unit generates the control
signals for the appropriate functional units, such as the register file, ALU and memory.
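The opcode-to-control-signal mapping performed by the control unit is essentially a lookup table. The Python sketch below is modeled on the single-cycle MIPS datapath described by Patterson and Hennessy [7]. The opcode values are the standard MIPS encodings and the signal names (`reg_dst`, `alu_src`, `mem_to_reg`, `reg_write`, `mem_read`, `mem_write`, `branch`, `jump`) are assumptions, since this design's own encodings are given in its instruction-set figure. Signals not listed for an opcode are treated as deasserted, including don't-care cases.

```python
# Standard MIPS opcode values, used here as an assumption for this design.
CONTROL_TABLE = {
    0x00: frozenset({"reg_dst", "reg_write"}),                            # R-type
    0x08: frozenset({"alu_src", "reg_write"}),                            # addi
    0x23: frozenset({"alu_src", "mem_to_reg", "reg_write", "mem_read"}),  # lw
    0x2B: frozenset({"alu_src", "mem_write"}),                            # sw
    0x04: frozenset({"branch"}),                                          # beq
    0x02: frozenset({"jump"}),                                            # j
}


def decode_control(instruction):
    """Return the control signals asserted for a 32-bit instruction word."""
    opcode = (instruction >> 26) & 0x3F  # B31-26
    try:
        return CONTROL_TABLE[opcode]
    except KeyError:
        raise ValueError(f"unsupported opcode: {opcode:#04x}") from None


print(sorted(decode_control(0x8C000000)))  # lw: opcode 0x23
```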

3.2. INSTRUCTION SET ARCHITECTURE

There are three main instruction types:


1. R-Type.
2. I-Type.
3. J-Type.

R-Type:

R-type instructions are mainly used for register-to-register operations such as
addition, subtraction, OR and AND. The instruction format is as follows:

Fig. Instruction format of R-type

The semantics of the instruction are R[d] = R[s] + R[t].

Opcode (B31-26): Opcode is short for operation code. The opcode is a binary encoding
for the instruction, and opcodes are found in all ISAs; in MIPS, for example, there is an
encoding for add.
The opcode in the MIPS ISA is only 6 bits, which ordinarily means there are only 64
possible instructions. Even for a RISC ISA, which typically has few instructions, 64 is
quite small. For R-type instructions, an additional 6 bits (B5-0), called the function,
are used.
Thus, the 6 bits of the opcode and the 6 bits of the function together specify the kind of
R-type instruction.

Rd (B25-21): This is the destination register, the register where the result of the
operation is stored.

Rs (B20-16): This is the first source register. A source register is a register that
holds one of the arguments of the operation.

Rt (B15-11): This is the second source register.

Shift amount (B10-6): The number of bits to shift. Used in shift instructions.

Function (B5-0): An additional 6 bits used to specify the operation, in addition to the
opcode.
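The field layout above can be checked with a small encoder and decoder. This Python sketch follows the bit positions listed in this section (Rd in B25-21, Rs in B20-16, Rt in B15-11). Standard MIPS orders these fields differently (rs in B25-21, rt in B20-16, rd in B15-11), so the shift amounts would change if the design follows MIPS exactly. The function code 0x20 (MIPS `add`) is used only as an example value.

```python
def encode_rtype(opcode, rd, rs, rt, shamt, funct):
    """Pack R-type fields using the bit positions listed above."""
    return ((opcode & 0x3F) << 26 | (rd & 0x1F) << 21 | (rs & 0x1F) << 16
            | (rt & 0x1F) << 11 | (shamt & 0x1F) << 6 | (funct & 0x3F))


def decode_rtype(word):
    """Split a 32-bit R-type instruction into its named fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # B31-26
        "rd":     (word >> 21) & 0x1F,  # B25-21, destination register
        "rs":     (word >> 16) & 0x1F,  # B20-16, first source register
        "rt":     (word >> 11) & 0x1F,  # B15-11, second source register
        "shamt":  (word >> 6) & 0x1F,   # B10-6, shift amount
        "funct":  word & 0x3F,          # B5-0, function
    }


# R[3] = R[1] + R[2] with an example function code of 0x20
word = encode_rtype(opcode=0, rd=3, rs=1, rt=2, shamt=0, funct=0x20)
print(hex(word))  # 0x611020
print(decode_rtype(word))
```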

I-type:

I-type instructions are mainly used for immediate operations such as addi, load and
store. The instruction format is as follows:

Fig. Instruction Format of I-type

The semantics of the addi instruction are:

$$R[t] = R[s] + \left( (IR_{15})^{16} \,\|\, IR_{15-0} \right)$$

where IR refers to the instruction register, the register where the current instruction
is stored. $(IR_{15})^{16}$ means that bit B15 of the instruction register (which is the
sign bit of the immediate value) is repeated 16 times. This is then followed (concatenated,
$\|$) by $IR_{15-0}$, which is the 16 bits of the immediate value.

In other words, the semantics say to sign-extend the immediate value to 32 bits, add
it (using signed addition) to register R[s], and store the result in register R[t].
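The sign extension and addition described above can be sketched in Python as follows. The function names are illustrative; because Python integers are unbounded, the result is masked to 32 bits to mimic the hardware register width.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Replicate bit 15 into bits 31-16: (IR15)^16 concatenated with IR15-0."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def addi(rs_value, imm16):
    """R[t] = R[s] + sign-extended immediate, wrapped to 32 bits."""
    return (rs_value + sign_extend16(imm16)) & MASK32


print(hex(sign_extend16(0xFFFF)))  # 0xffffffff  (-1)
print(addi(5, 0xFFFF))             # 4  (5 + (-1))
```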

J-type:

J-type is short for jump type. The format of a J-type instruction is:

Fig. Instruction Format of J-type

The semantics of the j (jump) instruction are:

$$PC \leftarrow PC_{31-28} \,\|\, IR_{25-0} \,\|\, 00$$

where PC is the program counter, which stores the address of the instruction currently
being executed. The PC is updated by concatenating the upper 4 bits of the program counter,
the 26 bits of the target (the lower 26 bits of the instruction register) and two 0's,
which creates a 32-bit address.
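The jump-address formation can be sketched as bit masking and concatenation. The Python function below follows the text in taking the upper 4 bits from the current PC (MIPS itself takes them from PC + 4). The opcode value 0x02 used to build the example instruction is the MIPS `j` encoding and is an assumption for this design.

```python
def jump_target(pc, instruction):
    """PC <- PC[31:28] || IR[25:0] || 00, as described above."""
    upper4 = pc & 0xF0000000             # keep the top 4 bits of the PC
    target26 = instruction & 0x03FFFFFF  # 26-bit target field IR[25:0]
    return upper4 | (target26 << 2)      # two appended zeros give word alignment


j_instr = (0x02 << 26) | 0x10            # hypothetical j with target field 0x10
print(hex(jump_target(0x40000000, j_instr)))  # 0x40000040
```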

3.3 INSTRUCTION SET

Fig. Instruction set for 32 bit RISC CPU

Chapter 4
Hardware and Software Requirements
4.1 Xilinx ISE:

Xilinx ISE (Integrated Synthesis Environment) is a software tool produced by Xilinx for
synthesis and analysis of HDL designs, enabling the developer to synthesize ("compile")
designs, perform timing analysis, examine RTL diagrams, simulate a design's reaction to
different stimuli, and configure the target device with the programmer.
Xilinx ISE is a design environment for Xilinx FPGA products; it is tightly coupled to the
architecture of those chips and cannot be used with FPGA products from other vendors.
Xilinx ISE is primarily used for circuit synthesis and design, while ISim or the ModelSim
logic simulator is used for system-level testing. Other components shipped with Xilinx ISE
include the Embedded Development Kit (EDK), a Software Development Kit (SDK) and
ChipScope Pro.
Since 2012, Xilinx ISE has been discontinued in favor of the Vivado Design Suite, which
serves the same roles as ISE with additional features for system-on-a-chip development.
Xilinx released the last version of ISE (version 14.7) in October 2013 and states that
"ISE has moved into the sustaining phase of its product life cycle, and there are no more
planned ISE releases".

4.2 Verilog HDL:


Verilog HDL is one of the two most common Hardware Description Languages (HDLs) used by
integrated circuit (IC) designers; the other is VHDL.
HDLs allow a design to be simulated early in the design cycle in order to correct errors or
experiment with different architectures. Designs described in HDL are technology-independent,
easy to design and debug, and usually more readable than schematics, particularly for large
circuits.
Verilog can be used to describe designs at four levels of abstraction:
(i) Algorithmic level (much like C code, with if, case and loop statements).
(ii) Register transfer level (RTL uses registers connected by Boolean equations).
(iii) Gate level (interconnected AND, NOR, etc. gates).
(iv) Switch level (the switches are the MOS transistors inside gates).

The language also defines constructs that can be used to control the input and output of
simulation.
More recently, Verilog has been used as input to synthesis programs, which generate a
gate-level description (a netlist) of the circuit. Some Verilog constructs are not
synthesizable, and the way the code is written greatly affects the size and speed of the
synthesized circuit. Since most designs are intended for synthesis, non-synthesizable
constructs should be used only in test benches, which are program modules that generate
the I/O needed to simulate the rest of the design. The term "not synthesizable" is used to
mark examples and constructs that do not synthesize.

4.3 Synopsys:
Synopsys Verification Continuum is a comprehensive verification platform built from the
industry's fastest engines for virtual prototyping, static and formal verification,
simulation, emulation, FPGA-based prototyping and debug. Verification Continuum features
Unified Compile, based on VCS, for a simulation-like use model throughout the verification
flow, enabling faster design bring-up and seamless transitions between simulation,
emulation and prototyping. It also delivers Unified Debug with Verdi, providing a debug
continuum across all domains and abstraction levels that greatly increases debug
efficiency. The Synopsys Verification Continuum also includes comprehensive planning and
coverage as well as multi-platform verification IP solutions. The platform is complemented
by low-power and analog/mixed-signal technology, integration and flows.

Chapter 5
Background
(16 bit RISC CPU)

Simulation and verification of a 16-bit RISC CPU that performs operations such as
addition, subtraction, AND, OR and shifting were carried out in the Xilinx tool (Xilinx
ISE 14.5). The CPU has the following specifications:

- Von Neumann architecture.
- 16-bit data bus.
- 12-bit address bus.
- Program counter ranging from 000h (0) to FFFh (4095).
- 16-bit accumulator.
- ALU to perform arithmetic and logical operations.
- 3-stage pipeline.
- ISA that specifies the opcode and the address lines within the 16 instruction bits:
  - B15-B12 for the opcode.
  - B11-B0 for the address lines.
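Splitting the 16-bit instruction into its two fields is a shift and a mask. The Python sketch below uses the bit positions listed above; the function name is illustrative. A 4-bit opcode field allows at most 16 distinct opcodes, which is enough for the 9 instructions of Section 5.2.

```python
def split_instruction16(ir):
    """Split a 16-bit instruction into a 4-bit opcode and a 12-bit address."""
    opcode = (ir >> 12) & 0xF  # B15-B12
    address = ir & 0x0FFF      # B11-B0, range 0x000-0xFFF (0-4095)
    return opcode, address


print(split_instruction16(0x2ABC))  # (2, 2748)
```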

It has a 3-stage pipeline:
1) Fetch.
2) Decode.
3) Execute.

Fetch:
The next instruction is fetched from the memory address currently stored in the program
counter (PC) and stored in the instruction register (IR). At the end of the fetch
operation, the PC points to the next instruction, which will be read in the next cycle.

Decode:

During this cycle, the encoded instruction in the instruction register (IR) is
interpreted by the decoder.

Execute:
The control unit of the CPU passes the decoded information as a sequence of control
signals to the relevant functional units of the CPU to perform the actions required by the
instruction, such as reading values from registers, passing them to the ALU to perform
arithmetic or logic functions on them, and writing the result back to a register. If the
ALU is involved, it sends a condition signal back to the control unit (CU). The result of
the operation is stored in main memory or sent to an output device. Based on any feedback
from the ALU, the program counter may be updated to a different address, from which the
next instruction will be fetched.
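The fetch, decode and execute steps above can be combined into a small instruction-level simulator. This Python sketch models the accumulator machine described in this chapter with a word-addressed 4096-word memory. The opcodes (`HALT`, `LDA`, `ADD`, `STA`, `JMP`) and their numeric values are hypothetical, chosen only for the example, and the loop executes one instruction per iteration rather than modeling the overlap of the 3-stage pipeline.

```python
# Hypothetical opcode assignments (the real encodings are in the instruction-set table).
OP_HALT, OP_LDA, OP_ADD, OP_STA, OP_JMP = 0x0, 0x1, 0x2, 0x3, 0x4


def run_accumulator(memory, max_cycles=100):
    """Fetch-decode-execute loop for a 16-bit accumulator machine; returns AC."""
    pc, ac = 0, 0
    for _ in range(max_cycles):
        # Fetch: IR <- M[PC], then PC points to the next instruction.
        ir = memory[pc]
        pc = (pc + 1) & 0xFFF                  # 12-bit PC wraps from FFFh to 000h
        # Decode: B15-B12 is the opcode, B11-B0 the operand address.
        opcode, addr = (ir >> 12) & 0xF, ir & 0xFFF
        # Execute
        if opcode == OP_HALT:
            break
        elif opcode == OP_LDA:
            ac = memory[addr]
        elif opcode == OP_ADD:
            ac = (ac + memory[addr]) & 0xFFFF  # 16-bit accumulator wraps
        elif opcode == OP_STA:
            memory[addr] = ac
        elif opcode == OP_JMP:
            pc = addr
        else:
            raise ValueError(f"unknown opcode: {opcode:#x}")
    return ac


mem = [0] * 4096
mem[0:4] = [0x1010, 0x2011, 0x3012, 0x0000]  # LDA 010h; ADD 011h; STA 012h; HALT
mem[0x010], mem[0x011] = 30, 12
print(run_accumulator(mem), mem[0x012])  # 42 42
```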

5.1 ARCHITECTURE OF 16 BIT RISC CPU:

A 16-bit RISC CPU that performs operations such as addition, subtraction and shifting
was designed. The CPU contains two units: the datapath unit and the control unit. The
control unit, implemented as a finite state machine (FSM), passes control signals to the
datapath unit to control the data flow in the datapath. The blocks of the 16-bit RISC CPU
are:

- Accumulator.
- Program counter.
- Instruction register.
- Arithmetic and logical unit.
- Control unit.

Fig. Architecture of 16 bit RISC CPU

The 4-bit opcode goes to the control unit, which, for each opcode, sends the control
signals that execute the corresponding operation. The CPU has 16-bit data lines and 12-bit
address lines, and its ALU performs addition, subtraction, AND, OR, NOT, left shift, right
shift and passing the left-hand input through. The instruction register holds both the
opcode bits and the address bits.

It also has a 3-stage pipeline (fetch, decode and execute), and the control signals are:
- Ld_ir (load the instruction register)
- Ld_ac (load the accumulator)
- Ld_pc (load the program counter)
- Inc_pc (increment the program counter)
- Clr_pc (clear the program counter)
- rd_mem (read from memory)
- wr_mem (write into memory)
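The control unit can be viewed as a finite state machine that asserts the signals listed above in each state. This Python sketch uses the signal names from the list; the state names (`RESET`, `FETCH`, `DECODE`, `EXECUTE`), the opcode mnemonics, the `STORE` instruction and the exact signals asserted per state are assumptions for illustration, not the design's actual FSM.

```python
# Hypothetical FSM state sequence; signal names follow the list above.
NEXT_STATE = {"RESET": "FETCH", "FETCH": "DECODE", "DECODE": "EXECUTE", "EXECUTE": "FETCH"}

EXECUTE_SIGNALS = {
    "ADD":   {"rd_mem", "Ld_ac"},  # AC <- AC + M[addr]
    "AND":   {"rd_mem", "Ld_ac"},  # AC <- AC & M[addr]
    "STORE": {"wr_mem"},           # M[addr] <- AC (hypothetical store instruction)
    "JUMP":  {"Ld_pc"},            # PC <- addr
}


def control_signals(state, opcode=None):
    """Return the set of control signals the control unit asserts in a state."""
    if state == "RESET":
        return {"Clr_pc"}                      # PC <- 0
    if state == "FETCH":
        return {"rd_mem", "Ld_ir", "Inc_pc"}   # IR <- M[PC], PC <- PC + 1
    if state == "DECODE":
        return set()                           # opcode is examined; nothing is loaded
    if state == "EXECUTE":
        return set(EXECUTE_SIGNALS[opcode])
    raise ValueError(f"unknown state: {state}")


state = "RESET"
for _ in range(4):
    print(state, sorted(control_signals(state, opcode="ADD")))
    state = NEXT_STATE[state]
```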

5.2 INSTRUCTION SET:

A total of 9 instructions were implemented, as described below:

Table. Instruction set of 16 bit RISC CPU

1. Arithmetic (two's complement) ALU operations:

ADD: Rd = Rs + Rt
Operands A and B stored in registers Rs and Rt are added, and the result is written to
the destination register specified by Rd.
SUB: Rd = Rs - Rt
Operand B (Rt) is subtracted from operand A (Rs), and the result is written to Rd.

2. Logical ALU operations:

AND: Rd = Rs & Rt
Operand A (Rs) is bitwise ANDed with operand B (Rt), and the result is written into Rd.
OR: Rd = Rs | Rt
Operand A (Rs) is bitwise ORed with operand B (Rt), and the result is written into Rd.

3. Load Immediate:

Rd = 16-bit sign-extended immediate

The 8-bit immediate in the instruction word is sign-extended to 16 bits and written into
the register specified by Rd.

4. Store Immediate:

8-bit sign-extended immediate = Rd

The 8-bit immediate in the instruction word is sign-extended to 16 bits and stored into
the register specified by Rd.

5. Shift operations:
LSHIFT: AC << 1
Data in the accumulator is left shifted by 1.
RSHIFT: AC >> 1
Data in the accumulator is right shifted by 1.
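The 16-bit operations above can be modeled in Python with explicit masking to 16 bits. This is a sketch: the right shift is assumed to be logical (bit 15 becomes 0), since the text does not say whether it is logical or arithmetic, and the dictionary of operations is an illustrative structure rather than the design's ALU encoding.

```python
MASK16 = 0xFFFF


def sext8_to_16(imm8):
    """Sign-extend an 8-bit immediate to 16 bits by replicating bit 7."""
    imm8 &= 0xFF
    return (imm8 | 0xFF00) if imm8 & 0x80 else imm8


ALU_OPS = {
    "ADD": lambda a, b: (a + b) & MASK16,  # two's-complement add, wraps at 16 bits
    "SUB": lambda a, b: (a - b) & MASK16,  # a - b in two's complement
    "AND": lambda a, b: a & b,
    "OR":  lambda a, b: a | b,
}


def lshift(ac):
    """AC << 1: bit 15 is discarded and bit 0 becomes 0."""
    return (ac << 1) & MASK16


def rshift(ac):
    """AC >> 1 as a logical shift: bit 15 becomes 0 (an assumption)."""
    return (ac & MASK16) >> 1


print(hex(ALU_OPS["SUB"](0x0003, 0x0005)))  # 0xfffe  (-2)
print(hex(sext8_to_16(0x80)))               # 0xff80  (-128)
```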

5.3 SIMULATION RESULTS:

1)

Fig. Simulation result of ADD and JUMP operations

The ADD instruction adds the data in memory to the data present in the accumulator, and
the JUMP instruction jumps directly to location 000H.

2)

Fig. Simulation result of AND operation

The AND instruction performs a bitwise AND of the data in memory and the data present in
the accumulator.

ACTIVITY TIME CHART (DURATION and PROGRESS):

TILL MID SEMESTER
1. Reviewed various research papers related to RISC CPUs.
2. Completed simulation and verification of a 16-bit RISC CPU with various arithmetic and
logical instructions.

TILL END SEMESTER
1. Simulation and verification of the 32-bit RISC CPU using the Xilinx tool.
2. Addition of multiplication and division instructions (or separate modules) in Verilog.

FINAL THESIS
1. Design of the 32-bit RISC CPU using 180 nm SCL CMOS technology.
2. Comparison of the power consumption of the 3-stage and 5-stage pipelines.
3. Increase in the performance per unit power of the RISC CPU design.

REFERENCES

[1] Kui Yi and Yue-hua Ding, "32-bit RISC CPU Based on MIPS Instruction Decoder Module
Design," Web Mining and Web-based Application, Sept. 2009.

[2] Zainalabedin Navabi, Verilog Digital System Design: Register Transfer Level Synthesis,
Testbench, and Verification, 2nd ed., McGraw-Hill Electronic Engineering.

[3] Samiappa Sakthikumaran, S. Salivahanan and V. S. Kanchana Bhaaskaran, "16-Bit RISC
processor design for convolution application," 2011 International Conference on Recent
Trends in Information Technology (ICRTIT).

[4] "Design of a 16 Bit RISC Processor," Indian Journal of Science and Technology,
Vol. 8(20), DOI: 10.17485/ijst/2015/v8i20/78320, August 2015.

[5] Cosmin Cernazanu-Glavan, Alexandru Amaricai and Marius Marcu, "Direct FPGA-based power
profiling for a RISC processor," IEEE Instrumentation and Measurement,
978-1-4799-6144-6/15, 2015.

[6] Samir Palnitkar, Verilog HDL: A Guide to Digital Design and Synthesis, 1996.

[7] David A. Patterson and John L. Hennessy, Computer Organization and Design: The
Hardware/Software Interface, 3rd ed., pp. 370-412.

[8] Rupali Balpande and Rashmi Keote, "Design of FPGA based Instruction Fetch & Decode
Module of 32-bit RISC (MIPS) Processor," IEEE, 2011.
