
CHAPTER 3

PROPOSED SYSTEM
3.1 2X2 VEDIC MULTIPLIER
The method is explained below for two 2-bit numbers A and B, where
A = a1a0 and B = b1b0, as shown in Figure 3.1. Firstly, the least significant
bits (LSBs) are multiplied, which gives the LSB of the final product. Then the
LSB of the multiplicand is multiplied with the next higher bit of the multiplier
and added to the product of the LSB of the multiplier and the next higher bit of
the multiplicand. The sum gives the second bit of the final product, and the
carry is added to the partial product obtained by multiplying the most
significant bits, giving a sum and a carry. The sum is the third bit and the
carry becomes the fourth bit of the final product.
s0 = a0b0; (1)
c1s1 = a1b0 + a0b1; (2)
c2s2 = c1 + a1b1; (3)
The final result will be c2s2s1s0. This multiplication method is applicable for
all cases.
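
Equations (1) to (3) can be checked with a small bit-level model. The following is only a sketch in Python (the project itself targets HDL); the function names half_adder and vedic_2x2 are illustrative assumptions, not names taken from the design.

```python
def half_adder(x, y):
    """Return (sum, carry) of two single bits."""
    return x ^ y, x & y


def vedic_2x2(a, b):
    """Multiply two 2-bit integers (0..3) following equations (1)-(3)."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1

    s0 = a0 & b0                             # (1) vertical product of the LSBs
    s1, c1 = half_adder(a1 & b0, a0 & b1)    # (2) sum of the crosswise products
    s2, c2 = half_adder(c1, a1 & b1)         # (3) carry plus product of the MSBs

    # Assemble the 4-bit result c2 s2 s1 s0.
    return (c2 << 3) | (s2 << 2) | (s1 << 1) | s0


# Exhaustive check against ordinary integer multiplication.
for a in range(4):
    for b in range(4):
        assert vedic_2x2(a, b) == a * b
```

The model uses exactly four AND operations and two half adders, which matches the gate count quoted for the 2X2 block.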

Figure 3.1 The Vedic Multiplication Method for two 2-bit binary numbers

The 2X2 Vedic multiplier (VM) module is implemented using four
two-input AND gates and two half-adders, as displayed in its block diagram in
Figure 3.2. The same method can be extended to a larger number of input bits.
The Vedic multiplier is based on the Urdhva Tiryakbhyam Sutra.

Figure 3.2 Block Diagram of 2X2 Vedic Multiplier
3.1.1 HARDWARE REALIZATION OF 2X2 MULTIPLIER BLOCK
The hardware realization of 2 X 2 multiplier block is illustrated in
Figure 3.3. For the sake of simplicity, usage of clock and registers is not shown,
but emphasis has been laid on understanding of the algorithm.

Figure 3.3 Hardware realization of 2 X 2 block



3.1.2 EXAMPLE OF 2X2 VEDIC MULTIPLICATION


Examples of decimal and binary 2X2 Vedic multiplication are shown
below in Figures 3.4 and 3.5.

Figure 3.4 Vedic multiplication of decimal numbers

Figure 3.5 Vedic multiplication of binary numbers



3.1.3 ALGORITHM
Here X0, Y0 are the least significant bits (LSB) and X1, Y1 are the most
significant bits (MSB).

    X1 X0
  * Y1 Y0
  -------
  F E D C

STEP 1: CP1 = X0 * Y0 = C1C0
STEP 2: C = C0
STEP 3: CP2 = X1 * Y0 + Y1 * X0 = D1D0
STEP 4: D = D0 + C1
STEP 5: CP3 = X1 * Y1 = E1E0
STEP 6: E = E0 + D1
STEP 7: F = E1
Where CP = cross product
X = multiplicand
Y = multiplier
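
The seven steps can be traced for decimal digits as well as for bits. The sketch below is one possible reading in Python: it assumes that every carry produced by an addition is passed on to the next step, which the step list leaves implicit, and the function name cross_multiply_2digit and its base parameter are illustrative assumptions.

```python
def cross_multiply_2digit(x1, x0, y1, y0, base=10):
    """Return the digits (F, E, D, C) of (x1 x0) * (y1 y0) in the given base."""
    cp1 = x0 * y0                       # STEP 1: vertical product of the low digits
    c = cp1 % base                      # STEP 2: C is the low digit of CP1
    cp2 = x1 * y0 + y1 * x0             # STEP 3: crosswise products
    d_total = cp2 + cp1 // base         # STEP 4: add the carry C1 from CP1
    d = d_total % base
    cp3 = x1 * y1                       # STEP 5: vertical product of the high digits
    e_total = cp3 + d_total // base     # STEP 6: add the carry out of the D column
    e = e_total % base
    f = e_total // base                 # STEP 7: whatever is left forms F
    return f, e, d, c


# Decimal example: 23 x 14 = 322
assert cross_multiply_2digit(2, 3, 1, 4) == (0, 3, 2, 2)
# Binary example: 11 x 11 = 1001 (3 x 3 = 9)
assert cross_multiply_2digit(1, 1, 1, 1, base=2) == (1, 0, 0, 1)
```

In the binary case the carry out of STEP 6 is what produces the fourth product bit, in agreement with equation (3) of Section 3.1.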
3.2 VEDIC MULTIPLIER FOR 4X4 BIT
The block diagram of the 4X4 bit Vedic multiplier is shown in Figure 3.6. To
get the final product, four 2X2 bit Vedic multipliers and three 4-bit ripple
carry (RC) adders are required. In this proposal, the first 4-bit RC adder is
used to add the two 4-bit operands obtained from cross multiplication of the two
middle 2X2 bit multiplier modules. The second 4-bit RC adder is used to add two
4-bit operands, i.e. the concatenated 4-bit operand (00 & the most significant
two output bits of the right-hand-most 2X2 multiplier module) and one 4-bit
operand obtained as the output sum of a 2X2 multiplier module. Early literature
describes Vedic multipliers based on array multiplier structures. Similarly, an
8X8 bit multiplier is constructed using four 4X4 bit multipliers, a 16X16 bit
multiplier is constructed using four 8X8 bit multipliers, and so on.

Figure 3.6 Block diagram of 4X4 Vedic Multiplier
Here, instead of following serial addition, the addition tree has been
modified to resemble a Wallace tree, thus reducing the levels of addition to 2
instead of 3. The two lower bits of q0 pass directly to the output, while the
upper bits of q0 are fed into the addition tree. The bits fed to the addition
tree are further illustrated by the diagram in Figure 3.7.
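
The way four 2X2 products combine into a 4X4 product can be sketched behaviourally as follows. This is an illustration under assumptions, not the adder-level netlist of Figure 3.6: only q0 is named in the text, so q1, q2 and q3 are assumed names for the other three 2X2 outputs, and Python integer addition stands in for the adders.

```python
def mul2(a, b):
    """Stand-in for the 2X2 Vedic block: product of two 2-bit values."""
    assert 0 <= a < 4 and 0 <= b < 4
    return a * b


def vedic_4x4(a, b):
    """Multiply two 4-bit integers using four 2X2 partial products."""
    a_lo, a_hi = a & 0b11, a >> 2
    b_lo, b_hi = b & 0b11, b >> 2

    q0 = mul2(a_lo, b_lo)    # right-hand-most block
    q1 = mul2(a_hi, b_lo)    # middle (cross) block
    q2 = mul2(a_lo, b_hi)    # middle (cross) block
    q3 = mul2(a_hi, b_hi)    # left-hand-most block

    low = q0 & 0b11                  # two lower bits of q0 go straight to the output
    mid = q1 + q2 + (q0 >> 2)        # cross products plus the upper bits of q0
    high = q3 + (mid >> 2)           # upper block plus the overflow of the middle sum
    return (high << 4) | ((mid & 0b11) << 2) | low


for a in range(16):
    for b in range(16):
        assert vedic_4x4(a, b) == a * b
```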

Figure 3.7 Addition of partial products in 4 x 4 block



3.2.1 Algorithm for 4 x 4 bit Vedic multiplier Using Urdhva Tiryakbhyam
(Vertically and crosswise) for two Binary numbers
CP = Cross Product (Vertically and Crosswise)

            X3 X2 X1 X0     Multiplicand
            Y3 Y2 Y1 Y0     Multiplier
-------------------------------------------
 H  G  F  E  D  C  B  A
-------------------------------------------
P7 P6 P5 P4 P3 P2 P1 P0     Product
-------------------------------------------

PARALLEL COMPUTATION METHODOLOGY

1. CP   X0             = X0 * Y0 = A
        Y0

2. CP   X1 X0          = X1 * Y0 + X0 * Y1 = B
        Y1 Y0

3. CP   X2 X1 X0       = X2 * Y0 + X0 * Y2 + X1 * Y1 = C
        Y2 Y1 Y0

4. CP   X3 X2 X1 X0    = X3 * Y0 + X0 * Y3 + X2 * Y1 + X1 * Y2 = D
        Y3 Y2 Y1 Y0

5. CP   X3 X2 X1       = X3 * Y1 + X1 * Y3 + X2 * Y2 = E
        Y3 Y2 Y1

6. CP   X3 X2          = X3 * Y2 + X2 * Y3 = F
        Y3 Y2

7. CP   X3             = X3 * Y3 = G
        Y3

Where CP = cross product


3.2.2 EXAMPLE OF 4X4 VEDIC MULTIPLICATION


An example of 4X4 Vedic multiplication is shown below in Figure 3.8.

Figure 3.8 Line diagram for multiplication of two 4-bit numbers
Firstly, the least significant bits are multiplied, which gives the least
significant bit of the product (vertical). Then the LSB of the multiplicand is
multiplied with the next higher bit of the multiplier and added to the product
of the LSB of the multiplier and the next higher bit of the multiplicand
(crosswise). The sum gives the second bit of the product, and the carry is added
to the output of the next stage, the sum obtained by the crosswise and vertical
multiplication and addition of three bits of the two numbers from the least
significant position. Next, all four bits are processed with crosswise
multiplication and addition to give the sum and carry. The sum is the
corresponding bit of the product and the carry is again added to the next stage,
the multiplication and addition of three bits excluding the LSB. The same
operation continues until the multiplication of the two MSBs gives the MSB of
the product. For example, if in some intermediate step we get 110, then 0 acts
as the result bit (referred to as rn) and 11 as the carry (referred to as cn).
It should be clearly noted that cn may be a multi-bit number.
Thus we get the following expressions:
r0 = a0b0;
c1r1 = a1b0 + a0b1;
c2r2 = c1 + a2b0 + a1b1 + a0b2;
c3r3 = c2 + a3b0 + a2b1 + a1b2 + a0b3;
c4r4 = c3 + a3b1 + a2b2 + a1b3;
c5r5 = c4 + a3b2 + a2b3;
c6r6 = c5 + a3b3;
With c6r6r5r4r3r2r1r0 being the final product. Hence this is the general
mathematical formula applicable to all cases of multiplication.
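
The expressions above follow one pattern: column k sums every product a(i)b(k-i) together with the carry from the previous column. A minimal Python sketch of that pattern is given below; the function name urdhva_multiply and the width parameter n are illustrative assumptions, and the carry is kept as an ordinary integer because cn may be wider than one bit.

```python
def urdhva_multiply(a, b, n=4):
    """Column-wise (vertically and crosswise) product of two n-bit integers."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]

    result = 0
    carry = 0                                  # carry into column 0 is zero
    for k in range(2 * n - 1):
        # Sum of all bit products a_i * b_(k-i) that fall in column k.
        column = carry + sum(
            a_bits[i] & b_bits[k - i]
            for i in range(n)
            if 0 <= k - i < n
        )
        result |= (column & 1) << k            # r_k: result bit of this column
        carry = column >> 1                    # c_k: carry fed into column k + 1
    result |= carry << (2 * n - 1)             # last carry becomes the top bit
    return result


for a in range(16):
    for b in range(16):
        assert urdhva_multiply(a, b) == a * b
```

With n = 4 the loop evaluates exactly the seven expressions r0 to c6r6, and the same function can be called with a larger n to check wider operands.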
3.2.3 HARDWARE ARCHITECTURE
This hardware design is very similar to that of the famous array
multiplier where an array of adders is required to arrive at the final product.

Figure 3.9 Hardware Architecture
The hardware architecture of the 4x4 multiplier is shown in Figure 3.9.

3.3 8 X 8 MULTIPLIER
The 8 X 8 multiplier is made by using four 4 X 4 multiplier blocks.
Here, the multiplicands are of bit size n = 8, whereas the result is of 16-bit
size. Each input, a and b, is broken into smaller chunks of size n/2 = 4, just
as in the case of the 4 X 4 multiplier block. These newly formed chunks of 4
bits are given as input to the 4 X 4 multiplier blocks, where they are again
broken into even smaller chunks of size n/4 = 2 and fed to the 2 X 2 multiply
blocks. The block diagram of the 8X8 Vedic multiplier is shown in Figure 3.10.

Figure 3.10 Block diagram of 8X8 Vedic multiplier
The results produced by the 4 X 4 multiply blocks, each of which is 8 bits
wide, are sent to an addition tree, as shown in Figure 3.11 below. Here, it
must be kept in mind that each 4 X 4 multiply block works as illustrated in
Figure 3.6. In the 8 X 8 multiply block, the lower 4 bits of q0 are passed
directly to the output and the remaining bits are fed to the addition tree, as
shown in Figure 3.11.

Figure 3.11 Addition of Partial Products in 8 X 8 block


3.3.1 ALGORITHM FOR 8X8 BIT MULTIPLICATION
Here X0, Y0 are the least significant bits (LSB) and X1, Y1 are the most
significant bits (MSB). The LSB bits are A0, A1, A2, A3, B0, B1, B2, B3 and the
MSB bits are A7, A6, A5, A4, B7, B6, B5, B4.

A = A7A6A5A4  A3A2A1A0
       X1        X0
B = B7B6B5B4  B3B2B1B0
       Y1        Y0

    X1 X0
  * Y1 Y0
  -------
  F E D C

STEP 1: CP = X0 * Y0 = C
STEP 2: CP = X1 * Y0 + X0 * Y1 = D
STEP 3: CP = X1 * Y1 = E
Where CP = Cross Product.
Each multiplication operation is an embedded parallel 4x4 multiply module.
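
The three steps describe a halving scheme that repeats at every operand width. The following Python sketch shows that recursion under stated assumptions: the function name vedic_multiply is illustrative, the 2X2 base case is replaced by a plain product, and ordinary integer addition stands in for the addition tree.

```python
def vedic_multiply(a, b, n):
    """Multiply two n-bit integers, where n is a power of two and n >= 2."""
    if n == 2:
        return a * b                       # stand-in for the 2X2 multiplier block
    half = n // 2
    mask = (1 << half) - 1
    x0, x1 = a & mask, a >> half           # low and high halves of A
    y0, y1 = b & mask, b >> half           # low and high halves of B

    c = vedic_multiply(x0, y0, half)                                   # STEP 1
    d = vedic_multiply(x1, y0, half) + vedic_multiply(x0, y1, half)    # STEP 2
    e = vedic_multiply(x1, y1, half)                                   # STEP 3

    # Weight each cross product by the position of its halves and add.
    return c + (d << half) + (e << n)


assert vedic_multiply(0b11111111, 0b00001001, 8) == 255 * 9
assert vedic_multiply(0xFFFF, 0xFFFF, 16) == 0xFFFF * 0xFFFF
```

Calling the function with n = 8 uses four 4-bit products, each of which uses four 2-bit products, which mirrors the block hierarchy described for the 8 X 8 multiplier.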
3.3.2 EXAMPLE OF 8X8 VEDIC MULTIPLICATION
An example of 8X8 vedic multiplication of binary numbers is shown
in figure 3.12 below.

Figure 3.12 Example of 8X8 Vedic multiplication


Consider the 8x8 bit multiplication of 11111111 and 00001001. When
multiplying a higher number of bits, divide each operand into two equal halves
and apply the same analysis that was used for the 4x4 multiplication. That is,
11111111 is treated as 1111 and 1111, and 00001001 is treated as 0000 and 1001.
The four different multiplications are therefore 1111 x 1001, 1111 x 1001,
1111 x 0000 and 1111 x 0000. The first adder adds 00000000 and 10000111, giving
the sum 10000111 with no carry out. The next adder adds this result to 00001000
(the upper four bits of 1111 x 1001 = 10000111, zero-extended), giving the sum
10001111. Since no carry is generated by either adder, the last adder gives
both sum and carry out as zero, so nothing is added to 0000. The final result
is:
S0=1, S1=1, S2=1, S3=0, S4=1, S5=1, S6=1, S7=1, S8=0, S9=0, S10=0, S11=1,
S12=0, S13=0, S14=0, S15=0.
The final answer is 0000100011110111.
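
The arithmetic of this example can be replayed step by step. The short Python check below is only an illustration; the names q0 to q3, first_sum and second_sum are assumptions introduced here for the four partial products and the two adder outputs.

```python
x1, x0 = 0b1111, 0b1111      # halves of 11111111
y1, y0 = 0b0000, 0b1001      # halves of 00001001

q0 = x0 * y0                 # 1111 x 1001 = 10000111
q1 = x1 * y0                 # 1111 x 1001 = 10000111
q2 = x0 * y1                 # 1111 x 0000 = 00000000
q3 = x1 * y1                 # 1111 x 0000 = 00000000

first_sum = q2 + q1                   # 00000000 + 10000111
second_sum = first_sum + (q0 >> 4)    # + 00001000, the upper four bits of q0
product = (q0 & 0xF) + (second_sum << 4) + (q3 << 8)

assert first_sum == 0b10000111
assert second_sum == 0b10001111
print(format(product, "016b"))        # prints 0000100011110111
```

The printed value equals 255 x 9 = 2295, which agrees with the bit pattern S15 to S0 listed above.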
3.4 16 X 16 BIT MULTIPLIER
The 16 X 16 multiplier is made by using four 8 X 8 multiplier blocks.
Here, the multiplicands are of bit size n = 16, whereas the result is of 32-bit
size. Each input, a and b, is broken into smaller chunks of size n/2 = 8. These
newly formed chunks of 8 bits are given as input to the 8 X 8 multiplier
blocks, where they are again broken into even smaller chunks of size n/4 = 4
and fed to the 4 X 4 multiplier blocks, just as in the case of the 8 X 8
multiply block. Again, the new chunks are divided in half to get chunks of size
2, which are fed to the 2 X 2 multiplier blocks. The results produced by the
8 X 8 multiply blocks, each of which is 16 bits wide, are sent to an addition
tree, as shown in Figure 3.13.

Figure 3.13 Block diagram of 16 X 16 Multiply block

Here, the lower 8 bits of q0 pass directly to the result, while the higher
bits are fed into the addition tree. The addition of partial products is shown
in Figure 3.14 below.

Figure 3.14 Addition of Partial products in 16 X 16 block


3.4.1 ALGORITHM OF 16X16 VEDIC MULTIPLICATION
Here X0, Y0 are the least significant bits (LSB) and X1, Y1 are the most
significant bits (MSB). The LSB bits are A0, A1, A2, A3, A4, A5, A6, A7 and
B0, B1, B2, B3, B4, B5, B6, B7, and the MSB bits are A8, A9, A10, A11, A12, A13,
A14, A15 and B8, B9, B10, B11, B12, B13, B14, B15.

A = A15A14A13A12A11A10A9A8  A7A6A5A4A3A2A1A0
              X1                   X0
B = B15B14B13B12B11B10B9B8  B7B6B5B4B3B2B1B0
              Y1                   Y0

    X1 X0
  * Y1 Y0
  -------
  F E D C

STEP 1: CP1 = X0 * Y0 = C1C0
STEP 2: C = C0
STEP 3: CP2 = X1 * Y0 + Y1 * X0 = D1D0
STEP 4: D = D0 + C1
STEP 5: CP3 = X1 * Y1 = E1E0
STEP 6: E = E0 + D1
STEP 7: F = E1
Where CP = cross product
3.5 32 X 32 VEDIC MULTIPLIER
The 32 X 32 multiplier is made by using four 16 X 16 multiplier blocks,
as shown in Figure 3.15. Here, the multiplicands are of bit size n = 32, whereas
the result is of 64-bit size. Each input, a and b, is broken into smaller
chunks of size n/2 = 16.

Figure 3.15 Block diagram of 32X32 Vedic multiplier


These newly formed chunks of 16 bits are given as input to the 16 X 16
multiplier blocks, where they are again broken into even smaller chunks of size
n/4 = 8 and fed to the 8 X 8 multiply blocks, just as in the case of the
16 X 16 block. Again, the new chunks are divided in half to get chunks of size
4, which are fed to the 4 X 4 multiply blocks. These are in turn divided into
chunks of size 2 and fed to the 2 X 2 multiplier blocks, and the resultant bits
are sent to an addition tree.
3.5.1 ALGORITHM OF 32X32 VEDIC MULTIPLICATION
Here X0, Y0 are the least significant bits (LSB) and X1, Y1 are the most
significant bits (MSB). Both LSB and MSB consist of 16 bits.

A = A31-A16  A15-A0
       X1      X0
B = B31-B16  B15-B0
       Y1      Y0

    X1 X0
  * Y1 Y0
  -------
  F E D C

STEP 1: CP1 = X0 * Y0 = C1C0
STEP 2: C = C0
STEP 3: CP2 = X1 * Y0 + Y1 * X0 = D1D0
STEP 4: D = D0 + C1
STEP 5: CP3 = X1 * Y1 = E1E0
STEP 6: E = E0 + D1
STEP 7: F = E1
Where CP = cross product

3.6 64 X 64 VEDIC MULTIPLIER


The 64 X 64 multiplier is made by using four 32 X 32 multiplier
blocks. Here, the multiplicands are of bit size n = 64, whereas the result is of
128-bit size. Each input, a and b, is broken into smaller chunks of size
n/2 = 32. These newly formed chunks of 32 bits are given as input to the
32 X 32 multiplier blocks, where they are again broken into even smaller chunks
of size n/4 = 16 and fed to the 16 X 16 multiply blocks, just as in the case of
the 32 X 32 block. Again, the new chunks are divided in half to get chunks of
size 8, which are fed to the 8 X 8 multiply blocks. These are in turn divided
and fed to the 4 X 4 multipliers and then to the 2 X 2 multipliers, and the
final resultant bits are sent to an addition tree, as shown in Figure 3.16.

Figure 3.16 64 X 64 VEDIC MULTIPLIER


3.6.1 ALGORITHM OF 64X64 VEDIC MULTIPLICATION
Here X0, Y0 are the least significant bits (LSB) and X1, Y1 are the most
significant bits (MSB). Both LSB and MSB consist of 32 bits.

A = A63-A32  A31-A0
       X1      X0
B = B63-B32  B31-B0
       Y1      Y0

    X1 X0
  * Y1 Y0
  -------
  F E D C

STEP 1: CP1 = X0 * Y0 = C1C0
STEP 2: C = C0
STEP 3: CP2 = X1 * Y0 + Y1 * X0 = D1D0
STEP 4: D = D0 + C1
STEP 5: CP3 = X1 * Y1 = E1E0
STEP 6: E = E0 + D1
STEP 7: F = E1
Where CP = cross product
3.7 RIPPLE CARRY ADDER
The arrangement of the ripple carry adder shown in Figure 3.17 helps
to reduce delay. A simple ripple carry adder is a digital circuit that produces
the arithmetic sum of two binary numbers. It can be constructed from a number
of full adders connected in cascade, with the carry output of each adder
connected to the carry input of the next full adder in the chain. Each full
adder takes as input a cin, which is the cout of the previous adder. This kind
of adder is called a ripple carry adder, since each carry bit ripples to the
next full adder.
The first full adder may be replaced by a half adder. The layout of a ripple
carry adder is simple, which allows for fast design time.
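
The cascade of full adders can be modelled bit by bit. The sketch below is a behavioural illustration in Python rather than the circuit of Figure 3.17; the function names full_adder and ripple_carry_add and the width parameter are assumptions made for the example.

```python
def full_adder(a, b, cin):
    """Return (sum, cout) for three single-bit inputs."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a, b, width=4, cin=0):
    """Add two width-bit integers; return (sum, carry_out)."""
    carry = cin
    total = 0
    for i in range(width):
        # The cout of stage i becomes the cin of stage i + 1.
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


assert ripple_carry_add(0b1001, 0b0110) == (0b1111, 0)
assert ripple_carry_add(0b1111, 0b0001) == (0b0000, 1)
```

With cin fixed at 0 the first stage only ever adds two bits, which is why it may be replaced by a half adder.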

Figure 3.17 Circuit Diagram of 4 bit Ripple Carry Adder


3.8 MULTIPLY ACCUMULATE UNIT
The multiply-accumulate operation is one of the basic arithmetic
operations extensively used in modern digital signal processing (DSP). Most
arithmetic, such as digital filtering, convolution and the fast Fourier
transform (FFT), requires high-performance multiply-accumulate operations. The
multiply-accumulator (MAC) unit always lies in the critical path that
determines the speed of the overall hardware system. Therefore, a high-speed
MAC that is capable of supporting multiple precisions and parallel operations
is highly desirable.


3.8.1 BASIC MAC ARCHITECTURE


Basically, a MAC unit employs a fast multiplier fitted in the data path,
and the output of the multiplier is fed into a fast adder which is set to zero
initially. The result of the addition is stored in an accumulator register. The
MAC unit should be able to produce an output in one clock cycle, and each new
result of addition is added to the previous one and stored in the accumulator
register. Figure 3.18 below shows the basic MAC architecture.
Here the multiplier used is a Vedic multiplier based on the Urdhva
Tiryakbhyam Sutra, fitted into the MAC design.

Figure 3.18 Basic MAC architecture


3.8.2 MAC UNIT USING VEDIC MULTIPLIER
In the MAC unit, the data inputs A and B are stored in two data
registers, Data a_reg and Data b_reg. The inputs are then fed into a Vedic
multiplier, which stores the result in Multiply_reg. The contents of
Multiply_reg are continuously fed into a conventional adder and the result is
stored in dataout_reg. Here, the MAC unit makes use of two clocks: clk for the
operation of the MAC unit and clk2 for the multiplier. For proper operation,
the frequency of clk2 should be 4 times the frequency of clk. A divide-by-4
clock circuit may be used here in future, taking clk2 as the parent clock and
producing clk as the daughter clock, which is 4 times slower than the parent
clock but has a 50% duty cycle. The faster clock clk2 is used for the
multiplier while the slower clock clk is used for the MAC unit. The data coming
as input to the MAC may vary with clock clk.
The signal clr, when applied, forces the contents of all the data
registers, that is Data a_reg, Data b_reg, Multiply_reg and dataout_reg, to
zero. The clken signal is used to enable the MAC operation. Figure 3.19
shows the architecture of the MAC.
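
The register behaviour described above can be summarised in a small behavioural model. This Python sketch is an assumption-laden illustration, not the HDL of Figure 3.19: the class name MacUnit and the method name clock are hypothetical, one call represents one cycle of the slower clock clk, the faster clk2 and register widths are not modelled, and a plain product stands in for the Vedic multiplier.

```python
class MacUnit:
    """Behavioural model of the MAC: one clock() call per cycle of clk."""

    def __init__(self, multiply=lambda a, b: a * b):
        self.multiply = multiply     # stand-in for the Vedic multiplier
        self.clear()

    def clear(self):
        """Model of clr: force every data register to zero."""
        self.a_reg = 0
        self.b_reg = 0
        self.multiply_reg = 0
        self.dataout_reg = 0

    def clock(self, a, b, clken=True, clr=False):
        """Apply one clk cycle and return the accumulator contents."""
        if clr:
            self.clear()
        elif clken:
            self.a_reg, self.b_reg = a, b                     # latch the inputs
            self.multiply_reg = self.multiply(a, b)           # multiplier output
            self.dataout_reg += self.multiply_reg             # accumulate
        return self.dataout_reg


mac = MacUnit()
assert mac.clock(3, 4) == 12                 # 3*4
assert mac.clock(5, 6) == 42                 # 12 + 5*6
assert mac.clock(1, 1, clken=False) == 42    # disabled: accumulator holds
assert mac.clock(0, 0, clr=True) == 0        # clr forces the registers to zero
```

Passing a different multiply function to the constructor allows the same accumulator model to be exercised with another multiplier model.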

Figure 3.19 MAC using Vedic Multiplier


Multiply-accumulate is an important part of real-time digital signal
processing (DSP), with applications ranging from digital filtering to image
processing. It is a very common basic-level operation seen in many DSP designs
and algorithms: two numbers are multiplied together and added into an
accumulator register. As shown in Figures 3.20 and 3.21, the basic MAC unit
consists of a multiplier, an adder and an accumulator.

Figure 3.20 Architecture of Vedic Multiplier

Figure 3.21 Architecture of Booth Multiplier



In general, a MAC unit uses a conventional multiplier, which multiplies the
multiplier and multiplicand by generating partial products and adding them to
compute the final product. The key to the proposed MAC unit is to enhance its
performance using the Vedic multiplier, and to compare the Vedic, Booth and
conventional multipliers in terms of the computation required to generate the
partial products and add them to get the final result of the multiplication.
3.9 ADVANTAGES
i) The Vedic multiplier is faster than the array multiplier and the Booth
multiplier. As the number of bits increases from 4X4 bits to 32X32 bits, the
timing delay is greatly reduced for the Vedic multiplier as compared to the
other multipliers. The Vedic multiplier has its greatest advantage over other
multipliers in gate delay and regularity of structure.
ii) Power dissipation is much lower than that of Booth multipliers.
3.10 APPLICATIONS
i) MAC.
ii) DSP applications (FIR, IIR filters).

3.11 TOOLS USED


3.11.1 SOFTWARE USED
i) ModelSim 6.3 for simulation:
ModelSim is a popular hardware simulation and debug environment
primarily targeted at smaller ASIC and FPGA designs. ModelSim provides a
complete HDL simulation environment that enables you to verify the functional
and timing models of your design, and your HDL source code. It is optimized for
use with all configurations of Xilinx ISE products.
ii) Xilinx 10.1 for synthesis:
Xilinx ISE is a software tool produced by Xilinx for synthesis and
analysis of HDL designs, which enables the developer to synthesize ("compile")
their designs, perform timing analysis, examine RTL diagrams, simulate a
design's reaction to different stimuli, and configure the target device with
the programmer.
3.11.2 HARDWARE USED
FIELD PROGRAMMABLE GATE ARRAY (FPGA)
FPGAs are programmable semiconductor devices that are based
around a matrix of Configurable Logic Blocks (CLBs) connected through
programmable interconnects. As opposed to Application Specific Integrated
Circuits (ASICs), where the device is custom built for the particular design,
FPGAs can be programmed to the desired application or functionality
requirements. One-Time Programmable (OTP) FPGAs are also available. In
our project we use the Spartan 3 FPGA kit.
SPARTAN 3
The Spartan 3 trainer xc3s400 pq208 is useful to realize and verify
digital designs. The user can write Verilog/VHDL code and verify the results by
implementing it physically in the target device (FPGA). With the help of this
kit, the user can simulate and observe various input and output conditions to
verify the implemented design.
