PROPOSED SYSTEM
3.1 2X2 VEDIC MULTIPLIER
The method is explained below for two 2-bit numbers A and B, where
A = a1a0 and B = b1b0, as shown in figure 3.1. First, the least significant bits
(LSBs) are multiplied, which gives the LSB of the final product. Then the LSB of
the multiplicand is multiplied by the next higher bit of the multiplier, and the
result is added to the product of the LSB of the multiplier and the next higher bit
of the multiplicand. The sum gives the second bit of the final product, and the
carry is added to the partial product obtained by multiplying the most significant
bits, giving a sum and a carry. This sum is the third bit of the final product and
the carry becomes the fourth bit.
s0= a0b0; (1)
c1s1= a1b0+a0b1; (2)
c2s2= c1+a1b1; (3)
The final result is c2s2s1s0. This multiplication method is applicable to all
cases.
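Equations (1) to (3) can be checked with a short software model. The following is a minimal Python sketch, not the hardware description used in this project; the function name vedic_2x2 and the temporary names t and u are assumptions introduced only for illustration.

```python
def vedic_2x2(a, b):
    """Multiply two 2-bit integers following equations (1) to (3)."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1

    s0 = a0 & b0                   # (1) LSB of the product
    t = (a1 & b0) + (a0 & b1)      # (2) sum of the two cross products, 0..2
    s1, c1 = t & 1, t >> 1
    u = c1 + (a1 & b1)             # (3) carry plus product of the MSBs, 0..2
    s2, c2 = u & 1, u >> 1

    # Final result is the bit string c2 s2 s1 s0.
    return (c2 << 3) | (s2 << 2) | (s1 << 1) | s0


# Exhaustive check over all sixteen 2-bit operand pairs.
for a in range(4):
    for b in range(4):
        assert vedic_2x2(a, b) == a * b
```

Because the operands are only 2 bits wide, the exhaustive loop covers every case, which supports the statement that the method applies to all cases.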
Figure 3.1 The Vedic Multiplication Method for two 2-bit binary numbers
Figure 3.2 Block Diagram of 2X2 Vedic Multiplier
3.1.1 HARDWARE REALIZATION OF 2X2 MULTIPLIER BLOCK
The hardware realization of the 2 X 2 multiplier block is illustrated in
Figure 3.3. For the sake of simplicity, the clock and registers are not shown;
the emphasis is on understanding the algorithm.
3.1.3 ALGORITHM
Here X0, Y0 are the least significant bits (LSB) and X1, Y1 are the most
significant bits (MSB).
  X1 X0
* Y1 Y0
-------
F E D C
STEP 1:CP1=X0*Y0=C1C0
STEP 2:C=C0
STEP 3:CP2=X1*Y0+Y1*X0=D1D0
STEP 4:D=D0+C1
STEP 5:CP3=X1*Y1=E1E0
STEP 6:E=E0+D1
STEP 7:F=E1
Where CP=cross product
X=multiplicand
Y=multiplier
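The seven steps can be traced in software for any chunk width. The sketch below is a hedged Python model, assuming each of X1, X0, Y1 and Y0 is k bits wide and using Python's built-in multiplication as a stand-in for the smaller multiplier block. The names cross_product_multiply, k, d_full and e_full are illustrative and do not come from the design. Steps 4 and 6 can overflow k bits, so the model keeps that overflow and passes it to the next step.

```python
def cross_product_multiply(x, y, k):
    """Multiply two 2k-bit integers by following STEP 1 to STEP 7.

    k is the width in bits of each half (X1, X0, Y1, Y0).
    """
    mask = (1 << k) - 1
    x0, x1 = x & mask, x >> k
    y0, y1 = y & mask, y >> k

    cp1 = x0 * y0                   # STEP 1: CP1 = C1C0
    c = cp1 & mask                  # STEP 2: C = C0
    cp2 = x1 * y0 + y1 * x0         # STEP 3: CP2 = D1D0
    d_full = cp2 + (cp1 >> k)       # STEP 4: D0 + C1, overflow kept in d_full
    d = d_full & mask
    cp3 = x1 * y1                   # STEP 5: CP3 = E1E0
    e_full = cp3 + (d_full >> k)    # STEP 6: E0 + D1, plus overflow of step 4
    e = e_full & mask
    f = e_full >> k                 # STEP 7: F = E1, plus overflow of step 6

    # The product is the concatenation F E D C, each field k bits wide.
    return (f << (3 * k)) | (e << (2 * k)) | (d << k) | c


assert cross_product_multiply(3, 3, 1) == 9          # 2 X 2 case
assert cross_product_multiply(0xAB, 0xCD, 4) == 0xAB * 0xCD
```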
3.2 VEDIC MULTIPLIER FOR 4X4 BIT
The block diagram of the 4X4 bit Vedic multiplier is shown in figure 3.6. To
get the final product, four 2-bit Vedic multipliers and three 4-bit ripple
carry (RC) adders are required. In this proposal, the first 4-bit RC adder adds
the two 4-bit operands obtained from cross multiplication in the two middle 2X2
bit multiplier modules. The second 4-bit RC adder adds two 4-bit operands,
i.e. a concatenated 4-bit operand (00 & the two most significant output bits of
the rightmost 2X2 multiplier module) and one 4-bit operand we get as the output sum of hand
Figure 3.6 Block diagram of 4X4 Vedic Multiplier
Here, instead of following serial addition, the addition tree has been
modified to resemble a Wallace tree, reducing the levels of addition from 3 to 2.
The two lower bits of q0 pass directly to the output, while the upper bits of q0
are fed into the addition tree. The bits fed to the addition tree are further
illustrated by the diagram in figure 3.7.
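The two-level addition can be modelled as follows. This is only a sketch under stated assumptions: the partial-product names q1, q2 and q3 and the helper mul2 are hypothetical (the text names only q0), the grouping of operands is one plausible arrangement rather than a transcription of figure 3.7, and Python's multiplication stands in for the 2X2 Vedic block.

```python
def mul2(a, b):
    """Stand-in for a 2X2 multiplier block (4-bit result)."""
    return (a & 3) * (b & 3)


def vedic_4x4(a, b):
    """4X4 multiply built from four 2X2 blocks and two levels of addition."""
    a_lo, a_hi = a & 3, (a >> 2) & 3
    b_lo, b_hi = b & 3, (b >> 2) & 3

    q0 = mul2(a_lo, b_lo)           # weight 2^0
    q1 = mul2(a_hi, b_lo)           # weight 2^2
    q2 = mul2(a_lo, b_hi)           # weight 2^2
    q3 = mul2(a_hi, b_hi)           # weight 2^4

    low = q0 & 3                    # two lower bits of q0 go straight to output

    # Level 1: add the two middle partial products.
    level1 = q1 + q2
    # q3 and the upper bits of q0 do not overlap, so they are concatenated,
    # which needs no adder.
    concat = (q3 << 2) | (q0 >> 2)
    # Level 2: add the level-1 sum to the concatenated operand.
    upper = level1 + concat

    return (upper << 2) | low


assert vedic_4x4(15, 15) == 225
```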
Multiplicand: X3 X2 X1 X0
Multiplier: Y3 Y2 Y1 Y0
Product: P7 P6 P5 P4 P3 P2 P1 P0
1. CP: X0 * Y0 = A
2. CP: X1 * Y0 + X0 * Y1 = B
3. CP: X2 * Y0 + X0 * Y2 + X1 * Y1 = C
4. CP: X3 * Y0 + X0 * Y3 + X2 * Y1 + X1 * Y2 = D
5. CP: X3 * Y1 + X1 * Y3 + X2 * Y2 = E
6. CP: X3 * Y2 + X2 * Y3 = F
7. CP: X3 * Y3 = G
the product. For example, if in some intermediate step we get 110, then 0 acts
as the result bit (referred to as rn) and 11 as the carry (referred to as cn).
Note that cn may be a multi-bit number.
Thus we get the following expressions:
r0=a0b0;
c1r1=a1b0+a0b1;
c2r2=c1+a2b0+a1b1+a0b2;
c3r3=c2+a3b0+a2b1+a1b2+a0b3;
c4r4=c3+a3b1+a2b2+a1b3;
c5r5=c4+a3b2+a2b3;
c6r6=c5+a3b3
with c6r6r5r4r3r2r1r0 being the final product. This is the general
mathematical formula, applicable to all cases of multiplication.
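The expressions for r0 to c6r6 follow one pattern: each column sums the cross products whose bit indices add up to the column number, plus the carry from the previous column. A minimal Python sketch of that pattern is given below; the function name urdhva_multiply and the parameter n (operand width, 4 in the expressions of the text) are assumptions made for illustration.

```python
def urdhva_multiply(a, b, n=4):
    """Column-wise multiplication of two n-bit integers.

    Column k sums every a_i * b_j with i + j == k plus the incoming carry.
    The lowest bit of the column sum is the result bit r_k and the
    remaining bits form the carry, which may be a multi-bit number.
    """
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]

    result = 0
    carry = 0
    for k in range(2 * n - 1):              # columns 0 .. 2n-2
        column = carry + sum(
            a_bits[i] & b_bits[k - i]
            for i in range(n)
            if 0 <= k - i < n
        )
        result |= (column & 1) << k         # result bit r_k
        carry = column >> 1                 # carry out of column k

    # The last carry (c6 for n = 4) becomes the top bit of the product.
    return result | (carry << (2 * n - 1))


assert urdhva_multiply(13, 11) == 143
```

For instance, a column sum of 6 (binary 110) gives result bit 0 and carry 3 (binary 11), as in the example of the text.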
3.2.3 HARDWARE ARCHITECTURE
This hardware design is very similar to that of the famous array
multiplier where an array of adders is required to arrive at the final product.
Figure 3.9 Hardware Architecture
Hardware architecture of the 4x4 multiplier is shown in figure 3.9.
3.3 8 X 8 MULTIPLIER
The 8 X 8 multiplier is made by using four 4 X 4 multiplier blocks.
Here, the multiplicands are of bit size n = 8, whereas the result is of 16-bit size.
Both inputs, a and b, are broken into smaller chunks of size n/2 = 4, just as in
the case of the 4 X 4 multiplier block. These newly formed 4-bit chunks are given
as input to the 4 X 4 multiplier block, where they are again broken into even
smaller chunks of size n/4 = 2 and fed to the 2 X 2 multiplier block. The block
diagram of the 8X8 Vedic multiplier is shown in figure 3.10.
Figure 3.10 Block diagram of 8X8 Vedic multiplier
The results produced at the output of the 4 X 4 bit multiply blocks, each
of 8 bits, are sent to an addition tree, as shown in figure 3.11 below. Keep in
mind that each 4 X 4 multiply block works as illustrated in figure 3.6. In the
8 X 8 multiply block, the lower 4 bits of q0 are passed directly to the output
and the remaining bits are fed to the addition tree, as shown in figure 3.11.
A = A7A6A5A4 (X1) A3A2A1A0 (X0)
B = B7B6B5B4 (Y1) B3B2B1B0 (Y0)
  X1 X0
* Y1 Y0
-------
F E D C
STEP 1:CP = X0 * Y0 = C
STEP 2:CP = X1 * Y0 + X0 * Y1 = D
STEP 3:CP = X1 * Y1 = E
Where CP = Cross Product.
Each Multiplication operation is an embedded parallel 4x4 Multiply module.
3.3.2 EXAMPLE OF 8X8 VEDIC MULTIPLICATION
An example of 8X8 vedic multiplication of binary numbers is shown
in figure 3.12 below.
is generated from either of the adders, so the adder gives both sum and carry
out as zero and nothing is added to 0000. The final result is:
S0=1, S1=1, S2=1, S3=0, S4=1, S5=1, S6=1, S7=1, S8=0, S9=0, S10=0, S11=1,
S12=0, S13=0, S14=0, S15=0.
The final answer happens to be 0000100011110111.
3.4 16 X 16 BIT MULTIPLIER
The 16 X 16 multiplier is made by using four 8 X 8 multiplier blocks.
Here, the multiplicands are of bit size n = 16, whereas the result is of 32-bit
size. Both inputs, a and b, are broken into smaller chunks of size n/2 = 8. These
newly formed 8-bit chunks are given as input to the 8 X 8 multiplier block, where
they are again broken into even smaller chunks of size n/4 = 4 and fed to the
4 X 4 multiplier block, just as in the case of the 8 X 8 multiply block. Again,
the new chunks are divided in half to get chunks of size 2, which are fed to the
2 X 2 multiplier block. The results produced at the output of the 8 X 8 bit
multiply blocks, each of 16 bits, are sent to an addition tree, as shown in
figure 3.13.
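The repeated halving described above (16 to 8 to 4 to 2) maps naturally onto recursion. The following Python sketch illustrates the decomposition only; it says nothing about the timing or adder structure of the actual hardware. The names mul_2x2, vedic_multiply, q1, q2 and q3 are assumptions, and n is assumed to be a power of two no smaller than 2.

```python
def mul_2x2(a, b):
    """Base case: 2X2 multiply from single-bit AND terms."""
    a0, a1 = a & 1, a >> 1
    b0, b1 = b & 1, b >> 1
    s0 = a0 & b0
    t = (a1 & b0) + (a0 & b1)       # c1 s1
    u = (t >> 1) + (a1 & b1)        # c2 s2
    return (u << 2) | ((t & 1) << 1) | s0


def vedic_multiply(a, b, n):
    """Multiply two n-bit integers by splitting into four n/2-bit products."""
    if n == 2:
        return mul_2x2(a, b)
    h = n // 2
    mask = (1 << h) - 1
    a_lo, a_hi = a & mask, a >> h
    b_lo, b_hi = b & mask, b >> h

    q0 = vedic_multiply(a_lo, b_lo, h)
    q1 = vedic_multiply(a_hi, b_lo, h)
    q2 = vedic_multiply(a_lo, b_hi, h)
    q3 = vedic_multiply(a_hi, b_hi, h)

    low = q0 & mask                 # lower n/2 bits of q0 pass to the output
    upper = (q0 >> h) + q1 + q2 + (q3 << h)   # addition tree
    return (upper << h) | low


assert vedic_multiply(0xFFFF, 0xFFFF, 16) == 0xFFFF * 0xFFFF
```

The same function covers the 8 X 8 case (n = 8) and the larger multipliers, since each level only changes the chunk width.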
Here A0 to A7 and B0 to B7 are the least significant bits (LSB) and the
remaining bits, A8 to A15 and B8 to B15, are the most significant bits (MSB).
A = A15-A8 (X1) A7A6A5A4A3A2A1A0 (X0)
B = B15-B8 (Y1) B7B6B5B4B3B2B1B0 (Y0)
  X1 X0
* Y1 Y0
-------
F E D C
STEP 1:CP1=X0*Y0=C1C0
STEP 2:C=C0
STEP 3:CP2=X1*Y0+Y1*X0=D1D0
STEP 4:D=D0+C1
STEP 5:CP3=X1*Y1=E1E0
STEP 6:E=E0+D1
STEP 7:F=E1
Where CP=cross product
3.5 32 X 32 VEDIC MULTIPLIER
The 32 X 32 multiplier is made by using four 16 X 16 multiplier blocks,
as shown in figure 3.15. Here, the multiplicands are of bit size n = 32, whereas
the result is of 64-bit size. Both inputs, a and b, are broken into smaller
chunks of size n/2 = 16.
A = A31-A16 (X1) A15-A0 (X0)
B = B31-B16 (Y1) B15-B0 (Y0)
  X1 X0
* Y1 Y0
-------
F E D C
STEP 1:CP1=X0*Y0=C1C0
STEP 2:C=C0
STEP 3:CP2=X1*Y0+Y1*X0=D1D0
STEP 4:D=D0+C1
STEP 5:CP3=X1*Y1=E1E0
STEP 6:E=E0+D1
STEP 7:F=E1
Where CP=cross product
A = A63-A32 (X1) A31-A0 (X0)
B = B63-B32 (Y1) B31-B0 (Y0)
  X1 X0
* Y1 Y0
-------
F E D C
STEP 1:CP1=X0*Y0=C1C0
STEP 2:C=C0
STEP 3:CP2=X1*Y0+Y1*X0=D1D0
STEP 4:D=D0+C1
STEP 5:CP3=X1*Y1=E1E0
STEP 6:E=E0+D1
STEP 7:F=E1
Where CP=cross product
3.7 RIPPLE CARRY ADDER
The arrangement of the ripple carry adder shown in figure 3.17 helps
to reduce delay. A simple ripple carry adder is a digital circuit that produces the
arithmetic sum of two binary numbers. It can be constructed from a number of full
adders connected in cascade, with the carry output of each adder connected to the
carry input of the next full adder in the chain. Each full adder takes a cin, which
is the cout of the previous adder. This kind of adder is called a ripple carry
adder, since each carry bit ripples to the next full adder.
The first full adder may be replaced by a half adder. The layout of a ripple
carry adder is simple, which allows for fast design time.
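The cascade can be illustrated with a bit-level model. The sketch below is a hedged Python example, with the first stage implemented as a half adder as suggested above; the function names and the width parameter n are assumptions chosen for illustration and are not taken from the project's HDL code.

```python
def half_adder(a, b):
    """Return (sum, carry) for two input bits."""
    return a ^ b, a & b


def full_adder(a, b, cin):
    """Return (sum, cout) for two input bits and a carry-in."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a, b, n=4):
    """Add two n-bit integers; return (n-bit sum, carry out)."""
    # Stage 0 has no carry-in, so a half adder is sufficient.
    s, carry = half_adder(a & 1, b & 1)
    result = s
    # Each later stage takes the cout of the previous stage as its cin.
    for i in range(1, n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry


assert ripple_carry_add(5, 6) == (11, 0)
assert ripple_carry_add(9, 7) == (0, 1)    # 16 overflows four bits
```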
MAC unit and the other one, namely clk2, for the multiplier. The frequency of clk2
should be 4 times the frequency of the MAC unit clock for proper operation. A
divide-by-4 clock circuit may be used here in future; it takes clk2 as the parent
clock and produces clk as the daughter clock, which is 4 times slower than the
parent clock but has a 50% duty cycle. The faster clock clk2 is used for the
multiplier while the slower clock clk is used for the MAC unit. The data coming
as input to the MAC may vary with clock clk.
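One common way to obtain a divide-by-4 clock with a 50% duty cycle is to take the most significant bit of a 2-bit counter clocked by the parent clock. The Python sketch below simulates that idea; it is an assumption about how such a divider could be built, not a description of a circuit used in this project, and the function name divide_by_4 is hypothetical.

```python
def divide_by_4(num_clk2_edges):
    """Return the level of clk after each rising edge of clk2.

    A 2-bit counter increments on every rising edge of clk2 and clk is
    taken from the counter's most significant bit.
    """
    counter = 0
    clk_levels = []
    for _ in range(num_clk2_edges):
        counter = (counter + 1) & 0b11     # wrap around after 3
        clk_levels.append(counter >> 1)    # MSB of the counter drives clk
    return clk_levels


# clk repeats every four clk2 cycles and is high for two of them.
assert divide_by_4(8) == [0, 1, 1, 0, 0, 1, 1, 0]
```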
The signal clr, when applied, forces the contents of all the data
registers, that is Data a_reg, Data b-reg, multiply_reg and dataout_reg, to
zero. The clken signal is used to enable the MAC operation. Figure 3.19
shows the architecture of the MAC.
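The roles of clr and clken can be illustrated with a small behavioural model. This is a simplified Python sketch under several assumptions: the class name MacModel and the method tick are hypothetical, all registers update within a single call (the real design may be pipelined), clr is given priority over clken, and the register names are adapted from those in the text.

```python
class MacModel:
    """Behavioural model of a multiply-accumulate unit (illustrative only)."""

    def __init__(self):
        self.a_reg = 0
        self.b_reg = 0
        self.multiply_reg = 0
        self.dataout_reg = 0

    def tick(self, a, b, clr=False, clken=True):
        """Model one active edge of the slower clock clk."""
        if clr:
            # clr forces every data register to zero.
            self.a_reg = self.b_reg = 0
            self.multiply_reg = self.dataout_reg = 0
        elif clken:
            # With clken high: latch inputs, multiply, then accumulate.
            self.a_reg, self.b_reg = a, b
            self.multiply_reg = self.a_reg * self.b_reg
            self.dataout_reg += self.multiply_reg
        # With clken low and clr low, all registers hold their values.
        return self.dataout_reg


mac = MacModel()
assert mac.tick(2, 3) == 6
assert mac.tick(4, 5) == 26
assert mac.tick(7, 7, clken=False) == 26   # disabled: accumulator holds
assert mac.tick(0, 0, clr=True) == 0       # cleared
```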
In general, a MAC unit uses a conventional multiplier, which multiplies the
multiplier and multiplicand by generating partial products and adding them to
compute the final product. The key to the proposed MAC unit is to enhance the
performance of the MAC using the Vedic multiplier, and to compare the Vedic, Booth
and conventional multipliers in terms of the computation required to generate the
partial products and add them to get the final result of the multiplication.
3.9 ADVANTAGES
i) The Vedic multiplier is faster than the array multiplier and the Booth
multiplier. As the number of bits increases from 4X4 bits to 32X32 bits, the timing
delay of the Vedic multiplier is greatly reduced compared to other multipliers. The
Vedic multiplier has the greatest advantage over other multipliers in gate delays
and regularity of structure.
ii) Power dissipation is much lower than that of Booth multipliers.
3.10 APPLICATIONS
i) MAC.
ii) DSP applications (FIR, IIR filters).
complete HDL simulation environment that enables you to verify the functional
and timing models of your design, and your HDL source code. It is optimized for
use with all configurations of Xilinx ISE products.
ii)Xilinx 10.1 for synthesis:
Xilinx ISE is a software tool produced by Xilinx for synthesis and
analysis of HDL designs, which enables the developer to synthesize ("compile")
their designs, perform timing analysis, examine RTL diagrams, simulate a design's
reaction to different stimuli, and configure the target device with the programmer.
3.11.2 HARDWARE USED
FIELD PROGRAMMABLE GATE ARRAY (FPGA)
FPGAs are programmable semiconductor devices that are based
around a matrix of Configurable Logic Blocks (CLBs) connected through
programmable interconnects. As opposed to Application Specific Integrated
Circuits (ASICs), where the device is custom built for the particular design,
FPGAs can be programmed to the desired application or functionality
requirements. One-Time Programmable (OTP) FPGAs are also available. In
our project we are using the Spartan 3 FPGA kit.
SPARTAN 3
The Spartan 3 trainer xc3s400 pq208 is useful to realize and verify
digital designs. The user can write Verilog/VHDL code and verify the results by
implementing the design physically in the target device (FPGA). With the help of
this kit the user can simulate and observe various input and output conditions to
verify the implemented design.