A Novel Modified Distributed Arithmetic Scheme for LMS Low Power and Reduced Area FIR Adaptive Filter Implementation

International Journal of Electrical and Electronics Engineering Research and Development
(IJEEERD), ISSN 2248-9282 (Print), ISSN 2248-9290 (Online), Volume 4, Number 2, April-June (2014), pp. 1-11
PRJ Publication, http://www.prjpublication.com/IJEERD.asp

A NOVEL MODIFIED DISTRIBUTED ARITHMETIC SCHEME FOR
LMS LOW POWER AND REDUCED AREA FIR ADAPTIVE FILTER
IMPLEMENTATION

¹P. Sridhar, M.E., ²S. Vishnupriya, M.E., ³J. Arputhamary, M.E.

¹,²,³PG Scholars (Applied Electronics)

ABSTRACT

Adaptive filter structures retain their importance in the ongoing development of DSP
algorithms. The major operation in any DSP algorithm is multiplication, which can be replaced
by the bit-serial distributed arithmetic (DA) technique. A distributed arithmetic scheme
performs the multiplication through explicit ROM look-up tables. We propose a novel modified
distributed-arithmetic-based LMS (least mean square) filter structure with carry select adder
accumulation, which replaces the carry save adder accumulation of the previous distributed
arithmetic design. For N=16 and N=32, the proposed adaptive filter structure consumes 116 mW
and 158 mW of power and uses 2605 and 3340 LUTs, respectively. Compared with the previous
design [1], this is a 2.7% reduction in LUTs and a 26% reduction in power consumption.

Keywords: Adaptive filter, pipelined circuit optimization, distributed arithmetic (DA), least
mean square (LMS) algorithm.

I. INTRODUCTION

In digital signal processing applications the signals that arise are often non-stationary, so
a filter with fixed (static) coefficients is not adequate and the coefficients must change
dynamically. An adaptive filter is one such type of filter: its coefficients vary dynamically
according to the non-stationary input signal. The heart of any adaptive filter is the update
equation (1):

$$W_{n+1} = W_n + \Delta W_n \qquad (1)$$

where $\Delta W_n$ is a correction applied to the filter coefficients $W_n$ at time n to
form a new set of coefficients $W_{n+1}$ at time n+1. The design of an adaptive filter
involves deciding how this correction is to be formed. In an ordinary multiplier-based
adaptive FIR (Finite Impulse Response) filter several problems arise: a long critical path,
high adaptation delay, low throughput, and large area consumption and complexity. An emerging
alternative replaces the multiplier with distributed arithmetic, which performs the
multiplication in a cost-effective and area-efficient way. The contributions of our proposed
design are listed below.

i. Concurrent LUT update significantly improves throughput.
ii. Parallel operation of filtering and correction update further improves throughput.
iii. Distributed arithmetic with square-root carry select adder accumulation replaces the
previous distributed arithmetic with carry save accumulation, which limits the long critical
path of the filter operations; this also helps area efficiency and low-power operation.
iv. Using a fast word clock for the distributed arithmetic square-root carry select adder
accumulation and a slow bit clock for the remaining filter operations contributes to reduced
power consumption.
v. The proposed filter structure does not require an additional master structure for
generating the distributed arithmetic look-up table (LUT).
Section II reviews the pipelined LMS algorithm. Filter multiplication by DA is presented in
Section III, followed by the DA-based LMS adaptive filter implementation in Section IV.
Section V presents the synthesis and analysis, and Section VI contains the conclusions.

II. PIPELINED LEAST MEAN SQUARE ADAPTIVE ALGORITHM

During each control step the least mean square algorithm updates the weight vector
according to the following weight vector update equation:

$$W_{n+1} = W_n + \Delta W_n \qquad (2)$$

where $W_n$ is the filter coefficient vector at time n and

$$\Delta W_n = \mu \, e(n) \, \mathbf{x}(n) \qquad (3)$$

Here $\mu$ is the step size of the digital filter, e(n) is the feedback error and
$\mathbf{x}(n)$ is the non-stationary input sequence to the filter:

$$\mathbf{x}(n) = [x(n), x(n-1), \ldots, x(n-p)]^T$$

$$W_n = [W_n(0), W_n(1), \ldots, W_n(p)]^T$$

The update for the k-th coefficient is

$$W_{n+1}(k) = W_n(k) + \mu \, e(n) \, x(n-k) \qquad (4)$$

Fig.1. LMS adaptive filter structure

In the pipelined architecture, the feedback error e(n) of the k-th filtering cycle is not
available for updating the filter weights within the same cycle. The pipeline introduces a
delay, so the feedback error becomes available only after a certain number of cycles; this
delay is called the adaptation delay. The feedback error is therefore delayed by $N_p$
cycles, where $N_p$ is set by the pipeline stages p. The weight vector update equation of the
pipelined least mean square (LMS) adaptive filter is given below.

$$W_{n+1} = W_n + \mu \cdot e(n - N_p) \cdot \mathbf{x}(n - N_p) \qquad (5)$$

where $e(n - N_p)$ is the pipelined feedback error.
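The relation between the ordinary update (4) and the delayed update (5) can be illustrated in software. The following is a minimal floating-point sketch, not the hardware structure of this paper; the function name `delayed_lms`, the test signal and the 2-tap system `h = [0.5, -0.25]` are assumptions made for illustration. With `delay=0` it follows Eq. (4); with `delay` set to $N_p$ it follows Eq. (5).

```python
import math

def delayed_lms(x, d, num_taps, mu, delay=0):
    """LMS recursion; `delay` plays the role of N_p in Eq. (5) (0 gives Eq. (4))."""
    w = [0.0] * num_taps
    x_hist, e_hist = [], []          # past input vectors and past errors
    for n in range(len(x)):
        # Input vector [x(n), x(n-1), ...], zero-padded before the first sample.
        xn = [x[n - k] if n >= k else 0.0 for k in range(num_taps)]
        y = sum(wk * xk for wk, xk in zip(w, xn))
        e = d[n] - y                 # feedback error e(n) = d(n) - y(n)
        x_hist.append(xn)
        e_hist.append(e)
        m = n - delay                # index of the delayed error and input
        if m >= 0:
            w = [wk + mu * e_hist[m] * xk for wk, xk in zip(w, x_hist[m])]
    return w, e_hist

# Hypothetical usage: identify a 2-tap system from noise-free data.
x = [math.sin(0.9 * n) + math.cos(2.3 * n) for n in range(2000)]
d = [0.5 * x[n] - 0.25 * (x[n - 1] if n else 0.0) for n in range(2000)]
w_lms, _ = delayed_lms(x, d, num_taps=2, mu=0.05)
w_dlms, _ = delayed_lms(x, d, num_taps=2, mu=0.05, delay=2)
```

In this small example both variants settle on the same weights; the delayed version only reacts to each error a few cycles later.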

III. FILTER MULTIPLICATION OPERATION BY DA

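Distributed arithmetic is a bit-serial operation, which is more efficient in chip area than
bit-parallel operation. The major part of every filter structure is the multiply and
accumulate process, given by

$$y = \sum_{k=0}^{N-1} a_k \cdot x_k \qquad (6)$$

where

$$x_k = -x_{k0} + \sum_{l=1}^{w_d-1} x_{kl} \, 2^{-l} \qquad (7)$$

is a normalized $w_d$-bit binary number in signed two's complement representation.
Substituting (7) into (6) gives

$$y = \sum_{k=0}^{N-1} a_k \left[ -x_{k0} + \sum_{l=1}^{w_d-1} x_{kl} \, 2^{-l} \right] \qquad (8)$$

which can be written as

$$y = \sum_{l=1}^{w_d-1} 2^{-l} \left[ \sum_{k=0}^{N-1} a_k \, x_{kl} \right] - \sum_{k=0}^{N-1} a_k \, x_{k0} \qquad (9)$$

$$y = \sum_{l=1}^{w_d-1} 2^{-l} \, F(x_{0l}, \ldots, x_{(N-1)l}) - F(x_{00}, \ldots, x_{(N-1)0})$$

where

$$F(x_{0l}, x_{1l}, \ldots, x_{(N-1)l}) = \sum_{k=0}^{N-1} a_k \cdot x_{kl} \qquad (10)$$

For example, we take $w_d$ = 4.

[Figure: block diagram showing X(n), the LMS algorithm block, N_p D delay elements, W_n(z), Y(n) and d(n)]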

[Figure: the input bits X_i1, X_i2, X_i3, X_i4 leave the shift register LSB first and address
a 16-word DA look-up table holding 0, a_0, a_1, a_1+a_0, a_2, a_2+a_0, a_2+a_1, a_2+a_1+a_0,
a_3, a_3+a_0, a_3+a_1, a_3+a_1+a_0, a_3+a_2, a_3+a_2+a_0, a_3+a_2+a_1, a_3+a_2+a_1+a_0; an
ADD/SUB shift-accumulator produces the output Y.]

Fig.3. Block diagram for conventional distributed arithmetic

Figure 3 shows that, in each clock cycle, one of the 16 possible combinations of input bits
selects the corresponding DALUT content. The selected DALUT word is added to or subtracted
from the accumulator as the input bits are shifted serially out of the shift registers.
Initially the term $x_{N(w_d-1)} a_N$ is added to the cleared accumulator register. In the
next clock cycle the outputs of the shift register select $x_{N(w_d-2)} a_N$, which is summed
with the accumulator content. The entire inner product computation (multiply and accumulate)
thus takes $w_d - 1$ clock cycles; finally, the sign-bit term $x_{N0} a_N$ is added to or
subtracted from the accumulator content according to its sign.
The content of the L-th LUT location can be derived from the following equation:

$$C_L = \sum_{i=0}^{N-1} a_i \cdot k_i \qquad (11)$$

where $k_i$ is the (i+1)-th bit of the N-bit binary representation of the integer L, for
$0 \le L \le 2^N - 1$. The values $C_L$ can be precomputed and stored in the distributed
arithmetic look-up tables. We store only ($2^N - 1$) different values in the distributed
arithmetic look-up tables, using ($2^N - 1$) RAM locations.
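The look-up table of Eq. (11) and the bit-serial accumulation described above can be sketched in a few lines. This is an illustrative software model under stated assumptions, not the proposed hardware: it uses integer samples instead of the normalized fractions of Eq. (7), it stores the zero word at address 0 (so it holds $2^N$ entries rather than $2^N - 1$), and the names `build_da_lut` and `da_inner_product` are hypothetical.

```python
def build_da_lut(a):
    """C_L = sum of a[i] over every set bit i of the address L, as in Eq. (11)."""
    n = len(a)
    return [sum(a[i] for i in range(n) if (L >> i) & 1) for L in range(1 << n)]

def da_inner_product(a, x, wd):
    """Bit-serial inner product; x holds wd-bit two's-complement integers."""
    lut = build_da_lut(a)
    acc = 0
    for bit in range(wd):                      # least significant bit first
        addr = 0
        for i, xi in enumerate(x):
            # Python's >> on negative ints is arithmetic, so this yields
            # the two's-complement bit of xi at position `bit`.
            addr |= ((xi >> bit) & 1) << i
        term = lut[addr] << bit                # weight the LUT word by 2**bit
        if bit == wd - 1:
            acc -= term                        # the sign-bit word is subtracted
        else:
            acc += term
    return acc

# Hypothetical 4-tap example with 4-bit samples.
coeffs = [3, -1, 4, 2]
samples = [5, -3, 7, -8]
result = da_inner_product(coeffs, samples, wd=4)
direct = sum(c * s for c, s in zip(coeffs, samples))
```

No multiplier is used inside the loop: each cycle costs one table read, one shift and one add or subtract, which is the property the DA scheme relies on.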
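The 4-bit square-root variable-size carry select adder scheme used for shift accumulation is
shown in Fig. 4.

[Figure: the low-order bits x(1:0) and y(1:0) are added by a 2-bit RCA with carry-in C_in; the
high-order bits x(3:2) and y(3:2) are added by two 2-bit RCAs with carry-in 0 and 1, and
multiplexers (MUX) select the resulting Sum and C_out.]

Fig.4. 4-bit square-root variable-size carry select adder scheme for shift accumulation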
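The carry select principle behind Fig. 4 can be modelled behaviourally. The sketch below is an assumption-laden illustration: it splits the 4-bit addition into two 2-bit ripple-carry stages, as the figure labels suggest, and the function names are hypothetical. The high-order stage is computed for both possible carry-in values, and the real carry from the low-order stage selects one result.

```python
def ripple_carry_add(a, b, cin, width):
    """Add two `width`-bit numbers one bit at a time; return (sum, carry_out)."""
    s, carry = 0, cin
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return s, carry

def carry_select_add4(x, y, cin=0):
    """4-bit carry select adder built from 2-bit ripple-carry adders (RCA)."""
    lo_sum, lo_carry = ripple_carry_add(x & 3, y & 3, cin, 2)
    x_hi, y_hi = (x >> 2) & 3, (y >> 2) & 3
    hi_if_0 = ripple_carry_add(x_hi, y_hi, 0, 2)   # speculative result, carry-in 0
    hi_if_1 = ripple_carry_add(x_hi, y_hi, 1, 2)   # speculative result, carry-in 1
    hi_sum, cout = hi_if_1 if lo_carry else hi_if_0  # the MUX selection
    return (hi_sum << 2) | lo_sum, cout
```

Because both high-order candidates are ready before the low-order carry arrives, the critical path is shortened to one small ripple stage plus a multiplexer.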

IV. LMS ADAPTIVE FILTER IMPLEMENTATION BASED ON DA

The size of the distributed arithmetic look-up table (DALUT) depends on the filter length
and grows with it. A bottom-up approach avoids a large DALUT: a larger filter section is
built by combining small filter structures. The inner product computation of this bottom-up
approach is given below.


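$$Y = \sum_{k=0}^{p-1} a_k \cdot x_k + \sum_{k=p}^{2p-1} a_k \cdot x_k + \cdots + \sum_{k=N-p}^{N-1} a_k \cdot x_k \qquad (12)$$

The filter decomposition factor is (N/p), where N is the required large-order filter length
and p is the small-order filter length.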
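The bottom-up decomposition can be checked numerically. The sketch below is illustrative only and uses hypothetical helper names: `block_inner_product` splits an N-tap inner product into N/p partial sums of p taps each, and `lut_words` counts table words under the assumption that each p-tap block uses a full $2^p$-word table.

```python
def block_inner_product(a, x, p):
    """Return (total, partial sums) for an N-tap inner product split into p-tap blocks."""
    assert len(a) == len(x) and len(a) % p == 0
    partials = [
        sum(a[k] * x[k] for k in range(start, start + p))
        for start in range(0, len(a), p)
    ]
    return sum(partials), partials

def lut_words(n, p):
    """Table words needed when an n-tap filter is built from p-tap DA blocks."""
    return (n // p) * (2 ** p)

# Hypothetical 16-tap example decomposed into four 4-tap blocks.
a16 = list(range(1, 17))
x16 = [(k % 5) - 2 for k in range(16)]
total, partials = block_inner_product(a16, x16, p=4)
```

Under this assumption a single 16-tap table would need 65536 words, while four 4-tap tables need 64 in total, which is why the decomposition keeps the DALUT small.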

IV. (A) SMALL ORDER LMS ADAPTIVE FILTER IMPLEMENTATION
The LMS adaptive filter of length N=4 is shown in Figure 5. It consists of two major units:
a 4-point inner product unit, and a weight update and error computation unit. The DALUT
consists of 15 registers that hold the temporary inner product values $F_k$ for
$1 \le k \le 15$. A 16:1 multiplexer looks up and picks one value from the 15 register
contents. The multiplexer has four select inputs $x_{4i}, x_{3i}, x_{2i}, x_{1i}$
($0 \le i \le N-1$), which are fed into the MUX least significant bit first.

[Figure: block diagram with the following labelled elements: delay elements (D) and the
samples X(n+1), X(n-2), X(n-3), X(n-4), X(n-5); a DA table holding the 15 sums of a(n),
a(n-1), a(n-2) and a(n-3), from a(n) up to a(n)+a(n-1)+a(n-2)+a(n-3); a 16:1 MUX addressed by
V = {x_i1, x_i2, x_i3, x_i4}; carry select adder accumulation with sign control, word lengths
L and L+2 and a >>1 shift; the desired signal d(n); a >>2 shift; the sign-magnitude separator
and control word generator producing mag([C(n-2)]) and Sig([C(n-2)]); and the weight
increment block.]

Fig.5. Proposed LMS adaptive FIR filter structure for length N=4

[Figure: four barrel shifters BS-0 to BS-3, each with an add/subtract (+/-) cell and a delay
element (D), labelled with the inputs X(n-2), X(n-3), X(n-4) and X(n-5), the control word v,
Sig([C(n-2)]), L-bit outputs and a word-parallel bit-serial converter.]

Fig.5.a. Weight Increment Block for N=4

The square-root carry select adder takes its input from the MUX output. In N bit cycles it
computes the 4-point inner product by accumulating all the temporary product values, and it
generates a sum and a Cout of length N+2. The filter output is generated by adding the
shifted version of the sum and Cout with Cin=1. The feedback correction is obtained as the
difference between the desired output d(n) and the filter output y.
The sign-magnitude separator takes the feedback correction, shifted by two bit positions, as
its input; through the control word generator this is converted to the control input of the
barrel shifters. The technique for generating the control input is given in Figure 6, where S
is a 7-bit word and M is taken as 1/N, the correction conversion factor.
The weight update unit contains four barrel shifters and four adder/subtractors. Based on the
MSB position in the magnitude of the feedback correction, the input values $x_i$
(i = 0, 1, 2, ..., N-1) are shifted. The barrel shifter output is the required weight update,
which increments or decrements the current weight according to the sign of the feedback
correction.

S = M([C(n-2)])
If S_6 = 1 then V = 0
If S_5 = 1 then V = 1
If S_4 = 1 then V = 2
If S_3 = 1 then V = 3
If S_2 = 1 then V = 4
If S_1 = 1 then V = 5
If S_0 = 1 then V = 6
Else V = 7

Fig.6. Technique for control word V generation
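The control word rules (S_6 = 1 gives V = 0, down to V = 7 when no bit is set) amount to a priority encoder on the leading one of the 7-bit magnitude, and the barrel shifters then scale each input sample by a power of two. The following sketch models that behaviour under assumptions: the function names are hypothetical, samples are plain integers, and `sign = 1` is taken to mean a negative feedback correction (weights are decremented).

```python
def control_word(s):
    """Priority-encode the 7-bit magnitude S (bits S6..S0) into the control word V."""
    for v in range(7):
        if (s >> (6 - v)) & 1:     # test S6 first, then S5, ..., S0
            return v
    return 7                       # no bit set

def weight_increment(weights, samples, magnitude, sign):
    """Shift each sample right by V, then add or subtract it from its weight."""
    v = control_word(magnitude)
    return [
        w - (x >> v) if sign else w + (x >> v)
        for w, x in zip(weights, samples)
    ]

# Hypothetical usage with four taps.
new_up = weight_increment([10, 20, 30, 40], [64, -64, 32, 16], 0b0010000, 0)
new_down = weight_increment([10, 20, 30, 40], [64, -64, 32, 16], 0b0010000, 1)
```

A larger error magnitude gives a smaller V and therefore a larger update, so the shift approximates the product of error and input without a multiplier.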

IV. (B) LARGE ORDER LMS ADAPTIVE FILTER IMPLEMENTATION
The large-order LMS adaptive filter is implemented using the bottom-up approach. The
mathematical equation of the bottom-up approach for a general large-order filter is given
below:

$$Y = \sum_{k=0}^{p-1} a_k \cdot x_k + \sum_{k=p}^{2p-1} a_k \cdot x_k + \cdots + \sum_{k=N-p}^{N-1} a_k \cdot x_k \qquad (13)$$

Each multiply and accumulate term in the above equation represents a small-order filter
structure used in the bottom-up approach.
We design a filter of length N=32, for which we take the small-order filter structure p=4:

$$Y = \sum_{k=0}^{3} a_k \cdot x_k + \sum_{k=4}^{7} a_k \cdot x_k + \sum_{k=8}^{11} a_k \cdot x_k + \sum_{k=12}^{15} a_k \cdot x_k + \sum_{k=16}^{19} a_k \cdot x_k + \sum_{k=20}^{23} a_k \cdot x_k + \sum_{k=24}^{27} a_k \cdot x_k + \sum_{k=28}^{31} a_k \cdot x_k \qquad (14)$$

Equation 14 shows that implementing the LMS adaptive filter structure for N=32 requires
eight LMS adaptive filter structures of N=4. Each N=4 LMS adaptive filter structure generates
its filter output as a sum and a carry of word length (L+2). The outputs of adjacent LMS
adaptive filters are summed by two separate adder trees, which generate a sum and a carry of
length (L+3); the outputs of another two filter structures are summed in the same way. These
two intermediate (L+3)-bit sums and carries are added by the Boolean adder tree to generate
the output of the larger-order filter (N=16) as a sum and a carry of length (L+4). The
(L+4)-bit carry is summed with the sum shifted right by one bit to produce the filter output
y(n-1), which is then subtracted from the desired signal d(n) to find the feedback correction
of the weights. The four LSBs of the feedback correction are truncated to get the word length
w=16; from this, the sign and magnitude are separated to update the filter coefficients.
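The word-length growth from (L+2) to (L+3) and (L+4) follows from adding the block outputs pairwise. A minimal sketch of such a pairwise adder tree is given below as an illustration; it collapses sum and carry into a single integer per block, which is a simplification of the separate sum and carry words described above, and the name `adder_tree` is hypothetical.

```python
def adder_tree(values):
    """Add block outputs pairwise; return (total, number of tree levels)."""
    level = list(values)
    n = len(level)
    assert n > 0 and (n & (n - 1)) == 0, "this sketch expects a power-of-two count"
    stages = 0
    while len(level) > 1:
        # One tree level: each adder combines two neighbouring results,
        # so the word length grows by one bit per level.
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        stages += 1
    return level[0], stages

# Four N=4 blocks need two levels (L+2 -> L+4); eight blocks need three.
total4, stages4 = adder_tree([1, 2, 3, 4])
total8, stages8 = adder_tree(list(range(8)))
```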

V. SYNTHESIS AND ANALYSIS

In this section we analyze the hardware utilization and power consumption of the LMS filter
structure. The LMS adaptive filter structure is coded in VHDL and synthesized with Xilinx
tools for a Spartan 2E FPGA; the target device is XC2S600E-7FG456. Input samples and weights
with an 8-bit word length are used for the filter operations. To analyze the filter design we
consider the following factors: number of LUTs, gate count and power consumption. Table 1
compares these parameters for the previous design and the proposed design. The proposed N=4,
N=16 and N=32 LMS adaptive filter structures use 785, 2605 and 3340 LUTs and 9236, 30972 and
40310 gates, and consume 81.35 mW, 116.36 mW and 158.05 mW, respectively, with a 100 MHz
clock frequency used for the carry select adder.

[Figure: four LMS adaptive filter (N=4) blocks with control words V0 to V3 and inputs x(n-3),
x(n-7), x(n-11) and x(n-15) produce the outputs S0(n)/C0(n) to S3(n)/C3(n); adder stages of
width L+3 and L+4, a >> shift, subtraction from d(n), a >>4 shift, registers (R), and the
sign-magnitude separator and control word generator producing Mag(m(c(n-2))) and
Sign(m(c(n-2))).]

Fig.7. LMS adaptive filter based on proposed structure of length N=16 and P=4

Table.1. Parameter comparison

Parameters                Previous design               Proposed design
                          N=4      N=16     N=32        N=4      N=16     N=32
Number of LUTs            837      2697     3432        785      2605     3340
Gate count                9431     31184    40530       9236     30972    40310
Power consumption (mW)    96.09    169.62   212.91      81.35    116.36   158.05

Compared with the DA structure with carry save adder, the proposed design reduces the LUT
count by 6.3%, 3.5% and 2.7% for filter lengths N=4, N=16 and N=32, respectively. The power
consumption is reduced by 15.3%, 31.39% and 25.8% for the N=4, N=16 and N=32 LMS adaptive
filter structures. In the proposed design the number of adders, shifters and registers
required equals the filter length N.

[Figure: column chart of "LUT for N=16" for the previous design and the proposed design,
vertical axis from 0 to 3000.]

Fig.8. LUT comparison for filter length N=16
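The relative savings can be recomputed from the Table 1 entries. The short script below is only a convenience sketch with hypothetical names; it expresses each reduction as a percentage of the previous design's value, and results computed this way may differ slightly from the quoted figures depending on rounding or on the baseline used.

```python
# Table 1 values, ordered as (N=4, N=16, N=32).
TABLE_1 = {
    "Number of LUTs": {"previous": (837, 2697, 3432), "proposed": (785, 2605, 3340)},
    "Gate count": {"previous": (9431, 31184, 40530), "proposed": (9236, 30972, 40310)},
    "Power consumption (mW)": {
        "previous": (96.09, 169.62, 212.91),
        "proposed": (81.35, 116.36, 158.05),
    },
}

def percent_reduction(previous, proposed):
    """Saving of the proposed design, as a percentage of the previous design."""
    return 100.0 * (previous - proposed) / previous

def reductions(table):
    """Map each parameter to its three percentage reductions (N=4, N=16, N=32)."""
    return {
        name: [percent_reduction(a, b) for a, b in zip(row["previous"], row["proposed"])]
        for name, row in table.items()
    }

summary = reductions(TABLE_1)
for name, values in summary.items():
    print(name, [round(v, 2) for v in values])
```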

In our design one clock cycle is needed for the inner product calculation, and only one clock
cycle is needed for the feedback correction. The adaptation delay is determined by the clock
cycles required by these two units, so the adaptation delay of this design is two cycles. This
delay does not produce any noticeable degradation in convergence performance. Figure 8 is a
column chart comparing the LUTs of the proposed design with the previous design for N=16; it
shows that the proposed design uses fewer LUTs.

Fig.9. LUT comparison for filter length N=32

Figure 9 is a column chart comparing the LUTs of the proposed design with the previous design
for N=32; it shows that the proposed design uses fewer LUTs.

Fig.10. Power utilization comparison for filter length N=16 and N=32

Figure 10 is a column chart comparing the power utilization of the previous design with the
proposed design for both N=16 and N=32 at a 100 MHz clock frequency.

VI. CONCLUSION

The proposed LMS-based adaptive FIR filter structure is implemented in VHDL on a Spartan 2E
FPGA with target device XC2S600E-7FG456. For N=16 and N=32 the proposed filter structure
consumes 116 mW and 158 mW of power and uses 2605 and 3340 LUTs, respectively, with a 100 MHz
fast clock for the DA-based carry select adder accumulation. Compared with the previous
design, the LUT count is reduced by 2.7% and the power consumption is reduced by 26%. The
offset binary code (OBC) based DA structure is also applicable to our design.


REFERENCES

[1] S. Haykin and B. Widrow, Least-Mean-Square Adaptive Filters. Hoboken, NJ, USA: Wiley,
2003.
[2] S. A. White, "Applications of the distributed arithmetic to digital signal processing: A
tutorial review," IEEE ASSP Mag., vol. 6, no. 3, pp. 4-19, Jul. 1989.
[3] D. J. Allred, H. Yoo, V. Krishnan, W. Huang, and D. V. Anderson, "LMS adaptive filters
using distributed arithmetic for high throughput," IEEE Trans. Circuits Syst. I, Reg. Papers,
vol. 52, no. 7, pp. 1327-1337, Jul. 2005.
[4] R. Guo and L. S. DeBrunner, "Two high-performance adaptive filter implementation schemes
using distributed arithmetic," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 58, no. 9,
pp. 600-604, Sep. 2011.
[5] R. Guo and L. S. DeBrunner, "A novel adaptive filter implementation scheme using
distributed arithmetic," in Proc. Asilomar Conf. Signals, Syst., Comput., Nov. 2011,
pp. 160-164.
[6] P. K. Meher and S. Y. Park, "High-throughput pipelined realization of adaptive FIR filter
based on distributed arithmetic," in VLSI Symp. Tech. Dig., Oct. 2011, pp. 428-433.
[7] M. D. Meyer and P. Agrawal, "A modular pipelined implementation of a delayed LMS
transversal adaptive filter," in Proc. IEEE Int. Symp. Circuits Syst., May 1990,
pp. 1943-1946.
[8] S. Y. Park and P. K. Meher, "Low-power, high-throughput, and low-area adaptive FIR filter
based on distributed arithmetic," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 60, no. 6,
Jun. 2013.
[9] K. Allipeera and S. Ahmed Basha, "An efficient 64-bit carry select adder with less delay
and reduced area application," IJERA, ISSN: 2248-9622, vol. 2, issue 5, Sep.-Oct. 2012,
pp. 550-554.
[10] G. Prasannakumar and K. Indirapriyadarsini, "Low complexity algorithm for updating the
coefficients of adaptive filter," International Journal of Electronics and Communication
Engineering & Technology (IJECET), vol. 4, issue 3, 2013, pp. 63-69, ISSN Print: 0976-6464,
ISSN Online: 0976-6472.
