PRASANTH KRISHNA K
Dept. of Electronics and Communication, Amrita University, Amrita School of Engineering
Kollam, Kerala - 690 525, India
ABSTRACT: IIR filters of order greater than four are generally implemented as a cascade of
second-order biquads due to concerns over stability. As the order of the filter increases, the number
of biquads that have to be cascaded also increases, which limits the maximum speed achievable by
the filter. In this paper I present a three-parallel implementation of a digital biquad, which can be
used as a building block for higher-order IIR filters. Because the proposed biquad block is
three-parallel, an IIR filter built from it can process 3 inputs at a time to give 3 outputs, effectively
reducing the time needed to produce one output to one third. The same structure can be used to
reduce power in applications with modest performance requirements. The simple biquad structure
and the three-parallel biquad structure were modelled in MATLAB and implemented in Verilog. The
results of the Verilog implementation were verified through FFT and by vector matching against the
results obtained from the MATLAB model.
KEYWORDS: Biquad, IIR Filter, FIR Filter, Quantization, Fixed Point Representation, Booth
Recoding, Wallace Tree, Convergent Rounding, Unfolding, FFT.
INTRODUCTION
Compared to FIR filters, IIR filters are less complex and take much less area, since they use fewer
multipliers and adders to achieve the same specification. However, since IIR filters have feedback,
they are nonlinear in phase and can be unstable. Applications that do not require linear phase can
use IIR filters in place of FIR filters, which often need a higher order to meet the specification and
therefore consume more power and area. Practical IIR filters are of order greater than four and are
implemented as cascades of biquads due to concerns over stability. Biquads, which are second-order
recursive filters containing two poles and two zeros, can be implemented in various forms such as
Direct Form I, Direct Form II and the Transpose forms, Francis (2009) and Karen et al. (2013). Of
these structures, Transpose Direct Form II is canonical, as it uses the minimum number of delay
elements, multipliers and adders. Apart from the above-mentioned structures there are other
structures that are optimized for performance or area. In this paper the biquad is implemented in
the Transpose Direct Form II structure, since it is hardware efficient and can be modularized for
reuse. The simple biquad designed in the Direct Form II structure forms the heart of the 3-parallel
biquad, which can be used as a building block of an IIR filter. The design steps involved are
algorithmic in nature and involve a number of iterations. The system design results thus derived are
optimal in terms of hardware requirement and meet the maximum error requirement for the design
under consideration.
An IIR filter biquad generally comprises adders and multipliers along with delay elements. As
discussed earlier, IIR filters have a feedback path, so ensuring stability is an important design
consideration. An IIR biquad can be represented mathematically as in (1), where b0 to b2 are the
feed-forward coefficients and a1 and a2 are the feedback coefficients. All the coefficients are
normalized such that the coefficient a0 is unity. x(n) and y(n) are the input and the output of the
biquad.
y(n) = b0 x(n) + b1 x(n-1) + b2 x(n-2) - a1 y(n-1) - a2 y(n-2)    (1)
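As a minimal sketch of how such a biquad can be evaluated in software, the Python functions below
compute the recursion in two ways: in transposed direct form II with two state variables, and by
evaluating the difference equation literally. The sketch assumes the usual MATLAB sign convention,
in which the feedback terms a1 and a2 are subtracted. The function names and the optional gain
argument are illustrative and are not taken from the paper's MATLAB or Verilog code.

```python
def biquad_tdf2(x, b, a, gain=1.0):
    """Filter the sequence x with a transposed direct form II biquad.

    b = (b0, b1, b2) are the feed-forward coefficients and a = (a1, a2)
    the feedback coefficients, with a0 normalized to 1 (assumed convention).
    """
    b0, b1, b2 = b
    a1, a2 = a
    s1 = s2 = 0.0              # contents of the two delay elements
    y = []
    for xn in x:
        v = gain * xn          # input scaling
        yn = b0 * v + s1
        s1 = b1 * v - a1 * yn + s2
        s2 = b2 * v - a2 * yn
        y.append(yn)
    return y


def biquad_direct(x, b, a, gain=1.0):
    """Reference model: evaluate the difference equation term by term."""
    b0, b1, b2 = b
    a1, a2 = a
    x1 = x2 = y1 = y2 = 0.0    # x(n-1), x(n-2), y(n-1), y(n-2)
    y = []
    for xn in x:
        v = gain * xn
        yn = b0 * v + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, v
        y2, y1 = y1, yn
        y.append(yn)
    return y
```

Both functions should produce the same output for the same input; the transposed form simply needs
two stored values instead of four.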
To design and verify the biquad structures, a standard set of filter specifications was used. The
filter specifications are tabulated in Table 1. The coefficients of the filter (b0, b1, b2, a1, a2) were
generated using the Filter Design and Analysis Tool (FDA Tool) provided by MATLAB. The FDA
Tool also gives a gain value by which the input signal is multiplied before it is processed. The
coefficients, tabulated in Table 2, were computed for a low-pass Butterworth filter realized in the
Transpose Direct Form II structure. Apart from the specifications mentioned in Table 1, other filter
design specifications can also be added to generate the coefficients. The basic biquad structure is
shown in Fig. 1. The Direct Form II implementation requires a total of six multipliers and three
adders along with two delay elements. The gain can be adjusted so that the coefficients b0 to b2 are
powers of two. Multiplication by a power of two can be implemented efficiently as a shift in the
digital domain, thereby reducing the number of multipliers required for the implementation.
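To illustrate the shift-based multiplication, the sketch below computes the feed-forward sum for
coefficients (b0, b1, b2) = (1, 2, 1) on integer (fixed-point) samples using one left shift and two
additions. The function name is hypothetical, and the operands are assumed to be Python integers
standing in for fixed-point words.

```python
def feedforward_shift(v0, v1, v2):
    """Return 1*v0 + 2*v1 + 1*v2 without any multiplier.

    v0, v1, v2 are integer samples; multiplying by 2 is a left shift by one bit.
    """
    return v0 + (v1 << 1) + v2
```

In hardware the shift is just wiring, so with these coefficients only the gain and the two feedback
coefficients would still need real multipliers.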
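Figure 1. Biquad Structure Direct Form II. (Signal-flow diagram: input X(n) scaled by Gain,
feed-forward taps b0, b1, b2, feedback taps a1, a2, two Z^-1 delay elements, three adders,
output Y(n).)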
Table 2. Biquad Coefficients - Double Precision.
Coefficient Value
b0 1
b1 2
b2 1
a0 1
a1 -1.9555782403150355
a2 0.95654367651120342
Gain 0.00024135904904198073
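As a quick sanity check on the tabulated values, the sketch below computes the DC gain and the pole
locations implied by Table 2. It assumes the transfer function
H(z) = Gain (b0 + b1 z^-1 + b2 z^-2) / (a0 + a1 z^-1 + a2 z^-2); the helper names are illustrative.

```python
import cmath

B = (1.0, 2.0, 1.0)                                       # b0, b1, b2
A = (1.0, -1.9555782403150355, 0.95654367651120342)       # a0, a1, a2
GAIN = 0.00024135904904198073


def dc_gain(b, a, gain):
    """Evaluate H(z) at z = 1, i.e. the response to a constant input."""
    return gain * sum(b) / sum(a)


def poles(a):
    """Roots of a0*z^2 + a1*z + a2, the poles of the biquad."""
    a0, a1, a2 = a
    disc = cmath.sqrt(a1 * a1 - 4.0 * a0 * a2)
    return ((-a1 + disc) / (2.0 * a0), (-a1 - disc) / (2.0 * a0))


print("DC gain:", dc_gain(B, A, GAIN))
print("pole magnitudes:", [abs(p) for p in poles(A)])
```

Under this assumption the DC gain comes out at unity and both poles lie inside the unit circle, which
is what one expects of a stable low-pass Butterworth section.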
(2)
The algorithm used for finding the width is shown in Fig. 2. A sine wave in the pass band
(50 kHz) and a signal in the transition band (150 kHz) were used for calculating the required
precision. After passing the two inputs to the algorithm, the maximum value of N was taken
as the length required for representing the input. The total input width came to 28 bits:
2 bits to represent the integer part including the sign, and 26 bits for the fractional part.
Since the input is 28 bits, the output word length is also 28 bits. The same design algorithm
can be used to find the input and output word lengths for a different error requirement. As
the permitted error decreases, the word length increases, resulting in a higher hardware
requirement.
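The flowchart of Fig. 2 is not reproduced here, so the following is only a sketch of one plausible
form of such a search: increase the number of fractional bits until the worst-case quantization
error over the test samples falls below a bound. The bound max_err, the saturation behaviour and
the function names are assumptions made for illustration, not the paper's actual criterion.

```python
def quantize(value, frac_bits, int_bits=2):
    """Round value to a signed fixed-point grid and saturate.

    int_bits counts the integer bits including the sign, as in the
    2 + 26 split reported in the text.
    """
    scale = 1 << frac_bits
    lo = -(1 << (int_bits - 1 + frac_bits))        # most negative code
    hi = (1 << (int_bits - 1 + frac_bits)) - 1     # most positive code
    code = max(lo, min(hi, round(value * scale)))
    return code / scale


def min_frac_bits(samples, max_err, int_bits=2, limit=48):
    """Smallest number of fractional bits whose worst-case error is <= max_err."""
    for n in range(1, limit + 1):
        err = max(abs(s - quantize(s, n, int_bits)) for s in samples)
        if err <= max_err:
            return n
    raise ValueError("no word length up to the limit meets max_err")
```

For example, min_frac_bits([0.1, 0.33, -0.7], 2**-9) searches upward from one fractional bit and
stops at the first width that satisfies the bound for all three samples. In the flow described
above, the search would be run once per test sinusoid and the larger result kept.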
Figure 3. Filter coefficient quantization algorithm
The algorithms discussed above were implemented in MATLAB. Test sinusoids of different
frequencies, one in the pass band and the other in the transition band, were generated and given as
input to the algorithms. The results of the system design are consolidated in Table 3. The bit width
represents the total number of bits required for representing each quantity. These digital
hardware specifications serve as inputs to the digital implementation of the biquad in Verilog.
Using the system design results, the IIR filter was modelled in MATLAB. The structure of the
filter shown in Fig. 1, which is the Transpose Direct Form II implementation, was modelled in
MATLAB, as the topology is canonical in terms of the delay elements required for the design. The
multiplier used in the design utilizes the Modified Booth Recoding algorithm to reduce the number of
partial products, along with the Wallace Tree approach for adding the partial products, to reduce
the computation time. To limit the data path word length, unbiased rounding was also implemented
in the design. Special care was taken in designing the multiplier to obtain better performance and
efficiency, as multipliers take up a large share of the chip area and add delay to the critical path.
Multiplier Implementation
A multiplier computes the result by computing a set of partial products and then summing them.
Since the multiplier deals with signed numbers, the partial products have to be appropriately sign
extended to get the correct result. The last partial product also has to be converted appropriately
when multiplying by the sign bit. An excessive number of partial products, together with the sign
extension required, can result in high hardware requirements, which directly increases area, power
and delay. To avoid this, the Modified Booth algorithm in Neil & David (2010) was implemented,
which reduces the number of partial products by approximately half.
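The halving of the partial products can be seen in a small behavioural model. The sketch below
recodes a two's complement multiplier into radix-4 Booth digits in {-2, -1, 0, 1, 2}, one digit per
pair of bits, and forms the product from the resulting partial products. It is a generic textbook
recoding written for illustration, with hypothetical function names, and is not the paper's Verilog.

```python
def booth_radix4_digits(multiplier, width):
    """Recode a signed width-bit integer into radix-4 Booth digits, LSB first."""
    if width % 2:
        width += 1                                   # sign-extend to an even width
    bits = multiplier & ((1 << width) - 1)           # two's complement bit pattern
    digits = []
    prev = 0                                         # implicit bit right of the LSB
    for i in range(0, width, 2):
        low = (bits >> i) & 1
        high = (bits >> (i + 1)) & 1
        digits.append(low + prev - 2 * high)         # value in {-2, -1, 0, 1, 2}
        prev = high
    return digits


def booth_multiply(a, b, width):
    """Multiply a by the width-bit signed value b using the Booth digits of b."""
    return sum((d * a) << (2 * i)
               for i, d in enumerate(booth_radix4_digits(b, width)))
```

A 28-bit multiplier recoded this way yields 14 digits, so 14 partial products are generated instead
of 28. Each digit only requires selecting 0, the multiplicand, or twice the multiplicand, with an
optional negation.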
To further speed up the addition of the generated partial products, the Wallace Tree approach was
implemented. The partial products are added together in stages, thereby reducing the total delay.
The stage-wise addition process involves grouping the partial products into threes. If there are
r rows of partial products, then 3⌊r/3⌋ rows are grouped and the remaining rows are passed to
the next stage. The grouped rows are summed using a full adder if there are three bits in a
column or a half adder if there are only two bits. The sum and carry thus generated are passed on
to the next stage. This is repeated until only two rows are left. The method can be generalized
and extended to any number of partial products. The optimum height of the partial product matrix
at stage n+1 is given by ⌈2r/3⌉, where r is the height of the partial product matrix at stage n.
Sign extension of the partial products was avoided by using the algorithm described in
Israel (2005).
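The stage-by-stage row reduction can be tabulated with a few lines of Python. The sketch below
assumes the grouping rule just described: every complete group of three rows is replaced by two
rows (sum and carry), and leftover rows pass through unchanged. The function names are
illustrative.

```python
def wallace_next_height(r):
    """Rows left after one Wallace stage applied to r rows."""
    return 2 * (r // 3) + r % 3


def wallace_heights(r):
    """Sequence of matrix heights from r rows down to the final two rows."""
    heights = [r]
    while r > 2:
        r = wallace_next_height(r)
        heights.append(r)
    return heights


print(wallace_heights(14))
```

For 14 partial products, the count that radix-4 Booth recoding of a 28-bit operand would give, the
heights run 14, 10, 7, 5, 4, 3, 2, so six reduction stages precede the final two-row addition.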
Rounding of Multiplier Results
Most DSP processors have a rounding algorithm in place to reduce the data path width after a
multiplication operation. Implementing this has a large advantage in terms of the area required for
the design. Since some data is lost in the rounding process, it adds noise to the system, so care must
be taken that the error does not have a substantial impact on the filter result. Rounding can be
implemented in two ways: biased rounding, and unbiased or convergent rounding. Freescale
Semiconductors (2005) describes an unbiased rounding implementation that reduces the average
noise imparted to the result. The rounding algorithm is shown in Fig. 4.
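The difference between the two rounding modes shows up only on exact ties. The sketch below models
both on integers, where drop_bits is the number of least significant bits discarded after the
multiplication. It follows the generic round-half-to-even rule rather than the exact flow of Fig. 4,
which is not reproduced here, and the function names are hypothetical.

```python
def round_biased(value, drop_bits):
    """Round half up: add half an LSB of the result, then truncate."""
    return (value + (1 << (drop_bits - 1))) >> drop_bits


def round_convergent(value, drop_bits):
    """Round to nearest; an exact tie goes to the even result."""
    half = 1 << (drop_bits - 1)
    mask = (1 << drop_bits) - 1
    result = (value + half) >> drop_bits
    if (value & mask) == half:     # discarded bits are exactly one half
        result &= ~1               # clear the LSB to land on an even value
    return result
```

With two bits dropped, the inputs 2, 6, 10 and 14 represent 0.5, 1.5, 2.5 and 3.5. Biased rounding
always moves these ties upward, while convergent rounding moves half of them down, which is why its
average error is closer to zero.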
Unfolding of Biquad
The simple biquad can process only one input sample at a time. This fundamental limitation bounds
the maximum frequency at which the biquad can be operated. Unlike in an FIR filter structure,
pipelining or parallelization cannot be applied directly to the biquad because of the feedback
path. To implement parallelism in a biquad, the system has to be unfolded. Parhi (2007) explains an
unfolding algorithm, which was applied to the simple biquad. The resulting unfolded structure is
shown in Fig. 5. Unlike a feed-forward parallel design, the unfolded design uses the same number of
delay elements as the original design. The design can process 3 input samples at a time to give
3 outputs, thereby increasing the effective rate at which outputs are generated.
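A behavioural sketch of the idea, not the architecture of Fig. 5 itself, is given below. One
iteration of the unfolded loop chains three copies of the transposed direct form II update and
writes the two state values back only once, so the number of stored states stays at two. The
function names and the transposed-form update are assumptions made for illustration.

```python
def biquad_step(v, state, b, a):
    """One transposed direct form II update; returns (output, new_state)."""
    b0, b1, b2 = b
    a1, a2 = a
    s1, s2 = state
    y = b0 * v + s1
    return y, (b1 * v - a1 * y + s2, b2 * v - a2 * y)


def biquad_serial(x, b, a):
    """Reference: one sample per iteration."""
    state = (0.0, 0.0)
    out = []
    for v in x:
        y, state = biquad_step(v, state, b, a)
        out.append(y)
    return out


def biquad_unfolded3(x, b, a):
    """Three samples per iteration; len(x) must be a multiple of 3."""
    if len(x) % 3:
        raise ValueError("input length must be a multiple of 3")
    state = (0.0, 0.0)                 # still only two delay elements
    out = []
    for k in range(0, len(x), 3):
        # Three chained copies of the datapath share one pair of registers.
        y0, t = biquad_step(x[k], state, b, a)
        y1, t = biquad_step(x[k + 1], t, b, a)
        y2, state = biquad_step(x[k + 2], t, b, a)
        out.extend((y0, y1, y2))
    return out
```

Because the arithmetic is performed in the same order, the unfolded model returns exactly the same
samples as the serial one, which mirrors the vector-matching check used for the hardware.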
After modelling the IIR filter biquad in MATLAB and verifying the results, the biquad structure
shown in Fig. 1 was implemented in Verilog. The parallel biquad was also implemented using the
biquad designed earlier. The architecture of the unfolded design is shown in Fig. 5. Since the
design processes 3 input samples at a time and gives 3 outputs, it leaves room for reducing VDD,
and thereby the power consumption, depending on the implementation requirement.
For verifying the designed filter, sinusoidal inputs of different frequencies, each of length 65000,
were created in MATLAB. These test vectors were given as inputs to the MATLAB model and the
Verilog model. The FFTs of both responses were compared with that of the input to verify the
correctness of the design. Fig. 6 shows the FFT plots of the input test sinusoids and the output
sinusoids of the biquad and the unfolded design. The input is a combination of three sinusoidal
signals of frequency 50 kHz, 150 kHz and 600 kHz and amplitude 0.33. From the filter magnitude
response, a 50 kHz signal has a magnitude of -0.3377. This corresponds to an output signal
amplitude of 0.3179. This theoretical result agrees with the one computed from the response.
Similar validation was performed for the signals at 150 kHz and 600 kHz, and the results were
found to agree with those computed theoretically.
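The same kind of check can be sketched in floating point. The code below filters a 50 kHz tone with
the Table 2 coefficients, measures the output amplitude from a single DFT bin, and compares it with
the amplitude predicted by the magnitude response. The sampling rate FS is a hypothetical value
chosen only so that the tone spans a whole number of cycles; it is not stated in the text, so the
numbers this sketch produces are not expected to match the figures quoted above.

```python
import cmath
import math

B = (1.0, 2.0, 1.0)                                   # b0, b1, b2 (Table 2)
A = (-1.9555782403150355, 0.95654367651120342)        # a1, a2 (Table 2)
GAIN = 0.00024135904904198073


def theoretical_magnitude(f, fs):
    """|H| on the unit circle at frequency f."""
    z1 = cmath.exp(-2j * math.pi * f / fs)            # z^-1
    num = B[0] + B[1] * z1 + B[2] * z1 * z1
    den = 1.0 + A[0] * z1 + A[1] * z1 * z1
    return abs(GAIN * num / den)


def filter_tdf2(x):
    """Transposed direct form II biquad with input gain."""
    s1 = s2 = 0.0
    y = []
    for xn in x:
        v = GAIN * xn
        yn = B[0] * v + s1
        s1 = B[1] * v - A[0] * yn + s2
        s2 = B[2] * v - A[1] * yn
        y.append(yn)
    return y


def tone_amplitude(y, f, fs):
    """Amplitude at f from one DFT bin; needs a whole number of cycles."""
    acc = sum(v * cmath.exp(-2j * math.pi * f * k / fs) for k, v in enumerate(y))
    return 2.0 * abs(acc) / len(y)


FS = 20e6        # hypothetical sampling rate, NOT taken from the paper
F_TONE = 50e3
AMP = 0.33
x = [AMP * math.sin(2 * math.pi * F_TONE * k / FS) for k in range(12000)]
y = filter_tdf2(x)
measured = tone_amplitude(y[4000:], F_TONE, FS)       # skip the start-up transient
expected = AMP * theoretical_magnitude(F_TONE, FS)
print(measured, expected)
```

The first 4000 output samples are discarded so that the start-up transient has decayed before the
amplitude is measured. The remaining 8000 samples cover exactly 20 cycles of the tone at the
assumed rate.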
CONCLUSIONS
The simple biquad and the parallel biquad were implemented in Verilog. The Verilog
implementation was synthesized using Encounter RTL Compiler with the 45 nm NanGate Open Cell
Library. The MATLAB code combined with the Verilog code can be used for code generation, as the
HDL was written in a highly parameterized way. This improves the reusability of the HDL code and
makes it suitable for code generation. As a future extension to this work, the multiplier topology
can be changed to include multi-bit recoding, which can give better performance at the cost of
extra hardware.
Figure 6. FFT plot of Filter Test Input and Output
REFERENCES
Andrew, Rushton (2011). VHDL for Logic Synthesis, John Wiley & Sons.
Francis, M. (2009). Infinite Impulse Response Filter Structures in Xilinx FPGAs, Xilinx White
Paper.
Freescale Semiconductors (2005). DSP56300 Family Manual, 54-58.
Neil, H. E. Weste, David Money, Harris (2010). CMOS VLSI Design - A Circuits and Systems
Perspective, Addison-Wesley: Pearson.
Israel, Koren (2005). Computer Arithmetic Algorithms, Universities Press, 141-175.
Karen M.G.V. Gettings, Andrew K. Bolstad, Michael N. Ericson & Xiao Wang (2013). Biquad
Implementation of an IIR filter for IQ mismatch correction in an SoC RF receiver, High
Performance Extreme Computing Conference (HPEC), 1-5.
Keshab, K. Parhi (2007). VLSI Digital Signal Processing Systems: Design and Implementation,
John Wiley & Sons.