
DESIGN OF PARALLEL BIQUAD FOR IMPLEMENTING IIR FILTERS

PRASANTH KRISHNA K
Dept. of Electronics and Communication, Amrita University, Amrita School of Engineering
Kollam, Kerala - 690 525, India

ABSTRACT: IIR filters of order greater than four are generally implemented as a cascade of
second order biquads due to concerns over stability. As the order of the filter increases, the number
of biquads that have to be cascaded also increases, which restricts the maximum speed achievable
by the filter. In this paper I present a three-parallel implementation of a digital biquad, which can
be used as a building block for higher order IIR filters. Since the proposed biquad block is
three-parallel, an IIR filter built from it can process 3 inputs at a time to give 3 outputs,
effectively reducing the time taken to produce one output to one third. The same structure can be
used to reduce power in applications that do not have a demanding performance requirement. The
simple biquad structure and the three-parallel biquad structure were modelled in MATLAB and
implemented in Verilog. The results of the Verilog implementation were verified through FFT and
by vector matching against the results obtained from the MATLAB model.

KEYWORDS: Biquad, IIR Filter, FIR Filter, Quantization, Fixed Point Representation, Booth
Recoding, Wallace Tree, Convergent Rounding, Unfolding, FFT.

INTRODUCTION

Compared to FIR filters, IIR filters are less complex and take much less area, since they use
fewer multipliers and adders to achieve the same specification. However, since IIR filters have a
feedback path, they are nonlinear in phase and can be unstable. Applications which do not require
linear phase can use IIR filters in place of FIR filters, which often need a higher order to meet
the specification and therefore consume more power and area. Practical IIR filters are of
order greater than four and are implemented as cascades of biquads due to concerns over stability.
Biquads, which are second order recursive filters containing two poles and two zeros, can be
implemented in various forms such as Direct Form I, Direct Form II and the Transpose forms, Francis
(2009) and Karen et al. (2013). Of these structures, Transpose Direct Form II is the most canonical
form, as it uses the minimum number of delay elements, multipliers and adders. Apart from the
structures mentioned above, there are others that are optimized for performance or area. In this
paper the biquad is implemented in the Transpose Direct Form II structure, since it is hardware
efficient and can be modularized for reuse. The simple biquad designed in this structure
forms the heart of the 3 parallel biquad, which can be used as a building block of an IIR filter. The
design steps involved are algorithmic in nature and involve a number of iterations. The system
design results thus derived are optimal in terms of hardware requirement and meet the
maximum error requirement for the design under consideration.

IIR FILTER BIQUAD SYSTEM DESIGN

An IIR filter biquad generally comprises adders and multipliers along with delay elements. As
discussed earlier, IIR filters have a feedback path, so ensuring the stability of the design is an
important design consideration. An IIR biquad can be represented mathematically as in (1), where
b0 to b2 are the feed forward coefficients and a1 and a2 are the feedback coefficients. All the
coefficients are normalized such that the coefficient a0 is unity. x(n) and y(n) are the input and
the output of the biquad.

y(n) = b0 x(n) + b1 x(n-1) + b2 x(n-2) - a1 y(n-1) - a2 y(n-2) (1)
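As an illustration of the recursion in (1), the following is a minimal floating point sketch in Python of a single biquad in Transpose Direct Form II. It is not the MATLAB or Verilog model of this paper. The function name biquad_tdf2 is hypothetical, the coefficient values are those listed in Table 2, and the sketch assumes the usual MATLAB convention in which a0 = 1 and the feedback terms are subtracted.

```python
# Coefficients copied from Table 2 (double precision, a0 normalised to 1).
B0, B1, B2 = 1.0, 2.0, 1.0
A1, A2 = -1.9555782403150355, 0.95654367651120342
GAIN = 0.00024135904904198073


def biquad_tdf2(samples, gain=GAIN, b=(B0, B1, B2), a=(A1, A2)):
    """Filter a sequence with one Transpose Direct Form II biquad.

    Assumption: feedback terms are subtracted (a0 = 1 convention).
    """
    b0, b1, b2 = b
    a1, a2 = a
    s1 = s2 = 0.0                      # the two delay elements
    out = []
    for x in samples:
        u = gain * x                   # input scaling by the gain
        y = b0 * u + s1                # output adder
        s1 = b1 * u - a1 * y + s2      # next value of the first delay
        s2 = b2 * u - a2 * y           # next value of the second delay
        out.append(y)
    return out


# Step response: for a low pass filter with unity DC gain it settles near 1.
step = biquad_tdf2([1.0] * 5000)
print(step[-1])
```

With the Table 2 values the product of the gain and (b0 + b1 + b2) equals 1 + a1 + a2 to within rounding, so the step response settling close to 1 is a quick sanity check on the sign convention.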

To design and verify the biquad structures, a standard set of filter specifications was used. The
filter specifications are tabulated in Table 1. The coefficients of the filter (b0, b1, b2, a1, a2) were
generated using the Filter Design and Analysis Tool (FDA Tool) provided by MATLAB. The
FDA Tool also gives a gain value by which the input signal is multiplied before it is
processed. The coefficients, tabulated in Table 2, were computed for a low pass Butterworth filter
realized in the Transpose Direct Form II structure. Apart from the specifications listed in
Table 1, other filter design specifications can also be added when generating the coefficients. The
basic biquad structure is shown in Fig. 1. The Direct Form II implementation requires a total of six
multipliers and three adders along with two delay elements. The gain can be adjusted so that the
coefficients b0 to b2 are powers of two. Multiplication by a power of two can be efficiently
implemented as a shift in the digital domain, thereby reducing the number of multipliers required
for the implementation.

Biquad Data Representation


Signed fixed point representation was used for the input and output of the biquad.
Since the input data range and the coefficients are fixed, using fixed point representation reduces
the hardware requirement for the biquad while maintaining the required accuracy. In
fixed point representation the integer part and the fractional part are each represented in a fixed
number of bits. Compared with floating point representation, this can reduce the area required for
implementing the biquad by 2x - 3x, as mentioned in Andrew (2011).

Figure 1. Biquad Structure Direct Form II. (Block diagram: input X(n), Gain, coefficients b0, b1,
b2, a1, a2, three adders, two Z^-1 delay elements, output Y(n).)

Table 1. IIR low pass filter specifications.

Filter Parameter       Value
Pass band frequency    100 kHz
Stop band frequency    500 kHz
Sampling frequency     20 MHz

Table 2. Biquad Coefficients - Double Precision.

Coefficient    Value
b0             1
b1             2
b2             1
a0             1
a1             -1.9555782403150355
a2             0.95654367651120342
Gain           0.00024135904904198073

Figure 2. Filter input quantization algorithm.

Biquad Input Quantisation


As the IIR filter is implemented in the digital domain, the input bit width needs to be fixed. Since
the input signal ranges from -1 to +1, and to minimize area, signed fixed point representation
was used for the data. Given the input data range, only two bits are required to
represent the integer part. The bit width of the fractional part is chosen so that the
error introduced by quantizing the input is 0.001 dB or less. The error was computed using (2),
where IdealFilterOut is the ideal filter output and InputQuantizedFilterOut is the output of the
filter with the input quantized to a fixed number of bits.

(2)
The algorithm used for finding the width is shown in Fig. 2. A sine wave in the pass band
(50 kHz) and a signal in the transition band (150 kHz) were used for calculating the required
precision. After passing the two inputs through the algorithm, the larger value of
N was taken as the length required for representing the input. The total
input width came to 28 bits: 2 bits for the integer part including the sign, and 26
bits for the fractional part. Since the input is 28 bits, the output word length is
also 28 bits. The same design algorithm can be used to find the input and output word
length for a different error requirement. As the allowed error decreases, the word
length increases, resulting in a higher hardware requirement.
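The word length search described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the flow chart of Fig. 2 and the error expression (2) are not reproduced here, so the error metric (the dB ratio of the RMS values of the two filter outputs), the search order and all function names are hypothetical. The sketch is therefore not expected to reproduce the 26 fractional bits reported in this paper.

```python
import math

A1, A2 = -1.9555782403150355, 0.95654367651120342   # Table 2
GAIN = 0.00024135904904198073                        # Table 2


def quantize(x, frac_bits, int_bits=2):
    """Round x to signed fixed point; int_bits includes the sign bit."""
    scale = 1 << frac_bits
    hi = (1 << (int_bits - 1 + frac_bits)) - 1
    lo = -(1 << (int_bits - 1 + frac_bits))
    # Python's round() rounds halves to even; the result is then saturated.
    return max(lo, min(hi, round(x * scale))) / scale


def biquad(samples):
    """Floating point reference biquad (b0, b1, b2 = 1, 2, 1)."""
    s1 = s2 = 0.0
    out = []
    for x in samples:
        u = GAIN * x
        y = u + s1
        s1 = 2.0 * u - A1 * y + s2
        s2 = u - A2 * y
        out.append(y)
    return out


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def error_db(ideal, quantized):
    """Assumed error metric, standing in for equation (2)."""
    return abs(20.0 * math.log10(rms(quantized) / rms(ideal)))


def find_frac_bits(freq_hz, max_err_db=0.001, fs=20e6, n=4000, max_bits=40):
    """Smallest fractional width whose output error meets the limit."""
    x = [math.sin(2.0 * math.pi * freq_hz * k / fs) for k in range(n)]
    ideal = biquad(x)
    for frac_bits in range(1, max_bits + 1):
        xq = [quantize(v, frac_bits) for v in x]
        if error_db(ideal, biquad(xq)) <= max_err_db:
            return frac_bits
    return None


# Take the larger of the widths found for the two test tones.
frac_bits = max(find_frac_bits(50e3), find_frac_bits(150e3))
print("fractional bits:", frac_bits, "total bits:", frac_bits + 2)
```

Tightening max_err_db in this sketch increases the returned width, which mirrors the trade-off between error requirement and hardware cost described above.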

Biquad Coefficient and Gain Quantization


The gain and coefficients generated by the FDA Tool are in double precision format. To
implement the filter, the coefficients and gain have to be quantized to a finite word length in a
way that minimizes the quantization error, thereby preserving precision, while also limiting the
amount of hardware needed to meet that precision. As the biquad has a feedback path, the
stability of the design also needs to be taken into consideration when the coefficients are
quantized. The algorithm used for quantizing the coefficients and gain is shown in Fig. 3.
A pass band signal and a transition band signal were used for finding the width of the gain and
coefficients. After completing the iterations, the coefficient width was computed to be 13 bits and
the gain width 22 bits. Since the biquad uses signed fixed point representation, and
considering re-usability of the design, two bits are allotted to the integer part and the
rest to the fractional part for both gain and coefficients. The stability of the
quantized filter was verified by importing it into the FDA Tool.
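The effect of the chosen widths can be checked with a short Python sketch. It rounds the double precision values of Table 2 to 13 and 22 bits with two integer bits, and applies the standard stability triangle test for a second order denominator 1 + a1 z^-1 + a2 z^-2. The function names are hypothetical, round-to-nearest is an assumption about the quantizer, and the stability test stands in for the FDA Tool check used in this paper.

```python
def quantize_coeff(value, total_bits, int_bits=2):
    """Round value to a signed fixed point grid with the given widths."""
    frac_bits = total_bits - int_bits
    scale = 1 << frac_bits
    return round(value * scale) / scale


def is_stable(a1, a2):
    """Stability triangle: both poles of 1 + a1 z^-1 + a2 z^-2 lie
    strictly inside the unit circle."""
    return abs(a2) < 1.0 and abs(a1) < 1.0 + a2


# Double precision values from Table 2.
a1_q = quantize_coeff(-1.9555782403150355, 13)
a2_q = quantize_coeff(0.95654367651120342, 13)
gain_q = quantize_coeff(0.00024135904904198073, 22)

print(a1_q, a2_q, gain_q, is_stable(a1_q, a2_q))
```

Under these assumptions the rounded values coincide with the quantized entries of Table 3, and the quantized denominator still passes the stability test.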

Biquad Data-path Truncation


As illustrated in the biquad architecture, adders and multipliers form the integral part of the filter.
When two signed N bit numbers are multiplied, the result needs up to 2N bits, and if further
multiplications are performed the word length grows to the point where it cannot be implemented
in a practical digital system. Adders behave similarly: when two N bit signed numbers are
added, the result should be N+1 bits long to avoid overflow. In the biquad the input and the
feedback values are multiplied with the coefficients, so the word length of each product becomes
the combined size of its two operands. This increases the size of the data-path. As the results of
the multipliers go to the adders, their size also needs to be increased, which in turn
increases the number of flip flops required in the delay elements. To limit this growth the data
path has to be rounded in such a way that the error reflected at the output of the filter is less than
0.01 dB. Unbiased rounding, also called convergent rounding, was used as the rounding algorithm.
This avoids the bias error that other rounding algorithms would introduce. The generic convergent
rounding algorithm explained in Freescale Semiconductors (2005) was implemented in the
design.
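The word length growth discussed above can be checked numerically. The Python sketch below evaluates the extreme operand values of two's complement words and reports the smallest signed width that holds every product or sum. The helper names are hypothetical, and the 28 bit and 13 bit operand widths are taken from the input and coefficient widths quoted in this paper.

```python
def signed_bits(value):
    """Smallest two's complement width (sign bit included) holding value."""
    if value >= 0:
        return value.bit_length() + 1
    return (-value - 1).bit_length() + 1


def extremes(width):
    """Most negative and most positive values of a signed word."""
    return -(1 << (width - 1)), (1 << (width - 1)) - 1


def product_bits(m, n):
    """Width needed for any product of an m bit and an n bit signed word.
    The product is bilinear, so its extremes occur at operand extremes."""
    return max(signed_bits(p * q) for p in extremes(m) for q in extremes(n))


def sum_bits(n):
    """Width needed for any sum of two n bit signed words."""
    lo, hi = extremes(n)
    return max(signed_bits(lo + lo), signed_bits(hi + hi))


print(product_bits(28, 13))   # data word times coefficient word
print(sum_bits(28))           # sum of two data words
```

The full width of the product is only reached when both operands take their most negative value, which is why some designs exclude that case and save one bit.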

Figure 3. Filter coefficient quantization algorithm

IIR FILTER BIQUAD SYSTEM DESIGN RESULTS

The algorithms discussed above were implemented in MATLAB. Test sinusoids of different
frequencies, one in the pass band and the other in the transition band, were generated and given
as input to the algorithms. The results of the system design are consolidated in Table 3. The bit
width represents the total number of bits required for representing each quantity. These digital
hardware specifications are the inputs to the digital implementation of the biquad in Verilog.

Table 3. Biquad Coefficients - Quantized.

Coefficient    Value                      Bit Width
b0             1                          -
b1             2                          -
b2             1                          -
a0             1                          -
a1             -1.95556640625             13
a2             0.95654296875              13
Gain           0.00024127960205078125     22
Input          -                          28
Output         -                          28

MATLAB MODELING OF BIQUAD

Using the system design results, the IIR filter was modelled in MATLAB. The structure of the
filter shown in Fig. 1, which is the Transpose Direct Form II implementation, was modelled in
MATLAB because the topology is canonical in terms of the delays required for the design. The
multiplier used in the design uses the Modified Booth Recoding algorithm to reduce the number of
partial products, along with the Wallace Tree approach for adding the partial products, to reduce
the computation time. To limit the data path word length, unbiased rounding was also
implemented in the design. Special care was taken in designing the multiplier to obtain high
performance and efficiency, as multipliers take up a large chip area and add delay to the
critical path.

Multiplier Implementation
A multiplier computes its result by generating a set of partial products and then summing all
of them. Since the multiplier deals with signed numbers, the partial products have to be
appropriately sign extended to get the correct result. Also, the last partial product has to be
converted appropriately when multiplying with the sign bit. An excessive number of partial
products and the sign extension that is required can result in high hardware requirements, which
directly increases area, power and delay. To avoid this, the Modified Booth algorithm in Neil &
David (2010) was implemented, which reduces the number of partial products to approximately half.
To further speed up the addition of the generated partial products, the Wallace Tree approach was
implemented. The partial products are added together in stages, thereby reducing the total
delay. The stage wise addition process involves grouping the partial product rows in threes. If
there are r rows of partial products then 3⌊r/3⌋ rows are grouped and the remaining rows are passed
to the next stage. Each column of a group is summed using a full adder if it holds three bits
or a half adder if it holds only two bits. The sum and carry thus generated are
passed on to the next stage. This is repeated until only two rows are left. The method can
be generalized and extended to any number of partial products. The optimum height of the partial
product matrix at stage n+1 is given by ⌈2r/3⌉, where r is the height of the partial product matrix
at stage n. Sign extension of the partial products was avoided by using the algorithm described in
Israel (2005).
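The two ideas in this section can be illustrated with a short Python sketch. The first function performs textbook radix-4 (modified) Booth recoding of a signed multiplier into digits from the set {-2, -1, 0, 1, 2}, which roughly halves the number of partial products. The second lists the number of partial product rows after each Wallace stage, assuming that rows are grouped in threes and each group of three is reduced to two. Both function names are hypothetical, and the sketch models arithmetic only, not the gate level multiplier of this paper.

```python
def booth_radix4(value, bits):
    """Recode a signed integer of the given width into radix-4 Booth
    digits, least significant digit first."""
    if bits % 2:
        bits += 1                          # sign extend to an even width
    pattern = value & ((1 << bits) - 1)    # two's complement bit pattern
    digits = []
    prev = 0                               # implicit bit right of the LSB
    for i in range(0, bits, 2):
        low = (pattern >> i) & 1
        high = (pattern >> (i + 1)) & 1
        digits.append(low + prev - 2 * high)
        prev = high
    return digits


def wallace_heights(rows):
    """Number of partial product rows after each reduction stage."""
    heights = [rows]
    while rows > 2:
        # Each full group of three rows becomes two rows (sum and carry);
        # leftover rows pass through unchanged.
        rows = 2 * (rows // 3) + rows % 3
        heights.append(rows)
    return heights


# A 13 bit coefficient needs 7 Booth digits instead of 13 partial products.
digits = booth_radix4(-4005, 13)
print(digits, sum(d * 4 ** i for i, d in enumerate(digits)))
print(wallace_heights(len(digits)))
```

The value -4005 is the 13 bit integer form of the quantized a1 coefficient (a1 times 2^11), so the example corresponds to one of the feedback multipliers.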

Rounding of Multiplier Results
Most DSP processors have a rounding algorithm in place to reduce the data path width after a
multiplication operation. Implementing this has a large advantage in terms of the area required for
the design. Since some data is lost in the rounding process, it adds noise to the system,
so care must be taken that the error does not have a substantial impact on the filter result.
Rounding can be implemented in two ways: biased rounding and unbiased or convergent
rounding. Freescale Semiconductors (2005) describes an unbiased rounding implementation that
reduces the average noise imparted to the result. The rounding algorithm is shown in Fig. 4.
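A generic form of convergent rounding (round half to even) can be sketched in Python as follows. The sketch operates on integers and drops a given number of least significant bits, for example 16 bits when going from a 32 bit product to a 16 bit result. It is an illustration of the general technique, not a transcription of the procedure in the DSP56300 manual, and the function name is hypothetical.

```python
def convergent_round(value, drop_bits):
    """Drop drop_bits LSBs from an integer, rounding ties to even."""
    if drop_bits <= 0:
        return value
    half = 1 << (drop_bits - 1)
    kept = value >> drop_bits                 # floor, also for negatives
    rem = value & ((1 << drop_bits) - 1)      # discarded bits, 0..2^k - 1
    if rem > half or (rem == half and kept & 1):
        kept += 1                             # round up; ties only if odd
    return kept


# Exact ties 0.5, 1.5, 2.5, 3.5 (in units of the kept LSB) round to
# 0, 2, 2, 4, so the rounding errors cancel on average.
ties = [0x08000, 0x18000, 0x28000, 0x38000]
rounded = [convergent_round(t, 16) for t in ties]
print(rounded)
```

Rounding the same ties upwards would give 1, 2, 3, 4, whose sum is larger than the sum of the exact values; that offset is the bias that convergent rounding removes.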

Figure 4. Unbiased Rounding - 32-Bit to 16-Bit

Unfolding of Biquad
The simple biquad can process only one input sample at a time. This fundamental limitation
restricts the maximum rate at which the biquad can be operated. Unlike an FIR filter
structure, pipelining or parallelization cannot be directly applied to the biquad because of the
feedback path. To implement parallelism in a biquad the system has to be unfolded. Parhi (2007)
explains an unfolding algorithm, which was applied to the simple biquad. The resulting unfolded
structure is shown in Fig. 5. Unlike a feed forward parallel design, the unfolded design uses the
same number of delay elements as the original design. The design can process 3 input samples at a
time to give 3 outputs, thereby increasing the effective rate at which outputs are generated.
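The behaviour of the unfolded structure can be illustrated with a small Python model. In the sketch below, one call of the loop body of unfolded3 corresponds to one clock cycle of a 3-unfolded biquad: three copies of the biquad arithmetic are chained, the connections between copies carry no registers, and only the last copy writes the two state variables. This is a behavioural sketch under the assumption of a Transpose Direct Form II stage with the Table 2 coefficients; the function names are hypothetical and the code is not the Verilog design of this paper.

```python
import math

A1, A2 = -1.9555782403150355, 0.95654367651120342   # Table 2
GAIN = 0.00024135904904198073                        # Table 2


def stage(x, s1, s2):
    """One copy of the biquad arithmetic; returns output and next states."""
    u = GAIN * x
    y = u + s1
    return y, 2.0 * u - A1 * y + s2, u - A2 * y


def serial(samples):
    """Reference: one sample per iteration."""
    s1 = s2 = 0.0
    out = []
    for x in samples:
        y, s1, s2 = stage(x, s1, s2)
        out.append(y)
    return out


def unfolded3(samples):
    """Three samples per iteration, still only two registered states."""
    assert len(samples) % 3 == 0, "sketch assumes a multiple of 3 samples"
    s1 = s2 = 0.0
    out = []
    for k in range(0, len(samples), 3):
        x0, x1, x2 = samples[k:k + 3]
        y0, t1, t2 = stage(x0, s1, s2)     # copy 0, unregistered outputs
        y1, t1, t2 = stage(x1, t1, t2)     # copy 1, unregistered outputs
        y2, s1, s2 = stage(x2, t1, t2)     # copy 2 updates the registers
        out.extend((y0, y1, y2))
    return out


x = [0.33 * math.sin(2.0 * math.pi * 50e3 * n / 20e6) for n in range(3000)]
print(unfolded3(x) == serial(x))
```

Because both functions perform the same operations in the same order, the two output sequences are expected to match sample for sample, which is the same property checked by vector matching in the verification step.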

IIR FILTER IMPLEMENTATION USING BIQUAD

After modeling the IIR filter biquad in MATLAB and verifying the results, the biquad structure
shown in Fig. 1 was implemented in Verilog. The parallel biquad was also implemented using the
biquad designed earlier. The architecture of the unfolded design is shown in Fig. 5. Since
the design processes 3 input samples at a time and gives 3 outputs, it leaves room for reducing
VDD, and thereby the power consumption, depending on the implementation requirement.

IIR FILTER VERIFICATION

To verify the designed filter, sinusoidal inputs of different frequencies, each of length 65000,
were created in MATLAB. These test vectors were given as inputs to the MATLAB model and the
Verilog model. The FFTs of both responses were compared with that of the input to verify the
correctness of the design. Fig. 6 shows the FFT plots of the input test sinusoids and of the
outputs of the biquad and the unfolded design. The input is a combination of three sinusoids of
frequency 50 kHz, 150 kHz and 600 kHz and amplitude 0.33. From the filter magnitude response,
a 50 kHz signal has a magnitude of -0.3377 dB. This corresponds to an output signal amplitude of
0.3179. This theoretical result agrees with the one computed from the response. Similar
validation was performed for the signals at 150 kHz and 600 kHz, and the results were found
to agree with those computed theoretically.
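The theoretical magnitudes used in this comparison can be estimated directly from the transfer function. The Python sketch below evaluates the biquad response on the unit circle for the double precision coefficients of Table 2 and the quantized coefficients of Table 3, and converts the magnitude in dB to an expected output amplitude for a 0.33 amplitude input. The function names are hypothetical, the a0 = 1 convention with subtracted feedback terms is assumed, and data path rounding is not modelled, so the printed values are an approximation to be compared with the figures quoted above rather than a reproduction of them.

```python
import cmath
import math

FS = 20e6                                   # sampling frequency, Table 1
DOUBLE = (0.00024135904904198073, -1.9555782403150355, 0.95654367651120342)
QUANT = (0.00024127960205078125, -1.95556640625, 0.95654296875)


def magnitude_db(freq_hz, gain, a1, a2):
    """Magnitude response in dB with b0, b1, b2 = 1, 2, 1."""
    z1 = cmath.exp(-2j * math.pi * freq_hz / FS)      # z^-1 on unit circle
    h = gain * (1 + 2 * z1 + z1 * z1) / (1 + a1 * z1 + a2 * z1 * z1)
    return 20.0 * math.log10(abs(h))


def expected_amplitude(freq_hz, coeffs, input_amplitude=0.33):
    """Output amplitude of a sinusoid after the filter."""
    return input_amplitude * 10 ** (magnitude_db(freq_hz, *coeffs) / 20.0)


for f in (50e3, 150e3, 600e3):
    print(f, magnitude_db(f, *DOUBLE), magnitude_db(f, *QUANT),
          expected_amplitude(f, QUANT))
```

Evaluating both coefficient sets side by side also shows how much of the pass band loss comes from coefficient and gain quantization rather than from the Butterworth response itself.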

Figure 5. 3 Level Unfolded Filter

CONCLUSIONS

The simple biquad and the parallel biquad were implemented in Verilog. The Verilog
implementation was synthesized using Encounter RTL Compiler with the 45nm NanGate Open Cell
Library. The MATLAB code combined with the Verilog code can be used for code generation, as the
HDL was written in a highly parameterized way. This improves the reusability of the HDL code and
its suitability for code generation. As a future extension to this work, the multiplier topology can
be changed to include multi-bit recoding, which can give better performance at the cost of extra
hardware.

Figure 6. FFT plot of Filter Test Input and Output

REFERENCES

Andrew, Rushton (2011). VHDL for Logic Synthesis, John Wiley & Sons.
Francis, M. (2009). Infinite Impulse Response Filter Structures in Xilinx FPGAs, Xilinx White
Paper.
Freescale Semiconductors (2005). DSP56300 Family Manual, 54-58.
Neil, H. E. Weste & David Money, Harris (2010). CMOS VLSI Design - A Circuits and Systems
Perspective, Addison-Wesley: Pearson.
Israel, Koren (2005). Computer Arithmetic Algorithms, Universities Press, 141-175.
Karen M.G.V. Gettings, Andrew K. Bolstad, Michael N. Ericson & Xiao Wang (2013). Biquad
Implementation of an IIR filter for IQ mismatch correction in an SoC RF receiver, High
Performance Extreme Computing Conference (HPEC), 1-5.
Keshab, K. Parhi (2007). VLSI Digital Signal Processing Systems: Design and Implementation,
John Wiley & Sons.
