
International Review on Computers and Software (I.R.E.CoS), Vol. 4, no. 3, May 2009
Delay Analysis of Pipeline FFT Processors

S. Reza Talebiyan1, Saied Hosseini-Khayat2
Abstract: Pipeline FFT processors are used in mobile communication systems, in particular in OFDM-based systems. This paper presents a method for delay analysis of pipeline FFT processors. The method is applied to various architectures with different radices. The analysis can be used in the design of high-speed pipeline FFT processors.
Keywords: Pipeline FFT Processor; Delay Analysis

Nomenclature
m: number of bits
N: number of FFT points
TFA: time delay of a 1-bit full adder
Tadder: time delay of a multi-bit full adder
Tmin: minimum cycle time required for a pipeline FFT processor
Tmultiplier: time delay of a multi-bit multiplier
I. Introduction
High speed is one of the main goals in IC design and is an important requirement for today's mobile communication systems. FFT processors are widely used in such systems. FFT processing has a particularly central place in OFDM-based systems, which are now part of the IEEE 802.11 standards for WLANs. Using a pipelined FFT processor is the best design choice, and perhaps the only choice, for obtaining high-throughput and low-power solutions [1]-[5]. There is a great deal of research on high-speed FFT processors [6]-[10]. Much of it deals with parallelism in pipeline FFT processors [6]-[8]. Some authors have tried to reduce the number of processing elements used in the data path [9], and others try to speed up memory-based FFT processors [10]. There are a number of architectures for pipeline FFT processors. Single-path Delay Feedback (SDF) and Multiple-path Delay Commutator (MDC) are two well-known examples. Different architectures suit different applications, but there is no analysis that compares the various kinds of pipeline FFT structures from the speed point of view. In this paper, an analysis of the time delay of different kinds of pipeline FFT processors is presented. For pipeline FFT processors, the main factor determining speed is the clock frequency. To design high-speed pipeline FFT processors, we must analyze their critical path delay. In this paper the critical paths of different pipeline FFT processors are analyzed. We then compare different kinds of pipeline FFT processors from the delay point of view. This comparison enables designers to choose the best tradeoff between chip speed and other IC design factors.
II. Pipeline FFT Processor Architectures
The general structure of a pipeline FFT processor is shown in Fig. 1. In this figure, BF denotes a butterfly unit, which carries out addition and subtraction. W is the twiddle factor by which the butterfly unit result is multiplied. The internal structure of the buffers and butterfly units can vary, so several pipeline FFT processor structures can be specified. The main ones are Single-path Delay Feedback (SDF), Multiple-path Delay Commutator (MDC), and Single-path Delay Commutator (SDC). Each structure suits particular applications. For example, the MDC structure is suitable for MIMO communications [1], and the SDF structure is suitable for reconfigurable FFT processors [11]. These structures are described below.
Fig. 1. The main structure of pipeline FFT processor
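As a rough illustration of the BF and W blocks in Fig. 1, the following Python sketch models one radix-2 butterfly followed by the twiddle-factor multiplication. It assumes a decimation-in-frequency ordering (the multiplication is applied after the subtraction), which matches the description of W being multiplied into the butterfly result. The function names `twiddle` and `butterfly` are hypothetical and are not taken from the paper.

```python
import cmath


def twiddle(k: int, n: int) -> complex:
    """Twiddle factor W_n^k = exp(-2*pi*j*k/n)."""
    return cmath.exp(-2j * cmath.pi * k / n)


def butterfly(a: complex, b: complex, w: complex) -> tuple:
    """One radix-2 butterfly: an addition, a subtraction, then the twiddle multiply.

    Assumed decimation-in-frequency form: the sum passes through unchanged and
    the difference is multiplied by the twiddle factor w.
    """
    return a + b, (a - b) * w


# Example: a 2-point DFT of [1, 2] uses w = W_2^0 = 1.
upper, lower = butterfly(1, 2, twiddle(0, 2))
print(upper, lower)
```

In hardware, the addition and subtraction correspond to the two adders of a butterfly unit and the multiplication by `w` corresponds to the multiplier that follows it.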
II.1. SDF Structure
This is a well-known structure for pipeline FFT processors that uses a suitable amount of hardware resources [2], [3]. The buffers in this architecture are in the feedback path of the butterfly units. Thus, the output of each butterfly unit is stored in shift-register buffers. At each clock cycle, only one of the butterfly outputs is stored in the buffer, and the other is passed on to the next stage. Different radices such as radix 2 (R2SDF) or radix 4 (R4SDF) are used in this structure [2]. Fig. 2 shows a 16-point pipeline FFT processor in the R2SDF structure. Because the FFT algorithm needs log2 N arithmetic stages, we need log2 N butterfly units. Each butterfly unit needs 2 adders, thus R2SDF needs 2 log2 N adders. This structure also needs log2 N - 2 multipliers, because the two final stages do not need any multiplier. The R2SDF structure needs N - 1 buffer elements. Each butterfly works in one of two modes: transfer mode (in which the two inputs are passed to the next stage) and add mode (in which the two inputs are added). A critical path is created when all butterflies are in add mode. Therefore, the critical path contains all adders and multipliers in the FFT processor.
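The R2SDF resource counts quoted above can be tabulated with a small helper. This is only a sketch of the stated formulas (log2 N butterflies, 2 log2 N adders, log2 N - 2 multipliers, N - 1 buffer elements); the function name `r2sdf_resources` and the dictionary keys are illustrative assumptions.

```python
def r2sdf_resources(n_points: int) -> dict:
    """Hardware counts for an N-point R2SDF pipeline, using the formulas in the text."""
    # N must be a power of two and at least 4 so that log2(N) - 2 is not negative.
    if n_points < 4 or n_points & (n_points - 1) != 0:
        raise ValueError("n_points must be a power of two >= 4")
    stages = n_points.bit_length() - 1  # exact log2 for a power of two
    return {
        "butterflies": stages,
        "adders": 2 * stages,
        "multipliers": stages - 2,
        "buffers": n_points - 1,
    }


print(r2sdf_resources(16))
```

For the 16-point example this gives 4 butterflies, 8 adders, 2 multipliers and 15 buffer elements.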
Fig. 2. 16-point R2SDF pipeline FFT processor
II.2. MDC Structure
This structure is the most common implementation method for the radix-2 FFT algorithm [2]. Although higher radices are possible in this structure, they need a large amount of hardware resources. Fig. 3 shows a 16-point pipeline FFT processor in the R2MDC structure. The data sequence is broken into two streams by buffer elements. The buffers store data and also create an interval between the data streams. SW denotes a switch box, which directs data either to the butterfly unit or to the next buffer. The numbers of multipliers and adders are the same as in the R2SDF structure, but the number of buffers (3N/2 - 2 registers) is higher than in the R2SDF structure.
Fig. 3. 16-point R2MDC pipeline FFT processor
In this structure the critical path is obtained when all switch boxes send data to the next stage, so all adders except the last are in the critical path and one multiplier operates at the end.
II.3. SDC Structure
In the MDC structure with higher radices, the hardware resources become larger. To reduce the buffer elements in comparison with the R4MDC structure, a new structure is proposed in [12]. Buffer elements are replaced by a multiplexed and programmable Delay-Commutator. Due to the programmability requirement, the butterfly, the Delay-Commutator and the control unit of this structure become relatively complicated. Fig. 4 shows a 1024-point R4SDC pipeline FFT processor architecture [12].
Fig. 4. R4SDC pipeline FFT processor
In this architecture, the critical path runs from one commutator to the next and contains two adders and one multiplier.
A pipeline FFT processor is made up of three basic hardware elements: buffers, adders, and multipliers. To analyze a pipeline FFT processor, we must select specific circuits for each of these elements. The analysis presented here is based on finding the critical path of each structure and calculating the delay of the main hardware resources in that path. The result of this analysis is the minimum cycle time needed for each pipeline FFT processor structure.
III. Choice of Building Block Circuits
In order to perform the delay analysis, we first need to select the circuit of the building block element used in the critical path, i.e., the 1-bit full adder. For the 1-bit full adder, we choose the circuit presented in [13], with enabling capability. This circuit is illustrated in Fig. 5. We use the delay of this circuit to analyze the delay of the whole system.
Fig. 5. Selected 1-bit full adder circuit [13]
IV. Choice of m-bit Adder and Multiplier Structures
In the second step, the structures of the m-bit adder and multiplier must be selected. The next step is the choice of the data word-length. The multiplier is composed of a number of 1-bit full adders combined in a special form. Therefore, the delays of the adder and the multiplier are functions of the number of 1-bit full adders needed, which depends on the data word-length and the selected architecture. This relation is expressed in Equations (1) and (2). K1 and K2 are two correction coefficients that capture the effect of the interconnection between 1-bit full adders on delay. Because the different kinds of pipeline FFT processors are studied under equal conditions, we can set both K1 and K2 to 1.
$$T_{adder} = K_1 \cdot f(m) \cdot T_{FA} \tag{1}$$
$$T_{multiplier} = K_2 \cdot g(m) \cdot T_{FA} \tag{2}$$
Once the adder and multiplier structures and the data word-length are chosen, f(m) and g(m) are determined. For example, f(m) for a carry propagation adder (CPA) is m (m is the number of bits), and for a carry select adder divided into two (m/2)-bit adders, f(m) is about m/2. Here, in selecting the adder and multiplier structures, we consider their suitability for deep submicron technology. One of the main issues in deep submicron technology is the interconnection network [14]. Simple adder and multiplier structures have a minimum amount of interconnection and routing. Consequently, we select the carry propagation adder (CPA) and the Baugh-Wooley multiplier. Therefore f and g simplify to Eqs. (3) and (4).
$$f(m) = m \tag{3}$$
$$g(m) = 2m \tag{4}$$
V. Finding Critical Path for Pipeline FFT Processor Structures
The critical path in each structure is the longest path from one buffer to the next buffer. This path arises from the different working modes of the processing elements or switch boxes in the FFT processor and represents its worst-case operation. Here the critical paths of some well-known pipeline FFT processor structures are presented: R2SDF, R2MDC, R4SDF, R4SDC and R4MDC. The cycle time of each structure cannot be less than the delay of its critical path. Therefore, the maximum clock frequency of each FFT processor structure is determined.
V.1. Critical Path for SDF Structure
In this structure, each butterfly works at any time in one of two modes: transfer mode, or adding mode, in which the butterfly adds its two inputs. When all butterflies are in adding mode, the path contains all adders, and some of the multipliers can work in this path. To cover the worst case, we assume that all multipliers work in this path. Therefore, the critical path of this structure contains all multipliers and one adder of each butterfly. The minimum cycle time of the R2SDF structure is then:
$$T_{min} = (\log_2 N) \cdot T_{adder} + (\log_2 N - 1) \cdot T_{multiplier} \tag{5}$$
In R4SDF two adders work in each butterfly; therefore, the minimum cycle time of R4SDF is given by Eq. (6).
$$T_{min} = 2 (\log_4 N) \cdot T_{adder} + (\log_4 N - 1) \cdot T_{multiplier} \tag{6}$$
V.2. Critical Path for MDC Structure
In this structure the critical path is obtained when all switch boxes except the last two direct data the other way. In this case all adders except two are in the critical path, and all multipliers except one are out of it. Thus, the critical path of this structure contains one adder of each butterfly except two, and one multiplier. Therefore, the minimum cycle time of this structure is:
$$T_{min} = [(\log_2 N) - 2] \cdot T_{adder} + T_{multiplier} \tag{7}$$
In R4MDC two adders work in each butterfly; therefore, the minimum cycle time of R4MDC is given by Eq. (8).
$$T_{min} = 2 [(\log_4 N) - 2] \cdot T_{adder} + T_{multiplier} \tag{8}$$
V.3. Critical Path for R4SDC Structure
In this architecture, the critical path contains all stages, because one of the inputs of each butterfly does not come from buffers. Thus, the critical path contains all adders and multipliers in the path. Therefore, the minimum cycle time of this structure is:
$$T_{min} = 2 (\log_4 N) \cdot T_{adder} + [(\log_4 N) - 1] \cdot T_{multiplier} \tag{9}$$
Tadder and Tmultiplier denote the delays of the adder and multiplier circuits. From Eqs. (1) and (2), with f and g from Eqs. (3) and (4), these delays are given by Eqs. (10) and (11).
$$T_{adder} = m \cdot T_{FA} \tag{10}$$
$$T_{multiplier} = 2 m \cdot T_{FA} \tag{11}$$
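As a worked step, substituting Eqs. (10) and (11) into Eqs. (5) to (9) expresses each minimum cycle time as a multiple of m TFA. The simplification below is a sketch under the paper's assumption K1 = K2 = 1; the structure superscripts on T_min are notation introduced here for readability.

```latex
\begin{aligned}
T_{min}^{R2SDF} &= (\log_2 N)\, m\, T_{FA} + (\log_2 N - 1)\, 2m\, T_{FA} = (3\log_2 N - 2)\, m\, T_{FA} \\
T_{min}^{R4SDF} = T_{min}^{R4SDC} &= 2(\log_4 N)\, m\, T_{FA} + (\log_4 N - 1)\, 2m\, T_{FA} = (4\log_4 N - 2)\, m\, T_{FA} \\
T_{min}^{R2MDC} &= (\log_2 N - 2)\, m\, T_{FA} + 2m\, T_{FA} = (\log_2 N)\, m\, T_{FA} \\
T_{min}^{R4MDC} &= 2(\log_4 N - 2)\, m\, T_{FA} + 2m\, T_{FA} = (2\log_4 N - 2)\, m\, T_{FA}
\end{aligned}
```

For example, with N = 1024 and m = 16 these evaluate to 448 TFA for R2SDF, 288 TFA for R4SDF and R4SDC, 160 TFA for R2MDC, and 128 TFA for R4MDC.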
VI. Using the Analysis

We have presented a method to estimate the worst-case cycle time of some useful pipeline FFT processor structures. Now, using Equations (5) to (9), we can estimate the minimum cycle time of each FFT processor structure. Here m denotes the number of bits and is set to 16. With these equations, the minimum cycle time, or equivalently the maximum frequency, of the different FFT structures is obtained as a function of the number of FFT points. Figs. 6 and 7 show the minimum cycle time of each structure versus the number of FFT points for radix 2 and radix 4. The effect of circuit-level design enters through the selection of the 1-bit full adder. To compute TFA (the delay of the 1-bit full adder), HSPICE simulations in 90 nm PTM technology are used [15].
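The evaluation described above can be sketched in Python. The code below implements Eqs. (5) to (9) with the CPA and Baugh-Wooley delays of Eqs. (10) and (11). It assumes K1 = K2 = 1 and leaves TFA as a parameter, defaulting to a normalized value of 1.0 because no numerical TFA is quoted in the text. The function names `stages` and `t_min` are hypothetical.

```python
def stages(n_points: int, radix: int) -> int:
    """Exact integer log of n_points in the given radix (number of pipeline stages)."""
    count, value = 0, 1
    while value < n_points:
        value *= radix
        count += 1
    if value != n_points or count == 0:
        raise ValueError("n_points must be a positive power of the radix")
    return count


def t_min(structure: str, n_points: int, m: int = 16, t_fa: float = 1.0) -> float:
    """Minimum cycle time from Eqs. (5)-(9), in the same units as t_fa."""
    t_adder = m * t_fa            # Eq. (10): carry propagation adder
    t_multiplier = 2 * m * t_fa   # Eq. (11): Baugh-Wooley multiplier
    if structure == "R2SDF":      # Eq. (5)
        s = stages(n_points, 2)
        return s * t_adder + (s - 1) * t_multiplier
    if structure in ("R4SDF", "R4SDC"):  # Eqs. (6) and (9) have the same form
        s = stages(n_points, 4)
        return 2 * s * t_adder + (s - 1) * t_multiplier
    if structure == "R2MDC":      # Eq. (7)
        s = stages(n_points, 2)
        return (s - 2) * t_adder + t_multiplier
    if structure == "R4MDC":      # Eq. (8)
        s = stages(n_points, 4)
        return 2 * (s - 2) * t_adder + t_multiplier
    raise ValueError("unknown structure: " + structure)


# Minimum cycle time in units of T_FA for the FFT sizes valid in both radices.
for n in (16, 64, 256, 1024, 4096):
    row = {name: t_min(name, n) for name in ("R2SDF", "R2MDC", "R4SDF", "R4MDC", "R4SDC")}
    print(n, row)
```

Passing a simulated full-adder delay as `t_fa` scales every result to physical time units, and the maximum clock frequency is then the reciprocal of the returned value.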
Fig. 6. Results of delay analysis for radix-2 FFT processors (plot "Delay Analysis of Radix-2": minimum time cycle versus FFT points from 16 to 4096, for R2SDF and R2MDC)
Fig. 7. Results of delay analysis for radix-4 FFT processors (plot "Delay Analysis of Radix-4": minimum time cycle versus FFT points from 16 to 4096, for R4SDF, R4MDC and R4SDC)
The delay of the SDF structure is higher than the others because all of its processing elements are in the critical path. In the MDC structure only adders and one multiplier are in the critical path; therefore its delay is the lowest among the FFT processor structures. Also, higher radices are faster than lower ones.
VII. Conclusion
Using the delay estimation technique presented in this paper, a designer can estimate the maximum operating frequency of a pipeline FFT processor in comparison to another pipeline FFT processor. This is done after selecting the circuits for the basic computational elements, such as the 1-bit full adder circuit, the data word-length, and the adder and multiplier structures. This method can also be applied to other DSP systems.
References
[1] S. Lee, Y. Jung, and J. Kim, "Low complexity pipeline FFT processor for MIMO-OFDM systems," IEICE Electronics Express, vol. 4, no. 23, pp. 750-754.
[2] S. He and M. Torkelson, "Design and Implementation of 1024-Point Pipeline FFT Processor," Custom Integrated Circuits Conference, Proceedings of the IEEE, May 1998, pp. 131-134.
[3] S. He, M. Torkelson, "Designing pipeline FFT processor for OFDM (de)modulation," Proc. of ISSSE, vol. 2, pp. 945-950, 1998.
[4] L. Jia, Y. Gao, J. Isoaho, and H. Tenhunen, "A New VLSI-Oriented FFT Algorithm and Implementation," Proc. of Eleventh Annual IEEE Intl ASIC Conference, 1998, pp. 337-341.
[5] M. Jiang, B. Yang, A. Jiang, X. Wang, X. Gan, B. Zhao, T. Zhang, "Design of FFT processor with low power complex multiplier for OFDM-based high-speed wireless applications," in Proc. of ISCIT'2004, pp. 639-641, 2004.
[6] H.L. Lin, H. Lin, R.C. Chang, S.W. Chen, C.Y. Liao, C.H. Wu, "A High-Speed Highly Pipelined 2n-Point FFT Architecture for a dual OFDM Processor," MIXDES, Poland, 22-24 June 2006.
[7] J. Lee, H. Lee, S. Cho and S.S. Choi, "A High-Speed, Low-Complexity Radix-2^4 FFT Processor for MB-OFDM UWB Systems," ISCAS 2006.
[8] Wei Han, A. T. Erdogan, T. Arslan, and M. Hasan, "The Development of High Performance FFT IP Cores through Hybrid Low Power Algorithmic Methodology," ASP-DAC 2005.
[9] Koushik Maharatna, Eckhard Grass and Ulrich Jagdhold, "A 64-Point Fourier Transform Chip for High-Speed Wireless LAN Application Using OFDM," IEEE Journal of Solid-State Circuits, vol. 39, no. 3, March 2004.
[10] Chu Chao, Zhang Qin, Xie Yingke and Han Chengde, "Design of a High Performance FFT Processor Based on FPGA," ASP-DAC 2005.
[11] F. Kristensen, P. Nilsson, A. Olsson, "A flexible FFT processor," Proc. 20th NORCHIP Conf., pp. 121-126, 2002.
[12] G. Bi, E. V. Jones, "A pipelined FFT processor for word-sequential data," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 37, pp. 1982-1985, 1989.
[13] S. Goel, A. Kumar, and M. A. Bayoumi, "Design of Robust, Energy-Efficient Full Adders for Deep-Submicrometer Design Using Hybrid-CMOS Logic Style," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 12, pp. 1309-1321, Feb. 2006.
[14] X. Chen, L. Peh, "Leakage power modeling and optimization in interconnection networks," Proc. of ISLPED, Seoul, Korea, August 2003.
[15] "Predictive Technology Model," Home page 2008, URL: www.eas.asu.edu/~ptm.
Authors' Information
1 Electrical Engineering Department, Ferdowsi University of Mashhad, Mashhad, Iran. E-mail: talebiyan@gmail.com
2 Electrical Engineering Department, Ferdowsi University of Mashhad, Mashhad, Iran.

Copyright 2009 Praise Worthy Prize S.r.l. - All rights reserved
