Miodrag Bolic
Objective
• FFT Introduction
• Some FFT algorithms
• FFT on PDSP
• FFT floating to fixed-point conversion
• Hardware implementation of FFT
FFT for TMS320x67 with 2 buffers
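[Figure: ping-pong buffering. The EDMA transfers "count" samples from the serial port (source address) into one of two buffers, ping (destination address 1) or pong (destination address 2), while the FFT processes the other buffer.]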
FFT Fixed point - Xilinx
• Calculations are performed with no scaling, so the integer bit growth is carried through the computation
– The fractional bits created by each multiplication are truncated after the multiplication.
– The width of the output is (input width + number of stages + 1).
– For example, a 1024-pt transform (10 radix-2 stages) with a 16-bit input consisting of 1 integer bit and 15 fractional bits has a 27-bit output with 12 integer bits and 15 fractional bits.
[Xilinx05]
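The width rule above can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slide: the function name and its parameters are hypothetical, and the stage count is assumed to be log2(N) as for a radix-2 transform.

```python
def unscaled_fft_output_format(n_points, input_width, frac_bits):
    """Return (output width, integer bits, fractional bits) for an unscaled FFT.

    Assumes a power-of-two length and log2(N) stages, following the rule
    output width = input width + number of stages + 1.
    """
    stages = n_points.bit_length() - 1
    if n_points < 2 or (1 << stages) != n_points:
        raise ValueError("n_points must be a power of two")
    output_width = input_width + stages + 1
    # Fractional growth is truncated, so only the integer part widens.
    return output_width, output_width - frac_bits, frac_bits


print(unscaled_fft_output_format(1024, 16, 15))  # (27, 12, 15)
```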
Block-floating point
– The computation is fixed-point
– After every addition there is an overflow test
– If an overflow is detected, the whole array is divided by 2 (multiplied by ½)
– The number of divisions is counted to determine the scale factor
– SNR depends on how many overflows occur
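The steps above can be sketched in Python on plain integers. This is an illustrative model only, not the scheme of any particular core: the function names and the 16-bit word length are assumptions, and the overflow test is applied to a whole block after an addition pass.

```python
def block_scale(block, word_bits=16):
    """Halve the whole block until every value fits in a signed word.

    Returns the scaled block and the number of divisions by 2 performed.
    """
    limit = 1 << (word_bits - 1)
    shifts = 0
    while any(v >= limit or v < -limit for v in block):
        block = [v >> 1 for v in block]  # divide the whole array by 2
        shifts += 1
    return block, shifts


def add_with_block_scaling(a, b, word_bits=16):
    """Fixed-point addition followed by the overflow test and block scaling."""
    sums = [x + y for x, y in zip(a, b)]
    return block_scale(sums, word_bits)


scaled, shifts = add_with_block_scaling([30000, 100], [10000, 50])
print(scaled, shifts)  # [20000, 75] 1
```

The true result is approximately the scaled value times 2 to the power of the shift count, so the counter plays the role of a shared exponent for the block. Every shift discards one low-order bit from all values, which is why the SNR depends on the number of overflows.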
Butterfly computation for Decimation in Time
• Linear noise model
[Oppenheim98]
[Oppenheim98]
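Since the butterfly figures are not reproduced here, the following Python sketch shows the standard radix-2 decimation-in-time butterfly and one way to picture the linear noise model: the rounded twiddle product is treated as the exact product plus an additive error. The function names and the choice of 15 fractional bits are assumptions made for illustration.

```python
import cmath


def twiddle(k, n):
    """Twiddle factor W_N^k = exp(-j*2*pi*k/N)."""
    return cmath.exp(-2j * cmath.pi * k / n)


def dit_butterfly(p, q, w):
    """Radix-2 DIT butterfly: returns (p + w*q, p - w*q)."""
    t = w * q
    return p + t, p - t


def quantize(z, frac_bits):
    """Round real and imaginary parts to frac_bits fractional bits."""
    scale = 1 << frac_bits
    return complex(round(z.real * scale) / scale, round(z.imag * scale) / scale)


def dit_butterfly_quantized(p, q, w, frac_bits=15):
    """Butterfly with a rounded product; also returns the additive error."""
    exact = w * q
    t = quantize(exact, frac_bits)
    noise = t - exact  # the noise source injected after the multiplier
    return p + t, p - t, noise


print(dit_butterfly(1 + 0j, 1 + 0j, twiddle(0, 2)))  # (2+0j), 0j
```

In this model each part of the error is bounded by half a least significant bit, and the errors from different butterflies are treated as independent sources that add at the output.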
Butterfly with Scaling multipliers
[Oppenheim98]
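A minimal Python sketch of a butterfly with scaling multipliers follows. It assumes the common arrangement in which each stage attenuates by ½, so that inputs bounded by 1 in magnitude produce outputs bounded by 1; the exact placement of the multipliers in the original figure is not recoverable from the text, and the function name is hypothetical.

```python
def scaled_dit_butterfly(p, q, w):
    """Radix-2 DIT butterfly with a 1/2 scaling factor per stage.

    With |p| <= 1, |q| <= 1 and |w| = 1, both outputs satisfy |out| <= 1,
    so no overflow can occur in a fractional fixed-point format.
    """
    t = w * q
    return 0.5 * (p + t), 0.5 * (p - t)


top, bottom = scaled_dit_butterfly(0.6 + 0.8j, -0.8 + 0.6j, -1j)
print(abs(top) <= 1.0 + 1e-12, abs(bottom) <= 1.0 + 1e-12)  # True True
```

Applying the factor of ½ at every one of the log2(N) stages gives an overall gain of 1/N, which avoids overflow at the cost of signal level relative to the rounding noise.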
Sequential FFT-Xilinx core
[Xilinx05]
Pipelined FFT-Xilinx core
[Xilinx05]
Pipelined FFT architecture
• Radix-2 multipath delay commutator (R2MDC)
• Radix-2 single-path delay feedback (R2SDF)
• Radix-4 multipath delay commutator (R4MDC)
• Radix-4 single-path delay commutator (R4SDC)
• Radix-4 single-path delay feedback (R4SDF)
• Radix-2² single-path delay commutator (R2²SDC)
[Li03]
Radix-2 multipath delay commutator
• The total number of delay elements is 4 + 2 + 2 + 1 + 1 = 10 for the 8-point FFT.
• The utilization of the butterfly and the multiplier is 50%
[Li03]
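The delay count for the 8-point case can be generalized with a short Python sketch. The pattern assumed here (one delay of N/2 at the input, then a pair of equal delays per commutator, halving down to 1) is extrapolated from the 8-point sum above; the function name is hypothetical.

```python
def r2mdc_delay_elements(n_points):
    """List the delay-line lengths of an R2MDC pipeline for a power-of-two N."""
    stages = n_points.bit_length() - 1
    if n_points < 4 or (1 << stages) != n_points:
        raise ValueError("n_points must be a power of two, at least 4")
    delays = [n_points // 2]      # input delay that splits the stream in two
    size = n_points // 4
    while size >= 1:
        delays += [size, size]    # one delay in each path of a commutator
        size //= 2
    return delays


print(r2mdc_delay_elements(8), sum(r2mdc_delay_elements(8)))  # [4, 2, 2, 1, 1] 10
```

Under this assumption the total is 3N/2 - 2 delay elements, which gives 10 for N = 8.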
Radix-2 single-path delay feedback
[Li03]
FFT processor
• Datapath
– memories,
– butterflies and
– complex multipliers.
• Control unit
[Li03]
Requirements
• Requirement
– Transform length is 1024
– Transform time is less than 40 µs (continuously)
– Continuous I/O
– 25.6 Msamples/sec. throughput
– Complex 24 bits I/O data
• Steps in designing
– Architecture selection
– Partitioning
– Scheduling
– Word length selection
– RTL model generation
– Validation of models
[Li03]
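The throughput and transform-time requirements are linked: with continuous I/O, one 1024-point block must be consumed in the time it takes to arrive. A small Python check, with hypothetical function names, shows that 1024 samples at 25.6 Msamples/sec. correspond to a block time of 40 µs.

```python
def block_time_s(n_points, sample_rate_hz):
    """Time needed to receive one block of n_points samples."""
    return n_points / sample_rate_hz


def required_sample_rate_hz(n_points, transform_time_s):
    """Sample rate needed to stream one block per transform time."""
    return n_points / transform_time_s


t = block_time_s(1024, 25.6e6)
print(f"{t * 1e6:.1f} us")  # 40.0 us
```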
Resource analysis
• Computation time for the 1024-point FFT
• This is optimal under the assumption that ALL data are available to ALL stages, which is impossible for continuous data streams. Each butterfly has to be idle 50% of the time in order to reorder the incoming data.
[Li03]
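The computation-time formula on this slide is not recoverable from the text, so the following Python sketch reconstructs the estimate under stated assumptions: a radix-2 FFT needs (N/2)·log2(N) butterfly operations, each butterfly unit completes one operation per clock cycle, and the clock equals the 25.6 MHz sample rate so that 1024 cycles are available per transform. The function name and parameters are hypothetical.

```python
import math


def butterflies_needed(n_points, cycles_per_transform, utilization=1.0):
    """Estimate the number of butterfly units needed for a radix-2 FFT.

    Assumes one butterfly operation per unit per clock cycle; utilization
    is the fraction of cycles in which a unit does useful work.
    """
    stages = n_points.bit_length() - 1
    operations = (n_points // 2) * stages
    return math.ceil(operations / (cycles_per_transform * utilization))


print(butterflies_needed(1024, 1024))        # 5
print(butterflies_needed(1024, 1024, 0.5))   # 10
```

With full utilization the estimate is 5 units; halving the utilization to account for data reordering doubles it to 10, one butterfly per stage of the 1024-point transform.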
Resource analysis
• The solution: the number of butterflies is 10
• The number of complex multipliers is 9
• Memory length for Radix-2 single-path delay feedback is N - 1
[Li03]
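The three figures above follow from the stage count, as this Python sketch illustrates. It assumes one butterfly per stage, a complex multiplier between consecutive stages only, and feedback memories of N/2, N/4, ..., 1 words; the function name is hypothetical.

```python
def r2sdf_resources(n_points):
    """Return (butterflies, complex multipliers, memory words) for R2SDF."""
    stages = n_points.bit_length() - 1
    if n_points < 2 or (1 << stages) != n_points:
        raise ValueError("n_points must be a power of two")
    butterflies = stages                   # one per stage
    multipliers = stages - 1               # none needed after the last stage
    memory_words = sum(n_points >> s for s in range(1, stages + 1))
    return butterflies, multipliers, memory_words


print(r2sdf_resources(1024))  # (10, 9, 1023)
```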
RAM Based Commutator
• A dual-port memory is required since the read and write operations must be performed in the same clock cycle.
[Li03]
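A behavioral Python sketch of a RAM-based delay line makes the dual-port requirement concrete: in every clock cycle the oldest sample is read and the newest sample is written at the same address. The class and method names are hypothetical, and the model ignores memory latency.

```python
class DualPortDelayLine:
    """Delay line of fixed depth built on a circular RAM buffer."""

    def __init__(self, depth):
        self.ram = [0] * depth
        self.addr = 0

    def clock(self, sample):
        """One clock cycle: read the delayed sample, then overwrite it."""
        delayed = self.ram[self.addr]   # read port
        self.ram[self.addr] = sample    # write port, same cycle and address
        self.addr = (self.addr + 1) % len(self.ram)
        return delayed


line = DualPortDelayLine(3)
print([line.clock(x) for x in [1, 2, 3, 4, 5]])  # [0, 0, 0, 1, 2]
```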
Complex multiplier
[Li03]
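The multiplier figure is not reproduced here. As a generic illustration, and not necessarily the structure used in [Li03], a complex product can be computed with four real multiplications or, at the cost of extra additions, with three. Both function names below are hypothetical.

```python
def complex_mult_4(a, b, c, d):
    """(a + jb)(c + jd) with 4 real multiplications and 2 additions."""
    return a * c - b * d, a * d + b * c


def complex_mult_3(a, b, c, d):
    """Same product with 3 real multiplications and 5 additions."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return k1 - k3, k1 + k2


print(complex_mult_4(3, 4, 5, 6))  # (-9, 38)
print(complex_mult_3(3, 4, 5, 6))  # (-9, 38)
```

The three-multiplier form trades one multiplier for three adders, which can reduce area when multipliers dominate the cost.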
Radix-4
Altera radix-4 butterfly
[Oppenheim98]
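Because the radix-4 figures are missing, here is a minimal Python sketch of the basic radix-4 butterfly, which is a 4-point DFT of its inputs. It is a generic textbook form, not the Altera core's internal structure, and the twiddle multiplications that precede the butterfly in a full FFT are omitted. The function name is hypothetical.

```python
def radix4_butterfly(a, b, c, d):
    """4-point DFT of (a, b, c, d): X[k] = sum over n of x[n] * (-j)**(n*k)."""
    j = 1j
    return (
        a + b + c + d,
        a - j * b - c + j * d,
        a - b + c - d,
        a + j * b - c - j * d,
    )


print(radix4_butterfly(1, 2, 3, 4))  # ((10+0j) is returned as 10), see asserts
```

Multiplication by ±j only swaps the real and imaginary parts with a sign change, so the butterfly itself needs additions and subtractions but no real multipliers.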
References
[Altera05] Altera, FFT MegaCore Function User Guide, DSP Literature, 2005.
[Li03] W. Li, Studies on implementation of low power FFT processors, Thesis, Linköpings University, 2003.
[Oppenheim98] A. V. Oppenheim, R. W. Schafer, Discrete-time signal processing, 2nd edition, Prentice Hall, 1998.
[Xilinx05] Xilinx, "Fast Fourier Transform v3.2", DS260, August 31, 2005.