Implementation Summary
eASIC
Features
Device support for Nextreme-2 and Nextreme devices
FFT point sizes from 64 to 16K points in powers of 2 (e.g. 256, 512, 1024)*
Fixed-point, bit-accurate C-Model available for system modeling
Support for both FFT and iFFT, run-time configurable
Optional run-time configurable transform length
Support for unscaled and scaled versions
Input data bit width: 10 to 24 bits, 2's complement
Phase factor bit width: 10 to 24 bits, 2's complement
Rounding: convergent rounding
Scaling: unscaled, fixed and block floating point
Bit-reversed or natural-order input
Complete Verilog RTL code
Testbench for simulation
Architectures available: 1) Loop engine (Radix-2 and Radix-4) 2) High Performance Pipelined Streaming FFT
* The Radix-4 loop engine only supports point sizes N that are powers of 4.
Release Information
The files and documents contained in this release of the eASIC FFT IP Core function are:
1) RTL files in Verilog
2) Bit-accurate C-Model
3) Test bench for RTL simulation with test vectors covering the FFT features
4) Documentation
5) Scripts for running the testbench
Rev: FFT_v2_1_ds001
www.eASIC.com
CONFIDENTIAL
For the determination of maximum frequency, the core was generated with double registers on each input and output. The registers directly connected to the core run on the core clock, whereas the outer registers run off a separate clock. This ensures that all paths in the core are included in the timing constraint without artificially distorting the design to fit the chip. The device voltage library used for the implementation is specified at the top of each table.
Nextreme-2
Note: All implementations use 16-bit data and phase factors and the 1.1 V library.
Table 1:
FFT Architecture     eCells    eDFF      bRAM
R-2 Loop Engine       3,164     1,932       3
R-2 Loop Engine       3,179     1,981       3
R-4 Loop Engine       7,268     4,080       7
R-4 Loop Engine       7,269     4,187       7
R-2 Loop Engine       2,973     1,932       3
R-4 Loop Engine       7,952     4,295      28
Pipelined FFT        12,483    11,208      11
Pipelined FFT        12,833    12,454      12
Pipelined FFT        13,611    11,694      15
Pipelined FFT        21,120    14,427      75
Nextreme
Note: All implementations use 16-bit data and phase factors and the 1.2 V library.
Table 2:
FFT Architecture     eCells    bRAM
R-2 Loop Engine       3450        5
R-2 Loop Engine       3509        5
R-2 Loop Engine       3520        5
R-2 Loop Engine       3591       10
R-4 Loop Engine       9437       11
R-4 Loop Engine       9465       11
R-2 Loop Engine       3509        5
R-2 Loop Engine       3482        5
R-2 Loop Engine       3520        5
R-2 Loop Engine       3550        5
R-4 Loop Engine       9437       11
R-4 Loop Engine       9379       11
R-2 Loop Engine       3696       40
R-4 Loop Engine       9644       44
General Description
The formula for evaluating the forward DFT is

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}$$

where n and k range from 0 to N-1. The inverse DFT differs only in that its phase factor is the conjugate of the forward DFT phase factor.

The fast Fourier transform (FFT) is an efficient algorithm for computing the DFT of a given block of input data. It uses a divide-and-conquer approach that reduces the computation to a smaller, repetitive structure called the butterfly. The butterfly can be implemented to take 2 inputs at a time (Radix-2) or 4 inputs at a time (Radix-4). The iFFT is calculated by conjugating the phase factors of the corresponding forward FFT. The core does not divide the result by N, so the user has to apply that division when using the iFFT.

The eASIC FFT cores have 3 architectures:
1) Loop engine - Radix-2: separate stages for loading, processing and unloading.
2) Loop engine - Radix-4: separate stages for loading, processing and unloading.
3) High Performance Pipelined Streaming FFT: continuous output data after some latency.
Figure 1 illustrates the throughput and area difference between the three architectures.
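The relationship between the forward and inverse transforms described above can be illustrated with a small floating-point reference model. This is only a sketch: it is a direct O(N^2) DFT, not the core's fixed-point butterfly implementation, and the function name `dft` is hypothetical. It shows that conjugating the phase factor yields the inverse transform up to a factor of N, which the caller has to divide out.

```python
import cmath


def dft(x, inverse=False):
    """Direct DFT of a list of complex samples.

    With inverse=True the phase factors are conjugated. The result is
    NOT divided by N, mirroring the note that the user applies that
    division when using the iFFT.
    """
    n_points = len(x)
    sign = 1 if inverse else -1  # conjugate phase factor for the inverse
    out = []
    for k in range(n_points):
        acc = 0j
        for n in range(n_points):
            acc += x[n] * cmath.exp(sign * 2j * cmath.pi * n * k / n_points)
        out.append(acc)
    return out


samples = [1 + 2j, -3 + 0.5j, 0.25 - 1j, 4 + 0j]
spectrum = dft(samples)
# User-side division by N after the inverse transform.
recovered = [v / len(samples) for v in dft(spectrum, inverse=True)]
```

After the division by N, `recovered` matches `samples` to within floating-point error.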
Scaling
The FFT processes a frame of data by successive passes over the input data frame. Each stage adds data values, which increases the data width. In the R22SDF (streaming) and Radix-4 (loop engine) architectures each stage contains two sets of additions, so the bit growth per stage is 2 bits for both the real and the imaginary part. The numbers generated by the computation are therefore potentially larger than the numbers read from memory, and a strategy is needed to accommodate this dynamic range expansion. Bit growth is handled by a fixed scaling schedule (streaming and loop engine) or by block floating point scaling (loop engine only).

1. Scaling at each stage using a fixed scaling schedule
The scaling schedule scales the data by a factor of 1, 2, 4 or 8 in each stage. If the scaling is insufficient, a butterfly output may grow beyond the dynamic range and cause an overflow. Because of the scaling applied in the FFT implementation, the transform computed is a scaled transform.
For the pipelined, streaming I/O architecture, the scaling schedule is specified with two bits for every R22SDF stage, starting at the two MSBs. For example, a scaling schedule for N=256 could be [1 2 3 2], so the scaling values are stage1 = 1, stage2 = 2, stage3 = 3, stage4 = 2.
The loop engine has Radix-2 and Radix-4 architectures. Radix-2 produces a 1-bit increase in each stage and Radix-4 a 2-bit increase in each stage. For the loop engine the schedule is specified with two bits per stage starting at the two LSBs, so the rightmost value applies to stage 1. For N=64 Radix-2 the scaling schedule could be [1 2 2 1 1 1], so the scaling values are stage1 = 1, stage2 = 1, stage3 = 1, stage4 = 2, stage5 = 2, stage6 = 1. For N=64 Radix-4 the scaling schedule could be [3 2 1], so the scaling values are stage1 = 1, stage2 = 2, stage3 = 3.

2. Block Floating Point
This mode is available only for the loop engine. The core itself calculates the scaling factor required in each stage. The scaling value is found by a replicated Radix-2 or Radix-4 computation just before the data is stored into RAM. If an overflow is detected, the scaling value is calculated and passed on to the next stage, where the scaling takes place.
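The two bit orderings used for the fixed scaling schedule can be sketched as follows. The helper names `pack_listed`, `unpack_schedule` and `total_scale` are hypothetical and not part of the core or its C-Model. The sketch assumes a schedule written as [a b c ...] is packed with the leftmost value in the two MSBs, and that a stage value s means a division by 2^s (matching the stated factors 1, 2, 4 and 8).

```python
def pack_listed(values):
    """Pack a schedule as written in the text (leftmost value in the MSBs)."""
    word = 0
    for v in values:
        if not 0 <= v <= 3:
            raise ValueError("each stage value must fit in two bits")
        word = (word << 2) | v
    return word


def unpack_schedule(word, num_stages, msb_first):
    """Return per-stage values, stage 1 first.

    msb_first=True  : streaming (R22SDF), stage 1 in the two MSBs.
    msb_first=False : loop engine, stage 1 in the two LSBs.
    """
    fields = [(word >> (2 * i)) & 0b11 for i in range(num_stages)]
    return fields[::-1] if msb_first else fields


def total_scale(stage_values):
    """Overall division factor, assuming value s divides by 2**s."""
    return 2 ** sum(stage_values)


streaming_256 = unpack_schedule(pack_listed([1, 2, 3, 2]), 4, msb_first=True)
radix2_64 = unpack_schedule(pack_listed([1, 2, 2, 1, 1, 1]), 6, msb_first=False)
radix4_64 = unpack_schedule(pack_listed([3, 2, 1]), 3, msb_first=False)
```

With these assumptions the three examples from the text come out as [1, 2, 3, 2], [1, 1, 1, 2, 2, 1] and [1, 2, 3] for stage 1 onward.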
[Figure: loop engine block diagram. Labels: Start, Done, Data valid, Data Memory, Radix Computation, Tw_addr]
This block is the main controlling block for the entire FFT operation and implements the following key functions:
a) Controls the entire core
b) Generates the read and write addresses for the data
c) Generates the read address for fetching the phase factors
d) Indicates when the loading, computation and unloading stages occur
e) Generates the data valid and done signals
Data Memory
This block implements the following key functions:
a) Stores the input data given by the user
b) Stores the intermediate data after the Radix computation
c) Takes complex data as input
d) Uses block memory with a registered output
Radix Computation
This block implements the following key functions:
a) Contains the basic butterfly structure (Radix-2 or Radix-4)
b) Accepts complex data as input and gives out complex data
The Radix-2 architecture uses a Radix-2 butterfly structure for the FFT computation. It is the smallest-area implementation of the available architectures. Figure 2 shows the Radix-2 computation block.
[Figure 2: Radix-2 computation block. Labels: Twiddle ROM, Twos Complement, Memory Block RAM0, RAM1, Radix-2 Engine, Scaling, Radix-2 Data Reorder Block; signals: tw_rom adr_i, fd_inv_i, rd_addr_i, input_i, ip_sel_i, sel_0_i, sel_1_i, scale_value_i, op_sel_i, output_o]
[Figure: Radix-4 computation block. Labels: Twiddle ROM (3 factors), Twos Complement, Radix-4 Engine (R4(0) to R4(3)), Scaling, R4_reord(1) to R4_reord(3), RAM1, RAM2, RAM3, Output Mux; signals: input_i, ip_sel_i, rd_addr_i, scale_value_i, sel_0_o, sel_1_o, op_sel_i(1:0), output]
The diagram below shows the structure of the R22SDF architecture for FFT computation (for N=16 points). It is a conceptual block diagram only.
[Figure: R22SDF pipeline for N=16. Labels: 8, 4, 2, 1; BF1, BF2; s1, s2; t1, t2; W1(n)]
There are two structures, BF1 and BF2, for each stage of computation. The diagram below shows the structure of butterfly-1 and butterfly-2. The only difference between BF1 and BF2 is that BF2 has an additional multiplexer at the beginning, which multiplies the input by (-j) and selects the corresponding input for the butterfly computation.
Figure 5: Internal structure of butterfly structures 1 and 2
The number of multipliers required is the same as for the Radix-4 computation stages, and the number of adders required is the same as for the Radix-2 computation block. This structure is simple compared to a Radix-4 structure. The number of R22SDF stages is log4(N_point). Because the number of twiddle multipliers is log4(N_point), fewer resources are needed than for Radix-2, where the number of multipliers is log2(N_point), and less memory is required to store the twiddle factors.
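The stage counts quoted above can be tabulated with a short helper. This is an illustrative sketch: the function name `stage_counts` is hypothetical, and rounding log4(N_point) up for point sizes that are not powers of 4 is an assumption, not something the text states for this formula.

```python
def stage_counts(n_point):
    """Return Radix-2 and R22SDF stage counts for a power-of-2 point size."""
    log2n = n_point.bit_length() - 1
    if n_point < 2 or (1 << log2n) != n_point:
        raise ValueError("n_point must be a power of 2")
    return {
        "radix2_stages": log2n,             # log2(N_point)
        "r22sdf_stages": (log2n + 1) // 2,  # log4(N_point), rounded up (assumed)
    }


counts_16 = stage_counts(16)
counts_1024 = stage_counts(1024)
```

For N=16 this gives two R22SDF stages, each made of a BF1-BF2 pair, against four Radix-2 stages.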
[Figure: streaming FFT block diagram. Labels: input, Stage-1 (BF1-BF2), Stage-2 (BF1-BF2), Output Shuffling, output, Sel_line, addr, start, Ctrl_sig, Twi 1, Twi 2]
The main components are:
1) Address control generation
2) Stage computation block
3) Data memory
4) Twiddle memory
5) Output shuffling
Data memory
This block performs the following key functions:
a) Stores the input data and intermediate data
b) Provides the shift registers this architecture requires, implemented using FIFOs
c) Takes complex data as input
d) Uses block RAM or distributed RAM, depending on the threshold and depth
Twiddle memory
This block performs the following key functions:
a) Stores the twiddle factors generated through Euler's formula
b) At start-up the core is in the initialization stage, in which the phase factors are computed and stored in the block RAM
c) The user should not feed data during the initialization stage. Data should be fed only after the config_o port signal goes low
d) Uses block RAM or distributed RAM, depending on the threshold and depth
Output shuffling
a) Converts bit-reversed-order output data to natural-order output data
b) Includes an address generation block and dual-port memory
c) Is active only when the user selects natural-order output data
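The reordering performed by the output shuffling block can be modeled in software as a bit-reversal permutation. The sketch below is a behavioral illustration only; the names `bit_reverse` and `to_natural_order` are hypothetical, and the hardware uses address generation with a dual-port memory rather than a list copy.

```python
def bit_reverse(index, num_bits):
    """Reverse the lowest num_bits bits of index."""
    rev = 0
    for _ in range(num_bits):
        rev = (rev << 1) | (index & 1)
        index >>= 1
    return rev


def to_natural_order(frame):
    """Reorder a frame that arrives in bit-reversed order into natural order."""
    n = len(frame)
    num_bits = n.bit_length() - 1
    if n == 0 or (1 << num_bits) != n:
        raise ValueError("frame length must be a power of 2")
    natural = [None] * n
    for position, value in enumerate(frame):
        # The sample at output position p belongs at index bit_reverse(p).
        natural[bit_reverse(position, num_bits)] = value
    return natural


# Indices of an 8-point frame as they leave a bit-reversed-order pipeline.
reordered = to_natural_order([0, 4, 2, 6, 1, 5, 3, 7])
```

Because bit reversal is its own inverse, the same function also maps natural order back to bit-reversed order.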
Figure 7: FFT core symbol (FFT_v2_1). Inputs: clk_i, rst_ni, rom_clk_i, rom_rst_ni, ce_i, start_i, xn_re_i, xn_im_i, nfft_i, nfft_we_i, fwd_inv_i, fwd_inv_we_i, scale_sch_i, scale_sch_we_i. Outputs: xk_re_o, xk_im_o (width Bxk), xk_index_o, rfd_o, busy_o.
Port Interface
Table 3: Port Name (Width, Direction): Description
clk_i (1, Input): Rising-edge clock.
rst_ni (1, Input): Master asynchronous reset (Active High).
rom_clk_i (1, Input): Slower clock for the ViaROM block.
rom_rst_ni (1, Input): Reset signal for the ViaROM block.
ce_i (1, Input): Clock enable (Active High).
xn_re_i (B, Input): Input data bus: real component (B = 10 to 24) in twos complement format.
xn_im_i (B, Input): Input data bus: imaginary component (B = 10 to 24) in twos complement format.
start_i (Input): FFT start signal (Active High). START is asserted to begin the data loading and transform calculation (for Burst I/O architectures). For Streaming I/O, START begins data loading, which proceeds directly to transform calculation and then data unloading.
nfft_i (Input): Point size of the transform. This port specifies the N value that the user configures the core with; the point size is 2^nfft_i. If this port is zero, the smallest value supported by the architecture is selected.
nfft_we_i (Input): Write enable for nfft_i (Active High).
fwd_inv_i (Input): Control signal that indicates whether a forward FFT or an inverse FFT is performed. When FWD_INV=1, a forward transform is computed. When FWD_INV=0, an inverse transform is computed.
fwd_inv_we_i (Input): Write enable for fwd_inv_i (Active High).
scale_sch_i (Input; width 2 x ceil(number_of_stages/2) for Streaming I/O and the Radix-4 loop engine, 2 x ceil(number_of_stages) for the Radix-2 loop engine): Scaling schedule. For streaming, the schedule is specified with two bits for each R22SDF stage, starting at the two MSBs. Example: a scaling schedule for N=256 could be [1 2 3 2], so the scaling values are stage1 = 1, stage2 = 2, stage3 = 3, stage4 = 2. For the loop engine, the schedule is specified with two bits for each stage, starting at the two LSBs. Example: for N=64 Radix-4 the scaling schedule could be [3 2 1], so the scaling values are stage1 = 1, stage2 = 2, stage3 = 3.
scale_sch_we_i (1, Input): Write enable for SCALE_SCH (Active High). This port is available only with scaled arithmetic and not with full precision.
**Note: the 1-bit port unload_i is present when the user chooses the loop engine architecture. When this port is high the output is in natural order; when it is low the output is in bit-reversed order.
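The relationship between nfft_i, the point size and the width of scale_sch_i can be worked through with a small helper. This is a sketch under stated assumptions: the function names and architecture labels are hypothetical, and number_of_stages is taken to be log2 of the point size, which is consistent with the schedule examples in the table.

```python
def point_size(nfft):
    """Point size selected by the nfft_i value: N = 2**nfft_i."""
    return 1 << nfft


def scale_sch_width(n_point, architecture):
    """Width in bits of scale_sch_i for a given point size.

    architecture is one of "streaming", "radix4_loop", "radix2_loop"
    (labels chosen for this sketch only).
    """
    stages = n_point.bit_length() - 1  # assumed: number_of_stages = log2(N)
    if n_point < 2 or (1 << stages) != n_point:
        raise ValueError("n_point must be a power of 2")
    if architecture in ("streaming", "radix4_loop"):
        return 2 * ((stages + 1) // 2)  # 2 x ceil(number_of_stages / 2)
    if architecture == "radix2_loop":
        return 2 * stages               # two bits per Radix-2 stage
    raise ValueError("unknown architecture: " + architecture)


width_streaming_256 = scale_sch_width(256, "streaming")
width_radix2_64 = scale_sch_width(64, "radix2_loop")
width_radix4_64 = scale_sch_width(64, "radix4_loop")
```

These widths correspond to the four, six and three two-bit fields of the example schedules [1 2 3 2], [1 2 2 1 1 1] and [3 2 1].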
xk_index_o (Output): Index of output data.
rfd_o (Output): Ready for data (Active High). RFD is High during the load operation.
busy_o (Output): Core activity indicator (Active High).
dv_o (Output): Data valid (Active High). This signal is high when valid data is presented at the output.
done_o (Output): FFT complete strobe (Active High). DONE transitions High for one clock cycle when the transform calculation has completed.
ovflo_o (Output): Arithmetic overflow indicator (Active High). ovflo_o is High during result unloading if any value in the data frame overflowed. The ovflo_o signal is reset at the beginning of a new frame of data. This port is optional and only available with scaled arithmetic.
config_o (1, Output): Indicates that the core is still in the configuration stage (that is, the core is still evaluating the phase factors). No input shall be fed into the core until this signal goes low.
blk_exp_o** (Output): Indicates how many bits are scaled in each stage for the given frame.
**Note: blk_exp_o is present only when the user chooses the loop engine architecture.
[Timing diagram figures. Signals shown: n_fft_i, n_fft_we_i, fwd_inv_i, fwd_inv_we_i, scale_sch_i, scale_sch_we_i, start_i, rfd_i, xn_index_i, x(n), busy_o, done_o, dv_o, xk_index_o, X(k)]
Figure 11 shows the signals to note when feeding data into the FFT with a streaming interface.
1) config_o should be low before feeding the data (i.e., before the start pulse is asserted).
2) The run-time configuration signals should be asserted before the start pulse.
3) After the start pulse is asserted, the rfd_o signal goes high one clock cycle later.
4) The user should feed data on the next positive clock edge after getting the index.
[Further timing diagram figures. Signals shown: n_fft_i, n_fft_we_i, fwd_inv_i, fwd_inv_we_i, scale_sch_i, scale_sch_we_i, start_i, rfd_o, xn_index_o, x(n), busy_o, done_o, dv_o, xk_index_o, X(k)]
Parameter Name (Value): Description
data_width (10 to 24*): Real = data_width, Imag = data_width.
phase_width (10 to 24*): Real = phase_width, Imag = phase_width.
N_POINT: Specifies the Fourier transform length in powers of 2 (16, 32, ...).
STATIC_IFFT (Define): Configures the core to compute only the iFFT.
OUTPUT_ORDER (Define): Define when the output data is wanted in natural order. If NOT defined, the output is in bit-reversed order.
SCALING** (Define): If defined, the core is configured for the scaled version. If NOT defined, the core is the unscaled version.
DYN_FFT_IFFT (Define): Makes the core run-time configurable for FFT/iFFT computation.
NX (Define): If defined, the core works for the Nextreme device. If NOT defined, the core works for the N2X device (Nextreme-2).
RUN_TIME_N_CONFIG (Define): Makes the FFT core run-time configurable.
BLK_FLT_POINT*** (Define): Configures the core for block floating point.
THRESHOLD_MEM_FF# (User discretion): Specifies the limit for choosing flip-flops (shift registers) or memory elements: if (depth <= threshold) flip-flops are chosen, else a memory element.
THRESHOLD_BRAM_REGFILE# (User discretion): Specifies the limit for choosing BRAM or register files: if (depth <= threshold) register files are chosen, else BRAM.
* Note: For the loop engine the data width and phase width range from 8 to 18.
** Note: Define SCALING is only present in the streaming FFT.
*** Note: Present in the loop engine architecture only.
# Note: Present in the streaming architecture only.
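The two threshold parameters can be read as a two-level decision on the depth of each stage's delay memory. The sketch below illustrates that reading; the function name `select_storage` is hypothetical, and the way the two comparisons are chained (flip-flops first, then register file versus BRAM) is an assumption rather than something the table spells out.

```python
def select_storage(depth, threshold_mem_ff, threshold_bram_regfile):
    """Pick a storage type for a delay line of the given depth."""
    if depth <= threshold_mem_ff:
        return "flip-flops"      # shift registers built from flip-flops
    if depth <= threshold_bram_regfile:
        return "register file"   # small memory element
    return "BRAM"                # large memory element


# Example thresholds, chosen here only for illustration.
choices = {depth: select_storage(depth, 2, 16) for depth in (1, 2, 8, 16, 512)}
```

Raising THRESHOLD_MEM_FF trades flip-flops for memory elements on the shallow stages, while THRESHOLD_BRAM_REGFILE decides where block RAM takes over on the deep ones.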
//`define STATIC_IFFT
//`define DYN_FFT_IFFT
`define NX
//`define RUN_TIME_N_CONFIG
`define OUTPUT_ORDER
`define SCALING
`define THRESHOLD_MEM_FF
`define THRESHOLD_BRAM_REGFILE 16
The FFT IP Core has a configuration file named fftv2_config.vh (for the Streaming architecture) or fft_config.vh (for the Loop engine architecture). In this file the user can set the USER CONFIGURABLE PARAMETERS/DEFINES and should not touch the other parameters, which are testcase specific. The prefix fftv2_ marks a file as specific to the streaming FFT core, and the prefix fft_ marks it as a Loop engine configuration. For synthesis, include the configuration file (for example fftv2_config.vh) and change the user-specific parameters. The allowable range is given in the comments of the configuration file and in the parameter table.
The parameters THRESHOLD_MEM_FF and THRESHOLD_BRAM_REGFILE are specific to the streaming FFT core. They decide whether memory is used and which type of memory is used for each stage. The BLK_FLT_POINT parameter is specific to the Loop engine and enables dynamic scaling according to the input values.
The following diagram shows how multiple instances of the FFT core are connected.
[Figure: multi-core FFT block diagram. Labels: ViaROM Wrapper, two "FFT core modified" instances; signals: phase_w, rd_addr_viarom_w, twi_wr_addr_w, twi_wr_en_w, config_o, rst_ni, xn0_re_i, xn0_im_i, xk0_re_i, xk0_im_i, fwd_inv_0i, scale_sch_0i, Nfft0_i, xn1_re_i, xn1_im_i, xk1_re_i, xk1_im_i, fwd_inv_1i, scale_sch_1i, Nfft1_i]
The following pin-out changes for the multi-core FFT.
Single core: clk_i, rst_ni, rom_clk_i, rom_rst_ni, ce_i, xn_re_i, xn_im_i, start_i, unload_i, nfft_i, nfft_we_i, fwd_inv_i, fwd_inv_we_i, scale_sch_i, scale_sch_we_i, config_o, xk_re_o, xk_im_o, xn_index_o, xk_index_o, rfd_o, busy_o, dv_o, done_o, blk_exp_o, ovflo_o
Multi core: clk_i, rst_ni, rom_clk_i, rom_rst_ni, ce_i, xn_re_i, xn_im_i, start_i, unload_i, nfft_i, nfft_we_i, fwd_inv_i, fwd_inv_we_i, scale_sch_i, config_o, xk_re_o, xk_im_o, xn_index_o, xk_index_o, rfd_o, busy_o, dv_o, done_o, blk_exp_o, ovflo_o. In the multi-core version, 19 of these ports carry an additional [No of Inst] dimension.
The top module name changes when multiple instances are used:
Radix-2 Loop engine: fft_multi_core_r2_top_rtl
Radix-4 Loop engine: fft_multi_core_r4_top_rtl
Streaming Even: fftv2_even_top_multi_core_viarom_rtl
Streaming Odd: fftv2_odd_top_multi_core_viarom_rtl
Streaming Run Even: fftv2_run_even_top_multi_core_viarom_rtl
Streaming Run Odd: fftv2_run_odd_top_multi_core_viarom_rtl
NOTE: In the config file, the parameter NO_OF_CHANNELS defines the number of FFT core instances present in the module.
Bit-Accurate C Model
The C Model is designed for bit-accurate modelling of the FFT core. The model produces exactly the same result as the Verilog implementation of the FFT core. Note that the C-model is not cycle accurate and does not model interface or clock latency. The files provided with the C-Model are:
1. fftv2_r22sdf_cmodel.c - The complete C-Model
2. fftv2_inter_parameter.h - Internal parameters required for the IP
3. fftv2_user_defines.h - User parameters for the FFT
System Requirements
GCC or Microsoft Visual C++ 8.0 or greater is required to use the C-Model. The C-model has been tested using GCC in the Linux environment and Microsoft Visual Studio 8.0.
NOTE: The FFT V2.1 C-model code has been tested in a 32-bit environment (Win XP operating system). The model's functionality has been tested for configurations in which the intermediate results do not exceed 32 bits (configurations with data width = 8 bits, 9 bits and 10 bits). In MS-VC++, the model has not been tested for cases where the internal or final result is wider than 32 bits.
User Defines
Table 5: User Parameters for the C Model
Parameter Name: Description
N_POINT: The point size of a transform.
DATA_WIDTH: FFT data width (real data width only).
PHASE_WIDTH: FFT phase width (real data width only).
NO_OF_FRAMES: Number of frames.
F1_N_POINT: First frame N_POINT value.
F2_N_POINT: Second frame N_POINT value. Depending on the NO_OF_FRAMES value, that many F*_N_POINT defines are needed.
F1_FWD_INV: First frame transform value.
F2_FWD_INV: Second frame transform value. Forward transform value = 1, inverse transform value = 0.
F1_SCA_VAL: First frame scaling value.
F2_SCA_VAL: Second frame scaling value. This value indicates the scaling after each stage.
SCALING_EN: Whether to include scaling or not. If it is 1, scaling is enabled. If it is 0, scaling is disabled and default scaling is applied.
BLK_FLT_POINT: Enables the block floating point scaling feature.
STATIC_IFFT: When STATIC_IFFT = 0, negates the imaginary values of the phase factors to calculate the iFFT in a multi-frame transform.
DYN_FFT_IFFT: When DYN_FFT_IFFT is defined and the transform == 0, the twiddle factors are negated. This allows the forward and inverse transforms to be calculated at run time.
input: Input file name.
output: Output file name.
EN_STAGE_RESULT: Prints intermediate stage results.
PRINT_TWIDDLE: Prints phase factor values.
F_N_POINT: Array of the F*_N_POINT values.
F_FWD_INV: Array of the F*_FWD_INV values.
F_SCA_VAL: Array of the F*_SCA_VAL values.
In "fft_user_defines.h" the character array "input" specifies the input file name. This file should contain the input data to be transformed. The data should be in decimal format, with the real and imaginary values of each sample separated by a space. An example is shown below:
+19783 +47534
+61825 +16308
+118822 +43074
+96314 +117995
The left column is the real part and the right column is the imaginary part.
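The input file layout described above (one sample per line, real part then imaginary part, both in signed decimal) can be checked before running the C-Model with a small reader. This is a sketch only; the function name `parse_samples` is hypothetical and is not part of the release package.

```python
def parse_samples(text):
    """Parse 'real imag' decimal pairs, one sample per line."""
    samples = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 'real imag'")
        # int() accepts an explicit leading '+' or '-' sign.
        samples.append((int(fields[0]), int(fields[1])))
    return samples


example = "+19783 +47534\n+61825 +16308\n+118822 +43074\n+96314 +117995\n"
parsed = parse_samples(example)
```

A line with a missing or extra column raises a ValueError that names the offending line, which is easier to diagnose than a mismatch in the transform output.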
3. Select the project in the Solution Explorer, click on the Properties option and set "Working Directory" to the location where the "input.txt" file is present.
"output.txt" file will be generated in the "Working Directory" specified above
Figure 25: Slot Noise comparison between eASIC FFT and Floating Pt streaming FFT for 16 bit 1024 pt FFT unscaled
Figure 26: Slot Noise comparison between eASIC FFT and Floating Pt FFT for 16 bit 1024 pt FFT Scaled by 32
Figure 27: Slot Noise comparison between eASIC FFT and Floating Pt FFT for 16 bit 1024 pt FFT Radix-2 scaled version. The scaling value used was [1 1 1 1 ]. Hence the stages were alternately scaled.
Figure 28: Slot Noise comparison between eASIC FFT and Floating Pt FFT for 16 bit 1024 pt FFT Radix-2 block floating point version
Figure 29: Slot Noise comparison between eASIC FFT and Floating Pt FFT for 16 bit 1024 pt FFT Radix4 scaled version
Figure 30: Slot Noise comparison between eASIC FFT and Floating Pt FFT for 16 bit 1024 pt FFT Radix4 block floating point version
Directory Structure
Figure 277 shows the directory structure after unpacking the release package. Make sure the directory structure is correct before using the core:
indicate whether any value mismatch has occurred. The report does not contain the reason for the failure or a dump of the DUT.
a. Go to the ./scripts/functional_NX2/run_all_viarom.do file.
b. Look for quietly set TESTCASE {<testcase_name>}. This holds the list of testcases to run.
c. Set TESTCASE to the particular testcase that needs to be run.
d. After the modification, follow the steps mentioned above for compilation and simulation of the core.
Script Descriptions
The scripts to run a testcase are available in the folder data\dsp_cores\fft_core\SIMULATION\Streaming\scripts\functional_NX2. The following files are used for the streaming viarom (EASIC) configurations:
1) compile_all_viarom.do // Compilation of R22SDF architecture source & TB files
2) compile_lib_NX2.do // Compilation of NX2 device library source files
3) run_all_viarom.do // Loading the design & running the simulation of R22SDF architecture files
4) fft_gen_verdict_rpt_viarom.tcl // Generation of final verdict
Memory models are also present under the NX2 folder.
compile_all_viarom.do
This script compiles the source files and test bench files required for the R22SDF streaming architecture. The test bench files are under the directory \data\dsp_cores\fft_core\TESTBENCH\Streaming. The RTL and RTL header files are under the directory \data\dsp_cores\fft_core\RTL\Streaming_viarom.
compile_lib_NX2.do
This script compiles the library files for the required memory model. Memory-related files for NX2 are available under the folder \data\dsp_cores\fft_core\SIMULATION\Streaming\scripts\functional_NX2\NX2.
run_all_viarom.do
This script creates the work directory when the user runs it for the 1st time. It calls compile_lib_NX2.do to compile the memory-related library files and compile_all_viarom.do to compile the source and TB files. To run a regression or a particular testcase, follow the procedure given in the section above. This script is for simulation of the streaming architecture. It internally calls the fft_gen_verdict_rpt_viarom.tcl script, which produces the verdict report giving the status of the testcase. The report also provides the time stamp of each test case.
Note: There is a variable called FIRST_RUN. Confirm that it is set to 1 when running for the first time. This ensures that the work directory is created and the libraries are compiled before the RTL files are compiled.
NOTE: The streaming configuration is also available for the altera version. The script files used are listed below and are present in the same folder as before. The following files are used for the streaming viarom altera configurations:
1) compile_all_altera.do // Compilation of R22SDF architecture source & TB files
2) compile_lib_NX2.do // Compilation of NX2 device library source files
3) run_all_altera.do // Loading the design & running the simulation of R22SDF architecture files
Consider running tc_fftv2_64. As per the testcase register .xls, tc_fftv2_64 has the following configuration:
1) Run time configurable
2) N point = 1024
3) Dynamic FFT/iFFT
4) Data width = 16 (Real = 16 & Imag = 16)
5) Phase width = 16 (Real = 16 & Imag = 16)
6) Output order: natural
7) Scaled version
Hence the fftv2_config.vh file under the directory \data\dsp_cores\fft_core\INTERFACE\Streaming\NX2\tc_fftv2_64\, which is the configuration file for the core, should have the following definitions:
`define DATA_WIDTH 16
`define PHASE_WIDTH 16
`define N_POINT 1024
//`define STATIC_IFFT
`define DYN_FFT_IFFT
//`define NX
`define RUN_TIME_N_CONFIG
`define OUTPUT_ORDER
`define SCALING
`define THRESHOLD_MEM_FF 2
`define THRESHOLD_BRAM_REGFILE 16
In addition, the user can also define the clock period and duty cycle. This completes the FFT core configuration. The next step is to change the testcase name in run_all_viarom.do and run the simulation.
In the run_all_viarom.do file, which is under the folder \data\dsp_cores\fft_core\SIMULATION\Streaming\scripts\functional_NX2, set the test case name to be run as follows and save the file.
quietly set TESTCASE {" tc_fftv2_64 "}
Run the script using the command
do ./Streaming/scripts/functional_NX2/run_all_viarom.do
After the simulation is complete, the output report file can be viewed for the final results. The report is available in the file tc_fftv2_64_nx2_1.rpt under the folder \data\dsp_cores\fftv2.0\TESTBENCH\Streaming\simdata\tc_fftv2_64\report\
in the folder \data\dsp_cores\fft_core\TEST\
Loop_engine\Radix2\functional_NX2
9. mismatch has occurred, if any. The report does not contain the reason for the failure or a dump of the DUT. A detailed report of each test case is present in \data\dsp_cores\fft_core\TESTBENCH\Loop_engine\simdata\<testcase>\report\<testcase>_nx2.rpt for the N2X device.
Script Descriptions
The scripts to run a testcase are available in the folder data\dsp_cores\fft_core\SIMULATION\Loop_engine\Radix2\scripts\functional_NX2. The following files are used for the looping radix 2 (EASIC) configurations:
1) compile_all_viarom.do // Compilation of loop engine Radix2 architecture source & TB files
2) compile_lib_NX2.do // Compilation of NX2 device library source files
3) run_all_viarom.do // Loading the design & running the simulation of loop engine Radix2 architecture files
4) fft_gen_verdict_rpt_viarom.tcl // Generation of final verdict
NOTE: The script files for running the altera version are also available in the folder data\dsp_cores\fft_core\SIMULATION\Loop_engine\Radix2\scripts\functional_NX2. Running the altera test cases is similar to running the streaming test cases.
compile_all_viarom.do
This script compiles the source files and test bench files required for the loop engine Radix2 architecture. The test bench files are under the directory \data\dsp_cores\fft_core\TESTBENCH\Loop_engine. The RTL and RTL header files are under the directory \data\dsp_cores\fft_core\RTL\Loop_engine_viarom\r2_rtl for radix 2 and \data\dsp_cores\fft_core\RTL\Loop_engine_viarom\r4_rtl for radix 4.
compile_lib_NX2.do
This script compiles the library files for the required memory model. Memory-related files for NX2 are available under \data\dsp_cores\fft_core\SIMULATION\Loop_engine\Radix2\scripts\functional_NX2\NX2.
run_all_viarom.do
This script creates the work directory when the user runs it for the 1st time. It calls compile_lib_NX2.do to compile the memory-related library files and compile_all_viarom.do to compile the source and TB files. To run a regression or a particular testcase, follow the procedure given in the section above. This script is for simulation of the loop engine Radix2 architecture.
The above script internally calls the fft_gen_verdict_rpt_viarom.tcl script, which generates the verdict report giving the status of each testcase. The report also provides the time stamp of each test case.
Note: There is a variable called FIRST_RUN. Confirm that it is set to 1 when running for the first time. This ensures that the work directory is created and the libraries are compiled before the RTL files are compiled.
NOTE: The looping configuration is also available for the Altera version. The script files used are listed below and are present in the same folder as before. The following files are used for the looping Radix2 Altera configurations:
1) compile_all_altera.do // Compilation of loop engine architecture source & TB files
2) compile_lib_NX2.do // Compilation of NX2 device libraries source files
3) run_all_altera.do // Loading the design & running the simulation
Let us consider that TV_FFT_34 is to be run. As per the testcase register .xls, TV_FFT_34 has the following configuration:
1) Run time configurable
2) N point = 8192
3) Static FFT
4) Data width = 16 (Real = 16 & Imag = 16)
5) Phase width = 16 (Real = 16 & Imag = 16)
6) Input order: Natural
7) Scaled version
Hence, the fft_config.vh file under the directory \data\dsp_cores\fft_core\INTERFACE\Loop_engine\NX2\TV_FFT_34\, which is the configuration file for the core, should have the following definitions (5 frames):
    `define DATA_WIDTH 16
    `define PHASE_WIDTH 16
    `define N_POINT 8192
    //`define STATIC_IFFT
    //`define DYN_FFT_IFFT
    //`define NX
    //`define RUN_TIME_N_CONFIG
    `define INPUT_ORDER
In addition to this, the user can also define the clock period & duty cycle.
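The definitions above follow a simple pattern: macros with a value, and flag macros that are either defined or commented out. As a rough sketch only, a header of this shape could be generated from a script. The helper name render_fft_config and its arguments are hypothetical and are not part of the eASIC release; the macro names and values are the ones listed above for TV_FFT_34.

```python
def render_fft_config(values, flags):
    """Render `define lines for a Verilog header such as fft_config.vh.

    values: dict of macro name -> value, emitted as "`define NAME value"
    flags:  dict of macro name -> bool (True: defined, False: commented out)
    Hypothetical helper for illustration; not part of the release.
    """
    lines = []
    for name, value in values.items():
        lines.append(f"`define {name} {value}")
    for name, enabled in flags.items():
        prefix = "" if enabled else "//"
        lines.append(f"{prefix}`define {name}")
    return "\n".join(lines) + "\n"


# Settings listed above for TV_FFT_34
tv_fft_34 = render_fft_config(
    values={"DATA_WIDTH": 16, "PHASE_WIDTH": 16, "N_POINT": 8192},
    flags={
        "STATIC_IFFT": False,
        "DYN_FFT_IFFT": False,
        "NX": False,
        "RUN_TIME_N_CONFIG": False,
        "INPUT_ORDER": True,
    },
)
print(tv_fft_34)
```

The printed text would still need to be reviewed against the header shipped with the testcase before use, since any additional macros in the real file are not described here.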
This completes the FFT core configuration. The next step is to set the testcase name in run_all_viarom.do and run the simulation. In the run_all_viarom.do file, which is under the folder \data\dsp_cores\fft_core\SIMULATION\Loop_engine\Radix2\scripts\functional_NX2, set the test case name to be run as follows and save the file:
    quietly set TESTCASE {" TV_FFT_34"}
Run the script using the command
    do ./Loop_engine/Radix2/scripts/functional_NX2/run_all_viarom.do
After the simulation is complete, the output report file can be viewed for final results. The report is available in the file TV_FFT_34_nx2.rpt under the folder \data\dsp_cores\fftv2.0\TESTBENCH\Loop_engine\simdata\TV_FFT_34\report\
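Editing the TESTCASE line by hand is easy to get wrong when many testcases are run in turn. The following is a minimal sketch, assuming the run script contains exactly one assignment in the form shown above; the function set_testcase and the starting name TV_FFT_01 are hypothetical and used only for illustration. The spacing inside the braces is preserved exactly as found.

```python
import re

# Matches the TESTCASE assignment in the form shown above.
# Group 1 and group 3 keep the original spacing and quoting untouched.
_TESTCASE_RE = re.compile(r'(quietly\s+set\s+TESTCASE\s+\{"\s*)([^"\s]+)(\s*"\})')


def set_testcase(script_text, name):
    """Return script_text with the testcase name replaced by `name`."""
    new_text, count = _TESTCASE_RE.subn(
        lambda m: m.group(1) + name + m.group(3), script_text
    )
    if count != 1:
        raise ValueError(f"expected exactly one TESTCASE line, found {count}")
    return new_text


# TV_FFT_01 is a placeholder name for the value being replaced.
example = 'quietly set TESTCASE {" TV_FFT_01"}\n'
print(set_testcase(example, "TV_FFT_34"), end="")
```

Raising an error when the line is missing or duplicated is a deliberate choice here: a silent no-op would leave the previous testcase selected.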
\data\dsp_cores\fft_core\SIMULATION\Streaming\scripts\functional_NX2 for N2X device.
3. The procedure for the N2X device is explained here. Open the file \data\dsp_cores\fft_core\SIMULATION\Streaming\scripts\functional_NX2\run_all_viarom_multi_core.do and set the variable ELIB_NX2 in that script to the path where the files below are available. These files are available in \data\dsp_cores\fft_core\SIMULATION\Streaming\scripts\functional_NX2\NX2
4. Once the directory is set, go to the Modelsim command/transcript window and type
    do ./Streaming/scripts/functional_NX2/run_all_viarom_multi_core.do
Modelsim calls the macro run_all_viarom_multi_core.do and executes its commands.
5. Open the simulation script run_all_viarom_multi_core.do, which is located in the folder \data\dsp_cores\fft_core\SIMULATION\Streaming\scripts\functional_NX2\, in any text editor. This file has commands to run each of the test cases. Test case names are assigned to the TESTCASE variable in the script. The configuration used for a particular test case is available in the fftv2_config.vh file under the folder \data\dsp_cores\fft_core\INTERFACE\multi_core_Streaming\NX2\<testcase_name>
6. Once the simulation for a test case is finished, the message **** Simulation End **** is displayed in the Modelsim command/transcript window.
7. The verdict is generated by a script that is included in the run_all_viarom_multi_core.do script itself.
8. The final report of the test case status (PASS/FAIL) is present in the verdict report verdict.rpt in the folder \data\dsp_cores\fft_core\TEST\multi_core_streaming\functional_NX2. The verdict indicates whether any value mismatch has occurred. The report does not contain the reason for the failure or a dump of the DUT.
9. The detailed report of each test case is present in \data\dsp_cores\fft_core\TESTBENCH\multi_core_streaming\simdata\<testcase>\report\<testcase>_nx2_XXX.rpt for the N2X device.
Script Descriptions
The scripts to run a testcase are available in the folder data\dsp_cores\fft_core\SIMULATION\Streaming\scripts\functional_NX2. The following files are used for the streaming viarom (EASIC) configurations:
1) compile_all_viarom_multi_core.do // Compilation of R22SDF architecture source & TB files
2) compile_lib_NX2.do // Compilation of NX2 device libraries source files
3) run_all_viarom_multi_core.do // Loading the design & running the simulation of R22SDF architecture files
Note: All these scripts are similar to the scripts used for streaming test cases and they are explained above.
An Example Testcase
Select a testcase to be run. Let us consider that tc_fftv2_64 is to be run. As per the testcase register .xls, tc_fftv2_64 has the following configuration:
1) Multichannel test case (10 channels)
2) Run time configurable
3) N point = 1024
4) Dynamic FFT/IFFT
5) Data width = 16 (Real = 16 & Imag = 16)
6) Phase width = 16 (Real = 16 & Imag = 16)
7) Output order: Natural
8) Scaled version
Hence, the fftv2_config.vh file under the directory \data\dsp_cores\fft_core\INTERFACE\multi_core_streaming\NX2\tc_fftv2_64\, which is the configuration file for the core, should have the following definitions (5 frames):
    `define DATA_WIDTH 16
    `define PHASE_WIDTH 16
    `define N_POINT 1024
    //`define STATIC_IFFT
    `define DYN_FFT_IFFT
    //`define NX
    `define RUN_TIME_N_CONFIG
    `define OUTPUT_ORDER
    `define SCALING
In addition to this, the user can also define the clock period & duty cycle.
This completes the FFT core configuration. The next step is to set the testcase name in the simulation script and run it. In the run_all_viarom_multi_core.do file, which is under the folder \data\dsp_cores\fft_core\SIMULATION\Streaming\scripts\functional_NX2\NX2, set the test case name to be run as follows and save the file:
    quietly set TESTCASE {" tc_fftv2_64"}
Run the script using the command
    do ./Streaming/scripts/functional_NX2/run_all_viarom_multi_core.do
After the simulation is complete, the output report file can be viewed for final results. The file verdict.rpt is available in the folder \data\dsp_cores\fft_core\TEST\multi_core_streaming\functional_NX2. The verdict indicates whether any value mismatch has occurred. The report does not contain the reason for the failure or a dump of the DUT.
Details regarding the RTL files can be found in the release document fft_release_notes_2.1.doc. The RTL files can be found in the folder \data\dsp_cores\fft_core\RTL\Multi_core_streaming and the test bench files are in the folder \data\dsp_cores\fft_core\TESTBENCH\multi_core_streaming
NOTE: The multicore streaming configuration is also available for the Altera version. The script files used are listed below and are present in the folder \data\dsp_cores\fft_core\SIMULATION\Streaming\scripts\functional_NX2\NX2. The RTL files are present in the folder \data\dsp_cores\fft_core\RTL\Altera_Multi_core_streaming. The following script files are used:
1. compile_all_altera_multi_core.do // Compilation of R22SDF architecture source & TB files
2. compile_lib_NX2.do // Compilation of NX2 device libraries source files
3. run_all_altera_multi_core.do // Loading the design & running the simulation of R22SDF architecture files
Compile & Simulate the Design (Multi Core Loop engine radix 2)
The following steps are required for running the provided testcases (for Multi Core Loop engine radix 2 simulation):
1. Before you start simulation, ensure that the Modelsim present working directory is set to the \data\dsp_cores\fft_core\SIMULATION folder.
2. The scripts for the N2X device are in the folder \data\dsp_cores\fft_core\SIMULATION\Loop_engine\Radix2\scripts\functional_NX2.
3. The procedure for the N2X device is explained here. Open the file \data\dsp_cores\fft_core\SIMULATION\Loop_engine\Radix2\scripts\functional_NX2\run_all_multicore_viarom.do and set the variable ELIB_NX2 in that script to the path where the files below are available. These files are available in \data\dsp_cores\fft_core\SIMULATION\Loop_engine\Radix2\scripts\functional_NX2\NX2
4. Once the directory is set, go to the Modelsim command/transcript window and type
    do ./Loop_engine/Radix2/scripts/functional_NX2/run_all_multicore_viarom.do
Modelsim calls the macro run_all_multicore_viarom.do and executes its commands.
5. Open the simulation script run_all_multicore_viarom.do, which is located in the folder \data\dsp_cores\fft_core\SIMULATION\Loop_engine\Radix2\scripts\functional_NX2\, in any text editor. This file has commands to run each of the test cases. Test case names are assigned to the TESTCASE variable in the script. The configuration used for a particular test case is available in the fft_config.vh file under the folder \data\dsp_cores\fft_core\INTERFACE\multi_core_Loop_engine\NX2\<testcase_name>.
6. Once the simulation for a test case is finished, the message **** Simulation End **** is displayed in the Modelsim command/transcript window.
7. The verdict is generated by a script that is included in the run_all_multicore_viarom.do script itself.
8. The final report of the test case status (PASS/FAIL) is present in the verdict report multi_core_Loop_engine_verdict.rpt in the folder \data\dsp_cores\fft_core\TEST\multi_core_Loop_engine\Radix2\functional_NX2. The verdict indicates whether any value mismatch has occurred. The report does not contain the reason for the failure or a dump of the DUT.
9. The detailed report of each test case is present in \data\dsp_cores\fft_core\TESTBENCH\multi_core_Loop_engine\simdata\<testcase>\report\<testcase>XX_nx2.rpt for the N2X device.
Script Descriptions
The scripts to run a testcase are available in the folder data\dsp_cores\fft_core\SIMULATION\Loop_engine\Radix2\scripts\functional_NX2. The following files are used for the multicore loop engine (EASIC) configuration:
1) compile_all_multicore_viarom.do // Compilation of Loop engine radix 2 architecture source & TB
2) compile_lib_NX2.do // Compilation of NX2 device libraries source files
3) run_all_multicore_viarom.do // Loading the design & running the simulation of Loop engine radix 2 architecture files
4) fft_gen_verdict_rpt_multicore_viarom.tcl // Generation of final verdict
All these scripts are similar to the scripts used for loop engine test cases and they are explained above.
NOTE: The multicore looping configuration is also available for the Altera version. The script files used are listed below and are present in the same folder as before. The following files are used for the multicore loop engine (Altera) configuration:
1) compile_all_viarom_multicore_altera.do // Compilation of Loop engine radix 2 architecture source & TB
2) compile_lib_NX2.do // Compilation of NX2 device libraries source files
3) run_all_viarom_multicore_altera.do // Loading the design & running the simulation of Loop engine radix 2 architecture files
4) fft_gen_verdict_rpt_multicore_viarom.tcl // Generation of final verdict
An Example Testcase
Select a testcase to be run. Let us consider that TV_FFT_36 is to be run. As per the testcase register .xls, TV_FFT_36 has the following configuration:
1) Multichannel test case (2 channels)
2) Run time configurable
3) N point = 8192
4) Dynamic FFT/IFFT
5) Data width = 16 (Real = 16 & Imag = 16)
6) Phase width = 16 (Real = 16 & Imag = 16)
7) Input order: Natural
8) Block floating point
Hence, the fft_config.vh file under the directory \data\dsp_cores\fft_core\INTERFACE\multi_core_Loop_engine\NX2\TV_FFT_36\, which is the configuration file for the core, should have the following definitions:
    `define DATA_WIDTH 16
    `define PHASE_WIDTH 16
    `define N_POINT 8192
    //`define STATIC_IFFT
    `define DYN_FFT_IFFT
    //`define NX
    `define RUN_TIME_N_CONFIG
    `define INPUT_ORDER
    `define BLK_FLT_POINT
In addition to this, the user can also define the clock period & duty cycle.
This completes the FFT core configuration. The next step is to set the testcase name in run_all_multicore_viarom.do and run the simulation. In the run_all_multicore_viarom.do file, which is under the folder \data\dsp_cores\fft_core\SIMULATION\Loop_engine\Radix2\scripts\functional_NX2\, set the test case name to be run as follows and save the file:
    quietly set TESTCASE {" TV_FFT_36"}
Run the script using the command
    do ./Loop_engine/Radix2/scripts/functional_NX2/run_all_multicore_viarom.do
After the simulation is complete, the output report file can be viewed for final results. The file verdict.rpt is available in the folder \data\dsp_cores\fft_core\TEST\multi_core_Loop_engine\Radix2\functional_NX2. The verdict indicates whether any value mismatch has occurred. The report does not contain the reason for the failure or a dump of the DUT.
NOTE: Details regarding the RTL files can be found in the release document fft_release_notes_2.1.doc. The RTL files can be found in the folder
    \data\dsp_cores\fft_core\RTL\Multi_core_Loop_engine\r2_rtl for radix 2
    \data\dsp_cores\fft_core\RTL\Multi_core_Loop_engine\r2_rtl for radix 4
and the test bench files are in the folder \data\dsp_cores\fft_core\TESTBENCH\multi_core_Loop_engine
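The N point value used in this example (8192) can be checked against the limits quoted in the Features list: point sizes from 64 to 16K in powers of 2, with the Radix-4 loop engine restricted to powers of 4. The sketch below is illustrative only; the function names are hypothetical, 16K is taken to mean 16384, and the check models only the loop engine radix restriction stated in the Features list.

```python
def is_power_of(n, base):
    """True if n == base**k for some integer k >= 0."""
    if n < 1:
        return False
    while n % base == 0:
        n //= base
    return n == 1


def n_point_supported(n, radix):
    """Check N against the quoted range (64 to 16K) for a loop engine radix.

    Assumption: 16K means 16384; radix 2 accepts powers of 2,
    radix 4 accepts only powers of 4.
    """
    if radix not in (2, 4):
        raise ValueError("radix must be 2 or 4")
    in_range = 64 <= n <= 16 * 1024
    return in_range and is_power_of(n, radix)


print(n_point_supported(8192, radix=2))  # True: 8192 = 2**13
print(n_point_supported(8192, radix=4))  # False: 8192 is not a power of 4
```

This is consistent with the example being run from the Radix2 script folder: 8192 is a power of 2 but not a power of 4.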
Details about the RTL files and Test Bench files are provided in the release notes
Revision History
Date          Version       Summary of Changes
04/24/2010    v2.1 ds001    Initial version release