
A Pipelined CORDIC Architecture and Its Implementation in All-Digital FM Modulator-Demodulator
Trio Adiono, Nur Ahmadi, Antonius P. Renardy, Ashbir A. Fadila, Naufal Shidqi
School of Electrical Engineering and Informatics, Bandung Institute of Technology
Jl. Ganesha No 10, Bandung, Indonesia 40132
E-mail: tadiono@stei.itb.ac.id

Abstract— COordinate Rotation DIgital Computer (CORDIC) is an algorithm used to perform trigonometric calculations. CORDIC is often used when no hardware multiplier is available, since the algorithm requires only addition, subtraction, bit shifting, and a lookup table. This paper provides an implementation of the CORDIC algorithm using a pipelined architecture. The pipelined CORDIC is then used in an all-digital FM modulator-demodulator. All designs are implemented in Verilog and synthesized using Altera Quartus software with the DE2-70 FPGA board as the target. The proposed design consumes 1,103 logic elements, with a latency of 33.32 ns and a maximum frequency of 420.17 MHz. The overall system, including the FM modulator-demodulator, utilizes 3,911 logic elements, with a latency of 233.33 ns and a maximum frequency of 60 MHz.

Keywords— CORDIC, Frequency Modulation, Modulator, Demodulator, FPGA.

I. Introduction

COordinate Rotation DIgital Computer (CORDIC), also known as Volder's algorithm after its inventor [1], is an algorithm used to perform trigonometric calculations. By varying a few parameters, this simple yet efficient algorithm can also implement a wide range of elementary transcendental functions involving exponentials, logarithms, and square roots [2].

CORDIC is often used when no hardware multiplier is available, since the algorithm requires only addition, subtraction, bit shifting, and table lookup. This simplicity allows an efficient, low-cost implementation that is generally faster than most other hardware approaches.

Several architectures exist to meet the requirements and constraints of different applications. An iterative architecture gives the smallest hardware implementation, with throughput as the tradeoff. High-throughput designs often implement parallel or pipelined CORDIC.

This paper provides a prototype implementation of the CORDIC algorithm with a pipelined architecture. In addition, we propose the use of the CORDIC algorithm in an all-digital Frequency Modulation (FM) modulator-demodulator. Our CORDIC is used as a Direct Digital Frequency Synthesizer (DDFS) in the FM modulator and as a Numerically Controlled Oscillator (NCO) in the FM demodulator. This implementation uses less area than designs based on a lookup table (LUT). The CORDIC algorithm is therefore suitable for applications that must use few logic elements.

The rest of this paper is organized as follows: Section II describes the CORDIC algorithm; Section III presents the architectural implementation of the CORDIC algorithm; Section IV covers the use of CORDIC as an NCO and the other modules required in the all-digital FM modulator-demodulator; the implementation of the proposed system on an FPGA and its performance evaluation are provided in Section V; finally, conclusions are drawn in Section VI.

II. CORDIC Algorithm

The CORDIC algorithm is derived from the rotation of a vector $[x_0\ y_0]^T$ in Cartesian coordinates, which can be expressed as in (1), where $[x'\ y']^T$ is the final vector produced by the rotation and $\theta$ is the target angle of rotation.

\[ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \tag{1} \]

Equation (1) can be rewritten by factoring out the cosine function as follows:

\[ \begin{bmatrix} x' \\ y' \end{bmatrix} = \cos\theta \begin{bmatrix} 1 & \tan\theta \\ -\tan\theta & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \tag{2} \]

CORDIC employs iteration in which the angle $\theta$ is expressed as a sum of elementary rotation angles $\alpha_i$, as defined in (3), where $b$ is the bit precision of the angle argument and $\sigma_i$ is the direction of the rotation, which can only be $-1$ or $1$. The elementary rotation angle $\alpha_i$ is restricted to the values given in (4).

\[ \theta = \sum_{i=0}^{b-1} \sigma_i \alpha_i \tag{3} \]

\[ \alpha_i = \tan^{-1}(2^{-i}) \tag{4} \]

Substituting (3) into (2) and using the definition in (4), the final vector produced by the iteration can be expressed as in (5), and the remaining angle after each iteration as in (6).

\[ \begin{bmatrix} x' \\ y' \end{bmatrix} = \prod_{i=0}^{b-1} \cos(\alpha_i) \begin{bmatrix} 1 & \sigma_i 2^{-i} \\ -\sigma_i 2^{-i} & 1 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix} \tag{5} \]

\[ z_{i+1} = z_i - \sigma_i \alpha_i \tag{6} \]

The physical meaning of the expressions above is as follows: in rotation mode, each iteration decreases the angle component ($z_i$) toward zero, while in vectoring mode the ordinate ($y_i$)

978-1-4673-7495-8/15/$31.00 ©2015 IEEE 37 6th Asia Symposium on Quality Electronic Design


is driven to zero. This is achieved by rotating the vector either clockwise or counter-clockwise. By initializing the three CORDIC parameters ($x_0$, $y_0$, $z_0$), different outputs can be produced. For example, by setting $x_0 = 1$, $y_0 = 0$ and $z_0 = \pi/2$, each iteration brings $x_i$ and $y_i$ closer to $\cos(\pi/2)$ and $\sin(\pi/2)$ respectively.
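As an illustration only, the rotation-mode iteration can be sketched in floating-point Python. The function name `cordic_rotate`, the default of 14 iterations (borrowed from the stage count of the core in Section III), and the counter-clockwise sign convention are assumptions of this sketch, not details of the hardware. Multiplications by $2^{-i}$ stand in for the right shifts.

```python
import math

def cordic_rotate(theta, iterations=14):
    """Rotation-mode CORDIC for |theta| <= pi/2; returns (cos, sin) estimates."""
    # Elementary angles alpha_i = atan(2^-i), as in (4).
    alphas = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Product of cos(alpha_i); starting from it cancels the CORDIC gain.
    scale = 1.0
    for a in alphas:
        scale *= math.cos(a)
    x, y, z = scale, 0.0, theta
    for i, a in enumerate(alphas):
        sigma = 1.0 if z >= 0 else -1.0  # direction that drives z toward zero
        # Shift-and-add micro-rotation (counter-clockwise for sigma = +1).
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * a                   # remaining angle
    return x, y

c, s = cordic_rotate(math.pi / 3)
print(c, s)  # close to cos(pi/3) and sin(pi/3)
```

With 14 iterations the remaining angle is at most $\tan^{-1}(2^{-13})$, about $1.2 \times 10^{-4}$ rad, which bounds the error of this floating-point model. Fixed-point effects are not modeled here.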

III. CORDIC Architecture

To meet the design specification of $2^{-8}$ accuracy over an input angle range of $[-\pi, \pi]$, we chose a 16-bit wide datapath with a signed fixed-point number representation (1 sign bit, 2 integer bits, and 13 fractional bits).
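A small sketch may make the number format concrete. It assumes two's-complement storage and round-to-nearest conversion, neither of which is stated in the text; the helper names `to_fixed` and `from_fixed` are hypothetical.

```python
import math

FRAC_BITS = 13   # 13 fractional bits
WORD_BITS = 16   # 1 sign + 2 integer + 13 fractional bits

def to_fixed(value):
    """Encode a real value as a 16-bit two's-complement word (round to nearest)."""
    raw = round(value * (1 << FRAC_BITS))
    if not -(1 << (WORD_BITS - 1)) <= raw < (1 << (WORD_BITS - 1)):
        raise OverflowError("value does not fit the 16-bit format")
    return raw & 0xFFFF

def from_fixed(word):
    """Decode a 16-bit two's-complement word back to a real value."""
    raw = word - (1 << WORD_BITS) if word & 0x8000 else word
    return raw / (1 << FRAC_BITS)

print(hex(to_fixed(math.pi)))    # 0x6488
print(hex(to_fixed(-math.pi)))   # 0x9b78
print(hex(to_fixed(1.0)))        # 0x2000
```

Under these assumptions the endpoints of the angle range encode to 0x6488 and 0x9B78, the values quoted for the angle range in Section IV-A. The resolution of $2^{-13}$ is finer than the $2^{-8}$ accuracy target.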
Figure 1 shows the top-level design of the pipelined CORDIC architecture. The design has three inputs: Clk for the clock, Rst for reset (active low), and Angle for the target angle. It has two outputs, Cos and Sin, which represent the cosine and sine of the target angle respectively. There are two main blocks: the Quadrant Detector and the CORDIC Core. To calculate the cosine and sine values, the input angle is first processed in the Quadrant Detector. This block has three inputs, which correspond to the input arguments of the CORDIC rotation expressions (x, y, z). The Xi and Yi ports of this block are set to 0x136E and 0x0000. This is necessary to obtain the final vector in the form $[\cos\theta\ \sin\theta]^T$ without additional post-processing. The output of this block is the appropriate x, y, and z argument with respect to the quadrant of the target angle, as shown in Figure 2.

[Figure 1 labels: Angle, Clk, Rst, 0x2000, 0x0000; Quadrant Detector, Input Register, CORDIC Core, Output Register; Cos, Sin; 16-bit buses]
Fig. 1: Top level of pipelined CORDIC

Fig. 2: Quadrant detector

The CORDIC Core realizes the CORDIC difference equations in a pipelined architecture. The core has 14 pipeline stages, each comprising the structure shown in Figure 3. Each stage contains three adders/subtractors, two arithmetic right-shifters, one direction block that controls the direction of rotation, and an inverse tangent constant. Since the amount of bit shifting performed in each stage is constant, the shifters can be implemented as fixed wiring.

Fig. 3: Elementary rotational unit of pipelined CORDIC core

Each pipeline stage is separated by a register in order to reduce the critical path delay. This leads to a latency penalty that grows with the number of pipeline stages. In total, the latency of this design is 14 clock cycles. Owing to the pipelined architecture, the design achieves a throughput of one result per clock cycle.

IV. FM Modulator-Demodulator

We apply our pipelined CORDIC in an all-digital FM modulator-demodulator. Frequency Modulation (FM) is used in many applications such as audio and video broadcasting systems, telemetry, and radar. The CORDIC is used in the Numerically Controlled Oscillator (NCO), the key component of both the FM Modulator and the FM Demodulator blocks. The NCO plays an important role in digital FM by ensuring linearity over the entire frequency range, which analog FM lacks [3]. The architecture of the proposed application is shown in Figure 4.

First, the analog input signal x(t) is fed into the system. An ADC block converts the input
signal into a digital signal, x[n]. The FM Modulator block then generates the frequency-modulated signal m[n], using x[n] as the message signal and our pipelined CORDIC to produce the carrier signal.

[Figure 4 signal chain: x(t), ADC, x[n], FM Modulator, m[n], FM Demodulator, r[n], DAC, r(t)]
Fig. 4: Proposed application architecture

The signal m[n] is passed digitally to the FM Demodulator block. In this block we use an all-digital phase-locked loop as the demodulation method, because it is normally considered a relatively high-performance form of FM demodulator and is easy to apply. The FM Demodulator block produces the recovered message signal, r[n], which is then converted into an analog signal by the DAC block.

A. Numerically Controlled Oscillator (NCO)

The fundamental component of our application is a Numerically Controlled Oscillator (NCO). The architecture of the designed NCO block is shown in Figure 5.

[Figure 5 labels: input, accumulator (D, Q), Trigonometric Function Unit (Modified-Pipelined CORDIC), output]
Fig. 5: NCO block design

The block is a simple accumulator that accumulates the input value and feeds it to a Trigonometric Function Unit. Some NCO designs need a block that normalizes the accumulator value into a sine or cosine phase, for example a multiplication by $2\pi$. Our proposed NCO design needs no normalization block, because the accumulator value is used directly as the input of the Trigonometric Function Unit. The output frequency of a general NCO is given by

\[ F_{out} = \frac{step}{2^N} \times F_{clock} \tag{7} \]

where $step$ is the input of the NCO block and $2^N$ is the modulo.

To avoid a multiplier, we set the modulo equal to the angle range of our pipelined CORDIC. Thus, the modulo value is 51470, as our CORDIC angle range is $-\pi$ to $\pi$, or 0x9B78 to 0x6488 in our 16-bit fixed-point format.

To improve the frequency resolution, the accumulator is widened from 16 bits to 32 bits, with the value 51470 placed in the 16 most significant bits. The ratio between the modulo and $F_{clock}$ is called Norm. In our design, the Norm value is 67.464. It expresses the accumulator step per 1 Hz of desired frequency. For example, to generate a 1 MHz signal, the input of the NCO is

\[ step = 1 \times 10^6 \times 67.464 = 67464000 \tag{8} \]

The 16 most significant bits of the accumulator value are then used by our pipelined CORDIC to generate the cosine or sine output.

B. Modulator

In FM, which is a form of angle modulation, the instantaneous frequency of the carrier signal varies linearly with the baseband message signal x(t) as follows:

\[ M_{FM}(t) = A_c \cos[2\pi F_c t + \theta(t)] = A_c \cos\left[2\pi F_c t + 2\pi K_f \int_0^t m(n)\,dn\right] \tag{9} \]

where $A_c$ is the amplitude of the carrier, $F_c$ is the carrier frequency, and $K_f$ is the frequency deviation constant.

The architecture of the FM modulator is shown in Figure 6.

[Figure 6 labels: Kf, Message, Carrier_freq, NCO, Modulated_message]
Fig. 6: FM modulator architecture

The carrier frequency of our FM Modulator is 10 kHz, so the value fed to the NCO is pre-computed with Norm into step form, as discussed in the previous section. In addition, because a multiplier consumes a large area when implemented in hardware, we replace the multiplier block, which multiplies the Message by the constant $K_f$, with add and shift operations. This constant is obtained from (10):

\[ K_f = \frac{\text{Bandwidth Modulation}}{\text{Message Width}/2} \times 67.464 = \frac{3000}{32768} \times 67.464 \approx 6 \tag{10} \]

The constant multiplication is rewritten as $A = 6B = 2(2B + B)$. Multiplication by a factor of 2 can be implemented by a 1-bit left shift instead of a multiplier. Therefore, our multiplier block is turned into the following block:

[Figure 7 labels: B, Shift << 1, Shift << 1, A]
Fig. 7: Modified constant multiplication
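The step computation in (8) and the shift-and-add form of the multiplication by $K_f$ can be sketched as follows. This is a minimal sketch rather than the hardware description: the function names are hypothetical, and forming the NCO input as the carrier step plus the scaled message is an assumption read from the labels of Figure 6.

```python
NORM = 67.464   # accumulator step per 1 Hz, value quoted in the text
K_F = 6         # frequency deviation constant from (10)

def step_for(freq_hz):
    """Accumulator step for a desired output frequency, as in (8)."""
    return round(freq_hz * NORM)

def times_six(b):
    """Multiply by 6 with two 1-bit left shifts and one add: A = 2 * (2B + B)."""
    return ((b << 1) + b) << 1

def modulator_step(message, carrier_hz=10_000):
    """Assumed per-sample NCO input: carrier step plus message scaled by K_F."""
    return step_for(carrier_hz) + times_six(message)

print(step_for(1_000_000))           # 67464000, matching (8)
print(round(3000 / 32768 * NORM))    # 6, matching (10)
print(times_six(7))                  # 42
```

The unrounded value of (10) is about 6.18, so choosing $K_f = 6$ makes the deviation slightly smaller than nominal in exchange for a multiplier-free datapath.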
C. Digital Phase-Locked Loop (DPLL)

A Digital Phase-Locked Loop is used in the FM Demodulator. It generates an output signal whose phase is related to the phase of the input signal. The operation of the digital phase-locked loop (DPLL) as an important component of an FM demodulator was already being considered in the early 1970s [4].

The DPLL system consists of three elemental parts, namely the Phase Detector, the Loop Filter, and the Numerically Controlled Oscillator (NCO) [5]. The complete diagram of the FM Demodulator block, including a Low Pass Output Filter, is shown in Figure 8.

[Figure 8 labels: Modulated, Phase Detector, Loop Filter, Low Pass Filter, Recovered, NCO, Gain]
Fig. 8: FM demodulation architecture

The input frequency-modulated signal can be expressed as follows:

\[ \mathrm{in}(t) = \sin[\omega_i t + \theta_i(t)] \tag{11} \]

The feedback loop of the PLL makes the NCO produce a sinusoidal signal $\mathrm{ref}(t)$ with the same frequency as $\mathrm{in}(t)$, where

\[ \mathrm{ref}(t) = \cos[\omega_i t + \theta_0(t)] \tag{12} \]

The output of the phase detector, which is the product of these two signals, is found using a familiar trigonometric identity:

\[ \begin{aligned} p(t) &= K_d\,[\mathrm{in}(t) \cdot \mathrm{ref}(t)] \\ &= K_d \sin[\omega_i t + \theta_i(t)] \cos[\omega_i t + \theta_0(t)] \\ &= \frac{K_d}{2} \left\{ \sin[2\omega_i t + \theta_i(t) + \theta_0(t)] + \sin[\theta_i(t) - \theta_0(t)] \right\} \end{aligned} \tag{13} \]

where $K_d$ is the gain of the phase detector. The first term in (13) is the high-frequency component. The second term corresponds to the phase difference between $\mathrm{in}(t)$ and $\mathrm{ref}(t)$. By removing the first term with the loop filter, the phase difference can be obtained.

The important thing to realize when designing with a PLL is that it is a feedback system and, consequently, is characterized mathematically by equations similar to those of other conventional feedback control systems. The mathematical model of the DPLL system can be derived to analyze the transient and steady-state responses. A second-order DPLL system improves the speed and locking range of the loop compared to a first-order DPLL system.

D. Phase Detector Block

The phase detector finds the phase error between the modulated message and the output of the NCO block, as shown in Figure 9.

[Figure 9 labels: In1, In2, D, Q, Out]
Fig. 9: Phase detector architecture

This operation needs a multiplier. Rather than using a hardware multiplier block directly, we compared several parallel multiplier architectures [6], as shown in Table I.

TABLE I: Comparison of several multiplier architectures
Architecture          A (LEs)   T (ns)    A∗T
CT PPA - CSA FSA*     543       26.878    14594.754
DT PPA - LFA FSA†     746       25.150    18761.900
WT PPA - KSA FSA‡     627       25.439    15950.253
* Compressor Tree PPA - Carry Select Adder FSA
† Dadda Tree PPA - Ladner Fischer Adder FSA
‡ Wallace Tree PPA - Kogge Stone Adder FSA

Based on Table I, the Compressor Tree PPA Carry Select Adder FSA multiplier offers the best performance of the three in terms of A∗T. Because our target is to limit the area consumed by the signed multiplication, the Compressor Tree PPA CSA FSA multiplier is the right choice, although its propagation delay (T) is larger than that of the others.

V. FPGA Implementation

A. Simulation

Verification is performed by comparing the results produced for every possible bit pattern of the input argument against the sine and cosine functions in Microsoft Excel. The cosine values obtained from the CORDIC and from Excel are shown in Figure 10a. The error of the proposed design is plotted in Figure 10b. The maximum output error obtained from simulation of cosine and sine between $-\pi$ and $\pi$ is $8.095 \times 2^{-13}$ and the average is $1.585 \times 2^{-13}$. Figure 11 shows the ModelSim functional simulation of the proposed pipelined CORDIC. The design has a latency of 14 clock cycles and a throughput of one. This latency results from the inter-stage registers of the 14-stage pipelined architecture.

[Figure 10 panels: (a) cosine value versus phase (rad) for CORDIC and Excel; (b) error value versus phase (rad)]
Fig. 10: Cosine value comparison (a) and the error performance (b)
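The exhaustive comparison described in Section V-A can be imitated with a small integer model. The sketch below is not the verified Verilog design: the rounded arctangent table, the quadrant folding by $\pm\pi$, the counter-clockwise sign convention, and all function names are assumptions made for illustration. Only the 14 stages, the 13 fractional bits, and the initial value 0x136E come from the text.

```python
import math

FRAC = 13        # fractional bits of the fixed-point format
STAGES = 14      # pipeline stages reported for the core
X0 = 0x136E      # initial x value quoted in Section III
ATAN_LUT = [round(math.atan(2.0 ** -i) * (1 << FRAC)) for i in range(STAGES)]

def cordic_fixed(z):
    """Integer rotation-mode CORDIC for |angle| <= pi/2 (z in units of 2^-13 rad)."""
    x, y = X0, 0
    for i in range(STAGES):
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_LUT[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_LUT[i]
    return x, y

def cos_sin(word):
    """Fold an angle word in [-pi, pi] into [-pi/2, pi/2] and fix the signs."""
    half_pi = round(math.pi / 2 * (1 << FRAC))
    pi = round(math.pi * (1 << FRAC))
    if word > half_pi:
        x, y = cordic_fixed(word - pi)
        return -x, -y
    if word < -half_pi:
        x, y = cordic_fixed(word + pi)
        return -x, -y
    return cordic_fixed(word)

def max_error():
    """Largest absolute cosine/sine error over every angle word in [-pi, pi]."""
    limit = round(math.pi * (1 << FRAC))
    worst = 0.0
    for word in range(-limit, limit + 1):
        angle = word / (1 << FRAC)
        x, y = cos_sin(word)
        worst = max(worst,
                    abs(x / (1 << FRAC) - math.cos(angle)),
                    abs(y / (1 << FRAC) - math.sin(angle)))
    return worst

print(max_error() * (1 << FRAC))  # worst error in units of 2^-13
```

Because rounding and truncation choices differ from the hardware, the printed figure need not equal the reported $8.095 \times 2^{-13}$; the sketch only shows how such a sweep can be organized.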
Fig. 11: Functional simulation showing input and output of the pipelined CORDIC

Our FM modulator is used to generate the modulated data that verifies the FM demodulator block design. The sine-wave data frequency is 1 kHz, while the modulation bandwidth is ±3 kHz at a 10 kHz center frequency. The result can be seen in Figure 12.

Fig. 12: FM demodulator simulation result

In the verification result, the first row shows the data signal entered manually in the testbench file. The second row is the FM-modulated waveform corresponding to the input data, and the third row is the phase detector output. The fourth and fifth rows are the NCO output and the loop filter output, respectively. The last row shows the output of the low pass filter block, which is the recovered signal. At the start of the simulation, the demodulated signal is still distorted because the phase synchronization is converging; afterwards the system is stable. Figure 12 shows that the designed FM demodulator block successfully demodulates the input data signal back to the original signal.

B. Synthesis Results

The designed system was described in Verilog HDL, and Altera Quartus II was used for synthesis and FPGA implementation. The Altera Cyclone II 2C70 was chosen as the target device. The resource utilization (logic elements, memory bits, and multipliers) and the maximum working frequency are obtained from the compilation report. Latency is obtained by dividing the number of cycles needed to process one frame by the maximum frequency. The synthesis results of the pipelined CORDIC and of the overall FM modem system are listed in Table II.

The proposed CORDIC design is compared to other designs, including the built-in MegaCore function in Altera Quartus for several device families. The area comparison is shown in Figure 13 and the clock speed comparison in Figure 14. From both figures, it can be inferred that our proposed design performs better. The FM modem design is also compared to another design [4] in terms of resource utilization (area), as can be seen in Table III. Our proposed FM modem design is shown to have better resource utilization, at 1,123 for the modulator and 1,997 for the demodulator.

TABLE II: Synthesis results of CORDIC and FM Modem
Parameters                 CORDIC    FM Modem
Total Logic Elements       1,103     3,911
Combinational Functions    1,040     3,666
Registers                  649       1,894
Memory Bits                0         6,144
Embedded Multipliers       0         0
PLLs                       0         1
Fmax (MHz)                 420.17    60
Latency (ns)               33.32     233.33

[Figure 13: bar chart "Area Block Comparison", series LE and Register, for Our Pipelined CORDIC (Cyclone II), Sun-Ting Lin (Cyclone II), and Quartus MegaCore (Arria V, Cyclone V, Stratix V)]
Fig. 13: Area comparison

[Figure 14: bar chart "Clock Speed Comparison", Fmax (MHz), for the same five designs]
Fig. 14: Clock speed comparison

TABLE III: Comparison of FM Modem
Blocks                    Our Design    [4]
FM Modulator              1,123         6,270
FM Demodulator            1,997         5,934
 - Trigonometric Unit     1,087         1,510
 - Phase Detector         516           616
 - Loop Filter            40            297
 - Low Pass Filter        335           3,511
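The latency rule stated in Section V-B (cycles divided by maximum frequency) can be checked against Table II with a few lines. The 14-cycle count for the CORDIC comes from Section III. Using 14 cycles for the modem as well is an inference that happens to reproduce the tabulated value, not a figure stated in the text, and the helper name `latency_ns` is hypothetical.

```python
def latency_ns(cycles, fmax_mhz):
    """Latency in nanoseconds for a cycle count at a clock frequency in MHz."""
    return cycles / fmax_mhz * 1e3

print(round(latency_ns(14, 420.17), 2))  # 33.32  (CORDIC row of Table II)
print(round(latency_ns(14, 60.0), 2))    # 233.33 (FM Modem row, assumed 14 cycles)
```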
C. On-Chip Verification

Our proposed application design needs a 240 kHz clock generator, since the PLL in the FPGA cannot provide such a low frequency. The clock divider block counts clock cycles and toggles its binary output (0 and 1) whenever the count reaches a certain value, given by $\text{Value} = F_{clock} / (2 F_{out})$. Thus, to generate a 240 kHz clock from the 50 MHz system clock, the Value is 104 (50 MHz / 480 kHz is about 104.17).

The designed system was implemented on the Altera DE2-70 FPGA board. It has one input (the analog signal carrying the message to be modulated) and one output (the analog signal resulting from demodulation). The ADC and DAC were implemented using the built-in audio codec with 16-bit wide digital data and a 48 kHz sampling frequency.

For testing, the input signal to the modulator was supplied from a PC in the form of music, and the output signal was observed using Audacity software (also on the PC), as shown in Figure 15. The output waveform from the FPGA recorded in Audacity, together with the input waveform, can be seen in Figure 16. Inspection of this figure shows that the designed circuit was able to demodulate the FM input signal back to its original form, although some noise was present in the output signal. The noise came from the steady-state error at the output of the PLL, since the control loop was implemented using gain control only. This prevented the PLL from locking completely to the frequency of the signal to be demodulated, resulting in some errors at the output of the PLL.

[Figure 15 labels: Software (Audacity, Music Player), Input audio, Output audio]
Fig. 15: Tools arrangement

Fig. 16: Comparison of our application output (upper) and original audio (lower)

D. Performance Evaluation

The quality of the system was measured by finding the spurious-free dynamic range (SFDR), which is the magnitude ratio of the fundamental signal to the strongest spurious signal within the Nyquist frequency of the output. Two SFDR measurements were performed: (1) for the designed NCO and (2) for the overall modulator-demodulator system. The measurement was performed by taking a pure 500 Hz sine wave as the input and observing, through a digital oscilloscope, the output of the NCO+ADC (often called a Direct Digital Frequency Synthesizer or DDFS) in the modulator and the output of the demodulator. Spectral analysis was then performed using the Fast Fourier Transform. To obtain the SFDR in dB, the magnitude of the largest spurious signal was subtracted from the magnitude of the signal at 500 Hz, both in dB.

Figures 17a and 17b show the FFT results of the output of the NCO and of the overall system. From these figures, it can be inferred that the SFDR is -38 dB for the NCO and -22 dB for the overall system.

Fig. 17: Spectral plot of NCO (a) and FM demodulator output (b)

VI. Conclusion

We have successfully implemented and verified the CORDIC algorithm based on a pipelined architecture. Its functionality is verified using ModelSim with an accuracy of $2^{-13}$ and a maximum error of $8 \times 2^{-13}$. The proposed pipelined CORDIC architecture consumes 1,103 logic elements, with a latency of 33.32 ns and a maximum frequency of 420.17 MHz, while our FM modulator-demodulator application utilizes 3,911 logic elements, with a latency of 233.33 ns and a maximum frequency of 60 MHz. The proposed design is compared to other designs and offers better performance. The CORDIC serves as the NCO for signal generation. Frequency demodulation is achieved by a digital PLL with an NCO for decoding the message signal. The all-digital FM modem has been successfully implemented and run on the Altera DE2-70 FPGA development board. The input sound is obtained from the computer and the output sound is sent to an active speaker.

References
[1] J. E. Volder, “The CORDIC trigonometric computing technique,” IRE Transactions on Electronic Computers, no. 3, pp. 330–334, 1959.
[2] J. S. Walther, “A unified algorithm for elementary functions,” in Proceedings of the Spring Joint Computer Conference. ACM, May 1971, pp. 379–385.
[3] I. Hatai and I. Chakrabarti, “A new high-performance digital FM modulator and demodulator for software-defined radio and its FPGA implementation,” International Journal of Reconfigurable Computing, vol. 2011, p. 2, 2011.
[4] J. P. M. Brito and S. Bampi, “Design of a digital FM demodulator based on a 2nd-order all-digital phase-locked loop,” Analog Integrated Circuits and Signal Processing, vol. 57, no. 1-2, pp. 97–105, 2008.
[5] T. Wada, “All digital FM receiver (version 1.0),” Sep. 03, 2004. [Online]. Available: http://www.ie.u-ryukyu.ac.jp/~wada/design05/spec_e.html
[6] T. Aoki, “Hardware algorithms for arithmetic modules,” Aug. 08, 2007. [Online]. Available: http://www.aoki.ecei.tohoku.ac.jp/arith/mg/algorithm.html
