
A NOVEL LOW-LATENCY PARALLEL ARCHITECTURE FOR DIGITAL PLL WITH APPLICATION TO ULTRA-HIGH SPEED CARRIER RECOVERY SYSTEMS

Pablo Gianni, Hugo S. Carrer, Graciela Corral-Briones, and Mario R. Hueda
Laboratorio de Comunicaciones Digitales - Universidad Nacional de Córdoba - CONICET
Av. Vélez Sarsfield 1611 - Córdoba (X5016GCA) - Argentina
Emails: giannipablo@gmail.com, mhueda@com.uncor.edu

ABSTRACT

This paper introduces a new low latency parallel processing digital carrier recovery (CR) architecture suitable for ultra-high speed intradyne coherent optical receivers (e.g., 100 Gb/s). The proposed parallel scheme builds upon a novel digital phase locked loop (DPLL) architecture, which breaks the bottleneck of the feedback path. This avoids the high latency that a parallel processing implementation introduces in the feedback loop of traditional DPLLs. Numerical results show that the bandwidth and the capture range of the new parallel DPLL are close to those achieved by a serial DPLL. This behavior makes the proposed low latency parallel DPLL architecture an excellent choice for implementing high speed CR systems in both ASIC and FPGA platforms.

1. INTRODUCTION

Coherent detection based receivers with electronic dispersion compensation (EDC) are being considered for next generation optical fiber transmission systems (e.g., 100 Gigabits per second (Gb/s) and beyond) [1, 2]. Unlike in intensity modulation direct detection (IM/DD) schemes [3], in coherent detection receivers with EDC it is possible to completely compensate, with zero penalty, the main fiber channel impairments [1] (i.e., chromatic dispersion (CD) and polarization mode dispersion (PMD) [4]). In particular, intradyne detection is preferred over the alternative heterodyne or homodyne architectures because it replaces complex optical phase-locked loops (PLLs) with more robust and easier to implement digital carrier recovery (CR) techniques.

The main challenges for digital carrier recovery in fiber optic receivers are the high carrier frequency offset typical of intradyne optical detectors, and the large phase noise of typical lasers used as transmitters and local oscillators [5, 6]. Both challenges can be overcome with a high-bandwidth carrier recovery system. However, another challenge is that parallel processing implementations, necessary to achieve 100 Gb/s throughput, introduce high latency in the feedback loop of traditional digital PLLs (DPLLs), which limits the achievable bandwidth and consequently the capture range and phase noise tracking capabilities of the receiver. Although feedforward carrier recovery based on the Viterbi and Viterbi (VV) algorithm overcomes some of the latency-related limitations [6], traditional decision directed DPLLs [7] offer advantages in some aspects of the operation of CR, for example, tracking of the high amplitude and high frequency sinusoidal carrier frequency jitter experienced by typical lasers. Therefore, an optimal carrier recovery may involve a combination of the VV algorithm with a traditional decision directed DPLL. This motivates the interest in low-latency DPLL implementations suitable for parallel processing.

A traditional PLL is often modeled as a linear filter, an assumption that is useful to compute the small signal transfer function [8, 7]. However, the PLL is actually a nonlinear filter. Unfortunately, this precludes the use of the unfolding techniques discussed by Parhi in [9], which are applicable only to strictly linear filters. Therefore, a different approach to reduce the latency of the PLL parallel implementation must be considered.

In this work we introduce a new low latency parallel processing digital carrier recovery architecture suitable for ultra-high speed intradyne coherent optical receivers. The proposed approach takes as much hardware as possible out of the feedback loop in order to simplify the loop and reduce latency. Then, the bottleneck of the critical PLL feedback path is broken by using a novel approximation to the DPLL computation. Simulation results show a capture range and bandwidth close to those achieved by serial DPLLs [8]. The proposed low latency DPLL architecture enables the efficient parallel implementation of high-speed CR systems in both FPGA and ASIC devices.

This paper is organized as follows. Section 2 introduces the new DPLL computation and describes parallel implementation architectures. Section 3 presents simulation results while conclusions are drawn in Section 4.

This paper has been supported in part by the ANPCyT (PICT20081256), MINCyT - Córdoba (PID2008), and Fundación Tarpuy.

978-1-4244-8848-3/11/$26.00 2011 IEEE

2. NEW APPROXIMATION TO DPLL COMPUTATION

2.1. First-Order DPLL

Figure 1 shows a simplified block diagram of the coherent receiver with electronic dispersion compensation. Without loss of generality we consider quadrature phase-shift keying (QPSK) modulation [7]. Then, the sample at the EDC output can be expressed as

$$ r_n = a_n e^{j\vartheta_n} + z_n, \qquad (1) $$

where $a_n \in \{\pm 1 \pm j\}$ is the transmit QPSK symbol and $\vartheta_n$ is the total phase noise, which includes the effects of the lasers' phase noise, carrier frequency offset, and laser phase jitter. Component $z_n$ represents the amplified spontaneous emission (ASE) noise sample, which is modeled as a white complex Gaussian random variable with power $\sigma^2$ [1].

Fig. 1. Simplified block diagram of the coherent receiver with electronic dispersion compensation (EDC).

The EDC output signal (1) can be rewritten as

$$ r_n = s_n e^{j\psi_n}, \qquad (2) $$

where $s_n$ and $\psi_n$ are the magnitude and the phase of the complex sample $r_n$, respectively. In QPSK modulation systems, the symbol information is contained in the phase of $r_n$. The received phase $\psi_n$ can be expressed as

$$ \psi_n = \alpha_n + \omega_c n + \phi_n, \qquad (3) $$

where $\alpha_n \in \{\pm\pi/4, \pm 3\pi/4\}$ is the phase of the transmit QPSK symbol $a_n$, and $\omega_c$ is the angular carrier frequency offset given by $\omega_c = 2\pi T f_c$, where $f_c$ and $T$ are the carrier frequency offset and the symbol duration, respectively. Component $\phi_n$ is the total phase noise given by

$$ \phi_n = \phi_n^{(laser)} + \phi_n^{(ASE)} + \phi_n^{(jitter)}. \qquad (4) $$

Note that $\phi_n$ includes the contribution of the laser phase noise ($\phi_n^{(laser)}$), the ASE generated phase noise ($\phi_n^{(ASE)}$), and the laser phase jitter ($\phi_n^{(jitter)}$).

In a decision directed CR loop (see Fig. 2), the symbol information is first removed [7]. In QPSK modulation receivers, this operation can be easily carried out in the phase domain as follows:

$$ \tilde\psi_n = (\psi_n)_{\pi/2}, \qquad (5) $$

where $(\cdot)_M$ denotes modulus $M$. In the absence of phase noise and carrier frequency offset (i.e., $\phi_n = 0$ and $\omega_c = 0$), notice that $\tilde\psi_n = \pi/4 \;\forall n$.

Fig. 2. Block diagram of a decision-directed first-order serial DPLL.

The residual phase $\tilde\psi_n$ is filtered by the first-order PLL. The phase at the output of the numerically controlled oscillator (NCO) is [7]

$$ \theta_n = \theta_{n-1} + K_p \epsilon_n, \qquad (6) $$

where $K_p$ is the proportional gain of the first-order PLL loop filter, and

$$ \epsilon_n = \left(\tilde\psi_n - \theta_{n-1}\right)_{\pi/2} - \frac{\pi}{4} \qquad (7) $$

is the phase error. For QPSK modulation, note that $\epsilon_n \in [-\pi/4, +\pi/4]$ (e.g., $\epsilon_n = 0$ when $\phi_n = \theta_{n-1}$ and $\omega_c = 0$). Similarly, it is possible to show that

$$ \theta_{n+1} = \theta_n + K_p \epsilon_{n+1} = \theta_{n-1} + K_p \epsilon_n + K_p \epsilon_{n+1}, \qquad (8) $$

where

$$ \epsilon_{n+1} = \left(\tilde\psi_{n+1} - \theta_{n-1} - K_p \epsilon_n\right)_{\pi/2} - \frac{\pi}{4}. \qquad (9) $$

Notice that the nonlinear operation $(\cdot)_{\pi/2}$ precludes the use of the unfolding techniques for parallel processing [9] (a situation similar to the one found in [10]). When the carrier frequency offset is very small (i.e., $\omega_c \ll 1$) and the bandwidth of the loop is low to moderate such that $K_p \ll 1$, the term $K_p \epsilon_n$ in (9) can be neglected. Thus, the phase error becomes

$$ \epsilon_{n+1} \approx \left(\tilde\psi_{n+1} - \theta_{n-1}\right)_{\pi/2} - \frac{\pi}{4}, \qquad (10) $$

therefore

$$ \theta_{n+1} \approx \theta_{n-1} + K_p \left(\tilde\psi_n - \theta_{n-1}\right)_{\pi/2} + K_p \left(\tilde\psi_{n+1} - \theta_{n-1}\right)_{\pi/2} - 2 K_p \frac{\pi}{4}. $$

Generalizing, we can get

$$ \theta_{n+m} \approx \theta_{n-1} + K_p \sum_{k=0}^{m} \left(\tilde\psi_{n+k} - \theta_{n-1}\right)_{\pi/2} - (m+1) K_p \frac{\pi}{4}, \quad m \ge 0. \qquad (11) $$
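To make the difference between the exact recursion (6)-(7) and the approximation (11) concrete, the following Python sketch runs both on the same symbol-stripped phase sequence. It is only an illustration under stated assumptions, not the hardware architecture of Figs. 3 and 4: the function names (`wrap_half_pi`, `serial_dpll`, `block_dpll`) are hypothetical, the arithmetic is floating point rather than fixed point, the NCO phase is assumed to start at zero, and the gain and block size in the demo are illustrative values.

```python
import math

HALF_PI = math.pi / 2
QUARTER_PI = math.pi / 4


def wrap_half_pi(x):
    """Modulus-pi/2 operator (.)_{pi/2}: maps x into [0, pi/2)."""
    return x % HALF_PI


def serial_dpll(psi, kp, theta0=0.0):
    """Exact first-order loop, eqs. (6)-(7): one NCO update per symbol."""
    theta = theta0
    out = []
    for p in psi:
        eps = wrap_half_pi(p - theta) - QUARTER_PI  # phase error, eq. (7)
        theta = theta + kp * eps                    # NCO update, eq. (6)
        out.append(theta)
    return out


def block_dpll(psi, kp, block, theta0=0.0):
    """Approximation (11): inside a block of `block` symbols every phase
    error is computed against the NCO phase from before the block."""
    theta_prev = theta0          # theta_{n-1} in eq. (11)
    out = []
    for start in range(0, len(psi), block):
        acc = 0.0                # K_p * (running sum of errors in this block)
        for p in psi[start:start + block]:
            acc += kp * (wrap_half_pi(p - theta_prev) - QUARTER_PI)
            out.append(theta_prev + acc)   # theta_{n+m}
        theta_prev = theta_prev + acc      # last output seeds the next block
    return out


if __name__ == "__main__":
    # Assumed test input: noise-free stripped phase with a 0.1 rad offset.
    psi = [wrap_half_pi(QUARTER_PI + 0.1)] * 200
    s = serial_dpll(psi, 2 ** -5)
    b = block_dpll(psi, 2 ** -5, 16)
    print("max |serial - block| =", max(abs(x - y) for x, y in zip(s, b)))
```

With `block=1` the two functions perform the same arithmetic. For larger blocks the outputs differ during acquisition, because the errors inside a block ignore the updates made within that block, which is the term neglected in (10).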


Fig. 3. Low latency parallel architecture for the first-order DPLL.

Fig. 4. Implementation of the low latency parallel first-order DPLL.

A low latency parallel implementation of the first-order DPLL can be easily derived from (11). Let $P$ be the parallelization factor. Figs. 3 and 4 show the architecture of the low latency parallel first-order DPLL. Block $W_k$ ($k = 0, 1, \ldots, P-1$) uses a fast adder (e.g., a Wallace tree and a carry save adder [9]) to quickly calculate the NCO output. Furthermore, the gain $K_p$ is assumed to be a power of 2 (i.e., $K_p = 2^{-N_p}$ with $N_p$ being a positive integer). This way, multiplications by the proportional gain $K_p$ are reduced to simple bit shift operations.

2.2. Second-Order DPLL

For a second-order DPLL, the NCO output is given by [7]

$$ \theta_n = \theta_{n-1} + K_p \epsilon_n + K_i \xi_{n-1}, \qquad (12) $$

where $K_i$ is the integral gain while

$$ \xi_{n-1} = \sum_{k=0}^{n-1} \epsilon_k \qquad (13) $$

is the accumulated phase error with

$$ \epsilon_k = \left(\tilde\psi_k - \theta_{k-1}\right)_{\pi/2} - \frac{\pi}{4}. \qquad (14) $$

Similarly, it is possible to show that

$$ \theta_{n+1} = \theta_n + K_p \epsilon_{n+1} + K_i \xi_n = \theta_{n-1} + K_p (\epsilon_n + \epsilon_{n+1}) + K_i (\xi_{n-1} + \xi_n), \qquad (15) $$

where

$$ \epsilon_{n+1} = \left(\tilde\psi_{n+1} - \theta_{n-1} - K_p \epsilon_n - K_i \xi_{n-1}\right)_{\pi/2} - \frac{\pi}{4}. \qquad (16) $$

For the type-II second-order DPLL, the steady-state error is zero (i.e., $\lim_{n \to \infty} \epsilon_n \to 0$) [7]. Thus, assuming that the bandwidth of the loop is low to moderate such that $K_p \ll 1$, the contribution of the term $K_p \epsilon_n$ can be neglected; therefore the phase error (16) becomes

$$ \epsilon_{n+1} \approx \left(\tilde\psi_{n+1} - \theta_{n-1} - K_i \xi_{n-1}\right)_{\pi/2} - \frac{\pi}{4}. \qquad (17) $$

Furthermore, since the accumulated phase error varies slowly with time (i.e., $\xi_n \approx \xi_{n-1}$), from (15) and (17) we can obtain

$$ \theta_{n+1} \approx \theta_{n-1} + K_p \sum_{k=0}^{1} \left(\tilde\psi_{n+k} - \theta_{n-1} - k K_i \xi_{n-1}\right)_{\pi/2} + 2\left(K_i \xi_{n-1} - K_p \frac{\pi}{4}\right). \qquad (18) $$

The good accuracy of (17) and (18) will be verified by computer simulations in the next section. Following a similar analysis, it is possible to derive

$$ \theta_{n+m} \approx \theta_{n-1} + K_p \sum_{k=0}^{m} \left(\tilde\psi_{n+k} - \theta_{n-1} - k K_i \xi_{n-1}\right)_{\pi/2} + (m+1)\left(K_i \xi_{n-1} - K_p \frac{\pi}{4}\right), \quad m \ge 0. \qquad (19) $$

A low latency parallel architecture to implement the second-order DPLL can be obtained from (19).

2.3. Modified Second-Order DPLL

The maximum clock frequency of complex digital signal processors in state-of-the-art 40 nm CMOS technology is limited to less than 1 GHz. Thus, the computational load and bit resolution required to carry out the different operations in (19) could be difficult to implement in multigigabit per second data rate receivers with current CMOS technology. In order to simplify the implementation of (19), consider the block diagrams of the DPLL shown in Fig. 5. Note that the second-order DPLL can be considered as two separated feedback
loops: the proportional and integral loops. Thus, the NCO output (19) can be rewritten as

$$ \theta_{n+m} = \theta^{(p)}_{n+m} + \theta^{(i)}_{n+m}, \quad m \ge 0, \qquad (20) $$

where $\theta^{(p)}_{n+m}$ and $\theta^{(i)}_{n+m}$ are the NCO components due to the proportional and integral paths, respectively (see Fig. 5).

Fig. 5. Block diagrams of the decision-directed second-order serial DPLL.

From (19), it is simple to show that

$$ \theta^{(p)}_{n+m} \approx \theta^{(p)}_{n-1} + K_p \sum_{k=0}^{m} \left(\tilde\psi_{n+k} - \theta_{n-1} - k K_i \xi_{n-1}\right)_{\pi/2} - (m+1) K_p \frac{\pi}{4}. \qquad (21) $$

Since

$$ \left(\tilde\psi_{n+k} - \theta_{n-1} - k K_i \xi_{n-1}\right)_{\pi/2} = \left(\tilde\psi_{n+k} - \theta^{(i)}_{n-1} - \theta^{(p)}_{n-1} - k K_i \xi_{n-1}\right)_{\pi/2} = \left(\hat\psi_{n+k} - \theta^{(p)}_{n-1}\right)_{\pi/2}, \qquad (22) $$

eq. (21) can be rewritten as

$$ \theta^{(p)}_{n+m} \approx \theta^{(p)}_{n-1} + K_p \sum_{k=0}^{m} \left(\hat\psi_{n+k} - \theta^{(p)}_{n-1}\right)_{\pi/2} - (m+1) K_p \frac{\pi}{4}, \qquad (23) $$

where

$$ \hat\psi_{n+k} = \left(\tilde\psi_{n+k} - \theta^{(i)}_{n-1} - k K_i \xi_{n-1}\right)_{\pi/2}. \qquad (24) $$

Notice that (23) reduces to the first-order DPLL computation given by (11); therefore, its parallel implementation can be achieved as shown in Fig. 4.

On the other hand, from (19), (22), and Fig. 5 we can also derive the NCO component due to the integral path:

$$ \theta^{(i)}_{n+m} \approx \theta^{(i)}_{n-1} + (m+1) K_i \xi_{n-1}. \qquad (25) $$

Based on (14), (20), (24), and (25), the accumulated phase error can be evaluated as

$$ \xi_{n+m} = \sum_{k=0}^{n+m} \epsilon_k = \xi_{n-1} + \sum_{k=n}^{n+m} \epsilon_k, \qquad (26) $$

with

$$ \epsilon_k = \left(\tilde\psi_k - \theta^{(p)}_{k-1} - \theta^{(i)}_{k-1}\right)_{\pi/2} - \frac{\pi}{4} \approx \left(\hat\psi_k - \theta^{(p)}_{k-1}\right)_{\pi/2} - \frac{\pi}{4}. \qquad (27) $$

Thus, a parallel implementation of the type-II second-order DPLL can be easily achieved as depicted in Fig. 6. The term $L = lP$, with $l$ being a positive integer, represents the latency required to compute all the operations of the integral path (e.g., computation of the phase errors (27)). Since the latency in this path is not as critical as in the proportional loop, its effect on the DPLL performance will be negligible, as we will show in the next section. Similarly to $K_p$, the integral gain $K_i$ is assumed to be a power of 2 (i.e., $K_i = 2^{-N_i}$ with $N_i$ being a positive integer). Finally, it is important to note that all additions are modulus $2\pi$.

Fig. 6. Low latency parallel implementation of a second-order DPLL.

3. SIMULATION RESULTS

Next we evaluate the effectiveness of the proposed low latency parallel DPLL architecture. We use QPSK modulation
on a nondispersive noisy channel with $P = 16$, $1/T = 10$ Giga-symbols per second (Gs/s), and latency $L = 32$ symbols. The signal-to-noise ratio (SNR), with SNR $= 2/\sigma^2$ (see eq. (1)), at a given bit-error-rate (BER) is also used as a measure of the goodness of the proposed CR loop. Two different second-order DPLLs were simulated for comparison purposes: the serial DPLL (S-DPLL) and the proposed low latency parallel DPLL architecture (P-DPLL) shown in Fig. 6.

Table 1. DPLL Parameters

DPLL        Parallelism    Kp          Ki          Processing Rate
Serial      1              0.12        0.001       10 GHz
Proposed    16             $2^{-4}$    $2^{-7}$    625 MHz

The frequency responses for both DPLLs are depicted in Fig. 7. The loop filter gains were selected in order to get 200 MHz loop bandwidth and 0.3 dB peaking (see Table 1). For the optical system considered here, these values of bandwidth and peaking provide a good tradeoff between capture range and the residual phase noise power at the input of the slicer (see Fig. 1).

Fig. 7. Frequency response of the DPLLs (magnitude [dB] versus frequency [Hz]; curves: serial DPLL and low latency parallel DPLL).

The capture range is analyzed in Fig. 8. We plot the SNR required to achieve BER $= 10^{-3}$ for different values of the carrier frequency offset $f_c$ (see eq. (3)). As can be observed, the capture range for the P-DPLL is 1 GHz, which is close to the maximum theoretical frequency offset value for QPSK given by $1/(8T) = 1.25$ GHz [11].

Fig. 8. Capture range of the serial and low latency parallel DPLL (SNR at BER $= 10^{-3}$ [dB] versus frequency offset [GHz]).

Finally, Fig. 9 investigates the behavior of the DPLLs in the presence of sinusoidal frequency jitter given by (see eq. (4))

$$ \phi_n^{(jitter)} = \frac{A_j}{f_j} \sin\left(2\pi T f_j n\right), \qquad (28) $$

where $A_j$ and $f_j$ are the amplitude and frequency of the sinusoidal frequency jitter, respectively. The jitter frequency was set to $f_j = 1$ MHz and the amplitude $A_j$ was swept from 0 to 100 MHz. It can be noted that the proposed low latency parallel DPLL can track the sinusoidal jitter with an SNR degradation of roughly 0.3 dB with respect to the serial DPLL.

Fig. 9. SNR penalty versus sinusoidal frequency jitter amplitude $A_j$ with $f_j = 1$ MHz (SNR penalty at BER $= 10^{-3}$ [dB] versus jitter amplitude [MHz]; curves: serial DPLL and low latency parallel DPLL).

4. CONCLUSIONS

A new DPLL based carrier recovery architecture for high speed optical coherent receivers has been introduced in this paper. The proposed parallel scheme builds upon a novel DPLL computation, which breaks the bottleneck of the feedback path. We have shown a novel approach that leads to a simple parallel implementation. Furthermore, it has been shown that the new parallel DPLL with $P = 16$ can provide a bandwidth and capture range similar to those achieved by the serial DPLL.
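As a numerical illustration of the block recursion (19) together with the error bookkeeping of (26) and (27), the Python sketch below compares a serial type-II loop, eqs. (12)-(14), with a block-parallel version on a noise-free input with a constant carrier frequency offset. It is a simplified model under stated assumptions rather than the architecture of Fig. 6: the function names (`phase_error`, `serial_dpll2`, `block_dpll2`) are hypothetical, arithmetic is floating point, the extra integral-path latency L is not modeled (the accumulated error is refreshed at the end of every block), and the 20 MHz offset and the gains in the test input are illustrative choices.

```python
import math

HALF_PI = math.pi / 2
QUARTER_PI = math.pi / 4


def phase_error(x):
    """(x)_{pi/2} - pi/4: decision-directed QPSK phase error."""
    return x % HALF_PI - QUARTER_PI


def serial_dpll2(psi, kp, ki):
    """Serial second-order loop, eqs. (12)-(14). Returns the phase errors."""
    theta, xi, errors = 0.0, 0.0, []
    for p in psi:
        eps = phase_error(p - theta)   # eq. (14), uses theta_{n-1}
        theta += kp * eps + ki * xi    # eq. (12), uses xi_{n-1}
        xi += eps                      # eq. (13)
        errors.append(eps)
    return errors


def block_dpll2(psi, kp, ki, block):
    """Block version of eq. (19); xi is updated once per block via (26)."""
    theta_prev, xi_prev, errors = 0.0, 0.0, []
    for start in range(0, len(psi), block):
        acc = 0.0                 # K_p * running sum of approximated errors
        theta_last = theta_prev   # theta_{k-1}, the most recent NCO output
        xi_sum = 0.0              # sum of eps_k over this block, eq. (26)
        for k, p in enumerate(psi[start:start + block]):
            # First form of eq. (27): error against the previous NCO output.
            eps = phase_error(p - theta_last)
            errors.append(eps)
            xi_sum += eps
            # Eq. (19): the loop update only uses the block-start state.
            acc += kp * phase_error(p - theta_prev - k * ki * xi_prev)
            theta_last = theta_prev + acc + (k + 1) * ki * xi_prev
        theta_prev = theta_last
        xi_prev = xi_prev + xi_sum
    return errors


def stripped_phase(f_offset_hz, symbol_rate_hz, n_symbols):
    """Noise-free stripped phase: pi/4 + omega_c * n, reduced modulo pi/2."""
    omega = 2 * math.pi * f_offset_hz / symbol_rate_hz
    return [(QUARTER_PI + omega * n) % HALF_PI for n in range(n_symbols)]
```

A possible use is `block_dpll2(stripped_phase(20e6, 10e9, 2000), 2 ** -4, 2 ** -7, 16)`, whose last entries show the residual phase error after acquisition. Because both loops are type II, that residual should decay toward zero for a constant frequency offset that is small enough to keep the in-block errors away from the modulus boundaries.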


5. REFERENCES

[1] D. E. Crivelli, H. S. Carrer, and M. R. Hueda, "Adaptive digital equalization in the presence of chromatic dispersion, PMD, and phase noise in coherent fiber optic systems," Globecom'04, Dec. 2004, paper SP08-3.
[2] M. Kuschnerov et al., "DSP for coherent single carrier receivers," J. Lightw. Technol., vol. 27, no. 16, pp. 3614-3622, Aug. 2009.
[3] O. E. Agazzi, M. R. Hueda, H. S. Carrer, and D. E. Crivelli, "Maximum likelihood sequence estimation in dispersive optical channels," J. Lightw. Technol., vol. 23, no. 2, pp. 749-763, Feb. 2005.
[4] G. P. Agrawal, Fiber-Optic Communication Systems. Wiley-Interscience, 1997.
[5] K. Pyawanno et al., "Fast and automatic frequency control for coherent receivers," ECOC, Sep. 2009, paper 7.3.1.
[6] M. Taylor, "Phase estimation methods for optical coherent detection using digital signal processing," J. Lightw. Technol., vol. 27, no. 7, pp. 901-914, 2009.
[7] E. A. Lee and D. G. Messerschmitt, Digital Communication, 1st ed. KAP, 1992.
[8] F. M. Gardner, Phaselock Techniques, 3rd ed. Wiley-Interscience, Jul. 2005.
[9] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. Wiley-Interscience, Jan. 1999.
[10] M. Thompson, "Low-latency, high-speed numerically controlled oscillator using progression-of-states technique," IEEE Journal of Solid-State Circuits, vol. 27, no. 1, pp. 113-117, 1992. [Online]. Available: 10.1109/4.109564
[11] D. Messerschmitt, "Frequency detectors for PLL acquisition for timing and carrier recovery," IEEE Trans. Commun., vol. 27, no. 9, pp. 1288-1295, Sep. 1979.

