
ANALYSIS OF THE EFFICIENCY OF SNR-SCALABLE STRATEGIES FOR MOTION COMPENSATED VIDEO CODERS

Josep Prades-Nebot †, Gregory W. Cook ‡ and Edward J. Delp‡

† Departamento de Comunicaciones, Universidad Politécnica de Valencia, Valencia 46071, SPAIN (jprades@dcom.upv.es)

‡ Video and Image Processing Laboratory (VIPER), Purdue University, West Lafayette, IN 47907-1285, USA (cook@ieee.org, ace@purdue.ecn.edu)

ABSTRACT

In this paper, an analysis of the efficiency of three signal-to-noise ratio (SNR) scalable strategies for motion compensated video coders and of their non-scalable counterpart is presented. After assuming some models and hypotheses with respect to the signals and systems involved, we obtain the SNR of each coding strategy as a function of the decoding rate. To validate our analysis, we compare our theoretical results with data from encodings of real video sequences. Results show that our analysis qualitatively describes the performance of each scalable strategy and, therefore, can be useful to understand the main features of each scalable technique and the factors that influence their efficiency.

This work has been supported by a grant from the Secretaría de Estado de Educación y Universidades of the Spanish Government, by the program CICYT TIC-2002-02469, and by an Indiana Twenty-First Century Research and Technology Fund grant.

0-7803-8554-3/04/$20.00 ©2004 IEEE.

1. INTRODUCTION

Scalable video can be decoded at two or more different bit-rates, each corresponding to a different level of quality. Although scalability is a desirable property when video has to be transmitted over channels with errors and bandwidth fluctuations, scalable video coders are not commonly used in practice. One of the reasons is that all scalable coders are less efficient than their non-scalable (NS) counterparts [1, 2, 3, 4, 5]. Consequently, it is important to know the main features of each scalable technique and the factors that influence their efficiency. In this paper, we present a theoretical study of the efficiency of three signal-to-noise ratio (SNR) scalable strategies used in video coders with single-loop motion compensated prediction (MCP).

Figure 1 shows the scheme of a SNR-scalable MCP-based video coder. At the transmitter, the predicted error frames (PEF), represented by signal $e$, are encoded at a rate $R_e$ to generate the bit-stream, and decoded at the loop rate $R_l$ to provide signal $e'$ to the MCP loop. At the decoder, the bit-stream is decoded at $R_l'$ (for the MCP loop) and at the decoding rate $R$. Depending on the values of these four rates ($R_e$, $R_l$, $R_l'$, $R$) we have different coding strategies. If $R_e = R_l = R_l' = R$, then we have a NS coder, which sets the maximum performance for scalable coders. In all the SNR-scalable strategies, $R_e = R_{\max}$ and the decoding rate can vary between the minimum and the maximum rate of the service ($R_{\min} \le R \le R_{\max}$). In Scalable encodings Below the Loop Rate (SBLR), $R_l = R_{\max}$ and $R = R_l'$. This is the encoding strategy proposed in the SNR-scalable MPEG-2 standard [1]. As the transmitter and the receiver have different reference frames, prediction drift is introduced (unless $R = R_{\max}$), which reduces the efficiency. In Scalable encodings Above the Loop Rate (SALR), prediction drift is avoided by setting $R_l = R_l' = R_{\min}$. This is the scalable strategy used in the fine granular scalability (FGS) profile of the MPEG-4 standard [2]. In a SALR coder, the reference frames $s'$ are decoded at $R_{\min}$, which limits the quality of the prediction and, therefore, the efficiency of the coder.

Fig. 1. Scheme of a SNR-scalable MCP-based video coder.

To improve their efficiency, some coders set $R_l$ between $R_{\min}$ and $R_{\max}$ and allow decoding both above and below $R_l$ [3, 4, 5]. In the following, we call this type Scalable encoding Above and Below the Loop Rate (SABLR). In [6], these three scalable schemes were studied considering one-dimensional signals and linear prediction. In this paper, we extend the study in [6] to video signals and motion compensated coders.

In our theoretical analysis we make some assumptions about the signals and systems involved. With respect to the intra-frame encoding, we assume that embedded quantization is used and that the quantization noise $q$ is modeled as an additive white noise with variance

$$\sigma_q^2 = \sigma_e^2 \, 2^{-\beta R}, \tag{1}$$

where $\sigma_e^2$ is the power of the PEF, $\beta$ is a parameter that measures the efficiency of the intra-frame coding, and $R$ is the intra-frame encoding rate [7]. We also assume that $q$ and $e$ are uncorrelated.

The rest of the hypotheses are similar to the ones assumed in [8, 9]. With respect to the input video signal $s$, we assume that its frames constitute a stationary random field. We also assume that
the only difference between consecutive frames is a constant-in-time and uniform-in-space displacement $(d_x, d_y)$. Although these hypotheses are not accurate in real encodings (MVs change in time and space, motion can be non-translatory, and at low rates $q$ is not white and is correlated with $e$), our analysis can still be useful to study the relative performance of every scalable strategy.

In our analysis, we ignore the bits necessary to encode motion vectors (MVs). In practice, this does not introduce significant differences in the relative performance of each scalable strategy, provided that the number of bits used to encode MVs is approximately the same at all rates and is low compared to the number of bits used to encode PEF texture.

In the following, $x$ and $y$ are the spatial variables, and $t$ is the temporal variable of the video sequence. Their corresponding frequency variables are $\omega_x$, $\omega_y$ and $\omega_t$ respectively, although for simplicity $\Lambda = (\omega_x, \omega_y)$ and $\Omega = (\omega_x, \omega_y, \omega_t)$ are sometimes used. The predictor is modeled as a random linear time-invariant system whose frequency response is

$$H(\omega_x, \omega_y, \omega_t) = F(\omega_x, \omega_y) \, e^{-j(\omega_x \hat{d}_x + \omega_y \hat{d}_y + \omega_t)} \tag{2}$$

where $F(\omega_x, \omega_y)$ is the frequency response of the spatial filtering performed in the MCP loop and $(\hat{d}_x, \hat{d}_y)$ is the estimated (random) displacement vector. In general, there is a displacement error vector $\Delta d = (\Delta d_x, \Delta d_y)$:

$$(\Delta d_x, \Delta d_y) = (d_x, d_y) - (\hat{d}_x, \hat{d}_y). \tag{3}$$

2. ANALYSIS OF THE NON-SCALABLE CODER

The block diagram of a non-scalable MCP-based video coder is shown in Figure 2. Notice that the reconstruction error $r = s'' - s$ is equal to the quantization noise $q$, and thus $\sigma_q^2 = \sigma_r^2$.

Fig. 2. Block diagram of the non-scalable coder.

The power spectral density (PSD) of the error frames is [9]:

$$S_{ee}(\Lambda) = S_{ss}(\Lambda) \left[ 1 - 2\,\mathrm{Re}\{F^*(\Lambda) P(\Lambda)\} + |F(\Lambda)|^2 \right] + |F(\Lambda)|^2 S_{qq}(\Lambda) \tag{4}$$

where $S_{ss}(\Lambda)$ and $S_{qq}(\Lambda)$ are the PSD of the input frames and the quantization noise respectively, $\mathrm{Re}\{\cdot\}$ denotes "real part", and $P(\Lambda)$ is the 2-D Fourier Transform of the probability density function $p_{\Delta d}(\Delta d)$. Then, the power of $e$ is

$$\sigma_e^2 = E_s + \sigma_q^2 E_f \tag{5}$$

where $E_s$ is

$$E_s = \frac{1}{4\pi^2} \iint_D S_{ss}(\Lambda) \left[ 1 - 2\,\mathrm{Re}\{F^*(\Lambda) P(\Lambda)\} + |F(\Lambda)|^2 \right] d\Lambda,$$

where $D = \{\Lambda : |\omega_x| < \pi, |\omega_y| < \pi\}$, and $E_f$ is

$$E_f = \frac{1}{4\pi^2} \iint_D |F(\Lambda)|^2 \, d\Lambda. \tag{6}$$

Substituting (1) into (5) gives $\sigma_q^2 = E_s / (2^{\beta R} - E_f)$. Finally, the SNR of the NS coder as a function of the decoding rate is

$$\mathrm{SNR}_{NS}(R) = \frac{\sigma_s^2}{\sigma_r^2} = \frac{\sigma_s^2}{E_s} \left( 2^{\beta R} - E_f \right). \tag{7}$$

If $R$ is large enough so that $2^{\beta R} \gg E_f$, then the SNR (in dB) of the NS coder is an affine function of $R$ with slope $3\beta$.

3. ANALYSIS OF THE SALR SCHEME

Figure 3 shows the block diagram of a SALR coder. The quantization noise $q_b$ is generated by encoding $e$ at $R_{\max}$ and then decoding it at $R_l$. The quantization noise $q$ is generated by encoding $e$ at $R_{\max}$ and decoding it at $R$.

Fig. 3. Block diagram to compute the SNR of the SALR coder.

As in the NS coder, $\sigma_r^2 = \sigma_q^2$, but now

$$\sigma_e^2 = E_s + \sigma_{q_b}^2 E_f \tag{8}$$

and the variance of $q_b$ is

$$\sigma_{q_b}^2 = \sigma_e^2 \, 2^{-\beta R_{\min}}. \tag{9}$$

From (1), (8) and (9), the SNR of the SALR coder is

$$\mathrm{SNR}_{SALR}(R) = \mathrm{SNR}_{NS}(R_{\min}) \, 2^{\beta (R - R_{\min})}. \tag{10}$$

Notice there is no loss with respect to the NS coder at $R_{\min}$. Above this rate, the SNR (in dB) is an affine function of $R$ with slope $3\beta$.

4. ANALYSIS OF THE SBLR CODER

In a SBLR coder, two quantization noise sources must be taken into account (Figure 4). The first one ($q_m$) is placed in the transmitter and is the result of encoding and decoding the predicted error frames at $R_{\max}$. The second one ($q$) is placed in the receiver and is the result of decoding the compressed PEF at $R$.

In this case, the reconstruction error $r$ is

$$r = q_m + \Delta q * h_d \tag{11}$$

where $\Delta q = q - q_m$, $h_d$ represents the end-to-end decoder transfer function, and $*$ is the convolution operator. We assume that $E\{q_m \Delta q\} = 0$ and that $\Delta q$ is white noise, which provides

$$\sigma_r^2 = \sigma_{q_m}^2 + \sigma_{\Delta q}^2 E_d \tag{12}$$

where $\sigma_{q_m}^2$ and $\sigma_{\Delta q}^2$ are the variances of $q_m$ and $\Delta q$ respectively, and $E_d$ is

$$E_d = E\left\{ \frac{1}{8\pi^3} \iiint_{D'} |1 - H(\Omega)|^{-2} \, d\Omega \right\} \tag{13}$$

where $E\{\cdot\}$ is the expectation operator, and $D' = \{\Omega : |\omega_x| < \pi, |\omega_y| < \pi, |\omega_t| < \pi\}$. As $\sigma_{\Delta q}^2 = \sigma_q^2 - \sigma_{q_m}^2$, Expression (12) transforms into

$$\sigma_r^2 = \sigma_{q_m}^2 + \left( \sigma_q^2 - \sigma_{q_m}^2 \right) E_d = \sigma_{q_m}^2 \left[ 1 + \left( 2^{\beta (R_{\max} - R)} - 1 \right) E_d \right]. \tag{14}$$

Finally, from (14) and $\sigma_{q_m}^2 = E_s / (2^{\beta R_{\max}} - E_f)$, we obtain

$$\mathrm{SNR}_{SBLR}(R) = \frac{\mathrm{SNR}_{NS}(R_{\max})}{1 + \left( 2^{\beta (R_{\max} - R)} - 1 \right) E_d}. \tag{15}$$

The SBLR coder has no loss with respect to the NS coder at $R_{\max}$. Below this rate, prediction drift is introduced. Note that if $R$ is far below $R_{\max}$ so that $E_d \, 2^{\beta (R_{\max} - R)} \gg 1$, the SNR of the SBLR coder (in dB) is an affine function of $R$ with slope $3\beta$.

Fig. 4. Block diagram to compute the SNR of the SBLR coder.

5. ANALYSIS OF SABLR CODER

In SABLR coders, according to the decoding rate $R$, we can distinguish two operating intervals:

• the SBLR interval ($R_{\min} \le R \le R_l$), where prediction drift is introduced. In this interval, the SABLR coder has a higher SNR than the SBLR coder.

• the SALR interval ($R_l \le R \le R_{\max}$), where there is a loss of performance with respect to the NS coder because the prediction is based on previous frames decoded at $R_l$ instead of $R$. In this interval, the SABLR coder has a higher SNR than the SALR coder.

From Sections 3 and 4, the SNR for the SABLR coder is:

$$\mathrm{SNR}_{SABLR}(R, R_l) = \begin{cases} \dfrac{\mathrm{SNR}_{NS}(R_l)}{1 + \left( 2^{\beta (R_l - R)} - 1 \right) E_d}, & R \le R_l \\[2ex] \mathrm{SNR}_{NS}(R_l) \, 2^{\beta (R - R_l)}, & R \ge R_l \end{cases} \tag{16}$$

Notice that the SABLR coder has no loss with respect to its NS counterpart at $R_l$.

6. EXPERIMENTAL RESULTS

In this section, we compare our theoretical analysis with data from encodings of real video sequences using the MCP-based SNR-scalable SAMCoW video coder. As SAMCoW uses embedded quantization to encode the PEF [10], it can operate in any of the four coding modes (NS, SALR, SBLR and SABLR).

To obtain specific numerical simulation results, some parameters have to be set. With respect to the video signals, we assume $s$ has an isotropic PSD

$$S_{ss}(\omega_x, \omega_y) = \frac{2\pi \sigma_s^2}{\omega_0^2} \left( 1 + \frac{\omega_x^2 + \omega_y^2}{\omega_0^2} \right)^{-3/2} \tag{17}$$

where $\sigma_s^2$ is the signal power and $\omega_0$ has been set to provide an adjacent step correlation coefficient equal to 0.93 [9]. It is assumed that $\Delta d$ follows a zero-mean isotropic Gaussian distribution with $\sigma_{\Delta d}^2 = 0.2\,T^2$, where $T$ is the spatial sampling period. With respect to the coder, parameter $\beta$ has been set to 3 and, although spatial filtering is not considered, we introduce a leaky factor equal to 0.95, so that $F(\Lambda) = 0.95$. The use of a leaky factor limits the effect of prediction drift in SBLR and SABLR coders. Practical coders usually introduce some implicit or explicit spatial filtering in the MCP loop, which can be considered as a frequency-dependent leaky factor. The rate interval chosen is $R_{\min} = 0.066$ bits/pixel and $R_{\max} = 0.33$ bits/pixel, which for CIF sequences at 30 frames/s is equivalent to $R_{\min} = 200$ kbits/s and $R_{\max} = 1000$ kbits/s.

Figure 5 shows the SNR($R$) function of the NS, SALR, SBLR and SABLR coders for the set of parameters previously described. In the case of the SABLR coder, three curves, corresponding to $R_l = 0.131$, 0.197 and 0.263 bits/pixel, have been plotted. These three rates correspond to 400, 600 and 800 kbits/s respectively, if CIF video sequences at 30 frames/s are used. In the SABLR curves, the $R_l$ value is the rate at which the SABLR and the NS curves intersect. The portions of the three SABLR curves where $R > R_l$ are equivalent to the curves of a SALR coder using $R_{\min} = R_l$. Equivalently, the portions of the SABLR curves where $R < R_l$ can be considered SBLR curves with $R_{\max} = R_l$.

In the SALR intervals of the curves in Figure 5, notice that the larger $R_{\min}$ is, the lower the loss is with respect to the NS coder, but the interval of rates where decoding is possible is also reduced. In fact, if $R_{\min}$ is large enough so that $2^{\beta R_{\min}} \gg E_f$, the loss is insignificant. In the SBLR intervals of the curves, the opposite effect is observed: the loss decreases with a decrease in $R_{\max}$ (again, at the expense of reducing the interval of decoding rates). SABLR coders allow a balancing of both effects and, by setting $R_l$ properly, the mean SNR (MSNR) can be improved with respect to the SALR and the SBLR coders. For the encoding parameters of Figure 5, a maximum MSNR of 10.15 dB is achieved at $R_l = 0.162$ bits/pixel (or, equivalently, at 550.3 kbits/s with CIF sequences at 30 frames/s). For the SALR and the SBLR coders, the MSNR values are 8.86 dB and 8.33 dB respectively.

To test the efficiency of the strategies in practice, we have encoded several test CIF sequences (352 × 288 pixels/frame) at 30 frames/s with SAMCoW. The quality of each encoding is measured by computing the mean PSNR (in dB) of the luminance component of 100 decoded frames. As our theoretical analysis only accounts for the steady-state performance of coders, in every encoding an initial portion of each decoded sequence containing frames with transient response was not considered. Motion estimation is performed at integer-pixel accuracy with no loop filter and, as in theory, a leaky factor $c = 0.95$ is introduced. Figure 6 shows the PSNR($R$) function obtained by encoding Foreman with SAMCoW running in the four strategies. By comparing Figures 5 and 6, we

can study the differences between theory and practice. No attempt has been made to use similar parameter values ($\beta$, $\omega_0$) in theory and practice and, therefore, our comparison is qualitative.

Fig. 5. Numerical simulation of the theoretical SNR($R$) of the four video strategies using the assumptions outlined in Section 6. (Axes: SNR [dB] versus decoding rate [bits/pixel]; curves: NS, SALR, SBLR, SABLR.)

Fig. 6. PSNR($R$) of the four video strategies using SAMCoW. (Axes: PSNR [dB] versus decoding rate [kbits/s]; curves: NS, SALR, SBLR, SABLR.)

With respect to the SALR intervals of the scalable strategies, while in theory all the SALR curves have the same slope, in practice the slope decreases when $R_l$ increases. The reason is that, in practice, $\beta$ is not constant but depends on $R_l$: starting at $R_l = 0$, $\beta$ decreases rapidly as $R_l$ increases, but tends to a constant value at high $R_l$. The consequence is that, in practice, the gain obtained by increasing the value of $R_{\min}$ is lower than the one obtained in theory.

With respect to the SBLR intervals of the scalable strategies, although theory and practice tend to be similar at high decoding rates, there is a great divergence at low decoding rates, where the loss in practice is higher than the theoretical one. The reason for this divergence is that, at low rates, some of our hypotheses do not hold ($\beta$ changes largely with $R$, and $\Delta q$ and $q_m$ are correlated). We have checked that when rate intervals with higher $R_{\min}$ values are used, theory and practice are much closer. Differences between theory and practice in both the SALR and SBLR intervals have two main consequences for the SABLR coder. First, $R_l$ cannot be increased much above $R_{\min}$ because the improvement in the SALR interval could not compensate for the loss introduced in the SBLR interval. Second, in practice, gains with respect to the SALR coder are lower than in theory. In fact, the optimum $R_l$ value is 300 kbits/s, which provides a mean PSNR of 30.72 dB, compared to the 30.41 dB and 28.44 dB of the SALR and SBLR coders respectively.

7. CONCLUSIONS AND FUTURE WORK

In this paper, we have theoretically analyzed the performance of three sorts of MCP-based SNR-scalable video coders and have compared them to their non-scalable counterpart. Results show that the main trends in efficiency described by the theory match practical results obtained from the encoding of real video sequences. Consequently, our analysis is useful to understand the main features of each scalable strategy and the factors that influence their efficiency.

Although the present work only takes into account the steady-state response of SALR and SABLR coders, we are currently extending our analysis by also considering their transitory response. This will allow us to analyze the efficiency of these strategies in coders using periodic intra-frames. We are also studying the optimum values of parameters $c$ and $R_l$ when different degrees of motion estimation accuracy exist.

8. REFERENCES

[1] B. G. Haskell, A. Puri, and A. N. Netravali, Digital Video: An introduction to MPEG-2, Chapman and Hall, 1997.

[2] W. Li, “Overview of fine granularity scalability in MPEG-4 video standard,” IEEE Trans. on CSVT, vol. CSVT-11, pp. 301–317, 2001.

[3] C. Buchner, T. Stockhammer, D. Marp, G. Blatterman, and G. Heising, “Efficient fine granular scalable video coding,” in Proceedings of the ICIP, Thessaloniki, Greece, October 7–10 2001, pp. 997–1000.

[4] J. Prades-Nebot, G. Cook, and E. J. Delp, “Rate control for FFGS video coders,” in Proceedings of the SPIE VCIP, San Jose, California, 2002, vol. 4310, pp. 828–839.

[5] M. van der Schaar and H. Radha, “Adaptive motion-compensation fine-granular-scalability (AMC-FGS) for wireless video,” IEEE Transactions on CSVT, vol. 12, no. 6, pp. 360–370, June 2002.

[6] J. Prades-Nebot and G. W. Cook, “Analysis of the performance of predictive SNR scalable coders,” in Proceedings of the ICIP, Barcelona, Spain, Sept. 2003, vol. 3, pp. 861–864.

[7] P.-Y. Cheng, J. Li, and C.-C. J. Kuo, “Rate control for an embedded wavelet video coder,” IEEE Trans. on CSVT, vol. 7, pp. 696–701, 1997.

[8] B. Girod, “The efficiency of motion-compensating prediction for hybrid coding of video sequences,” IEEE Journal on SAC, vol. SAC-5, no. 7, pp. 1140–1154, 1987.

[9] B. Girod, “Motion-compensating prediction with fractional-pel accuracy,” IEEE Trans. on Communications, vol. 41, pp. 604–611, 1993.

[10] K. Shen and E. J. Delp, “Wavelet based rate scalable video compression,” IEEE Trans. on CSVT, vol. 9, pp. 109–122, 1999.
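
The closed-form expressions (7), (10), (15) and (16) are straightforward to evaluate numerically. The Python sketch below is one possible way to do so and is not the code used to produce Fig. 5. It relies on several assumptions: β = 3, c = 0.95 and the rate interval are the values quoted in Section 6; E_f = c² follows from (6) when F(Λ) = c is constant; E_d = 1/(1 − c²) is an assumed closed form of (13) for a predictor of constant magnitude c; and the ratio E_s/σ_s² = 0.07 is a purely hypothetical placeholder, since computing E_s requires integrating (17) against P(Λ). All function and constant names are illustrative. Because of the placeholder, absolute SNR values need not match Fig. 5; only the relations between the curves are meaningful.

```python
import math

# Values quoted in Section 6 of the paper.
BETA = 3.0                    # intra-frame coding efficiency parameter
LEAK = 0.95                   # leaky factor c, so F(Lambda) = c
R_MIN, R_MAX = 0.066, 0.33    # service rate interval in bits/pixel

# Derived or assumed constants (see the hedges in the text above).
E_F = LEAK ** 2                   # Eq. (6) with constant F(Lambda) = c
E_D = 1.0 / (1.0 - LEAK ** 2)     # ASSUMPTION: Eq. (13) for constant |H| = c
E_S_OVER_SIGMA_S2 = 0.07          # HYPOTHETICAL placeholder for E_s / sigma_s^2


def snr_ns(rate):
    """Eq. (7): SNR of the non-scalable coder (linear scale)."""
    return (2.0 ** (BETA * rate) - E_F) / E_S_OVER_SIGMA_S2


def snr_salr(rate, r_min=R_MIN):
    """Eq. (10): SALR coder, loop rate fixed at r_min, decoding at rate >= r_min."""
    return snr_ns(r_min) * 2.0 ** (BETA * (rate - r_min))


def snr_sblr(rate, r_max=R_MAX):
    """Eq. (15): SBLR coder, loop rate fixed at r_max, decoding at rate <= r_max."""
    drift = 1.0 + (2.0 ** (BETA * (r_max - rate)) - 1.0) * E_D
    return snr_ns(r_max) / drift


def snr_sablr(rate, r_loop):
    """Eq. (16): SBLR branch below the loop rate, SALR branch above it."""
    if rate <= r_loop:
        return snr_sblr(rate, r_max=r_loop)
    return snr_salr(rate, r_min=r_loop)


def to_db(snr_linear):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(snr_linear)


def bpp_to_kbps(bits_per_pixel, width=352, height=288, fps=30):
    """Convert bits/pixel to kbits/s for CIF luminance frames (1 kbit = 1000 bits)."""
    return bits_per_pixel * width * height * fps / 1000.0


if __name__ == "__main__":
    loop_rate = 0.197  # one of the three SABLR loop rates plotted in Fig. 5
    for rate in (0.066, 0.131, 0.197, 0.263, 0.33):
        print(
            f"R={rate:.3f} bpp ({bpp_to_kbps(rate):7.1f} kbit/s)  "
            f"NS={to_db(snr_ns(rate)):6.2f} dB  "
            f"SALR={to_db(snr_salr(rate)):6.2f} dB  "
            f"SBLR={to_db(snr_sblr(rate)):6.2f} dB  "
            f"SABLR={to_db(snr_sablr(rate, loop_rate)):6.2f} dB"
        )
```

Writing the SABLR curve as a call to the SBLR expression with R_max = R_l below the loop rate, and to the SALR expression with R_min = R_l above it, mirrors the equivalence stated in Section 6. It also makes the "no loss at R_l" property hold by construction.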
