2 authors, including:
Haibo Li
KTH Royal Institute of Technology
All content following this page was uploaded by Haibo Li on 26 January 2015.
Abstract— In wavelet video coding, because motion compensation is entwined in the temporal wavelet transform, the distortion in the temporal subbands propagates unevenly into the reconstructed frames of a group of pictures. This leads to fluctuation of reconstruction quality over time, which can be visually displeasing. We propose a new update step, derived by adding a Lagrangian term for even distribution of distortion among the reconstructed frames to the usual goal of minimizing total distortion. Additionally, some heuristics for implementing this new update step are proposed. Experimental results show a reduction of quality fluctuation compared to the conventional update step.

Index Terms— wavelet; scalable video coding; motion-compensated lifted

This work was supported, in part, by the Max Planck Center for Visual Computing and Communication.

I. INTRODUCTION

The recent approach of the motion-compensated lifted wavelet transform for video compression provides a signal decomposition ([1], [2], [3], [4]) which is invertible and can employ arbitrary motion compensation (MC) techniques. This approach provides inherent scalability of bit-rate, spatial resolution and temporal resolution while aiming to approach the coding efficiency of traditional hybrid coding schemes. The analysis in [5] implies that wavelet video coding might reach the same or even better coding efficiency than traditional hybrid coding schemes while providing the additional benefit of scalability. Due to their DPCM structure, hybrid video codecs have to sacrifice some coding efficiency in order to provide scalability.

The pyramidal temporal decomposition of a group of 4 pictures using the MC-lifted Haar wavelet as the basic decomposition unit is shown in Fig. 1. The MC-lifted Haar wavelet decomposition with input frames $X$ and $Y$ is shown in Fig. 2. Note that the transform in Fig. 2 is never orthonormal as drawn, since frame $L$ should be scaled by $\sqrt{2}$ and frame $H$ by $1/\sqrt{2}$. Even after including these scaling factors, the inclusion of motion compensation means the temporal wavelet transform is no longer orthonormal, and typically frame $Y$ is penalized more than frame $X$ when the temporal subbands are quantized [6]. Since the basic decomposition unit is used in a pyramidal fashion within a group of pictures (GOP), this leads to a PSNR fluctuation pattern. In Section II we derive the new update step, which takes into account a Lagrangian term for even distribution of distortion in addition to minimizing total distortion in the reconstructed frames. Section III proposes some heuristics for implementation. Section IV shows experimental results.

Fig. 1. Temporal decomposition of a group of 4 pictures using the MC-lifted Haar wavelet transform as the basic decomposition unit. [Figure: frames 1, 2, 3, 4; first-level subbands L1, H1, L2, H2; second-level subbands LL1, LH1.]

Fig. 2. Lifted Haar wavelet decomposition. [Figure: frames X and Y are split into L and H using prediction P and update U; quantization noise ΔL and ΔH is added to the subbands; synthesis with U and P yields the reconstruction errors ΔX and ΔY.]

II. MODIFIED UPDATE STEP

The input frames in Fig. 2, represented as column vectors, are denoted by $X$ and $Y$, respectively. The prediction and update operators can then be represented by the square matrices $P$ and
$U$, respectively. The quantization noise added to the temporal subbands is denoted by the column vectors $\Delta L$ and $\Delta H$. The error propagated to the reconstructed frames is represented by the column vectors $\Delta X$ and $\Delta Y$.

The expected total distortion in the reconstructed frames is $E(\Delta X^T \Delta X) + E(\Delta Y^T \Delta Y)$. The optimal update step from the point of view of minimizing total distortion in the reconstruction of $X$ and $Y$ is derived in [7]. The result is directly stated here:

$$U_{opt} = (I + P^T P)^{-1} P^T \tag{1}$$

We formulate an extended cost function to be minimized as shown below. The term weighted by the Lagrangian multiplier $\lambda$ aims to reduce the difference between the distortions in $X$ and $Y$:

$$J = E(\Delta X^T \Delta X) + E(\Delta Y^T \Delta Y) + \lambda \{ E(\Delta X^T \Delta X) - E(\Delta Y^T \Delta Y) \}$$

The distortion terms in the above equation can be expressed as follows by tracking the synthesis of $X$ and $Y$ from $L$ and $H$, i.e., $\Delta X = \Delta L - U \Delta H$ and $\Delta Y = P \Delta X + \Delta H$:

$$\begin{aligned}
E(\Delta X^T \Delta X) &= E\{(\Delta L^T - \Delta H^T U^T)(\Delta L - U \Delta H)\} \\
&= E(\Delta L^T \Delta L - \Delta L^T U \Delta H - \Delta H^T U^T \Delta L + \Delta H^T U^T U \Delta H) \\
&= E(\mathrm{tr}(\Delta L \Delta L^T - U \Delta H \Delta L^T - U^T \Delta L \Delta H^T + U^T U \Delta H \Delta H^T)) \\
&= \mathrm{tr}(R_L + U^T U R_H)
\end{aligned}$$

where tr denotes the trace, and $R_L = E(\Delta L \Delta L^T)$ and $R_H = E(\Delta H \Delta H^T)$ are the autocorrelation matrices of the subband noise. It is assumed that $\Delta L$ and $\Delta H$ are zero-mean, mutually uncorrelated stochastic processes, so the cross terms vanish. Similarly,

$$\begin{aligned}
E(\Delta Y^T \Delta Y) &= E\{(\Delta L^T P^T + \Delta H^T (I - U^T P^T))(P \Delta L + (I - P U) \Delta H)\} \\
&= \mathrm{tr}(P^T P R_L + (I - U^T P^T)(I - P U) R_H)
\end{aligned}$$

The new update step is obtained by setting the derivative of $J$ with respect to $U$ to 0. The structure of the matrices $R_L$ and $R_H$ is immaterial: $R_L$ does not depend on $U$, and $R_H$ appears only as a common right factor of the derivative, which cancels provided it is invertible:

$$\begin{aligned}
\frac{\partial J}{\partial U} = 0 &\Rightarrow (1+\lambda) U - (1-\lambda) P^T + (1-\lambda) P^T P U = 0 \\
&\Rightarrow U_m = [(1+\lambda) I + (1-\lambda) P^T P]^{-1} (1-\lambda) P^T \\
&\Rightarrow U_m = (\alpha I + P^T P)^{-1} P^T
\end{aligned} \tag{2}$$

where $\alpha = \dfrac{1+\lambda}{1-\lambda}$.

The new update step (2) includes (1) as the special case $\alpha = 1$, i.e., $\lambda = 0$. In general, the appropriate $\alpha$ has yet to be determined. When either the conventional update step or the update step (1) is used, $E(\Delta X^T \Delta X) - E(\Delta Y^T \Delta Y)$ is negative. Hence $\lambda$ has to be negative; for $-1 < \lambda < 0$ this corresponds to $0 < \alpha < 1$. The best choice of $\alpha$ might also depend on the input signal, i.e., the frames $X$ and $Y$ undergoing that particular decomposition step. Within the decomposition of a GOP, every single decomposition step could be carried out with its own appropriate $\alpha$.

III. HEURISTIC RULES FOR IMPLEMENTING THE MODIFIED UPDATE STEP

Implementing (2) directly entails considerable complexity and storage. Hence we propose some heuristics which implement (2) either exactly or approximately for a given value of $\alpha$.

A. Scaling the Conventional Update Step

By performing an eigenvalue decomposition $P^T P = T^{-1} \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n) T$ and substituting it into both (1) and (2), we can write:

$$U_{opt} = T^{-1} \mathrm{diag}\left(\frac{1}{1+\lambda_1}, \frac{1}{1+\lambda_2}, \ldots, \frac{1}{1+\lambda_n}\right) T P^T \tag{3}$$

$$U_m = T^{-1} \mathrm{diag}\left(\frac{1}{\alpha+\lambda_1}, \frac{1}{\alpha+\lambda_2}, \ldots, \frac{1}{\alpha+\lambda_n}\right) T P^T \tag{4}$$

All the $\lambda_i$ above are non-negative, since $P^T P$ is positive semidefinite, and for simple motion they are quite close to each other. In the extreme case where all $\lambda_i$ equal a constant $c$, (4) is just (3) scaled by $(1+c)/(\alpha+c)$. Moreover, from [7] we know that $U_{opt}$ is very close to the conventional update step $U_{conv}$. This leads to our first heuristic: $U_m \approx \beta U_{conv}$, i.e., a scaled version of the conventional update step.

B. Modified Barbell Update

Barbell lifting [8] implies the inversion of a many-to-many mapping. The weights used in this inversion are the same as the pixel-connection weights in the forward mapping. This idea suffers from the fact that if a pixel in frame $X$ is M-connected, it receives an update component from several pixels in the update step, which might lead to inappropriate amounts of energy. This problem is mitigated by using the attenuation factor $1/(M+\alpha)$ in addition to the bilinear-interpolation weight which connects the particular pixel pair from frame $X$ and frame $Y$. The attenuation factor $1/(M+\alpha)$ is a characteristic of the pixel in frame $X$, while the interpolation weight is a characteristic of the connection of the pixel pair. Fig. 3 shows an example with half-pel MC. In the case of full-pel MC, the pixel-pair connection weight is always 1 for any connection, and the modified Barbell update implements the solution (2) exactly for any given value of $\alpha$: every row of $P$ then holds a single 1, so $P^T P$ is diagonal with the connection counts $M$ on its diagonal, and (2) reduces to the per-pixel weight $1/(M+\alpha)$. This agrees with the implementation rule for $U_{opt}$ for full-pel MC given in [7], which is stated as follows: if a pixel in frame $X$ is M-connected, all the connected pixels in the highband $H$ are added to the pixel with a weight of $1/(M+1)$; 1-connected pixels are included as the special case $M = 1$. If a pixel in frame $X$ is unconnected, it is simply copied to the respective position in the lowband $L$. The exact implementation of $U_m$ for full-pel MC can therefore be viewed as the more general case of the rule in [7], with weight $1/(M+\alpha)$ for any given $\alpha$.

Fig. 4 shows the reconstruction PSNR for 2 different pairs of frames, $(X_1, Y_1)$ and $(X_2, Y_2)$. The first pair consists of frames 1 and 2 of the Foreman sequence (CIF). The second pair consists of frames 1 and 5.
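The lifting structure of Fig. 2 and the distortion expressions of Section II can be checked numerically. The following is a minimal NumPy sketch, assuming a toy 1-D "frame" of four pixels, a hypothetical full-pel motion map `mv` (Y pixel i is predicted from X pixel `mv[i]`), and white, uncorrelated subband noise with $R_L = R_H = \sigma^2 I$. It builds $P$, evaluates the update step (2), confirms that lifting is invertible for any $U$, and evaluates the two trace expressions for $E(\Delta X^T \Delta X)$ and $E(\Delta Y^T \Delta Y)$. All function names are illustrative and not taken from the paper.

```python
import numpy as np

def prediction_matrix(mv, n):
    """Full-pel MC as a 0/1 matrix: row i of P picks X pixel mv[i] to predict Y pixel i."""
    P = np.zeros((len(mv), n))
    P[np.arange(len(mv)), mv] = 1.0
    return P

def modified_update(P, alpha):
    """Update step (2): U_m = (alpha*I + P^T P)^{-1} P^T; alpha = 1 recovers U_opt in (1)."""
    return np.linalg.solve(alpha * np.eye(P.shape[1]) + P.T @ P, P.T)

def analyze(X, Y, P, U):
    """Lifted Haar analysis as in Fig. 2: H = Y - P X, then L = X + U H."""
    H = Y - P @ X
    return X + U @ H, H

def synthesize(L, H, P, U):
    """Inverse lifting: X = L - U H, then Y = H + P X."""
    X = L - U @ H
    return X, H + P @ X

def expected_distortions(P, U, var=1.0):
    """tr(R_L + U^T U R_H) and tr(P^T P R_L + (I - PU)^T (I - PU) R_H),
    assuming white subband noise R_L = R_H = var * I (a simplifying assumption)."""
    E = np.eye(P.shape[0]) - P @ U
    d_x = var * (P.shape[1] + np.trace(U.T @ U))
    d_y = var * (np.trace(P.T @ P) + np.trace(E.T @ E))
    return d_x, d_y

mv = np.array([0, 0, 1, 3])   # X pixel 0 is 2-connected, X pixel 2 is unconnected
P = prediction_matrix(mv, 4)
rng = np.random.default_rng(0)
X, Y = rng.uniform(0.0, 255.0, 4), rng.uniform(0.0, 255.0, 4)

for alpha in (1.0, 0.5):
    U = modified_update(P, alpha)
    L, H = analyze(X, Y, P, U)
    X_rec, Y_rec = synthesize(L, H, P, U)
    assert np.allclose(X_rec, X) and np.allclose(Y_rec, Y)  # lifting is invertible for any U
    d_x, d_y = expected_distortions(P, U)
    print(f"alpha={alpha}: E|dX|^2={d_x:.3f}  E|dY|^2={d_y:.3f}")
```

With this toy motion map, $Y$ is penalized more than $X$ at $\alpha = 1$ (about 5.61 versus 4.72 per unit noise variance), while at $\alpha = 0.5$ the two distortions nearly meet (about 5.26 versus 5.21) at the cost of a slightly higher total, mirroring the trade-off that the Lagrangian term in $J$ is meant to control.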
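The statement that the modified Barbell update implements (2) exactly for full-pel MC can also be checked with a short sketch. The NumPy code below, again on a hypothetical toy motion map, compares three constructions of $U_m$: the direct form (2), the eigen-decomposition form (4), and the per-pixel rule with weight $1/(M+\alpha)$. Since $P^T P$ is symmetric, the sketch assumes orthonormal eigenvectors $V$, so that $T = V^T$ and $T^{-1} = V$. The values 0.76 and 0.62 are borrowed from the Fig. 4 experiment purely as sample inputs. A final check illustrates the scaling argument of Section III-A for the case where all eigenvalues are equal.

```python
import numpy as np

def prediction_matrix(mv, n):
    """Full-pel MC: row i of P selects X pixel mv[i] as the prediction of Y pixel i."""
    P = np.zeros((len(mv), n))
    P[np.arange(len(mv)), mv] = 1.0
    return P

def update_direct(P, alpha):
    """Equation (2): U_m = (alpha*I + P^T P)^{-1} P^T."""
    return np.linalg.solve(alpha * np.eye(P.shape[1]) + P.T @ P, P.T)

def update_eigen(P, alpha):
    """Equation (4) with P^T P = V diag(lam) V^T, i.e. T = V^T and T^{-1} = V."""
    lam, V = np.linalg.eigh(P.T @ P)
    return V @ np.diag(1.0 / (alpha + lam)) @ V.T @ P.T

def update_barbell_full_pel(mv, n, alpha):
    """Modified Barbell rule for full-pel MC: every H pixel connected to X pixel j
    is added to it with weight 1/(M_j + alpha); unconnected X pixels (M_j = 0)
    receive no update, i.e. they are copied to L."""
    M = np.bincount(mv, minlength=n)   # connection count M_j of each X pixel
    U = np.zeros((n, len(mv)))
    for i, j in enumerate(mv):
        U[j, i] = 1.0 / (M[j] + alpha)
    return U

n = 5
mv = np.array([0, 0, 1, 3, 3])          # hypothetical motion map: M = [2, 1, 0, 2, 0]
P = prediction_matrix(mv, n)
for alpha in (1.0, 0.76, 0.62):
    U_rule = update_barbell_full_pel(mv, n, alpha)
    assert np.allclose(update_direct(P, alpha), U_rule)
    assert np.allclose(update_eigen(P, alpha), U_rule)

# Section III-A: if all eigenvalues of P^T P equal c (here a permutation, c = 1),
# U_m is U_opt scaled by (1 + c) / (alpha + c).
P_perm = prediction_matrix(np.array([1, 2, 0, 4, 3]), n)
alpha = 0.7
assert np.allclose(update_direct(P_perm, alpha),
                   (2.0 / (alpha + 1.0)) * update_direct(P_perm, 1.0))
```

The per-pixel version never forms or inverts a matrix, which is why the full-pel Barbell rule is attractive in practice; for sub-pel MC, $P^T P$ is no longer diagonal and the rule becomes an approximation.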
Fig. 3. Modified Barbell update example with half-pel MC. [Figure panels: Prediction, connecting X to H(Y), and Update, connecting H(Y) to L(X); connection weights 1 and 0.5; attenuation factors 1/(1+α) and 1/(2+α).]

Fig. 4. Experiment with the modified Barbell update step showing the search for the optimal $\alpha$ with two different pairs of frames, $(X_1, Y_1)$ and $(X_2, Y_2)$.

The two frames in the second pair are further apart temporally than the two frames in the first pair. In this case there are more unconnected pixels, and the efficiency of the motion compensation is also lower. Without any consideration for reducing fluctuations ($\alpha = 1$), frame $Y$ is penalized more than frame $X$. As we reduce $\alpha$, the distortions in the 2 frames get closer. In this particular experiment, the optimal value of $\alpha$ is around 0.76 for the first pair and around 0.62 for the second pair. If $\alpha$ is reduced below these values, the distortion pattern is reversed and frame $X$ gets more distortion than frame $Y$. Notice that with more unconnected pixels we generally find the required balance at a lower value of $\alpha$. In this experiment we added exactly the same noise, representing quantization noise of the temporal subbands, for every trial with a different $\alpha$. The noise added in the two temporal subbands $L$ and $H$ is zero-mean with no cross-correlation, and both subbands have the same noise variance. This additive noise, drawn from a uniform distribution, is referred to as quantization noise in the rest of the paper. In this experiment we also see that there is no significant drop in the average PSNR. The same holds when the experiment is extended to a larger GOP size, i.e., more temporal decomposition levels.

IV. PERFORMANCE OF THE HEURISTICS

All the results in this section are plotted for 8 frames in a GOP, which amounts to 3 levels of temporal decomposition with the Haar wavelet. The quantization noise added in the 8 temporal subbands is zero-mean with no cross-correlation, and all subbands have the same noise variance. The various schemes are compared by adding exactly the same quantization noise.

Fig. 5 shows the results for quarter-pel MC and scaling of the conventional update step. The scaling factor $\beta$ selected here was determined empirically from a few trials and is not guaranteed to be the global optimum. Also, a constant value of $\beta$ is used for the entire sequence, whereas ideally $\beta$ should be optimized for every single decomposition unit. Even while searching for the optimal value of a parameter (like $\beta$) for any scheme, the same quantization noise is added for every trial. It can be seen that the range of fluctuations is reduced by half.

Fig. 5. Scaling the conventional update step for quarter-pel MC for the Foreman sequence (CIF), luminance component.

Fig. 6 shows the results for quarter-pel MC and the modified Barbell update. In this case a different value of $\alpha$ is used for every temporal level, though it is kept constant for the entire sequence. These values were also obtained from a few trials with the same sequence. This heuristic performs better than scaling
the conventional update step, as can be seen by comparison with Fig. 5.

Fig. 7 shows the results for full-pel MC. In this case the modified Barbell update implements $U_m$ exactly, and its improvement over the simpler scaling heuristic is clearly visible. In this plot both $\alpha$ and $\beta$ are kept constant for the entire sequence. Note that the gain of quarter-pel MC over full-pel MC is also visible by comparing the plots in Fig. 5 and Fig. 6 with those in Fig. 7.

Fig. 6. Modified Barbell update step for quarter-pel MC for the Foreman sequence (CIF), luminance component.

Fig. 7. Comparison of the two heuristics with full-pel MC for the Foreman sequence (CIF), luminance component.

V. CONCLUSION

... which leads to fluctuation of PSNR of the reconstructed frames. To mitigate this problem we design a new update step and propose some heuristics for its implementation. This can be viewed as a solution employed early, during the temporal decomposition, in contrast to other approaches which do not alter the temporal decomposition but instead change the rate allocation among the temporal subbands to mitigate the fluctuation problem as much as possible. We believe that these two solutions should be combined, i.e., the rate allocation among the temporal subbands following our new update step should not merely minimize the total distortion in a group of pictures but instead minimize the maximum distortion in the reconstructed frames of the group of pictures. This requires flexible modeling of the distortion propagation from the temporal subbands into the reconstructed frames for this new update step with any given $\alpha$. Thus we propose to tackle the fluctuation problem in two consecutive stages: temporal decomposition and spatial encoding of the temporal subbands. This will be part of future research.

REFERENCES

[1] B. Pesquet-Popescu and V. Bottreau, "Three-dimensional lifting schemes for motion compensated video compression," Proceedings of the IEEE Int. Conference on Acoustics, Speech and Signal Processing (ICASSP), Salt Lake City, UT, U.S.A., vol. 3, pp. 1793–1796, Dec. 2001.
[2] L. Luo, J. Li, S. Li, Z. Zhuang, and Y.-Q. Zhang, "Motion compensated lifting wavelet and its application in video coding," Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), Tokyo, Japan, pp. 481–484, Aug. 2001.
[3] A. Secker and D. Taubman, "Motion-compensated highly scalable video compression using an adaptive 3D wavelet transform based on lifting," Proceedings of the IEEE Int. Conference on Image Processing (ICIP), Thessaloniki, Greece, vol. 2, pp. 1029–1032, Oct. 2001.
[4] J.-R. Ohm, "Motion-compensated wavelet lifting filters with flexible adaptation," Proceedings of the Int. Workshop on Digital Communications (IWDC), Capri, Italy, pp. 113–120, Sept. 2002.
[5] M. Flierl and B. Girod, "Video coding with motion-compensated lifted wavelet transforms," EURASIP Journal on Image Communication, Special Issue on Subband/Wavelet Interframe Video Coding, vol. 19, no. 7, pp. 561–575, Aug. 2004.
[6] A. A. Mavlankar and E. Steinbach, "Distortion prediction for motion-compensated lifted Haar wavelet transform and its application to rate allocation," Proceedings of the Picture Coding Symposium (PCS), San Francisco, U.S.A., Dec. 2004.
[7] B. Girod, S. Han, and C.-L. Chang, "Optimum update step for motion-compensated lifted wavelet coding," Proceedings of the Picture Coding Symposium (PCS), San Francisco, U.S.A., Dec. 2004.
[8] R. Xiong, F. Wu, J. Xu, S. Li, and Y.-Q. Zhang, "Barbell lifting wavelet transform for highly scalable video coding," Proceedings of the Picture Coding Symposium (PCS), San Francisco, U.S.A., Dec. 2004.
[9] S. J. Choi and J. Woods, "Motion-compensated 3-D subband coding of video," IEEE Transactions on Image Processing, vol. 8, no. 2, pp. 155–167, Feb. 1999.
[10] K. Hanke, J.-R. Ohm, and T. Rusert, "Adaptation of filters and quantization in spatiotemporal wavelet coding with motion compensation," Proceedings of the Int. Picture Coding Symposium (PCS), Saint Malo, France, pp. 49–54, Apr. 2003.
[11] B. Girod and S. Han, "Optimal update step for motion-compensated lifting," IEEE Signal Processing Letters, vol. 12, no. 2, pp. 150–153, Feb. 2005.