
Comparison of Compression Performance of 10-bit vs. 8-bit Depth, under H.264 Hi422 Profile


Damian Ruiz¹, Srdjan Sladojevic², Dubravko Culibrk³, Gerardo Fernández-Escribano⁴


Abstract: H.264 is one of the first video coding standards to
incorporate coding formats with a bit depth above 8 bits.
This paper presents the results of compression comparison tests
for the H.264 High 422 profile, between 10-bit and 8-bit
sample depths. The simulations were run on five 720p and 1080i
high definition sequences. PSNR and SSIM metrics were used to
evaluate the objective quality performance of both bit depths
and, with the aim of enabling a fair comparison, both metrics
were computed with 10-bit precision, up-scaling the 8-bit
decoded sequences to 10 bits. Some works have been published
in this field based on the evaluation of commercial 10-bit H.264
implementations. In this work, we carry out a neutral evaluation
of H.264 standard performance using the official H.264
Reference Software. Contrary to the expected 10-bit coding gains,
the results show unnoticeable differences between both sample
depths in terms of objective quality, lower than 0.1 dB for PSNR
and 0.002 for SSIM, with a 5% bit rate saving that can be achieved
for the luminance component, especially for the 720p format, and
negligible quality improvement for U and V chroma components.
Keywords: H.264, Fidelity Range Extensions, Internal Bit
Depth Increase, bit-depth, contouring, banding artifacts, High
422 Profile.
I. INTRODUCTION
The AVC/H.264 coding standard [1] represents the state of the
art in video coding, for both personal users and the professional
environment. Although its complexity is greater than that of
previous standards, such as MPEG-2 and MPEG-4 Part 2, the
H.264 bit rate savings are greater than 50% for the same
perceptual quality, especially for the new high definition
formats. The high coding efficiency of H.264 has been
achieved through the introduction of new tools, such as a new
Intra-frame prediction, an in-loop filter, and a new highly
efficient entropy coder (CABAC), and through the improvement
of tools used in previous video standards, such as the variable
block size used in motion estimation, the 4x4 and 8x8 integer
DCT transforms, and the use of hierarchical B frames. The
success of the H.264 standard for multimedia and broadcasting
services promoted the development of new professional
profiles, known as FRExt [2], targeted at live contribution
services, media editing, and storage and archiving. These new
profiles allow the use of 10-bit depth and the 4:2:2 chroma
subsampling format.

¹ Damian Ruiz is with the Instituto de Investigación en Informática
de Albacete, Universidad de Castilla-La Mancha, Albacete, Spain,
E-mail: josedamian.ruiz@alu.uclm.es

² Srdjan Sladojevic is with the University of Novi Sad, Trg
Dositeja Obradovica 6, Novi Sad, Serbia, E-mail:
sladojevic@uns.ac.rs

It is commonly accepted that the use of bit depths beyond 8
bits improves the subjective quality of uncompressed
formats, preventing traditional distortions such as contouring,
banding artifacts, or smearing in scenes with smooth areas.
Support for 10-bit depth has also been adopted in the new
video coding standard HEVC (High Efficiency Video
Coding), recently approved by ITU-T and ISO [3][12] in
January 2013.
From the beginning, the compression standards were
designed with 8-bit depth encoding due to the difficulty in
perceiving the subjective quality improvement that more than
8-bit encoding could provide. The tradeoff between quality
and computational complexity of 10-bit sample depth is not
yet sufficiently justified for professional environments.
This work presents the results of compression comparison
tests between 10-bit vs. 8-bit sample depth encoding using the
H.264 High 422 Profile (Hi422), targeted for the high
definition format used for current contribution services. The
simulations have been run using the JM 18.0 H.264 reference
software [4], coding five original 10-bit@422 high definition
sequences, from both 720p and 1080i formats.
It is well known that the Peak Signal to Noise Ratio
(PSNR) metric, under certain circumstances, may not correlate
well with subjective quality. The Structural
Similarity Index Metric (SSIM) [5] is accepted within the
scientific community as a better approximation of perceptual
quality than PSNR. Since the impact of 10-bit vs. 8-bit depth
is largely a matter of perceptual quality, the SSIM metric is
computed in addition to PSNR in this work.
The rest of this paper is organized as follows. The FRExt
H.264 profiles are presented in Section II. Section III
describes the test bed model used to compute both metrics,
including the 10-bit to 8-bit sample conversion process.
Section IV presents the simulation results in terms of PSNR
and SSIM. Finally, the conclusions of this paper are presented
in Section V.
II. PROFESSIONAL H.264 PROFILES AND FUTURE EVOLUTION
Professional production and contribution media require
higher visual quality than is required at the end of the chain,
where content is only viewed by the user. Typically, content
goes through several post-production and editing stages during
production. For this reason, an H.264 bit rate two or three
times higher than distribution bit rates (around 10 Mbps for
high definition formats) is required, as well as a 4:2:2 chroma
subsampling scheme, to allow high color fidelity.

³ Dubravko Culibrk is with the University of Novi Sad, Trg
Dositeja Obradovica 6, Novi Sad, Serbia, E-mail: dculibrk@uns.ac.rs

⁴ Gerardo Fernandez-Escribano is with the Instituto de
Investigación en Informática de Albacete, Universidad de Castilla-La
Mancha, Albacete, Spain, E-mail: Gerardo.Fernandez@uclm.es

978-1-4799-0902-5/13/$31.00 ©2013 IEEE

MPEG-2 [6] [7] was the first standard that included a
profile supporting the 4:2:2 chroma subsampling format, and it
was widely accepted by the professional industry as a format
for storing and exchanging audiovisual content.
Similarly, the following standard, MPEG-4 Part 2 [8],
included a new set of tools for professional applications,
defining a Studio Profile for use in acquisition, editing, and
post-production applications. None of these standards
supported formats beyond 8-bit depth. H.264 was the first to
do so, supporting bit depths even above 10 bits with its
FRExt profiles [9], including the High 10 and High 422
profiles, the subject of evaluation in this work.
The High Profiles family also added new improvements,
such as 8x8 Intra prediction, the integer 8x8 DCT transform,
and perceptual quantization matrices. These tools achieve
better subjective quality than the basic profiles, but it is
relevant to stress that 10-bit depth is not considered to be a
new tool that improves coding efficiency, but rather an
additional feature that those profiles offer [10].
In digital signal processing, it is well known that increasing
the internal arithmetic precision beyond the resolution of the
incoming data yields more accurate results.
This fact motivated the use of a new tool in video
compression called IBDI (Internal Bit Depth Increase), which
increases the internal processing precision above 8 bits.
Reference [11] demonstrates that a 10-bit internal precision
applied to H.264 for the 720p format can achieve bit rate
savings of up to 5%, and up to 10% using 12 bits, compared
with an internal arithmetic precision of 8 bits.
That experiment addresses the inverse scenario to ours: the
input bit depth is up-scaled by two bits (8-bit to 10-bit)
instead of being down-scaled by two bits (10-bit to 8-bit),
which affects compression efficiency in a different way.
III. SIMULATION METHODOLOGY
Fig. 1 depicts the encoding architecture used to carry out
the simulations in this work. The 10-bit source
sequences are down-scaled to 8 bits by rounding, saturating,
and shifting right by 2 bits, and the inverse up-scaling process
is applied to the 8-bit decoded sequences by shifting left by
2 bits and filling the two least significant bits, following
the procedure described in [13].
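The sample conversion described above can be sketched as follows. This is a minimal illustration rather than the exact procedure of [13]: the rounding offset of 2 (round half up) and the zero fill of the two least significant bits are assumptions, and the function names are hypothetical.

```python
def downscale_10_to_8(sample):
    """Convert one 10-bit sample to 8 bits: round, shift right by 2, saturate.

    Assumption: rounding adds half of the step (2) before the shift.
    """
    if not 0 <= sample <= 1023:
        raise ValueError("expected a 10-bit sample in [0, 1023]")
    rounded = (sample + 2) >> 2   # round half up, then drop 2 bits
    return min(rounded, 255)      # saturate to the 8-bit range


def upscale_8_to_10(sample):
    """Convert one 8-bit sample back to 10 bits by a 2-bit left shift.

    Assumption: the two new least significant bits are filled with zeros.
    """
    if not 0 <= sample <= 255:
        raise ValueError("expected an 8-bit sample in [0, 255]")
    return sample << 2


if __name__ == "__main__":
    for value in (0, 513, 514, 1023):
        low = downscale_10_to_8(value)
        print(value, low, upscale_8_to_10(low))
```

Saturation matters only at the top of the range: without it, a source value of 1023 would round up to 256, which does not fit in 8 bits.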
These procedures allow us to perform a fair evaluation of
the PSNR and SSIM quality metrics, comparing sequences
with the same bit-depth (10-bits). Therefore, both metrics will
use a peak value of 1023 instead of 255 (8-bits), as is shown
in (1).
$$
PSNR\,(\mathrm{dB}) = 20 \log_{10}\left(\frac{2^{N}-1}{\sqrt{MSE}}\right), \qquad N = 10 \qquad (1)
$$
This methodology faithfully represents a real contribution
scenario, where original 10-bit sequences are first
down-scaled to 8 bits, encoded at 8-bit depth
(422@8b), stored, and transmitted, and finally up-scaled to
10 bits on the decoder side.
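As an illustration of how both metrics depend on the 10-bit peak value of 1023, the sketch below computes PSNR as in (1) and a simplified SSIM over flat lists of samples. The SSIM here is an assumption-laden simplification: it uses a single window covering all samples, whereas [5] averages the index over local windows; the constants K1 = 0.01 and K2 = 0.03 are the defaults from [5]. Function names are hypothetical.

```python
import math


def psnr_db(ref, test, bit_depth=10):
    """PSNR in dB with a peak value of 2**bit_depth - 1 (1023 for 10 bits)."""
    if len(ref) != len(test) or not ref:
        raise ValueError("inputs must be non-empty and of equal length")
    peak = (1 << bit_depth) - 1
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return math.inf  # identical signals
    return 20.0 * math.log10(peak / math.sqrt(mse))


def global_ssim(ref, test, bit_depth=10, k1=0.01, k2=0.03):
    """Single-window SSIM over all samples (simplified, not the windowed mean)."""
    n = len(ref)
    if n != len(test) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    peak = (1 << bit_depth) - 1
    c1 = (k1 * peak) ** 2  # stabilizing constants scale with the peak value
    c2 = (k2 * peak) ** 2
    mu_x = sum(ref) / n
    mu_y = sum(test) / n
    var_x = sum((x - mu_x) ** 2 for x in ref) / (n - 1)
    var_y = sum((y - mu_y) ** 2 for y in test) / (n - 1)
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(ref, test)) / (n - 1)
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


if __name__ == "__main__":
    source = [512, 520, 530, 541]    # hypothetical 10-bit samples
    decoded = [513, 519, 531, 540]   # hypothetical decoded samples
    print(psnr_db(source, decoded), global_ssim(source, decoded))
```

Evaluating the up-scaled 8-bit decode with bit_depth=10 keeps the peak value, and therefore the scale of both metrics, identical for the two coding paths.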

[Figure: block diagram. Blocks: Sequence 10b; H.264 Hi422@10b; Round, clip, >> 2; Sequence 8b; H.264 Hi422@8b; << 2; PSNR/SSIM (one per coding path)]

Fig. 1. Experimental architecture for 422@10b and 422@8b

The aim of this experiment is to evaluate the coding
performance of the H.264 Hi422 profile for 10-bit and 8-bit
pixel depth sequences, not to evaluate custom
implementations [14] [15]. For this reason, all of the
simulations were run using the AVC/H.264 reference
software JM, version 18.0, available from [4].
IV. EXPERIMENTAL RESULTS
Five original 422@10b high definition sequences, each
10 seconds long, were used in this experiment.
Four of them, CrowdRun, Ducks, IntoTree, and ParkJoy, are
supplied by SVT [16], and the last one, Dancer, supplied by
the European Broadcasting Union (EBU), is available at [17].



Fig. 2. First frame of Test sequences used in experiments (422@10b)
To run the simulations, the authors followed the Common
Simulation Conditions recommended by ITU-T in
[18] for the analysis of H.264 coding of high resolution
sequences, adapting them to the current professional scenario
for live transmissions, which uses a fixed GOP. Accordingly,
the Hi422 profile and a GOP length of 32 were used. Other
relevant parameters used in this work are shown in Table I.

TABLE I
COMMON SIMULATION CONDITIONS FOR H.264

Parameter                              Value
CABAC                                  On
Number of B frames                     7
Pyramid Coding Levels                  3
Explicit Pyramid Format (JM defined)   b3r0b1r1b0e2b2e2b5r1b4e2b6e2
Rate Distortion Optimization           1
Motion Estimation Search Range         64
Use Fast Motion Estimation             3
Number of Reference Frames             4
QP for I Slices                        22, 27, 32, 37
QP for P Slices                        23, 28, 33, 38
QP for B Slices                        24, 29, 34, 39

To compute the objective difference between the 422@10b and
422@8b simulations, we used the Bjøntegaard Delta
methodology defined by ITU in [19], which yields the
BD-Rate, BD-PSNR, and BD-SSIM figures.
This method calculates the average difference between two
Rate-Distortion curves (Rate vs. PSNR or Rate vs. SSIM), a
and b, fitting each curve through four data points, one
for each QP (22, 27, 32, and 37). Under the convention
used in this work, a negative BD-PSNR or BD-SSIM value
means that Hi422@8b delivers lower quality than Hi422@10b
at the same bit rate, and a positive value means that
8-bit coding performs better.
For BD-Rate, a negative value means that, for the
same quality, Hi422@8b coding requires a lower bit rate
than Hi422@10b, that is, 8-bit depth achieves a bit rate
saving at the quality of 10-bit coding. BD-Rate is
expressed as the percentage difference between the bit rates
of curves a and b. Table II shows the BD-PSNR and BD-Rate
simulation results for the 720p format.
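The Bjøntegaard Delta computation can be sketched as follows, assuming the usual formulation: a cubic is fitted through the four points of each curve in the log-rate domain and the curves are averaged over their overlapping interval. The sketch is not the script used for the results in this paper; the function names and sample data are hypothetical. Curve a plays the role of Hi422@10b and curve b the role of Hi422@8b, matching the sign convention above.

```python
import math


def _cubic_through(xs, ys, x):
    """Evaluate at x the unique cubic through four (xs, ys) points (Lagrange form)."""
    value = 0.0
    for i in range(len(xs)):
        term = ys[i]
        for j in range(len(xs)):
            if j != i:
                term *= (x - xs[j]) / (xs[i] - xs[j])
        value += term
    return value


def _mean_on_interval(xs, ys, lo, hi):
    """Mean of the fitted cubic on [lo, hi]; Simpson's rule is exact for cubics."""
    mid = 0.5 * (lo + hi)
    return (_cubic_through(xs, ys, lo)
            + 4.0 * _cubic_through(xs, ys, mid)
            + _cubic_through(xs, ys, hi)) / 6.0


def bd_quality(rates_a, quality_a, rates_b, quality_b):
    """Average quality difference (b minus a) at equal bit rate, e.g. BD-PSNR in dB."""
    log_a = [math.log10(r) for r in rates_a]
    log_b = [math.log10(r) for r in rates_b]
    lo = max(min(log_a), min(log_b))   # overlapping log-rate interval
    hi = min(max(log_a), max(log_b))
    return (_mean_on_interval(log_b, quality_b, lo, hi)
            - _mean_on_interval(log_a, quality_a, lo, hi))


def bd_rate(rates_a, quality_a, rates_b, quality_b):
    """Average bit rate difference of b relative to a at equal quality, in percent."""
    log_a = [math.log10(r) for r in rates_a]
    log_b = [math.log10(r) for r in rates_b]
    lo = max(min(quality_a), min(quality_b))   # overlapping quality interval
    hi = min(max(quality_a), max(quality_b))
    diff = (_mean_on_interval(quality_b, log_b, lo, hi)
            - _mean_on_interval(quality_a, log_a, lo, hi))
    return (10.0 ** diff - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical rate (Mbps) and PSNR (dB) points, one per QP.
    rates_10b, psnr_10b = [5.0, 10.0, 20.0, 40.0], [30.0, 33.0, 36.0, 39.0]
    rates_8b, psnr_8b = [4.9, 9.9, 19.8, 39.7], [30.0, 33.0, 36.1, 39.0]
    print(bd_quality(rates_10b, psnr_10b, rates_8b, psnr_8b))
    print(bd_rate(rates_10b, psnr_10b, rates_8b, psnr_8b))
```

The same two functions apply to SSIM by passing SSIM values as the quality lists, which gives BD-SSIM and the corresponding BD-Rate.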

TABLE II
BD-PSNR (720P50)

PSNR                  Y                           U                           V
720p50      BD-Rate (%)  BD-PSNR (dB)   BD-Rate (%)  BD-PSNR (dB)   BD-Rate (%)  BD-PSNR (dB)
CrowdRun        -0.0918        0.0020        4.0450       -0.0916        4.0188       -0.0935
Ducks           -1.3139        0.0516        1.1551       -0.0282        3.5450       -0.0325
IntoTree        -0.0225        0.0014        5.4909       -0.0629        8.4008       -0.0801
ParkJoy         -0.9126        0.0386        0.3849       -0.0400        2.0398       -0.0586
Dancer           0.1798       -0.0070        4.7290       -0.1099        7.2565       -0.1665
TOTAL             -0.43          0.02          3.16         -0.07          5.05         -0.09

It can be observed that the luminance component (Y) shows,
for the first four sequences, a slight PSNR improvement for
8-bit coding, and hence a slight bit rate saving. Only the
Dancer sequence shows an insignificant luminance
improvement for 10-bit encoding. The average BD-PSNR
difference for the luminance component is negligible
(0.02 dB), with a meager 0.43% bit rate saving for 422@8b
encoding.
On the other hand, the U and V chroma components perform
slightly better with 10-bit encoding, with a negligible PSNR
improvement (0.07 and 0.09 dB) and a bit rate saving of around
3% and 5%, respectively. Considering that the U and V color
components statistically consume fewer bits than the
luminance component, no firm conclusions can be drawn
from these chroma bit rate savings.
To make the small differences between the two bit depths
visible, Fig. 3 depicts an enlargement of the PSNR
curves for the Y component of the ParkJoy sequence, from
20 Mbps to 35 Mbps, which are common bit rates for
HDTV contribution services. Fig. 4 shows the global
PSNR-Rate simulation results obtained from the U component
for the ParkJoy sequence, where no noticeable differences
can be observed.
[Plot: PSNR-Y, ParkJoy 720p50. Vertical axis: PSNR (dB), 32.0 to 36.0; horizontal axis: Bitrate (Mb), 20 to 34; curves: Hi422@10b and Hi422@8b]
Fig. 3. PSNR-Y, ParkJoy@720p50, zoom from 20-35Mbps

[Plot: PSNR-U, ParkJoy 720p50. Vertical axis: PSNR (dB), 29 to 41; horizontal axis: Bitrate (Mb), 0 to 60; curves: Hi422@10b and Hi422@8b]

Fig. 4. PSNR-U, ParkJoy@720p50
In an analogous way, the 720p SSIM results are shown in
Table III, and a similar trend can be observed. No noticeable
quality differences (lower than 0.003) appear between the
422@10b and 422@8b encodings, and only a slight bit
rate saving (0.27% for luminance and 4% and 5% for the color
components) can be obtained from 10-bit depth coding.
Regarding the 1080i format, Table IV shows the BD-PSNR
results, where again the luminance component obtains a slight
PSNR improvement (0.0186 dB) and bit rate saving (0.41%)
for 422@8b encoding. The U and V color components
achieve a negligible PSNR improvement with 10-bit encoding,
with a meager bit rate saving (3.3% and 6.13%). Table V shows
the 1080i25 SSIM-Rate results obtained for the Y, U, and V
components, in which no noticeable SSIM difference can be
observed, and only a slight bit rate saving, similar to that
of the 720p50 format, is present.
TABLE III
BD-SSIM RESULTS (720P50)

SSIM                  Y                           U                           V
720p50      BD-Rate (%)       BD-SSIM   BD-Rate (%)       BD-SSIM   BD-Rate (%)       BD-SSIM
CrowdRun         0.3779       -0.0001        6.7057       -0.0056        6.9034       -0.0047
Ducks           -0.2068        0.0004        1.9941       -0.0015        3.5571       -0.0010
IntoTree         0.1029        0.0001        5.6341       -0.0017        8.8491       -0.0013
ParkJoy          0.6824       -0.0002        2.8037       -0.0019        4.2846       -0.0016
Dancer           0.4334       -0.0002        3.2694       -0.0011        4.6582       -0.0013
TOTAL            0.2780        0.0000        4.0814       -0.0024        5.6505       -0.0020

TABLE IV
BD-PSNR RESULTS (1080I25)

PSNR                  Y                           U                           V
1080i25     BD-Rate (%)  BD-PSNR (dB)   BD-Rate (%)  BD-PSNR (dB)   BD-Rate (%)  BD-PSNR (dB)
CrowdRun        -0.2208        0.0065        3.4337       -0.0712        3.0257       -0.0632
Ducks            0.2301       -0.0082        4.2152       -0.0762       10.9798       -0.0832
IntoTree         0.6380       -0.0149        8.3053       -0.0801       12.3458       -0.0986
ParkJoy         -0.3607        0.0105        2.7265       -0.0624        5.7560       -0.0642
Dancer          -2.3228        0.0989       -2.1784        0.0517       -1.4362        0.0359
TOTAL           -0.4072        0.0186        3.3005       -0.0476        6.1342       -0.0547

TABLE V
BD-SSIM RESULTS (1080I25)

SSIM                  Y                           U                           V
1080i25     BD-Rate (%)       BD-SSIM   BD-Rate (%)       BD-SSIM   BD-Rate (%)       BD-SSIM
CrowdRun         0.0975       -0.0063       -0.5927        0.0007       -4.1455        0.0023
Ducks            0.0927        0.0002        2.7838       -0.0018        7.7800       -0.0015
IntoTree         0.9010       -0.0006        9.2103       -0.0019       15.2666       -0.0015
ParkJoy          0.2084        0.0000        3.3945       -0.0020        6.0989       -0.0016
Dancer           0.5702       -0.0002        4.0392       -0.0013        6.1847       -0.0014
TOTAL            2.3048       -0.0014        3.7670       -0.0013        6.2369       -0.0007

V. CONCLUSIONS
The experimental results reveal that the hypothetical
efficiency improvement of the 10-bit Hi422 profile over
8-bit coding is reduced to an unnoticeable gain in
bit rate saving of around 5%, obtained exclusively for
the color components and not for the luminance
component. In most of the sequences, especially for the 720p
format, the luminance component achieves better PSNR and
bit rate saving figures with 8-bit coding. It is well known
that the human visual system is more sensitive to luminance
information than to color.
Both the PSNR and SSIM video quality metrics used in the
experiment show the same trend, and they do not allow us to
confirm that 10-bit H.264 coding offers better perceptual
quality than 8-bit sample depth coding under these specific
test conditions.
ACKNOWLEDGEMENT
This work has been jointly supported by the MINECO and
European Commission (FEDER funds) under the project
TIN2012-38341-C04-04.
REFERENCES
[1] Draft ITU-T Recommendation and Final Draft International
Standard of Joint Video Specification (ITU-T Rec. H.264 |
ISO/IEC 14496-10 AVC), Joint Video Team (JVT), Mar. 2003,
Doc. JVT-G050.
[2] G.J. Sullivan, P. Topiwala, and A. Luthra, "The H.264/AVC
Advanced Video Coding Standard: Overview and Introduction
to the Fidelity Range Extensions (FRExt)," SPIE Conference on
Applications of Digital Image Processing XXVII, August 2004.
[3] B. Bross, W.-J. Han, J.-R. Ohm, G.J. Sullivan, and T. Wiegand,
"High efficiency video coding (HEVC) text specification draft 6,"
Document JCTVC-H1003 of JCT-VC, February 2012.
[4] ftp://standards.polycom.com/reference_software/
[5] Z. Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli,
"Image quality assessment: From error visibility to structural
similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp.
600-612, Apr. 2004.
[6] ITU-T and ISO/IEC JTC 1, "Generic coding of moving pictures
and associated audio information, Part 2: Video," ITU-T Rec.
H.262 and ISO/IEC 13818-2 (MPEG-2), Nov. 1994.
[7] E. Dumic, M. Mustra, S. Grgic, G. Gvozden, "Image quality of
4:2:2 and 4:2:0 chroma subsampling formats," ELMAR '09
International Symposium, pp. 19-24, 28-30 Sept. 2009.
[8] ISO/IEC JTC 1, "Coding of audio-visual objects, Part 2:
Visual," ISO/IEC 14496-2 (MPEG-4 Part 2), Jan. 1999.
[9] D. Marpe, T. Wiegand, S. Gordon, "H.264/MPEG4-AVC
fidelity range extensions: tools, profiles, performance, and
application areas," IEEE International Conference on Image
Processing ICIP 2005, vol. 1, pp. I-593-6, 11-14 Sept. 2005.
[10] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of
H.264/AVC FRExt for HD movie content," Joint Video Team
document JVT-L033, July 2004.
[11] T. Chujoh, R. Noda, "Internal bit depth increase for coding
efficiency," ITU-T SG16 Q.6 Document, VCEG-AE13,
Marrakech, Jan. 2007.
[12] "Joint Call for Proposals on Video Compression Technology,"
ITU-T SG16 Q6 document VCEG-AM91 and ISO/IEC
JTC1/SC29/WG11, Kyoto, Japan, Jan. 2010.
[13] W. Gish, H. Yu, "Extended Sample Depth: Implementation
and Characterization," ISO/IEC JTC1/SC29/WG11 and ITU-T
SG16 Q.6 Document JVT-H016, Geneva, Switzerland, May
2003.
[14] P. Larbier, "Using 10-bit AVC/H.264 Encoding with 4:2:2 for
Broadcast Contribution," ATEME, Bievres, France.
[15] M. Compton, "10 bit high quality MPEG-4 AVC video
compression," Tandberg Television, Southampton, UK.
[16] ftp://vqeg.its.bldrdoc.gov/HDTV/SVT_MultiFormat/
[17] http://www.ebu.ch/fr/technical/hdtv/test_sequences.php
[18] T. Tan, G. Sullivan, T. Wedi, "Recommended Simulation
Common Conditions for Coding Efficiency Experiments,"
ITU-T SG16 Q.6 Document, VCEG-AA10, Nice, France, October
2005.
[19] G. Bjøntegaard, "Calculation of average PSNR differences
between RD-curves," ITU-T SG16 Q.6 Document, VCEG-M33,
Austin, US, April 2001.
