ABSTRACT

The audio watermarking method proposed in this paper offers copyright protection for audio without requiring the original signal for watermark detection. The analysis filterbank decomposition, the psychoacoustic model and the empirical mode decomposition (EMD) are the three key techniques used in the novel audio watermarking method. Unlike traditional audio watermarking algorithms, where the watermark bits are embedded directly in the signal by either time domain or transform domain processing, the novel blind audio watermarking algorithm proposed in this paper embeds the watermark bits in the final residue of the subbands in the transform domain. Four watermark messages are embedded into the proposed audio watermarking system. The inaudibility, capacity and robustness of the audio watermarking system are evaluated in order to optimize the system performance. The experimental results show that the proposed blind watermarking scheme is robust against MP3 compression and additive Gaussian noise attacks.

Keywords: Blind audio watermarking, empirical mode decomposition (EMD), psychoacoustic model, analysis filterbank decomposition

We thank the Agency for Science, Technology and Research (A*STAR), Singapore for supporting this work under the project Digital Rights Violation Detection for Digital Asset Management (Project No: 0721010022).

978-1-4244-7493-6/10/$26.00 ©2010 IEEE. ICME 2010

1. INTRODUCTION

Over the past decades, significant effort has been focused on the copyright protection of digital media (audio, image, and video). A promising solution to this problem is the addition of a watermark to the digitized media, where the special information (the watermark message) is hidden in the original data in an imperceptible manner [1].

Compared to embedding watermarks into images, audio watermarking is a more challenging task because the human auditory system (HAS) is more sensitive to distortions than the human visual system (HVS) [2], and because inaudibility is much more difficult to achieve than invisibility for images [3]. Also, compared to visual signals, audio signals are represented by far fewer samples per time interval, which limits the watermark capacity for audio signals.

Several techniques in audio watermarking have been proposed to address these challenges, including echo coding [4], phase coding [5], patchwork coding [4], low-bit coding [1] and spread spectrum [6]. Among all the audio watermarking techniques, the watermark bits can be embedded either in the transform domain or in the temporal domain. In this research work, the transform domain watermark is studied because it has been demonstrated to be more robust against various attacks [2].

During the past few years, many of the developed audio watermarking algorithms [1][2][5] took advantage of the perceptual properties of the HAS in order to increase the robustness of the watermark message by maximizing its strength while embedding it in a perceptually transparent manner. Therefore, the psychoacoustic model is adopted to guarantee the imperceptibility of the watermark message. In order to use the psychoacoustic model efficiently on the transform domain audio data, a polyphase filterbank is used for the time to frequency mapping of the original input audio data.

Traditional multimedia watermarking in either the transform domain or the temporal domain tends to embed the watermark bits directly into the coefficients. The empirical mode decomposition (EMD) [7] has proved useful for processing nonlinear and non-stationary time series [8][9]. By using the EMD method, any multi-component signal is decomposed into a set of intrinsic mode functions (IMFs) and the final residual. This residual has been shown to be highly robust under Gaussian noise attack and MP3 compression [9]. Thus it is possible to embed the watermark bits robustly into the residual rather than the subband audio signal itself.

The rest of the paper is organized as follows. Section 2 gives an overview of the empirical mode decomposition. The novel audio watermark embedding and extracting procedure is proposed in Section 3. Section 4 presents the experimental results for the quality evaluation. The experimental results for the watermarking capacity and robustness, as well as the performance against signal processing attacks, are given in Section 5. Finally, Section 6 concludes the paper.

2. EMPIRICAL MODE DECOMPOSITION

A detailed mathematical introduction to the empirical mode decomposition (EMD) can be found in [7][8][9]. By applying the EMD, any multi-component signal is decomposed into a set of intrinsic mode functions (IMFs). An IMF can be defined as a hidden oscillation mode that is embedded in
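In the notation used for Eq. (4) of Section 3.3, the EMD of a signal is conventionally written as the sum of its IMFs and the final residue [7]. The following is a sketch of that standard form; the signal name $x(t)$ and the IMF count $N$ are assumed symbols that do not appear in the surrounding text.

```latex
x(t) = \sum_{n=1}^{N} c_n(t) + r_m(t)
```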
where $c_n(t)$ is the n-th IMF of the signal and $r_m(t)$ is the final residue. The completeness and orthogonality of the IMFs are shown by Huang et al. [7]. It should be noted that, as the order of the mode increases, the time scale increases while the mean frequency of the mode decreases.

The final residue is a monotonic function and is the coarsest component of the signal. It has been shown that the final residue remains stable under Gaussian noise and MPEG compression attacks [9]. Thus we choose to embed the watermark bits into the final residue obtained from the EMD of the audio signal.

An example of the empirical mode decomposition of an audio signal is shown in Fig. 1.

Fig. 1. The original PCM audio signal, its IMF components (imf 1 to imf 8) and the final residue.

In our experiment, the watermark message is embedded into a Waveform Audio File Format (WAV) audio signal whose bit stream is encoded in the Pulse Code Modulation (PCM) format. For audio and speech processing, the PCM samples are stored and processed as floating point numbers that have zero mean (or a mean that is sufficiently small compared with the amplitude of the signal) and vary in the interval [-1.0, 1.0]. Thus, compared with the original audio signal, the amplitude of its final residue can be regarded as sufficiently small (as shown in Fig. 1), which makes it possible to embed the watermarks in the final residue of the audio signal while keeping the watermark messages perceptually inaudible.

3. PROPOSED ALGORITHM

A novel blind audio watermark embedding scheme is described in this section. The proposed embedding scheme (shown in Fig. 2) makes use of the polyphase filterbank, the psychoacoustic model and the empirical mode decomposition.

[Fig. 2: block diagram of the watermark embedding scheme. Each band $S_i(t)$ (band 0 to band M-1) is segmented into $S_{i,j}(t)$; the EMD of each segment yields the IMFs $c_{i,j,n}(t)$ and the mean trend $r_{m,i,j}(t)$; the watermark bits $w(j)$ from the watermark generator, with their strength adjusted using FFT-based masking thresholds, are embedded into the mean trends under watermark embedding domain control.]

$$ S_i(t) = \sum_{k=0}^{63} \sum_{j=0}^{7} T(i)(k) \times \bigl( C(k + 64j) \times x(k + 64j) \bigr) \tag{2} $$

$$ S_{i,j}(t) = S_i(jJ + t) \tag{3} $$

where $t = 0, 1, ..., J-1$ and $j = 0, 1, ..., N_S-1$.

3.2. Watermark Embedding Domain Control

It is often required to embed the watermark into a certain part of the audio, or to avoid embedding it in a specific region such as a silent region. A watermark embedding domain control module is therefore proposed, which makes it possible to embed the watermark bits in a more flexible manner. If we denote by $\hat{i}$ the appropriate bands and by $\hat{j}$ the appropriate segments for the watermarking process, $S_{i,j}(t) := S_{i,j}(t)|_{i \in \hat{i},\, j \in \hat{j}}$ is defined as the segmented, band-pass filtered audio stream that is suitable for watermark embedding.

3.3. Empirical Mode Decomposition

The EMD is applied to each segmented subband stream $S_{i,j}(t)$. Thus we have

$$ S_{i,j}(t) = \sum_{n=1}^{N_{i,j}} c_{i,j,n}(t) + r_{m,i,j}(t) \tag{4} $$

where $c_{i,j,n}(t)$ are the IMFs of the segmented stream $S_{i,j}(t)$, $N_{i,j}$ is the number of IMFs of the segmented stream $S_{i,j}(t)$ and $r_{m,i,j}(t)$ is the final residue of stream $S_{i,j}(t)$.

It is worth noting that the length of the segmented stream $S_{i,j}(t)$ may affect the watermarking system performance at a noticeable level. A longer $S_{i,j}(t)$ provides more samples on which to perform the EMD, so better performance can be expected. However, in the proposed audio watermarking system the watermarking capacity is then reduced, as more audio samples are used to carry 1 watermark bit.

3.4. Watermark Embedding

In order to increase the watermarking capacity, the subband audio stream $S_i(t)$ is embedded with the watermark sequence $W_i(j) = w_{i,j}$, $w_{i,j} \in \{-1, +1\}$ and $0 \le j \le N_S-1$. Since the watermarking robustness generally increases with the amplitude of the host audio signal, a signal-dependent watermark should be embedded in the host audio.

Each watermark bit $w_{i,j}$ is embedded into the j-th segment in the i-th subband of the original audio signal by modifying its final residue $r_{m,i,j}(t)$. The segmented stream $S_{i,j}(t)$ after embedding the watermark bit stream $W_i(j)$ is given by the following equations:

determine the maximum possible power of the watermark message. By calculating the signal-to-mask ratio (SMR) for each segment in each subband, the total maximum possible watermark strength can also be obtained.

If we denote the signal-to-mask ratio for segment j in subband i as $SMR_{i,j}$, then we should have

$$ \sum_{(i \in \hat{i},\, j \in \hat{j})} \sum_{t=0}^{J-1} \sum_{i=0}^{31} \left| r_{m,i,j}\, \alpha_{i,j} W_i(j) \right| \le SMR_{i,j} \tag{6} $$

where $\alpha_{i,j}$ is the weight for thresholding the amplitude of the watermark strength; $\alpha_{i,j}$ should be proportional to the signal strength $S_{i,j}(t)$, since the masking threshold is also proportional to the signal strength.

3.6. Watermarked Audio
[Figure: block diagram of the watermark extraction scheme. The watermarked PCM audio $X^w(t)$ passes through the M-band analysis filterbank; each band (band 0 to band M-1) is segmented into $S^w_{i,j}(t)$ under watermark embedding domain control; the EMD of each segment yields the mean trend $r^w_{m,i,j}(t)$, which is summed over $t = 0, ..., J-1$ and passed to the watermark bits calculation that outputs the extracted watermarks.]
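The decision stage of the extraction scheme (the sum of the mean trend over each segment followed by a sign test, as formalised in Eqs. (8) and (9) of Section 3.10) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `extract_bits` and the representation of the residues as nested sequences are assumptions, and the EMD that would produce each residue is taken as given.

```python
from typing import List, Sequence


def extract_bits(residues: Sequence[Sequence[float]]) -> List[int]:
    """Decide one watermark bit per segment from its final residue.

    residues[j] holds the J samples of the final residue (mean trend)
    of segment j in one watermarked subband. How the residue is
    obtained (the EMD itself) is outside this sketch.
    """
    bits = []
    for segment_residue in residues:
        total = sum(segment_residue)          # sum over t = 0 .. J-1
        bits.append(1 if total >= 0 else -1)  # Eq. (8) / Eq. (9)
    return bits


# Hypothetical residues for three segments of length J = 4.
example = [
    [0.01, 0.02, 0.01, 0.03],      # positive trend -> +1
    [-0.02, -0.01, -0.03, -0.01],  # negative trend -> -1
    [0.0, 0.0, 0.0, 0.0],          # zero sum counts as +1 under Eq. (8)
]
print(extract_bits(example))  # [1, -1, 1]
```

Because only the sign of the summed residue is used, the decision does not need the original audio, which is what makes the scheme blind.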
$$ S^w_{i,j}(t) = \sum_{n=1}^{N^w_{i,j}} c^w_{i,j,n}(t) + r^w_{m,i,j}(t) \tag{7} $$

where $c^w_{i,j,n}(t)$ are the IMFs of the watermarked signal $S^w_{i,j}(t)$ and $r^w_{m,i,j}(t)$ is the final residue of the watermarked signal.

3.10. Watermark Bits Calculation and Watermark Message Extraction

The final residue value $r^w_{m,i,j}(t)$ is used to determine the embedded watermark bits. The watermark bit $w_{i,j}$ can be calculated by using the following formula:

$$ w_{i,j} = 1, \quad \text{if } \sum_{t=0}^{J-1} r^w_{m,i,j}(t) \ge 0 \tag{8} $$

$$ w_{i,j} = -1, \quad \text{if } \sum_{t=0}^{J-1} r^w_{m,i,j}(t) < 0 \tag{9} $$

Thus, for each watermarked subband i, where $i \in \hat{i}$, the corresponding watermark message can be extracted as $W_i(j) = w_{i,j}$, $w_{i,j} \in \{-1, +1\}$ and $0 \le j \le N_S-1$.

4. QUALITY EVALUATION

Subjective quality evaluation of the watermarking scheme was conducted through listening tests. A total of 10 test audio clips were used for the evaluation. All audio files are 16-bit mono audio sampled at 44.1 kHz (CD quality), ranging from 1 to 5 min in length. The types of music include classical, rock, jazz and electronic music. Altogether 20 listeners participated in the listening test. None of the participants was trained for the listening test; all of them were ordinary music listeners. All participants were given the instructions for the listening test just before the test began, and they all used their own headsets.

In the first part of the quality evaluation, participants were given the non-watermarked and watermarked audio files in random order, and they had to identify the watermarked ones blindly. For each audio file, the listeners could choose one of three options: 1) non-watermarked, 2) watermarked, or 3) cannot tell the difference. Most of the listeners could not tell the difference, which indicates that the non-watermarked and watermarked audio cannot be discriminated.

In the second part of the evaluation, with prior knowledge of which audio files were non-watermarked and which were watermarked (with one watermark message only), the listeners were asked to report the dissimilarities between the two signals, using the so-called Subjective Difference Grades (SDG) [11] as described in Table 1.

The audio quality of a watermarking system can be linked to the perceived difference (impairment) between the watermarked audio signal and the original audio signal. To facilitate data analysis, the subjective difference grade (SDG) is calculated as the difference of the grades between the watermarked signal and the

In the subjective listening test, the average SDG score was -0.15. This indicates that the proposed EMD based watermark-

marked signal.

5. EXPERIMENTAL RESULTS

of the segment (J) used for the EMD process. For 1 min of the mono audio signal with the sampling rate of 44.1 kHz, the

are used for watermarking, and then vary the length of the segment in order to optimize the performance of the watermarking

[Figure: two plots against the length of segment for EMD process (horizontal axis from 0 to 1200).]
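The trade-off between segment length and capacity follows from simple arithmetic: each subband carries the sampling rate divided by the number of subbands in samples per second, and one bit is embedded per segment of J samples. The sketch below reproduces that calculation. The 32-band critically sampled filterbank is an assumption inferred from the summation limit in Eq. (6) and from the MPEG-1 polyphase filterbank [10], and the function name is illustrative.

```python
def capacity_bits_per_sec(sample_rate_hz: float,
                          num_subbands: int,
                          segment_len: int,
                          watermarked_bands: int = 1) -> float:
    """Watermark capacity when one bit is carried by each segment.

    Assumes a critically sampled filterbank, so every subband runs at
    sample_rate_hz / num_subbands samples per second.
    """
    subband_rate = sample_rate_hz / num_subbands
    bits_per_band = subband_rate / segment_len
    return bits_per_band * watermarked_bands


# Configuration reported in Section 5: 44.1 kHz, segments of 32 samples.
per_band = capacity_bits_per_sec(44100, 32, 32)
print(int(per_band))      # 43
print(int(per_band) * 4)  # 172

# Longer segments give the EMD more samples but lower the capacity.
for J in (32, 64, 128, 256):
    print(J, round(capacity_bits_per_sec(44100, 32, J), 2))
```

Under these assumptions the result matches the 43 bits/sec per subband and the 172 bits/sec for four subbands quoted in Section 5.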
Table 2. The bit error rate (BER) of the proposed watermark system under signal processing attacks

        MP3 compression (128 kbps)    Adding Gaussian noise
BER     1.43e-02                      1.15e-02

security of the proposed watermark system, whereas higher watermarking capacity is preferred, one can choose the length of the segment equal to the smallest value. In our experiment, we choose the length of the segment for the EMD process to be 32.

Thus we choose to embed 4 watermark messages into subbands 3, 7, 11 and 15, with one watermark message embedded into each subband, and to use segments of 32 samples for the EMD process. Since each subband has a watermarking capacity of 43 bits/sec, the total watermarking capacity is 172 bits/sec (43 × 4). With this configuration, the bit error rate is 1.04e-2, which is very small. An error correction technique can be applied to the watermark information bits before embedding, which would help to restore the watermark bits even if detection errors occur.

It can be seen from Table 2 that the proposed audio watermarking scheme is robust against MP3 compression and Gaussian noise attacks. Under the Gaussian noise attack in particular, the bit error rate is still very low. This is because, as the EMD proceeds, the time scale increases while the mean frequency of the modes decreases. Thus the IMFs are extracted with the finest scale from the signal first, and the remaining final residue is the coarsest component of the signal [7]. The zero mean Gaussian noise is therefore sifted into the lower order IMFs, and the final residue (the mean trend) remains uninfluenced.

Since no other implementation of a watermarking system or test data is readily available at our site, no direct comparison can be made in this paper. However, it should be noted that Cvejic et al. [12] proposed a spread spectrum based audio watermarking scheme in the temporal domain and claimed a watermarking capacity of 14.7 bits per second for a mono audio signal. In their later research [13], a spread spectrum based audio watermarking scheme in the spectral domain is proposed, with the watermarking capacity increased to 27.1 bits per second. Bassia et al. [14] proposed an audio watermarking method that embeds the watermarks in the temporal domain segmented audio signal. Each watermark bit is embedded in each of the audio samples. In order to obtain a higher performance, the watermark bits are repeatedly embedded in the audio with an embedding length of 217 samples. However, when embedded with multiple watermarks, the subjective quality evaluation could not provide promising results (the resulting watermarking system would produce a noticeable distortion when 4 watermark messages are embedded).

6. CONCLUSION

In this paper, a novel blind audio watermarking system based on the psychoacoustic model, the polyphase filterbank analysis and the empirical mode decomposition is proposed. In our approach, the analysis filterbank is used to decompose the host audio signal into multiple subbands, and each subband can be embedded with a unique watermark message. Within each subband, the signal is first segmented and the empirical mode decomposition (EMD) is applied to each segment. The watermark bits are embedded into the final residue extracted by the EMD process. The inaudibility of the watermarks is guaranteed by the use of the psychoacoustic model. The watermark extraction procedure does not use the original audio signal. The proposed blind audio watermarking scheme is shown to be robust against MP3 compression and additive Gaussian noise attacks. However, this method may not be robust to some other attacks such as band-pass filtering and cropping. Ongoing research focuses on increasing the robustness of our method against such attacks.

7. REFERENCES

[1] N. Cvejic, "Algorithm for Audio Watermarking and Steganography," Ph.D. diss., Department of Electrical and Information Engineering, University of Oulu, 2004.
[2] I.J. Cox and M.L. Miller, Digital Watermarking, Morgan Kaufmann, 2002.
[3] F. Hartung and M. Kutter, "Multimedia watermarking techniques," Proc. of the IEEE, vol. 87, no. 7, Jul. 1999.
[4] W. Bender, D. Gruhl, N. Morimoto, and A. Lu, "Techniques for data hiding," IBM Systems Journal, vol. 35, no. 3/4, pp. 313-336, 1996.
[5] X. He, A. Iliev, and M. Scordilis, "A high capacity watermarking technique for stereo audio," in IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5, pp. 393-396, 2004.
[6] D. Kirovski and H.S. Malvar, "Spread-spectrum watermarking of audio signals," IEEE Transactions on Signal Processing, Special Issue on Data Hiding, vol. 51, no. 4, pp. 1020-1033, 2003.
[7] N.E. Huang, Z. Shen, S.R. Long, M.C. Wu, H.H. Shih, Q. Zheng, N.C. Yen, C.C. Tung, and H.H. Liu, "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis," Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 454, no. 1971, pp. 903-995, Mar. 1998.
[8] H. Liang, S.L. Bressler, R. Desimone, and P. Fries, "Empirical mode decomposition: A method for analyzing neural data," Computational Neuroscience: Trends in Research, vol. 65-66, pp. 801-807, Jun. 2005.
[9] N. Bi, Q. Sun, D. Huang, Z. Yang, and J. Huang, "Robust image watermarking based on multiband wavelets and empirical mode decomposition," IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 1956-1966, Aug. 2007.
[10] ISO/IEC Intl Standard IS 11172-3: 1993, Information Technology - Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s - Part 3: Audio.
[11] C. Neubauer and J. Herre, "Digital watermarking and its influence on audio quality," 105th Convention of the Audio Engineering Society, pp. 225-233, 1998.
[12] N. Cvejic, A. Keskinarkaus, and T. Seppanen, "Audio watermarking using m-sequences and temporal masking," IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics, pp. 227-230, 2001.
[13] N. Cvejic and T. Seppanen, "Spread spectrum audio watermarking using frequency hopping and attack characterization," Signal Processing, vol. 84, no. 1, pp. 207-213, 2004.
[14] P. Bassia, I. Pitas, and N. Nikolaidis, "Robust audio watermarking in the time domain," IEEE Transactions on Multimedia, vol. 3, no. 2, pp. 232-241, 2001.
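The bit error rates quoted in Table 2 compare the embedded and the extracted watermark bits. A minimal sketch of that measurement is given below, assuming the bits are represented as -1/+1 integers as in Section 3.4. The function name and the sample sequences are illustrative, and no data from the experiments is reproduced.

```python
from typing import Sequence


def bit_error_rate(embedded: Sequence[int], extracted: Sequence[int]) -> float:
    """Fraction of positions where the extracted bit differs from the embedded one."""
    if len(embedded) != len(extracted):
        raise ValueError("bit sequences must have the same length")
    if not embedded:
        raise ValueError("bit sequences must not be empty")
    errors = sum(1 for a, b in zip(embedded, extracted) if a != b)
    return errors / len(embedded)


# Hypothetical 8-bit message with one bit flipped by an attack.
sent = [1, -1, 1, 1, -1, -1, 1, -1]
received = [1, -1, -1, 1, -1, -1, 1, -1]
print(bit_error_rate(sent, received))  # 0.125
```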