1 INTRODUCTION
Long-term preservation of audio data is hopeless: the carriers are unstable, the commercial
lifetimes of the formats seem to become shorter and shorter, and the amount of data to be stored
increases every day.
Magnetic tapes, for analogue and digital audio recordings, typically contain the following
materials: magnetic oxide, polyurethane binder, polyester base and carbon back coating.
[Figure: cross-section of a magnetic tape, with labels: magnetic particle, binder, lubricant reservoir, top coat, substratum, back coat]
All three components - magnetic particle, binder, and backing - are potential sources of failure for
a magnetic tape medium. Polyester base is considered very stable under typical storage
conditions. The binder is subject to hydrolysis. The result is sticky syndrome or sticky shed.
The Magnetic-Media Industries Association of Japan (MIAJ) has concluded that the shelf life of
magnetic tape under normal conditions is controlled by the binder rather than the magnetic
particles ("DDS Specs Drive DAT Reliability," Computer Technology Review, 13 (5), May
1993). In this instance, the shelf life would refer both to the life of recorded as well as unrecorded
media; the life of the binder is independent of whether or not the tape has ever been recorded.
Accordingly, analogue and digital tape formats share many physical attributes relating to
aging and life expectancy. Digital's greatest advantage is that each duplicate matches the quality
of the original. The risk is that, when failure occurs, it is complete failure.
Analogue technology is well documented. Operators and engineering expertise are still available.
Over time this will diminish, as the younger people in the industry are not using analogue as the
primary recording method.
On the other hand, special knowledge is necessary about the technology used for digital medium
formats:
CD-R and DVD-R both use organic dyes that respond similarly to temperature and humidity over
time. Manufacturers have conducted accelerated aging tests by subjecting discs to higher
temperature and humidity, then extrapolating the future failure point.
Using similar methods, National Institute of Standards & Technology (NIST) found DVD-R
authoring discs to last 25 years.
Replicated discs (DVD Video and audio CDs) use aluminium as the reflective layer (CD-R/DVD-R
use a more stable silver), which can be subject to rot as the metal oxidizes if not properly sealed
during manufacturing.
All DVD discs are constructed by gluing two polycarbonate discs together. There are two
methods: UV cured lacquer (considered more stable) and thermal melt glue.
Separation of the discs can cause failure.
MO Discs use heat and magnetism to mark the disc. A NIST study concluded with 95%
confidence that an MO disc will last 57 years at room temperature at 90% humidity.
Helical scan data recording formats for digital tape (AIT-SAIT-DTRS-DAT-HDCAM-Hi8) are
considered higher risk because misalignment of the recording heads or warpage of the media
base can cause data retrieval failure. Sony quotes a life expectancy of 30 years under proper storage
conditions (from the Sony web site).
Linear recording formats for digital tape (DLT-SDLT-LTO1, 2, 3, 4) are considered more
reliable because of the fixed position recording head.
Metal particle (MP) and metal evaporated (ME) tapes both avoid binders to achieve an extra-thin
coating, providing better wrap and signal performance, but at the price of being more fragile 1 .
All practical media degrade. The substrate and the active layer deteriorate over time. This can be
slowed by optimal storage, but not halted. With good quality media and optimal storage,
magnetic recordings will last for many decades. The problem is always that of finding a drive that
will play the medium. (In 20 years' time, how many 1/4-inch open reel recorders will be around?
How many DAT machines? 2 )
From this point forward only digital solutions will be available. Organizations responsible for the
preservation of audio elements can expect the digital media formats to evolve faster than before
in history, resulting in less compatibility regionally and worldwide.
The challenge is to choose archive strategies that compensate for these dynamics.
One of the things digital recording does very reliably is not to cause generation loss. So, if by
reliability we mean the ability to record audio for an indefinite period, then digital becomes the
only choice.
Unlike analogue recording, digital audio can be copied and recopied without loss of signal or
addition of noise or distortion. The potential offered by the production of digital surrogates for
the purpose of preservation seems to provide an answer to linked issues of preservation and
access.
A well-engineered digital recorder has an effective error correction system that puts the data back
to its original binary value provided the error rate is within limits. Thus it is possible to determine
if a digital medium is deteriorating by playing it once in a while. If the error rate starts to
increase, we know that the medium is heading for failure. However, if we make a digital copy
onto a new medium (which may even be a new type of medium) before the error rate exceeds the
performance of the error correction, or before the player becomes obsolete, we can start the race
again. Thus the question of reliability comes down to how well the system for doing that digital
copying is administered.
A survey of ten European broadcasters indicates that audio holdings are mainly on quarter-inch tape and shellac and vinyl discs, alongside cassettes, DAT, CD, MiniDisc and Tandberg QIC cartridges.
For all broadcasters the problems with analogue replay equipment appear more and more difficult to overcome, but those caused by digital carriers in the same situation are even greater. Consequently, there are enough arguments for migrating all recordings made in digital form to a stable storage system and format before the digital recording technology they were made on becomes obsolete.
That means digital preservation (a planned, organised and standardised transfer to suitable storage) should be initiated as soon as possible, enabling the digital file to be successfully and simply migrated when necessary 3 .
The automatically accessible, self-controlling and self-regenerating archival system, also known
as a digital mass storage system (DMSS), is the most appropriate solution. The features of such
a system are 4, 5 , 6 :
The management of audiovisual data as computer files in mass storage systems, e.g. libraries or
robotics of magnetic tape cartridges.
An open file architecture to accommodate all audio data together with catalogue / content
information and written text (metadata).
The access time of such systems is not of major importance.
Data integrity is controlled automatically, and the information is copied onto new carriers
automatically before errors can no longer be fully corrected.
Once new storage media and systems are available due to technical development, automated
migration will be implemented.
IASA makes similar observations: after extensive pilot projects, digital mass storage
systems (DMSS) have been installed in major archives for the storage of large audio collections.
Such systems permit the automatic performance of tasks including checking of data integrity,
refreshment, and, finally, migration with a minimum use of manpower (cf. IASA-TC 04, 6.2).
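The automatic integrity check described above can be illustrated with a minimal sketch. It assumes that the archive keeps a manifest of SHA-256 digests recorded at ingest; the manifest layout and the function names `sha256_of` and `find_damaged` are hypothetical and are not taken from IASA-TC 04 or from any particular DMSS product.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_damaged(manifest: dict[str, str], root: Path) -> list[str]:
    """Return the names whose file is missing or no longer matches its stored digest.

    `manifest` maps a file name to the digest recorded at ingest (assumed layout).
    """
    damaged = []
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(name)
    return damaged
```

Any file reported by `find_damaged` would be a candidate for refreshment from a second copy before the next scheduled check.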
The benefits of using a networked radio station with digital mass storage versus using a
conventional radio sound archive are broadly described in several documents.7,7
Migration: this is the main immediate work for audio and video materials, and for decaying film.
Preservation via migration consists of transferring master material from old formats to new ones.
The same technical process, namely transfer from one format to another, is also used to make
viewing copies, usually in lower quality. All formats benefit from cost effective methods for
production of viewing copies.
Restoration: Archive media can be of varying quality, and modern technology can significantly improve the result of a migration (transfer) process. Restoration has tended to be seen as too expensive to incorporate in cost-effective transfers, but some of the data necessary for restoration is calculated as a matter of course in the digitisation stage. Integrating the workflow so that restoration becomes more cost effective, and more archive material can benefit from the power of digital restoration, could be a major goal for this kind of project.
Presto published a model for the range of activities associated with digitisation: all the actions needed to identify materials for transfer, gather them together and transport them, with their metadata, to the digitisation area, as well as providing web access and updating the metadata (catalogue).
These additional activities are shown in the following figure.
[Figure: composition of the digitisation activities: identify and assemble materials; digitisation; update archive]
IASA-TC 03 10 and TC 04 11 , in addition to stating that equipment must be optimally adjusted and
maintained, suggest that playback requires knowledge of the historic audio technologies and a
technical awareness of the advances in replay technology.
The CLIR/LC 12 report, Capturing Analog Sound, addresses this directly, suggesting that there
are many areas in which a trained ear and years of experience are by far the most important
tools. In some archives, fragile audio recordings are being handled, played, and transferred for
digital preservation by staff who have limited experience working with audio recordings or little
knowledge about the sonic characteristics and weaknesses of various audio formats.
Recommendations and basic audio engineering principles regarding all signal chain components
and technical spaces used for preservation transfer work
IASA-TC 04 stipulates: The combination of reproduction equipment, signal cables, mixers and
other audio processing equipment should have specifications that equal or exceed that of digital
audio at the specified sampling rate and bit depth. The quality of the replay equipment, audio
path, target format and standards must exceed that of the original carrier.
The CLIR/LC report discusses the need for accurate monitoring systems to evaluate quality as
well as test equipment to evaluate potential problems.
Richard Warren's storage document published in the ARSC Journal 13 recommends a Noise
Criteria level of 20-25 dB for critical listening areas. More generally, he also calls for
consideration of the proper acoustical conditions to prevent the room from distorting the sounds
to be studied.
According to IASA-TC 04, any transfer should attempt to extract the optimal signal from the
original [as] the original carrier may deteriorate, and future replay may not achieve the same
quality, or may in fact become impossible, and secondly, signal extraction is such a time
consuming effort that financial considerations call for optimization at the first attempt.
Given the conclusion that the most direct and clean signal path must be used from
source to destination, it is very important to underline the weakest link in the digital chain: the
point of conversion from analogue to digital. The choices made regarding conversion
technologies, and the selection of digital formats, resolutions, carriers and technology systems
will impose limits on the effectiveness of digital preservation that cannot be reversed, as will the
quality of audio being encoded. Optimal signal extraction from original carriers is the
indispensable starting point of each digitization process 14 .
Then, having faced the need to copy, the selection of a storage format becomes the next major issue.
The sound archiving community is rallying around the European Broadcasting Union's Broadcast Wave Format (BWF). BWF [EBU Tech 3285] is a format that complies with the specification of the .wav format but includes a number of metadata tags, in the same manner as TIFF (tagged image file format) does for images. The International Association of Sound Archives recommends the use of linear BWF files for archiving because of the simplicity and ubiquity of linear PCM (interleaved for stereo). The BWF format is widely accepted by the archiving community.
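Because BWF is an ordinary RIFF/WAVE file with an additional metadata chunk, the difference can be seen by listing the chunks of a file. The following is a minimal sketch, assuming a well-formed RIFF stream; the function names are illustrative only, and the chunk identifier `bext` is the broadcast extension chunk defined in EBU Tech 3285.

```python
import io
import struct


def list_riff_chunks(stream) -> list[tuple[str, int]]:
    """Return (chunk id, size in bytes) for each top-level chunk of a RIFF/WAVE stream."""
    riff, _size, form = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    chunks = []
    while True:
        header = stream.read(8)
        if len(header) < 8:  # end of stream
            break
        chunk_id, size = struct.unpack("<4sI", header)
        chunks.append((chunk_id.decode("ascii", "replace"), size))
        # Skip the chunk body; RIFF pads odd-sized chunks to an even length.
        stream.seek(size + (size & 1), io.SEEK_CUR)
    return chunks


def has_bext(stream) -> bool:
    """True when the stream carries the 'bext' chunk that BWF adds to plain WAV."""
    return any(chunk_id == "bext" for chunk_id, _ in list_riff_chunks(stream))
```

A plain .wav file typically shows only the `fmt ` and `data` chunks, so `has_bext` returns False for it.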
All responsible archiving groups and associations strongly argue against the use of any format
that uses lossy data compression or perceptual coding in archival recordings, or in recordings
eventually intended for archives. MP3 (MPEG-1/2 Audio Layer III), MiniDisc and any form of streamed
audio are all formats which employ bit rate reduction or data compression, and should not be
used in archival processes, including field recording. It is not possible to uncompress recorded
audio that uses perceptual coding; instead the part of the audio that is discarded remains forever
lost, permanently limiting the quality and use of that audio thereafter.
The conversion and storage system consists of three parts: the analogue to digital conversion hardware, the computer system and the storage system.
We are aware of the difficulties of using transducers in a complete audio signal chain to convert signals from acoustical to electrical (done by the microphone) and back again from electrical to acoustical (done by the loudspeaker). But to keep audio signals meaningfully audible for an indefinitely long time, it is necessary to store them in the best conditions offered by the digital domain.
Hence, the A/D converter becomes the key component in the signal path, as the choice of the A/D
converter irrevocably affects the fidelity of the resulting signal.
To assess the degree of transparency, the converter's electrical measurements and subjective aural performance, as well as the converter's operating parameters such as sampling frequency and word length, must be considered. Finally, the signal level input to the converter, the converter component design, and external conditions such as grounding and shielding can greatly affect the fidelity of the resulting file.
Choosing an A/D converter must be based on an evaluation of technical measurements and of
subjective listening.
In converting analogue audio to a digital data stream, the analogue to digital converter should not colour the audio or add any extra noise. It must exhibit audio transparency; that is, it should neither add to nor subtract from the sound. In practice, the A/D converter incorporated in a computer's sound card does not, and cannot, meet the required specifications, owing to low cost circuitry and the inherent electrical noise in a computer. A discrete (stand alone) A/D converter that converts from analogue to digital in accordance with professional specifications is always recommended.
The more recent generations of computers have sufficient power to manipulate large audio files.
Once in the digital domain, the integrity of the audio files should be maintained. As noted above,
the critical point in the preservation process is converting the analogue audio to digital, and this
relies on the A/D converter, and entering the data into the system, either through the sound card
or other data port.
[Figure: dynamic range along the audio signal chain. Recoverable values:
Peak levels in music performances: classical music 90-118 dB SPL; rock music 115-129 dB SPL; jazz music 114-127 dB SPL; others 116-127 dB SPL. Level marked at ~4 dB SPL.
Headroom: 6-9 dB. Footroom: 6-9 dB.
Mixing consoles: >288 dB. Storage: >144 dB. A/D convertors: 115-130 dB.
CD-A: 96 dB; SACD (with noise shaping): 120 dB; DVD-A: 144 dB. Techniques for increasing dynamic range.
Reproduction system limitations (dynamic range): D/A convertors >110 dB; power amplifier 110-120 dB; loudspeakers (1 m peak outputs): consumer 112-120 dB SPL, professional 128-131 dB SPL.]
[Figure 3: Step responses of an equiripple FIR filter for two different sampling frequencies, 48 kHz and 96 kHz. Axes: amplitude versus time (ms).]
[Figure 4: Impulse responses of an equiripple FIR filter for different attenuation slopes. Panels: filter specification sketch (magnitude in dB versus f in Hz, with Apass, Astop, Fpass, Fstop and Fs/2 marked); magnitude response (dB) versus frequency (kHz) and impulse response (amplitude versus time in ms) of a lowpass equiripple FIR with Fs = 192000 Hz, Fpass = 20000 Hz and Fstop = 24000 Hz, 48000 Hz or 96000 Hz.]
3.1.2 Quantization Word Length: 24 bits
The word length of the converter describes the length of the output digital word and hence the
number of bits used to represent the amplitude of the audio samples.
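An ideal ADC (without room for any internal noise) has noise spread over the band from DC to the folding frequency and can be determined using the following equation:

$$\text{Ideal noise (DC to } F_B) = 10\log_{10}\frac{F_B}{\tfrac{1}{2}F_s} + 3.01 - 6.02\,n \ \text{dBFS}$$

[Figure: quantizer block diagram with input $S_a(t)$, filter $H(f)$ and output $S_q(n)$]

The quantization error $e_q$ is uniformly distributed over one quantization step $q$:

$$p(e_q) = \begin{cases} 1/q, & |e_q| \le q/2 \\ 0, & |e_q| > q/2 \end{cases}$$

The quantisation noise power is given by 16 :

$$\sigma_q^2 = E\{(e_q - \bar{e}_q)^2\} = \frac{q^2}{12} = \frac{1}{3 \cdot 2^{2n}}$$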
In the digital domain, signal levels are expressed relative to digital full scale, as defined in AES17 17 : the level of the sine wave whose peak level is equivalent to the maximum positive value:

$$S_a(t) = U_v \cos(2\pi t / T)$$

Setting $U_v = 1$, the average power of the reference signal is

$$\sigma_a^2 = E\{(S_a(t) - \bar{S}_a(t))^2\} = \frac{U_v^2}{2} = \frac{1}{2}$$
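Accordingly, the signal-to-noise ratio inside the audio bandwidth of interest $F_B$, with signal power $\sigma_a^2$ and quantization noise power $\sigma_q^2$, is:

$$\mathrm{SNR} = 10\log_{10}\!\left(\frac{\sigma_a^2}{\sigma_q^2}\cdot\frac{\tfrac{1}{2}F_s}{F_B}\right) = 6.02\,n + 1.76 + 10\log_{10}\frac{\tfrac{1}{2}F_s}{F_B}\ \text{dBFS}$$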
This is the SNR available when the quantizer, a nonlinear device, behaves in a statistical sense like a linear device, the quantization noise being modelled as IUDN (Independent Uniformly-Distributed Noise). So, although quantization acts nonlinearly on signals, it acts linearly on their probability densities.
The quantizer is then a source of additive noise whose statistical properties are known and fixed: mean = 0, variance = $q^2/12$, uncorrelated with the quantizer input.
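The full-band case of the SNR formula (bandwidth equal to half the sampling frequency, so the logarithmic bandwidth term vanishes) can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the cited derivation: it quantizes a full-scale cosine with a uniform mid-tread quantizer and compares the measured SNR with 6.02 n + 1.76 dB. The function names and the test frequency are arbitrary choices.

```python
import math


def quantize(x: float, n_bits: int) -> float:
    """Uniform mid-tread quantizer for a full-scale range of -1..+1 (step q = 2 / 2**n)."""
    q = 2.0 / (2 ** n_bits)
    return q * round(x / q)


def measured_snr_db(n_bits: int, n_samples: int = 100_000) -> float:
    """SNR in dB of a quantized full-scale cosine, measured over the whole Nyquist band."""
    signal_power = 0.0
    noise_power = 0.0
    for k in range(n_samples):
        # An irrational frequency ratio keeps the error sequence from repeating.
        x = math.cos(2.0 * math.pi * k * (math.sqrt(2.0) / 20.0))
        error = quantize(x, n_bits) - x
        signal_power += x * x
        noise_power += error * error
    return 10.0 * math.log10(signal_power / noise_power)


def ideal_snr_db(n_bits: int) -> float:
    """6.02 n + 1.76 dB, i.e. the formula with bandwidth equal to Fs / 2."""
    return 6.02 * n_bits + 1.76
```

For moderate word lengths the two values agree closely, which is the sense in which the quantizer behaves like an additive noise source of variance q^2/12.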
If the quantizer input Sa has a PDF (probability density function) that does not satisfy any of the
quantizing theorems, the quantization noise will not have properties like IUDN. These properties
can be obtained by the addition of a suitably designed independent dither signal d to the quantizer
input. This usually means that: each dither sample is produced by a pseudorandom number
generator, and a D/A converter is used to convert the number to an analog level to be added to the
input of the quantizer before quantization.
The total output noise (quantization error plus dither, e_q + d) should be independent of the quantizer input Sa (or at least
uncorrelated with Sa) in order to satisfy the ideal objective for a linear quantizer device.
So, the quantizer would be linearized by the dither, and the IUDN model would prevail.
The price paid for using this technique is the increased noise power due to the dither signal.
Using, for example, a Gaussian dither whose standard deviation is $q/2$, the noise of the ideal converter will be increased by 6 dB:

$$\frac{q^2}{12} + \frac{q^2}{4} = \frac{q^2}{3} \;\Rightarrow\; +6\ \text{dB}$$
Or, using a triangular dither whose amplitude range is +/-q (variance $q^2/6$), the total output noise power will be

$$\frac{q^2}{12} + \frac{q^2}{6} = \frac{q^2}{4} \;\Rightarrow\; +4.77\ \text{dB}$$
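The triangular-dither figure can be reproduced with a short simulation. This is a sketch under assumptions: the dither is generated as the sum of two independent uniform values (a common way to obtain a triangular PDF, not necessarily the generator used by any real converter), the input is held constant at an arbitrary value, and the function names are illustrative.

```python
import random


def tpdf_dither(q: float, rng: random.Random) -> float:
    """Triangular-PDF dither spanning -q..+q: the sum of two uniform values in -q/2..+q/2."""
    return q * (rng.random() - 0.5) + q * (rng.random() - 0.5)


def total_noise_power(x: float, q: float, n_trials: int, rng: random.Random) -> float:
    """Mean squared error of quantizing the constant input x after adding TPDF dither."""
    acc = 0.0
    for _ in range(n_trials):
        y = q * round((x + tpdf_dither(q, rng)) / q)  # dither, then quantize
        acc += (y - x) ** 2
    return acc / n_trials


if __name__ == "__main__":
    rng = random.Random(1)
    power = total_noise_power(x=0.3, q=1.0, n_trials=200_000, rng=rng)
    # Expected to be close to q**2 / 4, i.e. three times the undithered q**2 / 12.
    print(power, power / (1.0 / 12.0))
```

The measured power stays near q^2/4 whatever constant input is chosen, which illustrates that the total noise no longer depends on the signal.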
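Since, from the statistical point of view of second-order moments, the triangular probability density function (TPDF) dither ensures the desired behaviour of the IUDN model much better, the ideal signal-to-noise ratio of the converter using this dither becomes:

$$\mathrm{SNR} = 10\log_{10}\!\left(\frac{\sigma_a^2}{\sigma_{q+d}^2}\cdot\frac{\tfrac{1}{2}F_s}{F_B}\right) = 6.02\,n - 3.01 + 10\log_{10}\frac{\tfrac{1}{2}F_s}{F_B}\ \text{dBFS}$$

where $\sigma_a^2 = 1/2$ is the power of the full-scale reference sine and $\sigma_{q+d}^2 = q^2/4$ is the total noise power with TPDF dither.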
Number of bits | Fs = 44100 Hz | Fs = 48000 Hz | Fs = 96000 Hz | Fs = 192000 Hz
16 | 93.73 dBFS | 94.10 dBFS | 97.11 dBFS | 100.12 dBFS
24 | 141.89 dBFS | 142.26 dBFS | 145.27 dBFS | 148.28 dBFS
SNR of an ideal ADC (with unshaped TPDF dither of 2 LSBs peak-to-peak amplitude), in unweighted bandwidth (20000 Hz) measurement conditions
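The table values follow directly from the TPDF-dithered SNR formula, 6.02 n - 3.01 + 10 log10((Fs/2)/F_B), with a measurement bandwidth of 20 kHz. A minimal sketch that recomputes them (the function name is illustrative):

```python
import math


def ideal_tpdf_snr_db(n_bits: int, fs_hz: float, bandwidth_hz: float = 20_000.0) -> float:
    """6.02 n - 3.01 + 10 log10((Fs / 2) / F_B), in dB relative to full scale."""
    return 6.02 * n_bits - 3.01 + 10.0 * math.log10((fs_hz / 2.0) / bandwidth_hz)


if __name__ == "__main__":
    for n_bits in (16, 24):
        row = [f"{ideal_tpdf_snr_db(n_bits, fs):.2f}"
               for fs in (44_100, 48_000, 96_000, 192_000)]
        print(n_bits, row)
```

Each doubling of the sampling frequency adds about 3.01 dB, because the same total noise power is spread over twice the bandwidth while the measurement band stays at 20 kHz.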
However, this is a theoretical figure. A more effective measure of converter quality, which takes the converter errors into account, is the ENOB (effective number of bits), where

$$\mathrm{ENOB} = \frac{\text{dynamic range} - 1.76}{6.02}$$

For example, a 24-bit converter with a measured dynamic range of 125 dB provides only 20.5 bits of resolution.
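As a worked check of the example, with the dynamic range expressed in dB, and, as an added illustration, the dynamic range that a full 24 effective bits would require under the same relation:

```latex
\[
\mathrm{ENOB} = \frac{125 - 1.76}{6.02} = \frac{123.24}{6.02} \approx 20.47 \approx 20.5 \text{ bits}
\]
\[
\text{dynamic range for 24 effective bits} = 6.02 \times 24 + 1.76 = 146.24 \text{ dB}
\]
```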
However, a well-designed 24-bit converter will provide a noise floor that lies at the limits of
audibility offering the potential for the requested highest fidelity of a complete audio signal
chain.
The debate regarding the converter resolution required for transparency could be made easier
using some statistics about the human hearing sense.
Listeners weigh the determining factors, sound pressure level, frequency contents, and duration,
differently. Loudness, for example, unlike electrical level, is subjective.
Our sense of hearing assesses loudness by how the cilia and corresponding auditory nerve fibres
are excited in the basilar membrane in the inner ear. This excitation is distributed on the
membrane by frequency bands, forming a kind of biological spectrum analyzer. Each frequency
excites a certain zone on the basilar membrane and each excited zone adds up to the total
loudness.
The Fletcher/Munson curves were constructed by subjective responses to sinusoidal tones
presented frontally. The phon values were defined by the 1 kHz sinusoidal tones, measured in dB,
the levels giving the name of the phon curves. For example, the 40 phon curve has 40 dB
intensity with a 1 kHz tone.
Several corrections to the Fletcher/Munson curves were made and included in ISO 226, as a standard for
the hearing threshold of sine waves under free-field conditions, and modified for diffuse-field
conditions by ISO 454.
[Figure 5: Equal loudness contours as described by ISO 226 versus the dynamic range of high quality audio A/D converters and DSPs; the threshold of pain is marked.]
The analysis 18 of the sound levels of acoustic noise (taking into account the ability of listeners to detect noise, 3.8 dB SPL being the just-audible level of white noise) and of the sound level of music (taking into account the 120-129 dB SPL peak levels of some music performances) gives the necessary dynamic range: 122-124 dB (Figure 5). Accordingly, if a digital system produces processing artefacts that lie above the noise floor of the input signal, these artefacts will be audible under certain circumstances.
The archival conversion of signals from old recordings, with low intensity or limited frequency content (Figures 6 a) and b)), should be followed by digital processing designed to prevent processing noise from reaching levels at which it may appear above the noise floor of the input and hence become audible.
Year | Old recording medium | Dynamic range (dB) | Frequency bandwidth (Hz)
1897 | Shellac discs | 28 | 168-2000
1931 | Vinyl long play records | 60 | 30-10000
1944 | Decca FFRR (Full Frequency Range Recordings) | 60 | 10-15000
Table 1 Dynamic range and frequency bandwidth of gramophone discs
[Figure 6: Specific background noise of old gramophone discs (two examples, a and b). Power spectrum estimates with Hamming, Kaiser and Chebyshev windows; magnitude (dB) versus frequency (kHz).]
It is important to quantize with a word length that is relatively longer than what may be
immediately required. The larger dynamic range provided by recommended 24-bit word length
supplies greater headroom, which makes level setting less critical.
Also, a well-designed 24-bit converter will offer the potential for the requested highest fidelity of
a complete audio signal chain, providing a noise floor that lies at the limits of audibility (Figure
7).
[Figure 7: Fragment of a recent piano recording, made with an extremely low self-noise microphone and a 24 bit (192 kHz sampling) digital recorder. Power spectrum estimates with Hamming, Kaiser and Chebyshev windows; magnitude (dB) versus frequency (kHz).
a) Large bandwidth power spectrum estimation; the bandwidth is limited at the Nyquist frequency (half the sampling frequency).
b) Enlarged part of the above power spectrum estimation, including only frequencies up to 25 kHz.]
In order for the DSP to maintain the SNR established by the A/D converter, all intermediate DSP calculations require higher-precision processing. As the next figure shows, digital processing effectively decreases the useful word length, because cascaded mathematical operations, truncation and rounding add error to the least significant bit (LSB).
[Figure: signal flow from the A/D converter (input Sa(t), output xn(t)+eq+d) through a digital filter structure (delays z-1, coefficients b0, b1, b2, b3, ... bm, memory) to the D/A converter (output Sa(t)), with the error sources marked: input error ei, arithmetic precision ep, saturation effects es, rounding/truncation er/t and output error eo.]
Apogee AD-8000 (channels 1-8): 107-109; 108-113; Fs = 44.1 kHz; 10 Hz (0.1 dB), 20.81 kHz (0.4 dB); -107
RME ADI-8 DS (channels 1-8): 113.5; 117; Fs1 = 44.1 kHz, Fs2 = 88.2 kHz, Fs3 = 96.0 kHz; 10 Hz (0.1 dB), 20.72 kHz (0.4 dB) or 41.01 kHz (0.4 dB) or 44.67 kHz (0.4 dB); 21.44 kHz (3 dB) or 42.89 kHz (3 dB) or 46.52 kHz (3 dB); -105
Joshua D. Reiss, in his recent, already cited article "Understanding sigma-delta modulation: the solved and unsolved issues", described several limitations of practical sigma-delta modulation: limit cycles, idle tones, harmonic distortion, dead zones, noise modulation, and stability.
Definitions included in the cited article:
Limit cycles: the occurrence of a repeating sequence in the output bitstream which, in audio applications, may be an audible artefact.
Idle tones: a discrete peak in the frequency spectrum of the output of a converter with sigma-delta modulation, superimposed on a background of noise.
Harmonic distortion: peaks that are due to unwanted harmonics or aliasing of the input signal, and those that bear no apparent relationship to the input frequency.
Dead zones: a range of input for which the sigma-delta modulator may produce the same average output value.
Noise modulation: the quantization noise power depends on the signal, and it can be perceived after the quantization of audio signals.
Stability: with given initial conditions and constant input, the stable behaviour of higher-order sigma-delta modulator converters is questionable.
$$N \cong \frac{-20\log_{10}\sqrt{\delta_p \delta_s} - 13}{14.6\,(\omega_s - \omega_p)/2\pi}$$

In this context, a sharp cutoff or a narrow transition band implies a very long FIR filter, whereas a wider transition band allows a shorter FIR filter.
Parks and Burrus 26 proposed the following alternative formula for the case of a very wide band filter:

$$N \cong \frac{-20\log_{10}(\delta_p) + 5.94}{27\,(\omega_s - \omega_p)/2\pi}$$

In this case the estimate of the filter order depends more on the passband ripple.
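The effect of the transition width on the filter length can be made concrete with the first order estimate, N = (-20 log10 sqrt(dp ds) - 13) / (14.6 (ws - wp)/2pi). The sketch below assumes that dp and ds denote the passband and stopband ripple and that the band edges are given in Hz, so that (ws - wp)/2pi equals (Fstop - Fpass)/Fs. The ripple values in the example are arbitrary illustrative choices; the band edges are those quoted in the Figure 4 panel titles.

```python
import math


def kaiser_order(delta_p: float, delta_s: float,
                 f_pass_hz: float, f_stop_hz: float, fs_hz: float) -> float:
    """Estimate the FIR order from the ripples and the normalised transition width."""
    transition = (f_stop_hz - f_pass_hz) / fs_hz  # equals (ws - wp) / (2 pi)
    attenuation_term = -20.0 * math.log10(math.sqrt(delta_p * delta_s)) - 13.0
    return attenuation_term / (14.6 * transition)


if __name__ == "__main__":
    # Assumed ripples: 0.001 in the passband, 0.00001 (-100 dB) in the stopband.
    for f_stop in (24_000, 48_000, 96_000):
        order = kaiser_order(1e-3, 1e-5, 20_000, f_stop, 192_000)
        print(f_stop, round(order, 1))
```

With these assumptions the estimated order falls steeply as the stopband edge moves from 24 kHz to 96 kHz, which is the trade-off that high sampling rates make available.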
The passband response, especially, should approximate the ideal of being flat in a way that
minimises the maximum distortion of the real filter response.
The linear-phase property ensures that the frequency response of the filter can be written 28 :

$$H(e^{j\omega}) = H_p(\omega)\,\exp[\,j(a\omega + b)\,]$$

where $a$, $b$ are real constant coefficients and $H_p: \mathbb{R} \to \mathbb{R}$, that is, as a phase factor (linear phase) in cascade with a real frequency response which can be expressed as a sum of cosines. The sum of cosines can in turn be expanded as a sum of cosine powers, i.e. a Chebyshev polynomial in $\cos(\omega)$.
With this decomposition, algorithms such as the Remez exchange procedure can be used to
design optimal min-max approximations to a desired response.
In line with the above design idea, the filter passband response (and similarly the stopband response) can be considered as the desired flat response plus an additional error response.
The next figures illustrate the possible response of some high-end equipment (designed with the Remez algorithm in the MATLAB Signal Processing Toolbox) when equiripple linear-phase FIR filters are used as anti-alias and anti-image filters:
[Figure 9: FIR filter specifications for 48 kHz sampling rate (a, b, c), in conjunction with a critical, high-order analogue low-pass filter; the same specifications (d, e, f) for the 2x oversampling equivalent filter, in conjunction with a gentle, lower-order analogue low-pass filter. Panels a), b), c): lowpass equiripple FIR, 118 taps, Fs = 48 kHz: frequency response, passband magnified, stopband magnified. Panels d), e), f): lowpass equiripple FIR, 147 taps: frequency response, passband magnified, stopband magnified. Axes: magnitude (dB) versus frequency (kHz). Data cursor: 21.41309 kHz, -3.016742 dB.]
[Figure 10: Gentle, digital low-pass filters with very small errors in the 20 kHz band using high frequency sampling: 96 kHz (a, b, c) or 192 kHz (d, e, f). Panels a), b), c): lowpass equiripple FIR, 22 taps: frequency response, passband magnified, stopband magnified. Panels d), e), f): lowpass equiripple FIR, 41 taps, Fs = 192 kHz: frequency response, passband magnified, stopband magnified. Axes: magnitude (dB) versus frequency (kHz). Data cursor: 29.80078 kHz, -3.07245 dB.]
The passband response of this kind of digital filter is clearly not ideally flat: it carries an additional error response in the form of a constant ripple. This error can be approximated by a cosinusoidal shape in the frequency domain, which corresponds to pre- and post-echoes in the time domain.
The above figures show echo amplitudes below -80 dB and timing variations between 0.1 ms (approximately, at 192 kHz sampling rate) and 1.2 ms (at 48 kHz sampling rate).
However, these values are far from those found to be quite perceptible by untrained listeners (-30 dB at +/- 40 ms).
Taking care of the interest in the growing requirement for restoration of degraded sources to get
improved resolution of the impulsive signals and an improved perception of musical transient
attacks passages, it is recommended to repeat perception experiments noticing the difference
between 48kHz and 96kHz or 192kHz in localisation accuracy with available real-less ideal
filters.
Real anti-alias and anti-image filters should reach their full attenuation within a transition region of greater or lesser width, in order to avoid alias-specific or image-specific distortions.
Accordingly, for systems operating at a low sampling frequency and requiring a narrow transition region (0.45Fs to 0.5Fs), it is very difficult to achieve the desired performance, even with the highest-performance integrated circuits and filter designs.
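The cost of a narrow transition region can be illustrated with a widely used rule of thumb for linear-phase FIR filters, often attributed to fred harris: the number of taps is roughly the stopband attenuation in dB divided by 22 times the transition width expressed as a fraction of Fs. This rule is not part of the original text and gives only an order of magnitude; the function name and the 100dB target are assumptions.

```python
import math

def estimate_fir_taps(atten_db, f_pass_hz, f_stop_hz, fs_hz):
    """Rough FIR length from the rule of thumb N ~ A / (22 * df / Fs)."""
    transition = (f_stop_hz - f_pass_hz) / fs_hz  # transition width as a fraction of Fs
    if transition <= 0:
        raise ValueError("f_stop_hz must be above f_pass_hz")
    return math.ceil(atten_db / (22.0 * transition))

# Narrow transition, 0.45Fs to 0.5Fs, at Fs = 48 kHz (assumed 100 dB target).
narrow = estimate_fir_taps(100, 21_600, 24_000, 48_000)
# Gentle transition, 20 kHz to 48 kHz, at Fs = 96 kHz (same assumed target).
gentle = estimate_fir_taps(100, 20_000, 48_000, 96_000)
print(narrow, gentle)  # 91 16
```

On this estimate the narrow case needs several times as many taps as the gentle one for the same attenuation.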
4.2.1 The effect of aliasing during digitization process
Aliasing, caused by the reflection of the spectrum of the audio signal about the folding frequency (0.5Fs) during sampling in an analogue-to-digital conversion, produces frequency-shifted signals in the audio band.
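The folding described above can be sketched in a few lines. The function name is hypothetical and the 26.2kHz input is only an illustrative value; an ideal sampler with no anti-alias filter is assumed.

```python
def alias_frequency(f_hz, fs_hz):
    """Frequency at which a tone at f_hz appears after sampling at fs_hz."""
    f = f_hz % fs_hz           # sampling cannot distinguish f from f + k*Fs
    if f > fs_hz / 2:          # above the folding frequency: reflect about 0.5Fs
        f = fs_hz - f
    return f

# A 26.2 kHz component sampled at 48 kHz folds about 24 kHz.
print(alias_frequency(26_200, 48_000))  # 21800
```

Frequencies are given in integer Hz so that the arithmetic stays exact.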
Poor rejection of the alias components in the transition region (above 19-20kHz, for example) may have only minor direct effects, being inaudible to most listeners. However, any intermodulation mechanism likely to occur in the following stages of the processing and reproduction system could produce audible distortion at lower frequencies.
The alias signal will consequently intermodulate with the harmonics of the original signal, generating a-harmonic signals as intermodulation distortion.
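As a rough illustration of this mechanism, second-order intermodulation between two components at f1 and f2 produces products at their difference and sum. The sketch below assumes a simple second-order nonlinearity; the function name and the two example frequencies (an alias component at 21.8kHz and a signal harmonic at 17.9kHz) are illustrative choices.

```python
def second_order_products(f1_hz, f2_hz):
    """Difference and sum frequencies generated by a second-order nonlinearity."""
    return abs(f1_hz - f2_hz), f1_hz + f2_hz

difference, total = second_order_products(21_800, 17_900)
print(difference, total)  # 3900 39700
```

The difference product at 3.9kHz falls well inside the audio band, although both source components lie near or above its upper edge.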
The solution is to reach full attenuation at half the sample frequency. On the other hand, the frequency response should be as wide as possible for the different sampling rate applications.
For example, most digital filters implemented as anti-alias filters in A/D conversion, for 44.1-96kHz sampling rates, start at 45% and reach full attenuation at 55% of the sample frequency (Table 1).
Parameter                                          Min       Typ      Max       Unit
Single Speed Mode (2 kHz to 50 kHz sample rates)
Passband (-0.1 dB)                                 0                  0.47      Fs
Passband Ripple                                    -0.035             +0.035    dB
Stopband                                           0.58                         Fs
Stopband Attenuation                               -95                          dB
Total Group Delay (Fs = Output Sample Rate) tgd              12/Fs              s
Dual Speed Mode (50 kHz to 100 kHz sample rates)
Passband (-0.1 dB)                                 0                  0.45      Fs
Passband Ripple                                    -0.035             +0.035    dB
Stopband                                           0.68                         Fs
Stopband Attenuation                               -92                          dB
Total Group Delay (Fs = Output Sample Rate) tgd              9/Fs               s
Quad Speed Mode (100 kHz to 200 kHz sample rates)
Passband (-0.1 dB)                                 0                  0.24      Fs
Passband Ripple                                    -0.035             +0.035    dB
Stopband                                           0.78                         Fs
Stopband Attenuation                               -97                          dB
Total Group Delay (Fs = Output Sample Rate) tgd              5/Fs               s
Table 2 - Digital filter characteristics of CS5381 (120 dB, 192 kHz, multi-bit audio A/D converter), Cirrus Logic - Product information
The filter exemplified above, at a 48kHz sampling frequency, offers about 22.6kHz (0.47Fs) as the passband edge and about 27.8kHz (0.58Fs) as the end of the transition region, where the full stopband attenuation is reached.
In this case, the a-harmonic mirrored frequencies Fs-f (where f > 0.5Fs), reproduced by a loudspeaker, could intermodulate with the audible signal and create new audible frequency components, the so-called Aliasing Intermodulation Distortion.
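The band edges follow directly from the normalised values in the table. The sketch below converts the single speed mode passband and stopband edges (0.47Fs and 0.58Fs) to Hz at Fs = 48kHz, and derives the range of input frequencies above 0.5Fs that the filter does not yet fully attenuate, together with the range into which they fold. The function and variable names are hypothetical.

```python
def band_edges_hz(fs_hz, passband_frac, stopband_frac):
    """Convert passband/stopband edges given as fractions of Fs to Hz."""
    return passband_frac * fs_hz, stopband_frac * fs_hz

def unprotected_alias_band(fs_hz, stopband_frac):
    """Input range between 0.5Fs and the stopband edge, and where it folds to.

    Components in (0.5Fs, stopband edge) are only partly attenuated and are
    reflected about 0.5Fs into (Fs - stopband edge, 0.5Fs).
    """
    nyquist = fs_hz / 2
    stop = stopband_frac * fs_hz
    return (nyquist, stop), (fs_hz - stop, nyquist)

fs = 48_000
passband, stopband = band_edges_hz(fs, 0.47, 0.58)
inputs, folded = unprotected_alias_band(fs, 0.58)
print(round(passband), round(stopband))  # 22560 27840
print([round(x) for x in folded])        # [20160, 24000]
```

With these values, partly attenuated input between 24kHz and about 27.8kHz is mirrored into the region from about 20.2kHz to 24kHz.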
Between 1992 and 1997, James Boyk measured several instruments, mainly in the Music Lab at the California Institute of Technology, capturing their ultrasonic extension and energy [29] (with a Hewlett Packard 3567 FFT analyzer and two quarter-inch microphones, one a Bruel&Kjaer 4135 model and the other an Aco/Pacific 7016 model).
He reported the highest frequency at which harmonics are still present (Table 3, for instruments with harmonics) and the highest frequency at which the sound level is at least 10dB above background (Table 4, for instruments without harmonics).
No.  Instrument with harmonics   SPL (dB)   Harmonics still present   Percentage of power above 20 kHz
1.   Trumpet (Harmon mute)       96         >50kHz                    0.5%
2.   Trumpet (Harmon mute)       76         >80kHz                    2%
3.   Trumpet (straight mute)     83         >85kHz                    0.7%
4.                               113        >90kHz                    0.03%
5.                               99         >65kHz                    0.05%
6.   French horn                 105        >55kHz                    0.1%
7.   Violin (double-stop)        87         >50kHz                    0.04%
8.                               77         >35kHz                    0.02%
9.   Oboe                        84         >40kHz                    0.01%
Table 3 Frequency extension and ultrasonic energy of some instruments with harmonics
No.  Instrument without harmonics   SPL (dB)   Sound level: 10 dB above background   Percentage of power above 20 kHz
1.   Speech Sibilant                72         >40kHz                                1.7%
2.   Claves                         104        >102kHz                               3.8%
3.                                  73         >90kHz                                6%
4.   Crash Cymbal                   108        >102kHz                               40%
5.   Triangle                       96         >90kHz                                1%
6.   Keys jangling                  71         >60kHz                                68%
7.   Piano                          111        >70kHz                                0.02%
Table 4 Frequency extension and ultrasonic energy of some instruments without harmonics
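To relate the percentages in these tables to the level scales used elsewhere in this paper, a power fraction can be expressed in dB relative to the total power. This is a minimal sketch; the function name is hypothetical, and the two example values are the largest and smallest entries listed for instruments without harmonics (keys jangling, 68%, and piano, 0.02%).

```python
import math

def power_percent_to_db(percent):
    """Level of a power fraction, given in percent, relative to total power in dB."""
    return 10 * math.log10(percent / 100)

print(round(power_percent_to_db(68), 1))    # -1.7
print(round(power_percent_to_db(0.02), 1))  # -37.0
```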
This evidence does not confirm any ability to perceive ultrasound, but it does show that ultrasonic content is real and might interfere, indirectly, with the recording and reproduction process.
There are areas where the desired quality of the audio restoration process is strongly related to prior signal enhancement, because of the very poor high-frequency response of most early recordings.
In this case, since the high-frequency information of the recorded signal is buried deep in noise, it is important to predict these low-level components using an adequate model that includes the frequency characteristics of instruments.
Even if we ignore the frequency extension and ultrasonic energy of instruments, the non-linear behaviour of the stages following the digital-to-analogue conversion could cause intermodulation distortion artefacts.
So, the poor rejection of the alias components in the transition region and the nonlinearities in the signal path (the behaviour of loudspeakers being a good example, generating modulation between frequency components of the signal) increase the uncertainty in evaluating the audio restoration work.
4.2.2 The effect of imaging during audio signal reproduction
Even though the audibility and relevance of signals above 20 kHz remain a matter of debate, all images above the folding frequency (0.5Fs), especially at lower sample rates Fs, could provoke distortion artefacts in the audio band.
It is necessary to take into consideration, once again, the potentially non-linear behaviour of the electronic and electromechanical stages following the digital-to-analogue conversion.
Accordingly, the effects of high-amplitude, high-frequency input signal components (below half the sample frequency, 0.5Fs), whose image components lie above 0.5Fs (more or less attenuated by the image filter of the D/A converter), should be evaluated in relation to the specific non-linearities in amplifiers, loudspeakers or other parts of the system.
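The position of the images, and of the difference tone they can produce in a non-linear stage, can be sketched as follows. The sketch assumes an ideal D/A conversion whose images lie at k*Fs - f and k*Fs + f; the function name and the 21kHz example are illustrative.

```python
def image_frequencies(f_hz, fs_hz, orders=1):
    """Image frequencies k*Fs - f and k*Fs + f for k = 1..orders."""
    images = []
    for k in range(1, orders + 1):
        images.extend([k * fs_hz - f_hz, k * fs_hz + f_hz])
    return images

f, fs = 21_000, 48_000
first_image = image_frequencies(f, fs)[0]   # lowest image, Fs - f
difference_tone = first_image - f           # second-order product with the signal itself
print(first_image, difference_tone)         # 27000 6000
```

A 21kHz component reproduced at Fs = 48kHz thus has its first image at 27kHz, and a second-order nonlinearity acting on the pair would place a product at 6kHz.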
To maximise archiving quality, together with the conditions necessary for further restoration and post-production activities, several investigations (objective analysis and subjective listening tests) have to be carried out:
of the response of various tweeters, to evaluate the amount of intermodulation products they generate below 20kHz when driven by ultrasonic signals;
of amplifiers that can produce distortion products below 20kHz that are audible (even with difficulty) in the absence of other signals below 20kHz.
The quality of sound systems should be judged using harmonic and intermodulation distortion measurements in the context of how their effects are perceived. Otherwise they remain purely mathematical relationships, without any consideration for the characteristics of the receiver, the human ear.
Real systems, most notably loudspeakers, can have frequency-dependent nonlinearities that limit their performance at high amplitudes. In addition, the recent application of psychoacoustics to audio data compression demonstrates the dominant role of masking in hearing acuity.
[Figure: spectra, Magnitude (dB) versus Frequency (kHz), panels a) to c): an example of an instrument with ultrasound energy, aliasing distortion and imaging distortion. The aliasing and imaging panels mark the intermodulation distortion: an a-harmonic signal as a 1% intermodulation product (second order products), due to nonlinearities in the signal path, when aliasing or imaging distortion is present. Marked frequencies include 3.9, 17.9, 21.8, 24 and 26.2kHz in the aliasing case and 21, 24 and 27kHz in the imaging case.]
5 CONCLUSIONS
The requirements for higher resolution in the acquisition of impulsive signals, and for better perception of musical transient attacks in the restoration of degraded sources, should be analysed under modern conditions: extended bandwidth, gentle filtering, and improved phase and impulse characteristics.
The effort to increase bandwidth should be accompanied by new loudspeaker designs with improved off-axis response and better sound quality at higher frequencies, with diaphragm resonances located well outside the audible range.
Under these conditions, the transfer work for digital preservation, understood as the creation of a surrogate (an accurate, authentic and very high quality representation of the original), could start by identifying all the necessary and adequate equipment and operating personnel to be involved in the preservation system.
The evaluation of the factors that influence the A/D converter fidelity described here indicates
that reducing distortion mechanisms by filters designed for higher sampling frequency with
1. Watanabe K., FPC Inc., A Kodak Company, Evolution Availability Longevity, Joint Technical Symposium, 2004
2. Watkinson J., Is digital storage more reliable than analogue?, Resolution, November/December 2002
3. Bradley K., Critical Choices, Critical Decisions: Sound Archiving and Changing Technology, 2004
4. Schuller, D. Preserving Audio and Video Recordings in the Long-term, International Preservation News, 14, 1997. (On-line): http://www.ifla.org/VI/4/news/14-97.htm
5. Schuller, D. Preserving the Facts for the Future: Principles and Practices for the Transfer of Analog Audio Documents into the Digital Domain. Journal of the Audio Engineering Society, 49 (2001), 7/8, 618-621
6. Hafner, A. The Suedwestrundfunk (SWR) and the Mass Storage Systems in Its Radio Sound Archives: Concepts and some Performance/Cost Aspects, 106th Audio Engineering Society Convention, Munich, Germany, May 08-11, 1999
7. Herla, S., Houpert J. and Lott, F. From Single-Carrier Sound Archive to BWF Online Archive - A New Optimized Workstation Concept, Journal of the Audio Engineering Society, 49, 7/8, 2001, p. 606-617
8. Presto Space, Preservation Status, Annual Report on Preservation Issues for European Audiovisual Collections, Deliverable D22.4 DIS4, 31/01/2005
9. Best Practices For Audio Preservation, by Mike Casey, Indiana University and Bruce Gordon, Harvard University, http://www.dlib.indiana.edu/projects/sounddirections/bestpractices2007/
10. IASA-TC 03: The Safeguarding of the Audio Heritage: Ethics, Principles and Preservation Strategy, Version 3, December 2005, http://www.iasa-web.org/IASA_TC03/IASA_TC03.pdf
11. IASA-TC 04: Guidelines on the Production and Preservation of Digital Audio Objects
12. Capturing Analog Sound for Digital Preservation: Report of a Roundtable Discussion of Best Practices for Transferring Analog Discs and Tapes, CLIR/LC, NRPB (Council on Library and Information Resources and the Library of Congress under the auspices of the National Recording Preservation Board)
13. Richard Warren, Jr., Storage of Sound Recordings, ARSC Journal 24, no. 2 (1993)
14. Bradley K., Critical Choices, Critical Decisions: Sound Archiving and Changing Technology, 2004
15. Ken C. Pohlmann, Measurement and Evaluation of Analog-to-Digital Converters Used in the Long Term Preservation of Audio Recordings (roundtable discussion, Issues in Digital Audio Preservation Planning and Management, Washington, DC, March 10-11, 2006). Also available online: http://www.clir.org/activities/details/AD-Converters-Pohlmann.pdf
16. Joshua D. Reiss, Understanding sigma-delta modulation: the solved and unsolved issues, J. Audio Eng. Soc., Vol. 56, No. 1/2, 2008 January/February
17. AES17, AES standard method for digital audio engineering - Measurement of digital audio equipment, J. Audio Eng. Soc., vol. 46 No. 5, pp. 428-447, 1998 May
18. Fielder, L. Dynamic Range Issues in the Modern Digital Audio Environment, Proceedings AES UK Conference Managing the Bit Budget, 3-19 (May 1994)
19. Joshua D. Reiss, Understanding sigma-delta modulation: the solved and unsolved issues, J. Audio Eng. Soc., Vol. 56, No. 1/2, 2008 January/February
20. Cirrus Logic - CS5381, 120 dB, 192 kHz, multi-bit audio A/D converter, Advance product information
21. Thomas Sandmann, Comparative test 24-bit-converters Apogee AD-8000 and RME ADI-8 DS, PMA Production Management
22. Martin Colloms, Do we need an ultrasonic bandwidth for higher fidelity sound reproduction?, Proceedings of the Institute of Acoustics, Vol. 28, Pt. 8, 2006
23. Tsutomu Oohashi, et al, Inaudible high-frequency sounds affect brain activity: hypersonic effect, Journal of Neurophysiology, 83:3548-3558, 2000, http://jn.physiology.org/cgi/content/full/83/6/3548
24. Nishiguchi et al, Perceptual discrimination between musical sounds with and without very high frequency components, NHK Laboratories Note no 486, AES 115th Convention 2003
25. Sanjit Mitra, Digital signal processing, a computer-based approach, McGraw Hill, Second edition, 2001
26. Parks T.W. and Burrus C.S., Digital filter Design, Wiley, 1987
27. Parks T.W. and McClellan J.H., Chebyshev approximation for nonrecursive digital filters with linear phase, IEEE Trans. on Circuit Theory, CT-19: 189-194, 1972.
28. Stanomir D. Discrete signals and systems, Bucharest, Athena, 1997
29. Boyk J. There's life above 20 kilohertz! A survey of musical instrument spectra to 102 kHz, California Institute of Technology, Music Lab, 1997, http://www.cco.caltech.edu/~musiclab
30. Dunn J. Anti-alias and anti-image filtering: The benefits of 96kHz sampling rate formats for those who cannot hear above 20kHz, 104th AES Convention, Amsterdam, May 1998