Recommended Methods and Values For Digitization Projects of Audio Archives

MANDATORY PRINCIPLES, RECOMMENDED METHODS AND VALUES
FOR DIGITIZATION PROJECTS OF AUDIO ARCHIVES

Mihai DUMITRU
Romanian Radio Broadcasting Corporation, Technical Department
ABSTRACT
Standards and best practices provide the foundation for preservation work by outlining expectations and
goals for the output of capturing analogue sound in a digital preservation system.
It is critical that the technologies, formats, procedures, and techniques for capturing analogue sound
developed by technical experts are adequately implemented, ensuring high-quality output from a
preservation system with longer usability, sustainability, and product interoperability.
The digitization of analogue sources as PCM audio at a 96 kHz or 192 kHz sampling frequency and 24-bit
word length was evaluated and is presented as the best compromise. The evaluation uses mathematical
formulae, simulation software and product data sheets instead of electrical measurements, and an
assessment of dynamic range across a complete audio signal chain, correlated with statistics about human
hearing, instead of subjective evaluations and listening tests.
The trade-offs of high sampling frequencies in typical real digital systems are presented, together with
considerations about recently promoted technical specifications, established consensus and perception tests.
1 INTRODUCTION
Long-term preservation of audio data is hopeless: the carriers are unstable, the commercial
lifetimes of the formats seem to become shorter and shorter, and the amount of data to be stored
increases every day.
Magnetic tapes, for analogue and digital audio recordings, typically contain the following
materials: magnetic oxide, polyurethane binder, polyester base and carbon back coating.
[Figure: cross-section of a magnetic tape. Labels: magnetic particle, binder, lubricant reservoir, top
coat, substratum, back coat]
All three components - magnetic particle, binder, and backing - are potential sources of failure for
a magnetic tape medium. Polyester base is considered very stable under typical storage
conditions. The binder is subject to hydrolysis; the result is known as sticky tape syndrome or sticky shed.
The Magnetic-Media Industries Association of Japan (MIAJ) has concluded that the shelf life of
magnetic tape under normal conditions is controlled by the binder rather than the magnetic
particles ("DDS Specs Drive DAT Reliability," Computer Technology Review, 13 (5), May
1993). In this instance, the shelf life would refer both to the life of recorded as well as unrecorded
media; the life of the binder is independent of whether or not the tape has ever been recorded.
Accordingly, analogue and digital tape formats share many physical attributes related to aging and
life expectancy. Digital's greatest advantage is that each duplicate matches the quality of
the original. The risk is that, when failure occurs, it is complete failure.
Analogue technology is well documented. Operators and engineering expertise are still available.
Over time this will diminish, as the younger people in the industry are not using analogue as the
primary recording method.
On the other hand, special knowledge is necessary about the technology used for digital medium
formats:
CD-R and DVD-R both use organic dyes that respond similarly to temperature and humidity over
time. Manufacturers have conducted accelerated aging tests by subjecting discs to elevated
temperature and humidity, then extrapolating the future failure point.
Using similar methods, the National Institute of Standards & Technology (NIST) found that DVD-R
authoring discs last 25 years.
Replicated discs (DVD Video and audio CDs) use aluminium as the reflective layer (CD-R/DVD-R
use a more stable silver), which can be subject to rot as the metal oxidizes if not properly sealed
during manufacturing.
All DVD discs are constructed by gluing two polycarbonate discs together. There are two
methods: UV-cured lacquer (considered more stable) and thermal melt glue.
Separation of the discs can cause failure.
MO Discs use heat and magnetism to mark the disc. A NIST study concluded with 95%
confidence that an MO disc will last 57 years at room temperature at 90% humidity.
Helical scan data recording formats for digital tape (AIT, SAIT, DTRS, DAT, HDCAM, Hi8) are
considered higher risk because misalignment of the recording heads or warpage of the media
base can cause data retrieval failure. Sony quotes a 30-year life expectancy under proper storage
conditions (from the Sony web site).
Linear recording formats for digital tape (DLT, SDLT, LTO-1, 2, 3, 4) are considered more
reliable because of the fixed-position recording head.
Metal particle (MP) and metal evaporated (ME) tapes do not use binders for their extra-thin
coatings, which gives better wrap and signal performance but at the price of being more fragile 1 .
All practical media degrade. The substrate and the active layer deteriorate over time. This can be
slowed by optimal storage, but not halted. With good quality media and optimal storage,
magnetic recordings will last for many decades. The problem is always that of finding a drive that
will play the medium. (In 20 years' time, how many 1/4-inch open reel recorders will be around?
How many DAT machines? 2 )
From this point forward only digital solutions will be available. Organizations responsible for the
preservation of audio elements can expect the digital media formats to evolve faster than before
in history, resulting in less compatibility regionally and worldwide.
The challenge is to choose archive strategies that compensate for these dynamics.
One of the things digital recording does very reliably is not to cause generation loss. So, if by
reliability we mean the ability to record audio for an indefinite period, then digital becomes the
only choice.
Unlike analogue recording, digital audio can be copied and recopied without loss of signal or
addition of noise or distortion. The potential offered by the production of digital surrogates for
the purpose of preservation seems to provide an answer to linked issues of preservation and
access.
A well-engineered digital recorder has an effective error correction system that puts the data back
to its original binary value provided the error rate is within limits. Thus it is possible to determine
if a digital medium is deteriorating by playing it once in a while. If the error rate starts to
increase, we know that the medium is heading for failure. However, if we make a digital copy
onto a new medium (which may even be a new type of medium) before the error rate exceeds the
performance of the error correction, or before the player becomes obsolete, we can start the race
again. Thus the question of reliability comes down to how well the system for doing that digital
copying is administered.
A survey done with ten broadcasters in Europe indicates that the audio holdings are mainly in
quarter inch tapes and shellac and vinyl discs, besides cassettes, DAT, CD, minidisks and
Tandberg QIC cartridges.
For all broadcasters, the problems with analogue replay equipment are becoming more and more
difficult to overcome, but those caused by digital carriers in the same situation are even
greater. Consequently, there are enough arguments for migrating all recordings made in digital
form to a stable storage system and format before the digital recording technology they were made
on becomes obsolete.
This means that digital preservation (a planned, organised and standardised transfer to suitable
storage) should be initiated as soon as possible, enabling the digital files to be migrated
successfully and simply when necessary 3 .
The automatically accessible, self-controlling and self-regenerating archival system, also known
as a digital mass storage system (DMSS), is the most appropriate solution. The features of such
a system are 4, 5, 6 :
- The management of audiovisual data as computer files in mass storage systems, e.g. libraries or
robotics of magnetic tape cartridges.
- An open file architecture to accommodate all audio data together with catalogue / content
information and written text (metadata).
- The access time of such systems is not of major importance.
- Data integrity is controlled automatically, and the information is copied onto new carriers
automatically before errors can no longer be fully corrected.
- Once new storage media and systems become available through technical development, automated
migration is implemented.
Similar observations were made by IASA: after extensive pilot projects, digital mass storage
systems (DMSS) have been installed in major archives for the storage of large audio collections.
Such systems permit the automatic performance of tasks including checking of data integrity,
refreshment, and, finally, migration with a minimum use of manpower (cf. IASA-TC 04, 6.2).
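The automatic integrity checking described above can be illustrated with a minimal sketch. It assumes that
the archive keeps a reference checksum for every stored file and periodically recomputes it; the function
names (sha256_of, build_manifest, find_damaged) and the choice of SHA-256 are illustrative assumptions, not
part of any cited DMSS product or recommendation.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Record a reference checksum for every archived file (hypothetical helper)."""
    return {str(p): sha256_of(p) for p in paths}


def find_damaged(manifest):
    """Return the files that are missing or whose checksum no longer matches."""
    damaged = []
    for name, expected in manifest.items():
        path = Path(name)
        if not path.is_file() or sha256_of(path) != expected:
            damaged.append(name)
    return damaged
```

A file reported by find_damaged would be a candidate for refreshment from another copy before the next
migration.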
The benefits of using a networked radio station with digital mass storage versus using a
conventional radio sound archive are broadly described in several documents.7,7
2 CONSERVATION, MIGRATION, RESTORATION

In fact, before the digital preservation procedure start-up, there are several types of project that
must be initiated 8 :
Conservation: improving storage conditions to make existing material last longer.
Migration: this is the main immediate work for audio and video materials, and for decaying film.
Preservation via migration consists of transferring master material from old formats to new ones.
The same technical process, namely transfer from one format to another, is also used to make
viewing copies, usually in lower quality. All formats benefit from cost effective methods for
production of viewing copies.
Restoration: Archive media can be of varying quality, and modern technology can significantly
improve the result of a migration (transfer) process. Restoration has tended to be seen as too
expensive to incorporate in cost-effective transfers, but some of the data necessary for restoration
is calculated as a matter of course in the digitisation stage. Integrating the workflow so that
restoration becomes more cost effective, and more archive material can benefit from the power
of digital restoration, could be a major goal for this kind of project.
Presto published a model for the range of activities associated with digitisation: all the actions
needed to identify materials for transfer, gather them together and transport them (and their
metadata) to the digitisation area, through to providing web access and updating the metadata
(catalogue). These activities are shown in the following figure.
Composition: identify and assemble materials
Digitisation: create a digital master copy plus low-data-rate versions
New Media Creation: create a new archive item (physical or electronic)
Update Archive: replace old item with new; update metadata
Figure 1 Digitisation and associated activities

Of course, the above figure is by no means everything. Each of the boxes could be broken down
into many more steps and costs. As an example, the Composition process could be broken in
more steps, each with two parallel streams:
- Actions concerning documentation (metadata);
- Actions on the actual media.
The primary goal of transfer work for digital preservation being the creation of a surrogate (as an
accurate, authentic, and very high quality representation of the original), it is necessary to identify
the adequate equipment and operating personnel that could be involved in the preservation
system.
Both the abilities of staff and the equipment used greatly impact the success of the analogue
playback stage. The engineer must understand how field recordings carried on obsolete and
deteriorating historic formats may be optimally reproduced despite degradation, taking into
account specific characteristics of both the individual recording and the format itself. The
engineer must also align, calibrate, and verify the performance of the playback machine, which
itself must be able to reproduce the recording at the highest fidelity possible 9 .
Recommendations for experienced preservation transfer personnel
IASA-TC 03 10 and TC 04 11 , in addition to stating that equipment must be optimally adjusted and
maintained, suggest that playback requires knowledge of the historic audio technologies and a
technical awareness of the advances in replay technology.
The CLIR/LC 12 report, Capturing Analog Sound, addresses this directly, suggesting that there
are many areas in which a trained ear and years of experience are by far the most important
tools. In some archives, fragile audio recordings are being handled, played, and transferred for
digital preservation by staff who have limited experience working with audio recordings or little
knowledge about the sonic characteristics and weaknesses of various audio formats.
Recommendations and basic audio engineering principles regarding all signal chain components
and technical spaces used for preservation transfer work
IASA-TC 04 stipulates: The combination of reproduction equipment, signal cables, mixers and
other audio processing equipment should have specifications that equal or exceed that of digital
audio at the specified sampling rate and bit depth. The quality of the replay equipment, audio
path, target format and standards must exceed that of the original carrier.
The CLIR/LC report discusses the need for accurate monitoring systems to evaluate quality as
well as test equipment to evaluate potential problems.
Richard Warren's storage document published in the ARSC Journal 13 recommends a Noise
Criteria level of 20-25 dB for critical listening areas. More generally, he also calls for
consideration of the proper acoustical conditions to prevent the room from distorting the sounds
to be studied.
According to IASA-TC 04, any transfer should attempt to extract the optimal signal from the
original [as] the original carrier may deteriorate, and future replay may not achieve the same
quality, or may in fact become impossible, and secondly, signal extraction is such a time
consuming effort that financial considerations call for optimization at the first attempt.
Given the conclusion that the most direct and clean signal path must be used from
source to destination, it is very important to underline the weakest link in the digital chain: the
point of conversion from analogue to digital. The choices made regarding conversion
technologies, and the selection of digital formats, resolutions, carriers and technology systems
will impose limits on the effectiveness of digital preservation that cannot be reversed, as will the
quality of audio being encoded. Optimal signal extraction from original carriers is the
indispensable starting point of each digitization process 14 .
Then, having faced the need to copy, the selection of the storage format becomes the next major issue.
The sound archiving community is rallying around the European Broadcasting Union's Broadcast
Wave Format (BWF). BWF [EBU Tech 3285] is a format that complies with the specification of
the .wav format but includes a number of metadata tags, in the same manner as TIFF (Tagged
Image File Format) does for images. The International Association of Sound and Audiovisual
Archives (IASA) recommends the use of linear BWF files for archiving because of the simplicity and
ubiquity of linear PCM (interleaved for stereo). The BWF format is widely accepted by the archiving
community.
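As a minimal sketch of how a transfer workflow might verify that a delivered file matches the linear PCM
target discussed in this paper (96 kHz or 192 kHz, 24-bit), the following uses Python's standard wave
module. The function name meets_archive_target and its default parameters are assumptions made for
illustration; the wave module reads only uncompressed PCM and may reject some header variants, and it does
not interpret the BWF metadata chunks.

```python
import wave


def meets_archive_target(path, rates=(96_000, 192_000), sample_bytes=3):
    """Return True if a linear PCM WAV/BWF file uses an accepted sampling
    rate and the expected word length (3 bytes = 24 bits).

    The accepted rates and word length are assumed defaults taken from the
    values recommended in this paper.
    """
    with wave.open(str(path), "rb") as wav:
        return wav.getframerate() in rates and wav.getsampwidth() == sample_bytes
```

A file that fails this check would be flagged for review rather than silently accepted into the archive.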
All responsible archiving groups and associations strongly argue against the use of any format
that uses lossy data compression or perceptual coding in archival recordings, or in recordings
eventually intended for archives. MP3 (MPEG-1/2 Audio Layer III), MiniDisc and any form of streamed
audio are all formats which employ bit rate reduction or data compression, and should not be
used in archival processes, including field recording. It is not possible to uncompress recorded
audio that uses perceptual coding; instead the part of the audio that is discarded remains forever
lost, permanently limiting the quality and use of that audio thereafter.
3 RECOMMENDED METHODS TO IDENTIFY THE BEST POSSIBLE A/D CONVERTERS FOR CRITICAL AUDIO APPLICATIONS
The conversion and storage system consists of three parts, the analogue to digital conversion
hardware, the computer system and the storage system.
We are aware of the difficulties of using transducers in a complete audio signal chain to convert
signals from acoustical to electrical (done by the microphone) and back again from
electrical to acoustical (by the loudspeaker). But to keep audio signals meaningfully audible
for an indefinitely long time, it is necessary to store them in the best conditions offered by
the digital domain.
Hence, the A/D converter becomes the key component in the signal path, as the choice of the A/D
converter irrevocably affects the fidelity of the resulting signal.
To assess the degree of transparency, the converter's electrical measurements and subjective
aural performance, as well as the converter's operating parameters such as sampling frequency
and word length, must be considered. Finally, the signal level at the converter input, the converter
component design, and external conditions such as grounding and shielding can greatly affect the
fidelity of the resulting file.
Choosing an A/D converter must be based on an evaluation of technical measurements and of
subjective listening.
In converting analogue audio to a digital data stream, the analogue to digital converter should not
colour the audio or add any extra noise. It must exhibit audio transparency; that is, it should
neither add to nor subtract from the sound. In practice, the A/D converter incorporated in a
computer's sound card does not, and cannot, meet the specifications required due to low cost
circuitry and the inherent electrical noise in a computer. A discrete (stand alone) A/D converter
that will convert from analogue to digital in accordance with the professional specifications is
always recommended.
The more recent generations of computers have sufficient power to manipulate large audio files.
Once in the digital domain, the integrity of the audio files should be maintained. As noted above,
the critical point in the preservation process is converting the analogue audio to digital, and this
relies on the A/D converter, and entering the data into the system, either through the sound card
or other data port.
[Figure 2 diagram, values reconstructed from the diagram labels]
Peak levels in music performances:
  Classical music: 90-118 dB SPL
  Rock music: 115-129 dB SPL
  Jazz music: 114-127 dB SPL
  Others: 116-127 dB SPL
Just audible noise level (for 20-kHz low-pass-filtered white noise):
  Mean threshold: ~4 dB SPL
  Typical detection levels span: -2 to 9 dB SPL
Wide-band noise levels in listening rooms: 20-35 dBA SPL
Headroom: 6-9 dB
Footroom: 6-9 dB
Requirements (dynamic range) for digital audio processing
Available transducers and equipment, dynamic range:
  Microphones: 110-115 dB
  Microphones with A/D incorporated: 120-125 dB
  Mixing consoles: >288 dB
  Storage: >144 dB
  A/D convertors: 115-130 dB
Available distribution media:
  CD-A: 96 dB
  SACD (with noise shaping): 120 dB
  DVD-A: 144 dB
Techniques for increasing dynamic range
Reproduction system limitations, dynamic range:
  D/A convertors: >110 dB
  Power amplifier: 110-120 dB
  Loudspeakers (1 m peak outputs), consumer: 112-120 dB SPL
  Loudspeakers (1 m peak outputs), professional: 128-131 dB SPL
Figure 2 The dynamic range values in a complete audio signal chain

Unfortunately, many historic recordings were recorded with very limited audio bandwidth and
high noise floor. Even so, any digitization must use the best-possible signal chain to capture and
preserve as much information as possible. This is a more prudent approach because in any
archival-conversion project the cost of digitization equipment is trivial, compared with the cost of
labour. An archival conversion signal chain must provide very high fidelity.
But, only an ideal converter has no sound of its own. Most converters are certainly not
transparent. Only the best converters can approach transparency.
The factors that influence the A/D converter fidelity (sampling frequency, quantization word
length, dither, converter chip architecture, converter component design, input audio
preamplifier and signal levels) have to be evaluated using electrical measurements complemented
by subjective evaluations and listening tests.
3.1 Recommended Values

3.1.1 Sampling frequency: 96 kHz, 192 kHz
IASA-TC 04 12, the CLIR/LC document 13 and Ken Pohlmann's article on converters 15 recommend
sampling rates higher than 44.1 kHz for several reasons:
- Many musical instruments produce information in higher frequency ranges, including inaudible
high-frequency harmonic content that also affects our perception of sound: a cymbal might have a
response of 90 dB SPL (sound pressure level) beyond 60 kHz, and a violin might have content beyond
100 kHz;
- The binaural time response, leading to improved imaging in multichannel recordings (a 15 µs
difference between pulses can be heard, which is shorter than the time between two samples at
48 kHz; the sample interval is 22.7 µs at 44.1 kHz and 5.2 µs at 192 kHz);
- The temporal response, as musical instruments can generate transients with rise times of less
than 10 µs and some reverberation might comprise arrivals spaced regularly at intervals of less
than 2 µs;
- The (anti-aliasing) filter and signal processing performance, as a lower-order slope might be
employed, providing improved time-domain response;
- It is important to accurately capture noise, such as clicks and pops on a disc, and other inaudible,
high-frequency information so that improved signal processing algorithms in the future that are
able to take advantage of higher frequency information will have enough data to work as
effectively as possible. Some of this noise resides in frequency ranges higher than can be
captured at 44.1 kHz.
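The sample intervals quoted in the binaural-time argument above follow directly from the sampling
frequency. The short sketch below (the function name sample_period_us is an illustrative assumption)
computes the interval for the common rates, so that it can be compared with the 15 µs interaural
difference mentioned in the text.

```python
def sample_period_us(fs_hz):
    """Time between two consecutive samples, in microseconds."""
    return 1e6 / fs_hz


for fs in (44_100, 48_000, 96_000, 192_000):
    # Prints the rate in kHz and the corresponding sample interval.
    print(f"{fs / 1000:g} kHz -> {sample_period_us(fs):.1f} us")
```

Only the 96 kHz and 192 kHz rates give a sample interval shorter than 15 µs.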
In accordance with these arguments, Figure 3 presents the step responses and Figure 4 the impulse
responses of a digital Finite Impulse Response (FIR) filter of the kind used in most ADCs, analysed
with the Filter Design and Analysis Tool user interface in the MATLAB workspace.
[Figure 3 plots: step response (amplitude versus time in ms) for Fs = 48000 Hz, Fpass = 20000 Hz, and
for Fs = 96000 Hz, Fpass = 20000 Hz]
Figure 3 Step responses of an equiripple FIR filter for two different sampling frequencies:
48 kHz and 96 kHz
[Figure 4 plots: filter specification sketch (magnitude in dB versus frequency, with Apass, Astop,
Fpass, Fstop and Fs/2); magnitude response (dB versus frequency in kHz) and impulse response
(amplitude versus time in ms) of a lowpass equiripple FIR filter, Fs = 192000 Hz, Fpass = 20000 Hz,
for Fstop = 24000 Hz, 48000 Hz and 96000 Hz]
Figure 4 Impulse responses of an equiripple FIR filter for different attenuation slopes
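The effect of a wider transition band on filter length can be sketched with Kaiser's well-known order
estimate for equiripple lowpass FIR filters. This is not the MATLAB analysis used for the figures; the
ripple values (ripple_pass, ripple_stop) and the function name equiripple_order are assumptions chosen
only for illustration.

```python
import math


def equiripple_order(fs_hz, fpass_hz, fstop_hz, ripple_pass=0.01, ripple_stop=0.001):
    """Kaiser's estimate of the order of a lowpass equiripple FIR filter.

    ripple_pass and ripple_stop are linear ripple amplitudes (assumed values).
    """
    transition = (fstop_hz - fpass_hz) / fs_hz  # normalised transition width
    attenuation = -20 * math.log10(math.sqrt(ripple_pass * ripple_stop))
    return math.ceil((attenuation - 13) / (14.6 * transition))


FS = 192_000      # sampling frequency used in Figure 4
FPASS = 20_000    # passband edge used in Figure 4
orders = {fstop: equiripple_order(FS, FPASS, fstop) for fstop in (24_000, 48_000, 96_000)}
for fstop, order in orders.items():
    print(f"Fstop = {fstop} Hz -> estimated order {order}")
```

Under these assumed ripples, the estimated order falls steeply as the stopband edge moves from 24 kHz to
96 kHz, which is consistent with the shorter impulse responses obtained with gentler attenuation slopes.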
3.1.2 Quantization Word Length: 24 bits
The word length of the converter describes the length of the output digital word and hence the
number of bits used to represent the amplitude of the audio samples.
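An ideal ADC (without any internal noise) has its quantization noise spread over the band from DC to
the folding frequency. The noise in the band from DC to $F_B$ can be determined using the following
equation:
$$\text{Ideal noise (DC to } F_B) = 10\log_{10}\left(\frac{F_B}{\tfrac{1}{2}F_s}\right) + 3.01 - 6.02\,n \ \text{dBFS}$$
[Diagram: analogue input Sa(t), filter H(f), quantized output Sq(n)]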
This equation is based on several assumptions:
- A linear model of quantization (the sampling rate satisfies the sampling theorem, i.e. the
signal is sampled at more than twice the highest frequency in the input signal, $F_s > 2F_B$);
- The noise of analogue-to-digital conversion is mainly due to the error sequence of the
quantization process (with quantization step size $q$ and a resolution of $n$ bits);
- The error sequence is a stationary random process, uncorrelated with itself and with the
input $S_q(n)$;
- The quantization error $e_q$ is uniformly distributed over a quantization step:
$$p(e_q) = \begin{cases} 1/q, & |e_q| \le q/2 \\ 0, & |e_q| > q/2 \end{cases}$$
The quantisation noise power is given by 16 :
$$\sigma_q^2 = E\{(e_q - \bar{e}_q)^2\} = \frac{q^2}{12} = \frac{1}{3 \cdot 2^{2n}}$$
where the last step takes $q = 2/2^n$, i.e. a full-scale range of $\pm 1$.
In the digital domain, signal levels are expressed relative to digital full scale as defined
in AES17 17 : the level of a sine wave whose peak level equals the maximum
positive value:
$$S_a(t) = U_v \cos(2\pi t / T)$$
Setting $U_v = 1$, the average power of the reference signal is
$$\sigma_s^2 = E\{(S_a(t) - \overline{S_a(t)})^2\} = \frac{U_v^2}{2} = \frac{1}{2}$$
Accordingly, with $\sigma_q^2$ denoting the quantisation noise power, the signal-to-noise ratio within
the audio bandwidth of interest is:
$$\text{SNR} = 10\log_{10}\left(\frac{\sigma_s^2}{\sigma_q^2}\cdot\frac{\tfrac{1}{2}F_s}{F_B}\right) = 6.02\,n + 1.76 + 10\log_{10}\left(\frac{\tfrac{1}{2}F_s}{F_B}\right) \ \text{dBFS}$$
This holds when the quantizer, a nonlinear device, behaves in a statistical sense like a
linear device, the quantization noise being modelled as IUDN (Independent Uniformly Distributed
Noise). So, although quantization acts nonlinearly on signals, it acts linearly on their
probability densities.
The quantizer is then a source of additive noise whose statistical properties are known and fixed:
mean = 0, variance = $q^2/12$, uncorrelated with the quantizer input.
If the quantizer input $S_a$ has a PDF (probability density function) that does not satisfy any of the
quantizing theorems, the quantization noise will not have IUDN properties. These properties
can be obtained by adding a suitably designed independent dither signal $d$ to the quantizer
input. This usually means that each dither sample is produced by a pseudorandom number
generator, and a D/A converter converts the number to an analogue level that is added to the
input of the quantizer before quantization.
The total output noise $e_q + d$ (quantization error plus dither) should be independent of the
quantizer input $S_a$ (or $e_q + d$ uncorrelated with $S_a$) in order to satisfy the ideal objective
of a linear quantizer device.
The quantizer is then linearized by the dither, and the IUDN model prevails.
The price paid for this technique is the increased noise power due to the dither signal.
Using, for example, a Gaussian dither whose standard deviation is $q/2$, the total noise power of the
ideal converter becomes
$$\frac{q^2}{12} + \frac{q^2}{4} = \frac{q^2}{3}, \quad \text{an increase of } +6\ \text{dB}.$$
Or, using a triangular dither whose amplitude range is $\pm q$, the total output noise power will be
$$\frac{q^2}{12} + \frac{q^2}{6} = \frac{q^2}{4}, \quad \text{an increase of } +4.77\ \text{dB}.$$
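The triangular-dither result can be checked numerically. The sketch below is a simple simulation under
assumed conditions (a mid-tread rounding quantizer, an arbitrary uniformly distributed test input, a fixed
random seed, non-subtractive dither formed as the sum of two uniform variables); the names quantize and
total_error_power are illustrative.

```python
import math
import random


def quantize(x, q):
    """Mid-tread rounding quantizer with step size q."""
    return q * math.floor(x / q + 0.5)


def total_error_power(num_samples=100_000, q=1.0, seed=0):
    """Mean-square total error (quantization plus dither) with TPDF dither of range +/-q."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(num_samples):
        x = rng.uniform(-100 * q, 100 * q)                         # arbitrary test input
        d = rng.uniform(-q / 2, q / 2) + rng.uniform(-q / 2, q / 2)  # triangular PDF
        err = quantize(x + d, q) - x
        acc += err * err
    return acc / num_samples


measured = total_error_power()
increase_db = 10 * math.log10(measured / (1.0 / 12.0))
print(f"total noise power = {measured:.4f} q^2, increase = {increase_db:.2f} dB")
```

The measured power should lie close to the q²/4 predicted above, that is, roughly 4.8 dB above the
undithered q²/12.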
Since, from the statistical point of view of second-order moments, triangular probability
density function (TPDF) dither ensures the desired behaviour of the IUDN model much better,
with this dither (total noise power $\sigma_{q+d}^2 = q^2/4$) the ideal signal-to-noise ratio of the
converter becomes:
$$\text{SNR} = 10\log_{10}\left(\frac{\sigma_s^2}{\sigma_{q+d}^2}\cdot\frac{\tfrac{1}{2}F_s}{F_B}\right) = 6.02\,n - 3.01 + 10\log_{10}\left(\frac{\tfrac{1}{2}F_s}{F_B}\right) \ \text{dBFS}$$
Number of bits   Fs = 44.1 kHz   Fs = 48 kHz    Fs = 96 kHz    Fs = 192 kHz
16               93.73 dBFS      94.10 dBFS     97.11 dBFS     100.12 dBFS
24               141.89 dBFS     142.26 dBFS    145.27 dBFS    148.28 dBFS
SNR of an ideal ADC (with unshaped TPDF dither of 2 LSBs peak-to-peak amplitude), in unweighted
bandwidth (20 kHz) measurement conditions
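The table values can be recomputed from the dithered SNR expression. The following sketch assumes the
20 kHz measurement bandwidth stated in the caption; the function name ideal_snr_dbfs and its parameters
are illustrative.

```python
import math


def ideal_snr_dbfs(bits, fs_hz, bandwidth_hz=20_000, tpdf_dither=True):
    """Ideal converter SNR in the band DC..bandwidth_hz.

    With TPDF dither the constant term is -3.01 dB; without dither it is +1.76 dB.
    """
    offset = -3.01 if tpdf_dither else 1.76
    return 6.02 * bits + offset + 10 * math.log10((fs_hz / 2) / bandwidth_hz)


for bits in (16, 24):
    row = [f"{ideal_snr_dbfs(bits, fs):.2f}" for fs in (44_100, 48_000, 96_000, 192_000)]
    print(bits, "bits:", ", ".join(row), "dBFS")
```

Each doubling of the sampling frequency adds about 3 dB, because the same total quantization noise is
spread over twice the bandwidth while the measurement band stays at 20 kHz.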
However, this is a theoretical figure. A more effective measure of the converter quality, due to
the converter errors, is ENOB (effective number of bits) where
ENOB = (dynamic range - 1.76) / 6.02
For example, a 24-bit converter with a measured dynamic range of 125 dB provides only 20.5
bits of resolution.
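The ENOB relation is easy to apply to data-sheet figures. A minimal sketch (the function name enob is an
assumption) reproduces the example above:

```python
def enob(dynamic_range_db):
    """Effective number of bits from a measured dynamic range in dB."""
    return (dynamic_range_db - 1.76) / 6.02


print(f"{enob(125):.1f} bits")  # the 125 dB example from the text
```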
However, a well-designed 24-bit converter will provide a noise floor that lies at the limits of
audibility offering the potential for the requested highest fidelity of a complete audio signal
chain.
The debate regarding the converter resolution required for transparency could be made easier
using some statistics about the human hearing sense.
Listeners weigh the determining factors, sound pressure level, frequency contents, and duration,
differently. Loudness, for example, unlike electrical level, is subjective.
Our sense of hearing assesses loudness by how the cilia and corresponding auditory nerve fibres
are excited in the basilar membrane in the inner ear. This excitation is distributed on the
membrane by frequency bands, forming a kind of biological spectrum analyzer. Each frequency
excites a certain zone on the basilar membrane and each excited zone adds up to the total
loudness.
The Fletcher/Munson curves were constructed from subjective responses to sinusoidal tones
presented frontally. The phon values are defined by the level, in dB SPL, of the 1 kHz sinusoidal
tone, which gives each phon curve its name. For example, the 40 phon curve corresponds to 40 dB SPL
at 1 kHz.
Several corrections to the Fletcher/Munson curves were made and included in ISO 226, as a standard for
the hearing threshold of sine waves under free-field conditions, and modified for diffuse-field
conditions by ISO 454.
[Figure 5 plot labels: threshold of pain; approximate range of music; approximate range of speech;
dynamic range of a 16-bit A/D, a 24-bit A/D and a 32-bit DSP; 8 extra bits as guard band for computer
errors]
Figure 5 Equal loudness contour as described by ISO226 versus Dynamic range of high
quality audio A/D converters and DSPs
The analysis 18 of the sound levels of acoustic noise (taking into account the ability of listeners to
detect noise, 3.8 dB SPL being the just-audible level of white noise) and of the sound levels of music
(taking into account the 120-129 dB SPL peak levels of some music performances) gives the
necessary dynamic range: 122-124 dB (Figure 5). Accordingly, if a digital system produces
processing artefacts that are above the noise floor of the input signal, these artefacts will
be audible under certain circumstances.
The archival conversion of signals from old recordings, with low intensity or limited frequency content
(Figures 6 a) and b)), should be followed by digital processing designed to prevent processing
noise from reaching levels at which it may appear above the noise floor of the input and hence
become audible.
Year   Old recording medium                           Dynamic range (dB)   Frequency bandwidth (Hz)
1897   Shellac discs                                  28                   168-2000
1931   Vinyl long play records                        60                   30-10000
1944   Decca FFRR (Full Frequency Range Recordings)   60                   10-15000
Table 1 Dynamic range and frequency bandwidth of gramophone discs
[Figure 6 plots: power spectrum estimates (magnitude in dB versus frequency in kHz, up to 20 kHz;
Hamming, Kaiser and Chebyshev windows) of the specific background noise of two old gramophone discs:
a) Example 1, b) Example 2]
Figure 6 - Specific background noise of old gramophone discs (two examples)
It is important to quantize with a word length that is relatively longer than what may be
immediately required. The larger dynamic range provided by recommended 24-bit word length
supplies greater headroom, which makes level setting less critical.
Also, a well-designed 24-bit converter will offer the potential for the requested highest fidelity of
a complete audio signal chain, providing a noise floor that lies at the limits of audibility (Figure
7).
[Figure 7 plots: power spectrum estimates (magnitude in dB versus frequency in kHz; Hamming, Kaiser
and Chebyshev windows) of a modern musical recording fragment, 24 bit, Fs = 192 kHz]
Figure 7 Fragment of a recent piano recording, made with an extremely low self-noise
microphone and a 24-bit (192 kHz sampling) digital recorder
a) Large-bandwidth power spectrum estimate; the bandwidth is limited at the Nyquist
frequency (half the sampling frequency);
b) Enlarged part of the above power spectrum estimate, including only frequencies up
to 25 kHz
In order for the DSP to maintain the SNR established by the A/D converter, all intermediate DSP
calculations require the use of higher-precision processing. Digital processing, as shown in
Figure 8, effectively decreases the useful word length, because cascaded
mathematical operations, truncation and rounding add error to the least significant bit (LSB).
[Figure 8: block diagram of the signal chain. Blocks along the path: analogue input Sa(t), A/D, FIR filter with coefficients b0, b1, b2, b3, ..., bm and delay elements z-1, Rounding/Truncation, Memory, D/A, analogue output Sa(t). Error labels marked on the diagram: ei, eq + d (signal xn(t)+eq+d after the A/D), ep (arithmetic precision), es (saturation effects), er/t (rounding/truncation, output ~yn(t)), eo.]
Figure 8 Error sources in a digitisation operation including a FIR filter processing
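The accumulation of rounding error inside a FIR filter can be sketched numerically. The example below is a simplified model, not the signal chain of Figure 8: it assumes a 24-bit fractional word, random hypothetical coefficients and input, and compares rounding every product to 24 bits with rounding only once after a high-precision accumulation.

```python
import math
import random

Q = 2.0 ** -23  # one LSB of a 24-bit signed fraction (assumed word length)


def quantize(value: float, step: float = Q) -> float:
    """Round a value to the nearest multiple of the quantization step."""
    return round(value / step) * step


def rms(values) -> float:
    """Root-mean-square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def fir_rounding_errors(num_taps: int = 64, num_samples: int = 2000, seed: int = 1):
    """Return (rms error with per-product rounding, rms error with one final rounding)."""
    rng = random.Random(seed)
    coeffs = [rng.uniform(-1.0, 1.0) / num_taps for _ in range(num_taps)]
    signal = [quantize(rng.uniform(-1.0, 1.0)) for _ in range(num_samples)]
    err_per_product, err_single = [], []
    for n in range(num_taps - 1, num_samples):
        products = [coeffs[k] * signal[n - k] for k in range(num_taps)]
        exact = math.fsum(products)                       # high-precision reference
        per_product = sum(quantize(p) for p in products)  # round after every multiply
        single = quantize(exact)                          # round once at the output
        err_per_product.append(per_product - exact)
        err_single.append(single - exact)
    return rms(err_per_product), rms(err_single)


rms_per_product, rms_single = fir_rounding_errors()
print(f"per-product rounding: {rms_per_product / Q:.2f} LSB rms")
print(f"single final rounding: {rms_single / Q:.2f} LSB rms")
```

With independent rounding errors the per-product variant grows roughly with the square root of the number of taps, which is the reason for keeping extra guard bits in the accumulator.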
4 THE TRADE-OFFS OF HIGH SAMPLING FREQUENCIES IN TYPICAL REAL DIGITAL SYSTEMS
4.1 Preliminary considerations
The non-linear phase distortion caused by the anti-aliasing filter may create harmonic distortion
and audible degradation. Since the analog anti-aliasing filter is the limiting factor in controlling
the bandwidth and phase distortion of the input signal, a high performance anti-aliasing filter is
required to obtain high resolution and minimum distortion.
While a Nyquist-rate A/D converter performs the quantization in a single sampling interval to the
full precision of the converter, an oversampling converter generally uses a sequence of coarsely
quantized data at the input oversampling rate of $2^{m+1} F_B$ ($F_B$ being the signal bandwidth
and m the number of doublings of the Nyquist rate), followed by a digital-domain decimation
process that computes a more precise estimate of the analog input at the lower output sampling
rate, Fs, which is the same as that used by Nyquist samplers. Regardless of the quantization
process, oversampling has immediate benefits for the anti-aliasing filter.
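The benefit for the analogue anti-aliasing filter can be put into numbers. The sketch below reads the oversampling relation as rate = 2^(m+1) x F_B and assumes a hypothetical signal bandwidth F_B of 20 kHz; the function names are illustrative. It reports the lowest input frequency that folds back into the 0..F_B band, and how many octaves the analogue filter has available to reach its attenuation.

```python
import math


def oversampled_rate_hz(bandwidth_hz: float, m: int) -> float:
    """Input sampling rate after m doublings of the Nyquist rate 2 * F_B."""
    return (2 ** (m + 1)) * bandwidth_hz


def first_alias_edge_hz(bandwidth_hz: float, m: int) -> float:
    """Lowest input frequency that aliases into the 0..F_B band."""
    return oversampled_rate_hz(bandwidth_hz, m) - bandwidth_hz


def transition_octaves(bandwidth_hz: float, m: int) -> float:
    """Width, in octaves, between the band edge F_B and the first aliasing frequency."""
    return math.log2(first_alias_edge_hz(bandwidth_hz, m) / bandwidth_hz)


F_B = 20_000  # assumed audio bandwidth in Hz
for m in (0, 1, 2, 6):
    print(f"m={m}: rate {oversampled_rate_hz(F_B, m):.0f} Hz, "
          f"transition {transition_octaves(F_B, m):.2f} octaves")
```

At the Nyquist rate (m = 0) the transition region has zero width, whereas a single doubling already gives the analogue filter more than one and a half octaves.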
The oversampling and special filtering, designed to shape away the noise from passband, are the
key elements of sigma-delta modulation.
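The general formula for the SNR 19 of an ideal Nth-order sigma-delta modulation converter is:
$$\mathrm{SNR} = 10\log_{10}\frac{S_a^2}{\sigma_e^2} + 10\log_{10}\frac{(2N+1)\,2^{(2N+1)m}}{\pi^{2N}} \approx 6.02\,n + 1.76 + 10\log_{10}(2N+1) - 9.94\,N + 3.01\,(2N+1)\,m \ \text{dBFS}$$
where n is the word length of the quantizer in bits and N is the order of the modulator.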
In practice, of course, no actual realization can achieve this theoretical performance (167.15
dBFS, for example, is estimated SNR in case of sigma-delta modulation 5th order, 64 x
oversampled 1-bit A/D converters).
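The quoted figure can be checked directly. The sketch below evaluates the approximation with a minus sign before the 9.94N term (the reading that reproduces the 167.15 dBFS stated above), taking n as the quantizer word length, N as the modulator order and m as the number of doublings, so that 64x oversampling corresponds to m = 6. The function name is illustrative.

```python
import math


def ideal_sdm_snr_db(quantizer_bits: int, order: int, doublings: int) -> float:
    """Ideal SNR (dBFS) of an Nth-order sigma-delta modulator, per the approximation above."""
    return (6.02 * quantizer_bits + 1.76
            + 10.0 * math.log10(2 * order + 1)
            - 9.94 * order
            + 3.01 * (2 * order + 1) * doublings)


# 5th-order modulator, 1-bit quantizer, 64x oversampling (2**6)
print(f"{ideal_sdm_snr_db(1, 5, 6):.2f} dBFS")
```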
Example 1: Specific components
The CS5381 is a complete analog-to-digital integrated circuit converter for digital
audio systems, designed by Cirrus Logic.
The CS5381 uses a 5th-order, multi-bit delta-sigma modulator followed by digital
filtering and decimation, which removes the need for an external anti-alias filter.
Designed for audio systems requiring wide dynamic range, negligible distortion
and low noise, such as A/V receivers, DVD-R, CD-R, digital mixing consoles, and
effects processors, the CS5381 has the following main features 20 :
24-Bit conversion
120 dB dynamic range
-110 dB THD+N
Supports all audio sample rates including 192 kHz
Example 2: Stand alone equipment

The Apogee AD-8000 and the RME ADI-8 DS eight-channel converters, both highly regarded in
this high-end sector, have the following specifications 21 :
Parameter | Apogee AD-8000 (channels 1-8) | RME ADI-8 DS (channels 1-8)
SNR (dB), rms unweighted | 107-109 | 113.5
SNR (dB), rms A-weighted | 108-113 | 117
Frequency response, sampling rates | Fs = 44.1 kHz | Fs1 = 44.1 kHz, Fs2 = 88.2 kHz, Fs3 = 96.0 kHz
Frequency response, limits | 10 Hz (0.1 dB) to 20.81 kHz (0.4 dB); | 10 Hz (0.1 dB) to 20.72 kHz (0.4 dB) or 41.01 kHz (0.4 dB) or 44.67 kHz (0.4 dB); 21.44 kHz (3 dB) or 42.89 kHz (3 dB) or 46.52 kHz (3 dB)
-107
21.44 kHz (3 dB)

THD+N (dB)
-105
Joshua D. Reiss, in his recent, already cited article Understanding sigma-delta modulation: the
solved and unsolved issues, described several limitations of practical sigma-delta
modulation: limit cycles, idle tones, harmonic distortion, dead zones, noise modulation, and
stability.
Definitions included in the cited article, and conclusions about these issues:

Limit cycles
Definition: the occurrence of a repeating sequence in the output bitstream which, in audio applications, may produce audible artefacts.
Conclusion: it may be considered a mostly solved problem.

Idle tones
Definition: a discrete peak in the frequency spectrum of the output of a converter with sigma-delta modulation, superimposed on a background of noise.
Conclusion: there is no theoretical basis for the well-defined and simple relationships that have been observed between the input signal and the frequencies of the tones.

Harmonic distortion
Definition: peaks that are due to unwanted harmonics or aliasing of the input signal, and those that bear no apparent relationship to the input frequency.
Conclusion: it is not a well-understood phenomenon, but it is clearly related to idle tones.

Dead zones
Definition: a range of input for which the sigma-delta modulator may produce the same average output value.
Conclusion: no problems have been reported in high-order or commercial designs.

Noise modulation
Definition: the quantization noise power depends on the signal, and this can be perceived after the quantization of audio signals.
Conclusion: there is no well-established theory, even for low-order sigma-delta modulators.

Stability
Definition: with given initial conditions and constant input, the stable behaviour of higher-order sigma-delta modulator converters is questionable.
Conclusion: a better understanding of the stability problem is necessary if robust, high-performance implementations are to be developed.

Although dithering could be an effective solution for all the above issues, it is not indicated for the stability issues of low-bit quantizers, as it decreases the stable range of a sigma-delta modulator.
Further, the limiting factors in typical real digital systems should be evaluated in
conjunction with the objective of audio preservation, from a present-day practical perspective 22 :
- Reproduction bandwidths (greater than 20 kHz) offered to the consumer as a higher fidelity
specification;
- Recording, transmission/storage resources, amplification and sound radiation aspects;
- Old and new sound carriers (SACD, DVD-A, HD DVD, Blu-ray HD) in relation to the
generally limited bandwidth of available sound reproducers.
Promoted technical specifications, set against the established consensus and perception tests:

Promoted: expanded high frequency limit of the audio chain, up to two and a half octaves above 20 kHz.
Established consensus, perception tests:
- Sound is perceived, via bone conduction, up to 100 kHz as a single noise-like pitch;
- High intensity sound above 20 kHz may be perceived as pain;
- The propagation in air is less directive and increasingly lossy at higher frequency.

Promoted: SACD and DVD-A carriers are capable of 100 kHz replay bandwidth; HD DVD and Blu-ray have higher storage capacity, with potential for multiple wide band audio channels.
Established consensus, perception tests:
- The significant ultrasonic noise which accompanies noise shaping D/A converters requires a low-pass filter restricting the bandwidth (to less than 50 kHz) before the signal reaches the final audio amplifier.

Promoted: higher frequency or extra bandwidth reproducers.
Established consensus, perception tests:
- Many elements of the replay channel (decoders, amplifiers) have low-pass filters at 20-25 kHz, some of them (switching technology power amplifiers) to combat their tendency for electromagnetic radiation;
- Loudspeaker designers have to overcome conflicting requirements at higher frequency: the necessary sensitivity against the reduced diaphragm area imposed by the directivity and continuous response (without high Q resonances) characteristics;
- The room behaviour (more absorption) and the ear sensitivity (more directional) restrict the benefits of a higher frequency range to those sounds with a direct path to the entrance of the ear canal.

Promoted: super-tweeters.
Established consensus, perception tests:
- The extended response performance should be achieved and validated without any inter-modulation in the common audible range;
- The commercially promoted add-on tweeters and matching crossovers seem to have this inconvenient effect, more or less subtly.

The extensive brain scanner investigation 23 with ultrasound stimuli noticed quite complex physiological effects. Further work suggested that the previously reported phenomenon had been related to the body being exposed to the ultrasonic sound field, not just the ears.
But other investigations 24 , firmly separating the audible energy band from the inaudible one by using very steep band filtering in the experiment, reiterated that 20 kHz is entirely sufficient for sound reproduction.
4.2 The benefits of high sampling rate in anti-alias and anti-image filtering design
Anti-alias and anti-image filtering are performed, in almost all audio A/D and D/A converter
subsystems, by a gentle, non-critical, low-order analogue low-pass filter in conjunction with an
oversampled converter and a high-order digital brickwall filter. The digital filter, a finite
impulse response (FIR) design of one or more stages, provides the required sharp cut-off.
The FIR filter can be designed with exact linear phase and the filter structure is always stable in
relation with the quantized filter coefficients.
The minimum length of an FIR low pass filter is related to three parameters: the transition region
width, maximum pass-band error (ripple) and minimum stop-band rejection.
According to the estimates of several authors 25 , the minimum value of the filter order N follows
directly from these digital filter specifications: the normalized passband edge angular
frequency $\omega_p$, the normalized stopband edge angular frequency $\omega_s$, the peak passband
ripple $\delta_p$, and the peak stopband ripple $\delta_s$.
Kaiser, for example, developed a rather simple approximate formula:
$$N \cong \frac{-20\log_{10}\left(\sqrt{\delta_p \delta_s}\right) - 13}{14.6\,(\omega_s - \omega_p)/2\pi}$$
In this context, a sharp cutoff or a narrow transition band will imply a very long length FIR filter,
whereas a wider transition will involve a shorter length FIR filter.
Parks and Burrus 26 proposed the following alternative formula for the very wide band filter case:
$$N \cong \frac{-20\log_{10}(\delta_p) + 5.94}{27\,(\omega_s - \omega_p)/2\pi}$$
In this circumstance the estimated filter order depends mainly on the passband ripple.
The passband response, especially, should approximate the ideal of being flat in a way that
minimises the maximum distortion of the real filter response.
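The two order estimates can be compared for different sampling rates. The sketch below assumes the standard forms of the formulas (negative logarithm terms, and a transition width normalized as (f_stop - f_pass)/Fs, which equals the angular difference divided by 2 pi); the ripple values and band edges are hypothetical, chosen only to illustrate how a wider transition band shortens the filter.

```python
import math


def transition_width(f_pass_hz: float, f_stop_hz: float, fs_hz: float) -> float:
    """Normalized transition width, (omega_s - omega_p) / (2 * pi)."""
    return (f_stop_hz - f_pass_hz) / fs_hz


def kaiser_order(delta_p, delta_s, f_pass_hz, f_stop_hz, fs_hz) -> float:
    """Kaiser's approximate FIR order estimate."""
    numerator = -20.0 * math.log10(math.sqrt(delta_p * delta_s)) - 13.0
    return numerator / (14.6 * transition_width(f_pass_hz, f_stop_hz, fs_hz))


def wideband_order(delta_p, f_pass_hz, f_stop_hz, fs_hz) -> float:
    """Alternative estimate for the very wide band case (passband ripple only)."""
    numerator = -20.0 * math.log10(delta_p) + 5.94
    return numerator / (27.0 * transition_width(f_pass_hz, f_stop_hz, fs_hz))


# Hypothetical specification: 20 kHz passband, ripples of 0.001 (linear scale)
narrow = kaiser_order(1e-3, 1e-3, 20_000, 24_000, 48_000)  # stopband at Fs/2, Fs = 48 kHz
wide = kaiser_order(1e-3, 1e-3, 20_000, 48_000, 96_000)    # stopband at Fs/2, Fs = 96 kHz
print(f"Fs = 48 kHz: N is about {narrow:.1f}")
print(f"Fs = 96 kHz: N is about {wide:.1f}")
```

In units of samples the filter becomes several times shorter at the higher rate, and in units of time shorter still, because each sample lasts half as long.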
In consequence, filter design algorithms rely on iterative optimization techniques in order to
minimize the error between the desired frequency response and that of the DSP generated filter.
Equiripple linear-phase FIR filter design has become a mainstay of FIR filter design after the
classic work by McClellan and Parks.
The basic idea of the Parks-McClellan algorithm 27 is to minimize the peak absolute
value of the weighted error, given by the difference between the frequency response of the digital
transfer function (designed response, $H(e^{j\omega})$) and the desired frequency response (ideal
response, $D(e^{j\omega})$), according to the following equation:
$$\varepsilon(\omega) = W(e^{j\omega})\left[H(e^{j\omega}) - D(e^{j\omega})\right] \quad \text{for } 0 \le \omega \le \pi$$
The linear-phase property ensures that the frequency response of the filter can be written 28 :
$$H(e^{j\omega}) = H_p(\omega)\,\exp\left[j(a\omega + b)\right], \quad a, b \text{ real constant coefficients}, \quad H_p : \mathbb{R} \to \mathbb{R}$$
that is, as a phase factor (linear phase) in cascade with a real frequency response which can be
expressed as a sum of cosines. The sum of cosines can in turn be expanded as a sum of cosine
powers, i.e. a Chebyshev polynomial in $\cos(\omega)$.
With this decomposition, algorithms such as the Remez exchange procedure can be used to
design optimal min-max approximations to a desired response.
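The decomposition into a linear-phase factor and a real sum of cosines can be verified numerically. The sketch below uses a small, hypothetical symmetric 7-tap filter (not one of the filters discussed in this paper) and checks that its frequency response equals a real amplitude multiplied by exp(-j*omega*M), with M the index of the centre tap; in the notation above this corresponds to a = -M and b = 0.

```python
import cmath
import math

# Hypothetical symmetric (linear-phase) FIR coefficients, odd length
TAPS = [c / 16.0 for c in (1, 2, 3, 4, 3, 2, 1)]


def frequency_response(taps, omega: float) -> complex:
    """Direct evaluation of H(e^{j*omega}) = sum of h[n] * exp(-j*omega*n)."""
    return sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(taps))


def amplitude_response(taps, omega: float) -> float:
    """Real amplitude H_p(omega) of a symmetric odd-length FIR, as a sum of cosines."""
    mid = (len(taps) - 1) // 2
    return taps[mid] + 2.0 * sum(taps[mid - k] * math.cos(k * omega)
                                 for k in range(1, mid + 1))


mid = (len(TAPS) - 1) // 2
for omega in (0.0, 0.5, 1.0, 2.0, math.pi):
    direct = frequency_response(TAPS, omega)
    factored = amplitude_response(TAPS, omega) * cmath.exp(-1j * omega * mid)
    print(f"omega={omega:.2f}: |difference| = {abs(direct - factored):.2e}")
```

Because the amplitude is a purely real function, an optimizer such as the Remez exchange only has to approximate a real curve, which is what makes the min-max design tractable.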
In line with the design idea above, the filter passband response (and similarly the stopband
response) can be regarded as the desired flat response plus an additional error response.
The next figures illustrate the possible response of some high-end equipment when equiripple
linear-phase FIR filters are used for anti-alias and anti-image filtering (designed with the
REMEZ algorithm of the Signal Processing Toolbox, in the MATLAB workspace):
[Figure 9, panels a) to f): magnitude response (dB) versus frequency (kHz) of lowpass equiripple FIR filters. Panels a), b), c): 118-tap filter, shown as full frequency response, passband magnified (scale x 10^-3) and stopband magnified (23 to 24 kHz). Panels d), e), f): 147-tap filter, shown in the same three views (stopband magnified from 25 to 45 kHz). Marked point on the response: frequency 21.41309 kHz, magnitude -3.016742 dB.]
Figure 9 - FIR filter specifications for 48 kHz sampling rate (a, b, c), in conjunction with
critical, analogue low-pass filter of high order; the same specifications (d, e, f) for 2x
oversampling equivalent filter, in conjunction with gentle, analogue low-pass filter of lower
order
[Figure 10, panels a) to f): magnitude response (dB) versus frequency (kHz) of lowpass equiripple FIR filters. Panels a), b), c): 22-tap filter, shown as full frequency response, passband magnified (scale x 10^-3) and stopband magnified (44.5 to 48 kHz). Panels d), e), f): 41-tap filter, shown in the same three views (stopband magnified from 50 to 90 kHz).]
Figure 10 Gentle, digital low-pass filters with very small errors in the 20kHz band using
high frequency sampling: 96kHz (a, b, c) or 192kHz (d, e, f)
The passband response of this kind of digital filter is visibly not ideally flat: it carries an
additional error response in the form of a constant-amplitude ripple. This error can be
approximated by a cosinusoidal shape in the frequency domain, which indicates pre- and
post-echoes in the time domain.
The above figures show echo amplitudes below -80 dB and time offsets between 0.1 ms
(approximately, at 192 kHz sampling rate) and 1.2 ms (at 48 kHz sampling rate).
However, these values are far from those found to be quite perceptible by
untrained listeners (-30 dB at +/- 40 ms).
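The link between passband ripple and echoes can be sketched with two small formulas. If the response is modelled as 1 + d*cos(omega*T), the impulse response is the main impulse plus two echoes of relative amplitude d/2 at times -T and +T. The code below assumes this purely cosinusoidal model, a hypothetical ripple of 0.001 dB, and echoes located about half the filter length away from the main impulse; the tap counts are those printed on the panels of Figures 9 and 10.

```python
import math


def echo_level_db(ripple_db: float) -> float:
    """Echo level (dB re main impulse) for a cosinusoidal passband ripple of +/- ripple_db."""
    delta = 10.0 ** (ripple_db / 20.0) - 1.0  # ripple as a linear amplitude deviation
    return 20.0 * math.log10(delta / 2.0)     # each echo carries half of the ripple amplitude


def echo_delay_ms(num_taps: int, fs_hz: float) -> float:
    """Approximate echo offset: half the length of a linear-phase FIR, in milliseconds."""
    return (num_taps - 1) / (2.0 * fs_hz) * 1000.0


print(f"echo level for 0.001 dB ripple: {echo_level_db(0.001):.1f} dB")
print(f"118 taps at 48 kHz:  {echo_delay_ms(118, 48_000):.2f} ms")
print(f"41 taps at 192 kHz: {echo_delay_ms(41, 192_000):.2f} ms")
```

Under these assumptions the results are of the same order as the echo amplitudes and time offsets quoted above.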
Given the growing interest in restoring degraded sources with improved resolution of impulsive
signals and improved perception of musical transients and attacks, it is recommended to repeat
the perception experiments, noting the difference in localisation accuracy between 48 kHz and
96 kHz or 192 kHz with the available real, less-than-ideal filters.
Real anti-alias and anti-image filters should reach their full attenuation within a more or less
wide transition region, in order to avoid alias- or image-specific distortions.
In accordance with this principle, for systems operating at a low sampling frequency and
requiring a narrow transition region (0.45Fs to 0.5Fs), it is very difficult to achieve the
desired performance even with the highest-performance integrated circuits and filter designs.
4.2.1 The effect of aliasing during digitization process
The aliasing caused by the reflection of the audio signal spectrum about the folding
frequency (0.5Fs) during sampling in an analogue-to-digital conversion produces
frequency-shifted signals in the audio band.
Poor rejection of the alias components in the transition region (above 19-20 kHz, for
example) may have little direct effect, being inaudible to most listeners. But any
intermodulation mechanism, likely to occur in the following stages of the processing and
reproduction system, could produce audible distortion at lower frequencies.
The alias signal will consequently intermodulate with the harmonics of the original signal,
generating a-harmonic signals as intermodulation distortion.
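The folding described above can be written as a one-line rule. The sketch below is a generic illustration (the function name is hypothetical): it returns the frequency at which a tone appears after sampling, by reducing it modulo Fs and reflecting it about 0.5Fs.

```python
def alias_frequency_hz(f_hz: float, fs_hz: float) -> float:
    """Frequency at which a tone at f_hz appears after sampling at fs_hz."""
    f = f_hz % fs_hz
    return fs_hz - f if f > fs_hz / 2 else f


FS = 48_000
for tone in (10_000, 26_200, 30_000):
    print(f"{tone} Hz sampled at {FS} Hz appears at {alias_frequency_hz(tone, FS)} Hz")
```

For example, an ultrasonic component at 26.2 kHz that is not removed before sampling at 48 kHz reappears at 21.8 kHz, inside the transition region, where a later non-linearity can mix it with audible tones.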
The solution is to have full attenuation at half the sampling frequency. On the other hand, it is
necessary to have as wide a frequency response as possible for applications at different sampling
rates. For example, most digital filters implemented as anti-alias filters in A/D conversion at
44.1-96 kHz sampling rates start to roll off at 45% and reach full attenuation at 55% of the
sampling frequency (Table 2).
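Parameter | Min | Typ | Max | Unit
Single Speed Mode (2 kHz to 50 kHz sample rates)
Passband (-0.1 dB) | 0 | - | 0.47 | Fs
Passband Ripple | - | - | +/-0.035 | dB
Stopband | 0.58 | - | - | Fs
Stopband Attenuation | -95 | - | - | dB
Total Group Delay (Fs = Output Sample Rate), tgd | - | 12/Fs | - | s
Dual Speed Mode (50 kHz to 100 kHz sample rates)
Passband (-0.1 dB) | 0 | - | 0.45 | Fs
Passband Ripple | - | - | +/-0.035 | dB
Stopband | 0.68 | - | - | Fs
Stopband Attenuation | -92 | - | - | dB
Total Group Delay, tgd | - | 9/Fs | - | s
Quad Speed Mode (100 kHz to 200 kHz sample rates)
Passband (-0.1 dB) | 0 | - | 0.24 | Fs
Passband Ripple | - | - | +/-0.035 | dB
Stopband | 0.78 | - | - | Fs
Stopband Attenuation | -97 | - | - | dB
Total Group Delay, tgd | - | 5/Fs | - | s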
Table 2 - Digital filter characteristics of CS5381 (120 dB, 192 kHz, multi-bit audio A/D
converter), Cirrus Logic -Product information
At a 48 kHz sampling frequency, the filter exemplified above has its passband edge at about
22.6 kHz (0.47Fs) and reaches full stopband attenuation at about 27.8 kHz (0.58Fs).
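The band edges in hertz follow from the normalized values of Table 2. The sketch below is a small helper (names are illustrative) that converts the passband and stopband edges of each speed mode to hertz, and also reports the lowest frequency reached by components between 0.5Fs and the stopband edge when they fold back.

```python
# Normalized (passband edge, stopband edge) per speed mode, as fractions of Fs (Table 2)
CS5381_MODES = {
    "single": (0.47, 0.58),
    "dual": (0.45, 0.68),
    "quad": (0.24, 0.78),
}


def band_edges_hz(fs_hz: float, mode: str):
    """Passband and stopband edges in Hz for a given output sample rate and mode."""
    passband, stopband = CS5381_MODES[mode]
    return passband * fs_hz, stopband * fs_hz


def lowest_folded_hz(fs_hz: float, mode: str) -> float:
    """Lowest frequency reached by not fully attenuated components after folding."""
    _, stopband_hz = band_edges_hz(fs_hz, mode)
    return fs_hz - stopband_hz


for fs, mode in ((48_000, "single"), (96_000, "dual"), (192_000, "quad")):
    p, s = band_edges_hz(fs, mode)
    print(f"{mode} speed, Fs={fs} Hz: passband to {p:.0f} Hz, stopband from {s:.0f} Hz, "
          f"folding reaches down to {lowest_folded_hz(fs, mode):.0f} Hz")
```

At 48 kHz the partially attenuated region folds down to about 20 kHz, whereas at 96 kHz and 192 kHz it stays above 30 kHz and 42 kHz respectively.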
In this case, the a-harmonic mirrored frequencies Fs-f (where f > 0.5Fs), reproduced by a
loudspeaker, could intermodulate with the audible signal and create new audible frequency
components, the so-called Aliasing Intermodulation Distortion.
James Boyk carried out, in 1992-1997, measurements of several instruments, mainly in the Music
Lab at California Institute of Technology, capturing their ultrasonic extension and energy 29 (with
a Hewlett Packard 3567 FFT analyzer and two quarter inch microphones, a Bruel&Kjaer 4135
model and, the other, an Aco/Pacific 7016 model).
Regarding these aspects, he gave interesting information about the highest frequency where the
harmonics are still present (Table 3, for instruments with harmonics) and about the highest
frequency where the sound level is, at least, 10dB above background (Table 4, for instruments
without harmonics)
No. | Instrument with harmonics | SPL (dB) | Harmonics still present | Percentage of power above 20 kHz
1. | Trumpet (Harmon mute) | 96 | >50kHz | 0.5%
2. | Trumpet (Harmon mute) | 76 | >80kHz | 2%
3. | Trumpet (straight mute) | 83 | >85kHz | 0.7%
4. | French horn (bell up) | 113 | >90kHz | 0.03%
5. | French horn (mute) | 99 | >65kHz | 0.05%
6. | French horn | 105 | >55kHz | 0.1%
7. | Violin (double-stop) | 87 | >50kHz | 0.04%
8. | Violin (sul ponticello) | 77 | >35kHz | 0.02%
9. | Oboe | 84 | >40kHz | 0.01%
Table 3 Frequency extension and ultrasonic energy of some instruments with harmonics
No. | Instrument without harmonics | SPL (dB) | Sound level: 10 dB above background | Percentage of power above 20 kHz
1. | Speech Sibilant | 72 | >40kHz | 1.7%
2. | Claves | 104 | >102kHz | 3.8%
3. | Rimshot (jazz music) | 73 | >90kHz | 6%
4. | Crash Cymbal | 108 | >102kHz | 40%
5. | Triangle | 96 | >90kHz | 1%
6. | Keys jangling | 71 | >60kHz | 68%
7. | Piano | 111 | >70kHz | 0.02%
Table 4 - Frequency extension and ultrasonic energy of some instruments without harmonics
This evidence does not confirm any ability to perceive ultrasound, but it does show that
ultrasonic content exists and might interfere, indirectly, with the recording and
reproduction process.
There are areas where the desired quality of the audio restoration process is strongly related to
prior signal enhancement, because of the very poor high frequency response of most early
recordings. In this case, since the high frequency information of the recorded signal is buried
deep in noise, it is important to predict these low-level components using an adequate model
that includes the frequency characteristics of the instruments.
Even if we ignore the frequency extension and ultrasonic energy of instruments, the non-linear
behaviour of the stages following the digital-to-analogue conversion could cause intermodulation
distortion artefacts.
So, the poor rejection of the alias components in the transition region and the non-linearities in
the signal path (the behaviour of loudspeakers being a good example, generating modulation
between frequency components of the signal) increase the uncertainty in evaluating the audio
restoration work.
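How a weak non-linearity turns two tones into a new, lower-frequency component can be shown with a short simulation. The sketch below is a toy model, not a measurement: it assumes a memoryless stage y = x + a*x^2 with a hypothetical coefficient a = 0.01, two unit-amplitude tones at 17.9 kHz and 21.8 kHz, and a simulation rate of 192 kHz. For this model the difference tone has amplitude a*A1*A2.

```python
import cmath
import math

FS = 192_000             # simulation sampling rate in Hz (hypothetical)
N = 1920                 # 10 ms record: every multiple of 100 Hz fits a whole number of cycles
F1, F2 = 17_900, 21_800  # tone frequencies in Hz (hypothetical example)
A2_COEFF = 0.01          # second-order coefficient of the assumed non-linearity


def tone_amplitude(samples, cycles: int) -> float:
    """Amplitude of the component that completes `cycles` periods within the record."""
    total = len(samples)
    acc = sum(s * cmath.exp(-2j * math.pi * cycles * n / total)
              for n, s in enumerate(samples))
    return 2.0 * abs(acc) / total


x = [math.cos(2 * math.pi * F1 * n / FS) + math.cos(2 * math.pi * F2 * n / FS)
     for n in range(N)]
y = [v + A2_COEFF * v * v for v in x]  # weakly non-linear stage

diff_cycles = (F2 - F1) * N // FS      # difference tone at F2 - F1 = 3.9 kHz
imd_amplitude = tone_amplitude(y, diff_cycles)
imd_db = 20.0 * math.log10(imd_amplitude)
print(f"difference tone at {F2 - F1} Hz: {imd_db:.1f} dB relative to each input tone")
```

If one of the two tones is itself an attenuated alias component, the difference product drops by the same number of decibels, but it still falls in the most sensitive part of the audio band.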
4.2.2 The effect of imaging during audio signal reproduction
Even though the audibility and relevance of signals above 20 kHz remain a matter of debate, all
images above the folding frequency (0.5Fs), especially at lower sample rates Fs, could produce
distortion artefacts in the audio band.
Once again, it is necessary to take into consideration the potentially non-linear behaviour of the
electronic and electromechanical stages following the digital-to-analogue conversion.
Accordingly, the effects of high-amplitude, high-frequency input signal components (below
half the sampling frequency, 0.5Fs), whose image components lie above 0.5Fs (more or less
attenuated by the image filter of the D/A converter), should be evaluated in correlation with the
specific non-linearity of amplifiers, loudspeakers or other parts of the system.
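The position of the image components follows from the sampling rate alone. The sketch below lists the images k*Fs - f and k*Fs + f of a baseband tone; the function name and the example tone are illustrative.

```python
def image_frequencies_hz(f_hz: float, fs_hz: float, max_multiple: int = 2):
    """Image frequencies k*Fs - f and k*Fs + f of a baseband tone, for k = 1..max_multiple."""
    images = []
    for k in range(1, max_multiple + 1):
        images.extend([k * fs_hz - f_hz, k * fs_hz + f_hz])
    return images


tone = 21_000
for fs in (48_000, 96_000, 192_000):
    first_image = image_frequencies_hz(tone, fs, 1)[0]
    print(f"Fs={fs} Hz: first image of {tone} Hz at {first_image} Hz, "
          f"difference product at {first_image - tone} Hz")
```

At 48 kHz the first image of a 21 kHz tone lies at 27 kHz, only 6 kHz away, so a second-order product falls well inside the audio band; at 96 kHz or 192 kHz the first image and the difference product move far above 20 kHz.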
To maximise archiving quality, together with the necessary conditions for further restoration and
post-production activities, several investigations (objective analysis and subjective listening
tests) have to be carried out:
- Of the response of various tweeters, in order to evaluate the significant amounts of
intermodulation products they generate below 20 kHz when driven by ultrasonic signals;
- Of amplifiers that can produce distortion products below 20 kHz, audible (even with
difficulty) in the absence of other signals below 20 kHz.
The quality of sound systems should be judged using harmonic and intermodulation distortion
measurements in the context of how their effects are perceived. Otherwise these numbers remain
purely mathematical relationships, without any consideration of the characteristics of the
receiver, the human ear.
Real systems can have frequency dependent nonlinearities, most notably loudspeakers, limiting
their performance at high amplitudes. Besides, the recent application of psychoacoustics to audio
data compression problems demonstrates the dominant role of masking in hearing acuity.
[Figure 11, panels a) to c). Panel a): example of an instrument with ultrasound energy. Panel b): magnitude response (dB) versus frequency (kHz), 2Fs = 96 kHz, half-band anti-alias filter, with aliasing distortion and intermodulation distortion marked; tone labels as printed: f2=26.2kHz, f2=21.8kHz, f1=17.9kHz; IMD (second order intermodulation product) = f2 - f1 (~ -60 dB). Panel c): magnitude response (dB) versus frequency (kHz), 2Fs = 96 kHz, half-band anti-image filter, with imaging distortion and intermodulation distortion marked; tone labels as printed: f2=27kHz (0dB), f2=27kHz (-32dB), f1=21kHz; IMD (second order intermodulation product) = f2 - f1 (~ -70 dB). Annotation in panels b) and c): a-harmonic signal as 1% intermodulation product (second order products), due to non-linearities in the signal path, when aliasing or imaging distortion is present.]
Figure 11 Simulated effects of inadequate low-pass filtering in A/D and D/A subsystems at 48 kHz
sampling frequency, producing frequency-shifted signals in the audio band:
a) Crash cymbals recording as an example of an instrument with ultrasound energy
(copyright, James Boyk);
b) Second order IMD as an effect of poor alias rejection close to the 0.5Fs frequency, using a
half-band anti-alias filter (as decimation filter before the sampling rate reduction stage
in an oversampling A/D converter);
c) Second order IMD as an effect of poor image rejection close to the 0.5Fs frequency,
using a half-band anti-image filter (as interpolation filter after the sampling rate
increase stage in an oversampling D/A converter).
5 CONCLUSIONS
The requirements for higher resolution in the acquisition of impulsive signals, and for better
perception of musical transients and attacks when restoring degraded sources, should be analyzed
under modern conditions: extended bandwidth, gentle filtering, and improved phase and impulse
characteristics.
The effort to increase bandwidth should be correlated with new design results: improved
off-axis loudspeaker response and better sound quality at higher frequencies, with
diaphragm resonances located well outside the audible range.
Under these conditions, the transfer work for digital preservation, understood as the creation of
a surrogate (an accurate, authentic, and very high quality representation of the original), could
start by identifying all the necessary and adequate equipment and operating personnel to be
involved in the preservation system.
The evaluation of the factors that influence A/D converter fidelity described here indicates
that, by reducing distortion mechanisms through filters designed for a higher sampling frequency
with a relaxed transition region, the localisation of sound sources could be improved and the
audibility of the echo reduced 30 .
Analogue recordings, with their different audio fidelity peculiarities, should be digitized using a
high-quality A/D converter, so as to minimize the risk of losing information from the original source.
For a good transcription, the merits of the audio conversion equipment, with optimum coverage
of the limits of human hearing, should be considered before any evaluation of the time and effort
needed to achieve the result:
- 96 or 192 kHz sampling frequency for a wide audio bandwidth, good temporal response,
and improved low-pass filter characteristics;
- 24-bit word length for a large dynamic range, with more headroom in level setting and
a good margin for the effects of rounding in subsequent digital signal processing;
- More than one conversion of the same analogue source, using different converters,
critically monitoring input and output levels, using high-quality D/A converters,
high-quality loudspeakers, and suitable ambient room (acoustic) conditions.
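The storage cost of these recommended values is simple arithmetic. The sketch below computes the size of one hour of uncompressed PCM audio, assuming two channels and ignoring any file container overhead; the function name is illustrative.

```python
def pcm_bytes_per_hour(sample_rate_hz: int, word_length_bits: int, channels: int = 2) -> int:
    """Uncompressed PCM payload for one hour of audio, in bytes (no container overhead)."""
    return sample_rate_hz * word_length_bits * channels * 3600 // 8


for rate, bits in ((44_100, 16), (96_000, 24), (192_000, 24)):
    gigabytes = pcm_bytes_per_hour(rate, bits) / 1e9
    print(f"{rate} Hz / {bits} bit stereo: {gigabytes:.2f} GB per hour")
```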
6 BIBLIOGRAPHY AND REFERENCES

1. Watanabe K., FPC Inc., A Kodak Company, Evolution Availability Longevity, Joint Technical Symposium, 2004
2. Watkinson J., Is digital storage more reliable than analogue?, Resolution, November/December 2002
3. Bradley K., Critical Choices, Critical Decisions: Sound Archiving and Changing Technology, 2004
4. Schuller D., Preserving Audio and Video Recordings in the Long-term, International Preservation News, 14, 1997. (On-line): http://www.ifla.org/VI/4/news/14-97.htm
5. Schuller D., Preserving the Facts for the Future: Principles and Practices for the Transfer of Analog Audio Documents into the Digital Domain, Journal of the Audio Engineering Society, 49 (2001), 7/8, 618-621
6. Hafner A., The Suedwestrundfunk (SWR) and the Mass Storage Systems in Its Radio Sound Archives: Concepts and some Performance/Cost Aspects, 106th Audio Engineering Society Convention, Munich, Germany, May 08-11, 1999
7. Herla S., Houpert J. and Lott F., From Single-Carrier Sound Archive to BWF Online Archive: A New Optimized Workstation Concept, Journal of the Audio Engineering Society, 49, 7/8, 2001, p. 606-617
8. Presto Space, Preservation Status, Annual Report on Preservation Issues for European Audiovisual Collections, Deliverable D22.4 DIS4, 31/01/2005
9. Best Practices For Audio Preservation, by Mike Casey, Indiana University and Bruce Gordon, Harvard University, http://www.dlib.indiana.edu/projects/sounddirections/bestpractices2007/
10. IASA-TC 03: The Safeguarding of the Audio Heritage: Ethics, Principles and Preservation Strategy, Version 3, December 2005, http://www.iasa-web.org/IASA_TC03/IASA_TC03.pdf
11. IASA-TC 04: Guidelines on the Production and Preservation of Digital Audio Objects
12. Capturing Analog Sound for Digital Preservation: Report of a Roundtable Discussion of Best Practices for Transferring Analog Discs and Tapes, CLIR/LC, NRPB (Council on Library and Information Resources and the Library of Congress under the auspices of the National Recording Preservation Board)
13. Richard Warren, Jr., Storage of Sound Recordings, ARSC Journal 24, no. 2 (1993)
14. Bradley K., Critical Choices, Critical Decisions: Sound Archiving and Changing Technology, 2004
15. Ken C. Pohlmann, Measurement and Evaluation of Analog-to-Digital Converters Used in the Long Term Preservation of Audio Recordings (roundtable discussion, Issues in Digital Audio Preservation Planning and Management, Washington, DC, March 10-11, 2006). Also available online: http://www.clir.org/activities/details/AD-Converters-Pohlmann.pdf
16. Joshua D. Reiss, Understanding sigma-delta modulation: the solved and unsolved issues, J. Audio Eng. Soc., Vol. 56, No. 1/2, 2008 January/February
17. AES17, AES standard method for digital audio engineering - Measurement of digital audio equipment, J. Audio Eng. Soc., vol. 46, No. 5, pp. 428-447, 1998 May
18. Fielder L., Dynamic Range Issues in the Modern Digital Audio Environment, Proceedings AES UK Conference Managing the Bit Budget, 3-19 (May 1994)
19. Joshua D. Reiss, Understanding sigma-delta modulation: the solved and unsolved issues, J. Audio Eng. Soc., Vol. 56, No. 1/2, 2008 January/February
20. Cirrus Logic - CS5381, 120 dB, 192 kHz, multi-bit audio A/D converter, Advance product information
21. Thomas Sandmann, Comparative test 24-bit-converters Apogee AD-8000 and RME ADI-8 DS, PMA Production Management
22. Martin Colloms, Do we need an ultrasonic bandwidth for higher fidelity sound reproduction?, Proceedings of the Institute of Acoustics, Vol. 28, Pt. 8, 2006
23. Tsutomu Oohashi, et al, Inaudible high-frequency sounds affect brain activity: hypersonic effect, Journal of Neurophysiology, 83:3548-3558, 2000, http://jn.physiology.org/cgi/content/full/83/6/3548
24. Nishiguchi et al, Perceptual discrimination between musical sounds with and without very high frequency components, NHK Laboratories Note no 486, AES 115th Convention, 2003
25. Sanjit Mitra, Digital signal processing, a computer-based approach, McGraw Hill, Second edition, 2001
26. Parks T.W. and Burrus C.S., Digital Filter Design, Wiley, 1987
27. Parks T.W. and McClellan J.H., Chebyshev approximation for nonrecursive digital filters with linear phase, IEEE Trans. on Circuit Theory, CT-19: 189-194, 1972
28. Stanomir D., Discrete signals and systems, Bucharest, Athena, 1997
29. Boyk J., There's life above 20 kilohertz! A survey of musical instrument spectra to 102 kHz, California Institute of Technology, Music Lab, 1997, http://www.cco.caltech.edu/~musiclab
30. Dunn J., Anti-alias and anti-image filtering: The benefits of 96kHz sampling rate formats for those who cannot hear above 20kHz, 104th AES Convention, Amsterdam, May 1998
