
Audio Compression

by: Philipp Herget

Sufficiency Course Sequence:

Course Number   Course Title                                  Term
HI1341          Introduction to Global History                A92
HI2328          History of Revolution in the 20th Century     B92
MU1611          Fundamentals of Music I                       A93
MU2611          Fundamentals of Music II                      B93
MU3611          Computer Techniques in Music                  C94

Presented to: Professor Bianchi
Department of Humanities & Arts
Term B, 1996
FWB5102

Submitted in Partial Fulfillment of the Requirements of the Humanities & Arts Sufficiency Program
Worcester Polytechnic Institute
Worcester, Massachusetts

Abstract
This report examines the area of audio compression and its rapidly expanding use in the world today. Covered topics include a primer on digital audio, a discussion of different compression techniques, a description of a variety of compressed formats, and compression in computers and Hi-Fi stereo equipment. Information was gathered on a multitude of different compression uses.

Contents

1 Introduction
2 Digital Audio Basics
3 Compression Basics
  3.1 Lossless vs. Lossy Compression
  3.2 Audio Compression Techniques
  3.3 Common Audio Compression Techniques
4 Uses of Compression
  4.1 Compression in File Formats
  4.2 Compression in Recording Devices
5 Conclusion
Bibliography

1 Introduction
The first form of audio compression came out in 1939, when Dudley first introduced the VOCODER (VOice CODER) to reduce the amount of bandwidth needed to transmit speech over a telephone line (Lynch, 222). The VOCODER broke speech down into certain frequency bands, transmitted information about the amount of energy in each band, and then synthesized speech using the transmitted information on the receiving end of the device. Since then, a great deal of research has been conducted in the area of audio compression. In the 1960's, compression was used in telephony, and extensive research was done to minimize the bandwidth needed to transmit audio data (Nelson, 313). Today, audio compression is a large subarea of Audio Engineering.

The need for audio compression is brought about by the tremendous amount of space required to store high quality digital audio data. One minute of CD quality audio data takes up 4 Mbytes of storage space (Ratcliff, 32). The use of compression allows a significant reduction in the amount of data needed to create audio sounds, with usually only a minimal loss in the quality of the audio signal. Compression comes at the expense of the extra hardware or software needed to compress the signal. However, in today's technologically advanced times, this cost is usually small compared to the cost of the space that is saved.

Compression is used in almost all new digital audio devices on the market, and in many of the older ones. Some examples are the telephone system, digital message recorders like those in answering machines, and Sony's new MiniDisc player. With the use of compression, these devices are able to store more information in less space. Compression is accompanied by a loss in quality, but usually one so minimal that it cannot be heard by most people.

A good example of this is the anti-shock mechanism found in the newer CD players. This mechanism uses a small portion of digital memory to buffer digital data from the CD. When a physical shock disrupts the player and it can no longer read data from the CD, the data from the memory buffer is used to generate the audio signal until the player re-tracks on the CD. To store a maximum amount of data, the player uses compression to store the data in the memory. The Panasonic SL-S600C has such an anti-shock mechanism with 10 seconds of storage buffer. The Panasonic SL-S600C Operating Instructions state:

The extra anti-shock function incorporates digital signal compression technology. When listening to sound with the unit connected to a system at home, it is recommended that the extra anti-shock switch be set to the OFF position.

The recommendation is given because the compression algorithm used in the storage has a slightly detrimental impact on the sound quality.

The use of audio compression is a tradeoff among different factors. Knowledge of audio compression is useful not only to the designer, but also to the consumer. The key questions that arise in the evaluation of an audio compression system are how much the data is compressed, what losses are associated with the compression, and what the cost of the compression is. This paper will answer some of these questions by providing a basic awareness of compression, giving background on compression, explaining various popular compression techniques, and discussing the compression formats used in various audio devices and audio computer files.

2 Digital Audio Basics


Compression can be accomplished using two different methods. The first method is to take the data from a standard digital audio system and compress it using software. The second is to encode the signal in a different yet similar manner to that done in a normal digital audio system. Both of these methods are based on digital audio theory; therefore, understanding their functionality and performance requires an understanding of digital audio basics.

The sounds we hear are caused by variations in air pressure which are picked up by our ears. In an analog electronic audio system, these pressure signals are converted to an electric voltage by a microphone. The changing voltage, which represents the sound pressure, is stored on a medium (like tape), and later used to control a speaker to reproduce the original sound. The largest source of error in such an audio system occurs in the storage and retrieval process, where noise is added to the sound.

Figure 1: An Example of an Analog Waveform (voltage, representing air pressure, plotted against time)

The idea behind a digital system is to represent an analog (continuous) waveform as a finite number of discrete values. These values can be stored in any digital medium, such as a computer. Later, the values can be converted back to an analog audio signal. This method is advantageous over the older analog techniques because no information (quality) is lost in the storage and retrieval process. Also, unlike analog, when a copy of a digital recording is made, the values can be exactly duplicated, creating an exact replica of the original digital work. However, the process does suffer other losses. These losses occur in the conversion process from the analog to the digital format.

To explain the analog to digital conversion process, we will look at an analog audio waveform and show each of the steps taken in digitizing it. The waveform in Figure 1 represents a brief moment of an audible sound. The amplitude of the waveform represents the relative air pressure due to the sound. In a digital system, the waveform is represented by a series of discrete values. To get these values, two steps must be taken. First, the signal is sampled, meaning that discrete values of the signal are selected in time. The second step is to quantize each of the values attained in the sampling step. Quantization reduces the amount of storage space required for each value in a digital system.

In the first step, the samples are taken at constant intervals.

Figure 2: An Example of a Sampled Analog Waveform (sample points marked with X's at intervals of T seconds)

The number of samples taken every second is called the sampling rate. Figure 2 shows the result of sampling the signal. The X's on the waveform represent the samples which were taken. Since the samples were taken every T seconds, there are 1/T samples per second. The sampling rate shown in Figure 2 is therefore 1/T samples/s. Typical sampling rates range from 8000 samples/s up to 44100 samples/s, the rate used for a CD. The term samples/s is often replaced by the term Hz, kHz, or MHz to represent units of samples/s, kilosamples/s, or megasamples/s respectively (Audio FAQ).

The sample values, the values marked with X's, now represent the original waveform. These values could now be stored, and used at a later time to recreate the original signal. How well the original signal can be recreated is related to the number of samples taken in a given time period. Therefore, the sampling rate is a critical factor in the quality of the digitized signal. If too few samples are taken, then the original signal cannot be regenerated correctly. In 1933, a publication by Harry Nyquist proved that if the sampling rate is greater than twice the highest frequency of the original signal, the original signal can be exactly reconstructed (Nelson, 321). This means that if we sample our original signal at a rate that is twice as high as the highest frequency contained in the signal, there will be no theoretical loss of quality. This sampling rate, necessary for perfect reconstruction, is commonly referred to as the Nyquist rate.
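As a concrete illustration, the following minimal C sketch (not from any of the cited sources; the rate and tone are arbitrary choices) samples a 1 kHz sine wave at 8000 samples/s and prints the Nyquist rate for that tone:

#include <stdio.h>
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define SAMPLE_RATE 8000.0   /* samples per second (1/T) */
#define TONE_HZ     1000.0   /* highest frequency in the signal */

int main(void)
{
    double T = 1.0 / SAMPLE_RATE;            /* seconds between samples */
    printf("Nyquist rate: %.0f samples/s\n", 2.0 * TONE_HZ);
    for (int n = 0; n < 8; n++) {            /* first 8 sample instants */
        double t = n * T;
        double x = sin(2.0 * M_PI * TONE_HZ * t);
        printf("x[%d] = x(%.6f s) = %+.4f\n", n, t, x);
    }
    return 0;
}

Since 8000 samples/s is well above the 2000 samples/s Nyquist rate for a 1 kHz tone, the sampled values are, in theory, enough to reconstruct the tone exactly.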

Figure 3: An Example of the Quantization of a Sampled Analog Waveform

Now that we have a set of consecutive samples of the original signal, the samples need to be quantized in order to reduce the storage space required by each sample. The process involves converting the sampled values into a certain number of discrete levels, which are stored as binary numbers. A sample value is typically converted to one of 2^n levels, where n is the number of bits used to represent each sample digitally. This process is carried out in hardware by a device called an analog to digital converter (ADC). The result of quantizing the values from Figure 2 is shown in Figure 3. The samples still have approximately the same value as before, but have been "rounded off" to the nearest of 16 different levels.

In a digital system, the amount of storage space required by a number is governed by the number of possible values that number could have. By quantizing the sample, the number of possible values is limited, significantly reducing the required storage space. After quantizing the value of each sample in the figure to one of 2^4 levels, only 4 bits of storage are needed for each sample. In most digital audio systems, either 8 or 16 bits are used for storage, yielding 2^8 = 256 or 2^16 = 65536 different levels in the quantization process.

The quantization process is the most significant source of error in a digital audio signal. Each time a value is quantized, the original value is lost and replaced by an approximation of the original. The peak value of the error is 1/2 the value of the quantization step; thus, the smaller the quantization steps, the smaller the error. This means that the more bits used to quantize the signal, the better the quality of the reconstructed sound signal, and the more space is required to store the signal values.
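The quantization step can be sketched in C as follows; this is a minimal illustration assuming samples normalized to the range -1.0 to +1.0, with the 4-bit case chosen to match the 16-level example above:

#include <stdio.h>
#include <math.h>

/* Round a sample to the nearest of 2^n levels; return the level index. */
long quantize(double sample, int nbits)
{
    long levels = 1L << nbits;            /* 2^n quantization levels  */
    double step = 2.0 / (levels - 1);     /* spacing across [-1, +1]  */
    long code = lround((sample + 1.0) / step);
    if (code < 0) code = 0;               /* clamp out-of-range input */
    if (code >= levels) code = levels - 1;
    return code;
}

/* Map a level index back to an approximate sample value. */
double dequantize(long code, int nbits)
{
    long levels = 1L << nbits;
    double step = 2.0 / (levels - 1);
    return code * step - 1.0;
}

int main(void)
{
    double x = 0.3721;
    long c = quantize(x, 4);              /* 2^4 = 16 levels, 4 bits  */
    printf("code %ld -> %.4f (error %.4f)\n",
           c, dequantize(c, 4), x - dequantize(c, 4));
    return 0;
}

Note that the reported error is at most half of one quantization step, exactly as described above; using more bits shrinks the step and therefore the error.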

Figure 4: An Example of a Signal Reconstructed from the Digital Data

To regain the original signal, each of the values stored as the digital audio signal is converted back to an analog audio signal using a Digital to Analog Converter (DAC). An example of the output of the DAC is shown in Figure 4. The DAC takes the sample points and makes an analog waveform out of them. Due to the process used to convert the waveform, the resulting signal is composed of a series of steps. To remedy this, the signal is then put through a low-pass filter which smooths out the waveform, removing all of the sharp edges caused by the DAC. The resulting signal is very close to the original.

All the losses in the digital system occur in the conversion process to and from a digital signal. Once the signal is digital, it can be duplicated or replayed any number of times and never lose any quality. This is the advantage of a digital system. The losses generated by the conversion process can be measured as a Signal to Noise Ratio (SNR), the same measure used for analog signals. The noise in the signal is considered to be the signal that would have to be subtracted from the reconstructed signal to obtain the original. SNR is used to compare the quality of different types of quantization, and is also used in the quality measurement of compression techniques.
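A minimal C sketch of this measurement follows, with invented sample values purely for illustration; the noise is taken to be the sample-by-sample difference between the original and reconstructed signals, as just described:

#include <stdio.h>
#include <math.h>

double snr_db(const double *orig, const double *recon, int n)
{
    double sig = 0.0, noise = 0.0;
    for (int i = 0; i < n; i++) {
        double e = orig[i] - recon[i];   /* what must be subtracted  */
        sig   += orig[i] * orig[i];      /* signal power             */
        noise += e * e;                  /* noise power              */
    }
    return 10.0 * log10(sig / noise);    /* ratio expressed in dB    */
}

int main(void)
{
    double orig[4]  = { 0.40, -0.80, 0.60, -0.20 };
    double recon[4] = { 0.38, -0.79, 0.62, -0.21 };
    printf("SNR = %.1f dB\n", snr_db(orig, recon, 4));
    return 0;
}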

3 Compression Basics
The underlying idea behind data compression is that a data file can be re-written in a different format that takes up less space. A data format is called compressed when it saves either more information in the same space, or the same information in less space, than a standard uncompressed format. A compression algorithm for an audio signal will analyze the signal and store it in a different way, hopefully saving space. An analogy can be made between compression and shorthand. In shorthand, words are represented by symbols, effectively shortening the amount of space occupied. Data compression uses the same concept.

3.1 Lossless vs. Lossy Compression


The field of compression is divided into two categories, lossless and lossy compression. In lossless compression, no data is lost in the compression process. An example of a lossless compression program is pkzip for the IBM PC, a widely available shareware utility. It can be used to compress and uncompress any type of computer file. When a file is uncompressed, the exact original is retrieved. The amount of compression that is achieved is highly dependent on the type of file, and varies greatly from file to file.

In lossy compression schemes, the goal is to encode an approximation of the original. By using a close approximation of the signal, the coding can usually be accomplished using much less space. Since an approximation is saved instead of the original, lossy compression schemes can only be used to compress information when the exact original is not needed. This is the case for audio and video data; with these types of data, any digital format used is already an approximation of the original signal. Computer data and program files, on the other hand, must be compressed using lossless compression because all of the data is usually critical.

In general, lossy compression schemes yield much higher compression ratios than lossless compression schemes. In many cases, the difference in quality between the compressed version and the original is so minimal that it is not noticeable.

Yet in other compression schemes there is a significant difference in quality. Deciding how much information is to be lost is up to the discretion of the designer of the algorithm or technique; it is a tradeoff between size and quality. If the shorthand writer from the previous analogy were to write down only the main ideas of the text, it would be analogous to lossy compression. Using only the main ideas would be an extreme form of compression. If he or she were to leave out some adjectives and adverbs, it would again be a form of lossy compression, this one being less lossy than the first. From the analogy, it can be seen how the writer (programmer) can decide how important the details are and how many details to include.

Almost all compression techniques used in digital systems are lossy. This is because lossless compression algorithms are generally very unpredictable in the amount of compression they can achieve. In a typical application, there is a limited amount of "space" for the digital audio data that is generated. If the audio data cannot be compressed to a guaranteed size, it simply will not fit in the required space, which is unacceptable. The reason for the unpredictability of a lossless technique lies in the technique itself. Data which happens to be in a format which does not lend itself to the way the lossless technique "re-writes" the data will not be compressed.

In The Data Compression Book, Mark Nelson compares raw speech files which were compressed with a shareware lossless data compression program, ARJ, to demonstrate how well a typical lossless compression scheme will compress an audio signal. He states that "ARJ results showed that voice files did in fact compress relatively well." The six sample raw sound files gave the following results:

Filename        Original  Compressed  Ratio
SAMPLE-1.RAW       50777       33036    35%
SAMPLE-2.RAW       12033        8796    27%
SAMPLE-3.RAW       73019       59527    19%
SAMPLE-4.RAW       23702        9418    60%
SAMPLE-5.RAW       27411       19037    30%
SAMPLE-6.RAW       15913       12771    20%

His data shows that the compression ratios fluctuate greatly depending on the particular sample of speech that is used.

3.2 Audio Compression Techniques


For any type of compression, the compression ratio and the algorithm used are highly dependent on the type of data that is being compressed. The data source used in this paper is audio data, and we have already determined that lossy compression will be used in most cases. Now we can further subdivide the source into music and voice data. The more information that is known about the source, the better the compression technique can be tailored toward that type of data.

The differences between music and speech allow audio compression techniques to be subdivided into two categories: waveform coding and voice coding. Waveform coding can be used on all types of audio data, including voice. The goal of waveform coding is to recreate the original waveform after decompression; the closer the decompressed waveform is to the original, the better the quality of the coding algorithm. The second technique, voice coding, yields a much higher compression ratio, but can only be used if the audio source is a voice. In voice coding, the goal is to recreate the words that were spoken and not the actual voice. The algorithms "utilize a priori information about the human voice, in particular the mechanism that produces it" (Lynch, 255).

Since the two techniques are fundamentally different, the performance of each technique is measured differently. The performance of a waveform coding technique is measured by determining how well the uncompressed signal matches the original speech waveform, usually by measuring the SNR. With the voice coding technique this is not possible, since the technique doesn't try to mimic the waveform. Therefore, in voice coding algorithms, the quality of the algorithm is measured by listener preference.

These coding techniques can be further subdivided into two categories, time domain coding and frequency domain coding. In a time domain coding technique, information on each of the samples of the original signal is encoded.

In a frequency domain coding technique, the signal is transformed into its frequency representation, which is then encoded into a compressed format. Later, the information is decoded and transformed back into the time representation of the signal to recover the original samples. Most simple compression algorithms use a time domain coding technique.

The more recent waveform coding techniques provide a much higher compression ratio by using psychoacoustics to aid in the compression. Psychoacoustics is "the study of how sounds are heard subjectively and of the individual's response to sound stimuli" (Webster's New World Dictionary, 1147). By basing the compression scheme on psychoacoustic phenomena, data that can't be heard by humans can be discarded. For example, in psychoacoustics it has been determined that certain levels of sounds cannot be heard while other, louder sounds are present (Beerends, 965). This effect is called masking. By eliminating the unheard sounds from the audio signal, the signal is simplified, and can be more easily compressed. Techniques like these are used in modern systems where high compression ratios are necessary, like Sony's new MiniDisc player.
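The following toy C sketch illustrates only the masking idea; it is not any actual psychoacoustic codec, and the band layout, energies, and thresholds are invented for illustration. A band is discarded when a loud neighboring band raises the local hearing threshold above that band's own energy:

#include <stdio.h>

#define NBANDS 8

int main(void)
{
    /* invented per-band energies for one instant of a signal */
    double energy[NBANDS] = { 0.02, 0.9, 0.05, 0.01, 0.3, 0.02, 0.0, 0.4 };
    double base_threshold = 0.03;        /* threshold of hearing in quiet */

    for (int b = 0; b < NBANDS; b++) {
        double thresh = base_threshold;
        /* A loud neighbor masks this band: raise the local threshold
         * to a fraction of the adjacent band's energy. */
        if (b > 0 && 0.1 * energy[b - 1] > thresh)
            thresh = 0.1 * energy[b - 1];
        if (b < NBANDS - 1 && 0.1 * energy[b + 1] > thresh)
            thresh = 0.1 * energy[b + 1];
        printf("band %d: %s\n", b,
               energy[b] < thresh ? "discard (masked)" : "encode");
    }
    return 0;
}

In this run, band 2 carries audible energy on its own, yet it is discarded because its loud neighbor (band 1) masks it; only the bands that survive need to be encoded.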

3.3 Common Audio Compression Techniques


The techniques that have been discussed thus far are general subcategories of the approaches that can be taken when designing an audio compression algorithm. In this section, the details of some popular compression techniques will be discussed. Since compression is such a large area, a comprehensive guide to all the different compression methods is far beyond the scope of this paper. However, this section covers some fundamental and some advanced techniques to provide a general idea of how different compression techniques are implemented. To give a general background, both waveform and voice coding techniques are discussed. Since the waveform coding techniques are simpler, they will be discussed first. In these techniques, the compressed digital data is often obtained from the original signal itself, rather than by creating standard digital audio data and compressing it with software.

3.3.1 Waveform Coding Techniques

PCM


Pulse Code Modulation (PCM) refers to the technique used to code raw digital audio data as described in Section 2. It is the fundamental digital audio technique, and the one used most frequently in digital audio systems. Although PCM by itself is not a compression technique, when it is used along with non-uniform quantization such as µ-Law or A-Law, it can be considered compression. PCM combined with non-uniform quantization is used as a reference for comparing the performance of other compression schemes (Lynch, 225).

µ-Law and A-Law Companding


Since the dynamic range of an audio signal is very wide, an audio waveform having a maximum possible amplitude of 1 volt may never reach over 0.1 volts if the audio signal is not very loud. If the signal is quantized with a linear scale, the values attained by the signal will cover only 1/10 of the quantization range. As a result, the softer audio signals have a very granular waveform after being quantized, and the quality of the sound deteriorates rapidly as the sound gets softer. To compensate for the wide dynamic range of audio signals, a nonlinear scale can be used to quantize the signal. Using this method, the digitized signal will have an increased number of steps in the lower range, alleviating the problem (Couch, 152). Using non-uniform quantization can raise the SNR for a softer sound, making the SNR approximately uniform across a wide range of sound levels (Couch, 155).

Typically, non-uniform quantization is done on a logarithmic scale. The two standard formats for the logarithmic quantization of a signal are µ-Law and A-Law. A-Law is the standard format used in Europe (Couch, 153), and µ-Law is used in the telephone systems of the United States, Canada, and Japan. µ-Law quantization, as used in phone systems, uses eight bits of data to provide the dynamic range that normally requires twelve bits of PCM data (Audio FAQ).

The process of converting a computer file to µ-Law is a form of compression, since the amount of data needed per sample is reduced while the dynamic range the sample covers is increased. The result is much less data carrying more information. To create µ-Law or A-Law data, the signal must first be compressed and later expanded. This process is commonly referred to as companding.
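The standard continuous µ-Law characteristic, with µ = 255, can be sketched in C as follows. This is a floating-point illustration of the curve only; real telephone codecs implement a piecewise-linear 8-bit approximation of it:

#include <stdio.h>
#include <math.h>

#define MU 255.0

/* Compress: map |x| <= 1 through the logarithmic mu-Law curve. */
double mulaw_compress(double x)
{
    double s = (x < 0) ? -1.0 : 1.0;
    return s * log(1.0 + MU * fabs(x)) / log(1.0 + MU);
}

/* Expand: invert the curve to recover the (approximate) sample. */
double mulaw_expand(double y)
{
    double s = (y < 0) ? -1.0 : 1.0;
    return s * (pow(1.0 + MU, fabs(y)) - 1.0) / MU;
}

int main(void)
{
    /* A soft sample (0.01) and a loud one (0.80): after compression,
     * the soft sample occupies a far larger share of the code range,
     * which is exactly what gives soft sounds more quantization steps. */
    printf("0.01 -> %.4f\n", mulaw_compress(0.01));
    printf("0.80 -> %.4f\n", mulaw_compress(0.80));
    return 0;
}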

Silence Compression
Silence compression is a form of lossy compression that is extremely easy to implement. In silence compression, periods of relative silence in an audio signal are replaced by actual silence. The samples of data that were used to represent the silent part are replaced by a code and a number telling the device which reconstructs the analog signal how much silence to insert. This reduces all of the data needed to represent the silent part of the signal down to a few bytes.

To implement this, the compression algorithm first determines if the audio data is silent by comparing the level of the digital audio data to a threshold. If the level is lower than the threshold, that part of the audio signal is considered silent, and the samples are replaced by zeros. The performance of the algorithm therefore hinges on the threshold level: the higher the level, the more compression there is, but the more lossy the technique is. The amount of compression achieved also depends on the total length of all the silent periods in an audio signal. The amount can be very significant in some types of audio data, like voice data:

Silence encoding is extremely important for human speech. If you examine a waveform of human speech, you will see long, relatively flat pauses between the spoken words. (Ratcliff, 32)

In The Data Compression Book, Mark Nelson wrote silence compression code in C, and used it to compress some PCM audio data files. The results he obtained were as follows:

Filename        Original  Compressed  Ratio
SAMPLE-1.RAW       50777       37769    26%
SAMPLE-2.RAW       12033       11657     3%
SAMPLE-3.RAW       73019       73072     0%
SAMPLE-4.RAW       13852       10962    21%
SAMPLE-5.RAW       27411       22865    17%

The table indicates that silence compression can be very effective in some instances, but in others it may have no effect at all, or may even increase the file size slightly. Silence compression is used mainly in file formats found in computers.
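A minimal C sketch of such a scheme follows. This is not Nelson's actual code; the escape value, threshold, and minimum run length are invented for illustration, and a practical coder would also have to escape data bytes that happen to equal the escape code:

#include <stdio.h>
#include <stdlib.h>

#define ESCAPE    0xFF   /* marks "insert N samples of silence"   */
#define THRESHOLD 4      /* deviation from 128 counted as silence */

/* Compress 8-bit unsigned samples centered on 128; returns new size. */
size_t compress_silence(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; ) {
        size_t run = 0;
        while (i + run < n && run < 255 &&
               abs((int)in[i + run] - 128) <= THRESHOLD)
            run++;
        if (run >= 4) {                /* long enough to pay off      */
            out[o++] = ESCAPE;
            out[o++] = (unsigned char)run;
            i += run;
        } else {
            out[o++] = in[i++];        /* copy a loud sample through  */
        }
    }
    return o;
}

int main(void)
{
    unsigned char in[16] = { 200, 129, 128, 127, 128, 129, 128, 130,
                              60, 128, 128, 128, 128, 128,  90, 128 };
    unsigned char out[16];
    printf("16 bytes -> %zu bytes\n", compress_silence(in, 16, out));
    return 0;
}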



DM
Delta Modulation (DM) is one of the most primitive forms of audio encoding. In DM, a stream of 1-bit values is used to represent the analog signal. Each bit contains information on whether the DM signal is greater or less than the actual audio signal. With this information, the original signal can then be reconstructed. Figure 5 shows an example DM signal, the original signal it was generated from, and the reconstructed signal before filtering.

Figure 5: An Example of Signals in a DM Waveform: a) the original and reconstructed waveforms, and b) the DM waveform

The actual DM signal, Figure 5b, contains information on whether the output should rise or fall. The size of the steps and the rate of the steps are fixed. The reconstruction algorithm simply raises or lowers the output value according to the DM waveform.

DM suffers from two major losses: granular noise and slope overload. Granular noise occurs when the input signal is flat; the DM signal simulates flat regions by alternately rising and falling, leading to granular noise.

Slope overload is caused when the input signal rises faster than the DM signal can follow it. Granular noise can be reduced by making the step size small enough, and slope overload can be prevented by increasing the data rate. However, decreasing the step size and increasing the data rate also increase the amount of data needed to store the signal. DM is rarely used, but was explained here to provide a basis for understanding ADM, which offers a significant advantage over PCM.
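A minimal C sketch of a DM encoder with a fixed step size is shown below, using an invented ramp as the input; the decoder would reconstruct the signal by applying the same rise/fall steps to its own tracking value:

#include <stdio.h>

#define N    16
#define STEP 0.1                  /* fixed step size */

int main(void)
{
    double x[N], track = 0.0;
    int bits[N];

    for (int i = 0; i < N; i++)   /* a slow ramp as the test input   */
        x[i] = 0.05 * i;

    for (int i = 0; i < N; i++) { /* encode: one bit per sample      */
        bits[i] = (x[i] > track) ? 1 : 0;
        track += bits[i] ? STEP : -STEP;  /* decoder does the same   */
    }

    for (int i = 0; i < N; i++)
        printf("%d", bits[i]);
    printf("  (tracking ends at %.2f, input ends at %.2f)\n",
           track, x[N - 1]);
    return 0;
}

With a steeper ramp the fixed step could not keep up (slope overload), and with a flat input the output would alternate 0101... around the true value (granular noise).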

ADM
Adaptive Delta Modulation (ADM) is the solution to the problems with DM. In ADM, the step size is continuously adjusted, making the step size larger in the fast-changing parts of the signal and smaller in the slower-changing parts. Using this technique, both the granular noise and the slope overload problems are addressed. In order to adjust the step size, an estimation must be made to determine if the signal is changing rapidly. The estimation in ADM is usually based on the last sample: if the signal moved in the same direction for two consecutive samples, the step size is increased; if the two previous steps were opposite in direction, the step size is decreased. This estimation method is simple yet effective. The performance of ADM using the above technique turns out to be better than Log PCM when little data is used to represent a signal (performance here being measured by SNR). When more data is used, however, Log PCM performs better (Lynch, 229).
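The adaptation rule can be sketched in C as follows; the growth and shrink factors (1.5 and 0.66) and the step bounds are illustrative choices, not values from the cited sources:

#include <stdio.h>

/* Encode n samples into 1-bit codes with an adaptive step size. */
void adm_encode(const double *x, int n, int *bits)
{
    double track = 0.0, step = 0.02;
    int prev = 1;
    for (int i = 0; i < n; i++) {
        int b = (x[i] > track) ? 1 : 0;
        if (i > 0)
            step *= (b == prev) ? 1.5 : 0.66;  /* two alike: grow;
                                                  two opposite: shrink */
        if (step < 0.001) step = 0.001;        /* keep step bounded    */
        if (step > 0.5)   step = 0.5;
        track += b ? step : -step;
        bits[i] = b;
        prev = b;
    }
}

int main(void)
{
    double x[12] = { 0, .1, .3, .6, .9, .9, .9, .88, .9, .89, .9, .9 };
    int bits[12];
    adm_encode(x, 12, bits);
    for (int i = 0; i < 12; i++) printf("%d", bits[i]);
    printf("\n");
    return 0;
}

During the steep rise the step grows quickly, avoiding slope overload; once the input flattens, alternating bits shrink the step, reducing granular noise.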

DPCM
A Differential Pulse Code Modulation (DPCM) system consists of a predictor, a difference calculator, and a quantizer. The predictor predicts the value of the next sample. The difference calculator then determines the difference between the predicted value and the actual value. Finally, this difference value is quantized by the quantizer. The quantized differences are used to represent the original signal.

Essentially, a DM signal is a DPCM signal with one bit used in the quantization process and a predictor based on the previous bit. In a DM system, the predicted value is always the same as the previous value, and the difference between the predicted value and the actual signal is quantized using one bit (two levels).

The performance of a DPCM signal depends on the predictor: the better it can predict where the signal is headed, the better it will perform. A DPCM system using one previous value in the predictor can achieve the same SNR as a µ-Law PCM system while using one less bit to quantize each sample value. If three previous values are used for the predictor, the same SNR can be achieved using two bits less to represent each sample (Lynch, 227). This is a significant performance increase over PCM because the same SNR is obtained using less data. The technique can be extended even further by making the prediction method adaptive to the input data; the resulting technique is called Adaptive Differential Pulse Code Modulation (ADPCM).
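A first-order DPCM coder, with the previous reconstructed sample as the predictor and an invented 3-bit difference quantizer, can be sketched in C as follows:

#include <stdio.h>
#include <math.h>

#define DIFF_STEP 0.05       /* fixed difference-quantizer step */

void dpcm_encode(const double *x, int n, int *codes)
{
    double pred = 0.0;       /* predictor: previous reconstructed value */
    for (int i = 0; i < n; i++) {
        double diff = x[i] - pred;             /* prediction error  */
        int q = (int)lround(diff / DIFF_STEP);
        if (q >  3) q =  3;                    /* 3-bit codes -4..3 */
        if (q < -4) q = -4;
        codes[i] = q;
        pred += q * DIFF_STEP;  /* track the decoder's reconstruction */
    }
}

int main(void)
{
    double x[8] = { 0.0, 0.05, 0.12, 0.2, 0.22, 0.2, 0.15, 0.1 };
    int c[8];
    dpcm_encode(x, 8, c);
    for (int i = 0; i < 8; i++) printf("%+d ", c[i]);
    printf("\n");
    return 0;
}

Because neighboring audio samples are highly correlated, the differences stay small and fit into fewer bits than the samples themselves would.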

ADPCM
ADPCM is a modification of the DPCM technique that makes the algorithm adapt to the characteristics of the signal. The relationship between DM and ADM is the same as that between DPCM and ADPCM: in both, the algorithm is made adaptive to the changes in the audio signal. The adaptive part of the system can be built into the predictor, the quantizer, or both, but has been shown to be most effective in the quantizer (Lynch, 227). Using this adaptive algorithm, the compression performance can be increased beyond that of DPCM. "Cohen (1973) shows that by using the two most significant bits in the previous three samples, a gain in SNR of 7 dB over non-adaptive DPCM can be obtained" (Lynch, 227). Different forms of ADPCM are used in many applications, including inexpensive digital recorders. ADPCM is also used in public compression standards which are slowly gaining popularity, like CCITT G.721 and G.723, which use ADPCM at 32 kbits/s and at 24 or 40 kbits/s respectively (Audio FAQ).
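Extending the previous sketch, the following C fragment places the adaptation in the quantizer, as the text suggests is most effective. The adaptation constants are again invented for illustration and are not those of G.721 or G.723:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

void adpcm_encode(const double *x, int n, int *codes)
{
    double pred = 0.0, step = 0.02;
    for (int i = 0; i < n; i++) {
        double diff = x[i] - pred;
        int q = (int)lround(diff / step);
        if (q >  3) q =  3;                  /* 3-bit difference codes */
        if (q < -4) q = -4;
        codes[i] = q;
        pred += q * step;                    /* decoder-side tracking  */
        step *= (abs(q) >= 3) ? 1.4 : 0.9;   /* adapt quantizer step:
                                                grow after big errors,
                                                shrink after small ones */
        if (step < 0.001) step = 0.001;
        if (step > 0.5)   step = 0.5;
    }
}

int main(void)
{
    double x[8] = { 0.0, 0.3, 0.7, 0.9, 0.92, 0.9, 0.5, 0.1 };
    int c[8];
    adpcm_encode(x, 8, c);
    for (int i = 0; i < 8; i++) printf("%+d ", c[i]);
    printf("\n");
    return 0;
}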


PASC and ATRAC


All of the previously mentioned compression techniques are relatively simple re-writings of the audio data. Precision Adaptive Subband Coding (PASC) and Adaptive TRansform Acoustic Coding (ATRAC) differ from these because they are much more complex proprietary schemes which were developed for a specific purpose. PASC and ATRAC were both developed for use in the Hi-Fi audio market: PASC was developed by Philips for use with the Digital Compact Cassette (DCC), and ATRAC was developed by Sony for use with their MiniDisc player. Both of these techniques use psychoacoustic phenomena as the basis of the compression algorithm in order to achieve the extreme compression ratios required for their applications. The details of the algorithms are complicated, and will not be discussed here; more information is given in the discussion of compression used in Hi-Fi audio equipment in Section 4.2. In addition, details on PASC can be found in Advanced Digital Audio, edited by Ken Pohlmann, and details on ATRAC can be found in the Proceedings of the IEEE in an article titled "The Rewritable MiniDisc System" by Tadao Yoshida.

3.3.2 Voice Coding Techniques

LPC


Linear Predictive Coding (LPC) is one of the most popular voice coding techniques. In an LPC system, the voice signal is represented by storing characteristics about the system creating the voice. When the data is played back, the voice is synthesized from the stored data by the playing device. The model used in an LPC system includes the source of the sound, a variable filter resembling the human vocal tract, and a variable amplifier representing the amplitude of the sound.

The source of the sound is modeled in two different ways depending on how the voice is being produced. This is done because humans can produce two types of sound: voiced and unvoiced.

Voiced sounds are those created by using the vocal cords, and unvoiced sounds are created by pushing air through the vocal tract. An LPC algorithm models these sounds by using either driven periodic pulses (voiced) or a random noise generator (unvoiced) as the source. The human vocal tract is modeled in the system as a time-varying filter (Lynch, 240). Parameters are calculated for the filter to mimic the changing characteristics of the vocal tract as the sound is being produced. The data used to represent the voice in an LPC algorithm consists of the filter parameters, the source used (voiced or unvoiced), the pitch of the voice, and the volume of the voice. The amount of data generated by storing these parameters is significantly less than the amount of data needed to represent the waveform of the speech signal.
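The source-filter model on the synthesis side can be sketched in C as follows; the filter order, coefficients, gain, and pitch period are invented for illustration, whereas a real LPC coder derives them from the speech being encoded:

#include <stdio.h>
#include <stdlib.h>

#define ORDER 4

/* Synthesize n samples from stored LPC-style parameters. */
void lpc_synthesize(const double a[ORDER], double gain, int voiced,
                    int pitch_period, double *out, int n)
{
    double hist[ORDER] = { 0.0 };                /* filter memory      */
    for (int i = 0; i < n; i++) {
        double src = voiced
            ? ((i % pitch_period == 0) ? 1.0 : 0.0)   /* pulse train   */
            : ((double)rand() / RAND_MAX - 0.5);      /* random noise  */
        double y = gain * src;
        for (int k = 0; k < ORDER; k++)          /* all-pole "vocal
                                                    tract" feedback   */
            y += a[k] * hist[k];
        for (int k = ORDER - 1; k > 0; k--)      /* shift memory      */
            hist[k] = hist[k - 1];
        hist[0] = y;
        out[i] = y;
    }
}

int main(void)
{
    double a[ORDER] = { 0.5, -0.3, 0.1, -0.05 }; /* invented filter   */
    double out[80];
    lpc_synthesize(a, 0.8, 1, 40, out, 80);      /* one voiced frame  */
    printf("first samples: %.3f %.3f %.3f\n", out[0], out[1], out[2]);
    return 0;
}

Only the handful of parameters passed to lpc_synthesize would need to be stored per frame, which is why voice coding achieves far higher compression ratios than waveform coding.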

GSM
The Global System for Mobile Communications (GSM) is a standard used for compression of speech in the European digital cellular telephone system. GSM is an advanced compression technique that can achieve a compression ratio of 8:1. To obtain this high compression ratio and still produce high quality sound, GSM is based on the LPC voice coding technique and also incorporates a form of waveform coding (Degener, 30).

4 Uses of Compression
Compression is used in almost all modern digital audio applications, including computer files, audio playing devices, telephony applications, and digital recording devices. Many of these, like the telephone system, have been using compression for many years now; others have just recently started using it. The type of compression that is used depends on cost, size, space, and many other factors.

After reviewing a basic background on compression, one question remains unanswered: what type of compression is used for a particular application?

In the following sections, the uses of compression in two major areas will be discussed: computer files and digital Hi-Fi stereo equipment. Knowledge about these areas is particularly useful because it can help in deciding which device to use.

4.1 Compression in File Formats


When digital audio technology was first appearing on the market, each computer manufacturer had their own file format, or formats, associated with their computer (Audio FAQ). As software became more advanced, computers attained the ability to read more than one file format. Today, most software can read and write a wide range of file formats, leaving the choice to the user.

In general, there are two types of file formats, "raw" and self-describing. In a raw file format, the data can be in any format; the encoding and parameters are fixed and must be known in advance in order to read the file. A self-describing format has a header in which information about the data type, such as the sampling rate and compression scheme, is stored. The main concern here will be with self-describing file formats, since these are the most often used and the most versatile.

A disadvantage of using compression in computer files is that the file usually needs to be converted to linear PCM data for playback on digital audio devices. This requires extra code and processing time. It may also be one of the reasons why approximately half of the file formats available for computers don't support compression. The following chart, taken from the "Audio Tutorial FAQ" of The Center for Innovative Computer Applications, describes most of the popular file formats on the market and the compression that is used, if any:


Extension, Name   Origin        Variable Parameters
.au or .snd       NeXT, Sun     rate, #channels, encoding, info string
.aif(f), AIFF     Apple, SGI    rate, #channels, sample width, lots of info
.aif(f), AIFC     Apple, SGI    same (extension of AIFF with compression)
.iff, IFF/8SVX    Amiga         rate, #channels, instrument info (8 bits)
.voc              Soundblaster  rate (8 bits/1 ch; can use silence deletion)
.wav, WAVE        Microsoft     rate, #channels, sample width, lots of info
                                [including compression scheme]
.sf               IRCAM         rate, #channels, encoding, info
none, HCOM        Mac           rate (8 bits/1 ch; uses Huffman compression)
none, MIME        Internet      usually 8-bit µ-Law compression [8000 samp/s]
.mod or .nst      Amiga         [bank of digitized instrument samples with
                                sequencing information]

Many of these file formats are just uncompressed PCM data with the sampling rate and the number of channels used during recording specified in the header. For the formats that do support compression, it is usually optional. For example, in the Soundblaster ".voc" format, silence compression can be used, and in the Microsoft ".wav" format, a number of different encoding schemes can be used, including PCM, DM, DPCM, and ADPCM. Conversion from one format to another can be accomplished via software; the "Audio FAQ" also provides information on a number of different programs that will do the conversion. When converting from an uncompressed to a compressed format, the file is generally smaller afterwards, but some quality is lost. If the file is later converted back, the size will increase, but the quality can never be regained.
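As an example of how a self-describing header works, the following C sketch reads the well-known header fields of a Microsoft ".wav" file. It assumes a canonical file with the format chunk immediately following the RIFF header and a little-endian machine; a robust reader would scan the chunks rather than rely on this fixed layout:

#include <stdio.h>
#include <stdint.h>

struct wav_fmt {
    char     riff[4], wave[4], fmt[4];
    uint32_t riff_size, fmt_size;
    uint16_t audio_format;      /* 1 = plain PCM; other codes indicate
                                   compressed encodings */
    uint16_t channels;
    uint32_t sample_rate;
    uint32_t byte_rate;
    uint16_t block_align;
    uint16_t bits_per_sample;
};

int main(int argc, char **argv)
{
    if (argc < 2) return 1;
    FILE *f = fopen(argv[1], "rb");
    if (!f) return 1;

    struct wav_fmt h;
    fread(h.riff, 1, 4, f);  fread(&h.riff_size, 4, 1, f);
    fread(h.wave, 1, 4, f);  fread(h.fmt, 1, 4, f);
    fread(&h.fmt_size, 4, 1, f);
    fread(&h.audio_format, 2, 1, f);
    fread(&h.channels, 2, 1, f);
    fread(&h.sample_rate, 4, 1, f);
    fread(&h.byte_rate, 4, 1, f);
    fread(&h.block_align, 2, 1, f);
    fread(&h.bits_per_sample, 2, 1, f);

    printf("format %u, %u channel(s), %u samples/s, %u bits/sample\n",
           h.audio_format, h.channels, h.sample_rate, h.bits_per_sample);
    fclose(f);
    return 0;
}

The point is that everything a player needs (rate, channels, sample width, encoding) is announced by the file itself, which is exactly what distinguishes a self-describing format from a raw one.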

4.2 Compression in Recording Devices


There are currently four major digital stereo devices on the market: the Compact Disc (CD), the Digital Audio Tape (DAT), the Digital Compact Cassette (DCC), and the MiniDisc (MD). They are all very different from each other. The CD and MD use an optical storage mechanism, while the DAT and DCC use a magnetic tape to store the data. There are also a number of other apparent differences between the media.

For example, a CD is not re-writable, while the others are. A major difference that may not be apparent, however, is that the MD and DCC utilize digital data compression while the DAT and CD do not. This allows the MD and DCC to be physically smaller than their uncompressed counterparts. In both devices, the smaller data size is necessary and advantageous.

In the MD, the design goal was to make the optical disc small so that it would be portable. The MD contains the same density of data as the CD; only by using compression can the disc be made physically smaller than the CD. In addition to reducing the size, the compression used gave the MD other advantages. It allowed the MD to be the first optical player with the digital anti-shock mechanism described in the introduction. Since less data is required to generate sound, and the MD reads at the same speed as the CD, the MD can read more data than it needs to generate sound. The extra data is stored in a buffer, which does not need to be very big. CDs eventually came out with the same technology, but in order to implement it, the reading speed of the CD needed to be increased, and the data needed to be compressed after reading to fit it into a memory buffer.

The design goal of the DCC was to make the storage medium inexpensive and the same size as an audio tape. By doing this, a DCC player could accept standard audio tapes as well as the new DCC tapes, making it more marketable. To be able to fit the data onto a relatively inexpensive tape medium which can be housed in an audio cassette case, digital compression was required.

In both the MD and DCC, the space available for digital audio data was approximately 1/4 of the size required for PCM data; the compression ratio needed was therefore approximately 4:1. To obtain such high compression ratios, the compression schemes utilize psychoacoustic phenomena. Precision Adaptive Subband Coding (PASC) is the compression algorithm used for the DCC to provide a 4:1 compression of the digital PCM data. PASC is described in the book Advanced Digital Audio, edited by Ken Pohlmann:

The PASC system is based on three principles. First, the ear only hears sounds above the threshold of hearing. Second, louder sounds mask softer sounds of similar frequency, thus dynamically changing the threshold of hearing. Similarly, other masking properties such as high- and low-frequency masking may be utilized. Third, sufficient data must be allocated for precise encoding of sounds above the dynamic threshold of hearing.

Using PASC, enough digital data can fit onto a medium the size of a cassette to make the DCC player feasible. The MD uses the ATRAC compression algorithm, which is based on the same psychoacoustic phenomena. Compression in a MiniDisc is more advanced, however: the MiniDisc achieves a compression ratio of "5:1 in order to offer 74 min of playback time" (Yoshida, 1498).

Although these algorithms offer such high compression, there are some losses involved. Experts claim that they can hear a difference between a CD and an MD, but the actual losses are so minimal that the average person will not hear them. The largest errors occur with certain types of audio sounds that the compression algorithm has problems with. In an article in Audio Magazine, Edward Foster writes:

Although the test was not double-blind, and thus is suspect, I convinced myself I could reliably tell the original from the copy; just barely, but different nonetheless. The differences occurred in three areas: a slight suppression of low-level high-frequency content when the algorithm needed most of the available bitstream to handle strong bass and midrange content, a slight dulling of the attack of percussion instruments (piano, harpsichord, glockenspiel, etc.) probably caused by imperfect masking of "pre-echo," and a slight "post-echo" (noise puff) at the cessation of a sharp sound (such as claves struck in an acoustically dead environment). The second and third of these anomalies were most readily discernible on single instruments played one note at a time in a quiet environment and were taken from a recording specifically made to evaluate perceptual encoders.

Similar effects exist when listening to a DCC recording. Although the losses are minimal, they are still present, being the tradeoff of having the small, compact, portable format.

5 Conclusion
In the last decade, the field of digital audio compression has grown tremendously. With the expansion of the electronics industry and the decreasing prices of digital audio, many devices which once used analog audio technology now use digital technology. Many of these digital devices use compression to reduce storage space and bring down cost. Digital audio compression has become a sub-area of Audio Engineering, supporting many professionals who specialize in this field. Millions of dollars are invested by companies, such as Sony and Philips, to develop proprietary compression schemes for their digital audio applications (Audio FAQ).

Because of the widespread use of compression, knowledge in this area can be useful. For a musician working with modern digital recording and editing equipment, the study of compression can provide an advantage. Knowledge in the field of compression can help in the evaluation and understanding of recording and playback equipment. It can also aid when manipulating digital files with computers. As we move into the next century, and digital audio technology continues to grow, knowledge of audio compression will become an increasingly valuable asset.


Bibliography
"Audio Tutorial FAQ." FTP://pub/usenet/news.answers/audio-fmts/part[1-2], Center for Innovative Computer Applications, August 1994.

J. G. Beerends and J. A. Stermerdink, "A perceptual audio quality measure based on a psychoacoustic sound representation," AES: Journal of the Audio Engineering Society, vol. 40, p. 963, December 1992.

L. W. Couch, Digital and Analog Communication Systems. New York, NY: Macmillan Publishing Company, fourth ed., 1993.

J. Degener, "Digital speech compression," Dr. Dobb's Journal, vol. 19, p. 30, December 1994.

M. Fleischmann, "Digital recording arrives," Popular Science, vol. 242, p. 84, April 1993.

E. J. Foster, "Sony MDS-501 minidisc deck," Audio, vol. 78, p. 56, November 1994.

D. B. Guralnik, ed., Webster's New World Dictionary. New York, NY: Prentice Hall Press, second college ed., 1986.

P. Lutter, M. Muller-Wernhart, J. Ramharter, F. Rattay, and P. Slowik, "Speech research with WAVE-GL," Dr. Dobb's Journal, vol. 21, p. 50, November 1996.

T. J. Lynch, Data Compression: Techniques and Applications. New York, NY: Van Nostrand Reinhold, 1985.

M. Nelson, The Data Compression Book. San Mateo, CA: M&T Books, 1992.

Panasonic Portable CD Player SL-S600C Operating Instructions.

K. C. Pohlmann, ed., Advanced Digital Audio. Carmel, IN: SAMS, first ed., 1993.

J. W. Ratcliff, "Audio compression," Dr. Dobb's Journal, vol. 17, p. 32, July 1992.

J. W. Ratcliff, "Examining PC audio," Dr. Dobb's Journal, vol. 18, p. 78, March 1993.

J. Rothstein, MIDI: A Comprehensive Introduction. Madison, WI: A-R Editions, Inc., 1992.

A. Vollmer, "Minidisc, digital compact cassette vie for digital recording market," Electronics, vol. 66, p. 11, September 13, 1993.

J. Watkinson, An Introduction to Digital Audio. Jordan Hill, Oxford (GB): Focal Press, 1994.

T. Yoshida, "The rewritable minidisc system," Proceedings of the IEEE, vol. 82, p. 1492, October 1994.
