
Basic Features of Audio Signals

Jyh-Shing Roger Jang
http://mirlab.org/jang
MIR Lab, CSIE Dept
National Taiwan Univ., Taiwan

Audio Features
Four commonly used audio features
Volume, pitch, zero crossing rate, timbre
Our goal
These features can be perceived subjectively
(except for zero crossing rate).
Our goal is to compute them quantitatively (and
objectively) for further processing and
recognition.
Audio Features in Time Domain
Audio features presented in the time domain



Intensity
Fundamental period (FP)
Timbre: Waveform within a fundamental period
Audio Features in Frequency Domain
Volume: Magnitude of spectrum
Pitch: Distance between harmonics
Timbre: Smoothed spectrum








[Figure: spectrum annotated with intensity, pitch frequency, first formant (F1), and second formant (F2)]
General Steps for Audio Analysis
1. Frame blocking
Frame duration of 20 ms or so
2. Feature extraction
Volume, zero-crossing rate, pitch, MFCC, etc
3. Frame-based Analysis
Pitch contour comparison, HMM evaluation, etc
Frame Blocking
Sample rate = 11025 Hz
Frame size = 256 samples
Overlap = 84 samples
(Hop size = frame size - overlap)
Frame rate = 11025/(256-84) ≈ 64 frames/sec
[Figure: full waveform (samples 0 to 2500) and a zoomed-in view (samples 0 to 300) marking one frame and the overlap between adjacent frames]
Frame-based Manipulation
For simplicity, we usually pack frames into a
frame matrix for easy manipulation in
MATLAB:
[y, fs, nbits]=wavread('file.wav');	% The file name must be a quoted string
frameMat=enframe(y, frameSize, overlap);
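The frame-blocking parameters above can be checked with a short sketch. It is written in Python rather than MATLAB, and `block_frames` is a hypothetical stand-in, not the toolbox's `enframe`; it returns a list of frames and silently drops any incomplete frame at the end, which is an assumption about the desired behavior.

```python
def block_frames(signal, frame_size, overlap):
    """Split a sequence into overlapping frames (hypothetical helper).

    Assumption: a trailing partial frame is discarded.
    """
    hop = frame_size - overlap  # hop size = frame size - overlap
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    frames = []
    start = 0
    while start + frame_size <= len(signal):
        frames.append(list(signal[start:start + frame_size]))
        start += hop
    return frames


fs = 11025          # sample rate in Hz, as on the slide
frame_size = 256    # samples per frame
overlap = 84        # samples shared by adjacent frames
hop = frame_size - overlap
frame_rate = fs / hop  # frames per second

# One second of placeholder samples (0, 1, 2, ...) so frame starts are easy to read.
signal = list(range(fs))
frames = block_frames(signal, frame_size, overlap)
```

With these values the hop size is 172 samples, so consecutive frames start 172 samples apart and the frame rate comes out slightly above 64 frames per second.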
Volume (I)
Loudness of audio signals
Visual cue: Amplitude of vibration
Also known as energy or intensity
Two major ways of computing volume:
Volume:
$$vol = \sum_{i=1}^{n} |s_i|$$
Log energy (in decibels):
$$energy = 10 \log_{10} \left( \sum_{i=1}^{n} s_i^2 \right)$$
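As a rough illustration of the two volume measures, here is a minimal Python sketch that operates on one frame given as a list of samples. The function names and the `floor` parameter (used only to avoid taking the logarithm of zero for a silent frame) are assumptions, not part of the slides.

```python
import math


def abs_sum_volume(frame):
    """Volume as the sum of absolute sample values in the frame."""
    return sum(abs(s) for s in frame)


def log_energy_db(frame, floor=1e-12):
    """Log energy in decibels: 10*log10 of the sum of squared samples.

    Assumption: the energy is clipped at `floor` so silence does not hit log10(0).
    """
    energy = sum(s * s for s in frame)
    return 10.0 * math.log10(max(energy, floor))


frame = [0.5, -0.5, 0.5, -0.5]
vol = abs_sum_volume(frame)        # 2.0
energy_db = log_energy_db(frame)   # energy is 1.0, so 0 dB
```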

Volume (II)
Perceived volume is influenced by
Frequency (see equal loudness curves in text)
Timbre (see example in text)
Computed volume is influenced by
Microphone types
Microphone setups
Volume (III)
To avoid DC bias (or DC drifting)
DC bias: The vibration is not around zero
Computation:
Volume:
$$vol = \sum_{i=1}^{n} |s_i - \operatorname{median}(s)|$$
Log energy (in decibels):
$$energy = 10 \log_{10} \left( \sum_{i=1}^{n} \left( s_i - \operatorname{mean}(s) \right)^2 \right)$$
Theoretical background (How to prove?)
$$\operatorname{median}(s) = \arg\min_{x} \sum_{i=1}^{n} |s_i - x|, \quad s = [s_1, s_2, \ldots, s_n]$$
$$\operatorname{mean}(s) = \arg\min_{x} \sum_{i=1}^{n} (s_i - x)^2, \quad s = [s_1, s_2, \ldots, s_n]$$
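The two minimization properties can be checked numerically before attempting a proof. The Python sketch below does a coarse grid search over candidate values of x for a small made-up frame, then shows that median subtraction removes a constant DC offset from the volume. The data, grid resolution, and function names are illustrative assumptions; a grid search is a sanity check, not a proof.

```python
import statistics


def volume_zero_justified(frame):
    """Sum of absolute deviations from the frame median."""
    med = statistics.median(frame)
    return sum(abs(s - med) for s in frame)


def grid_argmin(cost, candidates):
    """Return the candidate x with the smallest cost(x)."""
    return min(candidates, key=cost)


frame = [0.1, 0.4, 0.2, 0.9, 0.3]          # made-up samples
candidates = [k / 100 for k in range(101)]  # x = 0.00, 0.01, ..., 1.00

# x minimizing the sum of absolute deviations (expected: the median).
best_abs = grid_argmin(lambda x: sum(abs(s - x) for s in frame), candidates)
# x minimizing the sum of squared deviations (expected: the mean).
best_sq = grid_argmin(lambda x: sum((s - x) ** 2 for s in frame), candidates)

# A frame vibrating around 0.5 instead of 0 (DC bias of 0.5).
biased = [0.6, 0.4, 0.6, 0.4, 0.5]
plain_volume = sum(abs(s) for s in biased)        # inflated by the bias
justified_volume = volume_zero_justified(biased)  # bias removed
```

For this frame the grid search lands on the median for the absolute-value cost and on the mean for the squared cost, and the biased frame's volume drops from about 2.5 to about 0.4 once the median is subtracted.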

Volume (IV)
Functions for computing volume
Example: volume01
Example: volume02
Example: volume03
Volume depends on
Frequency
Try this equal loudness test
Timbre
Example: volume04

Zero Crossing Rate
Zero crossing rate (ZCR)
The number of zero crossings in a frame.
Characteristics
Zero-justification is required.
Noise and unvoiced sound have high ZCR.
ZCR is commonly used in endpoint detection,
especially in detecting the start and end of
unvoiced sounds.
To distinguish noise/silence from unvoiced sound,
usually we add a shift before computing ZCR.

ZCR Computations
Two types of ZCR definitions
If a sample with zero value is counted as a zero
crossing, the ZCR is higher; otherwise it is
lower.
The distinction diminishes when using a higher
bit resolution.
Other considerations
ZCR with shift can be used to distinguish between
unvoiced sounds and silence. (How to determine
the shift amount?)
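The two ZCR definitions and the shift can be sketched in a few lines of Python. This is one plausible reading of the slides, not the course's own code: a crossing is detected when the product of adjacent samples is negative (or non-positive when zero-valued samples are counted), and the shift is applied by subtracting a constant from every sample. The function name, the parameters, and the shift value used below are assumptions, and the frame is assumed to be zero-justified already.

```python
def zero_crossing_count(frame, count_zeros=False, shift=0.0):
    """Count zero crossings in one frame.

    count_zeros=False: only strict sign changes count (product < 0).
    count_zeros=True:  zero-valued samples also count (product <= 0).
    shift: constant subtracted from every sample before counting (assumption).
    """
    shifted = [s - shift for s in frame]
    count = 0
    for a, b in zip(shifted, shifted[1:]):
        product = a * b
        if product < 0 or (count_zeros and product == 0):
            count += 1
    return count


frame = [1, -1, 0, 2, -3]
zcr_strict = zero_crossing_count(frame)                        # 2
zcr_inclusive = zero_crossing_count(frame, count_zeros=True)   # 4

# Low-level noise hovering around zero crosses often...
noise = [0.01, -0.01, 0.01, -0.01]
zcr_noise = zero_crossing_count(noise)                         # 3
# ...but not once a shift larger than its amplitude is applied.
zcr_noise_shifted = zero_crossing_count(noise, shift=0.05)     # 0
```

The last two lines show why the shift helps: near-silent noise stops crossing the shifted baseline, while a louder unvoiced sound would still cross it many times.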
ZCR
ZCR computing
Example: zcr01
Example: zcr02
Using ZCR to distinguish between unvoiced
sounds and environmental noise
Example: zcrWithShift
Pitch
Definition
Pitch is also known as fundamental frequency,
which is equal to the number of fundamental periods
per second. The unit used here is Hertz (Hz).
More commonly, pitch is expressed in semitones,
which can be converted from pitch in Hertz:
$$semitone = 69 + 12 \log_{2} \left( \frac{Hz}{440} \right)$$
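The conversion is easy to try out in code. The Python sketch below implements the formula and its inverse; the function names are illustrative assumptions.

```python
import math


def hz_to_semitone(freq_hz):
    """Convert a frequency in Hz to a semitone number (440 Hz maps to 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def semitone_to_hz(semitone):
    """Inverse conversion: semitone number back to frequency in Hz."""
    return 440.0 * 2.0 ** ((semitone - 69.0) / 12.0)


a4 = hz_to_semitone(440.0)   # 69.0
a5 = hz_to_semitone(880.0)   # one octave up: 81.0
```

Doubling the frequency adds 12 semitones, which is why an octave above 440 Hz maps to 81.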
Pitch Computation (I)
Pitch of tuning forks
$$fp = (189 - 7)/5/16000 = 0.002275 \text{ sec}$$
$$ff = 1/fp = 439.56 \text{ Hz}$$
$$pitch = 69 + 12 \log_{2} \left( \frac{ff}{440} \right) = 68.9827 \text{ semitone}$$
Pitch Computation (II)
Pitch of speech
$$fp = (477 - 75)/3/16000 = 0.008375 \text{ sec}$$
$$ff = 1/fp = 119.403 \text{ Hz}$$
$$pitch = 69 + 12 \log_{2} \left( \frac{ff}{440} \right) = 46.42 \text{ semitone}$$
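Both pitch computations follow the same recipe: mark two sample indices that span a whole number of fundamental periods, divide the span by the number of periods and by the sample rate to get the fundamental period, then convert to Hz and semitones. The Python sketch below reproduces the numbers for the tuning fork and the speech example; the function name and argument order are assumptions, and the sample rate of 16000 Hz is taken from the computations above.

```python
import math


def pitch_from_marks(start_index, end_index, num_periods, fs):
    """Estimate pitch from two sample indices spanning `num_periods` periods.

    Returns (fundamental period in seconds, frequency in Hz, pitch in semitones).
    """
    period_sec = (end_index - start_index) / num_periods / fs
    freq_hz = 1.0 / period_sec
    semitone = 69.0 + 12.0 * math.log2(freq_hz / 440.0)
    return period_sec, freq_hz, semitone


# Tuning fork: samples 7 to 189 cover 5 fundamental periods.
fork = pitch_from_marks(7, 189, 5, 16000)
# Speech: samples 75 to 477 cover 3 fundamental periods.
speech = pitch_from_marks(75, 477, 3, 16000)
```

Averaging over several periods, as done here, reduces the error from picking the marks by hand compared with measuring a single period.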
Statistics of Mandarin Chinese
5401 characters, each associated with at least one
base syllable and a tone
411 base syllables, and most syllables have 4 tones, so we
have 1501 tonal syllables
Tone is characterized by the pitch curves:
Tone 1: high-high
Tone 2: low-high
Tone 3: high-low-high
Tone 4: high-low
Some examples of tones:
1242
1234
?????Taiwanese
