Introduction To Computer Programming With MATLAB

6/6/2014 Introduction to Computer Programming with MATLAB
UCL Department of Phonetics and Linguistics
Introduction to Computer Programming with MATLAB
Lecture 10: Speech Signal Analysis

Objectives
By the end of the session you should:

q know one way to estimate the fundamental frequency of a section of speech signal from its spectrum
q know one way to estimate the fundamental frequency of a section of speech signal from its waveform
q know one way to estimate the formant frequencies from a section of speech signal
q appreciate some of the problems of fundamental frequency and formant frequency estimation
Outline
1. Fundamental frequency estimation frequency domain
The general problem of fundamental frequency estimation is to take a portion of signal and to find the dominant
frequency of repetition. Difficulties arise from (i) that not all signals are periodic, (ii) those that are periodic may
be changing in fundamental frequency over the time of interest, (iii) signals may be contaminated with noise, even
with periodic signals of other fundamental frequencies, (iv) signals that are periodic with interval T are also
periodic with interval 2T, 3T etc, so we need to find the smallest periodic interval or the highest fundamental
frequency; and (v) even signals of constant fundamental frequency may be changing in other ways over the
interval of interest.
A reliable way of obtaining an estimate of the dominant fundamental frequency for long, clean, stationary speech
signals is to use the cepstrum. The cepstrum is a Fourier analysis of the logarithmic amplitude spectrum of the
signal. If the log amplitude spectrum contains many regularly spaced harmonics, then the Fourier analysis of the
spectrum will show a peak corresponding to the spacing between the harmonics: i.e. the fundamental frequency.
Effectively we are treating the signal spectrum as another signal, then looking for periodicity in the spectrum itself.
The cepstrum is so-called because it turns the spectrum inside-out. The x-axis of the cepstrum has units of
quefrency, and peaks in the cepstrum (which relate to periodicities in the spectrum) are called rahmonics.
To obtain an estimate of the fundamental frequency from the cepstrum we look for a peak in the quefrency
region corresponding to typical speech fundamental frequencies. Here is an example:
% get a section of vowel

[x,fs]=wavread('six.wav',[24120 25930]);
ms1=fs/1000; % maximum speech Fx at 1000Hz
ms20=fs/50; % minimum speech Fx at 50Hz
%
% plot waveform
t=(0:length(x)-1)/fs; % times of sampling instants
subplot(3,1,1);
plot(t,x);
legend('Waveform');
http://www.phon.ucl.ac.uk/courses/spsci/matlab/lect10.html 1/6
xlabel('Time (s)');
ylabel('Amplitude');
%
% do fourier transform of windowed signal
Y=fft(x.*hamming(length(x)));
%
% plot spectrum of bottom 5000Hz
hz5000=5000*length(Y)/fs;
f=(0:hz5000)*fs/length(Y);
subplot(3,1,2);
plot(f,20*log10(abs(Y(1:length(f)))+eps));
legend('Spectrum');
xlabel('Frequency (Hz)');
ylabel('Magnitude (dB)');
%
% cepstrum is DFT of log spectrum
C=fft(log(abs(Y)+eps));
%
% plot between 1ms (=1000Hz) and 20ms (=50Hz)
q=(ms1:ms20)/fs;
subplot(3,1,3);
plot(q,abs(C(ms1:ms20)));
legend('Cepstrum');
xlabel('Quefrency (s)');
To search for the index of the peak in the cepstrum between 1 and 20ms, and then convert back to hertz, use:
[c,fx]=max(abs(C(ms1:ms20)));
fprintf('Fx=%gHz\n',fs/(ms1+fx-1));
The cepstrum works best when the fundamental frequency is not changing too rapidly, when the fundamental
frequency is not too high and when the signal is relatively noise-free. A disadvantage of the cepstrum is that it
involves computationally-expensive frequency-domain processing.
2. Fundamental frequency estimation time domain
The cepstrum looks for periodicity in the log spectrum of the signal, whereas our perception of pitch is more
strongly related to periodicity in the waveform itself. A means to estimate fundamental frequency from the
waveform directly is to use autocorrelation. The autocorrelation function for a section of signal shows how well
the waveform shape correlates with itself at a range of different delays. We expect a periodic signal to correlate
well with itself at very short delays and at delays corresponding to multiples of pitch periods.
This code plots the autocorrelation function for a section of speech signal:

ms20=fs/50; % minimum speech Fx at 50Hz
%
% plot waveform
subplot(2,1,1);
plot(t,x);
legend('Waveform');
xlabel('Time (s)');
%
% calculate autocorrelation
r=xcorr(x,ms20,'coeff');
%
% plot autocorrelation
d=(-ms20:ms20)/fs; % times of delays
subplot(2,1,2);
plot(d,r);
legend('Autocorrelation');
xlabel('Delay (s)');
ylabel('Correlation coeff.');
You can see that the autocorrelation function peaks at zero delay and at delays corresponding to 1 period,
2 periods, etc. We can estimate the fundamental frequency by looking for a peak in the delay interval
corresponding to the normal pitch range in speech, say 2ms(=500Hz) and 20ms (=50Hz). For example:
ms2=fs/500 % maximum speech Fx at 500Hz

ms20=fs/50 % minimum speech Fx at 50Hz
% just look at region corresponding to positive delays
r=r(ms20+1:2*ms20+1)
[rmax,tx]=max(r(ms2:ms20))
fprintf('rmax=%g Fx=%gHz\n',rmax,fs/(ms2+tx-1));
The autocorrelation approach works best when the signal is of low, regular pitch and when the spectral content
of the signal is not changing too rapidly. The autocorrelation method is prone to pitch halving errors where a
delay of two pitch periods is chosen by mistake. It can also be influenced by periodicity in the signal caused by
formant resonances, particularly for female voices where F1 can be lower in frequency than Fx.
3. Formant frequency estimation
Estimation of formant frequencies is generally more difficult than estimation of fundamental frequency. The
problem is that formant frequencies are properties of the vocal tract system and need to be inferred from the
speech signal rather than just measured. The spectral shape of the vocal tract excitation strongly influences the
observed spectral envelope, such that we cannot guarantee that all vocal tract resonances will cause peaks in the
observed spectral envelope, nor that all peaks in the spectral envelope are caused by vocal tract resonances.
The dominant method of formant frequency estimation is based on modelling the speech signal as if it were
generated by a particular kind of source and filter:
This type of analysis is called source-filter separation, and in the case of formant frequency estimation, we are
interested only in the modelled system and the frequencies of its resonances. To find the best matching system
we use a method of analysis called Linear Prediction. Linear prediction models the signal as if it were
generated by a signal of minimum energy being passed through a purely-recursive IIR filter.
We will demonstrate the idea by using LPC to find the best IIR filter from a section of speech signal and then
plotting the filter's frequency response.

% resample to 10,000Hz (optional)
x=resample(x,10000,fs);
fs=10000;
%
% plot waveform
subplot(2,1,1);
plot(t,x);
legend('Waveform');
xlabel('Time (s)');
%
% get Linear prediction filter
ncoeff=2+fs/1000; % rule of thumb for formant estimation
a=lpc(x,ncoeff);
%
% plot frequency response
[h,f]=freqz(1,a,512,fs);
subplot(2,1,2);
plot(f,20*log10(abs(h)+eps));
legend('LP Filter');
xlabel('Frequency (Hz)');
ylabel('Gain (dB)');
To find the formant frequencies from the filter, we need to find the locations of the resonances that make up the
filter. This involves treating the filter coefficients as a polynomial and solving for the roots of the polynomial.
Details are outside the scope of this course (see readings). Here is the code to find estimated formant
frequencies from the LP filter:
% find frequencies by root-solving

r=roots(a); % find roots of polynomial a
r=r(imag(r)>0.01); % only look for roots >0Hz up to fs/2
ffreq=sort(atan2(imag(r),real(r))*fs/(2*pi));
% convert to Hz and sort
for i=1:length(ffreq)
fprintf('Formant %d Frequency %.1f\n',i,ffreq(i));
end
This produces the output:
Formant 1 Frequency 289.2

Reading
Unit 8 from the Introduction to Digital Signal Processing course at http://www.phon.ucl.ac.uk/courses/spsci/dsp/

provides more detail about linear prediction.
A mathematical treatment can be found in: J. Makhoul. Linear prediction: A tutorial review. Proceedings of the
IEEE, 63(4):561-580, April 1975.
A good set of lecture notes on the mathematics of speech signal processing can be found at:
http://mi.eng.cam.ac.uk/~ajr/SA95/SpeechAnalysis.html
Exercises
For these exercises, use the editor window to enter your code, and save your answers to files under your
account on the central server. When you save the files, give them the file extension of ".m". Run your programs
from the command window.
1. Write a program (ex101.m) that asks the user for a filename (WAV file), and plots a fundamental frequency
track against time for the whole signal using the cepstrum method. Use a window size of 30ms and an output
frame rate of 100 Fx samples/second.
2. Modify the previous exercise to use the autocorrelation method (ex102.m). In this program set the
fundamental frequency value to zero when the peak of the autocorrelation function is less than 0.5. This acts
as a crude voicing decision.
3. Write a program (ex103.m) that asks the user for a filename (WAV file), and produces a scatter plot of
formant peaks against time. Use a window size of 30ms and an output frame rate of 100 per second.
4. (Homework) Write a GUI program which asks you to say a vowel and plots the location of the vowel on a
Vowel Quadrilateral diagram (effectively F1 vs F2).

Introduction To Computer Programming With MATLAB

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Introduction To Computer Programming With MATLAB

Uploaded by

Copyright:

Available Formats

6/6/2014 Introduction to Computer Programming with MATLAB

UCL Department of Phonetics and Linguistics

Introduction to Computer Programming with MATLAB

Lecture 10: Speech Signal Analysis

By the end of the session you should:

1. Fundamental frequency estimation frequency domain

% get a section of vowel

2. Fundamental frequency estimation time domain

% get a section of vowel

ms2=fs/500 % maximum speech Fx at 500Hz

3. Formant frequency estimation

% get a section of vowel

% find frequencies by root-solving

This produces the output:

Formant 1 Frequency 289.2

Unit 8 from the Introduction to Digital Signal Processing course at http://www.phon.ucl.ac.uk/courses/spsci/dsp/

You might also like