

TABLE II
GROUP WITH THE OPTIMAL STATISTICAL PERFORMANCE FOR THE FIRST-ORDER MARKOV PROCESS

[The table lists, for block lengths n = 12, 16, 18, 24, 30, 32, 36, 40, 42, 48, 54, 56, 60, and 64 and several values of the correlation coefficient \rho, the group with the optimal statistical performance; entries are groups such as S_3 \times C_k, Q_2 \times C_k, and cyclic groups C_n. The individual entries are not recoverable from this scan.]

TABLE III
DISPERSION FOR THE GROUP FILTER WITH G = S_3 x C_2

  rho     D_{S_3 x C_2}
  0.50    0.4545
  0.60    0.4294
  0.70    0.3944
  0.80    0.3437
  0.90    0.2628
  0.92    0.2398
  0.94    0.2128
  0.96    0.1800
  0.99    0.111

TABLE IV
GROUP WITH THE OPTIMAL STATISTICAL PERFORMANCE FOR THE RANDOM SINE WAVE

[The table lists, for n = 8, 12, 16, 18, 24, 30, 32, 36, 40, 42, 48, 54, 56, 60, and 64 and several parameter values (e.g., 1.01, 1.05, 1.1), the optimal group; entries are groups such as C_8, C_12, C_16, S_3 \times C_k, and Q_2 \times C_k. The individual entries are not recoverable from this scan.]

ACKNOWLEDGMENT

The authors wish to thank Mr. O. Zimmerman for conducting the computer experiments of Section III.

Maximum Likelihood Estimation for Multivariate Mixture Observations of Markov Chains


B. H. JUANG, STEPHEN E. LEVINSON, AND M. M. SONDHI, SENIOR MEMBER, IEEE

Abstract—To use probabilistic functions of a Markov chain to model certain parameterizations of the speech signal, we extend an estimation technique of Liporace to the cases of multivariate mixtures, such as Gaussian sums, and products of mixtures. We also show how these problems relate to Liporace's original framework.
Manuscript received September 10, 1984; revised July 12, 1985. This work was presented at the 1985 IEEE International Symposium on Information Theory, Brighton, England, June 24-28. The authors are with the Acoustics Research Department, Bell Laboratories, Murray Hill, NJ 07974. IEEE Log Number 8406633.



INTRODUCTION

In a recently published paper, Liporace [6] derived a method for estimating the parameters of a broad class of elliptically symmetric probabilistic functions of a Markov chain. The corollary to that work presented here was motivated by the desire to use this general technique to model the speech signal for which it is known [4], [8] that, unfortunately, certain of its most useful parameterizations do not possess the prescribed symmetry. Since any continuous probability density function can be approximated arbitrarily closely by a normal mixture [9], it is reasonable to use such constructs to avoid the restrictions imposed by the requirement of elliptical symmetry. In this correspondence we adapt the method and proof of [6] to two types of mixture densities.
NOMENCLATURE

Throughout this presentation we shall, where possible, adopt the notation used in [6]. Consider an unobservable n-state Markov chain with state transition matrix A = [a_{ij}]_{n \times n}. Associated with each state j of the hidden Markov chain is a probability density function b_j(x) of the observed d-dimensional random vector x. Here we shall consider densities of the form

b_j(x) = \sum_{k=1}^{m} c_{jk} \mathcal{N}(x, \mu_{jk}, U_{jk})                    (1)

where m is known; c_{jk} \ge 0 for 1 \le j \le n, 1 \le k \le m; \sum_{k=1}^{m} c_{jk} = 1 for 1 \le j \le n; and \mathcal{N}(x, \mu, U) denotes the d-dimensional normal density function of mean vector \mu and covariance matrix U. It is convenient then to think of our hidden Markov chains as being defined over a parameter manifold \Lambda = \mathscr{A} \times \mathscr{C} \times \mathscr{R}^d \times \cdots \times \mathscr{U}^d, where \mathscr{A} is the set of all n x n row-wise stochastic matrices; \mathscr{C} is the set of all n x m row-wise stochastic matrices; \mathscr{R}^d is the usual d-dimensional Euclidean space; and \mathscr{U}^d is the set of all d x d real symmetric positive definite matrices. Then for a given sequence of observations, O = O_1, O_2, \ldots, O_T, of the vector x and a particular choice of parameter values \lambda \in \Lambda, we can efficiently evaluate the likelihood function L_\lambda(O) of the hidden Markov chain by the forward-backward method of Baum [1]. The forward and backward partial likelihoods, \alpha_t(j) and \beta_t(i), are computed recursively from

\alpha_t(j) = \left[ \sum_{i=1}^{n} \alpha_{t-1}(i) a_{ij} \right] b_j(O_t)        (2)

and

\beta_t(i) = \sum_{j=1}^{n} a_{ij} b_j(O_{t+1}) \beta_{t+1}(j)                     (3)

respectively, for any t between 1 and T - 1. The recursion is initialized by setting \alpha_0(1) = 1, \alpha_0(j) = 0 for 2 \le j \le n, and \beta_T(i) = 1 for 1 \le i \le n, whereupon we may write

L_\lambda(O) = \sum_{i=1}^{n} \alpha_T(i).                                         (4)
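Concretely, (1)-(4) map directly onto a few lines of code. The following is a minimal NumPy sketch of ours (it is not part of the correspondence, and all function names are ours; scipy's multivariate_normal supplies the density \mathcal{N}). It uses the unscaled recursions and therefore underflows for long sequences, a point taken up in the Discussion:

import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, c, mu, U):
    """b_j(x) of (1) for every state j: c[j,k] weights, mu[j,k] means, U[j,k] covariances."""
    n, m = c.shape
    b = np.zeros(n)
    for j in range(n):
        for k in range(m):
            b[j] += c[j, k] * multivariate_normal.pdf(x, mean=mu[j, k], cov=U[j, k])
    return b

def forward_backward(O, A, c, mu, U):
    """Unscaled recursions (2)-(3) with alpha_0 = (1, 0, ..., 0) and beta_T = 1."""
    T, n = len(O), A.shape[0]
    B = np.array([mixture_density(O[t], c, mu, U) for t in range(T)])  # B[t,j] = b_j(O_t)
    alpha = np.zeros((T, n))
    alpha[0] = A[0] * B[0]              # alpha_1(j) = a_{1j} b_j(O_1), since alpha_0(1) = 1
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
    beta = np.ones((T, n))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[t + 1] * beta[t + 1])
    L = alpha[-1].sum()                 # likelihood L_lambda(O) of (4)
    return alpha, beta, B, L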

THE ESTIMATION ALGORITHM

The parameter estimation problem is then one of maximizing L_\lambda(O) with respect to \lambda for a given O. One way to maximize L_\lambda is to use conventional methods of constrained optimization. Liporace, on the other hand, advocates a reestimation technique analogous to that of Baum et al. [1], [2]. It is essentially a mapping \mathcal{T}: \Lambda \to \Lambda with the property that

L_{\mathcal{T}(\lambda)}(O) \ge L_\lambda(O)                                       (5)

with equality iff \lambda is a critical point of L_\lambda(O), that is, \nabla_\lambda L_\lambda(O) = 0. Thus a recursive application of \mathcal{T} to some initial value of \lambda converges to a local maximum (or possibly an inflection point) of the likelihood function. Liporace's result [6] relaxed the original requirement of Baum et al. [2] that b_j(x) be strictly log concave to the requirement that it be strictly log concave and/or elliptically symmetric. We will further extend the class of admissible pdf's to mixtures and products of mixtures of strictly log concave and/or elliptically symmetric densities. For the present problem we will show that a suitable mapping \mathcal{T} is given by the following equations:

\bar{a}_{ij} = \mathcal{T}(a_{ij}) = \frac{\sum_{t=1}^{T-1} \alpha_t(i) a_{ij} b_j(O_{t+1}) \beta_{t+1}(j)}{\sum_{t=1}^{T-1} \alpha_t(i) \beta_t(i)}          (6)

\bar{c}_{jk} = \mathcal{T}(c_{jk}) = \frac{\sum_{t=1}^{T} \rho_t(j, k)}{\sum_{t=1}^{T} \alpha_t(j) \beta_t(j)}                                              (7)

\bar{\mu}_{jk} = \mathcal{T}(\mu_{jk}) = \frac{\sum_{t=1}^{T} \rho_t(j, k) O_t}{\sum_{t=1}^{T} \rho_t(j, k)}                                                (8)

and

\bar{U}_{jk} = \mathcal{T}(U_{jk}) = \frac{\sum_{t=1}^{T} \rho_t(j, k) (O_t - \bar{\mu}_{jk})(O_t - \bar{\mu}_{jk})'}{\sum_{t=1}^{T} \rho_t(j, k)}          (9)

for 1 \le i, j \le n, 1 \le k \le m, and 1 \le r, s \le d. In (6)-(9),

\rho_t(j, k) = \left[ \sum_{i=1}^{n} \alpha_{t-1}(i) a_{ij} \right] c_{jk} \mathcal{N}(O_t, \mu_{jk}, U_{jk}) \beta_t(j)                                   (10)

both for t = 1, where the bracketed sum reduces to a_{1j} by the initialization of \alpha_0, and for 1 < t \le T.
(For fixed k, \rho_t(j, k) is formally identical to \rho_t(j), as defined by Liporace.)
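The mapping \mathcal{T} of (6)-(10) likewise admits a compact implementation. Continuing the sketch above (again ours, unscaled, and purely illustrative):

def reestimate(O, A, c, mu, U):
    """One application of the mapping T of (6)-(10); an unscaled sketch."""
    O = np.asarray(O)
    T, n = len(O), A.shape[0]
    m = c.shape[1]
    alpha, beta, B, L = forward_backward(O, A, c, mu, U)

    # rho[t,j,k] of (10): weight of state j and mixture term k at time t.
    N = np.zeros((T, n, m))                 # N[t,j,k] = normal density of O_t under (j,k)
    for j in range(n):
        for k in range(m):
            N[:, j, k] = multivariate_normal.pdf(O, mean=mu[j, k], cov=U[j, k])
    pred = np.empty((T, n))
    pred[0] = A[0]                          # sum_i alpha_0(i) a_{ij} = a_{1j}
    pred[1:] = alpha[:-1] @ A               # sum_i alpha_{t-1}(i) a_{ij}
    rho = pred[:, :, None] * c[None] * N * beta[:, :, None]

    # (6): transition probabilities.
    num = np.einsum('ti,ij,tj,tj->ij', alpha[:-1], A, B[1:], beta[1:])
    A_new = num / (alpha[:-1] * beta[:-1]).sum(0)[:, None]
    # (7): mixture weights.
    c_new = rho.sum(0) / (alpha * beta).sum(0)[:, None]
    # (8)-(9): means and covariances, weighted by rho.
    w = rho.sum(0)                          # (n, m) total weights per component
    mu_new = np.einsum('tjk,td->jkd', rho, O) / w[:, :, None]
    diff = O[:, None, None, :] - mu_new[None]         # O_t - mu_bar_{jk}
    U_new = np.einsum('tjk,tjkr,tjks->jkrs', rho, diff, diff) / w[:, :, None, None]
    return A_new, c_new, mu_new, U_new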

Proof of the Formulas: Equations (6) and (7) for the reestimation of a_{ij} and c_{jk} follow directly from a theorem of Baum and Sell [3], because the likelihood function L_\lambda(O) given in (4) is a polynomial with nonnegative coefficients in the variables a_{ij}, c_{jk}, 1 \le i, j \le n, 1 \le k \le m. To prove (8) and (9) our strategy, following Liporace, is to define an appropriate auxiliary function Q(\lambda, \bar{\lambda}). This function will have the property that Q(\lambda, \bar{\lambda}) > Q(\lambda, \lambda) implies L_{\bar{\lambda}}(O) > L_\lambda(O). Further, as a function of \bar{\lambda} for any fixed \lambda, Q(\lambda, \bar{\lambda}) will have a unique global maximum given by (6)-(9).

As a first step to derive such a function we express the likelihood function as a sum over the set, \mathscr{S}, of all state sequences S = (s_1, s_2, \ldots, s_T):

L_\lambda(O) = \sum_{S \in \mathscr{S}} \prod_{t=1}^{T} a_{s_{t-1} s_t} b_{s_t}(O_t)                                                                      (11)

with the convention s_0 = 1.



Let us partition the likelihood function further by choosing a particular sequence, K = (k_1, k_2, \ldots, k_T), of mixture densities. As in the case of state sequences, we denote the set of all mixture sequences as \mathscr{K} = \{1, 2, \ldots, m\}^T. Thus for some particular K \in \mathscr{K} we can write the joint likelihood of O, S, and K as

L_\lambda(O, S, K) = \prod_{t=1}^{T} a_{s_{t-1} s_t} c_{s_t k_t} \mathcal{N}(O_t, \mu_{s_t k_t}, U_{s_t k_t}).                                             (12)

We have now succeeded in partitioning the likelihood function as

L_\lambda(O) = \sum_{S \in \mathscr{S}} \sum_{K \in \mathscr{K}} L_\lambda(O, S, K).                                                                       (13)

In view of the similarity of the representation (13) to that of L_\lambda in [6], we now define the auxiliary function

Q(\lambda, \bar{\lambda}) = \sum_{S} \sum_{K} L_\lambda(O, S, K) \log L_{\bar{\lambda}}(O, S, K).                                                          (14)

When the expressions for L_\lambda and L_{\bar{\lambda}} derived from (12) are substituted in (14) and terms are collected by state and mixture index, we get

Q(\lambda, \bar{\lambda}) = \sum_{t=1}^{T} \sum_{j=1}^{n} \sum_{k=1}^{m} \gamma_{t,j,k} \log\left[ \bar{c}_{jk} \mathcal{N}(O_t, \bar{\mu}_{jk}, \bar{U}_{jk}) \right] + \text{terms involving only } \bar{a}_{ij}                                  (15)

where \gamma_{t,j,k} \ge 0. The innermost summation in (15) is formally identical to that used by Liporace in his proof; therefore, the properties which he demonstrated for his auxiliary function with respect to \mu and U hold in our case as well, thus giving us (8) and (9). We may thus conclude that (5) is correct for \mathcal{T} defined by (6)-(9). Furthermore, the parameter separation made explicit in (12)-(15) allows us to apply the same algorithm to mixtures of strictly log concave densities and/or elliptically symmetric densities as treated by Liporace in [6].
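For completeness (this step is standard and only implicit in the argument above), the fact that increasing Q increases the likelihood follows from the concavity of the logarithm. Writing p(S, K) = L_\lambda(O, S, K) / L_\lambda(O), which by (13) sums to one, Jensen's inequality gives

\log \frac{L_{\bar{\lambda}}(O)}{L_\lambda(O)} = \log \sum_{S, K} p(S, K)\, \frac{L_{\bar{\lambda}}(O, S, K)}{L_\lambda(O, S, K)} \ge \sum_{S, K} p(S, K) \log \frac{L_{\bar{\lambda}}(O, S, K)}{L_\lambda(O, S, K)} = \frac{Q(\lambda, \bar{\lambda}) - Q(\lambda, \lambda)}{L_\lambda(O)},

so Q(\lambda, \bar{\lambda}) > Q(\lambda, \lambda) indeed implies L_{\bar{\lambda}}(O) > L_\lambda(O).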

DISCUSSION

In [6] Liporace notes that by setting a_{ij} = p_j, 1 \le j \le n, for all i, the special case of a single mixture can be treated. It is natural then to think of using a model with n clusters of m states, each with a single associated Gaussian density function, as a way of treating the Gaussian mixture problem considered here.

The transformation can be accomplished in the following way. First we expand the state space of our n-state model as shown in Fig. 1, in which we have added states j_1 through j_{m+1} for each state j in the original Markov chain. Associated with states j_1, j_2, \ldots, j_m are distinct Gaussian densities corresponding to the m terms of the jth Gaussian mixture in our initial formulation. The transitions exiting state j have probabilities equal to the corresponding mixture weights. State j_{m+1} is a distinguished state that is entered with probability 1 from the other new states, exits to state j' with probability a_{jj'}, and generates no observation in so doing. The transition matrix for this configuration can be written down by inspection. A large number of the entries in it will be zero or unity. As these are unaltered by (6) and (7), they need not be reestimated. Using this reconfiguration of the state diagram, Liporace's formulas can be used in case b_j(x) is any mixture of elliptically symmetric densities.

Fig. 1. Equivalent (m + 2)-state configuration for each state with an m-term Gaussian mixture.
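The construction just described can be made concrete. The following sketch (our reading of Fig. 1, with an index layout of our own choosing) assembles the expanded transition matrix from A and the mixture-weight matrix c:

def expand_states(A, c):
    """Transition matrix of the Fig. 1 construction (an illustrative reading).

    Each original state j becomes m + 2 states: a non-emitting entry state,
    m emitting states (one Gaussian term each), and a non-emitting collector
    state j_{m+1} that carries the original transitions A[j, :].
    """
    n, m = c.shape
    P = np.zeros((n * (m + 2), n * (m + 2)))
    for j in range(n):
        entry, coll = j * (m + 2), j * (m + 2) + m + 1
        for k in range(m):
            P[entry, entry + 1 + k] = c[j, k]   # choose mixture term k
            P[entry + 1 + k, coll] = 1.0        # every mixture state feeds the collector
        for j2 in range(n):
            P[coll, j2 * (m + 2)] = A[j, j2]    # original transition j -> j2
    return P

As the text notes, most entries of P are zero or one; only the blocks holding c and A carry free parameters.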

A variant on the Gaussian mixture theme results from using b_j(x) of the form of a product of mixtures,

b_j(x) = \prod_{\delta=1}^{D} \sum_{k=1}^{m} c_{j \delta k} \mathcal{N}(x^{(\delta)}, \mu_{j \delta k}, U_{j \delta k})                                    (16)

where the observation vector is partitioned into D sub-vectors x^{(1)}, \ldots, x^{(D)}. What we have considered so far is the special case of (16) for D = 1. From the structure of our derivation it is clear that for hidden Markov chains having densities of the form (16), reestimation formulas can be derived as before by solving \nabla_{\bar{\lambda}} Q(\lambda, \bar{\lambda}) = 0. Such solutions will yield results quite analogous to (6)-(10). Note that this case too can be represented as a reconfiguration of the state diagram.

One numerical difficulty which may be manifest in the methods described is the phenomenon noted by Nadas [7], in which one or more of the mean vectors converge to a particular observation while the corresponding covariance matrix approaches a singular matrix. Under these conditions, L_\lambda(O) \to \infty but the value of \lambda is meaningless. A practical, if unedifying, remedy for this difficulty is to try a different initial \lambda. Alternatively, one can drop the offending term from the mixture, since it is contributing only at one point of the observation space.

Finally, we call attention to two minor facets of these algorithms. First, for flexibility in modeling, the number of terms in each mixture may vary with state, so that m in (1) could as well be m_j. A similar dependence on dimension results if m in (16) is replaced by m_{j\delta}. In either case, the constraints on the mixture weights must be satisfied. Second, for realistic numbers of observations, for example, T \ge 5000, the reestimation formulas will underflow on any existing computer. The basic scaling mechanism described in [5] can be used to alleviate the problem, but it must be modified to account for the fact that the scaled \alpha\beta product will be missing the tth scale factor. To divide out the product of scale factors, the tth summand in both numerator and denominator of (7), (9), and (10) must be multiplied by the missing coefficient.

At this writing, numerical experiments based on Monte Carlo simulations and classification experiments using real speech signals are being conducted. We hope to report the results of these studies upon their completion.
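As an illustration of the scaling point, here is a sketch of the standard scaled forward pass of [5], continuing the earlier sketches (ours, not the paper's code); the closing comment marks where the missing tth scale factor enters the reestimation sums:

def scaled_forward(O, A, c, mu, U):
    """Scaled forward recursion in the spirit of [5]; an illustrative sketch."""
    T, n = len(O), A.shape[0]
    B = np.array([mixture_density(O[t], c, mu, U) for t in range(T)])
    alpha_hat = np.zeros((T, n))
    g = np.zeros(T)                          # g[t] = reciprocal of the step-t row sum
    for t in range(T):
        a = A[0] * B[0] if t == 0 else (alpha_hat[t - 1] @ A) * B[t]
        g[t] = 1.0 / a.sum()
        alpha_hat[t] = a * g[t]
    log_L = -np.log(g).sum()                 # log L_lambda(O), free of underflow
    # A scaled alpha at time t-1 combined with a scaled beta at time t carries
    # every scale factor except g[t]; multiplying the tth summand of the
    # reestimation sums by g[t], as described in the text, restores a factor
    # common to numerator and denominator, which then cancels.
    return alpha_hat, g, log_L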

REFERENCES

[1] L. E. Baum, "An inequality and associated maximization technique in statistical estimation for probabilistic functions of a Markov process," in Inequalities, vol. III, O. Shisha, Ed. New York: Academic, 1972, pp. 1-8.
[2] L. E. Baum, T. Petrie, G. Soules, and N. Weiss, "A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains," Ann. Math. Statist., vol. 41, pp. 164-171, 1970.
[3] L. E. Baum and G. R. Sell, "Growth transformations for functions on manifolds," Pac. J. Math., vol. 27, pp. 211-227, 1968.
[4] A. H. Gray and J. D. Markel, "Quantization and bit allocation in speech processing," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-24, pp. 459-473, Dec. 1976.
[5] S. E. Levinson, L. R. Rabiner, and M. M. Sondhi, "An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition," Bell Syst. Tech. J., vol. 62, pp. 1035-1074, Apr. 1983.
[6] L. R. Liporace, "Maximum likelihood estimation for multivariate observations of Markov sources," IEEE Trans. Inform. Theory, vol. IT-28, pp. 729-734, Sept. 1982.
[7] A. Nadas, "Hidden Markov chains, the forward-backward algorithm, and initial statistics," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, pp. 504-506, Apr. 1983.
[8] L. R. Rabiner, J. G. Wilpon, and J. G. Ackenhusen, "On the effects of varying analysis parameters on an LPC-based isolated word recognizer," Bell Syst. Tech. J., vol. 60, pp. 893-911, 1981.
[9] H. W. Sorenson and D. L. Alspach, "Recursive Bayesian estimation using Gaussian sums," Automatica, vol. 7, pp. 465-479, 1971.

