
Random Fields and Maximum Entropy


A Brief Tutorial on the FRAME Model and Gibbs Learning

Julian Antolin Camarena


Department of Physics and Astronomy

Wednesday, November 20, 2013


Coming Up

Markov Random Fields and Gibbs Measures


The Maximum Entropy Method
The FRAME Model
Maximum Satellite Likelihood Estimation


Random Fields

A stochastic process is a set of random variables $\{X_t : t \in T\}$, with $X_t$ taking values in a finite set $S_t$.
The joint probability distribution of the variables is
\[ p(x) = P(X_t = x_t,\; t \in T), \qquad x = (x_1, x_2, \ldots, x_n). \]
Let $T$ be the set of nodes of a graph $G$, and $N_t$ the neighborhood of $t$, i.e. the set of nodes $s$ for which $(t, s)$ share an edge in $G$. The process is said to be a Markov random field (MRF) if
i. $p(x) > 0$ for all $x$;
ii. for each $t$ and $x$,
\[ P(x_t \mid \{x_s,\; s \in G \setminus t\}) = P(x_t \mid \{x_s,\; s \in N_t\}). \]

The neighborhood $N_t$ of a node $t$ must satisfy the following properties:
i. A site is not its own neighbor: $t \notin N_t$.
ii. The neighborhood relation is reciprocal: $t \in N_s \iff s \in N_t$.


A clique is an ordered subset of nodes of the graph: $C \subset G$. Examples are
Single-site: $C_1 = \{t \mid t \in G\}$
Pair-site: $C_2 = \{\{t, s\} \mid s \in N_t,\; t \in G\}$
Triple-site: $C_3 = \{\{t, s, r\} \mid t, s, r \in G \text{ are neighbors of one another}\}$


In statistical physics the Boltzmann distribution is given by
\[ p(x) = \frac{1}{Z} e^{-H(x)}; \qquad Z = \sum_x e^{-H(x)}. \]
In the MRF literature the Boltzmann distribution is called the Gibbs measure or distribution.
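For concreteness, here is a minimal Python sketch (not from the talk) that computes the Gibbs measure of a three-node binary chain by brute-force enumeration; the pairwise energy and the coupling $J = 1$ are illustrative assumptions.

    import itertools
    import numpy as np

    J = 1.0  # coupling strength; an arbitrary illustrative choice

    def H(x):
        # Pairwise energy on a 3-node chain with neighbor pairs (0,1) and (1,2).
        return -J * (x[0] * x[1] + x[1] * x[2])

    states = list(itertools.product([-1, 1], repeat=3))
    weights = np.array([np.exp(-H(x)) for x in states])
    Z = weights.sum()            # partition function
    p = weights / Z              # the Gibbs measure over all 2^3 states
    for x, px in zip(states, p):
        print(x, round(px, 4))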


Hammersley-Clifford Theorem

Theorem
X is a Markov random field on G with respect to N if and only if
X is a Gibbs random field on G with respect to N .
The proof is omitted.
In plain English: every MRF distribution can be written as a Gibbs distribution.


A Gibbs distribution has the clique factorization property:
\[ H(x) = \sum_{c \in C} h_c(x); \]
that is, the sum is over the local energy functions of each clique.
A GRF is said to be homogeneous if $h_c(x)$ is independent of the relative position of the clique $c$, and isotropic if $h_c(x)$ is independent of the orientation of $c$.


Sometimes it is convenient to write $H(x)$ as a sum over cliques of equal size. For example, for cliques up to size two:
\[ H(x) = \sum_{t \in G} h_1(x_t) + \sum_{t \in G} \sum_{s \in N_t} h_2(x_t, x_s), \]
which is the form of a much celebrated model in statistical physics.


Ising Model

The Ising model of magnetism is a prototypical example of a Gibbs random field. The Ising Hamiltonian is
\[ H_I = -\sum_{\langle i, j \rangle} J_{ij}\, \sigma_i \sigma_j - \sum_j h_j \sigma_j, \]
where $\langle i, j \rangle$ denotes pairs $i, j$ in the same neighborhood.
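A minimal Metropolis sketch of the zero-field Ising model ($J = 1$, $h_j = 0$), included here for illustration; the lattice size, inverse temperature, and step count are arbitrary, and this is not the sampler used in the talk.

    import numpy as np

    rng = np.random.default_rng(0)
    N, beta, J = 64, 0.5, 1.0          # lattice size, inverse temperature, coupling (arbitrary)
    spins = rng.choice([-1, 1], size=(N, N))

    def local_field(s, i, j):
        # Sum of the four nearest neighbors with periodic boundaries.
        return (s[(i + 1) % N, j] + s[(i - 1) % N, j] +
                s[i, (j + 1) % N] + s[i, (j - 1) % N])

    for _ in range(200_000):
        i, j = rng.integers(N, size=2)
        dE = 2.0 * J * spins[i, j] * local_field(spins, i, j)   # energy change of a flip
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            spins[i, j] *= -1                                   # accept the flip

    print("magnetization per spin:", spins.mean())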


Ising Model Sample

Source: http://pages.physics.cornell.edu/sethna/StatMech/ComputerExercises/Fig/CoarsenedBy0.gif



MRF model textures

Source: Statistical Image Processing and Multidimensional Modeling by Paul Fieguth, Springer 2012

Maximum Entropy Method

The ME distribution is maximally noncommittal with respect to missing information and depends solely on the available data.
The resulting distribution is in the exponential family; more specifically, it is a Gibbs distribution.
Remember, it is not the true underlying distribution; it is simply the best distribution that can be obtained from the data and will, on average, yield the same statistics as the data.


To construct it:
i. The data are assumed to be a good estimate of the average value of the measured functions: measurement of $\phi_i(x)$ yields
\[ \langle \phi_i(x) \rangle = \sum_x \phi_i(x)\, p(x). \]
ii. Solve the optimization problem via Lagrange multipliers:
\[ \max_{p(x)} \left\{ -\sum_x p(x) \log p(x) \right\} \quad \text{subject to} \quad \sum_x p(x) = 1, \qquad \langle \phi_i(x) \rangle = \sum_x \phi_i(x)\, p(x). \]
iii. Solving, one has the ME distribution:
\[ p(x; \Lambda) \equiv p_\Lambda(x) = \frac{1}{Z}\, e^{-\sum_i \lambda_i^T \phi_i(x)}, \]
where $\Lambda = (\lambda_1, \lambda_2, \ldots, \lambda_N)$.

Z satisfies
\[ \frac{\partial \log Z}{\partial \lambda_i} = -\langle \phi_i(x) \rangle_p, \qquad \frac{\partial^2 \log Z}{\partial \lambda_i\, \partial \lambda_j} = \mathrm{cov}\{\phi_i(x), \phi_j(x)\}. \]
The second property says that the Hessian of $\log Z$ is positive semidefinite, so $\log Z$ is convex in $\Lambda$ and the log-likelihood is concave. Thus, given a set of consistent constraints, the Lagrange multipliers are unique.
The maximum likelihood estimate of the Lagrange multipliers satisfies
\[ \frac{d\lambda_n}{dt} = \langle \phi_n(x) \rangle_p - \bar{\phi}_n, \qquad n = 1, 2, \ldots, N. \]
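A minimal sketch of these multiplier dynamics on a toy enumerable state space; the two features, the target averages, and the step size are all made up for illustration.

    import numpy as np

    X = np.arange(8)                              # toy state space of 8 states
    Phi = np.stack([X / 7.0, (X / 7.0) ** 2])     # two features phi_1, phi_2 (made up)
    phi_bar = np.array([0.4, 0.25])               # "measured" feature averages (made up)

    lam = np.zeros(2)
    eta = 0.5                                     # step size for the discretized flow (arbitrary)
    for _ in range(2000):
        logits = -lam @ Phi                       # log of unnormalized p(x; Lambda)
        p = np.exp(logits - logits.max())
        p /= p.sum()                              # the ME distribution
        lam += eta * (Phi @ p - phi_bar)          # d lambda_n / dt = <phi_n>_p - phibar_n

    print("lambda:", lam)
    print("model feature means:", Phi @ p, "targets:", phi_bar)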


Overview

We now discuss the paper Filters, Random Fields and Maximum Entropy (FRAME): Towards a Unified Theory for Texture Modeling [International Journal of Computer Vision 27(2), 107-126 (1998)] by Zhu, Wu, and Mumford.
Given an input texture image,
a set of filters is selected from a general set of filters;
histograms of the filtered images are calculated, as they approximate the marginals of the true underlying distribution $f(I)$;
a maximum entropy distribution $p(I)$ is constructed, constrained by the marginal distributions of $f(I)$.


Filters

A filter is a system that performs mathematical operations on an input signal to enhance or reduce desired features of the input.
Linear space-invariant (LSI) filters are popular because they can be implemented with a convolution operation. Let $h$ be an LSI filter's impulse response (filter window/Green function) and $x$ an input signal; then the filtered signal is given by their convolution
\[ y(z) = \int h(z')\, x(z - z')\, dz' \]
or
\[ y_n = \sum_{k=-\infty}^{\infty} x_{n-k}\, h_k. \]
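A quick illustration of the discrete convolution above using scipy; the input image and the 3x3 averaging window are made up.

    import numpy as np
    from scipy.signal import convolve2d

    rng = np.random.default_rng(0)
    x = rng.random((16, 16))            # a made-up input "image"
    h = np.ones((3, 3)) / 9.0           # impulse response of a 3x3 averaging filter

    # y = x * h, the two-dimensional form of y_n = sum_k x_{n-k} h_k
    y = convolve2d(x, h, mode="same", boundary="symm")
    print(x.shape, "->", y.shape)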

Laplacian filter

Lena filtered with a Laplacian filter. Source: http://asura.iaigiri.com/OpenGL/Image/LaplacianFilter/LaplacianFilter.png
\[ L(x, y) = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \]


Gaussian filter

Source: Wikipedia
\[ G(x, y; x_0, y_0, \sigma_x, \sigma_y) = \frac{1}{2\pi \sigma_x \sigma_y}\, e^{-\left( (x - x_0)^2 / 2\sigma_x^2 + (y - y_0)^2 / 2\sigma_y^2 \right)} \]


Laplacian of Gaussian

Source: http://www.aishack.in/wp-content/uploads/2010/08/conv-laplacian-of-gaussian-result.jpg
\[ L_G(x, y; x_0, y_0, \sigma_x, \sigma_y) = L(x, y)\, G(x, y; x_0, y_0, \sigma_x, \sigma_y) \]
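A minimal sketch of how these kernels might be constructed discretely; the kernel size and sigma are arbitrary choices, and this is not the filter code of the FRAME paper.

    import numpy as np
    from scipy.signal import convolve2d

    lap = np.array([[0.0,  1.0, 0.0],
                    [1.0, -4.0, 1.0],
                    [0.0,  1.0, 0.0]])      # standard discrete 2D Laplacian stencil

    def gaussian_kernel(size=9, sigma=1.5):
        # size and sigma are arbitrary illustrative choices
        ax = np.arange(size) - size // 2
        xx, yy = np.meshgrid(ax, ax)
        g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
        return g / g.sum()

    # L_G: apply the Laplacian stencil to the Gaussian kernel
    log_kernel = convolve2d(gaussian_kernel(), lap, mode="same")
    print(log_kernel.shape)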


Model Assumptions and Definitions

The image $I$ is a random field on a discrete lattice and is a stationary process.
$I$ contains sufficiently many pixels for statistical analysis.
Filters are denoted by $F^{(k)}$, $k = 1, \ldots, K$, and the filtered image by $I^{(k)} = I \ast F^{(k)}$.
Further, since $I$ is stationary and the $F^{(k)}$ are LSI, the filtering $I^{(k)} = I \ast F^{(k)}$ is a convolution.


The histograms of $I^{(k)}$ are good approximations to the marginals $f^{(k)}(I)$. They are vectors and are denoted $H^{(k)}$.
Knowing a sufficient number of marginals, we can build the distribution.
The observed (input) image is denoted $I_{\mathrm{obs}}$. The observed filtered (by $F^{(k)}$) images are denoted by $I^{(k)}_{\mathrm{obs}}$ and the corresponding histograms by $H^{(k)}_{\mathrm{obs}}$. Similar notation is used for the synthesized quantities.
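A sketch of how a histogram $H^{(k)}$ might be estimated from a filtered image; the helper name filter_histogram, the bin count, and the value range are illustrative assumptions.

    import numpy as np
    from scipy.signal import convolve2d

    def filter_histogram(I, F, bins=15, vrange=(-3.0, 3.0)):
        # I^(k) = I * F^(k); its normalized histogram approximates the
        # marginal f^(k)(I). Bin count and value range are arbitrary choices.
        Ik = convolve2d(I, F, mode="same", boundary="symm")
        H, _ = np.histogram(Ik, bins=bins, range=vrange)
        return H / max(H.sum(), 1)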


The ME distribution depends upon the selected filter set $S_K$ and the Lagrange multipliers $\Lambda_K$:
\[ p(I; S_K, \Lambda_K) = \frac{1}{Z_K}\, e^{-\sum_{n=1}^{K} \langle \lambda^{(n)},\, H^{(n)} \rangle}. \]
We look for
\[ \Lambda_K = \operatorname*{argmax}_{\Lambda_K} \{\log p(I_{\mathrm{obs}}; S_K, \Lambda_K)\} = \operatorname*{argmax}_{\Lambda_K} \left\{ -\log Z_K - \sum_{n=1}^{K} \langle \lambda^{(n)},\, H^{(n)}_{\mathrm{obs}} \rangle \right\}, \]
which is equivalent to
\[ \frac{d\lambda^{(n)}}{dt} = \langle H^{(n)}_{\mathrm{syn}} \rangle_{p(I; S_K, \Lambda_K)} - H^{(n)}_{\mathrm{obs}}. \]

FRAME Algorithm

Input a texture image $I_{\mathrm{obs}}$.
Select a set of $K$ filters, $S_K = \{F^{(1)}, F^{(2)}, \ldots, F^{(K)}\}$.
Compute $H^{(k)}_{\mathrm{obs}}$, $k = 1, 2, \ldots, K$.
Initialize $\lambda^{(k)} \leftarrow 0$, $k = 1, 2, \ldots, K$.
Initialize $I_{\mathrm{syn}} \leftarrow$ white Gaussian noise texture.
While $\frac{1}{2} \| \langle H^{(k)}_{\mathrm{syn}} \rangle_p - H^{(k)}_{\mathrm{obs}} \|_{\ell_1} > \epsilon$ for some $k = 1, 2, \ldots, K$:
  Calculate $H^{(k)}_{\mathrm{syn}}$ from $I_{\mathrm{syn}}$; use it for $\langle H^{(k)}_{\mathrm{syn}} \rangle_p$.
  Update $\lambda^{(k)}$ by $\Delta\lambda^{(k)} = \langle H^{(k)}_{\mathrm{syn}} \rangle_p - H^{(k)}_{\mathrm{obs}}$. This updates $p$.
  Sample $p(I; S_K, \Lambda_K)$ (Gibbs, MCMC, etc.) to update $I_{\mathrm{syn}}$.
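A schematic Python sketch of this loop, reusing the filter_histogram helper above. The single-site Metropolis routine is a crude stand-in for the Gibbs/MCMC sampler of the paper, a single synthesized image stands in for the expectation $\langle H_{\mathrm{syn}} \rangle_p$, and all step sizes and tolerances are arbitrary.

    import numpy as np

    def energy(I, filters, lam):
        # U(I) = sum_k <lambda^(k), H^(k)(I)>, so p(I) is proportional to exp(-U(I)).
        return sum(l @ filter_histogram(I, F) for F, l in zip(filters, lam))

    def sample_p(I, filters, lam, n_steps=200, rng=None):
        # Crude single-site Metropolis stand-in for "sample p(I; S_K, Lambda_K)".
        rng = rng or np.random.default_rng()
        E = energy(I, filters, lam)
        for _ in range(n_steps):
            i, j = rng.integers(I.shape[0]), rng.integers(I.shape[1])
            old = I[i, j]
            I[i, j] = rng.normal()                  # propose a new pixel value
            E_new = energy(I, filters, lam)
            if np.log(rng.random()) < E - E_new:
                E = E_new                           # accept
            else:
                I[i, j] = old                       # reject and restore
        return I

    def frame(I_obs, filters, n_iter=50, eta=0.1, eps=0.01):
        # Sketch of the FRAME learning loop; eta and eps are arbitrary.
        H_obs = [filter_histogram(I_obs, F) for F in filters]
        lam = [np.zeros_like(h) for h in H_obs]
        rng = np.random.default_rng(0)
        I_syn = rng.normal(size=I_obs.shape)        # white Gaussian noise initialization
        for _ in range(n_iter):
            I_syn = sample_p(I_syn, filters, lam, rng=rng)
            H_syn = [filter_histogram(I_syn, F) for F in filters]
            if sum(0.5 * np.abs(hs - ho).sum() for hs, ho in zip(H_syn, H_obs)) < eps:
                break                               # all marginals matched to tolerance
            for k in range(len(filters)):
                lam[k] = lam[k] + eta * (H_syn[k] - H_obs[k])   # d lambda = <H_syn> - H_obs
        return lam, I_syn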


Filter Selection Algorithm

Let $B$ be a general filter bank, $S$ the set of selected filters, $I_{\mathrm{obs}}$ the observed texture image, and $I_{\mathrm{syn}}$ the synthesized texture image.
Initialize $k = 0$, $S \leftarrow \emptyset$, $p(I) = U[0, G-1]$, and $I_{\mathrm{syn}} \sim U[0, G-1]$. For $\alpha = 1, \ldots, |B|$ compute $H^{(\alpha)}_{\mathrm{obs}}$ from $I_{\mathrm{obs}}$.
Repeat:
  Calculate $H^{(\alpha)}_{\mathrm{syn}}$ from $I_{\mathrm{syn}}$.
  $d(\alpha) = \frac{1}{2} \| H^{(\alpha)}_{\mathrm{syn}} - H^{(\alpha)}_{\mathrm{obs}} \|$.
  Choose $F^{(k+1)}$ so that $d(k + 1) = \max\{d(\alpha) : F^{(\alpha)} \in B \setminus S\}$.
  $S \leftarrow S \cup \{F^{(k+1)}\}$, $k \leftarrow k + 1$.
  Update $p(I)$ and $I_{\mathrm{syn}}$ with the FRAME algorithm.
Until $d(\alpha) < \epsilon$.
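A sketch of this greedy selection loop, reusing filter_histogram and frame from the sketches above; the stopping tolerance is arbitrary.

    import numpy as np

    def select_filters(I_obs, bank, eps=0.05):
        chosen = []                                  # indices into the bank B
        rng = np.random.default_rng(1)
        I_syn = rng.uniform(size=I_obs.shape)        # a sample of the uniform model
        H_obs = [filter_histogram(I_obs, F) for F in bank]
        while True:
            d = np.array([0.5 * np.abs(filter_histogram(I_syn, F) - H_obs[a]).sum()
                          for a, F in enumerate(bank)])
            d[chosen] = -np.inf                      # restrict the max to B \ S
            best = int(np.argmax(d))
            if d[best] < eps:                        # until d(alpha) < eps
                break
            chosen.append(best)                      # S <- S U {F^(k+1)}
            _, I_syn = frame(I_obs, [bank[a] for a in chosen])  # update p and I_syn
        return chosen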

Reported Results: K = 0, 1, 2, 3, 6 filters


Reported Results: histograms and Lagrange multipliers for subband images


Graphically, we have


Overview

We now give a brief review of a follow-up paper by Song Chun Zhu and Xiuwen Liu, Learning in Gibbsian Fields: How Fast and How Accurate Can It Be? [IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7), July 2002].
The authors identify two major issues in Gibbsian learning:
1. the efficiency of likelihood functions, and
2. the variance in approximating partition functions using Monte Carlo integration.


This paper proposes three algorithms for learning Gibbs distribution parameters (Gibbsian learning):
1. a maximum partial likelihood estimator,
2. a maximum patch likelihood estimator, and
3. a maximum satellite likelihood estimator.
They find that these algorithms have different benefits and drawbacks, but generally outperform standard MCMC Gibbsian learning. They claim that the third algorithm offers the best trade-off between accuracy and speed of estimation.


The Common Framework of Gibbsian Learning

Let $I_S$ be an input texture image, $I_{\partial S}$ its boundary conditions, and $S$ the underlying lattice.
The feature statistics are $h(I_S \mid I_{\partial S})$.
The Gibbs distribution is
\[ p(I_S \mid I_{\partial S}; \beta) = \frac{1}{Z(I_{\partial S}, \beta)}\, e^{-\langle \beta,\, h(I_S \mid I_{\partial S}) \rangle}. \]
We wish to estimate
\[ \beta^* = \operatorname*{argmax}_{\beta} \{\mathcal{G}(\beta)\}, \quad \text{with} \quad \mathcal{G}(\beta) = \sum_{i=1}^{M} \log p(I_{S_i} \mid I_{\partial S_i}; \beta). \]
Here $S_i$, $i = 1, 2, \ldots, M$, indicates that the lattice $S$ has been segmented into $M$ regions.
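To make the conditional Gibbs likelihood concrete, here is a toy enumeration (not from the paper): a three-site 1D strip with fixed boundary values and a single scalar feature, for which $Z(I_{\partial S}, \beta)$ can be computed exactly.

    import itertools
    import numpy as np

    def h(x, left, right):
        # One toy feature: sum of nearest-neighbor products along a short 1D
        # strip, including the two fixed boundary values (an assumption).
        full = (left, *x, right)
        return sum(full[i] * full[i + 1] for i in range(len(full) - 1))

    def log_p(x, left, right, beta):
        # log p(I_S | I_dS; beta) by brute-force enumeration of the 2^3
        # interior configurations; beta is a scalar here for simplicity.
        states = itertools.product([-1, 1], repeat=len(x))
        logZ = np.log(sum(np.exp(-beta * h(s, left, right)) for s in states))
        return -beta * h(x, left, right) - logZ

    print(log_p((1, 1, -1), left=1, right=-1, beta=0.7))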

The Common Framework of Gibbsian Learning

The authors identify two choices that need to be made in the Gibbsian learning problem:
1. the number, sizes, and shapes of the foreground patches $S_i$ and corresponding backgrounds $\partial S_i$, $i = 1, 2, \ldots, M$;
2. the reference models used to estimate the partition functions.


Choice 1: The foreground and background

The foreground pixels $S_i$ and corresponding backgrounds $\partial S_i$, $i = 1, 2, \ldots, M$, are shown in light and dark shading, respectively. (a)-(c) are $m \times m$ patches. In one extreme, the log-likelihood $\mathcal{G}(\beta)$ in (a) chooses $m = N - 2w$ and is used in MCMCMLE methods. The other extreme in (c) chooses $m = 1$, and $\mathcal{G}$ is the pseudo-likelihood used in MPLE. The midpoint is shown in (b), where $\mathcal{G}$ is the log-patch-likelihood. The choice in (d) has $M = 1$ irregular patch, $\Lambda_1$, with pixels randomly selected; the rest of the lattice is the background $\partial \Lambda_1$, and $\mathcal{G}$ is the log-partial-likelihood. In (b) and (c) patches are allowed to overlap.

Choice 2: Reference model for estimation of Z

Now we need to estimate $Z_\beta(I^{\mathrm{obs}}_{\partial S_i})$ for each $S_i$, $i = 1, \ldots, M$, by Monte Carlo integration using a reference model at $\beta = \beta_0$:
\[ Z_\beta(I^{\mathrm{obs}}_{\partial S_i}) \approx \frac{Z_{\beta_0}(I^{\mathrm{obs}}_{\partial S_i})}{L} \sum_{j=1}^{L} e^{-\langle \beta - \beta_0,\, h(I^{\mathrm{syn}}_{ij} \mid I^{\mathrm{obs}}_{\partial S_i}) \rangle}, \]
where $\{I^{\mathrm{syn}}_{ij}\}_{j=1}^{L}$ are typical samples of the reference model. The log-likelihood can then be estimated iteratively by gradient descent. (In the accompanying figure, the dashed line shows the inverse Fisher information and the solid curves show the variance in a sequence of models approaching the true parameter value.)
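A sketch of this importance-sampling estimate on a toy model small enough to check against the exact partition function; the feature, the beta values, and the sample count are arbitrary.

    import itertools
    import numpy as np

    rng = np.random.default_rng(2)
    states = np.array(list(itertools.product([-1, 1], repeat=10)), dtype=float)
    h = states.sum(axis=1)                    # a single scalar feature h(I)

    def Z(beta):
        return np.exp(-beta * h).sum()        # exact partition function (enumerable toy)

    beta0, beta, L = 0.2, 0.5, 5000           # reference model, target, sample count (arbitrary)
    p0 = np.exp(-beta0 * h)
    p0 /= p0.sum()
    samples = rng.choice(len(h), size=L, p=p0)            # typical samples of p(I; beta0)
    Z_hat = Z(beta0) / L * np.exp(-(beta - beta0) * h[samples]).sum()
    print("estimate:", Z_hat, "exact:", Z(beta))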

Algorithm 1: Maximizing partial likelihood (MPLE)

We choose $S$ as in the figure by randomly selecting 1/3 of the pixels as foreground. The log-partial-likelihood is
\[ \mathcal{G}(\beta) = \log p(I^{\mathrm{obs}}_{S_1} \mid I^{\mathrm{obs}}_{S \setminus S_1}; \beta). \]
Maximizing $\mathcal{G}$ by gradient descent, we update $\beta$ iteratively. This is the same setup as in FRAME, although MPLE trades off accuracy (lower Fisher information) for speed (roughly a factor of 25) in a better way than FRAME. This is mainly because FRAME synthesizes images under nontypical conditions (initializing $I_{\mathrm{syn}}$ to noise), whereas MPLE always has typical boundary conditions.
Algorithm 2: Maximizing patch likelihood (MPaLE)

The foreground is a set of overlapping patches from $I^{\mathrm{obs}}_S$, with a hole $S_i$ dug in each patch as in the figure. The patch likelihood is
\[ \mathcal{G}(\beta) = \sum_{i=1}^{M} \log p(I^{\mathrm{obs}}_{S_i} \mid I^{\mathrm{obs}}_{S \setminus S_i}; \beta). \]
Maximizing $\mathcal{G}$ by gradient descent, we update $\beta$ iteratively. Algorithms 1 and 2 have similar performance.
Algorithm 3: Maximizing satellite likelihood (MSLE)

In contrast to algorithms 1 and 2, MSLE does not synthesize images online (within the learning algorithm), which is computationally intensive.
We select a set of reference models in the exponential family: $R = \{p(I; \beta_j) : j = 1, 2, \ldots, s\}$. Each model is sampled to synthesize a large image. The log-satellite-likelihood is given by
\[ \mathcal{G}(\beta) = \sum_{j=1}^{s} \mathcal{G}^{(j)}(\beta; \beta_j); \qquad \mathcal{G}^{(j)}(\beta; \beta_j) = \sum_{i=1}^{M} \left[ -\langle \beta,\, h(I^{\mathrm{obs}}_{S_i} \mid I^{\mathrm{obs}}_{S \setminus S_i}) \rangle - \log Z^{(j)}_i \right], \]
where
\[ Z^{(j)}_i = \frac{Z_{\beta_j}(I^{\mathrm{obs}}_{\partial S_i})}{L} \sum_{\ell=1}^{L} e^{-\langle \beta - \beta_j,\, h(I^{\mathrm{syn}}_{ij\ell} \mid I^{\mathrm{obs}}_{\partial S_i}) \rangle} \]
is estimated by Monte Carlo integration. In the above, the index $1 \le \ell \le L$ runs over the different realizations of the reference models; $1 \le j \le s$ runs over the different models; and $1 \le i \le M$ runs over the foreground lattices. Maximizing $\mathcal{G}$ by gradient descent, we update $\beta$ iteratively.
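A sketch of evaluating one satellite term $\mathcal{G}^{(j)}$ from precomputed reference statistics; the feature vectors are abstract arrays, a single $Z$ estimate is shared across patches for brevity, and every name here is illustrative rather than the authors' code.

    import numpy as np

    def satellite_term(beta, beta_j, logZ_j, h_obs_list, h_syn):
        # G^(j)(beta; beta_j) = sum_i [ -<beta, h_i^obs> - log Z_i^(j) ], with
        # Z^(j) estimated from L precomputed samples of p(I; beta_j); h_syn
        # has shape (L, d), and logZ_j stands for log Z_{beta_j}.
        logw = -(h_syn @ (beta - beta_j))                 # log importance weights
        logZ = logZ_j + np.log(np.mean(np.exp(logw)))     # MC estimate of log Z_beta
        return sum(-(beta @ h_obs) - logZ for h_obs in h_obs_list)

    # Toy usage with made-up statistics:
    rng = np.random.default_rng(3)
    d, L, M = 4, 1000, 3
    h_obs_list = [rng.random(d) for _ in range(M)]
    h_syn = rng.random((L, d))
    print(satellite_term(rng.random(d), rng.random(d), 0.0, h_obs_list, h_syn))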

Reported results: FRAME used as truth


Results

Top row: the difference between the two MSLE-synthesized images is that the result in (b) ignores all boundary conditions, whereas (c) uses observed boundary conditions.
Bottom row: $\beta$ was learned with MSLE for different hole sizes: (a) $m = 2$; (b) $m = 6$; and (c) $m = 9$.

Summary of Algorithms

Group 1. In (a), ML estimators (FRAME, MPLE, MPaLE, MCMCMLE) generate a sequence of satellites $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ online.
Group 2. In (c), the maximum pseudo-likelihood estimator uses a uniform model $\beta_0 = 0$ to estimate any model and thus has large variance.
Group 3. In (b), the MSLEs use a general set of satellites which are precomputed and sampled offline. To save time, one can compute the difference $d(j) = |h(I^{\mathrm{syn}}_j) - h(I^{\mathrm{obs}})|$; the index values that return the smallest $s$ values correspond to satellites that are closer to the truth.

THANK YOU!
