Partial Least Squares Regression (PLSR)

• Partial least squares (PLS) is a method for
constructing predictive models when the
predictors are many and highly collinear.

• Note that the emphasis is on predicting the responses and not necessarily on trying to understand the underlying relationship between the variables.

• When prediction is the goal and there is no practical need to limit the number of measured factors, PLS can be a useful tool.

• PLS was developed in the 1960s by Herman Wold as an econometric technique, but some of its most avid proponents (including Wold’s son Svante) are chemical engineers and chemometricians.

• Partial least squares regression (PLSR) is a
multivariate data analytical technique designed
to handle intercorrelated regressors.

• It is based on Herman Wold’s general PLS principle, in which complicated, multivariate systems analysis problems are solved by a sequence of simple least squares regressions.

How Does PLS Work?

• In principle, MLR can be used with very many predictors.

• However, if the number of predictors gets too large (for example, greater than the number of observations), you are likely to get a model that fits the sampled data perfectly but that will fail to predict new data well.

• This phenomenon is called over-fitting.

• In such cases, although there are many manifest predictors, there may be only a few underlying or latent factors that account for most of the variation in the response.

• The general idea of PLS is to try to extract these latent
factors, accounting for as much of the manifest predictor
variation as possible while modeling the responses well.

• For this reason, the acronym PLS has also been taken to mean “projection to latent structure.”

• The overall goal is to use the predictors to predict the
responses in the population.

• This is achieved indirectly by extracting latent variables T and U from sampled factors and responses, respectively.

• The extracted factors T (also referred to as X-scores) are used to predict the Y-scores U, and then the predicted Y-scores are used to construct predictions for the responses.
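
To make this concrete, here is a minimal sketch (assuming NumPy and scikit-learn are available; the data and variable names are illustrative only) of fitting a PLS model and reading off the X-scores T, the Y-scores U, and the resulting predictions:

```python
# Hedged sketch: fit a PLS model on synthetic data and inspect the
# extracted scores. Assumes scikit-learn's PLSRegression (NIPALS-based).
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))                 # 50 samples, 20 predictors
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=50)

pls = PLSRegression(n_components=3).fit(X, y)
T = pls.x_scores_        # latent factors T (X-scores)
U = pls.y_scores_        # Y-scores U
y_hat = pls.predict(X)   # predictions constructed via the scores
```
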

• This procedure actually covers various techniques, depending on which source of variation is considered most crucial.

• PCR is based on the spectral decomposition of XᵀX, where X is the matrix of predictor values;

• PLS is based on the singular value decomposition of XᵀY.
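
The contrast can be shown in a few lines of NumPy (a sketch on synthetic, column-centered data): the PCR directions depend on X alone, while the PLS weight directions come from the cross-product with Y:

```python
# PCR vs PLS starting points: eigendecomposition of XᵀX versus SVD of XᵀY.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 8))
Y = rng.normal(size=(40, 2))
Xc, Yc = X - X.mean(0), Y - Y.mean(0)        # column-center both blocks

# PCR: spectral decomposition of XᵀX; eigenvectors give the PC directions
evals, evecs = np.linalg.eigh(Xc.T @ Xc)

# PLS: singular value decomposition of XᵀY; the left singular vectors give
# the covariance-maximizing weight directions, which depend on Y
Wx, s, VyT = np.linalg.svd(Xc.T @ Yc)
```
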
• If the number of extracted factors is greater than or equal
to the rank of the sample factor space, then PLS is
equivalent to MLR.

• An important feature of the method is that usually far fewer factors are required.

• One approach to extracting the optimum number of factors is to construct the PLS model for a given number of factors on one set of data and then to test it on another, choosing the number of extracted factors for which the total prediction error is minimized.

• Alternatively, van der Voet (1994) suggests choosing the least number of extracted factors whose residuals are not significantly greater than those of the model with minimum error.

• If no convenient test set is available, then each observation can be used in turn as a test set; this is known as cross-validation (sketched below).
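
A minimal sketch of this selection procedure (scikit-learn assumed; synthetic data) is to score each candidate number of factors by cross-validated prediction error and keep the minimizer:

```python
# Choose the number of PLS factors by cross-validation.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 15))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=60)

# 5-fold CV here; cv=len(X) would give the leave-one-out variant described
# above, where each observation serves in turn as the test set.
errors = [-cross_val_score(PLSRegression(n_components=a), X, y,
                           cv=5, scoring="neg_mean_squared_error").mean()
          for a in range(1, 11)]
best_a = int(np.argmin(errors)) + 1   # factor count with minimum CV error
```
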
• PLSR is a bilinear regression method that extracts a small number of factors, tₐ, a = 1, 2, …, A, that are linear combinations of the K X-variables, and uses these factors as regressors for y.

• What is special about PLSR compared to principal component regression (PCR) is that the y-variable is used actively in determining how the regression factors tₐ are computed from X.

• Each PLSR factor tₐ is defined so that it describes as much as possible of the covariance between X and y remaining after the previous a − 1 factors have been estimated and subtracted.
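
This sequential, covariance-maximizing extraction can be sketched directly (an illustrative NIPALS-style PLS1 routine, not a production implementation):

```python
# Extract A PLS1 factors one at a time: each weight wₐ points along the
# direction of maximum remaining X–y covariance, and X and y are deflated
# before the next factor is computed.
import numpy as np

def pls1_factors(X, y, A):
    X, y = X - X.mean(0), y - y.mean()       # centered working copies
    T, W = [], []
    for _ in range(A):
        w = X.T @ y
        w /= np.linalg.norm(w)               # covariance-maximizing direction
        t = X @ w                            # factor (score) tₐ = X wₐ
        p = X.T @ t / (t @ t)                # X-loading
        q = (y @ t) / (t @ t)                # y-loading: regress y on tₐ
        X = X - np.outer(t, p)               # subtract what tₐ explains from X
        y = y - t * q                        # ... and from y
        T.append(t); W.append(w)
    return np.column_stack(T), np.column_stack(W)
```
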

• The purpose of using PLSR in multivariate calibration is to obtain good insight and good predictive ability at the same time.

• In classical stepwise multiple linear regression
(SMLR) the collinearity is handled by picking out
a small subset of individual, distinctly different X
variables from all the available X variables.

• This reduced subset is used as regressors for y, leaving the other X-variables unused.

• The estimated factors are often defined to be
orthogonal to one another.

• The model for regressions on estimated latent variables can be summarized as follows:

T = w(X)
X = p(T) + E
y = q(T) + f
y = q(w(X)) + f = b(X) + f

• In practice, the model parameters have to be
estimated from empirical data.

• Since the regression is intended for later prediction of y and X, the factor scores T are generally defined as functions of X: T = w(X).

• The major difference between calibration methods is how T is estimated.

• For instance, in PCR it is estimated as a series of eigenvector spectra for (X − 1x̄ᵀ)ᵀ(X − 1x̄ᵀ), etc.

• In PLSR, w() is defined as a sequence of X-versus-y covariances.

PLS-Regression (PLS-R)
PLS: A Powerful Alternative to PCR

• It is possible to obtain the same prediction results as PCR, but based on a smaller number of components, by allowing the y-data structure to intervene directly in the X-decomposition.

• This is done by condensing the two-stage PCR process into just one: PLS-R (Partial Least Squares Regression).

• Usually the term used is just PLS, which has also been
interpreted to signify Projection to Latent Structures.

• PLS claims to do the same job as PCR, only with fewer bilinear components.

PLS(X, Y): Initial Comparison with PCA(X), PCA(Y)

• In comparison with PCR, PLS uses the y-data structure, the y-variance, directly as a guiding hand in decomposing the X-matrix, so that the outcome constitutes an optimal regression, precisely in the strict prediction validation sense.

• A first approximation to understanding how the PLS approach works (though not entirely correct) is simply to view it as two simultaneous PCA analyses: PCA of X and PCA of Y.

• The equivalent PCA equations are shown below.

• Note how the score and loading complements in X are called T and P respectively (X also has an alternative W-loading in addition to the familiar P-loading), while these are called U and Q respectively for the Y-space.

X = Σₐ tₐpₐᵀ + E = TPᵀ + E  (a = 1, …, A)

Y = Σₐ uₐqₐᵀ + F = UQᵀ + F  (a = 1, …, A)
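
These bilinear models can be checked numerically on a fitted model (a sketch assuming scikit-learn, which centers the data internally; the Y-side deflation convention varies between implementations, so only the X-side identity is verified here):

```python
# Verify X = TPᵀ + E for a fitted PLS model: the residual E shrinks
# as more components A are extracted.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 6))
Y = rng.normal(size=(30, 2))

pls = PLSRegression(n_components=3, scale=False).fit(X, Y)
Xc = X - X.mean(0)                            # model is built on centered X
E = Xc - pls.x_scores_ @ pls.x_loadings_.T    # residual of X = TPᵀ + E
print(np.linalg.norm(E) / np.linalg.norm(Xc))
```
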
• However, PLS does not really perform two independent PCA analyses on the two spaces.

• On the contrary, PLS actively connects the X- and Y-spaces by specifying the u-score vector(s) to act as the starting points for (actually instead of) the t-score vectors in the X-space decomposition.

w = loading weights
p = X-loadings
q = Y-loadings

• Thus the starting proxy-t₁ is actually u₁ in the PLS-R method, thereby letting the Y-data structure directly guide the otherwise much more “PCA-like” decomposition of X.

• Subsequently, u₁ is substituted by t₁ at the relevant stage in the PLS algorithm in which the Y-space is decomposed.
• The crucial point is that it is u₁ (reflecting the Y-space structure) that first influences the X-decomposition leading to calculation of the X-loadings, but these are now termed “w” (for “loading-weights”).


• Then the X-space t-vectors are calculated, formally in a “standard” PCA fashion, but necessarily based on this newly calculated w-vector.
• This t-vector is now immediately used as the starting proxy-u₁ vector, i.e. instead of u₁, as described above but symmetrically with the X- and Y-spaces interchanged.

• By this means, the X-data structure also influences the “PCA(Y)-like” decomposition.
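
The score exchange described above is the inner loop of the NIPALS algorithm; here is a sketch for one PLS2 component (centered X and Y assumed; illustrative only):

```python
# One PLS2 component via the NIPALS score exchange: u seeds the X-side
# step (giving w and t), t seeds the Y-side step (giving q and a new u),
# iterated until the Y-scores stabilize.
import numpy as np

def pls2_component(X, Y, tol=1e-10, max_iter=500):
    u = Y[:, 0].copy()                       # a Y-column acts as the proxy-t start
    for _ in range(max_iter):
        w = X.T @ u
        w /= np.linalg.norm(w)               # loading weights w
        t = X @ w                            # X-score t
        q = Y.T @ t
        q /= np.linalg.norm(q)               # Y-loading q
        u_new = Y @ q                        # updated Y-score u
        if np.linalg.norm(u_new - u) < tol:
            break
        u = u_new
    p = X.T @ t / (t @ t)                    # X-loading p (used for deflation)
    return w, t, p, q, u_new
```
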
B = W(PᵀW)⁻¹Qᵀ

• Thus, what might at first sight appear as two
sets of independent PCA decompositions is in
fact based on these interchanged score vectors.

• In this way we have achieved the goal of modeling the X- and Y-spaces interdependently.

• PLS actively reduces the influence of large X-variations which do not correlate with Y.

• PCR is based on the spectral decomposition of XᵀX, where X is the matrix of variables, and PLS is based on the singular value decomposition of XᵀY.
• An alternative overview of PLS (indirect modeling) states that the overall goal is to use the variables to predict the responses in the population.

• This is achieved indirectly by extracting latent variables T and U from sampled variables and responses, respectively.

• The extracted factors T (also referred to as X-scores) are used to predict the Y-scores U, and then the predicted Y-scores are used to construct predictions for the responses.

Interpretation of PLS models

• In principle, PLS models are interpreted in much the same way as PCA and PCR models.

• Plotting the X- and the Y-loadings in the same plot allows you to study the inter-variable relationships, now also including the relationship between the X- and Y-variables.

• Since PLS focuses on Y, the Y-relevant
information is usually expected already in early
components.

• There are, however, situations where the variation related to Y is very subtle, so many components will be necessary to explain enough of Y.

Loadings (p) and loading weights (w)

• The P-loadings are very much like the well-known PCA loadings; they express the relationship between the raw data matrix X and its scores, T (in PLS these may be called PLS scores).

• These loadings may be interpreted in the same way as in PCA or PCR, so long as one is aware that the scores have been calculated by PLS.

• In many PLS applications P and W are quite similar. This means that the dominant structures in X “happen” to be directed more or less along the same directions as those with maximum correlation to Y.

• The loading weights, W, however, represent the effective loadings directly connected to building the sought-for regression relationship between X and Y.

• In PLS there is also a set of Y-loadings, Q, which are the regression coefficients from the Y-variables onto the scores, U.

• Q and W may be used to interpret relationships between the X- and Y-variables, and to interpret the patterns in the score plots related to these loadings.

(Figure: loading plot of non-spectral variables)
(Figure: loading plot of spectral variables)

• The fact that both P and W are important, however, is clear from the construction of the formal regression equation Y = XB from any specific PLS solution with A components.

• This B-matrix is calculated from:

B = W(PᵀW)⁻¹Qᵀ

• This B-matrix is often used for practical (numerical) prediction purposes.
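
A short sketch (scikit-learn assumed; synthetic data) of assembling B from the W, P and Q matrices and using it for prediction:

```python
# Build B = W(PᵀW)⁻¹Qᵀ from a fitted model and predict via Y = XB
# (on centered data, adding the response means back).
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 10))
Y = rng.normal(size=(40, 2))

pls = PLSRegression(n_components=4, scale=False).fit(X, Y)
W, P, Q = pls.x_weights_, pls.x_loadings_, pls.y_loadings_

B = W @ np.linalg.inv(P.T @ W) @ Q.T          # B = W(PᵀW)⁻¹Qᵀ
Y_hat = (X - X.mean(0)) @ B + Y.mean(0)       # numerical prediction via B
print(np.allclose(Y_hat, pls.predict(X)))     # should agree with the model
```
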
When to use which method?

• The PLS approach is easy to understand conceptually and is often preferred because it is direct and effective.

• PLS is said to produce results which are easier to interpret because they are less complex (using fewer components).

• Often PCR may give prediction errors as low as those of PLS, but almost invariably by using more PCs to do the job.

• PLS2 is a natural method to start with when
there are many Y-variables.

• You quickly get an overview of the basic patterns and see if there is significant correlation between the Y-variables.

• PLS2 may actually in a few cases even give better results if Y is collinear, because it utilises all the available information in Y.

• The drawback is that you may need different numbers of PCs for the different Y-variables, which you must keep in mind during interpretation and prediction (a comparison sketch follows below).
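
One way to examine this trade-off (a sketch on synthetic data, scikit-learn assumed) is to cross-validate a separate PLS1 model per Y-variable and compare the preferred component counts:

```python
# Check whether different Y-variables favour different numbers of components.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
X = rng.normal(size=(80, 12))
Y = np.column_stack([X[:, 0] + 0.1 * rng.normal(size=80),
                     X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=80)])

for j in range(Y.shape[1]):                   # one PLS1 model per response
    mse = [-cross_val_score(PLSRegression(n_components=a), X, Y[:, j],
                            cv=5, scoring="neg_mean_squared_error").mean()
           for a in range(1, 8)]
    print(f"y{j}: best number of components =", int(np.argmin(mse)) + 1)
```
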
Exercise: Interpretation of PLS (Jam)
