Partial Least Squares Regression (PLSR)

• Partial least squares (PLS) is a method for
constructing predictive models when the
predictors are many and highly collinear.

• Note that the emphasis is on predicting the responses and not necessarily on trying to understand the underlying relationship between the variables.

• When prediction is the goal and there is no practical need to limit the number of measured factors, PLS can be a useful tool.

• PLS was developed in the 1960s by Herman Wold as an econometric technique, but some of its most avid proponents (including Wold’s son Svante) are chemical engineers and chemometricians.

• Partial least squares regression (PLSR) is a
multivariate data analytical technique designed
to handle intercorrelated regressors.

• It is based on Herman Wold’s general PLS principle, in which complicated, multivariate systems analysis problems are solved by a sequence of simple least squares regressions.

How Does PLS Work?

• In principle, MLR can be used with very many predictors.

• However, if the number of predictors gets too large (for example, greater than the number of observations), you are likely to get a model that fits the sampled data perfectly but that will fail to predict new data well.

• This phenomenon is called over-fitting.

• In such cases, although there are many manifest predictors, there may be only a few underlying or latent factors that account for most of the variation in the response.

• The general idea of PLS is to try to extract these latent
factors, accounting for as much of the manifest predictor
variation as possible while modeling the responses well.

• For this reason, the acronym PLS has also been taken to mean “projection to latent structure.”

• The overall goal is to use the predictors to predict the
responses in the population.

• This is achieved indirectly by extracting latent variables T and U from sampled factors and responses, respectively.

• The extracted factors T (also referred to as X-scores) are used to predict the Y-scores U, and then the predicted Y-scores are used to construct predictions for the responses.
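
To make this concrete, here is a minimal sketch (assuming NumPy and scikit-learn are available; the data and variable names are illustrative only) of fitting a PLS model and reading off the X-scores T, the Y-scores U, and the resulting predictions:

```python
# Hedged sketch: fit a PLS model on synthetic data and inspect the
# extracted scores. Assumes scikit-learn's PLSRegression (NIPALS-based).
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))                 # 50 samples, 20 predictors
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=50)

pls = PLSRegression(n_components=3).fit(X, y)
T = pls.x_scores_        # latent factors T (X-scores)
U = pls.y_scores_        # Y-scores U
y_hat = pls.predict(X)   # predictions constructed via the scores
```
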

• This procedure actually covers various techniques, depending on which source of variation is considered most crucial.

• PCR is based on the spectral decomposition of XᵀX, where X is the matrix of predictor values;

• PLS is based on the singular value decomposition of XᵀY.
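
The contrast can be shown in a few lines of NumPy (a sketch on synthetic, column-centered data): the PCR directions depend on X alone, while the PLS weight directions come from the cross-product with Y:

```python
# PCR vs PLS starting points: eigendecomposition of XᵀX versus SVD of XᵀY.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 8))
Y = rng.normal(size=(40, 2))
Xc, Yc = X - X.mean(0), Y - Y.mean(0)        # column-center both blocks

# PCR: spectral decomposition of XᵀX; eigenvectors give the PC directions
evals, evecs = np.linalg.eigh(Xc.T @ Xc)

# PLS: singular value decomposition of XᵀY; the left singular vectors give
# the covariance-maximizing weight directions, which depend on Y
Wx, s, VyT = np.linalg.svd(Xc.T @ Yc)
```
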
• If the number of extracted factors is greater than or equal
to the rank of the sample factor space, then PLS is
equivalent to MLR.

• An important feature of the method is that usually far fewer factors are required.

• One approach to extracting the optimum number of factors is to construct the PLS model for a given number of factors on one set of data and then to test it on another, choosing the number of extracted factors for which the total prediction error is minimized.

• Alternatively, van der Voet (1994) suggests choosing the least number of extracted factors whose residuals are not significantly greater than those of the model with minimum error.

• If no convenient test set is available, then each observation can be used in turn as a test set; this is known as cross-validation (sketched below).
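
A minimal sketch of this selection procedure (scikit-learn assumed; synthetic data) is to score each candidate number of factors by cross-validated prediction error and keep the minimizer:

```python
# Choose the number of PLS factors by cross-validation.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 15))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=60)

# 5-fold CV here; cv=len(X) would give the leave-one-out variant described
# above, where each observation serves in turn as the test set.
errors = [-cross_val_score(PLSRegression(n_components=a), X, y,
                           cv=5, scoring="neg_mean_squared_error").mean()
          for a in range(1, 11)]
best_a = int(np.argmin(errors)) + 1   # factor count with minimum CV error
```
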
• PLSR is a bilinear regression method that extracts a small number of factors, tₐ, a = 1, 2, …, A, that are linear combinations of the K X-variables, and uses these factors as regressors for y.

• What is special about PLSR compared to principal component regression (PCR) is that the y-variable is used actively in determining how the regression factors tₐ are computed from X.

• Each PLSR factor tₐ is defined so that it describes as much as possible of the covariance between X and y remaining after the previous a − 1 factors have been estimated and subtracted.
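
This sequential, covariance-maximizing extraction can be sketched directly (an illustrative NIPALS-style PLS1 routine, not a production implementation):

```python
# Extract A PLS1 factors one at a time: each weight wₐ points along the
# direction of maximum remaining X–y covariance, and X and y are deflated
# before the next factor is computed.
import numpy as np

def pls1_factors(X, y, A):
    X, y = X - X.mean(0), y - y.mean()       # centered working copies
    T, W = [], []
    for _ in range(A):
        w = X.T @ y
        w /= np.linalg.norm(w)               # covariance-maximizing direction
        t = X @ w                            # factor (score) tₐ = X wₐ
        p = X.T @ t / (t @ t)                # X-loading
        q = (y @ t) / (t @ t)                # y-loading: regress y on tₐ
        X = X - np.outer(t, p)               # subtract what tₐ explains from X
        y = y - t * q                        # ... and from y
        T.append(t); W.append(w)
    return np.column_stack(T), np.column_stack(W)
```
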

• The purpose of using PLSR in multivariate calibration is to obtain good insight and good predictive ability at the same time.

• In classical stepwise multiple linear regression
(SMLR) the collinearity is handled by picking out
a small subset of individual, distinctly different X
variables from all the available X variables.

• This reduced subset is used as regressors for y, leaving the other X-variables unused.

• The estimated factors are often defined to be
orthogonal to one another.

• The model for regressions on estimated latent variables can be summarized as follows:

T = w(X)
X = p(T) + E
y = q(T) + f
y = q(w(X)) + f = b(X) + f

• In practice, the model parameters have to be
estimated from empirical data.

• Since the regression is intended for later prediction of y and X, the factor scores T are generally defined as functions of X: T = w(X).

• The major difference between calibration methods is how T is estimated.

• For instance, in PCR it is estimated as a series of eigenvector spectra for (X − 1x̄ᵀ)ᵀ(X − 1x̄ᵀ), etc.

• In PLSR, w() is defined as a sequence of X-versus-y covariances.

PLS-Regression (PLS-R)
PLS: A Powerful Alternative to PCR

• It is possible to obtain the same prediction results as PCR, but based on a smaller number of components, by allowing the y-data structure to intervene directly in the X-decomposition.

• This is done by condensing the two-stage PCR process into just one: PLS-R (Partial Least Squares Regression).

• Usually the term used is just PLS, which has also been
interpreted to signify Projection to Latent Structures.

• PLS claims to do the same job as PCR, only with fewer bilinear components.

PLS(X, Y): Initial Comparison with PCA(X), PCA(Y)

• In comparison with PCR, PLS uses the y-data structure, the y-variance, directly as a guiding hand in decomposing the X-matrix, so that the outcome constitutes an optimal regression, precisely in the strict prediction validation sense.

• A first approximation to understanding how the PLS approach works (though not entirely correct) is simply to view it as two simultaneous PCA analyses: PCA of X and PCA of Y.

• The equivalent PCA equations are shown below.

• Note how the score and loading complements in X are called T and P respectively (X also has an alternative W-loading in addition to the familiar P-loading), while these are called U and Q respectively for the Y-space.

X = Σₐ tₐpₐᵀ + E = TPᵀ + E  (a = 1, …, A)

Y = Σₐ uₐqₐᵀ + F = UQᵀ + F  (a = 1, …, A)
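
These bilinear models can be checked numerically on a fitted model (a sketch assuming scikit-learn, which centers the data internally; the Y-side deflation convention varies between implementations, so only the X-side identity is verified here):

```python
# Verify X = TPᵀ + E for a fitted PLS model: the residual E shrinks
# as more components A are extracted.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 6))
Y = rng.normal(size=(30, 2))

pls = PLSRegression(n_components=3, scale=False).fit(X, Y)
Xc = X - X.mean(0)                            # model is built on centered X
E = Xc - pls.x_scores_ @ pls.x_loadings_.T    # residual of X = TPᵀ + E
print(np.linalg.norm(E) / np.linalg.norm(Xc))
```
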
• However, PLS does not really perform two independent PCA analyses on the two spaces.

• On the contrary, PLS actively connects the X- and Y-spaces by specifying the u-score vector(s) to act as the starting points for (actually instead of) the t-score vectors in the X-space decomposition.

w = loading weights
p = X-loadings
q = Y-loadings

• Thus the starting proxy-t₁ is actually u₁ in the PLS-R method, thereby letting the Y-data structure directly guide the otherwise much more “PCA-like” decomposition of X.

• Subsequently, u₁ is substituted by t₁ at the relevant stage in the PLS algorithm in which the Y-space is decomposed.
• The crucial point is that it is u₁ (reflecting the Y-space structure) that first influences the X-decomposition leading to calculation of the X-loadings, but these are now termed “w” (for “loading-weights”).


• Then the X-space t-vectors are calculated, formally in a “standard” PCA fashion, but necessarily based on this newly calculated w-vector.
• This t-vector is now immediately used as the starting proxy-u₁ vector, i.e. instead of u₁, as described above but symmetrically with the X- and Y-spaces interchanged.

• By this means, the X-data structure also influences the “PCA(Y)-like” decomposition.
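
The score exchange described above is the inner loop of the NIPALS algorithm; here is a sketch for one PLS2 component (centered X and Y assumed; illustrative only):

```python
# One PLS2 component via the NIPALS score exchange: u seeds the X-side
# step (giving w and t), t seeds the Y-side step (giving q and a new u),
# iterated until the Y-scores stabilize.
import numpy as np

def pls2_component(X, Y, tol=1e-10, max_iter=500):
    u = Y[:, 0].copy()                       # a Y-column acts as the proxy-t start
    for _ in range(max_iter):
        w = X.T @ u
        w /= np.linalg.norm(w)               # loading weights w
        t = X @ w                            # X-score t
        q = Y.T @ t
        q /= np.linalg.norm(q)               # Y-loading q
        u_new = Y @ q                        # updated Y-score u
        if np.linalg.norm(u_new - u) < tol:
            break
        u = u_new
    p = X.T @ t / (t @ t)                    # X-loading p (used for deflation)
    return w, t, p, q, u_new
```
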
B = W(PᵀW)⁻¹Qᵀ

• Thus, what might at first sight appear as two
sets of independent PCA decompositions is in
fact based on these interchanged score vectors.

• In this way we have achieved the goal of modeling the X- and Y-spaces interdependently.

• PLS actively reduces the influence of large X-variations which do not correlate with Y.

• PCR is based on the spectral decomposition of XᵀX, where X is the matrix of variables, and PLS is based on the singular value decomposition of XᵀY.
• An alternative overview of PLS (indirect modeling) states that the overall goal is to use the variables to predict the responses in the population.

• This is achieved indirectly by extracting latent variables T and U from sampled variables and responses, respectively.

• The extracted factors T (also referred to as X-scores) are used to predict the Y-scores U, and then the predicted Y-scores are used to construct predictions for the responses.

Interpretation of PLS models

• In principle, PLS models are interpreted in much the same way as PCA and PCR models.

• Plotting the X- and the Y-loadings in the same plot allows you to study the inter-variable relationships, now also including the relationship between the X- and Y-variables.

• Since PLS focuses on Y, the Y-relevant
information is usually expected already in early
components.

• There are, however, situations where the variation related to Y is very subtle, so many components will be necessary to explain enough of Y.

Loadings (p) and loading weights (w)

• The P-loadings are very much like the well-known PCA loadings; they express the relationship between the raw data matrix X and its scores, T (in PLS these may be called PLS scores).

• These loadings may be interpreted in the same way as in PCA or PCR, so long as one is aware that the scores have been calculated by PLS.

• In many PLS applications P and W are quite similar. This means that the dominant structures in X “happen” to be directed more or less along the same directions as those with maximum correlation to Y.

• The loading weights, W, however, represent the effective loadings directly connected to building the sought-for regression relationship between X and Y.

• In PLS there is also a set of Y-loadings, Q, which are the regression coefficients from the Y-variables onto the scores, U.

• Q and W may be used to interpret relationships between the X- and Y-variables, and to interpret the patterns in the score plots related to these loadings.

(Figure: loading plot of non-spectral variables)
(Figure: loading plot of spectral variables)

• The fact that both P and W are important, however, is clear from the construction of the formal regression equation Y = XB from any specific PLS solution with A components.

• This B-matrix is calculated from:

B = W(PᵀW)⁻¹Qᵀ

• This B-matrix is often used for practical (numerical) prediction purposes.
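
A short sketch (scikit-learn assumed; synthetic data) of assembling B from the W, P and Q matrices and using it for prediction:

```python
# Build B = W(PᵀW)⁻¹Qᵀ from a fitted model and predict via Y = XB
# (on centered data, adding the response means back).
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 10))
Y = rng.normal(size=(40, 2))

pls = PLSRegression(n_components=4, scale=False).fit(X, Y)
W, P, Q = pls.x_weights_, pls.x_loadings_, pls.y_loadings_

B = W @ np.linalg.inv(P.T @ W) @ Q.T          # B = W(PᵀW)⁻¹Qᵀ
Y_hat = (X - X.mean(0)) @ B + Y.mean(0)       # numerical prediction via B
print(np.allclose(Y_hat, pls.predict(X)))     # should agree with the model
```
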
When to use which method?

• The PLS approach is easy to understand conceptually and is often preferred because it is direct and effective.

• PLS is said to produce results which are easier to interpret because they are less complex (using fewer components).

• Often PCR may give prediction errors as low as those of PLS, but almost invariably by using more PCs to do the job.

• PLS2 is a natural method to start with when
there are many Y-variables.

• You quickly get an overview of the basic patterns and see if there is significant correlation between the Y-variables.

• PLS2 may actually in a few cases even give better results if Y is collinear, because it utilises all the available information in Y.

• The drawback is that you may need different numbers of PCs for the different Y-variables, which you must keep in mind during interpretation and prediction (a comparison sketch follows below).
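
One way to examine this trade-off (a sketch on synthetic data, scikit-learn assumed) is to cross-validate a separate PLS1 model per Y-variable and compare the preferred component counts:

```python
# Check whether different Y-variables favour different numbers of components.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
X = rng.normal(size=(80, 12))
Y = np.column_stack([X[:, 0] + 0.1 * rng.normal(size=80),
                     X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=80)])

for j in range(Y.shape[1]):                   # one PLS1 model per response
    mse = [-cross_val_score(PLSRegression(n_components=a), X, Y[:, j],
                            cv=5, scoring="neg_mean_squared_error").mean()
           for a in range(1, 8)]
    print(f"y{j}: best number of components =", int(np.argmin(mse)) + 1)
```
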
Exercise: Interpretation of PLS (Jam)
