Transforms Revisited

Transforms are used to
Change the mean function so that it is linear.
Adjust for non-constant variance.
Fix non-Normal residuals.
You won't always solve all three problems (or any of them, for that matter).

You've already studied
log transforms and
square-root transforms.

Now we're going to consider a more general class of transforms and discuss strategies for finding the best transform.
Strategy
First, transform Y.
If that doesn't work, transform the predictors, but not Y.
If that improves things but not perfectly, see if you can now transform Y.
There are also approaches that consider transforming ALL variables simultaneously.
Keep in Mind
Don't remove outliers, influential points, etc. until the transforming is done.
These points might not really be so outlying once the transform is done.
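A minimal simulated sketch of why this matters (made-up data, not the ozone data; all names here are illustrative). With a right-skewed response, the largest standardized residuals on the raw scale often shrink once the response is logged, so points flagged before transforming may not be flagged afterwards.

```r
# Simulated data: log(y) is linear in x, so y itself is right-skewed.
set.seed(4)
x <- runif(60, 0, 4)
y <- exp(1 + 0.8 * x + rnorm(60, sd = 0.4))

# Standardized residuals before and after transforming the response
raw    <- rstandard(lm(y ~ x))
logged <- rstandard(lm(log(y) ~ x))

# Count of points that look outlying (|standardized residual| > 2) on each scale
flagged <- c(raw = sum(abs(raw) > 2), logged = sum(abs(logged) > 2))
flagged
```

Comparing the two counts, and which observations are flagged, shows whether the "outliers" are a property of the data or of the scale.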
Keep in Mind
Simple is better than complicated.
If you are expected to interpret the parameters, then transformations might make this impossible.

Transform Y
Basic idea: What if
$$E(Y \mid X) \neq \beta_1 x_1 + \dots + \beta_p x_p$$
but instead:
$$E(Y \mid X) = g(\beta_1 x_1 + \dots + \beta_p x_p)$$
so we need to discover g().
If we knew g(), we could invert it:
$$g^{-1}(E(Y \mid X)) = g^{-1}(g(\beta_1 x_1 + \dots + \beta_p x_p))$$
$$Y_{\text{new}} = \beta_1 x_1 + \dots + \beta_p x_p$$

Transform Y: 2 approaches
Inverse Response Plots
Box-Cox Method
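The idea of inverting g() can be checked on simulated data. The sketch below assumes a known g(u) = u^3 and made-up coefficients (none of this comes from the ozone data): applying g inverse (the cube root) to the response recovers a model that is linear in the predictors.

```r
# Simulated illustration: every name and number here is an assumption.
set.seed(1)
n   <- 200
x1  <- rnorm(n, mean = 5)
x2  <- rnorm(n, mean = 5)
eta <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n, sd = 0.2)  # linear part plus noise
y   <- eta^3                                         # g(u) = u^3

# Apply g^{-1}(y) = y^(1/3) and fit the linear model on the new scale
fit_inv <- lm(I(y^(1/3)) ~ x1 + x2)
b <- unname(coef(fit_inv))
b   # estimates should be near the simulated values 1, 0.5, 0.3
```

In practice g() is unknown, which is why the two approaches listed above are needed to estimate it.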
Inverse Response Plots
A technique for guessing g().
If the predictors have an elliptically symmetric distribution (joint Normal is one example), then plot y-hat against y.
The shape of the resulting curve gives you an idea of the shape of g inverse.
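A rough base-R sketch of what an inverse response plot estimates, on simulated data with Normal predictors and a true cube-root inverse (all names and values are assumptions, and this is not the alr3 implementation). For each candidate lambda it regresses y-hat on y^lambda and records the residual sum of squares; the lambda with the smallest RSS is the suggested power.

```r
# Simulated data: the true g is the cube, so g inverse is the cube root.
set.seed(2)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- (6 + x1 + 0.5 * x2 + rnorm(n, sd = 0.2))^3

m    <- lm(y ~ x1 + x2)
yhat <- fitted(m)

# RSS from regressing y-hat on y^lambda (log(y) when lambda = 0)
irp_rss <- function(lambda) {
  ty <- if (lambda == 0) log(y) else y^lambda
  sum(resid(lm(yhat ~ ty))^2)
}

# Best power on a search interval, compared with inverse, log, and no transform
best    <- optimize(irp_rss, interval = c(-2, 2))$minimum
lambdas <- c(best, -1, 0, 1)
irp     <- data.frame(lambda = lambdas, RSS = sapply(lambdas, irp_rss))
irp
```

The resulting table has the same layout as a lambda/RSS summary: the first row is the estimated power, followed by the three standard reference powers.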
> m1 = lm(ozone ~ temperature + pressure, data = ozonetext)
> plot(m1)
A plot of the predictors shows that their joint distribution is roughly elliptical.
> library(alr3)  # invResPlot() is also provided by the car package
> invResPlot(m1)
      lambda      RSS
1  0.3658881 1989.771
2 -1.0000000 3412.912
3  0.0000000 2082.377
4  1.0000000 2196.992
(lambda = 0 refers to the log transform)
Note that the log transform isn't too different from optimal.
Suggests that the best transform is
$$Y_{\text{new}} = Y^{0.3658881}$$
> ozone.t1 = transform(ozonetext, ozone.t = ozone^(.37))
> m2 = lm(ozone.t ~ temperature + pressure, data = ozone.t1)
> plot(m2)
[Figure: diagnostic plots, original vs. transformed response]
On the whole, the transformation improved the validity of the model.
But interpretation may now be quite difficult.
Still, improved validity means we can better trust p-values, confidence intervals, and prediction intervals.
> summary(m2)
Call:
lm(formula = ozone.t ~ temperature + pressure, data = ozone.t1)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.4004629  0.1774149  -2.257   0.0256 *
temperature  0.0423812  0.0027663  15.321   <2e-16 ***
pressure    -0.0001918  0.0010937  -0.175   0.8610
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3794 on 138 degrees of freedom
Multiple R-squared: 0.6688, Adjusted R-squared: 0.664
F-statistic: 139.3 on 2 and 138 DF, p-value: < 2.2e-16
Another approach: Box-Cox
Choose a transform of Y, $\psi_\lambda(Y)$, such that the distribution of the transformed Y is closer to Normal.
(Useful when the distribution of the variable to be transformed is not Normal.)
$$\psi_\lambda(Y) = \begin{cases} \mathrm{gm}(Y)^{1-\lambda}\,(Y^{\lambda} - 1)/\lambda & \text{for } \lambda \neq 0 \\ \mathrm{gm}(Y)\,\log(Y) & \text{for } \lambda = 0 \end{cases}$$
where gm(Y) is the geometric mean of Y:
$$\mathrm{gm}(Y) = \Big(\prod_{i=1}^{n} Y_i\Big)^{1/n}$$
To find lambda, use maximum likelihood estimation.
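The scaled Box-Cox transform can be written directly in base R. The sketch below uses simulated data where the log transform is correct by construction; the function names `gm` and `boxcox_scaled` are made up for illustration and are not part of MASS or alr3. Because of the geometric-mean scaling, residual sums of squares are comparable across lambda, so minimizing RSS gives the maximum likelihood estimate.

```r
# Geometric mean, computed on the log scale to avoid overflow
gm <- function(y) exp(mean(log(y)))

# Scaled Box-Cox transform; requires y > 0
boxcox_scaled <- function(y, lambda) {
  if (lambda == 0) {
    gm(y) * log(y)
  } else {
    gm(y)^(1 - lambda) * (y^lambda - 1) / lambda
  }
}

# Simulated data: log(y) is linear in x, so the true lambda is 0
set.seed(3)
x <- runif(100, 1, 10)
y <- exp(0.5 + 0.3 * x + rnorm(100, sd = 0.2))

# Profile over lambda: refit the regression for each transformed response
rss <- function(lambda) sum(resid(lm(boxcox_scaled(y, lambda) ~ x))^2)
lambda_hat <- optimize(rss, interval = c(-2, 2))$minimum
lambda_hat   # should land near 0, the log transform
```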
> library(MASS)
> boxcox(m1)
or
> library(alr3)
> summary(powerTransform(y~x1+x2,data=))
[Figure: boxcox(m1) plot, annotated with 1/3]
> summary(powerTransform(m1))
bcPower Transformation to Normality
   Est.Power Std.Err. Wald Lower Bound Wald Upper Bound
Y1    0.2343   0.0866           0.0646           0.4041

Likelihood ratio tests about transformation parameters
                            LRT df         pval
LR test, lambda = (0)  7.568201  1 5.940706e-03
LR test, lambda = (1) 66.558671  1 3.330669e-16

In fact, the optimal transform is .23, which is smaller than the previous .37. However, .37 is within the confidence interval of 0.0646 to 0.4041, which confirms our previous transformation using lambda = .37.

LR test, lambda = (0)
Null: lambda = 0 (log transform)
Alt: lambda <> 0
Small p-value, so we reject. Thus, it is best not to do a log transform.

LR test, lambda = (1)
Null: no transform (lambda = 1)
Alt: do a transform
Reject. We need a transform.
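The interval and p-values in the output above can be reproduced by hand from the reported estimate, standard error, and LRT statistics. This is a small arithmetic sketch, assuming the usual Normal-based Wald interval and a chi-squared reference distribution with 1 degree of freedom; the variable names are illustrative.

```r
# Reported estimate and standard error for lambda
est <- 0.2343
se  <- 0.0866

# 95% Wald interval: estimate +/- z * SE
wald_ci <- est + c(-1, 1) * qnorm(0.975) * se
round(wald_ci, 3)   # close to the reported 0.0646 and 0.4041

# Likelihood ratio statistics referred to a chi-squared with 1 df
p_log  <- pchisq(7.568201,  df = 1, lower.tail = FALSE)  # null: lambda = 0
p_none <- pchisq(66.558671, df = 1, lower.tail = FALSE)  # null: lambda = 1
c(p_log = p_log, p_none = p_none)
```

Both nulls are rejected at the usual levels, and .37 sits inside the Wald interval while 0 and 1 do not.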
Transform Predictors
You can use Box-Cox to transform the predictors when Y is NOT transformed.
Then, if necessary, use an inverse response plot to transform Y.
In this approach, we find a transformation that makes the joint distribution of all the predictors multivariate Normal (or as close to it as we can get).
Once that's done, we try to find a transform for Y.
Then we see if it helps.
Do these predictors look like they come from a Normal distribution? (probably not)
> library(alr3)
> summary(powerTransform(cbind(temperature, height) ~ 1, data = o2.mini))
box.cox Transformations to Multinormality
            Est.Power Std.Err. Wald(Power=0) Wald(Power=1)
temperature    1.1383   0.3246        3.5070         0.426
height        18.9126   4.5176        4.1864         3.965

                                 LRT df      p.value
LR test, all lambda equal 0 25.50600  2 2.893633e-06

Best lambda could be within two Std. Errors of the estimate.
For temp, use a lambda between 0.5 and 1.7, rounding generously.
Temp: try square-root transform or no transform.
Height: Transform to a high power, which is very unusual and probably not helpful. But let's try the 20th power anyway.
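The two-standard-error rule and the Wald columns can be recomputed from the reported estimates and standard errors. A short arithmetic sketch (the vector names are illustrative):

```r
# Reported estimated powers and standard errors for the two predictors
est <- c(temperature = 1.1383, height = 18.9126)
se  <- c(temperature = 0.3246, height = 4.5176)

# Rough plausible range for each lambda: estimate +/- 2 standard errors
ranges <- rbind(lower = est - 2 * se, upper = est + 2 * se)
round(ranges, 2)

# Wald statistics for the nulls lambda = 0 and lambda = 1
wald <- rbind(power0 = est / se, power1 = (est - 1) / se)
round(wald, 3)
```

For temperature the range covers both 0.5 (square root) and 1 (no transform), while for height it stays far above 1, which is why such a high power is suggested.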
> o2.minit = transform(o2.mini, temp.t = sqrt(temperature), height.t = height^20)
> plot(o2.minit)
[Figures: residuals with no transform; transformed predictors]
Not much better, so look at transforming Y.
> m.t1 = lm(ozone ~ temp.t + height.t, data = o2.minit)
> plot(m.t1)
> invResPlot(m.t1)
Once again, $Y \to Y^{1/3}$ looks best.
> o2.minit2 = transform(o2.minit, ozone.t = ozone^(1/3))
> m.t2 = lm(ozone.t ~ temp.t + height.t, data = o2.minit2)
> plot(m.t2)
A third approach is to use Box-Cox to transform the predictors and the response simultaneously.
Use Box-Cox to transform ALL at once.
> summary(powerTransform(with(o2.mini, cbind(ozone, height, temperature)) ~ 1))
            Est.Power Std.Err. Wald(Power=0) Wald(Power=1)
ozone          0.2503   0.0888        2.8178       -8.4416
height        18.8959   4.4542        4.2422        4.0177
temperature    1.1590   0.2661        4.3550        0.5976

                                 LRT df      p.value
LR test, all lambda equal 1 83.53574  3 0.000000e+00

This is consistent with the 1/3 power of ozone, a 20th power for height, and no change (raise to the 1 power) for temp.
2(p + 1)/n = 2 × 3/141 ≈ 0.04 is the cutoff for "big" leverage.
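A minimal sketch of applying this leverage cutoff in R, using simulated data with the same n = 141 and p = 2 predictors (the data and variable names are made up for illustration):

```r
# Simulated data with n = 141 observations and p = 2 predictors
set.seed(5)
n <- 141
p <- 2
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + d$x1 + d$x2 + rnorm(n)
m <- lm(y ~ x1 + x2, data = d)

# Rule-of-thumb cutoff: twice the average leverage, (p + 1)/n
cutoff <- 2 * (p + 1) / n
h <- hatvalues(m)

# Indices of observations whose leverage exceeds the cutoff
big <- which(h > cutoff)
big
```

The leverages always sum to p + 1, so the cutoff is simply twice their average.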
