
Transforms Revisited

Transforms are used to:
- Change the mean function so that it is linear.
- Adjust for a non-constant variance problem.
- Fix non-Normal residuals.

You won't always solve all three problems, though (or any of them, for that matter).

You've already studied log transforms and square-root transforms.

Now we're going to consider a more general class of transforms and discuss strategies for finding the best transform.

Strategy
- First, transform Y.
- If that doesn't work, transform the predictors, but not Y.
- If that improves things but not perfectly, see if you can now transform Y.

There are also approaches that consider transforming ALL variables simultaneously.

Keep in mind
- Don't remove outliers, influential points, etc. until the transforming is done. These points might not really be so outlying once the transform is done.
- Simple is better than complicated.
- If you are expected to interpret the parameters, then transformations might make this impossible.

Transform Y
Basic idea: What if
$$E(Y \mid X) \neq \beta_1 x_1 + \cdots + \beta_p x_p,$$
but instead
$$E(Y \mid X) = g(\beta_1 x_1 + \cdots + \beta_p x_p)?$$
Then we need to discover $g()$.

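Transform Y: 2 approaches

$$E(Y \mid X) = g(\beta_1 x_1 + \cdots + \beta_p x_p)$$

If we knew $g()$, we could invert it:
$$g^{-1}\big(E(Y \mid X)\big) = g^{-1}\big(g(\beta_1 x_1 + \cdots + \beta_p x_p)\big) = \beta_1 x_1 + \cdots + \beta_p x_p$$
and fit
$$Y_{new} = \beta_1 x_1 + \cdots + \beta_p x_p, \quad \text{where } Y_{new} = g^{-1}(Y).$$

Two approaches:
- Inverse Response Plots
- Box-Cox Method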
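As a minimal sketch of the inversion idea, the R code below simulates data where $g(u) = u^3$ (a hypothetical choice), so $g^{-1}(y) = y^{1/3}$. The errors are added on the linear-predictor scale, so $Y_{new} = Y^{1/3}$ follows a linear model exactly and regressing it on the predictors recovers the $\beta$s. The data are simulated, not the ozone data.

```r
# Simulated illustration with a hypothetical g(u) = u^3.
set.seed(1)
n   <- 200
x1  <- runif(n, 1, 3)
x2  <- runif(n, 1, 3)
eta <- 1 + 0.5 * x1 + 0.5 * x2          # linear predictor, true betas (1, 0.5, 0.5)
y   <- (eta + rnorm(n, sd = 0.1))^3     # Y = g(eta + error); Y itself is not linear in x

y_new   <- y^(1/3)                      # Y_new = g^{-1}(Y), linear in x1 and x2
fit_new <- lm(y_new ~ x1 + x2)          # ordinary linear model for the transformed response

round(coef(fit_new), 2)                 # estimates close to the true (1, 0.5, 0.5)
```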

Inverse Response Plots

A technique for guessing $g()$.
If the predictors have an elliptically symmetric distribution (a joint Normal distribution is one example), then plot $\hat{y}$ against $y$.
The shape of the resulting curve gives you an idea of the shape of $g^{-1}$.
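The idea can be sketched in a few lines of base R. This is an illustration under assumptions, not the alr3 implementation: the data are simulated with Normal predictors and a true $g^{-1}(y) = y^{1/3}$, the helper names `power_tf` and `irp_rss` are hypothetical, and the power is chosen by a simple grid search that minimizes the residual sum of squares of $\hat{y}$ regressed on a power of $y$.

```r
# Sketch of an inverse response plot fit (simulated data, hypothetical helpers).
set.seed(2)
n  <- 500
x1 <- rnorm(n)                                # Normal predictors: elliptically symmetric
x2 <- rnorm(n)
y  <- (10 + x1 + x2 + rnorm(n, sd = 0.5))^3   # true g^{-1}(y) = y^(1/3)

yhat <- fitted(lm(y ~ x1 + x2))               # fitted values from the untransformed model

# Box-Cox style power; lambda = 0 is the log
power_tf <- function(y, lambda) if (lambda == 0) log(y) else (y^lambda - 1) / lambda

# RSS from fitting yhat = b0 + b1 * power_tf(y, lambda)
irp_rss <- function(lambda) sum(resid(lm(yhat ~ power_tf(y, lambda)))^2)

lambdas <- seq(-100, 200) / 100               # grid from -1 to 2 in steps of 0.01
rss     <- sapply(lambdas, irp_rss)
lam_hat <- lambdas[which.min(rss)]

# Table in the same spirit as the lambda/RSS table shown below
tab <- data.frame(lambda = c(lam_hat, -1, 0, 1),
                  RSS    = sapply(c(lam_hat, -1, 0, 1), irp_rss))
tab
```

The table has the same layout as the one `invResPlot` prints (the estimate plus lambda = -1, 0, 1), and the estimated power should land near the true value of 1/3.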

> m1=lm(ozone~temperature+pressure,data=ozonetext)
> plot(m1)

A plot of the predictors shows that their joint distribution is roughly elliptical.

> library(alr3)
> invResPlot(m1)

      lambda      RSS
1  0.3658881 1989.771
2 -1.0000000 3412.912
3  0.0000000 2082.377
4  1.0000000 2196.992

Note: the log transform isn't too different from the optimal one.

This suggests that the best transform is $Y_{new} = Y^{0.3659}$ (lambda = 0 refers to the log transform).

> ozone.t1=transform(ozonetext,ozone.t = ozone^(.37) )


> m2=lm(ozone.t~temperature+pressure,data=ozone.t1)
> plot(m2)

[Figure: diagnostic plots for the original and transformed models]

On the whole, the transformation improved the validity of the model.

But interpretation may now be quite difficult.

Still, improved validity means we can better trust the p-values, confidence intervals, and prediction intervals.
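One practical consequence: because $y \mapsto y^{\lambda}$ with $\lambda > 0$ is increasing, a prediction interval computed on the transformed scale can be mapped back to the original units by raising its endpoints to $1/\lambda$, and the interval keeps its coverage. A sketch with simulated data (not the ozone data) and $\lambda = 0.37$:

```r
# Back-transforming prediction intervals (simulated data, lambda = 0.37 assumed).
set.seed(3)
lambda <- 0.37
x  <- runif(100, 0, 10)
yt <- 2 + 0.3 * x + rnorm(100, sd = 0.3)   # response on the Y^lambda scale
y  <- yt^(1 / lambda)                      # response in original units

fit  <- lm(I(y^lambda) ~ x)                # model fitted on the transformed scale
new  <- data.frame(x = c(2, 5, 8))
pi_t <- predict(fit, newdata = new, interval = "prediction")  # columns fit, lwr, upr
pi_y <- pi_t^(1 / lambda)                  # back-transform every column
pi_y
```

Because the interval is symmetric on the $Y^{\lambda}$ scale, it becomes asymmetric (longer on the upper side) after back-transforming. Assuming symmetric errors on the transformed scale, the back-transformed `fit` column estimates the median of Y rather than its mean.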

> summary(m2)
Call:
lm(formula = ozone.t ~ temperature + pressure, data = ozone.t1)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.4004629  0.1774149  -2.257   0.0256 *
temperature  0.0423812  0.0027663  15.321   <2e-16 ***
pressure    -0.0001918  0.0010937  -0.175   0.8610
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3794 on 138 degrees of freedom
Multiple R-squared: 0.6688,  Adjusted R-squared: 0.664
F-statistic: 139.3 on 2 and 138 DF,  p-value: < 2.2e-16

Another approach: Box-Cox

Choose a transform of Y, $\psi_M(Y, \lambda)$, such that the distribution of the transformed Y is closer to Normal.

(Useful when the distribution of the variable to be transformed is not Normal.)

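where
$$\psi_M(Y, \lambda) = \begin{cases} \mathrm{gm}(Y)^{1-\lambda}\,\dfrac{Y^{\lambda} - 1}{\lambda} & \text{for } \lambda \neq 0 \\[1ex] \mathrm{gm}(Y)\,\log(Y) & \text{for } \lambda = 0 \end{cases}$$
and gm(Y) is the geometric mean of Y:
$$\mathrm{gm}(Y) = \left(\prod_{i=1}^{n} Y_i\right)^{1/n}$$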
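The definition can be written directly in R as a small sketch; `gm` and `psi_M` are hypothetical helper names, and the values of `y` are made up. It checks two properties of the formula: lambda = 1 only shifts Y by 1, and lambda near 0 approaches the log case.

```r
# Modified (geometric-mean scaled) Box-Cox transform, as defined above.
gm <- function(y) exp(mean(log(y)))        # geometric mean, same as prod(y)^(1/n)

psi_M <- function(y, lambda) {
  stopifnot(all(y > 0))                    # Box-Cox requires a positive variable
  if (lambda == 0) {
    gm(y) * log(y)                         # lambda = 0 case: scaled log
  } else {
    gm(y)^(1 - lambda) * (y^lambda - 1) / lambda
  }
}

y <- c(2, 5, 11, 40, 120)
psi_M(y, 1)      # lambda = 1 gives Y - 1: no change of shape
psi_M(y, 1e-6)   # very close to ...
psi_M(y, 0)      # ... the lambda = 0 (log) case, its limit
```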

To find lambda, use maximum likelihood estimation of lambda:
> library(MASS)
> boxcox(m1)
or
> library(alr3)
> summary(powerTransform(y~x1+x2,data=))
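A minimal sketch of that maximum likelihood search, under assumptions: simulated data with $Y^{1/3}$ linear in the predictors, hypothetical helper names, and a grid of lambdas from -1 to 2. Because $\psi_M$ includes the geometric-mean scaling, the profile log-likelihood is $-\tfrac{n}{2}\log \mathrm{RSS}(\lambda)$ up to a constant, so fits for different lambdas can be compared directly.

```r
# Profile-likelihood estimate of the Box-Cox lambda (simulated data).
set.seed(4)
n  <- 200
x1 <- runif(n, 0, 2)
x2 <- runif(n, 0, 2)
y  <- (3 + x1 + x2 + rnorm(n, sd = 0.3))^3    # Y^(1/3) is linear with Normal errors

gm    <- function(y) exp(mean(log(y)))
psi_M <- function(y, lambda) {
  if (lambda == 0) gm(y) * log(y) else gm(y)^(1 - lambda) * (y^lambda - 1) / lambda
}

# Profile log-likelihood, up to an additive constant
loglik <- function(lambda) {
  rss <- sum(resid(lm(psi_M(y, lambda) ~ x1 + x2))^2)
  -n / 2 * log(rss)
}

lambdas <- seq(-100, 200) / 100
ll      <- sapply(lambdas, loglik)
lam_hat <- lambdas[which.max(ll)]

# Approximate 95% interval: lambdas within qchisq(0.95, 1) / 2 of the maximum
ci <- range(lambdas[ll > max(ll) - qchisq(0.95, 1) / 2])
c(lambda_hat = lam_hat, lower = ci[1], upper = ci[2])
```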

> boxcox(m1)
[Figure: Box-Cox log-likelihood plotted against lambda, annotated 1/3]

> summary(powerTransform(m1))
bcPower Transformation to Normality

   Est.Power Std.Err. Wald Lower Bound Wald Upper Bound
Y1    0.2343   0.0866           0.0646           0.4041

Likelihood ratio tests about transformation parameters
                            LRT df         pval
LR test, lambda = (0)  7.568201  1 5.940706e-03
LR test, lambda = (1) 66.558671  1 3.330669e-16

This is consistent with our previous transformation using lambda = 0.37. In fact, the optimal transform is 0.23, which is smaller than the previous 0.37. However, 0.37 is within the confidence interval of 0.0646 to 0.4041.

Reading the likelihood ratio tests about the transformation parameters:
- LR test, lambda = (0): Null: lambda = 0 (log transform); Alt: lambda ≠ 0. Small p-value, so we reject. Thus, it is best not to do a log transform.
- LR test, lambda = (1): Null: no transform (lambda = 1); Alt: do a transform. Small p-value, so we reject. We need a transform.
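The p-values in those tests follow from the printed LRT statistics: under the null, each statistic is approximately chi-squared with 1 degree of freedom (one lambda is tested). A quick check in base R, where `lrt_pvalue` is a hypothetical helper name:

```r
# Reproduce the LR test p-values from the printed LRT statistics.
lrt_pvalue <- function(lrt, df) pchisq(lrt, df = df, lower.tail = FALSE)

p_log  <- lrt_pvalue(7.568201, 1)    # H0: lambda = 0 (log transform)
p_none <- lrt_pvalue(66.558671, 1)   # H0: lambda = 1 (no transform)
c(p_log = p_log, p_none = p_none)

# Each LRT statistic is 2 * (loglik(lambda_hat) - loglik(lambda_0)).
```

The printed p-value for lambda = 1 (3.330669e-16) equals 1.5 times machine epsilon, which suggests it was computed as 1 minus a probability very close to 1; `lower.tail = FALSE` gives a slightly different but equally tiny value. Either way, both nulls are rejected.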

Transform Predictors

You can use Box-Cox to transform the predictors when Y is NOT transformed.

Then, if necessary, use an inverse response plot to transform Y.

In this approach, we find a transformation that makes the joint distribution of all the predictors multivariate Normal (or as close to it as we can get).

Once that's done, we try to find a transform for Y.

Then we see if it helps.
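The multivariate version can be sketched by maximizing the likelihood of a multivariate Normal model for the transformed columns. This is a sketch under assumptions, not alr3's code: two simulated positive variables (one that becomes Normal after a log, one after a square root), hypothetical helper names, and `optim`'s default Nelder-Mead search starting from lambda = 1 for each column.

```r
# Multivariate Box-Cox sketch: powers that make the columns jointly Normal.
set.seed(5)
n  <- 1000
z1 <- rnorm(n)
z2 <- 0.6 * z1 + 0.8 * rnorm(n)       # correlated standard Normals
X  <- cbind(a = exp(z1),              # log makes this Normal: ideal power 0
            b = (z2 + 5)^2)           # square root makes this Normal: ideal power 1/2

gm    <- function(y) exp(mean(log(y)))
psi_M <- function(y, lambda) {
  if (abs(lambda) < 1e-8) gm(y) * log(y) else gm(y)^(1 - lambda) * (y^lambda - 1) / lambda
}

# Negative profile log-likelihood: n/2 * log det of the ML covariance matrix
# of the transformed columns (the gm scaling absorbs each column's Jacobian).
neg_loglik <- function(lambdas) {
  Z <- sapply(seq_along(lambdas), function(j) psi_M(X[, j], lambdas[j]))
  S <- cov(Z) * (n - 1) / n
  n / 2 * as.numeric(determinant(S, logarithm = TRUE)$modulus)
}

fit <- optim(c(1, 1), neg_loglik)     # start from "no transformation"
setNames(round(fit$par, 2), colnames(X))
```

With this data-generating setup the estimates should come out near 0 for `a` and near 0.5 for `b`.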

Do these predictors look like they come from a Normal distribution? (Probably not.)

> library(alr3)
> summary(powerTransform(cbind(temperature, height) ~ 1, data=o2.mini))
box.cox Transformations to Multinormality

            Est.Power Std.Err. Wald(Power=0) Wald(Power=1)
temperature    1.1383   0.3246        3.5070         0.426
height        18.9126   4.5176        4.1864         3.965

                                 LRT df      p.value
LR test, all lambda equal 0 25.50600  2 2.893633e-06
LR test, all lambda equal 1 17.30179  2 1.749703e-04

The best lambda is plausibly within two standard errors of the estimate.

For temperature, use a lambda between about 0.5 and 1.7, rounding generously.
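The Wald columns and the two-standard-error rule are simple arithmetic on the printed estimates and standard errors; this sketch reproduces them from the numbers in the output above.

```r
# Wald statistics and +/- 2 SE ranges from the printed powerTransform output.
est <- c(temperature = 1.1383, height = 18.9126)
se  <- c(temperature = 0.3246, height = 4.5176)

wald0 <- est / se          # Wald(Power=0): test of lambda = 0
wald1 <- (est - 1) / se    # Wald(Power=1): test of lambda = 1
round(cbind(wald0, wald1), 3)

lower <- est - 2 * se      # plausible range for each power
upper <- est + 2 * se
round(cbind(lower, upper), 2)
```

For temperature, the range includes both the square root (0.5) and no transform (1); for height, it includes 20.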


Temperature: try a square-root transform or no transform.

Height: transform to a high power, which is very unusual and probably not helpful. But let's try the 20th power anyway.

> o2.minit=transform(o2.mini,temp.t = sqrt(temperature),height.t = height^20)
> plot(o2.minit)

[Figures: residuals with no transform; scatterplot matrix of the transformed predictors]

Not much better, so look at transforming Y.

> m.t1 = lm(ozone~temp.t+height.t,data=o2.minit)
> plot(m.t1)
> invResPlot(m.t1)

Once again, $Y^{1/3}$ looks best.

> o2.minit2 = transform(o2.minit,ozone.t = ozone^(1/3))
> m.t2 = lm(ozone.t~temp.t+height.t,data=o2.minit2)
> plot(m.t2)

A third approach is to use Box-Cox to transform the predictors and the response simultaneously, i.e., ALL at once.


> summary(powerTransform(with(o2.mini, cbind(ozone, height, temperature)) ~ 1))
box.cox Transformations to Multinormality

            Est.Power Std.Err. Wald(Power=0) Wald(Power=1)
ozone          0.2503   0.0888        2.8178       -8.4416
height        18.8959   4.4542        4.2422        4.0177
temperature    1.1590   0.2661        4.3550        0.5976

                                 LRT df      p.value
LR test, all lambda equal 0 37.03313  3 4.527709e-08
LR test, all lambda equal 1 83.53574  3 0.000000e+00

This is consistent with a 1/3 power for ozone, a 20th power for height, and no change (raise to the 1 power) for temperature.
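A quick arithmetic check of that claim, using the printed estimates and standard errors: each chosen power's distance from its estimate, measured in standard errors.

```r
# Distance (in standard errors) between each chosen power and its estimate.
est    <- c(ozone = 0.2503, height = 18.8959, temperature = 1.1590)
se     <- c(ozone = 0.0888, height = 4.4542,  temperature = 0.2661)
chosen <- c(ozone = 1/3,    height = 20,      temperature = 1)

z <- (chosen - est) / se
round(z, 2)
all(abs(z) < 2)   # TRUE if every chosen power is within two standard errors
```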

Leverage cutoff: $2(p+1)/n = 2 \times 3/141 \approx 0.04$; leverages above this count as "big".
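In R, leverages come from `hatvalues()`. A sketch with simulated data of the same size as the ozone example ($n = 141$, $p = 2$); the data are made up, and only the cutoff rule comes from the slide.

```r
# Flag "big" leverage points with the 2(p + 1)/n rule (simulated data).
set.seed(6)
n  <- 141                      # same n as the ozone example
p  <- 2                        # two predictors
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + x1 + x2 + rnorm(n)
m  <- lm(y ~ x1 + x2)

h      <- hatvalues(m)         # leverages h_ii
cutoff <- 2 * (p + 1) / n      # 6 / 141, about 0.043
which(h > cutoff)              # indices of "big" leverage points
sum(h)                         # leverages always sum to p + 1
```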
