
Data Analysis Using SPSS
Multiple Linear Regression

Alexander Ploner
Institute for Mathematics and Statistics
29 April 2002


Example: Forestry

• Dependent:
  – volume of cherry trees

• Independent:
  – tree circumference
  – tree height

• 31 trees from Allegheny National Forest

• File: cherry.sav

⇒ y = f(x1, x2)

Example: Forestry cont.

• Model:

  volume = β0 + β1 × height + β2 × circumference + ε

• Unknown parameters:
  – β0: scaling constant
  – β1: height factor
  – β2: circumference factor
  – ε: random influences, 'error'


General Regression Issues

• one dependent – several independents

• fit via least squares
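The least-squares fit of the forestry model can be sketched in Python/NumPy. The data below are synthetic stand-ins (cherry.sav itself is not reproduced here); only the model structure matches the slides.

```python
import numpy as np

# Synthetic stand-in for cherry.sav (the real file has 31 trees);
# the numbers are illustrative, only the model matches the slides.
rng = np.random.default_rng(0)
n = 31
height = rng.uniform(60, 90, n)
circumference = rng.uniform(8, 21, n)
volume = -58 + 0.34 * height + 4.7 * circumference + rng.normal(0, 2.5, n)

# Design matrix with an intercept column for beta0
X = np.column_stack([np.ones(n), height, circumference])

# Least squares: beta_hat minimizes ||volume - X @ b||^2
beta_hat, *_ = np.linalg.lstsq(X, volume, rcond=None)
b0, b1, b2 = beta_hat
print(f"beta0={b0:.2f}  beta1 (height)={b1:.2f}  beta2 (circumference)={b2:.2f}")
```

In SPSS the same fit comes from the Linear Regression procedure with volume as dependent and height and circumference entered as independents.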


Multiple & Simple Regression

• Parameter estimation

• Inference for parameters

• Model diagnostics:
  – explanatory quality of model
  – check assumptions

• Prediction of the dependent variable


Example: Ozone

• Dependent:
  – ozone

• Independent:
  – temperature
  – wind
  – solar radiation

• 111 consecutive days in NY

• File: air2.sav
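The "inference for parameters" step can be illustrated by computing standard errors, t-statistics, and confidence intervals by hand. A minimal NumPy sketch on illustrative data (SPSS reports these quantities in its coefficients table):

```python
import numpy as np

# Standard errors, t-statistics and (approximate) 95% confidence
# intervals for least-squares coefficients, on illustrative data.
rng = np.random.default_rng(5)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(0, 1, n)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
s2 = resid @ resid / (n - p)            # error-variance estimate
se = np.sqrt(s2 * np.diag(XtX_inv))     # standard errors of coefficients
t = beta / se                           # t-statistics with n - p df

# Normal approximation to the 95% CI; for exact intervals use the
# t-quantile with n - p degrees of freedom instead of 1.96
ci = np.column_stack([beta - 1.96 * se, beta + 1.96 * se])
print(np.column_stack([beta, se, t]).round(2))
```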

Specific Issues in Multiple Regression


Different & Important in Multiple Regression

• Diagnosing
  – influential observations
  – problematic observations (outliers)

• Variable selection

• Numerical problems


Influential Observations

Diagnostic tools:

• (Deleted) residuals

• Influence measures:
  – DfBeta: change in parameter estimates when one observation is removed
  – DfFit: change in the fitted value when one observation is removed

• Distance measures:
  – Mahalanobis distance, leverage
  – Cook's distance


Example: Body Fat

• Dependent: body fat percentage

• Independent:
  – age, height, weight
  – 10 body measurements (circumferences)

• 252 men between 22 and 81

• File: fat2.sav
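The influence diagnostics above (leverage, deleted residuals, Cook's distance) all follow from the hat matrix. A NumPy sketch on illustrative data, not fat2.sav; SPSS can save the corresponding per-observation values from its Linear Regression procedure:

```python
import numpy as np

# Influence diagnostics computed from the hat matrix H = X (X'X)^-1 X'
# on illustrative data.
rng = np.random.default_rng(1)
n, p = 30, 3                              # p = columns incl. intercept
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1, n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)                            # leverage of each observation
resid = y - H @ y
s2 = resid @ resid / (n - p)

# Studentized deleted residual: residual scaled as if observation i
# had been left out of the fit
s2_i = ((n - p) * s2 - resid**2 / (1 - h)) / (n - p - 1)
t_del = resid / np.sqrt(s2_i * (1 - h))

# Cook's distance: overall change in the fit when observation i is removed
cooks_d = resid**2 / (p * s2) * h / (1 - h) ** 2
print("largest Cook's distance:", cooks_d.max().round(3))
```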

Model Selection

• 'All models are wrong, some are useful.' (G. Box)

• In practice often:
  – large number of independent variables
  – some of them without real influence
  – simpler partial model advantageous
  – several models competitive

• Selection of variables:
  – automatic
  – manual


Automatic Selection of Variables

• Using the F-statistic (ANOVA)

• To be used with caution

• Forward – add one variable per step

• Backward – remove one variable per step

• Stepwise – add one variable per step, remove another one if necessary
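A minimal sketch of forward selection with the partial F-statistic, assuming a fixed F-to-enter threshold (SPSS also offers probability-of-F entry criteria); the data are illustrative only:

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

def forward_select(Xcands, y, f_enter=4.0):
    """Greedy forward selection: at each step add the candidate with the
    largest partial F-statistic, as long as it exceeds f_enter."""
    n = len(y)
    selected = []
    X = np.ones((n, 1))                   # start with intercept only
    remaining = list(range(Xcands.shape[1]))
    while remaining:
        rss0 = rss(X, y)
        best, best_f = None, f_enter
        for j in remaining:
            Xj = np.column_stack([X, Xcands[:, j]])
            rss1 = rss(Xj, y)
            # Partial F for adding one variable: drop in RSS relative
            # to the residual mean square of the larger model
            f = (rss0 - rss1) / (rss1 / (n - Xj.shape[1]))
            if f > best_f:
                best, best_f = j, f
        if best is None:
            break
        selected.append(best)
        X = np.column_stack([X, Xcands[:, best]])
        remaining.remove(best)
    return selected

# Illustrative data: y depends on candidate columns 0 and 2 only
rng = np.random.default_rng(2)
Xc = rng.normal(size=(100, 4))
y = 3 * Xc[:, 0] - 2 * Xc[:, 2] + rng.normal(0, 1, 100)
selected = forward_select(Xc, y)
print(selected)
```

The slides' warning applies here too: a greedy search like this can capitalize on chance, which is why automatic selection is "to be used with caution".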


Manual Selection of Variables

• known causal relationships

• t-statistics and confidence intervals for coefficients

• Correlations with the dependent variable:
  – zero-order (plain correlation)
  – partial – correlation after removing the effects of the other variables
  – part – correlation after removing the effects of the other variables from the independent variable only


Numerical Problems

• Collinearity:
  – correlation between independent variables
  – calculations too sensitive to small changes in the data
  – due to rounding errors
  – Symptoms:
    ∗ 'wrong' sign of coefficients
    ∗ important variables have small t-statistics
    ∗ large standard errors

• Measures:
  – tolerance, variance inflation factor
  – eigenvalues

• Solution: remove correlated variables
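The zero-order, partial, and part correlations used in manual variable selection can be computed by residualizing. A NumPy sketch on synthetic data with two correlated independents:

```python
import numpy as np

def residualize(a, b):
    """Residuals of a after regressing it on b (plus an intercept)."""
    B = np.column_stack([np.ones(len(b)), b])
    coef, *_ = np.linalg.lstsq(B, a, rcond=None)
    return a - B @ coef

# Synthetic data: x1 and x2 are correlated, both influence y
rng = np.random.default_rng(3)
n = 200
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(0, 0.6, n)
y = x1 + x2 + rng.normal(0, 1, n)

corr = lambda u, v: np.corrcoef(u, v)[0, 1]

zero_order = corr(x1, y)                         # plain correlation
partial = corr(residualize(x1, x2), residualize(y, x2))
part = corr(residualize(x1, x2), y)              # x2 removed from x1 only
print(f"zero-order={zero_order:.2f}  partial={partial:.2f}  part={part:.2f}")
```

Here the zero-order correlation of x1 with y is inflated by the shared influence of x2; the partial and part correlations isolate x1's own contribution.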
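The collinearity measures above (tolerance, VIF, eigenvalues) can likewise be computed by hand. A sketch on synthetic near-collinear data; SPSS reports these when collinearity diagnostics are requested in the Linear Regression procedure:

```python
import numpy as np

def vif(X):
    """Variance inflation factor per column: VIF_j = 1 / (1 - R_j^2),
    where R_j^2 comes from regressing column j on all other columns.
    Tolerance is the reciprocal, 1 / VIF_j."""
    out = []
    n, k = X.shape
    ones = np.ones((n, 1))
    for j in range(k):
        others = np.column_stack([ones, np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        tss = ((X[:, j] - X[:, j].mean()) ** 2).sum()
        r2 = 1 - (resid @ resid) / tss
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(4)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(0, 0.1, 100)     # nearly collinear with x1
x3 = rng.normal(size=100)
X = np.column_stack([x1, x2, x3])
print(vif(X).round(1))                # large VIF for x1 and x2

# Near-zero eigenvalues of the standardized cross-product matrix
# signal the same problem
Xs = (X - X.mean(0)) / X.std(0)
print(np.linalg.eigvalsh(Xs.T @ Xs / len(Xs)).round(3))
```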
