
This workbook explores the properties of the Least Median of Squares regression.

It is designed to be used in conjunction with the Word file LMS.doc.

The Definition sheet shows the difference among Least Squares, Quantile (or Median) regression, and Least Median of Squares.
The terminology can be quite confusing and it is important to keep straight these alternative ways to fit a line.

The Live sheet contains the data generation process.


Hitting F9 (or CTRL + =) recalculates the sheet and draws new Ys based on parameter values on the sheet.
The Xs remain fixed.
The buttons across the top of the Live sheet provide access to Monte Carlo simulations.

The Dead sheet has unchanging Y values.


Scroll right in the Dead sheet to see (1) how to use Solver to fit a least squares line; (2) why Solver can't
handle the LMS optimization problem; and (3) how to access the Fit LMS button.

Please send comments and suggestions to:
Humberto Barreto
Department of Economics
Wabash College
Crawfordsville, IN 47933

barretoh@wabash.edu
(765) 361-6315

Thanks to Frank Howland and David Maharry.

This sheet is set to Manually Recalculate. Hit F9 after changing a cell in order to recalculate the sheet.

Dec-01
Jul-06 Modified to catch the possibility of a subset with the same X value causing a divide by zero error in the code.

There are SIX commonly used objective functions to fit lines, but they come in pairs so that there are really only THREE common alternatives:
Least Squares: minimize either the mean or the sum of squared residuals
Quantile (aka Median and L1-Norm) Regression: minimize either the mean or the sum of the absolute values of the residuals
Least Median of Squares: minimize the median of either the squared residuals or the absolute value of the residuals

When opened, the table and graph below show the Least Squares fit. Execute Tools: Solver and change the objective function to
the other five possibilities (from a Target Cell of E10 to E11, F10, F11, E12, and F12) to see how the coefficients change.

intercept   slope
3.799994    0.400002

          res^2      |res|
mean      0.88       0.8
sum       4.4        4.000001
median    0.640007   0.800004

x   y   pred y      res         res^2      |res|
1   5   4.199996    0.800004    0.640007   0.800004
2   4   4.599997   -0.599997    0.359997   0.599997
3   5   4.999999    1.02E-06    1.03E-12   1.02E-06
4   4   5.400001   -1.400001    1.960002   1.400001
5   7   5.800002    1.199998    1.439994   1.199998

LS minimizes the SUM (or MEAN) of res^2.
Solver gets this solution right.

Quantile (median) regression minimizes the SUM (or MEAN) of |res|.
It is unclear whether Solver is reliable for this problem.

LMS minimizes the MEDIAN of the res^2, which is the same as minimizing the MEDIAN of |res|.
Solver is NOT reliable for this problem.
The Dead sheet demonstrates this (if you aren't already convinced).
The correct LMS fit is intercept=2, slope=1, objective function=0.

[Chart: y and pred y plotted against x, with trendline Linear (y): f(x) = 0.4x + 3.8]
The red pred y line is whatever objective function was used by Solver.
The Linear (y) is the Least Squares line.
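The six objective functions can also be checked outside Excel. The following is a minimal Python sketch, not part of the workbook; the function name `objectives` is hypothetical. It evaluates all six Target Cell values for a given intercept and slope, using the five data points from the Definition sheet.

```python
from statistics import mean, median

# The five (x, y) points shown on the Definition sheet.
X = [1, 2, 3, 4, 5]
Y = [5, 4, 5, 4, 7]

def objectives(intercept, slope, xs=X, ys=Y):
    """Return the six objective values for the line y = intercept + slope*x."""
    res = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sq = [r * r for r in res]        # squared residuals
    ab = [abs(r) for r in res]       # absolute residuals
    return {
        "mean res^2": mean(sq), "sum res^2": sum(sq), "median res^2": median(sq),
        "mean |res|": mean(ab), "sum |res|": sum(ab), "median |res|": median(ab),
    }

ls = objectives(3.8, 0.4)    # Least Squares fit reported on the sheet
lms = objectives(2.0, 1.0)   # LMS fit reported on the sheet
print(ls)
print(lms)
```

For the Least Squares line the mean and sum of res^2 come out to 0.88 and 4.4, matching the table. For intercept 2 and slope 1, three of the five points lie exactly on the line, so the median of res^2 is 0.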


Parameters
b0               1
b1               5
s                10
Outlier Factor   100

        CLEAN                     DIRTY
 X      e        Y         X      e        Y
 1     -5.0      1.0       1     -5.0      1.0
 2    -18.8     -7.8       2    -18.8     -7.8
 3     -3.2     12.8       3     -3.2     12.8
 4      0.8     21.8       4      0.8     21.8
 5     -3.7     22.3       5     -3.7     22.3
 6     11.5     42.5       6     11.5     42.5
 7     -3.7     32.3       7     -3.7     32.3
 8    -11.3     29.7       8    -11.3     29.7
 9     -9.9     36.1       9     -9.9     36.1
10     10.8     61.8      10     10.8     61.8
11    -13.7     42.3      11    -13.7    142.3

Summary Statistics
            CLEAN                     DIRTY
            X      e       Y          X      e       Y
Average     6     -4.2    26.8        6     -4.2    35.9
SD          3.3    9.4    19.8        3.3    9.4    40.2
Max        11     11.5    61.8       11     11.5   142.3
Min         1    -18.8    -7.8        1    -18.8    -7.8

[Chart: Clean Data, Y against X, trendline f(x) = 5.3x - 4.8]
[Chart: Dirty Data, Y against X, trendline f(x) = 9.8x - 23.0]
Scroll right for more -->
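The data generation process can be mimicked outside Excel. This Python sketch rests on two assumptions that are inferred from the displayed numbers rather than stated in the workbook: the errors are drawn from a normal distribution with standard deviation s, and the Dirty data set is the Clean one with the Outlier Factor added to the last Y (42.3 becomes 142.3). The function name `draw_sample` is hypothetical.

```python
import random

def draw_sample(b0=1.0, b1=5.0, s=10.0, outlier_factor=100.0, n=11, rng=None):
    """Draw one clean and one dirty sample with fixed X = 1..n."""
    rng = rng or random.Random()
    xs = list(range(1, n + 1))                      # the Xs remain fixed
    es = [rng.gauss(0.0, s) for _ in xs]            # assumed normal errors
    clean = [b0 + b1 * x + e for x, e in zip(xs, es)]
    # Assumed contamination: add the outlier factor to the last observation.
    dirty = clean[:-1] + [clean[-1] + outlier_factor]
    return xs, es, clean, dirty

# Each call plays the role of one F9 recalculation; the seed is only for repeatability.
xs, es, clean, dirty = draw_sample(rng=random.Random(0))
for x, e, yc, yd in zip(xs, es, clean, dirty):
    print(f"{x:2d} {e:7.1f} {yc:7.1f} {yd:7.1f}")
```

Calling `draw_sample` repeatedly and refitting each time is the basic loop behind a Monte Carlo simulation of the two estimators.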
Parameters
b0               1
b1               5
s                10
Outlier Factor   100

        CLEAN                     DIRTY
 X      e        Y         X      e        Y
 1    -13.8     -7.8       1    -13.8     -7.8
 2      5.0     16.0       2      5.0     16.0
 3     -2.3     13.7       3     -2.3     13.7
 4    -16.5      4.5       4    -16.5      4.5
 5      1.6     27.6       5      1.6     27.6
 6     -7.3     23.7       6     -7.3     23.7
 7      0.3     36.3       7      0.3     36.3
 8    -12.8     28.2       8    -12.8     28.2
 9      1.3     47.3       9      1.3     47.3
10     -8.7     42.3      10     -8.7     42.3
11     -6.5     49.5      11     -6.5    149.5

Summary Statistics
            CLEAN                     DIRTY
            X      e       Y          X      e       Y
Average     6     -5.4    25.6        6     -5.4    34.7
SD          3.3    7.1    18.0        3.3    7.1    41.4
Max        11      5.0    49.5       11      5.0   149.5
Min         1    -16.5    -7.8        1    -16.5    -7.8

[Chart: Clean Data, Y against X, trendline f(x) = 4.99x - 4.37]
[Chart: Dirty Data, Y against X, trendline f(x) = 9.54x - 22.55]
Solver for fitting LS
1) Generate Pred Y (R9:R19) based on the formula, PredYi = intercept + slope*Xi
2) Calculate Residuals (S9:S19) as Y - PredY
3) Square Residuals (T9:T19) and sum them (T7)
4) Execute Tools: Solver, make appropriate settings (see picture below) and click Solve

intercept   0        Avg SR   948.8008
slope       0        Sum SR   10436.81

 X    Y          PredY   Residuals   Residuals^2
 1    -7.82955   0       -7.82955    61.30184
 2    15.96343   0       15.96343    254.8311
 3    13.65437   0       13.65437    186.4418
 4    4.514863   0       4.514863    20.38399
 5    27.63441   0       27.63441    763.6605
 6    23.74989   0       23.74989    564.0575
 7    36.26646   0       36.26646    1315.256
 8    28.21482   0       28.21482    796.0763
 9    47.30146   0       47.30146    2237.429
10    42.28336   0       42.28336    1787.882
11    49.49232   0       49.49232    2449.489

After running Solver, the results are displayed.
Cells R6 and R7 have the intercept and slope of the fitted LS line while the minimum SSR is in cell T7.

Solver quickly converges to an optimal solution and reports its success.

Scroll down to see a graphic of why this problem is easy to solve.

Scroll right to see Solver applied to LMS. -->

SSR values, given intercept and slope, are in the middle of the table. Note the smooth bowl-shape with a clear minimum.

                                              slope
intercept   4.6        4.7        4.8        4.9        4.989657   5.1        5.2        5.3        5.4
-8.37       965.8832   878.7117   801.6602   734.7288   683.3244   631.2259   594.6544   568.2029   551.8715
-7.37       837.4515   763.48     699.6286   645.8971   606.3274   568.7942   545.4227   532.1713   529.0398
-6.37       731.0198   670.2484   619.5969   579.0654   551.3303   528.3625   518.1911   518.1396   528.2081
-5.37       646.5882   599.0167   561.5652   534.2338   518.3333   509.9309   512.9594   526.1079   549.3765
-4.37014    584.1637   549.7904   525.5371   511.4038   507.3363   513.4971   529.7238   556.0705   592.5372
-3.37       543.7248   522.5534   511.5019   510.5704   518.3393   539.0675   568.4961   608.0446   657.7131
-2.37       525.2931   517.3217   519.4702   531.7388   551.3423   586.6359   629.2644   682.0129   744.8815
-1.37       528.8615   534.09     549.4386   574.9071   606.3453   656.2042   712.0327   777.9813   854.0498
-0.37       554.4298   572.8584   601.4069   640.0754   683.3483   747.7725   816.8011   895.9496   985.2181

[Chart: 3D surface of SSR plotted over intercept and slope]
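The LS problem is easy because the SSR minimum has a closed form, so the Solver result can be cross-checked. The sketch below is in Python and is an independent check, not the workbook's method; the helper names `ssr` and `ols` are hypothetical. It applies the textbook least squares formulas to the Y values from the Solver table, with X = 1 to 11.

```python
# Y values from the Solver table; X runs from 1 to 11.
X = list(range(1, 12))
Y = [-7.82955, 15.96343, 13.65437, 4.514863, 27.63441, 23.74989,
     36.26646, 28.21482, 47.30146, 42.28336, 49.49232]

def ssr(intercept, slope, xs=X, ys=Y):
    """Sum of squared residuals for the line y = intercept + slope*x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def ols(xs=X, ys=Y):
    """Closed-form least squares intercept and slope."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

b0, b1 = ols()
print(b0, b1, ssr(b0, b1))   # fitted line and its minimum SSR
print(ssr(0.0, 0.0))         # SSR at the Solver starting values (0, 0)
```

Up to rounding of the listed Y values, this gives the intercept and slope at the bottom of the bowl in the SSR table (-4.37014 and 4.989657, SSR about 507.34), and the starting values reproduce the Sum SR of 10436.81.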
Solver for fitting LMS
1) Generate Pred Y (AD9:AD19) based on the formula, PredYi = intercept + slope*Xi
2) Calculate Residuals (AE9:AE19) as Y - PredY
3) Square Residuals (AF9:AF19) and find the MEDIAN (AF7)
4) Execute Tools: Solver, make appropriate settings (see picture below) and click Solve
intercept   0        Avg SR      948.8008
slope       0        MEDIAN SR   763.6605

 X    Y          PredY   Residuals      Residuals^2
 1    -7.82955   0       -7.82954906    61.30184
 2    15.96343   0       15.96343091    254.8311
 3    13.65437   0       13.65436769    186.4418
 4    4.514863   0       4.514862834    20.38399
 5    27.63441   0       27.63440745    763.6605
 6    23.74989   0       23.74989477    564.0575
 7    36.26646   0       36.26645921    1315.256
 8    28.21482   0       28.2148237     796.0763
 9    47.30146   0       47.30146418    2237.429
10    42.28336   0       42.2833573     1787.882
11    49.49232   0       49.4923153     2449.489

            Initial Value   Solver Answer   MEDIAN SR
intercept   0               1.953956
slope       0               4.316028        16.81255
intercept   -0.37           0.208621
slope       5.4             4.633368        18.13862

True Optimum
intercept   4.657385
slope       4.060606        MEDIAN SR 10.14317

Solver alone cannot reliably find the LMS line.
In the simple bivariate regression case, a reliable algorithm is available within Excel.
Click the LMS Fit button in cell AK1 above.
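The three lines reported above can be compared directly. This Python sketch (the helper name `median_sr` is hypothetical) recomputes the median squared residual for the two Solver answers and for the True Optimum, using the Y values from the table with X = 1 to 11. It reads the two Initial Value rows as (intercept, slope) starting pairs, which is an interpretation of the table layout.

```python
from statistics import median

# Y values from the LMS Solver table; X runs from 1 to 11.
X = list(range(1, 12))
Y = [-7.82955, 15.96343, 13.65437, 4.514863, 27.63441, 23.74989,
     36.26646, 28.21482, 47.30146, 42.28336, 49.49232]

def median_sr(intercept, slope, xs=X, ys=Y):
    """Median of the squared residuals for the line y = intercept + slope*x."""
    return median([(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)])

# (intercept, slope) pairs copied from the table above.
candidates = {
    "Solver, start (0, 0)":       (1.953956, 4.316028),
    "Solver, start (-0.37, 5.4)": (0.208621, 4.633368),
    "True Optimum":               (4.657385, 4.060606),
}
results = {name: median_sr(b0, b1) for name, (b0, b1) in candidates.items()}
for name, value in results.items():
    print(f"{name:28s} {value:.4f}")
```

Both Solver answers have a higher median squared residual than the True Optimum, so where Solver stops depends on where it starts.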

Median SR values, given intercept and slope, are in the middle of the table. Unlike the SSR table, these values do not form a smooth bowl: there are several local minima.
                                              slope
intercept   3.9        4             4.1        4.2        4.3        4.4        4.5        4.6        4.7
0           66.16858   58.28417717   50.89977   44.01536   38.02522   31.74655   26.36214   21.47773   22.24672
0.5         58.28418   50.89976972   44.01536   37.63095   32.10876   26.36214   21.47773   18.92342   24.50354
1           50.89977   44.01536226   37.63095   34.41534   26.6923    21.47773   18.06339   23.52352   29.70365
1.5         44.01536   37.63095481   36.80193   28.79888   21.77584   17.22337   22.5635    27.21336   25.63833
2           37.63095   33.46826477   30.98547   23.68243   17.35938   21.62348   23.05406   22.69027   20.8249
2.5         31.74655   29.84907732   25.66901   19.06597   20.70346   21.74759   19.92222   18.17684   22.1623
3           26.66102   24.63564641   20.85255   19.80344   19.03953   22.10377   15.70878   16.87307   27.11998
3.5         21.74759   19.92221549   18.17684   16.51147   17.78008   17.6523    12.30385   21.23076   32.57766
4           17.33416   18.06339444   14.16341   13.81343   21.17347   13.70084   16.06154   26.08844   38.53535
4.5         17.22337   14.18621498   10.64998   17.78008   16.82201   16.36714   20.31922   31.44613   44.99303
5           15.7328    11.19325556   13.81343   20.26318   18.02539   20.66277   25.07691   37.30381   51.95072
5.5         12.57151   14.78888787   17.78008   19.76365   22.52103   25.45841   30.33459   43.6615    59.4084
6           16.36714   18.88452018   21.5819    24.45928   27.51666   30.75404   36.09228   50.51918   67.36609

[Chart: 3D surface of the table values plotted over intercept and slope]

Another perspective (click on the chart and execute Chart: 3DView to generate your own view):
[Chart: the same surface from a different viewing angle]

To use this button on your own data, you can simply right-click the button to select it, then copy-and-paste it to a workbook
that has your data. Then click on it to use it.
Right-click and delete it when you are done (or Excel will include a link to this workbook in the workbook with your data).

The LMS Fit button generated the output and graph below.

                             OLS Squared                LMS Squared
 X    Y          OLS Pred Y  Residuals     LMS Pred Y   Residuals
 1    -7.82955   0.619517    71.38672      8.717991     273.8211
 2    15.96343   5.609174    107.2106      12.7786      10.14317
 3    13.65437   10.59883    9.336301      16.8392      10.14317
 4    4.514863   15.58849    122.6252      20.89981     268.4664
 5    27.63441   20.57815    49.79083      24.96041     7.150244
 6    23.74989   25.5678     3.304791      29.02102     27.78475
 7    36.26646   30.55746    32.59267      33.08162     10.14317
 8    28.21482   35.54712    53.76253      37.14223     79.69859
 9    47.30146   40.53677    45.76102      41.20284     37.19326
10    42.28336   45.52643    10.51753      45.26344     8.880903
11    49.49232   50.51609    1.048113      49.32405     0.028314

            OLS         LMS
Intercept   -4.37014    4.657385
Slope       4.989657    4.060606
Avg SR      46.12148    66.67755
Median SR   45.76102    10.14317

[Chart: Y, OLS Pred Y, and LMS Pred Y plotted against X]
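For readers who want to experiment outside Excel, here is a brute-force Python sketch of an LMS fit for the simple bivariate case. It is not the code behind the LMS Fit button, which is not shown in this workbook; the function names `median_sr` and `fit_lms` are hypothetical. The approach, stated as an assumption: take the slope through every pair of points with different X values, then for each slope pick the intercept at the midpoint of the shortest interval covering just over half of the values Y - slope*X. The sketch assumes an odd number of observations, so that the median is a single order statistic.

```python
from itertools import combinations
from statistics import median

def median_sr(intercept, slope, xs, ys):
    """Median of the squared residuals for the line y = intercept + slope*x."""
    return median([(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)])

def fit_lms(xs, ys):
    """Brute-force LMS line; returns (intercept, slope, median SR)."""
    n = len(xs)
    h = n // 2 + 1   # number of points to cover (the median rank when n is odd)
    best = None
    for i, j in combinations(range(n), 2):
        if xs[i] == xs[j]:
            continue   # same X value: slope undefined, skip to avoid dividing by zero
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        z = sorted(y - slope * x for x, y in zip(xs, ys))
        # Shortest window covering h of the sorted values; its midpoint is the
        # intercept that makes the h-th smallest |residual| smallest for this slope.
        k = min(range(n - h + 1), key=lambda s: z[s + h - 1] - z[s])
        intercept = (z[k] + z[k + h - 1]) / 2
        obj = median_sr(intercept, slope, xs, ys)
        if best is None or obj < best[2]:
            best = (intercept, slope, obj)
    return best

# Definition sheet data: the workbook states the LMS fit is intercept=2, slope=1.
definition_fit = fit_lms([1, 2, 3, 4, 5], [5, 4, 5, 4, 7])
print(definition_fit)

# Y values from the output table above, with X = 1 to 11.
X = list(range(1, 12))
Y = [-7.82955, 15.96343, 13.65437, 4.514863, 27.63441, 23.74989,
     36.26646, 28.21482, 47.30146, 42.28336, 49.49232]
dead_fit = fit_lms(X, Y)
print(dead_fit)
```

On the Definition sheet data the sketch returns intercept 2, slope 1, and a median squared residual of 0. The second result can be compared with the Intercept, Slope, and Median SR in the LMS column above.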
