American Statistical Association is collaborating with JSTOR to digitize, preserve and extend access to Journal
of the American Statistical Association.
Locally Weighted Regression: An Approach to
Regression Analysis by Local Fitting
WILLIAM S. CLEVELAND and SUSAN J. DEVLIN*
[Figure panels not recovered. Legible axis labels: East-West Coordinate (Arc Seconds); Solar Radiation; Temperature; Wind Speed; Normal Quantiles; Fitted Values.]

Figure 4. Ozone and Meteorological Data. Ozone was regressed on the meteorological variables using locally linear fitting and f = .4. The top panel is a normal probability plot of the residuals. The bottom panel is a graph of the absolute residuals against the fitted values; the smooth curve is a loess fit to the data of the plot, with f = 2/3. The plots show nonnormality and a dependence of variance on the level of the dependent variable.

Figure 5. Ozone and Meteorological Data. The residuals for the ozone data are graphed against the independent variables; the smooth curves are loess fits to the data of the plots, with f = 2/3. The plots indicate that the estimated regression surface does not fit the data.
[Figure panels not recovered. Legible axis labels: Temperature; Hardness; Equivalent Number of Parameters.]
hk(Xj) - h(xj)
were run for each configuration, resulting in 24 simulations.

There were three design configurations for p = 3. Only the value of n = 100 was used, and the configurations were generated in a manner analogous to that for the case with p = 2 and n = 100. Simulations with f = .5, .7, and .9 were run for each configuration, resulting in nine simulations.

Figure 16 shows information about the test at the 5% level of significance. The values plotted on the vertical scales are 5% minus the actual significance, and the horizontal scales are the degrees of freedom of the numerator, that is, $\nu_1^2/\nu_2$. The panels are arranged by p and n. Most important, Figure 16 shows that the approximating 5% significance level is close to the true levels in each of the 60 simulations. The largest absolute deviation is 1.59%. In fact, the situation is even better than that, because the largest departures occur for the largest degrees of freedom, and these values are somewhat larger than those typically used in practice. For the cases with less than 10 df, the largest absolute deviation is .94%. Similar results hold for the deviations at the 10% and 2.5% levels of significance. For the former, the largest absolute deviation is 2.18%; for the latter, the largest is 1.05%. Figure 16 also shows that the deviation of the true level from the nominal level increases as p increases, as n decreases, or as the degrees of freedom increase.

The good performance of the approximations for ANOVA occurs even though the numerator of the test statistic is not independent of the denominator. The approximation works partly because the dependence is not strong and partly because unless n or f is very small the numerator is contributing the most to the variability of the statistic.

9.2 Laboratory Simulations: Confidence Intervals for $\sigma^2$ and g(x)

The 60 simulations described in Section 9.1 were also used to investigate confidence intervals for $\sigma^2$. For the 90% confidence level, the maximum absolute deviation of the actual level from the nominal level was .50%; for the 95% level the maximum was .48%. Clearly, the approximating distributions performed excellently in these cases.

The 27 simulations for p = 1 that were described in Section 9.1 were also used to investigate confidence intervals for g(x) at two values of x: the mean of the $x_i$ and the largest of the $x_i$. For the 90% confidence interval, the largest absolute deviation was .44% for the mean and .65% for the extreme. For the 95% interval, the largest absolute deviation was .45% for the mean and .65% for the extreme. Again, the approximations performed excellently.

9.3 Other Laboratory Simulations

In distributional approximations for ANOVA, the divisors for the sums of squares, $\nu_1$ for the numerator and $\delta_1$ for the denominator, are not generally the same as the degrees of freedom for the approximating F distribution, $\nu_1^2/\nu_2$ for the numerator and $\delta_1^2/\delta_2$ for the denominator. Nevertheless, one might hope that $\nu_1$ is close to $\nu_2$ and that $\delta_1$ is close to $\delta_2$, and then take the degrees of freedom to be $\nu_1$ and $\delta_1$. The 60 simulations described in Section 9.1 were also used to investigate this one-moment approximation. For the 10%, 5%, and 2.5% levels of significance, the maximum absolute deviations are 3.84%, 2.68%, and 1.62%, respectively. The corresponding values for the two-moment approximation (given in Sec. 9.1) are 2.18%, 1.59%, and 1.05%. The degradation in the approximation for the one-moment case is just large enough that we have continued with the somewhat more complicated two-moment approximation.

9.4 Field Simulations

As we stated earlier, a data analyst can check the performance of the approximating distribution in any application by a field simulation. If the approximating distribution performed poorly, the simulation distribution could be used to make inferences. But we have not yet encountered an application in which the residuals have a sample distribution that is well approximated by the normal and the approximating distribution performed poorly. We will illustrate the use of two field simulations for two of the applications in this article.

For the estimation of the ozone surface in Section 5, it is sensible to ask whether the observed curvature in the fitted surface is significant, because the estimate of the standard error of the residuals is $\hat{\sigma} = .43$, which is not small compared with the sample standard deviation of the cube root ozone concentrations, which is .89. To address whether data with this much noise can support other than a global fit, we carried out ANOVA (described in Sec. 4.2), testing the locally weighted regression fit against a quadratic least-squares fit. The F statistic is 2.10 and the approximating distribution is F, with 19.2 and 89.0 df. The significance level is .011, so the curvature is highly significant. We also ran a field simulation with 1,200 replications: The simulated significance level was .010, which is quite close to the approximating level.

The result of the abrasion-loss application in Section 7 was a nonlinear additive fit. Since the number of observations (29) is small, we might reasonably ask whether the data really support a nonlinear regression surface. Thus we tested the additive model against a linear least-squares fit: The significance level was .00256, making the nonlinearity highly significant. (Of course, the test needs to be viewed with some caution, because the model arose after several passes of the fitting process and because f was selected from the M plot.) A field simulation was also run: The simulated significance level was .00211, which is quite close to the approximating level.

10. DISCUSSION

10.1 Locally Weighted Regression for Applications

The methodology introduced here can be an integral part of the analysis in many regression studies. In fact, it represents a new approach, compared with what is most
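As a rough illustration of the quantities discussed in Sections 9.3 and 9.4, the following Python sketch computes the two-moment degrees of freedom and estimates the upper-tail probability of an F distribution with non-integer degrees of freedom. It is only a sketch under stated assumptions: the function names are hypothetical, the tail probability is estimated by plain Monte Carlo sampling of the F distribution itself (this is not the field simulation of Section 9.4, which regenerates data and refits the surface), and the trace values passed to `two_moment_df` are made up for illustration. Only the F statistic of 2.10 and the 19.2 and 89.0 df come from the text.

```python
import random


def two_moment_df(nu1, nu2, delta1, delta2):
    """Two-moment df: nu1^2/nu2 (numerator) and delta1^2/delta2 (denominator)."""
    return nu1 ** 2 / nu2, delta1 ** 2 / delta2


def f_tail_monte_carlo(f_stat, df_num, df_den, draws=200_000, seed=0):
    """Estimate P(F > f_stat) for an F(df_num, df_den) variate by simulation.

    A chi-square variate with k degrees of freedom (k need not be an
    integer) is a gamma variate with shape k/2 and scale 2.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(draws):
        num = rng.gammavariate(df_num / 2.0, 2.0) / df_num
        den = rng.gammavariate(df_den / 2.0, 2.0) / df_den
        if num / den > f_stat:
            exceed += 1
    return exceed / draws


# Ozone example from Section 9.4: F = 2.10 on 19.2 and 89.0 df.
p_ozone = f_tail_monte_carlo(2.10, 19.2, 89.0)
print(f"estimated significance level: {p_ozone:.4f}")

# Hypothetical trace values (not from the article), used only to show the
# two-moment and one-moment degrees of freedom side by side.
df_two = two_moment_df(nu1=20.0, nu2=21.0, delta1=90.0, delta2=91.0)
df_one = (20.0, 90.0)  # one-moment approximation: use nu1 and delta1 directly
print("two-moment df:", df_two, "one-moment df:", df_one)
```

The estimate for the ozone example can be compared with the significance level of .011 reported in Section 9.4; because it is a Monte Carlo estimate, it varies slightly with the seed and the number of draws.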
608 Journal of the American Statistical Association, September 1988