"Less is More."
Mies van der Rohe
Architect
The OLS command will perform Ordinary Least Squares regressions and produce
standard regression diagnostics. This was introduced in the chapter A CHILD'S GUIDE
TO RUNNING REGRESSIONS. In addition, the OLS command has an extensive list of
options that provides many features for estimation and testing of the linear regression
model. For example, tests on the residuals are available with the DLAG, DWPVALUE,
GF, LM, and RSTAT options. Model selection tests (including the Akaike information
criterion and the Schwarz criterion) are available with the ANOVA option. Standard
errors adjusted for heteroskedasticity or autocorrelation are computed with the
HETCOV or AUTCOV= options. Linear restrictions on the coefficients can be
incorporated with the RESTRICT option. Ridge regression is available with the RIDGE=
option and weighted least squares can be implemented with the WEIGHT= option.
The SHAZAM output from the OLS command includes elasticities evaluated at the
sample means. However, the printed elasticities may not be correct if the data have
been transformed, for example if the data are in logarithms. In this case, the LINLOG,
LOGLIN or LOGLOG options can be specified to obtain meaningful elasticities.
The results from the OLS command can be accepted as input by other SHAZAM
commands to obtain further analysis of the results. For example, the DIAGNOS
command is described in the chapter DIAGNOSTIC TESTS, the TEST and CONFID
commands are described in the chapter HYPOTHESIS TESTING AND CONFIDENCE
INTERVALS and the FC command is described in the chapter FORECASTING.
In general, the format of the OLS command is:

OLS depvar indeps / options

where depvar is the dependent variable, indeps is a list of independent variables and
options is a list of desired options. If any variable in the list of independent variables is
a matrix, SHAZAM will treat each column as a separate explanatory variable. To
impose a lag structure on the independent variables see the chapter DISTRIBUTED-
LAG MODELS. The available options on the OLS command are:
82 ORDINARY LEAST SQUARES
ANOVA Prints the ANalysis Of VAriance tables and the F-statistic for the test
that all coefficients are zero. In some restricted models, the F-test
from an ANOVA table is invalid. In these cases the F-statistic will not
be printed, but may be obtained with the TEST command. Model
Selection Tests are also printed. This option is in effect automatically
when running in BATCH mode. In TALK mode, the ANOVA option
must be specified if the table is desired. In BATCH mode these
statistics can be suppressed with the NOANOVA option.
DFBETAS Computes the DFBETAS statistic. Also see the INFLUENCE option.
Further details and an example are given later in this chapter.
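DLAG Durbin's h statistic is computed as:

$$h = \hat{\rho}\,\sqrt{\frac{N}{1 - N\hat{\sigma}_{\alpha}^{2}}} \qquad\text{where}\qquad \hat{\rho} = \frac{\sum_{t=2}^{N} e_t e_{t-1}}{\sum_{t=2}^{N} e_{t-1}^{2}}$$

and $\hat{\sigma}_{\alpha}^{2}$ is the estimated variance of the coefficient on the lagged dependent variable.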
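As a rough illustration of the h statistic defined above, the following Python sketch computes the autocorrelation estimate from a residual series and then h. The function name `durbin_h` and the argument `var_lag_coef` (the estimated variance of the coefficient on the lagged dependent variable) are assumptions made for this example and are not SHAZAM identifiers.

```python
import math

def durbin_h(residuals, var_lag_coef):
    """Return (rho, h) for OLS residuals e_1..e_N.

    var_lag_coef: estimated variance of the coefficient on the lagged
    dependent variable (assumed to be taken from the regression output).
    """
    n = len(residuals)
    # rho = sum_{t=2}^{N} e_t e_{t-1} / sum_{t=2}^{N} e_{t-1}^2
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, n))
    den = sum(residuals[t - 1] ** 2 for t in range(1, n))
    rho = num / den
    denom = 1.0 - n * var_lag_coef
    if denom <= 0:
        # The square root is not defined when N * variance >= 1.
        return rho, None
    return rho, rho * math.sqrt(n / denom)

# Made-up residuals, for illustration only.
rho, h = durbin_h([1.0, 0.5, -0.5, -1.0, 0.5], var_lag_coef=0.04)
```

The statistic cannot be computed when N times the variance is 1 or more, which is why the sketch returns `None` for h in that case.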
GNU Prepares GNUPLOT plots of the residuals and the fitted values. For
more information on this option see the GNUPLOT section in the
PLOTS chapter. With the GNU option the APPEND, OUTPUT=,
DEVICE=, PORT= and COMMFILE= options are also available as
described for the PLOT command.
LINLOG Used when the dependent variable is LINear, but the independent
variables are in LOG form. In this model the elasticities are estimated
as $E_k = \hat{\beta}_k / \bar{Y}$. NOTE: This option does not transform the data. The
data must be transformed by the user with the appropriate GENR
commands.
LIST LISTs and plots the residuals and predicted values of the dependent
variable and residual statistics. When LIST is specified RSTAT is
automatically turned on.
LOGLIN Used when the dependent variable is in LOG form, but the
independent variables are LINear. In this model the elasticities are
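estimated as $E_k = \hat{\beta}_k \bar{X}_k$. The log-likelihood function is evaluated as:

$$-\frac{N}{2}\ln\left(2\pi\tilde{\sigma}^{2}\right) - \frac{N}{2} - \sum_{t=1}^{N}\ln Y_t \qquad\text{where}\qquad \tilde{\sigma}^{2} = \frac{1}{N}\sum_{t=1}^{N} e_t^{2}$$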
LOGLOG Used when the dependent variable and all the independent variables
are in LOG form. In this model the elasticities are estimated as
$E_k = \hat{\beta}_k$. The log-likelihood function and $R^2$ measure are evaluated as
in the LOGLIN option above. NOTE: This option does not transform
the data. The data must be transformed by the user with the
appropriate GENR commands.
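The elasticity rules for the LINLOG, LOGLIN and LOGLOG options can be collected in a small Python sketch. The function name is hypothetical, and the LINEAR case (coefficient times the ratio of the means) is the usual textbook formula for an elasticity at the sample means rather than something stated in the option descriptions above.

```python
def elasticity_at_means(beta_k, x_mean, y_mean, form="LINEAR"):
    """Elasticity of Y with respect to X_k evaluated at the sample means.

    x_mean and y_mean are the sample means of the variables measured in
    levels; a mean is only used by the forms that need it.
    """
    form = form.upper()
    if form == "LINEAR":   # Y regressed on X (assumed textbook formula)
        return beta_k * x_mean / y_mean
    if form == "LINLOG":   # Y regressed on log(X)
        return beta_k / y_mean
    if form == "LOGLIN":   # log(Y) regressed on X
        return beta_k * x_mean
    if form == "LOGLOG":   # log(Y) regressed on log(X)
        return beta_k
    raise ValueError("form must be LINEAR, LINLOG, LOGLIN or LOGLOG")

# Illustrative numbers only.
e_lin = elasticity_at_means(2.0, x_mean=10.0, y_mean=40.0)
e_loglog = elasticity_at_means(0.7, x_mean=10.0, y_mean=40.0, form="LOGLOG")
```

The sketch makes the point of the NOTE above concrete: the option only changes how the elasticity is computed from the coefficient, so the data must already be in the stated form.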
NOMULSIGSQ With this option, for classical OLS estimation, the $(X'X)^{-1}$ matrix is
used as the complete covariance matrix of the estimated coefficients
and is NOt MULtiplied by $\hat{\sigma}^2$.
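$$\mathrm{Cor}(\hat{\beta}_i, \hat{\beta}_j) = \mathrm{Cov}(\hat{\beta}_i, \hat{\beta}_j) \Big/ \sqrt{\mathrm{Var}(\hat{\beta}_i)\,\mathrm{Var}(\hat{\beta}_j)} \qquad i,j = 1,\ldots,K$$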
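A minimal Python sketch of the correlation calculation above, assuming the coefficient covariance matrix is available as a list of rows. The function name `cov_to_cor` is made up for the example.

```python
import math

def cov_to_cor(cov):
    """Convert a K x K covariance matrix (list of rows) to correlations."""
    k = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(k)]
    # Cor(i, j) = Cov(i, j) / sqrt(Var(i) * Var(j))
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(k)] for i in range(k)]

# Illustrative covariance matrix, not taken from any SHAZAM output.
cor = cov_to_cor([[4.0, 1.0], [1.0, 9.0]])
```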
PLUSH Prints the LUSH (Linear Unbiased with Scalar covariance matrix
using Householder transformation) residuals. The Householder
transformation method first finds the N x N orthogonal matrix P
such that:
$$PX = \begin{bmatrix} P_1 X \\ P_2 X \end{bmatrix} = \begin{bmatrix} R \\ 0 \end{bmatrix}$$
REPLICATE Used with WEIGHT= when the weights indicate a sample replication
factor. When this option is specified the NONORM option is often
desirable. The effective sample size will then be adjusted upward to
be equal to the sum of the weights. The UT option is automatically in
effect.
UT This option is used with WEIGHT= if you want the residuals and
predicted values to be UnTransformed so that the estimated
coefficients are used with unweighted data in order to obtain
predicted values. The regression estimates are not affected by this
option. This option is not available with METHOD=HH.
AUTCOV= Specifies the lag length L to use in the Newey and West [1987]
variance estimator for models with autocorrelated disturbances. This
option is the autocorrelation equivalent to the HETCOV option. An
explanation of the method is in Greene [1993, p. 422]. Use AUTCOV=1
until you really understand the method. The variance-covariance
matrix of the estimated coefficients is estimated as:
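$$V(\hat{\beta}) = N (X'X)^{-1} S_{*} (X'X)^{-1}$$

where

$$S_{*} = S_0 + \frac{1}{N}\sum_{j=1}^{L}\sum_{t=j+1}^{N} w_j\, e_t e_{t-j}\left(X_t X_{t-j}' + X_{t-j} X_t'\right) \qquad\text{and}\qquad w_j = 1 - \frac{j}{L+1}$$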
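To see how the lag weights enter the variance estimator above, here is a hedged Python sketch for the special case of a single regressor with no intercept, so that every matrix reduces to a scalar. The function names are hypothetical. The form used for S_0, (1/N) times the sum of squared residuals times squared regressor values, is an assumption, since S_0 is not defined in this part of the chapter.

```python
def bartlett_weight(j, lag):
    """w_j = 1 - j / (L + 1)."""
    return 1.0 - j / (lag + 1.0)

def newey_west_var(x, e, lag):
    """Variance of the slope for one regressor x with residuals e.

    Scalar version of V = N (X'X)^{-1} S* (X'X)^{-1}.
    """
    n = len(x)
    sxx = sum(v * v for v in x)
    # Assumed S_0: (1/N) * sum of e_t^2 x_t^2
    s_star = sum((e[t] * x[t]) ** 2 for t in range(n)) / n
    for j in range(1, lag + 1):
        w = bartlett_weight(j, lag)
        # Scalar case: X_t X_{t-j}' + X_{t-j} X_t' = 2 x_t x_{t-j};
        # t runs from j+1 to N, which is index j to n-1 here.
        acc = sum(e[t] * e[t - j] * 2.0 * x[t] * x[t - j] for t in range(j, n))
        s_star += w * acc / n
    return n * s_star / (sxx * sxx)

# With lag 1 the single weight is 0.5.
w1 = bartlett_weight(1, 1)
```

With a lag length of zero the loop is skipped and only the S_0 term remains.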
FE=, FX= Specifies the entering and exiting criteria in terms of F-values rather
than probability levels when running Stepwise regressions.
SHAZAM will use the user-supplied values of FE= and FX= only
when no values are supplied for PE= or PX=. If either FE= or FX= is
not specified, it will be defaulted to the other value. If FX>FE then
SHAZAM will set FX to FE. If FE=0 then all of the step variables will
be allowed to enter. If the value specified for FX= is a large number
then all of the step variables may be removed. See the section on
STEPWISE REGRESSION in this chapter for further details.
INDW= Specifies a value for the Durbin-Watson test statistic. This option
may be used with the DWPVALUE option to get a p-value for the
Durbin-Watson statistic.
PCINFO= This option is only used on OLS commands in conjunction with the
PC command. For more information on this option see the chapter
PRINCIPAL COMPONENTS AND FACTOR ANALYSIS.
PCOMP= This option is only used on OLS commands in conjunction with the
PC command. For more information on this option see the chapter
PRINCIPAL COMPONENTS AND FACTOR ANALYSIS.
PE=, PX= These options are similar to the FE= and FX= options described
above, and are used to specify the Probability levels for Entering (PE=)
and eXiting (PX=) variables that may be stepped into the equation
when a Stepwise regression is being run. If a variable is more
significant than the PE level, it will be included. If at any step a
variable becomes less significant than PX it will be deleted from the
equation. The default values are PE=.05 and PX=.05. If either value is
not specified it will be defaulted to the other value. If PX<PE then PX
will be set to equal PE. If PE=1 then all of the step variables will be
allowed to enter. If PX=0 then all of the step variables will be
removed. See the section on STEPWISE REGRESSION in this
chapter for further details.
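The defaulting rules described for FE=, FX=, PE= and PX= can be summarized in a short Python sketch. This is only a reading of the rules in the option descriptions, not SHAZAM's code, and the function name `resolve_levels` is hypothetical.

```python
def resolve_levels(pe=None, px=None, fe=None, fx=None):
    """Return ("P", pe, px) or ("F", fe, fx) after applying the defaults."""
    if pe is None and px is None and (fe is not None or fx is not None):
        # F-values are used only when no PE= or PX= value is supplied.
        if fe is None:
            fe = fx
        if fx is None:
            fx = fe
        if fx > fe:          # FX is not allowed to exceed FE
            fx = fe
        return ("F", fe, fx)
    if pe is None and px is None:
        pe, px = 0.05, 0.05  # documented defaults
    elif pe is None:
        pe = px
    elif px is None:
        px = pe
    if px < pe:              # PX is not allowed to be below PE
        px = pe
    return ("P", pe, px)

# PE=.10 alone implies PX=.10 as well.
levels = resolve_levels(pe=0.10)
```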
RESID= Saves the values of the RESIDuals from the regression in the variable
specified.
$$\hat{\beta}_R = [X'X + kI]^{-1} X'Y \qquad\text{and}\qquad V(\hat{\beta}_R) = \hat{\sigma}^2\,[X'X + kI]^{-1}(X'X)[X'X + kI]^{-1}$$
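For intuition about the ridge formulas above, the following Python sketch works through the one-regressor case without an intercept, where X'X is a single number. The function name and the argument `sigma2` (an estimate of the error variance supplied by the caller) are assumptions for the example.

```python
def ridge_single(x, y, k, sigma2):
    """Ridge estimate and variance for a single regressor, no intercept.

    Scalar version of [X'X + kI]^{-1} X'Y and
    sigma2 * [X'X + kI]^{-1} (X'X) [X'X + kI]^{-1}.
    """
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    beta = sxy / (sxx + k)
    var = sigma2 * sxx / (sxx + k) ** 2
    return beta, var

# Illustrative data: k = 0 reproduces OLS, larger k shrinks the estimate.
beta_ols, _ = ridge_single([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], k=0.0, sigma2=1.0)
beta_ridge, _ = ridge_single([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], k=14.0, sigma2=1.0)
```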
STDERR= Saves the values of the STanDard ERRors of the coefficients in the
variable specified.
There are many temporary variables available on the OLS command. These variables
contain useful statistics from the most recent regression command in the SHAZAM
run. (For a list of the temporary variables available on each regression command see
the chapter MISCELLANEOUS COMMANDS AND INFORMATION.)
From the ANALYSIS OF VARIANCE - FROM MEAN table in the OLS output:
From the ANALYSIS OF VARIANCE - FROM ZERO table in the OLS output:
If the ANOVA option is used the model selection test statistics are available in the
temporary variables:
If the RSTAT, LIST or MAX option is used the statistics available in temporary
variables are:
If the DWPVALUE option is used then the following temporary variable is available:
EXAMPLES
This example shows how temporary variables can be useful. Consider computing the
Chow test statistic to test the hypothesis that the coefficients are the same in two
regimes (this test is described in the chapter DIAGNOSTIC TESTS). The SHAZAM
commands that compute the Chow test statistic are:
In the above example three OLS regressions are run. The first is run using all the
observations in the sample. This is the combined regression. The second and third are
run with the first 9 and last 8 observations in the sample, respectively. The purpose of
the Chow test is to test the hypothesis that the coefficients are the same in each of the
separate samples. The question mark (?) that appears before the OLS commands serves
to suppress the output, since only the $SSE variables are of interest in this problem.
(For details on the ? output suppressor and temporary variables, see the chapter
MISCELLANEOUS COMMANDS AND INFORMATION.) The contents of the $SSE
temporary variable are saved in a permanent variable before another regression is run
because only the most recent values of temporary variables are saved. When saving a
single variable, like $SSE, the GEN1 rather than GENR command should be used. The
variable F contains the Chow Test statistic computed as:
F=((CSSE-(SSE1+SSE2))/K)/((SSE1+SSE2)/(N1+N2-2K))
where SSE1 is the sum of squared errors for the OLS regression using the first 9
observations, SSE2 is the sum of squared errors for the OLS regression using the last 8
observations, CSSE is the sum of squared errors for the combined OLS regression, K is
the number of parameters, N1 is the number of observations in the first separate OLS
regression, and N2 is the number of observations in the last separate OLS regression.
The final step is to calculate the p-value. That is, for an F-distribution with (K, N1+N2-2K)
degrees of freedom find the area to the right of the calculated F-test statistic of
2.662815. This is done with the DISTRIB command which is described in the chapter
PROBABILITY DISTRIBUTIONS. On the SHAZAM output this area is listed in the
column marked 1-CDF. The p-value is .099796 so we can reject the null hypothesis at the
10% level although not at the 5% level.
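The arithmetic of the Chow statistic can be checked outside SHAZAM with a few lines of Python. The sketch below follows the formula for F given above. The function name is hypothetical and the sums of squared errors passed to it are made-up numbers, not the values from the regressions discussed in this example.

```python
def chow_f(csse, sse1, sse2, k, n1, n2):
    """Chow F statistic and its (numerator, denominator) degrees of freedom."""
    pooled = sse1 + sse2          # SSE from the two separate regressions
    df2 = n1 + n2 - 2 * k
    f = ((csse - pooled) / k) / (pooled / df2)
    return f, (k, df2)

# Hypothetical SSE values with K=3, N1=9 and N2=8 as in the sample split.
f_stat, df = chow_f(csse=150.0, sse1=60.0, sse2=40.0, k=3, n1=9, n2=8)
```

The p-value would then be the upper-tail area of the F-distribution with these degrees of freedom, which is left here to a statistical package or to the DISTRIB command.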
Note that the DIAGNOS command (see the chapter DIAGNOSTIC TESTS)
automatically computes Chow tests. That is, the same result can be obtained with the
commands:
The next SHAZAM output shows the use of the IDVAR= option on OLS using the data
that was read and printed in the chapter DATA INPUT AND OUTPUT. The variables
are the unemployment and vacancy rates for some provinces of Canada for January,
1976 as provided by Statistics Canada.
|_SAMPLE 1 9
|_FORMAT(A8,2F5.1)
|_READ PROVINCE UR VR / FORMAT
|_OLS UR VR / IDVAR=PROVINCE LIST
OLS ESTIMATION
9 OBSERVATIONS DEPENDENT VARIABLE = UR
...NOTE..SAMPLE RANGE SET TO: 1, 9
The RESTRICT option informs SHAZAM that RESTRICT commands are to follow.
More than one is allowed provided each is typed on a separate line. The equation is a
linear function of the variables (that represent coefficients) involved in the estimation.
NOTE: The restrictions must be a linear function of the coefficients. The END
command marks the end of the list of RESTRICT commands and is required. Each
restriction will add one degree of freedom.
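With linear restrictions of the form $R\beta = r$, the restricted estimator and its covariance matrix are:

$$\hat{\beta}_r = \hat{\beta} + (X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}\left(r - R\hat{\beta}\right) \qquad\text{where}\qquad \hat{\beta} = (X'X)^{-1}X'Y$$

$$V(\hat{\beta}_r) = \hat{\sigma}^2 (X'X)^{-1} - \hat{\sigma}^2 (X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}R(X'X)^{-1}$$

The $R^2$ measure is calculated as:

$$R^2 = 1 - \frac{e'e}{Y'Y - N\bar{Y}^2}$$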
where e is the estimated residuals from the restricted estimation. Note that this
formula is not bounded at zero.
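The restricted estimator can be illustrated with a small Python sketch for two regressors and a single linear restriction, in which case the bracketed matrix is a scalar. The function name and data are hypothetical. A column of ones must be passed explicitly if an intercept is wanted.

```python
def restricted_ols_2(x1, x2, y, R, r):
    """Unrestricted and restricted estimates for Y on (x1, x2).

    R is a pair (R1, R2) and r a number, so the restriction is
    R1*b1 + R2*b2 = r.
    """
    # Elements of X'X and X'Y
    a11 = sum(u * u for u in x1)
    a12 = sum(u * v for u, v in zip(x1, x2))
    a22 = sum(v * v for v in x2)
    c1 = sum(u * w for u, w in zip(x1, y))
    c2 = sum(v * w for v, w in zip(x2, y))
    det = a11 * a22 - a12 * a12
    inv = [[a22 / det, -a12 / det], [-a12 / det, a11 / det]]  # (X'X)^{-1}
    b = [inv[0][0] * c1 + inv[0][1] * c2, inv[1][0] * c1 + inv[1][1] * c2]
    # (X'X)^{-1} R'  and the scalar  R (X'X)^{-1} R'
    ar = [inv[0][0] * R[0] + inv[0][1] * R[1], inv[1][0] * R[0] + inv[1][1] * R[1]]
    rar = R[0] * ar[0] + R[1] * ar[1]
    gap = r - (R[0] * b[0] + R[1] * b[1])     # r - R * beta
    br = [b[0] + ar[0] * gap / rar, b[1] + ar[1] * gap / rar]
    return b, br

# Intercept plus one regressor, with the restriction b1 + b2 = 4.
ones = [1.0, 1.0, 1.0, 1.0]
b, br = restricted_ols_2(ones, [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0],
                         R=(1.0, 1.0), r=4.0)
```

The restricted estimates satisfy the restriction exactly (up to rounding), which is a convenient check on any implementation.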
The following is an example of the use of the RESTRICT option and RESTRICT
commands with OLS using Theil's textile data. The SHAZAM commands are:
WEIGHTED REGRESSION
$$Y_t = X_t'\beta + \epsilon_t \qquad\text{where}\qquad E(\epsilon_t^2) = \frac{\sigma^2}{W_t} \qquad\text{for } t = 1,\ldots,N$$

and $Y_t$, $X_t$ and $W_t$ are observed and the unknown parameters are $\beta$ and $\sigma^2$. The values
for $W_t$ are specified in the WEIGHT= variable. Each observation of the dependent and
independent variables is multiplied by the square root of the weight variable and the
weighted least squares estimator $\hat{\beta}_w$ is then obtained by applying OLS to the
transformed model:

$$\sqrt{W_t}\,Y_t = \sqrt{W_t}\,X_t'\beta + v_t$$
The weights are normalized to sum to the number of observations and the
transformed residuals (that may be saved with the RESID= option) are calculated as:
$$v_t^{*} = \sqrt{\frac{W_t}{\bar{W}}}\left(Y_t - X_t'\hat{\beta}_w\right) \qquad\text{where}\qquad \bar{W} = \frac{1}{N}\sum_{t=1}^{N} W_t$$
The residual variance (reported as SIGMA**2 on the SHAZAM output) is estimated as:
$$\hat{\sigma}^2 = \frac{1}{N-K}\sum_{t=1}^{N}\left(v_t^{*}\right)^2$$
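The log-likelihood function is evaluated as:

$$-\frac{N}{2}\ln\left(2\pi\tilde{\sigma}^{2}\right) - \frac{N}{2} + \frac{1}{2}\sum_{t=1}^{N}\log\left(W_t/\bar{W}\right) \qquad\text{where}\qquad \tilde{\sigma}^{2} = \frac{1}{N}\sum_{t=1}^{N}\left(v_t^{*}\right)^2$$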
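The weighted regression steps described above (normalize the weights, transform the data, apply OLS, compute the transformed residuals and the residual variance) can be sketched in Python for a model with an intercept and one regressor. The function name and the dictionary keys are hypothetical, and the weights are assumed to be strictly positive.

```python
import math

def wls_simple(x, y, w):
    """Weighted least squares of y on an intercept and x with weights w."""
    n, k = len(y), 2
    wbar = sum(w) / n
    wn = [wi / wbar for wi in w]     # normalized weights, which sum to N
    # Normal equations for OLS on data multiplied by sqrt(wn[t])
    s0 = sum(wn)
    sx = sum(wi * xi for wi, xi in zip(wn, x))
    sy = sum(wi * yi for wi, yi in zip(wn, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(wn, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(wn, x, y))
    det = s0 * sxx - sx * sx
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (s0 * sxy - sx * sy) / det
    # Transformed residuals v*_t = sqrt(W_t / Wbar) * (Y_t - X_t'b)
    v = [math.sqrt(wi) * (yi - b0 - b1 * xi) for wi, xi, yi in zip(wn, x, y)]
    ssr = sum(vi * vi for vi in v)
    loglik = None
    if ssr > 0:  # the log of a zero variance is undefined
        loglik = (-0.5 * n * math.log(2 * math.pi * ssr / n) - 0.5 * n
                  + 0.5 * sum(math.log(wi) for wi in wn))
    return {"b0": b0, "b1": b1, "resid": v, "weights": wn,
            "sigma2": ssr / (n - k), "loglik": loglik}

# Illustrative data; multiplying all weights by a constant changes nothing.
fit_a = wls_simple([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 4.0, 8.0], [1.0, 2.0, 3.0, 4.0])
fit_b = wls_simple([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 4.0, 8.0], [10.0, 20.0, 30.0, 40.0])
```

Because the weights are normalized to sum to the number of observations, only their relative sizes matter, which is what the comparison of `fit_a` and `fit_b` is meant to show.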
|_GENR PW=1/PRICE
|_OLS CONSUME INCOME PRICE / WEIGHT=PW
OLS ESTIMATION
17 OBSERVATIONS DEPENDENT VARIABLE = CONSUME
...NOTE..SAMPLE RANGE SET TO: 1, 17
SUM OF LOG(SQRT(ABS(WEIGHT))) = -.19396
Now consider an application to replicated data. In this case, consider that each
observation $(Y_t, X_t)$ is replicated $N_t$ times. The values for $N_t$ are specified in the
WEIGHT= variable. The weighted least squares estimator $\hat{\beta}_w$ is obtained as before by
applying OLS to the transformed model:

$$\sqrt{N_t}\,Y_t = \sqrt{N_t}\,X_t'\beta + v_t$$
When the REPLICATE option is used the estimated residuals are the untransformed
residuals:

$$e_t = Y_t - X_t'\hat{\beta}_w$$
When the NONORM option is used the residual variance (reported as SIGMA**2 on the
SHAZAM output) is estimated as:
$$\hat{\sigma}^2 = \frac{1}{\sum_{t=1}^{N} N_t - K}\,\sum_{t=1}^{N} N_t e_t^2$$
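A short Python sketch of the replicated-data variance, using the sum of the replication factors as the effective sample size. The function name and the numbers are hypothetical.

```python
def replicate_sigma2(reps, resid, k):
    """Residual variance when observation t is replicated reps[t] times."""
    n_eff = sum(reps)                      # effective sample size
    weighted_sse = sum(nt * et * et for nt, et in zip(reps, resid))
    return weighted_sse / (n_eff - k)

# Three distinct observations standing in for 2 + 1 + 3 = 6 records.
s2 = replicate_sigma2([2, 1, 3], [1.0, -2.0, 0.0], k=1)

# The same value results from writing every record out explicitly.
expanded = [1.0, 1.0, -2.0, 0.0, 0.0, 0.0]
s2_expanded = sum(e * e for e in expanded) / (len(expanded) - 1)
```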
SHAZAM output from the OLS command with the INFLUENCE option follows.
The HT values are defined as the diagonal values of the Hat matrix as described in
Belsley, Kuh, and Welsch [1980, Equations 2.2 and 2.15] as follows:
$$h = \mathrm{DIAG}\left(X(X'X)^{-1}X'\right)$$
where DIAG() takes the diagonal of the matrix (the HATDIAG= option can be used to
save the diagonal). The DFFIT values measure the change in the fitted value for an
observation when that observation is deleted, which reflects the resulting change in
the estimated coefficients. Denote $X(t)$ as the observation matrix with row t deleted
and $b(t)$ as the estimator obtained with this matrix. Then:
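$$\mathrm{DFFIT}_t = \hat{Y}_t - \hat{Y}_t(t) = X_t'\left[\hat{\beta} - b(t)\right] = \frac{h_t e_t}{1 - h_t} \qquad\text{where}\qquad \hat{Y}_t(t) = X_t' b(t)$$

and

$$\mathrm{DFFITS}_t = \left[\frac{h_t}{1 - h_t}\right]^{1/2} \frac{e_t}{\hat{\sigma}(t)\sqrt{1 - h_t}} \qquad\text{where}\qquad \hat{\sigma}^2(t) = \frac{1}{N - K - 1}\sum_{k \neq t}\left[Y_k - X_k' b(t)\right]^2$$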
For further reference on these statistics see Belsley, Kuh, and Welsch [1980, Equations
2.10 and 2.11].
The RSTUDENT values are the Studentized Residuals as described in Belsley, Kuh, and
Welsch [1980, Equation 2.26]:
$$e_t^{*} = \frac{e_t}{\hat{\sigma}(t)\sqrt{1 - h_t}}$$
and compares the covariance matrices $\hat{\sigma}^2 (X'X)^{-1}$ and $\hat{\sigma}^2(t)\left[X(t)'X(t)\right]^{-1}$ as described in
If the DFBETAS option is specified on the OLS regression the following output is
printed:
The DFBETAS statistic yields a scaled measure of the change in the estimated regression
coefficients when row t is deleted. For further reference see Belsley, Kuh, and Welsch
[1980, Equation 2.7].
$$\mathrm{DFBETAS}_{tj} = \frac{\hat{\beta}_j - b_j(t)}{\hat{\sigma}(t)\sqrt{(X'X)^{-1}_{jj}}} \qquad\text{for } j = 1,\ldots,K$$
$$\mathrm{DFBETA}_t = \hat{\beta} - b(t) = \frac{(X'X)^{-1} X_t e_t}{1 - h_t}$$
This statistic measures the effect of deleting row t on the estimated regression
coefficients. For further reference see Belsley, Kuh, and Welsch [1980, Equation 2.1].
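The influence statistics discussed in this section can be reproduced for a small problem with the Python sketch below, which covers a regression on an intercept and one regressor (K=2). The function names and dictionary keys are hypothetical. The deleted variance is obtained from the standard identity (N-K-1) times the deleted variance equals SSE minus e_t squared over (1 - h_t), rather than by re-running the regression for every deleted row.

```python
import math

def fit_simple(x, y):
    """OLS of y on an intercept and x; returns (b0, b1, xbar, sxx)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1, xbar, sxx

def influence_simple(x, y):
    """HT, DFFIT, DFFITS, RSTUDENT, DFBETA and DFBETAS for each row."""
    n, k = len(x), 2
    b0, b1, xbar, sxx = fit_simple(x, y)
    e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    sse = sum(ei * ei for ei in e)
    # Diagonal of (X'X)^{-1} for the design [1, x]
    xtx_inv_diag = (sum(xi * xi for xi in x) / (n * sxx), 1.0 / sxx)
    rows = []
    for xi, ei in zip(x, e):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx        # hat-matrix diagonal
        s_del = math.sqrt((sse - ei * ei / (1.0 - h)) / (n - k - 1))
        rstudent = ei / (s_del * math.sqrt(1.0 - h))
        # (X'X)^{-1} X_t for the design [1, x]
        a = (1.0 / n - xbar * (xi - xbar) / sxx, (xi - xbar) / sxx)
        dfbeta = tuple(aj * ei / (1.0 - h) for aj in a)
        dfbetas = tuple(d / (s_del * math.sqrt(c))
                        for d, c in zip(dfbeta, xtx_inv_diag))
        rows.append({"ht": h, "dffit": h * ei / (1.0 - h),
                     "dffits": math.sqrt(h / (1.0 - h)) * rstudent,
                     "rstudent": rstudent, "dfbeta": dfbeta,
                     "dfbetas": dfbetas})
    return (b0, b1), rows

# Illustrative data only.
coef, rows = influence_simple([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 4.0, 8.0, 9.0])
```

Two checks are useful when adapting the sketch: the HT values sum to K, and each DFBETA equals the difference between the full-sample coefficients and those from a regression with that row removed.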
STEPWISE REGRESSION
Stepwise regression is not widely used in econometrics because most economists use
economic theory to determine which variables belong in the model. However it is
available for those who think a computer algorithm is smart enough to replace
economic theory. For stepwise regression, a slight modification of the OLS command is
required. The format of the OLS command is:
where depvar is the name of the dependent variable, indeps (if present) are the names
of the independent variables that are always forced into the equation, stepvars are the
names of the variables that may be stepped into the equation, and options are the
desired options. (See the FE=, FX=, PE=, PX= options described above.) The
DWPVALUE, HETCOV, RESTRICT and RIDGE= options may not be used when
stepwise regressions are being run. Stepwise regression automatically uses
METHOD=NORMAL.
----------------------------------------------------------------------
|**** STEPPING SEQUENCE ****| | | | |
| | | | | | | |
| STEP |VARIABLE | | |D.F. | D.F. | |
|NUMBER| LABEL | STATUS | F-VALUE |NUM. | DEN. |F-PROBABILITY|
|--------------------------------------------------------------------|
| 1 |PRICE |STEPPED IN| 129.4014 | 1 | 15| 0.000000|
| 2 |INCOME |STEPPED IN| 15.8508 | 1 | 14| .001365|
|*SUMMARY FOR POTENTIAL VARIABLES NOT ENTERED INTO THE REG. EQUATION*|
| |YEAR |IF ENTERED| .3869 | 1 | 13| .544702|
|**** END OF STEPPING SEQUENCE **** | | | |
----------------------------------------------------------------------