Department of Economics
QIN, D. (2016) “Let’s Take the Bias Out of Econometrics” SOAS Department of Economics Working
Paper Series, No. 192, The School of Oriental and African Studies.
No. 192
Let’s Take the Bias Out of
Econometrics
by
Duo QIN
December, 2016
Department of Economics
School of Oriental and African Studies
London
WC1H 0XG
Phone: + 44 (0)20 7898 4730
Fax: 020 7898 4759
E-mail: economics@soas.ac.uk
http://www.soas.ac.uk/economics/
© Copyright is held by the author or authors of each working paper. SOAS DoEc Working
Papers cannot be republished, reprinted or reproduced in any format without the permission of
the paper’s author or authors.
Duo QIN1
December 2016
Abstract
This study exposes the specious quality of ‘endogeneity bias’. It reviews how conceptualisation
of the bias has evolved to embrace all major econometric problems, despite extensive lack of hard
evidence. It reveals the crux of the bias – a priori rejection, as conditionally invalid, of
explanatory variables in causal postulates of interest, and of the bias correction by consistent
regressors. It demonstrates cognitive flaws in this estimator-centred approach and highlights the
need to shake off the bias to let statistical learning play an active role in designing causally faithful
models.
1 Contacting author; email: dq1@soas.ac.uk. This is a substantial revision of the earlier paper entitled ‘Time to
demystify endogeneity bias’. The current version has benefitted a great deal from all the comments and suggestions
made by Ruben Lee and participants at SOAS economic seminar where the working paper was presented, as well as
anonymous referees from a journal.
SOAS Department of Economics Working Paper Series No 192 - 2015
1. Introduction
The notion of endogeneity bias arguably forms the keystone of econometrics. It played a
pivotal role in the formalisation of econometrics during the 1940s; it acts as a fundamental attribute
demarcating econometrics from statistics and other disciplines overlapping with statistics. At its
most fundamental, the bias arises when the ordinary least squares (OLS) estimator is applied to
a model whose explanatory variables are correlated with the error term. The bias then acts as a marker to divide endogenous variables from exogenous
ones. This division is succinctly described in a popular textbook by Stock and Watson (2003:
333):
‘Variables correlated with the error term are called endogenous variables, while variables
uncorrelated with the error term are called exogenous variables. The historical source of
these terms traces to models with multiple equations, in which an “exogenous” variable is
determined outside the model’ [bold in original].
‘You should not rely too much on the meaning of “endogenous” from other branches of
economics. In traditional usage, a variable is endogenous if it is determined within the
context of a model. The usage in econometrics, while related to traditional definitions, has
evolved to describe any situation where an explanatory variable is correlated with the
disturbance.’
To illustrate this usage, Wooldridge lists three examples – ‘omitted variables’, ‘measurement
error’ and ‘simultaneity’ (2010: 54-5). Two other cases are listed in Kennedy’s (2008: 139-40)
textbook – ‘autocorrelated errors’ and ‘sample selection’. Correction of the bias entails the device
of consistent estimators. A description of such estimators thus occupies the core of econometrics
textbooks.
Two points are worth noting from the above quotations. First, the concept of endogeneity
bias has changed significantly from its original use in the context of applying the OLS to an
SEM, and has since expanded to embrace virtually all the major problems which economists worry about when fitting causal postulates
with data – simultaneity bias, omitted variables, measurement error, autocorrelated errors, and
selection bias. Indeed, textbook econometrics advocates the use of consistent estimators as the
universal solution to these perceived major problems and, in doing so, spreads a preconception
against prevalent endogeneity bias widely among economists. In contrast, the causal modelling
community outside econometrics has concentrated increasingly on dissecting two key conditions
for the adequate closure of statistical models – the causal Markov condition and the related
structure learning by means of computers, e.g. see Wermuth and Cox (2011), and Kalisch and
Bühlmann (2014).2 Endogeneity bias has thus played a decisive role in widening the gap in
research strategies used for causal modelling between econometrics on the one hand and statistics
and related disciplines on the other.
In order to help bridge the gap, this paper probes into the conceptualisation of endogeneity
bias to reveal its specious quality. Essentially, the probe pins down the bias to the a priori
rejection of direct translation of causal postulates of interest into statistically conditional relations,
or more precisely, the rejection of the explanatory variable of interest as a valid conditional
variable, and the consequent bias correction to modification of the variable in question by non-
uniquely and non-causally generated regressors (section 3). As such, the issue should be
conceptually tackled as one of causal model specification rather than estimation, an argument
which has been repeatedly raised and debated in the history (section 2). The estimation outlook
is, however, pivotal in maintaining the bias. Mathematical derivations of consistent estimators,
shown as the analytical solution to correct the bias, are so impeccable that they almost completely
camouflage their shaky premise – existence of the error term as autonomous as economic
2 A recent book edited by Mayo and Spanos (2010) is a rare exception. However, a search with Google Scholar
yields no citations of this book by econometricians or economists once self-citations are discounted.
variables (section 4). In practice, the strength of those derivations is limited severely by the facts
that the error term is the residual derivative of a model and that the causally bivariate relations
based on which those derivations are elaborated are much too simplistic for the economic reality.
These facts help explain not only why it is impossible to measure directly and robustly the
corrections in question, but also why failures are widespread in getting consistency cross-verified
empirically. Fundamentally, this estimator-centred approach cannot be scientific because it seeks to reduce and entangle different sources of key econometric
problems into one symptom – the presumed correlation, and to settle on analytical solutions so
long as they remove this symptom. At its core, this approach depends on a priori model closure.
Untenability of such closure is reflected in extremely naïve translations of economic reality into
causally bivariate models (Section 5). Faithful translations require the profession modify its
predominantly analytical-solution based standpoint to let statistical learning play an active and
systematic role, especially when it comes to decisions as to whether, under what circumstances
and/or to what degree causal postulates are, or indeed are not, directly translatable into
statistically conditional relations.3 A paradigm shift toward a posteriori model closure entails
releasing the profession from the conceptual trap of endogeneity bias. To facilitate this task, the
rest of this paper tries to demystify the bias by reviewing its historical formation (section 2),
offering a common and as simple as possible explanation of the crux of the bias from different
sources as well as its treatment (section 3), clarifying the cognitive deficiency in judging
consistency by a priori analytical solutions alone (section 4), and highlighting the fundamental
importance for economists to shake off the bias and engage actively in searching for causally faithful models (section 5).
3 Methodologically, the issue on positions of model closure is closely related to the recurring debate over realism
of econometrics and economics as well, e.g. see Sims (1980), Hoover (2001), Mäki (2002), Colander et al (2009),
Romer (2016).
The traditional usage of the term ‘endogeneity bias’ referred to by Wooldridge stems from
Haavelmo’s 1943 exposition of simultaneity bias (SB) when the OLS was applied to an SEM.
The related history has been well studied, e.g. see Christ (1952), Epstein (1987; 1989), Qin
(1993). In the work of the Cowles Commission (CC) during the 1940s, solutions to the problem
were formalised into the device of multiple-equation based consistent estimators preceded by a
step of parameter identification of SEMs. Herman Wold (1954; 1956; 1960) was almost the sole
voice standing against this CC approach at the time and attributed the problem to inadequately
formulated causal models in terms of conditional expectations. It took several decades, however,
before Wold’s causal modelling idea won a de facto victory through reforms in dynamic macro-
econometric modelling. The victory was best reflected in a general reduction of concern about
SB as the Vector Auto-Regression (VAR) type of models became embraced by the macro
modelling community.4 However, empirical
evidence of the bias was scant almost from the start. Initial experiments by Haavelmo failed to
yield significant OLS bias (1947), see also Girshick and Haavelmo (1947). Similar results were
also obtained in subsequent investigations, e.g. see Christ (1960), and led to Waugh’s verdict
(1961) endorsing the OLS as adequate for applied purposes. This verdict has been repeatedly
verified in various applied cases since then. Amazingly, all these empirical results were
anticipated by Wold's demonstration that the OLS remains valid for an SEM when the model is adequately specified in terms of its causal chain - see Wold and Juréen,
(1953: 37-8).
One conceptual issue which emerged as problematic during the dynamic macro-
econometric modelling was the endogenous-exogenous classification, e.g. see Aldrich (1993).
4 See Qin (2013) for a detailed study of the history of this time period.
This has led to a formal redefinition of ‘exogeneity’ by Richard (1980) and Engle et al (1983).
Their aim was to clarify under what conditions certain explanatory variables of interest were valid
conditional variables in statistical models. Three levels of exogeneity were identified – ‘weak’
exogeneity based essentially on causal reasoning, ‘strong’ exogeneity via time sequencing and
‘super’ exogeneity via cross-regime invariance.5 Noticeably, the latter two are based on statistical
criteria, and their evaluations are shown to rest on the parameters of interest, i.e. those
representing the effects of the exogenous variables. The redefinition thus highlights the close link
between the parameters of interest and causal specification of explanatory variables, as well as
the empirical importance of moving away from the SEM tradition to asymmetric causal model formulation.
While concern about SB was dissipating among macro modellers, the possibility of
correlation between one explanatory variable and the error term attracted new attention in micro-
econometric research. This research was pioneered mainly by James Heckman (1976; 1978; 1979).6
Prior to 1970, the OLS bias in LDV models had already been well understood and tackled
by maximum likelihood based estimators such as probit and tobit. What led micro-econometric
research to reorient its direction was a practical situation where the explanatory variable of
interest in an LDV model was also truncated, such as the wage rate of labour supply in the context
of cross-section survey data. In order to establish a consistent estimator of the parametric effect
of a truncated explanatory variable, Heckman narrowed his research lens from an LDV regression
as a whole onto a particular explanatory variable within the regression. This new lens enabled
him to translate the truncation-induced OLS bias into a bias caused by an omitted variable, the
5 Outside econometrics, invariance is shown to be a strong condition for causal linear stochastic dependence by
Steyer (1984; 1988) in psychometrics. Cross-sample invariance is subsequently shown by Steyer et al (2000) as a
necessary condition to ensure causal regression models not suffering from OVB. See also Freedman (2004) on the
importance of invariance in non-experimental data regression analyses.
6 A brief historical account of this research and also the subsequent developments in programme evaluation methods
is given in Qin (2015, 2.2). The following description is written to complement rather than repeat that account.
inverse Mill’s ratio, a variable derived from the truncated error term of the LDV model.
Correlation between the truncated explanatory variable and the error term was thus established.
Derivation of the inverse Mill’s ratio led to an extension of the single-equation LDV model into
a two-equation one, which closely resembled the two-stage least squares (2SLS) representation
of the instrumental variable (IV) solution to SEMs. However, and in contrast to the usual 2SLS
practice, Heckman attached a behavioural interpretation to the added equation, stating that it depicted the self-selection decisions of the micro agents involved.
On the applied front, evidence of self-selection bias (SSB) appeared much easier to obtain
than that of SB, if judged by the statistical significance of the inverse Mill’s ratio. However, it
gradually transpired that such evidence lacked robustness in that it depended on the extensive
presence of collinearity among possible control variables. It thus proved impossible to pin down
a unique inverse Mill’s ratio to verify conclusively the presence of SSB, e.g. see Puhani (2002).
Moreover, it was found that there is frequently a negligibly small difference between an IV
treatment of SB and one of SB plus SSB on an ‘endogenous’ variable, such as wage rate, e.g. see
Blau and Kahn (2007), Qin et al (2016). These findings suggest a very weak connection between
SSB and SB, but a rather strong one between SSB and multicollinearity. The latter severely
discredits Mill’s ratios as effective measures of SSB and raises the question of how much of agents’ self-selection behaviour such ratios actually capture.
The connection between SSB and SB was subsequently abandoned in the development of
programme evaluation methods. These were developed mainly during the post 1980 period and
drew heavily on the SSB literature, e.g. see Cameron (2009, 14.5) and Wooldridge (2010, 21.1).
Randomisation of social programmes was frequently impossible because some participants of the programmes could be self-selected rather than
randomly selected. When average treatment effect (ATE) models were adopted from medical
science for evaluating social programmes, randomisation failures were regarded as a major
challenge, e.g. see Heckman (1992). In addition to sample selection problems concerning the
comparability between the treated group and the control group, self-selection behaviour was
seen as a distinct hurdle.7 Heckman's model with endogenous
dummy variables (1978) demonstrated an attractive route to tackling this issue. Once the ATE
was attached to an endogenous dummy variable, SSB correction became associated with
randomisation and the IV route was resorted to naturally, e.g. see Heckman (1996).
On the applied side, this IV route has been strongly promoted by Angrist and his associates
through a series of studies, see Angrist (1990), Angrist and Krueger (1991), and Angrist and
Pischke (2009). While their studies helped popularise the prevalence of endogeneity bias, their
applications gave rise to serious debate over the interpretability of IV-generated estimates such
as the ATE, e.g. see Angrist et al (1996, with discussion). As a result, their interpretation was
narrowed down to local ATE (LATE). Noticeably, this revised interpretation implies a partial
retreat from the original causal postulate, in that the programme dummies might no longer fully represent the programme implemented in reality.
Angrist and Pischke (2015, p227) also acknowledged the possibility of the IV choice ending up
with ‘a failed research design’. A probe into their failed case by van Hüllen and Qin (2016) tracks
down the failure to misconceived causal model designs rather than estimator choices.
Similar debates have recurred in the field of development economics (see Journal of
Economic Literature, 2010, no 2). There, the key problem of IV-assisted quasi-randomisation is
critically examined by Deaton. Although his critique harks back to the
highly theoretical style of causality analysis by the CC in rivalry with Wold’s causal chain
7 Such behaviour is referred to as ‘selection on unobservable’ in textbooks as opposed to ‘selection on observable’,
which covers both OVB and sampling selection concerning comparability of the two groups.
modelling arguments over half a century ago, the issues examined by Deaton are widely and
closely relevant to policy related applied modelling research. The accumulation of fragile and
imprecise IV estimates, which have been produced out of concern over the presence of
endogeneity bias, has reached such a state that it is no longer possible for the wide applied
community to maintain faith in this approach. Consequently, the IV route has increasingly been abandoned.
The trend among macroeconomics and development economics to abandon the IV route
raises the question of why micro-econometric researchers have been side-tracked into an infertile
path already explored and abandoned by macro researchers. The transmutation of simultaneity-
induced endogeneity bias into one all-bias-inclusive phenomenon is pivotal because few applied
modellers would ignore the probable presence of omitted variable bias (OVB) or SSB, due to
their close association with the highly interdependent nature of economic causes and the decision-
making effect of micro agents. The fact that empirical results for treatments of the bias have been
largely disappointing, however, suggests the existence of flaws in both the conceptualisation of
endogeneity bias and its treatment. The aim of the next section is to dissect the different sources
of the bias in order to explore the common nature of the bias treatment. The subsequent two
sections are devoted to extensive discussions about the cognitive flaws surrounding the bias.
The anatomy is carried out here on three key sources of endogeneity bias – SB (simultaneity
bias), OVB (omitted variable bias) and SSB (self-selection bias). ‘Measurement error’ is
discussed in relation to both SB and SSB. The anatomy aims at finding a common rationale upon
which these biases are believed reducible to one common symptom – the correlation in question.
It tracks down this rationale via the IV-estimator route of treating the biases, a universal route
taught in textbooks, and exposes the nature of the route beyond the estimation box – modification
of the explanatory variable by non-uniquely and non-causally generated regressors on the a priori
conviction that the variable is an invalid conditional variable. The exposure can be viewed as an
extension of the key finding in Qin (2015). Mathematical demonstration is kept to a minimum
and causal interpretation of various models is illustrated in causal graphs to facilitate the logical
exposition.8
Let us analyse SB with a bivariate case for simplicity. When two variables are jointly
distributed, their joint density can be decomposed as:
(1) 𝑓𝑥,𝑦 = 𝑓𝑦|𝑥 𝑓𝑥 .
Statistical models for causal inference are commonly based on the conditional expectation 𝐸𝑦|𝑥
of 𝑓𝑦|𝑥 where 𝑓𝑥 is marginalised out. The concept of conditional expectation is crucial in bridging
causal postulates with statistical evidence by means of data aggregation. Here, it underpins the regression model:
(2) 𝑦 = 𝛽𝑦𝑥 𝑥 + 𝜀𝑦 .
Now, the decomposition in (1) is de facto refuted by Haavelmo (1943) and the works of the
CC on the SEMs, although they endorse the joint distribution, 𝑓𝑥,𝑦 , as being fundamental in
econometrics. The refutation is embodied in their rejecting (2) in favour of an SEM, such as:
(3) 𝑦 = 𝛽1 𝑥 + 𝜖𝑦
    𝑥 = 𝛽2 𝑦 + 𝜖𝑥
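The size of the OLS bias implied by a model like (3) can be gauged by simulation. The following sketch is purely illustrative (the parameter values and error variances are arbitrary assumptions, not taken from the paper): it solves the reduced form of a bivariate SEM and compares the OLS estimate with the postulated coefficient.

```python
# Illustrative simulation of a bivariate SEM in the form of (3);
# all parameter values are assumed for demonstration only.
import numpy as np

rng = np.random.default_rng(0)
b1, b2, n = 0.5, 0.3, 100_000
e_y = rng.normal(0, 1, n)
e_x = rng.normal(0, 1, n)

# Reduced form: solve the two simultaneous equations for (x, y).
det = 1.0 - b1 * b2
y = (e_y + b1 * e_x) / det
x = (e_x + b2 * e_y) / det

# OLS of y on x picks up cov(x, e_y) != 0:
b_ols = np.cov(x, y)[0, 1] / np.var(x)
# Analytical plim for unit error variances: b1 + b2*(1 - b1*b2)/(1 + b2**2).
b_plim = b1 + b2 * det / (1 + b2**2)
print(b_ols, b_plim)  # both near 0.73, well away from b1 = 0.5
```

The gap between the plim and b1 is the simultaneity bias for this particular, deliberately strong, feedback design; weaker feedback (smaller b2) shrinks it rapidly, in line with the scant empirical evidence discussed in section 2.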
Based on (3), Haavelmo demonstrates SB of the OLS, 𝛽𝑦𝑥 ≠ 𝛽1, via 𝑐𝑜𝑣(𝑥𝜖𝑦 ) ≠ 0. But such a
bi-directional position on 𝑓𝑥,𝑦 makes (3) mathematically impossible for statistical estimation. This
impasse was resolved by identification conditions.
These conditions secure ways to decompose 𝑓𝑥,𝑦 indirectly with the help of additional exogenous
8 Causal graphs, also known as directed acyclic graphs, are widely used in statistics and computing, e.g. see Pearl
(2009), Wermuth and Cox (2011); see also Spirtes (2005) and Elwert (2013) for their potential in econometric and
social research respectively.
variables,9 variables which are regarded simply as instruments for the consistent estimation of
the ‘structural’ parameters of the SEM, such as 𝛽1 and 𝛽2 in (3). Consistent estimation of a single
equation in an SEM can be generically represented by the 2SLS, e.g. the upper equation in (3):
(4) 𝑥 = 𝑉′𝜸𝑥𝑉 + 𝑢𝑥
    𝑦 = 𝛽𝑦𝑥𝑉 𝑥̂𝑉 + 𝜖𝑦𝑉
where V denotes the IV set and 𝑥̂𝑉 the OLS fitted x from the upper equation. 𝛽𝑦𝑥𝑉 is known as the IV (2SLS) estimator.
In fact, this IV estimator is not harmless with respect to (3). It acts as an implicit model
modifying device to break circular causality in SEMs. In order to demonstrate this point more
clearly, consider the case of an errors-in-variables model in which the explanatory variable of
interest is latent, or suffers from measurement errors. This is a case where the IV method was
originally devised:
(5) 𝑦 = 𝛽𝑦𝑥∗ 𝑥∗ + 𝜀𝑦 , 𝑥 = 𝑥∗ + 𝑥″.
Here, IVs essentially serve as a means to trim off noisy errors, 𝑥″, from x, the observed
counterpart of the latent and measurement-error free 𝑥 ∗ . Figure 1 provides a graphic illustration
Figure 1 [graphs not reproduced; left panel: Model (5); right panel: its IV treatment]
Note: The square symbol indicates a latent variable; the arrowed line indicates a
probabilistically conditional relation; the dissimilarity between 𝑥̂𝑉 and 𝑥 is
shown by a semicircle (seminode) versus a circle (node); and dotted lines
indicate non-uniqueness.
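The trimming role of IVs in the errors-in-variables case can also be sketched numerically. The simulation below is illustrative only (the data-generating values are assumptions of mine): it shows the OLS attenuation caused by the noise 𝑥″ and its removal by a simple IV (Wald-type) estimate.

```python
# Illustrative errors-in-variables simulation; parameter values are assumed.
import numpy as np

rng = np.random.default_rng(1)
n, b = 200_000, 1.0
v = rng.normal(0, 1, n)                     # instrument: correlates with x* only
x_star = 0.8 * v + rng.normal(0, 0.6, n)    # latent, error-free regressor
x = x_star + rng.normal(0, 1, n)            # observed x = x* + noise x"
y = b * x_star + rng.normal(0, 1, n)

b_ols = np.cov(x, y)[0, 1] / np.var(x)            # attenuated: ~ b*var(x*)/var(x)
b_iv = np.cov(v, y)[0, 1] / np.cov(v, x)[0, 1]    # trims the noise off x
print(b_ols, b_iv)  # roughly 0.5 versus 1.0
```

The IV estimate recovers b here precisely because v satisfies both conditions (i) and (ii): it is uncorrelated with the noise and with 𝜀𝑦, and its fitted values deliberately do not reproduce x.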
9 This interpretation was implied in Wermuth’s (1992) in-depth analysis of how over-parameterisation in multivariate
linear structural equations results in non-decomposable independence hypotheses, and identification conditions help
to remove the over-parameterisation so as to achieve decomposable independence.
It is vital to note from Figure 1 that the IV treatment implies two conditions: (i) IVs should
be uncorrelated with y given 𝑥∗, i.e. contribute nothing to the conditional expectation, 𝐸𝑦|𝑥∗ , and (ii) the aim of the IV filter is not to
optimally predict x, as is normally expected of a regression model design, i.e. 𝑥̂𝑉 ≉ 𝑥 must hold.
The first condition is denoted by 𝑦 ⊥ 𝑉|𝑥 ∗ and the second by the dotted semicircle symbol in the
right panel. When the IV method is applied to SEMs, these two conditions hold in spite of the
fact that x is not regarded as suffering from measurement errors. The left panel of Figure 2 depicts
model (3) and the right panel the 2SLS-IV solution of (4).
Figure 2 [graphs not reproduced; left panel: Model (3); right panel: Model (4)]
Figure 2 shows us how the bi-directional position in the left panel is broken by the
introduction of IVs. In other words, it shows how an SEM is revised into an asymmetric,
causal-chain model; the bi-directional position of SEMs is thus falsified implicitly. The revised
position suggests an alternative decomposition to (1), conditioning y on 𝑥̂𝑉 instead of x,
because condition (i), i.e. 𝑦 ⊥ 𝑉|𝑥̂𝑉 , enables the marginalisation over 𝑥̂𝑉 . This decomposition
is rarely made explicit, however, because the IV treatment is confined well within the estimation domain. Nonetheless, there are two places
where this condition is discernible and indispensable. The first is the Durbin and Wu-Hausman
endogeneity test, where rejection of the null hypothesis amounts to endorsing this condition. The
second is the generalised method of moments (GMM) – the generalised form of IV estimators. In matrix notation, the IV estimator has the dual expression:
𝛽𝑦𝑥𝑉 = (𝑋′𝑉(𝑉′𝑉)−1𝑉′𝑋)−1 𝑋′𝑉(𝑉′𝑉)−1 𝑉′𝑌, or: 𝛽𝑦𝑥𝑉 = (𝑋𝑉′ 𝑋𝑉)−1 𝑋𝑉′ 𝑌, where 𝑋𝑉 = 𝑉𝜸𝑥𝑉 .
The above dual expression tells us how the conditional expectation of x implied by the OLS is
modified by V, and the modification amounts to replacing x by a composite index of it where the
weights are derived from V. In other words, the IV treatment can be interpreted as rejecting x on
the grounds that it suffers from severe measurement errors, while 𝑥̂𝑉 is assumed to be the error-free variable.
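The dual expression can be verified mechanically. The sketch below (simulated data with assumed parameter values, for illustration only) computes the 2SLS estimate both by the explicit two-step route of (4) and by the closed-form matrix formula, confirming that the two coincide.

```python
# Illustrative check that two-step 2SLS equals the closed-form dual expression.
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
V = rng.normal(0, 1, (n, 2))                 # two instruments
e_y = rng.normal(0, 1, n)
x = V @ np.array([0.7, -0.4]) + 0.5 * e_y + rng.normal(0, 1, n)
y = 0.5 * x + e_y                            # cov(x, e_y) != 0 by construction

# Step 1: first-stage OLS of x on V; Step 2: OLS of y on the fitted x-hat.
g = np.linalg.lstsq(V, x, rcond=None)[0]
x_hat = V @ g
b_2step = (x_hat @ y) / (x_hat @ x_hat)

# Closed form: (X'V (V'V)^-1 V'X)^-1 X'V (V'V)^-1 V'Y, with scalar X here.
xV = x @ V @ np.linalg.inv(V.T @ V)          # X'V(V'V)^-1, shape (2,)
b_dual = (xV @ (V.T @ x)) ** -1 * (xV @ (V.T @ y))
print(b_2step, b_dual)  # identical up to rounding; both near the true 0.5
```

The equality holds because the second-stage regressor x̂ is the projection of x onto the instrument space, so x̂′x = x̂′x̂; the estimator literally replaces x by a V-weighted composite index, as argued in the text.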
The above interpretation helps us better understand why the treatment has been effectively
contested. Its credibility rests on the two conditions, which explains why disputes over the
credibility of chosen IVs and the estimates they generate have never ceased. Moreover, while it is not
difficult to find IV estimates which differ significantly from their OLS counterparts, the latter
almost surely outperform the IV estimates as time-series samples extend, indicating a lack of
the presumed measurement errors in x. Condition (ii) thus fails to hold. The validity of the postulate, 𝐸𝑦|𝑥 , is effectively restored.
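The regression form of the Durbin and Wu-Hausman endogeneity test mentioned above can be sketched as follows. This is an illustrative implementation under my own assumptions (function name and simulated design are hypothetical): regress x on the IVs, append the first-stage residual to the y equation, and test the residual's coefficient.

```python
# Illustrative regression (control-function) form of the Durbin/Wu-Hausman test.
import numpy as np

def dwh_tstat(y, x, V):
    """t-statistic on the first-stage residual in the augmented y-regression."""
    W = np.column_stack([np.ones_like(x), V])
    g = np.linalg.lstsq(W, x, rcond=None)[0]
    u = x - W @ g                                 # first-stage residual
    X = np.column_stack([np.ones_like(x), x, u])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    s2 = (e @ e) / (len(y) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[2, 2])
    return b[2] / se                              # large |t| => cov(x, e_y) != 0

rng = np.random.default_rng(3)
n = 10_000
V = rng.normal(0, 1, (n, 2))
e_y = rng.normal(0, 1, n)
x = V @ np.array([0.8, 0.5]) + 0.6 * e_y + rng.normal(0, 1, n)
y = 1.0 * x + e_y
print(dwh_tstat(y, x, V))  # strongly significant for this endogenous design
```

Note how the test presupposes the validity of V: rejection of the null endorses condition (i) only on the maintained assumption that the chosen IVs are themselves exogenous.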
Endogenisation of x does not automatically nullify its conditional status with respect to another
endogenous variable.
A prerequisite of that conditional status is that VARs are built on a recursive structure. The
prerequisite was eventually accepted tacitly by VAR modellers, after their unsuccessful attempts
to try and maintain the SEM paradigm by extending 𝑓𝑥,𝑦 explicitly with a lagged information set
so as to facilitate the decomposition: 𝑓𝑥,𝑦,𝑉−1 = 𝑓𝑥,𝑦|𝑉−1 𝑓𝑉−1 , e.g. see Qin (2013, Chapter 3). After
all, statistically operational models have to start from a clearly specified asymmetry between explanatory and dependent variables.
OVB arises from correlation between an explanatory variable, x,
and an omitted variable, z, in conditional models, such as (2), when 𝑐𝑜𝑣(𝑥𝑧) ≠ 0 is believed to
hold. Although rarely phrased in explicit causal terms, such a causal relationship is inevitably implied in the diagnosis.
Let us dissect OVB here from the causal perspective. There are obviously two ways to
factorise 𝑓𝑥,𝑦,𝑧 when we take z into consideration while holding y as the modelled variable:
(7a) 𝑓𝑥,𝑦,𝑧 = 𝑓𝑦|𝑥,𝑧 𝑓𝑥|𝑧 𝑓𝑧
(7b) 𝑓𝑥,𝑦,𝑧 = 𝑓𝑦|𝑥,𝑧 𝑓𝑧|𝑥 𝑓𝑥
Their regression counterparts share the conditional model:
(8) 𝑦 = 𝛽𝑦𝑥.𝑧 𝑥 + 𝛽𝑦𝑧.𝑥 𝑧 + 𝜈𝑦
combined with, respectively:
(9a) 𝑥 = 𝛽𝑥𝑧 𝑧 + 𝜈𝑥
(9b) 𝑧 = 𝛽𝑧𝑥 𝑥 + 𝜈𝑧 .
Interestingly, OVB is of little causal concern in the latter case, i.e. (7b) or (8)+(9b), a case known
as a mediation model (the right panel of Figure 3). That is because all the parameters are regarded
as interpretable: 𝛽𝑦𝑥.𝑧 as the direct effect, the product, 𝛽𝑦𝑧.𝑥 𝛽𝑧𝑥 , as the indirect effect and 𝛽𝑦𝑥 of
the bivariate regression (2) as the total effect. It is only in (7a), or (8)+(9a), that OVB is regarded
as problematic, a problem referred to as confounding in statistics (the left panel of Figure 3).
Specifically, 𝛽𝑦𝑥 is shown to suffer from OVB defined as 𝛽𝑦𝑥 − 𝛽𝑦𝑥.𝑧 = 𝛽𝑦𝑧.𝑥 𝛽𝑧𝑥 .
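The OVB expression is an exact algebraic property of OLS and can be checked numerically. The sketch below (illustrative parameter values, assumed for demonstration) simulates data from a confounding design in the spirit of (8) plus (9a) and compares the bivariate and the z-controlled regressions.

```python
# Illustrative check of the OVB identity: beta_yx - beta_yx.z = beta_yz.x * beta_zx.
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
z = rng.normal(0, 1, n)
x = 0.6 * z + rng.normal(0, 1, n)              # z confounds x, cf. (9a)
y = 1.0 * x + 0.8 * z + rng.normal(0, 1, n)    # cf. (8)

X = np.column_stack([np.ones(n), x, z])
b_long = np.linalg.lstsq(X, y, rcond=None)[0]  # b_long[1]: beta_yx.z; b_long[2]: beta_yz.x
b_short = np.cov(x, y, ddof=0)[0, 1] / np.var(x)   # bivariate beta_yx
b_zx = np.cov(x, z, ddof=0)[0, 1] / np.var(x)      # auxiliary regression of z on x
ovb = b_short - b_long[1]
print(ovb, b_long[2] * b_zx)  # the two quantities coincide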
The contrasting attitudes towards omitted correlated variables in these two types shed light
on the time-series modelling reforms in macroeconometrics. When OVB is due to omitted lagged
variables, direct dynamic extension of static models results in models with confounding variables
– those lags. Model transformation into ECMs or using cointegration techniques effectively
recasts those models into the mediation type, such that the long-run steady-state component, i.e.
the core theoretical component, is positioned at the root of the causal chain, as compared to the short-run dynamic components.
When lagged correlated variables do not form the main source of OVB risk, however, ways
to circumvent confounding become varied. In order to reveal their common approach, we need
to ponder over the three special cases of (7a), as illustrated in Figure 4.10 Case (c) is irrelevant to
our current discussion as it contradicts the basic causal postulate, 𝑥 → 𝑦.11 Case (a) tells us that
z can be safely omitted when it affects y only through x, a case seldom dwelt upon in
econometrics for the obvious reason that it rules out the need for multivariate models.
Remarkably, case (a) assumes a fundamental position in textbooks – the maintained model within
which all consistent estimators aimed at circumventing the presumed 𝑐𝑜𝑣(𝑥𝜀𝑦 ) ≠ 0 are devised.
Omission of z is also justifiable in case (b) as long as the modelling objective stays at the level of 𝐸𝑦|𝑥 .
Figure 4 [graphs not reproduced: three special cases (a)-(c) of (7a), over variables y, x, z]
Case (b) underpins almost all methods to combat confounding using non-experimental data.
A major obstacle here, however, is the fact that many postulated causes do not satisfy 𝑥 ⊥ 𝑧. The
10 See Cox and Wermuth (2004) for more discussion of these cases.
11 Notice that maintaining model (2) in case (c) leads to nonsense regression.
IV method is thus proposed as a generic way to bypass this obstacle following the same treatment
route of SB. Imagine the following situation: z is not directly observable from available data but
is postulated as a confounder, analogously to (8). However, the lack of data for z leads to it being grouped into the error term, 𝜀𝑦 = 𝛽𝑦𝑧.𝑥 𝑧 +
𝜈𝑦 , and hence the diagnosis: 𝑐𝑜𝑣(𝑥𝜀𝑦 ) ≠ 0. This diagnosis entangles OVB with ‘endogeneity
bias’.
Suppose that we can find IVs which are uncorrelated with 𝑧 ∗ , the latent z. The IV treatment
is then shown to be effective in circumventing the OVB with respect to 𝑧 ∗ . Similar to the SB case, the IV
treatment is not harmless for conditional decompositions, such as (7a) versus (7b), which should
arise naturally from substantive postulates. The treatment modifies (8) by substituting x with 𝑥̂ 𝑉 .
Since 𝑥̂ 𝑉 ≉ 𝑥, 𝛽𝑦𝑥̂ 𝑉.𝑧 ≠ 𝛽𝑦𝑥.𝑧 . Hence, it should not produce the parameter of interest originally
specified in (8) if the IV treatment is effective. Figure 5 illustrates this modification – from the
left panel in Figure 3 into case (b) in Figure 4 through deactivating the chain effect. The
Since 𝑧 ∗ is latent and 𝑥̂ 𝑉 ⊥ 𝑧 ∗ is assumed to hold due to 𝑉 ⊥ 𝑧 ∗ by design, 𝐸𝑦|𝑥̂ 𝑉 is thus regarded as free of confounding.12
12 It should be noted that the orthogonal condition 𝑉 ⊥ 𝑧 ∗ may result in a modification of the modelled variable as
well. This case is not discussed here in order to keep the demonstration simple. Further discussion on this point is
provided in the next section.
This freedom, however, comes at the price of a modification of the original causal postulate, as already shown in the SB case. It is no wonder the
IV treatment here has also met with unceasing scepticism over its substantive credibility. But the
nature of the problem is probably harder to perceive than before, not only because the role of IVs
is discussed within the confinement of estimation methods in textbooks, but also because of the
non-causal requirement of IVs. This requirement excludes them from ever being considered in
augmenting the causal chains, e.g. 𝑓𝑦|𝑥,𝑧 in (7a) to 𝑓𝑦|𝑥,𝑧,𝑉 , and reinforces the belief that they can
generally help achieve the reduction of models of the confounding type to special case (b) in a
causally neutral way, e.g. from 𝑓𝑦|𝑥,𝑧 to 𝑓𝑦|𝑥̂ 𝑉 . This specious capacity is so attractive to modellers
whose major concern is how to stay within highly partial postulates while still feeling secure
about the ceteris paribus condition, e.g. see Angrist and Pischke (2015, Introduction), that it
makes the OVB-based diagnosis of 𝑐𝑜𝑣(𝑥𝜀𝑦 ) ≠ 0 much more convincing than the SB-based one.
Attributing those omitted variables as latent further exempts the presumed correlation from ever being tested empirically.
Nonetheless, SB and OVB are discernibly different as far as their behavioural connotations
are concerned. An essential notion to bridge the difference is the bias induced by self-selection.
The SSB issue arises from LDV models in which the key explanatory variable of interest
is also data truncated. Let us modify a multivariate model, such as (8), accordingly (assuming 0
as the threshold):
(11) 𝑦𝑖 = 𝛽𝑦𝑥.𝑧 𝑥𝑖 + 𝛽𝑦𝑧.𝑥 𝑧𝑖 + 𝜈𝑖,𝑦 , 𝑤ℎ𝑒𝑛 𝑥𝑖 > 0
where 𝑖 denotes sample observation. It is hypothesised that the observable part of x, the key
explanatory variable of interest, is a biased representation of the population due to certain self-
selection behaviour of the agents concerned. A classic example is wage rate in an hourly labour
supply model using household survey data. This bias is regarded as equivalent to 𝑥𝑖 suffering
from OVB in (11) by Heckman in his probit-OLS two-step procedure. The procedure exploits
data truncation by turning it into a binary LDV equation to capture the presumed self-selection
behaviour:
(12) 𝑑𝑖 = 1 if 𝛾𝑥.𝑉 𝑉𝑖 + 𝑢𝑖,𝑑 > 0, and 𝑑𝑖 = 0 otherwise,
under the assumption: 𝑐𝑜𝑣(𝑢𝑑 𝜈𝑦 ) ≠ 0. An inverse Mill’s ratio, 𝑟𝑖 ,13 is generated from probit
regression of (12) and added to (11) to correct the presumed OVB in 𝑥𝑖 due to SSB:
(13) 𝑦𝑖 = 𝛽𝑦𝑥.𝑟𝑧 𝑥𝑖 + 𝛽𝑦𝑧.𝑥 𝑧𝑖 + 𝛽𝑦𝑟.𝑥 𝑟𝑖 + 𝜈𝑖,𝑦 𝑤ℎ𝑒𝑛 𝑥𝑖 > 0
Let us now assume 𝑧 ⊥ 𝑉 for simplicity. Although this assumption is too strong when compared
with the usually required exclusion restriction, it reveals a vital difference between tobit and the
Heckman procedure. While tobit corrects the LDV effect across all the explanatory variables in
a multivariate model indiscriminately, the Heckman procedure does not. The latter only targets
𝛽𝑦𝑥.𝑧 , but not 𝛽𝑦𝑧.𝑥 in (11), because only 𝑥𝑖 is assumed to be suffering from OVB due to SSB.
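The mechanics of this probit-OLS two-step procedure can be sketched numerically. The simulation below is purely illustrative: the data-generating process, the coefficient values and the use of the regressor itself in the selection index are assumptions of the sketch, not taken from the paper. Step one fits a probit to the selection dummy and generates the inverse Mills ratio; step two adds the ratio to the subsample OLS, as in (13).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 20_000
# Correlated selection and outcome disturbances: the source of the presumed SSB
u, nu = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=n).T
x = rng.normal(size=n)
d = (x + u > 0).astype(float)   # d = 1: observation selected into the sample
y = 1.0 + 2.0 * x + nu          # postulated slope on x is 2.0

# Step 1: probit of the selection dummy d, estimated by maximum likelihood
def negloglik(g):
    p = np.clip(norm.cdf(g[0] + g[1] * x), 1e-10, 1 - 1e-10)
    return -np.sum(d * np.log(p) + (1 - d) * np.log(1 - p))

g0, g1 = minimize(negloglik, x0=[0.0, 0.5], method="BFGS").x
index = g0 + g1 * x
r = norm.pdf(index) / norm.cdf(index)   # inverse Mills ratio (footnote 13)

# Step 2: OLS on the selected subsample, without and with the ratio
sel = d == 1
ones = np.ones(int(sel.sum()))
b_naive = np.linalg.lstsq(np.column_stack([ones, x[sel]]), y[sel], rcond=None)[0]
b_heck = np.linalg.lstsq(np.column_stack([ones, x[sel], r[sel]]), y[sel], rcond=None)[0]
```

With the disturbances correlated by construction, the subsample OLS slope is visibly biased, while the ratio-augmented regression recovers the postulated coefficient on x but leaves the coefficient on any other regressor untouched, as described above.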
Conceptually, therefore, the OVB-based SSB differs from sampling bias per se, in that self-
selection behaviour could still be present in survey samples which are shown to be adequately
representative of the population concerned. A graphic illustration of this feature is given in Figure
6. The right panel there shows a noticeable similarity to the IV graphs in Figures 2 and 5. This
similarity is naturally the result of SSB being comprehended from the same lens of an
endogenised x, albeit indirectly via d, when (13) is appended by selection equation (12).
13 𝑟𝑖 = 𝜙(𝛾𝑥.𝑉 𝑉𝑖 )⁄Φ(𝛾𝑥.𝑉 𝑉𝑖 ), where 𝜙(⋅) and Φ(⋅) stand respectively for the density and cumulative density of the standard normal distribution.
The substantive link between endogeneity bias and OVB is thus established, dispensing with the simultaneity connotation.
Figure 6. Left panel: Model (11); right panel: Model (13) plus (12).
Note: Truncated variables are illustrated by ovals and binary variables by solid nodes. Regression using only subsample data omitting the truncation is illustrated by truncated ovals. Dotted identities represent variable transformation. Lines between the two regressors are arrow-free to reflect the missing specification of the causal relationship between them in (11).
The link is further turned into a more direct one once x is replaced by d, e.g. see (Heckman,
1978):
The above model forms the prototype of programme evaluation models beyond the LDV context.
Since self-selection behaviour is a major concern when randomisation comes under serious doubt,
the IV route to treat SSB indicated in (14) offers an expedient remedy. The remedy effectively
treats as latent the ideally randomised programme participation variable, 𝑑𝑖∗ . The endogenised d is thereby subjected to the same IV-based modification as x before.
A further mutation extends (14) to a general latent variable model, where the key
explanatory variable of interest is believed to suffer from measurement errors due to SSB.
Dummy variables representing social programmes are used as IVs to circumvent the bias, e.g.
see Angrist and Krueger (1991) for the case of using compulsory schooling laws as IVs to model the returns to education, a variable which is assumed to be suffering from OVB and SSB. Causal
diagrams of this mutation are illustrated in Figure 8, see also van Hüllen and Qin (2016). The resulting model is one where x is assumed to suffer from measurement errors, see (5). This mutation serves as the best
example of how assumed endogeneity bias has become an all-bias-inclusive and almost universal diagnosis.
The above dissection helps explain why concerns over assumed SSB-induced endogeneity
bias in x, the explanatory variable of key interest, have dissipated among seasoned researchers
when they work on multivariate LDV models using large survey data samples. Although a
significant inverse Mills ratio is easy to produce via purposefully exploiting collinear information in data, it is not easy to obtain a significant OVB effect on x by such a ratio in (13) when the size of residuals takes the dominant share of sample information, e.g. when the model fit is 20% or lower. It also helps us understand why the interpretation of ATE versus LATE models has become a major concern among modellers working on social programme evaluation. Non-unique but significant modifications of the dummy variable representing a social programme are
clearly not harmless to the credibility of parameter estimates associated with those modifications.
Once we have revealed the common nature of the all-bias-inclusive diagnosis of the correlation in question as the a priori rejection of the key explanatory variable as a valid conditional variable, it
becomes easier to see where the notion of endogeneity bias is flawed. It insists on significant
modifications of the key variable via non-causal and non-unique IVs and maintains innocuity of
such modifications in causal inference. Few applied modellers would be willing to have their key causal variables so modified without empirical verification. What has misled many of them into actually doing so is the logical
necessity of consistent estimation after the correlation is algebraically shown to be present in their
a priori postulated models. Hence, we need to re-examine this necessity and, in particular, two key ingredients of it: the error term and the consistent estimator.
Consider, first, the error term. Unlike variables, error terms are by-products of fitted
models. They vary with both model formulations and estimator choices. Since models are
expected to explain all the regularities of interest in data, error terms are deemed residuals, which
are specified by set statistical criteria. Different estimators are available to produce residuals in
compliance with those criteria, because of differences in both model formulations and data
features. The truth in endogeneity bias arguments lies essentially in the prerequisite for modellers
to choose estimators in accordance with their modelling needs. Unfortunately, the diagnosis of
endogeneity-bias induced correlation, which buttresses the argument, has led econometrics
astray. By fostering the conviction of universal existence of the correlation, it has steered
modellers completely away from examining the possibility of whether, and under what circumstances, a postulated cause is a valid conditional variable upon which credible inference is permitted by data. That possibility is already rejected a priori, as shown in the previous section.
Those consequent IV solutions are like type III errors, i.e. errors which produce ‘the right answer to the wrong question’.
The flaw in the correlation diagnosis lies not merely in the fallacy that error terms could behave as well-defined variables. It also lies in the implicit requirement of the error term being compositionally simple enough such that a significant correlation of it with a particular regressor is identifiable. This requirement is only tenable for bivariate models. Indeed,
textbook discussions on the premise of 𝑐𝑜𝑣(𝑥𝜀𝑦 ) ≠ 0, be it associated with SB, OVB or SSB,
are all built on bivariate models. As soon as these models are extended to multivariate ones, e.g. (8), the diagnosis loses tenability: the choice of one regressor as the causing variable of interest and the rest as control variables in a multivariate model is purely
based on substantive reasoning. To further assert a significant 𝑐𝑜𝑣(𝑥𝜈𝑦 ) ≠ 0 requires not only
that 𝑐𝑜𝑣(𝑧𝜈𝑦 ) = 0 for the entire set of control variables, but also that the set is exhaustive.
Neither condition is tenable in practice. In fact, the error term has long been conceived as a sundry composite of what modellers are unable and/or uninterested to explain.14 The correlation diagnosis thus loses its logical basis in multivariate settings.
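The OVB part of this argument can be checked directly by simulation (an assumed data-generating process, purely for illustration): once the omitted variable z is restored as a conditional variable, the alleged ‘endogeneity’ of x disappears without any recourse to IVs.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
z = rng.normal(size=n)                      # the omitted variable
x = 0.8 * z + rng.normal(size=n)            # x is correlated with z
y = 1.0 * x + 1.0 * z + rng.normal(size=n)  # postulated slope on x is 1.0

# Bivariate closure: z is pushed into the error term, so cov(x, error) != 0
b_biv = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0]

# Multivariate model: z restored as a conditional variable; no IV needed
b_mult = np.linalg.lstsq(np.column_stack([np.ones(n), x, z]), y, rcond=None)[0]
```

The bivariate slope absorbs the omitted z and is biased upwards, whereas the multivariate OLS slope recovers the postulated value, illustrating that the validity of x as a conditional variable is a matter of model design, not of estimator choice.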
Nevertheless, the flaw in the correlation diagnosis is deeply camouflaged by the perceived need for consistent estimation. This judgment underlies a vehement criticism by Pratt and Schlaifer three decades ago (1984;
14 Frisch classified statistical variations into three types – systematic variations, accidental variations and disturbances. The latter two formed the error term. In his view, ‘accidental variations are variations due to the fact that a great number of variables have been overlooked, consciously or unconsciously, each of the variables being however of minor importance’, whereas ‘disturbances are variations due to the fact that one single, or a certain limited number of highly significant variables have been overlooked’; ‘in economics we are actually often forced to throw so much into the bag of accidental variations that this kind of variations comes very near to take on the character of disturbances. In such cases it would perhaps be more rational to introduce a hierarchic order of types of variations, each type corresponding to the overlooking of variables of a certain order of importance.’ (Frisch’s 1930 Yale lecture notes, Bjerkholt and Qin, 2010, p.165). In the CC works, the error term was taken simply as ‘the joint effect of numerous separately insignificant variables that we … presume to be independent of observable exogenous variables’ (Marschak, 1953, p. 12). Subsequently, the error term was generally described as ‘the effect of all those factors which we cannot identify for one reason or another’ (Malinvaud, 1966, p. 74). However, the description has not been directly related to the error term of simple regression models. See also Qin (2013, Chapter 8) for a history of the error term in time-series econometrics.
1988).15 Essentially, their criticism builds on the primacy of tackling exhaustively the omitted variables issue: without doing so first, in their view, it is impossible to acquire adequate knowledge of what the error term represents and hence of what justifies the significant presence of 𝑐𝑜𝑣(𝑥𝜀𝑦 ) ≠ 0. This viewpoint is
summarised by Cox into an essential step of statistical inference – checking ‘the consistency of
the model with data’ (2006, Chapter 1 and Appendix B). Referring to consistent estimation as
internal consistency, Cox points out that ‘although internal consistency is desirable, to regard it [as essential would be to regard a consistent procedure that is always] wrong as preferable to some inconsistent procedure that is sometimes, or even quite often, right’
(ibid, p. 199).
Unfortunately, internal consistency has been promoted as such a key prerequisite by all the endogeneity-bias induced arguments that it has trapped the profession for decades. It gears the profession towards producing apparently ‘right answers’ and helps indoctrinate economists with two misconceptions. First, a significant difference between consistent estimator based estimates and their OLS counterparts is taken as confirmation of the bias. Since modellers are free to modify the explanatory variable of concern by non-unique and non-causal IVs, as shown in the previous section, such evidence is not at all difficult to produce, thanks to ubiquitous correlation guaranteed by using consistent estimators.16 Second, there appears to be no need for a posteriori cross validation once consistency is secured. Signs of instability in consistent estimator based estimates as samples extend have thus repeatedly escaped many modellers’ attention.
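The ease of producing such ‘evidence’ can be illustrated with a simulated sketch (the data-generating process and the arbitrary weak instruments are assumptions of the illustration): even when x is, by construction, a perfectly valid conditional variable, different instruments deliver a wide spread of IV estimates around the stable and accurate OLS estimate.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000
x = rng.normal(size=n)
y = 1.0 * x + rng.normal(size=n)   # x is, by construction, a valid conditional variable

b_ols = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0][1]

# Twenty arbitrary 'instruments', each only weakly correlated with x
b_iv = []
for _ in range(20):
    v = 0.1 * x + rng.normal(size=n)
    b_iv.append(np.cov(v, y)[0, 1] / np.cov(v, x)[0, 1])

spread = np.ptp(b_iv)              # range of the IV estimates around the OLS one
```

The wide range of IV estimates leaves ample room for choosing whichever estimate best fits prior expectations, while their deviations from the OLS estimate can be paraded as confirmation of the presumed bias.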
15 See also Swamy et al (2015) for a recent revisit and extension of their arguments.
16 The link between multicollinearity and ‘confluence’, a fundamental concept for interdependency of variables, is discussed at length in Qin (2014).
What simply cannot escape attention, however, are model forecast failures. As already stated in
section 2, forecast failures played a major role in the dynamic modelling reforms in macro-
econometrics, during which static SEMs were abandoned, along with those SB-based consistent
estimators. Even estimators which were devised to correct residual autocorrelation, such as the
Cochrane-Orcutt procedure, fell out of favour, because they were shown to be generally over-restrictive from the perspective of separately estimating short-run and long-run effects in
adequately specified dynamic models, e.g. (Hendry, 1995, Section 7.7). In contrast, estimators
which directly target these effects, especially the latter, such as the Engle-Granger two-step procedure and various ‘cointegration’ procedures, have gained great popularity, e.g. see Qin
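The two-step logic can be sketched as follows, on a simulated cointegrated pair (the coefficient values and the data-generating process are assumptions of the sketch): a static levels regression recovers the long-run coefficient, and its lagged residual then enters the differenced equation as an equilibrium-correction term.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 2_000
w = np.cumsum(rng.normal(size=T))            # common stochastic trend, I(1)
x = w + rng.normal(scale=0.5, size=T)
y = 2.0 * w + rng.normal(scale=0.5, size=T)  # y - 2x is stationary (cointegration)

# Step 1: static levels regression for the long-run coefficient
X = np.column_stack([np.ones(T), x])
b_long = np.linalg.lstsq(X, y, rcond=None)[0]
ec = y - X @ b_long                          # equilibrium (cointegration) error

# Step 2: short-run dynamics in differences with the lagged equilibrium error
Z = np.column_stack([np.ones(T - 1), np.diff(x), ec[:-1]])
b_short = np.linalg.lstsq(Z, np.diff(y), rcond=None)[0]
```

The design keeps the long-run and short-run effects in separate, interpretable parameters: the levels regression estimates the former, and the negative coefficient on the lagged equilibrium error captures the adjustment speed.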
The above examples help shed light on the continuing dispute over the credibility of IV-based estimates. Viewed through the OVB lens, residual autocorrelation in models using time-series data is essentially an OVB problem. Estimators which aim to correct the bias are essentially IV estimators. They achieve internal consistency by implicitly making a partial difference transformation on all the variables, e.g. both x and y in (2), see (Greene, 2003, Section
12.8). A special case of the transformation is found in growth models or, as they are better known in the micro context, difference-in-differences models. The transformation thus changes the causal interpretation of the parameters involved because it redefines the variables in the original causal premise. The substantive harm of such redefinitions became transparent once macro-econometric modellers focused their attention on differentiating the long-run effects from the short-run ones.
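The point about implicit quasi-differencing can be made concrete with a minimal sketch of the Cochrane-Orcutt iteration (simulated data; the AR coefficients and the slope are assumptions of the sketch): the procedure regresses y_t − ρ̂y_{t−1} on x_t − ρ̂x_{t−1}, i.e. it estimates the postulated parameter from redefined variables.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 2_000
x = np.empty(T)
eps = np.empty(T)
x[0] = rng.normal()
eps[0] = rng.normal()
for t in range(1, T):              # AR(1) regressor and AR(1) error
    x[t] = 0.8 * x[t - 1] + rng.normal()
    eps[t] = 0.7 * eps[t - 1] + rng.normal()
y = 2.0 * x + eps                  # postulated slope 2.0, zero intercept

beta, rho = 0.0, 0.0
for _ in range(10):                # Cochrane-Orcutt iteration
    ys = y[1:] - rho * y[:-1]      # quasi-differenced ('starred') variables
    xs = x[1:] - rho * x[:-1]
    beta = np.linalg.lstsq(np.column_stack([np.ones(T - 1), xs]), ys, rcond=None)[0][1]
    resid = y - beta * x           # level residuals (the simulated intercept is zero)
    rho = (resid[:-1] @ resid[1:]) / (resid[:-1] @ resid[:-1])
```

Here the regression is actually run on the quasi-differenced variables ys and xs, not on y and x themselves, which is precisely the covert redefinition of the causal premise discussed above.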
The above examples also corroborate Cox’s point, cited earlier, on the precedency of model-data consistency: consistent estimators make sense only after the models they serve have been verified as data consistent. Premature application of consistent estimators devised purely for treating residual autocorrelation is likely to covertly modify the
intended causal premises and lead to incredible causal inferences. The dynamic modelling
reforms in macro-econometrics can thus be seen as a major endeavour to correct that detour by redirecting attention towards model-data consistency. The latter is reflected, at least, in three key interrelated aspects. First, choose models with the smallest error terms possible and ensure that their residuals are innovation errors; second, choose models with relatively constant parameters associated with conditional variables, especially during periods with known regime shifts (the principle of super exogeneity); and third, reparameterise models with separate long-run and short-run effects to circumvent the confounding problem, which arises from direct inclusion of lagged correlated variables, so as to keep the parameter estimates interpretable.
It is noteworthy that these aspects are in broad concord with the direction of statistical
research, over the last few decades, on the causal Markov condition in connection with adequacy of
causal chain model designs, e.g. see Dawid (1979), Cox and Wermuth (1996; 2004), and Pearl
(2009).17 A central issue of that research is how to determine the adequacy of causal model
closure against potential OVB risk. The issue is addressed by sharpening the conditions for the validity of conditional model designs. Conditions such as ignorability, faithfulness and unconfoundedness are all aimed at empirical verification of the
Markov property via the error term and the invariance capacity of the model under different
sampling situations. Although the empirical background of that research mainly examines cross-
section data concerning medical and psychological trials, it is striking how much in common
those model design and evaluation conditions have with the criteria used in time-series based dynamic modelling described above.
17 Methodological implications of that research have also engaged the attention of philosophers, e.g. see Glymour (2010) and Russo (2014).
What is even more striking is how mainstream micro-econometrics has resisted evolving
in the same direction, considering its largely cross-section data based background. Here, four
reasons are identified for this resistance. First, partial causal inferences form the most commonly
shared goal in micro-econometric studies. The partiality is distinctly observable at two stages. At the postulation stage, models are typically confined to one or two causes. These partial causes are assumed a priori to be ‘structural’ on substantive
grounds. At the estimation stage, although experiments with different control variables and/or
different IVs are often carried out, none of these variables are interpreted as causally important
and their relations with the explanatory variables of interest are frequently left unspecified.
Second, the majority of micro-econometric studies use data of the secondary type, i.e. data which
are collected for general purposes rather than tailor-made for their specific purposes. As such, a
rather high noise-information ratio is widely expected. Residual based diagnostic tests are thus largely disregarded, while consistent estimators are regarded as the universal solution to bypass major modelling problems in the pursuit of extremely partial causal inferences. Once consistency appears secured, failures of diagnostic tests are thought to be harmless. Third, the prevalent correlations among economic
variables not only facilitate abundant production of IV-based estimates but also widen the value
ranges of these estimates as compared to the OLS, thus enhancing the chance for modellers to
choose estimates which best fit their prior expectations. Finally and probably most importantly,
there is little real-world demand for the parameters of interest to be estimated as precisely as
possible. Policy persuasion and evaluation is often the most practical purpose for empirical micro
studies. While the intricacy of an estimation procedure may enhance its persuasive power, neither
precision nor predictive capacity counts for much in the making of a good story.
The last two sections show that while the primary task of applied studies should be to
examine whether, and under what circumstances, postulated causes are valid conditional
variables, this task has been blocked by a self-reinforcing delusion of prevalent endogeneity bias.
The delusion acts as a taboo barring the translation of causal postulates directly into conditional models. What has led to this loss in translation? A statement by Cox (2006: 197) summarising his position is pertinent here.
Let us now take a historical review of how this translation process has been tackled. The CC formalisation translated a priori postulated structural models into joint probability distribution based statistical models. Meanwhile, direct conditional decomposition of the joint distribution was de facto rejected on account of simultaneity. It
was not until the redefinition of exogeneity that the possible validity of such a conditional
translation was formally considered. Knowledge of specific dynamic circumstances, such as co-
movement of long memory variables, upon which empirical validity of such conditional translations relies, has accrued from further dynamic econometric research. The resulting
evidence is now so overwhelmingly strong that few macro economists would take a static SEM seriously nowadays.
Unfortunately, de facto rejection of the direct conditional decomposition has spread widely through textbook teaching of SB discrediting the OLS. Adoption of the SEM approach in
micro-econometrics has resulted in OVB and SSB being conceptually entangled with SB to
strengthen the taboo against the OLS. The extent of this preconceived bias is discernible from
numerous cases of cognitive dissonance in which OLS estimates are dismissed when they are
actually within substantively feasible value ranges, much more precise and also less variant
across different samples than their IV counterparts, or even when the value range of the latter
falls outside prior expectations, and/or even when the explanatory power of the IV-based models
dwindles to a negligible level, e.g. with 𝑅2 ≪ 0.05, worse than that achieved by the OLS.
To a large extent, this bias reflects the extremely limited role that economists and
mainstream econometricians have been ready to allow for empirical evidence when it comes to
bridging data with theory, a stance which is markedly different from that of statisticians. The
consensus to grant formalised economic postulates a ‘structural’ status, prior to any empirical
verification, has been deeply rooted in economics. Of those major figures at the CC who
formalised econometrics into an academic discipline, few had first-hand experience with data and
none was the data-exploratory type. Their endeavour to delimit econometrics to the provision of
statistically optimal estimators for a priori postulated parameters of interest was highly restricted
and Utopian, in that they relied on theorists to supply completely and correctly formulated
models. It is not surprising that they failed to comprehend Wold’s vehement arguments on the
importance of causal/conditional model designs with respect to data features. The correspondence
between causal postulates and parametrically conditional models in statistics was virtually lost in
the SEM translation by the CC group. In fact, econometricians’ awareness of the link between
conditional expectation and regression models remains rather limited to this day.18 Although the
18 In the Econometric Theory (ET) interview of David Hendry, he recalled how the audience at the 1977 European Econometric Society conference was bewildered by J.-F. Richard’s presentation, which used conditional expectation based sequencing to formalise the concept of exogeneity (Ericsson and Hendry, 2004). Another telling example of related communication failure can be found in the discussion of Wermuth (1992) between A.S. Goldberger and statisticians.
link is implicit in the standard statistical assumption associated with the error term, direct translation of postulated causes of
interest into conditional variables is simply ruled out by the presumption of endogeneity bias.
Within textbook econometrics, theoretical parameters are assumed as known links between
causal variables, i.e. their ‘structural’ meaning is taken as autonomous to the choice of estimators.
The fact that there is no unique estimator for any single ‘structural’ parameter is thus out of the
realm of theorists’ concerns, and accordingly there lacks awareness that the choice of estimators
could possibly alter their causal postulates. This blind spot is further camouflaged by the process of identification, which maps a priori postulates onto closed empirical models. Since identification is taught as a necessary step for estimation, any distortion it entails for the initial causal postulates tends to pass unexamined. This explains why disputes over arbitrary identification conditions have
never ceased among applied economists, while such disputes have hardly touched the theoretical camp. As for the IV method, its distortional effect on the causal postulate of interest, if noticed at all, is typically blamed on
poor IV choices in practice. Although the IV method was shown to function as a ‘generated
regressor’ producer by Pagan (1984) decades ago, implications of his description on causal
modelling have for the most part been overlooked. The consistent estimator-centred trap
fabricated around a fictive endogeneity bias remains too powerful to escape for the majority of
the profession.
Why has endogeneity bias gained such traction? The preceding discussion has already
addressed this question. In short, a large number of econometric analyses have been and are still
used for enhancing the persuasiveness of highly partial causal postulates. Under these
circumstances, the bias is seen as a plausible and expedient representation of all the major
statistical problems possibly associated with the ceteris paribus condition assumed in those
19 The latest resurgence can be found in Romer (2016).
postulates. IV remedies for the bias are thus believed to provide a general and secure route to
maintain over-simplistic models. The route works as an efficient production line for apparently rigorous empirical results.
From a statistical viewpoint, the above situation can be ascribed to a priori model closure
such that cross validation, a vital step in statistical research, is deemed redundant. In other words,
highly partially formulated theories are translated directly into regression models irrespective of
how faithful the translation is in respect of specific data features and empirical tasks at hand.20
Major known problems of the models with data after closure are merged into a single symptom
– the correlation in question. Notice, in particular, that mathematical analysis of the correlation
requires the closure to be exercised at the bivariate model level, as demonstrated in Section 3.
The persuasiveness of the analysis rests on the direct association of the bivariate model with the causal postulate of interest.
But how faithful are bivariate models as translations of economic reality in general? Indeed, justification of the bias by OVB effectively concedes the inadequacy of bivariate models for the most general situation in economic analyses – multivariate analyses. Justification
of the bias by SSB, on the other hand, presupposes the prevalence of individual self-selection
behaviour in data samples to the extent that the conditional status of the data-observable causal
variable under concern is no longer valid. Without empirical verification of those suppositions,
justifications of endogeneity bias remain highly fictitious. Clearly, the logical grounds for the
bias are severely undermined once the bivariate model setting is rejected and replaced by a
multivariate one. When closure of multivariate models is further reckoned as not possible a priori, those grounds collapse altogether.
20 In fact, faithfulness has become a key condition in the statistical causal modelling literature in recent years, as mentioned in the previous sections; see also Spirtes (2009) and Zhang and Spirtes (2011) for philosophers’ discussion on this issue.
In view of the vast empirical literature, a priori model closure at the bivariate model level
is almost never tenable. This helps explain why, while a large number of estimation techniques of amazing mathematical complexity have been devised effectively to maintain the closure, their applications have seldom delivered precision and robustness of estimates, two basic properties of statistical inference.21 On the applied side, textbook preaching
on the need to treat endogeneity bias has misled many economists into producing, accepting and
justifying poor model fits and parameter estimates of low inference capacity, after toiling with
various estimators on data, especially large household survey samples. The knowledge gained
from their empirical research is simply not up to par with what is now achievable given the
advances of computing techniques and data availability. On the theoretical side, many estimators
devised to tackle ‘data complexity’ faced by over-simplistic models do not seek invariant
conditional expectation, the basic principle to bridge data with causal postulates. For instance, invariance of the conditional expectation is forsaken in quantile estimators and the bridge-building task is dropped totally in non-parametric
methods.
The above observation shows that faithful translation is virtually unattainable with the
practice of a priori model closure. Even if a postulate enjoys a high degree of confidence from
accrued substantive knowledge, sizeable uncertainty will arise when it comes to the issue of how
to represent the postulate in empirical models which are consistent with data samples at hand.
The uncertainty gives rise to model-data inconsistency. The sources of such inconsistency are
often too diverse and situation-specific to be merged into one syndrome, and the sizes too large to be ignored. The design of causally faithful models requires that model closure be exercised a posteriori and that systematic ways to let data
21 This situation actually fits what Freedman (1991) describes as an infertile way of using ‘technical fixes’ to rescue poorly designed models.
assist with the model design process in an interactive manner be combined with a priori causal postulates. This amounts to a paradigm shift in econometric modelling.
Currently, rapid advances in computing power, software development, data availability and
the speed of knowledge exchanges across disciplines are all catalytic to such a shift, e.g. see Einav
and Levin (2013), Morgan (2013), and Kalisch and Bühlmann (2014). Recent developments in
statistics and machine learning have not only lowered the technical barriers to mapping causal
postulates into statistical models, but also deepened our knowledge of the basic conditions
necessary for statistically adequate model closure. None of these conditions depend upon
choosing complicated estimators. When it comes to the estimation step, the statisticians’ view is
that parameters should ‘have clear subject-matter interpretations’ and ‘statistical theory for
estimation should be simple’ (Cox, 2006, p. 13). This view is actually a fairly good summary of the practice of modellers who have followed data history closely. Their choice, through accruing experience with data, suggests that conditional variables translated directly from substantive knowledge-based causal postulates are often far from adequate for data-consistent model closure.
A decisive step needed for the paradigm shift is to recognise the false promise of
endogeneity bias, thereby removing its traction. By probing into the roots of the bias, our
discussion has laid bare the specious qualities of the bias and its shaky cognitive foundation. The
success of applied models should stem from drawing together the relative advantages of both
substantive knowledge and data analysis. Few can dispute the following. On the one hand,
substantive knowledge is relatively good at identifying key causes, but not good at identifying
the appropriate functional forms of empirical models or other minor causes which are not
ignorable in estimating the effects of the key causes. On the other hand, data is the best possible
source for obtaining the missing knowledge necessary for the formulation of empirically adequate
22 This view has been independently expressed in the recent paper by Rust (2016).
models. Econometric practice that disregards data knowledge in model design, and that camouflages deficiencies in model design by estimators which effectively modify key causal variables in non-causal ways against what was originally intended in theory, can only be called biased econometrics.
References
Aldrich, J. (1993) Cowles exogeneity and core exogeneity, Discussion Papers in Economics and
Econometrics, University of Southampton, No 9308.
Angrist, J. (1990) Lifetime earnings and the Vietnam era draft lottery: Evidence from social
security administrative records. American Economic Review 80: 313-36.
Angrist, J., and Krueger, A. (1991) Does compulsory school attendance affect schooling and
earnings? The Quarterly Journal of Economics 106: 979-1014.
Angrist, J., Imbens, G., and Rubin, D. (1996). Identification of causal effects using instrumental
variables. Journal of the American Statistical Association 91: 444-55.
Angrist, J.D. and Pischke, J. (2009) Mostly harmless econometrics: An empiricist’s companion.
Princeton University Press.
Angrist, J.D. and Pischke, J. (2015) Mastering ’Metrics: The Path from Cause to Effect, Princeton
University Press.
Bjerkholt, O. and Qin, D. (eds.) (2010) A Dynamic Approach to Economic Theory: The Yale
Lectures of Ragnar Frisch in 1930, Routledge.
Blau, F.D. and L.M. Kahn (2007) Changes in the labor supply behaviour of married women:
1980-2000. Journal of Labor Economics 25: 393-438.
Cameron, A.C. (2009) Microeconometrics: Current methods and some recent developments, in
K. Patterson and T.C. Mills (eds.), Palgrave handbook of econometrics, vol. 2, Palgrave
MacMillan, pp. 729-74.
Christ, C.F. (1952) History of the Cowles Commission, 1932-1952, in Economic Theory and
Measurement: A Twenty Year Research Report 1932-1952. Cowles Commission for
Research in Economics, pp. 3-65.
Christ, C.F. (1960) Simultaneous equations estimation: Any verdict yet? Econometrica 28: 835-
45.
Colander, D. Föllmer, H., Haas, A., Goldberg, M.D., Juselius, K., Kirman, A., Lux, T. and Sloth,
B. (2009) The financial crisis and the systemic failure of academic economics. Univ. of
Copenhagen Dept. of Economics Discussion Paper 09-03.
Cox, D.R. (1992) Causality: Some statistical aspects. Journal of Royal Statistical Society Series
A. 155: 291–301.
Cox, D.R. (2006) Principles of Statistical Inference, Cambridge University Press.
Cox, D.R. and N. Wermuth (1996) Multivariate Dependencies: Models, Analysis and
Interpretation, Chapman & Hall.
Cox, D.R. and N. Wermuth (2004) Causality: A statistical view. International Statistical Review
72: 285-305.
Dawid, A.P. (1979) Conditional independence in statistical theory (with discussion). Journal of
Royal Statistical Society B. 41: 1-31.
Deaton, A. (2010) Instruments, randomization, and learning about development. Journal of
Economic Literature 48: 424-55.
Einav, L. and Levin, J. (2013) The data revolution and economic analysis. NBER Working Paper
19035.
Elwert, F. (2013) Graphical causal models, in S.L. Morgan (ed.) Handbook of Causal Analysis for
Social Research, Springer, Chapter 13, pp. 245-73.
Engle, R.F., Hendry, D.F., and Richard, J.-F. (1983). Exogeneity. Econometrica 51: 277–304.
Epstein, R. (1987) A History of Econometrics. Amsterdam: North-Holland.
Epstein, R. (1989) The fall of OLS in structural estimation. Oxford Economic Papers 41: 94-
107.
Ericsson, N.R. and D.F. Hendry (2004) The ET Interview: Professor David F. Hendry.
Econometric Theory 20: 745-806.
Freedman, D.A. (1991) Statistical models and shoe leather. Sociological Methodology 21: 291-
313.
Freedman, D.A. (2004) On specifying graphical models for causation, and the identification
problem. Evaluation Review 28: 267-93.
Girshick, M.A. and T. Haavelmo (1947) Statistical analysis of the demand for food: Examples of
simultaneous estimation of structural equations. Econometrica 15: 79-110.
Glymour, C. (2010) Explanation and truth, in Mayo and Spanos (eds.) (2010), pp. 331-50.
Haavelmo, T. (1943) The statistical implications of a system of simultaneous equations.
Econometrica 11: 1–12.
Haavelmo, T. (1947) Methods of measuring the marginal propensity to consume. Journal of the
American Statistical Association 42: 105-22.
Pratt, J.W. and R. Schlaifer (1988) On the interpretation and observation of laws. Journal of
Econometrics 39: 23-52.
Puhani, P.A. (2002) The Heckman correction for sample selection and its critique. Journal of
Economic Surveys 14: 53-68.
Qin, D. (1993) The Formation of Econometrics: A Historical Perspective, Oxford University Press.
Qin, D. (2013) A History of Econometrics: The Reformation from the 1970s, Oxford University
Press.
Qin, D. (2014) Inextricability of confluence and autonomy in econometrics. Oeconomia 4(3):
321-41.
Qin, D. (2015) Resurgence of the endogeneity-backed instrumental variable methods.
Economics: The Open-Access, Open-Assessment E-Journal 9(2015-7): 1-35.
Qin, D., van Hüllen, S., and Wang, Q.-C. (2016) How credible are shrinking wage elasticities
of married women labour supply? Econometrics 4(1).
Richard, J.-F. (1980) Models with several regimes and changes in exogeneity. Review of
Economic Studies 47: 1-20.
Romer, P. (2016) The trouble with macroeconomics, Working Paper, Stern School of Business,
New York University.
Russo, F. (2014) What invariance is and how to test for it, International Studies in the Philosophy
of Science 28: 157-83.
Rust, J. (2016) Mostly useless econometrics? Assessing the causal effect of econometric
theory. Foundations and Trends in Accounting 10(2-4): 125-203.
Sims, C.A. (1980) Macroeconomics and reality. Econometrica 48: 1-48.
Spirtes, P. (2005) Graphical models, causal inference, and econometric models. Journal of
Economic Methodology 12(1): 1-33.
Spirtes, P. (2009) Variable definition and causal inference. Proceedings of the 13th International
Congress of Logic, Methodology and Philosophy of Science, pp. 514-53.
Steyer, R. (1984) Causal linear stochastic dependencies: The formal theory, in E. Degreef, and J.
van Buggenhaut (eds.) Trends in Mathematical Psychology. North-Holland, pp. 317-46.
Steyer, R. (1988) Conditional expectations: An introduction to the concept and its applications in
empirical sciences. Methodika 2(1): 53-78.
Steyer, R., S. Gabler, A.A. von Davier and C. Nachtigall (2000) Causal regression models II:
Unconfoundedness and causal unbiasedness. Methods of Psychological Research Online
5(3).
Stock, J.H. and M.W. Watson (2003) Introduction to Econometrics, Addison-Wesley.
Swamy, P.A.V.B., G.S. Tavlas and S.G. Hall (2015) On the interpretation of instrumental
variables in the presence of specification errors. Econometrics 3: 55-64.
van Hüllen, S. and D. Qin (2016) Compulsory schooling and the returns to education: A re-
examination. SOAS Department of Economics Working Paper Series, No. 199.
Waugh, F.V. (1961) The place of least squares in econometrics. Econometrica 29: 386-96.
Wermuth, N. (1992) On block-recursive regression equations (with discussion). Brazilian
Journal of Probability and Statistics 6: 1-56.
Wermuth, N. and D.R. Cox (2011) Graphical Markov models: Overview, in J. Wright (ed.)
International Encyclopedia of Social and Behavioral Sciences (2nd ed.), Elsevier, 10: 341-
50.