Endogeneity in choice modeling
From DDWiki
Solutions to systems of simultaneous equations generally require variations of least-squares estimation, including Ordinary Least Squares, Instrumental Variables, Seemingly Unrelated Regression, Two-Stage Least Squares, and Three-Stage Least Squares.
Single Regression Equation
For a linear system, OLS and IV methods can be used.
Ordinary Least Squares
Main article: least squares method
In matrix form, the equation to be regressed can be expressed as:
$$y = X\beta + \varepsilon$$
The equation can be rearranged by placing the error component on the left-hand side and squaring both sides:
$$\varepsilon'\varepsilon = (y - X\beta)'(y - X\beta)$$
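To minimize the sum of squared residuals, the least-squares estimator $\hat{\beta}$, if it exists, must satisfy the first-order condition of the minimization. Thus,
$$\left.\frac{\partial\,\varepsilon'\varepsilon}{\partial \beta}\right|_{\beta = \hat{\beta}} = -2X'y + 2X'X\hat{\beta} = 0$$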
The least-squares estimator is:
$$\hat{\beta} = (X'X)^{-1}X'y$$
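As a minimal sketch of the estimator above, the following Python/NumPy snippet solves the normal equations $X'X\hat{\beta} = X'y$ on simulated data. The helper name `ols`, the coefficient values in `beta_true`, the sample size, and the noise scale are illustrative assumptions, not part of the original article.

```python
import numpy as np

def ols(X, y):
    """Least-squares estimate: solves the normal equations (X'X) b = X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Simulated data (assumed values, for illustration only)
rng = np.random.default_rng(0)
T = 500
X = np.column_stack([np.ones(T), rng.normal(size=T)])  # constant + one regressor
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=T)

beta_hat = ols(X, y)
residuals = y - X @ beta_hat

print("beta_hat:", beta_hat)
# First-order condition: the residuals are orthogonal to every column of X
print("X'e:", X.T @ residuals)
```

Solving the linear system with `np.linalg.solve` avoids forming $(X'X)^{-1}$ explicitly; the printed `X'e` should be zero up to floating-point error, which is the first-order condition restated.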
However, if endogenous variables are correlated with the disturbances (as in a system of simultaneous equations), OLS estimates of the structural parameters are biased and inconsistent. If the disturbances are merely correlated between multiple equations, equation-by-equation OLS remains unbiased but is inefficient.
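The effect of a regressor that is correlated with the disturbance can be illustrated with a small simulation. This is a sketch under assumed values: the regressor `x` and the disturbance `eps` share a common shock `u`, so that cov(x, eps) = 0.8 and var(x) = 2, and the OLS slope converges to beta + cov(x, eps)/var(x) rather than to beta. All variable names and parameter values here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 5000
beta_true = 2.0

z = rng.normal(size=T)   # exogenous driver of x
u = rng.normal(size=T)   # shock shared by x and the disturbance
v = rng.normal(size=T)   # independent noise

x = z + u                # regressor, var(x) = 2
eps = 0.8 * u + 0.6 * v  # disturbance, cov(x, eps) = 0.8
y = beta_true * x + eps

# OLS slope without an intercept (all variables have mean zero by construction)
beta_ols = (x @ y) / (x @ x)

# Asymptotic bias implied by the design: cov(x, eps) / var(x)
bias_limit = 0.8 / 2.0

print("true beta:", beta_true)
print("OLS estimate:", beta_ols)
print("beta + asymptotic bias:", beta_true + bias_limit)
```

Increasing `T` does not move the OLS estimate toward `beta_true`; it only tightens it around the biased limit, which is what inconsistency means here.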
Instrumental Variables
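An instrumental variable is one that is correlated with the independent variable but not with the error term. For the model $y = x\beta + \varepsilon$ with instrument $z$, the estimator is
$$\hat{\beta}_{IV} = (z'x)^{-1}z'y = \beta + (z'x)^{-1}z'\varepsilon$$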
When $z$ and $\varepsilon$ are uncorrelated, the final term approaches zero in the limit, providing a consistent estimator. Note that when $x$ is uncorrelated with the error term, $x$ is an instrument for itself. In this light, under certain assumptions, OLS can be viewed as a special case of the IV estimator.
The approach above generalizes in a straightforward way to a regression with multiple explanatory variables. Suppose $X$ is the $T \times K$ matrix of explanatory variables resulting from $T$ observations on $K$ variables. Let $Z$ be a $T \times K$ matrix of instruments. Then
$$\hat{\beta}_{IV} = (Z'X)^{-1}Z'y$$
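A minimal NumPy sketch of the matrix form of the IV estimator, under an assumed data-generating process in which the regressor `x` is correlated with the disturbance and `z` is a valid instrument. The function name `iv_estimator` and all simulated values are illustrative assumptions. Passing `X` as its own instrument reproduces OLS, which mirrors the remark that OLS is a special case of IV.

```python
import numpy as np

def iv_estimator(Z, X, y):
    """Just-identified IV estimate: solves (Z'X) b = Z'y."""
    return np.linalg.solve(Z.T @ X, Z.T @ y)

rng = np.random.default_rng(7)
T = 5000
beta_true = np.array([1.0, 2.0])

z = rng.normal(size=T)                    # instrument
u = rng.normal(size=T)                    # shock shared by x and eps
x = z + u                                 # endogenous regressor
eps = 0.8 * u + 0.6 * rng.normal(size=T)  # disturbance, correlated with x

X = np.column_stack([np.ones(T), x])      # T x K, K = 2
Z = np.column_stack([np.ones(T), z])      # T x K, the constant instruments itself
y = X @ beta_true + eps

beta_iv = iv_estimator(Z, X, y)
beta_ols = iv_estimator(X, X, y)          # X as its own instrument gives OLS

print("IV: ", beta_iv)
print("OLS:", beta_ols)
```

In this design the IV slope should land near the true value while the OLS slope stays shifted upward, because only `z` is uncorrelated with `eps`.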
System of Regression Equations
For a linear system, Seemingly Unrelated Regression (SUR), 2SLS, or 3SLS can be used.
Seemingly Unrelated Regression (SUR)
An economic model may contain multiple equations which are independent of each other on the surface: they are not estimating the same dependent variable, they have different independent variables, etc. However, if the equations are using the same data, the errors may be correlated across the equations. SUR is an extension of the linear regression model which allows correlated errors between equations.
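SUR uses generalized least squares to estimate β for the stacked system of equations:
$$\hat{\beta} = \left(X'\,V(Y)^{-1}X\right)^{-1}X'\,V(Y)^{-1}Y$$
where
$$V(Y) = \Sigma \otimes I_N$$
where $\otimes$ is the Kronecker product and $V(Y)$ is an $MN \times MN$ matrix, where M is the number of equations and each equation has N observations. $\Sigma$ is an $M \times M$ matrix representing the covariance of residuals between the equations.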
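The GLS step can be sketched in NumPy as a feasible two-step procedure: equation-by-equation OLS to obtain residuals, an estimate of Σ from those residuals, then GLS on the stacked system. The two-equation setup, the true coefficients `b1` and `b2`, and the error covariance `Sigma_true` are assumptions made for illustration. Estimating Σ from first-step OLS residuals is a common choice, not something the text above prescribes.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200                                    # observations per equation
Sigma_true = np.array([[1.0, 0.7],
                       [0.7, 1.0]])        # assumed cross-equation error covariance
E = rng.multivariate_normal(np.zeros(2), Sigma_true, size=N)

# Two equations with different regressors (M = 2)
X1 = np.column_stack([np.ones(N), rng.normal(size=N)])
X2 = np.column_stack([np.ones(N), rng.normal(size=N)])
b1 = np.array([1.0, 2.0])
b2 = np.array([-1.0, 0.5])
y1 = X1 @ b1 + E[:, 0]
y2 = X2 @ b2 + E[:, 1]

# Step 1: OLS per equation, then estimate Sigma from the residuals
resids = []
for Xi, yi in [(X1, y1), (X2, y2)]:
    bi = np.linalg.lstsq(Xi, yi, rcond=None)[0]
    resids.append(yi - Xi @ bi)
R = np.column_stack(resids)                # N x M residual matrix
Sigma_hat = R.T @ R / N

# Step 2: GLS on the stacked system with V = Sigma (x) I_N
X = np.block([[X1, np.zeros_like(X2)],
              [np.zeros_like(X1), X2]])    # block-diagonal, MN x 4
y = np.concatenate([y1, y2])
V_inv = np.kron(np.linalg.inv(Sigma_hat), np.eye(N))  # (Sigma (x) I)^-1
beta_sur = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)

print("Sigma_hat:\n", Sigma_hat)
print("beta_sur:", beta_sur)               # ordered as [b1, b2]
```

The code uses the identity $(\Sigma \otimes I_N)^{-1} = \Sigma^{-1} \otimes I_N$, so only the small $M \times M$ matrix is inverted. For large N, forming the full $MN \times MN$ matrix is wasteful and the block structure would normally be exploited instead.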
Two-Stage Least Squares
Main article: two-stage least squares
Two-stage least squares (2SLS) is an estimation method that combines instrumental variables with ordinary least squares for structural modeling when endogenous variables are present.
Suppose a model:
$$y = X\beta + \varepsilon$$
where
- $y$ is a $T \times 1$ vector of observations on the dependent variable
- $\varepsilon$ is a $T \times 1$ vector of error components
- $X$ is a $T \times k$ matrix of independent variables, which may be correlated with the error components
- $Z$ is a $T \times r$ matrix of instruments ($r \ge k$), assumed to be uncorrelated with the error components
Stage 1: The endogenous variables X are regressed on all valid instruments Z, including the full set of exogenous variables. Since the instruments Z are exogenous, the fitted values of the endogenous covariates will not be correlated with the error term. Thus,
$$\hat{X} = Z(Z'Z)^{-1}Z'X$$
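Stage 2: The dependent variable y is regressed on the fitted values $\hat{X}$ from Stage 1:
$$\hat{\beta}_{2SLS} = (\hat{X}'\hat{X})^{-1}\hat{X}'y$$
A small correction needs to be made to the sum of squared residuals so that the standard errors are computed correctly: the residuals should be formed with the original X rather than with $\hat{X}$.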
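The two stages and the residual correction can be sketched in NumPy as follows. The simulated design (one endogenous regressor, two instruments `z1` and `z2`, true coefficients in `beta_true`) is an assumption made for illustration. The standard-error formula uses a $T - k$ degrees-of-freedom divisor, which is one common convention rather than the only one.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 5000
beta_true = np.array([1.0, 2.0])

z1 = rng.normal(size=T)
z2 = rng.normal(size=T)
u = rng.normal(size=T)                     # shock shared by x and eps
x = z1 + 0.5 * z2 + u                      # endogenous regressor
eps = 0.8 * u + 0.6 * rng.normal(size=T)   # disturbance, var(eps) = 1

X = np.column_stack([np.ones(T), x])       # T x k, k = 2
Z = np.column_stack([np.ones(T), z1, z2])  # T x r, r = 3 >= k
y = X @ beta_true + eps

# Stage 1: project X onto the column space of Z
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)

# Stage 2: regress y on the fitted values
beta_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

# Correction: residuals use the ORIGINAL X, not X_hat
k = X.shape[1]
resid = y - X @ beta_2sls
sigma2 = resid @ resid / (T - k)
cov_beta = sigma2 * np.linalg.inv(X_hat.T @ X_hat)
std_err = np.sqrt(np.diag(cov_beta))

# What a naive second-stage OLS printout would use instead
resid_naive = y - X_hat @ beta_2sls
sigma2_naive = resid_naive @ resid_naive / (T - k)

print("beta_2sls:", beta_2sls)
print("std_err:", std_err)
print("sigma2 (corrected) vs naive:", sigma2, sigma2_naive)
```

In this design the naive variance is inflated because the second-stage residuals also absorb the part of `x` that the instruments do not explain. Running the second stage through a plain OLS routine and reading off its standard errors would therefore be misleading.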
Three-Stage Least Squares
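Three-stage least squares (3SLS) is used when endogenous variables are correlated with error terms and the error terms are correlated between equations. It is 2SLS analysis followed by SUR: the 2SLS residuals are used to estimate the covariance of the errors between equations, and that estimate then enters a generalized least squares estimation of the whole system.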
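A compact NumPy sketch of the three stages, assuming a simplified two-equation system in which each equation has one endogenous regressor and both equations share the same instrument matrix `Z`. The design, the coefficient values `b1` and `b2`, and the helper name `project` are hypothetical. The third stage applies the GLS formula to the stacked first-stage fitted values, with Σ estimated from the 2SLS residuals.

```python
import numpy as np

rng = np.random.default_rng(11)
T = 500

Zx = rng.normal(size=(T, 3))               # three exogenous instruments
Z = np.column_stack([np.ones(T), Zx])      # shared instrument matrix

U = rng.normal(size=(T, 2))                # shocks that make x1, x2 endogenous
W = rng.multivariate_normal(np.zeros(2), [[1.0, 0.6], [0.6, 1.0]], size=T)
E = 0.8 * U + W                            # errors: endogenous and cross-correlated

x1 = Zx[:, 0] + Zx[:, 1] + U[:, 0]
x2 = Zx[:, 1] + Zx[:, 2] + U[:, 1]
X1 = np.column_stack([np.ones(T), x1])
X2 = np.column_stack([np.ones(T), x2])
b1 = np.array([1.0, 2.0])
b2 = np.array([-1.0, 0.5])
y1 = X1 @ b1 + E[:, 0]
y2 = X2 @ b2 + E[:, 1]

def project(Z, X):
    """Fitted values from regressing the columns of X on Z."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)

# Stages 1 and 2: 2SLS equation by equation
X1_hat, X2_hat = project(Z, X1), project(Z, X2)
b1_2sls = np.linalg.solve(X1_hat.T @ X1_hat, X1_hat.T @ y1)
b2_2sls = np.linalg.solve(X2_hat.T @ X2_hat, X2_hat.T @ y2)

# Residuals use the original regressors; they give the estimate of Sigma
R = np.column_stack([y1 - X1 @ b1_2sls, y2 - X2 @ b2_2sls])
Sigma_hat = R.T @ R / T

# Stage 3: GLS (as in SUR) on the stacked system of fitted regressors
X_hat = np.block([[X1_hat, np.zeros((T, 2))],
                  [np.zeros((T, 2)), X2_hat]])
y = np.concatenate([y1, y2])
V_inv = np.kron(np.linalg.inv(Sigma_hat), np.eye(T))
beta_3sls = np.linalg.solve(X_hat.T @ V_inv @ X_hat, X_hat.T @ V_inv @ y)

print("2SLS:", b1_2sls, b2_2sls)
print("3SLS:", beta_3sls)                  # ordered as [b1, b2]
```

The sample size is kept small because the sketch builds the full $2T \times 2T$ weighting matrix with `np.kron`; a practical implementation would exploit the block structure instead.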
References
- Wikipedia Article: Instrumental variables
- Wikipedia Article: Linear least squares
- Wikipedia Article: Seemingly unrelated regression
- Greene, W.H., 2003, Econometric analysis, Prentice Hall, Upper Saddle River, N.J.

