No measurement error
The independent (X) and dependent (Y) variables are accurately measured. IV: any measurement error will bias the estimates. DV: the estimates may remain unbiased if the error is random.
The consequences of random error in an independent variable:
- R2 may be lower.
- Partial slope coefficients can vary dramatically depending on the amount of random error in the independent variables.
- The partial slope coefficients of independent variables that do not have random measurement error will be biased if they are correlated with another independent variable that does have measurement error.
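The attenuation effect described above can be sketched with a small simulation (made-up data; the true slope of 2.0 and the error scale are assumptions of the example). Adding random measurement error to X pulls the estimated slope toward zero:

```python
import numpy as np

# Illustrative simulation: random measurement error in an independent
# variable attenuates its estimated slope toward zero.
rng = np.random.default_rng(0)
n = 10_000
x_true = rng.normal(size=n)
y = 2.0 * x_true + rng.normal(size=n)              # true slope is 2.0
x_noisy = x_true + rng.normal(scale=1.0, size=n)   # X measured with error

def ols_slope(x, y):
    """Slope from a bivariate OLS fit of y on x."""
    x_c, y_c = x - x.mean(), y - y.mean()
    return (x_c @ y_c) / (x_c @ x_c)

slope_clean = ols_slope(x_true, y)   # close to the true slope 2.0
slope_noisy = ols_slope(x_noisy, y)  # shrunk by the reliability ratio
```

With equal signal and noise variance, the reliability ratio is 0.5, so the noisy slope lands near 1.0 rather than 2.0.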
No specification error
The theoretical model is linear, additive, and includes the correct variables. Linear implies that the average change in the dependent variable associated with a one-unit change in an independent variable is constant regardless of the level of the independent variable. If the partial slope for X is not constant for differing values of X, X has a nonlinear relationship with Y, which results in biased partial slopes.
Correction when the slope changes magnitude but does not change direction: a log-log model takes a nonlinear specification where the slope changes as the value of X increases and makes it linear in terms of interpreting the parameter estimates. It accomplishes this by taking the log of the dependent and all independent variables and replacing the original variables with the logged variables. The resulting coefficients are interpreted as the % change in Y given a 1% change in X.
Example: log Y = a + b1 log X1 + b2 log X2 + b3 log X3 + log e
Note: the data must have positive values. The log of zero or a negative value is undefined, so such cases must be dropped or adjusted, which biases the model. To estimate Y on the original scale, take the antilog of the predicted log Y.
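A minimal numpy sketch of the log-log fit (hypothetical data; the true elasticity of 0.5 is an assumption of the simulation). The fitted slope is the elasticity, and the antilog of the fitted log Y gives predictions on the original scale:

```python
import numpy as np

# Hypothetical data: fit log Y = a + b * log X by OLS; b is the elasticity,
# i.e. the % change in Y for a 1% change in X. Values must be positive.
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 100.0, size=5000)
y = 3.0 * x ** 0.5 * np.exp(rng.normal(scale=0.1, size=x.size))  # elasticity 0.5

X = np.column_stack([np.ones_like(x), np.log(x)])
a_hat, b_hat = np.linalg.lstsq(X, np.log(y), rcond=None)[0]

# Antilog of the predicted log Y recovers predictions on the original scale:
y_hat = np.exp(a_hat + b_hat * np.log(x))
```

(The simple antilog prediction ignores the retransformation bias from the error term; it is shown here only to illustrate the mechanics.)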
Correction when the slope changes direction (positive to negative, or vice versa): a polynomial model may be used. This is accomplished by adding additional variables that are incremental powers of the independent variable to model the bends in the slope.
Example: Y = a + b1X1 + b2X1^2 + e
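A quadratic sketch of the polynomial correction (made-up U-shaped data; the coefficients in the data-generating step are assumptions). With Y = a + b1X + b2X^2, the slope at any X is b1 + 2*b2*X, so it can change sign:

```python
import numpy as np

# Hypothetical U-shaped data: adding X**2 as an extra regressor lets the
# fitted slope change sign as X increases.
rng = np.random.default_rng(2)
x = rng.uniform(-3.0, 3.0, size=2000)
y = 1.0 - 2.0 * x + 0.5 * x ** 2 + rng.normal(scale=0.2, size=x.size)

X = np.column_stack([np.ones_like(x), x, x ** 2])
a, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

def slope_at(v):
    """Fitted slope of Y with respect to X at the value v."""
    return b1 + 2 * b2 * v
```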
Additive implies that the average change in the dependent variable associated with a one-unit change in an independent variable (X1) is constant regardless of the value of another independent variable (X2) in the model.
If this assumption is violated, we can no longer interpret the slope by saying "holding other variables constant," since the values of the other variables may change the slope coefficient and therefore its interpretation.
A non-additive relationship can arise when (X1) is interval/ratio and (X2) is a dummy variable: if the partial slope for (X1) is not constant for differing values of (X2), then (X1) and (X2) do not have an additive relationship with Y.
Correction: an interaction term may be added using a dummy variable where the slope of X1 is thought to depend on the value of a dummy variable X2. The model will look like the following:
Y = a + b1X1 + b2X1X2 + e
X1X2 = the interaction between X1 and X2
b1 is interpreted as the slope for X1 when the dummy variable (X2) is 0.
b1 + b2 is the slope for X1 when the dummy variable (X2) is 1; b2 is the change in the slope.
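The dummy-interaction model above can be sketched as follows (made-up data; the slopes 2.0 for X2 = 0 and 3.5 for X2 = 1 are assumptions of the simulation):

```python
import numpy as np

# Hypothetical model Y = a + b1*X1 + b2*X1*X2 + e with dummy X2:
# b1 is the slope of X1 when X2 = 0; b1 + b2 is the slope when X2 = 1.
rng = np.random.default_rng(3)
n = 4000
x1 = rng.normal(size=n)
x2 = rng.integers(0, 2, size=n)          # dummy variable (0 or 1)
y = 1.0 + 2.0 * x1 + 1.5 * x1 * x2 + rng.normal(scale=0.3, size=n)

X = np.column_stack([np.ones(n), x1, x1 * x2])
a, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

slope_when_0 = b1        # slope of X1 in the X2 = 0 group
slope_when_1 = b1 + b2   # slope of X1 in the X2 = 1 group
```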
Correction: a multiplicative model for two interval-level independent variables that are thought to interact in how they affect Y.
Without interaction term: Y = a + b1X1 + b2X2 + e
With interaction term: Y = a + b1X1 + b2X2 + b3X1X2 + e, where X1X2 is the interactive term.
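A sketch of the two-interval-variable case (hypothetical data; the coefficient values are assumptions). Here the slope of X1 is b1 + b3*X2, so it shifts continuously with the level of X2:

```python
import numpy as np

# Hypothetical interaction between two interval-level variables:
# Y = a + b1*X1 + b2*X2 + b3*X1*X2 + e, so the slope of X1 depends on X2.
rng = np.random.default_rng(4)
n = 5000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 0.5 + 1.0 * x1 + 2.0 * x2 + 0.8 * x1 * x2 + rng.normal(scale=0.3, size=n)

X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
a, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

def slope_x1(at_x2):
    """Fitted slope of X1 evaluated at a given value of X2."""
    return b1 + b3 * at_x2
```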
Including the correct independent variables implies that no irrelevant variable has been included in the model and that all theoretically important relevant variables are included. Failing to include a relevant variable will bias the slope coefficients and may increase the likelihood of improperly finding statistical significance. Including irrelevant variables will make it more difficult to find statistical significance.
Correction: remove irrelevant variables and, if possible, include missing relevant variables.
Mean of errors equals zero
If the mean error (reflected in the residuals) is not equal to zero, the y-intercept may be biased. Violation of this assumption will not affect the slope coefficients; the partial slope coefficients remain Best Linear Unbiased Estimators (BLUE).
Error term is normally distributed
The distribution of the error term closely reflects the distribution of the dependent variable: if the dependent variable is not normally distributed, the error term may not be normally distributed. Violation of this assumption will not bias the partial slope coefficients but may affect significance tests.
Correction: correct other problems first and then re-evaluate the residuals. If the distribution of residuals is skewed to the right (toward higher values), try using the natural log of the dependent variable. If the distribution of residuals is skewed to the left (toward lower values), try squaring the dependent variable.
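The right-skew fix can be sketched as follows (made-up data; the exponential data-generating step is an assumption chosen to produce right-skewed residuals). Refitting on log(Y) pulls the residual skewness back toward zero:

```python
import numpy as np

# Sketch: if residuals are right-skewed, refit the model on log(Y) and
# compare residual skewness before and after.
rng = np.random.default_rng(5)
x = rng.uniform(0.0, 1.0, size=5000)
y = np.exp(1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size))  # right-skewed Y

def skewness(r):
    """Sample skewness of a residual vector."""
    r = r - r.mean()
    return (r ** 3).mean() / (r ** 2).mean() ** 1.5

def residuals(x, y):
    """Residuals from a bivariate OLS fit of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return y - X @ beta

skew_raw = skewness(residuals(x, y))          # clearly positive
skew_log = skewness(residuals(x, np.log(y)))  # near zero
```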
Homoskedasticity
The variance of the error term is constant for all values of the independent variables. Heteroskedasticity occurs when the variance of the error term is not constant. The parameter estimates for the partial slopes and the intercept are not biased if this assumption is violated; however, the standard errors are biased, and hence significance tests may not be valid.
Detection: plot the regression residuals against the values of the independent variable(s). If there is an even pattern about a horizontal axis, heteroskedasticity is unlikely (in small samples there may be some tapering at each end of the horizontal distribution). If there is a cone- or bow-tie-shaped pattern, heteroskedasticity is suspected.
Correction: if an excluded independent variable is suspected, including this variable in the model may correct the problem. Otherwise, it may be necessary to use generalized least squares (GLS) or weighted least squares (WLS) models to create coefficients that are BLUE.
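Beyond eyeballing the residual plot, a numeric check in the spirit of the Breusch-Pagan test can be sketched as follows (made-up data with error variance that grows with X; the test itself is a standard technique, not something from these notes). Regress the squared residuals on X; a large n*R2, compared to a chi-square critical value, suggests heteroskedasticity:

```python
import numpy as np

# Breusch-Pagan-style check: regress squared OLS residuals on X and
# compare n * R2 to a chi-square critical value (3.84 at 5%, 1 df).
rng = np.random.default_rng(6)
n = 5000
x = rng.uniform(1.0, 10.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=x, size=n)  # error variance grows with x

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e2 = (y - X @ beta) ** 2

# Auxiliary regression of squared residuals on X:
g = np.linalg.lstsq(X, e2, rcond=None)[0]
fitted = X @ g
r2 = 1 - ((e2 - fitted) ** 2).sum() / ((e2 - e2.mean()) ** 2).sum()
lm_stat = n * r2  # large values indicate heteroskedasticity
```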
No autocorrelation
The error terms are not correlated across observations. Violation of this assumption is likely to be a problem with time-series data, where the value of one observation is not completely independent of another observation. (Example: a simple two-year time series of the same individuals is likely to find that a person's income in year 2 is correlated with their income in the prior year.) If there is autocorrelation, the parameter estimates for the partial slopes and the intercept are not biased, but the standard errors are biased, and hence significance tests may not be valid.
Detection: check for autocorrelation with any time series/longitudinal data using the Durbin-Watson (d) statistic:
d = 2: no correlation between error terms
d = 0: perfect positive correlation between error terms
d = 4: perfect negative correlation between error terms
Correction: use generalized least squares (GLS) or weighted least squares (WLS) models to create coefficients that are BLUE.
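The Durbin-Watson statistic can be computed directly from a residual series (sketch with simulated errors; the AR coefficient 0.9 is an assumption chosen to show strong positive autocorrelation):

```python
import numpy as np

def durbin_watson(e):
    """Durbin-Watson d: ~2 means no autocorrelation, near 0 strong
    positive autocorrelation, near 4 strong negative autocorrelation."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(7)
white = rng.normal(size=5000)      # independent errors
ar1 = np.empty(5000)               # strongly autocorrelated errors
ar1[0] = white[0]
for t in range(1, 5000):
    ar1[t] = 0.9 * ar1[t - 1] + white[t]

d_white = durbin_watson(white)  # close to 2
d_ar1 = durbin_watson(ar1)      # well below 2
```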
No multicollinearity
The assumption of no multicollinearity is an issue only for multiple regression models. Multicollinearity occurs when one of the independent variables has a substantial linear relationship with another independent variable in the equation. It occurs to some extent in any model and is more a matter of the degree of collinearity than of whether it exists or not. Multicollinearity will result in variability in the partial slope coefficients from one sample to the next, or when models are changed slightly. Standard errors are increased, which reduces the likelihood of finding statistical significance.
The result of the above two situations is an unbiased estimator that is very inefficient: the model may fail to find any variables statistically significant even though the F-statistic shows the model as a whole is significant.
Detection: look for large changes in the coefficients as independent variables are added to or deleted from the model.
Also examine covariation among the independent variables by calculating all possible bivariate combinations of the Pearson correlation coefficient. Generally a high correlation coefficient (say .80 or greater) suggests a problem. This check is imperfect, since multicollinearity may not be reflected in any bivariate correlation; a more thorough check is to regress each independent variable on the other independent variables. If any of the R2s are near 1.0, there is a high degree of multicollinearity.
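The auxiliary-regression check can be sketched as follows (made-up data in which x2 is constructed to be nearly collinear with x1; the VIF rule of thumb in the comment is a common convention, not something stated in these notes):

```python
import numpy as np

# Regress each independent variable on the others; an R2 near 1.0
# (equivalently, VIF = 1/(1 - R2) above roughly 10) flags collinearity.
rng = np.random.default_rng(8)
n = 2000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)                  # unrelated to the others
Xs = np.column_stack([x1, x2, x3])

def aux_r2(Xs, j):
    """R2 from regressing column j on the remaining columns (plus a constant)."""
    others = np.delete(Xs, j, axis=1)
    A = np.column_stack([np.ones(len(Xs)), others])
    beta = np.linalg.lstsq(A, Xs[:, j], rcond=None)[0]
    resid = Xs[:, j] - A @ beta
    return 1 - (resid ** 2).sum() / ((Xs[:, j] - Xs[:, j].mean()) ** 2).sum()

r2_x1 = aux_r2(Xs, 0)  # near 1.0: x1 is almost a linear function of x2
r2_x3 = aux_r2(Xs, 2)  # near 0.0: x3 is not collinear with the others
```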
Corrections:
- Increase the sample size to lower the standard errors. This doesn't always work and is normally not feasible, since adding more cases is not a simple exercise in most studies.
- Combine two or more variables that are highly correlated into a single indicator of a concept.
- Drop one of the variables that are highly correlated. May result in a poorly specified model.
- Keep all variables in the model and rely on the joint-hypothesis F-test to evaluate the significance of the model. Especially useful if you suspect multicollinearity is causing most if not all of the independent variables to be statistically insignificant.