Evaluating the power of the regression model
If we only had information on Y (Income), our best guess of an individual's income would be the mean income. However, if we have a paired X variable (Education) that is related to Y, we can use this additional variable to improve our ability to predict an individual's income.
The independent variable's ability to model variation in Y can be evaluated by comparing the amount of deviation explained by our model using X to the total amount of deviation in Y. This ratio is known as the Coefficient of Determination, or R^{2}, which represents the proportion of variation in Y explained by X. It can range from 0 (X explains none of the variation in Y) to 1 (X explains all of it).
Components of Deviation (R^{2}) (y=income; x=education)
The components of deviation for one observation are as follows:
(Y_{i} - \bar{Y}) = the deviation of the Y observation from the mean (Total Dev.)
(\hat{Y}_{i} - \bar{Y}) = deviation explained by X (Explained Dev.)
(Y_{i} - \hat{Y}_{i}) = deviation not explained by X (Unexplained Dev.)
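Example Using Tracy Data
\bar{Y} = mean income, $30.8k
Y_{i} = Tracy's income, $44k
X_{i} = Tracy's education, 18 years
\hat{Y}_{i} = Tracy's predicted income, $40.9k
(Y_{i} - \bar{Y}) = (\hat{Y}_{i} - \bar{Y}) + (Y_{i} - \hat{Y}_{i}), that is, 13.2 = 10.1 + 3.1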
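The Tracy figures can be checked with a few lines of arithmetic. The following is a minimal sketch using only the values quoted in the example; the variable names are illustrative and not part of the original notes.

```python
# Values from the Tracy example, in thousands of dollars.
y_bar = 30.8   # mean income of the sample
y_i = 44.0     # Tracy's observed income
y_hat = 40.9   # Tracy's predicted income given 18 years of education

total_dev = y_i - y_bar          # Total Dev.: observation vs. mean
explained_dev = y_hat - y_bar    # Explained Dev.: prediction vs. mean
unexplained_dev = y_i - y_hat    # Unexplained Dev.: observation vs. prediction

print(round(total_dev, 1), round(explained_dev, 1), round(unexplained_dev, 1))

# The explained and unexplained parts add back up to the total deviation.
assert abs(total_dev - (explained_dev + unexplained_dev)) < 1e-9
```

Most of Tracy's distance from the mean income is accounted for by her education; the remainder is the part the model leaves unexplained.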
The formulas for estimating deviations for all observations are as follows:
TSS (Total Sum of Squares)
\sum (Y_{i} - \bar{Y})^{2} = the total deviation of Y
RSS (Regression Explained Sum of Squares)
\sum (\hat{Y}_{i} - \bar{Y})^{2} = deviation explained by X
ESS (Error Sum of Squares)
\sum (Y_{i} - \hat{Y}_{i})^{2} = deviation not explained by X
R^{2} = RSS / TSS
or
R^{2} = 1 - (ESS / TSS)
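The three sums and both forms of R^{2} translate directly into code. The sketch below is one possible way to write them; the function names `sums_of_squares` and `r_squared` and the four-point toy data are hypothetical, chosen only to show the two extremes of the 0 to 1 range.

```python
def sums_of_squares(y, y_hat):
    """Return (TSS, RSS, ESS) for observed values y and predictions y_hat."""
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)                # total deviation of Y
    rss = sum((fi - y_bar) ** 2 for fi in y_hat)            # explained by X
    ess = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # not explained by X
    return tss, rss, ess


def r_squared(y, y_hat):
    """Return R^2 computed both ways: RSS/TSS and 1 - ESS/TSS."""
    tss, rss, ess = sums_of_squares(y, y_hat)
    return rss / tss, 1 - ess / tss


# Hypothetical toy data, not from the notes.
y = [1, 2, 3, 4]
print(r_squared(y, [1, 2, 3, 4]))          # predictions match every observation
print(r_squared(y, [2.5, 2.5, 2.5, 2.5]))  # predictions are just the mean of y
```

When the predictions reproduce every observation, both forms give 1; when the model can do no better than the mean, both give 0.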
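Example:
| Name | Y (income, $k) | Y - \bar{Y} | (Y - \bar{Y})^{2} (TSS) | \hat{Y} | (Y - \hat{Y})^{2} (ESS) | \hat{Y} - \bar{Y} | (\hat{Y} - \bar{Y})^{2} (RSS) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Susan | 25 | 5.8 | 33.6 | 24.1 | 0.8 | 6.7 | 44.9 |
| Bill | 27 | 3.8 | 14.4 | 29.7 | 7.3 | 1.1 | 1.2 |
| Bob | 32 | 1.2 | 1.4 | 35.3 | 10.9 | 4.5 | 20.3 |
| Tracy | 44 | 13.2 | 174.2 | 40.9 | 9.6 | 10.1 | 102.0 |
| Joan | 26 | 4.8 | 23.0 | 24.1 | 3.6 | 6.7 | 44.9 |
| Mean = | 30.8 | | | | | | |
| \sum | | | 246.6 | | 32.2 | | 213.3 |
Note: numbers are rounded to one decimal, and the unsquared deviations are shown as magnitudes (signs dropped).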
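R^{2} = RSS / TSS = 213.3 / 246.6 ≈ 0.865
R^{2} = 1 - (ESS / TSS) = 1 - (32.2 / 246.6) ≈ 0.869
(The two forms differ slightly only because the table values are rounded.)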
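The table's arithmetic can be reproduced from the observed and predicted incomes alone. This is a rough sketch, assuming the predicted values exactly as printed (rounded to one decimal), so the results match the table only approximately.

```python
# Observed and predicted incomes from the example table, in thousands of dollars.
y = [25, 27, 32, 44, 26]                  # Susan, Bill, Bob, Tracy, Joan
y_hat = [24.1, 29.7, 35.3, 40.9, 24.1]    # predictions as printed (rounded)

y_bar = sum(y) / len(y)
tss = sum((yi - y_bar) ** 2 for yi in y)
rss = sum((fi - y_bar) ** 2 for fi in y_hat)
ess = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

r2_explained = rss / tss        # R^2 as the explained share
r2_residual = 1 - ess / tss     # R^2 as one minus the unexplained share

print(round(tss, 1), round(rss, 1), round(ess, 1))
print(round(r2_explained, 3), round(r2_residual, 3))
print(round(tss - (rss + ess), 2))   # gap introduced by the rounded predictions
```

With unrounded predictions, RSS and ESS would add up to TSS exactly and the two forms of R^{2} would agree. Here they differ in the third decimal, and either way education accounts for roughly 86 to 87 percent of the variation in income in this small sample.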
Impact of R^{2} on predictions:
A relatively high R^{2} (.90 or better) is required to make accurate predictions. It is very unlikely in social science that we will obtain an R^{2} this high, thus we focus more on explaining relationships.
R^{2} is sample specific:
Two samples with the same variables, slope, and intercept could have different R^{2} values because the fit between the data and the regression line differs (different variation in Y; see formula).
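A small numerical illustration of this point follows. It is a sketch on hypothetical data, not on the income example: both samples are built around the same line (intercept 0, slope 2), but the second scatters more widely around it. The helper names `fit_line` and `r_squared` are made up for the illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(x, y):
    """R^2 = 1 - ESS/TSS for the least-squares line through (x, y)."""
    intercept, slope = fit_line(x, y)
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    ess = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ess / tss


# Hypothetical data: the same x values, the same underlying line y = 2x,
# and a scatter pattern chosen so the fitted line is identical in both samples.
x = [1, 2, 3, 4]
scatter = [1, -1, -1, 1]
sample_a = [2 * xi + 0.5 * e for xi, e in zip(x, scatter)]   # tight around the line
sample_b = [2 * xi + 2.0 * e for xi, e in zip(x, scatter)]   # loose around the line

print(fit_line(x, sample_a), round(r_squared(x, sample_a), 3))
print(fit_line(x, sample_b), round(r_squared(x, sample_b), 3))
```

Both samples produce the same fitted intercept and slope, yet the tighter sample has the higher R^{2}, because its unexplained deviation is a smaller share of its total variation in Y.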
Software Output Example
