Applied Statistics Handbook


 


 

Evaluating the power of the regression model

 

If we only had information on Y (Income), our best guess of an individual's income would be the mean income.  However, if we have a paired X variable (Education) that is related to Y, we can use this additional variable to improve our ability to predict an individual's income.

 

The independent variable's ability to model variation in Y can be evaluated by comparing the amount of deviation explained by the model using X to the total amount of deviation in Y.  This ratio is known as the Coefficient of Determination, or R², which represents the proportion of variation in Y explained by X.  It can range from 0 to 1.

 

 

Components of Deviation (R²) (Y = income; X = education)

 

The components of deviation for one observation are as follows:

 

$(Y_i - \bar{Y})$ = the deviation of the Y observation from the mean   (Total Dev.)

$(\hat{Y}_i - \bar{Y})$ = deviation explained by X   (Explained Dev.)

$(Y_i - \hat{Y}_i)$ = deviation not explained by X   (Unexplained Dev.)

 

 

Example Using Tracy Data

 

$\bar{Y}$ = mean income, $30.8k

$Y_i$ = Tracy's income, $44k

$X_i$ = Tracy's education, 18 years

$\hat{Y}_i$ = Tracy's predicted income, $40.9k
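As a worked illustration of the three components, Tracy's figures can be substituted directly. Here $\bar{Y}$ denotes the mean income, $Y_i$ Tracy's observed income, and $\hat{Y}_i$ her predicted income, all in thousands of dollars; the arithmetic uses only the values listed above.

$$
\begin{aligned}
\text{Total Dev.:} \quad & Y_i - \bar{Y} = 44 - 30.8 = 13.2 \\
\text{Explained Dev.:} \quad & \hat{Y}_i - \bar{Y} = 40.9 - 30.8 = 10.1 \\
\text{Unexplained Dev.:} \quad & Y_i - \hat{Y}_i = 44 - 40.9 = 3.1
\end{aligned}
$$

The explained and unexplained pieces add back up to the total: 10.1 + 3.1 = 13.2.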

 

 

 

 

 


 

 

The formulas for summing these deviations across all observations are as follows:

 

 

TSS (Total Sum of Squares)

$\sum (Y_i - \bar{Y})^2$ = the total deviation of Y

RSS (Regression Explained Sum of Squares)

$\sum (\hat{Y}_i - \bar{Y})^2$ = deviation explained by X

ESS (Error Sum of Squares)

$\sum (Y_i - \hat{Y}_i)^2$ = deviation not explained by X

 

 

 

$R^2 = \dfrac{RSS}{TSS}$       or       $R^2 = 1 - \dfrac{ESS}{TSS}$
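The proportion explained can be computed from either the explained or the unexplained sum of squares. A short sketch of why, assuming a least-squares line fitted with an intercept (the usual case, though not stated explicitly in this section), with $\bar{Y}$ the mean of Y and $\hat{Y}_i$ the predicted value:

$$
\underbrace{\sum (Y_i - \bar{Y})^2}_{TSS}
= \underbrace{\sum (\hat{Y}_i - \bar{Y})^2}_{RSS}
+ \underbrace{\sum (Y_i - \hat{Y}_i)^2}_{ESS}
\quad\Longrightarrow\quad
R^2 = \frac{RSS}{TSS} = \frac{TSS - ESS}{TSS} = 1 - \frac{ESS}{TSS}
$$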

 

 

 

 

Example:

 

 

 

 

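| Name | $Y$ | $Y - \bar{Y}$ | TSS: $(Y - \bar{Y})^2$ | $\hat{Y}$ | ESS: $(Y - \hat{Y})^2$ | $\hat{Y} - \bar{Y}$ | RSS: $(\hat{Y} - \bar{Y})^2$ |
|---|---|---|---|---|---|---|---|
| Susan | 25 | -5.8 | 33.6 | 24.1 | 0.8 | -6.7 | 44.9 |
| Bill | 27 | -3.8 | 14.4 | 29.7 | 7.3 | -1.1 | 1.2 |
| Bob | 32 | 1.2 | 1.4 | 35.3 | 10.9 | 4.5 | 20.3 |
| Tracy | 44 | 13.2 | 174.2 | 40.9 | 9.6 | 10.1 | 102.0 |
| Joan | 26 | -4.8 | 23.0 | 24.1 | 3.6 | -6.7 | 44.9 |
| Mean = | 30.8 | | | | | | |
| Sum | | | 246.6 | | 32.2 | | 213.3 |

Note: numbers are rounded to one decimal place.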
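The totals in this example can be checked with a few lines of code. The following is a minimal sketch in Python, not part of the original handbook; the variable names are illustrative, and the predicted incomes are the one-decimal values from the example rather than the output of a fitted model.

```python
# Observed incomes (Y) and predicted incomes (Y-hat), in $1,000s,
# copied from the example. The predictions are already rounded to
# one decimal, so the results differ slightly from an exact fit.
names = ["Susan", "Bill", "Bob", "Tracy", "Joan"]
y = [25, 27, 32, 44, 26]
y_hat = [24.1, 29.7, 35.3, 40.9, 24.1]

mean_y = sum(y) / len(y)

# Total, regression (explained), and error (unexplained) sums of squares.
tss = sum((yi - mean_y) ** 2 for yi in y)
rss = sum((yh - mean_y) ** 2 for yh in y_hat)
ess = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

# Two ways of writing the proportion of variation explained.
r2_explained = rss / tss
r2_residual = 1 - ess / tss

print(f"mean = {mean_y:.1f}")
print(f"TSS = {tss:.2f}, RSS = {rss:.2f}, ESS = {ess:.2f}")
print(f"RSS/TSS = {r2_explained:.3f}, 1 - ESS/TSS = {r2_residual:.3f}")
```

Because the code squares the unrounded deviations, TSS comes to about 246.8 rather than the 246.6 obtained by adding the rounded squares. The two versions of R² also differ a little (roughly 0.86 versus 0.87) because the predicted values are rounded; with unrounded predictions from a least-squares fit they would agree.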

 

 

     

 

 

 

Impact of R² on predictions:

 

A relatively high R² (.90 or better) is required to make accurate predictions.  An R² this high is very unlikely in social science, so the focus is more on explaining relationships than on predicting individual values.

 

 

R² is sample specific:

 

Two samples with the same variables, slope, and intercept can have different R² values because the data fit the regression line more or less closely (different variation in Y; see the formula for R²).
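A small constructed illustration of this point, sketched in Python. The data are hypothetical and chosen only so that both samples produce exactly the same least-squares line (intercept 2, slope 3); the helper names `fit_line` and `r_squared` are illustrative, not from any library.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y):
    """Proportion of variation in y explained by the fitted line."""
    intercept, slope = fit_line(x, y)
    mean_y = sum(y) / len(y)
    y_hat = [intercept + slope * xi for xi in x]
    tss = sum((yi - mean_y) ** 2 for yi in y)
    ess = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return 1 - ess / tss


x = [1, 2, 3, 4]
# This scatter pattern sums to zero and is uncorrelated with x, so adding
# any multiple of it to the line 2 + 3x leaves the fitted line unchanged.
pattern = [1, -1, -1, 1]

y_tight = [2 + 3 * xi + 0.5 * e for xi, e in zip(x, pattern)]  # small scatter
y_loose = [2 + 3 * xi + 4.0 * e for xi, e in zip(x, pattern)]  # large scatter

a_tight, b_tight = fit_line(x, y_tight)
a_loose, b_loose = fit_line(x, y_loose)
r2_tight = r_squared(x, y_tight)
r2_loose = r_squared(x, y_loose)

print(f"tight: intercept={a_tight:.2f}, slope={b_tight:.2f}, R2={r2_tight:.3f}")
print(f"loose: intercept={a_loose:.2f}, slope={b_loose:.2f}, R2={r2_loose:.3f}")
```

Both samples return the same intercept and slope, but the sample with more scatter around the line has a much lower R², since its unexplained variation is a larger share of the total variation in Y.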

 




 


Copyright 2015, AcaStat Software. All Rights Reserved.