The chain of reasoning and systematic steps used in hypothesis testing that are outlined in this section are the backbone of every statistical test, regardless of whether one writes out each step in a classroom setting or uses statistical software to conduct statistical tests on variables stored in a database.
Chain of reasoning for inferential statistics

1. Sample(s) must be randomly selected.
2. The sample estimate is compared to the underlying sampling distribution for samples of the same size.
3. The probability that the sample estimate reflects the population parameter is determined.
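The three steps above can be sketched in code. This is an illustrative simulation only, with a made-up population (mean 100, SD 15) standing in for the unknown population:

```python
import random
import statistics

# Hypothetical population (mean 100, SD 15) used purely for illustration.
random.seed(1)
population = [random.gauss(100, 15) for _ in range(100_000)]

# Step 1: randomly select a sample.
n = 30
sample_mean = statistics.mean(random.sample(population, n))

# Step 2: compare the estimate to the sampling distribution of means
# for samples of the same size.
sampling_dist = [statistics.mean(random.sample(population, n)) for _ in range(2_000)]

# Step 3: estimate the probability of seeing a sample mean at least this far
# from the population mean by chance alone.
p_extreme = sum(abs(m - 100) >= abs(sample_mean - 100)
                for m in sampling_dist) / len(sampling_dist)
```

In practice the sampling distribution is known from theory (e.g., the normal or t distribution) rather than simulated, but the logic is the same.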
The four possible outcomes in hypothesis testing

                              Actual Population Comparison
  DECISION                    Null Hyp. True                Null Hyp. False
                              (there is no difference)      (there is a difference)
  Rejected Null Hyp.          Type I error (alpha)          Correct decision
  Did not Reject Null         Correct decision              Type II error

(Alpha = probability of making a Type I error)
Regardless of whether statistical tests are conducted by hand or through statistical software, there is an implicit understanding that systematic steps are being followed to determine statistical significance. These general steps are described on the following page and include 1) assumptions, 2) stated hypothesis, 3) rejection criteria, 4) computation of statistics, and 5) decision regarding the null hypothesis.
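The five steps can be traced through a small example. This is a sketch of a one-sample z test with hypothetical numbers, not a prescribed procedure:

```python
import math

def one_sample_z_test(sample_mean, mu0, sigma, n, critical=1.96):
    """Walk through the five general steps for a one-sample z test."""
    # 1) Assumptions: random sample, sigma known, normal sampling distribution
    # 2) Stated hypotheses: H0: mu = mu0 (no difference)  vs.  H1: mu != mu0
    # 3) Rejection criteria: two-tailed critical value (1.96 for alpha = .05)
    # 4) Computation of the test statistic
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # 5) Decision regarding the null hypothesis
    reject_null = abs(z) >= critical
    return z, reject_null

# Hypothetical data: sample mean 105 vs. claimed mean 100, sigma 15, n = 36
z, reject = one_sample_z_test(105, 100, 15, 36)
```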
The underlying logic is based on rejecting a statement of no difference or no association, called the null hypothesis. The null hypothesis is rejected only when we have evidence beyond a reasonable doubt that a true difference or association exists in the population(s) from which we drew our random sample(s).

Reasonable doubt is based on probability sampling distributions and can vary at the researcher's discretion. An alpha of .05 is a common benchmark for reasonable doubt. At alpha .05 we know from the sampling distribution that a test statistic this extreme will occur by random chance only five times out of 100 (5% probability). Since a test statistic that results in an alpha of .05 could occur by random chance only 5% of the time, we assume that the test statistic resulted because there are true differences between the population parameters, not because we drew an extremely biased random sample.
When learning statistics we generally conduct statistical tests by hand. In these situations, we establish before the test is conducted what test statistic is needed (called the critical value) to claim statistical significance. So, if we know for a given sampling distribution that a test statistic of plus or minus 1.96 would occur randomly only 5% of the time, any test statistic that is 1.96 or greater in absolute value would be statistically significant. In an analysis where the test statistic was exactly 1.96, you would have a 5% chance of being wrong if you claimed statistical significance. If the test statistic was 3.00, statistical significance could also be claimed, but the probability of being wrong would be much smaller (about .003 if using a two-tailed test, or roughly three-tenths of one percent; 0.3%). Both .05 and .003 are known as alpha: the probability of a Type I error.
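These probabilities can be computed directly from the standard normal distribution. A minimal sketch using only the Python standard library:

```python
import math

def two_tailed_p(z):
    """Exact two-tailed p-value for a z statistic under the standard normal."""
    # Standard normal CDF via the error function (standard library only)
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - cdf)

print(round(two_tailed_p(1.96), 4))  # about .05
print(round(two_tailed_p(3.00), 4))  # about .0027
```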
When conducting statistical tests with computer software, the exact probability of a Type I error is calculated. It is presented in several formats but is most commonly reported as "p <" or "Sig." or "Signif." or "Significance." Using "p <" as an example, if a priori you established a threshold for statistical significance at alpha .05, any test statistic with significance at or less than .05 would be considered statistically significant and you would be required to reject the null hypothesis of no difference. The following table links p values with a constant alpha benchmark of .05:
  p <     Alpha    Probability of Making a Type I Error        Decision
  .05     .05      5% chance difference is not significant     Statistically significant
  .10     .05      10% chance difference is not significant    Not statistically significant
  .01     .05      1% chance difference is not significant     Statistically significant
  .96     .05      96% chance difference is not significant    Not statistically significant

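The decision rule in the table reduces to a single comparison against the a priori alpha benchmark, as this brief sketch shows:

```python
def decision(p, alpha=0.05):
    """Compare an observed p-value to the a priori alpha benchmark."""
    return "statistically significant" if p <= alpha else "not statistically significant"

# The four rows of the table above
for p in (.05, .10, .01, .96):
    print(p, "->", decision(p))
```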