Applied Statistics Handbook




The Normal Distribution

Although many sampling distributions are used in hypothesis testing, the normal distribution is the most common.  It describes how data would appear in a frequency histogram in which the x axis represents the values of scores and the y axis represents the frequency of each value.  Most scores will be similar and therefore will group near the center of the distribution.  Some scores will have unusual values and will be located far from the center or apex of the distribution.  These unusual scores fall in the tails of the distribution, shown as shaded areas in the accompanying figure.  In hypothesis testing, we must decide whether the unusual values differ only because of random sampling error or lie in the extreme tails of the distribution because they are truly different from the others.  Sampling distributions tell us the probability of obtaining such values through sampling error alone in a random sample drawn from a normally distributed population.



Properties of a normal distribution


         Forms a symmetric bell-shaped curve

         50% of the scores lie above and 50% below the midpoint of the distribution

         Curve is asymptotic to the x axis

         Mean, median, and mode are located at the midpoint of the x axis
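
The properties listed above can be checked numerically. The following is a minimal sketch using Python's standard library; the mean of 100 and standard deviation of 15 are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical distribution of scores: mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

# 50% of the area under the curve lies below the midpoint.
below_midpoint = dist.cdf(dist.mean)

# Symmetry: the curve has the same height at equal distances
# on either side of the midpoint.
left_height = dist.pdf(dist.mean - 20)
right_height = dist.pdf(dist.mean + 20)

# Asymptotic to the x axis: far from the center the curve gets
# very close to zero but stays above it.
far_height = dist.pdf(dist.mean + 5 * dist.stdev)

print("Proportion below midpoint:", below_midpoint)
print("Heights 20 points below and above midpoint:", left_height, right_height)
print("Height 5 standard deviations out:", far_height)
print("Mean, median, mode:", dist.mean, dist.median, dist.mode)
```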


Using theoretical sampling probability distributions

Sampling distributions allow us to approximate the probability that a particular value would occur by chance alone.  If you collected means from an infinite number of repeated random samples of the same size from the same population, you would find that most means are very similar in value; in other words, they group around the true population mean, which is the midpoint of the sampling distribution.  The frequency of means decreases as one moves away from the center of a normal sampling distribution.  In a normal probability distribution, about 95% of the means from these repeated random samples fall within 1.96 standard errors above or below the midpoint of the distribution, which represents the true population mean.  Only 5% fall beyond that range (2.5% in each tail of the distribution).
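
The repeated-sampling idea can be approximated with a small simulation. This is only a sketch: the population mean (50), standard deviation (10), sample size (25), and the 10,000 repetitions standing in for "an infinite number" are all assumed values, not figures from the handbook.

```python
import random
from statistics import fmean

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population parameters and sampling plan.
POP_MEAN, POP_SD = 50.0, 10.0
SAMPLE_SIZE = 25
NUM_SAMPLES = 10_000  # finite stand-in for "an infinite number" of samples

# Standard error of the mean: population SD divided by sqrt(n).
standard_error = POP_SD / SAMPLE_SIZE ** 0.5

# Draw repeated random samples and record each sample mean.
sample_means = [
    fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

# Count how many sample means land within 1.96 standard errors
# of the true population mean.
lower = POP_MEAN - 1.96 * standard_error
upper = POP_MEAN + 1.96 * standard_error
share_within = sum(lower <= m <= upper for m in sample_means) / NUM_SAMPLES

print(f"Share of sample means within 1.96 standard errors: {share_within:.3f}")
```

The printed share should come out close to 0.95, with small run-to-run variation if the seed is changed.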

The following are commonly used points on a distribution for deciding statistical significance.

90% of scores             +/- 1.65 standard errors

95% of scores             +/- 1.96 standard errors

99% of scores             +/- 2.58 standard errors
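
These cutoffs can be recovered from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `two_tailed_critical_value` is made up for this example. Note that the 90% value is closer to 1.645, which the table above rounds to 1.65.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def two_tailed_critical_value(central_area):
    """Return z such that `central_area` of the curve lies within +/- z."""
    tail_area = (1 - central_area) / 2  # area left over in each tail
    return standard_normal.inv_cdf(1 - tail_area)


for central_area in (0.90, 0.95, 0.99):
    z = two_tailed_critical_value(central_area)
    print(f"{central_area:.0%} of scores: +/- {z:.3f} standard errors")
```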


Standard error:  Mathematical adjustment to the standard deviation that accounts for the effect of sample size on the underlying sampling distribution.  It represents the standard deviation of the sampling distribution.
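
For the common case of a sample mean, the standard error is usually estimated as the sample standard deviation divided by the square root of the sample size. The sketch below assumes that case; the `scores` data are invented for illustration.

```python
from math import sqrt
from statistics import stdev


def standard_error(sample):
    """Estimate the standard error of the mean: s / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))


# Hypothetical sample of 8 scores (sample standard deviation = 2).
scores = [4, 8, 6, 5, 3, 7, 9, 6]

# Same values repeated to mimic a sample four times as large
# with roughly the same spread.
larger_sample = scores * 4

print(f"n = {len(scores)}: standard error = {standard_error(scores):.3f}")
print(f"n = {len(larger_sample)}: standard error = {standard_error(larger_sample):.3f}")
```

The larger sample yields a smaller standard error even though the spread of the scores is about the same, which is the effect of sample size that the definition describes.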


Alpha and the role of the distribution tails

The percentage of scores beyond a particular point along the x axis of a sampling distribution represents the percentage of the time, over an infinite number of repeated samples, that one would expect a score at or beyond that value.  When used in hypothesis testing, this value on the x axis is known as the critical value.  The midpoint represents the actual population value.  Most scores will fall near the actual population value but will exhibit some variation due to sampling error.  If the mean of a random sample falls 1.96 standard errors or farther above or below the mean of the sampling distribution, we know from the probability distribution that there is only a 5% or less chance of randomly selecting a set of scores that would produce a sample mean that far from the true population mean.  The area beyond 1.96 standard errors above and below the mean is the region of rejection.

When conducting significance testing, if we have a test statistic that is at least 1.96 standard errors above or below the mean of the sampling distribution, we conclude that there is a statistically significant difference between our sample mean and the expected mean for the population.  Since a value that far from the population mean will occur randomly only 5% or less of the time, we assume the difference reflects a true difference between the sample and the population mean and is not the result of random sampling error.  This 5% is alpha: the probability of concluding statistical significance when the difference is actually due to random sampling error.
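
The decision rule can be sketched as a simple z-test of a sample mean against a known population mean. All numbers here (population mean 100, population standard deviation 15, sample of 64 with mean 104.5) are hypothetical, and the function name `z_statistic` is chosen only for this example.

```python
from math import sqrt

CRITICAL_VALUE = 1.96  # 2-tailed cutoff for alpha = .05


def z_statistic(sample_mean, population_mean, population_sd, n):
    """Distance of the sample mean from the population mean, in standard errors."""
    standard_error = population_sd / sqrt(n)
    return (sample_mean - population_mean) / standard_error


# Hypothetical study: 64 scores with a mean of 104.5, compared with
# a population mean of 100 and population standard deviation of 15.
z = z_statistic(sample_mean=104.5, population_mean=100, population_sd=15, n=64)

# The sample mean is in the region of rejection if it lies at least
# 1.96 standard errors above or below the population mean.
is_significant = abs(z) >= CRITICAL_VALUE

print(f"z = {z:.2f}, statistically significant at alpha = .05: {is_significant}")
```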



1-tailed vs. 2-tailed statistical tests

A 2-tailed test is used when you cannot determine a priori whether a difference between population parameters will be positive or negative.  A 1-tailed test is used when you can reasonably expect the direction of the difference (positive or negative) in advance.  If you retain the same critical value for a 1-tailed test that would be used if a 2-tailed test were employed, the alpha is halved (i.e., .05 alpha would become .025 alpha).
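
The halving of alpha can be verified from the tail areas of the standard normal distribution. This is a minimal sketch using `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1
critical_value = 1.96

# Area in one tail beyond the critical value (1-tailed alpha).
one_tail_alpha = 1 - standard_normal.cdf(critical_value)

# A 2-tailed test counts both tails, so its alpha is twice as large.
two_tail_alpha = 2 * one_tail_alpha

# Critical value a 1-tailed test would need to keep alpha at .05.
one_tail_critical_at_05 = standard_normal.inv_cdf(0.95)

print(f"1-tailed alpha at {critical_value}: {one_tail_alpha:.3f}")
print(f"2-tailed alpha at {critical_value}: {two_tail_alpha:.3f}")
print(f"1-tailed critical value for alpha = .05: {one_tail_critical_at_05:.3f}")
```

To keep alpha at .05 in a 1-tailed test, the critical value drops to about 1.645 rather than staying at 1.96.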



Copyright 2015, AcaStat Software. All Rights Reserved.