Sampling Distributions

First and foremost, students must understand that statistics such as the sample mean and sample proportion are random variables.

Because they are random variables, they have distributions associated with them.  The goal in this chapter is to build a model that describes the distribution of the sample mean or sample proportion.  Remind students that the word “describe” means we want to identify the shape, center, and spread of the sampling distribution of the sample mean and sample proportion.

Section 8.1 Distribution of the Sample Mean

This section presents a discussion of the distribution of the sample mean.  I build the probability models under two scenarios:  (1) sampling from a normal population and (2) sampling from a non-normal population.

  • Sampling from a Normal Population  There are two approaches you could take to introducing the sampling distribution of the sample mean when sampling from normal populations.
    • One option is to use simulation by obtaining at least 1000 random samples of size n = 9 from a normal population.  Personally, I recommend sampling from a normal population with mean 100 and standard deviation 15 (IQ data).  This can easily be done in StatCrunch by selecting Data > Simulate > Normal.  Generate 1000 rows and 9 columns.  For each row, compute the mean.  Then draw a histogram of the 1000 sample means, and find the mean and standard deviation of the 1000 sample means.  Students will recognize that the shape of the distribution of the 1000 sample means is approximately normal, the mean is close to 100, and the standard deviation of the sample means is less than 15 (close to 15/√9 = 5).  Repeat this for samples of size n = 16.  There is an activity titled “Simulating IQ Scores” that leads students through this process in the Student Activity Workbook.
    • The second option is to use the sampling distribution applet in StatCrunch.  Try different sample sizes and generate samples one at a time initially so students can see how samples are obtained from the population and the sample mean is computed.  Eventually, generate at least 1000 random samples and have students identify the shape, center, and spread of the distribution of the sample means.
  • Sampling from a Non-Normal Population  Again, there are two approaches you may take.
    • The first approach would be to sample from a non-normal distribution (such as the exponential distribution).  Personally, I recommend against this because most students do not know what the exponential distribution looks like.  If you choose to go this route, follow the process outlined above for sampling from normal populations.  However, you should show the students the shape of the distribution of the parent population.
    • The second approach, which I use in my classes, is to use the sampling distribution applet from StatCrunch.  I use the uniform distribution as the parent population first.  Start by obtaining samples of size n = 2.  Ask the students to note the shape of the distribution of the sample means, along with the mean and standard deviation of the sample means.  Increase the sample size to n = 5 or 10.  Note how the shape of the distribution of the sample mean is now approximately normal.  A cool feature of the sampling distribution applet is that you can draw your own population.  Use the mouse to draw a skewed right distribution.  Then, obtain at least 1000 samples of size n = 5.  The distribution of the sample means is likely not approximately normal.  However, notice the value of the mean of the sample means and the standard deviation of the sample means.  Then, increase the sample size until the distribution of the sample means is approximately normal.  There is an activity titled “Sampling from Normal and Non-Normal Populations” in the Student Activity Workbook that uses this approach.  It is important that students understand that the shape of the parent population determines how large the sample needs to be before the Central Limit Theorem “kicks in.”  For example, a uniform distribution might only require a sample size of n = 4 before the distribution of the sample mean is approximately normal.  However, a highly skewed distribution requires a larger sample size.  So, the rule of thumb that the sample size must be at least 30 to invoke the Central Limit Theorem is conservative for many populations.  Lastly, be sure students understand that the Central Limit Theorem only has to do with the shape of the distribution, not its center or spread: the mean of the sample means equals the population mean, and the standard deviation of the sample means equals σ/√n, regardless of the sample size.
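
If you would rather script the simulation than use StatCrunch, the steps above can be sketched in Python along the following lines.  This is only an illustration, not part of the StatCrunch workflow or the Student Activity Workbook.  The helper names (simulate_sample_means, skewness, iq_score, skewed_right) are made up for this sketch, and the exponential parent with mean 1 is an assumed stand-in for a skewed right population.

```python
import random
import statistics

def simulate_sample_means(draw, n, reps=1000, seed=1):
    """Draw `reps` random samples of size n and return the list of sample means.

    `draw` takes a random.Random and returns one observation.
    (Hypothetical helper for illustration, not a StatCrunch feature.)
    """
    rng = random.Random(seed)
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]

def skewness(values):
    """Crude skewness measure: close to 0 when the distribution is symmetric."""
    center = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return statistics.fmean([((v - center) / spread) ** 3 for v in values])

def iq_score(rng):
    # Normal parent population: mean 100, standard deviation 15.
    return rng.gauss(100, 15)

def skewed_right(rng):
    # Assumed skewed right parent: exponential with mean 1 and standard deviation 1.
    return rng.expovariate(1.0)

for label, draw, sigma in [("normal", iq_score, 15), ("skewed right", skewed_right, 1)]:
    for n in (2, 9, 16, 50):
        means = simulate_sample_means(draw, n)
        print(
            f"{label:>12}  n={n:<3}"
            f" mean of sample means={statistics.fmean(means):7.2f}"
            f" sd of sample means={statistics.stdev(means):5.2f}"
            f" sigma/sqrt(n)={sigma / n ** 0.5:5.2f}"
            f" skewness={skewness(means):5.2f}"
        )
```

For both parent populations, the mean of the sample means should stay near the population mean and the standard deviation should track σ/√n at every sample size.  Only the skewness of the sample means from the skewed right parent should change noticeably, shrinking toward 0 as n grows, which is the point to draw out about the Central Limit Theorem.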

Throughout the section, be sure that students understand that the normal model we are using to describe the distribution of the sample mean is a model that describes what would happen if we obtained infinitely many random samples of size n and computed the sample mean for each sample. Also, be sure to emphasize the interpretation of the probabilities as a means of foreshadowing the interpretation of P-values.
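
To make the interpretation concrete, here is a small worked example sketched in Python.  It assumes the IQ setup (population mean 100, standard deviation 15) with samples of size n = 9; the cutoff of 105 is chosen only for illustration.  It compares the probability from the normal model with the proportion of simulated samples whose mean is 105 or higher.

```python
import random
from statistics import NormalDist, fmean

mu, sigma, n = 100, 15, 9          # assumed population values and sample size
cutoff = 105                       # illustrative cutoff, not from the text

# Normal model for the sample mean: center mu, spread sigma / sqrt(n).
model = NormalDist(mu, sigma / n ** 0.5)
p_model = 1 - model.cdf(cutoff)

# Simulation: the proportion of many random samples with a mean at or above the cutoff.
rng = random.Random(2)
reps = 10_000
hits = sum(
    fmean([rng.gauss(mu, sigma) for _ in range(n)]) >= cutoff
    for _ in range(reps)
)
p_sim = hits / reps

print(f"model probability:    {p_model:.4f}")
print(f"simulated proportion: {p_sim:.4f}")
```

The model probability is about 0.16, which students should interpret as follows: if we obtained 100 random samples of size n = 9 from this population, we would expect about 16 of them to result in a sample mean of 105 or higher.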

Section 8.2 Distribution of the Sample Proportion

This section presents a discussion of the distribution of the sample proportion.

  • Students should be aware that we are using a continuous distribution to model the behavior of a qualitative variable with two outcomes: success and failure.  For this reason, we require a large sample size in order for the distribution of the sample proportion to be approximately normal.  We require that np(1 – p) > 10 in order for the normality condition to be satisfied.  This comes from the work of P.P. Ramsey and P.H. Ramsey in “Evaluating the Normal Approximation to the Binomial Test,” Journal of Educational Statistics 13 (1988): 173-182.
  • I like to use the sampling distribution for a binary variable applet.  See the activity “Describing the Distribution of the Sample Proportion” in the Student Activity Workbook.
  • Again, be sure to emphasize the interpretation of the probabilities as a means of foreshadowing the interpretation of P-values.
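
As with the sample mean, the behavior of the sample proportion can be sketched in a few lines of Python if you prefer a script to the applet.  The population proportion p = 0.3, the sample sizes, and the helper name simulate_sample_proportions are assumptions chosen for illustration.  The sketch reports np(1 – p) alongside the simulated center and spread so students can see when the normality condition is met.

```python
import random
import statistics

def simulate_sample_proportions(p, n, reps=1000, seed=3):
    """Return `reps` sample proportions, each from n success/failure trials.

    (Hypothetical helper for illustration.)
    """
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p = 0.3  # assumed population proportion
for n in (10, 50, 200):
    phats = simulate_sample_proportions(p, n)
    condition = n * p * (1 - p)          # normality condition: want this above 10
    model_sd = (p * (1 - p) / n) ** 0.5  # standard deviation of the sample proportion
    print(
        f"n={n:<4} np(1-p)={condition:5.1f}"
        f" mean of p-hats={statistics.fmean(phats):.3f}"
        f" sd of p-hats={statistics.stdev(phats):.3f}"
        f" model sd={model_sd:.3f}"
    )
```

With p = 0.3, the condition np(1 – p) > 10 fails for n = 10 (2.1) and is met for n = 50 (10.5) and n = 200 (42).  A histogram of the simulated sample proportions at each sample size makes the change in shape visible.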