Hypothesis Testing

Hypothesis Tests Regarding a Parameter

Overview

In the last chapter we mentioned that there are two types of inference: estimation and hypothesis testing. In this chapter, we focus on hypothesis testing. There are many different forms of hypothesis testing. Here, we focus on hypothesis testing regarding the value of a single population parameter. The two main parameters we conduct hypothesis tests on are the population proportion and the population mean; optional Section 10.4 also covers the population standard deviation.

There are two widely accepted approaches to hypothesis testing: the classical approach and the P-value approach. While the classical method has merits (especially from a historical perspective), the P-value approach is more widely used and provides more information regarding the hypothesis test. Another advantage of the P-value approach is that the same decision rule is used throughout the course: if P-value < α, reject the statement in the null hypothesis. In addition, the P-value approach is the method used in research and journal articles. For these reasons, we recommend placing the emphasis on the P-value approach to hypothesis testing.

In March 2019, the American Statistical Association (ASA) published a special issue of The American Statistician. This issue contains many articles regarding the use of P-values in making judgments about hypotheses. You are encouraged to download the open-access issue at

https://www.tandfonline.com/toc/utas20/73/sup1?nav=tocList

Of particular importance is the article Moving to a World Beyond “p < 0.05”. In this article, the authors suggest that the statistical community (and researchers) should move away from the dichotomous decision of rejecting the statement in the null hypothesis if the P-value is less than some level of significance (such as α = 0.05) and not rejecting the statement in the null hypothesis otherwise. The argument rests on the fact that the P-value is a random variable (whose value is determined by a random sample or random assignment) and can change from sample to sample (or experiment to experiment). Why should a study with a P-value of 0.049 be considered statistically significant, while one with a P-value of 0.051 is not? As you teach your class, please be mindful of the direction in which the statistical community is headed. For now, the text still uses the decision rule “if P-value < α, reject the statement in the null hypothesis.” However, please emphasize the randomness of P-values and the importance of replicating any study (with conclusions confirmed) in subsequent studies. Also, emphasize the interpretation of the P-value as laid out in the text.
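To make the randomness of P-values concrete, the short Python sketch below simulates several independent studies that all sample the same population and computes a right-tailed P-value for each using the normal model. The true population proportion (0.55), the null value (0.5), the sample size (200), and the function names are hypothetical choices for illustration only.

```python
import math
import random

def right_tail_p_value(x, n, p0):
    """Right-tailed P-value for H0: p = p0 versus H1: p > p0 (normal model)."""
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    # P(Z >= z) for a standard normal random variable
    return 0.5 * math.erfc(z / math.sqrt(2))

def repeated_p_values(true_p, p0, n, studies, seed=1):
    """Simulate independent studies of size n and return each study's P-value."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(studies):
        # number of individuals with the characteristic in this random sample
        x = sum(rng.random() < true_p for _ in range(n))
        p_values.append(right_tail_p_value(x, n, p0))
    return p_values

if __name__ == "__main__":
    for p in repeated_p_values(true_p=0.55, p0=0.5, n=200, studies=10):
        print(f"{p:.3f}")
```

Even though every simulated study draws from the same population, the P-values vary considerably and often fall on both sides of 0.05, which is exactly the concern raised in the ASA article.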

What to Emphasize

The chapter begins with a discussion of the language of hypothesis testing and how to structure hypotheses. There is a lot of vocabulary in this section, and time should be dedicated to familiarizing students with the language. Be sure to emphasize what the null and alternative hypotheses represent. The null hypothesis is always a statement of "no change" or "no difference." That is, it is the statement of status quo (Latin for "existing state"). Emphasize that we always assume the statement in the null hypothesis is true. We collect sample data and essentially decide whether the sample data are consistent with the statement in the null hypothesis. If they are not, we reject the statement in the null hypothesis in favor of the alternative hypothesis. The alternative hypothesis is the statement we are looking to demonstrate; it is sometimes called the "research hypothesis." The court system is a good analogy for hypothesis testing. The null hypothesis is that the defendant is innocent, and the alternative hypothesis is that the defendant is guilty. We assume innocence and weigh the evidence against this assumption. It is also important to emphasize that a defendant is never declared innocent: if the evidence is not enough to convince a jury of guilt, the defendant is declared not guilty. In the same way, we never accept the statement in the null hypothesis.

The most important concept for students to understand coming out of this chapter is the meaning of the P-value and how to interpret it. Most research articles and academic journals report P-values. No matter what experimental design or observational study is used, the interpretation of the P-value is always the same. So, even if a student does not fully understand the data collection method, the student can still understand what the P-value is measuring.

Before you move to Section 10.2, you have a big decision to make. You must decide whether you want to cover hypothesis testing from a traditional point of view or introduce hypothesis testing using simulation. Simulation-based inference is a relatively new approach to hypothesis testing. The Guidelines for Assessment and Instruction in Statistics Education (GAISE) College Report 2016, endorsed by the American Statistical Association, acknowledges that there are new, “innovative ways to teach the logic of statistical inference.” The most popular techniques for introducing inference are simulations and resampling methods (randomization tests and bootstrapping). Read George Cobb’s article The Introductory Statistics Course: A Ptolemaic Curriculum available at

http://escholarship.org/uc/item/6hb3k0nz

Below, we lay out a guide to assist you in utilizing these materials in your classes.

Hypothesis Tests Regarding a Population Proportion

The chapter begins with hypothesis tests on a population proportion. The reason for this is that tests regarding a population proportion utilize the normal model, which is familiar to students. Should you decide to go the traditional route, cover only Section 10.2. If you decide to present hypothesis testing using simulation, cover Sections 10.2A and 10.2B.

Section 10.2: Traditional Approach to Hypothesis Tests Regarding a Population Proportion

Begin by reviewing the sampling distribution of the sample proportion.
Even if you don’t want to cover simulation, there is pedagogical value in going through the applet activity that illustrates the logic of hypothesis testing (at least as a classroom demonstration). It is important for students to understand that the simulation assumes the population proportion of constituents in favor of the policy is 50% (the proportion stated in the null hypothesis). Be sure that students recognize the shape and center of the distribution after they perform the simulation for 1000 runs. Ask students to explain what the results of a particular run represent. For example, if a student has a run with 258 heads and 242 tails, what does this mean? Next, ask students to explain what the proportion of runs that result in at least 260 heads represents. Repeat this for 270 and 280 heads. Students should recognize that the distribution of the number of heads from the simulation is approximately normal in shape. This is why we can use the normal model to estimate P-values.
Next, go through the steps for conducting a hypothesis test regarding a population proportion. Whether or not you are using technology, we strongly encourage you to have students determine a P-value for a right-tailed or left-tailed hypothesis test about a population proportion. This is another advantage of introducing hypothesis tests for proportions first: we can use the normal model to find P-values rather than having to estimate them (as we would using Student's t-distribution). Even if you are using technology to find areas under normal curves, students should compute the probability of obtaining a sample proportion as extreme as or more extreme than the one obtained, under the assumption that the proportion in the null hypothesis is true. It is also worthwhile to compare the results of a simulation to those obtained using the normal model. Once this has been done at least once, feel free to rely on technology to find P-values.
Be sure to emphasize the interpretation of the P-value. For each hypothesis test, require students to explain what the P-value represents.
Another area of emphasis needs to be the role sample size plays in hypothesis testing. Start by discussing hypothesis testing with small sample sizes. For proportions, this means one must use the binomial probability distribution function to find P-values. In addition, the sample evidence required to reject the statement in the null hypothesis must overwhelmingly contradict the proportion in the null. Bottom line: when using small samples to test hypotheses, it is very difficult to reject the statement in the null hypothesis.
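The applet activity and the comparison of simulation with the normal model can be mirrored in a short Python sketch. As in the activity, it assumes the proportion stated in the null hypothesis is 0.5, that each run consists of 500 flips, and that 1000 runs are performed; each flip stands for one randomly selected constituent, and a head means that constituent favors the policy. The function names are hypothetical.

```python
import math
import random

def simulate_heads(runs=1000, flips=500, p0=0.5, seed=2019):
    """Number of heads in each run, assuming the null proportion p0 is true."""
    rng = random.Random(seed)
    return [sum(rng.random() < p0 for _ in range(flips)) for _ in range(runs)]

def simulated_p_value(heads, threshold):
    """Proportion of runs with at least `threshold` heads."""
    return sum(h >= threshold for h in heads) / len(heads)

def normal_p_value(threshold, flips=500, p0=0.5):
    """Right-tail area under the normal model for the sample proportion."""
    p_hat = threshold / flips
    se = math.sqrt(p0 * (1 - p0) / flips)  # standard error assuming H0 is true
    z = (p_hat - p0) / se
    return 0.5 * math.erfc(z / math.sqrt(2))

if __name__ == "__main__":
    heads = simulate_heads()
    print("mean heads:", sum(heads) / len(heads))  # center should be near 250
    for t in (260, 270, 280):
        print(t, simulated_p_value(heads, t), round(normal_p_value(t), 4))
```

Because this sketch uses no continuity correction and the number of heads is discrete, the normal-model areas tend to run slightly below the simulated proportions, but the two should be close.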
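For small samples, the P-value comes from the binomial probability distribution function rather than from the normal model. The sketch below computes an exact right-tailed P-value, P(X ≥ x) assuming p = p0. The sample of 20 individuals and the null value of 0.5 are hypothetical numbers chosen only to show how strong the evidence must be before a small sample leads to rejection.

```python
from math import comb

def binomial_right_tail(x, n, p0):
    """Exact P(X >= x) when X ~ Binomial(n, p0), i.e. assuming H0: p = p0."""
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(x, n + 1))

if __name__ == "__main__":
    n, p0 = 20, 0.5
    # 14 successes out of 20 (70%) gives a P-value of about 0.058
    print(round(binomial_right_tail(14, n, p0), 4))
    # smallest number of successes that gives a P-value below 0.05
    x_min = next(x for x in range(n + 1) if binomial_right_tail(x, n, p0) < 0.05)
    print(x_min)  # 15
```

With only 20 observations, a sample proportion of 0.70 is still not enough to reject the statement in the null hypothesis at the 0.05 level; 15 successes (75%) are required.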

NEW! Sections 10.2A and 10.2B: Simulation-Based Inference to Hypothesis Testing (Optional)

Section 10.2A presents hypothesis tests for a proportion using simulation. There are two approaches that may be used with the simulation method. The first approach utilizes coin flipping. This model is useful when describing a random process (such as deciding whether each stock in a series of stocks will go up or down). Be sure to emphasize that each flip of a coin represents a choice. Also emphasize what a head represents for each flip of the coin. Clearly define the concept of a null model. Going through a tactile simulation with actual coins is a good idea. A second method for building the null model is the urn applet in StatCrunch. The advantage of this method is that you actually build a population based on the proportion stated in the null hypothesis and randomly select outcomes from this population. This helps students develop an intuitive feel for the null model and the meaning of the P-value. One final item to consider is whether to use counts or proportions for the test statistic. Counts are simpler and remove one layer of complication. Proportions, on the other hand, are the basis of the test statistic when we segue to using the normal model to estimate P-values. Once the null model is built, be sure to point out the shape, center, and spread of the outcomes of the simulation.
Once you complete Section 10.2A, jump into Section 10.2B. This section utilizes the normal model to obtain P-values for hypothesis tests on a proportion. Continue to emphasize the interpretation of P-values. You might consider presenting simulation side-by-side with the normal model approach so students can see the similarity between the two approaches.
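The urn approach can be sketched by literally building a population that matches the proportion stated in the null hypothesis and drawing samples from it. This is not the StatCrunch applet itself: the urn size (1000), null proportion (0.4), sample size (50), number of runs, and observed count (27) are hypothetical, and the sketch samples with replacement (an assumption) so that every draw has the null probability of success. It summarizes the center and spread of the null model as both counts and proportions.

```python
import random
import statistics

def build_urn(p0, size=1000):
    """Population of 1s (success) and 0s (failure) matching the null proportion."""
    successes = round(p0 * size)
    return [1] * successes + [0] * (size - successes)

def urn_null_model(p0, n, runs=2000, seed=10):
    """Count of successes in each of `runs` samples of size n drawn from the urn."""
    rng = random.Random(seed)
    urn = build_urn(p0)
    # sampling with replacement is an assumption of this sketch
    return [sum(rng.choices(urn, k=n)) for _ in range(runs)]

if __name__ == "__main__":
    p0, n, observed = 0.4, 50, 27
    counts = urn_null_model(p0, n)
    props = [c / n for c in counts]
    # center and spread of the null model, as counts and as proportions
    print("mean count:", statistics.mean(counts), "SD:", round(statistics.stdev(counts), 2))
    print("mean proportion:", statistics.mean(props), "SD:", round(statistics.stdev(props), 3))
    # right-tailed P-value: identical whether computed from counts or proportions
    p_counts = sum(c >= observed for c in counts) / len(counts)
    p_props = sum(p >= observed / n for p in props) / len(props)
    print(p_counts, p_props)
```

The two P-values printed at the end agree, which reinforces that the choice between counts and proportions is a matter of presentation rather than substance.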

Hypothesis Tests Regarding a Population Mean

Now we conduct hypothesis tests about a population mean. We do not cover hypothesis tests for a population mean under the assumption that the population standard deviation is known.

NEW! Section 10.3A: Hypothesis Tests on a Population Mean Using Simulation and the Bootstrap (Optional)

Section 10.3A is optional and may be covered prior to the discussion of Section 10.3. This material utilizes both simulation and the bootstrap to perform hypothesis tests on a population mean. Section 10.3A begins with using simulation to estimate P-values. If you did not cover bootstrapping in Chapter 9, be sure to cover only Objective 1.
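One common way to carry out a bootstrap hypothesis test for a mean is to shift the sample so that its mean equals the value in the null hypothesis, resample from the shifted data with replacement, and record how often a resampled mean is at least as large as the observed sample mean. The sketch below uses that shift approach, which may differ in detail from the procedure presented in Section 10.3A; the data, the null value of 100, and the function name are hypothetical.

```python
import random

def bootstrap_p_value_mean(data, mu0, reps=5000, seed=42):
    """Right-tailed bootstrap P-value for H0: mu = mu0 versus H1: mu > mu0."""
    rng = random.Random(seed)
    n = len(data)
    x_bar = sum(data) / n
    # Shift every observation so the resampled "population" has mean mu0,
    # that is, so the statement in the null hypothesis is true.
    shifted = [x - x_bar + mu0 for x in data]
    extreme = 0
    for _ in range(reps):
        resample = rng.choices(shifted, k=n)  # resample with replacement
        if sum(resample) / n >= x_bar:        # at least as extreme as observed
            extreme += 1
    return extreme / reps

if __name__ == "__main__":
    sample = [104, 98, 110, 101, 97, 108, 112, 95, 103, 106, 109, 99]
    print(sum(sample) / len(sample), bootstrap_p_value_mean(sample, mu0=100))
```

The returned value is the proportion of bootstrap samples, built under the assumption that the statement in the null hypothesis is true, whose mean is at least as large as the observed mean, which matches the interpretation of the P-value used throughout the chapter.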

Section 10.3: Hypothesis Tests on a Population Mean (Using Student’s t-Distribution)

Section 10.3 presents hypothesis tests for a population mean using Student’s t-distribution.
Review the sampling distribution of the sample mean and the properties of Student's t-distribution.
Feel free to rely on technology to obtain P-values. However, be sure to remind students that the model approximates the P-value based on Student's t-distribution. This P-value represents the probability of obtaining a sample mean as extreme as, or more extreme than, the sample mean obtained under the assumption that the statement in the null hypothesis is true. We could also obtain this P-value by conducting the study over and over (as we did with the simulation in the section on proportions).
Continue to emphasize the interpretation of the P-value. For each hypothesis test, require students to explain what the P-value represents.
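The section ends with a discussion of statistical versus practical significance. Be sure students understand that virtually any statement in the null hypothesis can be rejected simply by increasing the sample size. Why? Larger sample sizes result in smaller standard errors, so even a tiny difference between the sample statistic and the value stated in the null hypothesis produces a large test statistic and a small P-value. Students should watch out for studies that claim statistical significance when the sample sizes are large, and ask whether the observed difference has any practical importance.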
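Technology finds these areas, but a small sketch shows what is being computed: the one-sample t statistic and the area under Student's t-distribution beyond it. To stay within Python's standard library, the sketch approximates the tail area by numerically integrating the t density with Simpson's rule; this numerical method, the data, the null value of 100, and the function names are assumptions for illustration, and in practice a statistics package would be used.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_right_tail(t, df, steps=2000):
    """P(T >= t), using symmetry and Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    area = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3  # area under the density between 0 and |t|
    return 0.5 - area if t >= 0 else 0.5 + area

def one_sample_t_test(data, mu0):
    """Return (t statistic, right-tailed P-value) for H0: mu = mu0 vs H1: mu > mu0."""
    n = len(data)
    x_bar = sum(data) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))  # sample SD
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, t_right_tail(t, n - 1)

if __name__ == "__main__":
    sample = [104, 98, 110, 101, 97, 108, 112, 95, 103, 106, 109, 99]
    t, p = one_sample_t_test(sample, mu0=100)
    print(round(t, 3), round(p, 4))
```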
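The effect of sample size on statistical significance can be seen by holding the observed difference fixed and letting n grow. The sketch below assumes a hypothetical sample mean that is 0.5 above the value stated in the null hypothesis, with a sample standard deviation of 15, and, because n is large, uses the standard normal model as an approximation to Student's t-distribution.

```python
import math

def right_tail_normal(z):
    """P(Z >= z) for a standard normal random variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def p_value_for_n(diff, s, n):
    """Approximate right-tailed P-value when x_bar - mu0 = diff and the sample SD is s."""
    se = s / math.sqrt(n)  # the standard error shrinks as n grows
    return right_tail_normal(diff / se)

if __name__ == "__main__":
    for n in (100, 1_000, 10_000, 100_000):
        print(n, round(p_value_for_n(diff=0.5, s=15, n=n), 5))
```

The half-point difference is identical in every row, yet the P-value falls from roughly 0.37 at n = 100 to essentially zero at n = 100,000. Whether half a point matters in practice is a separate question that the P-value cannot answer.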

Section 10.4: Hypothesis Tests Regarding a Population Standard Deviation

This section focuses on hypothesis tests regarding a population standard deviation (or variance). If you find yourself pressed for time, the material in this section is optional and may be skipped without loss of continuity.
- Review the characteristics of the chi-square distribution, first introduced in Section 9.3.
- Feel free to rely on technology to obtain P-values. Remind students that the model approximates P-values based on the chi-square distribution.
- Continue to emphasize the interpretation of the P-value. Require that students provide an interpretation for each hypothesis test.
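As with the t-distribution, technology supplies the chi-square areas, but the calculation itself is short. The test statistic is χ² = (n − 1)s²/σ0² with n − 1 degrees of freedom. To stay within the standard library, the sketch computes the chi-square cumulative area with a power series for the regularized lower incomplete gamma function (the chi-square CDF with k degrees of freedom at x equals P(k/2, x/2)). The fill-amount data, σ0 = 0.3, the left-tailed alternative ("more consistent"), and the function names are hypothetical.

```python
import math

def reg_lower_gamma(a, x, tol=1e-15, max_terms=100_000):
    """Regularized lower incomplete gamma function P(a, x) via its power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / a  # n = 0 term of the sum of x^n / (a (a+1) ... (a+n))
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    return math.exp(a * math.log(x) - x - math.lgamma(a)) * total

def chi2_cdf(x, df):
    """P(chi-square random variable with df degrees of freedom <= x)."""
    return reg_lower_gamma(df / 2, x / 2)

def sd_test_left_tailed(data, sigma0):
    """Chi-square statistic and P-value for H0: sigma = sigma0 vs H1: sigma < sigma0."""
    n = len(data)
    mean = sum(data) / n
    s2 = sum((x - mean) ** 2 for x in data) / (n - 1)  # sample variance
    stat = (n - 1) * s2 / sigma0**2
    return stat, chi2_cdf(stat, n - 1)

if __name__ == "__main__":
    fills = [16.1, 15.8, 16.0, 16.3, 15.9, 16.2, 16.0, 15.7, 16.1, 16.0]
    stat, p = sd_test_left_tailed(fills, sigma0=0.3)
    print(round(stat, 3), round(p, 4))
```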

Section 10.5: Putting It Together: Which Procedure Do I Use?

In my experience, students have difficulty reading problems and determining which statistical technique to use. For this reason, we wrote the Putting It Together section to provide a mix of hypothesis tests on population proportions, population means, and population standard deviations or variances. We also include a couple of problems that require estimating a population parameter as a reminder that there are two forms of inference: estimation and hypothesis testing. Emphasize that proportions are based on qualitative data with two outcomes; means are based on quantitative data where we are interested in a measure of center (or "typical" value); and standard deviations/variances are based on quantitative data where we are interested in a measure of spread (often signaled by words such as “consistent”).
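The classification described above can be summarized as a small decision function. This is a hypothetical teaching aid that encodes only the rules in this paragraph (the type of variable and, for quantitative data, whether the question concerns center or spread). It does not replace reading the problem carefully, and it does not decide between estimation and hypothesis testing.

```python
def choose_procedure(variable_type, focus=None):
    """Return the parameter a one-sample problem is about.

    variable_type: "qualitative" (two outcomes) or "quantitative"
    focus: for quantitative data, "center" (typical value) or "spread" (consistency)
    """
    if variable_type == "qualitative":
        return "population proportion"
    if variable_type == "quantitative":
        if focus == "center":
            return "population mean"
        if focus == "spread":
            return "population standard deviation or variance"
        raise ValueError("for quantitative data, focus must be 'center' or 'spread'")
    raise ValueError("variable_type must be 'qualitative' or 'quantitative'")

if __name__ == "__main__":
    # "Are the fill amounts more consistent than before?" is a question about spread
    print(choose_procedure("quantitative", "spread"))
```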

Section 10.6: The Probability of a Type II Error and the Power of the Test

This is an optional section and may be skipped without loss of continuity.

Ideas for Traditional/Online/Blended/Flipped

By far, the most important concept for students to understand is the meaning of the P-value. To help students understand this concept, we recommend the use of the applets in StatCrunch. Either require students to go through the activity at the beginning of Section 9.1 or select one of the problems from the end-of-section exercises. These applet activities allow students to see how simulation may be used to approximate P-values, and they illustrate the interpretation of P-values. Lastly, the simulations emphasize that the normal distribution (or Student's t-distribution) is a model that represents what would happen if the study were conducted many, many times. That is, the P-value is essentially the long-term proportion of times an outcome as extreme as, or more extreme than, the outcome actually observed would occur if the statement in the null hypothesis is true.
Use the discussion board to ask questions about the interpretation of the P-value. It is also important to ask questions about the role of sample size in hypothesis testing and to emphasize the difference between statistical and practical significance. Finally, we recommend posting descriptions of simple studies and requiring students to explain which statistical technique they would use to answer the research objective (such as those found in Problems 10-15 in Section 10.4). Require students to explain why the method they chose is most appropriate. In this discussion, require students to identify the response variable in the study and whether it is qualitative or quantitative. It is important to get students into this habit as we move to more elaborate studies.
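The idea that the P-value is a long-term proportion can be checked directly: as the number of simulated studies grows, the proportion of simulated outcomes at least as extreme as the observed one settles down near the exact P-value. The sketch below assumes a hypothetical study of 40 individuals, a null proportion of 0.5, and 26 observed successes, and uses the binomial distribution for the exact right-tailed P-value; the function names are also hypothetical.

```python
import random
from math import comb

def exact_right_tail(x, n, p0):
    """Exact P(X >= x) for X ~ Binomial(n, p0)."""
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(x, n + 1))

def simulated_right_tail(x, n, p0, runs, seed=0):
    """Proportion of simulated studies with at least x successes, assuming p = p0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        successes = sum(rng.random() < p0 for _ in range(n))
        hits += successes >= x  # True counts as 1
    return hits / runs

if __name__ == "__main__":
    x, n, p0 = 26, 40, 0.5
    print("exact:", round(exact_right_tail(x, n, p0), 4))
    for runs in (100, 1_000, 10_000):
        print(runs, simulated_right_tail(x, n, p0, runs))
```

Students can watch the simulated proportion bounce around for a small number of runs and then stabilize near the exact value as the number of runs increases.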