Testing Hypotheses Regarding Two Population Proportions: Summarized Data

Let’s work Example 2 from Section 11.1.

In clinical trials of Nasonex, 3774 adult and adolescent allergy patients (patients 12 years and older) were randomly divided into two groups. The patients in group 1 (experimental group) received 200 mcg of Nasonex, while the patients in group 2 (control group) received a placebo. Of the 2103 patients in the experimental group, 547 reported headaches as a side effect. Of the 1671 patients in the control group, 368 reported headaches as a side effect. Is there evidence to conclude that the proportion of Nasonex users who experienced headaches as a side effect is greater than the proportion in the control group?

Here, we are testing

\(H_0:p_1 = p_2\)
\(H_1:p_1 > p_2\)

The syntax for the test is

prop.test(x = c(\(x_1\), \(x_2\)), n = c(\(n_1\), \(n_2\)), alternative = 'greater', correct = FALSE)

where:
* \(x_1\) and \(x_2\) are the number of patients who reported a headache from group 1 and group 2, respectively.
* \(n_1\) and \(n_2\) are the total number of patients in group 1 and group 2, respectively.

Note: For a left-tailed test, use alternative = 'less'; for a two-tailed test, use alternative = 'two.sided'.

prop.test(x = c(547, 368), n = c(2103, 1671), alternative = 'greater', conf.level = .95, correct = FALSE)
## 
##  2-sample test for equality of proportions without continuity
##  correction
## 
## data:  c(547, 368) out of c(2103, 1671)
## X-squared = 8.0618, df = 1, p-value = 0.00226
## alternative hypothesis: greater
## 95 percent confidence interval:
##  0.01695043 1.00000000
## sample estimates:
##    prop 1    prop 2 
## 0.2601046 0.2202274

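The P-value is 0.002 (0.00226 in the output). This is below common significance levels such as \(\alpha = 0.05\), so there is sufficient evidence to conclude that the proportion of Nasonex users who experienced headaches is greater than the proportion in the control group.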
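As a cross-check, the same result can be reproduced by hand. The sketch below assumes the usual pooled two-proportion z-test formula, which the output above does not show directly; the variable names (p_pooled, se, z) are chosen here for illustration. It uses only base R and the counts from the example.

```r
# Counts from the Nasonex example
x1 <- 547; n1 <- 2103   # experimental group: headaches, sample size
x2 <- 368; n2 <- 1671   # control group: headaches, sample size

# Sample proportions and the pooled proportion under H0: p1 = p2
p1_hat   <- x1 / n1
p2_hat   <- x2 / n2
p_pooled <- (x1 + x2) / (n1 + n2)

# Standard error under H0 and the z test statistic
se <- sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z  <- (p1_hat - p2_hat) / se

# Right-tailed P-value from the standard normal distribution
p_value <- pnorm(z, lower.tail = FALSE)

# Same test via prop.test, for comparison
res <- prop.test(x = c(x1, x2), n = c(n1, n2),
                 alternative = "greater", correct = FALSE)

# z squared should agree with the X-squared value that prop.test reports
c(z = z, z_squared = z^2, p_value = p_value)
```

If the formula assumption holds, z_squared agrees with the X-squared value in the output and p_value agrees with the reported P-value.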

Testing Hypotheses Regarding Two Population Proportions: Raw Data

Tornado <- read.csv("https://sullystats.github.io/Statistics6e/Data/Tornadoes_2017.csv")
head(Tornado,n=3)
##   Month Day     Time State F.Scale Injuries Fatalities PropLoss Length Width
## 1     1   2  9:03:00    TX       1        0          0    30000   2.55   100
## 2     1   2  9:44:00    TX       1        0          0    30000   2.57   100
## 3     1   2 10:06:00    LA       1        0          0    25000   0.30    20
##   NumberStates F0
## 1            1 No
## 2            1 No
## 3            1 No

We are going to focus on the column F0, which is a categorical variable that is No if the tornado is not an F0 and Yes if the tornado is an F0 tornado.

Suppose we want to know if there is a difference in the proportion of F0 tornadoes in Louisiana (LA) versus Georgia (GA).

We need to use the Mosaic package. Install it once if it is not already installed.

install.packages("mosaic")

The first thing we need to do is obtain a subset of the data set that contains only the observations for Louisiana (LA) and Georgia (GA). Use the subset command on the data frame “Tornado”, keeping the rows where “State” is (==) LA or (|) GA. We will name the new data frame Data_LA_GA.

Data_LA_GA <- subset(Tornado,State=="LA"|State=="GA")
head(Data_LA_GA,n=3)
##   Month Day     Time State F.Scale Injuries Fatalities PropLoss Length Width
## 3     1   2 10:06:00    LA       1        0          0    25000   0.30    20
## 4     1   2 10:17:00    LA       1        0          0    50000   1.20    50
## 5     1   2 10:30:00    LA       1        0          0    20000   4.64   100
##   NumberStates F0
## 3            1 No
## 4            1 No
## 5            1 No
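The same filter can also be written with %in%, which may be easier to read when more than two states are involved. The sketch below uses a small made-up data frame (toy is hypothetical, not the Tornado data) so that it runs without downloading anything.

```r
# Hypothetical toy data standing in for the Tornado data frame
toy <- data.frame(
  State = c("TX", "LA", "GA", "LA", "AL"),
  F0    = c("No", "Yes", "No", "No", "Yes")
)

# Filter with == and | as in the text
by_or <- subset(toy, State == "LA" | State == "GA")

# Equivalent filter with %in%
by_in <- subset(toy, State %in% c("LA", "GA"))

# Both approaches keep the same rows
identical(by_or, by_in)

# Counts of F0 by state in the subset
table(by_in$State, by_in$F0)
```

A quick table of the subset is a useful check that only the intended states remain before running the test.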

Now, run the prop.test command in the Mosaic package. The syntax is

prop.test(response variable ~ explanatory variable, data = data frame, alternative = "less" or "greater" or "two.sided", correct = FALSE)

Note: correct=FALSE turns off the correction for continuity and gives results equivalent to using the normal model.
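To see what the continuity correction changes, the following sketch runs the summarized-data form of prop.test on the Nasonex counts from the first example, once with and once without the correction. The object names with_cc and without_cc are chosen here for illustration.

```r
# Nasonex counts from the summarized-data example
x <- c(547, 368)
n <- c(2103, 1671)

# Same test with and without the continuity correction
with_cc    <- prop.test(x = x, n = n, alternative = "greater", correct = TRUE)
without_cc <- prop.test(x = x, n = n, alternative = "greater", correct = FALSE)

# Compare the test statistics
c(corrected   = unname(with_cc$statistic),
  uncorrected = unname(without_cc$statistic))

# Compare the P-values
c(corrected   = with_cc$p.value,
  uncorrected = without_cc$p.value)
```

With samples this large the two versions differ only slightly; the corrected test is a little more conservative, giving a smaller statistic and a larger P-value.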

library(mosaic)
prop.test(F0 ~ State,data=Data_LA_GA,alternative="two.sided",correct=FALSE)
## 
##  2-sample test for equality of proportions without continuity
##  correction
## 
## data:  tally(F0 ~ State)
## X-squared = 2.8769, df = 1, p-value = 0.08986
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.2375579  0.0144915
## sample estimates:
##    prop 1    prop 2 
## 0.6355932 0.7471264

The P-value is 0.0899.

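R and Mosaic report the test statistic using a distribution called the \(\chi^2\)-distribution. The test statistic is shown to be 2.8769. With 1 degree of freedom, this value is the square of the test statistic from the normal model, so take the square root of the test statistic provided to find its magnitude.

sqrt(2.8769)
## [1] 1.696143

So, the test statistic has magnitude 1.696. The square root is always positive, so the sign must come from the difference in sample proportions (prop 1 minus prop 2). That difference is negative here, so the test statistic using the normal model is \(z = -1.696\).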
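The square-root step can also be done directly from the object that prop.test returns, attaching the sign of the difference in sample proportions. The sketch below is an illustration under a stated assumption: the counts 75 of 118 and 65 of 87 are back-calculated from the sample proportions in the output above (0.6355932 and 0.7471264) and are not given in the source.

```r
# Assumption: counts back-calculated from the reported sample proportions
# (75/118 = 0.6355932, 65/87 = 0.7471264); they are not stated in the source.
x <- c(75, 65)
n <- c(118, 87)

res <- prop.test(x = x, n = n, alternative = "two.sided", correct = FALSE)

# Difference in sample proportions (prop 1 minus prop 2)
diff_hat <- unname(res$estimate[1] - res$estimate[2])

# Signed z statistic: the square root of X-squared carries no sign,
# so take the sign from the difference in sample proportions
z <- sign(diff_hat) * sqrt(unname(res$statistic))

# Two-sided P-value from the standard normal distribution
p_norm <- 2 * pnorm(-abs(z))

c(z = z, p_norm = p_norm, p_chisq = res$p.value)
```

For a two-sided test, the normal-model P-value and the \(\chi^2\) P-value agree, which is why the two approaches are described as equivalent.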