Construct Confidence Intervals for the Difference between Two Population Proportions: Summarized Data

Let’s work Example 3 from Section 11.1.

The Gallup organization surveyed 1100 adult Americans on May 6–9, 2002, and conducted an independent survey of 1024 adult Americans on May 1–10, 2018. In both surveys they asked the following: “Right now, do you think the state of moral values in the country as a whole is getting better or getting worse?” On May 1–10, 2018, 784 of the 1024 surveyed responded that the state of moral values is getting worse; on May 6–9, 2002, 737 of the 1100 surveyed responded that the state of moral values is getting worse. Construct and interpret a 90% confidence interval for the difference between the two population proportions.

The syntax for constructing the interval with prop.test is

prop.test(x = c(\(x_1\), \(x_2\)), n = c(\(n_1\), \(n_2\)), conf.level = level of confidence, correct = FALSE)

where:
* \(x_1\) and \(x_2\) are the numbers of successes in group 1 and group 2, respectively.
* \(n_1\) and \(n_2\) are the sample sizes of group 1 and group 2, respectively.

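Note: R computes the confidence interval for \(p_1 - p_2\), where group 1 is the first entry in x and n. In the call below, group 1 is the 2018 survey and group 2 is the 2002 survey.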

prop.test(x = c(784, 737), n = c(1024, 1100), conf.level = .90, correct = FALSE)
## 
##  2-sample test for equality of proportions without continuity
##  correction
## 
## data:  c(784, 737) out of c(1024, 1100)
## X-squared = 23.853, df = 1, p-value = 1.04e-06
## alternative hypothesis: two.sided
## 90 percent confidence interval:
##  0.06372003 0.12752997
## sample estimates:
##   prop 1   prop 2 
## 0.765625 0.670000

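The lower bound of the 90% confidence interval is 0.064 and the upper bound is 0.128. We are 90% confident that the proportion of adult Americans who believed the state of moral values was getting worse was between 0.064 and 0.128 higher in 2018 than in 2002.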
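To see where these bounds come from, the interval can be reproduced by hand. The following is a minimal sketch in base R that assumes the normal-model interval (no continuity correction) and uses the counts from the example; the variable names are chosen here for illustration only.

```r
# Counts from the example: group 1 = 2018 survey, group 2 = 2002 survey
x1 <- 784; n1 <- 1024
x2 <- 737; n2 <- 1100

# Sample proportions
p1 <- x1 / n1
p2 <- x2 / n2

# Standard error of the difference (unpooled, as used for the interval)
se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Critical value for 90% confidence
z <- qnorm(1 - (1 - 0.90) / 2)

# Point estimate plus or minus the margin of error
ci <- (p1 - p2) + c(-1, 1) * z * se
round(ci, 3)
## [1] 0.064 0.128
```

The rounded bounds agree with the interval reported by prop.test when correct = FALSE.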

Construct Confidence Intervals for the Difference between Two Population Proportions: Raw Data

Tornado <- read.csv("https://sullystats.github.io/Statistics6e/Data/Tornadoes_2017.csv")
head(Tornado,n=3)
##   Month Day     Time State F.Scale Injuries Fatalities PropLoss Length Width
## 1     1   2  9:03:00    TX       1        0          0    30000   2.55   100
## 2     1   2  9:44:00    TX       1        0          0    30000   2.57   100
## 3     1   2 10:06:00    LA       1        0          0    25000   0.30    20
##   NumberStates F0
## 1            1 No
## 2            1 No
## 3            1 No

We are going to focus on the column F0, which is a categorical variable that is No if the tornado is not an F0 and Yes if the tornado is an F0 tornado.

Suppose we want to compute a 95% confidence interval for the difference in the proportion of F0 tornadoes in LA versus GA.

We need the mosaic package. If it is not already installed, install it once:

install.packages("mosaic")

The first thing we need to do is obtain a subset of the data set that contains only the observations for Louisiana (LA) and Georgia (GA). Use the subset command on the data frame “Tornado”, keeping the rows where “State” equals (==) LA or (|) GA. We will name the new data frame Data_LA_GA.

Data_LA_GA <- subset(Tornado,State=="LA"|State=="GA")
head(Data_LA_GA,n=3)
##   Month Day     Time State F.Scale Injuries Fatalities PropLoss Length Width
## 3     1   2 10:06:00    LA       1        0          0    25000   0.30    20
## 4     1   2 10:17:00    LA       1        0          0    50000   1.20    50
## 5     1   2 10:30:00    LA       1        0          0    20000   4.64   100
##   NumberStates F0
## 3            1 No
## 4            1 No
## 5            1 No
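An equivalent way to write the same filter uses %in%, which scales better when more than two states are needed. The sketch below uses a small hypothetical data frame (toy, not the tornado file) so that it runs without a download; it also tabulates the result to confirm that only the requested states remain.

```r
# Hypothetical toy data, standing in for the Tornado data frame
toy <- data.frame(
  State = c("TX", "LA", "GA", "LA", "OK", "GA"),
  F0    = c("No", "Yes", "No", "No", "Yes", "Yes")
)

# Filter with == and |, as in the text
sub1 <- subset(toy, State == "LA" | State == "GA")

# Same filter written with %in%
sub2 <- subset(toy, State %in% c("LA", "GA"))

# Both approaches keep exactly the same rows
identical(sub1, sub2)

# Check which states remain and how many rows each has
table(sub2$State)
```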

Now run the prop.test command from the mosaic package, which accepts a formula. The syntax is

prop.test(response variable ~ explanatory variable,data = data frame,conf.level = level of confidence,correct=FALSE)

Note: correct=FALSE turns off the correction for continuity and gives results equivalent to using the normal model.
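For reference, the normal-model interval referred to here is usually written as follows, where \(\hat{p}_1\) and \(\hat{p}_2\) denote the sample proportions and \(z_{\alpha/2}\) the critical value for the chosen level of confidence (notation assumed here, not taken from the output):

\[
(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}
\]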

library(mosaic)
prop.test(F0 ~ State,data=Data_LA_GA,conf.level=0.95,correct=FALSE)
## 
##  2-sample test for equality of proportions without continuity
##  correction
## 
## data:  tally(F0 ~ State)
## X-squared = 2.8769, df = 1, p-value = 0.08986
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.2375579  0.0144915
## sample estimates:
##    prop 1    prop 2 
## 0.6355932 0.7471264

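prop.test computes the interval as “prop 1 - prop 2”. With the formula interface, the groups are ordered by the levels of the explanatory variable (alphabetically for character data such as State), not by the order in which they appear in the output of the head command. GA therefore comes before LA, so prop 1 corresponds to Georgia and the interval is constructed as \(p_{GA} - p_{LA}\). To confirm the group order, and to check which level of F0 is being counted as a success, inspect tally(F0 ~ State, data = Data_LA_GA).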
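Because the sign of the interval depends on which group R treats as group 1, it can help to check the group order directly and, if needed, to set it explicitly. The following is a minimal sketch in base R using hypothetical toy counts (not the tornado data): it shows that character data are tabulated in sorted order regardless of row order, and that converting to a factor with chosen levels controls both the group order and which outcome counts as a success.

```r
# Hypothetical toy data: LA rows come first, GA rows second
toy <- data.frame(
  State = rep(c("LA", "GA"), times = c(100, 100)),
  F0    = c(rep(c("Yes", "No"), times = c(30, 70)),   # LA: 30 of 100 are F0
            rep(c("Yes", "No"), times = c(45, 55)))   # GA: 45 of 100 are F0
)

# Character columns are tabulated in sorted order, so GA is listed first
default_tab <- table(toy$State, toy$F0)
rownames(default_tab)

# Set the level order explicitly: LA as group 1, "Yes" as the success
toy$State <- factor(toy$State, levels = c("LA", "GA"))
toy$F0    <- factor(toy$F0, levels = c("Yes", "No"))

# Rows are groups; first column holds successes, second holds failures
tab <- table(toy$State, toy$F0)

# Base R prop.test accepts a two-column table of successes and failures
res <- prop.test(tab, conf.level = 0.95, correct = FALSE)
res$estimate   # prop 1 is LA (0.30), prop 2 is GA (0.45)
res$conf.int   # interval for p_LA - p_GA
```

With the levels set this way, the interval is for \(p_{LA} - p_{GA}\) and each proportion is the share of F0 tornadoes in that state.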