Inference about Two Means: Independent Samples

install.packages("mosaic")

Confidence Intervals for Two Independent Means: Summarized Data

Follow the scenario of Example 2 in Section 11.3.

\(\bar{x}_R = 54\)
\(s_R = 2.9\)
\(n_R = 513\)
\(\bar{x}_D = 41\)
\(s_D = 2.6\)
\(n_D = 513\)

To obtain a confidence interval for the difference of two means from summary statistics, install the PASWR package and use its tsum.test command.

install.packages("PASWR")

library(PASWR)
tsum.test(mean.x=54,s.x=2.9,n.x=513,mean.y=41,s.y=2.6,n.y=513,conf.level=0.95)

## 
##  Welch Modified Two-Sample t-Test
## 
## data:  Summarized x and y
## t = 75.598, df = 1012, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  12.66256 13.33744
## sample estimates:
## mean of x mean of y 
##        54        41

Notice that the interval is constructed for \(\mu_x - \mu_y\), which here is \(\mu_R - \mu_D\). The lower bound of the confidence interval is 12.663 and the upper bound is 13.337. To use a different level of confidence, change the value of conf.level.
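As a check on the output above, the same Welch interval can be reproduced by hand in base R from the summary statistics. This is only a minimal sketch: the variable names (xbar_R, v_R, se, df, and so on) are chosen here for illustration and are not part of PASWR.

```r
# Summary statistics from the example above
xbar_R <- 54; s_R <- 2.9; n_R <- 513
xbar_D <- 41; s_D <- 2.6; n_D <- 513

# Standard error of the difference (variances are not pooled)
v_R <- s_R^2 / n_R
v_D <- s_D^2 / n_D
se  <- sqrt(v_R + v_D)

# Welch-Satterthwaite approximate degrees of freedom
df <- (v_R + v_D)^2 / (v_R^2 / (n_R - 1) + v_D^2 / (n_D - 1))

# 95% confidence interval for mu_R - mu_D
conf_level <- 0.95
t_crit <- qt(1 - (1 - conf_level) / 2, df)
ci <- (xbar_R - xbar_D) + c(-1, 1) * t_crit * se
round(ci, 3)
```

The result should agree with the interval reported by tsum.test, up to rounding.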

Confidence Interval for Two Independent Means: Raw Data in Two Columns

We will use the data and scenario from Example 1 in Section 11.3. However, we will construct a 90% confidence interval from the data.

Table3 <- read.csv("https://sullystats.github.io/Statistics6e/Data/Chapter11/Table3.csv")
head(Table3,n=3)

##   Flight Control
## 1   8.59    8.65
## 2   6.87    7.62
## 3   7.00    7.33

Use the following command to construct the confidence interval:

t.test(x, y, conf.level=level of confidence)

where x is the first column in the data set (Sample 1) and y is the second column in the data set (Sample 2).

t.test(Table3$Flight, Table3$Control, conf.level=0.90)

## 
##  Welch Two Sample t-test
## 
## data:  Table3$Flight and Table3$Control
## t = -1.4368, df = 25.996, p-value = 0.1627
## alternative hypothesis: true difference in means is not equal to 0
## 90 percent confidence interval:
##  -1.2013517  0.1027802
## sample estimates:
## mean of x mean of y 
##  7.880714  8.430000

The interval is constructed for \(\mu_{Flight} - \mu_{Control}\). The lower bound of the confidence interval is -1.201 and the upper bound is 0.103.
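The object returned by t.test stores the interval, so the bounds can be extracted instead of being read off the printout. The sketch below uses only the first three rows of Table3 shown by head() above (not the full data set), so its numbers will differ from the output above. The names flight, control, and result are illustrative.

```r
# First three rows of Table3 as printed by head(); NOT the full data set
flight  <- c(8.59, 6.87, 7.00)
control <- c(8.65, 7.62, 7.33)

result <- t.test(flight, control, conf.level = 0.90)

ci    <- result$conf.int   # numeric vector: lower bound, upper bound
lower <- ci[1]
upper <- ci[2]
attr(ci, "conf.level")     # confidence level stored with the interval

# Does the interval for mu_flight - mu_control contain 0?
contains_zero <- lower < 0 && upper > 0
```

With the full data, the 90% interval contains 0, which is consistent with the two-sided p-value of 0.1627 being larger than 0.10.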

Confidence Intervals for Two Independent Means: Raw Data with One Quantitative Column and One Qualitative Column

When the categorical variable is in one column and the quantitative response variable is in a second column, we use the mosaic package.

Let’s use the Tornadoes_2017.csv data.

Tornado <- read.csv("https://sullystats.github.io/Statistics6e/Data/Tornadoes_2017.csv")
head(Tornado,n=3)

##   Month Day     Time State F.Scale Injuries Fatalities PropLoss Length Width
## 1     1   2  9:03:00    TX       1        0          0    30000   2.55   100
## 2     1   2  9:44:00    TX       1        0          0    30000   2.57   100
## 3     1   2 10:06:00    LA       1        0          0    25000   0.30    20
##   NumberStates F0
## 1            1 No
## 2            1 No
## 3            1 No

Suppose we want to estimate the difference in the mean length of a tornado in Louisiana (LA) versus Georgia (GA) with 90% confidence.

If necessary, install the mosaic package.

install.packages("mosaic")

First, we need to obtain a subset of the data set that only contains observations for Louisiana (LA) and Georgia (GA).

Data_LA_GA <- subset(Tornado,State=="LA"|State=="GA")  # The | means "or" in R
head(Data_LA_GA)

##   Month Day     Time State F.Scale Injuries Fatalities PropLoss Length Width
## 3     1   2 10:06:00    LA       1        0          0    25000   0.30    20
## 4     1   2 10:17:00    LA       1        0          0    50000   1.20    50
## 5     1   2 10:30:00    LA       1        0          0    20000   4.64   100
## 6     1   2 10:30:00    LA       1        0          0   150000   2.74   100
## 7     1   2 11:06:00    LA       1        0          0    50000   0.54    50
## 8     1   2 11:30:00    LA       0        0          0    75000   0.54    25
##   NumberStates  F0
## 3            1  No
## 4            1  No
## 5            1  No
## 6            1  No
## 7            1  No
## 8            1 Yes
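The same subset can also be written with the %in% operator, which scales better when more than two states are wanted. The sketch below is illustrative only: tornado_small is a hypothetical data frame built from a few State and Length values copied from the printed rows above, not the full Tornado data.

```r
# A few rows (State and Length only) copied from the printed output above
tornado_small <- data.frame(
  State  = c("TX", "TX", "LA", "LA", "LA"),
  Length = c(2.55, 2.57, 0.30, 1.20, 4.64)
)

# The same filter written two ways
with_or <- subset(tornado_small, State == "LA" | State == "GA")
with_in <- subset(tornado_small, State %in% c("LA", "GA"))

# Both approaches keep exactly the same rows
identical(with_or, with_in)
```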

Now, run the t.test command in mosaic using a formula. The syntax is:

t.test(response variable ~ explanatory variable, data=data frame,conf.level = level of confidence)

t.test(Length ~ State,data=Data_LA_GA,conf.level=0.90)

## 
##  Welch Two Sample t-test
## 
## data:  Length by State
## t = 2.5539, df = 185.4, p-value = 0.01146
## alternative hypothesis: true difference in means is not equal to 0
## 90 percent confidence interval:
##  0.7505899 3.5054242
## sample estimates:
## mean in group GA mean in group LA 
##         5.345593         3.217586

Notice that Georgia is the first state in the output (the groups are ordered alphabetically), so the interval is constructed as \(\mu_{GA} - \mu_{LA}\). The lower bound of the interval is 0.751 miles and the upper bound is 3.505 miles.

Note: The argument var.equal=TRUE pools the standard deviations. By default, t.test assumes unequal variances (and therefore uses Welch’s t).
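To see the effect of var.equal, the two versions of the test can be run side by side. This sketch uses base R's t.test on only the first three rows of Table3 printed earlier (not the full data set), so the numbers are for illustration only. The names flight, control, welch, and pooled are chosen here.

```r
# First three rows of Table3 as printed by head(); NOT the full data set
flight  <- c(8.59, 6.87, 7.00)
control <- c(8.65, 7.62, 7.33)

# Default: unequal variances (Welch)
welch  <- t.test(flight, control, conf.level = 0.90)

# Pooled: equal variances assumed
pooled <- t.test(flight, control, conf.level = 0.90, var.equal = TRUE)

welch$parameter    # Welch-Satterthwaite degrees of freedom
pooled$parameter   # n1 + n2 - 2 degrees of freedom

welch$conf.int     # interval without pooling
pooled$conf.int    # interval with pooled standard deviation
```

The pooled version always uses \(n_1 + n_2 - 2\) degrees of freedom, while the Welch degrees of freedom are never larger than that.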