First, make sure the mosaic package is installed.

install.packages('mosaic')

Next, we load the population data, which consists of ALL Chicago taxi rides on a single day: the trip, the fare charged, and how the fare was paid.

Taxi <- read.csv("https://sullystats.github.io/Statistics6e/Data/ChicagoTaxi.csv")
head(Taxi,n=4)
##   Trip  Fare Payment
## 1  300  6.50    Cash
## 2 1281 42.25  Credit
## 3  780 10.75    Cash
## 4  900 17.00  Credit

We will focus on the variable `Payment`, the payment method, which records how the fare is paid: cash or credit.

Let’s look at the distribution of this variable and get some summary statistics.

library(mosaic)
options(digits=3)
tally(~Payment,format="proportion", data=Taxi)
## Payment
##   Cash Credit 
##  0.523  0.477
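If mosaic is not available, base R gives the same proportions with `table()` and `prop.table()`. The sketch below uses a small made-up vector `payment` (hypothetical data, not the Chicago file) to show the idea; with the real data the analogous call would be `prop.table(table(Taxi$Payment))`.

```r
# Hypothetical payment methods for 10 rides (illustrative data only)
payment <- c("Cash", "Credit", "Cash", "Cash", "Credit",
             "Credit", "Cash", "Credit", "Cash", "Cash")

# Count each category, then convert the counts to proportions
counts <- table(payment)
props <- prop.table(counts)
props

# A proportion is also the mean of a TRUE/FALSE (1/0) indicator
mean(payment == "Cash")
```

Both approaches give 0.6 for cash here, since 6 of the 10 hypothetical rides are cash.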

The population proportion of fares paid with cash is \(p = 0.523\).

Now, let’s take a random sample of n = 50 rides from this data set and determine the sample proportion of fares paid with cash.

tally(~Payment,format="proportion",data=sample(Taxi,50))   # Find the sample proportion of a sample of size 50
## Payment
##   Cash Credit 
##   0.52   0.48
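Here mosaic's `sample()` draws 50 rows of the data frame at random, without replacement. As a base-R sketch of the same step, assuming a hypothetical population data frame `pop` in place of `Taxi`, we can pick 50 distinct row numbers and compute \(\hat{p}\) from those rows:

```r
set.seed(1)  # makes the random draw reproducible
# Hypothetical population of 1000 rides: 520 cash, 480 credit (illustrative only)
pop <- data.frame(Payment = rep(c("Cash", "Credit"), times = c(520, 480)))

# Draw n = 50 distinct row numbers (sampling without replacement)
n <- 50
rows <- sample(nrow(pop), size = n)
samp <- pop[rows, , drop = FALSE]

# Sample proportion of cash payments, p-hat
p_hat <- mean(samp$Payment == "Cash")
p_hat
```

Rerunning without `set.seed()` gives a different `p_hat` each time, which is exactly the sample-to-sample variability seen in the two samples above.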

Let’s take another random sample of n = 50 rides from this data set and determine the sample proportion of fares paid with cash.

tally(~Payment,format="proportion",data=sample(Taxi,50))   # Find the sample proportion of a sample of size 50
## Payment
##   Cash Credit 
##   0.56   0.44
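
To approximate the sampling distribution of \(\hat{p}\), repeat this process 5000 times, recording the sample proportions from each random sample of n = 50 rides.

SamplingDist <- bind_rows(do(5000)*c(prop = tally(~Payment,format="proportion",data=sample(Taxi,50))))   # 5000 sample proportions, each from a sample of size 50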
head(SamplingDist,n=4)
##   prop.Cash prop.Credit
## 1      0.44        0.56
## 2      0.44        0.56
## 3      0.40        0.60
## 4      0.58        0.42
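The `do(5000) *` construction repeats the sampling step 5000 times and stacks the results. A base-R sketch of the same simulation uses `replicate()`; the population `pop` below is a hypothetical stand-in with p = 0.52, not the taxi data, and `p_hats` is an illustrative name.

```r
set.seed(2)
# Hypothetical population: 1000 rides, 520 of them cash (p = 0.52)
pop <- rep(c("Cash", "Credit"), times = c(520, 480))

# Repeat 5000 times: draw 50 rides without replacement, record p-hat
p_hats <- replicate(5000, mean(sample(pop, size = 50) == "Cash"))

head(p_hats, 4)
mean(p_hats)  # center of the simulated sampling distribution
sd(p_hats)    # spread of the simulated sampling distribution
```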

Notice that the sample proportion of cash payments (prop.Cash) varies from sample to sample. Now, let’s look at the shape, center, and spread of the sampling distribution of \(\hat{p}\).

gf_histogram(~prop.Cash, data=SamplingDist, binwidth=0.02,
             color="black", fill="blue",
             xlab="Sample Proportion of Cash Payments", ylab="Frequency",
             title="Distribution of Sample Proportion of Cash Payments for Taxi Rides in Chicago")

mean(~prop.Cash,data=SamplingDist)
## [1] 0.524
sd(~prop.Cash,data=SamplingDist)
## [1] 0.0725
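The histogram is the main evidence about shape. As a rough numerical companion, the sketch below (again assuming a hypothetical population `pop` of 1000 rides with p = 0.52 in place of the taxi data) applies the Empirical Rule: if \(\hat{p}\) is approximately normal, about 68% of simulated sample proportions should lie within one standard deviation of their mean and about 95% within two.

```r
set.seed(3)
# Hypothetical population: 1000 rides, 520 cash and 480 credit (illustrative only)
pop <- rep(c("Cash", "Credit"), times = c(520, 480))

# 5000 sample proportions from samples of n = 50 drawn without replacement
p_hats <- replicate(5000, mean(sample(pop, size = 50) == "Cash"))

m <- mean(p_hats)
s <- sd(p_hats)

# Share of sample proportions within 1 and 2 standard deviations of the mean
within1 <- mean(abs(p_hats - m) <= 1 * s)
within2 <- mean(abs(p_hats - m) <= 2 * s)
c(within1 = within1, within2 = within2)
```

Because \(\hat{p}\) can only take values in steps of 1/50 = 0.02, these shares will be near, not exactly equal to, 0.68 and 0.95.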

The shape of the sampling distribution of \(\hat{p}\) is approximately normal because \(np(1-p) = 50(0.523)(0.477) \approx 12.5 \geq 10\). The mean and standard deviation of the sampling distribution of \(\hat{p}\) are \(\mu_{\hat{p}} = p\) and \(\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}\).
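Plugging the population values into these formulas gives numbers to compare with the simulation. This sketch uses p = 0.523 from the population tally and n = 50; it checks the normality condition and computes the theoretical mean and standard deviation of \(\hat{p}\) (`mu_phat` and `sigma_phat` are illustrative names).

```r
p <- 0.523  # population proportion of cash fares (from the tally above)
n <- 50     # sample size

# Normality condition: n p (1 - p) should be at least 10
cond <- n * p * (1 - p)
cond        # about 12.5
cond >= 10  # TRUE

# Theoretical mean and standard deviation of p-hat
mu_phat <- p
sigma_phat <- sqrt(p * (1 - p) / n)
sigma_phat  # about 0.0706
```

These theoretical values (0.523 and about 0.0706) are close to the simulated mean (0.524) and standard deviation (0.0725) of `prop.Cash`.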