Tests for Independence and Homogeneity of Proportions

Install the Mosaic package, if necessary.

install.packages("mosaic")

Chi-Square Test for Independence: Summarized Data

Enter the data in Table 4 from Section 12.2. Use the do() command. Notice the row variable is Happiness and the column variable is Marital_Status. Name the data Table4_data.

Then use the tally command to convert Table4_data into a contingency table named Table4.

library(mosaic)
Table4_data <- rbind(
  do(600)*data.frame(Happiness="Very Happy",Marital_Status="Married"),
  do(63)*data.frame(Happiness="Very Happy",Marital_Status="Widowed"),
  do(112)*data.frame(Happiness="Very Happy",Marital_Status="Divorced/Separated"),
  do(144)*data.frame(Happiness="Very Happy",Marital_Status="Never Married"),
  do(720)*data.frame(Happiness="Pretty Happy",Marital_Status="Married"),
  do(142)*data.frame(Happiness="Pretty Happy",Marital_Status="Widowed"),
  do(355)*data.frame(Happiness="Pretty Happy",Marital_Status="Divorced/Separated"),
  do(459)*data.frame(Happiness="Pretty Happy",Marital_Status="Never Married"),
  do(93)*data.frame(Happiness="Not Too Happy",Marital_Status="Married"),
  do(51)*data.frame(Happiness="Not Too Happy",Marital_Status="Widowed"),
  do(119)*data.frame(Happiness="Not Too Happy",Marital_Status="Divorced/Separated"),
  do(127)*data.frame(Happiness="Not Too Happy",Marital_Status="Never Married")
)
Table4 <- tally(~Happiness+Marital_Status,data=Table4_data)
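
When the counts are already summarized, the same contingency table can also be entered directly as a matrix instead of being expanded row by row with do(). The following is a minimal base R sketch of that alternative, not the approach used in this guide. The object name Table4_matrix is chosen here only for illustration, and the rows and columns are listed in the alphabetical order that tally produces.

```r
# Counts from Table 4, entered row by row (rows = Happiness, columns = Marital_Status).
# Table4_matrix is an illustrative name, not an object used elsewhere in this guide.
Table4_matrix <- matrix(
  c(119,  93, 127,  51,   # Not Too Happy
    355, 720, 459, 142,   # Pretty Happy
    112, 600, 144,  63),  # Very Happy
  nrow = 3, byrow = TRUE,
  dimnames = list(
    Happiness = c("Not Too Happy", "Pretty Happy", "Very Happy"),
    Marital_Status = c("Divorced/Separated", "Married", "Never Married", "Widowed")
  )
)

Table4_matrix        # display the table
sum(Table4_matrix)   # total number of individuals in the table
```

A matrix built this way can be passed to chisq.test or prop.table in the same way as the table produced by tally.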

Run xchisq.test on the contingency table Table4. Recall that xchisq.test reports the expected counts, each cell's contribution to the \(\chi^2\) test statistic, and the Pearson residuals.

xchisq.test(Table4)

## 
##  Pearson's Chi-squared test
## 
## data:  x
## X-squared = 224.12, df = 6, p-value < 2.2e-16
## 
##   119       93      127       51   
## ( 76.56) (184.61) ( 95.38) ( 33.45)
## [23.522] [45.462] [10.485] [ 9.212]
## < 4.85>  <-6.74>  < 3.24>  < 3.04> 
##        
##   355      720      459      142   
## (329.02) (793.36) (409.88) (143.74)
## [ 2.051] [ 6.784] [ 5.888] [ 0.021]
## < 1.43>  <-2.60>  < 2.43>  <-0.14> 
##        
##   112      600      144       63   
## (180.41) (435.02) (224.75) ( 78.82)
## [25.943] [62.564] [29.011] [ 3.174]
## <-5.09>  < 7.91>  <-5.39>  <-1.78> 
##        
## key:
##  observed
##  (expected)
##  [contribution to X-squared]
##  <Pearson residual>

The test statistic is \(\chi^2_0 = 224.12\) with 6 degrees of freedom, and the P-value is less than 2.2e-16 (very small).
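
To see where these numbers come from, the sketch below recomputes the expected counts, the cell contributions, the Pearson residuals, and the test statistic by hand in base R. It assumes the Table 4 counts are re-entered as a plain matrix; the names observed, expected, contribution, and chi_sq are illustrative only.

```r
# Observed counts from Table 4 (rows: Not Too Happy, Pretty Happy, Very Happy;
# columns: Divorced/Separated, Married, Never Married, Widowed).
observed <- matrix(
  c(119,  93, 127,  51,
    355, 720, 459, 142,
    112, 600, 144,  63),
  nrow = 3, byrow = TRUE
)

n <- sum(observed)

# Expected count for each cell = (row total * column total) / table total.
expected <- outer(rowSums(observed), colSums(observed)) / n

# Each cell's contribution to the test statistic and its Pearson residual.
contribution <- (observed - expected)^2 / expected
residual     <- (observed - expected) / sqrt(expected)

# Test statistic, degrees of freedom, and right-tail P-value.
chi_sq  <- sum(contribution)
df      <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

round(expected, 2)
round(contribution, 3)
round(residual, 2)
round(chi_sq, 2)
df
p_value
```

The rounded values should agree with the corresponding rows of the xchisq.test output above.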

Conditional Distribution and Bar Graph

Now, let’s construct a conditional distribution from Table 4.

Recall, to create a conditional distribution in R, use the following command:

variable <- prop.table(table, 1 or 2)

Note: Use 1 to condition by the row variable; use 2 to condition by the column variable.

We are treating marital status as the explanatory variable (the column variable), so use 2 in the syntax.

Table4_condition <- prop.table(Table4,2)
Table4_condition

##                Marital_Status
## Happiness       Divorced/Separated    Married Never Married    Widowed
##   Not Too Happy         0.20307167 0.06581741    0.17397260 0.19921875
##   Pretty Happy          0.60580205 0.50955414    0.62876712 0.55468750
##   Very Happy            0.19112628 0.42462845    0.19726027 0.24609375

The conditional distribution shows that married individuals are much more likely to be “Very Happy” (about 42%) than individuals in the other marital-status groups (about 19% to 25%).
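
Conditioning on the column variable amounts to dividing each cell by its column total, so every column of the result sums to 1. The following base R sketch illustrates that arithmetic, assuming the Table 4 counts are re-entered as a matrix; the names counts and by_column are used here only for illustration.

```r
# Table 4 counts as a matrix (illustrative object, same values as Table4).
counts <- matrix(
  c(119,  93, 127,  51,
    355, 720, 459, 142,
    112, 600, 144,  63),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    Happiness = c("Not Too Happy", "Pretty Happy", "Very Happy"),
    Marital_Status = c("Divorced/Separated", "Married", "Never Married", "Widowed")
  )
)

# Divide every column by its own total; this is what margin = 2 means.
by_column <- sweep(counts, 2, colSums(counts), "/")

# The manual result matches prop.table with margin = 2.
all.equal(by_column, prop.table(counts, 2))

# Each column is a complete distribution, so the column sums are all 1.
colSums(by_column)

# Proportion "Very Happy" within each marital-status group.
round(by_column["Very Happy", ], 3)
```

Using 1 instead of 2 would divide by the row totals, giving the distribution of marital status within each level of happiness.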

Now that we have the conditional distribution, use the barplot command. The syntax is as follows:

barplot(df_name,beside=TRUE)

Note: cex.names scales the font size of the group labels (values less than 1 shrink them). legend = TRUE adds a legend. ylim=c(0,1.2) extends the y-axis so the legend does not overlay the bars. You should experiment with the limits until you are happy with the graph.

# Each group of bars is one marital status; the bars within a group are the happiness levels.
barplot(Table4_condition, beside = TRUE, cex.names = .7, legend = TRUE, ylim = c(0,1.2),
        main = "Level of Happiness by Marital Status",
        xlab = "Marital Status", ylab = "Relative Frequency",
        col = c('#6897bb', '#c06723', '#baebae'))

Chi-Square Test for Independence: Raw Data

Is there an association between level of education and political philosophy? Open the SullivanStatsSurveyII data file to answer this question.

Survey <- read.csv("https://sullystats.github.io/Statistics6e/Data/SullivanStatsSurveyII.csv")
head(Survey,n=3)

##   Response_id Gender Age    Education Tax.Rate GenderIncomeInequality
## 1      290408 Female  19 Some College       10                     No
## 2      290410 Female  18 Some College       10                    Yes
## 3      290412 Female  21 Some College       10                    Yes
##   MinWageOpinion MinWageAmount Political.Philosophy Text RetirementDollars
## 1            Yes          10.0             Moderate  Yes           1200000
## 2            Yes           9.0         Conservative   No            350000
## 3            Yes           9.5              Liberal  Yes           1000000
##   RetirementAge DeathAge
## 1            65       90
## 2            61      105
## 3            60       90

Now, let’s build a contingency table using the variables “Education” and “Political Philosophy”.

ContTable <- tally(~Education+Political.Philosophy,data=Survey)
ContTable

##                                  Political.Philosophy
## Education                         Conservative Liberal Moderate
##   Bachelor's Degree                          8       3       18
##   Graduate or Professional Degree            8       4       14
##   High School Diploma                        4       2        7
##   Some College                              17      20       29
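
The formula ~Education+Political.Philosophy tells R to cross-classify every row of the data by those two variables and count the rows in each combination. As a rough illustration, the base R function xtabs accepts the same kind of formula. The sketch below applies it to only the three survey rows shown by head above, so the data frame first_rows and the table small_table are illustrative objects, not part of the analysis.

```r
# The Education and Political.Philosophy values of the first three survey rows,
# copied from the head(Survey, n = 3) output above.
first_rows <- data.frame(
  Education = c("Some College", "Some College", "Some College"),
  Political.Philosophy = c("Moderate", "Conservative", "Liberal")
)

# Cross-classify the rows: one count per Education/Political.Philosophy pair.
small_table <- xtabs(~ Education + Political.Philosophy, data = first_rows)
small_table
```

With the full Survey data frame, the same formula yields the 4-by-3 table shown above.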

xchisq.test(ContTable)

## Warning in chisq.test(x = x, y = y, correct = correct, p = p, rescale.p =
## rescale.p, : Chi-squared approximation may be incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  x
## X-squared = 6.3355, df = 6, p-value = 0.3867
## 
##     8        3       18   
## ( 8.01)  ( 6.28)  (14.72) 
## [7.0e-06] [1.7e+00] [7.3e-01]
## <-0.0026> <-1.3077> < 0.8559>
##      
##     8        4       14   
## ( 7.18)  ( 5.63)  (13.19) 
## [9.4e-02] [4.7e-01] [4.9e-02]
## < 0.3064> <-0.6858> < 0.2219>
##      
##     4        2        7   
## ( 3.59)  ( 2.81)  ( 6.60) 
## [4.7e-02] [2.4e-01] [2.5e-02]
## < 0.2166> <-0.4850> < 0.1569>
##      
##    17       20       29   
## (18.22)  (14.28)  (33.49) 
## [8.2e-02] [2.3e+00] [6.0e-01]
## <-0.2867> < 1.5125> <-0.7763>
##      
## key:
##  observed
##  (expected)
##  [contribution to X-squared]
##  <Pearson residual>
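
R issues the warning above whenever at least one expected count is less than 5. A commonly used rule of thumb is less strict: all expected counts should be at least 1, and no more than 20% of them should be less than 5. The sketch below checks that rule in base R, assuming the counts from ContTable are re-entered as a matrix; the names survey_counts, result, and expected are illustrative only.

```r
# Counts copied from ContTable (rows = Education, columns = Political.Philosophy).
survey_counts <- matrix(
  c( 8,  3, 18,   # Bachelor's Degree
     8,  4, 14,   # Graduate or Professional Degree
     4,  2,  7,   # High School Diploma
    17, 20, 29),  # Some College
  nrow = 4, byrow = TRUE,
  dimnames = list(
    Education = c("Bachelor's Degree", "Graduate or Professional Degree",
                  "High School Diploma", "Some College"),
    Political.Philosophy = c("Conservative", "Liberal", "Moderate")
  )
)

# suppressWarnings hides the "approximation may be incorrect" warning
# so the expected counts can be inspected directly.
result   <- suppressWarnings(chisq.test(survey_counts))
expected <- result$expected

round(expected, 2)
min(expected) >= 1     # are all expected counts at least 1?
sum(expected < 5)      # number of cells with an expected count below 5
mean(expected < 5)     # proportion of such cells; compare with 0.20
```

Only the two smallest cells in the High School Diploma row have expected counts below 5, which is under the 20% threshold of this rule of thumb. With a P-value of 0.3867, these data do not provide sufficient evidence of an association between level of education and political philosophy at the usual levels of significance.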