## Interactive Stats 2e

George Woodbury and I have been very busy these past few months putting together the revision of the innovative Interactive Statistics text.  This text is written entirely in MyStatLab with the mantra "read a little, watch a little, do a little": we expect students to learn through text, video, and engaging, interactive discovery exercises.  Both George and I have been using the first edition in our classes (both online and face-to-face using a flipped model).  Our experiences with the first edition, along with insights from faculty around the country, shaped the revision plan for the second edition.

The first edition was a great success: George and I both saw increases in pass rates in our courses.  But we weren't the only ones who had success.  Read the case study from Sam Bazzi at Henry Ford College.

Below is a list of some of the new features that you will find in the second edition.

• Excel Video Solutions
• New Lightboard Video
• Updated MyStatLab Exercises
• New Material on Bootstrapping, Simulation & Randomization Methods

## New York Times Monthly Stats Feature

The American Statistical Association has partnered with the New York Times Learning Network to promote student understanding of graphs.  The goal of the feature is to help students understand and think critically about graphs.  The feature will be called

What’s Going on in this Graph?

Here is a link to the announcement:

https://www.nytimes.com/2017/09/06/learning/announcing-a-new-monthly-feature-whats-going-on-in-this-graph.html

The feature will be published starting September 19 and continuing the second Tuesday of the following months (October 10, November 14, and so on) through the end of the school year in May.

Leaders in statistical education will lead discussions on the date of the release from 9 am-2 pm around the following questions:

*   What do you notice?
*   What do you wonder?
*   What’s going on in this graph?

This is a great activity to pursue in your classes.

## Quartiles and Boxplots

I just finished teaching Chapter 3 of my text today.  To introduce the idea of quartiles and boxplots, I used two data sets.  The first is from the PayScale ROI Report.  The data set includes annual return on investment, total four-year cost, graduation rate, and other variables for all colleges and universities throughout the country.  The data is available at

http://www.payscale.com/college-roi

I also uploaded the data to StatCrunch.  Search for “PayScale_ROI_2017” under Explore > Data.

I used the ROI data to find quartiles, identify outliers (very interesting), and draw boxplots.  By selecting this data, I was able to discuss one of the many factors a student should consider in selecting a college or university.
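
For readers who want to try the same steps outside StatCrunch, here is a minimal Python sketch of finding quartiles and flagging outliers with the 1.5 × IQR fences. The `roi` values are made up for illustration and are not taken from the PayScale report, and the function names are my own. Quartile conventions differ between tools, so the quartiles from `statistics.quantiles` may not match a textbook or StatCrunch exactly.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    # method="inclusive" is one of several quartile conventions
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

def outliers_by_fences(values):
    """Flag values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Made-up annual ROI percentages, NOT from the PayScale report
roi = [3.1, 4.0, 4.4, 5.2, 5.5, 5.9, 6.3, 6.8, 7.4, 12.9]

print(five_number_summary(roi))
print(outliers_by_fences(roi))
```

The five-number summary is what a boxplot draws, and the values returned by `outliers_by_fences` are the points a boxplot would mark individually.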

The second data set I used was from the data archives in the City of Chicago.  This data set lists every employee in the city, their department, employment status, and annual salary.  I used this data to draw side-by-side boxplots by department.  Again, there were many outliers that we were able to explore.

In both instances, students appreciated the fact that the data sets were obtained “on the fly”.  They definitely appreciated the “realness” of the data.

I very much encourage you to use these data sets in your classes and also find other data sets that motivate your students.

## First Week of Classes

Classes have already started here at Joliet Junior College.  Last week we had our opening session.  The speaker at the session was Valencia Community College’s president, Sandy Shugart.  He is a very talented speaker and I took many lessons away from his session.  However, there was one point he made that hit me more than any of the others.

President Shugart pointed out that our craft is one where we get a “do-over” every semester.  Think about that.  Most of your students this semester will be meeting you for the first time.  In addition, this may be their first exposure to the material you have been teaching for many years.  You have an opportunity to adjust your course based on previous semesters’ feedback and experiences (and you should).

So, take the opportunity this semester (and all future semesters) to reflect on the newness of the material to your students.  Try to make the material fresh and exciting.  Share your passion for the subject and be the best that you can be for your students.

## Introduction to Hypothesis Tests on a Population Proportion

One of the most difficult concepts for students to grasp is that of a P-value.  The video below was recorded in my Introductory Statistics class at Joliet Junior College.  To introduce my students to P-values, I simulate drawing many (5000) samples from a population built from the statement in the null hypothesis (this is called the null model).  From the simulation, students determine the relative frequency with which a sample statistic as extreme as or more extreme than the one observed occurs (when sampling from a population in which the proportion of individuals with the characteristic is the value stated in the null hypothesis).

We then compare the simulated result to the P-value obtained from the normal model.

Finally, we increase the sample size to see the role sample size plays in the standard error, P-value, and ability to reject the statement in the null hypothesis.
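
The three steps above can be sketched in Python. This is only an illustration under assumptions I am making up: a right-tailed test of H0: p = 0.5, an observed sample proportion of 0.55, and sample sizes of 100 and 400. The function names are hypothetical, and the classroom simulation in the video may be set up differently.

```python
import math
import random

def simulated_p_value(p0, n, observed_count, reps=5000, seed=1):
    """Right-tailed P-value estimated by simulating samples from the null model."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        # One simulated sample of size n from a population with proportion p0
        count = sum(rng.random() < p0 for _ in range(n))
        if count >= observed_count:
            extreme += 1
    return extreme / reps

def normal_p_value(p0, n, observed_count):
    """Right-tailed P-value from the normal model for a sample proportion."""
    p_hat = observed_count / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    return 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)

# Assumed scenario: same sample proportion (0.55) at two sample sizes
for n, x in ((100, 55), (400, 220)):
    print(n, simulated_p_value(0.5, n, x), normal_p_value(0.5, n, x))
```

With the sample proportion held fixed, the larger sample has a smaller standard error, so both the simulated and the normal-model P-values shrink as the sample size grows.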

Following this approach will increase your students’ conceptual understanding of P-values.  I have other suggestions for simulations in the Activity Workbook that accompanies all the texts in the Sullivan Statistics series.

## Student’s t Distribution

Do your students understand why we use Student’s t-distribution when performing inference on a mean when the population standard deviation is unknown?  I have uploaded a video to YouTube that I recorded in my Intro Stats class.  The video uses simulation to motivate Student’s t-distribution.
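
One way to sketch such a simulation in Python (not necessarily the one used in the video) is to draw many samples from a normal population, standardize each sample mean using the sample standard deviation s in place of σ, and see how often the result falls beyond ±1.96. Under the standard normal model that would happen about 5% of the time. The population mean of 100 and standard deviation of 15 are assumptions chosen only for illustration.

```python
import math
import random
import statistics

def t_statistics(n, reps=5000, mu=100.0, sigma=15.0, seed=1):
    """Simulate (x_bar - mu) / (s / sqrt(n)) for samples from a normal population."""
    rng = random.Random(seed)
    results = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        s = statistics.stdev(sample)  # sample standard deviation replaces sigma
        results.append((x_bar - mu) / (s / math.sqrt(n)))
    return results

def share_beyond(values, cutoff=1.96):
    """Proportion of values farther than cutoff from 0."""
    return sum(abs(v) > cutoff for v in values) / len(values)

for n in (5, 30):
    print(n, share_beyond(t_statistics(n)))
```

For small samples the share beyond ±1.96 is noticeably larger than 5%, which is the heavier-tail behavior that Student’s t-distribution accounts for; as n grows the share moves toward 5%.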

## Replication and the Media

The following research article published in PLoS ONE describes the role of replication in research articles as well as the lack of media follow-up on research.  I recommend that you ask your students to read this article and have a discussion about the role the media plays in delivering scientific research.  Does the media have an obligation to follow up its own reports to confirm their validity?

Here are some highlights that I garnered from the article.

A key idea to understand is that the P-value from a hypothesis test is computed from the sample data.  Therefore, the P-value is really the P-value based on the sample data in the study.  We know that sample data will vary from sample to sample, so the P-value itself will vary from sample to sample.

P-values are getting a lot of press due to the fact that some initial studies that report statistically significant results cannot be replicated in subsequent studies.

Of particular interest is that most newspapers focus on so-called lifestyle studies.  These studies pertain to associations between a pathology and a risk factor (such as the risk of lung cancer for smokers versus nonsmokers, or the risk of colon cancer associated with red meat consumption).  Contrast these types of studies with non-lifestyle studies (the effect of a new medication on depression).

In lifestyle studies, newspapers reported on 5 of the 39 initial studies and 58 of the 600 follow-up studies.  In non-lifestyle studies, newspapers reported on 48 of the 366 initial studies and 45 of the 3718 follow-up studies.  What does this suggest?

Replication of results is important.  Among the 156 primary studies reported by newspapers, 76 had results that were validated by subsequent analysis.  Does this suggest that fewer than half of the initial studies reported by newspapers have their results validated by subsequent analysis?
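
To make the comparison concrete, the counts quoted above can be turned into rates. The short Python sketch below uses only the numbers as stated in this post; the labels and variable names are my own.

```python
# (studies covered by newspapers, total studies), as quoted in this post
coverage = {
    "lifestyle, initial": (5, 39),
    "lifestyle, follow-up": (58, 600),
    "non-lifestyle, initial": (48, 366),
    "non-lifestyle, follow-up": (45, 3718),
}

rates = {label: covered / total for label, (covered, total) in coverage.items()}
for label, rate in rates.items():
    print(f"{label}: {rate:.1%}")

# Share of newspaper-reported primary studies later validated
validated_share = 76 / 156
print(f"validated: {validated_share:.1%}")
```

Students can then discuss which coverage rates differ most and whether the validated share falls below one half.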

Here is where it gets scary.  Among 53 initial studies covered by newspapers, 18 were confirmed by subsequent meta-analysis.  Among the 35 that were not confirmed, there were 503 studies of which 398 reported either the absence of a statistically significant effect, or a significant effect in the opposite direction.  Only 1 of the 398 studies and only 1 of the 35 meta-analysis studies was covered by newspapers.

Citation: Dumas-Mallet E, Smith A, Boraud T, Gonon F (2017) Poor replication validity of biomedical association studies reported by newspapers. PLoS ONE 12(2): e0172650. doi:10.1371/journal.pone.0172650

## ICTCM

I will be attending the International Conference on Teaching Collegiate Mathematics in Chicago.  The event is March 9 through March 12.

I am scheduled to give two talks at the conference.

1. My Positive Experience Using an Engaging and Interactive Statistics Program Online

I will be presenting with Sam Bazzi of Henry Ford College.

2. Using Simulation and the Bootstrap to Introduce Hypothesis Tests on the Mean

## Mathematics of Love

In this weekend’s Wall Street Journal, there is an article entitled “In Love, Formula Suggests Only Fools Rush In”. The article looks into the marriage problem, first analyzed in Scientific American in 1960. The idea is essentially a decision about whether you should stay with the current individual you are courting, or dump that individual for someone else who may be a better option. The risk in staying with the individual you are currently courting is that there is someone “better” out there. However, the risk in moving on to another option is that you may be giving up your best option. What to do?

Why not have a little fun with this problem? Let’s say you have three options for a mate. Call them Option 1 (Ideal Mate), Option 2 (Acceptable Mate), Option 3 (Worst Choice). Suppose your selection strategy in choosing a mate is to choose the first person you meet. If you meet the individuals in random order, what is the probability you choose your ideal mate? To answer this question, you could simply lay out the possibilities.

*   Option 1, Option 2, Option 3
*   Option 1, Option 3, Option 2
*   Option 2, Option 1, Option 3
*   Option 2, Option 3, Option 1
*   Option 3, Option 1, Option 2
*   Option 3, Option 2, Option 1

Notice that the ideal mate comes first in 2 of the 6 orderings, so the probability of selecting your ideal mate first is 1/3. Can the chances of choosing the best mate improve with a new selection strategy? Let’s try a different strategy.

Suppose you cannot choose the first person. The first person becomes the baseline for choosing your mate. If the next person you meet is an improvement on the previous person, you go with that option; if nobody is an improvement, you are left with the last person you meet. Here are the outcomes for that scenario. In the first possibility you end up with Option 3 because Option 2 is not better than Option 1, so you move on and end up with the Worst Choice.

*   Option 1, Option 2, Option 3: you end up with Option 3
*   Option 1, Option 3, Option 2: you end up with Option 2
*   Option 2, Option 1, Option 3: you choose Option 1
*   Option 2, Option 3, Option 1: you choose Option 1
*   Option 3, Option 1, Option 2: you choose Option 1
*   Option 3, Option 2, Option 1: you choose Option 2

Notice you end up choosing the ideal mate in 3 of the 6 orderings, or 50% of the time! Not going with the first choice appears to be a good strategy. Is there an optimal strategy?
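
Both probabilities can be checked by brute force. The Python sketch below lists all six meeting orders and applies each strategy; the function names are my own, and a lower number means a better mate (1 is the ideal mate).

```python
from fractions import Fraction
from itertools import permutations

def pick_first(order):
    """Strategy 1: choose the first person you meet."""
    return order[0]

def skip_first_then_improve(order):
    """Strategy 2: pass on the first person, then take the first person who
    improves on the baseline; if nobody does, take the last person."""
    baseline = order[0]
    for candidate in order[1:]:
        if candidate < baseline:  # lower number = better mate
            return candidate
    return order[-1]

def chance_of_ideal(strategy, n=3):
    """Probability the strategy ends with Option 1 over all meeting orders."""
    orders = list(permutations(range(1, n + 1)))
    wins = sum(strategy(order) == 1 for order in orders)
    return Fraction(wins, len(orders))

print(chance_of_ideal(pick_first))
print(chance_of_ideal(skip_first_then_improve))
```

Students can change `n` to see how each strategy fares with more than three options.
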

One strategy suggests rejecting the first 37% of potential mates is a decent way to go. In fact, it only takes a set of 20 potential mates using the 37% rejection rule to obtain a probability of finding the ideal mate equal to 0.38. So, if you have 20 potential mates, you should date the first 0.37(20) ≈ 7 of them with the goal of learning the traits you desire in a lifelong mate. You are not allowed to choose any of these 7 as your lifelong mate, however. After dating the seven test cases and learning the traits you desire, run a simple comparison test. Begin by comparing the 8th potential mate with the previous 7. If the 8th is superior to all of the previous 7, choose the 8th; if any of the previous 7 are thought to be superior, move on to the 9th potential mate and repeat the process. While dating the 9th potential mate, decide if any of the prior 8 are superior to the 9th. If so, move on; otherwise choose the 9th as your soul-mate. Continue until you have found your lifelong mate.
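
The 0.38 figure can be checked with the standard formula for this problem: if you reject the first k of n potential mates and then choose the first one who beats everyone before, the probability of ending up with the best is (k/n)(1/k + 1/(k+1) + ... + 1/(n-1)). Here is a minimal Python sketch of that calculation; the function name is my own.

```python
from fractions import Fraction

def chance_of_best(n, k):
    """Probability of choosing the best of n candidates when the first k are
    rejected and the next candidate who beats all predecessors is chosen."""
    if k == 0:
        return Fraction(1, n)  # no rejection phase: you simply take the first person
    return Fraction(k, n) * sum(Fraction(1, j) for j in range(k, n))

n = 20
print(float(chance_of_best(n, 7)))

# Number of initial rejections that maximizes the probability for n = 20
best_k = max(range(n), key=lambda k: chance_of_best(n, k))
print(best_k)
```

With n = 3 and k = 1 the same formula gives 1/2, which agrees with the three-option example.
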
I mentioned this approach in my stats class and one of my students mentioned that this is how the online dating world works! The mathematics of love on this Valentine’s Day.

## Data Models versus Algorithmic Models

Here is an interesting article on the differences between Data Models (the typical models used in an Intro Stats course) and Algorithmic Models (the type of models Data Scientists use).

Statistical Models: Two Cultures