Chi-Square Goodness of Fit

class: center, middle, inverse, title-slide

# Chi-Square Goodness of Fit

---

## Announcements

* Office hours

---

## Last week

Critiques of NHST

* The focus on _p_-values leads to what kinds of problems in the scientific literature?

* What evidence is there of these issues?

* What can we do about it?

---

## Today...

* The chi-square goodness-of-fit test
* One-sample *t*-tests

---

### Key questions:

* How do we know if category frequencies are consistent with null hypothesis expectations?

* How do we handle categories with very low frequencies?

* How do we compare one sample to a population mean?

---

# What are the steps of NHST?

1. Define null and alternative hypothesis.

2. Set and justify alpha level.

3. Determine which sampling distribution to use ( `$z$`, `$t$`, or `$\chi^2$` for now).

4. Calculate parameters of your sampling distribution under the null.
  * If `$z$`, calculate `$\mu$` and `$\sigma_M$`

5. Calculate test statistic under the null.
  * If `$z$`, `$\frac{\bar{X} - \mu}{\sigma_M}$`
  
--

6. Calculate probability of that test statistic or more extreme under the null, and compare to alpha.
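
As a rough sketch, the six steps can be walked through for a one-sample `$z$`-test in R. All of the numbers here (a population mean of 100, a population SD of 15, and a sample of 25 with a mean of 106) are made up for illustration and do not come from the Census at School data.

```r
# Steps 1-2: H0: mu = 100, H1: mu != 100, alpha = .05
alpha <- 0.05
mu    <- 100   # hypothetical population mean under the null
sigma <- 15    # hypothetical population standard deviation
n     <- 25    # hypothetical sample size
x_bar <- 106   # hypothetical sample mean

# Steps 3-4: z sampling distribution, with mean mu and standard error sigma_M
sigma_M <- sigma / sqrt(n)

# Step 5: test statistic under the null
z <- (x_bar - mu) / sigma_M

# Step 6: probability of a statistic this extreme or more (two-tailed),
# compared to alpha
p_val <- 2 * pnorm(abs(z), lower.tail = FALSE)
p_val < alpha
```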

---

One-sample tests compare your given sample with a "known" population.

* Research question: does this sample come from this population?

**Hypotheses**

`$H_0$`: Yes, this sample comes from this population.

`$H_1$`: No, this sample comes from a different population.

---

The [sample data](../data/census_at_school.csv) were obtained from Census at School, a website developed by the American Statistical Association to help students in the 4th through 12th grades understand statistical problem-solving.

* The site sponsors a survey that students can complete and a database that students and instructors can use to illustrate principles in quantitative methods.  
  
  * The database includes students from all 50 states, from grade levels 4 through 12, both boys and girls,  who have completed the survey dating back to 2010.

---

We’ll focus on this survey question:

Which of the following superpowers would you most like to have? Select one.

* Invisibility
* Telepathy (read minds)
* Freeze time
* Super strength
* Fly

The responses from 200 randomly selected Oregon students were obtained from the Census at School database.

---

```r
school %>%
  group_by(Superpower) %>%
  summarize(Frequency = n()) %>%
  mutate(Proportion = Frequency/sum(Frequency)) %>%
  kable(., format = "html", digits = 2)
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Superpower </th>
   <th style="text-align:right;"> Frequency </th>
   <th style="text-align:right;"> Proportion </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Fly </td>
   <td style="text-align:right;"> 42 </td>
   <td style="text-align:right;"> 0.21 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Freeze time </td>
   <td style="text-align:right;"> 58 </td>
   <td style="text-align:right;"> 0.29 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Invisibility </td>
   <td style="text-align:right;"> 30 </td>
   <td style="text-align:right;"> 0.15 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Super strength </td>
   <td style="text-align:right;"> 13 </td>
   <td style="text-align:right;"> 0.06 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Telepathy </td>
   <td style="text-align:right;"> 57 </td>
   <td style="text-align:right;"> 0.28 </td>
  </tr>
</tbody>
</table>

Descriptively, this is interesting. But are the responses unusual or atypical in any way? To answer that question, we need some basis for comparison: a null hypothesis. One option is to ask whether the Oregon preferences differ from those of students in the general population.

---

class: center

![](14-one_sample_files/figure-html/unnamed-chunk-4-1.png)

---

`$H_0$`: Oregon student superpower preferences are similar to the preferences of typical students in the United States.

`$H_1$`: Oregon student superpower preferences are different from the preferences of typical students in the United States.

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Superpower </th>
   <th style="text-align:right;"> OR Observed Proportion </th>
   <th style="text-align:right;"> USA Proportion </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Fly </td>
   <td style="text-align:right;"> 0.21 </td>
   <td style="text-align:right;"> 0.23 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Freeze time </td>
   <td style="text-align:right;"> 0.29 </td>
   <td style="text-align:right;"> 0.25 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Invisibility </td>
   <td style="text-align:right;"> 0.15 </td>
   <td style="text-align:right;"> 0.20 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Super strength </td>
   <td style="text-align:right;"> 0.06 </td>
   <td style="text-align:right;"> 0.08 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Telepathy </td>
   <td style="text-align:right;"> 0.28 </td>
   <td style="text-align:right;"> 0.24 </td>
  </tr>
</tbody>
</table>

---

We can set our alpha ( `$\alpha$` ) level anywhere we like. Let's stick with .05 for convention's sake.

Now we identify our sampling distribution. We'll use the **chi-square** ( `$\chi^2$` ) **distribution** because we're dealing with
* one sample, and 
* a categorical outcome.

This can be a point of confusion: the way you measure the variable determines whether it is categorical or continuous. We can create summary statistics from categorical variables by counting or calculating proportions -- but that makes the summary statistics continuous, *not the outcome variable itself*.

---

## Degrees of freedom

The `$\chi^2$` distribution is a single-parameter distribution defined by its degrees of freedom.

In the case of a **goodness-of-fit test** (like this one), the degrees of freedom are `$\textbf{k-1}$`, where k is the number of groups.

The **degrees of freedom** are the number of genuinely independent things in a calculation. Specifically, they are calculated as the number of quantities in a calculation minus the number of constraints.

What it means in principle is that given a set number of categories (k) and a constraint (the proportions have to add up to 1), I can freely choose numbers for k-1 categories. But for the kth category, there's only one number that will work.
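
A small sketch of that idea in R, using made-up proportions for the first four of five categories (the values in `free_props` are hypothetical, chosen only to illustrate the constraint):

```r
k <- 5

# Freely choose proportions for the first k - 1 categories (hypothetical values)
free_props <- c(0.30, 0.25, 0.20, 0.15)

# The constraint (proportions sum to 1) leaves only one value for the kth category
last_prop <- 1 - sum(free_props)
last_prop

# Quantities (k) minus constraints (1)
df <- k - 1
df
```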

---
.left-column[
.small[
The degrees of freedom are the number of categories (k) minus 1.  Given that the category frequencies must sum to the total sample size, k-1 category frequencies are free to vary; the last is determined.

]
]

```
## [1] 9.487729
```

![](14-one_sample_files/figure-html/unnamed-chunk-6-1.png)

---

## Calculating the `$\chi^2$` test statistic

To compare the Oregon observed frequencies to the US data, we need to calculate the frequencies that would have been expected if Oregon were just like all of the other states.

The expected frequencies under this null model can be obtained by taking each preference category proportion from the US data (the null expectation) and multiplying it by the sample size for Oregon:

`$$E_i = P_iN_{OR}$$`
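
A minimal sketch of this calculation, assuming the US proportions are typed in by hand to the precision printed in these slides. The names `p_usa` and `n_or` are chosen for this sketch only.

```r
# US proportions (the null expectation), entered by hand
p_usa <- c(Fly = 0.23456790, "Freeze time" = 0.25185185,
           Invisibility = 0.19753086, "Super strength" = 0.07901235,
           Telepathy = 0.23703704)

# Oregon sample size
n_or <- 200

# E_i = P_i * N_OR
expected <- p_usa * n_or
round(expected, 2)

# The expected frequencies sum to the sample size
sum(expected)
```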
---
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Superpower </th>
   <th style="text-align:center;"> Observed Freq </th>
   <th style="text-align:center;"> Expected Freq </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Fly </td>
   <td style="text-align:center;"> 42 </td>
   <td style="text-align:center;"> 46.91 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Freeze time </td>
   <td style="text-align:center;"> 58 </td>
   <td style="text-align:center;"> 50.37 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Invisibility </td>
   <td style="text-align:center;"> 30 </td>
   <td style="text-align:center;"> 39.51 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Super strength </td>
   <td style="text-align:center;"> 13 </td>
   <td style="text-align:center;"> 15.80 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Telepathy </td>
   <td style="text-align:center;"> 57 </td>
   <td style="text-align:center;"> 47.41 </td>
  </tr>
</tbody>
</table>

Now what?  We need some way to index differences between these frequencies, preferably one that translates easily into a sampling distribution so that we can sensibly determine how rare or unusual the Oregon data are compared to the US (null) distribution.

---

`$$\chi^2_{df = k-1} = \sum^k_{i=1}\frac{(O_i-E_i)^2}{E_i}$$`

The chi-square goodness of fit (GOF) statistic compares observed and expected frequencies.  It is small when the observed frequencies closely match the expected frequencies under the null hypothesis.  The chi-square distribution can be used to determine the particular `$\chi^2$` value that corresponds to a rare or unusual profile of observed frequencies.
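
To see where the statistic comes from, here is a sketch that computes each category's contribution, `$(O_i-E_i)^2/E_i$`, before summing. The vectors `observed` and `expected` are hand-typed stand-ins for the Oregon frequencies shown in these slides.

```r
# Observed and expected frequencies, entered by hand from the slides
observed <- c(Fly = 42, "Freeze time" = 58, Invisibility = 30,
              "Super strength" = 13, Telepathy = 57)
expected <- c(Fly = 46.91358, "Freeze time" = 50.37037, Invisibility = 39.50617,
              "Super strength" = 15.80247, Telepathy = 47.40741)

# Contribution of each category to the chi-square statistic
contributions <- (observed - expected)^2 / expected
round(contributions, 2)

# The statistic is the sum of the contributions
sum(contributions)
```

Categories whose observed frequency is far from the expected frequency, relative to the size of the expected frequency, contribute the most.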

---

```r
or_observed
```

```
## 
##            Fly    Freeze time   Invisibility Super strength      Telepathy 
##             42             58             30             13             57
```

```r
or_expected
```

```
## 
##            Fly    Freeze time   Invisibility Super strength      Telepathy 
##       46.91358       50.37037       39.50617       15.80247       47.40741
```

---

```r
or_observed
```

```
## 
##            Fly    Freeze time   Invisibility Super strength      Telepathy 
##             42             58             30             13             57
```

```r
or_expected
```

```
## 
##            Fly    Freeze time   Invisibility Super strength      Telepathy 
##       46.91358       50.37037       39.50617       15.80247       47.40741
```

```r
(chi_square = sum((or_observed - or_expected)^2/or_expected))
```

```
## [1] 6.395722
```

---

```r
or_observed
```

```
## 
##            Fly    Freeze time   Invisibility Super strength      Telepathy 
##             42             58             30             13             57
```

```r
or_expected
```

```
## 
##            Fly    Freeze time   Invisibility Super strength      Telepathy 
##       46.91358       50.37037       39.50617       15.80247       47.40741
```

```r
(chi_square = sum((or_observed - or_expected)^2/or_expected))
```

```
## [1] 6.395722
```

```r
(critical_val = qchisq(p = 0.95, df = length(or_expected)-1))
```

```
## [1] 9.487729
```

---

```r
or_observed
```

```
## 
##            Fly    Freeze time   Invisibility Super strength      Telepathy 
##             42             58             30             13             57
```

```r
or_expected
```

```
## 
##            Fly    Freeze time   Invisibility Super strength      Telepathy 
##       46.91358       50.37037       39.50617       15.80247       47.40741
```

```r
(chi_square = sum((or_observed - or_expected)^2/or_expected))
```

```
## [1] 6.395722
```

```r
(critical_val = qchisq(p = 0.95, df = length(or_expected)-1))
```

```
## [1] 9.487729
```

```r
(p_val = pchisq(q = chi_square, df = length(or_expected)-1, lower.tail = F))
```

```
## [1] 0.1714805
```
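
Comparing the statistic to the critical value and comparing the *p*-value to alpha are two routes to the same decision. A short sketch, with the test statistic typed in from the output above (`reject_by_critical` and `reject_by_p` are names made up for this sketch):

```r
alpha      <- 0.05
chi_square <- 6.395722   # test statistic, copied from the output above
df         <- 4          # k - 1 = 5 - 1

critical_val <- qchisq(p = 1 - alpha, df = df)
p_val        <- pchisq(q = chi_square, df = df, lower.tail = FALSE)

# Route 1: is the statistic beyond the critical value?
reject_by_critical <- chi_square > critical_val

# Route 2: is the p-value smaller than alpha?
reject_by_p <- p_val < alpha

# Both are FALSE here, so we retain the null hypothesis
c(reject_by_critical, reject_by_p)
```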

---

.left-column[
.small[
The degrees of freedom are the number of categories (k) minus 1.  Given that the category frequencies must sum to the total sample size, k-1 category frequencies are free to vary; the last is determined.

]
]

![](14-one_sample_files/figure-html/unnamed-chunk-14-1.png)


![](14-one_sample_files/figure-html/unnamed-chunk-15-1.png)

---

```r
p.usa
```

```
## 
##            Fly    Freeze time   Invisibility Super strength      Telepathy 
##     0.23456790     0.25185185     0.19753086     0.07901235     0.23703704
```

```r
chisq.test(x = or_observed, p = p.usa)
```

```
## 
## 	Chi-squared test for given probabilities
## 
## data:  or_observed
## X-squared = 6.3957, df = 4, p-value = 0.1715
```

The Oregon student preferences are not unusual under the null hypothesis (USA preferences).

Note that the `chisq.test` function expects `x` to be a vector of counts. In other words, to use this function, you need to calculate the summary statistic of counts and feed that into the function.
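
One way to get from raw responses to counts is `table()`. In this sketch, `responses` is a hypothetical raw-data vector rebuilt from the counts in these slides (standing in for `school$Superpower`), and `p_usa` is typed in by hand.

```r
# Hypothetical raw data: one response per student
responses <- rep(c("Fly", "Freeze time", "Invisibility", "Super strength", "Telepathy"),
                 times = c(42, 58, 30, 13, 57))

# Summarize the raw data into counts (categories are sorted alphabetically)
counts <- table(responses)
counts

# US proportions, in the same order as the categories in counts
p_usa <- c(0.23456790, 0.25185185, 0.19753086, 0.07901235, 0.23703704)

# Feed the counts, not the raw data, to chisq.test
result <- chisq.test(x = counts, p = p_usa)
result
```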

---

```r
c.test = chisq.test(x = or_observed, p = p.usa)
str(c.test)
```

```
## List of 9
##  $ statistic: Named num 6.4
##   ..- attr(*, "names")= chr "X-squared"
##  $ parameter: Named num 4
##   ..- attr(*, "names")= chr "df"
##  $ p.value  : num 0.171
##  $ method   : chr "Chi-squared test for given probabilities"
##  $ data.name: chr "or_observed"
##  $ observed : 'table' int [1:5(1d)] 42 58 30 13 57
##   ..- attr(*, "dimnames")=List of 1
##   .. ..$ : chr [1:5] "Fly" "Freeze time" "Invisibility" "Super strength" ...
##  $ expected : 'table' num [1:5(1d)] 46.9 50.4 39.5 15.8 47.4
##   ..- attr(*, "dimnames")=List of 1
##   .. ..$ : chr [1:5] "Fly" "Freeze time" "Invisibility" "Super strength" ...
##  $ residuals: 'table' num [1:5(1d)] -0.717 1.075 -1.512 -0.705 1.393
##   ..- attr(*, "dimnames")=List of 1
##   .. ..$ : chr [1:5] "Fly" "Freeze time" "Invisibility" "Super strength" ...
##  $ stdres   : 'table' num [1:5(1d)] -0.82 1.243 -1.688 -0.735 1.595
##   ..- attr(*, "dimnames")=List of 1
##   .. ..$ : chr [1:5] "Fly" "Freeze time" "Invisibility" "Super strength" ...
##  - attr(*, "class")= chr "htest"
```

---

```r
c.test$residuals
```

```
## 
##            Fly    Freeze time   Invisibility Super strength      Telepathy 
##     -0.7173792      1.0750184     -1.5124228     -0.7049825      1.3931982
```
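
These are Pearson residuals, `$(O_i-E_i)/\sqrt{E_i}$`, and their squares sum to the `$\chi^2$` statistic. The sketch below recomputes them by hand from hand-typed counts and proportions. The standardized version, dividing by `$\sqrt{E_i(1-P_i)}$`, reproduces the `stdres` values printed by `str(c.test)`.

```r
# Observed counts and US proportions, entered by hand from the slides
observed <- c(Fly = 42, "Freeze time" = 58, Invisibility = 30,
              "Super strength" = 13, Telepathy = 57)
p_usa <- c(Fly = 0.23456790, "Freeze time" = 0.25185185,
           Invisibility = 0.19753086, "Super strength" = 0.07901235,
           Telepathy = 0.23703704)
expected <- sum(observed) * p_usa

# Pearson residuals: signed, scaled differences between observed and expected
pearson_resid <- (observed - expected) / sqrt(expected)
round(pearson_resid, 3)

# Squared Pearson residuals sum to the chi-square statistic
sum(pearson_resid^2)

# Standardized residuals
std_resid <- (observed - expected) / sqrt(expected * (1 - p_usa))
round(std_resid, 3)
```

The sign shows the direction of the difference: Oregon students chose invisibility less often, and telepathy more often, than expected under the null.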

---

```r
lsr::goodnessOfFitTest(x = as.factor(school$Superpower), p = p.usa)
```

```
## 
##      Chi-square test against specified probabilities
## 
## Data variable:   as.factor(school$Superpower) 
## 
## Hypotheses: 
##    null:        true probabilities are as specified
##    alternative: true probabilities differ from those specified
## 
## Descriptives: 
##                observed freq. expected freq. specified prob.
## Fly                        42       46.91358      0.23456790
## Freeze time                58       50.37037      0.25185185
## Invisibility               30       39.50617      0.19753086
## Super strength             13       15.80247      0.07901235
## Telepathy                  57       47.40741      0.23703704
## 
## Test results: 
##    X-squared statistic:  6.396 
##    degrees of freedom:  4 
##    p-value:  0.171
```

(Note that this function, `goodnessOfFitTest`, takes the raw data, not the vector of counts.)

---

What if we had used the equal proportions null hypothesis?

```r
lsr::goodnessOfFitTest(x = as.factor(school$Superpower))
```

```
## 
##      Chi-square test against specified probabilities
## 
## Data variable:   as.factor(school$Superpower) 
## 
## Hypotheses: 
##    null:        true probabilities are as specified
##    alternative: true probabilities differ from those specified
## 
## Descriptives: 
##                observed freq. expected freq. specified prob.
## Fly                        42             40             0.2
## Freeze time                58             40             0.2
## Invisibility               30             40             0.2
## Super strength             13             40             0.2
## Telepathy                  57             40             0.2
## 
## Test results: 
##    X-squared statistic:  36.15 
##    degrees of freedom:  4 
##    p-value:  <.001
```
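
The same equal-proportions test can be sketched in base R: when `p` is not supplied, `chisq.test` assumes equal probabilities for every category. The `observed` vector is typed in by hand from the counts in these slides.

```r
# Observed counts, entered by hand
observed <- c(Fly = 42, "Freeze time" = 58, Invisibility = 30,
              "Super strength" = 13, Telepathy = 57)

# With no p argument, the null is equal proportions (1/5 per category)
equal_test <- chisq.test(x = observed)

equal_test$expected    # 40 in every category
equal_test$statistic
equal_test$p.value
```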

Why might this be a sensible or useful test?

---
## The usefulness of `$\chi^2$`

How often will you conduct a `$\chi^2$` goodness of fit test on raw data?

* (Probably) never

How often will you come across `$\chi^2$` tests?

* (Probably) a lot!

The goodness of fit test is used to statistically test how well a model fits the data.

---

To calculate the goodness of fit of a model to data, you build a statistical model of the process as you believe it works in the world.

- example: literacy ~ age + parental involvement
  
Then you estimate each subject's predicted value based on your model.

You compare each subject's predicted value to their actual value -- the difference is called the **residual** ( `$\varepsilon$` ).

If your model is a good fit, then

`$$\sum_{i=1}^N\varepsilon_i^2 = \chi^2$$`
which we compare to the distribution of `$\chi^2_{N-p}$` .

Significant chi-square tests suggest the model does not fit -- the data have values that are far away from "expected."

---