class: center, middle, inverse, title-slide

# Correlations

---

## Recap

Correlations are:

- Standardized covariances
  + Range from -1 to 1

- An effect size
  + Measure of the strength of association between two continuous variables

- Calculation:
  - Sum the cross-products of the deviation scores
  - Divide by N-1
  - Divide by the product of the two standard deviations
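
Written as a single formula, the three calculation steps look like this (a restatement of the list above, with `$\bar{x}$` and `$\bar{y}$` as the sample means and `$s_x$` and `$s_y$` as the sample standard deviations):

`$$\large r_{xy} = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{(N-1)\,s_x s_y}$$`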
     
---

### Example

Do Pulitzers help newspapers keep readers? (Data from [FiveThirtyEight](https://fivethirtyeight.com/features/do-pulitzers-help-newspapers-keep-readers/)).

```r
library(fivethirtyeight)
data("pulitzer")
head(pulitzer)
```

```
##             newspaper circ2004 circ2013 pctchg_circ num_finals1990_2003
## 1           USA Today  2192098  1674306         -24                   1
## 2 Wall Street Journal  2101017  2378827          13                  30
## 3      New York Times  1119027  1865318          67                  55
## 4   Los Angeles Times   983727   653868         -34                  44
## 5     Washington Post   760034   474767         -38                  52
## 6 New York Daily News   712671   516165         -28                   4
##   num_finals2004_2014 num_finals1990_2014
## 1                   1                   2
## 2                  20                  50
## 3                  62                 117
## 4                  41                  85
## 5                  48                 100
## 6                   2                   6
```

---

```r
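library(psych) # for describe(); an explicit load in case it is not already attached

x_var = pulitzer$pctchg_circ
y_var = pulitzer$num_finals2004_2014
n = length(x_var)

# deviation scores: distance of each observation from its mean
x_d = x_var - mean(x_var)
y_d = y_var - mean(y_var)

describe(cbind(x_var, x_d, y_var, y_d), fast = TRUE)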
```

```
##       vars  n   mean    sd     min   max range   se
## x_var    1 50 -29.20 27.07 -100.00 67.00   167 3.83
## x_d      2 50   0.00 27.07  -70.80 96.20   167 3.83
## y_var    3 50   6.72 12.14    0.00 62.00    62 1.72
## y_d      4 50   0.00 12.14   -6.72 55.28    62 1.72
```
---

```r
# cross products
x_d*y_d
```

```
##  [1]  -29.744  560.416 5317.936 -164.544 -363.264   -5.664  -48.384  -14.904
##  [9] -156.704    2.016   17.856   -4.464  126.496   36.816 -189.904 -146.624
## [17]   25.456  -10.944   27.176  -25.024   14.336    3.976    3.776  116.416
## [25]   65.536  404.976   -4.864   -7.224 -208.624   13.056   43.896   32.096
## [33]   12.096  186.816   50.976   56.056  263.376  119.616   59.136   99.456
## [41]  -21.504   14.336  -61.824  -55.104  206.976  -46.784   40.176   99.456
## [49]   40.176  -12.584
```

```r
# sum of cross products (covariation)
sum(x_d*y_d)
```

```
## [1] 6482.2
```

```r
# covariance
sum(x_d*y_d)/( n-1 )
```

```
## [1] 132.2898
```

```r
# correlation

( sum(x_d*y_d)/( n-1 ) ) / ( sd(x_var)*sd(y_var) )
```

```
## [1] 0.4025279
```

---

```r
cor(pulitzer$pctchg_circ,
    pulitzer$num_finals2004_2014)
```

```
## [1] 0.4025279
```

```r
cor.test(pulitzer$pctchg_circ,
    pulitzer$num_finals2004_2014)
```

```
## 
## 	Pearson's product-moment correlation
## 
## data:  pulitzer$pctchg_circ and pulitzer$num_finals2004_2014
## t = 3.0465, df = 48, p-value = 0.003755
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1398493 0.6122747
## sample estimates:
##       cor 
## 0.4025279
```

_Note: `cor.test` only tests the null hypothesis that the correlation is 0. To test against any other null value, you'll have to calculate significance by hand._

---

### Recap: testing the significance of a correlation

.pull-left[
If the null hypothesis is the .purple[nil hypothesis]:
 
 - test significance using a _t_-distribution, where 
 
 `$$\large t = \frac{r}{SE_r}$$`
 `$$\large SE_r = \sqrt{\frac{1-r^2}{N-2}}$$`
 
 `$$DF = N-2$$`
 ]
 
 .pull-right[
 If the null hypothesis is not 0 `$(\text{e.g.,  }H_0:\rho_{xy} = .40)$`
 
 - Transform the statistic and the null value using Fisher's *r*-to-*z* transformation
 
 `$$\large z^{'} = {\frac{1}{2}}\ln{\frac{1+r}{1-r}}$$`
 
 `$$\large SE = \frac{1}{\sqrt{N-3}}$$`
 
 ]
 
 
---

### Example

In PSY 302, the correlation between midterm exam grades and final exam grades was .56. The class size was 104. Is this statistically significant?

--
### Using t-method

`$$\large SE_r = \sqrt{\frac{1-r^2}{N-2}} = \sqrt{\frac{1-.56^2}{104-2}} = 0.08$$`
`$$\large t = \frac{r}{SE_r} = \frac{0.56}{0.08} = 6.83$$`
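
The same arithmetic can be checked in R. This is a minimal sketch using only the values given in the example; the variable names are illustrative.

```r
# Values from the PSY 302 example
r <- .56
N <- 104

se_r   <- sqrt((1 - r^2) / (N - 2))   # standard error of r under H0: rho = 0
t_stat <- r / se_r                    # t statistic
df     <- N - 2                       # degrees of freedom

# two-tailed p-value from the t distribution
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

round(c(se = se_r, t = t_stat, df = df), 2)
p_value
```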

---

.left-column[
The probability of getting a *t* statistic of 6.83 or greater (with 102 degrees of freedom) is essentially 0 (*p* < .001).
]

![](2-correlation_files/figure-html/unnamed-chunk-7-1.png)

---

### Example

In PSY 302, the correlation between midterm exam grades and final exam grades was .56. The class size was 104. Is this statistically significantly different from .40?

`$$\large z^{'} = {\frac{1}{2}}\ln{\frac{1+r}{1-r}}= {\frac{1}{2}}\ln{\frac{1+0.56}{1-0.56}} = 0.63$$`
`$$\large z^{'}_{H_0} = {\frac{1}{2}}\ln{\frac{1+\rho_0}{1-\rho_0}}= {\frac{1}{2}}\ln{\frac{1+0.40}{1-0.40}} = 0.42$$`
`$$\large SE_z = \frac{1}{\sqrt{N-3}} = \frac{1}{\sqrt{104-3}} = 0.10$$`
---

```r
r = .56
N = 104
null = .40
(zr = psych::fisherz(r)) # outer parentheses print the assigned value
```

```
## [1] 0.6328332
```

```r
(znull = psych::fisherz(null)) # Fisher's z for the null value
```

```
## [1] 0.4236489
```

```r
(se = 1/sqrt(N-3)) # standard error of z'
```

```
## [1] 0.09950372
```

---

`$$Z_{\text{statistic}} = \frac{z'-\mu}{SE_z}=\frac{0.63-0.42}{0.1} = 2.1$$`

```r
(stat = (zr-znull)/se) # z statistic
```

```
## [1] 2.102276
```

```r
pnorm(stat, lower.tail = FALSE)*2 # two-tailed p-value
```

```
## [1] 0.03552913
```
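
If you run this test often, the steps can be wrapped in a function. The sketch below is one possible way to do that: `fisher_z_test()` is a hypothetical helper (it is not part of base R or `psych`), and it relies on the fact that base R's `atanh()` is Fisher's *r*-to-*z* transformation and `tanh()` is its inverse.

```r
# Hypothetical helper (not from any package): test H0: rho = rho0
# using Fisher's r-to-z transformation.
fisher_z_test <- function(r, n, rho0 = 0, conf_level = .95) {
  z_r    <- atanh(r)       # Fisher's z'; same as 0.5 * log((1 + r) / (1 - r))
  z_null <- atanh(rho0)    # null value on the z' scale
  se     <- 1 / sqrt(n - 3)
  z_stat <- (z_r - z_null) / se
  p      <- 2 * pnorm(abs(z_stat), lower.tail = FALSE)  # two-tailed

  # confidence interval built on the z' scale, then back-transformed to r
  crit <- qnorm(1 - (1 - conf_level) / 2)
  ci   <- tanh(z_r + c(-1, 1) * crit * se)

  list(z = z_stat, p = p, conf_int = ci)
}

res <- fisher_z_test(r = .56, n = 104, rho0 = .40)
res
```

The `z` and `p` elements should match the step-by-step values above; the confidence interval is an extra that falls out of the same transformation.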

---

## Today

- visualizing correlations
- correlation matrices
- reliability

---
## Visualizing correlations

For a single correlation, best practice is to visualize the relationship using a scatterplot. A best fit line is advised, as it can help clarify the strength and direction of the relationship.

[http://guessthecorrelation.com/](http://guessthecorrelation.com/)
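
A minimal base-R sketch of a scatterplot with a best-fit line. The data here are simulated purely for illustration (an assumption, not one of the datasets in these slides), and the colors are arbitrary choices.

```r
# Simulated data (illustrative only): population correlation of about .5
set.seed(1)
n <- 100
x <- rnorm(n)
y <- .5 * x + sqrt(1 - .5^2) * rnorm(n)

fit <- lm(y ~ x)                       # least-squares line of y on x

plot(x, y, pch = 19, col = "grey40",
     main = paste("r =", round(cor(x, y), 2)))
abline(fit, col = "purple", lwd = 2)   # add the best-fit line
```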

---

![](2-correlation_files/figure-html/unnamed-chunk-11-1.png)

---

![](2-correlation_files/figure-html/unnamed-chunk-12-1.png)

---

![](2-correlation_files/figure-html/unnamed-chunk-13-1.png)

---

![](2-correlation_files/figure-html/unnamed-chunk-14-1.png)

---
![](2-correlation_files/figure-html/unnamed-chunk-15-1.png)

---

![](2-correlation_files/figure-html/unnamed-chunk-16-1.png)

---
![](2-correlation_files/figure-html/unnamed-chunk-17-1.png)

---
![](2-correlation_files/figure-html/unnamed-chunk-18-1.png)

---

## Correlation matrices

Correlations are both a descriptive and an inferential statistic. As a descriptive statistic, they're useful for understanding what's going on in a larger dataset.

Just as we use the `summary()` or `describe()` (psych) functions to examine our dataset _before we run any inferential tests_, we should also look at the correlation matrix.

---

```r
library(psych)
data(bfi)
head(bfi)
```

```
##       A1 A2 A3 A4 A5 C1 C2 C3 C4 C5 E1 E2 E3 E4 E5 N1 N2 N3 N4 N5 O1 O2 O3 O4
## 61617  2  4  3  4  4  2  3  3  4  4  3  3  3  4  4  3  4  2  2  3  3  6  3  4
## 61618  2  4  5  2  5  5  4  4  3  4  1  1  6  4  3  3  3  3  5  5  4  2  4  3
## 61620  5  4  5  4  4  4  5  4  2  5  2  4  4  4  5  4  5  4  2  3  4  2  5  5
## 61621  4  4  6  5  5  4  4  3  5  5  5  3  4  4  4  2  5  2  4  1  3  3  4  3
## 61622  2  3  3  4  5  4  4  5  3  2  2  2  5  4  5  2  3  4  4  3  3  3  4  3
## 61623  6  6  5  6  5  6  6  6  1  3  2  1  6  5  6  3  5  2  2  3  4  3  5  6
##       O5 gender education age
## 61617  3      1        NA  16
## 61618  3      2        NA  18
## 61620  2      2        NA  17
## 61621  5      2        NA  17
## 61622  3      1        NA  17
## 61623  1      2         3  21
```

---

```r
cor(bfi)
```

```
##           A1 A2 A3 A4 A5 C1 C2 C3 C4 C5 E1 E2 E3 E4 E5 N1 N2 N3 N4 N5 O1
## A1         1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## A2        NA  1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## A3        NA NA  1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## A4        NA NA NA  1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## A5        NA NA NA NA  1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## C1        NA NA NA NA NA  1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## C2        NA NA NA NA NA NA  1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## C3        NA NA NA NA NA NA NA  1 NA NA NA NA NA NA NA NA NA NA NA NA NA
## C4        NA NA NA NA NA NA NA NA  1 NA NA NA NA NA NA NA NA NA NA NA NA
## C5        NA NA NA NA NA NA NA NA NA  1 NA NA NA NA NA NA NA NA NA NA NA
## E1        NA NA NA NA NA NA NA NA NA NA  1 NA NA NA NA NA NA NA NA NA NA
## E2        NA NA NA NA NA NA NA NA NA NA NA  1 NA NA NA NA NA NA NA NA NA
## E3        NA NA NA NA NA NA NA NA NA NA NA NA  1 NA NA NA NA NA NA NA NA
## E4        NA NA NA NA NA NA NA NA NA NA NA NA NA  1 NA NA NA NA NA NA NA
## E5        NA NA NA NA NA NA NA NA NA NA NA NA NA NA  1 NA NA NA NA NA NA
## N1        NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  1 NA NA NA NA NA
## N2        NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  1 NA NA NA NA
## N3        NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  1 NA NA NA
## N4        NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  1 NA NA
## N5        NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  1 NA
## O1        NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  1
## O2        NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## O3        NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## O4        NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## O5        NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## gender    NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## education NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## age       NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
##                    O2 O3 O4 O5     gender education         age
## A1                 NA NA NA NA         NA        NA          NA
## A2                 NA NA NA NA         NA        NA          NA
## A3                 NA NA NA NA         NA        NA          NA
## A4                 NA NA NA NA         NA        NA          NA
## A5                 NA NA NA NA         NA        NA          NA
## C1                 NA NA NA NA         NA        NA          NA
## C2                 NA NA NA NA         NA        NA          NA
## C3                 NA NA NA NA         NA        NA          NA
## C4                 NA NA NA NA         NA        NA          NA
## C5                 NA NA NA NA         NA        NA          NA
## E1                 NA NA NA NA         NA        NA          NA
## E2                 NA NA NA NA         NA        NA          NA
## E3                 NA NA NA NA         NA        NA          NA
## E4                 NA NA NA NA         NA        NA          NA
## E5                 NA NA NA NA         NA        NA          NA
## N1                 NA NA NA NA         NA        NA          NA
## N2                 NA NA NA NA         NA        NA          NA
## N3                 NA NA NA NA         NA        NA          NA
## N4                 NA NA NA NA         NA        NA          NA
## N5                 NA NA NA NA         NA        NA          NA
## O1                 NA NA NA NA         NA        NA          NA
## O2         1.00000000 NA NA NA 0.02694778        NA -0.04254386
## O3                 NA  1 NA NA         NA        NA          NA
## O4                 NA NA  1 NA         NA        NA          NA
## O5                 NA NA NA  1         NA        NA          NA
## gender     0.02694778 NA NA NA 1.00000000        NA  0.04770347
## education          NA NA NA NA         NA         1          NA
## age       -0.04254386 NA NA NA 0.04770347        NA  1.00000000
```

---

```r
round(cor(bfi, use = "pairwise"),2)
```

```
##              A1    A2    A3    A4    A5    C1    C2    C3    C4    C5    E1
## A1         1.00 -0.34 -0.27 -0.15 -0.18  0.03  0.02 -0.02  0.13  0.05  0.11
## A2        -0.34  1.00  0.49  0.34  0.39  0.09  0.14  0.19 -0.15 -0.12 -0.21
## A3        -0.27  0.49  1.00  0.36  0.50  0.10  0.14  0.13 -0.12 -0.16 -0.21
## A4        -0.15  0.34  0.36  1.00  0.31  0.09  0.23  0.13 -0.15 -0.24 -0.11
## A5        -0.18  0.39  0.50  0.31  1.00  0.12  0.11  0.13 -0.13 -0.17 -0.25
## C1         0.03  0.09  0.10  0.09  0.12  1.00  0.43  0.31 -0.34 -0.25 -0.02
## C2         0.02  0.14  0.14  0.23  0.11  0.43  1.00  0.36 -0.38 -0.30  0.02
## C3        -0.02  0.19  0.13  0.13  0.13  0.31  0.36  1.00 -0.34 -0.34  0.00
## C4         0.13 -0.15 -0.12 -0.15 -0.13 -0.34 -0.38 -0.34  1.00  0.48  0.09
## C5         0.05 -0.12 -0.16 -0.24 -0.17 -0.25 -0.30 -0.34  0.48  1.00  0.06
## E1         0.11 -0.21 -0.21 -0.11 -0.25 -0.02  0.02  0.00  0.09  0.06  1.00
## E2         0.09 -0.23 -0.29 -0.19 -0.33 -0.09 -0.06 -0.08  0.20  0.26  0.47
## E3        -0.05  0.25  0.39  0.19  0.42  0.12  0.15  0.09 -0.08 -0.16 -0.33
## E4        -0.06  0.28  0.38  0.30  0.47  0.14  0.12  0.09 -0.11 -0.20 -0.42
## E5        -0.02  0.29  0.25  0.16  0.27  0.25  0.25  0.21 -0.24 -0.23 -0.30
## N1         0.17 -0.09 -0.08 -0.10 -0.20 -0.07 -0.02 -0.07  0.22  0.21  0.02
## N2         0.14 -0.05 -0.09 -0.14 -0.19 -0.04 -0.01 -0.06  0.16  0.25  0.01
## N3         0.10 -0.04 -0.04 -0.07 -0.14 -0.03  0.00 -0.07  0.21  0.24  0.05
## N4         0.05 -0.09 -0.13 -0.17 -0.20 -0.10 -0.05 -0.11  0.26  0.34  0.23
## N5         0.02  0.02 -0.04 -0.01 -0.08 -0.05  0.05 -0.01  0.20  0.17  0.05
## O1         0.01  0.13  0.15  0.06  0.16  0.17  0.16  0.09 -0.09 -0.08 -0.10
## O2         0.08  0.02  0.00  0.04  0.00 -0.11 -0.04 -0.03  0.21  0.14  0.04
## O3        -0.06  0.16  0.22  0.07  0.24  0.19  0.19  0.06 -0.08 -0.08 -0.22
## O4        -0.08  0.09  0.04 -0.04  0.02  0.11  0.06  0.02  0.05  0.14  0.08
## O5         0.11 -0.09 -0.05  0.02 -0.05 -0.12 -0.05 -0.01  0.20  0.06  0.10
## gender    -0.16  0.18  0.14  0.13  0.10  0.01  0.07  0.05 -0.08 -0.09 -0.13
## education -0.14  0.01  0.00 -0.02  0.01  0.03  0.00  0.05 -0.04  0.03  0.00
## age       -0.16  0.11  0.07  0.14  0.13  0.08  0.02  0.07 -0.15 -0.09 -0.03
##              E2    E3    E4    E5    N1    N2    N3    N4    N5    O1    O2
## A1         0.09 -0.05 -0.06 -0.02  0.17  0.14  0.10  0.05  0.02  0.01  0.08
## A2        -0.23  0.25  0.28  0.29 -0.09 -0.05 -0.04 -0.09  0.02  0.13  0.02
## A3        -0.29  0.39  0.38  0.25 -0.08 -0.09 -0.04 -0.13 -0.04  0.15  0.00
## A4        -0.19  0.19  0.30  0.16 -0.10 -0.14 -0.07 -0.17 -0.01  0.06  0.04
## A5        -0.33  0.42  0.47  0.27 -0.20 -0.19 -0.14 -0.20 -0.08  0.16  0.00
## C1        -0.09  0.12  0.14  0.25 -0.07 -0.04 -0.03 -0.10 -0.05  0.17 -0.11
## C2        -0.06  0.15  0.12  0.25 -0.02 -0.01  0.00 -0.05  0.05  0.16 -0.04
## C3        -0.08  0.09  0.09  0.21 -0.07 -0.06 -0.07 -0.11 -0.01  0.09 -0.03
## C4         0.20 -0.08 -0.11 -0.24  0.22  0.16  0.21  0.26  0.20 -0.09  0.21
## C5         0.26 -0.16 -0.20 -0.23  0.21  0.25  0.24  0.34  0.17 -0.08  0.14
## E1         0.47 -0.33 -0.42 -0.30  0.02  0.01  0.05  0.23  0.05 -0.10  0.04
## E2         1.00 -0.38 -0.51 -0.37  0.17  0.19  0.20  0.35  0.25 -0.16  0.08
## E3        -0.38  1.00  0.42  0.38 -0.05 -0.07 -0.02 -0.15 -0.07  0.33 -0.07
## E4        -0.51  0.42  1.00  0.32 -0.14 -0.14 -0.10 -0.29 -0.09  0.14  0.06
## E5        -0.37  0.38  0.32  1.00  0.04  0.04 -0.06 -0.21 -0.13  0.30 -0.08
## N1         0.17 -0.05 -0.14  0.04  1.00  0.71  0.56  0.40  0.38 -0.05  0.13
## N2         0.19 -0.07 -0.14  0.04  0.71  1.00  0.55  0.39  0.35 -0.05  0.13
## N3         0.20 -0.02 -0.10 -0.06  0.56  0.55  1.00  0.52  0.43 -0.03  0.11
## N4         0.35 -0.15 -0.29 -0.21  0.40  0.39  0.52  1.00  0.40 -0.05  0.08
## N5         0.25 -0.07 -0.09 -0.13  0.38  0.35  0.43  0.40  1.00 -0.12  0.20
## O1        -0.16  0.33  0.14  0.30 -0.05 -0.05 -0.03 -0.05 -0.12  1.00 -0.21
## O2         0.08 -0.07  0.06 -0.08  0.13  0.13  0.11  0.08  0.20 -0.21  1.00
## O3        -0.23  0.39  0.21  0.29 -0.05 -0.03 -0.03 -0.06 -0.08  0.40 -0.26
## O4         0.17  0.05 -0.10  0.00  0.08  0.13  0.18  0.21  0.11  0.18 -0.07
## O5         0.08 -0.11  0.05 -0.11  0.11  0.04  0.06  0.04  0.14 -0.24  0.32
## gender    -0.05  0.05  0.08  0.07  0.04  0.10  0.12  0.00  0.21 -0.10  0.03
## education -0.01  0.00 -0.04  0.06 -0.05 -0.05 -0.05  0.01 -0.05  0.03 -0.09
## age       -0.11  0.00 -0.01  0.11 -0.09 -0.10 -0.11 -0.03 -0.10  0.05 -0.04
##              O3    O4    O5 gender education   age
## A1        -0.06 -0.08  0.11  -0.16     -0.14 -0.16
## A2         0.16  0.09 -0.09   0.18      0.01  0.11
## A3         0.22  0.04 -0.05   0.14      0.00  0.07
## A4         0.07 -0.04  0.02   0.13     -0.02  0.14
## A5         0.24  0.02 -0.05   0.10      0.01  0.13
## C1         0.19  0.11 -0.12   0.01      0.03  0.08
## C2         0.19  0.06 -0.05   0.07      0.00  0.02
## C3         0.06  0.02 -0.01   0.05      0.05  0.07
## C4        -0.08  0.05  0.20  -0.08     -0.04 -0.15
## C5        -0.08  0.14  0.06  -0.09      0.03 -0.09
## E1        -0.22  0.08  0.10  -0.13      0.00 -0.03
## E2        -0.23  0.17  0.08  -0.05     -0.01 -0.11
## E3         0.39  0.05 -0.11   0.05      0.00  0.00
## E4         0.21 -0.10  0.05   0.08     -0.04 -0.01
## E5         0.29  0.00 -0.11   0.07      0.06  0.11
## N1        -0.05  0.08  0.11   0.04     -0.05 -0.09
## N2        -0.03  0.13  0.04   0.10     -0.05 -0.10
## N3        -0.03  0.18  0.06   0.12     -0.05 -0.11
## N4        -0.06  0.21  0.04   0.00      0.01 -0.03
## N5        -0.08  0.11  0.14   0.21     -0.05 -0.10
## O1         0.40  0.18 -0.24  -0.10      0.03  0.05
## O2        -0.26 -0.07  0.32   0.03     -0.09 -0.04
## O3         1.00  0.19 -0.31  -0.04      0.09  0.04
## O4         0.19  1.00 -0.18   0.00      0.05  0.01
## O5        -0.31 -0.18  1.00   0.02     -0.06 -0.10
## gender    -0.04  0.00  0.02   1.00      0.01  0.05
## education  0.09  0.05 -0.06   0.01      1.00  0.24
## age        0.04  0.01 -0.10   0.05      0.24  1.00
```

---

```r
round(cor(bfi, use = "complete"),2)
```

```
##              A1    A2    A3    A4    A5    C1    C2    C3    C4    C5    E1
## A1         1.00 -0.34 -0.26 -0.14 -0.19  0.02  0.01 -0.01  0.10  0.02  0.12
## A2        -0.34  1.00  0.48  0.34  0.38  0.09  0.13  0.19 -0.14 -0.11 -0.24
## A3        -0.26  0.48  1.00  0.38  0.50  0.10  0.14  0.13 -0.12 -0.15 -0.22
## A4        -0.14  0.34  0.38  1.00  0.32  0.08  0.22  0.13 -0.16 -0.24 -0.14
## A5        -0.19  0.38  0.50  0.32  1.00  0.12  0.11  0.13 -0.12 -0.16 -0.25
## C1         0.02  0.09  0.10  0.08  0.12  1.00  0.43  0.32 -0.35 -0.25 -0.03
## C2         0.01  0.13  0.14  0.22  0.11  0.43  1.00  0.36 -0.38 -0.30  0.02
## C3        -0.01  0.19  0.13  0.13  0.13  0.32  0.36  1.00 -0.35 -0.35 -0.02
## C4         0.10 -0.14 -0.12 -0.16 -0.12 -0.35 -0.38 -0.35  1.00  0.48  0.10
## C5         0.02 -0.11 -0.15 -0.24 -0.16 -0.25 -0.30 -0.35  0.48  1.00  0.07
## E1         0.12 -0.24 -0.22 -0.14 -0.25 -0.03  0.02 -0.02  0.10  0.07  1.00
## E2         0.08 -0.24 -0.29 -0.20 -0.33 -0.10 -0.07 -0.09  0.21  0.26  0.47
## E3        -0.04  0.25  0.38  0.20  0.41  0.13  0.15  0.10 -0.09 -0.17 -0.33
## E4        -0.07  0.30  0.39  0.33  0.48  0.14  0.12  0.10 -0.12 -0.21 -0.42
## E5        -0.02  0.30  0.26  0.16  0.27  0.26  0.25  0.22 -0.23 -0.24 -0.31
## N1         0.16 -0.08 -0.07 -0.09 -0.19 -0.06 -0.02 -0.08  0.21  0.21  0.01
## N2         0.13 -0.04 -0.08 -0.15 -0.19 -0.03  0.00 -0.06  0.15  0.24  0.01
## N3         0.09 -0.02 -0.03 -0.07 -0.13 -0.01  0.01 -0.07  0.20  0.23  0.05
## N4         0.04 -0.09 -0.13 -0.16 -0.21 -0.09 -0.04 -0.13  0.28  0.35  0.23
## N5         0.01  0.02 -0.04  0.00 -0.08 -0.05  0.05 -0.04  0.21  0.18  0.04
## O1         0.00  0.11  0.14  0.04  0.15  0.18  0.16  0.09 -0.10 -0.09 -0.10
## O2         0.07  0.03  0.03  0.05  0.00 -0.13 -0.05 -0.03  0.21  0.12  0.06
## O3        -0.06  0.15  0.22  0.04  0.22  0.19  0.18  0.06 -0.07 -0.07 -0.21
## O4        -0.09  0.05  0.02 -0.06  0.00  0.08  0.03  0.00  0.07  0.14  0.08
## O5         0.11 -0.08 -0.04  0.04 -0.04 -0.13 -0.06  0.00  0.18  0.05  0.09
## gender    -0.17  0.21  0.16  0.13  0.11  0.00  0.06  0.04 -0.07 -0.09 -0.15
## education -0.14  0.02  0.00 -0.02  0.02  0.04  0.01  0.06 -0.04  0.04  0.00
## age       -0.14  0.09  0.04  0.11  0.10  0.08  0.00  0.05 -0.12 -0.07 -0.03
##              E2    E3    E4    E5    N1    N2    N3    N4    N5    O1    O2
## A1         0.08 -0.04 -0.07 -0.02  0.16  0.13  0.09  0.04  0.01  0.00  0.07
## A2        -0.24  0.25  0.30  0.30 -0.08 -0.04 -0.02 -0.09  0.02  0.11  0.03
## A3        -0.29  0.38  0.39  0.26 -0.07 -0.08 -0.03 -0.13 -0.04  0.14  0.03
## A4        -0.20  0.20  0.33  0.16 -0.09 -0.15 -0.07 -0.16  0.00  0.04  0.05
## A5        -0.33  0.41  0.48  0.27 -0.19 -0.19 -0.13 -0.21 -0.08  0.15  0.00
## C1        -0.10  0.13  0.14  0.26 -0.06 -0.03 -0.01 -0.09 -0.05  0.18 -0.13
## C2        -0.07  0.15  0.12  0.25 -0.02  0.00  0.01 -0.04  0.05  0.16 -0.05
## C3        -0.09  0.10  0.10  0.22 -0.08 -0.06 -0.07 -0.13 -0.04  0.09 -0.03
## C4         0.21 -0.09 -0.12 -0.23  0.21  0.15  0.20  0.28  0.21 -0.10  0.21
## C5         0.26 -0.17 -0.21 -0.24  0.21  0.24  0.23  0.35  0.18 -0.09  0.12
## E1         0.47 -0.33 -0.42 -0.31  0.01  0.01  0.05  0.23  0.04 -0.10  0.06
## E2         1.00 -0.40 -0.52 -0.39  0.17  0.20  0.19  0.35  0.26 -0.16  0.08
## E3        -0.40  1.00  0.43  0.40 -0.04 -0.06 -0.01 -0.15 -0.09  0.33 -0.07
## E4        -0.52  0.43  1.00  0.33 -0.14 -0.15 -0.13 -0.31 -0.09  0.12  0.05
## E5        -0.39  0.40  0.33  1.00  0.04  0.05 -0.06 -0.21 -0.14  0.29 -0.09
## N1         0.17 -0.04 -0.14  0.04  1.00  0.71  0.57  0.41  0.38 -0.05  0.14
## N2         0.20 -0.06 -0.15  0.05  0.71  1.00  0.55  0.39  0.35 -0.05  0.12
## N3         0.19 -0.01 -0.13 -0.06  0.57  0.55  1.00  0.52  0.43 -0.05  0.11
## N4         0.35 -0.15 -0.31 -0.21  0.41  0.39  0.52  1.00  0.40 -0.06  0.08
## N5         0.26 -0.09 -0.09 -0.14  0.38  0.35  0.43  0.40  1.00 -0.15  0.20
## O1        -0.16  0.33  0.12  0.29 -0.05 -0.05 -0.05 -0.06 -0.15  1.00 -0.23
## O2         0.08 -0.07  0.05 -0.09  0.14  0.12  0.11  0.08  0.20 -0.23  1.00
## O3        -0.24  0.41  0.21  0.30 -0.03 -0.02 -0.03 -0.06 -0.08  0.39 -0.29
## O4         0.17  0.04 -0.10 -0.02  0.09  0.13  0.17  0.23  0.11  0.17 -0.08
## O5         0.08 -0.13  0.04 -0.11  0.10  0.02  0.05  0.03  0.14 -0.25  0.33
## gender    -0.08  0.05  0.11  0.08  0.04  0.09  0.11 -0.02  0.21 -0.11  0.04
## education -0.01  0.01 -0.03  0.06 -0.04 -0.04 -0.04  0.01 -0.05  0.03 -0.10
## age       -0.10 -0.02 -0.01  0.10 -0.07 -0.09 -0.11 -0.02 -0.10  0.05 -0.04
##              O3    O4    O5 gender education   age
## A1        -0.06 -0.09  0.11  -0.17     -0.14 -0.14
## A2         0.15  0.05 -0.08   0.21      0.02  0.09
## A3         0.22  0.02 -0.04   0.16      0.00  0.04
## A4         0.04 -0.06  0.04   0.13     -0.02  0.11
## A5         0.22  0.00 -0.04   0.11      0.02  0.10
## C1         0.19  0.08 -0.13   0.00      0.04  0.08
## C2         0.18  0.03 -0.06   0.06      0.01  0.00
## C3         0.06  0.00  0.00   0.04      0.06  0.05
## C4        -0.07  0.07  0.18  -0.07     -0.04 -0.12
## C5        -0.07  0.14  0.05  -0.09      0.04 -0.07
## E1        -0.21  0.08  0.09  -0.15      0.00 -0.03
## E2        -0.24  0.17  0.08  -0.08     -0.01 -0.10
## E3         0.41  0.04 -0.13   0.05      0.01 -0.02
## E4         0.21 -0.10  0.04   0.11     -0.03 -0.01
## E5         0.30 -0.02 -0.11   0.08      0.06  0.10
## N1        -0.03  0.09  0.10   0.04     -0.04 -0.07
## N2        -0.02  0.13  0.02   0.09     -0.04 -0.09
## N3        -0.03  0.17  0.05   0.11     -0.04 -0.11
## N4        -0.06  0.23  0.03  -0.02      0.01 -0.02
## N5        -0.08  0.11  0.14   0.21     -0.05 -0.10
## O1         0.39  0.17 -0.25  -0.11      0.03  0.05
## O2        -0.29 -0.08  0.33   0.04     -0.10 -0.04
## O3         1.00  0.17 -0.32  -0.04      0.10  0.02
## O4         0.17  1.00 -0.18  -0.04      0.06  0.00
## O5        -0.32 -0.18  1.00   0.04     -0.06 -0.08
## gender    -0.04 -0.04  0.04   1.00      0.01  0.05
## education  0.10  0.06 -0.06   0.01      1.00  0.25
## age        0.02  0.00 -0.08   0.05      0.25  1.00
```

---

With .purple[pairwise deletion], different sets of cases contribute to different correlations. That maximizes the sample size for each correlation, but can lead to problems if the data are missing for some systematic reason.

.purple[Listwise deletion] (in `R`, `use = "complete.obs"`, i.e., complete cases only) doesn't have the same issue, because every correlation is computed on the same set of cases, but it does result in smaller samples and potentially limited generalizability.

A good practice is to compare the two matrices; if the correlation values differ substantially, this suggests that the missingness that affects pairwise deletion is systematic.
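
It can also help to see how many cases each approach actually uses. The sketch below uses a small made-up data frame (`toy`, hypothetical values) so the counts are easy to verify by eye; the same lines should work on a real dataset.

```r
# Toy data frame (hypothetical values) with scattered missing values
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 6, NA, 8),
  b = c(2, 1, 4, 3, NA, 5, 7, 9),
  c = c(NA, 3, 2, 5, 4, 7, 6, 8)
)

# 1 where a value is observed, 0 where it is missing
observed <- !is.na(as.matrix(toy))
observed <- observed + 0

# Pairwise deletion: number of jointly observed cases behind each correlation
pairwise_n <- crossprod(observed)
pairwise_n

# Listwise deletion: a single n, the number of fully complete rows
listwise_n <- sum(complete.cases(toy))
listwise_n
```

The diagonal of `pairwise_n` gives the number of observed values per variable; the off-diagonal entries give the n for each pairwise correlation.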

---

```r
round(cor(bfi, use = "pairwise")- cor(bfi, use = "complete"),2)
```

```
##              A1    A2    A3    A4    A5    C1    C2    C3    C4    C5    E1
## A1         0.00  0.00  0.00  0.00  0.00  0.01  0.00 -0.01  0.03  0.03 -0.01
## A2         0.00  0.00  0.00 -0.01  0.01  0.00  0.01  0.01 -0.01 -0.01  0.03
## A3         0.00  0.00  0.00 -0.02  0.00  0.00  0.00  0.00  0.00 -0.01  0.00
## A4         0.00 -0.01 -0.02  0.00 -0.01  0.01  0.01  0.00  0.01  0.00  0.03
## A5         0.00  0.01  0.00 -0.01  0.00  0.00  0.00  0.00 -0.01 -0.01  0.00
## C1         0.01  0.00  0.00  0.01  0.00  0.00  0.00 -0.01  0.01  0.00  0.00
## C2         0.00  0.01  0.00  0.01  0.00  0.00  0.00  0.00  0.00  0.00 -0.01
## C3        -0.01  0.01  0.00  0.00  0.00 -0.01  0.00  0.00  0.02  0.01  0.02
## C4         0.03 -0.01  0.00  0.01 -0.01  0.01  0.00  0.02  0.00 -0.01 -0.01
## C5         0.03 -0.01 -0.01  0.00 -0.01  0.00  0.00  0.01 -0.01  0.00  0.00
## E1        -0.01  0.03  0.00  0.03  0.00  0.00 -0.01  0.02 -0.01  0.00  0.00
## E2         0.01  0.01  0.00  0.01  0.00  0.01  0.01  0.01 -0.01  0.00  0.00
## E3         0.00  0.00  0.00 -0.01  0.00 -0.02  0.00 -0.02  0.01  0.01  0.01
## E4         0.01 -0.02 -0.02 -0.03 -0.01  0.00  0.00 -0.01  0.01  0.01  0.00
## E5         0.00  0.00 -0.01  0.00  0.00 -0.01  0.00  0.00  0.00  0.01  0.00
## N1         0.01 -0.01 -0.02  0.00  0.00 -0.01  0.00  0.01  0.01  0.01  0.01
## N2         0.01 -0.01  0.00  0.00  0.00 -0.01 -0.01  0.00  0.01  0.01  0.01
## N3         0.01 -0.02 -0.01  0.00 -0.01 -0.02 -0.01  0.01  0.01  0.01  0.00
## N4         0.01  0.00  0.00 -0.01  0.01 -0.01 -0.01  0.02 -0.02 -0.01  0.00
## N5         0.01  0.00  0.00  0.00  0.00  0.00  0.00  0.02 -0.02 -0.01  0.01
## O1         0.01  0.02  0.00  0.02  0.02 -0.01  0.01  0.00  0.01  0.01  0.00
## O2         0.01 -0.02 -0.03 -0.01  0.00  0.02  0.01  0.00  0.00  0.02 -0.01
## O3         0.00  0.02  0.01  0.03  0.02  0.00  0.01  0.01 -0.01 -0.01  0.00
## O4         0.01  0.03  0.01  0.02  0.01  0.03  0.03  0.02 -0.02  0.00 -0.01
## O5         0.01 -0.01 -0.01 -0.01 -0.01  0.01  0.00 -0.01  0.01  0.01  0.01
## gender     0.01 -0.03 -0.02  0.00 -0.01  0.01  0.01  0.01 -0.01  0.00  0.02
## education  0.00 -0.01 -0.01  0.00  0.00 -0.01 -0.01 -0.01  0.00 -0.01  0.00
## age       -0.02  0.02  0.03  0.03  0.03  0.00  0.02  0.02 -0.03 -0.01  0.01
##              E2    E3    E4    E5    N1    N2    N3    N4    N5    O1    O2
## A1         0.01  0.00  0.01  0.00  0.01  0.01  0.01  0.01  0.01  0.01  0.01
## A2         0.01  0.00 -0.02  0.00 -0.01 -0.01 -0.02  0.00  0.00  0.02 -0.02
## A3         0.00  0.00 -0.02 -0.01 -0.02  0.00 -0.01  0.00  0.00  0.00 -0.03
## A4         0.01 -0.01 -0.03  0.00  0.00  0.00  0.00 -0.01  0.00  0.02 -0.01
## A5         0.00  0.00 -0.01  0.00  0.00  0.00 -0.01  0.01  0.00  0.02  0.00
## C1         0.01 -0.02  0.00 -0.01 -0.01 -0.01 -0.02 -0.01  0.00 -0.01  0.02
## C2         0.01  0.00  0.00  0.00  0.00 -0.01 -0.01 -0.01  0.00  0.01  0.01
## C3         0.01 -0.02 -0.01  0.00  0.01  0.00  0.01  0.02  0.02  0.00  0.00
## C4        -0.01  0.01  0.01  0.00  0.01  0.01  0.01 -0.02 -0.02  0.01  0.00
## C5         0.00  0.01  0.01  0.01  0.01  0.01  0.01 -0.01 -0.01  0.01  0.02
## E1         0.00  0.01  0.00  0.00  0.01  0.01  0.00  0.00  0.01  0.00 -0.01
## E2         0.00  0.02  0.01  0.02  0.00  0.00  0.01 -0.01  0.00  0.00  0.00
## E3         0.02  0.00 -0.01 -0.02 -0.01 -0.01 -0.01  0.01  0.01  0.00  0.01
## E4         0.01 -0.01  0.00 -0.02  0.01  0.01  0.03  0.02  0.00  0.01  0.01
## E5         0.02 -0.02 -0.02  0.00  0.00 -0.01  0.00  0.00  0.01  0.00  0.00
## N1         0.00 -0.01  0.01  0.00  0.00  0.00 -0.01 -0.01 -0.01  0.00 -0.01
## N2         0.00 -0.01  0.01 -0.01  0.00  0.00  0.00  0.00  0.00  0.00  0.00
## N3         0.01 -0.01  0.03  0.00 -0.01  0.00  0.00  0.00  0.00  0.01  0.00
## N4        -0.01  0.01  0.02  0.00 -0.01  0.00  0.00  0.00  0.00  0.01  0.00
## N5         0.00  0.01  0.00  0.01 -0.01  0.00  0.00  0.00  0.00  0.03  0.00
## O1         0.00  0.00  0.01  0.00  0.00  0.00  0.01  0.01  0.03  0.00  0.02
## O2         0.00  0.01  0.01  0.00 -0.01  0.00  0.00  0.00  0.00  0.02  0.00
## O3         0.02 -0.02  0.00  0.00 -0.02 -0.01  0.00  0.00  0.01  0.00  0.03
## O4         0.00  0.01  0.01  0.01 -0.01  0.00  0.01 -0.02  0.01  0.01  0.01
## O5         0.00  0.02  0.01  0.00  0.01  0.02  0.01  0.01 -0.01  0.01 -0.01
## gender     0.02 -0.01 -0.03 -0.01  0.01  0.00  0.01  0.02  0.00  0.01 -0.02
## education  0.00  0.00 -0.01  0.00  0.00 -0.01 -0.01  0.00 -0.01 -0.01  0.01
## age        0.00  0.02  0.00  0.02 -0.01 -0.01  0.00 -0.01  0.00  0.00  0.00
##              O3    O4    O5 gender education   age
## A1         0.00  0.01  0.01   0.01      0.00 -0.02
## A2         0.02  0.03 -0.01  -0.03     -0.01  0.02
## A3         0.01  0.01 -0.01  -0.02     -0.01  0.03
## A4         0.03  0.02 -0.01   0.00      0.00  0.03
## A5         0.02  0.01 -0.01  -0.01      0.00  0.03
## C1         0.00  0.03  0.01   0.01     -0.01  0.00
## C2         0.01  0.03  0.00   0.01     -0.01  0.02
## C3         0.01  0.02 -0.01   0.01     -0.01  0.02
## C4        -0.01 -0.02  0.01  -0.01      0.00 -0.03
## C5        -0.01  0.00  0.01   0.00     -0.01 -0.01
## E1         0.00 -0.01  0.01   0.02      0.00  0.01
## E2         0.02  0.00  0.00   0.02      0.00  0.00
## E3        -0.02  0.01  0.02  -0.01      0.00  0.02
## E4         0.00  0.01  0.01  -0.03     -0.01  0.00
## E5         0.00  0.01  0.00  -0.01      0.00  0.02
## N1        -0.02 -0.01  0.01   0.01      0.00 -0.01
## N2        -0.01  0.00  0.02   0.00     -0.01 -0.01
## N3         0.00  0.01  0.01   0.01     -0.01  0.00
## N4         0.00 -0.02  0.01   0.02      0.00 -0.01
## N5         0.01  0.01 -0.01   0.00     -0.01  0.00
## O1         0.00  0.01  0.01   0.01     -0.01  0.00
## O2         0.03  0.01 -0.01  -0.02      0.01  0.00
## O3         0.00  0.02  0.01   0.01      0.00  0.01
## O4         0.02  0.00  0.00   0.03     -0.01  0.01
## O5         0.01  0.00  0.00  -0.01      0.00 -0.02
## gender     0.01  0.03 -0.01   0.00      0.00  0.00
## education  0.00 -0.01  0.00   0.00      0.00 -0.01
## age        0.01  0.01 -0.02   0.00     -0.01  0.00
```

---

## Types of missingness

Ideally our missingness is .purple[missing completely at random (MCAR)]. This means the probability of being missing is the same for all observations. If this is the case, our correlation estimates will be unbiased (if underpowered) and we're free to use them with no concerns (other than the usual).
* Aliens beam into a warehouse and randomly take some files.

However, our data might be .purple[missing at random (MAR)]. This means the probability of being missing differs between cases, but that probability is related to variables we have observed. This is not great, but sometimes we can account for it using the variables we have observed (e.g., imputation, different estimation methods).
* Raccoons sneak into the warehouse and eat all the files by the door.

---

## Types of missingness

It's a problem if our data are .purple[missing not at random (MNAR)]. The probability of being missing differs for reasons that are unknown to us. This is especially problematic if the reason is associated with the variables at the heart of our study. Sensitivity analyses might help us detect MNAR-ness and possibly define the limits of our study, but we can't adjust our data for this issue.
* Criminals break into the warehouse and steal files about themselves.
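
---

The three mechanisms can be sketched with simulated data. This is an illustration under assumed settings: the sample size, the slope of .50, and the logistic missingness rules are arbitrary choices, and `estimate_mean()` is a hypothetical helper. It tracks the mean of `y` rather than a correlation because the bias is easier to see.

```r
set.seed(123)                               # arbitrary seed, for reproducibility
n <- 10000
x <- rnorm(n)                               # always observed
y <- 0.5 * x + rnorm(n, sd = sqrt(0.75))    # true mean of y is 0; y is sometimes missing

# TRUE = the y value is missing for that case
miss_mcar <- runif(n) < 0.5                 # same probability for everyone
miss_mar  <- runif(n) < plogis(2 * x)       # depends only on the observed x
miss_mnar <- runif(n) < plogis(2 * y)       # depends on the unobserved y itself

# Hypothetical helper: mean of y from complete cases, and a regression-adjusted
# mean that predicts y for every case from the observed x
estimate_mean <- function(miss) {
  fit <- lm(y ~ x, subset = !miss)
  c(complete_case = mean(y[!miss]),
    adjusted      = mean(predict(fit, newdata = data.frame(x = x))))
}

results <- rbind(MCAR = estimate_mean(miss_mcar),
                 MAR  = estimate_mean(miss_mar),
                 MNAR = estimate_mean(miss_mnar))
round(results, 2)
```

In a run like this, the complete-case mean should sit near the true value of 0 under MCAR, drift away from 0 under MAR but be recovered by the adjustment that uses `x`, and stay biased under MNAR even after adjustment.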

---
## Visualizing correlation matrices

A single correlation can be informative; a correlation matrix is more than the sum of its parts.

Correlation matrices can be used to infer larger patterns of relationships. You may be one of the gifted who can look at a matrix of numbers and see those patterns immediately. Or you can use .purple[heat maps] to visualize correlation matrices.

```r
library(corrplot)
```

---

```r
corrplot(cor(bfi, use = "pairwise"), method = "square")
```

![](2-correlation_files/figure-html/unnamed-chunk-25-1.png)

---

![](images/comm plot-1.png)

.small[
[Beck, Condon, & Jackson, 2019](https://psyarxiv.com/857ev/)
]
---

## Factors that influence `$r$` (and most other test statistics)

1. Restriction of range (GRE scores and success)

2. Very skewed distributions (smoking and health)

3. Non-linear associations

4. Measurement overlap (modality and content)

5. Reliability
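
---

Restriction of range (item 1) can be illustrated with a small simulation. This is a sketch on made-up data, not real GRE scores: the population correlation of .50 and the "admit the top 20%" rule are assumptions chosen for the example.

```r
set.seed(42)                                      # arbitrary seed, for reproducibility
n <- 5000
gre <- rnorm(n)                                   # simulated, standardized test scores
success <- 0.5 * gre + rnorm(n, sd = sqrt(0.75))  # population r = .50 by construction

# Assumed selection rule: only the top 20% of scorers are admitted,
# so success is only ever observed for them
admitted <- gre > quantile(gre, 0.80)

r_full <- cor(gre, success)
r_restricted <- cor(gre[admitted], success[admitted])
round(c(full = r_full, restricted = r_restricted), 2)
```

The correlation among admitted cases should come out noticeably smaller than the full-sample value, even though the underlying relationship is identical for everyone.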

---
## Reliability

Which would you rather have?

- 1-item final exam versus 30-item?

- assessment via trained clinician vs tarot cards?

- fMRI during minor earthquake vs no earthquake?

All measurement includes error

- Score = true score + measurement error (CTT version)

- Reliability assesses the consistency of measurement; high reliability indicates less error

---

## Reliability

- Random error cannot correlate with anything

- Because we do not measure our variables perfectly, observed correlations are lower than the true correlations

- If we want a valid measure, it had better be a reliable measure

---
## Reliability

- think of reliability as the correlation of a measure with itself in a different world, at a different time, or in a different but equivalent version

`$$\large r_{XX}$$`

---
## Reliability

- true score variance divided by observed variance
- how do you assess theoretical variance i.e., true score variance?

`$$\large r_{XY} = r_{X_{T} Y_{T}} {\sqrt{r_{XX}r_{YY}}}$$`

`$$\large r_{XY} = .6 {\sqrt {(.70) (.70)}}$$`
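
---

The attenuation formula can be checked by simulation. This is a sketch under assumed values taken from the example above (true-score correlation of .60, reliability of .70 for both measures); the sample size and seed are arbitrary.

```r
set.seed(2024)                           # arbitrary seed, for reproducibility
n <- 20000
true_r <- 0.6                            # correlation between true scores
rel <- 0.70                              # reliability of each observed measure

# True scores with unit variance and correlation true_r
x_true <- rnorm(n)
y_true <- true_r * x_true + rnorm(n, sd = sqrt(1 - true_r^2))

# Add random error so that true variance / observed variance = rel
err_sd <- sqrt((1 - rel) / rel)
x_obs <- x_true + rnorm(n, sd = err_sd)
y_obs <- y_true + rnorm(n, sd = err_sd)

r_values <- c(true      = cor(x_true, y_true),
              observed  = cor(x_obs, y_obs),
              predicted = true_r * sqrt(rel * rel))
round(r_values, 2)
```

The observed correlation should land close to the predicted value of .42, well below the true-score correlation.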

---
## Reliability

`$$\large r_{X_{T} Y_{T}} = {\frac {r_{XY}} {\sqrt{r_{XX}r_{YY}}}}$$`

`$$\large r_{X_{T} Y_{T}} = {\frac {.30} {\sqrt{(.70)(.70)}}} = .42$$`
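
The same arithmetic as a small function. `disattenuate()` is a hypothetical helper name, not a function from any package; the unrounded result is about .429, which appears above as .42.

```r
# Hypothetical helper: correct an observed correlation for unreliability
disattenuate <- function(r_xy, r_xx, r_yy) {
  r_xy / sqrt(r_xx * r_yy)
}

r_corrected <- disattenuate(r_xy = 0.30, r_xx = 0.70, r_yy = 0.70)
r_corrected
```

Because the reliabilities are themselves estimates, the corrected value is best treated as approximate.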

???

### Take aways

N needed for .42 = 42
N needed for .3 = 84 -- need twice as many people!!

It doesn't work the other way -- you can't take your correlation and back-calculate the true-score correlation, because reliabilities are also estimates, and these can be wrong. The correlation you calculate is the max it could be.

---
## Most common ways to assess

- Cronbach's alpha

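```r
library(psych)
# Template: psych::alpha(dataset[, items]), where `items` holds the scale's column names
# Example: the five Agreeableness items from bfi;
# check.keys = TRUE reverse-scores negatively keyed items such as A1
psych::alpha(bfi[, c("A1", "A2", "A3", "A4", "A5")], check.keys = TRUE)
## Gives roughly the average of all possible split-half reliabilities
## High alpha is consistent with, but does not prove, a single construct
## alpha() is masked by ggplot2::alpha() when tidyverse is loaded - fix with psych::alpha()
```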

- Test-retest reliability
- Kappa or ICC
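
---

Cronbach's alpha can also be computed by hand from the item variances and the variance of the total score. This is a sketch on simulated items (each item is an assumed shared trait plus independent error); `cronbach_alpha()` is a hypothetical helper name, and the formula is the standard one for raw alpha.

```r
set.seed(1)                                         # arbitrary seed, for reproducibility
n <- 500
k <- 5
trait <- rnorm(n)                                   # shared true score
items <- sapply(1:k, function(i) trait + rnorm(n))  # item = trait + independent error

# Hypothetical helper: alpha = k / (k - 1) * (1 - sum of item variances / total variance)
cronbach_alpha <- function(items) {
  k <- ncol(items)
  item_vars <- apply(items, 2, var)
  total_var <- var(rowSums(items))
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

alpha_by_hand <- cronbach_alpha(items)
alpha_by_hand
```

With these settings the items intercorrelate at about .50, so alpha should be near .83. With complete data, the result should agree with the `raw_alpha` value reported by `psych::alpha()`.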

---
## Reliability

- if you are going to measure something, do it well

- applies to ALL IVs and DVs, and all designs

- remember this when interpreting research

---
## Types of correlations

- Many ways to get at relationship between two variables

- Statistically the different types are _almost_ exactly the same
- Exist for historical reasons

---

## Types of correlations

1. Point Biserial
    +  continuous and dichotomous
2. Phi coefficient
    + both dichotomous
3. Spearman rank order
    + ranked data (nonparametric)
4. Biserial
    + assumes the dichotomous variable reflects an underlying continuous variable

Some important exceptions to the equivalence rule

5. Tetrachoric 
    + used for 2x2 contingency table
    + useful for assessing agreement between reviewers
6. Polychoric 
    + ordinal variables (Likert scales)
    + extension of tetrachoric
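
---

The equivalence for the first three types can be checked in base R. This is a sketch on simulated data with arbitrary settings; it covers only Spearman, point-biserial, and phi, not the tetrachoric or polychoric exceptions.

```r
set.seed(7)                                 # arbitrary seed, for reproducibility
n <- 200
x <- rnorm(n)
y <- 0.4 * x + rnorm(n)
group <- as.numeric(x + rnorm(n) > 0)       # dichotomous variable coded 0/1
other <- as.numeric(y + rnorm(n) > 0)       # second dichotomous variable coded 0/1

# Spearman = Pearson computed on ranks
spearman <- cor(x, y, method = "spearman")
pearson_on_ranks <- cor(rank(x), rank(y))

# Point-biserial, written out from the group means
# (uses the population SD of y, hence the (n - 1) / n factor)
p <- mean(group)
sd_pop <- sqrt(var(y) * (n - 1) / n)
point_biserial <- (mean(y[group == 1]) - mean(y[group == 0])) / sd_pop * sqrt(p * (1 - p))
pearson_binary <- cor(y, group)

# Phi, written out from the 2x2 table of counts
tab <- table(group, other)
phi <- unname((tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1]) /
                sqrt(prod(rowSums(tab)) * prod(colSums(tab))))
pearson_two_binary <- cor(group, other)

rbind(spearman       = c(special = spearman,       pearson = pearson_on_ranks),
      point_biserial = c(special = point_biserial, pearson = pearson_binary),
      phi            = c(special = phi,            pearson = pearson_two_binary))
```

Each row should show the same value in both columns: the specialized formulas are Pearson's `$r$` applied to ranked or 0/1-coded data.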
    
---

## Statistics and eugenics

The concept of the correlation is primarily attributed to Sir Francis Galton.
* He was also the founder of the [concept of eugenics](https://www.theguardian.com/commentisfree/2019/oct/03/eugenics-francis-galton-science-ideas).

The correlation coefficient was developed by his student, [Karl Pearson](https://www.britannica.com/biography/Karl-Pearson), and adapted into the ANOVA framework by [Sir Ronald Fisher](https://statmodeling.stat.columbia.edu/2020/08/01/ra-fisher-and-the-science-of-hatred/).
* Both were prominent advocates for the eugenics movement.

---

## What do we do with this information?

* Never use the correlation or the later techniques developed on it? Of course not.

* Acknowledge this history? Certainly.

* [Understand how the perspectives](https://medium.com/swlh/is-statistics-racist-59cd4ddb5fa9) of Galton, Fisher, Pearson and others [shaped our practices](http://gppreview.com/2019/12/16/eugenics-ethics-statistical-analysis/)? We must! -- these are not set in stone, [nor are they necessarily the best way](https://www.forbes.com/sites/jerrybowyer/2016/01/06/beer-vs-eugenics-the-good-and-the-bad-uses-of-statistics/?sh=3114a0c82a14) to move forward.
  * Statistical significance was a way to avoid talking about nuance or degree.
  * "Correlation does not imply causation" was a refutation of work demonstrating associations between environment and poverty.

---

class: inverse

## Next time....

Univariate regression