Purpose

Today’s lab will guide you through the process of conducting a One-Sample t-test and an Independent Samples t-test. For each test, we will first go through the process of conducting the tests using arithmetic and probability distributions in R. Then we will compare and contrast t-test functions from the {stats} and {lsr} packages. For the independent samples t-test, we also discuss a way to quickly plot the data using {ggpubr}.

To quickly navigate to the desired section, click one of the following links:

  1. One-Sample t-test
  2. Independent Samples t-test
  3. Minihacks

As always, there are minihacks to test your knowledge.


The Data

Today we will be analyzing data from Fox and Guyer’s (1978) anonymity and cooperation study. The data is included in the {carData} package and you can see information about the data set using ?Guyer. Twenty groups of four participants each played 30 trials of the the prisoner’s dilemma game. The number of cooperative choices (cooperation) made by each group were scored out of 120 (i.e., cooperative choices made by 4 participants over 30 trials). The groups either made decisions publicly or privately (condition) and groups were either comprised of all women or all men (sex).

Run the following code to load the data into your global environment.

# load data 
data <- Guyer

One-Sample t-test

A One-Sample t-test tests whether some obtained sample mean is significantly different from a value specified by the researcher. Looking at the Guyer data, we might hypothesize that groups would cooperate on more than 50% of the trials (i.e., groups would cooperate on more than 60 trials). The alternative hypothesis (\(H_{1}\)) would therefore be that the population mean of cooperation is not equal to 60 (\(\mu \neq 60\)), whereas the null hypothesis (\(H_{0}\)) would be that the population mean of cooperation is equal to 60 (\(\mu = 60\)).

You might have noticed that our alternative hypothesis was that the mean level of cooperation was not equal to 60 (\(\mu \neq 60\)) rather than that the mean level of cooperation was greater than 60 (\(\mu > 60\)). This is because we plan to run a two-tailed significance test. A two-tailed significance tests allows us to declare significance if the mean cooperation is sufficiently greater than 60 and if the mean cooperation is sufficiently less than 60. Two-tailed tests are far more common than one-tailed tests, and you should generally default to running a two-tailed test.

Using Arithmetic

Most statistical tests look at a ratio of signal to noise. The one-sample t-test is no different.

Let’s look at the equation.

\[t = \frac{\bar X - \mu}{\hat \sigma / \sqrt{N}}\]

The numerator (\(\bar x - \mu\)) is the “signal”, or the difference between our sample mean and the value we are comparing it against, and the denominator (\(\hat \sigma / \sqrt{N}\)) is the “noise”, or the inaccuracy (standard error) of our estimate. If we have more “signal” in our data, our t-statistic increases. If we have more “noise” in our data, our t-statistic decreases.

To make our lives easier, we will first calculate some descriptive statistics: (1) the mean of cooperation in our sample (coop_mean), (2) the standard deviation of cooperation in our sample (coop_sd), and (3) the number of unique values in our data (coop_n). We should also calculate the degrees of freedom for our data (coop_df). For a one-sample t-test, the degrees of freedom is equal to \(N - 1\).

# calculate descriptives
coop_mean <- mean(data$coop)
coop_sd   <- sd(data$coop)
coop_n    <- length(data$coop)

# calculate degrees of freedoom
coop_df   <- coop_n - 1

# look at the values
c("mean" = coop_mean,
  "sd"   = coop_sd,
  "n"    = coop_n,
  "df"   = coop_df)
##     mean       sd        n       df 
## 48.30000 14.28691 20.00000 19.00000

As we can see, the mean value of cooperation is 48.30.

It looks like the average cooperation was actually less than 60. But this could simply be due to chance. Let’s calculate how probable obtaining a result at least as extreme as we did here is when the null hypothesis is, in fact, true.

To do so, we will first calculate the t-statistic, which is as easy as inserting our acquired descriptive statistics into the equation above.

\[\frac{48.3 - 60}{14.29 / \sqrt{20}} = -3.66\]

Of course, we can also calculate this in R using the following code:

# calculate the t-statistic
coop_t <- (coop_mean - 60) / (coop_sd / sqrt(coop_n))

# print the t-statistic
c("t_stat" = coop_t)
##    t_stat 
## -3.662373

The t-statistic (-3.662473) is negative because our acquired mean was less than the mean is was tested against. If the acquired mean was greater than the value it was tested against, it would be positive. It is completely fine to have a negative t-statistic and you can treat it exactly the same as a positive t-statistic (in fact, some researchers will drop the negative sign when reporting the results of a negative t-test).

Next, we can use the pt() function to calculate the probability of getting a t-statistic of -3.662473 or less from a t-distribution with 19 degrees of freedom when the null hypothesis is true. The pt() function takes three arguments: (1) q (the t-statistic), (2) df (the degrees of freedom), and (3) lower.tail (whether we want to find the cumulative probability for the upper or lower part of the distribution). Below I’ve taken the absolute value of our t-statistic (using abs()) so that we can use lower.tail = FALSE regardless of whether our t-statistic is positive or negative.

To calculate a two-tailed p-value using our t-statistic, we would run the following code:

# calculate p-value
coop_p <- pt(q = abs(coop_t), df = coop_df, lower.tail = FALSE) * 2

#print p-value
c("p_val" = coop_p)
##       p_val 
## 0.001655805

We get a p-value of .001655805. In other words, there was an approximately .17% chance of getting a t-statistic equal to our less than the t-statistic we got when the null hypothesis is true.

The result is multiplied by 2 because we wanted to calculate a two-sided significance test and wanted to be able to claim significance irrespective of whether the t-statistic was above or below the 50% value of 60.

We could have run a one-sided t-test, but, as shown in the image below, we would not have been able to conclude whether our result was significant or not.

We can also calculate an effect size for our statistic. Cohen’s d is a very popular measure of effect size for the t-test and it tells you the size of your effect (the difference between the acquired mean and the value specified) in standardized units. To calculate Cohen’s d, we simply divide the difference in the means by the standard deviation.

# calculate Cohen's d
coop_d <- (coop_mean - 60) / coop_sd

# print Cohen's d
coop_d
## [1] -0.8189315

Typically, we present a Cohen’s d value as a positive number. As with the t-statistic, Cohen’s d being positive or negative just reflects whether the acquired mean was greater than the specified mean or less than the specified mean.

The thresholds for a small, medium, and large effect size are shown below.

d.value interpretation
0.2 small
0.5 medium
0.8 large

Interpreting our Cohen’s d value using the threshold from the table, we might say that there was a large difference between our acquired mean (48.30) and the mean we were comparing it against (60.00).

Finally, we can also derive the 95% confidence interval for our sample mean by running the following code:

# calculate 95% confidence interval
coop_low  <- coop_mean + ((coop_sd / sqrt(coop_n)) * qt(.025, df = coop_df))
coop_up   <- coop_mean + ((coop_sd / sqrt(coop_n)) * qt(.975, df = coop_df))

# print 95% confidence interval
c("95% CI Lower" = coop_low,
  "95% CI Upper" = coop_up)
## 95% CI Lower 95% CI Upper 
##     41.61352     54.98648

The code may seem like a bit of a mess, but we are just taking our acquired mean (coop_mean) and adding to it the standard error of the mean ((coop_sd / sqrt(coop_n)) multiplied by the critical t-statistic for a t-distribution with 19 degrees of freedom (qt(.025, df = coop_df) and qt(.975, df = coop_df), respectively). Our results of the 95% confidence interval means that, if we derived 95% confidence intervals for 100 different samples, 95% of them would contain the true population mean. In other words, there is a 95% chance that our confidence interval contains the true population mean.

The image above shows our acquired mean with a 95% confidence interval.

Using Functions

There are two useful functions for conducting a one-sample t-test in R. The first, is called t.test() and it is automatically loaded as part of the {stats} package when you first open R. To run a one-sample t-test using t.test(), you provide the function the column of the data you are interested in (e.g., x = data$cooperation) and the mean value you want to compare the data against (e.g., mu = 60).

t.test(x = data$cooperation, mu = 60)
## 
##  One Sample t-test
## 
## data:  data$cooperation
## t = -3.6624, df = 19, p-value = 0.001656
## alternative hypothesis: true mean is not equal to 60
## 95 percent confidence interval:
##  41.61352 54.98648
## sample estimates:
## mean of x 
##      48.3

As part of the output, you are provided the mean of cooperation, the t-statistic, the degrees of freedom, the p-value, and the 95% confidence interval. The values outputted by t.test() should be exactly the same as the values we calculated. Unfortunately, we did not get a measure of the effect size.

The oneSampleTTest() function from the the {lsr} package includes Cohen’s d automatically, but the downside is that you have to load the package separately.

oneSampleTTest(x = data$cooperation, mu = 60)
## 
##    One sample t-test 
## 
## Data variable:   data$cooperation 
## 
## Descriptive statistics: 
##             cooperation
##    mean          48.300
##    std dev.      14.287
## 
## Hypotheses: 
##    null:        population mean equals 60 
##    alternative: population mean not equal to 60 
## 
## Test results: 
##    t-statistic:  -3.662 
##    degrees of freedom:  19 
##    p-value:  0.002 
## 
## Other information: 
##    two-sided 95% confidence interval:  [41.614, 54.986] 
##    estimated effect size (Cohen's d):  0.819

As you can see from the output, oneSampleTTest() provides you all of the information that t.test() did, but it also includes Cohen’s d.

Interpretation and Write-Up

An example of how to write-up a one-sample t-test is included below.

"Average cooperation (M = 48.30, SD = 14.29) was substantially less than 60, t(19) = 3.66, p = .002, 95% CI [41.61, 54.97], d = 0.82.


Independent Samples t-test

Cool. But what if we wanted to compare the mean level of cooperation when decisions were made publicly versus the mean level of cooperation when decisions were made anonymously. In that case, we would use an independent samples t-test, which compares the means of, well, two independent samples. For instance, I might suspect that the mean cooperation will be greater when decisions about cooperation are made publicly rather than when decisions about cooperation are made anonymously. The alternative hypothesis (\(H_{1}\)), in that case, would be that the mean cooperation in the public group is not equal to the mean cooperation in the anonymous group (\(\mu_1 \neq \mu_2\)). The null hypothesis (\(H_{0}\)) would be that the means of the two groups are equal (\(\mu_1 = \mu_2\)).

Using Arithmetic

To calculate an independent samples t-test by hand, we will want to start by splitting our data frame into two data frames according to whether the groups made decisions anonymously or publicly.

data_public <- data %>%
  filter(condition == "public")

data_anonymous <- data %>%
  filter(condition == "anonymous")

Then, as we did for the one-sample t-test, we want to calculate descriptive statistics for both of the conditions.

# calculate group1 descriptives
group1_mean  <- mean(data_public$cooperation)
group1_sd    <- sd(data_public$cooperation)
group1_var   <- group1_sd^2
group1_n     <- length(data_public$cooperation)

# calculate group2 descriptives
group2_mean  <- mean(data_anonymous$cooperation)
group2_sd    <- sd(data_anonymous$cooperation)
group2_var   <- group2_sd^2
group2_n     <- length(data_anonymous$cooperation)

# print the descriptive statistics
list("public"    = c("mean" = group1_mean,
                     "sd"   = group1_sd,
                     "var"  = group1_var,
                     "n"    = group1_n),
     "anonymous" = c("mean" = group2_mean,
                     "sd"   = group2_sd,
                     "var"  = group2_var,
                     "n"    = group2_n))
## $public
##      mean        sd       var         n 
##  55.70000  14.84775 220.45556  10.00000 
## 
## $anonymous
##      mean        sd       var         n 
## 40.900000  9.421606 88.766667 10.000000

As we can see from the statistics, the number of participants in each condition was equal (\(n_1 = 10\); \(n_2 = 10\)). The mean (55.70) and standard deviation (14.85) of cooperation in the public condition were both larger than the mean (40.90) and standard deviation (9.42) of cooperation in the anonymous group.

To calculate the t-statistic, we will use a similar equation to that used for the one-sample t-test, but the numerator will be the difference between the condition means (\(\bar X_1 - \bar X_2\)) and the denominator will be the standard error of the difference between the means (\(\hat \sigma_p \sqrt{\frac{1}{N_1} + \frac{1}{N_2}}\)).

\[\frac{\bar X_1 - \bar X_2}{\hat \sigma_p \sqrt{\frac{1}{N_1} + \frac{1}{N_2}}}\]

The standard error of the difference between the means is the pooled standard deviation (\(\hat \sigma_p\)) decreased by the sample size (\(\sqrt{\frac{1}{N_1} + \frac{1}{N_2}}\)). The pooled standard deviation is calculated using the following equation:

\[\hat \sigma_p = \frac{((N_1 - 1) \times \hat \sigma_1^2) + ((N_2 - 1) \times \hat \sigma_2^2)}{(N_1 - 1) + (N_2 - 1)}\]

Below is the code for calculating the t-statistic according to the equations above.

# calculate the t-statistic
mean_diff    <- group1_mean - group2_mean
group1_w     <- group1_n - 1
group2_w     <- group2_n - 1
sd_pooled    <- sqrt(((group1_w * group1_var) + (group2_w * group2_var)) / (group1_w + group2_w))
n_correction <- sqrt((1 / group1_n) + (1 / group2_n))
se           <- sd_pooled * n_correction
t_stat       <- mean_diff / se

# print the t-statistic
c("t_stat" = t_stat)
##   t_stat 
## 2.661499

Next, we will calculate a p-value for the t-statistic. The code for calculating the p-value is exactly the same as before, but we will use \(\mbox{df} = N-2\) for the degrees of freedom instead of \(\mbox{df} = N-1\) because our statistic contains two means.

# calculate degrees of freedom
df <- group1_n + group2_n - 2

# print out the degrees of freedom
c("df" = df)
## df 
## 18

Now that we have our t-statistic and degrees of freedom, we can calculate the p-value.

# calculate p-value
p_val <- pt(q = abs(t_stat), df = df, lower.tail = FALSE) * 2

# print p-value
c("p_val" = p_val)
##      p_val 
## 0.01589758

Looks like the probability of obtaining a difference in means of this size or greater, when the null hypothesis is true, is 1.59%.

The code for calculating Cohen’s d is a bit more complex than what we used above, but, again, we are simply deriving the difference in means in standardized units.

# calculate cohen's d
d_val <- mean_diff / sqrt((group1_sd^2 + group2_sd^2) / 2)

# print cohen's d
c("d_val" = d_val)
##    d_val 
## 1.190259

Looks like cooperation when decisions were made publicly was far higher than cooperation when decisions were made anonymously.

We can also calculate the 95% confidence interval for this difference.

# calculate upper and lower limits of 95% CI
ci_low <- mean_diff + (se * qt(.025, df = df))
ci_up  <- mean_diff + (se * qt(.975, df = df))

# print 95% confidence interval
c("95% CI Lower" = ci_low,
  "95% CI Upper" = ci_up)
## 95% CI Lower 95% CI Upper 
##     3.117245    26.482755

Again, we can be 95% certain that our confidence interval contains the true population mean.

Using Functions

As with the one-sample t-test, we can use the t.test() function from the the built-in {stats} package to conduct an independent samples t-test.

t.test(data_public$cooperation, data_anonymous$cooperation, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  data_public$cooperation and data_anonymous$cooperation
## t = 2.6615, df = 18, p-value = 0.0159
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   3.117245 26.482755
## sample estimates:
## mean of x mean of y 
##      55.7      40.9

Since we specified var.equal = TRUE in the t.test() function, it calculated a Student’s t-test. If we had specified var.equal = FALSE (the default for both t.test() and independentSamplesTTest()) we would have calculated a Welch’s t-test. Unlike Student’s t-test, Welch’s t-test does not assume equal variances between groups. However, when the variances and sample sizes are equal, Welch’s t-test will produce a t-statistic that is nearly indistinguishable from the t-statistic produced by Student’s t-test. There is really no need to ever use Student’s t-test (other than cases with very small sample sizes).

We can also use the independentSamplesTTest function in the {lsr} package to get the output with Cohen’s d included.

independentSamplesTTest(formula   = cooperation ~ condition, 
                        data      = data, 
                        var.equal = TRUE)
## 
##    Student's independent samples t-test 
## 
## Outcome variable:   cooperation 
## Grouping variable:  condition 
## 
## Descriptive statistics: 
##             anonymous public
##    mean        40.900 55.700
##    std dev.     9.422 14.848
## 
## Hypotheses: 
##    null:        population means equal for both groups
##    alternative: different population means in each group
## 
## Test results: 
##    t-statistic:  -2.661 
##    degrees of freedom:  18 
##    p-value:  0.016 
## 
## Other information: 
##    two-sided 95% confidence interval:  [-26.483, -3.117] 
##    estimated effect size (Cohen's d):  1.19

The formula argument for independentSamplesTTest may look unfamiliar to some of you. It uses formula syntax, which you will use a lot when you start analyzing data using multiple regression models. In brief, whatever comes on the left of the tilde (~) is the dependent variable (in this case, cooperation) and whatever comes on the right of the tilde is the independent variable (in this case, condition).

Interpretation and Write-Up

A proper write-up for our Independent Sample t-test would be:

"Cooperation in the public condition (M = 55.70, SD = 14.84) was much greater than cooperation in the anonymous condition (M = 40.90, SD = 9.42), t(18) = 2.66, p = .018, 95% CI [3.12, 26.64], d = 1.19.

Plotting an Independent Samples t-test

We can quickly plot our means and 95% confidence interval using the ggerrorplot() function in the {ggpubr} package.

# create plot
ggerrorplot(data, 
            x         = "condition", 
            y         = "cooperation", 
            desc_stat = "mean_ci", 
            color     = "condition", 
            ylab      = "Cooperation") +
  # add results of the t.test
  stat_compare_means(method = "t.test")


Minihacks

You are welcome to work with a partner or in a small group of 2-3 people. Please feel free to ask the lab leader any questions you might have!

Minihack 1: Trouble Shooting

My advisor told me that I had multiple errors in my code. She told me the line numbers where the errors are, but she thought it would be a good learning experience for me to try to solve the errors myself. I need your help. Fix the errors in the following chunks of code.

Error 1:

I am trying to calculate a 95% confidence interval for a mean, but the interval I get doesn’t even include my estimated mean.

ci_lower <- 20 + ((5 / sqrt(120)) * qnorm(.025, 100))
ci_upper <- 20 + ((5 / sqrt(120)) * qnorm(.975, 100))

Error 2:

I am trying to calculate the p-value for a t-statistic of 3.10, but I keep getting a p-value that is greater than 1 (which my advisor says is quite unlikely).

# calculate p-value
p_val <- pt(q = 3.10, df = 19, lower.tail = TRUE) * 2

# print p_value
p_val

Error 3:

I prefer t.test() over independentSamplesTTest() because I’m an R purist, and I wanted to calculate a Cohen’s d value using the t2d() function in the {psych} package. I don’t think it is working properly. I have a t-statistic of 5.12and the sample sizes for my two groups are 15 and 22.

# load psych
library(psych)

# calculate cohen's d
t2d(t = 5.12, n = 15, n1 = 22)

Minihack 2: Using Arithmetic

You are reviewing a manuscript that claims people who were assigned to use a light editor theme in R Studio wrote better code than people who were assigned to use a dark editor theme in R Studio. Better code was operationalized as performance on a coding test (scored out of 25). The researchers state that the difference is significant at p = .023 (two-tailed), but you have your doubts. The researchers are not willing to share their data with you, but you glean the following information from the manuscript.

\(\hat{\mu}_{DarkTheme}\) = 16.56

\(\hat{\mu}_{LightTheme}\) = 20.01

\(\hat{\sigma}_{DarkTheme}\) = 5.44

\(\hat{\sigma}_{LightTheme}\) = 3.12

\(n_{DarkTheme}\) = 8

\(n_{LightTheme}\) = 19

  1. Calculate the t-statistic for the difference between the two means. Assume the variances of the two groups are equal.
# your code here
  1. Is the difference between the two means significant? Note. You will have to calculate the degrees of freedom.
# your code here
  1. Why do you think the researchers got the p-value they did?
# your code here

Minihack 3: Using Functions

The same researchers of the editor theme manuscript submit another paper on the difference in coding abilities between Mac and PC users. They argue that PC users are better at coding than Mac users. Again, coding ability was operationalized as performance on a coding test scored out of 25. Despite a growing fear that you are becoming Reviewer 2, you acquire their data through GitHub to check their analyses.

Run the following lines of code to load the data:

# set seed for reproducability
set.seed(42)

# load data
data_os <- data.frame("id"      = 1:1e5,
                      "os"      = c(rep("pc", 2e4), 
                                    rep("mac", 8e4)),
                      "ability" = c(rnorm(2e4, 15, 90),
                                    rnorm(8e4, 13.9, 1)))
  1. Perform an Independent Samples t-test on the data (preferably using a built in function).
# your code here
  1. After running the code, you think that maybe you should have run a Welch’s t-test instead of a Student’s t-test. Investigate your chosen function to figure out how to perform a Welch’s t-test on the data. Do you find a different result? Why or why not?
# your code here
  1. Let’s give the researchers the benefit of the doubt and assume the variances of both groups are equal. Is there a meaningful difference between Mac and PC users? Use a statistic to provide support for your answer.
# your code here
  1. Plot the data using ggerrorplot(). Does this support your answer to question 3?
# your code here