You can download the Rmd file here to follow along.
Today we will review how to run models containing an interaction between a continuous and a categorical predictor. We will go over how to specify interaction terms in R, how to interpret the model output, and how to visualize the results.
Be sure to have the following packages loaded:
library(rio) # for importing
library(tidyverse) # for plotting and data wrangling
library(psych) # for calculating descriptives
library(sjPlot) # for plotting interactions
library(emmeans) # for simple slope tests
Today’s dataset was inspired by a recent study by Markowitz & Levine (2021) (the data you will be working with has been simulated). In the study, participants completed a matrix task under time pressure and then self-reported their scores. For each matrix problem that they got right, they could earn 25 cents, so it was tempting to cheat and self-report a higher score. Half of the participants shredded their worksheet before self-reporting and half of the participants handed the worksheet to the experimenter before self-reporting. Honesty scores were also self-reported from the HEXACO Personality Inventory (from 1 = extremely low honesty to 5 = extremely high honesty). The researchers hypothesized that personality and situation would interact to predict dishonest behavior.
#import data
data <- import("https://raw.githubusercontent.com/uopsych/psy612/master/labs/lab-7/data/cheating_data.csv")
data <- data %>%
  mutate(condition = factor(condition,
                            levels = c(0, 1),
                            labels = c("Non-shredder", "Shredder")))
Use str() to look at the structure of the data.
Use head() to look at the first few rows of the data.
Calculate descriptive statistics for honesty and claimed_solved.
Calculate descriptive statistics for honesty and claimed_solved, grouped by condition.
#Your code here
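One possible way to approach these steps is sketched below on a small made-up data frame. The object toy and its values are hypothetical and are not the lab data. The sketch uses base R only so that it runs without extra packages; psych::describe() and psych::describeBy() give fuller summaries.

```r
# Hypothetical toy data standing in for the lab data (values are made up)
toy <- data.frame(
  honesty        = c(2, 4, 3, 5, 1, 3),
  claimed_solved = c(7, 4, 6, 3, 8, 6),
  condition      = factor(c(0, 0, 0, 1, 1, 1),
                          levels = c(0, 1),
                          labels = c("Non-shredder", "Shredder"))
)

str(toy)   # variable types and a preview of the values
head(toy)  # first few rows

# Overall means of the two numeric variables
overall_means <- colMeans(toy[, c("honesty", "claimed_solved")])
overall_means

# Means of the same variables within each condition
group_means <- aggregate(cbind(honesty, claimed_solved) ~ condition,
                         data = toy, FUN = mean)
group_means
```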
\[\hat{Claimed_i} = \beta_0 + \beta_1Honesty \]
Run this model using lm().
# Your code here
Graph in ggplot().
# Your code here
\[\hat{Claimed_i} = \beta_0 + \beta_1Honesty + \beta_2Condition \]
So far in this course, we’ve worked only with additive effects, as shown here in this model. In this model, we are measuring the relation between honesty and how many matrices the person claimed to have solved while holding condition constant.
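To see what "holding condition constant" implies, here is a minimal sketch on simulated data. The data frame sim, its generating values, and the object names are all assumptions made for illustration; this is not the lab data. In an additive model, the predicted change per one-unit increase in honesty comes out the same in both conditions, so the two regression lines are parallel.

```r
# Simulated data for illustration only (not the lab data)
set.seed(1)
n <- 100
sim <- data.frame(
  honesty   = runif(n, 1, 5),
  condition = factor(rep(c("Non-shredder", "Shredder"), each = n / 2))
)
sim$claimed_solved <- 6 - 0.5 * sim$honesty +
  1 * (sim$condition == "Shredder") + rnorm(n)

# Additive model: no interaction term
model_add <- lm(claimed_solved ~ honesty + condition, data = sim)

# Predicted values at honesty = 2 and 3 within each condition
grid <- expand.grid(honesty = c(2, 3),
                    condition = levels(sim$condition))
grid$pred <- predict(model_add, newdata = grid)

# Change in the prediction for a one-unit increase in honesty, by condition
slope_by_condition <- tapply(grid$pred, grid$condition, diff)
slope_by_condition

# Both values match the single honesty coefficient
coef(model_add)["honesty"]
```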
Let’s run this model using lm().
# Your code here
\[\hat{Claimed} = 6.04 + 1.12Condition- .45Honesty \]
Question: interpret the model coefficients and R^2.
To calculate simple slopes, you evaluate the regression equation at specific levels of one of your predictors. In practice, you probably wouldn’t calculate simple slopes for an additive multiple regression, because the slope of each predictor is the same at every level of the other. We do it here only so that we can compare the results to the simple slopes from our interaction model later.
When condition = 0 (there isn’t a shredder - you could get caught if you cheat)
Claimed = 6.04 + 1.12Condition - .45Honesty
Claimed = 6.04 + 1.12(0) - .45Honesty
Claimed = 6.04 - .45Honesty
When condition = 1 (there is a shredder - you can’t get caught if you cheat)
Claimed = 6.04 + 1.12Condition - .45Honesty
Claimed = 6.04 + 1.12(1) - .45Honesty
Claimed = 7.16 - .45Honesty
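The substitution for the two conditions can also be checked with a few lines of R. This is only a sketch: the coefficients are typed in by hand from the additive equation, and the helper line_at_condition() is a hypothetical name introduced for illustration.

```r
# Coefficients copied by hand from the additive equation
b0          <- 6.04
b_condition <- 1.12
b_honesty   <- -0.45

# Hypothetical helper: intercept and honesty slope of the regression line
# at a given value of condition (0 = no shredder, 1 = shredder)
line_at_condition <- function(condition) {
  c(intercept = b0 + b_condition * condition,
    slope     = b_honesty)
}

line_at_condition(0)  # intercept 6.04, slope -0.45
line_at_condition(1)  # intercept 7.16, slope -0.45
```

Only the intercept moves between conditions; the honesty slope stays at -.45 because the additive model has no interaction term.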
Question: now calculate the “simple slopes” when honesty = 0, honesty = 2, and honesty = 3
# Your text here
What model do you run if you think that the honesty slope is going to be different under different conditions?
\[\hat{Claimed_i} = \beta_0 + \beta_1Honesty + \beta_2Condition + \beta_3(Honesty*Condition)\]
When the interaction term is added to the model, you are allowing for the possibility that the slope of one predictor differs at different levels of the other predictor. In this example, we can now account for the possibility that the relationship between honesty and matrices solved can differ at different levels of the condition variable.
To run an interaction model with the function lm(), enter the predictor variables you are interacting separated by an asterisk on the right side of the equation, e.g., lm(Y ~ X*Z). It is equivalent to running it spelled out, e.g., lm(Y ~ X + Z + X:Z).
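As a quick check of that equivalence, the sketch below fits both formulas to simulated data and compares the coefficients. The variables x, z, and y and their generating values are made up for illustration; they are not the lab data.

```r
# Simulated data for illustration only
set.seed(2)
n <- 80
sim <- data.frame(
  x = rnorm(n),
  z = factor(rep(c("a", "b"), each = n / 2))
)
sim$y <- 1 + 0.5 * sim$x + 0.3 * (sim$z == "b") +
  0.8 * sim$x * (sim$z == "b") + rnorm(n)

# Shorthand: main effects plus their interaction
fit_star <- lm(y ~ x * z, data = sim)

# Spelled out: the same three terms written explicitly
fit_long <- lm(y ~ x + z + x:z, data = sim)

coef(fit_star)
all.equal(coef(fit_star), coef(fit_long))  # TRUE
```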
# Your code here
Claimed = 7.27 -.85Honesty -1.18Shredder + .77Honesty*Shredder
Question: interpret the model coefficients and R^2.
Shredder = 0 (There isn’t a shredder - you could get caught if you cheat)
Claimed = 7.27 - .85Honesty - 1.18Shredder + .77Honesty*Shredder
Claimed = 7.27 - .85Honesty - 1.18(0) + .77Honesty(0)
Claimed = 7.27 - .85Honesty
Shredder = 1 (There is a shredder - you can’t get caught if you cheat)
Claimed = 7.27 - .85Honesty - 1.18Shredder + .77Honesty*Shredder
Claimed = 7.27 - .85Honesty - 1.18(1) + .77Honesty(1)
Claimed = 6.09 - .08Honesty
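The same arithmetic can be written as a small function. This is a sketch using the coefficients typed in by hand from the interaction equation; line_at_shredder() is a hypothetical helper name, not part of any package.

```r
# Coefficients copied by hand from the interaction equation
b0            <- 7.27
b_honesty     <- -0.85
b_shredder    <- -1.18
b_interaction <- 0.77

# Hypothetical helper: intercept and honesty slope of the regression line
# at a given value of Shredder (0 = no shredder, 1 = shredder)
line_at_shredder <- function(shredder) {
  c(intercept = b0 + b_shredder * shredder,
    slope     = b_honesty + b_interaction * shredder)
}

line_at_shredder(0)  # intercept 7.27, slope -0.85
line_at_shredder(1)  # intercept 6.09, slope -0.08
```

Unlike in the additive model, both the intercept and the honesty slope now change with condition, because the interaction coefficient is added to the honesty slope when Shredder = 1.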
In ggplot:
# Your code here
In sjPlot:
#plot_model(model_c, type = "pred", terms = c("honesty", "condition"))
For this section, uncomment and run this code. Take some time to look at these results and try to figure out what tests we are running and what you can conclude from them.
#emtrends(model_c, ~condition, var = "honesty")
Question: What is this testing? What can you conclude?
#emtrends(model_c, pairwise~condition, var = "honesty")
Question: What is this testing? What can you conclude?
#mylist <- list(honesty=c(1,2,3,4,5), condition=c("Non-shredder","Shredder"))
#combinations <- emmeans(model_c, ~ honesty*condition, at=mylist)
#contrast(combinations, "pairwise", by = "honesty")
Question: What is this testing? What can you conclude?
One thing you may have noticed while interpreting output is that we sometimes said “when honesty = 0.” However, for this example, honesty cannot be 0 since the scale goes from 1 to 5, making the interpretations less meaningful. Instead, we can mean center honesty. Here are two ways to center:
# data <- data %>%
#   mutate(honesty_c = honesty - mean(honesty),
#          honesty_center = scale(honesty, scale = FALSE))
# head(data)
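The two centering approaches give the same numbers, which the short sketch below checks on a made-up vector (the values are hypothetical, not the lab data). One practical difference is that scale() returns a one-column matrix rather than a plain numeric vector.

```r
# Hypothetical honesty scores for illustration only
honesty <- c(1, 2, 3, 4, 5, 3)

# Approach 1: subtract the mean by hand
by_hand <- honesty - mean(honesty)

# Approach 2: scale() with scale = FALSE centers without standardizing
by_scale <- scale(honesty, scale = FALSE)

by_hand                                   # -2 -1  0  1  2  0
all.equal(by_hand, as.numeric(by_scale))  # TRUE
is.matrix(by_scale)                       # TRUE: scale() returns a matrix
```

If a plain vector is preferred, wrapping the scale() call in as.numeric() drops the matrix structure.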
Now, we can re-run our model…
# model_d <- lm(data = data, claimed_solved ~ honesty_c*condition)
# summary(model_d)
Question: now, how can we interpret the intercept?
Let’s compare that to our model where the predictor was not centered…
# summary(model_c)
Question: what changed from the first model to the model with centered honesty?
You are interested in whether the time students spend studying (study_time) interacts with test anxiety (anxiety) to predict students’ test performance (perf).
Import the data below:
test_perf <- import("https://raw.githubusercontent.com/uopsych/psy612/master/labs/lab-7/data/test_perf.csv")
head(test_perf)
str(test_perf)
psych::describe(test_perf)
As you can see, anxiety is measured categorically (“low” or “high”) and study time ranges from 0.9 hours to 6.30 hours.
Run a model testing whether test anxiety moderates the relationship between time spent studying and test performance. Center study time so that it is more meaningful.
#Your code here
Interpret the model coefficients.
Visualize the interaction using sjPlot or ggplot.
#Your code here
Test whether each simple slope is significantly different from 0.
#Your code here
Test whether the simple slopes for low anxiety and high anxiety are significantly different from each other.
#Your code here
What do the results of these significance tests mean?