You can download the Rmd file here to follow
along.
Purpose
Today we will review how to run models containing interactions
between a continuous and categorical predictor. We will go over how to
specify interaction terms in R, how to interpret the model output, and
how to visualize the results.
Be sure to have the following packages loaded:
library(rio) # for importing
library(tidyverse) # for plotting and data wrangling
library(psych) # for calculating descriptives
library(sjPlot) # for plotting interactions
library(emmeans) # for simple slope tests
Research scenario
Today’s dataset was inspired by a recent study by Markowitz
& Levine (2021) (the data you will be working with has been
simulated). In the study, participants completed a matrix task under
time pressure and then self-reported their scores. For each matrix
problem that they got right, they could earn 25 cents, so it was
tempting to cheat and self-report a higher score. Half of the
participants shredded their worksheet before self-reporting and half of
the participants handed the worksheet to the experimenter before
self-reporting. Honesty scores were also self-reported from the HEXACO
Personality Inventory (from 1 = extremely low honesty to 5 = extremely
high honesty). The researchers hypothesized that personality and
situation would interact to predict dishonest behavior.
Import the data
#import data
data <- import("https://raw.githubusercontent.com/uopsych/psy612/master/labs/lab-7/data/cheating_data.csv")
data <- data %>% mutate(condition = factor(condition,
labels = c("Non-shredder", "Shredder")))
Explore the data
- Use str() to look at the structure of the data
- Use head() to look at the first few rows of the data
- Calculate descriptives for the variables honesty and claimed_solved
- Calculate descriptives for the variables honesty and claimed_solved, grouped by condition
# your code here
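One possible way to fill this in, using describe() and describeBy() from psych (the column names honesty and claimed_solved are assumed from their use later in the lab):

```r
# look at the structure and the first few rows
str(data)
head(data)

# descriptives for honesty and claimed_solved
describe(data[, c("honesty", "claimed_solved")])

# the same descriptives, grouped by condition
describeBy(data[, c("honesty", "claimed_solved")], group = data$condition)
```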
Simple regression
First, let’s look at what the overall relationship between honesty
and number of problems people claimed they solved looks like.
Graph in ggplot().
data %>%
ggplot(aes(x = honesty, y = claimed_solved)) +
geom_point()+
geom_smooth(method = "lm", se = FALSE)+
geom_label(x = 2, y = 5.8, label = "all") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
Second, let’s fit a simple regression model using honesty as the sole predictor of the number of problems people claimed to have solved, claimed_solved.
\[\hat{Claimed_i} = \beta_0 +
\beta_1Honesty \]
Run this model using lm().
# your code here
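A minimal sketch of one way to run this model (the model object name is arbitrary):

```r
# simple regression: claimed scores predicted by honesty
model_simple <- lm(claimed_solved ~ honesty, data = data)
summary(model_simple)
```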
Question: Write the full model with parameter estimates filled
in:
\[\hat{Claimed_i} = \]
Question: What do each of the parameter estimates mean?
Main Effects of Categorical & Continuous Variables
Let’s look at a model that investigates both honesty and condition as predictors of claimed_solved, but not their interaction.
\[\hat{Claimed_i} = \beta_0 +
\beta_1Honesty + \beta_2Condition\]
Run this model using lm().
# your code here
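One possible specification (a + between predictors requests main effects only; the model object name is arbitrary):

```r
# main effects of honesty and condition, no interaction
model_main <- lm(claimed_solved ~ honesty + condition, data = data)
summary(model_main)
```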
Question: Write the full model with parameter estimates filled
in:
\[\hat{Claimed_i} = \]
Question: What do each of the parameter estimates mean?
We could simplify the above equation to represent the relationship between honesty and the number of problems people claim to have solved separately for the non-shredder (condition = 0) and shredder (condition = 1) conditions:
- Non-shredder (Condition = 0):
\[\hat{Claimed_i} = 6.04 - 0.45Honesty + 1.12(0) = 6.04 - 0.45Honesty\]
- Shredder (Condition = 1):
\[\hat{Claimed_i} = 6.04 - 0.45Honesty + 1.12(1) = 7.16 - 0.45Honesty\]
Notice that the only difference between these two equations is their intercepts. If we graphed the two models, it would look like:
data %>%
ggplot(aes(x = honesty, y = claimed_solved)) +
geom_point() +
geom_abline(slope = -.45, intercept = 6.04, color = "red", linewidth = 1) +
geom_abline(slope = -.45, intercept = 7.16, color = "blue", linewidth = 1) +
geom_label(x = 1.5, y = 5.2, label = "shredder") +
geom_label(x = 1.5, y = 6.5, label = "no shredder") +
theme_minimal()
We are not allowing the relationship between honesty and claimed_solved to differ between the shredder and non-shredder conditions.
However, our theory predicts that people will act differently depending on the condition they were in. To investigate this, we need to include the interaction effect to examine whether the relationship between honesty and the number of problems people claimed to have solved differs depending on whether people were in the non-shredder or shredder condition.
Continuous x Categorical Interaction
Visualization
First, let’s graph the continuous X categorical interaction between
honesty and condition.
In ggplot:
# on a single graph
data %>%
ggplot(aes(x = honesty, y = claimed_solved)) +
geom_point()+
geom_smooth(method = "lm", se = FALSE, aes(color = condition)) +
# scale_color_manual(values = c("red", "blue")) +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
# on two separate graphs
data %>%
ggplot(aes(x = honesty, y = claimed_solved)) +
geom_point()+
geom_smooth(method = "lm", se = FALSE)+
facet_wrap(~condition) +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
It certainly looks like there could be a significant interaction
effect. Let’s run the multiple regression model to examine it
further.
Centering
Notice that in this case, honesty has been measured on a 1-to-5 scale. If we do not center honesty first, the model intercept will correspond to the predicted claimed_solved score when honesty = 0, a value that falls outside the range of possible scores.
Typically, continuous predictors are centered prior to running the analysis. This guarantees that the intercept corresponds to a meaningful value: the predicted score when the predictor is equal to its mean.
To center the continuous predictor:
# You can either...
data <- data %>%
mutate(honesty_c = honesty - mean(honesty), # subtract the mean, or
honesty_center = scale(honesty, scale = FALSE)) # use the scale() function
Running the Interaction Model
Run a multiple regression model predicting claimed_solved from honesty_c, condition, and the interaction between the two.
\[\hat{Claimed_i} = \beta_0 + \beta_1Honesty_C + \beta_2Condition + \beta_3Honesty_C \times Condition\]
# Your code here
One more way of plotting the interaction, after you’ve run the interaction model, is with sjPlot:
model_int <- lm(claimed_solved ~ honesty_c*condition, data = data)
plot_model(model_int, type = "pred", terms = c("honesty_c", "condition"))
# type = "pred" means a plot of the predictor vs the predicted values of y
# the first variable listed in the terms goes on the x-axis, the second variable gets its levels separated by color
Question: Write the full model with parameter estimates filled
in:
\[\hat{Claimed_i} = \]
Question: What do each of the parameter estimates mean?
Testing Significance of Simple Slopes
The simple slopes are the slopes representing the relationship between predictor 1 and Y at specific levels of predictor 2. We can calculate simple slopes to examine the relationship between honesty_c and claimed_solved at different levels of condition.
To get the simple slopes, we can use the emtrends() function from the emmeans package.
There are different questions we can ask by calculating simple slopes
and performing hypothesis tests with them.
Q1: Is honesty a significant predictor of number of problems people
claim to have solved at each level of condition (non-shredder &
shredder conditions)?
emtrends(model_int, ~condition, var = "honesty_c")
## condition honesty_c.trend SE df lower.CL upper.CL
## Non-shredder -0.8532 0.136 160 -1.121 -0.586
## Shredder -0.0751 0.130 160 -0.333 0.182
##
## Confidence level used: 0.95
Q2: Is the slope for the non-shredder condition (-0.85) significantly
different from the slope for the shredder condition (-0.08)?
emtrends(model_int, pairwise~condition, var = "honesty_c")
## $emtrends
## condition honesty_c.trend SE df lower.CL upper.CL
## Non-shredder -0.8532 0.136 160 -1.121 -0.586
## Shredder -0.0751 0.130 160 -0.333 0.182
##
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## (Non-shredder) - Shredder -0.778 0.188 160 -4.137 0.0001
This is the same as the test of the interaction effect from the
overall model.
Q3: Is there a significant difference in the number of problems
people claimed to have solved between the non-shredder and shredder
conditions at different levels of honesty?
Because honesty is a continuous variable, we first have to choose specific values of honesty at which to examine the difference between the non-shredder and shredder conditions.
chosen_values <- list(honesty_c=c(-2, -1, 0, 1, 2), condition = c("Non-shredder","Shredder"))
combinations <- emmeans(model_int, ~ honesty_c*condition, at=chosen_values)
#from emmeans
contrast(combinations, "pairwise", by = "honesty_c")
## honesty_c = -2:
## contrast estimate SE df t.ratio p.value
## (Non-shredder) - Shredder 0.437 0.424 160 1.032 0.3038
##
## honesty_c = -1:
## contrast estimate SE df t.ratio p.value
## (Non-shredder) - Shredder -0.341 0.271 160 -1.259 0.2099
##
## honesty_c = 0:
## contrast estimate SE df t.ratio p.value
## (Non-shredder) - Shredder -1.119 0.194 160 -5.775 <.0001
##
## honesty_c = 1:
## contrast estimate SE df t.ratio p.value
## (Non-shredder) - Shredder -1.897 0.269 160 -7.043 <.0001
##
## honesty_c = 2:
## contrast estimate SE df t.ratio p.value
## (Non-shredder) - Shredder -2.675 0.422 160 -6.336 <.0001
Minihacks
You are interested in whether the time students spend studying (study_time) interacts with test anxiety (anxiety) to predict students’ test performance (perf).
Import the data below:
test_perf <- import("https://raw.githubusercontent.com/uopsych/psy612/master/labs/lab-7/data/test_perf.csv")
head(test_perf)
str(test_perf)
psych::describe(test_perf)
As you can see, anxiety is measured categorically (“low” or “high”) and study time ranges from 0.90 to 6.30 hours.
Minihack 1
Run a model testing whether test anxiety moderates the relationship
between time spent studying and test performance. Center study time so
that it is more meaningful.
#Your code here
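One possible approach, mirroring the centering and interaction steps from the lab above (variable names taken from the import; the model object name is arbitrary):

```r
# center study time, then fit the moderation model
test_perf <- test_perf %>%
mutate(study_time_c = study_time - mean(study_time))

model_test <- lm(perf ~ study_time_c * anxiety, data = test_perf)
summary(model_test)
```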
Interpret the model coefficients.
Minihack 2
Visualize the interaction using sjPlot or ggplot.
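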
#Your code here
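A sketch of the sjPlot version, following the plot_model() call shown earlier (the model is refit here so the chunk is self-contained):

```r
# refit the interaction model with study time centered
test_perf <- test_perf %>%
mutate(study_time_c = study_time - mean(study_time))
model_test <- lm(perf ~ study_time_c * anxiety, data = test_perf)

# predicted performance across centered study time, colored by anxiety level
plot_model(model_test, type = "pred", terms = c("study_time_c", "anxiety"))
```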
Minihack 3
Test whether each simple slope is significantly different from 0.
#Your code here
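One possible approach, mirroring Q1 above (the model is refit so the chunk stands alone; wrapping emtrends() in test() adds t-ratios and p-values for each slope against zero):

```r
# refit the interaction model with study time centered
test_perf <- test_perf %>%
mutate(study_time_c = study_time - mean(study_time))
model_test <- lm(perf ~ study_time_c * anxiety, data = test_perf)

# simple slope of study time at each anxiety level, tested against 0
test(emtrends(model_test, ~anxiety, var = "study_time_c"))
```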
Test whether the simple slopes for low anxiety and high anxiety are
significantly different from each other.
#Your code here
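One possible approach, mirroring Q2 above (the model is refit so the chunk stands alone; the pairwise contrast compares the two simple slopes directly):

```r
# refit the interaction model with study time centered
test_perf <- test_perf %>%
mutate(study_time_c = study_time - mean(study_time))
model_test <- lm(perf ~ study_time_c * anxiety, data = test_perf)

# compare the simple slopes for low vs. high anxiety
emtrends(model_test, pairwise ~ anxiety, var = "study_time_c")
```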
What do the results of these significance tests mean?