Today we will review how to run models containing interactions between two continuous predictors. We will go over how to specify interaction terms in R, how to interpret the model output, and how to visualize the results.
Be sure to have the following packages installed and loaded:
library(tidyverse) # for plotting and data wrangling
library(rio) # for importing data
library(psych) # for descriptives
library(olsrr) # for diagnostics
library(broom) # for tidying model output
library(reghelper) # for calculating simple slopes
library(sjPlot) # for plotting simple slopes
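If any of these packages are missing on your machine, you can install them all at once (a one-time step):

install.packages(c("tidyverse", "rio", "psych", "olsrr", "broom", "reghelper", "sjPlot"))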
The tendency to suppress the expression of emotion when regulating one’s emotions is associated with receiving less social support from friends (Srivastava et al., 2009). A graduate student is interested in examining how this relationship differs depending on the extent to which the emotion regulator holds individualist as compared to collectivist values. To test this, participants complete the following measures:

- socsup = perceived social support (the outcome, Y)
- suppression = expressive suppression (the predictor, X)
- indiv = individualism (the moderator, Z)
social <- import("https://raw.githubusercontent.com/uopsych/psy612/master/labs/lab-7/data/social_support.csv")
str(social)
## 'data.frame': 72 obs. of 4 variables:
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ socsup : int 5 2 4 3 5 3 1 3 1 4 ...
## $ suppression: int 1 4 1 2 1 4 5 3 4 1 ...
## $ indiv : int 5 4 5 4 4 5 3 5 3 4 ...
describe(social)
\[\hat{Y} = b_0 + b_1X + b_2Z + b_3(XZ)\]
\[\hat{SocSupport} = b_0 + b_1Suppression + b_2Individualism + b_3(Suppression \times Individualism)\]
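It can help to rearrange the general equation to see that the slope of X depends on the value of Z (this is the same logic we will use for simple slopes later):

\[\hat{Y} = (b_0 + b_2Z) + (b_1 + b_3Z)X\]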
Question: What does b0 represent?
Question: What does b1 represent?
Question: What does b2 represent?
Question: What does b3 represent?
Before running the model, we will mean-center both predictors. There are two reasons for centering:

- centering addresses issues of multicollinearity between the predictors and their interaction term
- centering makes the lower-order model coefficients more interpretable (they become effects at the mean of the other predictor, rather than at zero)
We can center the predictors using the mutate() and scale() functions. By default, scale() z-scores variables, but we can get it to perform just mean centering by setting the argument scale = FALSE. Remember, a z-score is \(\frac{X - \bar{X}}{\sigma_X}\), or a mean-centered score divided by the SD; by setting scale = FALSE we’re telling R to center but not scale (i.e., don’t divide by the SD). The defaults are scale(x, center = TRUE, scale = TRUE); to be very explicit, we’ll set center = TRUE and scale = FALSE. One wrinkle: scale() actually changes the class of the resulting variable to matrix, so we’ll use as.numeric() to force the resulting centered variables back to type numeric.

social <- social %>%
  mutate(suppression_c = as.numeric(scale(suppression, center = TRUE, scale = FALSE)),
         indiv_c = as.numeric(scale(indiv, center = TRUE, scale = FALSE)))
head(social)
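As a quick sanity check (this snippet is ours, not part of the original lab workflow), the centered variables should have means of approximately zero:

# means of the centered predictors should be ~0
social %>%
  summarise(mean_supp_c = mean(suppression_c),
            mean_indiv_c = mean(indiv_c))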
In R, the quickest way to specify this model is lm(Y ~ X*Z), which tells R to include X, Z, and XZ (the interaction term). It is equivalent to spelling the model out term by term, e.g., lm(Y ~ X + Z + X:Z). If we wanted just the interaction, we would run a model with only the last term, e.g., lm(Y ~ X:Z)…but DON’T DO THIS! Omitting the lower-order effects forces those slopes to zero and makes the interaction coefficient misleading.

model_int <- lm(socsup ~ suppression_c*indiv_c, data = social)
summary(model_int)
##
## Call:
## lm(formula = socsup ~ suppression_c * indiv_c, data = social)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.79330 -0.81716 0.02792 1.00532 2.45494
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.976971 0.141778 20.997 < 0.0000000000000002 ***
## suppression_c -0.009945 0.106094 -0.094 0.925591
## indiv_c 0.161361 0.112255 1.437 0.155175
## suppression_c:indiv_c -0.303784 0.083044 -3.658 0.000497 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.197 on 68 degrees of freedom
## Multiple R-squared: 0.1874, Adjusted R-squared: 0.1515
## F-statistic: 5.227 on 3 and 68 DF, p-value: 0.002632
Question: What can we conclude? Is there an interaction between suppression and individualism predicting social support?
Let’s look at the output again, this time focusing on what each coefficient represents:

##
## Call:
## lm(formula = socsup ~ suppression_c * indiv_c, data = social)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.79330 -0.81716 0.02792 1.00532 2.45494
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.976971 0.141778 20.997 < 0.0000000000000002 ***
## suppression_c -0.009945 0.106094 -0.094 0.925591
## indiv_c 0.161361 0.112255 1.437 0.155175
## suppression_c:indiv_c -0.303784 0.083044 -3.658 0.000497 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.197 on 68 degrees of freedom
## Multiple R-squared: 0.1874, Adjusted R-squared: 0.1515
## F-statistic: 5.227 on 3 and 68 DF, p-value: 0.002632
Question: What does b0 represent?
Question: What does b1 represent?
Question: What does b2 represent?
Question: What does b3 represent?
For comparison, here is the same model fit with the uncentered predictors. Notice that the interaction coefficient is unchanged; only the intercept and the lower-order coefficients (and their tests) differ:

##
## Call:
## lm(formula = socsup ~ suppression * indiv, data = social)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.79330 -0.81716 0.02792 1.00532 2.45494
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.15430 0.91326 -0.169 0.866336
## suppression 0.88875 0.27286 3.257 0.001757 **
## indiv 1.06850 0.27183 3.931 0.000201 ***
## suppression:indiv -0.30378 0.08304 -3.658 0.000497 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.197 on 68 degrees of freedom
## Multiple R-squared: 0.1874, Adjusted R-squared: 0.1515
## F-statistic: 5.227 on 3 and 68 DF, p-value: 0.002632
# using the {olsrr} package
ols_vif_tol(model_int)
Question: Do we have any multicollinearity issues?
# piping the uncentered model to the tolerance function
lm(socsup ~ suppression*indiv, data = social) %>%
ols_vif_tol()
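As a refresher on what these diagnostics mean (standard definitions, not specific to {olsrr}): for predictor \(j\), tolerance is the proportion of its variance not explained by the other predictors, and VIF is its reciprocal:

\[Tolerance_j = 1 - R_j^2, \qquad VIF_j = \frac{1}{1 - R_j^2}\]

where \(R_j^2\) comes from regressing predictor \(j\) on all of the other predictors.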
To see where the multicollinearity comes from, let’s look at the correlations among suppression, indiv, and the interaction term (the product of these two variables), which we’ll call suppression_x_indiv…

social %>%
  mutate(suppression_x_indiv = suppression * indiv) %>% # create the interaction term
  select(suppression,
         indiv,
         suppression_x_indiv) %>% # select the 3 variables we want to correlate
  cor() # generate a correlation matrix of these 3 variables
## suppression indiv suppression_x_indiv
## suppression 1.00000000 -0.09892416 0.6667002
## indiv -0.09892416 1.00000000 0.6095234
## suppression_x_indiv 0.66670020 0.60952336 1.0000000
Question: Do you see a problem here?
Now let’s look at the same correlation matrix using the centered predictors:

social %>%
mutate(suppression_x_indiv_c = suppression_c * indiv_c) %>% # create the interaction term
select(suppression_c,
indiv_c,
suppression_x_indiv_c) %>% # select the 3 variables we want to correlate
cor() # generate a correlation matrix of these 3 variables
## suppression_c indiv_c suppression_x_indiv_c
## suppression_c 1.00000000 -0.098924158 0.055129871
## indiv_c -0.09892416 1.000000000 -0.009086891
## suppression_x_indiv_c 0.05512987 -0.009086891 1.000000000
To calculate simple slopes by hand, first we’ll turn the coefficients from our model object (model_int) into a nicely formatted data frame using broom::tidy()…

model_int_coefs <- model_int %>%
  tidy()

# view the data frame of coefficients
model_int_coefs
Next we’ll pull out each coefficient individually using {dplyr} functions.

# b0 (the intercept)
b0 <- model_int_coefs %>%
filter(term == "(Intercept)") %>%
select(estimate)
# b1 = the slope for suppression
b_supp <- model_int_coefs %>%
filter(term == "suppression_c") %>%
select(estimate)
# b2 = the slope for individualism
b_indiv <- model_int_coefs %>%
filter(term == "indiv_c") %>%
select(estimate)
# b3 = the coefficient for the interaction term
b_supp_x_indiv <- model_int_coefs %>%
filter(term == "suppression_c:indiv_c") %>%
select(estimate)
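(An equivalent, more compact base-R route, shown here only as an aside, is to index the model’s named coefficient vector:)

# base-R alternative: pull a coefficient by name
b_supp_alt <- coef(model_int)[["suppression_c"]]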
Next, we’ll calculate low (mean - 1 SD), mid (mean), and high (mean + 1 SD) values of the centered moderator. (Because indiv_c is mean-centered, the mid value will be 0.)

# low = mean - 1 SD
low_indiv <- mean(social$indiv_c, na.rm = TRUE) - sd(social$indiv_c, na.rm = TRUE)
# mid = mean
mid_indiv <- mean(social$indiv_c, na.rm = TRUE)
# high = mean + 1 SD
high_indiv <- mean(social$indiv_c, na.rm = TRUE) + sd(social$indiv_c, na.rm = TRUE)
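Plugging each of these moderator values into the rearranged equation from earlier gives a simple intercept and a simple slope at that level of individualism:

\[intercept_{simple} = b_0 + b_2 \cdot Z_{val}, \qquad slope_{simple} = b_1 + b_3 \cdot Z_{val}\]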
# calculate simple intercepts and bind them together into 3 rows
simple_intercepts <- rbind(
b0 + (b_indiv * low_indiv),
b0 + (b_indiv * mid_indiv),
b0 + (b_indiv * high_indiv)
)
# We'll give it a column name which will help below
colnames(simple_intercepts) <- "simple_intercepts"
# Calculate simple slopes
simple_slopes <- rbind(
b_supp + (b_supp_x_indiv * low_indiv),
b_supp + (b_supp_x_indiv * mid_indiv),
b_supp + (b_supp_x_indiv * high_indiv)
)
# Again, give it a column name
colnames(simple_slopes) <- "simple_slopes"
# create labels for the different levels of individualism
indiv_levels <- c("low","mid","high")
# put them all together into a data frame
int_plot_df <- data.frame(cbind(
indiv_levels, # labels
simple_intercepts, # simple intercepts
simple_slopes)) # simple slopes
int_plot_df
We can build the plot ourselves with ggplot and a few different geoms, including geom_point() and geom_abline():

social %>%
ggplot(aes(x = suppression_c, y = socsup)) +
# add points for scatterplot and make them invisible
geom_point(alpha = 0) +
# add line for low level of individualism
geom_abline(aes(intercept = int_plot_df$simple_intercepts[1],
slope = int_plot_df$simple_slopes[1],
color = "-1SD Individualism"), # This will give it a label
show.legend = TRUE) +
# add line for mid level of individualism
geom_abline(aes(intercept = int_plot_df$simple_intercepts[2],
slope = int_plot_df$simple_slopes[2],
color = "Mean Individualism"),
show.legend = TRUE) +
# add line for high level of individualism
geom_abline(aes(intercept = int_plot_df$simple_intercepts[3],
slope = int_plot_df$simple_slopes[3],
color = "+1SD Individualism"),
show.legend = TRUE) +
# add a title and label the axes
labs(title = "Effect of expressive suppression on social support \nat low, mid, & high levels of individualism",
x = "Expressive suppression (Centered)",
y = "Social Support") +
# we need to set the colors of the lines manually, which requires scale_color_manual()
scale_color_manual("Individualism Level",
# values() is where you specify the colors, in the form label = color_name
values = c("-1SD Individualism" = "salmon",
"Mean Individualism" = "purple",
"+1SD Individualism" = "cornflower blue")) +
theme_minimal()
Question: How would you interpret this plot?
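A minimal alternative sketch (ours, not part of the lab): rather than computing intercepts and slopes by hand, you can get the same lines by generating model predictions on a small grid with predict() and drawing them with geom_line(). The name pred_grid is ours.

# hypothetical alternative: predicted values on a grid of predictor values
pred_grid <- expand.grid(
  suppression_c = seq(-2, 2, by = 0.5),
  indiv_c = c(low_indiv, mid_indiv, high_indiv))
pred_grid$pred <- predict(model_int, newdata = pred_grid)

ggplot(pred_grid, aes(x = suppression_c, y = pred,
                      color = factor(round(indiv_c, 2)))) +
  geom_line() +
  labs(x = "Expressive suppression (centered)",
       y = "Predicted social support",
       color = "Individualism (centered)")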
sjPlot::plot_model()

A much easier option is the plot_model() function from {sjPlot}. This function takes a fitted model object as the first argument (i.e., model). You can then specify the type of plot; in this case we will tell it to plot the simple slopes of our interaction model by specifying type = "int". Lastly, we will specify which values of the moderator variable (in our case, individualism) to plot as separate lines.

plot_model(model = model_int, # model fit object
           type = "int", # plot simple slopes of the interaction
           mdrt.values = "meansd") # which values of the moderator variable to use
We can also test the significance of each simple slope with simple_slopes() from the {reghelper} package (make sure you have this installed). This function takes in a model object and outputs a data frame with a row for each simple effect. An sstest value in a particular column indicates that the simple slope for that variable is being tested. The column labeled Test Estimate gives the simple slopes (by default at -1 SD, the mean, and +1 SD of the moderator), and the remaining columns contain the information pertaining to each significance test.

simple_slopes(model = model_int)
To see just the tests of the simple slope of suppression at each level of individualism, filter for the sstest rows:

simple_slopes(model_int) %>%
filter(suppression_c == "sstest")
As a check, these estimates should match the simple slopes we calculated by hand. First, from reghelper::simple_slopes()…

simple_slopes(model_int) %>%
  filter(suppression_c == "sstest") %>%
  select(`Test Estimate`)
…and now from our by-hand calculations:

# we created this data frame earlier
int_plot_df %>%
select(simple_slopes)
For her master’s project, a grad student wants to see whether the time students spend studying (study_time) interacts with test anxiety (anxiety) to predict students’ test performance (perf).
Import the data below:
test_perf <- import("https://raw.githubusercontent.com/uopsych/psy612/master/labs/lab-7/data/test_perf.csv")
Run a model testing whether test anxiety moderates the relationship between time spent studying and test performance. Don’t forget to first center your predictors.
What do the model coefficients mean? Try explaining them to a friend.
Visualize the interaction. For an extra challenge, try doing this without using {sjPlot}.
What can you infer from this plot? Again, try explaining it to a friend.
Test whether each simple slope is significantly different from 0.
What do the results of these significance tests mean?