You will need the Rmd file for today’s lab, which you can download here.

Be sure to have the following packages installed and loaded:

library(tidyverse) # for plotting and data wrangling
library(rio) # for importing data
library(psych) # for descriptives
library(lsr) # for eta squared functions
library(emmeans) # for marginal means and simple effects
library(sjPlot) # for plotting model results 
library(apaTables) # for tables of means
library(car) # for testing model assumptions
library(broom) # for tidying model output

# suppress scientific notation
options(scipen = 999)

Purpose

Factorial ANOVA (aka multiple regression with categorical predictors) refers to a special case of the general linear model in which there is an interaction of two or more categorical variables (i.e., factors). A factorial design is used when there is an interest in how two or more variables (or factors) affect some outcome variable. Rather than conducting separate one-way ANOVAs for each factor, they are all included in one analysis. Today we will review how to run factorial ANOVA models in R and how to interpret and visualize the results.

Research scenario

Based on subjects’ self-reports of rejection sensitivity (N = 80), a researcher divides subjects into two equal groups (low RS and high RS). Half of the subjects in each group interact with a partner who displays a happy emotional expression during the interaction; the other half interact with a partner who displays a neutral emotional expression. After the interaction, subjects are asked to rate the statement, “My interaction partner likes me,” on a scale from 1 (strongly disagree) to 7 (strongly agree).

Factor 1: Rejection Sensitivity

  • Low
  • High

Factor 2: Partner’s Emotional Expression

  • Neutral
  • Happy

Dependent Variable: Perceived Liking

Hypothesis Testing

With a factorial ANOVA, three different null hypotheses can be tested regarding whether there is 1) a main effect of factor 1, 2) a main effect of factor 2, and 3) an interaction between factor 1 and factor 2.

Imagine a table with means like the following:

|              | High | Low  | Row Means |
|--------------|------|------|-----------|
| Neutral      | x.xx | x.xx | x.xx      |
| Happy        | x.xx | x.xx | x.xx      |
| Column Means | x.xx | x.xx | x.xx      |

Null Hypothesis 1 (Main Effect of Rejection Sensitivity): \[H_0: \mu_{Low} = \mu_{High}\]

Null Hypothesis 2 (Main Effect of Partner’s Emotional Expression): \[H_0: \mu_{Neutral} = \mu_{Happy}\]

Null Hypothesis 3 (Interaction Effect):

H0: There is no interaction between rejection sensitivity and partner’s emotional expression.
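Equivalently, in terms of cell means, the interaction null says that the effect of partner’s expression is the same at both levels of rejection sensitivity:

\[H_0: \mu_{Low,Happy} - \mu_{Low,Neutral} = \mu_{High,Happy} - \mu_{High,Neutral}\]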

Import & Inspect the Data

reject <- import("https://raw.githubusercontent.com/uopsych/psy612/master/labs/lab-8/data/reject.csv")
str(reject)
## 'data.frame':    80 obs. of  3 variables:
##  $ rs     : chr  "Low" "Low" "Low" "Low" ...
##  $ partner: chr  "Neutral" "Neutral" "Neutral" "Neutral" ...
##  $ liking : int  4 4 4 5 5 5 5 5 5 5 ...
head(reject)
  • It looks like rs and partner are both being read in as character variables. Let’s go ahead and change those to factors.
reject <- reject %>% 
  mutate(rs = as.factor(rs),
         partner = as.factor(partner))
  • Check the structure again.
str(reject)
## 'data.frame':    80 obs. of  3 variables:
##  $ rs     : Factor w/ 2 levels "High","Low": 2 2 2 2 2 2 2 2 2 2 ...
##  $ partner: Factor w/ 2 levels "Happy","Neutral": 2 2 2 2 2 2 2 2 2 2 ...
##  $ liking : int  4 4 4 5 5 5 5 5 5 5 ...

Notice that by default R orders factor levels alphabetically. In our case, this means that High will be the reference group of rejection sensitivity and Happy will be the reference group of interaction partner’s emotional expression. However, it might be more intuitive to have Low and Neutral be the reference groups, respectively.

  • To accomplish this, we can simply re-order the levels of our factor variables with fct_relevel().
reject <- reject %>% 
  mutate(rs = fct_relevel(rs, c("Low", "High")), 
         partner = fct_relevel(partner, c("Neutral", "Happy")))

str(reject) # Notice the re-ordering of levels
## 'data.frame':    80 obs. of  3 variables:
##  $ rs     : Factor w/ 2 levels "Low","High": 1 1 1 1 1 1 1 1 1 1 ...
##  $ partner: Factor w/ 2 levels "Neutral","Happy": 1 1 1 1 1 1 1 1 1 1 ...
##  $ liking : int  4 4 4 5 5 5 5 5 5 5 ...
  • We can also see how the factors are being coded using contrasts().
contrasts(reject$rs)
##      High
## Low     0
## High    1
contrasts(reject$partner)
##         Happy
## Neutral     0
## Happy       1

Notice that each factor is being dummy coded. It’s very important to be aware of how the factors in your regression model are being coded to be able to accurately interpret the regression coefficients in your output.

Speaking of regression, note that we are going to conduct the analysis using the lm() function. The factorial ANOVA can be written as the following regression equation:

\[Liking_i = \beta_0 + \beta_1RejectionSensitivity_i + \beta_2PartnerEmotion_i + \beta_3(RejectionSensitivity_i \times PartnerEmotion_i) + \epsilon_i\]

Descriptive Statistics

Obtaining sample size per condition.

reject %>%
  group_by(rs, partner) %>% 
  summarize(n = n())
## `summarise()` has grouped output by 'rs'. You can override using the `.groups`
## argument.
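As a quicker alternative, dplyr’s count() collapses the group_by() + summarize() pattern into one call:

```r
library(dplyr)

# equivalent to group_by(rs, partner) %>% summarize(n = n());
# assumes the `reject` data frame imported above
reject %>% 
  count(rs, partner)
```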

Table of Means

The results of a factorial ANOVA are often represented in a table of means, which contains the means of each combination of conditions (the cell means) and means for the levels of each variable ignoring the other variables (the marginal means).
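Because the design here is balanced (n = 20 per cell), each marginal mean is simply the average of the cell means in its row or column. For example, for the low rejection sensitivity condition:

\[\mu_{Low} = \frac{\mu_{Low,Neutral} + \mu_{Low,Happy}}{2}\]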

Table of Means using apa.2way.table()

  • The apa.2way.table() function from the apaTables package is a very convenient way to get our cell means and marginal means. This function works for any 2-way design, regardless of the number of levels of your factors (e.g., it would also work for a 3 x 3 ANOVA). All you need to do is indicate which variables are the IVs (aka factors) and the DV, and specify show.marginal.means = TRUE.
table_of_means <- apa.2way.table(iv1 = rs, 
               iv2 = partner, 
               dv = liking, 
               data = reject,
               show.marginal.means = TRUE)
table_of_means
## 
## 
## Means and standard deviations for liking as a function of a 2(rs) X 2(partner) design 
## 
##           partner                              
##           Neutral      Happy      Marginal     
##        rs       M   SD     M   SD        M   SD
##       Low    5.55 1.05  6.00 0.73     5.78 0.92
##      High    1.80 0.70  6.45 0.69     4.12 2.45
##  Marginal    3.67 2.09  6.22 0.73              
## 
## Note. M and SD represent mean and standard deviation, respectively. 
## Marginal indicates the means and standard deviations pertaining to main effects.

We can also export the output of the apa.2way.table() function to a Word document.

apa.2way.table(iv1 = rs,
               iv2 = partner,
               dv = liking,
               data = reject,
               show.marginal.means = TRUE,
               filename = "table_of_means.doc")
## 
## 
## Means and standard deviations for liking as a function of a 2(rs) X 2(partner) design 
## 
##           partner                              
##           Neutral      Happy      Marginal     
##        rs       M   SD     M   SD        M   SD
##       Low    5.55 1.05  6.00 0.73     5.78 0.92
##      High    1.80 0.70  6.45 0.69     4.12 2.45
##  Marginal    3.67 2.09  6.22 0.73              
## 
## Note. M and SD represent mean and standard deviation, respectively. 
## Marginal indicates the means and standard deviations pertaining to main effects.

Table of Means using dplyr

The following can be used to obtain cell means.

reject %>%
  group_by(rs, partner) %>% # Group by both rs and partner
  summarize(mean = mean(liking, na.rm = TRUE), # Obtain the mean for each condition
            sd = sd(liking, na.rm = TRUE)) %>%  # Obtain the SD for each condition
knitr::kable(digits = c(NA, NA, 2, 2), # Specifying digits per column (rs, partner, mean, sd)
             caption = "Cell Means & SD")
## `summarise()` has grouped output by 'rs'. You can override using the `.groups`
## argument.
Cell Means & SD

| rs   | partner | mean | sd   |
|------|---------|------|------|
| Low  | Neutral | 5.55 | 1.05 |
| Low  | Happy   | 6.00 | 0.73 |
| High | Neutral | 1.80 | 0.70 |
| High | Happy   | 6.45 | 0.69 |

The following can be used to obtain marginal means for rejection sensitivity.

# Marginal Means for Rejection Sensitivity
reject %>%
  group_by(rs) %>% # Just group by rejection sensitivity
  summarize(mean = mean(liking, na.rm = TRUE),
            sd = sd(liking, na.rm = TRUE)) %>% 
knitr::kable(digits = c(NA, 2, 2),
             caption = "Marginal Means & SD for Rejection Sensitivity")
Marginal Means & SD for Rejection Sensitivity

| rs   | mean | sd   |
|------|------|------|
| Low  | 5.78 | 0.92 |
| High | 4.12 | 2.45 |

And the marginal means for partner’s emotional expression.

# Marginal Means for Partner's Emotional Expression
reject %>%
  group_by(partner) %>% # Just group by partner
  summarize(mean = mean(liking, na.rm = TRUE),
            sd = sd(liking, na.rm = TRUE)) %>% 
knitr::kable(digits = c(NA, 2, 2),
             caption = "Marginal Means & SD for Partner's Emotional Expression")
Marginal Means & SD for Partner’s Emotional Expression

| partner | mean | sd   |
|---------|------|------|
| Neutral | 3.67 | 2.09 |
| Happy   | 6.22 | 0.73 |

The following can be used to obtain the grand mean (the overall mean on the DV).

reject %>% # note that we don't need group_by for this one
  summarize(mean = mean(liking, na.rm = TRUE),
            sd = sd(liking, na.rm = TRUE)) %>% 
knitr::kable(digits = c(2, 2),
             caption = "Grand Mean")
Grand Mean

| mean | sd   |
|------|------|
| 4.95 | 2.02 |
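With equal cell sizes, the grand mean also equals the average of the four cell means, which we can verify against the table of means:

```r
# cell means from the apa.2way.table() output above
cell_means <- c(5.55, 6.00, 1.80, 6.45)
mean(cell_means) # 4.95, matching the grand mean
```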

Question: Which means are being compared in the main effect of rejection sensitivity?

Question: Which means are being compared in the main effect of interaction partner?

Question: Which means are involved in the interaction?

Conduct the Factorial ANOVA

Using Regression: lm()

Factorial ANOVA is the method by which we can examine whether two (or more) categorical IVs have joint effects on a continuous outcome of interest. Like all general linear models, factorial ANOVA is a specific case of multiple regression. However, we may choose to use an ANOVA framework for the sake of interpretability.

Write the lm() code that performs the factorial ANOVA analysis.

## your code here

Results in regression framework

## your code here

Question: Write the full model with parameter estimates filled in below:

\[Liking_i = \]

Question: What does each of the regression coefficient estimates mean?

It may be helpful to refer back to the table of means.

table_of_means
## 
## 
## Means and standard deviations for liking as a function of a 2(rs) X 2(partner) design 
## 
##           partner                              
##           Neutral      Happy      Marginal     
##        rs       M   SD     M   SD        M   SD
##       Low    5.55 1.05  6.00 0.73     5.78 0.92
##      High    1.80 0.70  6.45 0.69     4.12 2.45
##  Marginal    3.67 2.09  6.22 0.73              
## 
## Note. M and SD represent mean and standard deviation, respectively. 
## Marginal indicates the means and standard deviations pertaining to main effects.

Recoding the Factors

Obviously, the means being compared by b1 and b2 do not represent our main effects. The main effect of RS would be a comparison of the marginal means for the low and high conditions. The main effect of partner would be a comparison of the marginal means for the neutral and happy conditions.

With the way we have rs and partner coded in the model, the regression coefficient estimates don’t correspond simply to the mean difference between conditions. We can recode the factors so that the parameter estimates, b1 and b2, correspond to a test of the main effects.

Question: How should we code the levels of rejection sensitivity and partner emotion so that the parameter estimates, b1 and b2, correspond to a test of their main effects?

Assign new codes to the levels of each factor.

## your code here

Re-run the analysis with newly coded factors.

## your code here

Using Traditional ANOVA: anova()

You can also perform a traditional ANOVA and get straightforward results by passing the model to the anova() function.

model <- lm(liking ~ rs*partner, data = reject)
anova(model)
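Note that anova() on an lm object returns Type I (sequential) sums of squares. In a balanced design like this one, Type I and Type III results coincide, but with unequal cell sizes they can differ. As a sketch, the car package (already loaded) can compute Type III sums of squares, provided the factors use sum-to-zero (effects) coding:

```r
# refit with effects coding so that Type III tests are meaningful;
# assumes the `reject` data frame from above
model_sum <- lm(liking ~ rs * partner, data = reject,
                contrasts = list(rs = "contr.sum", partner = "contr.sum"))
car::Anova(model_sum, type = 3)
```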

Question: How would you interpret these results?

Effect Sizes

Effect size is really important to report. It provides an idea of the size of the effect of a predictor variable on an outcome variable. Reporting effect size in publications is also very helpful for other researchers wanting to conduct a priori power analyses.

We can get effect sizes by passing our model to the etaSquared() function from the lsr package.

etaSquared(model)
##               eta.sq eta.sq.part
## rs         0.1692045   0.5258329
## partner    0.4041330   0.7259280
## rs:partner 0.2740833   0.6423889
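These values can be reproduced from the ANOVA table’s sums of squares: eta-squared is SS_effect / SS_total, and partial eta-squared is SS_effect / (SS_effect + SS_residual). A sketch using the `model` object from above (the match is exact here because the design is balanced; etaSquared() uses Type II sums of squares by default):

```r
ss <- anova(model)$`Sum Sq` # SS for rs, partner, rs:partner, and residuals
ss[1:3] / sum(ss)           # eta-squared
ss[1:3] / (ss[1:3] + ss[4]) # partial eta-squared
```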

Question: What is the difference between eta-squared and partial eta-squared?

Visualization

Visualizing the results is helpful for understanding your results and interpreting significant (or non-significant) effects. Especially when there is a significant interaction effect, a plot can help illustrate what the interaction effect is.

Main effects

  • Remember that main effects correspond to differences in marginal means. To plot main effects, we can use the plot_model() function from the sjPlot package. This function takes three arguments: 1) the fitted model object, 2) type = "emm", because we want to plot estimated marginal means, and 3) the terms = argument, in which you specify the variable(s) you want marginal means for.

Plotting the Main Effect of Rejection Sensitivity

plot_model(model, type = "emm", terms = "rs") # bars correspond to 95% CIs
## NOTE: Results may be misleading due to involvement in interactions

Plotting the Main Effect of Partner’s Emotional Expression

plot_model(model, type = "emm", terms = "partner") 
## NOTE: Results may be misleading due to involvement in interactions

Plotting the Interaction Effect

plot_model(model, type = "emm", terms = c("rs", "partner")) 

Switch how the interaction is visualized by switching the order of terms.
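For example, putting partner on the x-axis with separate lines for each level of rejection sensitivity:

```r
# same `model` object as above, terms reversed
plot_model(model, type = "emm", terms = c("partner", "rs"))
```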

Question: How would you describe in a paper what the significant interaction effect means?

Simple effects

Simple effects are the effect of some factor (e.g., interaction partner’s expression) at each level of another factor (e.g., at high and low RS separately).

We can get these simple effects using the emmeans function and calculating the differences between every pair of cell means.

# Comparing cell means
emmeans(model, pairwise~rs*partner, adjust = "bonferroni")
## $emmeans
##  rs   partner emmean   SE df lower.CL upper.CL
##  Low  Neutral   5.55 0.18 76     5.19     5.91
##  High Neutral   1.80 0.18 76     1.44     2.16
##  Low  Happy     6.00 0.18 76     5.64     6.36
##  High Happy     6.45 0.18 76     6.09     6.81
## 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast                   estimate    SE df t.ratio p.value
##  Low Neutral - High Neutral     3.75 0.254 76  14.754  <.0001
##  Low Neutral - Low Happy       -0.45 0.254 76  -1.770  0.4840
##  Low Neutral - High Happy      -0.90 0.254 76  -3.541  0.0041
##  High Neutral - Low Happy      -4.20 0.254 76 -16.524  <.0001
##  High Neutral - High Happy     -4.65 0.254 76 -18.294  <.0001
##  Low Happy - High Happy        -0.45 0.254 76  -1.770  0.4840
## 
## P value adjustment: bonferroni method for 6 tests
# for info on types of p-value adjustments you can choose from:
?summary.emmGrid # scroll to p-value adjustments
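The contrasts above include all six pairwise comparisons of cell means. If you want only the simple effects proper (the effect of one factor within each level of the other), you can condition using `|` in the emmeans formula:

```r
# simple effect of partner's expression at each level of rs;
# assumes the `model` object fit above
emmeans(model, pairwise ~ partner | rs)

# simple effect of rs at each level of partner's expression
emmeans(model, pairwise ~ rs | partner)
```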

Question: Does partner emotion have an effect on perceived liking for people low on rejection sensitivity? For people high on rejection sensitivity?

Question: Does rejection sensitivity have an effect on perceived liking when partner shows neutral emotions? When partner shows happy emotions?

You could also use the emmeans function to compare marginal means for each factor individually.

# Comparing marginal means for partner emotion
emmeans(model, pairwise~partner, adjust = "bonferroni")
## NOTE: Results may be misleading due to involvement in interactions
## $emmeans
##  partner emmean    SE df lower.CL upper.CL
##  Neutral   3.67 0.127 76     3.42     3.93
##  Happy     6.22 0.127 76     5.97     6.48
## 
## Results are averaged over the levels of: rs 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast        estimate   SE df t.ratio p.value
##  Neutral - Happy    -2.55 0.18 76 -14.188  <.0001
## 
## Results are averaged over the levels of: rs
# Comparing marginal means for rejection sensitivity
emmeans(model, pairwise~rs, adjust = "bonferroni")
## NOTE: Results may be misleading due to involvement in interactions
## $emmeans
##  rs   emmean    SE df lower.CL upper.CL
##  Low    5.78 0.127 76     5.52     6.03
##  High   4.12 0.127 76     3.87     4.38
## 
## Results are averaged over the levels of: partner 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast   estimate   SE df t.ratio p.value
##  Low - High     1.65 0.18 76   9.180  <.0001
## 
## Results are averaged over the levels of: partner