You will need the Rmd file for today’s lab - you can download it here.

Be sure to have the following packages installed and loaded:

library(tidyverse) # for plotting and data wrangling
library(rio) # for importing data
library(psych) # for descriptives
library(lsr) # for eta squared functions
library(emmeans) # for marginal means and simple effects
library(sjPlot) # for plotting model results 
library(apaTables) # for tables of means
library(car) # for testing model assumptions
library(broom) # for tidying model output

# suppress scientific notation
options(scipen = 999)

Purpose

Factorial ANOVA (aka multiple regression with categorical predictors) refers to a special case of the general linear model in which there is an interaction of two or more categorical variables (i.e., factors). A factorial design is used when there is interest in how two or more variables (or factors) affect some outcome variable. Rather than conducting separate one-way ANOVAs for each factor, all of the factors are included in one analysis. Today we will review how to run factorial ANOVA models in R and how to interpret and visualize the results.

Research scenario

Based on subjects’ self-reports of rejection sensitivity (N = 80), a researcher divides subjects into two equal groups (low RS and high RS). Whereas half of the subjects in each group interact with a partner who displays a happy emotional expression during the interaction, the other half of the subjects in each group interact with a partner who displays a neutral emotional expression during the interaction. After the interaction, subjects are asked to rate the statement, “My interaction partner likes me”, on a scale from 1 (strongly disagree) to 7 (strongly agree).

Factor 1: Rejection Sensitivity
  • Low
  • High

Factor 2: Partner’s Emotional Expression
  • Neutral
  • Happy

Dependent Variable: Perceived Liking

Hypothesis Testing

With a factorial ANOVA, three different null hypotheses can be tested regarding whether there is 1) a main effect of factor 1, 2) a main effect of factor 2, and 3) an interaction between factor 1 and factor 2.

Imagine a table with means like the following:

|         | High | Low | Mean |
|---------|------|-----|------|
| Neutral | 10   | 10  | 10   |
| Happy   | 10   | 10  | 10   |
| Mean    | 10   | 10  | 10   |

Null Hypothesis 1 (Main Effect of Rejection Sensitivity): \[H_0: \mu_{Low} = \mu_{High}\] Another way to state this hypothesis: \[\alpha_{Low} = \mu_{Low} - \mu_{all}\] \[\alpha_{High} = \mu_{High} - \mu_{all}\] \[H_0: \alpha_{Low} = \alpha_{High} = 0\]

Null Hypothesis 2 (Main Effect of Partner’s Emotional Expression): \[H_0: \mu_{Neutral} = \mu_{Happy}\] Again, another way to state this hypothesis: \[\beta_{Neutral} = \mu_{Neutral} - \mu_{all}\] \[\beta_{Happy} = \mu_{Happy} - \mu_{all}\] \[H_0: \beta_{Neutral} = \beta_{Happy} = 0\]

Null Hypothesis 3 (Interaction Effect):

H_0: There is no interaction between rejection sensitivity and partner’s emotional expression.

Or, stated another way:

\[\alpha\beta_{Low+Neutral} = \mu_{Low+Neutral} - \alpha_{Low} - \beta_{Neutral} - \mu_{all}\] \[\alpha\beta_{High+Neutral} = \mu_{High+Neutral} - \alpha_{High} - \beta_{Neutral} - \mu_{all}\] \[\alpha\beta_{Low+Happy} = \mu_{Low+Happy} - \alpha_{Low} - \beta_{Happy} - \mu_{all}\] \[\alpha\beta_{High+Happy} = \mu_{High+Happy} - \alpha_{High} - \beta_{Happy} - \mu_{all}\] \[H_0: \alpha\beta_{Low+Neutral} = \alpha\beta_{High+Neutral} = \alpha\beta_{Low+Happy} = \alpha\beta_{High+Happy} = 0\]

Import & Inspect the Data

reject <- import("https://raw.githubusercontent.com/uopsych/psy612/master/labs/lab-8/data/reject.csv")
str(reject)
## 'data.frame':    80 obs. of  3 variables:
##  $ rs     : chr  "Low" "Low" "Low" "Low" ...
##  $ partner: chr  "Neutral" "Neutral" "Neutral" "Neutral" ...
##  $ liking : int  4 4 4 5 5 5 5 5 5 5 ...
head(reject)
  • It looks like rs and partner are both being read in as character variables. Let’s go ahead and change those to factors.
reject <- reject %>% 
  mutate(rs = as.factor(rs),
         partner = as.factor(partner))
  • Check the structure again.
str(reject)
## 'data.frame':    80 obs. of  3 variables:
##  $ rs     : Factor w/ 2 levels "High","Low": 2 2 2 2 2 2 2 2 2 2 ...
##  $ partner: Factor w/ 2 levels "Happy","Neutral": 2 2 2 2 2 2 2 2 2 2 ...
##  $ liking : int  4 4 4 5 5 5 5 5 5 5 ...

Notice that by default R orders factor levels alphabetically. In our case, this means that High will be the reference group of rejection sensitivity and Happy will be the reference group of interaction partner’s emotional expression. However, it might be more intuitive to have Low and Neutral be the reference groups, respectively.

  • To accomplish this, we can simply re-order the levels of our factor variables with fct_relevel().
reject <- reject %>% 
  mutate(rs = fct_relevel(rs, c("Low", "High")), 
         partner = fct_relevel(partner, c("Neutral", "Happy")))

str(reject) # Notice the re-ordering of levels
## 'data.frame':    80 obs. of  3 variables:
##  $ rs     : Factor w/ 2 levels "Low","High": 1 1 1 1 1 1 1 1 1 1 ...
##  $ partner: Factor w/ 2 levels "Neutral","Happy": 1 1 1 1 1 1 1 1 1 1 ...
##  $ liking : int  4 4 4 5 5 5 5 5 5 5 ...
  • We can also see how the factors are being coded using contrasts().
contrasts(reject$rs)
##      High
## Low     0
## High    1
contrasts(reject$partner)
##         Happy
## Neutral     0
## Happy       1

Notice that each factor is being dummy coded. It’s very important to be aware of how the factors in your regression model are coded so that you can accurately interpret the regression coefficients in your output.
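
As an aside, you could assign a different coding scheme by setting a contrast matrix directly. This is a hypothetical example (not used in this lab) showing effects (deviation) coding, done on a copy of the data so our dummy coding stays intact:

reject_eff <- reject # work on a copy so the lab's dummy coding is unchanged
contrasts(reject_eff$rs) <- contr.sum(2) # effects coding: Low = 1, High = -1
contrasts(reject_eff$rs)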

Speaking of regression, note that we are going to conduct the analysis using the lm() function. The following is how you could write the factorial ANOVA as the regression equation that we’re estimating:

\[Liking_i = \beta_0 + \beta_1RS_i + \beta_2Partner_i + \beta_3(RS_i \times Partner_i) + \epsilon_i\]
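
To see how these dummy codes enter the equation for each condition, it can help to inspect the design matrix R builds from the formula (a quick check using the same formula we fit below):

# Design matrix: one column per term in the model, one row per subject
head(model.matrix(~ rs*partner, data = reject))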

Descriptive Statistics

Obtaining sample size per condition.

reject %>%
  group_by(rs, partner) %>% 
  summarize(n = n())
## `summarise()` has grouped output by 'rs'. You can override using the `.groups` argument.

Table of Means

The results of a factorial ANOVA are often represented in a table of means, which contains the means of each combination of conditions (the cell means) and means for the levels of each variable ignoring the other variables (the marginal means).

The following can be used to obtain cell means.

reject %>%
  group_by(rs, partner) %>% # Group by both rs and partner
  summarize(mean = mean(liking, na.rm = TRUE), # Obtain the mean for each condition
            sd = sd(liking, na.rm = TRUE)) %>%  # Obtain the SD for each condition
  knitr::kable(digits = c(NA, NA, 2, 2), # One digits entry per column
               caption = "Cell Means & SD")
## `summarise()` has grouped output by 'rs'. You can override using the `.groups` argument.
Cell Means & SD

| rs   | partner | mean | sd   |
|------|---------|------|------|
| Low  | Neutral | 5.55 | 1.05 |
| Low  | Happy   | 6.00 | 0.73 |
| High | Neutral | 1.80 | 0.70 |
| High | Happy   | 6.45 | 0.69 |

The following can be used to obtain marginal means for rejection sensitivity.

# Marginal Means for Rejection Sensitivity
reject %>%
  group_by(rs) %>% # Just group by rejection sensitivity
  summarize(mean = mean(liking, na.rm = TRUE),
            sd = sd(liking, na.rm = TRUE)) %>% 
  knitr::kable(digits = c(NA, 2, 2), # One digits entry per column
               caption = "Marginal Means & SD for Rejection Sensitivity")
Marginal Means & SD for Rejection Sensitivity

| rs   | mean | sd   |
|------|------|------|
| Low  | 5.78 | 0.92 |
| High | 4.12 | 2.45 |

And the marginal means for partner’s emotional expression.

# Marginal Means for Partner's Emotional Expression
reject %>%
  group_by(partner) %>% # Just group by partner
  summarize(mean = mean(liking, na.rm = TRUE),
            sd = sd(liking, na.rm = TRUE)) %>% 
  knitr::kable(digits = c(NA, 2, 2), # One digits entry per column
               caption = "Marginal Means & SD for Partner's Emotional Expression")
Marginal Means & SD for Partner’s Emotional Expression

| partner | mean | sd   |
|---------|------|------|
| Neutral | 3.67 | 2.09 |
| Happy   | 6.22 | 0.73 |

The following can be used to obtain the grand mean (the overall mean on the DV).

reject %>% # note that we don't need group_by for this one
  summarize(mean = mean(liking, na.rm = TRUE),
            sd = sd(liking, na.rm = TRUE)) %>% 
  knitr::kable(digits = c(2, 2), # One digits entry per column
               caption = "Grand Mean")
Grand Mean

| mean | sd   |
|------|------|
| 4.95 | 2.02 |

Table of Means - An Easier Way

  • The apa.2way.table() function from the apaTables package is a much more convenient way to get our cell means and marginal means. This function works for any 2-way ANOVA, regardless of the number of levels of your factors; e.g., it would work for a 3 x 3 ANOVA. All you need to do is indicate what the IVs (aka factors) and DV are and specify show.marginal.means = TRUE.
table_of_means <- apa.2way.table(iv1 = rs, 
               iv2 = partner, 
               dv = liking, 
               data = reject,
               show.marginal.means = TRUE)
table_of_means
## 
## 
## Means and standard deviations for liking as a function of a 2(rs) X 2(partner) design 
## 
##           partner                              
##           Neutral      Happy      Marginal     
##        rs       M   SD     M   SD        M   SD
##       Low    5.55 1.05  6.00 0.73     5.78 0.92
##      High    1.80 0.70  6.45 0.69     4.12 2.45
##  Marginal    3.67 2.09  6.22 0.73              
## 
## Note. M and SD represent mean and standard deviation, respectively. 
## Marginal indicates the means and standard deviations pertaining to main effects.

We can also export the output of the apa.2way.table() function to a Word document.

apa.2way.table(iv1 = rs,
               iv2 = partner,
               dv = liking,
               data = reject,
               show.marginal.means = TRUE,
               filename = "table_of_means.doc")

Question: Which means are being compared in the main effect of rejection sensitivity?

Question: Which means are being compared in the main effect of interaction partner?

Question: Which means are involved in the interaction?

Conduct the Factorial ANOVA

Factorial ANOVA is the method by which we can examine whether two (or more) categorical IVs have joint effects on a continuous outcome of interest. Like all general linear models, factorial ANOVA is a specific case of multiple regression. However, we may choose to use an ANOVA framework for the sake of interpretability.

model <- lm(liking ~ rs*partner, data = reject)

Results in regression framework
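
The coefficient estimates discussed below can be pulled up by summarizing the fitted model (output omitted here; broom::tidy(model) gives a tidy alternative):

summary(model)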

Question: What does each of the regression coefficient estimates mean?

b0 = 5.55

b1 = -3.75

b2 = 0.45

b3 = 4.20

It may be helpful to refer back to the table of means.

table_of_means
## 
## 
## Means and standard deviations for liking as a function of a 2(rs) X 2(partner) design 
## 
##           partner                              
##           Neutral      Happy      Marginal     
##        rs       M   SD     M   SD        M   SD
##       Low    5.55 1.05  6.00 0.73     5.78 0.92
##      High    1.80 0.70  6.45 0.69     4.12 2.45
##  Marginal    3.67 2.09  6.22 0.73              
## 
## Note. M and SD represent mean and standard deviation, respectively. 
## Marginal indicates the means and standard deviations pertaining to main effects.

Getting Factorial ANOVA Results using anova()

Obviously, the means being compared by b1 and b2 do not represent our main effects. The main effect of RS would be a comparison of the marginal means for the low and high conditions. The main effect of partner would be a comparison of the marginal means for the neutral and happy conditions.

With the way we have rs and partner coded in the model, the regression coefficient estimates don’t correspond to the differences between marginal means that define the main effects. For testing our main effects, you can get more straightforward results using anova().

anova(model)

Calculating the SS for each term

Recall the null hypothesis being tested by each row in the ANOVA table.

Null Hypothesis 1 (Main Effect of Rejection Sensitivity):

\[H_0: \mu_{Low} = \mu_{High}\] \[\alpha_{Low} = \mu_{Low} - \mu_{all}\] \[\alpha_{High} = \mu_{High} - \mu_{all}\] \[H_0: \alpha_{Low} = \alpha_{High} = 0\]

Question: Where do we look for variability to examine the effect of rejection sensitivity on liking?

Recall from lecture that the formula for calculating the SS for the effect of the row variable is:

\[SS_{Row} = nC\sum_{r=1}^{R}(\bar{Y}_{r.} - \bar{Y}_{..})^2\]

where n is the number of subjects per cell and C is the number of columns; the sum is over the squared deviations of each row marginal mean from the grand mean (here, n = 20 and C = 2).

Let’s calculate it from scratch.

# Calculating the grand mean
GM <- mean(reject$liking, na.rm = TRUE) 
GM
## [1] 4.95
# Marginal Means for Rejection Sensitivity
reject %>%
  group_by(rs) %>% 
  summarize(mean = mean(liking, na.rm = TRUE)) %>% 
knitr::kable(digits = c(NA, 5))
| rs   | mean  |
|------|-------|
| Low  | 5.775 |
| High | 4.125 |
# Recall number of people per row and column
reject %>%
  group_by(rs, partner) %>% 
  summarize(n = n())
## `summarise()` has grouped output by 'rs'. You can override using the `.groups` argument.
# Calculate SS_rs
SS_rs <- 2*20*((4.125-GM)^2+ (5.775-GM)^2)
SS_rs
## [1] 54.45
# Compare to anova() output
anova(model)

Null Hypothesis 2 (Main Effect of Partner’s Emotional Expression):

\[H_0: \mu_{Neutral} = \mu_{Happy}\] Or, \[\beta_{Neutral} = \mu_{Neutral} - \mu_{all}\] \[\beta_{Happy} = \mu_{Happy} - \mu_{all}\] \[H_0: \beta_{Neutral} = \beta_{Happy} = 0\]

Question: Where do we look for variability to examine the effect of partner’s emotional expression on liking?

The formula for calculating the SS for the effect of the column variable is:

\[SS_{Column} = nR\sum_{c=1}^{C}(\bar{Y}_{.c} - \bar{Y}_{..})^2\]

where R is the number of rows (here, n = 20 and R = 2).

Let’s calculate it from scratch.

# Marginal Means for Partner's Emotional Expression
reject %>%
  group_by(partner) %>% 
  summarize(mean = mean(liking, na.rm = TRUE)) %>% 
knitr::kable(digits = c(NA, 5))
| partner | mean  |
|---------|-------|
| Neutral | 3.675 |
| Happy   | 6.225 |
# Calculate SS_partner
SS_partner <- 2*20*((6.225-GM)^2 + (3.675-GM)^2)
SS_partner
## [1] 130.05
# Compare to anova() output
anova(model)

Null Hypothesis 3 (Interaction Effect):

H0: There is no interaction between rejection sensitivity and partner’s emotional expression.

\[\alpha\beta_{Low+Neutral} = \mu_{Low+Neutral} - \alpha_{Low} - \beta_{Neutral} - \mu_{all}\] \[\alpha\beta_{High+Neutral} = \mu_{High+Neutral} - \alpha_{High} - \beta_{Neutral} - \mu_{all}\] \[\alpha\beta_{Low+Happy} = \mu_{Low+Happy} - \alpha_{Low} - \beta_{Happy} - \mu_{all}\] \[\alpha\beta_{High+Happy} = \mu_{High+Happy} - \alpha_{High} - \beta_{Happy} - \mu_{all}\] \[H_0: \alpha\beta_{Low+Neutral} = \alpha\beta_{High+Neutral} = \alpha\beta_{Low+Happy} = \alpha\beta_{High+Happy} = 0\]

Question: Where do we look for variability to examine whether there is an interaction between rejection sensitivity and partner’s emotional expression?

The formula for calculating the SS for the interaction is:

\[SS_{Interaction} = n\sum_{r=1}^{R}\sum_{c=1}^{C}(\bar{Y}_{rc} - \bar{Y}_{r.} - \bar{Y}_{.c} + \bar{Y}_{..})^2\]

This is a simplification of doing the following:

\[SS_{Interaction} = SS_{Cells} - SS_{Row} - SS_{Column}, \quad \text{where } SS_{Cells} = n\sum_{r=1}^{R}\sum_{c=1}^{C}(\bar{Y}_{rc} - \bar{Y}_{..})^2\]

In other words, is there any variability remaining among the cell means that is not completely explained by the main effects of the row and column variables?

Go ahead and try calculating the SS for the interaction from scratch using the formula above.

# Cell Means
reject %>%
  group_by(rs, partner) %>%
  summarize(mean = mean(liking, na.rm = TRUE)) %>% 
knitr::kable(digits = c(NA, NA, 5))
## `summarise()` has grouped output by 'rs'. You can override using the `.groups` argument.
| rs   | partner | mean |
|------|---------|------|
| Low  | Neutral | 5.55 |
| Low  | Happy   | 6.00 |
| High | Neutral | 1.80 |
| High | Happy   | 6.45 |
table_of_means
## 
## 
## Means and standard deviations for liking as a function of a 2(rs) X 2(partner) design 
## 
##           partner                              
##           Neutral      Happy      Marginal     
##        rs       M   SD     M   SD        M   SD
##       Low    5.55 1.05  6.00 0.73     5.78 0.92
##      High    1.80 0.70  6.45 0.69     4.12 2.45
##  Marginal    3.67 2.09  6.22 0.73              
## 
## Note. M and SD represent mean and standard deviation, respectively. 
## Marginal indicates the means and standard deviations pertaining to main effects.
# Calculate SS_interaction (using the unrounded cell and marginal means)
SS_Int <- 20*((5.55-5.775-3.675+GM)^2 + (1.80-4.125-3.675+GM)^2 +
              (6.00-5.775-6.225+GM)^2 + (6.45-4.125-6.225+GM)^2)
SS_Int
## [1] 88.2

# Compare to anova() output
anova(model)

Calculating MS

The variance, or MS, for each term is simply calculated by taking the SS for that term and dividing it by the degrees of freedom for that term.

Recall the formulas for calculating df:

  • df_Row = R - 1
  • df_Column = C - 1
  • df_Interaction = (R-1)(C-1)

where R is the number of rows and C is the number of columns in your table of means.

MS_rs <- 54.45/1          # SS_rs / df_rs, where df_rs = 2 - 1 = 1
MS_partner <- 130.05/1    # SS_partner / df_partner, where df_partner = 2 - 1 = 1
MS_Interaction <- 88.2/1  # SS_interaction / df_interaction, where df = (2-1)(2-1) = 1

Calculating the F-statistic for each test

To finally test each null hypothesis, we need to calculate an F-statistic and either compare that F-statistic to a critical value or examine the probability of obtaining an F-statistic as extreme as the one we obtained if the null hypothesis were true (aka, the p-value).

Question: The MS for each term that we calculated above will be the numerator of our F-statistic. What do we use as the denominator?

# Fill in each F-statistic below (replace the -99 placeholders)
F_rs = -99
F_partner = -99
F_Interaction = -99

F_rs
## [1] -99
F_partner
## [1] -99
F_Interaction
## [1] -99
anova(model)
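
To check your answers, the denominator can also be computed directly from the fitted model. This is a quick sketch: the denominator of each F-test is the mean squared error, i.e., the residual SS divided by the residual df.

# Mean squared error (the denominator of each F-test)
MS_error <- sum(residuals(model)^2) / df.residual(model)

# F-statistic for each term
MS_rs / MS_error
MS_partner / MS_error
MS_Interaction / MS_error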

Question: What is our conclusion about each of our null hypotheses?

Effect Sizes (Eta-Squared and Partial Eta-Squared)

Effect sizes are important to report. They convey the magnitude of a predictor variable’s effect on an outcome variable, and reporting them in publications is also very helpful for other researchers wanting to conduct a priori power analyses.

etaSquared(model)
##               eta.sq eta.sq.part
## rs         0.1692045   0.5258329
## partner    0.4041330   0.7259280
## rs:partner 0.2740833   0.6423889
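
For reference, both effect sizes can be computed by hand from the anova() table. This sketch shows the formulas: eta-squared divides each term’s SS by the total SS, whereas partial eta-squared divides each term’s SS by that term’s SS plus the error SS only.

aov_table <- anova(model)
SS <- aov_table$"Sum Sq"               # SS for rs, partner, rs:partner, residuals
SS_effects <- SS[1:3]                  # the three model terms
SS_error <- SS[4]                      # residual SS

SS_effects / sum(SS)                   # eta-squared
SS_effects / (SS_effects + SS_error)   # partial eta-squared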

Question: What is the difference between eta-squared and partial eta-squared? Do you think one has strengths over the other?

Note: Sara will go over this in the coming Tuesday’s lecture!

Visualization

Visualizing the results is helpful for understanding your results and interpreting significant (or non-significant) effects. Especially when there is a significant interaction effect, a plot can help illustrate what the interaction effect is.

Main effects

  • Remember that main effects correspond to differences in marginal means. To plot main effects, we can use the plot_model() function from the sjPlot package. This function takes three arguments: 1) the fitted model object, 2) type = "emm", because we want to plot estimated marginal means, and 3) the terms = argument, specifying which variable(s) you want the marginal means for.

Plotting the Main Effect of Rejection Sensitivity

plot_model(model, type = "emm", terms = "rs")
## NOTE: Results may be misleading due to involvement in interactions

Plotting the Main Effect of Partner’s Emotional Expression

plot_model(model, type = "emm", terms = "partner")
## NOTE: Results may be misleading due to involvement in interactions

Plotting the Interaction Effect

plot_model(model, type = "emm", terms = c("rs", "partner")) 

You can switch how the interaction is visualized by switching the order of the terms, as shown below.
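
For example, this version puts partner on the x-axis with separate lines for each level of rs:

plot_model(model, type = "emm", terms = c("partner", "rs"))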

Question: How would you describe in a paper what the significant interaction effect means?

Simple effects

Simple effects are the effect of some factor (e.g., interaction partner’s expression) at each level of another factor (e.g., at high and low RS separately).

  • We’ll look at the simple effect of interaction partner having a neutral vs. happy expression on perceived liking at different levels of rejection sensitivity. We’ll use the {emmeans} package.

  • To get simple effects, we combine the emmeans() function with the contrast() function (both from {emmeans}). We pass emmeans() the fitted model and specify which factors we’re looking at; then we pass the result along to contrast(), which can compute a variety of different contrasts.

  • If we want simple effects for interaction partner expression at each level of rejection sensitivity we can use by = "rs" and simple = "partner".

model %>% emmeans(specs = c("partner", "rs")) %>% # specify our two factors
  contrast(by = "rs", # by is the variable we are looking at each level of
           simple = "partner") # simple is what we want simple effects of
## rs = Low:
##  contrast       estimate    SE df t.ratio p.value
##  Neutral effect   -0.225 0.127 76  -1.770  0.0807
##  Happy effect      0.225 0.127 76   1.770  0.0807
## 
## rs = High:
##  contrast       estimate    SE df t.ratio p.value
##  Neutral effect   -2.325 0.127 76 -18.294  <.0001
##  Happy effect      2.325 0.127 76  18.294  <.0001
## 
## P value adjustment: fdr method for 2 tests

Question: What do the results of this simple effects analysis show?
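
As an aside, an equivalent way to test simple effects is to condition the estimated marginal means on one factor and take pairwise comparisons. This sketch gives the Neutral vs. Happy difference at each level of rs, which can be easier to read than the deviation-style “effects” above:

model %>% emmeans(specs = ~ partner | rs) %>% pairs()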

  • Now let’s look at simple effects for rejection sensitivity at each level of partner expression.
model %>% emmeans(specs = c("partner", "rs")) %>% # specify our two factors
  contrast(by = "partner", # by is the variable we are looking at each level of
           simple = "rs") # simple is what we want simple effects of
## partner = Neutral:
##  contrast    estimate    SE df t.ratio p.value
##  Low effect     1.875 0.127 76  14.754  <.0001
##  High effect   -1.875 0.127 76 -14.754  <.0001
## 
## partner = Happy:
##  contrast    estimate    SE df t.ratio p.value
##  Low effect    -0.225 0.127 76  -1.770  0.0807
##  High effect    0.225 0.127 76   1.770  0.0807
## 
## P value adjustment: fdr method for 2 tests

Question: What do the results of this simple effects analysis show?