{papaja}
Prior to today’s lab, make sure that you have installed LaTeX. You can download the Rmd file here and the example {papaja}
.Rmd file here.
The following lab covers {papaja}
, a package for creating reproducible, APA-style manuscripts in Rmarkdown. We will go over how to write a manuscript in {papaja}
, including creating tables, figures, citations, and in-text statistical reporting. This page has some broad (overview) information and links to useful resources.
{papaja}
{papaja} is an R package for Preparing APA Journal Articles. It contains several functions and, perhaps most importantly, an .Rmd template that formats an output document (.docx or .pdf) in APA format. Here is the link to papaja’s documentation, the most helpful and comprehensive {papaja} resource out there.
{papaja}
First, we need to install {papaja}
. {papaja}
is not yet on CRAN, so we have to download it from GitHub rather than using install.packages()
. This requires the {devtools}
package. Let’s install both {devtools}
and {papaja}
.
# install the devtools package if necessary
if(!"devtools" %in% rownames(installed.packages())) install.packages("devtools")
# install the stable development version of papaja from GitHub if necessary
if(!"papaja" %in% rownames(installed.packages())) devtools::install_github("crsh/papaja")
# load papaja
library(papaja)
In addition to {papaja}
, the {citr}
package is going to be helpful when it comes to citing works in text (within our .Rmd) and the {corx}
package is going to make it easier to create a table of correlations.
# install the citr package if necessary
if(!"citr" %in% rownames(installed.packages())) devtools::install_github("crsh/citr")
# install the corx package if necessary
if(!"corx" %in% rownames(installed.packages())) install.packages("corx")
# load corx
library(corx)
OK, now we’re ready to go! Below I’ll briefly review some of the functions we’ll be using today before we dive into hands-on work in a {papaja}
template .Rmd.
The information covered in this section is not specific to {papaja}
, but it will be very useful when it comes to preparing manuscripts using {papaja}
.
To insert a value stored within an object into text in Rmarkdown, you can write it as “r name_of_object” (replacing the quotes with backticks; i.e., `).
For example, if you stored the value 5 in an object called x, you would write “r x” (replacing the quotes with backticks) to have the number 5 inserted in the text when you knit the Rmarkdown document.
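As a minimal sketch of that idea, the block below stores the value and then builds the sentence by hand. The object name sentence is made up for this illustration, and paste() is only a rough stand-in for what happens when the document is knit; in a real .Rmd you would just write the inline expression shown in the comment.

```r
# Store a value in an object, as in the example above
x <- 5

# In the body text of the .Rmd you would write:  The value is `r x`.
# When the document is knit, the inline expression is evaluated and its
# result is pasted into the sentence, roughly like this:
sentence <- paste("The value is", x)
print(sentence)
```

The same pattern works for anything that evaluates to a single value, such as a mean or a sample size computed earlier in the document.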
There are a few options for making APA-style tables in {papaja}
, each of which has different strengths and weaknesses.
We’ll start by going over apa_table()
from {papaja}
.apa_table()
can be used to format a dataframe as an APA-style table. It is built on top of knitr::kable()
. When using apa_table()
, the workflow generally goes something like:
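1. Create a dataframe with the values you want in the table.
2. Use printnum() (from {papaja}) to format the numeric columns.
3. Pass the dataframe to apa_table(), and set the caption and note (if applicable).
4. Set the chunk option results = "asis".
Let’s start by getting a table of descriptive stats. We’ll use the bfi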
data from the {psych}
package and use bfi.keys
to score the five scale scores.
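library(psych)      # bfi data, bfi.keys, and scoreVeryFast()
library(tidyverse)  # %>% plus the dplyr, tidyr, and ggplot2 functions used below
b5 <- bfi %>%
cbind(scoreVeryFast(keys = bfi.keys, items = bfi))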
First, we create a dataframe with the descriptive statistics of our Big Five scales. We’ll use some of the {dplyr}
functions for this.
descriptives <- b5 %>%
# select the columns for agreeableness to openness
select(agree:openness) %>%
# convert into long format
gather(scale, score) %>%
# group_by trait
group_by(scale) %>%
# calculate the mean, median, sd, min, and max for each trait
summarize(
Mean = mean(score),
Median = median(score),
SD = sd(score),
Min = min(score),
Max = max(score))
Next, we format the numerical columns using the printnum() function from {papaja}. In essence, this rounds everything to 2 decimal places (you can change this), pads with zeroes when necessary, and turns the numbers into strings so they print correctly. Using mutate_if() with is.numeric applies printnum() only to the numeric columns, so the non-numeric scale column is left as it is.
descriptives <- descriptives %>%
# if the column is numeric, apply the printnum function
mutate_if(is.numeric, printnum)
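To get a feel for what printnum() does before applying it to a whole dataframe, here is a small sketch on a few made-up numbers (the values are only for illustration and are not taken from the bfi data). The object name formatted is hypothetical, and the strings in the comments assume the default of two decimal places.

```r
library(papaja)

# printnum() returns character strings that are rounded and zero-padded
printnum(4.6)               # "4.60"
printnum(0.9, digits = 3)   # "0.900"

# It is vectorized, so it also works on a whole column of numbers
formatted <- printnum(c(4.651, 0.9))
formatted                   # "4.65" "0.90"
```

Because the result is character rather than numeric, it is usually best to do this formatting as the very last step, after all calculations are finished.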
Then, we run apa_table()
on the dataframe, and we can add a caption and note. You have to set results = "asis"
in the chunk options for it to render correctly:
apa_table(descriptives,
caption = "Descriptive statistics of Big Five Scale Scores.",
note = "This table was created with apa_table().")
scale | Mean | Median | SD | Min | Max |
---|---|---|---|---|---|
agree | 4.65 | 4.80 | 0.90 | 1.00 | 6.00 |
conscientious | 4.27 | 4.40 | 0.95 | 1.00 | 6.00 |
extraversion | 4.15 | 4.20 | 1.06 | 1.00 | 6.00 |
neuroticism | 3.16 | 3.00 | 1.20 | 1.00 | 6.00 |
openness | 4.59 | 4.60 | 0.81 | 1.20 | 6.00 |
Note. This table was created with apa_table().
Now we have a nicely formatted APA-style table that is fully reproducible.
Next, let’s get a table of scale inter-correlations for the same dataset. Correlation tables are a little difficult in {papaja}
, but the corx()
function from the {corx}
package makes it a little easier.
Basically, the workflow for this is:
1. Use corx() to create the correlation table.
2. Use apa_table() to add a caption and note, and to print it.
cor <- b5 %>%
# select the columns for agreeableness to openness
select(agree:openness) %>%
# create the correlation matrix
corx(triangle = "lower",
stars = c(0.05, 0.01, 0.001),
describe = c(M = mean, SD = sd))
apa_table(cor$apa,
caption = "Example correlation matrix",
note = "* p < 0.05; ** p < 0.01; *** p < 0.001")
 | 1 | 2 | 3 | 4 | M | SD |
---|---|---|---|---|---|---|
1. agree | - | | | | 4.65 | 0.90 |
2. conscientious | .26*** | - | | | 4.27 | 0.95 |
3. extraversion | .46*** | .26*** | - | | 4.15 | 1.06 |
4. neuroticism | -.19*** | -.23*** | -.22*** | - | 3.16 | 1.20 |
5. openness | .15*** | .20*** | .21*** | -.09*** | 4.59 | 0.81 |
Note. * p < 0.05; ** p < 0.01; *** p < 0.001
Now we’ll get a regression table using another method available in {papaja}
. We’ll use the apa_print()
function. This function takes statistical results as input and produces a list containing an APA-style table and several objects useful for printing results in-text (covered below).
The workflow for this method is:
1. Pass the fitted model to apa_print().
2. Extract the table from the apa_print() object with $table.
Let’s do that with two regression models. Let’s regress conscientious
on age
(model 1) and conscientious
on age
and education
(model 2).
First, fit the two regression models.
# tidy the data
b5 <- b5 %>%
# convert education to a factor
mutate(education = case_when(education == 1 ~ "Some HS",
education == 2 ~ "Finished HS",
education == 3 ~ "Some College",
education == 4 ~ "College Grad",
education == 5 ~ "Grad Degree"),
education = factor(education,
levels = c("Some HS",
"Finished HS",
"Some College",
"College Grad",
"Grad Degree"))) %>%
# remove rows where conscientiousness, age, or education are NA
filter(!is.na(conscientious),
!is.na(age),
!is.na(education))
# fit the regression models
model_1 <- lm(conscientious ~ age, data = b5)
model_2 <- lm(conscientious ~ age + education, data = b5)
Next, use apa_print()
to create those APA-style results objects:
# model 1 results
model_1_results <- apa_print(model_1)
# model 2 results
model_2_results <- apa_print(model_2)
# model comparison results
model_comp_results <- apa_print(list(model_1,
model_2),
boot_samples = 0)
Note: If we didn’t set boot_samples = 0
, apa_print()
would have produced bootstrapped CIs for the \(\Delta R^2\)s. We will cover bootstrapping next week.
After we have used apa_print()
to create the results objects, we can extract the APA tables.
You could get a table for just the Model 1 results using the code below:
model_1_results$table %>%
apa_table(caption = "Model Regressing Conscientiousness on Age",
note = "* p < 0.05; ** p < 0.01; *** p < 0.001")
Predictor | \(b\) | 95% CI | \(t(2575)\) | \(p\) |
---|---|---|---|---|
Intercept | 4.07 | \([3.96\), \(4.17]\) | 75.42 | < .001 |
Age | 0.01 | \([0.00\), \(0.01]\) | 4.71 | < .001 |
Note. * p < 0.05; ** p < 0.01; *** p < 0.001
Or we could get a table for just the Model 2 results:
model_2_results$table %>%
apa_table(caption = "Model Regressing Conscientiousness on Age & Education",
note = "* p < 0.05; ** p < 0.01; *** p < 0.001")
Predictor | \(b\) | 95% CI | \(t(2571)\) | \(p\) |
---|---|---|---|---|
Intercept | 3.87 | \([3.72\), \(4.02]\) | 50.43 | < .001 |
Age | 0.01 | \([0.01\), \(0.01]\) | 5.54 | < .001 |
EducationFinished HS | 0.05 | \([-0.12\), \(0.21]\) | 0.54 | .588 |
EducationSome College | 0.25 | \([0.11\), \(0.38]\) | 3.63 | < .001 |
EducationCollege Grad | 0.02 | \([-0.13\), \(0.18]\) | 0.28 | .782 |
EducationGrad Degree | 0.06 | \([-0.09\), \(0.22]\) | 0.79 | .430 |
Note. * p < 0.05; ** p < 0.01; *** p < 0.001
Finally, we can get these results together using the table stored in model_comp_results
:
model_comp_results$table %>%
apa_table(caption = "Model Comparison: Conscientiousness by Age and Age + Education",
note = "* p < 0.05; ** p < 0.01; *** p < 0.001")
 | Model 1 | Model 2 |
---|---|---|
Intercept | \(4.07\) \([3.96\), \(4.17]\) | \(3.87\) \([3.72\), \(4.02]\) |
Age | \(0.01\) \([0.00\), \(0.01]\) | \(0.01\) \([0.01\), \(0.01]\) |
EducationCollege Grad | | \(0.02\) \([-0.13\), \(0.18]\) |
EducationFinished HS | | \(0.05\) \([-0.12\), \(0.21]\) |
EducationGrad Degree | | \(0.06\) \([-0.09\), \(0.22]\) |
EducationSome College | | \(0.25\) \([0.11\), \(0.38]\) |
\(R^2\) [90% CI] | \(.01\) \([0.00\), \(0.02]\) | \(.02\) \([0.01\), \(0.03]\) |
\(F\) | 22.16 | 10.93 |
\(df_1\) | 1 | 5 |
\(df_2\) | 2575 | 2571 |
\(p\) | < .001 | < .001 |
\(\mathrm{AIC}\) | 6,979.81 | 6,955.69 |
\(\mathrm{BIC}\) | 6,997.38 | 6,996.68 |
\(\Delta R^2\) | | \(.01\) |
\(F\) | | 8.06 |
\(df_1\) | | 4 |
\(df_2\) | | 2,571 |
\(p\) | | < .001 |
\(\Delta \mathrm{AIC}\) | | -24.12 |
\(\Delta \mathrm{BIC}\) | | -0.70 |
Note. * p < 0.05; ** p < 0.01; *** p < 0.001
The last thing I’ll mention here is that we can use the chunk labels for our table to reference them in text. For example, we could reference this last table by adding Table \@ref(tab:reg-tbl-3)
in the {papaja}
.Rmd.
You can also create figures in {papaja}
manuscripts by using practically any of the methods you already know. I personally like {ggplot2}
, which can be put into APA format by using the theme_apa()
function from {papaja}
.
Let’s create an APA-formatted plot for the relation between age
and conscientious
.
ggplot(b5, aes(x = age, y = conscientious)) +
geom_point(alpha = .5) +
geom_smooth(method = "lm") +
labs(x = "Age",
y = "Conscientiousness") +
theme_apa()
## `geom_smooth()` using formula 'y ~ x'
The best way to give the figure a caption is to define a text reference in its own paragraph before the chunk that creates the figure:
(ref:fig-ageXconsc-cap) Conscientiousness by Age.
You then set the chunk option fig.cap = "(ref:fig-ageXconsc-cap)"
. There is an example of this in the example {papaja}
.Rmd, so we will see it in action in a moment.
We are able to extend this process of creating a figure to more complicated plots. For example, let’s add education
to the figure.
ggplot(b5, aes(x = age, y = conscientious, color = education)) +
geom_point(alpha = .5) +
geom_smooth(method = "lm", se = FALSE) +
labs(x = "Age",
y = "Conscientiousness") +
theme_apa()
## `geom_smooth()` using formula 'y ~ x'
Or one that has age
by each of the Big Five personality traits:
b5 %>%
select(age, agree:openness) %>%
gather(trait, score, -age) %>%
ggplot(aes(x = age, y = score)) +
geom_point(alpha = .5) +
geom_smooth(method = "lm") +
labs(x = "Age",
y = "Scale Score") +
facet_grid(~trait) +
theme_apa()
## `geom_smooth()` using formula 'y ~ x'
One of the most useful aspects of writing manuscripts in .Rmd and {papaja}
is that we can report our results directly from the statistical models we run in R, reducing the likelihood of copy-and-pasting errors.
We can use apa_print()
from {papaja}
to report the results of our statistical tests in text. Abbreviated results (estimate and CI) can be called with $estimate
from the apa_print()
object. For example, model_1_results$estimate$age
will print a nicely formatted printout of the slope for age on conscientiousness:
Age significantly predicted conscientiousness, \(b = 0.01\), 95% CI \([0.00\), \(0.01]\).
We can instead get the full result (including t, df, and p) by calling the $full_result
object, so model_1_results$full_result$age
will be rendered like so:
Age significantly predicted conscientiousness, \(b = 0.01\), 95% CI \([0.00\), \(0.01]\), \(t(2575) = 4.71\), \(p < .001\).
We could get the results of other slopes by changing out $age
to reference other terms. For example, we could report the result of the difference between some high school and having finished high school with model_2_results$full_result$educationFinished_HS
:
Finishing high school (vs. not finishing high school) was not associated with conscientiousness, \(b = 0.05\), 95% CI \([-0.12\), \(0.21]\), \(t(2571) = 0.54\), \(p = .588\).
We can also get model fit results, for both the individual models and for the comparison object. For example:
model_1_results$full_result$modelfit$r2 will print the \(R^2\) for model 1.
model_2_results$full_result$modelfit$r2 will print the \(R^2\) for model 2.
model_comp_results$full_result$model2 will print the \(\Delta R^2\) value (difference in \(R^2\)) and the F test for that difference.
Putting it all together, we can report those results with something like the following:
The model that also included (dummy-coded) education (\(R^2 = .02\), 90% CI \([0.01\), \(0.03]\), \(F(5, 2571) = 10.93\), \(p < .001\)) explained significantly more variance than the model with just age (\(R^2 = .01\), 90% CI \([0.00\), \(0.02]\), \(F(1, 2575) = 22.16\), \(p < .001\)), \(\Delta R^2 = .01\), \(F(4, 2,571) = 8.06\), \(p < .001\).
{papaja}
has various print methods, so be sure to check out the documentation.
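As a self-contained sketch of the same pattern, the block below fits a hypothetical model on R’s built-in mtcars data (not the b5 data used above), so it can be run without any of the earlier chunks. The object names fit and fit_results are made up for this illustration.

```r
library(papaja)

# Hypothetical example model using the built-in mtcars data
fit <- lm(mpg ~ wt, data = mtcars)
fit_results <- apa_print(fit)

# Each element is a character string, ready for inline use in the text
fit_results$estimate$wt               # slope and confidence interval
fit_results$full_result$wt            # adds the t statistic, df, and p value
fit_results$full_result$modelfit$r2   # R^2 with its CI and the F test

# In the .Rmd text you would then write something like:
# Weight predicted fuel economy, `r fit_results$full_result$wt`.
```

Running names(fit_results$full_result) is a quick way to see which terms are available, which helps when factor levels have been turned into names such as educationFinished_HS.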
The final ingredient we will talk about before getting into our example manuscript and the (single) hack for the lab is the bibliography. All .Rmd files can include a bibliography: entry in the YAML (the header portion of an .Rmd), where you can link to a BibTeX (i.e., .bib) file. BibTeX entries look something like this:
@article{goldberg1990alternative,
title={An alternative "description of personality": the big-five factor structure.},
author={Goldberg, Lewis R},
journal={Journal of personality and social psychology},
volume={59},
number={6},
pages={1216--1229},
year={1990},
doi = {https://doi.org/10.1037/0022-3514.59.6.1216},
publisher={American Psychological Association}
}
One convenient thing is that you can get them from Google Scholar. Let’s take a look at the Scholar results when we search for McClelland & Judd (1993).
From there, click on the quotes, and then on the BibTeX link, which takes you here.
You should see this:
@article{mcclelland1993statistical,
title={Statistical difficulties of detecting interactions and moderator effects.},
author={McClelland, Gary H and Judd, Charles M},
journal={Psychological bulletin},
volume={114},
number={2},
pages={376},
year={1993},
publisher={American Psychological Association}
}
You would then want to copy that into a .bib file (you can edit it within RStudio or choose another text editor), and then make sure you reference that .bib file in the bibliography: part of your YAML (we’ll see this in the example).
You can also use popular reference managers like Mendeley or Zotero to create BibTeX files.
Then, we could reference this article by referring to its name. There are basically three options:
Type | code | how it renders |
---|---|---|
Parenthetical citation | [@mcclelland1993statistical] | (McClelland and Judd 1993) |
Non-parenthetical | @mcclelland1993statistical | McClelland and Judd (1993) |
Just the year | [-@mcclelland1993statistical] | (1993) |
Note that you can also put additional text in the brackets. For example,
[see @mcclelland1993statistical]
becomes…
(see McClelland and Judd 1993)
You can also add multiple citations by separating them with a ;
. For example,
[@mcclelland1993statistical; @cohen2013applied]
becomes…
(McClelland and Judd 1993; Cohen et al. 2013)
One of the coolest things about using BibTeX is that only the references you cite in text will appear in your reference list! You can see that here, because the ‘lab-9.bib’ file has four entries and the reference list below has just the two we reference above. That means no more last-minute checks on the reference list to make sure you removed the articles you referenced in a previous draft (that are no longer referenced). This alone is a huge time saver.
{papaja}
{papaja} makes it easy to cite any R package you’ve used with r_refs(file = "r-references.bib"), which creates a .bib file (named r-references.bib in that example code) with BibTeX entries for all R packages you’ve called in the script. It is super handy!
You can combine this with the cite_r() function from {papaja}, which will create an in-text citation of all the packages you’ve used. We’ll see this in the example {papaja} .Rmd.
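Here is a minimal sketch of how the two functions can be used together. The object name pkg_citations is made up for this illustration, and the block assumes you are happy for r-references.bib to be written to the current working directory.

```r
library(papaja)

# Write BibTeX entries for R and the currently loaded packages to a .bib file
r_refs(file = "r-references.bib")

# Build an in-text citation string from the entries in that file
pkg_citations <- cite_r(file = "r-references.bib")

# In the .Rmd text you would then write something like:
# We used `r pkg_citations` for all our analyses.
```

For the citations to resolve when you knit, r-references.bib also needs to be listed in the bibliography: entry of the YAML, alongside your main .bib file.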
That is all for this tutorial. Now let’s turn to the example .Rmd, and finally work on the hack (time permitting)!
{papaja} hack
Today there is just one hack. For this hack, we want you to get started on your final project .Rmd.
1. Create a new .Rmd file for your final project using the {papaja} template.
2. Edit the YAML to have the title and author of your project.
3. Create a .bib file with at least one reference you know you’ll need for your final project.
4. Cite the reference from your .bib (part 3) in text.
5. Run at least one statistical test.
6. Report the statistical test from part 5 in-text, in a table, and in a figure.