Please complete this assignment using the RMarkdown file provided [Link to Rmarkdown file here]. Once you download the RMarkdown file, please (1) include your name in the preamble and (2) rename the file to include your last name (e.g., “weston-homework-4.Rmd”). When you turn in the assignment, include both the .Rmd and knitted .html files.
To receive full credit on this homework assignment, you must earn 15 points. You may notice that the total number of points available to earn on this assignment is 29 – this means you do not have to answer all of the questions. You may choose which questions to answer. You cannot earn more than 15 points, but it may be worth attempting many questions for learning’s sake. Here are a couple things to keep in mind:
Points are all-or-nothing, meaning you cannot receive partial credit if you correctly answer only some of the bullet points for a question. All must be answered correctly.
After the homework has been graded, you may retry questions to earn full credit, but you may only retry the questions you attempted on the first round.
The first time you complete this assignment, it must be turned in by 9am on the due date. Late assignments will receive 50% of the points earned. For example, if you correctly answer questions totaling 14 points, the assignment will receive 7 points. If you resubmit this assignment with corrected answers (a total of 15 points), the assignment will receive 7.5 points.
You may discuss homework assignments with your classmates; however, it is important that you complete each assignment on your own and do not simply copy someone else’s code. If we believe one student has copied another’s work, both students will receive a 0 on the homework assignment and will not be allowed to resubmit the assignment for points.
Data:
- homework-happy: This dataset examines happiness in college in relation to school success, friendship quality, socioeconomic status, and intervention group (1 = control, 2 = study skills training, 3 = social skills training). (Note that the variable names contain spaces, which can make them tricky to work with. You might consider renaming the variables.)
- homework-spatial: These data come from Experiment 1 of Maglio & Polman (2014), which examined whether a person’s spatial orientation affects perceived distance. They conducted the experiment on a train. Half of the participants were headed eastbound and the other half were headed westbound (direction and also orientation; these are the same variable). Participants were also randomly assigned to indicate the subjective distance of one of four subway stations (station). The outcome variable is subjective distance.
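One way to handle variable names that contain spaces is to rename them right after loading the data. The following is a minimal sketch in base R; the toy data frame and its column names are hypothetical stand-ins, since the actual names in homework-happy may differ.

```r
# Toy stand-in for homework-happy; these column names are hypothetical.
happy <- data.frame(
  `school success` = c(3.1, 2.4, 4.0),
  `friendship quality` = c(5, 3, 4),
  check.names = FALSE  # keep the spaces so the example mirrors the problem
)

# Lowercase every name and replace spaces with underscores.
names(happy) <- gsub(" ", "_", tolower(names(happy)), fixed = TRUE)

names(happy)
```

After renaming, the columns can be used in model formulas without backticks.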
Using the dataset homework-happy, run a two-predictor regression model predicting happiness from friendship and school success and the interaction between the two.
Describe your hypothesis for the interaction using the study variables (i.e., don’t write a formula, but describe the hypothesis in words).
Describe in words exactly what the coefficients \(b_1\), \(b_2\), and \(b_3\) are telling us in this model.
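For reference, the coefficients named above can be read against the usual form of a two-predictor interaction model. The labeling here is an assumption, since the assignment does not fix which predictor is \(X_1\): friendship is taken as \(X_1\), school success as \(X_2\), and happiness as \(Y\).

\[
\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2
\]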
Refer to the model built in Question 3.1.
Center the independent variables and re-estimate the model. Interpret the coefficient estimates.
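The mechanics of centering and re-estimating can be sketched as follows. This uses simulated toy data with hypothetical column names (friendship, school_success, happiness), not the homework dataset, and it shows only the procedure, not the interpretation.

```r
set.seed(1)
n <- 100

# Simulated toy data; names and values are hypothetical.
toy <- data.frame(
  friendship = rnorm(n, mean = 5, sd = 1),
  school_success = rnorm(n, mean = 3, sd = 0.5)
)
toy$happiness <- 1 + 0.5 * toy$friendship + 0.3 * toy$school_success +
  0.2 * toy$friendship * toy$school_success + rnorm(n)

# Center each predictor by subtracting its sample mean.
toy$friendship_c <- toy$friendship - mean(toy$friendship)
toy$school_success_c <- toy$school_success - mean(toy$school_success)

# Fit the model with raw predictors and again with centered predictors.
fit_raw <- lm(happiness ~ friendship * school_success, data = toy)
fit_centered <- lm(happiness ~ friendship_c * school_success_c, data = toy)

# Compare the two sets of estimates side by side.
cbind(raw = coef(fit_raw), centered = coef(fit_centered))
```

Placing the two coefficient vectors next to each other makes it easy to see which estimates change after centering and which do not.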
Refer to the model built in Question 3.1.
Health researchers commonly acknowledge that cardiac arrest is caused by high levels of cholesterol. They also agree that cholesterol is caused by smoking and weight. Moreover, it is generally agreed that both the choice to smoke and choices that contribute to weight are caused by generally unhealthy lifestyles. (Note: these are not the only causal factors contributing to either smoking or weight, but we’re trying to keep the example manageable for a 5-point problem.)
Build a DAG model that represents these known relationships. What are the different paths that transmit associations from smoking to cardiac arrest?
What variables, if any, should a researcher control for if they want to estimate the causal association between smoking and cardiac arrest?
Should the researcher control for cholesterol? Why or why not?
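A DAG like this can be written down and queried in R. The sketch below assumes the dagitty package is installed (the assignment does not name a package), and the node names are hypothetical labels for the variables described in the problem. It encodes only the relationships stated above and leaves the interpretation of the output to you.

```r
library(dagitty)  # assumption: the dagitty package is installed

# Node names are hypothetical labels for the variables in the problem.
g <- dagitty("dag {
  lifestyle -> smoking
  lifestyle -> weight
  smoking -> cholesterol
  weight -> cholesterol
  cholesterol -> cardiac_arrest
}")

# List every path connecting smoking and cardiac arrest,
# along with whether each path is open.
smoking_paths <- paths(g, from = "smoking", to = "cardiac_arrest")
smoking_paths

# Ask which covariate sets identify the total effect of smoking.
adjustmentSets(g, exposure = "smoking", outcome = "cardiac_arrest")
```

Calling plot(g) will draw the graph, although dagitty may ask for node coordinates first.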
Maglio & Polman (2014) studied the psychological distance of objects (how far away we feel an object is) in the context of the subway: How far away does a passenger on a train perceive a station to be? For their study, they recruited participants who were standing either on an eastbound or westbound platform of a specific subway station (Bay Street). All participants were asked how far away another station on the route feels on a scale from 1 (very close) to 7 (very far). Participants were randomly asked about one of four stations: Spadina (two stops to the west), St. George (one stop to the west), Bloor-Yonge (one stop to the east), or Sherbourne (two stops to the east).
What are the sample sizes for each group?
Run the factorial ANOVA from Experiment 1. Interpret the sums of squares and F-ratios.
What is the effect size associated with each of your terms? What is the \(R^2\) for the model?
Conduct a series of post-hoc tests that address the following hypotheses (be sure to interpret):
Graph the results in a manner that you think best conveys the data. Make sure you include 95% CIs.
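The cell counts, the factorial ANOVA, and the effect sizes can be computed along these lines. This is a sketch on simulated, balanced toy data with hypothetical column names (direction, station, distance), not the homework-spatial data. Note that aov() reports sequential sums of squares, which match other types only when the design is balanced.

```r
set.seed(2)

# Simulated toy data: 10 hypothetical participants per cell of a 2 x 4 design.
toy <- expand.grid(
  rep = 1:10,
  direction = c("east", "west"),
  station = c("Spadina", "St. George", "Bloor-Yonge", "Sherbourne")
)
toy$distance <- round(runif(nrow(toy), min = 1, max = 7))

# Sample size in each direction-by-station cell.
cell_n <- table(toy$direction, toy$station)
cell_n

# Two-way factorial ANOVA with the interaction term.
fit <- aov(distance ~ direction * station, data = toy)
tab <- summary(fit)[[1]]
tab

# Eta-squared for each term: SS_term / SS_total (residual row dropped).
ss <- tab[["Sum Sq"]]
n_rows <- length(ss)
eta_sq <- setNames(ss[-n_rows] / sum(ss), trimws(rownames(tab))[-n_rows])
eta_sq

# Model R^2: 1 - SS_residual / SS_total.
r_squared <- 1 - ss[n_rows] / sum(ss)
r_squared
```

Because the toy outcome is random noise, the numbers themselves are meaningless; only the steps carry over to the real data.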
Refer to the dataset and model estimated in Question 5.2.
Run a power analysis to determine sample size needed for the effect sizes found in the model fit above. (Note: the pwr package is a nice one to use but be sure to use the glm/regression one, pwr.f2.test, as opposed to the one-way anova. Also, pay close attention to the effect size requested. Those more adventurous can perform simulations to calculate power.)
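A call to pwr.f2.test might look like the sketch below. The \(R^2\) value is a hypothetical placeholder, and the target power of .80 and alpha of .05 are assumptions, since the question does not specify them. The function expects Cohen's \(f^2\), which for a full model is \(R^2 / (1 - R^2)\).

```r
library(pwr)  # assumption: the pwr package is installed

r2 <- 0.10             # hypothetical R^2; replace with the value from your model
f2 <- r2 / (1 - r2)    # convert R^2 to Cohen's f^2

# u = numerator df. A full 2 x 4 factorial model has 7 model df
# (1 for direction, 3 for station, 3 for the interaction).
u <- 7

# Leave v (denominator df) unspecified so the function solves for it.
res <- pwr.f2.test(u = u, f2 = f2, sig.level = 0.05, power = 0.80)
res

# Total sample size implied by the denominator df: N = v + u + 1.
n_total <- ceiling(res$v) + u + 1
n_total
```

To look at a single term instead of the whole model, one option is to set u to that term's degrees of freedom and compute \(f^2\) from that term's partial effect size.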
The power analysis above does not give you the power for detecting differences between two cell means. Find the number of subjects needed for power of .90 with an effect size difference of .4 between two cell means, assuming no corrections for Type 1 error. (Hint: .4 is an unstandardized effect size, but most power functions require a standardized effect size. Some of the equations covered in PSY 611 may be helpful here.)
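The conversion described in the hint can be sketched with base R's power.t.test. The pooled standard deviation below is a hypothetical placeholder; one possible source for it, offered as an assumption rather than the required method, is the square root of the residual mean square from your ANOVA.

```r
raw_diff <- 0.4    # unstandardized difference between two cell means
sd_pooled <- 1.2   # hypothetical; e.g., sqrt(MS residual) from your model

# Standardize the difference: d = raw difference / pooled SD.
d <- raw_diff / sd_pooled

# Solve for n per group with power = .90 and no Type I error correction.
# Passing delta = d with sd = 1 puts the test on the standardized scale.
res <- power.t.test(delta = d, sd = 1, sig.level = 0.05,
                    power = 0.90, type = "two.sample")

ceiling(res$n)  # participants needed in each of the two cells
```

The returned n is per group, so the total across the two cells is twice that number.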
Assume that the researchers had an additional variable, familiarity with the area, to use as a covariate. The reasoning is that people who know the area well may make better distance predictions than those who do not. How do you think this would impact the findings? Describe what would change in terms of your standard regression components (why and in what direction).