Validity (4 types)
More about construct validity and its role in scale development.
Applicant selection for medical residency positions.
* Subtest of medical licensing exam is best predictor of residency success.
* Great!
* Right?
Fallacy:
Once a concept has been clarified, the next step is to measure it.
Classical test theory states that:
\[ X = T + E \]
X: Observed Score
T: True Score
E: error (random and unpredictable)
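Because error is random and unpredictable, it tends to average out toward zero across repeated measurements.
Thus, if we measure people enough “times”, the average of their observed scores gives us a good sense of their true score.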
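To make the averaging argument concrete, here is a minimal simulation sketch. It assumes the error term is drawn from a zero-mean normal distribution. That is an illustrative choice, since classical test theory only requires the error to be random with mean zero. The true score of 50, the error standard deviation of 10, and the function names are all hypothetical.

```python
import random
import statistics


def observe(true_score, error_sd, rng):
    """Return one observed score X = T + E, with E drawn from N(0, error_sd)."""
    return true_score + rng.gauss(0.0, error_sd)


def mean_of_repeated_measures(true_score, error_sd, n_measures, rng):
    """Average n_measures independent observations of the same person."""
    return statistics.fmean(
        observe(true_score, error_sd, rng) for _ in range(n_measures)
    )


# Hypothetical values chosen only for illustration.
TRUE_SCORE = 50.0
ERROR_SD = 10.0
rng = random.Random(42)  # fixed seed so the run is repeatable

for n in (1, 10, 100, 10_000):
    estimate = mean_of_repeated_measures(TRUE_SCORE, ERROR_SD, n, rng)
    # The standard error of the mean shrinks as error_sd / sqrt(n).
    print(f"n = {n:>6}: mean observed = {estimate:6.2f}, "
          f"expected SE = {ERROR_SD / n ** 0.5:5.2f}")
```

As the number of measurements grows, the mean observed score should settle near the true score, because the random errors increasingly cancel one another.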
Measuring people multiple times is often impractical and inefficient (and potentially theoretically wrong), so how can we remove measurement error during a single assessment?
How many items do we need?
Measure of the consistency of a single test or scale. (Like measuring the person many times in a single session.)
Not validity! You can have a reliable measure that is not valid.
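The slides do not name a specific coefficient here, so the following sketch assumes Cronbach's alpha, one common index of internal consistency. It also includes the Spearman-Brown formula as one way to approach the "how many items" question. The response data are made up for illustration, and the function names are hypothetical.

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row of item scores per respondent)."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_variances = [statistics.variance(item) for item in items]
    total_variance = statistics.variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def spearman_brown(reliability, length_factor):
    """Predicted reliability if the scale were length_factor times as long."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Made-up responses: 5 respondents x 4 items on a 1-5 scale.
responses = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

alpha = cronbach_alpha(responses)
print(f"alpha with 4 items: {alpha:.2f}")
print(f"predicted with 8 comparable items: {spearman_brown(alpha, 2):.2f}")
```

A high alpha only says the items hang together. It says nothing about whether they measure the intended construct, which is the point made above about reliability not being validity.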
Which items?
It wasn’t assigned, but please read a great article by Len Simms (2008). This article focuses on measurement using survey methods, but the logic extends to any measurement in which you aggregate multiple responses or scores.
I’m going to use this article as a template for developing a new scale with you.
There are many more articles worth reading on this (see Bonus Materials), and frankly, you should look for a class in measurement/psychometrics.
As written by Jane Loevinger (1957) and summarized by Len Simms (2008)
Let’s develop a scale for “academic stress.”
What are some specific aspects of academic stress?
Write the item pool
Add items here: https://shorturl.at/rGg7m
Takeaways:
* Be able to perform psychometric analysis from memory.
* (Kidding)
* Classical test theory
* Scale development takes time, thought, iteration, and data.