The use of self-assessment is prevalent in medical education, particularly in the context of smaller, single-institution, unfunded evaluations. Self-assessment can include a variety of question types that ask what a learner thinks they know about a topic (knowledge), or how confident the learner feels about doing a procedure, prescribing a medication, treating a specific condition, or working with a certain type of patient or population (confidence). Together, self-assessed knowledge and confidence are often used as proxies for objectively measured or observed knowledge or skill.
In education, self-assessment is philosophically tied to the concepts of lifelong learning and problem-based learning. An excellent and quick overview of the evolution of both peer- and self-assessment is included in the introduction to a paper by Papinczak and colleagues.1 The ability to accurately evaluate one's own acquisition of new knowledge and skills is laudable, and the concept is supported by cognitive theory as well.2 In practice, however, self-assessment has repeatedly been shown to correlate poorly with external or objective assessments of knowledge or skill. To give just a few examples, this has proven true when comparing self-assessment against peer and tutor assessment in problem-based learning scenarios1; in resident, medical student, and other learner predictions of their own test performance3–5; and repeatedly in comparisons of confidence versus knowledge, such as in statistical literacy among clinicians,6 or residents' confidence versus their ability to accurately diagnose dementia.7 The mismatch between confidence or self-assessed knowledge and objectively measured knowledge and skills can be tied to the Dunning-Kruger effect,8 which describes a nonlinear relationship between how much learners believe they know and how much they actually know about a topic, with the mismatch most pronounced in those who know the least.2,9
Unfortunately, self-assessments of both knowledge and confidence are frequently used to evaluate curricular elements or training modalities, not because of a theoretically based commitment to lifelong learning, but rather because they are, frankly, easy to implement. Consistent with the preponderance of the literature comparing self-assessment with objective assessment, a student-led study at my own institution recently reached the same conclusion: we found very poor correlations between fourth-year medical students' self-assessed confidence in their knowledge and ability to manage diabetes, and their performance on diabetes questions from standardized examinations.10
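To make the comparison concrete, the sketch below shows one common way such a correlation check might be run. The paired scores are invented for illustration, and the choice of Spearman's rank correlation (a reasonable default for ordinal self-ratings) is an assumption on my part, not the analysis used in the cited study.

```python
# Minimal sketch of a self-assessment vs objective-score correlation check.
# All data are hypothetical; Spearman's rank correlation is one common
# choice when one variable is an ordinal (Likert-style) self-rating.
from scipy.stats import spearmanr

# Hypothetical paired data: self-rated confidence (1-5 Likert) and
# percent-correct on objective exam items for the same learners.
self_confidence = [4, 5, 3, 4, 2, 5, 3, 4, 5, 2]
exam_score      = [61, 55, 72, 58, 70, 64, 49, 77, 52, 66]

rho, p_value = spearmanr(self_confidence, exam_score)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```

A coefficient near zero on data like these is what "very poor correlation" between confidence and performance means operationally.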
At PRiMER, we have been progressively applying stricter thresholds for publishing research reports that rely upon self-assessed knowledge or confidence as the only outcome measures. As described in our quality guidelines,11 PRiMER uses the Kirkpatrick Model of Assessment in our evaluation of submitted manuscripts. Within the Kirkpatrick framework, we view self-assessment as essentially a level-1 “reaction” measure. Given the poor performance of self-assessment across many domains of knowledge and skill acquisition, for self-assessed knowledge and confidence alike, we believe it is important to hold this line.
Of course, there are times when self-assessment of knowledge or confidence is appropriate. A few examples include:
- The inclusion of a self-assessment alongside other quantitative and objective measures of knowledge or skill acquisition, or alongside qualitative, phenomenological descriptions of student experiences.
- Studies that actually examine metacognition, or that evaluate educational activities specifically designed to increase self-awareness, confidence, or similar domains. For example, a clinician's confidence in treating patients or populations who differ from them (in gender, race, culture, socioeconomic status, or other traits) can affect who engages in care, who enters specific specialties or practice contexts, and so on. In cases like these, where the entire point of an intervention is to increase confidence, it may be intellectually honest to ask learners about their own confidence levels.
- There are also circumstances where no validated measures exist, and/or confidence is a necessary precursor to action. For example, an activity may be designed specifically to impart confidence in taking an action, such as “engaging in advocacy work.” It makes sense to ask learners whether they feel more confident about, and interested in, engaging in advocacy work, if that confidence is required to take the next step.
An example that encompasses most of these points appears in a 2023 paper by Robinson and Mishori.12 In that report, self-assessed knowledge and confidence questions were combined with a deeper qualitative study phase to give an overall picture of the outcomes of a medical student advocacy workshop. There are few widely recognized instruments or formalized assessments of advocacy skills; a person must be interested and engaged in order to move forward with advocacy activities; and the manuscript combined self-assessment with other modalities of evaluation. This is a good recent example of where we have moved forward with a publication that incorporated self-assessment.
We will take a much stronger stance in instances where self-assessed knowledge or comfort is simply presented as a proxy for objective measurement of knowledge or skill acquisition. The model evaluation will contain the following elements (a minimal analytic sketch follows this list):
- Measurement of skill or knowledge against a baseline
- Preferably with a reference population, statistical control for covariates, or both
- Assessment that is based upon objective measurement (eg, quiz items, tests, outcomes on standardized examinations) or external (nonself) observation
- Analysis conducted within a reasonable time interval and with an appropriate level of rigor.
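As a purely illustrative sketch of what such an evaluation might look like analytically, the snippet below fits a covariate-adjusted pre/post (ANCOVA-style) model with Python's statsmodels. The data, column names (pre_score, post_score, group, pgy_year), and modeling choices are all assumptions for illustration, not a prescribed method.

```python
# Minimal sketch of objective measurement against a baseline, with
# statistical control for covariates (ANCOVA-style). All data and
# column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical evaluation data: objective quiz scores before and after
# an intervention, group assignment, and a covariate (training year).
df = pd.DataFrame({
    "pre_score":  [62, 70, 55, 68, 74, 59, 66, 71],
    "post_score": [75, 82, 60, 79, 85, 64, 70, 88],
    "group":      ["intervention"] * 4 + ["control"] * 4,
    "pgy_year":   [1, 2, 1, 3, 2, 1, 3, 2],
})

# Model post-intervention scores as a function of group, adjusting for
# baseline performance and training year.
model = smf.ols("post_score ~ group + pre_score + pgy_year", data=df).fit()
print(model.summary())
```

Adjusting for baseline scores in this way helps avoid attributing regression to the mean, or preexisting group differences, to the intervention itself.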
We believe that enforcing these standards, which are in keeping with the quality guidelines that have been in place since PRiMER began, will serve two purposes: increasing the overall rigor of the journal and its contributions to the literature, and aiding our educational mission to nurture new scholars as they initiate their research careers. Holding a high bar will establish good habits at the outset of those careers and, as an aspirational goal for PRiMER, contribute to an overall improvement in the quality of medical education research.