Beginning January 26, 2022, United States Medical Licensing Examination (USMLE) Step 1 Examination (Step 1) scores will be reported as pass or fail, as the test was originally designed. This decision was made thoughtfully and with broad input from stakeholder organizations as part of the Invitational Conference on USMLE Scoring (InCUS).1 Family medicine (FM) educators should celebrate this change. The unintended consequences of overemphasizing Step 1 have been well described for both faculty2 and students.3 The current use of Step 1 as a filter for graduate medical education (GME) applications is problematic: the score is a poor predictor of clinical performance,4,5 perpetuates structural inequities by race and gender,5,6 and negatively impacts student well-being by shifting attention away from institutional undergraduate medical education (UME) performance and extracurricular activities, including service and research.3 Finally, there is reason to suspect this change will have positive implications for FM specialty choice. Chen et al describe how students choosing to specialize in primary care are often assumed to have lower examination scores, and how students with high Step 1 scores are commonly encouraged to apply to more competitive specialties.
The InCUS group did not recommend USMLE grade changes in isolation. It also recommended a full review of the UME-to-GME transition, leading to the creation of the UME-to-GME Education Review Committee (UGRC). The UGRC recently released its initial report with 43 preliminary recommendations7 addressing issues of advising, competency assessment, information available about applicants, interviews and visiting rotations, equity, intern preparedness, oversight, and transitioning from student to resident. I strongly encourage FM educators to review these recommendations. They acknowledge that the current system is failing applicants, programs, and the public good. Although we can and should debate the details, these recommendations seek to revitalize and improve the entire UME-GME transition process. Additionally, the recommendations around competency assessment and communication form a bridge to the Accreditation Council for Graduate Medical Education (ACGME) Milestones, allowing for a more seamless learning framework and better evaluation strategies in the future.
During the COVID-19 pandemic, the USMLE Step 2 Clinical Skills Examination (CS) was suspended, then retired.8 While the pass rate for US graduates has been at or above 95%, the pass rate for international medical graduates (IMGs) has been around 75%.9 For programs with a large IMG applicant pool, this is yet another stressor. The Educational Commission for Foreign Medical Graduates (ECFMG) has responded by creating multiple pathways that allow IMGs to demonstrate the skills necessary to enter residency; there are now six possible pathways for the 2022 Match.10 IMGs play an essential role in our workforce, and programs face unique challenges when reviewing these applicants, who make up approximately 60% of all FM applicants.11
In parallel with these changes, medical schools are shifting to pass/fail grading.12 This shift is driven by the heterogeneity of grading systems and the imprecision of their meaning,13 an increased focus on competency-based standards across the medical education continuum, and evidence that pass/no-pass grading can improve student well-being14 without impacting performance.15 Some are concerned that these changes will swing the pendulum too far away from medical knowledge, but the requisite knowledge has not changed. Pass/fail scoring preserves “the ability of medical licensing authorities to use the exam for its primary purpose of medical licensure eligibility.”1
The confluence of these changes likely leaves program directors (PDs) feeling less informed while managing more applications than ever. The mean number of applications per applicant has skyrocketed in the past 20 years in a vicious cycle described as application fever.16 Despite informational campaigns by medical schools and organizations,17 the number of applications per applicant continues to climb.11 In 2020, FM programs averaged 1,147 applications.11 At 10 minutes per file, it would take 8 days working around the clock to perform even a cursory review. The current system overwhelms PDs and forces them to seek ways to filter out applicants. The National Resident Matching Program (NRMP) PD Survey reports that only 30% of FM applicants receive an in-depth review!18 Not only are filters like Step 1 scores problematic for equity, well-being, and the prediction of future performance; they also mean our residencies are all pursuing the same applicants. In 2016, 7% of FM applicants received 50% of all interview offers, and 23% of those who interviewed accounted for 50% of all interviews.19 In the short term, focus will likely shift to the USMLE Step 2 Clinical Knowledge (CK) examination: 88% of internal medicine and orthopedic PDs surveyed reported that the Step 1 grade change will increase their emphasis on Step 2 CK scores.20 It appears we need filters, but what are the factors we want to filter in or out of our programs?
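The 8-day figure follows directly from the numbers above; as a rough check, assuming uninterrupted around-the-clock review (an illustrative assumption only):

$$1{,}147\ \text{files} \times 10\ \tfrac{\text{min}}{\text{file}} = 11{,}470\ \text{min} \approx 191\ \text{h} \approx 8\ \text{days}.$$

Any realistic schedule, with reviews limited to working hours and spread across a committee, stretches that burden over weeks.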
Applicants want better data too. There is no realistic way for an applicant to determine which of the 700+ FM GME programs would be a good fit, and the lack of transparency around what programs are looking for in applicants further increases the pressure to overapply. Applicants realize that unless they look like students in the top 10%, the data suggest they should apply broadly to get enough interviews to match. Applicants and advisors have a variety of tools at their disposal, including those produced by the American Academy of Family Physicians (AAFP),21 the NRMP,18,22 the Association of American Medical Colleges,17,23,24 the American Medical Association (AMA),25 a university collaboration,26 and third-party platforms.27-29 However, these data often focus on easily measured factors of questionable significance, such as “number of job experiences.” Despite these resources, a lack of transparency around selection criteria remains. The AAFP Residency Directory and the AMA’s FREIDA come closest to allowing students to search and filter programs, but even these are limited to geography, program size, community served, and program type. Are these the factors we want our training programs defined by? The fit of applicants to programs can improve if we increase transparency about what we are looking for and add more meaningful programmatic data to the AAFP Residency Directory.
It will take multiple interventions in parallel to get us out of this mess. We must decrease the applications per applicant, clarify what we care about in applicants, be transparent in the mission and outcomes of our programs, and help build a system that allows for greater bidirectional transparency and sorting. We should not be satisfied with a system that screens out 70% of applicants. Except for application caps, it is unlikely that any single intervention will get us back to the ratios we saw in the early 2000s, but there are other options to consider. Staged applications (early acceptance), preference signaling, and interview caps could all reduce the load on programs but do little to help applicants know which programs may be a good fit for them.
Step 1 scores poorly predict GME success, and the examination struggles to measure the knowledge, skills, and attitudes that do predict it. For over a decade we’ve talked about the need for collaboration between UME and GME so that interns are ready to hit the ground running.30 Recent graduates have higher USMLE scores than any prior generation of physicians. With less emphasis on Step 1 scores, we have an opportunity to acknowledge and cultivate skills across all the ACGME Core Competencies, allowing applicants to thrive as residents.
Finally, the FM community should begin treating Step 1 as a pass/no-pass examination with the class of 2023. The pandemic’s impact on this cohort has been profound. As a student affairs dean, I’ve heard countless stories of how COVID-19 personally affected students and their families during the dedicated study period. As students tried to sit for the exam, the disruption continued: nearly half the students at my institution had their exams canceled because of testing site closures. Some were notified on the day of their exam, and many experienced multiple cancellations.
Three-digit Step 1 scores will soon be history, and many are nervous about their ability to differentiate applicants without these data. These scores were never validated for this purpose, and the unintended consequences of the Step 1 climate and application fever are ultimately bad for programs, applicants, and the public good. We should work within our specialty and across academic medicine to break the vicious cycle of overapplication. We should be transparent about what we are looking for in applicants and what we strive for in our graduates. Now is the time to define what we care about and how we can assess and communicate those data. In coordination with the suggestions above, we should study how these interventions meet the needs of our programs, applicants, and their future patients.