I read with interest Shimkin and colleagues’ evaluation of the Underserved Pathway (UP) at the University of Washington School of Medicine, which used propensity score matching (PSM) to examine the association of UP participation with career outcomes.1
PSM is an elegant approach to handling self-selection into voluntary programs when comparing outcomes between program participants and non-participants. By limiting the sample to “cases” (UP participants) and matched “controls” (non-UP participants who had a similar “propensity” to have participated in the program, based on other characteristics), PSM can reduce confounding bias in a different manner than single-equation techniques, such as multivariable regression of career outcomes on UP participation.2 However, after PSM is used to create the matched sample, analyses are still subject to confounding by unobserved variables, special methods are needed to analyze the matched sample correctly, and special limitations apply to the conclusions from this analysis.2,3
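To make the matching step concrete, the following is a minimal sketch in Python of greedy 1:1 nearest-neighbor matching without replacement, assuming the propensity scores have already been estimated (for example, from a logistic model of participation on baseline characteristics). The function name, the caliper of 0.05, and the toy scores are hypothetical choices for illustration; this is not the matching procedure reported by Shimkin and colleagues.

```python
def greedy_match(case_scores, control_scores, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    case_scores / control_scores: estimated propensity scores (0 to 1).
    caliper: largest score gap accepted for a match (hypothetical default).
    Returns a list of (case_index, control_index) pairs.
    """
    available = set(range(len(control_scores)))
    pairs = []
    for i, score in enumerate(case_scores):
        if not available:
            break
        # Closest still-unused control; ties are broken by the lower index.
        j = min(available, key=lambda k: (abs(control_scores[k] - score), k))
        if abs(control_scores[j] - score) <= caliper:
            pairs.append((i, j))
            available.remove(j)
        # Otherwise the case stays unmatched and leaves the analysis.
    return pairs


# Toy scores, invented for illustration only.
cases = [0.30, 0.62, 0.91]
controls = [0.10, 0.28, 0.33, 0.60, 0.75]
pairs = greedy_match(cases, controls)
print(pairs)
```

In this toy example the third case has no control within the caliper and is dropped, as are the three unused controls, which illustrates how matching shrinks the analytic sample.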
Fundamentally, discarding “unmatched” controls trades a smaller sample size (384, relative to the initial 2,027 students) for arguably lower bias in the estimate of the UP effect.4 The latter point is critical: a propensity-matched sample should be used only for estimating the effect of the primary exposure (UP participation) on one or more outcomes. Therefore, among the odds ratios (ORs) shown in Tables 2-5, only the OR for UP participation can be considered generalizable. The other ORs, for example, the OR for “race” in Table 3, are not generalizable to the school’s graduates (let alone to other institutions), because they are calculated from a sample purposefully selected to be balanced on the propensity of participating in UP. Conclusions about other drivers of specialty choice and practice location, apart from UP participation, would likely differ if those exposures were studied in the entire original sample, or if each exposure were studied separately with its own PSM analysis.
The implementation of PSM in this study would have benefitted from two other technical modifications, as described in more detail in prior literature on this method.2,3,5 First, the balance of covariates in the matched sample (Table 1) should have been analyzed using standardized differences (which are not sensitive to sample size), rather than null hypothesis testing methods, such as analysis of variance (where an educationally significant difference could appear not to be “statistically significant” if the sample size were small enough). Second, multivariable analyses of each outcome should have accounted for the matched, or clustered, nature of the sample. A conventional logistic regression, for example, assumes that all observations were sampled independently of one another. In this case, conditional (or “fixed-effects”) logistic regression should have been used to reflect the fact that each control was selected to match a specific case.
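Both modifications can be illustrated with a short Python sketch, offered under stated assumptions rather than as a reanalysis of the study. The first function computes the standardized difference for a continuous covariate as the difference in group means divided by the square root of the average of the two group variances; an absolute value below about 0.1 is a commonly used rule of thumb for adequate balance. The second function assumes strict 1:1 matching and no additional covariates, in which case the conditional logistic estimate of the OR reduces to the ratio of discordant pairs. All function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_difference(cases, controls):
    """Standardized difference for a continuous covariate.

    Scales the difference in means by the pooled standard deviation,
    so the result does not shrink or grow with sample size.
    """
    pooled_sd = sqrt((variance(cases) + variance(controls)) / 2)
    return (mean(cases) - mean(controls)) / pooled_sd


def matched_pair_odds_ratio(pair_outcomes):
    """Conditional OR for 1:1 matched pairs with a binary outcome.

    pair_outcomes: list of (case_outcome, control_outcome), each 0 or 1.
    Only discordant pairs carry information; concordant pairs drop out.
    """
    case_only = sum(1 for c, k in pair_outcomes if c == 1 and k == 0)
    control_only = sum(1 for c, k in pair_outcomes if c == 0 and k == 1)
    if control_only == 0:
        raise ValueError("OR is undefined without control-only discordant pairs")
    return case_only / control_only


# Toy data, invented for illustration only.
age_cases = [24, 26, 28, 30]
age_controls = [25, 27, 29, 31]
d = standardized_difference(age_cases, age_controls)

outcomes = [(1, 0), (1, 0), (1, 0), (0, 1), (1, 1), (0, 0)]
odds_ratio = matched_pair_odds_ratio(outcomes)
print(round(d, 3), odds_ratio)
```

When the outcome model includes further covariates, or when matching is not strictly 1:1, this shortcut no longer applies and a conditional logistic model fitted in statistical software is needed.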
PSM is a powerful tool for reducing the confounding bias that vexes many medical education projects. Yet, results from PSM must be interpreted with caution, and its singular purpose can make it unsuitable for broader research questions, such as asking which among many exposures can predict a certain outcome.
