ORIGINAL ARTICLES

Correlation of Milestone Ratings and Family Physicians’ Early Diabetes Management

Sean O. Hogan, PhD | Kenji Yamazaki, PhD | Eric S. Holmboe, MD

Fam Med. 2025;57(2):83-90.

DOI: 10.22454/FamMed.2025.980357

Abstract

Background and Objectives: Family physicians manage the treatment of patients with chronic illnesses like type 2 diabetes mellitus (T2DM). During residency, trainees are assessed on their management of chronic disease under the Accreditation Council for Graduate Medical Education patient care (PC) milestone. Residency programs are expected to ensure that trainees are prepared to meet patients’ needs; however, evidence is mixed as to whether milestone evaluations predict how well a physician will perform in early unsupervised practice. This study tested whether higher PC milestone evaluations predict greater adherence to T2DM guidelines for early-career family physicians.

Methods: Using national provider identification numbers, we linked family medicine trainees’ penultimate PC milestones with commercial insurance claims for T2DM patients. We attributed patients to the doctors who performed their evaluation and management visits and observed the extent to which those patients received HbA1c, retinal, and renal function exams. We followed doctors who graduated in June 2016 through the first 18 months of unsupervised practice.

Results: Milestones were not significantly associated with screening outcomes: HbA1c (OR=0.963, 95% CI [0.840, 1.104]), nephropathy (OR=0.983, 95% CI [0.901, 1.072]), or eye exam (OR=1.001, 95% CI [0.936, 1.070]). Rather, for every additional diabetes patient a family physician saw, administration of standard tests increased: HbA1c (OR=1.005, 95% CI [1.002, 1.009]) and nephropathy (OR=1.004, 95% CI [1.002, 1.006]).

Conclusions: Milestones for chronic disease management were not correlated with diabetes management for early career family physicians. The volume of diabetic patients under a doctor’s care was positively correlated with levels of expected screenings.

BACKGROUND

During residency, family medicine trainees are assessed on their skills in chronic disease management under the Accreditation Council for Graduate Medical Education (ACGME) milestone competency of patient care (PC). Use of the milestones assessment began in 2013 as a component of the Outcome Project and its emphasis on training physicians to deliver high-quality health care. 1-3 Recent empirical studies have reached mixed conclusions about whether milestone evaluations are predictive of outcomes in early unsupervised practice. Smith and colleagues found a correlation between milestone evaluations near the end of vascular surgery training and surgical complications in early unsupervised practice. 4 Similarly, Han and colleagues found that low professionalism and interpersonal communication milestones correlated with patient-initiated complaints in early unsupervised practice. 5 Kendrick and associates were unable to detect a correlation between final year milestones and postoperative morbidity and mortality among patients of surgeons in their first 2 years of unsupervised practice. 6 We, therefore, sought to determine whether a correlation exists between a milestone measure directly attentive to chronic disease management and the diabetes care provided by family physicians in early unsupervised practice.

The PC milestone for chronic disease management (PC2) assesses the resident using general terms. Once in unsupervised practice, physicians are likely to have their chronic disease management assessed differently, using disease-specific quality indicators such as the Healthcare Effectiveness Data and Information Set (HEDIS). 7 The HEDIS T2DM standards have largely been adopted by the American Academy of Family Physicians. 8 The standards call for T2DM patients under the care of a family physician to receive blood glucose (HbA1c) monitoring at least semiannually, an annual eye exam, and annual attention for nephropathy.

We hypothesized that variation in proper T2DM management, as recorded in insurance claims during the first 2 years of unsupervised practice, would correlate with resident milestone assessment in the chronic disease management subcompetency.

METHODS

We focused on residents who graduated from a family medicine program in the academic year 2015–2016 (ending June 2016). We used insurance claims provided by Blue Health Intelligence (BHI) to identify recent family medicine program graduates who cared for at least five T2DM patients and to reliably estimate physician performance in an outpatient setting during their first 18 months of unsupervised care. The BHI data are compiled from patients insured by a Blue Cross Blue Shield (BCBS) plan in the United States. We relied on BHI because of the BCBS coverage, which includes a network of 36 independently operated entities insuring approximately 109 million Americans (about a third of the population) and contracts with about 90% of hospitals and physicians, touching every ZIP code in the United States. 9

Milestones

As has become practice in studies relating the milestones to early unsupervised practice, we used the penultimate milestones evaluation as the point to compare to early practice habits. 4, 5 These evaluations are collected about 6 months prior to program completion. The choice of milestone during the penultimate assessment period was motivated by the following reasons. First, the level 4.0 rating of each milestone subcompetency is a recommended level; however, this guideline is often mistaken as a graduation requirement. The specialty-wide ratings at the final assessment period, where ratings center around level 4.0, 11 show substantially less variability than in the prior periods, suggesting the misinterpretation of milestones as a graduation requirement. This finding may suggest that milestone ratings at this stage are affected by considerations other than the assessor’s true evaluation of a resident’s performance.

The use of milestones 1.0 and the time period we chose allowed us to avoid any effects of the COVID-19 pandemic on implementation and assessment, or on entry into unsupervised care. The new harmonized family medicine milestones went into effect in the summer of 2020 and would not have permitted us time to study a cohort that graduated under the use of the new milestones.

Second, the penultimate rating also has educational value. It represents a time when faculty can intervene with residents. We focused on 3,450 third-year residents whose penultimate milestones were reported for the July–December 2015 period. Residents were trained in 448 programs in metropolitan [3,310 (95.9%)], micropolitan [136 (3.9%)], or small-town areas [4 (0.1%)]. Of the 3,450 residents, 3,328 (96.5%) were reported as having completed the 3-year training between June 2015 and August 2016. Each resident’s national provider identifier was used to link milestone data to the physician who rendered care in the BHI dataset.

Inclusion and Exclusion Procedures

Identifying Type 2 Diabetes Patients. T2DM patients were identified using quality indicators based on a modified version of the 2020 Quality Rating System HEDIS Value Set Directory. 12 HEDIS indicators rely on International Classification of Diseases (ICD), Current Procedural Terminology (CPT), and some additional codes. Because the BHI data contained only the CPT and ICD codes, we could not use the HEDIS standards precisely as described by the National Committee for Quality Assurance (NCQA). Additionally, we used NCQA instructions to infer which patients had diabetes. The instructions are based on ICD/CPT codes for the nature of patient encounters or on evidence the patient is prescribed an insulin directly related to diabetes care. The determination of T2DM patients was made for those who appeared in the dataset between January 2016 and December 2017 or between January 2017 and December 2018. This selection yielded 416,882 T2DM patients for our study.

Attributing Patients to Physicians. A total of 170 different methods are available for attributing a patient’s care to a physician or physician practice. 13 These methods generally rely on one or more of these features: (a) the frequency (majority or plurality) of encounters between a physician and patient, 14 (b) the preponderance of costs for a course of treatment, or (c) the recency of a patient-physician encounter. 15, 16 The appropriate method, however, must be tailored to the purpose of the study. 15-17

Because family physicians see patients for any number of conditions, our approach was to identify the family physician who had responsibility for a patient’s overall care. 13, 14 We attributed responsibility to a member of the June 2016 graduating cohort who met the T2DM patient for an evaluation and management visit in the outpatient setting during the 18 months immediately after residency graduation. In this way, we could ensure that the physicians in our study had the opportunity to establish themselves in practice and that we would not misattribute diabetes management to a physician who may have seen a patient for only an acute visit. This approach resulted in 29,050 outpatients being attributed to 2,186 physicians.
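The attribution rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the record layout, the field names (physician_npi, patient_id, visit_date, setting, is_em_visit), the exact window dates, and the choice to keep every qualifying physician-patient pair are all hypothetical. The minimum of five T2DM patients per physician comes from the Methods.

```python
from collections import defaultdict
from datetime import date

# Assumed study window: roughly the 18 months after a June 2016 graduation.
WINDOW_START = date(2016, 7, 1)
WINDOW_END = date(2017, 12, 31)
MIN_PATIENTS = 5  # physicians needed at least five T2DM patients


def attribute_patients(visits, cohort_npis):
    """Map each cohort physician to the set of T2DM patients seen for an
    outpatient evaluation and management (E/M) visit inside the window."""
    panel = defaultdict(set)
    for v in visits:
        if (
            v["physician_npi"] in cohort_npis
            and v["setting"] == "outpatient"
            and v["is_em_visit"]
            and WINDOW_START <= v["visit_date"] <= WINDOW_END
        ):
            panel[v["physician_npi"]].add(v["patient_id"])
    # Keep only physicians with enough attributed patients.
    return {npi: pts for npi, pts in panel.items() if len(pts) >= MIN_PATIENTS}


# Hypothetical toy data: physician "A" sees six patients, "B" sees two.
visits = [
    {"physician_npi": "A", "patient_id": f"p{i}", "visit_date": date(2016, 9, 1),
     "setting": "outpatient", "is_em_visit": True}
    for i in range(6)
] + [
    {"physician_npi": "B", "patient_id": f"q{i}", "visit_date": date(2017, 3, 1),
     "setting": "outpatient", "is_em_visit": True}
    for i in range(2)
] + [
    # An inpatient encounter, which the rule ignores.
    {"physician_npi": "A", "patient_id": "p99", "visit_date": date(2016, 10, 1),
     "setting": "inpatient", "is_em_visit": True},
]

panels = attribute_patients(visits, cohort_npis={"A", "B"})
```

In this toy example only physician "A" is retained, because "B" falls below the five-patient minimum and the inpatient encounter does not count toward attribution.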

Coding Events for Diabetes Management. For each expected diabetes screening (HbA1c, nephropathy screening, and retinal exam), we coded 1 if the BHI dataset recorded a claim for the patient having received the lab work from any health care provider, and 0 otherwise. We measured compliance with screening between the day of the first visit with the attributed physician and the subsequent 365 days. Similarly, this 365-day study period was applied to patients whose first visit to the physician occurred between January and December 2017. We determined the study period to fairly capture the physician’s compliance with diabetes screening for T2DM patients regardless of the date of their first visit to the physician.
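The binary coding of each screening can be illustrated with a short Python sketch. The function name, the claim dates, and the decision to treat both ends of the 365-day window as inclusive are assumptions made for illustration; the article specifies only that a claim from any provider within the 365 days after the first visit is coded 1, and 0 otherwise.

```python
from datetime import date, timedelta

FOLLOW_UP = timedelta(days=365)  # follow-up length stated in the text


def code_screening(first_visit, claim_dates):
    """Return 1 if any claim falls within 365 days of the first visit, else 0.

    Window boundaries are treated as inclusive (an assumption).
    """
    window_end = first_visit + FOLLOW_UP
    return int(any(first_visit <= d <= window_end for d in claim_dates))


# Hypothetical patient whose first visit with the attributed physician
# was on February 10, 2017.
first_visit = date(2017, 2, 10)
outcomes = {
    "hba1c": code_screening(first_visit, [date(2017, 6, 1)]),     # inside window
    "eye_exam": code_screening(first_visit, [date(2018, 5, 1)]),  # outside window
    "nephropathy": code_screening(first_visit, []),               # no claim at all
}
```

Because the window is anchored to each patient's own first visit, patients who first appear late in the observation period are given the same 365 days of follow-up as those who appear early.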

Statistical Analysis

We focused analysis on whether the penultimate milestones were associated with adherence to recommended diabetes screenings. This analysis controlled for the number of T2DM patients attributed to physicians and was conducted separately for HbA1c, nephropathy screening, and retinal screening.

We employed a multilevel modeling approach using generalized estimating equations (GEE) with a logistic model to account for the nested structure of the outcome—patients nested within each physician and physicians clustered in each residency program. GEE is appropriate for making population-averaged inferences over different clusters. 18, 19 The GEE model included an exchangeable working correlation matrix to deal with correlations between patient outcomes under the physician nested within the training program by assuming equal correlation among these outcomes.
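Written out, a population-averaged logistic model of this kind could take the following form. The notation is assumed here for illustration and does not appear in the article: Y_ij is the 0/1 screening indicator for patient i in cluster j, PC2_j is the penultimate milestone rating, n_j is the number of attributed T2DM patients, and alpha is the single within-cluster correlation implied by an exchangeable working structure.

```latex
\operatorname{logit}\,\Pr(Y_{ij} = 1)
  = \beta_0 + \beta_1\,\mathrm{PC2}_{j} + \beta_2\,n_{j},
\qquad
\operatorname{Corr}(Y_{ij},\, Y_{i'j}) = \alpha \quad (i \neq i').
```

Under this parameterization, the reported odds ratios would correspond to exp(beta_1) for a one-unit increase in the milestone rating and exp(beta_2) for each additional attributed patient.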

The results are presented as odds ratios with 95% confidence intervals (Figure 3). We determined statistical significance using two-tailed P values, with a significance threshold of P<.05. All statistical procedures were performed using SparkR version 3.5.0 (Apache Software Foundation).

The institutional review board at the American Institutes for Research rendered the study exempt because it used administrative data.

RESULTS

Of the 25,278 T2DM patients, 13,420 were identified as female (53.1%), and 11,858 as male (46.9%; Table 1). The median age (IQR) was 54 years (46–60), with a range of 18 to 75 years. Overall, 18,768 (74.2%) patients received at least one HbA1c test within the study period, 19,031 (75.3%) received nephropathy screening, and 8,831 (34.9%) received an eye exam. The mean (SD) PC2 milestone for 1,242 physicians was 3.67 (0.57), with a range of 1.5 to 5.0. The median number of patients attributed (IQR) was 14 (8–25), with a range of 5 to 166.
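The percentages in this paragraph follow directly from the reported counts. The short Python check below recomputes them; the variable names are illustrative, and the counts are copied from the text.

```python
TOTAL_PATIENTS = 25_278  # T2DM patients in the analytic sample

# Counts as reported in the Results.
counts = {
    "female": 13_420,
    "male": 11_858,
    "hba1c": 18_768,
    "nephropathy": 19_031,
    "eye_exam": 8_831,
}

# Share of the sample in each category, rounded to one decimal place.
percentages = {
    name: round(100 * n / TOTAL_PATIENTS, 1) for name, n in counts.items()
}
```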

For HbA1c and nephropathy screening, physicians responsible for more T2DM patients tended to see increased rates of administering recommended tests for T2DM. Every additional diabetes patient attributed to them reflected an increase in the administration of tests: HbA1c (OR=1.005, 95% CI [1.002, 1.009]) and nephropathy (OR=1.004, 95% CI [1.002, 1.006]). However, no statistically significant association was observed for eye exam administration (OR=1.001, 95% CI [1.000, 1.002]). In contrast, the milestone ratings were not significantly associated with any of the outcomes: HbA1c (OR=0.963, 95% CI [0.840, 1.104]), nephropathy (OR=0.983, 95% CI [0.901, 1.072]), and eye exam (OR=1.001, 95% CI [0.936, 1.070]; Table 2).
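Per-patient odds ratios this close to 1 are easier to read when compounded over a larger difference in volume. The Python sketch below does that arithmetic and applies the usual reading of a 95% confidence interval (an interval that includes 1 is not significant at the .05 level). The function names are illustrative, and compounding assumes the effect is linear on the log-odds scale, as it would be in a logistic model with patient volume entered as a single linear term.

```python
def compound_or(or_per_unit, units):
    """Odds ratio implied for a difference of `units`, assuming a
    constant (log-linear) per-unit effect."""
    return or_per_unit ** units


def ci_excludes_one(lower, upper):
    """True when the confidence interval lies entirely above or below 1."""
    return lower > 1.0 or upper < 1.0


# (odds ratio, CI lower bound, CI upper bound), copied from the Results.
results = {
    "hba1c_volume": (1.005, 1.002, 1.009),
    "nephropathy_volume": (1.004, 1.002, 1.006),
    "eye_exam_volume": (1.001, 1.000, 1.002),
    "hba1c_milestone": (0.963, 0.840, 1.104),
    "nephropathy_milestone": (0.983, 0.901, 1.072),
    "eye_exam_milestone": (1.001, 0.936, 1.070),
}

significant = {
    name: ci_excludes_one(lo, hi) for name, (_, lo, hi) in results.items()
}

# Odds ratio for HbA1c testing implied by ten additional attributed patients.
or_10_patients = compound_or(1.005, 10)  # roughly 1.05
```

On this reading, a physician with ten more attributed T2DM patients would have about 5% higher odds that a given patient received an HbA1c test, and only the two volume effects for HbA1c and nephropathy have intervals that exclude 1.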

DISCUSSION

Family physicians have responsibilities for managing the treatment of patients with chronic illnesses. T2DM is one chronic disease family physicians are very likely to encounter. Diabetes accounted for more than $306 billion in direct medical costs in 2022. 20 A total of 38.4 million Americans have diabetes, and more than 90% of them have T2DM. 21 For these patients, the family physician may be their principal point of contact with the health care system. 22 Therefore, ensuring that family physicians are prepared to deliver patients appropriate care for chronic diseases is important.

Comparing learner performance to early career clinical outcomes is important in evaluating the quality of medical education. 3 In this study, we followed the care of privately insured T2DM patients provided by early-career family physicians. We did not find an association between milestones for chronic disease management and later patient receipt of standard T2DM screenings. In contrast, Smith and her colleagues 4 and Han et al 5 found that milestone evaluations were associated with early-career outcomes.

The conflicting findings may result from important distinctions between surgical and nonsurgical medical services. One reason for this discrepancy is that ACGME requires surgical trainees to report procedural activity. In family medicine, trainees are not required to engage in a minimum number of visits with patients with each type of chronic illness. As such, they may not have developed confidence in performing some examinations or have had the equipment to do so during training. Faculty may not have been prompted to evaluate residents on any particular type of chronic illness, such as T2DM. The mandatory minima in surgical procedures may aid clinical competency committees in clearly observing their trainees across specific real-world experiences in arriving at specific milestone assessments.

Another important difference is that Smith et al 4, 17 used a composite of 15 milestone subcompetencies to generate a signal, whereas our analysis rested on a single subcompetency aligned with the anticipated outcomes. Because the milestone subcompetency we chose assesses chronic disease care, we intentionally restricted this study to test whether that single subcompetency would signal later unsupervised practice patterns. These findings suggest that future studies might also need to take into account additional milestone competencies put to use in a particular patient encounter. Physician-patient encounters are multifaceted, so assessments for multiple competencies may be needed in predicting practice outcomes.

Another potential source of differences is that a surgical complication is attributable directly to the surgeon’s skill. In the primary care fields, a physician may rely on a care team to follow through on diabetes screening. Claims data do not record when the doctor may have ordered a test, and thus the results may reflect a degree of the patient’s willingness or ability to follow through. In addition, claims data focused on process measures may be less sensitive than surgical registry data when examining for trends at the individual level.

We also found that recommended screening among recent graduates was below national norms, for HbA1c in particular. Compliance with clinical guidelines for monitoring progression of T2DM among our sample was below estimates for insured adults nationwide. An estimated 86.6% of insured T2DM patients reported having their HbA1c tested in the prior year, 23 compared to the 74.2% of patients of the recent residency program graduates in our sample. Rates of urine albumin screening for diabetes-related kidney disease among insured patients are estimated to be about 80%; 24 one study, however, using a large administrative database of electronic health records, found that only 37.7% of commercially insured patients received kidney screenings. 25 Retinal exams are estimated to be performed on about 37% of adults with diabetes, 25 slightly more than the 34.9% of the patients of the doctors in our sample.

While this finding is noteworthy, we are not suggesting that new family physicians are less prepared than their veteran colleagues on a national level. While we found that performance improved, predictably, with increased encounters with T2DM patients, we were unable to find other studies that compared patient volume as a benchmark for the larger family physician community. We could not control for national patient volume to compare new with veteran physicians.

Our findings suggest that early-career physicians with greater numbers of diabetic patients may find that their patients receive higher rates of standard screenings for T2DM. This finding may encourage family medicine programs to ensure that trainees interact with patients who have a variety of chronic conditions and that they are assessed on specific disease management. Finally, we resist the temptation to become overly critical of the milestones assessments given our findings. The difference between our results and those of Smith et al 4 or Han et al 5 suggests that connecting milestone assessments to early career outcomes may be sensitive to methodological approach, data sources, specialties, or the medical conditions being monitored. Outcomes analysis is in its infancy, and further research is necessary. For example, contrasting BHI data with other insurers’ combination of milestones, as Smith et al 4 did, may be revealing.

Limitations

Our study had several limitations. We studied a single cohort of residency graduates from a single specialty, from a single network of insurance, and with a single chronic condition. A wider array of chronic diseases may have revealed a correlation between the PC2 milestones and appropriate screening for other conditions. While T2DM is widespread and predictably an important part of the family physician’s work, additional studies will shed light on the chronic disease milestone subcompetency and later care for conditions such as high blood pressure or asthma. We also relied on a single insurer, albeit a very large one. That insurer’s coverage may influence patient behaviors related to their follow-through on recommended care.

Furthermore, claims data report only when a provider is reimbursed for services. They do not reflect whether a family physician ordered a test or referred the patient for subspecialty care. Patient behavior, or access to other specialists and labs, may account for some lack of follow-through. Still, HbA1c and urine tests can be performed during the evaluation and management visit that initiated doctor-patient relationships as we defined them, so we hope this is not a significant limitation. Moreover, the HEDIS indicators themselves rely on claims, and our approach emulated this common method of assessing quality of care.

Furthermore, the milestones and claims data may not lend themselves to validating each other. The milestones were designed as formative assessments, and various program directors have not had a shared mental model of how they should be operationalized; so residents may not be assessed consistently across programs or across types of patient encounters they may have. 26 Finally, the extant research testifies to challenges with using claims data, including attribution of provider to patient, selection of appropriate measures, completeness of the data as they were entered, 27 and how well they reflected the population being studied. 28

CONCLUSIONS

In this study, a single ACGME milestone subcompetency for chronic disease management was not correlated with diabetes management screenings for the privately insured patients of early career family physicians. Early career family physicians with higher numbers of diabetic patients had higher levels of expected screenings. Additional research will be needed to shed light on the utility of claims data in assessing early career chronic disease management by primary care physicians.

References

  1. Nasca TJ, Philibert I, Brigham T, Flynn TC. The next GME accreditation system—rationale and benefits. N Engl J Med. 2012;366(11):1,051-1,056. doi:10.1056/NEJMsr1200117
  2. Edgar L, Roberts S, Holmboe E. Milestones 2.0: A step forward. J Grad Med Educ. 2018;10(3):367-369. doi:10.4300/JGME-D-18-00372.1
  3. Chen FM, Bauchner H, Burstin H. A call for outcomes research in medical education. Acad Med. 2004;79(10):955-960. doi:10.1097/00001888-200410000-00010
  4. Smith BK, Yamazaki K, Tekian A, et al. Accreditation Council for Graduate Medical Education milestone training ratings and surgeons’ early outcomes. JAMA Surg. 2024;159(5):546-552. doi:10.1001/jamasurg.2024.0040
  5. Han M, Hamstra SJ, Hogan SO, et al. Trainee physician milestone ratings and patient complaints in early post-training practice. JAMA Netw Open. 2023;6(4):e237588. doi:10.1001/jamanetworkopen.2023.7588
  6. Kendrick DE, Thelen AE, Chen X, et al. Association of surgical resident competency ratings with patient outcomes. Acad Med. 2023;98(7):813-820. doi:10.1097/ACM.0000000000005157
  7. Glassman JR, Hopkins DSP, Bundorf MK, et al. Association between HEDIS performance and primary care physician age, group affiliation, training, and participation in ACA exchanges. J Gen Intern Med. 2020;35(6):1,730-1,735. doi:10.1007/s11606-020-05642-3
  8. American Academy of Family Physicians. Diabetes: clinical guidance and practice resources. Accessed December 27, 2023. https://www.aafp.org/family-physician/patient-care/clinical-recommendations/clinical-guidance-diabetes.html
  9. Brennan T. The settlement of the Blue Cross Blue Shield antitrust litigation: creating a new potential catalyst for health insurance industry restructuring. JAMA Health Forum. 2022;3(12):e224737. doi:10.1001/jamahealthforum.2022.4737
  10. Accreditation Council for Graduate Medical Education; American Board of Family Medicine. The family medicine milestone project. September 2013. http://residents.fammed.org/FamilyMedicineMilestones%20-%20Final.pdf
  11. Hamstra SJ, Edgar L, Yamazaki K, Holmboe ES. Milestones Annual Report. Accreditation Council for Graduate Medical Education; 2017. https://www.acgme.org/globalassets/pdfs/milestones/milestonesannualreport2017.pdf
  12. National Committee for Quality Assurance. 2020 quality rating system (QRS) HEDIS value set directory. NCQA; 2020. https://store.ncqa.org/2020-quality-rating-system-qrs-hedis-value-set-directory.html
  13. Riley W, Love K, Wilson C. Patient attribution—a call for a system redesign. JAMA Health Forum. 2023;4(3):e225527. doi:10.1001/jamahealthforum.2022.5527
  14. Pham HH, Schrag D, O’Malley AS, Wu B, Bach PB. Care patterns in Medicare and their implications for pay for performance. N Engl J Med. 2007;356(11):1,130-1,139. doi:10.1056/NEJMsa063979
  15. Ryan A, Linden A, Maurer K, et al. Attribution methods and implications for measuring performance in health care. National Quality Forum; July 15, 2016. Accessed December 27, 2023. http://www.qualityforum.org/Projects/a-b/Attribution_2015-2016/Commissioned_Paper.aspx
  16. Mehrotra A, Adams JL, Thomas JW, McGlynn EA. The effect of different attribution rules on individual physician cost profiles. Ann Intern Med. 2010;152(10):649-654. doi:10.7326/0003-4819-152-10-201005180-00005
  17. Pope GC. Attributing patients to physicians for pay for performance. In: Cromwell J, Trisolini MG, Pope GC, Mitchell JB, Greenwald LM, eds. Pay for Performance in Health Care: Methods and Approaches. RTI Press; March 2011. https://www.rti.org/rti-press-publication/pay-performance-health-care/fulltext.pdf
  18. McNeish D, Stapleton LM, Silverman RD. On the unnecessary ubiquity of hierarchical linear modeling. Psychol Methods. 2017;22(1):114-140.
  19. Ten Have TR, Ratcliffe SJ, Reboussin BA, Miller ME. Deviations from the population-averaged versus cluster-specific relationship for clustered binary data. Stat Methods Med Res. 2004;13(1):3-16. doi:10.1191/0962280204sm355ra
  20. Parker ED, Lin J, Mahoney T, et al. Economic costs of diabetes in the U.S. in 2022. Diabetes Care. 2024;47(1):26-43. doi:10.2337/dci23-0085
  21. Centers for Disease Control and Prevention. Type 2 diabetes. Accessed December 27, 2023. https://www.cdc.gov/diabetes/about/about-type-2-diabetes.html?CDC_AAref_Val=https://www.cdc.gov/diabetes/basics/type2.html
  22. Kushner PR, Cavender MA, Mende CW. Role of primary care clinicians in the management of patients with type 2 diabetes and cardiorenal diseases. Clin Diabetes. 2022;40(4):401-412. doi:10.2337/cd21-0119
  23. Twarog JP, Charyalu AM, Subhani MR, Shrestha P, Peraj E. Differences in HbA1C% screening among U.S. adults diagnosed with diabetes: findings from the National Health and Nutrition Examination Survey (NHANES). Prim Care Diabetes. 2018;12(6):533-536. doi:10.1016/j.pcd.2018.07.006
  24. Keong F, Gander J, Wilson D, Durthaler J, Pimentel B, Barzilay JI. Albuminuria screening in people with type 2 diabetes in a managed care organization. AJPM Focus. 2023;2(4):100133.
  25. Keong F, Gander J, Wilson D, Durthaler J, Pimentel B, Barzilay JI. Albuminuria screening in people with type 2 diabetes in a managed care organization. AJPM Focus. 2023;2(4):100133. doi:10.1016/j.focus.2023.100133
  26. Edgar L, McLean S, Hogan SO, Hamstra S, Holmboe ES. The Milestones Guidebook, version 2020. Accreditation Council for Graduate Medical Education; 2020. https://www.acgme.org/globalassets/MilestonesGuidebook.pdf
  27. Tyree PT, Lind BK, Lafferty WE. Challenges of using medical insurance claims data for utilization analysis. Am J Med Qual. 2006;21(4):269-275. doi:10.1177/1062860606288774
  28. Mouchawar J, Byers T, Warren M, Schluter WW. The sensitivity of Medicare billing claims data for monitoring mammography use by elderly women. Med Care Res Rev. 2004;61(1):116-127. doi:10.1177/1077558703260182

Lead Author

Sean O. Hogan, PhD

Affiliations: Accreditation Council for Graduate Medical Education, Chicago, IL | University of Illinois Chicago

Co-Authors

Kenji Yamazaki, PhD - Accreditation Council for Graduate Medical Education, Chicago, IL

Eric S. Holmboe, MD - Intealth, Philadelphia, PA

Corresponding Author

Sean O. Hogan, PhD

Correspondence: Department of Research, Milestones Development and Evaluation, Accreditation Council for Graduate Medical Education, Chicago, IL

Email: shogan@acgme.org
