Development of a Machine Learning Model Using Electronic Health Record Data to Identify Antibiotic Use Among Hospitalized Patients

JAMA Netw Open. 2021 Mar 1;4(3):e213460. doi: 10.1001/jamanetworkopen.2021.3460.


IMPORTANCE: Comparisons of antimicrobial use among hospitals are difficult to interpret owing to variations in patient case mix. Risk-adjustment strategies incorporating larger numbers of variables have been proposed as a method to improve comparisons for antimicrobial stewardship assessments.

OBJECTIVE: To evaluate whether variables of varying complexity and feasibility of measurement, derived retrospectively from electronic health records, accurately identify inpatient antimicrobial use.

DESIGN, SETTING, AND PARTICIPANTS: Retrospective cohort study, using a 2-stage random forests machine learning modeling analysis of electronic health record data. Data were split into training and testing sets to measure model performance using area under the curve and absolute error. All adult and pediatric inpatient encounters from October 1, 2015, to September 30, 2017, at 2 community hospitals and 1 academic medical center in the Duke University Health System were analyzed. A total of 204 candidate variables were categorized into 4 tiers based on feasibility of measurement from the electronic health records.
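
The evaluation described above can be illustrated with a minimal Python sketch. It assumes, since the abstract does not specify it, that the 2 stages combine as a hurdle: stage 1 yields a probability of any antimicrobial exposure, and stage 2 yields days of therapy that count only when that probability reaches a threshold. The function names, the 0.5 threshold, and the encounter values are hypothetical and are not study data; the stage outputs are hard-coded here in place of fitted random forests.

```python
from typing import List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    # Count exposed/unexposed pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute difference between observed and predicted days of therapy."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def two_stage_days(
    prob_exposed: Sequence[float],
    days_if_exposed: Sequence[float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the study
) -> List[float]:
    """Assumed hurdle rule: predict 0 days unless stage 1 flags exposure."""
    return [
        d if p >= threshold else 0.0
        for p, d in zip(prob_exposed, days_if_exposed)
    ]


# Hypothetical held-out encounters (illustrative values only).
actual_days = [0, 0, 3, 7, 0, 2]
prob_exposed = [0.1, 0.6, 0.8, 0.9, 0.3, 0.4]     # stand-in for stage 1 output
days_if_exposed = [2.0, 1.0, 4.0, 5.0, 3.0, 2.0]  # stand-in for stage 2 output

ever_exposed = [1 if d > 0 else 0 for d in actual_days]
predicted_days = two_stage_days(prob_exposed, days_if_exposed)

print("AUC (binary exposure):", auc(ever_exposed, prob_exposed))
print("MAE (days of therapy):", mean_absolute_error(actual_days, predicted_days))
```

In this sketch the area under the curve scores only the stage 1 probabilities against the binary (ever or never) outcome, while the absolute error scores the combined 2-stage prediction against observed days of therapy, mirroring the 2 exposure measures in the abstract.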

MAIN OUTCOMES AND MEASURES: Antimicrobial exposure was measured at the encounter level in 2 ways: binary (ever or never) and number of days of therapy. Analyses were stratified by age (pediatric or adult), unit type, and antibiotic group.

RESULTS: The data set included 170 294 encounters and 204 candidate variables from 3 hospitals during the 2-year study period. Antimicrobial exposure occurred in 80 190 encounters (47%); 64 998 (38%) received 1 to 6 days of therapy, and 15 192 (9%) received 7 or more days of therapy. Two-stage models identified antimicrobial use with high fidelity (mean area under the curve, 0.85; mean absolute error, 1.0 days of therapy). Addition of more complex variables increased accuracy, with the largest improvements occurring with inclusion of diagnosis information. Accuracy varied based on location and antibiotic group. Models underestimated the number of days of therapy for encounters with long lengths of stay.

CONCLUSIONS AND RELEVANCE: Models using variables derived from electronic health records identified antimicrobial exposure accurately. Future risk-adjustment strategies incorporating encounter-level information may make comparisons of antimicrobial use more meaningful for hospital antimicrobial stewardship assessments.

PMID:33779743 | DOI:10.1001/jamanetworkopen.2021.3460