Antimicrobial minimum inhibitory concentrations can be imputed from phenotypic data using a random forest approach

Gayatri Anil, BS (https://orcid.org/0000-0002-0956-7279)
Department of Clinical Sciences, College of Veterinary Medicine, Cornell University, Ithaca, NY
Department of Public and Ecosystem Health, College of Veterinary Medicine, Cornell University, Ithaca, NY

Joshua Glass, MS (https://orcid.org/0000-0003-1067-0537)
Department of Clinical Sciences, College of Veterinary Medicine, Cornell University, Ithaca, NY

Abdolreza Mosaddegh, PhD (https://orcid.org/0000-0001-5840-3628)
Department of Clinical Sciences, College of Veterinary Medicine, Cornell University, Ithaca, NY
Multidisciplinary Graduate Engineering Department, College of Engineering, Northeastern University, Oakland, CA

Casey L. Cazer, DVM, PhD (https://orcid.org/0000-0002-2290-1868)
Department of Clinical Sciences, College of Veterinary Medicine, Cornell University, Ithaca, NY
Department of Public and Ecosystem Health, College of Veterinary Medicine, Cornell University, Ithaca, NY

Abstract

Objective

Antimicrobial resistance (AMR) is a public health threat requiring monitoring across multiple sectors because AMR genes and pathogens can pass between humans, animals, and the environment. Idiosyncrasies in AMR data, including missing data and changes in testing protocols, make characterizing AMR trends over time and sectors challenging. Therefore, this study applied machine learning methods to impute missing minimum inhibitory concentrations.

Methods

Models were built using cattle-associated Escherichia coli from the National Antimicrobial Resistance Monitoring System. Random forest models were designed to predict the minimum inhibitory concentration of a given E coli isolate for 10 antimicrobials. Predictors included isolate metadata and the minimum inhibitory concentrations of other antimicrobials. Model performance was evaluated on held-out test data and 2 external datasets (E coli isolated from chickens and humans).

Results

Overall, the accuracy within 1 minimum inhibitory concentration category was over 80% for all 10 antimicrobials and over 90% for 5 antimicrobials on test data. Six of the models performed as well on both external datasets as on test data, whereas the remaining 4 had similar accuracy on the human dataset but lower on the chicken data.

Conclusions

These results indicate that the models can predict minimum inhibitory concentration values at a level of accuracy that would be helpful for imputation in resistance datasets.

Clinical Relevance

The imputation of missing minimum inhibitory concentrations would allow for better evaluation of AMR trends over time, helping inform stewardship policies. These models may also help streamline surveillance and clinical susceptibility testing because they suggest which antimicrobials need to be laboratory-tested and which can be extrapolated by modeling.

Antimicrobial resistance (AMR) is a major public health threat because AMR bacteria and genes can pass between humans, animals, and the environment, so antimicrobial use in 1 sector affects the others. In 2019, the WHO declared AMR a top-10 global public health threat and prioritized combatting AMR in its strategic plan.1 Many countries, international bodies, and professional organizations have developed antimicrobial surveillance programs and are implementing antimicrobial stewardship programs or regulations. It is essential to develop surveillance methodologies that can detect impacts of the stewardship programs so that efforts can be directed to the most efficacious policies. However, the relationship between AMR genes, AMR phenotypes, host factors, and antimicrobial use is complex, and several studies have been unable to link antimicrobial stewardship policies with changes in AMR. For example, in 2005, approval for use of fluoroquinolone drugs in poultry was withdrawn in response to increasing fluoroquinolone resistance in poultry-associated Campylobacter infections in people.2 Several follow-up studies,3–6 however, were unable to find reduced AMR in Campylobacter after the fluoroquinolone withdrawal. This inability to link changes in AMR with changes in policy constrains our ability to evaluate AMR stewardship policies and enact new, effective ones.

Idiosyncrasies in resistance datasets, including missing data and changes in testing protocols (such as the antimicrobials and concentrations tested in different years and by different surveillance labs),7 make characterizing AMR trends over time challenging. Antimicrobial resistance is typically assessed in a microbiology laboratory by growing and isolating the bacterial pathogen, identifying it taxonomically, and then performing minimum inhibitory concentration assays,8,9 in which the isolate is grown in the presence of multiple concentrations of antimicrobial agents to determine the minimum concentration of drug needed to inhibit the bacteria's growth (the minimum inhibitory concentration value). These laboratory analyses are time-consuming: even analysis of fast-growing bacteria may exceed 36 hours, and the isolate must be tested against several antimicrobial agents to determine its susceptibility profile. Once the minimum inhibitory concentration value for a given antimicrobial is determined, breakpoints published by entities like the Clinical and Laboratory Standards Institute (CLSI), the US FDA, or the European Committee on Antimicrobial Susceptibility Testing (EUCAST) are used to interpret the minimum inhibitory concentration value and classify the isolate as susceptible, intermediate, or resistant to the tested drug.7,10 However, over time, as new pharmacological and patient outcome data are obtained and resistance mechanisms emerge, breakpoint guidelines are revised.7 Manufacturers of automated antimicrobial susceptibility testing (AST) devices may have to reformulate these devices to test a minimum inhibitory concentration range that accommodates the new breakpoints or may change which antimicrobial agents are being tested. These revisions lead to a cycle in which labs have to continuously update their susceptibility typing methods, and different labs may follow different methods depending on which version of the breakpoint guidelines or AST device is being used.11 These changes may lead to data idiosyncrasies that make it difficult to compare AMR trends over time, such as when a drug was tested only in select years or when the range of minimum inhibitory concentrations tested was modified.

Prediction models have been used to supplement or replace phenotypic antimicrobial susceptibility typing. Most commonly, a curated set of reference genes known to confer resistance is used to predict susceptibility from bacterial whole-genome sequencing data.8 These methods, however, require the bacterial species’ resistance mechanisms to be well known to give accurate predictions. Recently, machine learning methods have been used for AMR prediction because they can make predictions a priori without any underlying information about the input data.8 Several studies12–14 have used bacterial genomic sequences to predict whether a bacterial isolate is susceptible or resistant to particular antibiotics. More recently, genomic data have been used to predict the minimum inhibitory concentration value of a given isolate using either k-mer or pangenome-based approaches for feature selection.8,9,15,16 By predicting minimum inhibitory concentration values rather than binary susceptible-resistant categories, these methods preserve the granularity of AMR data and allow for the comparison of minimum inhibitory concentration differences above and below susceptibility breakpoints. Minimum inhibitory concentration predictions are also more robust to changes in breakpoints than susceptible-versus-resistant prediction models. The existing minimum inhibitory concentration prediction models, however, all require bacterial genome sequences as the input for prediction, and due to cost, not all isolates collected for surveillance or clinical reasons are sequenced. In addition, historic isolates may not have been preserved and so cannot be sequenced, leaving only phenotypic minimum inhibitory concentration data available for analysis.

The aim of this study is to impute missing data in AMR datasets while preserving minimum inhibitory concentration–level granularity. By using phenotypic rather than genotypic data as predictors, these models may be applicable to a wider range of surveillance and clinical datasets, including isolates collected before sequencing technologies were widely available. Imputing missing AMR data could allow for better evaluation of AMR trends over time, helping inform stewardship policies. In this study, we used random forest models to predict Escherichia coli minimum inhibitory concentration values for a specific antimicrobial from minimum inhibitory concentration values for other antimicrobials and isolate metadata. Models were made to predict minimum inhibitory concentration values for 10 different antimicrobial agents. Cattle-associated E coli were selected for model building because they were tested against a wide range of antimicrobials by the National Antimicrobial Resistance Monitoring System (NARMS) and are a common cause of foodborne illness in people. Model performance was then evaluated on cattle-associated E coli test data and 2 external datasets curated by NARMS: chicken-associated E coli and E coli O157:H7 isolates collected from human clinical cases.

Methods

Isolate minimum inhibitory concentration data and metadata

A total of 11,713 cattle-associated E coli isolates collected by NARMS from 2002 through 2019 were used in this study. The National Antimicrobial Resistance Monitoring System is a national surveillance program in the US that tracks changes in the antimicrobial susceptibility of foodborne enteric bacteria isolated from sick people, food animals, and retail meats.2 The isolates in this study were collected by NARMS from either raw beef purchased from grocery stores from 2002 through 2019 or from cecal samples collected from cattle at slaughter from 2013 through 2019. The National Antimicrobial Resistance Monitoring System then determines the antimicrobial susceptibility of each isolate using broth microdilution on a Sensititre panel (ThermoFisher Scientific) at FDA and USDA laboratories.17 Susceptibility testing for a total of 16 antimicrobials was performed in at least 1 year of the study period: tetracycline, nalidixic acid, ciprofloxacin, chloramphenicol, ampicillin, azithromycin, trimethoprim-sulfamethoxazole, sulfisoxazole, ceftriaxone, cefoxitin, ceftiofur, amoxicillin-clavulanic acid, streptomycin, kanamycin, gentamicin, and amikacin. The dataset used in this study, including isolate minimum inhibitory concentration values and metadata, is publicly available and was downloaded from the NARMS website.18

Data processing

Data preprocessing and model building were conducted in RStudio (RStudio, version 2024.09.1, Posit PBC; R, version 4.3.2, The R Foundation). Minimum inhibitory concentration values were tabulated for each antimicrobial tested. To ensure that each category had unique boundaries, isolates were removed from the dataset if their minimum inhibitory concentration interval overlapped with another, more common minimum inhibitory concentration interval. For example, for the antimicrobial azithromycin, there were 5 isolates with a minimum inhibitory concentration of > 16, 5 isolates classified as = 32, and 1 isolate classified as > 32. Since the > 16 isolates had a minimum inhibitory concentration interval that overlapped with both the = 32 category and the > 32 category, isolates designated as > 16 for azithromycin were removed from the dataset. Such discrepancies led to the removal of 10 isolates, leaving 11,713 isolates total.
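The overlap check described for azithromycin can be sketched as follows. This is a minimal Python illustration, not the authors' R code; the label syntax, the helper names `parse`, `contains`, and `overlaps`, and the choice to treat an "=" label as a single point are all assumptions made for the example.

```python
import math

def parse(label):
    """Return (low, high, low_closed) for labels like '<=0.5', '=32', '>16', '>=32'."""
    s = label.replace(" ", "")
    if s.startswith("<="):
        return 0.0, float(s[2:]), False
    if s.startswith(">="):
        return float(s[2:]), math.inf, True
    if s.startswith(">"):
        return float(s[1:]), math.inf, False
    v = float(s.lstrip("="))
    return v, v, True

def contains(label, x):
    """True if the value x is compatible with the category label."""
    low, high, low_closed = parse(label)
    above_low = x >= low if low_closed else x > low
    return above_low and x <= high

def overlaps(a, b):
    """True if two category labels could describe the same MIC value."""
    low = max(parse(a)[0], parse(b)[0])
    high = min(parse(a)[1], parse(b)[1])
    if low < high:
        return True
    # The ranges touch at one point: overlap only if both include it.
    return low == high and contains(a, low) and contains(b, low)

# Azithromycin labels mentioned in the text.
labels = [">16", "=32", ">32"]
pairs = [(a, b) for i, a in enumerate(labels) for b in labels[i + 1:] if overlaps(a, b)]
print(pairs)  # [('>16', '=32'), ('>16', '>32')]
```

Under these assumptions only the > 16 label conflicts with the other two, which matches the decision to drop the > 16 isolates.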

In addition, since the models sought to predict a minimum inhibitory concentration value, the number of isolates within each minimum inhibitory concentration category in the training dataset needed to be similar to ensure balanced predictions. Therefore, minimum inhibitory concentration categories with few isolates were combined into broader minimum inhibitory concentration intervals. Categories were not condensed across breakpoint boundaries, though, to allow for the comparison of model accuracy on susceptible-versus-resistant isolates. For example, 10 isolates had a cefoxitin minimum inhibitory concentration of ≤ 0.5, 260 isolates had a minimum inhibitory concentration of = 1, 2,243 isolates had a minimum inhibitory concentration of = 2, 7,203 isolates had a minimum inhibitory concentration of = 4, 1,760 isolates had a minimum inhibitory concentration of = 8, 132 isolates had a minimum inhibitory concentration of = 16, and 105 isolates had a minimum inhibitory concentration of ≥ 32. Because the minimum inhibitory concentration categories ≤ 0.5 and = 1 had fewer isolates, they were combined with the = 2 category to make a new category, ≤ 2, which had 2,513 isolates. However, since NARMS breakpoints for cefoxitin are susceptible: ≤ 8, intermediate: = 16, and resistant: ≥ 32, and the epidemiologic cutoff value from EUCAST is 16, the minimum inhibitory concentration categories = 16 and ≥ 32 were not condensed. The final minimum inhibitory concentration categories used for each antimicrobial and the number of isolates present in each interval are detailed in Table 1. These categories were used both as the predicted classes when modeling the given antimicrobial and as levels in the model when the antimicrobial was used as a predictor variable.
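The cefoxitin example can be reproduced with a short sketch. This is a minimal Python illustration rather than the authors' R code; the function name `collapse` and the explicit `merge_map` are hypothetical, and the guard against merging across a breakpoint is one possible way to encode the rule stated in the text.

```python
from collections import Counter

# Cefoxitin isolate counts per reported category, taken from the text.
counts = {"<=0.5": 10, "1": 260, "2": 2243, "4": 7203, "8": 1760, "16": 132, ">=32": 105}
# Interpretation under the NARMS breakpoints (S = susceptible, I = intermediate, R = resistant).
interp = {"<=0.5": "S", "1": "S", "2": "S", "4": "S", "8": "S", "16": "I", ">=32": "R"}
# Hypothetical merge map: sparse low categories are folded into "<=2".
merge_map = {"<=0.5": "<=2", "1": "<=2", "2": "<=2"}

def collapse(counts, merge_map, interp):
    """Sum counts into merged categories, refusing merges that cross a breakpoint."""
    merged = Counter()
    merged_interp = {}
    for cat, n in counts.items():
        target = merge_map.get(cat, cat)
        # Every category folded into `target` must share one interpretation.
        if merged_interp.setdefault(target, interp[cat]) != interp[cat]:
            raise ValueError(f"merge into {target!r} crosses a breakpoint")
        merged[target] += n
    return dict(merged)

collapsed = collapse(counts, merge_map, interp)
print(collapsed)  # {'<=2': 2513, '4': 7203, '8': 1760, '16': 132, '>=32': 105}
```

Attempting to merge the = 16 and ≥ 32 categories with this function raises an error, mirroring the decision to keep them separate.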

Table 1

Number of cattle-associated Escherichia coli isolates in each minimum inhibitory concentration category.

Antimicrobial Minimum inhibitory concentration distribution
Minimum inhibitory concentration value 0.015 0.03 0.06 0.125 0.25 0.5 1 2 4 8 16 32 64 128 256
AMC 596a 2424a 7184a 1344a 35b 130c
AMI 301a 1646a 559a 62a 4a
AMP 1115a 5146a 4618a 175a 23b 635c
AXO 11553a 32a 16a 4b 5c 31c 43c 21c 8c
CHL 4933a 6086a 200b 494c
CIP 11413a 158a 142c
COT 10750a 821a 25a 117c
FIS 8609a 1149a 67a 1287a
FOX 2513a 7203a 1760a 132b 105c
GEN 390a 6793a 4151a 289a 19a 18b 53c
KAN 3642a 42a 2b 86c
NAL 1019a 8524a 2041a 19a 110c
TET 8081a 437b 220c 2975c
TIO 319a 2283a 2922a 68a 9a 8b 46c

AMC = Amoxicillin-clavulanic acid. AMI = Amikacin. AMP = Ampicillin. AXO = Ceftriaxone. CHL = Chloramphenicol. CIP = Ciprofloxacin. COT = Trimethoprim-sulfamethoxazole. FIS = Sulfisoxazole. FOX = Cefoxitin. GEN = Gentamicin. KAN = Kanamycin. NAL = Nalidixic acid. NARMS = The National Antimicrobial Resistance Monitoring System. TET = Tetracycline. TIO = Ceftiofur.

The number of isolates listed is the number of isolates in each category for the entire cattle-associated E coli dataset (before splitting into training and testing data).

Superscript letters give each category's interpretation under the National Antimicrobial Resistance Monitoring System breakpoints: a = Susceptible isolates. b = Intermediate isolates (if the intermediate category was present in the breakpoints for that drug). c = Resistant isolates.

For a given antimicrobial, if 2 minimum inhibitory concentration categories were combined to make a broader interval in the model, this was reflected by merging the cells corresponding to the interval together.

Predictor variables

The predictor variables used in each model were year of isolate collection and minimum inhibitory concentration values of other antimicrobials (Table 2). Year was treated as a nominal categorical variable (data type set as factor in R) and antimicrobial minimum inhibitory concentration values as ordinal categorical variables (data type set as ordinal factor in R). Minimum inhibitory concentration values were treated as ordinal categorical variables rather than numeric variables since they represent sequential categories and do not have continuous values (ie, an isolate can have a minimum inhibitory concentration value of 2 or 4 but not 2.22 when measured with broth microdilution).
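The idea of an ordinal categorical variable can be illustrated with a small sketch. This is a Python analogue of an ordered factor, not the authors' R code; the `levels` list uses the cefoxitin categories from the text, and the helper name `encode` is hypothetical.

```python
# MIC categories for one drug, listed from lowest to highest
# (cefoxitin categories after condensing, used here for illustration).
levels = ["<=2", "4", "8", "16", ">=32"]
rank = {label: i for i, label in enumerate(levels)}

def encode(mic_label):
    """Map an MIC category label to its ordinal rank (0 = lowest)."""
    return rank[mic_label]

# Ordinal codes keep the order of categories but not their numeric spacing:
# 8 to 16 is one step, just as <=2 to 4 is one step.
codes = [encode(x) for x in ["4", ">=32", "<=2"]]
print(codes)  # [1, 4, 0]
```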

Table 2

Predictor variables used in each model.

Predictor variables AMC AMI AMP CHL CIP COT FIS FOX TET TIO
Year of isolate collection X X X X X X X X X X
AMC minimum inhibitory concentrations X X X X X X X
AMP minimum inhibitory concentrations X X X X X X X X
AXO minimum inhibitory concentrations X X X X X X X X X X
CHL minimum inhibitory concentrations X X X X X X X X X
CIP minimum inhibitory concentrations X X X X X X X X X
COT minimum inhibitory concentrations X X X X X X X X X
FIS minimum inhibitory concentrations X X X X X X X X
FOX minimum inhibitory concentrations X X X X X X X X X
GEN minimum inhibitory concentrations X X X X X X X X X X
KAN minimum inhibitory concentrations X
NAL minimum inhibitory concentrations X X X X X X X X X X
TET minimum inhibitory concentrations X X X X X X X X X
TIO minimum inhibitory concentrations X
Total number predictor variables 11 14 10 11 11 9 11 11 11 11

Model predictor variables were year of isolate collection and known isolate minimum inhibitory concentration values for other antimicrobials. Year of isolate collection was treated as a nominal categorical variable, and all antimicrobial minimum inhibitory concentration predictors were treated as ordinal categorical variables. Predictor variables that are present in a given model are marked with an X.

Any antimicrobial that was tested on at least 90% of the isolates used to build a given model was considered a possible predictor variable for feature selection. The package used to build the models (randomForest package in R) is not able to account for missing predictor values. Therefore, only isolates with a complete set of predictor variables were included in the training, testing, or validation dataset.
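The two filtering rules above (a 90% testing threshold for candidate predictors, then complete cases only) might be sketched as follows. This is a minimal Python illustration with toy data, not the authors' R code; the function names and the dictionary representation of an isolate are assumptions.

```python
def eligible_predictors(isolates, candidates, min_tested=0.90):
    """Keep candidate drugs with a non-missing MIC in at least min_tested of isolates."""
    n = len(isolates)
    return [drug for drug in candidates
            if sum(iso.get(drug) is not None for iso in isolates) / n >= min_tested]

def complete_cases(isolates, predictors):
    """Keep isolates that have a value for every selected predictor."""
    return [iso for iso in isolates
            if all(iso.get(p) is not None for p in predictors)]

# Toy data: 10 isolates with different testing coverage per drug.
isolates = []
for i in range(10):
    isolates.append({
        "AMP": "<=1",                       # tested on all 10 isolates
        "TET": "<=4" if i < 9 else None,    # tested on 9 of 10
        "KAN": "<=8" if i < 4 else None,    # tested on 4 of 10
    })

predictors = eligible_predictors(isolates, ["AMP", "TET", "KAN"])
usable = complete_cases(isolates, predictors)
print(predictors, len(usable))  # ['AMP', 'TET'] 9
```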

Random forest model generation

Models were built to predict the minimum inhibitory concentrations of 10 different antimicrobial agents across 8 drug classes (Table 3). The drug classes and representative antimicrobial agents modeled are tetracyclines (tetracycline), quinolones (ciprofloxacin), phenicols (chloramphenicol), penicillins (ampicillin), folate pathway antagonists (trimethoprim-sulfamethoxazole and sulfisoxazole), cephems (cefoxitin and ceftiofur), β-lactam combination agents (amoxicillin-clavulanic acid), and aminoglycosides (amikacin). Random forests were selected for modeling because they can incorporate both ordinal and nominal predictor variables, can clearly describe which predictor variables are most important to the model, and are more robust to overfitting since the final prediction is the average prediction from several independently grown trees.19

Table 3

Accuracy of each antimicrobial model on validation datasets.

Cattle E coli test data External data: NARMS chicken E coli External data: NARMS human E coli O157:H7 clinical cases
Antimicrobial Exact accuracy (%) ± 1 category accuracy (%) Exact accuracy (%) ± 1 category accuracy (%) Exact accuracy (%) ± 1 category accuracy (%)
AMC 52 97 58 98 42 98
AMI 34 85 34 80 57 84
AMP 46 86 39 69 30 84
CHL 52 91 34 63 40 94
CIP 84 99 77 99 77 97
COT 90 97 68 82 83 93
FIS 59 85 54 75 50 88
FOX 45 88 48 92 36 90
TET 65 81 59 65 65 91
TIO 61 96 54 93 37 95

Antimicrobial susceptibility testing data from two external datasets, NARMS chicken-associated E coli and NARMS human clinical cases caused by E coli O157:H7, were used for validation. The exact accuracy and accuracy within a ± 1 minimum inhibitory concentration category of each model on the two validation datasets were compared to the accuracy on cattle-associated E coli test data.

For each model, the dataset was filtered to exclude isolates that were missing minimum inhibitory concentrations for the antimicrobial of interest. A random sample of 70% of the remaining isolates was then allocated for training the model, and 30% was reserved for testing the model's accuracy. Models were built using the classification option in the randomForest package in R, with the number of trees (ntree) set to 500.20
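The filtering and 70/30 split can be sketched as below. This is a minimal Python illustration, not the authors' R code; the function name `split_for_target`, the fixed seed, and the toy isolates are assumptions, and the random forest fit itself is not shown.

```python
import random

def split_for_target(isolates, target, train_frac=0.70, seed=42):
    """Drop isolates lacking the target MIC, then split the rest at random."""
    usable = [iso for iso in isolates if iso.get(target) is not None]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(usable)
    cut = round(train_frac * len(usable))
    return usable[:cut], usable[cut:]

# Toy data: 100 isolates, every tenth one missing a tetracycline (TET) MIC.
isolates = [{"id": i, "TET": None if i % 10 == 0 else "<=4"} for i in range(100)]
train, test = split_for_target(isolates, "TET")
print(len(train), len(test))  # 63 27
```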

Model feature selection

Predictor variable importance was determined using the variable importance plot function (varImpPlot) included in the R randomForest package. The importance of a given predictor was assessed by permuting the specific variable and measuring how model accuracy changes across 10 crossfolds with the variable removed. Specifically, the mean decrease in accuracy and mean decrease in Gini index, a measure that indicates the probability that a randomly picked observation is erroneously classified, were calculated across all crossfolds.20 All predictor variables that, when included, improved the accuracy of the model and whose Gini importance was > 0 were used as predictor variables in the final model.

Model evaluation

Model performance was determined by evaluating its accuracy in classifying test data. Accuracy statistics were computed based on the confusion matrix returned by the model for the test data. From the confusion matrix, the accuracy of the model in predicting the exact laboratory-derived minimum inhibitory concentration value (Equation 1) as well as its accuracy within 1 minimum inhibitory concentration category was determined (Equation 2). Food and Drug Administration–approved automated AST devices have a margin of error of 1 2-fold minimum inhibitory concentration dilution,21 and so laboratory-determined minimum inhibitory concentration values may vary from an isolate's true minimum inhibitory concentration value by a 2-fold dilution. In some situations, quality control ranges include 2 or more minimum inhibitory concentration dilutions,22 so a ± 2 minimum inhibitory concentration category accuracy was also calculated (Supplementary Material S1). Since accuracy was assessed by comparing the model's predictions to NARMS laboratory-reported minimum inhibitory concentration values, ± 1 minimum inhibitory concentration category accuracy may better account for this variation than exact accuracy. However, a risk of using the ± 1 minimum inhibitory concentration category accuracy is that the margin could cross breakpoint boundaries and affect susceptible or resistant interpretations. To account for this risk, the percentage of isolates classified as accurate using the ± 1 margin that crossed a breakpoint boundary was calculated. A description of how this calculation was performed and the results of this calculation are included in Supplementary Material S1.

$$\text{Exact accuracy} = \frac{\text{number of isolates whose predicted minimum inhibitory concentration matched the NARMS-reported minimum inhibitory concentration}}{\text{total number of isolates}} \tag{1}$$

$$\pm 1\ \text{accuracy} = \frac{\text{number of isolates whose predicted minimum inhibitory concentration was within 1 minimum inhibitory concentration category of the NARMS-reported minimum inhibitory concentration}}{\text{total number of isolates}} \tag{2}$$
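Equations 1 and 2 can be computed from paired observed and predicted categories, as in this minimal Python sketch (not the authors' R code, which worked from the confusion matrix). The function name `accuracy_within` and the toy predictions are assumptions; a tolerance of 0 gives exact accuracy, 1 gives ± 1 accuracy, and 2 gives the ± 2 accuracy mentioned in the text.

```python
def accuracy_within(observed, predicted, levels, tolerance=0):
    """Fraction of isolates whose predicted category is within `tolerance` steps."""
    rank = {label: i for i, label in enumerate(levels)}
    hits = sum(abs(rank[o] - rank[p]) <= tolerance
               for o, p in zip(observed, predicted))
    return hits / len(observed)

# Ordered categories (cefoxitin, from the text) and toy predictions.
levels = ["<=2", "4", "8", "16", ">=32"]
observed = ["4", "4", "8", "<=2", "16", ">=32"]
predicted = ["4", "8", "8", "8", "16", "4"]

exact = accuracy_within(observed, predicted, levels, tolerance=0)
within_1 = accuracy_within(observed, predicted, levels, tolerance=1)
print(exact, round(within_1, 3))  # 0.5 0.667
```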

The model's sensitivity in classifying resistant isolates as having a resistant minimum inhibitory concentration (Equation 3) and susceptible isolates as having a susceptible minimum inhibitory concentration (Equation 4) was also assessed. Sensitivity calculations were computed using both NARMS-provided breakpoints and the epidemiological cutoff value (ECOFF) published by EUCAST. The National Antimicrobial Resistance Monitoring System breakpoints are based on whether an isolate with a given minimum inhibitory concentration value is susceptible or resistant to treatment with that drug in a clinical setting,23 whereas the ECOFF value is the minimum inhibitory concentration value that demarcates bacterial isolates within a species with phenotypically detectable acquired resistance mechanisms from those without acquired resistance.24 Both sets of breakpoints were used because we were interested in how well the models could differentiate between clinically susceptible versus resistant isolates and between those with and without resistance mechanisms. Finally, models were assessed by computing error rates. Specifically, very major error (VME) refers to the number of isolates incorrectly predicted to be susceptible or intermediate when NARMS-reported minimum inhibitory concentration was resistant divided by the total number of NARMS-resistant isolates (Equation 5), whereas major error (ME) is the number of isolates predicted to be resistant when NARMS-reported minimum inhibitory concentration was susceptible or intermediate divided by the total number of NARMS-reported susceptible or intermediate isolates (Equation 6).

$$\text{Resistant sensitivity} = \frac{\text{number of isolates correctly predicted with resistant minimum inhibitory concentration}}{\text{total number of resistant isolates}} \tag{3}$$

$$\text{Susceptible sensitivity} = \frac{\text{number of isolates correctly predicted with susceptible or intermediate minimum inhibitory concentration}}{\text{total number of susceptible or intermediate isolates}} \tag{4}$$

$$\text{VME} = \frac{\text{number of resistant isolates predicted with a susceptible or intermediate minimum inhibitory concentration}}{\text{total number of resistant isolates}} \tag{5}$$

$$\text{ME} = \frac{\text{number of susceptible or intermediate isolates predicted with resistant minimum inhibitory concentration}}{\text{total number of susceptible or intermediate isolates}} \tag{6}$$
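Equations 3 through 6 can be sketched in a few lines. This is a minimal Python illustration with toy data, not the authors' R code; the function name `classification_rates` and the choice to return `None` when a rate is undefined are assumptions. By construction, VME is the complement of resistant sensitivity and ME is the complement of susceptible sensitivity.

```python
def classification_rates(observed, predicted, resistant):
    """Compute Equations 3 to 6 from observed and predicted MIC categories.

    `resistant` is the set of category labels classed as resistant.
    Returns None when either group is empty, since the rates are undefined.
    """
    pairs = list(zip(observed, predicted))
    pred_for_r = [p for o, p in pairs if o in resistant]
    pred_for_si = [p for o, p in pairs if o not in resistant]
    if not pred_for_r or not pred_for_si:
        return None
    r_hits = sum(p in resistant for p in pred_for_r)
    si_hits = sum(p not in resistant for p in pred_for_si)
    return {
        "resistant_sensitivity": r_hits / len(pred_for_r),
        "susceptible_sensitivity": si_hits / len(pred_for_si),
        "VME": (len(pred_for_r) - r_hits) / len(pred_for_r),
        "ME": (len(pred_for_si) - si_hits) / len(pred_for_si),
    }

# Toy example: 4 resistant and 6 susceptible or intermediate isolates.
observed = [">=32"] * 4 + ["4"] * 5 + ["16"]
predicted = [">=32", ">=32", ">=32", "16", "4", "4", "8", "4", ">=32", "16"]
rates = classification_rates(observed, predicted, resistant={">=32"})
print(rates["resistant_sensitivity"], rates["VME"])  # 0.75 0.25
```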

Because the number of isolates in each minimum inhibitory concentration category of the antimicrobial being predicted was unbalanced even after collapsing some minimum inhibitory concentration categories, 2 balancing methods were tested to see if they improved the prediction performance of the model across all minimum inhibitory concentration categories. First, training data were oversampled so that exactly the same number of isolates were present in each minimum inhibitory concentration category. Second, a mix of undersampling and oversampling of training data was used to obtain approximately the same number of training observations for each minimum inhibitory concentration category. The original model, oversampling model, and mixed under and oversampling model were then compared, and the best-performing model on test data was selected for each antimicrobial to be predicted.
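The first balancing method, oversampling every category up to the size of the largest one, might look like the sketch below. The text does not state how the additional isolates were drawn, so random duplication with replacement is an assumption here, as are the function name `oversample` and the toy data; the mixed undersampling and oversampling variant is not shown.

```python
import random
from collections import Counter, defaultdict

def oversample(rows, label_of, seed=0):
    """Duplicate rows at random until every category matches the largest one."""
    rng = random.Random(seed)
    by_cat = defaultdict(list)
    for row in rows:
        by_cat[label_of(row)].append(row)
    target = max(len(members) for members in by_cat.values())
    balanced = []
    for members in by_cat.values():
        balanced.extend(members)
        # Draw the shortfall with replacement from the same category.
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced

# Toy training set: (isolate id, MIC category) with unequal category sizes.
rows = ([(f"a{i}", "<=2") for i in range(5)]
        + [(f"b{i}", "4") for i in range(8)]
        + [(f"c{i}", "16") for i in range(2)])
balanced = oversample(rows, label_of=lambda row: row[1])
print(Counter(label for _, label in balanced))
```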

The models were also evaluated on 2 external datasets to gauge their applicability for prediction in other E coli resistance datasets. Both external datasets were curated by NARMS: the first comprised chicken-associated E coli isolates collected from raw chicken purchased from grocery stores from 2002 through 2019 or from cecal samples from chickens at slaughter from 2013 through 2019, and the second comprised E coli O157:H7 isolates cultured from clinically ill people from 1996 through 2023. The external datasets were first preprocessed so that the AST data were grouped into the same minimum inhibitory concentration categories used to train the model (Table 1). Also, if the year of isolate collection fell outside the training data years (2002 through 2019), then the isolate was relabeled with the closest training data year to the year of collection. The exact model accuracy (Equation 1) and accuracy within a ± 1 minimum inhibitory concentration category (Equation 2) were then determined on both validation datasets.
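Because year was a nominal predictor, a model cannot score an isolate whose collection year never appeared in training, which is why out-of-range years were relabeled. A minimal Python sketch of that relabeling follows (not the authors' R code; the function name `nearest_training_year` is hypothetical).

```python
def nearest_training_year(year, first=2002, last=2019):
    """Clamp a collection year to the closest year in the training range."""
    return min(max(year, first), last)

# Human isolates span 1996 through 2023; training years span 2002 through 2019.
relabeled = [nearest_training_year(y) for y in (1996, 2001, 2010, 2019, 2023)]
print(relabeled)  # [2002, 2002, 2010, 2019, 2019]
```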

Code availability

R code and input data required to reproduce this analysis are available at https://zenodo.org/records/14756655.

Results

The minimum inhibitory concentration categories for each antimicrobial after data processing and the number of NARMS cattle-associated E coli isolates available for each of these categories is provided in Table 1.

The exact accuracy (Table 3; test data column) varied among antimicrobial models: the amikacin model had the lowest accuracy (34%) on test data, and the trimethoprim-sulfamethoxazole model had the highest accuracy (90%) on test data. The average exact accuracy among models was 59%. The ± 1 accuracies for all models (Table 3; test data column), however, were at least 81%, and the average ± 1 accuracy among models was 91%. For each antimicrobial model, the exact accuracy and ± 1 accuracy for each minimum inhibitory concentration value were also computed and are included along with the test data prediction confusion matrices in Supplementary Material S1. The error in resistance classification caused by using the ± 1 minimum inhibitory concentration category margin was very low. Across the antimicrobial models, on average, 0.19% of isolates classified as accurate using ± 1 accuracy crossed the intermediate-resistant (or susceptible-resistant if an intermediate category did not exist for that antimicrobial) boundary on test data (Supplementary Material S1). In general, the models were better at predicting minimum inhibitory concentrations that were more common (had a greater proportion of isolates before balancing) than less frequent minimum inhibitory concentrations (Supplementary Material S1).
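As a quick arithmetic check, the summary figures above can be recomputed from the rounded test-data columns of Table 3. This Python snippet is only a worked calculation; the small differences from the reported averages (59% and 91%) presumably arise because the reported values were computed from unrounded accuracies.

```python
# Test-data accuracies (%) from Table 3.
exact = {"AMC": 52, "AMI": 34, "AMP": 46, "CHL": 52, "CIP": 84,
         "COT": 90, "FIS": 59, "FOX": 45, "TET": 65, "TIO": 61}
within_1 = {"AMC": 97, "AMI": 85, "AMP": 86, "CHL": 91, "CIP": 99,
            "COT": 97, "FIS": 85, "FOX": 88, "TET": 81, "TIO": 96}

mean_exact = sum(exact.values()) / len(exact)
mean_within_1 = sum(within_1.values()) / len(within_1)
print(mean_exact, mean_within_1)  # 58.8 90.5
print(min(exact, key=exact.get), max(exact, key=exact.get))  # AMI COT
print(min(within_1.values()))  # 81
```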

Models were also assessed for their ability to correctly classify resistant-versus-susceptible isolates (Table 4). Overall, resistant and susceptible sensitivities were over 80% for most antimicrobial models. Resistant and susceptible sensitivities were generally similar for a given antimicrobial model regardless of whether NARMS or ECOFF breakpoints were used. Very major error rate (ie, the percentage of resistant isolates incorrectly assigned a susceptible or intermediate minimum inhibitory concentration value by the model) and ME rate (ie, the percentage of susceptible or intermediate isolates incorrectly assigned a resistant minimum inhibitory concentration value) were also computed using NARMS breakpoints to determine the susceptibility boundary. In general, VME (range, 8% to 78%) was higher than ME (range, 0% to 27%), suggesting that these models were better at classifying susceptible isolates than resistant ones. Very major error was particularly high for antimicrobials with only 1 resistant minimum inhibitory concentration category and several susceptible or intermediate minimum inhibitory concentration categories, such as ampicillin or trimethoprim-sulfamethoxazole (Table 1), because the margin of error was very narrow (the minimum inhibitory concentration category must be predicted exactly to avoid being classified as a VME).

Table 4

Sensitivity and error rates for each antimicrobial random forest model on test data.

Antimicrobial | NARMS-resistant sensitivity (%) | NARMS-susceptible sensitivity (%) | ECOFF-resistant sensitivity (%) | ECOFF-susceptible sensitivity (%) | Very major error (%) | Major error (%)
AMC | 91 | 99 | 82 | 99 | 9 | 0.03
AMP | 74 | 92 | 75 | 91 | 26 | 8
CHL | 80 | 84 | 92 | 93 | 20 | 16
CIP | 91 | 99 | 91 | 99 | 9 | 0.94
COT | 22 | 98 | 26 | 98 | 78 | 2
FOX | 85 | 99 | 85 | 99 | 15 | 0.45
TET | 62 | 73 | 57 | 89 | 37 | 27
TIO | 86 | 100 | 94 | 98 | 14 | 0

ECOFF = Epidemiologic cut-off value. NARMS = The National Antimicrobial Resistance Monitoring System.

Sensitivity and error rates for amikacin and sulfisoxazole could not be calculated because none of the NARMS isolates were resistant to those antimicrobial agents.
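The rates in Table 4 can be illustrated with a short Python sketch. It assumes that minimum inhibitory concentration categories are encoded as integer indices in increasing order and that every category at or above a hypothetical `first_resistant` index is resistant, so that susceptible and intermediate categories are pooled as non-resistant. That pooling is an assumption made here for simplicity, and the function and variable names are illustrative rather than the study's own.

```python
def error_rates(observed, predicted, first_resistant):
    """Return sensitivities and VME/ME rates as fractions, or None when
    either class is empty (for example, no resistant isolates)."""
    obs_res = [o >= first_resistant for o in observed]
    pred_res = [p >= first_resistant for p in predicted]
    n_res = sum(obs_res)
    n_non = len(obs_res) - n_res
    if n_res == 0 or n_non == 0:
        return None
    # Very major error: observed resistant, predicted non-resistant.
    vme = sum(o and not p for o, p in zip(obs_res, pred_res))
    # Major error: observed non-resistant, predicted resistant.
    me = sum(p and not o for o, p in zip(obs_res, pred_res))
    return {
        "resistant_sensitivity": (n_res - vme) / n_res,
        "susceptible_sensitivity": (n_non - me) / n_non,
        "vme": vme / n_res,
        "me": me / n_non,
    }


# Hypothetical data: category 4 is the only resistant category.
observed = [4, 4, 4, 4, 3, 2, 1, 0, 0, 1]
predicted = [4, 3, 4, 4, 4, 2, 1, 0, 0, 1]
rates = error_rates(observed, predicted, first_resistant=4)
print(rates["vme"], rates["resistant_sensitivity"])  # 0.25 0.75
```

With a single resistant category, a prediction that is only 1 category too low already counts as a very major error, which is the narrow margin described for ampicillin and trimethoprim-sulfamethoxazole.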

A benefit of random forest models is that it is possible to see which variables are the most important predictors in the model. In general, the most important predictor variable was another drug in the same drug class or a related drug class (ie, β-lactam combination agents vs penicillins) if such a drug was available as a predictor (Figure 1). For example, the most important predictor variable for the amikacin model was gentamicin, another aminoglycoside, and the most important predictor for the amoxicillin-clavulanic acid model was ampicillin (also a β-lactam drug). Among the metadata predictors, year of isolate collection was consistently an important predictor across antimicrobial models.
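The idea behind a mean-decrease-in-accuracy importance score can be sketched in Python as follows. This is a simplified illustration and not the randomForest implementation: it shuffles one predictor's values and records the average drop in accuracy, but it omits the out-of-bag sampling and the scaling by SD. The toy model, the data, and all names are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which the model's prediction equals the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, column, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling the values of one predictor."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        # Copy each row with the shuffled value substituted for this column.
        permuted = [dict(r, **{column: v}) for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / n_repeats


# Hypothetical stand-in for a fitted model: it predicts the target category
# from the gentamicin (GEN) category alone and ignores the year.
def toy_model(row):
    return row["GEN"]


rows = [{"GEN": g, "Year": 2010 + i} for i, g in enumerate([0, 1, 2, 3, 0, 1, 2, 3])]
labels = [r["GEN"] for r in rows]

gen_importance = permutation_importance(toy_model, rows, labels, "GEN")
year_importance = permutation_importance(toy_model, rows, labels, "Year")
```

Because the toy model never reads `Year`, shuffling it leaves accuracy unchanged and its importance is 0, whereas shuffling `GEN` breaks the link between predictor and label and lowers accuracy.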

Figure 1

Importance of predictor variables in each antimicrobial model. Predictor variable importance is determined by removing a given predictor variable and measuring the mean change in accuracy of the model scaled by its SD across all out-of-bag samples when training the model. Predictor variables that have a higher mean decrease in accuracy value when removed are the most important predictors. A—Amoxicillin-clavulanic acid model. B—Amikacin model. C—Ampicillin model. D—Chloramphenicol model. E—Ciprofloxacin model. F—Trimethoprim-sulfamethoxazole model. G—Sulfisoxazole model. H—Cefoxitin model. I—Tetracycline model. J—Ceftiofur model. AMC = Amoxicillin-clavulanic acid. AMI = Amikacin. AMP = Ampicillin. AXO = Ceftriaxone. CHL = Chloramphenicol. CIP = Ciprofloxacin. COT = Trimethoprim-sulfamethoxazole. FIS = Sulfisoxazole. FOX = Cefoxitin. GEN = Gentamicin. KAN = Kanamycin. NAL = Nalidixic acid. TET = Tetracycline. TIO = Ceftiofur. Year = Year of isolate collection.

Citation: American Journal of Veterinary Research 2025; 10.2460/ajvr.24.10.0314

The models were evaluated on 2 external datasets: chicken E coli isolates and E coli O157:H7 isolates from clinically ill human patients (Table 3). The accuracies were similar on both external datasets and the test data for 6 out of the 10 models: amoxicillin-clavulanic acid, amikacin, ciprofloxacin, sulfisoxazole, cefoxitin, and ceftiofur. The models for ampicillin, chloramphenicol, trimethoprim-sulfamethoxazole, and tetracycline performed about as well on the human isolates as the test data but had a lower accuracy on the chicken data (accuracies were 10 to 30 percentage points lower on chicken data).

Discussion

This study sought to develop random forest machine learning models for the imputation of minimum inhibitory concentration values for several antimicrobials using phenotypic AST data as predictors. Because they use phenotypic rather than genomic data as predictors, these models may be applicable to a wider range of AMR datasets, including historical isolates for which sequencing data is unavailable or too expensive to obtain.

The ± 1 minimum inhibitory concentration category accuracies (5 models > 90%) were comparable to ± 1 accuracies reported by studies8,15,25 predicting minimum inhibitory concentration values from genomic data, suggesting that phenotypic data can be used as an effective predictor when sequence data is not available. The ± 1 minimum inhibitory concentration category accuracy was notably higher than exact accuracy for all models except ciprofloxacin and trimethoprim-sulfamethoxazole. Accuracy was determined by comparing predictions to NARMS laboratory-reported minimum inhibitory concentration values, and since FDA-approved automated AST devices have a margin of error of ± 1 2-fold minimum inhibitory concentration dilution,8 ± 1 minimum inhibitory concentration category accuracy may better be able to account for the variation in laboratory minimum inhibitory concentrations than exact accuracy. Moreover, using the ± 1 minimum inhibitory concentration category margin very rarely affected susceptible or resistant interpretations (Supplementary Material S1).

For ciprofloxacin and trimethoprim-sulfamethoxazole, ± 1 accuracy and exact accuracy may have been similar because some minimum inhibitory concentration categories were found to have very few isolates during data preprocessing and were therefore combined into fewer categories with broader minimum inhibitory concentration intervals (Table 1). The broader interval categories already encompassed ± 1 dilutions for these 2 antimicrobials, making ± 1 category accuracy and exact accuracy similar. In addition, resistant and susceptible sensitivities were over 80% for most antimicrobial models (Table 4), indicating that the models are able to differentiate between susceptible and resistant isolates. In general, though, the models had more difficulty classifying resistant isolates than susceptible ones, and the VME rate was high, especially for the ampicillin, trimethoprim-sulfamethoxazole, and tetracycline models. Several antimicrobial models had only 1 resistant minimum inhibitory concentration category and many susceptible or intermediate minimum inhibitory concentration categories (Table 1), and thus the minimum inhibitory concentration category must be predicted exactly to avoid being classified as a VME. However, with the exception of the trimethoprim-sulfamethoxazole and tetracycline models, the models are able to predict resistant isolates within 1 minimum inhibitory concentration category of the lab-derived value with high accuracy (Supplementary Material S1), suggesting that their imputations would still be useful.

An advantage of random forest models is that it is possible to see which variables are the most important predictors. In 8 out of the 10 models, year of isolate collection was a top 3 predictor variable (Figure 1). The importance of year as a predictor implies that minimum inhibitory concentration values, and thus AMR, vary over time and suggests that it may be possible to see changes in resistance over time at the minimum inhibitory concentration level, which would be very useful for the evaluation of antimicrobial stewardship policies. The feature importance plots also showed that, in general, if another antimicrobial in the same drug class or a related drug class was available, it was the top antimicrobial predictor. These feature importance plots may be very useful for streamlining AST surveillance protocols and clinical susceptibility testing because they suggest which antimicrobials need to be tested in a lab (antimicrobials that have low correlation with others) and which can be extrapolated by modeling (antimicrobials that are highly correlated with others). Such determinations could be helpful, especially in veterinary medicine where clients have to pay for the cost of clinical susceptibility testing.26

Model imputation accuracy was also assessed using 2 external validation datasets, chicken-associated E coli and E coli O157:H7 collected from clinically ill humans. The models for amoxicillin-clavulanic acid, amikacin, ciprofloxacin, sulfisoxazole, cefoxitin, and ceftiofur had similar exact and ± 1 minimum inhibitory concentration category accuracies on the test data and 2 external datasets (Table 3), indicating that these models are robust and can be used for prediction even across different host species.

The ampicillin, chloramphenicol, trimethoprim-sulfamethoxazole, and tetracycline models had a similar accuracy to the test results on the human dataset but a lower accuracy on the chicken data. Escherichia coli O157:H7 is the most common cause of enterohemorrhagic E coli infections in people, and this strain is carried primarily by healthy cattle and can be spread to humans by eating improperly cooked beef or unpasteurized dairy products.27 Escherichia coli O157:H7, however, is rare in chickens.27 The O157:H7 isolates in the human data are therefore likely more closely related to the cattle-associated E coli used to train the model than the chicken-associated E coli isolates and so had higher model accuracies than the chicken dataset.

A limitation of this study is that the random forests require all predictor variables to be present in a dataset for the model to run. This means that the models cannot be automatically applied to resistance datasets with minimum inhibitory concentration values missing for multiple drugs. However, this limitation can be overcome in sparsely populated datasets by training a model in a similar fashion using only the predictor variables present in the given dataset. Although models trained using limited predictor variables would likely have lower accuracies, they may still provide useful imputations if the most important predictor variables are present.

The high overall ± 1 accuracies of the models on test and external validation data indicate that the predicted minimum inhibitory concentration values are well correlated with laboratory-derived minimum inhibitory concentration values and at a level of accuracy that would be helpful for surveillance purposes. Moreover, the resistant and susceptible sensitivities indicate that the models are able to differentiate between clinically resistant and susceptible isolates (NARMS) and those with and without acquired resistance mechanisms (ECOFF). Although these models have the limitation that they may need to be specifically trained for particular host species or pathogenic strains if the resistance distribution varies significantly, this study provides a workflow that can be used to develop phenotypic prediction–based minimum inhibitory concentration models for other host species and clinically important bacteria. Such imputation models could help simplify surveillance testing and allow for the imputation of missing AMR data, enabling better evaluation of AMR trends over time.

Supplementary Materials

Supplementary materials are posted online at the journal website: avmajournals.avma.org.

Acknowledgments

None reported.

Disclosures

Dr. Cazer served as Guest Editor for this AJVR Supplemental Issue. She declares that she had no role in the editorial direction of this manuscript.

No AI-assisted technologies were used in the composition of this manuscript.

Funding

This research was supported by the USDA National Institute of Food and Agriculture Agriculture and Food Research Initiative project No. 2023–68015-4092. Gayatri Anil was supported by the National Institute of General Medical Sciences of the NIH under award No. T32GM150453. The content of this study is solely the responsibility of the authors and does not necessarily represent the official views of the USDA or the NIH.

References

1. Ten threats to global health in 2019. WHO. March 21, 2019. Accessed September 9, 2024. https://www.who.int/vietnam/news/feature-stories/detail/ten-threats-to-global-health-in-2019
2. Karp BE, Tate H, Plumblee JR, et al. National antimicrobial resistance monitoring system: two decades of advancing public health through integrated surveillance of antimicrobial resistance. Foodborne Pathog Dis. 2017;14(10):545–557. doi:10.1089/fpd.2017.2283
3. Zawack K, Li M, Booth JG, Love W, Lanzas C, Gröhn YT. Monitoring antimicrobial resistance in the food supply chain and its implications for FDA policy initiatives. Antimicrob Agents Chemother. 2016;60(9):5302–5311. doi:10.1128/AAC.00688-16
4. Zhao S, Young SR, Tong E, et al. Antimicrobial resistance of Campylobacter isolates from retail meat in the United States between 2002 and 2007. Appl Environ Microbiol. 2010;76(24):7949–7956. doi:10.1128/AEM.01297-10
5. Price LB, Lackey LG, Vailes R, Silbergeld E. The persistence of fluoroquinolone-resistant Campylobacter in poultry production. Environ Health Perspect. 2007;115(7):1035–1039. doi:10.1289/ehp.10050
6. Nannapaneni R, Hanning I, Wiggins KC, Story RP, Ricke SC, Johnson MG. Ciprofloxacin-resistant Campylobacter persists in raw retail chicken after the fluoroquinolone ban. Food Addit Contam Part A. 2009;26(10):1348–1353. doi:10.1080/02652030903013294
7. Development of in vitro susceptibility testing criteria and quality control parameters. Clinical and Laboratory Standards Institute. January 2018. Accessed October 15, 2024. https://clsi.org/media/1952/m23ed5_sample_final.pdf
8. Nguyen M, Long SW, McDermott PF, et al. Using machine learning to predict antimicrobial MICs and associated genomic features for nontyphoidal Salmonella. J Clin Microbiol. 2019;57(2):e01260-18. doi:10.1128/JCM.01260-18
9. Yasir M, Karim AM, Malik SK, Bajaffer AA, Azhar EI. Prediction of antimicrobial minimum inhibitory concentrations for Neisseria gonorrhoeae using machine learning models. Saudi J Biol Sci. 2022;29(5):3687–3693. doi:10.1016/j.sjbs.2022.02.047
10. Clinical breakpoints. European Committee on Antimicrobial Susceptibility Testing. January 1, 2024. Accessed September 11, 2024. https://www.eucast.org/fileadmin/src/media/PDFs/EUCAST_files/Breakpoint_tables/v_14.0_Breakpoint_Tables.pdf
11. Patel JB, Alby K, Humphries R, et al. Updating breakpoints in the United States: a summary from the ASM clinical microbiology open 2022. J Clin Microbiol. 2023;61(10):e0115422. doi:10.1128/jcm.01154-22
12. Macesic N, Bear Don’t Walk OJ IV, Pe’er I, Tatonetti NP, Peleg AY, Uhlemann AC. Predicting phenotypic polymyxin resistance in Klebsiella pneumoniae through machine learning analysis of genomic data. mSystems. 2020;5(3):e00656-19. doi:10.1128/msystems.00656-19
13. Avershina E, Sharma P, Taxt AM, et al. AMR-Diag: neural network based genotype-to-phenotype prediction of resistance towards β-lactams in Escherichia coli and Klebsiella pneumoniae. Comput Struct Biotechnol J. 2021;19:1896–1906. doi:10.1016/j.csbj.2021.03.027
14. Břinda K, Callendrello A, Ma KC, et al. Rapid inference of antibiotic resistance and susceptibility by genomic neighbour typing. Nat Microbiol. 2020;5(3):455–464. doi:10.1038/s41564-019-0656-6
15. Wang S, Zhao C, Yin Y, Chen F, Chen H, Wang H. A practical approach for predicting antimicrobial phenotype resistance in Staphylococcus aureus through machine learning analysis of genome data. Front Microbiol. 2022;13:841289. doi:10.3389/fmicb.2022.841289
16. Yang MR, Su SF, Wu YW. Using bacterial pan-genome-based feature selection approach to improve the prediction of minimum inhibitory concentration (MIC). Front Genet. 2023;14:1054032. doi:10.3389/fgene.2023.1054032
17. Manual of Laboratory Methods. The National Antimicrobial Resistance Monitoring System. June 4, 2021. Accessed September 16, 2024. https://www.fda.gov/media/101423/download
18. Integrated reports/summaries. FDA. November 8, 2024. Accessed January 9, 2023. https://www.fda.gov/animal-veterinary/national-antimicrobial-resistance-monitoring-system/integrated-reportssummaries
19. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.
20. Breiman L, Cutler A. Breiman and Cutler's random forests for classification and regression. October 14, 2022. Accessed September 16, 2024. https://cran.r-project.org/web/packages/randomForest/randomForest.pdf
21. Guidance for industry and FDA: class II special controls guidance document: antimicrobial susceptibility test (AST) systems. FDA. August 28, 2009. Accessed December 9, 2024. https://www.fda.gov/media/88069/download
22. Implications of breakpoints splitting the wild type and/or resistant populations. European Committee on Antimicrobial Susceptibility Testing. May 12, 2016. Accessed December 9, 2024. https://www.eucast.org/fileadmin/src/media/PDFs/EUCAST_files/General_documents/Splitting_WT_and_resistant_populations_20160626.pdf
23. National Antimicrobial Resistance Monitoring System. USDA Food Safety and Inspection Service. Updated January 17, 2025. Accessed September 16, 2024. https://www.fsis.usda.gov/science-data/national-antimicrobial-resistance-monitoring-system-narms
24. Kahlmeter G, Turnidge J. How to: ECOFFs—the why, the how, and the don’ts of EUCAST epidemiological cutoff values. Clin Microbiol Infect. 2022;28(7):952–954. doi:10.1016/j.cmi.2022.02.024
25. Batisti Biffignandi G, Chindelevitch L, Corbella M, Feil EJ, Sassera D, Lees JA. Optimising machine learning prediction of minimum inhibitory concentrations in Klebsiella pneumoniae. Microb Genomics. 2024;10(3):001222. doi:10.1099/mgen.0.001222
26. De Briyne N, Atkinson J, Pokludová L, Borriello SP, Price S. Factors influencing antibiotic prescribing habits and use of sensitivity testing amongst veterinarians in Europe. Vet Rec. 2013;173(19):475. doi:10.1136/vr.101454
27. Ferens WA, Hovde CJ. Escherichia coli O157:H7: animal reservoir and sources of human infection. Foodborne Pathog Dis. 2011;8(4):465–487. doi:10.1089/fpd.2010.0673
