Abstract
OBJECTIVE
To evaluate a predictive model’s ability to determine cattle mortality following first and second treatments for bovine respiratory disease and to estimate the difference in net returns between using the predictive models and the status quo.
METHODS
2 boosted decision tree models were constructed, 1 using data known at first treatment and 1 with data known at second treatment. Then, the economic impact of each outcome (true positive, true negative, false positive, and false negative) was estimated using various market values to determine the net return per head of using the predictive model to determine which animals should be culled at treatment. This was compared to the status quo to determine the difference in net return.
RESULTS
The models constructed for the prediction of mortality performed with moderate accuracy (areas under the curve > 0.7). The economic analysis found that the models at a high specificity (> 90%) could generate a positive net return in comparison to status quo.
CONCLUSIONS
This study showed that predictive models may be a useful tool to make culling decisions and could result in positive net returns.
CLINICAL RELEVANCE
Bovine respiratory disease is the costliest health condition experienced by cattle on feed. Feedyard record-keeping systems generate vast amounts of data that could be used in predictive models to make management decisions. It is essential to understand the accuracy of predictions made via machine learning. However, the economic impact of implementing predictive models in a feedyard will influence adoption.
Bovine respiratory disease (BRD) is among the costliest diseases encountered in the feedlot industry.1 Few diagnostics can be efficiently applied at the time of treatment to predict the final outcome of an animal diagnosed with BRD in the feedyard. Economic loss due to mortality is a concern for feedyard managers, and early shipments of chronically sick or poor-doing cattle at a discounted price (commonly called railing or realizing) is used to ensure some return on investment in these cattle.
Modern data analytic strategies, including machine learning algorithms and predictive models, are now being researched for use in dairy and beef production.2–4 The vast amount of data produced by feedyards today makes them an ideal environment for testing the application of predictive models to assist management decisions. Predictive analytics and machine learning approach the prediction of an outcome differently than traditional statistics. Rather than measuring the association between a predictor variable and an outcome of interest, predictive models utilize inputs to make predictions regardless of association.5 Risk factors associated with BRD morbidity and mortality have been well-researched using traditional statistical methods. The weight of cattle arriving at a feedyard, distance transported, time of year, and other factors are well-established inputs for BRD management decisions.6 These data are readily available in a feedyard’s data management system and could easily be incorporated into a predictive model. The performance of a predictive model is measured using several metrics that describe its accuracy, but when applying this tool to make management decisions in a feedyard, it is also important to understand the economic implications.
The objective of this study was to determine the ability of a boosted decision tree predictive model to predict mortality within 15 days of an animal’s first or second treatment for BRD and to quantify the net return of using the predictive model compared to the status quo at different probability thresholds as a management tool to decide which animals should be culled (railed) ahead of their pen mates. We hypothesized that the boosted decision tree models would perform with moderate accuracy when asked to determine the outcome of an animal in the next 15 days. We anticipated that by using the models to determine which animals should be railed, feedyard managers may be able to return an economic benefit depending on which probability threshold is utilized.
Methods
IACUC approval was not obtained for this study because the data came from existing databases provided by several feedyards.
Data collection
Treatment records were obtained from 13 US feedyards from 2018 through 2021 and included information from 396,486 head of treated cattle. Feedyards included in this dataset were located in the central High Plains region of the US. Records obtained from each yard were on an individual animal basis and were tied to cohort-level data. A cohort was defined as a group of cattle purchased and managed similarly but not necessarily housed in the same pen for the entirety of the feeding phase.
In addition to data collected from feedyards, relevant economic data were acquired from a variety of sources. Cull cow slaughter prices were obtained from the USDA Agricultural Marketing Service database.7 These records were obtained using an application programming interface call to the system to obtain cull cow slaughter prices from each state represented in the dataset. The cull cow slaughter prices were later used in the economic analysis. Monthly fed cattle prices from January 2018 through December 2021 were obtained from the CattleFax online database.8 Historic commodity prices from October 2018 through May 2019 were used to determine the price per pound of a feed ration. Alfalfa hay and corn prices were acquired from Focus on Feedlots Monthly Reports.9 Wet distillers grain price was from Agriculture Marketing Resource Center’s Weekly Ethanol, Distillers Grain, and Corn Prices and was an average of the monthly prices from Iowa, Illinois, Nebraska, South Dakota, and Wisconsin.10
Feedyard data transformation
Before feedyard data could be used to create predictive models, they were transformed into the appropriate format, and feature engineering was used to generate additional variables to be included in the models. These steps were all performed in R.11 First, inclusion criteria were applied to the dataset: validated records; cohorts of heifers, steers, or mixed sexes; cohort and treatment weights less than 3,000 pounds; cohort arrival head counts less than 600; and cattle meeting the case definition for each model (Figure 1). These inclusion criteria were applied to minimize the inclusion of data entry errors and to ensure that only animals with complete data were included in the dataset.
Exclusion criteria and dataflow. BRD1 = First treatment for bovine respiratory disease. BRD2 = Second treatment for bovine respiratory disease.
Citation: American Journal of Veterinary Research 85, 12; 10.2460/ajvr.24.06.0169
Two different datasets were created: 1 included all data known at cattle’s first treatment for BRD (BRD1), and the other included all data known at cattle’s second treatment for BRD (BRD2). All cattle in the second treatment dataset were included in the first treatment dataset. A case definition was created for each dataset. The diagnosis of BRD was based on standard operating procedures for each feedyard, which involved identifying individual animals within each cohort that exhibited signs typical of BRD, including depression, anorexia, and increased respiration. Following diagnosis in the pen, animals were taken to a hospital facility where the diagnosis was confirmed. To be considered a first treatment for BRD, an animal was administered an antimicrobial for a diagnosis of BRD for the first time; to be considered a second treatment, an animal was administered an antimicrobial for a diagnosis of BRD for the second time. These case definitions further refined our datasets and are also represented in Figure 1.
Feature engineering and variable generation consisted of creating new variables using the existing data to describe individual-, cohort-, and feedyard-level characteristics. These variables encompassed only information that would be known at the time of first or second treatment for BRD for each respective dataset (Supplementary Tables S1 and S2). The outcome of interest was captured in a variable called DNF_15 (did not finish at 15 days post treatment event). This variable described whether an animal experienced mortality or was culled (railed) within the next 15 days following the BRD treatment of interest. The cutoff of 15 days was used based on an assessment of the range of days post treatment event, which demonstrated that approximately 40% of mortalities and railing events occurred by 15 days after treatment. Data were then transformed so that each row of data corresponded to 1 animal and each animal had data for all variables. The final BRD1 dataset consisted of 60 variables, and the final BRD2 dataset consisted of 102 variables, with 60 of those repeated from the BRD1 dataset. These datasets were used to construct predictive models capable of determining the DNF_15 of individual animals at first or second treatment for BRD.
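A minimal sketch of deriving the DNF_15 flag is shown below. The study’s feature engineering was performed in R on feedyard records; the record layout and field names here are hypothetical, chosen only to illustrate the outcome definition.

```python
from datetime import date

# Hypothetical records: (animal_id, treat_date, removal_date, removal_type);
# removal fields are None for animals that finished the feeding period.
records = [
    (1, date(2020, 3, 1), date(2020, 3, 10), "died"),    # died 9 days post treatment
    (2, date(2020, 3, 1), None, None),                   # finished normally
    (3, date(2020, 3, 5), date(2020, 4, 20), "railed"),  # railed 46 days post treatment
]

def dnf_15(treat_date, removal_date, removal_type):
    """1 if the animal died or was railed within 15 days of the treatment of interest."""
    if removal_date is None or removal_type not in ("died", "railed"):
        return 0
    return int((removal_date - treat_date).days <= 15)

flags = [dnf_15(t, r, k) for _, t, r, k in records]  # [1, 0, 0]
```

Only the first animal qualifies as DNF_15 positive; the railed animal does not, because its removal fell outside the 15-day window.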
Model building process
Before the model building step occurred, each dataset was first split into separate training and testing sets, stratified by prevalence of outcome of interest, DNF_15. The training set represented 75% of the original dataset, whereas the testing set represented the other 25%. Using the Pipeline Designer function in Azure Machine Learning, datasets were used to train, test, and, eventually, validate a boosted decision tree model.12 This algorithm was chosen out of the many possible options due to prior experience and research done within our group.2,13 Decision trees are classification models that split data based on discrete attributes within variables that best characterize a class to be predicted. One split is called a node. A following split creates a branch. These splits continue through all variables in the dataset. Boosting is an ensemble method that creates multiple additional trees based on the misclassification of the first tree. Boosted decision trees have been reported to be more robust and efficient than more complicated predictive models, such as neural networks.14
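As an illustration of the stratified 75/25 split described above, a plain-Python sketch follows. The study performed this step in Azure Machine Learning; the function and the toy label vector below are hypothetical, not the authors’ code.

```python
import random

def stratified_split(labels, train_frac=0.75, seed=42):
    """Split observation indices 75/25 while preserving outcome prevalence,
    mirroring stratification on DNF_15."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))  # per-class 75% cut point
        train += idx[:cut]
        test += idx[cut:]
    return sorted(train), sorted(test)

# Toy outcome vector with ~6% prevalence, similar to the BRD1 dataset
labels = [1] * 6 + [0] * 94
train_idx, test_idx = stratified_split(labels)
```

Splitting within each outcome class keeps the low DNF_15 prevalence roughly equal in the training and testing sets, which matters for rare outcomes like this one.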
Before model training occurred, a 2-way chi-squared feature scoring method was applied to the training dataset in order to remove redundant variables in each dataset. Feature selection allows for better performance of a predictive model and the removal of unnecessary variables.15 Following 2-way chi-squared analysis, features were selected for inclusion in machine learning models.
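For illustration, a 2-way chi-squared score for a binary feature against a binary outcome can be computed from the 2 X 2 contingency table. This is a simplified stand-in for the Azure ML filter-based feature selection module used in the study, not a reproduction of it.

```python
def chi2_score(feature, outcome):
    """Two-way chi-squared statistic for a binary feature vs a binary outcome.
    Higher scores indicate stronger association; top-ranked features are retained."""
    n = len(feature)
    obs = [[0, 0], [0, 0]]            # observed counts, obs[feature][outcome]
    for f, y in zip(feature, outcome):
        obs[f][y] += 1
    chi2 = 0.0
    for f in (0, 1):
        for y in (0, 1):
            # expected count under independence: row total X column total / n
            exp = sum(obs[f]) * (obs[0][y] + obs[1][y]) / n
            if exp > 0:
                chi2 += (obs[f][y] - exp) ** 2 / exp
    return chi2

perfect = chi2_score([0, 0, 1, 1], [0, 0, 1, 1])      # fully associated -> 4.0
independent = chi2_score([0, 1, 0, 1], [0, 0, 1, 1])  # no association -> 0.0
```

Features scoring near zero carry little information about DNF_15 and can be dropped without hurting the model.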
Hyperparameters are used in machine learning to adjust the model training process and optimize performance. In this study, a module provided by Azure Machine Learning performed a parameter sweep to identify the hyperparameters that would maximize the area under the receiver operating characteristic curve (AUC).16 This sweep was performed with the training dataset following feature selection. Hyperparameter tuning led to adjustments made in the default architecture. The tuning process indicated that the same parameters should be used for the BRD1 and BRD2 models to maximize the predictive power of the models as indicated by the highest AUC value. The maximum number of leaves per tree was set to 32, resulting in a maximum of 32 terminal nodes per tree; a larger number would put the model at risk of overfitting the data. A minimum of 50 samples per leaf node was set, meaning that at least 50 cases were needed to create a rule. The number of trees constructed was set to 500; creating more trees results in better coverage of the data but increases training time. Lastly, the learning rate was set to 0.025. The learning rate defines how fast the learner converges on the optimal solution; decreasing it results in more precision but increases training time.17
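For readers working outside Azure ML, the tuned settings map approximately onto the parameters of other gradient-boosting libraries. The LightGBM-style names below are an illustrative assumption; the study used Azure’s Two-Class Boosted Decision Tree module, not LightGBM.

```python
# Approximate LightGBM-style equivalent of the tuned hyperparameters
# (illustrative mapping only, not the study's actual configuration)
params = {
    "objective": "binary",
    "num_leaves": 32,         # maximum leaves (terminal nodes) per tree
    "min_child_samples": 50,  # minimum samples required to form a leaf
    "n_estimators": 500,      # number of boosted trees
    "learning_rate": 0.025,   # slower convergence, higher precision
}
```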
The boosted decision tree model was then tested on its ability to predict the outcome of interest, DNF_15, using the testing dataset that had been withheld from the previous training steps. This was done using a technique called cross-fold validation, which is commonly applied in machine learning to assess the variability and reliability of the dataset used to train a model.18 The testing datasets were each divided randomly into 10 folds, with data from 1 fold retained for validation. The model was trained on the other 9 folds, and performance metrics were calculated using the validation fold. The process was repeated so that each fold served once as the validation fold. The end result was 10 sets of performance metrics, allowing for evaluation of the dataset’s variability and the model’s reliability. To simplify the analysis, the fold resulting in the highest AUC value was selected for further evaluation.
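The fold-partitioning step can be sketched as follows. This is illustrative only; the study used Azure ML’s cross-validation module, and the function below is hypothetical.

```python
import random

def kfold_indices(n, k=10, seed=1):
    """Randomly partition n observation indices into k folds; each fold
    serves once as the validation set while the model trains on the rest."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = kfold_indices(100, k=10)
# For each fold: train on the other 9, score the held-out fold, record its AUC.
# The study then carried forward the fold with the highest AUC.
```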
Model performance was evaluated using accuracy, sensitivity, specificity, positive and negative predictive values, and AUC. These values were calculated using the confusion matrices produced by each model run in Azure. Figure 2 outlines all the model building and evaluation steps. To put the usefulness of the results into context, an economic analysis was performed using the scored datasets obtained from the cross-validation fold of each predictive model with the highest AUC. The economic analysis is described in further detail below.
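These metrics follow directly from the confusion matrix counts. A minimal sketch, checked against the BRD1 validation-fold counts reported in Table 2:

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard classification metrics from confusion matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),           # true positive rate
        "specificity": tn / (tn + fp),           # true negative rate
        "ppv": tp / (tp + fp),                   # positive predictive value
        "npv": tn / (tn + fn),                   # negative predictive value
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# BRD1 counts from Table 2: TP = 8, TN = 3,100, FP = 10, FN = 213
m = classification_metrics(tp=8, tn=3100, fp=10, fn=213)
```

These reproduce the published BRD1 values: sensitivity 3.6%, specificity 99.7%, PPV 0.44, NPV 0.94, and accuracy 93.3%.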
The steps followed to train, test, and validate the boosted decision tree models. These steps were performed for each dataset (BRD1 and BRD2).
Economic analysis
In order to address the economic implications of applying predictive models to assist in feedyard post-treatment culling decisions, an economic model was created to estimate net returns. The predictive models described above encompassed the 15-day post-treatment interval; however, the economic model was designed to account for the remainder of the feeding period to capture the long-term economic effects of using a predictive model to make a culling decision. Cost and income calculations were estimated for various processes of production and culling.
The net returns for each individual diagnostic outcome were estimated using unique formulae for each diagnostic outcome at first and second treatment for BRD. Diagnostic outcomes fell into 1 of 4 categories: true negatives, false positives, true positives, or false negatives. Cattle that truly lived and remained with the group beyond 15 days were either predicted to live by the model (true negatives) or predicted to experience mortality (false positives). Cattle that truly died or were culled within 15 days post-treatment were either predicted to experience mortality by the model (true positives) or predicted to live by the model (false negatives). A summary of each outcome and their respective net return formulae for the BRD1 data are described in Figure 3. The BRD2 formulae were similar but included an additional treatment cost in each outcome formula.
Confusion matrix describing the costs or profits associated with each possible model outcome for the BRD1 model. FCTN = Adjusted feed cost for true negative. RiskDNF = Risk of did-not-finish adjustment. RP = Railer price. SP = Sale price. TC = Treatment cost.
Detailed descriptions of the calculations used to estimate the individual elements of the net return equations for each model outcome can be found in Table 1. This economic model did not include purchase cost, processing cost, yardage, or feed cost up until the time of treatment as we were interested in determining the differences between model outcomes for net returns following first or second treatment for BRD. Feed costs were adjusted for the true negative and false positive animals based on expected changes in consumption for animals in each category. False positive animals would be shipped early, thus sparing the feedyard this feed cost, so it was included as cost savings in the calculation for net return of this outcome. Treatment costs were applied to negative prognostic model outcomes (true and false negatives) and were based on the cost of generic tulathromycin.19
Summary of calculations for estimation of costs and prices associated with model results.
Cost/income | Calculation | Result ($) |
---|---|---|
RP | Average weight at treatment X estimated railer price per cwt | BRD1 = (817 lbs) X ($45.04/100 lbs) = $367.78 BRD2 = (863 lbs) X ($45.04/100 lbs) = $388.58 |
SP | Average out weight X national average sale price per cwt | BRD1 = (1,499 lbs) X ($116.10/100 lbs) = $1,739.92 BRD2 = (1,490 lbs) X ($116.10/100 lbs) = $1,730.26 |
FCTN | (Average feed consumed by lot/average head in the lot) X average cost of feed per pound | BRD1 = (613,544 lbs/151 head) X ($0.0803/lb) = $326.87 BRD2 = (644,290 lbs/160 head) X ($0.0803/lb) = $323.36 |
BRD1 TC | Average weight at BRD1 X cost of tulathromycin per cwt | (874 lbs) X ($2.01/100 lbs) = $17.57 |
BRD2 TC | Average weight at BRD2 X cost of tulathromycin per cwt | (863 lbs) X ($2.01/100 lbs) = $16.41 |
RiskDNF | [(median cumulative mortality/median final days on feed) X average days between treatment and end of feeding period] X sale price | BRD1 = ([0.013/178 days] X 145 days) X $1,739.92 = $18.36 BRD2 = ([0.013/178 days] X 153 days) X $1,730.26 = $19.38 |
BRD1 = First treatment for bovine respiratory disease. BRD2 = Second treatment for bovine respiratory disease. FCTN = Adjusted feed cost for true negative. RiskDNF = Risk of did-not-finish adjustment. RP = Railer price. SP = Sale price. TC = Treatment cost.
Cull cow prices were derived from CattleFax data from the states represented in our dataset from 2018 through 2021.8 This method of estimating railer prices was recently described by Horton et al.20 This yielded a railer price of $45.04/cwt. Sale price was determined based on the average out-weight of the lots included in each dataset and calculated using 6-state averages from CattleFax.8
The risk of did not finish (RiskDNF) later in the feeding period needed to be accounted for in the economic model for the false positive and true negative model outcomes. The daily risk was calculated using data from Babcock et al21 by dividing the median cumulative mortality rate (0.013) by the median days on feed (178 days), resulting in an estimated daily risk of approximately 0.007%. The RiskDNF for each dataset was then estimated by multiplying this daily risk of mortality by the average number of days between treatment and the end of the feeding period. The respective sale price for each dataset was multiplied by the RiskDNF to estimate the cost associated with not finishing later in the feeding period (Figure 3).
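As a worked check of this adjustment, the rounded inputs published in Table 1 give a BRD1 RiskDNF of about $18.43, within a few cents of the reported $18.36 (the small difference presumably reflects rounding of the published intermediate values).

```python
# BRD1 RiskDNF adjustment, recomputed from the rounded Table 1 inputs
daily_risk = 0.013 / 178             # median cumulative mortality / median days on feed
days_remaining = 145                 # avg days between treatment and end of feeding period
sale_price = 1739.92                 # BRD1 sale price, $/head
risk_dnf_brd1 = daily_risk * days_remaining * sale_price  # ~ $18.43
```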
Next, the overall net return for using the predictive model (either at first or second treatment for BRD) was based on the frequency of each outcome and their individual net returns. A status quo was calculated based on no change in action from normal management procedures performed following treatment (cattle remained in pen with no culling but may have experienced mortality within the 15-day post-treatment observational period). The cost associated with the status quo was estimated in order to make a comparison to the economic results from utilizing the predictive model. To do this, a cost for animals that DNF within 15 days of treatment for BRD was determined to be a loss equivalent to the money lost from the potential sale of that animal (sale price) and the money lost on administering a treatment to that animal. The animals that survived also received an estimated cost. This was equivalent to the calculation for a true negative model outcome (the net return from the sale price after the subtraction of feed cost and treatment cost and the adjustment for RiskDNF later in the feeding period). These 2 costs were calculated for both the BRD1 and BRD2 datasets and were used to calculate a status quo cost per head. The status quo was set to zero and compared to the overall net returns of using the predictive models to make a culling decision at first and second treatment.
Each predictive model run produced a scored dataset that included the true outcome, the predicted outcome, and the probability associated with that prediction for each observation. A scored dataset displays the predicted outcome along with a measure of certainty, usually given as a probability. Whether a prediction is scored as positive or negative depends on the probability threshold, which defaults to 0.5. For example, a predicted outcome of DNF_15 with a probability of 0.7 will be scored as DNF_15 positive (would not finish), whereas a predicted outcome of DNF_15 with a probability of 0.4 will be scored as an animal that will survive. The probability threshold can be increased or decreased to adjust sensitivity and specificity, thus altering the total numbers of true positives, true negatives, false positives, and false negatives. In this study, 9 probability thresholds from > 0.1 to > 0.9 in 0.1 increments were used for each model. The net return calculated for each prognostic outcome was multiplied by the total number of true positives, true negatives, false positives, and false negatives at each probability threshold for each model (BRD1 and BRD2). The total costs were then divided by the number of observations (head of cattle) in each dataset to yield an approximate net return per head of running the model. The difference between these values and the status quo per-head calculations was then determined to quantify the difference in net return between using the predictive model to make an early culling decision and the feedyard’s status quo.
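The per-head comparison can be sketched as follows, using the published BRD1 per-outcome net returns and status quo from the Results. The scored probabilities and true outcomes below are hypothetical toy data, not the study’s dataset.

```python
def net_return_per_head(probs, truths, threshold, returns, status_quo):
    """Per-head net return of acting on model predictions at a given
    probability threshold, expressed as the difference from status quo.
    `returns` maps outcome category to $/head."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for p, y in zip(probs, truths):
        pred = p > threshold                 # predicted DNF_15 positive?
        if pred and y:
            counts["tp"] += 1
        elif pred and not y:
            counts["fp"] += 1
        elif not pred and y:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    total = sum(counts[k] * returns[k] for k in counts)
    return total / len(truths) - status_quo

# Published BRD1 per-outcome net returns and status quo ($/head)
brd1_returns = {"tp": 393.75, "tn": 1377.12, "fp": -1000.94, "fn": -411.32}
probs = [0.05, 0.2, 0.45, 0.8, 0.95]   # hypothetical scored probabilities
truths = [0, 0, 0, 1, 1]               # hypothetical true DNF_15 outcomes
diff = net_return_per_head(probs, truths, threshold=0.5,
                           returns=brd1_returns, status_quo=1186.36)
```

Sweeping `threshold` from 0.1 to 0.9 over a full scored dataset reproduces the kind of per-head comparison graphed in Figure 4.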
Results
Following data transformation and variable creation, the final BRD1 dataset consisted of 133,249 animals, with a 6.1% prevalence of DNF_15. The final BRD2 dataset consisted of 27,726 animals, with a 14.0% prevalence of DNF_15. For the BRD1 dataset, the top 30 variables were kept, and the top 50 variables were kept for the BRD2 dataset following ranking from the chi-squared filtering approach. Supplementary Tables S3 and S4 indicate the features selected for inclusion in both datasets along with the associated chi-square correlation coefficients.
Evaluation of models
Two separate boosted decision tree models were trained, tested, and validated to predict DNF_15 based on information known at first (BRD1) and second (BRD2) treatment for BRD. Table 2 provides a summary of performance metrics for these models. Overall, both models demonstrated high specificities and high negative predictive values but low sensitivities and low positive predictive values. The AUCs were 0.811 for the BRD1 model and 0.783 for the BRD2 model, demonstrating moderate predictive power.
Summary of model performance predicting did not finish at 15 days post treatment event at first and second treatment for bovine respiratory disease.
Metric | BRD1 model | BRD2 model |
---|---|---|
AUC-ROC | 0.811 | 0.783 |
True positives | 8 | 14 |
True negatives | 3,100 | 586 |
False positives | 10 | 11 |
False negatives | 213 | 82 |
Sensitivity | 3.6% | 14.6% |
Specificity | 99.7% | 98.2% |
PPV | 0.44 | 0.56 |
NPV | 0.94 | 0.88 |
Accuracy | 93.3% | 86.6% |
AUC = Area under the curve. NPV = Negative predictive values. PPV = Positive predictive values. ROC = Receiver operating characteristic.
Economic analysis results
The results from the economic model illustrated different net returns for each of the outcomes as classified by model prediction and actual cattle outcome. The net returns for true positives were $393.75 and $388.58/head for the BRD1 and BRD2 models, respectively. True negative net returns were profits of $1,377.12 and $1,371.11/head for BRD1 and BRD2, respectively. False positives resulted in the largest losses, with net returns of −$1,000.94 and −$998.94/head for BRD1 and BRD2, respectively. False negatives were also losses, at −$411.32/head for BRD1 and −$404.99/head for BRD2. The status quo economic estimates for BRD1 and BRD2 were $1,186.36 and $933.32, respectively.
Using the equations for net return described above, the difference from the status quo was calculated for each model (BRD1 and BRD2) across all probability thresholds, from > 0.1 to > 0.9. Figure 4 graphs the economic results from the BRD1 and BRD2 models. In general, as the probability threshold increased, the net return per head increased as well. For the BRD1 model, net return peaked at the threshold of > 0.6 and then decreased slightly. In the BRD2 model, net return continued to increase up to the probability threshold of > 0.9.
Difference in net returns using the model compared to status quo for BRD1 and BRD2.
Discussion
With the models constructed in this study, we attempted to predict mortality within 15 days of both first and second treatment for BRD. Our models included variables previously identified to be associated with mortality due to BRD, such as weight and body temperature at the time of treatment, quarter of arrival, and days on feed. Often the variables collected for these kinds of studies are involved in 2- and potentially 3-way interactions.6 These complex interactions may be difficult to interpret, making machine learning an attractive application for predicting mortality. Predictive models are also well-suited to handling the large amounts of data typical of retrospective analyses of feedyard records,22 and algorithms exist to handle collinearity, mitigating another concern with traditional statistical approaches.23 The algorithm used in this study was a boosted decision tree. Boosting allows for correction of misclassification and has been shown to be superior to other algorithms, such as neural networks, when used on datasets with large numbers of variables.14 Decision trees are applicable to a wide range of problems and are able to handle data with high dimensionality.24
Based on our AUC values of 0.81 for the BRD1 model and 0.78 for the BRD2 model, we determined the models investigated in this study to have moderate accuracy. Recent studies2,3,21,25 have investigated the prediction of BRD morbidity or case fatality using machine learning. Feuz et al25 report higher sensitivities for the algorithms tested in their study but do not report AUC values. Amrine et al2 report a maximum AUC of 0.62, which is considerably lower than our reported AUC values. Babcock et al21 report as high as 91% of lots correctly classified for cumulative morbidity risk; however, they do not report AUC values, making their models difficult to compare directly to ours. Rojas et al3 constructed models to determine BRD morbidity on a pen-level basis; the highest AUC reported in that study was 0.79. Overall, our models perform similarly to other published classification model studies; however, our models demonstrate lower sensitivities, likely due to the imbalanced nature of the outcome class. Additionally, direct comparisons are difficult to make because the aforementioned studies made predictions on a lot-level basis, whereas the current study investigated predictions on an individual basis. Predictions made on an individual basis are likely subject to higher variability and thus are less accurate than those made on a lot level.
We utilized a pooled dataset consisting of feedyard data from multiple operations in different geographic locations over several years. This allowed for a robust but highly variable dataset. The predictive models built from this dataset are therefore quite general and may result in poorer performance in comparison to a predictive model built using a specific feedyard’s dataset. Previous research has demonstrated that models built using individual feedyard data can perform either worse or better in comparison to models built from a pooled dataset.2,13 The reason for this finding likely lies in the data quality differences between feedyards that wash out in the pooled dataset. However, consistent between the aforementioned studies is the finding that the variability in model performance decreases when creating models for individual feedyards.
Our economic analysis found that increasing the specificity of the models increased the net return per head compared to the status quo, with an even greater net return for the BRD2 model. Two previous papers have performed similar economic analyses using classification models on treated cattle in a feedlot. In 2021, Feuz et al25 reported an average increase of $14.01 net return per head compared to a simulated standard culling procedure when using the logistic regression model described in their study. They simulated the net return per head using various predictive model algorithms to model mortality in feedlot cattle removed from their pen for any health-related reason. In 2022, Feuz et al26 conducted a similar study with more feedyards included in the dataset and reported an average increase in net return of $6.31/head. These values are far lower than our economic results. Feuz et al investigated the prediction of mortality to the end of the feeding period in both papers,25,26 whereas we focused on the 15 days following treatment for BRD. Other work has shown a decrease in predictive ability over longer periods of time.13 Finally, Feuz et al, in both papers,25,26 included calves pulled and treated multiple times for the same or different disease process as the first pull. This means that each health incident and prediction in an individual calf was considered independent from any others, allowing for that animal to be newly classified unless it had been previously categorized as DNF.
Despite the differences in net returns, their findings support that increased specificity results in higher net returns, indicating that false positives cost more than false negatives. This same finding is supported by other studies that have investigated changes in net returns associated with changing diagnostic test characteristics.27 Thus, when using economic analysis to consider predictive models for implementation, specificity (the ability to determine the true negatives) will drive the selection. This consequently sacrifices sensitivity, resulting in an increased number of false negatives. In situations where detecting truly diseased individuals outweighs economic advantages, consideration should be given to a model with an increased sensitivity.
The models designed in this study have never been implemented to make culling decisions in real time, and thus the results reported are simply expectations. This paper investigated the economic implications of making a culling decision assuming the animal in question would be shipped the same day the decision was made. This is unlikely to happen in actuality as animals that are culled early are not shipped daily and often must remain at the feedyard in order to wait out their withdrawal period. Therefore, we are likely overestimating the net returns in our economic analysis as we did not account for continued loss that would inevitably occur as animals wait to get shipped. The outcome DNF_15 incorporated both animals lost to mortality and those that were culled early. There is no guarantee that the animals that were culled in the original dataset would have died before the end of the feeding phase. However, they were included in our outcome because this population of cattle serves as a source of loss comparable to mortalities at the feedyard.
Additionally, the predictive models were built on data limited to the feedyards we had access to. The formulae created for the economic analysis were based largely on estimates of prices that are constantly changing. Although beyond the scope of the current study, a sensitivity analysis or stochastic model of the economic analysis would be useful in capturing the impact the volatile market has on outcome. Further research and additional data are needed to create more robust analyses.
In conclusion, machine learning and livestock production have intersected increasingly in recent years, and more studies are exploring the application of predictive models to health and productivity challenges in various livestock systems. With this paper, we hope to contribute to the growing body of knowledge on the use of predictive models for management decisions at US feedyards by creating models that determined whether an animal would die within 15 days after first or second treatment for BRD. We also sought to estimate the net returns of using a predictive model to make culling decisions in comparison to the status quo. The boosted decision tree models we constructed performed with moderate accuracy, and by increasing their specificity we were able to generate positive net returns relative to a feedyard’s status quo. The models and economic analysis performed in this study cannot capture the full complexity and volatility of cattle health and the market; therefore, analysis with a stochastic predictive and economic model could more accurately represent the expected economic benefit of using machine learning to implement an early culling strategy at a feedyard.
Supplementary Materials
Supplementary materials are posted online at the journal website: avmajournals.avma.org
Acknowledgments
None reported.
Disclosures
Dr. White is a member of the AJVR Scientific Review Board, but was not involved in the editorial evaluation of or decision to accept this article for publication.
No AI-assisted technologies were used in the generation of this manuscript.
Funding
This research was funded in part by the Beef Cattle Institute and Kansas State University.
ORCID
B. J. White https://orcid.org/0000-0002-4293-6128
References
1. Feedlot 2011 part IV: health and health management on U.S. feedlots with a capacity of 1,000 or more head. USDA APHIS. September 2013. Accessed May 2022. https://www.aphis.usda.gov/sites/default/files/feed11_dr_partiv.pdf
2. Amrine DE, White BJ, Larson RL. Comparison of classification algorithms to predict outcomes of feedlot cattle identified and treated for bovine respiratory disease. Comput Electron Agric. 2014;105:9–19. doi:10.1016/j.compag.2014.04.009
3. Rojas HA, White BJ, Amrine DE, Larson RL. Predicting bovine respiratory disease risk in feedlot cattle in the first 45 days post arrival. Pathogens. 2022;11(4):442. doi:10.3390/pathogens11040442
4. Wisnieski L, Norby B, Pierce SJ, Becker T, Gandy JC, Sordillo LM. Predictive models for early lactation disease in transition dairy cattle at dry-off. Prev Vet Med. 2019;163:68–78. doi:10.1016/j.prevetmed.2018.12.014
5. Wisnieski L, Amrine DE, Renter DG. Predictive modeling of bovine respiratory disease outcomes in feedlot cattle: a narrative review. Livest Sci. 2021;251:104666.
6. Avra TD, Abell KM, Shane DD, Theurer ME, Larson RL, White BJ. A retrospective analysis of risk factors associated with bovine respiratory disease treatment failure in feedlot cattle. J Anim Sci. 2017;95(4):1521–1527. doi:10.2527/jas2016.1254
8. CattleFax six state fed steer price. CattleFax. 2024. Accessed May 2021. https://www.cattlefax.com/#!/data/cattle/other-prices/six-state-fed-steer/
9. Focus on feedlots monthly reports. Kansas State University Animal Sciences and Industry. Accessed June 1, 2022. https://www.asi.k-state.edu/about/newsletters/focus-on-feedlots/monthly-reports.html
10. Johanns A. Prices and profitability models. Agricultural Marketing Resource Center. 2024. Accessed May 2022. https://www.agmrc.org/renewable-energy/prices-and-profitability-models
11. R: a language and environment for statistical computing. Version 4.2.0. R Foundation for Statistical Computing. Accessed May 28, 2021. https://www.R-project.org/
12. Microsoft Azure Machine Learning. Microsoft. Accessed May 24, 2021. https://azure.microsoft.com/en-us/services/machine-learning/
13. Heinen L, White B, Amrine D, Larson RL. Evaluation of predictive models to determine final outcome for feedlot cattle based on information available at first treatment for bovine respiratory disease. Am J Vet Res. 2023;84(10):1–8. doi:10.2460/ajvr.23.05.0094
14. Roe BP, Yang H, Zhu J, Liu Y, Stancu I, McGregor G. Boosted decision trees as an alternative to artificial neural networks for particle identification. Nucl Instrum Methods Phys Res A. 2005;543(2–3):577–584. doi:10.1016/j.nima.2004.12.018
15. Filter based feature selection. Microsoft. August 28, 2024. Accessed May 2022. https://learn.microsoft.com/en-us/azure/machine-learning/component-reference/filter-based-feature-selection?view=azureml-api-2
16. Tune model hyperparameters. Microsoft. August 28, 2024. Accessed May 2022. https://learn.microsoft.com/en-us/azure/machine-learning/component-reference/tune-model-hyperparameters?view=azureml-api-2
17. Two-class boosted decision tree component. Microsoft. August 28, 2024. Accessed May 2022. https://learn.microsoft.com/en-us/azure/machine-learning/component-reference/two-class-boosted-decision-tree?view=azureml-api-2
18. Cross validate model. Microsoft. August 28, 2024. Accessed May 2022. https://learn.microsoft.com/en-us/azure/machine-learning/component-reference/cross-validate-model?view=azureml-api-2
19. Macrosyn tulathromycin for cattle and swine. ValleyVet. 2024. Accessed June 1, 2022. https://www.valleyvet.com/ct_detail.html?pgguid=a1dd674f-99e9-41e7-bd16-5cc0fc3e0c8b
20. Horton L, Depenbusch B, Pendell D, Renter DG. Description of feedlot animals culled for slaughter, revenue received, and associations with reported US beef market prices. Bov Pract. 2021;55(1):65–77. doi:10.21423/bovine-vol55no1p65-77
21. Babcock A, White B, Renter D, Dubnicka SR, Scott HM. Predicting cumulative risk of bovine respiratory disease complex using feedlot arrival data and daily morbidity and mortality counts. Can J Vet Res. 2013;77(1):33–44.
23. Schooling C, Jones H. Clarifying question about “risk factors”: predictors versus explanation. Emerg Themes Epidemiol. 2018;15:10. doi:10.1186/s12982-018-0080-z
24. Swain P, Hauska H. The decision tree classifier: design and potential. IEEE Trans Geosci Remote Sens. 1977;15(3):142–147. doi:10.1109/TGE.1977.6498972
25. Feuz R, Feuz K, Johnson MD. Improving feedlot profitability using operational data in mortality prediction modeling. J Agric Econ. 2021;46(2):242–255.
26. Feuz R, Feuz K, Gradner J, Theurer M, Johnson M. Scalability and robustness of feed yard mortality prediction modeling to improve profitability. Agric Resour Econom Rev. 2022;51(3):610–632. doi:10.1017/age.2022.19
27. Theurer M, White B, Larson R, Schroeder TC. A stochastic model to determine the economic value of changing diagnostic test characteristics for identification of cattle for treatment of bovine respiratory disease. J Anim Sci. 2015;93(3):1398–1410.