Porcine reproductive and respiratory syndrome virus (PRRSV) is the causative agent of porcine reproductive and respiratory syndrome (PRRS), one of the most important endemic diseases affecting pigs globally.1–4 The RNA virus belongs to the genus Rodartevirus (order Nidovirales, family Arteriviridae).5,6 Variants of the virus are typically classified into 2 main species, Betaarterivirus suid 1 (type 1) and Betaarterivirus suid 2 (type 2).7 Within North America, it is common to find variants belonging to the type 2 species.8 Specific variants within the type 2 species have been noted as causing a higher clinical impact.9–13 Of recent note, PRRSV lineage 1C has been identified as causing a high mortality rate in piglets relative to other circulating variants in the US.14,15 In Canada and other parts of North America, the classification of PRRSV into discrete variants is most commonly based on open reading frame 5 (ORF-5) sequences,16–18 using restriction fragment length polymorphism, different types of phylogenetic analysis,16,18–20 or a combination thereof.
Through the use of challenge studies,21 observational studies,22 and industry reports,15,23 variants or specific restriction fragment length polymorphism types have been linked with clinical impact, and such a link can be used to inform the veterinary decision-making process24 related to biosecurity or PRRS control. Regular sequencing of PRRSV ORF-5 and widespread use of production management systems in commercial herds could provide a basis for a more formal evaluation of the link between nucleotide sequence data and clinical impacts, for example, through the development and validation of predictive models.25 Despite such potential, attempts to link PRRSV nucleotide sequence data with clinical impact through predictive modeling approaches have been infrequent.18,26 Such models have been used on viral genetic data in attempts to predict the clinical impact of SARS-CoV-2 in people27–29 and the pathogenicity of H5 subtypes in susceptible avian species.30 If such a link is established successfully for PRRSV, it could potentially be used not only to contribute to the overall knowledge of PRRSV but also to inform veterinary practitioners in making herd-level decisions in near-real time and in a manner that would be aligned with herd-level precision medicine.18 However, initial attempts demonstrated that overall predictive performance was inadequate, specifically as it pertained to high-impact outbreaks.18 One of the potential explanations for such performance was that predictive power was lost due to the aggregation of genetic code through discrete classifications.18 Therefore, the objective of this study was to determine the potential of ORF-5 sequences and basic demographic data to predict the severity of the impact on selected production parameters observed during clinical PRRSV outbreaks in Ontario sow herds.
We hypothesized that predictive models trained on ORF-5 nucleotide sequences and basic demographic data (herd size, herd type, previous infection status to PRRSV, and date clinical signs were observed) would be able to classify the clinical impact on abortion and preweaning mortality with higher accuracy than the no information rate, defined as the proportion of the majority class. Here, the majority class is the largest outcome group in the data on which a machine learning model is trained. Therefore, if this modeling attempt is successful, the best machine learning model would predict with an accuracy higher than the base proportion of the largest outcome category and would have satisfactory classification performance in each of the outcome classes.
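The comparison against the no information rate can be illustrated with a short sketch. The class counts are those reported for abortion in Table 1; the count of 31 correctly classified herds is a hypothetical input (it corresponds to roughly 60.8% accuracy), and the one-sided exact binomial test is an assumption about how such a comparison is commonly made, not a statement of the exact procedure used in this study.

```python
from math import comb

def no_information_rate(class_counts):
    """Proportion of observations that belong to the majority class."""
    total = sum(class_counts.values())
    return max(class_counts.values()) / total

def p_accuracy_exceeds_nir(n_correct, n_total, nir):
    """One-sided exact binomial P(X >= n_correct) when true accuracy equals nir."""
    return sum(
        comb(n_total, k) * nir**k * (1 - nir) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )

# Abortion class sizes for the combined time frames (Table 1).
abortion_counts = {"high": 14, "medium": 24, "none": 13}
nir = no_information_rate(abortion_counts)

# Hypothetical model that classifies 31 of 51 herds correctly.
p_value = p_accuracy_exceeds_nir(31, 51, nir)
print(f"NIR = {nir:.3f}, one-sided P = {p_value:.3f}")
```

A model is only informative in this sense when its accuracy exceeds the proportion that would be achieved by always predicting the majority class.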
Methods
This study was a retrospective longitudinal study of sow herds that reported clinical outbreaks in Ontario at various points between September 5, 2009, and February 5, 2019, with herds as units of analysis. Working with swine veterinary clinics, veterinarians enrolled any PRRSV-positive sow herds within the southern Ontario region whose owners agreed to participate. The start of the study period was the week when clinical signs suggestive of PRRS were first observed by the producer. The end of the study period was either a fixed 8 weeks after the start of the outbreak (2017–2019 data collection frame) or a variable duration determined by when the veterinary report (the questionnaire from which production data were gathered) was submitted. The variable study duration applied only to the 2009–2012 data collection frame.
Data management: survey data collection and management
Only farms with complete responses for preweaning mortality, abortion, or both were included for further analysis. The data utilized for this study were aggregated across the 2 survey time frames: September 5, 2009, to December 26, 2012 (2009–2012 time frame), and January 16, 2017, to February 5, 2019 (2017–2019 time frame). Farm production measures were recorded over different periods in each time frame: the duration between the start of the outbreak and the time when the report was completed (2009–2012 time frame) and a total of 8 weeks from the start of the outbreak (2017–2019 time frame). In both data collection periods, veterinarians gathered production data from each case farm’s production reporting system. This system varied from farm to farm and could have been an in-house electronic reporting system or paper reports. The veterinarian would then transcribe these data to the questionnaire provided. Across both time frames, a case farm, or outbreak, was defined as the observation of clinical signs typical of PRRS reported to veterinary clinics in Ontario and diagnostically confirmed using PCR and ORF-5 nucleotide sequencing conducted in the Animal Health Laboratory (AHL) of the University of Guelph. Although any clinical signs typical of PRRS could be reported, only impacts on preweaning mortality and abortion were utilized in this study. To accommodate differences in the length of observation periods, period-specific processing of the production measures for modeling abortion and preweaning mortality was done.
The survey data collected between 2017 and 2019 have been explained in previous work18 and followed the University of Guelph, Research Ethics Board Protocol ID 14DC033. In brief, a questionnaire was distributed to swine veterinary clinics in southern Ontario (Canada). The clinics then collected cases from the farms that were part of an Area Regional Control and Elimination program in Ontario, Canada.31 Clinical outcomes (ie, production data) were gathered for 8 weeks postinitial PRRSV confirmation.
The data collected for PRRS outbreaks that occurred during the 2009–2012 time frame followed a similar collection framework, with some differences. The major difference was that the week when a survey was conducted was the end date of the study period for each individual herd. This resulted in a follow-up period that could be longer or shorter than 8 weeks. Because of this difference at the end of the follow-up periods between the 2 data collection frames, we further processed data and aggregated herd-level outcomes separately for the 2 data collection periods. Outcome measures of note for this study were the weekly number of abortions and the percent preweaning mortality of piglets born alive.
Farm demographic data collection has been described previously.18 Specific demographic variables for this paper include a unique farm identifier (AHL case number), most likely PRRS status before the clinical outbreak,18,32 herd size (number of sows on the farm), herd type (production stages of swine on the farm), and date of first clinical signs. In accordance with the approved data sharing agreements under University of Guelph Research Ethics Board Protocol IDs 14DC033 (2017–2019 time frame) and 10MY025 (2009–2012 time frame), data are not available to share.
Data management: outcome aggregation
Although multiple weeks of production data since the start of the outbreak in each herd were available, the observed production parameters were first aggregated into a single production parameter representative of the entire follow-up period. This was done to have a standardized measure of the outbreaks across the 2 study periods. Subsequently, these new aggregated measures were categorized into 2 or 3 discrete levels, based on the magnitude of this mean impact on each production parameter considered in this study. These groupings were informed based on consultation with swine veterinarians, and the distribution of data between years and outcomes was tested via a Kruskal-Wallis rank sum test.
The 2009–2012 abortion data were recorded as the percentage of sows on the farm that aborted and were converted to abortions per 1,000 sows. The 2009–2012 preweaning mortality data provided by veterinary clinics were calculated as the percentage of suckling piglets that died, based on total mortality and total pigs born since the start of the outbreak. Because this was already an aggregate measure for the entire period, no further transformation was needed.
Abortion outcomes between 2017 and 2019 were recorded as the number of sows that aborted in each week; weekly counts were summed across all 8 weeks, divided by herd size, and expressed as the total number of abortions per 1,000 sows. As the preweaning data gathered between 2017 and 2019 were calculated on a weekly basis for the 8 weeks after the initial clinical signs, the mean percent preweaning mortality was calculated from the 8 weekly values, and this mean was taken as representative of the outbreak.
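The aggregation for the 2017–2019 time frame can be sketched as follows. The function names, weekly values, and herd size are hypothetical and serve only to illustrate the arithmetic described in the text.

```python
def abortions_per_1000(weekly_abortions, herd_size):
    """Sum weekly abortion counts and scale to a rate per 1,000 sows."""
    return sum(weekly_abortions) / herd_size * 1000

def mean_preweaning_mortality(weekly_percent):
    """Average the weekly percent preweaning mortality over the follow-up."""
    return sum(weekly_percent) / len(weekly_percent)

# Hypothetical herd of 600 sows followed for 8 weeks.
weekly_abortions = [0, 2, 5, 3, 1, 0, 1, 0]
weekly_pwm = [10.0, 14.0, 22.0, 30.0, 26.0, 18.0, 12.0, 12.0]

abortion_rate = abortions_per_1000(weekly_abortions, herd_size=600)
pwm_mean = mean_preweaning_mortality(weekly_pwm)
print(f"{abortion_rate:.1f} abortions per 1,000 sows; {pwm_mean:.1f}% mean mortality")
```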
As mentioned, the distributions for each outcome measure were examined and tested for differences between study periods through the Kruskal-Wallis rank sum test. Because no statistically significant difference between study periods was found, data from the 2 strata were merged and then classified into ordinal outcomes. These ordinal outcomes were high, medium, and none (no impact) for abortion and low or high impact for preweaning mortality. The number of categories for each outcome was determined by the distribution of observations across each outcome measure. For abortion, high-impact outcomes were > 20 sows aborting per 1,000 sows, medium-impact outcomes were > 0 and ≤ 20 sows aborting per 1,000 sows, and the none category consisted of farms that reported zero abortions per 1,000 sows. Farms with preweaning mortality ≥ 0% and ≤ 20% were categorized as low impact, while farms with > 20% preweaning mortality were classified as high impact. Data management and descriptive analysis were conducted in the statistical software R, version 4.1.1.33 Additional packages utilized for data management were tidyverse, doBy, dplyr, and lubridate.34–37
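The categorization can be written as 2 small functions. The cut points follow the ranges shown in Table 1 (medium abortion impact up to 20 per 1,000 sows, high impact above 20; low preweaning mortality up to 20%, high above 20%); the function names are hypothetical.

```python
def classify_abortion(rate_per_1000):
    """Map abortions per 1,000 sows to the 3-level ordinal outcome."""
    if rate_per_1000 == 0:
        return "none"
    if rate_per_1000 <= 20:
        return "medium"
    return "high"

def classify_preweaning(percent_mortality):
    """Map percent preweaning mortality to the 2-level outcome."""
    return "low" if percent_mortality <= 20 else "high"

# Boundary values taken from the minimum and maximum columns of Table 1.
for rate in (0, 0.67, 20, 23.3, 500):
    print(rate, classify_abortion(rate))
for pct in (0, 20, 22.34, 60):
    print(pct, classify_preweaning(pct))
```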
Data management: sequence transformation and alignment
Multiple sequence alignment was performed in Molecular Evolutionary Genetics Analysis, version 7 (MEGA 7),38 utilizing the ClustalW method.18 After sequence alignment, sequence data were imported into the statistical software R, version 4.1.1, through the package Biostrings.33,39 Subsequently, sequences were matched to farms based on AHL case numbers. The aligned and matched sequences were analyzed through multiple correspondence analysis (MCA) utilizing the packages FactoMineR and FactoShiny.40,41
Statistical dimension reduction and prognostic modeling
Multiple correspondence analysis is a statistical technique used for analyzing the relationships between categorical variables.41,42 Utilizing MCA in this context aims to represent the relationships between nucleotide base pairs in a lower dimensional space, making it easier to visualize and interpret the data. It achieves dimensionality reduction by creating new synthetic variables (dimensions), which are linear combinations of the original categorical nucleotides. Variability explained by these new variables was assessed across 2 dimensions and by outcome category for both abortion and preweaning mortality outcomes. Both the raw aligned sequence data and MCA dimensions were used separately in models for input data sets along with demographic factors.
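Multiple correspondence analysis operates on an indicator (one-hot) coding of the categorical variables, here one variable per aligned nucleotide position. The sketch below shows only that encoding step on 3 hypothetical 6-base sequences; it does not perform the decomposition itself, and dropping invariant positions is an assumption made to keep the example small.

```python
def indicator_matrix(aligned_seqs):
    """One-hot encode aligned sequences, one column per (position, base) pair."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    columns = []
    for pos in range(length):
        observed = sorted({seq[pos] for seq in aligned_seqs})
        if len(observed) < 2:
            continue  # assumption: invariant positions carry no information
        columns.extend((pos, base) for base in observed)
    rows = [
        [1 if seq[pos] == base else 0 for pos, base in columns]
        for seq in aligned_seqs
    ]
    return columns, rows

# Hypothetical aligned fragments; real ORF-5 sequences are far longer.
columns, rows = indicator_matrix(["ATGTTG", "ATGCTG", "ACGCTG"])
print(columns)
for row in rows:
    print(row)
```

Each synthetic MCA dimension is then a weighted combination of these indicator columns, which is why sequences that share bases at many variable positions end up close together in the reduced space.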
Six commonly used machine learning models were chosen for this study.18,25,43–51 These models can be grouped into 3 major categories: tree-based ensemble classifiers, distance-based classifiers, and regularized logistic regressions. The tree-based ensemble classifiers included random forest (RF), gradient boosting machines (GBMs), and extreme boosting machines (XBMs). These tree-based ensemble methods combine multiple classification trees to enhance the predictive performance of trained algorithms. Random forest constructs numerous classification trees, each trained on a different sample and a random subset of predictors, and provides classifications through majority voting.25 In contrast to such predictions based on distinct trees, the other 2 ensemble-based frameworks rely on building trees iteratively through a boosting principle.51 The GBMs iteratively train weak learners, typically shallow classification trees, with each subsequent tree building on the errors of its predecessors by minimizing a loss function.51 Extreme boosting machines improve, in theory, upon GBM by incorporating regularization and a more efficient tree-building algorithm. Both XBM and GBM typically require tuning of a larger number of parameters than RFs to prevent overfitting. The K-nearest neighbors (KNN) algorithm is a nonparametric method that classifies a new observation based on its proximity to data points in the training set.25 The KNN algorithm assigns the class label based on the majority class among close neighbors, the number of which is determined during the training and tuning process.25 Regularized forms of logistic regression, least absolute shrinkage and selection operator (LASSO) and ridge logistic regressions (Ridge), introduce a penalty term to the logistic regression, constraining the magnitude of the coefficients and preventing overfitting.25 The penalty employed in Ridge shrinks coefficients toward zero but does not eliminate them entirely.
In contrast, the penalty employed in LASSO forces some coefficients to become zero, effectively leading to feature selection.25 All models were fine-tuned with corresponding parameters, such as mtry for RF models; interaction depth, number of trees/nodes, and shrinkage for GBM; λ and α for LASSO and Ridge regressions, respectively; K-neighbors for KNN; and the number of rounds, maximum interaction depth, the number of columns sampled, and the η parameter for XBM.
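As one concrete illustration of the model families above, the majority vote used by KNN can be sketched in a few lines. The sequences and labels are hypothetical, and the Hamming distance (number of mismatched positions) is an assumption chosen for readability; the distance actually used depends on how the inputs are encoded.

```python
from collections import Counter

def hamming(a, b):
    """Number of positions at which 2 aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train, query, k):
    """Assign the majority label among the k nearest training sequences."""
    neighbours = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical training set of (aligned sequence, impact class) pairs.
train = [
    ("ATGTTG", "low"),
    ("ATGTTA", "low"),
    ("ACGCTG", "high"),
    ("ACGCTA", "high"),
    ("ACGCCA", "high"),
]
prediction = knn_predict(train, "ATGTTC", k=3)
print(prediction)
```

The value of k is the tuning parameter referred to above: small values follow local structure closely, while large values drift toward predicting the majority class.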
The model’s name abbreviations are based on the model utilized and the input structure for the genetic data. Models that utilized raw aligned sequences are designated with an “R” in the model name, and models that utilized 2-dimensional MCA data receive an “MCA.” The abbreviated model names are RF raw and MCA (RFR and RFMCA), GBM raw and MCA (GBMR and GBMMCA), LASSO raw (LASSOR), Ridge raw (RIDGER), KNN raw and MCA (KNNR and KNNMCA), and XBM raw and MCA (XBMR and XBMMCA). Models were trained and tuned utilizing k-fold cross-validation. For abortion outcome models, 5-fold cross-validation was utilized, while 4-fold cross-validation was utilized for preweaning mortality models. The number of folds was chosen to distribute the number of observations within each outcome category evenly.
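Fold assignment that keeps outcome classes evenly distributed can be sketched as below. This is a generic stratified round-robin scheme under stated assumptions (the function name and seed are hypothetical), not a reproduction of the resampling routine used in the study; the class sizes are those of the abortion outcome in Table 1.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Assign observation indices to k folds, balancing each class across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0  # continue the round-robin across classes to balance fold sizes
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for i, idx in enumerate(members):
            folds[(offset + i) % k].append(idx)
        offset += len(members)
    return folds

labels = ["high"] * 14 + ["medium"] * 24 + ["none"] * 13
folds = stratified_folds(labels, k=5)
for fold in folds:
    print(len(fold), sorted({labels[i] for i in fold}))
```

With only 13 or 14 herds in the smallest classes, 5 folds leave 2 or 3 herds of each small class per fold, which illustrates why the number of folds has to be chosen with the class sizes in mind.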
As a note, with the exception of LASSO and Ridge models, both aligned genetic sequence and dimensionality reduction inputs (MCA) were utilized for input data. This culminated in 20 model variations, 10 for each outcome measure. Multiple correspondence inputs were not utilized for Ridge and LASSO models as there is inherent and automated variable reduction built into the models; therefore, it was deemed unnecessary to utilize the reduced data (MCA).
Owing to the limited dataset, model performance was assessed only through the inherent data splitting functionality through cross-validation. Accuracy, the AUC, sensitivity, and specificity were gathered for each model. For abortion models, variations of these measures were considered. These include weighted accuracy and class-specific sensitivity and specificity, along with micro- and macro-AUC scores. These modified measures were chosen based on their ability to account for 3 outcome levels, an unbalanced data set, and the number of observations per class. The hierarchy of these measures for model performance assessment utilized in this study was as follows: accuracy, AUC, and sensitivity. The assessment was a stepwise decision process. For abortion, the per-class sensitivity and specificity measures were investigated, and micro-AUC was utilized as the final deciding measure. Additional model investigations were conducted, such as model-specific variable importance and partial dependence plots for the best-performing model.
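For a 3-level outcome, sensitivity and specificity are computed per class by treating that class as positive and the other 2 as negative. The following sketch uses a hypothetical set of 6 actual and predicted labels; it assumes every class appears at least once among the actual labels, otherwise the sensitivity is undefined.

```python
def per_class_metrics(actual, predicted, classes):
    """One-vs-rest sensitivity and specificity for each outcome class."""
    results = {}
    pairs = list(zip(actual, predicted))
    for c in classes:
        tp = sum(a == c and p == c for a, p in pairs)
        fn = sum(a == c and p != c for a, p in pairs)
        fp = sum(a != c and p == c for a, p in pairs)
        tn = sum(a != c and p != c for a, p in pairs)
        results[c] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    return results

# Hypothetical labels, not study data.
actual = ["high", "high", "medium", "medium", "medium", "low"]
predicted = ["high", "medium", "medium", "medium", "low", "low"]

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
metrics = per_class_metrics(actual, predicted, ["high", "medium", "low"])
print(f"accuracy = {accuracy:.2f}")
for name, values in metrics.items():
    print(name, values)
```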
Results
Population and clinical impact
The median herd size was 650 sows, with 75% of herds falling below 980 sows. Descriptive measures by time frame and for the combined time frames after transformation are provided (Table 1). Kruskal-Wallis rank sum tests for differences between time frames in the distributions of outcomes were not significant (P values > .05; preweaning mortality, χ2 = 0.75, degrees of freedom = 1, P = .39; abortion, χ2 = 2.79, degrees of freedom = 1, P = .09).
Descriptive statistics for preweaning mortality (%) and abortion (rate per 1,000 sows) in sow herds experiencing porcine reproductive and respiratory syndrome virus (PRRSV) outbreaks, stratified by time frame and ordinal outcome level in a study of the impact of PRRSV genome, represented by the open reading frame 5 nucleotide sequences and demographic factors on abortion and preweaning mortality in Ontario (Canada) sow herds in a period between 2009 and 2019 obtained from retrospective longitudinal studies.
Clinical measure/year/impact | No. of farms | Median | IQR | Minimum | Maximum |
---|---|---|---|---|---|
Abortion per 1,000 sows | |||||
Combined* | |||||
High | 14 | 104.68 | 118.7 | 23.3 | 500 |
Medium | 24 | 9.21 | 7.78 | 0.67 | 20 |
None | 13 | 0 | 0 | 0 | 0 |
2009–2012 | |||||
High | 10 | 104.68 | 125.78 | 50 | 500 |
Medium | 6 | 10.88 | 2.48 | 3.08 | 18.6 |
None | 5 | 0 | 0 | 0 | 0 |
2017–2019 | |||||
High | 4 | 95.71 | 144.52 | 23.33 | 185.71 |
Medium | 18 | 6.49 | 7.29 | 0.67 | 20 |
None | 8 | 0 | 0 | 0 | 0 |
Preweaning mortality | |||||
Combined* | |||||
High | 16 | 34.16% | 22.72% | 22.34% | 60% |
Low | 27 | 13.25% | 10.84% | 0 | 20% |
2009–2012 | |||||
High | 6 | 45.00% | 17.50% | 30% | 60% |
Low | 15 | 14.28% | 15.40% | 0 | 20% |
2017–2019 | |||||
High | 10 | 29.71% | 10.67% | 22.34% | 54.21% |
Low | 12 | 12.83% | 6.53% | 6.21% | 20.00% |
Number of farms, median, minimum, maximum, and IQR are presented.
*All observations from the 2009–2012 and 2017–2019 time frames are grouped together.
The mean percent of preweaning mortality for the 2009–2012 dataset was 20.30% (SD = 17.40%; n = 21). When the mean was taken across the 8 weeks for each farm (2017–2019 dataset), the minimum preweaning mortality was 6.21% (SD = 4.75%; n = 22) with a maximum of 54.2% (SD = 27.25%; n = 22). The mean of the weekly averaged preweaning mortality measured across all farms in the 2017–2019 strata was 22.51% (SD = 12.56; n = 22). The preweaning mortality by outcome grouping for the combined data set and each yearly stratum is provided (Table 1).
The mean number of abortions per 1,000 sows for the 2009–2012 dataset was 80.93 or roughly 81 sows per 1,000 (SD = 124.25; n = 21). For the 2017–2019 dataset, on a weekly basis, the minimum number of abortions observed in a given week was 0 with a maximum of 52. When the mean was taken across all farms and all 8 weeks, the average number of abortions was 1.7 or roughly 2 sows per week (SD = 4.84; n = 30). When aggregated, the mean number of abortions per 1,000 sows was 18.42 or roughly 18 sows per 1,000 (SD = 43.44; n = 30) for the 2017–2019 time period. Abortions per 1,000 sows by outcome grouping for the combined data set and each time period stratum are provided (Table 1).
Multiple correspondence analysis
A breakdown of sequences and outcomes across the 2 MCA dimensions utilized is provided (Supplementary Figures S1 and S2). For abortion (n = 51), dimension 1 had the largest percent variance explained with 16.2%, followed by dimension 2 with 9.2%. For preweaning mortality (n = 43), 25.2% of the variability was explained across the 2 dimensions, with 15.2% explained in dimension 1 and 10.0% explained in dimension 2.
Predictive model performance: abortion
All model accuracies with corresponding SDs and 95% CIs are provided (Table 2). Of the 10 models, 2 had accuracy values that were significantly different from the no information rate: GBMMCA and XBMMCA (P = .03 for both). Between these 2 models, the XBMMCA model had a higher weighted accuracy and micro-AUC score (72% and 0.68, respectively). When comparing sensitivities and specificities across these 2 models, the XBMMCA was a better predictor with regard to the correct identification of high and medium outcomes (sensitivity = 50% and 87.5%, respectively) and slightly better at classifying outbreaks that were not low impact (specificity = 92.1%; Table 3) than the GBMMCA model. Conversely, the GBMMCA model was better at predicting low-impact outcomes (sensitivity = 53.8%), along with classifying non–high-impact outbreaks and non–medium-impact outbreaks (specificity = 91.8%). These results culminated in the XBMMCA model (accuracy = 60.8%; 95% CI, 46.1% to 74.2%) being the best-performing model, with a 13.8% difference from the no information rate.
Performance measures for predictive models of the impact of PRRSV genome and demographics factors on abortion in Ontario sow herds, using 3-level ordinal outcome prediction (high, medium, and low).
Model | Accuracy (%) | Accuracy SD (%) | 95% CI (%) | Weighted accuracy (%) | No information rate (%) | κ | P value | Macro-AUC | Micro-AUC | F |
---|---|---|---|---|---|---|---|---|---|---|
RFR | 54.90 | 15.7 | 40.3–68.9 | 68.3 | 47 | 0.27 | .16 | 0.59 | 0.65 | 0.48 |
RFMCA | 51.0 | 9.1 | 36.6–65.2 | 54.2 | 47 | 0.09 | .33 | 0.59 | 0.65 | 0.23 |
GBMR | 58.80 | 20.9 | 44.2–72.4 | 71.0 | 47 | 0.33 | .06 | 0.34 | 0.35 | 0.54 |
GBMMCA | 60.70 | 13.2 | 46.1–74.2 | 71.6 | 47 | 0.35 | .03 | 0.34 | 0.36 | 0.57 |
LASSOR | 45.1 | 10.5 | 31.1–59.7 | 53.94 | 47 | 0.08 | .66 | 0.55 | 0.60 | 0.38 |
RIDGER | 43.14 | 13.3 | 29.3–57.7 | 50.5 | 47 | 0.00 | .75 | 0.52 | 0.60 | 0.27 |
KNNR | 47.1 | 6.0 | 32.9–61.5 | 50.0 | 47 | 0.0 | .55 | 0.55 | 0.41 | 0.21 |
KNNMCA | 56.9 | 7.5 | 42.2–70.6 | 64.4 | 47 | 0.29 | .10 | 0.42 | 0.35 | 0.47 |
XBMR | 58.80 | 12.1 | 44.1–72.4 | 73.2 | 47 | 0.35 | .06 | 0.63 | 0.66 | 0.54 |
XBMMCA | 60.80 | 12.4 | 46.1–74.2 | 72 | 47 | 0.34 | .03 | 0.62 | 0.68 | 0.41 |
Accuracy, accuracy SD, accuracy 95% CI, modified weighted accuracy, no information rate, κ statistic, P value, micro- and macro-area under the curve (AUC), and mean F score are presented as detailed (Table 1).
GBMMCA = Gradient boosting machine multiple correspondence analysis. GBMR = Gradient boosting machine raw. KNNMCA = K-nearest neighbors multiple correspondence analysis. KNNR = K-nearest neighbors raw. LASSOR = Least absolute shrinkage and selection operator raw. RFMCA = Random forest multiple correspondence analysis. RFR = Random forest raw. RIDGER = Ridge raw. XBMMCA = Extreme boosting machine multiple correspondence analysis. XBMR = Extreme boosting machine raw.
Per ordinal outcome class sensitivity (Sn), specificity (Sp), F score, and AUC scores for predictive models trained on PRRSV nucleotide sequence and demographic data in Ontario sow herds to classify herd-level abortion into 3 ordinal outcome classes (high, medium, and low).
Model/outcome | Sn (%) | Sn CI (%) | Sp (%) | Sp CI (%) | F score | AUC |
---|---|---|---|---|---|---|
RFR | ||||||
High | 35.7 | 10.61–60.81 | 86.5 | 75.47–97.5 | 0.42 | 0.60 |
Medium | 79.2 | 62.92–95.41 | 59.3 | 40.73–77.79 | 0.7 | 0.71 |
Low | 30.8 | 5.68–55.86 | 81.6 | 69.25–93.9 | 0.33 | 0.48 |
RFMCA | ||||||
High | 14.3 | −4.04–32.62 | 97.3 | 92.07–100.00 | 0.23 | 0.61 |
Medium | 100 | 100–100 | 11.1 | −0.74–22.97 | 0.66 | 0.68 |
Low | 0.0 | 0–0 | 100 | 100–100 | NA | 0.47 |
GBMR | ||||||
High | 42.9 | 16.93–68.78 | 89.2 | 79.18–99.19 | 0.50 | 0.3 |
Medium | 79.2 | 62.92–95.41 | 55.6 | 36.81–74.3 | 0.69 | 0.39 |
Low | 38.5 | 12.01–64.91 | 86.8 | 76.09–97.59 | 0.43 | 0.32 |
GBMMCA | ||||||
High | 35.7 | 10.61–60.81 | 91.8 | 83.1–100.00 | 0.45 | 0.3 |
Medium | 79.1 | 62.92–95.41 | 51.8 | 33–70.7 | 0.68 | 0.37 |
Low | 53.8 | 26.75–80.95 | 89.4 | 79.72–99.23 | 0.58 | 0.36 |
LASSOR | ||||||
High | 28.6 | 4.91–52.24 | 91.8 | 83.1–100.00 | 0.38 | 0.60 |
Medium | 70.8 | 52.65–89.02 | 37.04 | 18.82–55.25 | 0.59 | 0.58 |
Low | 15.3 | −4.23–35 | 78.9 | 65.98–91.91 | 0.20 | 0.47 |
RIDGER | ||||||
High | 14.3 | −4.04–32.62 | 91.9 | 83.1–100.00 | 0.21 | 0.66 |
Medium | 83.3 | 68.42–98.24 | 22.2 | 6.54–37.9 | 0.61 | 0.47 |
Low | 0.0 | 0–0 | 86.8 | 76.09–97.59 | NA | 0.42 |
KNNR | ||||||
High | 0.0 | 0–0 | 100 | 100–100 | NA | 0.57 |
Medium | 100 | 100–100 | 0.0 | 0–0 | 0.64 | 0.52 |
Low | 0.0 | 0–0 | 100 | 100–100 | NA | 0.57 |
KNNMCA | ||||||
High | 50.0 | 23.81–76.19 | 75.68 | 61.85–89.5 | 0.46 | 0.34 |
Medium | 83.3 | 68.42–98.24 | 59.3 | 40.73–77.79 | 0.72 | 0.35 |
Low | 15.4 | −4.23–35 | 94.7 | 87.64–100 | 0.23 | 0.57 |
XBMR | ||||||
High | 42.8 | 16.93–68.78 | 83.8 | 71.91–95.66 | 0.46 | 0.64 |
Medium | 75.0 | 57.68–92.32 | 74.1 | 57.54–90.6 | 0.73 | 0.72 |
Low | 46.2 | 19.05–73.25 | 78.9 | 65.98–91.91 | 0.44 | 0.78 |
XBMMCA | ||||||
High | 50.0 | 23.81–76.19 | 89.2 | 79.18–99.19 | 0.56 | 0.69 |
Medium | 87.5 | 74.27–100.73 | 51.9 | 33–70.7 | 0.72 | 0.69 |
Low | 23.1 | 0.17–45.98 | 92.1 | 83.53–100.00 | 0.32 | 0.47 |
Model performance was evaluated using k-fold cross-validation as detailed (Table 1).
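Some per-class intervals in the table above extend below 0% or above 100%, which is characteristic of a normal-approximation (Wald) interval. The sketch below is an inference from the tabulated numbers rather than a stated method: it assumes a Wald interval with z = 1.96, and the counts of 7 and 2 correctly classified herds out of 14 high-impact herds are back-calculated from the reported sensitivities of 50.0% and 14.3%.

```python
from math import sqrt

def wald_interval_pct(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion, in percent."""
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    return 100 * (p - half_width), 100 * (p + half_width)

# 7 of 14 high-impact herds correctly classified (sensitivity = 50.0%).
lower_50, upper_50 = wald_interval_pct(7, 14)
# 2 of 14 high-impact herds correctly classified (sensitivity = 14.3%).
lower_14, upper_14 = wald_interval_pct(2, 14)

print(f"{lower_50:.2f} to {upper_50:.2f}")
print(f"{lower_14:.2f} to {upper_14:.2f}")
```

Because this interval is not bounded by 0% and 100%, intervals for small classes with very low or very high sensitivity should be read with caution.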
Variable importance for the best-fit model is shown (Figure 1). The least important variables were the month of the year, based on the date of first clinical sign, and herd type. Herd size, or the number of sows on the farm, was considered the most important variable when predicting clinical outcomes. This was followed by dimension 2 of the sequence MCA data. Thus, the PRRSV ORF-5 genetic data represented through the 2 dimensions were among the top 3 most important variables in the best-performing model, immediately after herd size. A partial dependence plot based on herd size for the best-fit model is shown (Figure 2).
Predictive model performance: preweaning mortality
Individual model accuracies with corresponding SDs and 95% CIs are provided (Table 4). No model had an accuracy significantly different from the no information rate (62.8%). The closest was XBMMCA (P = .07), which had the highest model accuracy at 74.4% (95% CI, 58.8% to 86.5%), an 11.6% improvement over the no information rate. The XBMMCA also had the largest sensitivity value (56.20%). When comparing specificities across all models, the XBMMCA model was the second lowest, with the LASSOR model having the lowest specificity (85.20% and 77.80%, respectively). Regarding AUC scores, the XBMMCA model fell in the middle of the range when compared to the other models, with an AUC of 0.61. Of the other models, KNNMCA, LASSOR, and RIDGER had larger AUC values, while the RFR model had nearly the same AUC (0.62) as the XBMMCA model. Based on 2 of the 3 performance measures being the highest, XBMMCA was deemed the best-performing model.
Performance measures for predictive models of the impact of PRRSV genome and demographic factors on preweaning mortality in Ontario sow herds, using 2-level outcome prediction (high and low).
Model | Accuracy (%) | Accuracy SD (%) | 95% CI (%) | No information rate (%) | κ | P value | Sn (%) | Sn CI (%) | Sp (%) | Sp CI (%) | AUC |
---|---|---|---|---|---|---|---|---|---|---|---|
RFR | 65.1 | 3.2 | 49.07–79.00 | 62.80 | 0.07 | .44 | 6.30 | 0.16–30.2 | 100.00 | 87.2–100 | 0.62 |
RFMCA | 62.8 | 6.3 | 46.7–77.0 | 62.80 | 0.08 | .57 | 18.80 | 4.05–45.65 | 88.90 | 70.84–97.65 | 0.57 |
GBMR | 72.1 | 6.7 | 56.3–84.7 | 62.80 | 0.31 | .13 | 31.20 | 11.02–58.66 | 96.00 | 81.03–99.99 | 0.47 |
GBMMCA | 69.8 | 7.9 | 53.9–82.8 | 62.80 | 0.26 | .21 | 31.20 | 11.02–58.66 | 92.60 | 75.71–99.09 | 0.57 |
LASSOR | 65.1 | 8.1 | 49.1–79.0 | 62.80 | 0.22 | .44 | 43.80 | 19.75–70.12 | 77.80 | 57.74–91.38 | 0.66 |
RIDGER | 67.4 | 6.0 | 51.4–81.0 | 62.80 | 0.15 | .32 | 12.50 | 1.55–38.35 | 100.00 | 87.23–100 | 0.66 |
KNNR | 65.1 | 5.4 | 49.1–79.0 | 62.80 | 0.08 | .44 | 6.30 | 0.16–30.23 | 100.00 | 87.23–100 | 0.57 |
KNNMCA | 69.8 | 16.2 | 53.9–82.8 | 62.80 | 0.31 | .22 | 43.70 | 19.75–70.12 | 85.20 | 66.27–95.81 | 0.64 |
XBMR | 67.4 | 11.4 | 51.4–81.0 | 62.80 | 0.2 | .32 | 25.00 | 7.27–52.38 | 92.60 | 75.71–99.09 | 0.52 |
XBMMCA | 74.4 | 13.2 | 58.8–86.5 | 62.80 | 0.43 | .07 | 56.20 | 29.88–80.25 | 85.20 | 66.27–95.81 | 0.61 |
Accuracy, accuracy SD, accuracy 95% CI, no information rate, κ statistic, P value, Sn, Sp, and AUC are displayed as detailed (Table 1).
Variable importance for the best-fit model is shown (Figure 1). The least important variables were time of the year, based on the date of first clinical sign; herd type, specifically farrow to grow; and previous herd status to PRRSV. Herd size, or the number of sows on the farm, was considered the second most important variable when predicting clinical outcomes. The most important was dimension 2 of the sequence MCA data, while dimension 1 was the third most important. Thus, PRRSV ORF-5 genetic data represented through the 2 dimensions were also among the top 3 most important variables in the best-performing predictive model, with dimension 2 being the most important variable. A partial dependence plot based on herd size for the best-fit model is shown (Figure 2). Note that predictive probabilities were more variable at smaller herd sizes, while the largest herd sizes did not differ greatly in magnitude.
Discussion
The best-performing models, XBMMCA for both health outcomes, increased accuracy over the frequency of the majority class by 13.8% and 11.6% for abortion and preweaning mortality, respectively. This suggests that the entire ORF-5 sequences, in combination with basic demographic factors, contain information that could be linked with PRRS impact on swine farms for predictive purposes. Nonetheless, the increase in accuracy does not warrant the deployment or routine use of such models. Sensitivities were consistently low for high-impact outcomes, with maximum sensitivities of 50% and 56.2% in the abortion and preweaning mortality models, respectively. Outbreaks with high impact on production are generally of the most interest, and a model that could reasonably be used in the field should be able to identify them. Overall, this is consistent with other work done in the area.18,26 Melmer et al18 used an RF model with a continuous outcome for abortion and preweaning mortality. Due to the nature of the outcome, a direct comparison between model performances was not possible. However, the previous model consistently underpredicted outbreaks with high quantitative impacts,18 which is consistent with the results observed in the current models. Chadha et al26 used PRRSV sequence data from different time periods on 3 performance measures: sow mortality, preweaning mortality, and abortion. Interestingly, although the original data were different and different data processing steps and modeling approaches were utilized, the increase in accuracy over the baseline frequency was qualitatively similar. There could be several reasons for this limited increase in accuracy.
One possibility is that the existing variables (eg, previous PRRS stability status) were not recorded with sufficient accuracy, which would hinder model performance; reasons for this have been discussed previously.18 An additional consideration is that ORF-5 sequences, in conjunction with other basic variables, contain only enough information for a moderate increase in accuracy, suggesting that additional data are needed to improve it. This could include collecting more observations or sequencing other parts of the PRRSV genome or the entire genome. Alternatively, because this type of modeling focuses on predicting outcomes that combine disease impact and disease spread before interventions occur, other herd-level variables, such as host genetics and the population structure of herds, could also add to predictive performance.
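The two performance measures discussed above, the gain in accuracy over the frequency of the majority class and the sensitivity for high-impact outbreaks, can be computed as in the following sketch. The function names and the outbreak labels are hypothetical and are not study data; the sketch only makes the definitions explicit.

```python
from collections import Counter


def no_information_rate(y_true):
    """Accuracy obtained by always predicting the majority class."""
    return max(Counter(y_true).values()) / len(y_true)


def accuracy(y_true, y_pred):
    """Share of outbreaks whose predicted class matches the observed class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred, positive):
    """Share of truly `positive` outbreaks that were also predicted `positive`."""
    predicted = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not predicted:
        raise ValueError("y_true contains no cases of the positive class")
    return sum(p == positive for p in predicted) / len(predicted)


# Hypothetical labels for 10 outbreaks (illustrative only, not study data).
y_true = ["low"] * 6 + ["high"] * 4
y_pred = ["low"] * 5 + ["high"] * 3 + ["low"] * 2

nir = no_information_rate(y_true)
gain = accuracy(y_true, y_pred) - nir
print(f"no information rate: {nir:.2f}")
print(f"accuracy gain over it: {gain:.2f}")
print(f"sensitivity (high impact): {sensitivity(y_true, y_pred, 'high'):.2f}")
```

In this toy example, overall accuracy exceeds the majority-class baseline even though half of the high-impact outbreaks are missed, which is why accuracy alone is not sufficient to judge a model intended for field use.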
Viral genetics, herein represented by the 2 dimensions of the MCA, was consistently among the top 3 most important variables for both best-performing models. This corroborates the predictive potential of viral genetics concerning clinical impact.18 It could indicate that viruses differ genetically in their impact or potential impact on a farm, similar to previous findings.18 However, it should be noted that even though genetics was important overall when predicting clinical impact, it was not equally important across both outcomes. The degree to which genetics predicts clinical impact could therefore differ with the specific performance measure being examined, and future work should continue to examine genetics in conjunction with herd demographics for prognostic modeling of PRRS impact. Nonetheless, more data are needed to increase the robustness of model training, which could lead to improvement in model accuracy and higher confidence in the ranking of variable importance. Alternative approaches to sequence transformation should be explored, and the inclusion of sequences from wider geographic regions could increase the external validity of such findings. With widely available electronic production record keeping and regular sequencing of PRRS-positive samples in many swine-producing regions,18–20,52,53 this task should be more achievable now than at any previous time.
Other variables that were identified as important included herd size, season (represented by individual months), and herd type. Herd demographics are known to impact the severity and duration of a PRRS outbreak.54 Herd size was identified as an important predictor across both outcomes in the previous study.18 In particular, herd size was considered the most important variable when predicting abortion outcomes and the second most important when predicting preweaning outcomes, which is consistent with previous results.18 Partial dependence plots indicate that smaller herds were more likely to have higher clinical impacts.18 A possible explanation, reiterated here, is that herds with a small sow inventory and a high impact on production parameters would have a higher likelihood of entering the study than herds of comparable size but with a lower impact on production.18 This would lead to an overrepresentation of small herds with noticeable clinical impact relative to small herds with lower clinical impact, which could be more challenging to detect and include because of the small population. Similar to other studies,18 previous PRRS status, essentially representing PRRSV herd immunity, was less important in the analysis than anticipated. Although speculative, we believe this could be a consequence of a lack of standardized interpretation of PRRS stability or of measurement that is insufficient to accurately assess herd-level PRRSV immunity.
The main limitation of this study is associated with sample size. The number of farms included in this study was larger than in its predecessor18; however, it was still not large enough to allow separate training and testing data sets. Therefore, the modeling relied on internal validation by k-fold cross-validation. Future work can address this issue with larger data sets, which would also mitigate other limitations such as poorly differentiated outcomes.26 The sample size was also limited by data quality, as herds were excluded because of incomplete information, including missing data on outcome measures. This exclusion added to the potential selection bias. Standardized long-term surveillance programs would mitigate this issue in the future, along with limiting the potential for misclassification of herds regarding previous herd exposure to PRRSV.
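For readers unfamiliar with this form of internal validation, the following sketch shows how k-fold cross-validation partitions a small data set so that every herd is used once for testing and k - 1 times for training. It is an assumption-laden illustration, not the procedure used in the study: the function name `k_fold_indices`, the number of herds, the value of k, and the random seed are all hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]   # deal indices into k folds
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Hypothetical example: 10 herds split into 5 folds.
for train_idx, test_idx in k_fold_indices(n_samples=10, k=5):
    print("train:", train_idx, "test:", test_idx)
```

Each model is fitted on the training indices and evaluated on the held-out indices, and performance is then summarized across the k folds. This uses all available herds but does not replace evaluation on an independent test set.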
This study and its predecessor18 relied heavily on herds enrolled in the Area Regional Control and Elimination program, which is a voluntary surveillance program. In addition, some of the case-positive herds were excluded for several reasons, such as incomplete questionnaires. Both mechanisms could lead to selection bias. This potential for selection bias is not well understood, and it is unclear whether the magnitude of impact on the farm could have influenced the decision to participate. We could speculate that owners of herds with a high impact of PRRS on production would be interested in participation; however, they might also have higher demands on their time and resources while managing a severe outbreak. Conversely, outbreaks with minimal production impacts may have been excluded, leading to the omission of less virulent virus forms and potentially underestimating the effects of PRRSV clades or genotypes on production parameters.18 Either of these could limit the spectrum of severity of PRRSV impact in the study population relative to the source population. This could introduce further bias if the reason for participation is also associated with the PRRSV genome.
The present study was an extension of previous work18 with more robust genetic input data and a larger sample of farms. As such, we hypothesized that predictive models trained on ORF-5 nucleotide sequence and basic demographic data would be able to classify clinical impact on abortion and preweaning mortality with higher accuracy than the no information rate and improve upon previously attempted modeling efforts.18 The best-performing predictive models utilizing PRRSV ORF-5 nucleotide sequence data and basic demographic information increased predictive accuracy above the no information rate (frequency of the majority class).
Genetic information, represented by the 2 dimensions from MCA, was consistently among the top 3 most important variables, providing support for the value of PRRSV genetic data. Nevertheless, the increase in predictive accuracy was modest, which was consistent with other studies using similar approaches and the same source population yet different time periods.18,26 Furthermore, the most severe outbreaks were predicted at an unsatisfactory level.18 Ultimately, this hinders the utility of these models for the practical application of predicting the impact of PRRS in real time. Raw genetic data or their MCA representations, along with the basic demographic factors shown herein, may be insufficient for predicting clinical outcomes of PRRSV infection. Utilizing the whole genome, or different representations of genetic data, such as k-mers,48 could help facilitate this work and potentially ascertain the degree to which genetics plays a role in the impact of the virus. Further, for an accurate representation of the impact of PRRS, future work needs to utilize substantially larger data sets gathered through robust PRRSV surveillance. This would then also mitigate potential biases associated with demographic predictors. Overall, machine learning approaches show promise in the classification of PRRS impact, supporting our initial hypothesis, but their performance for this specific application needs to improve before they can be deployed in the field to assist practitioners in their decision-making related to the management of PRRS outbreaks.
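As an illustration of the k-mer representation mentioned above, the following sketch counts overlapping k-mers in a nucleotide sequence. It is a hypothetical example and was not part of this study: the function name `kmer_counts`, the choice of k, and the short sequence fragment are assumptions made for illustration, and windows containing ambiguous bases are simply skipped.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count overlapping k-mers, skipping windows with non-ACGT characters."""
    seq = sequence.upper()
    if k < 1 or k > len(seq):
        raise ValueError("k must be between 1 and the sequence length")
    counts = Counter()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if set(kmer) <= set("ACGT"):  # ignore ambiguous bases such as N
            counts[kmer] += 1
    return counts


# Made-up fragment for illustration only (not an ORF-5 sequence).
for kmer, count in sorted(kmer_counts("ATGATGC", k=3).items()):
    print(kmer, count)
```

The resulting counts could be used as numeric predictors in place of, or alongside, dimension-reduced representations such as MCA, at the cost of a feature space that grows quickly with k.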
Supplementary Materials
Supplementary materials are posted online at the journal website: avmajournals.avma.org.
Acknowledgments
None reported.
Disclosures
The authors have nothing to disclose. No AI-assisted technologies were used in the generation of this manuscript.
Funding
The Ontario Ministry of Agriculture, Food and Rural Affairs and a Natural Sciences and Engineering Research Council Discovery Grant supported this study financially.
ORCID
D. Melmer https://orcid.org/0000-0001-9215-0441
References
- 1.
Davies PR. One world, one health: the threat of emerging swine diseases. A North American perspective. Transbound Emerg Dis. 2012;59(suppl 1):18–26. doi:10.1111/j.1865-1682.2012.01312.x
- 2.
Murtaugh MP, Stadejek T, Abrahante JE, Lam TT, Leung FC. The ever-expanding diversity of porcine reproductive and respiratory syndrome virus. Virus Res. 2010;154(1–2):18–30. doi:10.1016/j.virusres.2010.08.015
- 3.
Keay S, Sargeant JM, O’Connor A, Friendship R, O’Sullivan T, Poljak Z. Veterinarian barriers to knowledge translation (KT) within the context of swine infectious disease research: an international survey of swine veterinarians. BMC Vet Res. 2020;16(1):416. doi:10.1186/s12917-020-02617-8
- 4.
Franzo G, Faustini G, Legnardi M, Cecchinato M, Drigo M, Tucciarone CM. Phylodynamic and phylogeographic reconstruction of porcine reproductive and respiratory syndrome virus (PRRSV) in Europe: patterns and determinants. Transbound Emerg Dis. 2022;69(5):e2175–e2184. doi:10.1111/tbed.14556
- 5.
Meulenberg JJ, Hulst MM, de Meijer EJ, et al. Lelystad virus, the causative agent of porcine epidemic abortion and respiratory syndrome (PEARS), is related to LDV and EAV. Virology. 1993;192(1):62–72. doi:10.1006/viro.1993.1008
- 6.
Kuhn JH, Lauck M, Bailey AL, et al. Reorganization and expansion of the nidoviral family Arteriviridae. Arch Virol. 2016;161(3):755–768. doi:10.1007/s00705-015-2672-z
- 7.
Walker PJ, Siddell SG, Lefkowitz EJ, et al. Changes to virus taxonomy and the statutes ratified by the International Committee on Taxonomy of Viruses (2020). Arch Virol. 2020;165(11):2737–2748. doi:10.1007/s00705-020-04752-x
- 8.
Zimmerman JJ, Dee SA, Holtkamp DJ, et al. Porcine reproductive and respiratory syndrome viruses (porcine arteriviruses). In: Zimmerman JJ, Karriker LA, Ramirez A, Schwartz KJ, Stevenson GW, Zhang J, eds. Diseases of Swine. 11th ed. Wiley-Blackwell; 2019:685–708.
- 9.
Alkhamis MA, Perez AM, Murtaugh MP, Wang X, Morrison RB. Applications of Bayesian phylodynamic methods in a recent U.S. porcine reproductive and respiratory syndrome virus outbreak. Front Microbiol. 2016;7:67. doi:10.3389/fmicb.2016.00067
- 10.
Zhang HL, Zhang WL, Xiang LR, et al. Emergence of novel porcine reproductive and respiratory syndrome viruses (ORF5 RFLP 1-7-4 viruses) in China. Vet Microbiol. 2018;222:105–108. doi:10.1016/j.vetmic.2018.06.017
- 11.
Larochelle R, D’Allaire S, Magar R. Molecular epidemiology of porcine reproductive and respiratory syndrome virus (PRRSV) in Québec. Virus Res. 2003;96(1–2):3–14. doi:10.1016/s0168-1702(03)00168-0
- 12.
Shi M, Lemey P, Singh Brar M, et al. The spread of type 2 porcine reproductive and respiratory syndrome virus (PRRSV) in North America: a phylogeographic approach. Virology. 2013;447(1–2):146–154. doi:10.1016/j.virol.2013.08.028
- 13.
Tian K, Yu X, Zhao T, et al. Emergence of fatal PRRSV variants: unparalleled outbreaks of atypical PRRS in China and molecular dissection of the unique hallmark. PLoS One. 2007;2(6):e526. doi:10.1371/journal.pone.0000526
- 14.
Kikuti M, Paploski IAD, Pamornchainavakul N, et al. Emergence of a new lineage 1C variant of porcine reproductive and respiratory syndrome virus 2 in the United States. Front Vet Sci. 2021;8:752938. doi:10.3389/fvets.2021.752938
- 15.
SHIC/AASV PRRS 1-4-4 lineage 1C webinar provides information on recent outbreaks. Swine Health Information Centre. 2021. Accessed August 7, 2022. https://www.swinehealth.org/shic-aasv-prrs-1-4-4-lineage-1c-webinar-provides-information-on-recent-outbreaks/
- 16.
Melmer DJ, Friendship R, O’Sullivan TL, et al. Classification of porcine reproductive and respiratory syndrome virus in Ontario using Bayesian phylogenetics and assessment of temporal trends. Can J Vet Res. 2021;85(2):83–92.
- 17.
Kvisgaard LK, Larsen LE, Hjulsager CK, et al. Genetic and biological characterization of a porcine reproductive and respiratory syndrome virus 2 (PRRSV-2) causing significant clinical disease in the field. Vet Microbiol. 2017;211:74–83. Published correction appears in Vet Microbiol. 2018;213:143–149. doi:10.1016/j.vetmic.2017.10.001
- 18.
Melmer DJ, O’Sullivan TL, Greer A, et al. The impact of porcine reproductive and respiratory syndrome virus (PRRSV) genotypes, established on the basis of ORF-5 nucleotide sequences, on three production parameters in Ontario sow farms. Prev Vet Med. 2021;189:105312. doi:10.1016/j.prevetmed.2021.105312
- 19.
Lambert M, Delisle B, Arsenault J, Poljak Z, D’Allaire S. Positioning Quebec ORF5 sequences of porcine reproductive and respiratory syndrome virus (PRRSV) within Canada and worldwide diversity. Infect Genet Evol. 2019;74:103999. doi:10.1016/j.meegid.2019.103999
- 20.
Lambert M, Arsenault J, Audet P, Delisle B, D’Allaire S. Evaluating an automated clustering approach in a perspective of ongoing surveillance of porcine reproductive and respiratory syndrome virus (PRRSV) field strains. Infect Genet Evol. 2019;73:295–305. doi:10.1016/j.meegid.2019.04.014
- 21.
Goldberg TL, Weigel RM, Hahn EC, Scherba G. Associations between genetics, farm characteristics and clinical disease in field outbreaks of porcine reproductive and respiratory syndrome virus. Prev Vet Med. 2000;43(4):293–302. doi:10.1016/s0167-5877(99)00104-x
- 22.
Rosendal T, Dewey C, Friendship R, Wootton S, Young B, Poljak Z. Association between the genetic similarity of the open reading frame 5 sequence of porcine reproductive and respiratory syndrome virus and the similarity in clinical signs of porcine reproductive and respiratory syndrome in Ontario swine herds. Can J Vet Res. 2014;78(4):250–259.
- 23.
Ontario Animal Health Network (OAHN) swine network quarterly producer/industry report. Ontario Animal Health Network. 2020. Accessed December 3, 2022. https://www.oahn.ca/reports/swine-producer-industry-report-q4-2020/
- 24.
Vandeweerd JM, Vandeweerd S, Gustin C, et al. Understanding veterinary practitioners’ decision-making process: implications for veterinary medical education. J Vet Med Educ. 2012;39(2):142–151. doi:10.3138/jvme.0911.098R1
- 26.
Chadha A, Dara R, Pearl DL, Gillis D, Rosendal T, Poljak Z. Classification of porcine reproductive and respiratory syndrome clinical impact in Ontario sow herds using machine learning approaches. Front Vet Sci. 2023;10:1175569. doi:10.3389/fvets.2023.1175569
- 27.
Nagpal S, Pinna NK, Pant N, Singh R, Srivastava D, Mande SS. Can machines learn the mutation signatures of SARS-CoV-2 and enable viral-genotype guided predictive prognosis? J Mol Biol. 2022;434(15):167684. doi:10.1016/j.jmb.2022.167684
- 28.
Sokhansanj BA, Rosen GL. Predicting COVID-19 disease severity from SARS-CoV-2 spike protein sequence by mixed effects machine learning. Comput Biol Med. 2022;149:105969. doi:10.1016/j.compbiomed.2022.105969
- 29.
Sokhansanj BA, Zhao Z, Rosen GL. Interpretable and predictive deep neural network modeling of the SARS-CoV-2 spike protein sequence to predict COVID-19 disease severity. Biology (Basel). 2022;11(12):1786. doi:10.3390/biology11121786
- 30.
Chadha A, Dara R, Pearl DL, Sharif S, Poljak Z. Predictive analysis for pathogenicity classification of H5Nx avian influenza strains using machine learning techniques. Prev Vet Med. 2023;216:105924. doi:10.1016/j.prevetmed.2023.105924
- 31.
Arruda AG, Poljak Z, Friendship R, Carpenter J, Hand K. Descriptive analysis and spatial epidemiology of porcine reproductive and respiratory syndrome (PRRS) for swine sites participating in area regional control and elimination programs from 3 regions of Ontario. Can J Vet Res. 2015;79(4):268–278.
- 32.
Holtkamp DJ, Morrison B, Classen DM, et al. Terminology for classifying swine herds by PRRS status. J Swine Health Prod. 2011;19(1):44–56.
- 33.
R Core Team. R: a Language and Environment for Statistical Computing. R Foundation for Statistical Computing. 2021. https://www.r-project.org/
- 34.
Wickham H, Averick M, Bryan J, et al. Welcome to the tidyverse. J Open Source Softw. 2019;4(43):1686. doi:10.21105/joss.01686
- 35.
Grolemund G, Wickham H. Dates and times made easy with lubridate. J Stat Softw. 2011;40(3):1–25. doi:10.18637/jss.v040.i03
- 36.
Højsgaard S, Halekoh U. doBy: Groupwise Statistics, LSmeans, Linear Contrasts, Utilities. 2016. http://cran.r-project.org/package=doBy
- 37.
Wickham H, Francois R, Henry L, Muller K. dplyr: a Grammar of Data Manipulation. 2020. https://cran.r-project.org/package=dplyr
- 38.
Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33(7):1870–1874. doi:10.1093/molbev/msw054
- 39.
Pages H, Aboyoun P, Gentleman R, Debroy R. Biostrings: Efficient Manipulation of Biological Strings. R Package, Version 2620. 2021. Accessed August 21, 2022. https://bioconductor.org/packages/Biostrings
- 40.
Vaissie P, Monge A, Husson F. Factoshiny: Perform Factorial Analysis from “FactoMineR” with a Shiny Application. R Package, Version 24. 2021. Accessed August 21, 2022. https://CRAN.R-project.org/package=Factoshiny
- 41.
Lê S, Josse J, Husson F. FactoMineR: an R package for multivariate analysis. J Stat Softw. 2008;25(1):1–18. doi:10.18637/jss.v025.i01
- 42.
Abdi H, Valentin D. Multiple correspondence analysis. In: Salkind N, ed. Encyclopedia of Measurement and Statistics. Vol 2. Sage; 2007. Accessed July 6, 2022. https://personal.utdallas.edu/~herve/Abdi-MCA2007-pretty.pdf
- 43.
Babayan SA, Orton RJ, Streicker DG. Predicting reservoir hosts and arthropod vectors from evolutionary signatures in RNA virus genomes. Science. 2018;362(6414):577–580. doi:10.1126/science.aap9072
- 44.
Valdes-Donoso P, VanderWaal K, Jarvis LS, Wayne SR, Perez AM. Using machine learning to predict swine movements within a regional program to improve control of infectious diseases in the US. Front Vet Sci. 2017;4:2. doi:10.3389/fvets.2017.00002
- 45.
Silva GS, Machado G, Baker KL, Holtkamp DJ, Linhares DCL. Machine-learning algorithms to identify key biosecurity practices and factors associated with breeding herds reporting PRRS outbreak. Prev Vet Med. 2019;171:104749. doi:10.1016/j.prevetmed.2019.104749
- 46.
Muthukrishnan R, Rohini R. LASSO: a feature selection technique in predictive modeling for machine learning. In: 2016 IEEE International Conference on Advances in Computer Applications (ICACA). Institute of Electrical and Electronics Engineers; 2017:18–20. Accessed July 7, 2022. doi:10.1109/ICACA.2016.7887916
- 47.
Ogutu JO, Schulz-Streeck T, Piepho HP. Genomic selection using regularized linear regression models: ridge regression, lasso, elastic net and their extensions. BMC Proc. 2012;6(suppl 2):S10. doi:10.1186/1753-6561-6-S2-S10
- 48.
Ren J, Ahlgren NA, Lu YY, Fuhrman JA, Sun F. VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data. Microbiome. 2017;5(1):69. doi:10.1186/s40168-017-0283-5
- 49.
Zhang M, Yang L, Ren J, Ahlgren NA, Fuhrman JA, Sun F. Prediction of virus-host infectious association by supervised learning methods. BMC Bioinformatics. 2017;18(suppl 3):60. doi:10.1186/s12859-017-1473-7
- 50.
Kim J, Lee K, Rupasinghe R, Rezaei S, Martínez-López B, Liu X. Applications of machine learning for the classification of porcine reproductive and respiratory syndrome virus sublineages using amino acid scores of ORF5 gene. Front Vet Sci. 2021;8:683134. doi:10.3389/fvets.2021.683134
- 51.
Bentéjac C, Csörgő A, Martínez-Muñoz G. A comparative analysis of gradient boosting algorithms. Artif Intell Rev. 2021;54:1937–1967.
- 52.
Arruda AG, Vilalta C, Puig P, Perez A, Alba A. Time-series analysis for porcine reproductive and respiratory syndrome in the United States. PLoS One. 2018;13(4):e0195282. doi:10.1371/journal.pone.0195282
- 53.
Alkhamis MA, Arruda AG, Morrison RB, Perez AM. Novel approaches for spatial and molecular surveillance of porcine reproductive and respiratory syndrome virus in the United States. Sci Rep. 2017;7:4343. doi:10.1038/s41598-017-04628-2
- 54.
Linhares DC, Cano JP, Torremorell M, Morrison RB. Comparison of time to PRRSv-stability and production losses between two exposure programs to control PRRSv in sow herds. Prev Vet Med. 2014;116(1–2):111–119. doi:10.1016/j.prevetmed.2014.05.010