Introduction
Veterinary medical education programs are academically intensive, and some students may experience academic difficulties resulting in delayed graduation, academic dismissal,1–3 or withdrawal due to poor academic performance.4 Specifically, veterinary students report difficulties in managing a heavy workload with complex content and experience additional nonacademic challenges such as financial worries and constraints, struggles with high expectations, declining physical health, challenging peer relationships, isolation, and lack of support.5,6 Evidence strongly indicates that when at-risk students are identified at the early stages of their academic career, targeted interventions can be most effective and have a long-lasting impact on academic achievement.7,8 Simple early interventions, such as having disadvantaged high school students read normative accounts from university students about how they adjusted to college or shared similar challenges (ie, not fitting in or making friends), increased full-time enrollment rates throughout students’ college pursuits and increased students’ grade point averages (GPAs).9
Current research in veterinary medical education focuses on preadmissions data to inform admissions committees on how best to select applicants who will succeed in completing the veterinary curriculum and graduating, reducing the educational debt of the prospective applicant while decreasing the cost to the institution.10 Specifically, the limited number of studies assessing preadmission predictors of veterinary academic success have found that undergraduate cumulative GPA, science GPA,1,2,10,11 the graduate record examination (GRE),11–13 and the demographic variable of age14 all positively correlate with veterinary school grades.
In recent years, many veterinary schools have begun to embrace holistic admission review processes, defined by the Association of American Medical Colleges as “mission-aligned admissions or selection processes that take into consideration applicants’ experiences, attributes, and academic metrics as well as the value an applicant would contribute to learning, practice, and teaching.”15 The adoption of holistic admission expands the applicant pool and, as such, increases the need to better understand predictors of success in veterinary academia.
It has been proposed that machine learning (ML), a subfield of artificial intelligence (AI), is an untapped methodology for college administrators and educators to identify students at risk of failing academically during the preclinical training years.16 While many ML algorithms exist, random forests have been suggested to be particularly useful in higher education, as they combine the outputs of many different decision trees into 1 overall optimal prediction model.16–18
The overarching goal of this study was to employ ML to help identify factors that predict which DVM candidates enrolled in a year-round veterinary professional program are at risk of academic failure. By identifying the factors most important for predicting academic success and failure, future studies can develop and assess targeted interventions, with prospective studies designed to determine whether these interventions successfully help identified at-risk students.19 A combination of accurate prediction models and successful interventions may contribute to the development of a diverse veterinary profession.
The results of a prior study16 that used simulated educational data suggest that the random forest ML model is a robust algorithm for veterinary student data, in part because it is resilient to outliers and handles nonlinear data. Therefore, we hypothesized that random forest classifier ML models could be used to successfully predict which students will graduate (or have graduated) from a year-round veterinary program and which will not.
The objectives of this study were to (1) build a random forest model that could be used to identify the most important pre- and postadmission determinants for predicting student graduation or withdrawal from the program and (2) assess whether these most important predictors change over a period of 10 years.
Methods
The Ross University School of Veterinary Medicine program overview
The Ross University School of Veterinary Medicine (RUSVM) curriculum is year-round and 40 months in duration. Admission to the program can occur in January, May, or August, and therefore, the year-round academic calendar extends across three 15-week semesters (fall, winter, and summer). Students are required to successfully complete 7 preclinical semesters on the RUSVM campus located on the island of Saint Kitts in the West Indies. Subsequently, students complete 3 clinical semesters at an affiliated AVMA-accredited school.
Selection of data
The RUSVM student data were sourced from an internal database, OutReachIQ, and included all students admitted to RUSVM from January 2013 through May 2022 for whom curricular outcomes were recorded, ensuring data integrity. All demographic data within OutReachIQ were obtained via the standardized application process through the Veterinary Medical College Application Service. The study was approved by the RUSVM Institutional Review Board (No. 22-03-XP).
Preparation and anonymization of data for analysis
All data preparation, model development, and model validation were completed as recommended by Hooper et al16 by use of Spyder (version 5.1.5; Spyder Website Contributors) and Python (version 3.9.12; Python Software Foundation) run within an Anaconda Distribution (version 2022.10)–designated environment.
Selection and anonymization of student records
Students enrolled in a combined dual-degree program (eg, DVM–Master of Science) were excluded from the dataset, as the structure of such programs can differ from the DVM curriculum. All DVM student records were assigned a randomized identification by use of the Python package NumPy (version 1.26.4) prior to anonymizing the dataset. Also, the numerical age at admission was categorized into the following age ranges to maintain the students’ confidentiality while concurrently ensuring an even distribution of students among the age categories: 17 to 22, 23 to 24, 25 to 26, 27 to 29, and 30 and over. Four categorical values were established for the country of origin (Canada, US, other, and not reported), taking into consideration that few students identified from countries outside of North America, once again ensuring confidentiality.
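As a minimal illustration of these anonymization steps, the sketch below uses hypothetical column names (“student_id” and “age”) rather than the actual OutReachIQ schema, which is not publicly released.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)

# Toy records standing in for the institutional export.
df = pd.DataFrame({
    "student_id": ["A001", "A002", "A003"],
    "age": [21, 26, 33],
})

# Replace institutional IDs with randomized identifiers.
df["random_id"] = rng.permutation(len(df))
df = df.drop(columns=["student_id"])

# Bin numerical age into the 5 categories used in the study.
bins = [0, 22, 24, 26, 29, np.inf]
labels = ["17-22", "23-24", "25-26", "27-29", "30+"]
df["age_range"] = pd.cut(df["age"], bins=bins, labels=labels)
df = df.drop(columns=["age"])
print(df)
```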
Preparation of predictors
States were grouped into categorical variables based on geographic regions as defined by the US Census Bureau.20 All categorical variables were one-hot encoded by use of the scikit-learn Python package (version 1.2.0).21 All raw and model demographic variables are summarized in Supplementary Tables S1 and S2.
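A minimal sketch of one-hot encoding with scikit-learn is shown below; the “region” column and its values are illustrative stand-ins for the study’s US Census Bureau region variable.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"region": ["Northeast", "South", "West", "South"]})

# One column per category, 1 where the category applies and 0 elsewhere.
encoder = OneHotEncoder(sparse_output=False, handle_unknown="ignore")
encoded = encoder.fit_transform(df[["region"]])

encoded_df = pd.DataFrame(encoded, columns=encoder.get_feature_names_out(["region"]))
print(encoded_df)
```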
Students admitted to RUSVM for winter, summer, or fall semesters were grouped by year for the admission date. The RUSVM veterinary preparation program (Vet Prep) was assigned 0, and preclinical students were assigned 1 through 7 according to their semester of enrollment. Ross University School of Veterinary Medicine students enrolled in their clinical year were all assigned the numerical value 10, as the database does not differentiate among the clinical semesters 8 through 10. Graduate record examination scores prior to August 2011 were scored on a scale from 200 to 800, and these were converted to the new GRE scale22 of 130 to 170. Starting in 2020, the GRE became optional for admission to RUSVM; therefore, this variable was not included in the overall model or the individual models for years 2020 to 2022. A leave of absence (LOA), defined as a minimum of 1 day and a maximum of 14 days out of classes, was considered 1 absence, and each LOA, regardless of its length, was added cumulatively. Awarded scholarships were captured cumulatively, given that over 182 different scholarships were represented. All raw and model academic variables are summarized in Supplementary Table S3. Whether students were on federal student aid through the Free Application for Federal Student Aid (FAFSA) and whether they had a financial aid or registration hold on their account were each designated a binary variable, with no represented by 0 and yes represented by 1. All raw and model financial aid variables are summarized in Supplementary Table S4.
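The sketch below condenses these encodings into runnable form; the column names and row values are placeholders, and the old-to-new GRE conversion (which relies on the published concordance table22) is omitted.

```python
import pandas as pd

# Hypothetical rows; "clinical" stands in for any of semesters 8-10.
df = pd.DataFrame({
    "semester": ["Vet Prep", "3", "7", "clinical"],
    "fafsa_on_file": ["yes", "no", "yes", "yes"],
})

def encode_semester(value: str) -> int:
    """Vet Prep -> 0, preclinical semesters -> 1-7, clinical year -> 10."""
    if value == "Vet Prep":
        return 0
    if value == "clinical":
        return 10  # semesters 8-10 are not differentiated in the database
    return int(value)

df["semester_code"] = df["semester"].map(encode_semester)

# Binary encoding (0 = no, 1 = yes); financial aid and registration holds
# were encoded the same way.
df["fafsa_code"] = (df["fafsa_on_file"] == "yes").astype(int)
print(df)
```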
Target variable creation
After the dataset was anonymized and cleaned, a categorical (binary) target variable, “graduated/active student or not graduated,” was created for all student records. Accordingly, the numerical value 0 represented a student who enrolled at RUSVM but did not complete the DVM program, while 1 designated a student who graduated with their DVM degree from RUSVM’s accelerated program or was actively enrolled in RUSVM’s DVM program. We did not include students on an LOA, since we did not know whether they would return to the DVM program and/or graduate.
We accounted for the severe outcome imbalance observed, due to the large percentage of admitted students graduating versus the very low percentage not graduating, by applying the synthetic minority oversampling technique to balance the dataset. Correlations for all variables were assessed by use of the Pearson correlation coefficient and Cramér V. Variables with a Pearson correlation coefficient or Cramér V over 0.7 were reduced to 1 variable. Citizenship and origin variables were not reduced, as our student population has a high percentage of minority students whose citizenship may differ from their country of origin. Upon creation of the target variable, a total of 11 individual datasets were formed: 1 dataset containing all student records for a model assessing all years concurrently and a dataset for each individual year. Each individual dataset was divided into training and testing sets by use of a 70:30 ratio.
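A minimal sketch of the balancing and splitting steps is shown below, using the imbalanced-learn implementation of the synthetic minority oversampling technique on simulated data (the correlation-based variable reduction is omitted); y = 1 stands for graduated/active and y = 0 for not graduated.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Simulated stand-in for one anonymized dataset, with roughly a 9:1
# imbalance favoring the graduated/active class (y = 1).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.1, 0.9], random_state=42
)

# Oversample the minority class so both outcomes are equally represented.
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X, y)

# 70:30 train/test split, as used for each of the 11 datasets.
X_train, X_test, y_train, y_test = train_test_split(
    X_bal, y_bal, test_size=0.30, random_state=42
)
print(np.bincount(y_bal))  # balanced class counts after oversampling
```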
Random forest classification models
A total of 11 random forest classifier models were developed by use of the scikit-learn Python package.21 Model performance was k-fold cross-validated (k = 10) by use of the testing dataset. Subsequently, after creation of each of the 11 random forest classification models, the hyperparameters were tuned, as previously described.16 The performance of the default-parameter random forest classification model was compared to that of the random forest classification model with hyperparameter tuning. The better model was selected for assessing the most important predictor features based on the Gini impurity criterion, which is considered the most common method for determining features of importance, as explained in Hooper et al.16 All selected values for each year’s model are summarized in Supplementary Table S5.
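The sketch below illustrates this workflow on simulated data; the cross-validated grid search and the small hyperparameter grid shown are illustrative choices, not the tuned values reported in Supplementary Table S5.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# Default-parameter model (Gini impurity is scikit-learn's default criterion).
baseline = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Hyperparameter tuning via cross-validated grid search.
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [None, 10],
    "min_samples_split": [2, 5],
}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=10)
search.fit(X_train, y_train)

# Compare the default and tuned models on held-out data; keep the better one.
best = max([baseline, search.best_estimator_], key=lambda m: m.score(X_test, y_test))

# Gini-based feature importances from the selected model.
print(sorted(zip(best.feature_importances_, range(X.shape[1])), reverse=True)[:5])
```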
Performance metrics
For each model, overall accuracy, sensitivity, specificity, precision, F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC) were calculated using k-fold cross-validation (k = 10), as previously described.16 The complete code for building a random forest model, performance metrics, and validation is available.23 An example of simulated student data is also provided within the same repository, as Institutional Review Board restrictions and confidentiality concerns prohibit the public release of the raw student data.
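For readers who do not wish to consult the repository, the sketch below shows one way to obtain these metrics with 10-fold cross-validation on simulated data; because scikit-learn has no built-in specificity scorer, specificity is computed as the recall of the negative class.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

scoring = {
    "accuracy": "accuracy",
    "sensitivity": "recall",  # recall of the positive (graduated/active) class
    "specificity": make_scorer(recall_score, pos_label=0),
    "precision": "precision",
    "f1": "f1",
    "roc_auc": "roc_auc",
}

results = cross_validate(
    RandomForestClassifier(random_state=42), X, y, cv=10, scoring=scoring
)
for name in scoring:
    print(name, results[f"test_{name}"].mean().round(3))
```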
Results
Selection of data
After eliminating records of dual-degree students and students on an LOA and applying the synthetic minority oversampling technique to balance the dataset, a total of 8,090 student records were utilized for creation and validation of the random forest classification models. Student semester and completed credits were highly correlated (r = 0.98), and preclinical GPA was highly correlated with cumulative GPA (r = 0.99). While random forests tolerate correlated predictors, near-perfect correlations distort the resulting feature importance estimates, so only student semester was retained, as all students complete the same preclinical curriculum. We retained only the preclinical cumulative GPA and excluded both the clinical and overall cumulative GPA due to variability in veterinary colleges’ grading schemes. Citizenship and country of origin were retained in the models because these variables had low feature importance, and any resulting inflation of feature importance scores appeared to be minimal.
Years 2013 to 2022 random forest model
The selected hyperparameters for the best performing random forest model utilizing all student data are reported in Supplementary Table S5.
The overall model that incorporated all data from 2013 to 2022 performed very well to excellent across all validation methods employed. Our model had an excellent ROC AUC score of 0.998 and an excellent F1 score of 0.989 (Table 1). Additionally, our model had high accuracy, with 98.9% of the predictions correctly classified. The recall (sensitivity) was 99.1%, meaning that the model correctly identified 99.1% of the students who truly graduated from RUSVM or were actively enrolled. The specificity was 98.8%, meaning that the model correctly identified 98.8% of the students who did not graduate from RUSVM. The k-fold cross-validation results (k = 10) showed that the model performed well on any given set of data rather than only on the original testing data.
Table 1—Performance metric results for the overall random forest model with data from 2013 to 2022 and each individual year model.
Model | ROC AUC | Accuracy | Recall (sensitivity) | Specificity | Precision | F1 |
---|---|---|---|---|---|---|
2013–2022 | 0.998 | 0.989 | 0.991 | 0.988 | 0.987 | 0.989 |
2013 | 0.998 | 0.992 | 0.994 | 0.989 | 0.990 | 0.993 |
2014 | 0.998 | 0.995 | 0.998 | 0.989 | 0.990 | 0.995 |
2015 | 0.997 | 0.982 | 0.990 | 0.831 | 0.981 | 0.990 |
2016 | 0.994 | 0.992 | 0.998 | 0.995 | 0.995 | 0.995 |
2017 | 0.993 | 0.983 | 0.989 | 0.780 | 0.984 | 0.990 |
2018 | 0.992 | 0.977 | 0.988 | 0.893 | 0.977 | 0.983 |
2019 | 0.998 | 0.978 | 0.985 | 0.947 | 0.989 | 0.987 |
2020 | 0.981 | 0.960 | 0.988 | 0.892 | 0.959 | 0.971 |
2021 | 0.994 | 0.961 | 0.976 | 0.931 | 0.966 | 0.978 |
2022 | 0.997 | 0.963 | 0.988 | 0.767 | 0.969 | 0.977 |
AUC = Area under the curve. ROC = Receiver operating characteristic.
Most important features
Academic and financial features were identified as contributing the most to model predictions. The top 4 features contributing most to the prediction of our target variable, based on Gini impurity criterion values (reported in parentheses), were student semester (0.282), preclinical cumulative GPA (0.182), failed credits (0.117), and curriculum phase (0.101). All demographic features contributed < 0.012 to the final model. Figure 1 shows the top 10 features. All variables except GRE were included in the final model, and feature reduction methods were not used to reduce the dimensionality of the dataset, consistent with the goals of this study. Supplementary Table S6 shows all features ranked in descending order of Gini importance.
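A self-contained sketch of producing such a descending importance ranking is shown below; the feature names are simulated placeholders, not the study variables.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Simulated design matrix with placeholder feature names.
X, y = make_classification(n_samples=500, n_features=6, random_state=42)
names = [f"feature_{i}" for i in range(X.shape[1])]

model = RandomForestClassifier(random_state=42).fit(X, y)
ranking = pd.Series(model.feature_importances_, index=names).sort_values(
    ascending=False
)
print(ranking)  # descending Gini importance, analogous to Supplementary Table S6
```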
Demographic data
None of the demographic variables were identified among the top important features of the model. Supplementary Table S6 shows that the age-range variables spanning 23 to 29 years were more important predictors in the model than the ethnicity/race variables.
Academic data
Most withdrawals occurred during the preclinical training phase of the curriculum. Graduated and actively enrolled students had a higher mean GPA compared to those who did not graduate. Students who graduated or were actively enrolled failed fewer credits and had a lower mean LOA count.
Financial aid data
Over half of the students who did not graduate did not have an active FAFSA during their last semester. Eighty-seven percent of those who were actively enrolled had an active FAFSA and a larger average outstanding account balance ($5,585 ± $10,069). Graduated and active students received nearly twice as many scholarships as those who did not graduate.
Individual year random forest models
Preclinical cumulative GPA, student semester, and failed credits were included among the most important features in all yearly random forest models. In years where students had entered their clinical year, curriculum phase was a top contributor, with students more likely to graduate if they entered the clinical phase of the RUSVM curriculum. The GRE was not identified as an important feature in the yearly models that incorporated it as a variable; its contribution was very minor, with the largest value being 0.025 in 2019. Only the 2013 and 2022 models had a demographic variable in the top 5 most important variables. The 2013 model identified 1 demographic variable of race (White) as fifth most important, with a value of 0.014, and the 2022 model identified 1 demographic variable of origin nation (US) as fifth most important, with a value of 0.024. Other important features identified included financial aid holds, whether students had an active FAFSA on file, the number of LOAs, registration holds, scholarships, and the total outstanding balance. Nearly all the random forest models had only 3 to 4 variables with feature importance values > 0.10. Figure 2 summarizes the top 5 model features by year. Supplementary Tables S7–S16 show the feature importance ranking for all features.
Discussion
With the ever-changing dynamics of the veterinary student population, there is a need to develop effective methodologies to identify DVM candidates early in their educational pursuits who are at risk of academic difficulties. A recently published ML primer16 suggested that ML algorithms could be used to answer hypothesis-driven questions within the veterinary medical education field. Using this prior study as a launch point, we constructed random forest classification models, a type of ML, to identify potential risk factors for students who experienced academic difficulties while in a veterinary professional program. The study’s overarching goal was to identify variables that should be incorporated into future ML prediction models designed to identify students at risk of academic difficulties. Identifying at-risk students early within their veterinary education enables veterinary colleges to develop targeted interventions for these students prior to academic dismissal.
Our study results supported that the yearly random forest classifier models were able to successfully differentiate students who either graduated or were active students in a year-round veterinary medical education training program from those who were not successful in completing the accelerated DVM program. While we used all data available from OutReachIQ, it is important to recognize that the RUSVM student population may not represent the student population at other veterinary colleges, and additional or different variables should be assessed. The work we present here provides a framework that other veterinary colleges can adapt to their specific student populations. For example, approximately 35% of RUSVM applicants admitted each year identify as underrepresented minorities, so we incorporated race and ethnicity into a single model. Other colleges must consider whether incorporating race and ethnicity into a single model is appropriate or whether creating separate models for underrepresented minority students is necessary.
Additional sources of data may need to be incorporated based on the curriculum. Ross University School of Veterinary Medicine follows a traditional, discipline-based curriculum, whereas other veterinary colleges deliver an integrated, systems-based approach. Data sources relevant to RUSVM and other veterinary programs may include learning management system data (eg, Canvas), Veterinary Medical College Application Service data, and in-house student surveys. These data sources offer a combination of academic and nonacademic factors.
Our models performed with excellent accuracy and sensitivity and had good to excellent specificity, with the lowest specificity occurring in the 2022 admission year model (Table 1). The lower specificity (ie, the ability of our model to correctly classify students who did not graduate) in 2017 and 2022 could be explained by a large portion of student withdrawals being related to transferring to a non–year-round DVM program or to delaying education during the COVID-19 pandemic, rather than withdrawing due to failing multiple courses. A limitation of our study was that OutReachIQ did not include information on why a student withdrew from the program, as the institutional withdrawal form does not require students to provide a reason. We suggest that veterinary training programs consider requiring students to specify their reason for withdrawal on the form or, alternatively, conduct exit interviews. This information could be incorporated into future studies to help identify additional academic and nonacademic factors that influence withdrawals.
The lower specificity in 2022 may also reflect that students were less well prepared for a rigorous veterinary school curriculum than students enrolled prior to the COVID-19 pandemic. We acknowledge that the majority of students admitted in 2022 completed their veterinary school prerequisites remotely during the COVID-19 pandemic and that a significant portion of college students reported encountering serious challenges with online learning, with nearly half reporting the belief that their academic performance declined.24 The European Union has recognized some of these issues in higher education and identified that “adapting assessment processes to safeguard quality standards and academic integrity in the context of online learning” is an area that needs to be urgently addressed.24
Adjusting the model hyperparameters (Supplementary Table S5) is just 1 approach for improving model performance. We did not include additional nonacademic variables because our chosen variables were consistently collected across all reported years, which was necessary to assess changes over time. Specificity might have been improved by adding nonacademic variables, removing less-relevant variables (feature reduction),16 or modifying variables, such as distinguishing between in-person and online courses. Future studies can address external factors that were not consistent or anticipated (eg, the COVID-19 pandemic) over the years of study.
As outlined in Table 1, all our random forest models performed excellently with very limited feature reduction, which further supports our hypothesis that random forest classifier models can successfully predict students who will be academically successful in an accelerated DVM program and those at risk of being unsuccessful in completing it. Feature reduction is commonly performed by data scientists to improve prediction performance while reducing computation time and allowing a better understanding of the data by reducing the complexity of the model.7,25 Our team decided not to reduce the number of features or variables in our models for the following 2 reasons. First, our models had high performance without feature reduction and no evidence of overfitting. Second, it is equally important for veterinary educators to understand which features are of minimal importance. Understanding all potential features, ranging from most to least important, could help inform admissions policies, such as whether GRE scores should be a required prerequisite for admission, and will influence the development of programs designed to minimize academic difficulties and dismissals due to poor performance.8 This is particularly important because ML models can identify students prior to the start of the first semester or in the early preclinical semesters, when intervention will be most effective at reducing academic difficulty and withdrawals or dismissals.7,8,26–28
Random forests are a robust prediction model. Prediction models are typically divided into classification models29 or regression models.30 We chose a classification model because we defined a successful student as one who graduated with their DVM degree or was currently enrolled as an active student at RUSVM, rather than using the traditional academic achievement measure of GPA.31 We acknowledge that it is unknown whether high performance in medical or veterinary school reflects how good a doctor a student becomes.32,33 Nor is it known how veterinary student GPA in preclinical courses correlates with a student’s ability to successfully perform all the AVMA Council on Education–specific clinical competencies expected of entry-level practitioners.34,35
Our dataset incorporated a combination of pre- and postadmission factors, including academic, financial, and demographic predictors. These predictors were both categorical variables, such as race or country of origin, and continuous variables, such as cumulative GPA or GRE score. By one-hot encoding the categorical variables, the random forest was able to use both categorical and continuous variables without imposing an artificial ordinal ranking that could bias our results. Random forests are not highly susceptible to outliers, and with this robustness,36,37 all student data with complete records could be incorporated. Additionally, random forests do not require feature scaling, meaning we did not need to standardize or normalize the variables prior to creating the models.38 Furthermore, random forests handle high nonlinearity between independent variables, as nonlinear relationships typically do not affect the performance of decision tree models: decision nodes split on threshold conditions (yes or no), so branches do not assume a linear relationship with the numerical value of the feature.18,36–39 We described our data preparation in detail to ensure that the data were accurate and unbiased. The disadvantages of using random forest models relate mostly to the time required to train the model and the computational power and resources required to create the many decision trees that compose the random forest.38 Other ML algorithms, such as support vector machines, may perform similarly to random forests while requiring less computational power and merit exploration.
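As an exploratory illustration of that comparison (not part of the study’s analysis), the sketch below cross-validates a random forest against a support vector machine on simulated data; note that, unlike random forests, support vector machines are scale sensitive, so their features are standardized first.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

rf = RandomForestClassifier(random_state=42)
# SVMs require feature scaling, so standardization is built into the pipeline.
svm = make_pipeline(StandardScaler(), SVC(random_state=42))

print("RF  ROC AUC:", cross_val_score(rf, X, y, cv=10, scoring="roc_auc").mean().round(3))
print("SVM ROC AUC:", cross_val_score(svm, X, y, cv=10, scoring="roc_auc").mean().round(3))
```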
Only academic and financial variables were consistently identified as important predictors in both the overall and yearly random forest models. All models consistently showed that students with higher preclinical cumulative GPAs were more likely to be successful. This is consistent with undergraduate cumulative GPA and science GPA being important predictors of veterinary student success.1,2,10,11 Additionally, as students progressed through the DVM degree program, they were more likely to graduate, and those with fewer failed credits were more likely to graduate and not be dismissed. Two studies,11,12 1 published in 2004 and 1 published in 2020, correlated GRE scores with veterinary student success.
Only beginning in 2016 was the GRE recognized as even a minor variable in our yearly models. The GRE-verbal and GRE-quantitative scores have not increased significantly over time,40 whereas overall GPAs in higher education have increased over time. It is unclear whether this increase in cumulative GPAs is due to factors such as improved assessment literacy41 or “grade inflation” resulting from faculty needing positive student evaluations for promotion and tenure.41 This suggests that covariates contributing to higher GPAs should be assessed further in future studies.
Key financial variables were considered, including outstanding balances, whether students had an active FAFSA on file, and scholarships, because veterinary students report financial worries and constraints as nonacademic challenges negatively impacting their academic performance.5,6 Total outstanding balance as well as whether a student had a FAFSA on file were both identified among the top 10 most important features for the overall model. Total outstanding balance was also identified as one of the most important factors in each of the past 7 yearly models. Active students were more likely to have a larger outstanding balance than those who did not successfully graduate. Interestingly, when the linear relationship of preclinical cumulative GPA and total balance was assessed with the Pearson correlation coefficient, there was no correlation (r = –0.002).
In conclusion, study results supported the exploratory use of random forest classifier models in accurately predicting students at risk as well as those who were academically successful. Future studies should assess the use of random forest models for early prediction and identification of students in their veterinary education and explore what types of interventions can be most effective in minimizing dismissals due to poor academic performance. Additionally, there is a research gap regarding best practices for using ML in evaluating veterinary student applicants and admission processes. These guidelines should be established before exploring variables like mental health status and social determinants to ensure that explicit and implicit bias is avoided. This is particularly important when assessing how such variables might contribute to students’ overall GPA, which was consistently identified as a key factor in student success. Just as important is the communication of the findings to the practicing veterinary community, as veterinarians could be invaluable to social belonging and other key intervention strategies for veterinary students.
Supplementary Materials
Supplementary materials are posted online at the journal website: avmajournals.avma.org.
Acknowledgments
None reported.
Disclosures
The authors have nothing to disclose. No AI-assisted technologies were used in the generation of this manuscript.
Funding
Funding for this study was provided by the Ross University School of Veterinary Medicine Research Center for Veterinary Education, Diversity, and Data Analytics and Ross University School of Veterinary Medicine intramural grant 44019-2023.
ORCID
S. E. Hooper https://orcid.org/0000-0003-1150-2500
N. Ragland https://orcid.org/0000-0002-7959-4803
E. Artemiou https://orcid.org/0000-0001-7308-2316
References
1. Van Vertloo LR, Burzette RG, Danielson JA. Predicting academic difficulty in veterinary medicine: a case-control study. J Vet Med Educ. 2022;49(4):524-530. doi:10.3138/jvme-2021-0034
2. Rush BR, Sanderson MW, Elmore RG. Pre-matriculation indicators of academic difficulty during veterinary school. J Vet Med Educ. 2005;32(4):517-522. doi:10.3138/jvme.32.4.517
3. Raidal SL, Lord J, Hayes LM, Hyams J, Lievaart J. Student selection to a rural veterinary school, I: applicant demographics and predictors of success within the application process. Aust Vet J. 2019;97(6):175-184. doi:10.1111/avj.12820
4. OutReachIQ Database. Ross University School of Veterinary Medicine; version XXXX. Accessed August 1, 2022.
5. Liu AR, van Gelderen IF. A systematic review of mental health-improving interventions in veterinary students. J Vet Med Educ. 2020;47(6):745-758. doi:10.3138/jvme.2018-0012
6. Hafen M Jr, Drake AS, Elmore RG. Predictors of psychological well-being among veterinary medical students. J Vet Med Educ. 2022;50(3):297-304. doi:10.3138/jvme-2021-0133
7. Alyahyan E, Düştegör D. Predicting academic success in higher education: literature review and best practices. Int J Educ Technol High Educ. 2020;17(1):3. doi:10.1186/s41239-020-0177-7
8. Harackiewicz JM, Priniski SJ. Improving student outcomes in higher education: the science of targeted intervention. Annu Rev Psychol. 2018;69:409-435. doi:10.1146/annurev-psych-122216-011725
9. Yeager DS, Walton GM, Brady ST, et al. Teaching a lay theory before college narrows achievement gaps at scale. Proc Natl Acad Sci USA. 2016;113(24):E3341-E3348. doi:10.1073/pnas.1524360113
10. Confer AW, Lorenz MD. Pre-professional institutional influence on predictors of first-year academic performance in a veterinary college. J Vet Med Educ. 1999;26:16-20.
11. Danielson JA, Burzette RG. GRE and undergraduate GPA as predictors of veterinary medical school grade point average, VEA scores and NAVLE scores while accounting for range restriction. Front Vet Sci. 2020;7:576354. doi:10.3389/fvets.2020.576354
12. Powers DE. Validity of Graduate Record Examinations (GRE) general test scores for admissions to colleges of veterinary medicine. J Appl Psychol. 2004;89(2):208-219. doi:10.1037/0021-9010.89.2.208
13. Holladay SD, Gogal RM, Karpen S. Brief communication: predictive value of veterinary student application data for performance in clinical year 4. J Vet Med Educ. 2022;49(6):748-750. doi:10.3138/jvme-2021-0012
14. Green WH, Watson SE, Kennedy GA, Miceli CA, Taboada J. Forecasting veterinary school admission probabilities for undergraduate student profiles. J Vet Med Educ. 2006;33(3):441-446. doi:10.3138/jvme.33.3.441
15. Holistic review. Association of American Medical Colleges. Accessed July 17, 2024. https://www.aamc.org/services/member-capacity-building/holistic-review
16. Hooper SE, Hecker KG, Artemiou E. Using machine learning in veterinary medical education: an introduction for veterinary medicine educators. Vet Sci. 2023;10(9):537. doi:10.3390/vetsci10090537
17. He L, Levine RA, Fan J, Beemer J, Stronach J. Random forest as a predictive analytics alternative to regression in institutional research. Pract Assess Res Eval. 2018;23:1.
18. Spoon K, Beemer J, Whitmer JC, et al. Random forests for evaluating pedagogy and informing personalized learning. J Educ Data Mining. 2016;8(2):20-50. doi:10.5281/zenodo.3554595
19. Doleck T, Lajoie S. Social networking and academic performance: a review. Educ Inf Technol (Dordr). 2018;23(1):435-465. doi:10.1007/s10639-017-9612-3
20. Geographic levels. US Census Bureau. Accessed July 16, 2024. https://www.census.gov/programs-surveys/economic-census/guidance-geographies/levels.html
21. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12(85):2825-2830.
22. Old GRE to new GRE comparison. Prosper Overseas. Accessed July 16, 2024. http://www.prosperoverseas.com/old-and-new-gre-comparison-tool
23. Random forest machine learning models for veterinary student recruitment and retention. MicroBatVet. GitHub Inc; 2024. https://github.com/RUSVMCenter4/Random-Forest-Machine-Learning-Models-for-Veterinary-Student-Recruitment-and-Retention. doi:10.5281/zenodo.14051066
24. Farnell T, Skledar Matijević A, Šćukanec Schmidt N; European Commission: Directorate-General for Education, Youth, Sport and Culture; Public Policy and Management Institute. The Impact of COVID-19 on Higher Education: A Review of Emerging Evidence. Publications Office of the European Union; 2021.
25. Chandrashekar G, Sahin F. A survey on feature selection methods. Comput Electr Eng. 2014;40(1):16-28. doi:10.1016/j.compeleceng.2013.11.024
26. Calvet Liñán L, Juan Pérez ÁA. Educational data mining and learning analytics: differences, similarities, and time evolution. Int J Educ Technol High Educ. 2015;12:98-112. doi:10.7238/rusc.v12i3.2515
27. Algarni A. Data mining in education. Int J Adv Comput Sci Appl. 2016;7(6):456-461. doi:10.14569/IJACSA.2016.070659
28. Larusson JA, White B, eds. Learning Analytics: From Research to Practice. Springer; 2014. doi:10.1007/978-1-4614-3305-7
29. Umadevi S, Marseline KSJ. A survey on data mining classification algorithms. In: 2017 International Conference on Signal Processing and Communication. Institute of Electrical and Electronics Engineers; 2017:264-268. doi:10.1109/CSPC.2017.8305851
30. Bragança R, Portela F, Santos M. A regression data mining approach in Lean Production. Concurr Comput. 2019;31(22):e4449. doi:10.1002/cpe.4449
31. Parker JDA, Summerfeldt LJ, Hogan MJ, Majeski SA. Emotional intelligence and academic success: examining the transition from high school to university. Pers Individ Dif. 2004;36(1):163-172. doi:10.1016/S0191-8869(03)00076-X
32. Hudson NPH, Rhind SM, Mellanby RJ, Giannopoulos GM, Dalziel L, Shaw DJ. Success at veterinary school: evaluating the influence of intake variables on year-1 examination performance. J Vet Med Educ. 2020;47(2):218-229. doi:10.3138/jvme.0418-042r
33. Cleland J, Dowell J, McLachlan J, Nicholson S, Patterson F. Identifying Best Practice in the Selection of Medical Students (Literature Review and Interview Survey). General Medical Council; 2012.
34. Chaney KP, Hodgson JL, Banse HE, et al; AAVMC Council on Outcomes-based Veterinary Education. Competency-Based Veterinary Education: CBVE 2.0 Model. American Association of Veterinary Medical Colleges; 2024.
35. Molgaard LK, Hodgson JL, Bok HGJ, et al; AAVMC Working Group on Competency-Based Veterinary Education. Competency-Based Veterinary Education: Part 1 - CBVE Framework. American Association of Veterinary Medical Colleges; 2018.
36. Cutler A, Cutler DR, Stevens JR. Random forests. In: Zhang C, Ma Y, eds. Ensemble Machine Learning: Methods and Applications. Springer; 2012:157-175.
37. Horning N. Random forests: an algorithm for image classification and generation of continuous fields data sets. Presented at: International Conference on Geoinformatics for Spatial Infrastructure Development in Earth and Allied Sciences; December 9-11, 2010; Hanoi, Vietnam. Accessed October 22, 2022. https://www.geoinfo-lab.org/gisideas10/papers/Random%20Forests%20%20An%20algorithm%20for%20image%20classification%20and%20generation%20of.pdf
38. Kumar N. Advantages and disadvantages of random forest algorithm in machine learning. The Professionals Point. February 23, 2019. Accessed July 17, 2024. https://theprofessionalspoint.blogspot.com/2019/02/advantages-and-disadvantages-of-random.html
39. Louppe G. Understanding random forests: from theory to practice. ArXiv. Preprint posted online July 28, 2014. Revised June 3, 2015. Accessed October 22, 2022. doi:10.48550/arXiv.1407.7502
40. Bleske-Rechek A, Browne K. Trends in GRE scores and graduate enrollments by gender and ethnicity. Intelligence. 2014;46:25-34. doi:10.1016/j.intell.2014.05.005
41. Stroebe W. Student evaluations of teaching encourages poor teaching and contributes to grade inflation: a theoretical and empirical analysis. Basic Appl Soc Psych. 2020;42(4):276-294. doi:10.1080/01973533.2020.1756817