Graduates from programs accredited by the AVMA Council on Education or individuals who complete an equivalent program must pass the NAVLE before they may practice veterinary medicine in the United States.1 In addition, fourth-year veterinary students are eligible to complete the NAVLE, provided their anticipated graduation date is within 10 months after the testing period. The NAVLE, overseen by the International Council for Veterinary Assessment, consists of 300 scored items relevant to an array of small and large animal species. Candidates are assessed according to their ability to correctly answer multiple-choice questions reflecting skills in data gathering, data interpretation, health maintenance, and problem management.2 Successful completion of the NAVLE is intended to reflect a minimum standard of readiness for clinical practice.
Students and graduates of human medical programs experience a different path to initial licensure. The USMLE is a 3-part assessment administered by the Federation of State Medical Boards and the National Board of Medical Examiners. The Step 1 portion of the USMLE is a multiple-choice examination designed to test knowledge of the basic sciences foundational to health, disease, and patient management. Emphasis is placed on interpretation, problem solving, and knowledge application.3 Medical students typically complete the assessment at the end of their second year, and they are required to pass the Step 1 examination prior to progression to the clinical (third and fourth) years of the program. The Step 2 portion of the USMLE is divided into multiple-choice assessment of clinical knowledge and interactive assessment of clinical skills. The clinical knowledge portion emphasizes “principles of clinical science that are deemed important for the practice of medicine under supervision in postgraduate training” and is designed to test knowledge related to health and disease of body systems as well as competencies (eg, diagnosis, patient management, and professionalism).4 The Step 2 clinical skills examination is administered at a centralized testing center and incorporates standardized patient encounters to assess skills in information gathering, physical examination, and communication.5 Step 2 is completed early in the fourth year of medical training in anticipation of progression to residency positions. Step 3 of the USMLE is completed after the first year of postgraduate training and is aimed at assessing skills in applying biomedical and clinical sciences principles to patient management in the absence of supervision.6
Inspired by the USMLE, and in response to a drop in NAVLE pass rates among CSU students, a desire for greater curriculum integration, and the perception by faculty that students often fail to retain information from year to year, administrators and faculty members at CSU implemented the CS program. The current form of the CS program, which is the subject of this report, consists of 3 examinations (CSI, CSII, and CSIII). The CSI is completed by students entering the second year of the DVM program, the CSII is completed by students entering the third year, and the CSIII is completed during the week in May immediately preceding the start of the fourth year. The CSI and CSII are each composed of 2 parts: a case-based, open-resource online examination and an in-class, closed-resource multiple-choice examination. The online portions of both the CSI and CSII are completed over approximately 12 weeks in the summer, whereas the in-class portions of both the CSI and CSII are completed on the first day of the fall semester. The CSIII, which is a closed-resource examination, is entirely case-based and is delivered via an online LMS.a
We hypothesized that implementation of the CS examinations, in their current form, would be associated with enhanced student performance after the first year and improved NAVLE performance. Underlying these hypotheses were 2 educational theories, both of which are addressed by the CS program. First, a growing body of literature indicates that testing is a highly effective teaching tool7,8 that enhances learning and retention far better than simply reviewing material. Second, repeated, spaced exposure to material in all stages of human learning,9 including in medical education,10,11 helps reduce the steepness of the “forgetting curve.” By requiring students to reengage with material to which they had been exposed in the previous year and testing them on this material, the CS examinations bring both of these well-validated educational principles into practice in the veterinary curriculum.
The present report provides data relevant to the aforementioned hypotheses and additional outcomes related to learner, faculty, and programmatic aims of the CSI and CSII. Additionally, information is provided regarding the evolution of the examinations over time as efforts were made to optimize the examination and student experience. Because of differences in timing, nature, and goals of the CSIII (implemented in 2010) relative to those of the CSI and CSII, the CSIII is not the focus of this report.
Evolution of the CSU CS Examinations
Implemented in 2009, the CS examinations were initially managed by the Associate Dean for Veterinary Academic and Student Affairs at CSU, the DVM Education Development Coordinator, and a faculty member who developed the CSIII. These individuals worked closely with faculty members to solicit and optimize CS questions and conduct remediations when students did not pass on the initial attempt. In 2013, a multidisciplinary CS Committee composed of 6 DVM faculty members representing all 4 years of the curriculum was formed. At that time, the version of the CS examinations described in this report was implemented. This version contained substantial changes to the questions, format, grading, and most importantly, consequences for students who did not pass on the initial attempt.
Key changes
Several specific key changes were implemented. In the fall of 2013, the remediation exercise for both the CSI and CSII was changed from a written reflection to a second examination composed of both original and new questions. The rationale was that the written reflection was often done poorly, with minimal benefit to student learning. Interviews of students requiring remediation suggested that they did not view the reflection exercise as sufficiently onerous to prompt effective examination preparation. Individuals at another veterinary program also observed student apathy in the face of lackluster consequences.12 With initiation of the second examination, students who did not pass this remediation exercise were dismissed from the program with the opportunity to request readmission through the DVM Scholastic Standards Committee.
Another key change was the grading strategy. Examinations were designated pass or fail until 2013, when letter grades were introduced; the rationale was that, in the context of a high failure rate, grade assignment would incentivize students to prepare and succeed. The examination designation was also changed: in 2018, the CS examinations, which had been 1-credit independent study courses since inception, were redesignated as a required noncourse and returned to pass-fail grading. The intention was to reduce student anxiety by changing the designation and eliminating the grade and transcript record.
In 2014, an objective structured clinical examination component was added to the CSII with the aim of introducing a versatile evaluative tool by which clinical competencies could be assessed on the basis of objective testing through direct observation.13
Furthermore, CS-specific student learning objectives were gradually defined. Historically, students were directed to course materials to guide preparation for the CS examinations. In 2019, all courses were associated with CS learning objectives; however, approximately 25% of these simply restated course objectives and were therefore too broad, requiring further refinement. The rationale for CS-specific objectives was that facilitated, directed study would clarify expectations, help students focus their studying efforts, and reduce anxiety.
The final key change to the CS examinations was the uploading of in-class multiple-choice CS questions into an assessment software programb in 2017. This was done so that the questions could be coded according to course of origin and students could be provided with performance feedback by course, with the understanding that descriptive feedback is correlated with improved student performance.14
Goals of the CS program
Shortly after the CS Committee was formed, 7 goals of the CS program were identified. These goals and their underlying rationales were as follows:
1. Application of material: Studying for the CS provides opportunity to apply material learned to novel scenarios; the examination should assess this ability.
2. Assessment of students: The CS examinations should be used to ensure that all students who progress in the program have a mastery of key concepts and the ability to integrate and apply information from the previous year. Students should be able to use the CS examinations to identify deficits in knowledge or clinical reasoning abilities.
3. Recall practice: Studying for examinations is a well-established teaching tool, equivalent to other forms of instructional activity. Repeated, spaced interaction with material decreases the steepness of the “forgetting curve” with regard to the recall and application of information.
4. Integration of material by faculty: Faculty should use the CS examinations as a tool for integrating their material with other courses in the same year and within the same discipline across years.
5. Integration of material by students: Preparation for and completion of the CS examinations should encourage students to integrate material across courses.
6. Review of material from the previous year: Studying for the CS examinations should ensure that students have reviewed important facts and concepts from the previous year to optimize preparedness for learning new material.
7. Assessment of curriculum: The CS examinations should be used to track the performance of students as a whole in specific subject areas, disciplines, and competencies, both within individual years and courses and longitudinally throughout the curriculum. This analysis would then be used to improve the curriculum in subsequent years.
Although improvement of NAVLE pass rates was an impetus behind the development of the CS program in 2009, this was not identified as a CS program goal by the CS Committee in 2013.
Examination delivery and grading
The case-based, open-resource summer examinations (CSI and CSII) were completed by students within the online LMS. Responses to the multiple-choice examinations (CSI and CSII) were entered by students onto computer-readable forms,c and beginning in the fall of 2015, the results were imported into the assessment software programb for analysis. The CSI short-answer questions and the CSII objective structured clinical examinations were hand graded by the relevant faculty, and scores were entered into the LMS gradebook by the chair of the CS Committee. Performance data were verified by committee members assigned to each examination and ultimately reported to students by the committee chair.
To determine how well each question discriminated between students who performed well overall and those who did not, point biserial correlations were calculated for all questions on the multiple-choice examinations. Guidelines were followed to determine which questions should be retained and which should be removed from grade calculations, but decisions about ultimately removing questions were made after consultation with the faculty authors of the questions. Per these guidelines, if > 50% of students answered a question correctly, the question was retained. If 30% to 50% of students answered correctly, the question was retained only if the point biserial correlation was > 0.2. If < 30% of students answered correctly, the question was removed. To pass the examinations, students were required to earn a grade of at least 80% on the online, case-based, open-resource portions of the CSI and CSII and at least 70% on the in-class, closed-resource segments.
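For illustration, this retention guideline can be expressed programmatically. The following is a minimal sketch in R (the statistical software cited in the footnotes), assuming a matrix of dichotomously scored responses; the function and variable names are illustrative and are not those used by the CS Committee or the assessment software.

```r
# Minimal sketch of the question-retention guideline, assuming `responses`
# is a students x questions matrix of 0/1 scores. Names are illustrative.
# Items with no variance (all 0s or all 1s) would need special handling.
screen_questions <- function(responses) {
  totals <- rowSums(responses)
  sapply(seq_len(ncol(responses)), function(j) {
    item <- responses[, j]
    p <- mean(item)              # proportion of students answering correctly
    # Point biserial correlation: Pearson correlation between the
    # dichotomous item and the total score on the remaining items.
    r <- cor(item, totals - item)
    if (p > 0.50) {
      "retain"
    } else if (p >= 0.30 && r > 0.2) {
      "retain"
    } else {
      "flag for author review"   # candidate for removal from grading
    }
  })
}
```

Consistent with the guideline described above, a flagged question would be removed from grade calculations only after consultation with its faculty author.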
Program Outcomes: Reliability and Pass Rates
The study design was reviewed and deemed exempt from the need for approval by the CSU Research, Integrity and Compliance Review Office Institutional Review Board (016-19H).
Examination reliability
Beginning in 2017, with uploading of CS multiple-choice questions to the assessment software program,b the software was used to evaluate the reliability of the 2015–2018 examinations by calculation of Kuder-Richardson 20 scores, with 0 indicating no reliability and 1 indicating perfect reliability. These scores remained remarkably consistent across examinations and years, ranging from 0.80 to 0.83 for the CSI and from 0.79 to 0.81 for the CSII.
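The Kuder-Richardson 20 statistic can be computed directly from the same kind of 0/1 response matrix. The following is a minimal sketch in R under the same illustrative assumptions as above; the reported values came from the assessment software, not from this code.

```r
# Minimal KR-20 sketch for a students x questions matrix of 0/1 scores:
# KR-20 = k/(k - 1) * (1 - sum(p*q) / var(total scores)).
kr20 <- function(responses) {
  k <- ncol(responses)                  # number of items
  p <- colMeans(responses)              # proportion correct per item
  q <- 1 - p
  total_var <- var(rowSums(responses))  # variance of examinees' total scores
  # Note: var() uses the sample (n - 1) denominator; some software uses the
  # population form, so values may differ slightly.
  (k / (k - 1)) * (1 - sum(p * q) / total_var)
}
```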
Examination pass rates
First-time pass rates for the CSI and CSII were summarized by year of completion (Table 1). To maintain student privacy, ineligibility for remediation or incidents of remediation failure are not reported by year. Of the 854 students completing the CSI examination for the first time between 2013 and 2018, 42 (4.9%) did not pass. Of these 42 students, 2 were ineligible for remediation owing to a combination of CS-related and non–CS-related factors. Of the remaining 40 students, 39 (98%) passed the remediation exercise (ie, completion of a second examination). In total, 3 students were dismissed from the DVM program partly or entirely because of their CS performance. Consequently, including the successful remediations, the overall pass rate for the CSI was 99.6% (851/854).
Table 1—Percentage (proportion) of veterinary students at CSU who passed the CSI (second-year students) and CSII (third-year students) the first time, by year of examination.

| Examination | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CSI | 98.5 (135/137) | 93.5 (129/138) | 97.8 (136/139) | 96.5 (139/144) | 88.4 (130/147) | 96.0 (143/149) | 95.1 (812/854) |
| CSII | NA | NA | 96.3 (130/135) | 96.4 (134/139) | 100 (142/142) | 99.3 (142/143) | 98.0 (548/559) |
NA = Not applicable because the CSII was first offered in 2015.
Examinations were completed by students at the beginning of the fall semester in the indicated year.
A total of 559 students completed the CSII for the first time from 2015 through 2018, of whom 11 (2.0%) failed. All students who initially failed the CSII passed the remediation exercise; none were dismissed from the program. Overall, students performed better on the CSII than on the CSI, a difference attributed to additional knowledge acquired during the second year as well as familiarity with the examination structure and the necessary preparation gained through experience with the CSI.
Students who failed the CSI or CSII on the first attempt were disproportionately drawn from the lowest quartile of class standing. These students were ranked from 75th to 146th (median, 129th), with class sizes ranging from 135 to 149 students, depending on the year.
Program Outcomes: Student Performance
Second-year course performance
Course point accumulation for each student was analyzed to determine whether yearly performance, from the first to the second year of the DVM program, was impacted by implementation of the CS examinations. Points were first standardized to a percentage of the maximum points possible for a given course; the maximum was the actual maximum (ie, 100% of available course points) when known and otherwise the highest score attained by any student. For each year of the DVM program (ie, first and second), performance for each student was calculated as a weighted mean of points, with the number of credits assigned to each course as the weighting factor. Each DVM class was evaluated separately to group students into quartile groups (ie, first [lowest], second, third, or fourth [highest]) on the basis of first-year performance. The amount of improvement from the first to the second year was calculated as the difference between second-year and first-year performance. Because students from different quartile groups were not compared and because the range of performance values was similar among all years in each quartile group, no transformations were made to normalize the difference calculations. The mean amount of improvement for a DVM class was determined by calculating the individual amount of improvement for each student in the class and then obtaining the class mean; an identical approach was used to obtain means for each quartile group within each DVM class.
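As a concrete illustration of these calculations, the following sketch uses R with the dplyr and tidyr packages. The data frame and column names (`scores`, `student`, `dvm_class`, `program_year`, `points`, `max_points`, `credits`) are assumptions for illustration and do not describe the actual data set.

```r
library(dplyr)
library(tidyr)

# Assumed input: one row per student-course, with columns student, dvm_class,
# program_year (1 or 2), points, max_points (actual maximum when known,
# otherwise the highest score attained by any student), and credits.
yearly_perf <- scores %>%
  mutate(pct = points / max_points) %>%            # standardize to % of max
  group_by(student, dvm_class, program_year) %>%
  summarize(perf = weighted.mean(pct, credits),    # credit-weighted mean
            .groups = "drop")

improvement <- yearly_perf %>%
  pivot_wider(names_from = program_year, values_from = perf,
              names_prefix = "yr") %>%
  group_by(dvm_class) %>%                          # quartiles within each class
  mutate(quartile = ntile(yr1, 4),                 # 1 = lowest first-year perf
         improvement = yr2 - yr1) %>%              # second minus first year
  ungroup()

# Mean improvement by class and by quartile group within class:
improvement %>%
  group_by(dvm_class, quartile) %>%
  summarize(mean_improvement = mean(improvement), .groups = "drop")
```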
Statistical softwared was used for analyses. Differences in amounts of improvement were evaluated by use of a linear model that included the interaction between quartile group and academic year as a fixed effect (before 2016 vs 2016 and after, with the DVM class of 2016 being the first to complete [in 2013] the most recent version of the CSI [as of 2019]). Student performance before versus after the CSI was tested for each quartile group by estimation of marginal means.e Differences in proportions of students gaining or losing points were evaluated with the χ2 test.
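A sketch of this analysis in R, using the emmeans package cited in the footnotes, might look as follows; the data frame `d` and its columns (`improvement`, `quartile`, `era`) are illustrative assumptions, with `era` coding before 2016 versus 2016 and after.

```r
library(emmeans)

# Assumed input `d`: one row per student, with improvement (yr2 - yr1),
# quartile (factor, 1-4 on first-year performance), and era (factor:
# "pre2016" vs "2016on"). Names are illustrative.
fit <- lm(improvement ~ quartile * era, data = d)

# Estimated marginal means of era within each quartile group, with
# pre-vs-post contrasts analogous to the reported t tests.
emm <- emmeans(fit, ~ era | quartile)
pairs(emm)

# Proportions of students gaining vs losing points, compared across eras:
chisq.test(table(d$era, d$improvement > 0))
```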
The mean amount of improvement for each quartile group within each DVM class was graphically displayed (Figure 1). For students in the lowest quartile group during their first year of study, the mean improvement from the first to the second year was greater in the classes of 2016 and later, which completed the most recent version of the CSI, than in earlier classes, which completed an earlier version of the examination (t test, P = 0.001). No such difference was identified for students outside the first quartile group (second quartile group, P = 0.18; third quartile group, P = 0.10; fourth quartile group, P = 0.60). Although mean improvement in class performance was greater in and after 2016, some (nonsignificant) improvement from the first to the second year was also observed before 2016 in all quartile groups. The classes of 2010 and 2011 improved less than any of the subsequent classes, and the size of the improvements increased from 2011 through 2014; therefore, the degree to which the CS experience contributed to the observed improvements for the classes of 2016 to 2020 remains unclear.
NAVLE pass rates
Although an improvement in NAVLE pass rates was not identified as a principal goal of the CS examinations by the original CS Committee in 2013, NAVLE performance was examined to identify any change associated with implementation of the most recent version of the CS examinations. For this purpose, national and CSU data related to first-time NAVLE pass rates and mean scores were obtained from annual reports of the International Council for Veterinary Assessment sent to each participating college of veterinary medicine. These data showed that the NAVLE pass rates for CSU students increased from 85% in 2015 to 95% in 2016, rising to values exceeding the national mean (Figure 2). The CSU pass rates and mean scores have remained above the national mean subsequently.
Program Outcomes: Student Surveys
Two types of student surveys were conducted. Postexamination surveys were voluntarily completed by second- and third-year students within 1 week after completion of the in-class CSI and CSII examinations. Fourth-year surveys were completed by fourth-year students just prior to graduation and included reflective questions regarding the CSI and CSII online and in-class examinations.
Postexamination surveys
Since 2013, a postexamination survey has been offered to second-year students following the CSI. The CSII postexamination survey has been offered to third-year students since 2015, when the format was revised to reflect that of the CSI. Any student completing the survey within 1 week after completing the CS examination received an additional 0.5 percentage point in their final in-class score. The survey was completed before students received their scores, but faculty graders did not view survey results prior to score submission. The class size ranged from 135 to 149 students between 2013 and 2018. Between 95% and 100% of students completed the survey for the CSI during that time, and between 67% and 87% of students completed the survey for the CSII.
The majority of second-year students responded that the CSI online examination was very or extremely beneficial in aiding contextualization of material learned the previous year (Figure 3). When asked whether the CSI had helped them retain information from the previous year and how it had impacted their confidence entering the second year of the DVM program, students most commonly responded "moderately beneficial" and "positive," respectively. A high percentage of students agreed that they had learned new material in completing the exercise and that the examination had piqued their interest in particular topics. Although nearly all students indicated that there was benefit to the CSI examinations, opinions were split as to whether the benefit exceeded the perceived drawbacks.
Regarding the CSII, high percentages of third-year students viewed the online examination as moderately or very beneficial in contextualizing learned material and moderately beneficial in aiding retention (Figure 4). The most common answer in response to how the examination had impacted their confidence entering the third year of the program was “neutral.” Most students agreed that the examination had led them to gain new knowledge and that their interest in additional learning had been piqued. Similar to the CSI, although nearly all students indicated that there was benefit to the CSII examinations, there were split opinions as to whether the benefit exceeded the perceived drawbacks.
Sixty-one percent of students indicated that the online CSI examinations were somewhat beneficial in preparing for the in-class examinations, and 62% gave the same response in relation to the CSII. Fifty-eight percent of respondents indicated that more preparation for the CSI in-class examinations would have been useful; 47% of respondents reported the same view of the CSII. Thirty-seven percent of students reported that the CSI was somewhat more difficult than anticipated, and 46% indicated that the CSII was "about what they expected." In 2017 and 2018, 38% of students indicated that the learning objectives were moderately beneficial in preparing for the in-class CSI examination, whereas 39% stated that the objectives were minimally beneficial in preparing for the in-class CSII examination.
As part of the postexamination survey, students of the DVM classes of 2015 and 2016 were asked to estimate the amount of time they had invested in preparing for the CSI online and in-class examinations. The median value for both cohorts was 75 hours, with the minimum amount of time ranging from 5 to 20 hours and the maximum amount ranging from 325 to 350 hours.
Two authors (ACA and MAF) independently reviewed the free-text comments entered in the survey, then identified key themes derived by both individuals. Features consistently identified as contributing to reported dislike of the CSI and CSII included the high-stakes nature of the examinations and associated anxiety, interference with summer activities (eg, employment, experiential opportunities, or recreation), and lack of directed study through focused CS learning objectives.
Fourth-year survey
In 2018 and 2019, questions specific to the CS program were embedded in the fourth-year survey, which served as an important instrument for outcomes assessment for the overall DVM program. Fourth-year students were sent a link via email in March; completion of the voluntary online survey was highly encouraged through 2 reminder emails sent in April. The survey was open to students from the end of March through May of their graduation year. Questions were designed to elicit how much the students believed the CS examinations had helped them, with response options ranging from a score of 1 (did not help) to 7 (helped). Response rates were 72% (99/138) for the class of 2018 and 61% (86/142) for the class of 2019, for an overall response rate of 66% (185/280).
Many students indicated a score of ≥ 5 when asked if the in-class CSI and CSII examinations helped with integration of material across courses (2018, 60%; 2019, 47%) and with recall and application of previously learned information (2018, 57%; 2019, 52%; Figure 5). A similar pattern of responses was observed when the same questions were asked in relation to the CSI and CSII online case-based examinations (2018, 80% and 77%, respectively; 2019, 71% and 64%, respectively).
Discussion
Studies8,15 have shown the benefit of practicing retrieval of learned information (ie, retrieval practice, such as that achieved through self-testing) and of spreading study efforts over time (ie, spaced repetition) for student performance and retention of ideas. For example, medical students on a third-year urology rotation who received clinically relevant practice test questions and associated answers via email over a period of several months performed significantly better on an end-of-year examination than did students not participating in this spaced retrieval practice.16 In another study,17 retrieval practice and spaced repetition through self-directed study efforts of medical students were associated with superior performance on the USMLE Step 1 examination, compared with no such efforts.
Consistent with the idea that information retrieval and repetition promote retention, overall course performance data for the CSU veterinary students of the present report suggested that students in the first quartile group (ie, students with the lowest performance) improved after implementation of the most recent version of the CSI, in contrast to the other 3 quartile groups, in which no improvement was observed. Compulsory recall, integration, and application of information learned by this cohort of students may have been particularly useful for promoting confidence and competence. In light of the postexamination and fourth-year survey findings that suggested the value of the CS examinations in promoting recall, the contextualization of information, and confidence, course performance was likely an incomplete and insensitive measure of the value of the CS program. Additional limitations of course performance data included the likely presence of confounding variables that were not controlled for. Although the observed amount of improvement from 2016 and afterward might have been related to the CSI, other factors that may have contributed included changes in course structure, unique class composition, student support, or earlier identification of struggling students.
The second goal of the CS program was assessment of students; that is, the CS examinations should measure student mastery of key concepts and the ability to integrate and apply information from the previous year. Students who had failed at least 1 CS examination were overrepresented in the lowest quartile group of each class, suggesting that the program accurately identified some students who may have had knowledge deficits or difficulty applying sound clinical reasoning to novel scenarios. Certainly, students do not require the CS examinations to alert them to the fact that their course performance places them in the lowest quartile group. Nevertheless, obtaining C grades or successfully remediating a D grade may provide a false sense of mastery that does not reflect, in some individuals, the depth of knowledge necessary to integrate material across courses and apply it to novel scenarios. The greatest value of the CS examinations may lie in the preparation itself: although many students reported benefit, and likely all students gained from the experience, course performance data suggested that the students who had less success mastering course concepts benefited most from a compulsory review of key concepts that would otherwise not have occurred.
To further promote assessment of students through the CS program, use of an assessment software programb was implemented in 2017. Postexamination feedback allowed students to identify knowledge deficits by course. In the future, more advanced coding of individual questions within the program will enable more specific and therefore more useful feedback, such as the more granular classification of cardiovascular physiology within the broader category of physiology. Students who passed the CS examinations were able to review their completed assessments to identify questions answered incorrectly; however, few took advantage of this opportunity. Students who did not pass a CS examination on the first attempt have historically been offered unlimited opportunity to review their examination in a supervised setting, without electronic devices, and most have done so while preparing for remediation. Additionally, students could request 1-on-1 or group tutoring from trained peer tutors in more advanced classes.
In the third and fourth years, clinical performance in relation to AVMA competencies is assessed at CSU on a 10-point scale within curriculum and assessment management software.f Because of changes over time in both the grading methods (ie, from pass or fail to letter grades) and the scale itself, it was not possible to reliably evaluate clinical performance in relation to implementation of the CS program.
Our findings indicated that first-time NAVLE pass rates increased and mean performance improved after implementation of the most recent versions of the CSI and CSII. At the most basic level, preparation for and completion of multiple high-stakes, multidisciplinary examinations may have provided students with a practical advantage. Additionally, consistent with the aforementioned learning theory, 3 opportunities for recall, integration, and application of learned information likely equipped students for success when taking the NAVLE. However, in 2016, a NAVLE resource page for students was also created. The site contained recorded faculty review sessions and sources for practice questions. Although students had access to these resources before this time, the site provided a central repository for more efficient access. It remains unknown whether, and to what extent, this resource contributed to the improvements in NAVLE performance.
Subjective survey data in the present study were consistent with the aforementioned improvements in course and NAVLE performance as well as with the first, third, fifth, and sixth goals of the CS program. Postexamination survey responses reflected a perceived benefit of the CSI and CSII examinations in contextualizing learned information, promoting retention, improving confidence, prompting learning of new material, and stimulating interest in additional topics. Fourth-year survey responses offered a different perspective, with the CS experience contextualized within the entirety of the DVM training program. Fourth-year students acknowledged the benefit of the CS program, particularly of the case-based online examinations, in promoting integration, recall, and application of learned information. Interestingly, many students reported a negative or neutral effect of the in-class CSI and CSII examinations on confidence entering the subsequent academic year. Because the surveys were completed within 1 week after the examinations, before scores were reported, students may have believed that they performed more poorly than they actually did. Given that students expressed less anxiety about the open-resource online cases, sole adoption of this format may provide an approach to CS programming that is both effective and encouraging.
In addition to goals directly tied to student learning, the fourth CS goal was to allow faculty members the opportunity to integrate content from multiple courses and years. The open-resource online cases completed over the summer accomplished this. To develop these cases, multiple faculty members gathered together for case selection and subsequent content integration. Alternatively, members of the CS Committee worked with individual faculty members to accomplish the same outcome. Regardless of the approach, the CS cases were chosen to reflect clinically relevant primary care scenarios with multidisciplinary facets. Student feedback provided in open-text portions of postexamination surveys indicated that the opportunity for content integration was recognized and valued as preparation for future case management.
Additionally, per the seventh CS goal, the CS examinations were used to screen for courses or particular subject matter that required additional or alternative emphasis, reflected by a large number of students who failed to demonstrate mastery. Although some course subjects were inherently more difficult than others and thus associated with a larger number of students who performed suboptimally, no single course or topic has yet been identified as a particular challenge for a majority of students. Considerable effort has been invested in working with individual faculty members to optimize quality of questions, improve clinical relevance of items, and avoid excessively detailed questions. As noted, most faculty members now provide students with CS-specific learning objectives and tie CS assessment items directly to those objectives. Collectively, these efforts reduce the opportunity for artifactual decreases in student performance not attributable to learner knowledge or clinical reasoning ability.
An unanticipated benefit of the CS program emerged with the admission of transfer students and the initiation of the 2 + 2 program with the University of Alaska Fairbanks in 2015. The CS examinations provided a robust measure by which student performance could be objectively compared between cohorts, allowing timely identification of any meaningful discrepancies possibly attributable to inconsistencies in the learning environment.
Although the data reported here suggested benefits of the CS program, adverse sequelae were also identified. Student surveys delivered immediately after completion of CSI and CSII examinations highlighted this, revealing both perceived value and dislike for the CS exercises. Open-text comments revealed that reduced well-being was a noteworthy adverse effect of the CS experience. Specifically, students reported that the high-stakes nature of the examinations created anxiety and that preparation for the examinations precluded engagement in alternative summer activities. Medical students experience considerable anxiety associated with the USMLE, and studies18,19 have shown that test anxiety and performance are inversely related. Some authors have proposed approaches such as instruction in preparatory20 and test-taking19 strategies. Within faculty forums and CS Committee meetings at CSU, opportunities have been identified and implemented to reduce the adverse impact on veterinary student well-being, such as removal of the CS examination results from the transcript, reversion to pass-fail grading, and provision of CS-specific objectives. Many discussions have focused on alternative timing of the examination. Administration of the examinations at the end of the spring semester was suggested by some but was not implemented because this timing would not optimally fulfill the goals of recall and review in preparation for the subsequent fall semester.
Related to the idea of well-being, as well as to the known benefit of frequent, formative assessment in professional student learning,21 is the possibility of offering CS exercises throughout the academic year that students could attempt more than once until passing. Distinct from course assessments, these examinations would require content integration across disciplines and application to novel scenarios. To encourage recall, review, integration, and application of learned information and development of clinical reasoning skills, a summer case-based examination could subsequently be offered. Given that students have described the case-based, open-resource examinations as valuable and less anxiety provoking than a closed-resource examination, such an approach may have the dual benefit of positively impacting learning while promoting student well-being. Even so, as previously mentioned, a closed-resource examination is an effective learning tool that compels students to broadly review information across courses. Such an examination also has the added potential benefit of a preparation and testing process similar to that for the NAVLE.
Additional limitations to the data reported here included the lack of measurements of clinical performance and of outcomes associated with success after graduation. Individuals studying the impact of the USMLE have leveraged large sample sizes and a standardized examination to create a modest but useful body of literature reflecting the impact of USMLE on the development of clinical skills. For example, a subset of USMLE Step 2 performance data was associated with improved history taking and physical examination skills in the supervised residency setting.22 A study23 involving third-year medical students showed a positive relationship between performance on the USMLE Step 1 examination and performance on the National Board of Medical Examiners subject examination. Additionally, USMLE scores have been negatively associated with professional disciplinary action24 and positively associated with performance on board certification25 and medical licensing examinations26 as well as internships.27 Others have questioned the use of USMLE scores as determinants of resident selection given the lack of evidence linking performance data with skill acquisition.28 One study29 revealed no correlation of faculty ranking of emergency medicine residents with performance on the USMLE Step 1 examination. A study30 involving otolaryngology residents revealed a positive correlation between performance on the USMLE and performance on the written qualifying board examination, yet the investigators suggested that performance on the USMLE was of limited practical usefulness because individuals who performed less well on the USMLE remained likely to pass the qualifying examination. Certainly the distinctions between veterinary and medical training render such studies of limited application to CS-type examinations in veterinary medicine. Additionally, the CS examinations at CSU are not standardized or consistent with other veterinary programs in relation to timing of administration or number of examinations. Consequently, outcomes data associated with the CSU CS program must be considered in the context in which they were obtained.
Summary
Data suggested that the CS examinations at CSU were valuable in improving both overall course performance for students in the lowest quartile (on the basis of accumulated course points) and first-time NAVLE performance. Overall, the data presented here demonstrated achievement of the CS program goals. The impact of the CS examinations on postgraduation success remains unknown; indeed, even the medical literature contains discrepant conclusions in this regard. The adverse impact on student well-being appears to be the primary drawback of such high-stakes examinations. Measures taken to lessen this consequence included implementation of pass-fail grading and removal of the CS examination from the transcript. Interventions aimed at reducing test anxiety and improving preparatory and test-taking abilities may be valuable. At CSU, narrowing the field of testable material through CS-specific learning objectives has been useful. A CS program composed solely of open-resource cases may be a less rigorous but still effective approach that could be viewed more positively by students. With evidence for substantial benefit and opportunities to alleviate adverse sequelae, we intend to further optimize the CS program through continual evaluation and integration of best practices in professional training.
ABBREVIATIONS
CS | Capstone
CSU | Colorado State University
LMS | Learning management system
NAVLE | North American Veterinary Licensing Examination
USMLE | United States Medical Licensing Examination
Footnotes
a. Canvas, Instructure, Salt Lake City, Utah.
b. ExamSoft, Dallas, Tex.
c. Scantron, Eagan, Minn.
d. R: A language and environment for statistical computing, version 3.5.1, R Foundation for Statistical Computing, Vienna, Austria.
e. emmeans: Estimated marginal means, aka least-squares means, version 1.2.3, Lenth R, Singmann H, Love J, et al. Available at: cran.r-project.org/web/packages/emmeans/index.html. Accessed Mar 9, 2020.
f. one45, Vancouver, BC.
References
1. AVMA. Information for foreign veterinary graduates on working as a veterinarian in the US. Available at: www.avma.org/ProfessionalDevelopment/Education/Foreign/Pages/ECFVG-working-in-us.aspx. Accessed Sep 30, 2019.
2. International Council for Veterinary Assessment. NAVLE. Available at: www.icva.net/navle. Accessed Sep 30, 2019.
3. USMLE. Content description and general information: Step 1. Available at: www.usmle.org/pdfs/step-1/content_step1.pdf. Accessed Sep 30, 2019.
4. USMLE. Content description and general information: Step 2 Clinical Knowledge (CK). Available at: www.usmle.org/pdfs/step-2-ck/Step2CK_Content.pdf. Accessed Sep 30, 2019.
5. USMLE. Step 2 CS. Available at: www.usmle.org/step-2-cs/#overview. Accessed Sep 30, 2019.
6. USMLE. Step 3. Available at: www.usmle.org/step-3. Accessed Sep 30, 2019.
7. Karpicke JD, Roediger HL. The critical importance of retrieval for learning. Science 2008;319:966–968.
8. Larsen DP, Butler AC, Roediger HL. Test-enhanced learning in medical education. Med Educ 2008;42:959–966.
9. Donovan JJ, Radosevich DJ. A meta-analytic review of the distribution of practice effect: now you see it, now you don't. J Appl Psychol 1999;84:795–805.
10. Dolan BM, Yialamas MA, McMahon GT. A randomized educational intervention trial to determine the effect of online education on the quality of resident-delivered care. J Grad Med Educ 2015;7:376–381.
11. Butler AC, Raley ND. The future of medical education: assessing the impact of interventions on long-term retention and clinical care. J Grad Med Educ 2015;7:483–485.
12. Foreman JH, Morin DE, Graves TK, et al. Veterinary curriculum transformation at the University of Illinois, 2006–2016. J Vet Med Educ 2017;44:471–479.
13. Zayyan M. Objective structured clinical examination: the assessment of choice. Oman Med J 2011;26:219–222.
14. Lipnevich AA, Smith JK. Effects of differential feedback on students’ examination performance. J Exp Psychol Appl 2009;15:319–333.
15. Dunlosky J, Rawson KA, Marsh EJ, et al. Improving students’ learning with effective learning techniques: promising directions from cognitive and educational psychology. Psychol Sci Public Interest 2013;14:4–58.
16. Kerfoot BP, DeWolf WC, Masser BA, et al. Spaced education improves the retention of clinical knowledge by medical students: a randomised controlled trial. Med Educ 2007;41:23–31.
17. Deng F, Gluckstein JA, Larsen DP. Student-directed retrieval practice is a predictor of medical licensing examination performance. Perspect Med Educ 2015;4:308–313.
18. Frierson HT Jr, Hoban JD. The effects of acute test anxiety on NBME Part I performance. J Natl Med Assoc 1992;84:686–689.
19. Green M, Angoff N, Encandela J. Test anxiety and United States Medical Licensing Examination scores. Clin Teach 2016;13:142–146.
20. Strowd RE, Lambros A. Impacting student anxiety for the USMLE Step 1 through process-oriented preparation. Med Educ Online 2010;15:4880.
21. Gofton W, Dudek N, Barton G, et al. Work based assessment implementation guide: formative tips for medical teaching practice. Ottawa: Royal College of Physicians and Surgeons of Canada, 2017;1–12.
22. Cuddy MM, Winward ML, Johnston MM, et al. Evaluating validity evidence for USMLE Step 2 Clinical Skills data gathering and data interpretation scores: does performance predict history-taking and physical examination ratings for first-year internal medicine residents? Acad Med 2016;91:133–139.
23. Colbert C, McNeal T, Lezama M, et al. Factors associated with performance in an internal medicine clerkship. Proc Bayl Univ Med Cent 2017;30:38–40.
24. Cuddy MM, Young A, Gelman A, et al. Exploring the relationships between USMLE performance and disciplinary action in practice: a validity study of score inferences from a licensure examination. Acad Med 2017;92:1780–1785.
25. Harmouche E, Goyal N, Pinawin A, et al. USMLE scores predict success in ABEM initial certification: a multicenter study. West J Emerg Med 2017;18:544–549.
26. Kane KE, Yenser D, Weaver KR, et al. Correlation between United States Medical Licensing Examination and Comprehensive Osteopathic Medical Licensing Examination scores for applicants to a dually approved emergency medicine residency. J Emerg Med 2017;52:216–222.
27. Lee M, Vermillion M. Comparative values of medical school assessments in the prediction of internship performance. Med Teach 2018;40:1287–1292.
28. McGaghie WC, Cohen ER, Wayne DB. Are United States Medical Licensing Exam Step 1 and 2 scores valid measures for postgraduate medical residency selection decisions? Acad Med 2011;86:48–52.
29. Wagner JG, Schneberk T, Zobrist M, et al. What predicts performance? A multicenter study examining the association between resident performance, rank list position, and United States Medical Licensing Examination Step 1 scores. J Emerg Med 2017;52:332–340.
30. Puscas L, Chang CWD, Lee HJ, et al. USMLE and otolaryngology: predicting board performance. Otolaryngol Head Neck Surg 2017;156:1130–1135.