Point-of-care platform integrated with deep-learning, convolutional neural network algorithms effectively evaluates canine and feline peripheral blood smears

Eric Morissette, DVM, DACVP, Global Diagnostics, Zoetis Inc, Parsippany, NJ (https://orcid.org/0000-0002-7597-928X)

Cory D. Penn, DVM, Global Diagnostics, Zoetis Inc, Parsippany, NJ (https://orcid.org/0000-0003-1600-8616)

Ruth A. Hall Sedlak, PhD, Veterinary Medicine, Research and Development, Zoetis Inc, Kalamazoo, MI

Austin J. Rhodes, PhD, Veterinary Medicine, Research and Development, Zoetis Inc, Kalamazoo, MI

Dan S. Tippetts, MBA, Techcyte, West Orem, UT

Mike Loenser, DVM, MBA, Global Diagnostics, Zoetis Inc, Parsippany, NJ

Richard Goldstein, DVM, DACVIM, Global Diagnostics, Zoetis Inc, Parsippany, NJ

Abstract

OBJECTIVE

To perform a diagnostic assessment of a point-of-care veterinary multiuse platform integrated with a model composed of deep-learning, convolutional neural network algorithms for evaluating canine and feline peripheral blood smears, compared to board-certified clinical pathologists (CPs).

METHODS

This study had a blinded, randomized, incomplete block design, and results were compared between CPs and algorithms. Blood smears from convenience samples from veterinary diagnostic reference laboratories from October to December 2021 were used. Study phase A comprised 2 parts: (1) object class identifier algorithm (leukocytes, platelets, polychromatophils, and nucleated erythrocytes) versus CP within the same field of view (FOV); and (2) monolayer detection algorithm plus object class identifier algorithm versus CPs with different FOVs. Study phase B comprised algorithms versus CP for platelet clump identification. Study phase C comprised algorithms versus CP for polychromatophil identification. Metrics including sensitivity, specificity, and agreement were used.

RESULTS

The sample size was 59 dogs and 60 cats in phase A, 92 dogs and 69 cats in phase B, and 47 dogs and 12 cats in phase C. For study phase A, part 1, the 5-part leukocyte differential count agreement was 96.6% for canine and 91.7% for feline blood smears, and for part 2, the agreement for estimated total leukocyte, platelet, polychromatophil, and nucleated erythrocyte counts ranged from 70% to 95% across species. In study phase B, the algorithm had 90% sensitivity and 88% specificity. The algorithm for polychromatophils had 100% agreement with CP results in phase C.

CONCLUSIONS

This platform achieved results comparable to those of CPs. Results are meant to complement automated CBC results.

CLINICAL RELEVANCE

Veterinarians may add this assessment as part of their standard in-clinic hematology analysis for patients.


Microscopic evaluation of a peripheral blood smear is a critical complement for any automated CBC to confirm analyzer findings and optimize clinical decision-making through the detection and verification of cell counts, cell indices, and morphologic abnormalities.1–3 However, challenges in routine clinical practice, suboptimal blood smear preparation, inconsistent cellular identification, and limited laboratory technician time may make this evaluation difficult.3–5 Furthermore, intra- and interobserver variability in manual microscopic evaluation, as evidenced even among board-certified veterinary clinical pathologists (CPs), can contribute to the challenges.6–10

To help overcome some of these challenges, artificial intelligence and deep-learning, convolutional neural network algorithms have recently been introduced and used in veterinary medicine for in-clinic diagnostic purposes: identification and enumeration of intestinal parasites in feces11,12 and of bacteria, yeast, and inflammatory cells in swabbed material from skin and ears. Similarly, artificial intelligence and deep-learning, convolutional neural network algorithms have been used in human medicine for hematology-specific diagnostic purposes,13–18 including those that served as the basis for a veterinary application.19

The same veterinary multiuse platform used for identifying and enumerating intestinal parasites and cells and microorganisms in skin and ear swabs has been outfitted with hematology-specific algorithms and deep-learning, convolutional neural networks to identify and enumerate blood cells in canine and feline blood smears. After a slide has been stained with a Romanowsky-type stain, scanned, and digitized, a cloud-based algorithm first identifies the cellular monolayer, and then another algorithm estimates the total leukocyte count; provides a 5-part leukocyte differential count (neutrophils, lymphocytes, monocytes, eosinophils, and basophils); estimates the platelet count, nucleated erythrocyte count, and polychromatophil (reticulocyte) count; and categorizes the platelet clumps. If clinically warranted, CPs could review the entire digitized slide to confirm results and report any additional findings.

The objectives of the present study were to evaluate the hematologic-specific algorithms’ performance as compared to CPs through their ability to (1) identify optimal cellular monolayers within the blood smear; (2) identify and enumerate leukocytes, including those comprising the 5-part differential; (3) identify and measure platelet clumps; and (4) identify and enumerate nucleated erythrocytes and polychromatophils.

Methods

Blood sample collection and preanalytical evaluation

Veterinarians in clinical practice collected blood samples from client-owned dogs and cats into EDTA-containing blood tubes as part of their routine work-up. Veterinary personnel packaged these blood samples and sent them to a veterinary diagnostic reference laboratory. Although the veterinarians knew whether these pets were healthy or ill, the health status may not have been known to reference laboratory personnel. At the laboratory, personnel prepared blood smear slides from these samples and allowed the slides to air dry. Next, personnel stained the slides with a Romanowsky-type stain20 (eg, Wright-Giemsa stain). Blood smears were then assessed for suitability for inclusion in the present study. Inclusion criteria comprised a blood smear that covered approximately one-half to three-fourths of the length of the slide; had slight rounding of the feathered edge, visible lateral edges, and a smooth surface (ie, without irregularities, holes, or streaks); incorporated most, if not all, of the blood drop picked up and spread to create the smear; and had no identifiable staining irregularities or other problems. Stained slides were later shipped to 1 of 2 research sites, based on geographic location. Samples were collected from October to December 2021. For each phase of the study, the samples were convenience samples; thus, some phases had fewer samples than others (eg, 12 feline samples in phase C) because of the types of samples available (eg, blood samples indicative of regenerative anemia). This study did not require regulatory review or IACUC approval owing to the study design.

At the research sites, a 24 X 60-mm coverslip was placed onto each slide with the use of immersion oil. Then, 1 of 10 scanners (Ocus 40; Grundium) was used to scan the entire coverslipped portion of each slide. The scanning microscope initially scanned the slide with an overview lens. Next, the system quickly determined which area of the slide was suitable for analysis. Then, the system scanned and captured high-resolution images of that area. Specifically, the scanner captured high-resolution images of the entire coverslipped area at a resolution of 0.25 µm/pixel through a 20X apochromat objective lens with a 0.75 numerical aperture and a 2X optical doubler, which was equivalent to 40X magnification with a standard microscope.

Artificial intelligence blood smear analysis with a veterinary multiuse platform that uses deep-learning, convolutional neural network algorithms

After the slides were prepared, they were scanned, and analysis was performed with a deep-learning, convolutional neural network based on YOLOv3 (You Only Look Once, version 3) that comprised an algorithm and an object-based approach to quantitative image analysis to identify the object classes leukocytes, erythrocytes, platelets, and polychromatophils.21,22 The network works by applying a series of mathematical operations called convolutions to the presented image, which is the scanned image of a blood smear. Convolutions in image processing involve passing many small filters, known as kernels, over the entire image to analyze and extract various features, including but not limited to edges, textures, shades of color, and patterns. The kernels effectively scan the image, applying a mathematical operation that highlights specific characteristics at each location, which allows for the identification of these features. This process is fundamental to tasks such as image recognition and classification because it enables the detection of the critical visual information that defines the image's content. These features are then passed through the neural network, for which the output is a confidence score. The confidence score is the probability of the image being detected correctly by the algorithm and is generally given as a percentage.23 The scanning and analysis time was approximately 10 minutes for each slide with an adequate internet upload speed.
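The kernel-based convolution described above can be illustrated with a minimal sketch (NumPy is used here only for illustration; the kernel, toy image, and function name are hypothetical and are not part of the platform's implementation):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small kernel over a grayscale image (valid mode, no padding)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A 3 X 3 vertical-edge kernel (Sobel): responds strongly at sharp
# intensity transitions, such as a cell border.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

# Toy "image": a dark-to-bright vertical step, standing in for a cell edge.
image = np.array([[0, 0, 9, 9],
                  [0, 0, 9, 9],
                  [0, 0, 9, 9],
                  [0, 0, 9, 9]], dtype=float)

response = convolve2d(image, sobel_x)  # large values mark the edge columns
```

In a real network, many such kernels are learned from the training data rather than hand-specified, and their responses are stacked and passed through further layers.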

Training of the hematology model

Training was performed by a CP (EM) who also acted as the data manager for the present study. The initial part of the training involved, on a typical blood smear, the manual labeling by the CP of the objects of interest (ie, object classes [eg, leukocytes, platelets]) considered to be the foreground and the objects of no interest or objects that could confuse the model considered to be the background. This was performed to reach the threshold numbers of objects in training to allow the model to learn the initial object classes. When this threshold was reached, the model then performed daily finder runs on various sets of randomly selected, fully or partially, unlabeled digitized images, so new object classes could be identified and uploaded for labeling. These digitized images were randomly selected from thousands of images available for training. Finder runs involved having the model systematically search for new, impactful objects. Finder runs were used as an iterative human-in-the-loop process wherein the CP evaluated the previous training run, made the relevant changes, and observed the effects of these changes on the next run. When the relevant changes were implemented, the model was retrained, and another finder run was performed by using the updated model. With each new iteration of the model, the training dataset improved and so did the model.

The model's performance was measured against a mutually exclusive dataset, called a holdout or test dataset. The holdout dataset served as a proxy for unseen data, mimicking real-world scenarios in which the model encounters new observations. Evaluating the model on this independent dataset provided a more reliable estimate of the model's ability to generalize to new, unseen data. Generalization indicates the model's ability to apply acquired concepts to instances not encountered during the training phases or within the training dataset, such that the model can effectively extrapolate from familiar instances to unfamiliar ones. In the present study, the model's progressive performance metrics during testing with the holdout dataset were assessed by using enhanced precision-recall curves (ie, positive predictive value vs sensitivity curves). Precision-recall curves show the balance between sensitivity (recall) and positive predictive value (precision); a large area under the curve indicates strong performance in both metrics. High recall (sensitivity) indicates a high percentage of true positive results (ie, few false negative results), whereas high precision (positive predictive value) indicates a high percentage of accurate positive predictions with very few false positive results.
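The two axes of a precision-recall curve are computed from detection tallies; a minimal sketch follows (the counts are hypothetical, chosen only for illustration):

```python
def precision_recall(tp, fp, fn):
    """Precision (positive predictive value) and recall (sensitivity)
    from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical tally for one object class (eg, neutrophils):
# 180 correctly detected cells, 10 spurious detections, 20 missed cells.
p, r = precision_recall(tp=180, fp=10, fn=20)
```

Sweeping the confidence-score threshold produces one (precision, recall) point per threshold; the curve through those points, and the area under it (average precision), summarizes the model's performance.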

Study phase A: cellular monolayer and object class detection

Blood smears made from blood samples collected from dogs (n = 59) and cats (60) were included. The model's cellular monolayer detection algorithm initially analyzed, identified, and selected the areas of the blood smear for which the algorithm had the highest confidence score. The object class identifier algorithm further analyzed the selected monolayer areas to determine whether a minimum of 200 leukocytes was present. If 200 leukocytes were not counted within the monolayer area with the highest confidence score, the model examined the slightly thicker and slightly thinner areas of the monolayer (still considered part of the monolayer) and applied a penalty score for each area to keep the confidence score as high as possible while reaching the minimum of 200 cells. The minimum of 200 was selected because it is the recommended number of leukocytes to be identified through microscopy with a 40X or 50X objective for a manual differential leukocyte count.24 If 200 cells were not counted within the selected monolayer, whether because of a limited monolayer, leukocytes concentrated in the feathered edge of the blood smear (and thus depleted within the monolayer), or a blood sample collected from a leukopenic patient, the algorithm immediately flagged the sample and did not analyze it further. If the blood smear had an adequate monolayer and a minimum of 200 leukocytes was observed in the monolayer, the following object classes were identified and enumerated: neutrophils, lymphocytes, monocytes, eosinophils, basophils, nucleated erythrocytes (per 100 leukocytes), platelets (number and presence of clumps), and polychromatophils. The number of 40X fields of view (FOVs) depended on the WBC cellularity.

Phase A of the study comprised 2 parts: part 1, object class identifier; and part 2, monolayer detector plus object class identifier. The biometrician (AJR) randomly assigned 2 of 4 CPs to each included blood smear slide, according to its order of accession, following a randomized incomplete block design. Order of slide accession was partitioned into groups of 6 slides, and within each group, 1 of the 6 unique CP pairings was randomly assigned to each slide. For each slide, 2 CPs estimated the leukocyte count; provided a 200-cell, 5-part leukocyte differential count (neutrophils, lymphocytes, monocytes, eosinophils, and basophils); estimated the platelet count; and indicated the size of platelet aggregates, the percentage of polychromatophils, and the nucleated erythrocyte count (per 100 leukocytes). Their averaged results served as the reference for the algorithms for that blood smear.

For part 1, 1 set of FOV indicators (overlay) was used within the scanned image to define areas similar to the standard FOV from a light microscope at a standard magnification of 40X. The CPs selected the number of FOVs needed to adequately evaluate each blood smear, and the algorithm analyzed the same FOVs. Analysis of the same FOVs was performed to remove subsampling variability, thus ensuring we were evaluating only the object class identifier algorithm.

For part 2, which comprised monolayer detection first and object class identification second, each CP selected their own FOVs, and the algorithm selected its own FOVs. Therefore, the FOVs most likely differed between the CPs and the algorithm. Part 2 was performed to mimic real-world scenarios, in which a CP may select different FOVs/monolayers for analysis, compared with the algorithm, when a CP may be requested to review a blood smear initially analyzed by the algorithm.

A second set of FOV indicators, based on the Miller disk method,25 was used for the polychromatophil count. Briefly, a small, square FOV was inset in a large, square FOV, with the small square one-ninth of the area of the large square. Then, CPs counted the number of erythrocytes in the small square and the number of polychromatophils in the large square. Then, the percentage of polychromatophils equaled the total number of polychromatophils in the large square divided by 9 times the total number of erythrocytes in the small square.
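The Miller disk arithmetic described above reduces to a one-line calculation; a minimal sketch (the counts are hypothetical, chosen only for illustration):

```python
def percent_polychromatophils(polys_large_square, rbcs_small_square):
    """Miller disk estimate: the small square is one-ninth the area of the
    large square, so the small-square erythrocyte count is scaled by 9."""
    return 100 * polys_large_square / (9 * rbcs_small_square)

# Hypothetical counts: 45 polychromatophils in the large square,
# 250 erythrocytes in the small square.
pct = percent_polychromatophils(45, 250)  # 4,500/2,250 = 2.0%
```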

A third set of FOV indicators that represented 100X magnification was used to estimate platelet counts. To standardize how CPs determined the size of the platelet clumps, thus decreasing interobserver variability, a set of premeasured (fixed-sized) FOV overlays was provided. These overlays represented the minimum size for a medium and a large platelet clump.

Study phase B: platelet clump identification

Slides of blood smears were prepared as described above. Specific to study phase B, blood smears were evaluated for platelet clumps by a CP (EM) who was blinded to species and reference laboratory results. In addition, the CP selected his own FOVs, and the algorithm selected its own FOVs for this portion of the study. Similar to study phase A, a set of premeasured, fixed-sized FOV overlays was used to classify and standardize platelet clumps as medium and large.

Study phase C: reticulocyte (polychromatophil) identification

Slides of blood smears were prepared as described above. Specific to study phase C, each blood smear was evaluated for reticulocytes with new methylene blue stain or for polychromatophils with a Romanowsky-type stain. One blood smear slide was stained with a Romanowsky-type stain, as for study phases A and B; this stained blood smear was used for testing the algorithm. A second slide of a blood smear, unique to study phase C, was prepared directly after the blood sample was mixed with new methylene blue stain by using the following technique: (1) mix well equal amounts (about 2 to 3 drops) of EDTA-anticoagulated whole blood and new methylene blue stain; (2) incubate the mixture at room temperature for 20 minutes; (3) remix the mixture; and (4) make a blood smear. Laboratory personnel prepared the new methylene blue–stained blood smear slide and, as before, coverslipped each slide so that it could be evaluated for reticulocytes by 2 of 4 CPs.

The CPs evaluated each new methylene blue-stained blood smear with standard light microscopy or each Romanowsky-type–stained blood smear through a digitized image produced by 1 of 10 scanners. The CPs selected their own FOVs, and the algorithm selected its own FOVs for this portion of the study. For light microscopy, CPs used the Miller disk method25 within the eyepiece of the microscope. In random consecutive FOVs in which the erythrocytes were evenly distributed, the reticulocytes identified in both the small (one-ninth the size of the large square) and large squares were quantified as follows: (1) CPs counted the reticulocytes within the entire large square, including those that abutted the lines on the left and bottom of the ruled area; and (2) CPs counted the erythrocytes in the small square until at least 200 were counted, with any reticulocyte in the small square counted as an erythrocyte. For digitized images, the Miller disk method was also used with presized small and large FOV overlays similar to those used for part 1 of study phase A. For both evaluations, the reticulocyte/polychromatophil count was reported as a percentage of erythrocytes:
%reticulocytes/polychromatophils = 100 × (total number of counted reticulocytes/polychromatophils)/(total number of counted erythrocytes × 9).

Statistical analysis

Statistical analyses were performed with SAS (version 9.4; SAS Institute Inc) and R.26

Study phase A

For the CP-derived 5-part leukocyte differential count, a parametric bootstrap procedure provided slide-level 99% prediction interval estimates for each leukocyte type that accounted for variation between CPs. A binomial distribution was used, and, assuming a 200-cell differential count made by each CP, counts for each leukocyte type were slightly adjusted by adding 1 additional positive count and 3 additional negative counts (eg, if a CP identified 20% [40/200] monocytes, then the adjustment was [40 + 1]/[200 + 4] = 20.1% monocytes). For this procedure, 10,000 bootstrap iterations were undertaken. Based on these results, a 99% CP prediction interval was constructed for each slide and object class. For leukocytes with low prevalence, this adjustment provided bootstrap prediction intervals that approximated those found in a Rümke table for assessing leukocyte differential results. For example, if the percentage of lymphocytes in a 200-cell differential count is 20% (40/200), then by use of a 95% prediction interval table, the lymphocyte count is predicted with 95% certainty to be between 14.5% and 25.5%.27
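The adjusted parametric bootstrap described above can be sketched as follows (this is an illustrative reconstruction under stated assumptions, not the study's SAS implementation; the function name and the symmetric percentile construction of the interval are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_prediction_interval(count, n=200, iters=10_000, level=0.99):
    """Parametric bootstrap prediction interval for one leukocyte type,
    with the +1 positive / +3 negative count adjustment described above."""
    p_adj = (count + 1) / (n + 4)          # eg, 40/200 becomes 41/204, about 20.1%
    draws = rng.binomial(n, p_adj, size=iters) / n
    lo, hi = np.quantile(draws, [(1 - level) / 2, 1 - (1 - level) / 2])
    return 100 * lo, 100 * hi              # bounds as percentages

# Worked example from the text: 40 monocytes of 200 cells counted.
low, high = bootstrap_prediction_interval(40)
```

An algorithm result for that slide and leukocyte type would then be classified as in agreement when it falls between `low` and `high`.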

Within each species, for the other object classes (nucleated erythrocytes, platelets, and polychromatophils), slides were ranked by the average number of each object class the CPs identified. For each ranked slide and object class, the CP-averaged result for the slide of interest and the CP-averaged results for the 2 slides ranked nearest to it (ie, 3 numbers in total) were used to calculate a weighted estimate of variance, with a weighting factor of 1 for the slide of interest and a weighting factor of 0.25 for each of the 2 neighboring slides. The weighted estimate of variance was then used to construct a 95% prediction interval for each slide and object class. A slide-prediction plot with rank-ordered sample numbers on the x-axis and individual CP-derived numbers of an object class on the y-axis was created for each object class, including the 5-part leukocyte differential count. The slide-prediction plot also included lines denoting the average CP result, the algorithm result, and the 95% or, for the 5-part leukocyte differential count, the 99% prediction interval.
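One plausible reading of the weighted-variance construction is sketched below. The text does not give the exact formula, so this sketch assumes a weighted mean, weighted squared deviations, and a normal-approximation interval; the function name and the 1.96 multiplier are assumptions for illustration only:

```python
import numpy as np

def weighted_variance_interval(value, neighbor1, neighbor2, z=1.96):
    """Approximate 95% prediction interval for one slide's object-class
    result, weighting the slide of interest (weight 1) and its two
    rank-neighbors (weight 0.25 each), as described in the text."""
    values = np.array([value, neighbor1, neighbor2], dtype=float)
    weights = np.array([1.0, 0.25, 0.25])
    mean = np.average(values, weights=weights)
    var = np.average((values - mean) ** 2, weights=weights)
    half = z * np.sqrt(var)
    return mean - half, mean + half

# Hypothetical CP-averaged results: slide of interest 10, neighbors 8 and 12.
lo, hi = weighted_variance_interval(10, 8, 12)
```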

After CP-based prediction intervals were estimated for each object class and slide, we determined whether algorithm results were within the 95% or 99% prediction intervals. A 99% prediction interval was used for the 5-part leukocyte differential count, thereby yielding approximately 95% prediction intervals across the 5 leukocytes. Therefore, for each slide, the differential count was classified to be in agreement when the algorithm results were within the 99% prediction intervals created by the CP results. A 95% prediction interval was used for identifying the other object classes and their enumerations.

Study phase B

Canine and feline blood smears were evaluated. A receiver operating characteristic (ROC) curve was created for the algorithm from 60% of the slides from each species. Through ROC analysis, the optimal cutoff was established at which to determine the algorithm's sensitivity and specificity in identifying and correctly classifying medium and large platelet clumps in the validation dataset of blood smears. The validation dataset comprised the remaining 40% of the aforementioned slides.
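Selecting an optimal ROC cutoff can be sketched as follows. The study does not state its cutoff criterion, so this sketch uses Youden's J (sensitivity + specificity − 1), a common choice; the scores, labels, and function name are hypothetical:

```python
import numpy as np

def optimal_cutoff(scores, labels):
    """Return the score cutoff that maximizes Youden's J
    (sensitivity + specificity - 1) over the training slides."""
    best_j, best_cut = -1.0, None
    for cut in np.unique(scores):
        pred = scores >= cut
        tp = np.sum(pred & (labels == 1))
        fn = np.sum(~pred & (labels == 1))
        tn = np.sum(~pred & (labels == 0))
        fp = np.sum(pred & (labels == 0))
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_j, best_cut = j, cut
    return best_cut

# Hypothetical clump-likelihood scores and CP labels (1 = clump present):
scores = np.array([0.1, 0.2, 0.4, 0.6, 0.7, 0.9])
labels = np.array([0,   0,   0,   1,   1,   1])
cut = optimal_cutoff(scores, labels)  # perfectly separates this toy data
```

The cutoff chosen on the 60% training split is then applied unchanged to the 40% validation split to report sensitivity and specificity.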

Study phase C

The biometrician (AJR) randomly assigned 2 of 4 CPs to each included slide according to a randomized incomplete block design, similar to what was done for study phase A. The 95% prediction interval for study phase C was determined from the CP results. We classified a result as in agreement when the algorithm result was within the 95% prediction interval created by the CP results. Correlations between each of 3 CPs' results and the algorithm result, as well as between each unique CP pair, were also evaluated through the Pearson r determined with Passing-Bablok regression analyses; a strong correlation equated to a Pearson r > 0.5.

Results

After training, the monolayer classifier algorithm, version 6908, and object class identifier algorithm, version 7070, were used. Holdout testing of the algorithm yielded an average precision of 0.8707 for combined object classes, which indicated that the algorithm was prepared for clinical studies detailed below (Figure 1).

Figure 1

An enhanced precision-recall curve produced with the holdout (test) datasets for all (combined [global]) object classes. Recall is the sensitivity, and precision is the positive predictive value. The outer blue line of the curve is the precision-recall curve, yielding an average precision (AP) of 0.8707. Object class confusion is denoted by the green (related class) and orange (sibling class) lines. Unrelated object class confusion is denoted by the red line.

Citation: American Journal of Veterinary Research 2025; 10.2460/ajvr.24.08.0226

Study phase A: parts 1 and 2

For part 1, object class identifier algorithm alone with the same FOVs for the CPs and the algorithm, the numbers of canine slides with blood smears whose algorithm-identified object classes were within the 95% prediction interval based on analysis of CP results were as follows: estimated leukocyte count, n = 59 (100%); estimated platelet count, 56 (95%); estimated polychromatophil count, 48 (81%); and estimated nucleated erythrocyte count, 48 (81%). Similarly, the numbers of feline slides were as follows: estimated leukocyte count, 60 (100%); estimated platelet count, 45 (75%); estimated polychromatophil count, 40 (67%); and estimated nucleated erythrocytes, 45 (45/59 [76%]). For the 5-part leukocyte differential count, the algorithm-identified object classes neutrophils, lymphocytes, monocytes, eosinophils, and basophils agreed with the same CP-identified object classes for 57 (96.6%) canine slides and 55 (91.7%) feline slides. The minimum number of object classes in agreement for both species was 3 (dog slides, n = 1 [1.7%]; cat slides, 4 [6.7%]). Of the canine slides (n = 59), 98.3% to 100% had neutrophils, lymphocytes, monocytes, eosinophils, and basophils within their respective 99% prediction intervals (Figure 2), whereas of the feline slides (60), 93.3% to 100% had neutrophils, lymphocytes, monocytes, eosinophils, and basophils within their respective 99% prediction intervals (Figure 3).

Figure 2

From part 1 of study phase A, slide-prediction plots with percentages of the object classes neutrophils (A), lymphocytes (B), monocytes (C), eosinophils (D), and basophils (E), as determined by individual clinical pathologists (CPs; blue dots) and by the object class identifier algorithm alone (orange dots) from the digitized images with the same fields of view (FOVs) of blood smears prepared from blood samples collected from 59 dogs. Samples are rank ordered on the x-axis, with samples with lower percentages of the specified object class on the left and those with higher percentages on the right. Algorithm results that fell within the 99% CP-derived prediction interval (black lines) indicate that the algorithm results matched the CP results. Blue line is the average result among CPs. Orange line is the continuous object class identifier algorithm result. Diamond surrounding an orange dot (A to C) indicates a sample with an algorithm result that did not match the CP results and thus did not fall within the 99% CP-derived prediction interval. The y-axis begins at 30% for neutrophils, whereas it begins at 0% for the other 4 leukocytes.


Figure 3

From part 1 of study phase A, slide-prediction plots with percentages of the object classes neutrophils (A), lymphocytes (B), monocytes (C), eosinophils (D), and basophils (E), as determined by individual CPs and by the object class identifier algorithm alone from the digitized images with the same FOVs of blood smears prepared from blood samples collected from 60 cats. Samples are rank ordered on the x-axis, with samples with lower percentages of the specified object class on the left and those with higher percentages on the right. Algorithm results that fell within the 99% CP-derived prediction interval indicate that the algorithm results matched the CP results. Blue line is the average result among CPs. Orange line is the continuous object class identifier algorithm result. Diamond surrounding an orange dot (A to C) indicates a sample with an algorithm result that did not match the CP results and thus did not fall within the 99% CP-derived prediction interval. The y-axis begins at 20% for neutrophils, whereas it begins at 0% for the other 4 leukocytes.


For part 2, cellular monolayer detection algorithm plus object class identifier algorithm with different FOVs for the CPs and the algorithm, the numbers of canine slides with blood smears whose object classes were within the 95% prediction interval based on analysis of CP results were as follows: estimated leukocyte count, n = 56 (94.9%); estimated platelet count, 55 (93.2%); estimated polychromatophil count, 52 (88.1%); and estimated nucleated erythrocyte count (per 100 leukocytes), 41 (69.5%) (Figure 4). Similarly, the numbers of feline slides were as follows: estimated leukocyte count, 54 (90%); estimated platelet count, 50 (83.3%); estimated polychromatophil count, 57 (95%); and estimated nucleated erythrocytes, 45 (45/59 [76.3%]) (Figure 5). For the 5-part leukocyte differential count, the algorithm-identified object classes neutrophils, lymphocytes, monocytes, eosinophils, and basophils were within the 99% prediction intervals for the same CP-identified object classes for 52 (88.1%) canine slides and 49 (81.7%) feline slides (Supplementary Figures S1 and S2). Six (10.2%) canine slides and 7 (11.7%) feline slides had a minimum of 4 object classes in agreement.

Figure 4

From part 2 of study A, slide-precision plots with estimated (Est) percentage counts for the object classes leukocytes (A), platelets (B), polychromatophils (C), and nucleated erythrocytes (NRBC; D), as determined by individual CPs (blue dots) and by the monolayer detection algorithm plus the object class identifier algorithm from the digitized images with different FOVs of blood smears prepared from blood samples collected from 59 dogs. Samples are rank ordered along the x-axis, with lower percentages of the specified object class on the left and higher percentages on the right. Algorithm results that fell within the 95% CP-derived prediction interval (black lines) matched the CP results. Blue line is the average result among CPs. Orange line is the continuous monolayer detection and object class identifier algorithm result. Diamond surrounding an orange dot indicates a sample with an algorithm result that did not fall within the 95% CP-derived prediction interval and thus did not match the CP results. 10E3 represents 10 raised to the power of 3 for estimated cell counts.


Figure 5

From part 2 of study phase A, slide-precision plots with estimated percentage counts for the object classes leukocytes (A), platelets (B), polychromatophils (C), and nucleated erythrocytes (D; per 100 leukocytes), as determined by individual CPs and by the monolayer detection algorithm plus the object class identifier algorithm from the digitized images with different FOVs of blood smears prepared from blood samples collected from 60 cats. Samples are rank ordered along the x-axis, with lower percentages of the specified object class on the left and higher percentages on the right. Algorithm results that fell within the 95% CP-derived prediction interval matched the CP results. Orange line is the continuous monolayer detection and object class identifier algorithm result. Diamond surrounding an orange dot indicates a sample with an algorithm result that did not fall within the 95% CP-derived prediction interval and thus did not match the CP results.


Study phase B

A total of 161 blood smear slides were identified, 108 with platelet clumps and 53 without; 92 slides were canine and 69 were feline. An ROC curve was created for the algorithm from 60% of the slides (dog slides, n = 55; cat slides, 41). From this ROC curve, a cutoff of 1.5 yielded an optimal sensitivity of 79% and specificity of 94% (Supplementary Figure S3). From the subsequent analysis of combined medium and large platelet clumps in the blood smears of slides in the validation dataset (n = 65 [dog slides, 37; cat slides, 28]), estimated sensitivity and specificity were 88% and 90%, respectively.
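The cutoff-selection step can be illustrated with a small sketch. The scores and labels below are hypothetical, and Youden's J is only one common criterion for picking an "optimal" ROC cutoff; the study does not state which criterion was used.

```python
def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when score >= cutoff calls
    'platelet clumps present'. labels: 1 = present, 0 = absent."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def youden_optimal_cutoff(scores, labels):
    """Candidate cutoff (among observed scores) maximizing
    Youden's J = sensitivity + specificity - 1."""
    return max(set(scores),
               key=lambda c: sum(sens_spec(scores, labels, c)))

# Hypothetical clump scores and ground-truth labels (1 = clumps present)
scores = [0.2, 0.8, 1.4, 1.6, 2.0, 2.5]
labels = [0, 0, 0, 1, 1, 1]
cut = youden_optimal_cutoff(scores, labels)
sens, spec = sens_spec(scores, labels, cut)
print(cut, sens, spec)  # 1.6 1.0 1.0
```

The chosen cutoff is then frozen and applied unchanged to the held-out validation slides, as in the 88%/90% result above.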

Study phase C

A total of 59 blood smears from blood samples collected from 47 dogs and 12 cats were evaluated. All algorithm results were within the 95% prediction interval (Figure 6). The Pearson r between the average result of the 3 CPs and the algorithm was 0.829 (n = 58), and the Pearson r for each of the 3 unique CP-algorithm pairs was 0.74 (number of slides, n = 40), 0.838 (38), and 0.857 (38). The Pearson r between each unique CP pair was 0.812 (number of slides, n = 20), 0.872 (20), and 0.973 (19).
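As a sketch of the correlation metric used here, the Pearson r between CP-average and algorithm results can be computed as follows. The paired polychromatophil percentages below are hypothetical, not study data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between paired results."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical polychromatophil percentages: CP average vs algorithm
cp_avg = [0.5, 1.2, 2.0, 3.5, 5.1]
algo   = [0.6, 1.0, 2.3, 3.2, 5.4]
print(round(pearson_r(cp_avg, algo), 3))
```

The same function applied to each CP-algorithm or CP-CP pairing yields the per-pair correlations reported above.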

Figure 6

From study phase C, slide-prediction plot with estimated percent polychromatophil counts as determined by individual CPs (blue dots) from the light microscopic or digitized images of blood smears prepared from blood samples collected from 59 animals (dogs, n = 47; cats, 12). Samples are rank ordered along the x-axis, with lower polychromatophil percentages on the left and higher percentages on the right. Algorithm results are within the 95% CP-derived prediction interval (black lines), indicating the algorithm results matched the CP results. Dashed line is the average result among CPs. Orange dot is the individual algorithm result. Orange line is the continuous algorithm result.


Discussion

In these studies, we observed that a novel, point-of-care multiuse platform that uses artificial intelligence and deep-learning, convolutional neural network algorithms for evaluating canine and feline peripheral blood smears achieved results comparable to those of CPs. Specifically, the algorithms repeatedly and successfully detected cellular monolayers and identified and enumerated leukocytes, platelets, nucleated erythrocytes, and polychromatophils, compared with results from CPs. Microscopic evaluation of a blood smear, the traditional method, is a critical complement to automated CBC results. The inherent challenges of using the traditional manual method for identifying and enumerating common and lower incidence cells in a blood smear could be offset by using this platform and its hematology-specific algorithms.

The use of this platform and its hematology-specific algorithms may help with the repeatability of results and reduce the overall veterinary clinic workload, as time spent at the microscope for manual evaluation of the blood smear by veterinary professionals can be avoided. This approach also allows for CP review of blood smears that include unidentified object classes or for which practicing veterinarians request a CP's review, thereby improving access to experts. Improved efficiency with the use of similar platforms has been reported in human medicine.14–18

One barrier that may not be overcome with this platform alone is blood smear preparation that does not consistently reach optimal standards. Regardless of whether blood smears are evaluated by standard light microscopy with manual review or by digitized imaging with subsequent algorithm analysis, sample processing may be a source of variability. Slide preparation technique is important, and poor technique may lead to nonrandom and nonhomogeneous distribution of cells on blood films prepared with the wedge-pull technique.28

In part 1 of study phase A, which evaluated only the object class identifier, we observed that the algorithm performed comparably to the CPs when it evaluated the same FOVs that the CPs evaluated, as is especially apparent in the slide-precision plots. This part was designed to evaluate the object class identifier algorithm in isolation and to exclude the subsampling variability that may be introduced by selecting individualized areas (FOVs) of monolayers for analysis. The object class identifier algorithm performed well based on the results for the 5-part leukocyte differential count, for which the CP selected the FOV and the algorithm analyzed the same FOV.

Part 2 of study phase A was performed to mimic real-world scenarios in which a CP may be requested to review a blood smear that algorithms have already analyzed and for which results have already been reported. Total allowable error is a quality control measure that sets a limit for combined imprecision (random error) and bias (inaccuracy or systematic error), such that results exceeding this limit affect clinical decision-making. As noted by the American Society for Veterinary Clinical Pathology, the reported total allowable error is 15% for neutrophils and lymphocytes and approximately 50% for monocytes and eosinophils.29 Irrespective of species, reported total allowable error for estimated leukocyte, platelet, and polychromatophil counts is 15% to 20%, 20% to 25%, and 20%, respectively, for dogs and is most likely higher for cats. Therefore, results from the present study were well within the total allowable error and reported measurement uncertainty.8,29 Moreover, specifically for accurately identifying reticulocytes in human blood smears, interobserver variability may exceed 30% among laboratory technologists.25
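The total allowable error comparison can be illustrated with a simple check. This sketch uses one common formulation, observed total error = |bias| + 2 × CV, which is not necessarily the exact ASVCP calculation, and the bias and CV values below are hypothetical.

```python
def within_tea(bias_pct, cv_pct, tea_pct):
    """Check whether observed total error fits within the total
    allowable error (TEa). One common formulation (assumption):
    TE_obs = |bias| + 2 * CV, with all quantities in percent."""
    te_obs = abs(bias_pct) + 2 * cv_pct
    return te_obs, te_obs <= tea_pct

# Hypothetical example for neutrophils (TEa = 15% per ASVCP guidelines)
te, ok = within_tea(bias_pct=3.0, cv_pct=4.0, tea_pct=15.0)
print(te, ok)  # 11.0 True
```

A method whose observed total error stays under the class-specific TEa (15% for neutrophils and lymphocytes, roughly 50% for monocytes and eosinophils) would not be expected to alter clinical decisions.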

When the CPs and the algorithm evaluated the same FOVs, a higher percentage of algorithm results fell within the 95% or 99% prediction intervals than when they evaluated different FOVs. This outcome was expected because both the CP and the algorithm evaluated the same portion of the monolayer; the algorithm was therefore more likely to identify the same cells, in number and proportion, as the CPs in the FOV the CPs selected. In contrast, in the second part of study phase A, the CP and the algorithm most likely evaluated different portions of the monolayer, introducing monolayer selection as an additional variable. With this second variable introduced (the first being object class identification), agreement was slightly lower.

In study phase B, the algorithm performed well, with an estimated sensitivity of 88% and specificity of 90% for identifying platelet clumps, which are common in veterinary samples.10 These results are in line with expected variability and indicate that the model reliably detects platelet clumps.

All algorithm results in study phase C were within the 95% prediction interval, and correlations were strong between average CP and algorithm results, between individual CP and algorithm results, and between individual CP results (Pearson r, 0.74 to 0.973 across combinations). This may indicate that interobserver variability in veterinary medicine is not as high as that in human medicine. With only 12 feline slides available for inclusion in this study phase, correlations could have been affected by the low numbers; however, this was unavoidable given the convenience sampling.

Overall, this multiuse platform, which incorporates deep-learning, convolutional neural network, hematology-specific algorithms, achieved results comparable to those of CPs. The platform can be used to complement hematology analyzer CBC results by confirming automated cell counts, detecting platelet clumps, and providing polychromatophil counts. Ongoing training and testing with unseen datasets from real-world blood smears will continue, and the iterative human-in-the-loop process is expected to improve model performance. Future iterations will be validated with a new unseen dataset after meeting desired performance targets.

Supplementary Materials

Supplementary materials are posted online at the journal website: avmajournals.avma.org.

Acknowledgments

Authors acknowledge paid editorial assistance by Matthew R. Krecic, DVM, MS, MBA, K-File Medical Writing and Editing Services LLC.

Disclosures

Drs. Morissette, Penn, Hall Sedlak, Rhodes, Loenser, and Goldstein are employees of Zoetis. Dan S. Tippetts is an employee of Techcyte.

No AI-assisted technologies were used in the generation of this manuscript.

Funding

This research/publication was funded by Zoetis.

References

1. Villiers E, Ristic J. BSAVA Manual of Canine and Feline Clinical Pathology. 3rd ed. British Small Animal Veterinary Association; 2016:27–33.
2. Thrall MA, Weiser G, Allison RW, Campbell TW. Veterinary Hematology and Clinical Chemistry. 2nd ed. John Wiley & Sons, Inc; 2012:3–18.
3. Zabolotzky SM, Walker DB. Peripheral blood smears. In: Cowell RL, Valenciano AC, eds. Cowell and Tyler's Diagnostic Cytology and Hematology of the Dog and Cat. 5th ed. Elsevier; 2020:438–467.
4. Tvedten H, Willard MD. Small Animal Clinical Diagnosis by Laboratory Methods. 5th ed. Saunders; 2012:12–37.
5. Stirn M, Moritz A, Bauer N. Rate of manual leukocyte differentials in dog, cat and horse blood samples using ADVIA 120 cytograms. BMC Vet Res. 2014;10:125.
6. Kjelgaard-Hansen M, Jensen AL. Is the inherent imprecision of manual leukocyte differential counts acceptable for quantitative purposes? Vet Clin Pathol. 2006;35(3):268–270.
7. Fuentes-Arderiu X, Garcia-Panyella M, Dot-Bach D. Between-examiner reproducibility in manual differential leukocyte counting. Accred Qual Assur. 2007;12:643–645.
8. Fuentes-Arderiu X, Dot-Bach D. Measurement uncertainty in manual differential leukocyte counting. Clin Chem Lab Med. 2009;47(1):112–115.
9. Tvedten HW, Lilliehook IE. Canine differential leukocyte counting with the CellaVision DM96Vision, Sysmex XT-2000iV, and Advia 2120 hematology analyzers and a manual method. Vet Clin Pathol. 2011;40(3):324–339.
10. Paltrinieri S, Paciletti V, Zambarbieri J. Analytical variability of estimated platelet counts on canine blood smears. Vet Clin Pathol. 2018;47(2):197–204.
11. Nagamori Y, Sedlak RH, DeRosa A, et al. Evaluation of the VETSCAN IMAGYST: an in-clinic canine and feline fecal parasite detection system integrated with a deep learning algorithm. Parasit Vectors. 2020;13(1):346. doi:10.1186/s13071-020-04215-x
12. Nagamori Y, Sedlak RH, DeRosa A, et al. Further evaluation and validation of the VETSCAN IMAGYST: in-clinic feline and canine fecal parasite detection system integrated with a deep learning algorithm. Parasit Vectors. 2021;14(1):89. doi:10.1186/s13071-021-04591-y
13. Shouval R, Fein JA, Savani B, Mohty M, Nagler A. Machine learning and artificial intelligence in haematology. Br J Haematol. 2021;192(2):239–250.
14. Wencke W, Haferlach C, Nadarajah N, et al. How artificial intelligence might disrupt diagnostics in hematology in the near future. Oncogene. 2021;40(25):4271–4280. doi:10.1038/s41388-021-01861-y
15. Pohlkamp C, Jhalani K, Nadarajah N, et al. Machine learning (ML) can successfully support microscopic differential counts of peripheral blood smears in a high throughput hematology laboratory. Blood. 2020;136(suppl 1):45–46. doi:10.1182/blood-2020-140215
16. Im H, Pathania D, McFarland PJ, et al. Design and clinical validation of a point-of-care device for the diagnosis of lymphoma via contrast-enhanced microholography and machine learning. Nat Biomed Eng. 2018;2(9):666–674. doi:10.1038/s41551-018-0265-3
17. Ko BS, Wang YF, Li JL, et al. Clinically validated machine learning algorithm for detecting residual diseases with multicolor flow cytometry analysis in acute myeloid leukemia and myelodysplastic syndrome. EBioMedicine. 2018;37:91–100. doi:10.1016/j.ebiom.2018.10.042
18. Boldú L, Merino A, Alférez S, Molina A, Acevedo A, Rodellar J. Automatic recognition of different types of acute leukaemia in peripheral blood by image analysis. J Clin Pathol. 2019;72(11):755–761.
19. Makhija K, Lincz LF, Attalla K, Scorgie FE, Enjeti AK, Prasad R. White blood cell evaluation in haematological malignancies using a web-based digital microscopy platform. Int J Lab Hematol. 2021;43(6):1379–1387.
20. Paterson S. Core investigation and laboratory techniques. In: Jackson H, Marsella R, eds. BSAVA Manual of Canine and Feline Dermatology. 4th ed. British Small Animal Veterinary Association; 2022:24–31.
21. Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: unified, real-time object detection. arXiv. Preprint posted online June 8, 2015. doi:10.48550/arXiv.1506.02640
22. Zuraw A, Aeffner F. Whole-slide imaging, tissue image analysis, and artificial intelligence in veterinary pathology: an updated introduction and review. Vet Pathol. 2022;59(1):6–25. doi:10.1177/03009858211040484
23. Mandal S, Basharat Mones SM, Das A, Balas VE, Nath Shaw R, Ghosh A. Single shot detection for detecting real-time flying objects for unmanned aerial vehicle. In: Nath Shaw R, Ghosh A, Balas VE, Bianchini M, eds. Artificial Intelligence for Future Generation. Elsevier; 2021:37–53.
24. Harvey JW. Veterinary Hematology: A Diagnostic Guide and Color Atlas. Saunders; 2012:11–32.
25. Riley RS, Ben-Ezra JM, Goel R, Tidwell A. Reticulocytes and reticulocyte enumeration. J Clin Lab Anal. 2001;15(5):267–294.
26. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; 2024. Accessed March 7, 2022. https://www.r-project.org/
27. Rümke CL. The statistically expected variability in differential leukocyte counting. In: Koepke JA, ed. Differential Leukocyte Counting. College of American Pathologists; 1977:39–45.
28. Houwen B. The differential cell count. Lab Hematol. 2001;7:89–100.
29. Nabity MB, Harr KE, Camus MS, Flatland B, Vap LM. ASVCP guidelines: allowable total error hematology. Vet Clin Pathol. 2018;47(1):9–21.