Of Lyme disease and machine learning in a One Health world

Olaf Berke, PhD (https://orcid.org/0000-0003-3537-0629)
Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, ON, Canada
Centre for Public Health and Zoonoses, University of Guelph, Guelph, ON, Canada
One Health Institute, University of Guelph, Guelph, ON, Canada
Centre for Advancing Responsible and Ethical Artificial Intelligence, University of Guelph, Guelph, ON, Canada

Sarah T. Chan, MSc
Centre for Data Management, Innovation and Analytics, Public Health Agency of Canada, Ottawa, ON, Canada

Armin Orang, PhD (https://orcid.org/0000-0002-4069-9771)
Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, ON, Canada
Open access

Abstract

Objective

Lyme disease is a vector-borne emerging zoonosis in Ontario driven by human population growth and climate change. Lyme disease is also a prime example of the One Health concept. While little can be done to immediately reverse climate change and population growth, public health must resort to health communication as its best option for disease control until an effective vaccine becomes available. Disease surveillance enabling precision public health has an important role in this respect: one of the goals of disease surveillance is to forecast the future burden of disease to inform those who need to know. The goal of this study was to forecast the burden of Lyme disease using automated machine learning and statistical learning approaches.

Methods

Lyme disease reports were retrieved from Ontario’s integrated Public Health Information System surveillance system from January 2005 to December 2023. The reports from January 2005 to December 2021 were used as training data, and reports from January 2022 to December 2023 served as validation data. Forecasts from a seasonal autoregressive integrated moving-average model were used as a benchmark for forecasts from a feed-forward single-layer neural network machine learning algorithm.

Results

The Lyme disease burden in Ontario is predicted to increase dramatically. Neither the neural network nor the seasonal autoregressive integrated moving-average model proved to be generally more accurate.

Conclusions

The increasing burden of human Lyme disease is concerning to public health, further indicating ecosystem changes and challenges for canine health.

Clinical Relevance

Human Lyme disease surveillance provides useful information to veterinarians.

Lyme disease (LD) is an emerging vector-borne zoonosis in the US and Canada caused by the bacterium Borrelia burgdorferi. Lyme disease is transmitted between hosts by its vector, the black-legged tick (Ixodes scapularis).1 Furthermore, LD presents a prime example of the importance of the One Health concept and approach.

The emergence of LD has various causes, all of which are related to one root cause: human population growth.2 On the one hand, population growth directly increases the frequency, and thus the risk, of contact between humans and the tick vector, which results in increasing incidence. On the other hand, human population growth drives climate change, leading to shifts in vector habitats and land-use patterns, which result in the geographic emergence of the tick vector and the spread of LD.

Furthermore, the range of the tick vector is driven by forestation and deer presence. During the colonization of North America and the related decline in forestation and deer population, ticks are presumed to have retreated to 2 ecological niches in the northern US: the coastal area of Massachusetts and the north of Wisconsin.3 The authors report that by the year 2000, the deer population had rebounded to its presumed precolonial strength. A warmer climate likely facilitated the subsequent geographic spread of ticks into Canada.1,3

The temporal emergence of LD in Ontario has been discussed in the literature1 and is visualized in Figure 1, which displays a time series plot of reported monthly LD cases increasing from fewer than 20 monthly cases in 2005 to a maximum of 688 in 1 month of 2023.

Figure 1

Time series plot of reported Lyme disease (LD) cases in Ontario from 2005 to 2023 (blue line) with an overlaid trend line (red line).

Citation: American Journal of Veterinary Research 86, S1; 10.2460/ajvr.24.10.0300

Lyme disease was first reported in the city of Lyme, CT, in 1977 and linked to changing land-use patterns, specifically the urbanization of former farmland.4 As cities grew and new development areas encroached on ecological niches, the contact rate increased between humans and wildlife populations such as deer and rodents, ie, the hosts of the black-legged tick vector.

The life cycle of the tick vector is important to note. It typically lasts for 2 years, while the ticks go through 4 life stages from eggs to larvae, nymphs, and adults. The life cycle involves several host species because ticks need a blood meal at each life stage.5 The host range includes mainly deer and small rodents. However, it can also include various mammal, avian, and amphibian species. Humans and their companion dogs are mostly infected by nymphs during spring and are recorded as LD cases in summer1; cats rarely contract LD.

Lyme disease differs between North America and Europe. While on both continents the infection is transmitted by ticks of the genus Ixodes, the bacterial agent differs. In Europe, Borrelia afzelii and Borrelia garinii are the causal agents of LD, whereas in North America LD infections are caused by B burgdorferi.6

Control of LD is a complex issue. It is difficult to reverse human population growth and stop climate change. The United Nations Program of Sustainable Development Goals is an attempt to reduce the world’s population via education specifically of women to reduce the fertility rate and thus population growth to achieve climate mitigation.7

Lyme disease is not a vaccine-preventable disease and is thus difficult to control. Public health can only intervene using public communication and education programs about LD. While much about the ecology of LD is known and understood, the control of any emerging disease is a moving target. Therefore, public health must rely on information from disease surveillance to inform the stakeholders in an attempt to affect modifiable risk factors including the use of protective clothing and tick repellents followed by quick tick removal from the human body and that of their pets. One purpose of disease surveillance is to forecast the burden of disease to inform public health awareness and programming for disease control. While surveillance systems for human infections exist, similar reporting systems for companion animals are lacking. Thus the veterinary community relies on reports from diagnostic labs, which cannot be considered representative but can be biased in various ways. Reversing the typical “dogs as sentinels for humans” approach8 and employing “humans as sentinels for dogs”9 are ways to learn about the trends in companion dog LD risk.

The application of time series analysis, and specifically of the seasonal autoregressive integrated moving-average (SARIMA) model, to forecast disease incidence reported by disease surveillance systems has long been advocated.10 A recent example is a study11 of the effectiveness of dog rabies control efforts in Bangladesh using a One Health approach. Forecasts from a SARIMA model predict the eradication of dog rabies by 2030.

The rise of AI and machine learning algorithms for the analysis of big data has found application in time series forecasting. Time series data collected in disease surveillance systems are big data due to their dynamic nature, which lets them grow ever larger over time. This dynamic situation calls for automated data analysis. Machine learning algorithms (eg, artificial neural networks) are naturally adaptable to automation. Fitting statistical time series models (eg, the SARIMA model) can also be automated, but the lack of diagnostic checking of model assumptions might bias the analysis and forecasts. Recent studies12–14 compare the accuracy of forecasts from machine learning algorithms to those from the statistical SARIMA model. In terms of forecasting accuracy, these studies generally found no advantage of one approach over the other, ie, of machine learning algorithms over traditional statistical models or vice versa.

Therefore, the goal of this study was to explore how AI or machine learning algorithms can support LD surveillance by automating time series forecasting and furthering precision public health using case records from Public Health Ontario.

The specific objectives were to (1) employ automated fitting of a SARIMA model as a benchmark for LD forecasting; (2) use an automated machine learning algorithm, ie, an artificial neural network, for time series forecasting; and (3) compare the accuracy of the respective forecasts from the SARIMA model and the AI-driven neural network.

Methods

The data for this study were the LD case counts as reported to and retrieved from Public Health Ontario’s integrated Public Health Information System surveillance system for January 2005 to December 2023.15 For the analysis, the data were split into training and validation data sets, 2005 to 2021 and 2022 to 2023, respectively.
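The split described above can be sketched in a few lines. This is a minimal illustration only: the function names are hypothetical, and the counts are placeholder zeros rather than the surveillance data.

```python
from datetime import date


def month_range(start_year, end_year):
    """First day of every month from January start_year to December end_year."""
    return [date(y, m, 1) for y in range(start_year, end_year + 1) for m in range(1, 13)]


def split_by_year(months, counts, last_training_year):
    """Split a monthly series into training and validation parts by calendar year."""
    pairs = list(zip(months, counts))
    train = [(d, c) for d, c in pairs if d.year <= last_training_year]
    valid = [(d, c) for d, c in pairs if d.year > last_training_year]
    return train, valid


months = month_range(2005, 2023)
counts = [0] * len(months)  # placeholder values only, NOT the reported case counts
train, valid = split_by_year(months, counts, last_training_year=2021)
print(len(months), len(train), len(valid))
```

With 19 years of monthly data, this yields 204 training months (2005 to 2021) and 24 validation months (2022 to 2023).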

For temporal exploration, a time series plot was inspected. Further, a decomposition into potential trend and seasonal components using the seasonal and trend decomposition using Loess (STL) method was applied.16 More specifically, the seasonal component was modeled as time invariant, while the trend component was extracted using a 75-month-long window.
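To illustrate the idea of separating a trend from a time-invariant seasonal component, the following sketch uses a classical additive decomposition with a centered moving average. It is not the STL method used in the study (STL relies on Loess smoothing and, here, a 75-month trend window); it is a simpler stand-in, and the function names and the synthetic series are assumptions made for illustration.

```python
def centered_moving_average(y, period=12):
    """Centered 2 x period moving average (assumes an even period).

    Entries are None where the window does not fit inside the series.
    """
    half = period // 2
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half:t + half + 1]
        # The two end points get half weight so the window spans one full period.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend


def decompose(y, period=12):
    """Additive decomposition: y = trend + seasonal + remainder.

    Assumes the series covers at least two full periods.
    """
    trend = centered_moving_average(y, period)
    sums = [0.0] * period
    hits = [0] * period
    for t, tr in enumerate(trend):
        if tr is not None:
            sums[t % period] += y[t] - tr
            hits[t % period] += 1
    # One seasonal effect per position in the cycle, ie, time invariant.
    seasonal = [s / h for s, h in zip(sums, hits)]
    centre = sum(seasonal) / period
    seasonal = [s - centre for s in seasonal]
    remainder = [
        None if tr is None else y[t] - tr - seasonal[t % period]
        for t, tr in enumerate(trend)
    ]
    return trend, seasonal, remainder


# Synthetic example: linear trend plus a fixed monthly pattern that sums to zero.
pattern = [-3, -2, -1, 0, 2, 6, 9, 5, 1, -4, -6, -7]
series = [10 + 0.5 * t + pattern[t % 12] for t in range(72)]
trend, seasonal, remainder = decompose(series)
print([round(s, 6) for s in seasonal])
```

On this synthetic series the recovered seasonal component matches the monthly pattern that was built in, because the moving average reproduces a linear trend exactly.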

Modeling is generally based on a preprocessed time series. Preprocessing can include centering and scaling of the training data, eg, removing a trend via differencing (the "integrated" part of the SARIMA model) and stabilizing the variance via a Box-Cox transformation.

As a traditional modeling approach for time series, the SARIMA model was applied.17 The SARIMA model was fit using an automated algorithm that optimizes the Akaike information criterion. Automation of the SARIMA modeling was further conditional on results from the STL analysis, ie, the detection of trend and seasonal components. In addition, a single hidden layer artificial feedforward neural network (ANN) was used as an example of a machine learning algorithm for comparison. The ANN learned the weights in its hidden layer from the observed training data using 100 runs with random starting values. Forecasts were obtained iteratively as one-step-ahead forecasts. Prediction intervals for the forecasts were simulated using bootstrapped residuals. The accuracy of the forecasts from the statistical and machine learning approaches was measured using the mean absolute error of prediction (MAE) and the root mean squared error of prediction (RMSE). Further details regarding the SARIMA and ANN algorithms as well as the accuracy measures are presented by the authors of the R package forecast.18
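The two accuracy measures named above can be written out directly. The sketch below is a minimal illustration with hypothetical numbers, not the study data; the function names are assumptions.

```python
import math


def mae(observed, forecast):
    """Mean absolute error of prediction."""
    if len(observed) != len(forecast):
        raise ValueError("observed and forecast must have the same length")
    return sum(abs(o - f) for o, f in zip(observed, forecast)) / len(observed)


def rmse(observed, forecast):
    """Root mean squared error of prediction."""
    if len(observed) != len(forecast):
        raise ValueError("observed and forecast must have the same length")
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, forecast)) / len(observed))


observed = [10, 20, 40, 30]  # hypothetical monthly counts
forecast = [12, 18, 30, 30]  # hypothetical forecasts
print(mae(observed, forecast))   # 3.5
print(rmse(observed, forecast))  # square root of 27, about 5.2
```

Because squaring gives large misses more weight, the RMSE is never smaller than the MAE; a single poorly forecast peak month raises the RMSE more than the MAE.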

Results

The data for this study comprise a total of 10,936 reported cases over 228 months from January 2005 to December 2023, ie, 19 years. The median number of LD cases per month was 13 (range, 0 to 688 cases). The STL indicated the presence of both a trend and a seasonal component. The increasing trend is also visualized in Figure 1. Furthermore, the seasonal swing increased with the trend, providing a rationale for applying a variance-stabilizing data transformation.

Due to zero counts in the early years, a constant of one-quarter of a case (0.25) was added to the case counts to allow for a log transformation. Following analysis, all results were back-transformed to the observational scale.
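The transformation and its reversal can be sketched as follows, assuming the added constant is 0.25 (one reading of the text) and the natural logarithm. The function names and example counts are hypothetical.

```python
import math

OFFSET = 0.25  # assumed constant added so that zero counts can be log-transformed


def to_log_scale(counts, offset=OFFSET):
    """Shift the counts by the offset and take the natural logarithm."""
    return [math.log(c + offset) for c in counts]


def to_count_scale(values, offset=OFFSET):
    """Invert to_log_scale: exponentiate and subtract the offset."""
    return [math.exp(v) - offset for v in values]


counts = [0, 13, 688]  # illustrative values only
logged = to_log_scale(counts)
restored = to_count_scale(logged)
print(logged)
print(restored)
```

Forecasts and prediction-interval limits computed on the log scale would be passed through the same back-transformation before being compared with the observed counts.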

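The models as fit by their automated fitting algorithms are as follows: SARIMA(p,d,q)(P,D,Q)[s] = SARIMA(0,1,1)(1,1,2)[12] and ANN(p,P,k)[s] = ANN(4,1,3)[12], where the bracketed s = 12 denotes the seasonal period in months. For the ANN, p and P are the numbers of nonseasonal and seasonal lags used as inputs, and k is the number of nodes in the hidden layer. The forecast accuracies for the 2 models and 2 forecast horizons (12 months of 2022 and 24 months of 2022 to 2023) are reported in Table 1.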
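Written out with the backshift operator B (where B y_t = y_{t-1}), the two model structures take the following general form. This is a sketch under common textbook conventions: the sign convention for the moving-average terms and all coefficient symbols are assumptions, since only the model orders are reported here, and y_t denotes the series on the transformed (log) scale. The sigmoid hidden layer with a linear output is likewise the usual setup for this kind of network rather than a detail stated in the text.

```latex
% SARIMA(0,1,1)(1,1,2) with seasonal period 12
(1 - \Phi_1 B^{12})\,(1 - B)\,(1 - B^{12})\, y_t
  = (1 + \theta_1 B)\,(1 + \Theta_1 B^{12} + \Theta_2 B^{24})\, \varepsilon_t

% ANN(4,1,3) with seasonal period 12:
% inputs are 4 nonseasonal lags and 1 seasonal lag, with 3 hidden nodes
x_t = (y_{t-1},\, y_{t-2},\, y_{t-3},\, y_{t-4},\, y_{t-12})
\hat{y}_t = b_0 + \sum_{j=1}^{3} w_j \,
  \sigma\!\Big(a_j + \sum_{i=1}^{5} v_{ij}\, x_{t,i}\Big),
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}
```

The SARIMA structure has 4 coefficients to estimate, while the network has 5 x 3 input weights, 3 hidden-node intercepts, 3 output weights, and 1 output intercept, ie, 22 weights.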

Table 1

Accuracy measures for the seasonal autoregressive integrated moving-average (SARIMA) and artificial feedforward neural network (ANN) forecasts using forecast horizons of 1 year (2022) and 2 years (2022 to 2023).

Model                       Measure   2022    2022–2023
SARIMA(0,1,1)(1,1,2)[12]    MAE        35*       92
SARIMA(0,1,1)(1,1,2)[12]    RMSE       80*      131
ANN(4,1,3)[12]              MAE        52        38*
ANN(4,1,3)[12]              RMSE      107        57*

MAE = Mean absolute error of prediction. RMSE = Root mean squared error of prediction. The bracketed 12 is the seasonal period in months. Minimal values for each accuracy measure and forecast horizon are marked with an asterisk.

The forecasts of the future LD reports based on the SARIMA and ANN models are visualized together with the truly observed monthly LD frequency in Figure 2.

Figure 2

Time series plot of the monthly reported LD case counts (black line) from 2020 to 2023, with forecasts for the years 2022 and 2023 from the seasonal autoregressive integrated moving-average and artificial feedforward neural network models (blue and orange lines, respectively) using training data from 2005 to and including 2021.

Comparing the forecast accuracy measures (which should be minimized to indicate more accurate predictions), it appears that in the current example, the SARIMA model is more favorable for 1 year of monthly forecasts, whereas the ANN has an advantage considering the 2-year forecast horizon.

Discussion

The goal of this study was to explore automated time series forecasting with a traditional statistical approach, the SARIMA model, and to compare it to an AI algorithm, ie, a neural network. Comparing their forecasting accuracy in terms of the root mean squared error of prediction is more of academic interest, whereas the MAE can be easily interpreted as the average error of the forecasts on the scale of the observed time series, ie, the reported LD case counts. Here the accuracy is at a level of MAE = 51 cases for the ANN when forecasts for 12 to 24 months are considered but varies for the SARIMA from 30 to 88 cases. For reference, the observed LD case count varied during 2022 to 2023 from 16 to 688. It can be concluded from Figure 2 that the forecasts of both models are reasonably accurate and only stray from the truth during the summer peaks.

Furthermore, the study shows that neither model is always better at forecasting than the other. This is a reflection of the “no free lunch theorem” of machine learning,19 which says that no generally optimal method exists; rather, predictive accuracy is situational. A promising strategy going forward is to employ so-called ensemble forecasts, which average forecasts from an ensemble of models. Unless a single model is always the best at forecasting, it must be assumed that an average over the forecasts from a set or an ensemble of models will be more accurate. The strategy is successful in situations where over- and underprediction effects can be averaged out over the models in the ensemble.
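The averaging-out effect can be seen in a small worked example. The sketch below uses hypothetical observed counts and two hypothetical models, one that overpredicts and one that underpredicts; none of the numbers come from the study, and the function names are assumptions.

```python
def mae(observed, forecast):
    """Mean absolute error of prediction."""
    return sum(abs(o - f) for o, f in zip(observed, forecast)) / len(observed)


def ensemble_mean(*forecasts):
    """Unweighted average of several forecasts, month by month."""
    return [sum(values) / len(values) for values in zip(*forecasts)]


observed = [100, 200, 300]  # hypothetical counts
model_a = [120, 230, 330]   # hypothetical model that overpredicts
model_b = [90, 180, 260]    # hypothetical model that underpredicts

ensemble = ensemble_mean(model_a, model_b)
print(ensemble)                 # [105.0, 205.0, 295.0]
print(mae(observed, model_a))   # about 26.7
print(mae(observed, model_b))   # about 23.3
print(mae(observed, ensemble))  # 5.0
```

Here the errors have opposite signs and largely cancel. When all models err in the same direction, the ensemble MAE is still no larger than the average of the individual MAEs, but it is not guaranteed to beat the best single model.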

Finally, the study shows that LD is emerging at increasing speed in Ontario (Figure 1). This is deeply concerning and can be considered an indication of a shift in the ecosystem due to climate change or increasing contact with the tick vector. This result also indicates that beyond the human population, their companion dogs are at an equally increasing risk for LD infections. A closer look at Figure 1 also shows a curious 2-year cycle with elevated summer peaks compared to the previous year. This pattern, if confirmed in other jurisdictions, might be related to a 2-year lifecycle of the tick vector and thus of interest to One Health stakeholders.

In conclusion, all One Health stakeholders should be concerned about the increasing trend in LD reports. Time series forecasting can accurately predict future caseloads and support timed public health messaging as the only control measure until vaccines against LD become available.

Acknowledgments

None reported.

Disclosures

The authors have nothing to disclose. No AI-assisted technologies were used in the composition of this manuscript.

Funding

The research was funded by the authors’ departments.

References

1. Nelder MP, Wijayasri S, Russell CB, et al. The continued rise of Lyme disease in Ontario, Canada: 2017. Can Commun Dis Rep. 2018;44(10):231–236. doi:10.14745/ccdr.v44i10a01

2. Spernovasilis N, Markaki I, Papadakis M, Tsioutis C, Markaki L. Epidemics and pandemics: is human overpopulation the elephant in the room? Ethics Med Public Health. 2021;19:100728. doi:10.1016/j.jemep.2021.100728

3. Eisen L, Eisen RJ. Changes in the geographic distribution of the blacklegged tick, Ixodes scapularis, in the United States. Ticks Tick Borne Dis. 2023;14(6):102233. doi:10.1016/j.ttbdis.2023.102233

4. A brief history of Lyme disease in Connecticut. Connecticut State Department of Public Health. Last modified July 1, 2019. Accessed October 15, 2024. https://portal.ct.gov/dph/epidemiology-and-emerging-infections/a-brief-history-of-lyme-disease-in-connecticut

5. Tick lifecycles. CDC. Last modified October 11, 2024. Accessed October 15, 2024. https://www.cdc.gov/ticks/about/tick-lifecycles.html

6. Marques AR, Strle F, Wormser GP. Comparison of Lyme disease in the United States and Europe. Emerg Infect Dis. 2021;27(8):2017–2024. doi:10.3201/eid2708.204763

7. Muttarak R. Population and climate change: decent living for all without compromising climate mitigation. UN Chronicle. April 8, 2024. Accessed January 31, 2025. https://www.un.org/en/population-climate-change-decent-living-all-without-compromising-climate-mitigation

8. Sexton C, Ruple A. Canine sentinels and our shared exposome. Science. 2024;384(6701):1170–1172. doi:10.1126/science.adl0426

9. Rabinowitz P, Scotch M, Conti L. Human and animal sentinels for shared health risks. Vet Ital. 2009;45(1):23–24.

10. Allard R. Use of time-series analysis in infectious disease surveillance. Bull World Health Organ. 1998;76(4):327–333.

11. Ghosh S, Hasan MN, Nath ND, et al. Rabies control in Bangladesh and prediction of human rabies cases by 2030: a One Health approach. Lancet Reg Health Southeast Asia. 2024;27:100452. doi:10.1016/j.lansea.2024.100452

12. Berke O, Trotz-Williams L, de Montigny S. Good times bad times: automated forecasting of seasonal cryptosporidiosis in Ontario using machine learning. Can Commun Dis Rep. 2020;46(6):192–197. doi:10.14745/ccdr.v46i06a07

13. Punyapornwithaya V, Mishra P, Sansamur C, et al. Time-series analysis for the number of foot and mouth disease outbreak episodes in cattle farms in Thailand using data from 2010–2020. Viruses. 2022;14(7):1367. doi:10.3390/v14071367

14. Orang A, Berke O, Poljak Z, Greer AL, Rees EE, Ng V. Forecasting seasonal influenza activity in Canada: comparing seasonal auto-regressive integrated moving average and artificial neural network approaches for public health preparedness. Zoonoses Public Health. 2024;71(3):304–313. doi:10.1111/zph.13114

15. Lyme disease. Public Health Ontario. Last modified May 3, 2024. Accessed October 15, 2024. https://www.publichealthontario.ca/diseases-and-conditions/infectious-diseases/vector-borne-zoonotic-diseases/lyme-disease

16. Cleveland RB, Cleveland WS, McRae JE, Terpenning I. STL: a seasonal-trend decomposition procedure based on Loess. J Off Stat. 1990;6(1):3–73.

17. Box G, Jenkins G. Time Series Analysis: Forecasting and Control. Holden-Day; 1970.

18. Hyndman RJ, Athanasopoulos G. Forecasting: Principles and Practice. 3rd ed. OTexts; 2021. http://OTexts.com/fpp3

19. Wolpert DH. The lack of a priori distinctions between learning algorithms. Neural Comput. 1996;8(7):1341–1390.
