ABSTRACT
BACKGROUND: Clinical babesiosis is diagnosed, and parasite burden is determined, by microscopic inspection of a thick or thin Giemsa-stained peripheral blood smear. However, quantitative analysis by manual microscopy is subject to error. Automated measurement of percent parasitemia in digital microscopic images of peripheral blood smears could therefore improve clinical accuracy relative to the predicate method.

METHODS: Individual erythrocyte images were manually labeled as "parasite" or "normal" and were used to train a model for binary image classification. The best model was then used to calculate percent parasitemia from a clinical validation dataset, and the values were compared to a clinical reference value. Lastly, model interpretability was examined using integrated gradients to identify the pixels most likely to influence classification decisions.

RESULTS: The precision and recall of the model during development testing were 0.92 and 1.00, respectively. In clinical validation, the positive signal returned by the model increased with the mean reference value. However, the model returned 2 highly erroneous false-positive values. Further, the model incorrectly assessed 3 cases well above the clinical threshold of 10%. The integrated gradients suggested potential sources of false positives, including rouleaux formations, cell boundaries, and precipitate, as determining factors in negative erythrocyte images.

CONCLUSIONS: While the model demonstrated highly accurate single-cell classification and correctly assessed most slides, several false positives were highly incorrect. This project highlights the need for integrated testing of machine learning-based models, even when models perform well in the development phase.
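The quantities named above (percent parasitemia, precision, and recall) can be illustrated with a minimal Python sketch. It assumes each erythrocyte image has already been assigned a binary label (1 = "parasite", 0 = "normal"). The function names and the example labels are hypothetical and are not taken from the study; only the 10% clinical threshold comes from the text.

```python
from typing import Sequence, Tuple

# Clinical threshold for percent parasitemia stated in the abstract.
CLINICAL_THRESHOLD_PERCENT = 10.0


def percent_parasitemia(labels: Sequence[int]) -> float:
    """Percentage of erythrocytes labeled as parasitized (1) among all cells."""
    if len(labels) == 0:
        raise ValueError("at least one classified erythrocyte is required")
    return 100.0 * sum(labels) / len(labels)


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for the positive ("parasite") class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical per-cell predictions for one slide (illustrative values only).
slide_predictions = [1, 0, 0, 0, 0, 0, 0, 0]
parasitemia = percent_parasitemia(slide_predictions)
exceeds_threshold = parasitemia > CLINICAL_THRESHOLD_PERCENT
```

Because percent parasitemia is a ratio over all cells on a slide, a small number of false-positive cells on a truly negative slide can push the slide-level value over the threshold even when single-cell precision is high. This is one reason slide-level validation is examined separately from cell-level metrics.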