ABSTRACT
In 2016, the Centers for Disease Control and Prevention (CDC) established surveillance of pregnant women with Zika virus infection and their infants in the U.S. states, territories, and freely associated states. Subject matter experts review data reported from medical records of completed pregnancies to identify findings that meet the surveillance case criteria for Zika-associated birth defects (manual review). The volume of reported data increased over the course of the Zika virus outbreak in the Americas, straining the surveillance system's capacity for manual review. Machine learning was explored as a possible method for predicting case status. Ensemble models (using machine learning algorithms including support vector machines, logistic regression, random forests, k-nearest neighbors, gradient boosted trees, and decision trees) were developed and trained using data collected from January 2016 to October 2017. Because data collection and storage methods differed, models were developed separately for data from the U.S. states, non-Puerto Rico territories, and freely associated states (referred to as the U.S. Zika Pregnancy and Infant Registry [USZPIR]) and for data from Puerto Rico (referred to as the Zika Active Pregnancy Surveillance System [ZAPSS]). The machine learning models demonstrated high sensitivity for identifying cases while potentially reducing the volume of data requiring manual review (USZPIR: 96% sensitivity, 25% reduction in review volume; ZAPSS: 97% sensitivity, 50% reduction in review volume). Machine learning models show potential for identifying cases of Zika-associated birth defects and for reducing the volume of data requiring manual review, which could also benefit other public health emergency responses.
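The trade-off between sensitivity and review volume described above can be illustrated with a minimal sketch. It is not the registry's actual pipeline: the scores, labels, and the function name `pick_threshold` are hypothetical, and the sketch simply assumes that an ensemble has already produced one score per record and that records scoring below a chosen threshold are skipped during manual review.

```python
from typing import List, Tuple


def pick_threshold(
    scores: List[float], labels: List[int], target_sensitivity: float
) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, review_reduction).

    Records with score >= threshold are sent to manual review; the rest
    are skipped. The highest threshold that still reaches the target
    sensitivity is chosen, which maximizes the reduction in review volume.
    """
    n_cases = sum(labels)
    if n_cases == 0:
        raise ValueError("need at least one confirmed case to measure sensitivity")

    # Try each distinct score as a candidate threshold, highest first.
    for threshold in sorted(set(scores), reverse=True):
        flagged = [s >= threshold for s in scores]
        true_pos = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
        sensitivity = true_pos / n_cases
        if sensitivity >= target_sensitivity:
            review_reduction = 1 - sum(flagged) / len(scores)
            return threshold, sensitivity, review_reduction

    raise ValueError("target sensitivity cannot be reached (is it above 1.0?)")


# Hypothetical ensemble scores and expert-assigned case labels (1 = case).
toy_scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]

threshold, sensitivity, reduction = pick_threshold(toy_scores, toy_labels, 1.0)
print(threshold, sensitivity, reduction)  # 0.6 1.0 0.5
```

In this toy data, requiring every case to be flagged still lets half of the records bypass manual review. Lowering the target sensitivity would raise the threshold and increase the reduction, at the cost of missed cases.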