ABSTRACT
CONTEXT: Public health epidemiologists monitor data sources for disease outbreaks and other events of public health concern, but manual review of records to identify cases of interest is slow and labor-intensive and may not reflect evolving data practices. To automatically identify cases from electronic data sources, epidemiologists must use "case definitions" or formal logic that captures the criteria used to identify a record as a case of interest. OBJECTIVE: To establish a methodology for development and evaluation of case definitions. A logical evaluation framework to approach case definitions will allow jurisdictions the flexibility to implement a case definition tailored to their goals and available data. DESIGN: Case definition development is explained as a process with multiple logical components combining free-text and categorical data fields. The process is illustrated with the development of a case definition to identify emergency medical services (EMS) call records related to opioid overdoses in Maryland. SETTING: The Maryland Department of Health (MDH) installation of the Electronic Surveillance System for Early Notification of Community-Based Epidemics (ESSENCE), which began capturing EMS call records in ESSENCE in 2019 to improve statewide coverage of all-hazards health issues. RESULTS: We describe a case definition evaluation framework and demonstrate its application through development of an opioid overdose case definition to be used in MDH ESSENCE. We show the iterative process of development, from defining how a case can be identified conceptually to examining each component of the conceptual definition and then exploring how to capture that component using available data. CONCLUSION: We present a framework for developing and qualitatively assessing case definitions and demonstrate an application of the framework to identifying opioid overdose incidents from MDH EMS data. 
We discuss guidelines to support jurisdictions in applying this framework to their own data and public health challenges to improve local surveillance capability.
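The free-text and categorical components described above can be combined into a short rule. The following Python sketch is purely illustrative and makes several assumptions. The record fields (`narrative`, `primary_impression`, `naloxone_given`), the keyword and negation patterns, and the impression values are all hypothetical. This is not the MDH ESSENCE opioid overdose definition.

```python
import re
from dataclasses import dataclass


@dataclass
class EmsRecord:
    # Hypothetical EMS call fields; real ESSENCE EMS schemas vary by jurisdiction.
    narrative: str
    primary_impression: str
    naloxone_given: bool


# Free-text component: illustrative opioid-related terms (hypothetical list).
OPIOID_TERMS = re.compile(
    r"\b(overdose|od|heroin|fentanyl|opioids?|opiates?|narcan|naloxone)\b", re.IGNORECASE
)
# Negation exclusion: removes phrases such as "denies heroin use" before matching.
NEGATED = re.compile(
    r"\b(denies|no|negative for)\s+(\w+\s+){0,2}(heroin|fentanyl|opioids?|opiates?)\b",
    re.IGNORECASE,
)
# Categorical component: illustrative impression values (hypothetical).
OPIOID_IMPRESSIONS = {"overdose - opioid", "poisoning - opioid"}


def free_text_hit(text: str) -> bool:
    """True if the narrative mentions an opioid term outside a negated phrase."""
    return bool(OPIOID_TERMS.search(NEGATED.sub(" ", text)))


def categorical_hit(rec: EmsRecord) -> bool:
    """True if the coded impression is one of the opioid-related values."""
    return rec.primary_impression.strip().lower() in OPIOID_IMPRESSIONS


def is_case(rec: EmsRecord) -> bool:
    """Case if the categorical field matches, or if the narrative matches
    and naloxone was administered (one of many possible combinations)."""
    return categorical_hit(rec) or (free_text_hit(rec.narrative) and rec.naloxone_given)
```

A free-text hit counts here only when naloxone was also given. That is one way to trade sensitivity for specificity, and the framework described above is a way of evaluating choices like this against the data a jurisdiction actually has.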
Subject(s)
Opiate Overdose , Humans , Maryland/epidemiology , Opiate Overdose/diagnosis , Opiate Overdose/epidemiology , Public Health/methods , Public Health/standards , Population Surveillance/methods , Emergency Medical Services/methods , Emergency Medical Services/standards , Emergency Medical Services/statistics & numerical data
ABSTRACT
BACKGROUND: The opioid epidemic in the United States has devastated the lives of individuals and imposed decades-long opportunity costs on the community. METHODS: We analyzed Emergency Medical Services (EMS) data from the Maryland Department of Health installation of the Electronic Surveillance System for Early Notification of Community-Based Epidemics (ESSENCE) to assess the impact of COVID-19 on EMS call volume and how COVID-19 impacted patients' decisions whether to accept transport to a hospital following an EMS call. RESULTS: The rate of patients accepting transportation via EMS to a hospital emergency department (ED) declined for both opioid-related and non-opioid-related calls from prepandemic (before April 2020) to mid-pandemic (mid-March 2020 to mid-April 2020). The opioid-related call volume increased more from pre- to mid-pandemic for male patients than for female patients, and this "gender gap" had not returned to prepandemic levels by April 2021. CONCLUSION: Consistent with reports from other states, the pandemic worsened the opioid crisis in Maryland, impacting some populations more than others while also decreasing the likelihood that individuals experiencing an opioid-related overdose would seek further medical care following an EMS call.
Subject(s)
COVID-19 , Emergency Medical Services , Opiate Overdose , Humans , Male , Female , United States , COVID-19/epidemiology , Pandemics , Maryland/epidemiology , Analgesics, Opioid , Emergency Service, Hospital
ABSTRACT
We compared the performance of the standard Historical Limits Method (HLM) with that of a modified HLM (MHLM), the Farrington-like Method (FLM), and the Serfling-like Method (SLM) in detecting simulated outbreak signals. We used weekly time series data for 12 infectious diseases from the U.S. Centers for Disease Control and Prevention's National Notifiable Diseases Surveillance System (NNDSS). Data from 2006 to 2010 served as the baseline, and data from 2011 to 2014 were used to test the four detection methods. MHLM outperformed HLM in terms of background alert rate, sensitivity, and alerting delay. On average, SLM and FLM had higher sensitivity than MHLM. Among the four methods, FLM had the highest sensitivity and the lowest background alert rate and alerting delay. Revising or replacing the standard HLM may improve the performance of aberration detection for NNDSS standard weekly reports.
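For readers unfamiliar with the baseline comparison, the following is a minimal Python sketch of the standard HLM in the form commonly published alongside MMWR notifiable-disease figures. It compares the current 4-week total with 15 historical 4-week totals: the comparable, preceding, and following 4-week periods in each of the previous 5 years. An alert is raised when the ratio of the current total to the historical mean exceeds 1 + 2·SD/mean. The function name is illustrative. The modified HLM, FLM, and SLM evaluated in the study are not shown.

```python
from statistics import mean, stdev


def historical_limits(current_4wk_total, baseline_totals):
    """Standard HLM check for one disease and one reporting week.

    baseline_totals: the 15 historical 4-week totals (comparable, preceding,
    and following 4-week periods in each of the previous 5 years).
    Returns (ratio, upper_limit, alert) on the ratio scale.
    """
    if len(baseline_totals) != 15:
        raise ValueError("the standard HLM uses 15 baseline totals")
    mu = mean(baseline_totals)
    if mu == 0:
        raise ValueError("baseline mean is zero; the ratio is undefined")
    sigma = stdev(baseline_totals)        # sample SD of the 15 totals
    ratio = current_4wk_total / mu        # current total relative to history
    upper = 1 + 2 * sigma / mu            # mean + 2 SD, expressed as a ratio
    return ratio, upper, ratio > upper
```

Example: with a baseline mean of 10 and an SD of 2, the upper limit is 1.4. A current 4-week total of 15 (ratio 1.5) would therefore be flagged.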
Subject(s)
Communicable Diseases/epidemiology , Disease Outbreaks , Population Surveillance/methods , Humans , United States/epidemiology
ABSTRACT
National syndromic surveillance systems require optimal anomaly detection methods. For method performance comparison, we injected multi-day signals stochastically drawn from lognormal distributions into time series of aggregated daily visit counts from the U.S. Centers for Disease Control and Prevention's BioSense syndromic surveillance system. The time series corresponded to three syndrome groups: rash, upper respiratory infection, and gastrointestinal illness. We included a sample of facilities that reported data every day and had median daily syndromic counts ⩾1 over the entire study period. We compared the anomaly detection performance of five control chart adaptations, a linear regression model, and a Poisson regression model. We assessed the sensitivity and timeliness of these methods for detecting multi-day signals. At daily background alert rates of 1% and 2%, sensitivity ranged from 24% to 77% and timeliness from 3.3 to 6.1 days. Overall sensitivity and timeliness improved substantially after stratification by weekday versus weekend and holiday. Adjusting the baseline syndromic count by the total number of facility visits gave consistently improved sensitivity and timeliness without stratification, but it provided better performance when combined with stratification. The daily syndrome/total-visit proportion method did not improve performance. In general, alerting based on linear regression outperformed control-chart-based methods. A Poisson regression model achieved the best sensitivity in the series with high-count data.
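The signal-injection step can be illustrated with a short Python sketch. The inputs are assumptions: the total number of injected cases, the lognormal parameters, and the signal duration are hypothetical. Each injected case receives an onset delay drawn from a lognormal distribution. The delays are then binned into days and added to the background series. This is one common way to build such signals, not necessarily the exact procedure used in the study.

```python
import math
import random


def lognormal_signal(total_cases, mu, sigma, max_days, rng):
    """Spread total_cases over max_days using lognormal onset delays (in days).

    mu, sigma: parameters of the underlying normal distribution (hypothetical).
    Delays beyond the last day are truncated to it, so every case is kept.
    """
    signal = [0] * max_days
    for _ in range(total_cases):
        delay = rng.lognormvariate(mu, sigma)
        day = min(int(delay), max_days - 1)
        signal[day] += 1
    return signal


def inject(background, signal, start):
    """Add the signal to a copy of the background series beginning at index start."""
    out = list(background)
    for i, extra in enumerate(signal):
        if start + i < len(out):
            out[start + i] += extra
    return out


rng = random.Random(42)
# Median onset delay of exp(mu) = 2 days; 30 extra visits over a 7-day window.
sig = lognormal_signal(total_cases=30, mu=math.log(2.0), sigma=0.5, max_days=7, rng=rng)
series = inject([20] * 28, sig, start=14)
```

Because cases are drawn one at a time, each simulated outbreak has a slightly different day-by-day shape. Repeated injections with different seeds therefore test a method against a spread of plausible signal profiles.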
Subject(s)
Algorithms , Biosurveillance , Disease Outbreaks , Centers for Disease Control and Prevention, U.S. , Linear Models , Population Surveillance , Sensitivity and Specificity , United States
ABSTRACT
BACKGROUND: A small proportion of high-need patients persistently use the bulk of health care services and incur disproportionate costs. Population health management (PHM) programs often refer to these patients as persistent high utilizers (PHUs). Accurate PHU prediction enables PHM programs to better align scarce health care resources with high-need PHUs while generally improving outcomes. While prior research in PHU prediction has shown promise, the traditional regression methods used in these studies have yielded limited accuracy. OBJECTIVE: We sought to improve PHU prediction with an ensemble approach in a retrospective observational study using insurance claim records. METHODS: We defined a PHU as a patient with health care costs in the top 20% of all patients for 4 consecutive 6-month periods. We used 2013 claims data to predict PHU status in the next 24 months. Our study population included 165,595 patients in the Johns Hopkins Health Care plan, with 8359 (5.1%) patients identified as PHUs in 2014 and 2015. We assessed the performance of several standalone machine learning methods and then of an ensemble approach combining multiple models. RESULTS: The candidate ensemble with complement naïve Bayes and random forest layers produced higher sensitivity and positive predictive value (PPV; 49.0% and 50.3%, respectively) than logistic regression (46.8% and 46.1%, respectively). CONCLUSIONS: Our results suggest that ensemble machine learning can improve prediction of care management needs. Improved PPV implies fewer incorrect referrals of low-risk patients. With the improved sensitivity/PPV balance of this approach, resources may be directed more efficiently to the patients who need them most.
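The PHU definition used above (top 20% of costs in each of 4 consecutive 6-month periods) and the two reported metrics can be expressed compactly. The following Python sketch assumes that per-patient costs are already aggregated by period. The data layout and function names are hypothetical. Ties at the 20% cutoff are handled in one simple way among several possible ones.

```python
def top20_threshold(costs):
    """Cost at or above which a patient is in the top 20% for one period."""
    ranked = sorted(costs, reverse=True)
    k = max(1, int(len(ranked) * 0.2))   # number of patients in the top 20%
    return ranked[k - 1]


def label_phus(cost_by_patient):
    """cost_by_patient: {patient_id: [cost_period1, ..., cost_period4]}.

    A patient is labeled PHU if in the top 20% in every period.
    """
    periods = len(next(iter(cost_by_patient.values())))
    cutoffs = [top20_threshold([c[p] for c in cost_by_patient.values()])
               for p in range(periods)]
    return {pid: all(c[p] >= cutoffs[p] for p in range(periods))
            for pid, c in cost_by_patient.items()}


def sensitivity_ppv(predicted, actual):
    """predicted, actual: dicts mapping patient_id -> bool."""
    tp = sum(predicted[i] and actual[i] for i in actual)
    fn = sum((not predicted[i]) and actual[i] for i in actual)
    fp = sum(predicted[i] and not actual[i] for i in actual)
    sens = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sens, ppv
```

Requiring the top 20% in every period is what makes the label "persistent". A patient with a single very expensive period is not counted, however high that period's cost.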
ABSTRACT
INTRODUCTION: Electronic influenza surveillance systems aid in health surveillance and clinical decision-making within the emergency department (ED). While major advances have been made in integrating clinical decision-making tools within the electronic health record (EHR), tools for sharing surveillance data are often piecemeal, requiring data downloads and manual uploads to shared servers that delay the time from data acquisition to the end-user. Real-time surveillance can help both clinicians and public health professionals recognize circulating influenza earlier in the season and provide ongoing situational awareness. METHODS: We created a prototype, cloud-based, real-time reporting system in two large, academically affiliated EDs that streamed continuous data to a web-based dashboard within hours of specimen collection during the influenza season. Data included influenza test results (positive or negative) coupled with test date, test instrument geolocation, and basic patient demographics. The system provided immediate reporting to frontline clinicians and to local, state, and federal health department partners. RESULTS: We describe the process, infrastructure requirements, and challenges of developing and implementing the prototype system. Key process-related requirements for system development included merging data from the molecular test (GeneXpert) with the hospitals' EHRs, securing data, authorizing/authenticating users, providing permissions for data access, and refining visualizations for end-users. CONCLUSION: In this case study, we effectively integrated multiple data systems at four distinct hospital EDs, relaying data in near real time to hospital-based staff and local and national public health entities, to provide laboratory-confirmed influenza test results during the 2014-2015 influenza season. Future innovations need to focus on integrating the dashboard within the EHR and clinical decision tools.
Subject(s)
Influenza, Human , Cloud Computing , Emergency Service, Hospital , Humans , Influenza, Human/diagnosis , Influenza, Human/epidemiology , Population Surveillance/methods , Seasons
ABSTRACT
Algorithms for identifying public health threats or disease outbreaks are vulnerable to false alarms arising from sudden shifts in health-care utilization or data participation. This paper describes a method of reducing false alerts in automated public health surveillance algorithms, in particular automated syndromic surveillance algorithms, that rely on health-care utilization data. The technique is based on monitoring syndromic counts with reference to a suitable background, or reference, series of counts. We show that the suitability of the background time series for decreasing the false-alarm rate is mathematically related to the mutual information between the random variables representing the syndromic and background time series of counts. The method can be understood as a noise cancellation filter in which one noisy (reference) channel is used to cancel the background noise of the monitored (measured) channel. The issues discussed here may also be relevant to the appropriate use of rates in epidemiology and biostatistics.
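A small numerical illustration shows why a reference series helps. Suppose a utilization shift scales both the syndromic count and the total visit count. A z-score on the raw syndromic count then fires a false alert, whereas a z-score against an expectation built from the reference series does not. The Python sketch below uses a simple proportion-based expectation as a stand-in for the paper's approach. It does not compute the mutual information discussed in the text, the Poisson-style variance is an assumption, and all numbers are hypothetical.

```python
from statistics import mean, pstdev


def raw_z(counts, today):
    """z-score of today's syndromic count against the baseline counts alone."""
    mu, sd = mean(counts), max(pstdev(counts), 1.0)
    return (today - mu) / sd


def referenced_z(counts, totals, today, today_total):
    """z-score after conditioning the expectation on the reference series.

    Expected count = baseline syndromic share of total visits * today's total;
    a Poisson-style variance (equal to the expectation) is assumed.
    """
    share = sum(counts) / sum(totals)
    expected = share * today_total
    return (today - expected) / max(expected, 1.0) ** 0.5


counts = [10, 12, 8, 11, 9, 10, 10]
totals = [100, 120, 80, 110, 90, 100, 100]
# Total visits double (e.g., a new facility joins) and syndromic visits double with them.
print(raw_z(counts, 20), referenced_z(counts, totals, 20, 200))
```

The raw statistic exceeds 8 while the referenced statistic is 0. The more information the reference series carries about the syndromic series, the more of this utilization noise it cancels.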
Subject(s)
Bias , Biosurveillance/methods , Models, Statistical , Algorithms , Biostatistics/methods , Botulism/epidemiology , Computer Simulation , Delivery of Health Care/statistics & numerical data , Automatic Data Processing , Exanthema/epidemiology , Humans , Information Theory , Monte Carlo Method , ROC Curve , Syndrome
ABSTRACT
This paper describes the problem of public health monitoring for waterborne disease outbreaks using disparate evidence from health surveillance data streams and environmental sensors. We present a combined monitoring approach along with examples from a recent project at the Johns Hopkins University Applied Physics Laboratory in collaboration with the U.S. Environmental Protection Agency. The project objective was to build a module for the Electronic Surveillance System for the Early Notification of Community-based Epidemics (ESSENCE) that combines water quality data with health indicator data for the early detection of waterborne disease outbreaks. The basic question in the fused surveillance application is 'What is the likelihood of the public health threat of interest given recent information from available sources of evidence?' From a scientific perspective, we formulate this question in terms of the estimation of positive predictive value customary in classical epidemiology, and we present a solution framework using Bayesian networks (BNs). An overview of the BN approach presents its advantages and disadvantages and the adaptations needed for a fused surveillance capability that is scalable and robust relative to the practical data environment. In the BN project, we built a top-level health/water-quality fusion BN informed by separate waterborne-disease-related networks for the detection of water contamination and human health effects. Elements of the art of developing networks appropriate to this environment are discussed with examples. Results of applying these networks to a simulated contamination scenario are presented.
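The positive-predictive-value formulation can be illustrated with the simplest possible network. A single "waterborne outbreak" node has two child evidence nodes, a health-indicator alert and a water-quality sensor alert, which are assumed to be conditionally independent given the outbreak state. The probabilities in this Python sketch are hypothetical. The fused networks described above are considerably richer.

```python
def posterior_outbreak(prior, evidence):
    """P(outbreak | evidence) for a two-level Bayesian network.

    prior: P(outbreak).
    evidence: list of (observed, p_alert_given_outbreak, p_alert_given_no_outbreak);
    evidence nodes are assumed conditionally independent given the outbreak state.
    """
    joint_yes, joint_no = prior, 1.0 - prior
    for observed, p_yes, p_no in evidence:
        if observed:
            joint_yes *= p_yes
            joint_no *= p_no
        else:
            joint_yes *= 1.0 - p_yes
            joint_no *= 1.0 - p_no
    return joint_yes / (joint_yes + joint_no)


health_alert = (True, 0.8, 0.05)   # hypothetical sensitivity / false-alert rate
water_alert = (True, 0.7, 0.02)    # hypothetical sensor characteristics
print(posterior_outbreak(0.01, [health_alert]))
print(posterior_outbreak(0.01, [health_alert, water_alert]))
```

With these assumed numbers, a health alert alone raises the posterior from 0.01 to about 0.14. Adding a concurrent water-quality alert raises it to about 0.85. This illustrates how corroborating evidence from a second stream sharpens the predictive value of an alert.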
Subject(s)
Biosurveillance/methods , Disease Outbreaks/statistics & numerical data , Environmental Monitoring/methods , Algorithms , Bayes Theorem , Computer Simulation , Decision Support Techniques , Disease/etiology , Health Status Indicators , Humans , Marine Toxins/toxicity , Oxocins/toxicity , Population Surveillance/methods , Predictive Value of Tests , Prevalence , Probability , Water Microbiology , Water Pollution/adverse effects , Water Pollution/analysis
ABSTRACT
BACKGROUND: Automated surveillance systems require statistical methods to recognize increases in visit counts that might indicate an outbreak. In prior work we presented methods to enhance the sensitivity of C2, a commonly used time series method. In this study, we compared the enhanced C2 method with five regression models. METHODS: We used emergency department chief complaint data from the US CDC BioSense surveillance system, aggregated by city (a total of 206 hospitals in 16 cities), from May 2008 to April 2009. Data for six syndromes (asthma, gastrointestinal, nausea and vomiting, rash, respiratory, and influenza-like illness) were used and were stratified by mean count (1-19, 20-49, and ≥50 per day) into 14 syndrome-count categories. We compared the sensitivity for detecting single-day, artificially added increases in syndrome counts. Four modifications of the C2 time series method and five regression models (two linear and three Poisson) were tested. A constant alert rate of 1% was used for all methods. RESULTS: Among the regression models tested, a Poisson model controlling for the logarithm of total visits (i.e., visits both meeting and not meeting a syndrome definition), day of week, and 14-day time period performed best. Among the 14 syndrome-count categories, the time series and regression methods produced approximately the same sensitivity (<5% difference) in six. In six categories the regression method had higher sensitivity (range 6-14% improvement), and in two categories the time series method had higher sensitivity. DISCUSSION: When automated data are aggregated to the city level, a Poisson regression model that controls for total visits produces the best overall sensitivity for detecting artificially added visit counts. This improvement was achieved without increasing the alert rate, which was held constant at 1% for all methods. These findings will improve our ability to detect outbreaks in automated surveillance system data.
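Once a regression model supplies an expected count for the day, the alerting step can be sketched as an upper-tail Poisson test. In this Python sketch, the expected count `mu` is assumed to come from an already fitted model (for example, one with log total visits, day-of-week, and 14-day period terms). The fitting itself is omitted. The 1% threshold mirrors the constant alert rate used in the study.

```python
import math


def poisson_upper_tail(x, mu):
    """P(X >= x) for X ~ Poisson(mu), built up from P(X = 0) term by term.

    Adequate for the moderate daily counts considered here
    (math.exp(-mu) underflows to 0 for mu above roughly 700).
    """
    if x <= 0:
        return 1.0
    term = math.exp(-mu)   # P(X = 0)
    cdf = term
    for k in range(1, x):
        term *= mu / k     # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def poisson_alert(observed, mu, alpha=0.01):
    """Alert when the observed count is improbably high given the model's expectation."""
    return poisson_upper_tail(observed, mu) < alpha
```

For an expected count of 10, an observed count of 12 is unremarkable (tail probability near 0.3), whereas 25 falls far below the 1% threshold. Holding `alpha` fixed across methods is what makes their sensitivities comparable.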
Subject(s)
Communicable Diseases , Disease Outbreaks/prevention & control , Population Surveillance/methods , Bioterrorism/prevention & control , Disease Outbreaks/statistics & numerical data , Humans , Medical Informatics , Models, Statistical , Regression Analysis
ABSTRACT
BACKGROUND: The Electronic Surveillance System for the Early Notification of Community-Based Epidemics (ESSENCE) is a secure web-based tool that enables health care practitioners to monitor health indicators of public health importance for the detection and tracking of disease outbreaks, consequences of severe weather, and other events of concern. The ESSENCE concept began in an internally funded project at the Johns Hopkins University Applied Physics Laboratory, advanced with funding from the State of Maryland, and broadened in 1999 as a collaboration with the Walter Reed Army Institute of Research. Versions of the system have been further developed by the Johns Hopkins University Applied Physics Laboratory in multiple military and civilian programs for the timely detection and tracking of health threats. OBJECTIVE: This study aims to describe the components and development of a biosurveillance system increasingly coordinating all-hazards health surveillance and infectious disease monitoring among large and small health departments, to list the key features and lessons learned in the growth of this system, and to describe the range of initiatives and accomplishments of local epidemiologists using it. METHODS: The features of ESSENCE include spatial and temporal statistical alerting, custom querying, user-defined alert notifications, geographical mapping, remote data capture, and event communications. To expedite visualization, configurable and interactive modes of data stratification and filtering, graphical and tabular customization, user preference management, and sharing features allow users to query data and view geographic representations, time series and data details pages, and reports. These features allow ESSENCE users to gather and organize the resulting wealth of information into a coherent view of population health status and communicate findings among users. 
RESULTS: The resulting broad utility, applicability, and adaptability of this system led to the adoption of ESSENCE by the Centers for Disease Control and Prevention, numerous state and local health departments, and the Department of Defense, both nationally and globally. An open-source version, the Suite for Automated Global Electronic bioSurveillance (SAGES), is available for resource-limited settings worldwide. Resourceful users of the US National Syndromic Surveillance Program ESSENCE have applied it to the surveillance of infectious diseases, severe weather and natural disaster events, mass gatherings, chronic diseases and mental health, and injury and substance abuse. CONCLUSIONS: With emerging high-consequence communicable diseases and other health conditions, the continued user-requirement-driven enhancements of ESSENCE demonstrate an adaptable disease surveillance capability focused on the everyday needs of public health. The challenge of a live system for widely distributed users with multiple different data sources and high throughput requirements has driven a novel, evolving architecture design.
Subject(s)
Epidemics , Public Health , Electronics , Humans , Population Surveillance , Public Health Informatics
ABSTRACT
Feral swine populations in the United States (US) can carry diseases that threaten the health of the domestic swine industry. Routine, near-real-time monitoring for unusual rises in feral swine slaughter condemnations will increase situational awareness and support early detection of potential animal health issues, trends, and emerging diseases. In preparation for adding feral swine to APHIS weekly monitoring, we conducted a descriptive analysis of feral swine slaughter and condemnations. Its aims were to understand the extent of commercial feral swine slaughter at federally inspected slaughter establishments in the US and to determine which condemnation reasons should be included. From 2017 to 2019, 17 establishments across seven states slaughtered 242,198 feral swine. For all 17 establishments combined, feral swine accounted for 63% of slaughtered animals. A total of 23 condemnation reasons were noted: Abscess/Pyemia, Arthritis, Contamination, Deads, Emaciation, General Miscellaneous, Icterus, Injuries, Metritis, Miscellaneous Degenerative & Dropsical Condition, Miscellaneous Inflammatory Diseases, Miscellaneous Parasitic Conditions, Moribund, Nephritis/Pyelitis, Non-ambulatory, Pericarditis, Pneumonia, Residue, Sarcoma, Septicemia, Sexual Odor, Toxemia, and Uremia. Exploratory analysis was conducted to determine which condemnation reasons should be included in weekly monitoring. For most condemnation reasons, weeks of unusually high condemnations were noted. For example, a period of high pneumonia condemnations occurred from December 2, 2018 through February 3, 2019, with a spike on January 6, 2019, and a spike in dead swine occurred on November 3, 2019. Seasonally limited food quality, seasonal variation in the pathogen(s) causing pneumonia, and harsher weather are suspected to have contributed to the higher condemnation rates for pneumonia and dead swine during the winter months. 
Based on condemnation frequencies and on their likely value for situational awareness and early detection of emerging feral swine diseases, the following were selected for weekly monitoring: abscess/pyemia, contamination/peritonitis, deads, emaciation, injuries, miscellaneous parasitic conditions, moribund, pneumonia, and septicemia. Notable increases in condemnation reasons strongly suggestive of foreign animal or emerging diseases should contribute valuable evidence to the overall disease discovery process. This holds when the anomalies are confirmed by follow-up investigation and combined with other types of surveillance.
ABSTRACT
BACKGROUND: A high proportion of health care services are persistently utilized by a small subpopulation of patients. To improve clinical outcomes while reducing costs and utilization, population health management programs often provide targeted interventions to patients who may become persistent high users/utilizers (PHUs). Enhanced prediction and management of PHUs can improve health care system efficiencies and improve the overall quality of patient care. OBJECTIVE: The aim of this study was to detect key classes of diseases and medications among the study population and to assess the predictive value of these classes in identifying PHUs. METHODS: This study was a retrospective analysis of insurance claims data of patients from the Johns Hopkins Health Care system. We defined a PHU as a patient incurring health care costs in the top 20% of all patients' costs for 4 consecutive 6-month periods. We used 2013 claims data to predict PHU status in 2014-2015. We applied latent class analysis (LCA), an unsupervised clustering approach, to identify patient subgroups with similar diagnostic and medication patterns to differentiate variations in health care utilization across PHUs. Logistic regression models were then built to predict PHUs in the full population and in select subpopulations. Predictors included LCA membership probabilities, demographic covariates, and health utilization covariates. Predictive powers of the regression models were assessed and compared using standard metrics. RESULTS: We identified 164,221 patients with continuous enrollment between 2013 and 2015. The mean study population age was 19.7 years, 55.9% were women, 3.3% had ≥1 hospitalization, and 19.1% had 10+ outpatient visits in 2013. A total of 8359 (5.09%) patients were identified as PHUs in both 2014 and 2015. The LCA performed optimally when assigning patients to four probability disease/medication classes. 
Given the feedback provided by clinical experts, we further divided the population into four diagnostic groups for sensitivity analysis: acute upper respiratory infection (URI) (n=53,232; 4.6% PHUs), mental health (n=34,456; 12.8% PHUs), otitis media (n=24,992; 4.5% PHUs), and musculoskeletal (n=24,799; 15.5% PHUs). For the regression models predicting PHUs in the full population, the F1-score classification metric was lower using a parsimonious model that included LCA categories (F1=38.62%) compared to that of a complex risk stratification model with a full set of predictors (F1=48.20%). However, the LCA-enabled simple models were comparable to the complex model when predicting PHUs in the mental health and musculoskeletal subpopulations (F1-scores of 48.69% and 48.15%, respectively). F1-scores were lower than that of the complex model when the LCA-enabled models were limited to the otitis media and acute URI subpopulations (45.77% and 43.05%, respectively). CONCLUSIONS: Our study illustrates the value of LCA in identifying subgroups of patients with similar patterns of diagnoses and medications. Our results show that LCA-derived classes can simplify predictive models of PHUs without compromising predictive accuracy. Future studies should investigate the value of LCA-derived classes for predicting PHUs in other health care settings.
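The F1-score used to compare the models above is the harmonic mean of precision (PPV) and recall (sensitivity). It can also be written directly in terms of confusion-matrix counts, as in this one-function Python sketch.

```python
def f1_score(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall)
          = 2 * TP / (2 * TP + FP + FN), written directly in counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

Take hypothetical counts TP = 40, FP = 10, and FN = 40. These give precision 0.8, recall 0.5, and F1 ≈ 0.615, which is below the arithmetic mean of 0.65. F1 penalizes imbalance between the two components, which suits PHU referral, where both missed patients and incorrect referrals carry costs.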
ABSTRACT
BioSense is a US national system that uses data from health information systems for automated disease surveillance. We studied 4 time-series algorithm modifications designed to improve sensitivity for detecting artificially added data. To test these modified algorithms, we used reports of daily syndrome visits from 308 Department of Defense (DoD) facilities and 340 hospital emergency departments (EDs). At a constant alert rate of 1%, sensitivity was improved for both datasets by using a minimum standard deviation (SD) of 1.0, a 14-28 day baseline duration for calculating mean and SD, and an adjustment for total clinic visits as a surrogate denominator. Stratifying baseline days into weekdays versus weekends to account for day-of-week effects increased sensitivity for the DoD data but not for the ED data. These enhanced methods may increase sensitivity without increasing the alert rate and may improve the ability to detect outbreaks by using automated surveillance system data.
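The modifications described above can be sketched in a single function. This Python sketch assumes the usual C2 layout: a baseline window separated from the current day by a 2-day guard band, and a threshold of 3 standard deviations. The 28-day baseline, the minimum SD of 1.0, and the total-visit adjustment correspond to the modifications named in the text. How the published method applies the surrogate denominator may differ. Here, modeling each baseline day's syndrome share of total visits is an assumption, and total visits are assumed to be nonzero.

```python
from statistics import mean, stdev


def c2_modified(counts, totals, t, baseline=28, guard=2, min_sd=1.0, threshold=3.0):
    """Modified C2 statistic for day t.

    counts, totals: daily syndrome counts and total visits (nonzero), same length.
    Baseline days run from t - guard - baseline to t - guard - 1; the guard band
    keeps the start of an outbreak out of the baseline.
    Returns (z, alert).
    """
    start = t - guard - baseline
    if start < 0:
        raise ValueError("not enough history for the requested baseline")
    idx = range(start, t - guard)
    # Surrogate-denominator adjustment (assumed form): model the syndrome share.
    shares = [counts[i] / totals[i] for i in idx]
    expected = mean(shares) * totals[t]
    # The minimum SD keeps near-constant baselines from producing huge z-scores.
    sd = max(stdev(shares) * totals[t], min_sd)
    z = (counts[t] - expected) / sd
    return z, z > threshold
```

With a flat baseline of 5 syndrome visits out of 100, a day with 10 syndrome visits alerts (z = 5, because the minimum SD applies). The same 10 visits on a day with 200 total visits do not alert, because the denominator adjustment expects 10.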
Subject(s)
Algorithms , Biosurveillance/methods , Communicable Diseases, Emerging/epidemiology , Automation , Disease Outbreaks/statistics & numerical data , Emergency Medical Services/statistics & numerical data , Epidemiologic Methods , Humans , Public Health Informatics/methods , Sensitivity and Specificity , United States/epidemiology
ABSTRACT
This paper discusses further advances in making robust predictions with Holt-Winters forecasts for a variety of syndromic time series behaviors and introduces a control-chart detection approach based on these forecasts. Using three collections of time series data, we compare biosurveillance alerting methods using quantified measures of forecast agreement, signal sensitivity, and time to detection. The study presents practical rules for the initialization and parameterization of biosurveillance time series. Several outbreak scenarios are used for detection comparison. We derive an alerting algorithm from forecasts using Holt-Winters-generalized smoothing for prospective application to daily syndromic time series. The derived algorithm is compared with simple control-chart adaptations and with more computationally intensive regression modeling methods. The comparisons are conducted on background data from both authentic and simulated data streams. Both types of background data include time series that vary widely in mean value and in cyclic or seasonal behavior. Plausible simulated signals are added to the background data for detection performance testing. Signal strengths are chosen so that detection is neither too easy nor too hard to distinguish among the compared methods. Results show that both the sensitivity and the timeliness of the Holt-Winters-based algorithm were comparable or superior to those of the more traditional prediction methods used for syndromic surveillance.
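A compact additive Holt-Winters sketch with a weekly (7-day) season shows the forecast-then-test structure. Each day's one-step-ahead forecast is compared with the observed count, and an alert is raised when the residual exceeds a multiple of a running residual scale. Several choices in this Python sketch are assumptions for illustration, not the initialization and parameterization rules derived in the paper: the smoothing constants, the initialization from the first two weeks, the exponentially weighted residual variance, and the threshold k = 3.

```python
from statistics import mean, pvariance


def holt_winters_alerts(y, alpha=0.4, beta=0.05, gamma=0.15, omega=0.1,
                        k=3.0, min_sd=1.0, m=7):
    """Additive Holt-Winters with period m; returns (forecasts, alerts)
    for days m .. len(y) - 1.

    Level, trend, and seasonal terms are initialized from the first two seasons;
    the residual scale is an exponentially weighted mean of squared residuals.
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons to initialize")
    w1, w2 = y[:m], y[m:2 * m]
    level = mean(w1)
    trend = (mean(w2) - mean(w1)) / m
    season = [v - level for v in w1]            # season[i]: day-of-cycle offset
    var = max(pvariance(y[:2 * m]), min_sd ** 2)  # conservative starting scale
    forecasts, alerts = [], []
    for t in range(m, len(y)):
        s_idx = t % m
        f = level + trend + season[s_idx]        # one-step-ahead forecast
        r = y[t] - f
        sd = max(var ** 0.5, min_sd)
        forecasts.append(f)
        alerts.append(r > k * sd)                # upper-sided control-chart test
        # Standard additive Holt-Winters updates.
        prev_level = level
        level = alpha * (y[t] - season[s_idx]) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[s_idx] = gamma * (y[t] - level) + (1 - gamma) * season[s_idx]
        var = omega * r * r + (1 - omega) * var
    return forecasts, alerts


y = [10, 12, 11, 13, 9, 5, 4] * 8
y[50] += 30                                     # injected single-day signal
forecasts, alerts = holt_winters_alerts(y)
```

The seasonal terms absorb the weekly pattern, so the residuals on ordinary days stay near zero. The injected spike on day 50 is the only alert in this example.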
Subject(s)
Algorithms , Biosurveillance/methods , Biostatistics/methods , Data Interpretation, Statistical , Disease Outbreaks/statistics & numerical data , Humans , Monte Carlo Method , Regression Analysis , Time Factors
ABSTRACT
BACKGROUND: The Centers for Disease Control and Prevention's (CDC's) BioSense system provides near-real time situational awareness for public health monitoring through analysis of electronic health data. Determination of anomalous spatial and temporal disease clusters is a crucial part of the daily disease monitoring task. Our study focused on finding useful anomalies at manageable alert rates according to available BioSense data history. METHODS: The study dataset included more than 3 years of daily counts of military outpatient clinic visits for respiratory and rash syndrome groupings. We applied four spatial estimation methods in implementations of space-time scan statistics cross-checked in Matlab and C. We compared the utility of these methods according to the resultant background cluster rate (a false alarm surrogate) and sensitivity to injected cluster signals. The comparison runs used a spatial resolution based on the facility zip code in the patient record and a finer resolution based on the residence zip code. RESULTS: Simple estimation methods that account for day-of-week (DOW) data patterns yielded a clear advantage both in background cluster rate and in signal sensitivity. A 28-day baseline gave the most robust results for this estimation; the preferred baseline is long enough to remove daily fluctuations but short enough to reflect recent disease trends and data representation. Background cluster rates were lower for the rash syndrome counts than for the respiratory counts, likely because of seasonality and the large scale of the respiratory counts. CONCLUSION: The spatial estimation method should be chosen according to characteristics of the selected data streams. In this dataset with strong day-of-week effects, the overall best detection performance was achieved using subregion averages over a 28-day baseline stratified by weekday or weekend/holiday behavior. 
Changing the estimation method for particular scenarios involving different spatial resolution or other syndromes can yield further improvement.
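The preferred estimation method, subregion averages over a 28-day baseline stratified by weekday versus weekend/holiday, can be sketched together with the Poisson log-likelihood ratio at the core of a space-time scan statistic. In this Python sketch, the data layout, the holiday set, and the function names are hypothetical. A full scan would evaluate many candidate cylinders and assess significance by Monte Carlo replication, and both steps are omitted here.

```python
import math
from datetime import date, timedelta


def day_type(d, holidays):
    """'weekend' for Saturday/Sunday or listed holidays, otherwise 'weekday'."""
    return "weekend" if d.weekday() >= 5 or d in holidays else "weekday"


def expected_count(history, region, today, holidays, baseline_days=28):
    """Mean count for region over the previous 28 days of the same day type.

    history: {(region, date): count}; days missing from history count as 0.
    """
    target = day_type(today, holidays)
    days = [today - timedelta(days=i) for i in range(1, baseline_days + 1)]
    same = [d for d in days if day_type(d, holidays) == target]
    return sum(history.get((region, d), 0) for d in same) / len(same)


def poisson_llr(c, e, total_c, total_e):
    """Kulldorff Poisson log-likelihood ratio for one candidate cluster.

    c, e: observed and expected counts inside the cluster; total_c, total_e:
    totals over the study area. Expected counts are rescaled to sum to total_c.
    Returns 0 unless the cluster shows an excess.
    """
    e = e * total_c / total_e
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    outside = (total_c - c) * math.log((total_c - c) / (total_c - e)) if total_c > c else 0.0
    return inside + outside
```

Stratifying the 28-day baseline by day type keeps low weekend counts from dragging down weekday expectations. This is the kind of day-of-week effect the abstract identifies as decisive for this dataset.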
Subject(s)
Biosurveillance/methods , Cluster Analysis , Databases, Factual/standards , Humans
ABSTRACT
BACKGROUND: Surveillance of univariate syndromic data as a means of potential indicator of developing public health conditions has been used extensively. This paper aims to improve the performance of detecting outbreaks by using a background forecasting algorithm based on the adaptive recursive least squares method combined with a novel treatment of the Day of the Week effect. METHODS: Previous work by the first author has suggested that univariate recursive least squares analysis of syndromic data can be used to characterize the background upon which a prediction and detection component of a biosurvellance system may be built. An adaptive implementation is used to deal with data non-stationarity. In this paper we develop and implement the RLS method for background estimation of univariate data. The distinctly dissimilar distribution of data for different days of the week, however, can affect filter implementations adversely, and so a novel procedure based on linear transformations of the sorted values of the daily counts is introduced. Seven-days ahead daily predicted counts are used as background estimates. A signal injection procedure is used to examine the integrated algorithm's ability to detect synthetic anomalies in real syndromic time series. We compare the method to a baseline CDC forecasting algorithm known as the W2 method. RESULTS: We present detection results in the form of Receiver Operating Characteristic curve values for four different injected signal to noise ratios using 16 sets of syndromic data. We find improvements in the false alarm probabilities when compared to the baseline W2 background forecasts. CONCLUSION: The current paper introduces a prediction approach for city-level biosurveillance data streams such as time series of outpatient clinic visits and sales of over-the-counter remedies. 
This approach uses RLS filters modified by a correction for the weekly patterns often seen in these data series, and a threshold detection algorithm applied to the residuals of the RLS forecasts. We compare the detection performance of this algorithm with that of the W2 method recently implemented at CDC. The modified RLS method gives consistently better sensitivity at multiple background alert rates, and we recommend that it be considered for routine application in biosurveillance systems.
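The RLS background estimator and residual-threshold alerting described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an intercept-plus-seven-lags regressor, a forgetting factor `lam` to make the filter adaptive, one-step-ahead forecasts (the paper uses seven-day-ahead predictions), and a simple residual z-score threshold. The names `rls_forecast` and `flag_alerts`, all default parameter values, and the `min_sd` floor are hypothetical, and the paper's sorted-count day-of-week transformation is not reproduced.

```python
from math import sqrt
from typing import List


def rls_forecast(series: List[float], order: int = 7, lam: float = 0.98,
                 delta: float = 100.0) -> List[float]:
    """One-step-ahead forecasts for series[order:] from an adaptive RLS filter.

    Hypothetical sketch. Regressor at time t: [1, y[t-1], ..., y[t-order]].
    lam < 1 is a forgetting factor that discounts old data so the filter can
    track a non-stationary background; P starts as delta * identity.
    """
    n = order + 1
    w = [0.0] * n
    P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]
    forecasts = []
    for t in range(order, len(series)):
        x = [1.0] + [series[t - k] for k in range(1, order + 1)]
        y_hat = sum(wi * xi for wi, xi in zip(w, x))  # made before y[t] is seen
        forecasts.append(y_hat)
        Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = lam + sum(xi * pxi for xi, pxi in zip(x, Px))
        gain = [pxi / denom for pxi in Px]  # k = P x / (lam + x^T P x)
        err = series[t] - y_hat  # a priori prediction error
        w = [wi + gi * err for wi, gi in zip(w, gain)]
        # P <- (P - k x^T P) / lam; x^T P == (P x)^T because P stays symmetric.
        P = [[(P[i][j] - gain[i] * Px[j]) / lam for j in range(n)]
             for i in range(n)]
    return forecasts


def flag_alerts(series: List[float], forecasts: List[float], order: int = 7,
                k: float = 3.0, window: int = 28,
                min_sd: float = 1.0) -> List[bool]:
    """Flag steps whose forecast residual exceeds k standard deviations of the
    preceding `window` residuals (sd floored at the hypothetical min_sd)."""
    residuals = [series[order + i] - f for i, f in enumerate(forecasts)]
    alerts = []
    for i, r in enumerate(residuals):
        hist = residuals[max(0, i - window):i]
        if len(hist) < 2:
            alerts.append(False)  # not enough history to estimate spread
            continue
        m = sum(hist) / len(hist)
        sd = sqrt(sum((h - m) ** 2 for h in hist) / (len(hist) - 1))
        alerts.append((r - m) / max(sd, min_sd) > k)
    return alerts
```

Because the seven lags cover a full week, this plain filter can absorb a stable weekly shape on its own; it does nothing about the day-to-day differences in count distributions that motivated the paper's dedicated correction.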
Subject(s)
Disease Outbreaks, Population Surveillance/methods, Algorithms, Ambulatory Care Facilities, Data Interpretation, Statistical, Forecasting, Humans, Least-Squares Analysis, Military Personnel, Respiratory Tract Infections/epidemiology
ABSTRACT
INTRODUCTION: The Risk Identification Unit (RIU) of the US Dept. of Agriculture's Center for Epidemiology and Animal Health (CEAH) conducts weekly surveillance of national livestock health data and routine coordination with agricultural stakeholders. As part of an initiative to increase the number of species, health issues, and data sources monitored, CEAH epidemiologists are building a surveillance system based on weekly syndromic counts of laboratory test orders in consultation with Colorado State University laboratorians and statistical analysts from the Johns Hopkins University Applied Physics Laboratory. Initial efforts focused on 12 years of equine test records from three state labs. Trial syndrome groups were formed based on RIU experience and published literature. Exploratory analysis, stakeholder input, and laboratory workflow details were needed to modify these groups and filter the corresponding data to eliminate alerting bias. Customized statistical detection methods were sought for effective monitoring based on specialized laboratory information characteristics and on the likely presentation and animal health significance of diseases associated with each syndrome. METHODS: Data transformation and syndrome formation focused on test battery type, test name, submitter source organization, and specimen type. We analyzed time series of weekly counts of tests included in candidate syndrome groups and conducted an iterative process of data analysis and veterinary consultation for syndrome refinement and record filters. This process produced a rule set in which records were directly classified into syndromes using only test name when possible, and otherwise, the specimen type or related body system was used with test name to determine the syndrome. Test orders associated with government regulatory programs, veterinary teaching hospital testing protocols, or research projects, rather than clinical concerns, were excluded. 
We constructed a testbed for sets of 1000 statistical trials and applied a stochastic injection process assuming lognormally distributed incubation periods to choose, for each resulting syndrome, an alerting algorithm with the syndrome-required sensitivity and an alert rate within the specified acceptable range. Alerting performance of the EARS C3 algorithm traditionally used by CEAH was compared with that of modified C2, CUSUM, and EWMA methods, with and without outlier removal and adjustments for the total weekly number of non-mandatory tests. RESULTS: The equine syndrome groups adopted for monitoring were abortion/reproductive, diarrhea/GI, necropsy, neurological, respiratory, systemic fungal, and tickborne. Data scales, seasonality, and variance differed widely among the weekly time series. Removal of mandatory and regulatory tests reduced weekly observed counts significantly, by >80% for the diarrhea/GI syndrome. The RIU group studied outcomes associated with each syndrome and called for detection of single-week signals for most syndromes with expected false-alert intervals of >8 and <52 weeks, 8-week signals for neurological and tickborne monitoring (requiring enhanced sensitivity), 6-week signals for respiratory, and 4-week signals for systemic fungal. Recommended methods, settings, and thresholds were derived from the testbed trials. CONCLUSIONS: Understanding laboratory submission sources, laboratory workflow, and syndrome-related outcomes is crucial for forming syndrome groups for routine monitoring without artifactual alerting. Choices of methods, parameters, and thresholds varied by syndrome and depended strongly on performance requirements specified by veterinary epidemiologists.
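Two ingredients of the testbed described above, lognormal-delay signal injection and EARS-style C2/C3 statistics, might be sketched like this. The injection model (a single exposure at index `start`, with independent lognormal delays binned into weekly steps) is an assumption, as are the function names and the `min_sd` floor. The C2 layout (a 7-step baseline ending 2 steps before the current step) follows the commonly described daily EARS form and is applied to weekly counts here only for illustration; this is not CEAH's modified C2 and does not encode its tuned thresholds. CUSUM and EWMA are not shown.

```python
import random
from statistics import mean, stdev
from typing import List


def inject_lognormal_signal(counts: List[int], start: int, n_cases: int,
                            mu: float, sigma: float, days_per_step: float = 7.0,
                            seed: int = 0) -> List[int]:
    """Return a copy of counts with n_cases synthetic cases added.

    Hypothetical model: each case's delay (days) after index `start` is drawn
    from a lognormal with log-scale mean mu and log-scale sd sigma, then binned
    into steps of days_per_step days (7 = weekly). Cases past the end are dropped.
    """
    rng = random.Random(seed)
    out = list(counts)
    for _ in range(n_cases):
        delay_days = rng.lognormvariate(mu, sigma)
        idx = start + int(delay_days // days_per_step)
        if idx < len(out):
            out[idx] += 1
    return out


def ears_c2(counts: List[float], t: int, baseline: int = 7, guard: int = 2,
            min_sd: float = 0.5) -> float:
    """C2-style statistic: count at t standardized against a baseline window
    that ends `guard` steps before t (sd floored at min_sd)."""
    lo = t - guard - baseline
    if lo < 0:
        raise ValueError("not enough history for the baseline window")
    window = counts[lo:t - guard]
    return (counts[t] - mean(window)) / max(stdev(window), min_sd)


def ears_c3(counts: List[float], t: int, baseline: int = 7, guard: int = 2,
            min_sd: float = 0.5) -> float:
    """C3-style statistic: sum over steps t-2..t of max(0, C2 - 1)."""
    return sum(max(0.0, ears_c2(counts, t - i, baseline, guard, min_sd) - 1.0)
               for i in range(3))
```

In a testbed loop, one would inject a signal into many copies of the historical series, compare each statistic with a candidate threshold, and tally detections and false alerts; the thresholds themselves were derived per syndrome in the study.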
Subject(s)
Clinical Laboratory Techniques/trends, Horse Diseases, Sentinel Surveillance/veterinary, Algorithms, Animals, Clinical Laboratory Techniques/veterinary, Colorado, Disease Outbreaks/veterinary, Horse Diseases/diagnosis, Horses, Population Surveillance
ABSTRACT
OBJECTIVE: Broadly, this research aims to improve outbreak detection performance, and therefore the cost-effectiveness, of automated syndromic surveillance systems by building novel, recombinant temporal aberration detection algorithms from components of previously developed detectors. METHODS: This study decomposes existing temporal aberration detection algorithms into two sequential stages and investigates the individual impact of each stage on outbreak detection performance. The data forecasting stage (Stage 1) generates predictions of time series values a certain number of time steps in the future based on historical data. The anomaly measure stage (Stage 2) compares features of this prediction with corresponding features of the actual time series to compute a statistical anomaly measure. A Monte Carlo simulation procedure is then used to examine the recombinant algorithms' ability to detect synthetic aberrations injected into authentic syndromic time series. RESULTS: New methods built from procedural components of published, sometimes widely used, algorithms were compared with the known methods using authentic datasets with plausible stochastic injected signals. Performance improvements were found for some of the recombinant methods, and these improvements were consistent over a range of data types, outbreak types, and outbreak sizes. For gradual outbreaks, the WEWD MovAvg7+WEWD Z-Score recombinant algorithm performed best; for sudden outbreaks, the HW+WEWD Z-Score algorithm performed best. CONCLUSION: This decomposition was found not only to yield valuable insight into the effects of the aberration detection algorithms but also to produce novel combinations of data forecasters and anomaly measures with enhanced detection performance.
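The two-stage decomposition lends itself to composable functions, so that any Stage 1 forecaster can be paired with any Stage 2 anomaly measure. The sketch below is illustrative only: the moving-average and same-weekday-last-week forecasters and the residual z-score measure are simple stand-ins, and it does not reproduce the WEWD or HW components named in the results. All function names, window lengths, and the `min_sd` floor are hypothetical.

```python
from statistics import mean, stdev
from typing import Callable, Sequence

# Stage 1: predict the value at index t from earlier data.
Forecaster = Callable[[Sequence[float], int], float]
# Stage 2: turn a series, a forecaster, and an index into an anomaly score.
AnomalyMeasure = Callable[[Sequence[float], Forecaster, int], float]


def moving_average_forecaster(window: int = 7, horizon: int = 1) -> Forecaster:
    """Mean of the `window` values ending `horizon` steps before t."""
    def forecast(series: Sequence[float], t: int) -> float:
        return mean(series[t - horizon - window + 1:t - horizon + 1])
    return forecast


def seasonal_naive_forecaster(period: int = 7) -> Forecaster:
    """Value observed one period earlier (e.g. same weekday last week)."""
    def forecast(series: Sequence[float], t: int) -> float:
        return series[t - period]
    return forecast


def residual_zscore(window: int = 28, min_sd: float = 1.0) -> AnomalyMeasure:
    """Residual at t standardized by the preceding `window` residuals."""
    def measure(series: Sequence[float], forecaster: Forecaster, t: int) -> float:
        hist = [series[s] - forecaster(series, s) for s in range(t - window, t)]
        resid = series[t] - forecaster(series, t)
        return (resid - mean(hist)) / max(stdev(hist), min_sd)
    return measure


def recombine(forecaster: Forecaster,
              measure: AnomalyMeasure) -> Callable[[Sequence[float], int], float]:
    """Pair any Stage 1 forecaster with any Stage 2 anomaly measure."""
    def detector(series: Sequence[float], t: int) -> float:
        return measure(series, forecaster, t)
    return detector
```

For example, `recombine(seasonal_naive_forecaster(7), residual_zscore(28))` and `recombine(moving_average_forecaster(7), residual_zscore(28))` are two detectors that share the same Stage 2 and differ only in Stage 1, which is the kind of controlled comparison the decomposition enables.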
Subject(s)
Algorithms, Communicable Diseases/diagnosis, Population Surveillance/methods, Public Health Informatics/methods, Bioterrorism/prevention & control, Communicable Diseases/epidemiology, Disease Outbreaks/prevention & control, Epidemiologic Measurements, Humans, Regression Analysis, Time
ABSTRACT
The primary objective of this ecologic and contextual study is to determine statistically significant short-term associations between air quality (daily ozone and particulate concentrations) and daily general acute care visits by Medicaid patients for asthma exacerbations over 11 years among Washington, DC residents, and to identify regions and populations that may experience increased asthma exacerbations related to air quality. After long-term trends and day-of-week effects were removed from the Medicaid data, Poisson regression was applied to the daily time series. Significant associations were found between asthma-related general acute care visits and ozone concentrations. Significant associations with both ozone and PM2.5 concentrations were observed for 5- to 12-year-olds. While poor air quality was closely associated with asthma exacerbations observed in acute care visits in areas where Medicaid enrollment was high, the strongest associations between asthma-related visits and air quality were not always found in the areas with the highest Medicaid enrollment.
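The regression step can be sketched as a Poisson log-linear model fitted by Newton-Raphson. This sketch assumes that the long-term trend and day-of-week adjustment are supplied as a log offset (the log of an expected daily count) and that a single same-day pollutant concentration is the only covariate; the function name `fit_poisson_log_linear`, the offset approach, and the single-covariate form are assumptions, not the study's model specification.

```python
import math
from typing import List, Optional, Tuple


def fit_poisson_log_linear(y: List[float], x: List[float],
                           offset: Optional[List[float]] = None,
                           max_iter: int = 50,
                           tol: float = 1e-10) -> Tuple[float, float]:
    """Maximum-likelihood fit of log E[y] = b0 + b1 * x + offset (Newton-Raphson).

    Hypothetical sketch: `offset` would carry the log expected count after
    removing trend and day-of-week effects. Returns (b0, b1).
    """
    n = len(y)
    off = offset if offset is not None else [0.0] * n
    # Start from the intercept-only MLE with b1 = 0.
    b0 = math.log(sum(y) / sum(math.exp(o) for o in off))
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi + oi) for xi, oi in zip(x, off)]
        # Score vector (u0, u1) and Fisher information [[i00, i01], [i01, i11]].
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: (d0, d1) = I^{-1} U via the explicit 2x2 inverse.
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1
```

Under this model, `math.exp(b1)` is the estimated multiplicative change in expected daily visits per unit increase in the pollutant concentration.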