Results 1 - 18 of 18
1.
JMIR Med Inform ; 10(3): e33212, 2022 Mar 24.
Article in English | MEDLINE | ID: mdl-35275063

ABSTRACT

BACKGROUND: A small proportion of high-need patients persistently use the bulk of health care services and incur disproportionate costs. Population health management (PHM) programs often refer to these patients as persistent high utilizers (PHUs). Accurate PHU prediction enables PHM programs to better align scarce health care resources with high-need PHUs while generally improving outcomes. While prior research in PHU prediction has shown promise, traditional regression methods used in these studies have yielded limited accuracy. OBJECTIVE: We are seeking to improve PHU predictions with an ensemble approach in a retrospective observational study design using insurance claim records. METHODS: We defined a PHU as a patient with health care costs in the top 20% of all patients for 4 consecutive 6-month periods. We used 2013 claims data to predict PHU status in the next 24 months. Our study population included 165,595 patients in the Johns Hopkins Health Care plan, with 8359 (5.1%) patients identified as PHUs in 2014 and 2015. We assessed the performance of several standalone machine learning methods and then an ensemble approach combining multiple models. RESULTS: The candidate ensemble with complement naïve Bayes and random forest layers produced increased sensitivity and positive predictive value (PPV; 49.0% and 50.3%, respectively) compared to logistic regression (46.8% and 46.1%, respectively). CONCLUSIONS: Our results suggest that ensemble machine learning can improve prediction of care management needs. Improved PPV implies reduced incorrect referral of low-risk patients. With the improved sensitivity/PPV balance of this approach, resources may be directed more efficiently to patients needing them most.
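
The PHU definition in METHODS (top 20% of costs in each of 4 consecutive 6-month periods) can be illustrated with a minimal sketch. The data layout (one dictionary of patient costs per period), the function names, and the rank-based cutoff are assumptions made for illustration; the abstract does not describe how ties or cost preprocessing were handled.

```python
import math

def top_fraction_ids(costs, fraction=0.20):
    """Return the IDs whose cost ranks in the top `fraction` for one period.

    `costs` is a hypothetical {patient_id: cost} mapping for a single period.
    """
    n_top = max(1, math.floor(len(costs) * fraction))
    ranked = sorted(costs, key=costs.get, reverse=True)
    return set(ranked[:n_top])

def label_phus(period_costs, fraction=0.20):
    """Return IDs that are in the top `fraction` in every period supplied.

    `period_costs` is a list of per-period cost mappings, e.g. four
    consecutive 6-month periods.
    """
    top_sets = [top_fraction_ids(costs, fraction) for costs in period_costs]
    return set.intersection(*top_sets)
```

Under this sketch, a patient who drops out of the top 20% in any one of the four periods is not labeled a PHU, which is what makes the label "persistent".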

2.
JMIR Med Inform ; 9(11): e31442, 2021 Nov 25.
Article in English | MEDLINE | ID: mdl-34592712

ABSTRACT

BACKGROUND: A high proportion of health care services are persistently utilized by a small subpopulation of patients. To improve clinical outcomes while reducing costs and utilization, population health management programs often provide targeted interventions to patients who may become persistent high users/utilizers (PHUs). Enhanced prediction and management of PHUs can improve health care system efficiencies and improve the overall quality of patient care. OBJECTIVE: The aim of this study was to detect key classes of diseases and medications among the study population and to assess the predictive value of these classes in identifying PHUs. METHODS: This study was a retrospective analysis of insurance claims data of patients from the Johns Hopkins Health Care system. We defined a PHU as a patient incurring health care costs in the top 20% of all patients' costs for 4 consecutive 6-month periods. We used 2013 claims data to predict PHU status in 2014-2015. We applied latent class analysis (LCA), an unsupervised clustering approach, to identify patient subgroups with similar diagnostic and medication patterns to differentiate variations in health care utilization across PHUs. Logistic regression models were then built to predict PHUs in the full population and in select subpopulations. Predictors included LCA membership probabilities, demographic covariates, and health utilization covariates. Predictive powers of the regression models were assessed and compared using standard metrics. RESULTS: We identified 164,221 patients with continuous enrollment between 2013 and 2015. The mean study population age was 19.7 years, 55.9% were women, 3.3% had ≥1 hospitalization, and 19.1% had 10+ outpatient visits in 2013. A total of 8359 (5.09%) patients were identified as PHUs in both 2014 and 2015. The LCA performed optimally when assigning patients to four probability disease/medication classes. 
Given the feedback provided by clinical experts, we further divided the population into four diagnostic groups for sensitivity analysis: acute upper respiratory infection (URI) (n=53,232; 4.6% PHUs), mental health (n=34,456; 12.8% PHUs), otitis media (n=24,992; 4.5% PHUs), and musculoskeletal (n=24,799; 15.5% PHUs). For the regression models predicting PHUs in the full population, the F1-score classification metric was lower using a parsimonious model that included LCA categories (F1=38.62%) compared to that of a complex risk stratification model with a full set of predictors (F1=48.20%). However, the LCA-enabled simple models were comparable to the complex model when predicting PHUs in the mental health and musculoskeletal subpopulations (F1-scores of 48.69% and 48.15%, respectively). F1-scores were lower than that of the complex model when the LCA-enabled models were limited to the otitis media and acute URI subpopulations (45.77% and 43.05%, respectively). CONCLUSIONS: Our study illustrates the value of LCA in identifying subgroups of patients with similar patterns of diagnoses and medications. Our results show that LCA-derived classes can simplify predictive models of PHUs without compromising predictive accuracy. Future studies should investigate the value of LCA-derived classes for predicting PHUs in other health care settings.
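
The F1-scores reported above combine sensitivity (recall) and positive predictive value (precision). As a hedged illustration of how such a metric is computed from a confusion matrix, the sketch below uses hypothetical count arguments; the counts in the usage note are made up and are not taken from the study.

```python
def classification_metrics(tp, fp, fn):
    """Compute sensitivity, PPV, and F1 from confusion-matrix counts.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    if sensitivity + ppv == 0:
        f1 = 0.0
    else:
        # F1 is the harmonic mean of sensitivity and PPV.
        f1 = 2 * sensitivity * ppv / (sensitivity + ppv)
    return {"sensitivity": sensitivity, "ppv": ppv, "f1": f1}
```

For example, `classification_metrics(tp=50, fp=50, fn=50)` gives sensitivity, PPV, and F1 all equal to 0.5, because the harmonic mean of two equal values is that value.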

3.
Public Health Rep ; 132(1_suppl): 116S-126S, 2017.
Article in English | MEDLINE | ID: mdl-28692395

ABSTRACT

Syndromic surveillance has expanded since 2001 in both scope and geographic reach and has benefited from research studies adapted from numerous disciplines. The practice of syndromic surveillance continues to evolve rapidly. The International Society for Disease Surveillance solicited input from its global surveillance network on key research questions, with the goal of improving syndromic surveillance practice. A workgroup of syndromic surveillance subject matter experts was convened from February to June 2016 to review and categorize the proposed topics. The workgroup identified 12 topic areas in 4 syndromic surveillance categories: informatics, analytics, systems research, and communications. This article details the context of each topic and its implications for public health. This research agenda can help catalyze the research that public health practitioners identified as most important.


Subject(s)
Population Surveillance/methods , Public Health Informatics , Research , Communication , Data Accuracy , Humans , Information Dissemination
5.
PLoS One ; 8(12): e84077, 2013.
Article in English | MEDLINE | ID: mdl-24386335

ABSTRACT

BACKGROUND: The U.S. Department of Veterans Affairs (VA) and Department of Defense (DoD) had more than 18 million healthcare beneficiaries in 2011. Both Departments conduct individual surveillance for disease events and health threats. METHODS: We performed joint and separate analyses of VA and DoD outpatient visit data from October 2006 through September 2010 to demonstrate geographic and demographic coverage, timeliness of influenza epidemic awareness, and impact on spatial cluster detection achieved from a joint VA and DoD biosurveillance platform. RESULTS: Although VA coverage is greater, DoD visit volume is comparable or greater. Detection of outbreaks was better in DoD data for 58% and 75% of geographic areas surveyed for seasonal and pandemic influenza, respectively, and better in VA data for 34% and 15%. The VA system tended to alert earlier with a typical H3N2 seasonal influenza affecting older patients, and the DoD performed better during the H1N1 pandemic which affected younger patients more than normal influenza seasons. Retrospective analysis of known outbreaks demonstrated clustering evidence found in separate DoD and VA runs, which persisted with combined data sets. CONCLUSION: The analyses demonstrate two complementary surveillance systems with evident benefits for the national health picture. Relative timeliness of reporting could be improved in 92% of geographic areas with access to both systems, and more information provided in areas where only one type of facility exists. Combining DoD and VA data enhances geographic cluster detection capability without loss of sensitivity to events isolated in either population and has a manageable effect on customary alert rates.


Subject(s)
Data Mining/methods , Public Health Surveillance , Veterans Health/statistics & numerical data , Veterans/statistics & numerical data , Adolescent , Adult , Aged , Biosurveillance , Child , Child, Preschool , Databases, Factual , Humans , Infant , Infant, Newborn , Influenza A Virus, H1N1 Subtype/physiology , Influenza A Virus, H3N2 Subtype/physiology , Influenza, Human/epidemiology , Middle Aged , Pandemics/statistics & numerical data , Retrospective Studies , Time Factors , United States , United States Department of Defense/statistics & numerical data , United States Department of Veterans Affairs/statistics & numerical data , Young Adult
6.
Disaster Med Public Health Prep ; 5(1): 37-45, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21402825

ABSTRACT

OBJECTIVE: We evaluated emergency department (ED) data, emergency medical services (EMS) data, and public utilities data for describing an outbreak of carbon monoxide (CO) poisoning following a windstorm. METHODS: Syndromic ED data were matched against previously collected chart abstraction data. We ran detection algorithms on selected time series derived from all 3 data sources to identify health events associated with the CO poisoning outbreak. We used spatial and spatiotemporal scan statistics to identify geographic areas that were most heavily affected by the CO poisoning event. RESULTS: Of the 241 CO cases confirmed by chart review, 190 (78.8%) were identified in the syndromic surveillance data as exact matches. Records from the ED and EMS data detected an increase in CO-consistent syndromes after the storm. The ED data identified significant clusters of CO-consistent syndromes, including zip codes that had widespread power outages. Weak temporal gastrointestinal (GI) signals, possibly resulting from ingestion of food spoiled by lack of refrigeration, were detected in the ED data but not in the EMS data. Spatial clustering of GI-based groupings in the ED data was not detected. CONCLUSIONS: Data from this evaluation support the value of ED data for surveillance after natural disasters. Enhanced EMS data may be useful for monitoring a CO poisoning event, if these data are available to the health department promptly.


Subject(s)
Carbon Monoxide Poisoning/epidemiology , Disasters/statistics & numerical data , Emergency Service, Hospital/statistics & numerical data , Wind , Adolescent , Adult , Aged , Algorithms , Child , Child, Preschool , Cluster Analysis , Data Collection/methods , Female , Geography , Humans , Infant , Male , Middle Aged , Population Surveillance/methods , Retrospective Studies , Risk Assessment , Time Factors , Washington/epidemiology , Weather , Young Adult
7.
Stat Med ; 30(14): 1665-77, 2011 Jun 30.
Article in English | MEDLINE | ID: mdl-21432890

ABSTRACT

Algorithms for identifying public health threats or disease outbreaks are vulnerable to false alarms arising from sudden shifts in health-care utilization or data participation. This paper describes a method of reducing false alerts in automated public health surveillance algorithms, and in particular, automated syndromic surveillance algorithms, that rely on health-care utilization data. The technique is based on monitoring syndromic counts with reference to a suitable background, or reference, series of counts. The suitability of the background time series in decreasing the false-alarm rate will be shown to be related mathematically to the so-called mutual information that exists between the random variables representing the syndromic and background time series of counts. The method can be understood as a noise cancellation filter technique in which one noisy (reference) channel is used to cancel the background noise of the monitored (measured) channel. The issues discussed here may also be relevant to the appropriate use of rates in epidemiology and biostatistics.
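
The abstract ties the usefulness of a background series to the mutual information it shares with the syndromic series. A minimal sketch of a plug-in (empirical) mutual information estimate for two equally long sequences of discrete values follows. This is a generic estimator chosen for illustration, not the paper's derivation, and it assumes the counts have already been binned into a small number of categories.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of mutual information (in bits) between two
    equally long sequences of discrete values."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y).
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi
```

A background series that moves in step with the monitored series yields a high value, while an unrelated series yields a value near zero, which is the intuition behind using mutual information to judge the suitability of a reference channel.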


Subject(s)
Bias , Biosurveillance/methods , Models, Statistical , Algorithms , Biostatistics/methods , Botulism/epidemiology , Computer Simulation , Delivery of Health Care/statistics & numerical data , Electronic Data Processing , Exanthema/epidemiology , Humans , Information Theory , Monte Carlo Method , ROC Curve , Syndrome
9.
Stat Med ; 30(5): 470-9, 2011 Feb 28.
Article in English | MEDLINE | ID: mdl-21290403

ABSTRACT

This paper describes the problem of public health monitoring for waterborne disease outbreaks using disparate evidence from health surveillance data streams and environmental sensors. We present a combined monitoring approach along with examples from a recent project at the Johns Hopkins University Applied Physics Laboratory in collaboration with the U.S. Environmental Protection Agency. The project objective was to build a module for the Electronic Surveillance System for the Early Notification of Community-based Epidemics (ESSENCE) to include water quality data with health indicator data for the early detection of waterborne disease outbreaks. The basic question in the fused surveillance application is 'What is the likelihood of the public health threat of interest given recent information from available sources of evidence?' For a scientific perspective, we formulate this question in terms of the estimation of positive predictive value customary in classical epidemiology, and we present a solution framework using Bayesian Networks (BN). An overview of the BN approach presents advantages, disadvantages, and required adaptations needed for a fused surveillance capability that is scalable and robust relative to the practical data environment. In the BN project, we built a top-level health/water-quality fusion BN informed by separate waterborne-disease-related networks for the detection of water contamination and human health effects. Elements of the art of developing networks appropriate to this environment are discussed with examples. Results of applying these networks to a simulated contamination scenario are presented.
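
The question posed above, the likelihood of a threat given the evidence, is framed as a positive predictive value. For a single binary indicator this reduces to Bayes' theorem, sketched below. The function name and the example inputs are illustrative assumptions; the Bayesian Networks described in the paper combine many such indicators and are not reproduced here.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(event | alert) for one binary indicator, via Bayes' theorem."""
    true_alerts = sensitivity * prevalence
    false_alerts = (1.0 - specificity) * (1.0 - prevalence)
    if true_alerts + false_alerts == 0:
        raise ValueError("indicator never alerts under these inputs")
    return true_alerts / (true_alerts + false_alerts)
```

With sensitivity and specificity both 0.9, lowering the prior probability of the event from 0.5 to 0.01 drops the PPV from 0.9 to about 0.083, which shows why rare-event surveillance benefits from fusing several evidence sources.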


Subject(s)
Biosurveillance/methods , Disease Outbreaks/statistics & numerical data , Environmental Monitoring/methods , Algorithms , Bayes Theorem , Computer Simulation , Decision Support Techniques , Disease/etiology , Health Status Indicators , Humans , Marine Toxins/toxicity , Oxocins/toxicity , Population Surveillance/methods , Predictive Value of Tests , Prevalence , Probability , Water Microbiology , Water Pollution/adverse effects , Water Pollution/analysis
10.
J Am Med Inform Assoc ; 16(6): 855-63, 2009.
Article in English | MEDLINE | ID: mdl-19717809

ABSTRACT

This study introduces new information fusion algorithms to enhance disease surveillance systems with Bayesian decision support capabilities. A detection system was built and tested using chief complaints from emergency department visits, International Classification of Diseases Revision 9 (ICD-9) codes from records of outpatient visits to civilian and military facilities, and influenza surveillance data from health departments in the National Capital Region (NCR). Data anomalies were identified and the distribution of time offsets between events in the multiple data streams was established. The Bayesian Network was built to fuse data from multiple sources and identify influenza-like epidemiologically relevant events. Results showed increased specificity compared with the alerts generated by temporal anomaly detection algorithms currently deployed by NCR health departments. Further research should be done to investigate correlations between data sources for efficient fusion of the collected data.


Subject(s)
Data Mining/methods , Decision Support Techniques , Disease Outbreaks/prevention & control , Influenza, Human/prevention & control , Population Surveillance/methods , Algorithms , Bayes Theorem , District of Columbia/epidemiology , Health Status Indicators , Humans , Influenza, Human/epidemiology , Maryland/epidemiology , Virginia/epidemiology
12.
Stat Med ; 28(26): 3226-48, 2009 Nov 20.
Article in English | MEDLINE | ID: mdl-19725023

ABSTRACT

This paper discusses further advances in making robust predictions with the Holt-Winters forecasts for a variety of syndromic time series behaviors and introduces a control-chart detection approach based on these forecasts. Using three collections of time series data, we compare biosurveillance alerting methods with quantified measures of forecast agreement, signal sensitivity, and time-to-detect. The study presents practical rules for initialization and parameterization of biosurveillance time series. Several outbreak scenarios are used for detection comparison. We derive an alerting algorithm from forecasts using Holt-Winters-generalized smoothing for prospective application to daily syndromic time series. The derived algorithm is compared with simple control-chart adaptations and to more computationally intensive regression modeling methods. The comparisons are conducted on background data from both authentic and simulated data streams. Both types of background data include time series that vary widely by both mean value and cyclic or seasonal behavior. Plausible, simulated signals are added to the background data for detection performance testing at signal strengths calculated to be neither too easy nor too hard to separate the compared methods. Results show that both the sensitivity and the timeliness of the Holt-Winters-based algorithm proved to be comparable or superior to that of the more traditional prediction methods used for syndromic surveillance.


Subject(s)
Algorithms , Biosurveillance/methods , Biostatistics/methods , Data Interpretation, Statistical , Disease Outbreaks/statistics & numerical data , Humans , Monte Carlo Method , Regression Analysis , Time Factors
14.
Stat Med ; 26(22): 4202-18, 2007 Sep 30.
Article in English | MEDLINE | ID: mdl-17335120

ABSTRACT

For robust detection performance, traditional control chart monitoring for biosurveillance is based on input data free of trends, day-of-week effects, and other systematic behaviour. Time series forecasting methods may be used to remove this behaviour by subtracting forecasts from observations to form residuals for algorithmic input. We describe three forecast methods and compare their predictive accuracy on each of 16 authentic syndromic data streams. The methods are (1) a non-adaptive regression model using a long historical baseline, (2) an adaptive regression model with a shorter, sliding baseline, and (3) the Holt-Winters method for generalized exponential smoothing. Criteria for comparing the forecasts were the root-mean-square error, the median absolute per cent error (MedAPE), and the median absolute deviation. The median-based criteria showed best overall performance for the Holt-Winters method. The MedAPE measures over the 16 test series averaged 16.5, 11.6, and 9.7 for the non-adaptive regression, adaptive regression, and Holt-Winters methods, respectively. The non-adaptive regression forecasts were degraded by changes in the data behaviour in the fixed baseline period used to compute model coefficients. The mean-based criterion was less conclusive because of the effects of poor forecasts on a small number of calendar holidays. The Holt-Winters method was also most effective at removing serial autocorrelation, with most 1-day-lag autocorrelation coefficients below 0.15. The forecast methods were compared without tuning them to the behaviour of individual series. We achieved improved predictions with such tuning of the Holt-Winters method, but practical use of such improvements for routine surveillance will require reliable data classification methods.
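
The third forecast method and the MedAPE criterion can be sketched as follows. This is a minimal additive Holt-Winters implementation with a 7-day season; the initialization scheme and the smoothing parameters (`alpha`, `beta`, `gamma`) are placeholder assumptions and are not the values or rules used in the paper.

```python
from statistics import median

def holt_winters_one_step(series, period=7, alpha=0.4, beta=0.0, gamma=0.15):
    """One-step-ahead additive Holt-Winters forecasts for series[period:].

    The first `period` observations are used only to initialize the
    level and the seasonal terms (an assumed, simple initialization).
    """
    if len(series) <= period:
        raise ValueError("need more than one full period of data")
    level = sum(series[:period]) / period
    trend = 0.0
    seasonal = [x - level for x in series[:period]]
    forecasts = []
    for t in range(period, len(series)):
        s = seasonal[t % period]
        forecasts.append(level + trend + s)
        y = series[t]
        previous_level = level
        level = alpha * (y - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (y - level) + (1 - gamma) * s
    return forecasts

def medape(observed, forecasts):
    """Median absolute percent error, skipping zero observations."""
    errors = [100.0 * abs(y - f) / abs(y)
              for y, f in zip(observed, forecasts) if y != 0]
    return median(errors)
```

Subtracting these forecasts from the matching observations (`series[period:]`) gives the residuals that the abstract describes as input to control-chart monitoring.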


Subject(s)
Forecasting/methods , Models, Statistical , Population Surveillance , Bias , Health Behavior , Humans , Regression Analysis , Sensitivity and Specificity , Time Factors
15.
Environ Health ; 6: 9, 2007 Mar 21.
Article in English | MEDLINE | ID: mdl-17376237

ABSTRACT

BACKGROUND: The District of Columbia (DC) Department of Health, under a grant from the US Centers for Disease Control and Prevention, established an Environmental Public Health Tracking Program. As part of this program, the goals of this contextual pilot study are to quantify short-term associations between daily pediatric emergency department (ED) visits and admissions for asthma exacerbations with ozone and particulate concentrations, and broader associations with socio-economic status and age group. METHODS: Data included daily counts of de-identified asthma-related pediatric ED visits for DC residents and daily ozone and particulate concentrations during 2001-2004. Daily temperature, mold, and pollen measurements were also obtained. After a cubic spline was applied to control for long-term seasonal trends in the ED data, a Poisson regression analysis was applied to the time series of daily counts for selected age groups. RESULTS: Associations between pediatric asthma ED visits and outdoor ozone concentrations were significant and strongest for the 5-12 year-old age group, for which a 0.01-ppm increase in ozone concentration indicated a mean 3.2% increase in daily ED visits and a mean 8.3% increase in daily ED admissions. However, the 1-4 yr old age group had the highest rate of asthma-related ED visits. For 1-17 yr olds, the rates of both asthma-related ED visits and admissions increased logarithmically with the percentage of children living below the poverty threshold, slowing when this percentage exceeded 30%. CONCLUSION: Significant associations were found between ozone concentrations and asthma-related ED visits, especially for 5-12 year olds. The result that the most significant ozone associations were not seen in the age group (1-4 yrs) with the highest rate of asthma-related ED visits may be related to the clinical difficulty in accurately diagnosing asthma among this age group. 
We observed real increases in relative risk of asthma ED visits for children living in higher poverty zip codes versus other zip codes, as well as similar logarithmic relationships for visits and admissions, which implies ED over-utilization may not be a factor. These results could suggest designs for future epidemiological studies that include more information on individual exposures and other risk factors.
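
The reported effect size (a 0.01-ppm ozone increase associated with a mean 3.2% increase in daily ED visits for 5-12 year olds) can be read as a rate ratio from a log-link Poisson model. The sketch below assumes ozone enters the model as a single linear term, which the abstract does not state explicitly; the function names are illustrative.

```python
import math

def coefficient_from_percent(percent_increase, delta):
    """Log-rate coefficient implied by a percent increase per `delta` units."""
    return math.log(1.0 + percent_increase / 100.0) / delta

def percent_increase_for(beta, delta):
    """Percent change in expected count for a `delta`-unit exposure change."""
    return 100.0 * (math.exp(beta * delta) - 1.0)
```

Under this assumption, `coefficient_from_percent(3.2, 0.01)` is about 3.15 per ppm, and a 0.02-ppm increase would correspond to roughly a 6.5% rise, since effects multiply rather than add on the count scale.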


Subject(s)
Air Pollutants/adverse effects , Asthma/etiology , Emergency Service, Hospital/statistics & numerical data , Poverty , Adolescent , Age Distribution , Air Pollutants/analysis , Asthma/epidemiology , Child , Child, Preschool , District of Columbia/epidemiology , Humans , Infant , Ozone/adverse effects , Ozone/analysis , Seasons
16.
MMWR Suppl ; 54: 55-62, 2005 Aug 26.
Article in English | MEDLINE | ID: mdl-16177694

ABSTRACT

INTRODUCTION: In concert with increased concerns regarding both biologic terrorism and new natural infectious disease threats (e.g., severe acute respiratory syndrome [SARS] and West Nile virus), as a result of advances in medical informatics, various data sources are available to epidemiologists for routine, prospective monitoring of public health. The synthesis of this evidence requires tools to find anomalies within various data stream combinations while maintaining manageable false alarm rates. OBJECTIVES: The objectives of this report are to establish statistical hypotheses to define the compound multivariate problem of surveillance systems, present statistical methods for testing these hypotheses, and examine results of applying these methods to simulated and actual data. METHODS: Canonical problems of parallel monitoring and consensus monitoring are considered in this report. Modified Bonferroni methods are examined for parallel monitoring. Both multiple univariate and multivariate methods are applied for consensus monitoring. A multivariate adaptation of Monte Carlo trials, using the injection of epidemic-curve-like signals in the multiple data streams of interest, is presented for evaluation of the various tests. RESULTS: The Monte Carlo test results demonstrate that the multiple univariate combination methods of Fisher and Edgington provide the most robust detection performance across the scenarios tested. As the number of data streams increases, methods based on Hotelling's T² offer added sensitivity for certain signal scenarios. This potential advantage is clearer when strong correlation exists among the data streams. CONCLUSION: Parallel and consensus monitoring tools must be blended to enable a surveillance system with distributed sensitivity and controlled alert rates. 
Whether a multiple univariate or multivariate approach should be used for consensus monitoring depends on the number and distribution of useful data sources and also on their covariance structure and stationarity. Strong, consistent correlation among numerous sources warrants the examination of multivariate control charts.
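
The two combination methods named in RESULTS can be sketched in their textbook forms. Both assume independent p-values, one per data stream; the function names are illustrative, and any adaptations the report made for correlated streams are not reproduced. `math.comb` requires Python 3.8 or later.

```python
import math

def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(ln p) follows chi-square with 2k df.

    For even degrees of freedom the chi-square tail has a closed form,
    so no statistics library is needed.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total

def edgington_combined_p(p_values):
    """Edgington's additive method: tail probability of the sum of k
    independent uniform p-values (numerically unstable for large k)."""
    k = len(p_values)
    s = sum(p_values)
    total = 0.0
    for j in range(math.floor(s) + 1):
        total += (-1) ** j * math.comb(k, j) * (s - j) ** k
    return total / math.factorial(k)
```

Fisher's method is driven by the smallest p-values, while Edgington's responds to the overall sum, so the two can disagree when a single stream shows a strong signal and the rest do not.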


Subject(s)
Disease Outbreaks/prevention & control , Epidemiologic Measurements , Population Surveillance/methods , Public Health Informatics/instrumentation , Algorithms , Humans , Models, Statistical
17.
MMWR Suppl ; 53: 67-73, 2004 Sep 24.
Article in English | MEDLINE | ID: mdl-15714632

ABSTRACT

INTRODUCTION: Syndromic surveillance systems are used to monitor daily electronic data streams for anomalous counts of features of varying specificity. The monitored quantities might be counts of clinical diagnoses, sales of over-the-counter influenza remedies, school absenteeism among a given age group, and so forth. Basic data-aggregation decisions for these systems include determining which records to count and how to group them in space and time. OBJECTIVES: This paper discusses the application of spatial and temporal data-aggregation strategies for multiple data streams to alerting algorithms appropriate to the surveillance region and public health threat of interest. Such a strategy was applied and evaluated for a complex, authentic, multisource, multiregion environment, including >2 years of data records from a system-evaluation exercise for the Defense Advanced Research Projects Agency (DARPA). METHODS: Multivariate and multiple univariate statistical process control methods were adapted and applied to the DARPA data collection. Comparative parametric analyses based on temporal aggregation were used to optimize the performance of these algorithms for timely detection of a set of outbreaks identified in the data by a team of epidemiologists. RESULTS: The sensitivity and timeliness of the most promising detection methods were tested at empirically calculated thresholds corresponding to multiple practical false-alert rates. Even at the strictest false-alert rate, all but one of the outbreaks were detected by the best method, and the best methods achieved a 1-day median time before alert over the set of test outbreaks. CONCLUSIONS: These results indicate that a biosurveillance system can provide a substantial alerting-timeliness advantage over traditional public health monitoring for certain outbreaks. Comparative analyses of individual algorithm results indicate further achievable improvement in sensitivity and specificity.


Subject(s)
Bioterrorism/prevention & control , Disease Outbreaks/prevention & control , Epidemiologic Measurements , Models, Statistical , Population Surveillance/methods , Algorithms , Data Collection , Humans , United States
18.
J Urban Health ; 80(2 Suppl 1): i57-65, 2003 Jun.
Article in English | MEDLINE | ID: mdl-12791780

ABSTRACT

Researchers working on the Department of Defense Global Emerging Infections System (DoD-GEIS) pilot system, the Electronic Surveillance System for the Early Notification of Community-Based Epidemics (ESSENCE), have applied scan statistics for early outbreak detection using both traditional and nontraditional data sources. These sources include medical data indexed by International Classification of Diseases, 9th Revision (ICD-9) diagnosis codes, as well as less-specific, but potentially timelier, indicators such as records of over-the-counter remedy sales and of school absenteeism. Early efforts employed the Kulldorff scan statistic as implemented in the SaTScan software of the National Cancer Institute. A key obstacle to this application is that the input data streams are typically based on time-varying factors, such as consumer behavior, rather than simply on the populations of the component subregions. We have used both modeling and recent historical data distributions to obtain background spatial distributions. Data analyses have provided guidance on how to condition and model input data to avoid excessive clustering. We have used this methodology in combining data sources for both retrospective studies of known outbreaks and surveillance of high-profile events of concern to local public health authorities. We have integrated the scan statistic capability into a Microsoft Access-based system in which we may include or exclude data sources, vary time windows separately for different data sources, censor data from subsets of individual providers or subregions, adjust the background computation method, and run retrospective or simulated studies.
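
The scoring step of a Kulldorff-style scan statistic can be sketched with the standard Poisson log-likelihood ratio for one candidate zone. This is the textbook form, not SaTScan's code or the authors' implementation. Following the abstract, the expected count `e` is assumed to come from a modeled or historical background rather than from population alone, and to be scaled so that expected counts over the whole region sum to the observed total `C`.

```python
import math

def poisson_scan_llr(c, e, C):
    """Log-likelihood ratio for a zone with observed count `c` and
    expected count `e`, given the region-wide total `C`.

    Only excesses are scored: zones with c <= e return 0.
    """
    if not (0 < e < C) or not (0 <= c <= C):
        raise ValueError("require 0 < e < C and 0 <= c <= C")
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    outside = (C - c) * math.log((C - c) / (C - e)) if c < C else 0.0
    return inside + outside

def most_likely_cluster(zones, C):
    """Pick the highest-scoring zone from {zone_name: (observed, expected)}."""
    return max(zones, key=lambda z: poisson_scan_llr(zones[z][0], zones[z][1], C))
```

In a full scan, the maximum score would then be compared against scores from randomized replicates of the data to obtain a significance level; that Monte Carlo step is omitted here.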


Subject(s)
Disease Outbreaks , Population Surveillance/methods , Public Health Informatics , Bioterrorism , Cluster Analysis , Data Interpretation, Statistical , Hospitals, Military/statistics & numerical data , Humans , International Classification of Diseases , United States/epidemiology