Search | Biblioteca Virtual em Saúde (Virtual Health Library)

Measuring quality of DNA sequence data via degradation.

Karr, Alan F; Hauzel, Jason; Porter, Adam A; Schaefer, Marcel.

PLoS One ; 17(8): e0271970, 2022.

Article in English | MEDLINE | ID: mdl-35921272

ABSTRACT

We formulate and apply a novel paradigm for characterization of genome data quality, which quantifies the effects of intentional degradation of quality. The rationale is that the higher the initial quality, the more fragile the genome and the greater the effects of degradation. We demonstrate that this phenomenon is ubiquitous, and that quantified measures of degradation can be used for multiple purposes, illustrated by outlier detection. We focus on identifying outliers that may be problematic with respect to data quality, but might also be true anomalies or even attempts to subvert the database.
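
The degradation idea can be illustrated with a minimal sketch. It assumes degradation by random base substitution and measures the effect as the shift in the 3-mer frequency profile; both choices, and the names `degrade`, `kmer_profile`, and `degradation_effect`, are hypothetical and are not taken from the paper, whose degradation mechanisms and measures are not given in this abstract.

```python
import random
from collections import Counter

BASES = "ACGT"


def degrade(seq, rate, rng):
    """Replace each base by a different base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def kmer_profile(seq, k=3):
    """Relative frequency of each k-mer occurring in `seq`."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def degradation_effect(seq, rate, rng, k=3):
    """L1 distance between k-mer profiles before and after degradation
    (an illustrative measure, assumed for this sketch)."""
    before = kmer_profile(seq, k)
    after = kmer_profile(degrade(seq, rate, rng), k)
    kmers = set(before) | set(after)
    return sum(abs(before.get(m, 0.0) - after.get(m, 0.0)) for m in kmers)
```

Computing `degradation_effect` for every sequence in a collection at a fixed rate gives one number per sequence; sequences whose values sit far from the rest would be the candidates for outlier review.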

Subjects

Genome; Base Sequence; Databases, Factual

Specified Certainty Classification, with Application to Read Classification for Reference-Guided Metagenomic Assembly.

Karr, Alan F; Hauzel, Jason; Menon, Prahlad; Porter, Adam A; Schaefer, Marcel.

ArXiv ; 2021 Sep 13.

Article in English | MEDLINE | ID: mdl-34545333

ABSTRACT

Specified Certainty Classification (SCC) is a paradigm for employing classifiers whose outputs carry uncertainties, typically in the form of Bayesian posterior probabilities. By allowing the classifier output to be less precise than one of a set of atomic decisions, SCC allows all decisions to achieve a specified level of certainty and provides insights into classifier behavior by examining all decisions that are possible. Our primary illustration is read classification for reference-guided genome assembly, but we demonstrate the breadth of SCC by also analyzing COVID-19 vaccination data.
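
One way to picture a decision that is "less precise than an atomic decision" is sketched below. It assumes that any subset of atomic labels is an admissible decision and picks the smallest subset whose total posterior probability reaches the specified certainty. The function name and the example labels are hypothetical, and the paper may restrict the admissible decisions differently.

```python
def specified_certainty_decision(posterior, certainty):
    """Return (labels, total): the smallest set of atomic labels whose summed
    posterior probability is at least `certainty`, taken greedily from the
    most probable label downward."""
    if not 0.0 < certainty <= 1.0:
        raise ValueError("certainty must lie in (0, 1]")
    ranked = sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for label, prob in ranked:
        chosen.append(label)
        total += prob
        if total >= certainty:
            break
    return chosen, total


# Hypothetical posterior for one read over three reference genomes.
read_posterior = {"genome_A": 0.55, "genome_B": 0.40, "genome_C": 0.05}
labels, total = specified_certainty_decision(read_posterior, 0.90)
```

At a certainty of 0.90 the read is assigned to the pair `genome_A` or `genome_B` rather than forced into a single genome; lowering the requirement to 0.50 yields the atomic decision `genome_A`.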

Comparing record linkage software programs and algorithms using real-world data.

Karr, Alan F; Taylor, Matthew T; West, Suzanne L; Setoguchi, Soko; Kou, Tzuyung D; Gerhard, Tobias; Horton, Daniel B.

PLoS One ; 14(9): e0221459, 2019.

Article in English | MEDLINE | ID: mdl-31550255

ABSTRACT

Linkage of medical databases, including insurer claims and electronic health records (EHRs), is increasingly common. However, few studies have investigated the behavior and output of linkage software. To determine how linkage quality is affected by different algorithms, blocking variables, methods for string matching and weight determination, and decision rules, we compared the performance of 4 nonproprietary linkage software packages linking patient identifiers from noninteroperable inpatient and outpatient EHRs. We linked datasets using first and last name, gender, and date of birth (DOB). We evaluated DOB and year of birth (YOB) as blocking variables and used exact and inexact matching methods. We compared the weights assigned to record pairs and evaluated how matching weights corresponded to a gold standard, medical record number. Deduplicated datasets contained 69,523 inpatient and 176,154 outpatient records, respectively. Linkage runs blocking on DOB produced weights ranging in number from 8 for exact matching to 64,273 for inexact matching. Linkage runs blocking on YOB produced 8 to 916,806 weights. Exact matching matched record pairs with identical test characteristics (sensitivity 90.48%, specificity 99.78%) for the highest ranked group, but algorithms differentially prioritized certain variables. Inexact matching behaved more variably, leading to dramatic differences in sensitivity (range 0.04-93.36%) and positive predictive value (PPV) (range 86.67-97.35%), even for the most highly ranked record pairs. Blocking on DOB led to higher PPV of highly ranked record pairs. An ensemble approach based on averaging scaled matching weights led to modestly improved accuracy. In summary, we found few differences in the rankings of record pairs with the highest matching weights across 4 linkage packages. Performance was more consistent for exact string matching than for inexact string matching. 
Most methods and software packages performed similarly when comparing matching accuracy with the gold standard. In some settings, an ensemble matching approach may outperform individual linkage algorithms.
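
The roles of blocking, exact versus inexact string matching, and matching weights can be sketched as follows. This is a toy illustration, not any of the four packages compared: the record fields, the unit agreement weights, and the use of `difflib.SequenceMatcher` as the inexact comparator are all assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher

COMPARE_FIELDS = ("first", "last", "gender")  # hypothetical field names


def similarity(a, b, inexact):
    """Agreement score in [0, 1] for one field."""
    a, b = a.strip().lower(), b.strip().lower()
    if inexact:
        return SequenceMatcher(None, a, b).ratio()
    return 1.0 if a == b else 0.0


def link(inpatient, outpatient, block_key="dob", inexact=False):
    """Compare only records that share the blocking value and return
    (weight, inpatient_id, outpatient_id) tuples, highest weight first."""
    blocks = defaultdict(list)
    for rec in outpatient:
        blocks[rec[block_key]].append(rec)
    pairs = []
    for left in inpatient:
        for right in blocks.get(left[block_key], []):
            weight = sum(similarity(left[f], right[f], inexact)
                         for f in COMPARE_FIELDS)
            pairs.append((weight, left["id"], right["id"]))
    return sorted(pairs, reverse=True)
```

With exact matching only a handful of distinct weights can occur, while inexact matching produces a near-continuum of weights, which is consistent with the very different weight counts reported above. Blocking on a coarser key such as year of birth enlarges each block and therefore the number of candidate pairs.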

Subjects

Algorithms; Electronic Health Records/statistics & numerical data; Medical Record Linkage/methods; Software; Databases, Factual/statistics & numerical data; Electronic Health Records/standards; Humans; Medical Record Linkage/standards

A Bayesian spatio-temporal approach for real-time detection of disease outbreaks: a case study.

Zou, Jian; Karr, Alan F; Datta, Gauri; Lynch, James; Grannis, Shaun.

BMC Med Inform Decis Mak ; 14: 108, 2014 Dec 05.

Article in English | MEDLINE | ID: mdl-25476843

ABSTRACT

BACKGROUND: For researchers and public health agencies, the complexity of high-dimensional spatio-temporal data in surveillance for large reporting networks presents numerous challenges, which include low signal-to-noise ratios, spatial and temporal dependencies, and the need to characterize uncertainties. Central to the problem in the context of disease outbreaks is a decision structure that requires trading off false positives for delayed detections. METHODS: In this paper we apply a previously developed Bayesian hierarchical model to a data set from the Indiana Public Health Emergency Surveillance System (PHESS) containing three years of emergency department visits for influenza-like illness and respiratory illness. Among issues requiring attention were selection of the underlying network (too few nodes attenuate important structure, while too many nodes impose barriers to both modeling and computation); ensuring that confidentiality protections in the data do not impede important modeling of day-of-week effects; and evaluating the performance of the model. RESULTS: Our results show that the model captures salient spatio-temporal dynamics that are present in public health surveillance data sets, and that it appears to detect both "annual" and "atypical" outbreaks in a timely, accurate manner. We present maps that help make model output accessible and comprehensible to public health authorities. We use an illustrative family of decision rules to show how output from the model can be used to inform false positive-delayed detection tradeoffs. CONCLUSIONS: The advantages of our methodology for addressing the complicated issues of real world surveillance data applications are three-fold. First, we can easily incorporate additional covariate information and spatio-temporal dynamics in the data. Second, we furnish a unified framework to provide uncertainties associated with each parameter. Third, we are able to handle multiplicity issues by using a Bayesian approach. 
The urgent need to quickly and effectively monitor the health of the public makes our methodology a potentially plausible and useful surveillance approach for health professionals.
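
The tradeoff between false positives and delayed detection can be made concrete with a small sketch. It assumes the model yields a daily posterior probability of outbreak and uses a hypothetical family of rules indexed by a threshold and a required run length; the abstract does not specify the family the authors used.

```python
def first_alarm(posterior_probs, threshold, run_length=1):
    """Return the index of the first day on which the posterior outbreak
    probability has exceeded `threshold` for `run_length` consecutive days,
    or None if no alarm is raised."""
    streak = 0
    for day, prob in enumerate(posterior_probs):
        streak = streak + 1 if prob > threshold else 0
        if streak >= run_length:
            return day
    return None


# Hypothetical daily posterior probabilities; suppose the outbreak truly
# begins on day 3, so the spike on day 1 is noise.
daily = [0.1, 0.6, 0.2, 0.7, 0.8, 0.9]
eager = first_alarm(daily, threshold=0.5, run_length=1)     # alarms on day 1
cautious = first_alarm(daily, threshold=0.5, run_length=2)  # alarms on day 4
```

The eager rule raises a false alarm on the noisy day, while the cautious rule avoids it at the cost of a one-day delay after the assumed onset. Sweeping the threshold and run length traces out the tradeoff curve.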

Subjects

Disease Outbreaks/statistics & numerical data; Emergency Service, Hospital/statistics & numerical data; Influenza, Human/epidemiology; Population Surveillance/methods; Spatio-Temporal Analysis; Bayes Theorem; Humans; Indiana/epidemiology; Markov Chains; Models, Biological; Normal Distribution; Organizational Case Studies; Respiratory Tract Diseases/epidemiology

A spatio-temporal absorbing state model for disease and syndromic surveillance.

Heaton, Matthew J; Banks, David L; Zou, Jian; Karr, Alan F; Datta, Gauri; Lynch, James; Vera, Francisco.

Stat Med ; 31(19): 2123-36, 2012 Aug 30.

Article in English | MEDLINE | ID: mdl-22388709

ABSTRACT

Reliable surveillance models are an important tool in public health because they aid in mitigating disease outbreaks, identify where and when disease outbreaks occur, and predict future occurrences. Although many statistical models have been devised for surveillance purposes, none are able to simultaneously achieve the important practical goals of good sensitivity and specificity, proper use of covariate information, inclusion of spatio-temporal dynamics, and transparent support to decision-makers. In an effort to achieve these goals, this paper proposes a spatio-temporal conditional autoregressive hidden Markov model with an absorbing state. The model performs well in both a large simulation study and in an application to influenza/pneumonia fatality data.
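
A stripped-down sketch of a hidden Markov model with an absorbing state is given below. It assumes a single location, two hidden states (normal and outbreak, with outbreak treated as the absorbing state), and Poisson-distributed counts with known rates. The spatial conditional autoregressive component and covariates of the proposed model are omitted, and all names and parameter values are hypothetical.

```python
import math


def poisson_pmf(k, rate):
    """Poisson probability of observing count k at the given rate (> 0)."""
    return math.exp(-rate + k * math.log(rate) - math.lgamma(k + 1))


def filter_outbreak_prob(counts, p_enter, rate_normal, rate_outbreak):
    """Forward filter: P(outbreak state at time t | counts up to time t)."""
    # Rows: from normal, from outbreak. The outbreak row [0, 1] makes that
    # state absorbing: the chain never transitions back to normal.
    trans = [[1.0 - p_enter, p_enter],
             [0.0, 1.0]]
    belief = [1.0, 0.0]  # assume the series starts in the normal state
    history = []
    for k in counts:
        predicted = [sum(belief[i] * trans[i][j] for i in range(2))
                     for j in range(2)]
        likelihood = [poisson_pmf(k, rate_normal),
                      poisson_pmf(k, rate_outbreak)]
        joint = [predicted[j] * likelihood[j] for j in range(2)]
        total = sum(joint)
        belief = [x / total for x in joint]
        history.append(belief[1])
    return history


# Hypothetical weekly fatality counts that jump in the fourth week.
probs = filter_outbreak_prob([2, 3, 2, 9, 11, 10], p_enter=0.05,
                             rate_normal=2.0, rate_outbreak=10.0)
```

The filtered probability stays near zero during the low-count weeks and rises sharply once the counts jump, which is the quantity a decision-maker would monitor.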

Subjects

Disease Outbreaks/statistics & numerical data; Influenza, Human/epidemiology; Population Surveillance/methods; Space-Time Clustering; Bayes Theorem; Computer Simulation; Humans; Markov Chains; Poisson Distribution; Syndrome; United States/epidemiology

Estimation of propensity scores using generalized additive models.

Woo, Mi-Ja; Reiter, Jerome P; Karr, Alan F.

Stat Med ; 27(19): 3805-16, 2008 Aug 30.

Article in English | MEDLINE | ID: mdl-18366144

ABSTRACT

Propensity score matching is often used in observational studies to create treatment and control groups with similar distributions of observed covariates. Typically, propensity scores are estimated using logistic regressions that assume linearity between the logistic link and the predictors. We evaluate the use of generalized additive models (GAMs) for estimating propensity scores. We compare logistic regressions and GAMs in terms of balancing covariates using simulation studies with artificial and genuine data. We find that, when the distributions of covariates in the treatment and control groups overlap sufficiently, using GAMs can improve overall covariate balance, especially for higher-order moments of distributions. When the distributions in the two groups overlap insufficiently, GAM more clearly reveals this fact than logistic regression does. We also demonstrate via simulation that matching with GAMs can result in larger reductions in bias when estimating treatment effects than matching with logistic regression.
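
The matching and balance-checking steps can be sketched independently of how the scores are estimated. The example below assumes the propensity scores have already been produced by some model (logistic regression or a GAM) and uses greedy 1:1 nearest-neighbor matching without replacement, which is one common choice and not necessarily the one used in the paper. Function names and data are hypothetical.

```python
import math
from statistics import mean, variance


def match_nearest(treated_scores, control_scores):
    """Greedy 1:1 nearest-neighbor matching without replacement.
    Returns (treated_index, control_index) pairs."""
    available = set(range(len(control_scores)))
    pairs = []
    for t, score in enumerate(treated_scores):
        if not available:
            break
        c = min(available, key=lambda j: (abs(control_scores[j] - score), j))
        available.remove(c)
        pairs.append((t, c))
    return pairs


def standardized_difference(x_treated, x_control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((variance(x_treated) + variance(x_control)) / 2)
    return (mean(x_treated) - mean(x_control)) / pooled_sd


# Hypothetical estimated scores and one covariate per unit.
treated_scores, x_treated = [0.80, 0.60], [5.0, 4.0]
control_scores, x_control = [0.20, 0.58, 0.79, 0.40], [1.0, 4.1, 5.2, 2.0]

pairs = match_nearest(treated_scores, control_scores)
matched_control = [x_control[c] for _, c in pairs]
before = standardized_difference(x_treated, x_control)
after = standardized_difference(x_treated, matched_control)
```

Comparing `before` and `after` for each covariate, and for its squares or other transformations when higher-order moments matter, is how two score models would be compared on covariate balance.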

Subjects

Logistic Models; Randomized Controlled Trials as Topic/methods; Analysis of Variance; Computer Simulation; Confounding Factors, Epidemiologic; Humans; Observation; Treatment Outcome

Secure analysis of distributed chemical databases without data integration.

Karr, Alan F; Feng, Jun; Lin, Xiaodong; Sanil, Ashish P; Young, S Stanley; Reiter, Jerome P.

J Comput Aided Mol Des ; 19(9-10): 739-47, 2005.

Article in English | MEDLINE | ID: mdl-16267693

ABSTRACT

We present a method for performing statistically valid linear regressions on the union of distributed chemical databases that preserves confidentiality of those databases. The method employs secure multi-party computation to share local sufficient statistics necessary to compute least squares estimators of regression coefficients, error variances and other quantities of interest. We illustrate our method with an example containing four companies' rather different databases.
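
The reason local sufficient statistics are enough is that the least squares estimator depends on the data only through X'X and X'y, and both are sums over the parties' rows. The sketch below simulates this in a single process. The masking scheme in `secure_sum` is a simplified stand-in for a secure summation protocol and is an assumption of this example, as are all function names and the toy data.

```python
import random


def local_statistics(X, y):
    """One party's sufficient statistics: X'X (p x p) and X'y (length p)."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return xtx, xty


def secure_sum(values, rng):
    """Simulated secure summation: the running total starts at a random mask
    known only to the first party, each party adds its value, and the mask is
    removed at the end, so intermediate totals reveal no single value."""
    mask = rng.uniform(-1e3, 1e3)
    running = mask
    for v in values:
        running += v
    return running - mask


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def distributed_regression(parties, rng):
    """Least squares coefficients on the union of the parties' data, using
    only securely summed X'X and X'y entries."""
    stats = [local_statistics(X, y) for X, y in parties]
    p = len(stats[0][1])
    xtx = [[secure_sum([s[0][i][j] for s in stats], rng) for j in range(p)]
           for i in range(p)]
    xty = [secure_sum([s[1][i] for s in stats], rng) for i in range(p)]
    return solve(xtx, xty)


# Two hypothetical companies; the first column of X is the intercept.
company_a = ([[1.0, 0.0], [1.0, 1.0]], [1.0, 3.0])
company_b = ([[1.0, 2.0], [1.0, 3.0]], [5.0, 7.0])
beta = distributed_regression([company_a, company_b], random.Random(0))
```

The toy data lie exactly on y = 1 + 2x, so `beta` recovers the intercept and slope up to floating-point error even though neither company's rows are ever pooled.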

Subjects

Databases, Factual; Models, Chemical; Algorithms; Computer Security; Least-Squares Analysis; Linear Models; Organic Chemicals/chemistry; Regression Analysis; Solubility; Water
