Results 1 - 4 of 4
1.
medRxiv ; 2021 Apr 14.
Article in English | MEDLINE | ID: mdl-33791734

ABSTRACT

Clinical data networks that leverage large volumes of data in electronic health records (EHRs) are significant resources for research on coronavirus disease 2019 (COVID-19). Data harmonization is a key challenge in the seamless use of multisite EHRs for COVID-19 research. We developed a COVID-19 application ontology in the national Accrual to Clinical Trials (ACT) network that enables harmonization of data elements that are critical to COVID-19 research. The ontology contains over 50,000 concepts in the domains of diagnosis, procedures, medications, and laboratory tests. In particular, it has computational phenotypes to characterize the course of illness and outcomes, derived terms, and harmonized value sets for SARS-CoV-2 laboratory tests. The ontology was deployed and validated on the ACT COVID-19 network, which consists of nine academic health centers with data on 14.5M patients. This ontology, which is freely available to the entire research community on GitHub at https://github.com/shyamvis/ACT-COVID-Ontology, will be useful for harmonizing EHRs for COVID-19 research beyond the ACT network.
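The harmonized value sets described above can be illustrated with a minimal sketch: each site maps its local lab-result strings onto a shared vocabulary so queries behave identically across the network. All names and mappings below are illustrative assumptions, not taken from the ACT ontology itself.

```python
# Illustrative sketch of value-set harmonization across sites.
# Site names, local codes, and the harmonized vocabulary are hypothetical.

# Shared (harmonized) result values for a qualitative SARS-CoV-2 test
HARMONIZED = {"POSITIVE", "NEGATIVE", "EQUIVOCAL", "UNKNOWN"}

# Per-site mappings from local result strings to the shared value set
SITE_VALUE_MAPS = {
    "site_a": {"DETECTED": "POSITIVE", "NOT DETECTED": "NEGATIVE"},
    "site_b": {"POS": "POSITIVE", "NEG": "NEGATIVE", "INDET": "EQUIVOCAL"},
}

def harmonize(site: str, local_value: str) -> str:
    """Map a site-local lab result to the shared value set (UNKNOWN if unmapped)."""
    mapping = SITE_VALUE_MAPS.get(site, {})
    return mapping.get(local_value.strip().upper(), "UNKNOWN")

print(harmonize("site_a", "Detected"))   # POSITIVE
print(harmonize("site_b", "INDET"))      # EQUIVOCAL
print(harmonize("site_c", "anything"))   # UNKNOWN
```

A multisite query can then filter on the harmonized value rather than on each site's local codes.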

2.
JAMIA Open ; 4(2): ooab036, 2021 Apr.
Article in English | MEDLINE | ID: mdl-34113801

ABSTRACT

Clinical data networks that leverage large volumes of data in electronic health records (EHRs) are significant resources for research on coronavirus disease 2019 (COVID-19). Data harmonization is a key challenge in seamless use of multisite EHRs for COVID-19 research. We developed a COVID-19 application ontology in the national Accrual to Clinical Trials (ACT) network that enables harmonization of data elements that are critical to COVID-19 research. The ontology contains over 50 000 concepts in the domains of diagnosis, procedures, medications, and laboratory tests. In particular, it has computational phenotypes to characterize the course of illness and outcomes, derived terms, and harmonized value sets for severe acute respiratory syndrome coronavirus 2 laboratory tests. The ontology was deployed and validated on the ACT COVID-19 network that consists of 9 academic health centers with data on 14.5M patients. This ontology, which is freely available to the entire research community on GitHub at https://github.com/shyamvis/ACT-COVID-Ontology, will be useful for harmonizing EHRs for COVID-19 research beyond the ACT network.

3.
J Am Med Inform Assoc ; 21(1): 97-104, 2014.
Article in English | MEDLINE | ID: mdl-23703827

ABSTRACT

INTRODUCTION: Clinical databases require accurate entity resolution (ER). One approach is to use algorithms that assign questionable cases to manual review. Few studies have compared the performance of common algorithms for such a task. Furthermore, previous work has been limited by a lack of objective methods for setting algorithm parameters. We compared the performance of common ER algorithms, using algorithmic optimization rather than manual parameter tuning, on both two-threshold classification (match/manual review/non-match) and single-threshold classification (match/non-match). METHODS: We manually reviewed 20,000 randomly selected, potential duplicate record-pairs to identify matches (10,000 training set, 10,000 test set). We evaluated the probabilistic expectation maximization, simple deterministic, and fuzzy inference engine (FIE) algorithms. We used particle swarm to optimize algorithm parameters for a single threshold and for two thresholds. We ran 10 iterations of optimization using the training set and report averaged performance against the test set. RESULTS: The overall estimated duplicate rate was 6%. The FIE and simple deterministic algorithms allowed a smaller manual review set than the probabilistic method (FIE 1.9%, simple deterministic 2.5%, probabilistic 3.6%; p<0.001). For a single threshold, the simple deterministic algorithm performed better than the probabilistic method (positive predictive value 0.956 vs 0.887, sensitivity 0.985 vs 0.887, p<0.001). ER with FIE classifies 98.1% of record-pairs correctly (1/10,000 error rate), assigning the remainder to manual review. CONCLUSIONS: Optimized deterministic algorithms outperform the probabilistic method. There is a strong case for considering optimized deterministic methods for ER.
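The two-threshold scheme described in this abstract can be sketched in a few lines: pairs scoring above an upper threshold are auto-matched, pairs below a lower threshold are auto-rejected, and the band in between goes to manual review. The scores and thresholds here are illustrative; in the study the thresholds were tuned with particle swarm optimization, which this sketch does not implement.

```python
# Minimal sketch of two-threshold entity-resolution classification.
# Threshold values are illustrative, not the study's tuned parameters.

def classify_pair(score: float, t_low: float, t_high: float) -> str:
    """Classify a candidate record-pair by its similarity score."""
    if score >= t_high:
        return "match"
    if score <= t_low:
        return "non-match"
    return "manual review"

print(classify_pair(0.97, t_low=0.30, t_high=0.90))  # match
print(classify_pair(0.10, t_low=0.30, t_high=0.90))  # non-match
print(classify_pair(0.55, t_low=0.30, t_high=0.90))  # manual review
```

Widening the review band trades reviewer workload for fewer automated errors, which is exactly the quantity (manual review set size) the abstract compares across algorithms.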


Subjects
Algorithms , Electronic Health Records , Benchmarking , Fuzzy Logic , Humans , Medical Record Linkage/methods , Probability
4.
AMIA Annu Symp Proc ; 2013: 721-30, 2013.
Article in English | MEDLINE | ID: mdl-24551372

ABSTRACT

Clinical databases may contain several records for a single patient. Multiple general entity-resolution algorithms have been developed to identify such duplicate records. To achieve optimal accuracy, algorithm parameters must be tuned to a particular dataset. The purpose of this study was to determine the required training set size for probabilistic, deterministic, and Fuzzy Inference Engine (FIE) algorithms with parameters optimized using the particle swarm approach. Each algorithm classified potential duplicates into definite match, non-match, and indeterminate (i.e., requires manual review). Training set sizes ranged from 2,000 to 10,000 randomly selected record-pairs. We also evaluated marginal uncertainty sampling for active learning. Optimization reduced manual review size (deterministic 11.6% vs. 2.5%; FIE 49.6% vs. 1.9%; and probabilistic 10.5% vs. 3.5%). FIE classified 98.1% of the records correctly (precision=1.0). Best performance required training on all 10,000 randomly selected record-pairs. Active learning achieved comparable results with 3,000 records. Automated optimization is effective, and targeted sampling can reduce the required training set size.
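The uncertainty-sampling idea above can be sketched as follows: instead of labeling random record-pairs, label the pairs whose match scores lie closest to the decision boundary. This sketch assumes scores on a [0, 1] scale with the boundary at 0.5; the study's actual margin definition and scoring model are not reproduced here.

```python
# Illustrative sketch of margin-based uncertainty sampling for active learning.
# Assumes match scores in [0, 1] and a decision boundary at 0.5.

def uncertainty_sample(scores: list[float], k: int) -> list[int]:
    """Return indices of the k record-pairs whose scores are nearest 0.5."""
    return sorted(range(len(scores)), key=lambda i: abs(scores[i] - 0.5))[:k]

scores = [0.95, 0.48, 0.10, 0.60, 0.52]
print(uncertainty_sample(scores, 2))  # the two most ambiguous pairs
```

Labeling these ambiguous pairs first is what let the study match full-training performance with roughly a third of the randomly sampled training data.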


Subjects
Algorithms , Artificial Intelligence , Electronic Health Records , Fuzzy Logic