OARD: Open annotations for rare diseases and their phenotypes based on real-world data.

Liu, Cong; Ta, Casey N; Havrilla, Jim M; Nestor, Jordan G; Spotnitz, Matthew E; Geneslaw, Andrew S; Hu, Yu; Chung, Wendy K; Wang, Kai; Weng, Chunhua

Affiliation

Liu C; Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
Ta CN; Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
Havrilla JM; Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
Nestor JG; Division of Nephrology, Department of Medicine, Columbia University, New York, NY 10032, USA.
Spotnitz ME; Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
Geneslaw AS; Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
Hu Y; Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
Chung WK; Department of Pediatrics, Columbia University Irving Medical Center, New York, NY 10032, USA.
Wang K; Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
Weng C; Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA. Electronic address: cw2384@cumc.columbia.edu.

Am J Hum Genet; 109(9): 1591-1604, 2022 Sep 01.

Article in English | MEDLINE | ID: mdl-35998640

ABSTRACT

Diagnosis of rare genetic diseases often relies on phenotype-driven methods, which hinge on the accuracy and completeness of the rare disease phenotypes in the underlying annotation knowledgebase. Existing knowledgebases are often manually curated, with additional annotations drawn from published case reports. Despite their potential, real-world data such as electronic health records (EHRs) have not been fully exploited to derive rare disease annotations. Here, we present open annotation for rare diseases (OARD), a real-world-data-derived resource with annotations for rare-disease-related phenotypes. This resource is derived from the EHRs of two academic health institutions containing more than 10 million individuals spanning wide age ranges and different disease subgroups. By leveraging ontology mapping and advanced natural-language-processing (NLP) methods, OARD automatically and efficiently extracts concepts for both rare diseases and their phenotypic traits from billing codes and lab tests as well as over 100 million clinical narratives. The rare disease prevalence derived by OARD is highly correlated with that annotated in the original rare disease knowledgebase. By performing association analysis, we identified more than 1 million novel disease-phenotype association pairs that were previously missed by human annotation, and >60% were confirmed as true associations via manual review of a list of sampled pairs. Compared to manually curated annotation, OARD is 100% data driven and its pipeline can be shared across different institutions. By supporting privacy-preserving sharing of aggregated summary statistics, such as term frequencies and disease-phenotype associations, it fills an important gap to facilitate data-driven research in the rare disease community.
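
As an illustration of what a disease-phenotype association computed from aggregated summary statistics could look like, the following minimal sketch scores one pair from cohort-level counts. The abstract does not say which statistic OARD uses, so the log observed-to-expected ratio, the `PairCounts` fields, the `MIN_COUNT` suppression threshold, and the example counts are all assumptions made for illustration, not the authors' method or data.

```python
from dataclasses import dataclass
from math import log
from typing import Optional

# Hypothetical small-cell suppression threshold (an assumption, not from the paper).
MIN_COUNT = 10


@dataclass(frozen=True)
class PairCounts:
    """Aggregated counts for one disease-phenotype pair (hypothetical layout)."""
    n_total: int      # individuals in the cohort
    n_disease: int    # individuals with the disease concept
    n_phenotype: int  # individuals with the phenotype concept
    n_both: int       # individuals with both concepts


def log_observed_expected(c: PairCounts) -> Optional[float]:
    """Return ln(observed / expected) co-occurrence, or None if suppressed."""
    if min(c.n_disease, c.n_phenotype, c.n_both) < MIN_COUNT:
        return None  # too few individuals to report safely
    # Expected co-occurrence if disease and phenotype were independent.
    expected = c.n_disease * c.n_phenotype / c.n_total
    return log(c.n_both / expected)


# Made-up counts, for illustration only.
example = PairCounts(n_total=1_000_000, n_disease=200, n_phenotype=50_000, n_both=80)
score = log_observed_expected(example)

suppressed = log_observed_expected(
    PairCounts(n_total=1_000, n_disease=20, n_phenotype=30, n_both=5)
)
```

A positive score means the pair co-occurs more often than independence would predict. Because only counts enter the calculation, no individual-level record needs to leave the institution, which is the property the abstract's privacy-preserving sharing relies on.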

Subjects

Natural Language Processing; Rare Diseases; Electronic Health Records; Humans; Phenotype; Rare Diseases/genetics

Keywords

electronic health records; human phenotype ontology; knowledge graph; natural language processing; open data sharing; phenotype association; rare disease

Full text: 1 Database: MEDLINE Main subject: Natural Language Processing / Rare Diseases Type of study: Guideline / Prognostic_studies / Risk_factors_studies Limit: Humans Language: En Year of publication: 2022 Document type: Article
