ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
PURPOSE: Clinicians and researchers must contextualize a patient's genetic variants against population-based references with detailed phenotyping. We sought to establish globally scalable technology, policy, and procedures for sharing biosamples and associated genomic and phenotypic data on broadly consented cohorts, across sites of care. METHODS: Three of the nation's leading children's hospitals launched the Genomic Research and Innovation Network (GRIN), with federated information technology infrastructure, harmonized biobanking protocols, and material transfer agreements. Pilot studies in epilepsy and short stature were completed to design and test the collaboration model. RESULTS: Harmonized, broadly consented institutional review board (IRB) protocols were approved and used for biobank enrollment, creating ever-expanding, compatible biobanks. An open source federated query infrastructure was established over genotype-phenotype databases at the three hospitals. Investigators securely access the GRIN platform for preparatory-to-research queries, receiving aggregate counts of patients with particular phenotypes or genotypes in each biobank. With proper approvals, de-identified data are exported to a shared analytic workspace. Investigators at all sites collaborated enthusiastically on the pilot studies, resulting in multiple publications. Investigators have also begun to use the infrastructure successfully for grant applications. CONCLUSIONS: The GRIN collaboration establishes the technology, policy, and procedures for a scalable genomic research network.
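The federated query pattern described above, in which each site answers with an aggregate count rather than patient-level rows, can be sketched as follows. This is an illustration only, not the GRIN implementation: the site names, record fields, and the functions `count_matching` and `federated_count` are all hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]  # hypothetical patient record held locally at one site


def count_matching(records: List[Record], predicate: Callable[[Record], bool]) -> int:
    """Run the query inside one site and return only the aggregate count."""
    return sum(1 for record in records if predicate(record))


def federated_count(
    sites: Dict[str, List[Record]], predicate: Callable[[Record], bool]
) -> Dict[str, int]:
    """Ask every site for its count; no patient-level data leaves a site."""
    return {name: count_matching(records, predicate) for name, records in sites.items()}


# Hypothetical data for three sites (field names are assumptions).
sites = {
    "site_a": [{"phenotype": "epilepsy"}, {"phenotype": "short stature"}],
    "site_b": [{"phenotype": "epilepsy"}, {"phenotype": "epilepsy"}],
    "site_c": [{"phenotype": "short stature"}],
}

counts = federated_count(sites, lambda r: r["phenotype"] == "epilepsy")
```

An investigator would see only the per-site counts in `counts`; exporting de-identified records would be a separate, approval-gated step, as the abstract describes.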
Subjects
Data Management/methods; Electronic Data Processing/methods; Information Storage and Retrieval/methods; Biological Specimen Banks/standards; Biomedical Research/methods; Databases, Factual; Databases, Genetic; Ethics Committees, Research; Genomics/methods; Humans; Information Dissemination; Research Personnel
ABSTRACT
Motivation: In the era of big data and precision medicine, the number of databases containing clinical, environmental, self-reported and biochemical variables is increasing exponentially. Enabling experts to focus on their research questions rather than on computational data management, access and analysis is one of today's most significant challenges. Results: We present Rcupcake, an R package that contains a variety of functions for leveraging different databases through the BD2K PIC-SURE RESTful API and facilitating their query, analysis and interpretation. The package offers a variety of analysis and visualization tools, including the study of phenotype co-occurrence and prevalence, according to multiple layers of data, such as phenome, exposome or genome. Availability and implementation: The package is implemented in R and is available under Mozilla v2 license from GitHub (https://github.com/hms-dbmi/Rcupcake). Two reproducible case studies are also available (https://github.com/hms-dbmi/Rcupcake-case-studies/blob/master/SSCcaseStudy_v01.ipynb, https://github.com/hms-dbmi/Rcupcake-case-studies/blob/master/NHANEScaseStudy_v01.ipynb). Contact: paul_avillach@hms.harvard.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Subjects
Computational Biology/methods; Genome, Human; Phenotype; Precision Medicine; Software; Databases, Factual; Humans
ABSTRACT
Research increasingly relies on interrogating large-scale data resources. The NIH National Heart, Lung, and Blood Institute developed the NHLBI BioData Catalyst (BDC), a community-driven ecosystem where researchers, including bench and clinical scientists, statisticians, and algorithm developers, find, access, share, store, and compute on large-scale datasets. This ecosystem provides secure, cloud-based workspaces, user authentication and authorization, search, tools and workflows, applications, and new innovative features to address community needs, including exploratory data analysis, genomic and imaging tools, tools for reproducibility, and improved interoperability with other NIH data science platforms. BDC offers straightforward access to large-scale datasets and computational resources that support precision medicine for heart, lung, blood, and sleep conditions, leveraging separately developed and managed platforms to maximize flexibility based on researcher needs, expertise, and backgrounds. Through the NHLBI BioData Catalyst Fellows Program, BDC facilitates scientific discoveries and technological advances. BDC also facilitated accelerated research on the coronavirus disease-2019 (COVID-19) pandemic.
Subjects
COVID-19; Cloud Computing; Humans; Ecosystem; Reproducibility of Results; Lung; Software
ABSTRACT
Hematopoietic cell transplant for sickle cell disease is curative but is associated with life-threatening complications, most of which occur within the first 2 years after transplantation. In the current era, with interest in gene therapy and gene editing, we felt it timely to report on sickle cell disease transplant recipients who were alive for at least 2 years after transplantation, a group not previously reported. Our objectives were to (1) report the conditional survival rates of patients who were alive for 2 or more years after transplantation, (2) identify risk factors for death beyond 2 years after transplantation, and (3) compare all-cause mortality risks to those of an age-, sex- and race-matched general population in the United States. By limiting the analysis to 2-year survivors, we exclude deaths that occur as a direct consequence of the transplantation procedure. De-identified records of 1149 patients were reviewed from a publicly available data source and 950 patients were eligible (https://picsure.biodatacatalyst.nhlbi.nih.gov). All analyses were performed in this secure cloud environment using the available statistical software package(s). The validity of the public database was confirmed by reproducing results from an earlier publication. Conditional survival estimates were obtained using the Kaplan-Meier method for the sub-cohort that had survived a given length (x) of time after transplantation. Cox regression models were built to identify risk factors associated with mortality beyond 2 years after transplantation. The standardized relative mortality risk (SMR), or the ratio of observed to expected number of deaths, was used to quantify all-cause mortality risk after transplantation and was compared with that of an age-, race- and sex-matched general population. Person-years at risk were calculated from an anchor date (i.e., 2, 5 and 7 years) after transplantation until date of death or last date known alive. The expected number of deaths was calculated using age-, race- and sex-specific US mortality rates. The median follow-up was 5 years (range 2-20) and 300 (32%) patients were observed for more than 7 years. Among those who lived for at least 7 years after transplantation, the 12-year probability of survival was 97% (95% CI, 92%-99%). Compared to an age-, race- and sex-matched US population, the risk for late death after transplantation was higher as late as 7 years after transplantation (hazard ratio (HR) 3.2; P = .020) but the risk receded over time. Risk factors for late death included age at transplant and donor type. For every 10-year increment in patient age, an older patient was 1.75 times more likely to die than a younger patient (P = .0004). Compared to HLA-matched siblings, the use of other donors was associated with higher risk for late death (HR 3.49; P = .003). Graft failure (beyond 2 years after transplantation) was 7% (95% CI, 5%-9%) and graft failure was higher after transplantation of grafts from donors who were not HLA-matched siblings (HR 2.59; P < .0001). Long-term survival after transplantation is excellent and supports this treatment as a cure for sickle cell disease. The expected risk for death recedes over time but the risk for late death is not negligible.
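The SMR calculation described above (observed deaths divided by expected deaths, with expected deaths derived from person-years at risk and reference mortality rates) can be sketched as follows. This is a simplified illustration, not the authors' code: the `Patient` class, the stratum labels, and the rate values are hypothetical, and each patient is assigned to a single age/race/sex stratum for the whole follow-up, whereas a full analysis would split person-years as patients age.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Patient:
    stratum: str        # hypothetical age/race/sex stratum label
    anchor: date        # anchor date, e.g. 2 years after transplantation
    last_contact: date  # date of death or last date known alive
    died: bool


def person_years(patient: Patient) -> float:
    """Years at risk from the anchor date to death or last contact."""
    days = (patient.last_contact - patient.anchor).days
    return max(days, 0) / 365.25


def standardized_mortality_ratio(
    patients: List[Patient], reference_rates: Dict[str, float]
) -> float:
    """Observed deaths divided by expected deaths.

    reference_rates maps each stratum to deaths per person-year in the
    reference population (assumed values in this sketch).
    """
    observed = sum(1 for p in patients if p.died)
    expected = sum(person_years(p) * reference_rates[p.stratum] for p in patients)
    if expected == 0:
        raise ValueError("expected number of deaths is zero")
    return observed / expected


# Hypothetical cohort: two patients followed for 10 years, one death.
cohort = [
    Patient("stratum_1", date(2010, 1, 1), date(2020, 1, 1), died=True),
    Patient("stratum_1", date(2010, 1, 1), date(2020, 1, 1), died=False),
]
smr = standardized_mortality_ratio(cohort, {"stratum_1": 0.01})
```

An SMR above 1 indicates more deaths in the cohort than the reference rates would predict for the same person-years.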
Subjects
Anemia, Sickle Cell; Hematopoietic Stem Cell Transplantation; Anemia, Sickle Cell/therapy; Female; Hematopoietic Stem Cell Transplantation/adverse effects; Humans; Male; Proportional Hazards Models; Tissue Donors; Transplantation, Homologous; United States/epidemiology
ABSTRACT
OBJECTIVE: When studying any specific rare disease, heterogeneity and scarcity of affected individuals have historically hindered investigators from discerning what to focus on to understand and diagnose a disease. New nongenomic methodologies must be developed that identify similarities in seemingly dissimilar conditions. MATERIALS AND METHODS: This observational study analyzes 1042 patients from the Undiagnosed Diseases Network (2015-2019), a multicenter, nationwide research study using phenotypic data annotated by specialized staff using Human Phenotype Ontology terms. We used Louvain community detection to cluster patients linked by Jaccard pairwise similarity and 2 support vector classifiers to assign new cases. We further validated the clusters' most representative comorbidities using a national claims database (67 million patients). RESULTS: Patients were divided into 2 groups: those with symptom onset before 18 years of age (n = 810) and at 18 years of age or older (n = 232) (average symptom onset age: 10 [interquartile range, 0-14] years). For 810 pediatric patients, we identified 4 statistically significant clusters. Two clusters were characterized by growth disorders, and developmental delay enriched for hypotonia presented a higher likelihood of diagnosis. Support vector classifier showed 0.89 balanced accuracy (0.83 for Human Phenotype Ontology terms only) on test data. DISCUSSION: To set the framework for future discovery, we chose as our endpoint the successful grouping of patients by phenotypic similarity and provide a classification tool to assign new patients to those clusters. CONCLUSION: This study shows that despite the scarcity and heterogeneity of patients, we can still find commonalities that can potentially be harnessed to uncover new insights and targets for therapy.
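The Jaccard pairwise similarity that links patients in the methods above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the patient identifiers and term assignments are hypothetical, and the function names `jaccard` and `similarity_edges` are assumptions. The resulting weighted edge list is the kind of input a community detection method such as Louvain would take; that step is not shown here.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(
    patient_terms: Dict[str, Set[str]], threshold: float = 0.0
) -> List[Tuple[str, str, float]]:
    """Weighted edges between every pair of patients whose similarity exceeds threshold."""
    edges = []
    for (p, terms_p), (q, terms_q) in combinations(sorted(patient_terms.items()), 2):
        weight = jaccard(terms_p, terms_q)
        if weight > threshold:
            edges.append((p, q, weight))
    return edges


# Hypothetical patients annotated with Human Phenotype Ontology term IDs.
patients = {
    "patient_1": {"HP:0001252", "HP:0001263"},                # hypotonia, developmental delay
    "patient_2": {"HP:0001252", "HP:0001263", "HP:0004322"},  # plus short stature
    "patient_3": {"HP:0001250"},                              # seizure
}

edges = similarity_edges(patients)
```

Here `patient_1` and `patient_2` share two of three distinct terms, so they are linked with weight 2/3, while `patient_3` shares no terms with either and gets no edge.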
Subjects
Undiagnosed Diseases; Adolescent; Adult; Child; Child, Preschool; Databases, Factual; Humans; Infant; Infant, Newborn; Rare Diseases/diagnosis; Rare Diseases/epidemiology
ABSTRACT
OBJECTIVE: To advance use of real-world data (RWD) for pharmacovigilance, we sought to integrate a high-sensitivity natural language processing (NLP) pipeline for detecting potential adverse drug events (ADEs) with easily interpretable output for high-efficiency human review and adjudication of true ADEs. MATERIALS AND METHODS: The adverse drug event presentation and tracking (ADEPT) system employs an open source NLP pipeline to identify, in clinical notes, mentions of medications and of signs and symptoms potentially indicative of ADEs. ADEPT presents the output to human reviewers by highlighting these drug-event pairs within the context of the clinical note. To measure incidence of seizures associated with sildenafil, we applied ADEPT to 149 029 notes for 982 patients with pediatric pulmonary hypertension. RESULTS: Of 416 patients identified as taking sildenafil, NLP found 72 [17%, 95% confidence interval (CI) 14-21] with seizures as a potential ADE. Upon human review and adjudication, only 4 (0.96%, 95% CI 0.37-2.4) patients with seizures were determined to have true ADEs. Reviewers using ADEPT required a median of 89 s (interquartile range 57-142 s) per patient to review potential ADEs. DISCUSSION: ADEPT combines high-throughput NLP, to increase sensitivity of ADE detection, with human review, to increase specificity by differentiating true ADEs from signs and symptoms related to comorbidities, effects of other medications, or other confounders. CONCLUSION: ADEPT is a promising tool for creating gold standard, patient-level labels for advancing NLP-based pharmacovigilance. ADEPT is a potentially time-saving platform for computer-assisted pharmacovigilance based on RWD.
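The proportions and 95% confidence intervals reported above can be checked with a short calculation. The abstract does not state which interval method was used; the sketch below assumes a Wilson score interval, which gives bounds consistent with the reported figures. The function name `wilson_interval` is an assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("require n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Counts taken from the abstract: 416 patients on sildenafil.
nlp_low, nlp_high = wilson_interval(72, 416)   # NLP-flagged potential ADEs
true_low, true_high = wilson_interval(4, 416)  # adjudicated true ADEs
```

With these inputs the first interval spans roughly 14% to 21% and the second roughly 0.37% to 2.4%, in line with the values in the abstract.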