Results 1 - 7 of 7
2.
J Am Med Inform Assoc ; 30(7): 1293-1300, 2023 06 20.
Article in English | MEDLINE | ID: mdl-37192819

ABSTRACT

Research increasingly relies on interrogating large-scale data resources. The NIH National Heart, Lung, and Blood Institute developed the NHLBI BioData Catalyst® (BDC), a community-driven ecosystem where researchers, including bench and clinical scientists, statisticians, and algorithm developers, find, access, share, store, and compute on large-scale datasets. This ecosystem provides secure, cloud-based workspaces, user authentication and authorization, search, tools and workflows, applications, and new features that address community needs, including exploratory data analysis, genomic and imaging tools, tools for reproducibility, and improved interoperability with other NIH data science platforms. BDC offers straightforward access to large-scale datasets and computational resources that support precision medicine for heart, lung, blood, and sleep conditions, and it leverages separately developed and managed platforms to maximize flexibility for researchers with different needs, expertise, and backgrounds. Through the NHLBI BioData Catalyst Fellows Program, BDC facilitates scientific discoveries and technological advances. BDC also helped accelerate research on the coronavirus disease 2019 (COVID-19) pandemic.


Subject(s)
COVID-19 , Cloud Computing , Humans , Ecosystem , Reproducibility of Results , Lung , Software
3.
Front Immunol ; 12: 782152, 2021.
Article in English | MEDLINE | ID: mdl-34868058

ABSTRACT

Minor histocompatibility antigens (mHAg) composed of peptides presented by HLA molecules can cause immune responses involved in graft-versus-host disease (GVHD) and graft-versus-leukemia effects after allogeneic hematopoietic cell transplantation (HCT). The current study was designed to identify individual graft-versus-host genomic mismatches associated with altered risks of acute or chronic GVHD or relapse after HCT between HLA-genotypically identical siblings. Our results demonstrate that in allogeneic HCT between a pair of HLA-identical siblings, a mHAg manifests as a set of peptides originating from annotated proteins and non-annotated open reading frames, which i) are encoded by a group of highly associated recipient genomic mismatches, ii) bind to HLA allotypes in the recipient, and iii) evoke a donor immune response. Attribution of the immune response and consequent clinical outcomes to individual peptide components within this set will likely differ from patient to patient according to their HLA types.


Subject(s)
Hematopoietic Stem Cell Transplantation , Minor Histocompatibility Antigens/immunology , Transplantation Immunology , Adolescent , Adult , Aged , Alleles , Child , Child, Preschool , Disease Susceptibility/immunology , Female , Genetic Predisposition to Disease , Genetic Variation , Graft vs Host Disease/epidemiology , Graft vs Host Disease/etiology , HLA Antigens/genetics , HLA Antigens/immunology , Hematopoietic Stem Cell Transplantation/adverse effects , Hematopoietic Stem Cell Transplantation/methods , Humans , Incidence , Infant , Infant, Newborn , Linkage Disequilibrium , Male , Middle Aged , Minor Histocompatibility Antigens/genetics , Peptides/genetics , Peptides/immunology , Transplantation, Homologous , Young Adult
4.
HGG Adv ; 2(3), 2021 Jul 08.
Article in English | MEDLINE | ID: mdl-34337551

ABSTRACT

Whole-genome sequencing (WGS) and whole-exome sequencing studies have become increasingly available and are being used to identify rare genetic variants associated with health and disease outcomes. Investigators routinely use mixed models to account for genetic relatedness or other clustering variables (e.g., family or household) when testing genetic associations. However, no existing test of the association of a rare variant with a binary outcome in the presence of correlated data controls the type 1 error when there are (1) few individuals harboring the rare allele, (2) a small proportion of cases relative to controls, and (3) covariates to adjust for. Here, we address all three issues by developing a framework for testing rare variant association with a binary trait in individuals harboring at least one risk allele. In this framework, we estimate outcome probabilities under the null hypothesis and then use them, within the individuals with at least one risk allele, to test variant associations. We extend the BinomiRare test, previously proposed for independent observations, develop the Conway-Maxwell-Poisson (CMP) test, and study the properties of both tests in simulations. We show that the BinomiRare test always controls the type 1 error, while the CMP test sometimes does not. We then use the BinomiRare test to test the association of rare genetic variants in target genes with small-vessel disease (SVD) stroke, short sleep, and venous thromboembolism (VTE) in whole-genome sequence data from the Trans-Omics for Precision Medicine (TOPMed) program.
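The carriers-only testing step described in this abstract can be illustrated with a short sketch. It assumes that each carrier's outcome probability has already been estimated under the null model (that mixed-model fit is not shown) and that, under the null hypothesis, the number of cases among carriers follows a Poisson-binomial distribution built from those probabilities. The function names `poisson_binomial_pmf` and `carrier_test_pvalue`, and the choice of a one-sided upper-tail p-value, are illustrative assumptions, not the authors' implementation.

```python
from typing import Sequence


def poisson_binomial_pmf(probs: Sequence[float]) -> list[float]:
    """PMF of the number of successes among independent Bernoulli(p_i) trials.

    Built with the standard O(n^2) dynamic-programming recursion:
    each new trial either leaves the count unchanged or adds one.
    """
    pmf = [1.0]  # before any carrier is added, the count is 0 with probability 1
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this carrier is a control
            nxt[k + 1] += mass * p      # this carrier is a case
        pmf = nxt
    return pmf


def carrier_test_pvalue(null_probs: Sequence[float], n_cases: int) -> float:
    """One-sided p-value P(X >= n_cases), X ~ PoissonBinomial(null_probs).

    null_probs: hypothetical input, the outcome probabilities of the carriers
        estimated under the null model (covariates and relatedness are assumed
        to have been accounted for when these were estimated).
    n_cases: observed number of cases among those carriers.
    """
    if not 0 <= n_cases <= len(null_probs):
        raise ValueError("n_cases must be between 0 and the number of carriers")
    pmf = poisson_binomial_pmf(null_probs)
    # Cap at 1.0 to absorb floating-point rounding in the tail sum.
    return min(1.0, sum(pmf[n_cases:]))
```

For example, two carriers with null probabilities 0.1 and 0.2 who are both cases give P(X ≥ 2) = 0.1 × 0.2 = 0.02. The recursion is exact but quadratic in the number of carriers, which is manageable because the test is restricted to carriers of a rare allele.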

5.
Cell Rep ; 32(7): 108029, 2020 08 18.
Article in English | MEDLINE | ID: mdl-32814038

ABSTRACT

Characterizing the tissue-specific binding sites of transcription factors (TFs) is essential to reconstruct gene regulatory networks and predict functions for non-coding genetic variation. DNase-seq footprinting enables the prediction of genome-wide binding sites for hundreds of TFs simultaneously. Despite the public availability of high-quality DNase-seq data from hundreds of samples, a comprehensive, up-to-date resource for the locations of genomic footprints is lacking. Here, we develop a scalable footprinting workflow using two state-of-the-art algorithms: Wellington and HINT. We apply our workflow to detect footprints in 192 ENCODE DNase-seq experiments and predict the genomic occupancy of 1,515 human TFs in 27 human tissues. We validate that these footprints overlap true-positive TF binding sites from ChIP-seq. We demonstrate that the locations, depth, and tissue specificity of footprints predict effects of genetic variants on gene expression and capture a substantial proportion of genetic risk for complex traits.


Subject(s)
Binding Sites/genetics , Deoxyribonucleases/metabolism , Genomics/methods , Transcription Factors/metabolism , Humans
6.
PLoS One ; 14(4): e0213013, 2019.
Article in English | MEDLINE | ID: mdl-30973881

ABSTRACT

Big biomedical data create exciting opportunities for discovery, but make it difficult to capture analyses and outputs in forms that are findable, accessible, interoperable, and reusable (FAIR). In response, we describe tools that make it easy to capture, and assign identifiers to, data and code throughout the data lifecycle. We illustrate the use of these tools via a case study involving a multi-step analysis that creates an atlas of putative transcription factor binding sites from terabytes of ENCODE DNase I hypersensitive sites sequencing data. We show how the tools automate routine but complex tasks, capture analysis algorithms in understandable and reusable forms, and harness fast networks and powerful cloud computers to process data rapidly, all without sacrificing usability or reproducibility, thus ensuring that big data are not hard-to-(re)use data. We evaluate our approach via a user study and show that 91% of participants were able to replicate a complex analysis involving considerable data volumes.


Subject(s)
Big Data , Data Science/statistics & numerical data , Databases, Factual/statistics & numerical data , Algorithms , Humans , Information Dissemination , Longitudinal Studies , Software
7.
PLoS One ; 11(8): e0157077, 2016.
Article in English | MEDLINE | ID: mdl-27494614

ABSTRACT

BACKGROUND: A unique archive of Big Data on Parkinson's disease is collected, managed and disseminated by the Parkinson's Progression Markers Initiative (PPMI). The integration of such complex and heterogeneous Big Data from multiple sources offers unparalleled opportunities to study the early stages of prevalent neurodegenerative processes, track their progression and quickly identify the efficacies of alternative treatments. Many previous human and animal studies have examined the relationship of Parkinson's disease (PD) risk to trauma, genetics, environment, co-morbidities, or lifestyle. The defining characteristics of Big Data (large size, incongruency, incompleteness, complexity, multiplicity of scales, and heterogeneity of information-generating sources) all pose challenges to the classical techniques for data management, processing, visualization and interpretation. We propose, implement, test and validate complementary model-based and model-free approaches for PD classification and prediction. To explore PD risk using Big Data methodology, we jointly processed complex PPMI imaging, genetics, clinical and demographic data.

METHODS AND FINDINGS: Collective representation of the multi-source data facilitates the aggregation and harmonization of complex data elements. This enables joint modeling of the complete data, leading to the development of Big Data analytics, predictive synthesis, and statistical validation. Using heterogeneous PPMI data, we developed a comprehensive protocol for end-to-end data characterization, manipulation, processing, cleaning, analysis and validation. Specifically, we (i) introduce methods for rebalancing imbalanced cohorts, (ii) utilize a wide spectrum of classification methods to generate consistent and powerful phenotypic predictions, and (iii) generate reproducible machine-learning-based classification that enables the reporting of model parameters and diagnostic forecasting based on new data. We evaluated several complementary model-based predictive approaches, which failed to generate accurate and reliable diagnostic predictions. However, the results of several machine-learning-based classification methods indicated significant power to predict Parkinson's disease in the PPMI subjects (consistent accuracy, sensitivity, and specificity exceeding 96%, confirmed using statistical n-fold cross-validation). Clinical (e.g., Unified Parkinson's Disease Rating Scale (UPDRS) scores), demographic (e.g., age), genetic (e.g., rs34637584, chr12), and derived neuroimaging biomarker (e.g., cerebellum shape index) data all contributed to the predictive analytics and diagnostic forecasting.

CONCLUSIONS: Model-free Big Data machine-learning-based classification methods (e.g., adaptive boosting, support vector machines) can outperform model-based techniques in terms of predictive precision and reliability (e.g., forecasting patient diagnosis). We observed that statistical rebalancing of cohort sizes yields better discrimination of group differences, specifically for predictive analytics based on heterogeneous and incomplete PPMI data. UPDRS scores play a critical role in predicting diagnosis, which is expected based on the clinical definition of Parkinson's disease. Even without longitudinal UPDRS data, however, the accuracy of model-free machine-learning-based classification is over 80%. The methods, software and protocols developed here are openly shared and can be employed to study other neurodegenerative disorders (e.g., Alzheimer's, Huntington's, amyotrophic lateral sclerosis), as well as for other predictive Big Data analytics applications.
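The cohort rebalancing and n-fold cross-validation mentioned in this abstract can be sketched as follows. This is not the PPMI study's protocol: simple random oversampling is swapped in as the rebalancing method (the authors' technique may differ), no classifier is included, and the helper names `stratified_folds`, `oversample`, and `cv_splits` are hypothetical. What it illustrates is that rebalancing is applied only to the training folds, so duplicated samples never appear in the held-out fold used to estimate accuracy, sensitivity, and specificity.

```python
import random
from collections import defaultdict
from typing import Hashable, Iterator, Sequence


def stratified_folds(labels: Sequence[Hashable], n_folds: int, seed: int = 0) -> list[list[int]]:
    """Assign sample indices to n_folds folds class by class, so each fold
    keeps roughly the original class proportions."""
    rng = random.Random(seed)
    by_class: dict[Hashable, list[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds: list[list[int]] = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)  # deal indices round-robin
    return folds


def oversample(indices: Sequence[int], labels: Sequence[Hashable], seed: int = 0) -> list[int]:
    """Randomly duplicate indices of smaller classes (sampling with replacement)
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    by_class: dict[Hashable, list[int]] = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    target = max(len(members) for members in by_class.values())
    balanced: list[int] = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


def cv_splits(labels: Sequence[Hashable], n_folds: int = 5, seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists for n-fold cross-validation.

    Only the training part is rebalanced; the held-out fold is left untouched.
    """
    folds = stratified_folds(labels, n_folds, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield oversample(train, labels, seed + i), sorted(test)
```

With 16 controls and 4 cases and `n_folds=4`, each held-out fold contains 4 controls and 1 case, while each rebalanced training set contains 12 of each class. Oversampling before splitting would instead let copies of the same subject land on both sides of a split and inflate the cross-validated estimates.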


Subject(s)
Databases, Factual , Parkinson Disease/diagnosis , Aged , Disease Progression , Female , Humans , Logistic Models , Male , Neuroimaging , Parkinson Disease/genetics , Parkinson Disease/pathology , Support Vector Machine