Results 1 - 20 of 52
1.
Cell ; 155(1): 70-80, 2013 Sep 26.
Article in English | MEDLINE | ID: mdl-24074861

ABSTRACT

Although countless highly penetrant variants have been associated with Mendelian disorders, the genetic etiologies underlying complex diseases remain largely unresolved. By mining the medical records of over 110 million patients, we examine the extent to which Mendelian variation contributes to complex disease risk. We detect thousands of associations between Mendelian and complex diseases, revealing a nondegenerate, phenotypic code that links each complex disorder to a unique collection of Mendelian loci. Using genome-wide association results, we demonstrate that common variants associated with complex diseases are enriched in the genes indicated by this "Mendelian code." Finally, we detect hundreds of comorbidity associations among Mendelian disorders, and we use probabilistic genetic modeling to demonstrate that Mendelian variants likely contribute nonadditively to the risk for a subset of complex diseases. Overall, this study illustrates a complementary approach for mapping complex disease loci and provides unique predictions concerning the etiologies of specific diseases.


Subject(s)
Disease/genetics; Genetic Predisposition to Disease; Genome-Wide Association Study; Models, Genetic; Health Records, Personal; Humans; Penetrance; Polymorphism, Single Nucleotide
2.
PLoS Comput Biol ; 19(3): e1010944, 2023 03.
Article in English | MEDLINE | ID: mdl-36913405

ABSTRACT

We introduce a self-describing serialized format for bulk biomedical data called the Portable Format for Biomedical (PFB) data. The Portable Format for Biomedical data is based upon Avro and encapsulates a data model, a data dictionary, the data itself, and pointers to third party controlled vocabularies. In general, each data element in the data dictionary is associated with a third party controlled vocabulary to make it easier for applications to harmonize two or more PFB files. We also introduce an open source software development kit (SDK) called PyPFB for creating, exploring and modifying PFB files. We describe experimental studies showing the performance improvements when importing and exporting bulk biomedical data in the PFB format versus using JSON and SQL formats.


Subject(s)
Software; Vocabulary, Controlled; Records
3.
Trends Genet ; 35(3): 223-234, 2019 03.
Article in English | MEDLINE | ID: mdl-30691868

ABSTRACT

Data commons collate data with cloud computing infrastructure and commonly used software services, tools, and applications to create biomedical resources for the large-scale management, analysis, harmonization, and sharing of biomedical data. Over the past few years, data commons have been used to analyze, harmonize, and share large-scale genomics datasets. Data ecosystems can be built by interoperating multiple data commons. It can be quite labor intensive to curate, import, and analyze the data in a data commons. Data lakes provide an alternative to data commons and simply provide access to data, with the data curation and analysis deferred until later and delegated to those that access the data. We review software platforms for managing, analyzing, and sharing genomic data, with an emphasis on data commons, but also cover data ecosystems and data lakes.


Subject(s)
Cloud Computing/trends; Genomics/methods; Information Dissemination/methods; Software; Big Data; Biomedical Research/trends; Computational Biology/trends; Humans
4.
Genome Res ; 27(10): 1743-1751, 2017 10.
Article in English | MEDLINE | ID: mdl-28847918

ABSTRACT

Obtaining accurate drug response data in large cohorts of cancer patients is very challenging; thus, most cancer pharmacogenomics discovery is conducted in preclinical studies, typically using cell lines and mouse models. However, these platforms suffer from serious limitations, including small sample sizes. Here, we have developed a novel computational method that allows us to impute drug response in very large clinical cancer genomics data sets, such as The Cancer Genome Atlas (TCGA). The approach works by creating statistical models relating gene expression to drug response in large panels of cancer cell lines and applying these models to tumor gene expression data in the clinical data sets (e.g., TCGA). This yields an imputed drug response for every drug in each patient. These imputed drug response data are then associated with somatic genetic variants measured in the clinical cohort, such as copy number changes or mutations in protein coding genes. These analyses recapitulated drug associations for known clinically actionable somatic genetic alterations and identified new predictive biomarkers for existing drugs.
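
The two-stage workflow in this abstract (train on cell lines, apply to tumor expression, then test against somatic alterations) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy numbers, the function names `fit_ridge` and `predict`, and the choice of ridge regression fitted by gradient descent, since the abstract does not say which statistical model was used.

```python
# Illustrative sketch only: toy data, hypothetical names, and ridge regression
# chosen as a stand-in for the (unspecified) statistical model.

def fit_ridge(X, y, lam=0.1, lr=0.05, steps=5000):
    """Fit y ~ X.w + b by gradient descent with an L2 penalty on w."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(steps):
        # Residuals under the current parameters.
        res = [sum(wj * xj for wj, xj in zip(w, row)) + b - yi
               for row, yi in zip(X, y)]
        grad_w = [sum(r * row[j] for r, row in zip(res, X)) / n + lam * w[j] / n
                  for j in range(p)]
        grad_b = sum(res) / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(model, X):
    """Apply a fitted (w, b) model to new expression profiles."""
    w, b = model
    return [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]

# Stage 1: cell lines have both expression (two genes) and measured drug response.
cell_line_expr = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
cell_line_response = [1.0, 2.0, 5.0, 6.0, 9.0]
model = fit_ridge(cell_line_expr, cell_line_response)

# Stage 2: tumors have expression only, so the drug response is imputed.
tumor_expr = [[0.5, 1.0], [1.0, 0.0], [3.5, 0.0], [3.0, 1.0]]
imputed = predict(model, tumor_expr)

# Stage 3: associate imputed response with a somatic alteration
# (here a plain difference in group means; a real analysis would use a test).
has_mutation = [False, False, True, True]
mutated = [r for r, m in zip(imputed, has_mutation) if m]
wild_type = [r for r, m in zip(imputed, has_mutation) if not m]
mean_difference = sum(mutated) / len(mutated) - sum(wild_type) / len(wild_type)
```

In this sketch the model never sees a drug response measured in a tumor, which mirrors the point of the method: the clinical cohort contributes expression and variant calls, and the response is carried over from the preclinical panel.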


Subject(s)
Antineoplastic Agents/pharmacology; Biomarkers, Tumor/genetics; Genome, Human; Genomics/methods; Neoplasms; Pharmacogenomic Testing/methods; Female; Humans; Male; Neoplasms/drug therapy; Neoplasms/genetics
5.
Blood ; 130(4): 453-459, 2017 07 27.
Article in English | MEDLINE | ID: mdl-28600341

ABSTRACT

The National Cancer Institute Genomic Data Commons (GDC) is an information system for storing, analyzing, and sharing genomic and clinical data from patients with cancer. The recent high-throughput sequencing of cancer genomes and transcriptomes has produced a big data problem that precludes many cancer biologists and oncologists from gleaning knowledge from these data regarding the nature of malignant processes and the relationship between tumor genomic profiles and treatment response. The GDC aims to democratize access to cancer genomic data and to foster the sharing of these data to promote precision medicine approaches to the diagnosis and treatment of cancer.


Subject(s)
Databases, Genetic; Neoplasms/genetics; Precision Medicine; Software; Humans; National Cancer Institute (U.S.); United States
6.
Genome Res ; 24(7): 1224-35, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24985916

ABSTRACT

Annotation of regulatory elements and identification of the transcription-related factors (TRFs) targeting these elements are key steps in understanding how cells interpret their genetic blueprint and their environment during development, and how that process goes awry in the case of disease. One goal of the modENCODE (model organism ENCyclopedia of DNA Elements) Project is to survey a diverse sampling of TRFs, both DNA-binding and non-DNA-binding factors, to provide a framework for the subsequent study of the mechanisms by which transcriptional regulators target the genome. Here we provide an updated map of the Drosophila melanogaster regulatory genome based on the location of 84 TRFs at various stages of development. This regulatory map reveals a variety of genomic targeting patterns, including factors with strong preferences toward proximal promoter binding, factors that target intergenic and intronic DNA, and factors with distinct chromatin state preferences. The data also highlight the stringency of the Polycomb regulatory network, and show association of the Trithorax-like (Trl) protein with hotspots of DNA binding throughout development. Furthermore, the data identify more than 5800 instances in which TRFs target DNA regions with demonstrated enhancer activity. Regions of high TRF co-occupancy are more likely to be associated with open enhancers used across cell types, while lower TRF occupancy regions are associated with complex enhancers that are also regulated at the epigenetic level. Together these data serve as a resource for the research community in the continued effort to dissect transcriptional regulatory mechanisms directing Drosophila development.


Subject(s)
Drosophila melanogaster/genetics; Drosophila melanogaster/metabolism; Gene Expression Regulation; Genome, Insect; Transcription Factors; Transcription, Genetic; Animals; Base Sequence; Binding Sites; Chromatin/genetics; Chromatin/metabolism; Cluster Analysis; Computational Biology/methods; Enhancer Elements, Genetic; Gene Expression Profiling; Genomics/methods; Nucleotide Motifs; Protein Binding; Regulatory Sequences, Nucleic Acid; Transcription Factors/metabolism
7.
Nature ; 471(7339): 527-31, 2011 Mar 24.
Article in English | MEDLINE | ID: mdl-21430782

ABSTRACT

Systematic annotation of gene regulatory elements is a major challenge in genome science. Direct mapping of chromatin modification marks and transcriptional factor binding sites genome-wide has successfully identified specific subtypes of regulatory elements. In Drosophila several pioneering studies have provided genome-wide identification of Polycomb response elements, chromatin states, transcription factor binding sites, RNA polymerase II regulation and insulator elements; however, comprehensive annotation of the regulatory genome remains a significant challenge. Here we describe results from the modENCODE cis-regulatory annotation project. We produced a map of the Drosophila melanogaster regulatory genome on the basis of more than 300 chromatin immunoprecipitation data sets for eight chromatin features, five histone deacetylases and thirty-eight site-specific transcription factors at different stages of development. Using these data we inferred more than 20,000 candidate regulatory elements and validated a subset of predictions for promoters, enhancers and insulators in vivo. We identified also nearly 2,000 genomic regions of dense transcription factor binding associated with chromatin activity and accessibility. We discovered hundreds of new transcription factor co-binding relationships and defined a transcription factor network with over 800 potential regulatory relationships.


Subject(s)
Drosophila melanogaster/genetics; Genome, Insect/genetics; Molecular Sequence Annotation; Regulatory Sequences, Nucleic Acid/genetics; Animals; Chromatin/metabolism; Chromatin Assembly and Disassembly; Chromatin Immunoprecipitation; Enhancer Elements, Genetic/genetics; Histone Deacetylases/metabolism; Insulator Elements/genetics; Promoter Regions, Genetic/genetics; Reproducibility of Results; Silencer Elements, Transcriptional/genetics; Transcription Factors/metabolism
8.
Comput Sci Eng ; 18(5): 10-20, 2016.
Article in English | MEDLINE | ID: mdl-29033693

ABSTRACT

Data commons collocate data, storage, and computing infrastructure with core services and commonly used tools and applications for managing, analyzing, and sharing data to create an interoperable resource for the research community. An architecture for data commons is described, as well as some lessons learned from operating several large-scale data commons.

9.
Blood ; 121(6): 975-83, 2013 Feb 07.
Article in English | MEDLINE | ID: mdl-23212519

ABSTRACT

Loss of chromosome 7 and del(7q) [-7/del(7q)] are recurring cytogenetic abnormalities in hematologic malignancies, including acute myeloid leukemia and therapy-related myeloid neoplasms, and associated with an adverse prognosis. Despite intensive effort by many laboratories, the putative myeloid tumor suppressor(s) on chromosome 7 has not yet been identified. We performed transcriptome sequencing and SNP array analysis on de novo and therapy-related myeloid neoplasms, half with -7/del(7q). We identified a 2.17-Mb commonly deleted segment on chromosome band 7q22.1 containing CUX1, a gene encoding a homeodomain-containing transcription factor. In 1 case, CUX1 was disrupted by a translocation, resulting in a loss-of-function RNA fusion transcript. CUX1 was the most significantly differentially expressed gene within the commonly deleted segment and was expressed at haploinsufficient levels in -7/del(7q) leukemias. Haploinsufficiency of the highly conserved ortholog, cut, led to hemocyte overgrowth and tumor formation in Drosophila melanogaster. Similarly, haploinsufficiency of CUX1 gave human hematopoietic cells a significant engraftment advantage on transplantation into immunodeficient mice. Within the RNA-sequencing data, we identified a CUX1-associated cell cycle transcriptional gene signature, suggesting that CUX1 exerts tumor suppressor activity by regulating proliferative genes. These data identify CUX1 as a conserved, haploinsufficient tumor suppressor frequently deleted in myeloid neoplasms.


Subject(s)
Chromosome Deletion; Chromosomes, Human, Pair 7/genetics; Homeodomain Proteins/genetics; Leukemia, Myeloid/genetics; Nuclear Proteins/genetics; Repressor Proteins/genetics; Acute Disease; Animals; Blotting, Western; Cell Line, Tumor; Drosophila melanogaster/genetics; Gene Expression Profiling; Haploinsufficiency; HeLa Cells; Homeodomain Proteins/metabolism; Humans; Interleukin Receptor Common gamma Subunit/deficiency; Interleukin Receptor Common gamma Subunit/genetics; K562 Cells; Leukemia, Myeloid/metabolism; Leukemia, Myeloid/pathology; Mice; Mice, Inbred NOD; Mice, Knockout; Mice, SCID; Nuclear Proteins/metabolism; RNA Interference; Repressor Proteins/metabolism; Reverse Transcriptase Polymerase Chain Reaction; Transcription Factors; Translocation, Genetic; Tumor Suppressor Proteins/genetics; Tumor Suppressor Proteins/metabolism; U937 Cells; Xenograft Model Antitumor Assays
11.
JAMIA Open ; 7(1): ooae004, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38304249

ABSTRACT

Objective: The Pediatric Cancer Data Commons (PCDC), a project of Data for the Common Good, houses clinical pediatric oncology data and utilizes the open-source Gen3 platform. To meet the needs of end users, the PCDC development team expanded the out-of-box functionality and developed additional custom features that should be useful to any group developing similar data commons. Materials and Methods: Modifications of the PCDC data portal software were implemented to facilitate desired functionality. Results: Newly developed functionality includes updates to authorization methods, expansion of filtering capabilities, and addition of data analysis functions. Discussion: We describe the process by which custom functionalities were developed. Features are open source and available to be implemented and adapted to suit needs of data portals that utilize the Gen3 platform. Conclusion: Data portals are indispensable tools for facilitating data sharing. Open-source infrastructure facilitates a modular and collaborative approach for meeting needs of end users and stakeholders.

12.
Cancer Res ; 84(9): 1384-1387, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38488505

ABSTRACT

The NCI Cancer Research Data Commons (CRDC) is a collection of data commons, analysis platforms, and tools that make existing cancer data more findable and accessible by the cancer research community. In practice, the two biggest hurdles to finding and using data for discovery are the wide variety of models and ontologies used to describe data, and the dispersed storage of that data. Here, we outline core CRDC services to aggregate descriptive information from multiple studies for findability via a single interface and to provide a single access method that spans multiple data commons. See related articles by Wang et al., p. 1388, Pot et al., p. 1396, and Kim et al., p. 1404.


Subject(s)
National Cancer Institute (U.S.); Neoplasms; Humans; United States; Neoplasms/therapy; Biomedical Research/standards; Databases, Factual
13.
JAMIA Open ; 7(2): ooae025, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38617994

ABSTRACT

Objectives: A data commons is a software platform for managing, curating, analyzing, and sharing data with a community. The Pandemic Response Commons (PRC) is a data commons designed to provide a data platform for researchers studying an epidemic or pandemic. Methods: The PRC was developed using the open source Gen3 data platform and is based upon consortium, data, and platform agreements developed by the not-for-profit Open Commons Consortium. A formal consortium of Chicagoland area organizations was formed to develop and operate the PRC. Results: The consortium developed a general PRC and an instance of it for the Chicagoland region called the Chicagoland COVID-19 Commons. A Gen3 data platform was set up and operated with policies, procedures, and controls for a NIST SP 800-53 revision 4 Moderate system. A consensus data model for the commons was developed, and a variety of datasets were curated, harmonized and ingested, including statistical summary data about COVID cases, patient level clinical data, and SARS-CoV-2 viral variant data. Discussion and conclusions: Given the various legal and data agreements required to operate a data commons, a PRC is designed to be in place and operating at a low level prior to the occurrence of an epidemic, with the activities increasing as required during an epidemic. A regional instance of a PRC can also be part of a broader data ecosystem or data mesh consisting of multiple regional commons supporting pandemic response through sharing regional data.

14.
Stud Health Technol Inform ; 310: 735-739, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269906

ABSTRACT

High-resolution whole slide image scans of histopathology slides have been widely used in recent years for prediction in cancer. However, in some cases, clinical informatics practitioners may only have access to low-resolution snapshots of histopathology slides, not high-resolution scans. We evaluated strategies for training neural network prognostic models in non-small cell lung cancer (NSCLC) based on low-resolution snapshots, using data from the Veterans Affairs Precision Oncology Data Repository. We compared strategies without transfer learning, with transfer learning from general domain images, and with transfer learning from publicly available high-resolution histopathology scans. We found transfer learning from high-resolution scans achieved significantly better performance than other strategies. Our contribution provides a foundation for future development of prognostic models in NSCLC that incorporate data from low-resolution pathology slide snapshots alongside known clinical predictors.


Subject(s)
Carcinoma, Non-Small-Cell Lung; Lung Neoplasms; Medical Informatics; Humans; Carcinoma, Non-Small-Cell Lung/diagnostic imaging; Lung Neoplasms/diagnostic imaging; Precision Medicine; Machine Learning
15.
Cancer Res ; 84(9): 1388-1395, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38488507

ABSTRACT

Since 2014, the NCI has launched a series of data commons as part of the Cancer Research Data Commons (CRDC) ecosystem housing genomic, proteomic, imaging, and clinical data to support cancer research and promote data sharing of NCI-funded studies. This review describes each data commons (Genomic Data Commons, Proteomic Data Commons, Integrated Canine Data Commons, Cancer Data Service, Imaging Data Commons, and Clinical and Translational Data Commons), including their unique and shared features, accomplishments, and challenges. Also discussed is how the CRDC data commons implement Findable, Accessible, Interoperable, Reusable (FAIR) principles and promote data sharing in support of the new NIH Data Management and Sharing Policy. See related articles by Brady et al., p. 1384, Pot et al., p. 1396, and Kim et al., p. 1404.


Subject(s)
Information Dissemination; National Cancer Institute (U.S.); Neoplasms; Humans; United States; Neoplasms/metabolism; Information Dissemination/methods; Biomedical Research; Genomics/methods; Animals; Proteomics/methods
16.
Appl Environ Microbiol ; 79(5): 1757-9, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23315735

ABSTRACT

"Candidatus Portiera aleyrodidarum" is the primary endosymbiont of whiteflies. We report two complete genome sequences of this bacterium from the worldwide invasive B and Q biotypes of the whitefly Bemisia tabaci. Differences in the two genome sequences may add insights into the complex differences in the biology of both biotypes.


Subject(s)
Halomonadaceae/isolation & purification; Halomonadaceae/physiology; Hemiptera/microbiology; Symbiosis; Animals; DNA, Bacterial/chemistry; DNA, Bacterial/genetics; Genome, Bacterial; Halomonadaceae/classification; Halomonadaceae/genetics; Molecular Sequence Data
17.
bioRxiv ; 2023 Dec 10.
Article in English | MEDLINE | ID: mdl-38106010

ABSTRACT

Spatial transcriptomics (ST) has enhanced RNA analysis in tissue biopsies, but interpreting these data is challenging without expert input. We present Automated Tissue Alignment and Traversal (ATAT), a novel computational framework designed to enhance ST analysis in the context of multiple and complex tissue architectures and morphologies, such as those found in biopsies of the gastrointestinal tract. ATAT utilizes self-supervised contrastive learning on hematoxylin and eosin (H&E) stained images to automate the alignment and traversal of ST data. This approach addresses a critical gap in current ST analysis methodologies, which rely heavily on manual annotation and pathologist expertise to delineate regions of interest for accurate gene expression modeling. Our framework not only streamlines the alignment of multiple ST samples, but also demonstrates robustness in modeling gene expression transitions across specific regions. Additionally, we highlight the ability of ATAT to traverse complex tissue topologies in real-world cases from various individuals and conditions. Our method successfully elucidates differences in immune infiltration patterns across the intestinal wall, enabling the modeling of transcriptional changes across histological layers. We show that ATAT achieves comparable performance to the state-of-the-art method, while alleviating the burden of manual annotation and enabling alignment of tissue samples with complex morphologies.

18.
Article in English | MEDLINE | ID: mdl-38050021

ABSTRACT

Veterans are at an increased risk for prostate cancer, a disease with extraordinary clinical and molecular heterogeneity, compared with the general population. However, little is known about the underlying molecular heterogeneity within the veteran population and its impact on patient management and treatment. Using clinical and targeted tumor sequencing data from the National Veterans Affairs health system, we conducted a retrospective cohort study on 45 patients with advanced prostate cancer in the Veterans Precision Oncology Data Commons (VPODC), most of whom were metastatic castration-resistant. We characterized the mutational burden in this cohort and conducted unsupervised clustering analysis to stratify patients by molecular alterations. Veterans with prostate cancer exhibited a mutational landscape broadly similar to prior studies, including KMT2A and NOTCH1 mutations associated with neuroendocrine prostate cancer phenotype, previously reported to be enriched in veterans. We also identified several potential novel mutations in PTEN, MSH6, VHL, SMO, and ABL1. Hierarchical clustering analysis revealed two subgroups containing therapeutically targetable molecular features with novel mutational signatures distinct from those reported in the Catalogue of Somatic Mutations in Cancer database. The clustering approach presented in this study can potentially be used to clinically stratify patients based on their distinct mutational profiles and identify actionable somatic mutations for precision oncology.
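
As a rough illustration of stratifying patients by mutational profile, the sketch below runs agglomerative hierarchical clustering on sets of mutated genes. The abstract does not state the distance measure or linkage used, so Jaccard distance and average linkage are assumptions here, as are the function names and the four invented patients (only the gene symbols come from the abstract).

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 minus the fraction of genes shared by two mutation sets."""
    union = len(a | b)
    return 1.0 - len(a & b) / union if union else 0.0

def agglomerate(profiles, n_clusters):
    """Average-linkage agglomerative clustering; returns a list of sets of patient ids."""
    clusters = [{pid} for pid in profiles]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        dists = [jaccard_distance(profiles[p], profiles[q]) for p in c1 for q in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters; i < j, so deleting j is safe.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters

# Hypothetical patients; the gene-to-patient assignments are invented.
profiles = {
    "P1": {"KMT2A", "NOTCH1"},
    "P2": {"KMT2A", "NOTCH1", "PTEN"},
    "P3": {"MSH6", "VHL"},
    "P4": {"MSH6", "VHL", "SMO"},
}
groups = agglomerate(profiles, n_clusters=2)
```

Cutting the merge sequence at two clusters parallels the two subgroups reported in the abstract, but with real data the number of clusters would need to be chosen and validated rather than fixed in advance.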


Subject(s)
Prostatic Neoplasms; Veterans; Male; Humans; Retrospective Studies; Precision Medicine; Prostatic Neoplasms/genetics; Prostatic Neoplasms/pathology; Medical Oncology; Mutation
19.
J Mol Diagn ; 25(3): 143-155, 2023 03.
Article in English | MEDLINE | ID: mdl-36828596

ABSTRACT

The Blood Profiling Atlas in Cancer (BLOODPAC) Consortium is a collaborative effort involving stakeholders from the public, industry, academia, and regulatory agencies focused on developing shared best practices on liquid biopsy. This report describes the results from the JFDI (Just Freaking Do It) study, a BLOODPAC initiative to develop standards on the use of contrived materials mimicking cell-free circulating tumor DNA, to comparatively evaluate clinical laboratory testing procedures. Nine independent laboratories tested the concordance, sensitivity, and specificity of commercially available contrived materials with known variant-allele frequencies (VAFs) ranging from 0.1% to 5.0%. Each participating laboratory utilized its own proprietary evaluation procedures. The results demonstrated high levels of concordance and sensitivity at VAFs of >0.1%, but reduced concordance and sensitivity at a VAF of 0.1%; these findings were similar to those from previous studies, suggesting that commercially available contrived materials can support the evaluation of testing procedures across multiple technologies. Such materials may enable more objective comparisons of results on materials formulated in-house at each center in multicenter trials. A unique goal of the collaborative effort was to develop a data resource, the BLOODPAC Data Commons, now available to the liquid-biopsy community for further study. This resource can be used to support independent evaluations of results, data extension through data integration and new studies, and retrospective evaluation of data collection.
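
The sensitivity and concordance figures discussed in this abstract can be made concrete with a small worked example. The calls below are invented for illustration and are not data from the JFDI study; the lab names, the dictionary layout, and the helper functions are likewise hypothetical.

```python
# Toy illustration with invented calls; not data from the JFDI study.

def sensitivity(calls):
    """Fraction of known (spiked-in) variants that a laboratory detected."""
    return sum(calls) / len(calls)

def pairwise_concordance(calls_a, calls_b):
    """Fraction of variants on which two laboratories made the same call."""
    agree = sum(a == b for a, b in zip(calls_a, calls_b))
    return agree / len(calls_a)

# calls[vaf_percent][lab] lists, for four known variants, whether each was detected.
calls = {
    5.0: {"lab_A": [True, True, True, True], "lab_B": [True, True, True, True]},
    0.1: {"lab_A": [True, False, True, False], "lab_B": [True, True, False, False]},
}

summary = {}
for vaf, by_lab in calls.items():
    per_lab_sensitivity = {lab: sensitivity(c) for lab, c in by_lab.items()}
    concordance = pairwise_concordance(by_lab["lab_A"], by_lab["lab_B"])
    summary[vaf] = (per_lab_sensitivity, concordance)
```

Because every variant in a contrived material is known to be present, sensitivity needs no separate truth set; specificity would require negative-control positions, which this sketch leaves out.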


Subject(s)
Circulating Tumor DNA; Hematologic Neoplasms; Neoplasms; Humans; Retrospective Studies; Neoplasms/genetics; Liquid Biopsy/methods
20.
Cancer Res ; 83(8): 1175-1182, 2023 04 14.
Article in English | MEDLINE | ID: mdl-36625843

ABSTRACT

Big data in healthcare can enable unprecedented understanding of diseases and their treatment, particularly in oncology. These data may include electronic health records, medical imaging, genomic sequencing, payor records, and data from pharmaceutical research, wearables, and medical devices. The ability to combine datasets and use data across many analyses is critical to the successful use of big data and is a concern for those who generate and use the data. Interoperability and data quality continue to be major challenges when working with different healthcare datasets. Mapping terminology across datasets, missing and incorrect data, and varying data structures make combining data an onerous and largely manual undertaking. Data privacy is another concern addressed by the Health Insurance Portability and Accountability Act, the Common Rule, and the General Data Protection Regulation. The use of big data is now included in the planning and activities of the FDA and the European Medicines Agency. The willingness of organizations to share data in a precompetitive fashion, agreements on data quality standards, and institution of universal and practical tenets on data privacy will be crucial to fully realizing the potential for big data in medicine.


Subject(s)
Big Data; Neoplasms; Humans; Neoplasms/diagnosis; Neoplasms/therapy; Precision Medicine; Information Storage and Retrieval