Results 1 - 20 of 52
1.
Cell ; 155(1): 70-80, 2013 Sep 26.
Article in English | MEDLINE | ID: mdl-24074861

ABSTRACT

Although countless highly penetrant variants have been associated with Mendelian disorders, the genetic etiologies underlying complex diseases remain largely unresolved. By mining the medical records of over 110 million patients, we examine the extent to which Mendelian variation contributes to complex disease risk. We detect thousands of associations between Mendelian and complex diseases, revealing a nondegenerate, phenotypic code that links each complex disorder to a unique collection of Mendelian loci. Using genome-wide association results, we demonstrate that common variants associated with complex diseases are enriched in the genes indicated by this "Mendelian code." Finally, we detect hundreds of comorbidity associations among Mendelian disorders, and we use probabilistic genetic modeling to demonstrate that Mendelian variants likely contribute nonadditively to the risk for a subset of complex diseases. Overall, this study illustrates a complementary approach for mapping complex disease loci and provides unique predictions concerning the etiologies of specific diseases.


Subject(s)
Disease/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Models, Genetic , Health Records, Personal , Humans , Penetrance , Polymorphism, Single Nucleotide
2.
PLoS Comput Biol ; 19(3): e1010944, 2023 03.
Article in English | MEDLINE | ID: mdl-36913405

ABSTRACT

We introduce a self-describing serialized format for bulk biomedical data called the Portable Format for Biomedical (PFB) data. The Portable Format for Biomedical data is based upon Avro and encapsulates a data model, a data dictionary, the data itself, and pointers to third party controlled vocabularies. In general, each data element in the data dictionary is associated with a third party controlled vocabulary to make it easier for applications to harmonize two or more PFB files. We also introduce an open source software development kit (SDK) called PyPFB for creating, exploring and modifying PFB files. We describe experimental studies showing the performance improvements when importing and exporting bulk biomedical data in the PFB format versus using JSON and SQL formats.


Subject(s)
Software , Vocabulary, Controlled , Records
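
The PFB abstract above describes a file that carries its own data dictionary, the data, and pointers to third-party vocabularies. The sketch below illustrates only that self-describing idea, using plain JSON from the Python standard library. It is not the PFB layout, does not use Avro, and does not use the PyPFB SDK; the function names `write_bundle` and `read_bundle`, the field names, and the `EXAMPLE:` vocabulary terms are all hypothetical.

```python
import json


def write_bundle(dictionary: dict, records: list) -> str:
    """Serialize the data dictionary and the records into one document."""
    return json.dumps({"dictionary": dictionary, "records": records})


def read_bundle(text: str) -> dict:
    """Load a bundle and check every record against its own dictionary."""
    bundle = json.loads(text)
    fields = bundle["dictionary"]["fields"]
    for record in bundle["records"]:
        for name, spec in fields.items():
            if name not in record:
                raise ValueError(f"missing field: {name}")
            if type(record[name]).__name__ != spec["type"]:
                raise ValueError(f"wrong type for field: {name}")
    return bundle


# Hypothetical dictionary: each field names its type and an external
# controlled-vocabulary term (placeholder identifiers, not real codes).
dictionary = {
    "fields": {
        "subject_id": {"type": "str", "term": "EXAMPLE:0001"},
        "age_years": {"type": "int", "term": "EXAMPLE:0002"},
    }
}
records = [{"subject_id": "S1", "age_years": 42}]

bundle = read_bundle(write_bundle(dictionary, records))

# The vocabulary terms travel with the data, so two bundles can be
# compared by term rather than by local field name.
shared_terms = {spec["term"] for spec in bundle["dictionary"]["fields"].values()}
```

Because the dictionary is read from the same document as the records, a consumer needs no outside schema to validate the data, which is the property the abstract attributes to a self-describing format.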
3.
Trends Genet ; 35(3): 223-234, 2019 03.
Article in English | MEDLINE | ID: mdl-30691868

ABSTRACT

Data commons collate data with cloud computing infrastructure and commonly used software services, tools, and applications to create biomedical resources for the large-scale management, analysis, harmonization, and sharing of biomedical data. Over the past few years, data commons have been used to analyze, harmonize, and share large-scale genomics datasets. Data ecosystems can be built by interoperating multiple data commons. It can be quite labor intensive to curate, import, and analyze the data in a data commons. Data lakes provide an alternative to data commons and simply provide access to data, with the data curation and analysis deferred until later and delegated to those that access the data. We review software platforms for managing, analyzing, and sharing genomic data, with an emphasis on data commons, but also cover data ecosystems and data lakes.


Subject(s)
Cloud Computing/trends , Genomics/methods , Information Dissemination/methods , Software , Big Data , Biomedical Research/trends , Computational Biology/trends , Humans
4.
Genome Res ; 27(10): 1743-1751, 2017 10.
Article in English | MEDLINE | ID: mdl-28847918

ABSTRACT

Obtaining accurate drug response data in large cohorts of cancer patients is very challenging; thus, most cancer pharmacogenomics discovery is conducted in preclinical studies, typically using cell lines and mouse models. However, these platforms suffer from serious limitations, including small sample sizes. Here, we have developed a novel computational method that allows us to impute drug response in very large clinical cancer genomics data sets, such as The Cancer Genome Atlas (TCGA). The approach works by creating statistical models relating gene expression to drug response in large panels of cancer cell lines and applying these models to tumor gene expression data in the clinical data sets (e.g., TCGA). This yields an imputed drug response for every drug in each patient. These imputed drug response data are then associated with somatic genetic variants measured in the clinical cohort, such as copy number changes or mutations in protein coding genes. These analyses recapitulated drug associations for known clinically actionable somatic genetic alterations and identified new predictive biomarkers for existing drugs.


Subject(s)
Antineoplastic Agents/pharmacology , Biomarkers, Tumor/genetics , Genome, Human , Genomics/methods , Neoplasms , Pharmacogenomic Testing/methods , Female , Humans , Male , Neoplasms/drug therapy , Neoplasms/genetics
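
The imputation approach in the abstract above has two steps: fit a model relating gene expression to drug response in cell lines, then apply that model to tumor expression profiles. The sketch below shows those two steps for a single drug on toy numbers. The abstract says only "statistical models", so the choice of ridge regression here is an assumption for illustration, and the names `fit_ridge` and `impute`, the penalty value, and all data are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_ridge(expr: Matrix, response: List[float], lam: float) -> Tuple[float, List[float]]:
    """Step 1: fit response ~ expression on cell lines; returns (intercept, weights)."""
    n, p = len(expr), len(expr[0])
    mean_x = [sum(row[j] for row in expr) / n for j in range(p)]
    mean_y = sum(response) / n
    xc = [[row[j] - mean_x[j] for j in range(p)] for row in expr]
    yc = [y - mean_y for y in response]
    # Normal equations on centered data: (X^T X + lam * I) w = X^T y
    gram = [
        [sum(xc[i][j] * xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
         for k in range(p)]
        for j in range(p)
    ]
    rhs = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    weights = solve(gram, rhs)
    intercept = mean_y - sum(w * m for w, m in zip(weights, mean_x))
    return intercept, weights


def impute(model: Tuple[float, List[float]], tumor_expr: Matrix) -> List[float]:
    """Step 2: apply the cell-line model to tumor expression profiles."""
    intercept, weights = model
    return [intercept + sum(w * x for w, x in zip(weights, row)) for row in tumor_expr]


# Toy cell-line panel: rows are cell lines, columns are two genes.
cell_expr = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [1.0, 3.0]]
cell_resp = [2.5, -0.5, 3.5, -0.5]  # measured response to one drug
tumor_expr = [[3.0, 1.0], [0.5, 2.0]]  # clinical samples, same two genes

model = fit_ridge(cell_expr, cell_resp, lam=0.1)
imputed = impute(model, tumor_expr)  # one imputed response per tumor
```

Repeating the fit for each drug would give one imputed value per drug per patient, which could then be tested for association with somatic variants as the abstract describes.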
5.
Blood ; 130(4): 453-459, 2017 07 27.
Article in English | MEDLINE | ID: mdl-28600341

ABSTRACT

The National Cancer Institute Genomic Data Commons (GDC) is an information system for storing, analyzing, and sharing genomic and clinical data from patients with cancer. The recent high-throughput sequencing of cancer genomes and transcriptomes has produced a big data problem that precludes many cancer biologists and oncologists from gleaning knowledge from these data regarding the nature of malignant processes and the relationship between tumor genomic profiles and treatment response. The GDC aims to democratize access to cancer genomic data and to foster the sharing of these data to promote precision medicine approaches to the diagnosis and treatment of cancer.


Subject(s)
Databases, Genetic , Neoplasms/genetics , Precision Medicine , Software , Humans , National Cancer Institute (U.S.) , United States
6.
Genome Res ; 24(7): 1224-35, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24985916

ABSTRACT

Annotation of regulatory elements and identification of the transcription-related factors (TRFs) targeting these elements are key steps in understanding how cells interpret their genetic blueprint and their environment during development, and how that process goes awry in the case of disease. One goal of the modENCODE (model organism ENCyclopedia of DNA Elements) Project is to survey a diverse sampling of TRFs, both DNA-binding and non-DNA-binding factors, to provide a framework for the subsequent study of the mechanisms by which transcriptional regulators target the genome. Here we provide an updated map of the Drosophila melanogaster regulatory genome based on the location of 84 TRFs at various stages of development. This regulatory map reveals a variety of genomic targeting patterns, including factors with strong preferences toward proximal promoter binding, factors that target intergenic and intronic DNA, and factors with distinct chromatin state preferences. The data also highlight the stringency of the Polycomb regulatory network, and show association of the Trithorax-like (Trl) protein with hotspots of DNA binding throughout development. Furthermore, the data identify more than 5800 instances in which TRFs target DNA regions with demonstrated enhancer activity. Regions of high TRF co-occupancy are more likely to be associated with open enhancers used across cell types, while lower TRF occupancy regions are associated with complex enhancers that are also regulated at the epigenetic level. Together these data serve as a resource for the research community in the continued effort to dissect transcriptional regulatory mechanisms directing Drosophila development.


Subject(s)
Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Gene Expression Regulation , Genome, Insect , Transcription Factors , Transcription, Genetic , Animals , Base Sequence , Binding Sites , Chromatin/genetics , Chromatin/metabolism , Cluster Analysis , Computational Biology/methods , Enhancer Elements, Genetic , Gene Expression Profiling , Genomics/methods , Nucleotide Motifs , Protein Binding , Regulatory Sequences, Nucleic Acid , Transcription Factors/metabolism
7.
Nature ; 471(7339): 527-31, 2011 Mar 24.
Article in English | MEDLINE | ID: mdl-21430782

ABSTRACT

Systematic annotation of gene regulatory elements is a major challenge in genome science. Direct mapping of chromatin modification marks and transcriptional factor binding sites genome-wide has successfully identified specific subtypes of regulatory elements. In Drosophila several pioneering studies have provided genome-wide identification of Polycomb response elements, chromatin states, transcription factor binding sites, RNA polymerase II regulation and insulator elements; however, comprehensive annotation of the regulatory genome remains a significant challenge. Here we describe results from the modENCODE cis-regulatory annotation project. We produced a map of the Drosophila melanogaster regulatory genome on the basis of more than 300 chromatin immunoprecipitation data sets for eight chromatin features, five histone deacetylases and thirty-eight site-specific transcription factors at different stages of development. Using these data we inferred more than 20,000 candidate regulatory elements and validated a subset of predictions for promoters, enhancers and insulators in vivo. We identified also nearly 2,000 genomic regions of dense transcription factor binding associated with chromatin activity and accessibility. We discovered hundreds of new transcription factor co-binding relationships and defined a transcription factor network with over 800 potential regulatory relationships.


Subject(s)
Drosophila melanogaster/genetics , Genome, Insect/genetics , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Animals , Chromatin/metabolism , Chromatin Assembly and Disassembly , Chromatin Immunoprecipitation , Enhancer Elements, Genetic/genetics , Histone Deacetylases/metabolism , Insulator Elements/genetics , Promoter Regions, Genetic/genetics , Reproducibility of Results , Silencer Elements, Transcriptional/genetics , Transcription Factors/metabolism
8.
Comput Sci Eng ; 18(5): 10-20, 2016.
Article in English | MEDLINE | ID: mdl-29033693

ABSTRACT

Data commons collocate data, storage, and computing infrastructure with core services and commonly used tools and applications for managing, analyzing, and sharing data to create an interoperable resource for the research community. An architecture for data commons is described, as well as some lessons learned from operating several large-scale data commons.

9.
Blood ; 121(6): 975-83, 2013 Feb 07.
Article in English | MEDLINE | ID: mdl-23212519

ABSTRACT

Loss of chromosome 7 and del(7q) [-7/del(7q)] are recurring cytogenetic abnormalities in hematologic malignancies, including acute myeloid leukemia and therapy-related myeloid neoplasms, and associated with an adverse prognosis. Despite intensive effort by many laboratories, the putative myeloid tumor suppressor(s) on chromosome 7 has not yet been identified. We performed transcriptome sequencing and SNP array analysis on de novo and therapy-related myeloid neoplasms, half with -7/del(7q). We identified a 2.17-Mb commonly deleted segment on chromosome band 7q22.1 containing CUX1, a gene encoding a homeodomain-containing transcription factor. In 1 case, CUX1 was disrupted by a translocation, resulting in a loss-of-function RNA fusion transcript. CUX1 was the most significantly differentially expressed gene within the commonly deleted segment and was expressed at haploinsufficient levels in -7/del(7q) leukemias. Haploinsufficiency of the highly conserved ortholog, cut, led to hemocyte overgrowth and tumor formation in Drosophila melanogaster. Similarly, haploinsufficiency of CUX1 gave human hematopoietic cells a significant engraftment advantage on transplantation into immunodeficient mice. Within the RNA-sequencing data, we identified a CUX1-associated cell cycle transcriptional gene signature, suggesting that CUX1 exerts tumor suppressor activity by regulating proliferative genes. These data identify CUX1 as a conserved, haploinsufficient tumor suppressor frequently deleted in myeloid neoplasms.


Subject(s)
Chromosome Deletion , Chromosomes, Human, Pair 7/genetics , Homeodomain Proteins/genetics , Leukemia, Myeloid/genetics , Nuclear Proteins/genetics , Repressor Proteins/genetics , Acute Disease , Animals , Blotting, Western , Cell Line, Tumor , Drosophila melanogaster/genetics , Gene Expression Profiling , Haploinsufficiency , HeLa Cells , Homeodomain Proteins/metabolism , Humans , Interleukin Receptor Common gamma Subunit/deficiency , Interleukin Receptor Common gamma Subunit/genetics , K562 Cells , Leukemia, Myeloid/metabolism , Leukemia, Myeloid/pathology , Mice , Mice, Inbred NOD , Mice, Knockout , Mice, SCID , Nuclear Proteins/metabolism , RNA Interference , Repressor Proteins/metabolism , Reverse Transcriptase Polymerase Chain Reaction , Transcription Factors , Translocation, Genetic , Tumor Suppressor Proteins/genetics , Tumor Suppressor Proteins/metabolism , U937 Cells , Xenograft Model Antitumor Assays
11.
JAMIA Open ; 7(1): ooae004, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38304249

ABSTRACT

Objective: The Pediatric Cancer Data Commons (PCDC)-a project of Data for the Common Good-houses clinical pediatric oncology data and utilizes the open-source Gen3 platform. To meet the needs of end users, the PCDC development team expanded the out-of-box functionality and developed additional custom features that should be useful to any group developing similar data commons. Materials and Methods: Modifications of the PCDC data portal software were implemented to facilitate desired functionality. Results: Newly developed functionality includes updates to authorization methods, expansion of filtering capabilities, and addition of data analysis functions. Discussion: We describe the process by which custom functionalities were developed. Features are open source and available to be implemented and adapted to suit needs of data portals that utilize the Gen3 platform. Conclusion: Data portals are indispensable tools for facilitating data sharing. Open-source infrastructure facilitates a modular and collaborative approach for meeting needs of end users and stakeholders.

12.
JAMIA Open ; 7(2): ooae025, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38617994

ABSTRACT

Objectives: A data commons is a software platform for managing, curating, analyzing, and sharing data with a community. The Pandemic Response Commons (PRC) is a data commons designed to provide a data platform for researchers studying an epidemic or pandemic. Methods: The PRC was developed using the open source Gen3 data platform and is based upon consortium, data, and platform agreements developed by the not-for-profit Open Commons Consortium. A formal consortium of Chicagoland area organizations was formed to develop and operate the PRC. Results: The consortium developed a general PRC and an instance of it for the Chicagoland region called the Chicagoland COVID-19 Commons. A Gen3 data platform was set up and operated with policies, procedures, and controls for a NIST SP 800-53 revision 4 Moderate system. A consensus data model for the commons was developed, and a variety of datasets were curated, harmonized and ingested, including statistical summary data about COVID cases, patient level clinical data, and SARS-CoV-2 viral variant data. Discussion and conclusions: Given the various legal and data agreements required to operate a data commons, a PRC is designed to be in place and operating at a low level prior to the occurrence of an epidemic, with the activities increasing as required during an epidemic. A regional instance of a PRC can also be part of a broader data ecosystem or data mesh consisting of multiple regional commons supporting pandemic response through sharing regional data.

13.
Cancer Res ; 84(9): 1384-1387, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38488505

ABSTRACT

The NCI Cancer Research Data Commons (CRDC) is a collection of data commons, analysis platforms, and tools that make existing cancer data more findable and accessible by the cancer research community. In practice, the two biggest hurdles to finding and using data for discovery are the wide variety of models and ontologies used to describe data, and the dispersed storage of that data. Here, we outline core CRDC services to aggregate descriptive information from multiple studies for findability via a single interface and to provide a single access method that spans multiple data commons. See related articles by Wang et al., p. 1388, Pot et al., p. 1396, and Kim et al., p. 1404.


Subject(s)
National Cancer Institute (U.S.) , Neoplasms , Humans , United States , Neoplasms/therapy , Biomedical Research/standards , Databases, Factual
14.
Stud Health Technol Inform ; 310: 735-739, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269906

ABSTRACT

High-resolution whole slide image scans of histopathology slides have been widely used in recent years for prediction in cancer. However, in some cases, clinical informatics practitioners may only have access to low-resolution snapshots of histopathology slides, not high-resolution scans. We evaluated strategies for training neural network prognostic models in non-small cell lung cancer (NSCLC) based on low-resolution snapshots, using data from the Veterans Affairs Precision Oncology Data Repository. We compared strategies without transfer learning, with transfer learning from general domain images, and with transfer learning from publicly available high-resolution histopathology scans. We found transfer learning from high-resolution scans achieved significantly better performance than other strategies. Our contribution provides a foundation for future development of prognostic models in NSCLC that incorporate data from low-resolution pathology slide snapshots alongside known clinical predictors.


Subject(s)
Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Medical Informatics , Humans , Carcinoma, Non-Small-Cell Lung/diagnostic imaging , Lung Neoplasms/diagnostic imaging , Precision Medicine , Machine Learning
15.
Cancer Res ; 84(9): 1388-1395, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38488507

ABSTRACT

Since 2014, the NCI has launched a series of data commons as part of the Cancer Research Data Commons (CRDC) ecosystem housing genomic, proteomic, imaging, and clinical data to support cancer research and promote data sharing of NCI-funded studies. This review describes each data commons (Genomic Data Commons, Proteomic Data Commons, Integrated Canine Data Commons, Cancer Data Service, Imaging Data Commons, and Clinical and Translational Data Commons), including their unique and shared features, accomplishments, and challenges. Also discussed is how the CRDC data commons implement Findable, Accessible, Interoperable, Reusable (FAIR) principles and promote data sharing in support of the new NIH Data Management and Sharing Policy. See related articles by Brady et al., p. 1384, Pot et al., p. 1396, and Kim et al., p. 1404.


Subject(s)
Information Dissemination , National Cancer Institute (U.S.) , Neoplasms , Humans , United States , Neoplasms/metabolism , Information Dissemination/methods , Biomedical Research , Genomics/methods , Animals , Proteomics/methods
16.
Appl Environ Microbiol ; 79(5): 1757-9, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23315735

ABSTRACT

"Candidatus Portiera aleyrodidarum" is the primary endosymbiont of whiteflies. We report two complete genome sequences of this bacterium from the worldwide invasive B and Q biotypes of the whitefly Bemisia tabaci. Differences in the two genome sequences may add insights into the complex differences in the biology of both biotypes.


Subject(s)
Halomonadaceae/isolation & purification , Halomonadaceae/physiology , Hemiptera/microbiology , Symbiosis , Animals , DNA, Bacterial/chemistry , DNA, Bacterial/genetics , Genome, Bacterial , Halomonadaceae/classification , Halomonadaceae/genetics , Molecular Sequence Data
17.
bioRxiv ; 2023 Dec 10.
Article in English | MEDLINE | ID: mdl-38106010

ABSTRACT

Spatial transcriptomics (ST) has enhanced RNA analysis in tissue biopsies, but interpreting these data is challenging without expert input. We present Automated Tissue Alignment and Traversal (ATAT), a novel computational framework designed to enhance ST analysis in the context of multiple and complex tissue architectures and morphologies, such as those found in biopsies of the gastrointestinal tract. ATAT utilizes self-supervised contrastive learning on hematoxylin and eosin (H&E) stained images to automate the alignment and traversal of ST data. This approach addresses a critical gap in current ST analysis methodologies, which rely heavily on manual annotation and pathologist expertise to delineate regions of interest for accurate gene expression modeling. Our framework not only streamlines the alignment of multiple ST samples, but also demonstrates robustness in modeling gene expression transitions across specific regions. Additionally, we highlight the ability of ATAT to traverse complex tissue topologies in real-world cases from various individuals and conditions. Our method successfully elucidates differences in immune infiltration patterns across the intestinal wall, enabling the modeling of transcriptional changes across histological layers. We show that ATAT achieves comparable performance to the state-of-the-art method, while alleviating the burden of manual annotation and enabling alignment of tissue samples with complex morphologies.

18.
Article in English | MEDLINE | ID: mdl-38050021

ABSTRACT

Veterans are at an increased risk for prostate cancer, a disease with extraordinary clinical and molecular heterogeneity, compared with the general population. However, little is known about the underlying molecular heterogeneity within the veteran population and its impact on patient management and treatment. Using clinical and targeted tumor sequencing data from the National Veterans Affairs health system, we conducted a retrospective cohort study on 45 patients with advanced prostate cancer in the Veterans Precision Oncology Data Commons (VPODC), most of whom were metastatic castration-resistant. We characterized the mutational burden in this cohort and conducted unsupervised clustering analysis to stratify patients by molecular alterations. Veterans with prostate cancer exhibited a mutational landscape broadly similar to prior studies, including KMT2A and NOTCH1 mutations associated with neuroendocrine prostate cancer phenotype, previously reported to be enriched in veterans. We also identified several potential novel mutations in PTEN, MSH6, VHL, SMO, and ABL1. Hierarchical clustering analysis revealed two subgroups containing therapeutically targetable molecular features with novel mutational signatures distinct from those reported in the Catalogue of Somatic Mutations in Cancer database. The clustering approach presented in this study can potentially be used to clinically stratify patients based on their distinct mutational profiles and identify actionable somatic mutations for precision oncology.


Subject(s)
Prostatic Neoplasms , Veterans , Male , Humans , Retrospective Studies , Precision Medicine , Prostatic Neoplasms/genetics , Prostatic Neoplasms/pathology , Medical Oncology , Mutation
19.
J Mol Diagn ; 25(3): 143-155, 2023 03.
Article in English | MEDLINE | ID: mdl-36828596

ABSTRACT

The Blood Profiling Atlas in Cancer (BLOODPAC) Consortium is a collaborative effort involving stakeholders from the public, industry, academia, and regulatory agencies focused on developing shared best practices on liquid biopsy. This report describes the results from the JFDI (Just Freaking Do It) study, a BLOODPAC initiative to develop standards on the use of contrived materials mimicking cell-free circulating tumor DNA, to comparatively evaluate clinical laboratory testing procedures. Nine independent laboratories tested the concordance, sensitivity, and specificity of commercially available contrived materials with known variant-allele frequencies (VAFs) ranging from 0.1% to 5.0%. Each participating laboratory utilized its own proprietary evaluation procedures. The results demonstrated high levels of concordance and sensitivity at VAFs of >0.1%, but reduced concordance and sensitivity at a VAF of 0.1%; these findings were similar to those from previous studies, suggesting that commercially available contrived materials can support the evaluation of testing procedures across multiple technologies. Such materials may enable more objective comparisons of results on materials formulated in-house at each center in multicenter trials. A unique goal of the collaborative effort was to develop a data resource, the BLOODPAC Data Commons, now available to the liquid-biopsy community for further study. This resource can be used to support independent evaluations of results, data extension through data integration and new studies, and retrospective evaluation of data collection.


Subject(s)
Circulating Tumor DNA , Hematologic Neoplasms , Neoplasms , Humans , Retrospective Studies , Neoplasms/genetics , Liquid Biopsy/methods
20.
Cancer Res ; 83(8): 1175-1182, 2023 04 14.
Article in English | MEDLINE | ID: mdl-36625843

ABSTRACT

Big data in healthcare can enable unprecedented understanding of diseases and their treatment, particularly in oncology. These data may include electronic health records, medical imaging, genomic sequencing, payor records, and data from pharmaceutical research, wearables, and medical devices. The ability to combine datasets and use data across many analyses is critical to the successful use of big data and is a concern for those who generate and use the data. Interoperability and data quality continue to be major challenges when working with different healthcare datasets. Mapping terminology across datasets, missing and incorrect data, and varying data structures make combining data an onerous and largely manual undertaking. Data privacy is another concern addressed by the Health Insurance Portability and Accountability Act, the Common Rule, and the General Data Protection Regulation. The use of big data is now included in the planning and activities of the FDA and the European Medicines Agency. The willingness of organizations to share data in a precompetitive fashion, agreements on data quality standards, and institution of universal and practical tenets on data privacy will be crucial to fully realizing the potential for big data in medicine.


Subject(s)
Big Data , Neoplasms , Humans , Neoplasms/diagnosis , Neoplasms/therapy , Precision Medicine , Information Storage and Retrieval