ABSTRACT
A comprehensive catalog of cancer driver mutations is essential for understanding tumorigenesis and developing therapies. Exome-sequencing studies have mapped many protein-coding drivers, yet few non-coding drivers are known because genome-wide discovery is challenging. We developed a driver discovery method, ActiveDriverWGS, and analyzed 120,788 cis-regulatory modules (CRMs) across 1,844 whole tumor genomes from the ICGC-TCGA PCAWG project. We found 30 CRMs with enriched SNVs and indels (FDR < 0.05). These frequently mutated regulatory elements (FMREs) were ubiquitously active in human tissues, showed long-range chromatin interactions and mRNA abundance associations with target genes, and were enriched in motif-rewiring mutations and structural variants. Genomic deletion of one FMRE in human cells caused proliferative deficiencies and transcriptional deregulation of cancer genes CCNB1IP1, CDH1, and CDKN2B, validating observations in FMRE-mutated tumors. Pathway analysis revealed further sub-significant FMREs at cancer genes and processes, indicating an unexplored landscape of infrequent driver mutations in the non-coding genome.
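The core statistical idea can be sketched as follows: test whether an element carries more mutations than its local background predicts, then control the false discovery rate across all elements. This is a simplified illustration under stated assumptions, not the ActiveDriverWGS implementation. The expected count comes from a hypothetical flanking-window mutation rate, the test is a one-sided Poisson upper tail, and multiple testing uses the Benjamini-Hochberg procedure. Element names and counts are made up.

```python
import math

def poisson_upper_tail(k: int, mu: float) -> float:
    """P(X >= k) for X ~ Poisson(mu), summed term by term to avoid 1 - CDF cancellation."""
    if k <= 0:
        return 1.0
    if mu <= 0:
        raise ValueError("expected count must be positive")
    total, i = 0.0, k
    while True:
        term = math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1))  # P(X = i)
        total += term
        # Past the mode the terms only shrink, so stop once they are negligible.
        if i > mu and (term == 0.0 or term < total * 1e-16):
            return min(1.0, total)
        i += 1

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        q[idx] = running_min
    return q

def element_enrichment(elements):
    """elements: name -> (observed, element_bp, background_mutations, background_bp).

    Hypothetical model: expected count = background rate per bp * element length."""
    names = list(elements)
    pvals = []
    for name in names:
        observed, elem_bp, bg_mut, bg_bp = elements[name]
        expected = bg_mut / bg_bp * elem_bp
        pvals.append(poisson_upper_tail(observed, expected))
    return dict(zip(names, zip(pvals, benjamini_hochberg(pvals))))

if __name__ == "__main__":
    # Hypothetical counts: (observed, element bp, flank mutations, flank bp)
    demo = {
        "element_A": (12, 500, 40, 20000),
        "element_B": (2, 500, 40, 20000),
        "element_C": (1, 1000, 50, 20000),
    }
    for name, (p, q) in element_enrichment(demo).items():
        print(f"{name}\tp={p:.3g}\tq={q:.3g}\tFDR<0.05: {q < 0.05}")
```

A real driver model would also adjust for covariates such as sequence context; the sketch keeps only the element-versus-background comparison and the FDR step.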
Subjects
Biomarkers, Tumor/genetics, Chromatin/metabolism, Gene Regulatory Networks, Mutation, Neoplasms/genetics, Neoplasms/pathology, Regulatory Sequences, Nucleic Acid, Cell Proliferation, Chromatin/genetics, Computational Biology/methods, DNA Mutational Analysis, Genome, Human, HEK293 Cells, Humans
ABSTRACT
The lack of interoperable data standards among reference genome data-sharing platforms inhibits cross-platform analysis while increasing the risk of data provenance loss. Here, we describe the FAIR bioHeaders Reference genome (FHR), a metadata standard guided by the principles of Findability, Accessibility, Interoperability and Reuse (FAIR) in addition to the principles of Transparency, Responsibility, User focus, Sustainability and Technology. The objective of FHR is to provide an extensive set of data serialisation methods and minimum data field requirements while still maintaining extensibility, flexibility and expressivity in an increasingly decentralised genomic data ecosystem. The effort needed to implement FHR is low; FHR's design philosophy ensures easy implementation while retaining the benefits gained from recording both machine- and human-readable provenance.
Subjects
Software, Humans, Genome, Genomics, Information Dissemination
ABSTRACT
The Reactome Knowledgebase (https://reactome.org), an Elixir and GCBR core biological data resource, provides manually curated molecular details of a broad range of normal and disease-related biological processes. Processes are annotated as an ordered network of molecular transformations in a single consistent data model. Reactome thus functions both as a digital archive of manually curated human biological processes and as a tool for discovering functional relationships in data such as gene expression profiles or somatic mutation catalogs from tumor cells. Here we review progress towards annotation of the entire human proteome, targeted annotation of disease-causing genetic variants of proteins and of small-molecule drugs in a pathway context, and towards supporting explicit annotation of cell- and tissue-specific pathways. Finally, we briefly discuss issues involved in making Reactome more fully interoperable with other related resources, such as the Gene Ontology, and in maintaining the resulting network of community resources.
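Discovering functional relationships in a gene list is commonly done with a one-sided over-representation test. The sketch below applies a hypergeometric test to hypothetical gene sets; it illustrates the general technique and is not Reactome's analysis service, its statistics or its API.

```python
from math import comb

def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K in pathway, n in query)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), min(K, n) + 1))
    return hits / comb(N, n)  # exact integer arithmetic until this division

def overrepresentation(query, pathways, universe):
    """Return pathway -> (overlap, pathway size, p-value) for a query gene list."""
    universe = set(universe)
    query = set(query) & universe  # ignore genes outside the background
    N, n = len(universe), len(query)
    results = {}
    for name, members in pathways.items():
        members = set(members) & universe
        k = len(query & members)
        results[name] = (k, len(members), hypergeom_upper_tail(k, N, len(members), n))
    return results

if __name__ == "__main__":
    universe = [f"G{i}" for i in range(1, 21)]  # hypothetical 20-gene background
    pathways = {
        "pathway_X": ["G1", "G2", "G3", "G4", "G5"],
        "pathway_Y": ["G10", "G11", "G12", "G13", "G14", "G15"],
    }
    query = ["G1", "G2", "G3", "G4", "G18"]
    for name, (k, size, p) in overrepresentation(query, pathways, universe).items():
        print(f"{name}: {k}/{size} genes, p={p:.4g}")
```

In practice the p-values across all pathways would also be corrected for multiple testing.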
Subjects
Knowledge Bases, Metabolic Networks and Pathways, Signal Transduction, Humans, Metabolic Networks and Pathways/genetics, Proteome/genetics
ABSTRACT
Cancers are caused by genomic alterations known as drivers. Hundreds of drivers in coding genes are known but, to date, only a handful of noncoding drivers have been discovered, despite intensive searching [1,2]. Attention has recently shifted to the role of altered RNA splicing in cancer; driver mutations that lead to transcriptome-wide aberrant splicing have been identified in multiple types of cancer, although these mutations have only been found in protein-coding splicing factors such as splicing factor 3b subunit 1 (SF3B1) [3-6]. By contrast, cancer-related alterations in the noncoding component of the spliceosome, a series of small nuclear RNAs (snRNAs), have barely been studied, owing to the combined challenges of characterizing noncoding cancer drivers and the repetitive nature of snRNA genes [1,7,8]. Here we report a highly recurrent A>C somatic mutation at the third base of U1 snRNA in several types of tumour. The primary function of U1 snRNA is to recognize the 5' splice site via base-pairing. This mutation changes the preferential A-U base-pairing between U1 snRNA and the 5' splice site to C-G base-pairing, and thus creates novel splice junctions and alters the splicing pattern of multiple genes, including known drivers of cancer. Clinically, the A>C mutation is associated with heavy alcohol use in patients with hepatocellular carcinoma, and with the aggressive subtype of chronic lymphocytic leukaemia with unmutated immunoglobulin heavy-chain variable regions. The mutation in U1 snRNA also independently confers an adverse prognosis to patients with chronic lymphocytic leukaemia. Our study demonstrates a noncoding driver in spliceosomal RNAs, reveals a mechanism of aberrant splicing in cancer and may represent a new target for treatment. Our findings also suggest that driver discovery should be extended to a wider range of genomic regions.
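The base-pairing argument can be made concrete by counting Watson-Crick pairs between the 5' end of U1 and a 9-nt 5' splice site. The sketch assumes the canonical alignment in which U1 nucleotides 3 to 11 (5'-ACUUACCUG-3') pair antiparallel with splice-site positions -3 to +6, so that U1 A3 faces position +6. G-U wobble pairs are ignored for simplicity. This illustrates the mechanism only; it is not the analysis used in the study.

```python
# 5' end of human U1 snRNA, nucleotides 1-11, written 5'->3'.
U1_WT = "AUACUUACCUG"
U1_A3C = U1_WT[:2] + "C" + U1_WT[3:]  # the A>C change at the third base

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def count_pairs(u1_5prime: str, splice_site: str) -> int:
    """Count Watson-Crick pairs between U1 nt 3-11 and a 5' splice site spanning -3..+6.

    The strands pair antiparallel: U1 nt 11 faces position -3 and U1 nt 3
    faces position +6, so the U1 segment is read 3'->5'."""
    if len(splice_site) != 9:
        raise ValueError("expected a 9-nt site covering positions -3..+6")
    u1_segment_3to5 = u1_5prime[2:11][::-1]
    return sum((a, b) in WATSON_CRICK for a, b in zip(u1_segment_3to5, splice_site))

if __name__ == "__main__":
    for site in ("CAGGUAAGU", "CAGGUAAGG"):  # consensus site with U, then G, at +6
        print(site, "WT:", count_pairs(U1_WT, site), "A3C:", count_pairs(U1_A3C, site))
```

With U at +6 the wild-type U1 forms one more pair than the mutant; with G at +6 the mutant does. This mirrors the shift from A-U to C-G pairing described in the abstract.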
Subjects
Mutation, Neoplasms/genetics, RNA, Small Nuclear/genetics, Spliceosomes/genetics, Humans, Neoplasms/pathology, Neoplasms/physiopathology, RNA Splice Sites, RNA Splicing, RNA Splicing Factors/genetics
ABSTRACT
Study of the origin and development of cerebellar tumours has been hampered by the complexity and heterogeneity of cerebellar cells that change over the course of development. Here we use single-cell transcriptomics to study more than 60,000 cells from the developing mouse cerebellum and show that different molecular subgroups of childhood cerebellar tumours mirror the transcription of cells from distinct, temporally restricted cerebellar lineages. The Sonic Hedgehog medulloblastoma subgroup transcriptionally mirrors the granule cell hierarchy as expected, while group 3 medulloblastoma resembles Nestin+ stem cells, group 4 medulloblastoma resembles unipolar brush cells, and PFA/PFB ependymoma and cerebellar pilocytic astrocytoma resemble the prenatal gliogenic progenitor cells. Furthermore, single-cell transcriptomics of human childhood cerebellar tumours demonstrates that many bulk tumours contain a mixed population of cells with divergent differentiation. Our data highlight cerebellar tumours as a disorder of early brain development and provide a proximate explanation for the peak incidence of cerebellar tumours in early childhood.
Subjects
Cerebellar Neoplasms/genetics, Cerebellar Neoplasms/pathology, Evolution, Molecular, Fetus/metabolism, Gene Expression Regulation, Developmental, Gene Expression Regulation, Neoplastic, Transcription, Genetic, Animals, Cerebellar Neoplasms/classification, Cerebellum/cytology, Cerebellum/embryology, Cerebellum/metabolism, Child, Female, Fetus/cytology, Glioma/classification, Glioma/genetics, Glioma/pathology, Humans, Medulloblastoma/classification, Medulloblastoma/genetics, Medulloblastoma/pathology, Mice, Sequence Analysis, RNA, Single-Cell Analysis, Time Factors, Transcriptome/genetics
ABSTRACT
MOTIVATION: JBrowse Jupyter is a package that aims to close the gap between Python programming and genomic visualization. Web-based genome browsers are routinely used for publishing and inspecting genome annotations. Historically they have been deployed at the end of bioinformatics pipelines, typically decoupled from the analysis itself. However, emerging technologies such as Jupyter notebooks enable a more rapid iterative cycle of development, analysis and visualization. RESULTS: We have developed a package that provides a Python interface to JBrowse 2's suite of embeddable components, including the primary Linear Genome View. The package enables users to quickly set up, launch and customize JBrowse views from Jupyter notebooks. In addition, users can share their data via Google's Colab notebooks, providing reproducible interactive views. AVAILABILITY AND IMPLEMENTATION: JBrowse Jupyter is released under the Apache License and is available for download on PyPI. Source code and demos are available on GitHub at https://github.com/GMOD/jbrowse-jupyter.
Subjects
Computational Biology, Genomics, Software, Genome, Web Browser
ABSTRACT
This article presents 14 quick tips to build a team to crowdsource data for public health advocacy. It includes tips around team building and logistics, infrastructure setup, media and industry outreach, and project wrap-up and archival for posterity.
Subjects
Crowdsourcing, Public Health, Semantic Web
ABSTRACT
The Reactome Knowledgebase (https://reactome.org), an Elixir core resource, provides manually curated molecular details across a broad range of physiological and pathological biological processes in humans, including both hereditary and acquired disease processes. The processes are annotated as an ordered network of molecular transformations in a single consistent data model. Reactome thus functions both as a digital archive of manually curated human biological processes and as a tool for discovering functional relationships in data such as gene expression profiles or somatic mutation catalogs from tumor cells. Recent curation work has expanded our annotations of normal and disease-associated signaling processes and of the drugs that target them, in particular infections caused by the SARS-CoV-1 and SARS-CoV-2 coronaviruses and the host response to infection. New tools support better simultaneous analysis of high-throughput data from multiple sources and the placement of understudied ('dark') proteins from analyzed datasets in the context of Reactome's manually curated pathways.
Subjects
Antiviral Agents/pharmacology, Knowledge Bases, Proteins/metabolism, COVID-19/metabolism, Data Curation, Genome, Human, Host-Pathogen Interactions, Humans, Proteins/genetics, Signal Transduction, Software
ABSTRACT
Dockstore (https://dockstore.org/) is an open source platform for publishing, sharing, and finding bioinformatics tools and workflows. The platform has facilitated large-scale biomedical research collaborations by using cloud technologies to increase the Findability, Accessibility, Interoperability and Reusability (FAIR) of computational resources, thereby promoting the reproducibility of complex bioinformatics analyses. Dockstore supports a variety of source repositories, analysis frameworks, and language technologies to provide a seamless publishing platform for authors to create a centralized catalogue of scientific software. The ready-to-use packaging of hundreds of tools and workflows, combined with the implementation of interoperability standards, enables users to launch analyses across multiple environments. Dockstore is widely used: more than twenty-five high-profile organizations share analysis collections through the platform in a variety of workflow languages, including the Broad Institute's GATK best practice and COVID-19 workflows (WDL), nf-core workflows (Nextflow), the Intergalactic Workflow Commission tools (Galaxy), and workflows from Seven Bridges (CWL), to name just a few. Here we describe the improvements made over the last four years, including the expansion of system integrations supporting authors, the addition of collaboration features and analysis platform integrations supporting users, and other enhancements that improve the overall scientific reproducibility of Dockstore content.
Subjects
Computational Biology/methods, Information Dissemination, Internet, Software, Workflow, Cloud Computing, Computational Biology/education, Data Visualization, Humans, National Heart, Lung, and Blood Institute (U.S.), National Human Genome Research Institute (U.S.), Reproducibility of Results, United States
ABSTRACT
Gramene (http://www.gramene.org), a knowledgebase founded on comparative functional analyses of genomic and pathway data for model plants and major crops, supports agricultural researchers worldwide. The resource is committed to open access and reproducible science based on the FAIR data principles. Since the last NAR update, we made nine releases; doubled the genome portal's content; expanded curated genes, pathways and expression sets; and implemented the Domain Informational Vocabulary Extraction (DIVE) algorithm for extracting gene function information from publications. The current release, #63 (October 2020), hosts 93 reference genomes, comprising over 3.9 million genes in 122 947 families with orthologous and paralogous classifications. Plant Reactome portrays pathway networks using a combination of manual biocuration in rice (320 reference pathways) and orthology-based projections to 106 species. The Reactome platform facilitates comparison between reference and projected pathways, gene expression analyses and overlays of gene-gene interactions. Gramene integrates ontology-based protein structure-function annotation; information on genetic, epigenetic, expression, and phenotypic diversity; and gene functional annotations extracted from plant-focused journals using DIVE. We train plant researchers in biocuration of genes and pathways; host curated maize gene structures as tracks in the maize genome browser; and integrate curated rice genes and pathways in the Plant Reactome.
Subjects
Databases, Genetic, Gene Expression Regulation, Plant, Genome, Plant, Genomics/methods, Plant Proteins/genetics, Plants/genetics, Crops, Agricultural, DNA Transposable Elements, Gene Duplication, Gene Ontology, Gene Regulatory Networks, Internet, Knowledge Bases, Metabolic Networks and Pathways, Molecular Sequence Annotation, Oryza/genetics, Oryza/metabolism, Plant Proteins/metabolism, Plants/classification, Plants/metabolism, Polyploidy, Protein Interaction Mapping, Software, Zea mays/genetics, Zea mays/metabolism
ABSTRACT
MOTIVATION: Genome browsers are an essential tool in genome analysis. Modern genome browsers enable complex and interactive visualization of a wide variety of genomic data modalities. While such browsers are very powerful, they can be challenging to configure and program for bioinformaticians lacking expertise in web development. RESULTS: We have developed an R package that provides an interface to the JBrowse 2 genome browser. The package can be used to configure and customize the browser entirely with R code. The browser can be deployed from the R console, or embedded in Shiny applications or R Markdown documents. AVAILABILITY AND IMPLEMENTATION: JBrowseR is available for download from CRAN, and the source code is openly available from the GitHub repository at https://github.com/GMOD/JBrowseR/.
Subjects
Genome, Genomics, Software
ABSTRACT
Pancreatic cancer, a highly aggressive tumour type with uniformly poor prognosis, exemplifies the classically held view of stepwise cancer development. The current model of tumorigenesis, based on analyses of precursor lesions, termed pancreatic intraepithelial neoplasia (PanIN) lesions, makes two predictions: first, that pancreatic cancer develops through a particular sequence of genetic alterations (KRAS, followed by CDKN2A, then TP53 and SMAD4); and second, that the evolutionary trajectory of pancreatic cancer progression is gradual because each alteration is acquired independently. A shortcoming of this model is that clonally expanded precursor lesions do not always belong to the tumour lineage, indicating that the evolutionary trajectory of the tumour lineage and precursor lesions can be divergent. This prevailing model of tumorigenesis has contributed to the clinical notion that pancreatic cancer evolves slowly and presents at a late stage. However, the propensity for this disease to rapidly metastasize and the inability to improve patient outcomes, despite efforts aimed at early detection, suggest that pancreatic cancer progression is not gradual. Here, using newly developed informatics tools, we tracked changes in DNA copy number and their associated rearrangements in tumour-enriched genomes and found that pancreatic cancer tumorigenesis is neither gradual nor follows the accepted mutation order. Two-thirds of tumours harbour complex rearrangement patterns associated with mitotic errors, consistent with punctuated equilibrium as the principal evolutionary trajectory. In a subset of cases, the consequence of such errors is the simultaneous, rather than sequential, knockout of canonical preneoplastic genetic drivers that are likely to set off invasive cancer growth. These findings challenge the current progression model of pancreatic cancer and provide insights into the mutational processes that give rise to these aggressive tumours.
Subjects
Carcinogenesis/genetics, Carcinogenesis/pathology, Gene Rearrangement/genetics, Genome, Human/genetics, Models, Biological, Mutagenesis/genetics, Pancreatic Neoplasms/genetics, Pancreatic Neoplasms/pathology, Carcinoma in Situ/genetics, Chromothripsis, DNA Copy Number Variations/genetics, Disease Progression, Evolution, Molecular, Female, Genes, Neoplasm/genetics, Humans, Male, Mitosis/genetics, Mutation/genetics, Neoplasm Invasiveness/genetics, Neoplasm Invasiveness/pathology, Neoplasm Metastasis/genetics, Neoplasm Metastasis/pathology, Polyploidy, Precancerous Conditions/genetics
ABSTRACT
Plant Reactome (https://plantreactome.gramene.org) is an open-source, comparative plant pathway knowledgebase of the Gramene project. It uses Oryza sativa (rice) as a reference species for manual curation of pathways and extends pathway knowledge to another 82 plant species via gene-orthology projection using the Reactome data model and framework. It currently hosts 298 reference pathways, including metabolic and transport pathways, transcriptional networks, hormone signaling pathways, and plant developmental processes. In addition to browsing plant pathways, users can upload and analyze their omics data, such as the gene-expression data, and overlay curated or experimental gene-gene interaction data to extend pathway knowledge. The curation team actively engages researchers and students on gene and pathway curation by offering workshops and online tutorials. The Plant Reactome supports, implements and collaborates with the wider community to make data and tools related to genes, genomes, and pathways Findable, Accessible, Interoperable and Re-usable (FAIR).
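Gene-orthology projection can be sketched as mapping each curated reference gene through an ortholog table while tracking genes that have no ortholog in the target species. The gene identifiers and ortholog table below are hypothetical. The real projection works on the Reactome data model rather than plain gene sets and is not shown here.

```python
def project_pathway(reference_genes, orthologs):
    """Project a reference pathway's gene list onto a target species.

    `orthologs` maps a reference gene ID to a list of target-species gene IDs;
    one-to-many mappings (for example after a gene duplication) are allowed."""
    projected, unmapped = set(), []
    for gene in reference_genes:
        targets = orthologs.get(gene, [])
        if targets:
            projected.update(targets)
        else:
            unmapped.append(gene)  # no ortholog, so this gene cannot be projected
    return projected, unmapped

if __name__ == "__main__":
    rice_pathway = ["OsGENE1", "OsGENE2", "OsGENE3"]  # hypothetical rice gene IDs
    rice_to_maize = {"OsGENE1": ["ZmGENE1a", "ZmGENE1b"], "OsGENE2": ["ZmGENE2"]}
    genes, missing = project_pathway(rice_pathway, rice_to_maize)
    print(sorted(genes), missing)
```

Reporting the unmapped genes alongside the projection makes gaps in a projected pathway visible instead of silently dropping those steps.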
Subjects
Computational Biology/methods, Databases, Genetic, Genomics, Metabolomics, Plants/genetics, Plants/metabolism, Proteomics, Gene Regulatory Networks, Genomics/methods, Humans, Metabolic Networks and Pathways, Metabolomics/methods, Proteomics/methods, Signal Transduction, Web Browser
ABSTRACT
The Reactome Knowledgebase (https://reactome.org) provides molecular details of signal transduction, transport, DNA replication, metabolism and other cellular processes as an ordered network of molecular transformations in a single consistent data model, an extended version of a classic metabolic map. Reactome functions both as an archive of biological processes and as a tool for discovering functional relationships in data such as gene expression profiles or somatic mutation catalogs from tumor cells. To extend our ability to annotate human disease processes, we have implemented a new drug class and have used it initially to annotate drugs relevant to cardiovascular disease. Our annotation model depends on external domain experts to identify new areas for annotation and to review new content. New web pages facilitate recruitment of community experts and allow those who have contributed to Reactome to identify their contributions and link them to their ORCID records. To improve visualization of our content, we have implemented a new tool to automatically lay out the components of individual reactions with multiple options for downloading the reaction diagrams and associated data, and a new display of our event hierarchy that will facilitate visual interpretation of pathway analysis results.
Subjects
Databases, Chemical, Databases, Pharmaceutical, Knowledge Bases, Software, Genome, Human, Humans, Metabolic Networks and Pathways, Protein Interaction Maps, Signal Transduction
ABSTRACT
WormBase (https://wormbase.org/) is a mature Model Organism Information Resource supporting researchers using the nematode Caenorhabditis elegans as a model system for studies across a broad range of basic biological processes. Toward this mission, WormBase efforts are arranged in three primary facets: curation, user interface and architecture. In this update, we describe progress in each of these three areas. In particular, we discuss the status of literature curation and recently added data, detail new features of the web interface and options for users wishing to conduct data mining workflows, and discuss our efforts to build a robust and scalable architecture by leveraging commercial cloud offerings. We conclude with a description of WormBase's role as a founding member of the nascent Alliance of Genome Resources.
Subjects
Caenorhabditis elegans/genetics, Databases, Genetic, Genes, Helminth, Animals, Data Mining, Genomics, Internet, User-Computer Interface
ABSTRACT
Somatic mutations in protein-coding regions can generate 'neoantigens' that cause developing cancers to be eliminated by the immune system. Quantitative estimates of the strength of this counterselection phenomenon have been lacking. We quantified the extent to which somatic mutations are depleted in peptides that are predicted to be displayed by major histocompatibility complex (MHC) class I proteins. The extent of this depletion depended on the expression level of the neoantigenic gene and on whether the patient had one or two MHC-encoding alleles that can display the peptide, suggesting that MHC-encoding alleles are incompletely dominant. This study provides an initial quantitative understanding of counterselection of identifiable subclasses of neoantigenic somatic variation.
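The abstract does not give the depletion statistic itself. One simple way to express depletion of somatic mutations in presented peptides is an observed-versus-expected odds ratio, computed separately per stratum (for example, expression-level bins) to check whether depletion strengthens with expression. The statistic, strata and counts below are hypothetical.

```python
def depletion_ratio(obs_in, obs_out, exp_in, exp_out):
    """Odds ratio of observed vs expected mutations inside MHC-presented peptides.

    obs_*: observed mutation counts inside/outside presented peptides.
    exp_*: expected counts under a null model (e.g. a mutational background).
    A value below 1 means fewer mutations than expected in presented peptides."""
    if min(obs_out, exp_in, exp_out) <= 0:
        raise ValueError("counts used as denominators must be positive")
    return (obs_in / obs_out) / (exp_in / exp_out)

if __name__ == "__main__":
    # Hypothetical counts, stratified by expression of the mutated gene.
    strata = {
        "low_expression": (95, 905, 100, 900),
        "high_expression": (70, 930, 100, 900),
    }
    for name, counts in strata.items():
        print(name, round(depletion_ratio(*counts), 3))
```

A ratio that falls further below 1 in the high-expression stratum is the pattern the abstract's expression dependence would correspond to under this framing.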
Subjects
Histocompatibility Antigens Class I/metabolism, Mutation, Missense, Peptides/genetics, Alleles, Antigen Presentation, Antigens, Neoplasm/genetics, Humans
ABSTRACT
The binding of PRDM9 to chromatin is a key step in the induction of DNA double-strand breaks associated with meiotic recombination hotspots; PRDM9 is normally expressed only in germ cells. We interrogated 1879 cancer samples in 39 different cancer types and found that PRDM9 is unexpectedly expressed in 20% of these tumors even after stringent gene homology correction. The expression levels of PRDM9 in tumors are significantly higher than those found in healthy neighboring tissues and in healthy nongerm tissue databases. Recurrently mutated regions located within 5 Mb of the PRDM9 loci, as well as differentially expressed genes in meiotic pathways, correlate with PRDM9 expression. In samples with aberrant PRDM9 expression, structural variant breakpoints frequently neighbor the DNA motif recognized by PRDM9, and there is an enrichment of structural variants at sites of known meiotic PRDM9 activity. This study is the first to provide evidence of an association between aberrant expression of the meiosis-specific gene PRDM9 and genomic instability in cancer.
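The breakpoint analysis can be illustrated by scanning a sequence on both strands for the degenerate 13-mer CCNCCNTNNCCNC, the motif commonly associated with the human PRDM9 A allele, and asking whether a breakpoint lies within a given distance of an occurrence. The window size, test sequence and breakpoint coordinates are hypothetical, and this is not the study's pipeline.

```python
import bisect
import re

MOTIF_LEN = 13
# Lookahead so overlapping occurrences are all reported.
MOTIF = re.compile(r"(?=CC[ACGT]CC[ACGT]T[ACGT]{2}CC[ACGT]C)")
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def motif_starts(seq: str):
    """0-based forward-strand start positions of the motif on either strand."""
    seq = seq.upper()
    fwd = [m.start() for m in MOTIF.finditer(seq)]
    rc = seq.translate(COMPLEMENT)[::-1]
    # A hit at position p on the reverse complement covers forward positions
    # len(seq) - p - MOTIF_LEN .. len(seq) - p - 1.
    rev = [len(seq) - m.start() - MOTIF_LEN for m in MOTIF.finditer(rc)]
    return sorted(set(fwd + rev))

def near_motif(breakpoint: int, starts, window: int) -> bool:
    """True if any motif occurrence overlaps [breakpoint - window, breakpoint + window]."""
    lowest_start = breakpoint - window - MOTIF_LEN + 1
    i = bisect.bisect_left(starts, lowest_start)
    return i < len(starts) and starts[i] <= breakpoint + window

if __name__ == "__main__":
    seq = "AAAA" + "CCACCATAACCAC" + "AAAA"  # hypothetical sequence, motif at 4..16
    starts = motif_starts(seq)
    for bp in (0, 20, 100):
        print(bp, near_motif(bp, starts, window=5))
```

Comparing the fraction of breakpoints near a motif with the same fraction for randomly placed positions would turn this check into an enrichment test.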
Subjects
Gene Expression Regulation, Developmental, Histone-Lysine N-Methyltransferase/genetics, Mutation Rate, Neoplasms/genetics, Chromosome Breakpoints, Genomic Instability, Histone-Lysine N-Methyltransferase/metabolism, Humans
ABSTRACT
We describe JBrowse Connect, an optional expansion to the JBrowse genome browser, targeted at developers. JBrowse Connect allows live messaging, notifications for new annotation tracks, heavy-duty analyses initiated by the user from within the browser, and other dynamic features. We present example applications of JBrowse Connect that allow users 1) to specify and execute BLAST searches, either on the same host as the web server, using a self-contained BLAST module that leverages NCBI BLAST+ commands, or via a managed Galaxy instance that can optionally run on a different host, and 2) to run the primer design service Primer3. JBrowse Connect allows users to track job progress and view results in the context of the browser. The software is available under a choice of open source licenses including LGPL and the Artistic License.
Subjects
Databases, Genetic, Genomics/methods, Software, Internet
ABSTRACT
Pancreatic ductal adenocarcinoma (PDAC) has the worst prognosis among solid malignancies, and new therapeutic strategies are needed to improve outcomes. Patient-derived xenografts (PDX) and patient-derived organoids (PDO) serve as promising tools to identify new drugs with therapeutic potential in PDAC. For these preclinical disease models to be effective, they should both recapitulate the molecular heterogeneity of PDAC and validate patient-specific therapeutic sensitivities. To date, however, deep characterization of the molecular heterogeneity of PDAC PDX and PDO models and comparison with matched human tumours remain largely unaddressed at the whole-genome level. We conducted a comprehensive assessment of the genetic landscape of 16 whole-genome pairs of tumours and matched PDX, from primary PDAC and liver metastasis, including a unique cohort of 5 'trios' of matched primary tumour, PDX, and PDO. We developed a pipeline to score concordance between PDAC models and their paired human tumours for genomic events, including mutations, structural variations, and copy number variations. Tumour-model comparisons of mutations displayed single-gene concordance across major PDAC driver genes, but relatively poor agreement across the greater mutational load. Genome-wide and chromosome-centric analysis of structural variation (SV) events highlights previously unrecognized concordance across chromosomes that demonstrate clustered SV events. We found that polyploidy presented a major challenge when assessing copy number changes; however, ploidy-corrected copy number states suggest good agreement between donor-model pairs. Collectively, our investigations highlight that while PDXs and PDOs may serve as tractable and transplantable systems for probing the molecular properties of PDAC, these models may best serve selective analyses across different levels of genomic complexity.
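The abstract does not define its concordance scores. As a hypothetical illustration, mutation concordance can be scored with the Jaccard index of tumour and model variant sets, and copy number can be compared after dividing each segment's copy number by the sample's ploidy. The example shows why a whole-genome-doubled model looks discordant on absolute copy number but concordant after ploidy correction. Segment alignment, the tolerance and all values are assumptions.

```python
def jaccard(a, b):
    """Fraction of shared variants among all variants seen in either sample."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def cn_agreement(tumour_cn, tumour_ploidy, model_cn, model_ploidy, tol=0.25):
    """Fraction of segments whose ploidy-normalised copy numbers differ by <= tol.

    Assumes both lists are already aligned to the same genomic segments."""
    if len(tumour_cn) != len(model_cn) or not tumour_cn:
        raise ValueError("need the same, non-zero number of aligned segments")
    t = [cn / tumour_ploidy for cn in tumour_cn]
    m = [cn / model_ploidy for cn in model_cn]
    return sum(abs(x - y) <= tol for x, y in zip(t, m)) / len(t)

if __name__ == "__main__":
    print("SNV Jaccard:", jaccard({"v1", "v2", "v3"}, {"v1", "v2", "v4"}))
    tumour, pdx = [2, 1, 3, 2], [4, 2, 6, 4]  # diploid tumour, genome-doubled PDX
    print("absolute CN agreement:", cn_agreement(tumour, 1, pdx, 1))
    print("ploidy-corrected CN agreement:", cn_agreement(tumour, 2, pdx, 4))
```

Normalising by ploidy is the simplest correction; it assumes the doubling affected the whole genome uniformly.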
Subjects
Carcinoma, Pancreatic Ductal/genetics, Genome/genetics, Models, Biological, Neoplasms, Experimental/genetics, Pancreatic Neoplasms/genetics, Animals, Biomedical Research/standards, Humans, Pancreas/pathology
ABSTRACT
In acute myeloid leukaemia (AML), the cell of origin, nature and biological consequences of initiating lesions, and order of subsequent mutations remain poorly understood, as AML is typically diagnosed without observation of a pre-leukaemic phase. Here, highly purified haematopoietic stem cells (HSCs), progenitor and mature cell fractions from the blood of AML patients were found to contain recurrent DNMT3A mutations (DNMT3A(mut)) at high allele frequency, but without coincident NPM1 mutations (NPM1c) present in AML blasts. DNMT3A(mut)-bearing HSCs showed a multilineage repopulation advantage over non-mutated HSCs in xenografts, establishing their identity as pre-leukaemic HSCs. Pre-leukaemic HSCs were found in remission samples, indicating that they survive chemotherapy. Therefore DNMT3A(mut) arises early in AML evolution, probably in HSCs, leading to a clonally expanded pool of pre-leukaemic HSCs from which AML evolves. Our findings provide a paradigm for the detection and treatment of pre-leukaemic clones before the acquisition of additional genetic lesions engenders greater therapeutic resistance.