ABSTRACT
BACKGROUND: Characterizing large genomic variants is essential to expanding the research and clinical applications of genome sequencing. While multiple data types and methods are available to detect these structural variants (SVs), they remain less characterized than smaller variants because of SV diversity, complexity, and size. These challenges are exacerbated by the experimental and computational demands of SV analysis. Here, we characterize the SV content of a personal genome with Parliament, a publicly available consensus SV-calling infrastructure that merges multiple data types and SV detection methods. RESULTS: We demonstrate Parliament's efficacy via integrated analyses of whole-genome array comparative genomic hybridization, short-read next-generation sequencing, long-read (Pacific Biosciences RSII), long-insert (Illumina Nextera), and whole-genome architecture (BioNano Irys) data from the personal genome of a single subject (HS1011). From this genome, Parliament identified 31,007 genomic loci between 100 bp and 1 Mbp that are inconsistent with the hg19 reference assembly. Of these loci, 9,777 are supported as putative SVs by hybrid local assembly, long-read PacBio data, or multi-source heuristics. These SVs span 59 Mbp of the reference genome (1.8%) and include 3,801 events identified only with long-read data. The HS1011 data and complete Parliament infrastructure, including a BAM-to-SV workflow, are available on the cloud-based service DNAnexus. CONCLUSIONS: HS1011 SV analysis reveals the limits and advantages of multiple sequencing technologies, specifically the impact of long-read SV discovery. With the full Parliament infrastructure, the HS1011 data constitute a public resource for novel SV discovery, software calibration, and personal genome structural variation analysis.
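The abstract does not describe how Parliament merges calls, so the following is only a minimal sketch of one common way to build consensus SV calls: clustering calls from different sources by reciprocal overlap and keeping clusters supported by more than one source. The `SV` record, the source labels, and the 50% overlap threshold are all hypothetical choices for illustration, not Parliament's actual method.

```python
from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    start: int
    end: int      # assumed to be greater than start
    svtype: str
    source: str   # hypothetical label for the caller or data type


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Smaller of the two overlap fractions; 0.0 if the calls are incompatible."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def merge_calls(calls, min_ro=0.5):
    """Greedily cluster calls whose reciprocal overlap with a cluster's first call is >= min_ro."""
    clusters = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start, c.end)):
        for cluster in clusters:
            if reciprocal_overlap(cluster[0], call) >= min_ro:
                cluster.append(call)
                break
        else:
            clusters.append([call])
    return clusters


def multi_source(clusters, min_sources=2):
    """Keep clusters supported by at least min_sources distinct sources."""
    return [c for c in clusters if len({sv.source for sv in c}) >= min_sources]


calls = [
    SV("chr1", 1000, 2000, "DEL", "short_read"),
    SV("chr1", 1100, 2050, "DEL", "long_read"),
    SV("chr1", 5000, 5200, "DEL", "array"),
]
clusters = merge_calls(calls)
consensus = multi_source(clusters)
```

Interval overlap of this kind suits deletions and duplications; insertions occupy little or no reference span and would need a breakpoint-distance rule instead.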
Subject(s)
Human Genome, Genomic Structural Variation, DNA Sequence Analysis/methods, Computational Biology, Genetic Databases, Diploidy, Humans, Software
ABSTRACT
The UCSC Cancer Genomics Browser (https://genome-cancer.ucsc.edu/) is a set of web-based tools to display, investigate and analyse cancer genomics data and its associated clinical information. The browser provides whole-genome to base-pair level views of several different types of genomics data, including some next-generation sequencing platforms. The ability to view multiple datasets together allows users to make comparisons across different data and cancer types. Biological pathways, collections of genes, genomic or clinical information can be used to sort, aggregate and zoom into a group of samples. We currently display an expanding set of data from various sources, including 201 datasets from 22 TCGA (The Cancer Genome Atlas) cancers as well as data from Cancer Cell Line Encyclopedia and Stand Up To Cancer. New features include a completely redesigned user interface with an interactive tutorial and updated documentation. We have also added data downloads, additional clinical heatmap features, and an updated Tumor Image Browser based on Google Maps. New security features allow authenticated users access to private datasets hosted by several different consortia through the public website.
Subject(s)
Genetic Databases, Genomics, Neoplasms/genetics, Tumor Cell Line, Humans, Internet
ABSTRACT
Effective data sharing is key to accelerating research to improve diagnostic precision, treatment efficacy, and long-term survival in pediatric cancer and other childhood catastrophic diseases. We present St. Jude Cloud (https://www.stjude.cloud), a cloud-based data-sharing ecosystem for accessing, analyzing, and visualizing genomic data from >10,000 pediatric patients with cancer and long-term survivors, and >800 pediatric sickle cell patients. Harmonized genomic data totaling 1.25 petabytes are freely available, including 12,104 whole genomes, 7,697 whole exomes, and 2,202 transcriptomes. The resource is expanding rapidly, with regular data uploads from St. Jude's prospective clinical genomics programs. Three interconnected apps within the ecosystem (Genomics Platform, Pediatric Cancer Knowledgebase, and Visualization Community) enable users to perform advanced data analysis in the cloud while enhancing the Pediatric Cancer Knowledgebase. We demonstrate the value of the ecosystem through use cases that classify 135 pediatric cancer subtypes by gene expression profiling and map mutational signatures across 35 pediatric cancer subtypes. SIGNIFICANCE: To advance research and treatment of pediatric cancer, we developed St. Jude Cloud, a data-sharing ecosystem for accessing >1.2 petabytes of raw genomic data from >10,000 pediatric patients and survivors, innovative analysis workflows, integrative multiomics visualizations, and a knowledgebase of published data contributed by the global pediatric cancer community. This article is highlighted in the In This Issue feature, p. 995.
Subject(s)
Sickle Cell Anemia/genetics, Cloud Computing, Genomics, Information Dissemination, Neoplasms/genetics, Child, Ecosystem, Pediatric Hospitals, Humans
ABSTRACT
The Cancer Genome Atlas (TCGA) cancer genomics dataset includes over 10,000 tumor-normal exome pairs across 33 different cancer types, in total >400 TB of raw data files requiring analysis. Here we describe the Multi-Center Mutation Calling in Multiple Cancers project, our effort to generate a comprehensive encyclopedia of somatic mutation calls for the TCGA data to enable robust cross-tumor-type analyses. Our approach accounts for variance and batch effects introduced by the rapid advancement of DNA extraction, hybridization-capture, sequencing, and analysis methods over time. We present best practices for applying an ensemble of seven mutation-calling algorithms with scoring and artifact filtering. The dataset created by this analysis includes 3.5 million somatic variants and forms the basis for PanCan Atlas papers. The results have been made available to the research community along with the methods used to generate them. This project is the result of collaboration from a number of institutes and demonstrates how team science drives extremely large genomics projects.
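The abstract mentions an ensemble of seven mutation callers with scoring but gives no details of how calls are combined. As a rough illustration only, an ensemble can be reduced to counting how many callers support each variant and keeping those that reach a threshold. The caller names, the variant tuple layout, and the two-caller threshold below are hypothetical and are not taken from the project.

```python
from collections import defaultdict


def ensemble_calls(calls_by_caller, min_callers=2):
    """Return {variant: sorted list of supporting callers} for variants
    reported by at least min_callers distinct callers."""
    support = defaultdict(set)
    for caller, variants in calls_by_caller.items():
        for variant in variants:
            support[variant].add(caller)
    return {
        variant: sorted(callers)
        for variant, callers in support.items()
        if len(callers) >= min_callers
    }


# Hypothetical input: variants keyed as (chromosome, position, ref, alt).
calls_by_caller = {
    "caller_a": {("chr1", 12345, "A", "T"), ("chr2", 500, "G", "C")},
    "caller_b": {("chr1", 12345, "A", "T")},
    "caller_c": {("chr1", 12345, "A", "T"), ("chr3", 42, "C", "G")},
}
kept = ensemble_calls(calls_by_caller)
```

A simple support count is the least elaborate form of scoring; weighting callers or applying artifact filters would be layered on top of it.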
Subject(s)
Genomics/methods, Neoplasms/genetics, DNA Sequence Analysis/methods, Algorithms, Exome, High-Throughput Nucleotide Sequencing/methods, Humans, Information Dissemination/methods, Mutation, Software, Exome Sequencing/methods
ABSTRACT
The detection of somatic single nucleotide variants is a crucial component of the characterization of the cancer genome. Mutation calling algorithms thus far have focused on comparing the normal and tumor genomes from the same individual. In recent years, it has become routine for projects like The Cancer Genome Atlas (TCGA) to also sequence the tumor RNA. Here we present RADIA (RNA and DNA Integrated Analysis), a novel computational method combining the patient-matched normal and tumor DNA with the tumor RNA to detect somatic mutations. The inclusion of the RNA increases the power to detect somatic mutations, especially at low DNA allelic frequencies. By integrating an individual's DNA and RNA, we are able to detect mutations that would otherwise be missed by traditional algorithms that examine only the DNA. We demonstrate high sensitivity (84%) and very high precision (98% and 99%) for RADIA in patient data from endometrial carcinoma and lung adenocarcinoma from TCGA. Mutations with both high DNA and RNA read support have the highest validation rate of over 99%. We also introduce a simulation package that spikes in artificial mutations to patient data, rather than simulating sequencing data from a reference genome. We evaluate sensitivity on the simulation data and demonstrate our ability to rescue mutations at low DNA allelic frequencies by including the RNA. Finally, we highlight mutations in important cancer genes that were rescued due to the incorporation of the RNA.
Subject(s)
Adenocarcinoma/genetics, Endometrial Neoplasms/genetics, Lung Neoplasms/genetics, Mutation, DNA Sequence Analysis/methods, RNA Sequence Analysis/methods, Software, Female, Human Genome, Humans, Single Nucleotide Polymorphism, Sensitivity and Specificity
ABSTRACT
The UCSC Cancer Genomics Browser (https://genome-cancer.ucsc.edu) offers interactive visualization and exploration of TCGA genomic, phenotypic, and clinical data, as produced by the Cancer Genome Atlas Research Network. Researchers can explore the impact of genomic alterations on phenotypes by visualizing gene and protein expression, copy number, DNA methylation, somatic mutation and pathway inference data alongside clinical features, Pan-Cancer subtype classifications and genomic biomarkers. Integrated Kaplan-Meier survival analysis helps investigators to assess survival stratification by any of these data types.
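For readers unfamiliar with the Kaplan-Meier analysis mentioned above, here is a minimal sketch of the standard product-limit estimator in plain Python. It is not the browser's implementation; the function name and the toy data are assumptions made for illustration.

```python
def kaplan_meier(durations, events):
    """Product-limit estimate of survival.

    durations: follow-up time for each subject
    events:    1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        n_at_risk -= removed
    return curve


# Toy cohort: the subject with follow-up time 2 is censored.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Stratification then amounts to running the estimator separately on each group of samples (for example, split by a mutation or expression threshold) and comparing the resulting curves.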
Subject(s)
Computational Biology/methods, Genetic Databases, Genomics/methods, Neoplasms/genetics, Web Browser, Animals, Humans, Neoplasms/metabolism
ABSTRACT
A significant proportion of enzymes display cooperativity in binding ligand molecules, and such effects have an important impact on metabolic regulation. This is easiest to understand in the case of positive cooperativity. Sharp responses to changes in metabolite concentrations can allow organisms to better respond to environmental changes and maintain metabolic homeostasis. However, despite the fact that negative cooperativity is almost as common as positive, it has been harder to imagine what advantages it provides. Here we use computational models to explore the utility of negative cooperativity in one particular context: that of an inhibitor binding to an enzyme. We identify several factors which may contribute, and show that acting together they can make negative cooperativity advantageous.
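The abstract does not say which models the authors used. As an illustrative assumption, cooperativity is often summarized with the Hill equation, in which a Hill coefficient n > 1 represents positive cooperativity and n < 1 negative cooperativity. The sketch below shows how n changes the concentration range over which occupancy rises from 10% to 90%; the function and parameter names are hypothetical.

```python
def hill_occupancy(conc, k_half, n):
    """Fractional occupancy under the Hill equation: c^n / (K^n + c^n)."""
    return conc**n / (k_half**n + conc**n)


def response_range(n):
    """Ratio of the concentrations giving 90% and 10% occupancy.

    Solving the Hill equation for c gives c = K * (theta / (1 - theta))**(1/n),
    so c90 / c10 = (9 / (1/9))**(1/n) = 81**(1/n).
    """
    return 81 ** (1 / n)


# Non-cooperative, positively cooperative, and negatively cooperative binding.
ranges = {n: response_range(n) for n in (1.0, 2.0, 0.5)}
```

Under this description, positive cooperativity (n = 2) compresses the 10% to 90% transition into a 9-fold concentration range, whereas negative cooperativity (n = 0.5) stretches it over a 6,561-fold range, so an inhibitor would act over a much wider span of concentrations.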