ABSTRACT
Mixed phenotype acute leukaemia (MPAL) is a high-risk subtype of leukaemia with myeloid and lymphoid features, limited genetic characterization, and a lack of consensus regarding appropriate therapy. Here we show that the two principal subtypes of MPAL, T/myeloid (T/M) and B/myeloid (B/M), are genetically distinct. Rearrangement of ZNF384 is common in B/M MPAL, and biallelic WT1 alterations are common in T/M MPAL, which shares genomic features with early T-cell precursor acute lymphoblastic leukaemia. We show that the intratumoral immunophenotypic heterogeneity characteristic of MPAL is independent of somatic genetic variation, that founding lesions arise in primitive haematopoietic progenitors, and that individual phenotypic subpopulations can reconstitute the immunophenotypic diversity in vivo. These findings indicate that the cell of origin and founding lesions, rather than an accumulation of distinct genomic alterations, prime tumour cells for lineage promiscuity. Moreover, these findings position MPAL in the spectrum of immature leukaemias and provide a genetically informed framework for future clinical trials of potential treatments for MPAL.
Subject(s)
Leukemia, Biphenotypic, Acute/genetics , Leukemia, Biphenotypic, Acute/pathology , Cell Lineage/genetics , DNA Mutational Analysis , Female , Genetic Variation/genetics , Genome, Human/genetics , Genomics , Humans , Immunophenotyping , Leukemia, Biphenotypic, Acute/classification , Male , Models, Genetic , Mutation/genetics , Neoplastic Stem Cells/immunology , Neoplastic Stem Cells/metabolism , Neoplastic Stem Cells/pathology , Phenotype , Trans-Activators/genetics
ABSTRACT
Although generally curable with intensive chemotherapy in resource-rich settings, Burkitt lymphoma (BL) remains a deadly disease in older patients and in sub-Saharan Africa. Epstein-Barr virus (EBV) positivity is a feature in more than 90% of cases in malaria-endemic regions, and up to 30% elsewhere. However, the molecular features of BL have not been comprehensively evaluated when taking into account tumor EBV status or geographic origin. Through an integrative analysis of whole-genome and transcriptome data, we show a striking genome-wide increase in aberrant somatic hypermutation in EBV-positive tumors, supporting a link between EBV and activation-induced cytidine deaminase (AICDA) activity. In addition to identifying novel candidate BL genes such as SIN3A, USP7, and CHD8, we demonstrate that EBV-positive tumors had significantly fewer driver mutations, especially among genes with roles in apoptosis. We also found immunoglobulin variable region genes that were disproportionally used to encode clonal B-cell receptors (BCRs) in the tumors. These include IGHV4-34, known to produce autoreactive antibodies, and IGKV3-20, a feature described in other B-cell malignancies but not yet in BL. Our results suggest that tumor EBV status defines a specific BL phenotype irrespective of geographic origin, with particular molecular properties and distinct pathogenic mechanisms. The novel mutation patterns identified here imply rational use of DNA-damaging chemotherapy in some patients with BL and targeted agents such as the CDK4/6 inhibitor palbociclib in others, whereas the importance of BCR signaling in BL strengthens the potential benefit of inhibitors for PI3K, Syk, and Src family kinases among these patients.
Subject(s)
Biomarkers, Tumor/genetics , Burkitt Lymphoma/genetics , Epstein-Barr Virus Infections/complications , Genes, Immunoglobulin , Genome, Human , Mutation , Transcriptome , Adolescent , Adult , Burkitt Lymphoma/pathology , Burkitt Lymphoma/virology , Child , Child, Preschool , Cohort Studies , Cytidine Deaminase/genetics , Epstein-Barr Virus Infections/genetics , Epstein-Barr Virus Infections/virology , Female , Follow-Up Studies , Herpesvirus 4, Human/isolation & purification , Humans , Infant , Infant, Newborn , Male , Phenotype , Prognosis , Young Adult
ABSTRACT
The NCI's Cloud Resources (CR) are the analytical components of the Cancer Research Data Commons (CRDC) ecosystem. This review describes how the three CRs (Broad Institute FireCloud, Institute for Systems Biology Cancer Gateway in the Cloud, and Seven Bridges Cancer Genomics Cloud) provide access and availability to large, cloud-hosted, multimodal cancer datasets, as well as offer tools and workspaces for performing data analysis where the data resides, without download or storage. In addition, users can upload their own data and tools into their workspaces, allowing researchers to create custom analysis workflows and integrate CRDC-hosted data with their own. See related articles by Brady et al., p. 1384, Wang et al., p. 1388, and Kim et al., p. 1404.
Subject(s)
Cloud Computing , National Cancer Institute (U.S.) , Neoplasms , Humans , Neoplasms/genetics , United States , Biomedical Research , Genomics/methods , Computational Biology/methods
ABSTRACT
More than ever, scientific progress in cancer research hinges on our ability to combine datasets and extract meaningful interpretations to better understand diseases and ultimately inform the development of better treatments and diagnostic tools. To enable the successful sharing and use of big data, the NCI developed the Cancer Research Data Commons (CRDC), providing access to a large, comprehensive, and expanding collection of cancer data. The CRDC is a cloud-based data science infrastructure that eliminates the need for researchers to download and store large-scale datasets by allowing them to perform analysis where data reside. Over the past 10 years, the CRDC has made significant progress in providing access to data and tools along with training and outreach to support the cancer research community. In this review, we provide an overview of the history and the impact of the CRDC to date, lessons learned, and future plans to further promote data sharing, accessibility, interoperability, and reuse. See related articles by Brady et al., p. 1384, Wang et al., p. 1388, and Pot et al., p. 1396.
Subject(s)
Information Dissemination , National Cancer Institute (U.S.) , Neoplasms , Humans , United States , Neoplasms/therapy , Information Dissemination/methods , Biomedical Research/trends , Databases, Factual , Big Data
ABSTRACT
Since 2014, the NCI has launched a series of data commons as part of the Cancer Research Data Commons (CRDC) ecosystem housing genomic, proteomic, imaging, and clinical data to support cancer research and promote data sharing of NCI-funded studies. This review describes each data commons (Genomic Data Commons, Proteomic Data Commons, Integrated Canine Data Commons, Cancer Data Service, Imaging Data Commons, and Clinical and Translational Data Commons), including their unique and shared features, accomplishments, and challenges. Also discussed is how the CRDC data commons implement Findable, Accessible, Interoperable, Reusable (FAIR) principles and promote data sharing in support of the new NIH Data Management and Sharing Policy. See related articles by Brady et al., p. 1384, Pot et al., p. 1396, and Kim et al., p. 1404.
Subject(s)
Information Dissemination , National Cancer Institute (U.S.) , Neoplasms , Humans , United States , Neoplasms/metabolism , Information Dissemination/methods , Biomedical Research , Genomics/methods , Animals , Proteomics/methods
ABSTRACT
The Comprehensive Microbial Resource or CMR (http://cmr.jcvi.org) provides a web-based central resource for the display, search and analysis of the sequence and annotation for complete and publicly available bacterial and archaeal genomes. In addition to displaying the original annotation from GenBank, the CMR makes available secondary automated structural and functional annotation across all genomes to provide consistent data types necessary for effective mining of genomic data. Precomputed homology searches are stored to allow meaningful genome comparisons. The CMR supplies users with over 50 different tools to utilize the sequence and annotation data across one or more of the 571 currently available genomes. At the gene level users can view the gene annotation and underlying evidence. Genome level information includes whole genome graphical displays, biochemical pathway maps and genome summary data. Comparative tools display analysis between genomes with homology and genome alignment tools, and searches across the accessions, annotation, and evidence assigned to all genes/genomes are available. The data and tools on the CMR aid genomic research and analysis, and the CMR is included in over 200 scientific publications. The code underlying the CMR website and the CMR database are freely available for download with no license restrictions.
Subject(s)
Bacteria/genetics , Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Databases, Protein , Genes, Bacterial , Computational Biology/trends , Genome, Bacterial , Information Storage and Retrieval/methods , Internet , Protein Structure, Tertiary , Software
ABSTRACT
Pathema (http://pathema.jcvi.org) is one of the eight Bioinformatics Resource Centers (BRCs) funded by the National Institute of Allergy and Infectious Diseases (NIAID) designed to serve as a core resource for the bio-defense and infectious disease research community. Pathema strives to support basic research and accelerate scientific progress for understanding, detecting, diagnosing and treating an established set of six target NIAID Category A-C pathogens: the Category A priority pathogens Bacillus anthracis and Clostridium botulinum, and the Category B priority pathogens Burkholderia mallei, Burkholderia pseudomallei, Clostridium perfringens and Entamoeba histolytica. Each target pathogen is represented in one of four distinct clade-specific Pathema web resources and underlying databases developed to target the specific data and analysis needs of each scientific community. All publicly available complete genome projects of phylogenetically related organisms are also represented, providing a comprehensive collection of organisms for comparative analyses. Pathema facilitates the scientific exploration of genomic and related data through its integration with web-based analysis tools, customized to obtain, display, and compute results relevant to ongoing pathogen research. Pathema serves the bio-defense and infectious disease research community by disseminating data resulting from pathogen genome sequencing projects and providing access to the results of inter-genomic comparisons for these organisms.
Subject(s)
Bacterial Infections/microbiology , Communicable Diseases/microbiology , Computational Biology/methods , Databases, Genetic , Amino Acid Sequence , Animals , Bacterial Infections/diagnosis , Computational Biology/trends , Genome, Bacterial , Humans , Information Storage and Retrieval/methods , Internet , Molecular Sequence Data , National Institute of Allergy and Infectious Diseases (U.S.) , Sequence Homology, Amino Acid , Software , United States
ABSTRACT
The complete genomes of three strains from the phylum Acidobacteria were compared. Phylogenetic analysis placed them as a unique phylum. They share genomic traits with members of the Proteobacteria, the Cyanobacteria, and the Fungi. The three strains appear to be versatile heterotrophs. Genomic and culture traits indicate the use of carbon sources that span simple sugars to more complex substrates such as hemicellulose, cellulose, and chitin. The genomes encode low-specificity major facilitator superfamily transporters and high-affinity ABC transporters for sugars, suggesting that they are best suited to low-nutrient conditions. They appear capable of nitrate and nitrite reduction but not N₂ fixation or denitrification. The genomes contained numerous genes that encode siderophore receptors, but no evidence of siderophore production was found, suggesting that they may obtain iron via interaction with other microorganisms. The presence of cellulose synthesis genes and a large class of novel high-molecular-weight excreted proteins suggests potential traits for desiccation resistance, biofilm formation, and/or contribution to soil structure. Polyketide synthase and macrolide glycosylation genes suggest the production of novel antimicrobial compounds. Genes that encode a variety of novel proteins were also identified. The abundance of acidobacteria in soils worldwide and the breadth of potential carbon use by the sequenced strains suggest significant and previously unrecognized contributions to the terrestrial carbon cycle. Combining our genomic evidence with available culture traits, we postulate that cells of these isolates are long-lived, divide slowly, exhibit slow metabolic rates under low-nutrient conditions, and are well equipped to tolerate fluctuations in soil hydration.
Subject(s)
Bacteria/genetics , Bacteria/isolation & purification , DNA, Bacterial/genetics , Genome, Bacterial , Soil Microbiology , Anti-Bacterial Agents/biosynthesis , Biological Transport , Carbohydrate Metabolism , Cyanobacteria/genetics , DNA, Bacterial/chemistry , Fungi/genetics , Macrolides/metabolism , Molecular Sequence Data , Nitrogen/metabolism , Phylogeny , Proteobacteria/genetics , Sequence Analysis, DNA , Sequence Homology
ABSTRACT
TIGRFAMs is a collection of protein family definitions built to aid in high-throughput annotation of specific protein functions. Each family is based on a hidden Markov model (HMM), where both cutoff scores and membership in the seed alignment are chosen so that the HMMs can classify numerous proteins according to their specific molecular functions. Most TIGRFAMs models describe 'equivalog' families, where both orthology and lateral gene transfer may be part of the evolutionary history, but where a single molecular function has been conserved. The Genome Properties system contains a queriable set of metabolic reconstructions, genome metrics and extractions of information from the scientific literature. Its genome-by-genome assertions of whether or not specific structures, pathways or systems are present provide high-level conceptual descriptions of genomic content. These assertions enable comparative genomics, provide a meaningful biological context to aid in manual annotation, support assignments of Gene Ontology (GO) biological process terms and help validate HMM-based predictions of protein function. The Genome Properties system is particularly useful as a generator of phylogenetic profiles, through which new protein family functions may be discovered. The TIGRFAMs and Genome Properties systems can be accessed at http://www.tigr.org/TIGRFAMs and http://www.tigr.org/Genome_Properties.
Subject(s)
Archaeal Proteins/physiology , Bacterial Proteins/physiology , Databases, Protein , Archaeal Proteins/classification , Archaeal Proteins/genetics , Bacterial Proteins/classification , Bacterial Proteins/genetics , Genome, Bacterial , Genomics , Internet , Phylogeny , Software , User-Computer Interface
ABSTRACT
Anaplasma (formerly Ehrlichia) phagocytophilum, Ehrlichia chaffeensis, and Neorickettsia (formerly Ehrlichia) sennetsu are intracellular vector-borne pathogens that cause human ehrlichiosis, an emerging infectious disease. We present the complete genome sequences of these organisms along with comparisons to other organisms in the Rickettsiales order. Ehrlichia spp. and Anaplasma spp. display a unique large expansion of immunodominant outer membrane proteins facilitating antigenic variation. All Rickettsiales have a diminished ability to synthesize amino acids compared to their closest free-living relatives. Unlike members of the Rickettsiaceae family, these pathogenic Anaplasmataceae are capable of making all major vitamins, cofactors, and nucleotides, which could confer a beneficial role in the invertebrate vector or the vertebrate host. Further analysis identified proteins potentially involved in vacuole confinement of the Anaplasmataceae, a life cycle involving a hematophagous vector, vertebrate pathogenesis, human pathogenesis, and lack of transovarial transmission. These discoveries provide significant insights into the biology of these obligate intracellular pathogens.
Subject(s)
Ehrlichia/genetics , Ehrlichiosis/genetics , Genomics/methods , Animals , Biotin/metabolism , DNA Repair , Ehrlichiosis/microbiology , Genome , Humans , Models, Biological , Phylogeny , Rickettsia/genetics , Ticks
ABSTRACT
In the version of this article originally published, the color key in Fig. 1a was wrong. In the Cytogenetics key, the box over t(8;21) originally was green. It should have been red, matching the color of the sections of the pie graphs below the key that were labeled with 15% and 19%.
ABSTRACT
This corrects the article DOI: 10.1038/nm.4439.
ABSTRACT
We present the molecular landscape of pediatric acute myeloid leukemia (AML) and characterize nearly 1,000 participants in Children's Oncology Group (COG) AML trials. The COG-National Cancer Institute (NCI) TARGET AML initiative assessed cases by whole-genome, targeted DNA, mRNA and microRNA sequencing and CpG methylation profiling. Validated DNA variants corresponded to diverse, infrequent mutations, with fewer than 40 genes mutated in >2% of cases. In contrast, somatic structural variants, including new gene fusions and focal deletions of MBNL1, ZEB2 and ELF1, were disproportionately prevalent in young individuals as compared to adults. Conversely, mutations in DNMT3A and TP53, which were common in adults, were conspicuously absent from virtually all pediatric cases. New mutations in GATA2, FLT3 and CBL and recurrent mutations in MYC-ITD, NRAS, KRAS and WT1 were frequent in pediatric AML. Deletions, mutations and promoter DNA hypermethylation convergently impacted Wnt signaling, Polycomb repression, innate immune cell interactions and a cluster of zinc finger-encoding genes associated with KMT2A rearrangements. These results highlight the need for and facilitate the development of age-tailored targeted therapies for the treatment of pediatric AML.
Subject(s)
Leukemia, Myeloid, Acute/genetics , Mutation , Child , Chromosome Aberrations , DNA Methylation , Humans , Transcriptome
ABSTRACT
Desulfovibrio vulgaris Hildenborough is a model organism for studying the energy metabolism of sulfate-reducing bacteria (SRB) and for understanding the economic impacts of SRB, including biocorrosion of metal infrastructure and bioremediation of toxic metal ions. The 3,570,858 base pair (bp) genome sequence reveals a network of novel c-type cytochromes, connecting multiple periplasmic hydrogenases and formate dehydrogenases, as a key feature of its energy metabolism. The relative arrangement of genes encoding enzymes for energy transduction, together with inferred cellular location of the enzymes, provides a basis for proposing an expansion to the 'hydrogen-cycling' model for increasing energy efficiency in this bacterium. Plasmid-encoded functions include modification of cell surface components, nitrogen fixation and a type-III protein secretion system. This genome sequence represents a substantial step toward the elucidation of pathways for reduction (and bioremediation) of pollutants such as uranium and chromium and offers a new starting point for defining this organism's complex anaerobic respiration.
Subject(s)
Desulfovibrio vulgaris/genetics , Genome, Bacterial , Desulfovibrio vulgaris/metabolism , Energy Metabolism , Molecular Sequence Data
ABSTRACT
Advancements in next-generation sequencing and other -omics technologies are accelerating the detailed molecular characterization of individual patient tumors, and driving the evolution of precision medicine. Cancer is no longer considered a single disease, but rather, a diverse array of diseases wherein each patient has a unique collection of germline variants and somatic mutations. Molecular profiling of patient-derived samples has led to a data explosion that could help us understand the contributions of environment and germline to risk, therapeutic response, and outcome. To maximize the value of these data, an interdisciplinary approach is paramount. The National Cancer Institute (NCI) has initiated multiple projects to characterize tumor samples using multi-omic approaches. These projects harness the expertise of clinicians, biologists, computer scientists, and software engineers to investigate cancer biology and therapeutic response in multidisciplinary teams. Petabytes of cancer genomic, transcriptomic, epigenomic, proteomic, and imaging data have been generated by these projects. To address the data analysis challenges associated with these large datasets, the NCI has sponsored the development of the Genomic Data Commons (GDC) and three Cloud Resources. The GDC ensures data and metadata quality, ingests and harmonizes genomic data, and securely redistributes the data. During its pilot phase, the Cloud Resources tested multiple cloud-based approaches for enhancing data access, collaboration, computational scalability, resource democratization, and reproducibility. These NCI-led efforts are continuously being refined to better support open data practices and precision oncology, and to serve as building blocks of the NCI Cancer Research Data Commons.
ABSTRACT
This corrects the article on p. 83 in vol. 5, PMID: 28983483.
ABSTRACT
We performed genome-wide sequencing and analyzed mRNA and miRNA expression, DNA copy number, and DNA methylation in 117 Wilms tumors, followed by targeted sequencing of 651 Wilms tumors. In addition to genes previously implicated in Wilms tumors (WT1, CTNNB1, AMER1, DROSHA, DGCR8, XPO5, DICER1, SIX1, SIX2, MLLT1, MYCN, and TP53), we identified mutations in genes not previously recognized as recurrently involved in Wilms tumors, the most frequent being BCOR, BCORL1, NONO, MAX, COL6A3, ASXL1, MAP3K4, and ARID1A. DNA copy number changes resulted in recurrent 1q gain, MYCN amplification, LIN28B gain, and MIRLET7A loss. Unexpected germline variants involved PALB2 and CHEK2. Integrated analyses support two major classes of genetic changes that preserve the progenitor state and/or interrupt normal development.
Subject(s)
Genes, Neoplasm , Kidney Neoplasms/genetics , Wilms Tumor/genetics , Aneuploidy , DNA Methylation , Epigenesis, Genetic , Gene Dosage , Gene Expression Regulation, Neoplastic , Genome-Wide Association Study , Germ-Line Mutation , Humans , MicroRNAs/biosynthesis , MicroRNAs/genetics , Protein Conformation , RNA, Messenger/biosynthesis , RNA, Messenger/genetics , RNA, Neoplasm/biosynthesis , RNA, Neoplasm/genetics
ABSTRACT
The genomic and clinical information used to develop and implement therapeutic approaches for acute myelogenous leukemia (AML) originated primarily from adult patients and has been generalized to patients with pediatric AML. However, age-specific molecular alterations are becoming more evident and may signify the need to age-stratify treatment regimens. The NCI/COG TARGET-AML initiative used whole exome capture sequencing (WXS) to interrogate the genomic landscape of matched trios representing specimens collected upon diagnosis, remission, and relapse from 20 cases of de novo childhood AML. One hundred forty-five somatic variants at diagnosis (median 6 mutations/patient) and 149 variants at relapse (median 6.5 mutations) were identified and verified by orthogonal methodologies. Recurrent somatic variants (in ≥2 patients) were identified for 10 genes (FLT3, NRAS, PTPN11, WT1, TET2, DHX15, DHX30, KIT, ETV6, KRAS), with variable persistence at relapse. The variant allele fraction (VAF), used to measure the prevalence of somatic mutations, varied widely at diagnosis. Mutations that persisted from diagnosis to relapse had a significantly higher diagnostic VAF compared with those that resolved at relapse (median VAF 0.43 vs. 0.24, P < 0.001). Further analysis revealed that 90% of the diagnostic variants with VAF >0.4 persisted to relapse compared with 28% with VAF <0.2 (P < 0.001). This study demonstrates significant variability in the mutational profile and clonal evolution of pediatric AML from diagnosis to relapse. Furthermore, mutations with high VAF at diagnosis, representing variants shared across a leukemic clonal structure, may constrain the genomic landscape at relapse and help to define key pathways for therapeutic targeting. Cancer Res; 76(8); 2197-205. ©2016 AACR.
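For readers unfamiliar with the metric, the variant allele fraction discussed above is conventionally the share of sequencing reads at a site that carry the variant allele. The following is a minimal illustrative sketch, not the study's pipeline: the function names, the read counts in the usage note, and the "high/intermediate/low" labels are hypothetical, and only the 0.4 and 0.2 cut-offs are taken from the abstract.

```python
def variant_allele_fraction(alt_reads: int, total_reads: int) -> float:
    """Return VAF = variant-supporting reads / total reads at the site.

    Conventional definition; the abstract does not describe how the
    study itself computed VAF.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def vaf_group(vaf: float) -> str:
    """Bin a diagnostic VAF using the abstract's cut-offs (>0.4 and <0.2).

    The group labels are hypothetical and used only for illustration.
    """
    if vaf > 0.4:
        return "high"
    if vaf < 0.2:
        return "low"
    return "intermediate"
```

As a usage example with made-up counts, a site with 43 variant reads out of 100 gives `variant_allele_fraction(43, 100)`, a VAF of 0.43, which `vaf_group` places in the "high" bin, the group in which the abstract reports that 90% of diagnostic variants persisted to relapse.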