ABSTRACT
We conducted the largest investigation of predisposition variants in cancer to date, discovering 853 pathogenic or likely pathogenic variants in 8% of 10,389 cases from 33 cancer types. Twenty-one genes showed single or cross-cancer associations, including novel associations of SDHA in melanoma and PALB2 in stomach adenocarcinoma. The 659 predisposition variants and 18 additional large deletions in tumor suppressors, including ATM, BRCA1, and NF1, showed low gene expression and frequent (43%) loss of heterozygosity or biallelic two-hit events. We also discovered 33 such variants in oncogenes, including missense variants in MET, RET, and PTPN11 associated with high gene expression. We nominated 47 additional predisposition variants from prioritized VUSs supported by multiple lines of evidence involving case-control frequency, loss of heterozygosity, expression effect, and co-localization with mutations and modified residues. Our integrative approach links rare predisposition variants to functional consequences, informing future guidelines of variant classification and germline genetic testing in cancer.
Subjects
Germ Cells/metabolism , Neoplasms/pathology , DNA Copy Number Variations , Genetic Databases , Gene Deletion , Gene Frequency , Genetic Predisposition to Disease , Genotype , Germ Cells/cytology , Germ-Line Mutation , Humans , Loss of Heterozygosity/genetics , Missense Mutation , Neoplasms/genetics , Single Nucleotide Polymorphism , Proto-Oncogene Proteins c-met/genetics , Proto-Oncogene Proteins c-ret/genetics , Tumor Suppressor Proteins/genetics
ABSTRACT
The human reference genome is the most widely used resource in human genetics and is due for a major update. Its current structure is a linear composite of merged haplotypes from more than 20 people, with a single individual comprising most of the sequence. It contains biases and errors within a framework that does not represent global human genomic variation. A high-quality reference with global representation of common variants, including single-nucleotide variants, structural variants and functional elements, is needed. The Human Pangenome Reference Consortium aims to create a more sophisticated and complete human reference genome with a graph-based, telomere-to-telomere representation of global genomic diversity. Here we leverage innovations in technology, study design and global partnerships with the goal of constructing the highest-possible quality human pangenome reference. Our goal is to improve data representation and streamline analyses to enable routine assembly of complete diploid genomes. With attention to ethical frameworks, the human pangenome reference will contain a more accurate and diverse representation of global genomic variation, improve gene-disease association studies across populations, expand the scope of genomics research to the most repetitive and polymorphic regions of the genome, and serve as the ultimate genetic resource for future biomedical research and precision medicine.
Subjects
Human Genome , Genomics , Human Genome/genetics , Haplotypes/genetics , High-Throughput Nucleotide Sequencing , Humans , DNA Sequence Analysis
ABSTRACT
Most mutations in cancer genomes are thought to be acquired after the initiating event, which may cause genomic instability and drive clonal evolution. However, for acute myeloid leukemia (AML), normal karyotypes are common, and genomic instability is unusual. To better understand clonal evolution in AML, we sequenced the genomes of M3-AML samples with a known initiating event (PML-RARA) versus the genomes of normal karyotype M1-AML samples and the exomes of hematopoietic stem/progenitor cells (HSPCs) from healthy people. Collectively, the data suggest that most of the mutations found in AML genomes are actually random events that occurred in HSPCs before they acquired the initiating mutation; the mutational history of that cell is "captured" as the clone expands. In many cases, only one or two additional, cooperating mutations are needed to generate the malignant founding clone. Cells from the founding clone can acquire additional cooperating mutations, yielding subclones that can contribute to disease progression and/or relapse.
Subjects
Clonal Evolution , Acute Myeloid Leukemia/genetics , Mutation , Adult , Aged , DNA Mutational Analysis , Disease Progression , Female , Genome-Wide Association Study , Hematopoietic Stem Cells/metabolism , Humans , Acute Myeloid Leukemia/physiopathology , Male , Middle Aged , Oncogene Fusion Proteins/genetics , Recurrence , Skin/metabolism , Young Adult
ABSTRACT
The Drug-Gene Interaction Database (DGIdb, https://dgidb.org) is a publicly accessible resource that aggregates genes or gene products, drugs and drug-gene interaction records to drive hypothesis generation and discovery for clinicians and researchers. DGIdb 5.0 is the latest release and includes substantial architectural and functional updates to support integration into clinical and drug discovery pipelines. The DGIdb service architecture has been split into separate client and server applications, enabling consistent data access for users of both the application programming interface (API) and web interface. The new interface was developed in ReactJS, and includes dynamic visualizations and consistency in the display of user interface elements. A GraphQL API has been added to support customizable queries for all drugs, genes, annotations and associated data. Updated documentation provides users with example queries and detailed usage instructions for these new features. In addition, six sources have been added and many existing sources have been updated. Newly added sources include ChemIDplus, HemOnc, NCIt (National Cancer Institute Thesaurus), Drugs@FDA, HGNC (HUGO Gene Nomenclature Committee) and RxNorm. These new sources have been incorporated into DGIdb to provide additional records and enhance annotations of regulatory approval status for therapeutics. Methods for grouping drugs and genes have been expanded upon and developed as independent modular normalizers during import. The updates to these sources and grouping methods have resulted in an improvement in FAIR (findability, accessibility, interoperability and reusability) data representation in DGIdb.
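As an illustration of the kind of customizable query a GraphQL API allows, the following minimal Python sketch builds and sends a gene-centred query using only the standard library. The endpoint path, the query shape, and every field name (`genes`, `nodes`, `interactions`, `drug`, `interactionScore`) are assumptions made for illustration and are not taken from the abstract; the DGIdb documentation should be consulted for the actual schema.

```python
import json
import urllib.request

# Assumed endpoint path; not stated in the abstract.
DGIDB_GRAPHQL_URL = "https://dgidb.org/api/graphql"

# Assumed query shape; all field names here are illustrative only.
QUERY = """
query ($names: [String!]) {
  genes(names: $names) {
    nodes {
      name
      interactions {
        drug { name }
        interactionScore
      }
    }
  }
}
"""


def build_request(gene_names):
    """Build a POST request carrying the GraphQL query and its variables as JSON."""
    payload = json.dumps(
        {"query": QUERY, "variables": {"names": list(gene_names)}}
    ).encode("utf-8")
    return urllib.request.Request(
        DGIDB_GRAPHQL_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_interactions(gene_names, timeout=30):
    """Send the request and return the decoded JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(gene_names), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_interactions(["BRAF"])` would return the server's JSON response if the assumed schema matches. Keeping request construction separate from the network call makes the payload easy to inspect offline.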
Subjects
Precision Medicine , Humans , Pharmaceutical Databases , Drug Discovery , Internet , User-Computer Interface , Controlled Vocabulary
ABSTRACT
CIViC (Clinical Interpretation of Variants in Cancer; civicdb.org) is a crowd-sourced, public domain knowledgebase composed of literature-derived evidence characterizing the clinical utility of cancer variants. As clinical sequencing becomes more prevalent in cancer management, the need for cancer variant interpretation has grown beyond the capability of any single institution. CIViC contains peer-reviewed, published literature curated and expertly-moderated into structured data units (Evidence Items) that can be accessed globally and in real time, reducing barriers to clinical variant knowledge sharing. We have extended CIViC's functionality to support emergent variant interpretation guidelines, increase interoperability with other variant resources, and promote widespread dissemination of structured curated data. To support the full breadth of variant interpretation from basic to translational, including integration of somatic and germline variant knowledge and inference of drug response, we have enabled curation of three new Evidence Types (Predisposing, Oncogenic and Functional). The growing CIViC knowledgebase has over 300 contributors and distributes clinically-relevant cancer variant data currently representing >3200 variants in >470 genes from >3100 publications.
Subjects
Genetic Variation , Neoplasms , Humans , Neoplasms/genetics , Knowledge Bases , High-Throughput Nucleotide Sequencing
ABSTRACT
The Drug-Gene Interaction Database (DGIdb, www.dgidb.org) is a web resource that provides information on drug-gene interactions and druggable genes from publications, databases, and other web-based sources. Drug, gene, and interaction data are normalized and merged into conceptual groups. The information contained in this resource is available to users through a straightforward search interface, an application programming interface (API), and TSV data downloads. DGIdb 4.0 is the latest major version release of this database. A primary focus of this update was integration with crowdsourced efforts, leveraging the Drug Target Commons for community-contributed interaction data, Wikidata to facilitate term normalization, and export to NDEx for drug-gene interaction network representations. Seven new sources have been added since the last major version release, bringing the total number of sources included to 41. Of the previously aggregated sources, 15 have been updated. DGIdb 4.0 also includes improvements to the process of drug normalization and grouping of imported sources. Other notable updates include the introduction of a more sophisticated Query Score for interaction search results, an updated Interaction Score, the inclusion of interaction directionality, and several additional improvements to search features, data releases, licensing documentation and the application framework.
Subjects
Crowdsourcing , Factual Databases , Genetic Databases , Investigational Drugs/pharmacology , Human Genome/drug effects , Prescription Drugs/pharmacology , Chemical Databases , Investigational Drugs/chemistry , Genotype , Humans , Internet , Knowledge Bases , Phenotype , Prescription Drugs/chemistry , Software
ABSTRACT
Aberrant phospho-signaling is a hallmark of cancer. We investigated kinase-substrate regulation of 33,239 phosphorylation sites (phosphosites) in 77 breast tumors and 24 breast cancer xenografts. Our search discovered 2134 quantitatively correlated kinase-phosphosite pairs, enriching for and extending experimental or binding-motif predictions. Among the 91 kinases with auto-phosphorylation, elevated EGFR, ERBB2, PRKG1, and WNK1 phosphosignaling were enriched in basal, HER2-E, Luminal A, and Luminal B breast cancers, respectively, revealing subtype-specific regulation. CDKs, MAPKs, and ataxia-telangiectasia proteins were dominant, master regulators of substrate-phosphorylation, whose activities are not captured by genomic evidence. We unveiled phospho-signaling and druggable targets from 113 kinase-substrate pairs and cascades downstream of kinases, including AKT1, BRAF and EGFR. We further identified kinase-substrate-pairs associated with clinical or immune signatures and experimentally validated activated phosphosites of ERBB2, EIF4EBP1, and EGFR. Overall, kinase-substrate regulation revealed by the largest unbiased global phosphorylation data to date connects driver events to their signaling effects.
Subjects
Breast Neoplasms/metabolism , Protein Kinases/metabolism , Female , Humans , Phosphorylation , Signal Transduction
ABSTRACT
High-throughput DNA sequencing has revolutionized the study of cancer genomics with numerous discoveries that are relevant to cancer diagnosis and treatment. The latest sequencing and analysis methods have successfully identified somatic alterations, including single-nucleotide variants, insertions and deletions, copy-number aberrations, structural variants and gene fusions. Additional computational techniques have proved useful for defining the mutations, genes and molecular networks that drive diverse cancer phenotypes and that determine clonal architectures in tumour samples. Collectively, these tools have advanced the study of genomic, transcriptomic and epigenomic alterations in cancer, and their association with clinical properties. Here, we review cancer genomics software and the insights that have been gained from its application.
Subjects
Data Mining/methods , Genomics/methods , Neoplasms/genetics , Animals , High-Throughput Nucleotide Sequencing , Humans , Mutation , Neoplasms/metabolism , Signal Transduction , Software
ABSTRACT
The Cancer Genome Atlas (TCGA) has used the latest sequencing and analysis methods to identify somatic variants across thousands of tumours. Here we present data and analytical results for point mutations and small insertions/deletions from 3,281 tumours across 12 tumour types as part of the TCGA Pan-Cancer effort. We illustrate the distributions of mutation frequencies, types and contexts across tumour types, and establish their links to tissues of origin, environmental/carcinogen influences, and DNA repair defects. Using the integrated data sets, we identified 127 significantly mutated genes from well-known (for example, mitogen-activated protein kinase, phosphatidylinositol-3-OH kinase, Wnt/β-catenin and receptor tyrosine kinase signalling pathways, and cell cycle control) and emerging (for example, histone, histone modification, splicing, metabolism and proteolysis) cellular processes in cancer. The average number of mutations in these significantly mutated genes varies across tumour types; most tumours have two to six, indicating that the number of driver mutations required during oncogenesis is relatively small. Mutations in transcriptional factors/regulators show tissue specificity, whereas histone modifiers are often mutated across several cancer types. Clinical association analysis identifies genes having a significant effect on survival, and investigations of mutations with respect to clonal/subclonal architecture delineate their temporal orders during tumorigenesis. Taken together, these results lay the groundwork for developing new diagnostics and individualizing cancer treatment.
Subjects
Carcinogenesis/genetics , Mutation/genetics , Neoplasms/classification , Neoplasms/genetics , Cell Cycle/genetics , Clone Cells/metabolism , Clone Cells/pathology , Cohort Studies , DNA Repair/genetics , Humans , INDEL Mutation/genetics , Mitogen-Activated Protein Kinases/genetics , Genetic Models , Neoplasms/metabolism , Neoplasms/pathology , Oncogenes/genetics , Phosphatidylinositol 3-Kinases/genetics , Point Mutation/genetics , Receptor Protein-Tyrosine Kinases/metabolism , Survival Analysis , Time Factors
ABSTRACT
Harmonization of cancer variant representation, efficient communication, and free distribution of clinical variant-associated knowledge are central problems that arise with increased usage of clinical next-generation sequencing. The Clinical Genome Resource (ClinGen) Somatic Working Group (WG) developed a minimal variant level data (MVLD) representation of cancer variants, and has an ongoing collaboration with Clinical Interpretations of Variants in Cancer (CIViC), an open-source platform supporting crowdsourced and expert-moderated cancer variant curation. Harmonization between MVLD and CIViC variant formats was assessed by formal field-by-field analysis. Adjustments to the CIViC format were made to harmonize with MVLD and support ClinGen Somatic WG curation activities, including four new features in CIViC: (1) introduction of an assertions feature for clinical variant assessment following the Association of Molecular Pathologists (AMP) guidelines, (2) group-level curation tracking for organizations, enabling member transparency, and curation effort summaries, (3) introduction of ClinGen Allele Registry IDs to CIViC, and (4) mapping of CIViC assertions into ClinVar submission with automated submissions. A generalizable workflow utilizing MVLD and new CIViC features is outlined for use by ClinGen Somatic WG task teams for curation and submission to ClinVar, and provides a model for promoting harmonization of cancer variant representation and efficient distribution of this information.
Subjects
Human Genome/genetics , Neoplasms/genetics , Genetic Databases , Genetic Testing , Genetic Variation/genetics , Genomics , High-Throughput Nucleotide Sequencing , Humans , Software
ABSTRACT
Most patients with acute myeloid leukaemia (AML) die from progressive disease after relapse, which is associated with clonal evolution at the cytogenetic level. To determine the mutational spectrum associated with relapse, we sequenced the primary tumour and relapse genomes from eight AML patients, and validated hundreds of somatic mutations using deep sequencing; this allowed us to define clonality and clonal evolution patterns precisely at relapse. In addition to discovering novel, recurrently mutated genes (for example, WAC, SMC3, DIS3, DDX41 and DAXX) in AML, we also found two major clonal evolution patterns during AML relapse: (1) the founding clone in the primary tumour gained mutations and evolved into the relapse clone, or (2) a subclone of the founding clone survived initial therapy, gained additional mutations and expanded at relapse. In all cases, chemotherapy failed to eradicate the founding clone. The comparison of relapse-specific versus primary tumour mutations in all eight cases revealed an increase in transversions, probably due to DNA damage caused by cytotoxic chemotherapy. These data demonstrate that AML relapse is associated with the addition of new mutations and clonal evolution, which is shaped, in part, by the chemotherapy that the patients receive to establish and maintain remissions.
Subjects
Clonal Evolution/genetics , Human Genome/genetics , Acute Myeloid Leukemia/genetics , Acute Myeloid Leukemia/pathology , Antineoplastic Agents/adverse effects , Antineoplastic Agents/therapeutic use , Clone Cells/drug effects , Clone Cells/metabolism , Clone Cells/pathology , DNA Damage/drug effects , DNA Mutational Analysis , Neoplasm Genes/genetics , Human Genome/drug effects , High-Throughput Nucleotide Sequencing , Humans , Acute Myeloid Leukemia/drug therapy , Mutagenesis/drug effects , Mutagenesis/genetics , Recurrence , Reproducibility of Results
ABSTRACT
To correlate the variable clinical features of oestrogen-receptor-positive breast cancer with somatic alterations, we studied pretreatment tumour biopsies accrued from patients in two studies of neoadjuvant aromatase inhibitor therapy by massively parallel sequencing and analysis. Eighteen significantly mutated genes were identified, including five genes (RUNX1, CBFB, MYH9, MLL3 and SF3B1) previously linked to haematopoietic disorders. Mutant MAP3K1 was associated with luminal A status, low-grade histology and low proliferation rates, whereas mutant TP53 was associated with the opposite pattern. Moreover, mutant GATA3 correlated with suppression of proliferation upon aromatase inhibitor treatment. Pathway analysis demonstrated that mutations in MAP2K4, a MAP3K1 substrate, produced similar perturbations as MAP3K1 loss. Distinct phenotypes in oestrogen-receptor-positive breast cancer are associated with specific patterns of somatic mutations that map into cellular pathways linked to tumour biology, but most recurrent mutations are relatively infrequent. Prospective clinical trials based on these findings will require comprehensive genome sequencing.
Subjects
Aromatase Inhibitors/therapeutic use , Aromatase/metabolism , Breast Neoplasms/drug therapy , Breast Neoplasms/genetics , Human Genome/genetics , Anastrozole , Androstadienes/pharmacology , Androstadienes/therapeutic use , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , DNA Repair , Exome/genetics , Exons/genetics , Female , Genetic Variation/genetics , Humans , Letrozole , MAP Kinase Kinase 4/genetics , MAP Kinase Kinase Kinase 1/genetics , Mutation/genetics , Nitriles/pharmacology , Nitriles/therapeutic use , Estrogen Receptors/metabolism , Treatment Outcome , Triazoles/pharmacology , Triazoles/therapeutic use
ABSTRACT
The Drug-Gene Interaction Database (DGIdb, www.dgidb.org) is a web resource that consolidates disparate data sources describing drug-gene interactions and gene druggability. It provides an intuitive graphical user interface and a documented application programming interface (API) for querying these data. DGIdb was assembled through an extensive manual curation effort, reflecting the combined information of twenty-seven sources. For DGIdb 2.0, substantial updates have been made to increase content and improve its usefulness as a resource for mining clinically actionable drug targets. Specifically, nine new sources of drug-gene interactions have been added, including seven resources specifically focused on interactions linked to clinical trials. These additions have more than doubled the overall count of drug-gene interactions. The total number of druggable gene claims has also increased by 30%. Importantly, a majority of the unrestricted, publicly-accessible sources used in DGIdb are now automatically updated on a weekly basis, providing the most current information for these sources. Finally, a new web view and API have been developed to allow searching for interactions by drug identifiers to complement existing gene-based search functionality. With these updates, DGIdb represents a comprehensive and user friendly tool for mining the druggable genome for precision medicine hypothesis generation.
Subjects
Pharmaceutical Databases , Drug Discovery , Genes/drug effects , Data Mining , Ligands
ABSTRACT
BACKGROUND: Many mutations that contribute to the pathogenesis of acute myeloid leukemia (AML) are undefined. The relationships between patterns of mutations and epigenetic phenotypes are not yet clear. METHODS: We analyzed the genomes of 200 clinically annotated adult cases of de novo AML, using either whole-genome sequencing (50 cases) or whole-exome sequencing (150 cases), along with RNA and microRNA sequencing and DNA-methylation analysis. RESULTS: AML genomes have fewer mutations than most other adult cancers, with an average of only 13 mutations found in genes. Of these, an average of 5 are in genes that are recurrently mutated in AML. A total of 23 genes were significantly mutated, and another 237 were mutated in two or more samples. Nearly all samples had at least 1 nonsynonymous mutation in one of nine categories of genes that are almost certainly relevant for pathogenesis, including transcription-factor fusions (18% of cases), the gene encoding nucleophosmin (NPM1) (27%), tumor-suppressor genes (16%), DNA-methylation-related genes (44%), signaling genes (59%), chromatin-modifying genes (30%), myeloid transcription-factor genes (22%), cohesin-complex genes (13%), and spliceosome-complex genes (14%). Patterns of cooperation and mutual exclusivity suggested strong biologic relationships among several of the genes and categories. CONCLUSIONS: We identified at least one potential driver mutation in nearly all AML samples and found that a complex interplay of genetic events contributes to AML pathogenesis in individual patients. The databases from this study are widely available to serve as a foundation for further investigations of AML pathogenesis, classification, and risk stratification. (Funded by the National Institutes of Health.).
Subjects
Acute Myeloid Leukemia/genetics , Mutation , Adult , CpG Islands , DNA Methylation , Epigenomics , Female , Gene Expression , Gene Fusion , Human Genome , Humans , Acute Myeloid Leukemia/classification , Male , MicroRNAs/genetics , Middle Aged , Nucleophosmin , DNA Sequence Analysis/methods
ABSTRACT
In this work, we present the Genome Modeling System (GMS), an analysis information management system capable of executing automated genome analysis pipelines at a massive scale. The GMS framework provides detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines. The GMS also serves as a platform for bioinformatics development, allowing a large team to collaborate on data analysis, or an individual researcher to leverage the work of others effectively within its data management system. Rather than separating ad-hoc analysis from rigorous, reproducible pipelines, the GMS promotes systematic integration between the two. As a demonstration of the GMS, we performed an integrated analysis of whole genome, exome and transcriptome sequencing data from a breast cancer cell line (HCC1395) and matched lymphoblastoid line (HCC1395BL). These data are available for users to test the software, complete tutorials and develop novel GMS pipeline configurations. The GMS is available at https://github.com/genome/gms.
Subjects
Chromosome Mapping/methods , Human Genome/genetics , Knowledge Bases , Genetic Models , DNA Sequence Analysis/methods , User-Computer Interface , Algorithms , Computer Simulation , Database Management Systems , Genetic Databases , Humans , Sequence Alignment/methods
ABSTRACT
Massively parallel DNA sequencing technologies provide an unprecedented ability to screen entire genomes for genetic changes associated with tumour progression. Here we describe the genomic analyses of four DNA samples from an African-American patient with basal-like breast cancer: peripheral blood, the primary tumour, a brain metastasis and a xenograft derived from the primary tumour. The metastasis contained two de novo mutations and a large deletion not present in the primary tumour, and was significantly enriched for 20 shared mutations. The xenograft retained all primary tumour mutations and displayed a mutation enrichment pattern that resembled the metastasis. Two overlapping large deletions, encompassing CTNNA1, were present in all three tumour samples. The differential mutation frequencies and structural variation patterns in metastasis and xenograft compared with the primary tumour indicate that secondary tumours may arise from a minority of cells within the primary tumour.
Subjects
Brain Neoplasms/genetics , Brain Neoplasms/secondary , Breast Neoplasms/genetics , Human Genome/genetics , Mutation/genetics , Neoplasm Transplantation , Adult , Breast Neoplasms/pathology , DNA Copy Number Variations/genetics , DNA Mutational Analysis , Disease Progression , Female , Gene Frequency/genetics , Genomics , Humans , Genetic Translocation/genetics , Heterologous Transplantation , alpha Catenin/genetics
ABSTRACT
People diagnosed with cancer and their formal and informal caregivers are increasingly faced with a deluge of complex information, thanks to rapid advancements in the type and volume of diagnostic, prognostic, and treatment data. This commentary discusses the opportunities and challenges that society faces as we integrate large volumes of data into regular cancer care.
Subjects
Neoplasms , Humans , Neoplasms/therapy , Biomedical Research
ABSTRACT
BACKGROUND: The genetic alterations responsible for an adverse outcome in most patients with acute myeloid leukemia (AML) are unknown. METHODS: Using massively parallel DNA sequencing, we identified a somatic mutation in DNMT3A, encoding a DNA methyltransferase, in the genome of cells from a patient with AML with a normal karyotype. We sequenced the exons of DNMT3A in 280 additional patients with de novo AML to define recurring mutations. RESULTS: A total of 62 of 281 patients (22.1%) had mutations in DNMT3A that were predicted to affect translation. We identified 18 different missense mutations, the most common of which was predicted to affect amino acid R882 (in 37 patients). We also identified six frameshift, six nonsense, and three splice-site mutations and a 1.5-Mbp deletion encompassing DNMT3A. These mutations were highly enriched in the group of patients with an intermediate-risk cytogenetic profile (56 of 166 patients, or 33.7%) but were absent in all 79 patients with a favorable-risk cytogenetic profile (P<0.001 for both comparisons). The median overall survival among patients with DNMT3A mutations was significantly shorter than that among patients without such mutations (12.3 months vs. 41.1 months, P<0.001). DNMT3A mutations were associated with adverse outcomes among patients with an intermediate-risk cytogenetic profile or FLT3 mutations, regardless of age, and were independently associated with a poor outcome in Cox proportional-hazards analysis. CONCLUSIONS: DNMT3A mutations are highly recurrent in patients with de novo AML with an intermediate-risk cytogenetic profile and are independently associated with a poor outcome. (Funded by the National Institutes of Health and others.).
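The proportions quoted in the RESULTS can be reproduced directly from the reported counts. The short Python check below is only an arithmetic illustration; the helper name `percent` is hypothetical and the counts are copied from the abstract.

```python
def percent(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Counts as reported in the abstract.
overall = percent(62, 281)        # all patients with DNMT3A mutations
intermediate = percent(56, 166)   # intermediate-risk cytogenetic profile
favorable = percent(0, 79)        # favorable-risk cytogenetic profile

print(overall, intermediate, favorable)  # prints: 22.1 33.7 0.0
```

The denominator of 281 corresponds to the index patient plus the 280 additional patients whose DNMT3A exons were sequenced.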
Subjects
DNA (Cytosine-5-)-Methyltransferases/genetics , Acute Myeloid Leukemia/genetics , Mutation , Adult , DNA Methylation , DNA Methyltransferase 3A , DNA Mutational Analysis/methods , Female , Frameshift Mutation , Gene Expression , Humans , Karyotyping , Acute Myeloid Leukemia/mortality , Male , Middle Aged , Nucleic Acid Amplification Techniques , Prognosis , Proportional Hazards Models , Survival Analysis
ABSTRACT
Somatic mutations within non-coding regions and even exons may have unidentified regulatory consequences that are often overlooked in analysis workflows. Here we present RegTools (www.regtools.org), a computationally efficient, free, and open-source software package designed to integrate somatic variants from genomic data with splice junctions from bulk or single cell transcriptomic data to identify variants that may cause aberrant splicing. We apply RegTools to over 9000 tumor samples with both tumor DNA and RNA sequence data. RegTools discovers 235,778 events where a splice-associated variant significantly increases the splicing of a particular junction, across 158,200 unique variants and 131,212 unique junctions. To characterize these somatic variants and their associated splice isoforms, we annotate them with the Variant Effect Predictor, SpliceAI, and Genotype-Tissue Expression junction counts and compare our results to other tools that integrate genomic and transcriptomic data. While many events are corroborated by the aforementioned tools, the flexibility of RegTools also allows us to identify splice-associated variants in known cancer drivers, such as TP53, CDKN2A, and B2M, and other genes.
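The general idea of pairing variants with nearby splice junctions can be sketched in a few lines of Python. This is a deliberately simplified toy and not the RegTools algorithm: the `Variant` and `Junction` types, the `junctions_near` helper, the window size, and the example coordinates are all hypothetical, and no statistical test of junction usage is attempted.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int


class Junction(NamedTuple):
    chrom: str
    start: int   # donor-side coordinate
    end: int     # acceptor-side coordinate
    reads: int   # supporting read count


def junctions_near(variant: Variant, junctions: List[Junction], window: int = 2) -> List[Junction]:
    """Return junctions with either end within `window` bases of the variant."""
    return [
        j for j in junctions
        if j.chrom == variant.chrom
        and (abs(j.start - variant.pos) <= window or abs(j.end - variant.pos) <= window)
    ]


# Hypothetical toy data, not drawn from the study.
variant = Variant("chr17", 100)
junctions = [
    Junction("chr17", 101, 500, 12),  # donor end 1 bp from the variant
    Junction("chr17", 300, 800, 5),   # same chromosome, too far away
    Junction("chr1", 100, 400, 7),    # same position, different chromosome
]
candidates = junctions_near(variant, junctions)
```

In a real analysis, the candidate pairs would then be compared against junction usage in samples lacking the variant to decide whether the association is significant.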