ABSTRACT
Most somatic mutations that arise during normal development are present at low levels in single or multiple tissues depending on the developmental stage and affected organs. However, the effect of human developmental stages or mutations of different organs on the features of somatic mutations is still unclear. Here, we performed a systemic and comprehensive analysis of low-level somatic mutations using deep whole-exome sequencing (average read depth ~500×) of 498 multiple organ tissues with matched controls from 190 individuals. Our results showed that early clone-forming mutations shared between multiple organs were lower in number but showed higher allele frequencies than late clone-forming mutations [0.54 vs. 5.83 variants per individual; 6.17% vs. 1.5% variant allele frequency (VAF)], along with fewer nonsynonymous mutations and lower functional impacts. Additionally, early and late clone-forming mutations had unique mutational signatures that were distinct from mutations that originated from tumors. Compared with early clone-forming mutations that showed a clock-like signature across all organs or tissues studied, late clone-forming mutations showed organ, tissue, and cell-type specificity in the mutation counts, VAFs, and mutational signatures. In particular, analysis of brain somatic mutations showed a bimodal occurrence and temporal-lobe-specific signature. These findings provide new insights into the features of somatic mosaicism that are dependent on developmental stage and brain regions.
Subject(s)
Mosaicism , Neoplasms , Gene Frequency , Humans , Mutation , Neoplasms/genetics , Exome Sequencing
ABSTRACT
Summary: Ion Torrent sequencing is one of the most frequently used platforms in healthcare research and industry. Despite many advantages, platform-specific artifacts complicate efficient separation of true variants from errors, especially in variants with lower allele frequencies (<15%). Here, we developed a multi-step filtering toolbox, AIRVF, that works on flowgram, raw and mapped reads and called variants to reduce artifact-driven false variant calls. Tests on sequencing data of standard reference material showed up to ~98% reduction of false variants when combined with conventional public pipelines and ~48% when combined with the in-house commercial solution, with a minimal loss of sensitivity. Availability and implementation: The program with a detailed manual is available at https://sourceforge.net/projects/airvf/. Contact: swkim@yuhs.ac. Supplementary information: Supplementary data are available at Bioinformatics online.
Subject(s)
Diagnostic Errors , High-Throughput Nucleotide Sequencing/methods , Software , Gene Frequency , Humans , Sensitivity and Specificity
ABSTRACT
MOTIVATION: Advances in sequencing technologies have remarkably lowered the detection limit of somatic variants to a low frequency. However, calling mutations at this range is still confounded by many factors including environmental contamination. Vector contamination is a continuously occurring issue and is especially problematic since vector inserts are hardly distinguishable from the sample sequences. Such inserts, which may harbor polymorphisms and engineered functional mutations, can result in calling false variants at corresponding sites. Numerous vector-screening methods have been developed, but none could handle contamination from inserts because they focus on vector backbone sequences alone. RESULTS: We developed a novel method, Vecuum, that identifies vector-originated reads and resultant false variants. Since vector inserts are generally constructed from intron-less cDNAs, Vecuum identifies vector-originated reads by inspecting the clipping patterns at exon junctions. False variant calls are further detected based on the biased distribution of mutant alleles to vector-originated reads. Tests on simulated and spike-in experimental data validated that Vecuum could detect 93% of vector contaminants and could remove up to 87% of variant-like false calls with 100% precision. Application to public sequence datasets demonstrated the utility of Vecuum in detecting false variants resulting from various types of external contamination. AVAILABILITY AND IMPLEMENTATION: Java-based implementation of the method is available at http://vecuum.sourceforge.net/ CONTACT: swkim@yuhs.ac. Supplementary information: Supplementary data are available at Bioinformatics online.
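The clipping-pattern idea described above can be illustrated with a minimal sketch. It is not the Vecuum implementation; it only assumes that a read derived from an intron-less cDNA insert, when aligned to the genome, tends to be soft-clipped exactly at an exon boundary. The function names, the 0-based half-open exon coordinates, and the example positions are all hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def clip_positions(start, cigar):
    """Return (side, reference position) for each soft clip of a read.

    `start` is the 0-based leftmost aligned reference position.
    Hard clips are ignored because they carry no sequence.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
    positions = []
    if ops and ops[0][1] == "S":
        # Leading soft clip: aligned bases begin at `start`.
        positions.append(("left", start))
    # Sum reference-consuming operations to find the exclusive alignment end.
    end = start + sum(n for n, op in ops if op in "MDN=X")
    if ops and ops[-1][1] == "S":
        # Trailing soft clip: aligned bases stop at `end`.
        positions.append(("right", end))
    return positions

def is_vector_like(start, cigar, exon_starts, exon_ends):
    """Flag a read whose soft clip coincides with an exon boundary."""
    for side, pos in clip_positions(start, cigar):
        if side == "left" and pos in exon_starts:
            return True
        if side == "right" and pos in exon_ends:
            return True
    return False

# Hypothetical exons [100, 200) and [300, 400) on one contig.
exon_starts = {100, 300}
exon_ends = {200, 400}
print(is_vector_like(150, "50M30S", exon_starts, exon_ends))
print(is_vector_like(130, "10S50M", exon_starts, exon_ends))
```

A real screen would also need to compare the clipped bases with the adjacent exon sequence; this sketch checks only where the clip falls.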
Subject(s)
High-Throughput Nucleotide Sequencing , Mutation , Alleles , Genetic Vectors , Recombination, Genetic , Software
ABSTRACT
BACKGROUND: Alternative splicing events that result in the production of multiple gene isoforms reveal important molecular mechanisms. Gene isoforms are often differentially expressed across organs and tissues, developmental stages, and disease conditions. Specifically, recent studies show that aberrant regulation of alternative splicing frequently occurs in cancer to affect tumor cell transformation and growth. While analysis of isoform expression is important for discovering tumor-specific isoform signatures and interpreting relevant genomic mutations, there is currently no web-based, easy-to-use, and publicly available platform for this purpose. DESCRIPTION: We developed ISOexpresso to provide information regarding isoform existence and expression, which can be grouped by cancer vs. normal conditions, cancer types, and tissue types. ISOexpresso implements two main functions: First, the Isoform Expression View function creates visualizations for condition-specific RNA/isoform expression patterns upon query of a gene of interest. With this function, users can easily determine the major isoform (the most expressed isoform in a sample) of a gene with respect to the condition and check whether it matches the known canonical isoform. ISOexpresso outputs expression levels of all known transcripts to check alterations of the expression landscape and to find potential tumor-specific isoforms. Second, the User Data Annotation function supports annotation of genomic variants to determine the most plausible consequence of a variation (e.g., an amino acid change) among many possible interpretations. As most coding sequence mutations take effect through subsequent transcription and translation, ISOexpresso automatically prioritizes transcripts that act as backbones for mutation effect prediction by their relative expression.
By employing ISOexpresso, we could investigate the consistency between the most expressed and known canonical/principal isoforms, as well as infer candidate tumor-specific isoforms based on their expression levels. In addition, we confirmed that ISOexpresso could easily reproduce previously known isoform expression patterns: recurrent observation of a major isoform across tissues, differential isoform expression patterns in a given tissue, and switching of the major isoform during tumorigenesis. CONCLUSIONS: ISOexpresso serves as a web-based, easy-to-use platform for isoform expression and alteration analysis based on a large-scale cancer database. We anticipate that ISOexpresso will expedite formulation and confirmation of novel hypotheses by providing isoform-level perspectives on cancer research. The ISOexpresso database is available online at http://wiki.tgilab.org/ISOexpresso/ .
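The notion of a major isoform (the most expressed isoform in a sample) and its agreement with the canonical isoform can be shown with a short sketch. This is not ISOexpresso code; the transcript identifiers, expression values, and function names are hypothetical, and each sample is assumed to be a mapping from transcript ID to an expression value.

```python
def major_isoform(expression):
    """Return the transcript ID with the highest expression in one sample."""
    return max(expression, key=expression.get)

def canonical_agreement(samples, canonical_id):
    """Fraction of samples whose major isoform is the canonical isoform."""
    matches = sum(1 for expr in samples if major_isoform(expr) == canonical_id)
    return matches / len(samples)

# Hypothetical expression values for two transcripts of one gene.
samples = [
    {"tx_a": 12.0, "tx_b": 3.5},
    {"tx_a": 1.0, "tx_b": 9.0},
    {"tx_a": 7.0, "tx_b": 2.0},
    {"tx_a": 5.0, "tx_b": 1.0},
]
print(canonical_agreement(samples, "tx_a"))
```

Running the same comparison separately on tumor and normal samples is one simple way to spot a switch of the major isoform between conditions.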
Subject(s)
Gene Expression Profiling/instrumentation , Gene Expression Regulation, Neoplastic , Neoplasm Proteins/metabolism , Neoplasms/metabolism , User-Computer Interface , Databases, Factual , Forkhead Box Protein M1/genetics , Forkhead Box Protein M1/metabolism , Humans , Internet , Mutation , Neoplasm Proteins/genetics , Protein Isoforms/genetics , Protein Isoforms/metabolism
ABSTRACT
Accurate genome-wide detection of somatic mutations with low variant allele frequency (VAF, <1%) has proven difficult, for which generalized, scalable methods are lacking. Herein, we describe a new computational method, called RePlow, that we developed to detect low-VAF somatic mutations based on simple, library-level replicates for next-generation sequencing on any platform. Through joint analysis of replicates, RePlow is able to remove prevailing background errors in next-generation sequencing analysis, facilitating remarkable improvement in the detection accuracy for low-VAF somatic mutations (up to ~99% reduction in false positives). The method is validated in independent cancer panel and brain tissue sequencing data. Our study suggests a new paradigm with which to exploit an overwhelming abundance of sequencing data for accurate variant detection.
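As a rough illustration of why replicates help, the sketch below keeps only candidate variants that are supported in every library-level replicate. This is a plain concordance filter, not RePlow's joint statistical analysis; the variant keys, read counts, threshold, and function names are hypothetical.

```python
def vaf(alt_reads, depth):
    """Variant allele frequency: alternate-allele reads over total depth."""
    return alt_reads / depth if depth else 0.0

def concordant_calls(replicates, min_alt_reads=2):
    """Return variant keys with at least `min_alt_reads` in every replicate.

    Each replicate maps a variant key to an (alt_reads, depth) tuple.
    """
    shared = set(replicates[0])
    for rep in replicates[1:]:
        shared &= set(rep)
    return sorted(
        key for key in shared
        if all(rep[key][0] >= min_alt_reads for rep in replicates)
    )

# Hypothetical candidate calls from two replicates of one sample.
replicates = [
    {"chr1:100:A>T": (5, 1000), "chr2:50:G>C": (3, 900)},
    {"chr1:100:A>T": (4, 1100), "chr2:50:G>C": (1, 950), "chr3:7:C>T": (6, 800)},
]
print(concordant_calls(replicates))
```

The intuition is that a random background error is unlikely to recur at the same site in independent libraries, whereas a true low-VAF mutation should appear in each of them.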
Subject(s)
Computational Biology/methods , DNA Mutational Analysis/methods , Models, Statistical , Whole Genome Sequencing/methods , Algorithms , Brain/pathology , Gene Frequency/genetics , Genome, Human/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , Neoplasms/genetics , Neoplasms/pathology , Polymorphism, Single Nucleotide/genetics
ABSTRACT
The role of brain somatic mutations in Alzheimer's disease (AD) is not well understood. Here, we perform deep whole-exome sequencing (average read depth 584×) in 111 postmortem hippocampal formation and matched blood samples from 52 patients with AD and 11 individuals not affected by AD. The number of somatic single nucleotide variations (SNVs) in AD brain specimens increases significantly with aging, and the rate of mutation accumulation in the brain is 4.8-fold slower than that in AD blood. The putatively pathogenic brain somatic mutations identified in 26.9% (14 of 52) of AD individuals are enriched in PI3K-AKT, MAPK, and AMPK pathway genes known to contribute to hyperphosphorylation of tau. We show that a pathogenic brain somatic mutation in PIN1 leads to a loss-of-function mutation. In vitro mimicking of haploinsufficiency of PIN1 aberrantly increases tau phosphorylation and aggregation. This study provides new insights into the genetic architecture underlying the pathogenesis of AD.
Subject(s)
Alzheimer Disease/genetics , NIMA-Interacting Peptidylprolyl Isomerase/genetics , Protein Aggregation, Pathological/genetics , tau Proteins/metabolism , Age Factors , Aged , Aged, 80 and over , Aging/genetics , Alzheimer Disease/pathology , Animals , Cell Line, Tumor , Female , Gene Knockdown Techniques , Haploinsufficiency , Hippocampus/cytology , Hippocampus/pathology , Humans , Loss of Function Mutation , Male , Mice , Middle Aged , Mutation Rate , NIMA-Interacting Peptidylprolyl Isomerase/metabolism , Neurons , Phosphorylation/genetics , Polymorphism, Single Nucleotide , Protein Aggregation, Pathological/pathology , Recombinant Proteins/genetics , Recombinant Proteins/metabolism , Exome Sequencing
ABSTRACT
The treatment of lung adenocarcinoma (LUAD) could benefit from the incorporation of precision medicine. This study aimed to identify cancer-related genetic alterations by next-generation sequencing (NGS) in resected LUAD samples from Korean patients and to determine their associations with clinical features. A total of 201 tumors and their matched peripheral blood samples were analyzed by targeted sequencing of 242 genes on the Illumina HiSeq 2500 platform, with a median depth of coverage greater than 500×. One hundred ninety-two tumors were amenable to data analysis. EGFR was the most frequently mutated gene, occurring in 106 (55%) patients, followed by TP53 (n = 67, 35%) and KRAS (n = 11, 6%). EGFR mutations were markedly more frequent in female patients and never-smokers. Smokers had a significantly higher tumor mutational burden (TMB) than never-smokers (average 4.84 non-synonymous mutations/megabase [mt/Mb] vs. 2.84 mt/Mb, p = 0.019). Somatic mutations of APC, CTNNB1, and AMER1 in the WNT signaling pathway were highly associated with shortened disease-free survival (DFS) compared to others (median DFS of 89 vs. 27 months, p = 0.018). Patients with low TMB, annotated as less than 2 mt/Mb, had longer DFS than those with high TMB (p = 0.041). A higher frequency of EGFR mutations and a lower frequency of KRAS mutations were observed in Korean LUAD patients. Profiles of 242 genes mapped in this study were compared with whole exome sequencing genetic profiles generated in The Cancer Genome Atlas Lung Adenocarcinoma. NGS-based diagnostics can provide clinically relevant information such as mutations or TMB from readily available formalin-fixed paraffin-embedded tissue.
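The TMB unit used above (non-synonymous mutations per megabase, with low TMB defined as less than 2 mt/Mb) amounts to a simple ratio, sketched below. The abstract does not state the sequenced footprint of the 242-gene panel, so the `covered_bases` value in the example is purely hypothetical, as are the function names.

```python
def tumor_mutational_burden(n_nonsynonymous, covered_bases):
    """Non-synonymous mutations per megabase of sequenced territory."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return n_nonsynonymous / (covered_bases / 1_000_000)

def tmb_group(tmb, threshold=2.0):
    """Classify as 'low' when TMB is below the threshold, else 'high'."""
    return "low" if tmb < threshold else "high"

# Hypothetical panel footprint of 1.5 Mb (not taken from the study).
covered_bases = 1_500_000
for n_mut in (2, 3, 6):
    tmb = tumor_mutational_burden(n_mut, covered_bases)
    print(n_mut, round(tmb, 2), tmb_group(tmb))
```

Because the denominator is the panel footprint, TMB values from a targeted panel and from whole-exome data are only comparable after accounting for the different territories covered.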
Subject(s)
Adenocarcinoma of Lung/genetics , Antineoplastic Agents/therapeutic use , Biomarkers, Tumor/genetics , Lung Neoplasms/genetics , Precision Medicine/methods , Adenocarcinoma of Lung/drug therapy , Adenocarcinoma of Lung/mortality , Adult , Aged , Aged, 80 and over , Antineoplastic Agents/pharmacology , Asian People/genetics , Biomarkers, Tumor/antagonists & inhibitors , DNA Mutational Analysis , Disease-Free Survival , Feasibility Studies , Female , Follow-Up Studies , High-Throughput Nucleotide Sequencing , Humans , Lung/pathology , Lung Neoplasms/drug therapy , Lung Neoplasms/mortality , Male , Middle Aged , Mutation , Republic of Korea/epidemiology , Smoking/epidemiology , Exome Sequencing
ABSTRACT
OBJECTIVE: To identify whether somatic mutations in SLC35A2 alter N-glycan structures in human brain tissues and cause nonlesional focal epilepsy (NLFE) or mild malformation of cortical development (mMCD). METHODS: Deep whole exome and targeted sequencing analyses were conducted for matched brain and blood tissues from patients with intractable NLFE and patients with mMCD who are negative for mutations in mTOR pathway genes. Furthermore, tissue glyco-capture and nanoLC/mass spectrometry analysis were performed to examine N-glycosylation in affected brain tissue. RESULTS: Six of the 31 (19.3%) study patients exhibited brain-only mutations in SLC35A2 (mostly nonsense and splicing site mutations) encoding a uridine diphosphate (UDP)-galactose transporter. Glycome analysis revealed the presence of an aberrant N-glycan series, including high degrees of N-acetylglucosamine, in brain tissues with SLC35A2 mutations. CONCLUSION: Our study suggests that brain somatic mutations in SLC35A2 cause intractable focal epilepsy with NLFE or mMCD via aberrant N-glycosylation in the affected brain.
ABSTRACT
Accumulation of DNA mutations alters amino acid sequence in the key domains of oncoproteins, leading to cellular malignant transformation. Due to redundancy of the genetic code, the same amino acid alteration can be achieved by multiple distinct genetic mutations, which are considered functionally identical and not actively distinguished in the current cancer genome research. For the first time, we analyzed the distribution of codon level transitions acquired by somatic mutations in human cancers. By analyzing the ~2.5 million nonsynonymous somatic single nucleotide variations (SNVs) found in the COSMIC database, we found 41 recurrent amino acid alterations whose DNA changes are significantly biased toward a specific codon transition. Additional analyses partially identified functional discrepancies between the favored and avoided codon transitions in terms of mutational process, codon usage, alternative splicing, and mRNA stability.
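The redundancy argument above can be made concrete by enumerating, under the standard genetic code, every single-nucleotide codon change that produces a given amino acid alteration. The sketch below only lists the possible codon transitions; it does not reproduce the study's bias test on COSMIC data, and the function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codon_transitions(ref_aa, alt_aa):
    """List (ref_codon, alt_codon) pairs that differ by exactly one base
    and change `ref_aa` into `alt_aa` (one-letter amino acid codes)."""
    pairs = []
    for codon, aa in CODON_TABLE.items():
        if aa != ref_aa:
            continue
        for i, base in product(range(3), BASES):
            if base == codon[i]:
                continue
            mutant = codon[:i] + base + codon[i + 1:]
            if CODON_TABLE[mutant] == alt_aa:
                pairs.append((codon, mutant))
    return sorted(pairs)

# Glycine to aspartate can arise from two distinct codon transitions.
print(codon_transitions("G", "D"))
# Arginine to serine can arise from six.
print(len(codon_transitions("R", "S")))
```

Counting observed mutations per listed transition for one recurrent amino acid alteration, and comparing the counts with an expectation, is the kind of tabulation on which a codon-level bias analysis would rest.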