Search | VHL Search Portal

Computational identification of micro-structural variations and their proteogenomic consequences in cancer.

Lin, Yen-Yi; Gawronski, Alexander; Hach, Faraz; Li, Sujun; Numanagic, Ibrahim; Sarrafi, Iman; Mishra, Swati; McPherson, Andrew; Collins, Colin C; Radovich, Milan; Tang, Haixu; Sahinalp, S Cenk.

Bioinformatics ; 34(10): 1672-1681, 2018 05 15.

Article in English | MEDLINE | ID: mdl-29267878

ABSTRACT

Motivation: Rapid advancement in high throughput genome and transcriptome sequencing (HTS) and mass spectrometry (MS) technologies has enabled the acquisition of the genomic, transcriptomic and proteomic data from the same tissue sample. We introduce a computational framework, ProTIE, to integratively analyze all three types of omics data for a complete molecular profile of a tissue sample. Our framework features MiStrVar, a novel algorithmic method to identify micro structural variants (microSVs) on genomic HTS data. Coupled with deFuse, a popular gene fusion detection method we developed earlier, MiStrVar can accurately profile structurally aberrant transcripts in tumors. Given the breakpoints obtained by MiStrVar and deFuse, our framework can then identify all relevant peptides that span the breakpoint junctions and match them with unique proteomic signatures. Observing structural aberrations in all three types of omics data validates their presence in the tumor samples. Results: We have applied our framework to all The Cancer Genome Atlas (TCGA) breast cancer Whole Genome Sequencing (WGS) and/or RNA-Seq datasets, spanning all four major subtypes, for which proteomics data from Clinical Proteomic Tumor Analysis Consortium (CPTAC) have been released. A recent study on this dataset focusing on SNVs has reported many that lead to novel peptides. Complementing and significantly broadening this study, we detected 244 novel peptides from 432 candidate genomic or transcriptomic sequence aberrations. Many of the fusions and microSVs we discovered have not been reported in the literature. Interestingly, the vast majority of these translated aberrations, fusions in particular, were private, demonstrating the extensive inter-genomic heterogeneity present in breast cancer. Many of these aberrations also have matching out-of-frame downstream peptides, potentially indicating novel protein sequence and structure. Availability and implementation: MiStrVar is available for download at https://bitbucket.org/compbio/mistrvar, and ProTIE is available at https://bitbucket.org/compbio/protie. Contact: cenksahi@indiana.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

Subject(s)

Breast Neoplasms/genetics , Gene Fusion , Neoplasm Proteins/genetics , Proteogenomics/methods , Software , Female , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Humans , Mass Spectrometry/methods , Neoplasm Proteins/analysis , Sequence Analysis, RNA/methods

SiNVICT: ultra-sensitive detection of single nucleotide variants and indels in circulating tumour DNA.

Kockan, Can; Hach, Faraz; Sarrafi, Iman; Bell, Robert H; McConeghy, Brian; Beja, Kevin; Haegert, Anne; Wyatt, Alexander W; Volik, Stanislav V; Chi, Kim N; Collins, Colin C; Sahinalp, S Cenk.

Bioinformatics ; 33(1): 26-34, 2017 01 01.

Article in English | MEDLINE | ID: mdl-27531099

ABSTRACT

MOTIVATION: Successful development and application of precision oncology approaches require robust elucidation of the genomic landscape of a patient's cancer and, ideally, the ability to monitor therapy-induced genomic changes in the tumour in an inexpensive and minimally invasive manner. Thanks to recent advances in sequencing technologies, 'liquid biopsy', the sampling of patient's bodily fluids such as blood and urine, is considered as one of the most promising approaches to achieve this goal. In many cancer patients, and especially those with advanced metastatic disease, deep sequencing of circulating cell free DNA (cfDNA) obtained from patient's blood yields a mixture of reads originating from the normal DNA and from multiple tumour subclones-called circulating tumour DNA or ctDNA. The ctDNA/cfDNA ratio as well as the proportion of ctDNA originating from specific tumour subclones depend on multiple factors, making comprehensive detection of mutations difficult, especially at early stages of cancer. Furthermore, sensitive and accurate detection of single nucleotide variants (SNVs) and indels from cfDNA is constrained by several factors such as the sequencing errors and PCR artifacts, and mapping errors related to repeat regions within the genome. In this article, we introduce SiNVICT, a computational method that increases the sensitivity and specificity of SNV and indel detection at very low variant allele frequencies. SiNVICT has the capability to handle multiple sequencing platforms with different error properties; it minimizes false positives resulting from mapping errors and other technology specific artifacts including strand bias and low base quality at read ends. SiNVICT also has the capability to perform time-series analysis, where samples from a patient sequenced at multiple time points are jointly examined to report locations of interest where there is a possibility that certain clones were wiped out by some treatment while some subclones gained selective advantage. RESULTS: We tested SiNVICT on simulated data as well as prostate cancer cell lines and cfDNA obtained from castration-resistant prostate cancer patients. On both simulated and biological data, SiNVICT was able to detect SNVs and indels with variant allele percentages as low as 0.5%. The lowest amounts of total DNA used for the biological data where SNVs and indels could be detected with very high sensitivity were 2.5 ng on the Ion Torrent platform and 10 ng on Illumina. With increased sequencing and mapping accuracy, SiNVICT might be utilized in clinical settings, making it possible to track the progress of point mutations and indels that are associated with resistance to cancer therapies and provide patients personalized treatment. We also compared SiNVICT with other popular SNV callers such as MuTect, VarScan2 and Freebayes. Our results show that SiNVICT performs better than these tools in most cases and allows further data exploration such as time-series analysis on cfDNA sequencing data. AVAILABILITY AND IMPLEMENTATION: SiNVICT is available at: https://sfu-compbio.github.io/sinvictSupplementary information: Supplementary data are available at Bioinformatics online. CONTACT: cenk@sfu.ca.

Subject(s)

DNA Mutational Analysis/methods , DNA, Neoplasm/blood , INDEL Mutation , Neoplasms/genetics , Point Mutation , Software , Gene Frequency , High-Throughput Nucleotide Sequencing/methods , Humans , Male , Neoplasms/blood , Sensitivity and Specificity

mrsFAST-Ultra: a compact, SNP-aware mapper for high performance sequencing applications.

Hach, Faraz; Sarrafi, Iman; Hormozdiari, Farhad; Alkan, Can; Eichler, Evan E; Sahinalp, S Cenk.

Nucleic Acids Res ; 42(Web Server issue): W494-500, 2014 Jul.

Article in English | MEDLINE | ID: mdl-24810850

ABSTRACT

High throughput sequencing (HTS) platforms generate unprecedented amounts of data that introduce challenges for processing and downstream analysis. While tools that report the 'best' mapping location of each read provide a fast way to process HTS data, they are not suitable for many types of downstream analysis such as structural variation detection, where it is important to report multiple mapping loci for each read. For this purpose we introduce mrsFAST-Ultra, a fast, cache oblivious, SNP-aware aligner that can handle the multi-mapping of HTS reads very efficiently. mrsFAST-Ultra improves mrsFAST, our first cache oblivious read aligner capable of handling multi-mapping reads, through new and compact index structures that reduce not only the overall memory usage but also the number of CPU operations per alignment. In fact the size of the index generated by mrsFAST-Ultra is 10 times smaller than that of mrsFAST. As importantly, mrsFAST-Ultra introduces new features such as being able to (i) obtain the best mapping loci for each read, and (ii) return all reads that have at most n mapping loci (within an error threshold), together with these loci, for any user specified n. Furthermore, mrsFAST-Ultra is SNP-aware, i.e. it can map reads to reference genome while discounting the mismatches that occur at common SNP locations provided by db-SNP; this significantly increases the number of reads that can be mapped to the reference genome. Notice that all of the above features are implemented within the index structure and are not simple post-processing steps and thus are performed highly efficiently. Finally, mrsFAST-Ultra utilizes multiple available cores and processors and can be tuned for various memory settings. Our results show that mrsFAST-Ultra is roughly five times faster than its predecessor mrsFAST. In comparison to newly enhanced popular tools such as Bowtie2, it is more sensitive (it can report 10 times or more mappings per read) and much faster (six times or more) in the multi-mapping mode. Furthermore, mrsFAST-Ultra has an index size of 2GB for the entire human reference genome, which is roughly half of that of Bowtie2. mrsFAST-Ultra is open source and it can be accessed at http://mrsfast.sourceforge.net.

Subject(s)

High-Throughput Nucleotide Sequencing/methods , Polymorphism, Single Nucleotide , Software , Genome, Human , Humans , Internet , Sequence Alignment

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL