1.
F S Sci ; 5(2): 130-140, 2024 May.
Article in English | MEDLINE | ID: mdl-38369016

ABSTRACT

OBJECTIVE: To determine if early spermatocytes can be enriched from a human testis biopsy using fluorescence-activated cell sorting (FACS). DESIGN: Potential surface markers for early spermatocytes were identified using bioinformatics analysis of single-cell RNA-sequenced human testis tissue. Testicular sperm extraction samples from three participants with normal spermatogenesis were digested into single-cell suspensions and cryopreserved. Two to four million cells were obtained from each and sorted by FACS as separate biologic replicates using antibodies for the identified surface markers. A portion from each biopsy remained unsorted to serve as controls. The sorted cells were then characterized for enrichment of early spermatocytes. SETTING: A laboratory study. PATIENTS: Three men with a diagnosis of obstructive azoospermia (age range, 30-40 years). INTERVENTION: None. MAIN OUTCOME MEASURES: Sorted cells were characterized for RNA expression of markers encompassing the stages of spermatogenesis. Sorting markers were validated by their reactivity on human testis formalin-fixed paraffin-embedded tissue. RESULTS: Serine protease 50 (TSP50) and SWI5-dependent homologous recombination repair protein 1 were identified as potential surface proteins specific for early spermatocytes. After FACS sorting, the TSP50-sorted populations accounted for 1.6%-8.9% of total populations and exhibited the greatest average-fold increases in RNA expression for the premeiotic marker stimulated by retinoic acid (STRA8), by 23-fold. Immunohistochemistry showed the staining pattern for TSP50 to be strong in premeiotic undifferentiated embryonic cell transcription factor 1-/doublesex and Mab-3 related transcription factor 1-/STRA8+ spermatogonia as well as SYCP3+/protamine 2- spermatocytes. 
CONCLUSION: This work shows that TSP50 can be used to enrich early STRA8-expressing spermatocytes from human testicular biopsies, providing a means for targeted single-cell RNA sequencing analysis and in vitro functional interrogation of germ cells during the onset of meiosis. This could enable investigation into details of the regulatory pathways underlying this critical stage of spermatogenesis, previously difficult to enrich from whole tissue samples.


Subjects
Flow Cytometry , Spermatocytes , Humans , Male , Spermatocytes/metabolism , Spermatocytes/pathology , Adult , Flow Cytometry/methods , Biopsy/methods , Spermatogenesis/physiology , Testis/pathology , Testis/metabolism , Azoospermia/pathology , Azoospermia/diagnosis , Azoospermia/metabolism , Azoospermia/genetics , Cell Separation/methods , Single-Cell Analysis/methods
2.
Res Sq ; 2024 Feb 02.
Article in English | MEDLINE | ID: mdl-38352568

ABSTRACT

Androgen receptor (AR)-mediated transcription plays a critical role in normal prostate development and prostate cancer growth. AR drives gene expression by binding to thousands of cis-regulatory elements (CREs) that loop to hundreds of target promoters. With multiple CREs interacting with a single promoter, it remains unclear how individual AR-bound CREs contribute to gene expression. To characterize the involvement of these CREs, we investigated the AR-driven epigenetic and chromosomal chromatin looping changes. We collected a kinetic multi-omic dataset comprising steady-state mRNA, chromatin accessibility, transcription factor binding, histone modifications, chromatin looping, and nascent RNA. Using an integrated regulatory network, we found that AR binding induces sequential changes in the epigenetic features at CREs, independent of gene expression. Further, we showed that binding of AR does not result in a substantial rewiring of chromatin loops, but instead increases the contact frequency of pre-existing loops to target promoters. Our results show that gene expression strongly correlates with the changes in contact frequency. We then proposed and experimentally validated an unbalanced multi-enhancer model in which the impact of AR-bound enhancers on gene expression is heterogeneous and proportional to their contact frequency with target gene promoters. Overall, these findings provide new insight into AR-mediated gene expression upon acute androgen stimulation and develop a mechanistic framework to investigate nuclear receptor-mediated perturbations.
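The proportionality at the heart of the unbalanced multi-enhancer model can be illustrated with a small sketch. It is not the authors' model: it only assumes that each enhancer's share of a gene's AR-driven expression change equals its share of the total promoter contact frequency. The function name, the enhancer labels, and the numbers are all hypothetical.

```python
def enhancer_contributions(contact_freq):
    """Split a gene's expression change across its AR-bound enhancers.

    contact_freq maps an enhancer label to its promoter contact frequency.
    Assumption (illustrative only): contribution is directly proportional
    to contact frequency, so the shares sum to 1.
    """
    total = sum(contact_freq.values())
    if total <= 0:
        raise ValueError("contact frequencies must sum to a positive value")
    return {name: freq / total for name, freq in contact_freq.items()}


# Hypothetical contact frequencies for three enhancers of one promoter.
shares = enhancer_contributions({"CRE_a": 6.0, "CRE_b": 3.0, "CRE_c": 1.0})
```

Under this assumption the enhancers contribute unequally ("unbalanced"): the most frequently contacting enhancer accounts for the largest share of the change.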

3.
Nucleic Acids Res ; 51(2): e11, 2023 01 25.
Article in English | MEDLINE | ID: mdl-36478271

ABSTRACT

Alternative splicing (AS) is an important mechanism in the development of many cancers, as novel or aberrant AS patterns play an important role as an independent onco-driver. In addition, cancer-specific AS is potentially an effective target of personalized cancer therapeutics. However, detecting AS events remains a challenging task, especially if these AS events are novel. This is exacerbated by the fact that existing transcriptome annotation databases are far from comprehensive, especially with regard to cancer-specific AS. Additionally, traditional sequencing technologies are severely limited by the short length of the generated reads, which rarely span more than a single splice junction site. Given these challenges, transcriptomic long-read (LR) sequencing is a promising approach for the detection and discovery of AS. We present Freddie, a computational annotation-independent isoform discovery and detection tool. Freddie takes as input transcriptomic LR sequencing of a sample alongside its genomic split alignment and computes a set of isoforms for the given sample. It then partitions the input reads into sets that can be processed independently and in parallel. For each partition, Freddie segments the genomic alignment of the reads into canonical exon segments. The goal of this segmentation is to be able to represent any potential isoform as a subset of these canonical exons. This segmentation is formulated as an optimization problem and is solved with a dynamic programming algorithm. Then, Freddie reconstructs the isoforms by jointly clustering and error-correcting the reads using the canonical segmentation as a succinct representation. The clustering and error-correcting step is formulated as an optimization problem, the Minimum Error Clustering into Isoforms (MErCi) problem, and is solved using integer linear programming (ILP).
We compare the performance of Freddie on simulated datasets with other isoform detection tools with varying dependence on annotation databases. We show that Freddie outperforms the other tools in its accuracy, including those given the complete ground truth annotation. We also run Freddie on a transcriptomic LR dataset generated in-house from a prostate cancer cell line with a matched short-read RNA-seq dataset. Freddie results in isoforms with a higher short-read cross-validation rate than the other tested tools. Freddie is open source and available at https://github.com/vpc-ccg/freddie/.
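The idea of representing each read as a subset of canonical segments can be sketched as follows. This is a deliberate simplification and not Freddie's algorithm: it treats every alignment boundary as a breakpoint instead of solving the segmentation optimization with dynamic programming, and it performs no error correction. The function names and coordinates are hypothetical.

```python
def canonical_segments(reads):
    """Cut the locus at every distinct alignment boundary.

    reads: list of reads, each a list of half-open (start, end) intervals.
    Simplification: all boundaries are kept; no optimization is solved.
    """
    bounds = sorted({pos for read in reads for interval in read for pos in interval})
    return list(zip(bounds, bounds[1:]))


def encode(read, segments):
    """Binary vector: 1 if a segment lies fully inside one of the read's intervals."""
    return [
        int(any(seg_start >= a and seg_end <= b for a, b in read))
        for seg_start, seg_end in segments
    ]


# Two hypothetical reads that differ in the start of their second exon.
reads = [[(0, 100), (200, 300)], [(0, 100), (150, 300)]]
segments = canonical_segments(reads)
vectors = [encode(read, segments) for read in reads]
```

Reads with identical vectors are candidates for the same isoform; clustering such vectors while tolerating a bounded number of flipped entries is the kind of problem the abstract describes as MErCi.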


Subjects
Alternative Splicing , Software , Transcriptome , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA-Seq , Sequence Analysis, RNA/methods
4.
Nucleic Acids Res ; 51(3): e18, 2023 02 22.
Article in English | MEDLINE | ID: mdl-36546757

ABSTRACT

The vast majority of disease-associated single nucleotide polymorphisms (SNPs) identified from genome-wide association studies (GWAS) are localized in non-coding regions. A significant fraction of these variants impact transcription factor binding to enhancer elements and alter gene expression. To functionally interrogate the activity of such variants we developed snpSTARRseq, a high-throughput experimental method that can interrogate the functional impact of hundreds to thousands of non-coding variants on enhancer activity. snpSTARRseq dramatically improves signal-to-noise by utilizing a novel sequencing and bioinformatic approach that increases both insert size and the number of variants tested per locus. Using this strategy, we interrogated known prostate cancer (PCa) risk-associated loci and demonstrated that 35% of them harbor SNPs that significantly altered enhancer activity. Combining these results with chromosomal looping data, we could identify interacting genes and provide a mechanism of action for 20 PCa GWAS risk regions. When benchmarked against orthogonal methods, snpSTARRseq showed a strong correlation with in vivo experimental allelic-imbalance studies, whereas there was no correlation with predictive in silico approaches. Overall, snpSTARRseq provides an integrated experimental and computational framework to functionally test non-coding genetic variants.


Subjects
Genome-Wide Association Study , Regulatory Sequences, Nucleic Acid , Humans , Male , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide , Transcription Factors/genetics
5.
Nat Genet ; 54(9): 1364-1375, 2022 09.
Article in English | MEDLINE | ID: mdl-36071171

ABSTRACT

Many genetic variants affect disease risk by altering context-dependent gene regulation. Such variants are difficult to study mechanistically using current methods that link genetic variation to steady-state gene expression levels, such as expression quantitative trait loci (eQTLs). To address this challenge, we developed the cistrome-wide association study (CWAS), a framework for identifying genotypic and allele-specific effects on chromatin that are also associated with disease. In prostate cancer, CWAS identified regulatory elements and androgen receptor-binding sites that explained the association at 52 of 98 known prostate cancer risk loci and discovered 17 additional risk loci. CWAS implicated key developmental transcription factors in prostate cancer risk that are overlooked by eQTL-based approaches due to context-dependent gene regulation. We experimentally validated associations and demonstrated the extensibility of CWAS to additional epigenomic datasets and phenotypes, including response to prostate cancer treatment. CWAS is a powerful and biologically interpretable paradigm for studying variants that influence traits by affecting transcriptional regulation.


Subjects
Chromatin , Prostatic Neoplasms , Chromatin/genetics , Gene Expression Regulation , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Male , Polymorphism, Single Nucleotide/genetics , Prostatic Neoplasms/genetics , Quantitative Trait Loci/genetics
6.
BMC Genomics ; 23(1): 129, 2022 Feb 14.
Article in English | MEDLINE | ID: mdl-35164688

ABSTRACT

BACKGROUND: The advent of next-generation sequencing technologies empowered a wide variety of transcriptomics studies. A widely studied topic is gene fusion, which is observed in many cancer types and suspected of having oncogenic properties. Gene fusions are the result of structural genomic events that bring two genes close together and result in a fused transcript. This is different from fusion transcripts created during or after the transcription process. These chimeric transcripts are also known as read-through and trans-splicing transcripts. Gene fusion discovery with short reads is a well-studied problem, and many methods have been developed. But the sensitivity of these methods is limited by the technology, especially the short read length. Advances in long-read sequencing technologies allow the generation of long transcriptomics reads at a low cost. Transcriptomic long-read sequencing presents unique opportunities to overcome the shortcomings of short-read technologies for gene fusion detection while introducing new challenges. RESULTS: We present Genion, a sensitive and fast gene fusion detection method that can also detect read-through events. We compare Genion against a recently introduced long-read gene fusion discovery method, LongGF, on both simulated and real datasets. On simulated data, Genion accurately identifies the gene fusions, and its clustering accuracy for detecting fusion reads is better than that of LongGF. Furthermore, our results on the breast cancer cell line MCF-7 show that Genion correctly identifies all the experimentally validated gene fusions. CONCLUSIONS: Genion is an accurate gene fusion caller. Genion is implemented in C++ and is available at https://github.com/vpc-ccg/genion.


Subjects
Software , Transcriptome , Gene Fusion , Genomics , High-Throughput Nucleotide Sequencing
7.
Nat Cell Biol ; 23(9): 1023-1034, 2021 09.
Article in English | MEDLINE | ID: mdl-34489572

ABSTRACT

Cancers adapt to increasingly potent targeted therapies by reprogramming their phenotype. Here we investigated such a phenomenon in prostate cancer, in which tumours can escape epithelial lineage confinement and transition to a high-plasticity state as an adaptive response to potent androgen receptor (AR) antagonism. We found that AR activity can be maintained as tumours adopt alternative lineage identities, with changes in chromatin architecture guiding AR transcriptional rerouting. The epigenetic regulator enhancer of zeste homologue 2 (EZH2) co-occupies the reprogrammed AR cistrome to transcriptionally modulate stem cell and neuronal gene networks, granting privileges associated with both fates. This function of EZH2 was associated with T350 phosphorylation and establishment of a non-canonical polycomb subcomplex. Our study provides mechanistic insights into the plasticity of the lineage-infidelity state governed by AR reprogramming that enabled us to redirect cell fate by modulating EZH2 and AR, highlighting the clinical potential of reversing resistance phenotypes.


Subjects
Gene Expression Regulation, Neoplastic/genetics , Gene Regulatory Networks/genetics , Prostatic Neoplasms/pathology , Receptors, Androgen/metabolism , Cell Line, Tumor , Enhancer of Zeste Homolog 2 Protein/genetics , Enhancer of Zeste Homolog 2 Protein/metabolism , Gene Regulatory Networks/physiology , Humans , Male , Prostatic Neoplasms/genetics , Prostatic Neoplasms/metabolism , Receptors, Androgen/genetics , Signal Transduction/physiology
8.
Genome Biol ; 22(1): 149, 2021 05 11.
Article in English | MEDLINE | ID: mdl-33975627

ABSTRACT

BACKGROUND: Androgen receptor (AR) is critical to the initiation, growth, and progression of prostate cancer. Once activated, the AR binds to cis-regulatory enhancer elements on DNA that drive gene expression. Yet, there are 10-100× more binding sites than differentially expressed genes. It is unclear how or if these excess binding sites impact gene transcription. RESULTS: To characterize the regulatory logic of AR-mediated transcription, we generated a locus-specific map of enhancer activity by functionally testing all common clinical AR binding sites with Self-Transcribing Active Regulatory Regions sequencing (STARRseq). Only 7% of AR binding sites displayed androgen-dependent enhancer activity. Instead, the vast majority of AR binding sites were either inactive or constitutively active enhancers. These annotations strongly correlated with enhancer-associated features of both in vitro cell lines and clinical prostate cancer samples. Evaluating the effect of each enhancer class on transcription, we found that AR-regulated enhancers frequently interact with promoters and form central chromosomal loops that are required for transcription. Somatic mutations of these critical AR-regulated enhancers often impact enhancer activity. CONCLUSIONS: Using a functional map of AR enhancer activity, we demonstrated that AR-regulated enhancers act as a regulatory hub that increases interactions with other AR binding sites and gene promoters.


Subjects
Enhancer Elements, Genetic , Receptors, Androgen/genetics , Cell Line, Tumor , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Genome, Human , Humans , Male , Molecular Sequence Annotation , Mutation/genetics , Polymorphism, Single Nucleotide/genetics , Prostatic Neoplasms/genetics , Reproducibility of Results
9.
Bioinformatics ; 36(12): 3703-3711, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32259207

ABSTRACT

MOTIVATION: The ubiquitous abundance of circular RNAs (circRNAs) has been revealed by performing high-throughput sequencing in a variety of eukaryotes. circRNAs are related to some diseases, such as cancer, in which they act as oncogenes or tumor suppressors and, therefore, have the potential to be used as biomarkers or therapeutic targets. Accurate and rapid detection of circRNAs from short reads remains computationally challenging. This is because identifying chimeric reads, which is essential for finding back-splice junctions, is a complex process. The sensitivity of discovery methods relies, to a high degree, on the underlying mapper that is used for finding chimeric reads. Furthermore, all the available circRNA discovery pipelines are resource intensive. RESULTS: We introduce CircMiner, a novel stand-alone circRNA detection method that rapidly identifies and filters out linear RNA sequencing reads and detects back-splice junctions. CircMiner employs a rapid pseudo-alignment technique to identify linear reads that originate from transcripts, genes or the genome. CircMiner further processes the remaining reads to identify the back-splice junctions and detect circRNAs with single-nucleotide resolution. We evaluated the efficacy of CircMiner using simulated datasets generated from known back-splice junctions and showed that CircMiner has superior accuracy and speed compared to the existing circRNA detection tools. Additionally, on two RNase R-treated cell line datasets, CircMiner was able to detect most of the consistent, high-confidence circRNAs compared to untreated samples of the same cell line. AVAILABILITY AND IMPLEMENTATION: CircMiner is implemented in C++ and is available online at https://github.com/vpc-ccg/circminer. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
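What distinguishes a back-splice junction from an ordinary splice can be shown with a short sketch. It is not CircMiner's pseudo-alignment method: it only assumes that a chimeric read has already been split into two mapped segments, listed in transcript (5' to 3') order, and checks whether the second segment maps upstream of the first. The `Segment` type and `classify_chimeric` function are hypothetical names.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 0-based, inclusive
    end: int     # exclusive


def classify_chimeric(first: Segment, second: Segment) -> str:
    """Label a two-segment read; segments are given in transcript order."""
    if first.chrom != second.chrom or first.strand != second.strand:
        return "other"
    if first.strand == "+":
        # Linear splicing continues downstream; a back-splice jumps upstream.
        linear = second.start >= first.end
        back = second.end <= first.start
    else:
        # On the minus strand, transcript order runs toward lower coordinates.
        linear = second.end <= first.start
        back = second.start >= first.end
    if back:
        return "back-splice"
    return "linear" if linear else "other"


label = classify_chimeric(Segment("chr1", "+", 500, 600), Segment("chr1", "+", 100, 200))
```

A read labelled `back-splice` supports a junction joining the end of a downstream exon to the start of an upstream one, which is the signature of a circular RNA.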


Subjects
RNA, Circular , RNA , Base Sequence , RNA/genetics , RNA Splicing , Sequence Analysis, RNA
10.
Sci Rep ; 10(1): 2026, 2020 02 06.
Article in English | MEDLINE | ID: mdl-32029828

ABSTRACT

Clear-cell renal cell carcinoma (ccRCC) is a common therapy-resistant disease with aberrant angiogenic and immunosuppressive features. Patients with metastatic disease are treated with targeted therapies based on clinical features: low-risk patients are usually treated with anti-angiogenic drugs and intermediate/high-risk patients with immune therapy. However, there are no biomarkers available to guide treatment choice for these patients. A recently published phase II clinical trial observed a correlation between ccRCC patients' clustering and their response to targeted therapy. However, the clustering of these groups was not distinct. Here, we analyzed the gene expression profiles of 469 ccRCC patients using a feature selection technique and developed a refined 66-gene signature for improved sub-classification of patients. Moreover, we have identified a novel comprehensive expression profile to distinguish between migratory stromal and immune cells. Furthermore, the proposed 66-gene signature was validated using a different cohort of 64 ccRCC patients. These findings are foundational for the development of reliable biomarkers that may guide treatment decision-making and improve therapy response in ccRCC patients.


Subjects
Angiogenesis Inhibitors/therapeutic use , Antineoplastic Agents, Immunological/therapeutic use , Biomarkers, Tumor/genetics , Carcinoma, Renal Cell/drug therapy , Kidney Neoplasms/drug therapy , Precision Medicine/methods , Angiogenesis Inhibitors/pharmacology , Antineoplastic Agents, Immunological/pharmacology , Biomarkers, Tumor/antagonists & inhibitors , Carcinoma, Renal Cell/genetics , Clinical Decision-Making/methods , Cluster Analysis , Cohort Studies , Datasets as Topic , Feasibility Studies , Female , Gene Expression Profiling , Humans , Kidney Neoplasms/genetics , Male , Medical Oncology/methods , Middle Aged , Patient Selection , Prognosis , Transcriptome/genetics
11.
Genome Res ; 29(11): 1860-1877, 2019 11.
Article in English | MEDLINE | ID: mdl-31628256

ABSTRACT

Available computational methods for tumor phylogeny inference via single-cell sequencing (SCS) data typically aim to identify the most likely perfect phylogeny tree satisfying the infinite sites assumption (ISA). However, the limitations of SCS technologies, including frequent allele dropout and variable sequence coverage, may prohibit a perfect phylogeny. In addition, ISA violations are commonly observed in tumor phylogenies due to the loss of heterozygosity, deletions, and convergent evolution. In order to address such limitations, we introduce the optimal subperfect phylogeny problem, which asks to integrate SCS data with matching bulk sequencing data by minimizing a linear combination of potential false negatives (due to allele dropout or variance in sequence coverage), false positives (due to read errors) among mutation calls, and the number of mutations that violate ISA (real or because of incorrect copy number estimation). We then describe a combinatorial formulation to solve this problem which ensures that several lineage constraints imposed by the use of variant allele frequencies (VAFs, derived from bulk sequence data) are satisfied. We express our formulation both in the form of an integer linear program (ILP) and, as a first in tumor phylogeny reconstruction, a Boolean constraint satisfaction problem (CSP) and solve them by leveraging state-of-the-art ILP/CSP solvers. The resulting method, which we name PhISCS, is the first to integrate SCS and bulk sequencing data while accounting for ISA-violating mutations. In contrast to the alternative methods, typically based on probabilistic approaches, PhISCS provides a guarantee of optimality in reported solutions. Using simulated and real data sets, we demonstrate that PhISCS is more general and accurate than all available approaches.
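Whether a binary cell-by-mutation matrix admits a perfect phylogeny (with a mutation-free root) can be tested with the classical three-gamete rule: no pair of mutation columns may show all three of the patterns (1, 0), (0, 1) and (1, 1) across cells. The sketch below implements only that check; it is not the PhISCS ILP or CSP formulation, and the function name and toy matrices are hypothetical.

```python
from itertools import combinations


def conflicting_pairs(matrix):
    """Return mutation column pairs (i, j) that violate the three-gamete rule.

    matrix: list of rows (cells), each a list of 0/1 mutation calls.
    An empty result means the matrix is consistent with a perfect phylogeny
    under the infinite sites assumption.
    """
    n_mutations = len(matrix[0]) if matrix else 0
    conflicts = []
    for i, j in combinations(range(n_mutations), 2):
        seen = {(row[i], row[j]) for row in matrix}
        if {(1, 0), (0, 1), (1, 1)} <= seen:
            conflicts.append((i, j))
    return conflicts


# Toy matrices: three cells by three mutations, and three cells by two mutations.
clean = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
noisy = [[1, 0], [0, 1], [1, 1]]
clean_conflicts = conflicting_pairs(clean)
noisy_conflicts = conflicting_pairs(noisy)
```

Resolving such conflicts is where the optimization comes in: each conflict must be explained by flipping entries (false negatives or false positives) or by marking a mutation as violating ISA, and the abstract describes minimizing a weighted combination of those choices.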


Subjects
Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Neoplasms/genetics , Phylogeny , Single-Cell Analysis/methods , Humans , Neoplasms/pathology
12.
Nucleic Acids Res ; 47(7): e38, 2019 04 23.
Article in English | MEDLINE | ID: mdl-30759232

ABSTRACT

MOTIVATION: Cancer is a complex disease that involves rapidly evolving cells, often forming multiple distinct clones. In order to effectively understand progression of a patient-specific tumor, one needs to comprehensively sample tumor DNA at multiple time points, ideally obtained through inexpensive and minimally invasive techniques. Current sequencing technologies make the 'liquid biopsy' possible, which involves sampling a patient's blood or urine and sequencing the circulating cell-free DNA (cfDNA). A certain percentage of this DNA originates from the tumor, known as circulating tumor DNA (ctDNA). The ratio of ctDNA may be extremely low in the sample, and the ctDNA may originate from multiple tumors or clones. These factors present unique challenges for applying existing tools and workflows to the analysis of ctDNA, especially in the detection of structural variations, which rely on sufficient read coverage to be detectable. RESULTS: Here we introduce SViCT, a structural variation (SV) detection tool designed to handle the challenges associated with cfDNA analysis. SViCT can detect breakpoints and sequences of various structural variations including deletions, insertions, inversions, duplications and translocations. SViCT extracts discordant read pairs, one-end anchors and soft-clipped/split reads, assembles them into contigs, and re-maps contig intervals to a reference genome using an efficient k-mer indexing approach. The intervals are then joined using a combination of graph and greedy algorithms to identify specific structural variant signatures. We assessed the performance of SViCT and compared it to state-of-the-art tools using simulated cfDNA datasets with properties matching those of real cfDNA samples. The positive predictive value and sensitivity of our tool were superior to those of all the tested tools, and reasonable performance was maintained down to the lowest dilution of 0.01% tumor DNA in simulated datasets.
Additionally, SViCT was able to detect all known SVs in two real cfDNA reference datasets (at 0.6-5% ctDNA) and predict a novel structural variant in a prostate cancer cohort. AVAILABILITY: SViCT is available at https://github.com/vpc-ccg/svict. Contact: faraz.hach@ubc.ca.
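Re-mapping a contig to a reference through a k-mer index can be sketched in a few lines. This is a generic exact-match voting scheme, not SViCT's implementation: every contig k-mer found in the index votes for the reference offset it implies, and the offset with the most votes wins. The function names and the toy sequences are hypothetical.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map each k-mer to the list of reference positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def locate(contig, index, k):
    """Return the best-supported reference start for the contig, or None."""
    votes = Counter()
    for offset in range(len(contig) - k + 1):
        for pos in index.get(contig[offset:offset + k], ()):
            votes[pos - offset] += 1  # implied start of the whole contig
    return votes.most_common(1)[0][0] if votes else None


# Toy reference; the contig is an exact copy of positions 5..14.
reference = "ACGTTGCAAGGCTTAACCGG"
index = build_kmer_index(reference, k=4)
best_start = locate("GCAAGGCTTA", index, k=4)
```

A contig spanning a structural variant breakpoint would instead split its votes between two distant offsets, and those paired intervals are the raw material for calling deletions, inversions, or translocations.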


Subjects
Algorithms , Cell-Free Nucleic Acids/blood , Cell-Free Nucleic Acids/genetics , DNA Mutational Analysis/methods , Mutation , Circulating Tumor DNA/blood , Circulating Tumor DNA/genetics , Datasets as Topic , Humans , Male , Prostatic Neoplasms/genetics , Sensitivity and Specificity
13.
Genome Med ; 11(1): 8, 2019 02 18.
Article in English | MEDLINE | ID: mdl-30777124

ABSTRACT

BACKGROUND: Malignant peritoneal mesothelioma (PeM) is a rare and fatal cancer that originates from the peritoneal lining of the abdomen. Standard treatment of PeM is limited to cytoreductive surgery and/or chemotherapy, and no effective targeted therapies for PeM exist. Some immune checkpoint inhibitor studies of mesothelioma have found positivity to be associated with a worse prognosis. METHODS: To search for novel therapeutic targets for PeM, we performed a comprehensive integrative multi-omics analysis of the genome, transcriptome, and proteome of 19 treatment-naïve PeM, and in particular, we examined BAP1 mutation and copy number status and its relationship to immune checkpoint inhibitor activation. RESULTS: We found that PeM could be divided into tumors with an inflammatory tumor microenvironment and those without and that this distinction correlated with haploinsufficiency of BAP1. To further investigate the role of BAP1, we used our recently developed cancer driver gene prioritization algorithm, HIT'nDRIVE, and observed that PeM with BAP1 haploinsufficiency form a distinct molecular subtype characterized by distinct gene expression patterns of chromatin remodeling, DNA repair pathways, and immune checkpoint receptor activation. We demonstrate that this subtype is correlated with an inflammatory tumor microenvironment and thus is a candidate for immune checkpoint blockade therapies. CONCLUSIONS: Our findings reveal BAP1 to be a potential, easily trackable prognostic and predictive biomarker for PeM immunotherapy that refines PeM disease classification. BAP1 stratification may improve drug response rates in ongoing phases I and II clinical trials exploring the use of immune checkpoint blockade therapies in PeM in which BAP1 status is not considered. This integrated molecular characterization provides a comprehensive foundation for improved management of a subset of PeM patients.


Subjects
Biomarkers, Tumor/genetics , Haploinsufficiency , Mesothelioma/genetics , Peritoneal Neoplasms/genetics , Tumor Suppressor Proteins/genetics , Ubiquitin Thiolesterase/genetics , Biomarkers, Tumor/metabolism , Humans , Immunotherapy , Mesothelioma/classification , Mesothelioma/therapy , Mutation , Peritoneal Neoplasms/classification , Peritoneal Neoplasms/therapy , Tumor Microenvironment , Tumor Suppressor Proteins/metabolism , Ubiquitin Thiolesterase/metabolism
14.
Bioinformatics ; 35(11): 1829-1836, 2019 06 01.
Article in English | MEDLINE | ID: mdl-30351359

ABSTRACT

MOTIVATION: Next-Generation Sequencing has led to the availability of massive genomic datasets whose processing raises many challenges, including the handling of sequencing errors. This is especially pertinent in cancer genomics, e.g. for detecting low allele frequency variations from circulating tumor DNA. Barcode tagging of DNA molecules with unique molecular identifiers (UMIs) attempts to mitigate sequencing errors; UMI-tagged molecules are polymerase chain reaction (PCR) amplified, and the PCR copies of UMI-tagged molecules are sequenced independently. However, the PCR and sequencing steps can generate errors in the sequenced reads that can be located in the barcode and/or the DNA sequence. Analyzing UMI-tagged sequencing data requires an initial clustering step, with the aim of grouping reads sequenced from PCR duplicates of the same UMI-tagged molecule into a single cluster, and the size of the current datasets requires this clustering process to be resource-efficient. RESULTS: We introduce Calib, a computational tool that clusters paired-end reads from UMI-tagged sequencing experiments generated by substitution-error-dominant sequencing platforms such as Illumina. Calib clusters are defined as connected components of a graph whose edges are defined in terms of both barcode similarity and read sequence similarity. The graph is constructed efficiently using locality sensitive hashing and MinHashing techniques. Calib's default clustering parameters are optimized empirically, for different UMI and read lengths, using a simulation module that is packaged with Calib. Compared to other tools, Calib has the best accuracy on simulated data, while maintaining reasonable runtime and memory footprint. On a real dataset, Calib runs with far fewer resources than alignment-based methods, and its clusters reduce the number of tentative false positives in downstream variation calling.
AVAILABILITY AND IMPLEMENTATION: Calib is implemented in C++ and its simulation module is implemented in Python. Calib is available at https://github.com/vpc-ccg/calib. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
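Clustering as connected components of a similarity graph can be illustrated with a small union-find sketch. It is not Calib's implementation: edges are found by brute-force comparison of all read pairs with Hamming distance, whereas the abstract describes locality sensitive hashing and MinHashing to avoid that quadratic step. The thresholds, function names, and reads below are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads, max_barcode_dist=1, max_seq_dist=2):
    """Group (barcode, sequence) reads into connected components.

    Two reads share an edge when both their barcodes and their sequences
    are within the given Hamming distances (assumed thresholds).
    """
    parent = list(range(len(reads)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(reads)):
        for j in range(i + 1, len(reads)):
            (bc_i, seq_i), (bc_j, seq_j) = reads[i], reads[j]
            if (len(bc_i) == len(bc_j) and len(seq_i) == len(seq_j)
                    and hamming(bc_i, bc_j) <= max_barcode_dist
                    and hamming(seq_i, seq_j) <= max_seq_dist):
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(reads)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


reads = [
    ("ACGT", "TTTTGGGG"),
    ("ACGA", "TTTTGGGG"),  # one barcode error relative to read 0
    ("GGCC", "TTTTGGGG"),  # same sequence, unrelated barcode
    ("ACGT", "TTTAGGGG"),  # one sequence error relative to read 0
]
clusters = cluster_reads(reads)
```

Requiring similarity in both the barcode and the sequence keeps read 2 apart even though its sequence is identical to read 0, which is the point of combining the two signals.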


Subjects
High-Throughput Nucleotide Sequencing , Software , Algorithms , Cluster Analysis , DNA , Sequence Analysis, DNA
15.
Bioinformatics ; 34(10): 1672-1681, 2018 05 15.
Article in English | MEDLINE | ID: mdl-29267878

ABSTRACT

Motivation: Rapid advancement in high throughput genome and transcriptome sequencing (HTS) and mass spectrometry (MS) technologies has enabled the acquisition of the genomic, transcriptomic and proteomic data from the same tissue sample. We introduce a computational framework, ProTIE, to integratively analyze all three types of omics data for a complete molecular profile of a tissue sample. Our framework features MiStrVar, a novel algorithmic method to identify micro structural variants (microSVs) on genomic HTS data. Coupled with deFuse, a popular gene fusion detection method we developed earlier, MiStrVar can accurately profile structurally aberrant transcripts in tumors. Given the breakpoints obtained by MiStrVar and deFuse, our framework can then identify all relevant peptides that span the breakpoint junctions and match them with unique proteomic signatures. Observing structural aberrations in all three types of omics data validates their presence in the tumor samples. Results: We have applied our framework to all The Cancer Genome Atlas (TCGA) breast cancer Whole Genome Sequencing (WGS) and/or RNA-Seq datasets, spanning all four major subtypes, for which proteomics data from Clinical Proteomic Tumor Analysis Consortium (CPTAC) have been released. A recent study on this dataset focusing on SNVs has reported many that lead to novel peptides. Complementing and significantly broadening this study, we detected 244 novel peptides from 432 candidate genomic or transcriptomic sequence aberrations. Many of the fusions and microSVs we discovered have not been reported in the literature. Interestingly, the vast majority of these translated aberrations, fusions in particular, were private, demonstrating the extensive inter-genomic heterogeneity present in breast cancer. Many of these aberrations also have matching out-of-frame downstream peptides, potentially indicating novel protein sequence and structure. 
Availability and implementation: MiStrVar is available for download at https://bitbucket.org/compbio/mistrvar, and ProTIE is available at https://bitbucket.org/compbio/protie. Contact: cenksahi@indiana.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
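The abstract's step of finding peptides that span a breakpoint junction can be illustrated with a minimal sketch. This is not ProTIE's implementation: the function name `junction_peptides`, the fixed window length `k`, and the toy sequences are hypothetical, and the sketch assumes the two fusion partners are already translated in frame.

```python
def junction_peptides(left: str, right: str, k: int) -> list[str]:
    """Return every length-k window of left+right that contains
    residues from both sides of the junction (hypothetical helper)."""
    fused = left + right
    j = len(left)  # index of the first residue contributed by `right`
    # A window starting at s covers [s, s + k); it spans the junction
    # when it includes index j - 1 (last left residue) and index j.
    start_min = max(0, j - k + 1)
    start_max = min(j - 1, len(fused) - k)
    return [fused[s:s + k] for s in range(start_min, start_max + 1)]


if __name__ == "__main__":
    # Toy sequences, not real proteins.
    print(junction_peptides("ABCDE", "VWXYZ", 3))
```

In a real pipeline the candidate windows would then be matched against mass spectrometry spectra; that step is outside this sketch.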


Subjects
Breast Neoplasms/genetics , Gene Fusion , Neoplasm Proteins/genetics , Proteogenomics/methods , Software , Female , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Humans , Mass Spectrometry/methods , Neoplasm Proteins/analysis , Sequence Analysis, RNA/methods
16.
Clin Cancer Res ; 23(24): 7596-7607, 2017 Dec 15.
Article in English | MEDLINE | ID: mdl-28954787

ABSTRACT

Purpose: Gene fusions are frequently found in prostate cancer and may result in the formation of unique chimeric amino acid sequences (CASQ) that span the breakpoint of two fused gene products. This study evaluated the potential for fusion-derived CASQs to be a source of tumor neoepitopes, and determined their relationship to patterns of immune signatures in prostate cancer patients. Experimental Design: A computational strategy was used to identify CASQs and their corresponding predicted MHC class I epitopes using RNA-Seq data from The Cancer Genome Atlas of prostate tumors. In vitro peptide-specific T-cell expansion was performed to identify CASQ-reactive T cells. A multivariate analysis was used to relate patterns of in silico-predicted tumor-infiltrating immune cells with prostate tumors harboring these mutational events. Results: Eighty-seven percent of tumors contained gene fusions with a mean of 12 per tumor. In total, 41% of fusion-positive tumors were found to encode CASQs. Within these tumors, 87% gave rise to predicted MHC class I-binding epitopes. This observation was more prominent when patients were stratified into low- and intermediate/high-risk categories. One of the identified CASQs from the recurrent TMPRSS2:ERG type VI fusion contained several high-affinity HLA-restricted epitopes. These peptides bound HLA-A*02:01 in vitro and were recognized by CD8+ T cells. Finally, the presence of fusions and CASQs was associated with expression of immune cell infiltration. Conclusions: Mutanome analysis of gene fusion-derived CASQs can give rise to patient-specific predicted neoepitopes. Moreover, these fusions predicted patterns of immune cell infiltration within a subgroup of prostate cancer patients. Clin Cancer Res; 23(24); 7596-607. ©2017 AACR.


Subjects
Epitopes, T-Lymphocyte/genetics , HLA-A2 Antigen/genetics , Oncogene Proteins, Fusion/genetics , Prostatic Neoplasms/immunology , Amino Acid Sequence/genetics , CD8-Positive T-Lymphocytes/immunology , DNA Mutational Analysis , Epitopes, T-Lymphocyte/immunology , Genes, MHC Class I/genetics , Genes, MHC Class I/immunology , HLA-A2 Antigen/immunology , Humans , Male , Oncogene Proteins, Fusion/immunology , Prostatic Neoplasms/genetics , Prostatic Neoplasms/pathology , Protein Binding , T-Lymphocytes/immunology
17.
Bioinformatics ; 33(1): 26-34, 2017 01 01.
Article in English | MEDLINE | ID: mdl-27531099

ABSTRACT

MOTIVATION: Successful development and application of precision oncology approaches require robust elucidation of the genomic landscape of a patient's cancer and, ideally, the ability to monitor therapy-induced genomic changes in the tumour in an inexpensive and minimally invasive manner. Thanks to recent advances in sequencing technologies, 'liquid biopsy', the sampling of a patient's bodily fluids such as blood and urine, is considered one of the most promising approaches to achieve this goal. In many cancer patients, and especially those with advanced metastatic disease, deep sequencing of circulating cell free DNA (cfDNA) obtained from the patient's blood yields a mixture of reads originating from the normal DNA and from multiple tumour subclones (the latter called circulating tumour DNA or ctDNA). The ctDNA/cfDNA ratio as well as the proportion of ctDNA originating from specific tumour subclones depend on multiple factors, making comprehensive detection of mutations difficult, especially at early stages of cancer. Furthermore, sensitive and accurate detection of single nucleotide variants (SNVs) and indels from cfDNA is constrained by several factors such as sequencing errors, PCR artifacts, and mapping errors related to repeat regions within the genome. In this article, we introduce SiNVICT, a computational method that increases the sensitivity and specificity of SNV and indel detection at very low variant allele frequencies. SiNVICT has the capability to handle multiple sequencing platforms with different error properties; it minimizes false positives resulting from mapping errors and other technology specific artifacts including strand bias and low base quality at read ends. 
SiNVICT also has the capability to perform time-series analysis, where samples from a patient sequenced at multiple time points are jointly examined to report locations of interest where there is a possibility that certain clones were wiped out by some treatment while some subclones gained selective advantage. RESULTS: We tested SiNVICT on simulated data as well as prostate cancer cell lines and cfDNA obtained from castration-resistant prostate cancer patients. On both simulated and biological data, SiNVICT was able to detect SNVs and indels with variant allele percentages as low as 0.5%. The lowest amounts of total DNA used for the biological data where SNVs and indels could be detected with very high sensitivity were 2.5 ng on the Ion Torrent platform and 10 ng on Illumina. With increased sequencing and mapping accuracy, SiNVICT might be utilized in clinical settings, making it possible to track the progress of point mutations and indels that are associated with resistance to cancer therapies and provide patients with personalized treatment. We also compared SiNVICT with other popular SNV callers such as MuTect, VarScan2 and Freebayes. Our results show that SiNVICT performs better than these tools in most cases and allows further data exploration such as time-series analysis on cfDNA sequencing data. AVAILABILITY AND IMPLEMENTATION: SiNVICT is available at: https://sfu-compbio.github.io/sinvict. Supplementary information: Supplementary data are available at Bioinformatics online. CONTACT: cenk@sfu.ca.
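To make the reported 0.5% variant allele percentage concrete, the following minimal sketch computes a variant allele fraction from read counts and applies a simple cutoff. It is not SiNVICT's algorithm, which the abstract describes as also handling platform error profiles, strand bias and base quality; the function names and the `min_alt_reads` safeguard are hypothetical.

```python
def variant_allele_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must be between 0 and total_reads")
    return alt_reads / total_reads


def is_candidate(alt_reads: int, total_reads: int,
                 min_vaf: float = 0.005, min_alt_reads: int = 3) -> bool:
    """Naive filter: enough supporting reads AND fraction at or above min_vaf.

    min_vaf=0.005 mirrors the 0.5% figure in the abstract; min_alt_reads
    is an illustrative assumption, not a value taken from the paper.
    """
    return (alt_reads >= min_alt_reads
            and variant_allele_fraction(alt_reads, total_reads) >= min_vaf)


if __name__ == "__main__":
    # 10 supporting reads at 1000x depth is a 1% variant allele fraction.
    print(variant_allele_fraction(10, 1000), is_candidate(10, 1000))
```

At a 0.5% cutoff, a site sequenced to 1000x depth needs at least 5 supporting reads, which is why very deep sequencing is needed to call low-frequency variants.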


Subjects
DNA Mutational Analysis/methods , DNA, Neoplasm/blood , INDEL Mutation , Neoplasms/genetics , Point Mutation , Software , Gene Frequency , High-Throughput Nucleotide Sequencing/methods , Humans , Male , Neoplasms/blood , Sensitivity and Specificity
18.
Nat Genet ; 47(7): 736-45, 2015 Jul.
Article in English | MEDLINE | ID: mdl-26005866

ABSTRACT

Herein we provide a detailed molecular analysis of the spatial heterogeneity of clinically localized, multifocal prostate cancer to delineate new oncogenes or tumor suppressors. We initially determined the copy number aberration (CNA) profiles of 74 patients with index tumors of Gleason score 7. Of these, 5 patients were subjected to whole-genome sequencing using DNA quantities achievable in diagnostic biopsies, with detailed spatial sampling of 23 distinct tumor regions to assess intraprostatic heterogeneity in focal genomics. Multifocal tumors are highly heterogeneous for single-nucleotide variants (SNVs), CNAs and genomic rearrangements. We identified and validated a new recurrent amplification of MYCL, which is associated with TP53 deletion and unique profiles of DNA damage and transcriptional dysregulation. Moreover, we demonstrate divergent tumor evolution in multifocal cancer and, in some cases, tumors of independent clonal origin. These data represent the first systematic relation of intraprostatic genomic heterogeneity to predicted clinical outcome and inform the development of novel biomarkers that reflect individual prognosis.


Subjects
Prostatic Neoplasms/genetics , Cell Line, Tumor , DNA Copy Number Variations , Genetic Association Studies , Genetic Heterogeneity , Genome, Human , Humans , Male , Middle Aged , Neoplasm Grading , Point Mutation , Polymorphism, Single Nucleotide , Prostatic Neoplasms/pathology , Proto-Oncogene Proteins c-myc/genetics
19.
Bioinformatics ; 28(12): i179-87, 2012 Jun 15.
Article in English | MEDLINE | ID: mdl-22689759

ABSTRACT

MOTIVATION: Computational identification of genomic structural variants via high-throughput sequencing is an important problem for which a number of highly sophisticated solutions have been recently developed. With the advent of high-throughput transcriptome sequencing (RNA-Seq), the problem of identifying structural alterations in the transcriptome is now attracting significant attention. In this article, we introduce two novel algorithmic formulations for identifying transcriptomic structural variants through aligning transcripts to the reference genome under the consideration of such variation. The first formulation is based on a nucleotide-level alignment model; a second, potentially faster formulation is based on chaining fragments shared between each transcript and the reference genome. Based on these formulations, we introduce a novel transcriptome-to-genome alignment tool, Dissect (DIScovery of Structural Alteration Event Containing Transcripts), which can identify and characterize transcriptomic events such as duplications, inversions, rearrangements and fusions. Dissect is suitable for whole transcriptome structural variation discovery problems involving sufficiently long reads or accurately assembled contigs. RESULTS: We tested Dissect on simulated transcripts altered via structural events, as well as assembled RNA-Seq contigs from human prostate cancer cell line C4-2. Our results indicate that Dissect has high sensitivity and specificity in identifying structural alteration events in simulated transcripts as well as uncovering novel structural alterations in cancer transcriptomes. AVAILABILITY: Dissect is available for public use at: http://dissect-trans.sourceforge.net.


Subjects
Algorithms , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, RNA/methods , Transcriptome , Cell Line, Tumor , Gene Expression Profiling/methods , Humans , Male , Models, Theoretical , Prostatic Neoplasms , RNA/genetics , Sequence Alignment
20.
J Pathol ; 227(3): 286-97, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22553170

ABSTRACT

The current paradigm of cancer care relies on predictive nomograms which integrate detailed histopathology with clinical data. However, when predictions fail, the consequences for patients are often catastrophic, especially in prostate cancer where nomograms influence the decision to therapeutically intervene. We hypothesized that the high dimensional data afforded by massively parallel sequencing (MPS) is not only capable of providing biological insights, but may aid molecular pathology of prostate tumours. We assembled a cohort of six patients with high-risk disease, and performed deep RNA and shallow DNA sequencing in primary tumours and matched metastases where available. Our analysis identified copy number abnormalities, accurately profiled gene expression levels, and detected both differential splicing and expressed fusion genes. We revealed occult and potentially dormant metastases, unambiguously supporting the patients' clinical history, and implicated the REST transcriptional complex in the development of neuroendocrine prostate cancer, validating this finding in a large independent cohort. We massively expand on the number of novel fusion genes described in prostate cancer; provide fresh evidence for the growing link between fusion gene aetiology and gene expression profiles; and show the utility of fusion genes for molecular pathology. Finally, we identified chromothripsis in a patient with chronic prostatitis. Our results provide a strong foundation for further development of MPS-based molecular pathology.


Subjects
Adenocarcinoma/genetics , Biomarkers, Tumor/genetics , Cell Transformation, Neoplastic/genetics , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Neoplasms, Hormone-Dependent/genetics , Neuroendocrine Cells/metabolism , Oligonucleotide Array Sequence Analysis , Prostatic Neoplasms/genetics , Adenocarcinoma/metabolism , Adenocarcinoma/secondary , Adenocarcinoma/therapy , Aged , Alternative Splicing , Biomarkers, Tumor/blood , British Columbia , Cell Line, Tumor , Cell Transformation, Neoplastic/metabolism , Cell Transformation, Neoplastic/pathology , Cluster Analysis , Decision Support Techniques , Gene Dosage , Gene Fusion , Genetic Predisposition to Disease , Humans , Lymphatic Metastasis , Male , Middle Aged , Neoplasm Grading , Neoplasms, Hormone-Dependent/metabolism , Neoplasms, Hormone-Dependent/pathology , Neoplasms, Hormone-Dependent/therapy , Neuroendocrine Cells/pathology , Nomograms , Patient Selection , Phenotype , Precision Medicine , Prognosis , Prostate-Specific Antigen/blood , Prostatic Neoplasms/metabolism , Prostatic Neoplasms/pathology , Prostatic Neoplasms/therapy , RNA Interference , Transfection