ABSTRACT
MOTIVATION: Whole exome and gene panel sequencing are increasingly used for oncological diagnostics. To investigate the accuracy of somatic copy number alteration (SCNA) detection algorithms on simulated and clinical tumor samples, the precision and sensitivity of four SCNA callers were measured using 50 simulated whole exome and 50 simulated targeted gene panel datasets, and using 119 TCGA tumor samples for which SNP array data were available. RESULTS: On synthetic exome and panel data, VarScan2 mostly called false positives, whereas Control-FREEC was precise (>90% correct calls) at the cost of low sensitivity (<40% detected). ONCOCNV was slightly less precise on gene panel data, with similarly low sensitivity. This could be explained by low sensitivity for amplifications and high precision for deletions. Surprisingly, these results were not strongly affected by moderate tumor impurities; only contaminations with more than 60% non-cancerous cells resulted in strongly declining precision and sensitivity. On the 119 clinical samples, Control-FREEC and CNVkit called 71.8% and 94%, respectively, of the SCNAs found by the SNP arrays, but with a considerable number of false positives (precision 29% and 4.9%, respectively). DISCUSSION: Whole exome and targeted gene panel methods by design limit the precision of SCNA callers, making them prone to false positives. SCNA calls cannot easily be integrated in clinical pipelines that use data from targeted capture-based sequencing. If used at all, they need to be cross-validated using orthogonal methods. AVAILABILITY AND IMPLEMENTATION: Scripts are provided as supplementary information. CONTACT: gunther.jansen@molecularhealth.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Algorithms , DNA Copy Number Variations , Exome Sequencing/methods , DNA, Neoplasm , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Neoplasms/diagnosis , Neoplasms/genetics , Reproducibility of Results
ABSTRACT
Genetic differences between Arabidopsis thaliana accessions underlie the plant's extensive phenotypic variation, and until now these have been interpreted largely in the context of the annotated reference accession Col-0. Here we report the sequencing, assembly and annotation of the genomes of 18 natural A. thaliana accessions, and their transcriptomes. When assessed on the basis of the reference annotation, one-third of protein-coding genes are predicted to be disrupted in at least one accession. However, re-annotation of each genome revealed that alternative gene models often restore coding potential. Gene expression in seedlings differed for nearly half of expressed genes and was frequently associated with cis variants within 5 kilobases, as were intron retention alternative splicing events. Sequence and expression variation is most pronounced in genes that respond to the biotic environment. Our data further promote evolutionary and functional studies in A. thaliana, especially the MAGIC genetic reference population descended from these accessions.
Subjects
Arabidopsis/genetics , Gene Expression Profiling , Gene Expression Regulation, Plant/genetics , Genome, Plant/genetics , Transcription, Genetic/genetics , Arabidopsis/classification , Arabidopsis Proteins/genetics , Base Sequence , Genes, Plant/genetics , Genomics , Haplotypes/genetics , INDEL Mutation/genetics , Molecular Sequence Annotation , Phylogeny , Polymorphism, Single Nucleotide/genetics , Proteome/genetics , Seedlings/genetics , Sequence Analysis, DNA
ABSTRACT
We present Oqtans, an open-source workbench for quantitative transcriptome analysis that is integrated in Galaxy. Its distinguishing features include customizable computational workflows and a modular pipeline architecture that facilitates comparative assessment of tool and data quality. Oqtans integrates an assortment of machine learning-powered tools into Galaxy, which show superior or equal performance to state-of-the-art tools. Implemented tools comprise a complete transcriptome analysis workflow: short-read alignment, transcript identification/quantification and differential expression analysis. Oqtans and Galaxy facilitate persistent storage, data exchange and documentation of intermediate results and analysis workflows. We illustrate how Oqtans aids the interpretation of data from different experiments in easy-to-understand use cases. Users can easily create their own workflows and extend Oqtans by integrating specific tools. Oqtans is available as (i) a cloud machine image with a demo instance at cloud.oqtans.org, (ii) a public Galaxy instance at galaxy.cbio.mskcc.org, (iii) a git repository containing all installed software (oqtans.org/git), most of which is also available from (iv) the Galaxy Toolshed, and (v) a share string to use along with Galaxy CloudMan.
Subjects
RNA/genetics , Sequence Analysis, RNA/methods , Transcriptome , Base Sequence , Internet , Software
ABSTRACT
Deep transcriptome sequencing (RNA-Seq) has become a vital tool for studying the state of cells in the context of varying environments, genotypes and other factors. RNA-Seq profiling data enable identification of novel isoforms, quantification of known isoforms and detection of changes in transcriptional or RNA-processing activity. Existing approaches to detect differential isoform abundance between samples either require a complete isoform annotation or fall short in providing statistically robust and calibrated significance estimates. Here, we propose a suite of statistical tests to address these open needs: a parametric test that uses known isoform annotations to detect changes in relative isoform abundance and a non-parametric test that detects differential read coverages and can be applied when isoform annotations are not available. Both methods account for the discrete nature of read counts and the inherent biological variability. We demonstrate that these tests compare favorably to previous methods, both in terms of accuracy and statistical calibrations. We use these techniques to analyze RNA-Seq libraries from Arabidopsis thaliana and Drosophila melanogaster. The identified differential RNA processing events were consistent with RT-qPCR measurements and previous studies. The proposed toolkit is available from http://bioweb.me/rdiff and enables in-depth analyses of transcriptomes, with or without available isoform annotation.
Subjects
RNA Processing, Post-Transcriptional , Algorithms , Animals , Arabidopsis/genetics , Arabidopsis/metabolism , Data Interpretation, Statistical , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Gene Expression Profiling , Molecular Sequence Annotation , RNA Isoforms/metabolism , Reverse Transcriptase Polymerase Chain Reaction
ABSTRACT
Deep sequencing of transcriptomes allows quantitative and qualitative analysis of many RNA species in a sample; the ideal goal is parallel comparison of expression levels, splicing variants, natural antisense transcripts, RNA editing and transcriptional start and stop sites. By computational modeling, we show how libraries of multiple insert sizes combined with strand-specific, paired-end (SS-PE) sequencing can increase the information gained on alternative splicing, especially in higher eukaryotes. Despite the benefits of gaining SS-PE data with paired ends of varying distance, the standard Illumina protocol allows only non-strand-specific, paired-end sequencing with a single insert size. Here, we modify the Illumina RNA ligation protocol to allow SS-PE sequencing by using a custom pre-adenylated 3' adaptor. We generate parallel libraries with differing insert sizes to aid deconvolution of alternative splicing events and to characterize the extent and distribution of natural antisense transcription in C. elegans. Despite stringent requirements for detection of alternative splicing, our data increase the number of intron retention and exon skipping events annotated in the Wormbase genome annotations by 127% and 121%, respectively. We show that parallel libraries with a range of insert sizes increase the transcriptomic information gained by sequencing and that, by current established benchmarks, our protocol gives competitive results with respect to library quality.
Subjects
Caenorhabditis elegans/genetics , Gene Expression Profiling/methods , Transcriptome , Alternative Splicing , Animals , Caenorhabditis elegans Proteins/genetics , Caenorhabditis elegans Proteins/metabolism , Databases, Genetic , Gene Library , Genes, Helminth , High-Throughput Nucleotide Sequencing , Humans , Oligonucleotide Array Sequence Analysis , Protein Isoforms/genetics , Protein Isoforms/metabolism , Sequence Analysis, RNA , Transcription, Genetic
ABSTRACT
We provide a novel web service, called rQuant.web, allowing convenient access to tools for quantitative analysis of RNA sequencing data. The underlying quantitation technique rQuant is based on quadratic programming and estimates different biases induced by library preparation, sequencing and read mapping. It can tackle multiple transcripts per gene locus and is therefore particularly well suited to quantify alternative transcripts. rQuant.web is available as a tool in a Galaxy installation at http://galaxy.fml.mpg.de. rQuant.web is free of charge, open to all users, and requires no login.
Subjects
Gene Expression Profiling , Sequence Analysis, RNA , Software , Animals , Caenorhabditis elegans/genetics , Caenorhabditis elegans/metabolism , Internet , RNA, Messenger/analysis
ABSTRACT
Rice, the primary source of dietary calories for half of humanity, is the first crop plant for which a high-quality reference genome sequence from a single variety was produced. We used resequencing microarrays to interrogate 100 Mb of the unique fraction of the reference genome for 20 diverse varieties and landraces that capture the impressive genotypic and phenotypic diversity of domesticated rice. Here, we report the distribution of 160,000 nonredundant SNPs. Introgression patterns of shared SNPs revealed the breeding history and relationships among the 20 varieties; some introgressed regions are associated with agronomic traits that mark major milestones in rice improvement. These comprehensive SNP data provide a foundation for deep exploration of rice diversity and gene-trait relationships and their use for future rice improvement.
Subjects
Genetic Variation , Genome, Plant/genetics , Oryza/genetics , Polymorphism, Single Nucleotide , Chromosome Mapping , Chromosomes, Plant/genetics , Gene Frequency , Genotype , Molecular Sequence Data , Oryza/classification , Phylogeny , Quantitative Trait Loci/genetics , Sequence Analysis, DNA , Species Specificity
ABSTRACT
Renal cell carcinoma (RCC) is a kidney cancer with an onset mainly during the sixth or seventh decade of the patient's life. Patients with advanced, metastasized RCC have a poor prognosis. The majority of patients develop treatment resistance towards Standard of Care (SoC) drugs within months. Tyrosine kinase inhibitors (TKIs) are the backbone of first-line therapy and have recently been partnered with an immune checkpoint inhibitor (ICI). Despite the most recent progress, the development of novel therapies targeting acquired TKI resistance mechanisms in advanced and metastatic RCC remains a high medical need. Preclinical models with high translational relevance can significantly support the development of novel personalized therapies. It has been demonstrated that patient-derived xenograft (PDX) models represent an essential tool for the preclinical evaluation of novel targeted therapies and their combinations. In the present project, we established and molecularly characterized a comprehensive panel of subcutaneous RCC PDX models with well-conserved molecular and pathological features over multiple passages. Drug screening with four SoC drugs targeting the vascular endothelial growth factor (VEGF) and PI3K/mTOR pathways revealed individual and heterogeneous response profiles in those models, very similar to observations in patients. As unique features, our cohort includes PDX models from metastatic disease and from multiple tumor regions of one patient, allowing extended studies on intra-tumor heterogeneity (ITH). The PDX models are further used as a basis for developing corresponding in vitro cell culture models enabling advanced high-throughput drug screening in a personalized context. PDX models were subjected to next-generation sequencing (NGS). Characterization of cancer-relevant features including driver mutations or cellular processes was performed using mutational and gene expression data in order to identify potential biomarkers or treatment targets in RCC.
In summary, we report a newly established and molecularly characterized panel of RCC PDX models with high relevance for translational preclinical research.
ABSTRACT
Metastatic renal cell carcinoma (RCC) exhibits poor prognosis. Better knowledge of distant metastases is crucial to foster personalized treatment strategies. Here, we aimed to investigate the genetic landscape of metastases, including synchronous and/or recurrent metastases, to elucidate potential drug target genes and clinically relevant mutations in a real-world setting of patients. We assessed 81 metastases from 56 RCC patients, including synchronous and/or recurrent metastases of 19 patients. Samples were analysed through next-generation sequencing with a high coverage (~1000× mean coverage). For this purpose, we established a novel sequencing panel comprising 32 genes with impact on RCC development. We observed a high frequency of mutations in known RCC driver genes (e.g., >40% carriers of VHL and PBRM1 mutations) in metastases irrespective of the metastatic site. The somatic mutational composition was significantly associated with cancer-specific survival (p(logrank) = 0.03). Moreover, we identified in 34 patients at least one drug target gene as well as clinically relevant mutations listed in the VICC Meta-Knowledgebase in 7%. In addition to significantly higher mutational burden in recurrent metastases compared to earlier ones, synchronous and/or recurrent metastases of individual patients, even after a time period of >2 years, shared a high proportion of somatic events. Our data demonstrate the importance of somatic profiling in metastases for precision medicine in RCC.
ABSTRACT
Precision medicine attempts to individualize cancer therapy by matching tumor-specific genetic changes with effective targeted therapies. A crucial first step in this process is the reliable identification of cancer-relevant variants, which is considerably complicated by the impurity and heterogeneity of clinical tumor samples. We compared the impact of admixture of non-cancerous cells and low somatic allele frequencies on the sensitivity and precision of 19 state-of-the-art SNV callers. We studied both whole exome and targeted gene panel data and up to 13 distinct parameter configurations for each tool. We found vast differences among callers. Based on our comprehensive analyses we recommend joint tumor-normal calling with MuTect, EBCall or Strelka for whole exome somatic variant calling, and HaplotypeCaller or FreeBayes for whole exome germline calling. For targeted gene panel data on a single tumor sample, LoFreqStar performed best. We further found that tumor impurity and admixture had a negative impact on precision and, in particular, on sensitivity in whole exome experiments. At admixture levels of 60% to 90%, as sometimes seen in pathological biopsies, sensitivity dropped significantly, even when variants were originally present in the tumor at 100% allele frequency. Sensitivity to low-frequency SNVs improved with targeted panel data, but whole exome data allowed more efficient identification of germline variants. Effective somatic variant calling requires high-quality pathological samples with minimal admixture, a consciously selected sequencing strategy, and the appropriate variant calling tool with settings optimized for the chosen type of data.
Subjects
Benchmarking , Databases, Genetic , Neoplasms/genetics , Polymorphism, Single Nucleotide/genetics , Algorithms , Computer Simulation , Exome/genetics , Gene Frequency/genetics , Germ Cells/metabolism , Humans , Reference Standards , Reproducibility of Results , Sequence Alignment
ABSTRACT
Pancreatic ductal adenocarcinoma (PDAC) is a tumor with an extremely poor prognosis, predominantly as a result of chemotherapy resistance and numerous somatic mutations. Consequently, PDAC is a prime candidate for the use of sequencing to identify causative mutations, facilitating subsequent administration of targeted therapy. In a feasibility study, we retrospectively assessed the therapeutic recommendations of a novel, evidence-based software tool that analyzes next-generation sequencing (NGS) data using a large panel of pharmacogenomic biomarkers for efficacy and toxicity. Tissue from 14 patients with PDAC was sequenced using NGS with a 620-gene panel. FASTQ files were fed into treatmentmap. The results were compared with the chemotherapy the patients had received, including all side effects. No changes in therapy were made. Known driver mutations for PDAC were confirmed (e.g. KRAS, TP53). Software analysis revealed positive biomarkers for predicted effective and ineffective treatments in all patients. At least one biomarker associated with increased toxicity could be detected in all patients. Patients had been receiving one of the currently approved chemotherapy agents. In two patients, toxicity could have been correctly predicted by the software analysis. The results suggest that NGS, in combination with evidence-based software, could be conducted within a 2-week period, thus being feasible for clinical routine. Therapy recommendations were principally off-label use. Based on the predominant KRAS mutations, other drugs were predicted to be ineffective. The pharmacogenomic biomarkers indicative of increased toxicity could be retrospectively linked to reported negative side effects in the respective patients. Finally, the occurrence of somatic and germline mutations in cancer syndrome-associated genes is noteworthy, despite a high frequency of these particular variants in the background population.
These results suggest that software analysis of NGS data provides evidence-based information on effective, ineffective and toxic drugs, potentially forming the basis for precision cancer medicine in PDAC.