ABSTRACT
OBJECTIVE: Administration of targeted therapies provides a promising treatment strategy for urachal adenocarcinoma (UrC) or primary bladder adenocarcinoma (PBAC); however, the selection of appropriate drugs remains difficult. Here, we aimed to establish a routine-compatible methodological pipeline for the identification of the most important therapeutic targets and potentially effective drugs for UrC and PBAC. METHODS: Next-generation sequencing, using a 161 cancer driver gene panel, was performed on 41 UrC and 13 PBAC samples. Clinically relevant alterations were filtered, and therapeutic interpretation was performed by in silico evaluation of drug-gene interactions. RESULTS: After data processing, 45/54 samples passed quality control. Sequencing analysis revealed 191 pathogenic mutations in 68 genes. The most frequent gain-of-function mutations in UrC were found in KRAS (33%) and MYC (15%), while in PBAC KRAS (25%), MYC (25%), FLT3 (17%) and TERT (17%) were recurrently affected. The most frequently affected pathways were cell cycle regulation and DNA damage control. Actionable mutations with at least one available approved drug were identified in 31/33 (94%) UrC and 8/12 (67%) PBAC patients. CONCLUSIONS: In this study, we developed a data-processing pipeline for the detection and therapeutic interpretation of genetic alterations in two rare cancers. Our analyses revealed actionable mutations in a high proportion of cases, suggesting that this approach is a potentially feasible strategy for both UrC and PBAC treatment.
Subject(s)
Adenocarcinoma , Urinary Bladder Neoplasms , Humans , Urinary Bladder/pathology , Proto-Oncogene Proteins p21(ras)/genetics , Adenocarcinoma/genetics , Adenocarcinoma/pathology , Mutation , Urinary Bladder Neoplasms/pathology , High-Throughput Nucleotide Sequencing
ABSTRACT
BACKGROUND: Clinical laboratories routinely use formalin-fixed paraffin-embedded (FFPE) tissue or cell block cytology samples in oncology panel sequencing to identify mutations that can predict patient response to targeted therapy. To understand the technical error due to FFPE processing, a robustly characterized diploid cell line was used to create FFPE samples with four different pre-tissue processing formalin fixation times. A total of 96 FFPE sections were then distributed to different laboratories for targeted sequencing analysis by four oncopanels, and variants resulting from technical error were identified. RESULTS: Tissue sections that fail more frequently show low cellularity, lower than recommended library preparation DNA input, or target sequencing depth. Importantly, sections from block surfaces are more likely to show FFPE-specific errors, akin to "edge effects" seen in histology, while the inner samples display no quality degradation related to fixation time. CONCLUSIONS: To assure reliable results, we recommend avoiding the block surface portion and restricting mutation detection to genomic regions of high confidence.
Subject(s)
Formaldehyde , High-Throughput Nucleotide Sequencing , Humans , Paraffin Embedding , Sequence Analysis, DNA , Tissue Fixation
ABSTRACT
Molecular variants including single nucleotide variants (SNVs), copy number variants (CNVs) and fusions can be detected in the clinical setting using deep targeted sequencing. These assays support low limits of detection using little genomic input material. They are gaining in popularity in clinical laboratories, where sample volumes are limited, and low variant allele fractions may be present. However, data on reproducibility between laboratories is limited. Using a ring study, we evaluated the performance of 7 Ontario laboratories using targeted sequencing panels. All laboratories analysed a series of control and clinical samples for SNVs/CNVs and gene fusions. High concordance was observed across laboratories for measured CNVs and SNVs. Over 97% of SNV calls in clinical samples were detected by all laboratories. Whilst only a single CNV was detected in the clinical samples tested, all laboratories were able to reproducibly report both the variant and copy number. Concordance for information derived from RNA was lower than observed for DNA, due largely to decreased quality metrics associated with the RNA components of the assay, suggesting that the RNA portions of comprehensive NGS assays may be more vulnerable to variations in approach and workflow. Overall the results of this study support the use of the OFA for targeted sequencing for testing of clinical samples and suggest specific internal quality metrics that can be reliable indicators of assay failure. While we believe this evidence can be interpreted to support deep targeted sequencing in general, additional studies should be performed to confirm this.
Subject(s)
DNA Copy Number Variations/genetics , High-Throughput Nucleotide Sequencing , Neoplasm Proteins/isolation & purification , Neoplasms/genetics , DNA, Neoplasm/genetics , Humans , Mutation/genetics , Neoplasm Proteins/genetics , Neoplasms/pathology , RNA, Neoplasm/genetics
ABSTRACT
BACKGROUND: Targeted sequencing using oncopanels requires comprehensive assessments of accuracy and detection sensitivity to ensure analytical validity. By employing reference materials characterized by the U.S. Food and Drug Administration-led SEquence Quality Control project phase 2 (SEQC2) effort, we perform a cross-platform multi-lab evaluation of eight pan-cancer panels to assess best practices for oncopanel sequencing. RESULTS: All panels demonstrate high sensitivity across targeted high-confidence coding regions and variant types for the variants previously verified to have variant allele frequency (VAF) in the 5-20% range. Sensitivity is reduced by utilizing VAF thresholds due to inherent variability in VAF measurements. Enforcing a VAF threshold for reporting has a positive impact on reducing false positive calls. Importantly, the false positive rate is found to be significantly higher outside the high-confidence coding regions, resulting in lower reproducibility. Thus, region restriction and VAF thresholds lead to low relative technical variability in estimating promising biomarkers and tumor mutational burden. CONCLUSION: This comprehensive study provides actionable guidelines for oncopanel sequencing and clear evidence that supports a simplified approach to assess the analytical performance of oncopanels. It will facilitate the rapid implementation, validation, and quality control of oncopanels in clinical use.
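The two filters discussed above, a VAF reporting threshold and restriction to high-confidence regions, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Call` record, the `filter_calls` helper, the interval coordinates and the 2.5% threshold are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """A hypothetical variant call record (not a format used by the study)."""
    chrom: str
    pos: int    # 1-based position
    vaf: float  # variant allele frequency as a fraction between 0 and 1


def in_regions(call, regions):
    """True if the call lies inside any (chrom, start, end) interval, ends inclusive."""
    return any(c == call.chrom and s <= call.pos <= e for c, s, e in regions)


def filter_calls(calls, regions, min_vaf):
    """Keep calls that meet the VAF threshold and fall in a high-confidence region."""
    return [c for c in calls if c.vaf >= min_vaf and in_regions(c, regions)]


# Illustrative inputs only: one interval and an arbitrary 2.5% threshold.
high_confidence = [("chr1", 100, 200)]
calls = [
    Call("chr1", 150, 0.08),  # inside the region, above the threshold: kept
    Call("chr1", 160, 0.01),  # inside the region, below the threshold: dropped
    Call("chr2", 150, 0.30),  # outside the region: dropped
]
kept = filter_calls(calls, high_confidence, min_vaf=0.025)
```

The sketch also shows the trade-off stated in the abstract: raising `min_vaf` removes low-frequency false positives but can drop true variants whose measured VAF happens to fall just under the cutoff.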
Subject(s)
Biomarkers, Tumor , Genetic Testing/methods , Genomics/methods , Neoplasms/genetics , Oncogenes , DNA Copy Number Variations , Genetic Testing/standards , Genomics/standards , Humans , Molecular Diagnostic Techniques/methods , Molecular Diagnostic Techniques/standards , Mutation , Neoplasms/diagnosis , Polymorphism, Single Nucleotide , Reproducibility of Results , Sensitivity and Specificity
ABSTRACT
BACKGROUND: Oncopanel genomic testing, which identifies important somatic variants, is increasingly common in medical practice and especially in clinical trials. Currently, there is a paucity of reliable genomic reference samples having a suitably large number of pre-identified variants for properly assessing oncopanel assay analytical quality and performance. The FDA-led Sequencing and Quality Control Phase 2 (SEQC2) consortium analyzes ten diverse cancer cell lines individually and their pool, termed Sample A, to develop a reference sample with suitably large numbers of coding positions with known (variant) positives and negatives for properly evaluating oncopanel analytical performance. RESULTS: In reference Sample A, we identify more than 40,000 variants down to 1% allele frequency, with more than 25,000 variants having less than 20% allele frequency and 1653 variants in COSMIC-related genes. This is 5-100× more than existing commercially available samples. We also identify an unprecedented number of negative positions in coding regions, allowing statistical rigor in assessing limit-of-detection, sensitivity, and precision. Over 300 loci are randomly selected and independently verified via droplet digital PCR with 100% concordance. Agilent normal reference Sample B can be admixed with Sample A to create new samples with a similar number of known variants at much lower allele frequency than what exists in Sample A natively, including known variants having allele frequency of 0.02%, a range suitable for assessing liquid biopsy panels. CONCLUSION: These new reference samples and their admixtures provide superior capability for performing oncopanel quality control, analytical accuracy, and validation for small to large oncopanels and liquid biopsy assays.
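The admixture idea can be made concrete with a short worked calculation. The sketch below rests on two simplifying assumptions that the abstract does not state: Sample B carries no copies of the variant in question, and the mixing fraction is expressed in genome equivalents rather than mass or volume. The function names are hypothetical.

```python
def admixed_vaf(vaf_in_a, fraction_a):
    """Expected allele frequency after mixing Sample A into Sample B.

    Assumes the variant is absent from Sample B, so it is diluted in
    proportion to the share of Sample A in the mixture.
    """
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must lie between 0 and 1")
    return vaf_in_a * fraction_a


def fraction_needed(vaf_in_a, target_vaf):
    """Share of Sample A required to bring a variant down to target_vaf."""
    if vaf_in_a <= 0.0 or not 0.0 <= target_vaf <= vaf_in_a:
        raise ValueError("need 0 <= target_vaf <= vaf_in_a and vaf_in_a > 0")
    return target_vaf / vaf_in_a


# Illustrative numbers: a variant at 1% in Sample A, diluted to the 0.02%
# level mentioned in the abstract, needs Sample A to make up about 2% of the mix.
share_of_a = fraction_needed(0.01, 0.0002)
diluted = admixed_vaf(0.01, share_of_a)
```

Under these assumptions every variant in Sample A is scaled by the same factor, which is why the admixture retains a similar number of known variants while shifting all of them to lower allele frequencies.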
Subject(s)
Alleles , Biomarkers, Tumor , Gene Frequency , Genetic Testing/methods , Genetic Variation , Genomics/methods , Neoplasms/genetics , Cell Line, Tumor , DNA Copy Number Variations , Genetic Heterogeneity , Genetic Testing/standards , Genomics/standards , Humans , Neoplasms/diagnosis , Workflow
ABSTRACT
BACKGROUND: Tumor mutational burden (TMB) is an increasingly important biomarker for immune checkpoint inhibitors. Recent publications have described a strong association between high TMB and objective response to mono- and combination immunotherapies in several cancer types. Existing methods to estimate TMB require large amounts of input DNA, which may not always be available. METHODS: In this study, we develop a method to estimate TMB using the Oncomine Tumor Mutation Load (TML) Assay with 20 ng of DNA, and we characterize the performance of this method on various formalin-fixed, paraffin-embedded (FFPE) research samples of several cancer types. We measure the analytical performance of the TML workflow through comparison with control samples with known truth, and we compare performance with an orthogonal method which uses a matched normal sample to remove germline variants. We perform whole exome sequencing (WES) on a batch of FFPE samples and compare the WES TMB values with TMB estimates by the TML assay. RESULTS: In silico analyses demonstrated that the Oncomine TML panel has sufficient genomic coverage to estimate somatic mutations with a strong correlation (r² = 0.986) to WES. Further, in silico prediction using WES data from three separate cohorts and comparing with a subset of the WES overlapping with the TML panel confirmed the ability to stratify responders and non-responders to immune checkpoint inhibitors with high statistical significance. We found that the rates of somatic mutations with the TML assay on cell lines and control samples were similar to the known truth. We verified the performance of germline filtering using only a tumor sample in comparison to a matched tumor-normal experimental design to remove germline variants. We compared TMB estimates by the TML assay with those from WES on a batch of FFPE research samples and found high correlation (r² = 0.83). We found biologically interesting tumorigenesis signatures on FFPE research samples of colorectal cancer (CRC), lung, and melanoma origin. Further, we assessed TMB on a cohort of FFPE research samples including lung, colon, and melanoma tumors to discover the biologically relevant range of TMB values. CONCLUSIONS: These results show that the TML assay targeting a 1.7-Mb genomic footprint can accurately predict TMB values that are comparable to those from WES. The TML assay uses a simple workflow on the Ion GeneStudio S5 System. Further, the AmpliSeq chemistry allows the use of low input DNA to estimate mutational burden from FFPE samples. This TMB assay enables scalable, robust research into immuno-oncology biomarkers with scarce samples.
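As a rough illustration of how a panel-based TMB value relates to the panel footprint, the sketch below uses the common definition of TMB as somatic mutations per megabase of sequenced territory. The abstract does not spell out the assay's actual calculation, so the function name and the normalization shown here are assumptions; only the 1.7-Mb footprint comes from the text.

```python
PANEL_FOOTPRINT_MB = 1.7  # genomic footprint of the panel, as stated in the abstract


def tumor_mutational_burden(n_somatic_mutations, footprint_mb=PANEL_FOOTPRINT_MB):
    """Somatic mutations per megabase (assumed definition, not the assay's own code)."""
    if n_somatic_mutations < 0:
        raise ValueError("mutation count cannot be negative")
    if footprint_mb <= 0:
        raise ValueError("footprint must be positive")
    return n_somatic_mutations / footprint_mb


# Illustrative count only: 17 somatic mutations over 1.7 Mb is about 10 per Mb.
example_tmb = tumor_mutational_burden(17)
```

Because the denominator is small compared with a whole exome, a difference of a single mutation shifts the estimate by roughly 0.6 mutations per Mb, which is one reason the germline filtering step described above matters for a tumor-only design.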
ABSTRACT
BACKGROUND: Gene-fusion or chimeric transcripts have been implicated in the onset and progression of a variety of cancers. Massively parallel RNA sequencing (RNA-Seq) of the cellular transcriptome is a promising approach for the identification of chimeric transcripts of potential functional significance. We report here the development and use of an integrated computational pipeline for the de novo assembly and characterization of chimeric transcripts in 55 primary breast cancer and normal tissue samples. METHODS: An integrated computational pipeline was employed to screen the transcriptome of breast cancer and control tissues for high-quality RNA-sequencing reads. Reads were de novo assembled into contigs followed by reference genome mapping. Chimeric transcripts were detected, filtered and characterized using our R-SAP algorithm. The relative abundance of reads was used to estimate levels of gene expression. RESULTS: De novo assembly allowed for the accurate detection of 1959 chimeric transcripts to nucleotide-level resolution and facilitated detailed molecular characterization and quantitative analysis. A number of the chimeric transcripts are of potential functional significance, including 79 novel fusion-protein transcripts and many chimeric transcripts with alterations in their untranslated leader regions. A number of chimeric transcripts in the cancer samples mapped to genomic regions devoid of any known genes. Several 'pro-neoplastic' fusions comprised of genes previously implicated in cancer are expressed at low levels in normal tissues but at high levels in cancer tissues. CONCLUSIONS: Collectively, our results underscore the utility of deep sequencing technologies and improved bioinformatics workflows to uncover novel and potentially significant chimeric transcripts in cancer and normal somatic tissues.
Subject(s)
Breast Neoplasms/genetics , Gene Expression Profiling , Gene Fusion/genetics , Breast/cytology , Breast/pathology , Breast Neoplasms/pathology , Humans
ABSTRACT
BACKGROUND: Genomic rearrangements or structural variants (SVs) are one of the most common classes of mutations in cancer. METHODS: An integrated DNA sequencing and transcriptional profiling (RNA sequence and microarray gene expression data) analysis was performed on six ovarian cancer patient samples. Matched sets of control (whole blood) samples from these same patients were used to distinguish cancer SVs of germline origin from those arising somatically in the cancer cell lineage. RESULTS: We detected 10,034 ovarian cancer SVs (5518 germline derived; 4516 somatically derived) at base-pair level resolution. Only 11% of these variants were shown to have the potential to form gene fusions and, of these, less than 20% were detected at the transcriptional level. CONCLUSIONS: Collectively, our results are consistent with the view that gene fusions and other SVs can be significant factors in the onset and progression of ovarian cancer. The results further indicate that it may not only be the occurrence of these variants in cancer but their regulation that contributes to their biological and clinical significance.
Subject(s)
Gene Expression Profiling , Gene Fusion/genetics , Genetic Variation/genetics , Ovarian Neoplasms/genetics , Sequence Analysis, DNA , Chromosome Breakage , Chromosome Mapping , Disease Progression , Female , Humans , Introns/genetics , Systems Integration
ABSTRACT
The feasibility of representing the excitation source characteristics in expressive voice signals by an aperiodic sequence of impulses in the time domain is examined in this paper. In particular, the aperiodic components of excitation of expressive voices, like the Noh voice, are examined in some detail. The aperiodic component is extracted from the speech signal using a modified zero-frequency filtering method, and it is represented using a sequence of impulses with amplitudes corresponding to the relative strength of excitation around each impulse. The spectral characteristics of the aperiodic sequence show subharmonics and harmonics of the fundamental frequency corresponding to pitch. The effects of aperiodicity are examined using spectrograms and saliency plots of synthetic amplitude and duration (i.e., frequency) modulation of sequences of impulses.
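A toy example can show why an amplitude-modulated impulse sequence has energy at subharmonics of the fundamental. The sketch below is not the modified zero-frequency filtering method described above; it only builds a synthetic impulse train whose amplitudes alternate between two hypothetical values and inspects a few DFT bins. The period, length and amplitudes are arbitrary choices.

```python
import cmath


def impulse_train(period, n_periods, amplitudes):
    """Impulses every `period` samples; impulse strengths cycle through `amplitudes`."""
    x = [0.0] * (period * n_periods)
    for k in range(n_periods):
        x[k * period] = amplitudes[k % len(amplitudes)]
    return x


def dft_magnitude(x, m):
    """Magnitude of DFT bin m, computed directly from the definition."""
    n = len(x)
    return abs(sum(v * cmath.exp(-2j * cmath.pi * m * i / n) for i, v in enumerate(x)))


PERIOD = 8      # samples between impulses (sets the fundamental)
N_PERIODS = 8   # number of impulses in the sequence

# Alternating strong/weak impulses: the pattern repeats every two pitch periods.
modulated = impulse_train(PERIOD, N_PERIODS, [1.0, 0.5])
uniform = impulse_train(PERIOD, N_PERIODS, [1.0])

f0_bin = len(modulated) // PERIOD  # bin of the fundamental frequency
sub_bin = f0_bin // 2              # bin at half the fundamental (subharmonic)

mag_f0 = dft_magnitude(modulated, f0_bin)
mag_sub = dft_magnitude(modulated, sub_bin)
mag_sub_uniform = dft_magnitude(uniform, sub_bin)
```

In this construction the alternating amplitudes put energy at half the fundamental, while the uniform train has none there. Duration (frequency) modulation, which the abstract also examines, is not covered by this sketch.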
ABSTRACT
Characteristics of glottal vibration are affected by the obstruction to the flow of air through the vocal tract system. The obstruction to the airflow is determined by the nature, location, and extent of constriction in the vocal tract during production of voiced sounds. The effects of constriction on glottal vibration are examined for six different categories of speech sounds having varying degrees of constriction. The effects are examined in terms of source and system features derived from the speech and electroglottograph signals. It is observed that a high degree of constriction causing obstruction to the flow of air results in large changes in these features, relative to the adjacent steady vowel regions, as in the case of apical trill and alveolar fricative sounds. These changes are insignificant when the obstruction to the airflow is less, as in the case of velar fricative and lateral approximant sounds. There are no changes in the excitation features when there is a free flow of air along the auxiliary tract, despite constriction in the vocal tract, as in the case of nasals. These studies show that effects of constriction can indeed be observed in the features of glottal vibration as well as vocal tract resonances.
ABSTRACT
The rapid expansion in the quantity and quality of RNA-Seq data requires the development of sophisticated high-performance bioinformatics tools capable of rapidly transforming this data into meaningful information that is easily interpretable by biologists. Currently available analysis tools are often not easily installed by the general biologist and most of them lack inherent parallel processing capabilities widely recognized as an essential feature of next-generation bioinformatics tools. We present here a user-friendly and fully automated RNA-Seq analysis pipeline (R-SAP) with built-in multi-threading capability to analyze and quantitate high-throughput RNA-Seq datasets. R-SAP follows a hierarchical decision making procedure to accurately characterize various classes of transcripts and achieves a near linear decrease in data processing time as a result of increased multi-threading. In addition, RNA expression level estimates obtained using R-SAP display high concordance with levels measured by microarrays.
Subject(s)
Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Sequence Analysis, RNA , Software , Cell Line , Computational Biology/methods , Genome, Human , Humans , Sequence Alignment
ABSTRACT
BACKGROUND: Although humans and chimpanzees have accumulated significant differences in a number of phenotypic traits since diverging from a common ancestor about six million years ago, their genomes are more than 98.5% identical at protein-coding loci. This modest degree of nucleotide divergence is not sufficient to explain the extensive phenotypic differences between the two species. It has been hypothesized that the genetic basis of the phenotypic differences lies at the level of gene regulation and is associated with the extensive insertion and deletion (INDEL) variation between the two species. To test the hypothesis that large INDELs (80 to 12,000 bp) may have contributed significantly to differences in gene regulation between the two species, we categorized human-chimpanzee INDEL variation mapping in or around genes and determined whether this variation is significantly correlated with previously determined differences in gene expression. RESULTS: Extensive, large INDEL variation exists between the human and chimpanzee genomes. This variation is primarily attributable to retrotransposon insertions within the human lineage. There is a significant correlation between differences in gene expression and large human-chimpanzee INDEL variation mapping in genes or in proximity to them. CONCLUSIONS: The results presented herein are consistent with the hypothesis that large INDELs, particularly those associated with retrotransposons, have played a significant role in human-chimpanzee regulatory evolution.
ABSTRACT
Predicting non-long terminal repeat (non-LTR) retrotransposons, such as long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs), from DNA sequence is still an open problem in bioinformatics. To improve the quality of LINE and SINE annotation, an automated tool, "RetroPred", was developed. The pipeline allowed rapid and thorough annotation of non-LTR retrotransposons. The non-LTR retrotransposable elements were initially predicted by Pairwise Aligner for Long Sequences (PALS) and Parsimonious Inference of a Library of Elementary Repeats (PILER). Predicted non-LTR elements were then automatically classified into LINEs and SINEs using an artificial neural network (ANN) based on the position-specific probability matrix (PSPM) generated by Multiple EM for Motif Elicitation (MEME). In four-fold cross-validation, the ANN model achieved accuracy = 78.79 ± 6.86%, Q(pred) = 74.734 ± 17.08%, sensitivity = 84.48 ± 6.73%, and specificity = 77.13 ± 13.39%. As proof of principle, we thoroughly annotated the locations of LINEs and SINEs in the rice and Arabidopsis genomes using the tool, which proved useful and showed good accuracy. Our tool is accessible at http://www.juit.ac.in/RepeatPred/home.html.
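For readers less familiar with the reported metrics, the sketch below computes accuracy, sensitivity and specificity from a binary confusion matrix using their standard definitions. The counts are invented for illustration and are not the study's data, the function name is hypothetical, and Q(pred) is omitted because the abstract does not define it.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one positive and one negative example")
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are correct
        "sensitivity": tp / (tp + fn),   # share of true positives that are recovered
        "specificity": tn / (tn + fp),   # share of true negatives that are recovered
    }


# Hypothetical counts for a single cross-validation fold.
fold = classification_metrics(tp=40, fp=20, tn=30, fn=10)
```

In a four-fold cross-validation such as the one reported, these values would be computed once per held-out fold and then summarized as a mean with its spread, which is the form the abstract's figures take.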