ABSTRACT
BACKGROUND: Recent advances in high-throughput sequencing technologies have generated an unprecedented amount of genomic data that must be stored, processed, and transmitted over the network for sharing. Lossy genomic data compression, especially of the base quality values of sequencing data, is emerging as an efficient way to handle this challenge due to its superior compression performance compared to lossless compression methods. Many lossy compression algorithms have been developed for and evaluated using DNA sequencing data. However, whether these algorithms can be used on RNA sequencing (RNA-seq) data remains unclear. RESULTS: In this study, we evaluated the impact of lossy quality value compression on common RNA-seq data analysis pipelines, including expression quantification, transcriptome assembly, and short variant detection, using RNA-seq data from different species and sequencing platforms. Our study shows that lossy quality value compression can effectively improve RNA-seq data compression. In some cases, lossy algorithms achieved a further reduction of 1.2 to 3 times in the overall RNA-seq data size compared to existing lossless algorithms. However, lossy quality value compression can affect the results of some RNA-seq data processing pipelines, and hence its impact on RNA-seq studies cannot be ignored in some cases. Pipelines using HISAT2 for alignment were most significantly affected by lossy quality value compression, while no effects were observed on pipelines that do not depend on quality values, e.g., STAR-based expression quantification and transcriptome assembly pipelines. Moreover, regardless of whether STAR or HISAT2 was used as the aligner, variant detection results were affected by lossy quality value compression, albeit to a lesser extent when a STAR-based pipeline was used. 
Our results also show that the impact of lossy quality value compression depends on the compression algorithm used and, where the algorithm supports multiple compression levels, on the level chosen. CONCLUSIONS: Lossy quality value compression can be incorporated into existing RNA-seq analysis pipelines to alleviate the data storage and transmission burdens. However, compression tools and levels should be selected carefully according to the requirements of the downstream analysis pipelines to avoid adverse effects on the analysis results.
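As an illustration of what lossy quality value compression can look like, the sketch below quantizes Phred quality scores into a small number of bins, so that the quality string uses fewer distinct symbols and becomes easier to compress. This is a generic binning scheme shown for orientation only: the bin boundaries, the representative values, and all function names are hypothetical, and it is not the method of any specific tool evaluated in the study. Phred+33 encoding of the quality string is also an assumption.

```python
from bisect import bisect_right

# Hypothetical bins (illustration only): the lowest score of each bin and
# the representative score that every member of that bin is mapped to.
BIN_LOWER_BOUNDS = [0, 2, 10, 20, 30, 40]
BIN_REPRESENTATIVES = [0, 6, 15, 25, 35, 40]

PHRED_OFFSET = 33  # assumes Phred+33 encoded FASTQ quality strings


def decode_phred33(quality_string):
    """Convert a FASTQ quality string into a list of integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality_string]


def encode_phred33(scores):
    """Convert integer Phred scores back into a FASTQ quality string."""
    return "".join(chr(q + PHRED_OFFSET) for q in scores)


def quantize_scores(scores):
    """Map each score to the representative value of the bin containing it."""
    quantized = []
    for q in scores:
        if q < 0:
            raise ValueError(f"negative Phred score: {q}")
        # Index of the last bin whose lower bound is <= q.
        bin_index = bisect_right(BIN_LOWER_BOUNDS, q) - 1
        quantized.append(BIN_REPRESENTATIVES[bin_index])
    return quantized


def quantize_quality_string(quality_string):
    """Lossy transform of a quality string; the read length is preserved."""
    return encode_phred33(quantize_scores(decode_phred33(quality_string)))
```

The transform is irreversible: scores 30 through 39 all become 35, for example, which is why tools that read quality values downstream may produce different results after compression.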
Subjects
Algorithms , Data Compression/methods , Data Compression/standards , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, RNA/methods , Base Sequence , Gene Expression Profiling , Genome, Human , Humans
ABSTRACT
BACKGROUND: A number of simulators have been developed for emulating next-generation sequencing data by incorporating known errors such as base substitutions and indels. However, their practicality may be degraded by functional and runtime limitations. In particular, positional and genomic contextual information is not effectively used to characterize base substitution patterns reliably, and the positional and contextual differences in Phred quality scores have not been fully investigated. Thus, a more effective and efficient bioinformatics tool is needed. RESULTS: Here, we introduce a novel tool, SimuSCoP, to reliably emulate complex DNA sequencing data. The base substitution patterns and the statistical behavior of quality scores in Illumina sequencing data are fully explored and integrated into the simulation model for reliably emulating datasets for different applications. In addition, an integrated and easy-to-use pipeline is employed in SimuSCoP to facilitate end-to-end simulation of complex samples, and high runtime efficiency is achieved by implementing the tool to run multithreaded with low memory consumption. These features give SimuSCoP substantial improvements in reliability, functionality, practicality and runtime efficiency. The tool is comprehensively evaluated in multiple aspects, including consistency of profiles and simulation of genomic variations and complex tumor samples, and the results demonstrate the advantages of SimuSCoP over existing tools. CONCLUSIONS: SimuSCoP, a new bioinformatics tool, was developed to learn informative profiles from real sequencing data and reliably mimic complex data by introducing various genomic variations. We believe that the presented work will catalyse new development of downstream bioinformatics methods for analyzing sequencing data.
Subjects
High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Computer Simulation , Genomics/methods , Reproducibility of Results
ABSTRACT
BACKGROUND: GATK Best Practices workflows are widely used in large-scale sequencing projects and recommend post-alignment processing before variant calling. Two key post-processing steps include the computationally intensive local realignment around known INDELs and base quality score recalibration (BQSR). Both have been shown to reduce erroneous calls; however, the findings are mainly supported by the analytical pipeline that incorporates BWA and GATK UnifiedGenotyper. It is not known whether there is any benefit of post-processing and to what extent the benefit might be for pipelines implementing other methods, especially given that both mappers and callers are typically updated. Moreover, because sequencing platforms are upgraded regularly and the new platforms provide better estimations of read quality scores, the need for post-processing is also unknown. Finally, some regions in the human genome show high sequence divergence from the reference genome; it is unclear whether there is benefit from post-processing in these regions. RESULTS: We used both simulated and NA12878 exome data to comprehensively assess the impact of post-processing for five or six popular mappers together with five callers. Focusing on chromosome 6p21.3, which is a region of high sequence divergence harboring the human leukocyte antigen (HLA) system, we found that local realignment had little or no impact on SNP calling, but increased sensitivity was observed in INDEL calling for the Stampy + GATK UnifiedGenotyper pipeline. No or only a modest effect of local realignment was detected on the three haplotype-based callers and no evidence of effect on Novoalign. BQSR had virtually negligible effect on INDEL calling and generally reduced sensitivity for SNP calling that depended on caller, coverage and level of divergence. 
Specifically, for SAMtools and FreeBayes calling in the regions with low divergence, BQSR reduced the SNP calling sensitivity but improved the precision when the coverage is insufficient. However, in regions of high divergence (e.g., the HLA region), BQSR reduced the sensitivity of both callers with little gain in precision rate. For the other three callers, BQSR reduced the sensitivity without increasing the precision rate regardless of coverage and divergence level. CONCLUSIONS: We demonstrated that the gain from post-processing is not universal; rather, it depends on mapper and caller combination, and the benefit is influenced further by sequencing depth and divergence level. Our analysis highlights the importance of considering these key factors in deciding to apply the computationally intensive post-processing to Illumina exome data.
Subjects
Computational Biology/methods , Computational Biology/standards , Exome/genetics , Sequence Alignment/methods , Software , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Humans , Mutation/genetics , Polymorphism, Single Nucleotide/genetics , Workflow
ABSTRACT
BACKGROUND: There is a limited understanding of site-specific, quality of life (QOL) outcomes in anterior skull base surgery (ASBS). The objective of the present investigation was to characterize postoperative change in QOL outcomes for anterior skull base lesions following open and endoscopic surgery. METHODS: A comprehensive review of the literature was performed according to Preferred Reporting Items for Systematic Reviews and Meta-analyses guidelines using the PubMed, Scopus, Embase, and Cochrane databases for studies reporting pre- and postoperative, site-specific, QOL outcome measures in ASBS using validated questionnaires. Studies utilizing the anterior skull base quality of life (ASBQ) questionnaire or the skull base inventory were included. Investigations focusing on skull base surgery for pituitary lesions, as well as survey validation and non-English studies, were excluded. RESULTS: A total of 112 studies were screened; 4 studies, comprising a total of 195 patients and focusing exclusively on the ASBQ, were included in the systematic review. Using a fixed effect model for the meta-analysis, the mean ASBQ score was similar at six (3.45, P = 0.312; -0.19, 95% confidence interval: -0.57, 0.18) and 12 months postoperatively (3.6, P = 0.147; 0.3, 95% confidence interval: -0.11, 0.72) compared to baseline (3.53). CONCLUSIONS: Across a variety of anterior skull base pathologies, skull base-specific QOL demonstrated no improvement at 6 months and 12 months postsurgery. Few studies to date have published pre- and postoperative QOL data for patients undergoing ASBS, highlighting a current shortcoming in the available literature. Long-term follow-up in patients undergoing open and endoscopic approaches will be necessary to better understand and optimize outcomes for patients having ASBS.
Subjects
Quality of Life , Skull Base Neoplasms , Skull Base , Humans , Skull Base/surgery , Skull Base Neoplasms/surgery , Neurosurgical Procedures/methods , Treatment Outcome , Neuroendoscopy/methods
ABSTRACT
Next generation sequencing (NGS) is routinely used in clinical genetic testing. Quality management of NGS testing is essential to ensure performance is consistently and rigorously evaluated. Three primary metrics are used in NGS quality evaluation: depth of coverage, base quality and mapping quality. To provide consistency and transparency in the utilisation of these metrics we present the Quality Sequencing Minimum (QSM). The QSM defines the minimum quality requirement a laboratory has selected for depth of coverage (C), base quality (B) and mapping quality (M) and can be applied per base, exon, gene or other genomic region, as appropriate. The QSM format is CX_BY(PY)_MZ(PZ), where X is the parameter threshold for C, Y the parameter threshold for B, PY the percentage of reads that must reach Y, Z the parameter threshold for M, and PZ the percentage of reads that must reach Z. The data underlying the QSM are in the BAM file, so a QSM can be easily and automatically calculated in any NGS pipeline. We used the QSM to optimise cancer predisposition gene testing using the TruSight Cancer Panel (TSCP). We set the QSM as C50_B10(85)_M20(95). Test regions falling below the QSM were automatically flagged for review, with 100/1471 test regions QSM-flagged in multiple individuals. Supplementing these regions with 132 additional probes improved performance in 85/100. We also used the QSM to optimise testing of genes with pseudogenes such as PTEN and PMS2. In TSCP data from 960 individuals the median number of regions that passed QSM per sample was 1429 (97%). Importantly, the QSM can be used at an individual report level to provide succinct, comprehensive quality assurance information about individual test performance. We believe many laboratories would find the QSM useful. Furthermore, widespread adoption of the QSM would facilitate consistent, transparent reporting of genetic test performance by different laboratories.
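To make the QSM notation concrete, the following minimal sketch parses a QSM string such as C50_B10(85)_M20(95) and checks a single position against it. It is not the authors' implementation. The class and function names are hypothetical, and several details are assumptions not stated in the abstract: thresholds are treated as inclusive (a value "reaches" a threshold when it is greater than or equal to it), coverage is counted as the number of reads supplied, and each read is represented as a (base quality, mapping quality) pair rather than being read from a BAM file.

```python
import re
from dataclasses import dataclass

# Matches strings of the form C<X>_B<Y>(<PY>)_M<Z>(<PZ>), e.g. C50_B10(85)_M20(95).
QSM_PATTERN = re.compile(r"^C(\d+)_B(\d+)\((\d+)\)_M(\d+)\((\d+)\)$")


@dataclass(frozen=True)
class QSM:
    min_coverage: int          # X: depth of coverage threshold
    min_base_quality: int      # Y: base quality threshold
    base_quality_pct: int      # PY: % of reads that must reach Y
    min_mapping_quality: int   # Z: mapping quality threshold
    mapping_quality_pct: int   # PZ: % of reads that must reach Z


def parse_qsm(text):
    """Parse a QSM string into its five numeric components."""
    match = QSM_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid QSM string: {text!r}")
    return QSM(*(int(group) for group in match.groups()))


def passes_qsm(qsm, reads):
    """Return True if a position covered by `reads` meets the QSM.

    `reads` is a list of (base_quality, mapping_quality) pairs, one per
    read covering the position (an assumed, simplified input format).
    """
    depth = len(reads)
    if depth == 0 or depth < qsm.min_coverage:
        return False
    base_ok = sum(1 for bq, _ in reads if bq >= qsm.min_base_quality)
    map_ok = sum(1 for _, mq in reads if mq >= qsm.min_mapping_quality)
    return (
        100.0 * base_ok / depth >= qsm.base_quality_pct
        and 100.0 * map_ok / depth >= qsm.mapping_quality_pct
    )
```

For example, `passes_qsm(parse_qsm("C50_B10(85)_M20(95)"), reads)` would flag a position as failing when fewer than 50 reads cover it, or when too few of those reads reach the base or mapping quality thresholds. Applying the check per exon or per gene would require an aggregation rule that the abstract does not specify.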
ABSTRACT
Quality assurance and quality control are essential for robust next generation sequencing (NGS). Here we present CoverView, a fast, flexible, user-friendly quality evaluation tool for NGS data. CoverView processes mapped sequencing reads and user-specified regions to report depth of coverage, base and mapping quality metrics with increasing levels of detail from a chromosome-level summary to per-base profiles. CoverView can flag regions that do not fulfil user-specified quality requirements, allowing suboptimal data to be systematically and automatically presented for review. It also provides an interactive graphical user interface (GUI) that can be opened in a web browser and allows intuitive exploration of results. We have integrated CoverView into our accredited clinical cancer predisposition gene testing laboratory that uses the TruSight Cancer Panel (TSCP). CoverView has been invaluable for optimisation and quality control of our testing pipeline, providing transparent, consistent quality metric information and automatic flagging of regions that fall below quality thresholds. We demonstrate this utility with TSCP data from the Genome in a Bottle reference sample, which CoverView analysed in 13 seconds. CoverView uses data routinely generated by NGS pipelines, reads standard input formats, and rapidly creates easy-to-parse output text (.txt) files that are customised by a simple configuration file. CoverView can therefore be easily integrated into any NGS pipeline. CoverView and detailed documentation for its use are freely available at github.com/RahmanTeamDevelopment/CoverView/releases and www.icr.ac.uk/CoverView.
ABSTRACT
OBJECT: Craniopharyngiomas are benign parasellar tumors for which surgical removal, although potentially curative, often leads to morbidity with resulting decreases in quality of life (QOL). The endonasal endoscopic approach is a minimal-access technique for removing these tumors and may reduce postoperative morbidity. The QOL following this method for resection of craniopharyngiomas has not been documented. METHODS: The authors reviewed a database of consecutive endonasal endoscopic surgeries done at Weill Cornell Medical College. Adult patients with histologically proven craniopharyngiomas were included who had completed either only postoperative (> 9 months) or both pre- and postoperative QOL forms, the Anterior Skull Base Quality of Life (ASBQ) questionnaire, and the 22-Item Sinonasal Outcome Test (SNOT-22). Rates of gross-total resection (GTR), complications, and visual and endocrine function were collected. Retrospective independence (Wen score) was also assigned. A contemporaneous group of patients undergoing endonasal endoscopic pituitary macroadenoma resection was used as a control. RESULTS: This study included 33 procedures performed in 31 patients. The average postoperative ASBQ score was 3.35 and the SNOT-22 score was 19.6. Better QOL was associated with GTR and postoperative radiation. Worse QOL was associated with persistent visual defects, hypopituitarism, tumor recurrence, increase in body mass index, and worsening Wen score. In a subset of 10 patients, both pre- and postoperative (> 9 months) QOL scores were obtained. Both ASBQ and SNOT-22 scores showed stability and a trend toward improvement, from 2.93 ± 0.51 to 2.96 ± 0.47 (ASBQ) and 23.7 ± 10.8 to 18.4 ± 11.6 (SNOT-22). Compared with 62 patients undergoing endoscopic pituitary macroadenoma resection, patients with craniopharyngiomas had worse postoperative QOL on the ASBQ (3.35 vs 3.80; p = 0.023) and SNOT-22 (19.6 vs 13.4; p = 0.12). 
CONCLUSIONS: This report of validated site-specific QOL following endoscopic surgery for craniopharyngiomas shows an overall maintenance of postoperative compared with preoperative QOL. Better QOL could be seen in patients with GTR and radiation therapy, and worse QOL was found in patients with visual or endocrine deficits. Nevertheless, patients with craniopharyngiomas still had worse QOL than those undergoing similar surgery for pituitary macroadenomas, confirming the worse prognosis of craniopharyngiomas even when removed via a minimally invasive approach. These measures should serve as benchmarks for comparison with open transcranial approaches to similar tumors.