Results 1 - 10 of 10
1.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32379294

ABSTRACT

Somatic structural variants (SVs), which are variants that typically impact >50 nucleotides, play a significant role in cancer development and evolution but are notoriously more difficult to detect than small variants from short-read next-generation sequencing (NGS) data. This is due to a combination of challenges attributed to the purity of tumour samples, tumour heterogeneity, limitations of short-read information from NGS and sequence alignment ambiguities. In spite of active development of SV detection tools (callers) over the past few years, each method has inherent advantages and limitations. In this review, we highlight some of the important factors affecting somatic SV detection and compare the performance of seven commonly used SV callers. In particular, we focus on the extent of change in sensitivity and precision for detecting different SV types and size ranges from samples with differing variant allele frequencies and sequencing depths of coverage. We highlight the reasons why some SV callers perform well in some settings but not others, allowing our evaluation findings to be extended beyond the seven SV callers examined in this paper. As the importance of large SVs becomes increasingly recognized in cancer genomics, this paper provides a timely review of some of the most impactful factors influencing somatic SV detection that should be considered when choosing SV callers.
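
Editorial illustration: the abstract above compares callers by sensitivity and precision. The sketch below shows one common way such metrics can be computed for SV calls against a truth set. The `SV` record, the 200 bp breakpoint tolerance, and the greedy one-to-one matching rule are assumptions made for this example and are not taken from the review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record used only for this illustration."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "DUP", "INV"


def matches(call: SV, truth: SV, tol: int = 200) -> bool:
    """A call matches a truth SV when chromosome and type agree and both
    breakpoints lie within `tol` bp (the tolerance is an assumption)."""
    return (
        call.chrom == truth.chrom
        and call.svtype == truth.svtype
        and abs(call.start - truth.start) <= tol
        and abs(call.end - truth.end) <= tol
    )


def evaluate(calls, truth, tol=200):
    """Return (precision, sensitivity) using greedy one-to-one matching."""
    matched_truth = set()
    true_positive_calls = 0
    for call in calls:
        hit = next(
            (i for i, t in enumerate(truth)
             if i not in matched_truth and matches(call, t, tol)),
            None,
        )
        if hit is not None:
            matched_truth.add(hit)
            true_positive_calls += 1
    precision = true_positive_calls / len(calls) if calls else 0.0
    sensitivity = len(matched_truth) / len(truth) if truth else 0.0
    return precision, sensitivity


# Toy data: four true SVs, three calls, one of which is a false positive.
truth = [
    SV("chr1", 1000, 5000, "DEL"),
    SV("chr2", 20000, 30000, "DUP"),
    SV("chr3", 500, 900, "INV"),
    SV("chr4", 100, 400, "DEL"),
]
calls = [
    SV("chr1", 1050, 4980, "DEL"),
    SV("chr2", 20010, 30100, "DUP"),
    SV("chr5", 1, 100, "DEL"),
]
precision, sensitivity = evaluate(calls, truth)
print(f"precision={precision:.2f} sensitivity={sensitivity:.2f}")
```

Stratifying `calls` and `truth` by SV type, size range, or variant allele frequency before calling `evaluate` would give the per-category comparison the abstract describes.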


Subjects
High-Throughput Nucleotide Sequencing/methods , Neoplasms/genetics , Gene Frequency , Genetic Variation , Humans , Neoplasms/pathology , Sequence Analysis, DNA/methods
2.
Int J Mol Sci ; 22(11)2021 May 26.
Article in English | MEDLINE | ID: mdl-34073316

ABSTRACT

Circulating cell-free DNA (cfDNA) is emerging as a potential tumor biomarker. CfDNA-based biomarkers may be applicable in tumors without an available non-invasive screening method among at-risk populations. Esophageal squamous cell carcinoma (ESCC) and residents of the Asian cancer belt are examples of those malignancies and populations. Previous epidemiological studies using cfDNA have pointed to the need for high volumes of good quality plasma (i.e., >1 mL plasma with 0 or 1 cycles of freeze-thaw) rather than archival serum, which is often the main available source of cfDNA in retrospective studies. Here, we have investigated the concordance of TP53 mutations in tumor tissue and cfDNA extracted from archival serum left-over from 42 cases and 39 matched controls (age, gender, residence) in a high-risk area of Northern Iran (Golestan). Deep sequencing of TP53 coding regions was complemented with a specialized variant caller (Needlestack). Overall, 23% to 31% of mutations were concordantly detected in tumor and serum cfDNA (based on two false discovery rate thresholds). Concordance was positively correlated with high cfDNA concentration, smoking history (p-value = 0.02) and mutations with a high potential of neoantigen formation (OR; 95%CI = 1.9 (1.11-3.29)), suggesting that tumor DNA release in the bloodstream might reflect the effects of immune and inflammatory context on tumor cell turnover. We identified TP53 mutations in five controls, one of whom was subsequently diagnosed with ESCC. Overall, the results showed that cfDNA mutations can be reliably identified by deep sequencing of archival serum, with a rate of success comparable to plasma. Nonetheless, 70% non-identifiable mutations among cancer patients and 12% mutation detection in controls are the main challenges in applying cfDNA to detect tumor-related variants when blindly targeting whole coding regions of the TP53 gene in ESCC.


Subjects
Circulating Tumor DNA/genetics , Esophageal Neoplasms/genetics , Esophageal Squamous Cell Carcinoma/genetics , Mutation , Tumor Suppressor Protein p53/genetics , Circulating Tumor DNA/blood , Esophageal Neoplasms/blood , Esophageal Squamous Cell Carcinoma/blood , Female , High-Throughput Nucleotide Sequencing , Humans , Male , Middle Aged , Serum , Tumor Suppressor Protein p53/blood
3.
Int J Mol Sci ; 21(10)2020 May 16.
Article in English | MEDLINE | ID: mdl-32429412

ABSTRACT

Cancer gene panel testing requires accurate detection of somatic mosaic mutations, as the test sample consists of a mixture of cancer cells and normal cells; each minor clone in the tumor also has different somatic mutations. Several studies have shown that the different types of software used for variant calling for next generation sequencing (NGS) can detect low-frequency somatic mutations. However, the accuracy of these somatic variant callers is unknown. We performed cancer gene panel testing in duplicate experiments using three different high-fidelity DNA polymerases in pre-capture amplification steps and analyzed by three different variant callers, Strelka2, Mutect2, and LoFreq. We selected six somatic variants that were detected in both experiments with more than two polymerases and by at least one variant caller. Among them, five single nucleotide variants were verified by CEL nuclease-mediated heteroduplex incision with polyacrylamide gel electrophoresis and silver staining (CHIPS) and Sanger sequencing. In silico analysis indicated that the FBXW7 and MAP3K1 missense mutations cause damage at the protein level. Comparing three somatic variant callers, we found that Strelka2 detected more variants than Mutect2 and LoFreq. We conclude that dual sequencing with Strelka2 analysis is useful for detection of accurate somatic mutations in cancer gene panel testing.


Subjects
Genes, Neoplasm , High-Throughput Nucleotide Sequencing/methods , Mutation/genetics , Neoplasms/genetics , Base Sequence , DNA-Directed DNA Polymerase/metabolism , Female , Gene Frequency/genetics , Humans , Middle Aged , Reproducibility of Results
4.
BMC Bioinformatics ; 18(1): 8, 2017 Jan 03.
Article in English | MEDLINE | ID: mdl-28049408

ABSTRACT

BACKGROUND: Next-generation sequencing of matched tumor and normal biopsy pairs has become a technology of paramount importance for precision cancer treatment. Sequencing costs have dropped tremendously, allowing the sequencing of the whole exome of tumors for just a fraction of the total treatment costs. However, clinicians and scientists cannot take full advantage of the generated data because the accuracy of analysis pipelines is limited. This particularly concerns the reliable identification of subclonal mutations in a cancer tissue sample with very low frequencies, which may be clinically relevant. RESULTS: Using simulations based on kidney tumor data, we compared the performance of nine state-of-the-art variant callers, namely deepSNV, GATK HaplotypeCaller, GATK UnifiedGenotyper, JointSNVMix2, MuTect, SAMtools, SiNVICT, SomaticSniper, and VarScan2. The comparison was done as a function of variant allele frequencies and coverage. Our analysis revealed that deepSNV and JointSNVMix2 perform very well, especially in the low-frequency range. We attributed false positive and false negative calls of the nine tools to specific error sources and assigned them to processing steps of the pipeline. All of these errors can be expected to occur in real data sets. We found that modifying certain steps of the pipeline or parameters of the tools can lead to substantial improvements in performance. Furthermore, a novel integration strategy that combines the ranks of the variants yielded the best performance. More precisely, the rank-combination of deepSNV, JointSNVMix2, MuTect, SiNVICT and VarScan2 reached a sensitivity of 78% when fixing the precision at 90%, and outperformed all individual tools, where the maximum sensitivity was 71% with the same precision. CONCLUSIONS: The choice of well-performing tools for alignment and variant calling is crucial for the correct interpretation of exome sequencing data obtained from mixed samples, and common pipelines are suboptimal. 
We were able to relate observed substantial differences in performance to the underlying statistical models of the tools, and to pinpoint the error sources of false positive and false negative calls. These findings might inspire new software developments that improve exome sequencing pipelines and further the field of precision cancer treatment.
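
Editorial illustration: the abstract above reports that combining the ranks of variants across callers outperformed every single tool, measured as sensitivity at a fixed precision. The sketch below shows one simple form of rank aggregation (mean rank, with uncalled variants given the worst rank) together with a sensitivity-at-precision calculation. The aggregation rule, the caller names, and the toy scores are assumptions for this example; the paper's exact combination strategy is not given in the abstract.

```python
def ranks_from_scores(scores):
    """Map variant -> rank, where rank 1 is the highest-scoring variant.
    Ties are broken by variant name to keep the result deterministic."""
    ordered = sorted(scores, key=lambda v: (-scores[v], v))
    return {v: i + 1 for i, v in enumerate(ordered)}


def combine_ranks(caller_scores):
    """Order variants by mean rank across callers.
    A variant a caller did not report gets the worst possible rank
    (an assumption of this sketch)."""
    all_variants = set().union(*caller_scores.values())
    worst = len(all_variants) + 1
    per_caller = {c: ranks_from_scores(s) for c, s in caller_scores.items()}
    mean_rank = {
        v: sum(r.get(v, worst) for r in per_caller.values()) / len(per_caller)
        for v in all_variants
    }
    return sorted(all_variants, key=lambda v: (mean_rank[v], v))


def sensitivity_at_precision(ranked, truth, min_precision):
    """Highest sensitivity reached by any top-k prefix of the ranked list
    whose precision is at least `min_precision`."""
    best = 0.0
    true_positives = 0
    for k, variant in enumerate(ranked, start=1):
        true_positives += variant in truth
        if true_positives / k >= min_precision:
            best = max(best, true_positives / len(truth))
    return best


# Toy confidence scores from three hypothetical callers.
caller_scores = {
    "callerA": {"v1": 0.9, "v2": 0.8, "v3": 0.2},
    "callerB": {"v1": 0.7, "v4": 0.6, "v2": 0.5},
    "callerC": {"v2": 0.95, "v1": 0.9, "v5": 0.1},
}
truth_variants = {"v1", "v2", "v4"}

ranked = combine_ranks(caller_scores)
sens = sensitivity_at_precision(ranked, truth_variants, min_precision=0.9)
print(ranked, sens)
```

Working on ranks rather than raw scores avoids having to calibrate the callers' scoring scales against each other, which is one reason rank aggregation is a convenient ensemble choice.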


Subjects
Exome/genetics , Kidney Neoplasms/genetics , Algorithms , DNA, Neoplasm/chemistry , DNA, Neoplasm/metabolism , Genomics , High-Throughput Nucleotide Sequencing , Humans , Kidney Neoplasms/pathology , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
5.
BMC Genomics ; 18(1): 5, 2017 01 03.
Article in English | MEDLINE | ID: mdl-28049435

ABSTRACT

BACKGROUND: Detection of DNA mutations at very low allele fractions with high accuracy will significantly improve the effectiveness of precision medicine for cancer patients. To achieve this goal through next generation sequencing, researchers need a detection method that 1) captures rare mutation-containing DNA fragments efficiently in the mix of abundant wild-type DNA; 2) sequences the DNA library extensively to deep coverage; and 3) distinguishes low level true variants from amplification and sequencing errors with high accuracy. Targeted enrichment using PCR primers provides researchers with a convenient way to achieve deep sequencing for a small, yet most relevant region using benchtop sequencers. Molecular barcoding (or indexing) provides a unique solution for reducing sequencing artifacts analytically. Although different molecular barcoding schemes have been reported in recent literature, most variant calling has been done on limited targets, using simple custom scripts. The analytical performance of barcode-aware variant calling can be significantly improved by incorporating advanced statistical models. RESULTS: We present here a highly efficient, simple and scalable enrichment protocol that integrates molecular barcodes in multiplex PCR amplification. In addition, we developed smCounter, an open source, generic, barcode-aware variant caller based on a Bayesian probabilistic model. smCounter was optimized and benchmarked on two independent read sets with SNVs and indels at 5 and 1% allele fractions. Variants were called with very good sensitivity and specificity within coding regions. CONCLUSIONS: We demonstrated that we can accurately detect somatic mutations with allele fractions as low as 1% in coding regions using our enrichment protocol and variant caller.
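
Editorial illustration: the abstract above relies on molecular barcodes to separate true low-fraction variants from amplification and sequencing errors. The sketch below shows the basic idea with a plain majority-vote consensus per barcode family at a single position. It is not smCounter's Bayesian model; the function names, the minimum family size, and the 0.8 agreement threshold are assumptions made for this example.

```python
from collections import Counter, defaultdict


def consensus_per_barcode(reads, min_family_size=2, min_agreement=0.8):
    """Collapse reads sharing a barcode into one consensus base.

    `reads` is an iterable of (barcode, base) pairs observed at one
    genomic position. Families that are too small or too discordant
    are dropped (both thresholds are assumptions of this sketch).
    """
    families = defaultdict(list)
    for barcode, base in reads:
        families[barcode].append(base)

    consensus = {}
    for barcode, bases in families.items():
        if len(bases) < min_family_size:
            continue
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= min_agreement:
            consensus[barcode] = base
    return consensus


def allele_fractions(consensus):
    """Allele fractions counted over barcode families, not raw reads."""
    counts = Counter(consensus.values())
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()} if total else {}


# Toy pileup: the lone "T" inside family CCA is treated as an error,
# and the singleton family TTG is discarded.
reads = [
    ("AAC", "A"), ("AAC", "A"), ("AAC", "A"),
    ("GGT", "T"), ("GGT", "T"),
    ("CCA", "A"), ("CCA", "A"), ("CCA", "T"), ("CCA", "A"), ("CCA", "A"),
    ("TTG", "T"),
]
consensus = consensus_per_barcode(reads)
fractions = allele_fractions(consensus)
print(consensus, fractions)
```

Counting alleles per original molecule rather than per read is what removes PCR duplicates and isolated sequencing errors from the allele-fraction estimate.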


Subjects
Alleles , Base Sequence , DNA Barcoding, Taxonomic , Gene Frequency , Genetic Variation , Computational Biology/methods , Models, Statistical , Multiplex Polymerase Chain Reaction , Reproducibility of Results , Sensitivity and Specificity
6.
Front Genet ; 14: 1227176, 2023.
Article in English | MEDLINE | ID: mdl-37533432

ABSTRACT

Calling tandem repeat (TR) variants from DNA sequences is of both theoretical and practical significance. Some bioinformatics tools have been developed for detecting or genotyping TRs. However, few studies have addressed genotyping TR alleles from long-read sequencing data, and the accuracy of genotyping TR alleles from next-generation sequencing data still needs to be improved. Herein, a novel algorithm is described to retrieve TR regions from sequence alignments, and a software program, TRcaller, has been developed and integrated into a web portal to call TR alleles from both short- and long-read sequences, from both whole-genome and targeted sequences generated on multiple sequencing platforms. All TR alleles are genotyped as haplotypes and the robust alleles are reported, even when multiple alleles are present in a DNA mixture. TRcaller could provide substantially higher accuracy (>99% in 289 human individuals) in detecting TR alleles while running orders of magnitude faster (e.g., ∼2 s for 300x human sequence data) than the mainstream software tools. The web portal includes 119 preselected TR loci relevant to forensics, genealogy, and disease. TRcaller has been validated as scalable across applications such as DNA forensics and disease diagnosis, and could be extended to other fields such as breeding programs. Availability: TRcaller is available at https://www.trcaller.com/SignIn.aspx.

7.
Front Genet ; 13: 692257, 2022.
Article in English | MEDLINE | ID: mdl-35350246

ABSTRACT

Mitochondrial DNA (mtDNA) mutations contribute to human disease across a range of severity, from rare, highly penetrant mutations causal for monogenic disorders to mutations with milder contributions to phenotypes. mtDNA variation can exist in all copies of mtDNA or in a percentage of mtDNA copies and can be detected with levels as low as 1%. The large number of copies of mtDNA and the possibility of multiple alternative alleles at the same DNA nucleotide position make the task of identifying allelic variation in mtDNA very challenging. In recent years, specialized variant calling algorithms have been developed that are tailored to identify mtDNA variation from whole-genome sequencing (WGS) data. However, very few studies have systematically evaluated and compared these methods for the detection of both homoplasmy and heteroplasmy. A publicly available synthetic gold standard dataset was used to assess four mtDNA variant callers (Mutserve, mitoCaller, MitoSeek, and MToolBox), and the commonly used Genome Analysis Toolkit "best practices" pipeline, which is included in most current WGS pipelines. We also used WGS data from 126 trios and calculated the percentage of maternally inherited variants as a metric of calling accuracy, especially for homoplasmic variants. We additionally compared multiple pathogenicity prediction resources for mtDNA variants. Although the accuracy of homoplasmic variant detection was high for the majority of the callers with high concordance across callers, we found a very low concordance rate between mtDNA variant callers for heteroplasmic variants ranging from 2.8% to 3.6%, for heteroplasmy thresholds of 5% and 1%. Overall, Mutserve showed the best performance using the synthetic benchmark dataset. The analysis of mtDNA pathogenicity resources also showed low concordance in prediction results. 
We have shown that while homoplasmic variant calling is consistent between callers, there remains a significant discrepancy in heteroplasmic variant calling. We found that resources like population frequency databases and pathogenicity predictors are now available for variant annotation but still need refinement and improvement. With its peculiarities, the mitochondria require special considerations, and we advocate that caution needs to be taken when analyzing mtDNA data from WGS data.
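
Editorial illustration: the abstract above distinguishes homoplasmic from heteroplasmic variants and reports concordance between callers. The sketch below shows how a heteroplasmy level can be derived from read counts and how a simple set-based concordance can be computed. The 0.95 homoplasmy cut-off, the intersection-over-union definition of concordance, and the example variants are assumptions for this example; the abstract does not state how the study defined concordance.

```python
def classify(alt_reads, depth, het_threshold=0.01, homo_threshold=0.95):
    """Classify one mtDNA site from its alternate-allele read fraction.
    Both thresholds are assumptions of this sketch."""
    if depth == 0:
        return "no_call"
    level = alt_reads / depth
    if level >= homo_threshold:
        return "homoplasmic"
    if level >= het_threshold:
        return "heteroplasmic"
    return "reference"


def concordance(call_sets):
    """Fraction of all reported variants that every caller reported
    (intersection over union across callers)."""
    sets = [set(s) for s in call_sets]
    union = set().union(*sets)
    if not union:
        return 0.0
    shared = set.intersection(*sets)
    return len(shared) / len(union)


# Toy call sets from three hypothetical callers.
caller_a = {"m.73A>G", "m.263A>G", "m.3243A>G"}
caller_b = {"m.73A>G", "m.263A>G", "m.8993T>G"}
caller_c = {"m.73A>G", "m.263A>G"}

agreement = concordance([caller_a, caller_b, caller_c])
labels = [classify(30, 1000), classify(990, 1000), classify(5, 1000)]
print(agreement, labels)
```

Lowering `het_threshold` admits more low-level calls, which is where sequencing noise and caller-specific filters tend to diverge most.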

8.
Front Genet ; 13: 1096797, 2022.
Article in English | MEDLINE | ID: mdl-36685885

ABSTRACT

Many bioinformatics tools for detecting structural variants from sequencing data have been released during the past decade. For a data analyst, a natural question is which tool best fits the data at hand. This study therefore presents an automatic tool recommendation method to facilitate data analysis. Given a sequencing dataset, the method recommends the optimal variant calling tool from a set of state-of-the-art bioinformatics tools. The recommendation method was implemented under a meta-learning framework that identifies the relationships between data features and the performance of tools. First, meta-features were extracted to characterize the sequencing data and meta-targets were identified to pinpoint the optimal caller for the sequencing data. Second, a meta-model was constructed to bridge the meta-features and meta-targets. Finally, the recommendation was made according to the evaluation from the meta-model. A series of experiments was conducted to validate this recommendation method on both simulated and real sequencing data. The results revealed that different SV callers often fit different sequencing data. The recommendation accuracy averaged more than 80% across all experimental configurations, outperforming the random- and fixed-pick strategies. To further facilitate the research community, we incorporated the recommendation method into an online cloud service for genomic data analysis, which is available at https://c.solargenomics.com/ via a simple registration. In addition, the source code and a pre-trained model are available at https://github.com/hello-json/CallerRecommendation for academic use only.
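
Editorial illustration: the abstract above describes a meta-learning pipeline in which meta-features of a dataset are mapped to the best-performing caller. The sketch below stands in for the meta-model with a one-nearest-neighbour lookup. The choice of nearest neighbour, the two meta-features (depth and a purity-like value), and the caller names are all hypothetical; the abstract does not specify the actual features or model.

```python
import math


def recommend(meta_features, training):
    """Return the caller that performed best on the most similar
    previously benchmarked dataset.

    `training` is a list of (feature_vector, best_caller) pairs, i.e.
    meta-features paired with their meta-target. Features are assumed
    to be on comparable scales already.
    """
    _, best_caller = min(
        training, key=lambda item: math.dist(item[0], meta_features)
    )
    return best_caller


# Hypothetical benchmark history: (coverage depth, purity-like score).
training = [
    ((30.0, 0.5), "callerA"),
    ((10.0, 0.1), "callerB"),
    ((60.0, 0.9), "callerC"),
]

choice = recommend((12.0, 0.2), training)
print(choice)
```

In practice the features would need normalising so that a wide-ranging quantity such as depth does not dominate the distance, and a trained classifier would replace the lookup.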

9.
Comput Struct Biotechnol J ; 19: 343-354, 2021.
Article in English | MEDLINE | ID: mdl-33489004

ABSTRACT

Single cell genomics offers an unprecedented resolution to interrogate genetic heterogeneity in a patient's tumour at the intercellular level. However, the DNA yield per cell is insufficient for today's sequencing library preparation protocols. This necessitates DNA amplification which is a key source of experimental noise. We provide an evaluation of two protocols using micro-fluidics based amplification for whole exome sequencing, which is an experimental scenario commonly used in single cell genomics. The results highlight their respective biases and relative strengths in identification of single nucleotide variations. Towards this end, we introduce a workflow SoVaTSiC, which allows for quality evaluation and somatic variant identification of single cell data. As proof of concept, the framework was applied to study a lung adenocarcinoma tumour. The analysis provides insights into tumour phylogeny by identifying key mutational events in lung adenocarcinoma evolution. The consequence of this inference is supported by the histology of the tumour and demonstrates usefulness of the approach.

10.
Oncotarget ; 7(48): 79485-79493, 2016 Nov 29.
Article in English | MEDLINE | ID: mdl-27825131

ABSTRACT

Highlighting tumoral mutations is a key step in oncology for personalizing care. Considering the genetic heterogeneity in a tumor, software used for detecting mutations should clearly distinguish real tumor events of interest that could be predictive markers for personalized medicine from false positives. OutLyzer is a new variant caller designed for the specific and sensitive detection of mutations for research and diagnostic purposes. It is based on a statistical, local evaluation of sequencing background noise to highlight potential true positive variants. 130 previously genotyped patients were sequenced after enrichment by capturing the exons of 22 genes. Sequencing data were analyzed by HaplotypeCaller, LofreqStar, Varscan2 and OutLyzer. OutLyzer had the best sensitivity and specificity with a fixed limit of detection for all tools of 1% for SNVs and 2% for indels. OutLyzer is a useful tool for detecting mutations of interest in tumors, including low allele-frequency mutations, and could be adopted in standard practice for delivering targeted therapies in cancer treatment.


Subjects
Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Mutation , Neoplasms/genetics , Sequence Analysis, DNA/methods , Exons , Gene Frequency , Genotype , Humans , Precision Medicine , Software