Search | BVS Infectious and Parasitic Diseases

Clair3-trio: high-performance Nanopore long-read variant calling in family trios with trio-to-trio deep neural networks.

Su, Junhao; Zheng, Zhenxian; Ahmed, Syed Shakeel; Lam, Tak-Wah; Luo, Ruibang.

Brief Bioinform ; 23(5), 2022 Sep 20.

Article in English | MEDLINE | ID: mdl-35849103

ABSTRACT

Accurate identification of genetic variants from family child-mother-father trio sequencing data is important in genomics. However, state-of-the-art approaches treat variant calling from trios as three independent tasks, which limits their calling accuracy for Nanopore long-read sequencing data. For better trio variant calling, we introduce Clair3-Trio, the first variant caller tailored for family trio data from Nanopore long reads. Clair3-Trio employs a Trio-to-Trio deep neural network model, which allows it to input the trio sequencing information and output all of the trio's predicted variants within a single model to improve variant calling. We also present MCVLoss, a novel loss function tailor-made for variant calling in trios, leveraging the explicit encoding of Mendelian inheritance. Clair3-Trio showed comprehensive improvement in experiments. It predicted far fewer variants that violate Mendelian inheritance than current state-of-the-art methods. We also demonstrated that our Trio-to-Trio model is more accurate than competing architectures. Clair3-Trio is accessible as a free, open-source project at https://github.com/HKU-BAL/Clair3-Trio.
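The Mendelian inheritance violations mentioned in this abstract can be illustrated with a small check. This is a minimal sketch only, not Clair3-Trio's implementation and not MCVLoss: it assumes diploid autosomal genotypes given as pairs of allele indices (as in a VCF GT field), ignores de novo mutations, and uses hypothetical function names.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """True if the child's genotype can be built from one maternal and one paternal allele."""
    target = sorted(child)
    return any(sorted((m, f)) == target for m, f in product(mother, father))


def count_violations(trio_genotypes):
    """Count sites whose (child, mother, father) genotypes break Mendelian inheritance."""
    return sum(
        not is_mendelian_consistent(child, mother, father)
        for child, mother, father in trio_genotypes
    )


# Toy, made-up genotypes: 0 = reference allele, 1 = alternate allele.
sites = [
    ((0, 1), (0, 0), (0, 1)),  # consistent: 0 from mother, 1 from father
    ((1, 1), (0, 0), (0, 1)),  # violation: mother cannot pass allele 1
    ((0, 0), (1, 1), (0, 0)),  # violation: mother must pass allele 1
]
print(count_violations(sites))
```

A caller that genotypes each family member independently has no way to rule out combinations like the second and third sites, which is the motivation the abstract gives for modelling the trio jointly.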

Subjects

Nanopores , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Neural Networks, Computer , Sequence Analysis, DNA , Software

Boosting variant-calling performance with multi-platform sequencing data using Clair3-MP.

Yu, Huijing; Zheng, Zhenxian; Su, Junhao; Lam, Tak-Wah; Luo, Ruibang.

BMC Bioinformatics ; 24(1): 308, 2023 Aug 03.

Article in English | MEDLINE | ID: mdl-37537536

ABSTRACT

BACKGROUND: With the continuous advances in third-generation sequencing technology and the increasing affordability of next-generation sequencing technology, sequencing data from different sequencing technology platforms is becoming more common. While numerous benchmarking studies have been conducted to compare variant-calling performance across different platforms and approaches, little attention has been paid to the potential of leveraging the strengths of different platforms to optimize overall performance, especially integrating Oxford Nanopore and Illumina sequencing data. RESULTS: We investigated the impact of multi-platform data on the performance of variant calling through carefully designed experiments with a deep learning-based variant caller named Clair3-MP (Multi-Platform). Through our research, we not only demonstrated the capability of ONT-Illumina data for improved variant calling, but also identified the optimal scenarios for utilizing ONT-Illumina data. In addition, we revealed that the improvement in variant calling using ONT-Illumina data comes from an improvement in difficult genomic regions, such as the large low-complexity regions and segmental and collapse duplication regions. Moreover, Clair3-MP can incorporate reference genome stratification information to achieve a small but measurable improvement in variant calling. Clair3-MP is accessible as an open-source project at: https://github.com/HKU-BAL/Clair3-MP . CONCLUSIONS: These insights have important implications for researchers and practitioners alike, providing valuable guidance for improving the reliability and efficiency of genomic analysis in diverse applications.

Subjects

Genome , Genomics , Reproducibility of Results , High-Throughput Nucleotide Sequencing

ClusterV-Web: a user-friendly tool for profiling HIV quasispecies and generating drug resistance reports from nanopore long-read data.

Su, Junhao; Li, Shumin; Zheng, Zhenxian; Lam, Tak-Wah; Luo, Ruibang.

Bioinform Adv ; 4(1): vbae006, 2024.

Article in English | MEDLINE | ID: mdl-38282975

ABSTRACT

Summary: Third-generation long-read sequencing is an increasingly utilized technique for profiling human immunodeficiency virus (HIV) quasispecies and detecting drug resistance mutations due to its ability to cover the entire viral genome in individual reads. Recently, the ClusterV tool has demonstrated accurate detection of HIV quasispecies from Nanopore long-read sequencing data. However, the need for scripting skills and a computational environment may act as a barrier for many potential users. To address this issue, we have introduced ClusterV-Web, a user-friendly web-based application that enables easy configuration and execution of ClusterV, both remotely and locally. Our tool provides interactive tables and data visualizations to aid in the interpretation of results. This development is expected to democratize access to long-read sequencing data analysis, enabling a wider range of researchers and clinicians to efficiently profile HIV quasispecies and detect drug resistance mutations. Availability and implementation: ClusterV-Web is freely available and open source, with detailed documentation accessible at http://www.bio8.cs.hku.hk/ClusterVW/. The standalone Docker image and source code are also available at https://github.com/HKU-BAL/ClusterV-Web.

Evaluation of Mycobacterium tuberculosis enrichment in metagenomic samples using ONT adaptive sequencing and amplicon sequencing for identification and variant calling.

Su, Junhao; Lui, Wui Wang; Lee, YanLam; Zheng, Zhenxian; Siu, Gilman Kit-Hang; Ng, Timothy Ting-Leung; Zhang, Tong; Lam, Tommy Tsan-Yuk; Lao, Hiu-Yin; Yam, Wing-Cheong; Tam, Kingsley King-Gee; Leung, Kenneth Siu-Sing; Lam, Tak-Wah; Leung, Amy Wing-Sze; Luo, Ruibang.

Sci Rep ; 13(1): 5237, 2023 Mar 31.

Article in English | MEDLINE | ID: mdl-37002338

ABSTRACT

Sensitive detection of Mycobacterium tuberculosis (TB) in small percentages in metagenomic samples is essential for microbial classification and drug resistance prediction. However, traditional methods, such as bacterial culture and microscopy, are time-consuming and sometimes have limited TB detection sensitivity. Oxford Nanopore Technologies (ONT) MinION sequencing allows rapid and simple sample preparation for sequencing. Its recently developed adaptive sequencing selects reads from targets while allowing real-time base-calling to achieve sequence enrichment or depletion during sequencing. Another common enrichment method is PCR amplification of the target TB genes. In this study, we compared both methods using ONT MinION sequencing for TB detection and variant calling in metagenomic samples, using simulation runs as well as runs with synthetic and patient samples. We found that both methods effectively enrich TB reads from a high percentage of human (95%) and other microbial DNA. Adaptive sequencing with readfish and UNCALLED achieved a 3.9-fold and 2.2-fold enrichment, respectively, compared to the control run. We provide a simple automatic analysis framework to support the detection of TB for clinical use, openly available at https://github.com/HKU-BAL/ONT-TB-NF . Depending on the patient's medical condition and sample type, we recommend users evaluate and optimize their workflow for different clinical specimens to improve the detection limit.
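The fold-enrichment figures reported in this abstract can be read as a ratio of target-read fractions. The abstract does not state the exact formula used, so the sketch below rests on an assumption: fold enrichment is the fraction of on-target reads in the enriched run divided by the same fraction in the control run. The function names and the read counts are hypothetical and are not taken from the study.

```python
def target_fraction(target_reads, total_reads):
    """Fraction of reads in a run that map to the target organism."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return target_reads / total_reads


def fold_enrichment(enriched_target, enriched_total, control_target, control_total):
    """Ratio of the on-target fraction in the enriched run to that in the control run."""
    control = target_fraction(control_target, control_total)
    if control == 0:
        raise ValueError("control run has no target reads; fold enrichment is undefined")
    return target_fraction(enriched_target, enriched_total) / control


# Made-up counts: 200 of 10,000 reads on target with enrichment, 50 of 10,000 without.
print(fold_enrichment(200, 10_000, 50, 10_000))
```

Normalizing by total reads in each run keeps the comparison meaningful when the enriched and control runs differ in yield.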

Subjects

Mycobacterium tuberculosis , Nanopores , Humans , Mycobacterium tuberculosis/genetics , High-Throughput Nucleotide Sequencing/methods , Metagenomics/methods , Metagenome , Computer Simulation , Sequence Analysis, DNA

Symphonizing pileup and full-alignment for deep learning-based long-read variant calling.

Zheng, Zhenxian; Li, Shumin; Su, Junhao; Leung, Amy Wing-Sze; Lam, Tak-Wah; Luo, Ruibang.

Nat Comput Sci ; 2(12): 797-803, 2022 Dec.

Article in English | MEDLINE | ID: mdl-38177392

ABSTRACT

Deep learning-based variant callers are becoming the standard and have achieved superior single-nucleotide polymorphism calling performance using long reads. Here we present Clair3, which leverages two major method categories: pileup calling handles most variant candidates with speed, and full-alignment calling tackles complicated candidates to maximize precision and recall. Clair3 runs faster than any of the other state-of-the-art variant callers and demonstrates improved performance, especially at lower coverage.
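The division of labour described in this abstract, a fast pileup pass for most candidates and a slower full-alignment pass for the complicated ones, can be sketched as a two-stage dispatcher. This is a hedged illustration based only on the abstract: the quality threshold, the callable signatures, and the toy callers are assumptions, not Clair3's actual interface or selection rule.

```python
def call_variants(candidates, pileup_call, full_alignment_call, min_quality=20.0):
    """Run the fast caller on every candidate; re-call low-quality ones with the slow caller.

    Both callers are assumed to take a candidate and return (genotype, quality).
    """
    results = {}
    for candidate in candidates:
        genotype, quality = pileup_call(candidate)
        if quality < min_quality:
            # Treated as a "complicated" candidate: defer to the slower model.
            genotype, quality = full_alignment_call(candidate)
        results[candidate] = (genotype, quality)
    return results


# Toy stand-ins for the two models (hypothetical values, for illustration only).
def toy_pileup(position):
    return ("0/1", 35.0) if position % 2 == 0 else ("0/1", 8.0)


def toy_full_alignment(position):
    return ("1/1", 30.0)


print(call_variants([100, 101], toy_pileup, toy_full_alignment))
```

The design keeps the expensive model off the common path, which is consistent with the speed claim in the abstract, while still letting it decide the hard cases.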

Subjects

Deep Learning , High-Throughput Nucleotide Sequencing/methods , Polymorphism, Single Nucleotide/genetics

ECNano: A cost-effective workflow for target enrichment sequencing and accurate variant calling on 4800 clinically significant genes using a single MinION flowcell.

Leung, Amy Wing-Sze; Leung, Henry Chi-Ming; Wong, Chak-Lim; Zheng, Zhen-Xian; Lui, Wui-Wang; Luk, Ho-Ming; Lo, Ivan Fai-Man; Luo, Ruibang; Lam, Tak-Wah.

BMC Med Genomics ; 15(1): 43, 2022 Mar 04.

Article in English | MEDLINE | ID: mdl-35246132

ABSTRACT

BACKGROUND: The application of long-read sequencing using the Oxford Nanopore Technologies (ONT) MinION sequencer is becoming more diverse in the medical field. However, the high sequencing error rate of ONT and the limited throughput of a single MinION flowcell limit its applicability for accurate variant detection. Medical exome sequencing (MES) targets clinically significant exon regions, allowing rapid and comprehensive screening of pathogenic variants. By applying MES with MinION sequencing, the technology can achieve a more uniform capture of the target regions, shorter turnaround time, and lower sequencing cost per sample. METHOD: We introduced a cost-effective optimized workflow, ECNano, comprising a wet-lab protocol and bioinformatics analysis, for accurate variant detection at 4800 clinically important genes and regions using a single MinION flowcell. The ECNano wet-lab protocol was optimized to perform long-read target enrichment and ONT library preparation to stably generate high-quality MES data with adequate coverage. The subsequent variant-calling workflow, Clair-ensemble, adopted a fast RNN-based variant caller, Clair, and was optimized for target enrichment data. To evaluate its performance and practicality, ECNano was tested on both reference DNA samples and patient samples. RESULTS: ECNano achieved deep on-target depth of coverage (DoC), averaging > 100×, and > 98% uniformity using one MinION flowcell. For accurate ONT variant calling, the generated reads sufficiently covered 98.9% of pathogenic positions listed in ClinVar, with 98.96% having at least 30× DoC. ECNano obtained an average read length of 1000 bp. The long reads of ECNano also covered the adjacent splice sites well, with 98.5% of positions having ≥ 30× DoC. Clair-ensemble achieved > 99% recall and accuracy for SNV calling. The whole workflow from wet-lab protocol to variant detection was completed within three days. CONCLUSION: We presented ECNano, an out-of-the-box workflow comprising (1) a wet-lab protocol for ONT target enrichment sequencing and (2) a downstream variant detection workflow, Clair-ensemble. The workflow is cost-effective, with a short turnaround time for high-accuracy variant calling in 4800 clinically significant genes and regions using a single MinION flowcell. The long-read exon-captured data has potential for further development, promoting the application of long-read sequencing in personalized disease treatment and risk prediction.
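The coverage figures in this abstract (average DoC and the share of positions with at least 30× DoC) are simple summaries of per-position depth. The sketch below shows one way to compute them, assuming a plain list of integer depths over the target positions; the function name, the input format, and the example depths are hypothetical and not from the study. The abstract's "uniformity" metric is not defined there, so it is not reproduced here.

```python
def coverage_summary(depths, min_depth=30):
    """Mean depth and fraction of positions covered at or above min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return {
        "mean_depth": sum(depths) / len(depths),
        "fraction_at_min_depth": covered / len(depths),
    }


# Made-up per-position depths over four target positions.
print(coverage_summary([120, 95, 30, 12]))
```

In practice the depth list would come from a per-base depth report restricted to the target regions.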

Subjects

High-Throughput Nucleotide Sequencing , Nanopores , Cost-Benefit Analysis , High-Throughput Nucleotide Sequencing/methods , Humans , Sequence Analysis, DNA/methods , Workflow

Applications and potentials of nanopore sequencing in the (epi)genome and (epi)transcriptome era.

Xie, Shangqian; Leung, Amy Wing-Sze; Zheng, Zhenxian; Zhang, Dake; Xiao, Chuanle; Luo, Ruibang; Luo, Ming; Zhang, Shoudong.

Innovation (Camb) ; 2(4): 100153, 2021 Nov 28.

Article in English | MEDLINE | ID: mdl-34901902

ABSTRACT

The Human Genome Project opened an era of (epi)genomic research, and also provided a platform for the development of new sequencing technologies. During and after the project, several sequencing technologies have continued to dominate nucleic acid sequencing markets. Currently, Illumina (short-read), PacBio (long-read), and Oxford Nanopore (long-read) are the most popular sequencing technologies. Unlike PacBio or the popular short-read sequencers before it, which, as examples of the second or so-called Next-Generation Sequencing platforms, need to synthesize when sequencing, nanopore technology directly sequences native DNA and RNA molecules. Nanopore sequencing, therefore, avoids converting mRNA into cDNA molecules, which not only allows for the sequencing of extremely long native DNA and full-length RNA molecules but also documents modifications that have been made to those native DNA or RNA bases. In this review on direct DNA sequencing and direct RNA sequencing using Oxford Nanopore technology, we focus on their development and application achievements, discussing their challenges and future perspectives. We also address the problems researchers may encounter when applying these approaches in their research topics, and how to resolve them.
