Search | Biblioteca Virtual em Saúde Fiocruz

GPMeta: a GPU-accelerated method for ultrarapid pathogen identification from metagenomic sequences.

Wang, Xuebin; Wang, Taifu; Xie, Zhihao; Zhang, Youjin; Xia, Shiqiang; Sun, Ruixue; He, Xinqiu; Xiang, Ruizhi; Zheng, Qiwen; Liu, Zhencheng; Wang, Jin'An; Wu, Honglong; Jin, Xiangqian; Chen, Weijun; Li, Dongfang; He, Zengquan.

Brief Bioinform ; 24(2), 2023 Mar 19.

Article in English | MEDLINE | ID: mdl-36917170

ABSTRACT

Metagenomic sequencing (mNGS) is a powerful diagnostic tool for detecting causative pathogens in clinical microbiological testing owing to its unbiasedness and substantially reduced costs. Rapid and accurate classification of metagenomic sequences is a critical procedure for pathogen identification in the dry-lab step of an mNGS test. However, clinical use of the technology is hampered by the challenge of classifying sequences within a clinically relevant timeframe. Here, we present GPMeta, a novel GPU-accelerated approach to ultrarapid pathogen identification from complex mNGS data, allowing users to bypass this limitation. Using mock microbial community datasets and public real metagenomic sequencing datasets from clinical samples, we show that GPMeta has not only higher accuracy but also significantly higher speed than existing state-of-the-art tools such as Bowtie2, BWA, Kraken2 and Centrifuge. Furthermore, GPMeta offers the GPMetaC clustering algorithm, a statistical model for clustering and rescoring ambiguous alignments to improve the discrimination of highly homologous sequences from microbial genomes with average nucleotide identity >95%. GPMetaC exhibits higher precision and recall than the other tools. These results underline the potential role of GPMeta in the development of mNGS tests for infectious diseases that require rapid turnaround times. Further study will determine how best to integrate GPMeta into routine clinical practice. GPMeta is freely accessible to non-commercial users at https://github.com/Bgi-LUSH/GPMeta.

Subjects

Metagenome, Microbiota, High-Throughput Nucleotide Sequencing/methods, Metagenomics/methods, Sensitivity and Specificity

Fast and accurate DNASeq variant calling workflow composed of LUSH toolkit.

Wang, Taifu; Zhang, Youjin; Wang, Haoling; Zheng, Qiwen; Yang, Jiaobo; Zhang, Tiefeng; Sun, Geng; Liu, Weicong; Yin, Longhui; He, Xinqiu; You, Rui; Wang, Chu; Liu, Zhencheng; Liu, Zhijian; Wang, Jin'an; Jin, Xiangqian; He, Zengquan.

Hum Genomics ; 18(1): 114, 2024 Oct 10.

Article in English | MEDLINE | ID: mdl-39390620

ABSTRACT

BACKGROUND: Whole genome sequencing (WGS) is becoming increasingly prevalent for molecular diagnosis, staging and prognosis because of its declining costs and the ability to detect nearly all genes associated with a patient's disease. The currently widely accepted variant calling pipeline, GATK, is limited in its computational speed and efficiency and cannot meet growing analysis needs. RESULTS: Here, we propose a fast and accurate DNASeq variant calling workflow composed entirely of tools from the LUSH toolkit. The precision and recall measurements indicate that the LUSH and GATK pipelines are highly consistent, with precision and recall rates exceeding 99% on the 30x NA12878 dataset. In terms of processing speed, the LUSH pipeline outperforms the GATK pipeline, completing 30x WGS data analysis in just 1.6 h, which is approximately 17 times faster than GATK. Notably, the LUSH_HC tool completes the processing from BAM to VCF in just 12 min, which is around 76 times faster than GATK. CONCLUSION: These findings suggest that the LUSH pipeline is a highly promising alternative to the GATK pipeline for WGS data analysis, with the potential to significantly improve bedside analysis of acutely ill patients, large-scale cohort data analysis, and high-throughput variant calling in crop breeding programs. Furthermore, the LUSH pipeline is highly scalable and easily deployable, allowing it to be readily applied to various scenarios such as clinical diagnosis and genomic research.

Subjects

Software, Whole Genome Sequencing, Workflow, Humans, Whole Genome Sequencing/methods, Human Genome/genetics, High-Throughput Nucleotide Sequencing/methods, Single Nucleotide Polymorphism/genetics, Computational Biology/methods
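
The speed figures quoted in the LUSH abstract imply approximate GATK runtimes. A minimal sketch of that arithmetic follows, assuming the reported speedups are simple ratios of wall-clock times; the function name `implied_baseline` is illustrative and not part of the LUSH toolkit.

```python
def implied_baseline(fast_time: float, speedup: float) -> float:
    """Return the slower tool's runtime implied by a speedup ratio."""
    return fast_time * speedup


# Figures quoted in the abstract: 1.6 h at ~17x, and 12 min at ~76x.
gatk_pipeline_hours = implied_baseline(1.6, 17)  # roughly 27 h
gatk_hc_minutes = implied_baseline(12, 76)       # roughly 912 min, about 15 h

print(f"Implied GATK full pipeline: ~{gatk_pipeline_hours:.1f} h")
print(f"Implied GATK BAM-to-VCF step: ~{gatk_hc_minutes / 60:.1f} h")
```

Because the abstract gives rounded ratios, these values are rough estimates rather than reported measurements.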

Single-cell RNA-seq reveals distinct dynamic behavior of sex chromosomes during early human embryogenesis.

Zhou, Qing; Wang, Taifu; Leng, Lizhi; Zheng, Wei; Huang, Jinrong; Fang, Fang; Yang, Ling; Chen, Fang; Lin, Ge; Wang, Wen-Jing; Kristiansen, Karsten.

Mol Reprod Dev ; 86(7): 871-882, 2019 Jul.

Article in English | MEDLINE | ID: mdl-31094050

ABSTRACT

Several animal and human studies have demonstrated that sex affects kinetics and metabolism during early embryo development. However, the mechanism governing these differences at the molecular level before the expression of the sex-determining gene SRY is unknown. We performed a systematic profiling of gene expression comparing male and female embryos using available single-cell RNA-sequencing data of 1607 individual cells from 99 human preimplantation embryos, covering development stages from 4-cell to late blastocyst. We observed consistent chromosome-wide transcription of autosomes, whereas expression from sex chromosomes exhibits significant differences after embryonic genome activation (EGA). Activation of the Y chromosome is initiated by expression of two genes, RPS4Y1 and DDX3Y, whereas the X chromosome is widely activated, with both copies in females being activated after EGA. In contrast to the stable activation of the Y chromosome, expression of X-linked genes in females declines at the late blastocyst stage, especially in trophectoderm cells, revealing a rapid process of dosage compensation. This dynamic behavior results in a dosage imbalance between male and female embryos, which influences genes involved in cell cycle, protein translation and metabolism. Our results reveal the dynamics of sex chromosomes expression and silencing during early embryogenesis. Studying sex differences during human embryogenesis, as well as understanding the process of X chromosome inactivation and their effects on the sex bias development of in vitro fertilized embryos, will expand the capabilities of assisted reproductive technology and possibly improve the treatment of infertility and enhance reproductive health.

Subjects

Blastocyst, Human X Chromosomes/genetics, Human Y Chromosomes/genetics, Embryonic Development/genetics, RNA-Seq, Single-Cell Analysis/methods, Female, In Vitro Fertilization/methods, Developmental Gene Expression Regulation, X-Linked Genes, Human Genome/genetics, Humans, Kinetics, Male, Sex Characteristics, Transcriptome, X Chromosome Inactivation

Evaluating the Effects of Storage Conditions on Multiple Cell-Free RNAs in Plasma by High-Throughput Sequencing.

Sun, Jinghua; Yang, Xi; Wang, Taifu; Xing, Yanru; Chen, Haixiao; Zhu, Sujun; Zeng, Juan; Zhou, Qing; Chen, Fang; Zhang, Xiuqing; Wang, Wen-Jing.

Biopreserv Biobank ; 21(3): 242-254, 2023 Jun.

Article in English | MEDLINE | ID: mdl-36006659

ABSTRACT

Background: Plasma cell-free RNAs (cfRNAs) can serve as noninvasive biomarkers for the diagnosis and monitoring of diseases. However, the delay in blood processing may lead to unreliable results. Therefore, an unbiased evaluation based on the whole transcriptome under different storage conditions is needed. Methods: Here, blood samples were collected in ethylenediaminetetraacetic acid tubes and processed immediately (0 hour), or stored at room temperature (RT) or 4°C for different time intervals (2, 6, and 24 hours) before plasma separation. High-throughput sequencing was applied to assess the effects of storage conditions on the transcript profiles and fragment characteristics of plasma cell-free mRNA, long noncoding RNA (lncRNA), and small RNAs. Results: More genes changed their expression levels with time when blood was stored at RT compared with those at 4°C. Cell-free mRNA and lncRNA were relatively stable in blood preserved at 4°C for 6 hours, while cell-free microRNA (miRNA) and piwi-interacting RNA (piRNA) remained stable at 4°C for 24 hours. After 24 hours, more contamination of the leukocyte-derived RNAs occurred at RT, possibly due to apoptosis. Meanwhile, significant changes were also observed regarding the characteristics of the RNA fragments, including fragment size, the proportion of intron, and the pyrimidine frequency of the fragmented 3' end. Fifteen tissue-enriched genes were detected in the plasma but not expressed in leukocytes. The expression level and fragment length of these genes gradually decreased during storage, suggesting the degradation of the cfRNA and the dilution of leukocyte-derived RNA with other tissue-derived cfRNA. Conclusions: Our results suggest that the contamination of leukocyte-derived RNA and the degradation of original cfRNA contribute to the changes in the cfRNA expression profiles and the fragment characteristics during short-term storage. The storage of blood at 4°C for 6 hours allows plasma cfRNA to remain relatively stable, which will be useful for further studies or clinical applications where adequate quantification or the fragment signature of cfRNA is required.

Subjects

Cell-Free Nucleic Acids, Long Noncoding RNA, Cell-Free Nucleic Acids/genetics, Long Noncoding RNA/genetics, Messenger RNA, Blood Specimen Collection/methods, Piwi-Interacting RNA, High-Throughput Nucleotide Sequencing

Polyadenylation ligation-mediated sequencing (PALM-Seq) characterizes cell-free coding and non-coding RNAs in human biofluids.

Liu, Zhongzhen; Wang, Taifu; Yang, Xi; Zhou, Qing; Zhu, Sujun; Zeng, Juan; Chen, Haixiao; Sun, Jinghua; Li, Liqiang; Xu, Jinjin; Geng, Chunyu; Xu, Xun; Wang, Jian; Yang, Huanming; Zhu, Shida; Chen, Fang; Wang, Wen-Jing.

Clin Transl Med ; 12(7): e987, 2022 Jul.

Article in English | MEDLINE | ID: mdl-35858042

ABSTRACT

BACKGROUND: Cell-free messenger RNA (cf-mRNA) and long non-coding RNA (cf-lncRNA) are becoming increasingly important in liquid biopsy by providing biomarkers for disease prediction, diagnosis and prognosis, but the simultaneous characterization of coding and non-coding RNAs in human biofluids remains challenging. METHODS: Here, we developed polyadenylation ligation-mediated sequencing (PALM-Seq), an RNA sequencing strategy employing treatment of RNA with T4 polynucleotide kinase to generate cell-free RNA (cfRNA) fragments with 5' phosphate and 3' hydroxyl and RNase H to deplete abundant RNAs, achieving simultaneous quantification and characterization of cfRNAs. RESULTS: Using PALM-Seq, we successfully identified well-known differentially abundant mRNA, lncRNA and microRNA in the blood plasma of pregnant women. We further characterized cfRNAs in blood plasma, saliva, urine, seminal plasma and amniotic fluid and found that the detected numbers of different RNA biotypes varied with body fluids. The profiles of cf-mRNA reflected the function of originated tissues, and immune cells significantly contributed RNA to blood plasma and saliva. Short fragments (<50 nt) of mRNA and lncRNA were major in biofluids, whereas seminal plasma and amniotic fluid tended to retain long RNA. Body fluids showed distinct preferences of pyrimidine at the 3' end and adenine at the 5' end of cf-mRNA and cf-lncRNA, which were correlated with the proportions of short fragments. CONCLUSION: Together, PALM-Seq enables a simultaneous characterization of cf-mRNA and cf-lncRNA, contributing to elucidating the biology and promoting the application of cfRNAs.

Subjects

Cell-Free Nucleic Acids, MicroRNAs, Long Noncoding RNA, Cell-Free Nucleic Acids/genetics, Female, Humans, MicroRNAs/genetics, Polyadenylation/genetics, Pregnancy, Long Noncoding RNA/genetics, Messenger RNA/genetics, RNA Sequence Analysis

CNV-P: a machine-learning framework for predicting high confident copy number variations.

Wang, Taifu; Sun, Jinghua; Zhang, Xiuqing; Wang, Wen-Jing; Zhou, Qing.

PeerJ ; 9: e12564, 2021.

Article in English | MEDLINE | ID: mdl-34917425

ABSTRACT

BACKGROUND: Copy-number variants (CNVs) have been recognized as one of the major causes of genetic disorders. Reliable detection of CNVs from genome sequencing data is in strong demand for disease research. However, current software for detecting CNVs has high false-positive rates and needs further improvement. METHODS: Here, we proposed a novel post-processing approach for CNV prediction (CNV-P), a machine-learning framework that could efficiently remove false-positive fragments from the results of CNV detection tools. A series of CNV signals such as read depth (RD), split reads (SR) and read pair (RP) around the putative CNV fragments were defined as features to train a classifier. RESULTS: The prediction results on several real biological datasets showed that our models could accurately classify the CNVs at over 90% precision and 85% recall, which greatly improves the performance of state-of-the-art algorithms. Furthermore, our results indicate that CNV-P is robust to different CNV sizes and sequencing platforms. CONCLUSIONS: Our framework for classifying high-confidence CNVs could improve both basic research and clinical diagnosis of genetic diseases.
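
The precision and recall figures in the CNV-P abstract can be related to a caller's raw output as follows. This is a minimal sketch using invented counts, not data from the paper, and `precision_recall` is an illustrative helper rather than part of CNV-P.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Compute precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


# Hypothetical truth set of 650 real CNVs (assumed for illustration only).
TRUE_CNVS = 650

# Hypothetical raw caller output: 600 real CNVs plus 400 false positives.
raw = precision_recall(tp=600, fp=400, fn=TRUE_CNVS - 600)

# Hypothetical post-processing filter: keeps 560 real CNVs and 40 false ones.
filtered = precision_recall(tp=560, fp=40, fn=TRUE_CNVS - 560)

print(f"raw:      precision={raw[0]:.3f} recall={raw[1]:.3f}")
print(f"filtered: precision={filtered[0]:.3f} recall={filtered[1]:.3f}")
```

In this made-up scenario the filter gives up a little recall for a large gain in precision, which is the trade-off a false-positive filter aims for.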

First near complete mitochondrial genome of large toothed toad, Oreolalax major (Anura: Megophryidae) from southwest China.

Liu, Fuwen; Liu, Yusong; Li, Yan; Ni, Qingyong; Yao, Yongfang; Xu, Huailiang; Yang, Mingxian; Wang, Taifu; Wang, Jiacai; Rao, Dingqi; Zhang, Mingwang.

Mitochondrial DNA B Resour ; 2(1): 37-38, 2017 Jan 18.

Article in English | MEDLINE | ID: mdl-33473708

ABSTRACT

In this study, the near-complete mitogenome sequence (15,469 bp) of Oreolalax major was determined using polymerase chain reaction (PCR). It includes 13 protein-coding genes (PCGs), 2 ribosomal RNA (rRNA) genes and 19 transfer RNA (tRNA) genes (GenBank accession number KU310894). O. major has an additional tRNA gene (tRNAMet) located after the original copy and before ND2, a feature it shares with Leptobrachium boringii. Phylogenetic analyses were based on the concatenated sequences of the 13 protein-coding genes of O. major and other related species.
