MetaCRS: unsupervised clustering of contigs with the recursive strategy of reducing metagenomic dataset's complexity.
Jiang, Zhongjun; Li, Xiaobo; Guo, Lijun.
Affiliation
  • Jiang Z; College of Information Science and Technology, Ningbo University, Ningbo, 315211, China.
  • Li X; College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua, 321004, China. lxb@zjnu.edu.cn.
  • Guo L; College of Engineering, Lishui University, Lishui, 323000, China. lxb@zjnu.edu.cn.
BMC Bioinformatics ; 22(Suppl 12): 315, 2022 Jan 20.
Article in En | MEDLINE | ID: mdl-35045830
ABSTRACT

BACKGROUND:

Metagenomics can extract microbial genetic material directly from environmental samples and sequence it into reads, which assembly tools then assemble into contigs. Contig clustering methods are subsequently applied to recover complete genomes from the samples. The main problem with current clustering methods is that they are limited in how many high-quality genes they can recover from complex environments, for three reasons. First, multiple strains of the same species are present, which leads to chimeric assemblies. Second, different strains of the same species are difficult to separate. Third, it is difficult to determine the number of strains during clustering.

RESULTS:

To address the shortcomings of current clustering methods, we propose an unsupervised clustering method that improves the recovery of genes from complex environments, together with a new method for selecting the number of strains in a sample during clustering. Sequence composition (tetranucleotide frequency) and co-abundance are combined to train the probability model used for clustering. A new recursive strategy that progressively reduces the complexity of the samples is proposed to improve gene recovery from complex environments. The new clustering method was tested on both simulated and real metagenomic datasets and compared with five state-of-the-art methods: CONCOCT, MaxBin 2.0, MetaBAT, MyCC and COCACOLA. The results show that the proposed method is more effective in terms of both the number and the quality of genes recovered from metagenomic datasets.
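
As an illustration of the composition feature named above, the following minimal sketch computes a tetranucleotide frequency vector for one contig and appends per-sample coverage values to it. It is not the authors' implementation: the function names `tetranucleotide_frequency` and `contig_features`, the plain concatenation of the two feature types, and the choice to skip windows containing ambiguous bases are all assumptions made for this example. Some binning tools additionally merge reverse-complement tetramers; this sketch does not.

```python
from itertools import product

# All 256 possible tetranucleotides over A, C, G, T, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def tetranucleotide_frequency(contig):
    """Return 256 tetranucleotide frequencies, normalized to sum to 1.

    Hypothetical helper for illustration only.
    """
    counts = [0] * len(KMERS)
    seq = contig.upper()
    # Slide a window of width 4 along the contig.
    for i in range(len(seq) - 3):
        idx = INDEX.get(seq[i:i + 4])
        if idx is not None:  # assumption: skip windows with N or other codes
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(KMERS)
    return [c / total for c in counts]


def contig_features(contig, coverages):
    """Concatenate composition and per-sample coverage (co-abundance).

    Assumption: simple concatenation; the paper's model may combine
    the two signals differently.
    """
    return tetranucleotide_frequency(contig) + [float(c) for c in coverages]
```

A vector built this way for every contig could then be passed to any clustering model; how the two signals are weighted is a design choice the abstract does not specify.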

CONCLUSIONS:

A new contig clustering method is proposed that can recover more high-quality genes from complex environmental samples.

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Algorithms / Metagenomics Study type: Prognostic_studies Language: En Publication year: 2022 Document type: Article