iDeLUCS: a deep learning interactive tool for alignment-free clustering of DNA sequences.
Millan Arias, Pablo; Hill, Kathleen A; Kari, Lila.
Affiliation
  • Millan Arias P; Cheriton School of Computer Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada.
  • Hill KA; Department of Biology, University of Western Ontario, London, ON N6A 5B7, Canada.
  • Kari L; Cheriton School of Computer Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada.
Bioinformatics; 39(9), 2023 09 02.
Article in English | MEDLINE | ID: mdl-37589603
ABSTRACT

SUMMARY:

We present an interactive Deep Learning-based software tool for Unsupervised Clustering of DNA Sequences (iDeLUCS) that detects genomic signatures and uses them to cluster DNA sequences, without the need for sequence alignment or taxonomic identifiers. iDeLUCS is scalable and user-friendly: its graphical user interface, with support for hardware acceleration, allows the practitioner to fine-tune the different hyper-parameters involved in the training process without requiring extensive knowledge of deep learning. The performance of iDeLUCS was evaluated on a diverse set of datasets: several real genomic datasets from organisms in the kingdoms Animalia, Protista, Fungi, Bacteria, and Archaea; three datasets of viral genomes; a dataset of simulated metagenomic reads from microbial genomes; and multiple datasets of synthetic DNA sequences. The performance of iDeLUCS was compared to that of two classical clustering algorithms (k-means++ and Gaussian mixture models, GMM) and two clustering algorithms specialized in DNA sequences (MeShClust v3.0 and DeLUCS), using both intrinsic cluster evaluation metrics and external evaluation metrics. In terms of unsupervised clustering accuracy, iDeLUCS outperforms the two classical algorithms by an average of ∼20%, and the two specialized algorithms by an average of ∼12%, on the real DNA sequence datasets analyzed. Overall, our results indicate that iDeLUCS is a robust clustering method suitable for clustering large and diverse datasets of unlabeled DNA sequences.

AVAILABILITY AND IMPLEMENTATION:

iDeLUCS is available at https://github.com/Kari-Genomics-Lab/iDeLUCS under the terms of the MIT licence.
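The abstract reports its headline comparison in terms of unsupervised clustering accuracy but does not define the metric. A common definition, assumed in the sketch below and not taken from the iDeLUCS code, scores a clustering by the fraction of sequences that are correctly labelled under the best one-to-one mapping from cluster IDs to true (e.g. taxonomic) labels; the labels are used only for evaluation, never for clustering. The function name `unsupervised_accuracy` is hypothetical, and the exhaustive search over mappings is only practical for a handful of clusters. Larger evaluations typically solve the same matching problem with the Hungarian algorithm, for example via SciPy's `scipy.optimize.linear_sum_assignment`.

```python
from collections import Counter
from itertools import permutations


def unsupervised_accuracy(true_labels, cluster_ids):
    """Hypothetical helper (assumed definition, not iDeLUCS code):
    fraction of items correctly labelled under the best one-to-one
    mapping of cluster IDs to true labels. Exhaustive search, small k only."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("inputs must have the same length")
    if not true_labels:
        raise ValueError("inputs must be non-empty")
    labels = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    # contingency[(c, l)] = number of items placed in cluster c whose true label is l
    contingency = Counter(zip(cluster_ids, true_labels))
    k = max(len(labels), len(clusters))
    # Pad with None so surplus clusters can stay unmatched (counted as errors)
    padded_labels = labels + [None] * (k - len(labels))
    best = 0
    for perm in permutations(padded_labels):
        # clusters[i] is mapped to perm[i]; each label is used at most once
        matched = sum(
            contingency[(c, l)] for c, l in zip(clusters, perm) if l is not None
        )
        best = max(best, matched)
    return best / len(true_labels)


# Cluster IDs are arbitrary, so a relabelled but otherwise perfect clustering scores 1.0
print(unsupervised_accuracy(["A", "A", "B", "B", "C", "C"], [1, 1, 0, 0, 2, 2]))
```

Because cluster IDs carry no meaning of their own, the maximisation over mappings is what makes the score invariant to how a clustering algorithm happens to number its clusters.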
Subject(s)

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Deep Learning Language: English Journal: Bioinformatics Journal subject: Medical Informatics Year: 2023 Document type: Article Country of affiliation: Canada