Clustering Highly Divergent Homologous Proteins: An Alignment-Free Method.
Muñoz-Baena, Laura; Poon, Art F Y.
Affiliation
  • Muñoz-Baena L; Department of Microbiology and Immunology, Western University, London, Ontario, Canada.
  • Poon AFY; Department of Microbiology and Immunology, Western University, London, Ontario, Canada.
Curr Protoc; 3(2): e666, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36809686
ABSTRACT
The comparative analysis of amino acid sequences is an important tool in molecular biology that often requires multiple sequence alignments. In comparisons between less closely related genomes, however, it becomes more difficult to accurately align protein-coding sequences, or even to identify homologous regions in different genomes. In this article, we describe an alignment-free method for the classification of homologous protein-coding regions from different genomes. This methodology was originally developed for comparing genomes within virus families, but may be adapted for other organisms. We quantify sequence homology from the overlap (intersection distance) of the k-mer (word) frequency distributions for different protein sequences. Next, we extract groups of homologous sequences from the resulting distance matrix using a combination of dimensionality reduction and hierarchical clustering methods. Finally, we demonstrate how to generate visualizations of the composition of clusters with respect to protein annotations, and by coloring protein-coding regions of genomes by cluster assignments. These provide a useful means to quickly assess the reliability of the clustering results based on the distribution of homologous genes among genomes. © 2023 Wiley Periodicals LLC.
  • Basic Protocol 1: Data collection and processing
  • Basic Protocol 2: Calculating k-mer distances
  • Basic Protocol 3: Extracting clusters of homology
  • Support Protocol: Genome plot based on clustering results
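To make the k-mer step of the abstract concrete, the following is a minimal sketch of an intersection distance between the k-mer frequency distributions of two protein sequences. It is not the authors' implementation. The function names, the default k=3, and the normalization (one minus the summed minimum of relative k-mer frequencies) are assumptions made for illustration; the published protocol may count or normalize k-mers differently.

```python
from collections import Counter


def kmer_frequencies(seq, k=3):
    """Relative frequencies of overlapping k-mers in a protein sequence.

    Hypothetical helper; k=3 is an illustrative default, not a value
    taken from the article.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence is shorter than k")
    return {word: n / total for word, n in counts.items()}


def intersection_distance(seq_a, seq_b, k=3):
    """One minus the overlap of two k-mer frequency distributions.

    Assumed normalization: overlap = sum over shared k-mers of the
    smaller relative frequency, so the distance lies in [0, 1].
    """
    freq_a = kmer_frequencies(seq_a, k)
    freq_b = kmer_frequencies(seq_b, k)
    overlap = sum(min(p, freq_b[w]) for w, p in freq_a.items() if w in freq_b)
    return 1.0 - overlap


def distance_matrix(seqs, k=3):
    """Symmetric pairwise distance matrix for a dict of name -> sequence."""
    names = list(seqs)
    n = len(names)
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = intersection_distance(seqs[names[i]], seqs[names[j]], k)
            mat[i][j] = mat[j][i] = d
    return names, mat
```

A matrix built this way could then be passed to a dimensionality reduction and hierarchical clustering step, as the abstract outlines. Sequences that share no k-mers receive the maximum distance of 1, and identical sequences receive 0.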
Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Algorithms Type of study: Prognostic_studies Language: En Journal: Curr Protoc Year: 2023 Document type: Article Affiliation country: Canada