Extended many-item similarity indices for sets of nucleotide and protein sequences.
Bajusz, Dávid; Miranda-Quintana, Ramón Alain; Rácz, Anita; Héberger, Károly.
Affiliation
  • Bajusz D; Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, 1117 Budapest, Hungary.
  • Miranda-Quintana RA; Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, FL 32611, USA.
  • Rácz A; Plasma Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, 1117 Budapest, Hungary.
  • Héberger K; Plasma Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, 1117 Budapest, Hungary.
Comput Struct Biotechnol J ; 19: 3628-3639, 2021.
Article in English | MEDLINE | ID: mdl-34257841
ABSTRACT
Quantification of similarities between protein sequences or DNA/RNA strands is a (sub-)task that is ubiquitously present in bioinformatics workflows, and is usually accomplished by pairwise comparisons of sequences, utilizing simple (e.g. percent identity) or more intricate concepts (e.g. substitution scoring matrices). Complex tasks (such as clustering) rely on a large number of pairwise comparisons under the hood, instead of a direct quantification of set similarities. Based on our recently introduced framework that enables multiple comparisons of binary molecular fingerprints (i.e., direct calculation of the similarity of fingerprint sets), here we introduce novel symmetric similarity indices for analogous calculations on sets of character sequences with more than two (t) possible items (e.g. DNA/RNA sequences with t = 4, or protein sequences with t = 20). The features of these new indices are studied in detail with analysis of variance (ANOVA), and demonstrated with three case studies of protein/DNA sequences with varying degrees of similarity (or evolutionary proximity). The Python code for the extended many-item similarity indices is publicly available at https://github.com/ramirandaq/tn_Comparisons.
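For orientation, the conventional pairwise baseline that the abstract contrasts with can be sketched as follows. This is a minimal illustration, not the authors' extended indices and not code from their repository: the function names `percent_identity` and `mean_pairwise_identity` are hypothetical, and the sequences are assumed to be pre-aligned to equal length. It shows why set-level similarity built from pairs needs N(N-1)/2 comparisons for N sequences.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two pre-aligned sequences agree.

    Assumption: both sequences are non-empty and have the same length.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def mean_pairwise_identity(seqs: list[str]) -> float:
    """Average percent identity over all unordered pairs in the set.

    Hypothetical helper: the number of pairs grows as N(N-1)/2.
    """
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


# Toy DNA example (t = 4 possible characters per position).
toy_seqs = ["ACGT", "ACGA", "ACCA"]
toy_mean = mean_pairwise_identity(toy_seqs)
```

With three toy sequences the helper evaluates three pairs; with 1000 sequences it would evaluate 499,500, which is the scaling cost that a direct set-level index is meant to avoid.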
Full text: 1 Collection: 01-international Database: MEDLINE Study type: Prognostic_studies Language: En Journal: Comput Struct Biotechnol J Year: 2021 Document type: Article Affiliation country: Hungary Country of publication: Netherlands
