A phylogenetic approach for weighting genetic sequences.
De Maio, Nicola; Alekseyenko, Alexander V; Coleman-Smith, William J; Pardi, Fabio; Suchard, Marc A; Tamuri, Asif U; Truszkowski, Jakub; Goldman, Nick.
Affiliation
  • De Maio N; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK. demaio@ebi.ac.uk.
  • Alekseyenko AV; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK.
  • Coleman-Smith WJ; Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC, USA.
  • Pardi F; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK.
  • Suchard MA; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK.
  • Tamuri AU; LIRMM, University of Montpellier, CNRS, Montpellier, France.
  • Truszkowski J; Departments of Biostatistics, Biomathematics and Human Genetics, University of California, Los Angeles, CA, USA.
  • Goldman N; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK.
BMC Bioinformatics; 22(1): 285, 2021 May 28.
Article in English | MEDLINE | ID: mdl-34049487
ABSTRACT

BACKGROUND:

Many important applications in bioinformatics, including sequence alignment and protein family profiling, employ sequence weighting schemes to mitigate the effects of non-independence of homologous sequences and under- or over-representation of certain taxa in a dataset. These schemes aim to assign high weights to sequences that are 'novel' compared to the others in the same dataset, and low weights to sequences that are over-represented.

RESULTS:

We formalise this principle by rigorously defining the evolutionary 'novelty' of a sequence within an alignment. This results in new sequence weights that we call 'phylogenetic novelty scores'. These scores have various desirable properties, and we showcase their use by considering, as an example application, the inference of character frequencies at an alignment column, which is important, for example, in protein family profiling. We give computationally efficient algorithms for calculating our scores and, using simulations, show that they are versatile and can improve the accuracy of character frequency estimation compared to existing sequence weighting schemes.
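As an illustration of the example application named above, the following minimal Python sketch shows how a set of per-sequence weights could be used to estimate character frequencies at a single alignment column. It is not the authors' algorithm for computing phylogenetic novelty scores: the weights are taken as given, and the function name `weighted_column_frequencies` and the toy column and weight values are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def weighted_column_frequencies(column, weights):
    """Estimate character frequencies at one alignment column.

    `column` holds one character per sequence and `weights` holds one
    non-negative weight per sequence. How the weights are obtained is
    outside this sketch; any sequence weighting scheme could supply them.
    """
    if len(column) != len(weights):
        raise ValueError("column and weights must have the same length")

    totals = defaultdict(float)
    for character, weight in zip(column, weights):
        if weight < 0:
            raise ValueError("weights must be non-negative")
        totals[character] += weight

    total_weight = sum(totals.values())
    if total_weight == 0:
        raise ValueError("at least one weight must be positive")

    # Normalise so that the estimated frequencies sum to 1.
    return {character: w / total_weight for character, w in totals.items()}


# Toy column (hypothetical): three near-identical sequences carry 'A',
# one distinct sequence carries 'G'.
column = ["A", "A", "A", "G"]

# Equal weights let the over-represented group dominate the estimate.
uniform = weighted_column_frequencies(column, [1.0, 1.0, 1.0, 1.0])

# Down-weighting the three redundant sequences (assumed weights) lets the
# distinct sequence contribute as much as the whole redundant group.
downweighted = weighted_column_frequencies(column, [1 / 3, 1 / 3, 1 / 3, 1.0])
```

With equal weights the over-represented character dominates the estimate, whereas down-weighting the redundant sequences shifts the estimate towards the under-represented character, which is the effect that sequence weighting schemes aim for.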

CONCLUSIONS:

Our phylogenetic novelty scores can be useful when an evolutionarily meaningful system for adjusting for uneven taxon sampling is desired. They have numerous possible applications, including estimation of evolutionary conservation scores and sequence logos, identification of targets in conservation biology, and improving and measuring sequence alignment accuracy.

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Algorithms / Computational Biology Type of study: Prognostic_studies Language: English Journal: BMC Bioinformatics Journal subject: MEDICAL INFORMATICS Year of publication: 2021 Document type: Article Country of affiliation: United Kingdom
