A new method to cluster genomes based on cumulative Fourier power spectrum.
Dong, Rui; Zhu, Ziyue; Yin, Changchuan; He, Rong L; Yau, Stephen S-T.
Affiliation
  • Dong R; Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China.
  • Zhu Z; Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China.
  • Yin C; Department of Mathematics, Statistics and Computer Science, University of Illinois at Chicago, IL 60607, USA.
  • He RL; Department of Biological Sciences, Chicago State University, Chicago, IL 60628, USA.
  • Yau SS; Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China. Electronic address: yau@uic.edu.
Gene; 673: 239-250, 2018 Oct 05.
Article in English | MEDLINE | ID: mdl-29935353
ABSTRACT
Analyzing phylogenetic relationships with mathematical methods has long been important in bioinformatics, because quantitative analysis can interpret raw biological data precisely. Multiple Sequence Alignment (MSA) is frequently used to analyze biological evolution, but it is very time-consuming: when the dataset is large, alignment methods cannot finish in a reasonable time. We therefore present a new method that uses moments of the cumulative Fourier power spectrum to cluster DNA sequences. Each sequence is translated into a vector in Euclidean space, and the distances between the vectors reflect the relationships between the sequences. The mapping between the spectra and the moment vector is one-to-one, which means that no information in the power spectra is lost during the calculation. We cluster and classify several datasets, including Influenza A, primate, and human rhinovirus (HRV) datasets, and build their phylogenetic trees. The results show that the proposed cumulative Fourier power spectrum method is much faster and more accurate than MSA and than another alignment-free method known as k-mer. This research provides new insights into the study of phylogeny, evolution, and efficient DNA comparison algorithms for large genomes. The computer programs for the cumulative Fourier power spectrum method are available on GitHub (https://github.com/YaulabTsinghua/cumulative-Fourier-power-spectrum).
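The pipeline summarized in the abstract (sequence, then power spectrum, then cumulative spectrum, then moment vector, then Euclidean distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the formulas, so the binary indicator encoding per nucleotide, the normalization of the cumulative spectrum by total power, the choice of moment orders, and the names `power_spectrum`, `moment_vector`, and `distance` are all assumptions made for illustration. A naive DFT from the standard library is used to keep the example self-contained; a real implementation would likely use an FFT.

```python
import cmath
import math

NUCLEOTIDES = "ACGT"


def power_spectrum(signal):
    """Return |U[k]|^2 for k = 0..n-1 using a naive O(n^2) DFT."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def moment_vector(seq, orders=(1, 2, 3)):
    """Map a DNA string to a fixed-length vector of cumulative-spectrum moments.

    Assumption: the normalization and moment definition are illustrative
    choices, not the formulas from the paper.
    """
    seq = seq.upper()
    features = []
    for nucleotide in NUCLEOTIDES:
        # Binary indicator sequence: 1 where this nucleotide occurs.
        indicator = [1.0 if base == nucleotide else 0.0 for base in seq]
        # Drop k = 0, which only encodes the nucleotide count.
        spectrum = power_spectrum(indicator)[1:]
        total = sum(spectrum)
        if total <= 1e-12:
            # Nucleotide absent or constant signal: no spectral content.
            features.extend(0.0 for _ in orders)
            continue
        # Cumulative power spectrum, scaled to end at 1.
        cumulative = []
        running = 0.0
        for value in spectrum:
            running += value
            cumulative.append(running / total)
        # Moments: mean of the j-th powers of the cumulative spectrum.
        for order in orders:
            features.append(sum(c ** order for c in cumulative) / len(cumulative))
    return features


def distance(seq_a, seq_b):
    """Euclidean distance between the moment vectors of two sequences."""
    return math.dist(moment_vector(seq_a), moment_vector(seq_b))
```

Because every sequence maps to a vector of the same length (four nucleotides times the number of moment orders), sequences of different lengths can be compared directly, for example with `distance("ACGTACGT", "AAAACCCCGG")`, and the resulting pairwise distances could then be passed to any standard clustering or tree-building routine.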

Full text: 1 Database: MEDLINE Main subject: Influenza A virus / Rhinovirus / Cluster Analysis / Sequence Analysis / Macaca Limits: Animals / Humans Language: En Publication year: 2018 Document type: Article