Distribution rules of 8-mer spectra and characterization of evolution state in animal genome sequences.
Li, Xiaolong; Li, Hong; Yang, Zhenhua; Wang, Lu.
Affiliation
  • Li X; Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China.
  • Li H; Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China. ndlihong@imu.edu.cn.
  • Yang Z; School of Economics and Management, Inner Mongolia University of Science and Technology, Baotou, 014010, China.
  • Wang L; Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China.
BMC Genomics ; 25(1): 855, 2024 Sep 12.
Article in En | MEDLINE | ID: mdl-39266973
ABSTRACT

BACKGROUND:

Studying the composition rules and evolution mechanisms of genome sequences is a core issue in the post-genomic era, and k-mer spectrum analysis of genome sequences is an effective means of addressing it.

RESULT:

We classified all 8-mers of the genome sequences into 16 kinds of XY-type according to the number of XY dinucleotides each 8-mer contains. Previous work showed that independent unimodal distributions are observed only in the three CG-type 8-mer spectra, whereas non-CG type 8-mer spectra do not exhibit this universal phenomenon across prokaryotes and eukaryotes. On this basis, we analyzed how the distributions of non-CG type 8-mer spectra vary across 889 animal genome sequences. Following the evolutionary order of animals from primitive to more complex, we found that the spectrum distributions gradually shift from unimodal to tri-modal. The relative distance from the average frequency of each non-CG type of 8-mers to the center frequency differs both within a species and among species. We further divided the 8-mers that contain CG dinucleotides into 16 subsets, called XY1_CG1 subsets, in which each 8-mer contains both CG and XY dinucleotides. We found that the separability values of the XY1_CG1 spectra are closely related to the evolution and specificity of animals. Taking into account the constraint of Chargaff's second parity rule, we finally obtained 10 separability values as the feature set for characterizing the evolution state of genome sequences. To verify that the feature set is reasonable, we ran binary classification tests with 14 common classification algorithms. The accuracy (Acc) ranged from 83.88% to 98.70% among birds, other vertebrates and mammals.
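The grouping described above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that an XY-type subset is defined by how many times the dinucleotide XY occurs in an 8-mer, with overlapping occurrences counted, which the abstract does not specify. All function names are hypothetical. The last function merges each dinucleotide with its reverse complement, which leaves 10 classes out of 16; this is consistent with the 10 separability values reported under Chargaff's second parity rule, although the abstract does not state that the reduction was done this way.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_kmers(sequence: str, k: int = 8) -> Counter:
    """Count overlapping k-mers, skipping windows with non-ACGT symbols."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set(BASES):
            counts[kmer] += 1
    return counts


def dinucleotide_occurrences(kmer: str, xy: str) -> int:
    """Number of positions in kmer where dinucleotide xy starts.

    Assumption: overlapping occurrences (e.g. AA in AAA) are all counted.
    """
    return sum(1 for i in range(len(kmer) - 1) if kmer[i:i + 2] == xy)


def split_by_xy_count(counts: Counter, xy: str) -> dict:
    """Group k-mer frequencies by how many xy dinucleotides each k-mer has."""
    groups = {}
    for kmer, freq in counts.items():
        n = dinucleotide_occurrences(kmer, xy)
        groups.setdefault(n, {})[kmer] = freq
    return groups


def parity_classes() -> set:
    """Merge each of the 16 dinucleotides with its reverse complement."""
    classes = set()
    for x, y in product(BASES, repeat=2):
        xy = x + y
        classes.add(frozenset((xy, reverse_complement(xy))))
    return classes


if __name__ == "__main__":
    toy = "ACGTACGTAGGCTTAACGCGATAT"  # toy sequence, not real genome data
    spectrum = count_kmers(toy)
    for n, subset in sorted(split_by_xy_count(spectrum, "CG").items()):
        print(f"8-mers with {n} CG dinucleotide(s): {len(subset)}")
    print("dinucleotide classes under reverse complement:", len(parity_classes()))
```

On a real genome the same grouping would be applied to each of the 16 dinucleotides in turn, and the frequency distribution of each subset would then be examined as a spectrum.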

CONCLUSION:

We proposed a credible feature set that characterizes the evolution state of genomes, and obtained satisfactory results with it in the large-scale classification of animals.

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Genome / Evolution, Molecular Limits: Animals Language: En Journal: BMC Genomics Journal subject: GENETICA Year: 2024 Document type: Article Affiliation country: China Country of publication: United Kingdom