Analysis of the multi-copied genes and the impact of the redundant protein coding sequences on gene annotation in prokaryotic genomes.
Yu, Jia-Feng; Chen, Qing-Li; Ren, Jing; Yang, Yan-Ling; Wang, Ji-Hua; Sun, Xiao.
Affiliations
  • Yu JF; Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China; State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China. Electronic address: jfyu1979@126.com.
  • Chen QL; Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China; College of life science, Shandong Normal University, Jinan 250358, China.
  • Ren J; Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China.
  • Yang YL; School of Physics and Electronic Information, Dezhou University, Dezhou 253023, China.
  • Wang JH; Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China; School of Physics and Electronic Information, Dezhou University, Dezhou 253023, China.
  • Sun X; State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China.
J Theor Biol; 376: 8-14, 2015 Jul 07.
Article in English | MEDLINE | ID: mdl-25865522
ABSTRACT
The important roles of duplicated genes in evolutionary processes have been recognized in bacteria, archaebacteria and eukaryotes, but multi-copied protein-coding genes that share 100% sequence identity have received very little study. In this paper, the multi-copied protein-coding genes in a number of prokaryotic genomes are first comprehensively analyzed. The results show that 0-15.93% of the protein-coding genes in each genome are multi-copied genes and 0-16.49% of the protein-coding genes in each genome are highly similar, with sequence identity ≥ 80%. Function and COG (Clusters of Orthologous Groups of proteins) analysis shows that 64.64% of the multi-copied genes are annotated as transposases and 86.28% of the COG-assigned multi-copied genes fall under COG code 'L'. Furthermore, the impact of redundant protein-coding sequences on gene prediction results is studied. The results show that protein-coding sequence redundancy cannot be ignored, and that the consistency of the gene annotation results before and after excluding the redundant sequences is negatively related to the degree of sequence redundancy among the protein-coding sequences in the training set.
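As an illustration of the quantity reported above (the share of protein-coding genes in a genome that are multi-copied at 100% sequence identity), the following is a minimal sketch only. The abstract does not describe the authors' actual procedure, so the function names `group_identical` and `multi_copy_fraction`, the toy sequences, and the choice to compare full-length sequences by exact string match are all assumptions made for illustration.

```python
from collections import defaultdict


def group_identical(cds):
    """Group gene IDs whose coding sequences are exactly identical.

    `cds` maps a gene ID to its sequence (hypothetical input format).
    Only groups with at least two members are returned.
    """
    groups = defaultdict(list)
    for gene_id, seq in cds.items():
        # Case-normalise so that 'atg' and 'ATG' count as the same sequence.
        groups[seq.upper()].append(gene_id)
    return [ids for ids in groups.values() if len(ids) > 1]


def multi_copy_fraction(cds):
    """Fraction of genes that have at least one 100%-identical copy."""
    if not cds:
        return 0.0
    copies = sum(len(ids) for ids in group_identical(cds))
    return copies / len(cds)


# Toy data, not taken from any real genome.
toy_genome = {
    "gene1": "ATGAAATAA",
    "gene2": "ATGAAATAA",  # exact copy of gene1
    "gene3": "ATGCCCTAA",
    "gene4": "ATGGGGTAA",
}

print(group_identical(toy_genome))
print(f"{multi_copy_fraction(toy_genome):.2%}")
```

In this toy input two of the four genes are exact copies of each other, so the sketch reports half of the genes as multi-copied. Detecting the looser "highly similar" class (identity ≥ 80%) would need a sequence alignment step, which this sketch does not attempt.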

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Bacteria / Open Reading Frames / Gene Dosage / Gene Duplication / Genes, Bacterial Language: En Publication year: 2015 Document type: Article
