Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi.
Keilwagen, Jens; Hartung, Frank; Paulini, Michael; Twardziok, Sven O; Grau, Jan.
Affiliations
  • Keilwagen J; Institute for Biosafety in Plant Biotechnology, Julius Kühn-Institut (JKI) - Federal Research Centre for Cultivated Plants, Quedlinburg, D-06484, Germany. jens.keilwagen@julius-kuehn.de.
  • Hartung F; Institute for Biosafety in Plant Biotechnology, Julius Kühn-Institut (JKI) - Federal Research Centre for Cultivated Plants, Quedlinburg, D-06484, Germany.
  • Paulini M; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
  • Twardziok SO; Plant Genome and Systems Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, D-85764, Germany.
  • Grau J; Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle (Saale), D-06120, Germany.
BMC Bioinformatics; 19(1): 189, 2018 May 30.
Article in English | MEDLINE | ID: mdl-29843602
ABSTRACT

BACKGROUND:

Genome annotation is of key importance for many research questions. Protein-coding genes are often identified using transcriptome sequencing data, ab initio prediction, or homology-based prediction. Recently, it was demonstrated that intron position conservation improves homology-based gene prediction and that experimental data improve ab initio gene prediction.
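To make "intron position conservation" concrete, the sketch below places each intron of a coding sequence at an amino-acid index and a phase (0, 1 or 2). It then counts how many reference introns fall at the same alignment column with the same phase in a target gene. This is a minimal illustration under stated assumptions, not GeMoMa's implementation. The function names are hypothetical. The exon lengths are assumed to be the coding parts only, with the stop codon excluded. The two proteins are assumed to be already aligned as gapped strings.

```python
from typing import List, Set, Tuple

# (amino-acid index, phase): phase 0 = intron lies before that residue's codon,
# phase 1 or 2 = intron splits that codon after its 1st or 2nd nucleotide.
Intron = Tuple[int, int]


def intron_positions(cds_exon_lengths: List[int]) -> List[Intron]:
    """Hypothetical helper: intron positions in protein coordinates."""
    positions: List[Intron] = []
    offset = 0
    for length in cds_exon_lengths[:-1]:  # no intron after the last coding exon
        offset += length
        positions.append((offset // 3, offset % 3))
    return positions


def residue_columns(aligned: str) -> List[int]:
    """Map each residue of a gapped alignment row to its alignment column."""
    return [col for col, ch in enumerate(aligned) if ch != "-"]


def _introns_in_columns(exons: List[int], aligned: str) -> Set[Tuple[int, int]]:
    cols = residue_columns(aligned)
    return {(cols[aa], phase) for aa, phase in intron_positions(exons) if aa < len(cols)}


def conserved_intron_count(ref_exons: List[int], ref_aligned: str,
                           tgt_exons: List[int], tgt_aligned: str) -> int:
    """Count reference introns matched by a target intron (same column and phase)."""
    return len(_introns_in_columns(ref_exons, ref_aligned)
               & _introns_in_columns(tgt_exons, tgt_aligned))


if __name__ == "__main__":
    # Reference: 7 residues (21 nt) in two exons; target: 8 residues in three exons.
    print(conserved_intron_count([6, 15], "MKV-LLAG", [6, 9, 9], "MKVQLLAG"))
```

Here the reference intron after nucleotide 6 (phase 0, before residue 2) coincides with the first target intron, so one intron is counted as conserved. The target's second intron has no counterpart in the reference.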

RESULTS:

Here, we present an extension of the gene prediction program GeMoMa that uses amino acid sequence conservation, intron position conservation and, optionally, RNA-seq data for homology-based gene prediction. On published benchmark data for plants, animals and fungi, we show that GeMoMa performs better than the gene prediction programs BRAKER1, MAKER2 and CodingQuarry, as well as purely RNA-seq-based pipelines for transcript identification. In addition, we demonstrate that using multiple reference organisms may further improve the performance of GeMoMa. Finally, we apply GeMoMa to four nematode species and to the recently published barley reference genome; these analyses indicate that current annotations of protein-coding genes may be refined using GeMoMa predictions.
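RNA-seq data can inform a predicted transcript in two simple ways. One is the fraction of its introns confirmed by split reads. The other is the fraction of its coding bases covered by reads. The sketch below computes both. It assumes that junction counts and per-base read depth have already been extracted from RNA-seq alignments. All names are hypothetical. It illustrates this kind of evidence in general and does not reproduce how GeMoMa weighs it.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Derive introns as the gaps between consecutive (sorted) exons."""
    ordered = sorted(exons)
    return [(a_end + 1, b_start - 1) for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])]


def intron_support(exons: List[Interval], junction_counts: Dict[Interval, int],
                   min_reads: int = 1) -> Optional[float]:
    """Fraction of predicted introns backed by >= min_reads split reads.

    Returns None for single-exon predictions, which have no introns to check.
    """
    introns = introns_from_exons(exons)
    if not introns:
        return None
    supported = sum(1 for intron in introns if junction_counts.get(intron, 0) >= min_reads)
    return supported / len(introns)


def coverage_fraction(exons: List[Interval], depth: Dict[int, int],
                      min_depth: int = 1) -> float:
    """Fraction of exonic bases whose read depth is at least min_depth."""
    total = covered = 0
    for start, end in exons:
        for pos in range(start, end + 1):
            total += 1
            if depth.get(pos, 0) >= min_depth:
                covered += 1
    return covered / total if total else 0.0


if __name__ == "__main__":
    exons = [(100, 200), (301, 400), (501, 600)]
    junctions = {(201, 300): 12}  # only the first intron is seen in split reads
    print(intron_support(exons, junctions))
```

A prediction whose introns are all confirmed by split reads, and whose coding bases are well covered, is better supported by the transcriptome than one with no read evidence. The thresholds `min_reads` and `min_depth` are illustrative defaults, not recommended values.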

CONCLUSIONS:

GeMoMa may be useful not only for annotating newly sequenced genomes but also for finding homologs of a specific gene or gene family. GeMoMa is published under the GNU GPL 3 and is freely available at http://www.jstacs.de/index.php/GeMoMa .

Collections: 01-internacional | Database: MEDLINE
Main subjects: Software / Sequence Homology, Amino Acid / Sequence Analysis, RNA / Genes, Plant / Gene Expression Profiling / Genes, Fungal
Study type: Prognostic_studies / Risk_factors_studies | Limits: Animals
Language: English | Journal: BMC Bioinformatics | Journal subject: Medical Informatics
Publication year: 2018 | Document type: Article | Country of affiliation: Germany