A benchmark study of ab initio gene prediction methods in diverse eukaryotic organisms.
Scalzitti, Nicolas; Jeannin-Girardon, Anne; Collet, Pierre; Poch, Olivier; Thompson, Julie D.
Affiliation
  • Scalzitti N; Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
  • Jeannin-Girardon A; Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
  • Collet P; Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
  • Poch O; Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
  • Thompson JD; Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France. thompson@unistra.fr.
BMC Genomics; 21(1): 293, 2020 Apr 09.
Article in English | MEDLINE | ID: mdl-32272892
ABSTRACT

BACKGROUND:

The draft genome assemblies produced by new sequencing technologies present important challenges for automatic gene prediction pipelines, leading to less accurate gene models. New benchmark methods are needed to evaluate the accuracy of gene prediction methods in the face of incomplete genome assemblies, low genome coverage and quality, complex gene structures, or a lack of suitable sequences for evidence-based annotations.

RESULTS:

We describe the construction of a new benchmark, called G3PO (benchmark for Gene and Protein Prediction PrOgrams), designed to represent many of the typical challenges faced by current genome annotation projects. The benchmark is based on a carefully validated and curated set of real eukaryotic genes from 147 phylogenetically dispersed organisms, and a number of test sets are defined to evaluate the effects of different features, including genome sequence quality, gene structure complexity, and protein length. We used the benchmark to perform an independent comparative analysis of the most widely used ab initio gene prediction programs and identified the main strengths and weaknesses of the programs. More importantly, we highlight a number of features that could be exploited in order to improve the accuracy of current prediction tools.

CONCLUSIONS:

The experiments showed that ab initio gene structure prediction is a very challenging task, which should be further investigated. We believe that the baseline results associated with the complex gene test sets in G3PO provide useful guidelines for future studies.

Full text: 1 | Database: MEDLINE | Main subject: Computational Biology / Eukaryota / Molecular Sequence Annotation | Type of study: Prognostic_studies / Risk_factors_studies | Limits: Animals / Humans | Language: En | Publication year: 2020 | Document type: Article
