Foster thy young: enhanced prediction of orphan genes in assembled genomes.
Li, Jing; Singh, Urminder; Bhandary, Priyanka; Campbell, Jacqueline; Arendsee, Zebulun; Seetharam, Arun S; Wurtele, Eve Syrkin.
Affiliations
  • Li J; Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50014, USA.
  • Singh U; Center for Metabolic Biology, Iowa State University, Ames, IA 50014, USA.
  • Bhandary P; Genetics and Genomics Graduate Program, Iowa State University, Ames, IA 50014, USA.
  • Campbell J; Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50014, USA.
  • Arendsee Z; Center for Metabolic Biology, Iowa State University, Ames, IA 50014, USA.
  • Seetharam AS; Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50014, USA.
  • Wurtele ES; Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50014, USA.
Nucleic Acids Res; 50(7): e37, 2022 Apr 22.
Article in English | MEDLINE | ID: mdl-34928390
ABSTRACT
Proteins encoded by newly emerged genes ('orphan genes') share no sequence similarity with proteins in any other species. They provide organisms with a reservoir of genetic elements to quickly respond to changing selection pressures. Here, we systematically assess the ability of five gene prediction pipelines to accurately predict genes in genomes according to phylostratal origin. BRAKER and MAKER are existing, popular ab initio tools that infer gene structures by machine learning. Direct Inference is an evidence-based pipeline we developed to predict gene structures from alignments of RNA-Seq data. The BIND pipeline integrates ab initio predictions of BRAKER and Direct Inference; MIND combines Direct Inference and MAKER predictions. We use highly curated Arabidopsis and yeast annotations as gold-standard benchmarks, and cross-validate in rice. Each pipeline under-predicts orphan genes (as few as 11%, under one prediction scenario). Increasing RNA-Seq diversity greatly improves prediction efficacy. The combined methods (BIND and MIND) yield the best predictions overall; BIND identifies 68% of annotated orphan genes and 99% of ancient genes, and gives the highest sensitivity score regardless of dataset in Arabidopsis. We provide a lightweight, flexible, reproducible, and well-documented solution to improve gene prediction.
Subjects

Full text: 1 | Database: MEDLINE | Main subject: Oryza / Arabidopsis | Type of study: Prognostic_studies / Risk_factors_studies | Language: En | Publication year: 2022 | Document type: Article