Identification of the expressome by machine learning on omics data.
Sartor, Ryan C; Noshay, Jaclyn; Springer, Nathan M; Briggs, Steven P.
Affiliation
  • Sartor RC; Division of Biology, University of California San Diego, La Jolla, CA 92093.
  • Noshay J; Department of Plant Biology, University of Minnesota, St. Paul, MN 55108.
  • Springer NM; Department of Plant Biology, University of Minnesota, St. Paul, MN 55108.
  • Briggs SP; Division of Biology, University of California San Diego, La Jolla, CA 92093; sbriggs@ucsd.edu.
Proc Natl Acad Sci U S A; 116(36): 18119-18125, 2019-09-03.
Article in English | MEDLINE | ID: mdl-31420517
ABSTRACT
Accurate annotation of plant genomes remains complex due to the presence of many pseudogenes arising from whole-genome duplication-generated redundancy or the capture and movement of gene fragments by transposable elements. Machine learning on genome-wide epigenetic marks, informed by transcriptomic and proteomic training data, could be used to improve annotations through classification of all putative protein-coding genes as either constitutively silent or able to be expressed. Expressed genes were subclassified as able to express both mRNAs and proteins or only RNAs, and CG gene body methylation was associated only with the former subclass. More than 60,000 protein-coding genes have been annotated in the reference genome of maize inbred B73. About two-thirds of these genes are transcribed and are designated the filtered gene set (FGS). Classification of genes by our trained random forest algorithm was accurate and relied only on histone modifications or DNA methylation patterns within the gene body; promoter methylation was unimportant. Other inbred lines are known to transcribe significantly different sets of genes, indicating that the FGS is specific to B73. We accurately classified the sets of transcribed genes in additional inbred lines, arising from inbred-specific DNA methylation patterns. This approach highlights the potential of using chromatin information to improve annotations of functional genes.

Full text: 1 | Database: MEDLINE | Main subject: Genome, Plant / Gene Expression Regulation, Plant / Zea mays / Gene Expression Profiling / Databases, Nucleic Acid / Machine Learning | Study type: Diagnostic_studies | Language: English | Journal: Proc Natl Acad Sci U S A | Year: 2019 | Document type: Article
