Active learning of enhancer and silencer regulatory grammar in photoreceptors.
Friedman, Ryan Z; Ramu, Avinash; Lichtarge, Sara; Myers, Connie A; Granas, David M; Gause, Maria; Corbo, Joseph C; Cohen, Barak A; White, Michael A.
Affiliation
  • Friedman RZ; The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, Saint Louis, MO, 63110.
  • Ramu A; Department of Genetics, Washington University School of Medicine, Saint Louis, MO, 63110.
  • Lichtarge S; The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, Saint Louis, MO, 63110.
  • Myers CA; Department of Genetics, Washington University School of Medicine, Saint Louis, MO, 63110.
  • Granas DM; The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, Saint Louis, MO, 63110.
  • Gause M; Department of Genetics, Washington University School of Medicine, Saint Louis, MO, 63110.
  • Corbo JC; Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO, 63110.
  • Cohen BA; The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, Saint Louis, MO, 63110.
  • White MA; Department of Genetics, Washington University School of Medicine, Saint Louis, MO, 63110.
bioRxiv ; 2023 Aug 22.
Article in En | MEDLINE | ID: mdl-37662358
ABSTRACT
Cis-regulatory elements (CREs) direct gene expression in health and disease, and models that can accurately predict their activities from DNA sequences are crucial for biomedicine. Deep learning represents one emerging strategy to model the regulatory grammar that relates CRE sequence to function. However, these models require training data on a scale that exceeds the number of CREs in the genome. We address this problem using active machine learning to iteratively train models on multiple rounds of synthetic DNA sequences assayed in live mammalian retinas. During each round of training the model actively selects sequence perturbations to assay, thereby efficiently generating informative training data. We iteratively trained a model that predicts the activities of sequences containing binding motifs for the photoreceptor transcription factor Cone-rod homeobox (CRX) using an order of magnitude less training data than current approaches. The model's internal confidence estimates of its predictions are reliable guides for designing sequences with high activity. The model correctly identified critical sequence differences between active and inactive sequences with nearly identical transcription factor binding sites, and revealed order and spacing preferences for combinations of motifs. Our results establish active learning as an effective method to train accurate deep learning models of cis-regulatory function after exhausting naturally occurring training examples in the genome.
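
The iterative select-assay-retrain loop described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the model is a small bootstrap ensemble of ridge regressions rather than a deep network, uncertainty is taken as the ensemble's disagreement, candidates are random sequences rather than designed perturbations, and `toy_assay` is a hypothetical stand-in for the retinal reporter assay.

```python
import numpy as np

rng = np.random.default_rng(0)
BASES = "ACGT"
SEQ_LEN = 20
# Hidden per-position weights used only by the hypothetical oracle below.
HIDDEN_W = rng.normal(size=SEQ_LEN * 4)

def random_seqs(n):
    """Draw n random DNA sequences (stand-ins for synthetic candidates)."""
    idx = rng.integers(0, 4, size=(n, SEQ_LEN))
    return ["".join(BASES[i] for i in row) for row in idx]

def one_hot(seqs):
    """Encode each sequence as a flat (SEQ_LEN * 4) one-hot vector."""
    lookup = {b: i for i, b in enumerate(BASES)}
    x = np.zeros((len(seqs), SEQ_LEN, 4))
    for s, seq in enumerate(seqs):
        for p, base in enumerate(seq):
            x[s, p, lookup[base]] = 1.0
    return x.reshape(len(seqs), -1)

def toy_assay(seqs):
    """Hypothetical oracle: a noisy linear readout replacing the wet-lab assay."""
    return one_hot(seqs) @ HIDDEN_W + rng.normal(0, 0.1, size=len(seqs))

def fit_ridge(x, y, lam=1.0):
    """Closed-form ridge regression weights."""
    d = x.shape[1]
    return np.linalg.solve(x.T @ x + lam * np.eye(d), x.T @ y)

def ensemble_predict(x_train, y_train, x_pool, n_models=10):
    """Return mean and std of predictions from a bootstrap ensemble."""
    n = len(y_train)
    preds = []
    for _ in range(n_models):
        boot = rng.integers(0, n, size=n)  # resample training set with replacement
        w = fit_ridge(x_train[boot], y_train[boot])
        preds.append(x_pool @ w)
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)

def active_learning(n_rounds=3, batch=50, pool_size=500):
    """Each round: score a candidate pool, assay the most uncertain, add to training data."""
    train_seqs = random_seqs(batch)
    y = toy_assay(train_seqs)
    for _ in range(n_rounds):
        pool = random_seqs(pool_size)
        _, std = ensemble_predict(one_hot(train_seqs), y, one_hot(pool))
        picked = np.argsort(std)[-batch:]  # indices with the highest disagreement
        new = [pool[i] for i in picked]
        train_seqs = train_seqs + new
        y = np.concatenate([y, toy_assay(new)])
    return train_seqs, y
```

Calling `active_learning(n_rounds=2, batch=10, pool_size=40)` returns 30 sequences with their simulated activities. In a real setting the oracle call would be replaced by an experimental measurement, and the selection rule would follow whatever acquisition strategy the study actually used.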

Full text: 1 Database: MEDLINE Study type: Prognostic_studies Language: En Year of publication: 2023 Document type: Article