Search | BVS Regional Portal

Evaluation and optimization of sequence-based gene regulatory deep learning models.

Rafi, Abdul Muntakim; Nogina, Daria; Penzar, Dmitry; Lee, Dohoon; Lee, Danyeong; Kim, Nayeon; Kim, Sangyeup; Kim, Dohyeon; Shin, Yeojin; Kwak, Il-Youp; Meshcheryakov, Georgy; Lando, Andrey; Zinkevich, Arsenii; Kim, Byeong-Chan; Lee, Juhyun; Kang, Taein; Vaishnav, Eeshit Dhaval; Yadollahpour, Payman; Kim, Sun; Albrecht, Jake; Regev, Aviv; Gong, Wuming; Kulakovskiy, Ivan V; Meyer, Pablo; de Boer, Carl.

bioRxiv ; 2024 Feb 17.

Article in English | MEDLINE | ID: mdl-38405704

ABSTRACT

Neural networks have emerged as immensely powerful tools in predicting functional genomic regions, notably evidenced by recent successes in deciphering gene regulatory logic. However, a systematic evaluation of how model architectures and training strategies impact genomics model performance is lacking. To address this gap, we held a DREAM Challenge where competitors trained models on a dataset of millions of random promoter DNA sequences and corresponding expression levels, experimentally determined in yeast, to best capture the relationship between regulatory DNA and gene expression. For a robust evaluation of the models, we designed a comprehensive suite of benchmarks encompassing various sequence types. While some benchmarks produced similar results across the top-performing models, others differed substantially. All top-performing models used neural networks, but they diverged in their architectures and in novel training strategies tailored to genomic sequence data. To dissect how architectural and training choices impact performance, we developed the Prix Fixe framework to divide any given model into logically equivalent building blocks. We tested all possible combinations of these blocks for the top three models and observed performance improvements for each. The DREAM Challenge models not only achieved state-of-the-art results on our comprehensive yeast dataset but also consistently surpassed existing benchmarks on Drosophila and human genomic datasets. Overall, we demonstrate that high-quality gold-standard genomics datasets can drive significant progress in model development.
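The Prix Fixe idea of splitting models into interchangeable building blocks and then testing every combination can be illustrated with a small enumeration. This is a hypothetical sketch: the slot names (first_layers, core_layers, final_layers) and the labels model_A to model_C are placeholders, not the framework's actual block definitions or API.

```python
from itertools import product

# Hypothetical slots and candidate implementations (one per top model).
# The real Prix Fixe framework defines its own block boundaries.
BLOCKS = {
    "first_layers": ["model_A", "model_B", "model_C"],
    "core_layers": ["model_A", "model_B", "model_C"],
    "final_layers": ["model_A", "model_B", "model_C"],
}


def enumerate_combinations(blocks):
    """Return every assembly that picks one implementation per slot."""
    slots = list(blocks)
    return [
        dict(zip(slots, choice))
        for choice in product(*(blocks[s] for s in slots))
    ]


if __name__ == "__main__":
    combos = enumerate_combinations(BLOCKS)
    print(len(combos))  # 27: 3 slots with 3 options each
    print(combos[0])
```

Each assembly would then be trained and scored on the same benchmarks; with more slots or candidates the count grows multiplicatively, which is why this exhaustive approach is practical only for a handful of models.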

LegNet: a best-in-class deep learning model for short DNA regulatory regions.

Penzar, Dmitry; Nogina, Daria; Noskova, Elizaveta; Zinkevich, Arsenii; Meshcheryakov, Georgy; Lando, Andrey; Rafi, Abdul Muntakim; de Boer, Carl; Kulakovskiy, Ivan V.

Bioinformatics ; 39(8), 2023 08 01.

Article in English | MEDLINE | ID: mdl-37490428

ABSTRACT

MOTIVATION: The increasing volume of data from high-throughput experiments including parallel reporter assays facilitates the development of complex deep-learning approaches for modeling DNA regulatory grammar. RESULTS: Here, we introduce LegNet, an EfficientNetV2-inspired convolutional network for modeling short gene regulatory regions. By approaching the sequence-to-expression regression problem as a soft classification task, LegNet secured first place for the autosome.org team in the DREAM 2022 challenge of predicting gene expression from gigantic parallel reporter assays. Using published data, we demonstrate that LegNet outperforms existing models and accurately predicts gene expression per se as well as the effects of single-nucleotide variants. Furthermore, we show how LegNet can be used in a diffusion network manner for the rational design of promoter sequences yielding the desired expression level. AVAILABILITY AND IMPLEMENTATION: https://github.com/autosome-ru/LegNet. The GitHub repository includes Jupyter Notebook tutorials and Python scripts under the MIT license to reproduce the results presented in the study.
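To make "regression as soft classification" concrete, here is a minimal sketch of one common way to do it: spread each scalar expression value over discrete bins, train against that distribution with a KL-divergence loss, and read a scalar prediction back as the expectation over bin centers. The 18 integer bins, the Gaussian smoothing width, and the choice of KL loss are all assumptions for illustration; this is not LegNet's code, which lives in the repository linked above.

```python
import math

# Hypothetical settings, not LegNet's published hyperparameters:
# 18 bins centered on the integers 0..17 and a Gaussian smoothing width.
BIN_CENTERS = [float(i) for i in range(18)]
SIGMA = 0.5


def soft_label(expression, centers=BIN_CENTERS, sigma=SIGMA):
    """Spread a scalar expression value over nearby bins with Gaussian weights."""
    weights = [math.exp(-((c - expression) ** 2) / (2 * sigma ** 2)) for c in centers]
    total = sum(weights)
    return [w / total for w in weights]


def softmax(logits):
    """Convert raw network outputs (one per bin) into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted), a common loss when training on soft labels."""
    return sum(
        t * math.log((t + eps) / (p + eps))
        for t, p in zip(target, predicted)
        if t > 0
    )


def expected_expression(probs, centers=BIN_CENTERS):
    """Read a scalar prediction back out as the expectation over bin centers."""
    return sum(p * c for p, c in zip(probs, centers))


if __name__ == "__main__":
    target = soft_label(7.3)
    predicted = softmax([0.0] * len(BIN_CENTERS))  # untrained model: uniform output
    print(round(expected_expression(target), 2))
    print(round(kl_divergence(target, predicted), 3))
```

Note that the expectation of the smoothed label lands close to, but not exactly on, the original value, because the Gaussian is discretized onto integer bin centers.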

Subjects

Deep Learning, Regulatory Sequences, Nucleic Acid, DNA, Promoter Regions, Genetic, Software

GRaNIE and GRaNPA: inference and evaluation of enhancer-mediated gene regulatory networks.

Kamal, Aryan; Arnold, Christian; Claringbould, Annique; Moussa, Rim; Servaas, Nila H; Kholmatov, Maksim; Daga, Neha; Nogina, Daria; Mueller-Dott, Sophia; Reyes-Palomares, Armando; Palla, Giovanni; Sigalova, Olga; Bunina, Daria; Pabst, Caroline; Zaugg, Judith B.

Mol Syst Biol ; 19(6): e11627, 2023 06 12.

Article in English | MEDLINE | ID: mdl-37073532

ABSTRACT

Enhancers play a vital role in gene regulation and are critical in mediating the impact of noncoding genetic variants associated with complex traits. Enhancer activity is a cell-type-specific process regulated by transcription factors (TFs), epigenetic mechanisms and genetic variants. Despite the strong mechanistic link between TFs and enhancers, we currently lack a framework for jointly analysing them in cell-type-specific gene regulatory networks (GRN). Equally important, we lack an unbiased way of assessing the biological significance of inferred GRNs since no complete ground truth exists. To address these gaps, we present GRaNIE (Gene Regulatory Network Inference including Enhancers) and GRaNPA (Gene Regulatory Network Performance Analysis). GRaNIE (https://git.embl.de/grp-zaugg/GRaNIE) builds enhancer-mediated GRNs based on covariation of chromatin accessibility and RNA-seq across samples (e.g. individuals), while GRaNPA (https://git.embl.de/grp-zaugg/GRaNPA) assesses the performance of GRNs for predicting cell-type-specific differential expression. We demonstrate their power by investigating gene regulatory mechanisms underlying the response of macrophages to infection, cancer and common genetic traits including autoimmune diseases. Finally, our methods identify the TF PURA as a putative regulator of pro-inflammatory macrophage polarisation.
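The core idea of building links "based on covariation of chromatin accessibility and RNA-seq across samples" can be sketched as a correlation screen: for each accessible region (peak) and each gene, correlate the two profiles across the same samples and keep strongly covarying pairs. This is a hypothetical simplification, not GRaNIE's implementation; the function names, the 0.6 cutoff, and the toy data are illustrative, and the sketch omits TF-to-enhancer links and any significance testing.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length per-sample vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no covariation signal
    return cov / (sx * sy)


def peak_gene_links(accessibility, expression, min_abs_r=0.6):
    """Link peaks to genes whose expression covaries with peak accessibility.

    accessibility: {peak_id: [value per sample]}
    expression:    {gene_id: [value per sample]}, same sample order
    """
    links = []
    for peak, acc in accessibility.items():
        for gene, expr in expression.items():
            r = pearson(acc, expr)
            if abs(r) >= min_abs_r:
                links.append((peak, gene, r))
    return links


if __name__ == "__main__":
    acc = {"peak1": [1.0, 2.0, 3.0, 4.0]}  # toy data, four samples
    expr = {
        "geneA": [2.0, 4.0, 6.0, 8.0],
        "geneB": [4.0, 3.0, 2.0, 1.0],
        "geneC": [1.0, 3.0, 1.0, 3.0],
    }
    for peak, gene, r in peak_gene_links(acc, expr):
        print(peak, gene, round(r, 2))
```

In practice the number of samples matters a great deal: with only a few samples, high correlations arise by chance, which is one reason real pipelines add statistical filtering on top of a raw screen like this.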

Subjects

Gene Regulatory Networks, Neoplasms, Humans, Gene Expression Regulation, Transcription Factors/genetics, Transcription Factors/metabolism, Chromatin, Neoplasms/genetics, Enhancer Elements, Genetic/genetics

A hierarchy in clusters of cephalopod mRNA editing sites.

Moldovan, Mikhail A; Chervontseva, Zoe S; Nogina, Daria S; Gelfand, Mikhail S.

Sci Rep ; 12(1): 3447, 2022 03 02.

Article in English | MEDLINE | ID: mdl-35236910

ABSTRACT

RNA editing in the form of substituting adenine with inosine (A-to-I editing) is the most frequent type of RNA editing in many metazoan species. In most species, A-to-I editing sites tend to form clusters, and editing at clustered sites depends on editing of the adjacent sites. Although functionally important in some specific cases, A-to-I editing is usually rare. The exception occurs in soft-bodied coleoid cephalopods, where tens of thousands of potentially important A-to-I editing sites have been identified, making coleoids an ideal model for studying the properties and evolution of A-to-I editing sites. Here, we apply several diverse techniques to demonstrate a strong tendency of coleoid RNA editing sites to cluster along the transcript. We show that clustering of editing sites and correlated editing substantially contribute to the transcriptome diversity that arises due to extensive RNA editing. Moreover, we identify three distinct types of editing site clusters, varying in size, and describe RNA structural features and mechanisms likely underlying the formation of these clusters. In particular, these observations may explain sequence conservation at large distances around editing sites and the observed dependency of editing on mutations in the vicinity of editing sites.
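As a simple illustration of what "clustering along the transcript" means, the sketch below groups sorted editing-site positions so that consecutive sites closer than a gap threshold share a cluster (single-linkage on 1D coordinates), then tallies cluster sizes. The abstract does not specify the authors' techniques, so this is not their method; the 50-nt threshold and the site positions are hypothetical.

```python
from collections import Counter


def cluster_sites(positions, max_gap=50):
    """Group positions so consecutive sites within max_gap nt share a cluster."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)  # close to the previous site: extend cluster
        else:
            clusters.append([pos])  # gap too large: start a new cluster
    return clusters


def cluster_size_counts(clusters):
    """Tally how many clusters contain 1, 2, 3, ... sites."""
    return Counter(len(c) for c in clusters)


if __name__ == "__main__":
    sites = [900, 10, 30, 200, 220, 235]  # hypothetical positions on one transcript
    clusters = cluster_sites(sites)
    print(clusters)  # [[10, 30], [200, 220, 235], [900]]
    print(cluster_size_counts(clusters))
```

The resulting size distribution is the kind of summary one could inspect for distinct cluster classes, though separating "types" as the paper describes would require more than a single distance cutoff.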

Subjects

Cephalopoda, Animals, Cephalopoda/genetics, Cephalopoda/metabolism, Inosine/metabolism, RNA/genetics, RNA Editing, RNA, Messenger/genetics

Ribosomal leaky scanning through a translated uORF requires eIF4G2.

Smirnova, Victoria V; Shestakova, Ekaterina D; Nogina, Daria S; Mishchenko, Polina A; Prikazchikova, Tatiana A; Zatsepin, Timofei S; Kulakovskiy, Ivan V; Shatsky, Ivan N; Terenin, Ilya M.

Nucleic Acids Res ; 50(2): 1111-1127, 2022 01 25.

Article in English | MEDLINE | ID: mdl-35018467

ABSTRACT

eIF4G2 (DAP5 or Nat1) is a homologue of the canonical translation initiation factor eIF4G1 in higher eukaryotes, but its function remains poorly understood. Unlike eIF4G1, eIF4G2 does not interact with the cap-binding protein eIF4E and is believed to drive translation under stress when eIF4E activity is impaired. Here, we show that eIF4G2 operates under normal conditions as well and promotes scanning downstream of the eIF4G1-mediated 40S recruitment and cap-proximal scanning. Specifically, eIF4G2 facilitates leaky scanning for a subset of mRNAs. Apparently, eIF4G2 replaces eIF4G1 during scanning of the 5' UTR, and the need for eIF4G2 arises only when eIF4G1 dissociates from the scanning complex. In particular, this event can occur when the leaky scanning complexes interfere with initiating or elongating 80S ribosomes within a translated uORF. This mechanism is therefore crucial for higher eukaryotes, which are known to have long 5' UTRs that frequently contain uORFs. We suggest that uORFs are not the only obstacle in the path of scanning complexes toward the main start codon, because certain eIF4G2 mRNA targets lack uORF(s). Thus, higher eukaryotes possess two distinct scanning complexes: the principal one that binds mRNA and initiates scanning, and an accessory one that rescues scanning when the former fails.

Subjects

Eukaryotic Initiation Factor-4G/metabolism, RNA, Messenger/metabolism, Ribosomes/metabolism, Humans, Open Reading Frames, Protein Biosynthesis
