Search | BVS IEC

In silico design of DNA sequences for in vivo nucleosome positioning.

Routhier, Etienne; Joubert, Alexandra; Westbrook, Alex; Pierre, Edgard; Lancrey, Astrid; Cariou, Marie; Boulé, Jean-Baptiste; Mozziconacci, Julien.

Nucleic Acids Res ; 52(12): 6802-6810, 2024 Jul 08.

Article in English | MEDLINE | ID: mdl-38828788

ABSTRACT

The computational design of synthetic DNA sequences with designer in vivo properties is gaining traction in the field of synthetic genomics. We propose here a computational method which combines a kinetic Monte Carlo framework with a deep mutational screening based on deep learning predictions. We apply our method to build regular nucleosome arrays with tailored nucleosomal repeat lengths (NRL) in yeast. Our design was validated in vivo by successfully engineering and integrating thousands of kilobases long tandem arrays of computationally optimized sequences which could accommodate NRLs much larger than the yeast natural NRL (namely 197 and 237 bp, compared to the natural NRL of ~165 bp). RNA-seq results show that transcription of the arrays can occur but is not driven by the NRL. The computational method proposed here delineates the key sequence rules for nucleosome positioning in yeast and should be easily applicable to other sequence properties and other genomes.

Subjects

Nucleosomes, Saccharomyces cerevisiae, Nucleosomes/metabolism, Nucleosomes/genetics, Nucleosomes/chemistry, Saccharomyces cerevisiae/genetics, Computer Simulation, Monte Carlo Method, DNA/genetics, DNA/chemistry, DNA/metabolism, Base Sequence, Deep Learning, Chromatin Assembly and Disassembly
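
The abstract above describes pairing a kinetic Monte Carlo framework with a screen of single mutations scored by a predictor. The sketch below illustrates that general idea only and is not the authors' implementation. Everything in it is an assumption made for illustration: `toy_predict` is a trivial stand-in for a trained deep learning model, `target_profile` is a hypothetical square-wave occupancy target, and the `temperature` and `core` parameters are arbitrary.

```python
import math
import random

BASES = "ACGT"

def target_profile(length, nrl, core=147):
    """Hypothetical target: 1 inside a nucleosome core, 0 in the linker."""
    return [1.0 if (i % nrl) < core else 0.0 for i in range(length)]

def toy_predict(seq):
    """Stand-in for a trained predictor (assumption, not the authors' model).
    G/C bases are simply scored as 'occupied'."""
    return [1.0 if b in "GC" else 0.0 for b in seq]

def energy(seq, target):
    """Squared distance between the predicted and target profiles."""
    return sum((p - t) ** 2 for p, t in zip(toy_predict(seq), target))

def design(length, nrl, steps=200, temperature=0.1, seed=0, core=147):
    rng = random.Random(seed)
    target = target_profile(length, nrl, core)
    seq = [rng.choice(BASES) for _ in range(length)]
    current = energy(seq, target)
    for _ in range(steps):
        # Mutational screen: score every possible single-nucleotide change.
        moves, weights = [], []
        for i, old in enumerate(seq):
            for b in BASES:
                if b == old:
                    continue
                seq[i] = b
                e = energy(seq, target)
                moves.append((i, b, e))
                weights.append(math.exp(-(e - current) / temperature))
            seq[i] = old  # restore before screening the next position
        # Rejection-free step: pick one mutation with probability
        # proportional to its Boltzmann-like rate.
        i, b, current = rng.choices(moves, weights=weights, k=1)[0]
        seq[i] = b
    return "".join(seq), current
```

Calling `design(40, 20, steps=100, core=12)` returns a designed sequence together with its final energy. Screening every mutation at each step costs one predictor call per candidate, which is why a fast predictor matters in practice; the toy scorer here hides that cost.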

Genome-wide prediction of DNA mutation effect on nucleosome positions for yeast synthetic genomics.

Routhier, Etienne; Pierre, Edgard; Khodabandelou, Ghazaleh; Mozziconacci, Julien.

Genome Res ; 31(2): 317-326, 2021 Feb.

Article in English | MEDLINE | ID: mdl-33355297

ABSTRACT

Genetically modified genomes are often used today in many areas of fundamental and applied research. In many studies, coding or noncoding regions are modified in order to change protein sequences or gene expression levels. Modifying one or several nucleotides in a genome can also lead to unexpected changes in the epigenetic regulation of genes. When designing a synthetic genome with many mutations, it would thus be very informative to be able to predict the effect of these mutations on chromatin. We develop here a deep learning approach that quantifies the effect of every possible single mutation on nucleosome positions on the full Saccharomyces cerevisiae genome. This type of annotation track can be used when designing a modified S. cerevisiae genome. We further highlight how this track can provide new insights on the sequence-dependent mechanisms that drive nucleosomes' positions in vivo.

keras_dna: a wrapper for fast implementation of deep learning models in genomics.

Routhier, Etienne; Bin Kamruddin, Ayman; Mozziconacci, Julien.

Bioinformatics ; 37(11): 1593-1594, 2021 07 12.

Article in English | MEDLINE | ID: mdl-33135730

ABSTRACT

SUMMARY: Prediction of genomic annotations from DNA sequences using deep learning is today becoming a flourishing field with many applications. Nevertheless, there are still difficulties in handling data in order to conveniently build and train models dedicated to specific end-user tasks. keras_dna is designed for an easy implementation of Keras models (TensorFlow high level API) for genomics. It can handle standard bioinformatic file formats as inputs such as bigwig, gff, bed, wig, bedGraph or fasta and returns standardized inputs for model training. keras_dna is designed to implement existing models but also to facilitate the development of new models that can have single or multiple targets or inputs. AVAILABILITY AND IMPLEMENTATION: Freely available with an MIT License using pip install keras_dna or cloning the github repo at https://github.com/etirouthier/keras_dna.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Deep Learning, Software, DNA/genetics, Genome, Genomics

Genomics enters the deep learning era.

Routhier, Etienne; Mozziconacci, Julien.

PeerJ ; 10: e13613, 2022.

Article in English | MEDLINE | ID: mdl-35769139

ABSTRACT

The tremendous amount of biological sequence data available, combined with the recent methodological breakthrough in deep learning in domains such as computer vision or natural language processing, is leading today to the transformation of bioinformatics through the emergence of deep genomics, the application of deep learning to genomic sequences. We review here the new applications that the use of deep learning enables in the field, focusing on three aspects: the functional annotation of genomes, the sequence determinants of the genome functions and the possibility to write synthetic genomic sequences.

Subjects

Deep Learning, Computer Neural Networks, Genomics, Computational Biology

Nucleosome Positioning on Large Tandem DNA Repeats of the '601' Sequence Engineered in Saccharomyces cerevisiae.

Lancrey, Astrid; Joubert, Alexandra; Duvernois-Berthet, Evelyne; Routhier, Etienne; Raj, Saurabh; Thierry, Agnès; Sigarteu, Marta; Ponger, Loic; Croquette, Vincent; Mozziconacci, Julien; Boulé, Jean-Baptiste.

J Mol Biol ; 434(7): 167497, 2022 04 15.

Article in English | MEDLINE | ID: mdl-35189129

ABSTRACT

The artificial 601 DNA sequence is often used to constrain the position of nucleosomes on a DNA molecule in vitro. Although the ability of the 147 base pair sequence to precisely position a nucleosome in vitro is well documented, application of this property in vivo has been explored only in a few studies and yielded contradictory conclusions. Our goal in the present study was to test the ability of the 601 sequence to dictate nucleosome positioning in Saccharomyces cerevisiae in the context of a long tandem repeat array inserted in a yeast chromosome. We engineered such arrays with three different repeat sizes, namely 167, 197 and 237 base pairs. Although our arrays are able to position nucleosomes in vitro, analysis of nucleosome occupancy in vivo revealed that nucleosomes are not preferentially positioned as expected on the 601-core sequence along the repeats and that the measured nucleosome repeat length does not correspond to the one expected by design. Altogether our results demonstrate that the rules defining nucleosome positions on this DNA sequence in vitro are not valid in vivo, at least in this chromosomal context, questioning the relevance of using the 601 sequence in vivo to achieve precise nucleosome positioning on designer synthetic DNA sequences.

Subjects

Nucleosomes, Saccharomyces cerevisiae, Tandem Repeat Sequences, Chromatin Assembly and Disassembly, Fungal DNA/genetics, Fungal DNA/metabolism, Genetic Engineering, Nucleosomes/metabolism, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism, Tandem Repeat Sequences/genetics

Genome annotation across species using deep convolutional neural networks.

Khodabandelou, Ghazaleh; Routhier, Etienne; Mozziconacci, Julien.

PeerJ Comput Sci ; 6: e278, 2020.

Article in English | MEDLINE | ID: mdl-33816929

ABSTRACT

The application of deep neural networks is a rapidly expanding field now reaching many disciplines including genomics. In particular, convolutional neural networks have been exploited for identifying the functional role of short genomic sequences. These approaches rely on gathering large sets of sequences with known functional roles, extracting those sequences from whole-genome annotations. These sets are then split into learning, test and validation sets in order to train the networks. While the obtained networks perform well on validation sets, they often perform poorly when applied to whole genomes in which the ratio of positive over negative examples can be very different from that in the training set. We here address this issue by assessing the genome-wide performance of networks trained with sets exhibiting different ratios of positive to negative examples. As a case study, we use sequences encompassing gene starts from the RefGene database as positive examples and random genomic sequences as negative examples. We then demonstrate that models trained using data from one organism can be used to predict gene-start sites in a related species, when using training sets providing good genome-wide performance. This cross-species application of convolutional neural networks provides a new way to annotate any genome from existing high-quality annotations in a related reference species. It also provides a way to determine whether the sequence motifs recognised by chromatin-associated proteins in different species are conserved or not.
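
The gap between validation-set and genome-wide performance described above can be illustrated with simple arithmetic. The sketch below is not taken from the paper. It assumes a classifier with fixed, hypothetical true-positive and false-positive rates and shows how its precision changes as negatives come to outnumber positives.

```python
def precision(tpr, fpr, n_pos, n_neg):
    """Expected precision of a classifier with the given true-positive rate
    (tpr) and false-positive rate (fpr) on n_pos positives and n_neg negatives."""
    tp = tpr * n_pos  # expected true positives
    fp = fpr * n_neg  # expected false positives
    if tp + fp == 0:
        raise ValueError("no predicted positives; precision is undefined")
    return tp / (tp + fp)

# Hypothetical rates chosen only for illustration.
TPR, FPR = 0.90, 0.05

for n_neg in (1, 100, 1000):
    p = precision(TPR, FPR, n_pos=1, n_neg=n_neg)
    print(f"positives:negatives = 1:{n_neg:<5d} precision = {p:.3f}")
```

With these assumed rates, precision is about 0.95 on a balanced set, about 0.15 at a 1:100 ratio and below 0.02 at 1:1000, even though the classifier itself is unchanged. Genome-wide scans of rare features such as gene starts sit at the skewed end of that range.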
