Search | Secretaria de Estado da Saúde

1.

Intracellular Salmonella Paratyphi A is motile and differs in the expression of flagella-chemotaxis, SPI-1 and carbon utilization pathways in comparison to intracellular S. Typhimurium.

Cohen, Helit; Hoede, Claire; Scharte, Felix; Coluzzi, Charles; Cohen, Emiliano; Shomer, Inna; Mallet, Ludovic; Holbert, Sébastien; Serre, Remy Felix; Schiex, Thomas; Virlogeux-Payant, Isabelle; Grassl, Guntram A; Hensel, Michael; Chiapello, Hélène; Gal-Mor, Ohad.

PLoS Pathog ; 18(4): e1010425, 2022 04.

Article in English | MEDLINE | ID: mdl-35381053

ABSTRACT

Although Salmonella Typhimurium (STM) and Salmonella Paratyphi A (SPA) belong to the same phylogenetic species, share large portions of their genome and express many common virulence factors, they differ vastly in their host specificity, the immune response they elicit, and the clinical manifestations they cause. In this work, we compared their intracellular transcriptomic architecture and cellular phenotypes during human epithelial cell infection. While transcription induction of many metal transport systems, purines, biotin, PhoPQ and SPI-2 regulons was similar in both intracellular SPA and STM, we identified 234 differentially expressed genes that showed distinct expression patterns in intracellular SPA vs. STM. Surprisingly, clear expression differences were found in SPI-1, motility and chemotaxis, and carbon (mainly citrate, galactonate and ethanolamine) utilization pathways, indicating that these pathways are regulated differently during their intracellular phase. Concurring, on the cellular level, we show that while the majority of STM are non-motile and reside within Salmonella-Containing Vacuoles (SCV), a significant proportion of intracellular SPA cells are motile and compartmentalized in the cytosol. Moreover, we found that the elevated expression of SPI-1 and motility genes by intracellular SPA results in increased invasiveness of SPA, following exit from host cells. These findings demonstrate unexpected flagellum-dependent intracellular motility of a typhoidal Salmonella serovar and intriguing differences in intracellular localization between typhoidal and non-typhoidal salmonellae. We propose that these differences facilitate new cycles of host cell infection by SPA and may contribute to the ability of SPA to disseminate beyond the intestinal lamina propria of the human host during enteric fever.

Subjects

Chemotaxis, Salmonella paratyphi A, Bacterial Proteins/metabolism, Carbon/metabolism, Flagella/genetics, Flagella/metabolism, Intercellular Signaling Peptides and Proteins, Phylogeny, Salmonella paratyphi A/metabolism, Salmonella typhimurium

2.

The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution.

Badouin, Hélène; Gouzy, Jérôme; Grassa, Christopher J; Murat, Florent; Staton, S Evan; Cottret, Ludovic; Lelandais-Brière, Christine; Owens, Gregory L; Carrère, Sébastien; Mayjonade, Baptiste; Legrand, Ludovic; Gill, Navdeep; Kane, Nolan C; Bowers, John E; Hubner, Sariel; Bellec, Arnaud; Bérard, Aurélie; Bergès, Hélène; Blanchet, Nicolas; Boniface, Marie-Claude; Brunel, Dominique; Catrice, Olivier; Chaidir, Nadia; Claudel, Clotilde; Donnadieu, Cécile; Faraut, Thomas; Fievet, Ghislain; Helmstetter, Nicolas; King, Matthew; Knapp, Steven J; Lai, Zhao; Le Paslier, Marie-Christine; Lippi, Yannick; Lorenzon, Lolita; Mandel, Jennifer R; Marage, Gwenola; Marchand, Gwenaëlle; Marquand, Elodie; Bret-Mestries, Emmanuelle; Morien, Evan; Nambeesan, Savithri; Nguyen, Thuy; Pegot-Espagnet, Prune; Pouilly, Nicolas; Raftis, Frances; Sallet, Erika; Schiex, Thomas; Thomas, Justine; Vandecasteele, Céline; Varès, Didier.

Nature ; 546(7656): 148-152, 2017 06 01.

Article in English | MEDLINE | ID: mdl-28538728

ABSTRACT

The domesticated sunflower, Helianthus annuus L., is a global oil crop that has promise for climate change adaptation, because it can maintain stable yields across a wide variety of environmental conditions, including drought. Even greater resilience is achievable through the mining of resistance alleles from compatible wild sunflower relatives, including numerous extremophile species. Here we report a high-quality reference for the sunflower genome (3.6 gigabases), together with extensive transcriptomic data from vegetative and floral organs. The genome mostly consists of highly similar, related sequences and required single-molecule real-time sequencing technologies for successful assembly. Genome analyses enabled the reconstruction of the evolutionary history of the Asterids, further establishing the existence of a whole-genome triplication at the base of the Asterids II clade and a sunflower-specific whole-genome duplication around 29 million years ago. An integrative approach combining quantitative genetics, expression and diversity data permitted development of comprehensive gene networks for two major breeding traits, flowering time and oil metabolism, and revealed new candidate genes in these networks. We found that the genomic architecture of flowering time has been shaped by the most recent whole-genome duplication, which suggests that ancient paralogues can remain in the same regulatory networks for dozens of millions of years. This genome represents a cornerstone for future research programs aiming to exploit genetic diversity to improve biotic and abiotic stress resistance and oil production, while also considering agricultural constraints and human nutritional needs.

Subjects

Molecular Evolution, Flowers/genetics, Flowers/physiology, Plant Genome/genetics, Helianthus/genetics, Helianthus/metabolism, Plant Oils/metabolism, Acclimatization/genetics, Gene Duplication/genetics, Plant Gene Expression Regulation, Genetic Variation, Genomics, Helianthus/classification, DNA Sequence Analysis, Physiological Stress/genetics, Sunflower Oil, Transcriptome/genetics

3.

Iterated local search with partition crossover for computational protein design.

Beuvin, François; de Givry, Simon; Schiex, Thomas; Verel, Sébastien; Simoncini, David.

Proteins ; 89(11): 1522-1529, 2021 11.

Article in English | MEDLINE | ID: mdl-34228826

ABSTRACT

Structure-based computational protein design (CPD) refers to the problem of finding a sequence of amino acids which folds into a specific desired protein structure, and possibly fulfills some targeted biochemical properties. Recent studies point out the particularly rugged CPD energy landscape, suggesting that local search optimization methods should be designed and tuned to easily escape local minima attraction basins. In this article, we analyze the performance and search dynamics of an iterated local search (ILS) algorithm enhanced with partition crossover. Our algorithm, PILS, quickly finds local minima and escapes their basins of attraction by solution perturbation. Additionally, the partition crossover operator exploits the structure of the residue interaction graph in order to efficiently mix solutions and find new unexplored basins. Our results on a benchmark of 30 proteins of various topology and size show that PILS consistently finds lower energy solutions compared to Rosetta fixbb and a classic ILS, and that the corresponding sequences are mostly closer to the native.

Subjects

Algorithms, Amino Acids/chemistry, Protein Engineering/methods, Proteins/chemistry, Software, Amino Acid Sequence, Benchmarking, Computational Biology, Protein Conformation, Protein Folding, Thermodynamics
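The abstract of this record describes an iterated local search that descends to a local minimum and then escapes its basin by perturbing the solution. The following is a minimal sketch of that general idea only. It is not the PILS algorithm: it omits partition crossover entirely, and the energy function `toy_energy`, the four-letter alphabet, and all parameter values are hypothetical choices made for illustration.

```python
import random

def local_search(seq, energy, alphabet):
    """Descend by accepting any single-position change that lowers the energy."""
    seq = list(seq)
    improved = True
    while improved:
        improved = False
        for i in range(len(seq)):
            for letter in alphabet:
                if letter == seq[i]:
                    continue
                candidate = seq[:i] + [letter] + seq[i + 1:]
                if energy(candidate) < energy(seq):
                    seq, improved = candidate, True
    return seq

def perturb(seq, alphabet, k, rng):
    """Randomly reassign k positions to leave the current basin of attraction."""
    seq = list(seq)
    for i in rng.sample(range(len(seq)), k):
        seq[i] = rng.choice(alphabet)
    return seq

def iterated_local_search(start, energy, alphabet, iterations=50, k=2, seed=0):
    rng = random.Random(seed)
    best = local_search(start, energy, alphabet)
    for _ in range(iterations):
        candidate = local_search(perturb(best, alphabet, k, rng), energy, alphabet)
        # Keep the new local minimum only if it is at least as good.
        if energy(candidate) <= energy(best):
            best = candidate
    return best

def toy_energy(seq):
    """Hypothetical energy: identical neighbours clash, 'G' is mildly favoured."""
    clash = sum(1.0 for a, b in zip(seq, seq[1:]) if a == b)
    return clash - 0.5 * seq.count("G")

start = list("AAAAAAAA")
best = iterated_local_search(start, toy_energy, alphabet="AGVD")
print("".join(best), toy_energy(best))
```

Because a candidate is accepted only when its energy does not exceed the current best, the returned sequence can never be worse than the first local minimum.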

4.

Seven Amino Acid Types Suffice to Create the Core Fold of RNA Polymerase.

Yagi, Sota; Padhi, Aditya K; Vucinic, Jelena; Barbe, Sophie; Schiex, Thomas; Nakagawa, Reiko; Simoncini, David; Zhang, Kam Y J; Tagami, Shunsuke.

J Am Chem Soc ; 143(39): 15998-16006, 2021 10 06.

Article in English | MEDLINE | ID: mdl-34559526

ABSTRACT

The extant complex proteins must have evolved from ancient short and simple ancestors. The double-ψ β-barrel (DPBB) is one of the oldest protein folds and conserved in various fundamental enzymes, such as the core domain of RNA polymerase. Here, by reverse engineering a modern DPBB domain, we reconstructed its plausible evolutionary pathway started by "interlacing homodimerization" of a half-size peptide, followed by gene duplication and fusion. Furthermore, by simplifying the amino acid repertoire of the peptide, we successfully created the DPBB fold with only seven amino acid types (Ala, Asp, Glu, Gly, Lys, Arg, and Val), which can be coded by only GNN and ARR (R = A or G) codons in the modern translation system. Thus, the DPBB fold could have been materialized by the early translation system and genetic code.

Subjects

Amino Acids/chemistry, Amino Acids/classification, DNA-Directed RNA Polymerases/chemistry, DNA-Directed RNA Polymerases/metabolism, Amino Acid Sequence, Molecular Models, Protein Conformation, Protein Domains, Protein Folding
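The abstract of this record states that the seven amino acid types can be coded by GNN and ARR codons alone. That claim can be checked against the standard genetic code with a short script. The sketch below writes codons in DNA letters (T in place of U); the helper names `expand` and `encoded_amino_acids` are illustrative and do not come from the paper.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering
# (first base varies slowest, third base fastest; '*' marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Degenerate symbols used in the abstract: N = any base, R = A or G.
DEGENERATE = {"N": "ACGT", "R": "AG"}

def expand(pattern):
    """List every concrete codon matched by a degenerate pattern such as 'GNN'."""
    options = (DEGENERATE.get(symbol, symbol) for symbol in pattern)
    return ["".join(codon) for codon in product(*options)]

def encoded_amino_acids(patterns):
    """Return the set of one-letter amino acid codes reachable from the patterns."""
    return {CODON_TABLE[codon] for p in patterns for codon in expand(p)}

# Ala, Asp, Glu, Gly, Lys, Arg, Val in one-letter code.
SEVEN = set("ADEGKRV")

print(sorted(encoded_amino_acids(["GNN", "ARR"])))
```

GNN covers Val, Ala, Asp, Glu and Gly, while ARR covers Lys and Arg, so the two patterns together yield exactly the seven types listed and no stop codon.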

5.

Positive multistate protein design.

Vucinic, Jelena; Simoncini, David; Ruffini, Manon; Barbe, Sophie; Schiex, Thomas.

Bioinformatics ; 36(1): 122-130, 2020 01 01.

Article in English | MEDLINE | ID: mdl-31199465

ABSTRACT

MOTIVATION: Structure-based computational protein design (CPD) plays a critical role in advancing the field of protein engineering. Using an all-atom energy function, CPD tries to identify amino acid sequences that fold into a target structure and ultimately perform a desired function. The usual approach considers a single rigid backbone as a target, which ignores backbone flexibility. Multistate design (MSD) instead considers several backbone states simultaneously, which defines challenging computational problems. RESULTS: We introduce efficient reductions of positive MSD problems to Cost Function Networks with two different fitness definitions and implement them in the Pompd (Positive Multistate Protein design) software. Pompd is able to identify guaranteed optimal sequences of positive multistate full protein redesign problems and exhaustively enumerate suboptimal sequences close to the MSD optimum. When applied to nuclear magnetic resonance and back-rubbed X-ray structures, the average energy fitness provides the best sequence recovery. Our method outperforms state-of-the-art guaranteed computational design approaches by orders of magnitude and can solve MSD problems with sizes previously unreachable with guaranteed algorithms. AVAILABILITY AND IMPLEMENTATION: https://forgemia.inra.fr/thomas.schiex/pompd as documented Open Source. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Protein Engineering, Proteins, Algorithms, Amino Acid Sequence, Computational Biology, Protein Conformation, Protein Engineering/methods, Proteins/chemistry, Software

6.

Protein Design with Deep Learning.

Defresne, Marianne; Barbe, Sophie; Schiex, Thomas.

Int J Mol Sci ; 22(21), 2021 Oct 29.

Article in English | MEDLINE | ID: mdl-34769173

ABSTRACT

Computational Protein Design (CPD) has produced impressive results for engineering new proteins, resulting in a wide variety of applications. In the past few years, various efforts have aimed at replacing or improving existing design methods using Deep Learning technology to leverage the amount of publicly available protein data. Deep Learning (DL) is a very powerful tool to extract patterns from raw data, provided that data are formatted as mathematical objects and the architecture processing them is well suited to the targeted problem. In the case of protein data, specific representations are needed for both the amino acid sequence and the protein structure in order to capture respectively 1D and 3D information. As no consensus has been reached about the most suitable representations, this review describes the representations used so far, discusses their strengths and weaknesses, and details their associated DL architecture for design and related tasks.

Subjects

Computational Biology, Deep Learning, Protein Engineering, Proteins, Protein Domains, Proteins/chemistry, Proteins/genetics

7.

A Comparative Study to Decipher the Structural and Dynamics Determinants Underlying the Activity and Thermal Stability of GH-11 Xylanases.

Vucinic, Jelena; Novikov, Gleb; Montanier, Cédric Y; Dumon, Claire; Schiex, Thomas; Barbe, Sophie.

Int J Mol Sci ; 22(11), 2021 May 31.

Article in English | MEDLINE | ID: mdl-34073139

ABSTRACT

With the growing need for renewable sources of energy, interest in enzymes capable of biomass degradation has been increasing. In this paper, we consider two different xylanases from the GH-11 family: the particularly active GH-11 xylanase from Neocallimastix patriciarum, NpXyn11A, and the hyper-thermostable mutant of the environmentally isolated GH-11 xylanase, EvXyn11TS. Our aim is to identify the molecular determinants underlying the enhanced capacities of these two enzymes to ultimately graft the abilities of one on the other. Molecular dynamics simulations of the respective free-enzymes and enzyme-xylohexaose complexes were carried out at temperatures of 300, 340, and 500 K. An in-depth analysis of these MD simulations showed how differences in dynamics influence the activity and stability of these two enzymes and allowed us to study and understand in greater depth the molecular and structural basis of these two systems. In light of the results presented in this paper, the thumb region and the larger substrate binding cleft of NpXyn11A seem to play a major role in the activity of this enzyme. Its lower thermal stability may instead be caused by the higher flexibility of certain regions located further from the active site. Regions such as the N-ter, the loops located in the fingers region, the palm loop, and the helix loop seem to be less stable than in the hyper-thermostable EvXyn11TS. By identifying molecular regions that are critical for the stability of these enzymes, this study allowed us to identify promising targets for engineering GH-11 xylanases. Finally, we identify NpXyn11A as the ideal host for grafting the thermostabilizing traits of EvXyn11TS.

Subjects

Endo-1,4-beta Xylanases/chemistry, Neocallimastix/enzymology, Amino Acid Sequence, Catalytic Domain, Enzyme Stability, Kinetics, Molecular Dynamics Simulation, Temperature

8.

A structural homology approach for computational protein design with flexible backbone.

Simoncini, David; Zhang, Kam Y J; Schiex, Thomas; Barbe, Sophie.

Bioinformatics ; 35(14): 2418-2426, 2019 07 15.

Article in English | MEDLINE | ID: mdl-30496341

ABSTRACT

MOTIVATION: Structure-based Computational Protein design (CPD) plays a critical role in advancing the field of protein engineering. Using an all-atom energy function, CPD tries to identify amino acid sequences that fold into a target structure and ultimately perform a desired function. Energy functions remain however imperfect and injecting relevant information from known structures in the design process should lead to improved designs. RESULTS: We introduce Shades, a data-driven CPD method that exploits local structural environments in known protein structures together with energy to guide sequence design, while sampling side-chain and backbone conformations to accommodate mutations. Shades (Structural Homology Algorithm for protein DESign) is based on customized libraries of non-contiguous in-contact amino acid residue motifs. We have tested Shades on a public benchmark of 40 proteins selected from different protein families. When excluding homologous proteins, Shades achieved a protein sequence recovery of 30% and a protein sequence similarity of 46% on average, compared with the PFAM protein family of the target protein. When homologous structures were added, the wild-type sequence recovery rate reached 93%. AVAILABILITY AND IMPLEMENTATION: Shades source code is available at https://bitbucket.org/satsumaimo/shades as a patch for Rosetta 3.8 with a curated protein structure database and ITEM library creation software. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Software, Algorithms, Amino Acid Sequence, Computational Biology, Protein Databases, Protein Conformation, Proteins

9.

Hybridization and polyploidy enable genomic plasticity without sex in the most devastating plant-parasitic nematodes.

Blanc-Mathieu, Romain; Perfus-Barbeoch, Laetitia; Aury, Jean-Marc; Da Rocha, Martine; Gouzy, Jérôme; Sallet, Erika; Martin-Jimenez, Cristina; Bailly-Bechet, Marc; Castagnone-Sereno, Philippe; Flot, Jean-François; Kozlowski, Djampa K; Cazareth, Julie; Couloux, Arnaud; Da Silva, Corinne; Guy, Julie; Kim-Jo, Yu-Jin; Rancurel, Corinne; Schiex, Thomas; Abad, Pierre; Wincker, Patrick; Danchin, Etienne G J.

PLoS Genet ; 13(6): e1006777, 2017 06.

Article in English | MEDLINE | ID: mdl-28594822

ABSTRACT

Root-knot nematodes (genus Meloidogyne) exhibit a diversity of reproductive modes ranging from obligatory sexual to fully asexual reproduction. Intriguingly, the most widespread and devastating species to global agriculture are those that reproduce asexually, without meiosis. To disentangle this surprising parasitic success despite the absence of sex and genetic exchanges, we have sequenced and assembled the genomes of three obligatory ameiotic and asexual Meloidogyne. We have compared them to those of relatives able to perform meiosis and sexual reproduction. We show that the genomes of ameiotic asexual Meloidogyne are large, polyploid and made of duplicated regions with a high within-species average nucleotide divergence of ~8%. Phylogenomic analysis of the genes present in these duplicated regions suggests that they originated from multiple hybridization events and are thus homoeologs. We found that up to 22% of homoeologous gene pairs were under positive selection and these genes covered a wide spectrum of predicted functional categories. To biologically assess functional divergence, we compared expression patterns of homoeologous gene pairs across developmental life stages using an RNAseq approach in the most economically important asexually-reproducing nematode. We showed that >60% of homoeologous gene pairs display diverged expression patterns. These results suggest a substantial functional impact of the genome structure. Contrasting with high within-species nuclear genome divergence, mitochondrial genome divergence between the three ameiotic asexuals was very low, signifying that these putative hybrids share a recent common maternal ancestor. Transposable elements (TE) cover a ~1.7 times higher proportion of the genomes of the ameiotic asexual Meloidogyne compared to the sexual relative and might also participate in their plasticity. 
The intriguing parasitic success of asexually-reproducing Meloidogyne species could be partly explained by their TE-rich composite genomes, resulting from allopolyploidization events, and promoting plasticity and functional divergence between gene copies in the absence of sex and meiosis.

Subjects

Genetic Variation, Helminth Genome, Genetic Hybridization, Polyploidy, Asexual Reproduction, Tylenchoidea/genetics, Animals, DNA Transposable Elements, Mitochondrial Genome, Genetic Polymorphism, Genetic Selection

10.

Cost function network-based design of protein-protein interactions: predicting changes in binding affinity.

Viricel, Clément; de Givry, Simon; Schiex, Thomas; Barbe, Sophie.

Bioinformatics ; 34(15): 2581-2589, 2018 08 01.

Article in English | MEDLINE | ID: mdl-29474517

ABSTRACT

Motivation: Accurate and economic methods to predict change in protein binding free energy upon mutation are imperative to accelerate the design of proteins for a wide range of applications. Free energy is defined by enthalpic and entropic contributions. Following recent progress in Artificial Intelligence-based algorithms for guaranteed NP-hard energy optimization and partition function computation, it has become possible to quickly compute minimum energy conformations and to reliably estimate the entropic contribution of side-chains in the change of free energy of large protein interfaces. Results: Using guaranteed Cost Function Network algorithms, Rosetta energy functions and Dunbrack's rotamer library, we developed and assessed EasyE and JayZ, two methods for binding affinity estimation that ignore or include conformational entropic contributions on a large benchmark of binding affinity experimental measures. While both approaches outperform most established tools, we observe that side-chain conformational entropy brings little or no improvement on most systems but becomes crucial in some rare cases. Availability and implementation: as open-source Python/C++ code at sourcesup.renater.fr/projects/easy-jayz. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

Artificial Intelligence, Computational Biology/methods, Mutation, Protein Binding, Proteins/chemistry, Thermodynamics, Animals, Bacteria/genetics, Bacteria/metabolism, Entropy, Humans, Protein Conformation, Proteins/genetics, Proteins/metabolism, Software
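The abstract of this record contrasts a score based on the minimum energy conformation with one that also accounts for conformational entropy through a partition function. The sketch below illustrates that distinction with the textbook Boltzmann relation F = -kT ln Z over a small, made-up list of conformation energies. It does not reproduce how EasyE or JayZ compute their scores; the energy values and the constant `KT` (roughly kT in kcal/mol near room temperature) are assumptions for illustration.

```python
import math

KT = 0.593  # approximate kT in kcal/mol near 298 K (assumed value)

def minimum_energy(energies):
    """Score that ignores entropy: the single best conformation."""
    return min(energies)

def free_energy(energies, kt=KT):
    """Score that includes conformational entropy: F = -kT * ln(sum(exp(-E/kT))).

    Energies are shifted by their minimum before exponentiation so that the
    sum stays numerically stable.
    """
    e_min = min(energies)
    z_shifted = sum(math.exp(-(e - e_min) / kt) for e in energies)
    return e_min - kt * math.log(z_shifted)

# Hypothetical conformation energies (kcal/mol) for one complex.
energies = [-10.0, -9.8, -9.5, -5.0]

print(minimum_energy(energies))
print(free_energy(energies))
```

With a single accessible conformation the two scores coincide. Each additional low-energy conformation lowers the free energy further below the minimum energy, which is the entropic contribution that the abstract reports as negligible for most systems and crucial in a few.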

11.

Variable Neighborhood Search with Cost Function Networks To Solve Large Computational Protein Design Problems.

Charpentier, Antoine; Mignon, David; Barbe, Sophie; Cortes, Juan; Schiex, Thomas; Simonson, Thomas; Allouche, David.

J Chem Inf Model ; 59(1): 127-136, 2019 01 28.

Article in English | MEDLINE | ID: mdl-30380857

ABSTRACT

Computational protein design (CPD) aims to predict amino acid sequences that fold to specific structures and perform desired functions. CPD depends on a rotamer library, an energy function, and an algorithm to search the sequence/conformation space. Variable neighborhood search (VNS) with cost function networks is a powerful framework that can provide tight upper bounds on the global minimum energy. We propose a new CPD heuristic based on VNS in which a subset of the solution space (a "neighborhood") is explored, whose size is gradually increased with a dedicated probabilistic heuristic. The algorithm was tested on 99 protein designs with fixed backbones involving nine proteins from the SH2, SH3, and PDZ families. The number of mutating positions was 20, 30, or all of the amino acids, while the rest of the protein explored side-chain rotamers. VNS was more successful than Monte Carlo (MC), replica-exchange MC, and a heuristic steepest-descent energy minimization, providing solutions with equal or lower best energies in most cases. For complete protein redesign, it gave solutions that were 2.5 to 11.2 kcal/mol lower in energy than those obtained with the other approaches. VNS is implemented in the toulbar2 software. It could be very helpful for large and/or complex design problems.

Subjects

Computational Biology, Protein Engineering, Proteins/chemistry, Algorithms, Molecular Models, Monte Carlo Method, Protein Conformation, Software

12.

Balancing exploration and exploitation in population-based sampling improves fragment-based de novo protein structure prediction.

Simoncini, David; Schiex, Thomas; Zhang, Kam Y J.

Proteins ; 85(5): 852-858, 2017 05.

Article in English | MEDLINE | ID: mdl-28066917

ABSTRACT

Conformational search space exploration remains a major bottleneck for protein structure prediction methods. Population-based meta-heuristics typically make it possible to control the search dynamics and to tune the balance between local energy minimization and search space exploration. EdaFold is a fragment-based approach that can guide search by periodically updating the probability distribution over the fragment libraries used during model assembly. We implement the EdaFold algorithm as a Rosetta protocol and provide two different probability update policies: a cluster-based variation (EdaRosec) and an energy-based one (EdaRoseen). We analyze the search dynamics of our new Rosetta protocols and show that EdaRosec is able to provide predictions with lower Cα RMSD to the native structure than EdaRoseen and the Rosetta AbInitio Relax protocol. Our software is freely available as a C++ patch for the Rosetta suite and can be downloaded from http://www.riken.jp/zhangiru/software/. Our protocols can easily be extended in order to create alternative probability update policies and generate new search dynamics. Proteins 2017; 85:852-858. © 2016 Wiley Periodicals, Inc.

Subjects

Algorithms, Proteins/chemistry, Proteomics/statistics & numerical data, Software, Benchmarking, Cluster Analysis, Internet, Protein Conformation, Proteomics/methods, Thermodynamics

13.

Fast search algorithms for computational protein design.

Traoré, Seydou; Roberts, Kyle E; Allouche, David; Donald, Bruce R; André, Isabelle; Schiex, Thomas; Barbe, Sophie.

J Comput Chem ; 37(12): 1048-58, 2016 May 05.

Article in English | MEDLINE | ID: mdl-26833706

ABSTRACT

One of the main challenges in computational protein design (CPD) is the huge size of the protein sequence and conformational space that has to be computationally explored. Recently, we showed that state-of-the-art combinatorial optimization technologies based on Cost Function Network (CFN) processing allow speeding up provable rigid backbone protein design methods by several orders of magnitude. Building on this, we improved and injected CFN technology into the well-established CPD package Osprey to allow all Osprey CPD algorithms to benefit from associated speedups. Because Osprey fundamentally relies on the ability of A* to produce conformations in increasing order of energy, we defined new A* strategies combining CFN lower bounds with a new side-chain positioning-based branching scheme. Beyond the speedups obtained in the new A*-CFN combination, this novel branching scheme enables a much faster enumeration of suboptimal sequences, far beyond what is reachable without it. Together with the immediate and important speedups provided by CFN technology, these developments directly benefit all the algorithms that previously relied on the DEE/A* combination inside Osprey and make it possible to solve larger CPD problems with provable algorithms.

Subjects

Algorithms, Computational Biology, Proteins/chemistry, Amino Acid Sequence, Drug Design, Protein Conformation

14.

EuGene-PP: a next-generation automated annotation pipeline for prokaryotic genomes.

Sallet, Erika; Gouzy, Jérôme; Schiex, Thomas.

Bioinformatics ; 30(18): 2659-61, 2014 Sep 15.

Article in English | MEDLINE | ID: mdl-24880686

ABSTRACT

It is now easy and increasingly usual to produce oriented RNA-Seq data as a prokaryotic genome is being sequenced. However, this information is usually just used for expression quantification. EuGene-PP is a fully automated pipeline for structural annotation of prokaryotic genomes integrating protein similarities, statistical information and any oriented expression information (RNA-Seq or tiling arrays) through a variety of file formats to produce a qualitatively enriched annotation including coding regions but also (possibly antisense) non-coding genes and transcription start sites. AVAILABILITY AND IMPLEMENTATION: EuGene-PP is open-source software based on EuGene-P integrating a Galaxy configuration. EuGene-PP can be downloaded at eugene.toulouse.inra.fr.

Subjects

Bacteria/genetics, Bacterial Genome/genetics, Genomics/methods, Molecular Sequence Annotation/methods, Software, Automation, RNA Sequence Analysis, Transcription Initiation Site

15.

A new framework for computational protein design through cost function network optimization.

Traoré, Seydou; Allouche, David; André, Isabelle; de Givry, Simon; Katsirelos, George; Schiex, Thomas; Barbe, Sophie.

Bioinformatics ; 29(17): 2129-36, 2013 Sep 01.

Article in English | MEDLINE | ID: mdl-23842814

ABSTRACT

MOTIVATION: The main challenge for structure-based computational protein design (CPD) remains the combinatorial nature of the search space. Even in its simplest fixed-backbone formulation, CPD encompasses a computationally difficult NP-hard problem that prevents the exact exploration of complex systems defining large sequence-conformation spaces. RESULTS: We present here a CPD framework, based on cost function network (CFN) solving, a recent exact combinatorial optimization technique, to efficiently handle highly complex combinatorial spaces encountered in various protein design problems. We show that the CFN-based approach is able to solve to optimality a variety of complex designs that could often not be solved using a usual CPD-dedicated tool or state-of-the-art exact operations research tools. Beyond the identification of the optimal solution, the global minimum-energy conformation, the CFN-based method is also able to quickly enumerate large ensembles of suboptimal solutions of interest to rationally build experimental enzyme mutant libraries. AVAILABILITY: The combined pipeline used to generate energetic models (based on a patched version of the open source solver Osprey 2.0), the conversion to CFN models (based on Perl scripts) and CFN solving (based on the open source solver toulbar2) are all available at http://genoweb.toulouse.inra.fr/~tschiex/CPD

Subjects

Protein Conformation, Protein Engineering/methods, Algorithms, Molecular Models, Proteins/chemistry, Protein Sequence Analysis, Software
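The abstract of this record frames protein design as cost function network (CFN) optimization: find the assignment with minimum total cost (the global minimum-energy conformation) and enumerate suboptimal solutions. The sketch below shows only the shape of such a problem, as a sum of unary and pairwise cost tables over discrete variables, and solves a tiny instance by exhaustive enumeration. It is not how toulbar2 works, since exact CFN solvers avoid full enumeration; the tables `unary` and `binary` are made-up illustrative data.

```python
from itertools import product

def total_cost(assignment, unary, binary):
    """Sum the unary cost of each variable and the cost of each interacting pair."""
    cost = sum(unary[i][value] for i, value in enumerate(assignment))
    cost += sum(
        table[assignment[i]][assignment[j]] for (i, j), table in binary.items()
    )
    return cost

def enumerate_solutions(unary, binary):
    """Return every (cost, assignment) pair, sorted from best to worst."""
    domains = [range(len(costs)) for costs in unary]
    return sorted(
        (total_cost(assignment, unary, binary), assignment)
        for assignment in product(*domains)
    )

# Three variables (think: positions), each with two values (think: rotamers).
unary = [[0, 2], [1, 0], [3, 0]]
# Pairwise cost tables, indexed [value of first variable][value of second].
binary = {
    (0, 1): [[0, 4], [1, 0]],
    (1, 2): [[0, 2], [5, 0]],
}

solutions = enumerate_solutions(unary, binary)
best_cost, best_assignment = solutions[0]
print(best_cost, best_assignment)
print(solutions[:3])  # the optimum followed by the nearest suboptimal solutions
```

The sorted list gives both the optimum and the suboptimal solutions close to it. Exhaustive enumeration grows exponentially with the number of variables, which is why the abstract stresses dedicated exact solvers for realistic design problems.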

16.

Complete combinatorial mutational enumeration of a protein functional site enables sequence-landscape mapping and identifies highly-mutated variants that retain activity.

Colom, Mireia Solà; Vucinic, Jelena; Adolf-Bryfogle, Jared; Bowman, James W; Verel, Sébastien; Moczygemba, Isabelle; Schiex, Thomas; Simoncini, David; Bahl, Christopher D.

Protein Sci ; 33(8): e5109, 2024 Aug.

Article in English | MEDLINE | ID: mdl-38989563

ABSTRACT

Understanding how proteins evolve under selective pressure is a longstanding challenge. The immensity of the search space has limited efforts to systematically evaluate the impact of multiple simultaneous mutations, so mutations have typically been assessed individually. However, epistasis, or the way in which mutations interact, prevents accurate prediction of combinatorial mutations based on measurements of individual mutations. Here, we use artificial intelligence to define the entire functional sequence landscape of a protein binding site in silico, and we call this approach Complete Combinatorial Mutational Enumeration (CCME). By leveraging CCME, we are able to construct a comprehensive map of the evolutionary connectivity within this functional sequence landscape. As a proof of concept, we applied CCME to the ACE2 binding site of the SARS-CoV-2 spike protein receptor binding domain. We selected representative variants from across the functional sequence landscape for testing in the laboratory. We identified variants that retained functionality to bind ACE2 despite changing over 40% of evaluated residue positions, and the variants now escape binding and neutralization by monoclonal antibodies. This work represents a crucial initial stride toward achieving precise predictions of pathogen evolution, opening avenues for proactive mitigation.

Subjects

Angiotensin-Converting Enzyme 2, Mutation, SARS-CoV-2, Coronavirus Spike Glycoprotein, Coronavirus Spike Glycoprotein/genetics, Coronavirus Spike Glycoprotein/chemistry, Coronavirus Spike Glycoprotein/metabolism, Angiotensin-Converting Enzyme 2/metabolism, Angiotensin-Converting Enzyme 2/chemistry, Angiotensin-Converting Enzyme 2/genetics, SARS-CoV-2/genetics, SARS-CoV-2/chemistry, SARS-CoV-2/metabolism, Humans, Binding Sites, COVID-19/virology, COVID-19/genetics, Protein Binding, Artificial Intelligence

17.

Complete Combinatorial Mutational Enumeration of a protein functional site enables sequence-landscape mapping and identifies highly-mutated variants that retain activity.

Colom, Mireia Solà; Vucinic, Jelena; Adolf-Bryfogle, Jared; Bowman, James W; Verel, Sébastien; Moczygemba, Isabelle; Schiex, Thomas; Simoncini, David; Bahl, Christopher D.

Res Sq ; 2023 Sep 11.

Article in English | MEDLINE | ID: mdl-36482980

ABSTRACT

Understanding how proteins evolve under selective pressure is a longstanding challenge. The immensity of the search space has limited efforts to systematically evaluate the impact of multiple simultaneous mutations, so mutations have typically been assessed individually. However, epistasis, or the way in which mutations interact, prevents accurate prediction of combinatorial mutations based on measurements of individual mutations. Here, we use artificial intelligence to define the entire functional sequence landscape of a protein binding site in silico, and we call this approach Complete Combinatorial Mutational Enumeration (CCME). By leveraging CCME, we are able to construct a comprehensive map of the evolutionary connectivity within this functional sequence landscape. As a proof of concept, we applied CCME to the ACE2 binding site of the SARS-CoV-2 spike protein receptor binding domain. We selected representative variants from across the functional sequence landscape for testing in the laboratory. We identified variants that retained functionality to bind ACE2 despite changing over 40% of evaluated residue positions, and the variants now escape binding and neutralization by monoclonal antibodies. This work represents a crucial initial stride towards achieving precise predictions of pathogen evolution, opening avenues for proactive mitigation.

18.

Detecting long tandem duplications in genomic sequences.

Audemard, Eric; Schiex, Thomas; Faraut, Thomas.

BMC Bioinformatics ; 13: 83, 2012 May 08.

Article in English | MEDLINE | ID: mdl-22568762

ABSTRACT

BACKGROUND: Detecting duplication segments within completely sequenced genomes provides valuable information to address genome evolution and in particular the important question of the emergence of novel functions. The usual approach to gene duplication detection, based on all-pairs protein gene comparisons, provides only a restricted view of duplication. RESULTS: In this paper, we introduce ReD Tandem, a software tool using a flow-based chaining algorithm targeted at detecting tandem duplication arrays of moderate to longer length regions, with possibly locally weak similarities, directly at the DNA level. On the A. thaliana genome, using a reference set of tandem duplicated genes built using TAIR, we show that ReD Tandem is able to predict a large fraction of recently duplicated genes (dS < 1) and that it is also able to predict tandem duplications involving non-coding elements such as pseudogenes or RNA genes. CONCLUSIONS: ReD Tandem makes it possible to identify large tandem duplications without any annotation, leading to agnostic identification of tandem duplications. This approach nicely complements the usual protein-gene-based approach, which ignores duplications involving non-coding regions. It is, however, inherently restricted to relatively recent duplications. By recovering otherwise ignored events, ReD Tandem gives a more comprehensive view of existing evolutionary processes and may also help improve existing annotations.

Subjects

Arabidopsis/genetics , Molecular Evolution , Gene Duplication , Software , Algorithms , Plant Genome

19.

Computational Design of Miniprotein Binders.

Bouchiba, Younes; Ruffini, Manon; Schiex, Thomas; Barbe, Sophie.

Methods Mol Biol ; 2405: 361-382, 2022.

Article in English | MEDLINE | ID: mdl-35298822

ABSTRACT

Miniprotein binders are of great interest as a class of drugs that bridges the gap between monoclonal antibodies and small molecule drugs. Like monoclonal antibodies, they can be designed to bind to therapeutic targets with high affinity, but they are more stable and easier to produce and to administer. In this chapter, we present a generic structure-based computational approach for miniprotein inhibitor design. Specifically, we describe step by step the implementation of the approach for the design of miniprotein binders against the SARS-CoV-2 coronavirus, using available structural data on the SARS-CoV-2 spike receptor binding domain (RBD) in interaction with its native target, the human receptor ACE2. Because structural data are increasingly accessible for many protein-protein interaction systems, this method might be applied to the design of miniprotein binders against numerous therapeutic targets. The computational pipeline exploits provable and deterministic artificial intelligence-based protein design methods, with some recent additions in terms of binding energy estimation, multistate design and diverse library generation.

Subjects

Computer Simulation , SARS-CoV-2 , Coronavirus Spike Glycoprotein , Artificial Intelligence , Humans , Protein Binding , Protein Domains , SARS-CoV-2/chemistry , Coronavirus Spike Glycoprotein/chemistry

20.

Molecular flexibility in computational protein design: an algorithmic perspective.

Bouchiba, Younes; Cortés, Juan; Schiex, Thomas; Barbe, Sophie.

Protein Eng Des Sel ; 34, 2021 02 15.

Article in English | MEDLINE | ID: mdl-33959778

ABSTRACT

Computational protein design (CPD) is a powerful technique for engineering new proteins, with both great fundamental implications and diverse practical interests. However, the approximations usually made for computational efficiency, using a single fixed backbone and a discrete set of side chain rotamers, tend to produce rigid and hyper-stable folds that may lack functionality. These approximations contrast with the demonstrated importance of molecular flexibility and motions in a wide range of protein functions. The integration of backbone flexibility and multiple conformational states in CPD, in order to relieve the inaccuracies resulting from these simplifications and to improve design reliability, is attracting increased attention. However, the greatly increased search space that needs to be explored in these extensions defines extremely challenging computational problems. In this review, we outline the principles of CPD and discuss recent efforts in algorithmic developments for incorporating molecular flexibility in the design process.

Subjects

Algorithms , Computational Biology , Molecular Models , Protein Conformation , Reproducibility of Results
