1.
Protein Sci ; 33(8): e5109, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38989563

ABSTRACT

Understanding how proteins evolve under selective pressure is a longstanding challenge. The immensity of the search space has limited efforts to systematically evaluate the impact of multiple simultaneous mutations, so mutations have typically been assessed individually. However, epistasis, or the way in which mutations interact, prevents accurate prediction of combinatorial mutations based on measurements of individual mutations. Here, we use artificial intelligence to define the entire functional sequence landscape of a protein binding site in silico, and we call this approach Complete Combinatorial Mutational Enumeration (CCME). By leveraging CCME, we are able to construct a comprehensive map of the evolutionary connectivity within this functional sequence landscape. As a proof of concept, we applied CCME to the ACE2 binding site of the SARS-CoV-2 spike protein receptor binding domain. We selected representative variants from across the functional sequence landscape for testing in the laboratory. We identified variants that retained functionality to bind ACE2 despite changing over 40% of evaluated residue positions, and the variants now escape binding and neutralization by monoclonal antibodies. This work represents a crucial initial stride toward achieving precise predictions of pathogen evolution, opening avenues for proactive mitigation.


Subject(s)
Angiotensin-Converting Enzyme 2; Mutation; SARS-CoV-2; Spike Glycoprotein, Coronavirus; Spike Glycoprotein, Coronavirus/genetics; Spike Glycoprotein, Coronavirus/chemistry; Spike Glycoprotein, Coronavirus/metabolism; Angiotensin-Converting Enzyme 2/metabolism; Angiotensin-Converting Enzyme 2/chemistry; Angiotensin-Converting Enzyme 2/genetics; SARS-CoV-2/genetics; SARS-CoV-2/chemistry; SARS-CoV-2/metabolism; Humans; Binding Sites; COVID-19/virology; COVID-19/genetics; Protein Binding; Artificial Intelligence
2.
Res Sq ; 2023 Sep 11.
Article in English | MEDLINE | ID: mdl-36482980

ABSTRACT

Understanding how proteins evolve under selective pressure is a longstanding challenge. The immensity of the search space has limited efforts to systematically evaluate the impact of multiple simultaneous mutations, so mutations have typically been assessed individually. However, epistasis, or the way in which mutations interact, prevents accurate prediction of combinatorial mutations based on measurements of individual mutations. Here, we use artificial intelligence to define the entire functional sequence landscape of a protein binding site in silico, and we call this approach Complete Combinatorial Mutational Enumeration (CCME). By leveraging CCME, we are able to construct a comprehensive map of the evolutionary connectivity within this functional sequence landscape. As a proof of concept, we applied CCME to the ACE2 binding site of the SARS-CoV-2 spike protein receptor binding domain. We selected representative variants from across the functional sequence landscape for testing in the laboratory. We identified variants that retained functionality to bind ACE2 despite changing over 40% of evaluated residue positions, and the variants now escape binding and neutralization by monoclonal antibodies. This work represents a crucial initial stride towards achieving precise predictions of pathogen evolution, opening avenues for proactive mitigation.

3.
PLoS Pathog ; 18(4): e1010425, 2022 04.
Article in English | MEDLINE | ID: mdl-35381053

ABSTRACT

Although Salmonella Typhimurium (STM) and Salmonella Paratyphi A (SPA) belong to the same phylogenetic species, share large portions of their genome and express many common virulence factors, they differ vastly in their host specificity, the immune response they elicit, and the clinical manifestations they cause. In this work, we compared their intracellular transcriptomic architecture and cellular phenotypes during human epithelial cell infection. While transcription induction of many metal transport systems, purines, biotin, PhoPQ and SPI-2 regulons was similar in both intracellular SPA and STM, we identified 234 differentially expressed genes that showed distinct expression patterns in intracellular SPA vs. STM. Surprisingly, clear expression differences were found in SPI-1, motility and chemotaxis, and carbon (mainly citrate, galactonate and ethanolamine) utilization pathways, indicating that these pathways are regulated differently during their intracellular phase. Concurring, on the cellular level, we show that while the majority of STM are non-motile and reside within Salmonella-Containing Vacuoles (SCV), a significant proportion of intracellular SPA cells are motile and compartmentalized in the cytosol. Moreover, we found that the elevated expression of SPI-1 and motility genes by intracellular SPA results in increased invasiveness of SPA, following exit from host cells. These findings demonstrate unexpected flagellum-dependent intracellular motility of a typhoidal Salmonella serovar and intriguing differences in intracellular localization between typhoidal and non-typhoidal salmonellae. We propose that these differences facilitate new cycles of host cell infection by SPA and may contribute to the ability of SPA to disseminate beyond the intestinal lamina propria of the human host during enteric fever.


Subject(s)
Chemotaxis; Salmonella paratyphi A; Bacterial Proteins/metabolism; Carbon/metabolism; Flagella/genetics; Flagella/metabolism; Intercellular Signaling Peptides and Proteins; Phylogeny; Salmonella paratyphi A/metabolism; Salmonella typhimurium
4.
Methods Mol Biol ; 2405: 361-382, 2022.
Article in English | MEDLINE | ID: mdl-35298822

ABSTRACT

Miniprotein binders are of great interest as a class of drugs that bridges the gap between monoclonal antibodies and small molecule drugs. Like monoclonal antibodies, they can be designed to bind to therapeutic targets with high affinity, but they are more stable and easier to produce and to administer. In this chapter, we present a generic structure-based computational approach for miniprotein inhibitor design. Specifically, we describe step-by-step the implementation of the approach for the design of miniprotein binders against the SARS-CoV-2 coronavirus, using available structural data on the SARS-CoV-2 spike receptor binding domain (RBD) in interaction with its native target, the human receptor ACE2. As structural data become increasingly accessible for many protein-protein interaction systems, this method could be applied to the design of miniprotein binders against numerous therapeutic targets. The computational pipeline exploits provable and deterministic artificial intelligence-based protein design methods, with some recent additions in terms of binding energy estimation, multistate design and diverse library generation.


Subject(s)
Computer Simulation; SARS-CoV-2; Spike Glycoprotein, Coronavirus; Artificial Intelligence; Humans; Protein Binding; Protein Domains; SARS-CoV-2/chemistry; Spike Glycoprotein, Coronavirus/chemistry
5.
Int J Mol Sci ; 22(21), 2021 Oct 29.
Article in English | MEDLINE | ID: mdl-34769173

ABSTRACT

Computational Protein Design (CPD) has produced impressive results for engineering new proteins, resulting in a wide variety of applications. In the past few years, various efforts have aimed at replacing or improving existing design methods using Deep Learning technology to leverage the amount of publicly available protein data. Deep Learning (DL) is a very powerful tool to extract patterns from raw data, provided that data are formatted as mathematical objects and the architecture processing them is well suited to the targeted problem. In the case of protein data, specific representations are needed for both the amino acid sequence and the protein structure in order to capture respectively 1D and 3D information. As no consensus has been reached about the most suitable representations, this review describes the representations used so far, discusses their strengths and weaknesses, and details their associated DL architecture for design and related tasks.


Subject(s)
Computational Biology; Deep Learning; Protein Engineering; Proteins; Protein Domains; Proteins/chemistry; Proteins/genetics
6.
J Am Chem Soc ; 143(39): 15998-16006, 2021 10 06.
Article in English | MEDLINE | ID: mdl-34559526

ABSTRACT

Extant complex proteins must have evolved from ancient short and simple ancestors. The double-ψ β-barrel (DPBB) is one of the oldest protein folds and is conserved in various fundamental enzymes, such as the core domain of RNA polymerase. Here, by reverse engineering a modern DPBB domain, we reconstructed its plausible evolutionary pathway, which started with "interlacing homodimerization" of a half-size peptide, followed by gene duplication and fusion. Furthermore, by simplifying the amino acid repertoire of the peptide, we successfully created the DPBB fold with only seven amino acid types (Ala, Asp, Glu, Gly, Lys, Arg, and Val), which can be coded by only GNN and ARR (R = A or G) codons in the modern translation system. Thus, the DPBB fold could have been materialized by the early translation system and genetic code.
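
The codon claim in this abstract can be checked directly against the standard genetic code. The sketch below is an illustration added here, not code from the paper; it writes codons with DNA letters (T in place of U), and the helper name `encoded_by` is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, in the order TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def encoded_by(first, second, third):
    """Amino acids encoded by all codons built from the allowed bases at each position."""
    return {CODON_TABLE[a + b + c] for a, b, c in product(first, second, third)}


gnn = encoded_by("G", BASES, BASES)  # GNN: N = any base
arr = encoded_by("A", "AG", "AG")    # ARR: R = A or G (purine)
restricted_alphabet = gnn | arr

print(sorted(restricted_alphabet))
```

GNN codons give Val, Ala, Asp, Glu and Gly, and ARR codons give Lys and Arg, which together are the seven amino acid types listed in the abstract.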


Subject(s)
Amino Acids/chemistry; Amino Acids/classification; DNA-Directed RNA Polymerases/chemistry; DNA-Directed RNA Polymerases/metabolism; Amino Acid Sequence; Models, Molecular; Protein Conformation; Protein Domains; Protein Folding
7.
Proteins ; 89(11): 1522-1529, 2021 11.
Article in English | MEDLINE | ID: mdl-34228826

ABSTRACT

Structure-based computational protein design (CPD) refers to the problem of finding a sequence of amino acids which folds into a specific desired protein structure, and possibly fulfills some targeted biochemical properties. Recent studies point out the particularly rugged CPD energy landscape, suggesting that local search optimization methods should be designed and tuned to easily escape local minima attraction basins. In this article, we analyze the performance and search dynamics of an iterated local search (ILS) algorithm enhanced with partition crossover. Our algorithm, PILS, quickly finds local minima and escapes their basins of attraction by solution perturbation. Additionally, the partition crossover operator exploits the structure of the residue interaction graph in order to efficiently mix solutions and find new unexplored basins. Our results on a benchmark of 30 proteins of various topology and size show that PILS consistently finds lower energy solutions compared to Rosetta fixbb and a classic ILS, and that the corresponding sequences are mostly closer to the native.
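
To make the search loop concrete, here is a minimal sketch of a plain iterated local search (descend to a local minimum, perturb, descend again, keep the better solution) on a toy pairwise energy. It is not PILS: the partition crossover operator is omitted, and the random energy tables, function names and parameters are all assumptions made for illustration.

```python
import random


def make_toy_energy(n_pos, n_states, seed=0):
    """Build a random unary + pairwise energy over n_pos discrete variables (toy data)."""
    rng = random.Random(seed)
    unary = [[rng.uniform(-1, 1) for _ in range(n_states)] for _ in range(n_pos)]
    pair = {
        (i, j): [[rng.uniform(-1, 1) for _ in range(n_states)] for _ in range(n_states)]
        for i in range(n_pos)
        for j in range(i + 1, n_pos)
    }

    def energy(x):
        e = sum(unary[i][x[i]] for i in range(n_pos))
        e += sum(tab[x[i]][x[j]] for (i, j), tab in pair.items())
        return e

    return energy


def local_search(x, energy, n_states):
    """Greedy descent: accept improving single-position changes until none remains."""
    x = list(x)
    best = energy(x)
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            for s in range(n_states):
                if s == x[i]:
                    continue
                old = x[i]
                x[i] = s
                e = energy(x)
                if e < best - 1e-12:
                    best = e
                    improved = True
                else:
                    x[i] = old  # revert a non-improving move
    return x, best


def iterated_local_search(energy, n_pos, n_states, n_iter=50, kick=2, seed=1):
    """Perturb `kick` random positions, re-descend, and keep the result if it is better."""
    rng = random.Random(seed)
    x = [rng.randrange(n_states) for _ in range(n_pos)]
    x, e = local_search(x, energy, n_states)
    for _ in range(n_iter):
        y = list(x)
        for i in rng.sample(range(n_pos), kick):
            y[i] = rng.randrange(n_states)
        y, e_y = local_search(y, energy, n_states)
        if e_y < e:
            x, e = y, e_y
    return x, e


toy_energy = make_toy_energy(6, 3)
best_x, best_e = iterated_local_search(toy_energy, 6, 3)
print(best_x, round(best_e, 3))
```

The perturbation step is what lets the search leave a basin of attraction; a crossover operator such as the one described in the abstract would add a second way of combining local minima.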


Subject(s)
Algorithms; Amino Acids/chemistry; Protein Engineering/methods; Proteins/chemistry; Software; Amino Acid Sequence; Benchmarking; Computational Biology; Protein Conformation; Protein Folding; Thermodynamics
8.
Int J Mol Sci ; 22(11), 2021 May 31.
Article in English | MEDLINE | ID: mdl-34073139

ABSTRACT

With the growing need for renewable sources of energy, interest in enzymes capable of biomass degradation has been increasing. In this paper, we consider two different xylanases from the GH-11 family: the particularly active GH-11 xylanase from Neocallimastix patriciarum, NpXyn11A, and the hyper-thermostable mutant of the environmentally isolated GH-11 xylanase, EvXyn11TS. Our aim is to identify the molecular determinants underlying the enhanced capacities of these two enzymes to ultimately graft the abilities of one on the other. Molecular dynamics simulations of the respective free-enzymes and enzyme-xylohexaose complexes were carried out at temperatures of 300, 340, and 500 K. An in-depth analysis of these MD simulations showed how differences in dynamics influence the activity and stability of these two enzymes and allowed us to study and understand in greater depth the molecular and structural basis of these two systems. In light of the results presented in this paper, the thumb region and the larger substrate binding cleft of NpXyn11A seem to play a major role in the activity of this enzyme. Its lower thermal stability may instead be caused by the higher flexibility of certain regions located further from the active site. Regions such as the N-terminus, the loops located in the fingers region, the palm loop, and the helix loop seem to be less stable than in the hyper-thermostable EvXyn11TS. By identifying molecular regions that are critical for the stability of these enzymes, this study allowed us to identify promising targets for engineering GH-11 xylanases. Finally, we identify NpXyn11A as the ideal host for grafting the thermostabilizing traits of EvXyn11TS.


Subject(s)
Endo-1,4-beta Xylanases/chemistry; Neocallimastix/enzymology; Amino Acid Sequence; Catalytic Domain; Enzyme Stability; Kinetics; Molecular Dynamics Simulation; Temperature
9.
Protein Eng Des Sel ; 34, 2021 02 15.
Article in English | MEDLINE | ID: mdl-33959778

ABSTRACT

Computational protein design (CPD) is a powerful technique for engineering new proteins, with both great fundamental implications and diverse practical interests. However, the approximations usually made for computational efficiency, using a single fixed backbone and a discrete set of side chain rotamers, tend to produce rigid and hyper-stable folds that may lack functionality. These approximations contrast with the demonstrated importance of molecular flexibility and motions in a wide range of protein functions. The integration of backbone flexibility and multiple conformational states in CPD, in order to relieve the inaccuracies resulting from these simplifications and to improve design reliability, is attracting increased attention. However, the greatly increased search space that needs to be explored in these extensions defines extremely challenging computational problems. In this review, we outline the principles of CPD and discuss recent efforts in algorithmic developments for incorporating molecular flexibility in the design process.


Subject(s)
Algorithms; Computational Biology; Models, Molecular; Protein Conformation; Reproducibility of Results
10.
Bioinformatics ; 36(1): 122-130, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31199465

ABSTRACT

MOTIVATION: Structure-based computational protein design (CPD) plays a critical role in advancing the field of protein engineering. Using an all-atom energy function, CPD tries to identify amino acid sequences that fold into a target structure and ultimately perform a desired function. The usual approach considers a single rigid backbone as a target, which ignores backbone flexibility. Multistate design (MSD) instead allows several backbone states to be considered simultaneously, defining challenging computational problems. RESULTS: We introduce efficient reductions of positive MSD problems to Cost Function Networks with two different fitness definitions and implement them in the Pompd (Positive Multistate Protein design) software. Pompd is able to identify guaranteed optimal sequences of positive multistate full protein redesign problems and exhaustively enumerate suboptimal sequences close to the MSD optimum. Applied to nuclear magnetic resonance and back-rubbed X-ray structures, we observe that the average energy fitness provides the best sequence recovery. Our method outperforms state-of-the-art guaranteed computational design approaches by orders of magnitude and can solve MSD problems with sizes previously unreachable with guaranteed algorithms. AVAILABILITY AND IMPLEMENTATION: https://forgemia.inra.fr/thomas.schiex/pompd as documented Open Source. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
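
As a toy illustration of the average energy fitness mentioned above, the sketch below scores each candidate sequence by its mean energy over several backbone states and compares the result with a single-state choice. This is not Pompd's reduction to Cost Function Networks; the sequences, the energy values and the function names are invented for the example.

```python
def average_fitness(sequence, state_energies):
    """Mean energy of a sequence over all backbone states (lower is better)."""
    return sum(energies[sequence] for energies in state_energies) / len(state_energies)


def best_multistate_sequence(sequences, state_energies):
    """Sequence with the lowest average energy across the given states."""
    return min(sequences, key=lambda s: average_fitness(s, state_energies))


# Hypothetical precomputed energies (kcal/mol) of three sequences on two backbone states.
state_a = {"AV": -3.0, "GV": -5.0, "AL": -4.0}
state_b = {"AV": -3.0, "GV": 1.0, "AL": -3.0}
candidates = ["AV", "GV", "AL"]

single_state_choice = min(candidates, key=state_a.get)
multistate_choice = best_multistate_sequence(candidates, [state_a, state_b])
print(single_state_choice, multistate_choice)
```

In this toy data the single-state optimum is excellent on one backbone but poor on the other, so averaging over states selects a different sequence.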


Subject(s)
Protein Engineering; Proteins; Algorithms; Amino Acid Sequence; Computational Biology; Protein Conformation; Protein Engineering/methods; Proteins/chemistry; Software
11.
Methods Mol Biol ; 1962: 97-120, 2019.
Article in English | MEDLINE | ID: mdl-31020556

ABSTRACT

EuGene is an integrative gene finder applicable to both prokaryotic and eukaryotic genomes. EuGene annotated its first genome in 1999. Starting from genomic DNA sequences representing a complete genome, EuGene is able to predict the major transcript units in the genome from a variety of sources of information: statistical information, similarities with known transcripts and proteins, but also any GFF3 structured information supporting the presence or absence of specific types of elements. EuGene has been used to find genes in the plants Arabidopsis thaliana, Medicago truncatula, and Theobroma cacao; tomato, sunflower, and Rosa genomes; and in the nematode Meloidogyne incognita genome, among many others. The large fraction of plants in this list probably influenced EuGene development, especially in its capacities to withstand a genome with a large number of repeated regions and transposable elements. Depending on the sources of information used for prediction, EuGene can be considered as purely ab initio, purely similarity based, or hybrid. With the general availability of NGS-transcribed sequence data in genome projects, EuGene adopts a default hybrid behavior that strongly relies on similarity information. Initially targeted at eukaryotic genomes, EuGene has also been extended to offer integrative gene prediction for bacteria, allowing for richer and more robust predictions than either purely statistical or homology-based prokaryotic gene finders. This text has been written as a practical guide that will give you the capacity to train and execute EuGene on your favorite eukaryotic genome. As the prokaryotic case is simpler and has already been described, only the main differences with the eukaryotic version are reported.


Subject(s)
Computational Biology/methods; Eukaryotic Cells; Prokaryotic Cells; Software; Arabidopsis/genetics; Databases, Genetic; Internet; Machine Learning; Models, Statistical; Molecular Sequence Annotation; Plants/genetics; Proteome/genetics; RNA Splice Sites; RNA, Untranslated; Transcriptome; Web Browser
12.
IUCrJ ; 6(Pt 1): 46-55, 2019 Jan 01.
Article in English | MEDLINE | ID: mdl-30713702

ABSTRACT

β-Propeller proteins form one of the largest families of protein structures, with a pseudo-symmetrical fold made up of subdomains called blades. They are not only abundant but are also involved in a wide variety of cellular processes, often by acting as a platform for the assembly of protein complexes. WD40 proteins are a subfamily of propeller proteins with no intrinsic enzymatic activity, but their stable, modular architecture and versatile surface have allowed evolution to adapt them to many vital roles. By computationally reverse-engineering the duplication, fusion and diversification events in the evolutionary history of a WD40 protein, a perfectly symmetrical homologue called Tako8 was made. If two or four blades of Tako8 are expressed as single polypeptides, they do not self-assemble to complete the eight-bladed architecture, which may be owing to the closely spaced negative charges inside the ring. A different computational approach was employed to redesign Tako8 to create Ika8, a fourfold-symmetrical protein in which neighbouring blades carry compensating charges. Ika2 and Ika4, carrying two or four blades per subunit, respectively, were found to assemble spontaneously into a complete eight-bladed ring in solution. These artificial eight-bladed rings may find applications in bionanotechnology and as models to study the folding and evolution of WD40 proteins.

13.
J Chem Inf Model ; 59(1): 127-136, 2019 01 28.
Article in English | MEDLINE | ID: mdl-30380857

ABSTRACT

Computational protein design (CPD) aims to predict amino acid sequences that fold to specific structures and perform desired functions. CPD depends on a rotamer library, an energy function, and an algorithm to search the sequence/conformation space. Variable neighborhood search (VNS) with cost function networks is a powerful framework that can provide tight upper bounds on the global minimum energy. We propose a new CPD heuristic based on VNS in which a subset of the solution space (a "neighborhood") is explored, whose size is gradually increased with a dedicated probabilistic heuristic. The algorithm was tested on 99 protein designs with fixed backbones involving nine proteins from the SH2, SH3, and PDZ families. The number of mutating positions was 20, 30, or all of the amino acids, while the rest of the protein explored side-chain rotamers. VNS was more successful than Monte Carlo (MC), replica-exchange MC, and a heuristic steepest-descent energy minimization, providing solutions with equal or lower best energies in most cases. For complete protein redesign, it gave solutions that were 2.5 to 11.2 kcal/mol lower in energy than those obtained with the other approaches. VNS is implemented in the toulbar2 software. It could be very helpful for large and/or complex design problems.
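
The idea of a neighborhood whose size grows when the search stalls can be sketched as follows. This is a simplified illustration, not the toulbar2 implementation: brute-force enumeration of the freed positions stands in for the cost function network search, the neighborhood is drawn uniformly at random instead of with the dedicated probabilistic heuristic, and the toy energy function is invented.

```python
import itertools
import random


def toy_energy(x):
    """Hypothetical energy: position i prefers state i % 3; equal neighbours clash."""
    unary = sum(0.0 if s == i % 3 else 1.0 for i, s in enumerate(x))
    clash = sum(1.5 for a, b in zip(x, x[1:]) if a == b)
    return unary + clash


def vns(energy, n_pos, n_states, k_min=1, k_max=4, rounds=200, seed=0):
    """Variable neighborhood search: free k positions, re-optimise them exactly,
    reset k after an improvement and enlarge it otherwise."""
    rng = random.Random(seed)
    x = [rng.randrange(n_states) for _ in range(n_pos)]
    best = energy(x)
    k = k_min
    for _ in range(rounds):
        free = rng.sample(range(n_pos), k)
        cand_x, cand_best = x, best
        # Exhaustively try every assignment of the k freed positions, others fixed.
        for states in itertools.product(range(n_states), repeat=k):
            y = list(x)
            for pos, s in zip(free, states):
                y[pos] = s
            e = energy(y)
            if e < cand_best:
                cand_x, cand_best = y, e
        if cand_best < best:
            x, best, k = cand_x, cand_best, k_min  # improvement: shrink neighborhood
        else:
            k = min(k + 1, k_max)                  # stalled: enlarge neighborhood
    return x, best


solution, solution_energy = vns(toy_energy, n_pos=5, n_states=3, k_max=5)
print(solution, solution_energy)
```

Because `k_max` equals the number of positions in this usage example, the largest neighborhood is the whole problem, so the sketch is guaranteed to end at the global minimum of the toy energy; on realistic design problems the neighborhood would stay much smaller than the full protein.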


Subject(s)
Computational Biology; Protein Engineering; Proteins/chemistry; Algorithms; Models, Molecular; Monte Carlo Method; Protein Conformation; Software
14.
Bioinformatics ; 35(14): 2418-2426, 2019 07 15.
Article in English | MEDLINE | ID: mdl-30496341

ABSTRACT

MOTIVATION: Structure-based Computational Protein design (CPD) plays a critical role in advancing the field of protein engineering. Using an all-atom energy function, CPD tries to identify amino acid sequences that fold into a target structure and ultimately perform a desired function. Energy functions remain, however, imperfect, and injecting relevant information from known structures in the design process should lead to improved designs. RESULTS: We introduce Shades, a data-driven CPD method that exploits local structural environments in known protein structures together with energy to guide sequence design, while sampling side-chain and backbone conformations to accommodate mutations. Shades (Structural Homology Algorithm for protein DESign) is based on customized libraries of non-contiguous in-contact amino acid residue motifs. We have tested Shades on a public benchmark of 40 proteins selected from different protein families. When excluding homologous proteins, Shades achieved a protein sequence recovery of 30% and a protein sequence similarity of 46% on average, compared with the PFAM protein family of the target protein. When homologous structures were added, the wild-type sequence recovery rate reached 93%. AVAILABILITY AND IMPLEMENTATION: Shades source code is available at https://bitbucket.org/satsumaimo/shades as a patch for Rosetta 3.8 with a curated protein structure database and ITEM library creation software. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Software; Algorithms; Amino Acid Sequence; Computational Biology; Databases, Protein; Protein Conformation; Proteins
15.
Bioinformatics ; 34(15): 2581-2589, 2018 08 01.
Article in English | MEDLINE | ID: mdl-29474517

ABSTRACT

Motivation: Accurate and economic methods to predict change in protein binding free energy upon mutation are imperative to accelerate the design of proteins for a wide range of applications. Free energy is defined by enthalpic and entropic contributions. Following the recent progress of Artificial Intelligence-based algorithms for guaranteed NP-hard energy optimization and partition function computation, it becomes possible to quickly compute minimum energy conformations and to reliably estimate the entropic contribution of side-chains in the change of free energy of large protein interfaces. Results: Using guaranteed Cost Function Network algorithms, Rosetta energy functions and Dunbrack's rotamer library, we developed and assessed EasyE and JayZ, two methods for binding affinity estimation that ignore or include conformational entropic contributions on a large benchmark of binding affinity experimental measures. While both approaches outperform most established tools, we observe that side-chain conformational entropy brings little or no improvement on most systems but becomes crucial in some rare cases. Availability and implementation: as open-source Python/C++ code at sourcesup.renater.fr/projects/easy-jayz. Supplementary information: Supplementary data are available at Bioinformatics online.
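
The difference between scoring a state by its minimum energy and by a partition function can be shown on a handful of conformations. The sketch below uses the textbook relation F = -kT ln Z with Z = Σ exp(-E_i / kT); it is not the EasyE or JayZ code, and the temperature and energy values are assumptions chosen for illustration.

```python
import math

KT = 0.593  # k_B * T in kcal/mol at roughly 298 K (assumed temperature)


def min_energy(energies):
    """Score that ignores conformational entropy: the single best conformation."""
    return min(energies)


def free_energy(energies, kt=KT):
    """-kT ln Z over the listed conformations, shifted by the minimum for numerical stability."""
    e_min = min(energies)
    z_shifted = sum(math.exp(-(e - e_min) / kt) for e in energies)
    return e_min - kt * math.log(z_shifted)


# Hypothetical side-chain conformation energies (kcal/mol) for two states.
rigid_state = [-5.0, 3.0, 4.0]         # one dominant conformation
flexible_state = [-5.0, -4.9, -4.8]    # several near-degenerate conformations

for name, energies in [("rigid", rigid_state), ("flexible", flexible_state)]:
    print(name, min_energy(energies), round(free_energy(energies), 3))
```

Both states have the same minimum energy, but the flexible one has a lower free energy because several conformations are thermally accessible; that gap is the conformational entropy term the abstract refers to.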


Subject(s)
Artificial Intelligence; Computational Biology/methods; Mutation; Protein Binding; Proteins/chemistry; Thermodynamics; Animals; Bacteria/genetics; Bacteria/metabolism; Entropy; Humans; Protein Conformation; Proteins/genetics; Proteins/metabolism; Software
16.
PLoS Genet ; 13(6): e1006777, 2017 06.
Article in English | MEDLINE | ID: mdl-28594822

ABSTRACT

Root-knot nematodes (genus Meloidogyne) exhibit a diversity of reproductive modes ranging from obligatory sexual to fully asexual reproduction. Intriguingly, the most widespread and devastating species to global agriculture are those that reproduce asexually, without meiosis. To disentangle this surprising parasitic success despite the absence of sex and genetic exchanges, we have sequenced and assembled the genomes of three obligatory ameiotic and asexual Meloidogyne. We have compared them to those of relatives able to perform meiosis and sexual reproduction. We show that the genomes of ameiotic asexual Meloidogyne are large, polyploid and made of duplicated regions with a high within-species average nucleotide divergence of ~8%. Phylogenomic analysis of the genes present in these duplicated regions suggests that they originated from multiple hybridization events and are thus homoeologs. We found that up to 22% of homoeologous gene pairs were under positive selection and these genes covered a wide spectrum of predicted functional categories. To biologically assess functional divergence, we compared expression patterns of homoeologous gene pairs across developmental life stages using an RNAseq approach in the most economically important asexually-reproducing nematode. We showed that >60% of homoeologous gene pairs display diverged expression patterns. These results suggest a substantial functional impact of the genome structure. Contrasting with high within-species nuclear genome divergence, mitochondrial genome divergence between the three ameiotic asexuals was very low, signifying that these putative hybrids share a recent common maternal ancestor. Transposable elements (TE) cover a ~1.7 times higher proportion of the genomes of the ameiotic asexual Meloidogyne compared to the sexual relative and might also participate in their plasticity. 
The intriguing parasitic success of asexually-reproducing Meloidogyne species could be partly explained by their TE-rich composite genomes, resulting from allopolyploidization events, and promoting plasticity and functional divergence between gene copies in the absence of sex and meiosis.


Subject(s)
Genetic Variation; Genome, Helminth; Hybridization, Genetic; Polyploidy; Reproduction, Asexual; Tylenchoidea/genetics; Animals; DNA Transposable Elements; Genome, Mitochondrial; Polymorphism, Genetic; Selection, Genetic
17.
Nature ; 546(7656): 148-152, 2017 06 01.
Article in English | MEDLINE | ID: mdl-28538728

ABSTRACT

The domesticated sunflower, Helianthus annuus L., is a global oil crop that has promise for climate change adaptation, because it can maintain stable yields across a wide variety of environmental conditions, including drought. Even greater resilience is achievable through the mining of resistance alleles from compatible wild sunflower relatives, including numerous extremophile species. Here we report a high-quality reference for the sunflower genome (3.6 gigabases), together with extensive transcriptomic data from vegetative and floral organs. The genome mostly consists of highly similar, related sequences and required single-molecule real-time sequencing technologies for successful assembly. Genome analyses enabled the reconstruction of the evolutionary history of the Asterids, further establishing the existence of a whole-genome triplication at the base of the Asterids II clade and a sunflower-specific whole-genome duplication around 29 million years ago. An integrative approach combining quantitative genetics, expression and diversity data permitted development of comprehensive gene networks for two major breeding traits, flowering time and oil metabolism, and revealed new candidate genes in these networks. We found that the genomic architecture of flowering time has been shaped by the most recent whole-genome duplication, which suggests that ancient paralogues can remain in the same regulatory networks for dozens of millions of years. This genome represents a cornerstone for future research programs aiming to exploit genetic diversity to improve biotic and abiotic stress resistance and oil production, while also considering agricultural constraints and human nutritional needs.


Subject(s)
Evolution, Molecular; Flowers/genetics; Flowers/physiology; Genome, Plant/genetics; Helianthus/genetics; Helianthus/metabolism; Plant Oils/metabolism; Acclimatization/genetics; Gene Duplication/genetics; Gene Expression Regulation, Plant; Genetic Variation; Genomics; Helianthus/classification; Sequence Analysis, DNA; Stress, Physiological/genetics; Sunflower Oil; Transcriptome/genetics
18.
Proteins ; 85(5): 852-858, 2017 05.
Article in English | MEDLINE | ID: mdl-28066917

ABSTRACT

Conformational search space exploration remains a major bottleneck for protein structure prediction methods. Population-based meta-heuristics typically make it possible to control the search dynamics and to tune the balance between local energy minimization and search space exploration. EdaFold is a fragment-based approach that can guide search by periodically updating the probability distribution over the fragment libraries used during model assembly. We implement the EdaFold algorithm as a Rosetta protocol and provide two different probability update policies: a cluster-based variation (EdaRosec) and an energy-based one (EdaRoseen). We analyze the search dynamics of our new Rosetta protocols and show that EdaRosec is able to provide predictions with lower Cα RMSD to the native structure than EdaRoseen and the Rosetta AbInitio Relax protocol. Our software is freely available as a C++ patch for the Rosetta suite and can be downloaded from http://www.riken.jp/zhangiru/software/. Our protocols can easily be extended in order to create alternative probability update policies and generate new search dynamics. Proteins 2017; 85:852-858. © 2016 Wiley Periodicals, Inc.


Subject(s)
Algorithms; Proteins/chemistry; Proteomics/statistics & numerical data; Software; Benchmarking; Cluster Analysis; Internet; Protein Conformation; Proteomics/methods; Thermodynamics
19.
Methods Mol Biol ; 1529: 107-123, 2017.
Article in English | MEDLINE | ID: mdl-27914047

ABSTRACT

One main challenge in Computational Protein Design (CPD) lies in the exploration of the amino-acid sequence space, while considering, to some extent, side chain flexibility. The exorbitant size of the search space calls for the development of efficient exact deterministic search methods enabling identification of low-energy sequence-conformation models, corresponding either to the global minimum energy conformation (GMEC) or an ensemble of guaranteed near-optimal solutions. In contrast to stochastic local search methods that are not guaranteed to find the GMEC, exact deterministic approaches always identify the GMEC and prove its optimality in finite but exponential worst-case time. After a brief overview of these two classes of methods, we discuss the grounds and merits of four deterministic methods that have been applied to solve CPD problems. These approaches are based either on the Dead-End-Elimination theorem combined with A* algorithm (DEE/A*), on Cost Function Networks algorithms (CFN), on Integer Linear Programming solvers (ILP) or on Markov Random Fields solvers (MRF). The way two of these methods (DEE/A* and CFN) can be used in practice to identify low-energy sequence-conformation models starting from a pairwise decomposed energy matrix is detailed in this review.
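
To show what a pairwise decomposed energy matrix looks like and what the GMEC is, the sketch below finds the minimum by exhaustive enumeration on a three-position toy problem. Enumeration is used here only because the example is tiny; it is none of the four methods discussed in the chapter, and the energy values and function names are invented.

```python
from itertools import product


def total_energy(assignment, unary, pairwise):
    """E(x) = sum_i E_i(x_i) + sum_{i<j} E_ij(x_i, x_j) for one rotamer assignment."""
    e = sum(unary[i][r] for i, r in enumerate(assignment))
    e += sum(table[assignment[i]][assignment[j]] for (i, j), table in pairwise.items())
    return e


def brute_force_gmec(unary, pairwise):
    """Enumerate every assignment and return the one with the lowest energy."""
    domains = [range(len(u)) for u in unary]
    return min(product(*domains), key=lambda a: total_energy(a, unary, pairwise))


# Toy energy matrix: 3 positions with 2 rotamers each (hypothetical values).
unary = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
pairwise = {
    (0, 1): [[2.0, 0.0], [0.0, 0.0]],  # rotamer 0 at position 0 clashes with rotamer 0 at position 1
    (1, 2): [[0.0, 0.0], [0.0, 3.0]],  # rotamer 1 at position 1 clashes with rotamer 1 at position 2
}

gmec = brute_force_gmec(unary, pairwise)
gmec_energy = total_energy(gmec, unary, pairwise)
print(gmec, gmec_energy)
```

The number of assignments grows as the product of the rotamer counts at each position, which is why exact methods rely on pruning or bounding instead of enumeration on real design problems.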


Subject(s)
Computational Biology/methods; Protein Engineering/methods; Proteins; Algorithms; Computer Simulation; Models, Molecular; Protein Conformation; Proteins/chemistry; Proteins/genetics; Proteins/metabolism; Software; Structure-Activity Relationship
20.
J Comput Chem ; 37(12): 1048-58, 2016 May 05.
Article in English | MEDLINE | ID: mdl-26833706

ABSTRACT

One of the main challenges in computational protein design (CPD) is the huge size of the protein sequence and conformational space that has to be computationally explored. Recently, we showed that state-of-the-art combinatorial optimization technologies based on Cost Function Network (CFN) processing allow speeding up provable rigid backbone protein design methods by several orders of magnitude. Building on this, we improved and injected CFN technology into the well-established CPD package Osprey to allow all Osprey CPD algorithms to benefit from associated speedups. Because Osprey fundamentally relies on the ability of A* to produce conformations in increasing order of energy, we defined new A* strategies combining CFN lower bounds with a new side-chain positioning-based branching scheme. Beyond the speedups obtained in the new A*-CFN combination, this novel branching scheme enables a much faster enumeration of suboptimal sequences, far beyond what is reachable without it. Together with the immediate and important speedups provided by CFN technology, these developments directly benefit all the algorithms that previously relied on the DEE/A* combination inside Osprey and make it possible to solve larger CPD problems with provable algorithms.


Subject(s)
Algorithms; Computational Biology; Proteins/chemistry; Amino Acid Sequence; Drug Design; Protein Conformation