Results 1 - 20 of 71
1.
Bioinformatics ; 39(11)2023 11 01.
Article in English | MEDLINE | ID: mdl-37991847

ABSTRACT

MOTIVATION: The two strands of the DNA double helix locally and spontaneously separate and recombine in living cells due to the inherent thermal DNA motion. These dynamics result in transient openings in the double helix and are referred to as "DNA breathing" or "DNA bubbles." The propensity to form local transient openings is important in a wide range of biological processes, such as transcription, replication, and transcription factor binding. However, the modeling and computer simulation of these phenomena have remained a challenge due to the complex interplay of numerous factors, such as temperature, salt content, DNA sequence, hydrogen bonding, base stacking, and others. RESULTS: We present pyDNA-EPBD, a parallel software implementation of the Extended Peyrard-Bishop-Dauxois (EPBD) nonlinear DNA model that allows us to describe some features of DNA dynamics in detail. pyDNA-EPBD generates genomic-scale profiles of average base-pair openings, base flipping probability, DNA bubble probability, and calculations of the characteristically dynamic length indicating the number of base pairs statistically significantly affected by a single point mutation using the Markov Chain Monte Carlo algorithm. AVAILABILITY AND IMPLEMENTATION: pyDNA-EPBD is supported across most operating systems and is freely available at https://github.com/lanl/pyDNA_EPBD. Extensive documentation can be found at https://lanl.github.io/pyDNA_EPBD/.


Subjects
DNA , Chemical Models , Computer Simulation , DNA/chemistry , Software , Base Pairing , Nucleic Acid Conformation
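
As a rough illustration of the Markov Chain Monte Carlo sampling mentioned in the abstract above, the following sketch runs a Metropolis sampler over a toy chain of base-pair displacements. It is not the pyDNA-EPBD implementation: the Morse on-site term with a plain harmonic neighbour coupling is a simplification, and every function name and parameter value here (`depth`, `width`, `k`, `step`, `kT`) is a hypothetical choice made for the example.

```python
import math
import random

def morse(y, depth=0.05, width=4.0):
    """Toy on-site Morse potential standing in for base-pair hydrogen bonding."""
    return depth * (math.exp(-width * y) - 1.0) ** 2

def site_energy(y, i, k=0.025):
    """Sum of the energy terms that involve site i (assumed harmonic coupling)."""
    e = morse(y[i])
    if i > 0:
        e += 0.5 * k * (y[i] - y[i - 1]) ** 2
    if i < len(y) - 1:
        e += 0.5 * k * (y[i + 1] - y[i]) ** 2
    return e

def metropolis_average_opening(n_sites=10, n_sweeps=2000, kT=0.0267,
                               step=0.1, seed=0):
    """Return the per-site displacement averaged over Metropolis sweeps."""
    rng = random.Random(seed)
    y = [0.0] * n_sites
    totals = [0.0] * n_sites
    for _ in range(n_sweeps):
        for i in range(n_sites):
            old = y[i]
            e_old = site_energy(y, i)
            y[i] = old + rng.uniform(-step, step)  # propose a local move
            delta = site_energy(y, i) - e_old
            # Metropolis rule: always accept downhill moves, accept uphill
            # moves with probability exp(-delta / kT).
            if delta > 0 and rng.random() >= math.exp(-delta / kT):
                y[i] = old  # reject and restore the previous value
        for i in range(n_sites):
            totals[i] += y[i]
    return [t / n_sweeps for t in totals]
```

Calling `metropolis_average_opening(seed=1)` returns one averaged displacement per site; a sequence-dependent model would use different potential parameters per base pair, which this sketch deliberately omits.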
2.
Int J Mol Sci ; 25(8)2024 Apr 20.
Article in English | MEDLINE | ID: mdl-38674107

ABSTRACT

The fibroblast growth factor receptor 2 (FGFR2) gene is one of the most extensively studied genes with many known mutations implicated in several human disorders, including oncogenic ones. Most FGFR2 disease-associated gene mutations are missense mutations that result in constitutive activation of the FGFR2 protein and downstream molecular pathways. Many tertiary structures of the FGFR2 kinase domain are publicly available in the wildtype and mutated forms and in the inactive and activated state of the receptor. The current literature suggests a molecular brake inhibiting the ATP-binding A loop from adopting the activated state. Mutations relieve this brake, triggering allosteric changes between active and inactive states. However, the existing analysis relies on static structures and fails to account for the intrinsic structural dynamics. In this study, we utilize experimentally resolved structures of the FGFR2 tyrosine kinase domain and machine learning to capture the intrinsic structural dynamics, correlate it with functional regions and disease types, and enrich it with predicted structures of variants with currently no experimentally resolved structures. Our findings demonstrate the value of machine learning-enabled characterizations of structure dynamics in revealing the impact of mutations on (dys)function and disorder in FGFR2.


Subjects
Fibroblast Growth Factor Receptor Type 2 , Fibroblast Growth Factor Receptor Type 2/genetics , Fibroblast Growth Factor Receptor Type 2/chemistry , Fibroblast Growth Factor Receptor Type 2/metabolism , Humans , Mutation , Machine Learning , Missense Mutation , Molecular Models , Protein Conformation , Protein Domains , Structure-Activity Relationship
3.
Bioinformatics ; 38(12): 3200-3208, 2022 06 13.
Article in English | MEDLINE | ID: mdl-35511125

ABSTRACT

MOTIVATION: Expanding our knowledge of small molecules beyond what is known in nature or designed in wet laboratories promises to significantly advance cheminformatics, drug discovery, biotechnology and material science. In silico molecular design remains challenging, primarily due to the complexity of the chemical space and the non-trivial relationship between chemical structures and biological properties. Deep generative models that learn directly from data are intriguing, but they have yet to demonstrate interpretability in the learned representation, so we can learn more about the relationship between the chemical and biological space. In this article, we advance research on disentangled representation learning for small molecule generation. We build on recent work by us and others on deep graph generative frameworks, which capture atomic interactions via a graph-based representation of a small molecule. The methodological novelty is how we leverage the concept of disentanglement in the graph variational autoencoder framework both to generate biologically relevant small molecules and to enhance model interpretability. RESULTS: Extensive qualitative and quantitative experimental evaluation in comparison with state-of-the-art models demonstrates the superiority of our disentanglement framework. We believe this work is an important step to address key challenges in small molecule generation with deep generative frameworks. AVAILABILITY AND IMPLEMENTATION: Training and generated data are made available at https://ieee-dataport.org/documents/dataset-disentangled-representation-learning-interpretable-molecule-generation. All code is made available at https://anonymous.4open.science/r/D-MolVAE-2799/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Deep Learning , Drug Discovery
4.
Molecules ; 26(5)2021 Feb 24.
Article in English | MEDLINE | ID: mdl-33668217

ABSTRACT

Protein molecules are inherently dynamic and modulate their interactions with different molecular partners by accessing different tertiary structures under physiological conditions. Elucidating such structures remains challenging. Current momentum in deep learning and the powerful performance of generative adversarial networks (GANs) in complex domains, such as computer vision, inspires us to investigate GANs on their ability to generate physically-realistic protein tertiary structures. The analysis presented here shows that several GAN models fail to capture complex, distal structural patterns present in protein tertiary structures. The study additionally reveals that mechanisms touted as effective in stabilizing the training of a GAN model are not all effective, and that performance based on loss alone may be orthogonal to performance based on the quality of generated datasets. A novel contribution in this study is the demonstration that Wasserstein GAN strikes a good balance and manages to capture both local and distal patterns, thus presenting a first step towards more powerful deep generative models for exploring a possibly very diverse set of structures supporting diverse activities of a protein molecule in the cell.


Subjects
Computer Neural Networks , Proteins/chemistry , Protein Tertiary Structure
5.
BMC Bioinformatics ; 21(Suppl 1): 189, 2020 Dec 09.
Article in English | MEDLINE | ID: mdl-33297949

ABSTRACT

BACKGROUND: Identifying one or more biologically-active/native decoys from millions of non-native decoys is one of the major challenges in computational structural biology. The extreme lack of balance in positive and negative samples (native and non-native decoys) in a decoy set makes the problem even more complicated. Consensus methods show varied success in handling the challenge of decoy selection despite some issues associated with clustering large decoy sets and decoy sets that do not show much structural similarity. Recent investigations into energy landscape-based decoy selection approaches show promise. However, lack of generalization over varied test cases remains a bottleneck for these methods. RESULTS: We propose a novel decoy selection method, ML-Select, a machine learning framework that exploits the energy landscape associated with the structure space probed through a template-free decoy generation. The proposed method outperforms both clustering and energy ranking-based methods, all the while consistently offering better performance on varied test cases. Moreover, ML-Select shows promising results even for the decoy sets consisting of mostly low-quality decoys. CONCLUSIONS: ML-Select is a useful method for decoy selection. This work suggests further research in finding more effective ways to adopt machine learning frameworks in achieving robust performance for decoy selection in template-free protein structure prediction.


Subjects
Computational Biology/methods , Proteins/chemistry , Cluster Analysis , Machine Learning , Protein Conformation , Protein Folding , Thermodynamics
6.
Molecules ; 25(5)2020 Mar 04.
Article in English | MEDLINE | ID: mdl-32143444

ABSTRACT

Rapid growth in molecular structure data is renewing interest in featurizing structure. Featurizations that retain information on biological activity are particularly sought for protein molecules, where decades of research have shown that indeed structure encodes function. Research on featurization of protein structure is active, but here we assess the promise of autoencoders. Motivated by rapid progress in neural network research, we investigate and evaluate autoencoders on yielding linear and nonlinear featurizations of protein tertiary structures. An additional reason we focus on autoencoders as the engine to obtain featurizations is the versatility of their architectures and the ease with which changes to architecture yield linear versus nonlinear features. While open-source neural network libraries, such as Keras, which we employ here, greatly facilitate constructing, training, and evaluating autoencoder architectures and conducting model search, autoencoders have not yet gained popularity in the structural biology community. Here we demonstrate their utility in a practical context. Employing autoencoder-based featurizations, we address the classic problem of decoy selection in protein structure prediction. Utilizing off-the-shelf supervised learning methods, we demonstrate that the featurizations are indeed meaningful and allow detecting active tertiary structures, thus opening the way for further avenues of research.


Subjects
Proteins/chemistry , Protein Tertiary Structure , Supervised Machine Learning
7.
Molecules ; 25(9)2020 May 09.
Article in English | MEDLINE | ID: mdl-32397410

ABSTRACT

Controlling the quality of tertiary structures computed for a protein molecule remains a central challenge in de novo protein structure prediction. The rule of thumb is to generate as many structures as can be afforded, effectively acknowledging that having more structures increases the likelihood that some will reside near the sought biologically-active structure. A major drawback with this approach is that computing a large number of structures imposes time and space costs. In this paper, we propose a novel clustering-based approach which we demonstrate to significantly reduce an ensemble of generated structures without sacrificing quality. Evaluations are reported on both benchmark and CASP target proteins. Structure ensembles subjected to the proposed approach and the source code of the proposed approach are publicly available at the links provided in Section 1.


Subjects
Computational Biology/methods , Proteins/chemistry , Cluster Analysis , Molecular Models , Protein Folding , Protein Tertiary Structure , Software
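
To make the idea of clustering-based ensemble reduction in the abstract above concrete, here is a minimal sketch using greedy leader clustering. This is a stand-in technique chosen for brevity, not the method proposed in the paper; the function names, the `radius` threshold, and the unaligned RMSD (which assumes structures are already superimposed) are all assumptions of the example.

```python
import math

def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) coordinates.
    No superposition is performed; inputs are assumed pre-aligned."""
    sq = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(sq / len(a))

def reduce_ensemble(structures, energies, radius):
    """Greedy leader clustering: visit structures from lowest to highest
    energy and keep one only if it lies farther than `radius` from every
    structure kept so far. Returns the indices of the kept structures."""
    order = sorted(range(len(structures)), key=lambda i: energies[i])
    kept = []
    for i in order:
        if all(rmsd(structures[i], structures[j]) > radius for j in kept):
            kept.append(i)
    return kept
```

Each kept index acts as the representative of all structures within `radius` of it, so a larger radius gives a smaller reduced ensemble at the cost of coarser coverage.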
8.
BMC Bioinformatics ; 20(1): 211, 2019 Apr 25.
Article in English | MEDLINE | ID: mdl-31023237

ABSTRACT

BACKGROUND: Computational approaches for the determination of biologically-active/native three-dimensional structures of proteins with novel sequences have to handle several challenges. The (conformation) space of possible three-dimensional spatial arrangements of the chain of amino acids that constitute a protein molecule is vast and high-dimensional. Exploration of the conformation spaces is performed in a sampling-based manner and is biased by the internal energy that sums atomic interactions. Even state-of-the-art energy functions that quantify such interactions are inherently inaccurate and associate with protein conformation spaces overly rugged energy surfaces riddled with artifact local minima. The response to these challenges in template-free protein structure prediction is to generate large numbers of low-energy conformations (also referred to as decoys) as a way of increasing the likelihood of having a diverse decoy dataset that covers a sufficient number of local minima possibly housing near-native conformations. RESULTS: In this paper we pursue a complementary approach and propose to directly control the diversity of generated decoys. Inspired by hard optimization problems in high-dimensional and non-linear variable spaces, we propose that conformation sampling for decoy generation is more naturally framed as a multi-objective optimization problem. We demonstrate that mechanisms inherent to evolutionary search techniques facilitate such framing and allow balancing multiple objectives in protein conformation sampling. We showcase here an operationalization of this idea via a novel evolutionary algorithm that has high exploration capability and is also able to access lower-energy regions of the energy landscape of a given protein with similar or better proximity to the known native structure than several state-of-the-art decoy generation algorithms. 
CONCLUSIONS: The presented results constitute a promising research direction in improving decoy generation for template-free protein structure prediction with regard to balancing multiple conflicting objectives under an optimization framework. Future work will consider additional optimization objectives and variants of improvement and selection operators to apportion a fixed computational budget. Of particular interest are directions of research that attenuate dependence on protein energy models.


Subjects
Algorithms , Proteins/chemistry , Computational Biology , Protein Tertiary Structure , Thermodynamics
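
The multi-objective framing in the abstract above rests on Pareto dominance: one conformation is preferred over another only if it is no worse on every objective and strictly better on at least one. The sketch below shows that filter in isolation. It is not the evolutionary algorithm from the paper; the objective tuples are hypothetical scores to be minimized, and the abstract does not say which objectives the authors use.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b on every objective
    and strictly better on at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Return the indices of the points that no other point dominates."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]
```

In an evolutionary search, a filter like this can decide which offspring survive to the next generation, so that no single objective dominates the selection pressure.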
9.
BMC Bioinformatics ; 20(Suppl 11): 280, 2019 Jun 06.
Article in English | MEDLINE | ID: mdl-31167640

ABSTRACT

BACKGROUND: Nearly all cellular processes involve proteins structurally rearranging to accommodate molecular partners. The energy landscape underscores the inherent nature of proteins as dynamic molecules interconverting between structures with varying energies. In principle, reconstructing a protein's energy landscape holds the key to characterizing the structural dynamics and its regulation of protein function. In practice, the disparate spatio-temporal scales spanned by the slow dynamics challenge both wet and dry laboratories. However, the growing number of deposited structures for proteins central to human biology presents an opportunity to infer the relevant dynamics via exploitation of the information encoded in such structures about equilibrium dynamics. RESULTS: Recent computational efforts using extrinsic modes of motion as variables have successfully reconstructed detailed energy landscapes of several medium-size proteins. Here we investigate the extent to which one can reconstruct the energy landscape of a protein in the absence of sufficient wet-laboratory structural data. We do so by integrating intrinsic modes of motion extracted off a single structure in a stochastic optimization framework that supports the plug-and-play of different variable selection strategies. We demonstrate that, while knowledge of more wet-laboratory structures yields better-reconstructed landscapes, precious information can be obtained even when only one structural model is available. CONCLUSIONS: The presented work shows that it is possible to reconstruct the energy landscape of a protein with reasonable detail and accuracy even when the structural information about the protein is limited to one structure. By attenuating the dependence on structural data of methods designed to compute protein energy landscapes, the work opens up interesting avenues of research on structure-based inference of dynamics.
Of particular interest are directions of research that will extend such inference to proteins with no experimentally-characterized structures.


Subjects
Computational Biology/methods , Proteins/chemistry , Algorithms , Guanosine Diphosphate/chemistry , Humans , Motion , Principal Component Analysis , Thermodynamics
10.
Bioinformatics ; 34(16): 2740-2747, 2018 08 15.
Article in English | MEDLINE | ID: mdl-29590297

ABSTRACT

Motivation: Bacterial resistance to antibiotics is a growing concern. Antimicrobial peptides (AMPs), natural components of innate immunity, are popular targets for developing new drugs. Machine learning methods are now commonly adopted by wet-laboratory researchers to screen for promising candidates. Results: In this work, we utilize deep learning to recognize antimicrobial activity. We propose a neural network model with convolutional and recurrent layers that leverage primary sequence composition. Results show that the proposed model outperforms state-of-the-art classification models on a comprehensive dataset. By utilizing the embedding weights, we also present a reduced-alphabet representation and show that reasonable AMP recognition can be maintained using nine amino acid types. Availability and implementation: Models and datasets are made freely available through the Antimicrobial Peptide Scanner vr.2 web server at www.ampscanner.com. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Anti-Infective Agents/pharmacology , Computational Biology/methods , Deep Learning , Peptides/pharmacology , Protein Sequence Analysis/methods
11.
Molecules ; 24(3)2019 Feb 12.
Article in English | MEDLINE | ID: mdl-30759724

ABSTRACT

Computational biology has made powerful advances. Among these, trends in human health have been uncovered through heterogeneous 'big data' integration, and disease-associated genes were identified and classified. Along a different front, the dynamic organization of chromatin is being elucidated to gain insight into the fundamental question of genome regulation. Powerful conformational sampling methods have also been developed to yield a detailed molecular view of cellular processes. When combining these methods with the advancements in the modeling of supramolecular assemblies, including those at the membrane, we are finally able to get a glimpse into how cells' actions are regulated. Perhaps most intriguingly, a major thrust is on to decipher the mystery of how the brain is coded. Here, we aim to provide a broad, yet concise, sketch of modern aspects of computational biology, with a special focus on computational structural biology. We attempt to forecast the areas that computational structural biology will embrace in the future and the challenges that it may face. We skirt details, highlight successes, note failures, and map directions.


Subjects
Computational Biology/methods , Brain/physiology , Chromatin/genetics , Genome/genetics , Humans , Biological Models
12.
Molecules ; 24(5)2019 Feb 28.
Article in English | MEDLINE | ID: mdl-30823390

ABSTRACT

Significant efforts in wet and dry laboratories are devoted to resolving molecular structures. In particular, computational methods can now compute thousands of tertiary structures that populate the structure space of a protein molecule of interest. These advances are now allowing us to turn our attention to analysis methodologies that are able to organize the computed structures in order to highlight functionally relevant structural states. In this paper, we propose a methodology that leverages community detection methods, designed originally to detect communities in social networks, to organize computationally probed protein structure spaces. We report a principled comparison of such methods along several metrics on proteins of diverse folds and lengths. We present a rigorous evaluation in the context of decoy selection in template-free protein structure prediction. The results make the case that network-based community detection methods warrant further investigation to advance analysis of protein structure spaces for automated selection of functionally relevant structures.


Subjects
Algorithms , Computational Biology , Molecular Models , Proteins , Protein Conformation , Proteins/chemistry , Proteins/genetics
13.
Molecules ; 24(9)2019 May 07.
Article in English | MEDLINE | ID: mdl-31067727

ABSTRACT

A tertiary structure governs, to a great extent, the biological activity of a protein in the living cell and is consequently a central focus of numerous studies aiming to shed light on cellular processes central to human health. Here, we aim to elucidate the structure of the Rift Valley fever virus (RVFV) L protein using a combination of in silico techniques. Due to its large size and multiple domains, elucidation of the tertiary structure of the L protein has so far challenged both dry and wet laboratories. In this work, we leverage complementary perspectives and tools from the computational-molecular-biology and bioinformatics domains for constructing, refining, and evaluating several atomistic structural models of the L protein that are physically realistic. All computed models have very flexible termini of about 200 amino acids each, and a high proportion of helical regions. Properties such as potential energy, radius of gyration, hydrodynamic radius, flexibility coefficient, and solvent-accessible surface are reported. Structural characterization of the L protein enables our laboratories to better understand viral replication and transcription via further studies of L protein-mediated protein-protein interactions. While the results presented focus on the RVFV L protein, the workflow followed is a more general modeling protocol for discovering the tertiary structure of multidomain proteins consisting of thousands of amino acids.


Subjects
Protein Tertiary Structure , Rift Valley Fever/virology , Rift Valley fever virus/chemistry , Viral Proteins/chemistry , Animals , Viral Genome/genetics , Humans , Protein Conformation , Viral RNA/chemistry , Viral RNA/genetics , Rift Valley fever virus/genetics , Viral Proteins/genetics , Virus Replication/genetics
14.
BMC Genomics ; 19(Suppl 7): 671, 2018 Sep 24.
Article in English | MEDLINE | ID: mdl-30255791

ABSTRACT

BACKGROUND: The protein energy landscape underscores the inherent nature of proteins as dynamic molecules interconverting between structures with varying energies. Reconstructing a protein's energy landscape holds the key to characterizing a protein's equilibrium conformational dynamics and its relationship to function. Many pathogenic mutations in protein sequences alter the equilibrium dynamics that regulates molecular interactions and thus protein function. In principle, reconstructing energy landscapes of a protein's healthy and diseased variants is a central step to understanding how mutations impact dynamics, biological mechanisms, and function. RESULTS: Recent computational advances are yielding detailed, sample-based representations of protein energy landscapes. In this paper, we propose and describe two novel methods that leverage computed, sample-based representations of landscapes to reconstruct them and extract from them informative local structures that reveal the underlying organization of an energy landscape. Such structures constitute landscape features that, as we demonstrate here, can be utilized to detect alterations of landscapes upon mutation. CONCLUSIONS: The proposed methods detect altered protein energy landscape features in response to sequence mutations. By doing so, the methods allow formulating hypotheses on the impact of mutations on specific biological activities of a protein. This work demonstrates that the availability of energy landscapes of healthy and diseased variants of a protein opens up new avenues to harness the quantitative information embedded in landscapes to summarize mechanisms via which mutations alter protein dynamics to percolate to dysfunction.


Subjects
Algorithms , Molecular Models , Mutation , Proteins/genetics , Proteins/metabolism , Computational Biology/methods , Humans , Protein Conformation , Proteins/chemistry , Thermodynamics
15.
Molecules ; 23(1)2018 Jan 19.
Article in English | MEDLINE | ID: mdl-29351266

ABSTRACT

Due to the essential role that the three-dimensional conformation of a protein plays in regulating interactions with molecular partners, wet and dry laboratories seek biologically-active conformations of a protein to decode its function. Computational approaches are gaining prominence due to the labor and cost demands of wet laboratory investigations. Template-free methods can now compute thousands of conformations known as decoys, but selecting native conformations from the generated decoys remains challenging. Repeatedly, research has shown that the protein energy functions whose minima are sought in the generation of decoys are unreliable indicators of nativeness. The prevalent approach ignores energy altogether and clusters decoys by conformational similarity. Complementary recent efforts design protein-specific scoring functions or train machine learning models on labeled decoys. In this paper, we show that an informative consideration of energy can be carried out under the energy landscape view. Specifically, we leverage local structures known as basins in the energy landscape probed by a template-free method. We propose and compare various strategies of basin-based decoy selection that we demonstrate are superior to clustering-based strategies. The presented results point to further directions of research for improving decoy selection, including the ability to properly consider the multiplicity of native conformations of proteins.


Subjects
Computational Biology/methods , Molecular Models , Protein Conformation , Proteins/chemistry , Algorithms , Protein Databases
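
One common way to identify basins in a sampled energy landscape, sketched here as an assumption rather than as the procedure used in the paper above, is to connect each decoy to its nearest neighbours and follow steepest descent on that graph until a local minimum is reached. Decoys that drain to the same minimum form one basin. The `energies` list and the `neighbors` adjacency structure are hypothetical inputs.

```python
def find_basins(energies, neighbors):
    """Group decoys into basins by steepest descent on a neighbour graph.

    energies:  list of decoy energies.
    neighbors: neighbors[i] is the list of indices adjacent to decoy i.
    Returns a dict mapping each local-minimum index to its basin members.
    """
    def descend(i):
        # Move to the lowest-energy neighbour while that strictly lowers
        # the energy; stop at a local minimum of the graph.
        while True:
            best = min(neighbors[i], key=lambda j: energies[j], default=i)
            if energies[best] >= energies[i]:
                return i
            i = best

    basins = {}
    for i in range(len(energies)):
        basins.setdefault(descend(i), []).append(i)
    return basins
```

One possible selection strategy, again only an illustration, is to rank basins by size and offer the members of the largest one as candidates, for example `max(find_basins(energies, neighbors).values(), key=len)`.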
16.
PLoS Comput Biol ; 12(4): e1004619, 2016 04.
Article in English | MEDLINE | ID: mdl-27124275

ABSTRACT

Investigation of macromolecular structure and dynamics is fundamental to understanding how macromolecules carry out their functions in the cell. Significant advances have been made toward this end in silico, with a growing number of computational methods proposed yearly to study and simulate various aspects of macromolecular structure and dynamics. This review aims to provide an overview of recent advances, focusing primarily on methods proposed for exploring the structure space of macromolecules in isolation and in assemblies for the purpose of characterizing equilibrium structure and dynamics. In addition to surveying recent applications that showcase current capabilities of computational methods, this review highlights state-of-the-art algorithmic techniques proposed to overcome challenges posed in silico by the disparate spatial and time scales accessed by dynamic macromolecules. This review is not meant to be exhaustive, as such an endeavor is impossible, but rather aims to balance breadth and depth of strategies for modeling macromolecular structure and dynamics for a broad audience of novices and experts.


Subjects
Macromolecular Substances/chemistry , Molecular Dynamics Simulation/statistics & numerical data , Algorithms , Computational Biology , Computer Simulation , Molecular Models , Molecular Structure , Monte Carlo Method , Nucleic Acids/chemistry , Protein Folding , Protein Interaction Domains and Motifs
17.
BMC Genomics ; 17 Suppl 4: 546, 2016 08 18.
Article in English | MEDLINE | ID: mdl-27535545

ABSTRACT

BACKGROUND: Structural excursions of a protein at equilibrium are key to biomolecular recognition and function modulation. Protein modeling research is driven by the need to aid wet laboratories in characterizing equilibrium protein dynamics. In principle, structural excursions of a protein can be directly observed via simulation of its dynamics, but the disparate temporal scales involved in such excursions make this approach computationally impractical. On the other hand, an informative representation of the structure space available to a protein at equilibrium can be obtained efficiently via stochastic optimization, but this approach does not directly yield information on equilibrium dynamics. METHODS: We present here a novel methodology that first builds a multi-dimensional map of the energy landscape that underlies the structure space of a given protein and then queries the computed map for energetically-feasible excursions between structures of interest. An evolutionary algorithm builds such maps with a practical computational budget. Graphical techniques analyze a computed multi-dimensional map and expose interesting features of an energy landscape, such as basins and barriers. A path searching algorithm then queries a nearest-neighbor graph representation of a computed map for energetically-feasible basin-to-basin excursions. RESULTS: Evaluation is conducted on intrinsically-dynamic proteins of importance in human biology and disease. Visual statistical analysis of the maps of energy landscapes computed by the proposed methodology reveals features already captured in the wet laboratory, as well as new features indicative of interesting, unknown thermodynamically-stable and semi-stable regions of the equilibrium structure space. Comparison of maps and structural excursions computed by the proposed methodology on sequence variants of a protein sheds light on the role of equilibrium structure and dynamics in the sequence-function relationship. 
CONCLUSIONS: Applications show that the proposed methodology is effective at locating basins in complex energy landscapes and computing basin-basin excursions of a protein with a practical computational budget. While the actual temporal scales spanned by a structural excursion cannot be directly obtained due to the foregoing of simulation of dynamics, hypotheses can be formulated regarding the impact of sequence mutations on protein function. These hypotheses are valuable in instigating further research in wet laboratories.


Subjects
Computational Biology/methods , Protein Conformation , Proteins/chemistry , Algorithms , Cluster Analysis , Humans , Molecular Models , Thermodynamics
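
The abstract above describes querying a nearest-neighbor graph for energetically feasible excursions between basins. A minimal sketch of such a query follows, under the assumption that "feasible" means keeping the highest energy visited along the path as low as possible. The paper's actual path cost is not stated in the abstract, and the function name and inputs here are hypothetical. The search is a Dijkstra variant in which the cost of a path is its maximum node energy rather than a sum.

```python
import heapq

def lowest_barrier_path(energies, neighbors, start, goal):
    """Find the path from start to goal whose highest-energy node is as
    low as possible. Returns (barrier, path), or None if goal is
    unreachable."""
    best = {start: energies[start]}   # lowest known barrier to each node
    parent = {start: None}
    heap = [(energies[start], start)]
    while heap:
        barrier, i = heapq.heappop(heap)
        if i == goal:
            path = []
            while i is not None:      # walk parents back to the start
                path.append(i)
                i = parent[i]
            return barrier, path[::-1]
        if barrier > best[i]:
            continue                  # stale heap entry
        for j in neighbors[i]:
            cand = max(barrier, energies[j])
            if cand < best.get(j, float("inf")):
                best[j] = cand
                parent[j] = i
                heapq.heappush(heap, (cand, j))
    return None
```

The returned barrier gives a rough measure of how costly the excursion is; as the abstract notes, a graph search of this kind yields no time scale for the transition.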
18.
PLoS Comput Biol ; 11(9): e1004470, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26325505

ABSTRACT

An important goal in molecular biology is to understand functional changes upon single-point mutations in proteins. Doing so through a detailed characterization of structure spaces and underlying energy landscapes is desirable but continues to challenge methods based on Molecular Dynamics. In this paper we propose a novel algorithm, SIfTER, which is based instead on stochastic optimization to circumvent the computational challenge of exploring the breadth of a protein's structure space. SIfTER is a data-driven evolutionary algorithm, leveraging experimentally available structures of wildtype and variant sequences of a protein to define a reduced search space from which to efficiently draw samples corresponding to novel structures not directly observed in the wet laboratory. The main advantage of SIfTER is its ability to rapidly generate conformational ensembles, thus allowing mapping and juxtaposing landscapes of variant sequences and relating observed differences to functional changes. We apply SIfTER to variant sequences of the H-Ras catalytic domain, due to the prominent role of the Ras protein in signaling pathways that control cell proliferation, its well-studied conformational switching, and abundance of documented mutations in several human tumors. Many Ras mutations are oncogenic, but detailed energy landscapes have not been reported until now. Analysis of SIfTER-computed energy landscapes for the wildtype and two oncogenic variants, G12V and Q61L, suggests that these mutations cause constitutive activation through two different mechanisms. G12V directly affects binding specificity while leaving the energy landscape largely unchanged, whereas Q61L has pronounced, starker effects on the landscape. An implementation of SIfTER is made available at http://www.cs.gmu.edu/~ashehu/?q=OurTools.
We believe SIfTER is useful to the community to answer the question of how sequence mutations affect the function of a protein, when there is an abundance of experimental structures that can be exploited to reconstruct an energy landscape that would be computationally impractical to obtain via Molecular Dynamics.
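
The general idea in the abstract above (use known structures to define a reduced search space, then run an evolutionary search inside it) can be illustrated with a minimal sketch. This is not the SIfTER implementation: the use of a single principal axis, the `toy_energy` function, the truncation-selection loop, and all names and parameters here are assumptions made for illustration only.

```python
import random

def principal_axis(structures, iters=100, seed=0):
    """Mean and leading principal component of the structures (power iteration)."""
    rng = random.Random(seed)
    n, dim = len(structures), len(structures[0])
    mean = [sum(s[i] for s in structures) / n for i in range(dim)]
    centered = [[s[i] - mean[i] for i in range(dim)] for s in structures]
    axis = [rng.gauss(0, 1) for _ in range(dim)]
    for _ in range(iters):
        # Multiply by X^T X without forming the covariance matrix.
        scores = [sum(c[i] * axis[i] for i in range(dim)) for c in centered]
        axis = [sum(scores[k] * centered[k][i] for k in range(n)) for i in range(dim)]
        norm = sum(a * a for a in axis) ** 0.5 or 1.0
        axis = [a / norm for a in axis]
    return mean, axis

def toy_energy(structure):
    """Hypothetical stand-in for a physics-based energy function."""
    return sum((x - 1.0) ** 2 for x in structure)

def evolve(structures, energy, pop_size=10, generations=30, step=0.5, seed=0):
    """Evolutionary search along the reduced coordinate z (one axis, for brevity)."""
    rng = random.Random(seed)
    mean, axis = principal_axis(structures, seed=seed)

    def decode(z):
        # Map a reduced coordinate back to a full coordinate vector.
        return [m + z * a for m, a in zip(mean, axis)]

    def score(z):
        return energy(decode(z))

    # Seed the population with projections of the known structures.
    population = [sum((s[i] - mean[i]) * axis[i] for i in range(len(mean)))
                  for s in structures]
    population = sorted(population, key=score)[:pop_size]
    history = [score(population[0])]
    for _ in range(generations):
        # Each parent produces one offspring by Gaussian perturbation.
        offspring = [z + rng.gauss(0, step) for z in population]
        # Truncation selection over parents and offspring (elitist).
        population = sorted(population + offspring, key=score)[:pop_size]
        history.append(score(population[0]))
    return decode(population[0]), history

# Toy "known structures": 6-dimensional vectors scattered along one direction.
data_rng = random.Random(1)
known = [[t + data_rng.gauss(0, 0.05) for _ in range(6)] for t in (-1.0, -0.5, 0.0, 0.3)]
best, history = evolve(known, toy_energy)
```

Because parents compete with their offspring, the best energy in `history` never increases from one generation to the next. A realistic setup would keep several principal components and score candidates with a molecular energy function.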


Subjects
Algorithms , Computational Biology/methods , Molecular Models , Oncogene Protein p21(ras)/chemistry , Oncogene Protein p21(ras)/genetics , Crystallography , Humans , Mutation , Oncogene Protein p21(ras)/metabolism , Principal Component Analysis , Protein Conformation , Thermodynamics
19.
Bioessays ; 35(12): 1025-34, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24185813

ABSTRACT

It was, until recently, accepted that the two classes of acetylcholine (ACh) receptors are distinct in an important sense: muscarinic ACh receptors signal via heterotrimeric GTP binding proteins (G proteins), whereas nicotinic ACh receptors (nAChRs) open to allow flux of Na+, Ca2+, and K+ ions into the cell after activation. Here we present evidence of direct coupling between G proteins and nAChRs in neurons. Based on proteomic, biophysical, and functional evidence, we hypothesize that binding to G proteins modulates the activity and signaling of nAChRs in cells. It is important to note that while this hypothesis is new for the nAChR, it is consistent with known interactions between G proteins and structurally related ligand-gated ion channels. Therefore, it underscores an evolutionarily conserved metabotropic mechanism of G protein signaling via nAChR channels.


Subjects
GTP-Binding Proteins/metabolism , Nicotinic Receptors/metabolism , Animals , GTP-Binding Proteins/genetics , Humans , Protein Binding , Nicotinic Receptors/genetics , Signal Transduction/genetics , Signal Transduction/physiology
20.
BMC Bioinformatics ; 15 Suppl 8: S4, 2014.
Article in English | MEDLINE | ID: mdl-25080993

ABSTRACT

BACKGROUND: Due to rapid sequencing of genomes, there are now millions of deposited protein sequences with no known function. Fast sequence-based comparisons allow detecting close homologs for a protein of interest to transfer functional information from the homologs to the given protein. Sequence-based comparison cannot detect remote homologs, in which evolution has adjusted the sequence while largely preserving structure. Structure-based comparisons can detect remote homologs, but most methods for doing so are too expensive to apply at a large scale over structural databases of proteins. Recently, fragment-based structural representations have been proposed that allow fast detection of remote homologs with reasonable accuracy. These representations have also been used to obtain linearly-reducible maps of protein structure space. It has been shown, and is additionally supported by analysis in this paper, that such maps preserve functional co-localization of the protein structure space. METHODS: Inspired by a recent application of the Latent Dirichlet Allocation (LDA) model for conducting structural comparisons of proteins, we propose higher-order LDA-obtained topic-based representations of protein structures to provide an alternative route for remote homology detection and organization of the protein structure space in few dimensions. Various techniques based on natural language processing are proposed and employed to aid the analysis of topics in the protein structure domain. RESULTS: We show that a topic-based representation is just as effective as a fragment-based one at automated detection of remote homologs and organization of protein structure space. We conduct a detailed analysis of the information content in the topic-based representation, showing that topics have semantic meaning. The fragment-based and topic-based representations are also shown to allow prediction of superfamily membership.
CONCLUSIONS: This work opens exciting avenues in designing novel representations to extract information about protein structures, as well as organizing and mining protein structure space with mature text mining tools.
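
To make the notion of a topic-based representation concrete, the following minimal sketch fits plain LDA by collapsed Gibbs sampling to proteins treated as bags of structural-fragment identifiers, and returns each protein's topic proportions. It is an illustration under assumptions, not the paper's method: the fragment IDs, hyperparameters, and function name are hypothetical, and the higher-order representations mentioned in the abstract are not reproduced.

```python
import random

def lda_topic_proportions(docs, n_topics, vocab_size,
                          alpha=0.1, beta=0.01, sweeps=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA; returns per-document topic mixtures."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]               # topic counts per document
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Smoothed topic proportions: one low-dimensional vector per protein.
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]

# Hypothetical input: each protein is a list of fragment IDs in the range 0..5.
proteins = [[0, 1, 0, 2, 1], [1, 0, 2, 0, 1], [3, 4, 5, 4, 3], [5, 3, 4, 3, 5]]
theta = lda_topic_proportions(proteins, n_topics=2, vocab_size=6)
```

Each row of `theta` sums to 1 and can be compared across proteins, for example by cosine similarity, in place of the longer fragment-count vector.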


Subjects
Computational Biology/methods , Proteins/chemistry , Algorithms , Amino Acid Sequence , Automation , Computational Biology/instrumentation , Natural Language Processing