Results 1 - 20 of 117
1.
Mol Cell ; 81(16): 3294-3309.e12, 2021 08 19.
Article in English | MEDLINE | ID: mdl-34293321

ABSTRACT

Temperature is a variable component of the environment, and all organisms must deal with or adapt to temperature change. Acute temperature change activates cellular stress responses, resulting in refolding or removal of damaged proteins. However, how organisms adapt to long-term temperature change remains largely unexplored. Here we report that budding yeast responds to long-term high temperature challenge by switching from chaperone induction to reduction of temperature-sensitive proteins and re-localizing a portion of its proteome. Surprisingly, we also find that many proteins adopt an alternative conformation. Using Fet3p as an example, we find that the temperature-dependent conformational difference is accompanied by distinct thermostability, subcellular localization, and, importantly, cellular functions. We postulate that, in addition to the known mechanisms of adaptation, conformational plasticity allows some polypeptides to acquire new biophysical properties and functions when environmental change endures.


Subject(s)
Adaptation, Physiological/genetics; Proteome/genetics; Stress, Physiological/genetics; Transcriptome/genetics; Acclimatization/genetics; Animals; Environmental Exposure/adverse effects; Gene Expression Regulation, Fungal/genetics; Hot Temperature/adverse effects; Saccharomycetales/genetics
2.
Nat Methods ; 20(6): 824-835, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37069271

ABSTRACT

BigNeuron is an open community bench-testing platform with the goal of setting open standards for accurate and fast automatic neuron tracing. We gathered a diverse set of image volumes across several species that is representative of the data obtained in many neuroscience laboratories interested in neuron tracing. Here, we report generated gold standard manual annotations for a subset of the available imaging datasets and quantified tracing quality for 35 automatic tracing algorithms. The goal of generating such a hand-curated diverse dataset is to advance the development of tracing algorithms and enable generalizable benchmarking. Together with image quality features, we pooled the data in an interactive web application that enables users and developers to perform principal component analysis, t-distributed stochastic neighbor embedding, correlation and clustering, visualization of imaging and tracing data, and benchmarking of automatic tracing algorithms in user-defined data subsets. The image quality metrics explain most of the variance in the data, followed by neuromorphological features related to neuron size. We observed that diverse algorithms can provide complementary information to obtain accurate results and developed a method to iteratively combine methods and generate consensus reconstructions. The consensus trees obtained provide estimates of the neuron structure ground truth that typically outperform single algorithms in noisy datasets. However, specific algorithms may outperform the consensus tree strategy in specific imaging conditions. Finally, to aid users in predicting the most accurate automatic tracing results without manual annotations for comparison, we used support vector machine regression to predict reconstruction quality given an image volume and a set of automatic tracings.


Subject(s)
Benchmarking; Microscopy; Microscopy/methods; Imaging, Three-Dimensional/methods; Neurons/physiology; Algorithms
3.
Proc Natl Acad Sci U S A ; 119(1)2022 01 04.
Article in English | MEDLINE | ID: mdl-34983849

ABSTRACT

RAS is a signaling protein associated with the cell membrane that is mutated in up to 30% of human cancers. RAS signaling has been proposed to be regulated by dynamic heterogeneity of the cell membrane. Investigating such a mechanism requires near-atomistic detail at macroscopic temporal and spatial scales, which is not possible with conventional computational or experimental techniques. We demonstrate here a multiscale simulation infrastructure that uses machine learning to create a scale-bridging ensemble of over 100,000 simulations of active wild-type KRAS on a complex, asymmetric membrane. Initialized and validated with experimental data (including a new structure of active wild-type KRAS), these simulations represent a substantial advance in the ability to characterize RAS-membrane biology. We report distinctive patterns of local lipid composition that correlate with interfacially promiscuous RAS multimerization. These lipid fingerprints are coupled to RAS dynamics, predicted to influence effector binding, and therefore may be a mechanism for regulating cell signaling cascades.


Subject(s)
Cell Membrane/enzymology; Lipids/chemistry; Machine Learning; Molecular Dynamics Simulation; Protein Multimerization; Proto-Oncogene Proteins p21(ras)/chemistry; Signal Transduction; Humans
4.
PLoS Comput Biol ; 19(4): e1011004, 2023 04.
Article in English | MEDLINE | ID: mdl-37099625

ABSTRACT

Mathematical models are often used to explore network-driven cellular processes from a systems perspective. However, a dearth of quantitative data suitable for model calibration leads to models with parameter unidentifiability and questionable predictive power. Here we introduce a combined Bayesian and Machine Learning Measurement Model approach to explore how quantitative and non-quantitative data constrain models of apoptosis execution within a missing data context. We find model prediction accuracy and certainty strongly depend on rigorous data-driven formulations of the measurement, and the size and make-up of the datasets. For instance, two orders of magnitude more ordinal (e.g., immunoblot) data are necessary to achieve accuracy comparable to quantitative (e.g., fluorescence) data for calibration of an apoptosis execution model. Notably, ordinal and nominal (e.g., cell fate observations) non-quantitative data synergize to reduce model uncertainty and improve accuracy. Finally, we demonstrate the potential of a data-driven Measurement Model approach to identify model features that could lead to informative experimental measurements and improve model predictive power.


Subject(s)
Machine Learning; Models, Theoretical; Bayes Theorem; Calibration; Apoptosis
5.
J Chem Inf Model ; 63(5): 1438-1453, 2023 03 13.
Article in English | MEDLINE | ID: mdl-36808989

ABSTRACT

Direct-acting antivirals for the treatment of COVID-19, the disease caused by the SARS-CoV-2 virus, are needed to complement vaccination efforts. Given the ongoing emergence of new variants, automated experimentation and fast, active-learning-based workflows for antiviral lead discovery remain critical to our ability to address the pandemic's evolution in a timely manner. While several such pipelines have been introduced to discover candidates with noncovalent interactions with the main protease (Mpro), here we developed a closed-loop artificial intelligence pipeline to design electrophilic warhead-based covalent candidates. This work presents a deep learning-assisted automated computational workflow that adds linkers and an electrophilic "warhead" to design covalent candidates and incorporates cutting-edge experimental techniques for validation. Using this process, promising candidates in the library were screened, and several potential hits were identified and tested experimentally using native mass spectrometry and fluorescence resonance energy transfer (FRET)-based screening assays. We identified four chloroacetamide-based covalent inhibitors of Mpro with micromolar affinities (KI of 5.27 µM) using our pipeline. Experimentally resolved binding modes for each compound were determined using room-temperature X-ray crystallography and are consistent with the predicted poses. The induced conformational changes based on molecular dynamics simulations further suggest that the dynamics may be an important factor to further improve selectivity, thereby effectively lowering KI and reducing toxicity. These results demonstrate the utility of our modular and data-driven approach for potent and selective covalent inhibitor discovery and provide a platform to apply it to other emerging targets.


Subject(s)
COVID-19; Hepatitis C, Chronic; Humans; SARS-CoV-2/metabolism; Antiviral Agents/pharmacology; Pandemics; Artificial Intelligence; Protease Inhibitors/pharmacology; Molecular Docking Simulation
6.
Proc Natl Acad Sci U S A ; 117(39): 24258-24268, 2020 09 29.
Article in English | MEDLINE | ID: mdl-32913056

ABSTRACT

The small GTPase KRAS is localized at the plasma membrane where it functions as a molecular switch, coupling extracellular growth factor stimulation to intracellular signaling networks. In this process, KRAS recruits effectors, such as RAF kinase, to the plasma membrane where they are activated by a series of complex molecular steps. Defining the membrane-bound state of KRAS is fundamental to understanding the activation of RAF kinase and in evaluating novel therapeutic opportunities for the inhibition of oncogenic KRAS-mediated signaling. We combined multiple biophysical measurements and computational methodologies to generate a consensus model for authentically processed, membrane-anchored KRAS. In contrast to the two membrane-proximal conformations previously reported, we identify a third significantly populated state using a combination of neutron reflectivity, fast photochemical oxidation of proteins (FPOP), and NMR. In this highly populated state, which we refer to as "membrane-distal" and estimate to comprise ∼90% of the ensemble, the G-domain does not directly contact the membrane but is tethered via its C-terminal hypervariable region and carboxymethylated farnesyl moiety, as shown by FPOP. Subsequent interaction of the RAF1 RAS binding domain with KRAS does not significantly change G-domain configurations on the membrane but affects their relative populations. Overall, our results are consistent with a directional fly-casting mechanism for KRAS, in which the membrane-distal state of the G-domain can effectively recruit RAF kinase from the cytoplasm for activation at the membrane.


Subject(s)
Proto-Oncogene Proteins p21(ras)/metabolism; raf Kinases/metabolism; Cell Membrane/metabolism; Molecular Dynamics Simulation
7.
Int J High Perform Comput Appl ; 37(1): 28-44, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36647365

ABSTRACT

We seek to completely revise current models of airborne transmission of respiratory viruses by providing never-before-seen atomic-level views of the SARS-CoV-2 virus within a respiratory aerosol. Our work dramatically extends the capabilities of multiscale computational microscopy to address the significant gaps that exist in current experimental methods, which are limited in their ability to interrogate aerosols at the atomic/molecular level and thus obscure our understanding of airborne transmission. We demonstrate how our integrated data-driven platform provides a new way of exploring the composition, structure, and dynamics of aerosols and aerosolized viruses, while driving simulation method development along several important axes. We present a series of initial scientific discoveries for the SARS-CoV-2 Delta variant, noting that the full scientific impact of this work has yet to be realized.

8.
J Chem Inf Model ; 62(1): 116-128, 2022 01 10.
Article in English | MEDLINE | ID: mdl-34793155

ABSTRACT

Despite the recent availability of vaccines against the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the search for inhibitory therapeutic agents has assumed importance, especially in the context of emerging new viral variants. In this paper, we describe the discovery of a novel noncovalent small-molecule inhibitor, MCULE-5948770040, that binds to and inhibits the SARS-CoV-2 main protease (Mpro) by employing a scalable high-throughput virtual screening (HTVS) framework and a targeted compound library of over 6.5 million molecules that could be readily ordered and purchased. Our HTVS framework leverages the U.S. supercomputing infrastructure, achieving nearly 91% resource utilization and nearly 126 million docking calculations per hour. Downstream biochemical assays validate this Mpro inhibitor with an inhibition constant (Ki) of 2.9 µM (95% CI 2.2, 4.0). Furthermore, using room-temperature X-ray crystallography, we show that MCULE-5948770040 binds to a cleft in the primary binding site of Mpro, forming stable hydrogen bond and hydrophobic interactions. We then used multiple µs-time scale molecular dynamics (MD) simulations and machine learning (ML) techniques to elucidate how the bound ligand alters the conformational states accessed by Mpro, involving motions both proximal and distal to the binding site. Together, our results demonstrate how MCULE-5948770040 inhibits Mpro and offer a springboard for further therapeutic design.


Subject(s)
COVID-19; Protease Inhibitors; Antiviral Agents; Coronavirus 3C Proteases; Humans; Molecular Docking Simulation; Molecular Dynamics Simulation; Orotic Acid/analogs & derivatives; Piperazines; SARS-CoV-2
9.
Proc Natl Acad Sci U S A ; 116(11): 5086-5095, 2019 03 12.
Article in English | MEDLINE | ID: mdl-30808805

ABSTRACT

The lysosomal enzyme glucocerebrosidase-1 (GCase) catalyzes the cleavage of a major glycolipid glucosylceramide into glucose and ceramide. The absence of fully functional GCase leads to the accumulation of its lipid substrates in lysosomes, causing Gaucher disease, an autosomal recessive disorder that displays profound genotype-phenotype nonconcordance. More than 250 disease-causing mutations in GBA1, the gene encoding GCase, have been discovered, although only one of these, N370S, causes 70% of disease. Here, we have used a knowledge-based docking protocol that considers experimental data of protein-protein binding to generate a complex between GCase and its known facilitator protein saposin C (SAPC). Multiscale molecular-dynamics simulations were used to study lipid self-assembly, membrane insertion, and the dynamics of the interactions between different components of the complex. Deep learning was applied to propose a model that explains the mechanism of GCase activation, which requires SAPC. Notably, we find that conformational changes in the loops at the entrance of the substrate-binding site are stabilized by direct interactions with SAPC and that the loss of such interactions induced by N370S and another common mutation, L444P, results in destabilization of the complex and reduced GCase activation. Our findings provide an atomistic-level explanation for GCase activation and the precise mechanism through which N370S and L444P cause Gaucher disease.


Subject(s)
Deep Learning; Gaucher Disease/enzymology; Gaucher Disease/physiopathology; Glucosylceramidase/metabolism; Molecular Dynamics Simulation; Catalytic Domain; Enzyme Activation; Glucosylceramidase/chemistry; Humans; Hydrogen Bonding; Mutant Proteins/chemistry; Protein Interaction Maps; Protein Structure, Secondary; Saposins/metabolism
10.
Int J High Perform Comput Appl ; 36(5-6): 603-623, 2022 Nov.
Article in English | MEDLINE | ID: mdl-38464362

ABSTRACT

The severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) replication transcription complex (RTC) is a multi-domain protein responsible for replicating and transcribing the viral mRNA inside a human cell. Attacking RTC function with pharmaceutical compounds is a pathway to treating COVID-19. Conventional tools, e.g., cryo-electron microscopy and all-atom molecular dynamics (AAMD), do not provide sufficiently high resolution or timescale to capture important dynamics of this molecular machine. Consequently, we develop an innovative workflow that bridges the gap between these resolutions, using mesoscale fluctuating finite element analysis (FFEA) continuum simulations and a hierarchy of AI-methods that continually learn and infer features for maintaining consistency between AAMD and FFEA simulations. We leverage a multi-site distributed workflow manager to orchestrate AI, FFEA, and AAMD jobs, providing optimal resource utilization across HPC centers. Our study provides unprecedented access to study the SARS-CoV-2 RTC machinery, while providing general capability for AI-enabled multi-resolution simulations at scale.

11.
J Biol Chem ; 295(4): 1105-1119, 2020 01 24.
Article in English | MEDLINE | ID: mdl-31836666

ABSTRACT

Neurofibromin is a tumor suppressor encoded by the NF1 gene, which is mutated in the RASopathy neurofibromatosis type I. Defects in NF1 lead to aberrant signaling through the RAS-mitogen-activated protein kinase pathway due to disruption of the neurofibromin GTPase-activating function on RAS family small GTPases. Very little is known about the function of most of the neurofibromin protein; to date, biochemical and structural data exist only for its GAP domain and a region containing a Sec-PH motif. To better understand the role of this large protein, here we carried out a series of biochemical and biophysical experiments, including size-exclusion chromatography-multiangle light scattering (SEC-MALS), small-angle X-ray and neutron scattering, and analytical ultracentrifugation, indicating that full-length neurofibromin forms a high-affinity dimer. We observed that neurofibromin dimerization also occurs in human cells and likely has biological and clinical implications. Analysis of purified full-length and truncated neurofibromin variants by negative-stain EM revealed the overall architecture of the dimer and predicted the potential interactions that contribute to the dimer interface. We could reconstitute structures resembling high-affinity full-length dimers by mixing N- and C-terminal protein domains in vitro. The reconstituted neurofibromin was capable of GTPase activation in vitro, and co-expression of the two domains in human cells effectively recapitulated the activity of full-length neurofibromin. Taken together, these results suggest how neurofibromin dimers might form and be stabilized within the cell.


Subject(s)
Neurofibromin 1/chemistry; Neurofibromin 1/metabolism; Protein Multimerization; HEK293 Cells; Humans; Neurofibromin 1/ultrastructure; Protein Domains; Structure-Activity Relationship; ras GTPase-Activating Proteins/metabolism
12.
J Chem Inf Model ; 61(12): 5793-5803, 2021 12 27.
Article in English | MEDLINE | ID: mdl-34905348

ABSTRACT

Perfluoroalkyl and polyfluoroalkyl substances (PFAS) pose a significant hazard because of their widespread industrial uses, environmental persistence, and bioaccumulation. A growing, increasingly diverse inventory of PFAS, including 8163 chemicals, has recently been updated by the U.S. Environmental Protection Agency. However, with the exception of a handful of well-studied examples, little is known about their human toxicity potential because of the substantial resources required for in vivo toxicity experiments. We tackle the problem of expensive in vivo experiments by evaluating multiple machine learning (ML) methods, including random forests, deep neural networks (DNN), graph convolutional networks, and Gaussian processes, for predicting acute toxicity (e.g., median lethal dose, or LD50) of PFAS compounds. To address the scarcity of toxicity information for PFAS, publicly available datasets of oral rat LD50 for all organic compounds are aggregated and used to develop state-of-the-art ML source models for transfer learning. A total of 519 fluorinated compounds containing two or more C-F bonds with known toxicity are used for knowledge transfer to ensembles of the best-performing source model, DNN, to generate the target models for the PFAS domain with access to uncertainty. This study predicts toxicity for PFAS with a defined chemical structure. To further inform prediction confidence, the transfer-learned model is embedded within a SelectiveNet architecture, where the model is allowed to identify regions of prediction with greater confidence and abstain from those with high uncertainty using a calibrated cutoff rate.


Subject(s)
Fluorocarbons; Animals; Fluorocarbons/chemistry; Fluorocarbons/toxicity; Machine Learning; Neural Networks, Computer; Rats; Uncertainty
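
The abstain-on-high-uncertainty idea in entry 12 can be illustrated with a minimal Python sketch. This is not SelectiveNet, which learns a selection head jointly with the predictor; it swaps in a simpler stand-in that thresholds the spread of an ensemble so that a chosen fraction of compounds is kept. The function name `selective_predict`, the `coverage` parameter, and the toy numbers are all hypothetical and do not come from the paper.

```python
import numpy as np

def selective_predict(ensemble_preds, coverage):
    """Average an ensemble and abstain where its members disagree most.

    ensemble_preds: array of shape (n_models, n_samples), hypothetical
                    predicted toxicity values from each ensemble member.
    coverage:       target fraction of samples to keep (0 < coverage <= 1).
    Returns (mean prediction, ensemble std, boolean accept mask).
    """
    preds = np.asarray(ensemble_preds, dtype=float)
    mean = preds.mean(axis=0)
    std = preds.std(axis=0)
    # Cutoff on the ensemble spread so roughly `coverage` of samples pass.
    cutoff = np.quantile(std, coverage)
    accept = std <= cutoff
    return mean, std, accept

# Toy example: three models, four compounds (made-up values).
toy_preds = [
    [2.0, 3.0, 1.0, 4.0],
    [2.1, 3.5, 1.0, 2.0],
    [1.9, 2.5, 1.0, 6.0],
]
mean, std, accept = selective_predict(toy_preds, coverage=0.5)
for m, s, a in zip(mean, std, accept):
    print(f"prediction={m:.2f}  spread={s:.2f}  {'keep' if a else 'abstain'}")
```

In this toy case the two compounds on which the models agree closely are kept, and the two with the widest spread are flagged for abstention.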
13.
J Chem Inf Model ; 61(6): 3058-3073, 2021 06 28.
Article in English | MEDLINE | ID: mdl-34124899

ABSTRACT

β-coronaviruses (CoVs) alone have been responsible for three major global outbreaks in the 21st century. The current crisis has led to an urgent requirement to develop therapeutics. Even though a number of vaccines are available, alternative strategies targeting essential viral components are required as a backup against the emergence of lethal viral variants. One such target is the main protease (Mpro) that plays an indispensable role in viral replication. The availability of over 270 Mpro X-ray structures in complex with inhibitors provides unique insights into ligand-protein interactions. Herein, we provide a comprehensive comparison of all nonredundant ligand-binding sites available for SARS-CoV-2, SARS-CoV, and MERS-CoV Mpro. Extensive adaptive sampling has been used to investigate structural conservation of ligand-binding sites using Markov state models (MSMs) and compare conformational dynamics employing convolutional variational auto-encoder-based deep learning. Our results indicate that not all ligand-binding sites are dynamically conserved despite high sequence and structural conservation across β-CoV homologs. This highlights the complexity in targeting all three Mpro enzymes with a single pan-inhibitor.


Subject(s)
COVID-19; Peptide Hydrolases; Antiviral Agents; Binding Sites; Humans; Ligands; Protease Inhibitors; RNA, Viral; SARS-CoV-2
14.
Int J High Perform Comput Appl ; 35(5): 432-451, 2021 Sep.
Article in English | MEDLINE | ID: mdl-38603008

ABSTRACT

We develop a generalizable AI-driven workflow that leverages heterogeneous HPC resources to explore the time-dependent dynamics of molecular systems. We use this workflow to investigate the mechanisms of infectivity of the SARS-CoV-2 spike protein, the main viral infection machinery. Our workflow enables more efficient investigation of spike dynamics in a variety of complex environments, including within a complete SARS-CoV-2 viral envelope simulation, which contains 305 million atoms and shows strong scaling on ORNL Summit using NAMD. We present several novel scientific discoveries, including the elucidation of the spike's full glycan shield, the role of spike glycans in modulating the infectivity of the virus, and the characterization of the flexible interactions between the spike and the human ACE2 receptor. We also demonstrate how AI can accelerate conformational sampling across different systems and pave the way for the future application of such methods to additional studies in SARS-CoV-2 and other molecular systems.

15.
Proteomics ; 20(5-6): e1800407, 2020 03.
Article in English | MEDLINE | ID: mdl-32068959

ABSTRACT

Aging biology is intimately associated with dysregulated metabolism, which is one of the hallmarks of aging. Aging-related pathways such as mTOR and AMPK, which are major targets of anti-aging interventions including rapamycin, metformin, and exercise, either directly regulate or intersect with metabolic pathways. In this review, numerous candidate biomarkers of aging that have emerged using metabolomics are outlined. Metabolomics studies also reveal that not all metabolites are created equally. A set of core "hub" metabolites are emerging as central mediators of aging. The hub metabolites reviewed here are nicotinamide adenine dinucleotide, reduced nicotinamide adenine dinucleotide phosphate, α-ketoglutarate, and β-hydroxybutyrate. These "hub" metabolites have signaling and epigenetic roles along with their canonical roles as co-factors or intermediates of carbon metabolism. Together these hub metabolites suggest a central role of the TCA cycle in signaling and metabolic dysregulation associated with aging.


Subject(s)
Aging; Metabolic Networks and Pathways; Metabolome; 3-Hydroxybutyric Acid/genetics; 3-Hydroxybutyric Acid/metabolism; Animals; Biomarkers/metabolism; Citric Acid Cycle; DNA Damage; Epigenesis, Genetic; Humans; Ketoglutaric Acids/metabolism; Metabolomics/methods; NAD/genetics; NAD/metabolism; NADP/genetics; NADP/metabolism
16.
Biophys J ; 117(3): 429-444, 2019 08 06.
Article in English | MEDLINE | ID: mdl-31349988

ABSTRACT

Cardiolipin is an anionic lipid found in the mitochondrial membranes of eukaryotes ranging from unicellular microorganisms to metazoans. This unique lipid contributes to various mitochondrial functions, including metabolism, mitochondrial membrane fusion and/or fission dynamics, and apoptosis. However, differences in cardiolipin content between the two mitochondrial membranes, as well as dynamic fluctuations in cardiolipin content in response to stimuli and cellular signaling events, raise questions about how cardiolipin concentration affects mitochondrial membrane structure and dynamics. Although cardiolipin's structural and dynamic roles have been extensively studied in binary mixtures with other phospholipids, the biophysical properties of cardiolipin in higher number lipid mixtures are still not well resolved. Here, we used molecular dynamics simulations to investigate the cardiolipin-dependent properties of ternary lipid bilayer systems that mimic the major components of mitochondrial membranes. We found that changes to cardiolipin concentration only resulted in minor changes to bilayer structural features but that the lipid diffusion was significantly affected by those alterations. We also found that cardiolipin position along the bilayer surfaces correlated to negative curvature deflections, consistent with the induction of negative curvature stress in the membrane monolayers. This work contributes to a foundational understanding of the role of cardiolipin in altering the properties in ternary lipid mixtures composed of the major mitochondrial phospholipids, providing much-needed insights to help understand how cardiolipin concentration modulates the biophysical properties of mitochondrial membranes.


Subject(s)
Cardiolipins/chemistry; Mitochondrial Membranes/metabolism; Molecular Dynamics Simulation; Diffusion; Hydrophobic and Hydrophilic Interactions; Lipid Bilayers/chemistry; Phospholipids/chemistry
17.
Biophys J ; 114(9): 2040-2043, 2018 05 08.
Article in English | MEDLINE | ID: mdl-29742397

ABSTRACT

Anharmonicity in time-dependent conformational fluctuations is noted to be a key feature of functional dynamics of biomolecules. Although anharmonic events are rare, long-timescale (µs-ms and beyond) simulations facilitate probing of such events. We have previously developed quasi-anharmonic analysis to resolve higher-order spatial correlations and characterize anharmonicity in biomolecular simulations. In this article, we have extended this toolbox to resolve higher-order temporal correlations and built a scalable Python package called anharmonic conformational analysis (ANCA). ANCA has modules to: 1) measure anharmonicity in the form of higher-order statistics and its variation as a function of time, 2) output a storyboard representation of the simulations to identify key anharmonic conformational events, and 3) identify putative anharmonic conformational substates and visualize transitions between these substates.


Subject(s)
Molecular Dynamics Simulation; Animals; Aprotinin/chemistry; Aprotinin/metabolism; Cattle; Movement; Protein Conformation
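
Entry 17 describes measuring anharmonicity as a higher-order statistic and tracking it over time. A minimal sketch of that idea, assuming excess kurtosis as the statistic and a sliding window over a single coordinate, is given below. It does not use or reproduce the ANCA package's API; the function names, window sizes, and synthetic trajectory are hypothetical.

```python
import numpy as np

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    variance = np.mean(centered ** 2)
    if variance == 0.0:
        return 0.0  # a flat window is treated as having no anharmonicity
    return np.mean(centered ** 4) / variance ** 2 - 3.0

def windowed_kurtosis(series, window, stride):
    """Excess kurtosis in sliding windows along a 1-D time series."""
    series = np.asarray(series, dtype=float)
    starts = range(0, len(series) - window + 1, stride)
    return np.array([excess_kurtosis(series[s:s + window]) for s in starts])

# Synthetic stand-in for one conformational coordinate: Gaussian
# (harmonic-like) fluctuations with a heavy-tailed segment in the middle.
rng = np.random.default_rng(0)
harmonic = rng.normal(size=2000)
heavy_tailed = rng.laplace(size=1000)
trajectory = np.concatenate([harmonic[:1000], heavy_tailed, harmonic[1000:]])

profile = windowed_kurtosis(trajectory, window=500, stride=250)
print(np.round(profile, 2))
```

Windows with values well above zero would be candidates for closer inspection as anharmonic events; on short windows the estimate is noisy, so any threshold would need to be chosen with care.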
18.
BMC Bioinformatics ; 19(Suppl 18): 484, 2018 Dec 21.
Article in English | MEDLINE | ID: mdl-30577777

ABSTRACT

BACKGROUND: We examine the problem of clustering biomolecular simulations using deep learning techniques. Since biomolecular simulation datasets are inherently high dimensional, it is often necessary to build low dimensional representations that can be used to extract quantitative insights into the atomistic mechanisms that underlie complex biological processes. RESULTS: We use a convolutional variational autoencoder (CVAE) to learn low dimensional, biophysically relevant latent features from long time-scale protein folding simulations in an unsupervised manner. We demonstrate our approach on three model protein folding systems, namely Fs-peptide (14 µs aggregate sampling), villin headpiece (single trajectory of 125 µs) and β-β-α (BBA) protein (223 + 102 µs sampling across two independent trajectories). In these systems, we show that the CVAE latent features learned correspond to distinct conformational substates along the protein folding pathways. The CVAE model predicts, on average, nearly 89% of all contacts within the folding trajectories correctly, while being able to extract folded, unfolded and potentially misfolded states in an unsupervised manner. Further, the CVAE model can be used to learn latent features of protein folding that can be applied to other independent trajectories, making it particularly attractive for identifying intrinsic features that correspond to conformational substates that share similar structural features. CONCLUSIONS: Together, we show that the CVAE model can quantitatively describe complex biophysical processes such as protein folding.


Subject(s)
Protein Folding; Cluster Analysis; Molecular Dynamics Simulation
19.
BMC Bioinformatics ; 19(Suppl 18): 488, 2018 Dec 21.
Article in English | MEDLINE | ID: mdl-30577743

ABSTRACT

BACKGROUND: Deep Learning (DL) has advanced the state-of-the-art capabilities in bioinformatics applications, which has resulted in a trend toward increasingly sophisticated and computationally demanding models trained on larger and larger data sets. This vastly increased computational demand challenges the feasibility of conducting cutting-edge research. One solution is to distribute the vast computational workload across multiple computing cluster nodes with data parallelism algorithms. In this study, we used a High-Performance Computing environment and implemented the Downpour Stochastic Gradient Descent algorithm for data parallelism to train a Convolutional Neural Network (CNN) for the natural language processing task of information extraction from a massive dataset of cancer pathology reports. We evaluated the scalability improvements using data parallelism training and the Titan supercomputer at Oak Ridge Leadership Computing Facility. To evaluate scalability, we used different numbers of worker nodes and performed a set of experiments comparing the effects of different training batch sizes and optimizer functions. RESULTS: We found that Adadelta would consistently converge at a lower validation loss, though requiring over twice as many training epochs as the fastest converging optimizer, RMSProp. The Adam optimizer consistently achieved a close second-place minimum validation loss significantly faster; using batch sizes of 16 and 32 allowed the network to converge in only 4.5 training epochs. CONCLUSIONS: We demonstrated that the networked training process is scalable across multiple compute nodes communicating with message passing interface while achieving higher classification accuracy compared to a traditional machine learning algorithm.


Subject(s)
Computing Methodologies; Deep Learning/trends; Neoplasms/diagnosis; Comprehension; Humans; Neoplasms/pathology; Neural Networks, Computer
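
Entry 19 trains with Downpour Stochastic Gradient Descent for data parallelism. The sketch below is a heavily simplified, single-process simulation of the general idea: workers hold possibly stale copies of the parameters, compute gradients on their own data shard, and push them to a central parameter store that applies each update immediately. It is not the authors' implementation, uses no MPI, and fits a toy linear model rather than a CNN; every name and hyperparameter here is hypothetical.

```python
import numpy as np

# Hypothetical noiseless regression data standing in for a real training set.
data_rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
X = data_rng.normal(size=(400, 2))
y = X @ true_w

def gradient(w, xb, yb):
    """Gradient of mean squared error for a linear model on one mini-batch."""
    return 2.0 * xb.T @ (xb @ w - yb) / len(yb)

def downpour_sgd(X, y, n_workers=4, steps=300, lr=0.05,
                 fetch_every=2, batch=16, seed=0):
    """Simulate asynchronous parameter-server updates in one process."""
    rng = np.random.default_rng(seed)
    server_w = np.zeros(X.shape[1])
    # Data parallelism: each worker sees only its own shard of the data.
    shards = np.array_split(np.arange(len(y)), n_workers)
    local_w = [server_w.copy() for _ in range(n_workers)]
    local_steps = [0] * n_workers
    for _ in range(steps):
        k = rng.integers(n_workers)  # workers report in arbitrary order
        if local_steps[k] % fetch_every == 0:
            local_w[k] = server_w.copy()  # refresh the worker's replica
        idx = rng.choice(shards[k], size=batch, replace=False)
        g = gradient(local_w[k], X[idx], y[idx])  # may use stale parameters
        server_w -= lr * g  # server applies each pushed gradient at once
        local_steps[k] += 1
    return server_w

w = downpour_sgd(X, y)
print("recovered weights:", np.round(w, 3))
```

The point of the simulation is that updates computed from slightly outdated parameters can still drive the shared model toward the solution, which is what makes this style of training tolerant of uneven worker speeds.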
20.
Biochemistry ; 57(29): 4263-4275, 2018 07 24.
Article in English | MEDLINE | ID: mdl-29901984

ABSTRACT

Optimal enzyme activity depends on a number of factors, including structure and dynamics. The role of enzyme structure is well recognized; however, the linkage between protein dynamics and enzyme activity has given rise to a contentious debate. We have developed an approach that uses an aqueous mixture of organic solvent to control the functionally relevant enzyme dynamics (without changing the structure), which in turn modulates the enzyme activity. Using this approach, we predicted that the rate of the hydride transfer reaction catalyzed by the enzyme dihydrofolate reductase (DHFR) from Escherichia coli in aqueous mixtures of isopropanol (IPA) with water will decrease by ∼3-fold at 20% (v/v) IPA concentration. Stopped-flow kinetic measurements find that the pH-independent khydride rate decreases by 2.2-fold. X-ray crystallographic enzyme structures show no noticeable differences, while computational studies indicate that the transition state and electrostatic effects were identical for water and mixed solvent conditions; quasi-elastic neutron scattering studies show that the dynamical enzyme motions are suppressed. Our approach provides a unique avenue to modulating enzyme activity through changes in enzyme dynamics. Further, it provides vital insights that show the altered motions of DHFR cause significant changes in the enzyme's ability to access its functionally relevant conformational substates, explaining the decreased khydride rate. This approach has important implications for obtaining fundamental insights into the role of rate-limiting dynamics in catalysis as well as for enzyme engineering.


Subject(s)
2-Propanol/metabolism; Enzyme Activation/drug effects; Escherichia coli/enzymology; Solvents/metabolism; Tetrahydrofolate Dehydrogenase/metabolism; Crystallography, X-Ray/methods; Escherichia coli/chemistry; Escherichia coli/metabolism; Kinetics; Molecular Dynamics Simulation; Protein Conformation/drug effects; Static Electricity; Tetrahydrofolate Dehydrogenase/chemistry; Viscosity; Water/metabolism