Results 1 - 20 of 118
1.
Mol Cell ; 81(16): 3294-3309.e12, 2021 08 19.
Article in English | MEDLINE | ID: mdl-34293321

ABSTRACT

Temperature is a variable component of the environment, and all organisms must deal with or adapt to temperature change. Acute temperature change activates cellular stress responses, resulting in refolding or removal of damaged proteins. However, how organisms adapt to long-term temperature change remains largely unexplored. Here we report that budding yeast responds to long-term high temperature challenge by switching from chaperone induction to reduction of temperature-sensitive proteins and re-localizing a portion of its proteome. Surprisingly, we also find that many proteins adopt an alternative conformation. Using Fet3p as an example, we find that the temperature-dependent conformational difference is accompanied by distinct thermostability, subcellular localization, and, importantly, cellular functions. We postulate that, in addition to the known mechanisms of adaptation, conformational plasticity allows some polypeptides to acquire new biophysical properties and functions when environmental change endures.


Subject(s)
Adaptation, Physiological/genetics , Proteome/genetics , Stress, Physiological/genetics , Transcriptome/genetics , Acclimatization/genetics , Animals , Environmental Exposure/adverse effects , Gene Expression Regulation, Fungal/genetics , Hot Temperature/adverse effects , Saccharomycetales/genetics
2.
Nat Methods ; 20(6): 824-835, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37069271

ABSTRACT

BigNeuron is an open community bench-testing platform with the goal of setting open standards for accurate and fast automatic neuron tracing. We gathered a diverse set of image volumes across several species that is representative of the data obtained in many neuroscience laboratories interested in neuron tracing. Here, we report gold standard manual annotations generated for a subset of the available imaging datasets and quantified tracing quality for 35 automatic tracing algorithms. The goal of generating such a hand-curated diverse dataset is to advance the development of tracing algorithms and enable generalizable benchmarking. Together with image quality features, we pooled the data in an interactive web application that enables users and developers to perform principal component analysis, t-distributed stochastic neighbor embedding, correlation and clustering, visualization of imaging and tracing data, and benchmarking of automatic tracing algorithms in user-defined data subsets. The image quality metrics explain most of the variance in the data, followed by neuromorphological features related to neuron size. We observed that diverse algorithms can provide complementary information to obtain accurate results and developed a method to iteratively combine methods and generate consensus reconstructions. The consensus trees obtained provide estimates of the neuron structure ground truth that typically outperform single algorithms in noisy datasets. However, specific algorithms may outperform the consensus tree strategy in specific imaging conditions. Finally, to aid users in predicting the most accurate automatic tracing results without manual annotations for comparison, we used support vector machine regression to predict reconstruction quality given an image volume and a set of automatic tracings.


Subject(s)
Benchmarking , Microscopy , Microscopy/methods , Imaging, Three-Dimensional/methods , Neurons/physiology , Algorithms
3.
Proc Natl Acad Sci U S A ; 119(1), 2022 01 04.
Article in English | MEDLINE | ID: mdl-34983849

ABSTRACT

RAS is a signaling protein associated with the cell membrane that is mutated in up to 30% of human cancers. RAS signaling has been proposed to be regulated by dynamic heterogeneity of the cell membrane. Investigating such a mechanism requires near-atomistic detail at macroscopic temporal and spatial scales, which is not possible with conventional computational or experimental techniques. We demonstrate here a multiscale simulation infrastructure that uses machine learning to create a scale-bridging ensemble of over 100,000 simulations of active wild-type KRAS on a complex, asymmetric membrane. Initialized and validated with experimental data (including a new structure of active wild-type KRAS), these simulations represent a substantial advance in the ability to characterize RAS-membrane biology. We report distinctive patterns of local lipid composition that correlate with interfacially promiscuous RAS multimerization. These lipid fingerprints are coupled to RAS dynamics, predicted to influence effector binding, and therefore may be a mechanism for regulating cell signaling cascades.


Subject(s)
Cell Membrane/enzymology , Lipids/chemistry , Machine Learning , Molecular Dynamics Simulation , Protein Multimerization , Proto-Oncogene Proteins p21(ras)/chemistry , Signal Transduction , Humans
4.
PLoS Comput Biol ; 19(4): e1011004, 2023 04.
Article in English | MEDLINE | ID: mdl-37099625

ABSTRACT

Mathematical models are often used to explore network-driven cellular processes from a systems perspective. However, a dearth of quantitative data suitable for model calibration leads to models with parameter unidentifiability and questionable predictive power. Here we introduce a combined Bayesian and Machine Learning Measurement Model approach to explore how quantitative and non-quantitative data constrain models of apoptosis execution within a missing data context. We find model prediction accuracy and certainty strongly depend on rigorous data-driven formulations of the measurement, and the size and make-up of the datasets. For instance, two orders of magnitude more ordinal (e.g., immunoblot) data are necessary to achieve accuracy comparable to quantitative (e.g., fluorescence) data for calibration of an apoptosis execution model. Notably, ordinal and nominal (e.g., cell fate observations) non-quantitative data synergize to reduce model uncertainty and improve accuracy. Finally, we demonstrate the potential of a data-driven Measurement Model approach to identify model features that could lead to informative experimental measurements and improve model predictive power.


Subject(s)
Machine Learning , Models, Theoretical , Bayes Theorem , Calibration , Apoptosis
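As a rough illustration of how a non-quantitative, ordinal measurement (such as immunoblot band categories) can enter a calibration likelihood, the sketch below maps a continuous model output to ordered categories with a cumulative-logit model. This is an assumed, commonly used formulation, not necessarily the Measurement Model used in the study; the function names, cutpoints, and `scale` parameter are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_probs(y, cutpoints, scale=1.0):
    """Category probabilities for a continuous model output y (cumulative-logit model).

    cutpoints must be strictly increasing; there are len(cutpoints) + 1 ordered categories.
    """
    # P(category <= k | y) at each cutpoint; a larger y shifts mass to higher categories
    cdf = [sigmoid((c - y) / scale) for c in cutpoints]
    bounds = [0.0] + cdf + [1.0]
    return [bounds[k + 1] - bounds[k] for k in range(len(cutpoints) + 1)]


def ordinal_log_likelihood(outputs, observed, cutpoints, scale=1.0):
    """Sum of log P(observed category | simulated output) over all measurements."""
    return sum(
        math.log(ordinal_probs(y, cutpoints, scale)[k])
        for y, k in zip(outputs, observed)
    )
```

Each ordinal observation only says which interval the output falls into, so it constrains parameters less than a quantitative value would; this is one intuition for why far more ordinal data may be needed for comparable accuracy.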
5.
J Chem Inf Model ; 63(5): 1438-1453, 2023 03 13.
Article in English | MEDLINE | ID: mdl-36808989

ABSTRACT

Direct-acting antivirals for the treatment of the COVID-19 pandemic caused by the SARS-CoV-2 virus are needed to complement vaccination efforts. Given the ongoing emergence of new variants, fast workflows for antiviral lead discovery based on automated experimentation and active learning remain critical to our ability to address the pandemic's evolution in a timely manner. While several such pipelines have been introduced to discover candidates with noncovalent interactions with the main protease (Mpro), here we developed a closed-loop artificial intelligence pipeline to design electrophilic warhead-based covalent candidates. This work presents a deep learning-assisted automated computational workflow that adds linkers and an electrophilic "warhead" to design covalent candidates and incorporates cutting-edge experimental techniques for validation. Using this process, promising candidates in the library were screened, and several potential hits were identified and tested experimentally using native mass spectrometry and fluorescence resonance energy transfer (FRET)-based screening assays. We identified four chloroacetamide-based covalent inhibitors of Mpro with micromolar affinities (KI of 5.27 µM) using our pipeline. Experimentally resolved binding modes for each compound were determined using room-temperature X-ray crystallography and are consistent with the predicted poses. The induced conformational changes based on molecular dynamics simulations further suggest that the dynamics may be an important factor to further improve selectivity, thereby effectively lowering KI and reducing toxicity. These results demonstrate the utility of our modular and data-driven approach for potent and selective covalent inhibitor discovery and provide a platform to apply it to other emerging targets.


Subject(s)
COVID-19 , Hepatitis C, Chronic , Humans , SARS-CoV-2/metabolism , Antiviral Agents/pharmacology , Pandemics , Artificial Intelligence , Protease Inhibitors/pharmacology , Molecular Docking Simulation
6.
Proc Natl Acad Sci U S A ; 117(39): 24258-24268, 2020 09 29.
Article in English | MEDLINE | ID: mdl-32913056

ABSTRACT

The small GTPase KRAS is localized at the plasma membrane where it functions as a molecular switch, coupling extracellular growth factor stimulation to intracellular signaling networks. In this process, KRAS recruits effectors, such as RAF kinase, to the plasma membrane where they are activated by a series of complex molecular steps. Defining the membrane-bound state of KRAS is fundamental to understanding the activation of RAF kinase and in evaluating novel therapeutic opportunities for the inhibition of oncogenic KRAS-mediated signaling. We combined multiple biophysical measurements and computational methodologies to generate a consensus model for authentically processed, membrane-anchored KRAS. In contrast to the two membrane-proximal conformations previously reported, we identify a third significantly populated state using a combination of neutron reflectivity, fast photochemical oxidation of proteins (FPOP), and NMR. In this highly populated state, which we refer to as "membrane-distal" and estimate to comprise ∼90% of the ensemble, the G-domain does not directly contact the membrane but is tethered via its C-terminal hypervariable region and carboxymethylated farnesyl moiety, as shown by FPOP. Subsequent interaction of the RAF1 RAS binding domain with KRAS does not significantly change G-domain configurations on the membrane but affects their relative populations. Overall, our results are consistent with a directional fly-casting mechanism for KRAS, in which the membrane-distal state of the G-domain can effectively recruit RAF kinase from the cytoplasm for activation at the membrane.


Subject(s)
Proto-Oncogene Proteins p21(ras)/metabolism , raf Kinases/metabolism , Cell Membrane/metabolism , Molecular Dynamics Simulation
7.
Int J High Perform Comput Appl ; 37(1): 28-44, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36647365

ABSTRACT

We seek to completely revise current models of airborne transmission of respiratory viruses by providing never-before-seen atomic-level views of the SARS-CoV-2 virus within a respiratory aerosol. Our work dramatically extends the capabilities of multiscale computational microscopy to address the significant gaps that exist in current experimental methods, which are limited in their ability to interrogate aerosols at the atomic/molecular level and thus obscure our understanding of airborne transmission. We demonstrate how our integrated data-driven platform provides a new way of exploring the composition, structure, and dynamics of aerosols and aerosolized viruses, while driving simulation method development along several important axes. We present a series of initial scientific discoveries for the SARS-CoV-2 Delta variant, noting that the full scientific impact of this work has yet to be realized.

9.
J Chem Inf Model ; 62(1): 116-128, 2022 01 10.
Article in English | MEDLINE | ID: mdl-34793155

ABSTRACT

Despite the recent availability of vaccines against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the search for inhibitory therapeutic agents has assumed importance, especially in the context of emerging new viral variants. In this paper, we describe the discovery of a novel noncovalent small-molecule inhibitor, MCULE-5948770040, that binds to and inhibits the SARS-CoV-2 main protease (Mpro), found by employing a scalable high-throughput virtual screening (HTVS) framework and a targeted library of over 6.5 million readily purchasable molecules. Our HTVS framework leverages the U.S. supercomputing infrastructure, achieving nearly 91% resource utilization and nearly 126 million docking calculations per hour. Downstream biochemical assays validate this Mpro inhibitor with an inhibition constant (Ki) of 2.9 µM (95% CI 2.2, 4.0). Furthermore, using room-temperature X-ray crystallography, we show that MCULE-5948770040 binds to a cleft in the primary binding site of Mpro, forming stable hydrogen bond and hydrophobic interactions. We then used multiple µs-timescale molecular dynamics (MD) simulations and machine learning (ML) techniques to elucidate how the bound ligand alters the conformational states accessed by Mpro, involving motions both proximal and distal to the binding site. Together, our results demonstrate how MCULE-5948770040 inhibits Mpro and offer a springboard for further therapeutic design.


Subject(s)
COVID-19 , Protease Inhibitors , Antiviral Agents , Coronavirus 3C Proteases , Humans , Molecular Docking Simulation , Molecular Dynamics Simulation , Orotic Acid/analogs & derivatives , Piperazines , SARS-CoV-2
10.
Proc Natl Acad Sci U S A ; 116(11): 5086-5095, 2019 03 12.
Article in English | MEDLINE | ID: mdl-30808805

ABSTRACT

The lysosomal enzyme glucocerebrosidase-1 (GCase) catalyzes the cleavage of a major glycolipid, glucosylceramide, into glucose and ceramide. The absence of fully functional GCase leads to the accumulation of its lipid substrates in lysosomes, causing Gaucher disease, an autosomal recessive disorder that displays profound genotype-phenotype nonconcordance. More than 250 disease-causing mutations in GBA1, the gene encoding GCase, have been discovered, although only one of these, N370S, causes 70% of disease. Here, we have used a knowledge-based docking protocol that considers experimental protein-protein binding data to generate a complex between GCase and its known facilitator protein saposin C (SAPC). Multiscale molecular-dynamics simulations were used to study lipid self-assembly, membrane insertion, and the dynamics of the interactions between different components of the complex. Deep learning was applied to propose a model that explains the mechanism of GCase activation, which requires SAPC. Notably, we find that conformational changes in the loops at the entrance of the substrate-binding site are stabilized by direct interactions with SAPC and that the loss of such interactions induced by N370S and another common mutation, L444P, results in destabilization of the complex and reduced GCase activation. Our findings provide an atomistic-level explanation for GCase activation and the precise mechanism through which N370S and L444P cause Gaucher disease.


Subject(s)
Deep Learning , Gaucher Disease/enzymology , Gaucher Disease/physiopathology , Glucosylceramidase/metabolism , Molecular Dynamics Simulation , Catalytic Domain , Enzyme Activation , Glucosylceramidase/chemistry , Humans , Hydrogen Bonding , Mutant Proteins/chemistry , Protein Interaction Maps , Protein Structure, Secondary , Saposins/metabolism
11.
Int J High Perform Comput Appl ; 36(5-6): 603-623, 2022 Nov.
Article in English | MEDLINE | ID: mdl-38464362

ABSTRACT

The severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) replication transcription complex (RTC) is a multi-domain protein responsible for replicating and transcribing the viral mRNA inside a human cell. Attacking RTC function with pharmaceutical compounds is a pathway to treating COVID-19. Conventional tools, e.g., cryo-electron microscopy and all-atom molecular dynamics (AAMD), do not provide sufficiently high resolution or timescale to capture important dynamics of this molecular machine. Consequently, we develop an innovative workflow that bridges the gap between these resolutions, using mesoscale fluctuating finite element analysis (FFEA) continuum simulations and a hierarchy of AI-methods that continually learn and infer features for maintaining consistency between AAMD and FFEA simulations. We leverage a multi-site distributed workflow manager to orchestrate AI, FFEA, and AAMD jobs, providing optimal resource utilization across HPC centers. Our study provides unprecedented access to study the SARS-CoV-2 RTC machinery, while providing general capability for AI-enabled multi-resolution simulations at scale.

12.
J Biol Chem ; 295(4): 1105-1119, 2020 01 24.
Article in English | MEDLINE | ID: mdl-31836666

ABSTRACT

Neurofibromin is a tumor suppressor encoded by the NF1 gene, which is mutated in the RASopathy neurofibromatosis type 1. Defects in NF1 lead to aberrant signaling through the RAS-mitogen-activated protein kinase pathway due to disruption of the neurofibromin GTPase-activating function on RAS family small GTPases. Very little is known about the function of most of the neurofibromin protein; to date, biochemical and structural data exist only for its GTPase-activating protein (GAP) domain and a region containing a Sec-PH motif. To better understand the role of this large protein, here we carried out a series of biochemical and biophysical experiments, including size-exclusion chromatography-multiangle light scattering (SEC-MALS), small-angle X-ray and neutron scattering, and analytical ultracentrifugation, indicating that full-length neurofibromin forms a high-affinity dimer. We observed that neurofibromin dimerization also occurs in human cells and likely has biological and clinical implications. Analysis of purified full-length and truncated neurofibromin variants by negative-stain EM revealed the overall architecture of the dimer and predicted the potential interactions that contribute to the dimer interface. We could reconstitute structures resembling high-affinity full-length dimers by mixing N- and C-terminal protein domains in vitro. The reconstituted neurofibromin was capable of GTPase activation in vitro, and co-expression of the two domains in human cells effectively recapitulated the activity of full-length neurofibromin. Taken together, these results suggest how neurofibromin dimers might form and be stabilized within the cell.


Subject(s)
Neurofibromin 1/chemistry , Neurofibromin 1/metabolism , Protein Multimerization , HEK293 Cells , Humans , Neurofibromin 1/ultrastructure , Protein Domains , Structure-Activity Relationship , ras GTPase-Activating Proteins/metabolism
13.
J Chem Inf Model ; 61(12): 5793-5803, 2021 12 27.
Article in English | MEDLINE | ID: mdl-34905348

ABSTRACT

Perfluoroalkyl and polyfluoroalkyl substances (PFAS) pose a significant hazard because of their widespread industrial uses, environmental persistence, and bioaccumulation. A growing, increasingly diverse inventory of PFAS, including 8163 chemicals, has recently been updated by the U.S. Environmental Protection Agency. However, with the exception of a handful of well-studied examples, little is known about their human toxicity potential because of the substantial resources required for in vivo toxicity experiments. We tackle the problem of expensive in vivo experiments by evaluating multiple machine learning (ML) methods, including random forests, deep neural networks (DNN), graph convolutional networks, and Gaussian processes, for predicting acute toxicity (e.g., median lethal dose, or LD50) of PFAS compounds. To address the scarcity of toxicity information for PFAS, publicly available datasets of oral rat LD50 for all organic compounds are aggregated and used to develop state-of-the-art ML source models for transfer learning. A total of 519 fluorinated compounds containing two or more C-F bonds with known toxicity are used for knowledge transfer to ensembles of the best-performing source model, DNN, to generate the target models for the PFAS domain with access to uncertainty. This study predicts toxicity for PFAS with a defined chemical structure. To further inform prediction confidence, the transfer-learned model is embedded within a SelectiveNet architecture, where the model is allowed to identify regions of prediction with greater confidence and abstain from those with high uncertainty using a calibrated cutoff rate.


Subject(s)
Fluorocarbons , Animals , Fluorocarbons/chemistry , Fluorocarbons/toxicity , Machine Learning , Neural Networks, Computer , Rats , Uncertainty
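The abstention idea above (predict only where confidence is high, abstain elsewhere using a calibrated cutoff) can be sketched with a simpler stand-in than SelectiveNet: use the spread of an ensemble's predictions as the uncertainty, pick a cutoff on a held-out calibration set so that a target fraction of compounds is covered, and abstain above it. This is a swapped-in, threshold-based technique, not the SelectiveNet architecture itself, which learns a separate selection head; all function names are hypothetical.

```python
import math
import statistics


def ensemble_summary(member_preds):
    """Mean and sample standard deviation of one compound's predictions across ensemble members."""
    return statistics.fmean(member_preds), statistics.stdev(member_preds)


def calibrate_cutoff(calib_uncertainties, target_coverage):
    """Uncertainty cutoff that keeps at least target_coverage of a held-out calibration set."""
    ranked = sorted(calib_uncertainties)
    k = max(1, math.ceil(target_coverage * len(ranked)))
    return ranked[k - 1]


def selective_predict(member_preds, cutoff):
    """Return the ensemble mean (e.g. a predicted log LD50), or None to abstain when too uncertain."""
    mean, spread = ensemble_summary(member_preds)
    return mean if spread <= cutoff else None
```

Lowering `target_coverage` trades fewer answered compounds for predictions restricted to the lower-uncertainty region.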
14.
J Chem Inf Model ; 61(6): 3058-3073, 2021 06 28.
Article in English | MEDLINE | ID: mdl-34124899

ABSTRACT

β-Coronaviruses (β-CoVs) alone have been responsible for three major global outbreaks in the 21st century. The current crisis has led to an urgent requirement to develop therapeutics. Even though a number of vaccines are available, alternative strategies targeting essential viral components are required as a backup against the emergence of lethal viral variants. One such target is the main protease (Mpro), which plays an indispensable role in viral replication. The availability of over 270 Mpro X-ray structures in complex with inhibitors provides unique insights into ligand-protein interactions. Herein, we provide a comprehensive comparison of all nonredundant ligand-binding sites available for SARS-CoV-2, SARS-CoV, and MERS-CoV Mpro. Extensive adaptive sampling has been used to investigate structural conservation of ligand-binding sites using Markov state models (MSMs) and to compare conformational dynamics employing convolutional variational autoencoder-based deep learning. Our results indicate that not all ligand-binding sites are dynamically conserved despite high sequence and structural conservation across β-CoV homologs. This highlights the complexity of targeting all three Mpro enzymes with a single pan inhibitor.


Subject(s)
COVID-19 , Peptide Hydrolases , Antiviral Agents , Binding Sites , Humans , Ligands , Protease Inhibitors , RNA, Viral , SARS-CoV-2
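The Markov state models mentioned above start from a trajectory that has already been discretized into conformational states. A minimal sketch of the core estimation step, assuming such a state sequence is given, counts transitions at a lag time, row-normalizes them into a transition matrix, and recovers the stationary populations by power iteration. Featurization, clustering, and reversible (detailed-balance) estimation used in practice are omitted, and the function names are hypothetical.

```python
def transition_matrix(dtraj, n_states, lag=1):
    """Row-stochastic transition matrix estimated from a discrete state trajectory at a given lag."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for i in range(len(dtraj) - lag):
        counts[dtraj[i]][dtraj[i + lag]] += 1.0
    T = []
    for row in counts:
        total = sum(row)
        # A state never left within the data gets an all-zero row here; real pipelines prune such states.
        T.append([c / total if total else 0.0 for c in row])
    return T


def stationary_distribution(T, iters=1000):
    """Equilibrium state populations via repeated application of pi <- pi T."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
    return pi
```

For example, the short trajectory `[0, 0, 1, 1, 0, 0, 1, 1]` yields transition probabilities of 0.5/0.5 out of state 0 and 1/3 and 2/3 out of state 1, giving stationary populations of 0.4 and 0.6.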
15.
Int J High Perform Comput Appl ; 35(5): 432-451, 2021 Sep.
Article in English | MEDLINE | ID: mdl-38603008

ABSTRACT

We develop a generalizable AI-driven workflow that leverages heterogeneous HPC resources to explore the time-dependent dynamics of molecular systems. We use this workflow to investigate the mechanisms of infectivity of the SARS-CoV-2 spike protein, the main viral infection machinery. Our workflow enables more efficient investigation of spike dynamics in a variety of complex environments, including within a complete SARS-CoV-2 viral envelope simulation, which contains 305 million atoms and shows strong scaling on ORNL Summit using NAMD. We present several novel scientific discoveries, including the elucidation of the spike's full glycan shield, the role of spike glycans in modulating the infectivity of the virus, and the characterization of the flexible interactions between the spike and the human ACE2 receptor. We also demonstrate how AI can accelerate conformational sampling across different systems and pave the way for the future application of such methods to additional studies in SARS-CoV-2 and other molecular systems.

16.
Proteomics ; 20(5-6): e1800407, 2020 03.
Article in English | MEDLINE | ID: mdl-32068959

ABSTRACT

Aging biology is intimately associated with dysregulated metabolism, which is one of the hallmarks of aging. Aging-related pathways such as mTOR and AMPK, which are major targets of anti-aging interventions including rapamycin, metformin, and exercise, either directly regulate or intersect with metabolic pathways. In this review, numerous candidate biomarkers of aging that have emerged from metabolomics are outlined. Metabolomics studies also reveal that not all metabolites are created equal. A set of core "hub" metabolites is emerging as central mediators of aging. The hub metabolites reviewed here are nicotinamide adenine dinucleotide, reduced nicotinamide adenine dinucleotide phosphate, α-ketoglutarate, and β-hydroxybutyrate. These hub metabolites have signaling and epigenetic roles along with their canonical roles as cofactors or intermediates of carbon metabolism. Together, these hub metabolites suggest a central role of the TCA cycle in signaling and metabolic dysregulation associated with aging.


Subject(s)
Aging , Metabolic Networks and Pathways , Metabolome , 3-Hydroxybutyric Acid/genetics , 3-Hydroxybutyric Acid/metabolism , Animals , Biomarkers/metabolism , Citric Acid Cycle , DNA Damage , Epigenesis, Genetic , Humans , Ketoglutaric Acids/metabolism , Metabolomics/methods , NAD/genetics , NAD/metabolism , NADP/genetics , NADP/metabolism
17.
Biophys J ; 117(3): 429-444, 2019 08 06.
Article in English | MEDLINE | ID: mdl-31349988

ABSTRACT

Cardiolipin is an anionic lipid found in the mitochondrial membranes of eukaryotes ranging from unicellular microorganisms to metazoans. This unique lipid contributes to various mitochondrial functions, including metabolism, mitochondrial membrane fusion and/or fission dynamics, and apoptosis. However, differences in cardiolipin content between the two mitochondrial membranes, as well as dynamic fluctuations in cardiolipin content in response to stimuli and cellular signaling events, raise questions about how cardiolipin concentration affects mitochondrial membrane structure and dynamics. Although cardiolipin's structural and dynamic roles have been extensively studied in binary mixtures with other phospholipids, the biophysical properties of cardiolipin in mixtures of three or more lipid species are still not well resolved. Here, we used molecular dynamics simulations to investigate the cardiolipin-dependent properties of ternary lipid bilayer systems that mimic the major components of mitochondrial membranes. We found that changes to cardiolipin concentration resulted in only minor changes to bilayer structural features but significantly affected lipid diffusion. We also found that cardiolipin position along the bilayer surfaces correlated with negative curvature deflections, consistent with the induction of negative curvature stress in the membrane monolayers. This work contributes to a foundational understanding of the role of cardiolipin in altering the properties of ternary lipid mixtures composed of the major mitochondrial phospholipids, providing much-needed insights into how cardiolipin concentration modulates the biophysical properties of mitochondrial membranes.


Subject(s)
Cardiolipins/chemistry , Mitochondrial Membranes/metabolism , Molecular Dynamics Simulation , Diffusion , Hydrophobic and Hydrophilic Interactions , Lipid Bilayers/chemistry , Phospholipids/chemistry
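Lateral lipid diffusion in bilayer simulations like those above is commonly quantified from the mean squared displacement (MSD) through the two-dimensional Einstein relation, MSD(t) ≈ 4Dt. The sketch below assumes one lipid's in-plane (x, y) positions sampled at a fixed time step; it does not remove leaflet center-of-mass drift or average over many lipids, both of which a real analysis would need, and the function names are hypothetical.

```python
def mean_squared_displacement(xy, max_lag):
    """MSD for lags 1..max_lag from one lipid's lateral (x, y) positions at equal time steps."""
    msd = []
    for lag in range(1, max_lag + 1):
        sq = [
            (xy[i + lag][0] - xy[i][0]) ** 2 + (xy[i + lag][1] - xy[i][1]) ** 2
            for i in range(len(xy) - lag)
        ]
        msd.append(sum(sq) / len(sq))
    return msd


def lateral_diffusion_coefficient(msd, dt):
    """Least-squares slope of MSD vs. time through the origin, divided by 4 (2D Einstein relation)."""
    times = [dt * (k + 1) for k in range(len(msd))]
    slope = sum(t * m for t, m in zip(times, msd)) / sum(t * t for t in times)
    return slope / 4.0
```

The units of D follow from the inputs; for example, positions in nm and `dt` in ns give D in nm²/ns.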
18.
Biophys J ; 114(9): 2040-2043, 2018 05 08.
Article in English | MEDLINE | ID: mdl-29742397

ABSTRACT

Anharmonicity in time-dependent conformational fluctuations is noted to be a key feature of functional dynamics of biomolecules. Although anharmonic events are rare, long-timescale (µs-ms and beyond) simulations facilitate probing of such events. We have previously developed quasi-anharmonic analysis to resolve higher-order spatial correlations and characterize anharmonicity in biomolecular simulations. In this article, we have extended this toolbox to resolve higher-order temporal correlations and built a scalable Python package called anharmonic conformational analysis (ANCA). ANCA has modules to: 1) measure anharmonicity in the form of higher-order statistics and its variation as a function of time, 2) output a storyboard representation of the simulations to identify key anharmonic conformational events, and 3) identify putative anharmonic conformational substates and visualize transitions between these substates.


Subject(s)
Molecular Dynamics Simulation , Animals , Aprotinin/chemistry , Aprotinin/metabolism , Cattle , Movement , Protein Conformation
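One simple higher-order statistic for anharmonicity is the excess kurtosis of a coordinate's fluctuations, which is zero for a Gaussian (harmonic) distribution; tracking it over sliding windows gives a crude view of how anharmonicity varies with time. The sketch below illustrates only that idea in plain Python; it does not use or reproduce the ANCA package's API, and the function names and window parameters are hypothetical.

```python
import statistics


def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (zero for a Gaussian); x must not be constant."""
    m = statistics.fmean(x)
    var = statistics.fmean([(v - m) ** 2 for v in x])
    m4 = statistics.fmean([(v - m) ** 4 for v in x])
    return m4 / var ** 2 - 3.0


def windowed_kurtosis(series, window, step):
    """Excess kurtosis of one coordinate's time series over sliding windows."""
    return [
        excess_kurtosis(series[s:s + window])
        for s in range(0, len(series) - window + 1, step)
    ]
```

Positive values flag windows dominated by rare, large excursions (heavy tails), which is the kind of event a storyboard view would highlight; negative values indicate flatter, bimodal-like fluctuations.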
19.
BMC Bioinformatics ; 19(Suppl 18): 484, 2018 Dec 21.
Article in English | MEDLINE | ID: mdl-30577777

ABSTRACT

BACKGROUND: We examine the problem of clustering biomolecular simulations using deep learning techniques. Since biomolecular simulation datasets are inherently high dimensional, it is often necessary to build low dimensional representations that can be used to extract quantitative insights into the atomistic mechanisms that underlie complex biological processes. RESULTS: We use a convolutional variational autoencoder (CVAE) to learn low dimensional, biophysically relevant latent features from long time-scale protein folding simulations in an unsupervised manner. We demonstrate our approach on three model protein folding systems, namely Fs-peptide (14 µs aggregate sampling), villin headpiece (single trajectory of 125 µs) and ββα (BBA) protein (223 + 102 µs sampling across two independent trajectories). In these systems, we show that the learned CVAE latent features correspond to distinct conformational substates along the protein folding pathways. The CVAE model predicts, on average, nearly 89% of all contacts within the folding trajectories correctly, while being able to extract folded, unfolded and potentially misfolded states in an unsupervised manner. Further, the CVAE model can be used to learn latent features of protein folding that can be applied to other independent trajectories, making it particularly attractive for identifying intrinsic features that correspond to conformational substates that share similar structural features. CONCLUSIONS: Together, we show that the CVAE model can quantitatively describe complex biophysical processes such as protein folding.


Subject(s)
Protein Folding , Cluster Analysis , Molecular Dynamics Simulation
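A contact-based view of each simulation frame, as referred to above, can be built by marking residue pairs whose coordinates lie within a distance cutoff. The sketch below assumes one representative coordinate per residue (such as Cα) and an 8 Å cutoff, both illustrative choices, and scores a reconstructed map by the fraction of true contacts it recovers. That recall measure is only one plausible reading of "contacts predicted correctly" and is not claimed to be the study's metric; the function names are hypothetical.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Symmetric 0/1 contact matrix: 1 where two residues lie closer than cutoff (same units as coords)."""
    n = len(coords)
    return [
        [1 if i != j and math.dist(coords[i], coords[j]) < cutoff else 0 for j in range(n)]
        for i in range(n)
    ]


def contact_recall(true_map, pred_map):
    """Fraction of true contacts (upper triangle) that the reconstructed map also marks as contacts."""
    hits = total = 0
    for i in range(len(true_map)):
        for j in range(i + 1, len(true_map)):
            if true_map[i][j]:
                total += 1
                hits += pred_map[i][j]
    return hits / total if total else float("nan")
```

A decoder that outputs contact probabilities would need to be thresholded (for instance at 0.5, another assumption) before it can be compared with `contact_recall`.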
20.
BMC Bioinformatics ; 19(Suppl 18): 488, 2018 Dec 21.
Article in English | MEDLINE | ID: mdl-30577743

ABSTRACT

BACKGROUND: Deep Learning (DL) has advanced the state-of-the-art capabilities in bioinformatics applications, resulting in increasingly sophisticated and computationally demanding models trained on ever larger datasets. This vastly increased computational demand challenges the feasibility of conducting cutting-edge research. One solution is to distribute the vast computational workload across multiple computing cluster nodes with data parallelism algorithms. In this study, we used a High-Performance Computing environment and implemented the Downpour Stochastic Gradient Descent algorithm for data parallelism to train a Convolutional Neural Network (CNN) for the natural language processing task of information extraction from a massive dataset of cancer pathology reports. We evaluated the scalability improvements using data parallelism training and the Titan supercomputer at Oak Ridge Leadership Computing Facility. To evaluate scalability, we used different numbers of worker nodes and performed a set of experiments comparing the effects of different training batch sizes and optimizer functions. RESULTS: We found that Adadelta consistently converged to a lower validation loss, though it required over twice as many training epochs as the fastest converging optimizer, RMSProp. The Adam optimizer consistently achieved a close second-place minimum validation loss significantly faster; batch sizes of 16 and 32 allowed the network to converge in only 4.5 training epochs. CONCLUSIONS: We demonstrated that the networked training process is scalable across multiple compute nodes communicating via the Message Passing Interface (MPI) while achieving higher classification accuracy compared to a traditional machine learning algorithm.


Subject(s)
Computing Methodologies , Deep Learning/trends , Neoplasms/diagnosis , Comprehension , Humans , Neoplasms/pathology , Neural Networks, Computer
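The data-parallel scheme described above can be illustrated with a single-process simulation of Downpour-style SGD: a parameter server holds the shared weights, each worker trains on its own data shard, and workers refresh their copy of the weights only every few steps, so many gradients are computed on stale parameters. This is a toy sketch, not the study's MPI implementation: the model is a one-parameter linear fit, plain SGD replaces the optimizers compared in the study, and round-robin scheduling stands in for truly concurrent nodes. All names are hypothetical.

```python
import random


class ParameterServer:
    """Holds the shared model parameter and applies gradients pushed by workers."""

    def __init__(self, w0, lr):
        self.w = w0
        self.lr = lr

    def fetch(self):
        return self.w

    def push(self, grad):
        self.w -= self.lr * grad


def mse_gradient(w, batch):
    """d/dw of the mean squared error of the toy model y ≈ w * x on one mini-batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def downpour_sgd(shards, steps, lr=0.01, fetch_every=3, batch_size=4, seed=0):
    """Simulate workers that push gradients every step but fetch parameters only every fetch_every steps."""
    rng = random.Random(seed)
    server = ParameterServer(0.0, lr)
    local_w = [server.fetch() for _ in shards]
    for step in range(steps):
        # Round-robin over workers stands in for concurrent processes on separate nodes.
        for k, shard in enumerate(shards):
            if step % fetch_every == 0:
                local_w[k] = server.fetch()  # occasional refresh; otherwise the copy is stale
            batch = rng.sample(shard, batch_size)
            server.push(mse_gradient(local_w[k], batch))
    return server.w


shards = [
    [(x, 3.0 * x) for x in (1.0, 1.25, 1.5, 1.75, 2.0)],
    [(x, 3.0 * x) for x in (1.1, 1.3, 1.6, 1.8, 1.9)],
]
w = downpour_sgd(shards, steps=300)  # approaches the true slope 3.0 despite stale updates
```

Raising `fetch_every` or `lr` increases staleness-induced error per update, which is one reason asynchronous schemes are sensitive to the choice of optimizer and batch size.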