RESUMO
The grid inhomogeneous solvation theory (GIST) method requires the often time-consuming calculation of water-water and water-solute energy on a grid. Previous efforts to speed up this calculation include using OpenMP, GPUs, and particle mesh Ewald. This article details how the speed of this calculation can be increased by parallelizing it with MPI, where trajectory frames are divided among multiple processors. This requires very little communication between individual processes during trajectory processing, meaning the calculation scales well to large processor counts. This article also details how the entropy calculation, which must happen after trajectory processing since it requires information from all trajectory frames, is parallelized via MPI. This parallelized GIST method has been implemented in the freely-available CPPTRAJ analysis software.
RESUMO
AmberTools is a free and open-source collection of programs used to set up, run, and analyze molecular simulations. The newer features contained within AmberTools23 are briefly described in this Application note.
RESUMO
Setting up molecular dynamics simulations from experimentally determined structures is often complicated by a variety of factors, particularly the inclusion of carbohydrates, since these have several anomer types which can be linked in a variety of ways. Here we present a stand-alone tool implemented in the widely-used software CPPTRAJ that can be used to automate building structures and generating a "ready to run" parameter and coordinate file pair. This tool automatically identifies carbohydrate anomer type, configuration, linkage, and functional groups, and performs topology modifications (e.g., renaming residue/atom names) required to build the final system using state of the art GLYCAM force field parameters. It will also generate the necessary commands for bonding carbohydrates and creating any disulfide bonds.
Assuntos
Simulação de Dinâmica Molecular , Software , Carboidratos/químicaRESUMO
The selectivity filter (SF) of bacterial voltage-gated sodium channels consists of four glutamate residues arranged in a C4 symmetry. The protonation state population of this tetrad is unclear. To address this question, we simulate the pore domain of bacterial voltage-gated sodium channel of Magnetococcus sp. (Nav Ms) through constant pH methodology in explicit solvent and free energy perturbation calculations. We find that at physiological pH the fully deprotonated as well as singly and doubly protonated states of the SF appear feasible, and that the calculated pKa decreases with each additional bound ion, suggesting that a decrease in the number of ions in the pore can lead to protonation of the SF. Previous molecular dynamics simulations have suggested that protonation can lead to a decrease in the conductance, but no pKa calculations were performed. We confirm a decreased ionic population of the pore with protonation, and also observe structural symmetry breaking triggered by protonation; the SF of the deprotonated channel is closest to the C4 symmetry observed in crystal structures of the open state, while the SF of protonated states display greater levels of asymmetry which could lead to transition to the inactivated state which possesses a C2 symmetry in the crystal structure. We speculate that the decrease in the number of ions near the mouth of the channel, due to either random fluctuations or ion depletion due to conduction, could be a self-regulatory mechanism resulting in a nonconducting state that functionally resembles inactivated states.
Assuntos
Alphaproteobacteria/química , Proteínas de Bactérias/química , Prótons , Sódio/química , Canais de Sódio Disparados por Voltagem/química , Alphaproteobacteria/metabolismo , Proteínas de Bactérias/metabolismo , Sítios de Ligação , Cátions Monovalentes , Cristalografia por Raios X , Concentração de Íons de Hidrogênio , Transporte de Íons , Cinética , Simulação de Dinâmica Molecular , Ligação Proteica , Conformação Proteica em alfa-Hélice , Domínios e Motivos de Interação entre Proteínas , Sódio/metabolismo , Termodinâmica , Canais de Sódio Disparados por Voltagem/metabolismoRESUMO
Before beginning the production phase of molecular dynamics simulations, i.e., the phase that produces the data to be analyzed, it is often necessary to first perform a series of one or more preparatory minimizations and/or molecular dynamics simulations in order to ensure that subsequent production simulations are stable. This is particularly important for simulations with explicit solvent molecules. Despite the preparatory minimizations and simulations being ubiquitous and essential for stable production simulations, there are currently no general recommended procedures to perform them and very few criteria to decide whether the system is capable of producing a stable simulation trajectory. Here, we propose a simple and well-defined ten step simulation preparation protocol for explicitly solvated biomolecules, which can be applied to a wide variety of system types, as well as a simple test based on the system density for determining whether the simulation is stabilized.
RESUMO
Advances in biomolecular simulation methods and access to large scale computer resources have led to a massive increase in the amount of data generated. The key enablers have been optimization and parallelization of the simulation codes. However, much of the software used to analyze trajectory data from these simulations is still run in serial, or in some cases many threads via shared memory. Here, we describe the addition of multiple levels of parallel trajectory processing to the molecular dynamics simulation analysis software CPPTRAJ. In addition to the existing OpenMP shared-memory parallelism, CPPTRAJ now has two additional levels of message passing (MPI) parallelism involving both across-trajectory processing and across-ensemble processing. All three levels of parallelism can be simultaneously active, leading to significant speed ups in data analysis of large datasets on the NCSA Blue Waters supercomputer by better leveraging the many available nodes and its parallel file system. © 2018 Wiley Periodicals, Inc.
RESUMO
Recent modifications and improvements to standard nucleic acid force fields have attempted to fix problems and issues that have been observed as longer timescale simulations have become routine. Although previous work has shown the ability to fold the UUCG stem-loop structure, until now no group has attempted to quantify the performance of current force fields using highly converged structural populations of the tetraloop conformational ensemble. In this study, we report the use of multiple independent sets of multidimensional replica exchange molecular dynamics (M-REMD) simulations with different initial conditions to generate well-converged conformational ensembles for the tetranucleotides r(GACC) and r(CCCC), as well as the larger UUCG tetraloop motif. By generating what is to our knowledge the most complete RNA structure ensembles reported to date for these systems, we remove the coupling between force field errors and errors due to incomplete sampling, providing a comprehensive comparison between current top-performing MD force fields for RNA. Of the RNA force fields tested in this study, none demonstrate the ability to correctly identify the most thermodynamically stable structure for all three systems. We discuss the deficiencies present in each potential function and suggest areas where improvements can be made. The results imply that although "short" (nsec-µsec timescale) simulations may stay close to their respective experimental structures and may well reproduce experimental observables, inevitably the current force fields will populate alternative incorrect structures that are more stable than those observed via experiment.
Assuntos
Biologia Computacional/métodos , RNA/química , Espectroscopia de Ressonância Magnética , Simulação de Dinâmica Molecular , Conformação de Ácido Nucleico , Motivos de Nucleotídeos , TermodinâmicaRESUMO
BACKGROUND: The structure and dynamics of DNA are critically related to its function. Molecular dynamics simulations augment experiment by providing detailed information about the atomic motions. However, to date the simulations have not been long enough for convergence of the dynamics and structural properties of DNA. METHODS: Molecular dynamics simulations performed with AMBER using the ff99SB force field with the parmbsc0 modifications, including ensembles of independent simulations, were compared to long timescale molecular dynamics performed with the specialized Anton MD engine on the B-DNA structure d(GCACGAACGAACGAACGC). To assess convergence, the decay of the average RMSD values over longer and longer time intervals was evaluated in addition to assessing convergence of the dynamics via the Kullback-Leibler divergence of principal component projection histograms. RESULTS: These molecular dynamics simulations-including one of the longest simulations of DNA published to date at ~44µs-surprisingly suggest that the structure and dynamics of the DNA helix, neglecting the terminal base pairs, are essentially fully converged on the ~1-5µs timescale. CONCLUSIONS: We can now reproducibly converge the structure and dynamics of B-DNA helices, omitting the terminal base pairs, on the µs time scale with both the AMBER and CHARMM C36 nucleic acid force fields. Results from independent ensembles of simulations starting from different initial conditions, when aggregated, match the results from long timescale simulations on the specialized Anton MD engine. GENERAL SIGNIFICANCE: With access to large-scale GPU resources or the specialized MD engine "Anton" it is possible for a variety of molecular systems to reproducibly and reliably converge the conformational ensemble of sampled structures. This article is part of a Special Issue entitled: Recent developments of molecular dynamics.
Assuntos
DNA/química , Simulação de Dinâmica Molecular , Algoritmos , DNA/biossíntese , Reparo do DNA , Replicação do DNA , Movimento (Física) , Conformação de Ácido Nucleico , Análise de Componente Principal , Reprodutibilidade dos Testes , Relação Estrutura-Atividade , Fatores de Tempo , Transcrição GênicaRESUMO
Long time scale molecular dynamics (MD) simulations of biological systems are becoming increasingly commonplace due to the availability of both large-scale computational resources and significant advances in the underlying simulation methodologies. Therefore, it is useful to investigate and develop data mining and analysis techniques to quickly and efficiently extract the biologically relevant information from the incredible amount of generated data. Wavelet analysis (WA) is a technique that can quickly reveal significant motions during an MD simulation. Here, the application of WA on well-converged long time scale (tens of µs) simulations of a DNA helix is described. We show how WA combined with a simple clustering method can be used to identify both the physical and temporal locations of events with significant motion in MD trajectories. We also show that WA can not only distinguish and quantify the locations and time scales of significant motions, but by changing the maximum time scale of WA a more complete characterization of these motions can be obtained. This allows motions of different time scales to be identified or ignored as desired.
Assuntos
DNA/química , Simulação de Dinâmica Molecular , Análise de Ondaletas , Sequência de Bases , DNA/genética , Cinética , Conformação de Ácido NucleicoRESUMO
One of the key challenges of k-means clustering is the seed selection or the initial centroid estimation since the clustering result depends heavily on this choice. Alternatives such as k-means++ have mitigated this limitation by estimating the centroids using an empirical probability distribution. However, with high-dimensional and complex datasets such as those obtained from molecular simulation, k-means++ fails to partition the data in an optimal manner. Furthermore, stochastic elements in all flavors of k-means++ will lead to a lack of reproducibility. K-means N-Ary Natural Initiation (NANI) is presented as an alternative to tackle this challenge by using efficient n-ary comparisons to both identify high-density regions in the data and select a diverse set of initial conformations. Centroids generated from NANI are not only representative of the data and different from one another, helping k-means to partition the data accurately, but also deterministic, providing consistent cluster populations across replicates. From peptide and protein folding molecular simulations, NANI was able to create compact and well-separated clusters as well as accurately find the metastable states that agree with the literature. NANI can cluster diverse datasets and be used as a standalone tool or as part of our MDANCE clustering package.
RESUMO
One of the key challenges of k-means clustering is the seed selection or the initial centroid estimation since the clustering result depends heavily on this choice. Alternatives such as k-means++ have mitigated this limitation by estimating the centroids using an empirical probability distribution. However, with high-dimensional and complex data sets such as those obtained from molecular simulation, k-means++ fails to partition the data in an optimal manner. Furthermore, stochastic elements in all flavors of k-means++ will lead to a lack of reproducibility. K-means N-Ary Natural Initiation (NANI) is presented as an alternative to tackle this challenge by using efficient n-ary comparisons to both identify high-density regions in the data and select a diverse set of initial conformations. Centroids generated from NANI are not only representative of the data and different from one another, helping k-means to partition the data accurately, but also deterministic, providing consistent cluster populations across replicates. From peptide and protein folding molecular simulations, NANI was able to create compact and well-separated clusters as well as accurately find the metastable states that agree with the literature. NANI can cluster diverse data sets and be used as a standalone tool or as part of our MDANCE clustering package.
RESUMO
Modified nucleic acids have surged as a popular therapeutic route, emphasizing the importance of nucleic acid research in drug discovery and development. Beyond well-known RNA vaccines, antisense oligonucleotides and aptamers can incorporate various modified nucleic acids to target specific biomolecules for various therapeutic activities. Molecular dynamics simulations can accelerate the design and development of these systems with noncanonical nucleic acids by observing intricate dynamic properties and relative stability on the all-atom level. However, modeling these modified systems is challenging due to the time and resources required to parametrize components outside default force field parameters. Here, we present modXNA, a tool to derive and build modified nucleotides for use with Amber force fields. Several nucleic acid systems varying in size and number of modification sites were used to evaluate the accuracy of modXNA parameters, and results indicate the dynamics and structure are preserved throughout the simulations. We detail the protocol for quantum mechanics charge derivation and describe a workflow for implementing modXNA in Amber molecular dynamics simulations, which includes updates and added features to CPPTRAJ.
RESUMO
Caddisflies are aquatic relatives of silk-spinning terrestrial moths and butterflies. Casemaker larvae spin adhesive silk fibers for underwater construction of protective composite cases. The central region of Hesperophylax sp. H-fibroin contains a repeating pattern of three conserved subrepeats, all of which contain one or more (SX)n motifs with extensively phosphorylated serines. Native silk fibers were highly extensible and displayed a distinct yield point, force plateau, and load cycle hysteresis. FTIR spectroscopy of native silk showed a conformational mix of random coil, ß-sheet, and turns. Exchanging multivalent ions with Na(+) EDTA disrupted fiber mechanics, shifted the secondary structure ratios from antiparallel ß-sheet toward random coil and turns, and caused the fibers to shorten, swell in diameter, and disrupted fiber birefringence. The EDTA effects were reversed by restoring Ca(2+). Molecular dynamic simulations provided theoretical support for a hypothetical structure in which the (pSX)n motifs may assemble into two- and three-stranded, Ca(2+)-stabilized ß-sheets.
Assuntos
Ciclos de Atividade , Cálcio/química , Seda/química , Animais , Insetos , Modelos Moleculares , Simulação de Dinâmica Molecular , Estrutura Molecular , Resistência à Tração , Água/químicaRESUMO
In conjunction with the recent American Chemical Society symposium titled "Docking and Scoring: A Review of Docking Programs" the performance of the DOCK6 program was evaluated through (1) pose reproduction and (2) database enrichment calculations on a common set of organizer-specified systems and datasets (ASTEX, DUD, WOMBAT). Representative baseline grid score results averaged over five docking runs yield a relatively high pose identification success rate of 72.5 % (symmetry corrected rmsd) and sampling rate of 91.9 % for the multi site ASTEX set (N = 147) using organizer-supplied structures. Numerous additional docking experiments showed that ligand starting conditions, symmetry, multiple binding sites, clustering, and receptor preparation protocols all affect success. Encouragingly, in some cases, use of more sophisticated scoring and sampling methods yielded results which were comparable (Amber score ligand movable protocol) or exceeded (LMOD score) analogous baseline grid-score results. The analysis highlights the potential benefit and challenges associated with including receptor flexibility and indicates that different scoring functions have system dependent strengths and weaknesses. Enrichment studies with the DUD database prepared using the SB2010 preparation protocol and native ligand pairings yielded individual area under the curve (AUC) values derived from receiver operating characteristic curve analysis ranging from 0.29 (bad enrichment) to 0.96 (good enrichment) with an average value of 0.60 (27/38 have AUC ≥ 0.5). Strong early enrichment was also observed in the critically important 1.0-2.0 % region. Somewhat surprisingly, an alternative receptor preparation protocol yielded comparable results. As expected, semi-random pairings yielded poorer enrichments, in particular, for unrelated receptors. Overall, the breadth and number of experiments performed provide a useful snapshot of current capabilities of DOCK6 as well as starting points to guide future development efforts to further improve sampling and scoring.
Assuntos
Bases de Dados de Proteínas , Conformação Proteica , Proteínas/química , Sítios de Ligação , Ligantes , Modelos Moleculares , Ligação Proteica , SoftwareRESUMO
Molecular dynamics (MD) simulations are now able to routinely reach timescales of microseconds and beyond. This has led to a corresponding increase in the amount of MD trajectory data that needs to be stored, particularly when those trajectories contain explicit solvent molecules. As such, it is desirable to be able to compress trajectory data while still retaining as much of the original information as possible. In this work, we describe compressing MD trajectory data using the NetCDF4/HDF5 file format, making use of quantization of the original positions to achieve better compression ratios. We also analyze the affect this has on both the resulting positions and the energies calculated from post-processing these trajectories, and recommend an optimal level of quantization. Overall we find the NetCDF4/HDF5 format to be an excellent choice for storing MD trajectory data in terms of speed, compressibility, and versatility.
Assuntos
Compressão de Dados , Simulação de Dinâmica Molecular , Compressão de Dados/métodos , SolventesRESUMO
Visualizing data generated from molecular dynamics simulations can be difficult, particularly when there can be thousands to millions of trajectory frames. The creation of a 3D grid of atomic density (i.e. a volumetric map) is one way to easily view the long-time average behavior of a system. One way to generate volumetric maps is by approximating each atom with a Gaussian function centered on that atom and spread over neighboring grid cells. However the calculation of the Gaussian function requires evaluation of the exponential function, which is computationally costly. Here we report on speeding up the calculation of volumetric maps from molecular dynamics trajectory data by replacing the expensive exponential function evaluation with an approximation using interpolating cubic splines. We also discuss the errors involved in this approximation, and recommend settings for volumetric map creation based on this.
Assuntos
Simulação de Dinâmica MolecularRESUMO
Grid Inhomogeneous Solvation Theory (GIST) maps out solvation thermodynamic properties on a fine meshed grid and provides a statistical mechanical formalism for thermodynamic end-state calculations. However, differences in how long-range nonbonded interactions are calculated in molecular dynamics engines and in the current implementation of GIST have prevented precise comparisons between free energies estimated using GIST and those from other free energy methods such as thermodynamic integration (TI). Here, we address this by presenting PME-GIST, a formalism by which particle mesh Ewald (PME)-based electrostatic energies and long-range Lennard-Jones (LJ) energies are decomposed and assigned to individual atoms and the corresponding voxels they occupy in a manner consistent with the GIST approach. PME-GIST yields potential energy calculations that are precisely consistent with modern simulation engines and performs these calculations at a dramatically faster speed than prior implementations. Here, we apply PME-GIST end-state analyses to 32 small molecules whose solvation free energies are close to evenly distributed from 2 kcal/mol to -17 kcal/mol and obtain solvation energies consistent with TI calculations (R2 = 0.99, mean unsigned difference 0.8 kcal/mol). We also estimate the entropy contribution from the second and higher order entropy terms that are truncated in GIST by the differences between entropies calculated in TI and GIST. With a simple correction for the high order entropy terms, PME-GIST obtains solvation free energies that are highly consistent with TI calculations (R2 = 0.99, mean unsigned difference = 0.4 kcal/mol) and experimental results (R2 = 0.88, mean unsigned difference = 1.4 kcal/mol). The precision of PME-GIST also enables us to show that the solvation free energy of small hydrophobic and hydrophilic molecules can be largely understood based on perturbations of the solvent in a region extending a few solvation shells from the solute. We have integrated PME-GIST into the open-source molecular dynamics analysis software CPPTRAJ.
RESUMO
The quantitative assessment of uncertainty and sampling quality is essential in molecular simulation. Many systems of interest are highly complex, often at the edge of current computational capabilities. Modelers must therefore analyze and communicate statistical uncertainties so that "consumers" of simulated data understand its significance and limitations. This article covers key analyses appropriate for trajectory data generated by conventional simulation methods such as molecular dynamics and (single Markov chain) Monte Carlo. It also provides guidance for analyzing some 'enhanced' sampling approaches. We do not discuss systematic errors arising, e.g., from inaccuracy in the chosen model or force field.
RESUMO
The effects of the use of three generalized Born (GB) implicit solvent models on the thermodynamics of a simple polyalanine peptide are studied via comparing several hundred nanoseconds of well-converged replica exchange molecular dynamics (REMD) simulations using explicit TIP3P solvent to REMD simulations with the GB solvent models. It is found that when compared to REMD simulations using TIP3P the GB REMD simulations contain significant differences in secondary structure populations, most notably an overabundance of alpha-helical secondary structure. This discrepancy is explored via comparison of the differences in the electrostatic component of the free energy of solvation (DeltaDeltaG(pol)) between TIP3P (via thermodynamic Integration calculations), the GB models, and an implicit solvent model based on the Poisson equation (PE). The electrostatic components of the solvation free energies are calculated using each solvent model for four representative conformations of Ala10. Since the PE model is found to have the best performance with respect to reproducing TIP3P DeltaDeltaG(pol) values, effective Born radii from the GB models are compared to effective Born radii calculated with PE (so-called perfect radii), and significant and numerous deviations in GB radii from perfect radii are found in all GB models. The effect of these deviations on the solvation free energy is discussed, and it is shown that even when perfect radii are used the agreement of GB with TIP3P DeltaDeltaG(pol) values does not improve. This suggests a limit to the optimization of the effective Born radius calculation and that future efforts to improve the accuracy of GB models must extend beyond such optimizations.
Assuntos
Algoritmos , Simulação por Computador , Peptídeos/química , Estrutura Secundária de Proteína , Solventes/química , Modelos Químicos , Eletricidade Estática , TermodinâmicaRESUMO
An experimentally well-studied model of RNA tertiary structures is a 58mer rRNA fragment, known as GTPase-associating center (GAC) RNA, in which a highly negative pocket walled by phosphate oxygen atoms is stabilized by a chelated cation. Although such deep pockets with more than one direct phosphate to ion chelation site normally include magnesium, as shown in one GAC crystal structure, another GAC crystal structure and solution experiments suggest potassium at this site. Both crystal structures also depict two magnesium ions directly bound to the phosphate groups comprising this controversial pocket. Here, we used classical molecular dynamics simulations as well as umbrella sampling to investigate the possibility of binding of potassium versus magnesium inside the pocket and to better characterize the chelation of one of the binding magnesium ions outside the pocket. The results support the preference of the pocket to accommodate potassium rather than magnesium and suggest that one of the closely binding magnesium ions can only bind at high magnesium concentrations, such as might be present during crystallization. This work illustrates the complementary utility of molecular modeling approaches with atomic-level detail in resolving discrepancies between conflicting experimental results.