Results 1 - 20 of 30
1.
J Chem Theory Comput ; 2024 Jun 21.
Article in English | MEDLINE | ID: mdl-38905589

ABSTRACT

One of the key challenges of k-means clustering is the seed selection or the initial centroid estimation since the clustering result depends heavily on this choice. Alternatives such as k-means++ have mitigated this limitation by estimating the centroids using an empirical probability distribution. However, with high-dimensional and complex data sets such as those obtained from molecular simulation, k-means++ fails to partition the data in an optimal manner. Furthermore, stochastic elements in all flavors of k-means++ will lead to a lack of reproducibility. K-means N-Ary Natural Initiation (NANI) is presented as an alternative to tackle this challenge by using efficient n-ary comparisons to both identify high-density regions in the data and select a diverse set of initial conformations. Centroids generated from NANI are not only representative of the data and different from one another, helping k-means to partition the data accurately, but also deterministic, providing consistent cluster populations across replicates. From peptide and protein folding molecular simulations, NANI was able to create compact and well-separated clusters as well as accurately find the metastable states that agree with the literature. NANI can cluster diverse data sets and be used as a standalone tool or as part of our MDANCE clustering package.
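
The reproducibility argument above can be illustrated with a minimal sketch of a deterministic, diversity-driven seed selection: start from the data point closest to the overall mean, then repeatedly add the point farthest from the seeds already chosen. This is a generic farthest-first heuristic used here only for illustration; it is not the NANI algorithm, whose n-ary comparisons are not described in the abstract. The function name `deterministic_seeds` is hypothetical.

```python
import math

def deterministic_seeds(points, k):
    """Pick k seed indices: the point nearest the mean, then farthest-first."""
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    # First seed: the data point closest to the overall mean
    first = min(range(len(points)), key=lambda i: math.dist(points[i], mean))
    seeds = [first]
    # Distance from every point to its nearest chosen seed
    nearest = [math.dist(p, points[first]) for p in points]
    while len(seeds) < k:
        # Next seed: the point farthest from all seeds chosen so far
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        seeds.append(nxt)
        nearest = [min(nearest[i], math.dist(points[i], points[nxt]))
                   for i in range(len(points))]
    return seeds
```

Because no random numbers are drawn, repeated calls on the same data return the same seeds, which is the property the abstract contrasts with k-means++.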

2.
bioRxiv ; 2024 Mar 08.
Article in English | MEDLINE | ID: mdl-38496504

ABSTRACT

One of the key challenges of k-means clustering is the seed selection or the initial centroid estimation since the clustering result depends heavily on this choice. Alternatives such as k-means++ have mitigated this limitation by estimating the centroids using an empirical probability distribution. However, with high-dimensional and complex datasets such as those obtained from molecular simulation, k-means++ fails to partition the data in an optimal manner. Furthermore, stochastic elements in all flavors of k-means++ will lead to a lack of reproducibility. K-means N-Ary Natural Initiation (NANI) is presented as an alternative to tackle this challenge by using efficient n-ary comparisons to both identify high-density regions in the data and select a diverse set of initial conformations. Centroids generated from NANI are not only representative of the data and different from one another, helping k-means to partition the data accurately, but also deterministic, providing consistent cluster populations across replicates. From peptide and protein folding molecular simulations, NANI was able to create compact and well-separated clusters as well as accurately find the metastable states that agree with the literature. NANI can cluster diverse datasets and be used as a standalone tool or as part of our MDANCE clustering package.

3.
J Comput Chem ; 45(10): 633-637, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38071482

ABSTRACT

The grid inhomogeneous solvation theory (GIST) method requires the often time-consuming calculation of water-water and water-solute energy on a grid. Previous efforts to speed up this calculation include using OpenMP, GPUs, and particle mesh Ewald. This article details how the speed of this calculation can be increased by parallelizing it with MPI, where trajectory frames are divided among multiple processors. This requires very little communication between individual processes during trajectory processing, meaning the calculation scales well to large processor counts. This article also details how the entropy calculation, which must happen after trajectory processing since it requires information from all trajectory frames, is parallelized via MPI. This parallelized GIST method has been implemented in the freely-available CPPTRAJ analysis software.
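
Dividing trajectory frames among processes can be sketched as a simple block decomposition. The sketch below only computes which frames each rank would own; it makes no MPI calls, and the function name `frame_block` and the even-block scheme are assumptions, not necessarily how CPPTRAJ assigns frames.

```python
def frame_block(n_frames, n_ranks, rank):
    """Return the half-open [start, stop) frame range owned by one rank."""
    base, extra = divmod(n_frames, n_ranks)
    # The first `extra` ranks each take one additional frame
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

# Ten frames over four ranks
blocks = [frame_block(10, 4, r) for r in range(4)]
```

Each rank can process its block independently, which is why little communication is needed during trajectory processing; a step that needs data from all frames, such as the entropy calculation, would have to combine the per-rank results afterwards.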

5.
Protein Sci ; 31(12): e4511, 2022 12.
Article in English | MEDLINE | ID: mdl-36382864

ABSTRACT

Molecular dynamics (MD) simulations are now able to routinely reach timescales of microseconds and beyond. This has led to a corresponding increase in the amount of MD trajectory data that needs to be stored, particularly when those trajectories contain explicit solvent molecules. As such, it is desirable to be able to compress trajectory data while still retaining as much of the original information as possible. In this work, we describe compressing MD trajectory data using the NetCDF4/HDF5 file format, making use of quantization of the original positions to achieve better compression ratios. We also analyze the effect this has on both the resulting positions and the energies calculated from post-processing these trajectories, and recommend an optimal level of quantization. Overall we find the NetCDF4/HDF5 format to be an excellent choice for storing MD trajectory data in terms of speed, compressibility, and versatility.


Subject(s)
Data Compression; Molecular Dynamics Simulation; Data Compression/methods; Solvents
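
As a rough illustration of position quantization, the sketch below stores each coordinate as an integer in units of 10^-decimals and converts it back, so the round-trip error is bounded by half of the last retained digit. The decimal-scaling scheme and the function names are assumptions made for illustration; the abstract does not specify how the quantization is implemented.

```python
def quantize(coords, decimals):
    """Encode coordinates as integers in units of 10**-decimals."""
    scale = 10 ** decimals
    return [round(x * scale) for x in coords]

def dequantize(ints, decimals):
    """Recover approximate coordinates from the integer encoding."""
    scale = 10 ** decimals
    return [i / scale for i in ints]

coords = [12.34567, -0.98765, 101.00049]
restored = dequantize(quantize(coords, 3), 3)
# Worst-case round-trip error; bounded by 0.5 * 10**-3 for three decimals
max_error = max(abs(a - b) for a, b in zip(coords, restored))
```

Fewer retained digits generally leave less information for a lossless compressor to encode, at the cost of a larger positional error, which is the trade-off the abstract evaluates.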
6.
J Comput Chem ; 43(13): 930-935, 2022 05 15.
Article in English | MEDLINE | ID: mdl-35318701

ABSTRACT

Setting up molecular dynamics simulations from experimentally determined structures is often complicated by a variety of factors, particularly the inclusion of carbohydrates, since these have several anomer types which can be linked in a variety of ways. Here we present a stand-alone tool implemented in the widely-used software CPPTRAJ that can be used to automate building structures and generating a "ready to run" parameter and coordinate file pair. This tool automatically identifies carbohydrate anomer type, configuration, linkage, and functional groups, and performs topology modifications (e.g., renaming residue/atom names) required to build the final system using state of the art GLYCAM force field parameters. It will also generate the necessary commands for bonding carbohydrates and creating any disulfide bonds.


Subject(s)
Molecular Dynamics Simulation; Software; Carbohydrates/chemistry
7.
J Chem Theory Comput ; 17(5): 2714-2724, 2021 May 11.
Article in English | MEDLINE | ID: mdl-33830762

ABSTRACT

Grid Inhomogeneous Solvation Theory (GIST) maps out solvation thermodynamic properties on a fine meshed grid and provides a statistical mechanical formalism for thermodynamic end-state calculations. However, differences in how long-range nonbonded interactions are calculated in molecular dynamics engines and in the current implementation of GIST have prevented precise comparisons between free energies estimated using GIST and those from other free energy methods such as thermodynamic integration (TI). Here, we address this by presenting PME-GIST, a formalism by which particle mesh Ewald (PME)-based electrostatic energies and long-range Lennard-Jones (LJ) energies are decomposed and assigned to individual atoms and the corresponding voxels they occupy in a manner consistent with the GIST approach. PME-GIST yields potential energy calculations that are precisely consistent with modern simulation engines and performs these calculations at a dramatically faster speed than prior implementations. Here, we apply PME-GIST end-state analyses to 32 small molecules whose solvation free energies are close to evenly distributed from 2 kcal/mol to -17 kcal/mol and obtain solvation energies consistent with TI calculations (R² = 0.99, mean unsigned difference 0.8 kcal/mol). We also estimate the entropy contribution from the second and higher order entropy terms that are truncated in GIST by the differences between entropies calculated in TI and GIST. With a simple correction for the high order entropy terms, PME-GIST obtains solvation free energies that are highly consistent with TI calculations (R² = 0.99, mean unsigned difference = 0.4 kcal/mol) and experimental results (R² = 0.88, mean unsigned difference = 1.4 kcal/mol). The precision of PME-GIST also enables us to show that the solvation free energy of small hydrophobic and hydrophilic molecules can be largely understood based on perturbations of the solvent in a region extending a few solvation shells from the solute. We have integrated PME-GIST into the open-source molecular dynamics analysis software CPPTRAJ.

8.
J Mol Graph Model ; 104: 107832, 2021 05.
Article in English | MEDLINE | ID: mdl-33444979

ABSTRACT

Visualizing data generated from molecular dynamics simulations can be difficult, particularly when there can be thousands to millions of trajectory frames. The creation of a 3D grid of atomic density (i.e. a volumetric map) is one way to easily view the long-time average behavior of a system. One way to generate volumetric maps is by approximating each atom with a Gaussian function centered on that atom and spread over neighboring grid cells. However the calculation of the Gaussian function requires evaluation of the exponential function, which is computationally costly. Here we report on speeding up the calculation of volumetric maps from molecular dynamics trajectory data by replacing the expensive exponential function evaluation with an approximation using interpolating cubic splines. We also discuss the errors involved in this approximation, and recommend settings for volumetric map creation based on this.


Subject(s)
Molecular Dynamics Simulation
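
The idea of replacing an exponential evaluation with a piecewise-cubic lookup can be sketched as follows. This version tabulates exp(-x) on a uniform grid and uses cubic Hermite interpolation between nodes, chosen here for brevity because the derivative of exp(-x) is known analytically. The class name `ExpTable`, the grid spacing, and the Hermite construction are assumptions; the abstract does not give the exact spline scheme used in the paper.

```python
import math

class ExpTable:
    """Tabulated exp(-x) on [0, x_max] with cubic Hermite interpolation."""

    def __init__(self, x_max=20.0, step=0.1):
        self.step = step
        self.n = int(round(x_max / step))
        # Node values; d/dx exp(-x) = -exp(-x), so one table gives the slopes too
        self.values = [math.exp(-i * step) for i in range(self.n + 1)]

    def __call__(self, x):
        # Valid for 0 <= x <= x_max
        i = min(int(x / self.step), self.n - 1)
        t = x / self.step - i  # fractional position inside the interval
        y0, y1 = self.values[i], self.values[i + 1]
        m0, m1 = -y0 * self.step, -y1 * self.step  # end-point slopes scaled by step
        t2, t3 = t * t, t * t * t
        return ((2 * t3 - 3 * t2 + 1) * y0 + (t3 - 2 * t2 + t) * m0
                + (-2 * t3 + 3 * t2) * y1 + (t3 - t2) * m1)

table = ExpTable()
```

For a Gaussian of width sigma, the contribution at squared distance r2 would then be looked up as `table(r2 / (2 * sigma**2))` instead of calling `math.exp` for every atom and grid cell.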
9.
J Chem Phys ; 153(5): 054123, 2020 Aug 07.
Article in English | MEDLINE | ID: mdl-32770927

ABSTRACT

Before beginning the production phase of molecular dynamics simulations, i.e., the phase that produces the data to be analyzed, it is often necessary to first perform a series of one or more preparatory minimizations and/or molecular dynamics simulations in order to ensure that subsequent production simulations are stable. This is particularly important for simulations with explicit solvent molecules. Despite the preparatory minimizations and simulations being ubiquitous and essential for stable production simulations, there are currently no general recommended procedures to perform them and very few criteria to decide whether the system is capable of producing a stable simulation trajectory. Here, we propose a simple and well-defined ten step simulation preparation protocol for explicitly solvated biomolecules, which can be applied to a wide variety of system types, as well as a simple test based on the system density for determining whether the simulation is stabilized.
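
A density-based stability check could look something like the sketch below, which fits a straight line to the most recent density samples and reports the system as stable when the total drift over that window is a small fraction of the mean density. This is a hypothetical heuristic written for illustration only: the window length, the tolerance, and the linear-fit criterion are all assumptions, and the abstract does not state the actual test proposed in the paper.

```python
def density_is_stable(densities, window=50, tol=1e-4):
    """Heuristic: relative linear drift over the last `window` samples is below `tol`."""
    tail = densities[-window:]
    n = len(tail)
    x_mean = (n - 1) / 2
    y_mean = sum(tail) / n
    # Ordinary least-squares slope of density against sample index
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(tail))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    slope = sxy / sxx
    # Total drift across the window, relative to the mean density
    return abs(slope * (n - 1)) / y_mean < tol
```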

10.
Proteins ; 88(3): 527-539, 2020 03.
Article in English | MEDLINE | ID: mdl-31589792

ABSTRACT

The selectivity filter (SF) of bacterial voltage-gated sodium channels consists of four glutamate residues arranged in a C4 symmetry. The protonation state population of this tetrad is unclear. To address this question, we simulate the pore domain of bacterial voltage-gated sodium channel of Magnetococcus sp. (Nav Ms) through constant pH methodology in explicit solvent and free energy perturbation calculations. We find that at physiological pH the fully deprotonated as well as singly and doubly protonated states of the SF appear feasible, and that the calculated pKa decreases with each additional bound ion, suggesting that a decrease in the number of ions in the pore can lead to protonation of the SF. Previous molecular dynamics simulations have suggested that protonation can lead to a decrease in the conductance, but no pKa calculations were performed. We confirm a decreased ionic population of the pore with protonation, and also observe structural symmetry breaking triggered by protonation; the SF of the deprotonated channel is closest to the C4 symmetry observed in crystal structures of the open state, while the SF of protonated states displays greater levels of asymmetry, which could lead to transition to the inactivated state, which possesses a C2 symmetry in the crystal structure. We speculate that the decrease in the number of ions near the mouth of the channel, due to either random fluctuations or ion depletion due to conduction, could be a self-regulatory mechanism resulting in a nonconducting state that functionally resembles inactivated states.


Subject(s)
Alphaproteobacteria/chemistry; Bacterial Proteins/chemistry; Protons; Sodium/chemistry; Voltage-Gated Sodium Channels/chemistry; Alphaproteobacteria/metabolism; Bacterial Proteins/metabolism; Binding Sites; Cations, Monovalent; Crystallography, X-Ray; Hydrogen-Ion Concentration; Ion Transport; Kinetics; Molecular Dynamics Simulation; Protein Binding; Protein Conformation, alpha-Helical; Protein Interaction Domains and Motifs; Sodium/metabolism; Thermodynamics; Voltage-Gated Sodium Channels/metabolism
11.
Article in English | MEDLINE | ID: mdl-30533602

ABSTRACT

The quantitative assessment of uncertainty and sampling quality is essential in molecular simulation. Many systems of interest are highly complex, often at the edge of current computational capabilities. Modelers must therefore analyze and communicate statistical uncertainties so that "consumers" of simulated data understand its significance and limitations. This article covers key analyses appropriate for trajectory data generated by conventional simulation methods such as molecular dynamics and (single Markov chain) Monte Carlo. It also provides guidance for analyzing some 'enhanced' sampling approaches. We do not discuss systematic errors arising, e.g., from inaccuracy in the chosen model or force field.

12.
J Comput Chem ; 39(25): 2110-2117, 2018 09 30.
Article in English | MEDLINE | ID: mdl-30368859

ABSTRACT

Advances in biomolecular simulation methods and access to large scale computer resources have led to a massive increase in the amount of data generated. The key enablers have been optimization and parallelization of the simulation codes. However, much of the software used to analyze trajectory data from these simulations is still run in serial, or in some cases many threads via shared memory. Here, we describe the addition of multiple levels of parallel trajectory processing to the molecular dynamics simulation analysis software CPPTRAJ. In addition to the existing OpenMP shared-memory parallelism, CPPTRAJ now has two additional levels of message passing (MPI) parallelism involving both across-trajectory processing and across-ensemble processing. All three levels of parallelism can be simultaneously active, leading to significant speed ups in data analysis of large datasets on the NCSA Blue Waters supercomputer by better leveraging the many available nodes and its parallel file system. © 2018 Wiley Periodicals, Inc.

13.
J Phys Chem B ; 121(3): 451-462, 2017 01 26.
Article in English | MEDLINE | ID: mdl-27983843

ABSTRACT

An experimentally well-studied model of RNA tertiary structures is a 58mer rRNA fragment, known as GTPase-associating center (GAC) RNA, in which a highly negative pocket walled by phosphate oxygen atoms is stabilized by a chelated cation. Although such deep pockets with more than one direct phosphate to ion chelation site normally include magnesium, as shown in one GAC crystal structure, another GAC crystal structure and solution experiments suggest potassium at this site. Both crystal structures also depict two magnesium ions directly bound to the phosphate groups comprising this controversial pocket. Here, we used classical molecular dynamics simulations as well as umbrella sampling to investigate the possibility of binding of potassium versus magnesium inside the pocket and to better characterize the chelation of one of the binding magnesium ions outside the pocket. The results support the preference of the pocket to accommodate potassium rather than magnesium and suggest that one of the closely binding magnesium ions can only bind at high magnesium concentrations, such as might be present during crystallization. This work illustrates the complementary utility of molecular modeling approaches with atomic-level detail in resolving discrepancies between conflicting experimental results.


Subject(s)
GTP Phosphohydrolases/chemistry; Magnesium/chemistry; Molecular Dynamics Simulation; Potassium/chemistry; RNA/chemistry; Binding Sites; GTP Phosphohydrolases/metabolism; Ions/chemistry; Ions/metabolism; Magnesium/metabolism; Potassium/metabolism; RNA/metabolism
14.
J Chem Inf Model ; 56(7): 1282-91, 2016 07 25.
Article in English | MEDLINE | ID: mdl-27286268

ABSTRACT

Long time scale molecular dynamics (MD) simulations of biological systems are becoming increasingly commonplace due to the availability of both large-scale computational resources and significant advances in the underlying simulation methodologies. Therefore, it is useful to investigate and develop data mining and analysis techniques to quickly and efficiently extract the biologically relevant information from the incredible amount of generated data. Wavelet analysis (WA) is a technique that can quickly reveal significant motions during an MD simulation. Here, the application of WA on well-converged long time scale (tens of µs) simulations of a DNA helix is described. We show how WA combined with a simple clustering method can be used to identify both the physical and temporal locations of events with significant motion in MD trajectories. We also show that WA can not only distinguish and quantify the locations and time scales of significant motions, but by changing the maximum time scale of WA a more complete characterization of these motions can be obtained. This allows motions of different time scales to be identified or ignored as desired.


Subject(s)
DNA/chemistry; Molecular Dynamics Simulation; Wavelet Analysis; Base Sequence; DNA/genetics; Kinetics; Nucleic Acid Conformation
15.
RNA ; 21(9): 1578-90, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26124199

ABSTRACT

Recent modifications and improvements to standard nucleic acid force fields have attempted to fix problems and issues that have been observed as longer timescale simulations have become routine. Although previous work has shown the ability to fold the UUCG stem-loop structure, until now no group has attempted to quantify the performance of current force fields using highly converged structural populations of the tetraloop conformational ensemble. In this study, we report the use of multiple independent sets of multidimensional replica exchange molecular dynamics (M-REMD) simulations with different initial conditions to generate well-converged conformational ensembles for the tetranucleotides r(GACC) and r(CCCC), as well as the larger UUCG tetraloop motif. By generating what is to our knowledge the most complete RNA structure ensembles reported to date for these systems, we remove the coupling between force field errors and errors due to incomplete sampling, providing a comprehensive comparison between current top-performing MD force fields for RNA. Of the RNA force fields tested in this study, none demonstrate the ability to correctly identify the most thermodynamically stable structure for all three systems. We discuss the deficiencies present in each potential function and suggest areas where improvements can be made. The results imply that although "short" (nsec-µsec timescale) simulations may stay close to their respective experimental structures and may well reproduce experimental observables, inevitably the current force fields will populate alternative incorrect structures that are more stable than those observed via experiment.


Subject(s)
Computational Biology/methods; RNA/chemistry; Magnetic Resonance Spectroscopy; Molecular Dynamics Simulation; Nucleic Acid Conformation; Nucleotide Motifs; Thermodynamics
16.
Biochim Biophys Acta ; 1850(5): 1041-1058, 2015 May.
Article in English | MEDLINE | ID: mdl-25219455

ABSTRACT

BACKGROUND: The structure and dynamics of DNA are critically related to its function. Molecular dynamics simulations augment experiment by providing detailed information about the atomic motions. However, to date the simulations have not been long enough for convergence of the dynamics and structural properties of DNA. METHODS: Molecular dynamics simulations performed with AMBER using the ff99SB force field with the parmbsc0 modifications, including ensembles of independent simulations, were compared to long timescale molecular dynamics performed with the specialized Anton MD engine on the B-DNA structure d(GCACGAACGAACGAACGC). To assess convergence, the decay of the average RMSD values over longer and longer time intervals was evaluated in addition to assessing convergence of the dynamics via the Kullback-Leibler divergence of principal component projection histograms. RESULTS: These molecular dynamics simulations, including one of the longest simulations of DNA published to date at ~44 µs, surprisingly suggest that the structure and dynamics of the DNA helix, neglecting the terminal base pairs, are essentially fully converged on the ~1-5 µs timescale. CONCLUSIONS: We can now reproducibly converge the structure and dynamics of B-DNA helices, omitting the terminal base pairs, on the µs time scale with both the AMBER and CHARMM C36 nucleic acid force fields. Results from independent ensembles of simulations starting from different initial conditions, when aggregated, match the results from long timescale simulations on the specialized Anton MD engine. GENERAL SIGNIFICANCE: With access to large-scale GPU resources or the specialized MD engine "Anton" it is possible for a variety of molecular systems to reproducibly and reliably converge the conformational ensemble of sampled structures. This article is part of a Special Issue entitled: Recent developments of molecular dynamics.


Subject(s)
DNA/chemistry; Molecular Dynamics Simulation; Algorithms; DNA/biosynthesis; DNA Repair; DNA Replication; Motion; Nucleic Acid Conformation; Principal Component Analysis; Reproducibility of Results; Structure-Activity Relationship; Time Factors; Transcription, Genetic
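
The Kullback-Leibler comparison of principal component projection histograms mentioned in the METHODS can be sketched as below: two sets of projection values are binned on a shared grid and KL(P || Q) is computed from the normalized counts. The function names, the binning range, and the small floor used for empty bins are assumptions made for illustration, not the settings used in the paper.

```python
import math

def histogram(values, lo, hi, bins):
    """Count values into `bins` equal-width bins on [lo, hi)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts

def kl_divergence(p_counts, q_counts, floor=1e-9):
    """KL(P || Q) in nats between two histograms with identical binning."""
    p_tot, q_tot = sum(p_counts), sum(q_counts)
    kl = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = pc / p_tot
        q = max(qc / q_tot, floor)  # avoid log(0) when a bin of Q is empty
        if p > 0:
            kl += p * math.log(p / q)
    return kl
```

A divergence that decays toward zero as more simulation time is included indicates that two independent runs are sampling the same distribution along that principal component.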
17.
Nat Commun ; 5: 5152, 2014 Oct 29.
Article in English | MEDLINE | ID: mdl-25351257

ABSTRACT

DNA helices display a rich tapestry of motion on both short (<100 ns) and long (>1 ms) timescales. However, with the exception of mismatched or damaged DNA, experimental measures indicate that motions in the 1 µs to 1 ms range are effectively absent, which is often attributed to difficulties in measuring motions in this time range. We hypothesized that these motions have not been measured because there is effectively no motion on this timescale, as this provides a means to distinguish faithful Watson-Crick base-paired DNA from damaged DNA. The absence of motion on this timescale would present a 'static' DNA sequence-specific structure that matches the encounter timescales of proteins, thereby facilitating recognition. Here we report long-timescale (~10-44 µs) molecular dynamics simulations of a B-DNA duplex structure that addresses this hypothesis using both an 'Anton' machine and large ensembles of AMBER GPU simulations.


Subject(s)
DNA/chemistry; Nucleic Acid Conformation; Molecular Dynamics Simulation; Principal Component Analysis; Reproducibility of Results; Time Factors
18.
J Phys Chem B ; 118(13): 3543-52, 2014 Apr 03.
Article in English | MEDLINE | ID: mdl-24625009

ABSTRACT

Many problems studied via molecular dynamics require accurate estimates of various thermodynamic properties, such as the free energies of different states of a system, which in turn requires well-converged sampling of the ensemble of possible structures. Enhanced sampling techniques are often applied to provide faster convergence than is possible with traditional molecular dynamics simulations. Hamiltonian replica exchange molecular dynamics (H-REMD) is a particularly attractive method, as it allows the incorporation of a variety of enhanced sampling techniques through modifications to the various Hamiltonians. In this work, we study the enhanced sampling of the RNA tetranucleotide r(GACC) provided by H-REMD combined with accelerated molecular dynamics (aMD), where a boosting potential is applied to torsions, and compare this to the enhanced sampling provided by H-REMD in which torsion potential barrier heights are scaled down to lower force constants. We show that H-REMD and multidimensional REMD (M-REMD) combined with aMD does indeed enhance sampling for r(GACC), and that the addition of the temperature dimension in the M-REMD simulations is necessary to efficiently sample rare conformations. Interestingly, we find that the rate of convergence can be improved in a single H-REMD dimension by simply increasing the number of replicas from 8 to 24 without increasing the maximum level of bias. The results also indicate that factors beyond replica spacing, such as round trip times and time spent at each replica, must be considered in order to achieve optimal sampling efficiency.


Subject(s)
Molecular Dynamics Simulation; RNA/chemistry; Cluster Analysis; Principal Component Analysis; RNA/metabolism
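
For readers unfamiliar with the boosting potential used in accelerated molecular dynamics, the commonly used functional form adds a non-negative boost whenever the potential energy V falls below a threshold E, with a parameter alpha controlling how strongly the landscape is flattened. The sketch below evaluates that form for a single energy value; the function name and the numbers in the usage note are illustrative assumptions, and the specific thresholds and torsion terms used in the paper are not given in the abstract.

```python
def amd_boost(v, e_threshold, alpha):
    """aMD boost energy: (E - V)^2 / (alpha + E - V) when V < E, otherwise 0."""
    if v >= e_threshold:
        return 0.0
    gap = e_threshold - v
    return gap * gap / (alpha + gap)
```

For example, `amd_boost(2.0, 6.0, 4.0)` evaluates to 2.0, while any energy at or above the threshold receives no boost.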
19.
J Cheminform ; 6(1): 4, 2014 Jan 30.
Article in English | MEDLINE | ID: mdl-24484917

ABSTRACT

BACKGROUND: Few environments have been developed or deployed to widely share biomolecular simulation data or to enable collaborative networks to facilitate data exploration and reuse. As the amount and complexity of data generated by these simulations are dramatically increasing and the methods are being more widely applied, the need for new tools to manage and share this data has become obvious. In this paper we present the results of a process aimed at assessing the needs of the community for data representation standards to guide the implementation of future repositories for biomolecular simulations. RESULTS: We introduce a list of common data elements, inspired by previous work, and updated according to feedback from the community collected through a survey and personal interviews. These data elements integrate the concepts for multiple types of computational methods, including quantum chemistry and molecular dynamics. The identified core data elements were organized into a logical model to guide the design of new databases and application programming interfaces. Finally a set of dictionaries was implemented to be used via SQL queries or locally via a Java API built upon the Apache Lucene text-search engine. CONCLUSIONS: The model and its associated dictionaries provide a simple yet rich representation of the concepts related to biomolecular simulations, which should guide future developments of repositories and more complex terminologies and ontologies. The model still remains extensible through the decomposition of virtual experiments into tasks and parameter sets, and via the use of extended attributes. The benefits of a common logical model for biomolecular simulations were illustrated through various use cases, including data storage, indexing, and presentation. All the models and dictionaries introduced in this paper are available for download at http://ibiomes.chpc.utah.edu/mediawiki/index.php/Downloads.

20.
J Chem Theory Comput ; 10(1): 492-499, 2014 Jan 14.
Article in English | MEDLINE | ID: mdl-24453949

ABSTRACT

A necessary step to properly assess and validate the performance of force fields for biomolecules is to exhaustively sample the accessible conformational space, which is challenging for large RNA structures. Given questions regarding the reliability of modeling RNA structure and dynamics with current methods, we have begun to use RNA tetranucleotides to evaluate force fields. These systems, though small, display considerable conformational variability and complete sampling with standard simulation methods remains challenging. Here we compare and discuss the performance of known variations of replica exchange molecular dynamics (REMD) methods, specifically temperature REMD (T-REMD), Hamiltonian REMD (H-REMD), and multidimensional REMD (M-REMD) methods, which have been implemented in Amber's accelerated GPU code. Using two independent simulations, we show that M-REMD not only makes very efficient use of emerging large-scale GPU clusters, like Blue Waters at the University of Illinois, but also is critically important in generating the converged ensemble more efficiently than either T-REMD or H-REMD. With 57.6 µs aggregate sampling of a conformational ensemble with M-REMD methods, the populations can be compared to NMR data to evaluate force field reliability and further understand how putative changes to the force field may alter populations to be in more consistent agreement with experiment.
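
As background for the temperature REMD variant named above, the standard Metropolis criterion accepts a swap between replicas i and j with probability min(1, exp[(beta_i - beta_j)(E_i - E_j)]), where beta = 1/(k_B T). The sketch below evaluates that probability; the function name and the use of kcal/mol units are assumptions, and it does not reflect any detail of Amber's GPU implementation.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def exchange_probability(e_i, t_i, e_j, t_j):
    """Metropolis acceptance for swapping configurations between replicas i and j."""
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    # Always accept when the swap is favorable, otherwise accept with exp(delta)
    return 1.0 if delta >= 0 else math.exp(delta)
```

When the colder replica holds the higher-energy configuration the swap is always accepted; otherwise the acceptance probability falls off exponentially with the energy gap, which is why closely spaced temperatures are needed for efficient exchange.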
