ABSTRACT
To improve the accuracy of molecular dynamics simulations, classical forcefields are supplemented with a kernel-based machine learning method trained on quantum-mechanical fragment energies. As an example application, a potential-energy surface is generalized for a small DNA duplex, taking into account explicit solvation and long-range electron exchange-correlation effects. A long-standing problem in molecular science is that experimental studies of the structural and thermodynamic behavior of DNA under tension are not well confirmed by simulation. A study of potential energy versus extension that includes the new correction shows that leading classical DNA models are excessively stiff with respect to stretching. This discrepancy is found to be common across multiple forcefields. The quantum correction is in qualitative agreement with the experimental thermodynamics for larger DNA double helices, providing a candidate explanation for the general and long-standing discrepancy between single-molecule stretching experiments and classical calculations of DNA stretching. The new dataset of quantum calculations should facilitate multiple types of nucleic acid simulation, and the associated Kernel Modified Molecular Dynamics (KMMD) method is applicable to biomolecular simulations in general. KMMD is made available as part of the AMBER22 simulation software.
Subjects
DNA, Molecular Dynamics Simulation, Base Pairing, DNA/chemistry, Machine Learning, Solvents/chemistry
ABSTRACT
BACKGROUND: Current biological investigations tend to operate on genomes rather than on individual genes, as was the case during the last century. It is possible to compare entire genomes, transcriptomes or proteomes, using alphanumeric data corresponding to the differential expression levels of thousands of genes. What remains difficult is to link array results to factual or bibliographical data and retrieve information that is highly structured and - in Shannon's sense - rare. MATERIAL/METHODS: We have developed a tool, Documentation and Information LIBrary (DILIB), that enables us to retrieve, organize and analyze huge amounts of data available on the Internet and related to microarray experiments. DILIB can link hundreds of differentially expressed genes - through their Single Identifier or GenBank accession number - to hundreds of Medline records, and can automatically retrieve, analyze, and compare thousands of non-trivial descriptors related to gene clusters. RESULTS: As exemplified by a frequency comparison of MEdical Subject Headings and Registry Number descriptors, we reanalyzed the involvement of 'integrin', 'interleukin' and 'CD Antigens' in mesotheliomas. DILIB thus allowed us to: (i) associate literature with expressed genes, (ii) link functional transcriptomes in various experiments, (iii) associate specific descriptors with experiments, (iv) define new research areas, and eventually (v) find new functions for co-expressed genes. CONCLUSIONS: We propose a new concept, 'bibliomics', representing a subset of high-quality and rare information, retrieved and organized by systematic literature-searching tools from existing databases, and related to a subset of genes functioning together in '-omic' sciences.
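The frequency comparison of descriptors mentioned under RESULTS can be illustrated with a minimal Python sketch. This is not DILIB's implementation, which the abstract does not describe; the function names `descriptor_frequencies` and `compare_frequencies`, the input layout (one list of descriptors per literature record) and the example descriptors are all assumptions made for illustration.

```python
from collections import Counter


def descriptor_frequencies(records):
    """Fraction of records in which each descriptor appears.

    `records` is a list of descriptor lists, one per literature record
    (hypothetical layout, not DILIB's actual data model).
    """
    # Count each descriptor at most once per record.
    counts = Counter(d for rec in records for d in set(rec))
    total = len(records)
    return {d: c / total for d, c in counts.items()}


def compare_frequencies(cluster_a, cluster_b):
    """Rank descriptors by how much their frequency differs between two clusters."""
    fa = descriptor_frequencies(cluster_a)
    fb = descriptor_frequencies(cluster_b)
    rows = [(d, fa.get(d, 0.0), fb.get(d, 0.0)) for d in set(fa) | set(fb)]
    # Largest absolute difference first; ties broken alphabetically.
    return sorted(rows, key=lambda r: (-abs(r[1] - r[2]), r[0]))


# Made-up example records for two gene clusters (illustrative only).
cluster_a = [["Integrins", "Mesothelioma"], ["Integrins"]]
cluster_b = [["Interleukins", "Mesothelioma"], ["Mesothelioma"]]

for descriptor, freq_a, freq_b in compare_frequencies(cluster_a, cluster_b):
    print(f"{descriptor}: {freq_a:.2f} vs {freq_b:.2f}")
```

Descriptors that rank highest are those most unevenly represented between the two record sets, which is one simple way to surface candidate descriptors specific to a gene cluster.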