ABSTRACT
SUMMARY: Modern scientific investigation is generating increasingly large datasets, yet analyzing these data with current tools is challenging. DIVE is a software framework intended to facilitate big data analysis and reduce the time to scientific insight. Here, we present features of the framework and demonstrate DIVE's application to the Dynameomics project, looking specifically at two proteins. AVAILABILITY AND IMPLEMENTATION: Binaries and documentation are available at http://www.dynameomics.org/DIVE/DIVESetup.exe.
Subjects
Computational Biology/methods, Computer Graphics, Documentation/methods, Mutant Proteins/metabolism, Software, Computer Simulation, Humans, Mutant Proteins/genetics, Mutation/genetics, Superoxide Dismutase/genetics, Superoxide Dismutase/metabolism, Superoxide Dismutase-1, Tumor Suppressor Protein p53/genetics, Tumor Suppressor Protein p53/metabolism
ABSTRACT
Most rotamer libraries are generated from subsets of the PDB and do not fully represent the conformational scope of protein side chains. Previous attempts to rectify this sparse coverage of conformational space have involved application of weighting and smoothing functions. We resolve these limitations by using physics-based molecular dynamics simulations to determine more accurate frequencies of rotameric states. This work forms part of our Dynameomics initiative and uses a set of 807 proteins selected to represent 97% of known autonomous protein folds, thereby eliminating the bias toward common topologies found within the PDB. Our Dynameomics-derived rotamer libraries encompass 4.8 × 10^9 rotamers, sampled from at least 51,000 occurrences of each of 93,642 residues. Here, we provide a backbone-dependent rotamer library, based on secondary structure φ/ψ regions, and an update to our 2011 backbone-independent library that addresses the doubling of our dataset since its original publication.
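As a rough illustration of what "frequencies of rotameric states" means in practice, the sketch below bins sampled χ1 dihedral angles into three staggered wells and reports the fraction in each. The bin boundaries (120° wide wells), the labels `g+`/`t`/`g-`, and the sample angles are assumptions made for this example; they are not taken from the Dynameomics libraries, and labeling conventions differ between libraries.

```python
from collections import Counter

def chi_bin(angle_deg):
    """Assign a chi dihedral (degrees) to one of three staggered wells."""
    a = angle_deg % 360.0      # wrap into [0, 360)
    if a < 120.0:
        return "g+"            # well centred near +60 degrees
    if a < 240.0:
        return "t"             # well centred near 180 degrees
    return "g-"                # well centred near 300 (-60) degrees

def rotamer_frequencies(chi_samples):
    """Return the fraction of samples falling in each well."""
    if not chi_samples:
        raise ValueError("need at least one sampled angle")
    counts = Counter(chi_bin(a) for a in chi_samples)
    total = sum(counts.values())
    return {name: counts[name] / total for name in ("g+", "t", "g-")}

# Hypothetical chi1 angles sampled along a trajectory (degrees).
samples = [62.0, 58.5, -65.0, 178.0, -172.0, 70.1, -60.2, 55.0]
freqs = rotamer_frequencies(samples)
print(freqs)
```

A backbone-dependent variant would keep one such table per φ/ψ region instead of a single table per residue type.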
Subjects
Molecular Dynamics Simulation, Peptide Library, Software, Animals, Humans, Isomerism, Protein Conformation, Ubiquitins/chemistry
ABSTRACT
PURPOSE: This paper proposes a novel application of computer-aided diagnosis (CAD) to an everyday clinical dental challenge: the noninvasive differential diagnosis of periapical lesions between periapical cysts and granulomas. A histological biopsy is the most reliable method currently available for this differential diagnosis; however, this invasive procedure prevents the lesions from healing noninvasively despite a report that they may heal without surgical treatment. A CAD using cone-beam computed tomography (CBCT) offers an alternative noninvasive diagnostic tool which helps to avoid potentially unnecessary surgery and to investigate the unknown healing process and rate for the lesions. METHODS: The proposed semiautomatic solution combines graph-based random walks segmentation with machine learning-based boosted classifiers and offers a robust clinical tool with minimal user interaction. As part of this CAD framework, the authors provide two novel technical contributions: (1) a probabilistic extension of the random walks segmentation with a likelihood ratio test and (2) LDA-AdaBoost: a new integration of weighted linear discriminant analysis into AdaBoost. RESULTS: A dataset of 28 CBCT scans is used to validate the approach and compare it with other popular segmentation and classification methods. The results show the effectiveness of the proposed method, with a 94.1% correct classification rate and a 17.6% improvement over Simon's state-of-the-art method. The authors also compare classification performance against two independent ground-truth sets, from the histopathology and CBCT diagnoses provided by endodontic experts. CONCLUSIONS: The experimental results show that the proposed CAD system agrees more closely with the CBCT ground truth than with histopathology, supporting Simon's conjecture that CBCT diagnosis can be as accurate as histopathology for differentiating the periapical lesions.
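The abstract does not say how the likelihood ratio test is formulated or how it is coupled to the random walks segmentation. Purely as an illustration of the general idea, the sketch below labels a single voxel by comparing its log-likelihood under two Gaussian intensity models. The Gaussian assumption, the `(mean, std)` values, and the function names are hypothetical and are not the authors' method.

```python
import math

def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution at x."""
    var = std * std
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def log_likelihood_ratio(x, lesion, background):
    """log p(x | lesion) - log p(x | background); each model is (mean, std)."""
    return gaussian_logpdf(x, *lesion) - gaussian_logpdf(x, *background)

def label_voxel(x, lesion, background, threshold=0.0):
    """Call the voxel 'lesion' when the log ratio exceeds the threshold."""
    llr = log_likelihood_ratio(x, lesion, background)
    return "lesion" if llr > threshold else "background"

# Hypothetical intensity models (mean, std) in arbitrary units.
lesion_model = (40.0, 10.0)
background_model = (100.0, 10.0)

print(label_voxel(50.0, lesion_model, background_model))
print(label_voxel(90.0, lesion_model, background_model))
```

In a segmentation pipeline, a per-voxel ratio like this could act as a prior term alongside the graph-based random walker, with the threshold controlling how conservative the labeling is.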
Subjects
Cone-Beam Computed Tomography/methods, Computer-Assisted Image Interpretation/methods, Periapical Granuloma/diagnostic imaging, Periapical Granuloma/diagnosis, Radicular Cyst/diagnostic imaging, Radicular Cyst/diagnosis, Datasets as Topic, Differential Diagnosis, Discriminant Analysis, Humans, Three-Dimensional Imaging/methods, Likelihood Functions, Linear Models, Machine Learning, Periapical Granuloma/pathology, Radicular Cyst/pathology
ABSTRACT
The need for data-centric scientific tools is growing; domains such as biology, chemistry, and physics are increasingly adopting computational approaches. So, scientists must deal with the challenges of big data. To address these challenges, researchers built a visual-analytics platform named DIVE (Data Intensive Visualization Engine). DIVE is a data-agnostic, ontologically expressive software framework that can stream large datasets at interactive speeds. In particular, DIVE makes novel contributions to structured-data-model manipulation and high-throughput streaming of large, structured datasets.
Subjects
Computational Biology/methods, Computer Graphics, Software, Molecular Dynamics Simulation, User-Computer Interface
ABSTRACT
Protein function is intimately linked to protein structure and dynamics, yet experimentally determined structures frequently omit regions within a protein due to indeterminate data, which is often due to protein dynamics. We propose that atomistic molecular dynamics simulations provide a diverse sampling of biologically relevant structures for these missing segments (and beyond) to improve structural modeling and structure prediction. Here we make use of the Dynameomics data warehouse, which contains simulations of representatives of essentially all known protein folds. We developed novel computational methods to efficiently identify, rank and retrieve small peptide structures, or fragments, from this database. We also created a novel data model to analyze and compare large repositories of structural data, such as those contained within the Protein Data Bank and the Dynameomics data warehouse. Our evaluation compares these structural repositories for improving loop predictions and analyzes the utility of our methods and models. Using a standard set of loop structures, containing 510 loops, 30 for each loop length from 4 to 20 residues, we find that the inclusion of Dynameomics structures in fragment-based methods improves the quality of the loop predictions without being dependent on sequence homology. Depending on loop length, ~25-75% of the best predictions came from the Dynameomics set, resulting in lower main chain root-mean-square deviations for all fragment lengths using the combined fragment library. We also provide specific cases where Dynameomics fragments provide better predictions for NMR loop structures than fragments from crystal structures. Online access to these fragment libraries is available at http://www.dynameomics.org/fragments.
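The quality metric named above, main chain root-mean-square deviation (RMSD), can be sketched as follows. This minimal version assumes the candidate fragments have already been superimposed on the target loop and that atoms are listed in matching order. The coordinates and fragment names are invented for the example, and the ranking step is only a stand-in for the paper's own retrieval and ranking methods, which the abstract does not detail.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points, no fitting."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    sq_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(sq_sum / len(coords_a))

def best_fragment(target, candidates):
    """Return (name, rmsd) of the candidate closest to the target."""
    scored = {name: rmsd(target, coords) for name, coords in candidates.items()}
    name = min(scored, key=scored.get)
    return name, scored[name]

# Hypothetical main chain atom positions (angstroms), already superimposed.
target = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
candidates = {
    "frag_a": [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (4.0, 0.0, 0.0)],  # shifted 1 A
    "frag_b": [(0.0, 3.0, 4.0), (1.5, 3.0, 4.0), (3.0, 3.0, 4.0)],  # shifted 5 A
}
name, score = best_fragment(target, candidates)
print(name, score)
```

A real comparison would first fit each fragment onto the target (for example with a least-squares superposition) before computing the deviation.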
Subjects
Computational Biology/methods, Protein Databases, Molecular Dynamics Simulation, Proteins, Protein Conformation, Proteins/chemistry, Proteins/metabolism, Protein Sequence Analysis
ABSTRACT
Integrative structural biology attempts to model the structures of protein complexes that are challenging or intractable by classical structural methods (due to size, dynamics, or heterogeneity) by combining computational structural modeling with data from experimental methods. One such experimental method is chemical crosslinking mass spectrometry (XL-MS), in which protein complexes are crosslinked and characterized using liquid chromatography-mass spectrometry to pinpoint specific amino acid residues in close structural proximity. The commonly used lysine-reactive N-hydroxysuccinimide ester reagents disuccinimidyl suberate (DSS) and bis(sulfosuccinimidyl)suberate (BS3) have a linker arm that is 11.4 Å long when fully extended, allowing Cα (alpha carbon of protein backbone) atoms of crosslinked lysine residues to be up to ~24 Å apart. However, XL-MS studies on proteins of known structure frequently report crosslinks that exceed this distance. Typically, a tolerance of ~3 Å is added to the theoretical maximum to account for this observation, with limited justification for the chosen value. We used the Dynameomics database, a repository of high-quality molecular dynamics simulations of 807 proteins representative of diverse protein folds, to investigate the relationship between lysine-lysine distances in experimental starting structures and in simulation ensembles. We conclude that for DSS/BS3, a distance constraint of 26-30 Å between Cα atoms is appropriate. This analysis provides a theoretical basis for the widespread practice of adding a tolerance to the crosslinker length when comparing XL-MS results to structures or in modeling. We also discuss the comparison of XL-MS results to MD simulations and known structures as a means to test and validate experimental XL-MS methods.
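To make the distance constraint concrete, the sketch below checks whether a lysine pair's Cα-Cα distance in a structure is compatible with a DSS/BS3 crosslink. The default cutoff of 30 Å is simply the upper end of the 26-30 Å range stated above, chosen here as an assumption; the coordinates and function name are hypothetical. It relies on `math.dist`, available in Python 3.8 and later.

```python
import math

def crosslink_satisfied(ca_1, ca_2, max_dist=30.0):
    """True when two C-alpha positions (angstroms) lie within max_dist."""
    return math.dist(ca_1, ca_2) <= max_dist

# Hypothetical C-alpha coordinates (angstroms) for two lysine pairs.
pair_ok = ((0.0, 0.0, 0.0), (0.0, 0.0, 25.0))        # 25.0 A apart
pair_far = ((0.0, 0.0, 0.0), (20.0, 24.0, 0.0))      # about 31.2 A apart

print(crosslink_satisfied(*pair_ok))
print(crosslink_satisfied(*pair_far))
# A stricter cutoff, the lower end of the stated range:
print(crosslink_satisfied(*pair_ok, max_dist=26.0))
```

Applied to a simulation ensemble rather than a single structure, the same check could be run per frame to ask how often a reported crosslink is geometrically feasible.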
Subjects
Lysine/chemistry, Mass Spectrometry/methods, Molecular Dynamics Simulation, Cross-Linking Reagents/chemistry, Proteins/chemistry, Succinimides/chemistry
ABSTRACT
The dynamic behavior of proteins is important for an understanding of their function and folding. We have performed molecular dynamics simulations of the native state and unfolding pathways of over 2000 protein/peptide systems (approximately 11,000 independent simulations) representing the majority of folds in globular proteins. These data are stored and organized using an innovative database approach, which can be mined to obtain both general and specific information about the dynamics and folding/unfolding of proteins, relevant subsets thereof, and individual proteins. Here we describe the project in general terms and the type of information contained in the database. Then we provide examples of mining the database for information relevant to protein folding, structure building, the effect of single-nucleotide polymorphisms, and drug design. The native state simulation data and corresponding analyses for the 100 most populated metafolds, together with related resources, are publicly accessible through http://www.dynameomics.org.