ABSTRACT
NMR spectroscopy is a powerful and indispensable technique for the analysis of biomolecules under ambient conditions, for both structural and functional studies. In practice, however, the complexity of the technique has often frustrated its application by non-specialists. In this paper, we present CcpNmr version-3, the latest software release from the Collaborative Computational Project for NMR, for all aspects of NMR data analysis, including liquid- and solid-state NMR data. This software has been designed to be simple, functional and flexible, and aims to ensure that routine tasks can be performed in a straightforward manner. We have designed the software according to modern software engineering principles and leveraged the capabilities of modern graphics libraries to simplify a variety of data analysis tasks. We describe the process of backbone assignment as an example of the flexibility and simplicity of implementing workflows, as well as the toolkit used to create the necessary graphics for this workflow. The package can be downloaded from www.ccpn.ac.uk/v3-software/downloads and is freely available to all non-profit organisations.
Subjects
Nuclear Magnetic Resonance, Biomolecular/methods; Software; Statistics as Topic; Molecular Structure; User-Computer Interface; Workflow
ABSTRACT
CcpNmr Analysis provides a streamlined pipeline for both NMR chemical shift assignment and structure determination of biological macromolecules. In addition, it encompasses tools to analyse the many additional experiments that make NMR such a pivotal technique for research into complex biological questions. This report describes how CcpNmr Analysis can seamlessly link together all of the tasks in the NMR structure-determination process. It details each of the stages from generating NMR restraints [distance, dihedral, hydrogen bonds and residual dipolar couplings (RDCs)], exporting these to and subsequently re-importing them from structure-calculation software (such as the programs CYANA or ARIA) and analysing and validating the results obtained from the structure calculation to, ultimately, the streamlined deposition of the completed assignments and the refined ensemble of structures into the PDBe repository. Until recently, such solution-structure determination by NMR has been quite a laborious task, requiring multiple stages and programs. However, with the new enhancements to CcpNmr Analysis described here, this process is now much more intuitive and efficient and less error-prone.
Subjects
Nuclear Magnetic Resonance, Biomolecular/methods; Hydrogen Bonding; Molecular Structure
ABSTRACT
We performed a comprehensive structure validation of both automated and manually generated structures of the 10 targets of the CASD-NMR-2013 effort. We established that automated structure determination protocols are capable of reliably producing structures of comparable accuracy and quality to those generated by a skilled researcher, at least for small, single domain proteins such as the ten targets tested. The most robust results appear to be obtained when NOESY peak lists are used either as the primary input data or to augment chemical shift data without the need to manually filter such lists. A detailed analysis of the long-range NOE restraints generated by the different programs from the same data showed a surprisingly low degree of overlap. Additionally, we found that there was no significant correlation between the extent of the NOE restraint overlap and the accuracy of the structure. This result was surprising given the importance of NOE data in producing good quality structures. We suggest that this could be explained by the information redundancy present in NOEs between atoms contained within a fixed covalent network.
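The degree of overlap between restraint sets can be illustrated with a small sketch. The abstract does not say which overlap measure was used, so the Jaccard index below (shared restraints divided by the union) is an assumption, as are the function names and the toy input, which is not CASD-NMR data.

```python
from itertools import combinations


def normalise(restraint):
    """Order the two atom identifiers so that (a, b) and (b, a) compare equal."""
    a, b = restraint
    return (a, b) if a <= b else (b, a)


def jaccard_overlap(set_a, set_b):
    """Fraction of restraints shared by two sets, relative to their union."""
    a = {normalise(r) for r in set_a}
    b = {normalise(r) for r in set_b}
    union = a | b
    # Two empty sets are treated as identical by convention.
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(restraints_by_program):
    """Jaccard overlap for every pair of programs, keyed by the sorted name pair."""
    return {
        (p, q): jaccard_overlap(restraints_by_program[p], restraints_by_program[q])
        for p, q in combinations(sorted(restraints_by_program), 2)
    }


# Hypothetical toy input: each atom is a (residue number, atom name) tuple.
example = {
    "program_A": [((5, "HA"), (40, "HN")), ((7, "HB2"), (52, "HD1"))],
    "program_B": [((40, "HN"), (5, "HA")), ((12, "HN"), (60, "HA"))],
}
overlaps = pairwise_overlap(example)
```

In this toy case the two programs share one restraint out of three distinct ones. A real comparison would also have to decide how to treat ambiguous restraints and stereo-equivalent atoms, which this sketch ignores.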
Subjects
Models, Molecular; Nuclear Magnetic Resonance, Biomolecular/methods; Protein Conformation; Proteins/chemistry; Reproducibility of Results; Software
ABSTRACT
The second round of the community-wide initiative Critical Assessment of automated Structure Determination of Proteins by NMR (CASD-NMR-2013) comprised ten blind target datasets, consisting of unprocessed spectral data, assigned chemical shift lists and unassigned NOESY peak and RDC lists, that were made available in either curated (i.e. manually refined) or un-curated (i.e. automatically generated) form. Ten structure calculation programs, using fully automated protocols only, generated a total of 164 three-dimensional structures (entries) for the ten targets, sometimes using both curated and un-curated lists to generate multiple entries for a single target. The accuracy of the entries could be established by comparing them to the corresponding manually solved structure of each target, which was not available at the time the data were provided. Across the entire data set, 71% of all entries submitted achieved an accuracy relative to the reference NMR structure better than 1.5 Å. Methods based on NOESY peak lists achieved even better results, with up to 100% of the entries within the 1.5 Å threshold for some programs. However, some methods did not converge for some targets using un-curated NOESY peak lists. Over 90% of the entries achieved an accuracy better than the more relaxed threshold of 2.5 Å that was used in the previous CASD-NMR-2010 round. Comparisons between entries generated with un-curated versus curated peaks show only marginal improvements for the latter in those cases where both calculations converged.
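The threshold statistics quoted above amount to counting the entries whose deviation from the reference structure falls below a cut-off. A minimal sketch follows, assuming that accuracy is expressed as one RMSD value per entry in Å; the function name and the sample values are hypothetical and are not CASD-NMR results.

```python
def fraction_within(rmsds, threshold):
    """Return the fraction of entries whose RMSD to the reference is below threshold (in Å)."""
    if not rmsds:
        raise ValueError("no entries supplied")
    return sum(1 for r in rmsds if r < threshold) / len(rmsds)


# Hypothetical RMSD values in Å, for illustration only.
example_rmsds = [0.8, 1.2, 1.4, 1.9, 2.3, 3.1]

strict = fraction_within(example_rmsds, 1.5)   # the 1.5 Å threshold
relaxed = fraction_within(example_rmsds, 2.5)  # the more relaxed 2.5 Å threshold
```

Entries from calculations that did not converge need a policy of their own (counted as failures or excluded); the sketch leaves that choice to the caller.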
Subjects
Models, Molecular; Nuclear Magnetic Resonance, Biomolecular/methods; Protein Conformation; Proteins/chemistry; Carbon-13 Magnetic Resonance Spectroscopy; Datasets as Topic; Proton Magnetic Resonance Spectroscopy; Reproducibility of Results
ABSTRACT
Biomolecular structures at atomic resolution present a valuable resource for the understanding of biology. NMR spectroscopy accounts for 11% of all structures in the PDB repository. In response to serious problems with the accuracy of some of the NMR-derived structures and in order to facilitate proper analysis of the experimental models, a number of program suites are available. We discuss nine of these tools in this review: PROCHECK-NMR, PSVS, GLM-RMSD, CING, MolProbity, Vivaldi, ResProx, NMR constraints analyzer and QMEAN. We evaluate these programs for their ability to assess the structural quality, restraints and their violations, chemical shifts, peaks and the handling of multi-model NMR ensembles. We document both the input required by the programs and the output they generate. To discuss their relative merits we have applied the tools to two representative examples from the PDB: a small, globular monomeric protein (Staphylococcal nuclease from S. aureus, PDB entry 2kq3) and a small, symmetric homodimeric protein (a region of human myosin-X, PDB entry 2lw9).
Subjects
Nuclear Magnetic Resonance, Biomolecular; Proteins/chemistry; Software; Databases, Protein; Models, Molecular; Nuclear Magnetic Resonance, Biomolecular/methods; Protein Conformation; Reproducibility of Results
ABSTRACT
SUMMARY: We present here the freely available Metabolomics Project resource specifically designed to work under the CcpNmr Analysis program produced by CCPN (Collaborative Computing Project for NMR) (Vranken et al., 2005, The CCPN data model for NMR spectroscopy: development of a software pipeline. Proteins, 59, 687-696). The project consists of a database of assigned 1D and 2D spectra of many common metabolites. The project aims to help the user to analyze and assign 1D and 2D NMR spectra of unknown metabolite mixtures. Spectra of unknown mixtures can be easily superimposed and compared with the database spectra, thus facilitating their assignment and identification. AVAILABILITY: The CCPN Metabolomics Project, together with an annotated example dataset, is freely available via: http://www.ccpn.ac.uk/metabolomics/.
Subjects
Magnetic Resonance Spectroscopy/methods; Metabolomics/methods; Software; Computational Biology/methods; Databases, Factual
ABSTRACT
The Collaborative Computing Project for NMR (CCPN) has built a software framework consisting of the CCPN data model (with APIs) for NMR-related data, the CcpNmr Analysis program and additional tools such as CcpNmr FormatConverter. The open architecture allows for the integration of external software to extend the abilities of the CCPN framework with additional calculation methods. Recently, we have carried out the first steps for integrating our software Computer Simulation of Molecular Structures (COSMOS) into the CCPN framework. The COSMOS-NMR force field unites quantum chemical routines for the calculation of molecular properties with a molecular mechanics force field yielding the relative molecular energies. COSMOS-NMR allows NMR parameters to be introduced as constraints into molecular mechanics calculations. The resulting infrastructure will be made available to the NMR community. As a first application we have tested the evaluation of calculated protein structures using COSMOS-derived 13C Cα and Cβ chemical shifts. In this paper we give an overview of the methodology and a roadmap for future developments and applications.
Subjects
Biological Science Disciplines; Information Storage and Retrieval/methods; Internet; Models, Chemical; Models, Molecular; Software; User-Computer Interface; Computer Simulation; Health Services Research/methods; Information Dissemination/methods; Workflow
ABSTRACT
Solid-state magic-angle-spinning (MAS) NMR of proteins has undergone many rapid methodological developments in recent years, enabling detailed studies of protein structure, function and dynamics. Software development, however, has not kept pace with these advances, and data analysis is mostly performed using tools developed for solution NMR which do not directly address solid-state-specific issues. Here we present additions to the CcpNmr Analysis software package which enable easier identification of spinning side bands, straightforward analysis of double quantum spectra, automatic consideration of non-uniform labelling schemes, as well as extension of other existing features to the needs of solid-state MAS data. To underpin this, we have updated and extended the CCPN data model and experiment descriptions to include transfer types and nomenclature appropriate for solid-state NMR experiments, as well as a set of experiment prototypes covering the experiments commonly employed by solid-state MAS protein NMR spectroscopists. This work not only improves solid-state MAS NMR data analysis but provides a platform for anyone who uses the CCPN data model for programming, data transfer, or data archival involving solid-state MAS NMR data.
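As background to the identification of spinning side bands: under MAS, side bands appear at integer multiples of the spinning frequency on either side of the centre band, and an offset in Hz divided by the Larmor frequency of the observed nucleus in MHz gives the offset in ppm. The sketch below only illustrates that arithmetic. It is not the CcpNmr Analysis implementation, and the function name and parameters are hypothetical.

```python
def sideband_positions(centre_ppm, mas_rate_hz, larmor_mhz, max_order=2):
    """Return {order: chemical shift in ppm} for the spinning side bands of a centre band.

    A side band of order n lies n * mas_rate_hz away from the centre band in Hz;
    dividing by the Larmor frequency in MHz converts that offset to ppm.
    """
    if larmor_mhz <= 0:
        raise ValueError("larmor_mhz must be positive")
    step_ppm = mas_rate_hz / larmor_mhz
    return {
        n: centre_ppm + n * step_ppm
        for n in range(-max_order, max_order + 1)
        if n != 0  # order 0 is the centre band itself
    }


# Illustrative numbers: a centre band at 175 ppm, 12 kHz spinning and a
# 150 MHz Larmor frequency give a side-band spacing of 80 ppm.
positions = sideband_positions(175.0, 12000.0, 150.0, max_order=1)
```

A peak lying close to one of the returned positions is a side-band candidate. Changing the spinning rate and checking that the candidate moves accordingly is the usual experimental confirmation.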
Subjects
Nuclear Magnetic Resonance, Biomolecular/methods; Proteins/chemistry; Software; Finite Element Analysis; Models, Chemical; Statistics as Topic
ABSTRACT
We present a suite of software for the complete and easy deposition of NMR data to the PDB and BMRB. This suite uses the CCPN framework and introduces a freely downloadable, graphical desktop application called CcpNmr Entry Completion Interface (ECI) for the secure editing of experimental information and associated datasets through the lifetime of an NMR project. CCPN projects can be created within the CcpNmr Analysis software or by importing existing NMR data files using the CcpNmr FormatConverter. After further data entry and checking with the ECI, the project can then be rapidly deposited to the PDBe using AutoDep, or exported as a complete deposition NMR-STAR file. In full CCPN projects created with ECI, it is straightforward to select chemical shift lists, restraint data sets, structural ensembles and all relevant associated experimental collection details, which all are or will become mandatory when depositing to the PDB. Instructions and download information for the ECI are available from the PDBe web site at http://www.ebi.ac.uk/pdbe/nmr/deposition/eci.html .
Subjects
Database Management Systems; Databases, Protein; Nuclear Magnetic Resonance, Biomolecular; Proteins/chemistry; Proteins/classification; User-Computer Interface
ABSTRACT
To address data management and data exchange problems in the nuclear magnetic resonance (NMR) community, the Collaborative Computing Project for the NMR community (CCPN) created a "Data Model" that describes all the different types of information needed in an NMR structural study, from molecular structure and NMR parameters to coordinates. This paper describes the development of a set of software applications that use the Data Model and its associated libraries, thus validating the approach. These applications are freely available and provide a pipeline for high-throughput analysis of NMR data. Three programs work directly with the Data Model: CcpNmr Analysis, an entirely new analysis and interactive display program, the CcpNmr FormatConverter, which allows transfer of data from programs commonly used in NMR to and from the Data Model, and the CLOUDS software for automated structure calculation and assignment (Carnegie Mellon University), which was rewritten to interact directly with the Data Model. The ARIA 2.0 software for structure calculation (Institut Pasteur) and the QUEEN program for validation of restraints (University of Nijmegen) were extended to provide conversion of their data to the Data Model. During these developments the Data Model has been thoroughly tested and used, demonstrating that applications can successfully exchange data via the Data Model. The software architecture developed by CCPN is now ready for new developments, such as integration with additional software applications and extensions of the Data Model into other areas of research.
Subjects
Databases, Protein; Magnetic Resonance Spectroscopy/methods; Software; Computer Graphics; Magnetic Resonance Spectroscopy/instrumentation; Models, Theoretical
ABSTRACT
In recent years the amount of biological data has exploded to the point where much useful information can only be extracted by complex computational analyses. Such analyses are greatly facilitated by metadata standards, both in terms of the ability to compare data originating from different sources, and in terms of exchanging data in standard forms, e.g. when running processes on a distributed computing infrastructure. However, standards thrive on stability whereas science is constantly moving, with new methods being developed and old ones modified. Maintaining both the metadata standards and all the code that is required to make them useful is therefore a non-trivial problem. Memops is a framework that uses an abstract definition of the metadata (described in UML) to generate internal data structures and subroutine libraries for data access (application programming interfaces, or APIs, currently in Python, C and Java) and data storage (in XML files or databases). For the individual project these libraries obviate the need for writing code for input parsing, validity checking or output. Memops also ensures that the code is always internally consistent, massively reducing the need for code reorganisation. Across a scientific domain a Memops-supported data model makes it easier to support complex standards that can capture all the data produced in a scientific area, share them among all programs in a complex software pipeline, and carry them forward to deposition in an archive. The principles behind the Memops generation code will be presented, along with example applications in Nuclear Magnetic Resonance (NMR) spectroscopy and structural biology.
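The idea of deriving access, validity-checking and storage code from a single abstract model can be sketched in a few lines. This is a deliberately simplified stand-in and not Memops itself: Memops generates source code from a UML model, whereas the sketch builds classes at run time from a plain dictionary. The model contents and the names `MODEL` and `make_class` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical abstract model: class name -> {attribute name: expected Python type}.
MODEL = {"Peak": {"position_ppm": float, "height": float, "annotation": str}}


def make_class(name, attributes):
    """Build a class whose constructor validates its input and whose to_xml() serialises it."""

    def __init__(self, **values):
        # Validity checking is derived from the model, not written by hand per class.
        missing = set(attributes) - set(values)
        extra = set(values) - set(attributes)
        if missing or extra:
            raise ValueError(f"{name}: missing={sorted(missing)}, unexpected={sorted(extra)}")
        for attr, expected in attributes.items():
            if not isinstance(values[attr], expected):
                raise TypeError(f"{name}.{attr} must be {expected.__name__}")
            setattr(self, attr, values[attr])

    def to_xml(self):
        # Storage code follows the same model, so it cannot drift out of step with it.
        element = ET.Element(name)
        for attr in attributes:
            ET.SubElement(element, attr).text = str(getattr(self, attr))
        return ET.tostring(element, encoding="unicode")

    return type(name, (object,), {"__init__": __init__, "to_xml": to_xml})


generated = {name: make_class(name, attrs) for name, attrs in MODEL.items()}
Peak = generated["Peak"]
peak = Peak(position_ppm=8.25, height=1200.0, annotation="Ala12 H")
xml_text = peak.to_xml()
```

Adding an attribute to `MODEL` changes the constructor checks and the XML output together, which is the consistency property the abstract attributes to generated code.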
Subjects
Computational Biology/methods; Databases, Factual; Electronic Data Processing; Software; Magnetic Resonance Spectroscopy/standards; Reference Standards
ABSTRACT
Despite ongoing efforts in organising NMR information, there is no consistent and well-described generic standard for naming NMR experiments. The main reason for the absence of a universal naming system is that the information content of the coherence pathways is difficult to describe in full detail. We propose a system that describes the common and generic elements of the coherence pathways produced by pulse sequences. The system itself is formalised by an 'NMR experiment protocol' model, which is described in the Unified Modeling Language (UML) as part of the CCPN data model. Furthermore, normalized experiment names can be derived from this proposed model. We hope this article will stimulate discussion to organise the wealth of NMR experiments, and that by bringing this discussion into the public domain we can improve and expand our proposed system to include as much information and as many NMR experiments as possible.
Subjects
Information Storage and Retrieval/methods; Magnetic Resonance Spectroscopy/standards; Terminology as Topic; Databases as Topic; Research; Vocabulary, Controlled
ABSTRACT
MOTIVATION: The lack of standards for storage and exchange of data is a serious hindrance for the large-scale data deposition, data mining and program interoperability that is becoming increasingly important in bioinformatics. The problem lies not only in defining and maintaining the standards, but also in convincing scientists and application programmers with a wide variety of backgrounds and interests to adhere to them. RESULTS: We present a UML-based programming framework for the modeling of data and the automated production of software to manipulate that data. Our approach allows one to make an abstract description of the structure of the data used in a particular scientific field and then use it to generate fully functional computer code for data access and input/output routines for data storage, together with accompanying documentation. This code can be generated simultaneously for different programming languages from a single model, together with format descriptions and I/O libraries for, for example, XML and various relational databases. The framework is entirely general and could be applied in any subject area. We have used this approach to generate a data exchange standard for structural biology and analysis software for macromolecular NMR spectroscopy. AVAILABILITY: The framework is available under the GPL license, the data exchange standard with generated subroutine libraries under the LGPL license. Both may be found at http://www.ccpn.ac.uk and http://sourceforge.net/projects/ccpn. CONTACT: ccpn@mole.bio.cam.ac.uk.