Abstract
For many macromolecular NMR ensembles from the Protein Data Bank (PDB) the experiment-based restraint lists are available, while other experimental data, mainly chemical shift values, are often available from the BioMagResBank. The accuracy and precision of the coordinates in these macromolecular NMR ensembles can be improved by recalculation using the available experimental data and present-day software. Such efforts, however, generally fail on half of all NMR ensembles due to the syntactic and semantic heterogeneity of the underlying data and the wide variety of formats used for their deposition. We have combined the remediated restraint information from our NMR Restraints Grid (NRG) database with available chemical shifts from the BioMagResBank and the Common Interface for NMR structure Generation (CING) structure validation reports into the weekly updated NRG-CING database (http://nmr.cmbi.ru.nl/NRG-CING). Eleven programs have been included in the NRG-CING production pipeline to arrive at validation reports that list for each entry the potential inconsistencies between the coordinates and the available experimental NMR data. The longitudinal validation of these data in a publicly available relational database yields a set of indicators that can be used to judge the quality of every macromolecular structure solved with NMR. The remediated NMR experimental data sets and validation reports are freely available online.
Subject(s)
Databases, Protein; Nuclear Magnetic Resonance, Biomolecular; Protein Conformation; Reproducibility of Results; Systems Integration
Abstract
The BioMagResBank (BMRB: www.bmrb.wisc.edu) is a repository for experimental and derived data gathered from nuclear magnetic resonance (NMR) spectroscopic studies of biological molecules. BMRB is a partner in the Worldwide Protein Data Bank (wwPDB). The BMRB archive consists of four main data depositories: (i) quantitative NMR spectral parameters for proteins, peptides, nucleic acids, carbohydrates and ligands or cofactors (assigned chemical shifts, coupling constants and peak lists) and derived data (relaxation parameters, residual dipolar couplings, hydrogen exchange rates, pK(a) values, etc.), (ii) databases for NMR restraints processed from original author depositions available from the Protein Data Bank, (iii) time-domain (raw) spectral data from NMR experiments used to assign spectral resonances and determine the structures of biological macromolecules and (iv) a database of one- and two-dimensional (1)H and (13)C NMR spectra for over 250 metabolites. The BMRB website provides free access to all of these data. BMRB has tools for querying the archive and retrieving information, and an ftp site (ftp.bmrb.wisc.edu) from which data in the archive can be downloaded in bulk. Two BMRB mirror sites exist: one at the PDBj, Protein Research Institute, Osaka University, Osaka, Japan (bmrb.protein.osaka-u.ac.jp) and the other at CERM, University of Florence, Florence, Italy (bmrb.postgenomicnmr.net/). The site at Osaka also accepts and processes data depositions.
Subject(s)
Databases, Factual; Nuclear Magnetic Resonance, Biomolecular; Carbohydrates/chemistry; Internet; Ligands; Nucleic Acids/chemistry; Peptides/chemistry; Proteins/chemistry; User-Computer Interface
Abstract
Several pilot experiments have indicated that improvements in older NMR structures can be expected by applying modern software and new protocols (Nabuurs et al. in Proteins 55:483-486, 2004; Nederveen et al. in Proteins 59:662-672, 2005; Saccenti and Rosato in J Biomol NMR 40:251-261, 2008). A recent large-scale X-ray study has also shown that modern software can significantly improve the quality of X-ray structures that were deposited more than a few years ago (Joosten et al. in J Appl Crystallogr 42:376-384, 2009; Sanderson in Nature 459:1038-1039, 2009). Recalculation of three-dimensional coordinates requires that the original experimental data are available and complete, and are semantically and syntactically correct, or are at least correct enough to be reconstructed. For multiple reasons, including a lack of standards, the heterogeneity of the experimental data and the many NMR experiment types, it has not been practical to parse a large proportion of the originally deposited NMR experimental data files related to protein NMR structures. This has made the automatic recalculation, and thus improvement, of the three-dimensional coordinates of these structures impractical. We here describe a large-scale international collaborative effort to make all deposited experimental NMR data semantically and syntactically homogeneous, and thus useful for further research. A total of 4,014 out of 5,266 entries were 'cleaned' in this process. For 1,387 entries, human intervention was needed. Continuous efforts in automating the parsing of both old and newly deposited files are steadily decreasing this fraction. The cleaned data files are available from the NMR restraints grid at http://restraintsgrid.bmrb.wisc.edu.
Subject(s)
Databases, Protein/standards; Nuclear Magnetic Resonance, Biomolecular/methods; Nucleic Acids; Proteins; Electronic Data Processing; Humans; Reference Standards; Software
Abstract
Assignment of individual compound identities within mixtures of thousands of metabolites in biological extracts is a major challenge for metabolomic technology. Mass spectrometry offers high sensitivity over a large dynamic range of abundances and molecular weights but is limited in its capacity to discriminate isobaric compounds. In this article, we have extended earlier studies using isotopic labeling for elemental composition elucidation (Rodgers, R. P.; Blumer, E. N.; Hendrickson, C. L.; Marshall, A. G. J. Am. Soc. Mass Spectrom. 2000, 11, 835-40) to limit the formulas consistent with any exact mass measurement by comparing observations of metabolites extracted from Arabidopsis thaliana plants grown with (I) (12)C and (14)N (natural abundance), (II) (12)C and (15)N, (III) (13)C and (14)N, or (IV) (13)C and (15)N. Unique elemental compositions were determined over a dramatically enhanced mass range by analyzing exact mass measurement data from the four extracts using two methods. In the first, metabolite masses were matched with a library of 11,000 compounds known to be present in living cells by using values calculated for each of the four isotopic conditions. In the second method, metabolite masses were searched against masses calculated for a constrained subset of possible atomic combinations in all four isotopic regimes. In both methods, the lists of elemental compositions from each labeling regime were compared to find common formulas with similar retention properties by HPLC in at least three of the four regimes. These results demonstrate that metabolic labeling can be used to provide additional constraints for higher confidence formula assignments over an extended mass range.
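The formula-filtering idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the ppm tolerance, the restriction to C, H, N and O, the use of neutral monoisotopic masses (no adducts or charge), and the omission of the HPLC retention check are all assumptions made for illustration. The isotope masses are standard reference values. A candidate formula is kept only if its calculated mass agrees with the observed mass in at least three of the four labeling regimes.

```python
import re

# Monoisotopic masses in unified atomic mass units (standard reference values).
MASS = {
    "H": 1.00782503207,
    "O": 15.99491461956,
    "12C": 12.0,
    "13C": 13.0033548378,
    "14N": 14.0030740048,
    "15N": 15.0001088982,
}

# The four labeling regimes named in the abstract: (carbon isotope, nitrogen isotope).
REGIMES = {
    "I": ("12C", "14N"),
    "II": ("12C", "15N"),
    "III": ("13C", "14N"),
    "IV": ("13C", "15N"),
}


def parse_formula(formula):
    """Parse a simple formula such as 'C5H9NO4' into C/H/N/O counts.

    Hypothetical helper: only C, H, N and O are supported in this sketch.
    """
    counts = {"C": 0, "H": 0, "N": 0, "O": 0}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def regime_mass(formula, regime):
    """Neutral monoisotopic mass of a fully labeled formula in one regime."""
    carbon, nitrogen = REGIMES[regime]
    c = parse_formula(formula)
    return (c["C"] * MASS[carbon] + c["N"] * MASS[nitrogen]
            + c["H"] * MASS["H"] + c["O"] * MASS["O"])


def consistent_formulas(observed, candidates, tol_ppm=5.0, min_regimes=3):
    """Keep candidates that match the observed mass in >= min_regimes regimes.

    observed maps a regime name ('I'..'IV') to the measured mass in that extract.
    """
    hits = []
    for formula in candidates:
        matched = 0
        for regime, obs in observed.items():
            calc = regime_mass(formula, regime)
            if abs(obs - calc) / calc * 1e6 <= tol_ppm:
                matched += 1
        if matched >= min_regimes:
            hits.append(formula)
    return hits


# Simulated observations for C5H9NO4 (illustrative values, not measured data).
observed = {r: regime_mass("C5H9NO4", r) for r in REGIMES}
candidates = ["C5H9NO4", "C4H9N3O3"]
print(consistent_formulas(observed, candidates, tol_ppm=100.0))
```

In this simulated case a deliberately loose 100 ppm tolerance lets both candidates match in the unlabeled regime, but the second candidate carries a different number of carbon and nitrogen atoms, so its mass shifts differently under (13)C and (15)N labeling and it drops out once agreement in three regimes is required.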
Subject(s)
Arabidopsis/chemistry; Chlorophyll/metabolism; Elements; Isotope Labeling; Plant Leaves/chemistry; Carbon Isotopes; Chromatography, High Pressure Liquid; Databases as Topic; Mass Spectrometry; Methanol/chemistry; Models, Theoretical; Molecular Structure; Nitrogen Isotopes; Plant Leaves/metabolism
Abstract
We recently developed two databases and a laboratory information system as resources for the metabolomics community. These tools are freely available and are intended to ease data analysis in both MS- and NMR-based metabolomics studies. The first database is a metabolomics extension to the BioMagResBank (BMRB, http://www.bmrb.wisc.edu), which currently contains experimental spectral data on over 270 pure compounds. Each small molecule entry consists of five or six one- and two-dimensional NMR data sets, along with information about the source of the compound, solution conditions, data collection protocol and the NMR pulse sequences. Users have free access to peak lists, spectra, and original time-domain data. The BMRB database can be queried by name, monoisotopic mass and chemical shift. We are currently developing a deposition tool that will enable people in the community to add their own data to this resource. Our second database, the Madison Metabolomics Consortium Database (MMCD, available from http://mmcd.nmrfam.wisc.edu/), is a hub for information on over 10,000 metabolites. These data were collected from a variety of sites with an emphasis on metabolites found in Arabidopsis. The MMCD supports extensive search functions and allows users to make bulk queries using experimental MS and/or NMR data. In addition to these databases, we have developed a new module for the Sesame laboratory information management system (http://www.sesame.wisc.edu) that captures all of the experimental protocols, background information, and experimental data associated with metabolomics samples. Sesame was designed to help coordinate research efforts in laboratories with high sample throughput and multiple investigators and to track all of the actions that have taken place in a particular study.