1.
Nucleic Acids Res ; 51(D1): D368-D376, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36478084

ABSTRACT

The Biological Magnetic Resonance Data Bank (BMRB, https://bmrb.io) is the international open data repository for biomolecular nuclear magnetic resonance (NMR) data. Comprised of both empirical and derived data, BMRB has applications in the study of biomacromolecular structure and dynamics, biomolecular interactions, drug discovery, intrinsically disordered proteins, natural products, biomarkers, and metabolomics. Advances including GHz-class NMR instruments, national and trans-national NMR cyberinfrastructure, hybrid structural biology methods and machine learning are driving increases in the amount, type, and applications of NMR data in the biosciences. BMRB is a Core Archive and member of the World-wide Protein Data Bank (wwPDB).


Subject(s)
Databases, Chemical , Magnetic Resonance Spectroscopy , Databases, Protein , Nuclear Magnetic Resonance, Biomolecular , Protein Conformation
2.
Biophys J ; 112(8): 1529-1534, 2017 Apr 25.
Article in English | MEDLINE | ID: mdl-28445744

ABSTRACT

Advances in computation have enabled many recent advances in biomolecular applications of NMR. Due to the wide diversity of applications of NMR, the number and variety of software packages for processing and analyzing NMR data are quite large, with labs relying on dozens, if not hundreds, of software packages. Discovery, acquisition, installation, and maintenance of all these packages is a burdensome task. Because the majority of software packages originate in academic labs, persistence of the software is compromised when developers graduate, funding ceases, or investigators turn to other projects. To simplify access to and use of biomolecular NMR software, foster persistence, and enhance reproducibility of computational workflows, we have developed NMRbox, a shared resource for NMR software and computation. NMRbox employs virtualization to provide a comprehensive software environment preconfigured with hundreds of software packages, available as a downloadable virtual machine or as a Platform-as-a-Service supported by a dedicated compute cloud. Ongoing development includes a metadata harvester to regularize, annotate, and preserve workflows and facilitate and enhance data depositions to BioMagResBank, and tools for Bayesian inference to enhance the robustness and extensibility of computational analyses. In addition to facilitating use and preservation of the rich and dynamic software environment for biomolecular NMR, NMRbox fosters the development and deployment of a new class of metasoftware packages. NMRbox is freely available to not-for-profit users.


Subject(s)
Nuclear Magnetic Resonance, Biomolecular , Software , Access to Information , Bayes Theorem , Cloud Computing , Internet , Metadata
3.
Libr Trends ; 65(4): 555-562, 2017.
Article in English | MEDLINE | ID: mdl-29375158

ABSTRACT

The era of big data and ubiquitous computation has brought with it concerns about ensuring reproducibility in this new research environment. It is easy to assume that computational methods are self-documenting because they are exact, deterministic processes. However, as with laboratory experiments, ensuring reproducibility in the computational realm requires documenting both the protocols used (workflows) and the computational environment in detail: algorithms, implementations, software environments, the data ingested, and execution logs of the computation. These two aspects of computational reproducibility (workflows and execution details) are discussed in the context of biomolecular Nuclear Magnetic Resonance spectroscopy (bioNMR) as well as the PRIMAD model for computational reproducibility.

4.
J Biomol NMR ; 63(2): 141-50, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26253947

ABSTRACT

Reproducibility is a cornerstone of the scientific method, essential for validation of results by independent laboratories and the sine qua non of scientific progress. A key step toward reproducibility of biomolecular NMR studies was the establishment of public data repositories (PDB and BMRB). Nevertheless, bio-NMR studies routinely fall short of the requirement for reproducibility that all the data needed to reproduce the results are published. A key limitation is that considerable metadata goes unpublished, notably manual interventions that are typically applied during the assignment of multidimensional NMR spectra. A general solution to this problem has been elusive, in part because of the wide range of approaches and software packages employed in the analysis of protein NMR spectra. Here we describe an approach for capturing missing metadata during the assignment of protein NMR spectra that can be generalized to arbitrary workflows, different software packages, other biomolecules, or other stages of data analysis in bio-NMR. We also present extensions to the NMR-STAR data dictionary that enable machine archival and retrieval of the "missing" metadata.


Subject(s)
Nuclear Magnetic Resonance, Biomolecular , Proteins/chemistry , Computational Biology/methods , Databases, Protein , Humans , Nuclear Magnetic Resonance, Biomolecular/methods , Reproducibility of Results
5.
J Biomol NMR ; 62(3): 313-26, 2015 Jul.
Article in English | MEDLINE | ID: mdl-26066803

ABSTRACT

CONNJUR Workflow Builder (WB) is an open-source software integration environment that leverages existing spectral reconstruction tools to create a synergistic, coherent platform for converting biomolecular NMR data from the time domain to the frequency domain. WB provides data integration of primary data and metadata using a relational database, and includes a library of pre-built workflows for processing time domain data. WB simplifies maximum entropy reconstruction, facilitating the processing of non-uniformly sampled time domain data. As will be shown in the paper, the unique features of WB provide it with novel abilities to enhance the quality, accuracy, and fidelity of the spectral reconstruction process. WB also provides features which promote collaboration, education, parameterization, and non-uniform data sets along with processing integrated with the Rowland NMR Toolkit (RNMRTK) and NMRPipe software packages. WB is available free of charge in perpetuity, dual-licensed under the MIT and GPL open source licenses.


Subject(s)
Nuclear Magnetic Resonance, Biomolecular/methods , Software , Computational Biology , User-Computer Interface
6.
Nucleic Acids Res ; 40(Database issue): D252-60, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22146221

ABSTRACT

Minimotif Miner (MnM available at http://minimotifminer.org or http://mnm.engr.uconn.edu) is an online database for identifying new minimotifs in protein queries. Minimotifs are short contiguous peptide sequences that have a known function in at least one protein. Here we report the third release of the MnM database which has now grown 60-fold to approximately 300,000 minimotifs. Since short minimotifs are by their nature not very complex we also summarize a new set of false-positive filters and linear regression scoring that vastly enhance minimotif prediction accuracy on a test data set. This online database can be used to predict new functions in proteins and causes of disease.


Subject(s)
Amino Acid Motifs , Databases, Protein , Amino Acid Sequence , Consensus Sequence , Models, Biological , Protein Interaction Maps , Proteins/genetics , Sequence Analysis, Protein
7.
Comput Sci Eng ; 15(1): 76-83, 2012 May 01.
Article in English | MEDLINE | ID: mdl-24634607

ABSTRACT

The problem of formatting data so that it conforms to the required input for scientific data processing tools pervades scientific computing. The CONNecticut Joint University Research Group (CONNJUR) has developed a data translation tool based on a pipeline architecture that partially solves this problem. The CONNJUR Spectrum Translator supports data format translation for experiments that use Nuclear Magnetic Resonance to determine the structure of large protein molecules.

8.
J Biomol NMR ; 50(1): 83-9, 2011 May.
Article in English | MEDLINE | ID: mdl-21409563

ABSTRACT

NMR spectroscopists are hindered by the lack of standardization for spectral data among the file formats for various NMR data processing tools. This lack of standardization is cumbersome, as researchers must perform their own file conversion in order to switch between processing tools, and it also restricts the combination of tools employed if no conversion option is available. The CONNJUR Spectrum Translator introduces a new, extensible architecture for spectrum translation and introduces two key algorithmic improvements. The first is translation of NMR spectral data (time and frequency domain) to a single in-memory data model to allow addition of new file formats with two converter modules, a reader and a writer, instead of writing a separate converter to each existing format. The second is the use of layout descriptors, which allows a single fid data translation engine to be used for all formats. For the end user, sophisticated metadata readers allow conversion of the majority of files with minimum user configuration. The open source code is freely available at http://connjur.sourceforge.net for inspection and extension.


Subject(s)
Magnetic Resonance Spectroscopy/methods , Software , Algorithms , User-Computer Interface
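
The reader/writer design described in the abstract above can be illustrated with a minimal sketch. It is not the CONNJUR code: the `Spectrum` class, the toy formats "a" and "b", and every function name here are assumptions made for illustration. The point is only that each format contributes one reader and one writer around a shared in-memory model, so N formats need 2N modules rather than one converter per format pair.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Spectrum:
    """Hypothetical in-memory model: dimension sizes plus a flat list of points."""
    sizes: List[int]
    points: List[float]


# Registries keyed by format name; each format adds one reader and one writer.
READERS: Dict[str, Callable[[str], Spectrum]] = {}
WRITERS: Dict[str, Callable[[Spectrum], str]] = {}


def read_a(text: str) -> Spectrum:
    # Toy format "a": comma-separated values on a single line.
    values = [float(v) for v in text.strip().split(",")]
    return Spectrum(sizes=[len(values)], points=values)


def write_a(spec: Spectrum) -> str:
    return ",".join(repr(v) for v in spec.points)


def read_b(text: str) -> Spectrum:
    # Toy format "b": one value per line.
    values = [float(v) for v in text.split()]
    return Spectrum(sizes=[len(values)], points=values)


def write_b(spec: Spectrum) -> str:
    return "\n".join(repr(v) for v in spec.points)


READERS.update({"a": read_a, "b": read_b})
WRITERS.update({"a": write_a, "b": write_b})


def translate(text: str, source: str, target: str) -> str:
    """Convert between any two registered formats through the shared model."""
    return WRITERS[target](READERS[source](text))
```

Under these assumptions, supporting a third format means registering one more reader and one more writer; `translate` itself does not change.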
9.
Nucleic Acids Res ; 37(18): e124, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19656955

ABSTRACT

Residue conservation is an important, established method for inferring protein function, modularity and specificity. It is important to recognize that it is the 3D spatial orientation of residues that drives sequence conservation. Considering this, we have built a new computational tool, VENN that allows researchers to interactively and graphically titrate sequence homology onto surface representations of protein structures. Our proposed titration strategies reveal critical details that are not readily identified using other existing tools. Analyses of a bZIP transcription factor and receptor recognition of Fibroblast Growth Factor using VENN revealed key specificity determinants. Weblink: http://sbtools.uchc.edu/venn/.


Subject(s)
Protein Conformation , Sequence Homology, Amino Acid , Software , Amino Acid Sequence , CCAAT-Enhancer-Binding Protein-beta/chemistry , Conserved Sequence , Fibroblast Growth Factor 8/chemistry , Humans , Models, Molecular
10.
Nucleic Acids Res ; 37(Database issue): D185-90, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18978024

ABSTRACT

Minimotif Miner (MnM) consists of a minimotif database and a web-based application that enables prediction of motif-based functions in user-supplied protein queries. We have revised MnM by expanding the database more than 10-fold to approximately 5000 motifs and standardized the motif function definitions. The web-application user interface has been redeveloped with new features including improved navigation, screencast-driven help, support for alias names and expanded SNP analysis. A sample analysis of prion shows how MnM 2 can be used. Weblink: http://mnm.engr.uconn.edu, weblink for version 1 is http://sms.engr.uconn.edu.


Subject(s)
Amino Acid Motifs , Databases, Protein , Protein Interaction Domains and Motifs , Amino Acid Motifs/genetics , Internet , Polymorphism, Single Nucleotide , Prions , Sequence Analysis, Protein , User-Computer Interface
11.
Article in English | MEDLINE | ID: mdl-36312518

ABSTRACT

The STAR (Self-defining Text Archival and Retrieval) file format for electronic data transfer and archiving was introduced in 1991 (Hall, 1991). This format was designed to be extensible and flexible to handle all types of data in a machine independent manner. As a file format, STAR encompasses both a model (structure) for the information contained within the file as well as a syntax for defining the layout of the information within the file (serialization). This manuscript reports on an attempt to decompose the model from the layout and in doing so, both highlight differences between variants and versions of STAR as well as propose a simple alternate serialization of the STAR model in XML.
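
The separation of model from layout can be sketched as follows. This is an illustrative assumption, not the serialization proposed in the manuscript: the `DataBlock` and `Loop` classes, the XML element names, and the simplified STAR-like text (which ignores quoting of values containing whitespace) are all hypothetical. The sketch only shows one in-memory model being written out in two different layouts.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Loop:
    """Hypothetical model of a table: column tags plus rows of values."""
    tags: List[str]
    rows: List[List[str]]


@dataclass
class DataBlock:
    """Hypothetical model of a named block of single items and loops."""
    name: str
    items: Dict[str, str] = field(default_factory=dict)
    loops: List[Loop] = field(default_factory=list)


def to_star_like(block: DataBlock) -> str:
    # Simplified STAR-like layout; real STAR quoting rules are not handled.
    lines = [f"data_{block.name}"]
    for tag, value in block.items.items():
        lines.append(f"_{tag} {value}")
    for loop in block.loops:
        lines.append("loop_")
        lines.extend(f"_{tag}" for tag in loop.tags)
        lines.extend(" ".join(row) for row in loop.rows)
        lines.append("stop_")
    return "\n".join(lines)


def to_xml(block: DataBlock) -> str:
    # Element and attribute names are illustrative, not a published schema.
    root = ET.Element("dataBlock", {"name": block.name})
    for tag, value in block.items.items():
        ET.SubElement(root, "item", {"tag": tag}).text = value
    for loop in block.loops:
        loop_el = ET.SubElement(root, "loop")
        for row in loop.rows:
            row_el = ET.SubElement(loop_el, "row")
            for tag, value in zip(loop.tags, row):
                ET.SubElement(row_el, "value", {"tag": tag}).text = value
    return ET.tostring(root, encoding="unicode")
```

Because both functions read the same `DataBlock`, a difference between two text variants would show up as a difference in the serializer alone, leaving the model untouched.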

12.
Front Mol Biosci ; 8: 817175, 2021.
Article in English | MEDLINE | ID: mdl-35111815

ABSTRACT

The Biological Magnetic Resonance Data Bank (BMRB) has served the NMR structural biology community for 40 years, and has been instrumental in the development of many widely-used tools. It fosters the reuse of data resources in structural biology by embodying the FAIR data principles (Findable, Accessible, Inter-operable, and Re-usable). NMRbox is less than a decade old, but complements BMRB by providing NMR software and high-performance computing resources, facilitating the reuse of software resources. BMRB and NMRbox both facilitate reproducible research. NMRbox also fosters the development and deployment of complex meta-software. Combining BMRB and NMRbox helps speed and simplify workflows that utilize BMRB, and enables facile federation of BMRB with other data repositories. Utilization of BMRB and NMRbox in tandem will enable additional advances, such as machine learning, that are poised to become increasingly powerful.

13.
BMC Bioinformatics ; 11: 328, 2010 Jun 16.
Article in English | MEDLINE | ID: mdl-20565705

ABSTRACT

BACKGROUND: Minimotifs are short peptide sequences within one protein, which are recognized by other proteins or molecules. While there are now several minimotif databases, they are incomplete. There are reports of many minimotifs in the primary literature, which have yet to be annotated, while entirely novel minimotifs continue to be published on a weekly basis. Our recently proposed function and sequence syntax for minimotifs enables us to build a general tool that will facilitate structured annotation and management of minimotif data from the biomedical literature. RESULTS: We have built the MimoSA application for minimotif annotation. The application supports management of the Minimotif Miner database, literature tracking, and annotation of new minimotifs. MimoSA enables the visualization, organization, selection and editing functions of minimotifs and their attributes in the MnM database. For the literature components, MimoSA provides paper status tracking and scoring of papers for annotation through a freely available machine learning approach, which is based on word correlation. The paper scoring algorithm is also available as a separate program, TextMine. Form-driven annotation of minimotif attributes enables entry of new minimotifs into the MnM database. Several supporting features increase the efficiency of annotation. The layered architecture of MimoSA allows for extensibility by separating the functions of paper scoring, minimotif visualization, and database management. MimoSA is readily adaptable to other annotation efforts that manually curate literature into a MySQL database. CONCLUSIONS: MimoSA is an extensible application that facilitates minimotif annotation and integrates with the Minimotif Miner database. We have built MimoSA as an application that integrates dynamic abstract scoring with a high performance relational model of minimotif syntax. MimoSA's TextMine, an efficient paper-scoring algorithm, can be used to dynamically rank papers with respect to context.


Subject(s)
Algorithms , Amino Acid Motifs , Databases, Protein , Proteins/chemistry , Animals , Artificial Intelligence , Data Mining/methods , Humans , Protein Binding , Proteins/metabolism , Sequence Analysis, Protein
14.
Article in English | MEDLINE | ID: mdl-33767737

ABSTRACT

This paper reports on the ongoing activities and curation practices of the National Center for Biomolecular NMR Data Processing and Analysis. Over the past several years, the Center has been developing and extending computational workflow management software for use by a community of biomolecular NMR spectroscopists. Previous work refactored the workflow system to utilize the PREMIS framework for reporting retrospective provenance as well as for sharing workflows between scientists and to support data reuse. In this paper, we report on our recent efforts to embed analytics within the workflow execution and within provenance tracking. Important metrics for each of the intermediate datasets are included within the corresponding PREMIS intellectual object, which allows for both inspection of the operation of individual actors and visualization of the changes throughout a full processing workflow. These metrics can be viewed within the workflow management system or through standalone metadata widgets. Our approach supports a hybrid of automated workflow execution and manual intervention and metadata management. In this combination, the workflow system and metadata widgets encourage the domain experts to be avid curators of the data which they create, fostering both computational reproducibility and scientific data reuse.

15.
BMC Genomics ; 10: 360, 2009 Aug 05.
Article in English | MEDLINE | ID: mdl-19656396

ABSTRACT

BACKGROUND: One of the most important developments in bioinformatics over the past few decades has been the observation that short linear peptide sequences (minimotifs) mediate many classes of cellular functions such as protein-protein interactions, molecular trafficking and post-translational modifications. As both the creators and curators of a database which catalogues minimotifs, Minimotif Miner, the authors have a unique perspective on the commonalities of the many functional roles of minimotifs. There is an obvious usefulness in standardizing functional annotations both in allowing for the facile exchange of data between various bioinformatics resources, as well as the internal clustering of sets of related data elements. With these two purposes in mind, the authors provide a proposed syntax for minimotif semantics primarily useful for functional annotation. RESULTS: Herein, we present a structured syntax of minimotifs and their functional annotation. A syntax-based model of minimotif function with established minimotif sequence definitions was implemented using a relational database management system (RDBMS). To assess the usefulness of our standardized semantics, a series of database queries and stored procedures were used to classify SH3 domain binding minimotifs into 10 groups spanning 700 unique binding sequences. CONCLUSION: Our derived minimotif syntax is currently being used to normalize minimotif covalent chemistry and functional definitions within the MnM database. Analysis of SH3 binding minimotif data spanning many different studies within our database reveals unique attributes and frequencies which can be used to classify different types of binding minimotifs. Implementation of the syntax in the relational database enables the application of many different analysis protocols of minimotif data and is an important tool that will help to better understand specificity of minimotif-driven molecular interactions with proteins.


Subject(s)
Computational Biology/methods , Database Management Systems , Databases, Protein , Amino Acid Motifs , Protein Interaction Domains and Motifs , Semantics
16.
Int J Digit Curation ; 13(1): 286-293, 2018.
Article in English | MEDLINE | ID: mdl-31061674

ABSTRACT

This paper describes our recent and ongoing efforts to enhance the curation of scientific workflows to improve reproducibility and reusability of biomolecular nuclear magnetic resonance (bioNMR) data. Our efforts have focused on both developing a workflow management system, called CONNJUR Workflow Builder (CWB), and refactoring our workflow data model to make use of the PREMIS model for digital preservation. This revised workflow management system will be available through the NMRbox cloud-computing platform for bioNMR. In addition, we are implementing a new file structure which bundles the original binary data files along with PREMIS XML records describing the provenance of the data. These are packaged together using a standardized file archive utility. In this manner, the provenance and data curation information is maintained together with the scientific data. The benefits and limitations of these approaches as well as future directions are discussed.
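
The bundling idea in the abstract above (binary data packaged with an XML provenance record) can be sketched roughly as below. Several things are assumptions: the abstract does not name the archive utility, so ZIP is used here only as an example; the XML element names are placeholders and do not follow the PREMIS schema; and `build_bundle` is a hypothetical helper, not part of CONNJUR Workflow Builder.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_bundle(data_name: str, data: bytes, event: str) -> bytes:
    """Return a ZIP archive holding the raw data plus an XML provenance record."""
    # Placeholder record; element names are illustrative, not the PREMIS schema.
    record = ET.Element("provenance")
    ET.SubElement(record, "object").text = data_name
    ET.SubElement(record, "event").text = event
    xml_bytes = ET.tostring(record, encoding="utf-8")

    # Write both members into one in-memory archive so they travel together.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(data_name, data)
        archive.writestr("provenance.xml", xml_bytes)
    return buffer.getvalue()
```

Keeping the record inside the same archive as the data means a copy of the bundle cannot lose its provenance, which is the property the abstract emphasizes.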

17.
Transform Digit Worlds (2018) ; 10766: 620-625, 2018 Mar.
Article in English | MEDLINE | ID: mdl-30334020

ABSTRACT

Two barriers to computational reproducibility are recording the critical metadata required for rerunning a computation and translating the semantics of the metadata so that alternate approaches can easily be configured for verifying computational reproducibility. We are addressing this problem in the context of biomolecular NMR computational analysis by developing a series of linked ontologies which define the semantics of the various software tools used by researchers for data transformation and analysis. Building from a core ontology representing the primary observational data of NMR, the linked data approach allows for the translation of metadata in order to configure alternate software approaches for given computational tasks. In this paper we illustrate the utility of this with a small sample of the core ontology as well as tool-specific semantics for two third-party software tools. This approach to semantic mediation will help support an automated approach to validating the reliability of computation in which the same processing workflow is implemented with different software tools. In addition, the detailed semantics of both the data and the processing functionalities will provide a method for software tool classification.

18.
BMC Bioinformatics ; 8: 31, 2007 Jan 30.
Article in English | MEDLINE | ID: mdl-17263870

ABSTRACT

BACKGROUND: Scientific workflows improve the process of scientific experiments by making computations explicit, underscoring data flow, and emphasizing the participation of humans in the process when intuition and human reasoning are required. Workflows for experiments also highlight transitions among experimental phases, allowing intermediate results to be verified and supporting the proper handling of semantic mismatches and different file formats among the various tools used in the scientific process. Thus, scientific workflows are important for the modeling and subsequent capture of bioinformatics-related data. While much research has been conducted on the implementation of scientific workflows, the initial process of actually designing and generating the workflow at the conceptual level has received little consideration. RESULTS: We propose a structured process to capture scientific workflows at the conceptual level that allows workflows to be documented efficiently, results in concise models of the workflow and more-correct workflow implementations, and provides insight into the scientific process itself. The approach uses three modeling techniques to model the structural, data flow, and control flow aspects of the workflow. The domain of biomolecular structure determination using Nuclear Magnetic Resonance spectroscopy is used to demonstrate the process. Specifically, we show the application of the approach to capture the workflow for the process of conducting biomolecular analysis using Nuclear Magnetic Resonance (NMR) spectroscopy. CONCLUSION: Using the approach, we were able to accurately document, in a short amount of time, numerous steps in the process of conducting an experiment using NMR spectroscopy. The resulting models are correct and precise, as outside validation of the models identified only minor omissions in the models. In addition, the models provide an accurate visual description of the control flow for conducting biomolecular analysis using an NMR spectroscopy experiment.


Subject(s)
Decision Support Techniques , Documentation/methods , Laboratories/organization & administration , Magnetic Resonance Spectroscopy/methods , Models, Organizational , Research/organization & administration , Science/organization & administration , Science/methods , Software , Workplace/organization & administration
19.
Nucleic Acids Res ; 31(2): 580-8, 2003 Jan 15.
Article in English | MEDLINE | ID: mdl-12527765

ABSTRACT

Human X-ray cross-complementing group 1 (XRCC1) is a single-strand DNA break repair protein which forms a base excision repair (BER) complex with DNA polymerase beta (beta-Pol). Here we report a site-directed mutational analysis in which 16 mutated versions of the XRCC1 N-terminal domain (XRCC1-NTD) were constructed on the basis of previous NMR results that had implicated the proximity of various surface residues to beta-Pol. Mutant proteins defective in XRCC1-NTD interaction with beta-Pol and with a beta-Pol-gapped DNA complex were determined by gel filtration chromatography and a gel mobility shift assay. The interaction surface determined from the mutated residues was found to encompass beta-strand D and E of the five-stranded beta-sheet (betaABGDE) and the protruding alpha2 helix of the XRCC1-NTD. Mutations that included F67A (betaD), E69K (betaD), V86R (betaE) on the five-stranded beta-sheet and deletion of the alpha2 helix, but not mutations within alpha2, abolished binding of the XRCC1-NTD to beta-Pol. A Y136A mutant abolished beta-Pol binding, and a R109S mutant reduced beta-Pol binding. E98K, E98A, N104A, Y136A, R109S, K129E, F142A, R31A/K32A/R34A and delta-helix-2 mutants displayed temperature dependent solubility. These findings confirm the importance of the alpha2 helix and the betaD and betaE strands of XRCC1-NTD to the energetics of beta-Pol binding. Establishing the direct contacts in the beta-Pol XRCC1 complex is a critical step in understanding how XRCC1 fulfills its numerous functions in DNA BER.


Subject(s)
DNA Polymerase beta/metabolism , DNA-Binding Proteins/metabolism , Amino Acid Substitution , Binding Sites/genetics , Chromatography, High Pressure Liquid , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/genetics , Electrophoresis, Polyacrylamide Gel , Magnetic Resonance Spectroscopy , Mutagenesis, Site-Directed , Mutation , Protein Binding , Protein Conformation , Protein Folding , Solubility , Temperature , X-ray Repair Cross Complementing Protein 1