Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 35
Filtrar
Más filtros










Base de datos
Intervalo de año de publicación
1.
Nucleic Acids Res ; 51(D1): D368-D376, 2023 01 06.
Artículo en Inglés | MEDLINE | ID: mdl-36478084

RESUMEN

The Biological Magnetic Resonance Data Bank (BMRB, https://bmrb.io) is the international open data repository for biomolecular nuclear magnetic resonance (NMR) data. Comprised of both empirical and derived data, BMRB has applications in the study of biomacromolecular structure and dynamics, biomolecular interactions, drug discovery, intrinsically disordered proteins, natural products, biomarkers, and metabolomics. Advances including GHz-class NMR instruments, national and trans-national NMR cyberinfrastructure, hybrid structural biology methods and machine learning are driving increases in the amount, type, and applications of NMR data in the biosciences. BMRB is a Core Archive and member of the World-wide Protein Data Bank (wwPDB).


Asunto(s)
Bases de Datos de Compuestos Químicos , Espectroscopía de Resonancia Magnética , Bases de Datos de Proteínas , Resonancia Magnética Nuclear Biomolecular , Conformación Proteica
2.
Front Mol Biosci ; 8: 817175, 2021.
Artículo en Inglés | MEDLINE | ID: mdl-35111815

RESUMEN

The Biological Magnetic Resonance Data Bank (BMRB) has served the NMR structural biology community for 40 years, and has been instrumental in the development of many widely-used tools. It fosters the reuse of data resources in structural biology by embodying the FAIR data principles (Findable, Accessible, Inter-operable, and Re-usable). NMRbox is less than a decade old, but complements BMRB by providing NMR software and high-performance computing resources, facilitating the reuse of software resources. BMRB and NMRbox both facilitate reproducible research. NMRbox also fosters the development and deployment of complex meta-software. Combining BMRB and NMRbox helps speed and simplify workflows that utilize BMRB, and enables facile federation of BMRB with other data repositories. Utilization of BMRB and NMRbox in tandem will enable additional advances, such as machine learning, that are poised to become increasingly powerful.

3.
Artículo en Inglés | MEDLINE | ID: mdl-36312518

RESUMEN

The STAR (Self-defining Text Archival and Retrieval) file format for electronic data transfer and archiving was introduced in 1991 (Hall, 1991). This format was designed to be extensible and flexible to handle all types of data in a machine independent manner. As a file format, STAR encompasses both a model (structure) for the information contained within the file as well as a syntax for defining the layout of the information within the file (serialization). This manuscript reports on an attempt to decompose the model from the layout and in doing so, both highlight differences between variants and versions of STAR as well as propose a simple alternate serialization of the STAR model in XML.

4.
Artículo en Inglés | MEDLINE | ID: mdl-33767737

RESUMEN

This paper reports on the ongoing activities and curation practices of the National Center for Biomolecular NMR Data Processing and Analysis. Over the past several years, the Center has been developing and extending computational workflow management software for use by a community of biomolecular NMR spectroscopists. Previous work had been to refactor the workflow system to utilize the PREMIS framework for reporting retrospective provenance as well as for sharing workflows between scientists and to support data reuse. In this paper, we report on our recent efforts to embed analytics within the workflow execution and within provenance tracking. Important metrics for each of the intermediate datasets are included within the corresponding PREMIS intellectual object, which allows for both inspection of the operation of individual actors as well as visualization of the changes throughout a full processing workflow. These metrics can be viewed within the workflow management system or through standalone metadata widgets. Our approach is to support a hybrid approach of both automated, workflow execution as well as manual intervention and metadata management. In this combination, the workflow system and metadata widgets encourage the domain experts to be avid curators of the data which they create, fostering both computational reproducibility and scientific data reuse.

5.
Transform Digit Worlds (2018) ; 10766: 620-625, 2018 Mar.
Artículo en Inglés | MEDLINE | ID: mdl-30334020

RESUMEN

Two barriers to computational reproducibility are the ability to record the critical metadata required for rerunning a computation, as well as translating the semantics of the metadata so that alternate approaches can easily be configured for verifying computational reproducibility. We are addressing this problem in the context of biomolecular NMR computational analysis by developing a series of linked ontologies which define the semantics of the various software tools used by researchers for data transformation and analysis. Building from a core ontology representing the primary observational data of NMR, the linked data approach allows for the translation of metadata in order to configure alternate software approaches for given computational tasks. In this paper we illustrate the utility of this with a small sample of the core ontology as well as tool-specific semantics for two third-party software tools. This approach to semantic mediation will help support an automated approach to validating the reliability of computation in which the same processing workflow is implemented with different software tools. In addition, the detailed semantics of both the data and the processing functionalities will provide a method for software tool classification.

6.
Int J Digit Curation ; 13(1): 286-293, 2018.
Artículo en Inglés | MEDLINE | ID: mdl-31061674

RESUMEN

This paper describes our recent and ongoing efforts for enhancing the curation of scientific workflows to improve reproducibility and reusability of biomolecular nuclear magnetic resonance (bioNMR) data. Our efforts have focused on both developing a workflow management system, called CONNJUR Workflow Builder (CWB), as well as refactoring our workflow data model to make use of the PREMIS model for digital preservation. This revised workflow management system will be available through the NMRbox cloud-computing platform for bioNMR. In addition, we are implementing a new file structure which bundles the original binary data files along with PREMIS XML records describing the provenance of the data. These are packaged together using a standardized file archive utility. In this manner, the provenance and data curation information is maintained together along with the scientific data. The benefits and limitations of these approaches as well as future directions are discussed.

7.
Biophys J ; 112(8): 1529-1534, 2017 Apr 25.
Artículo en Inglés | MEDLINE | ID: mdl-28445744

RESUMEN

Advances in computation have been enabling many recent advances in biomolecular applications of NMR. Due to the wide diversity of applications of NMR, the number and variety of software packages for processing and analyzing NMR data is quite large, with labs relying on dozens, if not hundreds of software packages. Discovery, acquisition, installation, and maintenance of all these packages is a burdensome task. Because the majority of software packages originate in academic labs, persistence of the software is compromised when developers graduate, funding ceases, or investigators turn to other projects. To simplify access to and use of biomolecular NMR software, foster persistence, and enhance reproducibility of computational workflows, we have developed NMRbox, a shared resource for NMR software and computation. NMRbox employs virtualization to provide a comprehensive software environment preconfigured with hundreds of software packages, available as a downloadable virtual machine or as a Platform-as-a-Service supported by a dedicated compute cloud. Ongoing development includes a metadata harvester to regularize, annotate, and preserve workflows and facilitate and enhance data depositions to BioMagResBank, and tools for Bayesian inference to enhance the robustness and extensibility of computational analyses. In addition to facilitating use and preservation of the rich and dynamic software environment for biomolecular NMR, NMRbox fosters the development and deployment of a new class of metasoftware packages. NMRbox is freely available to not-for-profit users.


Asunto(s)
Resonancia Magnética Nuclear Biomolecular , Programas Informáticos , Acceso a la Información , Teorema de Bayes , Nube Computacional , Internet , Metadatos
8.
Libr Trends ; 65(4): 555-562, 2017.
Artículo en Inglés | MEDLINE | ID: mdl-29375158

RESUMEN

The era of big data and ubiquitous computation has brought with it concerns about ensuring reproducibility in this new research environment. It is easy to assume computational methods self-document by their very nature of being exact, deterministic processes. However, similar to laboratory experiments, ensuring reproducibility in the computational realm requires the documentation of both the protocols used (workflows) as well as a detailed description of the computational environment: algorithms, implementations, software environments as well as the data ingested and execution logs of the computation. These two aspects of computational reproducibility (workflows and execution details) are discussed in the context of biomolecular Nuclear Magnetic Resonance spectroscopy (bioNMR) as well as the PRIMAD model for computational reproducibility.

9.
J Biomol NMR ; 63(2): 141-50, 2015 Oct.
Artículo en Inglés | MEDLINE | ID: mdl-26253947

RESUMEN

Reproducibility is a cornerstone of the scientific method, essential for validation of results by independent laboratories and the sine qua non of scientific progress. A key step toward reproducibility of biomolecular NMR studies was the establishment of public data repositories (PDB and BMRB). Nevertheless, bio-NMR studies routinely fall short of the requirement for reproducibility that all the data needed to reproduce the results are published. A key limitation is that considerable metadata goes unpublished, notably manual interventions that are typically applied during the assignment of multidimensional NMR spectra. A general solution to this problem has been elusive, in part because of the wide range of approaches and software packages employed in the analysis of protein NMR spectra. Here we describe an approach for capturing missing metadata during the assignment of protein NMR spectra that can be generalized to arbitrary workflows, different software packages, other biomolecules, or other stages of data analysis in bio-NMR. We also present extensions to the NMR-STAR data dictionary that enable machine archival and retrieval of the "missing" metadata.


Asunto(s)
Resonancia Magnética Nuclear Biomolecular , Proteínas/química , Biología Computacional/métodos , Bases de Datos de Proteínas , Humanos , Resonancia Magnética Nuclear Biomolecular/métodos , Reproducibilidad de los Resultados
10.
J Biomol NMR ; 62(3): 313-26, 2015 Jul.
Artículo en Inglés | MEDLINE | ID: mdl-26066803

RESUMEN

CONNJUR Workflow Builder (WB) is an open-source software integration environment that leverages existing spectral reconstruction tools to create a synergistic, coherent platform for converting biomolecular NMR data from the time domain to the frequency domain. WB provides data integration of primary data and metadata using a relational database, and includes a library of pre-built workflows for processing time domain data. WB simplifies maximum entropy reconstruction, facilitating the processing of non-uniformly sampled time domain data. As will be shown in the paper, the unique features of WB provide it with novel abilities to enhance the quality, accuracy, and fidelity of the spectral reconstruction process. WB also provides features which promote collaboration, education, parameterization, and non-uniform data sets along with processing integrated with the Rowland NMR Toolkit (RNMRTK) and NMRPipe software packages. WB is available free of charge in perpetuity, dual-licensed under the MIT and GPL open source licenses.


Asunto(s)
Resonancia Magnética Nuclear Biomolecular/métodos , Programas Informáticos , Biología Computacional , Interfaz Usuario-Computador
11.
Artículo en Inglés | MEDLINE | ID: mdl-24352525

RESUMEN

Nuclear Magnetic Resonance (NMR) spectroscopy is a technique for acquiring protein data at atomic resolution and determining the three-dimensional structure of large protein molecules. A typical structure determination process results in the deposition of a large data sets to the BMRB (Bio-Magnetic Resonance Data Bank). This data is stored and shared in a file format called NMR-Star. This format is syntactically and semantically complex making it challenging to parse. Nevertheless, parsing these files is crucial to applying the vast amounts of biological information stored in NMR-Star files, allowing researchers to harness the results of previous studies to direct and validate future work. One powerful approach for parsing files is to apply a Backus-Naur Form (BNF) grammar, which is a high-level model of a file format. Translation of the grammatical model to an executable parser may be automatically accomplished. This paper will show how we applied a model BNF grammar of the NMR-Star format to create a free, open-source parser, using a method that originated in the functional programming world known as "parser combinators". This paper demonstrates the effectiveness of a principled approach to file specification and parsing. This paper also builds upon our previous work [1], in that 1) it applies concepts from Functional Programming (which is relevant even though the implementation language, Java, is more mainstream than Functional Programming), and 2) all work and accomplishments from this project will be made available under standard open source licenses to provide the community with the opportunity to learn from our techniques and methods.

12.
PLoS One ; 7(12): e49957, 2012.
Artículo en Inglés | MEDLINE | ID: mdl-23236358

RESUMEN

Minimotifs are short contiguous segments of proteins that have a known biological function. The hundreds of thousands of minimotifs discovered thus far are an important part of the theoretical understanding of the specificity of protein-protein interactions, posttranslational modifications, and signal transduction that occur in cells. However, a longstanding problem is that the different abstractions of the sequence definitions do not accurately capture the specificity, despite decades of effort by many labs. We present evidence that structure is an essential component of minimotif specificity, yet is not used in minimotif definitions. Our analysis of several known minimotifs as case studies, analysis of occurrences of minimotifs in structured and disordered regions of proteins, and review of the literature support a new model for minimotif definitions that includes sequence, structure, and function.


Asunto(s)
Secuencias de Aminoácidos , Estructura Secundaria de Proteína , Proteínas/química , Bases de Datos de Proteínas , Humanos , Análisis de Secuencia de Proteína
14.
Artículo en Inglés | MEDLINE | ID: mdl-25328913

RESUMEN

Scientists are continually faced with the need to express complex mathematical notions in code. The renaissance of functional languages such as LISP and Haskell is often credited to their ability to implement complex data operations and mathematical constructs in an expressive and natural idiom. The slow adoption of functional computing in the scientific community does not, however, reflect the congeniality of these fields. Unfortunately, the learning curve for adoption of functional programming techniques is steeper than that for more traditional languages in the scientific community, such as Python and Java, and this is partially due to the relative sparseness of available learning resources. To fill this gap, we demonstrate and provide applied, scientifically substantial examples of functional programming, We present a multi-language source-code repository for software integration and algorithm development, which generally focuses on the fields of machine learning, data processing, bioinformatics. We encourage scientists who are interested in learning the basics of functional programming to adopt, reuse, and learn from these examples. The source code is available at: https://github.com/CONNJUR/CONNJUR-Sandbox (see also http://www.connjur.org).

15.
Comput Sci Eng ; 15(1): 76-83, 2012 May 01.
Artículo en Inglés | MEDLINE | ID: mdl-24634607

RESUMEN

The problem of formatting data so that it conforms to the required input for scientific data processing tools pervades scientific computing. The CONNecticut Joint University Research Group (CONNJUR) has developed a data translation tool based on a pipeline architecture that partially solves this problem. The CONNJUR Spectrum Translator supports data format translation for experiments that use Nuclear Magnetic Resonance to determine the structure of large protein molecules.

16.
Nucleic Acids Res ; 40(Database issue): D252-60, 2012 Jan.
Artículo en Inglés | MEDLINE | ID: mdl-22146221

RESUMEN

Minimotif Miner (MnM available at http://minimotifminer.org or http://mnm.engr.uconn.edu) is an online database for identifying new minimotifs in protein queries. Minimotifs are short contiguous peptide sequences that have a known function in at least one protein. Here we report the third release of the MnM database which has now grown 60-fold to approximately 300,000 minimotifs. Since short minimotifs are by their nature not very complex we also summarize a new set of false-positive filters and linear regression scoring that vastly enhance minimotif prediction accuracy on a test data set. This online database can be used to predict new functions in proteins and causes of disease.


Asunto(s)
Secuencias de Aminoácidos , Bases de Datos de Proteínas , Secuencia de Aminoácidos , Secuencia de Consenso , Modelos Biológicos , Mapas de Interacción de Proteínas , Proteínas/genética , Análisis de Secuencia de Proteína
17.
J Biomol NMR ; 50(1): 83-9, 2011 May.
Artículo en Inglés | MEDLINE | ID: mdl-21409563

RESUMEN

NMR spectroscopists are hindered by the lack of standardization for spectral data among the file formats for various NMR data processing tools. This lack of standardization is cumbersome as researchers must perform their own file conversion in order to switch between processing tools and also restricts the combination of tools employed if no conversion option is available. The CONNJUR Spectrum Translator introduces a new, extensible architecture for spectrum translation and introduces two key algorithmic improvements. This first is translation of NMR spectral data (time and frequency domain) to a single in-memory data model to allow addition of new file formats with two converter modules, a reader and a writer, instead of writing a separate converter to each existing format. Secondly, the use of layout descriptors allows a single fid data translation engine to be used for all formats. For the end user, sophisticated metadata readers allow conversion of the majority of files with minimum user configuration. The open source code is freely available at http://connjur.sourceforge.net for inspection and extension.


Asunto(s)
Espectroscopía de Resonancia Magnética/métodos , Programas Informáticos , Algoritmos , Interfaz Usuario-Computador
18.
Proc Int Conf Inf Technol New Gener ; : 1014-1020, 2011 Apr 11.
Artículo en Inglés | MEDLINE | ID: mdl-22214925

RESUMEN

The CONNecticut Joint University Research (CONNJUR) team is a group of biochemical and software engineering researchers at multiple institutions. The vision of the team is to develop a comprehensive application that integrates a variety of existing analysis tools with workflow and data management to support the process of protein structure determination using Nuclear Magnetic Resonance (NMR). The use of multiple disparate tools and lack of data management, currently the norm in NMR data processing, provides strong motivation for such an integrated environment. This manuscript briefly describes the domain of NMR as used for protein structure determination and explains the formation of the CONNJUR team and its operation in developing the CONNJUR application. The manuscript also describes the evolution of the CONNJUR application through four prototypes and describes the challenges faced while developing the CONNJUR application and how those challenges were met.

20.
BMC Bioinformatics ; 11: 328, 2010 Jun 16.
Artículo en Inglés | MEDLINE | ID: mdl-20565705

RESUMEN

BACKGROUND: Minimotifs are short peptide sequences within one protein, which are recognized by other proteins or molecules. While there are now several minimotif databases, they are incomplete. There are reports of many minimotifs in the primary literature, which have yet to be annotated, while entirely novel minimotifs continue to be published on a weekly basis. Our recently proposed function and sequence syntax for minimotifs enables us to build a general tool that will facilitate structured annotation and management of minimotif data from the biomedical literature. RESULTS: We have built the MimoSA application for minimotif annotation. The application supports management of the Minimotif Miner database, literature tracking, and annotation of new minimotifs. MimoSA enables the visualization, organization, selection and editing functions of minimotifs and their attributes in the MnM database. For the literature components, Mimosa provides paper status tracking and scoring of papers for annotation through a freely available machine learning approach, which is based on word correlation. The paper scoring algorithm is also available as a separate program, TextMine. Form-driven annotation of minimotif attributes enables entry of new minimotifs into the MnM database. Several supporting features increase the efficiency of annotation. The layered architecture of MimoSA allows for extensibility by separating the functions of paper scoring, minimotif visualization, and database management. MimoSA is readily adaptable to other annotation efforts that manually curate literature into a MySQL database. CONCLUSIONS: MimoSA is an extensible application that facilitates minimotif annotation and integrates with the Minimotif Miner database. We have built MimoSA as an application that integrates dynamic abstract scoring with a high performance relational model of minimotif syntax. MimoSA's TextMine, an efficient paper-scoring algorithm, can be used to dynamically rank papers with respect to context.


Asunto(s)
Algoritmos , Secuencias de Aminoácidos , Bases de Datos de Proteínas , Proteínas/química , Animales , Inteligencia Artificial , Minería de Datos/métodos , Humanos , Unión Proteica , Proteínas/metabolismo , Análisis de Secuencia de Proteína
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA