Results 1 - 11 of 11
1.
Methods Mol Biol ; 2112: 187-218, 2020.
Article in English | MEDLINE | ID: mdl-32006287

ABSTRACT

The Biological Magnetic Resonance Data Bank (BioMagResBank or BMRB), founded in 1988, serves as the archive for data generated by nuclear magnetic resonance (NMR) spectroscopy of biological systems. NMR spectroscopy is unique among biophysical approaches in its ability to provide a broad range of atomic and higher-level information relevant to the structural, dynamic, and chemical properties of biological macromolecules, as well as report on metabolite and natural product concentrations in complex mixtures and their chemical structures. BMRB became a core member of the Worldwide Protein Data Bank (wwPDB) in 2007, and the BMRB archive is now a core archive of the wwPDB. Currently, about 10% of the structures deposited into the PDB archive are based on NMR spectroscopy. BMRB stores experimental and derived data from biomolecular NMR studies. Newer BMRB biopolymer depositions are divided about evenly between those associated with structure determinations (atomic coordinates and supporting information archived in the PDB) and those reporting experimental information on molecular dynamics, conformational transitions, ligand binding, assigned chemical shifts, or other results from NMR spectroscopy. BMRB also provides resources for NMR studies of metabolites and other small molecules that are often macromolecular ligands and/or nonstandard residues. This chapter is directed to the structural biology community rather than the metabolomics and natural products community. Our goal is to describe various BMRB services offered to structural biology researchers and how they can be accessed and utilized. These services can be classified into four main groups: (1) data deposition, (2) data retrieval, (3) data analysis, and (4) services for NMR spectroscopists and software developers. The chapter also describes the NMR-STAR data format used by BMRB and the tools provided to facilitate its use. 
For programmers, BMRB offers an application programming interface (API) and libraries in the Python and R languages that enable users to develop their own BMRB-based tools for data analysis, visualization, and manipulation of NMR-STAR formatted files. BMRB also provides users with direct access tools through the NMRbox platform.
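As a rough illustration of what manipulating an NMR-STAR formatted file involves, the sketch below parses a single `loop_ ... stop_` table into a list of dictionaries. It is a deliberately simplified, standalone parser and does not use BMRB's own API or libraries. The function name `parse_loop` and the sample chemical-shift values are hypothetical, and quoted or multi-line values, which real NMR-STAR files may contain, are not handled.

```python
# Minimal sketch of reading one NMR-STAR loop (illustrative only).
# Assumptions: whitespace-separated values, no quoted or multi-line values,
# and made-up sample data; this is not BMRB's parser.

SAMPLE_LOOP = """
loop_
   _Atom_chem_shift.ID
   _Atom_chem_shift.Comp_ID
   _Atom_chem_shift.Atom_ID
   _Atom_chem_shift.Val
   1 MET H  8.21
   2 MET CA 55.4
   3 GLN N  120.3
stop_
"""


def parse_loop(text):
    """Return the first loop in `text` as a list of {tag: value} dicts."""
    tags = []      # column names, e.g. "_Atom_chem_shift.Val"
    values = []    # flat list of all data values, in file order
    in_loop = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        if line == "loop_":
            in_loop = True
            continue
        if line == "stop_":
            break                         # end of the first loop
        if not in_loop:
            continue
        if line.startswith("_") and not values:
            tags.append(line)             # tag lines precede the data
        else:
            values.extend(line.split())
    if not tags:
        return []
    width = len(tags)
    if len(values) % width != 0:
        raise ValueError("value count is not a multiple of the tag count")
    # Re-chunk the flat value list into rows of `width` columns.
    return [
        dict(zip(tags, values[i:i + width]))
        for i in range(0, len(values), width)
    ]


if __name__ == "__main__":
    for row in parse_loop(SAMPLE_LOOP):
        print(row["_Atom_chem_shift.Comp_ID"],
              row["_Atom_chem_shift.Atom_ID"],
              row["_Atom_chem_shift.Val"])
```

For real entries, the libraries and API described in the chapter are the appropriate tools; this sketch only shows the tag-and-row structure of a loop.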


Subject(s)
Macromolecular Substances/chemistry , Molecular Biology/methods , Protein Conformation , Proteins/chemistry , Databases, Protein , Ligands , Nuclear Magnetic Resonance, Biomolecular/methods , Software
2.
J Biomol NMR ; 73(1-2): 5-9, 2019 Feb.
Article in English | MEDLINE | ID: mdl-30580387

ABSTRACT

The growth of the biological nuclear magnetic resonance (NMR) field and the development of new experimental technology have mandated the revision and enlargement of the NMR-STAR ontology used to represent experiments, spectral and derived data, and supporting metadata. We present here a brief description of the NMR-STAR ontology and software tools for manipulating NMR-STAR data files, editing the files, extracting selected data, and creating data visualizations. Detailed information on these is accessible from the links provided.


Subject(s)
Biological Ontologies , Nuclear Magnetic Resonance, Biomolecular , Information Storage and Retrieval , Software , Vocabulary, Controlled
3.
Biophys J ; 112(8): 1529-1534, 2017 Apr 25.
Article in English | MEDLINE | ID: mdl-28445744

ABSTRACT

Advances in computation have been enabling many recent advances in biomolecular applications of NMR. Due to the wide diversity of applications of NMR, the number and variety of software packages for processing and analyzing NMR data is quite large, with labs relying on dozens, if not hundreds of software packages. Discovery, acquisition, installation, and maintenance of all these packages is a burdensome task. Because the majority of software packages originate in academic labs, persistence of the software is compromised when developers graduate, funding ceases, or investigators turn to other projects. To simplify access to and use of biomolecular NMR software, foster persistence, and enhance reproducibility of computational workflows, we have developed NMRbox, a shared resource for NMR software and computation. NMRbox employs virtualization to provide a comprehensive software environment preconfigured with hundreds of software packages, available as a downloadable virtual machine or as a Platform-as-a-Service supported by a dedicated compute cloud. Ongoing development includes a metadata harvester to regularize, annotate, and preserve workflows and facilitate and enhance data depositions to BioMagResBank, and tools for Bayesian inference to enhance the robustness and extensibility of computational analyses. In addition to facilitating use and preservation of the rich and dynamic software environment for biomolecular NMR, NMRbox fosters the development and deployment of a new class of metasoftware packages. NMRbox is freely available to not-for-profit users.


Subject(s)
Nuclear Magnetic Resonance, Biomolecular , Software , Access to Information , Bayes Theorem , Cloud Computing , Internet , Metadata
4.
Curr Opin Biotechnol ; 43: 56-61, 2017 Feb.
Article in English | MEDLINE | ID: mdl-27643760

ABSTRACT

The metabolome, the collection of small molecules associated with an organism, is a growing subject of inquiry, with the data utilized for data-intensive systems biology, disease diagnostics, biomarker discovery, and the broader characterization of small molecules in mixtures. Owing to their close proximity to the functional endpoints that govern an organism's phenotype, metabolites are highly informative about functional states. The field of metabolomics identifies and quantifies endogenous and exogenous metabolites in biological samples. Information acquired from nuclear magnetic resonance (NMR) spectroscopy, mass spectrometry (MS), and the published literature, as processed by statistical approaches, is driving increasingly wide applications of metabolomics. This review focuses on the role of databases and software tools in advancing the rigor, robustness, reproducibility, and validation of metabolomics studies.


Subject(s)
Databases, Factual , Magnetic Resonance Spectroscopy/methods , Metabolome , Metabolomics/methods , Software , Systems Biology/methods , Animals , Humans , Magnetic Resonance Imaging
5.
FEBS Lett ; 587(11): 1587-91, 2013 Jun 05.
Article in English | MEDLINE | ID: mdl-23603389

ABSTRACT

The axis inhibition (Axin) scaffold protein colocalizes β-catenin, casein kinase Iα, and glycogen synthase kinase 3β by their binding to Axin's long intrinsically disordered region, thereby yielding structured domains with flexible linkers. This complex leads to the phosphorylation of β-catenin, marking it for destruction. Fusing proteins with flexible linkers vastly accelerates chemical interactions between them by their colocalization. Here we propose that the complex works by random movements of a "stochastic machine," not by coordinated conformational changes. This non-covalent, modular assembly process allows the various molecular machine components to be used in multiple processes.


Subject(s)
Axin Signaling Complex/chemistry , Models, Molecular , Protein Processing, Post-Translational , Allosteric Regulation , Axin Signaling Complex/physiology , Casein Kinase I/chemistry , Humans , Phosphorylation , Protein Structure, Quaternary , Protein Transport , Proteolysis , Stochastic Processes , Wnt Signaling Pathway , beta Catenin/chemistry
6.
J Struct Biol ; 181(1): 29-36, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23142703

ABSTRACT

Intrinsically disordered proteins (IDPs) do not adopt stable three-dimensional structures under physiological conditions, yet these proteins play crucial roles in biological phenomena. In most cases, intrinsic disorder manifests itself in segments or domains of an IDP, called intrinsically disordered regions (IDRs), but fully disordered IDPs also exist. Although IDRs can be detected as missing residues in protein structures determined by X-ray crystallography, no protocol has been developed to identify IDRs from structures obtained by nuclear magnetic resonance (NMR). Here, we propose a computational method to assign IDRs based on NMR structures. We compared missing residues of X-ray structures with residue-wise deviations of NMR structures for identical proteins, and derived a threshold deviation that gives the best correlation of ordered and disordered regions of both structures. The obtained threshold of 3.2 Å was applied to proteins whose structures were only determined by NMR, and the resulting IDRs were analyzed and compared to those of X-ray structures with no NMR counterpart in terms of sequence length, IDR fraction, protein function, cellular location, and amino acid composition, all of which suggest distinct characteristics. The structural knowledge of IDPs is still inadequate compared with that of structured proteins. Our method can collect and utilize IDRs from structures determined by NMR, potentially enhancing the understanding of IDPs.
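The thresholding step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: only the 3.2 Å threshold comes from the abstract. The function name `assign_idrs`, the `min_length` parameter, the strict "greater than" comparison, and the example deviation values are all assumptions, and the sketch takes the per-residue deviations as already computed from the NMR models.

```python
# Sketch: assign intrinsically disordered regions (IDRs) by thresholding
# per-residue deviations of an NMR structure ensemble.
# Assumptions: deviations are precomputed (in angstroms), residues are
# numbered from 1, and a residue counts as disordered when its deviation
# is strictly greater than the threshold.
from typing import List, Sequence, Tuple

THRESHOLD_ANGSTROM = 3.2  # threshold value reported in the abstract


def assign_idrs(
    deviations: Sequence[float],
    threshold: float = THRESHOLD_ANGSTROM,
    min_length: int = 1,  # hypothetical filter on region length
) -> List[Tuple[int, int]]:
    """Return (start, end) residue ranges, 1-based and inclusive."""
    regions: List[Tuple[int, int]] = []
    start = None  # first residue of the region currently being extended
    for i, dev in enumerate(deviations, start=1):
        if dev > threshold:
            if start is None:
                start = i                      # open a new region
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i - 1)) # close region before residue i
            start = None
    # Close a region that runs to the C-terminal end of the chain.
    if start is not None and len(deviations) - start + 1 >= min_length:
        regions.append((start, len(deviations)))
    return regions


if __name__ == "__main__":
    example = [5.0, 4.1, 0.8, 0.6, 0.9, 3.5, 3.3, 1.0, 6.2]  # made-up values
    print(assign_idrs(example))                # all regions above threshold
    print(assign_idrs(example, min_length=2))  # drop single-residue regions
```

Grouping consecutive residues into ranges, rather than returning a per-residue flag, makes it straightforward to compare region lengths and IDR fractions of the kind the abstract analyzes.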


Subject(s)
Models, Molecular , Proteins/chemistry , Algorithms , Amino Acid Sequence , Crystallography, X-Ray , Nuclear Magnetic Resonance, Biomolecular , Protein Conformation , Protein Stability
7.
J Virol ; 83(20): 10719-36, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19640978

ABSTRACT

It is widely assumed that new proteins are created by duplication, fusion, or fission of existing coding sequences. Another mechanism of protein birth is provided by overlapping genes. They are created de novo by mutations within a coding sequence that lead to the expression of a novel protein in another reading frame, a process called "overprinting." To investigate this mechanism, we have analyzed the sequences of the protein products of manually curated overlapping genes from 43 genera of unspliced RNA viruses infecting eukaryotes. Overlapping proteins have a sequence composition globally biased toward disorder-promoting amino acids and are predicted to contain significantly more structural disorder than nonoverlapping proteins. By analyzing the phylogenetic distribution of overlapping proteins, we were able to confirm that 17 of these had been created de novo and to study them individually. Most proteins created de novo are orphans (i.e., restricted to one species or genus). Almost all are accessory proteins that play a role in viral pathogenicity or spread, rather than proteins central to viral replication or structure. Most proteins created de novo are predicted to be fully disordered and have a highly unusual sequence composition. This suggests that some viral overlapping reading frames encoding hypothetical proteins with highly biased composition, often discarded as noncoding, might in fact encode proteins. Some proteins created de novo are predicted to be ordered, however, and whenever a three-dimensional structure of such a protein has been solved, it corresponds to a fold previously unobserved, suggesting that the study of these proteins could enhance our knowledge of protein space.


Subject(s)
Computational Biology/methods , Genes, Overlapping/genetics , Genes, Viral/genetics , Genome, Viral , RNA Viruses/genetics , Viral Proteins/chemistry , Viral Proteins/genetics , Amino Acid Sequence , Animals , Eukaryotic Cells/virology , Evolution, Molecular , Humans , Open Reading Frames , Phylogeny , Sequence Alignment
8.
J Biomol Struct Dyn ; 24(4): 325-42, 2007 Feb.
Article in English | MEDLINE | ID: mdl-17206849

ABSTRACT

The Protein Data Bank (PDB) is the preeminent source of protein structural information. The PDB contains over 32,500 experimentally determined 3-D structures solved using X-ray crystallography or nuclear magnetic resonance spectroscopy. Intrinsically disordered regions fail to form a fixed 3-D structure under physiological conditions. In this study, we compare the amino acid sequences of proteins whose structures are determined by X-ray crystallography with the corresponding sequences from the Swiss-Prot database. The analyzed dataset includes 16,370 structures, which represent 18,101 PDB chains and 5,434 different proteins (2,793 eukaryotic, 2,109 bacterial, 288 viral, and 244 archaeal) from 910 different organisms. In this dataset, on average, each Swiss-Prot protein is represented by 7 PDB chains with 76% of the crystallized regions being represented by more than one structure. Intriguingly, the complete sequences of only approximately 7% of proteins are observed in the corresponding PDB structures, and only approximately 25% of the total dataset have >95% of their lengths observed in the corresponding PDB structures. This suggests that the vast majority of PDB proteins are shorter than their corresponding Swiss-Prot sequences and/or contain numerous residues that are not observed in maps of electron density. To determine the prevalence of disordered regions in PDB, the residues in the Swiss-Prot sequences were grouped into four general categories, "Observed" (which correspond to structured regions), "Not observed" (regions with missing electron density, potentially disordered), "Uncharacterized," and "Ambiguous," depending on their appearance in the corresponding PDB entries. This non-redundant set of residues can be viewed as a "fragment" or empirical domain database that contains a set of experimentally determined structured regions or domains and a set of experimentally verified disordered regions or domains.
We studied the propensities and properties of residues in these four categories and analyzed their relations to the predictions of disorder using several algorithms. "Not observed," "Ambiguous," and "Uncharacterized" regions were shown to possess the amino acid compositional biases typical of intrinsically disordered proteins. The application of four different disorder predictors (PONDR® VL-XT, VL3-BA, VSL1P, and IUPred) revealed that the vast majority of residues in the "Observed" dataset are ordered, and that the "Not observed" regions are mostly disordered. The "Uncharacterized" regions possess some tendency toward order, whereas the predictions for the short "Ambiguous" regions are genuinely ambiguous. Long "Ambiguous" regions (>70 amino acid residues) are mostly predicted to be ordered, suggesting that they are likely to be "wobbly" domains. Overall, we showed that completely ordered proteins are not highly abundant in PDB and many PDB sequences have disordered regions. In fact, in the analyzed dataset approximately 10% of the PDB proteins contain regions of consecutive missing or ambiguous residues longer than 30 amino acids and approximately 40% of the proteins possess short regions (≥10 and <30 amino acids long) of missing and ambiguous residues.


Subject(s)
Databases, Protein , Proteins/chemistry , Algorithms , Amino Acid Sequence , Amino Acids/analysis , Animals , Archaeal Proteins/chemistry , Bacterial Proteins/chemistry , Models, Molecular , Protein Conformation , Viral Proteins/chemistry
9.
Nucleic Acids Res ; 34(8): 2438-44, 2006.
Article in English | MEDLINE | ID: mdl-16682451

ABSTRACT

Vibrio cholerae, the etiological agent of the diarrheal illness cholera, can kill an infected adult in 24 h. V. cholerae lives as an autochthonous microbe in estuaries, rivers and coastal waters. A better understanding of its metabolic pathways will assist the development of more effective treatments and will provide a deeper understanding of how this bacterium persists in natural aquatic habitats. Using the completed V. cholerae genome sequence and PathoLogic software, we created VchoCyc, a pathway-genome database that predicted 171 likely metabolic pathways in the bacterium. We report here experimental evidence supporting the computationally predicted pathways. The evidence comes from microarray gene expression studies of V. cholerae in the stools of three cholera patients [D. S. Merrell, S. M. Butler, F. Qadri, N. A. Dolganov, A. Alam, M. B. Cohen, S. B. Calderwood, G. K. Schoolnik and A. Camilli (2002) Nature, 417, 642-645.], from gene expression studies in minimal growth conditions and LB rich medium, and from clinical tests that identify V. cholerae. Expression data provide evidence supporting 92 (53%) of the 171 pathways. The clinical tests provide evidence supporting seven pathways, with six pathways supported by both methods. VchoCyc provides biologists with a useful tool for analyzing this organism's metabolic and genomic information, which could lead to potential insights into new anti-bacterial agents. VchoCyc is available in the BioCyc database collection (http://BioCyc.org).


Subject(s)
Databases, Genetic , Vibrio cholerae/genetics , Vibrio cholerae/metabolism , Bacteriological Techniques , Cholera/microbiology , Databases, Genetic/statistics & numerical data , Gene Expression Profiling , Humans , Internet , Oligonucleotide Array Sequence Analysis , Software , Vibrio cholerae/isolation & purification
10.
Proc Natl Acad Sci U S A ; 103(22): 8390-5, 2006 May 30.
Article in English | MEDLINE | ID: mdl-16717195

ABSTRACT

Alternative splicing of pre-mRNA generates two or more protein isoforms from a single gene, thereby contributing to protein diversity. Despite intensive efforts, an understanding of the protein structure-function implications of alternative splicing is still lacking. Intrinsic disorder, which is a lack of equilibrium 3D structure under physiological conditions, may provide this understanding. Intrinsic disorder is a common phenomenon, particularly in multicellular eukaryotes, and is responsible for important protein functions including regulation and signaling. We hypothesize that polypeptide segments affected by alternative splicing are most often intrinsically disordered such that alternative splicing enables functional and regulatory diversity while avoiding structural complications. We analyzed a set of 46 differentially spliced genes encoding experimentally characterized human proteins containing both structured and intrinsically disordered amino acid segments. We show that 81% of 75 alternatively spliced fragments in these proteins were associated with fully (57%) or partially (24%) disordered protein regions. Regions affected by alternative splicing were significantly biased toward encoding disordered residues, with a vanishingly small P value. A larger data set composed of 558 SwissProt proteins with known isoforms produced by 1,266 alternatively spliced fragments was characterized by applying the PONDR VSL1 disorder predictor. Results from prediction data are consistent with those obtained from experimental data, further supporting the proposed hypothesis. Associating alternative splicing with protein disorder enables the time- and tissue-specific modulation of protein function needed for cell differentiation and the evolution of multicellular organisms.


Subject(s)
Alternative Splicing , Proteins/genetics , Proteins/metabolism , Animals , Humans , Models, Molecular , Protein Conformation , Protein Isoforms/genetics , Protein Isoforms/metabolism , Proteins/chemistry , Transcription, Genetic/genetics
11.
Curr Opin Struct Biol ; 14(5): 570-6, 2004 Oct.
Article in English | MEDLINE | ID: mdl-15465317

ABSTRACT

Several computational and experimental methods exist for identifying disordered residues within proteins. Computational algorithms can now identify these disordered sequences and predict their occurrence within genomes with relatively high accuracy. Recent advances in NMR and mass spectrometry permit faster and more detailed studies of disordered states at atomic resolution. Combining prediction, computation and experimentation is proposed to accelerate and enhance the characterization of intrinsically disordered proteins.


Subject(s)
Proteins/chemistry , Hydrolysis , Nuclear Magnetic Resonance, Biomolecular