Results 1 - 6 of 6
1.
Nucleic Acids Res ; 49(16): e96, 2021 Sep 20.
Article in English | MEDLINE | ID: mdl-34181736

ABSTRACT

Systemic analysis of available large-scale biological/biomedical data is critical for studying biological mechanisms and for developing novel, effective treatment approaches against diseases. However, different layers of the available data are produced using different technologies and are scattered across individual computational resources without any explicit connections to each other, which hinders extensive and integrative multi-omics-based analysis. We aimed to address this issue by developing a new data integration/representation methodology and applying it to construct a biological data resource. CROssBAR is a comprehensive system that integrates large-scale biological/biomedical data from various resources and stores them in a NoSQL database. CROssBAR is enriched with deep-learning-based predictions of relationships between numerous data entries, followed by rigorous analysis of the enriched data to obtain biologically meaningful modules. These complex sets of entities and relationships are displayed to users via easy-to-interpret, interactive knowledge graphs within an open-access service. CROssBAR knowledge graphs incorporate relevant genes-proteins, molecular interactions, pathways, phenotypes, diseases, as well as known/predicted drugs and bioactive compounds, and they are constructed on-the-fly based on simple non-programmatic user queries. These intensely processed heterogeneous networks are expected to aid systems-level research, especially to infer biological mechanisms in relation to genes, proteins, their ligands, and diseases.


Subject(s)
Computational Biology/methods , Software , Databases, Chemical , Databases, Genetic , Deep Learning , Humans
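The abstract describes knowledge graphs that are built on-the-fly around a user's query from integrated, typed entities. CROssBAR itself stores its data in a NoSQL database and adds deep-learning predictions; neither is reproduced here. As a minimal sketch only, assuming the query returns a breadth-first neighborhood of a seed entity up to a hop limit, the idea can be illustrated on an in-memory graph. The entity names, relation labels, `query_subgraph` and `group_by_type` are hypothetical and are not CROssBAR's API.

```python
from collections import deque

# Hypothetical toy data: entity -> entity type. Identifiers are placeholders.
NODE_TYPES = {
    "GENE_A": "gene/protein",
    "GENE_B": "gene/protein",
    "PATHWAY_X": "pathway",
    "DISEASE_Y": "disease",
    "DRUG_Z": "drug",
    "COMPOUND_W": "compound",
}

# (source, relation, target) triples; the last one stands in for a predicted edge.
EDGES = [
    ("GENE_A", "interacts_with", "GENE_B"),
    ("GENE_A", "member_of", "PATHWAY_X"),
    ("GENE_B", "associated_with", "DISEASE_Y"),
    ("DRUG_Z", "targets", "GENE_A"),
    ("COMPOUND_W", "predicted_to_bind", "GENE_B"),
]


def build_adjacency(edges):
    """Index each triple in both directions so traversal ignores edge direction."""
    adjacency = {}
    for src, rel, dst in edges:
        adjacency.setdefault(src, []).append((rel, dst))
        adjacency.setdefault(dst, []).append((rel, src))
    return adjacency


def query_subgraph(seed, edges, max_hops=1):
    """Breadth-first search from seed; return nodes within max_hops and the edges among them."""
    adjacency = build_adjacency(edges)
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _relation, neighbor in adjacency.get(node, []):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    kept = [e for e in edges if e[0] in depth and e[2] in depth]
    return set(depth), kept


def group_by_type(nodes, node_types):
    """Group the returned entities by type, e.g. for a per-type legend in a graph view."""
    groups = {}
    for node in sorted(nodes):
        groups.setdefault(node_types[node], []).append(node)
    return groups


if __name__ == "__main__":
    nodes, edges = query_subgraph("GENE_A", EDGES, max_hops=1)
    print(group_by_type(nodes, NODE_TYPES))
    for edge in edges:
        print(edge)
```

Raising `max_hops` widens the neighborhood (here, two hops from `GENE_A` already reach every toy entity), which is why a hop limit or similar filter is needed to keep query-centered graphs readable.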
3.
Bioinformatics ; 36(17): 4643-4648, 2020 Nov 01.
Article in English | MEDLINE | ID: mdl-32399560

ABSTRACT

MOTIVATION: The number of protein records in the UniProt Knowledgebase (UniProtKB: https://www.uniprot.org) continues to grow rapidly as a result of genome sequencing and the prediction of protein-coding genes. Providing functional annotation for these proteins presents a significant and continuing challenge. RESULTS: In response to this challenge, UniProt has developed a method of annotation, known as UniRule, based on expertly curated rules, which integrates related systems (RuleBase, HAMAP, PIRSR, PIRNR) developed by the members of the UniProt consortium. UniRule uses protein family signatures from InterPro, combined with taxonomic and other constraints, to select sets of reviewed proteins that have common functional properties supported by experimental evidence. This annotation is propagated to unreviewed records in UniProtKB that meet the same selection criteria, most of which do not have (and are never likely to have) experimentally verified functional annotation. Release 2020_01 of UniProtKB contains 6496 UniRule rules that provide annotation for 53 million proteins, accounting for 30% of the 178 million records in UniProtKB. UniRule provides scalable enrichment of annotation in UniProtKB. AVAILABILITY AND IMPLEMENTATION: UniRule rules are integrated into UniProtKB and can be viewed at https://www.uniprot.org/unirule/. UniRule rules, and the code required to run them, are publicly available for researchers who wish to annotate their own sequences. The implementation used to run the rules is known as UniFIRE and is available at https://gitlab.ebi.ac.uk/uniprot-public/unifire.


Subject(s)
Knowledge Bases , Proteins , Chromosome Mapping , Databases, Protein , Molecular Sequence Annotation , Proteins/genetics
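The abstract describes rules that combine InterPro family signatures with taxonomic constraints and propagate the associated annotation to unreviewed records that satisfy them. (As a check on the reported figures, 53 million of 178 million records is about 29.8%, consistent with the stated 30%.) The sketch below illustrates only that matching-and-propagation step, assuming a rule requires all of its signatures plus one taxon in the record's lineage. The `Rule`/`Protein` classes, the `SIG_*` and `ACC_*` identifiers and the annotation text are hypothetical; this is not the UniRule rule format or UniFIRE's interface, and the "other constraints" mentioned in the abstract are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical, simplified rule; not the UniRule/UniFIRE file format."""
    rule_id: str
    required_signatures: frozenset  # family signatures that must all be present
    required_taxon: str             # taxon that must occur in the lineage
    annotations: dict               # annotation to propagate on a match


@dataclass
class Protein:
    accession: str
    signatures: frozenset
    lineage: tuple
    annotations: dict = field(default_factory=dict)


def rule_matches(rule, protein):
    """A rule applies when all its signatures are present and the taxon constraint holds."""
    return (rule.required_signatures <= protein.signatures
            and rule.required_taxon in protein.lineage)


def apply_rules(rules, unreviewed):
    """Propagate annotations to matching records; return accessions matched by any rule."""
    matched_accessions = []
    for protein in unreviewed:
        matched = False
        for rule in rules:
            if rule_matches(rule, protein):
                for key, value in rule.annotations.items():
                    protein.annotations.setdefault(key, value)  # keep any existing value
                matched = True
        if matched:
            matched_accessions.append(protein.accession)
    return matched_accessions
```

For example, a rule requiring `SIG_A` and `SIG_B` in Bacteria annotates a bacterial record carrying both signatures, but not a bacterial record with only `SIG_A` or an archaeal record with both. Using `setdefault` is a design choice of this sketch so that an existing value is never overwritten.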
4.
Proteomics ; 15(1): 48-57, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25307260

ABSTRACT

In this article, we provide a comprehensive study of the content of the Universal Protein Resource (UniProt) protein data sets for human and mouse. The tryptic search spaces of the UniProtKB (UniProt knowledgebase) complete proteome sets were compared with other data sets from UniProtKB and with the corresponding International Protein Index, reference sequence, Ensembl, and UniRef100 (where UniRef is UniProt reference clusters) organism-specific data sets. All protein forms annotated in UniProtKB (both the canonical sequences and isoforms) were evaluated in this study. In addition, natural and disease-associated amino acid variants annotated in UniProtKB were included in the evaluation. Peptide unicity (whether a peptide maps to a single protein entry) was also evaluated for each data set. Furthermore, the peptide information in the UniProtKB data sets was compared against the available peptide-level identifications in the main MS-based proteomics repositories. The peptides observed in these repositories are an important source of information for protein databases, as they provide supporting evidence for the existence of proteins that are otherwise only predicted. Likewise, the repositories could use the information available in UniProtKB to direct reprocessing efforts on specific sets of peptides/proteins of interest. In summary, we provide comprehensive information about the different organism-specific sequence data sets available from UniProt, together with the pros and cons of each in terms of search space for MS-based bottom-up proteomics workflows. The aim of the analysis is to provide a clear view of the tryptic search space of UniProt and other protein databases to enable scientists to select those most appropriate for their purposes.


Subject(s)
Databases, Protein , Proteins/chemistry , Proteomics , Animals , Humans , Mice , Peptides/chemistry , Peptides/metabolism , Protein Isoforms/chemistry , Protein Isoforms/metabolism , Proteins/metabolism , Sequence Analysis, Protein , Trypsin/metabolism
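The "tryptic search space" and "peptide unicity" analyzed above can be illustrated with an in-silico digest. The sketch below assumes the common simplified trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages and no mass or length window beyond an optional `min_length`; real search engines and the study's own settings may differ, and analyses often also treat I and L as indistinguishable, which this sketch does not. The sequences and the helper names are made up for illustration.

```python
import re
from collections import defaultdict

# Cleavage site: after K or R, unless the next residue is P (a common simplification
# of trypsin specificity; search engines differ in the exact rule).
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(sequence, min_length=1):
    """Fully tryptic peptides with no missed cleavages, filtered by minimum length."""
    return [p for p in TRYPSIN_SITE.split(sequence) if len(p) >= min_length]


def peptide_index(proteins, min_length=1):
    """Map each peptide to the set of accessions whose sequence yields it."""
    index = defaultdict(set)
    for accession, sequence in proteins.items():
        for peptide in tryptic_digest(sequence, min_length):
            index[peptide].add(accession)
    return index


def unique_peptides(index):
    """Peptides that map to exactly one entry in the data set."""
    return {peptide for peptide, accessions in index.items() if len(accessions) == 1}


if __name__ == "__main__":
    toy = {"PROT1": "MAGKPLLRSTEK", "PROT2": "SGTKLLRSTEK"}
    index = peptide_index(toy)
    print(sorted(unique_peptides(index)))
```

In this toy set, `STEK` is produced by both entries and is therefore not unique, while `MAGKPLLR` stays intact because the K is followed by P. Adding isoforms or variant sequences to the data set enlarges the index and tends to turn unique peptides into shared ones, which is the trade-off the study weighs between data sets.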
5.
Chembiochem ; 13(9): 1297-303, 2012 Jun 18.
Article in English | MEDLINE | ID: mdl-22614947

ABSTRACT

We have identified the native dimer interface of heptaprenylglyceryl phosphate synthase PcrB from the bacterium Bacillus subtilis and analyzed the significance of oligomer formation for stability and catalytic activity. Computational methods predicted two different surface regions of the PcrB protomer that could be responsible for dimer formation. These candidate interfaces were assessed both in silico and experimentally, by introducing amino acid substitutions that led to monomerization and by incorporating an unnatural amino acid to allow cross-linking of the two protomers. The results showed that, in contrast to previous assumptions, PcrB uses the same interface for dimerization as the homologous geranylgeranylglyceryl phosphate synthase from Archaea. Thermal unfolding demonstrated that the monomeric proteins are only slightly less stable than wild-type PcrB. However, activity assays showed that monomerization limits the length of accepted polyprenyl pyrophosphates to three isoprene units, whereas the native PcrB substrate contains seven isoprene units. We provide a plausible hypothesis as to how dimerization determines the substrate specificity of PcrB.


Subject(s)
Bacillus subtilis/enzymology , Dimethylallyltranstransferase/chemistry , Dimethylallyltranstransferase/metabolism , Protein Multimerization , Amino Acid Substitution , Dimethylallyltranstransferase/genetics , Enzyme Stability , Models, Molecular , Protein Structure, Quaternary , Substrate Specificity , Temperature
6.
Proteins ; 80(1): 154-68, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22038731

ABSTRACT

An important task of computational biology is to identify those parts of a polypeptide chain that are involved in interactions with other proteins. For this purpose, we have developed the program PresCont, which robustly predicts the amino acids that constitute protein-protein interfaces (PPIs). PresCont reaches state-of-the-art classification quality on the basis of only four residue properties that can be readily deduced from the 3D structure of an individual protein and a multiple sequence alignment (MSA) composed of homologs. The core of PresCont is a support vector machine, which assesses solvent-accessible surface area, hydrophobicity, conservation, and the local environment of each amino acid on the protein surface. For training and performance testing, we compiled three nonoverlapping datasets consisting of permanently formed or transient complexes, respectively. A comparison with SPPIDER, ProMate, and meta-PPISP showed that PresCont compares favorably with these highly sophisticated programs, and that its prediction quality is less dependent on the type of protein complex being considered. This balance is due to a mutual compensation of classification weaknesses observed for individual properties: for PPIs of permanent complexes, solvent-accessible surface and hydrophobicity contribute most to classification quality; for PPIs of transient complexes, the assessment of the local environment is most significant. Moreover, we show that for permanent complexes a segmentation of PPIs into core and rim residues has only a moderate influence on prediction quality. PresCont is available as a web service at http://www-bioinf.uni-regensburg.de/.


Subject(s)
Computer Simulation , Models, Molecular , Protein Interaction Domains and Motifs , Software , Algorithms , Amino Acid Sequence , Conserved Sequence , Fungal Proteins/chemistry , Hydrophobic and Hydrophilic Interactions , Multiprotein Complexes/chemistry , ROC Curve , Sequence Alignment , Support Vector Machine , Surface Properties , Yeasts , tRNA Methyltransferases/chemistry
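The abstract states only that PresCont's core is a support vector machine over four per-residue properties; its kernel, feature scaling and training procedure are not given. As a hedged illustration of the classification step, the sketch below swaps in a linear SVM trained with a deterministic Pegasos-style subgradient method on the L2-regularized hinge loss (no bias term, samples visited in order, features assumed already standardized). `FEATURES`, the six synthetic residues and `train_linear_svm` are hypothetical; computing the features from a 3D structure and an MSA is not shown, and for real work a library SVM implementation would be the usual choice.

```python
# Feature order: solvent-accessible surface area, hydrophobicity, conservation,
# local environment. All values are synthetic z-scores, not PresCont data.
FEATURES = ("sasa", "hydrophobicity", "conservation", "local_environment")

X = [
    [1.2, 0.8, 1.0, 0.9],      # toy interface residue
    [0.9, 1.1, 0.7, 1.2],      # toy interface residue
    [1.0, 0.6, 1.3, 0.8],      # toy interface residue
    [-1.1, -0.9, -0.8, -1.0],  # toy non-interface residue
    [-0.7, -1.2, -1.0, -0.6],  # toy non-interface residue
    [-1.0, -0.5, -1.1, -1.3],  # toy non-interface residue
]
Y = [1, 1, 1, -1, -1, -1]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(samples, labels, lam=0.1, epochs=50):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    Deterministic (samples visited in order), no bias term, step size 1/(lam*t).
    """
    w = [0.0] * len(samples[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            t += 1
            eta = 1.0 / (lam * t)
            violates = y * dot(w, x) < 1  # hinge loss is active for this sample
            w = [(1.0 - eta * lam) * wj for wj in w]  # step on the L2 term
            if violates:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """+1 = predicted interface residue, -1 = predicted non-interface residue."""
    return 1 if dot(w, x) > 0 else -1


if __name__ == "__main__":
    w = train_linear_svm(X, Y)
    print(dict(zip(FEATURES, (round(wj, 3) for wj in w))))
    print([predict(w, x) for x in X])
```

With a linear model, the learned weights show directly how much each property contributes, which loosely mirrors the abstract's observation that different properties dominate for permanent and transient complexes; a kernel SVM would not expose this as simply.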