1.
Nucleic Acids Res ; 49(16): e96, 2021 09 20.
Article in English | MEDLINE | ID: mdl-34181736

ABSTRACT

Systemic analysis of available large-scale biological/biomedical data is critical for studying biological mechanisms, and developing novel and effective treatment approaches against diseases. However, different layers of the available data are produced using different technologies and scattered across individual computational resources without any explicit connections to each other, which hinders extensive and integrative multi-omics-based analysis. We aimed to address this issue by developing a new data integration/representation methodology and its application by constructing a biological data resource. CROssBAR is a comprehensive system that integrates large-scale biological/biomedical data from various resources and stores them in a NoSQL database. CROssBAR is enriched with the deep-learning-based prediction of relationships between numerous data entries, which is followed by the rigorous analysis of the enriched data to obtain biologically meaningful modules. These complex sets of entities and relationships are displayed to users via easy-to-interpret, interactive knowledge graphs within an open-access service. CROssBAR knowledge graphs incorporate relevant genes-proteins, molecular interactions, pathways, phenotypes, diseases, as well as known/predicted drugs and bioactive compounds, and they are constructed on-the-fly based on simple non-programmatic user queries. These intensely processed heterogeneous networks are expected to aid systems-level research, especially to infer biological mechanisms in relation to genes, proteins, their ligands, and diseases.


Subjects
Computational Biology/methods ; Software ; Databases, Chemical ; Databases, Genetic ; Deep Learning ; Humans
3.
Bioinformatics ; 36(17): 4643-4648, 2020 11 01.
Article in English | MEDLINE | ID: mdl-32399560

ABSTRACT

MOTIVATION: The number of protein records in the UniProt Knowledgebase (UniProtKB: https://www.uniprot.org) continues to grow rapidly as a result of genome sequencing and the prediction of protein-coding genes. Providing functional annotation for these proteins presents a significant and continuing challenge. RESULTS: In response to this challenge, UniProt has developed a method of annotation, known as UniRule, based on expertly curated rules, which integrates related systems (RuleBase, HAMAP, PIRSR, PIRNR) developed by the members of the UniProt consortium. UniRule uses protein family signatures from InterPro, combined with taxonomic and other constraints, to select sets of reviewed proteins which have common functional properties supported by experimental evidence. This annotation is propagated to unreviewed records in UniProtKB that meet the same selection criteria, most of which do not have (and are never likely to have) experimentally verified functional annotation. Release 2020_01 of UniProtKB contains 6496 UniRule rules which provide annotation for 53 million proteins, accounting for 30% of the 178 million records in UniProtKB. UniRule provides scalable enrichment of annotation in UniProtKB. AVAILABILITY AND IMPLEMENTATION: UniRule rules are integrated into UniProtKB and can be viewed at https://www.uniprot.org/unirule/. UniRule rules and the code required to run the rules, are publicly available for researchers who wish to annotate their own sequences. The implementation used to run the rules is known as UniFIRE and is available at https://gitlab.ebi.ac.uk/uniprot-public/unifire.


Subjects
Knowledge Bases ; Proteins ; Chromosome Mapping ; Databases, Protein ; Molecular Sequence Annotation ; Proteins/genetics
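
The rule-based propagation described in the UniRule abstract above can be illustrated with a small sketch. It is a toy model under stated assumptions, not the UniRule rule format or the UniFIRE implementation: the `Rule` and `Protein` classes, the signature names (`SIG_A`, `SIG_B`, `SIG_C`), the accessions, and the annotation text are all hypothetical. A rule fires on an unreviewed record only when every required family signature is present and the taxonomic constraint is met. The last line recomputes the coverage figure quoted in the abstract.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """Toy stand-in for a curated rule (hypothetical, not the real UniRule format)."""
    rule_id: str
    required_signatures: frozenset  # family signatures that must all be present
    required_taxon: str             # taxonomic constraint on the lineage
    annotation: str                 # annotation text to propagate


@dataclass
class Protein:
    accession: str
    signatures: set
    lineage: tuple
    reviewed: bool = False
    annotations: list = field(default_factory=list)


def rule_matches(rule, protein):
    """True when the protein carries all signatures and lies in the rule's taxon."""
    has_signatures = rule.required_signatures <= protein.signatures
    in_taxon = rule.required_taxon in protein.lineage
    return has_signatures and in_taxon


def propagate(rules, proteins):
    """Attach each matching rule's annotation to unreviewed records only."""
    for protein in proteins:
        if protein.reviewed:
            continue
        for rule in rules:
            if rule_matches(rule, protein):
                protein.annotations.append((rule.rule_id, rule.annotation))
    return proteins


# Hypothetical identifiers, for illustration only.
rules = [
    Rule("RULE_0001", frozenset({"SIG_A", "SIG_B"}), "Bacteria",
         "Function: example kinase activity"),
]
proteins = [
    Protein("UNREVIEWED_1", {"SIG_A", "SIG_B", "SIG_C"}, ("Bacteria", "Firmicutes")),
    Protein("UNREVIEWED_2", {"SIG_A"}, ("Bacteria", "Firmicutes")),  # missing SIG_B
    Protein("UNREVIEWED_3", {"SIG_A", "SIG_B"}, ("Archaea",)),       # wrong taxon
]
propagate(rules, proteins)

# Coverage figure quoted in the abstract: 53 million of 178 million records.
coverage_percent = 100 * 53 / 178
```

In this toy data only `UNREVIEWED_1` receives the annotation. The other two records each fail one of the two selection criteria.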
4.
Proteomics ; 15(1): 48-57, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25307260

ABSTRACT

In this article, we provide a comprehensive study of the content of the Universal Protein Resource (UniProt) protein data sets for human and mouse. The tryptic search spaces of the UniProtKB (UniProt knowledgebase) complete proteome sets were compared with other data sets from UniProtKB and with the corresponding International Protein Index, reference sequence, Ensembl, and UniRef100 (where UniRef is UniProt reference clusters) organism-specific data sets. All protein forms annotated in UniProtKB (both the canonical sequences and isoforms) were evaluated in this study. In addition, natural and disease-associated amino acid variants annotated in UniProtKB were included in the evaluation. The peptide unicity was also evaluated for each data set. Furthermore, the peptide information in the UniProtKB data sets was also compared against the available peptide-level identifications in the main MS-based proteomics repositories. Identifying the peptides observed in these repositories is an important resource of information for protein databases as they provide supporting evidence for the existence of otherwise predicted proteins. Likewise, the repositories could use the information available in UniProtKB to direct reprocessing efforts on specific sets of peptides/proteins of interest. In summary, we provide comprehensive information about the different organism-specific sequence data sets available from UniProt, together with the pros and cons for each, in terms of search space for MS-based bottom-up proteomics workflows. The aim of the analysis is to provide a clear view of the tryptic search space of UniProt and other protein databases to enable scientists to select those most appropriate for their purposes.


Subjects
Databases, Protein ; Proteins/chemistry ; Proteomics ; Animals ; Humans ; Mice ; Peptides/chemistry ; Peptides/metabolism ; Protein Isoforms/chemistry ; Protein Isoforms/metabolism ; Proteins/metabolism ; Sequence Analysis, Protein ; Trypsin/metabolism
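
The notions of a tryptic search space and peptide unicity used in the abstract above can be made concrete with a minimal sketch. It is not the authors' pipeline, and several things are assumptions made here for illustration: the common cleavage rule (cut after K or R unless the next residue is P), no missed cleavages, the length window, and the function names and toy sequences. A peptide counts as unique when exactly one protein in the data set yields it.

```python
import re
from collections import defaultdict


def tryptic_peptides(sequence, min_length=6, max_length=30):
    """In-silico digest: cut after K or R unless followed by P, no missed cleavages."""
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if min_length <= len(p) <= max_length]


def peptide_unicity(proteins, min_length=6, max_length=30):
    """Map each peptide to the proteins that yield it, and collect the unique ones."""
    owners = defaultdict(set)
    for protein_id, sequence in proteins.items():
        for peptide in tryptic_peptides(sequence, min_length, max_length):
            owners[peptide].add(protein_id)
    unique = {peptide for peptide, ids in owners.items() if len(ids) == 1}
    return owners, unique


# Toy sequences (hypothetical); a short length window keeps the example readable.
toy_proteins = {"P1": "AAKGGGR", "P2": "AAKCCCR"}
owners, unique = peptide_unicity(toy_proteins, min_length=3)
digest = tryptic_peptides("AAKRPGGRCCKPDDK", min_length=1)
```

Here `AAK` is shared by both toy proteins, so it says nothing about which one is present, while `GGGR` and `CCCR` are unique. Adding isoforms or variant sequences to a data set enlarges the search space but tends to reduce the fraction of unique peptides, which is the trade-off the abstract evaluates across data sets.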
5.
Chembiochem ; 13(9): 1297-303, 2012 Jun 18.
Article in English | MEDLINE | ID: mdl-22614947

ABSTRACT

We have identified the native dimer interface of heptaprenylglyceryl phosphate synthase PcrB from the bacterium Bacillus subtilis and analyzed the significance of oligomer formation for stability and catalytic activity. Computational methods predicted two different surface regions of the PcrB protomer that could be responsible for dimer formation. These bona fide interfaces were assessed both in silico and experimentally by the introduction of amino acid substitutions that led to monomerization, and by incorporation of an unnatural amino acid to allow cross-linking of the two protomers. The results showed that, in contrast to previous assumptions, PcrB uses the same interface for dimerization as the homologous geranylgeranylglyceryl phosphate synthase from Archaea. Thermal unfolding demonstrated that the monomeric proteins are only slightly less stable than wild-type PcrB. However, activity assays showed that monomerization limits the length of accepted polyprenyl pyrophosphates to three isoprene units, whereas the native PcrB substrate contains seven isoprene entities. We provide a plausible hypothesis as to how dimerization determines substrate specificity of PcrB.


Subjects
Bacillus subtilis/enzymology ; Dimethylallyltranstransferase/chemistry ; Dimethylallyltranstransferase/metabolism ; Protein Multimerization ; Amino Acid Substitution ; Dimethylallyltranstransferase/genetics ; Enzyme Stability ; Models, Molecular ; Protein Structure, Quaternary ; Substrate Specificity ; Temperature
6.
Proteins ; 80(1): 154-68, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22038731

ABSTRACT

An important task of computational biology is to identify those parts of a polypeptide chain, which are involved in interactions with other proteins. For this purpose, we have developed the program PresCont, which predicts in a robust manner amino acids that constitute protein-protein interfaces (PPIs). PresCont reaches state-of-the-art classification quality on the basis of only four residue properties that can be readily deduced from the 3D structure of an individual protein and a multiple sequence alignment (MSA) composed of homologs. The core of PresCont is a support vector machine, which assesses solvent-accessible surface area, hydrophobicity, conservation, and the local environment of each amino acid on the protein surface. For training and performance testing, we compiled three nonoverlapping datasets consisting of permanently formed or transient complexes, respectively. A comparison with SPPIDER, ProMate, and meta-PPISP showed that PresCont compares favorably with these highly sophisticated programs, and that its prediction quality is less dependent on the type of protein complex being considered. This balance is due to a mutual compensation of classification weaknesses observed for individual properties: For PPIs of permanent complexes, solvent-accessible surface and hydrophobicity contribute most to classification quality, for PPIs of transient complexes, the assessment of the local environment is most significant. Moreover, we show that for permanent complexes a segmentation of PPIs into core and rim residues has only a moderate influence on prediction quality. PresCont is available as a web service at http://www-bioinf.uni-regensburg.de/.


Subjects
Computer Simulation ; Models, Molecular ; Protein Interaction Domains and Motifs ; Software ; Algorithms ; Amino Acid Sequence ; Conserved Sequence ; Fungal Proteins/chemistry ; Hydrophobic and Hydrophilic Interactions ; Multiprotein Complexes/chemistry ; ROC Curve ; Sequence Alignment ; Support Vector Machine ; Surface Properties ; Yeasts ; tRNA Methyltransferases/chemistry