Results 1 - 20 of 66
1.
PLoS Biol; 19(12): e3001464, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34871295

ABSTRACT

The UniProt knowledgebase is a public database for protein sequence and function, covering the tree of life and over 220 million protein entries. Now, the whole community can use a new crowdsourcing annotation system to help scale up UniProt curation and receive proper attribution for their biocuration work.


Subjects
Crowdsourcing/methods; Data Curation/methods; Molecular Sequence Annotation/methods; Amino Acid Sequence/genetics; Computational Biology/methods; Databases, Protein/trends; Humans; Literature; Proteins/metabolism; Stakeholder Participation
2.
Hum Genet; 142(7): 927-947, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37191732

ABSTRACT

To expedite gene discovery in eye development and its associated defects, we previously developed a bioinformatics resource-tool iSyTE (integrated Systems Tool for Eye gene discovery). However, iSyTE is presently limited to lens tissue and is predominantly based on transcriptomics datasets. Therefore, to extend iSyTE to other eye tissues on the proteome level, we performed high-throughput tandem mass spectrometry (MS/MS) on mouse embryonic day (E)14.5 retina and retinal pigment epithelium combined tissue and identified an average of 3300 proteins per sample (n = 5). High-throughput expression profiling-based gene discovery approaches (involving either transcriptomics or proteomics) pose a key challenge of prioritizing candidates from thousands of RNA/proteins expressed. To address this, we used MS/MS proteome data from mouse whole embryonic body (WB) as a reference dataset and performed comparative analysis (termed "in silico WB-subtraction") with the retina proteome dataset. In silico WB-subtraction identified 90 high-priority proteins with retina-enriched expression at stringency criteria of ≥ 2.5 average spectral counts, ≥ 2.0 fold-enrichment, false discovery rate < 0.01. These top candidates represent a pool of retina-enriched proteins, several of which are associated with retinal biology and/or defects (e.g., Aldh1a1, Ank2, Ank3, Dcn, Dync2h1, Egfr, Ephb2, Fbln5, Fbn2, Hras, Igf2bp1, Msi1, Rbp1, Rlbp1, Tenm3, Yap1, etc.), indicating the effectiveness of this approach. Importantly, in silico WB-subtraction also identified several new high-priority candidates with potential regulatory function in retina development. Finally, proteins exhibiting expression or enriched-expression in the retina are made accessible in a user-friendly manner at iSyTE (https://research.bioinformatics.udel.edu/iSyTE/), to allow effective visualization of this information and facilitate eye gene discovery.
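
The stringency criteria quoted in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the record type, the function names, the pseudocount used to avoid division by zero, and the example values are all assumptions made for illustration. Only the three thresholds (≥ 2.5 average spectral counts, ≥ 2.0 fold-enrichment, FDR < 0.01) come from the abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProteinStat:
    """Hypothetical per-protein summary; field names are illustrative."""
    name: str
    tissue_mean_counts: float  # average spectral counts in the tissue of interest
    wb_mean_counts: float      # average spectral counts in the whole-body (WB) reference
    fdr: float                 # false discovery rate for the tissue-vs-WB comparison


def fold_enrichment(tissue: float, reference: float, pseudocount: float = 1.0) -> float:
    """Tissue/WB ratio; the pseudocount is an assumption to avoid division by zero."""
    return (tissue + pseudocount) / (reference + pseudocount)


def wb_subtract(proteins: List[ProteinStat],
                min_counts: float = 2.5,
                min_fold: float = 2.0,
                max_fdr: float = 0.01) -> List[ProteinStat]:
    """Keep proteins that pass all three thresholds quoted in the abstract."""
    return [
        p for p in proteins
        if p.tissue_mean_counts >= min_counts
        and fold_enrichment(p.tissue_mean_counts, p.wb_mean_counts) >= min_fold
        and p.fdr < max_fdr
    ]


# Made-up values for illustration, not data from the study.
examples = [
    ProteinStat("ProtA", 10.0, 2.0, 0.001),  # passes all three filters
    ProteinStat("ProtB", 2.0, 0.0, 0.001),   # too few spectral counts
    ProteinStat("ProtC", 8.0, 6.0, 0.001),   # not enriched over WB
    ProteinStat("ProtD", 12.0, 1.0, 0.05),   # FDR too high
]
print([p.name for p in wb_subtract(examples)])
```

With these placeholder values only `ProtA` survives. How the published analysis handles zero counts in the reference is not stated in the abstract, so the pseudocount should be read as one possible choice.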


Subjects
Eye Diseases; Retinal Pigment Epithelium; Animals; Mice; Retinal Pigment Epithelium/metabolism; Tandem Mass Spectrometry; Proteome/genetics; Proteome/metabolism; Proteomics; Retina/metabolism; Gene Expression Profiling; Genetic Association Studies
3.
Bioinformatics; 36(17): 4643-4648, 2020 Nov 01.
Article in English | MEDLINE | ID: mdl-32399560

ABSTRACT

MOTIVATION: The number of protein records in the UniProt Knowledgebase (UniProtKB: https://www.uniprot.org) continues to grow rapidly as a result of genome sequencing and the prediction of protein-coding genes. Providing functional annotation for these proteins presents a significant and continuing challenge. RESULTS: In response to this challenge, UniProt has developed a method of annotation, known as UniRule, based on expertly curated rules, which integrates related systems (RuleBase, HAMAP, PIRSR, PIRNR) developed by the members of the UniProt consortium. UniRule uses protein family signatures from InterPro, combined with taxonomic and other constraints, to select sets of reviewed proteins which have common functional properties supported by experimental evidence. This annotation is propagated to unreviewed records in UniProtKB that meet the same selection criteria, most of which do not have (and are never likely to have) experimentally verified functional annotation. Release 2020_01 of UniProtKB contains 6496 UniRule rules which provide annotation for 53 million proteins, accounting for 30% of the 178 million records in UniProtKB. UniRule provides scalable enrichment of annotation in UniProtKB. AVAILABILITY AND IMPLEMENTATION: UniRule rules are integrated into UniProtKB and can be viewed at https://www.uniprot.org/unirule/. UniRule rules and the code required to run the rules are publicly available for researchers who wish to annotate their own sequences. The implementation used to run the rules is known as UniFIRE and is available at https://gitlab.ebi.ac.uk/uniprot-public/unifire.
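
The propagation step described in this abstract (signature plus taxonomic constraint selects records, and the annotation is applied to unreviewed records that meet the same criteria) can be sketched as follows. This is a toy model under stated assumptions, not UniRule or UniFIRE: the `Rule` and `Record` types, the function names, and every identifier such as `SIG_EXAMPLE_1` are hypothetical, and real rules carry further constraints that are not modelled here.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a family signature plus a taxonomic constraint."""
    signature: str   # placeholder signature ID, not a real InterPro accession
    taxon: str       # taxon that must appear in the record's lineage
    annotation: str  # annotation to propagate when the rule matches


@dataclass
class Record:
    """Hypothetical protein record; field names are illustrative."""
    accession: str
    signatures: Set[str]
    lineage: Tuple[str, ...]
    reviewed: bool
    annotations: List[str] = field(default_factory=list)


def rule_matches(rule: Rule, record: Record) -> bool:
    """A record matches when it has the signature and sits under the taxon."""
    return rule.signature in record.signatures and rule.taxon in record.lineage


def propagate(rules: List[Rule], records: List[Record]) -> List[Record]:
    """Add each matching rule's annotation to unreviewed records only."""
    for record in records:
        if record.reviewed:
            continue  # reviewed records are left untouched in this sketch
        for rule in rules:
            if rule_matches(rule, record) and rule.annotation not in record.annotations:
                record.annotations.append(rule.annotation)
    return records


# Placeholder data for illustration only.
rules = [Rule("SIG_EXAMPLE_1", "Bacteria", "example function")]
records = [
    Record("ACC_1", {"SIG_EXAMPLE_1"}, ("Bacteria", "ExampleGenus"), reviewed=False),
    Record("ACC_2", {"SIG_EXAMPLE_1"}, ("Eukaryota", "ExampleGenus"), reviewed=False),
    Record("ACC_3", {"SIG_EXAMPLE_1"}, ("Bacteria", "ExampleGenus"), reviewed=True),
]
for rec in propagate(rules, records):
    print(rec.accession, rec.annotations)
```

Only `ACC_1` receives the annotation: `ACC_2` fails the taxonomic constraint and `ACC_3` is already reviewed.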


Subjects
Knowledge Bases; Proteins; Chromosome Mapping; Databases, Protein; Molecular Sequence Annotation; Proteins/genetics
4.
Nucleic Acids Res; 47(D1): D351-D360, 2019 Jan 08.
Article in English | MEDLINE | ID: mdl-30398656

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.


Subjects
Databases, Protein; Molecular Sequence Annotation; Animals; Databases, Genetic; Gene Ontology; Humans; Internet; Multigene Family; Protein Domains/genetics; Sequence Homology, Amino Acid; Software; User-Computer Interface
5.
Hum Genet; 139(2): 151-184, 2020 Feb.
Article in English | MEDLINE | ID: mdl-31797049

ABSTRACT

While the bioinformatics resource-tool iSyTE (integrated Systems Tool for Eye gene discovery) effectively identifies human cataract-associated genes, it is currently based on just transcriptome data, and thus, it is necessary to include protein-level information to gain greater confidence in gene prioritization. Here, we expand iSyTE through development of a novel proteome-based resource on the lens and demonstrate its utility in cataract gene discovery. We applied high-throughput tandem mass spectrometry (MS/MS) to generate a global protein expression profile of mouse lens at embryonic day (E)14.5, which identified 2371 lens-expressed proteins. A major challenge of high-throughput expression profiling is identification of high-priority candidates among the thousands of expressed proteins. To address this problem, we generated new MS/MS proteome data on mouse whole embryonic body (WB). WB proteome was then used as a reference dataset for performing "in silico WB-subtraction" comparative analysis with the lens proteome, which effectively identified 422 proteins with lens-enriched expression at ≥ 2.5 average spectral counts, ≥ 2.0 fold enrichment (FDR < 0.01) cut-off. These top 20% candidates represent a rich pool of high-priority proteins in the lens including known human cataract-linked genes and many new potential regulators of lens development and homeostasis. This rich information is made publicly accessible through iSyTE (https://research.bioinformatics.udel.edu/iSyTE/), which enables user-friendly visualization of promising candidates, thus making iSyTE a comprehensive tool for cataract gene discovery.


Subjects
Biomarkers/metabolism; Cataract/metabolism; Computer Simulation; Eye Proteins/metabolism; Lens, Crystalline/metabolism; Proteome/metabolism; Tandem Mass Spectrometry/methods; Animals; Cataract/genetics; Cataract/pathology; Computational Biology; Eye Proteins/genetics; Gene Expression Profiling; Humans; Lens, Crystalline/embryology; Mice; Mice, Inbred C57BL; Proteome/analysis; Transcriptome
6.
Nucleic Acids Res; 46(D1): D875-D885, 2018 Jan 04.
Article in English | MEDLINE | ID: mdl-29036527

ABSTRACT

Although successful in identifying new cataract-linked genes, the previous version of the database iSyTE (integrated Systems Tool for Eye gene discovery) was based on expression information on just three mouse lens stages and was functionally limited to visualization by only UCSC-Genome Browser tracks. To increase its efficacy, here we provide an enhanced iSyTE version 2.0 (URL: http://research.bioinformatics.udel.edu/iSyTE) based on well-curated, comprehensive genome-level lens expression data as a one-stop portal for the effective visualization and analysis of candidate genes in lens development and disease. iSyTE 2.0 includes all publicly available lens Affymetrix and Illumina microarray datasets representing a broad range of embryonic and postnatal stages from wild-type and specific gene-perturbation mouse mutants with eye defects. Further, we developed a new user-friendly web interface for direct access and cogent visualization of the curated expression data, which supports convenient searches and a range of downstream analyses. The utility of these new iSyTE 2.0 features is illustrated through examples of established genes associated with lens development and pathobiology, which serve as tutorials for its application by the end-user. iSyTE 2.0 will facilitate the prioritization of eye development and disease-linked candidate genes in studies involving transcriptomics or next-generation sequencing data, linkage analysis and GWAS approaches.


Subjects
Cataract/genetics; Databases, Genetic; Eye Proteins/genetics; Gene Expression; Genetic Association Studies/methods; Animals; Cataract/embryology; Cataract/metabolism; Datasets as Topic; Disease Models, Animal; Eye Proteins/biosynthesis; Forecasting; Gene Expression Profiling; Gene Regulatory Networks; Genome-Wide Association Study; Humans; Lens, Crystalline/embryology; Lens, Crystalline/growth & development; Lens, Crystalline/metabolism; Mice; Mice, Mutant Strains; Oligonucleotide Array Sequence Analysis; User-Computer Interface
7.
Nucleic Acids Res; 46(D1): D542-D550, 2018 Jan 04.
Article in English | MEDLINE | ID: mdl-29145615

ABSTRACT

Protein post-translational modifications (PTMs) play a pivotal role in numerous biological processes by modulating regulation of protein function. We have developed iPTMnet (http://proteininformationresource.org/iPTMnet) for PTM knowledge discovery, employing an integrative bioinformatics approach that combines text mining, data mining, and ontological representation to capture rich PTM information, including PTM enzyme-substrate-site relationships, PTM-specific protein-protein interactions (PPIs) and PTM conservation across species. iPTMnet encompasses data from (i) our PTM-focused text mining tools, RLIMS-P and eFIP, which extract phosphorylation information from full-scale mining of PubMed abstracts and full-length articles; (ii) a set of curated databases with experimentally observed PTMs; and (iii) the Protein Ontology, which organizes proteins and PTM proteoforms, enabling their representation, annotation and comparison within and across species. Presently covering eight major PTM types (phosphorylation, ubiquitination, acetylation, methylation, glycosylation, S-nitrosylation, sumoylation and myristoylation), the iPTMnet knowledgebase contains more than 654 500 unique PTM sites in over 62 100 proteins, along with more than 1200 PTM enzymes and over 24 300 PTM enzyme-substrate-site relations. The website supports online search, browsing, retrieval and visual analysis for scientific queries. Several examples, including functional interpretation of phosphoproteomic data, demonstrate iPTMnet as a gateway for visual exploration and systematic analysis of PTM networks and conservation, thereby enabling PTM discovery and hypothesis generation.


Subjects
Databases, Protein; Knowledge Bases; Protein Processing, Post-Translational; Animals; Computational Biology; Data Mining; Enzymes/metabolism; Humans; Internet; Phosphorylation; Protein Interaction Maps; Sequence Alignment
8.
Hum Mutat; 40(6): 694-705, 2019 Jun.
Article in English | MEDLINE | ID: mdl-30840782

ABSTRACT

Understanding the association of genetic variation with its functional consequences in proteins is essential for the interpretation of genomic data and identifying causal variants in diseases. Integration of protein function knowledge with genome annotation can assist in rapidly comprehending genetic variation within complex biological processes. Here, we describe mapping UniProtKB human sequences and positional annotations, such as active sites, binding sites, and variants to the human genome (GRCh38) and the release of a public genome track hub for genome browsers. To demonstrate the power of combining protein annotations with genome annotations for functional interpretation of variants, we present specific biological examples in disease-related genes and proteins. Computational comparisons of UniProtKB annotations and protein variants with ClinVar clinically annotated single nucleotide polymorphism (SNP) data show that 32% of UniProtKB variants colocate with 8% of ClinVar SNPs. The majority of colocated UniProtKB disease-associated variants (86%) map to 'pathogenic' ClinVar SNPs. UniProt and ClinVar are collaborating to provide a unified clinical variant annotation for genomic, protein, and clinical researchers. The genome track hubs, and related UniProtKB files, are downloadable from the UniProt FTP site and discoverable as public track hubs at the UCSC and Ensembl genome browsers.


Subjects
Chromosome Mapping/methods; Databases, Genetic; Mutation, Missense; Proteins/chemistry; Binding Sites; Databases, Protein; Genetic Predisposition to Disease; Humans; Molecular Sequence Annotation; Polymorphism, Single Nucleotide; Protein Binding; Proteins/genetics; Proteins/metabolism; Software; Web Browser
9.
Nucleic Acids Res; 45(D1): D339-D346, 2017 Jan 04.
Article in English | MEDLINE | ID: mdl-27899649

ABSTRACT

The Protein Ontology (PRO; http://purl.obolibrary.org/obo/pr) formally defines and describes taxon-specific and taxon-neutral protein-related entities in three major areas: proteins related by evolution; proteins produced from a given gene; and protein-containing complexes. PRO thus serves as a tool for referencing protein entities at any level of specificity. To enhance this ability, and to facilitate the comparison of such entities described in different resources, we developed a standardized representation of proteoforms using UniProtKB as a sequence reference and PSI-MOD as a post-translational modification reference. We illustrate its use in facilitating an alignment between PRO and Reactome protein entities. We also address issues of scalability, describing our first steps into the use of text mining to identify protein-related entities, the large-scale import of proteoform information from expert curated resources, and our ability to dynamically generate PRO terms. Web views for individual terms are now more informative about closely-related terms, including for example an interactive multiple sequence alignment. Finally, we describe recent improvement in semantic utility, with PRO now represented in OWL and as a SPARQL endpoint. These developments will further support the anticipated growth of PRO and facilitate discoverability of and allow aggregation of data relating to protein entities.


Subjects
Computational Biology/methods; Databases, Genetic; Proteins; Animals; Humans; Proteins/chemistry; Proteins/genetics; Web Browser
10.
Nucleic Acids Res; 45(D1): D190-D199, 2017 Jan 04.
Article in English | MEDLINE | ID: mdl-27899635

ABSTRACT

InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.


Subjects
Computational Biology/methods; Databases, Protein; Protein Interaction Domains and Motifs; Software; Humans; Molecular Sequence Annotation; Phylogeny
11.
BMC Genomics; 19(1): 695, 2018 Sep 21.
Article in English | MEDLINE | ID: mdl-30241500

ABSTRACT

BACKGROUND: Although hatching is perhaps the most abrupt and profound metabolic challenge that a chicken must undergo, there have been no attempts to functionally map the metabolic pathways induced in liver during the embryo-to-hatchling transition. Furthermore, we know very little about the metabolic and regulatory factors that regulate lipid metabolism in late embryos or newly-hatched chicks. In the present study, we examined hepatic transcriptomes of 12 embryos and 12 hatchling chicks during the peri-hatch period (the metabolic switch from chorioallantoic to pulmonary respiration). RESULTS: Initial hierarchical clustering revealed two distinct, albeit opposing, patterns of hepatic gene expression. Cluster A genes are largely lipolytic and highly expressed in embryos, whereas Cluster B genes are lipogenic/thermogenic and mainly controlled by the lipogenic transcription factor THRSPA. Using pairwise comparisons of embryo and hatchling ages, we found 1272 genes that were differentially expressed between embryos and hatchling chicks, including 24 transcription factors and 284 genes that regulate lipid metabolism. The three most differentially-expressed transcripts found in liver of embryos were MOGAT1, DIO3 and PDK4, whereas THRSPA, FASN and DIO2 were highest in hatchlings. An unusual finding was the "ectopic" and extremely high differential expression of seven feather keratin transcripts in liver of 16-day embryos, which coincides with engorgement of liver with yolk lipids. Gene interaction networks show several transcription factors, transcriptional co-activators/co-inhibitors and their downstream genes that exert a 'yin-yang' action on lipid metabolism during the embryo-to-hatchling transition. These upstream regulators include ligand-activated transcription factors, sirtuins and Kruppel-like factors. CONCLUSIONS: Our genome-wide transcriptional analysis has greatly expanded the hepatic repertoire of regulatory and metabolic genes involved in the embryo-to-hatchling transition. New knowledge was gained on interactive transcriptional networks and metabolic pathways that enable the abrupt switch from ectothermy (embryo) to endothermy (hatchling) in the chicken. Several transcription factors and their coactivators/co-inhibitors appear to exert opposing actions on lipid metabolism, leading to the predominance of lipolysis in embryos and lipogenesis in hatchlings. Our analysis of hepatic transcriptomes has enabled discovery of opposing, interconnected and interdependent transcriptional regulators that provide precise yin-yang or homeorhetic regulation of lipid metabolism during the critical embryo-to-hatchling transition.


Subjects
Chickens/growth & development; Chickens/metabolism; Gene Expression Regulation, Developmental; Liver/metabolism; Animals; Breeding; Chick Embryo/growth & development; Chick Embryo/metabolism; Embryonic Development; Gene Expression Profiling; High-Throughput Nucleotide Sequencing; Liver/embryology; Liver/growth & development; Transcriptome
12.
Hum Genet; 137(11-12): 941-954, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30417254

ABSTRACT

Isolated or syndromic congenital cataracts are heterogeneous developmental defects, making the identification of the associated genes challenging. In the past, mouse lens expression microarrays have been successfully applied in bioinformatics tools (e.g., iSyTE) to facilitate human cataract-associated gene discovery. To develop a new resource for geneticists, we report high-throughput RNA sequencing (RNA-seq) profiles of mouse lens at key embryonic stages (E)10.5 (lens pit), E12.5 (primary fiber cell differentiation), E14.5 and E16.5 (secondary fiber cell differentiation). These stages capture important events as the lens develops from an invaginating placode into a transparent tissue. Previously, in silico whole-embryo body (WB)-subtraction-based "lens-enriched" expression has been effective in prioritizing cataract-linked genes. To apply an analogous approach, we generated new mouse WB RNA-seq datasets and show that in silico WB subtraction of lens RNA-seq datasets successfully identifies key genes based on lens-enriched expression. At ≥2 counts-per-million expression, ≥1.5 log2 fold-enrichment (p < 0.05) cutoff, E10.5 lens exhibits 1401 enriched genes (17% lens-expressed genes), E12.5 lens exhibits 1937 enriched genes (22% lens-expressed genes), E14.5 lens exhibits 2514 enriched genes (31% lens-expressed genes), and E16.5 lens exhibits 2745 enriched genes (34% lens-expressed genes). Biological pathway analysis identified genes associated with lens development, transcription regulation and signaling pathways, among other functional groups. Furthermore, these new RNA-seq data confirmed high expression of established cataract-linked genes and identified new potential regulators in the lens. Finally, we developed new lens stage-specific UCSC Genome Browser annotation tracks and made these publicly accessible through iSyTE (https://research.bioinformatics.udel.edu/iSyTE/) for user-friendly visualization of lens gene expression/enrichment to prioritize genes from high-throughput data from cataract cases.


Subjects
Cataract/genetics; Cell Differentiation/genetics; Embryonic Development/genetics; Gene Expression Regulation/genetics; Animals; Cataract/pathology; Computational Biology; Genetic Association Studies; Genetic Predisposition to Disease; High-Throughput Nucleotide Sequencing; Humans; Lens, Crystalline/pathology; Mice; Sequence Analysis, RNA
13.
Bioinformatics; 32(13): 2041-3, 2016 Jul 01.
Article in English | MEDLINE | ID: mdl-27153712

ABSTRACT

MOTIVATION: The enormous number of redundant sequenced genomes has hindered efforts to analyze and functionally annotate proteins. As the taxonomy of viruses is not uniformly defined, viral proteomes pose special challenges in this regard. Grouping viruses based on the similarity of their proteins at proteome scale can normalize against potential taxonomic nomenclature anomalies. RESULTS: We present Viral Reference Proteomes (Viral RPs), which are computed from complete virus proteomes within UniProtKB. Viral RPs based on 95, 75, 55, 35 and 15% co-membership in proteome similarity based clusters are provided. Comparison of our computational Viral RPs with UniProt's curator-selected Reference Proteomes indicates that the two sets are consistent and complementary. Furthermore, each Viral RP represents a cluster of virus proteomes that was consistent with virus or host taxonomy. We provide BLASTP search and FTP download of Viral RP protein sequences, and a browser to facilitate the visualization of Viral RPs. AVAILABILITY AND IMPLEMENTATION: http://proteininformationresource.org/rps/viruses/ CONTACT: chenc@udel.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Databases, Protein; Proteome/analysis; Viral Proteins/analysis; Amino Acid Sequence; Cluster Analysis; Computational Biology; Knowledge Bases
15.
Nucleic Acids Res; 43(Database issue): D213-21, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25428371

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) is a freely available resource that can be used to classify sequences into protein families and to predict the presence of important domains and sites. Central to the InterPro database are predictive models, known as signatures, from a range of different protein family databases that have different biological focuses and use different methodological approaches to classify protein families and domains. InterPro integrates these signatures, capitalizing on the respective strengths of the individual databases, to produce a powerful protein classification resource. Here, we report on the status of InterPro as it enters its 15th year of operation, and give an overview of new developments with the database and its associated Web interfaces and software. In particular, the new domain architecture search tool is described and the process of mapping of Gene Ontology terms to InterPro is outlined. We also discuss the challenges faced by the resource given the explosive growth in sequence data in recent years. InterPro (version 48.0) contains 36,766 member database signatures integrated into 26,238 InterPro entries, an increase of over 3993 entries (5081 signatures), since 2012.


Subjects
Databases, Protein; Proteins/classification; Bacteria/metabolism; Gene Ontology; Protein Structure, Tertiary; Proteins/genetics; Sequence Analysis, Protein; Software
16.
Hum Mol Genet; 23(1): 24-39, 2014 Jan 01.
Article in English | MEDLINE | ID: mdl-23943793

ABSTRACT

Iron-sulfur (Fe-S) clusters are ancient enzyme cofactors found in virtually all life forms. We evaluated the physiological effects of chronic Fe-S cluster deficiency in human skeletal muscle, a tissue that relies heavily on Fe-S cluster-mediated aerobic energy metabolism. Despite greatly decreased oxidative capacity, muscle tissue from patients deficient in the Fe-S cluster scaffold protein ISCU showed a predominance of type I oxidative muscle fibers and higher capillary density, enhanced expression of transcriptional co-activator PGC-1α and increased mitochondrial fatty acid oxidation genes. These Fe-S cluster-deficient muscles showed a dramatic up-regulation of the ketogenic enzyme HMGCS2 and the secreted protein FGF21 (fibroblast growth factor 21). Enhanced muscle FGF21 expression was reflected by elevated circulating FGF21 levels in the patients, and robust FGF21 secretion could be recapitulated by respiratory chain inhibition in cultured myotubes. Our findings reveal that mitochondrial energy starvation elicits a coordinated response in Fe-S-deficient skeletal muscle that is reflected systemically by increased plasma FGF21 levels.


Subjects
Acidosis, Lactic/congenital; Fibroblast Growth Factors/metabolism; Hydroxymethylglutaryl-CoA Synthase/metabolism; Iron-Sulfur Proteins/metabolism; Muscle, Skeletal/metabolism; Muscular Diseases/congenital; Transcription Factors/genetics; Acidosis, Lactic/genetics; Acidosis, Lactic/metabolism; Acidosis, Lactic/pathology; Adult; Aged; Case-Control Studies; Cells, Cultured; Energy Metabolism; Female; Fibroblast Growth Factors/genetics; Gene Expression Regulation; Humans; Hydroxymethylglutaryl-CoA Synthase/genetics; Iron-Sulfur Proteins/genetics; Male; Middle Aged; Mitochondria, Muscle/metabolism; Mitochondria, Muscle/pathology; Muscular Diseases/genetics; Muscular Diseases/metabolism; Muscular Diseases/pathology; Peroxisome Proliferator-Activated Receptor Gamma Coactivator 1-alpha; Transcription Factors/metabolism
17.
Bioinformatics; 31(6): 926-32, 2015 Mar 15.
Article in English | MEDLINE | ID: mdl-25398609

ABSTRACT

MOTIVATION: UniRef databases provide full-scale clustering of UniProtKB sequences and are utilized for a broad range of applications, particularly similarity-based functional annotation. Non-redundancy and intra-cluster homogeneity in UniRef were recently improved by adding a sequence length overlap threshold. Our hypothesis is that these improvements would enhance the speed and sensitivity of similarity searches and improve the consistency of annotation within clusters. RESULTS: Intra-cluster molecular function consistency was examined by analysis of Gene Ontology terms. Results show that UniRef clusters bring together proteins of identical molecular function in more than 97% of the clusters, implying that clusters are useful for annotation and can also be used to detect annotation inconsistencies. To examine coverage in similarity results, BLASTP searches against UniRef50 followed by expansion of the hit lists with cluster members demonstrated advantages compared with searches against UniProtKB sequences; the searches are concise (∼7 times shorter hit list before expansion), faster (∼6 times) and more sensitive in detection of remote similarities (>96% recall at e-value <0.0001). Our results support the use of UniRef clusters as a comprehensive and scalable alternative to native sequence databases for similarity searches and reinforce their reliability for use in functional annotation.
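
The "search representatives, then expand the hit list with cluster members" step mentioned in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the function name, the in-memory dictionary of cluster memberships, and all identifiers (`REP_A`, `MEM_A1`, and so on) are hypothetical, and the similarity search that would produce the representative hits is not shown.

```python
from typing import Dict, List


def expand_hits(representative_hits: List[str],
                cluster_members: Dict[str, List[str]]) -> List[str]:
    """Replace each representative hit with its cluster members.

    Hit order is preserved and duplicates are dropped.
    """
    expanded: List[str] = []
    seen = set()
    for rep in representative_hits:
        # Fall back to the representative itself if no cluster is recorded.
        for member in cluster_members.get(rep, [rep]):
            if member not in seen:
                seen.add(member)
                expanded.append(member)
    return expanded


# Placeholder cluster table and hit list for illustration only.
clusters = {
    "REP_A": ["REP_A", "MEM_A1", "MEM_A2"],
    "REP_B": ["REP_B", "MEM_B1"],
}
hits = ["REP_B", "REP_A", "REP_C"]
print(expand_hits(hits, clusters))
```

The short list of representatives is what gets searched; the longer member list is only materialised afterwards, which is the idea behind the shorter pre-expansion hit list reported in the abstract.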


Subjects
Computational Biology; Databases, Protein; Dioxygenases/metabolism; Membrane Proteins/metabolism; Proteins/metabolism; Sequence Analysis, Protein; Software; AlkB Homolog 5, RNA Demethylase; Cluster Analysis; Dioxygenases/chemistry; Dioxygenases/genetics; Gene Ontology; Humans; Information Storage and Retrieval; Membrane Proteins/chemistry; Membrane Proteins/genetics; Molecular Sequence Annotation; Proteins/chemistry; Proteins/genetics
18.
Nucleic Acids Res; 42(Database issue): D415-21, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24270789

ABSTRACT

The Protein Ontology (PRO; http://proconsortium.org) formally defines protein entities and explicitly represents their major forms and interrelations. Protein entities represented in PRO corresponding to single amino acid chains are categorized by level of specificity into family, gene, sequence and modification metaclasses, and there is a separate metaclass for protein complexes. All metaclasses also have organism-specific derivatives. PRO complements established sequence databases such as UniProtKB, and interoperates with other biomedical and biological ontologies such as the Gene Ontology (GO). PRO relates to UniProtKB in that PRO's organism-specific classes of proteins encoded by a specific gene correspond to entities documented in UniProtKB entries. PRO relates to the GO in that PRO's representations of organism-specific protein complexes are subclasses of the organism-agnostic protein complex terms in the GO Cellular Component Ontology. The past few years have seen growth and changes to the PRO, as well as new points of access to the data and new applications of PRO in immunology and proteomics. Here we describe some of these developments.


Subjects
Biological Ontologies; Databases, Protein; Proteins/classification; Animals; Humans; Internet; Mice; Proteins/chemistry
19.
Bioinformatics; 29(21): 2808-9, 2013 Nov 01.
Article in English | MEDLINE | ID: mdl-23958731

ABSTRACT

SUMMARY: We have developed a new web application for peptide matching using an Apache Lucene-based search engine. The Peptide Match service is designed to quickly retrieve all occurrences of a given query peptide from UniProt Knowledgebase (UniProtKB) with isoforms. The matched proteins are shown in summary tables with rich annotations, including matched sequence region(s) and links to corresponding proteins in a number of proteomic/peptide spectral databases. The results are grouped by taxonomy and can be browsed by organism, taxonomic group or taxonomy tree. The service supports queries where isobaric leucine and isoleucine are treated as equivalent, and an option for searching UniRef100 representative sequences, as well as dynamic queries to major proteomic databases. In addition to the web interface, we also provide RESTful web services. The underlying data are updated every 4 weeks in accordance with the UniProt releases. AVAILABILITY: http://proteininformationresource.org/peptide.shtml. CONTACT: chenc@udel.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
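
The leucine/isoleucine equivalence option described in this abstract can be illustrated with a minimal sketch. It deliberately swaps the Lucene index for a plain linear scan over an in-memory dictionary, so it shows the matching logic only, not the service's performance or API. The function names, the `il_equivalent` flag, and the accessions and sequences are all hypothetical.

```python
from typing import Dict, List


def normalize(seq: str, il_equivalent: bool) -> str:
    """Upper-case the sequence and, if requested, map isoleucine (I) to leucine (L)."""
    seq = seq.upper()
    return seq.replace("I", "L") if il_equivalent else seq


def find_peptide(peptide: str,
                 proteins: Dict[str, str],
                 il_equivalent: bool = False) -> Dict[str, List[int]]:
    """Return 1-based start positions of every occurrence of the peptide per protein."""
    if not peptide:
        raise ValueError("query peptide must not be empty")
    query = normalize(peptide, il_equivalent)
    hits: Dict[str, List[int]] = {}
    for accession, sequence in proteins.items():
        target = normalize(sequence, il_equivalent)
        positions = []
        start = target.find(query)
        while start != -1:
            positions.append(start + 1)            # report 1-based positions
            start = target.find(query, start + 1)  # allow overlapping matches
        if positions:
            hits[accession] = positions
    return hits


# Placeholder sequences for illustration only.
proteins = {
    "EX_1": "MKTLLIAGSV",
    "EX_2": "MKTILLAGSV",
}
print(find_peptide("TLLIAG", proteins))
print(find_peptide("TLLIAG", proteins, il_equivalent=True))
```

An exact search finds the query only in `EX_1`, while the I/L-equivalent search also reports `EX_2`, whose sequence differs from the query only by swapped leucine and isoleucine residues.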


Subjects
Protein Databases , Peptides/chemistry , Search Engine , Internet , Knowledge Bases , Proteomics , Protein Sequence Analysis
20.
BMC Struct Biol ; 13: 6, 2013 Apr 25.
Article in English | MEDLINE | ID: mdl-23617634

ABSTRACT

BACKGROUND: The post-genomic era poses several challenges. The biggest is the identification of biochemical function for protein sequences and structures resulting from genomic initiatives. Most sequences lack a characterized function and are annotated as hypothetical or uncharacterized. While homology-based methods are useful, and work well for sequences with sequence identities above 50%, they fail for sequences in the twilight zone (<30%) of sequence identity. For cases where sequence methods fail, structural approaches are often used, based on the premise that structure preserves function for longer evolutionary time-frames than sequence alone. It is now clear that no single method can be used successfully for functional inference. Given the growing need for functional assignments, we describe here a systematic new approach, designated ligand-centric, which is primarily based on analysis of ligand-bound/unbound structures in the PDB. Results of applying our approach to S-adenosyl-L-methionine (SAM) binding proteins are presented. RESULTS: Our analysis included 1,224 structures that belong to 172 unique families of the Protein Information Resource Superfamily system. Our ligand-centric approach was divided into four levels: residue, protein/domain, ligand, and family levels. The residue level included the identification of conserved binding site residues based on structure-guided sequence alignments of representative members of a family, and the identification of conserved structural motifs. The protein/domain level included structural classification of proteins, Pfam domains, domain architectures, and protein topologies. The ligand level included ligand conformations, ribose sugar puckering, and the identification of conserved ligand-atom interactions. The family level included phylogenetic analysis. CONCLUSION: We found that SAM bound to a total of 18 different fold types (I-XVIII). We identified 4 new fold types and 11 additional topological arrangements of strands within the well-studied Rossmann fold Methyltransferases (MTases). This extends the existing structural classification of SAM binding proteins. A striking correlation between fold type and the conformation of the bound SAM (classified as types) was found across the 18 fold types. Several site-specific rules were created for the assignment of functional residues to families and proteins that do not have a bound SAM or a solved structure.


Subjects
Ligands , Proteins/metabolism , S-Adenosylmethionine/metabolism , Amino Acid Motifs , Binding Sites , Protein Databases , Methyltransferases/chemistry , Methyltransferases/metabolism , Protein Binding , Protein Folding , Protein Tertiary Structure , Proteins/chemistry , S-Adenosylmethionine/chemistry , Temperature