Results 1 - 20 of 22
1.
BMC Bioinformatics ; 12: 91, 2011 Apr 06.
Article in English | MEDLINE | ID: mdl-21466708

ABSTRACT

BACKGROUND: Protein O-GlcNAcylation (or O-GlcNAc-ylation) is an O-linked glycosylation involving the transfer of β-N-acetylglucosamine to the hydroxyl group of serine or threonine residues of proteins. Growing evidence suggests that protein O-GlcNAcylation is common and is analogous to phosphorylation in modulating broad ranges of biological processes. However, compared to phosphorylation, the amount of protein O-GlcNAcylation data is relatively limited and its annotation in databases is scarce. Furthermore, a bioinformatics resource for O-GlcNAcylation is lacking, and an O-GlcNAcylation site prediction tool is much needed. DESCRIPTION: We developed a database of O-GlcNAcylated proteins and sites, dbOGAP, primarily based on literature published since O-GlcNAcylation was first described in 1984. The database currently contains ~800 proteins with experimental O-GlcNAcylation information, of which ~61% are human, and 172 proteins have a total of ~400 O-GlcNAcylation sites identified. The O-GlcNAcylated proteins are primarily nucleocytoplasmic, including membrane- and non-membrane bounded organelle-associated proteins. The known O-GlcNAcylated proteins exert a broad range of functions including transcriptional regulation, macromolecular complex assembly, intracellular transport, translation, and regulation of cell growth or death. The database also contains ~365 potential O-GlcNAcylated proteins inferred from known O-GlcNAcylated orthologs. Additional annotations, including other protein posttranslational modifications, biological pathways and disease information are integrated into the database. We developed an O-GlcNAcylation site prediction system, OGlcNAcScan, based on Support Vector Machine and trained using protein sequences with known O-GlcNAcylation sites from dbOGAP. The site prediction system achieved an area under ROC curve of 74.3% in five-fold cross-validation.
The dbOGAP website was developed to allow for performing search and query on O-GlcNAcylated proteins and associated literature, as well as for browsing by gene names, organisms or pathways, and downloading of the database. Also available from the website, the OGlcNAcScan tool presents a list of predicted O-GlcNAcylation sites for given protein sequences. CONCLUSIONS: dbOGAP is the first public bioinformatics resource to allow systematic access to the O-GlcNAcylated proteins, and related functional information and bibliography, as well as to an O-GlcNAcylation site prediction tool. The resource will facilitate research on O-GlcNAcylation and its proteomic identification.
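As an illustration of how candidate sites might be prepared for a site predictor of this kind, the sketch below extracts a fixed-length sequence window around every serine or threonine residue. It is not the OGlcNAcScan implementation: the function name `candidate_windows`, the flank length of 7, and the padding character `X` are assumptions chosen for the example.

```python
from typing import List, Tuple


def candidate_windows(sequence: str, flank: int = 7, pad: str = "X") -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every Ser/Thr residue.

    flank and pad are illustrative assumptions, not values from the abstract.
    """
    seq = sequence.upper()
    # Pad both ends so residues near the termini still get full-length windows.
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue in "ST":
            # Residue i sits at index i + flank in the padded string,
            # so the window centred on it spans i .. i + 2 * flank.
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


if __name__ == "__main__":
    for position, window in candidate_windows("MSAT", flank=2):
        print(position, window)
```

Each window could then be encoded as a feature vector and scored by a trained classifier; that step is omitted here because the abstract does not describe the encoding.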


Subjects
Computational Biology/methods , Acetylglucosamine/metabolism , Glycosylation , Humans , Phosphorylation , Protein Processing, Post-Translational , Proteins/metabolism , Proteomics
2.
Bioinformatics ; 23(2): 198-206, 2007 Jan 15.
Article in English | MEDLINE | ID: mdl-17077095

ABSTRACT

MOTIVATION: Our purpose is to develop a statistical modeling approach for cancer biomarker discovery and provide new insights into early cancer detection. We propose the concept of a dependence network, apply it to identify cancer biomarkers, and study the difference between the protein or gene samples from cancer and non-cancer subjects based on mass-spectrometry (MS) and microarray data. RESULTS: Three MS and two gene microarray datasets are studied. Clear differences are observed in the dependence networks for cancer and non-cancer samples. Protein/gene features are examined three at a time through an exhaustive search. Dependence networks are constructed by binding triples identified by the eigenvalue pattern of the dependence model, and are further compared to identify cancer biomarkers. Such dependence-network-based biomarkers show much greater consistency under 10-fold cross-validation than the classification-performance-based biomarkers. Furthermore, the biological relevance of the dependence-network-based biomarkers using microarray data is discussed. The proposed scheme is shown to be promising for cancer diagnosis and prediction. AVAILABILITY: See supplements: http://dsplab.eng.umd.edu/~genomics/dependencenetwork/


Subjects
Biomarkers, Tumor/analysis , Diagnosis, Computer-Assisted/methods , Mass Spectrometry/methods , Neoplasm Proteins/analysis , Neoplasms/diagnosis , Neoplasms/metabolism , Oligonucleotide Array Sequence Analysis/methods , Algorithms , Computer Simulation , Gene Expression Profiling/methods , Humans , Models, Biological , Signal Transduction
3.
BMC Bioinformatics ; 8 Suppl 9: S5, 2007 Nov 27.
Article in English | MEDLINE | ID: mdl-18047706

ABSTRACT

MOTIVATION: With more and more research dedicated to literature mining in the biomedical domain, more and more systems are available for people to choose from when building literature mining applications. In this study, we focus on one specific kind of literature mining task, i.e., detecting definitions of acronyms, abbreviations, and symbols in biomedical text. We denote acronyms, abbreviations, and symbols as short forms (SFs) and their corresponding definitions as long forms (LFs). The study was designed to answer the following questions: i) how well a system performs in detecting LFs from novel text, ii) what the coverage is for various terminological knowledge bases in including SFs as synonyms of their LFs, and iii) how to combine results from various SF knowledge bases. METHOD: We evaluated the following three publicly available detection systems in detecting LFs for SFs: i) a handcrafted pattern/rule based system by Ao and Takagi, ALICE, ii) a machine learning system by Chang et al., and iii) a simple alignment-based program by Schwartz and Hearst. In addition, we investigated the conceptual coverage of two terminological knowledge bases: i) the UMLS (the Unified Medical Language System), and ii) the BioThesaurus (a thesaurus of names for all UniProt protein records). We also implemented a web interface that provides a virtual integration of various SF knowledge bases. RESULTS: We found that detection systems agree with each other in most cases, and the existing terminological knowledge bases provide good coverage of synonymous relationships for frequently defined LFs. The web interface allows people to detect SF definitions from text and to search several SF knowledge bases. AVAILABILITY: The web site is http://gauss.dbb.georgetown.edu/liblab/SFThesaurus.
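To make the SF/LF detection task concrete, the following is a simplified sketch of right-to-left character alignment between a short form and the text preceding it. It is written in the spirit of alignment-based detection and is not the code of any of the three systems evaluated in the study; the function name `best_long_form` is hypothetical.

```python
from typing import Optional


def best_long_form(short_form: str, candidate: str) -> Optional[str]:
    """Align short_form against candidate text from right to left.

    Every alphanumeric character of the short form must appear, in order,
    in the candidate; the first character must also start a word.
    Returns the matched long form, or None if no alignment exists.
    """
    s = len(short_form) - 1
    l = len(candidate) - 1
    while s >= 0:
        ch = short_form[s].lower()
        if not ch.isalnum():
            s -= 1  # skip punctuation inside the short form
            continue
        # Move left until the character matches; for the first short-form
        # character, additionally require a word boundary before the match.
        while l >= 0 and (
            candidate[l].lower() != ch
            or (s == 0 and l > 0 and candidate[l - 1].isalnum())
        ):
            l -= 1
        if l < 0:
            return None
        l -= 1
        s -= 1
    # The long form starts at the beginning of the word holding the last match.
    start = candidate.rfind(" ", 0, l + 1) + 1
    return candidate[start:]


if __name__ == "__main__":
    print(best_long_form("UMLS", "the Unified Medical Language System"))
    print(best_long_form("LF", "corresponding definitions as long forms"))
```

A production system would also need to locate candidate SF/LF pairs (for example, parenthesized expressions) and filter implausible matches; those steps are left out of this sketch.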


Subjects
Algorithms , Artificial Intelligence , Database Management Systems , MEDLINE , Natural Language Processing , Periodicals as Topic , Cluster Analysis , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods , Semantics , User-Computer Interface
4.
Front Biosci ; 12: 5071-88, 2007 Sep 01.
Article in English | MEDLINE | ID: mdl-17569631

ABSTRACT

In the post-genome era, researchers are systematically tackling gene functions and complex regulatory processes by studying organisms on a global scale; however, a major challenge lies in the voluminous, complex, and dynamic data being maintained in heterogeneous sources, especially from proteomics experiments. Advanced computational methods are needed for integration, mining, comparative analysis, and functional interpretation of high-throughput proteomic data. In the first part of this review, we discuss aspects of data integration important for capturing all data relevant to functional analysis. We provide a list of databases commonly used in genomics and proteomics and explain strategies to connect the source data, with especial emphasis on our ID mapping service. Next, we describe iProClass, a central data infrastructure that supports both data integration and functional annotation of proteins, and give a brief introduction to the data search/retrieval and analysis tools currently available at our website (http://pir.georgetown.edu) that researchers can use for large-scale functional analysis. In the last part, we introduce iProXpress (integrated Protein eXpression), an integrated research and discovery platform for large-scale expression data analysis, and we show a prototype that has been useful for organelle proteome analysis.


Subjects
Computational Biology/methods , Gene Expression Profiling/methods , Proteomics , Databases, Genetic , Genomics , Humans , Internet , Melanosomes/metabolism , Organelles/metabolism , Peptide Mapping , Proteome , Software
5.
Bioinformatics ; 22(17): 2136-42, 2006 Sep 01.
Article in English | MEDLINE | ID: mdl-16837530

ABSTRACT

MOTIVATION: Attribute selection is a critical step in development of document classification systems. As a standard practice, words are stemmed and the most informative ones are used as attributes in classification. Owing to high complexity of biomedical terminology, general-purpose stemming algorithms are often conservative and could also remove informative stems. This can lead to accuracy reduction, especially when the number of labeled documents is small. To address this issue, we propose an algorithm that omits stemming and, instead, uses the most discriminative substrings as attributes. RESULTS: The approach was tested on five annotated sets of abstracts from iProLINK that report on the experimental evidence about five types of protein post-translational modifications. The experiments showed that Naive Bayes and support vector machine classifiers perform consistently better [with area under the ROC curve (AUC) accuracy in range 0.92-0.97] when using the proposed attribute selection than when using attributes obtained by the Porter stemmer algorithm (AUC in 0.86-0.93 range). The proposed approach is particularly useful when labeled datasets are small.
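The idea of using discriminative substrings instead of stems can be sketched as follows. The abstract does not state how discriminativeness is measured, so the score used here (absolute difference in document frequency between the two classes) and the substring lengths of 3 to 5 characters are assumptions for illustration only; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def substrings(text: str, min_len: int = 3, max_len: int = 5) -> Set[str]:
    """All distinct character substrings of the lowercased text in the length range."""
    t = text.lower()
    return {t[i:i + n] for n in range(min_len, max_len + 1) for i in range(len(t) - n + 1)}


def score_substrings(pos_docs: List[str], neg_docs: List[str]) -> Dict[str, float]:
    """Score each substring by |fraction of positive docs - fraction of negative docs|.

    This scoring rule is an assumed stand-in, not the published criterion.
    """
    if not pos_docs or not neg_docs:
        raise ValueError("need at least one document per class")
    pos_df: Counter = Counter()
    neg_df: Counter = Counter()
    for doc in pos_docs:
        pos_df.update(substrings(doc))  # each substring counted once per document
    for doc in neg_docs:
        neg_df.update(substrings(doc))
    return {
        s: abs(pos_df[s] / len(pos_docs) - neg_df[s] / len(neg_docs))
        for s in set(pos_df) | set(neg_df)
    }


def top_attributes(scores: Dict[str, float], k: int = 10) -> List[Tuple[str, float]]:
    """The k highest-scoring substrings, ties broken alphabetically."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    positives = ["phosphorylated at serine", "phosphorylation site"]
    negatives = ["binding assay", "crystal structure"]
    print(top_attributes(score_substrings(positives, negatives), k=5))
```

The selected substrings would then serve as binary or count attributes for a classifier such as Naive Bayes or a support vector machine.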


Subjects
Abstracting and Indexing/methods , Database Management Systems , Documentation/methods , Information Storage and Retrieval/methods , MEDLINE , Natural Language Processing , Periodicals as Topic , Algorithms , Artificial Intelligence , Vocabulary, Controlled
6.
Int J Mass Spectrom ; 259(1-3): 147-160, 2007 Jan 01.
Article in English | MEDLINE | ID: mdl-17375895

ABSTRACT

Complete and accurate profiling of cellular organelle proteomes, while challenging, is important for the understanding of detailed cellular processes at the organelle level. Mass spectrometry technologies coupled with bioinformatics analysis provide an effective approach for protein identification and functional interpretation of organelle proteomes. In this study, we have compiled human organelle reference datasets from large-scale proteomic studies and protein databases for 7 lysosome-related organelles (LROs), as well as the endoplasmic reticulum and mitochondria, for comparative organelle proteome analysis. Heterogeneous sources of human organelle proteins and rodent homologs are mapped to human UniProtKB protein entries based on ID and/or peptide mappings, followed by functional annotation and categorization using the iProXpress proteomic expression analysis system. Cataloging organelle proteomes allows close examination of both shared and unique proteins among various LROs and reveals their functional relevance. The proteomic comparisons show that LROs are a closely related family of organelles. The shared proteins indicate the dynamic and hybrid nature of LROs, while the unique transmembrane proteins may represent additional candidate marker proteins for LROs. This comparative analysis, therefore, provides a basis for hypothesis formulation and experimental validation of organelle proteins and their functional roles.

7.
J Am Med Inform Assoc ; 13(5): 497-507, 2006.
Article in English | MEDLINE | ID: mdl-16799122

ABSTRACT

OBJECTIVE: Natural language processing (NLP) approaches have been explored to manage and mine information recorded in biological literature. A critical step for biological literature mining is biological named entity tagging (BNET) that identifies names mentioned in text and normalizes them with entries in biological databases. The aim of this study was to provide quantitative assessment of the complexity of BNET on protein entities through BioThesaurus, a thesaurus of gene/protein names for UniProt knowledgebase (UniProtKB) entries that was acquired using online resources. METHODS: We evaluated the complexity through several perspectives: ambiguity (i.e., the number of genes/proteins represented by one name), synonymy (i.e., the number of names associated with the same gene/protein), and coverage (i.e., the percentage of gene/protein names in text included in the thesaurus). We also normalized names in BioThesaurus and measures were obtained twice, once before normalization and once after. RESULTS: The current version of BioThesaurus has over 2.6 million names or 2.1 million normalized names covering more than 1.8 million UniProtKB entries. The average synonymy is 3.53 (2.86 after normalization), ambiguity is 2.31 before normalization and 2.32 after, while the coverage is 94.0% based on the BioCreAtive data set comprising MEDLINE abstracts containing genes/proteins. CONCLUSION: The study indicated that names for genes/proteins are highly ambiguous and there are usually multiple names for the same gene or protein. It also demonstrated that most gene/protein names appearing in text can be found in BioThesaurus.
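The two averages defined in the METHODS can be computed from a table of (name, entry) pairs. The sketch below follows the definitions given in the abstract (synonymy as names per gene/protein entry, ambiguity as entries per name); the function name and the toy identifiers `E1` and `E2` are invented for illustration, and the study's exact counting conventions are not known from the abstract.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def synonymy_and_ambiguity(pairs: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (average names per entry, average entries per name)."""
    names_by_entry: Dict[str, Set[str]] = defaultdict(set)
    entries_by_name: Dict[str, Set[str]] = defaultdict(set)
    for name, entry in pairs:
        names_by_entry[entry].add(name)
        entries_by_name[name].add(entry)
    if not names_by_entry:
        raise ValueError("no (name, entry) pairs supplied")
    synonymy = sum(len(v) for v in names_by_entry.values()) / len(names_by_entry)
    ambiguity = sum(len(v) for v in entries_by_name.values()) / len(entries_by_name)
    return synonymy, ambiguity


if __name__ == "__main__":
    # Toy data: "E1" and "E2" are made-up entry identifiers.
    toy = [("p53", "E1"), ("TP53", "E1"), ("p53", "E2"), ("Trp53", "E2")]
    print(synonymy_and_ambiguity(toy))
```

Running the same function before and after a name-normalization step (for example, lowercasing and removing punctuation) would give the paired measurements the abstract reports.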


Subjects
Natural Language Processing , Proteins , Vocabulary, Controlled , Dictionaries as Topic , Genes , Names
8.
Nucleic Acids Res ; 30(1): 35-7, 2002 Jan 01.
Article in English | MEDLINE | ID: mdl-11752247

ABSTRACT

The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve protein annotation and the coverage of experimentally validated data, a bibliography submission system is developed for scientists to submit, categorize and retrieve literature information. Comprehensive protein information is available from iProClass, which includes family classification at the superfamily, domain and motif levels, structural and functional features of proteins, as well as cross-references to over 40 biological databases. To provide timely and comprehensive protein data with source attribution, we have introduced a non-redundant reference protein database, PIR-NREF. The database consists of about 800 000 proteins collected from PIR-PSD, SWISS-PROT, TrEMBL, GenPept, RefSeq and PDB, with composite protein names and literature data. To promote database interoperability, we provide XML data distribution and open database schema, and adopt common ontologies. The PIR web site (http://pir.georgetown.edu/) features data mining and sequence analysis tools for information retrieval and functional identification of proteins based on both sequence and annotation information. The PIR databases and other files are also available by FTP (ftp://nbrfa.georgetown.edu/pir_databases).


Subjects
Databases, Protein , Amino Acid Sequence , Animals , Humans , Information Storage and Retrieval , International Agencies , Internet , Proteins/classification , Proteins/genetics , Systems Integration
9.
Nucleic Acids Res ; 32(Database issue): D112-4, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14681371

ABSTRACT

The Protein Information Resource (PIR) is an integrated public resource of protein informatics. To facilitate the sensible propagation and standardization of protein annotation and the systematic detection of annotation errors, PIR has extended its superfamily concept and developed the SuperFamily (PIRSF) classification system. Based on the evolutionary relationships of whole proteins, this classification system allows annotation of both specific biological and generic biochemical functions. The system adopts a network structure for protein classification from superfamily to subfamily levels. Protein family members are homologous (sharing common ancestry) and homeomorphic (sharing full-length sequence similarity with common domain architecture). The PIRSF database consists of two data sets, preliminary clusters and curated families. The curated families include family name, protein membership, parent-child relationship, domain architecture, and optional description and bibliography. PIRSF is accessible from the website at http://pir.georgetown.edu/pirsf/ for report retrieval and sequence classification. The report presents family annotation, membership statistics, cross-references to other databases, graphical display of domain architecture, and links to multiple sequence alignments and phylogenetic trees for curated families. PIRSF can be utilized to analyze phylogenetic profiles, to reveal functional convergence and divergence, and to identify interesting relationships between homeomorphic families, domains and structural classes.


Subjects
Computational Biology , Databases, Protein , Proteins/chemistry , Proteins/classification , Amino Acid Motifs , Animals , Evolution, Molecular , Humans , Information Storage and Retrieval , Internet , Protein Structure, Tertiary
10.
Beijing Da Xue Xue Bao Yi Xue Ban ; 38(2): 218-21, 2006 Apr 18.
Article in Chinese | MEDLINE | ID: mdl-16617371

ABSTRACT

A critical factor in the advancement of biomedical research is the ease with which data can be integrated, redistributed and analyzed both within and across domains. This paper summarizes the biomedical information core infrastructure built by the U.S. National Cancer Institute Center for Bioinformatics (NCICB). The main product of the core infrastructure is caCORE (cancer Common Ontologic Reference Environment), the infrastructure backbone supporting data management and application development at NCICB. The paper explains the structure and function of caCORE: (1) Enterprise Vocabulary Services (EVS), which provide controlled vocabulary, dictionary and thesaurus services and produce the NCI Thesaurus and the NCI Metathesaurus; (2) the Cancer Data Standards Repository (caDSR), which provides a metadata registry for common data elements; and (3) Cancer Bioinformatics Infrastructure Objects (caBIO), which provide Java, Simple Object Access Protocol and HTTP-XML application programming interfaces. The vision for caCORE is to provide a common data management framework that will support the consistency, clarity, and comparability of biomedical research data and information. In addition to providing facilities for data management and redistribution, caCORE helps solve problems of data integration. All NCICB-developed caCORE components are distributed under open-source licenses that support unrestricted usage by both non-profit and commercial entities, and caCORE has laid the foundation for a number of scientific and clinical applications. The paper then briefly describes caCORE-based applications in several NCI projects, among them CMAP (Cancer Molecular Analysis Project) and caBIG (Cancer Biomedical Informatics Grid). Finally, the paper discusses the outlook for caCORE: although caCORE was born out of the needs of the cancer research community, it is intended to serve as a general resource.
Cancer research has historically contributed to many areas beyond tumor biology. The paper also offers some suggestions for current biomedical informatics research in China.


Subjects
Computational Biology , Database Management Systems , Information Storage and Retrieval , Medical Informatics , National Cancer Institute (U.S.) , Software , United States
11.
BMC Bioinformatics ; 6: 201, 2005 Aug 09.
Article in English | MEDLINE | ID: mdl-16091147

ABSTRACT

BACKGROUND: A large volume of data and information about genes and gene products has been stored in various molecular biology databases. A major challenge for knowledge discovery using these databases is to identify related genes and gene products in disparate databases. The development of Gene Ontology (GO) as a common vocabulary for annotation allows integrated queries across multiple databases and identification of semantically related genes and gene products (i.e., genes and gene products that have similar GO annotations). Meanwhile, dozens of tools have been developed for browsing, mining or editing GO terms, their hierarchical relationships, or their "associated" genes and gene products (i.e., genes and gene products annotated with GO terms). Tools that allow users to directly search and inspect relations among all GO terms and their associated genes and gene products from multiple databases are needed. RESULTS: We present a standalone package called DynGO, which provides several advanced functionalities in addition to the standard browsing capability of the official GO browsing tool (AmiGO). DynGO allows users to conduct batch retrieval of GO annotations for a list of genes and gene products, and semantic retrieval of genes and gene products sharing similar GO annotations. The results are shown in an association tree organized according to GO hierarchies and supported with many dynamic display options such as sorting tree nodes or changing orientation of the tree. For GO curators and frequent GO users, DynGO provides fast and convenient access to GO annotation data. DynGO is generally applicable to any data set where the records are annotated with GO terms, as illustrated by two examples. CONCLUSION: We have presented a standalone package DynGO that provides functionalities to search and browse GO and its association databases as well as several additional functions such as batch retrieval and semantic retrieval.
The complete documentation and software are freely available for download from the website http://biocreative.ifsm.umbc.edu/dyngo.


Subjects
Data Display , Databases, Genetic , Information Storage and Retrieval/methods , Software , User-Computer Interface , Computer Graphics , Documentation/methods , Genes , Semantics , Vocabulary
12.
Beijing Da Xue Xue Bao Yi Xue Ban ; 37(4): 445-7, 2005 Aug 18.
Article in Chinese | MEDLINE | ID: mdl-16086073

ABSTRACT

The National Institutes of Health (NIH) released the NIH Roadmap Initiatives, a biomedical research project with three themes: new pathways to discovery, research teams of the future, and re-engineering the clinical research enterprise. The purpose of the project is to catalyze the transformation of new scientific knowledge into tangible benefits for people. Most of the project has now begun to be put into practice.


Subjects
Biomedical Research/trends , Health Promotion/methods , Forecasting , Humans , National Institutes of Health (U.S.) , Organizational Objectives , Research Support as Topic , United States
13.
Endocrinology ; 143(6): 2139-42, 2002 Jun.
Article in English | MEDLINE | ID: mdl-12021177

ABSTRACT

Transcription of the prolactin receptor (PRLR) is under the control of multiple promoters. Following the recent demonstration of the human non-coding exon 1, hE1(N) (hE1(N1)) and the generic exon 1 hE1(3), we have identified their promoters and characterized four other novel human exons 1 (hE1(N2-5)) that are alternatively spliced to a common non-coding exon 2 in human tissues and breast cancer cells. Genomic regions containing these exons, and 5'-flanking and intronic sequences, were determined and their order was established on chromosome 5p14-13. Promoters utilized in the transcription of the previously characterized PRLR exon 1 species hE1(3) (hPIII) and hE1(N1) (hP(N1)) were found to employ distinct mechanisms for controlling hPRLR transcription. hPIII requires C/EBP beta and Sp1/Sp3 for basal transcriptional activity, while hP(N1) activity is conferred by domains containing an Ets element and an NR half-site. The complex promoter control system that governs transcription of the hPRLR in multiple tissues is of relevance for studies on the regulation of PRLR expression in physiological and pathological states.


Subjects
5' Untranslated Regions/genetics , Exons/genetics , Promoter Regions, Genetic/genetics , Receptors, Prolactin/genetics , Base Sequence , Cloning, Molecular , Electrophoresis , Humans , Molecular Sequence Data , Nucleic Acid Amplification Techniques , Plasmids/genetics , RNA, Messenger/biosynthesis , RNA, Messenger/genetics , Reverse Transcriptase Polymerase Chain Reaction , Terminology as Topic , Transcription, Genetic/genetics
14.
J Steroid Biochem Mol Biol ; 82(2-3): 263-8, 2002 Oct.
Article in English | MEDLINE | ID: mdl-12477494

ABSTRACT

Human prolactin receptor (hPRLR) expression is regulated by estradiol-17beta (E(2)) in vivo in animal tissues, and in vitro in normal human endometrial cells and in MCF7 human breast cancer cells. The objective of this study was to determine the effect of E(2) on the expression of two recently described hPRLR isoforms with distinct exons-1, hE1(3) and hE1(N1), that are transcribed from the generic hPIII promoter, also present in the rat and mouse, and the human-specific promoter hP(N1), respectively, and to determine the effect of estradiol on hPIII promoter activity in cancer cells. T47D breast cancer cells were examined using quantitative competitive RT-PCR for the level of expression of two alternative non-coding exon-1 transcripts, hE1(3) and hE1(N1), following incubation with E(2) in the presence or absence of the E(2) receptor antagonist ICI 182,780. The effects of estradiol were also evaluated in cells transiently transfected with constructs of hPIII promoter luciferase reporter gene. E(2) significantly increased the expression of both hPRLR mRNA transcripts, hE1(3) and hE1(N1). In transfection studies E(2) activated the hPIII promoter. This effect of estradiol was markedly inhibited by coincubation with the E(2) receptor antagonist. Our results demonstrate a stimulatory effect of estradiol on the expression of hPRLR mRNA species with alternative exons-1, hE1(3) and hE1(N1), possibly through activation of their corresponding promoters. The lack of a formal ERE in these promoters suggested that the effect of estradiol is mediated through association of the activated ER with relevant DNA binding transfactor(s). These findings support the role of E(2) in the regulation of hPRLR expression in human breast cancer cell lines.


Subjects
Alternative Splicing , Estradiol/metabolism , Exons , Protein Isoforms/metabolism , Receptors, Prolactin/metabolism , Animals , Breast Neoplasms , Female , Gene Expression Regulation, Neoplastic , Genes, Reporter , Humans , Promoter Regions, Genetic , Protein Isoforms/genetics , Receptors, Estrogen/genetics , Receptors, Estrogen/metabolism , Receptors, Prolactin/genetics , Tumor Cells, Cultured
15.
Comput Biol Chem ; 28(5-6): 409-16, 2004 Dec.
Article in English | MEDLINE | ID: mdl-15556482

ABSTRACT

The exponential growth of large-scale molecular sequence data and of the PubMed scientific literature has prompted active research in biological literature mining and information extraction to facilitate genome/proteome annotation and improve the quality of biological databases. Motivated by the promise of text mining methodologies, but at the same time, the lack of adequate curated data for training and benchmarking, the Protein Information Resource (PIR) has developed a resource for protein literature mining--iProLINK (integrated Protein Literature INformation and Knowledge). As PIR focuses its effort on the curation of the UniProt protein sequence database, the goal of iProLINK is to provide curated data sources that can be utilized for text mining research in the areas of bibliography mapping, annotation extraction, protein named entity recognition, and protein ontology development. The data sources for bibliography mapping and annotation extraction include mapped citations (PubMed ID to protein entry and feature line mapping) and annotation-tagged literature corpora. The latter includes several hundred abstracts and full-text articles tagged with experimentally validated post-translational modifications (PTMs) annotated in the PIR protein sequence database. The data sources for entity recognition and ontology development include a protein name dictionary, word token dictionaries, protein name-tagged literature corpora along with tagging guidelines, as well as a protein ontology based on PIRSF protein family names. iProLINK is freely accessible at http://pir.georgetown.edu/iprolink, with hypertext links for all downloadable files.


Subjects
Databases, Protein , Information Services , Proteins/chemistry , Computational Biology , Databases, Bibliographic , Internet , Proteins/classification , Proteins/genetics , PubMed , Systems Integration
16.
Methods Mol Biol ; 719: 547-71, 2011.
Article in English | MEDLINE | ID: mdl-21370102

ABSTRACT

Genomic, proteomic, and other omic-based approaches are now broadly used in biomedical research to facilitate the understanding of disease mechanisms and identification of molecular targets and biomarkers for therapeutic and diagnostic development. While the Omics technologies and bioinformatics tools for analyzing Omics data are rapidly advancing, the functional analysis and interpretation of the data remain challenging due to the inherent nature of the generally long workflows of Omics experiments. We adopt a strategy that emphasizes the use of curated knowledge resources coupled with expert-guided examination and interpretation of Omics data for the selection of potential molecular targets. We describe a downstream workflow and procedures for functional analysis that focus on biological pathways, from which molecular targets can be derived and proposed for experimental validation.


Subjects
Computational Biology/methods , Animals , Biomarkers/metabolism , Data Interpretation, Statistical , Data Mining , Humans , Information Management , Literature, Modern , Mice , Molecular Sequence Annotation , Phenotype , Protein Interaction Mapping , Rats , Software
17.
PLoS One ; 6(6): e20410, 2011.
Article in English | MEDLINE | ID: mdl-21738574

ABSTRACT

BACKGROUND: Estrogen is a known growth promoter for estrogen receptor (ER)-positive breast cancer cells. Paradoxically, in breast cancer cells that have been chronically deprived of estrogen stimulation, re-introduction of the hormone can induce apoptosis. METHODOLOGY/PRINCIPAL FINDINGS: Here, we sought to identify signaling networks that are triggered by estradiol (E2) in isogenic MCF-7 breast cancer cells that undergo apoptosis (MCF-7:5C) versus cells that proliferate upon exposure to E2 (MCF-7). The nuclear receptor co-activator AIB1 (Amplified in Breast Cancer-1) is known to be rate-limiting for E2-induced cell survival responses in MCF-7 cells and was found here to also be required for the induction of apoptosis by E2 in the MCF-7:5C cells. Proteins that interact with AIB1 as well as complexes that contain tyrosine phosphorylated proteins were isolated by immunoprecipitation and identified by mass spectrometry (MS) at baseline and after a brief exposure to E2 for two hours. Bioinformatic network analyses of the identified protein interactions were then used to analyze E2 signaling pathways that trigger apoptosis versus survival. Comparison of MS data with a computationally-predicted AIB1 interaction network showed that 26 proteins identified in this study are within this network, and are involved in signal transduction, transcription, cell cycle regulation and protein degradation. CONCLUSIONS: G-protein-coupled receptors, PI3 kinase, Wnt and Notch signaling pathways were most strongly associated with E2-induced proliferation or apoptosis and are integrated here into a global AIB1 signaling network that controls qualitatively distinct responses to estrogen.


Subjects
Apoptosis/drug effects , Breast Neoplasms/metabolism , Estradiol/pharmacology , Proteomics/methods , Apoptosis/genetics , Female , Humans , Immunoprecipitation , Mass Spectrometry , Nuclear Receptor Coactivator 3/genetics , Nuclear Receptor Coactivator 3/metabolism , Phosphatidylinositol 3-Kinases/genetics , Phosphatidylinositol 3-Kinases/metabolism , Protein Binding , Receptors, G-Protein-Coupled/genetics , Receptors, G-Protein-Coupled/metabolism , Receptors, Notch/genetics , Receptors, Notch/metabolism , Signal Transduction , Tumor Cells, Cultured , Wnt Proteins/genetics , Wnt Proteins/metabolism
18.
AMIA Annu Symp Proc ; 2009: 640-4, 2009 Nov 14.
Article in English | MEDLINE | ID: mdl-20351933

ABSTRACT

Glycosylation is a common and complex protein post-translational modification (PTM). In particular, mucin-type O-linked glycosylation is abundant and serves important biological functions. The number of determined glycosylation sites is still small, and there remains a need for accurate computational prediction to support the annotation and functional understanding of proteins. PTM site prediction can be formulated as a machine learning task. An important step in applying machine learning to this task is encoding protein fragments as feature vectors. Here we assess existing encoding methods as well as an enhanced encoding method named composition of monomer spectrum (CMS) using support vector machines (SVMs). SVMs employing the existing encoding methods achieved an AUC (area under ROC curve) of 90.3-91.3%, and those employing CMS achieved an AUC of 92.4%. Analysis of the different encoding methods suggests potential for further improving the prediction.
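
The encoding step mentioned above can be illustrated with a minimal Python sketch. The abstract does not define CMS or name the existing encodings it evaluates, so this example shows only two generic, widely used schemes (a position-specific binary encoding and a position-independent amino acid composition) and is not the authors' method. The function names, the padding symbol `X`, and the flank size are illustrative assumptions.

```python
from collections import Counter

# The 20 standard amino acids; any other symbol (e.g. the pad "X") encodes as zeros.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def extract_window(sequence, site, flank=3, pad="X"):
    """Return the fragment of length 2*flank+1 centred on 0-based index `site`.

    Positions that fall outside the sequence are filled with `pad`
    (an assumed convention, not taken from the abstract).
    """
    padded = pad * flank + sequence + pad * flank
    return padded[site : site + 2 * flank + 1]


def one_hot_encode(fragment):
    """Position-specific binary encoding: 20 values per residue."""
    vector = []
    for residue in fragment:
        vector.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vector


def composition_encode(fragment):
    """Position-independent encoding: fraction of each amino acid in the fragment."""
    counts = Counter(fragment)
    return [counts[aa] / len(fragment) for aa in AMINO_ACIDS]


# Hypothetical toy sequence with a candidate serine at index 3.
fragment = extract_window("MKTSPLA", site=3, flank=3)
binary_features = one_hot_encode(fragment)           # 7 residues x 20 = 140 values
composition_features = composition_encode(fragment)  # 20 values
```

Either vector could then be passed to an SVM implementation of choice. The binary encoding preserves the position of each residue relative to the candidate site, while the composition encoding discards position and keeps only residue frequencies.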


Subjects
Artificial Intelligence , Computational Biology/methods , Glycosylation , Mucins/metabolism , Algorithms , Area Under Curve , Binding Sites , Protein Processing, Post-Translational
19.
J Proteomics Bioinform ; 1(2): 47-60, 2008 May.
Article in English | MEDLINE | ID: mdl-19088860

ABSTRACT

Functional analysis and interpretation of large-scale proteomics and gene expression data require effective use of bioinformatics tools and public knowledge resources coupled with expert-guided examination. An integrated bioinformatics approach was used to analyze cellular pathways in response to ionizing radiation. ATM (ataxia-telangiectasia mutated), a serine-threonine protein kinase, plays critical roles in radiation responses, including cell cycle arrest and DNA repair. We analyzed radiation-responsive pathways based on 2D-gel/MS proteomics and microarray gene expression data from fibroblasts expressing the wild-type or mutant ATM gene. The analysis showed that metabolism was significantly affected by radiation in an ATM-dependent manner. In particular, purine metabolic pathways were differentially changed in the two cell lines. The expression of ribonucleoside-diphosphate reductase subunit M2 (RRM2) was increased in ATM-wild-type cells at both mRNA and protein levels, but no changes were detected in ATM-mutated cells. Increased expression of p53 was observed 30 min after irradiation of the ATM-wild-type cells. These results suggest that RRM2 is a downstream target of the ATM-p53 pathway that mediates radiation-induced DNA repair. We demonstrated that the integrated bioinformatics approach facilitated pathway analysis, hypothesis generation and target gene/protein identification.

20.
Virus Genes ; 35(2): 175-86, 2007 Oct.
Article in English | MEDLINE | ID: mdl-17508277

ABSTRACT

We have identified 72 completely conserved amino acid residues in the E protein of major groups of the Flavivirus genus by computational analyses. In the dengue species we have identified 12 highly conserved sequence regions, 186 negatively selected sites, and many dengue serotype-specific negatively selected sites. The flavivirus-conserved sites included residues involved in forming six disulfide bonds crucial for the structural integrity of the protein, the fusion motif involved in viral infectivity, and the interface residues of the oligomers. The structural analysis of the E protein showed 19 surface-exposed non-conserved residues, 128 dimer or trimer interface residues, and regions that undergo major conformational change during trimerization. Eleven consensus T(h)-cell epitopes common to all four dengue serotypes were predicted. Most of these corresponded to dengue-conserved regions or negatively selected sites. Of special interest are six singular sites (N(37), Q(211), D(215), P(217), H(244), K(246)) in dengue E protein that are conserved, are part of the predicted consensus T(h)-cell epitopes and are exposed in the dimer or trimer. We propose these sites and the corresponding epitopic regions as potential candidates for prioritization by experimental biologists for the development of diagnostics and vaccines that may be difficult to circumvent by natural or man-made alteration of dengue virus.
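
The abstract does not describe how the conserved residues were computed. As a minimal sketch of what "completely conserved" can mean for a multiple sequence alignment, the following Python function reports alignment columns in which every sequence carries the same residue and none has a gap. The function name, the gap symbol `-`, and the toy sequences are illustrative assumptions, not data from the study.

```python
def conserved_columns(aligned_sequences, gap="-"):
    """Return 0-based column indices where all sequences share one residue.

    Assumes the input is already aligned (equal-length strings); columns
    containing a gap are never reported as conserved.
    """
    if not aligned_sequences:
        return []
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all aligned sequences must have the same length")

    conserved = []
    for col in range(length):
        residues = {seq[col] for seq in aligned_sequences}
        if len(residues) == 1 and gap not in residues:
            conserved.append(col)
    return conserved


# Hypothetical toy alignment (not real E-protein sequences).
toy_alignment = ["MRCIG", "MRCVG", "MKCIG"]
print(conserved_columns(toy_alignment))  # columns 0, 2 and 4 are identical
```

Reporting column indices rather than residues keeps the result tied to alignment coordinates, which would still need to be mapped back to positions in a reference sequence.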


Subjects
Amino Acids/genetics , Computational Biology , Dengue Vaccines/immunology , Dengue Virus/genetics , Dengue/diagnosis , Dengue/virology , Sequence Analysis, Protein , Viral Envelope Proteins/genetics , Amino Acid Sequence , Amino Acids/physiology , Conserved Sequence , Dengue/prevention & control , Dengue Vaccines/administration & dosage , Dengue Vaccines/genetics , Dengue Virus/immunology , Dengue Virus/physiology , Gene Targeting , Molecular Sequence Data , Sequence Alignment , Viral Envelope Proteins/administration & dosage , Viral Envelope Proteins/chemistry