1.
Int J Data Min Bioinform; 7(4): 450-62, 2013.
Article in English | MEDLINE | ID: mdl-23798227

ABSTRACT

Information on Protein Interactions (PIs) is valuable for biomedical research, but often lies buried in the scientific literature and cannot be readily retrieved. While much progress has been made over the years in extracting PIs from the literature using computational methods, there is a lack of free, public, user-friendly tools for the discovery of PIs. We developed an online tool for the extraction of PI relationships from PubMed abstracts, which we name PIMiner. Protein pairs and the words that describe their interactions are reported by PIMiner so that new interactions can be easily detected within text. The interaction likelihood levels are reported too. The option to extract only specific types of interactions is also provided. The PIMiner server can be accessed through a web browser or remotely through a client's command line. PIMiner can process 50,000 PubMed abstracts in approximately 7 min and thus appears suitable for large-scale processing of biological/biomedical literature.


Subjects
Protein Interaction Mapping/methods; Proteins/chemistry; Software; Binding Sites; Information Storage and Retrieval; Internet; Proteins/metabolism; PubMed
2.
PLoS One; 7(4): e34480, 2012.
Article in English | MEDLINE | ID: mdl-22493694

ABSTRACT

BACKGROUND: Protein interaction networks (PINs) specific within a particular context contain crucial information regarding many cellular biological processes. For example, PINs may include information on the type and directionality of interaction (e.g. phosphorylation), location of interaction (i.e. tissues, cells), and related diseases. Currently, very few tools are capable of deriving context-specific PINs for conducting exploratory analysis. RESULTS: We developed a literature-based online system, Context-specific Protein Network Miner (CPNM), which derives context-specific PINs in real-time from the PubMed database based on a set of user-input keywords and enhanced PubMed query system. CPNM reports enriched information on protein interactions (with type and directionality), their network topology with summary statistics (e.g. most densely connected proteins in the network; most densely connected protein-pairs; and proteins connected by most inbound/outbound links) that can be explored via a user-friendly interface. Some of the novel features of the CPNM system include PIN generation, ontology-based PubMed query enhancement, real-time, user-queried, up-to-date PubMed document processing, and prediction of PIN directionality. CONCLUSIONS: CPNM provides a tool for biologists to explore PINs. It is freely accessible at http://www.biotextminer.com/CPNM/.


Subjects
Data Mining/methods; Protein Interaction Mapping/methods; Protein Interaction Maps; Proteins/metabolism; Software; Algorithms; Databases, Genetic; Humans; Internet; Proteins/genetics; PubMed; User-Computer Interface
3.
Am J Respir Cell Mol Biol; 47(1): 112-9, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22383585

ABSTRACT

Many genes have been implicated in the pathogenesis of common respiratory and related diseases (RRDs), yet the underlying mechanisms are largely unknown. Differential gene expression patterns in diseased and healthy individuals suggest that RRDs affect or are affected by modified transcription regulation programs. It is thus crucial to characterize implicated genes in terms of transcriptional regulation. For this purpose, we conducted a promoter analysis of genes associated with 11 common RRDs including allergic rhinitis, asthma, bronchiectasis, bronchiolitis, bronchitis, chronic obstructive pulmonary disease, cystic fibrosis, emphysema, eczema, psoriasis, and urticaria, many of which are thought to be genetically related. The objective of the present study was to obtain deeper insight into the transcriptional regulation of these disease-associated genes by annotating their promoter regions with transcription factors (TFs) and TF binding sites (TFBSs). We discovered many TFs that are significantly enriched in the target disease groups including associations that have been documented in the literature. We also identified a number of putative TFs/TFBSs that appear to be novel. The results of our analysis are provided in an online database that is freely accessible to researchers at http://www.respiratorygenomics.com. Promoter-associated TFBS information and related genomic features, such as histone modification sites, microsatellites, CpG islands, and SNPs, are graphically summarized in the database. Users can compare and contrast underlying mechanisms of specific RRDs relative to candidate genes, TFs, gene ontology terms, micro-RNAs, and biological pathways for the conduct of metaanalyses. This database represents a novel, useful resource for RRD researchers.


Subjects
Databases, Genetic; Molecular Sequence Annotation; Promoter Regions, Genetic; Respiratory Tract Diseases/genetics; Gene Expression Regulation; Genomics/methods; Humans; Transcription Factors/genetics
4.
Article in English | MEDLINE | ID: mdl-23367189

ABSTRACT

Electronic Health Records (EHR) contain large amounts of useful information that could potentially be used for building models for predicting onset of diseases. In this study, we have investigated the use of free-text and coded data in Marshfield Clinic's EHR, individually and in combination, for building machine learning based models to predict the first ever episode of atrial fibrillation and/or atrial flutter (AFF). We trained and evaluated our AFF models on the EHR data across different time intervals (1, 3, 5 and all years) prior to first documented onset of AFF. We applied several machine learning methods, including naïve Bayes, support vector machines (SVM), logistic regression and random forests, for building AFF prediction models and evaluated these using a 10-fold cross-validation approach. On text-based datasets, the best model achieved an F-measure of 60.1%, when applied exclusively to coded data. The combination of textual and coded data achieved comparable performance. The study results attest to the relative merit of utilizing textual data to complement the use of coded data for disease onset prediction modeling.
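For readers unfamiliar with the F-measure reported in this abstract, the following is a minimal sketch of the standard definition (the weighted harmonic mean of precision and recall). The function name `f_measure` and the example inputs are illustrative assumptions; the abstract does not state the precision and recall behind its 60.1% figure, nor whether a weighting other than the usual beta = 1 was used.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1).

    Hypothetical helper for illustration; not taken from the study.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denom = beta ** 2 * precision + recall
    if denom == 0:
        # Both values are zero: define the F-measure as 0 to avoid division by zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom


# Illustrative inputs only (not values from the abstract).
example = f_measure(0.5, 0.75)
```

Because the harmonic mean is dominated by the smaller of the two values, a model cannot reach a high F-measure by maximizing recall alone or precision alone.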


Subjects
Atrial Fibrillation/diagnosis; Atrial Flutter/diagnosis; Electronic Health Records; Humans
5.
Nucleic Acids Res; 35(Database issue): D732-6, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17090589

ABSTRACT

Estrogen has a profound impact on human physiology affecting transcription of numerous genes. To decipher functional characteristics of estrogen responsive genes, we developed KnowledgeBase for Estrogen Responsive Genes (KBERG). Genes in KBERG were derived from Estrogen Responsive Gene Database (ERGDB) and were analyzed from multiple aspects. We explored the possible transcription regulation mechanism by capturing highly conserved promoter motifs across orthologous genes, using promoter regions that cover the range of [-1200, +500] relative to the transcription start sites. The motif detection is based on ab initio discovery of common cis-elements from the orthologous gene cluster from human, mouse and rat, thus reflecting a degree of promoter sequence preservation during evolution. The identified motifs are linked to transcription factor binding sites based on the TRANSFAC database. In addition, KBERG uses two established ontology systems, GO and eVOC, to associate genes with their function. Users may assess gene functionality through the description terms in GO. Alternatively, they can gain gene co-expression information through evidence from human EST libraries via eVOC. KBERG is a user-friendly system that provides links to other relevant resources such as ERGDB, UniGene, Entrez Gene, HomoloGene, GO, eVOC and GenBank, and thus offers a platform for functional exploration and potential annotation of genes responsive to estrogen. KBERG database can be accessed at http://research.i2r.a-star.edu.sg/kberg.
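The promoter range [-1200, +500] relative to the transcription start site can be made concrete with a small coordinate sketch. This is an illustration under stated assumptions, not KBERG's code: the function name `promoter_window` is hypothetical, coordinates are taken to be 1-based and inclusive, the TSS is treated as offset 0, and the window is clipped at the start of the chromosome.

```python
from typing import Tuple


def promoter_window(tss: int, strand: str,
                    upstream: int = 1200, downstream: int = 500) -> Tuple[int, int]:
    """Return (start, end) genomic coordinates of the promoter window.

    Assumptions (not from the abstract): 1-based inclusive coordinates,
    TSS counted as offset 0, start clipped to position 1.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the reverse strand, "upstream" lies at higher genomic coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(1, start), end


forward = promoter_window(10_000, "+")
reverse = promoter_window(10_000, "-")
```

The strand check matters because a window computed as if every gene were on the forward strand would capture the gene body rather than the promoter for reverse-strand genes.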


Subjects
Databases, Nucleic Acid; Estrogens/physiology; Promoter Regions, Genetic; Transcription Factors/metabolism; Animals; Base Sequence; Binding Sites; Conserved Sequence; Humans; Internet; Mice; Rats; Sequence Analysis, DNA; Transcription, Genetic; User-Computer Interface
6.
Genome Biol; 7 Suppl 1: S3.1-13, 2006.
Article in English | MEDLINE | ID: mdl-16925837

ABSTRACT

BACKGROUND: This study analyzes the predictions of a number of promoter predictors on the ENCODE regions of the human genome as part of the ENCODE Genome Annotation Assessment Project (EGASP). The systems analyzed operate on various principles and we assessed the effectiveness of different conceptual strategies used to correlate produced promoter predictions with the manually annotated 5' gene ends. RESULTS: The predictions were assessed relative to the manual HAVANA annotation of the 5' gene ends. These 5' gene ends were used as the estimated reference transcription start sites. With the maximum allowed distance for predictions of 1,000 nucleotides from the reference transcription start sites, the sensitivity of predictors was in the range 32% to 56%, while the positive predictive value was in the range 79% to 93%. The average distance mismatch of predictions from the reference transcription start sites was in the range 259 to 305 nucleotides. At the same time, using transcription start site estimates from DBTSS and H-Invitational databases as promoter predictions, we obtained a sensitivity of 58%, a positive predictive value of 92%, and an average distance from the annotated transcription start sites of 117 nucleotides. In this experiment, the best performing promoter predictors were those that combined promoter prediction with gene prediction. The main reason for this is the reduced promoter search space that resulted in smaller numbers of false positive predictions. CONCLUSION: The main finding, now supported by comprehensive data, is that the accuracy of human promoter predictors for high-throughput annotation purposes can be significantly improved if promoter prediction is combined with gene prediction. Based on the lessons learned in this experiment, we propose a framework for the preparation of the next similar promoter prediction assessment.
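The distance-thresholded evaluation described in this abstract can be sketched as follows. This is a simplified illustration, not the EGASP scoring code: the abstract does not spell out the exact counting rules, so the sketch assumes that a reference TSS counts as detected if any prediction lies within `max_dist` nucleotides of it, and that a prediction counts as a true positive if it lies within `max_dist` of any reference TSS. The function name and the example positions are hypothetical, and a single chromosome and strand are assumed.

```python
from typing import Sequence, Tuple


def evaluate_predictions(predicted: Sequence[int], reference: Sequence[int],
                         max_dist: int = 1000) -> Tuple[float, float]:
    """Return (sensitivity, positive predictive value) under a distance threshold."""

    def near(pos: int, others: Sequence[int]) -> bool:
        # True if any position in `others` lies within max_dist of `pos`.
        return any(abs(pos - o) <= max_dist for o in others)

    detected_refs = sum(1 for r in reference if near(r, predicted))
    true_pos_preds = sum(1 for p in predicted if near(p, reference))

    sensitivity = detected_refs / len(reference) if reference else 0.0
    ppv = true_pos_preds / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Made-up positions for illustration only.
sens, ppv = evaluate_predictions([100, 5000, 20000], [600, 5200, 9000, 40000])
```

A predictor that emits few, well-placed predictions tends to score a high positive predictive value with modest sensitivity, which is the pattern the abstract reports (32% to 56% sensitivity against 79% to 93% positive predictive value).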


Subjects
Computational Biology/methods; Genome, Human; Genomics/methods; Promoter Regions, Genetic; Computational Biology/standards; Databases, Genetic; Genes; Genomics/standards; Humans; RNA, Messenger/analysis; Sequence Analysis, DNA; Sequence Analysis, RNA
7.
PLoS Genet; 2(4): e23, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16683022

ABSTRACT

The mammalian transcriptome harbours shadowy entities that resist classification and analysis. In analogy with pseudogenes, we define pseudo-messenger RNA to be RNA molecules that resemble protein-coding mRNA, but cannot encode full-length proteins owing to disruptions of the reading frame. Using a rigorous computational pipeline, which rules out sequencing errors, we identify 10,679 pseudo-messenger RNAs (approximately half of which are transposon-associated) among the 102,801 FANTOM3 mouse cDNAs: just over 10% of the FANTOM3 transcriptome. These comprise not only transcribed pseudogenes, but also disrupted splice variants of otherwise protein-coding genes. Some may encode truncated proteins, only a minority of which appear subject to nonsense-mediated decay. The presence of an excess of transcripts whose only disruptions are opal stop codons suggests that there are more selenoproteins than currently estimated. We also describe compensatory frameshifts, where a segment of the gene has changed frame but remains translatable. In summary, we survey a large class of non-standard but potentially functional transcripts that are likely to encode genetic information and effect biological processes in novel ways. Many of these transcripts do not correspond cleanly to any identifiable object in the genome, implying fundamental limits to the goal of annotating all functional elements at the genome sequence level.


Subjects
RNA, Messenger/genetics; Transcription, Genetic; Animals; DNA Transposable Elements; Evolution, Molecular; Humans; Mice; Promoter Regions, Genetic; Proteins/genetics; Pseudogenes; Reproducibility of Results; Sequence Alignment
8.
PLoS Genet; 2(4): e47, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16683030

ABSTRACT

Mammalian genomes harbor a larger than expected number of complex loci, in which multiple genes are coupled by shared transcribed regions in antisense orientation and/or by bidirectional core promoters. To determine the incidence, functional significance, and evolutionary context of mammalian complex loci, we identified and characterized 5,248 cis-antisense pairs, 1,638 bidirectional promoters, and 1,153 chains of multiple cis-antisense and/or bidirectionally promoted pairs from 36,606 mouse transcriptional units (TUs), along with 6,141 cis-antisense pairs, 2,113 bidirectional promoters, and 1,480 chains from 42,887 human TUs. In both human and mouse, 25% of TUs resided in cis-antisense pairs, only 17% of which were conserved between the two organisms, indicating frequent species specificity of antisense gene arrangements. A sampling approach indicated that over 40% of all TUs might actually be in cis-antisense pairs, and that only a minority of these arrangements are likely to be conserved between human and mouse. Bidirectional promoters were characterized by variable transcriptional start sites and an identifiable midpoint at which overall sequence composition changed strand and the direction of transcriptional initiation switched. In microarray data covering a wide range of mouse tissues, genes in cis-antisense and bidirectionally promoted arrangement showed a higher probability of being coordinately expressed than random pairs of genes. In a case study on homeotic loci, we observed extensive transcription of nonconserved sequences on the noncoding strand, implying that the presence rather than the sequence of these transcripts is of functional importance. Complex loci are ubiquitous, host numerous nonconserved gene structures and lineage-specific exonification events, and may have a cis-regulatory impact on the member genes.


Subjects
Chromosome Mapping; Genome; Mice; Animals; Mice/genetics; Base Pairing; DNA Primers; Genome, Human; Promoter Regions, Genetic; Reverse Transcriptase Polymerase Chain Reaction; Humans
9.
PLoS Genet; 2(4): e54, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16683032

ABSTRACT

Using the two largest collections of Mus musculus and Homo sapiens transcription start sites (TSSs) determined based on CAGE tags, ditags, full-length cDNAs, and other transcript data, we describe the compositional landscape surrounding TSSs with the aim of gaining better insight into the properties of mammalian promoters. We classified TSSs into four types based on compositional properties of regions immediately surrounding them. These properties highlighted distinctive features in the extended core promoters that helped us delineate boundaries of the transcription initiation domain space for both species. The TSS types were analyzed for associations with initiating dinucleotides, CpG islands, TATA boxes, and an extensive collection of statistically significant cis-elements in mouse and human. We found that different TSS types show preferences for different sets of initiating dinucleotides and cis-elements. Through Gene Ontology and eVOC categories and tissue expression libraries we linked TSS characteristics to expression. Moreover, we show a link of TSS characteristics to very specific genomic organization in an example of immune-response-related genes (GO:0006955). Our results shed light on the global properties of the two transcriptomes not revealed before and therefore provide the framework for better understanding of the transcriptional mechanisms in the two species, as well as a framework for development of new and more efficient promoter- and gene-finding tools.


Subjects
Mice; Promoter Regions, Genetic; Transcription, Genetic; Animals; Mice/genetics; Base Composition; Databases, Nucleic Acid; Dinucleoside Phosphates; DNA, Complementary/genetics; Gene Library; TATA Box; Humans
10.
Bioinformatics; 22(18): 2310-2, 2006 Sep 15.
Article in English | MEDLINE | ID: mdl-16613910

ABSTRACT

UNLABELLED: Dragon Promoter Mapper (DPM) is a tool to model promoter structure of co-regulated genes using methodology of Bayesian networks. DPM exploits an exhaustive set of motif features (such as motif, its strand, the order of motif occurrence and mutual distance between the adjacent motifs) and generates models from the target promoter sequences, which may be used to (1) detect regions in a genomic sequence which are similar to the target promoters or (2) to classify other promoters as similar or not to the target promoter group. DPM can also be used for modelling of enhancers and silencers. AVAILABILITY: http://defiant.i2r.a-star.edu.sg/projects/BayesPromoter/ CONTACT: vlad@sanbi.ac.za SUPPLEMENTARY INFORMATION: Manual for using DPM web server is provided at http://defiant.i2r.a-star.edu.sg/projects/BayesPromoter/html/manual/manual.htm.


Subjects
Algorithms; Chromosome Mapping/methods; Models, Genetic; Promoter Regions, Genetic/genetics; Sequence Analysis, DNA/methods; Software; User-Computer Interface; Bayes Theorem; Computer Simulation; Models, Statistical; Pattern Recognition, Automated/methods
11.
Nat Genet; 38(6): 626-35, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16645617

ABSTRACT

Mammalian promoters can be separated into two classes, conserved TATA box-enriched promoters, which initiate at a well-defined site, and more plastic, broad and evolvable CpG-rich promoters. We have sequenced tags corresponding to several hundred thousand transcription start sites (TSSs) in the mouse and human genomes, allowing precise analysis of the sequence architecture and evolution of distinct promoter classes. Different tissues and families of genes differentially use distinct types of promoters. Our tagging methods allow quantitative analysis of promoter usage in different tissues and show that differentially regulated alternative TSSs are a common feature in protein-coding genes and commonly generate alternative N termini. Among the TSSs, we identified new start sites associated with the majority of exons and with 3' UTRs. These data permit genome-scale identification of tissue-specific promoters and analysis of the cis-acting elements associated with them.


Subjects
Evolution, Molecular; Promoter Regions, Genetic; 3' Untranslated Regions; Animals; Base Sequence; DNA; Genome; Proteome; TATA Box
12.
BMC Bioinformatics; 7 Suppl 5: S8, 2006 Dec 18.
Article in English | MEDLINE | ID: mdl-17254313

ABSTRACT

BACKGROUND: Mammalian antimicrobial peptides (AMPs) are effectors of the innate immune response. A multitude of signals coming from pathways of mammalian pathogen/pattern recognition receptors and other proteins affect the expression of AMP-coding genes (AMPcgs). For many AMPcgs the promoter elements and transcription factors that control their tissue cell-specific expression have yet to be fully identified and characterized. RESULTS: Based upon the RIKEN full-length cDNA and public sequence data derived from human, mouse and rat, we identified 178 candidate AMP transcripts derived from 61 genes belonging to 29 AMP families. However, only for 31 mouse genes belonging to 22 AMP families we were able to determine true orthologous relationships with 30 human and 15 rat sequences. We screened the promoter regions of AMPcgs in the three species for motifs by an ab initio motif finding method and analyzed the derived promoter characteristics. Promoter models were developed for alpha-defensins, penk and zap AMP families. The results suggest a core set of transcription factors (TFs) that regulate the transcription of AMPcg families in mouse, rat and human. The three most frequent core TFs groups include liver-, nervous system-specific and nuclear hormone receptors (NHRs). Out of 440 motifs analyzed, we found that three represent potentially novel TF-binding motifs enriched in promoters of AMPcgs, while the other four motifs appear to be species-specific. CONCLUSION: Our large-scale computational analysis of promoters of 22 families of AMPcgs across three mammalian species suggests that their key transcriptional regulators are likely to be TFs of the liver-, nervous system-specific and NHR groups. The computationally inferred promoter elements and potential TF binding motifs provide a rich resource for targeted experimental validation of TF binding and signaling studies that aim at the regulation of mouse, rat or human AMPcgs.


Subjects
Antimicrobial Cationic Peptides/genetics; Computational Biology/methods; Promoter Regions, Genetic; Sequence Analysis, DNA/methods; Animals; Binding Sites; Carrier Proteins/genetics; Enkephalins/genetics; Humans; Mice; Multigene Family/genetics; Protein Precursors/genetics; RNA-Binding Proteins; Rats; Transcription Factors/metabolism; alpha-Defensins/genetics
13.
Nucleic Acids Res; 32(21): 6212-7, 2004.
Article in English | MEDLINE | ID: mdl-15576347

ABSTRACT

Estrogen has a profound impact on human physiology and affects numerous genes. The classical estrogen reaction is mediated by its receptors (ERs), which bind to the estrogen response elements (EREs) in the promoter regions of target genes. Due to tedious and expensive experiments, a limited number of human genes are functionally well characterized. It is still unclear how many and which human genes respond to estrogen treatment. We propose a simple, economic, yet effective computational method to predict a subclass of estrogen responsive genes. Our method relies on the similarity of ERE frames across different promoters in the human genome. Matching ERE frames of a test set of 60 known estrogen responsive genes to the collection of over 18,000 human promoters, we obtained 604 candidate genes. Evaluating our result by comparison with the published microarray data and literature, we found that more than half (53.6%, 324/604) of predicted candidate genes are responsive to estrogen. We believe this method can significantly reduce the number of potential estrogen target genes that need to be tested and provide functional clues for annotating some of the genes that lack functional information.
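The validation rate quoted in this abstract can be reproduced with a short arithmetic check. The helper name `validated_fraction` is an illustrative assumption; only the counts 324 and 604 come from the abstract.

```python
def validated_fraction(confirmed: int, candidates: int) -> float:
    """Fraction of predicted candidates confirmed by independent evidence."""
    if candidates <= 0:
        raise ValueError("candidates must be positive")
    if not 0 <= confirmed <= candidates:
        raise ValueError("confirmed must lie between 0 and candidates")
    return confirmed / candidates


# Counts taken from the abstract: 324 of 604 candidate genes were confirmed.
rate_percent = round(100 * validated_fraction(324, 604), 1)
```

Rounded to one decimal place, the result agrees with the 53.6% stated in the abstract.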


Subjects
Computational Biology/methods; Estrogens/pharmacology; Gene Expression Regulation; Genomics/methods; Gene Expression Profiling; Genome, Human; Humans; Promoter Regions, Genetic; Response Elements
14.
Nat Biotechnol; 22(11): 1467-73, 2004 Nov.
Article in English | MEDLINE | ID: mdl-15529174

ABSTRACT

Promoter prediction programs (PPPs) are important for in silico gene discovery without support from expressed sequence tag (EST)/cDNA/mRNA sequences, in the analysis of gene regulation and in genome annotation. Contrary to previous expectations, a comprehensive analysis of PPPs reveals that no program simultaneously achieves sensitivity and a positive predictive value >65%. PPP performances deduced from a limited number of chromosomes or smaller data sets do not hold when evaluated at the level of the whole genome, with serious inaccuracy of predictions for non-CpG-island-related promoters. Some PPPs even perform worse than, or close to, pure random guessing.


Subjects
Algorithms; Chromosome Mapping/methods; Genome, Human; Promoter Regions, Genetic/genetics; Sequence Analysis, DNA/methods; Software; Base Sequence; CpG Islands/genetics; Humans; Molecular Sequence Data; Reproducibility of Results; Sensitivity and Specificity; Software Validation
15.
Nucleic Acids Res; 32(Web Server issue): W230-4, 2004 Jul 01.
Article in English | MEDLINE | ID: mdl-15215386

ABSTRACT

We present Dragon TF Association Miner (DTFAM), a system for text-mining of PubMed documents for potential functional association of transcription factors (TFs) with terms from Gene Ontology (GO) and with diseases. DTFAM has been trained and tested in the selection of relevant documents on a manually curated dataset containing >3000 PubMed abstracts relevant to transcription control. On our test data the system achieves sensitivity of 80% with specificity of 82%. DTFAM provides comprehensive tabular and graphical reports linking terms to relevant sets of documents. These documents are color-coded for easier inspection. DTFAM complements the existing biological resources by collecting, assessing, extracting and presenting associations that can reveal some of the not so easily observable connections among the entities found which could explain the functions of TFs and help decipher parts of gene transcriptional regulatory networks. DTFAM summarizes information from a large volume of documents saving time and making analysis simpler for individual users. DTFAM is freely available for academic and non-profit users at http://research.i2r.a-star.edu.sg/DRAGON/TFAM/.


Subjects
Software; Transcription Factors/metabolism; Gene Expression Regulation; Internet; PubMed; Signal Transduction; Transcription Factors/physiology; Transcription, Genetic; User-Computer Interface
16.
Nucleic Acids Res; 31(13): 3605-7, 2003 Jul 01.
Article in English | MEDLINE | ID: mdl-12824376

ABSTRACT

We present a unique program for identification of estrogen response elements (EREs) in genomic DNA and related analyses. The detection algorithm was tested on several large datasets and makes one prediction in 13 300 nt while achieving a sensitivity of 83%. Users can further investigate selected regions around the identified ERE patterns for transcription factor binding sites based on the TRANSFAC database. It is also possible to search for candidate human genes with a match for the identified EREs and their flanking regions within EPD annotated promoters. Additionally, users can search among the extended promoter regions of approximately 11 000 human genes for those that have a high degree of similarity to the identified ERE patterns. Dragon ERE Finder version 2 is freely available for academic and non-profit users (http://sdmc.lit.org.sg/ERE-V2/index).


Subjects
Estrogens/physiology; Response Elements; Sequence Analysis, DNA/methods; Software; Vertebrates/genetics; Algorithms; Animals; Binding Sites; Genome; Humans; Internet; Promoter Regions, Genetic; Reproducibility of Results; Transcription Factors/metabolism; User-Computer Interface