Results 1 - 16 of 16
1.
Bioinformatics ; 29(1): 117-8, 2013 Jan 01.
Article in English | MEDLINE | ID: mdl-23110968

ABSTRACT

SUMMARY: In higher eukaryotes, the identification of translation initiation sites (TISs) has focused on finding these signals in cDNA or mRNA sequences. Using Arabidopsis thaliana (A.t.) information, we developed a prediction tool for signals within genomic sequences of plants that correspond to TISs. Our tool requires only genome sequence, not expressed sequences. Its sensitivity/specificity is 90.75%/92.2% for A.t., 66.8%/94.4% for Vitis vinifera and 81.6%/94.4% for Populus trichocarpa, which suggests that our tool can be used in the annotation of different plant genomes. We provide a list of features used in our model. Further study of these features may improve our understanding of the mechanisms of translation initiation. AVAILABILITY AND IMPLEMENTATION: Our tool is implemented as an artificial neural network. It is available as a web-based tool and, together with the source code, the list of features, and data used for model development, is accessible at http://cbrc.kaust.edu.sa/dts.


Subject(s)
Arabidopsis/genetics; Peptide Chain Initiation, Translational; Software; Genome, Plant; Genomics; Internet; Neural Networks, Computer; Nucleotide Motifs; Sensitivity and Specificity; Sequence Analysis, DNA
2.
Bioinformatics ; 28(5): 747-9, 2012 Mar 01.
Article in English | MEDLINE | ID: mdl-22238258

ABSTRACT

MOTIVATION: Molecular interaction information, such as protein-protein interactions and protein-small molecule interactions, is indispensable for understanding the mechanism of biological processes and discovering treatments for diseases. Many databases have been built by manual annotation of literature to organize such information into structured form. However, most databases focus on only one type of interactions, which are often not well annotated and integrated with related functional information. RESULTS: In this study, we integrate molecular interaction information from literature by automatic information extraction and from manually annotated databases. We further integrate the relationships between protein/gene and other bio-entity terms including gene ontology terms, pathways, species and diseases to build an integrated molecular interaction database (IMID). Interactions can be selected by their associated probabilities. IMID allows complex and versatile queries for context-specific molecular interactions, which are not available currently in other molecular interaction databases. AVAILABILITY: The database is located at www.integrativebiology.org.


Subject(s)
Databases, Protein; Proteins/metabolism; Humans; Internet; Protein Binding; Protein Interaction Maps; Vocabulary, Controlled
3.
Bioinformatics ; 28(1): 127-9, 2012 Jan 01.
Article in English | MEDLINE | ID: mdl-22088842

ABSTRACT

MOTIVATION: Recognition of poly(A) signals in mRNA is relatively straightforward due to the presence of an easily recognizable polyadenylic acid tail. However, the task of identifying poly(A) motifs in the primary genomic DNA sequence that correspond to poly(A) signals in mRNA is a far more challenging problem. Recognition of poly(A) signals is important for better gene annotation and understanding of gene regulation mechanisms. In this work, we present one such poly(A) motif prediction method based on properties of the human genomic DNA sequence surrounding a poly(A) motif. These properties include thermodynamic, physico-chemical and statistical characteristics. For predictions, we developed Artificial Neural Network and Random Forest models. These models are trained to recognize the 12 most common poly(A) motifs in human DNA. Our predictors are available as a free web-based tool accessible at http://cbrc.kaust.edu.sa/dps. Compared with other reported predictors, our models achieve higher sensitivity and specificity and furthermore provide a consistent level of accuracy for the 12 poly(A) motif variants. CONTACT: vladimir.bajic@kaust.edu.sa SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms; Neural Networks, Computer; Poly A/analysis; Genome, Human; Humans; Internet; Poly A/genetics; Sensitivity and Specificity; Software
4.
Am J Respir Cell Mol Biol ; 47(1): 112-9, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22383585

ABSTRACT

Many genes have been implicated in the pathogenesis of common respiratory and related diseases (RRDs), yet the underlying mechanisms are largely unknown. Differential gene expression patterns in diseased and healthy individuals suggest that RRDs affect or are affected by modified transcription regulation programs. It is thus crucial to characterize implicated genes in terms of transcriptional regulation. For this purpose, we conducted a promoter analysis of genes associated with 11 common RRDs including allergic rhinitis, asthma, bronchiectasis, bronchiolitis, bronchitis, chronic obstructive pulmonary disease, cystic fibrosis, emphysema, eczema, psoriasis, and urticaria, many of which are thought to be genetically related. The objective of the present study was to obtain deeper insight into the transcriptional regulation of these disease-associated genes by annotating their promoter regions with transcription factors (TFs) and TF binding sites (TFBSs). We discovered many TFs that are significantly enriched in the target disease groups including associations that have been documented in the literature. We also identified a number of putative TFs/TFBSs that appear to be novel. The results of our analysis are provided in an online database that is freely accessible to researchers at http://www.respiratorygenomics.com. Promoter-associated TFBS information and related genomic features, such as histone modification sites, microsatellites, CpG islands, and SNPs, are graphically summarized in the database. Users can compare and contrast underlying mechanisms of specific RRDs relative to candidate genes, TFs, gene ontology terms, micro-RNAs, and biological pathways for the conduct of metaanalyses. This database represents a novel, useful resource for RRD researchers.


Subject(s)
Databases, Genetic; Molecular Sequence Annotation; Promoter Regions, Genetic; Respiratory Tract Diseases/genetics; Gene Expression Regulation; Genomics/methods; Humans; Transcription Factors/genetics
5.
Bioinformatics ; 25(12): 1536-42, 2009 Jun 15.
Article in English | MEDLINE | ID: mdl-19369495

ABSTRACT

MOTIVATION: Protein-protein interaction (PPI) extraction from published biological articles has attracted much attention because of the importance of protein interactions in biological processes. Despite significant progress, mining PPIs from the literature still relies heavily on time- and resource-consuming manual annotation. RESULTS: In this study, we developed a novel methodology based on Bayesian networks (BNs) for extracting PPI triplets (a PPI triplet consists of two protein names and the corresponding interaction word) from unstructured text. The method achieved an overall accuracy of 87% on a cross-validation test using a manually annotated dataset. We also showed, by extracting PPI triplets from a large number of PubMed abstracts, that our method was able to complement human annotations and extract a large number of new PPIs from the literature. AVAILABILITY: Programs/scripts we developed/used in the study are available at http://stat.fsu.edu/~jinfeng/datasets/Bio-SI-programs-Bayesian-chowdhary-zhang-liu.zip. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Bayes Theorem; Computational Biology/methods; Information Storage and Retrieval/methods; Protein Interaction Mapping/methods; Proteins/chemistry; Binding Sites; Databases, Protein; Proteins/metabolism
7.
Bioinformatics ; 22(18): 2310-2, 2006 Sep 15.
Article in English | MEDLINE | ID: mdl-16613910

ABSTRACT

UNLABELLED: Dragon Promoter Mapper (DPM) is a tool to model promoter structure of co-regulated genes using methodology of Bayesian networks. DPM exploits an exhaustive set of motif features (such as motif, its strand, the order of motif occurrence and mutual distance between the adjacent motifs) and generates models from the target promoter sequences, which may be used to (1) detect regions in a genomic sequence which are similar to the target promoters or (2) to classify other promoters as similar or not to the target promoter group. DPM can also be used for modelling of enhancers and silencers. AVAILABILITY: http://defiant.i2r.a-star.edu.sg/projects/BayesPromoter/ CONTACT: vlad@sanbi.ac.za SUPPLEMENTARY INFORMATION: Manual for using DPM web server is provided at http://defiant.i2r.a-star.edu.sg/projects/BayesPromoter/html/manual/manual.htm.


Subject(s)
Algorithms; Chromosome Mapping/methods; Models, Genetic; Promoter Regions, Genetic/genetics; Sequence Analysis, DNA/methods; Software; User-Computer Interface; Bayes Theorem; Computer Simulation; Models, Statistical; Pattern Recognition, Automated/methods
8.
BMC Bioinformatics ; 7 Suppl 5: S8, 2006 Dec 18.
Article in English | MEDLINE | ID: mdl-17254313

ABSTRACT

BACKGROUND: Mammalian antimicrobial peptides (AMPs) are effectors of the innate immune response. A multitude of signals coming from pathways of mammalian pathogen/pattern recognition receptors and other proteins affect the expression of AMP-coding genes (AMPcgs). For many AMPcgs the promoter elements and transcription factors that control their tissue cell-specific expression have yet to be fully identified and characterized. RESULTS: Based upon the RIKEN full-length cDNA and public sequence data derived from human, mouse and rat, we identified 178 candidate AMP transcripts derived from 61 genes belonging to 29 AMP families. However, only for 31 mouse genes belonging to 22 AMP families we were able to determine true orthologous relationships with 30 human and 15 rat sequences. We screened the promoter regions of AMPcgs in the three species for motifs by an ab initio motif finding method and analyzed the derived promoter characteristics. Promoter models were developed for alpha-defensins, penk and zap AMP families. The results suggest a core set of transcription factors (TFs) that regulate the transcription of AMPcg families in mouse, rat and human. The three most frequent core TFs groups include liver-, nervous system-specific and nuclear hormone receptors (NHRs). Out of 440 motifs analyzed, we found that three represent potentially novel TF-binding motifs enriched in promoters of AMPcgs, while the other four motifs appear to be species-specific. CONCLUSION: Our large-scale computational analysis of promoters of 22 families of AMPcgs across three mammalian species suggests that their key transcriptional regulators are likely to be TFs of the liver-, nervous system-specific and NHR groups. The computationally inferred promoter elements and potential TF binding motifs provide a rich resource for targeted experimental validation of TF binding and signaling studies that aim at the regulation of mouse, rat or human AMPcgs.


Subject(s)
Antimicrobial Cationic Peptides/genetics; Computational Biology/methods; Promoter Regions, Genetic; Sequence Analysis, DNA/methods; Animals; Binding Sites; Carrier Proteins/genetics; Enkephalins/genetics; Humans; Mice; Multigene Family/genetics; Protein Precursors/genetics; RNA-Binding Proteins; Rats; Transcription Factors/metabolism; alpha-Defensins/genetics
9.
Int J Data Min Bioinform ; 7(4): 450-62, 2013.
Article in English | MEDLINE | ID: mdl-23798227

ABSTRACT

Information on Protein Interactions (PIs) is valuable for biomedical research, but often lies buried in the scientific literature and cannot be readily retrieved. While much progress has been made over the years in extracting PIs from the literature using computational methods, there is a lack of free, public, user-friendly tools for the discovery of PIs. We developed an online tool for the extraction of PI relationships from PubMed abstracts, which we name PIMiner. Protein pairs and the words that describe their interactions are reported by PIMiner so that new interactions can be easily detected within text. The interaction likelihood levels are reported too. The option to extract only specific types of interactions is also provided. The PIMiner server can be accessed through a web browser or remotely through a client's command line. PIMiner can process 50,000 PubMed abstracts in approximately 7 min and thus appears suitable for large-scale processing of biological/biomedical literature.


Subject(s)
Protein Interaction Mapping/methods; Proteins/chemistry; Software; Binding Sites; Information Storage and Retrieval; Internet; Proteins/metabolism; PubMed
10.
PLoS One ; 8(7): e68857, 2013.
Article in English | MEDLINE | ID: mdl-23874789

ABSTRACT

BACKGROUND: Initiation of transcription is essential for most of the cellular responses to environmental conditions and for cell and tissue specificity. This process is regulated through numerous proteins, their ligands and mutual interactions, as well as interactions with DNA. The key such regulatory proteins are transcription factors (TFs) and transcription co-factors (TcoFs). TcoFs are important since they modulate the transcription initiation process through interaction with TFs. In eukaryotes, transcription requires that TFs form different protein complexes with various nuclear proteins. To better understand transcription regulation, it is important to know the functional class of proteins interacting with TFs during transcription initiation. Such information is not fully available, since not all proteins that act as TFs or TcoFs are yet annotated as such, due to generally partial functional annotation of proteins. In this study we have developed a method to predict, using only sequence composition of the interacting proteins, the functional class of human TF binding partners to be (i) TF, (ii) TcoF, or (iii) other nuclear protein. This allows for complementing the annotation of the currently known pool of nuclear proteins. Since only the knowledge of protein sequences is required in addition to protein interaction, the method should be easily applicable to many species. RESULTS: Based on experimentally validated interactions between human TFs with different TFs, TcoFs and other nuclear proteins, our two classification systems (implemented as a web-based application) achieve high accuracies in distinguishing TFs and TcoFs from other nuclear proteins, and TFs from TcoFs respectively. CONCLUSION: As demonstrated, given the fact that two proteins are capable of forming direct physical interactions and using only information about their sequence composition, we have developed a completely new method for predicting a functional class of TF interacting protein partners with high precision and accuracy.


Subject(s)
Computational Biology/methods; Multiprotein Complexes/metabolism; Transcription Factors/metabolism; Databases, Protein; Humans; Protein Binding
11.
PLoS One ; 7(4): e34480, 2012.
Article in English | MEDLINE | ID: mdl-22493694

ABSTRACT

BACKGROUND: Protein interaction networks (PINs) specific within a particular context contain crucial information regarding many cellular biological processes. For example, PINs may include information on the type and directionality of interaction (e.g. phosphorylation), location of interaction (i.e. tissues, cells), and related diseases. Currently, very few tools are capable of deriving context-specific PINs for conducting exploratory analysis. RESULTS: We developed a literature-based online system, Context-specific Protein Network Miner (CPNM), which derives context-specific PINs in real-time from the PubMed database based on a set of user-input keywords and enhanced PubMed query system. CPNM reports enriched information on protein interactions (with type and directionality), their network topology with summary statistics (e.g. most densely connected proteins in the network; most densely connected protein-pairs; and proteins connected by most inbound/outbound links) that can be explored via a user-friendly interface. Some of the novel features of the CPNM system include PIN generation, ontology-based PubMed query enhancement, real-time, user-queried, up-to-date PubMed document processing, and prediction of PIN directionality. CONCLUSIONS: CPNM provides a tool for biologists to explore PINs. It is freely accessible at http://www.biotextminer.com/CPNM/.


Subject(s)
Data Mining/methods; Protein Interaction Mapping/methods; Protein Interaction Maps; Proteins/metabolism; Software; Algorithms; Databases, Genetic; Humans; Internet; Proteins/genetics; PubMed; User-Computer Interface
12.
Article in English | MEDLINE | ID: mdl-23367189

ABSTRACT

Electronic Health Records (EHR) contain large amounts of useful information that could potentially be used for building models for predicting onset of diseases. In this study, we have investigated the use of free-text and coded data in Marshfield Clinic's EHR, individually and in combination, for building machine learning based models to predict the first ever episode of atrial fibrillation and/or atrial flutter (AFF). We trained and evaluated our AFF models on the EHR data across different time intervals (1, 3, 5 and all years) prior to first documented onset of AFF. We applied several machine learning methods, including naïve Bayes, support vector machines (SVM), logistic regression and random forests, for building AFF prediction models and evaluated these using a 10-fold cross-validation approach. On text-based datasets, the best model achieved an F-measure of 60.1%, when applied exclusively to coded data. The combination of textual and coded data achieved comparable performance. The study results attest to the relative merit of utilizing textual data to complement the use of coded data for disease onset prediction modeling.


Subject(s)
Atrial Fibrillation/diagnosis; Atrial Flutter/diagnosis; Electronic Health Records; Humans
13.
PLoS One ; 6(6): e21474, 2011.
Article in English | MEDLINE | ID: mdl-21738677

ABSTRACT

A significant part of our biological knowledge is centered on relationships between biological entities (bio-entities) such as proteins, genes, small molecules, pathways, gene ontology (GO) terms and diseases. Accumulated at an increasing speed, the information on bio-entity relationships is archived in different forms at scattered places. Most of such information is buried in scientific literature as unstructured text. Organizing heterogeneous information in a structured form not only facilitates study of biological systems using integrative approaches, but also allows discovery of new knowledge in an automatic and systematic way. In this study, we performed a large scale integration of bio-entity relationship information from both databases containing manually annotated, structured information and automatic information extraction of unstructured text in scientific literature. The relationship information we integrated in this study includes protein-protein interactions, protein/gene regulations, protein-small molecule interactions, protein-GO relationships, protein-pathway relationships, and pathway-disease relationships. The relationship information is organized in a graph data structure, named integrated bio-entity network (IBN), where the vertices are the bio-entities and edges represent their relationships. Under this framework, graph theoretic algorithms can be designed to perform various knowledge discovery tasks. We designed breadth-first search with pruning (BFSP) and most probable path (MPP) algorithms to automatically generate hypotheses--the indirect relationships with high probabilities in the network. We show that IBN can be used to generate plausible hypotheses, which not only help to better understand the complex interactions in biological systems, but also provide guidance for experimental designs.
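
The idea of a "most probable path" in a graph whose edges carry probabilities can be illustrated with a small sketch. This is not the authors' MPP or BFSP algorithm, whose details the abstract does not give; it is a generic stand-in that runs Dijkstra's algorithm on negative log-probabilities, and the node names and probabilities in the toy network are hypothetical.

```python
import heapq
import math


def most_probable_path(edges, source, target):
    """Return (probability, path) of the highest-probability path.

    edges maps a node to a list of (neighbor, probability) pairs with
    0 < probability <= 1. Returns (0.0, []) if target is unreachable.
    """
    # Maximizing a product of probabilities is the same as minimizing the
    # sum of -log(p); every -log(p) is non-negative, so Dijkstra applies.
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    visited = set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbor, prob in edges.get(node, []):
            new_cost = cost - math.log(prob)
            if new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if target not in best:
        return 0.0, []
    # Walk the predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return math.exp(-best[target]), path


# Hypothetical toy network: vertices are bio-entities, edges are relationships.
toy_network = {
    "ProteinA": [("ProteinB", 0.9), ("PathwayX", 0.5)],
    "ProteinB": [("DiseaseY", 0.8)],
    "PathwayX": [("DiseaseY", 0.95)],
}
probability, path = most_probable_path(toy_network, "ProteinA", "DiseaseY")
print(path, round(probability, 3))
```

In this toy network the indirect relationship between ProteinA and DiseaseY is supported more strongly through ProteinB (0.9 × 0.8) than through PathwayX (0.5 × 0.95), which is the kind of ranking a hypothesis-generation step would need.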


Subject(s)
Computational Biology/methods; Algorithms; Databases, Factual; Proteins
14.
BMC Syst Biol ; 4 Suppl 1: S4, 2010 May 28.
Article in English | MEDLINE | ID: mdl-20522254

ABSTRACT

BACKGROUND: The purpose of this study is to: i) develop a computational model of promoters of human histone-encoding genes (histone genes for short), an important class of genes that participate in various critical cellular processes, ii) use the model so developed to identify regions across the human genome that have a similar structure to promoters of histone genes; such regions could represent potential genomic regulatory regions, e.g. promoters, of genes that may be coregulated with histone genes, and iii) identify in this way genes that have a high likelihood of being coregulated with the histone genes. RESULTS: We successfully developed a histone promoter model using a comprehensive collection of histone genes. Based on a leave-one-out cross-validation test, the model produced good prediction accuracy (94.1% sensitivity, 92.6% specificity, and 92.8% positive predictive value). We used this model to predict across the genome a number of genes that shared similar promoter structures with the histone gene promoters. We thus hypothesize that these predicted genes could be coregulated with histone genes. This hypothesis matches well with the available gene expression, gene ontology, and pathways data. Jointly with promoters of the above-mentioned genes, we found a large number of intergenic regions with a similar structure to histone promoters. CONCLUSIONS: This study represents one of the most comprehensive computational analyses conducted thus far on a genome-wide scale of promoters of human histone genes. Our analysis suggests a number of other human genes that share a high similarity of promoter structure with the histone genes and thus are highly likely to be coregulated, and consequently coexpressed, with the histone genes. We also found that there are a large number of intergenic regions across the genome with structures similar to promoters of histone genes. These regions may be promoters of yet unidentified genes, or may represent remote control regions that participate in regulation of histone and histone-coregulated gene transcription initiation. While these hypotheses still remain to be verified, we believe that these form a useful resource for researchers to further explore regulation of human histone genes and the human genome. It is worthwhile to note that the regulatory regions of the human genome remain largely un-annotated even today and this study is an attempt to supplement our understanding of histone regulatory regions.
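
For readers less familiar with the three accuracy figures quoted above, the following minimal sketch shows how sensitivity, specificity and positive predictive value are computed from confusion-matrix counts. The function names and the counts are hypothetical and are not taken from the study.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from raw counts."""
    return {
        # Fraction of true promoters that the model recovers.
        "sensitivity": safe_ratio(tp, tp + fn),
        # Fraction of non-promoters that the model correctly rejects.
        "specificity": safe_ratio(tn, tn + fp),
        # Fraction of positive predictions that are correct.
        "ppv": safe_ratio(tp, tp + fp),
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=8, fp=1, tn=9, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a leave-one-out setting, each sequence is predicted by a model trained on all the others, and the counts passed to such a function are accumulated over those held-out predictions.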


Subject(s)
Genome, Human/genetics; Genomics; Histones/genetics; Promoter Regions, Genetic/genetics; Bayes Theorem; Humans
15.
Int J Bioinform Res Appl ; 2(3): 282-8, 2006.
Article in English | MEDLINE | ID: mdl-18048166

ABSTRACT

The standard practice in the analysis of promoters is to select promoter regions of convenient length. This may lead to false results when searching for Transcription Factor Binding Sites (TFBSs), since the sequences may contain coding segments. In such cases, motif detection may single out motifs from the coding regions. The mapping of TFBSs to promoters may result in a misleading picture of 'promoter' content. We illustrate these issues using the example of histones H2A and H2B and show how such analysis could be misleading if care is not exercised to eliminate coding regions from the presumed promoter sequences.
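
One way to exercise the care described above is to subtract annotated coding intervals from a fixed-length promoter window before motif searching. The sketch below is only an illustration under stated assumptions: coordinates are half-open (start, end) pairs on a single strand, and the function name and example coordinates are hypothetical rather than taken from the paper.

```python
def subtract_intervals(region, blocked):
    """Return the parts of `region` not covered by any interval in `blocked`.

    region is a half-open (start, end) pair; blocked is a list of such pairs,
    for example annotated coding segments overlapping the promoter window.
    """
    start, end = region
    kept = []
    cursor = start  # Leftmost position not yet classified as kept or blocked.
    for b_start, b_end in sorted(blocked):
        if b_end <= cursor:
            continue  # Blocked interval lies entirely before the cursor.
        if b_start >= end:
            break  # Remaining blocked intervals start after the region.
        if b_start > cursor:
            kept.append((cursor, b_start))
        cursor = max(cursor, b_end)
        if cursor >= end:
            break
    if cursor < end:
        kept.append((cursor, end))
    return kept


# Hypothetical promoter window and coding segments.
promoter_window = (100, 600)
coding_segments = [(50, 120), (300, 350), (580, 700)]
print(subtract_intervals(promoter_window, coding_segments))
```

Only the returned sub-intervals would then be scanned for TFBS motifs, so that motifs originating from coding sequence are not attributed to the promoter.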


Subject(s)
Computational Biology/methods; Promoter Regions, Genetic; Amino Acid Motifs; Binding Sites; Histones/chemistry; Humans; Models, Genetic; Protein Binding; Protein Structure, Tertiary; Software; Transcription Factors/metabolism; Transcription Initiation Site; Transcription, Genetic
16.
Bioinformatics ; 21(11): 2623-8, 2005 Jun 01.
Article in English | MEDLINE | ID: mdl-15769833

ABSTRACT

MOTIVATION: Histone proteins play important roles in chromosomal functions. They are significantly evolutionarily conserved across species, which suggests similarity in their transcription regulation. The abundance of experimental data on histone promoters provides an excellent background for the evaluation of computational methods. Our study addresses the issue of how well computational analysis can contribute to unveiling the biologically relevant content of promoter regions for a large number of mammalian histone genes taken across several species, and suggests the consensus promoter models of different histone groups. RESULTS: This is the first study to unveil the detailed promoter structures of all five mammalian histone groups and their subgroups. This is also the most comprehensive computational analysis of histone promoters performed to date. The most exciting fact is that the results correlate very well with the biologically known facts and experimental data. Our analysis convincingly demonstrates that computational approach can significantly contribute to elucidation of promoter content (identification of biologically relevant signals) complementing tedious wet-lab experiments. We believe that this type of analysis can be easily applied to other functional gene classes, thus providing a general framework for modelling promoter groups. These results also provide the basis to hunt for genes co-regulated with histone genes across mammalian genomes.


Subject(s)
Algorithms; Evolution, Molecular; Histones/genetics; Models, Genetic; Promoter Regions, Genetic/genetics; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Animals; Conserved Sequence; Humans; Mice; Phylogeny; Rats; Sequence Homology, Nucleic Acid; Species Specificity