Results 1 - 20 of 34
1.
J Biomed Semantics ; 5: 37, 2014.
Article in English | MEDLINE | ID: mdl-25852852

ABSTRACT

BACKGROUND: Cell lines have been widely used in biomedical research. The community-based Cell Line Ontology (CLO) is a member of the OBO Foundry library that covers the domain of cell lines. Since its publication two years ago, significant updates have been made, including new groups joining the CLO consortium, new cell line cells, upper-level alignment with the Cell Ontology (CL) and the Ontology for Biomedical Investigations (OBI), and logical extensions. CONSTRUCTION AND CONTENT: Collaboration among the CLO, CL, and OBI has established consensus definitions of cell line-specific terms such as 'cell line', 'cell line cell', 'cell line culturing', and 'mortal' vs. 'immortal cell line cell'. A cell line is a genetically stable cultured cell population that contains individual cell line cells. The hierarchical structure of the CLO is built based on the hierarchy of the in vivo cell types defined in CL and tissue types (from which cell line cells are derived) defined in the UBERON cross-species anatomy ontology. The new hierarchical structure makes it easier to browse, query, and perform automated classification. We have recently added classes representing more than 2,000 cell line cells from the RIKEN BRC Cell Bank to CLO. Overall, the CLO now contains ~38,000 classes of specific cell line cells derived from over 200 in vivo cell types from various organisms. UTILITY AND DISCUSSION: The CLO has been applied to different biomedical research studies. Example case studies include annotation and analysis of EBI ArrayExpress data, bioassays, and host-vaccine/pathogen interaction. CLO's utility goes beyond a catalogue of cell line types. The alignment of the CLO with related ontologies combined with the use of ontological reasoners will support sophisticated inferencing to advance translational informatics development.

2.
Bioinformatics ; 26(24): 3135-7, 2010 Dec 15.
Article in English | MEDLINE | ID: mdl-21123224

ABSTRACT

SUMMARY: GLay provides Cytoscape users with an assorted collection of versatile community structure algorithms and graph layout functions for network clustering and structured visualization. High performance is achieved by dynamically linking highly optimized C functions to the Cytoscape Java program, which makes GLay especially suitable for decomposition, display and exploratory analysis of large biological networks. AVAILABILITY: http://brainarray.mbni.med.umich.edu/glay/.


Subjects
Models, Biological ; Software ; Algorithms ; Computer Graphics
3.
BMC Med Genomics ; 3: 49, 2010 Oct 27.
Article in English | MEDLINE | ID: mdl-20979611

ABSTRACT

BACKGROUND: Reactive oxygen species (ROS) are known mediators of cellular damage in multiple diseases including diabetic complications. Despite its importance, no comprehensive database is currently available for the genes associated with ROS. METHODS: We present ROS- and diabetes-related targets (genes/proteins) collected from the biomedical literature through a text mining technology. A web-based literature mining tool, SciMiner, was applied to 1,154 biomedical papers indexed with diabetes and ROS by PubMed to identify relevant targets. Over-represented targets in the ROS-diabetes literature were obtained through comparisons against randomly selected literature. The expression levels of nine genes, selected from the top ranked ROS-diabetes set, were measured in the dorsal root ganglia (DRG) of diabetic and non-diabetic DBA/2J mice in order to evaluate the biological relevance of literature-derived targets in the pathogenesis of diabetic neuropathy. RESULTS: SciMiner identified 1,026 ROS- and diabetes-related targets from the 1,154 biomedical papers (http://jdrf.neurology.med.umich.edu/ROSDiabetes/). Fifty-three targets were significantly over-represented in the ROS-diabetes literature compared to randomly selected literature. These over-represented targets included well-known members of the oxidative stress response including catalase, the NADPH oxidase family, and the superoxide dismutase family of proteins. Eight of the nine selected genes exhibited significant differential expression between diabetic and non-diabetic mice. For six genes, the direction of expression change in diabetes paralleled enhanced oxidative stress in the DRG. CONCLUSIONS: Literature mining compiled ROS-diabetes related targets from the biomedical literature and led us to evaluate the biological relevance of selected targets in the pathogenesis of diabetic neuropathy.


Subjects
Data Mining/methods ; Diabetes Mellitus/genetics ; Diabetes Mellitus/metabolism ; Periodicals as Topic ; Reactive Oxygen Species/metabolism ; Research ; Animals ; Diabetic Nephropathies/genetics ; Diabetic Nephropathies/metabolism ; Ganglia, Spinal/metabolism ; Gene Expression Profiling ; Humans ; Mice ; Proteins/metabolism ; Sensory Receptor Cells/metabolism
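
Illustrative note on record 3: the abstract says over-represented targets were found by comparing the ROS-diabetes literature against randomly selected literature, but it does not name the statistical test. The sketch below assumes a one-sided hypergeometric test on paper counts. Only the figure of 1,154 papers comes from the abstract; the background size and the per-target counts are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """Return P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: total number of papers (topic set plus random background)
    successes:  papers in the whole population that mention the target
    draws:      papers in the topic set
    k:          topic-set papers that mention the target
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# 1,154 topic papers (from the abstract) plus a hypothetical background
# of 1,154 randomly selected papers.
topic_papers = 1154
background_papers = 1154  # assumption, not from the source

# Hypothetical counts for one target: mentioned in 40 topic papers
# and in 5 background papers.
in_topic, in_background = 40, 5

p_value = hypergeom_sf(
    in_topic,
    topic_papers + background_papers,
    in_topic + in_background,
    topic_papers,
)
print(f"one-sided p-value: {p_value:.3g}")
```

In practice one such test per target would need a multiple-testing correction before calling any target significantly over-represented.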
4.
J Bioinform Comput Biol ; 8(2): 219-46, 2010 Apr.
Article in English | MEDLINE | ID: mdl-20401945

ABSTRACT

Gene regulation in eukaryotes involves a complex interplay between the proximal promoter and distal genomic elements (such as enhancers) which work in concert to drive precise spatio-temporal gene expression. The experimental localization and characterization of gene regulatory elements is a very complex and resource-intensive process. The computational identification of regulatory regions that confer spatiotemporally specific tissue-restricted expression of a gene is thus an important challenge for computational biology. One of the most popular strategies for enhancer localization from DNA sequence is the use of conservation-based prefiltering and more recently, the use of canonical (transcription factor motifs) or de novo tissue-specific sequence motifs. However, there is an ongoing effort in the computational biology community to further improve the fidelity of enhancer predictions from sequence data by integrating other, complementary genomic modalities. In this work, we propose a framework that complements existing methodologies for prospective enhancer identification. The methods in this work are derived from two key insights: (i) that chromatin modification signatures can discriminate proximal and distally located regulatory regions and (ii) the notion of promoter-enhancer cross-talk (as assayed in 3C/5C experiments) might have implications in the search for regulatory sequences that co-operate with the promoter to yield tissue-restricted, gene-specific expression.


Subjects
Transcriptional Activation ; Animals ; Base Sequence ; Computational Biology ; DNA/genetics ; DNA/metabolism ; Databases, Genetic ; Enhancer Elements, Genetic ; Gene Expression ; Humans ; Kidney/metabolism ; Mice ; Models, Genetic ; Promoter Regions, Genetic ; Protein Interaction Mapping ; Transcription Factors/metabolism
5.
Endocrinology ; 150(8): 3645-54, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19406940

ABSTRACT

For insight into transcriptional mechanisms mediating physiological responses to GH, data mining was performed on a profile of GH-regulated genes induced or inhibited at different times in highly responsive 3T3-F442A adipocytes. Gene set enrichment analysis indicated that GH-regulated genes are enriched in pathways including phosphoinositide and insulin signaling and suggested that suppressor of cytokine signaling 2 (SOCS2) and phosphoinositide 3' kinase regulatory subunit p85alpha (Pik3r1) are important targets. Model-based Chinese restaurant clustering identified a group of genes highly regulated by GH at times consistent with its key physiological actions. This cluster included IGF-I, phosphoinositide 3' kinase p85alpha, SOCS2, and cytokine-inducible SH2-containing protein. It also contains the most strongly repressed gene in the profile, B cell lymphoma 6 (Bcl6), a transcriptional repressor. Quantitative real-time PCR verified the strong decrease in Bcl6 mRNA after GH treatment and induction of the other genes in the cluster. Transcriptional network analysis of the genes implicated signal transducer and activator of transcription (Stat) 5 as a hub regulating the most responsive genes, Igf1, Socs2, Cish, and Bcl6. Transcriptional activation analysis demonstrated that Bcl6 inhibits SOCS2-luciferase and blunts its stimulation by GH. Occupancy of endogenous Bcl6 on SOCS2 DNA decreased after GH treatment, whereas occupancy of Stat5 increased concomitantly. Thus, GH-mediated inhibition of Bcl6 expression may reverse the repression of SOCS2 and facilitate SOCS2 activation by GH. Together these analyses identify Bcl6 as a participant in GH-regulated gene expression and suggest an interplay between the repressor Bcl6 and the activator Stat5 in regulating genes that contribute to GH responses.


Subjects
Computational Biology/methods ; DNA-Binding Proteins/physiology ; Gene Expression Regulation/drug effects ; Gene Expression Regulation/genetics ; Growth Hormone/pharmacology ; Adipocytes/drug effects ; Adipocytes/metabolism ; Animals ; Cell Line ; Chromatin Immunoprecipitation ; DNA-Binding Proteins/genetics ; Immunoblotting ; Mice ; Proto-Oncogene Proteins c-bcl-6 ; STAT5 Transcription Factor/genetics ; STAT5 Transcription Factor/physiology ; Suppressor of Cytokine Signaling Proteins/genetics ; Suppressor of Cytokine Signaling Proteins/physiology ; Transcription, Genetic/drug effects ; Transcription, Genetic/genetics
6.
Bioinformatics ; 25(6): 838-40, 2009 Mar 15.
Article in English | MEDLINE | ID: mdl-19188191

ABSTRACT

UNLABELLED: SciMiner is a web-based literature mining and functional analysis tool that identifies genes and proteins using a context specific analysis of MEDLINE abstracts and full texts. SciMiner accepts a free text query (PubMed Entrez search) or a list of PubMed identifiers as input. SciMiner uses both regular expression patterns and dictionaries of gene symbols and names compiled from multiple sources. Ambiguous acronyms are resolved by a scoring scheme based on the co-occurrence of acronyms and corresponding description terms, which incorporates optional user-defined filters. Functional enrichment analyses are used to identify highly relevant targets (genes and proteins), GO (Gene Ontology) terms, MeSH (Medical Subject Headings) terms, pathways and protein-protein interaction networks by comparing identified targets from one search result with those from other searches or to the full HGNC [HUGO (Human Genome Organization) Gene Nomenclature Committee] gene set. The performance of gene/protein name identification was evaluated using the BioCreAtIvE (Critical Assessment of Information Extraction systems in Biology) version 2 (Year 2006) Gene Normalization Task as a gold standard. SciMiner achieved 87.1% recall, 71.3% precision and 75.8% F-measure. SciMiner's literature mining performance coupled with functional enrichment analyses provides an efficient platform for retrieval and summary of rich biological information from corpora of users' interests. AVAILABILITY: http://jdrf.neurology.med.umich.edu/SciMiner/. A server version of the SciMiner is also available for download and enables users to utilize their institution's journal subscriptions. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods ; Genes ; Proteins/physiology ; PubMed ; Software ; Information Storage and Retrieval/methods ; Internet ; MEDLINE ; Publications ; United States
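
Illustrative note on record 6: the abstract describes identifying genes with regular expression patterns plus dictionaries, and resolving ambiguous acronyms by the co-occurrence of the acronym with its description terms. The toy sketch below shows that general idea only. It is not SciMiner's code or scoring scheme; the dictionary, the pattern, and the `min_support` threshold are all hypothetical.

```python
import re

# Hypothetical mini-dictionary: gene symbol -> description terms.
GENE_DICTIONARY = {
    "CAT": {"catalase"},
    "SOD1": {"superoxide", "dismutase"},
    "INS": {"insulin"},
}

# Hypothetical pattern for symbol-like tokens (2 to 6 upper-case letters/digits).
SYMBOL_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]{1,5}\b")


def find_genes(text, min_support=1):
    """Return {symbol: support} for dictionary symbols found in text.

    support counts how many of the symbol's description terms also occur
    in the text; symbols with support below min_support are discarded as
    probable non-gene uses of the acronym.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    hits = {}
    for symbol in SYMBOL_PATTERN.findall(text):
        terms = GENE_DICTIONARY.get(symbol)
        if terms is None:
            continue  # token looks like a symbol but is not in the dictionary
        support = len(terms & words)
        if support >= min_support:
            hits[symbol] = support
    return hits


gene_text = "CAT (catalase) activity and SOD1 superoxide dismutase levels were reduced."
other_text = "The CAT scan was normal."
gene_hits = find_genes(gene_text)
other_hits = find_genes(other_text)
print(gene_hits, other_hits)
```

The second sentence contains the token CAT without any supporting description term, so the sketch drops it; a real system would weigh many more signals than this single count.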
7.
Cancer Res ; 69(1): 300-9, 2009 Jan 01.
Article in English | MEDLINE | ID: mdl-19118015

ABSTRACT

To assess the potential of tumor-associated, alternatively spliced gene products as a source of biomarkers in biological fluids, we have analyzed a large data set of mass spectra derived from the plasma proteome of a mouse model of human pancreatic ductal adenocarcinoma. MS/MS spectra were interrogated for novel splice isoforms using a nonredundant database containing an exhaustive three-frame translation of Ensembl transcripts and gene models from ECgene. This integrated analysis identified 420 distinct splice isoforms, of which 92 did not match any previously annotated mouse protein sequence. We chose seven of those novel variants for validation by reverse transcription-PCR. The results were concordant with the proteomic analysis. All seven novel peptides were successfully amplified in pancreas specimens from both wild-type and mutant mice. Isotopic labeling of cysteine-containing peptides from tumor-bearing mice and wild-type controls enabled relative quantification of the proteins. Differential expression between tumor-bearing and control mice was notable for peptides from novel variants of muscle pyruvate kinase, malate dehydrogenase 1, glyceraldehyde-3-phosphate dehydrogenase, proteoglycan 4, minichromosome maintenance complex component 9, high mobility group box 2, and hepatocyte growth factor activator. Our results show that, in a mouse model for human pancreatic cancer, novel and differentially expressed alternative splice isoforms are detectable in plasma and may be a source of candidate biomarkers.


Subjects
Neoplasm Proteins/blood ; Neoplasm Proteins/genetics ; Pancreatic Neoplasms/blood ; Pancreatic Neoplasms/genetics ; Alternative Splicing ; Amino Acid Sequence ; Animals ; Blood Proteins/genetics ; Disease Models, Animal ; Humans ; Male ; Molecular Sequence Data ; Protein Isoforms ; Reverse Transcriptase Polymerase Chain Reaction
8.
Bioinformatics ; 25(1): 137-8, 2009 Jan 01.
Article in English | MEDLINE | ID: mdl-18812364

ABSTRACT

UNLABELLED: The MiMI molecular interaction repository integrates data from multiple sources, resolves interactions to standard gene names and symbols, links to annotation data from GO, MeSH and PubMed and normalizes the descriptions of interaction type. Here, we describe a Cytoscape plugin that retrieves interaction and annotation data from MiMI and links out to multiple data sources and tools. Community annotation of the interactome is supported. AVAILABILITY: MiMI plugin v3.0.1 can be installed from within Cytoscape 2.6 using the Cytoscape plugin manager in the 'Network and Attribute I/O' category. The plugin is also preloaded when Cytoscape is launched using Java WebStart at http://mimi.ncibi.org by querying a gene and clicking the 'View in MiMI Plugin for Cytoscape' link.


Subjects
Computational Biology/methods ; Software ; Databases, Genetic ; User-Computer Interface
9.
Bioinformatics ; 25(7): 974-6, 2009 Apr 01.
Article in English | MEDLINE | ID: mdl-18326507

ABSTRACT

UNLABELLED: MiSearch is an adaptive biomedical literature search tool that ranks citations based on a statistical model for the likelihood that a user will choose to view them. Citation selections are automatically acquired during browsing and used to dynamically update a likelihood model that includes authorship, journal and PubMed indexing information. The user can optionally elect to include or exclude specific features and vary the importance of timeliness in the ranking. AVAILABILITY: http://misearch.ncibi.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
PubMed ; Software ; Algorithms ; Computational Biology/methods ; Databases, Factual ; Information Storage and Retrieval/methods ; Internet ; User-Computer Interface
10.
Bioinformatics ; 24(23): 2760-6, 2008 Dec 01.
Article in English | MEDLINE | ID: mdl-18849319

ABSTRACT

MOTIVATION: Cell lines are used extensively in biomedical research, but the nomenclature describing cell lines has not been standardized. The problems are both linguistic and experimental. Many ambiguous cell line names appear in the published literature. Users of the same cell line may refer to it in different ways, and cell lines may mutate or become contaminated without the knowledge of the user. As a first step towards rationalizing this nomenclature, we created a cell line knowledgebase (CLKB) with a well-structured collection of names and descriptive data for cell lines cultured in vitro. The objectives of this work are: (i) to assist users in extracting useful information from biomedical text and (ii) to highlight the importance of standardizing cell line names in biomedical research. This CLKB contains a broad collection of cell line names compiled from ATCC, Hyper CLDB and MeSH. In addition to names, the knowledgebase specifies relationships between cell lines. We analyze the use of cell line names in biomedical text. Issues include ambiguous names, polymorphisms in the use of names and the fact that some cell line names are also common English words. Linguistic patterns associated with the occurrence of cell line names are analyzed. Applying these patterns to find additional cell line names in the literature identifies only a small number of additional names. Annotation of microarray gene expression studies is used as a test case. The CLKB facilitates data exploration and comparison of different cell lines in support of clinical and experimental research. AVAILABILITY: The web ontology file for this cell line collection can be downloaded at http://www.stateslab.org/data/celllineOntology/cellline.zip.


Subjects
Cell Line ; Databases, Factual ; Terminology as Topic ; Computational Biology/methods ; MEDLINE
11.
J Bioinform Comput Biol ; 6(3): 493-519, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18574860

ABSTRACT

The systematic inference of biologically relevant influence networks remains a challenging problem in computational biology. Even though the availability of high-throughput data has enabled the use of probabilistic models to infer the plausible structure of such networks, their true interpretation of the biology of the process is questionable. In this work, we propose a network inference methodology, based on the directed information (DTI) criterion, that incorporates the biology of transcription within the framework so as to enable experimentally verifiable inference. We use publicly available embryonic kidney and T-cell microarray datasets to demonstrate our results. We present two variants of network inference via DTI--supervised and unsupervised--and the inferred networks relevant to mammalian nephrogenesis and T-cell activation. Conformity of the obtained interactions with the literature as well as comparison with the coefficient of determination (CoD) method are demonstrated. Apart from network inference, the proposed framework enables the exploration of specific interactions, not just those revealed by data. To illustrate the latter point, a DTI-based framework to resolve interactions between transcription factor modules and target coregulated genes is proposed. Additionally, we show that DTI can be used in conjunction with mutual information to infer higher-order influence networks involving cooperative gene interactions.


Subjects
Computational Biology/methods ; Information Services ; Models, Biological
12.
Genome Biol ; 9(6): R93, 2008.
Article in English | MEDLINE | ID: mdl-18522751

ABSTRACT

We present an in-depth analysis of mouse plasma leading to the development of a publicly available repository composed of 568 liquid chromatography-tandem mass spectrometry runs. A total of 13,779 distinct peptides have been identified with high confidence. The corresponding approximately 3,000 proteins are estimated to span seven orders of magnitude in plasma abundance. A major finding from this study is the identification of novel isoforms and transcript variants not previously predicted from genome analysis.


Subjects
Databases, Protein ; Peptides/analysis ; Plasma/chemistry ; Alternative Splicing ; Animals ; Humans ; Mice ; Protein Isoforms ; Proteomics ; Tandem Mass Spectrometry
13.
Bioinformatics ; 24(12): 1465-6, 2008 Jun 15.
Article in English | MEDLINE | ID: mdl-18445605

ABSTRACT

SUMMARY: Cytoscape enhanced search plugin (ESP) enables searching complex biological networks on multiple attribute fields using logical operators and wildcards. Queries use an intuitive syntax and a simple search line interface. ESP is implemented as a Cytoscape plugin and complements existing search functions in the Cytoscape network visualization and analysis software, allowing users to easily identify nodes, edges and subgraphs of interest, even for very large networks. AVAILABILITY: http://chianti.ucsd.edu/cyto_web/plugins/ CONTACT: ashkenaz@agri.huji.ac.il.


Subjects
Algorithms ; Computer Graphics ; Information Storage and Retrieval/methods ; Models, Biological ; Signal Transduction/physiology ; Software ; User-Computer Interface ; Computer Simulation
14.
J Proteome Res ; 7(6): 2195-203, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18422353

ABSTRACT

The development of liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has made it possible to characterize phosphopeptides in an increasingly large-scale and high-throughput fashion. However, extracting confident phosphopeptide identifications from the resulting large data sets in a similar high-throughput fashion remains difficult, as does rigorously estimating the false discovery rate (FDR) of a set of phosphopeptide identifications. This article describes a data analysis pipeline designed to address these issues. The first step is to reanalyze phosphopeptide identifications that contain ambiguous assignments for the incorporated phosphate(s) to determine the most likely arrangement of the phosphate(s). The next step is to employ an expectation maximization algorithm to estimate the joint distribution of the peptide scores. A linear discriminant analysis is then performed to determine how to optimally combine peptide scores (in this case, from SEQUEST) into a discriminant score that possesses the maximum discriminating power. Based on this discriminant score, the p- and q-values for each phosphopeptide identification are calculated, and the phosphopeptide identification FDR is then estimated. This data analysis approach was applied to data from a study of irradiated human skin fibroblasts to provide a robust estimate of FDR for phosphopeptides. The Phosphopeptide FDR Estimator software is freely available for download at http://ncrr.pnl.gov/software/.


Subjects
Mass Spectrometry/statistics & numerical data ; Phosphopeptides/analysis ; Proteomics/methods ; Algorithms ; Bayes Theorem ; Data Interpretation, Statistical ; Discriminant Analysis ; Fibroblasts/chemistry ; Fibroblasts/cytology ; Fibroblasts/radiation effects ; Humans ; Internet ; Normal Distribution ; ROC Curve ; Reproducibility of Results ; Skin/cytology ; Software
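
Illustrative note on record 14: the abstract states that p- and q-values are calculated for each phosphopeptide identification and that the FDR is then estimated, without giving the formulas. As a rough sketch of the last step only, the code below converts a list of p-values into q-values with the widely used Benjamini-Hochberg step-up adjustment. Whether the Phosphopeptide FDR Estimator uses this exact procedure is an assumption, and the p-values shown are made up.

```python
def q_values(p_values):
    """Benjamini-Hochberg q-values for a list of p-values.

    q for the item at rank r (1-based, ascending p) is the minimum of
    p * m / rank over all ranks >= r, which keeps q monotone in p.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical p-values for four phosphopeptide identifications.
example_p = [0.01, 0.04, 0.03, 0.5]
example_q = q_values(example_p)
for p, q in zip(example_p, example_q):
    print(f"p={p:.2f}  q={q:.4f}")
```

Accepting every identification with q below a chosen threshold then gives a set whose expected false discovery rate is at most that threshold, under the usual assumptions of the procedure.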
15.
Article in English | MEDLINE | ID: mdl-17951820

ABSTRACT

The systematic inference of biologically relevant influence networks remains a challenging problem in computational biology. Even though the availability of high-throughput data has enabled the use of probabilistic models to infer the plausible structure of such networks, their true interpretation of the biology of the process is questionable. In this work, we propose a network inference methodology, based on the directed information (DTI) criterion, which incorporates the biology of transcription within the framework, so as to enable experimentally verifiable inference. We use publicly available embryonic kidney and T-cell microarray datasets to demonstrate our results. We present two variants of network inference via DTI (supervised and unsupervised) and the inferred networks relevant to mammalian nephrogenesis as well as T-cell activation. We demonstrate the conformity of the obtained interactions with literature as well as comparison with the coefficient of determination (CoD) method. Apart from network inference, the proposed framework enables the exploration of specific interactions, not just those revealed by data.


Subjects
Artificial Intelligence ; Computational Biology/methods ; Gene Expression Profiling/methods ; Gene Regulatory Networks/genetics ; Models, Genetic ; Oligonucleotide Array Sequence Analysis/methods ; Transcription Factors/metabolism ; Algorithms ; Computer Simulation
16.
PLoS Comput Biol ; 3(4): e63, 2007 Apr 06.
Article in English | MEDLINE | ID: mdl-17411336

ABSTRACT

The MYC genes encode nuclear sequence-specific DNA-binding proteins that are pleiotropic regulators of cellular function, and the c-MYC proto-oncogene is deregulated and/or mutated in most human cancers. Experimental studies of MYC binding to the genome are not fully consistent. While many c-MYC recognition sites can be identified in c-MYC responsive genes, other motif matches, even experimentally confirmed sites, are associated with genes showing no c-MYC response. We have developed a computational model that integrates multiple sources of evidence to predict which genes will be bound and regulated by MYC in vivo. First, a Bayesian network classifier is used to predict those c-MYC recognition sites that are most likely to exhibit high-occupancy binding in chromatin immunoprecipitation studies. This classifier incorporates genomic sequence, experimentally determined genomic chromatin acetylation islands, and predicted methylation status from a computational model estimating the likelihood of genomic DNA methylation. We find that the predictions from this classifier are also applicable to other transcription factors, such as cAMP-response element-binding protein, whose binding sites are sensitive to DNA methylation. Second, the MYC binding probability is combined with the gene expression profile data from nine independent microarray datasets in multiple tissues. Finally, we may consider gene function annotations in Gene Ontology to predict the c-MYC targets. We assess the performance of our prediction results by comparing them with the c-MYC targets identified in the biomedical literature. In total, we predict 460 likely c-MYC target genes in the human genome, of which 67 have been reported to be both bound and regulated by MYC, 68 are bound by MYC, and another 80 are MYC-regulated. The approach thus successfully identifies many known c-MYC targets and suggests many novel sites. Our findings suggest that to identify c-MYC genomic targets, integration of different data sources helps to improve the accuracy.


Subjects
Chromatin/chemistry ; Chromatin/genetics ; DNA-Binding Proteins/chemistry ; DNA-Binding Proteins/genetics ; Gene Expression Profiling/methods ; Models, Biological ; Sequence Analysis, Protein/methods ; Transcription Factors/chemistry ; Transcription Factors/genetics ; Binding Sites ; Chromosome Mapping ; Computer Simulation ; Models, Chemical ; Protein Binding ; Protein Interaction Mapping ; Proto-Oncogene Mas ; Structure-Activity Relationship ; Systems Integration
17.
OMICS ; 11(1): 96-115, 2007.
Article in English | MEDLINE | ID: mdl-17411398

ABSTRACT

Gene expression responses are complex and frequently involve the actions of many genes to effect coordinated patterns. We hypothesized these coordinated responses are evolutionarily conserved and used a comparison of human and mouse gene expression profiles to identify the most prominent conserved features across a set of normal mammalian tissues. Based on data from multiple studies across multiple tissues in human and mouse, 13 gene expression modes across multiple tissues were identified in each of these species using principal component analysis. Strikingly, 1-to-1 pairing of human and mouse modes was observed in 12 out of 13 modes obtained from the two species independently. These paired modes define evolutionarily conserved gene expression response modes (CGEMs). Notably, in this study we were able to extract biological responses that are not overwhelmed by laboratory-to-laboratory or species-to-species variation. Of the variation in our gene expression dataset, 84% can be explained using these CGEMs. Functional annotation was performed using Gene Ontology, pathway, and transcription factor binding site over-representation. Our conclusion is that we found an unbiased way of obtaining conserved gene response modes that accounts for a considerable portion of gene expression variation in a given dataset and validates the conservation of major gene expression response modes across the mammals.


Subjects
Gene Expression Profiling ; Animals ; Binding Sites ; Cluster Analysis ; Conserved Sequence ; Evolution, Molecular ; Gene Expression Regulation ; Humans ; Mice ; Models, Statistical ; Oligonucleotide Array Sequence Analysis ; Principal Component Analysis ; Species Specificity ; Transcription Factors/metabolism
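
Illustrative note on record 17: the abstract reports that expression modes were obtained by principal component analysis and that a set of modes explains 84% of the variation. The sketch below shows how a "fraction of variance explained" figure of that kind can be computed from an expression matrix. It assumes NumPy is available, and the toy data (30 samples, 50 genes, two latent modes) are simulated rather than taken from the study.

```python
import numpy as np


def explained_variance_fractions(expr):
    """Fraction of total variance carried by each principal component.

    expr is a samples x genes array. After centering each gene, the
    squared singular values are proportional to the component variances.
    """
    centered = expr - expr.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


# Simulated toy data: two latent "modes" plus a little noise.
rng = np.random.default_rng(0)
modes = rng.normal(size=(30, 2))
loadings = rng.normal(size=(2, 50))
expr = modes @ loadings + 0.1 * rng.normal(size=(30, 50))

frac = explained_variance_fractions(expr)
top2 = frac[:2].sum()
print(f"variance explained by the first two components: {top2:.1%}")
```

Summing the leading entries of `frac` for the retained components is one common way to arrive at a single percentage; pairing components across species, as the study did, is a separate step not shown here.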
18.
Bioinformatics ; 23(2): 232-9, 2007 Jan 15.
Article in English | MEDLINE | ID: mdl-17110368

ABSTRACT

MOTIVATION: With the rapid increase in the availability of biological graph datasets, there is a growing need for effective and efficient graph querying methods. Due to the noisy and incomplete characteristics of these datasets, exact graph matching methods have limited use and approximate graph matching methods are required. Unfortunately, existing graph matching methods are too restrictive as they only allow exact or near exact graph matching. This paper presents a novel approximate graph matching technique called SAGA. This technique employs a flexible model for computing graph similarity, which allows for node gaps, node mismatches and graph structural differences. SAGA employs an indexing technique that allows it to efficiently evaluate queries even against large graph datasets. RESULTS: SAGA has been used to query biological pathways and literature datasets, which has revealed interesting similarities between distinct pathways that cannot be found by existing methods. These matches associate seemingly unrelated biological processes, connect studies in different sub-areas of biomedical research and thus pose hypotheses for new discoveries. SAGA is also orders of magnitude faster than existing methods. AVAILABILITY: SAGA can be accessed freely via the web at http://www.eecs.umich.edu/saga. Binaries are also freely available at this website.


Subjects
Algorithms ; Database Management Systems ; Databases, Protein ; Models, Biological ; Proteome/metabolism ; Signal Transduction/physiology ; Software ; Computer Graphics ; Computer Simulation ; Pattern Recognition, Automated/methods ; User-Computer Interface
19.
Article in English | MEDLINE | ID: mdl-18340376

ABSTRACT

Motif discovery for the identification of functional regulatory elements underlying gene expression is a challenging problem. Sequence inspection often leads to discovery of novel motifs (including transcription factor sites) with previously uncharacterized function in gene expression. Coupled with the complexity underlying tissue-specific gene expression, there are several motifs that are putatively responsible for expression in a certain cell type. This has important implications in understanding fundamental biological processes such as development and disease progression. In this work, we present an approach to the identification of motifs (not necessarily transcription factor sites) and examine its application to some questions in current bioinformatics research. These motifs are seen to discriminate tissue-specific gene promoter or regulatory regions from those that are not tissue-specific. There are two main contributions of this work. Firstly, we propose the use of directed information for such classification-constrained motif discovery, and then use the selected features with a support vector machine (SVM) classifier to find the tissue specificity of any sequence of interest. Such analysis yields several novel interesting motifs that merit further experimental characterization. Furthermore, this approach leads to a principled framework for the prospective examination of any chosen motif as a discriminatory motif for a group of coexpressed/coregulated genes, thereby integrating sequence and expression perspectives. We hypothesize that the discovery of these motifs would enable the large-scale investigation of the tissue-specific regulatory role of any conserved sequence element identified from genome-wide studies.

20.
Article in English | MEDLINE | ID: mdl-18309363

ABSTRACT

Most current methods for gene regulatory network identification lead to the inference of steady-state networks, that is, networks prevalent over all times, a hypothesis which has been challenged. There has been a need to infer and represent networks in a dynamic, that is, time-varying fashion, in order to account for different cellular states affecting the interactions amongst genes. In this work, we present an approach, regime-SSM, to understand gene regulatory networks within such a dynamic setting. The approach uses a clustering method based on these underlying dynamics, followed by system identification using a state-space model for each learnt cluster, to infer a network adjacency matrix. Finally, we report our results on the mouse embryonic kidney dataset as well as the T-cell activation-based expression dataset and demonstrate conformity with reported experimental evidence.
