1.
Plant J ; 107(5): 1363-1386, 2021 09.
Article in English | MEDLINE | ID: mdl-34160110

ABSTRACT

The photosynthetic capacity of mature leaves increases after several days' exposure to constant or intermittent episodes of high light (HL) and is manifested primarily as changes in chloroplast physiology. How this chloroplast-level acclimation to HL is initiated and controlled is unknown. From expanded Arabidopsis leaves, we determined HL-dependent changes in transcript abundance of 3844 genes in a 0-6 h time-series transcriptomics experiment. It was hypothesized that among such genes were those that contribute to the initiation of HL acclimation. By focusing on differentially expressed transcription (co-)factor genes and applying dynamic statistical modelling to the temporal transcriptomics data, a regulatory network of 47 predominantly photoreceptor-regulated transcription (co-)factor genes was inferred. The most connected gene in this network was B-BOX DOMAIN CONTAINING PROTEIN32 (BBX32). Plants overexpressing BBX32 were strongly impaired in acclimation to HL and displayed perturbed expression of photosynthesis-associated genes under low light (LL) and after exposure to HL. These observations led to the demonstration that CRYPTOCHROME1, LONG HYPOCOTYL5, CONSTITUTIVELY PHOTOMORPHOGENIC1 and SUPPRESSOR OF PHYA-105, as well as BBX32, are important for regulating chloroplast-level acclimation. In addition, the BBX32-centric gene regulatory network provides a view of the transcriptional control of acclimation in mature leaves distinct from other photoreceptor-regulated processes, such as seedling photomorphogenesis.


Subjects
Acclimatization/genetics , Arabidopsis Proteins/metabolism , Arabidopsis/genetics , Carrier Proteins/metabolism , Gene Expression Regulation, Plant , Transcriptome , Acclimatization/radiation effects , Arabidopsis/physiology , Arabidopsis/radiation effects , Arabidopsis Proteins/genetics , Bayes Theorem , Carrier Proteins/genetics , Chloroplasts/radiation effects , Gene Expression Profiling , Gene Regulatory Networks , Light , Photosynthesis/radiation effects , Plant Leaves/genetics , Plant Leaves/physiology , Plant Leaves/radiation effects
2.
Bioinformatics ; 34(5): 884-886, 2018 03 01.
Article in English | MEDLINE | ID: mdl-29126246

ABSTRACT

Summary: Every year, a large number of novel algorithms are introduced to the scientific community for a myriad of applications, but using these across different research groups is often troublesome, due to suboptimal implementations and specific dependency requirements. This does not have to be the case, as public cloud computing services can easily house tractable implementations within self-contained dependency environments, making the methods easily accessible to a wider public. We have taken 14 popular methods, the majority related to expression data or promoter analysis, developed these up to a good implementation standard and housed the tools in isolated Docker containers which we integrated into the CyVerse Discovery Environment, making these easily usable for a wide community as part of the CyVerse UK project. Availability and implementation: The integrated apps can be found at http://www.cyverse.org/discovery-environment, while the raw code is available at https://github.com/cyversewarwick and the corresponding Docker images are housed at https://hub.docker.com/r/cyversewarwick/. Contact: info@cyverse.warwick.ac.uk or D.L.Wild@warwick.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Cloud Computing , Computational Biology/methods , Gene Expression Regulation , Promoter Regions, Genetic , Software , Algorithms , Gene Expression Profiling/methods , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods
3.
Plant Cell ; 28(2): 345-66, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26842464

ABSTRACT

In Arabidopsis thaliana, changes in metabolism and gene expression drive increased drought tolerance and initiate diverse drought avoidance and escape responses. To address regulatory processes that link these responses, we set out to identify genes that govern early responses to drought. To do this, a high-resolution time series transcriptomics data set was produced, coupled with detailed physiological and metabolic analyses of plants subjected to a slow transition from well-watered to drought conditions. A total of 1815 drought-responsive differentially expressed genes were identified. The early changes in gene expression coincided with a drop in carbon assimilation, and only in the late stages with an increase in foliar abscisic acid content. To identify gene regulatory networks (GRNs) mediating the transition between the early and late stages of drought, we used Bayesian network modeling of differentially expressed transcription factor (TF) genes. This approach identified AGAMOUS-LIKE22 (AGL22) as a key hub gene in a TF GRN. It has previously been shown that AGL22 is involved in the transition from vegetative state to flowering, but here we show that AGL22 expression influences steady-state photosynthetic rates and lifetime water use. This suggests that AGL22 uniquely regulates a transcriptional network during drought stress, linking changes in primary metabolism and the initiation of stress responses.


Subjects
Abscisic Acid/metabolism , Arabidopsis Proteins/metabolism , Arabidopsis/genetics , Gene Expression Regulation, Plant , Plant Growth Regulators/metabolism , Transcription Factors/metabolism , Arabidopsis/growth & development , Arabidopsis/physiology , Arabidopsis Proteins/genetics , Bayes Theorem , Cluster Analysis , Droughts , Gene Regulatory Networks , Mutation , Phenotype , Photosynthesis/physiology , Stress, Physiological , Transcription Factors/genetics
4.
Plant Cell ; 27(11): 3038-64, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26566919

ABSTRACT

Transcriptional reprogramming is integral to effective plant defense. Pathogen effectors act transcriptionally and posttranscriptionally to suppress defense responses. A major challenge to understanding disease and defense responses is discriminating between transcriptional reprogramming associated with microbial-associated molecular pattern (MAMP)-triggered immunity (MTI) and that orchestrated by effectors. A high-resolution time course of genome-wide expression changes following challenge with Pseudomonas syringae pv tomato DC3000 and the nonpathogenic mutant strain DC3000hrpA- allowed us to establish causal links between the activities of pathogen effectors and suppression of MTI and infer with high confidence a range of processes specifically targeted by effectors. Analysis of this information-rich data set with a range of computational tools provided insights into the earliest transcriptional events triggered by effector delivery, regulatory mechanisms recruited, and biological processes targeted. We show that the majority of genes contributing to disease or defense are induced within 6 h postinfection, significantly before pathogen multiplication. Suppression of chloroplast-associated genes is a rapid MAMP-triggered defense response, and suppression of genes involved in chromatin assembly and induction of ubiquitin-related genes coincide with pathogen-induced abscisic acid accumulation. Specific combinations of promoter motifs are engaged in fine-tuning the MTI response and active transcriptional suppression at specific promoter configurations by P. syringae.


Subjects
Arabidopsis/immunology , Immunosuppression Therapy , Pathogen-Associated Molecular Pattern Molecules/metabolism , Plant Immunity/genetics , Plant Leaves/immunology , Pseudomonas syringae/physiology , Transcription, Genetic , Arabidopsis/genetics , Arabidopsis/microbiology , Base Sequence , Chromatin/metabolism , Gene Expression Profiling , Gene Expression Regulation, Plant , Gene Ontology , Gene Regulatory Networks , Genes, Plant , Molecular Sequence Data , Nucleotide Motifs/genetics , Plant Diseases/genetics , Plant Diseases/immunology , Plant Diseases/microbiology , Plant Leaves/genetics , Plant Leaves/microbiology , Promoter Regions, Genetic/genetics , Pseudomonas syringae/growth & development , Transcription Factors/metabolism
5.
Stat Appl Genet Mol Biol ; 15(1): 83-6, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26910751

ABSTRACT

The integration of multi-dimensional datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct, but often complementary, information. However, the large amount of data adds burden to any inference task. Flexible Bayesian methods may reduce the necessity for strong modelling assumptions, but can also increase the computational burden. We present an improved implementation of a Bayesian correlated clustering algorithm that permits integrated clustering to be routinely performed across multiple datasets, each with tens of thousands of items. By exploiting GPU-based computation, we are able to improve runtime performance of the algorithm by almost four orders of magnitude. This permits analysis across genomic-scale data sets, greatly expanding the range of applications over those originally possible. MDI is available here: http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/.


Subjects
Computational Biology/methods , Genomics/methods , Algorithms , Cluster Analysis , Markov Chains , Monte Carlo Method , Software , Systems Biology/methods
6.
Bioinformatics ; 31(12): i97-105, 2015 Jun 15.
Article in English | MEDLINE | ID: mdl-26072515

ABSTRACT

MOTIVATION: The ability to jointly learn gene regulatory networks (GRNs) in, or leverage GRNs between, related species would allow the vast amount of legacy data obtained in model organisms to inform the GRNs of more complex, or economically or medically relevant counterparts. Examples include transferring information from Arabidopsis thaliana into related crop species for food security purposes, or from mice into humans for medical applications. Here we develop two related Bayesian approaches to network inference that allow GRNs to be jointly inferred in, or leveraged between, several related species: in one framework, network information is directly propagated between species; in the second hierarchical approach, network information is propagated via an unobserved 'hypernetwork'. In both frameworks, information about network similarity is captured via graph kernels, with the networks additionally informed by species-specific time series gene expression data, when available, using Gaussian processes to model the dynamics of gene expression. RESULTS: Results on in silico benchmarks demonstrate that joint inference, and leveraging of known networks between species, offers better accuracy than standalone inference. The direct propagation of network information via the non-hierarchical framework is more appropriate when there are relatively few species, while the hierarchical approach is better suited when there are many species. Both methods are robust to small amounts of mislabelling of orthologues. Finally, the use of Saccharomyces cerevisiae data and networks to inform inference of networks in the fission yeast Schizosaccharomyces pombe predicts a novel role in cell cycle regulation for Gas1 (SPAC19B12.02c), a 1,3-beta-glucanosyltransferase. AVAILABILITY AND IMPLEMENTATION: MATLAB code is available from http://go.warwick.ac.uk/systemsbiology/software/.


Subjects
Gene Expression Profiling , Gene Regulatory Networks , Algorithms , Bayes Theorem , Cell Cycle/genetics , Computer Simulation , Models, Genetic , Saccharomyces cerevisiae/genetics , Schizosaccharomyces/genetics , Software
7.
Stat Appl Genet Mol Biol ; 14(3): 307-10, 2015 Jun.
Article in English | MEDLINE | ID: mdl-26030796

ABSTRACT

Here we introduce the causal structure identification (CSI) package, a Gaussian process-based approach to inferring gene regulatory networks (GRNs) from multiple time series data. The standard CSI approach infers a single GRN via joint learning from multiple time series datasets; the hierarchical approach (HCSI) infers a separate GRN for each dataset, albeit with the networks constrained to favor similar structures, allowing for the identification of context-specific networks. The software is implemented in MATLAB and includes a graphical user interface (GUI) for user-friendly inference. Finally, the GUI can be connected to high-performance computer clusters to facilitate analysis of large genomic datasets.


Subjects
Gene Expression Profiling/methods , Software , Bayes Theorem , Gene Expression Regulation , Gene Regulatory Networks
8.
Plant Cell ; 24(9): 3530-57, 2012 Sep.
Article in English | MEDLINE | ID: mdl-23023172

ABSTRACT

Transcriptional reprogramming forms a major part of a plant's response to pathogen infection. Many individual components and pathways operating during plant defense have been identified, but our knowledge of how these different components interact is still rudimentary. We generated a high-resolution time series of gene expression profiles from a single Arabidopsis thaliana leaf during infection by the necrotrophic fungal pathogen Botrytis cinerea. Approximately one-third of the Arabidopsis genome is differentially expressed during the first 48 h after infection, with the majority of changes in gene expression occurring before significant lesion development. We used computational tools to obtain a detailed chronology of the defense response against B. cinerea, highlighting the times at which signaling and metabolic processes change, and identify transcription factor families operating at different times after infection. Motif enrichment and network inference predicted regulatory interactions, and testing of one such prediction identified a role for TGA3 in defense against necrotrophic pathogens. These data provide an unprecedented level of detail about transcriptional changes during a defense response and are suited to systems biology analyses to generate predictive models of the gene regulatory networks mediating the Arabidopsis response to B. cinerea.


Subjects
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Botrytis/physiology , Gene Expression Regulation, Plant/genetics , Genome, Plant/genetics , Plant Diseases/immunology , Arabidopsis/immunology , Arabidopsis/metabolism , Arabidopsis/microbiology , Botrytis/growth & development , Gene Expression Profiling , Gene Regulatory Networks , Models, Genetic , Mutation , Nucleotide Motifs , Oligonucleotide Array Sequence Analysis , Plant Diseases/microbiology , Plant Immunity , Plant Leaves/genetics , Plant Leaves/metabolism , Plant Leaves/microbiology , Promoter Regions, Genetic/genetics , Signal Transduction , Time Factors , Transcription Factors/genetics , Transcriptome
9.
Plant J ; 75(1): 26-39, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23578292

ABSTRACT

A model is presented describing the gene regulatory network surrounding three similar NAC transcription factors that have roles in Arabidopsis leaf senescence and stress responses. ANAC019, ANAC055 and ANAC072 belong to the same clade of NAC domain genes and have overlapping expression patterns. A combination of promoter DNA/protein interactions identified using yeast 1-hybrid analysis and modelling using gene expression time course data has been applied to predict the regulatory network upstream of these genes. Similarities and divergence in regulation during a variety of stress responses are predicted by different combinations of upstream transcription factors binding and also by the modelling. Mutant analysis with potential upstream genes was used to test and confirm some of the predicted interactions. Gene expression analysis in mutants of ANAC019 and ANAC055 at different times during leaf senescence has revealed a distinctly different role for each of these genes. Yeast 1-hybrid analysis is shown to be a valuable tool that can distinguish clades of binding proteins and be used to test and quantify protein binding to predicted promoter motifs.


Subjects
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Botrytis/physiology , Gene Expression Regulation, Plant , Stress, Physiological , Arabidopsis/physiology , Arabidopsis Proteins/metabolism , Cellular Senescence , Gene Expression Profiling , Gene Regulatory Networks , Mutation , Oligonucleotide Array Sequence Analysis , Plant Diseases/microbiology , Plant Leaves/genetics , Plant Leaves/physiology , Plants, Genetically Modified , Promoter Regions, Genetic/genetics , Protein Binding , Transcription Factors/genetics , Transcription Factors/metabolism , Two-Hybrid System Techniques
10.
Bioinformatics ; 29(5): 580-7, 2013 Mar 01.
Article in English | MEDLINE | ID: mdl-23314126

ABSTRACT

MOTIVATION: The problem of ab initio protein folding is one of the most difficult in modern computational biology. The prediction of residue contacts within a protein provides a more tractable immediate step. Recently introduced maximum entropy-based correlated mutation measures (CMMs), such as direct information, have been successful in predicting residue contacts. However, most correlated mutation studies focus on proteins that have large good-quality multiple sequence alignments (MSA) because the power of correlated mutation analysis falls as the size of the MSA decreases. However, even with small autogenerated MSAs, maximum entropy-based CMMs contain information. To make use of this information, in this article, we focus not on general residue contacts but contacts between residues in β-sheets. The strong constraints and prior knowledge associated with β-contacts are ideally suited for prediction using a method that incorporates an often noisy CMM. RESULTS: Using contrastive divergence, a statistical machine learning technique, we have calculated a maximum entropy-based CMM. We have integrated this measure with a new probabilistic model for β-contact prediction, which is used to predict both residue- and strand-level contacts. Using our model on a standard non-redundant dataset, we significantly outperform a 2D recurrent neural network architecture, achieving a 5% improvement in true positives at the 5% false-positive rate at the residue level. At the strand level, our approach is competitive with the state-of-the-art single methods achieving precision of 61.0% and recall of 55.4%, while not requiring residue solvent accessibility as an input. AVAILABILITY: http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/
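
For readers unfamiliar with the precision and recall figures quoted in this abstract, the following Python sketch shows how the two quantities are usually computed from counts of predicted contacts. The function name and the counts are hypothetical and are not taken from the paper; it is only a reminder of the standard definitions.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from raw prediction counts.

    precision = TP / (TP + FP): fraction of predicted contacts that are real.
    recall    = TP / (TP + FN): fraction of real contacts that were predicted.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical counts for illustration only (not data from the paper).
p, r = precision_recall(true_pos=30, false_pos=10, false_neg=20)
print(f"precision={p:.3f} recall={r:.3f}")
```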


Subjects
Artificial Intelligence , Models, Statistical , Protein Structure, Secondary , Entropy , Models, Molecular , Mutation , Neural Networks, Computer , Protein Folding , Proteins/chemistry , Proteins/genetics , Sequence Alignment , Sequence Analysis, Protein
11.
Plant Cell ; 23(3): 873-94, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21447789

ABSTRACT

Leaf senescence is an essential developmental process that impacts dramatically on crop yields and involves altered regulation of thousands of genes and many metabolic and signaling pathways, resulting in major changes in the leaf. The regulation of senescence is complex, and although senescence regulatory genes have been characterized, there is little information on how these function in the global control of the process. We used microarray analysis to obtain a high-resolution time-course profile of gene expression during development of a single leaf over a 3-week period to senescence. A complex experimental design approach and a combination of methods were used to extract high-quality replicated data and to identify differentially expressed genes. The multiple time points enable the use of highly informative clustering to reveal distinct time points at which signaling and metabolic pathways change. Analysis of motif enrichment, as well as comparison of transcription factor (TF) families showing altered expression over the time course, identify clear groups of TFs active at different stages of leaf development and senescence. These data enable connection of metabolic processes, signaling pathways, and specific TF activity, which will underpin the development of network models to elucidate the process of senescence.


Subjects
Arabidopsis Proteins/analysis , Arabidopsis/genetics , Gene Expression Regulation, Plant , Plant Leaves/metabolism , Analysis of Variance , Arabidopsis/growth & development , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Chlorophyll/analysis , Cluster Analysis , Gene Expression Profiling , Microarray Analysis/methods , Models, Biological , Multigene Family , Plant Growth Regulators/analysis , Plant Leaves/genetics , Plant Leaves/growth & development , Promoter Regions, Genetic , RNA, Plant/genetics , Transcription Factors/metabolism
12.
Bioinformatics ; 28(12): i233-41, 2012 Jun 15.
Article in English | MEDLINE | ID: mdl-22689766

ABSTRACT

MOTIVATION: The generation of time series transcriptomic datasets collected under multiple experimental conditions has proven to be a powerful approach for disentangling complex biological processes, allowing for the reverse engineering of gene regulatory networks (GRNs). Most methods for reverse engineering GRNs from multiple datasets assume that each of the time series was generated from networks with identical topology. In this study, we outline a hierarchical, non-parametric Bayesian approach for reverse engineering GRNs using multiple time series that can be applied in a number of novel situations including: (i) where different, but overlapping sets of transcription factors are expected to bind in the different experimental conditions; that is, where switching events could potentially arise under the different treatments and (ii) for inference in evolutionarily related species in which orthologous GRNs exist. More generally, the method can be used to identify context-specific regulation by leveraging time series gene expression data alongside methods that can identify putative lists of transcription factors or transcription factor targets. RESULTS: The hierarchical inference outperforms related (but non-hierarchical) approaches when the networks used to generate the data were identical, and performs comparably even when the networks used to generate data were independent. The method was subsequently used alongside yeast one-hybrid and microarray time series data to infer potential transcriptional switches in Arabidopsis thaliana response to stress. The results confirm previous biological studies and allow for additional insights into gene regulation under various abiotic stresses. AVAILABILITY: The methods outlined in this article have been implemented in MATLAB and are available on request.


Subjects
Bayes Theorem , Gene Regulatory Networks , Statistics, Nonparametric , Algorithms , Arabidopsis/genetics , Gene Expression Regulation , Models, Theoretical , Transcription Factors/genetics , Two-Hybrid System Techniques
13.
Bioinformatics ; 28(24): 3290-7, 2012 Dec 15.
Article in English | MEDLINE | ID: mdl-23047558

ABSTRACT

MOTIVATION: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct, but often complementary, information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets. RESULTS: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI's performance is comparable with the present state-of-the-art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques, as well as to non-integrative approaches, demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods.
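
As a rough illustration of the Dirichlet-multinomial allocation prior named in this abstract, the Python sketch below draws mixture weights from a symmetric Dirichlet distribution and then allocates items to components according to those weights. The function name, its parameters, and the choice of concentration alpha / K per component are assumptions made for illustration; the sketch covers a single dataset only and does not include MDI's coupling between datasets or any data likelihood.

```python
import random


def sample_dma_prior(n_items, n_components, alpha, seed=0):
    """Draw (weights, labels) from a Dirichlet-multinomial allocation prior.

    weights ~ Dirichlet(alpha/K, ..., alpha/K)   (assumed symmetric form)
    labels[i] ~ Categorical(weights), independently for each item.
    """
    rng = random.Random(seed)
    shape = alpha / n_components
    # A Dirichlet draw is a vector of independent Gamma draws, normalised.
    gammas = [rng.gammavariate(shape, 1.0) for _ in range(n_components)]
    total = sum(gammas)
    weights = [g / total for g in gammas]
    # Allocate each item to a component with probability given by weights.
    labels = rng.choices(range(n_components), weights=weights, k=n_items)
    return weights, labels


weights, labels = sample_dma_prior(n_items=50, n_components=10, alpha=2.0)
print("occupied components:", sorted(set(labels)))
```

With a small concentration per component, most of the prior mass typically falls on a few components, which is why such a prior can leave many components empty.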


Subjects
Genomics/methods , Models, Statistical , Bayes Theorem , Chromatin Immunoprecipitation , Cluster Analysis , Gene Expression , Gene Expression Profiling/methods , Normal Distribution , Oligonucleotide Array Sequence Analysis , Protein Interaction Mapping , Saccharomyces cerevisiae/genetics , Systems Biology
14.
Biophys J ; 102(4): 878-86, 2012 Feb 22.
Article in English | MEDLINE | ID: mdl-22385859

ABSTRACT

Nested sampling is a Bayesian sampling technique developed to explore probability distributions localized in an exponentially small area of the parameter space. The algorithm provides both posterior samples and an estimate of the evidence (marginal likelihood) of the model. The nested sampling algorithm also provides an efficient way to calculate free energies and the expectation value of thermodynamic observables at any temperature, through a simple post-processing of the output. Previous applications of the algorithm have yielded large efficiency gains over other sampling techniques, including parallel tempering. In this article, we describe a parallel implementation of the nested sampling algorithm and its application to the problem of protein folding in a Go-like force field of empirical potentials that were designed to stabilize secondary structure elements in room-temperature simulations. We demonstrate the method by conducting folding simulations on a number of small proteins that are commonly used for testing protein-folding procedures. A topological analysis of the posterior samples is performed to produce energy landscape charts, which give a high-level description of the potential energy surface for the protein folding simulations. These charts provide qualitative insights into both the folding process and the nature of the model and force field used.
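
To make the evidence estimate described in this abstract concrete, here is a minimal serial Python sketch of nested sampling on a toy one-dimensional problem. Everything in it is an assumption chosen for illustration (a uniform prior on [0, 1], a Gaussian likelihood centred at 0.5, the function names, and the deterministic prior-mass shrinkage exp(-i/N)); it is not the parallel implementation or the protein force field from the paper. Because the toy likelihood is symmetric and unimodal, the constrained prior can be sampled exactly, which a realistic application would replace with a Markov chain or similar move.

```python
import math
import random

MU, SIGMA = 0.5, 0.1  # toy likelihood parameters (assumed for illustration)


def likelihood(x):
    """Normalised Gaussian likelihood on the unit interval."""
    z = (x - MU) / SIGMA
    return math.exp(-0.5 * z * z) / (SIGMA * math.sqrt(2.0 * math.pi))


def nested_sampling(n_live=200, n_iter=2000, seed=0):
    """Estimate the evidence Z = integral of likelihood(x) over the prior."""
    rng = random.Random(seed)
    live = [rng.random() for _ in range(n_live)]  # draws from the prior
    evidence = 0.0
    x_prev = 1.0  # prior mass enclosed by the current likelihood contour
    for i in range(1, n_iter + 1):
        # Find the live point with the lowest likelihood.
        worst = min(range(n_live), key=lambda k: likelihood(live[k]))
        l_star = likelihood(live[worst])
        # Expected shrinkage of the enclosed prior mass per iteration.
        x_curr = math.exp(-i / n_live)
        evidence += l_star * (x_prev - x_curr)
        x_prev = x_curr
        # Replace it with a prior draw constrained to likelihood > l_star.
        # Here that region is the interval |x - MU| < radius.
        radius = abs(live[worst] - MU)
        live[worst] = MU + radius * rng.uniform(-1.0, 1.0)
    # Add the contribution of the remaining live points.
    evidence += x_prev * sum(likelihood(x) for x in live) / n_live
    return evidence


print("estimated evidence:", nested_sampling())
```

For this toy problem the exact evidence is very close to 1, because almost all of the Gaussian's mass lies inside the unit interval, so the estimate should land near that value up to sampling noise.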


Subjects
Models, Molecular , Protein Folding , Bacterial Proteins/chemistry , Bayes Theorem , Peptides/chemistry , Protein Structure, Secondary , Thermodynamics
15.
BMC Bioinformatics ; 12: 399, 2011 Oct 13.
Article in English | MEDLINE | ID: mdl-21995452

ABSTRACT

BACKGROUND: Post-genomic molecular biology has resulted in an explosion of data, providing measurements for large numbers of genes, proteins and metabolites. Time series experiments have become increasingly common, necessitating the development of novel analysis tools that capture the resulting data structure. Outlier measurements at one or more time points present a significant challenge, while potentially valuable replicate information is often ignored by existing techniques. RESULTS: We present a generative model-based Bayesian hierarchical clustering algorithm for microarray time series that employs Gaussian process regression to capture the structure of the data. By using a mixture model likelihood, our method permits a small proportion of the data to be modelled as outlier measurements, and adopts an empirical Bayes approach which uses replicate observations to inform a prior distribution of the noise variance. The method automatically learns the optimum number of clusters and can incorporate non-uniformly sampled time points. Using a wide variety of experimental data sets, we show that our algorithm consistently yields higher quality and more biologically meaningful clusters than current state-of-the-art methodologies. We highlight the importance of modelling outlier values by demonstrating that noisy genes can be grouped with other genes of similar biological function. We demonstrate the importance of including replicate information, which we find enables the discrimination of additional distinct expression profiles. CONCLUSIONS: By incorporating outlier measurements and replicate values, this clustering algorithm for time series microarray data provides a step towards a better treatment of the noise inherent in measurements from high-throughput genomic technologies. Timeseries BHC is available as part of the R package 'BHC' (version 1.5), which is available for download from Bioconductor (version 2.9 and above) via http://www.bioconductor.org/packages/release/bioc/html/BHC.html?pagewanted=all.


Subjects
Bayes Theorem , Oligonucleotide Array Sequence Analysis/methods , Algorithms , Cluster Analysis , Gene Expression Profiling , Humans , Models, Biological , Normal Distribution , Saccharomyces cerevisiae
16.
Semin Cell Dev Biol ; 20(7): 863-8, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19682595

ABSTRACT

A major challenge in systems biology is the ability to model complex regulatory interactions, such as gene regulatory networks, and a number of computational approaches have been developed over recent years to address this challenge. This paper reviews a number of these approaches, with a focus on probabilistic graphical models and the integration of diverse data sets, such as gene expression and transcription factor binding site location and activity.


Subjects
Chromatin Immunoprecipitation/methods , Gene Expression , Gene Regulatory Networks , Oligonucleotide Array Sequence Analysis/methods , Sequence Analysis, DNA/methods , Systems Biology/methods
17.
Bioinformatics ; 26(12): i158-67, 2010 Jun 15.
Article in English | MEDLINE | ID: mdl-20529901

ABSTRACT

MOTIVATION: We present a method for directly inferring transcriptional modules (TMs) by integrating gene expression and transcription factor binding (ChIP-chip) data. Our model extends a hierarchical Dirichlet process mixture model to allow data fusion on a gene-by-gene basis. This encodes the intuition that co-expression and co-regulation are not necessarily equivalent and hence we do not expect all genes to group similarly in both datasets. In particular, it allows us to identify the subset of genes that share the same structure of transcriptional modules in both datasets. RESULTS: We find that by working on a gene-by-gene basis, our model is able to extract clusters with greater functional coherence than existing methods. By combining gene expression and transcription factor binding (ChIP-chip) data in this way, we are better able to determine the groups of genes that are most likely to represent underlying TMs. AVAILABILITY: If interested in the code for the work presented in this article, please contact the authors. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Expression Profiling/methods , Transcription Factors/metabolism , Bayes Theorem , Binding Sites , Multigene Family , Oligonucleotide Array Sequence Analysis , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
18.
BMC Genomics ; 11: 10, 2010 Jan 06.
Article in English | MEDLINE | ID: mdl-20053288

ABSTRACT

BACKGROUND: During the lifetime of a fermenter culture, the soil bacterium S. coelicolor undergoes a major metabolic switch from exponential growth to antibiotic production. We have studied gene expression patterns during this switch, using a specifically designed Affymetrix genechip and a high-resolution time-series of fermenter-grown samples. RESULTS: Surprisingly, we find that the metabolic switch actually consists of multiple finely orchestrated switching events. Strongly coherent clusters of genes show drastic changes in gene expression already many hours before the classically defined transition phase where the switch from primary to secondary metabolism was expected. The main switch in gene expression takes only 2 hours, and changes in antibiotic biosynthesis genes are delayed relative to the metabolic rearrangements. Furthermore, global variation in morphogenesis genes indicates an involvement of cell differentiation pathways in the decision phase leading up to the commitment to antibiotic biosynthesis. CONCLUSIONS: Our study provides the first detailed insights into the complex sequence of early regulatory events during and preceding the major metabolic switch in S. coelicolor, which will form the starting point for future attempts at engineering antibiotic production in a biotechnological setting.


Subjects
Gene Expression Profiling , Streptomyces coelicolor/genetics , Streptomyces coelicolor/metabolism , Anti-Bacterial Agents/biosynthesis , Cluster Analysis , Fermentation , Gene Expression Regulation, Bacterial , Genes, Bacterial , Multigene Family , RNA, Bacterial/genetics , Streptomyces coelicolor/growth & development
19.
Biophys J ; 96(11): 4399-408, 2009 Jun 03.
Article in English | MEDLINE | ID: mdl-19486664

ABSTRACT

Efficient and accurate reconstruction of secondary structure elements in the context of protein structure prediction is the major focus of this work. We present a novel approach capable of reconstructing alpha-helices and beta-sheets in atomic detail. The method is based on Metropolis Monte Carlo simulations in a force field of empirical potentials that are designed to stabilize secondary structure elements in room-temperature simulations. Particular attention is paid to lateral side-chain interactions in beta-sheets and between the turns of alpha-helices, as well as backbone hydrogen bonding. The force constants are optimized using contrastive divergence, a novel machine learning technique, from a data set of known structures. Using this approach, we demonstrate the applicability of the framework to the problem of reconstructing the overall protein fold for a number of commonly studied small proteins, based on only predicted secondary structure and contact map. For protein G and chymotrypsin inhibitor 2, we are able to reconstruct the secondary structure elements in atomic detail and the overall protein folds with a root mean-square deviation of <10 Å. For cold-shock protein and the SH3 domain, we accurately reproduce the secondary structure elements and the topology of the 5-stranded beta-sheets, but not the barrel structure. The importance of high-quality secondary structure and contact map prediction is discussed.


Subjects
Models, Chemical , Protein Stability , Protein Structure, Secondary , Algorithms , Artificial Intelligence , Bacterial Proteins/chemistry , Computer Simulation , Escherichia coli , Hydrogen Bonding , Models, Molecular , Monte Carlo Method , Nerve Tissue Proteins/chemistry , Peptides/chemistry , Plant Proteins/chemistry , Temperature , src Homology Domains , src-Family Kinases/chemistry
20.
BMC Bioinformatics ; 10: 242, 2009 Aug 06.
Article in English | MEDLINE | ID: mdl-19660130

ABSTRACT

BACKGROUND: Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data analysis, little attention has been paid to uncertainty in the results obtained. RESULTS: We present an R/Bioconductor port of a fast novel algorithm for Bayesian agglomerative hierarchical clustering and demonstrate its use in clustering gene expression microarray data. The method performs bottom-up hierarchical clustering, using a Dirichlet Process (infinite mixture) to model uncertainty in the data and Bayesian model selection to decide at each step which clusters to merge. CONCLUSION: Biologically plausible results are presented from a well studied data set: expression profiles of A. thaliana subjected to a variety of biotic and abiotic stresses. Our method avoids several limitations of traditional methods, for example how many clusters there should be and how to choose a principled distance metric.


Subjects
Gene Expression Profiling/methods , Software Design , Algorithms , Arabidopsis/genetics , Bayes Theorem , Cluster Analysis , Oligonucleotide Array Sequence Analysis , Time Factors