1.
Plant J ; 107(5): 1363-1386, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34160110

ABSTRACT

The photosynthetic capacity of mature leaves increases after several days' exposure to constant or intermittent episodes of high light (HL) and is manifested primarily as changes in chloroplast physiology. How this chloroplast-level acclimation to HL is initiated and controlled is unknown. From expanded Arabidopsis leaves, we determined HL-dependent changes in transcript abundance of 3844 genes in a 0-6 h time-series transcriptomics experiment. It was hypothesized that among such genes were those that contribute to the initiation of HL acclimation. By focusing on differentially expressed transcription (co-)factor genes and applying dynamic statistical modelling to the temporal transcriptomics data, a regulatory network of 47 predominantly photoreceptor-regulated transcription (co-)factor genes was inferred. The most connected gene in this network was B-BOX DOMAIN CONTAINING PROTEIN32 (BBX32). Plants overexpressing BBX32 were strongly impaired in acclimation to HL and displayed perturbed expression of photosynthesis-associated genes under low light (LL) and after exposure to HL. These observations led us to demonstrate that, in addition to BBX32, CRYPTOCHROME1, LONG HYPOCOTYL5, CONSTITUTIVELY PHOTOMORPHOGENIC1 and SUPPRESSOR OF PHYA-105 are important for the regulation of chloroplast-level acclimation. In addition, the BBX32-centric gene regulatory network provides a view of the transcriptional control of acclimation in mature leaves distinct from other photoreceptor-regulated processes, such as seedling photomorphogenesis.


Subject(s)
Acclimatization/genetics , Arabidopsis Proteins/metabolism , Arabidopsis/genetics , Carrier Proteins/metabolism , Gene Expression Regulation, Plant , Transcriptome , Acclimatization/radiation effects , Arabidopsis/physiology , Arabidopsis/radiation effects , Arabidopsis Proteins/genetics , Bayes Theorem , Carrier Proteins/genetics , Chloroplasts/radiation effects , Gene Expression Profiling , Gene Regulatory Networks , Light , Photosynthesis/radiation effects , Plant Leaves/genetics , Plant Leaves/physiology , Plant Leaves/radiation effects
2.
Bioinformatics ; 34(5): 884-886, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-29126246

ABSTRACT

Summary: Every year, a large number of novel algorithms are introduced to the scientific community for a myriad of applications, but using these across different research groups is often troublesome, due to suboptimal implementations and specific dependency requirements. This does not have to be the case, as public cloud computing services can easily house tractable implementations within self-contained dependency environments, making the methods easily accessible to a wider public. We have taken 14 popular methods, the majority related to expression data or promoter analysis, developed these up to a good implementation standard and housed the tools in isolated Docker containers which we integrated into the CyVerse Discovery Environment, making these easily usable for a wide community as part of the CyVerse UK project. Availability and implementation: The integrated apps can be found at http://www.cyverse.org/discovery-environment, while the raw code is available at https://github.com/cyversewarwick and the corresponding Docker images are housed at https://hub.docker.com/r/cyversewarwick/. Contact: info@cyverse.warwick.ac.uk or D.L.Wild@warwick.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Cloud Computing , Computational Biology/methods , Gene Expression Regulation , Promoter Regions, Genetic , Software , Algorithms , Gene Expression Profiling/methods , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods
3.
Plant Cell ; 28(2): 345-66, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26842464

ABSTRACT

In Arabidopsis thaliana, changes in metabolism and gene expression drive increased drought tolerance and initiate diverse drought avoidance and escape responses. To address regulatory processes that link these responses, we set out to identify genes that govern early responses to drought. To do this, a high-resolution time series transcriptomics data set was produced, coupled with detailed physiological and metabolic analyses of plants subjected to a slow transition from well-watered to drought conditions. A total of 1815 drought-responsive differentially expressed genes were identified. The early changes in gene expression coincided with a drop in carbon assimilation, and only in the late stages with an increase in foliar abscisic acid content. To identify gene regulatory networks (GRNs) mediating the transition between the early and late stages of drought, we used Bayesian network modeling of differentially expressed transcription factor (TF) genes. This approach identified AGAMOUS-LIKE22 (AGL22) as a key hub gene in a TF GRN. It has previously been shown that AGL22 is involved in the transition from vegetative state to flowering, but here we show that AGL22 expression influences steady-state photosynthetic rates and lifetime water use. This suggests that AGL22 uniquely regulates a transcriptional network during drought stress, linking changes in primary metabolism and the initiation of stress responses.


Subject(s)
Abscisic Acid/metabolism , Arabidopsis Proteins/metabolism , Arabidopsis/genetics , Gene Expression Regulation, Plant , Plant Growth Regulators/metabolism , Transcription Factors/metabolism , Arabidopsis/growth & development , Arabidopsis/physiology , Arabidopsis Proteins/genetics , Bayes Theorem , Cluster Analysis , Droughts , Gene Regulatory Networks , Mutation , Phenotype , Photosynthesis/physiology , Stress, Physiological , Transcription Factors/genetics
4.
Plant Cell ; 27(11): 3038-64, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26566919

ABSTRACT

Transcriptional reprogramming is integral to effective plant defense. Pathogen effectors act transcriptionally and posttranscriptionally to suppress defense responses. A major challenge to understanding disease and defense responses is discriminating between transcriptional reprogramming associated with microbial-associated molecular pattern (MAMP)-triggered immunity (MTI) and that orchestrated by effectors. A high-resolution time course of genome-wide expression changes following challenge with Pseudomonas syringae pv tomato DC3000 and the nonpathogenic mutant strain DC3000hrpA- allowed us to establish causal links between the activities of pathogen effectors and suppression of MTI and infer with high confidence a range of processes specifically targeted by effectors. Analysis of this information-rich data set with a range of computational tools provided insights into the earliest transcriptional events triggered by effector delivery, regulatory mechanisms recruited, and biological processes targeted. We show that the majority of genes contributing to disease or defense are induced within 6 h postinfection, significantly before pathogen multiplication. Suppression of chloroplast-associated genes is a rapid MAMP-triggered defense response, and suppression of genes involved in chromatin assembly and induction of ubiquitin-related genes coincide with pathogen-induced abscisic acid accumulation. Specific combinations of promoter motifs are engaged in fine-tuning the MTI response and active transcriptional suppression at specific promoter configurations by P. syringae.


Subject(s)
Arabidopsis/immunology , Immunosuppression Therapy , Pathogen-Associated Molecular Pattern Molecules/metabolism , Plant Immunity/genetics , Plant Leaves/immunology , Pseudomonas syringae/physiology , Transcription, Genetic , Arabidopsis/genetics , Arabidopsis/microbiology , Base Sequence , Chromatin/metabolism , Gene Expression Profiling , Gene Expression Regulation, Plant , Gene Ontology , Gene Regulatory Networks , Genes, Plant , Molecular Sequence Data , Nucleotide Motifs/genetics , Plant Diseases/genetics , Plant Diseases/immunology , Plant Diseases/microbiology , Plant Leaves/genetics , Plant Leaves/microbiology , Promoter Regions, Genetic/genetics , Pseudomonas syringae/growth & development , Transcription Factors/metabolism
5.
Stat Appl Genet Mol Biol ; 15(1): 83-6, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26910751

ABSTRACT

The integration of multi-dimensional datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct, but often complementary, information. However, the large amount of data adds burden to any inference task. Flexible Bayesian methods may reduce the necessity for strong modelling assumptions, but can also increase the computational burden. We present an improved implementation of a Bayesian correlated clustering algorithm that permits integrated clustering to be routinely performed across multiple datasets, each with tens of thousands of items. By exploiting GPU-based computation, we are able to improve runtime performance of the algorithm by almost four orders of magnitude. This permits analysis across genomic-scale data sets, greatly expanding the range of applications over those originally possible. MDI is available here: http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/.


Subject(s)
Computational Biology/methods , Genomics/methods , Algorithms , Cluster Analysis , Markov Chains , Monte Carlo Method , Software , Systems Biology/methods
6.
Bioinformatics ; 31(12): i97-105, 2015 Jun 15.
Article in English | MEDLINE | ID: mdl-26072515

ABSTRACT

MOTIVATION: The ability to jointly learn gene regulatory networks (GRNs) in, or leverage GRNs between, related species would allow the vast amount of legacy data obtained in model organisms to inform the GRNs of more complex, or economically or medically relevant, counterparts. Examples include transferring information from Arabidopsis thaliana into related crop species for food security purposes, or from mice into humans for medical applications. Here we develop two related Bayesian approaches to network inference that allow GRNs to be jointly inferred in, or leveraged between, several related species: in one framework, network information is directly propagated between species; in the second hierarchical approach, network information is propagated via an unobserved 'hypernetwork'. In both frameworks, information about network similarity is captured via graph kernels, with the networks additionally informed by species-specific time series gene expression data, when available, using Gaussian processes to model the dynamics of gene expression. RESULTS: Results on in silico benchmarks demonstrate that joint inference, and leveraging of known networks between species, offers better accuracy than standalone inference. The direct propagation of network information via the non-hierarchical framework is more appropriate when there are relatively few species, while the hierarchical approach is better suited when there are many species. Both methods are robust to small amounts of mislabelling of orthologues. Finally, the use of Saccharomyces cerevisiae data and networks to inform inference of networks in the fission yeast Schizosaccharomyces pombe predicts a novel role in cell cycle regulation for Gas1 (SPAC19B12.02c), a 1,3-beta-glucanosyltransferase. AVAILABILITY AND IMPLEMENTATION: MATLAB code is available from http://go.warwick.ac.uk/systemsbiology/software/.


Subject(s)
Gene Expression Profiling , Gene Regulatory Networks , Algorithms , Bayes Theorem , Cell Cycle/genetics , Computer Simulation , Models, Genetic , Saccharomyces cerevisiae/genetics , Schizosaccharomyces/genetics , Software
7.
Stat Appl Genet Mol Biol ; 14(3): 307-10, 2015 Jun.
Article in English | MEDLINE | ID: mdl-26030796

ABSTRACT

Here we introduce the causal structure identification (CSI) package, a Gaussian process-based approach to inferring gene regulatory networks (GRNs) from multiple time series data. The standard CSI approach infers a single GRN via joint learning from multiple time series datasets; the hierarchical approach (HCSI) infers a separate GRN for each dataset, albeit with the networks constrained to favor similar structures, allowing for the identification of context-specific networks. The software is implemented in MATLAB and includes a graphical user interface (GUI) for user-friendly inference. Finally, the GUI can be connected to high-performance computer clusters to facilitate analysis of large genomic datasets.


Subject(s)
Gene Expression Profiling/methods , Software , Bayes Theorem , Gene Expression Regulation , Gene Regulatory Networks
8.
Plant Cell ; 24(9): 3530-57, 2012 Sep.
Article in English | MEDLINE | ID: mdl-23023172

ABSTRACT

Transcriptional reprogramming forms a major part of a plant's response to pathogen infection. Many individual components and pathways operating during plant defense have been identified, but our knowledge of how these different components interact is still rudimentary. We generated a high-resolution time series of gene expression profiles from a single Arabidopsis thaliana leaf during infection by the necrotrophic fungal pathogen Botrytis cinerea. Approximately one-third of the Arabidopsis genome is differentially expressed during the first 48 h after infection, with the majority of changes in gene expression occurring before significant lesion development. We used computational tools to obtain a detailed chronology of the defense response against B. cinerea, highlighting the times at which signaling and metabolic processes change, and identify transcription factor families operating at different times after infection. Motif enrichment and network inference predicted regulatory interactions, and testing of one such prediction identified a role for TGA3 in defense against necrotrophic pathogens. These data provide an unprecedented level of detail about transcriptional changes during a defense response and are suited to systems biology analyses to generate predictive models of the gene regulatory networks mediating the Arabidopsis response to B. cinerea.


Subject(s)
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Botrytis/physiology , Gene Expression Regulation, Plant/genetics , Genome, Plant/genetics , Plant Diseases/immunology , Arabidopsis/immunology , Arabidopsis/metabolism , Arabidopsis/microbiology , Botrytis/growth & development , Gene Expression Profiling , Gene Regulatory Networks , Models, Genetic , Mutation , Nucleotide Motifs , Oligonucleotide Array Sequence Analysis , Plant Diseases/microbiology , Plant Immunity , Plant Leaves/genetics , Plant Leaves/metabolism , Plant Leaves/microbiology , Promoter Regions, Genetic/genetics , Signal Transduction , Time Factors , Transcription Factors/genetics , Transcriptome
9.
Plant J ; 75(1): 26-39, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23578292

ABSTRACT

A model is presented describing the gene regulatory network surrounding three similar NAC transcription factors that have roles in Arabidopsis leaf senescence and stress responses. ANAC019, ANAC055 and ANAC072 belong to the same clade of NAC domain genes and have overlapping expression patterns. A combination of promoter DNA/protein interactions identified using yeast 1-hybrid analysis and modelling using gene expression time course data has been applied to predict the regulatory network upstream of these genes. Similarities and divergence in regulation during a variety of stress responses are predicted by different combinations of upstream transcription factors binding and also by the modelling. Mutant analysis with potential upstream genes was used to test and confirm some of the predicted interactions. Gene expression analysis in mutants of ANAC019 and ANAC055 at different times during leaf senescence has revealed a distinctly different role for each of these genes. Yeast 1-hybrid analysis is shown to be a valuable tool that can distinguish clades of binding proteins and be used to test and quantify protein binding to predicted promoter motifs.


Subject(s)
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Botrytis/physiology , Gene Expression Regulation, Plant , Stress, Physiological , Arabidopsis/physiology , Arabidopsis Proteins/metabolism , Cellular Senescence , Gene Expression Profiling , Gene Regulatory Networks , Mutation , Oligonucleotide Array Sequence Analysis , Plant Diseases/microbiology , Plant Leaves/genetics , Plant Leaves/physiology , Plants, Genetically Modified , Promoter Regions, Genetic/genetics , Protein Binding , Transcription Factors/genetics , Transcription Factors/metabolism , Two-Hybrid System Techniques
10.
Bioinformatics ; 29(5): 580-7, 2013 Mar 01.
Article in English | MEDLINE | ID: mdl-23314126

ABSTRACT

MOTIVATION: The problem of ab initio protein folding is one of the most difficult in modern computational biology. The prediction of residue contacts within a protein provides a more tractable immediate step. Recently introduced maximum entropy-based correlated mutation measures (CMMs), such as direct information, have been successful in predicting residue contacts. However, most correlated mutation studies focus on proteins that have large, good-quality multiple sequence alignments (MSAs) because the power of correlated mutation analysis falls as the size of the MSA decreases. Nevertheless, even with small autogenerated MSAs, maximum entropy-based CMMs contain information. To make use of this information, in this article, we focus not on general residue contacts but on contacts between residues in β-sheets. The strong constraints and prior knowledge associated with β-contacts are ideally suited for prediction using a method that incorporates an often noisy CMM. RESULTS: Using contrastive divergence, a statistical machine learning technique, we have calculated a maximum entropy-based CMM. We have integrated this measure with a new probabilistic model for β-contact prediction, which is used to predict both residue- and strand-level contacts. Using our model on a standard non-redundant dataset, we significantly outperform a 2D recurrent neural network architecture, achieving a 5% improvement in true positives at the 5% false-positive rate at the residue level. At the strand level, our approach is competitive with the state-of-the-art single methods, achieving precision of 61.0% and recall of 55.4%, while not requiring residue solvent accessibility as an input. AVAILABILITY: http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/


Subject(s)
Artificial Intelligence , Models, Statistical , Protein Structure, Secondary , Entropy , Models, Molecular , Mutation , Neural Networks, Computer , Protein Folding , Proteins/chemistry , Proteins/genetics , Sequence Alignment , Sequence Analysis, Protein
11.
Plant Cell ; 23(3): 873-94, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21447789

ABSTRACT

Leaf senescence is an essential developmental process that impacts dramatically on crop yields and involves altered regulation of thousands of genes and many metabolic and signaling pathways, resulting in major changes in the leaf. The regulation of senescence is complex, and although senescence regulatory genes have been characterized, there is little information on how these function in the global control of the process. We used microarray analysis to obtain a high-resolution time-course profile of gene expression during development of a single leaf over a 3-week period to senescence. A complex experimental design approach and a combination of methods were used to extract high-quality replicated data and to identify differentially expressed genes. The multiple time points enable the use of highly informative clustering to reveal distinct time points at which signaling and metabolic pathways change. Analysis of motif enrichment, as well as comparison of transcription factor (TF) families showing altered expression over the time course, identify clear groups of TFs active at different stages of leaf development and senescence. These data enable connection of metabolic processes, signaling pathways, and specific TF activity, which will underpin the development of network models to elucidate the process of senescence.


Subject(s)
Arabidopsis Proteins/analysis , Arabidopsis/genetics , Gene Expression Regulation, Plant , Plant Leaves/metabolism , Analysis of Variance , Arabidopsis/growth & development , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Chlorophyll/analysis , Cluster Analysis , Gene Expression Profiling , Microarray Analysis/methods , Models, Biological , Multigene Family , Plant Growth Regulators/analysis , Plant Leaves/genetics , Plant Leaves/growth & development , Promoter Regions, Genetic , RNA, Plant/genetics , Transcription Factors/metabolism
12.
Bioinformatics ; 28(12): i233-41, 2012 Jun 15.
Article in English | MEDLINE | ID: mdl-22689766

ABSTRACT

MOTIVATION: The generation of time series transcriptomic datasets collected under multiple experimental conditions has proven to be a powerful approach for disentangling complex biological processes, allowing for the reverse engineering of gene regulatory networks (GRNs). Most methods for reverse engineering GRNs from multiple datasets assume that all of the time series were generated from networks with identical topology. In this study, we outline a hierarchical, non-parametric Bayesian approach for reverse engineering GRNs using multiple time series that can be applied in a number of novel situations including: (i) where different, but overlapping sets of transcription factors are expected to bind in the different experimental conditions; that is, where switching events could potentially arise under the different treatments and (ii) for inference in evolutionarily related species in which orthologous GRNs exist. More generally, the method can be used to identify context-specific regulation by leveraging time series gene expression data alongside methods that can identify putative lists of transcription factors or transcription factor targets. RESULTS: The hierarchical inference outperforms related (but non-hierarchical) approaches when the networks used to generate the data were identical, and performs comparably even when the networks used to generate data were independent. The method was subsequently used alongside yeast one-hybrid and microarray time series data to infer potential transcriptional switches in Arabidopsis thaliana response to stress. The results confirm previous biological studies and allow for additional insights into gene regulation under various abiotic stresses. AVAILABILITY: The methods outlined in this article have been implemented in MATLAB and are available on request.


Subject(s)
Bayes Theorem , Gene Regulatory Networks , Statistics, Nonparametric , Algorithms , Arabidopsis/genetics , Gene Expression Regulation , Models, Theoretical , Transcription Factors/genetics , Two-Hybrid System Techniques
13.
Bioinformatics ; 28(24): 3290-7, 2012 Dec 15.
Article in English | MEDLINE | ID: mdl-23047558

ABSTRACT

MOTIVATION: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct, but often complementary, information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets. RESULTS: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI's performance is comparable with the present state-of-the-art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques, as well as to non-integrative approaches, demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods.


Subject(s)
Genomics/methods , Models, Statistical , Bayes Theorem , Chromatin Immunoprecipitation , Cluster Analysis , Gene Expression , Gene Expression Profiling/methods , Normal Distribution , Oligonucleotide Array Sequence Analysis , Protein Interaction Mapping , Saccharomyces cerevisiae/genetics , Systems Biology
14.
Biophys J ; 102(4): 878-86, 2012 Feb 22.
Article in English | MEDLINE | ID: mdl-22385859

ABSTRACT

Nested sampling is a Bayesian sampling technique developed to explore probability distributions localized in an exponentially small area of the parameter space. The algorithm provides both posterior samples and an estimate of the evidence (marginal likelihood) of the model. The nested sampling algorithm also provides an efficient way to calculate free energies and the expectation value of thermodynamic observables at any temperature, through a simple post processing of the output. Previous applications of the algorithm have yielded large efficiency gains over other sampling techniques, including parallel tempering. In this article, we describe a parallel implementation of the nested sampling algorithm and its application to the problem of protein folding in a Go-like force field of empirical potentials that were designed to stabilize secondary structure elements in room-temperature simulations. We demonstrate the method by conducting folding simulations on a number of small proteins that are commonly used for testing protein-folding procedures. A topological analysis of the posterior samples is performed to produce energy landscape charts, which give a high-level description of the potential energy surface for the protein folding simulations. These charts provide qualitative insights into both the folding process and the nature of the model and force field used.
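The evidence estimate described in this abstract can be illustrated with a minimal serial sketch. It is not the authors' parallel implementation or their protein force field: the one-dimensional toy problem (uniform prior on [0, 1], unnormalised Gaussian likelihood), the function name nested_sampling_evidence and all parameter values are assumptions made for illustration, and new live points are drawn by simple rejection sampling from the prior, which is only practical for toy problems.

```python
import math
import random


def nested_sampling_evidence(log_likelihood, sample_prior, n_live=500, n_iter=1500, seed=0):
    """Estimate the evidence Z (integral of likelihood over the prior) by nested sampling.

    Hypothetical helper: replacement points are drawn by rejection sampling
    from the prior, subject to the hard constraint L > L*.
    """
    rng = random.Random(seed)
    live = [sample_prior(rng) for _ in range(n_live)]
    live_logl = [log_likelihood(x) for x in live]

    evidence = 0.0
    x_prev = 1.0  # prior mass still enclosed by the likelihood contour
    for i in range(1, n_iter + 1):
        # The worst live point defines the current likelihood contour L*.
        worst = min(range(n_live), key=live_logl.__getitem__)
        l_star = live_logl[worst]
        # Deterministic approximation of the shrinking prior mass: X_i = exp(-i / n_live).
        x_curr = math.exp(-i / n_live)
        evidence += math.exp(l_star) * (x_prev - x_curr)
        x_prev = x_curr
        # Replace the worst point with a new prior draw satisfying L > L*.
        while True:
            candidate = sample_prior(rng)
            candidate_logl = log_likelihood(candidate)
            if candidate_logl > l_star:
                break
        live[worst] = candidate
        live_logl[worst] = candidate_logl

    # Add the contribution of the prior mass still held by the live points.
    evidence += x_prev * sum(math.exp(l) for l in live_logl) / n_live
    return evidence


# Toy problem (assumed): uniform prior on [0, 1], Gaussian-shaped likelihood.
SIGMA = 0.1


def toy_log_likelihood(x):
    return -0.5 * ((x - 0.5) / SIGMA) ** 2


def toy_prior(rng):
    return rng.random()


z_estimate = nested_sampling_evidence(toy_log_likelihood, toy_prior)
# Closed-form evidence for the toy problem, for comparison.
z_exact = SIGMA * math.sqrt(2.0 * math.pi) * math.erf(0.5 / (SIGMA * math.sqrt(2.0)))
print(z_estimate, z_exact)
```

In realistic settings the rejection step is replaced by a constrained sampler (for example, short Markov chain Monte Carlo runs started from a live point), since the acceptance rate of prior draws falls in proportion to the remaining prior mass.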


Subject(s)
Models, Molecular , Protein Folding , Bacterial Proteins/chemistry , Bayes Theorem , Peptides/chemistry , Protein Structure, Secondary , Thermodynamics
15.
BMC Bioinformatics ; 12: 399, 2011 Oct 13.
Article in English | MEDLINE | ID: mdl-21995452

ABSTRACT

BACKGROUND: Post-genomic molecular biology has resulted in an explosion of data, providing measurements for large numbers of genes, proteins and metabolites. Time series experiments have become increasingly common, necessitating the development of novel analysis tools that capture the resulting data structure. Outlier measurements at one or more time points present a significant challenge, while potentially valuable replicate information is often ignored by existing techniques. RESULTS: We present a generative model-based Bayesian hierarchical clustering algorithm for microarray time series that employs Gaussian process regression to capture the structure of the data. By using a mixture model likelihood, our method permits a small proportion of the data to be modelled as outlier measurements, and adopts an empirical Bayes approach which uses replicate observations to inform a prior distribution of the noise variance. The method automatically learns the optimum number of clusters and can incorporate non-uniformly sampled time points. Using a wide variety of experimental data sets, we show that our algorithm consistently yields higher quality and more biologically meaningful clusters than current state-of-the-art methodologies. We highlight the importance of modelling outlier values by demonstrating that noisy genes can be grouped with other genes of similar biological function. We demonstrate the importance of including replicate information, which we find enables the discrimination of additional distinct expression profiles. CONCLUSIONS: By incorporating outlier measurements and replicate values, this clustering algorithm for time series microarray data provides a step towards a better treatment of the noise inherent in measurements from high-throughput genomic technologies. 
Timeseries BHC is available as part of the R package 'BHC' (version 1.5), which is available for download from Bioconductor (version 2.9 and above) via http://www.bioconductor.org/packages/release/bioc/html/BHC.html?pagewanted=all.
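As a rough illustration of how Gaussian process regression can score a time series, the sketch below computes the GP log marginal likelihood of one profile under a squared-exponential covariance with additive noise. It is not code from the 'BHC' package and omits the mixture-model outlier handling and empirical Bayes noise prior described in the abstract; the function names, the kernel choice and the hyperparameter values are assumptions made for illustration.

```python
import math


def sq_exp_kernel(t1, t2, signal_var, length_scale):
    """Squared-exponential covariance between two time points."""
    return signal_var * math.exp(-0.5 * ((t1 - t2) / length_scale) ** 2)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][m] * lower[j][m] for m in range(j))
            lower[i][j] = math.sqrt(s) if i == j else s / lower[j][j]
    return lower


def gp_log_marginal_likelihood(times, values, signal_var=1.0, length_scale=1.0, noise_var=0.1):
    """log N(values | 0, K + noise_var * I), with K built from the squared-exponential kernel."""
    n = len(times)
    cov = [
        [
            sq_exp_kernel(times[i], times[j], signal_var, length_scale)
            + (noise_var if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
    lower = cholesky(cov)
    # Forward substitution solves lower * z = values, so z.z equals values^T cov^{-1} values.
    z = []
    for i in range(n):
        z.append((values[i] - sum(lower[i][m] * z[m] for m in range(i))) / lower[i][i])
    quad_form = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(lower[i][i]) for i in range(n))
    return -0.5 * quad_form - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)


# Assumed toy profiles: a smooth ramp and a rapidly alternating series.
time_points = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
smooth_profile = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
jagged_profile = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

smooth_score = gp_log_marginal_likelihood(time_points, smooth_profile, length_scale=2.0)
jagged_score = gp_log_marginal_likelihood(time_points, jagged_profile, length_scale=2.0)
print(smooth_score, jagged_score)
```

With a length scale of two time units, the smooth ramp receives the higher marginal likelihood. Model-based clustering methods of this kind compare such marginal likelihoods for merged and separate groups of profiles when deciding whether genes belong to the same cluster.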


Subject(s)
Bayes Theorem , Oligonucleotide Array Sequence Analysis/methods , Algorithms , Cluster Analysis , Gene Expression Profiling , Humans , Models, Biological , Normal Distribution , Saccharomyces cerevisiae
16.
Semin Cell Dev Biol ; 20(7): 863-8, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19682595

ABSTRACT

A major challenge in systems biology is the ability to model complex regulatory interactions, such as gene regulatory networks, and a number of computational approaches have been developed over recent years to address this challenge. This paper reviews a number of these approaches, with a focus on probabilistic graphical models and the integration of diverse data sets, such as gene expression and transcription factor binding site location and activity.


Subject(s)
Chromatin Immunoprecipitation/methods , Gene Expression , Gene Regulatory Networks , Oligonucleotide Array Sequence Analysis/methods , Sequence Analysis, DNA/methods , Systems Biology/methods
17.
Bioinformatics ; 26(12): i158-67, 2010 Jun 15.
Article in English | MEDLINE | ID: mdl-20529901

ABSTRACT

MOTIVATION: We present a method for directly inferring transcriptional modules (TMs) by integrating gene expression and transcription factor binding (ChIP-chip) data. Our model extends a hierarchical Dirichlet process mixture model to allow data fusion on a gene-by-gene basis. This encodes the intuition that co-expression and co-regulation are not necessarily equivalent and hence we do not expect all genes to group similarly in both datasets. In particular, it allows us to identify the subset of genes that share the same structure of transcriptional modules in both datasets. RESULTS: We find that by working on a gene-by-gene basis, our model is able to extract clusters with greater functional coherence than existing methods. By combining gene expression and transcription factor binding (ChIP-chip) data in this way, we are better able to determine the groups of genes that are most likely to represent underlying TMs. AVAILABILITY: If interested in the code for the work presented in this article, please contact the authors. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling/methods , Transcription Factors/metabolism , Bayes Theorem , Binding Sites , Multigene Family , Oligonucleotide Array Sequence Analysis , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
18.
BMC Genomics ; 11: 10, 2010 Jan 06.
Article in English | MEDLINE | ID: mdl-20053288

ABSTRACT

BACKGROUND: During the lifetime of a fermenter culture, the soil bacterium S. coelicolor undergoes a major metabolic switch from exponential growth to antibiotic production. We have studied gene expression patterns during this switch, using a specifically designed Affymetrix genechip and a high-resolution time-series of fermenter-grown samples. RESULTS: Surprisingly, we find that the metabolic switch actually consists of multiple finely orchestrated switching events. Strongly coherent clusters of genes show drastic changes in gene expression already many hours before the classically defined transition phase where the switch from primary to secondary metabolism was expected. The main switch in gene expression takes only 2 hours, and changes in antibiotic biosynthesis genes are delayed relative to the metabolic rearrangements. Furthermore, global variation in morphogenesis genes indicates an involvement of cell differentiation pathways in the decision phase leading up to the commitment to antibiotic biosynthesis. CONCLUSIONS: Our study provides the first detailed insights into the complex sequence of early regulatory events during and preceding the major metabolic switch in S. coelicolor, which will form the starting point for future attempts at engineering antibiotic production in a biotechnological setting.


Subject(s)
Gene Expression Profiling , Streptomyces coelicolor/genetics , Streptomyces coelicolor/metabolism , Anti-Bacterial Agents/biosynthesis , Cluster Analysis , Fermentation , Gene Expression Regulation, Bacterial , Genes, Bacterial , Multigene Family , RNA, Bacterial/genetics , Streptomyces coelicolor/growth & development
19.
Biophys J ; 96(11): 4399-408, 2009 Jun 03.
Article in English | MEDLINE | ID: mdl-19486664

ABSTRACT

Efficient and accurate reconstruction of secondary structure elements in the context of protein structure prediction is the major focus of this work. We present a novel approach capable of reconstructing alpha-helices and beta-sheets in atomic detail. The method is based on Metropolis Monte Carlo simulations in a force field of empirical potentials that are designed to stabilize secondary structure elements in room-temperature simulations. Particular attention is paid to lateral side-chain interactions in beta-sheets and between the turns of alpha-helices, as well as backbone hydrogen bonding. The force constants are optimized using contrastive divergence, a novel machine learning technique, from a data set of known structures. Using this approach, we demonstrate the applicability of the framework to the problem of reconstructing the overall protein fold for a number of commonly studied small proteins, based on only predicted secondary structure and contact map. For protein G and chymotrypsin inhibitor 2, we are able to reconstruct the secondary structure elements in atomic detail and the overall protein folds with a root mean-square deviation of <10 Å. For cold-shock protein and the SH3 domain, we accurately reproduce the secondary structure elements and the topology of the 5-stranded beta-sheets, but not the barrel structure. The importance of high-quality secondary structure and contact map prediction is discussed.


Subject(s)
Models, Chemical , Protein Stability , Protein Structure, Secondary , Algorithms , Artificial Intelligence , Bacterial Proteins/chemistry , Computer Simulation , Escherichia coli , Hydrogen Bonding , Models, Molecular , Monte Carlo Method , Nerve Tissue Proteins/chemistry , Peptides/chemistry , Plant Proteins/chemistry , Temperature , src Homology Domains , src-Family Kinases/chemistry
20.
BMC Bioinformatics ; 10: 242, 2009 Aug 06.
Article in English | MEDLINE | ID: mdl-19660130

ABSTRACT

BACKGROUND: Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data analysis, little attention has been paid to uncertainty in the results obtained. RESULTS: We present an R/Bioconductor port of a fast novel algorithm for Bayesian agglomerative hierarchical clustering and demonstrate its use in clustering gene expression microarray data. The method performs bottom-up hierarchical clustering, using a Dirichlet Process (infinite mixture) to model uncertainty in the data and Bayesian model selection to decide at each step which clusters to merge. CONCLUSION: Biologically plausible results are presented from a well studied data set: expression profiles of A. thaliana subjected to a variety of biotic and abiotic stresses. Our method avoids several limitations of traditional methods, for example how many clusters there should be and how to choose a principled distance metric.


Subject(s)
Gene Expression Profiling/methods , Software Design , Algorithms , Arabidopsis/genetics , Bayes Theorem , Cluster Analysis , Oligonucleotide Array Sequence Analysis , Time Factors