ABSTRACT
Photocatalysis mediated by low-energy light has the potential to enable safer, sustainable synthetic methods. A phenanthroline-derived ligand, bathocupSani, with a large two-photon absorption (TPA) cross section, was used to construct a heteroleptic complex, [Cu(bathocupSani)(DPEPhos)]BF4, and a homoleptic complex, [Cu(bathocupSani)2]BF4. The ligand and the corresponding homoleptic copper complex exhibit two-photon upconversion with an anti-Stokes shift of 1.2 eV using red light. The complex [Cu(bathocupSani)2]BF4 promoted energy-transfer photocatalysis, enabling the oxidative dimerization of benzylic amines, sulfide oxidation, phosphine oxidation, boronic acid oxidation, and atom-transfer radical addition.
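As a rough illustration of what a 1.2 eV anti-Stokes shift means in wavelength terms, photon energy and wavelength are related by E = hc/λ, with hc ≈ 1239.84 eV·nm. The minimal sketch below assumes a hypothetical 620 nm red excitation (the excitation wavelength is not given in the abstract) and computes the wavelength lying 1.2 eV higher in energy.

```python
# hc expressed in eV·nm, so E [eV] = HC_EV_NM / wavelength [nm].
HC_EV_NM = 1239.84


def photon_energy_ev(wavelength_nm: float) -> float:
    """Energy in eV of a photon with the given vacuum wavelength in nm."""
    return HC_EV_NM / wavelength_nm


def upconverted_wavelength_nm(excitation_nm: float, anti_stokes_shift_ev: float) -> float:
    """Wavelength in nm of emission lying `anti_stokes_shift_ev` above the excitation photon."""
    return HC_EV_NM / (photon_energy_ev(excitation_nm) + anti_stokes_shift_ev)


# Hypothetical red excitation wavelength; the abstract does not specify one.
EXCITATION_NM = 620.0
print(f"excitation photon: {photon_energy_ev(EXCITATION_NM):.2f} eV")
print(f"emission at +1.2 eV: {upconverted_wavelength_nm(EXCITATION_NM, 1.2):.0f} nm")
```

Under these assumptions the script prints an excitation photon energy of about 2.00 eV and an emission near 387 nm, i.e. red light upconverted to the near-UV edge of the visible range.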
ABSTRACT
BACKGROUND: The impact on an organism of a perturbation (over-expression or repression) of a key node can be modelled based on a regulatory and/or metabolic network. Integrating these two networks could improve our global understanding of the biological mechanisms triggered by a perturbation. This study focuses on improving the modelling of the regulatory network to facilitate its possible integration with the metabolic network. Previously proposed methods for this problem fail to handle real-size regulatory networks, to compute predictions that are sensitive to perturbations, and to quantify the predicted behaviour of species more finely. RESULTS: To address these limitations, we developed MajS, a new method based on Answer Set Programming. It takes as input a regulatory network and a partial set of discrete observations. MajS tests the consistency between the input data, proposes minimal repairs on the network to establish consistency, and finally computes weighted and signed predictions over the network species. We tested MajS by comparing the HIF-1 signalling pathway with two gene-expression datasets. Our results show that MajS can predict 100% of unobserved species. When comparing MajS with two similar tools (one discrete, one quantitative), we observed that, compared with the discrete tool, MajS provides better coverage of the unobserved species, is more sensitive to system perturbations, and proposes predictions closer to real data. Compared with the quantitative tool, MajS provides more refined discrete predictions that agree with the dynamics proposed by the quantitative tool. CONCLUSIONS: MajS is a new method to test the consistency between a regulatory network and a dataset and to provide computational predictions on unobserved network species. It provides fine-grained discrete predictions by outputting the weight of the predicted sign as additional information.
MajS's output, thanks to its weights, could easily be integrated with metabolic network modelling.
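The idea of a weighted, signed prediction can be illustrated with a toy one-step majority vote. This is a hypothetical simplification, not MajS itself: MajS is written in Answer Set Programming and also checks consistency and proposes network repairs, none of which is reproduced here. The function name `majority_sign`, the node names and the observations below are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical signed network: target -> [(regulator, edge sign)], edge sign in {+1, -1}.
Network = Dict[str, List[Tuple[str, int]]]


def majority_sign(node: str, net: Network, obs: Dict[str, int]) -> Tuple[int, float]:
    """Predict the sign of `node` from its observed regulators.

    Each observed regulator r reached through an edge of sign s votes obs[r] * s.
    Returns (sign, weight): the majority sign (0 on a tie or when no regulator is
    observed) and the fraction of votes agreeing with it (0.0 in those cases).
    """
    votes = [obs[reg] * sign for reg, sign in net.get(node, []) if reg in obs]
    total = sum(votes)
    if not votes or total == 0:
        return 0, 0.0
    winner = 1 if total > 0 else -1
    return winner, votes.count(winner) / len(votes)


# Toy example with made-up node names and observations (+1 = up, -1 = down).
net: Network = {"T": [("A", +1), ("B", -1), ("C", -1)]}
obs = {"A": +1, "B": -1, "C": +1}
print(majority_sign("T", net, obs))
```

Here A and B both push T up while C pushes it down, so T is predicted "+" with weight 2/3; the weight plays the role of the extra information attached to each sign.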
Subjects
Signal Transduction , Gene Expression
ABSTRACT
Protein signaling networks are static views of dynamic processes in which proteins undergo many biochemical modifications, such as ubiquitination and phosphorylation, to propagate signals that regulate cells and can act as feedback systems. Understanding the precise mechanisms underlying protein interactions can elucidate how signaling and cell cycle progression occur within cells in diseases such as cancer. Large-scale protein signaling networks contain a large number of experimentally verified protein relations but lack the capability to predict the outcomes of the system, and therefore cannot be trained against experimental measurements. Boolean Networks (BNs) are a simple yet powerful framework to study and model the dynamics of protein signaling networks. While many BN approaches exist to model biological systems, they focus mainly on system properties, and few integrate experimental data. In this work, we show an application of a method conceived to integrate time-series phosphoproteomic data into protein signaling networks. We use a large-scale real case study from the HPN-DREAM Breast Cancer challenge. Our efficient, parameter-free method combines logic programming and model checking to infer a family of BNs from multiple-perturbation time-series data of four breast cancer cell lines, given a prior protein signaling network. Because each predicted BN family is cell-line specific, our method highlights commonalities and discrepancies between the four cell lines. Our models have a Root Mean Square Error (RMSE) of 0.31 with respect to the testing data, whereas the best-performing method of this HPN-DREAM challenge had an RMSE of 0.47. To further validate our results, the BNs were compared with the canonical mTOR pathway, yielding an AUROC score (0.77) comparable to those of the top-performing HPN-DREAM teams. In addition, our approach can be used as a complementary method to identify erroneous experiments.
These results establish our methodology as an efficient method for discovering dynamic models from multiple-perturbation time-course experimental data of large-scale signaling networks. The software and data are publicly available at https://github.com/misbahch6/caspo-ts.
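The RMSE figures quoted above follow the usual definition: the square root of the mean squared difference between predicted and measured values. A minimal sketch, assuming Boolean predictions (0 or 1) are compared with measurements rescaled to [0, 1]; the data are made up and the exact normalization used in the challenge is not described here.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error between two equal-length, non-empty sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical data: Boolean predictions for one readout at four time points,
# compared with measurements assumed to be rescaled to [0, 1].
predicted = [1, 0, 1, 1]
observed = [0.8, 0.1, 0.6, 1.0]
print(f"RMSE = {rmse(predicted, observed):.3f}")
```

Lower is better, which is why 0.31 compares favourably with 0.47.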
Subjects
Models, Biological , Neoplasms/genetics , Protein Interaction Maps/genetics , Proteomics/methods , Signal Transduction/genetics , Algorithms , Cell Line, Tumor , Humans , Neoplasms/metabolism , Phosphoproteins/genetics , Phosphoproteins/metabolism
ABSTRACT
Increasing amounts of sequence data are becoming available for a wide range of non-model organisms. Investigating and modelling the metabolic behaviour of those organisms is highly relevant to understanding their biology and ecology. As sequences are often incomplete and poorly annotated, draft networks of their metabolism largely suffer from incompleteness. Appropriate gap-filling methods to identify and add missing reactions are therefore required to address this issue. However, current tools rely on phenotypic or taxonomic information, or are very sensitive to the stoichiometric balance of metabolic reactions, especially concerning co-factors. This type of information is often unavailable, or at least error-prone, for newly explored organisms. Here we introduce Meneco, a tool dedicated to the topological gap-filling of genome-scale draft metabolic networks. Meneco reformulates gap-filling as a qualitative combinatorial optimization problem, omitting the constraints imposed by the stoichiometry of a metabolic network that other methods consider, and solves this problem using Answer Set Programming. Run on several artificial test sets comprising 10,800 degraded Escherichia coli networks, Meneco efficiently identified essential reactions missing from networks at high degradation rates, outperforming stoichiometry-based tools in scalability. To demonstrate the utility of Meneco, we applied it to two case studies. Its application to recent metabolic networks reconstructed for the brown algal model Ectocarpus siliculosus and an associated bacterium, Candidatus Phaeomarinobacter ectocarpi, revealed several candidate metabolic pathways for algal-bacterial interactions. Meneco was then used to reconstruct, from transcriptomic and metabolomic data, the first metabolic network for the microalga Euglena mutabilis.
These two case studies show that Meneco is a versatile tool to complete draft genome-scale metabolic networks produced from heterogeneous data, and to suggest relevant reactions that explain the metabolic capacity of a biological system.
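Topological gap-filling can be sketched with network expansion: starting from seed metabolites, a reaction fires once all its reactants are available, and the draft network is completed with the smallest set of database reactions that makes every target producible. Meneco encodes this problem in Answer Set Programming; the Python below instead brute-forces it on a hypothetical toy network, so it only conveys the principle and may differ from Meneco's exact semantics (for example in how reversibility, seeds and targets are handled). The helper names `rxn`, `scope` and `minimal_completion` are illustrative.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

# (name, reactants, products); reactions are treated as irreversible here.
Reaction = Tuple[str, FrozenSet[str], FrozenSet[str]]


def rxn(name: str, reactants: Set[str], products: Set[str]) -> Reaction:
    return name, frozenset(reactants), frozenset(products)


def scope(seeds: Set[str], reactions: List[Reaction]) -> Set[str]:
    """Network expansion: fire every reaction whose reactants are all available,
    adding its products, until no new metabolite appears."""
    reached = set(seeds)
    changed = True
    while changed:
        changed = False
        for _, reactants, products in reactions:
            if reactants <= reached and not products <= reached:
                reached |= products
                changed = True
    return reached


def minimal_completion(draft: List[Reaction], database: List[Reaction],
                       seeds: Set[str], targets: Set[str]) -> List[str]:
    """Smallest set of database reactions making every target producible.
    Brute force over subset sizes, so only usable on tiny examples."""
    for size in range(len(database) + 1):
        for extra in combinations(database, size):
            if targets <= scope(seeds, draft + list(extra)):
                return sorted(name for name, _, _ in extra)
    raise ValueError("targets are unreachable even with the whole database")


# Toy draft network with a gap between g6p and f6p (names are illustrative).
draft = [rxn("r1", {"glc"}, {"g6p"}), rxn("r3", {"f6p"}, {"pyr"})]
database = [rxn("r2", {"g6p"}, {"f6p"}), rxn("r4", {"x"}, {"pyr"}),
            rxn("r5", {"g6p"}, {"y"})]
print(minimal_completion(draft, database, {"glc"}, {"pyr"}))  # ['r2']
```

Only r2 is needed: r4 would also produce pyr, but its reactant x is never reachable from the seed, which is exactly the kind of reasoning that needs no stoichiometric coefficients.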
Subjects
Genomics/methods , Metabolic Networks and Pathways/genetics , Metabolome/genetics , Software , Transcriptome/genetics , Algorithms , Databases, Genetic , Escherichia coli/genetics , Escherichia coli/metabolism , Genome/genetics
ABSTRACT
Sea urchin eggs exhibit a cap-dependent increase in protein synthesis within minutes after fertilization. This rise in protein synthesis occurs at a constant rate for a large number of proteins translated from the different available mRNAs. Surprisingly, we found that cyclin B, a major cell-cycle regulator, follows a synthesis pattern that is distinct from that of the global protein population, so we developed a mathematical model to analyze this difference in biosynthesis kinetics. The model includes two pathways for cyclin B mRNA entry into the translational machinery: one from immediately available mRNA (mRNAcyclinB) and one from mRNA activated only after fertilization (XXmRNAcyclinB). Two coefficients, α and β, were added to fit the measured scales of global protein and cyclin B synthesis, respectively. The model was simplified to identify the synthesis parameters and to allow its simulation. The calculated parameters for activation of the specific cyclin B synthesis pathway after fertilization included a kinetic constant (ka) of 0.024 s⁻¹ for the activation of XXmRNAcyclinB, and a critical time interval (t2) of 42 min. The XXmRNAcyclinB form was also calculated to be largely dominant over the mRNAcyclinB form. Cyclin B is an example of a select protein whose translation is controlled by pathways distinct from those of housekeeping proteins, even though both involve the same cap-dependent initiation pathway. Therefore, this model should help provide insight into the signaling used for the biosynthesis of cyclin B and other select proteins. Mol. Reprod. Dev. 83: 1070-1082, 2016. © 2016 Wiley Periodicals, Inc.
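Reading the reported constant ka = 0.024 s⁻¹ as a first-order rate constant (an assumption of this sketch; the full model is not reproduced here), the activated fraction of the XXmRNAcyclinB pool grows as 1 - exp(-ka·t), with a half-time of ln 2 / ka. The critical interval t2 is deliberately left out because its exact role in the model is not described in the abstract.

```python
import math

K_A = 0.024  # s^-1, activation constant of XXmRNAcyclinB reported in the abstract


def fraction_activated(t_s: float, k: float = K_A) -> float:
    """Fraction of the mRNA pool activated after t_s seconds, assuming first-order
    kinetics starting at t = 0 (an assumption of this sketch)."""
    return 1.0 - math.exp(-k * t_s)


half_time = math.log(2) / K_A
print(f"half-time: {half_time:.1f} s")                # half-time: 28.9 s
print(f"after 2 min: {fraction_activated(120):.3f}")  # after 2 min: 0.944
```

Under this reading the activation step itself is fast (half-time of roughly 29 s) compared with the 42 min critical interval.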
Subjects
Cyclin B/biosynthesis , Fertilization , Models, Biological , Ovum/metabolism , Protein Biosynthesis/physiology , RNA, Messenger, Stored/metabolism , Animals , Female , Ovum/cytology , Sea Urchins/metabolism
ABSTRACT
Single-cell transcriptomic studies of differentiating systems provide meaningful insights, especially into human embryonic development and cell fate determination. We present an innovative method for modeling these intricate processes that leverages scRNA-seq data from various human developmental stages. Our method identifies pseudo-perturbations, since actual perturbations are unavailable owing to ethical and technical constraints. By integrating these pseudo-perturbations with prior knowledge of gene interactions, our framework generates stage-specific Boolean networks (BNs). We apply our method to the medium and late trophectoderm developmental stages and identify 20 pseudo-perturbations required to infer BNs. The resulting BN families delineate distinct regulatory mechanisms, enabling differentiation between these developmental stages. We show that our program outperforms an existing pseudo-perturbation identification tool. Our framework contributes to understanding human developmental processes and could potentially be applied to other developmental stages and research scenarios.
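To make the notion of a Boolean network concrete, the sketch below simulates a hypothetical three-node network under synchronous updating until it reaches an attractor; clamping an input node mimics a perturbation. It is a generic illustration, not the authors' pseudo-perturbation identification or network inference procedure, and the node names and rules are invented.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]
Rules = Dict[str, Callable[[State], bool]]


def synchronous_attractor(rules: Rules, init: State) -> List[State]:
    """Update all nodes simultaneously from `init` until a state repeats,
    then return the states of the cycle reached (length 1 = fixed point)."""
    order = sorted(rules)
    seen: Dict[Tuple[bool, ...], int] = {}
    trajectory: List[Tuple[bool, ...]] = []
    state = tuple(init[n] for n in order)
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        current = dict(zip(order, state))
        state = tuple(rules[n](current) for n in order)
    return [dict(zip(order, s)) for s in trajectory[seen[state]:]]


# Toy network: A is an input clamped to its initial value (e.g. by a perturbation),
# B is activated by A and inhibited by C, and C is activated by A.
rules: Rules = {
    "A": lambda s: s["A"],
    "B": lambda s: s["A"] and not s["C"],
    "C": lambda s: s["A"],
}
print(synchronous_attractor(rules, {"A": True, "B": False, "C": False}))
```

With A clamped on, the trajectory settles into the fixed point A = on, B = off, C = on; with A off, the all-off state is already a fixed point.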
Subjects
Embryonic Development , Gene Expression Regulation, Developmental , Gene Regulatory Networks , Humans , Embryonic Development/genetics , Single-Cell Analysis/methods , Transcriptome , Blastocyst/metabolism , Cell Differentiation/genetics , Computational Biology/methods
ABSTRACT
Among glucocorticoids (GCs), dexamethasone (Dex) is widely used in the treatment of multiple myeloma. However, despite a definite benefit, all patients relapse. Moreover, the molecular basis of glucocorticoid efficacy remains elusive. To determine the genomic response to Dex in myeloma cells, we generated bulk and single-cell multi-omics data and high-resolution contact maps of active enhancers and their target genes. We show that only a minority of glucocorticoid receptor-binding sites are associated with gains in enhancer activity, increased interaction loops, and transcriptional activity. We identified and characterized a predominant enhancer that is enriched in cohesin (RAD21) and becomes more accessible upon Dex exposure. Analysis of four gene-specific networks revealed the importance of the CTCF-cohesin couple and of synchronized opening of regulatory sequences for efficient transcription in response to Dex. Notably, these epigenomic changes are associated with cell-to-cell transcriptional heterogeneity, in particular for lineage-specific genes. As a consequence, BCL2L11, which encodes BIM and is critical for Dex-induced apoptosis, and CXCR4, which protects against chemotherapy-induced apoptosis, tend to be up-regulated in different cells. In summary, our work provides new insights into the molecular mechanisms involved in Dex escape.
Subjects
Dexamethasone , Multiple Myeloma , Humans , Dexamethasone/pharmacology , Multiple Myeloma/drug therapy , Multiple Myeloma/genetics , Neoplasm Recurrence, Local , Glucocorticoids , Apoptosis , Receptors, Glucocorticoid/genetics
ABSTRACT
Despite recent improvements in molecular techniques, biological knowledge remains incomplete. Any theorizing about living systems is therefore necessarily based on heterogeneous and partial information. Much current research has focused successfully on the qualitative behaviors of macromolecular networks. Nonetheless, such approaches cannot take into account available quantitative information such as time-series variations in protein concentration. The present work proposes a probabilistic modeling framework that integrates both kinds of information. Average-case analysis methods are used in combination with Markov chains to link qualitative information about transcriptional regulation to quantitative information about protein concentrations. The approach is illustrated by modeling the carbon starvation response in Escherichia coli. It accurately predicts the quantitative time-series evolution of several protein concentrations using only knowledge of discrete gene interactions and a small number of quantitative observations of a single protein concentration. The modeling technique also derives a ranking of interactions according to their importance during the experiment considered, and this ranking is confirmed by the literature. Our method is therefore principally novel in that it allows (i) a hybrid model integrating a qualitative discrete model with quantitative data to be built, even from a small amount of quantitative information, (ii) new quantitative predictions to be derived, (iii) the robustness and relevance of interactions with respect to phenotypic criteria to be precisely quantified, and (iv) the key features of the model to be extracted, which can guide the design of future experiments.
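The Markov-chain side of such a framework can be pictured as a probability distribution over qualitative levels that is propagated step by step, from which an expected quantitative value is read off. The transition probabilities and the concentration attached to each level below are hypothetical, and the authors' average-case analysis is not reproduced; the sketch only shows how a qualitative chain yields a quantitative time course.

```python
from typing import List

Matrix = List[List[float]]


def step(dist: List[float], P: Matrix) -> List[float]:
    """One step of a discrete-time Markov chain: new[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def expected_level(dist: List[float], P: Matrix, levels: List[float], n_steps: int) -> List[float]:
    """Expected quantitative level after 0..n_steps transitions."""
    out = []
    for _ in range(n_steps + 1):
        out.append(sum(p * x for p, x in zip(dist, levels)))
        dist = step(dist, P)
    return out


# Hypothetical chain over three qualitative levels (low, medium, high) of one protein;
# each row gives the transition probabilities out of a level and sums to 1.
P = [[0.8, 0.2, 0.0],
     [0.1, 0.7, 0.2],
     [0.0, 0.1, 0.9]]
levels = [0.0, 1.0, 2.0]  # hypothetical concentrations attached to each level
print(expected_level([1.0, 0.0, 0.0], P, levels, 2))
```

Starting from the low level, the expected concentration rises to about 0.2 after one step and 0.38 after two; fitting such quantities to a few measurements is one way a qualitative model can be tied to quantitative data.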
Subjects
Algorithms , Escherichia coli/genetics , Gene Regulatory Networks , Carbon/metabolism , Escherichia coli/metabolism , Gene Expression Regulation , Models, Statistical
ABSTRACT
Understanding lineage specification during human pre-implantation development is a gateway to improving assisted reproductive technologies and stem cell research. Here we employ pseudotime analysis of single-cell RNA sequencing (scRNA-seq) data to reconstruct early mouse and human embryo development. Using time-lapse imaging of annotated embryos, we provide an integrated, ordered, and continuous analysis of transcriptomic changes throughout human development. We reveal that human trophectoderm/inner cell mass transcriptomes diverge at the transition from the B2 to the B3 blastocyst stage, just before blastocyst expansion. We explore the dynamics of the fate markers IFI16 and GATA4 and show that they gradually become mutually exclusive upon establishment of the epiblast and primitive endoderm fates, respectively. We also provide evidence that NR2F2 marks trophectoderm maturation, which initiates on the polar side and subsequently spreads to all cells after implantation. Our study pinpoints the precise timing of lineage specification events in the human embryo and identifies transcriptomic hallmarks and cell fate markers.
Subjects
Embryonic Development , Transcriptome , Animals , Blastocyst , Cell Lineage/genetics , Embryonic Development/genetics , Germ Layers , Humans , Mice , Transcriptome/genetics
ABSTRACT
Human trophoblast stem cells (hTSCs) derived from blastocysts and first-trimester cytotrophoblasts offer an unprecedented opportunity to study the placenta. However, access to human embryos and first-trimester placentas is limited, thus preventing the establishment of hTSCs from diverse genetic backgrounds associated with placental disorders. Here, we show that hTSCs can be generated from numerous genetic backgrounds using post-natal cells via two alternative methods: (1) somatic cell reprogramming of adult fibroblasts with OCT4, SOX2, KLF4, MYC (OSKM) and (2) cell fate conversion of naive and extended pluripotent stem cells. The resulting induced/converted hTSCs recapitulated hallmarks of hTSCs, including long-term self-renewal, expression of specific transcription factors, transcriptomic signature, and the potential to differentiate into syncytiotrophoblast and extravillous trophoblast cells. We also clarified the developmental stage of hTSCs and showed that these cells resemble day 8 cytotrophoblasts. Altogether, hTSC lines of diverse genetic origins open the possibility of modeling both placental development and diseases in a dish.
Subjects
Pluripotent Stem Cells/metabolism , Trophoblasts/metabolism , Cell Differentiation , Female , Humans , Pregnancy
ABSTRACT
Understanding the interactions between microbial communities and their environment well enough to predict diversity on the basis of physicochemical parameters is a fundamental pursuit of microbial ecology that still eludes us. However, modeling microbial communities is problematic, because (i) communities are complex, (ii) most descriptions are qualitative, and (iii) quantitative understanding of the way communities interact with their surroundings remains incomplete. One approach to overcoming such complications is the integration of partial qualitative and quantitative descriptions into more complex networks. Here we outline the development of a probabilistic framework, based on Event Transition Graph (ETG) theory, to predict microbial community structure across observed chemical data. Using reverse engineering, we derive probabilities from the ETG that accurately represent observations from experiments and predict putative constraints on communities within dynamic environments. These predictions can feed back into the future development of field experiments by emphasizing the most important functional reactions, and associated microbial strains, required to characterize microbial ecosystems.
ABSTRACT
Induced pluripotent stem cells (iPSCs) have considerably impacted human developmental biology and regenerative medicine, notably because they circumvent the use of cells of embryonic origin and offer the potential to generate patient-specific pluripotent stem cells. However, conventional reprogramming protocols produce developmentally advanced, or primed, human iPSCs (hiPSCs), restricting their use to modeling post-implantation human development. Hence, there is a need for hiPSCs resembling the preimplantation naive epiblast. Here, we develop a method to generate naive hiPSCs directly from somatic cells, using OKMS overexpression and specific culture conditions, further enabling parallel generation of their isogenic primed counterparts. We benchmark naive hiPSCs against the human preimplantation epiblast and reveal remarkable concordance in their transcriptome, dependency on mitochondrial respiration, and X-chromosome status. Collectively, our results are essential for understanding pluripotency regulation throughout preimplantation development and generate new opportunities for disease modeling and regenerative medicine.
Subjects
Blastocyst/cytology , Embryonic Stem Cells/cytology , Germ Layers/cytology , Induced Pluripotent Stem Cells/cytology , Animals , Blastocyst/metabolism , Cells, Cultured , Cellular Reprogramming/genetics , Cellular Reprogramming Techniques , Embryonic Development/genetics , Embryonic Stem Cells/metabolism , Female , Fibroblasts/cytology , Fibroblasts/metabolism , Germ Layers/metabolism , Humans , Induced Pluripotent Stem Cells/metabolism , Male , Mice , Transcriptome
ABSTRACT
Susceptibility to sporadic colorectal cancer (CRC) is generally thought to be the sum of complex interactions between environmental and genetic factors, each contributing independently and producing only a modest effect on the whole phenomenon. However, to date, most research has overlooked the notion of interaction and merely focused on separate analyses of risk factors to highlight associations with CRC. By contrast, we have chosen a combinative approach here to explore the joint effects of several factors at a time. Through an association study based on 1,023 cases and 1,121 controls, we examined the influence on CRC risk of environmental factors co-analyzed with combinations of six single nucleotide polymorphisms (SNPs) located in cytochrome P450 genes (c.-163A>C and c.1548T>C in CYP1A2, g.-1293G>C and g.-1053C>T in CYP2E1, c.1294C>G in CYP1B1, and c.430C>T in CYP2C9). Whereas separate analyses of the SNPs showed no effect on CRC risk, three allelic variant combinations were found to be associated with a significant increase in CRC risk in interaction with excessive red meat consumption, thereby exacerbating the intrinsic procarcinogenic effect of this dietary factor. One of these three predisposing combinations was also shown to interact positively with obesity. Provided that they are validated, our results suggest the need to develop robust combinative methods to improve genetic investigations into susceptibility to CRC.
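The basic quantity behind such a case-control association is the odds ratio of a 2×2 table. The sketch below computes it with a Woolf (log-based) 95% confidence interval. The counts are invented, chosen only to add up to the study's 1,023 cases and 1,121 controls, and the study's actual interaction model (often a logistic regression with interaction terms in this kind of analysis) is not described in the abstract.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Woolf confidence interval for a 2x2 table:
        a = exposed cases,    b = unexposed cases,
        c = exposed controls, d = unexposed controls.
    All four counts must be positive."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    low = math.exp(math.log(odds_ratio) - z * se_log)
    high = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, low, high


# Hypothetical counts: "exposed" could mean carrying a given allelic combination
# and reporting high red meat intake; 120 + 903 = 1,023 cases, 90 + 1031 = 1,121 controls.
or_, low, high = odds_ratio_ci(120, 903, 90, 1031)
print(f"OR = {or_:.2f} (95% CI {low:.2f}-{high:.2f})")
```

An interval lying entirely above 1 would indicate an increased risk in the exposed stratum; testing whether that increase exceeds what each factor explains alone is what an interaction analysis adds.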
Subjects
Colorectal Neoplasms/etiology , Cytochrome P-450 Enzyme System/genetics , Diet , Meat , Polymorphism, Single Nucleotide/genetics , Adult , Aged , Aged, 80 and over , Alleles , Aryl Hydrocarbon Hydroxylases/genetics , Case-Control Studies , Cohort Studies , Colorectal Neoplasms/epidemiology , Cytochrome P-450 CYP1A2/genetics , Cytochrome P-450 CYP1B1 , Cytochrome P-450 CYP2C9 , Cytochrome P-450 CYP2E1/genetics , Female , France/epidemiology , Genetic Predisposition to Disease , Genotype , Humans , Male , Middle Aged , Risk Factors
ABSTRACT
Designing probabilistic reaction models and determining their stochastic kinetic parameters are major issues in systems biology. To assist in the construction of reaction network models, we introduce a logic that allows one to express asymptotic properties of the steady-state stochastic dynamics of a reaction network. Essentially, the formulas can express properties of expectations, variances, and covariances. If a formula encoding experimental observations of the system is unsatisfiable, the reaction network model can be rejected. We demonstrate that deciding the satisfiability of a formula is NP-hard, but we provide a decision method based on solving systems of polynomial constraints. We illustrate our method on a toy example.
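A toy case shows how an unsatisfiable formula leads to rejecting a model. For the hypothetical immigration-death network 0 → X (rate k), X → 0 (rate γ·X), the stationary distribution is Poisson with mean k/γ, so the variance always equals the mean. An observation formula such as E[X] ≥ 10 and Var[X] ≤ E[X]/2 can therefore never hold, whatever k and γ are. The sketch checks this on a parameter grid; it is not the authors' logic or decision procedure, which handles general systems of polynomial constraints.

```python
from fractions import Fraction
from typing import Tuple


def stationary_moments(k: Fraction, gamma: Fraction) -> Tuple[Fraction, Fraction]:
    """Steady-state (mean, variance) of X for 0 -> X at rate k and X -> 0 at rate gamma * X.
    The stationary law is Poisson(k / gamma), so the mean and variance coincide."""
    mean = k / gamma
    return mean, mean


def observation_formula(mean: Fraction, var: Fraction) -> bool:
    """Hypothetical observations: E[X] >= 10 and Var[X] <= E[X] / 2."""
    return mean >= 10 and var <= mean / 2


# Since Var[X] = E[X] for this model, Var <= E/2 forces E <= 0, contradicting E >= 10:
# the formula is unsatisfiable and the model would be rejected. The grid only illustrates this.
satisfiable_on_grid = any(
    observation_formula(*stationary_moments(Fraction(k), Fraction(g)))
    for k in range(1, 101)
    for g in range(1, 11)
)
print("satisfiable on grid:", satisfiable_on_grid)
```

The formula itself is not contradictory (a distribution with mean 10 and variance 4 satisfies it); it is the model's constraint Var = E that makes it unsatisfiable, which is the situation in which the logic rejects a reaction network.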
Subjects
Algorithms , Metabolic Networks and Pathways , Models, Statistical , Systems Biology , Humans , Kinetics , Models, Chemical , Stochastic Processes
ABSTRACT
Interplay within microbial communities impacts ecosystems on several scales, and elucidating its consequences is a difficult task in ecology. In particular, the integration of genome-scale data within quantitative models of microbial ecosystems remains elusive. This study advocates the use of constraint-based modeling to build predictive models from recent high-resolution -omics datasets. Following recent studies that have demonstrated the accuracy of constraint-based models (CBMs) for simulating single-strain metabolic networks, we sought to study microbial ecosystems as combinations of single-strain metabolic networks that exchange nutrients. This study presents two multi-objective extensions of CBMs for modeling communities: multi-objective flux balance analysis (MO-FBA) and multi-objective flux variability analysis (MO-FVA). Both methods were applied to a hot spring mat model ecosystem. As a result, multiple trade-offs between nutrients and growth rates, as well as thermodynamically favorable relative abundances at the community level, were highlighted. We expect this approach to be used for integrating genomic information in microbial ecosystems. Future models built this way will provide insights into behaviors (including diversity) that take place at the ecosystem scale.
Subjects
Genome, Microbial , Microbiota/genetics , Models, Theoretical , Hot Springs/microbiology , Metabolic Networks and Pathways , Microbiota/physiology
ABSTRACT
BACKGROUND: When studying metabolism at the organ level, a major challenge is to understand the matter exchanges between the input and output components of the system. For example, in nutrition, biochemical models have been developed to study the metabolism of the mammary gland in relation to the synthesis of milk components. These models were designed to account for the quantitative constraints observed on the inputs and outputs of the system. In these models, a compatible flux distribution is first selected. Alternatively, an infinite family of compatible sets of flux rates may have to be studied when the constraints raised by observations are insufficient to identify a single flux distribution. The precursors of output nutrients are traced back with analyses similar to the computation of yield rates. However, the computed quantitative contributions of precursors may lack precision, mainly because some precursors are involved in the composition of several nutrients and because some metabolites are cycled in loops. RESULTS: We formally modeled the quantitative allocation of input nutrients among the branches of the metabolic network (AIO). It corresponds to yield information which, if standardized across all the outputs of the system, allows a precise quantitative understanding of their precursors. By solving nonlinear optimization problems, we introduced a method to study the variability of AIO coefficients when parsing the space of flux distributions that are compatible with both model stoichiometry and experimental data. Applied to a model of the metabolism of the mammary gland, our method made it possible to distinguish the effects of different nutritional treatments, although it cannot be proved that the mammary gland optimizes a specific linear combination of flux variables, including those based on energy. Altogether, our study indicated that the mammary gland possesses considerable metabolic flexibility.
CONCLUSION: Our method enables the study of the variability of a metabolic network with respect to efficiency (i.e., yield rates). It allows a quantitative comparison of the respective contributions of precursors to the production of a set of nutrients by a metabolic network, regardless of the choice of flux distribution within the different branches of the network.
Subjects
Metabolic Networks and Pathways , Models, Biological , Systems Biology/methods , Fatty Acids/chemistry , Fatty Acids/metabolism , Humans , Mammary Glands, Human/metabolism , Oxidation-Reduction
ABSTRACT
Fertilization of sea urchin eggs involves an increase in protein synthesis associated with a decrease in the amount of the translation initiation inhibitor 4E-BP. A very simple reaction model for the regulation of protein synthesis was built and used to simulate the physiological changes in the total amount of 4E-BP observed over time after fertilization. Our study showed that two changes occurring at fertilization are necessary to fit the experimental data. The first change was an 8-fold increase in the dissociation parameter (koff1) of the eIF4E:4E-BP complex. The second was a substantial 32.5-fold activation of the degradation mechanism of the protein 4E-BP. Additionally, the changes in both processes should occur within a 5-min interval post-fertilization. To validate the model, we checked that the kinetics of the predicted 4.2-fold increase in eIF4E:eIF4G complex concentration at fertilization matched the increase in protein synthesis experimentally observed after fertilization (6.6-fold, SD = 2.3, n = 8). The minimal model was also used to simulate changes observed after fertilization in the presence of rapamycin, a FRAP/mTOR inhibitor. The model showed that destabilization of the eIF4E:4E-BP complex was impacted and, surprisingly, that the mechanism of 4E-BP degradation was also strongly affected, suggesting that both processes are controlled by the protein kinase FRAP/mTOR.
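A steady-state caricature of the two fitted changes can be written with the standard equilibrium binding equation for eIF4E + 4E-BP ⇌ eIF4E:4E-BP, with Kd = koff/kon. All concentrations and Kd below are hypothetical; only the 8-fold and 32.5-fold factors come from the abstract, and the sketch assumes the degradation increase lowers total 4E-BP 32.5-fold (i.e. constant synthesis). It ignores the kinetics and the eIF4E:eIF4G complex that the authors used for validation.

```python
import math


def free_eif4e(e_tot: float, bp_tot: float, kd: float) -> float:
    """Free eIF4E at binding equilibrium eIF4E + 4E-BP <-> complex, with Kd = koff / kon.

    The complex concentration c solves (e_tot - c) * (bp_tot - c) = kd * c; the smaller
    root of that quadratic is the physically meaningful one.
    """
    s = e_tot + bp_tot + kd
    c = (s - math.sqrt(s * s - 4.0 * e_tot * bp_tot)) / 2.0
    return e_tot - c


# Hypothetical values in arbitrary units; only the two fold changes come from the abstract.
E_TOT, BP_TOT, KD = 1.0, 2.0, 0.1
before = free_eif4e(E_TOT, BP_TOT, KD)
# After fertilization: koff (hence Kd) x 8, and 4E-BP degradation x 32.5, which lowers
# steady-state total 4E-BP 32.5-fold if its synthesis rate is unchanged (an assumption).
after = free_eif4e(E_TOT, BP_TOT / 32.5, KD * 8)
print(f"free eIF4E: {before:.3f} -> {after:.3f} ({after / before:.1f}-fold)")
```

With these made-up numbers most eIF4E switches from 4E-BP-bound to free; the fold change depends entirely on the assumed concentrations and is not comparable to the 4.2-fold prediction quoted above.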
ABSTRACT
Brown algae belong to a phylogenetic lineage distantly related to land plants and animals. They are almost exclusively found in the intertidal zone, a harsh and frequently changing environment where organisms are subjected to both marine and terrestrial constraints. In relation to their unique evolutionary history and their habitat, they feature several peculiarities, including in their primary and secondary metabolism. The establishment of Ectocarpus siliculosus as a model organism for brown algae has provided a framework in which several omics techniques have been developed, in particular to study the response of these organisms to abiotic stresses. With the recent publication of medium- to high-throughput profiling data, it is now possible to envision integrating observations at the cellular scale to apply systems biology approaches. As a first step, we propose a protocol focusing on integrating heterogeneous knowledge gained on brown algal metabolism. The resulting abstraction of the system will then help in understanding how brown algae cope with changes in abiotic parameters within their unique habitat, and in deciphering some of the mechanisms underlying their (1) acclimation and (2) adaptation, which are consequences of (1) the behavior and (2) the topology, respectively, of the system resulting from the integrative approach.
Subjects
Acclimatization , Adaptation, Physiological , Environment , Phaeophyceae/physiology , Systems Biology/methods , Biological Evolution , Ecosystem , Genomics , Proteomics , Stress, Physiological
ABSTRACT
Gene regulation involves many mechanisms. Their identification is a crucial task for constructing regulatory networks and, in many cases, is necessary to understand pathologies. This requires the identification of the transcription factors that play a role in regulation. Numerous motif discovery tools are now available. Efficiently combining their results is useful for comparing and clustering these motifs in order to reduce redundancy and to identify the corresponding transcription factors. We developed a method that produces, compares, and clusters a set of motifs and identifies close motifs in databases such as JASPAR and the public version of Transfac. Unlike previous comparison methods, in which each matrix column is compared independently, we developed a global method to compare motifs that also helps to reduce the number of false positives. We also propose an original graph motif model that generalizes the classical position-specific pattern matrices. Finally, we present an application of our method to the study of ChIP-chip data sets in the context of a eukaryotic organism.
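The classical position-specific matrix that the graph motif model generalizes can be sketched as a log-odds scoring matrix built from aligned binding sites and slid along a sequence. The sites, pseudocount and uniform background below are illustrative assumptions, and the authors' global motif comparison and graph model are not reproduced; the sketch only shows the baseline representation in which each column is treated independently.

```python
import math
from typing import Dict, List, Tuple

ALPHABET = "ACGT"
Pssm = List[Dict[str, float]]


def build_pssm(sites: List[str], pseudocount: float = 1.0, background: float = 0.25) -> Pssm:
    """Log2-odds position-specific scoring matrix from equal-length aligned sites,
    with a uniform background and a pseudocount added to every base."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    matrix: Pssm = []
    for i in range(width):
        column = [s[i] for s in sites]
        total = len(column) + pseudocount * len(ALPHABET)
        matrix.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                       for b in ALPHABET})
    return matrix


def best_hit(matrix: Pssm, seq: str) -> Tuple[int, float]:
    """Offset and score of the highest-scoring window of `seq` (first one on ties)."""
    width = len(matrix)
    scores = [sum(matrix[j][seq[i + j]] for j in range(width))
              for i in range(len(seq) - width + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical aligned binding sites and a sequence to scan.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
pssm = build_pssm(sites)
print(best_hit(pssm, "GGGTGACTCAGGG"))
```

The best window starts at offset 3 (TGACTCA), the consensus of the sites; because every column is scored on its own, dependencies between positions are invisible to this model, which is the limitation a graph-based motif representation can address.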