ABSTRACT
Gene expression programs determine cell fate in embryonic development and their dysregulation results in disease. Transcription factors (TFs) control gene expression by binding to enhancers, but how TFs select and activate their target enhancers is still unclear. HOX TFs share conserved homeodomains with highly similar sequence recognition properties, yet they impart the identity of different animal body parts. To understand how HOX TFs control their specific transcriptional programs in vivo, we compared HOXA2 and HOXA3 binding profiles in the mouse embryo. HOXA2 and HOXA3 directly cooperate with TALE TFs and selectively target different subsets of a broad TALE chromatin platform. Binding of HOX and tissue-specific TFs converts low affinity TALE binding into high confidence, tissue-specific binding events, which bear the mark of active enhancers. We propose that HOX paralogs, alone and in combination with tissue-specific TFs, generate tissue-specific transcriptional outputs by modulating the activity of TALE TFs at selected enhancers.
Subjects
Enhancer Elements, Genetic , Homeodomain Proteins/metabolism , Amino Acid Motifs , Animals , Gene Expression Regulation, Developmental , Homeodomain Proteins/chemistry , Homeodomain Proteins/genetics , Mice , Organ Specificity , Protein Binding , Transcription Factors/metabolism , Transcriptional Activation , Zebrafish
ABSTRACT
Transcription factors (TFs) can bind DNA in a cooperative manner, enabling a mutual increase in occupancy. Through this type of interaction, alternative binding sites can be preferentially bound in different tissues to regulate tissue-specific expression programmes. Recently, deep learning models have become state-of-the-art in various pattern analysis tasks, including applications in the field of genomics. We therefore investigate the application of convolutional neural network (CNN) models to the discovery of sequence features determining cooperative and differential TF binding across tissues. We analyse ChIP-seq data from MEIS, TFs which are broadly expressed across mouse branchial arches, and HOXA2, which is expressed in the second and more posterior branchial arches. By developing models predictive of MEIS differential binding in all three tissues, we are able to accurately predict HOXA2 co-binding sites. We evaluate transfer-like and multitask approaches to regularizing the high-dimensional classification task with a larger regression dataset, allowing for the creation of deeper and more accurate models. We test the performance of perturbation and gradient-based attribution methods in identifying the HOXA2 sites from differential MEIS data. Our results show that deep regularized models significantly outperform shallow CNNs as well as k-mer methods in the discovery of tissue-specific sites bound in vivo.
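To make the idea of a convolutional filter scanning a DNA sequence concrete, the following is a minimal sketch and not the models described in the abstract. It assumes a single hand-set filter in place of learned weights, and the motif `TGACAG` and the example sequence are chosen for illustration only. The function names `one_hot`, `conv1d_scan` and `relu_max_pool` are hypothetical.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T); unknown bases become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv1d_scan(encoded, kernel):
    """Slide one filter along the encoded sequence (no padding, stride 1)."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            row = encoded[start + offset]
            total += sum(x * w for x, w in zip(row, kernel[offset]))
        scores.append(total)
    return scores

def relu_max_pool(scores):
    """Apply ReLU to each position, then take the global maximum."""
    return max((max(s, 0.0) for s in scores), default=0.0)

# Assumption: the filter is simply the one-hot encoding of an illustrative motif.
motif_filter = one_hot("TGACAG")
sequence = "CCATGACAGGT"

scores = conv1d_scan(one_hot(sequence), motif_filter)
best = relu_max_pool(scores)
best_position = scores.index(max(scores))
print(best_position, best)
```

In a trained CNN the filter weights are learned from labelled regions rather than set by hand, and many filters and layers are stacked; the scan and pooling steps are the part this sketch illustrates.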
Subjects
Branchial Region/metabolism , Deep Learning , Homeodomain Proteins/genetics , Meis1 Protein/genetics , RNA/genetics , Animals , Binding Sites , Branchial Region/growth & development , Chromatin Immunoprecipitation , Computational Biology/methods , Computational Biology/statistics & numerical data , Embryo, Mammalian , Gene Expression Regulation, Developmental , High-Throughput Nucleotide Sequencing , Homeodomain Proteins/metabolism , Mice , Models, Genetic , Meis1 Protein/metabolism , Organ Specificity , Poisson Distribution , Protein Binding , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA/metabolism
ABSTRACT
BACKGROUND: Although different protein-protein physical interaction (PPI) datasets exist for Escherichia coli, no common methodology exists to integrate these datasets and extract reliable modules reflecting the existing biological processes and protein complexes. The naïve Bayesian formula is a widely accepted method for integrating different PPI datasets into a single weighted PPI network, but determining proper weights in such a network is still a major problem. RESULTS: In this paper, we propose a new methodology to integrate various physical PPI datasets into a single weighted PPI network in such a way that the modules detected in the PPI network exhibit the highest similarity to available functional modules. We used co-expression modules as functional modules, and we showed that direct functional modules detected from Gene Ontology terms could be used as an alternative dataset. After running this integration methodology over six different physical PPI datasets, orthologous high-confidence interactions from a related organism and two AP-MS PPI datasets gained high weights in the integrated networks, while the weights for one AP-MS PPI dataset and two other datasets derived from public databases converged to zero. The majority of detected modules were organized around one or a few hub proteins. Still, a large number of highly interacting protein modules were detected which are functionally relevant and are likely to form protein complexes. CONCLUSIONS: We provide a new high confidence protein complex prediction method supported by functional studies and literature mining.
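As a rough illustration of how a naïve Bayesian combination turns per-dataset weights into a single edge weight, the sketch below sums log-likelihood-ratio weights over the datasets that report an interaction and converts the result to a posterior probability. This is an assumption about the general form only; the abstract does not give the authors' exact formula, and the dataset names, weight values and the function `posterior_probability` are hypothetical.

```python
from math import exp, log

def posterior_probability(supporting, log_lr_weights, prior=0.5):
    """Combine independent evidence: posterior log-odds = prior log-odds + sum of weights.

    Only datasets that report the interaction contribute (an assumption of this sketch).
    """
    log_odds = log(prior / (1.0 - prior))
    log_odds += sum(log_lr_weights[name] for name in supporting)
    return 1.0 / (1.0 + exp(-log_odds))

# Hypothetical weights; a weight of 0.0 means the dataset adds no evidence.
weights = {"apms_1": log(3.0), "orthologs": log(2.0), "database_1": 0.0}

one_source = posterior_probability(["apms_1"], weights)
two_sources = posterior_probability(["apms_1", "orthologs"], weights)
uninformative = posterior_probability(["database_1"], weights)
print(one_source, two_sources, uninformative)
```

Under this form, a dataset whose weight converges to zero leaves the edge weight at the prior, which is one way to read the statement that some datasets stopped contributing to the integrated network.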
Subjects
Bacterial Proteins/metabolism , Escherichia coli/metabolism , Protein Interaction Mapping/methods , Algorithms , Bacterial Proteins/chemistry , Bayes Theorem , Chromatography, Affinity , Mass Spectrometry , Protein Interaction Maps
ABSTRACT
In polymicrobial communities where several species co-exist in a certain niche, and consequently the possibility of interactions among species is very high, gene expression data sources can give better insights into the underlying adaptation mechanisms adopted by bacteria. Furthermore, several possible synergistic or antagonistic interactions among species can be investigated through gene expression comparisons. The lung is one of the habitats harboring several distinct pathogens during severe pulmonary disorders such as chronic obstructive pulmonary disease (COPD) and cystic fibrosis (CF). Expression data analysis of these lung residents can help to gain a better understanding of how these species interact with each other within the host cells. The first part of this paper introduces the available data sources for the major bacteria responsible for causing lung diseases and their genomic relations. The second part focuses on studies concerning gene expression analyses of these species.
ABSTRACT
Increasingly large-scale expression compendia for different species are becoming available. By exploiting the modularity of the coexpression network, these compendia can be used to identify biological processes for which the expression behavior is conserved over different species. However, comparing module networks across species is not trivial. The definition of a biologically meaningful module is not a fixed one, and changing the distance threshold that defines the degree of coexpression gives rise to different modules. As a result, when comparing modules across species, many different partially overlapping conserved module pairs exist, and deciding which pair is most relevant is hard. Therefore, we developed a method referred to as conserved modules across organisms (COMODO) that uses an objective selection criterion to identify conserved expression modules between two species. The method takes microarray data and a gene homology map as input and returns pairs of conserved modules; it searches for the pair of modules for which the number of shared homologs is statistically most significant relative to the size of the linked modules. To demonstrate its principle, we applied COMODO to study coexpression conservation between the two well-studied bacteria Escherichia coli and Bacillus subtilis. COMODO is available at: http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Information_Zarrineh_2010/comodo/index.html.
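The selection criterion described above, choosing the module pair whose number of shared homologs is most significant relative to module size, can be sketched with a hypergeometric tail probability. The abstract does not name the test COMODO uses, so the hypergeometric model here is an assumption, and the function `hypergeom_tail`, the candidate names and all counts are hypothetical.

```python
from math import comb

def hypergeom_tail(shared, size_a, size_b, total):
    """P(X >= shared) when size_b homolog pairs are drawn from `total`,
    of which size_a belong to the module in the first species."""
    upper = min(size_a, size_b)
    numerator = sum(
        comb(size_a, k) * comb(total - size_a, size_b - k)
        for k in range(shared, upper + 1)
    )
    return numerator / comb(total, size_b)

# Hypothetical candidates: (name, shared homologs, module size in species 1, module size in species 2)
total_homolog_pairs = 100
candidates = [
    ("pair_small", 3, 5, 5),
    ("pair_large", 4, 20, 20),
]

p_values = {
    name: hypergeom_tail(shared, size_a, size_b, total_homolog_pairs)
    for name, shared, size_a, size_b in candidates
}
best_pair = min(p_values, key=p_values.get)
print(best_pair, p_values[best_pair])
```

The example shows why raw overlap alone is not enough: the larger pair shares more homologs, but sharing four out of twenty is close to what chance predicts, whereas three out of five is not.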
Subjects
Gene Expression Profiling/methods , Bacillus subtilis/genetics , Bacillus subtilis/metabolism , Chromosome Mapping , Cluster Analysis , Escherichia coli/genetics , Escherichia coli/metabolism , Evolution, Molecular , Gene Expression Regulation, Bacterial , Genes, Essential , Oligonucleotide Array Sequence Analysis , Operon , Probability
ABSTRACT
A lingering question in developmental biology has centered on how transcription factors with widespread distribution in vertebrate embryos can perform tissue-specific functions. Here, using the murine hindlimb as a model, we investigate the elusive mechanisms whereby PBX TALE homeoproteins, viewed primarily as HOX cofactors, attain context-specific developmental roles despite ubiquitous presence in the embryo. We first demonstrate that mesenchymal-specific loss of PBX1/2 or the transcriptional regulator HAND2 generates similar limb phenotypes. By combining tissue-specific and temporally controlled mutagenesis with multi-omics approaches, we reconstruct a gene regulatory network (GRN) at organismal-level resolution that is collaboratively directed by PBX1/2 and HAND2 interactions in subsets of posterior hindlimb mesenchymal cells. Genome-wide profiling of PBX1 binding across multiple embryonic tissues further reveals that HAND2 interacts with subsets of PBX-bound regions to regulate limb-specific GRNs. Our research elucidates fundamental principles by which promiscuous transcription factors cooperate with cofactors that display domain-restricted localization to instruct tissue-specific developmental programs.
Subjects
Gene Regulatory Networks , Transcription Factors , Animals , Mice , Homeodomain Proteins/metabolism , Pre-B-Cell Leukemia Transcription Factor 1/genetics , Transcription Factors/genetics , Transcription Factors/metabolism
ABSTRACT
The highly conserved HOX homeodomain (HD) transcription factors (TFs) establish the identity of different body parts along the antero-posterior axis of bilaterian animals. Segment diversification and the morphogenesis of different structures are achieved by generating precise patterns of HOX expression along the antero-posterior axis and by the ability of different HOX TFs to instruct unique and specific transcriptional programs. However, HOX binding properties in vitro, characterised by the recognition of similar AT-rich binding sequences, do not account for the ability of different HOX TFs to instruct segment-specific transcriptional programs. To address this problem, we previously compared HOXA2 and HOXA3 binding in vivo. Here, we explore whether sequence motif enrichments observed in vivo are explained by binding affinities in vitro. Unexpectedly, we found that the most highly enriched motif in HOXA2 peaks was not recognised by HOXA2 in vitro, highlighting the importance of investigating HOX binding in its physiological context. We also report the ability of HOXA2 and HOXA3 to heterodimerise, which may have functional consequences for the HOX patterning function in vivo.
ABSTRACT
Genome-wide association studies have identified genetic variation contributing to complex disease risk. However, assigning causal genes and mechanisms has been more challenging because disease-associated variants are often found in distal regulatory regions with cell-type specific behaviours. Here, we collect ATAC-seq, Hi-C, Capture Hi-C and nuclear RNA-seq data in stimulated CD4+ T cells over 24 h, to identify functional enhancers regulating gene expression. We characterise changes in DNA interaction and activity dynamics that correlate with changes in gene expression, and find that the strongest correlations are observed within 200 kb of promoters. Using rheumatoid arthritis as an example of T cell mediated disease, we demonstrate interactions of expression quantitative trait loci with target genes, and confirm assigned genes or show complex interactions for 20% of disease associated loci, including FOXO1, which we confirm using CRISPR/Cas9.
Subjects
Arthritis, Rheumatoid/genetics , CD4-Positive T-Lymphocytes/metabolism , Chromatin , Forkhead Box Protein O1/genetics , Autoimmune Diseases/genetics , CD4-Positive T-Lymphocytes/cytology , Chromatin/chemistry , Chromatin/genetics , Enhancer Elements, Genetic , Forkhead Box Protein O1/metabolism , Gene Expression , Genetic Predisposition to Disease , Genome-Wide Association Study , HEK293 Cells , Humans , Primary Cell Culture , Promoter Regions, Genetic , Quantitative Trait Loci
ABSTRACT
How the genome activates or silences transcriptional programmes governs organ formation. Little is known about this in human embryos, which undermines our ability to benchmark the fidelity of stem cell differentiation or cell programming, or to interpret the pathogenicity of noncoding variation. Here, we study histone modifications across thirteen tissues during human organogenesis. We integrate the data with transcription to build an overview of how the human genome differentially regulates alternative organ fates, including by repression. Promoters from nearly 20,000 genes partition into discrete states. Key developmental gene sets are actively repressed outside of the appropriate organ without obvious bivalency. Candidate enhancers, functional in zebrafish, allow imputation of tissue-specific and shared patterns of transcription factor binding. Overlaying more than 700 noncoding mutations from patients with developmental disorders allows correlation to unanticipated target genes. Taken together, the data provide a comprehensive genomic framework for investigating normal and abnormal human development.
Subjects
Developmental Disabilities/genetics , Epigenesis, Genetic , Organogenesis/genetics , Animals , Animals, Genetically Modified , Databases, Genetic , Enhancer Elements, Genetic , Gene Expression Regulation, Developmental , Histone Code/genetics , Humans , Models, Genetic , Mutation , Organogenesis/physiology , Promoter Regions, Genetic , Tissue Distribution , Transcription Factors/metabolism , Zebrafish/embryology , Zebrafish/genetics
ABSTRACT
Link prediction is a promising research area for modeling various types of networks and has mainly focused on predicting missing links. Link prediction methods may be valuable for describing brain connectivity as it changes in Alzheimer's disease (AD) and its precursor, mild cognitive impairment (MCI). Here, we analyzed 3-tesla whole-brain diffusion-weighted images from 202 participants in the Alzheimer's Disease Neuroimaging Initiative (ADNI): 50 healthy controls, 72 with early MCI (eMCI), 38 with late MCI (lMCI) and 42 AD patients. We introduce a novel approach, Mixed Link Prediction (MLP), to test and define the percentage of predictability of each more advanced stage of dementia from its previous, less impaired stage, in the simplest case. Using well-known link prediction algorithms as the core of MLP, we propose a new approach that predicts stages of cognitive impairment by simultaneously adding and removing links in the brain networks of elderly individuals. We found that the optimal algorithm, called "Adamic and Adar", had the best fit and most accurately predicted the stages of AD from their previous stage. Compared to other link prediction algorithms, which mainly predict only added links, our proposed approach can more inclusively simulate the brain changes during disease by both adding and removing links of the network. Our results are also in line with computational neuroimaging and clinical findings, and leave room for further improvement.
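For readers unfamiliar with the "Adamic and Adar" index named above, the sketch below computes it on a toy undirected graph: each common neighbour z of two nodes contributes 1/log(degree(z)), so rarely connected shared neighbours count more. This illustrates the standard index only, not the authors' MLP procedure; the five-node graph and the function names are hypothetical.

```python
from itertools import combinations
from math import log

def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def adamic_adar(adjacency, x, y):
    """Sum 1/log(degree) over the common neighbours of x and y."""
    common = adjacency.get(x, set()) & adjacency.get(y, set())
    # A common neighbour of two distinct nodes has degree >= 2, so log(degree) > 0.
    return sum(1.0 / log(len(adjacency[z])) for z in common)

# Hypothetical toy network; nodes stand in for brain regions.
edges = [("A", "C"), ("B", "C"), ("C", "D"), ("A", "E"), ("B", "E")]
adjacency = build_adjacency(edges)

# Rank currently unconnected pairs as candidate links to add.
ranked = sorted(
    (
        (adamic_adar(adjacency, u, v), (u, v))
        for u, v in combinations(sorted(adjacency), 2)
        if v not in adjacency[u]
    ),
    reverse=True,
)
top_score, top_pair = ranked[0]
print(top_pair, round(top_score, 3))
```

A mixed scheme in the spirit of the abstract would also score existing links and treat the lowest-scoring ones as candidates for removal; that step is omitted here because the abstract does not specify how it is done.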
Subjects
Alzheimer Disease/physiopathology , Brain/physiopathology , Models, Neurological , Neural Pathways , Algorithms , Brain Mapping , Case-Control Studies , Databases, Factual , Humans , Magnetic Resonance Imaging/methods , Workflow
ABSTRACT
Connection of the heart to the systemic circulation is a critical developmental event that requires selective preservation of embryonic vessels (aortic arches). However, why some aortic arches regress while others are incorporated into the mature aortic tree remains unclear. By microdissection and deep sequencing in mouse, we find that neural crest (NC) only differentiates into vascular smooth muscle cells (SMCs) around those aortic arches destined for survival and reorganization, and identify the transcription factor Gata6 as a crucial regulator of this process. Gata6 is expressed in SMCs, and activation of its target genes controls SMC differentiation. Furthermore, Gata6 is sufficient to promote SMC differentiation in vivo and to drive preservation of aortic arches that ought to regress. These findings identify Gata6-directed differentiation of NC to SMCs as an essential mechanism that specifies the aortic tree, and provide a new framework for how mutations in GATA6 lead to congenital heart disorders in humans.
Subjects
Aorta/embryology , Cell Differentiation , GATA6 Transcription Factor/metabolism , Myocytes, Smooth Muscle/physiology , Neural Crest/embryology , Animals , Gene Expression , Mice
ABSTRACT
Gene co-expression analysis is one of the main aspects of systems biology that uses high-throughput gene expression data. In the present study we applied cross-species co-expression analysis to a module of biofilm and stress response associated genes. We addressed different kinds of stresses in the three most intensively studied members of Gammaproteobacteria, Escherichia coli K12, Pseudomonas aeruginosa PAO1 and Salmonella enterica, for which large sets of gene expression data are available. Our aim was to evaluate the presence of common stress response strategies adopted by these microorganisms that may be assigned to the other members of Gammaproteobacteria. Results of functional annotation analysis revealed distinct categories among co-expressed genes, most of which concerned biological processes associated with virulence and stress response. Transcriptional regulatory analysis of genes present in co-expressed modules showed that the global stress sigma factor, RpoS, together with several local transcription factors, accounts for the observed co-expression response, and that several cases of feed-forward loops exist between global regulators, local transcription factors and their targets. Our results lend partial support to our underlying assumption of the conservation of core biological processes and regulatory interactions among these related Gammaproteobacteria members. This has led to the transfer of gene function annotations from well-studied Gammaproteobacterial species to less-characterized members. These findings can shed light on the discovery of new drug targets capable of controlling severe infections caused by these groups of bacteria.
Subjects
Escherichia coli/genetics , Escherichia coli/pathogenicity , Gene Expression Regulation, Bacterial , Pseudomonas aeruginosa/genetics , Pseudomonas aeruginosa/pathogenicity , Salmonella enterica/genetics , Salmonella enterica/pathogenicity , Stress, Physiological/genetics , Biofilms , Coinfection/microbiology , Gene Regulatory Networks , Genes, Bacterial , Humans , Sigma Factor/metabolism , Signal Transduction/genetics , Transcription Factors/metabolism , Virulence/genetics
ABSTRACT
Availability of genome-wide gene expression datasets provides the opportunity to study gene expression across different organisms under a plethora of experimental conditions. In our previous work, we developed an algorithm called COMODO (COnserved MODules across Organisms) that identifies conserved expression modules between two species. In the present study, we expanded COMODO to detect co-expression conservation across three organisms by adapting the statistics behind it. We applied COMODO to study expression conservation/divergence between Escherichia coli, Salmonella enterica, and Bacillus subtilis. We observed that some parts of the regulatory interaction networks were conserved between E. coli and S. enterica, especially in the regulons of local regulators. However, such conservation was not observed between the regulatory interaction networks of B. subtilis and the two other species. We found co-expression conservation for a number of genes involved in quorum sensing, but almost no conservation for genes involved in pathogenicity across E. coli and S. enterica, which could partially explain their different lifestyles. We concluded that, despite their different lifestyles, no significant rewiring has occurred at the level of local regulons, and notable conservation can be detected, for instance, in signaling pathways and stress sensing in the phylogenetically close species S. enterica and E. coli. Moreover, conservation of local regulons seems to depend on the evolutionary time of divergence across species, disappearing at larger distances as shown by the comparison with B. subtilis. Global regulons follow a different trend and show major rewiring even at the limited evolutionary distance that separates E. coli and S. enterica.
Subjects
Escherichia coli/genetics , Gene Expression Regulation, Bacterial , Regulon/physiology , Salmonella typhimurium/genetics , Algorithms , Escherichia coli/physiology , Gene Expression Profiling , Genome, Bacterial , Models, Genetic , Monte Carlo Method , Phylogeny , Quorum Sensing/genetics , Salmonella typhimurium/physiology , Virulence/genetics
ABSTRACT
Pseudomonas tolaasii, the causative agent of Agaricus bisporus brown blotch disease, can be identified by the white line reaction, occurring upon confrontation of the tolaasin-producing mushroom pathogen with "Pseudomonas reactans," producing the lipopeptide white line-inducing principle (WLIP). The draft genome sequence of the WLIP-producing indicator Pseudomonas fluorescens strain LMG 5329 is reported here.