Results 1 - 20 of 365
1.
Cell ; 173(7): 1796-1809.e17, 2018 06 14.
Article in English | MEDLINE | ID: mdl-29779944

ABSTRACT

Non-coding genetic variation is a major driver of phenotypic diversity and allows the investigation of mechanisms that control gene expression. Here, we systematically investigated the effects of >50 million variations from five strains of mice on mRNA, nascent transcription, transcription start sites, and transcription factor binding in resting and activated macrophages. We observed substantial differences associated with distinct molecular pathways. Evaluating genetic variation provided evidence for roles of ∼100 TFs in shaping lineage-determining factor binding. Unexpectedly, a substantial fraction of strain-specific factor binding could not be explained by local mutations. Integration of genomic features with chromatin interaction data provided evidence for hundreds of connected cis-regulatory domains associated with differences in transcription factor binding and gene expression. This system and the >250 datasets establish a substantial new resource for investigation of how genetic variation affects cellular phenotypes.


Subjects
Genetic Variation , Macrophages/metabolism , Transcription Factors/metabolism , Animals , Binding Sites , Bone Marrow Cells/cytology , CCAAT-Enhancer-Binding Protein-beta/genetics , CCAAT-Enhancer-Binding Protein-beta/metabolism , Cluster Analysis , Enhancer Elements, Genetic/genetics , Female , Gene Expression Regulation/drug effects , Lipopolysaccharides/pharmacology , Macrophages/cytology , Macrophages/drug effects , Male , Mice , Mice, Inbred BALB C , Mice, Inbred C57BL , Mice, Inbred NOD , Promoter Regions, Genetic , Protein Binding , Proto-Oncogene Proteins/genetics , Proto-Oncogene Proteins/metabolism , Trans-Activators/genetics , Trans-Activators/metabolism , Transcription Factors/genetics
2.
Mol Cell ; 78(1): 152-167.e11, 2020 04 02.
Article in English | MEDLINE | ID: mdl-32053778

ABSTRACT

Eukaryotic transcription factors (TFs) form complexes with various partner proteins to recognize their genomic target sites. Yet, how the DNA sequence determines which TF complex forms at any given site is poorly understood. Here, we demonstrate that high-throughput in vitro DNA binding assays coupled with unbiased computational analysis provide unprecedented insight into how different DNA sequences select distinct compositions and configurations of homeodomain TF complexes. Using inferred knowledge about minor groove width readout, we design targeted protein mutations that destabilize homeodomain binding both in vitro and in vivo in a complex-specific manner. By performing parallel systematic evolution of ligands by exponential enrichment sequencing (SELEX-seq), chromatin immunoprecipitation sequencing (ChIP-seq), RNA sequencing (RNA-seq), and Hi-C assays, we not only classify the majority of in vivo binding events in terms of complex composition but also infer complex-specific functions by perturbing the gene regulatory network controlled by a single complex.


Subjects
DNA/chemistry , Drosophila Proteins/metabolism , Gene Expression Regulation , Homeodomain Proteins/metabolism , Transcription Factors/metabolism , Animals , Base Sequence , Binding Sites , DNA/metabolism , Drosophila Proteins/chemistry , Drosophila Proteins/genetics , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Homeodomain Proteins/chemistry , Homeodomain Proteins/genetics , Mutation , Nucleic Acid Conformation , Protein Binding , Transcription Factors/chemistry , Transcription Factors/genetics
3.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38701417

ABSTRACT

Transcription factors (TFs) are proteins essential for regulating genetic transcriptions by binding to transcription factor binding sites (TFBSs) in DNA sequences. Accurate predictions of TFBSs can contribute to the design and construction of metabolic regulatory systems based on TFs. Although various deep-learning algorithms have been developed for predicting TFBSs, the prediction performance needs to be improved. This paper proposes a bidirectional encoder representations from transformers (BERT)-based model, called BERT-TFBS, to predict TFBSs solely based on DNA sequences. The model consists of a pre-trained BERT module (DNABERT-2), a convolutional neural network (CNN) module, a convolutional block attention module (CBAM) and an output module. The BERT-TFBS model utilizes the pre-trained DNABERT-2 module to acquire the complex long-term dependencies in DNA sequences through a transfer learning approach, and applies the CNN module and the CBAM to extract high-order local features. The proposed model is trained and tested based on 165 ENCODE ChIP-seq datasets. We conducted experiments with model variants, cross-cell-line validations and comparisons with other models. The experimental results demonstrate the effectiveness and generalization capability of BERT-TFBS in predicting TFBSs, and they show that the proposed model outperforms other deep-learning models. The source code for BERT-TFBS is available at https://github.com/ZX1998-12/BERT-TFBS.


Subjects
Neural Networks, Computer , Transcription Factors , Transcription Factors/metabolism , Transcription Factors/genetics , Binding Sites , Algorithms , Computational Biology/methods , Humans , Deep Learning , Protein Binding
4.
Proc Natl Acad Sci U S A ; 120(10): e2216907120, 2023 03 07.
Article in English | MEDLINE | ID: mdl-36853943

ABSTRACT

Ultraviolet (UV) light induces different classes of mutagenic photoproducts in DNA, namely cyclobutane pyrimidine dimers (CPDs), 6-4 photoproducts (6-4PPs), and atypical thymine-adenine photoproducts (TA-PPs). CPD formation is modulated by nucleosomes and transcription factors (TFs), which has important ramifications for UV mutagenesis. How chromatin affects the formation of 6-4PPs and TA-PPs is unclear. Here, we use UV damage endonuclease-sequencing (UVDE-seq) to map these UV photoproducts across the yeast genome. Our results indicate that nucleosomes, the fundamental building block of chromatin, have opposing effects on photoproduct formation. Nucleosomes induce CPDs and 6-4PPs at outward rotational settings in nucleosomal DNA but suppress TA-PPs at these settings. Our data also indicate that DNA binding by different classes of yeast TFs causes lesion-specific hotspots of 6-4PPs or TA-PPs. For example, DNA binding by the TF Rap1 generally suppresses CPD and 6-4PP formation but induces a TA-PP hotspot. Finally, we show that 6-4PP formation is strongly induced at the binding sites of TATA-binding protein (TBP), which is correlated with higher mutation rates in UV-exposed yeast. These results indicate that chromatin modulates the formation of 6-4PPs and TA-PPs differently from that of CPDs and that this may have important implications for UV mutagenesis.


Subjects
Chromatin , Saccharomyces cerevisiae , Chromatin/genetics , Saccharomyces cerevisiae/genetics , Nucleosomes/genetics , Mutagenesis , Mutagens , Adenine , Pyrimidine Dimers/genetics
5.
Genes Dev ; 32(9-10): 723-736, 2018 05 01.
Article in English | MEDLINE | ID: mdl-29764918

ABSTRACT

The K50 (lysine at amino acid position 50) homeodomain (HD) protein Orthodenticle (Otd) is critical for anterior patterning and brain and eye development in most metazoans. In Drosophila melanogaster, another K50HD protein, Bicoid (Bcd), has evolved to replace Otd's ancestral function in embryo patterning. Bcd is distributed as a long-range maternal gradient and activates transcription of a large number of target genes, including otd. Otd and Bcd bind similar DNA sequences in vitro, but how their transcriptional activities are integrated to pattern anterior regions of the embryo is unknown. Here we define three major classes of enhancers that are differentially sensitive to binding and transcriptional activation by Bcd and Otd. Class 1 enhancers are initially activated by Bcd, and activation is transferred to Otd via a feed-forward relay (FFR) that involves sequential binding of the two proteins to the same DNA motif. Class 2 enhancers are activated by Bcd and maintained by an Otd-independent mechanism. Class 3 enhancers are never bound by Bcd, but Otd binds and activates them in a second wave of zygotic transcription. The specific activities of enhancers in each class are mediated by DNA motif variants preferentially bound by Bcd or Otd and the presence or absence of sites for cofactors that interact with these proteins. Our results define specific patterning roles for Bcd and Otd and provide mechanisms for coordinating the precise timing of gene expression patterns during embryonic development.


Subjects
Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Drosophila melanogaster/embryology , Drosophila melanogaster/genetics , Gene Expression Regulation, Developmental , Homeodomain Proteins/genetics , Homeodomain Proteins/metabolism , Trans-Activators/genetics , Trans-Activators/metabolism , Amino Acid Motifs , Animals , Body Patterning/genetics , Drosophila melanogaster/metabolism , Embryonic Development/drug effects , Embryonic Development/genetics , Enhancer Elements, Genetic/genetics , Protein Binding
6.
Mol Biol Evol ; 41(1)2024 Jan 03.
Article in English | MEDLINE | ID: mdl-38149460

ABSTRACT

Evolution of gene expression mediated by cis-regulatory changes is thought to be an important contributor to organismal adaptation, but identifying adaptive cis-regulatory changes is challenging due to the difficulty in knowing the expectation under no positive selection. A new approach for detecting positive selection on transcription factor binding sites (TFBSs) was recently developed, thanks to the application of machine learning in predicting transcription factor (TF) binding affinities of DNA sequences. Given a TFBS sequence from a focal species and the corresponding inferred ancestral sequence that differs from the former at n sites, one can predict the TF-binding affinities of many n-step mutational neighbors of the ancestral sequence and obtain a null distribution of the derived binding affinity, which allows testing whether the binding affinity of the real derived sequence deviates significantly from the null distribution. Applying this test genomically to all experimentally identified binding sites of 3 TFs in humans, a recent study reported positive selection for elevated binding affinities of TFBSs. Here, we show that this genomic test suffers from an ascertainment bias because, even in the absence of positive selection for strengthened binding, the binding affinities of known human TFBSs are more likely to have increased than decreased in evolution. We demonstrate by computer simulation that this bias inflates the false positive rate of the selection test. We propose several methods to mitigate the ascertainment bias and show that almost all previously reported positive selection signals disappear when these methods are applied.


Subjects
Genomics , Transcription Factors , Humans , Transcription Factors/metabolism , Computer Simulation , Binding Sites/genetics , Protein Binding
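
The selection test summarized in the abstract of record 6 (sampling n-step mutational neighbors of the ancestral sequence to build a null distribution of binding affinity) can be sketched roughly as follows. This is only an illustration under stated assumptions: `toy_affinity`, its `TGACTCA` motif, and the example sequences are hypothetical stand-ins for a trained affinity predictor and real TFBSs, and the sketch does not implement the ascertainment-bias corrections that the paper proposes.

```python
import random

BASES = "ACGT"

def toy_affinity(seq, motif="TGACTCA"):
    """Hypothetical affinity score: best number of matches to a toy motif
    over all windows. A real analysis would use a trained predictor."""
    best = 0
    for i in range(len(seq) - len(motif) + 1):
        matches = sum(a == b for a, b in zip(seq[i:i + len(motif)], motif))
        best = max(best, matches)
    return best

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def random_neighbor(ancestral, n, rng):
    """Mutate exactly n distinct positions of the ancestral sequence."""
    seq = list(ancestral)
    for pos in rng.sample(range(len(seq)), n):
        seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return "".join(seq)

def selection_test(ancestral, derived, affinity=toy_affinity,
                   n_samples=2000, seed=0):
    """One-sided Monte Carlo test: is the derived affinity unusually high
    compared with random n-step neighbors of the ancestral sequence?"""
    n = hamming(ancestral, derived)
    rng = random.Random(seed)
    observed = affinity(derived)
    null = [affinity(random_neighbor(ancestral, n, rng))
            for _ in range(n_samples)]
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (1 + sum(v >= observed for v in null)) / (1 + n_samples)
    return n, observed, p_value

n, observed, p_value = selection_test("GGTGACTGGGG", "GGTGACTCAGG")
print(n, observed, p_value)
```

As the abstract notes, a small p-value from such a test is not by itself evidence of positive selection when the sites were ascertained because they bind in the focal species.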
7.
Development ; 149(7)2022 04 01.
Article in English | MEDLINE | ID: mdl-35394007

ABSTRACT

A long-standing biological question is how DNA cis-regulatory elements shape transcriptional patterns during metazoan development. Reporter constructs, cell culture assays and computational modeling have made major contributions to answering this question, but analysis of elements in their natural context is an important complement. Here, we mutate Notch-dependent LAG-1 binding sites (LBSs) in the endogenous Caenorhabditis elegans sygl-1 gene, which encodes a key stem cell regulator, and analyze the consequences on sygl-1 expression (nascent transcripts, mRNA, protein) and stem cell maintenance. Mutation of one LBS in a three-element cluster approximately halved both expression and stem cell pool size, whereas mutation of two LBSs essentially abolished them. Heterozygous LBS mutant clusters provided intermediate values. Our results lead to two major conclusions. First, both LBS number and configuration impact cluster activity: LBSs act additively in trans and synergistically in cis. Second, the SYGL-1 gradient promotes self-renewal above its functional threshold and triggers differentiation below the threshold. Our approach of coupling CRISPR/Cas9 LBS mutations with effects on both molecular and biological readouts establishes a powerful model for in vivo analyses of DNA cis-regulatory elements.


Subjects
Caenorhabditis elegans , Regulatory Elements, Transcriptional , Stem Cells , Animals , Caenorhabditis elegans/cytology , Caenorhabditis elegans/metabolism , Caenorhabditis elegans Proteins/genetics , Cell Self Renewal , DNA/metabolism , DNA-Binding Proteins/genetics , Receptors, Notch , Stem Cells/cytology
8.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36748992

ABSTRACT

Interactions between DNA and transcription factors (TFs) play an essential role in understanding transcriptional regulation mechanisms and gene expression. Owing to the large accumulation of training data and their low cost, deep learning methods have shown great potential in determining the specificity of TF-DNA interactions. Convolutional network-based and self-attention network-based methods have been proposed for transcription factor binding site (TFBS) prediction. Convolutional operations extract local features efficiently but tend to ignore global information, while self-attention mechanisms capture long-distance dependencies well but struggle to attend to local feature details. To discover features for a given sequence as comprehensively as possible, we propose a Dual-branch model combining Self-Attention and Convolution, dubbed DSAC, which fuses local features and global representations in an interactive way. In terms of features, convolution and self-attention contribute to feature extraction collaboratively, enhancing representation learning. In terms of structure, a lightweight but efficient network architecture is designed for the prediction; in particular, the dual-branch structure allows convolution and the self-attention mechanism to be fully utilized to improve the predictive ability of our model. Experimental results on 165 ChIP-seq datasets show that DSAC clearly outperforms five other deep-learning-based methods and demonstrate that our model can effectively predict TFBSs based on sequence features alone. The source code of DSAC is available at https://github.com/YuBinLab-QUST/DSAC/.


Subjects
DNA , Neural Networks, Computer , Protein Binding , Binding Sites , Transcription Factors/genetics
9.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37114659

ABSTRACT

Cyclic AMP receptor proteins (CRPs) are important transcription regulators in many species. The prediction of CRP-binding sites was mainly based on position weight matrices (PWMs). Traditional prediction methods only considered known binding motifs, and their ability to discover inflexible binding patterns was limited. Thus, a novel CRP-binding site prediction model called CRPBSFinder was developed in this research, which combined the hidden Markov model, knowledge-based PWMs and structure-based binding affinity matrices. We trained this model using validated CRP-binding data from Escherichia coli and evaluated it with computational and experimental methods. The results show that the model not only provides higher prediction performance than a classic method but also quantitatively indicates the binding affinity of transcription factor binding sites through its prediction scores. The prediction results included not only most known regulated genes but also 1089 novel CRP-regulated genes. The major regulatory roles of CRPs were divided into four classes: carbohydrate metabolism, organic acid metabolism, nitrogen compound metabolism and cellular transport. Several novel functions were also discovered, including heterocycle metabolism and response to stimulus. Based on the functional similarity of homologous CRPs, we applied the model to 35 other species. The prediction tool and the prediction results are available online at: https://awi.cuhk.edu.cn/~CRPBSFinder.


Subjects
Cyclic AMP Receptor Protein , Escherichia coli Proteins , Cyclic AMP Receptor Protein/genetics , Cyclic AMP Receptor Protein/chemistry , Cyclic AMP Receptor Protein/metabolism , Escherichia coli Proteins/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Binding Sites/genetics , Protein Binding/genetics
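
The abstract of record 9 refers to position weight matrix (PWM) scoring as the traditional basis for binding-site prediction. A minimal sketch of that baseline is given below, assuming a uniform background and a pseudocount of 1. The aligned `sites` and the scanned sequence are made-up toy data, not the CRP motif, and the sketch does not reproduce CRPBSFinder's hidden Markov model or structure-based matrices.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per column) from aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total)
                            / background)
            for base in "ACGT"
        })
    return pwm

def score(pwm, window):
    """Sum of per-position log-odds scores for one window."""
    return sum(col[base] for col, base in zip(pwm, window))

def best_hit(pwm, seq):
    """Return (score, start) of the highest-scoring window in seq."""
    width = len(pwm)
    return max((score(pwm, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))

# Toy aligned sites (hypothetical, for illustration only).
sites = ["TGTGA", "TGTGA", "TGCGA", "TGTGA"]
pwm = build_pwm(sites)
print(best_hit(pwm, "AAATGTGACCC"))
```

Higher log-odds scores indicate windows that resemble the aligned sites more than the background, which is the sense in which a PWM score can serve as a rough proxy for binding preference.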
10.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37328639

ABSTRACT

Precise targeting of transcription factor binding sites (TFBSs) is essential to comprehending transcriptional regulatory processes and investigating cellular function. Although several deep learning algorithms have been created to predict TFBSs, the models' intrinsic mechanisms and prediction results are difficult to explain. There is still room for improvement in prediction performance. We present DeepSTF, a unique deep-learning architecture for predicting TFBSs by integrating DNA sequence and shape profiles. We use the improved transformer encoder structure for the first time in the TFBSs prediction approach. DeepSTF extracts DNA higher-order sequence features using stacked convolutional neural networks (CNNs), whereas rich DNA shape profiles are extracted by combining improved transformer encoder structure and bidirectional long short-term memory (Bi-LSTM), and, finally, the derived higher-order sequence features and representative shape profiles are integrated into the channel dimension to achieve accurate TFBSs prediction. Experiments on 165 ENCODE chromatin immunoprecipitation sequencing (ChIP-seq) datasets show that DeepSTF considerably outperforms several state-of-the-art algorithms in predicting TFBSs, and we explain the usefulness of the transformer encoder structure and the combined strategy using sequence features and shape profiles in capturing multiple dependencies and learning essential features. In addition, this paper examines the significance of DNA shape features predicting TFBSs. The source code of DeepSTF is available at https://github.com/YuBinLab-QUST/DeepSTF/.


Subjects
DNA , Neural Networks, Computer , Binding Sites , Protein Binding , DNA/genetics , DNA/chemistry , Transcription Factors/genetics , Transcription Factors/chemistry
11.
Hum Genomics ; 18(1): 12, 2024 Feb 02.
Article in English | MEDLINE | ID: mdl-38308339

ABSTRACT

Genome-wide association studies (GWAS) are a powerful tool for detecting variants associated with complex traits and can aid risk stratification and prevention strategies against pancreatic ductal adenocarcinoma (PDAC). However, the strict significance threshold commonly used makes it likely that many true risk loci are missed. Functional annotation of GWAS polymorphisms is a proven strategy to identify additional risk loci. We aimed to investigate single-nucleotide polymorphisms (SNPs) in regulatory regions [transcription factor binding sites (TFBSs) and enhancers] that could change the expression profile of multiple genes they act upon and thereby modify PDAC risk. We analyzed a total of 12,636 PDAC cases and 43,443 controls from PanScan/PanC4 and the East Asian GWAS (discovery populations), and the PANDoRA consortium (replication population). We identified four associations that reached study-wide statistical significance in the overall meta-analysis: rs2472632(A) (enhancer variant, OR 1.10, 95%CI 1.06,1.13, p = 5.5 × 10⁻⁸), rs17358295(G) (enhancer variant, OR 1.16, 95%CI 1.10,1.22, p = 6.1 × 10⁻⁷), rs2232079(T) (TFBS variant, OR 0.88, 95%CI 0.83,0.93, p = 6.4 × 10⁻⁶) and rs10025845(A) (TFBS variant, OR 1.88, 95%CI 1.50,1.12, p = 1.32 × 10⁻⁵). The SNP with the most significant association, rs2472632, is located in an enhancer predicted to target the coiled-coil domain containing 34 oncogene. Our results provide new insights into genetic risk factors for PDAC through a focused analysis of polymorphisms in regulatory regions and demonstrate the usefulness of functional prioritization to identify loci associated with PDAC risk.


Subjects
Carcinoma, Pancreatic Ductal , Pancreatic Neoplasms , Humans , Genome-Wide Association Study , Genetic Predisposition to Disease , Pancreatic Neoplasms/genetics , Pancreatic Neoplasms/epidemiology , Pancreatic Neoplasms/pathology , Carcinoma, Pancreatic Ductal/genetics , Carcinoma, Pancreatic Ductal/pathology , Regulatory Sequences, Nucleic Acid , Polymorphism, Single Nucleotide/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , Binding Sites/genetics
12.
Plant J ; 116(1): 234-250, 2023 10.
Article in English | MEDLINE | ID: mdl-37387536

ABSTRACT

Enhancers are critical cis-regulatory elements controlling gene expression during cell development and differentiation. However, genome-wide enhancer characterization has been challenging due to the lack of a well-defined relationship between enhancers and genes. Function-based methods are the gold standard for determining the biological function of cis-regulatory elements; however, these methods have not been widely applied to plants. Here, we applied a massively parallel reporter assay on Arabidopsis to measure enhancer activities across the genome. We identified 4327 enhancers with various combinations of epigenetic modifications distinctively different from animal enhancers. Furthermore, we showed that enhancers differ from promoters in their preference for transcription factors. Although some enhancers are not conserved and overlap with transposable elements forming clusters, enhancers are generally conserved across thousand Arabidopsis accessions, suggesting they are selected under evolution pressure and could play critical roles in the regulation of important genes. Moreover, comparison analysis reveals that enhancers identified by different strategies do not overlap, suggesting these methods are complementary in nature. In sum, we systematically investigated the features of enhancers identified by functional assay in A. thaliana, which lays the foundation for further investigation into enhancers' functional mechanisms in plants.


Subjects
Arabidopsis , Animals , Arabidopsis/genetics , Enhancer Elements, Genetic/genetics , Promoter Regions, Genetic/genetics , Transcription Factors/genetics , Epigenesis, Genetic
13.
BMC Genomics ; 25(1): 710, 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39044130

ABSTRACT

BACKGROUND: Identifying the DNA-binding specificities of transcription factors (TF) is central to understanding gene networks that regulate growth and development. Such knowledge is lacking in oomycetes, a microbial eukaryotic lineage within the stramenopile group. Oomycetes include many important plant and animal pathogens such as the potato and tomato blight agent Phytophthora infestans, which is a tractable model for studying life-stage differentiation within the group. RESULTS: Mining of the P. infestans genome identified 197 genes encoding proteins belonging to 22 TF families. Their chromosomal distribution was consistent with family expansions through unequal crossing-over, which were likely ancient since each family had similar sizes in most oomycetes. Most TFs exhibited dynamic changes in RNA levels through the P. infestans life cycle. The DNA-binding preferences of 123 proteins were assayed using protein-binding oligonucleotide microarrays, which succeeded with 73 proteins from 14 families. Binding sites predicted for representatives of the families were validated by electrophoretic mobility shift or chromatin immunoprecipitation assays. Consistent with the substantial evolutionary distance of oomycetes from traditional model organisms, only a subset of the DNA-binding preferences resembled those of human or plant orthologs. Phylogenetic analyses of the TF families within P. infestans often discriminated clades with canonical and novel DNA targets. Paralogs with similar binding preferences frequently had distinct patterns of expression suggestive of functional divergence. TFs were predicted to either drive life stage-specific expression or serve as general activators based on the representation of their binding sites within total or developmentally-regulated promoters. This projection was confirmed for one TF using synthetic and mutated promoters fused to reporter genes in vivo. CONCLUSIONS: We established a large dataset of binding specificities for P. infestans TFs, representing the first in the stramenopile group. This resource provides a basis for understanding transcriptional regulation by linking TFs with their targets, which should help delineate the molecular components of processes such as sporulation and host infection. Our work also yielded insight into TF evolution during the eukaryotic radiation, revealing both functional conservation as well as diversification across kingdoms.


Subjects
Evolution, Molecular , Phylogeny , Phytophthora infestans , Transcription Factors , Phytophthora infestans/genetics , Phytophthora infestans/metabolism , Transcription Factors/metabolism , Transcription Factors/genetics , Binding Sites , Protein Binding
14.
Mol Biol Evol ; 40(5)2023 05 02.
Article in English | MEDLINE | ID: mdl-37172323

ABSTRACT

Changes in transcription factor binding sites (TFBSs) can alter the spatiotemporal expression pattern and transcript abundance of genes. Loss and gain of TFBSs were shown to cause shifts in expression patterns in numerous cases. However, we know little about the evolution of extended regulatory sequences incorporating many TFBSs. We compare, across the crucifers (Brassicaceae, cabbage family), the sequences between the translated regions of Arabidopsis Bsister (ABS)-like MADS-box genes (including paralogous GOA-like genes) and the next gene upstream, as an example of family-wide evolution of putative upstream regulatory regions (PURRs). ABS-like genes are essential for integument development of ovules and endothelium formation in seeds of Arabidopsis thaliana. A combination of motif-based gene ontology enrichment and reporter gene analysis using A. thaliana as common trans-regulatory environment allows analysis of selected Brassicaceae Bsister gene PURRs. Comparison of the TFBSs of transcriptionally active ABS-like genes with those of transcriptionally largely inactive GOA-like genes shows that the number of in silico predicted TFBSs is similar between paralogs, emphasizing the importance of experimental verification for in silico characterization of TFBS activity and analysis of their evolution. Further, our data show highly conserved expression of Brassicaceae ABS-like genes almost exclusively in the chalazal region of ovules. The Arabidopsis-specific insertion of a transposable element (TE) into the ABS PURRs is required for stabilizing this spatially restricted expression, while other Brassicaceae achieve chalaza-specific expression without TE insertion. We hypothesize that the chalaza-specific expression of ABS is regulated by cis-regulatory elements provided by the TE.


Subjects
Arabidopsis Proteins , Arabidopsis , Brassica , Brassicaceae , Arabidopsis/metabolism , Brassicaceae/genetics , Brassicaceae/metabolism , DNA Transposable Elements , Arabidopsis Proteins/genetics , Seeds/genetics , Brassica/genetics , Gene Expression Regulation, Plant
15.
Hum Mol Genet ; 31(R1): R114-R122, 2022 10 20.
Article in English | MEDLINE | ID: mdl-36083269

ABSTRACT

Every cell in the human body inherits a copy of the same genetic information. The three billion base pairs of DNA in the human genome, and the roughly 50 000 coding and non-coding genes they contain, must thus encode all the complexity of human development and cell and tissue type diversity. Differences in gene regulation, or the modulation of gene expression, enable individual cells to interpret the genome differently to carry out their specific functions. Here we discuss recent and ongoing efforts to build gene regulatory maps, which aim to characterize the regulatory roles of all sequences in a genome. Many researchers and consortia have identified such regulatory elements using functional assays and evolutionary analyses; we discuss the results, strengths and shortcomings of their approaches. We also discuss new techniques the field can leverage and emerging challenges it will face while striving to build gene regulatory maps of ever-increasing resolution and comprehensiveness.


Subjects
Gene Expression Regulation , Regulatory Sequences, Nucleic Acid , Humans , Gene Expression Regulation/genetics , Genome, Human/genetics , Chromosome Mapping , DNA/genetics
16.
Development ; 148(6)2021 03 15.
Article in English | MEDLINE | ID: mdl-33593819

ABSTRACT

The Evf2 long non-coding RNA directs Dlx5/6 ultraconserved enhancer(UCE)-intrachromosomal interactions, regulating genes across a 27 Mb region on chromosome 6 in mouse developing forebrain. Here, we show that Evf2 long-range gene repression occurs through multi-step mechanisms involving the transcription factor Sox2. Evf2 directly interacts with Sox2, antagonizing Sox2 activation of Dlx5/6UCE, and recruits Sox2 to the Dlx5/6eii shadow enhancer and key Dlx5/6UCE interaction sites. Sox2 directly interacts with Dlx1 and Smarca4, as part of the Evf2 ribonucleoprotein complex, forming spherical subnuclear domains (protein pools, PPs). Evf2 targets Sox2 PPs to one long-range repressed target gene (Rbm28), at the expense of another (Akr1b8). Evf2 and Sox2 shift Dlx5/6UCE interactions towards Rbm28, linking Evf2/Sox2 co-regulated topological control and gene repression. We propose a model that distinguishes Evf2 gene repression mechanisms at Rbm28 (Dlx5/6UCE position) and Akr1b8 (limited Sox2 availability). Genome-wide control of RNPs (Sox2, Dlx and Smarca4) shows that co-recruitment influences Sox2 DNA binding. Together, these data suggest that Evf2 organizes a Sox2 PP subnuclear domain and, through Sox2-RNP sequestration and recruitment, regulates chromosome 6 long-range UCE targeting and activity with genome-wide consequences.


Subjects
Chromosomes, Mammalian/genetics , Gene Expression Regulation, Developmental , Prosencephalon/metabolism , RNA, Long Noncoding/genetics , SOXB1 Transcription Factors/genetics , Animals , DNA Helicases/genetics , DNA Helicases/metabolism , Enhancer Elements, Genetic/genetics , Fluorescent Antibody Technique/methods , Homeodomain Proteins/genetics , Homeodomain Proteins/metabolism , In Situ Hybridization, Fluorescence/methods , Mice, Knockout , Mice, Transgenic , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Prosencephalon/embryology , Protein Binding , RNA, Long Noncoding/metabolism , Ribonucleoproteins/genetics , Ribonucleoproteins/metabolism , SOXB1 Transcription Factors/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism
17.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34929739

ABSTRACT

The discovery of putative transcription factor binding sites (TFBSs) is important for understanding the underlying binding mechanism and cellular functions. Recently, many computational methods have been proposed to jointly account for DNA sequence and shape properties in TFBS prediction. However, these methods fail to fully utilize the latent features derived from both sequence and shape profiles and have limitations in interpretability and knowledge discovery. To this end, we present a novel Deep Convolution Attention network combining Sequence and Shape, dubbed D-SSCA, for precisely predicting putative TFBSs. Experiments conducted on 165 ENCODE ChIP-seq datasets reveal that D-SSCA significantly outperforms several state-of-the-art methods in predicting TFBSs, and justify the utility of the channel attention module for feature refinement. In addition, a thorough analysis of the contribution of five shapes to TFBS prediction demonstrates that shape features can improve the predictive power for transcription factor-DNA binding. Furthermore, D-SSCA can realize cross-cell line prediction of TFBSs, indicating the occupancy of common interplay patterns concerning both sequence and shape across various cell lines. The source code of D-SSCA can be found at https://github.com/MoonLord0525/.


Subjects
Binding Sites , Computational Biology/methods , DNA-Binding Proteins/chemistry , Transcription Factors/chemistry , Algorithms , Chromatin Immunoprecipitation Sequencing , DNA/chemistry , Humans , Neural Networks, Computer , Protein Binding , Software , Transcription Factors/metabolism
18.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34965583

ABSTRACT

Chromatin immunoprecipitation coupled with sequencing (ChIP-seq) is a technique used to identify protein-DNA interaction sites through antibody pull-down, sequencing and analysis; with enrichment 'peak' calling being the most critical analytical step. Benchmarking studies have consistently shown that peak callers have distinct selectivity and specificity characteristics that are not additive and seldom completely overlap in many scenarios, even after parameter optimization. We therefore developed ChIP-AP, an integrated ChIP-seq analysis pipeline utilizing four independent peak callers, which seamlessly processes raw sequencing files to final result. This approach enables (1) better gauging of peak confidence through detection by multiple algorithms, and (2) more thoroughly surveys the binding landscape by capturing peaks not detected by individual callers. Final analysis results are then integrated into a single output table, enabling users to explore their data by applying selectivity and sensitivity thresholds that best address their biological questions, without needing any additional reprocessing. ChIP-AP therefore presents investigators with a more comprehensive coverage of the binding landscape without requiring additional wet-lab observations.


Subjects
Chromatin Immunoprecipitation Sequencing , Oligonucleotide Array Sequence Analysis/methods , Sequence Analysis, DNA/methods , Algorithms , Benchmarking , Cell Line , Chromatin Immunoprecipitation , Software , Transcription Factors
19.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34664074

ABSTRACT

Accurate identification of transcription factor binding sites is of great significance in understanding gene expression, biological development and drug design. Although a variety of methods based on deep-learning models and large-scale data have been developed to predict transcription factor binding sites in DNA sequences, there is room for further improvement in prediction performance. In addition, effective interpretation of deep-learning models is greatly desirable. Here we present MAResNet, a new deep-learning method, for predicting transcription factor binding sites on 690 ChIP-seq datasets. More specifically, MAResNet combines the bottom-up and top-down attention mechanisms and a state-of-the-art feed-forward network (ResNet), which is constructed by stacking attention modules that generate attention-aware features. In particular, the multi-scale attention mechanism is utilized at the first stage to extract rich and representative sequence features. We further discuss the attention-aware features learned from different attention modules in accordance with the changes as the layers go deeper. The features learned by MAResNet are also visualized through the TMAP tool to illustrate that the method can extract the unique characteristics of transcription factor binding sites. The performance of MAResNet is extensively tested on 690 test subsets with an average AUC of 0.927, which is higher than that of the current state-of-the-art methods. Overall, this study provides a new and useful framework for the prediction of transcription factor binding sites by combining the funnel attention modules with the residual network.


Subjects
Deep Learning , Binding Sites/genetics , Neural Networks, Computer , Protein Binding , Transcription Factors/metabolism
20.
Int J Mol Sci ; 25(9)2024 May 03.
Article in English | MEDLINE | ID: mdl-38732207

ABSTRACT

Prediction of binding sites for transcription factors is important to understand how the latter regulate gene expression and how this regulation can be modulated for therapeutic purposes. A consistent number of references address this issue with different approaches, Machine Learning being one of the most successful. Nevertheless, we note that many such approaches fail to propose a robust and meaningful method to embed the genetic data under analysis. We try to overcome this problem by proposing a bidirectional transformer-based encoder, empowered by bidirectional long-short term memory layers and with a capsule layer responsible for the final prediction. To evaluate the efficiency of the proposed approach, we use benchmark ChIP-seq datasets of five cell lines available in the ENCODE repository (A549, GM12878, Hep-G2, H1-hESC, and Hela). The results show that the proposed method can predict TFBS within the five different cell lines very well; moreover, cross-cell predictions provide satisfactory results as well. Experiments conducted across cell lines are reinforced by the analysis of five additional lines used only to test the model trained using the others. The results confirm that prediction across cell lines remains very high, allowing an extensive cross-transcription factor analysis to be performed from which several indications of interest for molecular biology may be drawn.


Subjects
Deep Learning , Transcription Factors , Humans , Transcription Factors/metabolism , Transcription Factors/genetics , Binding Sites , Computational Biology/methods , HeLa Cells , Protein Binding , Chromatin Immunoprecipitation Sequencing/methods , Cell Line