ABSTRACT
MOTIVATION: Transcription factor binding sites (TFBS) are regulatory elements that have a significant impact on transcription regulation and cell fate determination. Canonical motifs, biological experiments, and computational methods have made it possible to discover TFBS. However, most existing in silico TFBS prediction models are solely DNA-based and are trained and applied within the same biosample, so they cannot infer TFBS in experimentally unexplored biosamples. RESULTS: Here, we propose TFBS prediction by modified TransFormer (TFTF), a multimodal deep language architecture that integrates multiomics information in epigenetic studies. In comparison to existing computational techniques, TFTF has state-of-the-art accuracy, and is also the first approach to accurately perform genome-wide detection for cell-type and species-specific TFBS in experimentally unexplored biosamples. Compared to peak calling methods, TFTF consistently discovers true TFBS in a threshold-tuning-free way, with higher recall rates. The underlying mechanism of TFTF reveals greater attention to the targeted TF's motif region in TFBS, and general attention to the entire peak region in non-TFBS. TFTF can benefit from the integration of broader and more diverse data for improvement and can be applied to multiple epigenetic scenarios. AVAILABILITY AND IMPLEMENTATION: We provide a web server (https://tftf.ibreed.cn/) for users to utilize the TFTF model. Users can train the TFTF model and discover TFBS with their own data.
Subjects
Genome; Multiomics; Binding Sites; Protein Binding; Transcription Factors/metabolism; Computational Biology/methods
ABSTRACT
BACKGROUND: Single-cell RNA sequencing enables studying cells individually, yet high gene dimensions and low cell numbers challenge analysis. Moreover, only a subset of the genes detected are involved in the biological processes underlying cell-type specific functions. RESULT: In this study, we present COMSE, an unsupervised feature selection framework using community detection to capture informative genes from scRNA-seq data. COMSE identified homogeneous cell substates with high resolution, as demonstrated by distinguishing different cell cycle stages. Evaluations based on real and simulated scRNA-seq datasets showed that COMSE outperformed other methods in cell clustering assignment, even at high dropout rates. We also demonstrate that by identifying communities of genes associated with batch effects, COMSE separates signals reflecting biological differences from noise arising from differences in sequencing protocols, thereby enabling integrated analysis of scRNA-seq datasets from different sources. CONCLUSIONS: COMSE provides an efficient unsupervised framework that selects highly informative genes in scRNA-seq data, improving cell substate identification and cell clustering. It identifies gene subsets that reveal biological and technical heterogeneity, supporting applications like batch effect correction and pathway analysis. It also provides robust results for bulk RNA-seq data analysis.
Subjects
RNA-Seq; Single-Cell Gene Expression Analysis; Animals; Humans; Mice; RNA-Seq/methods
ABSTRACT
High-throughput single-cell RNA-seq data have provided unprecedented opportunities for deciphering the regulatory interactions among genes. However, such interactions are complex and often nonlinear or nonmonotonic, which makes their inference using linear models challenging. We present SIGNET, a deep learning-based framework for capturing complex regulatory relationships between genes under the assumption that the expression levels of transcription factors participating in gene regulation are strong predictors of the expression of their target genes. Evaluations based on a variety of real and simulated scRNA-seq datasets showed that SIGNET is more sensitive to ChIP-seq validated regulatory interactions in different types of cells, particularly rare cells. Therefore, this process is more effective for various downstream analyses, such as cell clustering and gene regulatory network inference. We demonstrated that SIGNET is a useful tool for identifying important regulatory modules driving various biological processes.
Subjects
Gene Regulatory Networks; Neural Networks, Computer; Sequence Analysis, RNA; Single-Cell Analysis; Algorithms; Cluster Analysis; Deep Learning; Gene Expression Profiling; Gene Expression Regulation; Humans; RNA-Seq; Transcription Factors/metabolism
ABSTRACT
The COVID-19 pandemic, caused by SARS-CoV-2, is a major global health threat. Epidemiological studies suggest that bats (Rhinolophus affinis) are the natural zoonotic reservoir for SARS-CoV-2. However, the host range of SARS-CoV-2 and the intermediate hosts that facilitate its transmission to humans remain unknown. The interaction of a coronavirus with its host receptor is a key genetic determinant of host range and cross-species transmission. SARS-CoV-2 uses angiotensin-converting enzyme 2 (ACE2) as the receptor to enter host cells in a species-dependent manner. In this study, we characterized the ability of ACE2 from diverse species to support viral entry. By analyzing the conservation of five residues in two virus-binding hotspots of ACE2 (hotspot 31Lys and hotspot 353Lys), we predicted 80 ACE2 proteins from mammals that could potentially mediate SARS-CoV-2 entry. We chose 48 ACE2 orthologs among them for functional analysis, and showed that 44 of these orthologs (including those of domestic animals, pets, livestock, and animals commonly found in zoos and aquaria) could bind the SARS-CoV-2 spike protein and support viral entry. In contrast, New World monkey ACE2 orthologs could neither bind the SARS-CoV-2 spike protein nor support viral entry. We further identified the genetic determinant of New World monkey ACE2 that restricts viral entry using genetic and functional analyses. These findings highlight a potentially broad host tropism of SARS-CoV-2 and suggest that SARS-CoV-2 might be distributed much more widely than previously recognized, underscoring the necessity to monitor susceptible hosts to prevent future outbreaks.
Subjects
Angiotensin-Converting Enzyme 2/genetics; COVID-19/veterinary; Receptors, Virus/genetics; SARS-CoV-2/genetics; Angiotensin-Converting Enzyme 2/metabolism; Animals; COVID-19/genetics; COVID-19/metabolism; COVID-19/virology; Host Specificity; Humans; Pandemics/prevention & control; Peptidyl-Dipeptidase A/genetics; Peptidyl-Dipeptidase A/metabolism; Phylogeny; Protein Binding; Receptors, Virus/metabolism; Spike Glycoprotein, Coronavirus/genetics; Spike Glycoprotein, Coronavirus/metabolism; Viral Tropism; Viral Zoonoses/genetics; Viral Zoonoses/prevention & control; Viral Zoonoses/virology; Virus Attachment; Virus Internalization
ABSTRACT
Hi-C is a genome-wide assay based on Chromosome Conformation Capture and high-throughput sequencing to decipher 3D chromatin organization in the nucleus. However, computational methods that detect functional interactions from Hi-C data face challenges, including correcting for various sources of bias and identifying functional interactions supported by low counts of interacting fragments. We present Chrom-Lasso, a lasso linear regression model that removes complex biases without prior assumptions and identifies functional interacting loci with increased power by incorporating the local read distribution surrounding the area of interest. We showed that interacting regions identified by Chrom-Lasso are more enriched for 5C-validated interactions and functional GWAS hits than those identified by GOTHiC and Fit-Hi-C. To further demonstrate the ability of Chrom-Lasso to detect interactions of functional importance, we performed time-series Hi-C and RNA-seq during T cell activation and exhaustion. We showed that the dynamic changes in gene expression and chromatin interactions identified by Chrom-Lasso were largely concordant with each other. Finally, we experimentally confirmed Chrom-Lasso's finding that Erbb3 was co-regulated with distinct neighboring genes at different states during T cell activation. Our results highlight Chrom-Lasso's utility in detecting weak functional interactions between cis-regulatory elements, such as promoters and enhancers.
Subjects
Chromatin/chemistry; Chromatin/genetics; Genomics/methods; Models, Molecular; Models, Statistical; Regression Analysis; Software; Animals; CD8-Positive T-Lymphocytes/immunology; CD8-Positive T-Lymphocytes/metabolism; Databases, Genetic; Epistasis, Genetic; Gene Expression Regulation; Gene Library; Genome-Wide Association Study/methods; High-Throughput Nucleotide Sequencing; Humans; Lymphocyte Activation/genetics; Lymphocyte Activation/immunology; Mice; Quantitative Trait Loci
ABSTRACT
Streptococcus pneumoniae resides in the human upper airway as a commensal but also causes pneumonia, bacteremia, meningitis, and otitis media. It remains unclear how pneumococci adapt to the nutritional conditions of various host niches. We here show that MetR, a LysR family transcriptional regulator, serves as a molecular adaptor for pneumococcal fitness, particularly in the upper airway. The metR mutant of strain D39 rapidly disappeared from the nasopharynx but was only marginally attenuated in the lungs and bloodstream of mice. RNA-seq and ChIP-seq analyses showed that MetR broadly regulates transcription of the genes involved in methionine synthesis and other functions under methionine starvation. Genetic and biochemical analyses confirmed that MetR is essential for the activation of methionine synthesis but not uptake. Co-infection with influenza virus partially restored the colonization defect of the metR mutant. These results strongly suggest that MetR has evolved specifically for pneumococcal carriage in the upper airway of healthy individuals, where free methionine is severely limited, but becomes dispensable where environmental methionine is relatively more abundant (e.g., inflamed upper airway and sterile sites). To the best of our knowledge, MetR represents the first known regulator specific to pneumococcal carriage in healthy individuals.
Subjects
Bacterial Proteins/genetics; Methionine/biosynthesis; Nasopharynx/microbiology; Streptococcus pneumoniae/growth & development; Streptococcus pneumoniae/genetics; Trans-Activators/genetics; Animals; Bacterial Proteins/metabolism; DNA-Binding Proteins/genetics; DNA-Binding Proteins/metabolism; Female; Methionine/metabolism; Mice; Pneumococcal Infections/pathology; Trans-Activators/metabolism; Transcription, Genetic/genetics
ABSTRACT
Type I restriction-modification (R-M) systems consist of a DNA endonuclease (HsdR, HsdM and HsdS subunits) and a methyltransferase (HsdM and HsdS subunits). The hsdS sequences flanked by inverted repeats (referred to as epigenetic invertons) in certain Type I R-M systems undergo invertase-catalyzed inversions. Previous studies in Streptococcus pneumoniae have shown that hsdS inversions within clonal populations produce subpopulations with profound differences in the methylome, cellular physiology and virulence. In this study, we bioinformatically identified six major clades of tyrosine- and serine-family invertase homologs from 16 bacterial phyla, which potentially catalyze hsdS inversions in the epigenetic invertons. In particular, the epigenetic invertons are highly enriched in host-associated bacteria. We further verified hsdS inversions in the Type I R-M systems of four representative host-associated bacteria and found that each of the resultant hsdS allelic variants specifies methylation of a unique DNA sequence. In addition, transcriptome analysis revealed that hsdS allelic variations in Enterococcus faecalis exert a significant impact on gene expression. These findings indicate that epigenetic switches driven by invertases in the epigenetic invertons operate broadly in host-associated bacteria, and may contribute to bacterial host adaptation and virulence beyond the role of the Type I R-M systems in defense against phage infection.
Subjects
Bacterial Proteins/genetics; DNA Restriction-Modification Enzymes/genetics; Epigenesis, Genetic; Gene Expression Regulation, Bacterial; Bacteroides fragilis/genetics; DNA Methylation; DNA, Bacterial/chemistry; Enterococcus faecalis/genetics; Inverted Repeat Sequences; Streptococcus agalactiae/genetics; Treponema denticola/genetics
ABSTRACT
BACKGROUND: Hepatocellular carcinoma (HCC) ranks fourth in cancer-related mortality globally. In this study, we attempted to develop a novel immune-related gene signature that could predict survival and the efficacy of immunotherapy for HCC patients. METHODS: The transcriptomic and clinical data of HCC samples were downloaded from The Cancer Genome Atlas (TCGA) and GSE14520 datasets, and immune-related genes were acquired from the ImmPort database. An immune-related gene-based prognostic index (IRGPI) was then constructed using the Least Absolute Shrinkage and Selection Operator (LASSO) regression model. Kaplan-Meier survival curves and time-dependent receiver operating characteristic (ROC) curves were used to evaluate its predictive capability. In addition, both univariate and multivariate analyses of overall survival for the IRGPI and multiple clinicopathologic factors were carried out, followed by the construction of a nomogram. Finally, we explored the possible correlation of the IRGPI with immune cell infiltration and immunotherapy efficacy. RESULTS: Analysis of 365 HCC samples identified 11 differentially expressed immune-related genes, which were selected to establish the IRGPI. Notably, it can predict the survival of HCC patients more accurately than published biomarkers. Furthermore, the IRGPI can predict the infiltration of immune cells in the tumor microenvironment of HCC, as well as the response to immunotherapy. CONCLUSION: Collectively, the currently established IRGPI can accurately predict survival, reflect the immune microenvironment, and predict the efficacy of immunotherapy among HCC patients.
Subjects
Biomarkers, Tumor/genetics; Carcinoma, Hepatocellular/mortality; Gene Expression Regulation, Neoplastic; Immunotherapy/mortality; Liver Neoplasms/mortality; Nomograms; Transcriptome; Carcinoma, Hepatocellular/genetics; Carcinoma, Hepatocellular/immunology; Carcinoma, Hepatocellular/therapy; Case-Control Studies; Computational Biology; Female; Follow-Up Studies; Gene Expression Profiling; Humans; Liver Neoplasms/genetics; Liver Neoplasms/immunology; Liver Neoplasms/therapy; Male; Prognosis; Survival Rate
ABSTRACT
Multiple new variants of SARS-CoV-2 have been identified as the COVID-19 pandemic spreads across the globe. However, most epidemic models view the virus as static and unchanging and thus fail to address the consequences of the potential evolution of the virus. Here, we built a competitive susceptible-infected-removed (coSIR) model to simulate the competition between virus strains of differing severities or transmissibility under various virus control policies. The coSIR model predicts that although the virus is extremely unlikely to evolve into a "super virus" that causes an increased fatality rate, virus variants with less severe symptoms can lead to potential new outbreaks and can cost more lives over time. The present model also demonstrates that the protocols restricting the transmission of the virus, such as wearing masks and social distancing, are the most effective strategy in reducing total mortality. A combination of adequate testing and strict quarantine is a powerful alternative to policies such as mandatory stay-at-home orders, which may have an enormous negative impact on the economy. In addition, building Mobile Cabin Hospitals can be effective and efficient in reducing the mortality rate of highly infectious virus strains. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s11071-021-06705-8.
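The competition between strains described in this abstract can be illustrated with a minimal two-strain SIR sketch. This is not the authors' coSIR formulation, which the abstract does not specify. It assumes a closed population expressed as fractions, full cross-immunity after recovery, and simple forward-Euler integration. The function name `simulate_cosir` and all parameter names and values are hypothetical.

```python
def simulate_cosir(beta_a, beta_b, gamma_a, gamma_b, s0, ia0, ib0, days, dt=0.1):
    """Two strains competing for one susceptible pool (illustrative only).

    beta_*  : transmission rate of each strain (per day), assumed values
    gamma_* : removal rate of each strain (per day), assumed values
    s0, ia0, ib0 : initial susceptible / infected fractions (sum <= 1)
    Returns a list of (t, S, I_a, I_b, R) tuples.
    """
    s, ia, ib = s0, ia0, ib0
    r = 1.0 - s0 - ia0 - ib0
    history = [(0.0, s, ia, ib, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        # New infections: both strains draw from the same susceptible pool.
        new_a = beta_a * s * ia * dt
        new_b = beta_b * s * ib * dt
        # Removals (recovery or death), assumed to confer immunity to both strains.
        rec_a = gamma_a * ia * dt
        rec_b = gamma_b * ib * dt
        s -= new_a + new_b
        ia += new_a - rec_a
        ib += new_b - rec_b
        r += rec_a + rec_b
        history.append((step * dt, s, ia, ib, r))
    return history


if __name__ == "__main__":
    # Strain B is assumed more transmissible than strain A.
    run = simulate_cosir(0.3, 0.5, 0.1, 0.1, 0.98, 0.01, 0.01, days=200)
    print("peak fraction infected with A:", max(row[2] for row in run))
    print("peak fraction infected with B:", max(row[3] for row in run))
```

Lowering both `beta` values in this sketch stands in for transmission-restricting measures such as masks and distancing. Modeling testing, quarantine, or hospital capacity would need extra compartments that are not shown here.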
ABSTRACT
The profiling of plasma cell-free DNA (cfDNA) is rapidly becoming a valuable tool for tumor diagnosis, monitoring and prognosis. Diverse plasma cfDNA technologies are in routine or emerging use, including analyses of mutations, copy number alterations, gene fusions and DNA methylation. Recently, new technologies for cfDNA analysis have been developed in laboratories, and these potentially reflect the status of epigenetic modification, the immune microenvironment and the microbiome in tumor tissues. In this review, the authors discuss the principles, methods and effects of the current cfDNA assays and provide an overview of studies that may inform clinical applications in the near future.
ABSTRACT
Gene conversion is the copying of a genetic sequence from a "donor" region to an "acceptor." In nonallelic gene conversion (NAGC), the donor and the acceptor are at distinct genetic loci. Despite the role NAGC plays in various genetic diseases and the concerted evolution of gene families, the parameters that govern NAGC are not well characterized. Here, we survey duplicate gene families and identify converted tracts in 46% of them. These conversions reflect a large GC bias of NAGC. We develop a sequence evolution model that leverages substantially more information in duplicate sequences than used by previous methods and use it to estimate the parameters that govern NAGC in humans: a mean converted tract length of 250 bp and a probability of [Formula: see text] per generation for a nucleotide to be converted (an order of magnitude higher than the point mutation rate). Despite this high baseline rate, we show that NAGC slows down as duplicate sequences diverge, until an eventual "escape" of the sequences from its influence. As a result, NAGC has a small average effect on the sequence divergence of duplicates. This work improves our understanding of the NAGC mechanism and the role that it plays in the evolution of gene duplicates.
Subjects
Evolution, Molecular; Gene Conversion; Genes, Duplicate; Human Genetics; Models, Genetic; Animals; Base Composition; Genetic Loci; Gorilla gorilla/genetics; Humans; Macaca/genetics; Mutation Rate; Pan troglodytes/genetics; Pongo/genetics
ABSTRACT
Human transcription factors recognize specific DNA sequence motifs to regulate transcription. It is unknown whether a single transcription factor is able to bind to distinctly different motifs on chromatin, and if so, what determines the usage of specific motifs. By using a motif-resolution chromatin immunoprecipitation-exonuclease (ChIP-exo) approach, we find that agonist-liganded human androgen receptor (AR) and antagonist-liganded AR bind to two distinctly different motifs, leading to distinct transcriptional outcomes in prostate cancer cells. Further analysis on clinical prostate tissues reveals that the binding of AR to these two distinct motifs is involved in prostate carcinogenesis. Together, these results suggest that unique ligands may switch DNA motifs recognized by ligand-dependent transcription factors in vivo. Our findings also provide a broad mechanistic foundation for understanding ligand-specific induction of gene expression profiles.
Subjects
Androgen Receptor Antagonists/chemistry; Androgens/chemistry; DNA/metabolism; Prostatic Neoplasms/metabolism; Receptors, Androgen/metabolism; Androgen Receptor Antagonists/metabolism; Androgens/metabolism; Cell Proliferation/physiology; Chromatin Immunoprecipitation; Electrophoretic Mobility Shift Assay; Humans; Male; Reverse Transcriptase Polymerase Chain Reaction
ABSTRACT
MOTIVATION: Enhancers are short regulatory DNA elements. They can be bound by proteins (activators) to activate transcription of a gene, and hence play a critical role in promoting gene transcription in eukaryotes. With the avalanche of DNA sequences generated in the post-genomic age, it is a challenging task to develop computational methods for timely identification of enhancers from extremely complicated DNA sequences. Although some efforts have been made in this regard, they were limited to identifying whether a query DNA element is an enhancer or not. According to their distinct levels of biological activity and regulatory effects on target genes, however, enhancers should be further classified into strong and weak ones. RESULTS: In view of this, a two-layer predictor called 'iEnhancer-2L' was proposed by formulating DNA elements with the 'pseudo k-tuple nucleotide composition', into which six DNA local parameters were incorporated. To the best of our knowledge, it is the first computational predictor ever established for identifying not only enhancers, but also their strength. Rigorous cross-validation tests have indicated that iEnhancer-2L holds very high potential to become a useful tool for genome analysis. AVAILABILITY AND IMPLEMENTATION: For the convenience of most experimental scientists, a web server for the two-layer predictor was established at http://bioinformatics.hitsz.edu.cn/iEnhancer-2L/, by which users can easily get their desired results without the need to go through the mathematical details. CONTACT: bliu@gordonlifescience.org, bliu@insun.hit.edu.cn, xlan@stanford.edu, kcchou@gordonlifescience.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Enhancer Elements, Genetic; Sequence Analysis, DNA/methods; Software; DNA/chemistry; Genomics/methods; Nucleotides/analysis
ABSTRACT
DNA methylation is an important epigenetic regulator of gene expression. Recent studies have revealed widespread associations between genetic variation and methylation levels. However, the mechanistic links between genetic variation and methylation remain unclear. To begin addressing this gap, we collected methylation data at ~300,000 loci in lymphoblastoid cell lines (LCLs) from 64 HapMap Yoruba individuals, and genome-wide bisulfite sequence data in ten of these individuals. We identified (at an FDR of 10%) 13,915 cis methylation QTLs (meQTLs), i.e., CpG sites in which changes in DNA methylation are associated with genetic variation at proximal loci. We found that meQTLs are frequently associated with changes in methylation at multiple CpGs across regions of up to 3 kb. Interestingly, meQTLs are also frequently associated with variation in other properties of gene regulation, including histone modifications, DNase I accessibility, chromatin accessibility, and expression levels of nearby genes. These observations suggest that genetic variants may lead to coordinated molecular changes in all of these regulatory phenotypes. One plausible driver of coordinated changes in different regulatory mechanisms is variation in transcription factor (TF) binding. Indeed, we found that SNPs that change predicted TF binding affinities are significantly enriched for associations with DNA methylation at nearby CpGs.
Subjects
DNA Methylation; Gene Expression Regulation; Histones/metabolism; Quantitative Trait Loci; Transcription Factors/metabolism; Binding Sites; Cell Line, Transformed; Computational Biology; Genome-Wide Association Study; Genomics/methods; Genotype; Humans; Phenotype; Polymorphism, Single Nucleotide; Protein Binding
ABSTRACT
Alternative splicing (AS), in higher eukaryotes, is one of the mechanisms of post-transcriptional regulation that generate multiple transcripts from the same gene. One particular mode of AS is the skipping event where an exon may be alternatively excluded or constitutively included in the resulting mature mRNA. Both transcript isoforms from this skipping event site, i.e. in which the exon is either included (inclusion isoform) or excluded (skipping isoform), are typically present in one cell, and maintain a subtle balance that is vital to cellular function and dynamics. However, how the prevailing conditions dictate which isoform is expressed and what biological factors might influence the regulation of this process remain areas requiring further exploration. In this study, we have developed a novel computational method, graph-based exon-skipping scanner (GESS), for de novo detection of skipping event sites from raw RNA-seq reads without prior knowledge of gene annotations, as well as for determining the dominant isoform generated from such sites. We have applied our method to publicly available RNA-seq data in GM12878 and K562 cells from the ENCODE consortium and experimentally validated several skipping site predictions by RT-PCR. Furthermore, we integrated other sequencing-based genomic data to investigate the impact of splicing activities, transcription factors (TFs) and epigenetic histone modifications on splicing outcomes. Our computational analysis found that splice sites within the skipping-isoform-dominated group (SIDG) tended to exhibit weaker MaxEntScan-calculated splice site strength around middle, 'skipping', exons compared to those in the inclusion-isoform-dominated group (IIDG). We further showed the positional preference pattern of splicing factors, characterized by enrichment in the intronic splice sites immediately bordering middle exons. Finally, our analysis suggested that different epigenetic factors may introduce a variable obstacle in the process of exon-intron boundary establishment leading to skipping events.
Subjects
Alternative Splicing; Epigenesis, Genetic; Exons; Sequence Analysis, RNA; Transcription, Genetic; Binding Sites; Cell Line; Computational Biology/methods; Histones/metabolism; Humans; K562 Cells; Nucleotide Motifs; RNA Splice Sites; RNA, Messenger/chemistry; RNA-Binding Proteins/metabolism; Transcription Factors/metabolism
ABSTRACT
Gene fusion is among the primary processes that generate new genes and has been well characterized as a potent pathway of oncogenesis. Here, by high-throughput RNA sequencing of nine paired human endometrial carcinoma (EC) and matched non-cancerous tissues, we found that chimeric translin-associated factor X-disrupted-in-schizophrenia 1 (TSNAX-DISC1) was significantly upregulated in multiple EC samples. Experimental investigation showed that TSNAX-DISC1 appears to be formed by splicing without chromosomal rearrangement. The chimera expression inversely correlated with the binding of CCCTC-binding factor (CTCF) to the insulators. Subsequent investigations indicate that long intergenic non-coding RNA lincRNA-NR_034037, separating TSNAX from DISC1, regulates TSNAX-DISC1 production and TSNAX/DISC1 expression levels by extricating CTCF from insulators. Dysregulation of TSNAX influences steroidogenic factor-1-stimulated transcription on the StAR promoter, altering progesterone actions, implying an association with cancer. Together, these results advance our understanding of the mechanism by which lincRNA-NR_034037 regulates TSNAX-DISC1 formation programs that tightly regulate EC development.
Subjects
DNA-Binding Proteins/genetics; Endometrial Neoplasms/genetics; Endometrial Neoplasms/pathology; High-Throughput Nucleotide Sequencing; Nerve Tissue Proteins/genetics; RNA Splicing/genetics; Repressor Proteins/metabolism; Adolescent; Aged; Animals; Apoptosis; Blotting, Western; CCCTC-Binding Factor; Cell Cycle; Cell Proliferation; Child; Chromatin Immunoprecipitation; DNA-Binding Proteins/metabolism; Endometrial Neoplasms/metabolism; Endometrium/metabolism; Female; Gene Rearrangement; Humans; Immunoenzyme Techniques; Mice; Mice, Nude; Neoplasm Grading; Nerve Tissue Proteins/metabolism; RNA, Messenger/genetics; Real-Time Polymerase Chain Reaction; Repressor Proteins/genetics; Reverse Transcriptase Polymerase Chain Reaction; Tumor Cells, Cultured; Xenograft Model Antitumor Assays
ABSTRACT
We have analyzed publicly available K562 Hi-C data, which enable genome-wide unbiased capturing of chromatin interactions, using a Mixture Poisson Regression Model and a power-law decay background to define a highly specific set of interacting genomic regions. We integrated multiple ENCODE Consortium resources with the Hi-C data, using DNase-seq data and ChIP-seq data for 45 transcription factors and 9 histone modifications. We classified 12 different sets (clusters) of interacting loci that can be distinguished by their chromatin modifications and which can be categorized into two types of chromatin linkages. The different clusters of loci display very different relationships with transcription factor-binding sites. As expected, many of the transcription factors show binding patterns specific to clusters composed of interacting loci that encompass promoters or enhancers. However, cluster 9, which is distinguished by marks of open chromatin but not by active enhancer or promoter marks, was not bound by most transcription factors but was highly enriched for three transcription factors (GATA1, GATA2 and c-Jun) and three chromatin modifiers (BRG1, INI1 and SIRT6). To investigate the impact of chromatin organization on gene regulation, we performed RNA-seq analyses before and after knockdown of GATA1 or GATA2. We found that knockdown of the GATA factors not only alters the expression of genes having a nearby bound GATA but also affects expression of genes in interacting loci. Our work, in combination with previous studies linking regulation by GATA factors with c-Jun and BRG1, provides genome-wide evidence that Hi-C data identify sets of biologically relevant interacting loci.
Subjects
Chromatin/metabolism; Gene Expression Regulation; Transcription Factors/metabolism; Binding Sites; Chromatin/chemistry; Chromatin/classification; Chromatin Immunoprecipitation; Cluster Analysis; Epigenesis, Genetic; GATA Transcription Factors/metabolism; Genetic Loci; Genome, Human; Histones/metabolism; Humans; K562 Cells; Regression Analysis; Regulatory Elements, Transcriptional; Sequence Analysis, DNA; Sequence Analysis, RNA; Systems Integration
ABSTRACT
Identifying interactions between T-cell receptors (TCRs) and immunogenic peptides holds profound implications across diverse research domains and clinical scenarios. Unsupervised clustering models (UCMs) cannot predict peptide-TCR binding directly, while supervised predictive models (SPMs) often face challenges in identifying antigens previously unencountered by the immune system or possessing limited TCR binding repertoires. Therefore, we propose HeteroTCR, an SPM based on Heterogeneous Graph Neural Network (GNN), to accurately predict peptide-TCR binding probabilities. HeteroTCR captures within-type (TCR-TCR or peptide-peptide) similarity information and between-type (peptide-TCR) interaction insights for predictions on unseen peptides and TCRs, surpassing limitations of existing SPMs. Our evaluation shows HeteroTCR outperforms state-of-the-art models on independent datasets. Ablation studies and visual interpretation underscore the Heterogeneous GNN module's critical role in enhancing HeteroTCR's performance by capturing pivotal binding process features. We further demonstrate the robustness and reliability of HeteroTCR through validation using single-cell datasets, aligning with the expectation that pMHC-TCR complexes with higher predicted binding probabilities correspond to increased binding fractions.
Subjects
Neural Networks, Computer; Peptides; Receptors, Antigen, T-Cell; Receptors, Antigen, T-Cell/metabolism; Receptors, Antigen, T-Cell/immunology; Receptors, Antigen, T-Cell/chemistry; Peptides/chemistry; Peptides/metabolism; Peptides/immunology; Protein Binding; Humans; Computational Biology/methods
ABSTRACT
Organogenesis is a highly complex and precisely regulated process. Here we profiled the chromatin accessibility in >350,000 cells derived from 13 mouse embryos at four developmental stages from embryonic day (E) 10.5 to E13.5 by SPATAC-seq in a single experiment. The resulting atlas revealed the status of 830,873 candidate cis-regulatory elements in 43 major cell types. By integrating the chromatin accessibility atlas with the previous transcriptomic dataset, we characterized cis-regulatory sequences and transcription factors associated with cell fate commitment, such as Nr5a2 in the development of gastrointestinal tract, which was preliminarily supported by the in vivo experiment in zebrafish. Finally, we integrated this atlas with the previous single-cell chromatin accessibility dataset from 13 adult mouse tissues to delineate the developmental stage-specific gene regulatory programmes within and across different cell types and identify potential molecular switches throughout lineage development. This comprehensive dataset provides a foundation for exploring transcriptional regulation in organogenesis.
Subjects
Chromatin; Gene Expression Regulation, Developmental; Organogenesis; Single-Cell Analysis; Zebrafish; Animals; Organogenesis/genetics; Chromatin/metabolism; Chromatin/genetics; Zebrafish/genetics; Zebrafish/embryology; Zebrafish/metabolism; Mice; Transcription Factors/metabolism; Transcription Factors/genetics; Cell Lineage/genetics; Transcriptome/genetics; Embryo, Mammalian/metabolism; Female; Mice, Inbred C57BL
ABSTRACT
This study introduces VitTCR, a predictive model based on the vision transformer (ViT) architecture, aimed at identifying interactions between T cell receptors (TCRs) and peptides, which are crucial for developing cancer immunotherapies and vaccines. VitTCR converts TCR-peptide interactions into numerical AtchleyMaps using Atchley factors for prediction, achieving an AUROC of 0.6485 and an AUPR of 0.6295. Benchmark analysis indicates that VitTCR's performance is comparable to that of other models, and further comparative studies are suggested to understand its effectiveness in varied contexts. Additionally, integrating a positional bias weight matrix (PBWM), derived from amino acid contact probabilities in structurally resolved pMHC-TCR complexes, slightly improves VitTCR's accuracy. The model's predictions show weak yet statistically significant correlations with immunological factors like T cell clonal expansion and activation percentages, underscoring the biological relevance of VitTCR's predictive capabilities. VitTCR emerges as a valuable computational tool for predicting TCR-peptide interactions, offering insights for immunotherapy and vaccine development.
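For readers unfamiliar with the AUROC metric quoted in this abstract, the sketch below shows one standard way to compute it: as the probability that a randomly chosen binding pair scores higher than a randomly chosen non-binding pair, with ties counted as one half. It is a generic illustration, not VitTCR's evaluation code. The function name `auroc` and the toy labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels : iterable of 0/1 ground-truth values (1 = binder)
    scores : iterable of predicted binding probabilities, same length
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data only: four hypothetical TCR-peptide pairs.
    print(auroc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported 0.6485 in context. The quadratic pairwise loop is fine for small examples, and a rank-based implementation would be preferable for large benchmark sets.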