ABSTRACT
As a main subtype of post-translational modification (PTM), protein lysine acylations (PLAs) play crucial roles in regulating diverse functions of proteins. With recent advancements in proteomics technology, the identification of PTM is becoming a data-rich field. A large amount of experimentally verified data is urgently required to be translated into valuable biological insights. With computational approaches, PLA can be accurately detected across the whole proteome, even for organisms with small-scale datasets. Herein, a comprehensive summary of 166 in silico PLA prediction methods is presented, including a single type of PLA site and multiple types of PLA sites. This recapitulation covers important aspects that are critical for the development of a robust predictor, including data collection and preparation, sample selection, feature representation, classification algorithm design, model evaluation, and method availability. Notably, we discuss the application of protein language models and transfer learning to solve the small-sample learning issue. We also highlight the prediction methods developed for functionally relevant PLA sites and species/substrate/cell-type-specific PLA sites. In conclusion, this systematic review could potentially facilitate the development of novel PLA predictors and offer useful insights to researchers from various disciplines.
Subjects
Computational Biology , Lysine , Protein Processing, Post-Translational , Proteins , Humans , Acylation , Algorithms , Computational Biology/methods , Databases, Protein , Lysine/metabolism , Lysine/chemistry , Proteins/metabolism , Proteins/chemistry , Software
ABSTRACT
Lysine 2-hydroxyisobutylation (Khib), which was first reported in 2014, has been shown to play vital roles in a myriad of biological processes including gene transcription, regulation of chromatin functions, purine metabolism, pentose phosphate pathway and glycolysis/gluconeogenesis. Identification of Khib sites in protein substrates represents an initial but crucial step in elucidating the molecular mechanisms underlying protein 2-hydroxyisobutylation. Experimental identification of Khib sites mainly depends on the combination of liquid chromatography and mass spectrometry. However, experimental approaches for identifying Khib sites are often time-consuming and expensive compared with computational approaches. Previous studies have shown that Khib sites may have distinct characteristics for different cell types of the same species. Several tools have been developed to identify Khib sites, which exhibit high diversity in their algorithms, encoding schemes and feature selection techniques. However, to date, there are no tools designed for predicting cell type-specific Khib sites. Therefore, it is highly desirable to develop an effective predictor for cell type-specific Khib site prediction. Inspired by the residual connection of ResNet, we develop a deep learning-based approach, termed ResNetKhib, which leverages both the one-dimensional convolution and transfer learning to enable and improve the prediction of cell type-specific 2-hydroxyisobutylation sites. ResNetKhib is capable of predicting Khib sites for four human cell types, mouse liver cell and three rice cell types. Its performance is benchmarked against the commonly used random forest (RF) predictor on both 10-fold cross-validation and independent tests. 
The results show that ResNetKhib achieves area under the receiver operating characteristic curve (AUC) values ranging from 0.807 to 0.901, depending on the cell type and species, outperforming RF-based predictors and other currently available Khib site prediction tools. We also provide an online web server implementing the proposed ResNetKhib algorithm, together with all the curated datasets and the trained model, for the wider research community; it is publicly accessible at https://resnetkhib.erc.monash.edu/.
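The residual connection and one-dimensional convolution mentioned in this abstract can be illustrated with a minimal single-channel sketch. It is not the ResNetKhib architecture: the abstract does not state the number of layers, channels, or kernel widths, so the function names, the odd-width kernel, and the zero padding below are all assumptions made for illustration.

```python
def conv1d_same(signal, kernel):
    """Zero-padded 1D cross-correlation that keeps the input length.

    Assumes an odd-length kernel so the output aligns with the input.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel width")
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(weight * padded[i + j] for j, weight in enumerate(kernel))
        for i in range(len(signal))
    ]


def residual_block(signal, kernel):
    """Add the convolution output back onto the input, then apply ReLU.

    The skip connection (signal + conv) is the residual idea borrowed
    from ResNet; everything else here is a toy stand-in.
    """
    conv = conv1d_same(signal, kernel)
    return [max(0.0, x + c) for x, c in zip(signal, conv)]
```

With an all-zero kernel the block passes non-negative inputs through unchanged, which is the property that makes residual layers easy to stack: a layer that has learned nothing does no harm.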
Subjects
Lysine , Protein Processing, Post-Translational , Animals , Mice , Humans , Lysine/metabolism , Proteins/metabolism , Algorithms , Machine Learning
ABSTRACT
Genome-wide association studies (GWASs) have been widely applied in the neuroimaging field to discover genetic variants associated with brain-related traits. So far, almost all GWASs conducted in neuroimaging genetics are performed on univariate quantitative features summarized from brain images. On the other hand, powerful deep learning technologies have dramatically improved our ability to classify images. In this study, we proposed and implemented a novel machine learning strategy for systematically identifying genetic variants that lead to detectable nuances on Magnetic Resonance Images (MRI). For a specific single nucleotide polymorphism (SNP), if MRI images labeled by genotypes of this SNP can be reliably distinguished using machine learning, we then hypothesized that this SNP is likely to be associated with brain anatomy or function which is manifested in MRI brain images. We applied this strategy to a catalog of MRI image and genotype data collected by the Alzheimer's Disease Neuroimaging Initiative (ADNI) consortium. From the results, we identified novel variants that show strong association to brain phenotypes.
Subjects
Alzheimer Disease , Brain , Deep Learning , Genome-Wide Association Study , Magnetic Resonance Imaging , Neuroimaging , Polymorphism, Single Nucleotide , Humans , Magnetic Resonance Imaging/methods , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide/genetics , Brain/diagnostic imaging , Alzheimer Disease/genetics , Alzheimer Disease/diagnostic imaging , Alzheimer Disease/classification , Neuroimaging/methods , Genotype , Computational Biology/methods , Male
ABSTRACT
Fragile X-associated tremor/ataxia syndrome (FXTAS) is a debilitating late-onset neurodegenerative disease in premutation carriers of the expanded CGG repeat in FMR1 that presents with a spectrum of neurological manifestations, such as gait ataxia, intention tremor, and parkinsonism [P. J. Hagerman, R. J. Hagerman, Ann. N. Y. Acad. Sci. 1338, 58–70 (2015); S. Jacquemont et al., JAMA 291, 460–469 (2004)]. Here, we performed whole-genome sequencing (WGS) on male premutation carriers (CGG 55–200) and prioritized candidate variants to screen for candidate genetic modifiers using a Drosophila model of FXTAS. We found 18 genes that genetically modulate CGG-associated neurotoxicity in Drosophila, such as Prosbeta5 (PSMB5), pAbp (PABPC1L), e(y)1 (TAF9), and CG14231 (OSGEPL1). Among them, knockdown of Prosbeta5 (PSMB5) suppressed CGG-associated neurodegeneration in the fly as well as in N2A cells. Interestingly, an expression quantitative trait locus variant in PSMB5, PSMB5 rs11543947-A, was found to be associated with decreased expression of PSMB5 and delayed onset of FXTAS in human FMR1 premutation carriers. Finally, we demonstrate evidence that PSMB5 knockdown results in suppression of CGG neurotoxicity via both the RAN translation and RNA-mediated toxicity mechanisms, thereby presenting a therapeutic strategy for FXTAS.
Subjects
Ataxia , Fragile X Syndrome , Proteasome Endopeptidase Complex , Tremor , Animals , Ataxia/genetics , Disease Models, Animal , Drosophila melanogaster , Fragile X Mental Retardation Protein/genetics , Fragile X Syndrome/genetics , Humans , Male , Proteasome Endopeptidase Complex/genetics , Tremor/genetics
ABSTRACT
Understanding the impact of non-coding sequence variants on complex diseases is an essential problem. We present a novel ensemble learning framework, CASAVA, to predict genomic loci in terms of disease category-specific risk. Using disease-associated variants identified by GWAS as training data, and diverse sequencing-based genomics and epigenomics profiles as features, CASAVA provides risk prediction of 24 major categories of diseases throughout the human genome. Our studies showed that CASAVA scores at a genomic locus provide a reasonable prediction of the disease-specific and disease category-specific risk for non-coding variants located within the locus. Taking MHC2TA and immune system diseases as an example, we demonstrate the potential of CASAVA in revealing variant-disease associations. A website (http://zhanglabtools.org/CASAVA) has been built to facilitate easy access to CASAVA scores.
Subjects
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genome, Human , Genomics , Humans , Machine Learning
ABSTRACT
Given that most tissues consist of abundant and diverse (sub-)cell types, an important yet unaddressed problem in bulk RNA-seq analysis is to identify in which (sub-)cell type(s) the differential expression occurs. Single-cell RNA-sequencing (scRNA-seq) technologies can answer the question, but they are often labor-intensive and cost-prohibitive. Here, we present LRcell, a computational method aiming to identify the specific (sub-)cell type(s) that drive the changes observed in a bulk RNA-seq experiment. In addition, LRcell provides pre-embedded marker genes computed from putative scRNA-seq experiments as options to execute the analyses. We conduct a simulation study to demonstrate the effectiveness and reliability of LRcell. Using three different real datasets, we show that LRcell successfully identifies known cell types involved in psychiatric disorders. Applying LRcell to bulk RNA-seq results can produce a hypothesis on which (sub-)cell type(s) contribute to the differential expression. LRcell is complementary to cell type deconvolution methods.
Subjects
Gene Expression Profiling , Single-Cell Analysis , Computer Simulation , Gene Expression Profiling/methods , Humans , RNA-Seq , Reproducibility of Results , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
ABSTRACT
Transmitted/founder (T/F) HIV-1 envelope proteins (Envs) from infected individuals that developed neutralization breadth are likely to possess inherent features desirable for vaccine immunogen design. To explore this premise, we conducted an immunization study in rhesus macaques (RM) using T/F Env sequences from two human subjects, one of whom developed potent and broad neutralizing antibodies (Z1800M) while the other developed little to no neutralizing antibody responses (R66M) during HIV-1 infection. Using a DNA/MVA/protein immunization protocol, 10 RM were immunized with each T/F Env. Within each T/F Env group, the protein boosts were administered as either monomeric gp120 or stabilized trimeric gp140 protein. All vaccination regimens elicited high titers of antigen-specific IgG, and two animals that received monomeric Z1800M Env gp120 developed autologous neutralizing activity. Using early Env escape variants isolated from subject Z1800M as guides, the serum neutralizing activity of the two immunized RM was found to be dependent on the gp120 V5 region. Interestingly, the exact same residues of V5 were also targeted by a neutralizing monoclonal antibody (nmAb) isolated from the subject Z1800M early in infection. Glycan profiling and computational modeling of the Z1800M Env gp120 immunogen provided further evidence that the V5 loop is exposed in this T/F Env and was a dominant feature that drove neutralizing antibody targeting during infection and immunization. An expanded B cell clonotype was isolated from one of the neutralization-positive RM and nmAbs corresponding to this group demonstrated V5-dependent neutralization similar to both the RM serum and the human Z1800M nmAb. 
The results demonstrate that neutralizing antibody responses elicited by the Z1800M T/F Env in RM converged with those in the HIV-1 infected human subject, illustrating the potential of using immunogens based on this or other T/F Envs with well-defined immunogenicity as a starting point to drive breadth.
Subjects
AIDS Vaccines , HIV Infections , HIV-1 , Animals , Antibodies, Neutralizing , HIV Antibodies , HIV Envelope Protein gp120 , HIV Infections/prevention & control , Humans , Macaca mulatta , env Gene Products, Human Immunodeficiency Virus
ABSTRACT
BACKGROUND AND PURPOSE: In the context of the widespread availability of magnetic resonance imaging (MRI) and aggressive salvage irradiation techniques, there has been controversy surrounding the use of prophylactic cranial irradiation (PCI) for small-cell lung cancer (SCLC) patients. This study aimed to explore whether regular brain MRI plus salvage brain irradiation (SBI) is not inferior to PCI in patients with limited-stage SCLC (LS-SCLC). METHODS: This real-world multicenter study, which was conducted between January 2014 and September 2020 at three general hospitals, involved patients with LS-SCLC who had a good response to initial chemoradiotherapy and no brain metastasis confirmed by MRI. Overall survival (OS) was compared between patients who received PCI and patients who did not receive PCI for various reasons but instead chose regular MRI surveillance, followed by SBI when brain metastasis was detected. RESULTS: A total of 120 patients met the inclusion criteria. Of these, 55 patients received regular brain MRI plus SBI (SBI group) and 65 patients received PCI (PCI group). There was no statistically significant difference in median OS between the two groups (27.14 versus 33.00 months; P = 0.18). In the SBI group, 32 patients underwent whole brain radiotherapy and 23 patients underwent whole brain radiotherapy + simultaneous integrated boost. On multivariate analysis, only extracranial metastasis was independently associated with poor OS in the SBI group. CONCLUSION: The results of this real-world study showed that MRI surveillance plus SBI is not inferior to PCI in OS for LS-SCLC patients who had a good response to initial chemoradiotherapy.
Subjects
Brain Neoplasms , Cranial Irradiation , Lung Neoplasms , Magnetic Resonance Imaging , Salvage Therapy , Small Cell Lung Carcinoma , Humans , Small Cell Lung Carcinoma/radiotherapy , Small Cell Lung Carcinoma/diagnostic imaging , Small Cell Lung Carcinoma/mortality , Small Cell Lung Carcinoma/pathology , Male , Female , Magnetic Resonance Imaging/methods , Lung Neoplasms/radiotherapy , Lung Neoplasms/mortality , Lung Neoplasms/diagnostic imaging , Lung Neoplasms/pathology , Middle Aged , Aged , Cranial Irradiation/methods , Brain Neoplasms/secondary , Brain Neoplasms/radiotherapy , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/mortality , Retrospective Studies , Neoplasm Staging , Adult , Chemoradiotherapy/methods
ABSTRACT
PURPOSE: To establish two nomograms to quantify the risk of lung metastasis (LM) in laryngeal carcinoma (LC) and predict the overall survival of LC patients with LM. METHODS: A total of 9515 LC patients diagnosed histologically from 2000 to 2019 were collected from the Surveillance, Epidemiology, and End Results database. The independent diagnostic factors for LM in LC patients and prognostic factors for LC patients with LM were identified by logistic and Cox regression analysis, respectively. Nomograms were established based on regression coefficients and evaluated by receiver operating characteristic curves, calibration curves, and decision curve analysis. RESULTS: Patients with tumors of the supraglottis, higher pathological grade, higher N stage, and distant metastasis (bone, brain, or liver) were more likely to have LM (P < 0.05). Chemotherapy, surgery, and radiotherapy were independent factors for the overall survival of LC patients with LM (P < 0.05). The areas under the curve of the diagnostic nomogram were 0.834 and 0.816 in the training and validation cohorts, respectively. For the prognostic nomogram, the areas under the curve at 1, 2, and 3 years were 0.735, 0.734, and 0.709 in the training cohort and 0.705, 0.803, and 0.809 in the validation cohort. The calibration curves and decision curve analysis indicated good performance of the nomograms. CONCLUSION: Distant metastasis (bone, brain, or liver) and N stage should be considered for prediction of LM in LC patients. Chemotherapy is the most significant prognostic factor improving the survival of LC patients with LM. The two nomograms may help guide precautionary measures and treatment decisions.
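As a rough illustration of how a nomogram can be derived from regression coefficients, the sketch below scales each predictor's effect on the log-odds so that the most influential predictor spans 100 points, and converts a linear predictor into a risk with the logistic function. This is one common convention, not necessarily the procedure used in the study; the function names, the predictor names, and every numeric value are hypothetical and are not taken from the abstract.

```python
import math


def nomogram_points(coefficients, ranges):
    """Scale each predictor's log-odds span to a 0-100 point axis.

    coefficients: {name: beta} from a fitted logistic model (hypothetical here).
    ranges: {name: (min_value, max_value)} observed for each predictor.
    """
    spans = {
        name: abs(beta) * (ranges[name][1] - ranges[name][0])
        for name, beta in coefficients.items()
    }
    largest = max(spans.values())
    if largest == 0:
        raise ValueError("all predictors have zero effect")
    return {name: 100.0 * span / largest for name, span in spans.items()}


def predicted_risk(intercept, coefficients, values):
    """Logistic-model probability for one patient's predictor values."""
    linear_predictor = intercept + sum(
        beta * values[name] for name, beta in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficients for illustration only; not the study's estimates.
example_coefficients = {"n_stage": 0.8, "bone_metastasis": 1.6}
example_ranges = {"n_stage": (0, 3), "bone_metastasis": (0, 1)}
example_points = nomogram_points(example_coefficients, example_ranges)
```

In this toy setup, the predictor with the widest log-odds span receives the full 100-point axis and the others are scaled proportionally, which is what lets a reader add up points by hand.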
Subjects
Laryngeal Neoplasms , Lung Neoplasms , Nomograms , SEER Program , Humans , Laryngeal Neoplasms/pathology , Laryngeal Neoplasms/therapy , Laryngeal Neoplasms/mortality , Laryngeal Neoplasms/diagnosis , Male , Female , Lung Neoplasms/pathology , Lung Neoplasms/mortality , Lung Neoplasms/therapy , Lung Neoplasms/diagnosis , Middle Aged , Prognosis , Aged , Neoplasm Staging , ROC Curve , Adult , Survival Rate
ABSTRACT
Alzheimer's disease (AD) is the most common form of dementia. Obesity in middle age increases AD risk and severity, which is alarming given that obesity prevalence peaks at middle age and obesity rates are accelerating worldwide. Midlife, but not late-life obesity increases AD risk, suggesting that this interaction is specific to preclinical AD. AD pathology begins in middle age, with accumulation of amyloid beta (Aβ), hyperphosphorylated tau, metabolic decline, and neuroinflammation occurring decades before cognitive symptoms appear. We used a transcriptomic discovery approach in young adult (6.5 months old) male and female TgF344-AD rats that overexpress mutant human amyloid precursor protein and presenilin-1 and wild-type (WT) controls to determine whether inducing obesity with a high-fat/high-sugar "Western" diet during preclinical AD increases brain metabolic dysfunction in dorsal hippocampus (dHC), a brain region vulnerable to the effects of obesity and early AD. Analyses of dHC gene expression data showed dysregulated mitochondrial and neurotransmission pathways, and up-regulated genes involved in cholesterol synthesis. Western diet amplified the number of genes that were different between AD and WT rats and added pathways involved in noradrenergic signaling, dysregulated inhibition of cholesterol synthesis, and decreased intracellular lipid transporters. Importantly, the Western diet impaired dHC-dependent spatial working memory in AD but not WT rats, confirming that the dietary intervention accelerated cognitive decline. To examine later consequences of early transcriptional dysregulation, we measured dHC monoamine levels in older (13 months old) AD and WT rats of both sexes after long-term chow or Western diet consumption. Norepinephrine (NE) abundance was significantly decreased in AD rats, NE turnover was increased, and the Western diet attenuated the AD-induced increases in turnover. 
Collectively, these findings indicate obesity during prodromal AD impairs memory, potentiates AD-induced metabolic decline likely leading to an overproduction of cholesterol, and interferes with compensatory increases in NE transmission.
ABSTRACT
Alzheimer's disease (AD) is a neurodegenerative disorder influenced by a complex interplay of environmental, epigenetic, and genetic factors. DNA methylation (5mC) and hydroxymethylation (5hmC) are DNA modifications that serve as tissue-specific and temporal regulators of gene expression. TET family enzymes dynamically regulate these epigenetic modifications in response to environmental conditions, connecting environmental factors with gene expression. Previous epigenetic studies have identified 5mC and 5hmC changes associated with AD. In this study, we performed targeted resequencing of TET1 on a cohort of early-onset AD (EOAD) and control samples. Through gene-wise burden analysis, we observed significant enrichment of rare TET1 variants associated with AD (p = 0.04). We also profiled 5hmC in human postmortem brain tissues from AD and control groups. Our analysis identified differentially hydroxymethylated regions (DhMRs) in key genes responsible for regulating the methylome: TET3, DNMT3L, DNMT3A, and MECP2. To further investigate the role of Tet1 in AD pathogenesis, we used the 5xFAD mouse model with a Tet1 KO allele to examine how Tet1 loss influences AD pathogenesis. We observed significant changes in neuropathology, 5hmC, and RNA expression associated with Tet1 loss, while the behavioral alterations were not significant. The loss of Tet1 significantly increased amyloid plaque burden in the 5xFAD mouse (p = 0.044) and led to a non-significant trend towards exacerbated AD-associated stress response in 5xFAD mice. At the molecular level, we found significant DhMRs enriched in genes involved in pathways responsible for neuronal projection organization, dendritic spine development and organization, and myelin assembly. RNA-Seq analysis revealed a significant increase in the expression of AD-associated genes such as Mpeg1, Ctsd, and Trem2. 
In conclusion, our results suggest that TET enzymes, particularly TET1, which regulate the methylome, may contribute to AD pathogenesis, as the loss of TET function increases AD-associated pathology.
Subjects
Alzheimer Disease , Humans , Mice , Animals , Alzheimer Disease/metabolism , 5-Methylcytosine , Epigenesis, Genetic , DNA Methylation , Transcription Factors/metabolism , Mixed Function Oxygenases/genetics , Mixed Function Oxygenases/metabolism , Proto-Oncogene Proteins/genetics , Proto-Oncogene Proteins/metabolism , Membrane Glycoproteins/metabolism , Receptors, Immunologic/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism
ABSTRACT
BACKGROUND: Long noncoding RNAs (lncRNAs) are RNA molecules with over 200 nucleotides that do not code for proteins, but are known to be widely expressed and have key roles in gene regulation and cellular functions. They are also found to be involved in the onset and development of various cancers, including prostate cancer (PCa). Since PCa is commonly driven by androgen-regulated signaling, mainly through stimulated pathways, identifying lncRNAs and determining their influence on the androgen response is useful and necessary. LncRNAs regulated by the androgen receptor (AR) can serve as potential biomarkers for PCa. In the present study, gene expression data analysis was performed to identify lncRNAs related to the androgen response pathway. METHODS AND RESULTS: We used publicly available RNA-sequencing and ChIP-seq data to identify lncRNAs that are associated with the androgen response pathway. Using Universal Correlation Coefficient (UCC) and Pearson Correlation Coefficient (PCC) analyses, we found 15 lncRNAs that (a) have highly correlated expression with androgen response genes in PCa and (b) are differentially expressed in the setting of treatment with an androgen agonist as well as an antagonist compared to controls. Using publicly available ChIP-seq data, we investigated the role of the androgen/AR axis in regulating expression of these lncRNAs. We observed AR binding in the promoter regions of 5 lncRNAs (MIR99AHG, DUBR, DRAIC, PVT1, and COLCA1), showing the direct influence of AR on their expression and highlighting their association with the androgen response pathway. CONCLUSION: By utilizing publicly available multiomics data and by employing in silico methods, we identified five candidate lncRNAs that are involved in the androgen response pathway. These lncRNAs should be investigated as potential biomarkers for PCa.
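The Pearson Correlation Coefficient named in this abstract is the covariance of two expression vectors divided by the product of their standard deviations. The sketch below is a minimal plain-Python version for two equal-length lists of expression values; the function name and the input checks are assumptions for illustration, and the Universal Correlation Coefficient is not covered here.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of centered products and squares (the 1/n factors cancel).
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    if variance_x == 0 or variance_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(variance_x * variance_y)
```

A value near 1 or -1 indicates a strong linear relationship between, for example, an lncRNA's expression and that of an androgen response gene across samples; a value near 0 indicates no linear relationship.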
Subjects
Prostatic Neoplasms , RNA, Long Noncoding , Male , Humans , Androgens , RNA, Long Noncoding/genetics , Cell Line, Tumor , Prostatic Neoplasms/drug therapy , Prostatic Neoplasms/genetics , Prostatic Neoplasms/metabolism , Receptors, Androgen/genetics , Receptors, Androgen/metabolism , Gene Expression Regulation , Gene Expression Regulation, Neoplastic
ABSTRACT
Chromosomes of metazoan organisms are partitioned in the interphase nucleus into discrete topologically associating domains (TADs). Borders between TADs are formed in regions containing active genes and clusters of architectural protein binding sites. The transcription of most genes is repressed after temperature stress in Drosophila. Here we show that temperature stress induces relocalization of architectural proteins from TAD borders to inside TADs, and this is accompanied by a dramatic rearrangement in the 3D organization of the nucleus. TAD border strength declines, allowing for an increase in long-distance inter-TAD interactions. Similar but quantitatively weaker effects are observed upon inhibition of transcription or depletion of individual architectural proteins. Heat shock-induced inter-TAD interactions result in increased contacts among enhancers and promoters of silenced genes, which recruit Pc and form Pc bodies in the nucleolus. These results suggest that the TAD organization of metazoan genomes is plastic and can be reconfigured quickly.
Subjects
Chromatin/genetics , Chromosomes/genetics , Drosophila Proteins/genetics , Drosophila melanogaster/genetics , Polycomb-Group Proteins/metabolism , Animals , Cell Line , Drosophila Proteins/chemistry , Drosophila Proteins/metabolism , Drosophila melanogaster/metabolism , Enhancer Elements, Genetic , Molecular Sequence Data , Polycomb-Group Proteins/chemistry , Polycomb-Group Proteins/genetics , Promoter Regions, Genetic , Regulatory Sequences, Nucleic Acid , Stress, Physiological , Temperature
ABSTRACT
Identification of repeat-associated non-AUG (RAN) translation in trinucleotide (CAG) repeat diseases has led to the emerging concept that CAG repeat diseases are caused by nonpolyglutamine products. Nonetheless, the in vivo contribution of RAN translation to the pathogenesis of CAG repeat diseases remains elusive. Via CRISPR/Cas9-mediated genome editing, we established knock-in mouse models that harbor expanded CAG repeats in the mouse huntingtin gene to express RAN-translated products with or without polyglutamine peptides. We found that RAN translation is not detected in the knock-in mouse models when expanded CAG repeats are expressed at the endogenous level. Consistently, the expanded CAG repeats that cannot be translated into polyglutamine repeats do not yield the neuropathological and behavioral phenotypes that were found in knock-in mice expressing expanded polyglutamine repeats. Our findings suggest that RAN-translated products do not play a major role in the pathogenesis of CAG repeat diseases and underscore the importance of targeting polyglutamine repeats for therapeutics.
Subjects
Huntington Disease/genetics , RNA/genetics , Animals , Disease Models, Animal , Female , Gene Knock-In Techniques , Humans , Huntingtin Protein/genetics , Huntingtin Protein/metabolism , Huntington Disease/metabolism , Male , Mice , Mice, Inbred C57BL , Protein Biosynthesis , RNA/metabolism , Trinucleotide Repeat Expansion , Trinucleotide Repeats
ABSTRACT
There are significant correlations among different types of genetic, genomic and epigenomic features within the genome. These correlations make the in silico feature prediction possible through statistical or machine learning models. With the accumulation of a vast amount of high-throughput data, feature prediction has gained significant interest lately, and a plethora of papers have been published in the past few years. Here we provide a comprehensive review on these published works, categorized by the prediction targets, including protein binding site, enhancer, DNA methylation, chromatin structure and gene expression. We also provide discussions on some important points and possible future directions.
ABSTRACT
Histone H2B ubiquitination plays an important role in transcription regulation. It has been shown that H2B ubiquitination is regulated by multiple upstream events associated with elongating RNA polymerase. Here we demonstrate that H2B K34 ubiquitylation by the MOF-MSL complex is part of the protein networks involved in early steps of transcription elongation. Knocking down MSL2 in the MOF-MSL complex affects not only global H2BK34ub, but also multiple cotranscriptionally regulated histone modifications. More importantly, we show that the MSL, PAF1, and RNF20/40 complexes are recruited and stabilized at active gene promoters by direct binary interactions. The stabilized complexes serve to regulate chromatin association of pTEFb through a positive feedback loop and facilitate Pol II transition during early transcription elongation. Results from our biochemical studies are underscored by genome-wide analyses that show high RNA Pol II processivity and transcription activity at MSL target genes.
Subjects
Basic-Leucine Zipper Transcription Factors/metabolism , Histones/chemistry , Nuclear Proteins/metabolism , RNA Polymerase II/metabolism , Ubiquitin-Protein Ligases/genetics , Ubiquitination , Antibodies/immunology , Binding Sites/genetics , Cell Line, Tumor , Chromatin/genetics , Gene Expression Regulation , Genome-Wide Association Study , HeLa Cells , Histone Acetyltransferases/chemistry , Histones/immunology , Humans , Promoter Regions, Genetic , Protein Binding/genetics , RNA Interference , RNA, Small Interfering , Transcription Factors , Transcription, Genetic , Ubiquitin-Protein Ligases/chemistry
ABSTRACT
Here we report a comprehensive characterization of our recently developed inhibitor MM-401 that targets the MLL1 H3K4 methyltransferase activity. MM-401 is able to specifically inhibit MLL1 activity by blocking MLL1-WDR5 interaction and thus the complex assembly. This targeting strategy does not affect other mixed-lineage leukemia (MLL) family histone methyltransferases (HMTs), revealing a unique regulatory feature for the MLL1 complex. Using MM-401 and its enantiomer control MM-NC-401, we show that inhibiting MLL1 methyltransferase activity specifically blocks proliferation of MLL cells by inducing cell-cycle arrest, apoptosis, and myeloid differentiation without general toxicity to normal bone marrow cells or non-MLL cells. More importantly, transcriptome analyses show that MM-401 induces changes in gene expression similar to those of MLL1 deletion, supporting a predominant role of MLL1 activity in regulating MLL1-dependent leukemia transcription program. We envision broad applications for MM-401 in basic and translational research.
Subjects
Histone-Lysine N-Methyltransferase/antagonists & inhibitors , Histone-Lysine N-Methyltransferase/metabolism , Histones/metabolism , Leukemia, Biphenotypic, Acute/enzymology , Myeloid-Lymphoid Leukemia Protein/metabolism , Animals , Apoptosis/drug effects , Cell Cycle Checkpoints/drug effects , Cell Differentiation/drug effects , Cell Line, Tumor , Cell Proliferation , Histone Methyltransferases , Histone-Lysine N-Methyltransferase/chemistry , Histone-Lysine N-Methyltransferase/genetics , Humans , Intracellular Signaling Peptides and Proteins , Mice , Myeloid-Lymphoid Leukemia Protein/chemistry , Myeloid-Lymphoid Leukemia Protein/genetics , Oligopeptides/chemistry , Oligopeptides/physiology , Proteins/metabolism , Proto-Oncogene Proteins c-bcl-2/metabolism , Transcriptome/drug effects
ABSTRACT
BACKGROUND: Transcriptional programs control cell fate, and identifying their components is critical for understanding diseases caused by cell lesion, such as podocytopathy. Although many transcription factors (TFs) are necessary for cell-state maintenance in glomeruli, their roles in transcriptional regulation are not well understood. METHODS: The distribution of H3K27ac histones in human glomerulus cells was analyzed to identify superenhancer-associated TFs, and ChIP-seq and transcriptomics were performed to elucidate the regulatory roles of the TFs. Transgenic animal models of disease were further investigated to confirm the roles of specific TFs in podocyte maintenance. RESULTS: Superenhancer distribution revealed a group of potential TFs in core regulatory circuits in human glomerulus cells, including FOXC1/2, WT1, and LMX1B. Integration of transcriptome and cistrome data of FOXC1/2 in mice resolved transcriptional regulation in podocyte maintenance. FOXC1/2 regulated differentiation-associated transcription in mature podocytes. In both humans and animal models, mature podocyte injury was accompanied by deregulation of FOXC1/2 expression, and FOXC1/2 overexpression could protect podocytes in zebrafish. CONCLUSIONS: FOXC1/2 maintain podocyte differentiation through transcriptional stabilization. The genome-wide chromatin resources support further investigation of TFs' regulatory roles in glomeruli transcription programs.
Subjects
Forkhead Transcription Factors/genetics , Podocytes/physiology , Transcription Factors/genetics , Transcription, Genetic , Animals , Cell Differentiation/genetics , Chromosome Mapping , Disease Models, Animal , Forkhead Transcription Factors/metabolism , Gene Expression Profiling , Gene Expression Regulation , Histones , Humans , Kidney Diseases/genetics , Kidney Diseases/metabolism , LIM-Homeodomain Proteins/genetics , LIM-Homeodomain Proteins/metabolism , Mice , Mice, Knockout , Mice, Transgenic , Podocytes/pathology , Transcription Factors/metabolism , Transcriptome , WT1 Proteins/genetics , WT1 Proteins/metabolism , Zebrafish , Zebrafish Proteins/genetics
ABSTRACT
BACKGROUND: Cell type-specific transcriptional programming results from the combinatorial interplay between the repertoire of active regulatory elements. Disease-associated variants disrupt such programming, leading to altered expression of downstream regulated genes and the onset of pathological states. However, due to the non-linear regulatory properties of non-coding elements such as enhancers, which can activate transcription at long distances and in a non-directional way, the identification of causal variants and their target genes remains challenging. Here, we provide a multi-omics analysis to identify regulatory elements associated with functional kidney disease variants, together with their downstream regulated genes. RESULTS: To understand the genetic risk of kidney diseases, we generated a comprehensive dataset of the chromatin landscape of human kidney tubule cells, including transcription-centered 3D chromatin organization, histone modification distribution and transcriptome, using HiChIP, ChIP-seq and RNA-seq. We identified genome-wide functional elements and thousands of interactions between the distal elements and target genes. The results revealed that risk variants for renal tumor and chronic kidney disease were enriched in kidney tubule cells. We further pinpointed the target genes for the variants and validated two target genes by CRISPR/Cas9 genome editing in zebrafish, demonstrating that SLC34A1 and MTX1 are indispensable for maintaining kidney function. CONCLUSIONS: Our results provide a valuable multi-omics resource on the chromatin landscape of human kidney tubule cells and establish a bioinformatic pipeline for dissecting the functions of kidney disease-associated variants based on the cell type-specific epigenome.
Subjects
CRISPR-Cas Systems , Chromatin/metabolism , Epigenome , Kidney Diseases/genetics , Animals , Gene Editing , Humans , Zebrafish
ABSTRACT
MOTIVATION: Annotating a given genomic locus or a set of genomic loci is an important yet challenging task. This is especially true for the non-coding part of the genome, which is enormous yet poorly understood. Since gene set enrichment analyses have been demonstrated to be an effective approach to annotating a set of genes, the same idea can be extended to explore the enrichment of functional elements or features in a set of genomic intervals to reveal potential functional connections. RESULTS: In this study, we describe a novel computational strategy named loci2path that takes advantage of the newly emerged, genome-wide and tissue-specific expression quantitative trait loci (eQTL) information to help annotate a set of genomic intervals in terms of transcription regulation. By checking the presence or absence of millions of eQTLs in a set of input genomic intervals, combined with grouping eQTLs by the pathways or gene sets that their target genes belong to, loci2path builds a bridge connecting genomic intervals to functional pathways and pre-defined biologically meaningful gene sets, revealing potential regulatory connections. Our method enjoys two key advantages over existing methods: first, we no longer rely on proximity to link a locus to a gene, which has been shown to be unreliable; second, eQTLs allow us to provide the regulatory annotation in the context of specific tissue types. To demonstrate its utility, we apply loci2path to sets of genomic intervals harboring disease-associated variants as queries. Using 1 702 612 eQTLs discovered by the Genotype-Tissue Expression (GTEx) project across 44 tissues and 6320 pathways or gene sets cataloged in MSigDB as the annotation resource, our method successfully identifies highly relevant biological pathways and reveals disease mechanisms for psoriasis and other immune-related diseases. Tissue specificity analysis of the associated eQTLs provides additional evidence of the distinct roles that different tissues play in the disease mechanisms. AVAILABILITY AND IMPLEMENTATION: loci2path is published as an open source Bioconductor package, and it is available at http://bioconductor.org/packages/release/bioc/html/loci2path.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
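The strategy summarized in this abstract (find the eQTLs that fall inside the query intervals, group them by the pathways of their target genes, then test each pathway for enrichment) can be illustrated with a minimal Python sketch. This is not the loci2path implementation, which is an R/Bioconductor package. The function names, the tuple layouts and the choice of a one-sided hypergeometric test are all assumptions made here for illustration; the abstract does not state which statistic the package uses.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    top = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1)) / comb(N, n)


def eqtls_in_intervals(eqtls, intervals):
    """Keep eQTLs whose position lies inside any query interval (inclusive bounds).

    eqtls:     list of (chrom, position, target_gene)   -- hypothetical layout
    intervals: list of (chrom, start, end)              -- hypothetical layout
    """
    return [
        e for e in eqtls
        if any(c == e[0] and s <= e[1] <= t for c, s, t in intervals)
    ]


def pathway_enrichment(eqtls, intervals, pathways):
    """Score each pathway by how many of its eQTLs land in the query intervals.

    pathways: dict mapping pathway name -> set of gene symbols.
    Returns {pathway: (hits_in_query, p_value)} for pathways with at least one hit.
    Assumption: eQTLs are treated as independent draws, which ignores linkage
    disequilibrium and is only acceptable for a toy illustration.
    """
    hits = eqtls_in_intervals(eqtls, intervals)
    N, n = len(eqtls), len(hits)
    results = {}
    for name, genes in pathways.items():
        K = sum(1 for e in eqtls if e[2] in genes)  # pathway eQTLs genome-wide
        k = sum(1 for e in hits if e[2] in genes)   # pathway eQTLs in the query
        if k > 0:
            results[name] = (k, hypergeom_upper_tail(k, N, K, n))
    return results
```

Because the link from locus to gene comes from the eQTL record rather than from genomic distance, a tissue-specific analysis in this sketch would amount to calling `pathway_enrichment` once per tissue with that tissue's eQTL list.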