ABSTRACT
Recognizing cyanobacterial promoters on a genome-wide scale is a vital step. Computational methods are a promising aid for this difficult biological identification task. When building recognition models, these methods rely on generated non-promoters to cope with the lack of real non-promoters. Nevertheless, the artificial, exaggerated difference between promoters and generated non-promoters leads to over-optimistic predictions. Moreover, because existing methods were designed for E. coli or B. subtilis, they cannot uncover novel, distinct motifs among cyanobacterial promoters. To address these issues, this work first proposes a novel non-promoter generation strategy called phantom sampling, which can eliminate the artificial difference between promoters and generated non-promoters. Furthermore, it develops a novel promoter prediction model based on the Siamese network (SiamProm), which can amplify the hidden difference between promoters and non-promoters through a joint characterization of global associations, upstream and downstream contexts, and neighboring associations with respect to k-mer tokens. Comparison with state-of-the-art methods demonstrates the superiority of phantom sampling and SiamProm. Comprehensive ablation studies and feature space illustrations also validate the effectiveness of the Siamese network and its components. More importantly, SiamProm, built upon phantom sampling, finds a novel cyanobacterial promoter motif ('GCGATCGC'), which is palindrome-patterned, content-conserved, but position-shifted.
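To make the terms "k-mer tokens" and "palindrome-patterned" concrete, here is a minimal Python sketch. It is not the SiamProm code: the function names are hypothetical, and the example sequence is invented for illustration. It splits a sequence into overlapping k-mers, checks whether a motif equals its own reverse complement (the usual sense of a DNA palindrome), and lists the positions at which a motif occurs.

```python
# Illustrative helpers only; names and the example sequence are assumptions,
# not part of SiamProm or phantom sampling.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_tokens(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(seq):
    """A DNA palindrome reads the same on both strands (5'->3')."""
    return seq.upper() == reverse_complement(seq)


def motif_positions(seq, motif):
    """Return 0-based start positions of every occurrence of motif in seq."""
    seq, motif = seq.upper(), motif.upper()
    width = len(motif)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == motif]


if __name__ == "__main__":
    print(kmer_tokens("ACGTA", 3))
    print(is_palindromic("GCGATCGC"))
    print(motif_positions("TTGCGATCGCAA", "GCGATCGC"))
```

A "position-shifted" motif in this sense would show up as different start positions across promoters while the motif content stays the same.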
Subjects
Cyanobacteria, Genetic Promoter Regions, Cyanobacteria/genetics, Computational Biology/methods, Algorithms
ABSTRACT
MOTIVATION: Single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) is a valuable resource for learning cis-regulatory elements such as cell-type-specific enhancers and transcription factor binding sites. However, cell-type identification from scATAC-seq data is known to be challenging because of the heterogeneity arising from different protocols and the high dropout rate. RESULTS: In this study, we perform a systematic comparison of seven scATAC-seq datasets of mouse brain to benchmark the efficacy of neuronal cell-type annotation from gene sets. We find that redundant marker genes dramatically improve the annotation of sparse scATAC-seq data collected from different studies. Interestingly, simple aggregation of such marker genes achieves performance comparable to or higher than that of machine-learning classifiers, suggesting its potential for downstream applications. Based on our results, we reannotated all scATAC-seq data for detailed cell types using robust marker genes. Their meta scATAC-seq profiles are publicly available at https://gillisweb.cshl.edu/Meta_scATAC. Furthermore, we trained a deep neural network to predict chromatin accessibility from DNA sequence alone and identified key motifs enriched for each neuronal subtype. Those predicted profiles are visualized together in our database as a valuable resource to explore cell-type-specific epigenetic regulation in a sequence-dependent and -independent manner.
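The idea of "simple aggregation of marker genes" can be sketched as follows. This is an assumption-laden illustration rather than the authors' pipeline: the abstract does not give the aggregation rule, so the mean gene-activity score per marker set is used here, and the marker lists, cell names and scores are invented for the example.

```python
# Hypothetical sketch: annotate each cell with the cell type whose marker set
# has the highest mean gene-activity score. The aggregation rule (mean) and all
# data below are assumptions for illustration.

def annotate_cells(gene_activity, marker_sets):
    """gene_activity: {cell: {gene: score}}; marker_sets: {cell_type: [genes]}.

    Genes missing from a cell (dropout) count as 0. Marker sets are assumed
    to be non-empty.
    """
    labels = {}
    for cell, scores in gene_activity.items():
        mean_score = {
            cell_type: sum(scores.get(gene, 0.0) for gene in genes) / len(genes)
            for cell_type, genes in marker_sets.items()
        }
        labels[cell] = max(mean_score, key=mean_score.get)
    return labels


marker_sets = {
    "excitatory": ["Slc17a7", "Satb2", "Neurod6"],
    "inhibitory": ["Gad1", "Gad2", "Slc32a1"],
}
gene_activity = {
    # Slc17a7 has dropped out in cell_1, but the other markers still vote.
    "cell_1": {"Slc17a7": 0.0, "Satb2": 2.0, "Neurod6": 1.0, "Gad1": 0.0},
    "cell_2": {"Gad1": 1.0, "Gad2": 0.0, "Slc32a1": 3.0, "Satb2": 1.0},
}

if __name__ == "__main__":
    print(annotate_cells(gene_activity, marker_sets))
```

The toy data show why redundancy helps with sparse data: a single dropped-out marker does not change the label as long as other markers in the same set are detected.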
Subjects
Chromatin, Genetic Epigenesis, Animals, Mice, Chromatin/genetics, Nucleic Acid Regulatory Sequences, Computer Neural Networks
ABSTRACT
Without general adaptive immunity, invertebrates evolved a vast number of heterogeneous non-self recognition strategies. One of those well-known adaptations is the expansion of the immune receptor gene superfamily coding for scavenger receptor cysteine-rich domain-containing proteins (SRCR) in a few invertebrates. Here, we investigated the evolutionary history of the SRCR gene superfamily (SRCR-SF) across 29 metazoan species with an emphasis on invertebrates. We analyzed their domain architectures, genome locations and phylogenetic distribution. Our analysis shows extensive genome-wide duplications of the SRCR-SFs in Amphimedon queenslandica and Strongylocentrotus purpuratus. Further molecular evolution study reveals various patterns of conserved cysteines in the sponge and sea urchin SRCR-SFs, indicating independent and convergent evolution of SRCR-SF expansion during invertebrate evolution. In the case of the sponge SRCR-SFs, a novel motif with seven conserved cysteines was identified. Exon-intron structure analysis suggests the rapid evolution of SRCR-SFs during gene duplications in both the sponge and the sea urchin. Our findings across nine representative metazoans also underscore a heightened expression of SRCR-SFs in immune-related tissues, notably the digestive glands. This observation indicates the potential role of SRCR-SFs in reinforcing distinct immune functions in these invertebrates. Collectively, our results reveal that gene duplication, motif structure variation, and exon-intron divergence might lead to the convergent evolution of SRCR-SF expansions in the genomes of the sponge and sea urchin. Our study also suggests that the utilization of SRCR-SF receptor duplication may be a general and basal strategy to increase immune diversity and tissue specificity for the invertebrates.
Subjects
Invertebrates, Immunologic Receptors, Animals, Scavenger Receptors/genetics, Phylogeny, Immunologic Receptors/genetics, Invertebrates/genetics, Sea Urchins/genetics, Molecular Evolution
ABSTRACT
BACKGROUND: Allele-specific binding (ASB) events occur when transcription factors (TFs) bind more favorably to one of the two parental alleles at heterozygous single nucleotide polymorphisms (SNPs). Evidence suggests that ASB events could reveal the impact of sequence variations on TF binding and may have implications for the risk of diseases. RESULTS: Here we present ASB-analyzer, a software platform that enables the users to quickly and efficiently input raw sequencing data to generate individual reports containing the cytogenetic map of ASB SNPs and their associated phenotypes. This interactive tool thereby combines ASB SNP identification, biological annotation, motif analysis, phenotype associations and report summary in one pipeline. With this pipeline, we identified 3772 ASB SNPs from thirty GM12878 ChIP-seq datasets and demonstrated that the ASB SNPs were more likely to be enriched at important sites in TF-binding domains. CONCLUSIONS: ASB-analyzer is a user-friendly tool that enables the detection, characterization and visualization of ASB SNPs. It is implemented in Python, R and bash shell and packaged in the Conda environment. It is available as an open-source tool on GitHub at https://github.com/Liying1996/ASBanalyzer .
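As a rough illustration of what "binding more favorably to one allele" means in read-count terms, the sketch below tests whether the reads covering a heterozygous SNP are split unevenly between the two alleles. This is an assumption: the abstract does not state which statistical test ASB-analyzer uses, and the exact binomial test, the function names and the 0.05 threshold here are illustrative choices only.

```python
# Hypothetical sketch: exact two-sided binomial test for allelic imbalance.
# Not the ASB-analyzer implementation; test choice and threshold are assumptions.
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: sum of all outcomes no more likely
    than the observed count k out of n trials."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so that floating-point ties are included.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))


def is_allele_specific(ref_reads, alt_reads, alpha=0.05):
    """Flag a heterozygous SNP whose ChIP-seq reads deviate from a 50:50 split."""
    total = ref_reads + alt_reads
    if total == 0:
        return False
    return binomial_two_sided_p(ref_reads, total) < alpha


if __name__ == "__main__":
    print(is_allele_specific(30, 5))
    print(is_allele_specific(10, 12))
```

A real analysis would also need to correct for multiple testing across SNPs and for reference-mapping bias, neither of which is handled in this sketch.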
Subjects
Single Nucleotide Polymorphism, Transcription Factors, Alleles, Transcription Factors/genetics, Transcription Factors/metabolism, Software, Protein Binding, Binding Sites
ABSTRACT
BACKGROUND: Massive amounts of data are produced by combining next-generation sequencing with complex biochemistry techniques to characterize regulatory genomics profiles, such as protein-DNA interaction and chromatin accessibility. Interpretation of such high-throughput data typically requires different computational methods. However, existing tools are usually developed for a specific task, which makes it challenging to analyze the data in an integrative manner. RESULTS: We here describe the Regulatory Genomics Toolbox (RGT), a computational library for the integrative analysis of regulatory genomics data. RGT provides different functionalities to handle genomic signals and regions. Based on that, we developed several tools to perform distinct downstream analyses, including the prediction of transcription factor binding sites using ATAC-seq data, identification of differential peaks from ChIP-seq data, detection of triple helix-mediated RNA and DNA interactions, visualization, and finding associations between distinct regulatory factors. CONCLUSION: We present here RGT, a framework to facilitate the customization of computational methods to analyze genomic data for specific regulatory genomics problems. RGT is a comprehensive and flexible Python package for analyzing high-throughput regulatory genomics data and is available at: https://github.com/CostaLab/reg-gen . The documentation is available at: https://reg-gen.readthedocs.io.
Subjects
Chromatin, Genomics, Chromatin Immunoprecipitation Sequencing, Documentation, Gene Library
ABSTRACT
Discovery of target-binding molecules, such as aptamers and peptides, is usually performed with the use of high-throughput experimental screening methods. These methods typically generate large datasets of sequences of target-binding molecules, which can be enriched with high affinity binders. However, the identification of the highest affinity binders from these large datasets often requires additional low-throughput experiments or other approaches. Bioinformatics-based analyses could be helpful to better understand these large datasets and identify the parts of the sequence space enriched with high affinity binders. BinderSpace is an open-source Python package that performs motif analysis, sequence space visualization, clustering analyses, and sequence extraction from clusters of interest. The motif analysis, resulting in text-based and visual output of motifs, can also provide heat maps of previously measured user-defined functional properties for all the motif-containing molecules. Users can also run principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) analyses on whole datasets and on motif-related subsets of the data. Functionally important sequences can also be highlighted in the resulting PCA and t-SNE maps. If points (sequences) in two-dimensional maps in PCA or t-SNE space form clusters, users can perform clustering analyses on their data, and extract sequences from clusters of interest. We demonstrate the use of BinderSpace on a dataset of oligonucleotides binding to single-wall carbon nanotubes in the presence and absence of a bioanalyte, and on a dataset of cyclic peptidomimetics binding to bovine carbonic anhydrase protein. BinderSpace is openly accessible to the public via the GitHub website: https://github.com/vukoviclab/BinderSpace.
Subjects
Carbon Nanotubes, Oligonucleotides, Animals, Cattle, Peptides, Computational Biology, Sequence Analysis, Algorithms
ABSTRACT
Primary hepatocytes are widely used in the pharmaceutical industry to screen drug candidates for hepatotoxicity, but hepatocytes quickly dedifferentiate and lose their mature metabolic function in culture. Attempts have been made to better recapitulate the in vivo liver environment in culture, but the full spectrum of signals required to maintain hepatocyte function ex vivo remains elusive. To elucidate molecular changes that accompany, and may contribute to dedifferentiation of hepatocytes ex vivo, we performed lineage tracing and comprehensive profiling of alterations in their gene expression profiles and chromatin landscape during culture. First, using genetically tagged hepatocytes we demonstrate that expression of the fetal gene alpha-fetoprotein in cultured hepatocytes comes from cells that previously expressed the mature gene albumin, and not from a population of albumin-negative precursor cells, proving mature hepatocytes undergo true dedifferentiation in culture. Next we studied the dedifferentiation process in detail through bulk RNA-sequencing of hepatocytes cultured over an extended period. We identified three distinct phases of dedifferentiation: an early phase, where mature hepatocyte genes are rapidly downregulated in a matter of hours; a middle phase, where fetal genes are activated; and a late phase, where initially rare contaminating non-parenchymal cells proliferate, taking over the culture. Lastly, to better understand the signaling events that result in the rapid downregulation of mature genes in hepatocytes, we examined changes in chromatin accessibility in these cells during the first 24 h of culture using Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq). We find that drastic and rapid changes in chromatin accessibility occur immediately upon the start of culture. 
Using binding motif analysis of the areas of open chromatin sharing similar temporal profiles, we identify several candidate transcription factors potentially involved in the dedifferentiation of primary hepatocytes in culture.
Subjects
Hepatocytes, Liver, Cultured Cells, Hepatocytes/metabolism, Albumins, Chromatin/genetics
ABSTRACT
Myo-inositol oxygenase (MIOX), the only catabolic enzyme of the inositol pathway, catalyzes the conversion of myo-inositol to D-GlcA (glucuronic acid). The present study encompasses bioinformatic analysis of the MIOX gene across phylogenetically related plant lineages and representative animal groups. Comparative motif analysis of the MIOX gene(s) across various plant groups suggested the existence of abiotic stress-related cis-acting elements such as DRE, MYB, MYC, STRE and MeJA, among others. A detailed analysis revealed a single isoform of the MIOX gene, located on chromosome 6 of indica rice (Oryza sativa), with an open reading frame of 938 bp coding for 308 amino acids and producing a protein of ~35 kDa. Secondary structure prediction of the protein gave a predicted 144 alpha helices and 154 random coils. The three-dimensional structure suggested it to be a monomeric protein with a single domain. Bacterial overexpression of the protein, purification and enzyme assay showed optimal catalytic activity at pH 7.5-8 and an optimal temperature of 37 °C, with a Michaelis constant of 40.92 mM. The range of Km was determined as 22.74-28.7 mM and the range of Vmax was calculated as 3.51-3.6 µM/min. Four salt-tolerant and salt-sensitive rice cultivars displayed differential gene expression of OsMIOX at different time points in different tissues under salinity and drought stress, as observed from qRT-PCR data, microarray results and the protein expression profile in immunoblot analysis. Gel volumetric analysis confirmed very high expression of MIOX in roots and leaves on the 7th day following germination. Microarray data showed high expression of MIOX at all developmental stages, including seedling growth and reproduction. These data suggest that OsMIOX might have a role to play in rice abiotic stress responses mediated through the myo-inositol oxidation pathway. Supplementary Information: The online version contains supplementary material available at 10.1007/s12298-023-01340-6.
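For readers unfamiliar with the kinetic parameters quoted above, the standard Michaelis-Menten rate law, v = Vmax * [S] / (Km + [S]), relates them to the reaction rate. The sketch below is only a worked illustration of that textbook formula: the function name is hypothetical, and the Km and Vmax values are arbitrary picks from inside the reported ranges, not fitted values from the study.

```python
# Textbook Michaelis-Menten rate law; parameter values are illustrative
# assumptions chosen from within the ranges quoted in the abstract.

def michaelis_menten_rate(substrate_mM, vmax_uM_per_min, km_mM):
    """Return the initial reaction rate (same units as Vmax)."""
    if substrate_mM < 0 or km_mM <= 0:
        raise ValueError("substrate must be >= 0 and Km must be > 0")
    return vmax_uM_per_min * substrate_mM / (km_mM + substrate_mM)


KM_MM = 25.0             # assumed, within the reported 22.74-28.7 mM range
VMAX_UM_PER_MIN = 3.55   # assumed, within the reported 3.51-3.6 uM/min range

if __name__ == "__main__":
    for s in (5.0, 25.0, 100.0):
        rate = michaelis_menten_rate(s, VMAX_UM_PER_MIN, KM_MM)
        print(f"[S] = {s:6.1f} mM -> v = {rate:.3f} uM/min")
```

By construction, the rate equals half of Vmax when the substrate concentration equals Km, which is the usual interpretation of the Michaelis constant.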
ABSTRACT
BACKGROUND: Colorectal cancer (CRC) is one of the leading causes of cancer-related deaths worldwide. Recent studies have observed causative mutations in susceptible genes related to colorectal cancer in 10 to 15% of the patients. This highlights the importance of identifying mutations for early detection of this cancer for more effective treatments among high risk individuals. Mutation is considered as the key point in cancer research. Many studies have performed cancer subtyping based on the type of frequently mutated genes, or the proportion of mutational processes. However, to the best of our knowledge, combination of these features has never been used together for this task. This highlights the potential to introduce better and more inclusive subtype classification approaches using wider range of related features to enable biomarker discovery and thus inform drug development for CRC. RESULTS: In this study, we develop a new pipeline based on a novel concept called 'gene-motif', which merges mutated gene information with tri-nucleotide motif of mutated sites, for colorectal cancer subtype identification. We apply our pipeline to the International Cancer Genome Consortium (ICGC) CRC samples and identify, for the first time, 3131 gene-motif combinations that are significantly mutated in 536 ICGC colorectal cancer samples. Using these features, we identify seven CRC subtypes with distinguishable phenotypes and biomarkers, including unique cancer related signaling pathways, in which for most of them targeted treatment options are currently available. Interestingly, we also identify several genes that are mutated in multiple subtypes but with unique sequence contexts. CONCLUSION: Our results highlight the importance of considering both the mutation type and mutated genes in identification of cancer subtypes and cancer biomarkers. 
The new CRC subtypes presented in this study demonstrate distinct phenotypic properties, which can be effectively used to develop new treatments. By knowing the genes and phenotypes associated with the subtypes, a personalized treatment plan can be developed that considers the specific phenotypes associated with each patient's genomic lesions.
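The 'gene-motif' concept described in the RESULTS, which merges the mutated gene with the tri-nucleotide context of the mutated site, can be sketched as a simple feature-building step. The label format, function names and mutation records below are hypothetical; the abstract does not specify the encoding or the significance testing used in the actual pipeline.

```python
# Hypothetical encoding of a 'gene-motif' feature: gene name plus the
# tri-nucleotide context of the substitution, e.g. "KRAS_A[C>T]G".
# Format and example mutations are assumptions for illustration.
from collections import Counter


def gene_motif(gene, ref_context, alt_base):
    """ref_context is the 3-base reference sequence centred on the mutated base."""
    ref_context, alt_base = ref_context.upper(), alt_base.upper()
    if len(ref_context) != 3 or len(alt_base) != 1:
        raise ValueError("expected a 3-base context and a 1-base alternate allele")
    five_prime, ref_base, three_prime = ref_context
    if ref_base == alt_base:
        raise ValueError("alternate base equals reference base")
    return f"{gene}_{five_prime}[{ref_base}>{alt_base}]{three_prime}"


def count_gene_motifs(mutations):
    """mutations: iterable of (gene, ref_context, alt_base) tuples for one sample."""
    return Counter(gene_motif(*mutation) for mutation in mutations)


sample_mutations = [
    ("KRAS", "ACG", "T"),
    ("KRAS", "ACG", "T"),
    ("KRAS", "TCA", "G"),
    ("APC", "ACG", "T"),
]

if __name__ == "__main__":
    print(count_gene_motifs(sample_mutations))
```

Under this encoding, the same gene mutated in two different sequence contexts yields two different features, which is how a gene could appear in several subtypes "with unique sequence contexts". Strand normalization of the context, common in mutational-signature work, is deliberately left out of this sketch.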
Subjects
Colorectal Neoplasms, Tumor Biomarkers/genetics, Colorectal Neoplasms/genetics, Colorectal Neoplasms/pathology, Genomics, Humans, Mutation, Phenotype
ABSTRACT
BACKGROUND: Genome-wide protein-DNA binding is popularly assessed using specific antibody pulldown in Chromatin Immunoprecipitation Sequencing (ChIP-Seq) or Cleavage Under Targets and Release Using Nuclease (CUT&RUN) sequencing experiments. These technologies generate high-throughput sequencing data that necessitate the use of multiple sophisticated, computationally intensive genomic tools to make discoveries, but these genomic tools often have a high barrier to use because of computational resource constraints. RESULTS: We present a comprehensive, infrastructure-independent, computational pipeline called SEAseq, which leverages field-standard, open-source tools for processing and analyzing ChIP-Seq/CUT&RUN data. SEAseq performs extensive analyses from the raw output of the experiment, including alignment, peak calling, motif analysis, promoter and metagene coverage profiling, peak annotation distribution, clustered/stitched peak (e.g. super-enhancer) identification, and multiple relevant quality assessment metrics, as well as automatic interfacing with data in GEO/SRA. SEAseq provides a rapid and cost-effective resource for the analysis of both new and publicly available datasets, as demonstrated in our comparative case studies. CONCLUSIONS: The easy-to-use and versatile design of SEAseq makes it a reliable and efficient resource for ensuring high-quality analysis. Its cloud implementation enables a broad suite of analyses in environments with constrained computational resources. SEAseq is platform-independent and is intended to be usable by everyone, with or without programming skills. It is available on the cloud at https://platform.stjude.cloud/workflows/seaseq and can be locally installed from the repository at https://github.com/stjude/seaseq .
Subjects
Chromatin, Software, Chromatin Immunoprecipitation, Chromatin Immunoprecipitation Sequencing, Cloud Computing, High-Throughput Nucleotide Sequencing
ABSTRACT
G2-like (GLK) transcription factors contribute significantly and extensively to regulating chloroplast growth and development in plants. This study covered the genome-wide identification, phylogenetic relationships, conserved motifs, promoter cis-elements, collinearity (MCScanX), divergence times, and expression profile analysis of PeGLK genes in moso bamboo (Phyllostachys edulis). Overall, 78 putative PeGLKs (PeGLK1-PeGLK78) were identified and divided into 13 distinct subfamilies. Each subfamily contains members displaying similar gene structure and motif composition. Synteny analysis found 42 orthologous pairs and highly conserved microsynteny between regions of GLK genes across moso bamboo and maize. Furthermore, an analysis of the divergence times indicated that PeGLK genes underwent a duplication event around 15 million years ago (MYA) and that PeGLK and ZmGLK diverged around 38 MYA. Tissue-specific expression analysis showed that PeGLK genes presented distinct expression profiles in various tissues, and many members were highly expressed in leaves. Additionally, several PeGLKs were significantly up-regulated under cold stress, osmotic stress, and MeJA and GA treatment, implying that they are likely to affect abiotic stress and phytohormone responses in plants. The results of this study provide a comprehensive understanding of the moso bamboo GLK gene family and help elucidate the potential functions of PeGLK genes.
Subjects
Plant Gene Expression Regulation, Plant Proteins, Phylogeny, Plant Proteins/genetics, Plant Proteins/metabolism, Poaceae/genetics, Poaceae/metabolism, Transcription Factors/genetics, Transcription Factors/metabolism
ABSTRACT
Abiotic stress resistance traits may be especially crucial for sustainable production of bioenergy tree crops. Here, we show the performance of a set of rationally designed osmotic-related and salt stress-inducible synthetic promoters for use in hybrid poplar. De novo motif-detecting algorithms yielded 30 water-deficit (SD) and 34 salt stress (SS) candidate DNA motifs from relevant poplar transcriptomes. We selected three conserved water-deficit stress motifs (SD18, SD13 and SD9) found in 16 co-expressed gene promoters, and we discovered a well-conserved motif for salt response (SS16). We characterized several native poplar stress-inducible promoters to enable comparisons with our synthetic promoters. Fifteen synthetic promoters were designed using various SD and SS subdomains, in which heptameric repeats of five-to-eight subdomain bases were fused to a common core promoter downstream, which, in turn, drove a green fluorescent protein (GFP) gene for reporter assays. These 15 synthetic promoters were screened by transient expression assays in poplar leaf mesophyll protoplasts and agroinfiltrated Nicotiana benthamiana leaves under osmotic stress conditions. Twelve synthetic promoters were induced in transient expression assays with a GFP readout. Of these, five promoters (SD18-1, SD9-2, SS16-1, SS16-2 and SS16-3) showed higher inducibility under osmotic stress conditions than native promoters. These five synthetic promoters were stably transformed into Arabidopsis thaliana to study inducibility in whole plants. In these plants, SD18-1 and SD9-2 were induced by water-deficit stress, whereas SS16-1, SS16-2 and SS16-3 were induced by salt stress. The synthetic biology design pipeline resulted in five synthetic promoters that outperformed endogenous promoters in transgenic plants.
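The promoter architecture described above (seven tandem copies of a five-to-eight-base subdomain placed upstream of a common core promoter) can be written down as a short assembly function. This is a sketch under stated assumptions: the sequences are invented placeholders, the function name is hypothetical, and any spacers or cloning sites used in the real constructs are not described in the abstract and are therefore omitted.

```python
# Hypothetical assembly of a synthetic promoter: heptameric repeat of a short
# subdomain followed by a core promoter. All sequences are placeholders.

def build_synthetic_promoter(subdomain, core_promoter, repeats=7):
    """Return repeats x subdomain immediately upstream of core_promoter."""
    subdomain = subdomain.upper()
    core_promoter = core_promoter.upper()
    if not 5 <= len(subdomain) <= 8:
        raise ValueError("subdomain must be five to eight bases long")
    if set(subdomain + core_promoter) - set("ACGT"):
        raise ValueError("sequences may contain only A, C, G and T")
    return subdomain * repeats + core_promoter


PLACEHOLDER_SUBDOMAIN = "ACGTGA"   # invented 6-base motif fragment
PLACEHOLDER_CORE = "TATAAATAGG"    # invented stand-in for the core promoter

if __name__ == "__main__":
    promoter = build_synthetic_promoter(PLACEHOLDER_SUBDOMAIN, PLACEHOLDER_CORE)
    print(promoter, len(promoter))
```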
Subjects
Plant Gene Expression Regulation, Plant Proteins, Plant Gene Expression Regulation/genetics, Plant Proteins/genetics, Plant Proteins/metabolism, Genetically Modified Plants/genetics, Genetically Modified Plants/metabolism, Genetic Promoter Regions/genetics, Physiological Stress/genetics
ABSTRACT
With the increase in CO2 emissions worldwide and its dire effects, there is a need to reduce CO2 concentrations in the atmosphere. Alpha-carbonic anhydrases (α-CAs) have been identified as suitable sequestration agents. This study reports the sequence and structural analysis of 15 α-CAs from bacteria, originating from hydrothermal vent systems. Structural analysis of the multimers enabled the identification of hotspot and interface residues. Molecular dynamics simulations of the homo-multimers were performed at 300 K, 363 K, 393 K and 423 K to unearth potentially thermostable α-CAs. Average betweenness centrality (BC) calculations confirmed the relevance of some hotspot and interface residues. The key residues responsible for dimer thermostability were identified by comparing fluctuating interfaces with stable ones, and were part of conserved motifs. Crucial long-lived hydrogen bond networks were observed around residues with high BC values. Dynamic cross correlation fortified the relevance of oligomerization of these proteins, thus the importance of simulating them in their multimeric forms. A consensus of the simulation analyses used in this study suggested high thermostability for the α-CA from Nitratiruptor tergarcus. Overall, our novel findings enhance the potential of biotechnology applications through the discovery of alternative thermostable CO2 sequestration agents and their potential protein design.
Subjects
Bacteria/enzymology, Bacterial Proteins/metabolism, Carbon Dioxide/isolation & purification, Carbonic Anhydrases/chemistry, Carbonic Anhydrases/metabolism, Hydrothermal Vents/microbiology, Sequestering Agents/metabolism, Amino Acid Sequence, Carbon Dioxide/metabolism, Computer Simulation, Molecular Dynamics Simulation, Sequence Homology
ABSTRACT
BACKGROUND: Treatment of parasitic diseases has been challenging due to the evolution of drug-resistant parasites, and thus there is a need to identify new classes of drugs and drug targets. Protein translation is important for the survival of the malaria parasite, Plasmodium, and the pathway is present in all of its life cycle stages. Aminoacyl tRNA synthetases (aaRSs) are primary enzymes in protein translation, as they catalyse amino acid addition to the cognate tRNA. This study sought to understand differences between Plasmodium and human aminoacyl tRNA synthetases through bioinformatics analysis. METHODS: Plasmodium berghei, Plasmodium falciparum, Plasmodium fragile, Plasmodium knowlesi, Plasmodium malariae, Plasmodium ovale, Plasmodium vivax, Plasmodium yoelii and human aminoacyl tRNA synthetase sequences were retrieved from the UniProt database and grouped into 20 families based on amino acid specificity. These families were further divided into two classes. Both families and classes were analysed. Motif discovery was carried out using the MEME software, sequence identity calculation was done using an in-house Python script, multiple sequence alignments were performed using the PROMALS3D and TCOFFEE tools, and phylogenetic tree calculations were performed using MEGA version 7.0. Possible alternative binding sites were predicted using the FTMap webserver and the SiteMap tool. RESULTS: Motif discovery revealed Plasmodium-specific motifs, while phylogenetic tree calculations showed that Plasmodium proteins have a different evolutionary history from the human homologues. Human aaRS sequences showed low sequence identity (below 40%) compared to Plasmodium sequences. Prediction of alternative binding sites revealed potential druggable sites in PfArgRS, PfMetRS and PfProRS at regions that are weakly conserved when compared to the human homologues.
Multiple sequence analysis, motif discovery, pairwise sequence identity calculations and phylogenetic tree analysis showed significant differences between parasite and human aaRS proteins despite functional and structural conservation. These differences may provide a basis for further exploration of Plasmodium aminoacyl tRNA synthetases as potential drug targets. CONCLUSION: This study showed that, despite functional and structural conservation, Plasmodium aaRSs have key differences from the human homologues. These differences in Plasmodium aaRSs can be targeted to develop anti-malarial drugs with less toxicity to the host.
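The METHODS mention an in-house Python script for sequence identity, which is not reproduced in the abstract. A minimal sketch of how pairwise percent identity is commonly computed from two already-aligned sequences is given below. The function name, the choice to ignore gapped columns in the denominator, and the toy sequences are all assumptions and may differ from the authors' script.

```python
# Hypothetical sketch of pairwise percent identity from two rows of an
# alignment. Convention assumed here: columns with a gap in either sequence
# are excluded from both the match count and the denominator.

def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return percent identity over ungapped aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a.upper(), aligned_b.upper()):
        if residue_a == gap or residue_b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if residue_a == residue_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Toy aligned fragments, not real aaRS sequences.
    print(percent_identity("MKV-LA", "MRVQLA"))
```

Other denominators are in use (alignment length, or the length of the shorter sequence), so identity values from different tools are only comparable when the convention matches.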
Subjects
Aminoacyl-tRNA Synthetases/genetics, Antimalarials/pharmacology, Plasmodium/genetics, Protozoan Proteins/genetics, Amino Acid Sequence, Aminoacyl-tRNA Synthetases/antagonists & inhibitors, Aminoacyl-tRNA Synthetases/chemistry, Aminoacyl-tRNA Synthetases/metabolism, Computational Biology, Humans, Phylogeny, Plasmodium/drug effects, Plasmodium/enzymology, Protozoan Proteins/antagonists & inhibitors, Protozoan Proteins/chemistry, Protozoan Proteins/metabolism, Sequence Alignment
ABSTRACT
NAC transcription factors (TFs) are one of the largest and most important TF families involved in the regulation of plant growth and development. They are characterized by a highly conserved N-terminal domain and a variable C-terminal domain. In the present study, the amino acid sequences of NAC TFs from four embryophytic plant species, viz. Arabidopsis thaliana (Angiosperm), Picea abies (Gymnosperm), Selaginella moellendorffii (Pteridophyte) and Physcomitrella patens (Bryophyte), chosen as references for the different plant groups, were collected from the Plant Transcription Factor Database (PTFD) and their phylogenetic relationships were evaluated. The phylogenetic tree revealed that the majority of the NAC members were interspersed among the major subgroups, indicating that the expansion of the NAC members predates the speciation events. Thirty-one (31), five (5), one (1) and ten (10) paralog pairs were determined for Arabidopsis, Picea, Selaginella and Physcomitrella, respectively. The structure-function relationships of the paralog pairs were inferred from the phylogenetic tree of the combined set of paralogous gene pairs by studying the prevalence of flanking regions and by motif analysis of the NAC proteins. The motif analysis revealed the presence of an N-terminal conserved domain, a characteristic of the majority of NAC family proteins. Conserved motifs in the C-terminal region were absent in the majority of the protein sequences, except for a few members in Arabidopsis and Physcomitrella. The times of gene duplication of the paralog pairs were also calculated, revealing that the duplication events occurred between 4.48 and 45.94 MYA in Arabidopsis, 167.57-532.86 MYA in Picea, and 29.12-53.53 MYA in Physcomitrella.
ABSTRACT
Leucine-rich repeat receptor-like kinases (LRR RLKs) form a large family of plant signaling proteins consisting of an extracellular domain connected by a single-pass transmembrane sequence to a cytoplasmic kinase domain. Autophosphorylation on specific Ser and/or Thr residues in the cytoplasmic domain is often critical for the activation of several LRR RLK family members with proven functional roles in plant growth regulation, morphogenesis, disease resistance, and stress responses. While identification and functional characterization of in vivo phosphorylation sites is ultimately required for a full understanding of LRR RLK biology and function, bacterial expression of recombinant LRR RLK cytoplasmic catalytic domains for identification of in vitro autophosphorylation sites provides a useful resource for further targeted identification and functional analysis of in vivo sites. In this study we employed high-throughput cloning and a variety of mass spectrometry approaches to generate an autophosphorylation site database representative of more than 30% of the approximately 223 LRR RLKs in Arabidopsis thaliana. We used His-tagged constructs of complete cytoplasmic domains to identify a total of 592 phosphorylation events across 73 LRR RLKs, with 497 sites uniquely assigned to specific Ser (268 sites) or Thr (229 sites) residues in 68 LRR RLKs. Multiple autophosphorylation sites per LRR RLK were the norm, with an average of seven sites per cytoplasmic domain, while some proteins showed more than 20 unique autophosphorylation sites. The database was used to analyze trends in the localization of phosphorylation sites across cytoplasmic kinase subdomains and to derive a statistically significant sequence motif for phospho-Ser autophosphorylation.
Subjects
Arabidopsis Proteins/metabolism, Factual Databases, Protein Kinases/metabolism, Amino Acid Motifs, Amino Acid Sequence, Arabidopsis/genetics, Arabidopsis/metabolism, Arabidopsis Proteins/genetics, Cytoplasm/metabolism, Escherichia coli/genetics, Molecular Sequence Data, Phosphorylation, Protein Kinases/genetics, Protein Serine-Threonine Kinases/genetics, Protein Serine-Threonine Kinases/metabolism, Tertiary Protein Structure
ABSTRACT
BACKGROUND: Subtilisin-like serine proteases, or Subtilases, in fungi are important for penetration and colonization of the host. In Hypocreales, these proteins share several properties with other fungal, bacterial, plant and mammalian homologs. However, the adoption of specific roles in entomopathogenesis may be governed by the attainment of unique biochemical and structural features during the course of evolution. Due to such functional shifts, Subtilases coded by different family members of Hypocreales acquire distinct features according to their respective hosts and lifestyles. We conducted phylogenetic and DIVERGE analyses and identified important protein residues that putatively assign functional specificity to Subtilases in fungal families/species under the order Hypocreales. RESULTS: A total of 161 Subtilases coded by 10 species from five different families under the fungal order Hypocreales were included in the analysis. Based on the presence of conserved domains, the Subtilase genes were divided into three subfamilies: Subtilisin (S08.005), Proteinase K (S08.054) and Serine-carboxyl peptidases (S53.001). These subfamilies were investigated for phylogenetic associations, protein residues under positive selection and functional divergence among paralogous clades. The observations were correlated with the lifestyles of the fungal families/species. Phylogenetic and divergence analyses of the Subtilisin (S08.005) and Proteinase K (S08.054) families of proteins revealed that the paralogous clades were a clear-cut representation of the familial origin of the protein sequences. We observed divergence between the paralogous clades of plant-pathogenic fungi (Nectriaceae), insect-pathogenic fungi (Cordycipitaceae/Clavicipitaceae) and nematophagous fungi (Ophiocordycipitaceae).
In addition, Subtilase genes from the nematode-parasitic fungus Purpureocillium lilacinum formed a unique cluster, suggesting that the fungus may have developed distinctive mechanisms for nematode pathogenesis. Our evolutionary genetics analysis revealed evidence of positive selection on the Subtilisin (S08.005) and Proteinase K (S08.054) protein sequences of the entomopathogenic and nematophagous species belonging to the Cordycipitaceae, Clavicipitaceae and Ophiocordycipitaceae families of Hypocreales. CONCLUSIONS: Our study provides new insights into the evolution of Subtilisin-like serine proteases in Hypocreales, a fungal order consisting largely of biological control species. Subtilisin (S08.005) and Proteinase K (S08.054) proteins appear to have played important roles in lifestyle shifts among different families and species of Hypocreales. The protein residues found to be significant in the functional divergence analysis may support future protein engineering.
Subjects
Evolution, Molecular , Genetic Variation , Hypocreales/enzymology , Hypocreales/genetics , Phylogeny , Subtilisins/genetics , Amino Acid Motifs , Amino Acid Sequence , Animals , Conserved Sequence/genetics , Endopeptidase K/genetics , Likelihood Functions , Models, Genetic , Multigene Family , Selection, Genetic , Species Specificity
ABSTRACT
The core promoter, which immediately flanks the transcription start site (TSS), plays a critical role in transcriptional regulation in eukaryotes. Recent studies on higher eukaryotes have revealed an unprecedented complexity of core promoter structures that underscores the diverse regulatory mechanisms of gene expression. For unicellular eukaryotes, however, the structures of core promoters have not been investigated in detail. Although it is an important model organism, Schizosaccharomyces pombe still lacks precise TSS annotation, which hampers the analysis of its core promoter structures and their relationship to those of higher eukaryotes. Here we used a deep sequencing-based approach (DeepCAGE) to generate 16 million uniquely mapped tags, corresponding to 93,736 positions in the S. pombe genome. The high-resolution TSS landscape enabled identification of over 8,000 core promoters, characterization of 4 promoter classes and observation of widespread alternative promoters. The landscape also allowed precise determination of the representative TSSs within core promoters, thus redefining the 5' UTR for 82.8% of S. pombe genes. We further identified the consensus initiator (Inr) sequence, PyPyPuN(A/C)(C/A); the TATA-enriched region (between positions -37 and -25); and an Inr immediate downstream motif, CC(T/A)(T/C)(T/C/A)(A/G)CCA(A/T/C), all of which were associated with highly expressed promoters. In conclusion, the detailed analysis of core promoters not only significantly improves the genome annotation of S. pombe, but also reveals that the core promoters of this unicellular eukaryote are organized much like those of higher eukaryotes. These findings lend additional evidence for the power of this model system in delineating complex regulatory processes in multicellular organisms, despite its perceived simplicity.
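The two consensus elements quoted in the abstract can be read as degenerate patterns. As a minimal sketch, assuming Py means C or T, Pu means A or G and N means any base, they translate directly into regular expressions. The function name `find_motif` and the example sequences are hypothetical illustrations, not data or code from the study.

```python
import re

# Inr consensus PyPyPuN(A/C)(C/A), written as one character class per position.
# Assumption: Py = C/T, Pu = A/G, N = any base.
INR = re.compile(r"[CT][CT][AG][ACGT][AC][CA]")

# Inr immediate downstream motif CC(T/A)(T/C)(T/C/A)(A/G)CCA(A/T/C).
DOWNSTREAM = re.compile(r"CC[TA][TC][TCA][AG]CCA[ATC]")


def find_motif(pattern, seq):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A zero-width lookahead lets the scan report overlapping occurrences.
    lookahead = re.compile(f"(?=({pattern.pattern}))")
    return [m.start() for m in lookahead.finditer(seq.upper())]


if __name__ == "__main__":
    # Made-up toy sequences, not S. pombe genomic sequence.
    print(find_motif(INR, "AACTAGACGG"))
    print(find_motif(DOWNSTREAM, "ggcctttaccaagg"))
```

A scan like this only reports where a sequence is compatible with the consensus; it says nothing about expression level, which the abstract links to these motifs through the DeepCAGE tag data.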
Subjects
Genome, Fungal , Promoter Regions, Genetic , Schizosaccharomyces/genetics , Software , 5' Untranslated Regions/genetics , Base Sequence , Consensus Sequence , DNA, Intergenic/genetics , Genes, Fungal , Molecular Sequence Data , Nucleotide Motifs/genetics , Open Reading Frames/genetics , RNA, Untranslated/genetics , TATA Box/genetics , Transcription Initiation Site
ABSTRACT
Given the significant impact of transportation-related carbon emissions on air quality and climate change, understanding the regional dynamics of these emissions is crucial. Despite numerous studies on carbon emissions, a comprehensive analysis of China's interprovincial transport carbon emission correlation network is lacking. Based on China's provincial data from 2007 to 2021, we analyzed the network's basic structural characteristics and divided it into four plates to investigate their interactions. We then used motif analysis to examine the micro-correlation patterns within the network and an exponential random graph model (ERGM) to analyze the network's formation mechanism. The findings reveal that: (1) Provinces with high correlation intensity, such as Shanghai and Beijing, are predominantly concentrated in the eastern region. Provinces in the eastern region also occupy a central position in the transport carbon emission correlation network, mainly receiving carbon emissions from other provinces. In contrast, the western region primarily sends carbon emissions to other provinces, with flows continuously converging towards the center. (2) The network is segmented into a net beneficiary plate, a net overflow plate, a bidirectional spillover plate, and a broker plate, whose roles and influence differ across years. (3) Bidirectional correlation motif structures emerge as primary influencers within the network, although specific structures impede interregional communication and collaborative emission reduction. (4) The network's internal structural variables, such as mutuality, cyclic triples, and geometrically weighted edgewise shared partners, along with influencing factors including government intervention, urbanization rate, openness, fiscal expenditure on transport, and province adjacency, significantly affect the formation of the transport carbon emission correlation network. 
This research provides a theoretical basis for national efforts to promote low-carbon transportation and improve air quality, and offers guidance for cross-regional collaborative emission reduction among provinces.
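Two of the structural variables named in the abstract, mutuality and cyclic triples, are simple counts on a directed network. The sketch below shows one common way to define them, assuming a network given as a list of directed (source, target) pairs. The abstract does not say how the statistics were computed, so the function names and the toy edge list are hypothetical and do not represent provincial data.

```python
from itertools import combinations


def mutual_dyads(edges):
    """Count unordered node pairs that are linked in both directions."""
    edge_set = set(edges)
    # The u < v test counts each reciprocated pair once and skips self-loops.
    return sum(1 for (u, v) in edge_set if u < v and (v, u) in edge_set)


def cyclic_triples(edges):
    """Count directed 3-cycles (a -> b -> c -> a) in the network."""
    edge_set = set(edges)
    nodes = sorted({n for edge in edge_set for n in edge})
    count = 0
    for a, b, c in combinations(nodes, 3):
        # Each unordered triple can carry a cycle in either orientation.
        if (a, b) in edge_set and (b, c) in edge_set and (c, a) in edge_set:
            count += 1
        if (a, c) in edge_set and (c, b) in edge_set and (b, a) in edge_set:
            count += 1
    return count


if __name__ == "__main__":
    # Toy directed network with abstract node labels.
    toy_edges = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "D"), ("D", "B")]
    print(mutual_dyads(toy_edges), cyclic_triples(toy_edges))
```

In an ERGM, counts of this kind enter as sufficient statistics whose coefficients indicate whether reciprocated ties or cycles occur more or less often than expected by chance. Fitting the model itself requires dedicated software and is not shown here.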
ABSTRACT
Dream research today assumes that there is a connection between dreams and waking life. However, the structural alteration of dream motifs in connection with the psychotherapeutic process and waking life has not yet been researched extensively. This study describes the development of the new Motif Analysis and Phase Model (MAP), a dynamic method that allows these aspects to be investigated. The following key question was investigated alongside: can a connection be established between the course of the dream patterns and the agency of the dream ego, as well as between the dream contents and the course of the dreamer's psychotherapies as a whole? Four hypotheses were formulated and tested. The data consist of 217 dreams of a male test subject. The motifs were first analysed using Structural Dream Analysis (SDA). The content was then linked to the test subject's waking life in a guided interview. The findings show a strong connection between the dream content and the psychotherapies as well as the test subject's waking life. Five motifs with structural changes were found, from which the Phase Model with four phases was developed. At turning points, the transformative child motif also appears in the dreams. The course of the dream patterns and the agency of the dream ego, however, did not change. The results, the method and the generalisability are critically discussed, and recommendations for future research are formulated.