1.
Comput Biol Med ; 168: 107753, 2024 01.
Article in English | MEDLINE | ID: mdl-38039889

ABSTRACT

BACKGROUND: Trans-acting factors, a group of proteins that can directly or indirectly recognize or bind to the 8-12 bp core sequence of cis-acting elements and regulate the transcription efficiency of target genes, are of special importance in transcription regulation. Progress in high-throughput chromatin capture technology (e.g., Hi-C) enables the identification of chromatin-interacting sequence groups in which trans-acting DNA motif groups can be discovered. The difficulty lies in the combinatorial nature of DNA sequence pattern matching and its underlying sequence pattern search space. METHOD: Here, we propose MotifHub for trans-acting DNA motif group discovery on grouped sequences. Specifically, the main approach is probabilistic modeling that accommodates the stochastic nature of DNA motif patterns. RESULTS: Based on the modeling, we develop global sampling techniques based on expectation-maximization (EM) and Gibbs sampling to address the global optimization challenge of model fitting with latent variables. The results show that the proposed approaches achieve promising performance with linear time complexity. CONCLUSION: MotifHub is a novel algorithm that identifies both DNA co-binding motif groups and trans-acting TFs. Our study paves the way for identifying hub TFs of stem cell development (OCT4 and SOX2) and determining potential therapeutic targets of prostate cancer (FOXA1 and MYC). To ensure scientific reproducibility and long-term impact, its matrix-algebra-optimized source code is released at http://bioinfo.cs.cityu.edu.hk/MotifHub.


Subjects
Algorithms; Software; Nucleotide Motifs/genetics; Reproducibility of Results; Chromatin/genetics
2.
PLoS Comput Biol ; 19(10): e1011536, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37782656

ABSTRACT

How the locus-specificity of epigenetic modifications is regulated remains an unanswered question. A contributing mechanism is that epigenetic enzymes are recruited to specific loci by DNA binding factors recognizing particular sequence motifs (referred to as epi-motifs). Using these motifs to predict biological outputs depending on local epigenetic state such as somatic mutation rates would confirm their functionality. Here, we used DNA motifs including known TF motifs and epi-motifs as a surrogate of epigenetic signals to predict somatic mutation rates in 13 cancers at an average 23kbp resolution. We implemented an interpretable neural network model, called contextual regression, to successfully learn the universal relationship between mutations and DNA motifs, and uncovered motifs that are most impactful on the regional mutation rates such as TP53 and epi-motifs associated with H3K9me3. Furthermore, we identified genomic regions with significantly higher mutation rates than the expected values in each individual tumor and demonstrated that such cancer-related regions can accurately predict cancer types. Interestingly, we found that the same mutation signatures often have different contributions to cancer-related and cancer-independent regions, and we also identified the motifs with the most contribution to each mutation signature.


Subjects
Mutation Rate; Neoplasms; Humans; Nucleotide Motifs/genetics; Mutation/genetics; Epigenesis, Genetic/genetics; Neoplasms/genetics
3.
Int J Biol Macromol ; 253(Pt 5): 127181, 2023 Dec 31.
Article in English | MEDLINE | ID: mdl-37793523

ABSTRACT

RNA is a pivotal molecule that plays critical roles in various cellular processes. Quantifying RNA structures and interactions is essential to understanding RNA function and developing RNA-based therapeutics. Using a unified five-bead model and a non-redundant database, this paper investigates the structural features and interactions of five commonly occurring RNA motifs, i.e., double-stranded helices, hairpin loops, internal/bulge loops, multi-branched junctions, and single-stranded terminal tails. Analyzing detailed distributions of RNA local structural features and base-base interactions reveals a preference for helical structures in both local backbone structures and base orientations. The interactions between adjacent bases exhibit motif-specific and sequence-dependent characteristics, reflecting the distinct topological constraints imposed by different loop-helix connection modes and the varying pairing and stacking interactions among different sequences. These findings shed light on the stability of RNA helices, emphasizing their significance in providing dominant base pairing and stacking interactions for RNA structures and stability. The four non-helix motifs encompass unpaired nucleotide loops and exhibit diverse base-base interactions, contributing to the structural diversity observed in RNA. Overall, the complexity of RNA structure arises from the intricate interplay of base-base interactions.


Subjects
RNA; RNA/genetics; RNA/chemistry; Nucleic Acid Conformation; Models, Molecular; Base Pairing; Nucleotide Motifs/genetics
4.
Sci Rep ; 13(1): 15987, 2023 09 25.
Article in English | MEDLINE | ID: mdl-37749116

ABSTRACT

RNAs that are able to prevent degradation by the 5'-3' exoribonuclease Xrn1 have emerged as crucial structures during infection by an increasing number of RNA viruses. Several plant viruses employ the so-called coremin motif, an Xrn1-resistant RNA that is usually located in 3' untranslated regions. Investigation of its structural and sequence requirements has led to its identification in plant virus families beyond those in which the coremin motif was initially discovered. In this study, we identified coremin-like motifs that deviate from the original in the number of nucleotides present in the loop region of the 5' proximal hairpin. They are present in a number of viral families that previously did not have an Xrn1-resistant RNA identified yet, including the double-stranded RNA virus families Hypoviridae and Chrysoviridae. Through systematic mutational analysis, we demonstrated that a coremin motif carrying a 6-nucleotide loop in the 5' proximal hairpin generally requires a YGNNAD consensus for stalling Xrn1, similar to the previously determined YGAD consensus required for Xrn1 resistance of the original coremin motif. Furthermore, we determined the minimal requirements for the 3' proximal hairpin. Since some putative coremin motifs were found in intergenic regions or coding sequences, we demonstrated their capacity for inhibiting translation through an in vitro ribosomal scanning inhibition assay. Consequently, this study provides a further expansion on the number of viral families with known Xrn1-resistant elements, while adding a novel, potentially regulatory function for this structure.
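The YGNNAD and YGAD consensus strings in this abstract use IUPAC nucleotide codes (Y = C or U, N = any base, D = A, G or U). As a minimal sketch, assuming a plain linear scan is acceptable, the following shows how such a consensus can be expanded into a regular expression and matched against a sequence. The function names and the test sequences are hypothetical; the scan ignores the hairpin context that the study requires, so a hit only marks a candidate loop sequence.

```python
import re

# IUPAC nucleotide codes expanded for RNA (U instead of T).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]",
    "K": "[GU]", "M": "[AC]", "B": "[CGU]", "D": "[AGU]",
    "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}


def find_consensus(seq, consensus):
    """Return (start, matched_text) for every occurrence of an IUPAC consensus.

    The input may be DNA or RNA; T is converted to U before matching.
    A lookahead is used so that overlapping occurrences are all reported.
    """
    rna = seq.upper().replace("T", "U")
    body = "".join(IUPAC_RNA[symbol] for symbol in consensus.upper())
    pattern = re.compile("(?=(" + body + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(rna)]


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from any viral genome.
    print(find_consensus("GGCGAUAUCC", "YGNNAD"))
```

Because the consensus is data rather than code, the same function can be reused for the 4-nucleotide YGAD form or for any other IUPAC pattern.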


Subjects
Plant Viruses; RNA, Viral; Nucleotide Motifs/genetics; RNA, Viral/metabolism; Exoribonucleases/metabolism; Virome; Ribosomes/metabolism; Nucleotides; Plant Viruses/genetics; Plant Viruses/metabolism; Nucleic Acid Conformation; RNA Stability
5.
Nat Commun ; 14(1): 5944, 2023 09 23.
Article in English | MEDLINE | ID: mdl-37741827

ABSTRACT

Advances in sequencing technologies have empowered epitranscriptomic profiling at the single-base resolution. Putative RNA modification sites identified from a single high-throughput experiment may contain one type of modification deposited by different writers or different types of modifications, along with false positive results because of the challenge of distinguishing signals from noise. However, current tools are insufficient for subtyping, visualization, and denoising these signals. Here, we present iMVP, which is an interactive framework for epitranscriptomic analysis with a nonlinear dimension reduction technique and density-based partition. As exemplified by the analysis of mRNA m5C and ModTect variant data, we show that iMVP allows the identification of previously unknown RNA modification motifs and writers and the discovery of false positives that are undetectable by traditional methods. Using putative m6A/m6Am sites called from 8 profiling approaches, we illustrate that iMVP enables comprehensive comparison of different approaches and advances our understanding of the difference and pattern of true positives and artifacts in these methods. Finally, we demonstrate the ability of iMVP to analyze an extremely large human A-to-I editing dataset that was previously unmanageable. Our work provides a general framework for the visualization and interpretation of epitranscriptomic data.


Subjects
Artifacts; Technology; Humans; Nucleotide Motifs/genetics; RNA, Messenger
6.
J Virol ; 97(6): e0063523, 2023 06 29.
Article in English | MEDLINE | ID: mdl-37223945

ABSTRACT

The stem-loop II motif (s2m) is an RNA structural element that is found in the 3' untranslated region (UTR) of many RNA viruses, including severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Though the motif was discovered over 25 years ago, its functional significance is unknown. In order to understand the importance of s2m, we created viruses with deletions or mutations of the s2m by reverse genetics and also evaluated a clinical isolate harboring a unique s2m deletion. Deletion or mutation of the s2m had no effect on growth in vitro or on growth and viral fitness in Syrian hamsters in vivo. We also compared the secondary structure of the 3' UTR of wild-type and s2m deletion viruses using selective 2'-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) and dimethyl sulfate mutational profiling and sequencing (DMS-MaPseq). These experiments demonstrate that the s2m forms an independent structure and that its deletion does not alter the overall remaining 3'-UTR RNA structure. Together, these findings suggest that s2m is dispensable for SARS-CoV-2.

IMPORTANCE: RNA viruses, including severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), contain functional structures to support virus replication, translation, and evasion of the host antiviral immune response. The 3' untranslated region of early isolates of SARS-CoV-2 contained a stem-loop II motif (s2m), which is an RNA structural element that is found in many RNA viruses. This motif was discovered over 25 years ago, but its functional significance is unknown. We created SARS-CoV-2 with deletions or mutations of the s2m and determined the effect of these changes on viral growth in tissue culture and in rodent models of infection. Deletion or mutation of the s2m element had no effect on growth in vitro or on growth and viral fitness in Syrian hamsters in vivo. We also observed no impact of the deletion on other known RNA structures in the same region of the genome. These experiments demonstrate that s2m is dispensable for SARS-CoV-2.


Subjects
Nucleotide Motifs; SARS-CoV-2; Animals; Cricetinae; 3' Untranslated Regions/genetics; COVID-19/virology; Mesocricetus; Mutation; SARS-CoV-2/genetics; Nucleotide Motifs/genetics; RNA, Viral/chemistry; RNA, Viral/genetics
7.
Proc Natl Acad Sci U S A ; 120(15): e2216698120, 2023 04 11.
Article in English | MEDLINE | ID: mdl-37023129

ABSTRACT

Discovering DNA regulatory sequence motifs and their relative positions is vital to understanding the mechanisms of gene expression regulation. Although deep convolutional neural networks (CNNs) have achieved great success in predicting cis-regulatory elements, the discovery of motifs and their combinatorial patterns from these CNN models has remained difficult. We show that the main difficulty is due to the problem of multifaceted neurons which respond to multiple types of sequence patterns. Since existing interpretation methods were mainly designed to visualize the class of sequences that can activate the neuron, the resulting visualization will correspond to a mixture of patterns. Such a mixture is usually difficult to interpret without resolving the mixed patterns. We propose the NeuronMotif algorithm to interpret such neurons. Given any convolutional neuron (CN) in the network, NeuronMotif first generates a large sample of sequences capable of activating the CN, which typically consists of a mixture of patterns. Then, the sequences are "demixed" in a layer-wise manner by backward clustering of the feature maps of the involved convolutional layers. NeuronMotif can output the sequence motifs, and the syntax rules governing their combinations are depicted by position weight matrices organized in tree structures. Compared to existing methods, the motifs found by NeuronMotif have more matches to known motifs in the JASPAR database. The higher-order patterns uncovered for deep CNs are supported by the literature and ATAC-seq footprinting. Overall, NeuronMotif enables the deciphering of cis-regulatory codes from deep CNs and enhances the utility of CNN in genome interpretation.
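The abstract reports motifs as position weight matrices (PWMs). As an illustration of that representation only (not of NeuronMotif's demixing procedure, which is not reproduced here), the sketch below builds a log-odds PWM from a handful of aligned sites and scans a sequence for the best-scoring window. The function names, the pseudocount, the uniform background and the toy sites are all assumptions made for this example.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM (list of per-position dicts) from aligned sites."""
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(width):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def score_window(pwm, window):
    """Sum the log-odds contribution of each base in a window of PWM width."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, seq):
    """Return (position, score) of the highest-scoring window in seq."""
    width = len(pwm)
    positions = range(len(seq) - width + 1)
    best = max(positions, key=lambda i: score_window(pwm, seq[i:i + width]))
    return best, score_window(pwm, seq[best:best + width])


# Made-up aligned sites used only to exercise the functions.
toy_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTCA"]
toy_pwm = build_pwm(toy_sites)

if __name__ == "__main__":
    print(best_hit(toy_pwm, "CCCTGACTCAGGG"))
```

A positive total score means the window looks more like the motif than like the assumed background; comparing a discovered PWM against a database such as JASPAR needs a matrix-to-matrix similarity measure, which this sketch does not cover.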


Subjects
Algorithms; Neural Networks, Computer; Nucleotide Motifs/genetics; Regulatory Sequences, Nucleic Acid/genetics; Databases, Factual
8.
Nat Struct Mol Biol ; 30(4): 417-424, 2023 04.
Article in English | MEDLINE | ID: mdl-36914796

ABSTRACT

Non-B DNA structures formed by repetitive sequence motifs are known instigators of mutagenesis in experimental systems. Analyzing this phenomenon computationally in the human genome requires careful disentangling of intrinsic confounding factors, including overlapping and interrupted motifs and recurrent sequencing errors. Here, we show that accounting for these factors eliminates all signals of repeat-induced mutagenesis that extend beyond the motif boundary, and eliminates or dramatically shrinks the magnitude of mutagenesis within some motifs, contradicting previous reports. Mutagenesis not attributable to artifacts revealed several biological mechanisms. Polymerase slippage generates frequent indels within every variety of short tandem repeat motif, implicating slipped-strand structures. Interruption-correcting single nucleotide variants within short tandem repeats may originate from error-prone polymerases. Secondary-structure formation promotes single nucleotide variants within palindromic repeats and duplications within direct repeats. G-quadruplex motifs cause recurrent sequencing errors, whereas mutagenesis at Z-DNAs is conspicuously absent.
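For readers unfamiliar with how short tandem repeat motifs are located in a sequence, here is a naive sketch based on a regular-expression back-reference. It is an assumption-laden illustration, not the annotation method used in the study: it finds only perfect, uninterrupted repeats, so the interrupted and overlapping motifs that the abstract names as confounders would need additional handling. The function name and default thresholds are hypothetical.

```python
import re


def find_tandem_repeats(seq, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats in a DNA string.

    The lazy quantifier makes the regex try the shortest repeat unit first,
    and the back-reference requires at least min_copies consecutive copies.
    """
    pattern = re.compile(
        r"([ACGT]{1," + str(max_unit) + r"}?)\1{" + str(min_copies - 1) + r",}"
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence containing four copies of CAG.
    print(find_tandem_repeats("TTCAGCAGCAGCAGTT"))
```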


Subjects
DNA; Genome, Human; Humans; Nucleotide Motifs/genetics; Mutagenesis; DNA/genetics; DNA/chemistry; Nucleotides
9.
PLoS Comput Biol ; 19(1): e1010859, 2023 01.
Article in English | MEDLINE | ID: mdl-36689472

ABSTRACT

RNA recognition motifs (RRM) are the most prevalent class of RNA binding domains in eucaryotes. Their RNA binding preferences have been investigated for almost two decades, and even though some RRM domains are now very well described, their RNA recognition code has remained elusive. An increasing number of experimental structures of RRM-RNA complexes has become available in recent years. Here, we perform an in-depth computational analysis to derive an RNA recognition code for canonical RRMs. We present and validate a computational scoring method to estimate the binding between an RRM and a single stranded RNA, based on structural data from a carefully curated multiple sequence alignment, which can predict RRM binding RNA sequence motifs based on the RRM protein sequence. Given the importance and prevalence of RRMs in humans and other species, this tool could help design RNA binding motifs with uses in medical or synthetic biology applications, leading towards the de novo design of RRMs with specific RNA recognition.


Subjects
RNA Recognition Motif; RNA; Humans; RNA/chemistry; Amino Acid Sequence; Sequence Alignment; Nucleotide Motifs/genetics; Protein Binding; Binding Sites
10.
Article in English | MEDLINE | ID: mdl-35275822

ABSTRACT

A DNA motif is a sequence pattern shared by the DNA sequence segments that bind to a specific protein. Discovering motifs in a given DNA sequence dataset plays a vital role in studying gene expression regulation. As an important attribute of the DNA motif, the motif length directly affects the quality of the discovered motifs. How to determine the motif length more accurately remains a difficult challenge to be solved. We propose a new motif length prediction scheme named MotifLen by using supervised machine learning. First, a method of constructing sample data for predicting the motif length is proposed. Secondly, a deep learning model for motif length prediction is constructed based on the convolutional neural network. Then, the methods of applying the proposed prediction model based on a motif found by an existing motif discovery algorithm are given. The experimental results show that i) the prediction accuracy of MotifLen is more than 90% on the validation set and is significantly higher than that of the compared methods on real datasets, ii) MotifLen can successfully optimize the motifs found by the existing motif discovery algorithms, and iii) it can effectively improve the time performance of some existing motif discovery algorithms.


Subjects
Deep Learning; Nucleotide Motifs/genetics; Sequence Analysis, DNA/methods; Algorithms; Neural Networks, Computer
11.
Int J Mol Sci ; 23(16), 2022 Aug 17.
Article in English | MEDLINE | ID: mdl-36012512

ABSTRACT

RNA motif classification is important for understanding structure/function connections and building phylogenetic relationships. Using our coarse-grained RNA-As-Graphs (RAG) representations, we identify recurrent dual graph motifs in experimentally solved RNA structures based on an improved search algorithm that finds and ranks independent RNA substructures. Our expanded list of 183 existing dual graph motifs reveals five common motifs found in transfer RNA, riboswitch, and ribosomal 5S RNA components. Moreover, we identify three motifs for available viral frameshifting RNA elements, suggesting a correlation between viral structural complexity and frameshifting efficiency. We further partition the RNA substructures into 1844 distinct submotifs, with pseudoknots and junctions retained intact. Common modules are internal loops and three-way junctions, and three submotifs are associated with riboswitches that bind nucleotides, ions, and signaling molecules. Together, our library of existing RNA motifs and submotifs adds to the growing universe of RNA modules, and provides a resource of structures and substructures for novel RNA design.


Subjects
RNA; Riboswitch; Algorithms; Gene Library; Nucleic Acid Conformation; Nucleotide Motifs/genetics; Phylogeny; RNA/chemistry; RNA/genetics; RNA, Viral/genetics
12.
Elife ; 11, 2022 Aug 31.
Article in English | MEDLINE | ID: mdl-36043696

ABSTRACT

Sequence variation in enhancers that control cell-type-specific gene transcription contributes significantly to phenotypic variation within human populations. However, it remains difficult to predict precisely the effect of any given sequence variant on enhancer function due to the complexity of DNA sequence motifs that determine transcription factor (TF) binding to enhancers in their native genomic context. Using F1-hybrid cells derived from crosses between distantly related inbred strains of mice, we identified thousands of enhancers with allele-specific TF binding and/or activity. We find that genetic variants located within the central region of enhancers are most likely to alter TF binding and enhancer activity. We observe that the AP-1 family of TFs (Fos/Jun) are frequently required for binding of TEAD TFs and for enhancer function. However, many sequence variants outside of core motifs for AP-1 and TEAD also impact enhancer function, including sequences flanking core TF motifs and AP-1 half sites. Taken together, these data represent one of the most comprehensive assessments of allele-specific TF binding and enhancer function to date and reveal how sequence changes at enhancers alter their function across evolutionary timescales.


There are hundreds of different types of cells in the body. Each one performs a unique role, but they all share the same genes. Sequences of the genetic code called enhancers decide which genes each cell uses. Enhancers work like genetic switches: to turn a gene on, proteins called transcription factors assemble on an enhancer. Each transcription factor recognises a short sequence on the enhancer, and several distinct transcription factors work together to promote the activation of a gene. The relationship between transcription factors, enhancers, and gene activation is complex. The specific genetic sequences of enhancers differ between species, changing the way these genetic switches work. But scientists are not yet able to reliably predict the effects of small changes in the DNA sequence of an enhancer. One way to tackle this problem is to look at different versions of the same enhancers side by side to see how small mutations change their behaviour. Mammalian cells generally carry two copies of each chromosome (the molecules that contain the genetic code), one inherited from each parent. Each of the two copies carries the same genes and enhancers, but there are many small differences in the DNA sequences of enhancers between the chromosomes inherited from each parent, which can potentially alter their function. Yang, Ling et al. generated cells from mice that come from different inbred strains, which are similar to purebred dogs. By breeding together two distinct inbred mouse strains that are very different from one another, they generated a panel of hybrid mouse cell lines that have a relatively large number of differences in their DNA sequence between the maternal and paternal chromosomes. Looking at the different versions of each enhancer side by side revealed thousands of single-letter changes in the DNA sequence of enhancers that changed how they work.
Mutations affecting the binding site of one transcription factor within an enhancer can indirectly affect the binding of other types of transcription factors. Yang, Ling et al. found that if a transcription factor could no longer find its place on an enhancer, it stopped others from binding even if their own places had not changed. Sometimes, mutations on either side of the binding sequences also affected transcription factor binding. This suggests a more complex relationship than previously thought may exist between the DNA sequence of an enhancer and the transcription factors that bind to it. Spotting the differences caused by mutations could help further the efforts of scientists to read and write the genetic code. This could have many benefits. It would allow scientists to control natural or artificial genes, and to predict the effects of genetic changes that are identified in humans with genetic diseases. This might improve genetic experiments, medical screening, gene therapy, and our understanding of evolution.


Subjects
Enhancer Elements, Genetic; Genetic Variation; Transcription Factor AP-1; Animals; Humans; Mice; Binding Sites/genetics; Enhancer Elements, Genetic/genetics; Genetic Variation/genetics; Nucleotide Motifs/genetics; Protein Binding/genetics; Transcription Factor AP-1/genetics
13.
Gene ; 841: 146756, 2022 Oct 20.
Article in English | MEDLINE | ID: mdl-35905857

ABSTRACT

Non-coding RNAs are key regulatory players in bacteria. Many computationally predicted non-coding RNAs, however, lack functional associations. An example is the Bacillaceae-1 RNA motif, whose Rfam model consists of two hairpin loops. We find the motif conserved in nine of 13 non-pathogenic strains of the genus Bacillus but only in one pathogenic strain. To elucidate functional characteristics, we studied 118 hits of the Rfam model in 11 Bacillus spp. and found two distinct classes based on the ensemble diversity of their RNA secondary structure and the genomic context concerning the ribosomal RNA (rRNA) cluster. Forty hits are associated with the rRNA cluster, of which all 19 hits upstream flanking of 16S rRNA have a reverse complementary structure of low structural diversity. Fifty-two hits have large ensemble diversity, of which 38 are located between two coding genes. For eight hits in Bacillus subtilis, we investigated public expression data under various conditions and observed either the forward or the reverse complementary motif expressed. Five hits are associated with the rRNA cluster. Four of them are located upstream of the 16S rRNA and are not transcriptionally active, but instead, their reverse complements with low structural diversity are expressed together with the rRNA cluster. The three other hits are located between two coding genes in non-conserved genomic loci. Two of them are independently expressed from their surrounding genes and are structurally diverse. In summary, we found that Bacillaceae-1 RNA motifs upstream flanking of ribosomal RNA clusters tend to have one stable structure with the reverse complementary motif expressed in B. subtilis. In contrast, a subgroup of intergenic motifs has the thermodynamic potential for structural switches.


Subjects
Bacillaceae; Bacillus; Bacillaceae/genetics; Bacillaceae/metabolism; Bacillus/genetics; Bacillus subtilis/genetics; Nucleotide Motifs/genetics; Phylogeny; RNA, Ribosomal/genetics; RNA, Ribosomal, 16S/genetics
14.
Int J Mol Sci ; 23(5), 2022 Feb 23.
Article in English | MEDLINE | ID: mdl-35269600

ABSTRACT

Influenza A virus (IAV) is a member of the single-stranded RNA (ssRNA) family of viruses. The most recent global pandemic caused by the SARS-CoV-2 virus has shown the major threat that RNA viruses can pose to humanity. In comparison, influenza has an even higher pandemic potential as a result of its high rate of mutations within its relatively short (<13 kbp) genome, as well as its capability to undergo genetic reassortment. In light of this threat, and the fact that RNA structure is connected to a broad range of known biological functions, deeper investigation of viral RNA (vRNA) structures is of high interest. Here, for the first time, we propose a secondary structure for segment 8 vRNA (vRNA8) of A/California/04/2009 (H1N1) formed in the presence of cellular and viral components. This structure shows similarities with prior in vitro experiments. Additionally, we determined the location of several well-defined, conserved structural motifs of vRNA8 within IAV strains with possible functionality. These RNA motifs appear to fold independently of regional nucleoprotein (NP)-binding affinity, but a low or uneven distribution of NP in each motif region is noted. This research also highlights several accessible sites for oligonucleotide tools and small molecules in vRNA8 in a cellular environment that might be a target for influenza A virus inhibition on the RNA level.


Subjects
Gene Expression Regulation, Viral; Genome, Viral/genetics; Influenza A Virus, H1N1 Subtype/genetics; Nucleic Acid Conformation; RNA, Viral/chemistry; Animals; Base Sequence; Dogs; Humans; Influenza A Virus, H1N1 Subtype/metabolism; Influenza, Human/virology; Madin Darby Canine Kidney Cells; Models, Molecular; Nucleotide Motifs/genetics; RNA Folding; RNA, Viral/genetics; Viral Proteins/genetics; Viral Proteins/metabolism
15.
Sci Rep ; 12(1): 2420, 2022 02 14.
Article in English | MEDLINE | ID: mdl-35165300

ABSTRACT

The zinc finger antiviral protein (ZAP) is known to restrict viral replication by binding to the CpG rich regions of viral RNA, and subsequently inducing viral RNA degradation. This enzyme has recently been shown to be capable of restricting SARS-CoV-2. These data have led to the hypothesis that the low abundance of CpG in the SARS-CoV-2 genome is due to an evolutionary pressure exerted by the host ZAP. To investigate this hypothesis, we performed a detailed analysis of many coronavirus sequences and ZAP RNA binding preference data. Our analyses showed neither evidence for an evolutionary pressure acting specifically on CpG dinucleotides, nor a link between the activity of ZAP and the low CpG abundance of the SARS-CoV-2 genome.
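"Low CpG abundance" is commonly quantified as an observed/expected ratio: the number of CG dinucleotides divided by the number expected if C and G occurred independently. The sketch below uses one widely used form of that ratio, CpG count x N / (C count x G count), where N is the sequence length. Whether the study used this exact normalisation is an assumption, and the function name is hypothetical.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * N) / (#C * #G).

    Accepts DNA or RNA (U is treated as T). Returns 0.0 when the sequence
    has no C or no G, since the expected count is then zero.
    """
    s = seq.upper().replace("U", "T")
    c_count = s.count("C")
    g_count = s.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # "CG" cannot overlap with itself, so str.count gives the exact number.
    return s.count("CG") * len(s) / (c_count * g_count)


if __name__ == "__main__":
    # Toy inputs: a ratio near 1 means CpG occurs about as often as expected.
    for toy in ("CCGG", "CGCG", "AAAA"):
        print(toy, cpg_obs_exp(toy))
```

A ratio well below 1 indicates CpG depletion relative to base composition. On its own it says nothing about the cause, which is the question the abstract addresses.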


Subjects
COVID-19/genetics; Dinucleoside Phosphates/genetics; Genome, Viral/genetics; RNA-Binding Proteins/genetics; SARS-CoV-2/genetics; Animals; Base Sequence; Binding Sites/genetics; COVID-19/virology; Dinucleoside Phosphates/metabolism; Evolution, Molecular; Host-Pathogen Interactions/genetics; Humans; Nucleotide Motifs/genetics; Protein Binding; RNA, Viral/genetics; RNA, Viral/metabolism; RNA-Binding Proteins/metabolism; SARS-CoV-2/physiology; Virus Replication/genetics
16.
Viruses ; 14(2), 2022 01 18.
Article in English | MEDLINE | ID: mdl-35215760

ABSTRACT

Highly pathogenic porcine reproductive and respiratory syndrome virus (HP-PRRSV) with enhanced replication capability emerged in China and has been the dominant epidemic strain since 2006. The genes that regulate PRRSV replication have not yet been fully clarified. Here, by swapping genes or elements between HP-PRRSV and classical PRRSV based on infectious clones, NSP1, NSP2, NSP7, NSP9 and the 3'-UTR were found to contribute to the high replication efficiency of HP-PRRSV. Further study revealed that mutations at position 117 or 119 in the 3'-UTR are significantly related to replication efficiency, and that the nucleotide at position 120 is critical for viral rescue. The motif composed of nucleotides 117-120 was highly conserved within each lineage of PRRSV; viruses carrying the motif mutations of HP-PRRSV and of the currently epidemic lineage 1 (L1) PRRSV showed a higher ability to synthesize viral negative-strand genomic RNA, suggesting that those mutations are beneficial for viral replication. RNA structure analysis revealed that this motif may be involved in a pseudoknot in the 3'-UTR. The results identify a novel motif, nucleotides 117-120 in the 3'-UTR, that is critical for replication of PRRSV-2, and show that mutations in the motif contribute to the enhanced replicative ability of HP-PRRSV and L1 PRRSV. Our findings will help to understand the molecular basis of PRRSV replication and to find the potential factors that give rise to an epidemic strain of PRRSV.


Subjects
3' Untranslated Regions/genetics; Nucleotide Motifs/genetics; Porcine Reproductive and Respiratory Syndrome/virology; Porcine respiratory and reproductive syndrome virus/genetics; Virus Replication/genetics; Animals; Cell Line; Mutation; Porcine respiratory and reproductive syndrome virus/pathogenicity; RNA, Viral/genetics; Swine; Viral Nonstructural Proteins/genetics; Virulence
17.
Int J Mol Sci ; 23(3), 2022 Jan 29.
Article in English | MEDLINE | ID: mdl-35163483

ABSTRACT

The aryl hydrocarbon receptor (AhR) is a ligand-activated transcription factor known for mediating the toxicity of 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) and related compounds. Although the canonical mechanism of AhR activation involves heterodimerization with the aryl hydrocarbon receptor nuclear translocator, other transcriptional regulators that interact with AhR have been identified. Enrichment analysis of motifs in AhR-bound genomic regions implicated co-operation with COUP transcription factor (COUP-TF) and hepatocyte nuclear factor 4 (HNF4). The present study investigated AhR, HNF4α and COUP-TFII genomic binding and effects on gene expression associated with liver-specific function and cell differentiation in response to TCDD. Hepatic ChIPseq data from male C57BL/6 mice at 2 h after oral gavage with 30 µg/kg TCDD were integrated with bulk RNA-sequencing (RNAseq) time-course (2-72 h) and dose-response (0.01-30 µg/kg) datasets to assess putative AhR, HNF4α and COUP-TFII interactions associated with differential gene expression. Functional enrichment analysis of differentially expressed genes (DEGs) identified differential binding enrichment for AhR, COUP-TFII, and HNF4α to regions within liver-specific genes, suggesting intersections associated with the loss of liver-specific functions and hepatocyte differentiation. Analysis found that the repression of liver-specific, HNF4α target and hepatocyte differentiation genes, involved increased AhR and HNF4α binding with decreased COUP-TFII binding. Collectively, these results suggested TCDD-elicited loss of liver-specific functions and markers of hepatocyte differentiation involved interactions between AhR, COUP-TFII and HNF4α.


Subjects
COUP Transcription Factors/metabolism , Chromatin Immunoprecipitation Sequencing , Genome , Hepatocyte Nuclear Factor 4/metabolism , Liver/metabolism , Polychlorinated Dibenzodioxins/toxicity , Receptors, Aryl Hydrocarbon/metabolism , Animals , Base Sequence , Databases, Genetic , Gene Expression Profiling , Gene Expression Regulation , Male , Mice, Inbred C57BL , Nucleotide Motifs/genetics , Protein Binding , RNA-Seq , Transcription, Genetic
18.
Int J Mol Sci ; 23(3)2022 Feb 03.
Article in English | MEDLINE | ID: mdl-35163661

ABSTRACT

The identification of promoters is an essential step in the genome annotation process, providing a framework for gene regulatory networks and their role in transcription regulation. Despite considerable advances in the high-throughput determination of transcription start sites (TSSs) and transcription factor binding sites (TFBSs), experimental methods are still time-consuming and expensive. Instead, several computational approaches have been developed to provide fast and reliable means for predicting the location of TSSs and regulatory motifs on a genome-wide scale. Numerous studies have been carried out on the regulatory elements of mammalian genomes, but plant promoters, especially in gymnosperms, have been left out of the limelight and, therefore, have been poorly investigated. The aim of this study was to enhance and expand the existing genome annotations using computational approaches for genome-wide prediction of TSSs in four conifer species: loblolly pine, white spruce, Norway spruce, and Siberian larch. Our pipeline will be useful for TSS predictions in other genomes, especially for draft assemblies, where reliable TSS predictions are not usually available. We also explored some of the features of the nucleotide composition of the predicted promoters and compared the GC properties of conifer genes with those of model monocot and dicot plants. Here, we demonstrate that even incomplete genome assemblies and partial annotations can be a reliable starting point for TSS annotation. The results of the TSS prediction in the four conifer species have been deposited in the Persephone genome browser, which allows smooth visualization and is optimized for large data sets. This work provides the initial basis for future experimental validation and the study of the regulatory regions to understand gene regulation in gymnosperms.


Subjects
Genome, Plant , Tracheophyta/genetics , Transcription Initiation Site , Base Composition/genetics , Binding Sites , DNA, Plant/genetics , Exons/genetics , Molecular Sequence Annotation , Nucleotide Motifs/genetics , Nucleotides/metabolism , Open Reading Frames/genetics , Promoter Regions, Genetic , Transcription Factors/metabolism
19.
Cells ; 11(2)2022 01 11.
Article in English | MEDLINE | ID: mdl-35053346

ABSTRACT

In 1985, Keese and Symons proposed a hypothesis on the sequence and secondary structure of viroids from the family Pospiviroidae: their secondary structure can be subdivided into five structural and functional domains and "viroids have evolved by rearrangement of domains between different viroids infecting the same cell and subsequent mutations within each domain"; this article is one of the most cited in the field of viroids. Employing the pairwise alignment method used by Keese and Symons, in addition to more recent methods, we tried to reproduce the original results and extend them to further members of Pospiviroidae that were unknown in 1985. Indeed, individual members of Pospiviroidae consist of a patchwork of sequence fragments from the family, but the lengths of the fragments do not point to consistent points of rearrangement, which is in conflict with the original hypothesis of fixed domain borders.


Subjects
Consensus Sequence , Nucleotide Motifs/genetics , Viroids/chemistry , Base Sequence , Nucleic Acid Conformation , Viroids/genetics
20.
PLoS One ; 17(1): e0263307, 2022.
Article in English | MEDLINE | ID: mdl-35089985

ABSTRACT

We study the limits imposed by transcription factor specificity on the maximum number of binding motifs that can coexist in a gene regulatory network, using the SwissRegulon Fantom5 collection of 684 human transcription factor binding sites as a model. We describe transcription factor specificity using regular expressions and find that most human transcription factor binding site motifs are separated in sequence space by one to three motif-discriminating positions. We apply theorems based on the pigeonhole principle to calculate the maximum number of transcription factors that can coexist given this degree of specificity, which is on the order of ten thousand and would fully utilize the space of DNA subsequences. Taking into account an expanded DNA alphabet with modified bases can further raise this limit by several orders of magnitude, at a lower level of sequence space usage. Our results may guide the design of transcription factors at both the molecular and system scales.
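
The idea of describing motifs as regular expressions and counting motif-discriminating positions can be illustrated with a minimal Python sketch. It is not the authors' code or their theorems: the IUPAC-based encoding, the function names, the toy motifs, and the simple counting bound (at most 4^L / s^L mutually exclusive motifs of length L when each motif accepts s bases per position) are all assumptions made for illustration.

```python
import re
from itertools import product

# IUPAC nucleotide codes mapped to the bases each code allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def to_regex(motif):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(f"[{IUPAC[code]}]" for code in motif))


def discriminating_positions(m1, m2):
    """Return positions where two equal-length motifs share no allowed base."""
    if len(m1) != len(m2):
        raise ValueError("motifs must have the same length")
    return [
        i for i, (a, b) in enumerate(zip(m1, m2))
        if not set(IUPAC[a]) & set(IUPAC[b])
    ]


def shared_matches(m1, m2):
    """Enumerate every sequence of the motif length that matches both motifs."""
    r1, r2 = to_regex(m1), to_regex(m2)
    kmers = ("".join(p) for p in product("ACGT", repeat=len(m1)))
    return {k for k in kmers if r1.fullmatch(k) and r2.fullmatch(k)}


def max_disjoint_motifs(length, bases_per_position):
    """Counting bound (illustrative assumption): 4^L sequences exist and each
    motif claims s^L of them, so at most 4^L // s^L motifs can be disjoint."""
    return 4 ** length // bases_per_position ** length


# Hypothetical toy motifs, not taken from SwissRegulon.
motif_a, motif_b = "CACGTG", "CAYGTA"
print(discriminating_positions(motif_a, motif_b))
print(len(shared_matches(motif_a, motif_b)))
print(max_disjoint_motifs(7, 1))
```

In this sketch, a single discriminating position is enough to guarantee that no sequence matches both motifs, which is why the number of such positions is a natural measure of separation in sequence space.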


Subjects
DNA/metabolism , Nucleotide Motifs/genetics , Transcription Factors/metabolism , Algorithms , Base Sequence , Binding Sites , Humans , Protein Binding