ABSTRACT
Although they are organelles without a limiting membrane, nucleoli have a distinctive structure, built upon the rDNA-rich acrocentric short arms of five human chromosomes (nucleolar organizer regions, or NORs). This raises the question: what structural features of a chromosome are required for its inclusion in a nucleolus? Previous work has suggested that sequences adjacent to the tandemly repeated rDNA units (DJ, distal junction sequence) may be involved, and we have extended such studies by addressing several issues related to the requirements for the association of NORs with nucleoli. We exploited both a set of somatic cell hybrids containing individual human acrocentric chromosomes and a set of Human Artificial Chromosomes (HACs) carrying different parts of a NOR, including an rDNA unit or the DJ or PJ (proximal junction) sequence. Association of NORs with nucleoli was increased when the constituent rDNA was transcribed and may also be affected by the status of heterochromatin blocks formed next to the rDNA arrays. Furthermore, our data suggest that a relatively small DJ region, highly conserved in evolution, is also involved, along with the rDNA repeats, in the localization of the p-arms of acrocentric chromosomes in nucleoli. Thus, we infer a cooperative action of the rDNA sequence, stimulated by its activity, and of sequences distal to the rDNA in incorporation into nucleoli. Analysis of NOR sequences also identified LncRNA_038958 in the DJ, a candidate transcript whose putative promoter region lies close to the DJ/rDNA boundary and contains CTCF binding sites. This lncRNA may affect RNA Polymerase I and/or nucleolar activity. Our findings provide the basis for future studies to determine which RNAs and proteins interact critically with NOR sequences to organize the higher-order structure of nucleoli and their function in normal cells and pathological states.
Subjects
Nucleolus Organizer Region , RNA, Long Noncoding , Humans , Nucleolus Organizer Region/genetics , Nucleolus Organizer Region/metabolism , DNA, Ribosomal/genetics , RNA, Long Noncoding/metabolism , Cell Nucleolus/genetics , Cell Nucleolus/metabolism , Chromosomes, Human/metabolism
ABSTRACT
Increasing numbers of small regulatory RNAs (sRNAs) corresponding to 3' untranslated regions (UTRs) are being discovered in bacteria. One such sRNA, denoted GlnZ, corresponds to the 3' UTR of the Escherichia coli glnA mRNA encoding glutamine synthetase. Several forms of GlnZ, processed from the glnA mRNA, are detected in cells growing with limiting ammonium. GlnZ levels are regulated transcriptionally by the NtrC transcription factor and post-transcriptionally by RNase III. Consistent with this expression pattern, E. coli cells lacking glnZ show delayed outgrowth from nitrogen starvation compared to wild-type cells. Transcriptome-wide RNA-RNA interactome datasets indicated that GlnZ binds to multiple target RNAs. Immunoblots and fusion assays confirmed GlnZ-mediated repression of glnP and sucA, encoding proteins that contribute to glutamine transport and the citric acid cycle, respectively. Although the overall sequences of GlnZ from E. coli K-12, enterohemorrhagic E. coli and Salmonella enterica differ significantly because of various sequence insertions, all forms of the sRNA were able to regulate the two targets characterized. Together, our data show that GlnZ impacts growth of E. coli under low-nitrogen conditions by modulating genes that affect carbon and nitrogen flux.
Subjects
Ammonium Compounds , Gastrointestinal Microbiome , RNA, Small Untranslated , 3' Untranslated Regions , Carbon/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Gene Expression Regulation, Bacterial , Glutamate-Ammonia Ligase/genetics , Glutamine/genetics , Glutamine/metabolism , Nitrogen/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , RNA, Small Untranslated/genetics , RNA, Small Untranslated/metabolism , Ribonuclease III/metabolism , Transcription Factors/metabolism
ABSTRACT
The choice of guide RNA (gRNA) for CRISPR-based gene targeting is an essential step in gene editing applications, but predicting gRNA specificity remains challenging. Existing deep learning-based methods are limited by a lack of transparency and by a focus on point estimates of efficiency that disregards information on possible sources of model error. To overcome these problems, we present a new approach, a hybrid of Capsule Networks and Gaussian Processes. Our method predicts the cleavage efficiency of a gRNA with a corresponding confidence interval, which allows the user to incorporate information about possible model errors into the experimental design. We provide the first use of uncertainty estimation in computational gRNA design, which is a critical step toward accurate decision-making for future CRISPR applications. The proposed solution yields acceptable confidence intervals for most test sets and shows regression quality similar to existing models. We introduce a set of criteria for gRNA selection based on off-target cleavage efficiency and its variance, and we present a collection of pre-computed gRNAs for human chromosome 22. Using neural network interpretation methods, we show that our model rediscovers an established biological factor underlying cleavage efficiency: the importance of the seed region of the gRNA.
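The abstract does not give model details. As a hedged illustration of the uncertainty-estimation idea only, the sketch below fits a plain Gaussian-process regressor to one-hot-encoded guide sequences and returns a predictive mean with an approximate 95% interval. It omits the capsule-network feature extractor entirely. The one-hot encoding, RBF kernel, length scale, noise level and the function names are assumptions, not the authors' settings.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """Flatten a DNA string into a 4*len(seq) one-hot vector (assumed encoding; raises on non-ACGT)."""
    vec = np.zeros((len(seq), 4))
    for i, base in enumerate(seq.upper()):
        vec[i, BASES.index(base)] = 1.0
    return vec.ravel()


def rbf_kernel(a, b, length_scale=2.0, variance=1.0):
    """Squared-exponential kernel between the rows of a and b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_predict(train_seqs, train_y, test_seqs, noise=1e-2, **kernel_args):
    """Return predictive mean, std and a ~95% interval of the latent efficiency per test guide."""
    x_train = np.array([one_hot(s) for s in train_seqs])
    x_test = np.array([one_hot(s) for s in test_seqs])
    y = np.asarray(train_y, dtype=float)

    k_train = rbf_kernel(x_train, x_train, **kernel_args) + noise * np.eye(len(y))
    k_cross = rbf_kernel(x_train, x_test, **kernel_args)
    k_test = rbf_kernel(x_test, x_test, **kernel_args)

    # Cholesky factorisation instead of an explicit matrix inverse.
    chol = np.linalg.cholesky(k_train)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    mean = k_cross.T @ alpha
    v = np.linalg.solve(chol, k_cross)
    var = np.clip(np.diag(k_test) - (v**2).sum(axis=0), 0.0, None)
    std = np.sqrt(var)
    return mean, std, (mean - 1.96 * std, mean + 1.96 * std)
```

For a guide identical to a training guide the interval is narrow. For a guide far from all training data it widens toward the prior. That width is the kind of information a selection rule based on predicted efficiency and its variance could exploit.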
Subjects
CRISPR-Cas Systems , Deep Learning , Gene Editing , Gene Targeting , RNA, Guide, Kinetoplastida/genetics , Algorithms , Gene Editing/methods , Gene Targeting/methods , Genomics/methods , Humans , Neural Networks, Computer , Reproducibility of Results
ABSTRACT
Pervasive transcription of eukaryotic genomes results in expression of long non-coding RNAs (lncRNAs), most of which are poorly conserved in evolution and appear to be non-functional. However, some lncRNAs have been shown to perform specific functions, in particular transcription regulation. Thousands of small open reading frames (smORFs, <100 codons) located on lncRNAs could potentially be translated into peptides or microproteins. We report a comprehensive analysis of the conservation and evolutionary trajectories of lncRNA-smORFs from the moss Physcomitrium patens across transcriptomes of 479 plant species. Although thousands of smORFs are subject to substantial purifying selection, the majority of the smORFs appear to be evolutionarily young and could represent a major pool for functional innovation. Using nanopore RNA sequencing, we show that, on average, the transcriptional level of conserved smORFs is higher than that of non-conserved smORFs. Proteomic analysis confirmed translation of 82 novel species-specific smORFs. Numerous conserved smORFs containing low complexity regions (LCRs) or transmembrane domains were identified, and the biological function of a selected LCR-smORF was demonstrated experimentally. Thus, microproteins encoded by smORFs are a major, functionally diverse component of the plant proteome.
Subjects
Bryopsida/genetics , Open Reading Frames , Proteome , RNA, Long Noncoding , Transcriptome
ABSTRACT
In enteric bacteria, the transcription factor σ(E) maintains membrane homeostasis by inducing synthesis of proteins involved in membrane repair and two small regulatory RNAs (sRNAs) that down-regulate synthesis of abundant membrane porins. Here, we describe the discovery of a third σ(E)-dependent sRNA, MicL (mRNA-interfering complementary RNA regulator of Lpp), transcribed from a promoter located within the coding sequence of the cutC gene. MicL is synthesized as a 308-nucleotide (nt) primary transcript that is processed to an 80-nt form. Both forms possess features typical of Hfq-binding sRNAs but surprisingly target only a single mRNA, which encodes the outer membrane lipoprotein Lpp, the most abundant protein of the cell. We show that the copper sensitivity phenotype previously ascribed to inactivation of the cutC gene is actually derived from the loss of MicL and elevated Lpp levels. This observation raises the possibility that other phenotypes currently attributed to protein defects are due to deficiencies in unappreciated regulatory RNAs. We also report that σ(E) activity is sensitive to Lpp abundance and that MicL and Lpp comprise a new σ(E) regulatory loop that opposes membrane stress. Together MicA, RybB, and MicL allow σ(E) to repress the synthesis of all abundant outer membrane proteins in response to stress.
Subjects
Bacterial Outer Membrane Proteins/metabolism , Carrier Proteins/genetics , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Lipoproteins/metabolism , RNA, Small Untranslated/metabolism , Sigma Factor/metabolism , Stress, Physiological/physiology , Bacterial Outer Membrane Proteins/genetics , Intracellular Signaling Peptides and Proteins , Lipoproteins/genetics , Phenotype , Promoter Regions, Genetic/genetics , Protein Biosynthesis/physiology , RNA, Small Untranslated/genetics , Regulatory Sequences, Ribonucleic Acid/genetics
ABSTRACT
The opioid receptor (OPR) family comprises the mu-, delta- and kappa-opioid receptors and the nociceptin receptor, which belong to the superfamily of 7-transmembrane-spanning G protein-coupled receptors (GPCRs). The mu-opioid receptor is the main target for clinically used opioid analgesics, and its biology has been extensively studied. The N-terminally truncated 6TM receptor isoform produced through alternative splicing of the OPRM1 gene displays unique signaling and analgesic properties, but it is unclear whether other OPRs have the same ability. In this study, we built a comprehensive map of alternative splicing events that produce 6TM receptor variants in all the OPRs and demonstrated their evolutionary conservation. We then obtained evidence for their translation through ribosomal footprint analysis. We discovered that N-terminally truncated 6TM GPCRs are rare in the human genome and that OPRs are overrepresented in this group. Finally, we observed a significant enrichment of 6TM GPCR genes among genes associated with pain, psychiatric disorders, and addiction. Understanding the biology of 6TM receptors and leveraging this knowledge for drug development should pave the way for novel therapies.
Subjects
Alternative Splicing/genetics , Conserved Sequence/genetics , Receptors, Opioid, delta/genetics , Receptors, Opioid, kappa/genetics , Receptors, Opioid, mu/genetics , Receptors, Opioid/genetics , Animals , Cell Line, Tumor , Databases, Genetic , Genetic Variation/genetics , Humans , Macaca , Mice , Species Specificity , Nociceptin Receptor
ABSTRACT
In response to low levels of magnesium (Mg2+), the PhoQP two-component system induces the transcription of two convergent genes, one encoding a 31-amino-acid protein denoted MgtS and the other encoding a small regulatory RNA (sRNA) denoted MgrR. Previous studies showed that the MgtS protein interacts with and stabilizes the MgtA Mg2+ importer to increase intracellular Mg2+ levels, while the MgrR sRNA base pairs with the eptB mRNA, thus affecting lipopolysaccharide modification. Surprisingly, we found that overexpression of the MgtS protein also leads to induction of the PhoRB regulon. Studies to understand this activation showed that MgtS forms a complex with a second protein, PitA, a cation-phosphate symporter. Given that the previously observed additive effect of ∆mgtA and ∆mgtS mutations on intracellular Mg2+ concentrations is lost in the ∆pitA mutant, we suggest that MgtS binds to PitA and prevents Mg2+ leakage through it under Mg2+-limiting conditions. Consistent with a detrimental role of PitA in low Mg2+, we also observe MgrR sRNA repression of PitA synthesis. Thus, PhoQP induces the expression of two convergent small genes in response to Mg2+ limitation, whose products act to modulate PitA at different levels to increase intracellular Mg2+.
Subjects
Escherichia coli Proteins/metabolism , Escherichia coli/metabolism , Gene Expression Regulation, Bacterial , Magnesium/metabolism , Membrane Proteins/metabolism , Phosphate Transport Proteins/metabolism , RNA, Small Untranslated/metabolism , Escherichia coli/genetics , Gene Regulatory Networks
ABSTRACT
Despite the key role of the human ribosome in protein biosynthesis, little is known about the extent of sequence variation in ribosomal DNA (rDNA) or its pre-rRNA and rRNA products. We recovered ribosomal DNA segments from a single human chromosome 21 using transformation-associated recombination (TAR) cloning in yeast. Accurate long-read sequencing of 13 isolates covering ~0.82 Mb of the chromosome 21 rDNA complement revealed substantial variation among tandem repeat rDNA copies, several palindromic structures, and potential errors in the previous reference sequence. These clones revealed 101 variant positions in the 45S transcription unit and 235 in the intergenic spacer sequence. Approximately 60% of the 45S variants were confirmed in independent whole-genome or RNA-seq data, with 47 of these further observed in mature 18S/28S rRNA sequences. TAR cloning and long-read sequencing enabled the accurate reconstruction of multiple rDNA units and a new, high-quality 44 838 bp rDNA reference sequence, which we have annotated with variants detected from chromosome 21 of a single individual. The large number of variants observed reveals heterogeneity in human rDNA, opening up the possibility of corresponding variations in ribosome dynamics.
Subjects
Chromosomes, Human, Pair 21 , DNA, Ribosomal/chemistry , Genes, rRNA , Genetic Variation , Animals , Cell Line , Cloning, Molecular , DNA, Ribosomal/isolation & purification , DNA, Ribosomal Spacer/chemistry , Humans , Mice , Nucleic Acid Conformation , Nucleolus Organizer Region/chemistry , RNA, Ribosomal/chemistry , RNA, Ribosomal/metabolism , Sequence Analysis, DNA
ABSTRACT
Serine is the only amino acid that is encoded by two disjoint codon sets, so that a tandem substitution of two nucleotides is required to switch between the two sets. Previously published evidence suggests that, for the most evolutionarily conserved serines, the codon set switch occurs by simultaneous substitution of two nucleotides. Here we report a genome-wide reconstruction of the evolution of serine codons in triplets of closely related species from diverse prokaryotes and eukaryotes. The results indicate that the great majority of codon set switches proceed by two consecutive nucleotide substitutions, via a threonine or cysteine intermediate, and are driven by selection. These findings imply strong purifying selection in protein evolution, which in the case of serine codon set switches acts through an initial deleterious substitution quickly followed by a second, compensatory substitution. The result is frequent reversal of amino acid replacements and, at short evolutionary distances, pervasive homoplasy.
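The codon arithmetic behind this argument can be checked directly from the standard genetic code (NCBI translation table 1). Serine is encoded by TCN and AGY, and no codon in one set is a single substitution away from a codon in the other. The only two-step paths between the sets run TCT to AGT and TCC to AGC, and each passes through a threonine (ACY) or cysteine (TGY) codon. The helper names below are illustrative only.

```python
from itertools import product

BASES = "TCAG"
# Standard code (translation table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

SERINE = sorted(c for c, aa in CODE.items() if aa == "S")
TCN = [c for c in SERINE if c.startswith("TC")]
AGY = [c for c in SERINE if c.startswith("AG")]


def hamming(a, b):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def two_step_intermediates(src, dst):
    """Codons one substitution away from both src and dst (empty unless they differ at 2 sites)."""
    if hamming(src, dst) != 2:
        return []
    return [c for c in CODE if hamming(src, c) == 1 and hamming(c, dst) == 1]


# Smallest number of substitutions separating the two serine codon sets.
min_distance = min(hamming(a, b) for a in TCN for b in AGY)

paths = [(src, mid, dst) for src in TCN for dst in AGY
         for mid in two_step_intermediates(src, dst)]
for src, mid, dst in paths:
    print(f"{src} -> {mid} ({CODE[mid]}) -> {dst}")
```

Every listed intermediate encodes Thr or Cys, so a two-substitution switch necessarily passes through a transient amino acid replacement. That is the step the abstract attributes to selection.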
Subjects
Codon/genetics , Serine/genetics , Animals , Archaea/genetics , Bacteria/genetics , Evolution, Molecular , Humans , Mutation , Saccharomyces/genetics , Selection, Genetic
ABSTRACT
Specific structures in mRNA modulate translation rate and thus can affect protein folding. Using protein structures from two eukaryotes and three prokaryotes, we explore the connections between protein compactness, inferred from solvent accessibility, and mRNA structure, inferred from mRNA folding energy (ΔG). In both prokaryotes and eukaryotes, the ΔG value of the most stable 30-nucleotide segment of the mRNA (ΔGmin) correlates strongly and positively with protein solvent accessibility. Thus, mRNAs containing exceptionally stable secondary structure elements typically encode compact proteins. The correlations between ΔG and protein compactness are much more pronounced in the predicted ordered parts of proteins than in the predicted disordered parts, indicating an important role of mRNA secondary structure elements in the control of protein folding. Additionally, ΔG correlates with mRNA length and with the evolutionary rate at synonymous positions. These correlations are partially independent and were used to construct multiple regression models that explain about half of the variance of protein solvent accessibility. These findings suggest a model in which mRNA structure, particularly exceptionally stable RNA structural elements, acts as a gauge of protein co-translational folding by reducing ribosome speed when the nascent peptide needs time to form and optimize its core structure.
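ΔGmin, as defined here, is the lowest folding energy among the 30-nt windows of an mRNA. The sketch below computes it with a pluggable energy function. Two choices are assumptions not stated in the abstract: a 1-nt step between windows, and the names `delta_g_min` and `gc_proxy_energy`. The default energy is a GC-count placeholder that lets the example run without dependencies; it is not a thermodynamic model. A real analysis would call an RNA folding package instead.

```python
from typing import Callable


def gc_proxy_energy(segment: str) -> float:
    """Placeholder energy: -1 per G or C. NOT a real free-energy model."""
    return -float(sum(base in "GC" for base in segment.upper()))


def delta_g_min(mrna: str, window: int = 30, step: int = 1,
                energy: Callable[[str], float] = gc_proxy_energy) -> tuple[float, int]:
    """Return (lowest window energy, start index) over all windows of the given length."""
    if len(mrna) < window:
        raise ValueError("sequence shorter than the window")
    best_energy, best_start = float("inf"), -1
    for start in range(0, len(mrna) - window + 1, step):
        e = energy(mrna[start:start + window])
        if e < best_energy:  # more negative = more stable
            best_energy, best_start = e, start
    return best_energy, best_start
```

With the ViennaRNA Python bindings installed, one could pass `energy=lambda s: RNA.fold(s)[1]`, since `RNA.fold` returns a structure string and its minimum free energy.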
Subjects
Protein Folding , RNA, Messenger/physiology , Animals , Base Composition , Humans , Kinetics , Linear Models , Models, Molecular , Nucleic Acid Conformation , Protein Biosynthesis , Protein Structure, Secondary , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , RNA Stability , RNA, Messenger/chemistry , Thermodynamics , Transcriptome
ABSTRACT
MOTIVATION: Target-specific hybridization depends on oligo-probe characteristics that improve hybridization specificity and minimize genome-wide cross-hybridization. The interplay between specific hybridization and genome-wide cross-hybridization has been insufficiently studied, despite its crucial role in efficient probe design and in data analysis. RESULTS: In this study, we defined hybridization specificity as the ratio between an oligo's target-specific hybridization and its genome-wide cross-hybridization. A microarray database, derived from the Genomic Comparison Hybridization (GCH) experiment performed on the Affymetrix platform, contains two different types of probes. Probes of the first type do not have a specific target in the genome, and their hybridization signals derive from genome-wide cross-hybridization alone. The second type includes oligonucleotides that have a specific target in the genomic DNA, and their total signals combine specific and cross-hybridization components. A comparative analysis of the hybridization specificity of oligo-probes, as well as of their nucleotide sequences and thermodynamic features, was performed on the database. The comparison revealed that hybridization specificity was negatively affected by low stability of the fully paired oligo-target duplex, stable probe self-folding, G-rich content (including GGG motifs), low sequence complexity, and nucleotide composition symmetry. CONCLUSION: Filtering out probes with the defined 'negative' characteristics significantly increases specific hybridization and dramatically decreases genome-wide cross-hybridization. On average, the selected oligo-probes have twice the hybridization specificity of the probes that were filtered out by applying the suggested cutoff thresholds to the described parameters. A new approach for efficient oligo-probe design is described in our study.
CONTACT: shabalin@ncbi.nlm.nih.gov or olga.matveeva@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
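The specificity ratio and two of the probe filters named in this abstract can be sketched as follows. How the cross-hybridization component of a target-specific probe is estimated is an assumption here: it is taken as the mean signal of non-target probes supplied by the caller, for example probes of similar composition. The G-fraction cutoff and the function names are hypothetical, not the thresholds derived in the study. Duplex stability, self-folding, complexity and symmetry filters are not shown.

```python
import statistics


def specificity_ratio(total_signal: float, cross_signals: list[float]) -> float:
    """Specific / cross-hybridization, where specific = total - estimated cross (assumed model)."""
    cross = statistics.mean(cross_signals)
    if cross <= 0:
        raise ValueError("cross-hybridization estimate must be positive")
    specific = max(total_signal - cross, 0.0)
    return specific / cross


def passes_filters(probe: str, max_g_fraction: float = 0.4) -> bool:
    """Reject probes containing a GGG motif or with a G fraction above an illustrative cutoff."""
    probe = probe.upper()
    g_fraction = probe.count("G") / len(probe)
    return "GGG" not in probe and g_fraction <= max_g_fraction
```

Keeping the cross-hybridization estimate as an explicit input makes the assumption visible. A different background model only changes what is passed in, not how the ratio is defined.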
Subjects
Genome , Nucleic Acid Hybridization , Oligonucleotide Array Sequence Analysis , Signal-To-Noise Ratio , DNA Probes , Gene Expression Profiling , Genomics , Oligonucleotides , Sensitivity and Specificity
ABSTRACT
Comparison of mRNA and protein structures shows that highly structured mRNAs typically encode compact protein domains, suggesting that mRNA structure controls protein folding. This function is apparently performed by distinct structural elements in the mRNA, which implies 'fine tuning' of mRNA structure under selection for optimal protein folding. We find that, during evolution, changes in mRNA folding energy follow amino acid replacements, reinforcing the notion of an intimate connection between the structures of an mRNA and the protein it encodes, and of the double encoding of protein sequence and folding in the mRNA.
Subjects
Adaptation, Biological , Nucleic Acid Conformation , Protein Biosynthesis , Protein Folding , RNA, Messenger/chemistry , RNA, Messenger/genetics , Animals , Biological Evolution , Humans , RNA Stability , Selection, Genetic , Structure-Activity Relationship
ABSTRACT
Alternative splicing (AS), alternative transcription initiation (ATI) and alternative transcription termination (ATT) create the extraordinary complexity of transcriptomes and make key contributions to the structural and functional diversity of mammalian proteomes. Analysis of mammalian genomic and transcriptomic data shows that contrary to the traditional view, the joint contribution of ATI and ATT to the transcriptome and proteome diversity is quantitatively greater than the contribution of AS. Although the mean numbers of protein-coding constitutive and alternative nucleotides in gene loci are nearly identical, their distribution along the transcripts is highly non-uniform. On average, coding exons in the variable 5' and 3' transcript ends that are created by ATI and ATT contain approximately four times more alternative nucleotides than core protein-coding regions that diversify exclusively via AS. Short upstream exons that encompass alternative 5'-untranslated regions and N-termini of proteins evolve under strong nucleotide-level selection whereas in 3'-terminal exons that encode protein C-termini, protein-level selection is significantly stronger. The groups of genes that are subject to ATI and ATT show major differences in biological roles, expression and selection patterns.
Subjects
Evolution, Molecular , Protein Isoforms/genetics , Transcription Initiation, Genetic , Transcription Termination, Genetic , Animals , Genetic Variation , Humans , Mice , Proteome , Transcriptome
Subjects
Gene Editing/methods , Nucleic Acid Hybridization/methods , Artifacts , Base Pair Mismatch/genetics , Base Pair Mismatch/physiology , CRISPR-Cas Systems/genetics , CRISPR-Cas Systems/physiology , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , Gene Editing/trends , Humans , Kinetics , Thermodynamics
ABSTRACT
Messenger RNA is a key component of an intricate regulatory network of its own. It accommodates numerous nucleotide signals that overlap protein coding sequences and are responsible for multiple levels of regulation and generation of biological complexity. The wealth of structural and regulatory information that mRNA carries in addition to the encoded amino acid sequence raises the question of how these signals and overlapping codes are delineated along non-synonymous and synonymous positions in protein coding regions, especially in eukaryotes. Silent or synonymous codon positions, which do not determine the amino acid sequences of the encoded proteins, define mRNA secondary structure and stability and affect the rate of translation, folding and post-translational modifications of nascent polypeptides. RNA-level selection acts on synonymous sites in both prokaryotes and eukaryotes and is more common than previously thought. Selection pressure on coding regions follows a three-nucleotide periodic pattern of nucleotide base-pairing in mRNA, which is imposed by the genetic code. Synonymous positions of coding regions have a higher hybridization potential than non-synonymous positions, and they are multifunctional in their regulatory and structural roles. Recent experimental evidence and analyses of mRNA structure and interspecies conservation suggest that there is an evolutionary tradeoff between selective pressures acting at the RNA and protein levels. Here we provide a comprehensive overview of the studies that define the role of silent positions in regulating RNA structure and processing, which exert downstream effects on proteins and their functions.
Subjects
Gene Expression Regulation , RNA, Messenger/chemistry , Codon , Evolution, Molecular , Nucleotides/chemistry , Protein Biosynthesis , Proteins/genetics , RNA Folding , RNA Stability , Regulatory Sequences, Ribonucleic Acid
ABSTRACT
We compare the sets of experimentally validated long intergenic non-coding (linc)RNAs from human and mouse and apply a maximum likelihood approach to estimate the total number of lincRNA genes as well as the size of the conserved part of the lincRNome. Under the assumption that the sets of experimentally validated lincRNAs are random samples of the lincRNomes of the corresponding species, we estimate the total lincRNome size at approximately 40,000 to 50,000 distinct lincRNAs, at least twice the number of protein-coding genes. We further estimate that the fraction of the human and mouse euchromatic genomes encoding lincRNAs is more than twofold greater than the fraction of protein-coding sequences. Although the sequences of most lincRNAs are much less strongly conserved than protein sequences, the extent of orthology between the lincRNomes is unexpectedly high, with 60 to 70% of the lincRNA genes shared between human and mouse. The orthologous mammalian lincRNAs can be predicted to perform equivalent functions; accordingly, it appears likely that thousands of evolutionarily conserved functional roles of lincRNAs remain to be characterized.
Subjects
Genome Size , Genome , RNA, Long Noncoding/genetics , Animals , Databases, Genetic , Genomics , Humans , Mice
ABSTRACT
Aim: More than one million people died from COVID-19 in the United States. This study investigates factors influencing pandemic mortality rates across U.S. states during different waves of SARS-CoV-2 infection, from February 2020 to April 2023. Methods: We performed statistical analyses and used linear regression models to estimate age-adjusted and unadjusted excess mortality as functions of life expectancy, vaccination rates, and GDP per capita in U.S. states. Results and Discussion: States with lower life expectancy and lower GDP per capita experienced significantly higher mortality rates during the pandemic, underscoring the critical role of underlying health conditions and healthcare infrastructure, as reflected in these factors. When states were categorized by vaccination rate, significant differences in GDP per capita and pre-pandemic life expectancy emerged between states with lower and higher vaccination rates, likely explaining mortality disparities before mass vaccination. During the Delta and Omicron BA.1 waves, when vaccines were widely available, the mortality gap widened, and states with lower vaccination rates experienced nearly double the mortality of states with higher vaccination rates (odds ratio 1.8, 95% CI 1.7-1.9, p < 0.01). This disparity disappeared during the later Omicron variants, likely because the levels of combined immunity from vaccination and widespread infection across state populations became comparable. We showed that vaccination rates were the only significant factor influencing age-adjusted mortality, highlighting the substantial impact of age-specific demographics on both life expectancy and GDP across states. Conclusion: The study underscores the critical role of high vaccination rates in reducing excess deaths across all states, regardless of economic status. Vaccination rates proved more decisive than GDP per capita in reducing excess deaths.
Additionally, states with lower pre-pandemic life expectancy faced greater challenges, reflecting the combined effects of healthcare quality, demographic variations, and social determinants of health. These findings call for comprehensive public health strategies that address both immediate interventions, like vaccination, and long-term improvements in healthcare infrastructure and social conditions.
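A minimal ordinary-least-squares sketch of the kind of model described in Methods: excess mortality regressed on life expectancy, vaccination rate and GDP per capita. All numbers below are synthetic values generated for illustration only, not the study's state-level data. The study's exact specification is not reproduced, including transformations, age adjustment and wave definitions.

```python
import numpy as np


def fit_ols(predictors: np.ndarray, outcome: np.ndarray) -> np.ndarray:
    """Return [intercept, coef_1, ..., coef_k] from ordinary least squares."""
    design = np.column_stack([np.ones(len(outcome)), predictors])
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs


# Synthetic example: 50 "states" with made-up predictor values.
rng = np.random.default_rng(0)
life_expectancy = rng.uniform(74, 81, size=50)      # years
vaccination_rate = rng.uniform(0.5, 0.9, size=50)   # fraction vaccinated
gdp_per_capita = rng.uniform(40, 90, size=50)       # thousand USD
X = np.column_stack([life_expectancy, vaccination_rate, gdp_per_capita])

# Outcome built from chosen coefficients plus small noise (illustrative only).
true_coefs = np.array([2000.0, -15.0, -600.0, -2.0])
excess_deaths_per_100k = true_coefs[0] + X @ true_coefs[1:] + rng.normal(0, 5, size=50)

estimated = fit_ols(X, excess_deaths_per_100k)
print(estimated.round(2))  # intercept, life expectancy, vaccination, GDP
```

Because the synthetic outcome was generated from known coefficients, the fit can be checked against them. With real data, collinearity between GDP per capita and life expectancy would need attention before reading individual coefficients.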
ABSTRACT
Microproteins encoded by small open reading frames (smORFs) comprise the "dark matter" of proteomes. Although functional microproteins have been identified in diverse organisms from all three domains of life, bacterial smORFs remain poorly characterized. In this comprehensive study of intergenic smORFs (ismORFs, 15-70 codons) in 5,668 bacterial genomes of the family Enterobacteriaceae, we identified 67,297 clusters of ismORFs subject to purifying selection. The ismORFs mainly code for hydrophobic, potentially transmembrane, unstructured, or minimally structured microproteins. Using AlphaFold Multimer, we predicted interactions between some of the microproteins encoded by transcribed ismORFs and proteins encoded by neighboring genes, revealing the potential of microproteins to regulate the activity of various proteins, particularly under stress. We compiled a catalog of predicted microprotein families with different levels of evidence from synteny analysis, structure prediction, and transcription and translation data. This study offers a resource for investigating the biological functions of microproteins.
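A simplified scanner for candidate small ORFs in the 15-70 codon range used here. Several choices are assumptions, not taken from the study. Only ATG starts are considered, although bacteria also use GTG and TTG. Length counts codons from the start codon up to, but not including, the stop. Both strands are scanned, and nested ORFs sharing a stop codon are reported only for the most upstream ATG. The function name is hypothetical. The study additionally restricted candidates to intergenic regions and tested them for purifying selection, which is not shown.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_small_orfs(seq: str, min_codons: int = 15, max_codons: int = 70):
    """Return (strand, start, end, n_codons) tuples.

    Coordinates are 0-based, half-open, on the forward strand, and include the stop codon.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None  # first ATG seen since the last stop in this frame
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    n_codons = (i - start) // 3  # start codon included, stop excluded
                    if min_codons <= n_codons <= max_codons:
                        end = i + 3
                        if strand == "+":
                            hits.append((strand, start, end, n_codons))
                        else:  # map reverse-strand coordinates back to the forward strand
                            hits.append((strand, len(s) - end, len(s) - start, n_codons))
                    start = None
    return sorted(hits, key=lambda h: (h[1], h[0]))
```

For example, `find_small_orfs("CC" + "ATG" + "GCT" * 19 + "TAA" + "CC")` reports one 20-codon ORF on the forward strand spanning positions 2 to 65.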
ABSTRACT
Ribosomal DNA (rDNA) repeat units are organized into tandem clusters in eukaryotic cells. In mice, these clusters are located on at least eight chromosomes and show extensive variation in the number of repeats between mouse genomes. To analyze intra- and inter-genomic variation of mouse rDNA repeats, we selectively isolated 25 individual rDNA units using Transformation-Associated Recombination (TAR) cloning. Long-read sequencing and subsequent comparative sequence analysis revealed that each full-length unit comprises an intergenic spacer (IGS) and a ~13.4 kb transcribed region encoding the three rRNAs, with substantial variability in rDNA unit size, ranging from ~35 to ~46 kb. Within the transcribed regions of rDNA units, we found 209 variants, 70 of which are in the external transcribed spacers (ETSs); however, the rDNA size differences are driven primarily by IGS size heterogeneity, owing to indels containing repetitive elements and some functional signals such as enhancers. Further evolutionary analysis grouped rDNA units into distinct clusters with characteristic IGS lengths, numbers of enhancers, and presence or absence of two common SNPs in promoter regions, one of which is located within the promoter (p)RNA and may influence pRNA folding stability. These characteristic IGS features also correlated significantly with previously described 5'ETS variant patterns that are associated with differential expression of rDNA units. Our results suggest that variant rDNA units are differentially regulated, and they open a route to investigate the role of rDNA variation in nucleolar formation and its possible associations with pathology.