Results 1 - 20 of 33
1.
Bull Math Biol ; 85(4): 24, 2023 02 24.
Article in English | MEDLINE | ID: mdl-36826719

ABSTRACT

Based on the circular code theory, we define a new function f that quantifies the property of reading frame retrieval (RFR) of genes from their codon usage. This RFR function f is computed on a massive scale in genes of genomes of bacteria, eukaryotes and archaea. By expressing f as a function of the mean number [Formula: see text] of codons per gene, a "universal" property is identified, whatever the kingdom: the reading frame retrieval is enhanced in large genes. By investigating this property according to the theory developed, a Spearman's rank correlation with a strong negative coefficient is observed between the codon usage dispersion d (from the uniform codon distribution [Formula: see text]) and the RFR function f, whatever the kingdom (p-values [Formula: see text] in bacteria, [Formula: see text] in eukaryotes and [Formula: see text] in archaea). Thus, the reading frame retrieval is enhanced with the codon usage dispersion. Furthermore, this approach identifies a "genome centre" from which emerge two distinct "genome arms": an upper arm and a lower arm, respectively, above and below the linear regression. The RFR function by itself or combined with classical methods (alignment, phylogeny) could also be a new approach to classify the genomes in the future.
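
As a concrete illustration of the statistic named in this abstract, here is a minimal Python sketch of Spearman's rank correlation (Pearson correlation computed on ranks, with tied values sharing their mean rank). The function names and the values in `toy_d` and `toy_f` are invented for illustration; they are not the paper's definitions of d and f, nor data from the paper.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy values (NOT data from the paper): f falls as d rises.
toy_d = [0.10, 0.25, 0.40, 0.55]
toy_f = [0.90, 0.70, 0.65, 0.30]
rho = spearman_rho(toy_d, toy_f)
```

With a strictly decreasing relation such as this toy one, rho is -1. The sketch does not compute p-values and would divide by zero if either input were constant.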


Subject(s)
Codon Usage , Genetic Code , Biological Models , Genetic Models , Mathematical Concepts , Codon , Reading Frames , Bacteria/genetics , Eukaryota
2.
RNA ; 25(12): 1714-1730, 2019 12.
Article in English | MEDLINE | ID: mdl-31506380

ABSTRACT

The origin of the genetic code remains enigmatic five decades after it was elucidated, although there is growing evidence that the code coevolved progressively with the ribosome. A number of primordial codes were proposed as ancestors of the modern genetic code, including comma-free codes such as the RRY, RNY, or GNC codes (R = G or A, Y = C or T, N = any nucleotide), and the X circular code, an error-correcting code that also allows identification and maintenance of the reading frame. It was demonstrated previously that motifs of the X circular code are significantly enriched in the protein-coding genes of most organisms, from bacteria to eukaryotes. Here, we show that imprints of this code also exist in the ribosomal RNA (rRNA). In a large-scale study involving 133 organisms representative of the three domains of life, we identified 32 universal X motifs that are conserved in the rRNA of >90% of the organisms. Intriguingly, most of the universal X motifs are located in rRNA regions involved in important ribosome functions, notably in the peptidyl transferase center and the decoding center that form the original "proto-ribosome." Building on the existing accretion models for ribosome evolution, we propose that error-correcting circular codes represented an important step in the emergence of the modern genetic code. Thus, circular codes would have allowed the simultaneous coding of amino acids and synchronization of the reading frame in primitive translation systems, prior to the emergence of more sophisticated start codon recognition and translation initiation mechanisms.


Subject(s)
Molecular Evolution , Genetic Code , Nucleotide Motifs , Protein Biosynthesis , Ribosomes/genetics , Ribosomes/metabolism , Biological Models , Molecular Models , Molecular Conformation , Nucleic Acid Conformation , Ribosomal RNA/chemistry , Ribosomal RNA/genetics , Ribosomes/chemistry , Structure-Activity Relationship
3.
Virol J ; 17(1): 131, 2020 08 27.
Article in English | MEDLINE | ID: mdl-32854725

ABSTRACT

BACKGROUND: COVID-19 is caused by the SARS-CoV-2 virus, a novel member of the coronavirus (CoV) family. CoV genomes code for an ORF1a / ORF1ab polyprotein and four structural proteins widely studied as major drug targets. The genomes also contain a variable number of open reading frames (ORFs) coding for accessory proteins that are not essential for virus replication, but appear to have a role in pathogenesis. The accessory proteins have been less well characterized and are difficult to predict by classical bioinformatics methods. METHODS: We propose a computational tool GOFIX to characterize potential ORFs in virus genomes. In particular, ORF coding potential is estimated by searching for enrichment in motifs of the X circular code, which is known to be over-represented in the reading frames of viral genes. RESULTS: We applied GOFIX to study the SARS-CoV-2 and related genomes including SARS-CoV and SARS-like viruses from bat, civet and pangolin hosts, focusing on the accessory proteins. Our analysis provides evidence supporting the presence of overlapping ORFs 7b, 9b and 9c in all the genomes and thus helps to resolve some differences in current genome annotations. In contrast, we predict that ORF3b is not functional in all genomes. Novel putative ORFs were also predicted, including a truncated form of the ORF10 previously identified in SARS-CoV-2 and a little-known ORF overlapping the Spike protein in Civet-CoV and SARS-CoV. CONCLUSIONS: Our findings contribute to characterizing sequence properties of accessory genes of SARS coronaviruses, and especially the newly acquired genes making use of overlapping reading frames.


Subject(s)
Betacoronavirus/genetics , Viral Genome , Open Reading Frames , Severe Acute Respiratory Syndrome-Related Coronavirus/genetics , Viral Regulatory and Accessory Proteins/genetics , Animals , Codon , Computational Biology , Molecular Evolution , Viral Genes , Humans , SARS-CoV-2 , Coronavirus Spike Glycoprotein/chemistry , Coronavirus Spike Glycoprotein/genetics , Viral Matrix Proteins/genetics , Viral Proteins/chemistry , Viral Proteins/genetics , Viral Regulatory and Accessory Proteins/chemistry
4.
Naturwissenschaften ; 107(3): 20, 2020 May 04.
Article in English | MEDLINE | ID: mdl-32367155

ABSTRACT

Stereochemical nucleotide-amino acid interactions, in the form of noncovalent nucleotide-amino acid interactions, potentially produced the genetic code's codon-amino acid assignments. Empirical estimates of single nucleotide-amino acid affinities on surfaces and in solution are used to test whether trinucleotide-amino acid affinities determined genetic code assignments under the principle "first arrived, first served": presumed early amino acids have greater codon-amino acid affinities than later ones. Here, these single nucleotide affinities are used to approximate all 64 × 20 trinucleotide-amino acid affinities. Analyses show that (1) on surfaces, genetic code codon-amino acid assignments tend to match high affinities for the amino acids that were integrated earliest into the genetic code (according to Wong's metabolic coevolution hypothesis between nucleotides and amino acids) and (2) in solution, the same principle holds for the anticodon-amino acid assignments. Affinity analyses best match genetic code assignments when assuming that trinucleotides competed for amino acids, rather than amino acids for trinucleotides. Codon-amino acid affinities fit genetic code assignments better than anticodon-amino acid affinities. Presumably, two independent coding systems, on surfaces and in solution, converged, and formed the current translation system. Proto-translation on surfaces by direct codon-amino acid interactions without tRNA-like adaptors coadapted with a system emerging in solution by proto-tRNA anticodon-amino acid interactions. These systems assigned identical or similar cognates to codons on surfaces and to anticodons in solution. Results indicate that a prebiotic metabolism predated genetic code self-organization.


Subject(s)
Amino Acids/chemistry , Amino Acids/metabolism , Codon/chemistry , Codon/metabolism , Biological Evolution , Codon/genetics , Stereoisomerism
5.
RNA Biol ; 17(4): 571-583, 2020 04.
Article in English | MEDLINE | ID: mdl-31960748

ABSTRACT

Three-base periodicity (TBP), where nucleotides and higher order n-tuples are preferentially spaced by 3, 6, 9, etc. bases, is a well-known intrinsic property of protein-coding DNA sequences. However, its origins are still not fully understood. One hypothesis is that the periodicity reflects a primordial coding system that was used before the emergence of the modern standard genetic code (SGC). Recent evidence suggests that the X circular code, a set of 20 trinucleotides allowing the reading frames in genes to be retrieved locally, represents a possible ancestor of the SGC. Motifs from the X circular code have been found in the reading frame of protein-coding regions in extant organisms from bacteria to eukaryotes, in many transfer RNA (tRNA) genes and in important functional regions of the ribosomal RNA (rRNA), notably in the peptidyl transferase centre and the decoding centre. Here, we have used a powerful correlation function to search for periodicity patterns involving the 20 trinucleotides of the X circular code in a large set of bacterial protein-coding genes, as well as in the translation machinery, including rRNA and tRNA sequences. As might be expected, we found a strong circular code periodicity 0 modulo 3 in the protein-coding genes. More surprisingly, we also identified a similar circular code periodicity in a large region of the 16S rRNA. This region includes the 3' major domain corresponding to the primordial proto-ribosome decoding centre and containing numerous sites that interact with the tRNA and messenger RNA (mRNA) during translation. Furthermore, 3D structural analysis shows that the periodicity region surrounds the mRNA channel that lies between the head and the body of the SSU. Our results support the hypothesis that the X circular code may constitute an ancestral translation code involved in reading frame retrieval and maintenance, traces of which persist in modern mRNA, tRNA and rRNA despite their long evolution and adaptation to the SGC.


Subject(s)
Bacteria/genetics , Bacterial Proteins/genetics , Computational Biology/methods , Ribosomes/genetics , Algorithms , Bacteria/metabolism , Molecular Evolution , Genetic Code , Periodicity , Bacterial RNA/genetics , Ribosomal RNA/genetics , Transfer RNA/genetics
6.
Bull Math Biol ; 82(8): 105, 2020 08 04.
Article in English | MEDLINE | ID: mdl-32754878

ABSTRACT

A code X is k-circular if any concatenation of at most k words from X, when read on a circle, admits exactly one partition into words from X. It is circular if it is k-circular for every integer k. While it is not a priori clear from the definition, there exists, for every pair [Formula: see text], an integer k such that every k-circular [Formula: see text]-letter code over an alphabet of cardinality n is circular, and we determine the least such integer k for all values of n and [Formula: see text]. The k-circular codes may represent an important evolutionary step between the circular codes, such as the comma-free codes, and the genetic code.
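
The circularity property defined here can be tested mechanically for trinucleotide codes. The Python sketch below does not use the definition directly; it relies on a graph criterion from the circular code literature (Fimmel, Michel and Strüngmann), which is not stated in this abstract: each word b1b2b3 contributes the edges b1 -> b2b3 and b1b2 -> b3, and the code is circular exactly when the resulting directed graph has no cycle. The function names are illustrative, and this is not the method used to obtain the paper's bounds on k.

```python
def code_graph(code):
    """Build the directed graph of a trinucleotide code.

    Each word b1b2b3 contributes two edges: b1 -> b2b3 and b1b2 -> b3.
    """
    graph = {}
    for word in code:
        for src, dst in ((word[0], word[1:]), (word[:2], word[2])):
            graph.setdefault(src, set()).add(dst)
            graph.setdefault(dst, set())
    return graph

def is_circular(code):
    """Return True if the graph of the code contains no directed cycle."""
    graph = code_graph(code)
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    colour = {v: WHITE for v in graph}

    def has_cycle_from(v):
        colour[v] = GREY
        for w in graph[v]:
            if colour[w] == GREY:  # back edge: a cycle exists
                return True
            if colour[w] == WHITE and has_cycle_from(w):
                return True
        colour[v] = BLACK
        return False

    return not any(colour[v] == WHITE and has_cycle_from(v) for v in graph)

# ACG and CGA are cyclic permutations of each other, so on a circle the
# word ACG can also be read as CGA: the code {ACG, CGA} is not circular.
not_circular = is_circular({"ACG", "CGA"})
# A two-word code with no such ambiguity.
circular = is_circular({"AAC", "GTT"})
```

A depth-first search is enough here because the graph of a trinucleotide code has at most 4 + 16 vertices.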


Subject(s)
Genetic Models , Biological Evolution , Genetic Code , Mathematical Concepts , Nucleotides
7.
Bull Math Biol ; 82(4): 48, 2020 04 04.
Article in English | MEDLINE | ID: mdl-32248310

ABSTRACT

The origin of the modern genetic code and the mechanisms that have contributed to its present form raise many questions. The main goal of this work is to test two hypotheses concerning the development of the genetic code for their compatibility and complementarity and see if they could benefit from each other. On the one hand, Gonzalez, Giannerini and Rosa developed a theory, based on four-base codons, which they called tesserae. This theory can explain the degeneracy of the modern vertebrate mitochondrial code. On the other hand, in the 1990s, so-called circular codes were discovered in nature, which seem to ensure the maintenance of a correct reading frame during the translation process. It turns out that the two concepts not only do not contradict each other, but on the contrary complement and enrich each other.


Subject(s)
Molecular Evolution , Genetic Code , Genetic Models , Animals , Codon , Mitochondrial Genes , Humans , Mathematical Concepts , Protein Biosynthesis , Reading Frames , Vertebrates/genetics
8.
J Theor Biol ; 471: 108-116, 2019 06 21.
Article in English | MEDLINE | ID: mdl-30935956

ABSTRACT

BACKGROUND: Theoretical minimal RNA rings form stem-loop hairpins coding for each of the 20 amino acids and a stop, presumably mimicking life's first minimal coding and self-replicating RNAs. They resemble consensual tRNAs. Mean amino acid positions in proteins follow the genetic code's consensual amino acid inclusion order, a 5'-late-to-3'-early amino acid gradient. HYPOTHESIS: We translated minimal RNA rings to test whether translated peptides share that gradient with modern proteins, using a) ribosomal translation, non-overlapping consecutive codons; and b) frameless translation advancing nucleotide by nucleotide, producing partially overlapping codons. RESULTS: For frameless translation, most RNA rings code for a 5'-late-to-3'-early amino acid gradient. Gradients indicate decreasing amino acid metabolic costs, from large to small amino acids. For ribosomal translation, the 5'-late-to-3'-early amino acid gradient evolves from early to late RNA rings when ranked according to yields in Miller's experiment of their predicted anticodon's cognate amino acid. CONCLUSIONS: Simulations that produced in silico minimal RNA rings did not account for coded amino acid properties. Yet, the peptides produced resemble actual proteins, and suggest ancestral frameless translation of partially overlapping trinucleotides advancing by single nucleotide steps, constrained by resource scarcity. Minimal RNA rings reflect the transition from frameless to ribosomal translation and are realistic candidates for ancestral tRNAs.


Subject(s)
Amino Acids , Codon , Biological Models , Nucleic Acid Conformation , Protein Biosynthesis , Transfer RNA , Ribosomes/metabolism , Amino Acids/genetics , Amino Acids/metabolism , Peptides/genetics , Peptides/metabolism , Transfer RNA/genetics , Transfer RNA/metabolism
9.
J Theor Biol ; 408: 198-212, 2016 11 07.
Article in English | MEDLINE | ID: mdl-27444403

ABSTRACT

A set X of 20 trinucleotides was identified in genes of bacteria, eukaryotes, plasmids and viruses, which has on average the highest occurrence in reading frame compared to its two shifted frames (Michel, 2015; Arquès and Michel, 1996). This set X has an interesting mathematical property as X is a circular code (Arquès and Michel, 1996). Thus, the motifs from this circular code X, called X motifs, have the property to always retrieve, synchronize and maintain the reading frame in genes. In this paper, we develop several statistical analyses of X motifs in 138 available complete genomes of eukaryotes in which genes as well as non-gene regions are examined. Large X motifs (with lengths of at least 15 consecutive trinucleotides of X and compositions of at least 10 different trinucleotides of X among 20) have the highest occurrence in genomes of eukaryotes compared to its 23 large bijective motifs, its two large permuted motifs and large random motifs. The largest X motifs identified in eukaryotic genomes are presented, e.g. an X motif in a non-gene region of the genome of Solanum pennellii with a length of 155 trinucleotides (465 nucleotides) and an expectation E = 10^-71. In the human genome, the largest X motif occurs in a non-gene region of chromosome 13 with a length of 36 trinucleotides and an expectation E = 10^-11. X motifs in non-gene regions of genomes could be evolutionary relics of primitive genes using the circular code for translation. However, the proportion of X motifs (with lengths of at least 10 consecutive trinucleotides of X and compositions of at least 5 different trinucleotides of X among 20) in genes/non-genes of the 138 complete eukaryotic genomes is about 8. Thus, the X motifs occur preferentially in genes, as expected from the previous works of 20 years.
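
The notion of an X motif with a minimum length and a minimum composition can be sketched in a few lines of Python. The abstract does not list the 20 trinucleotides; the set below is the X code as published by Arquès and Michel (1996). The function name `find_x_motifs`, its parameters and the short test sequence are hypothetical, and the sketch assumes an uppercase DNA string. It is not the authors' software.

```python
# The 20 trinucleotides of the X circular code (Arquès and Michel, 1996).
X_CODE = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})

def find_x_motifs(seq, frame=0, min_len=3, min_distinct=1):
    """Find maximal runs of consecutive X trinucleotides in one frame.

    Returns a list of (start_position, length_in_trinucleotides) for every
    run with at least `min_len` trinucleotides, of which at least
    `min_distinct` are different.
    """
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    motifs = []
    run_start, run = None, []
    # The trailing None is a sentinel that closes a run ending at the
    # last trinucleotide of the sequence.
    for idx, codon in enumerate(codons + [None]):
        if codon in X_CODE:
            if not run:
                run_start = frame + 3 * idx
            run.append(codon)
        else:
            if len(run) >= min_len and len(set(run)) >= min_distinct:
                motifs.append((run_start, len(run)))
            run = []
    return motifs

# Hypothetical sequence: GGG, then the X trinucleotides AAC GTT GAA, then CCC.
example = "GGG" + "AACGTTGAA" + "CCC"
motifs = find_x_motifs(example, frame=0, min_len=3)
```

Calling the function with `min_len=15, min_distinct=10` corresponds to the thresholds the abstract gives for large X motifs, and `min_len=10, min_distinct=5` to those used for the genes/non-genes proportion.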


Subject(s)
Eukaryota/genetics , Nucleotide Motifs/genetics , Circular DNA , Genome/genetics , Reading Frames/genetics
10.
J Theor Biol ; 389: 40-6, 2016 Jan 21.
Article in English | MEDLINE | ID: mdl-26382231

ABSTRACT

We determine here the number and the list of maximal dinucleotide and trinucleotide circular codes. We prove that there is no maximal dinucleotide circular code having strictly fewer than 6 elements (maximum size of dinucleotide circular codes). On the other hand, a computer calculation shows that there are maximal trinucleotide circular codes with fewer than 20 elements (maximum size of trinucleotide circular codes). More precisely, there are maximal trinucleotide circular codes with 14, 15, 16, 17, 18 and 19 elements and no maximal trinucleotide circular code having fewer than 14 elements. We give the same information for the maximal self-complementary dinucleotide and trinucleotide circular codes. The amino acid distribution of maximal trinucleotide circular codes is also determined.


Subject(s)
Amino Acids/genetics , Genetic Code , Genetic Models , Nucleotides/genetics , Animals , Apicomplexa/genetics , Bacteria/genetics , Fungi/genetics , Humans , Theoretical Models , Nucleotides/chemistry , Software , Viruses/genetics
11.
J Theor Biol ; 389: 206-13, 2016 Jan 21.
Article in English | MEDLINE | ID: mdl-26562635

ABSTRACT

The problem of retrieval and maintenance of the correct reading frame plays a significant role in RNA transcription. Circular codes, and especially comma-free codes, can help to understand the underlying mechanisms of error-detection in this process. In recent years much attention has been paid to the investigation of trinucleotide circular codes (see, for instance, Fimmel et al., 2014; Fimmel and Strüngmann, 2015a; Michel and Pirillo, 2012; Michel et al., 2012, 2008), while dinucleotide codes had been touched on only marginally, even though dinucleotides are associated with important biological functions. Recently, all maximal dinucleotide circular codes were classified (Fimmel et al., 2015; Michel and Pirillo, 2013). The present paper studies maximal dinucleotide comma-free codes and their close connection to maximal dinucleotide circular codes. We give a construction principle for such codes and provide a graphical representation that allows them to be visualized geometrically. Moreover, we compare the results for dinucleotide codes with the corresponding situation for trinucleotide maximal self-complementary C(3)-codes. Finally, the results obtained are discussed with respect to Crick's hypothesis about frame-shift-detecting codes without commas.


Subject(s)
Genetic Code , Nucleotides/chemistry , Algorithms , Amino Acids/chemistry , Codon , Computer Graphics , Computer Simulation , Molecular Evolution , Genome , Genetic Models , Nucleotides/genetics , RNA/genetics , Reproducibility of Results , Genetic Transcription
12.
J Theor Biol ; 365: 164-74, 2015 Jan 21.
Article in English | MEDLINE | ID: mdl-25311909

ABSTRACT

The reading frame coding (RFC) of codes (sets) of trinucleotides is a genetic concept which has been largely ignored during the last 50 years. An extended definition of the statistical parameter PrRFC (Michel, 2014) is proposed here for analysing the probability (efficiency) of reading frame coding of usage of any trinucleotide code. It is applied to the analysis of the RFC efficiency of usage of the C(3) self-complementary trinucleotide circular code X identified in prokaryotic and eukaryotic genes (Arquès and Michel, 1996). The usage of X is called usage XU. The highest RFC probabilities of usage XU are identified in bacterial plasmids and bacteria (about 49.0%). Then, by decreasing values, the RFC probabilities of usage XU are observed in archaea (47.5%), viruses (45.4%) and nuclear eukaryotes (42.8%). The lowest RFC probabilities of usage XU are found in mitochondria and chloroplasts (about 36.5%). Thus, genes contain information for reading frame coding. Such a genetic property, which to our knowledge has never been identified, may bring new insights into the origin and evolution of the genetic code.


Subject(s)
Archaea/genetics , Bacteria/genetics , Codon/genetics , Eukaryota/genetics , Molecular Evolution , Reading Frames/physiology , Chloroplasts/genetics , Mitochondria/genetics
13.
J Theor Biol ; 380: 156-77, 2015 Sep 07.
Article in English | MEDLINE | ID: mdl-25934352

ABSTRACT

In 1996, a set X of 20 trinucleotides was identified in genes of both prokaryotes and eukaryotes which has on average the highest occurrence in reading frame compared to the two shifted frames (Arquès and Michel, 1996). Furthermore, this set X has an interesting mathematical property as X is a maximal C(3) self-complementary trinucleotide circular code (Arquès and Michel, 1996). In 2014, the number of trinucleotides in prokaryotic genes has been multiplied by a factor of 527. Furthermore, two new gene kingdoms of plasmids and viruses contain enough trinucleotide data to be analysed. The approach used in 1996 for identifying a preferential frame for a trinucleotide is quantified here with a new definition analysing the occurrence probability of a complementary/permutation (CP) trinucleotide set in a gene kingdom. Furthermore, in order to increase the statistical significance of results compared to those of 1996, the circular code X is studied on several gene taxonomic groups in a kingdom. Based on this new statistical approach, the circular code X is strengthened in genes of prokaryotes and eukaryotes, and now also identified in genes of plasmids. A subset of X with 18 or 16 trinucleotides is identified in genes of viruses. Furthermore, a simple probabilistic model based on the independent occurrence of trinucleotides in reading frame of genes explains the circular code frequencies and asymmetries observed in the shifted frames in all studied gene kingdoms. Finally, the developed approach allows variant X codes to be identified in genes, i.e. trinucleotide codes which differ from X. In genes of bacteria, eukaryotes and plasmids, 14 among the 47 studied gene taxonomic groups (about 30%) have variant X codes. Seven variant X codes are identified with at least 16 trinucleotides of X. Two variant X codes XA in cyanobacteria and plasmids of cyanobacteria, and XD in birds are self-complementary, without permuted trinucleotides but non-circular.
Five variant X codes XB in deinococcus, plasmids of chloroflexi and deinococcus, mammals and kinetoplasts, XC in elusimicrobia and apicomplexans, XE in fishes, XF in insects, and XG in basidiomycetes and plasmids of spirochaetes are C(3) self-complementary circular. In genes of viruses, no variant X code is found.


Subject(s)
Bacterial Genes , Viral Genes , Oligonucleotides/chemistry , Plasmids , Eukaryotic Cells , Theoretical Models , Probability
14.
J Theor Biol ; 355: 83-94, 2014 Aug 21.
Article in English | MEDLINE | ID: mdl-24698943

ABSTRACT

The reading frame coding (RFC) of codes (sets) of trinucleotides is a genetic concept which has been largely ignored during the last 50 years. A first objective is the definition of a new and simple statistical parameter PrRFC for analysing the probability (efficiency) of reading frame coding (RFC) of any trinucleotide code. A second objective is to reveal different classes and subclasses of trinucleotide codes involved in reading frame coding: the circular codes of 20 trinucleotides and the bijective genetic codes of 20 trinucleotides coding the 20 amino acids. This approach allows us to propose a genetic scale of reading frame coding which ranges from 1/3 with the random codes (RFC probability identical in the three frames) to 1 with the comma-free circular codes (RFC probability maximal in the reading frame and null in the two shifted frames). This genetic scale shows, in particular, the reading frame coding probabilities of the 12,964,440 circular codes (PrRFC=83.2% on average), the 216 C(3) self-complementary circular codes (PrRFC=84.1% on average) including the code X identified in eukaryotic and prokaryotic genes (PrRFC=81.3%) and the 339,738,624 bijective genetic codes (PrRFC=61.5% on average) including the 52 codes without permuted trinucleotides (PrRFC=66.0% on average). Otherwise, the reading frame coding probabilities of each trinucleotide code coding an amino acid with the universal genetic code are also determined. The four amino acids Gly, Lys, Phe and Pro are coded by codes (not circular) with RFC probabilities equal to 2/3, 1/2, 1/2 and 2/3, respectively. The amino acid Leu is coded by a circular code (not comma-free) with an RFC probability equal to 18/19. The 15 other amino acids are coded by comma-free circular codes, i.e. with RFC probabilities equal to 1. The identification of coding properties in some classes of trinucleotide codes studied here may bring new insights into the origin and evolution of the genetic code.
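
The comma-free property at the top of this scale can be checked directly from its standard definition: no word of the code may appear in either shifted frame of any concatenation of two words of the code. The Python sketch below applies that check to the codon sets of three amino acids in the standard genetic code. The function name is illustrative, and the sketch does not compute PrRFC, whose definition is not given in the abstract.

```python
from itertools import product

def is_comma_free(code):
    """Return True if no word of `code` occurs in a shifted frame.

    For every ordered pair of words (w1, w2), the two trinucleotides that
    straddle the junction of w1 + w2 must not belong to the code.
    """
    code = set(code)
    for w1, w2 in product(code, repeat=2):
        pair = w1 + w2
        if pair[1:4] in code or pair[2:5] in code:
            return False
    return True

# Codon sets from the standard genetic code (DNA alphabet).
ALA = {"GCA", "GCC", "GCG", "GCT"}
LEU = {"CTA", "CTC", "CTG", "CTT", "TTA", "TTG"}
GLY = {"GGA", "GGC", "GGG", "GGT"}

ala_comma_free = is_comma_free(ALA)  # no Ala codon can start with C or have G second
leu_comma_free = is_comma_free(LEU)  # CTC + TTA contains CTT in a shifted frame
gly_comma_free = is_comma_free(GLY)  # GGG + GGG contains GGG in both shifted frames
```

These three cases agree with the abstract: Ala is among the 15 amino acids coded by comma-free codes, while the codes for Leu and Gly are not comma-free.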


Subject(s)
Amino Acids , Codon/physiology , Molecular Evolution , Genetic Models , Open Reading Frames/physiology
15.
Biosystems ; 239: 105215, 2024 May.
Article in English | MEDLINE | ID: mdl-38641199

ABSTRACT

A massive statistical analysis based on the autocorrelation function of the circular code X observed in genes is performed on the (eukaryotic) introns. Surprisingly, a circular code periodicity 0 modulo 3 is identified in 5 groups of introns: birds, ascomycetes, basidiomycetes, green algae and land plants. This circular code periodicity, which is a property of retrieving the reading frame in (protein coding) genes, may suggest that these introns have a coding property. As is well known, a periodicity 1 modulo 2 is observed in 6 groups of introns: amphibians, fishes, mammals, other animals, reptiles and apicomplexans. A mixed periodicity modulo 2 and 3 is found in the introns of insects. Astonishingly, a subperiodicity 3 modulo 6 is a common statistical property in these 3 classes of introns. When the particular trinucleotides N1N2N1 of the circular code X are not considered, the circular code periodicity 0 modulo 3, hidden by the periodicity 1 modulo 2, is now retrieved in 5 groups of introns: amphibians, fishes, other animals, reptiles and insects. Thus, 10 groups of introns, taxonomically different, out of 12 have a coding property related to the reading frame retrieval. The trinucleotides N1N2N1 are analysed in the 216 maximal C3 self-complementary trinucleotide circular codes. A hexanucleotide code (words of 6 letters) is proposed to explain the periodicity 3 modulo 6. It could be a trace of more general circular codes at the origin of the circular code X.


Subject(s)
Genetic Code , Introns , Introns/genetics , Animals , Genetic Code/genetics , Molecular Evolution
16.
Biosystems ; 217: 104667, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35351587

ABSTRACT

A code X is (⩾k)-circular if every concatenation of words from X that admits, when read on a circle, more than one partition into words from X, must contain at least k+1 words. In other words, the reading frame retrieval is guaranteed for any concatenation of up to k words from X. A code that is (⩾k)-circular for all integers k is said to be circular. Any code is (⩾0)-circular and it turns out that a code of trinucleotides is circular as soon as it is (⩾4)-circular. A code is k-circular if it is (⩾k)-circular and not (⩾k+1)-circular. Due to the explosive combinatorics of trinucleotide k-circular codes, we developed three classes of algorithms based on: (i) the smallest directed cycles (directed girth) in graphs; (ii) the eigenvalues of matrices; and (iii) the files that incrementally save partial results. These different approaches also allow us to verify the computational results obtained. We determine here the growth functions of trinucleotide k-circular codes, k varying between 0 and 4, in the general case and in various particular cases: minimum, minimal, maximum, self-complementary k-, (k,k,k)- and self-complementary (k,k,k)-circular.


Subject(s)
Genetic Code , Genetic Models , Genetic Code/genetics , Reading Frames
17.
Biosystems ; 217: 104668, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35358608

ABSTRACT

A code X is (⩾k)-circular if every concatenation of words from X that admits, when read on a circle, more than one partition into words from X, must contain at least k+1 words. In other words, the reading frame retrieval is guaranteed for any concatenation of up to k words from X. A code that is (⩾k)-circular for all integers k is said to be circular. Any code is (⩾0)-circular and it turns out that a code of trinucleotides is circular as soon as it is (⩾4)-circular. A code is k-circular if it is (⩾k)-circular and not (⩾k+1)-circular. The theoretical aspects of trinucleotide k-circular codes have been developed in a companion article (Michel et al., 2022). Trinucleotide circular codes always retrieve the reading frame, leaving no ambiguous sequences. On the contrary, trinucleotide k-circular codes, for k∈{0,1,2,3} all have ambiguous sequences, for which the reading frame cannot always be retrieved. However, such a trinucleotide k-circular code is still able to retrieve the reading frame for a number of sequences, thereby exhibiting a partial circularity property. We describe this combinatorial property for each class of trinucleotide k-circular codes with k∈{0,1,2,3}. The circularity, i.e. the reading frame retrieval, is an ordinary property in genes. In order to consider the different cases of ambiguous sequences, we derive a new and general formula to measure the reading frame loss, whatever the trinucleotide k-circular code. This formula allows us to study the evolution of any trinucleotide k-circular code of (maximal) cardinality 20 to the genetic code, based on the reading frame retrieval property. We apply this approach to analyse the evolution of the trinucleotide circular code X observed in genes to the genetic code. The (⩾1)-circular codes of maximal size 20 necessarily have the same number of each nucleotide, specifically 15=3⋅20/4. This balanceness property can also be achieved by trinucleotide codes of cardinality 4,8,12 and 16. 
We call such trinucleotide codes balanced. We develop a general mathematical method to compute the number of balanced trinucleotide codes of each size, which also applies to self-complementary trinucleotide codes. We establish and quantify a relation between this balanceness property and the self-complementarity property. The combinatorial hierarchy of trinucleotide k-circular codes is updated with the growth function results. The numbers of amino acids coded by the trinucleotide k-circular codes are given for the cases maximal, minimal, self-complementary k-, (k,k,k)- and self-complementary (k,k,k)-circular.
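
The balanceness property described in this abstract (each of the four nucleotides occurring equally often across the words of the code) is simple to check by counting letters. The Python sketch below is an illustration only; the function names and the four-word toy code are invented, and the sketch does not implement the paper's method for counting balanced codes.

```python
from collections import Counter

def nucleotide_counts(code):
    """Count how often each nucleotide occurs over all words of the code."""
    return Counter("".join(code))

def is_balanced(code):
    """Return True if A, C, G and T all occur the same number of times."""
    counts = nucleotide_counts(code)
    return len({counts[n] for n in "ACGT"}) == 1

# Hypothetical code of cardinality 4: each nucleotide occurs 3 * 4 / 4 = 3 times.
toy_code = {"AAC", "CGG", "GTT", "ACT"}
toy_balanced = is_balanced(toy_code)
toy_counts = nucleotide_counts(toy_code)
```

A balanced trinucleotide code with n words must contain each nucleotide 3n/4 times, so n has to be a multiple of 4, consistent with the cardinalities 4, 8, 12, 16 and 20 mentioned in the abstract.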


Subject(s)
Genetic Code , Genetic Models , Biology , Genetic Code/genetics , Nucleotides/genetics , Reading Frames
18.
Biosystems ; 203: 104368, 2021 May.
Article in English | MEDLINE | ID: mdl-33567309

ABSTRACT

The X circular code is a set of 20 trinucleotides (codons) that has been identified in the protein-coding genes of most organisms (bacteria, archaea, eukaryotes, plasmids, viruses). It has been shown previously that the X circular code has the important mathematical property of being an error-correcting code. Thus, motifs of the X circular code, i.e. series of codons belonging to X, called X motifs, allow identification and maintenance of the reading frame in genes. X motifs are significantly enriched in protein-coding genes, but have also been identified in many transfer RNA (tRNA) genes and in important functional regions of the ribosomal RNA (rRNA), notably in the peptidyl transferase center and the decoding center. Here, we investigate the potential role of X motifs as functional elements of protein-coding genes. First, we identify the codons of the X circular code which are frequent or rare in each domain of life (archaea, bacteria, eukaryota) and show that, for the amino acids with the highest codon bias, the preferred codon is often an X codon. We also observe a correlation between the 20 X codons and the optimal codons/dicodons that have been shown to influence translation efficiency. Then, we examine recently published experimental results concerning gene expression levels in diverse organisms. The approach used is the analysis of X motifs according to their density ds(X), i.e. the number of X motifs per kilobase in a gene sequence s. Surprisingly, this simple parameter identifies several unexpected relations between the X circular code and gene expression. For example, the X motifs are significantly enriched in the minimal gene set belonging to the three domains of life, and in codon-optimized genes. Furthermore, the density of X motifs generally correlates with experimental measures of translation efficiency and mRNA stability. Taken together, these results lead us to propose that the X motifs may represent a genetic signal contributing to the maintenance of the correct reading frame and the optimization and regulation of gene expression.
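
The density ds(X) defined in this abstract (number of X motifs per kilobase of a gene sequence s) can be illustrated with a minimal sketch. Several things here are assumptions, not taken from the record: the 20 trinucleotides listed are the X circular code as it is commonly given in the circular code literature, an X motif is taken to be a maximal run of consecutive X codons in reading frame 0, and the minimum run length `min_codons` and the function names are hypothetical. The authors' actual motif criteria may differ.

```python
# Assumption: the X circular code as commonly listed in the literature
# (20 trinucleotides); the record above does not enumerate them.
X_CODE = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})


def find_x_motifs(seq, min_codons=4):
    """Return (start, n_codons) for each maximal run of X codons in frame 0.

    min_codons is a hypothetical threshold chosen for illustration only.
    """
    seq = seq.upper()
    motifs = []
    run_start, run_len = 0, 0
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        if seq[i:i + 3] in X_CODE:
            if run_len == 0:
                run_start = i
            run_len += 1
        else:
            if run_len >= min_codons:
                motifs.append((run_start, run_len))
            run_len = 0
    if run_len >= min_codons:  # close a run that reaches the end of the sequence
        motifs.append((run_start, run_len))
    return motifs


def x_motif_density(seq, min_codons=4):
    """Number of X motifs per kilobase of sequence (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return 1000.0 * len(find_x_motifs(seq, min_codons)) / len(seq)


if __name__ == "__main__":
    toy = "GAAGAGGTACTG" + "AAA" + "AACAATACCATC"  # two runs of 4 X codons
    print(find_x_motifs(toy))
    print(round(x_motif_density(toy), 2))
```

Normalising by sequence length makes genes of different sizes comparable. The threshold on run length matters because short runs of X codons are expected by chance.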


Subject(s)
Codon/genetics , Gene Expression Regulation/genetics , Nucleotide Motifs/genetics , Genetic Code/genetics , Reading Frames , Ribosomes
19.
Gene ; 769: 145208, 2021 Feb 15.
Article in English | MEDLINE | ID: mdl-33031892

ABSTRACT

Codon-amino acid assignments evolve for 15 of the 64 codons (23.4%) across the 31 genetic codes (the NCBI list completed with recently suggested green algal mitochondrial genetic codes): AAA, AGA, AGG, ATA, CGG, CTA, CTG, CTC, CTT, TAA, TAG, TCA, TCG, TGA and TTA; GNN codons are notably absent. Their usage in 25 theoretical minimal RNA rings is examined. RNA rings are designed in silico to code once over the shortest length for all 22 coding signals (start and stop codons and each amino acid according to the standard genetic code). Though designed along coding constraints, RNA rings resemble ancestral tRNA loops, assigning to each RNA ring a putative anticodon, a cognate amino acid and an evolutionary genetic code integration rank for that cognate amino acid. Analyses here show that (1) there are biases against/for evolvable codons in the first two vs. the last third of RNA ring coding sequences, (2) RNA rings with evolvable codons have recent cognates, and (3) evolvable codon and cytosine numbers in RNA ring compositions are positively correlated. Applying alternative genetic codes to RNA rings designed for nonredundant coding according to the standard genetic code reveals unsuspected properties of the standard genetic code and of RNA rings, notably on codon assignment evolvability and the special role of cytosine in relation to codon assignment evolvability and of the genetic code's coding structure.


Subject(s)
Codon , Evolution, Molecular , RNA, Circular/genetics , Computer Simulation , Genetic Code , RNA, Transfer/genetics
20.
Biosystems ; 206: 104431, 2021 Aug.
Article in English | MEDLINE | ID: mdl-33894288

ABSTRACT

The X motifs, motifs from the circular code X, are enriched in the (protein-coding) genes of bacteria, archaea, eukaryotes, plasmids and viruses, in the minimal gene set belonging to the three domains of life, and in tRNA and rRNA sequences. They allow the reading frame in genes to be retrieved, maintained and synchronized, and contribute to the regulation of gene expression. These results lead here to a theoretical study of genes based on the circular code alphabet. A new occurrence relation of the circular code X under the hypothesis of an equiprobable (balanced) strand pairing is given. Surprisingly, a statistical analysis of a large set of bacterial genes retrieves this relation on the circular code alphabet, but not on the DNA alphabet. Furthermore, the circular code X has the strongest balanced circular code pairing among 216 maximal C3 self-complementary trinucleotide circular codes, a new property of this circular code X. As an application of this theory, different tRNAs studied on the circular code alphabet reveal an unexpected stem structure. Thus, the circular code X would have constructed a coding stem in tRNAs as an outline of the future gene structure and the future DNA double helix.
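
The self-complementary property mentioned in this abstract can be checked with a short sketch: a trinucleotide code is self-complementary when the reverse complement of each of its trinucleotides also belongs to the code. The list of 20 trinucleotides below is an assumption (the X circular code as commonly given in the literature; the record does not enumerate it), and the function names are hypothetical. The sketch does not test circularity or the C3 property.

```python
# Assumption: the X circular code as commonly listed in the literature.
X_CODE = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})

# Watson-Crick complement on the DNA alphabet.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(trinucleotide):
    """Complement each base, then reverse the order (e.g. AAC -> GTT)."""
    return trinucleotide.translate(COMPLEMENT)[::-1]


def is_self_complementary(code):
    """True if the code is closed under reverse complementation."""
    return all(reverse_complement(t) in code for t in code)


if __name__ == "__main__":
    print(is_self_complementary(X_CODE))
    print(is_self_complementary({"AAA"}))  # TTT is missing from this set
```

Closure under reverse complementation means that a motif of such a code on one DNA strand pairs with a motif of the same code on the other strand, which is the setting in which the strand-pairing relation of the abstract is stated.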


Subject(s)
Genes, Bacterial/physiology , Genetic Code/physiology , RNA, Circular/physiology , RNA, Transfer/physiology , Animals , Humans