Results 1 - 20 of 60
1.
Bull Math Biol ; 85(4): 24, 2023 02 24.
Article in English | MEDLINE | ID: mdl-36826719

ABSTRACT

Based on the circular code theory, we define a new function f that quantifies the property of reading frame retrieval (RFR) of genes from their codon usage. This RFR function f is computed on a massive scale in genes of genomes of bacteria, eukaryotes and archaea. By expressing f as a function of the mean number [Formula: see text] of codons per gene, a "universal" property is identified in every kingdom: reading frame retrieval is enhanced in large genes. Investigating this property with the theory developed, we observe a Spearman's rank correlation with a strong negative coefficient between the codon usage dispersion d (from the uniform codon distribution [Formula: see text]) and the RFR function f in every kingdom (p-values [Formula: see text] in bacteria, [Formula: see text] in eukaryotes and [Formula: see text] in archaea). Thus, reading frame retrieval is enhanced with codon usage dispersion. Furthermore, this approach identifies a "genome centre" from which two distinct "genome arms" emerge: an upper arm and a lower arm, respectively above and below the linear regression. In the future, the RFR function, alone or combined with classical methods (alignment, phylogeny), could also provide a new approach to genome classification.
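The argument above rests on Spearman's rank correlation between per-genome values of d and f. As a minimal sketch, assuming only that one (d, f) pair is available per genome, Spearman's rho is the Pearson correlation of the two rank vectors, where tied values share their average rank. The sample numbers and helper names are hypothetical, and no p-value is computed.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Grow the block [start, end] while the sorted values stay equal.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation coefficient of two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sample is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-genome values: codon usage dispersion d and RFR function f.
d = [0.10, 0.30, 0.20, 0.50]
f = [0.90, 0.40, 0.70, 0.10]
print(spearman_rho(d, f))  # -1.0: the two rankings are exactly opposite
```

Ranking first makes the coefficient sensitive only to monotone trends, which suits quantities such as d and f whose relationship need not be linear.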


Subjects
Codon Usage , Genetic Code , Models, Biological , Models, Genetic , Mathematical Concepts , Codon , Reading Frames , Bacteria/genetics , Eukaryota
2.
RNA ; 25(12): 1714-1730, 2019 12.
Article in English | MEDLINE | ID: mdl-31506380

ABSTRACT

The origin of the genetic code remains enigmatic five decades after it was elucidated, although there is growing evidence that the code coevolved progressively with the ribosome. A number of primordial codes were proposed as ancestors of the modern genetic code, including comma-free codes such as the RRY, RNY, or GNC codes (R = G or A, Y = C or T, N = any nucleotide), and the X circular code, an error-correcting code that also allows identification and maintenance of the reading frame. It was demonstrated previously that motifs of the X circular code are significantly enriched in the protein-coding genes of most organisms, from bacteria to eukaryotes. Here, we show that imprints of this code also exist in the ribosomal RNA (rRNA). In a large-scale study involving 133 organisms representative of the three domains of life, we identified 32 universal X motifs that are conserved in the rRNA of >90% of the organisms. Intriguingly, most of the universal X motifs are located in rRNA regions involved in important ribosome functions, notably in the peptidyl transferase center and the decoding center that form the original "proto-ribosome." Building on the existing accretion models for ribosome evolution, we propose that error-correcting circular codes represented an important step in the emergence of the modern genetic code. Thus, circular codes would have allowed the simultaneous coding of amino acids and synchronization of the reading frame in primitive translation systems, prior to the emergence of more sophisticated start codon recognition and translation initiation mechanisms.


Subjects
Evolution, Molecular , Genetic Code , Nucleotide Motifs , Protein Biosynthesis , Ribosomes/genetics , Ribosomes/metabolism , Models, Biological , Models, Molecular , Molecular Conformation , Nucleic Acid Conformation , RNA, Ribosomal/chemistry , RNA, Ribosomal/genetics , Ribosomes/chemistry , Structure-Activity Relationship
3.
RNA Biol ; 17(4): 571-583, 2020 04.
Article in English | MEDLINE | ID: mdl-31960748

ABSTRACT

Three-base periodicity (TBP), where nucleotides and higher order n-tuples are preferentially spaced by 3, 6, 9, etc. bases, is a well-known intrinsic property of protein-coding DNA sequences. However, its origins are still not fully understood. One hypothesis is that the periodicity reflects a primordial coding system that was used before the emergence of the modern standard genetic code (SGC). Recent evidence suggests that the X circular code, a set of 20 trinucleotides allowing the reading frames in genes to be retrieved locally, represents a possible ancestor of the SGC. Motifs from the X circular code have been found in the reading frame of protein-coding regions in extant organisms from bacteria to eukaryotes, in many transfer RNA (tRNA) genes and in important functional regions of the ribosomal RNA (rRNA), notably in the peptidyl transferase centre and the decoding centre. Here, we have used a powerful correlation function to search for periodicity patterns involving the 20 trinucleotides of the X circular code in a large set of bacterial protein-coding genes, as well as in the translation machinery, including rRNA and tRNA sequences. As might be expected, we found a strong circular code periodicity 0 modulo 3 in the protein-coding genes. More surprisingly, we also identified a similar circular code periodicity in a large region of the 16S rRNA. This region includes the 3' major domain corresponding to the primordial proto-ribosome decoding centre and containing numerous sites that interact with the tRNA and messenger RNA (mRNA) during translation. Furthermore, 3D structural analysis shows that the periodicity region surrounds the mRNA channel that lies between the head and the body of the small ribosomal subunit (SSU). Our results support the hypothesis that the X circular code may constitute an ancestral translation code involved in reading frame retrieval and maintenance, traces of which persist in modern mRNA, tRNA and rRNA despite their long evolution and adaptation to the SGC.
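To make "circular code periodicity 0 modulo 3" concrete, the sketch below marks every position where a trinucleotide of X starts and counts, for each lag d, how often two marked positions lie d bases apart. This raw, unnormalized count is an assumption chosen for simplicity; the correlation function used in the study is likely defined differently. The helper names are hypothetical and the input is a synthetic repeat of the X words AAC and GTT.

```python
from typing import FrozenSet, List

# The 20 trinucleotides of the circular code X (Arquès and Michel, 1996).
X: FrozenSet[str] = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})


def x_indicator(seq: str, code: FrozenSet[str] = X) -> List[int]:
    """u[i] = 1 if the trinucleotide starting at position i belongs to the code."""
    return [1 if seq[i:i + 3] in code else 0 for i in range(len(seq) - 2)]


def lag_counts(u: List[int], max_lag: int) -> List[int]:
    """C(d) = sum over i of u[i] * u[i + d], for d = 1..max_lag (unnormalized)."""
    return [sum(u[i] * u[i + d] for i in range(len(u) - d))
            for d in range(1, max_lag + 1)]


seq = "AACGTT" * 10
counts = lag_counts(x_indicator(seq), 9)
print(counts)  # [0, 0, 19, 0, 0, 18, 0, 0, 17]
```

The counts are nonzero only at lags 3, 6 and 9, which is the signature of a periodicity 0 modulo 3. In real genes the signal is a statistical excess rather than an exact pattern.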


Subjects
Bacteria/genetics , Bacterial Proteins/genetics , Computational Biology/methods , Ribosomes/genetics , Algorithms , Bacteria/metabolism , Evolution, Molecular , Genetic Code , Periodicity , RNA, Bacterial/genetics , RNA, Ribosomal/genetics , RNA, Transfer/genetics
4.
Bull Math Biol ; 82(8): 105, 2020 08 04.
Article in English | MEDLINE | ID: mdl-32754878

ABSTRACT

A code X is k-circular if any concatenation of at most k words from X, when read on a circle, admits exactly one partition into words from X. It is circular if it is k-circular for every integer k. While it is not a priori clear from the definition, there exists, for every pair [Formula: see text], an integer k such that every k-circular [Formula: see text]-letter code over an alphabet of cardinality n is circular, and we determine the least such integer k for all values of n and [Formula: see text]. The k-circular codes may represent an important evolutionary step between the circular codes, such as the comma-free codes, and the genetic code.
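For small k, the definition of k-circularity can be checked directly by brute force. The following is a minimal sketch for trinucleotide codes. Every word has length 3, so a partition of a circular word of length 3m is fixed by the position of its first cut modulo 3, and it suffices to try the three offsets. The helper names are hypothetical, and the cost grows as |X|^k, so this only illustrates the definition. It is not the method used by the authors.

```python
from itertools import product
from typing import FrozenSet, Iterable, List

# The 20 trinucleotides of the circular code X (Arquès and Michel, 1996).
X: FrozenSet[str] = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})


def frame_partitions(circular_word: str, code: FrozenSet[str]) -> List[int]:
    """Offsets p in {0, 1, 2} at which the circular word splits into code words."""
    m = len(circular_word) // 3
    doubled = circular_word + circular_word  # unroll the circle once
    return [p for p in range(3)
            if all(doubled[p + 3 * i:p + 3 * i + 3] in code for i in range(m))]


def is_k_circular(code: Iterable[str], k: int) -> bool:
    """Every concatenation of at most k code words, read on a circle,
    must admit exactly one partition into code words."""
    words = frozenset(code)
    for n in range(1, k + 1):
        for combo in product(sorted(words), repeat=n):
            if len(frame_partitions("".join(combo), words)) != 1:
                return False
    return True


print(frame_partitions("ACG", frozenset({"ACG", "CGA"})))  # [0, 1]
print(is_k_circular(X, 3))  # True
```

The code {ACG, CGA} already fails for a single word: read on a circle, ACG splits as ACG or as CGA, so the reading frame cannot be retrieved.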


Subjects
Models, Genetic , Biological Evolution , Genetic Code , Mathematical Concepts , Nucleotides
5.
Bull Math Biol ; 79(8): 1796-1819, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28643131

ABSTRACT

Comma-free codes constitute a class of circular codes which has been widely studied, in particular by Golomb et al. (Biologiske Meddelelser, Kongelige Danske Videnskabernes Selskab 23:1-34, 1958a, Can J Math 10:202-209, 1958b), Michel et al. (Comput Math Appl 55:989-996, 2008a, Theor Comput Sci 401:17-26, 2008b, Inf Comput 212:55-63, 2012), Michel and Pirillo (Int J Comb 2011:659567, 2011), and Fimmel and Strüngmann (J Theor Biol 389:206-213, 2016). Based on a recent graph-theoretical approach to circular codes (Fimmel et al., Philos Trans R Soc 374:20150058, 2016), a new class of circular codes, called strong comma-free codes, is identified. These codes detect a frameshift during the translation process immediately after a reading window of at most two nucleotides. We describe several combinatorial properties of strong comma-free codes: enumeration, maximality, self-complementarity and [Formula: see text]-property (comma-free property in all three possible frames). These combinatorial results also highlight some new properties of the genetic code and its evolution. Each amino acid in the standard genetic code is coded by at least one strong comma-free code of size 1. There are 9 amino acids [Formula: see text] among 20 such that, for each amino acid from S, its synonymous trinucleotide set (excluding the necessary periodic trinucleotides [Formula: see text]) is a strong comma-free code. The primeval comma-free RNY code of Eigen and Schuster (Naturwissenschaften 65:341-369, 1978) is a self-complementary [Formula: see text]-code of size 16. Furthermore, it is the union of two strong comma-free codes of size 8 which are complementary to each other.
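The closing claims about the RNY code can be checked mechanically. This minimal sketch uses hypothetical helper names. It builds the 16 RNY trinucleotides (R = A or G, N = any base, Y = C or T) and checks self-complementarity, meaning the reverse complement of every word is again a word of the code. It then checks the comma-free property by testing both shifted frames of every concatenation of two code words. It does not test the C3 property or the split into two strong comma-free codes.

```python
from itertools import product
from typing import FrozenSet

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(word: str) -> str:
    """Reverse complement of a DNA word, e.g. AAC -> GTT."""
    return "".join(COMPLEMENT[base] for base in reversed(word))


def rny_code() -> FrozenSet[str]:
    """The 16 trinucleotides R N Y with R in {A, G}, N any base, Y in {C, T}."""
    return frozenset(r + n + y for r, n, y in product("AG", "ACGT", "CT"))


def is_self_complementary(code: FrozenSet[str]) -> bool:
    """True if the reverse complement of every word is again a word of the code."""
    return all(reverse_complement(w) in code for w in code)


def is_comma_free(code: FrozenSet[str]) -> bool:
    """True if no concatenation uv of two code words (u = v allowed) contains a
    code word in either shifted frame, i.e. at offsets 1 or 2."""
    for u, v in product(code, repeat=2):
        uv = u + v
        if uv[1:4] in code or uv[2:5] in code:
            return False
    return True


RNY = rny_code()
print(len(RNY), is_self_complementary(RNY), is_comma_free(RNY))  # 16 True True
```

For RNY the comma-free test passes for a structural reason. A shifted word would need to end in R (offset 1) or start with Y (offset 2), but every RNY word starts with R and ends with Y.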


Subjects
Amino Acids , Genetic Code , Models, Genetic , Codon , Nucleotides
6.
J Theor Biol ; 408: 198-212, 2016 11 07.
Article in English | MEDLINE | ID: mdl-27444403

ABSTRACT

A set X of 20 trinucleotides was identified in genes of bacteria, eukaryotes, plasmids and viruses which has, on average, the highest occurrence in the reading frame compared with its two shifted frames (Michel, 2015; Arquès and Michel, 1996). This set X has an interesting mathematical property: X is a circular code (Arquès and Michel, 1996). Thus, motifs from this circular code X, called X motifs, always allow the reading frame in genes to be retrieved, synchronized and maintained. In this paper, we develop several statistical analyses of X motifs in 138 available complete genomes of eukaryotes, in which both genes and non-gene regions are examined. Large X motifs (with lengths of at least 15 consecutive trinucleotides of X and compositions of at least 10 different trinucleotides of X among 20) have the highest occurrence in genomes of eukaryotes compared with their 23 large bijective motifs, their two large permuted motifs and large random motifs. The largest X motifs identified in eukaryotic genomes are presented, e.g. an X motif in a non-gene region of the genome of Solanum pennellii with a length of 155 trinucleotides (465 nucleotides) and an expectation E = 10^-71. In the human genome, the largest X motif occurs in a non-gene region of chromosome 13 with a length of 36 trinucleotides and an expectation E = 10^-11. X motifs in non-gene regions of genomes could be evolutionary relics of primitive genes using the circular code for translation. However, the ratio of X motifs (with lengths of at least 10 consecutive trinucleotides of X and compositions of at least 5 different trinucleotides of X among 20) in genes to those in non-genes of the 138 complete eukaryotic genomes is about 8. Thus, X motifs occur preferentially in genes, as expected from 20 years of previous work.
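This sketch shows how X motifs of a given size could be located. It uses the abstract's thresholds (at least 15 consecutive trinucleotides of X, at least 10 of them distinct) and scans the three frames of a single strand. The function name, the single-strand scan and the 0-based coordinates are assumptions. The expectation E reported above is not computed.

```python
from typing import FrozenSet, List, Tuple

# The 20 trinucleotides of the circular code X (Arquès and Michel, 1996).
X: FrozenSet[str] = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})


def find_x_motifs(seq: str, code: FrozenSet[str] = X, min_len: int = 15,
                  min_distinct: int = 10) -> List[Tuple[int, int, int]]:
    """Maximal runs of consecutive code trinucleotides in any of the 3 frames.

    Returns (start, length, distinct): start is a 0-based nucleotide position;
    length and distinct are counted in trinucleotides.
    """
    hits: List[Tuple[int, int, int]] = []

    def keep(start: int, run: List[str]) -> None:
        if len(run) >= min_len and len(set(run)) >= min_distinct:
            hits.append((start, len(run), len(set(run))))

    for frame in range(3):
        run: List[str] = []
        start = frame
        for i in range(frame, len(seq) - 2, 3):
            tri = seq[i:i + 3]
            if tri in code:
                if not run:
                    start = i
                run.append(tri)
            else:
                keep(start, run)
                run = []
        keep(start, run)  # a run may extend to the end of the sequence
    return sorted(hits)


# Synthetic sequence: the 20 words of X flanked by "GG", so they sit in frame 2.
demo = "GG" + "".join(sorted(X)) + "GG"
print(find_x_motifs(demo))
```

Lowering `min_len` to 10 and `min_distinct` to 5 gives the smaller motif class used for the gene/non-gene ratio in the abstract.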


Subjects
Eukaryota/genetics , Nucleotide Motifs/genetics , DNA, Circular , Genome/genetics , Reading Frames/genetics
7.
J Theor Biol ; 389: 40-6, 2016 Jan 21.
Article in English | MEDLINE | ID: mdl-26382231

ABSTRACT

We determine here the number and the list of maximal dinucleotide and trinucleotide circular codes. We prove that there is no maximal dinucleotide circular code with strictly fewer than 6 elements (the maximum size of dinucleotide circular codes). On the other hand, a computer calculation shows that there are maximal trinucleotide circular codes with fewer than 20 elements (the maximum size of trinucleotide circular codes). More precisely, there are maximal trinucleotide circular codes with 14, 15, 16, 17, 18 and 19 elements, and no maximal trinucleotide circular code with fewer than 14 elements. We give the same information for the maximal self-complementary dinucleotide and trinucleotide circular codes. The amino acid distribution of maximal trinucleotide circular codes is also determined.
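The dinucleotide statement is small enough to check exhaustively. The sketch assumes the graph criterion from the graph-theoretical approach to circular codes: for a dinucleotide code, draw an edge from b1 to b2 for each word b1b2, and the code is circular exactly when this graph has no directed cycle. Enumerating all subsets of the 12 non-periodic dinucleotides then shows that every maximal circular dinucleotide code has 6 elements. The helper names are hypothetical. The trinucleotide case is far larger and is not attempted here.

```python
from itertools import combinations, permutations
from typing import FrozenSet, List

BASES = "ACGT"
# Periodic dinucleotides (AA, CC, GG, TT) never belong to a circular code,
# so only the 12 dinucleotides with two different bases are candidates.
DINUCLEOTIDES = [a + b for a, b in permutations(BASES, 2)]


def is_circular_dinucleotide(code: FrozenSet[str]) -> bool:
    """Edge b1 -> b2 for each word b1b2; circular iff the graph has no cycle."""
    edges = {b: {w[1] for w in code if w[0] == b} for b in BASES}
    state = dict.fromkeys(BASES, 0)  # 0 = unvisited, 1 = on the DFS stack, 2 = done

    def has_cycle(v: str) -> bool:
        state[v] = 1
        for w in edges[v]:
            if state[w] == 1 or (state[w] == 0 and has_cycle(w)):
                return True
        state[v] = 2
        return False

    return not any(state[b] == 0 and has_cycle(b) for b in BASES)


def maximal_circular_dinucleotide_codes() -> List[FrozenSet[str]]:
    """All circular codes to which no further dinucleotide can be added."""
    circular = [frozenset(c)
                for r in range(len(DINUCLEOTIDES) + 1)
                for c in combinations(DINUCLEOTIDES, r)
                if is_circular_dinucleotide(frozenset(c))]
    return [c for c in circular
            if not any(is_circular_dinucleotide(c | {w})
                       for w in DINUCLEOTIDES if w not in c)]


codes = maximal_circular_dinucleotide_codes()
print(len(codes), sorted({len(c) for c in codes}))  # 24 [6]
```

The result matches a simple graph argument. A maximal acyclic orientation of the 4 bases must be a transitive tournament, with one edge per pair of bases (6 edges), and there are 4! = 24 of them.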


Subjects
Amino Acids/genetics , Genetic Code , Models, Genetic , Nucleotides/genetics , Animals , Apicomplexa/genetics , Bacteria/genetics , Fungi/genetics , Humans , Models, Theoretical , Nucleotides/chemistry , Software , Viruses/genetics
8.
J Theor Biol ; 365: 164-74, 2015 Jan 21.
Article in English | MEDLINE | ID: mdl-25311909

ABSTRACT

The reading frame coding (RFC) of codes (sets) of trinucleotides is a genetic concept which has been largely ignored during the last 50 years. An extended definition of the statistical parameter PrRFC (Michel, 2014) is proposed here for analysing the probability (efficiency) of reading frame coding of the usage of any trinucleotide code. It is applied to the analysis of the RFC efficiency of the usage of the C3 self-complementary trinucleotide circular code X identified in prokaryotic and eukaryotic genes (Arquès and Michel, 1996). The usage of X is called usage XU. The highest RFC probabilities of usage XU are identified in bacterial plasmids and bacteria (about 49.0%). Then, in decreasing order, the RFC probabilities of usage XU are observed in archaea (47.5%), viruses (45.4%) and nuclear eukaryotes (42.8%). The lowest RFC probabilities of usage XU are found in mitochondria and chloroplasts (about 36.5%). Thus, genes contain information for reading frame coding. Such a genetic property, which to our knowledge has never been identified before, may bring new insights into the origin and evolution of the genetic code.


Subjects
Archaea/genetics , Bacteria/genetics , Codon/genetics , Eukaryota/genetics , Evolution, Molecular , Reading Frames/physiology , Chloroplasts/genetics , Mitochondria/genetics
9.
J Theor Biol ; 380: 156-77, 2015 Sep 07.
Article in English | MEDLINE | ID: mdl-25934352

ABSTRACT

In 1996, a set X of 20 trinucleotides was identified in genes of both prokaryotes and eukaryotes which has, on average, the highest occurrence in the reading frame compared with the two shifted frames (Arquès and Michel, 1996). Furthermore, this set X has an interesting mathematical property: X is a maximal C3 self-complementary trinucleotide circular code (Arquès and Michel, 1996). By 2014, the number of trinucleotides available in prokaryotic genes had increased by a factor of 527. Furthermore, two new gene kingdoms, plasmids and viruses, now contain enough trinucleotide data to be analysed. The approach used in 1996 for identifying a preferential frame for a trinucleotide is quantified here with a new definition analysing the occurrence probability of a complementary/permutation (CP) trinucleotide set in a gene kingdom. Furthermore, in order to increase the statistical significance of the results compared with those of 1996, the circular code X is studied on several gene taxonomic groups within a kingdom. Based on this new statistical approach, the circular code X is strengthened in genes of prokaryotes and eukaryotes, and is now also identified in genes of plasmids. A subset of X with 18 or 16 trinucleotides is identified in genes of viruses. Furthermore, a simple probabilistic model based on the independent occurrence of trinucleotides in the reading frame of genes explains the circular code frequencies and asymmetries observed in the shifted frames in all studied gene kingdoms. Finally, the developed approach allows the identification of variant X codes in genes, i.e. trinucleotide codes which differ from X. In genes of bacteria, eukaryotes and plasmids, 14 of the 47 studied gene taxonomic groups (about 30%) have variant X codes. Seven variant X codes are identified with at least 16 trinucleotides of X. Two variant X codes, XA in cyanobacteria and plasmids of cyanobacteria, and XD in birds, are self-complementary and without permuted trinucleotides, but non-circular. Five variant X codes, XB in deinococcus, plasmids of chloroflexi and deinococcus, mammals and kinetoplasts, XC in elusimicrobia and apicomplexans, XE in fishes, XF in insects, and XG in basidiomycetes and plasmids of spirochaetes, are C3 self-complementary circular. In genes of viruses, no variant X code is found.
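The properties attributed to X here can be checked in a few lines. This minimal sketch assumes the graph criterion for circularity from the graph-theoretical approach. For each trinucleotide b1b2b3, add the edges b1 -> b2b3 and b1b2 -> b3; the code is circular exactly when the graph is acyclic, which is tested here with Kahn's topological sort. X is C3 when X and its two circular permutations (b2b3b1 and b3b1b2 applied to every word) are all circular. The helper names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set

# The 20 trinucleotides of the circular code X (Arquès and Michel, 1996).
X: FrozenSet[str] = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(word: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(word))


def is_self_complementary(code: Iterable[str]) -> bool:
    words = set(code)
    return all(reverse_complement(w) in words for w in words)


def is_circular(code: Iterable[str]) -> bool:
    """Edges b1 -> b2b3 and b1b2 -> b3; circular iff the graph is acyclic."""
    graph: Dict[str, Set[str]] = {}
    for w in code:
        graph.setdefault(w[0], set()).add(w[1:])
        graph.setdefault(w[:2], set()).add(w[2])
    vertices = set(graph).union(*graph.values())
    indegree = dict.fromkeys(vertices, 0)
    for targets in graph.values():
        for t in targets:
            indegree[t] += 1
    # Kahn's algorithm: every vertex gets removed iff there is no cycle.
    ready = [v for v in vertices if indegree[v] == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for t in graph.get(v, ()):
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)
    return removed == len(vertices)


def permute(code: Iterable[str], shift: int) -> FrozenSet[str]:
    """Circular permutation of every word: shift 1 gives b2b3b1, shift 2 b3b1b2."""
    return frozenset(w[shift:] + w[:shift] for w in code)


def is_c3(code: Iterable[str]) -> bool:
    return all(is_circular(permute(code, s)) for s in range(3))


print(is_circular(X), is_self_complementary(X), is_c3(X))  # True True True
```

The same functions can be applied to any candidate variant code, such as the variant X codes above, once its 20 (or fewer) trinucleotides are listed.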


Subjects
Genes, Bacterial , Genes, Viral , Oligonucleotides/chemistry , Plasmids , Eukaryotic Cells , Models, Theoretical , Probability
10.
J Theor Biol ; 355: 83-94, 2014 Aug 21.
Article in English | MEDLINE | ID: mdl-24698943

ABSTRACT

The reading frame coding (RFC) of codes (sets) of trinucleotides is a genetic concept which has been largely ignored during the last 50 years. A first objective is the definition of a new and simple statistical parameter PrRFC for analysing the probability (efficiency) of reading frame coding (RFC) of any trinucleotide code. A second objective is to reveal different classes and subclasses of trinucleotide codes involved in reading frame coding: the circular codes of 20 trinucleotides and the bijective genetic codes of 20 trinucleotides coding the 20 amino acids. This approach allows us to propose a genetic scale of reading frame coding which ranges from 1/3 with the random codes (RFC probability identical in the three frames) to 1 with the comma-free circular codes (RFC probability maximal in the reading frame and null in the two shifted frames). This genetic scale shows, in particular, the reading frame coding probabilities of the 12,964,440 circular codes (PrRFC=83.2% on average), the 216 C3 self-complementary circular codes (PrRFC=84.1% on average) including the code X identified in eukaryotic and prokaryotic genes (PrRFC=81.3%), and the 339,738,624 bijective genetic codes (PrRFC=61.5% on average) including the 52 codes without permuted trinucleotides (PrRFC=66.0% on average). In addition, the reading frame coding probability of each trinucleotide code that codes an amino acid in the universal genetic code is determined. The four amino acids Gly, Lys, Phe and Pro are coded by codes (not circular) with RFC probabilities equal to 2/3, 1/2, 1/2 and 2/3, respectively. The amino acid Leu is coded by a circular code (not comma-free) with an RFC probability equal to 18/19. The 15 other amino acids are coded by comma-free circular codes, i.e. with RFC probabilities equal to 1. The identification of coding properties in some classes of trinucleotide codes studied here may bring new insights into the origin and evolution of the genetic code.
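The classification of synonymous codon sets at the end of this abstract can be reproduced for individual amino acids. This sketch assumes the graph criterion for circularity (edges b1 -> b2b3 and b1b2 -> b3, circular exactly when acyclic) and the usual comma-free test on concatenations of two code words. Only six amino acids of the standard genetic code are listed. The RFC probabilities themselves are not computed, since PrRFC is not defined in this abstract. The helper names are hypothetical.

```python
from itertools import product
from typing import Dict, Iterable, Set


def is_circular(code: Iterable[str]) -> bool:
    """Edges b1 -> b2b3 and b1b2 -> b3; circular iff the graph is acyclic."""
    graph: Dict[str, Set[str]] = {}
    for w in code:
        graph.setdefault(w[0], set()).add(w[1:])
        graph.setdefault(w[:2], set()).add(w[2])
    vertices = set(graph).union(*graph.values())
    indegree = dict.fromkeys(vertices, 0)
    for targets in graph.values():
        for t in targets:
            indegree[t] += 1
    ready = [v for v in vertices if indegree[v] == 0]  # Kahn's algorithm
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for t in graph.get(v, ()):
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)
    return removed == len(vertices)


def is_comma_free(code: Set[str]) -> bool:
    """No code word appears in a shifted frame of any concatenation uv."""
    return not any((u + v)[1:4] in code or (u + v)[2:5] in code
                   for u, v in product(code, repeat=2))


def classify(code: Set[str]) -> str:
    if not is_circular(code):
        return "not circular"
    return "comma-free" if is_comma_free(code) else "circular, not comma-free"


# Synonymous codon sets (DNA alphabet) for a few amino acids.
SYNONYMOUS: Dict[str, Set[str]] = {
    "Gly": {"GGT", "GGC", "GGA", "GGG"},
    "Lys": {"AAA", "AAG"},
    "Phe": {"TTT", "TTC"},
    "Pro": {"CCT", "CCC", "CCA", "CCG"},
    "Leu": {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"},
    "Met": {"ATG"},
}
for name, codons in SYNONYMOUS.items():
    print(f"{name}: {classify(codons)}")
```

Gly, Lys, Phe and Pro each contain a periodic codon (GGG, AAA, TTT, CCC), which already rules out circularity. Leu fails the comma-free test because CTC followed by TTA contains CTT in a shifted frame.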


Subjects
Amino Acids , Codon/physiology , Evolution, Molecular , Models, Genetic , Open Reading Frames/physiology
11.
Biosystems ; 239: 105215, 2024 May.
Article in English | MEDLINE | ID: mdl-38641199

ABSTRACT

A massive statistical analysis based on the autocorrelation function of the circular code X observed in genes is performed on (eukaryotic) introns. Surprisingly, a circular code periodicity 0 modulo 3 is identified in 5 groups of introns: birds, ascomycetes, basidiomycetes, green algae and land plants. This circular code periodicity, which is a property of retrieving the reading frame in (protein coding) genes, may suggest that these introns have a coding property. As is well known, a periodicity 1 modulo 2 is observed in 6 groups of introns: amphibians, fishes, mammals, other animals, reptiles and apicomplexans. A mixed periodicity modulo 2 and 3 is found in the introns of insects. Astonishingly, a subperiodicity 3 modulo 6 is a common statistical property of these 3 classes of introns. When the particular trinucleotides N1N2N1 of the circular code X are not considered, the circular code periodicity 0 modulo 3, hidden by the periodicity 1 modulo 2, is retrieved in 5 groups of introns: amphibians, fishes, other animals, reptiles and insects. Thus, 10 taxonomically different groups of introns out of 12 have a coding property related to reading frame retrieval. The trinucleotides N1N2N1 are analysed in the 216 maximal C3 self-complementary trinucleotide circular codes. A hexanucleotide code (words of 6 letters) is proposed to explain the periodicity 3 modulo 6. It could be a trace of more general circular codes at the origin of the circular code X.
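For reference, this short sketch lists the N1N2N1 trinucleotides of X (first base equal to third base) and builds the reduced code that remains when they are set aside. The helper names are hypothetical.

```python
from typing import FrozenSet, List

# The 20 trinucleotides of the circular code X (Arquès and Michel, 1996).
X: FrozenSet[str] = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})


def n1n2n1_words(code: FrozenSet[str]) -> List[str]:
    """Trinucleotides whose first and third bases are identical."""
    return sorted(w for w in code if w[0] == w[2])


def without_n1n2n1(code: FrozenSet[str]) -> FrozenSet[str]:
    """The code with its N1N2N1 trinucleotides removed."""
    return frozenset(w for w in code if w[0] != w[2])


print(n1n2n1_words(X))  # ['CTC', 'GAG']
```

Only two of the 20 words of X are of this form, and they are reverse complements of each other, so the reduced 18-word code is still self-complementary.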


Subjects
Genetic Code , Introns , Introns/genetics , Animals , Genetic Code/genetics , Evolution, Molecular
12.
Biosystems ; : 105263, 2024 Jul 04.
Article in English | MEDLINE | ID: mdl-38971553

ABSTRACT

In this work we present an analysis of dinucleotide occurrences in the three codon sites 1-2, 2-3 and 1-3. It is based on a computation of the codon usage of three large sets of bacterial, archaeal and eukaryotic genes, using the same method that identified a maximal C3 self-complementary trinucleotide circular code X in genes of bacteria and eukaryotes in 1996 (Arquès and Michel, 1996). Surprisingly, two dinucleotide circular codes are identified in the codon sites 1-2 and 2-3. Furthermore, these two codes are shifted versions of each other. Moreover, the dinucleotide code in the codon site 1-3 is circular, self-complementary and contained in the projection of X onto the 1st and 3rd bases, i.e. the set obtained by cutting the middle base of each codon of X. We prove several results showing that the circularity and the self-complementarity of trinucleotide codes are induced by the circularity and the self-complementarity of their dinucleotide cut codes. Finally, we present several evolutionary approaches for the emergence of trinucleotide codes from dinucleotide codes.
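The dinucleotide "cut" codes of X can be computed by projecting each trinucleotide onto two of its sites. This sketch uses hypothetical helper names and computes only the set-theoretic projections of X. The codes reported in the abstract are derived from codon-usage statistics; the abstract states that the one found at sites 1-3 is contained in the projection. Because X is self-complementary, its 1-3 cut is self-complementary and its 2-3 cut is the reverse complement of its 1-2 cut.

```python
from typing import FrozenSet, Iterable, Tuple

# The 20 trinucleotides of the circular code X (Arquès and Michel, 1996).
X: FrozenSet[str] = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(word: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(word))


def is_self_complementary(code: Iterable[str]) -> bool:
    words = set(code)
    return all(reverse_complement(w) in words for w in words)


def cut(code: FrozenSet[str], sites: Tuple[int, int]) -> FrozenSet[str]:
    """Project every trinucleotide onto two of its sites (1-based); e.g.
    sites (1, 3) keeps the 1st and 3rd bases and drops the middle one."""
    i, j = sites[0] - 1, sites[1] - 1
    return frozenset(w[i] + w[j] for w in code)


x13 = cut(X, (1, 3))
print(sorted(x13))  # ['AC', 'AT', 'CC', 'CG', 'GA', 'GC', 'GG', 'GT', 'TC']
print(is_self_complementary(x13))  # True
```

The full 1-3 projection still contains the periodic dinucleotides CC and GG, so it is not itself circular. Only a subset of it, such as the code reported above, can be.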

13.
J Theor Biol ; 319: 116-21, 2013 Feb 21.
Article in English | MEDLINE | ID: mdl-23206387

ABSTRACT

We identify here a combinatorial property relating circular codes and the genetic code. A circular code of 20 trinucleotides, which allows the reading frame to be retrieved, has a permuted set of 20 trinucleotides which is a code, though not circular, coding the 20 amino acids in variant nuclear codes. This result is a contribution to the research field analysing the mathematical properties of genetic codes.


Subjects
Amino Acids , Genetic Code/physiology , Models, Genetic
14.
Biosystems ; 229: 104906, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37196893

ABSTRACT

In this article, we introduce the new mathematical concept of circular mixed sets of words over an arbitrary finite alphabet. These circular mixed sets may not be codes in the classical sense and hence allow a higher amount of information to be encoded. After describing their basic properties, we generalize a recent graph theoretical approach for circularity and apply it to distinguish codes from sets (i.e. non-codes). Moreover, several methods are given to construct circular mixed sets. Finally, this approach allows us to propose a new evolution model of the present genetic code that could have evolved from a dinucleotide world to a trinucleotide world via circular mixed sets of dinucleotides and trinucleotides.


Subjects
Genetic Code , Models, Genetic , Genetic Code/genetics
16.
Bull Math Biol ; 74(8): 1764-88, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22644340

ABSTRACT

We introduce here a gene evolution model which is an extension of the time-continuous stochastic IDIS model (Lèbre and Michel in J. Comput. Biol. Chem. 34:259-267, 2010) to sequence length. This new IDISL (Insertion Deletion Independent of Substitution based on sequence Length) model gives an analytical expression of the residue occurrence probability p(l) at sequence length l depending on stochastically independent processes of substitution, insertion and deletion. Furthermore, in contrast to all mathematical models in this research field, the substitution, insertion and deletion parameters of the IDISL model are independent of each other. For any diagonalizable substitution matrix M, the residue occurrence probability p(l) is given as a function of the eigenvalues of M, the eigenvector matrix of M, a vector r of the residue insertion rates, a deletion rate d (unlike our previous IDIS model), and a vector of the initial residue occurrence probability p(l_0) at sequence length l_0. As another difference from the classical evolution approaches, which mainly focus on sequence alignment, the IDIS class of models allows a mathematical analysis of the behavior of the residue occurrence probability according to either evolution time or sequence length. The length parameter can be associated with any nucleotide region: genes, genomes, introns, repeats, 5' and 3' regions, etc. Three properties of the IDISL model are given in relation to the sequence length l: parameter scale, inverse evolution, and residue equilibrium distribution. Nucleotide occurrence probabilities are given in the particular case of the IDISL-HKY model, i.e. the IDISL model associated with the HKY asymmetric substitution matrix (Hasegawa et al. in J. Mol. Evol. 22:160-174, 1985). An application of the IDISL model is developed for a massive statistical analysis of GC content in all complete bacterial genomes available to date (894 non-anaerobic and anaerobic genomes).
The IDISL-HKY model confirms the increase of the GC content with the genome length for two non-anaerobic taxonomic groups of bacterial genomes. Moreover, the non-linear modelling proposed by the IDISL model outperforms the most recent modelling of GC content in these bacterial genomes (Wang et al. in Biochem. Biophys. Res. Commun. 342:681-684, 2006; Musto et al. in Biochem. Biophys. Res. Commun. 347:1-3, 2006).


Subjects
Base Composition , Evolution, Molecular , Genome, Bacterial , Models, Genetic , Sequence Deletion , Computer Simulation , Nucleotides/genetics
17.
Biosystems ; 217: 104667, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35351587

ABSTRACT

A code X is (⩾k)-circular if every concatenation of words from X that admits, when read on a circle, more than one partition into words from X, must contain at least k+1 words. In other words, the reading frame retrieval is guaranteed for any concatenation of up to k words from X. A code that is (⩾k)-circular for all integers k is said to be circular. Any code is (⩾0)-circular, and it turns out that a code of trinucleotides is circular as soon as it is (⩾4)-circular. A code is k-circular if it is (⩾k)-circular and not (⩾k+1)-circular. Due to the explosive combinatorics of trinucleotide k-circular codes, we developed three classes of algorithms based on: (i) the smallest directed cycles (directed girth) in graphs; (ii) the eigenvalues of matrices; and (iii) files that incrementally save partial results. These different approaches also allow us to verify the computational results obtained. We determine here the growth functions of trinucleotide k-circular codes, with k varying between 0 and 4, in the general case and in various particular cases: minimum, minimal, maximum, self-complementary k-, (k,k,k)- and self-complementary (k,k,k)-circular.
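Two of the ingredients named in (i) and (ii), directed girth and matrix spectra, can be illustrated on the graph G(X) of a trinucleotide code (edges b1 -> b2b3 and b1b2 -> b3 for each word b1b2b3). trace(A^m) counts closed walks of length m in a graph with adjacency matrix A. The smallest m with a nonzero trace is therefore the directed girth. If no such m up to the number of vertices exists, the graph is acyclic, i.e. A is nilpotent (all its eigenvalues are zero). This is a generic sketch with hypothetical helper names. It does not reproduce the exact correspondence between girth and k-circularity used by the authors.

```python
from typing import Iterable, List, Optional, Tuple

Matrix = List[List[int]]


def graph_matrix(code: Iterable[str]) -> Tuple[List[str], Matrix]:
    """Adjacency matrix of G(X): b1 -> b2b3 and b1b2 -> b3 for each word b1b2b3."""
    words = list(code)
    vertices = sorted({w[0] for w in words} | {w[2] for w in words}
                      | {w[:2] for w in words} | {w[1:] for w in words})
    index = {v: i for i, v in enumerate(vertices)}
    a = [[0] * len(vertices) for _ in vertices]
    for w in words:
        a[index[w[0]]][index[w[1:]]] = 1
        a[index[w[:2]]][index[w[2]]] = 1
    return vertices, a


def matmul(p: Matrix, q: Matrix) -> Matrix:
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def directed_girth(code: Iterable[str]) -> Optional[int]:
    """Length of a shortest directed cycle of G(X), or None if G(X) is acyclic.

    Every closed walk contains a cycle no longer than itself, so the first m
    with trace(A^m) > 0 is the girth. A cycle has at most n vertices, so a zero
    trace up to m = n means the graph is acyclic (A is nilpotent).
    """
    _, a = graph_matrix(code)
    n = len(a)
    power = a
    for m in range(1, n + 1):
        if any(power[i][i] for i in range(n)):
            return m
        power = matmul(power, a)
    return None


X = {"AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
     "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC"}
print(directed_girth(X), directed_girth({"ACG", "CGA"}))  # None 2
```

G(X) is bipartite between nucleotides and dinucleotides, so every cycle has even length. The non-circular code {ACG, CGA} gives the shortest possible cycle, A -> CG -> A.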


Subjects
Genetic Code , Models, Genetic , Genetic Code/genetics , Reading Frames
18.
Biosystems ; 217: 104668, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35358608

ABSTRACT

A code X is (⩾k)-circular if every concatenation of words from X that admits, when read on a circle, more than one partition into words from X, must contain at least k+1 words. In other words, the reading frame retrieval is guaranteed for any concatenation of up to k words from X. A code that is (⩾k)-circular for all integers k is said to be circular. Any code is (⩾0)-circular, and it turns out that a code of trinucleotides is circular as soon as it is (⩾4)-circular. A code is k-circular if it is (⩾k)-circular and not (⩾k+1)-circular. The theoretical aspects of trinucleotide k-circular codes have been developed in a companion article (Michel et al., 2022). Trinucleotide circular codes always retrieve the reading frame, leaving no ambiguous sequences. In contrast, trinucleotide k-circular codes for k∈{0,1,2,3} all have ambiguous sequences, for which the reading frame cannot always be retrieved. However, such a trinucleotide k-circular code is still able to retrieve the reading frame for a number of sequences, thereby exhibiting a partial circularity property. We describe this combinatorial property for each class of trinucleotide k-circular codes with k∈{0,1,2,3}. The circularity, i.e. the reading frame retrieval, is an ordinary property in genes. In order to consider the different cases of ambiguous sequences, we derive a new and general formula to measure the reading frame loss, whatever the trinucleotide k-circular code. This formula allows us to study the evolution of any trinucleotide k-circular code of (maximal) cardinality 20 to the genetic code, based on the reading frame retrieval property. We apply this approach to analyse the evolution of the trinucleotide circular code X observed in genes to the genetic code. The (⩾1)-circular codes of maximal size 20 necessarily have the same number of each nucleotide, specifically 15=3⋅20/4. This balanceness property can also be achieved by trinucleotide codes of cardinality 4, 8, 12 and 16.
We call such trinucleotide codes balanced. We develop a general mathematical method to compute the number of balanced trinucleotide codes of each size, which also applies to self-complementary trinucleotide codes. We establish and quantify a relation between this balanceness property and the self-complementarity property. The combinatorial hierarchy of trinucleotide k-circular codes is updated with the growth function results. The numbers of amino acids coded by the trinucleotide k-circular codes are given for the cases maximal, minimal, self-complementary k-, (k,k,k)- and self-complementary (k,k,k)-circular.
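The balanceness property is easy to check on X itself. Twenty trinucleotides contain 60 bases, so a balanced code of this size has 3 * 20 / 4 = 15 copies of each nucleotide. The following is a minimal sketch with hypothetical helper names.

```python
from collections import Counter
from typing import FrozenSet, Iterable

# The 20 trinucleotides of the circular code X (Arquès and Michel, 1996).
X: FrozenSet[str] = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})


def nucleotide_counts(code: Iterable[str]) -> Counter:
    """Number of occurrences of each base over all words of the code."""
    return Counter("".join(code))


def is_balanced(code: Iterable[str]) -> bool:
    """True if each of A, C, G, T occurs exactly 3|code|/4 times."""
    words = list(code)
    total = 3 * len(words)
    if total % 4:
        return False  # only sizes divisible by 4 can be balanced
    counts = nucleotide_counts(words)
    return all(counts[base] == total // 4 for base in "ACGT")


print(dict(sorted(nucleotide_counts(X).items())))  # {'A': 15, 'C': 15, 'G': 15, 'T': 15}
```

The divisibility check mirrors the remark above: a trinucleotide code can only be balanced when its size is a multiple of 4.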


Subjects
Genetic Code , Models, Genetic , Biology , Genetic Code/genetics , Nucleotides/genetics , Reading Frames
19.
J Theor Biol ; 288: 73-83, 2011 Nov 07.
Article in English | MEDLINE | ID: mdl-21827770

ABSTRACT

We generalize here the classical stochastic substitution models of nucleotides to genetic motifs of any size. This generalized model gives the analytical occurrence probabilities of genetic motifs as a function of a substitution matrix containing up to three formal parameters (substitution rates) per motif site and of an initial occurrence probability vector of genetic motifs. The evolution direction can be direct (past-present) or inverse (present-past). This extension has been made due to the identification of a Kronecker relation between the nucleotide substitution matrices and the motif substitution matrices. The evolution models for motifs of size 4 (tetranucleotides) and 5 (pentanucleotides) are now included in the SEGM (Stochastic Evolution of Genetic Motifs) web server.
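This is a minimal illustration of the Kronecker relation mentioned here, assuming that sites evolve independently. Under that assumption, the transition probability matrix over dinucleotides is the Kronecker product of the per-site matrices. The Jukes-Cantor (JC69) matrix is used only as a simple stand-in. It is not the SEGM parameterization with up to three parameters per site, and the helper names and rates are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def jc69(alpha: float, t: float) -> Matrix:
    """JC69 transition probabilities for one site (order A, C, G, T); alpha is
    the rate of each specific substitution, so P_ii = 1/4 + 3/4 exp(-4 alpha t)."""
    e = math.exp(-4.0 * alpha * t)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product: entry [i*len(b) + k][j*len(b[0]) + m] = a[i][j] * b[k][m]."""
    return [[a[i][j] * b[k][m] for j in range(len(a[0])) for m in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


# Dinucleotide transition matrix, assuming independent sites with a different
# (hypothetical) rate at each site; rows and columns follow AA, AC, ..., TT.
site1, site2 = jc69(alpha=0.1, t=1.0), jc69(alpha=0.3, t=1.0)
p2 = kron(site1, site2)
print(len(p2), all(abs(sum(row) - 1.0) < 1e-12 for row in p2))  # 16 True
```

Applying `kron` again with a third site matrix gives the 64 x 64 trinucleotide case. For the corresponding rate matrices, the analogous construction for independent sites is the Kronecker sum Q1 ⊗ I + I ⊗ Q2.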


Subjects
Evolution, Molecular , Models, Genetic , Nucleotide Motifs/genetics , Animals , Internet , Software , Stochastic Processes
20.
Biosystems ; 206: 104431, 2021 Aug.
Article in English | MEDLINE | ID: mdl-33894288

ABSTRACT

X motifs, i.e. motifs from the circular code X, are enriched in the (protein coding) genes of bacteria, archaea, eukaryotes, plasmids and viruses. They are also enriched in the minimal gene set common to the three domains of life, as well as in tRNA and rRNA sequences. They allow the reading frame in genes to be retrieved, maintained and synchronized, and they contribute to the regulation of gene expression. These results motivate a theoretical study of genes based on the circular code alphabet. A new occurrence relation of the circular code X under the hypothesis of equiprobable (balanced) strand pairing is given. Surprisingly, a statistical analysis of a large set of bacterial genes retrieves this relation on the circular code alphabet, but not on the DNA alphabet. Furthermore, the circular code X has the strongest balanced circular code pairing among the 216 maximal C3 self-complementary trinucleotide circular codes, a new property of this circular code X. As an application of this theory, different tRNAs studied on the circular code alphabet reveal an unexpected stem structure. Thus, the circular code X would have constructed a coding stem in tRNAs as an outline of the future gene structure and the future DNA double helix.


Subjects
Genes, Bacterial/physiology , Genetic Code/physiology , RNA, Circular/physiology , RNA, Transfer/physiology , Animals , Humans