Results 1 - 20 of 45
1.
Dokl Biochem Biophys ; 477(1): 398-400, 2017 Nov.
Article in English | MEDLINE | ID: mdl-29297128

ABSTRACT

A new mathematical method was used for the first time to search for tandem repeats with insertions and deletions in the full-length sequence of the A. thaliana genome. The method is based on a new algorithm for multiple alignment of sequences of certain periods without using paired comparisons of sequences. We identified 13997 periodic sites 2 to 50 characters long, only approximately 30% of which were known earlier. The possible origin and use of the identified sites with tandem repeats are discussed.


Subjects
Arabidopsis/genetics , Plant Genome/genetics , Insertional Mutagenesis , Sequence Deletion , Tandem Repeat Sequences/genetics
2.
Biofizika ; 60(6): 1057-68, 2015.
Article in Russian | MEDLINE | ID: mdl-26841498

ABSTRACT

A mathematical method was developed to search for latent periodicity in protein amino acid sequences and other symbolic sequences using dynamic programming and random matrices. The method permits detection of latent periodicity with insertions and deletions at previously unknown positions. The developed method was applied to search for periodicity in the amino acid sequences of some proteins and in the EUR/USD exchange rate since 2001. The presence of long period lengths with insertions and deletions in amino acid sequences was shown. A period length of 7 amino acids was found in proteins containing supercoiled regions (coiled coils); period lengths of 6 and 5 and more amino acids were also demonstrated. Period lengths of 6 and 7 days as well as 24 and 25 hours, which can be detected only when insertions and deletions are allowed, were revealed in the analyzed financial time series. The reasons for the occurrence of latent periodicity with insertions and deletions in amino acid sequences and financial time series are discussed.


Subjects
Amino Acid Sequence/genetics , Amino Acids/genetics , Theoretical Models , Algorithms , INDEL Mutation/genetics , Proteins/chemistry , Proteins/genetics
3.
Appl Biochem Microbiol ; 57(2): 271-279, 2021.
Article in English | MEDLINE | ID: mdl-33727728

ABSTRACT

In Russia and around the world, there are important questions regarding the potential threats to national and biological safety created by genetic technologies and the need to improve or introduce new, justified, and adequate measures for their control, regulation, and prevention. The article shows that a significant volume of the global market is occupied by five major transgenic crops, and producers are ready to switch to crops with an edited genome that has been approved in the United States, Argentina, and other countries. We propose a qualitatively new approach to the risk assessment of edited plants, "Safe Design," and we have also developed an extremely important, fundamentally new approach to the development of methods that combine next-generation sequencing (NGS) and Bioinformatics for the assessment of the crop import biosafety. The proposed mathematical approach provides a detailed analysis of the possible insertions of DNA fragments into the genome of edited crops and a clarification of their biological significance. The developed method can be used in the rapid screening of plants for the presence of potentially dangerous genes, viral sequences, and nonspecific promoter sequences.

4.
DNA Res ; 26(2): 157-170, 2019 Apr 01.
Article in English | MEDLINE | ID: mdl-30726896

ABSTRACT

A new mathematical method for potential reading frameshift detection in protein-coding sequences (cds) was developed. The algorithm is adjusted to the triplet periodicity of each analysed sequence using dynamic programming and a genetic algorithm. This does not require any preliminary training. Using the developed method, cds from the Arabidopsis thaliana genome were analysed. In total, the algorithm found 9,930 sequences containing one or more potential reading frameshift(s). This is ∼21% of all analysed sequences of the genome. The Type I and Type II error rates were estimated as 11% and 30%, respectively. Similar results were obtained for the genomes of Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Rattus norvegicus and Xenopus tropicalis. Also, the developed algorithm was tested on 17 bacterial genomes. We compared our results with the previously obtained data on the search for potential reading frameshifts in these genomes. This study discusses the possibility that the reading frameshift is a relatively frequent mutation and that it could participate in the creation of new genes and proteins.


Subjects
Algorithms , Arabidopsis/genetics , Frameshift Mutation , Genome , Open Reading Frames , DNA Sequence Analysis/methods , Animals , Bacteria/genetics , Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Humans , Rats
5.
Gene ; 421(1-2): 52-60, 2008 Sep 15.
Article in English | MEDLINE | ID: mdl-18593596

ABSTRACT

We introduce a new concept of triplet periodicity class (TPC) and a measure of similarity between such classes. We classified 472288 triplet periodicity (TP) regions found in 578868 genes from the 29th release of the KEGG databank. In total, 2520 classes were obtained. They contain 94% of the 472288 TP cases found. For 92% of the TP regions contained in classes, the same linkage of TP to the open reading frame (ORF) is observed. For 8% of TP cases we revealed a shift between the ORF of a gene and the ORF common to the majority of genes contained in a TPC. For these 8% of periodic regions, hypothetical amino acid sequences corresponding to the ORF built by the TPC were made. The BLAST program has shown that 2679 hypothetical amino acid sequences have statistically significant similarity with proteins from the UniProt databank. We suppose that 8% of the TP regions contained in classes possess a mutation originating from an ORF shift. The obtained TPCs can be used for identification of genes' coding regions as well as for searching for mutations arising from ORF shifts.


Subjects
Open Reading Frames , Proteins/genetics , Algorithms , Amino Acid Sequence , Base Sequence , Classification/methods , Genes , Molecular Sequence Data , Proteins/chemistry , DNA Sequence Analysis , Protein Sequence Analysis
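
Illustrative note on the abstract above: the idea of linking a triplet periodicity region to a reading frame can be sketched as follows. This is a minimal sketch, not the authors' method. The abstract does not define its similarity measure, so the squared-distance comparison, the function names `triplet_matrix` and `best_frame_shift`, and the toy sequences are all assumptions made for illustration.

```python
BASES = "ACGT"

def triplet_matrix(seq, offset=0):
    """Return a 3x4 matrix of base frequencies at codon positions 0, 1, 2.

    `offset` shifts the assumed reading frame: base i is assigned to
    codon position (i + offset) % 3.
    """
    counts = [[0] * 4 for _ in range(3)]
    for i, base in enumerate(seq.upper()):
        if base in BASES:
            counts[(i + offset) % 3][BASES.index(base)] += 1
    total = sum(sum(row) for row in counts) or 1
    return [[c / total for c in row] for row in counts]

def best_frame_shift(region, class_matrix):
    """Return the offset (0, 1 or 2) whose matrix is closest to class_matrix.

    Closeness is plain squared distance here (an assumption; the source
    does not specify its measure). A nonzero result suggests the region is
    out of frame relative to the class.
    """
    def distance(m1, m2):
        return sum((x - y) ** 2
                   for r1, r2 in zip(m1, m2)
                   for x, y in zip(r1, r2))
    return min(range(3),
               key=lambda s: distance(triplet_matrix(region, s), class_matrix))

# Toy class built from a perfectly periodic sequence (hypothetical data).
class_matrix = triplet_matrix("GCA" * 30)
print(best_frame_shift("GCA" * 10, class_matrix))  # region in frame with the class
print(best_frame_shift("CAG" * 10, class_matrix))  # region with its first base deleted
```

In this toy case the second region is the first one with a single leading base removed, so the best-matching offset differs from zero; real coding regions are far noisier and would need a statistical threshold.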
6.
Mol Biol (Mosk) ; 42(4): 707-20, 2008.
Article in Russian | MEDLINE | ID: mdl-18856072

ABSTRACT

We classified 472,288 regions of triplet periodicity found in 578,868 genes from release 29 of the KEGG databank. A new concept of triplet periodicity class and a measure of similarity between such classes are introduced. In total, 2520 classes were created, which contain 94% of the triplet periodicity regions found. For 92% of the triplet periodicity regions contained in classes, an identical linkage of triplet periodicity to the reading frame is observed. For the remaining triplet periodicity cases, a shift was observed between the reading frame of a gene and the reading frame common to the majority of genes contained in a triplet periodicity class. These periodicity regions were translated into hypothetical amino acid sequences in accordance with the reading frame built by the triplet periodicity class. The BLAST program showed that 2660 hypothetical amino acid sequences have statistically significant similarity with proteins from the UniProt databank. We suppose that 8% of the triplet periodicity regions that joined classes mutated by means of a reading frame shift. The created classes of triplet periodicity can be used for identification of coding regions of genes as well as for searching for mutations arising from reading frame shifts.


Subjects
Genetic Databases , Frameshift Mutation , Genetic Models , Open Reading Frames/genetics , Protein Sequence Analysis/methods , Trinucleotide Repeats/genetics
7.
Genetika ; 44(1): 120-36, 2008 Jan.
Article in Russian | MEDLINE | ID: mdl-18409394

ABSTRACT

The information decomposition (ID) method has been used to search for dinucleotide periodicities, including latent ones, in plant genomes. In nucleotide sequences of genomes of various plants from the GenBank database, 14766 sequences with a periodicity of two nucleotides have been found. Classification of the periodicity matrices of the detected DNA sequences has yielded 141 classes of dinucleotide periodicity. Since ID does not detect periodicities with nucleotide deletions or insertions, modified profile analysis (MPA) has been applied to the obtained classes to reveal DNA sequences with dinucleotide periodicities containing nucleotide deletions and insertions. Combined use of ID and MPA has permitted the detection of 80396 DNA sequences with dinucleotide periodicities in the genomes of various plants. The biological role of dinucleotide periodicity in the detected sequences is discussed.


Subjects
Plant DNA/genetics , Dinucleotide Repeats/genetics , Plant Genome/genetics , Genetic Models , Plants/genetics , DNA Sequence Analysis
8.
Mol Biol (Mosk) ; 39(3): 420-36, 2005.
Article in Russian | MEDLINE | ID: mdl-15981572

ABSTRACT

We identified latent periodicity in catalytic domains of approximately 85% of serine/threonine and tyrosine protein kinases. Similar results were obtained for 22 other protein domains. We also designed the method of noise decomposition, which aims to distinguish between different periodicity types of the same period length. The method is to be used in conjunction with cyclic profile alignment, and this combination is able to reveal structure-related or function-related patterns of latent periodicity. Possible origins of the periodic structure of protein kinase active sites are discussed. In summary, we presume that latent periodicity is a common property of many catalytic protein domains.


Subjects
Algorithms , Protein Serine-Threonine Kinases/chemistry , Protein-Tyrosine Kinases/chemistry , Amino Acid Sequence , Animals , Computational Biology , Protein Databases , Humans , Molecular Sequence Data , Amino Acid Sequence Homology
9.
DNA Res ; 3(3): 157-64, 1996 Jun 30.
Article in English | MEDLINE | ID: mdl-8905233

ABSTRACT

The concept of nucleic acid sequence base alternations is presented. The number of base alterations for sequences of different lengths is established. The definition of "enlarged similarity" of nucleic acid sequences on the basis of sequence base alterations is introduced. Mutual information between sequences is used as a quantitative measure of enlarged similarity for two compared sequences. A method of calculating mutual information is developed that takes into account the correlation of bases in the compared sequences. The definitions of correlated similarity and evolution similarity between compared sequences are given. Results of applying the enlarged similarity approach to DNA sequence analysis are discussed.


Subjects
Nucleic Acid Sequence Homology , Base Sequence , Entropy , Genetic Models , Molecular Sequence Data
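
Illustrative note on the abstract above: mutual information as a similarity measure can be sketched in a few lines. This is a simplified sketch that treats aligned positions as independent draws; it does not include the base-correlation correction the abstract mentions, and the function name `mutual_information` and the toy sequences are assumptions.

```python
import math
from collections import Counter

def mutual_information(seq_a, seq_b):
    """Mutual information, in bits, between aligned positions of two
    equal-length sequences (plug-in estimate from observed frequencies)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    n = len(seq_a)
    if n == 0:
        return 0.0
    pair_counts = Counter(zip(seq_a, seq_b))
    a_counts = Counter(seq_a)
    b_counts = Counter(seq_b)
    mi = 0.0
    for (a, b), n_ab in pair_counts.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) )
        mi += (n_ab / n) * math.log2(n_ab * n / (a_counts[a] * b_counts[b]))
    return mi

# A sequence and a copy in which every base is systematically replaced
# (A->C, C->G, G->T, T->A): no position is identical, yet the two
# sequences are fully dependent on each other.
original = "ACGTACGTTGCAACGT"
recoded = original.translate(str.maketrans("ACGT", "CGTA"))
identity = sum(x == y for x, y in zip(original, recoded)) / len(original)
print(identity, mutual_information(original, recoded))
```

The example shows why such a measure is broader than percent identity: a consistent base-to-base recoding scores zero identity but maximal mutual information. On short sequences the plug-in estimate is biased upward, so any real use would need a significance assessment.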
10.
DNA Res ; 6(3): 153-63, 1999 Jun 30.
Article in English | MEDLINE | ID: mdl-10470846

ABSTRACT

An earlier reported method for revealing latent periodicity of nucleotide sequences has been considerably modified for the case of small samples by applying a Monte Carlo method. This improved method has been used to search for the latent periodicity of some nucleotide sequences of the EMBL data bank. The existence of latent periodicity of nucleotide sequences has been shown for some genes. The results obtained imply that the periodicity of gene structure is projected onto the periodicity of primary amino acid sequences and, further, onto spatial protein conformation. Even though the periodic structure of gene sequences has been eroded, it is still retained in the primary and/or spatial structures of the corresponding proteins. Furthermore, in a few cases the study of gene periodicity has suggested a possible evolutionary origin by multifold duplication of some gene fragments.


Subjects
Base Sequence/genetics , Computational Biology , Molecular Sequence Data , Monte Carlo Method
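
Illustrative note on the abstract above: one common way to apply a Monte Carlo method to small samples is to compare an observed periodicity score with the scores of shuffled copies of the same sequence. The sketch below assumes that reading; the abstract does not give the actual procedure, and the score (mutual information between position-in-period and base), the function names, and the parameter defaults are assumptions.

```python
import math
import random
from collections import Counter

def period_information(seq, period):
    """Mutual information (bits) between position-in-period and symbol."""
    n = len(seq)
    labels = [i % period for i in range(n)]
    pair_counts = Counter(zip(labels, seq))
    label_counts = Counter(labels)
    symbol_counts = Counter(seq)
    return sum((c / n) * math.log2(c * n / (label_counts[l] * symbol_counts[s]))
               for (l, s), c in pair_counts.items())

def monte_carlo_significance(seq, period, n_shuffles=200, seed=0):
    """Return (observed score, Z-score, empirical p-value) against a null
    distribution built from random shuffles that keep base composition."""
    rng = random.Random(seed)
    observed = period_information(seq, period)
    letters = list(seq)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        null.append(period_information(letters, period))
    mean = sum(null) / n_shuffles
    std = math.sqrt(sum((x - mean) ** 2 for x in null) / (n_shuffles - 1))
    z = (observed - mean) / std if std > 0 else float("inf")
    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (1 + sum(x >= observed for x in null)) / (n_shuffles + 1)
    return observed, z, p_value

obs, z, p = monte_carlo_significance("ACGT" * 10, period=4)
print(obs, z, p)
```

Shuffling preserves base composition, so the null distribution reflects the small-sample bias of the score for that particular sequence length and composition instead of relying on an asymptotic approximation.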
11.
DNA Seq ; 4(6): 413-5, 1994.
Article in English | MEDLINE | ID: mdl-7841466

ABSTRACT

A new algorithm for scanning sequences is described. This algorithm uses the boolean operators AND and OR. The mutual information between the sequences is used as a measure of sequence interrelation. It allows evaluation of the probability of accidental sequence interrelation in a quantitative manner. The proposed algorithm was used for searching for MB1 repeats in human and other mammalian sequences.


Subjects
Algorithms , Sequence Alignment/methods , DNA Sequence Analysis , Animals , Factual Databases , Humans , Purine Nucleotides , Pyrimidine Nucleotides , Nucleic Acid Repetitive Sequences
12.
DNA Seq ; 5(6): 353-8, 1995.
Article in English | MEDLINE | ID: mdl-8777314

ABSTRACT

Mutual information is used to reveal latent periodicity of DNA sequences. Latent periodicity of a DNA sequence is periodicity with a low level of homology between any two periods inside the DNA sequence. The mutual information between an artificial numerical sequence and a DNA sequence is calculated. The period length of the artificial sequence is varied from 2 to 250. A high level of mutual information between the artificial and DNA sequences makes it possible to find any type of latent periodicity in a DNA sequence. The latent periodicity of some DNA coding regions is considered. For example, exon 24 of the Apo B-100 gene from the HSAP821 clone contains a latent period 84 bases long. The IGF-I receptor gene from the HSIGFIRR clone contains a region with a latent period 57 bases long. The possible significance of latent periodicity is discussed.


Subjects
DNA/genetics , Human Genome , Nucleic Acid Repetitive Sequences , Apolipoprotein B-100 , Apolipoproteins B/genetics , Base Sequence , Humans , Molecular Sequence Data , IGF Type 1 Receptor/genetics , Nucleic Acid Sequence Homology
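
Illustrative note on the abstract above: the scan it describes (an artificial periodic numerical sequence compared with the DNA sequence, for period lengths from 2 to 250) can be sketched as below. This is a minimal sketch under assumptions: the artificial sequence is taken to be 0, 1, ..., p-1 repeated, the score is the plug-in mutual information in bits, and the names `periodicity_information` and `scan_periods` are hypothetical.

```python
import math
from collections import Counter

def periodicity_information(dna, period):
    """Mutual information (bits) between the artificial sequence
    0, 1, ..., period-1, 0, 1, ... and the bases of `dna`."""
    n = len(dna)
    labels = [i % period for i in range(n)]
    pair_counts = Counter(zip(labels, dna))
    label_counts = Counter(labels)
    base_counts = Counter(dna)
    return sum((c / n) * math.log2(c * n / (label_counts[l] * base_counts[b]))
               for (l, b), c in pair_counts.items())

def scan_periods(dna, min_period=2, max_period=250):
    """Score every candidate period; periods longer than half the
    sequence are skipped because they cannot repeat even twice."""
    max_period = min(max_period, len(dna) // 2)
    return {p: periodicity_information(dna, p)
            for p in range(min_period, max_period + 1)}

dna = "ACG" * 20
scores = scan_periods(dna)
print(scores[2], scores[3])
```

Raw scores are not directly comparable across periods: a longer period means a larger contingency table and a larger upward bias, and multiples of the true period score as high as the period itself. A standard remedy is the G-test, where 2N times the mutual information in nats is compared with a chi-squared distribution with 3(p-1) degrees of freedom for a four-letter alphabet.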
13.
DNA Seq ; 14(1): 33-52, 2003 Feb.
Article in English | MEDLINE | ID: mdl-12751330

ABSTRACT

The existence of a typical latent periodicity of 21 bases from the Tar chemoreceptor gene of Escherichia coli (E. coli) (MCP II) in bacterial genes has been investigated in this work. Among 583 annotated bacterial genes and ORFs in GenBank in which the typical periodicity has been found, the chemoreceptor genes constituted the most numerous group (18.5%). This typical latent periodicity of 21 bases has been revealed in many different genes of regulatory proteins, DNA polymerases, reductases, kinases and others. The numbers in such gene groups varied from 1 to 4% of the total analyzed genes. The 2D-structure analysis of the amino acid residues translated from the gene regions with 21-base periodicity has shown that, although the enrichment of alpha-helical structures in such sequences is kept in all cases, the latent periodicity of 21 bases is a very sensitively tuned basis, allowing the translated residues to change smoothly from one conformation to another. Interesting results have been obtained for the 16S rRNA genes of proteobacteria. Short determinant sequences have been revealed in these genes, which select beta and gamma proteobacteria with an accuracy above 90%.


Subjects
Bacterial DNA/genetics , Escherichia coli Proteins/genetics , Bacterial Genes/genetics , Cell Surface Receptors/genetics , Bacterial Proteins , Base Sequence , Chemoreceptor Cells , Nucleic Acid Databases , Escherichia coli/genetics , Molecular Sequence Data , Nucleic Acid Conformation , 16S Ribosomal RNA/chemistry , 16S Ribosomal RNA/genetics , Sequence Alignment/methods , Nucleic Acid Sequence Homology
14.
DNA Seq ; 8(1-2): 31-8, 1997.
Article in English | MEDLINE | ID: mdl-9522118

ABSTRACT

By using a weighted function and the method of enlarged similarity a search has been performed to identify mammalian interspersed repeats (MIRs) in DNA sequences from the EMBL data bank. The existence of MIRs is shown in coding regions of human genes and also in chicken and duck genomes. It is possible to conclude from the results obtained that MIRs were established in the coding regions of some genes and may have taken part in gene evolution. Furthermore, MIRs may have been amplified in vertebrate genomes before the origin of mammals.


Subjects
Algorithms , Genetic Models , Nucleic Acid Repetitive Sequences , Animals , Base Sequence , Birds/genetics , Conserved Sequence , DNA Transposable Elements , Factual Databases , Molecular Evolution , Humans , Mammals/genetics , Molecular Sequence Data , Nucleic Acid Sequence Homology
15.
Mol Biol (Mosk) ; 21(2): 478-83, 1987.
Article in Russian | MEDLINE | ID: mdl-3600625

ABSTRACT

Information theory methods were used for a computer search for Alu-like sequences in human DNA and RNA. Eight new regions related to the Alu repeat sequence were revealed in 85 clones from the EMBL-5 data bank. Some of these regions are purine-pyrimidine images of the Alu repeat sequence; the rest are more complex images of the Alu repeat sequence. A new definition for the likeness of different sequences, the information image of a sequence, was introduced. This application of information theory greatly increases the power of computer analysis of DNA sequences.


Subjects
Molecular Cloning , DNA/genetics , Nucleic Acid Repetitive Sequences , Humans , Genetic Models
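
Illustrative note on the abstract above: a purine-pyrimidine image can be understood as the sequence rewritten in a two-letter alphabet. The sketch below uses the IUPAC codes R (purine: A, G) and Y (pyrimidine: C, T, U); the function name `purine_pyrimidine_image` and the toy sequences are assumptions, and the abstract's more general "information image" is not reproduced here.

```python
PURINES = set("AG")
PYRIMIDINES = set("CTU")

def purine_pyrimidine_image(seq):
    """Rewrite a nucleotide sequence as R (purine) / Y (pyrimidine).
    Any other symbol is mapped to N."""
    image = []
    for base in seq.upper():
        if base in PURINES:
            image.append("R")
        elif base in PYRIMIDINES:
            image.append("Y")
        else:
            image.append("N")
    return "".join(image)

# Two toy sequences that differ at every position still share one image.
a, b = "ACGTAC", "GTACGT"
print(purine_pyrimidine_image(a), purine_pyrimidine_image(b))
```

Two sequences with no identical positions can therefore still be exact copies of each other at the purine-pyrimidine level, which is the kind of relation an ordinary identity-based comparison would miss.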
16.
Mol Biol (Mosk) ; 25(1): 250-63, 1991.
Article in Russian | MEDLINE | ID: mdl-1896037

ABSTRACT

A new family of repeats, the MB1 repeat family, with a few hundred thousand copies per human genome, has been revealed in the human genome by computer analysis of noncanonical similarity of nucleic acid sequences. Members of this family have also been revealed in the genomes of mouse and rat; they have been identified as mirror-reflected copies (in purines and pyrimidines) of B1 repeats in the mouse genome and of Alu repeats in the human genome. The MB1 repeats tend to remain most similar over a length of 70 bp. They are not flanked by short repeats and do not contain a poly(A) region at the 3' end, which distinguishes them from repeats of the SINE family. It has been assumed that members of the Alu repeat family and the MB1 repeat family can form the so-called H-form of DNA. The mirror-reflected repeat family could have been formed by replication of parallel DNA strands.


Subjects
Human Genome , Nucleic Acid Repetitive Sequences , Base Sequence , Humans , Molecular Sequence Data , Software
17.
Mol Biol (Mosk) ; 35(3): 376-82, 2001.
Article in Russian | MEDLINE | ID: mdl-11443916

ABSTRACT

The location of mammalian interspersed repeats (MIRs) and their density have been determined in the complete nucleotide sequence of human chromosome 22. Our approach has allowed detection of 9675 MIRs at a statistically significant level, which exceeds by 15% the number of MIRs revealed by all previous approaches. It has been demonstrated that a considerable number of MIRs missed by the algorithms applied earlier occur in known DNA sequences of the human genome. The study of the MIR density revealed substantial irregularity of their distribution along the chromosome. The data on the MIRs thus found and the computer program searching for diverged sequences are available by E-mail: katrin2@mail.ru or katrin22@mtu-net.ru.


Subjects
Human Chromosome Pair 22 , Nucleic Acid Repetitive Sequences , Algorithms , DNA/genetics , Humans
18.
Mol Biol (Mosk) ; 35(6): 1023-31, 2001.
Article in Russian | MEDLINE | ID: mdl-11771126

ABSTRACT

A search for new members of the mammalian interspersed repeat (MIR) family was carried out over the coding regions of the human genome from GenBank-116. Only 254 nucleotide sequences contained MIRs in coding regions, of which 45 MIR copies were previously unknown, including 17 that occurred in translated gene regions. The program developed by the authors has been demonstrated to surpass the CENSOR program in search power. The evolution of the MIR copies located in translated regions of the human genome is discussed.


Subjects
Codon , Molecular Evolution , Human Genome , Nucleic Acid Repetitive Sequences , Base Sequence , DNA , Humans , Molecular Sequence Data , Nucleic Acid Sequence Homology
19.
Mol Biol (Mosk) ; 23(4): 1113-23, 1989.
Article in Russian | MEDLINE | ID: mdl-2586503

ABSTRACT

The sequence of the rat ID repeat was compared with those of Escherichia coli, Drosophila melanogaster, chicken, rat, mouse and human clones and clones containing tRNA genes from the EMBL-5 data bank. The comparison was based on determining the level of mutual information between the sequences compared. A non-canonical similarity of the ID repeat sequence with the tRNA genes was found. It is revealed by the conservation of purine or pyrimidine sites in the sequences compared. In human and mouse clones, purine-pyrimidine copies of the rat ID sequence were also found. In some cases these sequences were flanked by short direct repeats and also contained poly(A)-like sequences. The possible functional meaning and evolutionary origin of the revealed relations are discussed.


Subjects
DNA/genetics , Nucleic Acid Repetitive Sequences , Animals , Molecular Cloning , Humans , Mice , Molecular Sequence Data , Rats
20.
Mol Biol (Mosk) ; 37(4): 663-73, 2003.
Article in Russian | MEDLINE | ID: mdl-12942640

ABSTRACT

A program package has been developed to search for hidden tandem repeats of any specified type in protein sequence databases. The applied algorithm of locally optimal cyclic alignment is able to find subsequences possessing a certain profile-based periodicity type when no appreciable homology between periods is observed, as well as in the presence of arbitrary insertions/deletions. The profile can be adjusted to search for structurally and functionally important periodicity types. The Swiss-Prot database has been analyzed to reveal previously undetectable periodicities that are caused by the secondary and super-secondary structure regularities of NAD-binding sites. In particular, a significant periodicity of 24 aa was found to be characteristic of the absolute majority of domains possessing the Rossmann (or Rossmann-like) fold and displaying an apparent regularity in their secondary structures that is not obvious at the primary structure level.


Subjects
NAD/metabolism , Proteins/genetics , Proteins/metabolism , Sequence Alignment/methods , Software , Amino Acid Sequence , Binding Sites , Protein Databases , Molecular Sequence Data , Periodicity , Protein Folding , Proteins/chemistry , Nucleic Acid Repetitive Sequences , Structural Protein Homology