Search | VHL Regional Portal (Portal Regional da BVS)

Classification of Promoter Sequences from Human Genome.

Zaytsev, Konstantin; Fedorov, Alexey; Korotkov, Eugene.

Int J Mol Sci; 24(16), 2023 Aug 08.

Article in English | MEDLINE | ID: mdl-37628742

ABSTRACT

We have developed a new method for promoter sequence classification based on a genetic algorithm and the MAHDS sequence alignment method. We created four classes of human promoters, which together cover 17,310 of the 29,598 sequences present in the EPD database. We searched the human genome for potential promoter sequences (PPSs) using dynamic programming and position weight matrices representing each of the promoter sequence classes. A total of 3,065,317 PPSs were found. Of these, 1,241,206 were located in unannotated parts of the human genome; every other PPS intersected with true promoters, transposable elements, or interspersed repeats. We found a strong intersection between PPSs and Alu elements, as well as between PPSs and transcript start sites. The number of false positive PPSs is estimated to be 3 × 10⁻⁸ per nucleotide, which is several orders of magnitude lower than for any other promoter prediction method. The developed method can be used to search for PPSs in various eukaryotic genomes.
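To make the idea of scanning a sequence with a position weight matrix concrete, here is a minimal Python sketch. It is not the authors' method: the abstract describes dynamic-programming alignment with MAHDS-derived matrices, whereas this sketch uses a plain ungapped sliding window. The function names, the pseudocount, the uniform background, the toy sites, and the threshold are all assumptions chosen for illustration.

```python
import math

def build_pwm(sequences, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length sequences.

    Assumes a uniform background frequency (an illustrative choice).
    """
    length = len(sequences[0])
    pwm = []
    for pos in range(length):
        counts = {base: pseudocount for base in "ACGT"}
        for seq in sequences:
            counts[seq[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in "ACGT"})
    return pwm

def scan(sequence, pwm, threshold):
    """Return (start, score) for every ungapped window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Bases outside ACGT (for example N) contribute a neutral score of 0.
        score = sum(pwm[i].get(base, 0.0) for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Toy example with hypothetical TATA-like sites (not data from the paper).
toy_sites = ["TATAAA", "TATAAA", "TATATA", "TATAAA"]
toy_pwm = build_pwm(toy_sites)
hits = scan("GGGTATAAAGGG", toy_pwm, threshold=5.0)
```

A genome-scale search would additionally need gapped alignment and a calibrated threshold to reach a false positive rate such as the one quoted in the abstract; neither is attempted here.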

Subjects

Genome, Human; Humans; Alu Elements/genetics; Databases, Factual; DNA Transposable Elements/genetics

Use of 6 Nucleotide Length Words to Study the Complexity of Gene Sequences from Different Organisms.

Korotkov, Eugene; Zaytsev, Konstantin; Fedorov, Alexey.

Entropy (Basel); 24(5), 2022 Apr 30.

Article in English | MEDLINE | ID: mdl-35626518

ABSTRACT

In this paper, we attempted to find a relation between the living conditions of bacteria and the algorithmic complexity of their genomes. We developed a probabilistic mathematical method for evaluating the irregularity of occurrence of k-words (6 bases in length) in bacterial gene coding sequences. To this end, coding sequences from different bacterial genomes were analyzed, and as an index of k-word occurrence irregularity we used W, which has a distribution similar to normal. The results show that bacterial genomes can be divided into two groups of unequal size. The first, smaller group has W in the interval from 170 to 475, while the second has W from 475 to 875. Plant, metazoan, and virus genomes also have W in the same interval as the first bacterial group. We suggest that the coding sequences of the second bacterial group are much less susceptible to evolutionary changes than those of the first group. We also discuss using the W index as a measure of biological stress.
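As a rough illustration of what counting 6-nucleotide words involves, the following Python sketch tallies overlapping k-words and computes a simple chi-square deviation from a uniform expectation. This statistic is a stand-in and is not the W index, whose definition the abstract does not give. The function names, the use of overlapping windows, and the uniform expectation are assumptions.

```python
from collections import Counter
from itertools import product

def count_kmers(sequence, k=6):
    """Count overlapping k-words, skipping windows with non-ACGT symbols."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        word = sequence[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts

def chi_square_vs_uniform(counts, k=6):
    """Chi-square deviation of k-word counts from a uniform expectation.

    Illustrative irregularity measure only; it is NOT the W index.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    expected = total / 4 ** k
    return sum(
        (counts.get("".join(word), 0) - expected) ** 2 / expected
        for word in product("ACGT", repeat=k)
    )

# Toy usage on a short hypothetical sequence.
toy_counts = count_kmers("ACGTACGTAC", k=6)
```

For real coding sequences one would probably want an expectation that accounts for base composition and codon structure rather than a uniform one, which is why this is only a sketch.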
