Results 1 - 15 of 15
1.
Mol Biol Evol ; 41(6), 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38860506

ABSTRACT

Phylogenetic inference based on protein sequence alignment is a widely used procedure. Numerous phylogenetic algorithms have been developed, most of which have many parameters and options. Choosing a program, options, and parameters can be a nontrivial task. No benchmark for comparing phylogenetic programs on real protein sequences had been publicly available. We have developed PhyloBench, a benchmark for evaluating the quality of phylogenetic inference, and used it to test a number of popular phylogenetic programs. PhyloBench is based on natural, not simulated, protein sequences of orthologous evolutionary domains. The accuracy of an inferred tree is measured as its distance to the corresponding species tree. Several tree-to-tree distance measures were tested; the most reliable results were obtained with the Robinson-Foulds distance. Our results confirmed recent findings that distance methods are more accurate than maximum likelihood (ML) and maximum parsimony. We tested the Bayesian program MrBayes on natural protein sequences and found that, on our datasets, it performs better than ML but worse than distance methods. Of the methods we tested, the Balanced Minimum Evolution method implemented in FastME yielded the best results on our material. Alignments and reference species trees are available at https://mouse.belozersky.msu.ru/tools/phylobench/, together with a web interface that allows a semi-automatic comparison of a user's method with a number of popular programs.
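The Robinson-Foulds distance used above as the accuracy measure counts the bipartitions (splits) of the leaf set that occur in one tree but not in the other. The sketch below is illustrative and is not taken from PhyloBench. It assumes a hypothetical encoding of trees as nested Python tuples of leaf names and treats every tree as unrooted. The optional normalization by 2(n-3), which is the maximum for two fully resolved unrooted trees on n leaves, is also an illustrative choice.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a tree is a leaf name (str) or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf names below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    result: Set[str] = set()
    for child in tree:
        result |= leaves(child)
    return frozenset(result)


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the non-trivial splits of the tree, treated as unrooted."""
    all_leaves = leaves(tree)
    ref = min(all_leaves)  # reference leaf: each split is stored as the side without it
    n = len(all_leaves)
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        # Only splits with at least two leaves on each side are informative.
        if 2 <= len(clade) <= n - 2:
            found.add(all_leaves - clade if ref in clade else clade)
        return clade

    visit(tree)
    return found


def robinson_foulds(t1: Tree, t2: Tree, normalize: bool = False) -> float:
    """Size of the symmetric difference of the two split sets."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must share the same leaf set")
    rf = len(splits(t1) ^ splits(t2))
    if normalize:
        max_rf = 2 * (len(leaves(t1)) - 3)  # maximum for fully resolved unrooted trees
        return rf / max_rf if max_rf > 0 else 0.0
    return rf
```

Consider t1 = ((("A", "B"), "C"), ("D", "E")) and t2 = ((("A", "C"), "B"), ("D", "E")). The two trees share the split {A,B,C}|{D,E}, and each has one split the other lacks, so robinson_foulds(t1, t2) returns 2. Because the split sets ignore the root, re-rooting a tree leaves its distance to any other tree unchanged.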


Subjects
Algorithms , Phylogeny , Software , Benchmarking , Sequence Alignment/methods , Bayes Theorem , Evolution, Molecular , Computational Biology/methods
2.
Biochemistry (Mosc) ; 88(2): 253-261, 2023 Feb.
Article in English | MEDLINE | ID: mdl-37072330

ABSTRACT

Some restriction-modification systems contain two DNA methyltransferases. In the present work, we have classified such systems according to the families of catalytic domains present in the restriction endonucleases and in both DNA methyltransferases. The evolution of restriction-modification systems that contain an endonuclease with a NOV_C family domain and two DNA methyltransferases, both with DNA_methylase family domains, was investigated in detail. The phylogenetic tree of DNA methyltransferases from the systems of this class consists of two clades of the same size. The two DNA methyltransferases of each restriction-modification system of this class belong to different clades, which indicates that the two methyltransferases evolved independently. We detected multiple cross-species horizontal transfers of the systems as a whole, as well as cases of gene transfer between the systems.


Subjects
DNA Restriction-Modification Enzymes , Methyltransferases , DNA Restriction Enzymes/genetics , DNA Restriction-Modification Enzymes/genetics , Phylogeny , Methyltransferases/genetics , DNA
3.
Biochemistry (Mosc) ; 87(12): 1689-1698, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36717457

ABSTRACT

E-mail: sas@belozersky.msu.ru
Protein phylogeny is usually reconstructed from a multiple alignment of amino acid sequences. One problem with such alignments is that they contain regions with different degrees of conservation, including regions where the alignment quality is questionable. This problem is often addressed by filtering the alignment columns with software developed specifically for this purpose. In this work, we investigated various approaches to phylogeny reconstruction, using proteins with two evolutionary domains as examples. The sequences of such proteins are inherently heterogeneous in their degree of conservation: they contain the evolutionary domains, the linkers between them, and the N- and C-termini. We show that, on average, filtering the alignment columns improves the quality of reconstruction only when full-length sequences are used, and only for eukaryotic proteins. Restricting the alignment to the evolutionary domains and discarding the less conserved linkers and terminal sequences worsened the quality of phylogenetic reconstruction on average.
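Dedicated tools perform column filtering of this kind, and each applies its own criteria. The sketch below is a minimal illustration only: it drops every column whose gap fraction exceeds a threshold. The gap-only criterion, the "-" gap symbol and the 0.5 default are all assumptions. The sketch does not reproduce the filtering procedures evaluated in the study.

```python
from typing import Dict


def filter_columns(alignment: Dict[str, str],
                   max_gap_fraction: float = 0.5) -> Dict[str, str]:
    """Keep only columns whose fraction of gaps ("-") is at most max_gap_fraction.

    The threshold and the gap-only criterion are illustrative assumptions.
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = [col for col in range(length)
            if sum(1 for s in seqs if s[col] == "-") / len(seqs) <= max_gap_fraction]
    # Rebuild every sequence from the retained columns, preserving names.
    return {name: "".join(s[i] for i in keep) for name, s in alignment.items()}
```

For example, with {"s1": "MK-LV", "s2": "MKALV", "s3": "M--LI"}, only the third column (two gaps out of three) is removed, which gives "MKLV", "MKLV" and "M-LI".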


Subjects
Proteins , Software , Phylogeny , Sequence Alignment , Proteins/genetics , Proteins/chemistry , Amino Acid Sequence , Algorithms
4.
BMC Evol Biol ; 20(1): 164, 2020 Dec 11.
Article in English | MEDLINE | ID: mdl-33308147

ABSTRACT

BACKGROUND: Eukaryotic protein-coding genes consist of exons and introns. Exon-intron borders are conserved between species, so changes in them can be observed only over quite long evolutionary distances. One of the rarest types of change, in which an intron relocates over a short distance, is called "intron sliding", but whether such events really occur has been debated for a long time. A search for intron sliding requires the most accurate genome annotation and genome sequence available, as well as high-quality transcriptome data. We applied these resources to search for sliding introns in mammals, in order to establish whether this phenomenon occurs in this group. RESULTS: We did not find any significant evidence of intron sliding in the primate group (human, chimpanzee, rhesus macaque, crab-eating macaque, green monkey, marmoset). We observed only one possible intron sliding event, supported by a set of high-quality transcriptomes, between the human and sheep orthologs of the EIF1AX gene. We also checked a list of previously reported intron sliding events in mammals and showed that they are most likely artifacts of genome annotation. These events do not appear in subsequent annotation versions and are not supported by transcriptomic data. CONCLUSIONS: We suggest that intron sliding is indeed a very rare evolutionary event, if it exists at all. Every case of intron sliding requires a large amount of supporting data for detection and confirmation.


Subjects
Evolution, Molecular , Introns/genetics , Mammals/genetics , Animals , Exons/genetics , Humans , Primates/genetics , Reproducibility of Results , Sheep/genetics , Uncertainty
5.
BMC Bioinformatics ; 19(1): 374, 2018 Oct 12.
Article in English | MEDLINE | ID: mdl-30314446

ABSTRACT

BACKGROUND: Many algorithms and programs are available for phylogenetic reconstruction of protein families. The methods in wide use today rely either on one of several distance-based principles or on the character-based principles of maximum parsimony or maximum likelihood. RESULTS: We developed a novel program, named PQ, that reconstructs protein and nucleic acid phylogenies according to a new character-based principle. When tested on natural sequences, PQ improves upon the results of maximum parsimony and maximum likelihood. On alignments of 10 and 15 sequences, it also outperforms FastME, a program based on one of the distance-based principles. Among all tested programs, PQ proved to be the least susceptible to long branch attraction. However, FastME outperforms PQ on alignments of 45 sequences. We confirm a recent result that FastME outperforms maximum parsimony and maximum likelihood on natural sequences. At the same time, both PQ and FastME are inferior to maximum parsimony and maximum likelihood on simulated sequences. PQ is open source and available to the public via an online interface. CONCLUSIONS: The software we developed offers an open-source alternative for phylogenetic reconstruction of relatively small sets of proteins and nucleic acids, with up to a few tens of sequences.


Subjects
Phylogeny , Algorithms , Software
6.
Nucleic Acids Res ; 44(D1): D144-53, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26656949

ABSTRACT

The recent upgrade of the nucleic acid-protein interaction database (NPIDB, http://npidb.belozersky.msu.ru/) includes a newly elaborated classification of complexes of protein domains with double-stranded DNA and a classification of families of related complexes. Our classifications are based on the contacting structural elements of both the DNA (the major groove, the minor groove and the backbone) and the protein (helices, beta-strands and unstructured segments). We took into account both hydrogen bonds and hydrophobic interactions. The analyzed material contains 1942 structures of protein domains from 748 PDB entries. We have identified 97 interaction modes of individual protein domain-DNA complexes and 17 DNA-protein interaction classes of protein domain families. We analyzed the sources of diversity of DNA-protein interaction modes among complexes of a single protein domain family. The observed interaction mode is sometimes influenced by crystallization artifacts or by differences in secondary structure assignment. The interaction classes of domain families are more stable, and thus more biologically meaningful, than a classification of single complexes. Integration of the classification into NPIDB allows the user to browse the database according to the interacting structural elements of DNA and protein molecules. For each family, we present average DNA shape parameters in the zones of contact with the domains of that family.


Subjects
DNA-Binding Proteins/chemistry , DNA/chemistry , Databases, Genetic , DNA/metabolism , DNA-Binding Proteins/classification , DNA-Binding Proteins/metabolism , Nucleic Acid Conformation , Protein Structure, Tertiary
7.
BMC Genomics ; 16: 1084, 2015 Dec 21.
Article in English | MEDLINE | ID: mdl-26689194

ABSTRACT

BACKGROUND: Avoidance of palindromic recognition sites of Type II restriction-modification (R-M) systems has been shown for many R-M systems in dozens of prokaryotic genomes. However, the phenomenon has not been investigated systematically for all presently available genomes and annotated R-M systems. We have studied all known recognition sites in thousands of prokaryotic genomes and found factors that influence their avoidance. RESULTS: Only Type II R-M systems consisting of an independently acting endonuclease and methyltransferase (called 'orthodox' here) cause avoidance of their sites, both palindromic and asymmetric, in the corresponding prokaryotic genomes. Avoidance occurs in ~50% of the 1774 cases studied. Prokaryotes are known to acquire and lose R-M systems, so one can speak of the lifespan of an R-M system in a genome. We have shown that recognition site avoidance correlates with the lifespan of R-M systems. The sites of orthodox R-M systems that have been encoded in host genomes for a long time are avoided more often (up to 100% in certain cohorts) than the sites of recently acquired ones. We also found cases of site avoidance in the absence of the corresponding R-M systems in the genome. An analysis of closely related bacteria shows that such avoidance can be a trace of lost R-M systems. Sites of Type I, IIC/G, IIM, III, and IV R-M systems are not avoided in the vast majority of cases. CONCLUSIONS: The avoidance of orthodox Type II R-M system recognition sites in prokaryotic genomes is a widespread phenomenon. If an R-M system is present but its site is not underrepresented, the system may have been acquired recently. Conversely, a significant underrepresentation of a site may indicate that the corresponding R-M system has been present in this organism or its ancestors for a long time. The drastic difference in site avoidance between orthodox Type II R-M systems and R-M systems of other types can be explained either by a higher rate of specificity changes in the latter or by their lower self-toxicity.
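A common way to quantify the underrepresentation of a recognition site is to divide its observed count by the count expected from its subwords. The sketch below uses the maximal-order Markov expectation E(w1...wk) = N(w1...wk-1) N(w2...wk) / N(w2...wk-1) and counts occurrences on one strand only. This estimator was chosen for illustration and is an assumption; the study may measure avoidance differently. For a palindromic site such as GATC, a one-strand count gives the same result as a count on the reverse complement.

```python
def count_overlapping(seq: str, word: str) -> int:
    """Count (possibly overlapping) occurrences of word in seq."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def observed_expected(seq: str, site: str) -> float:
    """Observed/expected ratio for a site under a Markov model of order len(site) - 2.

    The expected count is built from the two overlapping (k-1)-mers and their
    shared (k-2)-mer core; this estimator is an illustrative assumption.
    """
    seq, site = seq.upper(), site.upper()
    if len(site) < 3:
        raise ValueError("site must be at least 3 nt long")
    observed = count_overlapping(seq, site)
    prefix = count_overlapping(seq, site[:-1])  # N(w1..wk-1)
    suffix = count_overlapping(seq, site[1:])   # N(w2..wk)
    core = count_overlapping(seq, site[1:-1])   # N(w2..wk-1)
    if core == 0:
        return float("nan")
    expected = prefix * suffix / core
    return observed / expected if expected > 0 else float("nan")
```

A ratio well below 1 suggests that a site is avoided, and a ratio near 1 suggests that it is not. On real genomes, the ratio would be computed over whole chromosomes rather than short toy strings.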


Subjects
Deoxyribonucleases, Type II Site-Specific/metabolism , Inverted Repeat Sequences , DNA Methylation , Genome, Microbial , Restriction Mapping
8.
Nucleic Acids Res ; 40(20): 10107-15, 2012 Nov 01.
Article in English | MEDLINE | ID: mdl-22965118

ABSTRACT

Prokaryotic restriction-modification (R-M) systems defend the host cell against invasion by foreign DNA. They comprise two enzymatic activities: a specific DNA cleavage activity and a DNA methylation activity that prevents cleavage. Typically, two separate enzymes provide these activities: a DNA methyltransferase (MTase) and a restriction endonucleases (RE). In the absence of the corresponding MTase, the RE of a Type II R-M system is highly toxic to the cell. In the vast majority of annotated cases, the genes of an R-M system are linked in the genome, and only a few cases have been reported in which the MTase and RE genes of one R-M system are not linked. Nevertheless, a few hundred solitary RE genes are present in the Restriction Enzyme Database (http://rebase.neb.com) annotations. Using a comparative genomic approach, we analysed 272 solitary RE genes. For 57 of them, we predicted corresponding MTase genes located elsewhere in the genome. Of the 272 solitary RE genes, 99 are likely to be fragments of RE genes. Various explanations for the existence of the remaining 116 solitary RE genes are also discussed.


Subjects
DNA Restriction Enzymes/genetics , Genome, Archaeal , Genome, Bacterial , DNA Modification Methylases/genetics , DNA Restriction Enzymes/classification , Deoxyribonucleases, Type I Site-Specific/genetics , Deoxyribonucleases, Type II Site-Specific/genetics , Genomics
9.
BMC Bioinformatics ; 12: 268, 2011 Jun 30.
Article in English | MEDLINE | ID: mdl-21718472

ABSTRACT

BACKGROUND: Substitution rates within different nucleotide contexts are subject to varying levels of bias. The best-known example of such bias is the excess of C to T (C > T) mutations in CpG (CG) dinucleotides. The molecular mechanisms underlying this bias are important factors in human genome evolution and cancer development. Discovering other nucleotide contexts that strongly affect substitution rates can improve our understanding of how mutations are acquired and why mutation hotspots exist. RESULTS: We compared rates of inherited mutations in 1-4 bp nucleotide contexts using reconstructed ancestral states of human single nucleotide polymorphisms (SNPs) from intergenic regions. Chimpanzee and orangutan genomic sequences were used as outgroups. We uncovered 3.5- and 3.3-fold excesses of T > C mutations in the second position of the ATTG and ATAG words, respectively, and a 3.4-fold excess of A > C mutations in the first position of the ACAA word. CONCLUSIONS: All the observed biases are less pronounced than the 5.1-fold excess of C > T mutations in CG dinucleotides. Even so, the three 4 bp mutation contexts mentioned above (and their complementary contexts) are well distinguished from all other mutation contexts. The mechanisms responsible for these excesses of mutations remain to be discovered.
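The per-context rates behind fold excesses like those reported above can be sketched as follows. The input format is hypothetical: an ancestral sequence plus a list of (position, derived base) mutations. The definition of fold excess is also an assumption: the rate in one word divided by the rate of the same substitution over all positions that have a full flanking window of the same width. The study's own normalization may differ.

```python
from typing import List, Tuple

# Hypothetical input: (0-based position in the ancestral sequence, derived base).
Mutation = Tuple[int, str]


def in_window(ancestral: str, pos: int, left: int, right: int) -> bool:
    """True if the whole context window around pos lies inside the sequence."""
    return pos - left >= 0 and pos + right < len(ancestral)


def context_rate(ancestral: str, mutations: List[Mutation], word: str,
                 derived: str, left: int) -> float:
    """Mutations to `derived` at offset `left` of `word`, per occurrence of `word`."""
    k = len(word)
    right = k - left - 1
    occurrences = sum(1 for i in range(len(ancestral) - k + 1)
                      if ancestral[i:i + k] == word)
    hits = sum(1 for pos, base in mutations
               if base == derived and in_window(ancestral, pos, left, right)
               and ancestral[pos - left:pos + right + 1] == word)
    return hits / occurrences if occurrences else float("nan")


def baseline_rate(ancestral: str, mutations: List[Mutation], centre: str,
                  derived: str, left: int, right: int) -> float:
    """Rate of the same substitution (centre > derived) over all windowed positions."""
    sites = sum(1 for i in range(left, len(ancestral) - right)
                if ancestral[i] == centre)
    hits = sum(1 for pos, base in mutations
               if base == derived and in_window(ancestral, pos, left, right)
               and ancestral[pos] == centre)
    return hits / sites if sites else float("nan")


def fold_excess(ancestral: str, mutations: List[Mutation], word: str,
                derived: str, left: int) -> float:
    """Context-specific rate divided by the context-independent baseline rate."""
    right = len(word) - left - 1
    ctx = context_rate(ancestral, mutations, word, derived, left)
    base = baseline_rate(ancestral, mutations, word[left], derived, left, right)
    return ctx / base if base else float("nan")
```

For a T > C excess at the second position of ATTG, call fold_excess(ancestral, mutations, "ATTG", "C", left=1). On real data, this would be run over many megabases of reconstructed ancestral sequence.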


Subjects
CpG Islands , Genome, Human , Point Mutation , Primates/genetics , Selection, Genetic , Animals , Humans , Pan troglodytes/genetics , Polymorphism, Single Nucleotide , Pongo/genetics
10.
Biomed Res Int ; 2020: 4657615, 2020.
Article in English | MEDLINE | ID: mdl-32775422

ABSTRACT

[This corrects the article DOI: 10.1155/2013/989410.].

11.
J Bioinform Comput Biol ; 6(4): 775-88, 2008 Aug.
Article in English | MEDLINE | ID: mdl-18763742

ABSTRACT

Water molecules immobilized on a protein or DNA surface are known to play an important role in intramolecular and intermolecular interactions. Comparative analysis of related three-dimensional (3D) structures makes it possible to predict the locations of such water molecules on the protein surface. We have developed and implemented the algorithm WLAKE, which detects "conserved" water molecules, i.e., those located in almost the same positions in a set of superimposed structures of related proteins or macromolecular complexes. The problem is reduced to finding maximal cliques in a certain graph. Although the algorithm has exponential complexity, the program runs sufficiently fast for dozens of superimposed structures. WLAKE was used to predict functionally significant water molecules in enzyme active sites (transketolases), as well as in intermolecular (ETS-DNA complexes) and intramolecular (thiol-disulfide interchange protein) interactions. The program is available online at http://monkey.belozersky.msu.ru/~evgeniy/wLake/wLake.html.
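The reduction to maximal cliques can be illustrated with a sketch. Each water molecule from the superimposed structures becomes a vertex. Two waters from different structures are joined by an edge if they lie within a distance cutoff. Every maximal clique therefore groups at most one water per structure, all sitting at nearly the same position. Several choices here are assumptions made for illustration and are not details taken from WLAKE: the 1.0 Å default cutoff, the minimum number of structures, and the Bron-Kerbosch enumeration with pivoting.

```python
import math
from typing import Dict, List, Set, Tuple

# (structure index, (x, y, z) coordinates after superposition)
Water = Tuple[int, Tuple[float, float, float]]


def build_graph(waters: List[Water], cutoff: float) -> Dict[int, Set[int]]:
    """Connect waters from different structures lying within `cutoff` of each other."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(waters))}
    for i in range(len(waters)):
        for j in range(i + 1, len(waters)):
            (si, pi), (sj, pj) = waters[i], waters[j]
            if si != sj and math.dist(pi, pj) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def maximal_cliques(adj: Dict[int, Set[int]]) -> List[Set[int]]:
    """Enumerate all maximal cliques with the Bron-Kerbosch algorithm (with pivoting)."""
    cliques: List[Set[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        if not p and not x:
            cliques.append(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def conserved_waters(waters: List[Water], cutoff: float = 1.0,
                     min_structures: int = 2) -> List[Set[int]]:
    """Maximal cliques spanning at least `min_structures` structures (hypothetical criteria)."""
    adj = build_graph(waters, cutoff)
    return [c for c in maximal_cliques(adj) if len(c) >= min_structures]
```

Edges join only waters from different structures, so the size of a clique equals the number of structures it spans. Enumeration is exponential in the worst case, consistent with the abstract's remark, but the graphs are sparse in practice because each water has few close neighbours.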


Subjects
DNA/chemistry , DNA/ultrastructure , Models, Chemical , Models, Molecular , Proteins/chemistry , Proteins/ultrastructure , Water/chemistry , Binding Sites , Computer Simulation , X-Ray Diffraction/methods
12.
J Bioinform Comput Biol ; 6(4): 759-73, 2008 Aug.
Article in English | MEDLINE | ID: mdl-18763741

ABSTRACT

Expressed sequence tags (ESTs) are sequences 500-1000 bp long that correspond to mRNAs derived from different sources (cell lines, tissues, etc.). The human EST database contains over 8,000,000 sequences, with over 4,000,000,000 nucleotides in total. RNA molecules are transcribed from a genomic DNA template, so all ESTs should match the corresponding genome. Nevertheless, we have found approximately 11,000 ESTs in the human EST database that do not match sequences in the human genome database. The presence of "trash" ESTs (TESTs) in the EST database could result from DNA or RNA contamination of laboratory equipment, tissues, or cell lines. TESTs could also represent sequences from unidentified human genes or from species inhabiting the human body. Here, we attempt to identify the sources of contamination in the human EST database. In particular, we discuss systematic contamination of mammalian EST databases with plant sequences.


Subjects
Chromosome Mapping/methods , DNA, Complementary/genetics , Genome Components/genetics , Genome, Human/genetics , Sequence Alignment/methods , Base Sequence , Databases, Genetic , Expressed Sequence Tags , Humans , Molecular Sequence Data
13.
J Bioinform Comput Biol ; 14(2): 1641003, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26972562

ABSTRACT

Palindromes are frequently underrepresented in prokaryotic genomes. The palindromic 5′-GATC-3′ site is recognized by various restriction-modification (R-M) systems, as well as by the solitary methyltransferase Dam. Classical GATC-specific R-M systems methylate GATC and cleave unmethylated GATC. By contrast, methyl-directed Type II restriction endonucleases cleave methylated GATC. Methylation of GATC by Dam methyltransferase is involved in the regulation of various cellular processes. GATC-recognizing proteins perform a diverse range of functions, which makes the GATC sequence a good model for studying the causes of palindrome avoidance in prokaryotic genomes. In this work, a mathematical model describes how R-M systems and solitary proteins influence GATC site avoidance. GATC avoidance is strongly associated with the presence of alternative genes (for methyl-directed enzymes or for classical Type II R-M systems) in different strains of the same species. We have shown this for Streptococcus pneumoniae, Neisseria meningitidis, Eubacterium rectale, and Moraxella catarrhalis. We hypothesize that GATC avoidance can result from DNA exchange through natural transformation between strains in which the GATC site has different methylation status. If this hypothesis is correct, GATC avoidance is a sign of DNA exchange between bacteria with different methylation status in a mixed population.


Subjects
DNA Restriction-Modification Enzymes/metabolism , Inverted Repeat Sequences/genetics , DNA Methylation , DNA Restriction-Modification Enzymes/classification , DNA Restriction-Modification Enzymes/genetics , Genome , Models, Biological , Multigene Family , Prokaryotic Cells , Site-Specific DNA-Methyltransferase (Adenine-Specific)/genetics , Site-Specific DNA-Methyltransferase (Adenine-Specific)/metabolism
14.
Biomed Res Int ; 2013: 989410, 2013.
Article in English | MEDLINE | ID: mdl-24058920

ABSTRACT

Substitution rates depend strongly on nucleotide context. One of the most studied examples is the excess of C > T mutations in the CG context in various groups of organisms, including vertebrates. Studies of the molecular mechanisms underlying this mutation regularity have provided insights into evolution, mutagenesis, and cancer development. Recently, several other hypermutable motifs were identified in the human genome. The frequency of T > C mutations is increased in the second position of the words ATTG and ATAG, and the frequency of A > C mutations is increased in the first position of the word ACAA. For a better understanding of evolution, it is of interest whether these mutation regularities are human-specific or also present in other vertebrates, because their presence might affect the validity of currently used substitution models and molecular clocks. A comprehensive analysis of mutagenesis in 4 bp mutation contexts requires a vast amount of mutation data. Such data may be derived from comparisons of individual genomes or from single nucleotide polymorphism (SNP) databases. Using this approach, we systematically compared mutation regularities within 2-4 bp contexts in Mus musculus and Homo sapiens. We found that even closely related organisms may differ notably in context-dependent mutation regularities.


Subjects
Models, Genetic , Mutagenesis/genetics , Mutation/genetics , Animals , Base Pairing , Humans , Mice , Polymorphism, Single Nucleotide/genetics
15.
Int J Genomics ; 2013: 173616, 2013.
Article in English | MEDLINE | ID: mdl-23984310

ABSTRACT

In general, mutation frequencies are context-dependent: specific adjacent nucleotides may influence the probability of observing a specific type of mutation in a genome. Recently, several hypermutable motifs were identified in the human genome. Namely, there is an increased frequency of T>C mutations in the second position of the words ATTG and ATAG and an increased frequency of A>C mutations in the first position of the word ACAA. Previous studies have also shown a remarkable difference between mutagenesis in humans and in Drosophila. C>T mutations are overrepresented in the CG context in humans (and other vertebrates), but this mutation regularity is not observed in Drosophila melanogaster. Such differences in the observed regularities of mutagenesis between representatives of different taxa might reflect differences in the mechanisms involved in mutagenesis. We systematically compared mutation regularities within 2-4 bp contexts in Homo sapiens and Drosophila melanogaster and found that the aforementioned contexts are not hypermutable in fruit flies. Most mutation contexts appear to affect mutation rates in a similar manner in H. sapiens and D. melanogaster; however, several important exceptions are noted and discussed.
