1.
Nature ; 517(7532): 77-80, 2015 Jan 01.
Article in English | MEDLINE | ID: mdl-25317564

ABSTRACT

The mechanisms that underlie the origin of major prokaryotic groups are poorly understood. In principle, the origin of both species and higher taxa among prokaryotes should entail similar mechanisms--ecological interactions with the environment paired with natural genetic variation involving lineage-specific gene innovations and lineage-specific gene acquisitions. To investigate the origin of higher taxa in archaea, we have determined gene distributions and gene phylogenies for the 267,568 protein-coding genes of 134 sequenced archaeal genomes in the context of their homologues from 1,847 reference bacterial genomes. Archaeal-specific gene families define 13 traditionally recognized archaeal higher taxa in our sample. Here we report that the origins of these 13 groups unexpectedly correspond to 2,264 group-specific gene acquisitions from bacteria. Interdomain gene transfer is highly asymmetric: transfers from bacteria to archaea are more than fivefold more frequent than vice versa. Gene transfers identified at major evolutionary transitions among prokaryotes specifically implicate gene acquisitions for metabolic functions from bacteria as key innovations in the origin of higher archaeal taxa.


Subjects
Archaea/classification , Archaea/genetics , Bacteria/genetics , Molecular Evolution , Horizontal Gene Transfer/genetics , Archaeal Genes/genetics , Bacterial Genes/genetics , Archaea/metabolism , Archaeal Proteins/genetics , Bacteria/metabolism , Archaeal Genome/genetics , Phylogeny
2.
Nature ; 524(7566): 427-32, 2015 Aug 27.
Article in English | MEDLINE | ID: mdl-26287458

ABSTRACT

Chloroplasts arose from cyanobacteria, mitochondria arose from proteobacteria. Both organelles have conserved their prokaryotic biochemistry, but their genomes are reduced, and most organelle proteins are encoded in the nucleus. Endosymbiotic theory posits that bacterial genes in eukaryotic genomes entered the eukaryotic lineage via organelle ancestors. It predicts episodic influx of prokaryotic genes into the eukaryotic lineage, with acquisition corresponding to endosymbiotic events. Eukaryotic genome sequences, however, increasingly implicate lateral gene transfer, both from prokaryotes to eukaryotes and among eukaryotes, as a source of gene content variation in eukaryotic genomes, which predicts continuous, lineage-specific acquisition of prokaryotic genes in divergent eukaryotic groups. Here we discriminate between these two alternatives by clustering and phylogenetic analysis of eukaryotic gene families having prokaryotic homologues. Our results indicate (1) that gene transfer from bacteria to eukaryotes is episodic, as revealed by gene distributions, and coincides with major evolutionary transitions at the origin of chloroplasts and mitochondria; (2) that gene inheritance in eukaryotes is vertical, as revealed by extensive topological comparison, sparse gene distributions stemming from differential loss; and (3) that continuous, lineage-specific lateral gene transfer, although it sometimes occurs, does not contribute to long-term gene content evolution in eukaryotic genomes.


Subjects
Eukaryota/genetics , Molecular Evolution , Genetic Models , Organelles/genetics , Symbiosis/genetics , Archaea/genetics , Bacteria/genetics , Cluster Analysis , Eukaryota/classification , Eukaryotic Cells/metabolism , Horizontal Gene Transfer/genetics , Genome/genetics , Mitochondria/genetics , Phylogeny , Plastids/genetics , Prokaryotic Cells/metabolism , Proteome/genetics , Time Factors
3.
Mol Biol Evol ; 36(3): 472-486, 2019 03 01.
Article in English | MEDLINE | ID: mdl-30517696

ABSTRACT

The ubiquity of plasmids in all prokaryotic phyla and habitats and their ability to transfer between cells marks them as prominent constituents of prokaryotic genomes. Many plasmids are found in their host cell in multiple copies. This leads to an increased mutational supply of plasmid-encoded genes and genetically heterogeneous plasmid genomes. Nonetheless, the segregation of plasmid copies into daughter cells during cell division is considered to occur in the absence of selection on the plasmid alleles. We investigate the implications of random genetic drift of multicopy plasmids during cell division-termed here "segregational drift"-to plasmid evolution. Performing experimental evolution of low- and high-copy non-mobile plasmids in Escherichia coli, we find that the evolutionary rate of multicopy plasmids does not reflect the increased mutational supply expected according to their copy number. In addition, simulated evolution of multicopy plasmid alleles demonstrates that segregational drift leads to increased loss frequency and extended fixation time of plasmid mutations in comparison to haploid chromosomes. Furthermore, an examination of the experimentally evolved hosts reveals a significant impact of the plasmid type on the host chromosome evolution. Our study demonstrates that segregational drift of multicopy plasmids interferes with the retention and fixation of novel plasmid variants. Depending on the selection pressure on newly emerging variants, plasmid genomes may evolve slower than haploid chromosomes, regardless of their higher mutational supply. We suggest that plasmid copy number is an important determinant of plasmid evolvability due to the manifestation of segregational drift.


Subjects
Biological Evolution , Genetic Drift , Genetic Models , Plasmids/genetics , Bacterial Chromosomes , Escherichia coli , Gene Frequency
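
The segregational drift described in the abstract above can be illustrated with a toy simulation. This is a minimal sketch and not the authors' simulation: it assumes a fixed copy number, exact doubling of every plasmid before division, equal partitioning without replacement, no selection, and a single cell lineage. The function names `segregate`, `follow_lineage`, and `loss_fraction` are hypothetical.

```python
import random


def segregate(mutant_copies, copy_number, rng):
    """Replicate every plasmid once, then draw copy_number plasmids for one daughter cell."""
    pool = [1] * (2 * mutant_copies) + [0] * (2 * (copy_number - mutant_copies))
    return sum(rng.sample(pool, copy_number))


def follow_lineage(copy_number, rng, max_divisions=10_000):
    """Track one neutral mutant plasmid copy along a single cell lineage.

    Returns a tuple (outcome, divisions), where outcome is "lost", "fixed",
    or "segregating" if neither state is reached within max_divisions.
    """
    mutant = 1
    for division in range(1, max_divisions + 1):
        mutant = segregate(mutant, copy_number, rng)
        if mutant == 0:
            return "lost", division
        if mutant == copy_number:
            return "fixed", division
    return "segregating", max_divisions


def loss_fraction(copy_number, trials=2000, seed=1):
    """Fraction of simulated lineages in which the mutant allele is lost."""
    rng = random.Random(seed)
    outcomes = [follow_lineage(copy_number, rng)[0] for _ in range(trials)]
    return outcomes.count("lost") / trials
```

Under these assumptions a new neutral mutation on a single-copy replicon is never lost along the lineage, whereas on a plasmid with `copy_number` copies it is expected to be lost in roughly `1 - 1/copy_number` of lineages, which is one way to see why a higher mutational supply need not translate into faster evolution.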
4.
Syst Biol ; 68(1): 117-130, 2019 01 01.
Article in English | MEDLINE | ID: mdl-29771363

ABSTRACT

The classic methodology of inferring a phylogenetic tree from sequence data is composed of two steps. First, a multiple sequence alignment (MSA) is computed. Then, a tree is reconstructed assuming the MSA is correct. Yet, inferred MSAs were shown to be inaccurate and alignment errors reduce tree inference accuracy. It was previously proposed that filtering unreliable alignment regions can increase the accuracy of tree inference. However, it was also demonstrated that the benefit of this filtering is often obscured by the resulting loss of phylogenetic signal. In this work we explore an approach, in which instead of relying on a single MSA, we generate a large set of alternative MSAs and concatenate them into a single SuperMSA. By doing so, we account for phylogenetic signals contained in columns that are not present in the single MSA computed by alignment algorithms. Using simulations, we demonstrate that this approach results, on average, in more accurate trees compared to 1) using an unfiltered MSA and 2) using a single MSA with weights assigned to columns according to their reliability. Next, we explore in which regions of the MSA space our approach is expected to be beneficial. Finally, we provide a simple criterion for deciding whether or not the extra effort of computing a SuperMSA and inferring a tree from it is beneficial. Based on these assessments, we expect our methodology to be useful for many cases in which diverged sequences are analyzed. The option to generate such a SuperMSA is available at http://guidance.tau.ac.il.


Subjects
Classification/methods , Phylogeny , Sequence Alignment , Software , Computer Simulation , Reproducibility of Results
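
The concatenation step mentioned in the abstract above (joining alternative MSAs of the same sequences into one SuperMSA) might look roughly like the following. This is only a sketch under stated assumptions: each alternative alignment is represented as a dictionary from sequence name to gapped string, and the function name `concatenate_msas` is hypothetical; it is not the GUIDANCE server's code.

```python
def concatenate_msas(msas):
    """Join alternative alignments of the same sequences column-wise.

    msas: list of dicts mapping sequence name -> aligned (gapped) string.
    Returns one dict in which each sequence is the concatenation of its
    rows from every alternative alignment, in the order given.
    """
    if not msas:
        raise ValueError("at least one alignment is required")
    names = sorted(msas[0])
    for msa in msas:
        if sorted(msa) != names:
            raise ValueError("all alignments must contain the same sequence names")
        if len({len(row) for row in msa.values()}) != 1:
            raise ValueError("rows within one alignment must have equal length")
    return {name: "".join(msa[name] for msa in msas) for name in names}
```

For example, concatenating `{"s1": "AC-G", "s2": "ACTG"}` with `{"s1": "A-CG", "s2": "ACTG"}` yields rows of length eight, so columns present in only one of the alternative alignments still contribute to tree inference.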
5.
Mol Biol Evol ; 31(2): 410-8, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24188869

ABSTRACT

Eukaryotic genomes are mosaics of genes acquired from their prokaryotic ancestors, the eubacterial endosymbiont that gave rise to the mitochondrion and its archaebacterial host. Genomic footprints of the prokaryotic merger at the origin of eukaryotes are still discernable in eukaryotic genomes, where gene expression and function correlate with their prokaryotic ancestry. Molecular chaperones are essential in all domains of life as they assist the functional folding of their substrate proteins and protect the cell against the cytotoxic effects of protein misfolding. Eubacteria and archaebacteria code for slightly different chaperones, comprising distinct protein folding pathways. Here we study the evolution of the eukaryotic protein folding pathways following the endosymbiosis event. A phylogenetic analysis of all 64 chaperones encoded in the Saccharomyces cerevisiae genome revealed 25 chaperones of eubacterial ancestry, 11 of archaebacterial ancestry, 10 of ambiguous prokaryotic ancestry, and 18 that may represent eukaryotic innovations. Several chaperone families (e.g., Hsp90 and Prefoldin) trace their ancestry to only one prokaryote group, while others, such as Hsp40 and Hsp70, are of mixed ancestry, with members contributed from both prokaryotic ancestors. Analysis of the yeast chaperone-substrate interaction network revealed no preference for interaction between chaperones and substrates of the same origin. Our results suggest that the archaebacterial and eubacterial protein folding pathways have been reorganized and integrated into the present eukaryotic pathway. The highly integrated chaperone system of yeast is a manifestation of the central role of chaperone-mediated folding in maintaining cellular fitness. Most likely, both archaebacterial and eubacterial chaperone systems were essential at the very early stages of eukaryogenesis, and the retention of both may have offered new opportunities for expanding the scope of chaperone-mediated folding.


Subjects
Archaea/genetics , Bacteria/genetics , Biological Evolution , Eukaryota/genetics , Histone Chaperones/genetics , Saccharomyces cerevisiae/genetics , Archaea/metabolism , Bacteria/metabolism , Eukaryota/metabolism , Molecular Models , Phylogeny , Protein Folding , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Symbiosis
6.
Proc Natl Acad Sci U S A ; 109(50): 20537-42, 2012 Dec 11.
Article in English | MEDLINE | ID: mdl-23184964

ABSTRACT

Archaebacterial halophiles (Haloarchaea) are oxygen-respiring heterotrophs that derive from methanogens--strictly anaerobic, hydrogen-dependent autotrophs. Haloarchaeal genomes are known to have acquired, via lateral gene transfer (LGT), several genes from eubacteria, but it is yet unknown how many genes the Haloarchaea acquired in total and, more importantly, whether independent haloarchaeal lineages acquired their genes in parallel, or as a single acquisition at the origin of the group. Here we have studied 10 haloarchaeal and 1,143 reference genomes and have identified 1,089 haloarchaeal gene families that were acquired by a methanogenic recipient from eubacteria. The data suggest that these genes were acquired in the haloarchaeal common ancestor, not in parallel in independent haloarchaeal lineages, nor in the common ancestor of haloarchaeans and methanosarcinales. The 1,089 acquisitions include genes for catabolic carbon metabolism, membrane transporters, menaquinone biosynthesis, and complexes I-IV of the eubacterial respiratory chain that functions in the haloarchaeal membrane consisting of diphytanyl isoprene ether lipids. LGT on a massive scale transformed a strictly anaerobic, chemolithoautotrophic methanogen into the heterotrophic, oxygen-respiring, and bacteriorhodopsin-photosynthetic haloarchaeal common ancestor.


Subjects
Bacteria/genetics , Euryarchaeota/genetics , Molecular Evolution , Horizontal Gene Transfer , Bacterial Genes , Archaeal Proteins/genetics , Euryarchaeota/classification , Archaeal Genome , Bacterial Genome , Genetic Models , Phylogeny
7.
BMC Evol Biol ; 14: 266, 2014 Dec 30.
Article in English | MEDLINE | ID: mdl-25547755

ABSTRACT

BACKGROUND: Analyzed individually, gene trees for a given taxon set tend to harbour incongruent or conflicting signals. One popular approach to deal with this circumstance is to use concatenated data. But especially in prokaryotes, where lateral gene transfer (LGT) is a natural mechanism of generating genetic diversity, there are open questions as to whether concatenation amplifies or averages phylogenetic signals residing in individual genes. Here we study concatenations of prokaryotic and eukaryotic datasets to investigate possible sources of incongruence in phylogenetic trees and to examine the level of overlap between individual and concatenated alignments. RESULTS: We analyzed prokaryotic datasets comprising 248 individual gene trees from 315 genomes at three taxonomic depths spanning gammaproteobacteria, proteobacteria, and prokaryotes (bacteria plus archaea), and eukaryotic datasets comprising 279 individual gene trees from 85 genomes at two taxonomic depths: across plants-animals-fungi and within fungi. Consistent with previous findings, the branches in trees made from concatenated alignments are, in general, not supported by any of their underlying individual gene trees, even though the concatenation trees tend to possess high bootstrap proportion values. For the prokaryote data, this observation is independent of phylogenetic depth and sequence conservation. The eukaryotic data show much better agreement between concatenation and single gene trees. LGT frequencies in trees were estimated using established methods. Sequence length in individual alignments, but not sequence divergence, was found to correlate with the generation of branches that correspond to the concatenated tree. CONCLUSIONS: The weak correspondence of concatenation trees with single gene trees raises the question of where the phylogenetic signal in concatenated trees is coming from. The eukaryote data reveal a better correspondence between individual and concatenation trees than the prokaryote data. The question of whether the lack of correspondence between individual genes and the concatenation tree in the prokaryotic data is due to LGT or phylogenetic artefacts remains unanswered. If LGT is the cause of incongruence between concatenation and individual trees, we would have expected to see greater degrees of incongruence for more divergent prokaryotic data sets, which was not observed, although estimated rates of LGT suggest that LGT is responsible for at least some of the observed incongruence.


Subjects
Genome , Phylogeny , Animals , Archaea/classification , Archaea/genetics , Bacteria/classification , Bacteria/genetics , Conserved Sequence/genetics , Molecular Evolution , Fungi/classification , Fungi/genetics , Horizontal Gene Transfer , Plants/classification , Plants/genetics
8.
BMC Genomics ; 15: 906, 2014 Oct 17.
Article in English | MEDLINE | ID: mdl-25326207

ABSTRACT

BACKGROUND: The human pathogen Trichomonas vaginalis is a parabasalian flagellate that is estimated to infect 3% of the world's population annually. With a 160 megabase genome and up to 60,000 genes residing in six chromosomes, the parasite has the largest genome among sequenced protists. Although it is thought that the genome size and unusually large coding capacity are owed to genome duplication events, the exact reason and its consequences are less well studied. RESULTS: Among transcriptome data we found thousands of instances in which reads mapped onto genomic loci not annotated as genes, some reaching up to several kilobases in length. At first sight these appear to represent long non-coding RNAs (lncRNAs); however, about half of these lncRNAs have significant sequence similarities to genomic loci annotated as protein-coding genes. This provides evidence for the transcription of hundreds of pseudogenes in the parasite. Conventional lncRNAs and pseudogenes are expressed in Trichomonas through their own transcription start sites and independently from flanking genes. Expression of several representative lncRNAs was verified through reverse-transcriptase PCR in different T. vaginalis strains, and case studies exclude the use of alternative start codons or stop codon suppression for the genes analysed. CONCLUSION: Our results demonstrate that T. vaginalis expresses thousands of intergenic loci, including numerous transcribed pseudogenes. In contrast to yeast, these are expressed independently from neighbouring genes. Our results furthermore illustrate the effect genome duplication events can have on the transcriptome of a protist. The parasite's genome is in a constant state of change, and we hypothesize that the numerous lncRNAs could offer a large pool for potential innovation from which novel proteins or regulatory RNA units could evolve.


Subjects
Pseudogenes , Long Noncoding RNA/genetics , Protozoan RNA/genetics , Trichomonas vaginalis/genetics , Gene Duplication , Gene Expression Profiling , RNA Sequence Analysis
9.
Genome Res ; 21(4): 599-609, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21270172

ABSTRACT

Lateral gene transfer (LGT) plays a major role in prokaryote evolution with only a few genes that are resistant to it; yet the nature and magnitude of barriers to lateral transfer are still debated. Here, we implement directed networks to investigate donor-recipient events of recent lateral gene transfer among 657 sequenced prokaryote genomes. For 2,129,548 genes investigated, we detected 446,854 recent lateral gene transfer events through nucleotide pattern analysis. Among these, donor-recipient relationships could be specified through phylogenetic reconstruction for 7% of the pairs, yielding 32,028 polarized recent gene acquisition events, which constitute the edges of our directed networks. We find that the frequency of recent LGT is linearly correlated both with genome sequence similarity and with proteome similarity of donor-recipient pairs. Genome sequence similarity accounts for 25% of the variation in gene-transfer frequency, with proteome similarity adding only 1% to the variability explained. The range of donor-recipient GC content similarity within the network is extremely narrow, with 86% of the LGTs occurring between donor-recipient pairs having ≤5% difference in GC content. Hence, genome sequence similarity and GC content similarity are strong barriers to LGT in prokaryotes. But they are not insurmountable, as we detected 1530 recent transfers between distantly related genomes. The directed network revealed that recipient genomes of distant transfers encode proteins of nonhomologous end-joining (NHEJ; a DNA repair mechanism) far more frequently than the recipient lacking that mechanism. This implicates NHEJ in genes spread across distantly related prokaryotes through bypassing the donor-recipient sequence similarity barrier.


Subjects
DNA Repair/genetics , Molecular Evolution , Horizontal Gene Transfer/genetics , Genome/genetics , Prokaryotic Cells/metabolism , Base Composition/genetics , Computational Biology , Bacterial DNA/genetics , Nucleic Acid Sequence Homology
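
The GC-content statistic reported in the abstract above (the share of transfers between donor-recipient pairs differing by at most 5% in GC content) can be computed from a directed edge list as sketched below. The data layout (a list of `(donor, recipient)` tuples and a dictionary of genomic GC percentages) and the function name `share_within_gc_window` are assumptions made for illustration, not the study's pipeline.

```python
def share_within_gc_window(edges, gc_percent, max_difference=5.0):
    """Share of directed donor->recipient edges whose genomes differ
    in GC content by at most max_difference percentage points.

    edges: list of (donor, recipient) genome identifiers.
    gc_percent: dict mapping genome identifier -> genomic GC content in percent.
    """
    if not edges:
        raise ValueError("the network has no edges")
    close = sum(
        abs(gc_percent[donor] - gc_percent[recipient]) <= max_difference
        for donor, recipient in edges
    )
    return close / len(edges)
```

With made-up values `gc = {"A": 50.0, "B": 52.0, "C": 65.0}` and edges `A->B`, `B->A`, `A->C`, `C->B`, two of the four edges fall inside the 5% window.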
10.
Curr Biol ; 2024 Jun 26.
Article in English | MEDLINE | ID: mdl-38964320

ABSTRACT

Plasmids are extrachromosomal genetic elements that reside in prokaryotes. The acquisition of plasmids encoding beneficial traits can facilitate short-term survival in harsh environmental conditions or long-term adaptation of new ecological niches. Due to their ability to transfer between cells, plasmids are considered agents of gene transfer. Nonetheless, the frequency of DNA transfer between plasmids and chromosomes remains understudied. Using a novel approach for detection of homologous loci between genome pairs, we uncover gene sharing with the chromosome in 1,974 (66%) plasmids residing in 1,016 (78%) taxonomically diverse isolates. The majority of homologous loci correspond to mobile elements, which may be duplicated in the host chromosomes in tens of copies. Neighboring shared genes often encode similar functional categories, indicating the transfer of multigene functional units. Rare transfer events of antibiotics resistance genes are observed mainly with mobile elements. The frequent erosion of sequence similarity in homologous regions indicates that the transferred DNA is often devoid of function. DNA transfer between plasmids and chromosomes thus generates genetic variation that is akin to workings of endosymbiotic gene transfer in eukaryotic evolution. Our findings imply that plasmid contribution to gene transfer most often corresponds to transfer of the plasmid entity rather than transfer of protein-coding genes between plasmids and chromosomes.

11.
BMC Genet ; 14: 37, 2013 May 07.
Article in English | MEDLINE | ID: mdl-23651527

ABSTRACT

BACKGROUND: Whether or not a mutant allele in a population is under selection is an important issue in population genetics, and various neutrality tests have been invented so far to detect selection. However, detection of negative selection has been notoriously difficult, partly because negatively selected alleles are usually rare in the population and have little impact on either population dynamics or the shape of the gene genealogy. Recently, through studies of genetic disorders and genome-wide analyses, many structural variations were shown to occur recurrently in the population. Such "recurrent mutations" might be revealed as deleterious by exploiting the signal of negative selection in the gene genealogy enhanced by their recurrence. RESULTS: Motivated by the above idea, we devised two new test statistics. One is the total number of mutants at a recurrently mutating locus among sampled sequences, which is tested conditionally on the number of forward mutations mapped on the sequence genealogy. The other is the size of the most common class of identical-by-descent mutants in the sample, again tested conditionally on the number of forward mutations mapped on the sequence genealogy. To examine the performance of these two tests, we simulated recurrently mutated loci each flanked by sites with neutral single nucleotide polymorphisms (SNPs), with no recombination. Using neutral recurrent mutations as null models, we attempted to detect deleterious recurrent mutations. Our analyses demonstrated high powers of our new tests under constant population size, as well as their moderate power to detect selection in expanding populations. We also devised a new maximum parsimony algorithm that, given the states of the sampled sequences at a recurrently mutating locus and an incompletely resolved genealogy, enumerates mutation histories with a minimum number of mutations while partially resolving genealogical relationships when necessary. 
CONCLUSIONS: With their considerably high powers to detect negative selection, our new neutrality tests may open new venues for dealing with the population genetics of recurrent mutations as well as help identifying some types of genetic disorders that may have escaped identification by currently existing methods.


Subjects
Mutation , Genetic Selection , Humans , Single Nucleotide Polymorphism
12.
Genome Biol Evol ; 15(6)2023 06 01.
Article in English | MEDLINE | ID: mdl-37247390

ABSTRACT

The determination of the last common ancestor (LCA) of a group of species plays a vital role in evolutionary theory. Traditionally, an LCA is inferred by the rooting of a fully resolved species tree. From a theoretical perspective, however, inference of the LCA amounts to the reconstruction of just one branch-the root branch-of the true species tree and should therefore be a much easier task than the full resolution of the species tree. Discarding the reliance on a hypothesized species tree and its rooting leads us to reevaluate what phylogenetic signal is directly relevant to LCA inference and to recast the task as that of sampling the total evidence from all gene families at the genomic scope. Here, we reformulate LCA and root inference in the framework of statistical hypothesis testing and outline an analytical procedure to formally test competing a priori LCA hypotheses and to infer confidence sets for the earliest speciation events in the history of a group of species. Applying our methods to two demonstrative data sets, we show that our inference of the opisthokonta LCA is well in agreement with the common knowledge. Inference of the proteobacteria LCA shows that it is most closely related to modern Epsilonproteobacteria, raising the possibility that it may have been characterized by a chemolithoautotrophic and anaerobic life style. Our inference is based on data comprising between 43% (opisthokonta) and 86% (proteobacteria) of all gene families. Approaching LCA inference within a statistical framework renders the phylogenomic inference powerful and robust.


Subjects
Biological Evolution , Genomics , Phylogeny , Genomics/methods , Genome , Eukaryota/genetics , Proteobacteria/genetics
13.
Nucleic Acids Res ; 38(15): e158, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20571085

ABSTRACT

It has been suggested that the mammalian genome is composed mainly of long compositionally homogeneous domains. Such domains are frequently identified using recursive segmentation algorithms based on the Jensen-Shannon divergence. However, a common difficulty with such methods is deciding when to halt the recursive partitioning and what criteria to use in deciding whether a detected boundary between two segments is real or not. We demonstrate that commonly used halting criteria are intrinsically biased, and propose IsoPlotter, a parameter-free segmentation algorithm that overcomes such biases by using a simple dynamic halting criterion and tests the homogeneity of the inferred domains. IsoPlotter was compared with an alternative segmentation algorithm, D(JS), using two sets of simulated genomic sequences. Our results show that IsoPlotter was able to infer both long and short compositionally homogeneous domains with low GC content dispersion, whereas D(JS) failed to identify short compositionally homogeneous domains and sequences with low compositional dispersion. By segmenting the human genome with IsoPlotter, we found that one-third of the genome is composed of compositionally nonhomogeneous domains and the remaining is a mixture of many short compositionally homogeneous domains and relatively few long ones.


Subjects
Algorithms , Human Genome , Genomics/methods , Base Composition , Computer Simulation , Humans , Isochores , Genetic Models
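
The core step of a recursive segmentation based on the Jensen-Shannon divergence, as referred to in the abstract above, is to find the split point that maximizes the divergence between the compositions of the two resulting segments. The sketch below shows that single step for four-letter nucleotide composition. It is an illustration only: it does not implement IsoPlotter's halting criterion or homogeneity test, the function names are hypothetical, and the quadratic-time scan is chosen for clarity rather than speed.

```python
from collections import Counter
from math import log2


def entropy(counts, total):
    """Shannon entropy (in bits) of a composition given as symbol counts."""
    return -sum((c / total) * log2(c / total) for c in counts.values() if c)


def js_divergence(left, right):
    """Length-weighted Jensen-Shannon divergence between two non-empty segments."""
    n_left, n_right = len(left), len(right)
    n = n_left + n_right
    whole = Counter(left) + Counter(right)
    return (
        entropy(whole, n)
        - (n_left / n) * entropy(Counter(left), n_left)
        - (n_right / n) * entropy(Counter(right), n_right)
    )


def best_split(seq):
    """Return (divergence, position) of the split maximizing the divergence.

    The sequence must contain at least two symbols; position i means the
    segments are seq[:i] and seq[i:].
    """
    return max((js_divergence(seq[:i], seq[i:]), i) for i in range(1, len(seq)))
```

A recursive segmenter would apply `best_split` to each resulting segment in turn until some halting criterion rejects the proposed boundary; the choice of that criterion is exactly the difficulty the abstract discusses.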
14.
Nucleic Acids Res ; 38(Web Server issue): W23-8, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20497997

ABSTRACT

Evaluating the accuracy of multiple sequence alignment (MSA) is critical for virtually every comparative sequence analysis that uses an MSA as input. Here we present the GUIDANCE web-server, a user-friendly, open access tool for the identification of unreliable alignment regions. The web-server accepts as input a set of unaligned sequences. The server aligns the sequences and provides a simple graphic visualization of the confidence score of each column, residue and sequence of an alignment, using a color-coding scheme. The method is generic and the user is allowed to choose the alignment algorithm (ClustalW, MAFFT and PRANK are supported) as well as any type of molecular sequences (nucleotide, protein or codon sequences). The server implements two different algorithms for evaluating confidence scores: (i) the heads-or-tails (HoT) method, which measures alignment uncertainty due to co-optimal solutions; (ii) the GUIDANCE method, which measures the robustness of the alignment to guide-tree uncertainty. The server projects the confidence scores onto the MSA and points to columns and sequences that are unreliably aligned. These can be automatically removed in preparation for downstream analyses. GUIDANCE is freely available for use at http://guidance.tau.ac.il.


Subjects
Sequence Alignment/methods , Software , Human Immunodeficiency Virus Proteins/chemistry , Internet , Protein Sequence Analysis , Viral Regulatory and Accessory Proteins/chemistry
15.
Mol Biol Evol ; 27(8): 1759-67, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20207713

ABSTRACT

Multiple sequence alignment (MSA) is the basis for a wide range of comparative sequence analyses from molecular phylogenetics to 3D structure prediction. Sophisticated algorithms have been developed for sequence alignment, but in practice, many errors can be expected and extensive portions of the MSA are unreliable. Hence, it is imperative to understand and characterize the various sources of errors in MSAs and to quantify site-specific alignment confidence. In this paper, we show that uncertainties in the guide tree used by progressive alignment methods are a major source of alignment uncertainty. We use this insight to develop a novel method for quantifying the robustness of each alignment column to guide tree uncertainty. We build on the widely used bootstrap method for perturbing the phylogenetic tree. Specifically, we generate a collection of trees and use each as a guide tree in the alignment algorithm, thus producing a set of MSAs. We next test the consistency of every column of the MSA obtained from the unperturbed guide tree with respect to the set of MSAs. We name this measure the "GUIDe tree based AligNment ConfidencE" (GUIDANCE) score. Using the Benchmark Alignment data BASE benchmark as well as simulation studies, we show that GUIDANCE scores accurately identify errors in MSAs. Additionally, we compare our results with the previously published Heads-or-Tails score and show that the GUIDANCE score is a better predictor of unreliably aligned regions.


Subjects
Algorithms , Amino Acid Sequence , Base Sequence , Sequence Alignment/methods , DNA Sequence Analysis/methods , Animals , Computer Simulation , Factual Databases , Drosophila melanogaster/genetics , Molecular Sequence Data , Phylogeny , ROC Curve , Software
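
The consistency test described in the abstract above (checking each column of the base MSA against a set of MSAs built from perturbed guide trees) can be sketched as follows. This is a simplified column-identity variant written for illustration; the published GUIDANCE score may be defined differently in its details, and the function names are hypothetical. Alignments are assumed to be lists of equal-length gapped strings with sequences in the same row order and `-` as the gap character.

```python
def column_signatures(msa):
    """Describe each column by which residue of each sequence it contains.

    Returns a list of tuples, one per column; entry r is the 0-based index
    of the residue of row r placed in that column, or None for a gap.
    """
    next_residue = [0] * len(msa)
    columns = []
    for column in zip(*msa):
        signature = []
        for row, char in enumerate(column):
            if char == "-":
                signature.append(None)
            else:
                signature.append(next_residue[row])
                next_residue[row] += 1
        columns.append(tuple(signature))
    return columns


def column_confidence(base_msa, alternative_msas):
    """For every column of base_msa, the fraction of alternative MSAs
    (at least one is required) that contain an identical column."""
    alternative_sets = [set(column_signatures(alt)) for alt in alternative_msas]
    return [
        sum(signature in alt for alt in alternative_sets) / len(alternative_sets)
        for signature in column_signatures(base_msa)
    ]
```

Columns reproduced in every alternative alignment score 1.0, while columns whose residue placement depends on the guide tree score lower and are candidates for filtering or down-weighting.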
16.
Genome Biol Evol ; 13(2)2021 02 03.
Article in English | MEDLINE | ID: mdl-33231627

ABSTRACT

The transition from unicellular to multicellular organisms is one of the most significant events in the history of life. Key to this process is the emergence of Darwinian individuality at the higher level: Groups must become single entities capable of reproduction for selection to shape their evolution. Evolutionary transitions in individuality are characterized by cooperation between the lower level entities and by division of labor. Theory suggests that division of labor may drive the transition to multicellularity by eliminating the trade off between two incompatible processes that cannot be performed simultaneously in one cell. Here, we examine the evolution of the most ancient multicellular transition known today, that of cyanobacteria, where we reconstruct the sequence of ecological and phenotypic trait evolution. Our results show that the prime driver of multicellularity in cyanobacteria was the expansion in metabolic capacity offered by nitrogen fixation, which was accompanied by the emergence of the filamentous morphology and succeeded by a reproductive life cycle. This was followed by the progression of multicellularity into higher complexity in the form of differentiated cells and patterned multicellularity.


Subjects
Cyanobacteria/genetics , Molecular Evolution , Bacterial Proteins/classification , Cyanobacteria/classification , Cyanobacteria/cytology , Cyanobacteria/metabolism , Ecosystem , Nitrogen Fixation
17.
Mol Biol Evol ; 26(8): 1829-33, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19443854

ABSTRACT

The isochore theory depicts the genomes of warm-blooded vertebrates as a mosaic of long genomic regions that are characterized by relatively homogeneous GC content. In the absence of genomic data, the GC content at third-codon positions of protein-coding genes (GC3) was commonly used as a proxy for the GC content of isochores. Oddly, in the postgenomic era, GC3 is still sometimes used as a proxy for the GC composition of isochores. Here, we use genic and genomic sequences from human, chimpanzee, cow, mouse, rat, chicken, and zebrafish to show that GC3 only explains a very small proportion of the variation in GC content of long genomic sequences flanking the genes (GCf), and what little correlation there is between GC3 and GCf was found to decay rapidly with distance from the gene. The coefficient of variation of GC3 was found to be much larger than that of GCf and, therefore, GC3 and GCf values are not comparable with each other. Comparisons of orthologous gene pairs from 1) human and chimpanzee and 2) mouse and rat show strong correlations between their GC3 values, but very weak correlations between their GCf values. We conclude that the GC content of third-codon position cannot be used as stand-in for isochoric composition.


Subjects
Base Composition , Codon/genetics , Isochores/genetics , Proteins/genetics , Animals , Genome , Human Genome , Humans
18.
Mol Biol Evol ; 25(4): 748-61, 2008 Apr.
Article in English | MEDLINE | ID: mdl-18222943

ABSTRACT

Plastids are descended from a cyanobacterial symbiosis which occurred over 1.2 billion years ago. During the course of endosymbiosis, most genes were lost from the cyanobacterium's genome and many were relocated to the host nucleus through endosymbiotic gene transfer (EGT). The issue of how many genes were acquired through EGT in different plant lineages is unresolved. Here, we report the genome-wide frequency of gene acquisitions from cyanobacteria in 4 photosynthetic eukaryotes--Arabidopsis, rice, Chlamydomonas, and the red alga Cyanidioschyzon--by comparison of the 83,138 proteins encoded in their genomes with 851,607 proteins encoded in 9 sequenced cyanobacterial genomes, 215 other reference prokaryotic genomes, and 13 reference eukaryotic genomes. The analyses entail 11,569 phylogenies inferred with both maximum likelihood and Neighbor-Joining approaches. Because each phylogenetic result is dependent not only upon the reconstruction method but also upon the site patterns in the underlying alignment, we investigated how the reliability of site pattern generation via alignment affects our results: if the site patterns in an alignment differ depending upon the order in which amino acids are introduced into multiple sequence alignment--N- to C-terminal versus C- to N-terminal--then the phylogenetic result is likely to be artifactual. Excluding unreliable alignments by this means, we obtain a conservative estimate, wherein about 14% of the proteins examined in each plant genome indicate a cyanobacterial origin for the corresponding nuclear gene, with higher proportions (17-25%) observed among the more reliable alignments. The identification of cyanobacterial genes in plant genomes affords access to an important question: from which type of cyanobacterium did the ancestor of plastids arise? Among the 9 cyanobacterial genomes sampled, Nostoc sp. PCC7120 and Anabaena variabilis ATCC29143 were found to harbor collections of genes which are, in terms of presence/absence and sequence similarity, more like those possessed by the plastid ancestor than those of the other 7 cyanobacterial genomes sampled here. This suggests that the ancestor of plastids might have been an organism more similar to filamentous, heterocyst-forming (nitrogen-fixing) representatives of section IV recognized in Stanier's cyanobacterial classification. Members of section IV are very common partners in contemporary symbiotic associations involving endosymbiotic cyanobacteria, which generally provide nitrogen to their host, consistent with suggestions that fixed nitrogen supplied by the endosymbiont might have played an important role during the origin of plastids.


Subjects
Cell Nucleus/genetics , Cyanobacteria/genetics , Bacterial Genes , Plant Genome/genetics , Plants/genetics , Plastids/genetics , Amino Acid Sequence , Animals , Arabidopsis/genetics , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Chlamydomonas/genetics , Conserved Sequence , Horizontal Gene Transfer , Molecular Sequence Data , Nitrogen Fixation/genetics , Oryza/genetics , Phylogeny , Rhodophyta/genetics , Sequence Alignment , Symbiosis/genetics
19.
FEBS J ; 286(13): 2471-2489, 2019 07.
Article in English | MEDLINE | ID: mdl-30945446

ABSTRACT

Pyruvate kinases (PKs) synthesize ATP as the final step of glycolysis in the three domains of life. PKs from most bacteria and eukarya are allosteric enzymes that are activated by sugar phosphates; for example, the feed-forward regulator fructose-1,6-bisphosphate, or AMP as a sensor of energy charge. Archaea utilize unusual glycolytic pathways, but the allosteric properties of PKs from these species are largely unknown. Here, we present an analysis of 24 PKs from most archaeal clades with respect to allosteric properties, together with phylogenetic analyses constructed using a novel mode of rooting protein trees. We find that PKs from many Thermoproteales, an order of crenarchaeota, are allosterically activated by 3-phosphoglycerate (3PG). We also identify five conserved amino acids that form the binding pocket for 3PG. 3PG is generated via an irreversible reaction in the modified glycolytic pathway of these archaea and therefore functions as a feed-forward regulator. We also show that PKs from hyperthermophilic Methanococcales, an order of euryarchaeota, are activated by AMP. Phylogenetic analyses indicate that 3PG-activated PKs form an evolutionary lineage that is distinct from that of sugar phosphate-activated PKs, and that sugar phosphate-activated PKs originated as AMP-regulated PKs in hyperthermophilic Methanococcales. Since the phospho group of sugar phosphates and 3PG overlap in the allosteric site, our data indicate that the allostery in PKs first started from a progenitor phosphate-binding site that evolved in two spatially distinct directions: one direction generated the canonical site that responds to sugar phosphates and the other gave rise to the 3PG site present in Thermoproteales. Overall, our data suggest an intimate connection between the allosteric properties and evolution of PKs.


Subjects
Allosteric Site , Archaeal Proteins/metabolism , Molecular Evolution , Pyruvate Kinase/metabolism , Allosteric Regulation , Archaeal Proteins/chemistry , Archaeal Proteins/genetics , Phylogeny , Pyruvate Kinase/chemistry , Pyruvate Kinase/genetics , Thermoproteus/classification , Thermoproteus/enzymology , Thermoproteus/genetics
20.
ISME J ; 11(2): 543-554, 2017 02.
Article in English | MEDLINE | ID: mdl-27648812

ABSTRACT

Bacteriophages are recognized DNA vectors, and transduction is considered a common mechanism of lateral gene transfer (LGT) during microbial evolution. Anecdotal events of phage-mediated gene transfer have been studied extensively; however, a coherent evolutionary viewpoint of LGT by transduction, its extent and characteristics, is still lacking. Here we report a large-scale evolutionary reconstruction of transduction events in 3982 genomes. We inferred 17 158 recent transduction events linking donors, phages and recipients into a phylogenomic transduction network view. We find that LGT by transduction is mostly restricted to closely related donors and recipients. Furthermore, a substantial number of the transduction events (9%) are best described as gene duplications that are mediated by mobile DNA vectors. We propose to distinguish this type of paralogy by the term autology. A comparison of donor and recipient genomes revealed that genome similarity is a superior predictor of species connectivity in the network in comparison to common habitat. This indicates that genetic similarity, rather than ecological opportunity, is a driver of successful transduction during microbial evolution. A striking difference in the connectivity pattern of donors and recipients shows that while lysogenic interactions are highly species-specific, the host range for lytic phage infections can be much wider, serving to connect dense clusters of closely related species. Our results thus demonstrate that DNA transfer via transduction occurs within the context of phage-host specificity, but that this tight constraint can be breached, on rare occasions, to produce long-range LGTs of profound evolutionary consequences.


Subjects
Bacteria/genetics , Bacteriophages/physiology , Horizontal Gene Transfer , Bacterial Genome/genetics , Genetic Transduction , Bacteria/classification , Bacteria/virology , Biological Evolution , Host Specificity , Phylogeny