Results 1-20 of 45
1.
Genetics ; 227(4)2024 Aug 07.
Article in English | MEDLINE | ID: mdl-38869251

ABSTRACT

The number of genome assemblies has rapidly increased in recent history, with NCBI databases reaching over 41,000 eukaryotic genome assemblies across about 2,300 species. Increases in read length and improvements in assembly algorithms have led to increased contiguity and larger genome assemblies. While this number of assemblies is impressive, only about a third of these assemblies have corresponding genome size estimations for their respective species in publicly available databases. In this paper, genome assemblies are assessed regarding their total size compared to their respective publicly available genome size estimations. These deviations in size are assessed in relation to genome size, kingdom, sequencing platform, and standard assembly metrics such as N50 and BUSCO values. A large proportion of assemblies deviate from their estimated genome size by more than 10%, and deviations grow as genome size increases, suggesting that non-protein-coding and structural DNA may be to blame. Modest differences in performance among sequencing platforms are noted as well. While standard metrics of genome assessment do tend to indicate when an assembly approaches the estimated genome size, much of the variation in this size deviation is not explained by these raw metrics. A new, proportional N50 metric is proposed, in which N50 values are made relative to the average chromosome size of each species. This new metric has a stronger relationship with complete genome assemblies and, due to its proportional nature, allows a more direct comparison across assemblies of genomes that vary in size and architecture.
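The proportional N50 idea above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the "average chromosome size" is the estimated genome size divided by the haploid chromosome number, and it computes a standard N50 over the assembly's own total length. The function names and the toy contig lengths are hypothetical.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("assembly has no contigs")
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return lengths[-1]  # not reached for positive lengths


def proportional_n50(contig_lengths: Sequence[int],
                     genome_size_bp: float,
                     haploid_chromosome_number: int) -> float:
    """N50 expressed as a fraction of the species' average chromosome size.

    Assumption (not from the paper): average chromosome size =
    estimated genome size / haploid chromosome number.
    A value near 1.0 suggests roughly chromosome-scale contigs.
    """
    average_chromosome_bp = genome_size_bp / haploid_chromosome_number
    return n50(contig_lengths) / average_chromosome_bp


# Toy assembly (hypothetical numbers): 300 bp assembled, 320 bp genome, 2 chromosomes.
contigs = [100, 80, 50, 40, 30]
print(n50(contigs))                       # 80
print(proportional_n50(contigs, 320, 2))  # 0.5
```

Because the result is a fraction of chromosome size rather than a raw length in base pairs, a 0.5 means the same thing for a small fungal genome and a large plant genome, which is the comparability argument made in the abstract.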


Subjects
Chromosomes; Genome Size; Chromosomes/genetics; Eukaryota/genetics; Genomics/methods; Algorithms; Sequence Analysis, DNA/methods
2.
Science ; 375(6580): 515-522, 2022 02 04.
Article in English | MEDLINE | ID: mdl-35113693

ABSTRACT

The discovery of N6-methyldeoxyadenine (6mA) across eukaryotes led to a search for additional epigenetic mechanisms. However, some studies have highlighted confounding factors that challenge the prevalence of 6mA in eukaryotes. We developed a metagenomic method to quantitatively deconvolve 6mA events from a genomic DNA sample into species of interest, genomic regions, and sources of contamination. Applying this method, we observed high-resolution 6mA deposition in two protozoa. We found that commensal or soil bacteria explained the vast majority of 6mA in insect and plant samples. We found no evidence of high abundance of 6mA in Drosophila, Arabidopsis, or humans. Plasmids used for genetic manipulation, even those from Dam methyltransferase mutant Escherichia coli, could carry abundant 6mA, confounding the evaluation of candidate 6mA methyltransferases and demethylases. On the basis of this work, we advocate for a reassessment of 6mA in eukaryotes.


Subjects
DNA Methylation; DNA/chemistry; Deoxyadenosines/analysis; Eukaryota/genetics; Animals; Arabidopsis/genetics; Brain Neoplasms/genetics; Chlamydomonas reinhardtii/genetics; DNA/genetics; DNA, Bacterial/chemistry; DNA, Bacterial/genetics; DNA, Fungal/chemistry; DNA, Fungal/genetics; DNA, Protozoan/chemistry; DNA, Protozoan/genetics; Drosophila melanogaster/genetics; Drosophila melanogaster/microbiology; Epigenesis, Genetic; Escherichia coli/genetics; Eukaryota/metabolism; Glioblastoma/genetics; High-Throughput Nucleotide Sequencing; Humans; Leukocytes, Mononuclear/chemistry; Metagenomics; Plasmids; Sequence Analysis, DNA; Tetrahymena thermophila/genetics
3.
Sci Rep ; 11(1): 24241, 2021 12 20.
Article in English | MEDLINE | ID: mdl-34930992

ABSTRACT

Both aquatic and terrestrial biodiversity information can be detected in riverine water environmental DNA (eDNA). However, the effectiveness of using riverine water eDNA to simultaneously monitor riverine and terrestrial biodiversity information remains unclear. Here, we proposed that this monitoring effectiveness could be approximated by the transport effectiveness of land-to-river and upstream-to-downstream biodiversity information flows, described by three new indicators. We then conducted a case study in a watershed on the Qinghai-Tibet Plateau. The results demonstrated that monitoring effectiveness was higher on summer or autumn rainy days than in other seasons and weather conditions. Monitoring of bacterial biodiversity information was more efficient than monitoring of eukaryotic biodiversity information. On summer rainy days, 43-76% of species information in riparian sites could be detected in adjacent riverine water eDNA samples, 92-99% of species information in riverine sites could be detected in an eDNA sample taken 1 km downstream, and half of the dead bioinformation (bioinformation labeling biological material that lacked life activity and fertility) could be monitored 4-6 km downstream for eukaryotes and 13-19 km downstream for bacteria. This study provides a reference method and data for designing future monitoring projects and for evaluating their results.
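Percentages such as "43-76% of species information in riparian sites could be detected in adjacent riverine water eDNA samples" can be read as a set-overlap proportion. The abstract does not define the three indicators, so the sketch below is a hypothetical simplification, not the authors' indicators: it assumes each site is reduced to a set of detected taxa, and the taxon names are invented for illustration.

```python
def detection_fraction(source_taxa: set[str], sample_taxa: set[str]) -> float:
    """Fraction of taxa recorded at a source site (e.g. a riparian or
    upstream site) that are also detected in a riverine eDNA sample."""
    if not source_taxa:
        raise ValueError("source site has no recorded taxa")
    return len(source_taxa & sample_taxa) / len(source_taxa)


# Hypothetical taxa for one riparian site and the adjacent river sample.
riparian = {"Salix", "Carex", "Poa", "Pseudomonas"}
river_edna = {"Carex", "Pseudomonas", "Gammarus"}
print(detection_fraction(riparian, river_edna))  # 0.5
```

The same function applies to the upstream-to-downstream comparison by passing an upstream site's taxa and a downstream sample's taxa; taxa found only downstream (here "Gammarus") do not affect the result.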


Subjects
DNA, Environmental/analysis; Rivers; Water/chemistry; Biodiversity; Classification; DNA Barcoding, Taxonomic/methods; Ecology; Environment; Environmental Monitoring/methods; Eukaryota/genetics; Rain; Reproducibility of Results; Seasons
4.
Genes Genomics ; 42(7): 699-714, 2020 07.
Article in English | MEDLINE | ID: mdl-32445179

ABSTRACT

BACKGROUND: The apparent disconnection between biological complexity and both genome size (C-value) and gene number (G-value) is one of the long-standing biological puzzles. Gene-dense genomic sequences in prokaryotes or simple eukaryotes are highly constrained during selection, whereas gene-sparse genomic sequences in higher eukaryotes have low selection constraints. This review discusses the correlations of the C-value and G-value with genome architecture, polyploidy, repeatomes, introns, cell economy and phenomes. DISCUSSION: Eukaryotic chromosomes carry an assortment of repeated DNA sequences (repeatomes). Expansion of repeatome copies, together with polyploidization or whole-genome duplication (WGD), is a major driver of genome size (C-value) bloating, but genomes are equipped with counterbalancing systems such as diploidization, illegitimate recombination, and nonhomologous end joining (NHEJ) after double-strand breaks (DSBs). The lack of these efficient purging systems allowed repeat DNA to accumulate, resulting in extremely large genomes in several species. However, the correlation between chromosome number and genome size is unclear because results are inconsistent across different sets of species. Positive correlations between genome size and intron size and density were reported in early studies, but these proposals were refuted by results from larger numbers of species, in which genome-wide features of introns (size, density, gene contents, repeats) were only weakly associated with genome size. The assumption that the C-value and gene number (G-value) correlate with organismal complexity is acceptable in general, but it is often violated in specific lineages or species, hence the C- and G-value paradoxes. The C-value paradox is partly explained by noncoding repeatomes. 
The G-value paradox can also be explained by several genomic features: (1) one gene can produce many mature mRNAs by alternative splicing, and eukaryotic gene expression is highly regulated at both the transcriptional and translational levels; (2) many proteins exert multiple functions during development; (3) gene family expansions and contractions are frequent among evolutionarily close species; and (4) sets of homeotic genes regulate development, such that differences in organismal complexity among organisms are sometimes unclear. A large genome must be burdensome for the cell economy, and this constraint skews the distribution of genome sizes toward small genomes. Moreover, the C-value can affect the phenome. A strong positive correlation has been recognized between genome size and cell size, but the relationship with higher-level traits is weak or null. Additional analyses of the relationship between the C-value and the phenome should be carried out, because natural selection acts on the phenotype rather than the genotype. CONCLUSIONS: Dramatic advances in genomics have provided some answers to the C-value and G-value paradoxes, and we know the mechanisms by which current genomes have been constructed. However, basic questions have not yet been fully resolved. Why have some species retained small genomes while closely related species have large genomes? Random genetic drift and mutational pressure in populations of limited size might have affected genome size during evolution; thus, genome size may be a quasi-adaptive trait rather than an optimally adaptive one.


Subjects
Genome Size; Introns; Polyploidy; Repetitive Sequences, Nucleic Acid; Animals; DNA; Eukaryota/genetics; Evolution, Molecular; Humans
5.
Mol Ecol Resour ; 20(3)2020 May.
Article in English | MEDLINE | ID: mdl-32065492

ABSTRACT

Surveying microbial diversity and function is accomplished by combining complementary molecular tools. Among them, metagenomics is a PCR-free approach that captures the genetic information of entire microbial assemblages and is now performed at a relatively large scale and reasonable cost, mostly based on very short reads. Here, we investigated the potential of metagenomics to provide taxonomic reports of marine microbial eukaryotes. We prepared a curated database of reference sequences of the V4 region of the 18S rDNA, clustered at 97% similarity, and used this database to extract and classify metagenomic reads. More than half of these reads were unambiguously affiliated to a unique reference, whereas the rest could be assigned to a given taxonomic group. The overall diversity reported by metagenomics was similar to that obtained by amplicon sequencing of the V4 and V9 regions of the 18S rRNA gene, although one or both of these amplicon surveys performed poorly for groups such as Excavata, Amoebozoa, Fungi and Haptophyta. We then studied the diversity of picoeukaryotes and nanoeukaryotes using 91 metagenomes from the surface down to bathypelagic layers in different oceans, revealing a clear taxonomic separation between size fractions and depth layers. Finally, we retrieved long rDNA sequences from assembled metagenomes that improved phylogenetic reconstructions of particular groups. Overall, this study shows that metagenomics is an excellent resource for the taxonomic exploration of marine microbial eukaryotes.
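The two outcomes described above (a read affiliated to a unique reference, or only to a taxonomic group) can be sketched as a best-hit rule with a lowest-common-lineage fallback for ties. This is a hypothetical sketch, not the authors' pipeline: it assumes reads are already aligned to the references without length differences, and the 97% read-assignment threshold is an assumption chosen to mirror the reference clustering level, not a criterion stated in the abstract. Reference IDs and lineages are invented.

```python
from typing import Dict, Optional, Tuple

Lineage = Tuple[str, ...]


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.
    Columns where both sequences have a gap are ignored."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def classify_read(read: str,
                  references: Dict[str, Tuple[str, Lineage]],
                  min_identity: float = 97.0) -> Optional[Tuple[str, object]]:
    """Return ("reference", ref_id) for a unique best hit,
    ("group", shared_lineage) when several references tie, or None
    when no reference reaches min_identity (hypothetical threshold)."""
    if not references:
        raise ValueError("no reference sequences supplied")
    scores = {rid: percent_identity(read, seq) for rid, (seq, _) in references.items()}
    best = max(scores.values())
    if best < min_identity:
        return None
    hits = [rid for rid, s in scores.items() if s == best]
    if len(hits) == 1:
        return ("reference", hits[0])
    # Ties: keep the deepest taxonomic ranks shared by all tied references.
    shared = []
    for ranks in zip(*(references[rid][1] for rid in hits)):
        if all(r == ranks[0] for r in ranks):
            shared.append(ranks[0])
        else:
            break
    return ("group", tuple(shared))


refs = {
    "ref1": ("ACGTACGTAC", ("Eukaryota", "Stramenopiles", "Diatomea")),
    "ref2": ("ACGTACGTAA", ("Eukaryota", "Stramenopiles", "MAST")),
}
print(classify_read("ACGTACGTAC", refs))                    # ('reference', 'ref1')
print(classify_read("ACGTACGTAG", refs, min_identity=85))  # ('group', ('Eukaryota', 'Stramenopiles'))
```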


Subjects
Eukaryota/genetics; Metagenome/genetics; Microbiota/genetics; Biodiversity; DNA, Ribosomal/genetics; Metagenomics/methods; Oceans and Seas; Phylogeny; Polymerase Chain Reaction/methods; RNA, Ribosomal, 18S/genetics; Sequence Analysis, DNA/methods
6.
Sci Rep ; 9(1): 14820, 2019 10 15.
Article in English | MEDLINE | ID: mdl-31616016

ABSTRACT

Stellwagen Bank National Marine Sanctuary (SBNMS) in the Gulf of Maine is a historic fishing ground renowned for its remarkable productivity. Biodiversity conservation is a key management priority for SBNMS, yet data on the diversity of microorganisms, both prokaryotic and eukaryotic, are lacking. This study used next-generation sequencing to characterize sedimentary communities within SBNMS at three sites over two seasons. Targeting 16S and 18S small subunit (SSU) rRNA genes and fungal Internal Transcribed Spacer (ITS) rDNA sequences, we found high diversity at all taxonomic levels in all samples and identified 127 phyla, including 115 not previously represented in the SBNMS Management Plan and Environmental Assessment. Most of the diversity was bacterial, with 59 phyla, but nine Archaea, 18 Animalia, 14 Chromista, eight Protozoa, two Plantae, and 17 Fungi phyla were also represented. Samples from different sites and seasons were dominated by the same high-abundance organisms but displayed considerable variation in rare taxa. The levels of biodiversity seen at this small spatial scale suggest that benthic communities in this area support a diverse array of micro- and macro-organisms, and they provide a baseline for future studies assessing changes in community structure in response to rapid warming in the Gulf of Maine.


Subjects
Archaea/genetics; Bacteria/genetics; Eukaryota/genetics; Geologic Sediments/microbiology; Microbiota/genetics; Archaea/classification; Archaea/isolation & purification; Atlantic Ocean; Bacteria/classification; Bacteria/isolation & purification; Conservation of Natural Resources; DNA Barcoding, Taxonomic; DNA, Environmental/genetics; DNA, Environmental/isolation & purification; Ecological Parameter Monitoring; Eukaryota/classification; Eukaryota/isolation & purification; Maine; Metagenome; Phylogeny; Seawater/microbiology
7.
DNA Repair (Amst) ; 81: 102653, 2019 09.
Article in English | MEDLINE | ID: mdl-31324529

ABSTRACT

Cells utilize sophisticated RNA processing machines to ensure the quality of RNA. Many RNA processing machines have been further implicated in regulating the DNA damage response, signifying a strong link between RNA processing and genome maintenance. One of the most intricate and highly regulated RNA processing pathways is the processing of the precursor ribosomal RNA (pre-rRNA), which is paramount for the production of ribosomes. Removal of the Internal Transcribed Spacer 2 (ITS2), located between the 5.8S and 25S rRNA, is one of the most complex steps of ribosome assembly. Processing of the ITS2 is initiated by the newly discovered endoribonuclease Las1, which cleaves at the C2 site within the ITS2, generating products that are further processed by the polynucleotide kinase Grc3, the 5'→3' exonuclease Rat1, and the 3'→5' RNA exosome complex. In addition to their defined roles in ITS2 processing, these critical cellular machines participate in other stages of ribosome assembly, turnover of numerous cellular RNAs, and genome maintenance. Here we summarize recent work defining the molecular mechanisms of ITS2 processing by these essential RNA processing machines and highlight their emerging roles in transcription termination, heterochromatin function, telomere maintenance, and DNA repair.


Subjects
RNA Processing, Post-Transcriptional; RNA, Ribosomal/metabolism; Telomere; Transcription, Genetic; DNA Repair; Eukaryota/genetics; Eukaryota/metabolism; Exoribonucleases/metabolism; Nuclear Proteins/metabolism; Polynucleotide 5'-Hydroxyl-Kinase/metabolism; RNA, Ribosomal, 5.8S/metabolism; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism; Saccharomyces cerevisiae Proteins/metabolism
8.
Curr Biol ; 29(11): R512-R520, 2019 06 03.
Article in English | MEDLINE | ID: mdl-31163165

ABSTRACT

In sexual reproduction, opportunities are limited and the stakes are high. This inevitably leads to conflict. One pervasive conflict occurs within genomes between alternative alleles at heterozygous loci. Each gamete, and thus each offspring, will inherit only one of the two alleles from a heterozygous parent. Most alleles 'play fair' and have a 50% chance of being included in any given gamete. However, alleles can gain an enormous advantage if they act selfishly to force their own transmission into more than half, sometimes even all, of the functional gametes. These selfish alleles are known as 'meiotic drivers', and their cheating often imposes a high cost on the fertility of eukaryotes ranging from plants to mammals. Here, we review how several types of meiotic drivers directly and indirectly contribute to infertility, and argue that a complete picture of the genetics of infertility will require focusing on both the standard alleles (those that play fair) and the selfish alleles involved in genetic conflict.


Subjects
Eukaryota/physiology; Fertility/genetics; Meiosis/physiology; Eukaryota/genetics
9.
Viruses ; 11(4)2019 04 02.
Article in English | MEDLINE | ID: mdl-30986983

ABSTRACT

It has long been believed that the transfer and fixation of genetic material from RNA viruses into eukaryotic genomes is very unlikely. However, during the last decade, several cases of "virus-to-host" gene transfer from various viral families into various eukaryotic phyla have been described. These transfers have been identified by sequence similarity, which may disappear very quickly, especially in the case of RNA viruses. However, protein structure is known to be more conserved than sequence. Applying protein structure-guided, protein domain-specific Hidden Markov Models, we detected homologues of the Virgaviridae capsid protein in Schizophora flies. Further data analysis supported "virus-to-host" transfer into Schizophora ancestors as a single transfer event. This transfer was not identifiable by BLAST or by the other methods we applied. Our data show that structure-guided Hidden Markov Models should be used to detect ancestral virus-to-host transfers.


Subjects
Eukaryota/genetics; Markov Chains; Viral Proteins/chemistry; Viral Proteins/genetics; Viruses/genetics; Algorithms; Animals; Databases, Protein; Gene Transfer, Horizontal; Genome/genetics; Phylogeny; Protein Domains; Synteny
10.
Sci Rep ; 8(1): 11737, 2018 08 06.
Article in English | MEDLINE | ID: mdl-30082688

ABSTRACT

Understanding how biodiversity changes in time and space is vital to assess the effects of environmental change on benthic ecosystems. Because of the limitations of morphological methods, there has been a rapid expansion in the application of high-throughput sequencing methods to study benthic eukaryotic communities. However, the effect of sample size and small-scale spatial variation on the assessment of benthic eukaryotic diversity is still not well understood. Here, we investigate the effect of different sample volumes on the genetic assessment of benthic metazoan and non-metazoan eukaryotic community composition. Accordingly, DNA was extracted from five different cumulative sediment volumes comprising 100% of the top 2 cm of five benthic sampling cores, and used as template for Illumina MiSeq sequencing of 18S rRNA amplicons. Sample volumes strongly impacted diversity metrics for both metazoans and non-metazoan eukaryotes. Beta-diversity of treatments using smaller sample volumes was significantly different from the beta-diversity of the 100% sampled area. Overall, our findings indicate that sample volumes of 0.2 g (1% of the sampled area) are insufficient to account for spatial heterogeneity at small spatial scales, and that relatively large percentages of sediment core samples are needed to obtain robust diversity measurements of both metazoan and non-metazoan eukaryotes.
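The study varied physical sediment volume, which cannot be reproduced in code, but the general effect of smaller samples on diversity metrics can be illustrated by rarefaction, i.e. randomly subsampling reads from an OTU table. This is a swapped-in, only analogous technique, not the authors' method, and the OTU table is hypothetical. The sketch computes observed richness and the Shannon index, H' = -Σ p_i ln p_i.

```python
import math
import random
from collections import Counter
from typing import Dict, Iterable


def shannon_index(counts: Iterable[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over non-zero counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def subsample_reads(otu_counts: Dict[str, int], n_reads: int, seed: int = 0) -> Counter:
    """Draw n_reads reads without replacement from an OTU count table."""
    pool = [otu for otu, c in otu_counts.items() for _ in range(c)]
    rng = random.Random(seed)
    return Counter(rng.sample(pool, n_reads))


# Hypothetical table: three dominant OTUs plus 100 singletons (1,000 reads).
full = {"otu_a": 400, "otu_b": 300, "otu_c": 200}
full.update({f"rare_{i}": 1 for i in range(100)})
small = subsample_reads(full, n_reads=50)

# Observed richness: 103 in the full table, at most 50 in the subsample.
print(len(full), len(small))
print(round(shannon_index(full.values()), 3), round(shannon_index(small.values()), 3))
```

Rare OTUs are the first to disappear from small subsamples, which is one reason richness-based metrics are sensitive to sample size.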


Subjects
Eukaryota/classification; Eukaryota/genetics; Animals; Biodiversity; DNA Barcoding, Taxonomic; Ecosystem; Eukaryotic Cells; Geologic Sediments; High-Throughput Nucleotide Sequencing; RNA, Ribosomal, 18S/genetics; Sample Size
11.
Genomics ; 109(3-4): 186-191, 2017 07.
Article in English | MEDLINE | ID: mdl-28286147

ABSTRACT

The massive data produced since the advent of next-generation sequencing (NGS) technology are widely used in biological research and medical diagnosis. The crucial step in NGS analysis is read alignment, or mapping, which is computationally intensive and complex. Mapping bias tends to affect downstream analyses, including the detection of polymorphisms. To provide biologists with guidelines for selecting suitable aligners, we evaluated and benchmarked five aligners (BWA, Bowtie2, NovoAlign, Smalt and Stampy) and their mapping bias based on the characteristics of five microbial genomes. Two million simulated read pairs of various lengths (36bp, 50bp, 72bp, 100bp, 125bp, 150bp, 200bp, 250bp and 300bp) were aligned. Specific alignment features such as mapping sensitivity, percentage of properly paired reads, alignment time and the effect of tandem repeats on incorrectly mapped reads were evaluated. BWA was the fastest aligner, followed by Bowtie2 and Smalt; NovoAlign and Stampy were comparatively slower. Most of the aligners showed high sensitivity when mapping long reads (>100bp). NovoAlign, on the other hand, showed higher sensitivity for both short reads (36bp, 50bp, 72bp) and long reads (>100bp), and also for mapping a complex genome such as Plasmodium falciparum. The percentages of properly paired reads aligned by NovoAlign, BWA and Stampy were markedly higher. No aligner outperformed all the others in the benchmark; however, the aligners performed differently depending on genome characteristics. We expect the results of this study to help end users choose an aligner and thus improve the accuracy of read mapping.
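With simulated reads, each read's true origin is known, so mapping sensitivity can be computed by comparing reported positions against the truth. The sketch below is a hypothetical illustration, not the benchmark's code: it assumes a read counts as correctly mapped when it lands on the true contig within a small positional tolerance (5 bp here, an arbitrary choice), and it treats reads singly rather than as pairs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SimulatedRead:
    read_id: str
    true_contig: str
    true_pos: int
    mapped_contig: Optional[str] = None  # None means the read was unmapped
    mapped_pos: Optional[int] = None     # set together with mapped_contig


def mapping_metrics(reads: List[SimulatedRead], tolerance: int = 5) -> dict:
    """Summarize an aligner's output on simulated reads with known origins."""
    total = len(reads)
    mapped = [r for r in reads if r.mapped_contig is not None]
    correct = [r for r in mapped
               if r.mapped_contig == r.true_contig
               and abs(r.mapped_pos - r.true_pos) <= tolerance]
    return {
        "sensitivity": len(correct) / total,      # correctly placed / all reads
        "mapped_fraction": len(mapped) / total,   # any placement / all reads
        "incorrectly_mapped": len(mapped) - len(correct),
    }


reads = [
    SimulatedRead("r1", "chr1", 100, "chr1", 100),  # exact
    SimulatedRead("r2", "chr1", 200, "chr2", 200),  # wrong contig
    SimulatedRead("r3", "chr1", 300),               # unmapped
    SimulatedRead("r4", "chr1", 400, "chr1", 403),  # within tolerance
]
print(mapping_metrics(reads))
```

Separating "mapped" from "correctly mapped" matters in repeat-rich genomes: an aligner can report a high mapping rate while placing reads from tandem repeats at the wrong copy.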


Subjects
Genome; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Software; Bacteria/genetics; Eukaryota/genetics; Sensitivity and Specificity; Sequence Alignment; Tandem Repeat Sequences
12.
Methods Mol Biol ; 1555: 59-75, 2017.
Article in English | MEDLINE | ID: mdl-28092027

ABSTRACT

Today there exists a rapidly expanding number of sequenced genomes. Cataloging protein interaction domains such as the Src Homology 2 (SH2) domain across these genomes can be accomplished with ease thanks to existing algorithms and prediction models. An evolutionary analysis of SH2 domains provides a step towards understanding how SH2 proteins integrated with existing signaling networks to position phosphotyrosine signaling as a crucial driver of robust cellular communication networks in metazoans. However, organizing and tracing SH2 domains across organisms and understanding their evolutionary trajectory remain a challenge. This chapter describes several methodologies for analyzing the evolutionary trajectory of SH2 domains, including a global SH2 domain classification system that facilitates the annotation of new SH2 sequences, which is essential for tracing the lineage of SH2 domains throughout eukaryote evolution. This classification uses a combination of sequence homology, protein domain architecture and the positions of intron-exon boundaries within the SH2 domain or the genes encoding these domains. Discrete SH2 families can then be traced across various genomes to provide insight into their origins. Additional methods are also discussed for examining potential mechanisms of SH2 domain divergence, ranging from structural changes to alterations in protein domain content and genome duplication. A better understanding of SH2 domain evolution may therefore enhance our insight into the emergence of phosphotyrosine signaling and the expansion of protein interaction domains.


Subjects
Computational Biology/methods; Eukaryota/metabolism; Protein Interaction Domains and Motifs; Proteins/chemistry; Proteins/metabolism; src Homology Domains; Algorithms; Animals; Databases, Genetic; Eukaryota/genetics; Evolution, Molecular; Genomics/methods; Humans; Markov Chains; Models, Molecular; Phosphorylation; Phosphotyrosine/metabolism; Phylogeny; Protein Binding; Protein-Tyrosine Kinases/chemistry; Protein-Tyrosine Kinases/genetics; Protein-Tyrosine Kinases/metabolism; Proteins/genetics; RNA Splicing; Sequence Analysis, DNA; Signal Transduction; Software; Structure-Activity Relationship; Web Browser
13.
J Eukaryot Microbiol ; 63(6): 732-743, 2016 11.
Article in English | MEDLINE | ID: mdl-27062087

ABSTRACT

Tailings ponds in the Athabasca oil sands (Canada) contain fluid wastes generated by the extraction of bitumen from oil sands ores. Although the autochthonous prokaryotic communities have been relatively well characterized, almost nothing is known about microbial eukaryotes living in the anoxic soft sediments of tailings ponds or in the thin oxic layer of water that covers them. We carried out the first next-generation sequencing study of microbial eukaryotic diversity in oil sands tailings ponds. In metagenomes prepared from tailings sediment and surface water, we detected very low numbers of sequences encoding eukaryotic small subunit ribosomal RNA, representing seven major taxonomic groups of protists. We also produced and analysed three amplicon-based 18S rRNA libraries prepared from sediment samples. These revealed a more diverse set of taxa: 169 different OTUs encompassing up to eleven higher-order groups of eukaryotes, according to detailed classification using homology searching and phylogenetic methods. The 10 most abundant OTUs accounted for > 90% of the total reads, alongside large numbers of rare OTUs (< 1% abundance). Despite the anoxic and hydrocarbon-enriched nature of the environment, the tailings ponds harbour complex communities of microbial eukaryotes, indicating that these organisms should be taken into account when studying the microbiology of the oil sands.
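The two summary figures reported above (the share of reads held by the 10 most abundant OTUs, and the number of OTUs below 1% relative abundance) are straightforward to compute from an OTU count table. A minimal sketch; the function name and the count table are hypothetical, and the defaults simply mirror the cut-offs quoted in the abstract.

```python
from typing import Dict, Tuple


def abundance_summary(otu_counts: Dict[str, int],
                      top_n: int = 10,
                      rare_cutoff: float = 0.01) -> Tuple[float, int]:
    """Return (share of reads in the top_n most abundant OTUs,
    number of OTUs whose relative abundance is below rare_cutoff)."""
    total = sum(otu_counts.values())
    ranked = sorted(otu_counts.values(), reverse=True)
    top_share = sum(ranked[:top_n]) / total
    n_rare = sum(1 for c in ranked if c / total < rare_cutoff)
    return top_share, n_rare


# Hypothetical library: 10 dominant OTUs and 50 singletons (1,000 reads).
counts = {f"dom_{i}": 95 for i in range(10)}
counts.update({f"rare_{i}": 1 for i in range(50)})
print(abundance_summary(counts))  # (0.95, 50)
```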


Subjects
Eukaryota/genetics; Eukaryota/isolation & purification; Geologic Sediments/parasitology; Ponds/parasitology; Biodiversity; Eukaryota/classification; High-Throughput Nucleotide Sequencing; Oil and Gas Fields; Phylogeny
14.
Phys Biol ; 12(2): 026007, 2015 Apr 17.
Article in English | MEDLINE | ID: mdl-25884278

ABSTRACT

Gene activity in eukaryotes is in part regulated at the level of chromatin through the assembly of local chromatin states that are more or less permissive to transcription. How these chromatin states achieve their functions, and whether or not they contribute to the epigenetic inheritance of the transcriptional program, remain to be elucidated. In cycling cells, the stability of these states is strongly challenged by the periodic occurrence of replication and cell division. To address this question, we perform simulations of the stochastic dynamics of chromatin states when driven out of equilibrium by periodic perturbations. We show how epigenetic memory is significantly affected by the cell cycle length. In addition, we develop a simple model connecting the epigenetic state to the transcriptional state and gene activity. In particular, it suggests that replication may induce transcriptional bursting at repressive loci. Finally, we discuss how our findings (the effect of replication and the link to gene transcription) have original and deep implications for various biological contexts of epigenetic memory.


Subjects
DNA Replication; Epigenesis, Genetic; Transcription, Genetic; Cell Cycle; Cell Division; Eukaryota/genetics; Models, Genetic; Monte Carlo Method; Stochastic Processes
15.
Syst Biol ; 64(3): 472-91, 2015 May.
Article in English | MEDLINE | ID: mdl-25631175

ABSTRACT

In order to gain an understanding of the effectiveness of phylogenetic Markov chain Monte Carlo (MCMC), it is important to understand how quickly the empirical distribution of the MCMC converges to the posterior distribution. In this article, we investigate this problem on phylogenetic tree topologies with a metric that is especially well suited to the task: the subtree prune-and-regraft (SPR) metric. This metric directly corresponds to the minimum number of MCMC rearrangements required to move between trees in common phylogenetic MCMC implementations. We develop a novel graph-based approach to analyze tree posteriors and find that the SPR metric is much more informative than simpler metrics that are unrelated to MCMC moves. In doing so, we show conclusively that topological peaks do occur in Bayesian phylogenetic posteriors from real data sets as sampled with standard MCMC approaches, investigate the efficiency of Metropolis-coupled MCMC (MCMCMC) in traversing the valleys between peaks, and show that conditional clade distribution (CCD) can have systematic problems when there are multiple peaks.


Subjects
Classification/methods; Markov Chains; Monte Carlo Method; Phylogeny; Archaea/classification; Archaea/genetics; Bacteria/classification; Bacteria/genetics; Bayes Theorem; Eukaryota/classification; Eukaryota/genetics; Models, Genetic
16.
Mol Biol Evol ; 31(5): 1132-48, 2014 May.
Article in English | MEDLINE | ID: mdl-24497029

ABSTRACT

Tandem repeats (TRs) are a major element of protein sequences in all domains of life. They are particularly abundant in mammals, where by conservative estimates one in three proteins contains a TR. High generation-scale duplication and deletion rates have been reported for nucleic TR units. However, it is not known whether protein TR units can also be frequently lost or gained, providing a source of variation for rapid adaptation of protein function, or whether they tend to retain conserved TR unit configurations over long evolutionary times. To obtain a systematic picture, we performed a proteome-wide analysis of the mode of evolution of human protein TRs. For this purpose, we propose a novel method for the detection of orthologous TRs based on circular profile hidden Markov models. For all detected TRs, we reconstructed bispecies TR unit phylogenies across 61 eukaryotes ranging from human to yeast. Moreover, we performed additional analyses to correlate functional and structural annotations of human TRs with their mode of evolution. Surprisingly, we find that the vast majority of human TRs are ancient, with TR unit number and order preserved intact since distant speciation events. For example, ≥ 61% of all human TRs have been strongly conserved at least since the root of all mammals, approximately 300 Ma. Further, we find no human protein TR that shows evidence of strong recent duplications or deletions. These results contrast with the high generation-scale mutability of nucleic TRs. Presumably, most protein TRs fold into stable and conserved structures that are indispensable for the function of the TR-containing protein. All of our data and results are available for download from http://www.atgc-montpellier.fr/TRE.


Subjects
Eukaryota/chemistry; Eukaryota/genetics; Evolution, Molecular; Proteins/chemistry; Proteins/genetics; Tandem Repeat Sequences; Amino Acid Substitution; Animals; Conserved Sequence; Exons; Genome, Human; Humans; Markov Chains; Models, Genetic; Phylogeny; Proteome/chemistry; Proteome/genetics; Time Factors
17.
BMC Genomics ; 14: 420, 2013 Jun 24.
Article in English | MEDLINE | ID: mdl-23800006

ABSTRACT

BACKGROUND: The C2H2 zinc-finger (ZNF)-containing gene family is one of the largest and most complex gene families in metazoan genomes. These genes are known to exist in almost all eukaryotes, and they constitute a major subset of eukaryotic transcription factors. The genes of this family usually occur as clusters in genomes and are thought to have undergone a massive expansion in vertebrates by multiple tandem duplication events (BMC Evol Biol 8:176, 2008). RESULTS: In this study, we combined two popular approaches for homolog detection, Reciprocal Best Hit (RBH) (Proc Natl Acad Sci USA 95:6239-6244, 1998) and Hidden Markov model (HMM) profile search (Bioinformatics 14:755-763, 1998), on a diverse set of complete genomes of 124 eukaryotic species ranging from excavates to humans to identify all detectable members of 37 C2H2 ZNF gene families. We succeeded in identifying 3,890 genes as distinct members of 37 C2H2 gene families. These 37 families are distributed among the eukaryotes as progressive additions of gene blocks with increasing complexity of the organisms. The first block featuring the protists had 7 families, the second block featuring plants had 2 families, the third block featuring the fungi had 2 families (one of which was also present in plants) and the final block consisted of metazoans with 25 families. Among the metazoans, the simpler unicellular metazoans had just 15 of the 25 families, while most of the bilaterians had all 25 families, making up a total of 37 families. Multiple potential examples of lineage-specific gene duplications and gene losses were also observed. CONCLUSIONS: Our hybrid approach combines features of both the RBH and HMM methods for homolog detection. This largely automated technique is much faster than manual methods and is able to detect homologs accurately and efficiently among a diverse set of organisms. 
Our analysis of the 37 evolutionarily conserved C2H2 ZNF gene families revealed a stepwise appearance of ZNF families, agreeing well with the phylogenetic relationship of the organisms compared and their presumed stepwise increase in complexity (Science 300:1694, 2003).
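The Reciprocal Best Hit criterion mentioned above can be illustrated with a minimal Python sketch. It assumes pairwise similarity scores (for example BLAST bit scores) have already been computed in both directions between two genomes and stored in plain dictionaries; the input format, gene names and function names are hypothetical, and the HMM profile step of the authors' hybrid method is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: scores[query][subject] = similarity score (e.g. a BLAST bit score).
Scores = Dict[str, Dict[str, float]]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query gene to its highest-scoring subject (ties go to the first subject listed)."""
    return {query: max(hits, key=hits.get) for query, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Return gene pairs (a, b) in which each gene is the other's best hit."""
    best_ab = best_hits(a_vs_b)  # genome A searched against genome B
    best_ba = best_hits(b_vs_a)  # genome B searched against genome A
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy scores for two hypothetical genomes A and B.
a_vs_b = {
    "a1": {"b1": 250.0, "b2": 40.0},
    "a2": {"b1": 120.0, "b3": 90.0},
    "a3": {"b2": 60.0},
}
b_vs_a = {
    "b1": {"a1": 245.0, "a2": 118.0},
    "b2": {"a3": 58.0, "a1": 41.0},
    "b3": {"a2": 88.0},
}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)  # [('a1', 'b1'), ('a3', 'b2')]
```

In the toy data, b3's best hit is a2, but a2's best hit is b1, so the a2/b3 pair is not reciprocal and is discarded. This strictness is what makes RBH conservative, and it is one reason it is often paired with a more sensitive profile search.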


Subjects
Conserved Sequence , Eukaryota/genetics , Zinc Fingers/genetics , Amoebozoa/genetics , Animals , Fungi/genetics , Markov Chains , Plants/genetics
18.
Microbiologyopen ; 2(3): 402-14, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23520129

ABSTRACT

Despite the recent and significant increase in the study of aquatic microbial communities, little is known about the microbial diversity of complex ecosystems such as running waters. This study investigated the biodiversity of biofilm communities formed in a river with 454 Sequencing™. This river has the particularity of integrating both organic and microbiological pollution, as receiver of agricultural pollution in its upstream catchment area and urban pollution through discharges of the wastewater treatment plant of the town of Billom. Different regions of the small subunit (SSU) ribosomal RNA gene were targeted using nine pairs of primers, either universal or specific for bacteria, eukarya, or archaea. Our aim was to characterize the widest range of rDNA sequences using different sets of polymerase chain reaction (PCR) primers. A first look at read abundance revealed that a large proportion (47-48%) were rare sequences (<5 copies). Prokaryotic phyla accounted for most of the species richness, and eukaryotic phyla accounted for a small part. Among the prokaryotic phyla, Proteobacteria (beta and alpha) predominated, followed by Bacteroidetes together with a large number of nonaffiliated bacterial sequences. Bacillariophyta plastids were abundant. The remaining bacterial phyla, Verrucomicrobia and Cyanobacteria, made up the rest of the bulk biodiversity. The most abundant eukaryotic phyla were annelid worms, followed by diatoms and chlorophytes. These latter phyla attest to the abundance of plastids and the importance of photosynthetic activity for the biofilm. These findings highlight the existence and plasticity of multiple trophic levels within these complex biological systems.


Subjects
Archaea/classification , Bacteria/classification , Biofilms/growth & development , Biota , Eukaryota/classification , Rivers/microbiology , Rivers/parasitology , Archaea/genetics , Bacteria/genetics , DNA, Ribosomal/chemistry , DNA, Ribosomal/genetics , Eukaryota/genetics , France , Genes, rRNA , Physiological Phenomena , Sequence Analysis, DNA
19.
BMC Biol ; 10: 71, 2012 Aug 08.
Article in English | MEDLINE | ID: mdl-22873208

ABSTRACT

BACKGROUND: Membrane-bound organelles are a defining feature of eukaryotic cells, and play a central role in most of their fundamental processes. The Rab G proteins are the single largest family of proteins that participate in the traffic between organelles, with 66 Rabs encoded in the human genome. Rabs direct the organelle-specific recruitment of vesicle tethering factors, motor proteins, and regulators of membrane traffic. Each organelle or vesicle class is typically associated with one or more Rabs, with the Rabs present in a particular cell reflecting that cell's complement of organelles and trafficking routes. RESULTS: Through iterative use of hidden Markov models and tree building, we classified Rabs across the eukaryotic kingdom to provide the most comprehensive view of Rab evolution obtained to date. A strikingly large repertoire of at least 20 Rabs appears to have been present in the last eukaryotic common ancestor (LECA), consistent with the 'complexity early' view of eukaryotic evolution. We were able to place these Rabs into six supergroups, giving a deep view into eukaryotic prehistory. CONCLUSIONS: Tracing the fate of the LECA Rabs revealed extensive losses, with many extant eukaryotes having fewer Rabs and none having the full complement. We found that other Rabs have expanded and diversified, including a large expansion at the dawn of metazoans, which could be followed to provide an account of the evolutionary history of all human Rabs. Some Rab changes could be correlated with differences in cellular organization, and the relative lack of variation in other families of membrane-traffic proteins suggests that it is the changes in Rabs that primarily underlie the variation in organelles between species and cell types.


Subjects
Evolution, Molecular , Genomics , rab GTP-Binding Proteins/genetics , Amino Acid Sequence , Animals , Eukaryota/genetics , Genetic Variation , Humans , Markov Chains , Multigene Family , Phylogeny , Reproducibility of Results , Species Specificity , rab GTP-Binding Proteins/chemistry , rab GTP-Binding Proteins/classification
20.
PLoS Comput Biol ; 7(9): e1002150, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21935348

ABSTRACT

Protein-coding genes in eukaryotes are interrupted by introns, but intron densities differ widely between eukaryotic lineages. Vertebrates, some invertebrates and green plants have intron-rich genes, with 6-7 introns per kilobase of coding sequence, whereas most other eukaryotes have intron-poor genes. We reconstructed the history of intron gain and loss using a probabilistic Markov model fitted by Markov chain Monte Carlo (MCMC) on 245 orthologous genes from 99 genomes representing three of the five supergroups of eukaryotes for which multiple genome sequences are available. Intron-rich ancestors are confidently reconstructed for each major group, with 53 to 74% of the human intron density inferred with 95% confidence for the Last Eukaryotic Common Ancestor (LECA). The results of the MCMC reconstruction are compared with the reconstructions obtained using Maximum Likelihood (ML) and Dollo parsimony methods. An excellent agreement between the MCMC and ML inferences is demonstrated, whereas Dollo parsimony introduces a noticeable bias in the estimations, typically yielding lower ancestral intron densities than MCMC and ML. Evolution of eukaryotic genes was dominated by intron loss, with substantial gain only at the bases of several major branches including plants and animals. The highest intron density, 120 to 130% of the human value, is inferred for the last common ancestor of animals. The reconstruction shows that the entire line of descent from LECA to mammals was intron-rich, a state conducive to the evolution of alternative splicing.


Subjects
Eukaryota/genetics , Genome , Introns , Alternative Splicing , Animals , Evolution, Molecular , Humans , Markov Chains