Search | Biblioteca Virtual em Saúde

Lower Statistical Support with Larger Data Sets: Insights from the Ochrophyta Radiation.

Di Franco, Arnaud; Baurain, Denis; Glöckner, Gernot; Melkonian, Michael; Philippe, Hervé.

Mol Biol Evol ; 39(1), 2022 01 07.

Article in English | MEDLINE | ID: mdl-34694402

ABSTRACT

It is commonly assumed that increasing the number of characters has the potential to resolve evolutionary radiations. Here, we studied photosynthetic stramenopiles (Ochrophyta) using alignments of heterogeneous origin: mitochondrion, plastid, and nucleus. Surprisingly, while statistical support for the relationships between the six major Ochrophyta lineages increases when comparing the mitochondrion (6,762 sites) and plastid (21,692 sites) trees, it decreases in the nuclear (209,105 sites) tree. Statistical support is not simply related to the data set size but also to the quantity of phylogenetic signal available at each position and our ability to extract it. Here, we show that this ability is limited for current phylogenetic methods, because conflicting results were obtained when varying taxon sampling. Even though the use of a better-fitting model improved signal extraction and reduced the observed conflicts, the plastid data set provided higher statistical support for the ochrophyte radiation than the larger nuclear data set. We propose that the higher support observed in the plastid tree is due to an acceleration of the evolutionary rate in one short deep internal branch, implying that more phylogenetic signal per position is available to resolve the Ochrophyta radiation in the plastid than in the nuclear data set. Our work therefore suggests that, in order to resolve radiations, beyond the obvious use of data sets with more positions, we need to continue developing models of sequence evolution that better extract the phylogenetic signal and design methods to search for genes/characters that contain more signal specifically for short internal branches.

Subjects

Stramenopiles, Phylogeny, Plastids/genetics

Evaluating the usefulness of alignment filtering methods to reduce the impact of errors on evolutionary inferences.

Di Franco, Arnaud; Poujol, Raphaël; Baurain, Denis; Philippe, Hervé.

BMC Evol Biol ; 19(1): 21, 2019 01 11.

Article in English | MEDLINE | ID: mdl-30634908

ABSTRACT

BACKGROUND: Multiple Sequence Alignments (MSAs) are the starting point of molecular evolutionary analyses. Errors in MSAs generate a non-historical signal that can lead to incorrect inferences. Therefore, numerous efforts have been made to reduce the impact of alignment errors, by improving alignment algorithms and by developing methods to filter out poorly aligned regions. However, MSAs contain not only alignment errors but also primary sequence errors. Such errors may originate from sequencing errors, from assembly errors, or from erroneous structural annotations (such as incorrect intron/exon boundaries). Even though their existence is acknowledged, the impact of primary sequence errors on evolutionary inference is poorly characterized. RESULTS: In a first step to fill this gap, we have developed a program called HmmCleaner, which detects and eliminates these errors from MSAs. It uses profile hidden Markov models (pHMM) to identify sequence segments that poorly fit their MSA and selectively removes them. We assessed its performance using > 700 amino-acid MSAs from prokaryotes and eukaryotes, in which we introduced several types of simulated primary sequence errors. The sensitivity of HmmCleaner towards simulated primary sequence errors was > 95%. In a second step, we compared the impact of segment-filtering software (HmmCleaner and PREQUAL) relative to commonly used block-filtering software (BMGE and trimAl) on evolutionary analyses. Using real data from vertebrates, we observed that segment-filtering methods improve the quality of evolutionary inference more than the currently used block-filtering methods. The former were especially effective at improving branch length inferences and at reducing the false positive rate during detection of positive selection. CONCLUSIONS: Segment-filtering methods such as HmmCleaner accurately detect simulated primary sequence errors. Our results suggest that these errors are more detrimental than alignment errors. However, they also show that stochastic (sampling) error is predominant in single-gene evolutionary inferences. Therefore, we argue that MSA filtering should focus on segment instead of block removal and that more studies are required to find the optimal balance between accuracy improvement and stochastic error increase brought by data removal.

Subjects

Molecular Evolution, Sequence Alignment, Algorithms, Amino Acid Sequence, Conserved Sequence, Phylogeny, Software
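
The distinction this abstract draws between block filtering (removing whole alignment columns) and segment filtering (removing a stretch from a single sequence) can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the method of HmmCleaner, PREQUAL, BMGE, or trimAl: the function names `block_filter` and `segment_filter`, the example alignment, and the flagged positions are all hypothetical, and the positions are supplied by hand rather than detected with a profile HMM.

```python
from typing import Dict, List, Set, Tuple

GAP = "-"


def block_filter(msa: List[str], bad_columns: Set[int]) -> List[str]:
    """Block filtering: drop the flagged columns from every sequence."""
    return [
        "".join(ch for i, ch in enumerate(seq) if i not in bad_columns)
        for seq in msa
    ]


def segment_filter(
    msa: List[str], bad_segments: Dict[int, List[Tuple[int, int]]]
) -> List[str]:
    """Segment filtering: mask flagged stretches in individual sequences only.

    bad_segments maps a row index to half-open (start, end) intervals.
    """
    cleaned = []
    for row, seq in enumerate(msa):
        chars = list(seq)
        for start, end in bad_segments.get(row, []):
            end = min(end, len(chars))  # keep the alignment length unchanged
            for i in range(start, end):
                chars[i] = GAP
        cleaned.append("".join(chars))
    return cleaned


def residues(msa: List[str]) -> int:
    """Count the non-gap characters left in the alignment."""
    return sum(len(seq) - seq.count(GAP) for seq in msa)


# Hypothetical toy alignment: the second sequence carries a spurious "WWW"
# stretch (positions 3 to 5), standing in for a primary sequence error.
toy_msa = ["MKTAYIAK", "MKTWWWAK", "MKTAYIAK"]

by_block = block_filter(toy_msa, {3, 4, 5})
by_segment = segment_filter(toy_msa, {1: [(3, 6)]})

print(by_block, residues(by_block))
print(by_segment, residues(by_segment))
```

In this toy case both approaches remove the erroneous stretch, but block filtering also discards the correct residues of the other two sequences at those columns, whereas segment filtering leaves them in place. That difference in the amount of data removed is what the abstract's closing remark about balancing accuracy against stochastic error refers to.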

A Bos taurus sequencing methods benchmark for assembly, haplotyping, and variant calling.

Eché, Camille; Iampietro, Carole; Birbes, Clément; Dréau, Andreea; Kuchly, Claire; Di Franco, Arnaud; Klopp, Christophe; Faraut, Thomas; Djebali, Sarah; Castinel, Adrien; Zytnicki, Matthias; Denis, Erwan; Boussaha, Mekki; Grohs, Cécile; Boichard, Didier; Gaspin, Christine; Milan, Denis; Donnadieu, Cécile.

Sci Data ; 10(1): 369, 2023 06 08.

Article in English | MEDLINE | ID: mdl-37291142

ABSTRACT

Inspired by the production of reference data sets in the Genome in a Bottle project, we sequenced one Charolais heifer with different technologies: Illumina paired-end, Oxford Nanopore, Pacific Biosciences (HiFi and CLR), 10X Genomics linked-reads, and Hi-C. In order to generate haplotypic assemblies, we also sequenced both parents with short reads. From these data, we built two high-quality, trio-based haplotyped reference genomes and a consensus assembly, using up-to-date software packages. The assemblies obtained using PacBio HiFi reach a size of 3.2 Gb, which is significantly larger than the 2.7 Gb ARS-UCD1.2 reference. The consensus assembly reaches a BUSCO completeness of 95.8% for highly conserved mammalian genes. We also identified 35,866 structural variants larger than 50 base pairs. This assembly is a contribution to the bovine pangenome for the "Charolais" breed. These datasets will prove to be useful resources enabling the community to gain additional insight into sequencing technologies for applications such as SNP, indel or structural variant calling, and de novo assembly.

Subjects

Genomics, High-Throughput Nucleotide Sequencing, Animals, Cattle, Female, Benchmarking, Genome, DNA Sequence Analysis

Decontamination, pooling and dereplication of the 678 samples of the Marine Microbial Eukaryote Transcriptome Sequencing Project.

Van Vlierberghe, Mick; Di Franco, Arnaud; Philippe, Hervé; Baurain, Denis.

BMC Res Notes ; 14(1): 306, 2021 Aug 09.

Article in English | MEDLINE | ID: mdl-34372933

ABSTRACT

OBJECTIVES: Complex algae are photosynthetic organisms resulting from eukaryote-to-eukaryote endosymbiotic-like interactions. Yet the specific lineages and mechanisms are still under debate, which is why large-scale phylogenomic studies are needed. Whereas available proteomes provide a limited diversity of complex algae, MMETSP (Marine Microbial Eukaryote Transcriptome Sequencing Project) transcriptomes represent a valuable resource for phylogenomic analyses, owing to their broad and rich taxonomic sampling, especially of photosynthetic species. Unfortunately, this sampling is unbalanced and sometimes highly redundant. Moreover, we observed contaminated sequences in some samples. In such a context, tree inference and readability are impaired. Consequently, the aim of the data processing reported here is to release a unique set of clean and non-redundant transcriptomes produced through an original protocol featuring decontamination, pooling and dereplication steps. DATA DESCRIPTION: We submitted 678 MMETSP re-assembly samples to our parallel consolidation pipeline. Hence, we combined 423 samples into 110 consolidated transcriptomes, after the systematic removal of the most contaminated samples (186). This approach resulted in a total of 224 high-quality transcriptomes, easy to use and suitable to compute less contaminated, less redundant and more balanced phylogenies.

Subjects

Eukaryota, Transcriptome, Decontamination, Eukaryota/genetics, Phylogeny, Plants, Transcriptome/genetics

A Large and Consistent Phylogenomic Dataset Supports Sponges as the Sister Group to All Other Animals.

Simion, Paul; Philippe, Hervé; Baurain, Denis; Jager, Muriel; Richter, Daniel J; Di Franco, Arnaud; Roure, Béatrice; Satoh, Nori; Quéinnec, Éric; Ereskovsky, Alexander; Lapébie, Pascal; Corre, Erwan; Delsuc, Frédéric; King, Nicole; Wörheide, Gert; Manuel, Michaël.

Curr Biol ; 27(7): 958-967, 2017 Apr 03.

Article in English | MEDLINE | ID: mdl-28318975

ABSTRACT

Resolving the early diversification of animal lineages has proven difficult, even using genome-scale datasets. Several phylogenomic studies have supported the classical scenario in which sponges (Porifera) are the sister group to all other animals ("Porifera-sister" hypothesis), consistent with a single origin of the gut, nerve cells, and muscle cells in the stem lineage of eumetazoans (bilaterians + ctenophores + cnidarians). In contrast, several other studies have recovered an alternative topology in which ctenophores are the sister group to all other animals (including sponges). The "Ctenophora-sister" hypothesis implies that eumetazoan-specific traits, such as neurons and muscle cells, either evolved once along the metazoan stem lineage and were then lost in sponges and placozoans or evolved at least twice independently in Ctenophora and in Cnidaria + Bilateria. Here, we report on our reconstruction of deep metazoan relationships using a 1,719-gene dataset with dense taxonomic sampling of non-bilaterian animals that was assembled using a semi-automated procedure, designed to reduce known error sources. Our dataset outperforms previous metazoan gene superalignments in terms of data quality and quantity. Analyses with a best-fitting site-heterogeneous evolutionary model provide strong statistical support for placing sponges as the sister-group to all other metazoans, with ctenophores emerging as the second-earliest branching animal lineage. Only those methodological settings that exacerbated long-branch attraction artifacts yielded Ctenophora-sister. These results show that methodological issues must be carefully addressed to tackle difficult phylogenetic questions and pave the road to a better understanding of how fundamental features of animal body plans have emerged.

Subjects

Biological Evolution, Genome, Invertebrates/classification, Phylogeny, Porifera/genetics, Vertebrates/classification, Animals, Genomics/methods, Invertebrates/genetics, Porifera/classification, Vertebrates/genetics
