1.
Syst Biol ; 2024 Mar 08.
Article in English | MEDLINE | ID: mdl-38456663

ABSTRACT

The molluscan order Neogastropoda encompasses over 15,000 almost exclusively marine species playing important roles in benthic communities and in the economies of coastal countries. Neogastropoda underwent intensive cladogenesis in early stages of diversification, generating a 'bush' at the base of their evolutionary tree that has been hard to resolve even with high-throughput molecular data. In the present study, to resolve the bush, we use a variety of phylogenetic inference methods and a comprehensive exon capture dataset of 1,817 loci (79.6% data occupancy) comprising 112 taxa of 48 out of 60 Neogastropoda families. Our results show consistent topologies and high support in all analyses at (super)family level, supporting monophyly of Muricoidea, Mitroidea, Conoidea, and, with some reservations, Olivoidea and Buccinoidea. Volutoidea and Turbinelloidea as currently circumscribed are clearly paraphyletic. Despite our analyses consistently resolving most backbone nodes, three prove problematic: First, uncertain placement of Cancellariidae, as the sister group to either a Ficoidea-Tonnoidea clade, or to the rest of Neogastropoda, leaves monophyly of Neogastropoda unresolved. Second, relationships are contradictory at the base of the major 'core Neogastropoda' grouping. Third, coalescence-based analyses reject monophyly of the Buccinoidea in relation to Vasidae. We analysed phylogenetic signal of targeted loci in relation to potential biases, and we propose most probable resolutions in the latter two recalcitrant nodes. The uncertain placement of Cancellariidae may be explained by orthology violations due to differential paralog loss shortly after the whole genome duplication, which should be resolved with a curated set of longer loci.
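
The "data occupancy" figure quoted in this abstract can be illustrated with a minimal sketch. It assumes occupancy means the share of taxon-by-locus cells that contain data; the abstract does not define the term, and the function name, input layout and toy values below are hypothetical.

```python
def data_occupancy(matrix):
    """Fraction of taxon-by-locus cells that hold data.

    `matrix` maps each taxon to the set of loci recovered for it
    (a hypothetical layout chosen for illustration only).
    """
    loci = set().union(*matrix.values())          # every locus seen in any taxon
    filled = sum(len(present) for present in matrix.values())
    return filled / (len(matrix) * len(loci))


# Toy data set: 3 taxa, 4 loci, 9 of 12 cells filled.
toy = {
    "taxon_A": {"locus1", "locus2", "locus3", "locus4"},
    "taxon_B": {"locus1", "locus2", "locus3"},
    "taxon_C": {"locus1", "locus4"},
}
print(f"{data_occupancy(toy):.1%}")
```

Under this assumed definition, a 112-taxon, 1,817-locus matrix at 79.6% occupancy would have roughly four fifths of its cells filled.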

2.
Mol Phylogenet Evol ; 191: 107969, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38007006

ABSTRACT

Taxon sampling in most phylogenomic studies is often based on known taxa and/or morphospecies, thus ignoring undescribed diversity and/or cryptic lineages. The family Turridae is a group of venomous snails within the hyperdiverse superfamily Conoidea that includes many undescribed and cryptic species. Therefore, 'traditional' taxon sampling could constitute a strong risk of undersampling or oversampling Turridae lineages. To minimize potential biases, we establish a robust sampling strategy, from species delimitation to phylogenomics. More than 3,000 cox-1 "barcode" sequences were used to propose 201 primary species hypotheses, nearly half of them corresponding to species potentially new to science, including several cryptic species. A 110-taxon exon-capture tree, including species representatives of the diversity uncovered with the cox-1 dataset, was built using up to 4,178 loci. Our results show the polyphyly of the genus Gemmula, which is split into up to 10 separate lineages, of which half would not have been detected if the sampling strategy had been based only on described species. Our results strongly suggest that the use of blind, exploratory and intensive barcode sampling is necessary to avoid sampling biases in phylogenomic studies.


Subjects
DNA Barcoding, Taxonomic , Snails , Animals , Phylogeny , Snails/genetics , DNA , Exons
3.
Syst Biol ; 72(6): 1280-1295, 2023 Dec 30.
Article in English | MEDLINE | ID: mdl-37756489

ABSTRACT

The bootstrap method is based on resampling sequence alignments and re-estimating trees. Felsenstein's bootstrap proportions (FBP) are the most common approach to assess the reliability and robustness of sequence-based phylogenies. However, when increasing taxon sampling (i.e., the number of sequences) to hundreds or thousands of taxa, FBP tend to return low support for deep branches. The transfer bootstrap expectation (TBE) has been recently suggested as an alternative to FBP. TBE is measured using a continuous transfer index in [0,1] for each bootstrap tree, instead of the binary {0,1} index used in FBP to measure the presence/absence of the branch of interest. TBE has been shown to yield higher and more informative supports while inducing a very low number of falsely supported branches. Nonetheless, it has been argued that TBE must be used with care due to sampling issues, especially in datasets with a high number of closely related taxa. In this study, we conduct multiple experiments by varying taxon sampling and comparing FBP and TBE support values on different phylogenetic depths, using empirical datasets. Our results show that the main critique of TBE stands in extreme cases with shallow branches and highly unbalanced sampling among clades, but that TBE is still robust in most cases, while FBP is inescapably negatively impacted by high taxon sampling. We suggest guidelines and good practices in TBE (and FBP) computing and interpretation.


Subjects
Phylogeny , Reproducibility of Results
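
The contrast between the binary FBP index and the continuous transfer index described in this abstract can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes each tree is given as a list of its internal splits (one side of each bipartition, as a frozenset), that the transfer distance is the number of taxa to move for two splits to match, and that TBE normalises the mean transfer index by p - 1, where p is the size of the lighter side of the reference branch. All names and the toy trees are hypothetical.

```python
from statistics import mean


def transfer_distance(ref, other, taxa):
    """Number of taxa to move so that split `other` matches split `ref`."""
    # A split can be named by either of its sides, so compare against both.
    return min(len(ref ^ other), len(ref ^ (taxa - other)))


def branch_support(ref, bootstrap_trees, taxa):
    """Return (FBP, TBE) for one internal reference branch."""
    p = min(len(ref), len(taxa - ref))  # lighter side of the branch; needs p >= 2
    indices = []
    for splits in bootstrap_trees:
        closest = min(transfer_distance(ref, s, taxa) for s in splits)
        # Cap at p - 1, the distance to a pendant (single-taxon) branch.
        indices.append(min(closest, p - 1))
    fbp = mean(1.0 if i == 0 else 0.0 for i in indices)  # binary presence/absence
    tbe = 1.0 - mean(indices) / (p - 1)                  # continuous support in [0, 1]
    return fbp, tbe


taxa = frozenset("ABCDEF")
ref = frozenset("ABC")  # reference branch ABC | DEF
bootstrap_trees = [
    [frozenset("AB"), frozenset("ABC"), frozenset("EF")],  # contains the branch
    [frozenset("AB"), frozenset("CD"), frozenset("EF")],   # one taxon away from it
]
fbp, tbe = branch_support(ref, bootstrap_trees, taxa)
print(fbp, tbe)
```

In the toy input the second bootstrap tree misses the reference branch by a single taxon, so it counts as a full failure for FBP but only a partial one for TBE.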
4.
IEEE/ACM Trans Comput Biol Bioinform ; 20(3): 1700-1712, 2023.
Article in English | MEDLINE | ID: mdl-35849662

ABSTRACT

MAGUS is a recent multiple sequence alignment method that provides excellent accuracy on large challenging datasets. MAGUS uses divide-and-conquer: it divides the sequences into disjoint sets, computes alignments on the disjoint sets, and then merges the alignments using a technique it calls the Graph Clustering Method (GCM). To understand why MAGUS is so accurate, we show that GCM is a good heuristic for the NP-hard MWT-AM problem (Maximum Weight Trace, adapted to the Alignment Merging problem). Our study, using both biological and simulated data, establishes that MWT-AM scores correlate very well with alignment accuracy and presents improvements to GCM that are even better heuristics for MWT-AM. This study suggests a new direction for large-scale MSA estimation based on improved divide-and-conquer strategies, with the merging step based on optimizing MWT-AM. MAGUS and its enhanced versions are available at https://github.com/vlasmirnov/MAGUS.


Subjects
Algorithms , Software , Sequence Alignment , Heuristics , Cluster Analysis
5.
Zool Scr ; 51(5): 550-561, 2022 Sep.
Article in English | MEDLINE | ID: mdl-36245672

ABSTRACT

The Neogastropoda (Mollusca, Gastropoda) encompass more than 15,000 described species of marine predators, including several model organisms in toxinology, embryology and physiology. However, their phylogenetic relationships remain mostly unresolved and their classification unstable. We took advantage of the many mitogenomes published in GenBank to produce a new molecular phylogeny of the neogastropods. We completed the taxon sampling by using an in-house bioinformatic pipeline to retrieve mitochondrial genes from 13 transcriptomes, corresponding to five families not represented in GenBank, for a final dataset of 113 taxa. Because mitogenomic data are prone to reconstruction artefacts, eight different evolutionary models were applied to reconstruct phylogenetic trees with IQ-TREE, RAxML and MrBayes. Although the over-parametrization of some models produced trees with aberrant internal long branches, the global topology of the trees remained stable across models and software, and several relationships were revealed or found supported here for the first time. However, even though our dataset encompasses 60% of the valid families of neogastropods, some key taxa are missing and should be added in the future before proposing a revision of the classification of the neogastropods. Our study also demonstrates that even complex models struggle to satisfactorily handle the evolutionary history of mitogenomes, still leading to long-branch attractions in phylogenetic trees. Other approaches, such as reduced-genome strategies, must be envisaged to fully resolve the neogastropod phylogeny.

6.
Philos Trans R Soc Lond B Biol Sci ; 377(1861): 20210244, 2022 10 10.
Article in English | MEDLINE | ID: mdl-35989607

ABSTRACT

With the increased availability of sequence data and even of fully sequenced and assembled genomes, phylogeny estimation of very large trees (even of hundreds of thousands of sequences) is now a goal for some biologists. Yet, the construction of these phylogenies is a complex pipeline presenting analytical and computational challenges, especially when the number of sequences is very large. In the past few years, new methods have been developed that aim to enable highly accurate phylogeny estimations on these large datasets, including divide-and-conquer techniques for multiple sequence alignment and/or tree estimation, methods that can estimate species trees from multi-locus datasets while addressing heterogeneity due to biological processes (e.g. incomplete lineage sorting and gene duplication and loss), and methods to add sequences into large gene trees or species trees. Here we present some of these recent advances and discuss opportunities for future improvements. This article is part of a discussion meeting issue 'Genomic population structures of microbial pathogens'.


Subjects
Gene Duplication , Genomics , Genomics/methods , Phylogeny , Sequence Alignment
7.
J Comput Biol ; 29(1): 74-89, 2022 01.
Article in English | MEDLINE | ID: mdl-34986031

ABSTRACT

Deep neural networks (DNNs) have been recently proposed for quartet tree phylogeny estimation. Here, we present a study evaluating recently trained DNNs in comparison to a collection of standard phylogeny estimation methods on a heterogeneous collection of datasets simulated under the same models that were used to train the DNNs, and also under similar conditions but with higher rates of evolution. Our study shows that using DNNs with quartet amalgamation is less accurate than several standard phylogeny estimation methods we explore (e.g., maximum likelihood and maximum parsimony). We further find that simple standard phylogeny estimation methods match or improve on DNNs for quartet accuracy, especially, but not exclusively, when used in a global manner (i.e., the tree on the full dataset is computed and then the induced quartet trees are extracted from the full tree). Thus, our study provides evidence that a major challenge impacting the utility of current DNNs for phylogeny estimation is their restriction to estimating quartet trees that must subsequently be combined into a tree on the full dataset. In contrast, global methods (i.e., those that estimate trees from the full set of sequences) are able to benefit from taxon sampling, and hence have higher accuracy on large datasets.


Subjects
Deep Learning , Neural Networks, Computer , Phylogeny , Amino Acid Sequence , Classification/methods , Computational Biology , Computer Simulation , Databases, Genetic/statistics & numerical data , Evolution, Molecular
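
The "global" use of a method mentioned in this abstract, where a tree on the full dataset is computed first and the induced quartet trees are then read off it, can be sketched as below. This is an assumption-laden illustration rather than the study's code: the full tree is represented only by its internal splits (one side of each bipartition), and the function name and toy tree are hypothetical.

```python
from itertools import combinations


def induced_quartet(four, splits):
    """Return the quartet topology ((a, b), (c, d)) that the tree induces
    on the four given taxa, or None if no split resolves them."""
    a, b, c, d = sorted(four)
    pairings = (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c)))
    for pair, rest in pairings:
        for side in splits:
            pair_in = [t in side for t in pair]
            rest_in = [t in side for t in rest]
            # The split must put one pair entirely on one side, the other pair on the other.
            if (all(pair_in) and not any(rest_in)) or (all(rest_in) and not any(pair_in)):
                return pair, rest
    return None


# Toy full tree ((A,B),(C,(D,E))) written as its internal splits.
splits = [frozenset("AB"), frozenset("DE")]
for four in combinations("ABCDE", 4):
    print(four, induced_quartet(four, splits))
```

Quartet accuracy could then be scored by comparing these induced topologies with those of a reference tree.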
8.
Syst Biol ; 71(3): 610-629, 2022 04 19.
Article in English | MEDLINE | ID: mdl-34450658

ABSTRACT

Species tree inference from gene family trees is a significant problem in computational biology. However, gene tree heterogeneity, which can be caused by several factors including gene duplication and loss, makes the estimation of species trees very challenging. While there have been several species tree estimation methods introduced in recent years to specifically address gene tree heterogeneity due to gene duplication and loss (such as DupTree, FastMulRFS, ASTRAL-Pro, and SpeciesRax), many incur high cost in terms of both running time and memory. We introduce a new approach, DISCO, that decomposes the multi-copy gene family trees into many single copy trees, which allows for methods previously designed for species tree inference in a single copy gene tree context to be used. We prove that using DISCO with ASTRAL (i.e., ASTRAL-DISCO) is statistically consistent under the GDL model, provided that ASTRAL-Pro correctly roots and tags each gene family tree. We evaluate DISCO paired with different methods for estimating species trees from single copy genes (e.g., ASTRAL, ASTRID, and IQ-TREE) under a wide range of model conditions, and establish that high accuracy can be obtained even when ASTRAL-Pro is not able to correctly root and tag the gene family trees. We also compare results using MI, an alternative decomposition strategy from Yang Y. and Smith S.A. (2014), and find that DISCO provides better accuracy, most likely as a result of covering more of the gene family tree leafset in the output decomposition. [Concatenation analysis; gene duplication and loss; species tree inference; summary method.].


Subjects
Algorithms , Gene Duplication , Computational Biology , Models, Genetic , Pedigree , Phylogeny
9.
Bioinformatics ; 38(4): 918-924, 2022 01 27.
Article in English | MEDLINE | ID: mdl-34791036

ABSTRACT

SUMMARY: Multiple sequence alignment is an initial step in many bioinformatics pipelines, including phylogeny estimation, protein structure prediction and taxonomic identification of reads produced in amplicon or metagenomic datasets, etc. Yet, alignment estimation is challenging on datasets that exhibit substantial sequence length heterogeneity, and especially when the datasets have fragmentary sequences as a result of including reads or contigs generated by next-generation sequencing technologies. Here, we examine techniques that have been developed to improve alignment estimation when datasets contain substantial numbers of fragmentary sequences. We find that MAGUS, a recently developed MSA method, is fairly robust to fragmentary sequences under many conditions, and that using a two-stage approach where MAGUS is used to align selected 'backbone sequences' and the remaining sequences are added into the alignment using ensembles of Hidden Markov Models further improves alignment accuracy. The combination of MAGUS with the ensemble of eHMMs (i.e. MAGUS+eHMMs) clearly improves on UPP, the previous leading method for aligning datasets with high levels of fragmentation. AVAILABILITY AND IMPLEMENTATION: UPP is available on https://github.com/smirarab/sepp, and MAGUS is available on https://github.com/vlasmirnov/MAGUS. MAGUS+eHMMs can be performed by running MAGUS to obtain the backbone alignment, and then using the backbone alignment as an input to UPP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Proteins , Sequence Alignment , Proteins/genetics , Proteins/chemistry , Metagenome , Phylogeny
10.
Proc Biol Sci ; 288(1954): 20211017, 2021 07 14.
Article in English | MEDLINE | ID: mdl-34229491

ABSTRACT

Marine gastropods of the genus Conus are renowned for their remarkable diversity and deadly venoms. While Conus venoms are increasingly well studied for their biomedical applications, we know surprisingly little about venom composition in other lineages of Conidae. We performed comprehensive venom transcriptomic profiling for Conasprella coriolisi and Pygmaeconus traillii, the first time for both genera. We complemented reference-based transcriptome annotation by a de novo toxin prediction guided by phylogeny, which involved transcriptomic data on two additional 'divergent' cone snail lineages, Profundiconus and Californiconus. We identified toxin clusters (SSCs) shared among all or some of the four analysed genera based on the identity of the signal region, a molecular tag present in toxins. In total, 116 and 98 putative toxins represent 29 and 28 toxin gene superfamilies in Conasprella and Pygmaeconus, respectively; about a quarter of these were found only by semi-manual annotation of the SSCs. Two rare gene superfamilies, originally identified from fish-hunting cone snails, were detected outside Conus rather unexpectedly, so we further investigated their distribution across the Conidae radiation. We demonstrate that both of these are, in fact, ubiquitous in Conidae, sometimes with extremely high expression. Our findings demonstrate how a phylogeny-aware approach circumvents methodological caveats of similarity-based transcriptome annotation.


Subjects
Conotoxins , Conus Snail , Animals , Conus Snail/genetics , Phylogeny , Snails , Venoms
11.
Bioinformatics ; 37(24): 4677-4683, 2021 12 11.
Article in English | MEDLINE | ID: mdl-34320635

ABSTRACT

MOTIVATION: BAli-Phy, a popular Bayesian method that co-estimates multiple sequence alignments and phylogenetic trees, is a rigorous statistical method, but due to its computational requirements, it has generally been limited to relatively small datasets (at most about 100 sequences). Here, we repurpose BAli-Phy as a 'phylogeny-aware' alignment method: we estimate the phylogeny from the input of unaligned sequences, and then use that as a fixed tree within BAli-Phy. RESULTS: We show that this approach achieves high accuracy, greatly superior to Prank, the current most popular phylogeny-aware alignment method, and is even more accurate than MAFFT, one of the top performing alignment methods in common use. Furthermore, this approach can be used to align very large datasets (up to 1000 sequences in this study). AVAILABILITY AND IMPLEMENTATION: See https://doi.org/10.13012/B2IDB-7863273_V1 for datasets used in this study. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Phylogeny , Bayes Theorem , Indonesia , Sequence Alignment
12.
Mol Phylogenet Evol ; 142: 106660, 2020 01.
Article in English | MEDLINE | ID: mdl-31639524

ABSTRACT

For over a decade now, high-throughput sequencing (HTS) approaches have revolutionized phylogenetics, both in terms of data production and methodology. While transcriptomes and (reduced) genomes are increasingly used, generating and analyzing HTS datasets remain expensive, time consuming and complex for most non-model taxa. Indeed, a literature survey revealed that 74% of the molecular phylogenetic trees published in 2018 are based on data obtained through Sanger sequencing. In this context, our goal was to identify the strategy that would represent the best compromise among costs, time and robustness of the resulting tree. We sequenced and assembled 32 transcriptomes of the marine mollusk family Turridae, considered as a typical non-model animal taxon. From these data, we extracted the loci most commonly used in gastropod phylogenies (cox1, 12S, 16S, 28S, h3 and 18S), full mitogenomes, and a reduced nuclear transcriptome representation. With each dataset, we reconstructed phylogenies and compared their robustness and accuracy. We discuss the impact of missing data and the use of statistical tests, tree metrics, and supertree and supermatrix methods to further improve phylogenetic data acquisition pipelines. We evaluated the overall costs (time and money) in order to identify the best compromise for phylogenetic data sampling in non-model animal taxa. Although sequencing full mitogenomes seems to constitute the best compromise both in terms of costs and node support, they are known to induce biases in phylogenetic reconstructions. Rather, we recommend systematically including loci commonly used for phylogenetics and taxonomy (i.e. DNA barcodes, rRNA genes, full mitogenomes, etc.) among the other loci when designing baits for capture.


Subjects
Phylogeny , Animals , Costs and Cost Analysis , Gene Expression Profiling , Genome , High-Throughput Nucleotide Sequencing , Mollusca/classification , Mollusca/genetics , Sequence Analysis, DNA
13.
Toxins (Basel) ; 11(11), 2019 10 28.
Article in English | MEDLINE | ID: mdl-31661832

ABSTRACT

Profundiconus is the most divergent cone snail genus and its unique phylogenetic position, sister to the rest of the family Conidae, makes it a key taxon for examining venom evolution and diversity. Venom gland and foot transcriptomes of Profundiconus cf. vaubani and Profundiconus neocaledonicus were de novo assembled, annotated, and analyzed for differential expression. One hundred and thirty-seven venom components were identified from P. cf. vaubani and 82 from P. neocaledonicus, with only four shared by both species. The majority of the transcript diversity was composed of putative peptides, including conotoxins, profunditoxins, turripeptides, insulin, and prohormone-4. However, there was also a significant percentage of other putative venom components such as chymotrypsin and L-rhamnose-binding lectin. The large majority of conotoxins appeared to be from new gene superfamilies, three of which are highly different from previously reported venom peptide toxins. Their low conotoxin diversity and the type of insulin found suggested that these species, for which no ecological information is available, have a worm or molluscan diet associated with a narrow dietary breadth. Our results indicate that Profundiconus venom is highly distinct from that of other cone snails, and therefore important for examining venom evolution in the Conidae family.


Subjects
Biological Evolution , Conotoxins/genetics , Conotoxins/toxicity , Conus Snail/chemistry , Conus Snail/genetics , Genetic Variation , Animals
14.
Mol Ecol ; 27(22): 4591-4611, 2018 11.
Article in English | MEDLINE | ID: mdl-30252979

ABSTRACT

Species delimitation in poorly known and diverse taxa is usually performed based on monolocus, DNA-barcoding-like approaches, while multilocus data are often used to test alternative species hypotheses in well-studied groups. We combined both approaches to delimit species in the Xenuroturris/Iotyrris complex, a group of venomous marine gastropods from the Indo-Pacific. First, COI sequences were analysed using three methods of species delimitation to propose primary species hypotheses. Second, RAD sequencing data were also obtained and a maximum-likelihood phylogenetic tree produced. We tested the impact of the level of missing data on the robustness of the phylogenetic tree obtained with the RAD-seq data. Alternative species partitions revealed with the COI data set were also tested using the RAD-seq data and the Bayes factor species delimitation method. The congruence between the species hypotheses proposed with the mitochondrial and nuclear data sets, together with the morphological variability of the shell and the radula and the distribution pattern, was used to turn the primary species hypotheses into secondary species hypotheses. Allopatric primary species hypotheses defined with the COI gene were interpreted to correspond to intraspecific structure. Most of the species are found sympatrically in the Philippines, and only one is confidently identified as a new species and described as Iotyrris conotaxis n. sp. The results obtained demonstrate the efficiency of the combined monolocus/multilocus approach to delimit species.


Subjects
Gastropoda/classification , Genetic Speciation , Phylogeny , Sequence Analysis, DNA/methods , Animal Shells , Animals , Bayes Theorem , Cell Nucleus/genetics , DNA, Mitochondrial/genetics , Indian Ocean , Likelihood Functions , Pacific Ocean
15.
Mol Biol Evol ; 35(10): 2355-2374, 2018 10 01.
Article in English | MEDLINE | ID: mdl-30032303

ABSTRACT

Transcriptome-based exon capture methods provide an approach to recover several hundred markers from genomic DNA, allowing for robust phylogenetic estimation at deep timescales. We applied this method to a highly diverse group of venomous marine snails, Conoidea, for which published phylogenetic trees remain mostly unresolved for the deeper nodes. We targeted 850 protein coding genes (678,322 bp) in ca. 120 samples, spanning all (except one) known families of Conoidea and a broad selection of non-Conoidea neogastropods. The capture was successful for most samples, although capture efficiency decreased when DNA libraries were of insufficient quality and/or quantity (dried samples or low starting DNA concentration) and when targeting the most divergent lineages. An average of 75.4% of proteins was recovered, and the resulting trees, reconstructed using both supermatrix (IQ-tree) and supertree (Astral-II, combined with the Weighted Statistical Binning method) approaches, are almost fully supported. A reconstructed fossil-calibrated tree dates the origin of Conoidea to the Lower Cretaceous. We provide descriptions for two new families. The phylogeny revealed in this study provides a robust framework to reinterpret changes in Conoidea anatomy through time. Finally, we used the phylogeny to test the impact of the venom gland and radular type on diversification rates. Our analyses revealed that repeated losses of the venom gland had no effect on diversification rates, while families with a breadth of radula types showed increases in diversification rates, thus suggesting that trophic ecology may have an impact on the evolution of Conoidea.


Subjects
Conus Snail/genetics , Sequence Analysis, DNA/methods , Animals , Biological Evolution , Evolution, Molecular , Exons , Gastropoda/genetics , Genetic Variation/genetics , Phylogeny , Transcriptome/genetics