Search | BVS Regional Portal

Host-Virus Cophylogenetic Trajectories: Investigating Molecular Relationships between Coronaviruses and Bat Hosts.

Li, Wanlin; Tahiri, Nadia.

Viruses ; 16(7), 2024 Jul 15.

Article in English | MEDLINE | ID: mdl-39066295

ABSTRACT

Bats, with their virus tolerance, social behaviors, and mobility, are reservoirs for emerging viruses, including coronaviruses (CoVs), which are known for their genetic flexibility. Studying the cophylogenetic link between bats and CoVs provides vital insights into transmission dynamics and host adaptation. Prior research has yielded valuable insights into host switching, cospeciation, and other dynamics of the interaction between CoVs and bats. Nonetheless, the current literature lacks a comparative cophylogenetic analysis of how individual sequence fragments contribute to host-virus co-evolution. In this study, we analyzed the cophylogenetic patterns of 69 host-virus connections. Of these 69 links, 47 showed significant cophylogeny based on ParaFit and PACo analyses, affirming strong associations. Focusing on two proteins, ORF1ab and spike, we conducted a comparative analysis of host and CoV phylogenies. For ORF1ab, specific windows of the multiple sequence alignment (positions 520-680, 770-870, 2930-3070, and 4910-5080) exhibited the lowest Robinson-Foulds (RF) distance (84.62%), indicating their higher contribution to the cophylogenetic association. Similarly, within the spike region, distinct window ranges (positions 0-140, 60-180, 100-410, 360-550, and 630-730) displayed the lowest RF distance, at 88.46%. Our analysis identified six recombination regions within ORF1ab (positions 360-1390, 550-1610, 680-1680, 700-1710, 2060-3090, and 2130-3250) and four within the spike protein (positions 10-510, 50-560, 170-710, and 230-730). The overlap of the minimal-RF-distance regions with the recombination regions supports a pivotal role for recombination in viral adaptation to host selection pressures. Furthermore, horizontal gene transfer analysis reveals prominent partial gene transfer events, occurring not only among variants within the same host species but also across host species boundaries. This suggests a more intricate pattern of genetic exchange. Our multifaceted approach offers a nuanced understanding of the interactions that govern the co-evolutionary dynamics between bat hosts and CoVs. This deeper insight enhances our comprehension of viral evolution and adaptation mechanisms, shedding light on the broader dynamics that propel viral diversity.
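The abstract above reports RF distances as percentages. As a rough illustration of what such a normalized value can mean, the following minimal sketch computes the Robinson-Foulds distance between two unrooted binary trees on the same leaf set and divides it by the maximum possible value, 2(n - 3). The nested-tuple tree encoding, the function names, and the toy host and virus trees are assumptions made for this example; this is not the authors' pipeline, and their exact normalization may differ.

```python
def leaf_sets(tree, out):
    """Return the leaves under `tree`, appending each internal node's leaf set to `out`."""
    if isinstance(tree, str):          # a leaf is encoded as a plain string
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def bipartitions(tree):
    """Non-trivial splits of an unrooted tree, each stored as the side without a reference leaf."""
    clades = []
    all_leaves = leaf_sets(tree, clades)
    ref = min(all_leaves)              # fixed reference leaf makes each split canonical
    splits = set()
    for clade in clades:
        side = all_leaves - clade if ref in clade else clade
        if 1 < len(side) < len(all_leaves) - 1:   # skip trivial splits
            splits.add(side)
    return splits, all_leaves


def normalized_rf(tree_a, tree_b):
    """Return (RF distance, RF / (2 * (n - 3))) for two trees on the same leaves."""
    splits_a, leaves_a = bipartitions(tree_a)
    splits_b, leaves_b = bipartitions(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must share the same leaf set")
    rf = len(splits_a ^ splits_b)      # splits present in exactly one tree
    max_rf = 2 * (len(leaves_a) - 3)
    return rf, (rf / max_rf if max_rf else 0.0)


# Hypothetical toy trees (not data from the paper)
host = ((("A", "B"), "C"), ("D", "E"))
virus = ((("A", "C"), "B"), ("D", "E"))
rf, ratio = normalized_rf(host, virus)
print(rf, f"{100 * ratio:.2f}%")
```

A lower percentage indicates that the two topologies share more splits, which is the sense in which a window with a low RF distance is read as agreeing more closely with the host tree.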

Subjects

Chiroptera , Coronavirus , Phylogeny , Chiroptera/virology , Animals , Coronavirus/genetics , Coronavirus/classification , Coronavirus/physiology , Molecular Evolution , Host-Pathogen Interactions/genetics , Coronavirus Spike Glycoprotein/genetics , Coronavirus Spike Glycoprotein/metabolism , Host Specificity , Coronavirus Infections/virology

GPTree Cluster: phylogenetic tree cluster generator in the context of supertree inference.

Koshkarov, Aleksandr; Tahiri, Nadia.

Bioinform Adv ; 3(1): vbad023, 2023.

Article in English | MEDLINE | ID: mdl-37056516

ABSTRACT

Summary: For many years, evolutionary and molecular biologists have been working with phylogenetic supertrees, which are oriented acyclic graph structures. In the standard approaches, supertrees are obtained by concatenating a set of phylogenetic trees defined on different but overlapping sets of taxa (i.e. species). More recent approaches propose alternative solutions for supertree inference. The testing of new metrics for comparing supertrees and adapting clustering algorithms to overlapping phylogenetic trees with different numbers of leaves requires large amounts of data. In this context, designing a new approach and developing a computer program to generate phylogenetic tree clusters with different numbers of overlapping leaves are key elements to advance research on phylogenetic supertrees and evolution. The main objective of the project is to propose a new approach to simulate clusters of phylogenetic trees defined on different, but mutually overlapping, sets of taxa, with biological events. The proposed generator can be used to generate a certain number of clusters of phylogenetic trees in Newick format with a variable number of leaves and with a defined level of overlap between trees in clusters. Availability and implementation: A Python script version 3.7, called GPTree Cluster, which implements the discussed approach, is freely available at: https://github.com/tahiri-lab/GPTree/tree/GPTreeCluster.
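To make the idea of a cluster of Newick trees with overlapping leaf sets concrete, here is a minimal sketch. It is not the GPTree Cluster algorithm: the function names, the "shared core plus random extras" overlap scheme, and all parameters are assumptions chosen for illustration, and no biological events are simulated.

```python
import random


def random_newick(leaves, rng):
    """Build a random binary topology by repeatedly joining two subtrees."""
    nodes = list(leaves)
    while len(nodes) > 1:
        a = nodes.pop(rng.randrange(len(nodes)))
        b = nodes.pop(rng.randrange(len(nodes)))
        nodes.append(f"({a},{b})")
    return nodes[0] + ";"


def overlapping_cluster(taxa, n_trees, shared, extra, seed=0):
    """Generate n_trees Newick strings that all contain the same `shared` core taxa
    plus `extra` taxa drawn independently for each tree (hypothetical scheme)."""
    rng = random.Random(seed)
    core = rng.sample(taxa, shared)
    rest = [t for t in taxa if t not in core]
    trees = [random_newick(core + rng.sample(rest, extra), rng)
             for _ in range(n_trees)]
    return core, trees


taxa = [f"t{i}" for i in range(10)]   # placeholder taxon names
core, trees = overlapping_cluster(taxa, n_trees=3, shared=4, extra=2)
for newick in trees:
    print(newick)
```

In this sketch the overlap level is controlled by the ratio of `shared` to `shared + extra`; the actual generator may define overlap differently.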

Intelligent personalized shopping recommendation using clustering and supervised machine learning algorithms.

Chabane, Nail; Bouaoune, Achraf; Tighilt, Reda; Abdar, Moloud; Boc, Alix; Lord, Etienne; Tahiri, Nadia; Mazoure, Bogdan; Acharya, U Rajendra; Makarenkov, Vladimir.

PLoS One ; 17(12): e0278364, 2022.

Article in English | MEDLINE | ID: mdl-36454766

ABSTRACT

Next basket recommendation is a critical task in market basket data analysis. It is particularly important in grocery shopping, where grocery lists are an essential part of shopping habits of many customers. In this work, we first present a new grocery Recommender System available on the MyGroceryTour platform. Our online system uses different traditional machine learning (ML) and deep learning (DL) algorithms, and provides recommendations to users in a real-time manner. It aims to help Canadian customers create their personalized intelligent weekly grocery lists based on their individual purchase histories, weekly specials offered in local stores, and product cost and availability information. We perform clustering analysis to partition given customer profiles into four non-overlapping clusters according to their grocery shopping habits. Then, we conduct computational experiments to compare several traditional ML algorithms and our new DL algorithm based on the use of a gated recurrent unit (GRU)-based recurrent neural network (RNN) architecture. Our DL algorithm can be viewed as an extension of DREAM (Dynamic REcurrent bAsket Model) adapted to multi-class (i.e. multi-store) classification, since a given user can purchase recommended products in different grocery stores in which these products are available. Among traditional ML algorithms, the highest average F-score of 0.516 for the considered data set of 831 customers was obtained using Random Forest, whereas our proposed DL algorithm yielded the average F-score of 0.559 for this data set. The main advantage of the presented Recommender System is that our intelligent recommendation is personalized, since a separate traditional ML or DL model is built for each customer considered. Such a personalized approach allows us to outperform the prediction results provided by general state-of-the-art DL models.

Subjects

Algorithms , Supervised Machine Learning , Canada , Cluster Analysis , Machine Learning

Invariant transformers of Robinson and Foulds distance matrices for Convolutional Neural Network.

Tahiri, Nadia; Veriga, Andrey; Koshkarov, Aleksandr; Morozov, Boris.

J Bioinform Comput Biol ; 20(4): 2250012, 2022 Aug.

Article in English | MEDLINE | ID: mdl-35798684

ABSTRACT

The evolutionary histories of genes can differ greatly from one another, which may be explained by horizontal gene transfer or recombination events. A phylogenetic tree would therefore represent the evolutionary history of each gene, which may present different patterns from the species tree that defines the main evolutionary patterns. In addition, phylogenetic trees of closely related species should be merged, thus minimizing the topological conflicts they present and obtaining consensus trees (in the case of homogeneous data) or supertrees (in the case of heterogeneous data). The traditional approaches are consensus tree inference (if the set of trees contains the same set of species) or supertree inference (if the set of trees contains different, but overlapping, sets of species). Consensus trees and supertrees are constructed to produce unique trees. However, these methods lose precision with respect to different evolutionary variability. Other approaches have been implemented to preserve this variability using the k-means algorithm or the k-medoids algorithm. Using a new method, we determine all possible consensus trees and supertrees that best represent the most significant evolutionary models in a set of phylogenetic trees, thereby increasing the precision of the results and decreasing the time required. Results: This paper presents in detail a new method for predicting the number of clusters in a Robinson and Foulds (RF) distance matrix using a convolutional neural network (CNN). We developed a new CNN approach (called CNNTrees) for multiple tree classification. This new strategy returns a number of clusters of the input phylogenetic trees for different-size sets of trees, which makes the new approach more stable and more robust. 
The paper provides an in-depth analysis of the relevant, but very difficult, problem of constructing alternative supertrees using phylogenies with different but overlapping sets of taxa. This new model will play an important role in the inference of Trees of Life (ToL). Availability and implementation: CNNTrees is available through a web server at https://tahirinadia.github.io/. The source code, data and information about installation procedures are also available at https://github.com/TahiriNadia/CNNTrees. Supplementary information: Supplementary data are available on GitHub platform. The evolutionary history of species is not unique, but is specific to sets of genes. Indeed, each gene has its own evolutionary history that differs considerably from one gene to another. For example, some individual genes or operons may be affected by specific horizontal gene transfer and recombination events. Thus, the evolutionary history of each gene must be represented by its own phylogenetic tree, which may exhibit different evolutionary patterns than the species tree that accounts for the major vertical descent patterns. The result of traditional consensus tree or supertree inference methods is a single consensus tree or supertree. In this paper, we present in detail a new method for predicting the number of clusters in a Robinson and Foulds (RF) distance matrix using a convolutional neural network (CNN). We developed a new CNN approach (CNNTrees) to construct multiple tree classification. This new strategy returns a number of clusters in the order of the input trees, which allows this new approach to be more stable and also more robust.

Subjects

Algorithms , Computer Neural Networks , Horizontal Gene Transfer , Phylogeny , Software

Building alternative consensus trees and supertrees using k-means and Robinson and Foulds distance.

Tahiri, Nadia; Fichet, Bernard; Makarenkov, Vladimir.

Bioinformatics ; 38(13): 3367-3376, 2022 06 27.

Article in English | MEDLINE | ID: mdl-35579343

ABSTRACT

MOTIVATION: Each gene has its own evolutionary history which can substantially differ from evolutionary histories of other genes. For example, some individual genes or operons can be affected by specific horizontal gene transfer or recombination events. Thus, the evolutionary history of each gene should be represented by its own phylogenetic tree which may display different evolutionary patterns from the species tree that accounts for the main patterns of vertical descent. However, the output of traditional consensus tree or supertree inference methods is a unique consensus tree or supertree. RESULTS: We present a new efficient method for inferring multiple alternative consensus trees and supertrees to best represent the most important evolutionary patterns of a given set of gene phylogenies. We show how an adapted version of the popular k-means clustering algorithm, based on some remarkable properties of the Robinson and Foulds distance, can be used to partition a given set of trees into one (for homogeneous data) or multiple (for heterogeneous data) cluster(s) of trees. Moreover, we adapt the popular Calinski-Harabasz, Silhouette, Ball and Hall, and Gap cluster validity indices to tree clustering with k-means. Special attention is given to the relevant but very challenging problem of inferring alternative supertrees. The use of the Euclidean property of the objective function of the method makes it faster than the existing tree clustering techniques, and thus better suited for analyzing large evolutionary datasets. AVAILABILITY AND IMPLEMENTATION: Our KMeansSuperTreeClustering program along with its C++ source code is available at: https://github.com/TahiriNadia/KMeansSuperTreeClustering. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms , Software , Phylogeny , Consensus , Cluster Analysis

DoubleRecViz: a web-based tool for visualizing transcript-gene-species tree reconciliation.

Kuitche, Esaie; Qi, Yanchun; Tahiri, Nadia; Parmer, Jack; Ouangraoua, Aïda.

Bioinformatics ; 37(13): 1920-1922, 2021 07 27.

Article in English | MEDLINE | ID: mdl-33051656

ABSTRACT

MOTIVATION: A phylogenetic tree reconciliation is a mapping of one phylogenetic tree onto another which represents the co-evolution of two sets of taxa (e.g. parasite-host co-evolution, gene-species co-evolution). The reconciliation framework was extended to allow modeling the co-evolution of three sets of taxa such as transcript-gene-species co-evolutions. Several web-based tools have been developed for the display and manipulation of phylogenetic trees and co-phylogenetic trees involving two trees, but there currently exists no tool for visualizing the joint reconciliation between three phylogenetic trees. RESULTS: Here, we present DoubleRecViz, a web-based tool for visualizing double reconciliations between phylogenetic trees at three levels: transcript, gene and species. DoubleRecViz extends the RecPhyloXML model (developed for gene-species tree reconciliation) to represent joint transcript-gene and gene-species tree reconciliations. It is implemented using the Dash library, which is a toolbox that provides dynamic visualization functionalities for web data visualization in Python. AVAILABILITY AND IMPLEMENTATION: DoubleRecViz is available through a web server at https://doublerecviz.cobius.usherbrooke.ca. The source code and information about installation procedures are also available at https://github.com/UdeS-CoBIUS/DoubleRecViz. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Molecular Evolution , Software , Algorithms , Internet , Phylogeny

A new fast method for inferring multiple consensus trees using k-medoids.

Tahiri, Nadia; Willems, Matthieu; Makarenkov, Vladimir.

BMC Evol Biol ; 18(1): 48, 2018 04 05.

Article in English | MEDLINE | ID: mdl-29621975

ABSTRACT

BACKGROUND: Gene trees carry important information about specific evolutionary patterns which characterize the evolution of the corresponding gene families. However, a reliable species consensus tree cannot be inferred from a multiple sequence alignment of a single gene family or from the concatenation of alignments corresponding to gene families having different evolutionary histories. These evolutionary histories can be quite different due to horizontal transfer events or to ancient gene duplications which cause the emergence of paralogs within a genome. Many methods have been proposed to infer a single consensus tree from a collection of gene trees. Still, the application of these tree merging methods can lead to the loss of specific evolutionary patterns which characterize some gene families or some groups of gene families. Thus, the problem of inferring multiple consensus trees from a given set of gene trees becomes relevant. RESULTS: We describe a new fast method for inferring multiple consensus trees from a given set of phylogenetic trees (i.e. additive trees or X-trees) defined on the same set of species (i.e. objects or taxa). The traditional consensus approach yields a single consensus tree. We use the popular k-medoids partitioning algorithm to divide a given set of trees into several clusters of trees. We propose novel versions of the well-known Silhouette and Calinski-Harabasz cluster validity indices that are adapted for tree clustering with k-medoids. The efficiency of the new method was assessed using both synthetic and real data, such as a well-known phylogenetic dataset consisting of 47 gene trees inferred for 14 archaeal organisms. CONCLUSIONS: The method described here allows inference of multiple consensus trees from a given set of gene trees. It can be used to identify groups of gene trees having similar intragroup and different intergroup evolutionary histories. 
The main advantage of our method is that it is much faster than the existing tree clustering approaches, while providing similar or better clustering results in most cases. This makes it particularly well suited for the analysis of large genomic and phylogenetic datasets.
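The clustering step described in this abstract can be illustrated with a minimal k-medoids sketch on a precomputed tree-to-tree distance matrix. This is a plain alternating (assign, then update medoids) variant with a farthest-point initialization, both chosen here for brevity; it is not the authors' implementation, it omits their adapted Silhouette and Calinski-Harabasz indices, and the distance matrix below is made up for the example.

```python
def k_medoids(dist, k, max_iter=100):
    """Partition items 0..n-1 into k clusters given a symmetric distance matrix."""
    n = len(dist)
    # Initialization (an assumption of this sketch): start from the most central
    # item, then repeatedly add the item farthest from the current medoids.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))

    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each item joins its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Update step: each cluster's medoid minimizes the total within-cluster distance.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:                       # keep the old medoid for an empty cluster
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:                # converged
            break
        medoids = new_medoids
    return labels, medoids


# Hypothetical RF distances between six gene trees forming two groups
rf_matrix = [
    [0, 2, 2, 8, 8, 10],
    [2, 0, 2, 8, 10, 8],
    [2, 2, 0, 10, 8, 8],
    [8, 8, 10, 0, 2, 2],
    [8, 10, 8, 2, 0, 2],
    [10, 8, 8, 2, 2, 0],
]
labels, medoids = k_medoids(rf_matrix, k=2)
print(labels, medoids)
```

Each resulting cluster could then be summarized by its own consensus tree, which is the sense in which the approach yields multiple consensus trees instead of one.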

Subjects

Algorithms , Genomics/methods , Phylogeny , Archaea/metabolism , Cluster Analysis , Computer Simulation , Horizontal Gene Transfer/genetics , Ribosomal Proteins/metabolism , Species Specificity

A new efficient algorithm for inferring explicit hybridization networks following the Neighbor-Joining principle.

Willems, Matthieu; Tahiri, Nadia; Makarenkov, Vladimir.

J Bioinform Comput Biol ; 12(5): 1450024, 2014 Oct.

Article in English | MEDLINE | ID: mdl-25219384

ABSTRACT

Several algorithms and software packages have been developed for inferring phylogenetic trees. However, some biological phenomena, such as hybridization, recombination, or horizontal gene transfer, cannot be represented by a tree topology. Phylogenetic networks are needed to adequately represent these important evolutionary mechanisms. In this article, we present a new efficient heuristic algorithm for inferring hybridization networks from evolutionary distance matrices between species. The well-known Neighbor-Joining concept and the least-squares criterion are used for building networks. At each step of the algorithm, before joining two given nodes, we check if a hybridization event could be related to one of them or to both of them. The proposed algorithm finds the exact tree solution when the considered distance matrix is a tree metric (i.e. it is representable by a unique phylogenetic tree). It also provides very good hybrid recovery rates for large trees (with 32 and 64 leaves in our simulations) for both distance and sequence types of data. The results yielded by the new algorithm for real and simulated datasets are illustrated and discussed in detail.
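The abstract refers to the case where the distance matrix is a tree metric. A standard way to test this is the four-point condition: for any four taxa, the two largest of the three pairwise sums d(i,j) + d(k,l), d(i,k) + d(j,l), and d(i,l) + d(j,k) must be equal. The sketch below checks only this condition on distinct quadruples and assumes the input is already a symmetric matrix with a zero diagonal; the function name, tolerance, and example matrices are illustrative assumptions and are not taken from the paper.

```python
from itertools import combinations


def satisfies_four_point(d, tol=1e-9):
    """Check the four-point condition on every quadruple of distinct taxa."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted([d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]])
        if abs(sums[2] - sums[1]) > tol:   # the two largest sums must coincide
            return False
    return True


# Distances read off a toy quartet tree with leaf branches A=1, B=2, C=3, D=4
# and an internal branch of length 1 separating {A, B} from {C, D}.
additive = [
    [0, 3, 5, 6],
    [3, 0, 6, 7],
    [5, 6, 0, 7],
    [6, 7, 7, 0],
]
# Same matrix with d(B, D) perturbed, so no single tree fits it exactly.
perturbed = [
    [0, 3, 5, 6],
    [3, 0, 6, 9],
    [5, 6, 0, 7],
    [6, 9, 7, 0],
]
print(satisfies_four_point(additive), satisfies_four_point(perturbed))
```

When the condition fails, a tree cannot represent the distances exactly, which is the situation in which a network with hybridization events may fit the data better.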

Subjects

Algorithms , Genetic Hybridization , Phylogeny , Animals , Biological Evolution , Computational Biology , Computer Simulation , Culicidae/classification , Culicidae/genetics , Genetic Databases , Diploidy , Least-Squares Analysis , Genetic Models , Plants/classification , Plants/genetics , Polyploidy , Software
