Pesquisa | Biblioteca Virtual em Saúde

Asymmetric Cluster-Based Measures for Comparative Phylogenetics.

Wagle, Sanket; Markin, Alexey; Górecki, Pawel; Anderson, Tavis K; Eulenstein, Oliver.

J Comput Biol ; 31(4): 312-327, 2024 04.

Artigo em Inglês | MEDLINE | ID: mdl-38634854

RESUMO

Phylogenetic inference and reconstruction methods generate hypotheses on evolutionary history. Competing inference methods are frequently used, and the evaluation of the generated hypotheses is achieved using tree comparison costs. The Robinson-Foulds (RF) distance is a widely used cost to compare the topology of two trees, but this cost is sensitive to tree error and can overestimate tree differences. To overcome this limitation, a refined version of the RF distance called the Cluster Affinity (CA) distance was introduced. However, CA distances are symmetric and cannot compare different types of trees. These asymmetric comparisons occur when gene trees are compared with species trees, when disparate datasets are integrated into a supertree, or when tree comparison measures are used to infer a phylogenetic network. In this study, we introduce a relaxation of the original Affinity distance to compare heterogeneous trees called the asymmetric CA cost. We also develop a biologically interpretable cost, the Cluster Support cost that normalizes by cluster size across gene trees. The characteristics of these costs are similar to the symmetric CA cost. We describe efficient algorithms, derive the exact diameters, and use these to standardize the cost to be applicable in practice. These costs provide objective, fine-scale, and biologically interpretable values that can assess differences and similarities between phylogenetic trees.

Assuntos

Algoritmos , Filogenia , Análise por Conglomerados , Modelos Genéticos , Biologia Computacional/métodos , Evolução Molecular

PARNAS: Objectively Selecting the Most Representative Taxa on a Phylogeny.

Markin, Alexey; Wagle, Sanket; Grover, Siddhant; Vincent Baker, Amy L; Eulenstein, Oliver; Anderson, Tavis K.

Syst Biol ; 72(5): 1052-1063, 2023 11 01.

Artigo em Inglês | MEDLINE | ID: mdl-37208300

RESUMO

The use of next-generation sequencing technology has enabled phylogenetic studies with hundreds of thousands of taxa. Such large-scale phylogenies have become a critical component in genomic epidemiology in pathogens such as SARS-CoV-2 and influenza A virus. However, detailed phenotypic characterization of pathogens or generating a computationally tractable dataset for detailed phylogenetic analyses requires objective subsampling of taxa. To address this need, we propose parnas, an objective and flexible algorithm to sample and select taxa that best represent observed diversity by solving a generalized k-medoids problem on a phylogenetic tree. parnas solves this problem efficiently and exactly by novel optimizations and adapting algorithms from operations research. For more nuanced selections, taxa can be weighted with metadata or genetic sequence parameters, and the pool of potential representatives can be user-constrained. Motivated by influenza A virus genomic surveillance and vaccine design, parnas can be applied to identify representative taxa that optimally cover the diversity in a phylogeny within a specified distance radius. We demonstrated that parnas is more efficient and flexible than existing approaches. To demonstrate its utility, we applied parnas to 1) quantify SARS-CoV-2 genetic diversity over time, 2) select representative influenza A virus in swine genes derived from over 5 years of genomic surveillance data, and 3) identify gaps in H3N2 human influenza A virus vaccine coverage. We suggest that our method, through the objective selection of representatives in a phylogeny, provides criteria for quantifying genetic diversity that has application in the the rational design of multivalent vaccines and genomic epidemiology. PARNAS is available at https://github.com/flu-crew/parnas.

Assuntos

Vírus da Influenza A Subtipo H3N2 , Vacinas , Animais , Humanos , Suínos , Filogenia , Vírus da Influenza A Subtipo H3N2/genética , Genômica

RF-Net 2: fast inference of virus reassortment and hybridization networks.

Markin, Alexey; Wagle, Sanket; Anderson, Tavis K; Eulenstein, Oliver.

Bioinformatics ; 38(8): 2144-2152, 2022 04 12.

Artigo em Inglês | MEDLINE | ID: mdl-35150239

RESUMO

MOTIVATION: A phylogenetic network is a powerful model to represent entangled evolutionary histories with both divergent (speciation) and convergent (e.g. hybridization, reassortment, recombination) evolution. The standard approach to inference of hybridization networks is to (i) reconstruct rooted gene trees and (ii) leverage gene tree discordance for network inference. Recently, we introduced a method called RF-Net for accurate inference of virus reassortment and hybridization networks from input gene trees in the presence of errors commonly found in phylogenetic trees. While RF-Net demonstrated the ability to accurately infer networks with up to four reticulations from erroneous input gene trees, its application was limited by the number of reticulations it could handle in a reasonable amount of time. This limitation is particularly restrictive in the inference of the evolutionary history of segmented RNA viruses such as influenza A virus (IAV), where reassortment is one of the major mechanisms shaping the evolution of these pathogens. RESULTS: Here, we expand the functionality of RF-Net that makes it significantly more applicable in practice. Crucially, we introduce a fast extension to RF-Net, called Fast-RF-Net, that can handle large numbers of reticulations without sacrificing accuracy. In addition, we develop automatic stopping criteria to select the appropriate number of reticulations heuristically and implement a feature for RF-Net to output error-corrected input gene trees. We then conduct a comprehensive study of the original method and its novel extensions and confirm their efficacy in practice using extensive simulation and empirical IAV evolutionary analyses. AVAILABILITY AND IMPLEMENTATION: RF-Net 2 is available at https://github.com/flu-crew/rf-net-2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Assuntos

Algoritmos , Vírus da Influenza A , Filogenia , Simulação por Computador , Vírus da Influenza A/genética , Evolução Molecular , Modelos Genéticos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA