Search | VHL Regional Portal

The Tree Reconstruction Game: Phylogenetic Reconstruction Using Reinforcement Learning.

Azouri, Dana; Granit, Oz; Alburquerque, Michael; Mansour, Yishay; Pupko, Tal; Mayrose, Itay.

Mol Biol Evol ; 41(6)2024 Jun 01.

Article in English | MEDLINE | ID: mdl-38829798

ABSTRACT

The computational search for the maximum-likelihood phylogenetic tree is an NP-hard problem. As such, current tree search algorithms may return a tree that is a local optimum rather than the global one. Here, we introduce a paradigm shift for predicting the maximum-likelihood tree, by approximating long-term gains of likelihood rather than maximizing likelihood gain at each step of the search. Our proposed approach harnesses the power of reinforcement learning to learn an optimal search strategy, aiming at the global optimum of the search space. We show that when analyzing empirical data containing dozens of sequences, the log-likelihood improvement from the starting tree obtained by the reinforcement learning-based agent was 0.969 or higher compared to that achieved by current state-of-the-art techniques. Notably, this performance is attained without the need to perform costly likelihood optimizations apart from the training process, thus potentially allowing for an exponential decrease in runtime. We exemplify this for data sets containing 15 sequences of length 18,000 bp and demonstrate that the reinforcement learning-based method is roughly three times faster than the state-of-the-art software. This study illustrates the potential of reinforcement learning in addressing the challenges of phylogenetic tree reconstruction.

Subject(s)

Algorithms , Phylogeny , Likelihood Functions , Models, Genetic , Computational Biology/methods , Software
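
The contrast drawn in the abstract above, between maximizing the immediate likelihood gain and approximating long-term gains, can be illustrated with a toy sketch. This is not the authors' method: the one-dimensional score landscape, the function names, and the use of exact value iteration in place of a learned reinforcement-learning agent are all assumptions made for illustration.

```python
def greedy_search(scores, start):
    """Hill-climb: move to the best neighbour only while the score improves."""
    state = start
    while True:
        neighbours = [s for s in (state - 1, state + 1) if 0 <= s < len(scores)]
        best = max(neighbours, key=lambda s: scores[s])
        if scores[best] <= scores[state]:
            return state  # no improving neighbour: stuck (possibly at a local optimum)
        state = best


def long_term_search(scores, start, gamma=0.9, sweeps=200):
    """Choose moves by discounted long-term gain (toy value iteration)."""
    n = len(scores)

    def actions(s):
        # Staying put is allowed (gain 0), as are moves to adjacent states.
        return [s] + [t for t in (s - 1, s + 1) if 0 <= t < n]

    value = [0.0] * n
    for _ in range(sweeps):
        value = [
            max(scores[t] - scores[s] + gamma * value[t] for t in actions(s))
            for s in range(n)
        ]

    state = start
    for _ in range(n):  # at most n moves are needed on this toy landscape
        nxt = max(
            actions(state),
            key=lambda t: scores[t] - scores[state] + gamma * value[t],
        )
        if nxt == state:
            break
        state = nxt
    return state


# Hypothetical landscape: a local optimum at index 1, the global optimum at index 4.
landscape = [0, 3, 1, 2, 10]
print(greedy_search(landscape, 0), long_term_search(landscape, 0))
```

On this landscape the greedy search stops at the first peak, whereas the long-term criterion accepts a temporary loss to reach the higher one.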

A machine-learning-based alternative to phylogenetic bootstrap.

Ecker, Noa; Huchon, Dorothée; Mansour, Yishay; Mayrose, Itay; Pupko, Tal.

Bioinformatics ; 40(Supplement_1): i208-i217, 2024 Jun 28.

Article in English | MEDLINE | ID: mdl-38940166

ABSTRACT

MOTIVATION: Currently used methods for estimating branch support in phylogenetic analyses often rely on the classic Felsenstein's bootstrap, parametric tests, or their approximations. As these branch support scores are widely used in phylogenetic analyses, having accurate, fast, and interpretable scores is of high importance. RESULTS: Here, we employed a data-driven approach to estimate branch support values with a probabilistic interpretation. To this end, we simulated thousands of realistic phylogenetic trees and the corresponding multiple sequence alignments. Each of the obtained alignments was used to infer the phylogeny using state-of-the-art phylogenetic inference software, which was then compared to the true tree. Using these extensive data, we trained machine-learning algorithms to estimate branch support values for each bipartition within the maximum-likelihood trees obtained by each software. Our results demonstrate that our model provides fast and more accurate probability-based branch support values than commonly used procedures. We demonstrate the applicability of our approach on empirical datasets. AVAILABILITY AND IMPLEMENTATION: The data supporting this work are available in the Figshare repository at https://doi.org/10.6084/m9.figshare.25050554.v1, and the underlying code is accessible via GitHub at https://github.com/noaeker/bootstrap_repo.

Subject(s)

Algorithms , Machine Learning , Phylogeny , Software , Sequence Alignment/methods , Computational Biology/methods , Likelihood Functions
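
The abstract above describes comparing each inferred tree to the true tree, bipartition by bipartition. A minimal sketch of that comparison step follows, assuming trees are written as nested tuples of taxon names; the representation and the function names are hypothetical and are not taken from the authors' repository.

```python
def bipartitions(tree):
    """Return the non-trivial bipartitions (splits) of a nested-tuple tree.

    Each split is stored as the side that contains the alphabetically
    smallest taxon, so the same split always has the same representation.
    """
    def leaf_set(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaf_set(child) for child in node))

    all_taxa = leaf_set(tree)
    reference = min(all_taxa)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        # Keep only splits with at least two taxa on each side.
        if 1 < len(below) < len(all_taxa) - 1:
            splits.add(below if reference in below else all_taxa - below)
        return below

    visit(tree)
    return splits


def support_labels(inferred_tree, true_tree):
    """Label each bipartition of the inferred tree as present in the true tree or not."""
    true_splits = bipartitions(true_tree)
    return {split: split in true_splits for split in bipartitions(inferred_tree)}


# Hypothetical five-taxon example.
true_tree = (("A", "B"), ("C", ("D", "E")))
inferred_tree = (("A", "C"), ("B", ("D", "E")))
for split, is_true in support_labels(inferred_tree, true_tree).items():
    print(sorted(split), is_true)
```

Binary labels of this kind are one plausible training target for a model that outputs a probability per branch; the features and learning algorithm used in the paper are not shown here.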

Statistical framework to determine indel-length distribution.

Wygoda, Elya; Loewenthal, Gil; Moshe, Asher; Alburquerque, Michael; Mayrose, Itay; Pupko, Tal.

Bioinformatics ; 40(2)2024 02 01.

Article in English | MEDLINE | ID: mdl-38269647

ABSTRACT

MOTIVATION: Insertions and deletions (indels) of short DNA segments, along with substitutions, are the most frequent molecular evolutionary events. Indels were shown to affect numerous macro-evolutionary processes. Because indels may span multiple positions, their impact is a product of both their rate and their length distribution. An accurate inference of indel-length distribution is important for multiple evolutionary and bioinformatics applications, most notably for alignment software. Previous studies counted the number of continuous gap characters in alignments to determine the best-fitting length distribution. However, gap-counting methods are not statistically rigorous, as gap blocks are not synonymous with indels. Furthermore, such methods rely on alignments that regularly contain errors and are biased due to the assumption of alignment methods that indel lengths follow a geometric distribution. RESULTS: We aimed to determine which indel-length distribution best characterizes alignments using statistically rigorous methodologies. To this end, we reduced the alignment bias using a machine-learning algorithm and applied an Approximate Bayesian Computation methodology for model selection. Moreover, we developed a novel method to test if current indel models provide an adequate representation of the evolutionary process. We found that the best-fitting model varies among alignments, with a Zipf length distribution fitting the vast majority of them. AVAILABILITY AND IMPLEMENTATION: The data underlying this article are available on GitHub at https://github.com/elyawy/SpartaSim and https://github.com/elyawy/SpartaPipeline.

Subject(s)

Algorithms , Software , Bayes Theorem , Sequence Alignment , INDEL Mutation , Evolution, Molecular
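
To make the two length distributions named in the abstract above concrete, the sketch below evaluates a truncated Zipf and a truncated geometric distribution over indel lengths. The parameter values, the maximum length, and the function names are illustrative assumptions; this is not the SpartaSim or SpartaPipeline code and it performs no model selection.

```python
def truncated_zipf(a, max_len):
    """P(length = k) proportional to k**(-a), for k = 1..max_len."""
    weights = [k ** -a for k in range(1, max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def truncated_geometric(p, max_len):
    """P(length = k) proportional to (1 - p)**(k - 1) * p, for k = 1..max_len."""
    weights = [(1 - p) ** (k - 1) * p for k in range(1, max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical parameters chosen only to show the difference in tail behaviour.
zipf = truncated_zipf(a=1.5, max_len=50)
geom = truncated_geometric(p=0.5, max_len=50)
for length in (1, 5, 20, 50):
    print(length, zipf[length - 1], geom[length - 1])
```

With these parameters the Zipf distribution assigns far more probability to long indels than the geometric one does, which is the qualitative difference that matters when choosing between the two models.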

Publisher Correction: Revising the global biogeography of annual and perennial plants.

Poppenwimer, Tyler; Mayrose, Itay; DeMalach, Niv.

Nature ; 626(8000): E16, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38297131

