Results 1 - 20 of 86
1.
Nature ; 629(8013): 851-860, 2024 May.
Article in English | MEDLINE | ID: mdl-38560995

ABSTRACT

Despite tremendous efforts in the past decades, relationships among main avian lineages remain heavily debated without a clear resolution. Discrepancies have been attributed to diversity of species sampled, phylogenetic method and the choice of genomic regions [1-3]. Here we address these issues by analysing the genomes of 363 bird species [4] (218 taxonomic families, 92% of total). Using intergenic regions and coalescent methods, we present a well-supported tree but also a marked degree of discordance. The tree confirms that Neoaves experienced rapid radiation at or near the Cretaceous-Palaeogene boundary. Sufficient loci rather than extensive taxon sampling were more effective in resolving difficult nodes. Remaining recalcitrant nodes involve species that are a challenge to model due to either extreme DNA composition, variable substitution rates, incomplete lineage sorting or complex evolutionary events such as ancient hybridization. Assessment of the effects of different genomic partitions showed high heterogeneity across the genome. We discovered sharp increases in effective population size, substitution rates and relative brain size following the Cretaceous-Palaeogene extinction event, supporting the hypothesis that emerging ecological opportunities catalysed the diversification of modern birds. The resulting phylogenetic estimate offers fresh insights into the rapid radiation of modern birds and provides a taxon-rich backbone tree for future comparative studies.


Subjects
Birds, Molecular Evolution, Genome, Phylogeny, Animals, Birds/genetics, Birds/classification, Birds/anatomy & histology, Brain/anatomy & histology, Biological Extinction, Genome/genetics, Genomics, Population Density, Male, Female
2.
Genome Res ; 31(11): 2107-2119, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34426513

ABSTRACT

Coalescent methods are proven and powerful tools for population genetics, phylogenetics, epidemiology, and other fields. A promising avenue for the analysis of large genomic alignments, which are increasingly common, is coalescent hidden Markov model (coalHMM) methods, but these methods have lacked general usability and flexibility. We introduce a novel method for automatically learning a coalHMM and inferring the posterior distributions of evolutionary parameters using black-box variational inference, with the transition rates between local genealogies derived empirically by simulation. This derivation enables our method to work directly with three or four taxa and through a divide-and-conquer approach with more taxa. Using a simulated data set resembling a human-chimp-gorilla scenario, we show that our method has comparable or better accuracy to previous coalHMM methods. Both species divergence times and population sizes were accurately inferred. The method also infers local genealogies, and we report on their accuracy. Furthermore, we discuss a potential direction for scaling the method to larger data sets through a divide-and-conquer approach. This accuracy means our method is useful now, and by deriving transition rates by simulation, it is flexible enough to enable future implementations of various population models.


Subjects
Population Genetics, Genetic Models, Animals, Computer Simulation, Humans, Population Density, Genetic Recombination
3.
PLoS Genet ; 17(8): e1009701, 2021 Aug.
Article in English | MEDLINE | ID: mdl-34407067

ABSTRACT

Trait evolution among a set of species, a central theme in evolutionary biology, has long been understood and analyzed with respect to a species tree. However, the field of phylogenomics, which has been propelled by advances in sequencing technologies, has ushered in the era of species/gene tree incongruence and, consequently, a more nuanced understanding of trait evolution. For a trait whose states are incongruent with the branching patterns in the species tree, the same state could have arisen independently in different species (homoplasy) or followed the branching patterns of gene trees, incongruent with the species tree (hemiplasy). Another evolutionary process whose extent and significance are better revealed by phylogenomic studies is gene flow between different species. In this work, we present a phylogenomic method for assessing the role of hybridization and introgression in the evolution of polymorphic or monomorphic binary traits. We apply the method to simulated evolutionary scenarios to demonstrate the interplay between the parameters of the evolutionary history and the role of introgression in a binary trait's evolution (which we call xenoplasy). Very importantly, we demonstrate, including on a biological data set, that inferring a species tree and using it for trait evolution analysis in the presence of gene flow could lead to misleading hypotheses about trait evolution.


Subjects
Computational Biology/methods, Genetic Introgression/genetics, Quantitative Trait Loci, Molecular Evolution, Genetic Speciation, Genetic Models, Phenotype, Phylogeny
4.
Bioinformatics ; 38(10): 2912-2914, 2022 May 13.
Article in English | MEDLINE | ID: mdl-35561189

ABSTRACT

SUMMARY: We report on a new single-cell DNA sequence simulator, SimSCSnTree, which generates an evolutionary tree of cells and evolves single nucleotide variants (SNVs) and copy number aberrations (CNAs) along its branches. Data generated by the simulator can be used to benchmark tools for single-cell genomic analyses, particularly in cancer where SNVs and CNAs are ubiquitous. AVAILABILITY AND IMPLEMENTATION: SimSCSnTree is now on BioConda and also is freely available for download at https://github.com/compbiofan/SimSCSnTree.git with detailed documentation.


Subjects
Genome, Genomics, Base Sequence, DNA Copy Number Variations, High-Throughput Nucleotide Sequencing, DNA Sequence Analysis, Single-Cell Analysis, Software
5.
Bioinformatics ; 38(Suppl 1): i195-i202, 2022 Jun 24.
Article in English | MEDLINE | ID: mdl-35758771

ABSTRACT

MOTIVATION: Single-nucleotide variants (SNVs) are the most common variations in the human genome. Recently developed methods for SNV detection from single-cell DNA sequencing data, such as SCIΦ and scVILP, leverage the evolutionary history of the cells to overcome the technical errors associated with single-cell sequencing protocols. Despite being accurate, these methods are not scalable to the extensive genomic breadth of single-cell whole-genome (scWGS) and whole-exome sequencing (scWES) data. RESULTS: Here, we report on a new scalable method, Phylovar, which extends the phylogeny-guided variant calling approach to sequencing datasets containing millions of loci. Through benchmarking on simulated datasets under different settings, we show that Phylovar outperforms SCIΦ in terms of running time while being more accurate than Monovar (which is not phylogeny-aware) in terms of SNV detection. Furthermore, we applied Phylovar to two real biological datasets: an scWES triple-negative breast cancer dataset consisting of 32 cells and 3375 loci, as well as an scWGS dataset of neuron cells from a normal human brain containing 16 cells and approximately 2.5 million loci. For the cancer data, Phylovar detected somatic SNVs with high or moderate functional impact that were also supported by a bulk sequencing dataset; for the neuron dataset, Phylovar identified 5745 SNVs with non-synonymous effects, some of which were associated with neurodegenerative diseases. AVAILABILITY AND IMPLEMENTATION: Phylovar is implemented in Python and is publicly available at https://github.com/NakhlehLab/Phylovar.


Subjects
High-Throughput Nucleotide Sequencing, Nucleotides, Human Genome, High-Throughput Nucleotide Sequencing/methods, Humans, Phylogeny, DNA Sequence Analysis
6.
Mol Phylogenet Evol ; 181: 107724, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36720421

ABSTRACT

Accurate inference of population parameters plays a pivotal role in unravelling evolutionary histories. While recombination has been universally accepted as a fundamental process in the evolution of sexually reproducing organisms, it remains challenging to model it exactly. Thus, existing coalescent-based approaches make different assumptions or approximations to facilitate phylogenetic inference, which can potentially bring about biases in estimates of evolutionary parameters when recombination is present. In this article, we evaluate the performance of population parameter estimation using three methods (StarBEAST2, SNAPP, and diCal2) that represent three different types of inference. We performed whole-genome simulations in which recombination rates, mutation rates, and levels of incomplete lineage sorting were varied. We show that StarBEAST2 using short or medium-sized loci is robust to realistic rates of recombination, which is in agreement with previous studies. SNAPP, as expected, is generally unaffected by recombination events. Most surprisingly, diCal2, a method that is designed to explicitly account for recombination, performs considerably worse than other methods under comparison.


Subjects
Genome, Mutation Rate, Phylogeny, Genetic Recombination, Genetic Models, Computer Simulation
7.
Syst Biol ; 71(3): 706-720, 2022 Apr 19.
Article in English | MEDLINE | ID: mdl-34605924

ABSTRACT

Phylogenetic networks provide a powerful framework for modeling and analyzing reticulate evolutionary histories. While polyploidy has been shown to be prevalent not only in plants but also in other groups of eukaryotic species, most work done thus far on phylogenetic network inference assumes diploid hybridization. These inference methods have been applied, with varying degrees of success, to data sets with polyploid species, even though polyploidy violates the mathematical assumptions underlying these methods. Statistical methods were developed recently for handling specific types of polyploids and so were parsimony methods that could handle polyploidy more generally yet while excluding processes such as incomplete lineage sorting. In this article, we introduce a new method for inferring most parsimonious phylogenetic networks on data that include polyploid species. Taking gene tree topologies as input, the method seeks a phylogenetic network that minimizes deep coalescences while accounting for polyploidy. We demonstrate the performance of the method on both simulated and biological data. The inference method as well as a method for evaluating evolutionary hypotheses in the form of phylogenetic networks are implemented and publicly available in the PhyloNet software package. [Incomplete lineage sorting; minimizing deep coalescences; multilabeled trees; multispecies network coalescent; phylogenetic networks; polyploidy.].


Subjects
Genetic Hybridization, Polyploidy, Biological Evolution, Humans, Phylogeny
8.
Syst Biol ; 71(2): 367-381, 2022 Feb 10.
Article in English | MEDLINE | ID: mdl-34245291

ABSTRACT

Many recent phylogenetic methods have focused on accurately inferring species trees when there is gene tree discordance due to incomplete lineage sorting (ILS). For almost all of these methods, and for phylogenetic methods in general, the data for each locus are assumed to consist of orthologous, single-copy sequences. Loci that are present in more than a single copy in any of the studied genomes are excluded from the data. These steps greatly reduce the number of loci available for analysis. The question we seek to answer in this study is: what happens if one runs such species tree inference methods on data where paralogy is present, in addition to or without ILS being present? Through simulation studies and analyses of two large biological data sets, we show that running such methods on data with paralogs can still provide accurate results. We use multiple different methods, some of which are based directly on the multispecies coalescent model, and some of which have been proven to be statistically consistent under it. We also treat the paralogous loci in multiple ways: from explicitly denoting them as paralogs, to randomly selecting one copy per species. In all cases, the inferred species trees are as accurate as equivalent analyses using single-copy orthologs. Our results have significant implications for the use of ILS-aware phylogenomic analyses, demonstrating that they do not have to be restricted to single-copy loci. This will greatly increase the amount of data that can be used for phylogenetic inference. [Gene duplication and loss; incomplete lineage sorting; multispecies coalescent; orthology; paralogy.]


Subjects
Gene Duplication, Genetic Models, Computer Simulation, Genome, Phylogeny
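
One of the treatments of paralogous loci named in the abstract above, randomly selecting one copy per species, can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `sample_one_copy_per_species` and the dictionary layout of a locus are assumptions made for the example.

```python
import random


def sample_one_copy_per_species(locus, rng=None):
    """Reduce a multi-copy locus to one sequence per species.

    `locus` maps a species name to the list of gene copies (sequences)
    found in that species. This layout is hypothetical, chosen only for
    the sketch. Species with no copies are dropped from the result.
    """
    rng = rng or random.Random()
    return {
        species: rng.choice(copies)
        for species, copies in locus.items()
        if copies
    }


# Hypothetical locus: two copies in human, one in chimp, none in gorilla.
example_locus = {
    "human": ["ACGT", "ACGA"],
    "chimp": ["ACGT"],
    "gorilla": [],
}
picked = sample_one_copy_per_species(example_locus, random.Random(0))
```

Passing a seeded `random.Random` makes the selection reproducible, which helps when the downstream species tree inference needs to be rerun on the same reduced data.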
9.
PLoS Comput Biol ; 18(6): e1010216, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35675326

ABSTRACT

Phylogenomic studies of prokaryotic taxa often assume conserved marker genes are homologous across their length. However, processes such as horizontal gene transfer or gene duplication and loss may disrupt this homology by recombining only parts of genes, causing gene fission or fusion. We show using simulation that it is necessary to delineate homology groups in a set of bacterial genomes without relying on gene annotations to define the boundaries of homologous regions. To solve this problem, we have developed a graph-based algorithm to partition a set of bacterial genomes into Maximal Homologous Groups of sequences (MHGs) where each MHG is a maximal set of maximum-length sequences which are homologous across the entire sequence alignment. We applied our algorithm to a dataset of 19 Enterobacteriaceae species and found that MHGs cover much greater proportions of genomes than markers and, relatedly, are less biased in terms of the functions of the genes they cover. We zoomed in on the correlation between each individual marker and their overlapping MHGs, and show that few phylogenetic splits supported by the markers are supported by the MHGs while many marker-supported splits are contradicted by the MHGs. A comparison of the species tree inferred from marker genes with the species tree inferred from MHGs suggests that the increased bias and lack of genome coverage by markers causes incorrect inferences as to the overall relationship between bacterial taxa.


Subjects
Bacterial Genome, Prokaryotic Cells, Horizontal Gene Transfer, Bacterial Genome/genetics, Phylogeny, Sequence Alignment
10.
Genome Res ; 29(11): 1847-1859, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31628257

ABSTRACT

Accumulation and selection of somatic mutations in a Darwinian framework result in intra-tumor heterogeneity (ITH) that poses significant challenges to the diagnosis and clinical therapy of cancer. Identification of the tumor cell populations (clones) and reconstruction of their evolutionary relationship can elucidate this heterogeneity. Recently developed single-cell DNA sequencing (SCS) technologies promise to resolve ITH to a single-cell level. However, technical errors in SCS data sets, including false-positives (FP) and false-negatives (FN) due to allelic dropout, and cell doublets, significantly complicate these tasks. Here, we propose a nonparametric Bayesian method that reconstructs the clonal populations as clusters of single cells, genotypes of each clone, and the evolutionary relationship between the clones. It employs a tree-structured Chinese restaurant process as the prior on the number and composition of clonal populations. The evolution of the clonal populations is modeled by a clonal phylogeny and a finite-site model of evolution to account for potential mutation recurrence and losses. We probabilistically account for FP and FN errors, and cell doublets are modeled by employing a Beta-binomial distribution. We develop a Gibbs sampling algorithm comprising partial reversible-jump and partial Metropolis-Hastings updates to explore the joint posterior space of all parameters. The performance of our method on synthetic and experimental data sets suggests that joint reconstruction of tumor clones and clonal phylogeny under a finite-site model of evolution leads to more accurate inferences. Our method is the first to enable this joint reconstruction in a fully Bayesian framework, thus providing measures of support of the inferences it makes.


Subjects
Clone Cells, Genotype, Neoplasms/genetics, Single-Cell Analysis/methods, Bayes Theorem, Humans, Phylogeny, Point Mutation
11.
Mol Biol Evol ; 37(6): 1809-1818, 2020 Jun 01.
Article in English | MEDLINE | ID: mdl-32077947

ABSTRACT

Species tree inference from multilocus data has emerged as a powerful paradigm in the postgenomic era, both in terms of the accuracy of the species tree it produces as well as in terms of elucidating the processes that shaped the evolutionary history. Bayesian methods for species tree inference are desirable in this area as they have been shown not only to yield accurate estimates, but also to naturally provide measures of confidence in those estimates. However, the heavy computational requirements of Bayesian inference have limited the applicability of such methods to very small data sets. In this article, we show that the computational efficiency of Bayesian inference under the multispecies coalescent can be improved in practice by restricting the space of the gene trees explored during the random walk, without sacrificing accuracy as measured by various metrics. The idea is to first infer constraints on the trees of the individual loci in the form of unresolved gene trees, and then to restrict the sampler to consider only resolutions of the constrained trees. We demonstrate the improvements gained by such an approach on both simulated and biological data.


Subjects
Genetic Models, Phylogeny, Bayes Theorem, Markov Chains, Monte Carlo Method
12.
PLoS Comput Biol ; 16(7): e1008012, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32658894

ABSTRACT

Single-cell DNA sequencing technologies are enabling the study of mutations and their evolutionary trajectories in cancer. Somatic copy number aberrations (CNAs) have been implicated in the development and progression of various types of cancer. A wide array of methods for CNA detection has been either developed specifically for or adapted to single-cell DNA sequencing data. Understanding the strengths and limitations that are unique to each of these methods is very important for obtaining accurate copy number profiles from single-cell DNA sequencing data. We benchmarked three widely used methods (Ginkgo, HMMcopy, and CopyNumber) on simulated as well as real datasets. To facilitate this, we developed a novel simulator of single-cell genome evolution in the presence of CNAs. Furthermore, to assess performance on empirical data where the ground truth is unknown, we introduce a phylogeny-based measure for identifying potentially erroneous inferences. While single-cell DNA sequencing is very promising for elucidating and understanding CNAs, our findings show that even the best existing method does not exceed 80% accuracy. New methods that significantly improve upon the accuracy of these three methods are needed. Furthermore, with the large datasets being generated, the methods must be computationally efficient.


Subjects
DNA Copy Number Variations, Human Genome, DNA Sequence Analysis/methods, Single-Cell Analysis/methods, Algorithms, Chromosome Aberrations, Computational Biology, Computer Simulation, Gene Dosage, Humans, Mutation, Neoplasms/genetics, Ploidies, Poisson Distribution, ROC Curve, Reproducibility of Results, Software
13.
BMC Genomics ; 21(Suppl 2): 219, 2020 Apr 16.
Article in English | MEDLINE | ID: mdl-32299348

ABSTRACT

BACKGROUND: Multi-locus species phylogeny inference is based on models of sequence evolution on gene trees as well as models of gene tree evolution within the branches of species phylogenies. Almost all statistical methods for this inference task assume a common mechanism across all loci as captured by a single value of each branch length of the species phylogeny. RESULTS: In this paper, we pursue a "no common mechanism" (NCM) model, where every gene tree evolves according to its own parameters of the species phylogeny. Based on this model, we derive an analytically integrated likelihood of both species trees and networks given the gene trees of multiple loci under an NCM model. We demonstrate the performance of inference under this integrated likelihood on both simulated and biological data. CONCLUSIONS: The model presented here will afford opportunities for exploring connections among various criteria for estimating species phylogenies from multiple, independent loci. Furthermore, further development of this model could potentially result in more efficient methods for searching the space of species phylogenies by focusing solely on the topology of the phylogeny.


Subjects
Molecular Evolution, Genomics/methods, Animals, Computer Simulation, Culicidae/genetics, Genetic Speciation, Likelihood Functions, Genetic Models, Neural Networks (Computer), Phylogeny, Probability, Wills/statistics & numerical data
14.
Genome Res ; 27(5): 793-800, 2017 May.
Article in English | MEDLINE | ID: mdl-28104618

ABSTRACT

Achieving complete, accurate, and cost-effective assembly of human genomes is of great importance for realizing the promise of precision medicine. The abundance of repeats and genetic variations in human genomes and the limitations of existing sequencing technologies call for the development of novel assembly methods that can leverage the complementary strengths of multiple technologies. We propose a Hybrid Structural variant Assembly (HySA) approach that integrates sequencing reads from next-generation sequencing and single-molecule sequencing technologies to accurately assemble and detect structural variants (SVs) in human genomes. By identifying homologous SV-containing reads from different technologies through a bipartite-graph-based clustering algorithm, our approach turns a whole genome assembly problem into a set of independent SV assembly problems, each of which can be effectively solved to enhance the assembly of structurally altered regions in human genomes. We used data generated from a haploid hydatidiform mole genome (CHM1) and a diploid human genome (NA12878) to test our approach. The result showed that, compared with existing methods, our approach had a low false discovery rate and substantially improved the detection of many types of SVs, particularly novel large insertions, small indels (10-50 bp), and short tandem repeat expansions and contractions. Our work highlights the strengths and limitations of current approaches and provides an effective solution for extending the power of existing sequencing technologies for SV discovery.


Subjects
Contig Mapping/methods, Human Genome, Genomic Structural Variation, Genomics/methods, DNA Sequence Analysis/methods, Software, Animals, Contig Mapping/standards, Diploidy, Genomics/standards, Haploidy, Humans, Mice, DNA Sequence Analysis/standards, Tandem Repeat Sequences
15.
Bioinformatics ; 35(14): i370-i378, 2019 Jul 15.
Article in English | MEDLINE | ID: mdl-31510688

ABSTRACT

MOTIVATION: Reticulate evolutionary histories, such as those arising in the presence of hybridization, are best modeled as phylogenetic networks. Recently developed methods allow for statistical inference of phylogenetic networks while also accounting for other processes, such as incomplete lineage sorting. However, these methods can only handle a small number of loci from a handful of genomes. RESULTS: In this article, we introduce a novel two-step method for scalable inference of phylogenetic networks from the sequence alignments of multiple, unlinked loci. The method infers networks on subproblems and then merges them into a network on the full set of taxa. To reduce the number of trinets to infer, we formulate a Hitting Set version of the problem of finding a small number of subsets, and implement a simple heuristic to solve it. We studied their performance, in terms of both running time and accuracy, on simulated as well as on biological datasets. The two-step method accurately infers phylogenetic networks at a scale that is infeasible with existing methods. The results are a significant and promising step towards accurate, large-scale phylogenetic network inference. AVAILABILITY AND IMPLEMENTATION: We implemented the algorithms in the publicly available software package PhyloNet (https://bioinfocs.rice.edu/PhyloNet). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms, Phylogeny, Molecular Evolution, Genome, Sequence Alignment, Software
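
The abstract above mentions a Hitting Set formulation solved with a simple heuristic but does not spell out either. As a rough illustration of the general idea only, the sketch below implements the textbook greedy heuristic for Hitting Set: repeatedly pick the element that appears in the most sets not yet hit. It is not the PhyloNet implementation, and the function name `greedy_hitting_set` and the example input are assumptions.

```python
from collections import Counter


def greedy_hitting_set(sets):
    """Return elements that together intersect every set in `sets`.

    Textbook greedy heuristic: at each step choose the element contained
    in the largest number of still-unhit sets. The result is a valid
    hitting set but is not guaranteed to be of minimum size.
    """
    remaining = [frozenset(s) for s in sets]
    if any(not s for s in remaining):
        raise ValueError("an empty set can never be hit")

    chosen = []
    while remaining:
        counts = Counter(e for s in remaining for e in s)
        # Highest count first; ties broken by repr() so runs are deterministic.
        best = min(counts, key=lambda e: (-counts[e], repr(e)))
        chosen.append(best)
        # Drop every set that the chosen element hits.
        remaining = [s for s in remaining if best not in s]
    return chosen


# Hypothetical input: each set must contain at least one chosen element.
example_sets = [{"A", "B"}, {"B", "C"}, {"C", "D"}]
hitting = greedy_hitting_set(example_sets)
```

Hitting Set is NP-hard in general, which is one plausible reason to settle for a heuristic when the goal is only to keep the number of subproblems small.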
16.
PLoS Genet ; 13(2): e1006598, 2017 Feb.
Article in English | MEDLINE | ID: mdl-28178269

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pgen.1006006.].

17.
Nat Methods ; 13(6): 505-7, 2016 Jun.
Article in English | MEDLINE | ID: mdl-27088313

ABSTRACT

Current variant callers are not suitable for single-cell DNA sequencing, as they do not account for allelic dropout, false-positive errors and coverage nonuniformity. We developed Monovar (https://bitbucket.org/hamimzafar/monovar), a statistical method for detecting and genotyping single-nucleotide variants in single-cell data. Monovar exhibited superior performance over standard algorithms on benchmarks and in identifying driver mutations and delineating clonal substructure in three different human tumor data sets.


Subjects
High-Throughput Nucleotide Sequencing/methods, Single Nucleotide Polymorphism, DNA Sequence Analysis/methods, Single-Cell Analysis/methods, Algorithms, Benchmarking, Cell Line, Exome/genetics, Humans, Sensitivity and Specificity, DNA Sequence Analysis/statistics & numerical data, Single-Cell Analysis/statistics & numerical data
18.
Bioinformatics ; 34(13): i376-i385, 2018 Jul 01.
Article in English | MEDLINE | ID: mdl-29950004

ABSTRACT

Motivation: Phylogenetic networks represent reticulate evolutionary histories. Statistical methods for their inference under the multispecies coalescent have recently been developed. A particularly powerful approach uses data that consist of bi-allelic markers (e.g. single nucleotide polymorphism data) and allows for exact likelihood computations of phylogenetic networks while numerically integrating over all possible gene trees per marker. While the approach has good accuracy in terms of estimating the network and its parameters, likelihood computations remain a major computational bottleneck and limit the method's applicability. Results: In this article, we first demonstrate why likelihood computations of networks take orders of magnitude more time when compared to trees. We then propose an approach for inference of phylogenetic networks based on pseudo-likelihood using bi-allelic markers. We demonstrate the scalability and accuracy of phylogenetic network inference via pseudo-likelihood computations on simulated data. Furthermore, we demonstrate aspects of robustness of the method to violations in the underlying assumptions of the employed statistical model. Finally, we demonstrate the application of the method to biological data. The proposed method allows for analyzing larger datasets in terms of the numbers of taxa and reticulation events. While pseudo-likelihood had been proposed before for data consisting of gene trees, the work here uses sequence data directly, offering several advantages as we discuss. Availability and implementation: The methods have been implemented in PhyloNet (http://bioinfocs.rice.edu/phylonet).


Subjects
Alleles, Computational Biology/methods, Genetic Models, Phylogeny, Software, Molecular Evolution, Probability
19.
Bioinformatics ; 34(17): i697-i705, 2018 Sep 01.
Article in English | MEDLINE | ID: mdl-30423064

ABSTRACT

Motivation: Species and gene trees represent how species and individual loci within their genomes evolve from their most recent common ancestors. These trees are central to addressing several questions in biology relating to, among other issues, species conservation, trait evolution and gene function. Consequently, their accurate inference from genomic data is a major endeavor. One approach to their inference is to co-estimate species and gene trees from genome-wide data. Indeed, Bayesian methods based on this approach already exist. However, these methods are very slow, limiting their applicability to datasets with small numbers of taxa. The more commonly used approach is to first infer gene trees individually, and then use gene tree estimates to infer the species tree. Methods in this category rely significantly on the accuracy of the gene trees which is often not high when the dataset includes closely related species. Results: In this work, we introduce a simple, yet effective, iterative method for co-estimating gene and species trees from sequence data of multiple, unlinked loci. In every iteration, the method estimates a species tree, uses it as a generative process to simulate a collection of gene trees, and then selects gene trees for the individual loci from among the simulated gene trees by making use of the sequence data. We demonstrate the accuracy and efficiency of our method on simulated as well as biological data, and compare them to those of existing competing methods. Availability and implementation: The method has been implemented in PhyloNet, which is publicly available at http://bioinfocs.rice.edu/phylonet.


Subjects
Computer Simulation, Machine Learning, Bayes Theorem, Genomics/methods, Genetic Models
20.
Bioinformatics ; 34(16): 2848-2850, 2018 Aug 15.
Article in English | MEDLINE | ID: mdl-29562324

ABSTRACT

Summary: The evolutionary histories of individual regions across a genomic alignment, called 'local genealogies', can differ from each other due to processes such as recombination. Elucidating and analyzing these local genealogies are important for a large number of inference tasks, including those pertaining to species phylogenies, evolutionary processes and trait mapping. In this paper, we present a toolkit for automated local phylogenomic analyses, or ALPHA. The purpose of this toolkit is to provide a wide array of functionalities for automated inference of local genealogies as well as analyses based on these local genealogies. The toolkit uses sliding windows to construct local genealogies and can compute a wide array of local-phylogeny-based statistics, such as the D-statistic. The toolkit comes with a graphical user interface and several import/export functionalities. Over the last few decades, much emphasis in phylogenomics has been put on developing tools for inferring species phylogenies. This toolkit complements those efforts by emphasizing the 'local' aspect of phylogenomics. Availability and implementation: ALPHA is freely available for installation and use, including source code, at https://github.com/chilleo/ALPHA.


Subjects
Genomics/methods, Phylogeny, Genealogy and Heraldry, Genome, Software
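
To make the sliding-window idea and the D-statistic named in the abstract above concrete, here is a small sketch. It uses the standard ABBA-BABA definition, D = (nABBA - nBABA) / (nABBA + nBABA), on four aligned sequences ordered P1, P2, P3 and outgroup. It does not reproduce ALPHA's code or interface: the function names, the plain-string input, and the fixed-size windows are all assumptions.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Compute the D-statistic from four aligned, upper-case sequences.

    The outgroup allele is treated as ancestral ("A") and the other allele
    as derived ("B"). Only biallelic A/C/G/T sites are counted. Returns
    None when no ABBA or BABA site is present.
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        bases = {a, b, c, o}
        if len(bases) != 2 or not bases <= set("ACGT"):
            continue  # skip invariant, multiallelic, gapped or ambiguous sites
        if c == o:
            continue  # P3 must carry the derived allele
        if a == o and b == c:
            abba += 1  # pattern A B B A
        elif b == o and a == c:
            baba += 1  # pattern B A B A
    total = abba + baba
    return None if total == 0 else (abba - baba) / total


def windowed_d(p1, p2, p3, outgroup, window, step):
    """Return (window_start, D) for each full window along the alignment."""
    results = []
    for start in range(0, len(outgroup) - window + 1, step):
        end = start + window
        d = d_statistic(p1[start:end], p2[start:end],
                        p3[start:end], outgroup[start:end])
        results.append((start, d))
    return results


# Hypothetical 8-column alignment, scanned in two non-overlapping windows.
p1 = "AAACACAA"
p2 = "CCCACAAA"
p3 = "CCCCCCAA"
out = "AAAAAAAA"
windows = windowed_d(p1, p2, p3, out, window=4, step=4)
```

In this toy alignment the first window holds three ABBA sites and one BABA site, giving D = 0.5, while the second holds one of each, giving D = 0. Real analyses use far longer windows and assess significance, for example with a block jackknife, which this sketch omits.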