Results 1 - 20 of 39
1.
Bioinformatics ; 39(7), 2023 07 01.
Article in English | MEDLINE | ID: mdl-37449891

ABSTRACT

SUMMARY: CNAsim is a software package for improved simulation of single-cell copy number alteration (CNA) data from tumors. CNAsim can be used to efficiently generate single-cell copy number profiles for thousands of simulated tumor cells under a more realistic error model and a broader range of possible CNA mechanisms compared with existing simulators. The error model implemented in CNAsim accounts for the specific biases of single-cell sequencing that lead to read count fluctuation and poor resolution of CNA detection. For improved realism over existing simulators, CNAsim can (i) generate whole-genome duplications (WGD), whole-chromosomal CNAs, and chromosome-arm CNAs, (ii) simulate subclonal population structure defined by the accumulation of chromosomal CNAs, and (iii) dilute the sampled cell population with both normal diploid cells and pseudo-diploid cells. The software can also generate DNA-seq data for sampled cells. AVAILABILITY AND IMPLEMENTATION: CNAsim is written in Python and is freely available as open source from https://github.com/samsonweiner/CNAsim.


Subject(s)
DNA Copy Number Variations , Neoplasms , Humans , Computer Simulation , Software , Neoplasms/genetics , DNA
2.
Bioinformatics ; 39(2), 2023 02 03.
Article in English | MEDLINE | ID: mdl-36752504

ABSTRACT

MOTIVATION: A chronogram is a dated phylogenetic tree whose branch lengths have been scaled to represent time. Such chronograms are computed based on available date estimates (e.g. from dated fossils), which provide absolute time constraints for one or more nodes of an input undated phylogeny, coupled with an appropriate underlying model for evolutionary rate variation along the branches of the phylogeny. However, traditional methods for phylogenetic dating cannot take into account relative time constraints, such as those provided by inferred horizontal transfer events. In many cases, chronograms computed using only absolute time constraints are inconsistent with known relative time constraints. RESULTS: In this work, we introduce a new approach, Dating Trees using Relative constraints (DaTeR), for phylogenetic dating that can take into account both absolute and relative time constraints. The key idea is to use existing Bayesian approaches for phylogenetic dating to sample posterior chronograms satisfying desired absolute time constraints, minimally adjust or 'error-correct' these sampled chronograms to satisfy all given relative time constraints, and aggregate across all error-corrected chronograms. DaTeR uses a constrained optimization framework for the error-correction step, finding minimal deviations from previously assigned dates or branch lengths. We applied DaTeR to a biological dataset of 170 Cyanobacterial taxa and a reliable set of 24 transfer-based relative constraints, under six different molecular dating models. Our extensive analysis of this dataset demonstrates that DaTeR is both highly effective and scalable and that its application can significantly improve estimated chronograms. AVAILABILITY AND IMPLEMENTATION: Freely available from https://compbio.engr.uconn.edu/software/dater/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Biological Evolution , Fossils , Phylogeny , Bayes Theorem , Time , Molecular Evolution
3.
Mol Biol Evol ; 38(6): 2639-2659, 2021 05 19.
Article in English | MEDLINE | ID: mdl-33565580

ABSTRACT

Horizontal gene transfer (HGT) is central to prokaryotic evolution. However, little is known about the "scale" of individual HGT events. In this work, we introduce the first computational framework to help answer the following fundamental question: How often does more than one gene get horizontally transferred in a single HGT event? Our method, called HoMer, uses phylogenetic reconciliation to infer single-gene HGT events across a given set of species/strains, employs several techniques to account for inference error and uncertainty, combines that information with gene order information from extant genomes, and uses statistical analysis to identify candidate horizontal multigene transfers (HMGTs) in both extant and ancestral species/strains. HoMer is highly scalable and can be easily used to infer HMGTs across hundreds of genomes. We apply HoMer to a genome-scale data set of over 22,000 gene families from 103 Aeromonas genomes and identify a large number of plausible HMGTs of various scales at both small and large phylogenetic distances. Analysis of these HMGTs reveals interesting relationships between gene function, phylogenetic distance, and frequency of multigene transfer. Among other insights, we find that 1) the observed relative frequency of HMGT increases as divergence between genomes increases, 2) HMGTs often have conserved gene functions, and 3) rare genes are frequently acquired through HMGT. We also analyze in detail HMGTs involving the zonula occludens toxin and type III secretion systems. By enabling the systematic inference of HMGTs on a large scale, HoMer will facilitate a more accurate and more complete understanding of HGT and microbial evolution.


Subject(s)
Aeromonas/genetics , Horizontal Gene Transfer , Genomics/methods , Software
4.
Nature ; 518(7539): 317-30, 2015 Feb 19.
Article in English | MEDLINE | ID: mdl-25693563

ABSTRACT

The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.


Subject(s)
Genetic Epigenesis/genetics , Epigenomics , Human Genome/genetics , Base Sequence , Cell Lineage/genetics , Cultured Cells , Chromatin/chemistry , Chromatin/genetics , Chromatin/metabolism , Human Chromosomes/chemistry , Human Chromosomes/genetics , Human Chromosomes/metabolism , DNA/chemistry , DNA/genetics , DNA/metabolism , DNA Methylation , Datasets as Topic , Genetic Enhancer Elements/genetics , Genetic Variation/genetics , Genome-Wide Association Study , Histones/metabolism , Humans , Organ Specificity/genetics , RNA/genetics , Reference Values
5.
Bioinformatics ; 35(18): 3496-3498, 2019 09 15.
Article in English | MEDLINE | ID: mdl-30715213

ABSTRACT

SUMMARY: SaGePhy is a software package for improved phylogenetic simulation of gene and subgene evolution. SaGePhy can be used to generate species trees, gene trees and subgene or (protein) domain trees using a probabilistic birth-death process that allows for gene and subgene duplication, horizontal gene and subgene transfer and gene and subgene loss. SaGePhy implements a range of important features not found in other phylogenetic simulation frameworks/software. These include (i) simulation of subgene or domain level evolution inside one or more gene trees, (ii) simultaneous simulation of both additive and replacing horizontal gene/subgene transfers and (iii) probabilistic sampling of species tree and gene tree nodes, respectively, for gene- and domain-family birth. SaGePhy is open-source, platform independent and written in Java and Python. AVAILABILITY AND IMPLEMENTATION: Executables, source code (open-source under the revised BSD license) and a detailed manual are freely available from http://compbio.engr.uconn.edu/software/sagephy/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
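The probabilistic birth-death process mentioned in this abstract can be illustrated with a generic sketch. The code below is not SaGePhy's implementation; it is a minimal Gillespie-style simulation of a linear birth-death process that tracks only the number of lineages over time. The function name `simulate_lineage_counts` and all of its parameters are hypothetical, chosen for illustration.

```python
import random


def simulate_lineage_counts(birth_rate, death_rate, t_max, n0=1, seed=None):
    """Simulate a linear birth-death process (hypothetical helper).

    Each lineage independently gives birth (e.g. duplicates) at rate
    `birth_rate` and dies (e.g. is lost) at rate `death_rate`.
    Returns a list of (time, lineage_count) pairs starting at (0.0, n0).
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    history = [(t, n)]
    while n > 0:
        total_rate = n * (birth_rate + death_rate)
        if total_rate <= 0:
            break  # no event can occur
        # Waiting time to the next event is exponentially distributed.
        t += rng.expovariate(total_rate)
        if t > t_max:
            break
        # The event is a birth with probability birth / (birth + death).
        if rng.random() < birth_rate / (birth_rate + death_rate):
            n += 1
        else:
            n -= 1
        history.append((t, n))
    return history
```

A call such as `simulate_lineage_counts(1.0, 0.5, t_max=3.0, seed=42)` returns one random trajectory; a full tree simulator would additionally record which lineage each event affects.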


Subject(s)
Algorithms , Phylogeny , Software , Molecular Evolution , Horizontal Gene Transfer
6.
Nature ; 515(7527): 355-64, 2014 Nov 20.
Article in English | MEDLINE | ID: mdl-25409824

ABSTRACT

The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.


Subject(s)
Genome/genetics , Genomics , Mice/genetics , Molecular Sequence Annotation , Animals , Cell Lineage/genetics , Chromatin/genetics , Chromatin/metabolism , Conserved Sequence/genetics , DNA Replication/genetics , Deoxyribonuclease I/metabolism , Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Genome-Wide Association Study , Humans , RNA/genetics , Nucleic Acid Regulatory Sequences/genetics , Species Specificity , Transcription Factors/metabolism , Transcriptome/genetics
7.
Bioinformatics ; 34(18): 3214-3216, 2018 09 15.
Article in English | MEDLINE | ID: mdl-29688310

ABSTRACT

Summary: RANGER-DTL 2.0 is a software program for inferring gene family evolution using Duplication-Transfer-Loss reconciliation. This new software is highly scalable and easy to use, and offers many new features not currently available in any other reconciliation program. RANGER-DTL 2.0 has a particular focus on reconciliation accuracy and can account for many sources of reconciliation uncertainty including uncertain gene tree rooting, gene tree topological uncertainty, multiple optimal reconciliations and alternative event cost assignments. RANGER-DTL 2.0 is open-source and written in C++ and Python. Availability and implementation: Pre-compiled executables, source code (open-source under GNU GPL) and a detailed manual are freely available from http://compbio.engr.uconn.edu/software/RANGER-DTL/. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , Molecular Evolution , Gene Duplication , Software , Algorithms
8.
Bioinformatics ; 34(21): 3646-3652, 2018 11 01.
Article in English | MEDLINE | ID: mdl-29762653

ABSTRACT

Motivation: A reconciliation is an annotation of the nodes of a gene tree with evolutionary events (for example, speciation, gene duplication, transfer, loss, etc.), along with a mapping onto a species tree. Many algorithms and software packages produce or use reconciliations, but they often use different reconciliation formats that differ in the type of events considered or in whether the species tree is dated. This complicates the comparison and communication between different programs. Results: Here, we gather a consortium of software developers in gene tree-species tree reconciliation to propose and endorse a format that aims to promote an integrative, albeit flexible, specification of phylogenetic reconciliations. This format, named recPhyloXML, is accompanied by several tools such as a reconciled tree visualizer and conversion utilities. Availability and implementation: http://phylariane.univ-lyon1.fr/recphyloxml/.


Subject(s)
Molecular Evolution , Gene Duplication , Algorithms , Phylogeny , Software
9.
BMC Bioinformatics ; 19(Suppl 9): 290, 2018 Aug 13.
Article in English | MEDLINE | ID: mdl-30367593

ABSTRACT

BACKGROUND: Duplication-Transfer-Loss (DTL) reconciliation is a powerful and increasingly popular technique for studying the evolution of microbial gene families. DTL reconciliation requires the use of rooted gene trees to perform the reconciliation with the species tree, and the standard technique for rooting gene trees is to assign a root that results in the minimum reconciliation cost across all rootings of that gene tree. However, even though it is well understood that many gene trees have multiple optimal roots, only a single optimal root is randomly chosen to create the rooted gene tree and perform the reconciliation. This remains an important overlooked and unaddressed problem in DTL reconciliation, leading to incorrect evolutionary inferences. In this work, we perform an in-depth analysis of the impact of uncertain gene tree rooting on the computed DTL reconciliation and provide the first computational tools to quantify and negate the impact of gene tree rooting uncertainty on DTL reconciliation. RESULTS: Our analysis of a large data set of over 4500 gene families from 100 species shows that a large fraction of gene trees have multiple optimal rootings, that these multiple roots often, but not always, appear closely clustered together in the same region of the gene tree, that many aspects of the reconciliation remain conserved across the multiple rootings, that gene tree error has a profound impact on the prevalence and structure of multiple optimal rootings, and that there are specific interesting patterns in the reconciliation of those gene trees that have multiple optimal roots. CONCLUSIONS: Our results show that unrooted gene trees can be meaningfully reconciled and high-quality evolutionary information can be obtained from them even after accounting for multiple optimal rootings. In addition, the techniques and tools introduced in this paper make it possible to systematically avoid incorrect evolutionary inferences caused by incorrect or uncertain gene tree rooting. These tools have been implemented in the phylogenetic reconciliation software package RANGER-DTL 2.0, freely available from http://compbio.engr.uconn.edu/software/RANGER-DTL/.


Subject(s)
Algorithms , Molecular Evolution , Gene Duplication , Horizontal Gene Transfer , Genomics/methods , Multigene Family , Phylogeny , Software , Uncertainty
10.
Genome Res ; 24(3): 475-86, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24310000

ABSTRACT

Accurate gene tree-species tree reconciliation is fundamental to inferring the evolutionary history of a gene family. However, although it has long been appreciated that population-related effects such as incomplete lineage sorting (ILS) can dramatically affect the gene tree, many of the most popular reconciliation methods consider discordance only due to gene duplication and loss (and sometimes horizontal gene transfer). Methods that do model ILS are either highly parameterized or consider a restricted set of histories, thus limiting their applicability and accuracy. To address these challenges, we present a novel algorithm DLCpar for inferring a most parsimonious (MP) history of a gene family in the presence of duplications, losses, and ILS. Our algorithm relies on a new reconciliation structure, the labeled coalescent tree (LCT), that simultaneously describes coalescent and duplication-loss history. We show that the LCT representation enables an exhaustive and efficient search over the space of reconciliations, and, for most gene families, the least common ancestor (LCA) mapping is an optimal solution for the species mapping between the gene tree and species tree in an MP LCT. Applying our algorithm to a variety of clades, including flies, fungi, and primates, as well as to simulated phylogenies, we achieve high accuracy, comparable to sophisticated probabilistic reconciliation methods, at reduced run time and with far fewer parameters. These properties enable inferences of the complex evolution of gene families across a broad range of species and large data sets.


Subject(s)
Diptera/genetics , Molecular Evolution , Fungi/genetics , Gene Deletion , Gene Duplication , Primates/genetics , Algorithms , Animals , Horizontal Gene Transfer , Genes , Genome , Genetic Models , Multigene Family , Phylogeny , Species Specificity
11.
Bioinformatics ; 31(8): 1211-8, 2015 Apr 15.
Article in English | MEDLINE | ID: mdl-25481006

ABSTRACT

MOTIVATION: The accurate inference of gene trees is a necessary step in many evolutionary studies. Although the problem of accurate gene tree inference has received considerable attention, most existing methods are only applicable to gene families unaffected by horizontal gene transfer. As a result, the accurate inference of gene trees affected by horizontal gene transfer remains a largely unaddressed problem. RESULTS: In this study, we introduce a new and highly effective method for gene tree error correction in the presence of horizontal gene transfer. Our method efficiently models horizontal gene transfers, gene duplications and losses, and uses a statistical hypothesis testing framework [Shimodaira-Hasegawa (SH) test] to balance sequence likelihood with topological information from a known species tree. Using a thorough simulation study, we show that existing phylogenetic methods yield inaccurate gene trees when applied to horizontally transferred gene families and that our method dramatically improves gene tree accuracy. We apply our method to a dataset of 11 cyanobacterial species and demonstrate the large impact of gene tree accuracy on downstream evolutionary analyses. AVAILABILITY AND IMPLEMENTATION: An implementation of our method is available at http://compbio.mit.edu/treefix-dtl/. CONTACT: mukul@engr.uconn.edu or manoli@mit.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Cyanobacteria/genetics , Molecular Evolution , Horizontal Gene Transfer , Computational Biology/methods , Gene Duplication , Multigene Family , Phylogeny
12.
Bioinformatics ; 30(12): i87-95, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-24932009

ABSTRACT

MOTIVATION: Phylogenetic tree reconciliation is a widely used method for reconstructing the evolutionary histories of gene families and species, hosts and parasites and other dependent pairs of entities. Reconciliation is typically performed using maximum parsimony, in which each evolutionary event type is assigned a cost and the objective is to find a reconciliation of minimum total cost. It is generally understood that reconciliations are sensitive to event costs, but little is understood about the relationship between event costs and solutions. Moreover, choosing appropriate event costs is a notoriously difficult problem. RESULTS: We address this problem by giving an efficient algorithm for computing Pareto-optimal sets of reconciliations, thus providing the first systematic method for understanding the relationship between event costs and reconciliations. This, in turn, results in new techniques for computing event support values and, for cophylogenetic analyses, performing robust statistical tests. We provide new software tools and demonstrate their use on a number of datasets from evolutionary genomic and cophylogenetic studies. AVAILABILITY AND IMPLEMENTATION: Our Python tools are freely available at www.cs.hmc.edu/~hadas/xscape.
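The relationship between event costs and Pareto optimality described in this abstract can be shown with a toy sketch. The code below is not the algorithm from the paper or the xscape tools; it assumes that each candidate reconciliation has already been summarized as a hypothetical tuple of event counts (duplications, transfers, losses) and simply filters out the dominated tuples by brute force. The names `pareto_front` and `total_cost` are illustrative only.

```python
def pareto_front(vectors):
    """Return the event-count vectors that no other vector dominates.

    A vector u dominates v if u is no larger than v in every event
    type and differs from v (so it is strictly smaller somewhere).
    """
    def dominates(u, v):
        return u != v and all(a <= b for a, b in zip(u, v))

    unique = sorted(set(vectors))
    return [v for v in unique if not any(dominates(u, v) for u in unique)]


def total_cost(counts, costs):
    """Parsimony cost: each event count weighted by its event cost."""
    return sum(n * c for n, c in zip(counts, costs))


# Hypothetical (duplications, transfers, losses) counts for five candidates.
candidates = [(2, 1, 3), (1, 2, 3), (2, 2, 4), (0, 4, 1), (3, 1, 3)]
front = pareto_front(candidates)
best = min(candidates, key=lambda v: total_cost(v, (2, 3, 1)))
```

Because a dominated vector always costs more than the vector dominating it whenever all event costs are positive, the minimum-cost candidate for any such cost assignment is found on the Pareto front, which is why the front captures how solutions change as costs vary.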


Subject(s)
Algorithms , Phylogeny , Genomics , Multigene Family , Software
13.
Bioinformatics ; 29(5): 571-9, 2013 Mar 01.
Article in English | MEDLINE | ID: mdl-23335015

ABSTRACT

MOTIVATION: Horizontal gene transfer (HGT) plays a crucial role in the evolution of prokaryotic species. Typically, no more than a few genes are horizontally transferred between any two species. However, several studies identified pairs of species (or lineages) between which many different genes were horizontally transferred. Such a pair is said to be linked by a highway of gene sharing. Inferring such highways is crucial to understanding the evolution of prokaryotes and for inferring past symbiotic and ecological associations among different species. RESULTS: We present a new improved method for systematically detecting highways of gene sharing. As we demonstrate using a variety of simulated datasets, our method is highly accurate and efficient, and robust to noise and high rates of HGT. We further validate our method by applying it to a published dataset of >22 000 gene trees from 144 prokaryotic species. Our method makes it practical, for the first time, to perform accurate highway analysis quickly and easily even on large datasets with high rates of HGT. AVAILABILITY AND IMPLEMENTATION: An implementation of the method can be freely downloaded from: http://acgt.cs.tau.ac.il/hide.


Subject(s)
Algorithms , Horizontal Gene Transfer , Bacterial Genes , Phylogeny , Bacteria/classification , Bacteria/genetics , Molecular Evolution
14.
Syst Biol ; 62(1): 110-20, 2013 Jan 01.
Article in English | MEDLINE | ID: mdl-22949484

ABSTRACT

Accurate gene tree reconstruction is a fundamental problem in phylogenetics, with many important applications. However, sequence data alone often lack enough information to confidently support one gene tree topology over many competing alternatives. Here, we present a novel framework for combining sequence data and species tree information, and we describe an implementation of this framework in TreeFix, a new phylogenetic program for improving gene tree reconstructions. Given a gene tree (preferably computed using a maximum-likelihood phylogenetic program), TreeFix finds a "statistically equivalent" gene tree that minimizes a species tree-based cost function. We have applied TreeFix to 2 clades of 12 Drosophila and 16 fungal genomes, as well as to simulated phylogenies and show that it dramatically improves reconstructions compared with current state-of-the-art programs. Given its accuracy, speed, and simplicity, TreeFix should be applicable to a wide range of analyses and have many important implications for future investigations of gene evolution. The source code and a sample data set are available at http://compbio.mit.edu/treefix.


Subject(s)
Classification/methods , Phylogeny , Software , Animals , Drosophila/classification , Drosophila/genetics , Fungi/classification , Fungi/genetics , Reproducibility of Results
15.
Bioinformatics ; 28(12): i283-91, 2012 Jun 15.
Article in English | MEDLINE | ID: mdl-22689773

ABSTRACT

MOTIVATION: Gene family evolution is driven by evolutionary events such as speciation, gene duplication, horizontal gene transfer and gene loss, and inferring these events in the evolutionary history of a given gene family is a fundamental problem in comparative and evolutionary genomics with numerous important applications. Solving this problem requires the use of a reconciliation framework, where the input consists of a gene family phylogeny and the corresponding species phylogeny, and the goal is to reconcile the two by postulating speciation, gene duplication, horizontal gene transfer and gene loss events. This reconciliation problem is referred to as duplication-transfer-loss (DTL) reconciliation and has been extensively studied in the literature. Yet, even the fastest existing algorithms for DTL reconciliation are too slow for reconciling large gene families and for use in more sophisticated applications such as gene tree or species tree reconstruction. RESULTS: We present two new algorithms for the DTL reconciliation problem that are dramatically faster than existing algorithms, both asymptotically and in practice. We also extend the standard DTL reconciliation model by considering distance-dependent transfer costs, which allow for more accurate reconciliation and give an efficient algorithm for DTL reconciliation under this extended model. We implemented our new algorithms and demonstrated up to 100 000-fold speed-up over existing methods, using both simulated and biological datasets. This dramatic improvement makes it possible to use DTL reconciliation for performing rigorous evolutionary analyses of large gene families and enables its use in advanced reconciliation-based gene and species tree reconstruction methods. AVAILABILITY: Our programs can be freely downloaded from http://compbio.mit.edu/ranger-dtl/.


Subject(s)
Algorithms , Molecular Evolution , Gene Duplication , Horizontal Gene Transfer , Gene Deletion , Genomics , Multigene Family , Phylogeny , Software
16.
IEEE/ACM Trans Comput Biol Bioinform ; 20(6): 3511-3522, 2023.
Article in English | MEDLINE | ID: mdl-37436868

ABSTRACT

Protein domains play an important role in the function and evolution of many gene families. Previous studies have shown that domains are frequently lost or gained during gene family evolution. Yet, most computational approaches for studying gene family evolution do not account for domain-level evolution within genes. To address this limitation, a new three-level reconciliation framework, called the Domain-Gene-Species (DGS) reconciliation model, has been recently developed to simultaneously model the evolution of a domain family inside one or more gene families and the evolution of those gene families inside a species tree. However, the existing model applies only to multi-cellular eukaryotes where horizontal gene transfer is negligible. In this work, we generalize the existing DGS reconciliation model by allowing for the spread of genes and domains across species boundaries through horizontal transfer. We show that the problem of computing optimal generalized DGS reconciliations, though NP-hard, is approximable to within a constant factor, where the specific approximation ratio depends on the "event costs" used. We provide two different approximation algorithms for the problem and demonstrate the impact of the generalized framework using both simulated and real biological data. Our results show that our new algorithms result in highly accurate reconstructions of domain family evolution for microbes.


Subject(s)
Molecular Evolution , Gene Duplication , Phylogeny , Algorithms , Microbial Genes , Horizontal Gene Transfer/genetics , Genetic Models
17.
J Comput Biol ; 30(1): 3-20, 2023 01.
Article in English | MEDLINE | ID: mdl-36125448

ABSTRACT

An accurate understanding of the evolutionary history of rapidly-evolving viruses like SARS-CoV-2, responsible for the COVID-19 pandemic, is crucial to tracking and preventing the spread of emerging pathogens. However, viruses undergo frequent recombination, which makes it difficult to trace their evolutionary history using traditional phylogenetic methods. In this study, we present a phylogenetic workflow, virDTL, for analyzing viral evolution in the presence of recombination. Our approach leverages reconciliation methods developed for inferring horizontal gene transfer in prokaryotes and, compared to existing tools, is uniquely able to identify ancestral recombinations while accounting for several sources of inference uncertainty, including in the construction of a strain tree, estimation and rooting of gene family trees, and reconciliation itself. We apply this workflow to the Sarbecovirus subgenus and demonstrate how a principled analysis of predicted recombination gives insight into the evolution of SARS-CoV-2. In addition to providing confirming evidence for the horseshoe bat as its zoonotic origin, we identify several ancestral recombination events that merit further study.


Subject(s)
COVID-19 , Chiroptera , Severe acute respiratory syndrome-related coronavirus , Animals , Humans , SARS-CoV-2/genetics , COVID-19/epidemiology , COVID-19/genetics , Phylogeny , Pandemics , Horizontal Gene Transfer/genetics , Chiroptera/genetics , Viral Genome/genetics , Molecular Evolution
18.
Syst Biol ; 60(2): 117-25, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21186249

ABSTRACT

Phylogenetic analyses using genome-scale data sets must confront incongruence among gene trees, which in plants is exacerbated by frequent gene duplications and losses. Gene tree parsimony (GTP) is a phylogenetic optimization criterion in which a species tree that minimizes the number of gene duplications induced among a set of gene trees is selected. The run time performance of previous implementations has limited its use on large-scale data sets. We used new software that incorporates recent algorithmic advances to examine the performance of GTP on a plant data set consisting of 18,896 gene trees containing 510,922 protein sequences from 136 plant taxa (giving a combined alignment length of >2.9 million characters). The relationships inferred from the GTP analysis were largely consistent with previous large-scale studies of backbone plant phylogeny and resolved some controversial nodes. The placement of taxa that were present in few gene trees generally varied the most among GTP bootstrap replicates. Excluding these taxa either before or after the GTP analysis revealed high levels of phylogenetic support across plants. The analyses supported magnoliids sister to a eudicot + monocot clade and did not support the eurosid I and II clades. This study presents a nuclear genomic perspective on the broad-scale phylogenetic relationships among plants, and it demonstrates that nuclear genes with a history of duplication and loss can be phylogenetically informative for resolving the plant tree of life.
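The GTP criterion defined in this abstract can be made concrete with a small sketch. The code below is not the software used in the study; it assumes rooted binary trees written as nested Python tuples, with gene tree leaves labeled directly by species name, and counts the duplications a gene tree induces on a species tree using the standard least common ancestor (LCA) mapping. All function names are hypothetical.

```python
def index_species_tree(tree):
    """Record the parent and depth of every node of a nested-tuple tree."""
    parent, depth = {}, {}

    def visit(node, par, d):
        parent[node], depth[node] = par, d
        if isinstance(node, tuple):
            for child in node:
                visit(child, node, d + 1)

    visit(tree, None, 0)
    return parent, depth


def lca(a, b, parent, depth):
    """Least common ancestor of two species tree nodes."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def count_duplications(gene_tree, species_tree):
    """Count gene tree nodes that map to the same species node as a child."""
    parent, depth = index_species_tree(species_tree)
    dups = 0

    def mapping(g):
        nonlocal dups
        if not isinstance(g, tuple):
            return g  # leaf: assumed to be labeled with its species
        left, right = (mapping(child) for child in g)
        m = lca(left, right, parent, depth)
        if m == left or m == right:
            dups += 1  # duplication under the LCA mapping
        return m

    mapping(gene_tree)
    return dups


def gtp_score(gene_trees, species_tree):
    """Total duplications induced across a set of gene trees."""
    return sum(count_duplications(g, species_tree) for g in gene_trees)
```

Given candidate species trees, the GTP criterion prefers the one with the lowest `gtp_score`; practical implementations search the space of species trees rather than scoring a fixed list, which this sketch does not attempt.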


Subject(s)
Classification/methods , Phylogeny , Plants/classification , Plants/genetics , Algorithms , Expressed Sequence Tags , Genomics
19.
Methods Mol Biol ; 2569: 233-252, 2022.
Article in English | MEDLINE | ID: mdl-36083451

ABSTRACT

Phylogenetic reconciliation has emerged as a principled, highly effective technique for investigating the origin, spread, and evolutionary history of microbial gene families. Proper application of phylogenetic reconciliation requires a clear understanding of potential pitfalls and sources of error, and knowledge of the most effective reconciliation-based tools and protocols to use to maximize accuracy. In this book chapter, we provide a brief overview of Duplication-Transfer-Loss (DTL) reconciliation, the standard reconciliation model used to study microbial gene families and provide a step-by-step computational protocol to maximize the accuracy of DTL reconciliation and minimize false-positive evolutionary inferences.


Subject(s)
Molecular Evolution , Gene Duplication , Algorithms , Horizontal Gene Transfer , Microbial Genes , Genetic Models , Phylogeny
20.
Article in English | MEDLINE | ID: mdl-34255632

ABSTRACT

The inference of disease transmission networks is an important problem in epidemiology. One popular approach for building transmission networks is to reconstruct a phylogenetic tree using sequences from disease strains sampled from infected hosts and infer transmissions based on this tree. However, most existing phylogenetic approaches for transmission network inference are highly computationally intensive and cannot take within-host strain diversity into account. Here, we introduce a new phylogenetic approach for inferring transmission networks, TNet, that addresses these limitations. TNet uses multiple strain sequences from each sampled host to infer transmissions and is simpler and more accurate than existing approaches. Furthermore, TNet is highly scalable and able to distinguish between ambiguous and unambiguous transmission inferences. We evaluated TNet on a large collection of 560 simulated transmission networks of various sizes and diverse host, sequence, and transmission characteristics, as well as on 10 real transmission datasets with known transmission histories. Our results show that TNet outperforms two other recently developed methods, phyloscanner and SharpTNI, that also consider within-host strain diversity. We also applied TNet to a large collection of SARS-CoV-2 genomes sampled from infected individuals in many countries around the world, demonstrating how our inference framework can be adapted to accurately infer geographical transmission networks. TNet is freely available from https://compbio.engr.uconn.edu/software/TNet/.


Subject(s)
COVID-19 , Genome , Humans , Phylogeny , SARS-CoV-2