Results 1 - 15 of 15
1.
Bull Math Biol ; 86(3): 24, 2024 01 31.
Article in English | MEDLINE | ID: mdl-38294587

ABSTRACT

Phylogenetic trees are a mathematical formalisation of evolutionary histories between organisms, species, genes, cancer cells, etc. For many applications, e.g. when analysing virus transmission trees or cancer evolution, (phylogenetic) time trees are of interest, where branch lengths represent times. Computational methods for reconstructing time trees from (typically molecular) sequence data, for example Bayesian phylogenetic inference using Markov Chain Monte Carlo (MCMC) methods, rely on algorithms that sample the treespace. They employ tree rearrangement operations such as SPR (Subtree Prune and Regraft) and NNI (Nearest Neighbour Interchange) or, in the case of time tree inference, versions of these that take times of internal nodes into account. While the classic SPR tree rearrangement is well-studied, its variants for time trees are less understood, limiting comparative analysis for time tree methods. In this paper we consider a modification of the classical SPR rearrangement on the space of ranked phylogenetic trees, which are trees equipped with a ranking of all internal nodes. This modification results in two novel treespaces, which we propose to study. We begin this study by discussing algorithmic properties of these treespaces, focusing on those relating to the complexity of computing distances under the ranked SPR operations as well as similarities and differences to known tree rearrangement based treespaces. Surprisingly, we show the counterintuitive result that adding leaves to trees can actually decrease their ranked SPR distance, which may have an impact on the results of time tree sampling algorithms given uncertain "rogue taxa".


Subjects
Mathematical Concepts , Biological Models , Bayes Theorem , Phylogeny , Algorithms
2.
J Comput Biol ; 30(4): 518-537, 2023 04.
Article in English | MEDLINE | ID: mdl-36475926

ABSTRACT

Phylogenetic methods are emerging as a useful tool to understand cancer evolutionary dynamics, including tumor structure, heterogeneity, and progression. Most currently used approaches utilize either bulk whole genome sequencing or single-cell DNA sequencing and are based on calling copy number alterations and single nucleotide variants (SNVs). Single-cell RNA sequencing (scRNA-seq) is commonly applied to explore differential gene expression of cancer cells throughout tumor progression. The method exacerbates the single-cell sequencing problem of low yield per cell with uneven expression levels. This accounts for low and uneven sequencing coverage and makes SNV detection and phylogenetic analysis challenging. In this article, we demonstrate for the first time that scRNA-seq data contain sufficient evolutionary signal and can also be utilized in phylogenetic analyses. We explore and compare results of such analyses based on both expression levels and SNVs called from scRNA-seq data. Both techniques are shown to be useful for reconstructing phylogenetic relationships between cells, reflecting the clonal composition of a tumor. Both standardized expression values and SNVs appear to be equally capable of reconstructing a similar pattern of phylogenetic relationship. This pattern is stable even when phylogenetic uncertainty is taken into account. Our results open up a new direction of somatic phylogenetics based on scRNA-seq data. Further research is required to refine and improve these approaches to capture the full picture of somatic evolutionary dynamics in cancer.


Subjects
Neoplasms , Single-Cell Gene Expression Analysis , Humans , Phylogeny , Neoplasms/genetics , Single-Cell Analysis/methods , RNA Sequence Analysis/methods , Gene Expression Profiling/methods
3.
PLoS Comput Biol ; 18(12): e1010730, 2022 12.
Article in English | MEDLINE | ID: mdl-36580499

ABSTRACT

Large-scale genotype-phenotype screens provide a wealth of data for identifying molecular alterations associated with a phenotype. Epistatic effects play an important role in such association studies. For example, siRNA perturbation screens can be used to identify combinatorial gene-silencing effects. In bacteria, epistasis has practical consequences in determining antimicrobial resistance as the genetic background of a strain plays an important role in determining resistance. Recently developed tools scale to human exome-wide screens for pairwise interactions, but none to date have included the possibility of three-way interactions. Expanding upon recent state-of-the-art methods, we make a number of improvements to the performance on large-scale data, making consideration of three-way interactions possible. We demonstrate our proposed method, Pint, on both simulated and real data sets, including antibiotic resistance testing and siRNA perturbation screens. Pint outperforms known methods in simulated data, and identifies a number of biologically plausible gene effects in both the antibiotic and siRNA models. For example, we have identified a combination of known tumour suppressor genes that is predicted (using Pint) to cause a significant increase in cell proliferation.


Subjects
Anti-Bacterial Agents , Genetic Epistasis , Humans , Phenotype , Anti-Bacterial Agents/pharmacology
4.
Mol Biol Evol ; 39(8), 2022 08 03.
Article in English | MEDLINE | ID: mdl-35733333

ABSTRACT

Single-cell sequencing provides a new way to explore the evolutionary history of cells. Compared to traditional bulk sequencing, where a population of heterogeneous cells is pooled to form a single observation, single-cell sequencing isolates and amplifies genetic material from individual cells, thereby preserving the information about the origin of the sequences. However, single-cell data are more error-prone than bulk sequencing data due to the limited genomic material available per cell. Here, we present error and mutation models for evolutionary inference of single-cell data within a mature and extensible Bayesian framework, BEAST2. Our framework enables integration with biologically informative models such as relaxed molecular clocks and population dynamic models. Our simulations show that modeling errors increase the accuracy of relative divergence times and substitution parameters. We reconstruct the phylogenetic history of a colorectal cancer patient and a healthy patient from single-cell DNA sequencing data. We find that the estimated times of terminal splitting events are shifted forward in time compared to models which ignore errors. We observed that not accounting for errors can overestimate the phylogenetic diversity in single-cell DNA sequencing data. We estimate that 30-50% of the apparent diversity can be attributed to error. Our work enables a full Bayesian approach capable of accounting for errors in the data within the integrative Bayesian software framework BEAST2.


Subjects
Neoplasms , Software , Bayes Theorem , Molecular Evolution , Genomics , Humans , Genetic Models , Phylogeny
5.
Article in English | MEDLINE | ID: mdl-35483879

ABSTRACT

Tuberous sclerosis complex (TSC) is an inheritable disorder characterized by the formation of benign yet disorganized tumors in multiple organ systems. Germline mutations in the TSC1 (hamartin) or more frequently TSC2 (tuberin) genes are causative for TSC. The malignant manifestations of TSC, pulmonary lymphangioleiomyomatosis (LAM) and renal angiomyolipoma (AML), may also occur as independent sporadic perivascular epithelial cell tumor (PEComa) characterized by somatic TSC2 mutations. Thus, discerning TSC from the copresentation of sporadic LAM and sporadic AML may be obscured in TSC patients lacking additional features. In this report, we present a case study on a single patient initially reported to have sporadic LAM and a mucinous duodenal adenocarcinoma deficient in DNA mismatch repair proteins. Moreover, the patient had a history of Wilms' tumor, which was reclassified as AML following the LAM diagnosis. Therefore, we investigated the origins and relatedness of these tumors. Using germline whole-genome sequencing, we identified a premature truncation in one of the patient's TSC2 alleles. Using immunohistochemistry, loss of tuberin expression was observed in AML and LAM tissue. However, no evidence of a somatic loss of heterozygosity or DNA methylation epimutations was observed at the TSC2 locus, suggesting alternate mechanisms may contribute to loss of the tumor suppressor protein. In the mucinous duodenal adenocarcinoma, no causative mutations were found in the DNA mismatch repair genes MLH1, MSH2, MSH6, or PMS2. Rather, clonal deconvolution analyses were used to identify mutations contributing to pathogenesis. This report highlights both the utility of using multiple sequencing techniques and the complexity of interpreting the data in a clinical context.


Subjects
Adenocarcinoma , Angiomyolipoma , Kidney Neoplasms , Acute Myeloid Leukemia , Tuberous Sclerosis , Angiomyolipoma/genetics , Angiomyolipoma/pathology , Female , Humans , Male , Tuberous Sclerosis/diagnosis , Tuberous Sclerosis/genetics , Tuberous Sclerosis/metabolism , Tuberous Sclerosis Complex 2 Protein/genetics , Tumor Suppressor Proteins/genetics
6.
Genome Biol ; 23(1): 56, 2022 02 16.
Article in English | MEDLINE | ID: mdl-35172880

ABSTRACT

BACKGROUND: Computational biology provides software tools for testing and making inferences about biological data. In the face of increasing volumes of data, heuristic methods that trade software speed for accuracy may be employed. We have studied these trade-offs using the results of a large number of independent software benchmarks, and evaluated whether external factors, including speed, author reputation, journal impact, recency and developer efforts, are indicative of accurate software. RESULTS: We find that software speed, author reputation, journal impact, number of citations and age are unreliable predictors of software accuracy. This is unfortunate because these are frequently cited reasons for selecting software tools. However, GitHub-derived statistics and high version numbers show that accurate bioinformatic software tools are generally the product of many improvements over time. We also find an excess of slow and inaccurate bioinformatic software tools, and this is consistent across many sub-disciplines. There are few tools that are middle-of-the-road in terms of accuracy and speed trade-offs. CONCLUSIONS: Our findings indicate that accurate bioinformatic software is primarily the product of long-term commitments to software development. In addition, we hypothesise that bioinformatics software suffers from publication bias. Software that is intermediate in terms of both speed and accuracy may be difficult to publish, possibly due to author, editor and reviewer practices. This leaves an unfortunate hole in the literature, as ideal tools may fall into this gap. High accuracy tools are not always useful if they are slow, while high speed is not useful if the results are also inaccurate.


Subjects
Computational Biology , Software , Publishing
7.
J Math Biol ; 83(5): 60, 2021 11 05.
Article in English | MEDLINE | ID: mdl-34739608

ABSTRACT

In many phylogenetic applications, such as cancer and virus evolution, time trees, evolutionary histories where speciation events are timed, are inferred. Of particular interest are clock-like trees, where all leaves are sampled at the same time and have equal distance to the root. One popular approach to model clock-like trees is coalescent theory, which is used in various tree inference software packages. Methodologically, phylogenetic inference methods require a tree space over which the inference is performed, and the geometry of this space plays an important role in statistical and computational aspects of tree inference algorithms. It has recently been shown that coalescent tree spaces possess a unique geometry, different from that of classical phylogenetic tree spaces. Here we introduce and study a space of discrete coalescent trees. They assume that time is discrete, which is natural in many computational applications. This tree space is a generalisation of the previously studied ranked nearest neighbour interchange space, and is built upon tree-rearrangement operations. We generalise existing results about ranked trees, including an algorithm for computing distances in polynomial time, and in particular provide new results for both the space of discrete coalescent trees and the space of ranked trees. We establish several geometrical properties of these spaces and show how these properties impact various algorithms used in phylogenetic analyses. Our tree space is a discretisation of a previously introduced time tree space, called t-space, and hence our results can be used to approximate solutions to various open problems in t-space.


Subjects
Algorithms , Cluster Analysis , Phylogeny
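To give a feel for the size of the ranked tree space mentioned in the abstract above, the sketch below counts ranked binary trees on n labelled leaves with a standard coalescent argument: going back in time, each coalescence event merges one of the C(k, 2) pairs among the k remaining lineages. The function name `count_ranked_trees` is illustrative and not taken from the paper.

```python
from math import comb


def count_ranked_trees(n_leaves: int) -> int:
    """Count ranked binary tree topologies on n labelled leaves.

    Each of the n - 1 coalescence events picks one of C(k, 2) pairs
    among the k lineages still present, for k = n, n - 1, ..., 2.
    """
    if n_leaves < 1:
        raise ValueError("n_leaves must be at least 1")
    count = 1
    for k in range(2, n_leaves + 1):
        count *= comb(k, 2)
    return count
```

The product equals n!(n-1)!/2^(n-1), so the space grows faster than exponentially in the number of leaves.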
8.
PLoS One ; 16(7): e0254491, 2021.
Article in English | MEDLINE | ID: mdl-34255784

ABSTRACT

The treatment of complex diseases often relies on combinatorial therapy, a strategy where drugs are used to target multiple genes simultaneously. Promising candidate genes for combinatorial perturbation often constitute epistatic genes, i.e., genes which contribute to a phenotype in a non-linear fashion. Experimental identification of the full landscape of genetic interactions by perturbing all gene combinations is prohibitive due to the exponential growth of testable hypotheses. Here we present a model for the inference of pairwise epistatic, including synthetic lethal, gene interactions from siRNA-based perturbation screens. The model exploits the combinatorial nature of siRNA-based screens resulting from the high numbers of sequence-dependent off-target effects, where each siRNA apart from its intended target knocks down hundreds of additional genes. We show that conditional and marginal epistasis can be estimated as interaction coefficients of regression models on perturbation data. We compare two methods, namely glinternet and xyz, for selecting non-zero effects in high dimensions as components of the model, and make recommendations for the appropriate use of each. For data simulated from real RNAi screening libraries, we show that glinternet successfully identifies epistatic gene pairs with high accuracy across a wide range of relevant parameters for the signal-to-noise ratio of observed phenotypes, the effect size of epistasis and the number of observations per double knockdown. xyz is also able to identify interactions from lower dimensional data sets (fewer genes), but is less accurate for many dimensions. Higher accuracy of glinternet, however, comes at the cost of longer running time compared to xyz. The general model is widely applicable and allows mining the wealth of publicly available RNAi screening data for the estimation of epistatic interactions between genes. 
As a proof of concept, we apply the model to search for interactions, and potential targets for treatment, among previously published sets of siRNA perturbation screens on various pathogens. The identified interactions include both known epistatic interactions as well as novel findings.


Subjects
Computational Biology/methods , RNA Interference/physiology , Genetic Epistasis/genetics , Genetic Epistasis/physiology , Humans , Theoretical Models
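The idea of reading epistasis off an interaction coefficient of a regression model can be illustrated for a single gene pair. The sketch below is a minimal example under strong assumptions: knockdowns are coded as binary indicators, all four knockdown combinations are observed, and the model is the saturated one, y = b0 + b1*x1 + b2*x2 + b12*x1*x2, whose least-squares estimate of b12 is a contrast of the four group means. It is not the glinternet or xyz procedure compared in the abstract, and the function name is hypothetical.

```python
from statistics import mean


def interaction_coefficient(observations):
    """Estimate b12 in y = b0 + b1*x1 + b2*x2 + b12*x1*x2.

    observations: iterable of (x1, x2, y) with x1, x2 in {0, 1},
    where 1 means the gene is knocked down. For this saturated model
    the least-squares estimate of b12 is
    mean(y11) - mean(y10) - mean(y01) + mean(y00).
    """
    groups = {(0, 0): [], (0, 1): [], (1, 0): [], (1, 1): []}
    for x1, x2, y in observations:
        groups[(x1, x2)].append(y)
    if any(not values for values in groups.values()):
        raise ValueError("each knockdown combination needs an observation")
    m = {key: mean(values) for key, values in groups.items()}
    return m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)]
```

A value of zero corresponds to purely additive effects of the two knockdowns; in high-dimensional screens with many genes, this per-pair estimate would be replaced by a sparse regression over all candidate pairs.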
9.
J Math Biol ; 82(1-2): 8, 2021 01 25.
Article in English | MEDLINE | ID: mdl-33492606

ABSTRACT

Many popular algorithms for searching the space of leaf-labelled (phylogenetic) trees are based on tree rearrangement operations. Under any such operation, the problem is reduced to searching a graph where vertices are trees and (undirected) edges are given by pairs of trees connected by one rearrangement operation (sometimes called a move). Most popular are the classical nearest neighbour interchange, subtree prune and regraft, and tree bisection and reconnection moves. The problem of computing distances, however, is NP-hard in each of these graphs, making tree inference and comparison algorithms challenging to design in practice. Although ranked phylogenetic trees are one of the central objects of interest in applications such as cancer research, immunology, and epidemiology, the computational complexity of the shortest path problem for these trees remained unsolved for decades. In this paper, we settle this problem for the ranked nearest neighbour interchange operation by establishing that the complexity depends on the weight difference between the two types of tree rearrangements (rank moves and edge moves), and varies from quadratic, which is the lowest possible complexity for this problem, to NP-hard, which is the highest. In particular, our result provides the first example of a phylogenetic tree rearrangement operation for which shortest paths, and hence the distance, can be computed efficiently. Specifically, our algorithm scales to trees with tens of thousands of leaves (and likely hundreds of thousands if implemented efficiently).


Subjects
Algorithms , Genetic Models , Cluster Analysis , Computational Biology , Phylogeny
10.
Proc Natl Acad Sci U S A ; 115(51): E11951-E11960, 2018 12 18.
Article in English | MEDLINE | ID: mdl-30510004

ABSTRACT

Gut bacteria can affect key aspects of host fitness, such as development, fecundity, and lifespan, while the host, in turn, shapes the gut microbiome. However, it is unclear to what extent individual species versus community interactions within the microbiome are linked to host fitness. Here, we combinatorially dissect the natural microbiome of Drosophila melanogaster and reveal that interactions between bacteria shape host fitness through life history tradeoffs. Empirically, we made germ-free flies colonized with each possible combination of the five core species of fly gut bacteria. We measured the resulting bacterial community abundances and fly fitness traits, including development, reproduction, and lifespan. The fly gut promoted bacterial diversity, which, in turn, accelerated development, reproduction, and aging: Flies that reproduced more died sooner. From these measurements, we calculated the impact of bacterial interactions on fly fitness by adapting the mathematics of genetic epistasis to the microbiome. Development and fecundity converged with higher diversity, suggesting minimal dependence on interactions. However, host lifespan and microbiome abundances were highly dependent on interactions between bacterial species. Higher-order interactions (involving three, four, and five species) occurred in 13-44% of possible cases depending on the trait, with the same interactions affecting multiple traits, a reflection of the life history tradeoff. Overall, we found these interactions were frequently context-dependent and often had the same magnitude as individual species themselves, indicating that the interactions can be as important as the individual species in gut microbiomes.


Subjects
Gastrointestinal Microbiome/physiology , Gastrointestinal Tract/microbiology , Host Microbial Interactions/physiology , Microbial Interactions/physiology , Microbiota/physiology , Animals , Bacteria/isolation & purification , Biodiversity , Drosophila melanogaster , Genetic Epistasis , Fertility , Gastrointestinal Microbiome/genetics , Germ-Free Life , Host Microbial Interactions/genetics , Longevity , Microbial Interactions/genetics , Microbiota/genetics , Phenotype , Reproduction
11.
J Math Biol ; 77(4): 951-970, 2018 10.
Article in English | MEDLINE | ID: mdl-29736875

ABSTRACT

We present an efficient computational approach for detecting genetic interactions from fitness comparison data together with a geometric interpretation using polyhedral cones associated to partial orderings. Genetic interactions are defined by linear forms with integer coefficients in the fitness variables assigned to genotypes. These forms generalize several popular approaches to study interactions, including Fourier-Walsh coefficients, interaction coordinates, and circuits. We assume that fitness measurements come with high uncertainty or are even unavailable, as is the case for many empirical studies, and derive interactions only from comparisons of genotypes with respect to their fitness, i.e. from partial fitness orders. We present a characterization of the class of partial fitness orders that imply interactions, using a graph-theoretic approach. Our characterization then yields an efficient algorithm for testing the condition when certain genetic interactions, such as sign epistasis, are implied. This provides an exponential improvement of the best previously known method. We also present a geometric interpretation of our characterization, which provides the basis for statistical analysis of partial fitness orders and genetic interactions.


Subjects
Genetic Fitness , Genetic Models , Algorithms , Animals , Antimalarials/administration & dosage , Biological Evolution , Genetic Epistasis , Genotype , Humans , Linear Models , Mathematical Concepts , Mutation , Plasmodium vivax/drug effects , Plasmodium vivax/genetics , Plasmodium vivax/growth & development , Pyrimethamine/administration & dosage
12.
J Math Biol ; 76(5): 1101-1121, 2018 Apr.
Article in English | MEDLINE | ID: mdl-28756523

ABSTRACT

A time-tree is a rooted phylogenetic tree such that all internal nodes are equipped with absolute divergence dates and all leaf nodes are equipped with sampling dates. Such time-trees have become a central object of study in phylogenetics but little is known about the parameter space of such objects. Here we introduce and study a hierarchy of discrete approximations of the space of time-trees from the graph-theoretic and algorithmic point of view. One of the basic and widely used phylogenetic graphs, the NNI graph, is the roughest approximation and bottom level of our hierarchy. More refined approximations discretize the relative timing of evolutionary divergence and sampling dates. We study basic graph-theoretic questions for these graphs, including the size of neighborhoods, diameter upper and lower bounds, and the problem of finding shortest paths. We settle many of these questions by extending the concept of graph grammars introduced by Sleator, Tarjan, and Thurston to our graphs. Although time values greatly increase the number of possible trees, we show that 1-neighborhood sizes remain linear, allowing for efficient local exploration and construction of these graphs. We also obtain upper bounds on the r-neighborhood sizes of these graphs, including a smaller bound than was previously known for NNI. Our results open up a number of possible directions for theoretical investigation of graph-theoretic and algorithmic properties of the time-tree graphs. We discuss the directions that are most valuable for phylogenetic applications and give a list of prominent open problems for those applications. In particular, we conjecture that the split theorem applies to shortest paths in time-tree graphs, a property not shared in the general NNI case.


Subjects
Algorithms , Phylogeny , Biological Evolution , Markov Chains , Mathematical Concepts , Genetic Models , Monte Carlo Method , Time Factors
13.
Elife ; 6, 2017 12 20.
Article in English | MEDLINE | ID: mdl-29260711

ABSTRACT

Darwinian fitness is a central concept in evolutionary biology. In practice, however, it is hardly possible to measure fitness for all genotypes in a natural population. Here, we present quantitative tools to make inferences about epistatic gene interactions when the fitness landscape is only incompletely determined due to imprecise measurements or missing observations. We demonstrate that genetic interactions can often be inferred from fitness rank orders, where all genotypes are ordered according to fitness, and even from partial fitness orders. We provide a complete characterization of rank orders that imply higher order epistasis. Our theory applies to all common types of gene interactions and facilitates comprehensive investigations of diverse genetic interactions. We analyzed various genetic systems comprising HIV-1, the malaria-causing parasite Plasmodium vivax, the fungus Aspergillus niger, and the TEM-family of β-lactamase associated with antibiotic resistance. For all systems, our approach revealed higher order interactions among mutations.


Subjects
Aspergillus niger/genetics , Genetic Epistasis , Genetic Fitness , Genotype , HIV-1/genetics , Plasmodium vivax/genetics , beta-Lactamases/genetics , Aspergillus niger/physiology , HIV-1/physiology , Plasmodium vivax/physiology
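The claim that a fitness rank order alone can imply an interaction can be checked mechanically for any linear form whose integer coefficients sum to zero. The sketch below is an illustration under stated assumptions, not a quotation of the paper's characterization: it uses a summation-by-parts argument, in which the form is rewritten as prefix sums of the coefficients times the (positive) gaps between consecutive fitness values. The form is then positive for every fitness assignment compatible with a strict rank order exactly when all prefix sums are non-negative and at least one is positive. The function name `implied_sign` and the genotype labels are hypothetical.

```python
def implied_sign(rank_order, coeffs):
    """Return the sign of a linear form implied by a strict rank order.

    rank_order: genotypes listed from fittest to least fit.
    coeffs: dict mapping genotype to an integer coefficient; the
            coefficients must sum to zero (as for epistasis forms).
    Returns 1 or -1 if the order forces that sign, and 0 if the order
    alone does not determine the sign.
    """
    prefix_sums = []
    running = 0
    for genotype in rank_order:
        running += coeffs.get(genotype, 0)
        prefix_sums.append(running)
    if running != 0:
        raise ValueError("coefficients must sum to zero")
    if all(s >= 0 for s in prefix_sums) and any(s > 0 for s in prefix_sums):
        return 1
    if all(s <= 0 for s in prefix_sums) and any(s < 0 for s in prefix_sums):
        return -1
    return 0
```

For the two-locus form u = w00 + w11 - w01 - w10, the order 00 > 01 > 11 > 10 forces u > 0, whereas 00 > 01 > 10 > 11 leaves the sign of u undetermined.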
14.
J Theor Biol ; 403: 197-208, 2016 08 21.
Article in English | MEDLINE | ID: mdl-27188249

ABSTRACT

The reliability of a phylogenetic inference method from genomic sequence data is ensured by its statistical consistency. Bayesian inference methods produce a sample of phylogenetic trees from the posterior distribution given sequence data. Hence the question of statistical consistency of such methods is equivalent to the consistency of the summary of the sample. More generally, statistical consistency is ensured by the tree space used to analyse the sample. In this paper, we consider two standard parameterisations of phylogenetic time-trees used in evolutionary models: inter-coalescent interval lengths and absolute times of divergence events. For each of these parameterisations we introduce a natural metric space on ultrametric phylogenetic trees. We compare the introduced spaces with existing models of tree space and formulate several formal requirements that a metric space on phylogenetic trees must possess in order to be a satisfactory space for statistical analysis, and justify them. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce and that the choice between metric spaces requires additional properties to be considered. In particular, the summary tree minimising the square distance to the trees from the sample might be different for different parameterisations. This suggests that further fundamental insight is needed into the problem of statistical consistency of phylogenetic inference methods.


Subjects
Phylogeny , Theoretical Models
15.
Proc Biol Sci ; 282(1806): 20150420, 2015 May 07.
Article in English | MEDLINE | ID: mdl-25876846

ABSTRACT

One of the central objectives in the field of phylodynamics is the quantification of population dynamic processes using genetic sequence data or in some cases phenotypic data. Phylodynamics has been successfully applied to many different processes, such as the spread of infectious diseases, within-host evolution of a pathogen, macroevolution and even language evolution. Phylodynamic analysis requires a probability distribution on phylogenetic trees spanned by the genetic data. Because such a probability distribution is not available for many common stochastic population dynamic processes, coalescent-based approximations assuming deterministic population size changes are widely employed. Key to many population dynamic models, in particular epidemiological models, is a period of exponential population growth during the initial phase. Here, we show that the coalescent does not well approximate stochastic exponential population growth, which is typically modelled by a birth-death process. We demonstrate that introducing demographic stochasticity into the population size function of the coalescent improves the approximation for values of R0 close to 1, but substantial differences remain for large R0. In addition, the computational advantage of using an approximation over exact models vanishes when introducing such demographic stochasticity. These results highlight that we need to increase efforts to develop phylodynamic tools that correctly account for the stochasticity of population dynamic models for inference.


Subjects
Birth Rate , Biological Models , Mortality , Population Dynamics , Population Growth , Stochastic Processes
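One aspect of the demographic stochasticity discussed in the abstract above can be illustrated with a small simulation. This is not the paper's analysis: it only shows that a linear birth-death process started from one individual with R0 = birth rate / death rate > 1 still dies out with probability 1/R0, an outcome that a deterministic exponential population size function cannot represent. The function names, the population cap used as a stand-in for "established", and the default parameters are all assumptions of this sketch.

```python
import random


def extinction_probability(r0: float) -> float:
    """Extinction probability of a linear birth-death process from one individual."""
    if r0 <= 0:
        raise ValueError("r0 must be positive")
    return 1.0 if r0 <= 1.0 else 1.0 / r0


def simulate_extinction_frequency(r0, runs=4000, cap=100, seed=1):
    """Estimate the extinction frequency from the embedded jump chain.

    Event times do not matter for extinction, so each event is simply a
    birth with probability r0 / (r0 + 1) and a death otherwise. A run
    stops at size 0 (extinct) or at `cap` (treated as established).
    """
    rng = random.Random(seed)
    p_birth = r0 / (r0 + 1.0)
    extinct = 0
    for _ in range(runs):
        size = 1
        while 0 < size < cap:
            size += 1 if rng.random() < p_birth else -1
        if size == 0:
            extinct += 1
    return extinct / runs
```

Comparing `simulate_extinction_frequency(2.0)` with `extinction_probability(2.0)` gives a quick check that the simulated frequency sits near the analytic value of one half.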