Results 1 - 17 of 17
1.
Bioinformatics ; 39(39 Suppl 1): i131-i139, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387130

ABSTRACT

MOTIVATION: Recent advances in spatial proteomics technologies have enabled the profiling of dozens of proteins in thousands of single cells in situ. This has created the opportunity to move beyond quantifying the composition of cell types in tissue, and instead probe the spatial relationships between cells. However, most current methods for clustering data from these assays only consider the expression values of cells and ignore the spatial context. Furthermore, existing approaches do not account for prior information about the expected cell populations in a sample. RESULTS: To address these shortcomings, we developed SpatialSort, a spatially aware Bayesian clustering approach that allows for the incorporation of prior biological knowledge. Our method is able to account for the affinities of cells of different types to neighbour in space, and by incorporating prior information about expected cell populations, it is able to simultaneously improve clustering accuracy and perform automated annotation of clusters. Using synthetic and real data, we show that by using spatial and prior information SpatialSort improves clustering accuracy. We also demonstrate how SpatialSort can perform label transfer between spatial and nonspatial modalities through the analysis of a real-world diffuse large B-cell lymphoma dataset. AVAILABILITY AND IMPLEMENTATION: Source code is available on GitHub at: https://github.com/Roth-Lab/SpatialSort.


Subject(s)
Lymphoma, Large B-Cell, Diffuse , Proteomics , Humans , Bayes Theorem , Biological Assay , Cluster Analysis
2.
Nat Commun ; 13(1): 4534, 2022 08 04.
Article in English | MEDLINE | ID: mdl-35927228

ABSTRACT

Assessing tumour gene fitness in physiologically-relevant model systems is challenging due to biological features of in vivo tumour regeneration, including extreme variations in single cell lineage progeny. Here we develop a reproducible, quantitative approach to pooled genetic perturbation in patient-derived xenografts (PDXs), by encoding single cell output from transplanted CRISPR-transduced cells in combination with a Bayesian hierarchical model. We apply this to 181 PDX transplants from 21 breast cancer patients. We show that uncertainty in fitness estimates depends critically on the number of transplant cell clones and the variability in clone sizes. We use a pathway-directed allelic series to characterize Notch signaling, and quantify TP53 / MDM2 drug-gene conditional fitness in outlier patients. We show that fitness outlier identification can be mirrored by pharmacological perturbation. Overall, we demonstrate that the gene fitness landscape in breast PDXs is dominated by inter-patient differences.


Subject(s)
Breast Neoplasms , Clustered Regularly Interspaced Short Palindromic Repeats , Animals , Bayes Theorem , Breast Neoplasms/genetics , Disease Models, Animal , Female , Heterografts , Humans , Xenograft Model Antitumor Assays
3.
Nature ; 595(7868): 585-590, 2021 07.
Article in English | MEDLINE | ID: mdl-34163070

ABSTRACT

Progress in defining genomic fitness landscapes in cancer, especially those defined by copy number alterations (CNAs), has been impeded by lack of time-series single-cell sampling of polyclonal populations and temporal statistical models (refs. 1-7). Here we generated 42,000 genomes from multi-year time-series single-cell whole-genome sequencing of breast epithelium and primary triple-negative breast cancer (TNBC) patient-derived xenografts (PDXs), revealing the nature of CNA-defined clonal fitness dynamics induced by TP53 mutation and cisplatin chemotherapy. Using a new Wright-Fisher population genetics model (refs. 8, 9) to infer clonal fitness, we found that TP53 mutation alters the fitness landscape, reproducibly distributing fitness over a larger number of clones associated with distinct CNAs. Furthermore, in TNBC PDX models with mutated TP53, inferred fitness coefficients from CNA-based genotypes accurately forecast experimentally enforced clonal competition dynamics. Drug treatment in three long-term serially passaged TNBC PDXs resulted in cisplatin-resistant clones emerging from low-fitness phylogenetic lineages in the untreated setting. Conversely, high-fitness clones from treatment-naive controls were eradicated, signalling an inversion of the fitness landscape. Finally, upon release of drug, selection pressure dynamics were reversed, indicating a fitness cost of treatment resistance. Together, our findings define clonal fitness linked to both CNA and therapeutic resistance in polyclonal tumours.


Subject(s)
DNA Copy Number Variations , Drug Resistance, Neoplasm , Triple Negative Breast Neoplasms/genetics , Animals , Cell Line, Tumor , Cisplatin/pharmacology , Clone Cells/pathology , Female , Genetic Fitness , Humans , Mice , Models, Statistical , Neoplasm Transplantation , Tumor Suppressor Protein p53/genetics , Whole Genome Sequencing
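
Illustrative note on entry 3: the abstract refers to a Wright-Fisher population genetics model for clonal fitness. The sketch below is only a textbook forward simulation of Wright-Fisher dynamics with selection, not the authors' inference model. The function name `wright_fisher`, the fitness coefficients, the population size and the number of generations are all assumptions chosen for illustration.

```python
import random


def wright_fisher(freqs, fitness, pop_size=1000, generations=30, seed=0):
    """Simulate clone frequencies under a basic Wright-Fisher model with selection.

    freqs   -- starting clone frequencies (should sum to 1)
    fitness -- selection coefficient s_k per clone (hypothetical values)
    """
    rng = random.Random(seed)
    clones = list(range(len(freqs)))
    for _ in range(generations):
        # Each clone is sampled in proportion to frequency * (1 + s_k).
        weights = [f * (1.0 + s) for f, s in zip(freqs, fitness)]
        # A fixed-size next generation is drawn multinomially (genetic drift).
        offspring = rng.choices(clones, weights=weights, k=pop_size)
        freqs = [offspring.count(k) / pop_size for k in clones]
    return freqs


if __name__ == "__main__":
    # Two clones starting at equal frequency; clone 1 has a fitness advantage.
    print(wright_fisher([0.5, 0.5], [0.0, 0.5]))
```

In a simulation like this, a clone with a large positive coefficient tends to take over the population, which is the kind of trajectory a fitness-inference method would work backwards from.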
4.
Syst Biol ; 69(1): 155-183, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31173141

ABSTRACT

We describe an "embarrassingly parallel" method for Bayesian phylogenetic inference, annealed Sequential Monte Carlo (SMC), based on recent advances in the SMC literature such as adaptive determination of annealing parameters. The algorithm provides an approximate posterior distribution over trees and evolutionary parameters as well as an unbiased estimator for the marginal likelihood. This unbiasedness property can be used for the purpose of testing the correctness of posterior simulation software. We evaluate the performance of phylogenetic annealed SMC by reviewing and comparing with other computational Bayesian phylogenetic methods, in particular, different marginal likelihood estimation methods. Unlike previous SMC methods in phylogenetics, our annealed method can utilize standard Markov chain Monte Carlo (MCMC) tree moves and hence benefit from the large inventory of such moves available in the literature. Consequently, the annealed SMC method should be relatively easy to incorporate into existing phylogenetic software packages based on MCMC algorithms. We illustrate our method using simulation studies and real data analysis.


Subject(s)
Algorithms , Classification/methods , Phylogeny , Bayes Theorem , Monte Carlo Method , Software
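
Illustrative note on entry 4: the abstract describes annealed SMC as giving an approximate posterior together with an unbiased marginal likelihood estimator, while reusing standard MCMC moves. The following is a minimal sketch of that general idea on a one-dimensional Gaussian toy problem rather than on phylogenetic trees. The fixed linear annealing schedule, the random-walk proposal and every name here are assumptions, and the paper's adaptive schedule is not reproduced.

```python
import math
import random


def log_prior(x):
    # Standard normal prior on x.
    return -0.5 * math.log(2 * math.pi) - 0.5 * x * x


def log_lik(x, y=1.0):
    # One observation y ~ Normal(x, 1).
    return -0.5 * math.log(2 * math.pi) - 0.5 * (y - x) ** 2


def annealed_smc(n_particles=2000, n_steps=20, seed=0):
    """Return an estimate of the log marginal likelihood log p(y)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # draws from the prior
    log_z = 0.0
    for t in range(1, n_steps + 1):
        beta_prev, beta = (t - 1) / n_steps, t / n_steps
        # Incremental weights move the target from lik^beta_prev to lik^beta.
        log_w = [(beta - beta_prev) * log_lik(x) for x in xs]
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        log_z += m + math.log(sum(w) / n_particles)
        # Multinomial resampling at every step.
        xs = rng.choices(xs, weights=w, k=n_particles)
        # One Metropolis-Hastings move per particle, targeting prior * lik^beta.
        moved = []
        for x in xs:
            prop = x + rng.gauss(0.0, 0.5)
            log_ratio = (log_prior(prop) + beta * log_lik(prop)
                         - log_prior(x) - beta * log_lik(x))
            accept = rng.random() < math.exp(min(0.0, log_ratio))
            moved.append(prop if accept else x)
        xs = moved
    return log_z


if __name__ == "__main__":
    exact = -0.5 * math.log(4 * math.pi) - 0.25  # log Normal(1; mean 0, variance 2)
    print(annealed_smc(), exact)
```

For this toy model the marginal likelihood is available in closed form, so the estimate can be checked against the exact value. That kind of check is in the spirit of the correctness testing the abstract mentions.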
5.
Genome Biol ; 20(1): 54, 2019 03 12.
Article in English | MEDLINE | ID: mdl-30866997

ABSTRACT

Measuring gene expression of tumor clones at single-cell resolution links functional consequences to somatic alterations. Without scalable methods to simultaneously assay DNA and RNA from the same single cell, parallel single-cell DNA and RNA measurements from independent cell populations must be mapped for genome-transcriptome association. We present clonealign, which assigns gene expression states to cancer clones using single-cell RNA and DNA sequencing independently sampled from a heterogeneous population. We apply clonealign to triple-negative breast cancer patient-derived xenografts and high-grade serous ovarian cancer cell lines and discover clone-specific dysregulated biological pathways not visible using either sequencing method alone.


Subject(s)
Biomarkers, Tumor/genetics , Cystadenocarcinoma, Serous/genetics , High-Throughput Nucleotide Sequencing/methods , Models, Statistical , Ovarian Neoplasms/genetics , Single-Cell Analysis/methods , Software , Triple Negative Breast Neoplasms/genetics , Animals , Clone Cells , Cystadenocarcinoma, Serous/pathology , Female , Humans , Mice, Inbred NOD , Mice, SCID , Ovarian Neoplasms/pathology , Triple Negative Breast Neoplasms/pathology , Tumor Cells, Cultured , Xenograft Model Antitumor Assays
6.
Commun Biol ; 2: 44, 2019.
Article in English | MEDLINE | ID: mdl-30729182

ABSTRACT

Somatic mutations are a primary contributor to malignancy in human cells. Accurate detection of mutations is needed to define the clonal composition of tumours whereby clones may have distinct phenotypic properties. Although analysis of mutations over multiple tumour samples from the same patient has the potential to enhance identification of clones, few analytic methods exploit the correlation structure across samples. We posited that incorporating clonal information into joint analysis over multiple samples would improve mutation detection, particularly those with low prevalence. In this paper, we develop a new procedure called MuClone, for detection of mutations across multiple tumour samples of a patient from whole genome or exome sequencing data. In addition to mutation detection, MuClone classifies mutations into biologically meaningful groups and allows us to study clonal dynamics. We show that, on lung and ovarian cancer datasets, MuClone improves somatic mutation detection sensitivity over competing approaches without compromising specificity.


Subject(s)
Carcinoma, Non-Small-Cell Lung/genetics , Cystadenocarcinoma, Serous/genetics , Genome, Human , Lung Neoplasms/genetics , Models, Statistical , Neoplasm Proteins/genetics , Ovarian Neoplasms/genetics , Carcinoma, Non-Small-Cell Lung/diagnosis , Carcinoma, Non-Small-Cell Lung/metabolism , Carcinoma, Non-Small-Cell Lung/pathology , Clone Cells , Cystadenocarcinoma, Serous/diagnosis , Cystadenocarcinoma, Serous/metabolism , Cystadenocarcinoma, Serous/pathology , Datasets as Topic , Exome , Female , Gene Expression , Genetic Loci , Humans , Lung Neoplasms/diagnosis , Lung Neoplasms/metabolism , Lung Neoplasms/pathology , Male , Multigene Family , Mutation , Neoplasm Proteins/metabolism , Ovarian Neoplasms/diagnosis , Ovarian Neoplasms/metabolism , Ovarian Neoplasms/pathology , Software , Whole Genome Sequencing
7.
Cell ; 173(7): 1755-1769.e22, 2018 06 14.
Article in English | MEDLINE | ID: mdl-29754820

ABSTRACT

High-grade serous ovarian cancer (HGSC) exhibits extensive malignant clonal diversity with widespread but non-random patterns of disease dissemination. We investigated whether local immune microenvironment factors shape tumor progression properties at the interface of tumor-infiltrating lymphocytes (TILs) and cancer cells. Through multi-region study of 212 samples from 38 patients with whole-genome sequencing, immunohistochemistry, histologic image analysis, gene expression profiling, and T and B cell receptor sequencing, we identified three immunologic subtypes across samples and extensive within-patient diversity. Epithelial CD8+ TILs negatively associated with malignant diversity, reflecting immunological pruning of tumor clones inferred by neoantigen depletion, HLA I loss of heterozygosity, and spatial tracking between T cell and tumor clones. In addition, combinatorial prognostic effects of mutational processes and immune properties were observed, illuminating how specific genomic aberration types associate with immune response and impact survival. We conclude that within-patient spatial immune microenvironment variation shapes intraperitoneal malignant spread, provoking new evolutionary perspectives on HGSC clonal dispersion.


Subject(s)
Lymphocytes, Tumor-Infiltrating/immunology , Ovarian Neoplasms/pathology , Adult , Aged , Aged, 80 and over , Antigens, Neoplasm/genetics , Antigens, Neoplasm/metabolism , BRCA1 Protein/genetics , BRCA1 Protein/metabolism , BRCA2 Protein/genetics , BRCA2 Protein/metabolism , CD8 Antigens/metabolism , Cluster Analysis , Female , HLA Antigens/genetics , HLA Antigens/metabolism , Humans , Loss of Heterozygosity , Lymphocytes, Tumor-Infiltrating/cytology , Lymphocytes, Tumor-Infiltrating/metabolism , Middle Aged , Neoplasm Grading , Ovarian Neoplasms/classification , Ovarian Neoplasms/immunology , Polymorphism, Single Nucleotide , Receptors, Antigen, T-Cell/genetics , Receptors, Antigen, T-Cell/metabolism , Whole Genome Sequencing , Young Adult
9.
Genome Biol ; 18(1): 140, 2017 07 27.
Article in English | MEDLINE | ID: mdl-28750660

ABSTRACT

Somatic evolution of malignant cells produces tumors composed of multiple clonal populations, distinguished in part by rearrangements and copy number changes affecting chromosomal segments. Whole genome sequencing mixes the signals of sampled populations, diluting the signals of clone-specific aberrations, and complicating estimation of clone-specific genotypes. We introduce ReMixT, a method to unmix tumor and contaminating normal signals and jointly predict mixture proportions, clone-specific segment copy number, and clone specificity of breakpoints. ReMixT is free, open-source software and is available at http://bitbucket.org/dranew/remixt.


Subject(s)
Breast Neoplasms/genetics , Cystadenocarcinoma, Serous/genetics , Genome, Human , Models, Statistical , Ovarian Neoplasms/genetics , Software , Algorithms , Animals , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , Cell Count , Clone Cells , Cystadenocarcinoma, Serous/metabolism , Cystadenocarcinoma, Serous/pathology , DNA Copy Number Variations , Female , Genotype , Heterografts/metabolism , Heterografts/pathology , Humans , Internet , Mice , Mice, SCID , Neoplastic Cells, Circulating , Ovarian Neoplasms/metabolism , Ovarian Neoplasms/pathology , Translocation, Genetic , Whole Genome Sequencing
10.
Genome Biol ; 18(1): 44, 2017 03 01.
Article in English | MEDLINE | ID: mdl-28249593

ABSTRACT

Next-generation sequencing (NGS) of bulk tumour tissue can identify constituent cell populations in cancers and measure their abundance. This requires computational deconvolution of allelic counts from somatic mutations, which may be incapable of fully resolving the underlying population structure. Single cell sequencing (SCS) is a more direct method, although its replacement of NGS is impeded by technical noise and sampling limitations. We propose ddClone, which analytically integrates NGS and SCS data, leveraging their complementary attributes through joint statistical inference. We show on real and simulated datasets that ddClone produces more accurate results than can be achieved by either method alone.


Subject(s)
Clone Cells/metabolism , Computational Biology/methods , Models, Statistical , Neoplasms/genetics , Single-Cell Analysis , Alleles , Animals , Cluster Analysis , Computer Simulation , Disease Models, Animal , Female , Genotype , Heterografts , High-Throughput Nucleotide Sequencing , Humans , Mice , Mutation , Neoplasms/pathology , Reproducibility of Results , Sequence Analysis, DNA , Single-Cell Analysis/methods , Triple Negative Breast Neoplasms/genetics , Triple Negative Breast Neoplasms/pathology , Workflow
11.
Nat Genet ; 48(7): 758-67, 2016 07.
Article in English | MEDLINE | ID: mdl-27182968

ABSTRACT

We performed phylogenetic analysis of high-grade serous ovarian cancers (68 samples from seven patients), identifying constituent clones and quantifying their relative abundances at multiple intraperitoneal sites. Through whole-genome and single-nucleus sequencing, we identified evolutionary features including mutation loss, convergence of the structural genome and temporal activation of mutational processes that patterned clonal progression. We then determined the precise clonal mixtures comprising each tumor sample. The majority of sites were clonally pure or composed of clones from a single phylogenetic clade. However, each patient contained at least one site composed of polyphyletic clones. Five patients exhibited monoclonal and unidirectional seeding from the ovary to intraperitoneal sites, and two patients demonstrated polyclonal spread and reseeding. Our findings indicate that at least two distinct modes of intraperitoneal spread operate in clonal dissemination and highlight the distribution of migratory potential over clonal populations comprising high-grade serous ovarian cancers.


Subject(s)
Biomarkers, Tumor/genetics , Clone Cells/pathology , Cystadenocarcinoma, Serous/pathology , Genetic Variation/genetics , Ovarian Neoplasms/pathology , Peritoneal Neoplasms/pathology , Tumor Microenvironment/genetics , Aged , Clone Cells/metabolism , Cystadenocarcinoma, Serous/genetics , Disease Progression , Fallopian Tube Neoplasms/genetics , Fallopian Tube Neoplasms/pathology , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Humans , Middle Aged , Mutation/genetics , Neoplasm Grading , Neoplasm Recurrence, Local/genetics , Neoplasm Recurrence, Local/pathology , Ovarian Neoplasms/genetics , Peritoneal Neoplasms/genetics , Phylogeny , Single-Cell Analysis/methods , Survival Rate
12.
Nat Methods ; 13(7): 573-6, 2016 07.
Article in English | MEDLINE | ID: mdl-27183439

ABSTRACT

Single-cell DNA sequencing has great potential to reveal the clonal genotypes and population structure of human cancers. However, single-cell data suffer from missing values and biased allelic counts as well as false genotype measurements owing to the sequencing of multiple cells. We describe the Single Cell Genotyper (https://bitbucket.org/aroth85/scg), open-source software based on a statistical model coupled with a mean-field variational inference method, which can be used to address these problems and robustly infer clonal genotypes.


Subject(s)
Cystadenocarcinoma, Serous/genetics , Leukemia/genetics , Mammary Glands, Human/metabolism , Ovarian Neoplasms/genetics , Single-Cell Analysis/methods , Software , Clone Cells , Female , Genome, Human , Genotype , High-Throughput Nucleotide Sequencing/methods , Humans , Models, Statistical , Polymorphism, Single Nucleotide/genetics
13.
Nat Methods ; 11(4): 396-8, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24633410

ABSTRACT

We introduce PyClone, a statistical model for inference of clonal population structures in cancers. PyClone is a Bayesian clustering method for grouping sets of deeply sequenced somatic mutations into putative clonal clusters while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy-number changes and normal-cell contamination. Single-cell sequencing validation demonstrates PyClone's accuracy.


Subject(s)
Bayes Theorem , Cluster Analysis , Models, Biological , Models, Statistical , Neoplasms/metabolism , Algorithms , Alleles , Animals , DNA Mutational Analysis/methods , Gene Expression Regulation, Neoplastic , Humans , Mutation , Reproducibility of Results , Software
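
Illustrative note on entry 13: the abstract says PyClone estimates cellular prevalences while accounting for copy-number changes and normal-cell contamination. The sketch below shows one simplified way these quantities can combine into an expected variant allele fraction. It mixes normal cells, tumour cells without the mutation and tumour cells with it, and ignores sequencing error. The function and parameter names are assumptions and this is not PyClone's API or its exact parameterisation.

```python
def expected_vaf(prevalence, tumour_content,
                 cn_normal=2, cn_ref=2, cn_var=2, mut_copies=1):
    """Expected variant allele fraction at one locus under a three-population mix.

    prevalence     -- fraction of tumour cells carrying the mutation
    tumour_content -- fraction of all sampled cells that are tumour cells
    cn_*           -- total copy number in each population (assumed known)
    mut_copies     -- mutated copies per cell in the variant population
    """
    # Each population contributes alleles in proportion to cells * copy number.
    normal = (1.0 - tumour_content) * cn_normal
    ref = tumour_content * (1.0 - prevalence) * cn_ref
    var = tumour_content * prevalence * cn_var
    return tumour_content * prevalence * mut_copies / (normal + ref + var)


if __name__ == "__main__":
    # A clonal heterozygous mutation in a pure diploid tumour.
    print(expected_vaf(1.0, 1.0))
    # The same mutation in half the tumour cells with 20% normal contamination.
    print(expected_vaf(0.5, 0.8))
```

The same observed allele fraction can correspond to different prevalences once copy number or contamination changes, which is why the abstract stresses accounting for both.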
14.
Bull Math Biol ; 75(12): 2529-50, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24135792

ABSTRACT

Probabilistic models over strings have played a key role in developing methods that take into consideration indels as phylogenetically informative events. There is an extensive literature on using automata and transducers on phylogenies to do inference on these probabilistic models, in which an important theoretical question is the complexity of computing the normalization of a class of string-valued graphical models. This question has been investigated using tools from combinatorics, dynamic programming, and graph theory, and has practical applications in Bayesian phylogenetics. In this work, we revisit this theoretical question from a different point of view, based on linear algebra. The main contribution is a set of results based on this linear algebra view that facilitate the analysis and design of inference algorithms on string-valued graphical models. As an illustration, we use this method to give a new elementary proof of a known result on the complexity of inference on the "TKF91" model, a well-known probabilistic model over strings. Compared to previous work, our proving method is easier to extend to other models, since it relies on a novel weak condition, triangular transducers, which is easy to establish in practice. The linear algebra view provides a concise way of describing transducer algorithms and their compositions, opens the possibility of transferring fast linear algebra libraries (for example, based on GPUs), as well as low rank matrix approximation methods, to string-valued inference problems.


Subject(s)
Models, Statistical , Phylogeny , Algorithms , Bayes Theorem , Computational Biology , Evolution, Molecular , INDEL Mutation , Linear Models , Mathematical Concepts , Models, Genetic
15.
Proc Natl Acad Sci U S A ; 110(11): 4224-9, 2013 Mar 12.
Article in English | MEDLINE | ID: mdl-23401532

ABSTRACT

One of the oldest problems in linguistics is reconstructing the words that appeared in the protolanguages from which modern languages evolved. Identifying the forms of these ancient languages makes it possible to evaluate proposals about the nature of language change and to draw inferences about human history. Protolanguages are typically reconstructed using a painstaking manual process known as the comparative method. We present a family of probabilistic models of sound change as well as algorithms for performing inference in these models. The resulting system automatically and accurately reconstructs protolanguages from modern languages. We apply this system to 637 Austronesian languages, providing an accurate, large-scale automatic reconstruction of a set of protolanguages. Over 85% of the system's reconstructions are within one character of the manual reconstruction provided by a linguist specializing in Austronesian languages. Being able to automatically reconstruct large numbers of languages provides a useful way to quantitatively explore hypotheses about the factors determining which sounds in a language are likely to change over time. We demonstrate this by showing that the reconstructed Austronesian protolanguages provide compelling support for a hypothesis about the relationship between the function of a sound and its probability of changing that was first proposed in 1955.


Subject(s)
Language , Models, Theoretical , History, Ancient , Humans
16.
Proc Natl Acad Sci U S A ; 110(4): 1160-6, 2013 Jan 22.
Article in English | MEDLINE | ID: mdl-23275296

ABSTRACT

We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classic evolutionary process, the TKF91 model [Thorne JL, Kishino H, Felsenstein J (1991) J Mol Evol 33(2):114-124] is a continuous-time Markov chain model composed of insertion, deletion, and substitution events. Unfortunately, this model gives rise to an intractable computational problem: The computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa. In this work, we present a stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The Poisson Indel Process is closely related to the TKF91 model, differing only in its treatment of insertions, but it has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared with separate inference of phylogenies and alignments.


Subject(s)
Evolution, Molecular , INDEL Mutation , Models, Genetic , Models, Statistical , Bayes Theorem , Biostatistics , Likelihood Functions , Markov Chains , Phylogeny , Poisson Distribution , Sequence Alignment/statistics & numerical data
17.
Syst Biol ; 61(4): 579-93, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22223445

ABSTRACT

Bayesian inference provides an appealing general framework for phylogenetic analysis, able to incorporate a wide variety of modeling assumptions and to provide a coherent treatment of uncertainty. Existing computational approaches to Bayesian inference based on Markov chain Monte Carlo (MCMC) have not, however, kept pace with the scale of the data analysis problems in phylogenetics, and this has hindered the adoption of Bayesian methods. In this paper, we present an alternative to MCMC based on Sequential Monte Carlo (SMC). We develop an extension of classical SMC based on partially ordered sets and show how to apply this framework, which we refer to as PosetSMC, to phylogenetic analysis. We provide a theoretical treatment of PosetSMC and also present experimental evaluation of PosetSMC on both synthetic and real data. The empirical results demonstrate that PosetSMC is a very promising alternative to MCMC, providing up to two orders of magnitude faster convergence. We discuss other factors favorable to the adoption of PosetSMC in phylogenetics, including its ability to estimate marginal likelihoods, its ready implementability on parallel and distributed computing platforms, and the possibility of combining with MCMC in hybrid MCMC-SMC schemes. Software for PosetSMC is available at http://www.stat.ubc.ca/~bouchard/PosetSMC.


Subject(s)
Models, Genetic , Monte Carlo Method , Phylogeny , Algorithms , Bayes Theorem , Gene Frequency , Humans , Markov Chains , RNA, Ribosomal, 16S/genetics , Software