ABSTRACT
With increasing appreciation for the extent and importance of intratumor heterogeneity, much attention in cancer research has focused on profiling heterogeneity at the single-patient level. Although true single-cell genomic technologies are rapidly improving, they remain too noisy and costly at present for population-level studies. Bulk sequencing remains the standard for population-scale tumor genomics, creating a need for computational tools to separate contributions of multiple tumor clones and assorted stromal and infiltrating cell populations to pooled genomic data. All such methods are limited to coarse approximations of only a few cell subpopulations, however. In prior work, we demonstrated the feasibility of improving cell type deconvolution by taking advantage of substructure in genomic mixtures via a strategy called simplicial complex unmixing. We improve on past work by introducing enhancements to automate learning of substructured genomic mixtures, with specific emphasis on genome-wide copy number variation (CNV) data, as well as the ability to process quantitative RNA expression data and heterogeneous combinations of RNA and CNV data. We introduce methods for dimensionality estimation to better decompose mixture model substructure; fuzzy clustering to better identify substructure in sparse, noisy data; and automated model inference methods for other key model parameters. We further demonstrate their effectiveness in identifying mixture substructure in real breast cancer CNV data from The Cancer Genome Atlas (TCGA). Source code is available at https://github.com/tedroman/WSCUnmix.
Subjects
Breast Neoplasms/genetics , Chromosome Mapping/methods , Gene Dosage/genetics , Genes, Neoplasm/genetics , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Data Interpretation, Statistical , Female , Gene Expression Profiling/methods , Humans , Reproducibility of Results , Sensitivity and Specificity
ABSTRACT
Despite the enormous medical impact of cancers and intensive study of their biology, detailed characterization of tumor growth and development remains elusive. This difficulty occurs in large part because of enormous heterogeneity in the molecular mechanisms of cancer progression, both tumor-to-tumor and cell-to-cell in single tumors. Advances in genomic technologies, especially at the single-cell level, are improving the situation, but these approaches are held back by limitations of the biotechnologies for gathering genomic data from heterogeneous cell populations and the computational methods for making sense of those data. One popular way to gain the advantages of whole-genome methods without the cost of single-cell genomics has been the use of computational deconvolution (unmixing) methods to reconstruct clonal heterogeneity from bulk genomic data. These methods, too, are limited by the difficulty of inferring genomic profiles of rare or subtly varying clonal subpopulations from bulk data, a problem that can be computationally reduced to that of reconstructing the geometry of point clouds of tumor samples in a genome space. Here, we present a new method to improve that reconstruction by better identifying subspaces corresponding to tumors produced from mixtures of distinct combinations of clonal subpopulations. We develop a nonparametric clustering method based on medoidshift clustering for identifying subgroups of tumors expected to correspond to distinct trajectories of evolutionary progression. We show on synthetic and real tumor copy-number data that this new method substantially improves our ability to resolve discrete tumor subgroups, a key step in the process of accurately deconvolving tumor genomic data and inferring clonal heterogeneity from bulk data.
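For readers unfamiliar with the clustering step mentioned above, the following is a minimal sketch of generic medoid-shift mode seeking, not the authors' method, which the abstract describes only as "based on" medoidshift. The function name `medoid_shift`, the Gaussian kernel, the `bandwidth` parameter, and the toy one-dimensional points are all assumptions made for illustration.

```python
import math

def medoid_shift(points, bandwidth):
    """Generic medoid-shift clustering on a list of coordinate tuples.

    Returns, for each point, the index of the data point (mode) it
    converges to. Illustrative only; kernel and bandwidth are assumed.
    """
    n = len(points)
    # Pairwise squared Euclidean distances.
    d2 = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
          for p in points]
    # Gaussian kernel weights between every pair of points.
    w = [[math.exp(-d2[i][k] / (2 * bandwidth ** 2)) for k in range(n)]
         for i in range(n)]
    # Point i shifts to the data point j that minimises the sum of squared
    # distances to all points, weighted by the kernel centred on i.
    nxt = []
    for i in range(n):
        costs = [sum(w[i][k] * d2[j][k] for k in range(n)) for j in range(n)]
        nxt.append(min(range(n), key=costs.__getitem__))
    # Follow the shift pointers until a fixed point is reached; the
    # `seen` set guards against cycles caused by ties.
    labels = []
    for i in range(n):
        seen = set()
        j = i
        while nxt[j] != j and j not in seen:
            seen.add(j)
            j = nxt[j]
        labels.append(j)
    return labels

# Toy data: two well-separated groups on a line (hypothetical values).
toy_points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
toy_labels = medoid_shift(toy_points, bandwidth=0.5)
print(toy_labels)
```

Because every mode is itself a data point, the procedure needs only pairwise distances, which is one reason medoid-based schemes are convenient for point clouds of tumor samples. The number of clusters is not fixed in advance; in this sketch it depends on the assumed `bandwidth`.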
Subjects
Algorithms , Neoplasms/genetics , Cluster Analysis , Comparative Genomic Hybridization , Gene Dosage , Humans
ABSTRACT
BACKGROUND: Tumorigenesis is an evolutionary process by which tumor cells acquire mutations through successive diversification and differentiation. There is much interest in reconstructing this process of evolution due to its relevance to identifying drivers of mutation and predicting future prognosis and drug response. Efforts are challenged by high tumor heterogeneity, though, both within and among patients. In prior work, we showed that this heterogeneity could be turned into an advantage by computationally reconstructing models of cell populations mixed to different degrees in distinct tumors. Such mixed membership model approaches, however, are still limited in their ability to dissect more than a few well-conserved cell populations across a tumor data set. RESULTS: We present a method to improve on current mixed membership model approaches by better accounting for conserved progression pathways between subsets of cancers, which imply a structure to the data that has not previously been exploited. We extend our prior methods, which use an interpretation of the mixture problem as that of reconstructing simple geometric objects called simplices, to instead search for structured unions of simplices called simplicial complexes that one would expect to emerge from mixture processes describing branches along an evolutionary tree. We further improve on the prior work with a novel objective function to better identify mixtures corresponding to parsimonious evolutionary tree models. We demonstrate that this approach improves on our ability to accurately resolve mixtures on simulated data sets and demonstrate its practical applicability on a large RNASeq tumor data set. CONCLUSIONS: Better exploiting the expected geometric structure for mixed membership models produced from common evolutionary trees allows us to quickly and accurately reconstruct models of cell populations sampled from those trees. In the process, we hope to develop a better understanding of tumor evolution as well as other biological problems that involve interpreting genomic data gathered from heterogeneous populations of cells.
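The geometric picture in this abstract can be illustrated with a small sketch: a bulk sample is modeled as a convex combination of the profiles of the cell populations it contains, so samples drawn from one fixed set of populations fall inside a simplex, and samples from different branches fall in different simplices that may share vertices. The helper `mix`, the population names, and all numeric profiles below are hypothetical and are not taken from the paper.

```python
def mix(vertices, fractions):
    """Return the convex combination of vertex profiles.

    vertices:  list of equal-length profiles, one per cell population
    fractions: non-negative mixing fractions that sum to 1
    """
    if len(vertices) != len(fractions):
        raise ValueError("need one fraction per vertex")
    if any(f < 0 for f in fractions) or abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must be non-negative and sum to 1")
    dim = len(vertices[0])
    return [sum(f * v[g] for f, v in zip(fractions, vertices))
            for g in range(dim)]

# Hypothetical profiles for three cell populations over four loci.
normal = [2.0, 2.0, 2.0, 2.0]
clone_a = [3.0, 2.0, 1.0, 2.0]
clone_b = [3.0, 4.0, 1.0, 2.0]

# tumor_1 lies on the edge (1-simplex) spanned by normal and clone_a.
tumor_1 = mix([normal, clone_a], [0.5, 0.5])
# tumor_2 lies in the triangle (2-simplex) that also includes clone_b;
# the two simplices share the normal/clone_a edge, so together they
# form a simple simplicial complex.
tumor_2 = mix([normal, clone_a, clone_b], [0.25, 0.25, 0.5])

print(tumor_1)
print(tumor_2)
```

Unmixing is the inverse problem: given only many samples such as `tumor_1` and `tumor_2`, infer the vertices and the mixing fractions. The sketch shows the forward model only.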
Subjects
Breast Neoplasms/genetics , Breast Neoplasms/pathology , Computational Biology/methods , Computer Simulation , High-Throughput Nucleotide Sequencing/methods , Neoplasm Proteins/genetics , Disease Progression , Female , Gene Expression Profiling , Gene Regulatory Networks , Genome, Human , Humans
ABSTRACT
Despite strides in characterizing human history from genetic polymorphism data, progress in identifying genetic signatures of recent demography has been limited. Here we identify very recent fine-scale population structure in North America from a network of over 500 million genetic (identity-by-descent, IBD) connections among 770,000 genotyped individuals of US origin. We detect densely connected clusters within the network and annotate these clusters using a database of over 20 million genealogical records. Recent population patterns captured by IBD clustering include immigrants such as Scandinavians and French Canadians; groups with continental admixture such as Puerto Ricans; settlers such as the Amish and Appalachians who experienced geographic or cultural isolation; and broad historical trends, including reduced north-south gene flow. Our results yield a detailed historical portrait of North America after European settlement and support substantial genetic heterogeneity in the United States beyond that uncovered by previous studies.
Subjects
Demography/statistics & numerical data , Genetics, Population/methods , Population Dynamics/trends , Population/genetics , Cluster Analysis , Demography/methods , Emigrants and Immigrants , Gene Flow/genetics , Genotyping Techniques , Haplotypes/genetics , Humans , Polymorphism, Single Nucleotide , Population Dynamics/statistics & numerical data , Sequence Analysis, DNA , United States/ethnology
ABSTRACT
Wnt signalling is a critically important signalling pathway regulating embryogenesis and differentiation, and is broadly conserved amongst multicellular animals. In addition, dysregulation of Wnt signalling contributes to the pathogenesis of many human cancers, in particular colorectal cancer. Core members of the Wnt signalling pathway are quite well defined, although it has become apparent that a much broader network of interacting proteins regulates Wnt signalling activity. The goal of this paper is first to identify novel members of the Wnt regulatory network; and second, to identify sub-networks of the larger Wnt signalling network that are active in different biological contexts. We address these two questions using complementary computational approaches and show how these approaches may identify potentially novel Wnt signalling proteins as well as defining Wnt sub-networks active in different stages of colorectal cancer.