Search | BVS - MINISTÉRIO DA SAÚDE

1.

SpatialSort: a Bayesian model for clustering and cell population annotation of spatial proteomics data.

Lee, Eric; Chern, Kevin; Nissen, Michael; Wang, Xuehai; Huang, Chris; Gandhi, Anita K; Bouchard-Côté, Alexandre; Weng, Andrew P; Roth, Andrew.

Bioinformatics ; 39(39 Suppl 1): i131-i139, 2023 06 30.

Article in English | MEDLINE | ID: mdl-37387130

ABSTRACT

MOTIVATION: Recent advances in spatial proteomics technologies have enabled the profiling of dozens of proteins in thousands of single cells in situ. This has created the opportunity to move beyond quantifying the composition of cell types in tissue, and instead probe the spatial relationships between cells. However, most current methods for clustering data from these assays only consider the expression values of cells and ignore the spatial context. Furthermore, existing approaches do not account for prior information about the expected cell populations in a sample.

RESULTS: To address these shortcomings, we developed SpatialSort, a spatially aware Bayesian clustering approach that allows for the incorporation of prior biological knowledge. Our method is able to account for the affinities of cells of different types to neighbour in space, and by incorporating prior information about expected cell populations, it is able to simultaneously improve clustering accuracy and perform automated annotation of clusters. Using synthetic and real data, we show that by using spatial and prior information SpatialSort improves clustering accuracy. We also demonstrate how SpatialSort can perform label transfer between spatial and nonspatial modalities through the analysis of a real-world diffuse large B-cell lymphoma dataset.

AVAILABILITY AND IMPLEMENTATION: Source code is available on GitHub at: https://github.com/Roth-Lab/SpatialSort.

Subjects

Lymphoma, Large B-Cell, Diffuse; Proteomics; Humans; Bayes Theorem; Biological Assay; Cluster Analysis

2.

Accurate determination of CRISPR-mediated gene fitness in transplantable tumours.

Eirew, Peter; O'Flanagan, Ciara; Ting, Jerome; Salehi, Sohrab; Brimhall, Jazmine; Wang, Beixi; Biele, Justina; Algara, Teresa; Lee, So Ra; Hoang, Corey; Yap, Damian; McKinney, Steven; Bates, Cherie; Kong, Esther; Lai, Daniel; Beatty, Sean; Andronescu, Mirela; Zaikova, Elena; Funnell, Tyler; Ceglia, Nicholas; Chia, Stephen; Gelmon, Karen; Mar, Colin; Shah, Sohrab; Roth, Andrew; Bouchard-Côté, Alexandre; Aparicio, Samuel.

Nat Commun ; 13(1): 4534, 2022 08 04.

Article in English | MEDLINE | ID: mdl-35927228

ABSTRACT

Assessing tumour gene fitness in physiologically-relevant model systems is challenging due to biological features of in vivo tumour regeneration, including extreme variations in single cell lineage progeny. Here we develop a reproducible, quantitative approach to pooled genetic perturbation in patient-derived xenografts (PDXs), by encoding single cell output from transplanted CRISPR-transduced cells in combination with a Bayesian hierarchical model. We apply this to 181 PDX transplants from 21 breast cancer patients. We show that uncertainty in fitness estimates depends critically on the number of transplant cell clones and the variability in clone sizes. We use a pathway-directed allelic series to characterize Notch signaling, and quantify TP53 / MDM2 drug-gene conditional fitness in outlier patients. We show that fitness outlier identification can be mirrored by pharmacological perturbation. Overall, we demonstrate that the gene fitness landscape in breast PDXs is dominated by inter-patient differences.

Subjects

Breast Neoplasms; Clustered Regularly Interspaced Short Palindromic Repeats; Animals; Bayes Theorem; Breast Neoplasms/genetics; Disease Models, Animal; Female; Heterografts; Humans; Xenograft Model Antitumor Assays

3.

Clonal fitness inferred from time-series modelling of single-cell cancer genomes.

Salehi, Sohrab; Kabeer, Farhia; Ceglia, Nicholas; Andronescu, Mirela; Williams, Marc J; Campbell, Kieran R; Masud, Tehmina; Wang, Beixi; Biele, Justina; Brimhall, Jazmine; Gee, David; Lee, Hakwoo; Ting, Jerome; Zhang, Allen W; Tran, Hoa; O'Flanagan, Ciara; Dorri, Fatemeh; Rusk, Nicole; de Algara, Teresa Ruiz; Lee, So Ra; Cheng, Brian Yu Chieh; Eirew, Peter; Kono, Takako; Pham, Jenifer; Grewal, Diljot; Lai, Daniel; Moore, Richard; Mungall, Andrew J; Marra, Marco A; McPherson, Andrew; Bouchard-Côté, Alexandre; Aparicio, Samuel; Shah, Sohrab P.

Nature ; 595(7868): 585-590, 2021 07.

Article in English | MEDLINE | ID: mdl-34163070

ABSTRACT

Progress in defining genomic fitness landscapes in cancer, especially those defined by copy number alterations (CNAs), has been impeded by lack of time-series single-cell sampling of polyclonal populations and temporal statistical models [1-7]. Here we generated 42,000 genomes from multi-year time-series single-cell whole-genome sequencing of breast epithelium and primary triple-negative breast cancer (TNBC) patient-derived xenografts (PDXs), revealing the nature of CNA-defined clonal fitness dynamics induced by TP53 mutation and cisplatin chemotherapy. Using a new Wright-Fisher population genetics model [8,9] to infer clonal fitness, we found that TP53 mutation alters the fitness landscape, reproducibly distributing fitness over a larger number of clones associated with distinct CNAs. Furthermore, in TNBC PDX models with mutated TP53, inferred fitness coefficients from CNA-based genotypes accurately forecast experimentally enforced clonal competition dynamics. Drug treatment in three long-term serially passaged TNBC PDXs resulted in cisplatin-resistant clones emerging from low-fitness phylogenetic lineages in the untreated setting. Conversely, high-fitness clones from treatment-naive controls were eradicated, signalling an inversion of the fitness landscape. Finally, upon release of drug, selection pressure dynamics were reversed, indicating a fitness cost of treatment resistance. Together, our findings define clonal fitness linked to both CNA and therapeutic resistance in polyclonal tumours.

Subjects

DNA Copy Number Variations; Drug Resistance, Neoplasm; Triple Negative Breast Neoplasms/genetics; Animals; Cell Line, Tumor; Cisplatin/pharmacology; Clone Cells/pathology; Female; Genetic Fitness; Humans; Mice; Models, Statistical; Neoplasm Transplantation; Tumor Suppressor Protein p53/genetics; Whole Genome Sequencing
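
Illustrative note on record 3: the abstract names a Wright-Fisher population genetics model as the basis for inferring clonal fitness. As a rough illustration only, the Python sketch below simulates the textbook discrete-generation Wright-Fisher process with selection. It is not the authors' model or inference procedure, and the function name `wright_fisher`, the fitness coefficients, the population size and the number of generations are all hypothetical choices made for this example.

```python
import random
from collections import Counter


def wright_fisher(freqs, fitness, pop_size=1000, generations=50, seed=0):
    """Simulate clone frequencies under a discrete Wright-Fisher model with selection.

    freqs   -- initial clone frequencies (should sum to 1)
    fitness -- selection coefficient s_k per clone (assumed > -1)
    Returns a list of frequency vectors, one per generation (including the start).
    """
    rng = random.Random(seed)
    clones = list(range(len(freqs)))
    trajectory = [list(freqs)]
    for _ in range(generations):
        # Each offspring picks a parent clone with probability
        # proportional to frequency * (1 + s_k).
        weights = [f * (1.0 + s) for f, s in zip(freqs, fitness)]
        draws = Counter(rng.choices(clones, weights=weights, k=pop_size))
        # Multinomial resampling of a fixed-size population models genetic drift.
        freqs = [draws[c] / pop_size for c in clones]
        trajectory.append(freqs)
    return trajectory


if __name__ == "__main__":
    # Hypothetical two-clone example: clone 1 has a 20% fitness advantage.
    traj = wright_fisher([0.5, 0.5], [0.0, 0.2])
    print(traj[0], traj[-1])
```

In this toy setting the fitter clone tends to take over the population. Inferring fitness coefficients from observed trajectories, as the abstract describes, is the inverse problem and is not attempted here.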

4.

An Annealed Sequential Monte Carlo Method for Bayesian Phylogenetics.

Wang, Liangliang; Wang, Shijia; Bouchard-Côté, Alexandre.

Syst Biol ; 69(1): 155-183, 2020 01 01.

Article in English | MEDLINE | ID: mdl-31173141

ABSTRACT

We describe an "embarrassingly parallel" method for Bayesian phylogenetic inference, annealed Sequential Monte Carlo (SMC), based on recent advances in the SMC literature such as adaptive determination of annealing parameters. The algorithm provides an approximate posterior distribution over trees and evolutionary parameters as well as an unbiased estimator for the marginal likelihood. This unbiasedness property can be used for the purpose of testing the correctness of posterior simulation software. We evaluate the performance of phylogenetic annealed SMC by reviewing and comparing with other computational Bayesian phylogenetic methods, in particular, different marginal likelihood estimation methods. Unlike previous SMC methods in phylogenetics, our annealed method can utilize standard Markov chain Monte Carlo (MCMC) tree moves and hence benefit from the large inventory of such moves available in the literature. Consequently, the annealed SMC method should be relatively easy to incorporate into existing phylogenetic software packages based on MCMC algorithms. We illustrate our method using simulation studies and real data analysis.

Subjects

Algorithms; Classification/methods; Phylogeny; Bayes Theorem; Monte Carlo Method; Software
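
Illustrative note on record 4: the abstract describes annealed SMC as yielding an approximate posterior together with an unbiased estimator of the marginal likelihood, while reusing standard MCMC moves. The Python sketch below shows the general idea on a one-dimensional toy problem rather than on phylogenetic trees. Everything specific here is an assumption for illustration: the Gaussian prior and likelihood, the observation `y = 1.0`, the fixed linear annealing schedule (the paper mentions adaptive schedules), multinomial resampling at every step, and the name `annealed_smc`.

```python
import math
import random


def log_prior(x):
    # Standard normal prior on x.
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)


def log_lik(x, y=1.0):
    # Gaussian likelihood for a single hypothetical observation y.
    return -0.5 * (y - x) ** 2 - 0.5 * math.log(2 * math.pi)


def annealed_smc(n_particles=2000, n_steps=50, seed=0):
    """Return (log marginal likelihood estimate, final particles)."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # draws from the prior
    betas = [t / n_steps for t in range(n_steps + 1)]              # fixed schedule 0 -> 1
    log_z = 0.0
    for b_prev, b in zip(betas, betas[1:]):
        # Incremental weight: likelihood raised to the temperature increment.
        log_w = [(b - b_prev) * log_lik(x) for x in particles]
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        # Running product of average incremental weights estimates Z.
        log_z += m + math.log(sum(w) / n_particles)
        # Multinomial resampling, then one random-walk Metropolis move
        # targeting prior(x) * lik(x)**b.
        particles = rng.choices(particles, weights=w, k=n_particles)
        moved = []
        for x in particles:
            prop = x + rng.gauss(0.0, 0.5)
            log_ratio = (log_prior(prop) + b * log_lik(prop)) - (
                log_prior(x) + b * log_lik(x)
            )
            if rng.random() < math.exp(min(0.0, log_ratio)):
                x = prop
            moved.append(x)
        particles = moved
    return log_z, particles


if __name__ == "__main__":
    est, _ = annealed_smc()
    exact = -0.5 * math.log(4 * math.pi) - 0.25  # log N(y=1; 0, variance 2)
    print(est, exact)
```

For this conjugate toy model the marginal likelihood is available in closed form, so the estimate can be checked directly. That kind of check is one way the unbiasedness property mentioned in the abstract can help test sampler code. Note that unbiasedness holds for the estimate of Z itself, not for its logarithm.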

5.

clonealign: statistical integration of independent single-cell RNA and DNA sequencing data from human cancers.

Campbell, Kieran R; Steif, Adi; Laks, Emma; Zahn, Hans; Lai, Daniel; McPherson, Andrew; Farahani, Hossein; Kabeer, Farhia; O'Flanagan, Ciara; Biele, Justina; Brimhall, Jazmine; Wang, Beixi; Walters, Pascale; Bouchard-Côté, Alexandre; Aparicio, Samuel; Shah, Sohrab P.

Genome Biol ; 20(1): 54, 2019 03 12.

Article in English | MEDLINE | ID: mdl-30866997

ABSTRACT

Measuring gene expression of tumor clones at single-cell resolution links functional consequences to somatic alterations. Without scalable methods to simultaneously assay DNA and RNA from the same single cell, parallel single-cell DNA and RNA measurements from independent cell populations must be mapped for genome-transcriptome association. We present clonealign, which assigns gene expression states to cancer clones using single-cell RNA and DNA sequencing independently sampled from a heterogeneous population. We apply clonealign to triple-negative breast cancer patient-derived xenografts and high-grade serous ovarian cancer cell lines and discover clone-specific dysregulated biological pathways not visible using either sequencing method alone.

Subjects

Biomarkers, Tumor/genetics; Cystadenocarcinoma, Serous/genetics; High-Throughput Nucleotide Sequencing/methods; Models, Statistical; Ovarian Neoplasms/genetics; Single-Cell Analysis/methods; Software; Triple Negative Breast Neoplasms/genetics; Animals; Clone Cells; Cystadenocarcinoma, Serous/pathology; Female; Humans; Mice, Inbred NOD; Mice, SCID; Ovarian Neoplasms/pathology; Triple Negative Breast Neoplasms/pathology; Tumor Cells, Cultured; Xenograft Model Antitumor Assays

6.

Somatic mutation detection and classification through probabilistic integration of clonal population information.

Dorri, Fatemeh; Jewell, Sean; Bouchard-Côté, Alexandre; Shah, Sohrab P.

Commun Biol ; 2: 44, 2019.

Article in English | MEDLINE | ID: mdl-30729182

ABSTRACT

Somatic mutations are a primary contributor to malignancy in human cells. Accurate detection of mutations is needed to define the clonal composition of tumours whereby clones may have distinct phenotypic properties. Although analysis of mutations over multiple tumour samples from the same patient has the potential to enhance identification of clones, few analytic methods exploit the correlation structure across samples. We posited that incorporating clonal information into joint analysis over multiple samples would improve mutation detection, particularly those with low prevalence. In this paper, we develop a new procedure called MuClone, for detection of mutations across multiple tumour samples of a patient from whole genome or exome sequencing data. In addition to mutation detection, MuClone classifies mutations into biologically meaningful groups and allows us to study clonal dynamics. We show that, on lung and ovarian cancer datasets, MuClone improves somatic mutation detection sensitivity over competing approaches without compromising specificity.

Subjects

Carcinoma, Non-Small-Cell Lung/genetics; Cystadenocarcinoma, Serous/genetics; Genome, Human; Lung Neoplasms/genetics; Models, Statistical; Neoplasm Proteins/genetics; Ovarian Neoplasms/genetics; Carcinoma, Non-Small-Cell Lung/diagnosis; Carcinoma, Non-Small-Cell Lung/metabolism; Carcinoma, Non-Small-Cell Lung/pathology; Clone Cells; Cystadenocarcinoma, Serous/diagnosis; Cystadenocarcinoma, Serous/metabolism; Cystadenocarcinoma, Serous/pathology; Datasets as Topic; Exome; Female; Gene Expression; Genetic Loci; Humans; Lung Neoplasms/diagnosis; Lung Neoplasms/metabolism; Lung Neoplasms/pathology; Male; Multigene Family; Mutation; Neoplasm Proteins/metabolism; Ovarian Neoplasms/diagnosis; Ovarian Neoplasms/metabolism; Ovarian Neoplasms/pathology; Software; Whole Genome Sequencing

7.

Interfaces of Malignant and Immunologic Clonal Dynamics in Ovarian Cancer.

Zhang, Allen W; McPherson, Andrew; Milne, Katy; Kroeger, David R; Hamilton, Phineas T; Miranda, Alex; Funnell, Tyler; Little, Nicole; de Souza, Camila P E; Laan, Sonya; LeDoux, Stacey; Cochrane, Dawn R; Lim, Jamie L P; Yang, Winnie; Roth, Andrew; Smith, Maia A; Ho, Julie; Tse, Kane; Zeng, Thomas; Shlafman, Inna; Mayo, Michael R; Moore, Richard; Failmezger, Henrik; Heindl, Andreas; Wang, Yi Kan; Bashashati, Ali; Grewal, Diljot S; Brown, Scott D; Lai, Daniel; Wan, Adrian N C; Nielsen, Cydney B; Huebner, Curtis; Tessier-Cloutier, Basile; Anglesio, Michael S; Bouchard-Côté, Alexandre; Yuan, Yinyin; Wasserman, Wyeth W; Gilks, C Blake; Karnezis, Anthony N; Aparicio, Samuel; McAlpine, Jessica N; Huntsman, David G; Holt, Robert A; Nelson, Brad H; Shah, Sohrab P.

Cell ; 173(7): 1755-1769.e22, 2018 06 14.

Article in English | MEDLINE | ID: mdl-29754820

ABSTRACT

High-grade serous ovarian cancer (HGSC) exhibits extensive malignant clonal diversity with widespread but non-random patterns of disease dissemination. We investigated whether local immune microenvironment factors shape tumor progression properties at the interface of tumor-infiltrating lymphocytes (TILs) and cancer cells. Through multi-region study of 212 samples from 38 patients with whole-genome sequencing, immunohistochemistry, histologic image analysis, gene expression profiling, and T and B cell receptor sequencing, we identified three immunologic subtypes across samples and extensive within-patient diversity. Epithelial CD8+ TILs negatively associated with malignant diversity, reflecting immunological pruning of tumor clones inferred by neoantigen depletion, HLA I loss of heterozygosity, and spatial tracking between T cell and tumor clones. In addition, combinatorial prognostic effects of mutational processes and immune properties were observed, illuminating how specific genomic aberration types associate with immune response and impact survival. We conclude that within-patient spatial immune microenvironment variation shapes intraperitoneal malignant spread, provoking new evolutionary perspectives on HGSC clonal dispersion.

Subjects

Lymphocytes, Tumor-Infiltrating/immunology; Ovarian Neoplasms/pathology; Adult; Aged; Aged, 80 and over; Antigens, Neoplasm/genetics; Antigens, Neoplasm/metabolism; BRCA1 Protein/genetics; BRCA1 Protein/metabolism; BRCA2 Protein/genetics; BRCA2 Protein/metabolism; CD8 Antigens/metabolism; Cluster Analysis; Female; HLA Antigens/genetics; HLA Antigens/metabolism; Humans; Loss of Heterozygosity; Lymphocytes, Tumor-Infiltrating/cytology; Lymphocytes, Tumor-Infiltrating/metabolism; Middle Aged; Neoplasm Grading; Ovarian Neoplasms/classification; Ovarian Neoplasms/immunology; Polymorphism, Single Nucleotide; Receptors, Antigen, T-Cell/genetics; Receptors, Antigen, T-Cell/metabolism; Whole Genome Sequencing; Young Adult

8.

Correction to: ReMixT: clone-specific genomic structure estimation in cancer.

McPherson, Andrew W; Roth, Andrew; Ha, Gavin; Chauve, Cedric; Steif, Adi; de Souza, Camila P E; Eirew, Peter; Bouchard-Côté, Alexandre; Aparicio, Sam; Sahinalp, S Cenk; Shah, Sohrab P.

Genome Biol ; 18(1): 188, 2017 10 06.

Article in English | MEDLINE | ID: mdl-28985744

9.

ReMixT: clone-specific genomic structure estimation in cancer.

McPherson, Andrew W; Roth, Andrew; Ha, Gavin; Chauve, Cedric; Steif, Adi; de Souza, Camila P E; Eirew, Peter; Bouchard-Côté, Alexandre; Aparicio, Sam; Sahinalp, S Cenk; Shah, Sohrab P.

Genome Biol ; 18(1): 140, 2017 07 27.

Article in English | MEDLINE | ID: mdl-28750660

ABSTRACT

Somatic evolution of malignant cells produces tumors composed of multiple clonal populations, distinguished in part by rearrangements and copy number changes affecting chromosomal segments. Whole genome sequencing mixes the signals of sampled populations, diluting the signals of clone-specific aberrations, and complicating estimation of clone-specific genotypes. We introduce ReMixT, a method to unmix tumor and contaminating normal signals and jointly predict mixture proportions, clone-specific segment copy number, and clone specificity of breakpoints. ReMixT is free, open-source software and is available at http://bitbucket.org/dranew/remixt.

Subjects

Breast Neoplasms/genetics; Cystadenocarcinoma, Serous/genetics; Genome, Human; Models, Statistical; Ovarian Neoplasms/genetics; Software; Algorithms; Animals; Breast Neoplasms/metabolism; Breast Neoplasms/pathology; Cell Count; Clone Cells; Cystadenocarcinoma, Serous/metabolism; Cystadenocarcinoma, Serous/pathology; DNA Copy Number Variations; Female; Genotype; Heterografts/metabolism; Heterografts/pathology; Humans; Internet; Mice; Mice, SCID; Neoplastic Cells, Circulating; Ovarian Neoplasms/metabolism; Ovarian Neoplasms/pathology; Translocation, Genetic; Whole Genome Sequencing

10.

ddClone: joint statistical inference of clonal populations from single cell and bulk tumour sequencing data.

Salehi, Sohrab; Steif, Adi; Roth, Andrew; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P.

Genome Biol ; 18(1): 44, 2017 03 01.

Article in English | MEDLINE | ID: mdl-28249593

ABSTRACT

Next-generation sequencing (NGS) of bulk tumour tissue can identify constituent cell populations in cancers and measure their abundance. This requires computational deconvolution of allelic counts from somatic mutations, which may be incapable of fully resolving the underlying population structure. Single cell sequencing (SCS) is a more direct method, although its replacement of NGS is impeded by technical noise and sampling limitations. We propose ddClone, which analytically integrates NGS and SCS data, leveraging their complementary attributes through joint statistical inference. We show on real and simulated datasets that ddClone produces more accurate results than can be achieved by either method alone.

Subjects

Clone Cells/metabolism; Computational Biology/methods; Models, Statistical; Neoplasms/genetics; Single-Cell Analysis; Alleles; Animals; Cluster Analysis; Computer Simulation; Disease Models, Animal; Female; Genotype; Heterografts; High-Throughput Nucleotide Sequencing; Humans; Mice; Mutation; Neoplasms/pathology; Reproducibility of Results; Sequence Analysis, DNA; Single-Cell Analysis/methods; Triple Negative Breast Neoplasms/genetics; Triple Negative Breast Neoplasms/pathology; Workflow

11.

Divergent modes of clonal spread and intraperitoneal mixing in high-grade serous ovarian cancer.

McPherson, Andrew; Roth, Andrew; Laks, Emma; Masud, Tehmina; Bashashati, Ali; Zhang, Allen W; Ha, Gavin; Biele, Justina; Yap, Damian; Wan, Adrian; Prentice, Leah M; Khattra, Jaswinder; Smith, Maia A; Nielsen, Cydney B; Mullaly, Sarah C; Kalloger, Steve; Karnezis, Anthony; Shumansky, Karey; Siu, Celia; Rosner, Jamie; Chan, Hector Li; Ho, Julie; Melnyk, Nataliya; Senz, Janine; Yang, Winnie; Moore, Richard; Mungall, Andrew J; Marra, Marco A; Bouchard-Côté, Alexandre; Gilks, C Blake; Huntsman, David G; McAlpine, Jessica N; Aparicio, Samuel; Shah, Sohrab P.

Nat Genet ; 48(7): 758-67, 2016 07.

Article in English | MEDLINE | ID: mdl-27182968

ABSTRACT

We performed phylogenetic analysis of high-grade serous ovarian cancers (68 samples from seven patients), identifying constituent clones and quantifying their relative abundances at multiple intraperitoneal sites. Through whole-genome and single-nucleus sequencing, we identified evolutionary features including mutation loss, convergence of the structural genome and temporal activation of mutational processes that patterned clonal progression. We then determined the precise clonal mixtures comprising each tumor sample. The majority of sites were clonally pure or composed of clones from a single phylogenetic clade. However, each patient contained at least one site composed of polyphyletic clones. Five patients exhibited monoclonal and unidirectional seeding from the ovary to intraperitoneal sites, and two patients demonstrated polyclonal spread and reseeding. Our findings indicate that at least two distinct modes of intraperitoneal spread operate in clonal dissemination and highlight the distribution of migratory potential over clonal populations comprising high-grade serous ovarian cancers.

Subjects

Biomarkers, Tumor/genetics; Clone Cells/pathology; Cystadenocarcinoma, Serous/pathology; Genetic Variation/genetics; Ovarian Neoplasms/pathology; Peritoneal Neoplasms/pathology; Tumor Microenvironment/genetics; Aged; Clone Cells/metabolism; Cystadenocarcinoma, Serous/genetics; Disease Progression; Fallopian Tube Neoplasms/genetics; Fallopian Tube Neoplasms/pathology; Female; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Genome, Human; High-Throughput Nucleotide Sequencing/methods; Humans; Middle Aged; Mutation/genetics; Neoplasm Grading; Neoplasm Recurrence, Local/genetics; Neoplasm Recurrence, Local/pathology; Ovarian Neoplasms/genetics; Peritoneal Neoplasms/genetics; Phylogeny; Single-Cell Analysis/methods; Survival Rate

12.

Clonal genotype and population structure inference from single-cell tumor sequencing.

Roth, Andrew; McPherson, Andrew; Laks, Emma; Biele, Justina; Yap, Damian; Wan, Adrian; Smith, Maia A; Nielsen, Cydney B; McAlpine, Jessica N; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P.

Nat Methods ; 13(7): 573-6, 2016 07.

Article in English | MEDLINE | ID: mdl-27183439

ABSTRACT

Single-cell DNA sequencing has great potential to reveal the clonal genotypes and population structure of human cancers. However, single-cell data suffer from missing values and biased allelic counts as well as false genotype measurements owing to the sequencing of multiple cells. We describe the Single Cell Genotyper (https://bitbucket.org/aroth85/scg), an open-source software based on a statistical model coupled with a mean-field variational inference method, which can be used to address these problems and robustly infer clonal genotypes.

Subjects

Cystadenocarcinoma, Serous/genetics; Leukemia/genetics; Mammary Glands, Human/metabolism; Ovarian Neoplasms/genetics; Single-Cell Analysis/methods; Software; Clone Cells; Female; Genome, Human; Genotype; High-Throughput Nucleotide Sequencing/methods; Humans; Models, Statistical; Polymorphism, Single Nucleotide/genetics

13.

PyClone: statistical inference of clonal population structure in cancer.

Roth, Andrew; Khattra, Jaswinder; Yap, Damian; Wan, Adrian; Laks, Emma; Biele, Justina; Ha, Gavin; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P.

Nat Methods ; 11(4): 396-8, 2014 Apr.

Article in English | MEDLINE | ID: mdl-24633410

ABSTRACT

We introduce PyClone, a statistical model for inference of clonal population structures in cancers. PyClone is a Bayesian clustering method for grouping sets of deeply sequenced somatic mutations into putative clonal clusters while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy-number changes and normal-cell contamination. Single-cell sequencing validation demonstrates PyClone's accuracy.

Subjects

Bayes Theorem; Cluster Analysis; Models, Biological; Models, Statistical; Neoplasms/metabolism; Algorithms; Alleles; Animals; DNA Mutational Analysis/methods; Gene Expression Regulation, Neoplastic; Humans; Mutation; Reproducibility of Results; Software

14.

A note on probabilistic models over strings: the linear algebra approach.

Bouchard-Côté, Alexandre.

Bull Math Biol ; 75(12): 2529-50, 2013 Dec.

Article in English | MEDLINE | ID: mdl-24135792

ABSTRACT

Probabilistic models over strings have played a key role in developing methods that take into consideration indels as phylogenetically informative events. There is an extensive literature on using automata and transducers on phylogenies to do inference on these probabilistic models, in which an important theoretical question is the complexity of computing the normalization of a class of string-valued graphical models. This question has been investigated using tools from combinatorics, dynamic programming, and graph theory, and has practical applications in Bayesian phylogenetics. In this work, we revisit this theoretical question from a different point of view, based on linear algebra. The main contribution is a set of results based on this linear algebra view that facilitate the analysis and design of inference algorithms on string-valued graphical models. As an illustration, we use this method to give a new elementary proof of a known result on the complexity of inference on the "TKF91" model, a well-known probabilistic model over strings. Compared to previous work, our proving method is easier to extend to other models, since it relies on a novel weak condition, triangular transducers, which is easy to establish in practice. The linear algebra view provides a concise way of describing transducer algorithms and their compositions, opens the possibility of transferring fast linear algebra libraries (for example, based on GPUs), as well as low rank matrix approximation methods, to string-valued inference problems.

Subjects

Models, Statistical; Phylogeny; Algorithms; Bayes Theorem; Computational Biology; Evolution, Molecular; INDEL Mutation; Linear Models; Mathematical Concepts; Models, Genetic

15.

Automated reconstruction of ancient languages using probabilistic models of sound change.

Bouchard-Côté, Alexandre; Hall, David; Griffiths, Thomas L; Klein, Dan.

Proc Natl Acad Sci U S A ; 110(11): 4224-9, 2013 Mar 12.

Article in English | MEDLINE | ID: mdl-23401532

ABSTRACT

One of the oldest problems in linguistics is reconstructing the words that appeared in the protolanguages from which modern languages evolved. Identifying the forms of these ancient languages makes it possible to evaluate proposals about the nature of language change and to draw inferences about human history. Protolanguages are typically reconstructed using a painstaking manual process known as the comparative method. We present a family of probabilistic models of sound change as well as algorithms for performing inference in these models. The resulting system automatically and accurately reconstructs protolanguages from modern languages. We apply this system to 637 Austronesian languages, providing an accurate, large-scale automatic reconstruction of a set of protolanguages. Over 85% of the system's reconstructions are within one character of the manual reconstruction provided by a linguist specializing in Austronesian languages. Being able to automatically reconstruct large numbers of languages provides a useful way to quantitatively explore hypotheses about the factors determining which sounds in a language are likely to change over time. We demonstrate this by showing that the reconstructed Austronesian protolanguages provide compelling support for a hypothesis about the relationship between the function of a sound and its probability of changing that was first proposed in 1955.

Subjects

Language; Models, Theoretical; History, Ancient; Humans
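
Illustrative note on record 15: the abstract reports that over 85% of reconstructions are "within one character" of a linguist's manual reconstruction. One plausible way to read that criterion is as an edit (Levenshtein) distance of at most 1. That reading is an assumption of this sketch, and the abstract does not spell out the exact metric. The word pairs below are made-up strings for illustration, not data from the paper, and `fraction_within` is a hypothetical helper name.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fraction_within(pairs, max_dist=1):
    """Share of (automatic, manual) pairs whose edit distance is <= max_dist."""
    hits = sum(1 for auto, manual in pairs if levenshtein(auto, manual) <= max_dist)
    return hits / len(pairs)


if __name__ == "__main__":
    # Made-up (automatic, manual) pairs, purely for illustration.
    pairs = [
        ("mata", "mata"),
        ("lima", "rima"),
        ("tolu", "telu"),
        ("hangin", "angin"),
        ("batu", "watuq"),
    ]
    print(fraction_within(pairs))
```

Applied to real data, the same computation would be run over every reconstructed word form paired with its manually reconstructed counterpart.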

16.

Evolutionary inference via the Poisson Indel Process.

Bouchard-Côté, Alexandre; Jordan, Michael I.

Proc Natl Acad Sci U S A ; 110(4): 1160-6, 2013 Jan 22.

Article in English | MEDLINE | ID: mdl-23275296

ABSTRACT

We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classic evolutionary process, the TKF91 model [Thorne JL, Kishino H, Felsenstein J (1991) J Mol Evol 33(2):114-124] is a continuous-time Markov chain model composed of insertion, deletion, and substitution events. Unfortunately, this model gives rise to an intractable computational problem: The computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa. In this work, we present a stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The Poisson Indel Process is closely related to the TKF91 model, differing only in its treatment of insertions, but it has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared with separate inference of phylogenies and alignments.

Subjects

Evolution, Molecular; INDEL Mutation; Models, Genetic; Models, Statistical; Bayes Theorem; Biostatistics; Likelihood Functions; Markov Chains; Phylogeny; Poisson Distribution; Sequence Alignment/statistics & numerical data

17.

Phylogenetic inference via sequential Monte Carlo.

Bouchard-Côté, Alexandre; Sankararaman, Sriram; Jordan, Michael I.

Syst Biol ; 61(4): 579-93, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22223445

ABSTRACT

Bayesian inference provides an appealing general framework for phylogenetic analysis, able to incorporate a wide variety of modeling assumptions and to provide a coherent treatment of uncertainty. Existing computational approaches to Bayesian inference based on Markov chain Monte Carlo (MCMC) have not, however, kept pace with the scale of the data analysis problems in phylogenetics, and this has hindered the adoption of Bayesian methods. In this paper, we present an alternative to MCMC based on Sequential Monte Carlo (SMC). We develop an extension of classical SMC based on partially ordered sets and show how to apply this framework (which we refer to as PosetSMC) to phylogenetic analysis. We provide a theoretical treatment of PosetSMC and also present experimental evaluation of PosetSMC on both synthetic and real data. The empirical results demonstrate that PosetSMC is a very promising alternative to MCMC, providing up to two orders of magnitude faster convergence. We discuss other factors favorable to the adoption of PosetSMC in phylogenetics, including its ability to estimate marginal likelihoods, its ready implementability on parallel and distributed computing platforms, and the possibility of combining with MCMC in hybrid MCMC-SMC schemes. Software for PosetSMC is available at http://www.stat.ubc.ca/~bouchard/PosetSMC.

Subjects

Models, Genetic; Monte Carlo Method; Phylogeny; Algorithms; Bayes Theorem; Gene Frequency; Humans; Markov Chains; RNA, Ribosomal, 16S/genetics; Software
