ABSTRACT
MOTIVATION: Metastasis formation is a hallmark of cancer lethality. Yet, metastases are generally unobservable during their early stages of dissemination and spread to distant organs. Genomic datasets of matched primary tumors and metastases may offer insights into the underpinnings and the dynamics of metastasis formation. RESULTS: We present metMHN, a cancer progression model designed to deduce the joint progression of primary tumors and metastases using cross-sectional cancer genomics data. The model elucidates the statistical dependencies among genomic events, the formation of metastasis, and the clinical emergence of both primary tumors and their metastatic counterparts. metMHN enables the chronological reconstruction of mutational sequences and facilitates estimation of the timing of metastatic seeding. In a study of nearly 5000 lung adenocarcinomas, metMHN pinpointed TP53 and EGFR as mediators of metastasis formation. Furthermore, the study revealed that post-seeding adaptation is predominantly influenced by frequent copy number alterations. AVAILABILITY AND IMPLEMENTATION: All datasets and code are available on GitHub at https://github.com/cbg-ethz/metMHN.
Subject(s)
Genomics, Neoplasm Metastasis, Humans, Genomics/methods, Neoplasm Metastasis/genetics, Lung Neoplasms/genetics, Lung Neoplasms/pathology, Disease Progression, Neoplasms/genetics, Neoplasms/pathology, Adenocarcinoma of Lung/genetics, Adenocarcinoma of Lung/pathology, Mutation, Tumor Suppressor Protein p53/genetics, Tumor Suppressor Protein p53/metabolism, Cross-Sectional Studies, ErbB Receptors/genetics
ABSTRACT
Cancer progression can be described by continuous-time Markov chains whose state space grows exponentially in the number of somatic mutations. The age of a tumor at diagnosis is typically unknown. Therefore, the quantity of interest is the time-marginal distribution over all possible genotypes of tumors, defined as the transient distribution integrated over an exponentially distributed observation time. It can be obtained as the solution of a large linear system. However, the sheer size of this system renders classical solvers infeasible. We consider Markov chains whose transition rates are separable functions, allowing for an efficient low-rank tensor representation of the linear system's operator. Thus we can reduce the computational complexity from exponential to linear. We derive a convergent iterative method using low-rank formats whose result satisfies the normalization constraint of a distribution. We also perform numerical experiments illustrating that the marginal distribution is well approximated with low rank.
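The linear system mentioned in this abstract can be illustrated on a toy example. The sketch below is not the authors' low-rank method: it builds a small dense generator and solves the system directly, which is only feasible for a handful of events. It assumes the simplest separable case (events that accumulate independently, so the generator is a sum of Kronecker products), an observation time with rate `obs_rate`, and a column convention `Q[new, old]`. The function names and rate values are hypothetical.

```python
import numpy as np

def local_generator(rate):
    # 2x2 generator for one event: absent (0) -> present (1) at the given rate.
    # Column convention Q[new, old], so every column sums to zero.
    return np.array([[-rate, 0.0],
                     [rate, 0.0]])

def kron_sum_generator(rates):
    # Assumption: events accumulate independently, so
    # Q = sum_i  I x ... x Q_i x ... x I  (one Kronecker term per event).
    n = len(rates)
    Q = np.zeros((2 ** n, 2 ** n))
    for i, rate in enumerate(rates):
        factors = [np.eye(2)] * n
        factors[i] = local_generator(rate)
        term = factors[0]
        for f in factors[1:]:
            term = np.kron(term, f)
        Q += term
    return Q

def time_marginal(Q, p0, obs_rate=1.0):
    # Integrating exp(tQ) p0 against an Exp(obs_rate) observation time gives
    # p = obs_rate * (obs_rate * I - Q)^{-1} p0, i.e. one linear solve.
    size = Q.shape[0]
    return obs_rate * np.linalg.solve(obs_rate * np.eye(size) - Q, p0)

rates = [0.5, 2.0, 1.0]            # hypothetical per-event rates
Q = kron_sum_generator(rates)
p0 = np.zeros(2 ** len(rates))
p0[0] = 1.0                        # start with no events present
p = time_marginal(Q, p0)
```

Because the columns of `Q` sum to zero, the solution `p` sums to one without any extra normalization step. The dense matrix has 2^n rows and columns, which is the exponential growth that the low-rank tensor formats in the abstract are meant to avoid.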
Subject(s)
Markov Chains, Genotype
ABSTRACT
MOTIVATION: Cancer progresses by accumulating genomic events, such as mutations and copy number alterations, whose chronological order is key to understanding the disease but difficult to observe. Instead, cancer progression models use co-occurrence patterns in cross-sectional data to infer epistatic interactions between events and thereby uncover their most likely order of occurrence. State-of-the-art progression models, however, are limited by mathematical tractability and only allow events to interact in directed acyclic graphs, to promote but not inhibit subsequent events, or to be mutually exclusive in distinct groups that cannot overlap. RESULTS: Here we propose Mutual Hazard Networks (MHN), a new Machine Learning algorithm to infer cyclic progression models from cross-sectional data. MHN model events by their spontaneous rate of fixation and by multiplicative effects they exert on the rates of successive events. MHN compared favourably to acyclic models in cross-validated model fit on four datasets tested. In application to the glioblastoma dataset from The Cancer Genome Atlas, MHN proposed a novel interaction in line with consecutive biopsies: IDH1 mutations are early events that promote subsequent fixation of TP53 mutations. AVAILABILITY AND IMPLEMENTATION: Implementation and data are available at https://github.com/RudiSchill/MHN. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
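The rate structure described in this abstract (a spontaneous rate per event, scaled by multiplicative effects from events already present) can be sketched for a very small number of events. This is an illustrative construction under stated assumptions, not the code from the linked repository: the matrix name `theta`, the convention that `theta[i, i]` is the spontaneous rate of event `i` and `theta[i, j]` is the factor that event `j` exerts on event `i`, the bitmask state encoding, and the example values are all hypothetical.

```python
import numpy as np

def mhn_generator(theta):
    # theta[i, i]: assumed spontaneous rate of event i.
    # theta[i, j]: assumed multiplicative effect of a present event j on event i.
    # States are bitmasks: bit i set means event i has occurred.
    n = theta.shape[0]
    Q = np.zeros((2 ** n, 2 ** n))
    for state in range(2 ** n):
        for i in range(n):
            if state & (1 << i):
                continue  # event i already present, cannot occur again
            rate = theta[i, i]
            for j in range(n):
                if state & (1 << j):
                    rate *= theta[i, j]
            Q[state | (1 << i), state] = rate   # column convention Q[new, old]
            Q[state, state] -= rate
    return Q

# Two hypothetical events: event 0 triples the rate of event 1,
# event 1 has no effect on event 0.
theta = np.array([[1.0, 1.0],
                  [3.0, 0.5]])
Q = mhn_generator(theta)
```

With these values, event 1 occurs at rate 0.5 from the empty state and at rate 1.5 once event 0 is present. A factor below one would model inhibition, and nonzero effects in both directions would give the cyclic dependencies that the abstract contrasts with acyclic models.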
Subject(s)
Algorithms, Computational Biology, Glioblastoma, Genetic Models, Computational Biology/methods, Cross-Sectional Studies, Genome/genetics, Glioblastoma/genetics, Humans, Machine Learning, Mutation
ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
Establishing gene regulatory networks during differentiation or reprogramming requires master or pioneer transcription factors (TFs) such as PU.1, a prototype master TF of hematopoietic lineage differentiation. To systematically determine molecular features that control its activity, here we analyze DNA-binding in vitro and genome-wide in vivo across different cell types with native or ectopic PU.1 expression. Although PU.1, in contrast to classical pioneer factors, is unable to access nucleosomal target sites in vitro, ectopic induction of PU.1 leads to the extensive remodeling of chromatin and redistribution of partner TFs. De novo chromatin access, stable binding, and redistribution of partner TFs all require both PU.1's N-terminal acidic activation domain and its ability to recruit SWI/SNF remodeling complexes, suggesting that the latter may collect and distribute co-associated TFs in conjunction with the non-classical pioneer TF PU.1.