ABSTRACT
Cancer progression involves the gradual loss of a differentiated phenotype and acquisition of progenitor and stem-cell-like features. Here, we provide novel stemness indices for assessing the degree of oncogenic dedifferentiation. We used an innovative one-class logistic regression (OCLR) machine-learning algorithm to extract transcriptomic and epigenetic feature sets derived from non-transformed pluripotent stem cells and their differentiated progeny. Using OCLR, we were able to identify previously undiscovered biological mechanisms associated with the dedifferentiated oncogenic state. Analyses of the tumor microenvironment revealed unanticipated correlation of cancer stemness with immune checkpoint expression and infiltrating immune cells. We found that the dedifferentiated oncogenic phenotype was generally most prominent in metastatic tumors. Application of our stemness indices to single-cell data revealed patterns of intra-tumor molecular heterogeneity. Finally, the indices allowed for the identification of novel targets and possible targeted therapies aimed at tumor differentiation.
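The abstract does not say how a stemness index is computed from the learned signature. As a minimal sketch, assume the trained model yields one weight per gene, that a sample is scored by the Spearman correlation between those weights and its expression values, and that scores are then min-max scaled to [0, 1] across the cohort. The function names, weights, and expression values below are hypothetical and serve only as illustration.

```python
from typing import Dict, List, Sequence


def _ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def stemness_indices(weights: Sequence[float],
                     samples: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Score each sample against the signature, then scale scores to [0, 1].

    Assumes at least two samples with distinct raw scores.
    """
    raw = {name: spearman(weights, expr) for name, expr in samples.items()}
    lo, hi = min(raw.values()), max(raw.values())
    return {name: (r - lo) / (hi - lo) for name, r in raw.items()}


# Hypothetical signature weights for four genes and three toy samples.
signature = [0.5, 0.2, -0.1, -0.4]
cohort = {
    "stem_like": [9.0, 7.0, 3.0, 1.0],
    "mixed": [5.0, 1.0, 6.0, 2.0],
    "differentiated": [1.0, 2.0, 8.0, 9.0],
}
indices = stemness_indices(signature, cohort)
```

Because the scaling is relative to the cohort, an index computed this way is comparable only among samples scored together.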
Subject(s)
Cell Dedifferentiation/genetics; Machine Learning; Neoplasms/pathology; Carcinogenesis; DNA Methylation; Databases, Genetic; Epigenesis, Genetic; Humans; MicroRNAs/metabolism; Neoplasm Metastasis; Neoplasms/genetics; Stem Cells/cytology; Stem Cells/metabolism; Transcriptome; Tumor Microenvironment
ABSTRACT
Highly multiplexed tissue imaging makes detailed molecular analysis of single cells possible in a preserved spatial context. However, reproducible analysis of large multichannel images poses a substantial computational challenge. Here, we describe a modular and open-source computational pipeline, MCMICRO, for performing the sequential steps needed to transform whole-slide images into single-cell data. We demonstrate the use of MCMICRO on tissue and tumor images acquired using multiple imaging platforms, thereby providing a solid foundation for the continued development of tissue imaging software.
Subject(s)
Image Processing, Computer-Assisted; Neoplasms; Diagnostic Imaging; Humans; Image Processing, Computer-Assisted/methods; Neoplasms/diagnostic imaging; Neoplasms/pathology; Software
ABSTRACT
It remains unclear whether causal, rather than merely correlational, relationships in molecular networks can be inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess inferred molecular networks in a causal sense.
Subject(s)
Causality; Gene Regulatory Networks; Neoplasms/genetics; Protein Interaction Mapping/methods; Software; Systems Biology; Algorithms; Computational Biology; Computer Simulation; Gene Expression Profiling; Humans; Models, Biological; Signal Transduction; Tumor Cells, Cultured
ABSTRACT
Evidence from numerous cancers suggests that increased aggressiveness is accompanied by up-regulation of signaling pathways and acquisition of properties common to stem cells. It is unclear if different subtypes of late-stage cancer vary in stemness properties and whether or not these subtypes are transcriptionally similar to normal tissue stem cells. We report a gene signature specific for human prostate basal cells that is differentially enriched in various phenotypes of late-stage metastatic prostate cancer. We FACS-purified and transcriptionally profiled basal and luminal epithelial populations from the benign and cancerous regions of primary human prostates. High-throughput RNA sequencing showed the basal population to be defined by genes associated with stem cell signaling programs and invasiveness. Application of a 91-gene basal signature to gene expression datasets from patients with organ-confined or hormone-refractory metastatic prostate cancer revealed that metastatic small cell neuroendocrine carcinoma was molecularly more stem-like than either metastatic adenocarcinoma or organ-confined adenocarcinoma. Bioinformatic analysis of the basal cell and two human small cell gene signatures identified a set of E2F target genes common between prostate small cell neuroendocrine carcinoma and primary prostate basal cells. Taken together, our data suggest that aggressive prostate cancer shares a conserved transcriptional program with normal adult prostate basal stem cells.
Subject(s)
Gene Expression Profiling; Prostatic Neoplasms/genetics; Prostatic Neoplasms/pathology; Stem Cells/metabolism; Antigens, CD/metabolism; Epithelial Cells/metabolism; Female; Gene Expression Regulation, Neoplastic; Gene Regulatory Networks; Humans; Male; Mammary Glands, Human/cytology; Neoplasm Metastasis; Neuroendocrine Tumors/genetics; Neuroendocrine Tumors/pathology; Phenotype; Proto-Oncogene Proteins c-myc/metabolism; Sequence Analysis, RNA; Signal Transduction/genetics; Transcription Factors/metabolism
ABSTRACT
We present a novel regularization scheme called The Generalized Elastic Net (GELnet) that incorporates gene pathway information into feature selection. The proposed formulation is applicable to a wide variety of problems in which the interpretation of predictive features using known molecular interactions is desired. The method naturally steers solutions toward sets of mechanistically interlinked genes. Using experiments on synthetic data, we demonstrate that pathway-guided results maintain, and often improve, the accuracy of predictors even in cases where the full gene network is unknown. We apply the method to predict the drug response of breast cancer cell lines. GELnet is able to reveal genetic determinants of sensitivity and resistance for several compounds. In particular, for an EGFR/HER2 inhibitor, it finds a possible trans-differentiation resistance mechanism missed by the corresponding pathway agnostic approach.
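The abstract does not spell out the penalty itself. As a hedged illustration of how a network-guided elastic-net style regularizer can steer a model toward interlinked genes, the sketch below assumes an L1 term plus a quadratic term built from the graph Laplacian of a gene interaction network. This is one plausible form rather than necessarily the exact GELnet formulation, and the function names, toy network, and weights are hypothetical.

```python
from typing import List, Sequence, Tuple


def graph_laplacian(n_genes: int, edges: Sequence[Tuple[int, int]]) -> List[List[float]]:
    """Build the Laplacian L = D - A of an undirected, unweighted gene network."""
    lap = [[0.0] * n_genes for _ in range(n_genes)]
    for i, j in edges:
        lap[i][i] += 1.0
        lap[j][j] += 1.0
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
    return lap


def network_penalty(w: Sequence[float], lap: List[List[float]],
                    lam1: float, lam2: float) -> float:
    """Assumed penalty: lam1 * ||w||_1 + (lam2 / 2) * w^T L w.

    With a Laplacian, w^T L w equals the sum over edges of (w_i - w_j)^2,
    so weights that agree across interacting genes are penalized less.
    """
    n = len(w)
    l1 = sum(abs(x) for x in w)
    quad = sum(w[i] * lap[i][j] * w[j] for i in range(n) for j in range(n))
    return lam1 * l1 + 0.5 * lam2 * quad


# Toy network: genes 0 and 1 interact, genes 2 and 3 interact.
lap = graph_laplacian(4, [(0, 1), (2, 3)])

# Same sparsity and magnitude; only the placement on the network differs.
linked = network_penalty([1.0, 1.0, 0.0, 0.0], lap, lam1=1.0, lam2=1.0)
scattered = network_penalty([1.0, 0.0, 1.0, 0.0], lap, lam1=1.0, lam2=1.0)
```

In this toy setting the two weight vectors have identical L1 cost, so any preference for the `linked` solution comes entirely from the network term.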
Subject(s)
Chromosome Mapping/methods; Models, Genetic; Pattern Recognition, Automated/methods; Protein Interaction Mapping/methods; Proteome/genetics; Signal Transduction/genetics; Animals; Computer Simulation; Humans
ABSTRACT
Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.
Subject(s)
Computational Biology/methods; Molecular Biology/methods; Molecular Sequence Annotation; Proteins/physiology; Algorithms; Animals; Databases, Protein; Exoribonucleases/classification; Exoribonucleases/genetics; Exoribonucleases/physiology; Forecasting; Humans; Proteins/chemistry; Proteins/classification; Proteins/genetics; Species Specificity
ABSTRACT
Many datasets are being produced by consortia that seek to characterize healthy and disease tissues at single-cell resolution. While biospecimen and experimental information is often captured, detailed metadata standards related to data matrices and analysis workflows are currently lacking. To address this, we developed the matrix and analysis metadata standards (MAMS) to serve as a resource for data centers, repositories, and tool developers. We defined metadata fields for matrices and parameters commonly utilized in analytical workflows and developed the rmams package to extract MAMS from single-cell objects. Overall, MAMS promotes the harmonization, integration, and reproducibility of single-cell data across platforms.
Subject(s)
Metadata; Single-Cell Analysis; Single-Cell Analysis/methods; Single-Cell Analysis/standards; Reproducibility of Results; Humans; Software
ABSTRACT
Neuroinflammation is a pathological feature of many neurodegenerative diseases, including Alzheimer's disease (AD)1,2 and amyotrophic lateral sclerosis (ALS)3, raising the possibility of common therapeutic targets. We previously established that cytoplasmic double-stranded RNA (cdsRNA) is spatially coincident with cytoplasmic pTDP-43 inclusions in neurons of patients with C9ORF72-mediated ALS4. CdsRNA triggers a type-I interferon (IFN-I)-based innate immune response in human neural cells, resulting in their death4. Here, we report that cdsRNA is also spatially coincident with pTDP-43 cytoplasmic inclusions in brain cells of patients with AD pathology and that type-I interferon response genes are significantly upregulated in brain regions affected by AD. We updated our machine-learning pipeline DRIAD-SP (Drug Repurposing In Alzheimer's Disease with Systems Pharmacology) to incorporate cryptic exon (CE) detection as a proxy of pTDP-43 inclusions and demonstrated that the FDA-approved JAK inhibitors baricitinib and ruxolitinib that block interferon signaling show a protective signal only in cortical brain regions expressing multiple CEs. Furthermore, the JAK family member TYK2 was a top hit in a CRISPR screen of cdsRNA-mediated death in differentiated human neural cells. The selective TYK2 inhibitor deucravacitinib, an FDA-approved drug for psoriasis, rescued toxicity elicited by cdsRNA. Finally, we identified CCL2, CXCL10, and IL-6 as candidate predictive biomarkers for cdsRNA-related neurodegenerative diseases. Together, we find parallel neuroinflammatory mechanisms between TDP-43 associated-AD and ALS and nominate TYK2 as a possible disease-modifying target of these incurable neurodegenerative diseases.
ABSTRACT
Combining heterogeneous sources of data is essential for accurate prediction of protein function. The task is complicated by the fact that while sequence-based features can be readily compared across species, most other data are species-specific. In this paper, we present a multi-view extension to GOstruct, a structured-output framework for function annotation of proteins. The extended framework can learn from disparate data sources, with each data source provided to the framework in the form of a kernel. Our empirical results demonstrate that the multi-view framework is able to utilize all available information, yielding better performance than sequence-based models trained across species and models trained from collections of data within a given species. This version of GOstruct participated in the recent Critical Assessment of Functional Annotations (CAFA) challenge; since then we have significantly improved the natural language processing component of the method, which now provides performance that is on par with that provided by sequence information. The GOstruct framework is available for download at http://strut.sourceforge.net.
Subject(s)
Molecular Sequence Annotation; Proteins/physiology; Algorithms; Animals; Computational Biology/methods; Gene Expression; Mice; Protein Interaction Mapping; Proteins/genetics; Proteins/metabolism; Software; Vocabulary, Controlled
ABSTRACT
MOTIVATION: A current challenge in understanding cancer processes is to pinpoint which mutations influence the onset and progression of disease. Toward this goal, we describe a method called PARADIGM-SHIFT that can predict whether a mutational event is neutral, gain-of-function, or loss-of-function in a tumor sample. The method uses a belief-propagation algorithm to infer gene activity from gene expression and copy number data in the context of a set of pathway interactions. RESULTS: The method was found to be both sensitive and specific on a set of positive and negative controls for multiple cancers for which pathway information was available. Application to the Cancer Genome Atlas glioblastoma, ovarian, and lung squamous cancer datasets revealed several novel mutations with predicted high impact, including several genes mutated at low frequency, suggesting that the approach will be complementary to current approaches that rely on the prevalence of events to reach statistical significance. AVAILABILITY: All source code is available at the github repository http:github.org/paradigmshift. CONTACT: jstuart@soe.ucsc.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Algorithms; Mutation; Neoplasms/genetics; Gene Expression; Genes, Neoplasm; Genes, p53; Humans; NF-E2-Related Factor 2/genetics; Retinoblastoma Protein/genetics
ABSTRACT
[This corrects the article DOI: 10.1016/j.patter.2023.100791.].
ABSTRACT
The true accuracy of a machine-learning model is a population-level statistic that cannot be observed directly. In practice, predictor performance is estimated against one or more test datasets, and the accuracy of this estimate strongly depends on how well the test sets represent all possible unseen datasets. Here we describe paired evaluation as a simple, robust approach for evaluating performance of machine-learning models in small-sample biological and clinical studies. We use the method to evaluate predictors of drug response in breast cancer cell lines and of disease severity in patients with Alzheimer's disease, demonstrating that the choice of test data can cause estimates of performance to vary by as much as 20%. We show that paired evaluation makes it possible to identify outliers, improve the accuracy of performance estimates in the presence of known confounders, and assign statistical significance when comparing machine-learning models.
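The abstract names paired evaluation without detailing the procedure. The sketch below assumes a leave-pair-out reading: every pair of samples with opposite labels is held out in turn, a model is trained on the remaining samples, and the pair counts as correct when the positive sample receives the higher score. The nearest-centroid scorer, the function names, and the toy data are hypothetical stand-ins, not the authors' implementation.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Scorer = Callable[[List[Vector], List[int], List[Vector]], List[float]]


def centroid_scorer(train_x: List[Vector], train_y: List[int],
                    test_x: List[Vector]) -> List[float]:
    """Toy model: higher score means closer to the positive-class centroid."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a: Vector, b: Vector) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    mu_pos, mu_neg = centroid(1), centroid(0)
    return [dist2(x, mu_neg) - dist2(x, mu_pos) for x in test_x]


def paired_evaluation(xs: List[Vector], ys: List[int],
                      fit_and_score: Scorer) -> Dict[Tuple[int, int], bool]:
    """Hold out each opposite-label pair; record whether it is ranked correctly."""
    outcome = {}
    for i, j in combinations(range(len(xs)), 2):
        if ys[i] == ys[j]:
            continue  # only pairs with different labels are informative
        keep = [k for k in range(len(xs)) if k not in (i, j)]
        s_i, s_j = fit_and_score([xs[k] for k in keep],
                                 [ys[k] for k in keep],
                                 [xs[i], xs[j]])
        pos, neg = (s_i, s_j) if ys[i] == 1 else (s_j, s_i)
        outcome[(i, j)] = pos > neg
    return outcome


# Toy data: the last positive sample (index 5) resembles the negatives.
xs = [[0.0], [0.2], [0.1], [1.0], [1.2], [0.05]]
ys = [0, 0, 0, 1, 1, 1]
pairs = paired_evaluation(xs, ys, centroid_scorer)
auc_estimate = sum(pairs.values()) / len(pairs)
errors_with_5 = [p for p, ok in pairs.items() if not ok and 5 in p]
```

Under this reading, the fraction of correctly ranked pairs acts as an AUC-like estimate, and keeping the per-pair outcomes is what makes it possible to trace misrankings back to individual samples.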
ABSTRACT
A large number of genomic and imaging datasets are being produced by consortia that seek to characterize healthy and disease tissues at single-cell resolution. While much effort has been devoted to capturing information related to biospecimen information and experimental procedures, the metadata standards that describe data matrices and the analysis workflows that produced them are relatively lacking. Detailed metadata schema related to data analysis are needed to facilitate sharing and interoperability across groups and to promote data provenance for reproducibility. To address this need, we developed the Matrix and Analysis Metadata Standards (MAMS) to serve as a resource for data coordinating centers and tool developers. We first curated several simple and complex "use cases" to characterize the types of feature-observation matrices (FOMs), annotations, and analysis metadata produced in different workflows. Based on these use cases, metadata fields were defined to describe the data contained within each matrix including those related to processing, modality, and subsets. Suggested terms were created for the majority of fields to aid in harmonization of metadata terms across groups. Additional provenance metadata fields were also defined to describe the software and workflows that produced each FOM. Finally, we developed a simple list-like schema that can be used to store MAMS information and implemented in multiple formats. Overall, MAMS can be used as a guide to harmonize analysis-related metadata which will ultimately facilitate integration of datasets across tools and consortia. MAMS specifications, use cases, and examples can be found at https://github.com/single-cell-mams/mams/.
ABSTRACT
The National Cancer Institute (NCI) supports many research programs and consortia, many of which use imaging as a major modality for characterizing cancerous tissue. A trans-consortia Image Analysis Working Group (IAWG) was established in 2019 with a mission to disseminate imaging-related work and foster collaborations. In 2022, the IAWG held a virtual hackathon focused on addressing challenges of analyzing high dimensional datasets from fixed cancerous tissues. Standard image processing techniques have automated feature extraction, but the next generation of imaging data requires more advanced methods to fully utilize the available information. In this perspective, we discuss current limitations of the automated analysis of multiplexed tissue images, the first steps toward deeper understanding of these limitations, what possible solutions have been developed, any new or refined approaches that were developed during the Image Analysis Hackathon 2022, and where further effort is required. The outstanding problems addressed in the hackathon fell into three main themes: 1) challenges to cell type classification and assessment, 2) translation and visual representation of spatial aspects of high dimensional data, and 3) scaling digital image analyses to large (multi-TB) datasets. We describe the rationale for each specific challenge and the progress made toward addressing it during the hackathon. We also suggest areas that would benefit from more focus and offer insight into broader challenges that the community will need to address as new technologies are developed and integrated into the broad range of image-based modalities and analytical resources already in use within the cancer research community.
ABSTRACT
Mitochondrial biogenesis initiates within hours of T cell receptor (TCR) engagement and is critical for T cell activation, function, and survival; yet, how metabolic programs support mitochondrial biogenesis during TCR signaling is not fully understood. Here, we performed a multiplexed metabolic chemical screen in CD4+ T lymphocytes to identify modulators of metabolism that impact mitochondrial mass during early T cell activation. Treatment of T cells with pyrvinium pamoate early during their activation blocks an increase in mitochondrial mass and results in reduced proliferation, skewed CD4+ T cell differentiation, and reduced cytokine production. Furthermore, administration of pyrvinium pamoate at the time of induction of experimental autoimmune encephalomyelitis, an experimental model of multiple sclerosis in mice, prevented the onset of clinical disease. Thus, modulation of mitochondrial biogenesis may provide a therapeutic strategy for modulating T cell immune responses.
Subject(s)
Encephalomyelitis, Autoimmune, Experimental; Mice; Animals; Encephalomyelitis, Autoimmune, Experimental/drug therapy; T-Lymphocytes; Lymphocyte Activation; Receptors, Antigen, T-Cell; CD4-Positive T-Lymphocytes
ABSTRACT
A challenge in tuberculosis treatment regimen design is the necessity to combine three or more antibiotics. We narrow the prohibitively large search space by breaking down high-order drug combinations into drug pair units. Using pairwise in vitro measurements, we train machine learning models to predict higher-order combination treatment outcomes in the relapsing BALB/c mouse model. Classifiers perform well and predict many of the >500 possible combinations among 12 antibiotics to be improved over bedaquiline + pretomanid + linezolid, a treatment-shortening regimen compared with the standard of care in mice. We reformulate classifiers as simple rulesets to reveal guiding principles of constructing combination therapies for both preclinical and clinical outcomes. One example ruleset combines a drug pair that is synergistic in a dormancy model with a pair that is potent in a cholesterol-rich growth environment. These rulesets are predictive, intuitive, and practical, thus enabling rational construction of drug combinations.
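To make the idea of breaking a high-order regimen into drug pair units concrete, the following minimal sketch enumerates all pairs within a combination and summarizes their pairwise measurements into a fixed-length feature vector that a classifier could consume. The choice of summary statistics (mean, minimum, maximum) and the numeric pair scores are assumptions made for illustration; only the drug names come from the abstract.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, FrozenSet, Iterable, List


def pair_units(regimen: Iterable[str]) -> List[FrozenSet[str]]:
    """Break a multi-drug regimen into all of its unordered drug pairs."""
    return [frozenset(pair) for pair in combinations(sorted(regimen), 2)]


def regimen_features(regimen: Iterable[str],
                     pair_scores: Dict[FrozenSet[str], float]) -> Dict[str, float]:
    """Summarize pairwise measurements into features for the whole regimen.

    The aggregation used here is an assumption, not the published method.
    """
    scores = [pair_scores[pair] for pair in pair_units(regimen)]
    return {"mean": mean(scores), "min": min(scores), "max": max(scores)}


# Hypothetical in vitro pair scores (made-up numbers, for illustration only).
HYPOTHETICAL_PAIR_SCORES = {
    frozenset({"bedaquiline", "pretomanid"}): 0.8,
    frozenset({"bedaquiline", "linezolid"}): 0.6,
    frozenset({"linezolid", "pretomanid"}): 1.0,
}

bpal = ("bedaquiline", "pretomanid", "linezolid")
features = regimen_features(bpal, HYPOTHETICAL_PAIR_SCORES)
```

A three-drug regimen yields three pairs and a four-drug regimen six, so the number of pairwise measurements needed grows far more slowly than the number of possible high-order combinations.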
Subject(s)
Antitubercular Agents; Tuberculosis; Animals; Antitubercular Agents/therapeutic use; Drug Combinations; Linezolid/therapeutic use; Mice; Mice, Inbred BALB C; Tuberculosis/drug therapy
ABSTRACT
Emerging multiplexed imaging platforms provide an unprecedented view of an increasing number of molecular markers at subcellular resolution and the dynamic evolution of tumor cellular composition. As such, they are capable of elucidating cell-to-cell interactions within the tumor microenvironment that impact clinical outcome and therapeutic response. However, the rapid development of these platforms has far outpaced the computational methods for processing and analyzing the data they generate. While being technologically disparate, all imaging assays share many computational requirements for post-collection data processing. As such, our Image Analysis Working Group (IAWG), composed of researchers in the Cancer Systems Biology Consortium (CSBC) and the Physical Sciences - Oncology Network (PS-ON), convened a workshop on "Computational Challenges Shared by Diverse Imaging Platforms" to characterize these common issues and a follow-up hackathon to implement solutions for a selected subset of them. Here, we delineate these areas that reflect major axes of research within the field, including image registration, segmentation of cells and subcellular structures, and identification of cell types from their morphology. We further describe the logistical organization of these events, believing our lessons learned can aid others in uniting the imaging community around self-identified topics of mutual interest, in designing and implementing operational procedures to address those topics and in mitigating issues inherent in image analysis (e.g., sharing exemplar images of large datasets and disseminating baseline solutions to hackathon challenges through open-source code repositories).
Subject(s)
Image Processing, Computer-Assisted; Neoplasms; Diagnostic Imaging; Humans; Image Processing, Computer-Assisted/methods; Neoplasms/diagnostic imaging; Software; Tumor Microenvironment
ABSTRACT
Background: Inflammatory breast cancer (IBC) is a rare and understudied disease, with 40% of cases presenting with human epidermal growth factor receptor 2 (HER2)-positive subtype. The goals of this study were to (i) assess the pathologic complete response (pCR) rate of short-term neoadjuvant dual-HER2-blockade and paclitaxel, (ii) contrast baseline and on-treatment transcriptional profiles of IBC tumor biopsies associated with pCR, and (iii) identify biological pathways that may explain the effect of neoadjuvant therapy on tumor response. Patients and Methods: A single-arm phase II trial of neoadjuvant trastuzumab (H), pertuzumab (P), and paclitaxel for 16 weeks was completed among patients with newly diagnosed HER2-positive IBC. Fresh-frozen tumor biopsies were obtained pretreatment (D1) and 8 days later (D8), following a single dose of HP, prior to adding paclitaxel. We performed RNA-sequencing on D1 and D8 tumor biopsies, identified genes associated with pCR using differential gene expression analysis, identified pathways associated with pCR using gene set enrichment and gene expression deconvolution methods, and compared the pCR predictive value of principal components derived from gene expression profiles by calculating the area under the curve for D1 and D8 subsets. Results: Twenty-three participants were enrolled, of whom 21 completed surgery following neoadjuvant therapy. Paired longitudinal fresh-frozen tumor samples (D1 and D8) were obtained from all patients. Among the 21 patients who underwent surgery, the pCR rate and the 4-year disease-free survival were 48% (90% CI 0.29-0.67) and 90% (95% CI 66-97%), respectively. The transcriptional profile of D8 biopsies was found to be more predictive of pCR (AUC = 0.91, 95% CI: 0.7993-1) than the D1 biopsies (AUC = 0.79, 95% CI: 0.5905-0.9822).
Conclusions: In patients with HER2-positive IBC treated with neoadjuvant HP and paclitaxel for 16 weeks, gene expression patterns of tumor biopsies measured 1 week after treatment initiation not only offered different biological information but importantly served as a better predictor of pCR than baseline transcriptional analysis. Trial Registration: ClinicalTrials.gov identifier: NCT01796197 (https://clinicaltrials.gov/ct2/show/NCT01796197); registered on February 21, 2013.
ABSTRACT
Metformin, a diabetes drug with anti-aging cellular responses, has complex actions that may alter dementia onset. Mixed results are emerging from prior observational studies. To address this complexity, we deploy a causal inference approach accounting for the competing risk of death in emulated clinical trials using two distinct electronic health record systems. In intention-to-treat analyses, metformin use associates with lower hazard of all-cause mortality and lower cause-specific hazard of dementia onset, after accounting for prolonged survival, relative to sulfonylureas. In parallel systems pharmacology studies, the expression of two AD-related proteins, APOE and SPP1, was suppressed by pharmacologic concentrations of metformin in differentiated human neural cells, relative to a sulfonylurea. Together, our findings suggest that metformin might reduce the risk of dementia in diabetes patients through mechanisms beyond glycemic control, and that SPP1 is a candidate biomarker for metformin's action in the brain.