ABSTRACT
Despite the successes of immunotherapy in cancer treatment over recent decades, fewer than 10%-20% of cancer cases have demonstrated durable responses from immune checkpoint blockade. To enhance the efficacy of immunotherapies, combination therapies suppressing multiple immune evasion mechanisms are increasingly contemplated. To better understand immune cell surveillance and diverse immune evasion responses in tumor tissues, we comprehensively characterized the immune landscape of more than 1,000 tumors across ten different cancers using CPTAC pan-cancer proteogenomic data. We identified seven distinct immune subtypes based on integrative learning of cell type compositions and pathway activities. We then thoroughly categorized unique genomic, epigenetic, transcriptomic, and proteomic changes associated with each subtype. Further leveraging the deep phosphoproteomic data, we studied kinase activities in different immune subtypes, which revealed potential subtype-specific therapeutic targets. Insights from this work will facilitate the development of future immunotherapy strategies and enhance precision targeting with existing agents.
Subject(s)
Neoplasms , Proteogenomics , Humans , Combined Modality Therapy , Genomics , Neoplasms/genetics , Neoplasms/immunology , Neoplasms/therapy , Proteomics , Tumor Escape
ABSTRACT
Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive cancer with poor patient survival. Toward understanding the underlying molecular alterations that drive PDAC oncogenesis, we conducted comprehensive proteogenomic analysis of 140 pancreatic cancers, 67 normal adjacent tissues, and 9 normal pancreatic ductal tissues. Proteomic, phosphoproteomic, and glycoproteomic analyses were used to characterize proteins and their modifications. In addition, whole-genome sequencing, whole-exome sequencing, methylation, RNA sequencing (RNA-seq), and microRNA sequencing (miRNA-seq) were performed on the same tissues to facilitate an integrated proteogenomic analysis and determine the impact of genomic alterations on protein expression, signaling pathways, and post-translational modifications. To ensure robust downstream analyses, tumor neoplastic cellularity was assessed via multiple orthogonal strategies using molecular features and verified via pathological estimation of tumor cellularity based on histological review. This integrated proteogenomic characterization of PDAC will serve as a valuable resource for the community, paving the way for early detection and identification of novel therapeutic targets.
Subject(s)
Adenocarcinoma/genetics , Carcinoma, Pancreatic Ductal/genetics , Pancreatic Neoplasms/genetics , Proteogenomics , Adenocarcinoma/diagnosis , Adult , Aged , Aged, 80 and over , Algorithms , Carcinoma, Pancreatic Ductal/diagnosis , Cohort Studies , Endothelial Cells/metabolism , Epigenesis, Genetic , Female , Gene Dosage , Genome, Human , Glycolysis , Glycoproteins/biosynthesis , Humans , Male , Middle Aged , Molecular Targeted Therapy , Pancreatic Neoplasms/diagnosis , Phenotype , Phosphoproteins/metabolism , Phosphorylation , Prognosis , Protein Kinases/metabolism , Proteome/metabolism , Substrate Specificity , Transcriptome/genetics
ABSTRACT
BACKGROUND: The COVID-19 pandemic began in early 2020 and placed significant strains on health care systems worldwide. There remains a compelling need to analyze factors that are predictive for patients at elevated risk of morbidity and mortality. OBJECTIVE: The goal of this retrospective study of patients who tested positive for COVID-19 and were treated at NYU (New York University) Langone Health was to identify clinical markers predictive of disease severity in order to assist in clinical decision triage and to provide additional biological insights into disease progression. METHODS: The clinical activity of 3740 patients at NYU Langone Hospital was obtained between January and August 2020; patient data were deidentified. Models were trained on clinical data from different parts of the patients' hospital stay to predict three clinical outcomes: deceased, ventilated, or admitted to the intensive care unit (ICU). RESULTS: The XGBoost (eXtreme Gradient Boosting) model that was trained on clinical data from the final 24 hours excelled at predicting mortality (area under the curve [AUC]=0.92; specificity=86%; and sensitivity=85%). Respiration rate was the most important feature, followed by SpO2 (peripheral oxygen saturation) and being aged 75 years and over. Performance of this model to predict the deceased outcome extended 5 days prior, with AUC=0.81, specificity=70%, and sensitivity=75%. When only using clinical data from the first 24 hours, AUCs of 0.79, 0.80, and 0.77 were obtained for deceased, ventilated, or ICU-admitted outcomes, respectively. Although respiration rate and SpO2 levels offered the highest feature importance, other canonical markers, including diabetic history, age, and temperature, offered minimal gain. When lab values were incorporated, prediction of mortality benefited the most from blood urea nitrogen and lactate dehydrogenase (LDH). Features that were predictive of morbidity included LDH, calcium, glucose, and C-reactive protein.
CONCLUSIONS: Together, this work summarizes efforts to systematically examine the importance of a wide range of features across different endpoint outcomes and at different hospitalization time points.
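The reported AUC, sensitivity, and specificity values can be illustrated with a minimal sketch of how such metrics are computed from predicted risk scores. This is not the study's pipeline: the function names, the 0.5 threshold, and the toy labels and scores are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(labels: Sequence[int], scores: Sequence[float],
                            threshold: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity), calling score >= threshold positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = outcome occurred); not taken from the study.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
area = auc(labels, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={area:.4f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the abstract reports all three.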
Subject(s)
Algorithms , COVID-19/diagnosis , COVID-19/mortality , Hospitalization , Adolescent , Adult , Aged , Area Under Curve , Child , Child, Preschool , Diabetes Mellitus , Female , Hospitals , Humans , Infant , Infant, Newborn , Intensive Care Units , Male , Middle Aged , Morbidity , New York City/epidemiology , Pandemics , Retrospective Studies , SARS-CoV-2 , Triage , Young Adult
ABSTRACT
Candida albicans is a commensal fungus of the human gastrointestinal tract and a prevalent opportunistic pathogen. To examine diversity within this species, extensive genomic and phenotypic analyses were performed on 21 clinical C. albicans isolates. Genomic variation was evident in the form of polymorphisms, copy number variations, chromosomal inversions, subtelomeric hypervariation, loss of heterozygosity (LOH), and whole or partial chromosome aneuploidies. All 21 strains were diploid, although karyotypic changes were present in eight of the 21 isolates, with multiple strains being trisomic for Chromosome 4 or Chromosome 7. Aneuploid strains exhibited a general fitness defect relative to euploid strains when grown under replete conditions. All strains were also heterozygous, yet multiple, distinct LOH tracts were present in each isolate. Higher overall levels of genome heterozygosity correlated with faster growth rates, consistent with increased overall fitness. Genes with the highest rates of amino acid substitutions included many cell wall proteins, implicating fast evolving changes in cell adhesion and host interactions. One clinical isolate, P94015, presented several striking properties including a novel cellular phenotype, an inability to filament, drug resistance, and decreased virulence. Several of these properties were shown to be due to a homozygous nonsense mutation in the EFG1 gene. Furthermore, loss of EFG1 function resulted in increased fitness of P94015 in a commensal model of infection. Our analysis therefore reveals intra-species genetic and phenotypic differences in C. albicans and delineates a natural mutation that alters the balance between commensalism and pathogenicity.
Subject(s)
Candida albicans/genetics , Genetic Variation , Phenotype , Aneuploidy , Candida albicans/classification , Candidiasis/microbiology , Chromosomes, Fungal , DNA Copy Number Variations , Evolution, Molecular , Genome, Fungal , Genotype , Humans , Phylogeny , Polymorphism, Single Nucleotide , Selection, Genetic , Sequence Analysis, DNA
ABSTRACT
The interaction between tumors and their microenvironment is complex and heterogeneous. Recent developments in high-dimensional multiplexed imaging have revealed the spatial organization of tumor tissues at the molecular level. However, the discovery and thorough characterization of the tumor microenvironment (TME) remains challenging due to the scale and complexity of the images. Here, we propose a self-supervised representation learning framework, CANVAS, that enables discovery of novel types of TMEs. CANVAS is a vision transformer that directly takes high-dimensional multiplexed images and is trained using self-supervised masked image modeling. In contrast to traditional spatial analysis approaches which rely on cell segmentations, CANVAS is segmentation-free, utilizes pixel-level information, and retains local morphology and biomarker distribution information. This approach allows the model to distinguish subtle morphological differences, leading to precise separation and characterization of distinct TME signatures. We applied CANVAS to a lung tumor dataset and identified and validated a monocytic signature that is associated with poor prognosis.
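The masking step behind masked image modeling can be sketched as follows. This is only a generic illustration of hiding random tiles of a multi-channel image so that a model can be trained to reconstruct them; it is not the CANVAS implementation, and the patch size, mask ratio, image layout, and function name are assumptions.

```python
import random
from typing import List, Tuple

Image = List[List[List[float]]]  # indexed as [channel][row][col]


def mask_random_patches(image: Image, patch: int, mask_ratio: float,
                        seed: int = 0) -> Tuple[Image, List[Tuple[int, int]]]:
    """Zero out a random subset of non-overlapping patch x patch tiles.

    The same tiles are hidden in every channel (marker). Returns a masked
    copy of the image and the (tile_row, tile_col) indices that were hidden.
    """
    height, width = len(image[0]), len(image[0][0])
    tiles = [(r, c)
             for r in range(height // patch)
             for c in range(width // patch)]
    rng = random.Random(seed)
    n_masked = int(round(mask_ratio * len(tiles)))
    hidden = rng.sample(tiles, n_masked)

    # Copy so the caller's image is left untouched.
    masked = [[row[:] for row in channel] for channel in image]
    for tile_row, tile_col in hidden:
        for channel in masked:
            for r in range(tile_row * patch, (tile_row + 1) * patch):
                for c in range(tile_col * patch, (tile_col + 1) * patch):
                    channel[r][c] = 0.0
    return masked, hidden


# Hypothetical 3-marker, 8x8 image filled with ones.
image = [[[1.0] * 8 for _ in range(8)] for _ in range(3)]
masked, hidden = mask_random_patches(image, patch=2, mask_ratio=0.75)
print(len(hidden), "of 16 tiles hidden")
```

In a full training loop, the model would see only the visible tiles and be penalized on its reconstruction of the hidden ones; that part is omitted here.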
ABSTRACT
We introduce a pioneering approach that integrates pathology imaging with transcriptomics and proteomics to identify predictive histology features associated with critical clinical outcomes in cancer. We utilize 2,755 H&E-stained histopathological slides from 657 patients across 6 cancer types from CPTAC. Our models effectively recapitulate distinctions readily made by human pathologists: tumor vs. normal (AUROC = 0.995) and tissue-of-origin (AUROC = 0.979). We further investigate predictive power on tasks not normally performed from H&E alone, including TP53 prediction and pathologic stage. Importantly, we describe predictive morphologies not previously utilized in a clinical setting. The incorporation of transcriptomics and proteomics identifies pathway-level signatures and cellular processes driving predictive histology features. Model generalizability and interpretability are confirmed using TCGA. We propose a classification system for these tasks and suggest potential clinical applications for this integrated human and machine learning approach. A publicly available web-based platform implements these models.
Subject(s)
Deep Learning , Neoplasms , Proteogenomics , Humans , Neoplasms/genetics , Proteomics , Machine Learning
ABSTRACT
Clinical activity of 3740 de-identified COVID-19 positive patients treated at NYU Langone Health (NYULH) was collected between January and August 2020. An XGBoost model trained on clinical data from the final 24 hours excelled at predicting mortality (AUC=0.92, specificity=86%, and sensitivity=85%). Respiration rate was the most important feature, followed by SpO2 and age 75+. Performance of this model to predict the deceased outcome extended 5 days prior with AUC=0.81, specificity=70%, and sensitivity=75%. When only using clinical data from the first 24 hours, AUCs of 0.79, 0.80, and 0.77 were obtained for deceased, ventilated, or ICU admitted, respectively. Although respiration rate and SpO2 levels offered the highest feature importance, other canonical markers including diabetic history, age, and temperature offered minimal gain. When lab values were incorporated, prediction of mortality benefited the most from blood urea nitrogen (BUN) and lactate dehydrogenase (LDH). Features predictive of morbidity included LDH, calcium, glucose, and C-reactive protein (CRP). Together, this work summarizes efforts to systematically examine the importance of a wide range of features across different endpoint outcomes and at different hospitalization time points.
ABSTRACT
The human commensal and opportunistic fungal pathogen Candida albicans displays extensive genetic and phenotypic variation across clinical isolates. Here, we performed RNA sequencing on 21 well-characterized isolates to examine how genetic variation contributes to gene expression differences and to link these differences to phenotypic traits. C. albicans adapts primarily through clonal evolution, and yet hierarchical clustering of gene expression profiles in this set of isolates did not reproduce their phylogenetic relationship. Strikingly, strain-specific gene expression was prevalent in some strain backgrounds. Association of gene expression with phenotypic data by differential analysis, linear correlation, and assembly of gene networks connected both previously characterized and novel genes with 23 C. albicans traits. Construction of de novo gene modules produced a gene atlas incorporating 67% of C. albicans genes and revealed correlations between expression modules and important phenotypes such as systemic virulence. Furthermore, targeted investigation of two modules that have novel roles in growth and filamentation supported our bioinformatic predictions. Together, these studies reveal widespread transcriptional variation across C. albicans isolates and identify genetic and epigenetic links to phenotypic variation based on coexpression network analysis.
IMPORTANCE: Infectious fungal species are often treated uniformly despite clear evidence of genotypic and phenotypic heterogeneity being widespread across strains. Identifying the genetic basis for this phenotypic diversity is extremely challenging because of the tens or hundreds of thousands of variants that may distinguish two strains. Here, we use transcriptional profiling to determine differences in gene expression that can be linked to phenotypic variation among a set of 21 Candida albicans isolates. Analysis of this transcriptional data set uncovered clear trends in gene expression characteristics for this species and new genes and pathways that were associated with variation in pathogenic processes. Direct investigation confirmed functional predictions for a number of new regulators associated with growth and filamentation, demonstrating the utility of these approaches in linking genes to important phenotypes.
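The linear-correlation step that associates gene expression with a phenotype can be sketched as below. This is a minimal illustration under stated assumptions rather than the authors' analysis: the gene names, expression values, and trait values are invented, and a real analysis would also need multiple-testing correction.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_genes_by_trait(expression: Dict[str, List[float]],
                        trait: List[float]) -> List[Tuple[str, float]]:
    """Sort genes by the absolute correlation of expression with the trait."""
    scored = [(gene, pearson(values, trait))
              for gene, values in expression.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical expression values across five isolates; not real data.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],   # rises with the trait
    "geneB": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls with the trait
    "geneC": [2.0, 1.0, 3.0, 1.0, 2.0],   # weak association
}
trait = [2.0, 4.0, 6.0, 8.0, 10.0]        # e.g. a growth-rate measurement

ranking = rank_genes_by_trait(expression, trait)
for gene, r in ranking:
    print(f"{gene}: r={r:+.3f}")
```

Ranking by absolute correlation keeps both positively and negatively associated genes near the top, which matters when a trait is repressed rather than induced by a gene.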
Subject(s)
Candida albicans/genetics , Candida albicans/pathogenicity , Gene Expression Profiling , Gene Expression Regulation, Fungal/genetics , Genetic Variation , Phenotype , Candidiasis/microbiology , Genome, Fungal , Genotype , Humans , Phylogeny , Sequence Analysis, RNA , Virulence
ABSTRACT
Candida albicans is a commensal fungus of human gastrointestinal and reproductive tracts, but also causes life-threatening systemic infections. The balance between colonization and pathogenesis is associated with phenotypic plasticity, with alternative cell states producing different outcomes in a mammalian host. Here, we reveal that gene dosage of a master transcription factor regulates cell differentiation in diploid C. albicans cells, as EFG1 hemizygous cells undergo a phenotypic transition inaccessible to "wild-type" cells with two functional EFG1 alleles. Notably, clinical isolates are often EFG1 hemizygous and thus licensed to undergo this transition. Phenotypic change corresponds to high-frequency loss of the functional EFG1 allele via de novo mutation or gene conversion events. This phenomenon also occurs during passaging in the gastrointestinal tract with the resulting cell type being hypercompetitive for commensal and systemic infections. A "two-hit" genetic model therefore underlies a key phenotypic transition in C. albicans that enables adaptation to host niches.
Subject(s)
Candida albicans/growth & development , Candida albicans/genetics , Candidiasis/microbiology , Gastrointestinal Tract/microbiology , Gene Expression Regulation, Fungal , Mutation , Symbiosis , Candida albicans/pathogenicity , DNA-Binding Proteins/genetics , Fungal Proteins/genetics , Gene Dosage , Humans , Transcription Factors/genetics , Virulence
ABSTRACT
The opportunistic fungal pathogen Candida albicans lacks a conventional sexual program and is thought to evolve, at least primarily, through the clonal acquisition of genetic changes. Here, we performed an analysis of heterozygous diploid genomes from 21 clinical isolates to determine the natural evolutionary processes acting on the C. albicans genome. Mutation and recombination shaped the genomic landscape among the C. albicans isolates. Strain-specific single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) clustered across the genome. Additionally, loss-of-heterozygosity (LOH) events contributed substantially to genotypic variation, with most long-tract LOH events extending to the ends of the chromosomes suggestive of repair via break-induced replication. Consistent with a model of inheritance by descent, most polymorphisms were shared between closely related strains. However, some isolates contained highly mosaic genomes consistent with strains having experienced interclade recombination during their evolutionary history. A detailed examination of mitochondrial genomes also revealed clear examples of interclade recombination among sequenced strains. These analyses therefore establish that both (para)sexual recombination and mitotic mutational processes drive evolution of this important pathogen. To further facilitate the study of C. albicans genomes, we also introduce an online platform, SNPMap, to examine SNP patterns in sequenced isolates.
IMPORTANCE: Mutations introduce variation into the genome upon which selection can act. Defining the nature of these changes is critical for determining species evolution, as well as for understanding the genetic changes driving important cellular processes. The heterozygous diploid fungus Candida albicans is both a frequent commensal organism and a prevalent opportunistic pathogen. A prevailing theory is that C. albicans evolves primarily through the gradual buildup of mitotic mutations, and a pressing issue is whether sexual or parasexual processes also operate within natural populations. Here, we establish that the C. albicans genome evolves by a combination of localized mutation and both short-tract and long-tract loss-of-heterozygosity (LOH) events within the sequenced isolates. Mutations are more prevalent within noncoding and heterozygous regions and LOH increases towards chromosome ends. Furthermore, we provide evidence for genetic exchange between isolates, establishing that sexual or parasexual processes have contributed to the diversity of both nuclear and mitochondrial genomes.
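One simple way to picture LOH tract detection is as a search for runs of consecutive homozygous calls at sites expected to be heterozygous. The sketch below is a run-length heuristic written for illustration only; the abstract does not describe the authors' calling method, and the function name, the `min_sites` cutoff, and the toy positions are assumptions.

```python
from typing import List, Tuple


def loh_tracts(positions: List[int], heterozygous: List[bool],
               min_sites: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) positions of runs of homozygous sites.

    Sites are assumed to be ordered along one chromosome and to be
    heterozygous in a reference genotype, so a run of at least
    `min_sites` homozygous calls is treated as a candidate LOH tract.
    """
    tracts = []
    run_start = None  # index of the first site in the current run
    # A trailing True acts as a sentinel that closes a run at the end.
    for i, het in enumerate(heterozygous + [True]):
        if not het and run_start is None:
            run_start = i
        elif het and run_start is not None:
            if i - run_start >= min_sites:
                tracts.append((positions[run_start], positions[i - 1]))
            run_start = None
    return tracts


# Hypothetical variant sites on one chromosome; not real data.
positions = [100, 250, 400, 900, 1300, 1800, 2500, 3100, 3600, 4000]
het = [True, False, False, True, True, False, False, False, False, False]

tracts = loh_tracts(positions, het)
print(tracts)
# A tract whose end equals the last site reaches the sampled chromosome end.
print([end == positions[-1] for _, end in tracts])
```

Lowering `min_sites` makes the heuristic report shorter tracts, at the cost of mistaking isolated homozygous calls for LOH.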