Ultrasensitive plasma-based monitoring of tumor burden using machine-learning-guided signal enrichment.

Widman, Adam J; Shah, Minita; Frydendahl, Amanda; Halmos, Daniel; Khamnei, Cole C; Øgaard, Nadia; Rajagopalan, Srinivas; Arora, Anushri; Deshpande, Aditya; Hooper, William F; Quentin, Jean; Bass, Jake; Zhang, Mingxuan; Langanay, Theophile; Andersen, Laura; Steinsnyder, Zoe; Liao, Will; Rasmussen, Mads Heilskov; Henriksen, Tenna Vesterman; Jensen, Sarah Østrup; Nors, Jesper; Therkildsen, Christina; Sotelo, Jesus; Brand, Ryan; Schiffman, Joshua S; Shah, Ronak H; Cheng, Alexandre Pellan; Maher, Colleen; Spain, Lavinia; Krause, Kate; Frederick, Dennie T; den Brok, Wendie; Lohrisch, Caroline; Shenkier, Tamara; Simmons, Christine; Villa, Diego; Mungall, Andrew J; Moore, Richard; Zaikova, Elena; Cerda, Viviana; Kong, Esther; Lai, Daniel; Malbari, Murtaza S; Marton, Melissa; Manaa, Dina; Winterkorn, Lara; Gelmon, Karen; Callahan, Margaret K; Boland, Genevieve; Potenski, Catherine.

Nat Med ; 2024 Jun 14.

Article En | MEDLINE | ID: mdl-38877116

In solid tumor oncology, circulating tumor DNA (ctDNA) is poised to transform care through accurate assessment of minimal residual disease (MRD) and therapeutic response monitoring. To overcome the sparsity of ctDNA fragments in low tumor fraction (TF) settings and increase MRD sensitivity, we previously leveraged genome-wide mutational integration through plasma whole-genome sequencing (WGS). Here we introduce MRD-EDGE, a machine-learning-guided WGS ctDNA single-nucleotide variant (SNV) and copy-number variant (CNV) detection platform designed to increase signal enrichment. MRD-EDGE SNV uses deep learning and a ctDNA-specific feature space to increase SNV signal-to-noise enrichment in WGS by ~300× compared to previous WGS error suppression. MRD-EDGE CNV also reduces the degree of aneuploidy needed for ultrasensitive CNV detection through WGS from 1 Gb to 200 Mb, vastly expanding its applicability within solid tumors. We harness the improved performance to identify MRD following surgery in multiple cancer types, track changes in TF in response to neoadjuvant immunotherapy in lung cancer and demonstrate ctDNA shedding in precancerous colorectal adenomas. Finally, the radical signal-to-noise enrichment in MRD-EDGE SNV enables plasma-only (non-tumor-informed) disease monitoring in advanced melanoma and lung cancer, yielding clinically informative TF monitoring for patients on immune-checkpoint inhibition.

EGGNet, a Generalizable Geometric Deep Learning Framework for Protein Complex Pose Scoring.

Wang, Zichen; Brand, Ryan; Adolf-Bryfogle, Jared; Grewal, Jasleen; Qi, Yanjun; Combs, Steven A; Golovach, Nataliya; Alford, Rebecca; Rangwala, Huzefa; Clark, Peter M.

ACS Omega ; 9(7): 7471-7479, 2024 Feb 20.

Article En | MEDLINE | ID: mdl-38405499

Computational prediction of molecule-protein interactions has been key for developing new molecules to interact with a target protein for therapeutics development. Previous work includes two independent streams of approaches: (1) predicting protein-protein interactions (PPIs) between naturally occurring proteins and (2) predicting binding affinities between proteins and small-molecule ligands [also known as drug-target interaction (DTI)]. Studying the two problems in isolation has limited the ability of these computational models to generalize across the PPI and DTI tasks, both of which ultimately involve noncovalent interactions with a protein target. In this work, we developed Equivariant Graph of Graphs neural Network (EGGNet), a geometric deep learning (GDL) framework, for molecule-protein binding predictions that can handle three types of molecules for interacting with a target protein: (1) small molecules, (2) synthetic peptides, and (3) natural proteins. EGGNet leverages a graph of graphs (GoG) representation constructed from the molecular structures at atomic resolution and utilizes a multiresolution equivariant graph neural network to learn from such representations. In addition, EGGNet leverages the underlying biophysics and makes use of both atom- and residue-level interactions, which improve EGGNet's ability to rank candidate poses from blind docking. EGGNet achieves competitive performance on both a public protein-small-molecule binding affinity prediction task (80.2% top 1 success rate on CASF-2016) and a synthetic protein interface prediction task (88.4% area under the precision-recall curve). We envision that the proposed GDL framework can generalize to many other protein interaction prediction problems, such as binding site prediction and molecular docking, helping accelerate protein engineering and structure-based drug development.
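
The top 1 success rate quoted above can be illustrated with a short sketch. This is not EGGNet's evaluation code: the function name, the input layout, and the 2.0 Å RMSD cutoff for a "near-native" pose are assumptions made for illustration, and the example values are invented.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: each complex maps to its candidate poses, given as
# (model_score, rmsd_to_native_in_angstrom). Higher score = better pose.
Poses = List[Tuple[float, float]]


def top1_success_rate(complexes: Dict[str, Poses], rmsd_cutoff: float = 2.0) -> float:
    """Fraction of complexes whose highest-scored pose lies within
    rmsd_cutoff of the native pose (cutoff value is an assumption)."""
    if not complexes:
        raise ValueError("no complexes given")
    hits = 0
    for poses in complexes.values():
        # Pick the pose the scoring model ranks first.
        _, best_rmsd = max(poses, key=lambda pose: pose[0])
        if best_rmsd <= rmsd_cutoff:
            hits += 1
    return hits / len(complexes)


# Invented example: the model ranks a near-native pose first for
# complex_a but prefers a decoy for complex_b.
example = {
    "complex_a": [(0.9, 1.2), (0.4, 6.0)],
    "complex_b": [(0.8, 5.5), (0.7, 1.0)],
}
rate = top1_success_rate(example)
```

Under these assumptions, only the single best-scored pose per complex counts, so a model that places the near-native pose second receives no credit for that complex.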

LM-GVP: an extensible sequence and structure informed deep learning framework for protein property prediction.

Wang, Zichen; Combs, Steven A; Brand, Ryan; Calvo, Miguel Romero; Xu, Panpan; Price, George; Golovach, Nataliya; Salawu, Emmanuel O; Wise, Colby J; Ponnapalli, Sri Priya; Clark, Peter M.

Sci Rep ; 12(1): 6832, 2022 04 27.

Article En | MEDLINE | ID: mdl-35477726

Proteins perform many essential functions in biological systems and can be successfully developed as bio-therapeutics. It is invaluable to be able to predict their properties based on a proposed sequence and structure. In this study, we developed a novel generalizable deep learning framework, LM-GVP, composed of a protein Language Model (LM) and Graph Neural Network (GNN) to leverage information from both 1D amino acid sequences and 3D structures of proteins. Our approach outperformed the state-of-the-art protein LMs on a variety of property prediction tasks including fluorescence, protease stability, and protein functions from Gene Ontology (GO). We also illustrated insights into how a GNN prediction head can inform the fine-tuning of protein LMs to better leverage structural information. We envision that our deep learning framework will be generalizable to many protein property prediction problems to greatly accelerate protein engineering and drug development.

Deep Learning , Amino Acid Sequence , Language , Neural Networks, Computer , Proteins/chemistry

Somatic mutations and cell identity linked by Genotyping of Transcriptomes.

Nam, Anna S; Kim, Kyu-Tae; Chaligne, Ronan; Izzo, Franco; Ang, Chelston; Taylor, Justin; Myers, Robert M; Abu-Zeinah, Ghaith; Brand, Ryan; Omans, Nathaniel D; Alonso, Alicia; Sheridan, Caroline; Mariani, Marisa; Dai, Xiaoguang; Harrington, Eoghan; Pastore, Alessandro; Cubillos-Ruiz, Juan R; Tam, Wayne; Hoffman, Ronald; Rabadan, Raul; Scandura, Joseph M; Abdel-Wahab, Omar; Smibert, Peter; Landau, Dan A.

Nature ; 571(7765): 355-360, 2019 07.

Article En | MEDLINE | ID: mdl-31270458

Defining the transcriptomic identity of malignant cells is challenging in the absence of surface markers that distinguish cancer clones from one another, or from admixed non-neoplastic cells. To address this challenge, here we developed Genotyping of Transcriptomes (GoT), a method to integrate genotyping with high-throughput droplet-based single-cell RNA sequencing. We apply GoT to profile 38,290 CD34+ cells from patients with CALR-mutated myeloproliferative neoplasms to study how somatic mutations corrupt the complex process of human haematopoiesis. High-resolution mapping of malignant versus normal haematopoietic progenitors revealed an increasing fitness advantage with myeloid differentiation of cells with mutated CALR. We identified the unfolded protein response as a predominant outcome of CALR mutations, with a considerable dependency on cell identity, as well as upregulation of the NF-κB pathway specifically in uncommitted stem cells. We further extended the GoT toolkit to genotype multiple targets and loci that are distant from transcript ends. Together, these findings reveal that the transcriptional output of somatic mutations in myeloproliferative neoplasms is dependent on the native cell identity.

Genotype , Mutation , Myeloproliferative Disorders/genetics , Myeloproliferative Disorders/pathology , Neoplasms/genetics , Neoplasms/pathology , Transcriptome/genetics , Animals , Antigens, CD34/metabolism , Calreticulin/genetics , Cell Line , Cell Proliferation , Clone Cells/classification , Clone Cells/metabolism , Clone Cells/pathology , Endoribonucleases/metabolism , Hematopoiesis/genetics , Hematopoietic Stem Cells/classification , Hematopoietic Stem Cells/metabolism , Hematopoietic Stem Cells/pathology , High-Throughput Nucleotide Sequencing/methods , Humans , Mice , Models, Molecular , Myeloproliferative Disorders/classification , NF-kappa B/metabolism , Neoplasms/classification , Neoplastic Stem Cells/cytology , Neoplastic Stem Cells/metabolism , Neoplastic Stem Cells/pathology , Primary Myelofibrosis/genetics , Primary Myelofibrosis/pathology , Protein Serine-Threonine Kinases/metabolism , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Unfolded Protein Response/genetics

Epigenetic evolution and lineage histories of chronic lymphocytic leukaemia.

Gaiti, Federico; Chaligne, Ronan; Gu, Hongcang; Brand, Ryan M; Kothen-Hill, Steven; Schulman, Rafael C; Grigorev, Kirill; Risso, Davide; Kim, Kyu-Tae; Pastore, Alessandro; Huang, Kevin Y; Alonso, Alicia; Sheridan, Caroline; Omans, Nathaniel D; Biederstedt, Evan; Clement, Kendell; Wang, Lili; Felsenfeld, Joshua A; Bhavsar, Erica B; Aryee, Martin J; Allan, John N; Furman, Richard; Gnirke, Andreas; Wu, Catherine J; Meissner, Alexander; Landau, Dan A.

Nature ; 569(7757): 576-580, 2019 05.

Article En | MEDLINE | ID: mdl-31092926

Genetic and epigenetic intra-tumoral heterogeneity cooperate to shape the evolutionary course of cancer [1]. Chronic lymphocytic leukaemia (CLL) is a highly informative model for cancer evolution as it undergoes substantial genetic diversification and evolution after therapy [2,3]. The CLL epigenome is also an important disease-defining feature [4,5], and growing populations of cells in CLL diversify by stochastic changes in DNA methylation known as epimutations [6]. However, previous studies using bulk sequencing methods to analyse the patterns of DNA methylation were unable to determine whether epimutations affect CLL populations homogeneously. Here, to measure the epimutation rate at single-cell resolution, we applied multiplexed single-cell reduced-representation bisulfite sequencing to B cells from healthy donors and patients with CLL. We observed that the common clonal origin of CLL results in a consistently increased epimutation rate, with low variability in the cell-to-cell epimutation rate. By contrast, variable epimutation rates across healthy B cells reflect diverse evolutionary ages across the trajectory of B cell differentiation, consistent with epimutations serving as a molecular clock. Heritable epimutation information allowed us to reconstruct lineages at high-resolution with single-cell data, and to apply this directly to patient samples. The CLL lineage tree shape revealed earlier branching and longer branch lengths than in normal B cells, reflecting rapid drift after the initial malignant transformation and a greater proliferative history. Integration of single-cell bisulfite sequencing analysis with single-cell transcriptomes and genotyping confirmed that genetic subclones mapped to distinct clades, as inferred solely on the basis of epimutation information.
Finally, to examine potential lineage biases during therapy, we profiled serial samples during ibrutinib-associated lymphocytosis, and identified clades of cells that were preferentially expelled from the lymph node after treatment, marked by distinct transcriptional profiles. The single-cell integration of genetic, epigenetic and transcriptional information thus charts the lineage history of CLL and its evolution with therapy.

Cell Lineage , Epigenesis, Genetic , Evolution, Molecular , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Leukemia, Lymphocytic, Chronic, B-Cell/pathology , Base Sequence , Biological Clocks , Cell Lineage/genetics , DNA Methylation , Epigenome/genetics , Gene Expression Regulation, Neoplastic , Humans , Leukemia, Lymphocytic, Chronic, B-Cell/metabolism , Mutation Rate , Sequence Analysis, RNA , Single-Cell Analysis , Transcription, Genetic

Corrupted coordination of epigenetic modifications leads to diverging chromatin states and transcriptional heterogeneity in CLL.

Pastore, Alessandro; Gaiti, Federico; Lu, Sydney X; Brand, Ryan M; Kulm, Scott; Chaligne, Ronan; Gu, Hongcang; Huang, Kevin Y; Stamenova, Elena K; Béguelin, Wendy; Jiang, Yanwen; Schulman, Rafael C; Kim, Kyu-Tae; Alonso, Alicia; Allan, John N; Furman, Richard R; Gnirke, Andreas; Wu, Catherine J; Melnick, Ari M; Meissner, Alexander; Bernstein, Bradley E; Abdel-Wahab, Omar; Landau, Dan A.

Nat Commun ; 10(1): 1874, 2019 04 23.

Article En | MEDLINE | ID: mdl-31015400

Cancer evolution is fueled by epigenetic as well as genetic diversity. In chronic lymphocytic leukemia (CLL), intra-tumoral DNA methylation (DNAme) heterogeneity empowers evolution. Here, to comprehensively study the epigenetic dimension of cancer evolution, we integrate DNAme analysis with histone modification mapping and single cell analyses of RNA expression and DNAme in 22 primary CLL and 13 healthy donor B lymphocyte samples. Our data reveal corrupted coherence across different layers of the CLL epigenome. This manifests in decreased mutual information across epigenetic modifications and gene expression attributed to cell-to-cell heterogeneity. Disrupted epigenetic-transcriptional coordination in CLL is also reflected in the dysregulation of the transcriptional output as a function of the combinatorial chromatin states, including incomplete Polycomb-mediated gene silencing. Notably, we observe unexpected co-mapping of typically mutually exclusive activating and repressing histone modifications, suggestive of intra-tumoral epigenetic diversity. Thus, CLL epigenetic diversification leads to decreased coordination across layers of epigenetic information, likely reflecting an admixture of cells with diverging cellular identities.
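
The notion of decreased mutual information between epigenetic layers can be illustrated with a small sketch. This is not the authors' pipeline: it assumes each layer has already been discretized into per-locus states (for example, 1 = mark present, 0 = absent), and the function name and toy data are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Empirical mutual information (in bits) between two equally long
    sequences of discrete states, e.g. two marks scored at the same loci."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[a] * count_y[b]))
    return mi


# Invented toy data: two marks scored at four loci.
mark_a = [1, 1, 0, 0]
coordinated = mutual_information(mark_a, [1, 1, 0, 0])    # marks always agree
uncoordinated = mutual_information(mark_a, [1, 0, 1, 0])  # marks are independent
```

In this toy case the perfectly coordinated pair carries one bit of shared information and the independent pair carries none, which is the direction of change the abstract describes for CLL relative to healthy B cells.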

B-Lymphocytes/metabolism , Chromatin/metabolism , Epigenesis, Genetic , Gene Expression Regulation, Neoplastic , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , DNA Methylation , Evolution, Molecular , Gene Silencing , Genes, Immunoglobulin Heavy Chain/genetics , Healthy Volunteers , Histone Code/genetics , Histones/genetics , Histones/metabolism , Humans , Leukemia, Lymphocytic, Chronic, B-Cell/blood , Polycomb-Group Proteins/genetics , Polycomb-Group Proteins/metabolism , Promoter Regions, Genetic/genetics , Sequence Analysis, RNA , Single-Cell Analysis/methods , Exome Sequencing