Results 1 - 20 of 272
1.
Cell Rep ; 43(7): 114436, 2024 Jul 04.
Article in English | MEDLINE | ID: mdl-38968069

ABSTRACT

Single-gene missense mutations remain challenging to interpret. Here, we deploy scalable functional screening by sequencing (SEUSS), a Perturb-seq method, to generate mutations at protein interfaces of RUNX1 and quantify their effect on activities of downstream cellular programs. We evaluate single-cell RNA profiles of 115 mutations in myelogenous leukemia cells and categorize them into three functionally distinct groups, wild-type (WT)-like, loss-of-function (LoF)-like, and hypomorphic, that we validate in orthogonal assays. LoF-like variants dominate the DNA-binding site and are recurrent in cancer; however, recurrence alone does not predict functional impact. Hypomorphic variants share characteristics with LoF-like but favor protein interactions, promoting gene expression indicative of nerve growth factor (NGF) response and cytokine recruitment of neutrophils. Accessible DNA near differentially expressed genes frequently contains RUNX1-binding motifs. Finally, we reclassify 16 variants of uncertain significance and train a classifier to predict 103 more. Our work demonstrates the potential of targeting protein interactions to better define the landscape of phenotypes reachable by missense mutations.

2.
bioRxiv ; 2024 May 24.
Article in English | MEDLINE | ID: mdl-38826258

ABSTRACT

This article describes the Cell Maps for Artificial Intelligence (CM4AI) project and its goals, methods, standards, current datasets, software tools, status, and future directions. CM4AI is the Functional Genomics Data Generation Project in the U.S. National Institutes of Health's (NIH) Bridge2AI program. Its overarching mission is to produce ethical, AI-ready datasets of cell architecture, inferred from multimodal data collected for human cell lines, to enable transformative biomedical AI research.

3.
Sci Rep ; 14(1): 13989, 2024 Jun 18.
Article in English | MEDLINE | ID: mdl-38886371

ABSTRACT

In vitro evolution and whole genome analysis has proven to be a powerful method for studying the mechanism of action (MOA) of small molecules in many haploid microbes but has generally not been applied to human cell lines, in part because their diploid state complicates the identification of variants that confer drug resistance. To determine if haploid human cells could be used in MOA studies, we evolved resistance to five different anticancer drugs (doxorubicin, gemcitabine, etoposide, topotecan, and paclitaxel) using a near-haploid cell line (HAP1) and then analyzed the genomes of the drug-resistant clones, developing a bioinformatic pipeline that involved filtering for high-frequency alleles predicted to change protein sequence, or alleles which appeared in the same gene for multiple independent selections with the same compound. Applying the filter to sequences from 28 drug-resistant clones identified a set of 21 genes which was strongly enriched for known resistance genes or known drug targets (TOP1, TOP2A, DCK, WDR33, SLCO3A1). In addition, some lines carried structural variants that encompassed additional known resistance genes (ABCB1, WWOX and RRM1). Gene expression knockdown and knockout experiments of 10 validation targets showed a high degree of specificity and accuracy in our calls and demonstrated that the same drug resistance mechanisms found in diverse clinical samples can be evolved, discovered and studied in an isogenic background.


Subject(s)
Antineoplastic Agents; Drug Resistance, Neoplasm; Haploidy; Humans; Drug Resistance, Neoplasm/genetics; Antineoplastic Agents/pharmacology; Genome, Human; Whole Genome Sequencing/methods; Cell Line

4.
Bioinformatics ; 40(Supplement_1): i160-i168, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940147

ABSTRACT

MOTIVATION: Predicting cancer drug response requires a comprehensive assessment of many mutations present across a tumor genome. While current drug response models generally use a binary mutated/unmutated indicator for each gene, not all mutations in a gene are equivalent. RESULTS: Here, we construct and evaluate a series of predictive models based on leading methods for quantitative mutation scoring. Such methods include VEST4 and CADD, which score the impact of a mutation on gene function, and CHASMplus, which scores the likelihood a mutation drives cancer. The resulting predictive models capture cellular responses to dabrafenib, which targets BRAF-V600 mutations, whereas models based on binary mutation status do not. Performance improvements generalize to other drugs, extending genetic indications for PIK3CA, ERBB2, EGFR, PARP1, and ABL1 inhibitors. Introducing quantitative mutation features in drug response models increases performance and mechanistic understanding. AVAILABILITY AND IMPLEMENTATION: Code and example datasets are available at https://github.com/pgwall/qms.


Subject(s)
Antineoplastic Agents; Mutation; Neoplasms; Humans; Neoplasms/genetics; Neoplasms/drug therapy; Antineoplastic Agents/pharmacology; Antineoplastic Agents/therapeutic use; Imidazoles/pharmacology; Oximes/pharmacology; Computational Biology/methods

5.
Article in English | MEDLINE | ID: mdl-38748859

ABSTRACT

While the primary sequences of human proteins have been cataloged for over a decade, determining how these are organized into a dynamic collection of multiprotein assemblies, with structures and functions spanning biological scales, is an ongoing venture. Systematic and data-driven analyses of these higher-order structures are emerging, facilitating the discovery and understanding of cellular phenotypes. At present, knowledge of protein localization and function has been primarily derived from manual annotation and curation in resources such as the Gene Ontology, which are biased toward richly annotated genes in the literature. Here, we envision a future powered by data-driven mapping of protein assemblies. These maps can capture and decode cellular functions through the integration of protein expression, localization, and interaction data across length scales and timescales. In this review, we focus on progress toward constructing integrated cell maps that accelerate the life sciences and translational research.

6.
Nat Commun ; 15(1): 3636, 2024 May 06.
Article in English | MEDLINE | ID: mdl-38710699

ABSTRACT

Polypharmacology drugs (compounds that inhibit multiple proteins) have many applications but are difficult to design. To address this challenge, we have developed POLYGON, an approach to polypharmacology based on generative reinforcement learning. POLYGON embeds chemical space and iteratively samples it to generate new molecular structures; these are rewarded by the predicted ability to inhibit each of two protein targets and by drug-likeness and ease of synthesis. In binding data for >100,000 compounds, POLYGON correctly recognizes polypharmacology interactions with 82.5% accuracy. We subsequently generate de novo compounds targeting ten pairs of proteins with documented co-dependency. Docking analysis indicates that top structures bind their two targets with low free energies and similar 3D orientations to canonical single-protein inhibitors. We synthesize 32 compounds targeting MEK1 and mTOR, with most yielding >50% reduction in each protein activity and in cell viability when dosed at 1-10 µM. These results support the potential of generative modeling for polypharmacology.


Subject(s)
Molecular Docking Simulation; Humans; TOR Serine-Threonine Kinases/metabolism; Polypharmacology; MAP Kinase Kinase 1/antagonists & inhibitors; MAP Kinase Kinase 1/metabolism; MAP Kinase Kinase 1/chemistry; Protein Kinase Inhibitors/pharmacology; Protein Kinase Inhibitors/chemistry; Protein Binding; Drug Discovery/methods; Drug Design; Cell Survival/drug effects

7.
bioRxiv ; 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38746239

ABSTRACT

Advancements in genomic and proteomic technologies have powered the use of gene and protein networks ("interactomes") for understanding genotype-phenotype translation. However, the proliferation of interactomes complicates the selection of networks for specific applications. Here, we present a comprehensive evaluation of 46 current human interactomes, encompassing protein-protein interactions as well as gene regulatory, signaling, colocalization, and genetic interaction networks. Our analysis shows that large composite networks such as HumanNet, STRING, and FunCoup are most effective for identifying disease genes, while smaller networks such as DIP and SIGNOR demonstrate strong interaction prediction performance. These findings provide a benchmark for interactomes across diverse network biology applications and clarify factors that influence network performance. Furthermore, our evaluation pipeline paves the way for continued assessment of emerging and updated interaction networks in the future.

8.
Nat Cancer ; 2024 Mar 05.
Article in English | MEDLINE | ID: mdl-38443662

ABSTRACT

Cyclin-dependent kinase 4 and 6 inhibitors (CDK4/6is) have revolutionized breast cancer therapy. However, <50% of patients have an objective response, and nearly all patients develop resistance during therapy. To elucidate the underlying mechanisms, we constructed an interpretable deep learning model of the response to palbociclib, a CDK4/6i, based on a reference map of multiprotein assemblies in cancer. The model identifies eight core assemblies that integrate rare and common alterations across 90 genes to stratify palbociclib-sensitive versus palbociclib-resistant cell lines. Predictions translate to patients and patient-derived xenografts, whereas single-gene biomarkers do not. Most predictive assemblies can be shown by CRISPR-Cas9 genetic disruption to regulate the CDK4/6i response. Validated assemblies relate to cell-cycle control, growth factor signaling and a histone regulatory complex that we show promotes S-phase entry through the activation of the histone modifiers KAT6A and TBL1XR1 and the transcription factor RUNX1. This study enables an integrated assessment of how a tumor's genetic profile modulates CDK4/6i resistance.

9.
bioRxiv ; 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38464225

ABSTRACT

Genome-wide association studies (GWAS) have identified hundreds of common variants associated with alcohol consumption. In contrast, rare variants have only begun to be studied for their role in alcohol consumption. No studies have examined whether common and rare variants implicate the same genes and molecular networks. To address this knowledge gap, we used publicly available alcohol consumption GWAS summary statistics (GSCAN, N=666,978) and whole exome sequencing data (Genebass, N=393,099) to identify a set of common and rare variants for alcohol consumption. Gene-based analysis of each dataset implicated 294 (common variants) and 35 (rare variants) genes, including the ethanol-metabolizing genes ADH1B and ADH1C, which were identified by both analyses, and ANKRD12, GIGYF1, KIF21B, and STK31, which were identified only by rare variant analysis but have been associated with related psychiatric traits. We then used a network colocalization procedure to propagate the common and rare gene sets onto a shared molecular network, revealing significant overlap. The shared network identified gene families that function in alcohol metabolism, including ADH, ALDH, CYP, and UGT. Seventy-four of the genes in the network, including EXOC2, EPM2A, CACNB3, and CACNG4, were previously implicated in comorbid psychiatric or substance use disorders but had not previously been identified for alcohol-related behaviors. Differential gene expression analysis showed enrichment in the liver and several brain regions, supporting the role of network genes in alcohol consumption. Thus, genes implicated by common and rare variants identify shared functions relevant to alcohol consumption, which also underlie psychiatric traits and substance use disorders that are comorbid with alcohol use.

10.
Cancer Discov ; 14(3): 508-523, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38236062

ABSTRACT

Rapid proliferation is a hallmark of cancer associated with sensitivity to therapeutics that cause DNA replication stress (RS). Many tumors exhibit drug resistance, however, via molecular pathways that are incompletely understood. Here, we develop an ensemble of predictive models that elucidate how cancer mutations impact the response to common RS-inducing (RSi) agents. The models implement recent advances in deep learning to facilitate multidrug prediction and mechanistic interpretation. Initial studies in tumor cells identify 41 molecular assemblies that integrate alterations in hundreds of genes for accurate drug response prediction. These cover roles in transcription, repair, cell-cycle checkpoints, and growth signaling, of which 30 are shown by loss-of-function genetic screens to regulate drug sensitivity or replication restart. The model translates to cisplatin-treated cervical cancer patients, highlighting an RTK-JAK-STAT assembly governing resistance. This study defines a compendium of mechanisms by which mutations affect therapeutic responses, with implications for precision medicine. SIGNIFICANCE: Zhao and colleagues use recent advances in machine learning to study the effects of tumor mutations on the response to common therapeutics that cause RS. The resulting predictive models integrate numerous genetic alterations distributed across a constellation of molecular assemblies, facilitating a quantitative and interpretable assessment of drug response. This article is featured in Selected Articles from This Issue, p. 384.


Subject(s)
Uterine Cervical Neoplasms; Humans; Female; Mutation; Signal Transduction; Cisplatin/pharmacology; Cisplatin/therapeutic use; Machine Learning

11.
Cell Genom ; 4(1): 100466, 2024 Jan 10.
Article in English | MEDLINE | ID: mdl-38190108

ABSTRACT

The data-intensive fields of genomics and machine learning (ML) are in an early stage of convergence. Genomics researchers increasingly seek to harness the power of ML methods to extract knowledge from their data; conversely, ML scientists recognize that genomics offers a wealth of large, complex, and well-annotated datasets that can be used as a substrate for developing biologically relevant algorithms and applications. The National Human Genome Research Institute (NHGRI) inquired with researchers working in these two fields to identify common challenges and receive recommendations to better support genomic research efforts using ML approaches. Those included increasing the amount and variety of training datasets by integrating genomic with multiomics, context-specific (e.g., by cell type), and social determinants of health datasets; reducing the inherent biases of training datasets; prioritizing transparency and interpretability of ML methods; and developing privacy-preserving technologies for research participants' data.


Subject(s)
Bioethics; Genomics; Humans; Algorithms; Privacy; Machine Learning

12.
bioRxiv ; 2024 Feb 08.
Article in English | MEDLINE | ID: mdl-38076945

ABSTRACT

Translating high-confidence (hc) autism spectrum disorder (ASD) genes into viable treatment targets remains elusive. We constructed a foundational protein-protein interaction (PPI) network in HEK293T cells involving 100 hcASD risk genes, revealing over 1,800 PPIs (87% novel). Interactors, expressed in the human brain and enriched for ASD but not schizophrenia genetic risk, converged on protein complexes involved in neurogenesis, tubulin biology, transcriptional regulation, and chromatin modification. A PPI map of 54 patient-derived missense variants identified differential physical interactions, and we leveraged AlphaFold-Multimer predictions to prioritize direct PPIs and specific variants for interrogation in Xenopus tropicalis and human forebrain organoids. A mutation in the transcription factor FOXP1 led to reconfiguration of DNA binding sites and altered development of deep cortical layer neurons in forebrain organoids. This work offers new insights into molecular mechanisms underlying ASD and describes a powerful platform to develop and test therapeutic strategies for many genetically-defined conditions.

13.
ArXiv ; 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-37731657

ABSTRACT

Gene set analysis is a mainstay of functional genomics, but it relies on curated databases of gene functions that are incomplete. Here we evaluate five Large Language Models (LLMs) for their ability to discover the common biological functions represented by a gene set, substantiated by supporting rationale, citations and a confidence assessment. Benchmarking against canonical gene sets from the Gene Ontology, GPT-4 confidently recovered the curated name or a more general concept (73% of cases), while benchmarking against random gene sets correctly yielded zero confidence. Gemini-Pro and Mixtral-Instruct showed ability in naming but were falsely confident for random sets, whereas Llama2-70b had poor performance overall. In gene sets derived from 'omics data, GPT-4 identified novel functions not reported by classical functional enrichment (32% of cases), which independent review indicated were largely verifiable and not hallucinations. The ability to rapidly synthesize common gene functions positions LLMs as valuable 'omics assistants.

14.
Pac Symp Biocomput ; 29: 661-665, 2024.
Article in English | MEDLINE | ID: mdl-38160316

ABSTRACT

Cells consist of large components, such as organelles, that recursively factor into smaller systems, such as condensates and protein complexes, forming a dynamic multi-scale structure of the cell. Recent technological innovations have paved the way for systematic interrogation of subcellular structures, yielding unprecedented insights into their roles and interactions. In this workshop, we discuss progress, challenges, and collaboration to marshal various computational approaches toward assembling an integrated structural map of the human cell.


Subject(s)
Computational Biology; Organelles; Humans; Organelles/chemistry; Organelles/metabolism; Organelles/ultrastructure

15.
bioRxiv ; 2023 Dec 09.
Article in English | MEDLINE | ID: mdl-38106096

ABSTRACT

DNA methylation marks have recently been used to build models known as "epigenetic clocks" which predict calendar age. As methylation of cytosine promotes C-to-T mutations, we hypothesized that the methylation changes observed with age should reflect the accrual of somatic mutations, and the two should yield analogous aging estimates. In analysis of multimodal data from 9,331 human individuals, we find that CpG mutations indeed coincide with changes in methylation, not only at the mutated site but also with pervasive remodeling of the methylome out to ±10 kilobases. This one-to-many mapping enables mutation-based predictions of age that agree with epigenetic clocks, including which individuals are aging faster or slower than expected. Moreover, genomic loci where mutations accumulate with age also tend to have methylation patterns that are especially predictive of age. These results suggest a close coupling between the accumulation of sporadic somatic mutations and the widespread changes in methylation observed over the course of life.

16.
Res Sq ; 2023 Sep 18.
Article in English | MEDLINE | ID: mdl-37790547

ABSTRACT

Gene set analysis is a mainstay of functional genomics, but it relies on manually curated databases of gene functions that are incomplete and unaware of biological context. Here we evaluate the ability of OpenAI's GPT-4, a Large Language Model (LLM), to develop hypotheses about common gene functions from its embedded biomedical knowledge. We created a GPT-4 pipeline to label gene sets with names that summarize their consensus functions, substantiated by analysis text and citations. Benchmarking against named gene sets in the Gene Ontology, GPT-4 generated very similar names in 50% of cases, while in most remaining cases it recovered the name of a more general concept. In gene sets discovered in 'omics data, GPT-4 names were more informative than gene set enrichment, with supporting statements and citations that were largely verified in human review. The ability to rapidly synthesize common gene functions positions LLMs as valuable functional genomics assistants.

17.
bioRxiv ; 2023 Sep 22.
Article in English | MEDLINE | ID: mdl-37786690

ABSTRACT

Desmosomes are transmembrane protein complexes that contribute to cell-cell adhesion in epithelia and other tissues. Here, we report the discovery of frequent genetic alterations in the desmosome in human cancers, with the strongest signal seen in cutaneous melanoma where desmosomes are mutated in over 70% of cases. In primary but not metastatic melanoma biopsies, the burden of coding mutations on desmosome genes associates with a strong reduction in desmosome gene expression. Analysis by spatial transcriptomics suggests that these expression decreases occur in keratinocytes in the microenvironment rather than in primary melanoma tumor cells. In further support of a microenvironmental origin, we find that loss-of-function knockdowns of the desmosome in keratinocytes yield markedly increased proliferation of adjacent melanocytes in keratinocyte/melanocyte co-cultures. Thus, gradual accumulation of desmosome mutations in neighboring cells may prime melanocytes for neoplastic transformation.

18.
Cell Rep ; 42(8): 112873, 2023 Aug 29.
Article in English | MEDLINE | ID: mdl-37527041

ABSTRACT

A vexing observation in genome-wide association studies (GWASs) is that parallel analyses in different species may not identify orthologous genes. Here, we demonstrate that cross-species translation of GWASs can be greatly improved by an analysis of co-localization within molecular networks. Using body mass index (BMI) as an example, we show that the genes associated with BMI in humans lack significant agreement with those identified in rats. However, the networks interconnecting these genes show substantial overlap, highlighting common mechanisms including synaptic signaling, epigenetic modification, and hormonal regulation. Genetic perturbations within these networks cause abnormal BMI phenotypes in mice, too, supporting their broad conservation across mammals. Other mechanisms appear species specific, including carbohydrate biosynthesis (humans) and glycerolipid metabolism (rodents). Finally, network co-localization also identifies cross-species convergence for height/body length. This study advances a general paradigm for determining whether and how phenotypes measured in model species recapitulate human biology.


Subject(s)
Body Mass Index; Gene Regulatory Networks; Genome-Wide Association Study; Humans; Animals; Rats; Body Size; Mice; Species Specificity

19.
Cancer Discov ; 13(10): 2270-2291, 2023 Oct 05.
Article in English | MEDLINE | ID: mdl-37553760

ABSTRACT

Oncogenes can initiate tumors only in certain cellular contexts, which is referred to as oncogenic competence. In melanoma, whether cells in the microenvironment can endow such competence remains unclear. Using a combination of zebrafish transgenesis coupled with human tissues, we demonstrate that GABAergic signaling between keratinocytes and melanocytes promotes melanoma initiation by BRAFV600E. GABA is synthesized in melanoma cells and then acts on GABA-A receptors in keratinocytes. Electron microscopy demonstrates specialized cell-cell junctions between keratinocytes and melanoma cells, and multielectrode array analysis shows that GABA acts to inhibit electrical activity in melanoma/keratinocyte cocultures. Genetic and pharmacologic perturbation of GABA synthesis abrogates melanoma initiation in vivo. These data suggest that GABAergic signaling across the skin microenvironment regulates the ability of oncogenes to initiate melanoma. SIGNIFICANCE: This study shows evidence of GABA-mediated regulation of electrical activity between melanoma cells and keratinocytes, providing a new mechanism by which the microenvironment promotes tumor initiation. This provides insights into the role of the skin microenvironment in early melanomas while identifying GABA as a potential therapeutic target in melanoma. See related commentary by Ceol, p. 2128. This article is featured in Selected Articles from This Issue, p. 2109.


Subject(s)
Melanoma; Animals; Humans; Melanoma/drug therapy; Melanoma/genetics; Melanoma/pathology; Zebrafish; Melanocytes/pathology; Skin; Keratinocytes; Cell Transformation, Neoplastic/genetics; gamma-Aminobutyric Acid; Tumor Microenvironment

20.
bioRxiv ; 2023 Aug 04.
Article in English | MEDLINE | ID: mdl-37577681

ABSTRACT

Understanding the consequences of single amino acid substitutions in cancer driver genes remains an unmet need. Perturb-seq provides a tool to investigate the effects of individual mutations on cellular programs. Here we deploy SEUSS, a Perturb-seq-like approach, to generate and assay mutations at physical interfaces of the RUNX1 Runt domain. We measured the impact of 115 mutations on RNA profiles in single myelogenous leukemia cells and used the profiles to categorize mutations into three functionally distinct groups: wild-type (WT)-like, loss-of-function (LOF)-like and hypomorphic. Notably, the largest concentration of functional (non-WT-like) mutations clustered at the DNA binding site and contained many of the more frequently observed mutations in human cancers. Hypomorphic variants shared characteristics with loss-of-function variants but had gene expression profiles indicative of response to neural growth factor and cytokine recruitment of neutrophils. Additionally, DNA accessibility changes upon perturbations were enriched for RUNX1 binding motifs, particularly near differentially expressed genes. Overall, our work demonstrates the potential of targeting protein interaction interfaces to better define the landscape of prospective phenotypes reachable by amino acid substitutions.
