1.
bioRxiv ; 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39091825

ABSTRACT

The subcellular localization of a protein is important for its function and interaction with other molecules, and its mislocalization is linked to numerous diseases. While atlas-scale efforts have been made to profile protein localization across various cell lines, existing datasets only contain limited pairs of proteins and cell lines which do not cover all human proteins. We present a method that uses both protein sequences and cellular landmark images to perform Predictions of Unseen Proteins' Subcellular localization (PUPS), which can generalize to both proteins and cell lines not used for model training. PUPS combines a protein language model and an image inpainting model to utilize both protein sequence and cellular images for protein localization prediction. The protein sequence input enables generalization to unseen proteins and the cellular image input enables cell type specific prediction that captures single-cell variability. PUPS' ability to generalize to unseen proteins and cell lines enables us to assess the variability in protein localization across cell lines as well as across single cells within a cell line and to identify the biological processes associated with the proteins that have variable localization. Experimental validation shows that PUPS can be used to predict protein localization in newly performed experiments outside of the Human Protein Atlas used for training. Collectively, PUPS utilizes both protein sequences and cellular images to predict protein localization in unseen proteins and cell lines with the ability to capture single-cell variability.

2.
Nat Commun ; 15(1): 6112, 2024 Jul 20.
Article in English | MEDLINE | ID: mdl-39030176

ABSTRACT

Ductal carcinoma in situ (DCIS) is a pre-invasive tumor that can progress to invasive breast cancer, a leading cause of cancer death. We generate a large-scale tissue microarray dataset of chromatin images, from 560 samples from 122 female patients in 3 disease stages and 11 phenotypic categories. Using representation learning on chromatin images alone, without multiplexed staining or high-throughput sequencing, we identify eight morphological cell states and tissue features marking DCIS. All cell states are observed in all disease stages with different proportions, indicating that cell states enriched in invasive cancer exist in small fractions in normal breast tissue. Tissue-level analysis reveals significant changes in the spatial organization of cell states across disease stages, which is predictive of disease stage and phenotypic category. Taken together, we show that chromatin imaging represents a powerful measure of cell state and disease stage of DCIS, providing a simple and effective tumor biomarker.


Subjects
Breast Neoplasms; Carcinoma, Intraductal, Noninfiltrating; Chromatin; Humans; Female; Carcinoma, Intraductal, Noninfiltrating/pathology; Carcinoma, Intraductal, Noninfiltrating/genetics; Carcinoma, Intraductal, Noninfiltrating/metabolism; Chromatin/metabolism; Breast Neoplasms/pathology; Breast Neoplasms/genetics; Breast Neoplasms/metabolism; Biomarkers, Tumor/metabolism; Biomarkers, Tumor/genetics; Unsupervised Machine Learning; Image Processing, Computer-Assisted/methods; Tissue Array Analysis; Neoplasm Staging

3.
bioRxiv ; 2024 Apr 07.
Article in English | MEDLINE | ID: mdl-38617272

ABSTRACT

Ebola virus (EBOV) is a high-consequence filovirus that gives rise to frequent epidemics with high case fatality rates and few therapeutic options. Here, we applied image-based screening of a genome-wide CRISPR library to systematically identify host cell regulators of Ebola virus infection in 39,085,093 single cells. Measuring viral RNA and protein levels together with their localization in cells identified over 998 related host factors and provided detailed information about the role of each gene across the virus replication cycle. We trained a deep learning model on single-cell images to associate each host factor with predicted replication steps, and confirmed the predicted relationship for select host factors. Among the findings, we showed that the mitochondrial complex III subunit UQCRB is a post-entry regulator of Ebola virus RNA replication, and demonstrated that UQCRB inhibition with a small molecule reduced overall Ebola virus infection with an IC50 of 5 µM. Using a random forest model, we also identified perturbations that reduced infection by disrupting the equilibrium between viral RNA and protein. One such protein, STRAP, is a spliceosome-associated factor that was found to be closely associated with VP35, a viral protein required for RNA processing. Loss of STRAP expression resulted in a reduction in full-length viral genome production and subsequent production of non-infectious virus particles. Overall, the data produced in this genome-wide high-content single-cell screen and secondary screens in additional cell lines and related filoviruses (MARV and SUDV) revealed new insights about the role of host factors in virus replication and potential new targets for therapeutic intervention.

4.
bioRxiv ; 2023 Dec 05.
Article in English | MEDLINE | ID: mdl-38106093

ABSTRACT

Synthetic lethality refers to a genetic interaction where the simultaneous perturbation of gene pairs leads to cell death. Synthetically lethal gene pairs (SL pairs) provide a potential avenue for selectively targeting cancer cells based on genetic vulnerabilities. The rise of large-scale gene perturbation screens such as the Cancer Dependency Map (DepMap) offers the opportunity to identify SL pairs automatically using machine learning. We build on a recently developed class of feature learning kernel machines known as Recursive Feature Machines (RFMs) to develop a pipeline for identifying SL pairs based on CRISPR viability data from DepMap. In particular, we first train RFMs to predict viability scores for a given CRISPR gene knockout from cell line embeddings consisting of gene expression and mutation features. After training, RFMs use a statistical operator known as average gradient outer product to provide weights for each feature indicating the importance of each feature in predicting cellular viability. We subsequently apply correlation-based filters to re-weight RFM feature importances and identify those features that are most indicative of low cellular viability. Our resulting pipeline is computationally efficient, taking under 3 minutes for analyzing all 17,453 knockouts from DepMap for candidate SL pairs. We show that our pipeline more accurately recovers experimentally verified SL pairs than prior approaches. Moreover, our pipeline finds new candidate SL pairs, thereby opening novel avenues for identifying genetic vulnerabilities in cancer.
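
The average gradient outer product mentioned in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: it assumes a plain Gaussian-kernel ridge regressor fitted once (no recursive re-fitting), synthetic data in place of DepMap features, and hypothetical function names. For a fitted predictor f, the operator is the mean over samples of grad f(x) grad f(x)^T, and its diagonal is read here as a per-feature importance weight.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth):
    # Pairwise Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2)).
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * bandwidth**2))

def fit_kernel_ridge(X, y, bandwidth=2.0, ridge=1e-2):
    # Solve (K + ridge * I) alpha = y for the dual coefficients.
    K = gaussian_kernel(X, X, bandwidth)
    return np.linalg.solve(K + ridge * np.eye(len(X)), y)

def agop(X, alpha, bandwidth=2.0):
    # f(x) = sum_i alpha_i k(x, x_i), so
    # grad f(x_j) = sum_i alpha_i k(x_j, x_i) (x_i - x_j) / bandwidth^2.
    W = gaussian_kernel(X, X, bandwidth) * alpha[None, :]
    grads = (W @ X - W.sum(1, keepdims=True) * X) / bandwidth**2
    # Average the outer products of the gradients over all samples.
    return grads.T @ grads / len(X)

# Synthetic stand-in data (assumption): only feature 0 drives the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0]

alpha = fit_kernel_ridge(X, y)
M = agop(X, alpha)
importance = np.diag(M)  # per-feature weights
print(np.round(importance / importance.sum(), 3))
```

In this toy setting the weight for feature 0 should dominate, since the target depends on that feature alone.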

5.
bioRxiv ; 2023 Dec 08.
Article in English | MEDLINE | ID: mdl-38106037

ABSTRACT

Proteins on the cell membrane cluster to respond to extracellular signals; for example, adhesion proteins cluster to enhance extracellular matrix sensing; or T-cell receptors cluster to enhance antigen sensing. Importantly, the maturation of such receptor clusters requires transcriptional control to adapt and reinforce the extracellular signal sensing. However, it has been unclear how such efficient clustering mechanisms are encoded at the level of the genes that code for these receptor proteins. Using the adhesome as an example, we show that genes that code for adhesome receptor proteins are spatially co-localized and co-regulated within the cell nucleus. Towards this, we use Hi-C maps combined with RNA-seq data of adherent cells to map the correspondence between adhesome receptor proteins and their associated genes. Interestingly, we find that the transcription factors that regulate these genes are also co-localized with the adhesome gene loci, thereby potentially facilitating a transcriptional reinforcement of the extracellular matrix sensing machinery. Collectively, our results highlight an important layer of transcriptional control of cellular signal sensing.

6.
ArXiv ; 2023 Dec 12.
Article in English | MEDLINE | ID: mdl-38168456

ABSTRACT

Protein-ligand binding prediction is a fundamental problem in AI-driven drug discovery. Prior work focused on supervised learning methods using a large set of binding affinity data for small molecules, but it is hard to apply the same strategy to other drug classes like antibodies as labelled data is limited. In this paper, we explore unsupervised approaches and reformulate binding energy prediction as a generative modeling task. Specifically, we train an energy-based model on a set of unlabelled protein-ligand complexes using SE(3) denoising score matching and interpret its log-likelihood as binding affinity. Our key contribution is a new equivariant rotation prediction network called Neural Euler's Rotation Equations (NERE) for SE(3) score matching. It predicts a rotation by modeling the force and torque between protein and ligand atoms, where the force is defined as the gradient of an energy function with respect to atom coordinates. We evaluate NERE on protein-ligand and antibody-antigen binding affinity prediction benchmarks. Our model outperforms all unsupervised baselines (physics-based and statistical potentials) and matches supervised learning methods in the antibody case.
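
The force and torque construction described in this abstract can be sketched on a toy system. Everything below is an illustrative assumption rather than the NERE model: the energy is a fixed Gaussian pairwise well instead of a learned network, force is taken as the negative gradient of the energy (the usual physics sign convention), atoms have unit mass, and the angular velocity is obtained from the torque through the inertia tensor of a rigid ligand.

```python
import numpy as np

def toy_energy(ligand, protein):
    # Hypothetical pairwise energy: a sum of Gaussian wells between all atom pairs.
    diff = ligand[:, None, :] - protein[None, :, :]
    return -np.exp(-(diff**2).sum(-1) / 2).sum()

def forces_on_ligand(ligand, protein):
    # Analytic gradient of toy_energy with respect to ligand coordinates,
    # negated to give the force on each ligand atom.
    diff = ligand[:, None, :] - protein[None, :, :]
    w = np.exp(-(diff**2).sum(-1) / 2)
    grad = (w[:, :, None] * diff).sum(1)
    return -grad

def torque_and_angular_velocity(ligand, forces):
    # Torque about the ligand centroid: sum_i r_i x f_i.
    r = ligand - ligand.mean(0)
    torque = np.cross(r, forces).sum(0)
    # Inertia tensor for unit masses: sum_i (|r_i|^2 I - r_i r_i^T).
    inertia = (r**2).sum() * np.eye(3) - r.T @ r
    omega = np.linalg.solve(inertia, torque)
    return torque, omega

# Random stand-in coordinates (assumption), not real structures.
rng = np.random.default_rng(1)
ligand = rng.normal(size=(6, 3))
protein = rng.normal(size=(10, 3)) + 1.0

forces = forces_on_ligand(ligand, protein)
torque, omega = torque_and_angular_velocity(ligand, forces)
print("torque:", np.round(torque, 4))
print("angular velocity:", np.round(omega, 4))
```

A rotation of the ligand would then follow from the angular velocity; how that step is parameterized in the paper is not stated in the abstract, so it is left out here.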
