Results 1-20 of 31
1.
bioRxiv ; 2024 Apr 07.
Article in English | MEDLINE | ID: mdl-38617272

ABSTRACT

Ebola virus (EBOV) is a high-consequence filovirus that gives rise to frequent epidemics with high case fatality rates and few therapeutic options. Here, we applied image-based screening of a genome-wide CRISPR library to systematically identify host cell regulators of Ebola virus infection in 39,085,093 single cells. Measuring viral RNA and protein levels together with their localization in cells identified over 998 related host factors and provided detailed information about the role of each gene across the virus replication cycle. We trained a deep learning model on single-cell images to associate each host factor with predicted replication steps, and confirmed the predicted relationship for select host factors. Among the findings, we showed that the mitochondrial complex III subunit UQCRB is a post-entry regulator of Ebola virus RNA replication, and demonstrated that UQCRB inhibition with a small molecule reduced overall Ebola virus infection with an IC50 of 5 µM. Using a random forest model, we also identified perturbations that reduced infection by disrupting the equilibrium between viral RNA and protein. One such factor, STRAP, is a spliceosome-associated protein that was found to be closely associated with VP35, a viral protein required for RNA processing. Loss of STRAP expression reduced the production of full-length viral genomes and subsequently led to the production of non-infectious virus particles. Overall, the data produced in this genome-wide high-content single-cell screen, together with secondary screens in additional cell lines and related filoviruses (MARV and SUDV), revealed new insights into the role of host factors in virus replication and potential new targets for therapeutic intervention.

3.
Aging Cell ; 23(3): e14056, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38062919

ABSTRACT

Human life expectancy is constantly increasing, and aging has become a major risk factor for many diseases, although the underlying gene regulatory mechanisms are still unclear. Using transcriptomic and chromosomal conformation capture (Hi-C) data from human skin fibroblasts from individuals across different age groups, we identified a tight coupling between the changes in co-regulation and co-localization of genes. By developing a time-course prize-collecting Steiner tree algorithm, we identified transcription factors, cofactors, and chromatin regulators that could drive the cellular aging process. In particular, by combining RNA-Seq data from different age groups with protein-protein interaction data, we determined the key transcription regulators and gene regulatory changes at different life stage transitions. We then mapped these transcription regulators to the 3D reorganization of chromatin in young and old skin fibroblasts. Collectively, we identified key transcription regulators whose target genes are spatially rearranged and correlate with changes in their expression, thereby providing potential targets for reverting cellular aging.


Subjects
Chromatin , Transcription Factors , Humans , Chromatin/genetics , Transcription Factors/metabolism , Gene Expression Regulation , Cellular Senescence/genetics , Gene Expression Profiling

4.
ArXiv ; 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38076509

ABSTRACT

High-throughput drug screening, using cell imaging or gene expression measurements as readouts of drug effect, is a critical tool in biotechnology to assess and understand the relationship between the chemical structure and biological activity of a drug. Since large-scale screens have to be divided into multiple experiments, a key difficulty is dealing with batch effects, which can introduce systematic errors and non-biological associations in the data. We propose InfoCORE, an Information maximization approach for COnfounder REmoval, to effectively deal with batch effects and obtain refined molecular representations. InfoCORE establishes a variational lower bound on the conditional mutual information of the latent representations given a batch identifier. It adaptively reweights samples to equalize their implied batch distribution. Extensive experiments on drug screening data reveal InfoCORE's superior performance in a multitude of tasks, including molecular property prediction and molecule-phenotype retrieval. Additionally, we show that InfoCORE offers a versatile framework that resolves general distribution shifts and issues of data fairness by minimizing correlation with spurious features or removing sensitive attributes. The code is available at https://github.com/uhlerlab/InfoCORE.

5.
bioRxiv ; 2023 Dec 08.
Article in English | MEDLINE | ID: mdl-38106037

ABSTRACT

Proteins on the cell membrane cluster to respond to extracellular signals; for example, adhesion proteins cluster to enhance extracellular matrix sensing, and T-cell receptors cluster to enhance antigen sensing. Importantly, the maturation of such receptor clusters requires transcriptional control to adapt and reinforce extracellular signal sensing. However, it has been unclear how such efficient clustering mechanisms are encoded at the level of the genes that code for these receptor proteins. Using the adhesome as an example, we show that genes that code for adhesome receptor proteins are spatially co-localized and co-regulated within the cell nucleus. To this end, we use Hi-C maps combined with RNA-seq data of adherent cells to map the correspondence between adhesome receptor proteins and their associated genes. Interestingly, we find that the transcription factors that regulate these genes are also co-localized with the adhesome gene loci, potentially facilitating transcriptional reinforcement of the extracellular matrix sensing machinery. Collectively, our results highlight an important layer of transcriptional control of cellular signal sensing.

6.
bioRxiv ; 2023 Dec 05.
Article in English | MEDLINE | ID: mdl-38106093

ABSTRACT

Synthetic lethality refers to a genetic interaction where the simultaneous perturbation of gene pairs leads to cell death. Synthetically lethal gene pairs (SL pairs) provide a potential avenue for selectively targeting cancer cells based on genetic vulnerabilities. The rise of large-scale gene perturbation screens such as the Cancer Dependency Map (DepMap) offers the opportunity to identify SL pairs automatically using machine learning. We build on a recently developed class of feature learning kernel machines known as Recursive Feature Machines (RFMs) to develop a pipeline for identifying SL pairs based on CRISPR viability data from DepMap. In particular, we first train RFMs to predict viability scores for a given CRISPR gene knockout from cell line embeddings consisting of gene expression and mutation features. After training, RFMs use a statistical operator known as the average gradient outer product (AGOP) to assign each feature a weight indicating its importance in predicting cellular viability. We subsequently apply correlation-based filters to re-weight RFM feature importances and identify those features that are most indicative of low cellular viability. Our resulting pipeline is computationally efficient, taking under 3 minutes to analyze all 17,453 knockouts from DepMap for candidate SL pairs. We show that our pipeline more accurately recovers experimentally verified SL pairs than prior approaches. Moreover, our pipeline finds new candidate SL pairs, thereby opening novel avenues for identifying genetic vulnerabilities in cancer.
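
The average gradient outer product can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' RFM pipeline: it assumes a predictor whose gradient is known in closed form (the hypothetical toy model `grad_f` below, not a trained RFM) and computes AGOP = (1/n) sum_i grad f(x_i) grad f(x_i)^T, whose diagonal serves as a per-feature importance score.

```python
import numpy as np

def agop(grad_fn, X):
    """Average gradient outer product: (1/n) * sum_i g_i g_i^T with g_i = grad f(x_i)."""
    G = np.stack([grad_fn(x) for x in X])  # shape (n, d): one gradient per sample
    return G.T @ G / X.shape[0]            # shape (d, d)

# Hypothetical toy predictor f(x) = 2*x0 - x1 + 0.5*x2**2; feature x3 is never used.
def grad_f(x):
    return np.array([2.0, -1.0, x[2], 0.0])

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
M = agop(grad_f, X)
importance = np.diag(M)  # per-feature importance = diagonal of the AGOP
print(np.round(importance, 2))
```

The unused feature receives zero importance, the linear features receive their squared coefficients, and the quadratic feature receives the sample mean of x2^2. In an RFM, the AGOP is computed from the fitted kernel predictor rather than from a hand-written gradient.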

7.
Nat Commun ; 14(1): 5570, 2023 Sep 09.
Article in English | MEDLINE | ID: mdl-37689796

ABSTRACT

Transfer learning refers to the process of adapting a model trained on a source task to a target task. While kernel methods are conceptually and computationally simple models that are competitive on a variety of tasks, it has been unclear how to develop scalable kernel-based transfer learning methods across general source and target tasks with possibly differing label dimensions. In this work, we propose a transfer learning framework for kernel methods by projecting and translating the source model to the target task. We demonstrate the effectiveness of our framework in applications to image classification and virtual drug screening. For both applications, we identify simple scaling laws that characterize the performance of transfer-learned kernels as a function of the number of target examples. We explain this phenomenon in a simplified linear setting, where we are able to derive the exact scaling laws.
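
The abstract does not spell out the construction, so the sketch below is one possible reading of "projecting and translating" the source model, under our own assumptions: a Laplace-kernel ridge regressor is trained on a synthetic source task, a least-squares linear map (the "projection") carries its outputs to target labels of a different dimension, and a second kernel regressor (the "translation") fits the remaining target residual. The kernel, regularization, data, and helper names are all hypothetical placeholders.

```python
import numpy as np

def laplace_kernel(A, B, bandwidth=1.0):
    # exp(-||a - b|| / bandwidth) for every pair of rows in A and B
    dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return np.exp(-dists / bandwidth)

def fit_krr(X, Y, reg=1e-3):
    """Kernel ridge regression; returns a function mapping inputs Z to predictions."""
    alpha = np.linalg.solve(laplace_kernel(X, X) + reg * np.eye(len(X)), Y)
    return lambda Z: laplace_kernel(Z, X) @ alpha

def with_bias(F):
    # Append a constant column so the projection can include an offset.
    return np.hstack([F, np.ones((len(F), 1))])

rng = np.random.default_rng(0)

# Synthetic source task: 3 outputs and plenty of labels.
Xs = rng.normal(size=(200, 5))
Ys = np.stack([np.sin(Xs[:, 0]), Xs[:, 1] ** 2, Xs[:, 2]], axis=1)
source_model = fit_krr(Xs, Ys)

# Synthetic target task: 2 outputs and only 30 labels.
Xt = rng.normal(size=(30, 5))
Yt = np.stack([np.sin(Xt[:, 0]) + 0.1 * Xt[:, 3], Xt[:, 2] - Xt[:, 1] ** 2], axis=1)

# Projection: least-squares map from source-model outputs to target labels.
P, *_ = np.linalg.lstsq(with_bias(source_model(Xt)), Yt, rcond=None)

def projected(Z):
    return with_bias(source_model(Z)) @ P

# Translation: a second kernel regressor fits what the projection misses.
correction = fit_krr(Xt, Yt - projected(Xt))

def transferred(Z):
    return projected(Z) + correction(Z)

def train_mse(model):
    return np.mean((model(Xt) - Yt) ** 2)

print(f"projection only: {train_mse(projected):.4f}")
print(f"projection + translation: {train_mse(transferred):.4f}")
```

Both errors are measured on the 30 target training examples, so the comparison only shows that the translation step absorbs residual the projection cannot express; held-out data would be needed to study how performance scales with the number of target examples.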

8.
Nat Commun ; 14(1): 2436, 2023 04 28.
Article in English | MEDLINE | ID: mdl-37105979

ABSTRACT

A fundamental challenge in diagnostics is integrating multiple modalities to develop a joint characterization of physiological state. Using the heart as a model system, we develop a cross-modal autoencoder framework for integrating distinct data modalities and constructing a holistic representation of cardiovascular state. In particular, we use our framework to construct such cross-modal representations from cardiac magnetic resonance images (MRIs), containing structural information, and electrocardiograms (ECGs), containing myoelectric information. We leverage the learned cross-modal representation to (1) improve phenotype prediction from a single, accessible phenotype such as ECGs; (2) enable imputation of hard-to-acquire cardiac MRIs from easy-to-acquire ECGs; and (3) develop a framework for performing genome-wide association studies in an unsupervised manner. Our results systematically integrate distinct diagnostic modalities into a common representation that better characterizes physiologic state.


Subjects
Cardiovascular System , Genome-Wide Association Study , Heart/diagnostic imaging , Cardiovascular System/diagnostic imaging , Electrocardiography , Learning

9.
Proc Natl Acad Sci U S A ; 120(14): e2208779120, 2023 04 04.
Article in English | MEDLINE | ID: mdl-36996114

ABSTRACT

While neural networks are used for classification tasks across domains, a long-standing open problem in machine learning is determining whether neural networks trained using standard procedures are consistent for classification, i.e., whether such models minimize the probability of misclassification for arbitrary data distributions. In this work, we identify and construct an explicit set of neural network classifiers that are consistent. Since effective neural networks in practice are typically both wide and deep, we analyze infinitely wide networks that are also infinitely deep. In particular, using the recent connection between infinitely wide neural networks and neural tangent kernels, we provide explicit activation functions that can be used to construct networks that achieve consistency. Interestingly, these activation functions are simple and easy to implement, yet differ from commonly used activations such as ReLU or sigmoid. More generally, we create a taxonomy of infinitely wide and deep networks and show that these models implement one of three well-known classifiers depending on the activation function used: 1) 1-nearest neighbor (model predictions are given by the label of the nearest training example); 2) majority vote (model predictions are given by the label of the class with the greatest representation in the training set); or 3) singular kernel classifiers (a set of classifiers containing those that achieve consistency). Our results highlight the benefit of using deep networks for classification tasks, in contrast to regression tasks, where excessive depth is harmful.
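
The three limiting classifiers in this taxonomy can be written down directly. The NumPy sketch below implements each one on a toy two-class data set with labels in {-1, +1}; the singular kernel k(x, x') = ||x - x'||^(-alpha) with alpha = 2 is an illustrative choice of ours, and the sketch does not construct the infinitely wide and deep networks themselves.

```python
import numpy as np

def one_nearest_neighbor(X_train, y_train, x):
    # Label of the closest training example.
    return y_train[np.argmin(np.linalg.norm(X_train - x, axis=1))]

def majority_vote(y_train):
    # Most common training label, independent of the query point.
    labels, counts = np.unique(y_train, return_counts=True)
    return labels[np.argmax(counts)]

def singular_kernel_classifier(X_train, y_train, x, alpha=2.0):
    # Weighted vote with weights ||x - x_i||^(-alpha), which blow up near training points.
    # Labels are assumed to be in {-1, +1}.
    dist = np.linalg.norm(X_train - x, axis=1)
    if np.any(dist == 0):
        return y_train[np.argmin(dist)]  # interpolate exactly at a training point
    return np.sign(np.sum(y_train * dist ** (-alpha)))

# Toy data: three -1 points clustered near the origin, one +1 point far away.
X_train = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.1], [3.0, 3.0]])
y_train = np.array([-1, -1, -1, 1])
x = np.array([2.6, 2.6])  # query close to the lone +1 point

print(one_nearest_neighbor(X_train, y_train, x),
      majority_vote(y_train),
      singular_kernel_classifier(X_train, y_train, x))
```

For this query, 1-nearest neighbor and the singular kernel classifier both follow the nearby +1 point, while majority vote ignores the query and returns the dominant class -1.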


Subjects
Machine Learning , Neural Networks, Computer

10.
bioRxiv ; 2023 Jan 24.
Article in English | MEDLINE | ID: mdl-36747789

ABSTRACT

E3 ligases regulate key processes, but many of their roles remain unknown. Using Perturb-seq, we interrogated the function of 1,130 E3 ligases, partners and substrates in the inflammatory response in primary dendritic cells (DCs). Dozens impacted the balance of DC1, DC2, migratory DC and macrophage states and a gradient of DC maturation. Family members grouped into co-functional modules that were enriched for physical interactions and impacted specific programs through substrate transcription factors. E3s and their adaptors co-regulated the same processes, but partnered with different substrate recognition adaptors to impact distinct aspects of the DC life cycle. Genetic interactions were more prevalent within than between modules, and a deep learning model, comβVAE, predicts the outcome of new combinations by leveraging modularity. The E3 regulatory network was associated with heritable variation and aberrant gene expression in immune cells in human inflammatory diseases. Our study provides a general approach to dissect gene function.

11.
ArXiv ; 2023 Dec 12.
Article in English | MEDLINE | ID: mdl-38168456

ABSTRACT

Protein-ligand binding prediction is a fundamental problem in AI-driven drug discovery. Prior work focused on supervised learning methods using a large set of binding affinity data for small molecules, but it is hard to apply the same strategy to other drug classes like antibodies as labelled data is limited. In this paper, we explore unsupervised approaches and reformulate binding energy prediction as a generative modeling task. Specifically, we train an energy-based model on a set of unlabelled protein-ligand complexes using SE(3) denoising score matching and interpret its log-likelihood as binding affinity. Our key contribution is a new equivariant rotation prediction network called Neural Euler's Rotation Equations (NERE) for SE(3) score matching. It predicts a rotation by modeling the force and torque between protein and ligand atoms, where the force is defined as the gradient of an energy function with respect to atom coordinates. We evaluate NERE on protein-ligand and antibody-antigen binding affinity prediction benchmarks. Our model outperforms all unsupervised baselines (physics-based and statistical potentials) and matches supervised learning methods in the antibody case.
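
The force-and-torque idea can be illustrated with standard rigid-body mechanics. In the sketch below, per-atom forces are simply given (in NERE they would come from the gradient of a learned energy), the ligand atoms are treated as unit masses, and the rotation is read off as the angular acceleration of a body at rest, I^{-1} tau, from Euler's rotation equations. This is a physics illustration under those assumptions, not the NERE network.

```python
import numpy as np

def rotation_from_forces(coords, forces):
    """Net torque and the angular acceleration I^{-1} tau of a rigid body at rest.

    coords, forces: (n_atoms, 3) arrays; every atom is assumed to have unit mass.
    """
    r = coords - coords.mean(axis=0)              # positions relative to the center of mass
    torque = np.cross(r, forces).sum(axis=0)      # tau = sum_i r_i x f_i
    # Inertia tensor I = sum_i (|r_i|^2 * Id - r_i r_i^T)
    inertia = sum(np.dot(ri, ri) * np.eye(3) - np.outer(ri, ri) for ri in r)
    angular_accel = np.linalg.solve(inertia, torque)  # Euler's equations with omega = 0
    return torque, angular_accel

# Four atoms on a square in the xy-plane, each pushed tangentially (counterclockwise).
coords = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]])
forces = np.cross([0.0, 0.0, 1.0], coords)  # z-axis cross r gives a tangential push
torque, angular_accel = rotation_from_forces(coords, forces)
print(torque, angular_accel)
```

Rotating `coords` and `forces` together rotates the resulting torque and rotation axis in the same way, which is the kind of SE(3) equivariance the abstract refers to.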

12.
Nat Commun ; 13(1): 7480, 2022 12 03.
Article in English | MEDLINE | ID: mdl-36463283

ABSTRACT

Tissue development and disease lead to changes in cellular organization, nuclear morphology, and gene expression, which can be jointly measured by spatial transcriptomic technologies. However, methods for jointly analyzing the different spatial data modalities in 3D are still lacking. We present a computational framework to integrate Spatial Transcriptomic data using over-parameterized graph-based Autoencoders with Chromatin Imaging data (STACI) to identify molecular and functional alterations in tissues. STACI incorporates multiple modalities in a single representation for downstream tasks, enables the prediction of spatial transcriptomic data from nuclear images in unseen tissue sections, and provides built-in batch correction of gene expression and tissue morphology through over-parameterization. We apply STACI to analyze the spatio-temporal progression of Alzheimer's disease and identify the associated nuclear morphometric and coupled gene expression features. Collectively, we demonstrate the importance of characterizing disease progression by integrating multiple data modalities and its potential for the discovery of disease biomarkers.


Subjects
Alzheimer Disease , Chromatin , Humans , Chromatin/genetics , Alzheimer Disease/diagnostic imaging , Alzheimer Disease/genetics , Transcriptome/genetics , Biomarkers , Technology

13.
Sci Rep ; 12(1): 17318, 2022 10 15.
Article in English | MEDLINE | ID: mdl-36243826

ABSTRACT

Long-term sustained mechano-chemical signals in the tissue microenvironment regulate cell-state transitions. In recent work, we showed that laterally confined growth of fibroblasts induces dedifferentiation programs. However, the molecular mechanisms underlying such mechanically induced cell-state transitions are poorly understood. In this paper, we identify Lef1 as a critical somatic transcription factor for the mechanical regulation of dedifferentiation pathways. Network optimization methods applied to time-lapse RNA-seq data identify Lef1-dependent signaling as a potential regulator of such cell-state transitions. We show that Lef1 knockdown results in the downregulation of fibroblast dedifferentiation and that Lef1 directly interacts with the promoter regions of downstream reprogramming factors. We also evaluate potential upstream activation pathways of Lef1, including the Smad4, Atf2, NFkB and β-catenin pathways, and find that Smad4 and Atf2 may be critical for Lef1 activation. Collectively, we describe an important mechanotransduction pathway involving Lef1 which, upon activation through progressive lateral cell confinement, results in fibroblast dedifferentiation.


Subjects
Mechanotransduction, Cellular , beta Catenin , Cell Differentiation/genetics , Lymphoid Enhancer-Binding Factor 1/genetics , Lymphoid Enhancer-Binding Factor 1/metabolism , Signal Transduction , Transcription Factors/metabolism , beta Catenin/genetics , beta Catenin/metabolism

14.
Found Comput Math ; : 1-35, 2022 Aug 01.
Article in English | MEDLINE | ID: mdl-35935470

ABSTRACT

In this review, we discuss approaches for learning causal structure from data, also called causal discovery. In particular, we focus on approaches for learning directed acyclic graphs and various generalizations which allow for some variables to be unobserved in the available data. We devote special attention to two fundamental combinatorial aspects of causal structure learning. First, we discuss the structure of the search space over causal graphs. Second, we discuss the structure of equivalence classes over causal graphs, i.e., sets of graphs which represent what can be learned from observational data alone, and how these equivalence classes can be refined by adding interventional data.
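
As a concrete instance of the equivalence classes discussed above: for DAGs without latent variables, two graphs over the same variables are Markov equivalent exactly when they share the same skeleton and the same v-structures (Verma and Pearl's characterization). The sketch below checks this criterion for small graphs given as directed edge lists; it illustrates the criterion only and is not code from the review.

```python
from itertools import combinations

def skeleton(edges):
    # Undirected version of the edge set.
    return {frozenset(e) for e in edges}

def v_structures(edges):
    # Triples (a, c, b) with a -> c <- b where a and b are not adjacent.
    parents = {}
    for a, c in edges:
        parents.setdefault(c, set()).add(a)
    skel = skeleton(edges)
    found = set()
    for c, pa in parents.items():
        for a, b in combinations(sorted(pa), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, c, b))
    return found

def markov_equivalent(edges1, edges2):
    # Assumes both DAGs are over the same set of variables.
    return skeleton(edges1) == skeleton(edges2) and v_structures(edges1) == v_structures(edges2)

chain = [("A", "B"), ("B", "C")]     # A -> B -> C
reverse = [("C", "B"), ("B", "A")]   # A <- B <- C
fork = [("B", "A"), ("B", "C")]      # A <- B -> C
collider = [("A", "B"), ("C", "B")]  # A -> B <- C

print(markov_equivalent(chain, reverse), markov_equivalent(chain, fork),
      markov_equivalent(chain, collider))
```

The chain, its reversal, and the fork cannot be distinguished from observational data alone, while the collider falls in a different class; interventional data can refine such classes further.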

15.
Proc Natl Acad Sci U S A ; 119(16): e2115064119, 2022 04 19.
Article in English | MEDLINE | ID: mdl-35412891

ABSTRACT

Matrix completion problems arise in many applications including recommendation systems, computer vision, and genomics. Increasingly large neural networks have been successful in many of these applications, but at considerable computational cost. Remarkably, taking the width of a neural network to infinity allows for improved computational performance. In this work, we develop an infinite width neural network framework for matrix completion that is simple, fast, and flexible. Simplicity and speed come from the connection between the infinite width limit of neural networks and kernels known as neural tangent kernels (NTK). In particular, we derive the NTK for fully connected and convolutional neural networks for matrix completion. The flexibility stems from a feature prior, which allows encoding relationships between coordinates of the target matrix, akin to semisupervised learning. The effectiveness of our framework is demonstrated through competitive results for virtual drug screening and image inpainting/reconstruction. We also provide an implementation in Python to make our framework accessible on standard hardware to a broad audience.
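
To make the infinite-width connection concrete, the sketch below evaluates the closed-form NTK of a one-hidden-layer ReLU network f(x) = (1/sqrt(m)) sum_j a_j ReLU(w_j . x) with standard Gaussian initialization, both layers trained and no bias, and then uses it for kernel regression (the infinite-width limit of gradient descent on squared loss, ignoring the random output at initialization). These are our assumptions for illustration; this is not the paper's matrix-completion NTK, its feature prior, or the authors' Python package.

```python
import numpy as np

def relu_ntk(X, Z):
    """Infinite-width NTK of a two-layer bias-free ReLU network with N(0, 1) weights."""
    nx = np.linalg.norm(X, axis=1)[:, None]
    nz = np.linalg.norm(Z, axis=1)[None, :]
    dot = X @ Z.T
    cos = np.clip(dot / (nx * nz), -1.0, 1.0)  # clip guards against rounding outside [-1, 1]
    theta = np.arccos(cos)
    # Contribution from output-layer weights: E[relu(w.x) relu(w.z)]
    k_out = nx * nz * (np.sin(theta) + (np.pi - theta) * cos) / (2 * np.pi)
    # Contribution from hidden-layer weights: (x.z) * E[relu'(w.x) relu'(w.z)]
    k_hidden = dot * (np.pi - theta) / (2 * np.pi)
    return k_out + k_hidden

def ntk_regression(X_train, y_train, X_test, reg=1e-6):
    # Kernel regression with the NTK; reg adds a small ridge for numerical stability.
    K = relu_ntk(X_train, X_train)
    alpha = np.linalg.solve(K + reg * np.eye(len(X_train)), y_train)
    return relu_ntk(X_test, X_train) @ alpha

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = np.sin(X[:, 0]) + X[:, 1]
K = relu_ntk(X, X)
X_new = rng.normal(size=(5, 3))
pred = ntk_regression(X, y, X_new)
```

Because the kernel is available in closed form, no network is ever instantiated: fitting amounts to one linear solve, which is where the simplicity and speed of the infinite-width limit come from.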


Subjects
Image Processing, Computer-Assisted , Neural Networks, Computer , Computers , Image Processing, Computer-Assisted/methods , Machine Learning , Supervised Machine Learning

16.
Bioinformatics ; 37(18): 3067-3069, 2021 09 29.
Article in English | MEDLINE | ID: mdl-33704425

ABSTRACT

SUMMARY: Designing interventions to control gene regulation necessitates modeling a gene regulatory network by a causal graph. Currently, large-scale gene expression datasets from different conditions, cell types, disease states, and developmental time points are being collected. However, application of classical causal inference algorithms to infer gene regulatory networks based on such data is still challenging, requiring high sample sizes and computational resources. Here, we describe an algorithm that efficiently learns the differences in gene regulatory mechanisms between different conditions. Our difference causal inference (DCI) algorithm infers changes (i.e. edges that appeared, disappeared, or changed weight) between two causal graphs given gene expression data from the two conditions. This algorithm is efficient in its use of samples and computation since it infers the differences between causal graphs directly without estimating each possibly large causal graph separately. We provide a user-friendly Python implementation of DCI and also enable the user to learn the most robust difference causal graph across different tuning parameters via stability selection. Finally, we show how to apply DCI to single-cell RNA-seq data from different conditions and cell states, and we also validate our algorithm by predicting the effects of interventions. AVAILABILITY AND IMPLEMENTATION: Python package freely available at http://uhlerlab.github.io/causaldag/dci. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
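
The output that DCI targets, namely the edges that appeared, disappeared, or changed weight between two conditions, can be made concrete with a small helper. This only illustrates that definition: it assumes both weighted adjacency matrices are already known, which is exactly what DCI avoids by estimating the difference directly from data, and the function name `difference_edges` and its tolerance are hypothetical. For DCI itself, see the Python package at the URL given in the abstract.

```python
import numpy as np

def difference_edges(B1, B2, tol=1e-8):
    """Classify edges i -> j between two conditions.

    B1, B2: weighted adjacency matrices, where B[i, j] != 0 means an edge i -> j.
    """
    appeared, disappeared, changed = [], [], []
    d = B1.shape[0]
    for i in range(d):
        for j in range(d):
            w1, w2 = B1[i, j], B2[i, j]
            in1, in2 = abs(w1) > tol, abs(w2) > tol
            if in2 and not in1:
                appeared.append((i, j))
            elif in1 and not in2:
                disappeared.append((i, j))
            elif in1 and in2 and abs(w1 - w2) > tol:
                changed.append((i, j))
    return appeared, disappeared, changed

# Toy 3-gene example: 0 -> 1 changes weight, 1 -> 2 disappears, 0 -> 2 appears.
B_cond1 = np.array([[0.0, 0.8, 0.0],
                    [0.0, 0.0, 0.5],
                    [0.0, 0.0, 0.0]])
B_cond2 = np.array([[0.0, 0.3, 0.7],
                    [0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0]])
appeared, disappeared, changed = difference_edges(B_cond1, B_cond2)
print(appeared, disappeared, changed)
```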


Subjects
Algorithms , Gene Regulatory Networks , Gene Expression Regulation

17.
Curr Opin Solid State Mater Sci ; 25(1): 100874, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33519291

ABSTRACT

In this Current Opinion, we highlight the importance of the material properties of tissues and how alterations therein, which influence epithelial-to-mesenchymal transitions, represent an important layer of regulation in a number of diseases and potentially also play a critical role in host-pathogen interactions. In light of the current SARS-CoV-2 pandemic, we here highlight the possible role of lung tissue stiffening with ageing and how this might facilitate increased SARS-CoV-2 replication through matrix-stiffness dependent epithelial-to-mesenchymal transitions of the lung epithelium. This emphasizes the need for integrating material properties of tissues in drug discovery programs.

18.
Nat Commun ; 12(1): 1024, 2021 02 15.
Article in English | MEDLINE | ID: mdl-33589624

ABSTRACT

Given the severity of the SARS-CoV-2 pandemic, a major challenge is to rapidly repurpose existing approved drugs for clinical interventions. While a number of data-driven and experimental approaches have been suggested in the context of drug repurposing, a platform that systematically integrates available transcriptomic, proteomic and structural data is missing. More importantly, given that SARS-CoV-2 pathogenicity is highly age-dependent, it is critical to integrate aging signatures into drug discovery platforms. We here take advantage of large-scale transcriptional drug screens combined with RNA-seq data of the lung epithelium with SARS-CoV-2 infection as well as the aging lung. To identify robust druggable protein targets, we propose a principled causal framework that makes use of multiple data modalities. Our analysis highlights the importance of serine/threonine and tyrosine kinases as potential targets that intersect the SARS-CoV-2 and aging pathways. By integrating transcriptomic, proteomic and structural data that is available for many diseases, our drug discovery platform is broadly applicable. Rigorous in vitro experiments as well as clinical trials are needed to validate the identified candidate drugs.


Subjects
Aging/physiology , COVID-19 Drug Treatment , COVID-19/genetics , Drug Repositioning , A549 Cells , Algorithms , Angiotensin-Converting Enzyme 2/metabolism , Antiviral Agents/therapeutic use , COVID-19/metabolism , Drug Discovery , Gene Expression , Gene Regulatory Networks , Humans , Proteomics , SARS-CoV-2 , Transcriptome

19.
Nat Commun ; 12(1): 31, 2021 01 04.
Article in English | MEDLINE | ID: mdl-33397893

ABSTRACT

The development of single-cell methods for capturing different data modalities including imaging and sequencing has revolutionized our ability to identify heterogeneous cell states. Different data modalities provide different perspectives on a population of cells, and their integration is critical for studying cellular heterogeneity and its function. While various methods have been proposed to integrate different sequencing data modalities, coupling imaging and sequencing has been an open challenge. We here present an approach for integrating vastly different modalities by learning a probabilistic coupling between the different data modalities using autoencoders to map to a shared latent space. We validate this approach by integrating single-cell RNA-seq and chromatin images to identify distinct subpopulations of human naive CD4+ T-cells that are poised for activation. Collectively, our approach provides a framework to integrate and translate between data modalities that cannot yet be measured within the same cell for diverse applications in biomedical discovery.


Subjects
Algorithms , CD4-Positive T-Lymphocytes/immunology , Single-Cell Analysis , Cell Nucleus/metabolism , Chromatin/genetics , Gene Expression Profiling , Gene Expression Regulation , Humans , Principal Component Analysis , ROC Curve , Reproducibility of Results , Sequence Analysis, RNA

20.
Proc Natl Acad Sci U S A ; 117(44): 27162-27170, 2020 11 03.
Article in English | MEDLINE | ID: mdl-33067397

ABSTRACT

Identifying computational mechanisms for memorization and retrieval of data is a long-standing problem at the intersection of machine learning and neuroscience. Our main finding is that standard overparameterized deep neural networks trained using standard optimization methods implement such a mechanism for real-valued data. We provide empirical evidence that 1) overparameterized autoencoders store training samples as attractors and thus iterating the learned map leads to sample recovery, and that 2) the same mechanism allows for encoding sequences of examples and serves as an even more efficient mechanism for memory than autoencoding. Theoretically, we prove that when trained on a single example, autoencoders store the example as an attractor. Lastly, by treating a sequence encoder as a composition of maps, we prove that sequence encoding provides a more efficient mechanism for memory than autoencoding.
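
The retrieval mechanism described above, iterating the learned map until it settles on a stored example, is a fixed-point iteration. The sketch below uses a hand-built stand-in map (`stand_in_map`, hypothetical) that contracts toward two stored vectors rather than a trained autoencoder, so it illustrates only the retrieval loop, not the finding that trained overparameterized autoencoders produce such attractors.

```python
import numpy as np

# Two "stored" samples; in the paper's setting these would be training examples.
stored = np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0]])

def stand_in_map(x):
    # Hypothetical stand-in for a trained autoencoder: move halfway toward the
    # nearest stored sample, so each stored sample is an attracting fixed point.
    nearest = stored[np.argmin(np.linalg.norm(stored - x, axis=1))]
    return nearest + 0.5 * (x - nearest)

def retrieve(f, x0, max_iter=100, tol=1e-10):
    # Iterate x <- f(x) until the update becomes negligible.
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if np.linalg.norm(x_next - x) < tol:
            return x_next
        x = x_next
    return x

noisy = np.array([0.9, 0.2, 0.8])  # corrupted version of the first stored sample
recovered = retrieve(stand_in_map, noisy)
print(np.round(recovered, 6))
```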


Subjects
Computational Biology/methods , Memory/physiology , Neural Networks, Computer , Machine Learning , Nonlinear Dynamics