Results 1 - 20 of 31
1.
Nat Rev Mol Cell Biol ; 18(12): 717-727, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29044247

ABSTRACT

It is well established that cells sense chemical signals from their local microenvironment and transduce them to the nucleus to regulate gene expression programmes. Although a number of experiments have shown that mechanical cues can also modulate gene expression, the underlying mechanisms are far from clear. Nevertheless, we are now beginning to understand how mechanical cues are transduced to the nucleus and how they influence nuclear mechanics, genome organization and transcription. In particular, recent progress in super-resolution imaging, in genome-wide application of RNA sequencing, chromatin immunoprecipitation and chromosome conformation capture and in theoretical modelling of 3D genome organization enables the exploration of the relationship between cell mechanics, 3D chromatin configurations and transcription, thereby shedding new light on how mechanical forces regulate gene expression.


Subject(s)
Chromatin Assembly and Disassembly/physiology , Chromatin/physiology , Human Genome/physiology , Cellular Mechanotransduction/physiology , Genetic Models , Animals , Humans
3.
Proc Natl Acad Sci U S A ; 120(14): e2208779120, 2023 04 04.
Article in English | MEDLINE | ID: mdl-36996114

ABSTRACT

While neural networks are used for classification tasks across domains, a long-standing open problem in machine learning is determining whether neural networks trained using standard procedures are consistent for classification, i.e., whether such models minimize the probability of misclassification for arbitrary data distributions. In this work, we identify and construct an explicit set of neural network classifiers that are consistent. Since effective neural networks in practice are typically both wide and deep, we analyze infinitely wide networks that are also infinitely deep. In particular, using the recent connection between infinitely wide neural networks and neural tangent kernels, we provide explicit activation functions that can be used to construct networks that achieve consistency. Interestingly, these activation functions are simple and easy to implement, yet differ from commonly used activations such as ReLU or sigmoid. More generally, we create a taxonomy of infinitely wide and deep networks and show that these models implement one of three well-known classifiers depending on the activation function used: 1) 1-nearest neighbor (model predictions are given by the label of the nearest training example); 2) majority vote (model predictions are given by the label of the class with the greatest representation in the training set); or 3) singular kernel classifiers (a set of classifiers containing those that achieve consistency). Our results highlight the benefit of using deep networks for classification tasks, in contrast to regression tasks, where excessive depth is harmful.


Subject(s)
Machine Learning , Computer Neural Networks
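
The two simplest classifiers named in the abstract above, 1-nearest neighbor and majority vote, can be sketched as follows. This is only a minimal illustration of the definitions given in parentheses in the abstract, not the authors' infinitely wide and deep networks; the function names, the Euclidean distance, and the toy data are assumptions made for the example.

```python
from collections import Counter
import math


def one_nearest_neighbor(train_x, train_y, x):
    """Return the label of the training example closest to x.

    Euclidean distance is an assumption made for this sketch.
    """
    nearest = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return train_y[nearest]


def majority_vote(train_y):
    """Return the label with the greatest representation in the training set.

    The prediction ignores the query point entirely.
    """
    return Counter(train_y).most_common(1)[0][0]


# Hypothetical toy training set: one example of class "a", two of class "b".
train_x = [(0.0, 0.0), (1.0, 1.0), (1.2, 0.9)]
train_y = ["a", "b", "b"]

query = (0.1, 0.0)
nn_label = one_nearest_neighbor(train_x, train_y, query)
mv_label = majority_vote(train_y)
```

For a query close to the single "a" example the two rules disagree, which shows why the choice of limiting classifier matters.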
4.
Proc Natl Acad Sci U S A ; 119(16): e2115064119, 2022 04 19.
Article in English | MEDLINE | ID: mdl-35412891

ABSTRACT

Matrix completion problems arise in many applications including recommendation systems, computer vision, and genomics. Increasingly larger neural networks have been successful in many of these applications but at considerable computational costs. Remarkably, taking the width of a neural network to infinity allows for improved computational performance. In this work, we develop an infinite width neural network framework for matrix completion that is simple, fast, and flexible. Simplicity and speed come from the connection between the infinite width limit of neural networks and kernels known as neural tangent kernels (NTK). In particular, we derive the NTK for fully connected and convolutional neural networks for matrix completion. The flexibility stems from a feature prior, which allows encoding relationships between coordinates of the target matrix, akin to semisupervised learning. The effectiveness of our framework is demonstrated through competitive results for virtual drug screening and image inpainting/reconstruction. We also provide an implementation in Python to make our framework accessible on standard hardware to a broad audience.


Subject(s)
Computer-Assisted Image Processing , Computer Neural Networks , Computers , Computer-Assisted Image Processing/methods , Machine Learning , Supervised Machine Learning
5.
Cell ; 134(3): 416-26, 2008 Aug 08.
Article in English | MEDLINE | ID: mdl-18692465

ABSTRACT

A complete mitochondrial (mt) genome sequence was reconstructed from a 38,000-year-old Neandertal individual with 8341 mtDNA sequences identified among 4.8 Gb of DNA generated from approximately 0.3 g of bone. Analysis of the assembled sequence unequivocally establishes that the Neandertal mtDNA falls outside the variation of extant human mtDNAs, and allows an estimate of the divergence date between the two mtDNA lineages of 660,000 ± 140,000 years. Of the 13 proteins encoded in the mtDNA, subunit 2 of cytochrome c oxidase of the mitochondrial electron transport chain has experienced the largest number of amino acid substitutions in human ancestors since the separation from Neandertals. There is evidence that purifying selection in the Neandertal mtDNA was reduced compared with other primate lineages, suggesting that the effective population size of Neandertals was small.


Subject(s)
Molecular Evolution , Fossils , Hominidae/genetics , DNA Sequence Analysis/methods , Animals , Base Sequence , Bones/metabolism , Croatia , Cyclooxygenase 2/chemistry , Mitochondrial DNA/genetics , Mitochondrial Genome , Humans , Molecular Models , Molecular Sequence Data
6.
Proc Natl Acad Sci U S A ; 117(44): 27162-27170, 2020 11 03.
Article in English | MEDLINE | ID: mdl-33067397

ABSTRACT

Identifying computational mechanisms for memorization and retrieval of data is a long-standing problem at the intersection of machine learning and neuroscience. Our main finding is that standard overparameterized deep neural networks trained using standard optimization methods implement such a mechanism for real-valued data. We provide empirical evidence that 1) overparameterized autoencoders store training samples as attractors and thus iterating the learned map leads to sample recovery, and that 2) the same mechanism allows for encoding sequences of examples and serves as an even more efficient mechanism for memory than autoencoding. Theoretically, we prove that when trained on a single example, autoencoders store the example as an attractor. Lastly, by treating a sequence encoder as a composition of maps, we prove that sequence encoding provides a more efficient mechanism for memory than autoencoding.


Subject(s)
Computational Biology/methods , Memory/physiology , Computer Neural Networks , Machine Learning , Nonlinear Dynamics
7.
Bioinformatics ; 37(18): 3067-3069, 2021 09 29.
Article in English | MEDLINE | ID: mdl-33704425

ABSTRACT

SUMMARY: Designing interventions to control gene regulation necessitates modeling a gene regulatory network by a causal graph. Currently, large-scale gene expression datasets from different conditions, cell types, disease states, and developmental time points are being collected. However, application of classical causal inference algorithms to infer gene regulatory networks based on such data is still challenging, requiring high sample sizes and computational resources. Here, we describe an algorithm that efficiently learns the differences in gene regulatory mechanisms between different conditions. Our difference causal inference (DCI) algorithm infers changes (i.e. edges that appeared, disappeared, or changed weight) between two causal graphs given gene expression data from the two conditions. This algorithm is efficient in its use of samples and computation since it infers the differences between causal graphs directly without estimating each possibly large causal graph separately. We provide a user-friendly Python implementation of DCI and also enable the user to learn the most robust difference causal graph across different tuning parameters via stability selection. Finally, we show how to apply DCI to single-cell RNA-seq data from different conditions and cell states, and we also validate our algorithm by predicting the effects of interventions. AVAILABILITY AND IMPLEMENTATION: Python package freely available at http://uhlerlab.github.io/causaldag/dci. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Gene Regulatory Networks , Gene Expression Regulation
8.
PLoS Comput Biol ; 16(4): e1007828, 2020 04.
Article in English | MEDLINE | ID: mdl-32343706

ABSTRACT

Lineage tracing involves the identification of all ancestors and descendants of a given cell, and is an important tool for studying biological processes such as development and disease progression. However, in many settings, controlled time-course experiments are not feasible, for example when working with tissue samples from patients. Here we present ImageAEOT, a computational pipeline based on autoencoders and optimal transport for predicting the lineages of cells using time-labeled datasets from different stages of a cellular process. Given a single-cell image from one of the stages, ImageAEOT generates an artificial lineage of this cell based on the population characteristics of the other stages. These lineages can be used to connect subpopulations of cells through the different stages and identify image-based features and biomarkers underlying the biological process. To validate our method, we apply ImageAEOT to a benchmark task based on nuclear and chromatin images during the activation of fibroblasts by tumor cells in engineered 3D tissues. We further validate ImageAEOT on chromatin images of various breast cancer cell lines and human tissue samples, thereby linking alterations in chromatin condensation patterns to different stages of tumor progression. Our results demonstrate the promise of computational methods based on autoencoding and optimal transport principles for lineage tracing in settings where existing experimental strategies cannot be used.


Subject(s)
Cell Lineage , Computational Biology/methods , Single-Cell Analysis/methods , Breast Neoplasms , Cell Differentiation/physiology , Tumor Cell Line , Cell Nucleus/physiology , Chromatin/physiology , Coculture Techniques , Female , Humans , Computer-Assisted Image Processing , Reproducibility of Results
9.
Curr Opin Solid State Mater Sci ; 25(1): 100874, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33519291

ABSTRACT

In this Current Opinion, we highlight the importance of the material properties of tissues and how alterations therein, which influence epithelial-to-mesenchymal transitions, represent an important layer of regulation in a number of diseases and potentially also play a critical role in host-pathogen interactions. In light of the current SARS-CoV-2 pandemic, we here highlight the possible role of lung tissue stiffening with ageing and how this might facilitate increased SARS-CoV-2 replication through matrix-stiffness dependent epithelial-to-mesenchymal transitions of the lung epithelium. This emphasizes the need for integrating material properties of tissues in drug discovery programs.

10.
Proc Natl Acad Sci U S A ; 114(52): 13714-13719, 2017 12 26.
Article in English | MEDLINE | ID: mdl-29229825

ABSTRACT

The 3D structure of the genome plays a key role in regulatory control of the cell. Experimental methods such as high-throughput chromosome conformation capture (Hi-C) have been developed to probe the 3D structure of the genome. However, it remains a challenge to deduce from these data chromosome regions that are colocalized and coregulated. Here, we present an integrative approach that leverages 1D functional genomic features (e.g., epigenetic marks) with 3D interactions from Hi-C data to identify functional interchromosomal interactions. We construct a weighted network with 250-kb genomic regions as nodes and Hi-C interactions as edges, where the edge weights are given by the correlation between 1D genomic features. Individual interacting clusters are determined using weighted correlation clustering on the network. We show that intermingling regions generally fall into either active or inactive clusters based on the enrichment for RNA polymerase II (RNAPII) and H3K9me3, respectively. We show that active clusters are hotspots for transcription factor binding sites. We also validate our predictions experimentally by 3D fluorescence in situ hybridization (FISH) experiments and show that active RNAPII is enriched in predicted active clusters. Our method provides a general quantitative framework that couples 1D genomic features with 3D interactions from Hi-C to probe the guiding principles that link the spatial organization of the genome with regulatory control.


Subject(s)
Human Chromosomes , DNA Sequence Analysis/methods , Genetic Transcription/physiology , Animals , Human Chromosomes/genetics , Human Chromosomes/metabolism , Humans
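
The network construction described in the abstract above (genomic regions as nodes, Hi-C interactions as edges, edge weights given by the correlation between 1D genomic features) might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' pipeline: the function name `build_weighted_network`, the use of Pearson correlation, and the toy feature matrix are all hypothetical, and the subsequent weighted correlation clustering step is not shown.

```python
import numpy as np


def build_weighted_network(features, hic_pairs):
    """Build a weighted edge list from 1D features and Hi-C contacts.

    features:  array of shape (n_regions, n_features), one row of 1D genomic
               features (e.g. epigenetic mark signals) per genomic region.
    hic_pairs: iterable of (i, j) region index pairs that interact in Hi-C.

    Returns a dict mapping each (i, j) pair to the Pearson correlation of the
    two regions' feature vectors (Pearson is an assumption of this sketch).
    """
    corr = np.corrcoef(features)  # rows are treated as variables
    return {(i, j): float(corr[i, j]) for i, j in hic_pairs}


# Hypothetical toy data: 3 regions, 4 features each.
features = np.array([
    [1.0, 2.0, 3.0, 4.0],   # region 0
    [2.0, 4.0, 6.0, 8.0],   # region 1: same profile as region 0, scaled
    [4.0, 3.0, 2.0, 1.0],   # region 2: opposite profile
])
hic_pairs = [(0, 1), (0, 2)]

edges = build_weighted_network(features, hic_pairs)
```

Edges with positive weights connect regions with similar feature profiles and edges with negative weights connect dissimilar ones, which is the kind of signed input that correlation clustering operates on.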
11.
J Biomed Inform ; 50: 133-41, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24509073

ABSTRACT

The protection of privacy of individual-level information in genome-wide association study (GWAS) databases has been a major concern of researchers following the publication of "an attack" on GWAS data by Homer et al. (2008). Traditional statistical methods for confidentiality and privacy protection of statistical databases do not scale well to deal with GWAS data, especially in terms of guarantees regarding protection from linkage to external information. The more recent concept of differential privacy, introduced by the cryptographic community, is an approach that provides a rigorous definition of privacy with meaningful privacy guarantees in the presence of arbitrary external information, although the guarantees may come at a serious price in terms of data utility. Building on such notions, Uhler et al. (2013) proposed new methods to release aggregate GWAS data without compromising an individual's privacy. We extend the methods developed in Uhler et al. (2013) for releasing differentially-private χ²-statistics by allowing for an arbitrary number of cases and controls, and for releasing differentially-private allelic test statistics. We also provide a new interpretation by assuming the controls' data are known, which is a realistic assumption because some GWAS use publicly available data as controls. We assess the performance of the proposed methods through a risk-utility analysis on a real data set consisting of DNA samples collected by the Wellcome Trust Case Control Consortium and compare the methods with the differentially-private release mechanism proposed by Johnson and Shmatikov (2013).


Subject(s)
Genome-Wide Association Study , Privacy , Crohn Disease/genetics , Humans
12.
Aging Cell ; 23(3): e14056, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38062919

ABSTRACT

Human life expectancy is constantly increasing and aging has become a major risk factor for many diseases, although the underlying gene regulatory mechanisms are still unclear. Using transcriptomic and chromosomal conformation capture (Hi-C) data from human skin fibroblasts from individuals across different age groups, we identified a tight coupling between the changes in co-regulation and co-localization of genes. We obtained transcription factors, cofactors, and chromatin regulators that could drive the cellular aging process by developing a time-course prize-collecting Steiner tree algorithm. In particular, by combining RNA-Seq data from different age groups and protein-protein interaction data we determined the key transcription regulators and gene regulatory changes at different life stage transitions. We then mapped these transcription regulators to the 3D reorganization of chromatin in young and old skin fibroblasts. Collectively, we identified key transcription regulators whose target genes are spatially rearranged and correlate with changes in their expression, thereby providing potential targets for reverting cellular aging.


Subject(s)
Chromatin , Transcription Factors , Humans , Chromatin/genetics , Transcription Factors/metabolism , Gene Expression Regulation , Cellular Senescence/genetics , Gene Expression Profiling
13.
bioRxiv ; 2024 Apr 07.
Article in English | MEDLINE | ID: mdl-38617272

ABSTRACT

Ebola virus (EBOV) is a high-consequence filovirus that gives rise to frequent epidemics with high case fatality rates and few therapeutic options. Here, we applied image-based screening of a genome-wide CRISPR library to systematically identify host cell regulators of Ebola virus infection in 39,085,093 single cells. Measuring viral RNA and protein levels together with their localization in cells identified over 998 related host factors and provided detailed information about the role of each gene across the virus replication cycle. We trained a deep learning model on single-cell images to associate each host factor with predicted replication steps, and confirmed the predicted relationship for select host factors. Among the findings, we showed that the mitochondrial complex III subunit UQCRB is a post-entry regulator of Ebola virus RNA replication, and demonstrated that UQCRB inhibition with a small molecule reduced overall Ebola virus infection with an IC50 of 5 µM. Using a random forest model, we also identified perturbations that reduced infection by disrupting the equilibrium between viral RNA and protein. One such protein, STRAP, is a spliceosome-associated factor that was found to be closely associated with VP35, a viral protein required for RNA processing. Loss of STRAP expression resulted in a reduction in full-length viral genome production and subsequent production of non-infectious virus particles. Overall, the data produced in this genome-wide high-content single-cell screen and secondary screens in additional cell lines and related filoviruses (MARV and SUDV) revealed new insights about the role of host factors in virus replication and potential new targets for therapeutic intervention.

14.
bioRxiv ; 2023 Dec 05.
Article in English | MEDLINE | ID: mdl-38106093

ABSTRACT

Synthetic lethality refers to a genetic interaction where the simultaneous perturbation of gene pairs leads to cell death. Synthetically lethal gene pairs (SL pairs) provide a potential avenue for selectively targeting cancer cells based on genetic vulnerabilities. The rise of large-scale gene perturbation screens such as the Cancer Dependency Map (DepMap) offers the opportunity to identify SL pairs automatically using machine learning. We build on a recently developed class of feature learning kernel machines known as Recursive Feature Machines (RFMs) to develop a pipeline for identifying SL pairs based on CRISPR viability data from DepMap. In particular, we first train RFMs to predict viability scores for a given CRISPR gene knockout from cell line embeddings consisting of gene expression and mutation features. After training, RFMs use a statistical operator known as average gradient outer product to provide weights for each feature indicating the importance of each feature in predicting cellular viability. We subsequently apply correlation-based filters to re-weight RFM feature importances and identify those features that are most indicative of low cellular viability. Our resulting pipeline is computationally efficient, taking under 3 minutes for analyzing all 17,453 knockouts from DepMap for candidate SL pairs. We show that our pipeline more accurately recovers experimentally verified SL pairs than prior approaches. Moreover, our pipeline finds new candidate SL pairs, thereby opening novel avenues for identifying genetic vulnerabilities in cancer.
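
The average gradient outer product mentioned in the abstract above can be illustrated with a short sketch. It computes the mean over samples of the outer product of a predictor's gradient with itself and reads per-feature importances off the diagonal. This is not the RFM pipeline: the predictor `f(x) = sin(x[0]) + 0.1 * x[1]`, its hand-written gradient `grad_f`, and the random inputs are hypothetical stand-ins for a trained kernel machine and real cell line embeddings.

```python
import numpy as np


def agop(grad_fn, X):
    """Average gradient outer product of a scalar predictor over samples X.

    grad_fn: function mapping a sample of shape (d,) to its gradient (d,).
    X:       array of shape (n, d).
    Returns the (d, d) matrix (1/n) * sum_i grad(x_i) grad(x_i)^T.
    """
    grads = np.stack([grad_fn(x) for x in X])  # shape (n, d)
    return grads.T @ grads / len(X)


def grad_f(x):
    """Gradient of the hypothetical predictor sin(x[0]) + 0.1 * x[1].

    The third feature is unused, so its gradient entry is always zero.
    """
    return np.array([np.cos(x[0]), 0.1, 0.0])


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))    # hypothetical inputs, 3 features

M = agop(grad_f, X)
importance = np.diag(M)          # one non-negative weight per feature
```

In this toy setting the unused third feature receives zero weight and the weakly used second feature receives a small weight, which is the sense in which the diagonal indicates feature importance.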

15.
Nat Commun ; 14(1): 5570, 2023 Sep 09.
Article in English | MEDLINE | ID: mdl-37689796

ABSTRACT

Transfer learning refers to the process of adapting a model trained on a source task to a target task. While kernel methods are conceptually and computationally simple models that are competitive on a variety of tasks, it has been unclear how to develop scalable kernel-based transfer learning methods across general source and target tasks with possibly differing label dimensions. In this work, we propose a transfer learning framework for kernel methods by projecting and translating the source model to the target task. We demonstrate the effectiveness of our framework in applications to image classification and virtual drug screening. For both applications, we identify simple scaling laws that characterize the performance of transfer-learned kernels as a function of the number of target examples. We explain this phenomenon in a simplified linear setting, where we are able to derive the exact scaling laws.

16.
ArXiv ; 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38076509

ABSTRACT

High-throughput drug screening -- using cell imaging or gene expression measurements as readouts of drug effect -- is a critical tool in biotechnology to assess and understand the relationship between the chemical structure and biological activity of a drug. Since large-scale screens have to be divided into multiple experiments, a key difficulty is dealing with batch effects, which can introduce systematic errors and non-biological associations in the data. We propose InfoCORE, an Information maximization approach for COnfounder REmoval, to effectively deal with batch effects and obtain refined molecular representations. InfoCORE establishes a variational lower bound on the conditional mutual information of the latent representations given a batch identifier. It adaptively reweighs samples to equalize their implied batch distribution. Extensive experiments on drug screening data reveal InfoCORE's superior performance in a multitude of tasks including molecular property prediction and molecule-phenotype retrieval. Additionally, we show results for how InfoCORE offers a versatile framework and resolves general distribution shifts and issues of data fairness by minimizing correlation with spurious features or removing sensitive attributes. The code is available at https://github.com/uhlerlab/InfoCORE.

17.
bioRxiv ; 2023 Dec 08.
Article in English | MEDLINE | ID: mdl-38106037

ABSTRACT

Proteins on the cell membrane cluster to respond to extracellular signals; for example, adhesion proteins cluster to enhance extracellular matrix sensing; or T-cell receptors cluster to enhance antigen sensing. Importantly, the maturation of such receptor clusters requires transcriptional control to adapt and reinforce the extracellular signal sensing. However, it has been unclear how such efficient clustering mechanisms are encoded at the level of the genes that code for these receptor proteins. Using the adhesome as an example, we show that genes that code for adhesome receptor proteins are spatially co-localized and co-regulated within the cell nucleus. Towards this, we use Hi-C maps combined with RNA-seq data of adherent cells to map the correspondence between adhesome receptor proteins and their associated genes. Interestingly, we find that the transcription factors that regulate these genes are also co-localized with the adhesome gene loci, thereby potentially facilitating a transcriptional reinforcement of the extracellular matrix sensing machinery. Collectively, our results highlight an important layer of transcriptional control of cellular signal sensing.

18.
ArXiv ; 2023 Dec 12.
Article in English | MEDLINE | ID: mdl-38168456

ABSTRACT

Protein-ligand binding prediction is a fundamental problem in AI-driven drug discovery. Prior work focused on supervised learning methods using a large set of binding affinity data for small molecules, but it is hard to apply the same strategy to other drug classes like antibodies as labelled data is limited. In this paper, we explore unsupervised approaches and reformulate binding energy prediction as a generative modeling task. Specifically, we train an energy-based model on a set of unlabelled protein-ligand complexes using SE(3) denoising score matching and interpret its log-likelihood as binding affinity. Our key contribution is a new equivariant rotation prediction network called Neural Euler's Rotation Equations (NERE) for SE(3) score matching. It predicts a rotation by modeling the force and torque between protein and ligand atoms, where the force is defined as the gradient of an energy function with respect to atom coordinates. We evaluate NERE on protein-ligand and antibody-antigen binding affinity prediction benchmarks. Our model outperforms all unsupervised baselines (physics-based and statistical potentials) and matches supervised learning methods in the antibody case.

19.
Nat Commun ; 14(1): 2436, 2023 04 28.
Article in English | MEDLINE | ID: mdl-37105979

ABSTRACT

A fundamental challenge in diagnostics is integrating multiple modalities to develop a joint characterization of physiological state. Using the heart as a model system, we develop a cross-modal autoencoder framework for integrating distinct data modalities and constructing a holistic representation of cardiovascular state. In particular, we use our framework to construct such cross-modal representations from cardiac magnetic resonance images (MRIs), containing structural information, and electrocardiograms (ECGs), containing myoelectric information. We leverage the learned cross-modal representation to (1) improve phenotype prediction from a single, accessible phenotype such as ECGs; (2) enable imputation of hard-to-acquire cardiac MRIs from easy-to-acquire ECGs; and (3) develop a framework for performing genome-wide association studies in an unsupervised manner. Our results systematically integrate distinct diagnostic modalities into a common representation that better characterizes physiologic state.


Subject(s)
Cardiovascular System , Genome-Wide Association Study , Heart/diagnostic imaging , Cardiovascular System/diagnostic imaging , Electrocardiography , Learning
20.
bioRxiv ; 2023 Jan 24.
Article in English | MEDLINE | ID: mdl-36747789

ABSTRACT

E3 ligases regulate key processes, but many of their roles remain unknown. Using Perturb-seq, we interrogated the function of 1,130 E3 ligases, partners and substrates in the inflammatory response in primary dendritic cells (DCs). Dozens impacted the balance of DC1, DC2, migratory DC and macrophage states and a gradient of DC maturation. Family members grouped into co-functional modules that were enriched for physical interactions and impacted specific programs through substrate transcription factors. E3s and their adaptors co-regulated the same processes, but partnered with different substrate recognition adaptors to impact distinct aspects of the DC life cycle. Genetic interactions were more prevalent within than between modules, and a deep learning model, comβVAE, predicts the outcome of new combinations by leveraging modularity. The E3 regulatory network was associated with heritable variation and aberrant gene expression in immune cells in human inflammatory diseases. Our study provides a general approach to dissect gene function.
