1.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38605640

ABSTRACT

Language models pretrained by self-supervised learning (SSL) have been widely utilized to study protein sequences, while few models were developed for genomic sequences and were limited to single species. Due to the lack of genomes from different species, these models cannot effectively leverage evolutionary information. In this study, we have developed SpliceBERT, a language model pretrained on primary ribonucleic acid (RNA) sequences from 72 vertebrates by masked language modeling, and applied it to sequence-based modeling of RNA splicing. Pretraining SpliceBERT on diverse species enables effective identification of evolutionarily conserved elements. Meanwhile, the learned hidden states and attention weights can characterize the biological properties of splice sites. As a result, SpliceBERT was shown to be effective on several downstream tasks: zero-shot prediction of variant effects on splicing, prediction of branchpoints in humans, and cross-species prediction of splice sites. Our study highlighted the importance of pretraining genomic language models on a diverse range of species and suggested that SSL is a promising approach to enhance our understanding of the regulatory logic underlying genomic sequences.


Subject(s)
RNA Splicing; Vertebrates; Animals; Humans; Base Sequence; Vertebrates/genetics; RNA; Supervised Machine Learning
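
The masked language modeling objective named in the abstract above can be illustrated with a minimal sketch: a random subset of nucleotides is hidden, and a model would be trained to recover them from the surrounding sequence. The 15% mask rate, the `[MASK]` token, and the function name `mask_sequence` are assumptions chosen for illustration and are not taken from the SpliceBERT paper.

```python
import random

MASK_TOKEN = "[MASK]"  # hypothetical placeholder token, not from the paper


def mask_sequence(seq, mask_rate=0.15, seed=0):
    """Hide a random subset of nucleotides and record the hidden originals.

    Returns (tokens, targets): tokens is the sequence as a list with some
    positions replaced by MASK_TOKEN; targets maps each masked position to
    the nucleotide a model would be asked to predict there.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(seq) * mask_rate))
    positions = rng.sample(range(len(seq)), n_mask)
    tokens = list(seq)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        tokens[pos] = MASK_TOKEN
    return tokens, targets


tokens, targets = mask_sequence("ACGTACGTACGTACGTACGT")
```

Only the masked positions contribute to the training loss in this kind of setup; the unmasked nucleotides serve as context.
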
2.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38982642

ABSTRACT

Inferring cell type proportions from bulk transcriptome data is crucial in immunology and oncology. Here, we introduce guided LDA deconvolution (GLDADec), a bulk deconvolution method that guides topics using cell type-specific marker gene names to estimate topic distributions for each sample. Through benchmarking using blood-derived datasets, we demonstrate its high estimation performance and robustness. Moreover, we apply GLDADec to heterogeneous tissue bulk data and perform comprehensive cell type analysis in a data-driven manner. We show that GLDADec outperforms existing methods in estimation performance and evaluate its biological interpretability by examining enrichment of biological processes for topics. Finally, we apply GLDADec to The Cancer Genome Atlas tumor samples, enabling subtype stratification and survival analysis based on estimated cell type proportions, thus proving its practical utility in clinical settings. This approach, utilizing marker gene names as partial prior information, can be applied to various scenarios for bulk data deconvolution. GLDADec is available as an open-source Python package at https://github.com/mizuno-group/GLDADec.


Subject(s)
Software; Humans; Gene Expression Profiling/methods; Algorithms; Transcriptome; Computational Biology/methods; Neoplasms/genetics; Biomarkers, Tumor/genetics; Genetic Markers
3.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38801702

ABSTRACT

Self-supervised learning plays an important role in molecular representation learning because labeled molecular data are usually limited in many tasks, such as chemical property prediction and virtual screening. However, most existing molecular pre-training methods focus on one modality of molecular data, and the complementary information of two important modalities, SMILES and graph, is not fully explored. In this study, we propose an effective multi-modality self-supervised learning framework for molecular SMILES and graph. Specifically, SMILES data and graph data are first tokenized so that they can be processed by a unified Transformer-based backbone network, which is trained by a masked reconstruction strategy. In addition, we introduce a specialized non-overlapping masking strategy to encourage fine-grained interaction between these two modalities. Experimental results show that our framework achieves state-of-the-art performance in a series of molecular property prediction tasks, and a detailed ablation study demonstrates efficacy of the multi-modality framework and the masking strategy.


Subject(s)
Supervised Machine Learning; Algorithms; Computational Biology/methods
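
The abstract above does not spell out how its non-overlapping masking strategy works. One plausible reading, sketched here purely as an assumption, is that an atom masked in the SMILES view stays visible in the graph view and vice versa, so each modality must be reconstructed with help from the other. The shared atom indexing, the 25% mask rate, and the function name `non_overlapping_masks` are hypothetical.

```python
import random


def non_overlapping_masks(n_atoms, mask_rate=0.25, seed=0):
    """Pick two disjoint sets of atom indices, one per modality.

    Assumes (hypothetically) that SMILES tokens and graph nodes share one
    atom indexing, so "non-overlapping" means no atom is hidden in both.
    """
    n_mask = int(n_atoms * mask_rate)
    if 2 * n_mask > n_atoms:
        raise ValueError("mask_rate too high for disjoint masks")
    rng = random.Random(seed)
    shuffled = rng.sample(range(n_atoms), n_atoms)  # random permutation
    smiles_mask = set(shuffled[:n_mask])
    graph_mask = set(shuffled[n_mask:2 * n_mask])
    return smiles_mask, graph_mask


smiles_mask, graph_mask = non_overlapping_masks(12)
```
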
4.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38426320

ABSTRACT

Protein subcellular localization (PSL) is essential to understanding protein function, and the movement of proteins between subcellular niches within cells plays fundamental roles in the regulation of biological processes. Mass spectrometry-based spatio-temporal proteomics technologies can provide new insights into protein translocation, but identifying reliable protein translocation events remains challenging because of noise interference and insufficient data mining. We propose a semi-supervised graph convolution network (GCN)-based framework termed TransGCN that infers protein translocation events from spatio-temporal proteomics. Based on expanded multiple distance features and joint graph representations of proteins, TransGCN utilizes the semi-supervised GCN to enable effective knowledge transfer from proteins with known PSLs for predicting protein localization and translocation. Our results demonstrate that TransGCN outperforms current state-of-the-art methods in identifying protein translocations, especially in coping with batch effects. It also exhibited excellent predictive accuracy in PSL prediction. TransGCN is freely available on GitHub at https://github.com/XuejiangGuo/TransGCN.


Subject(s)
Coping Skills; Proteomics; Data Mining; Mass Spectrometry; Protein Transport
5.
Proc Natl Acad Sci U S A ; 120(1): e2214972120, 2023 01 03.
Article in English | MEDLINE | ID: mdl-36580592

ABSTRACT

Regression learning is one of the long-standing problems in statistics, machine learning, and deep learning (DL). We show that writing this problem as a probabilistic expectation over (unknown) feature probabilities, thus increasing the number of unknown parameters and seemingly making the problem more complex, actually leads to its simplification, and allows incorporating the physical principle of entropy maximization. It helps decompose a very general setting of this learning problem (including discretization, feature selection, and learning multiple piece-wise linear regressions) into an iterative sequence of simple substeps, which are either analytically solvable or cheaply computable through an efficient second-order numerical solver with a sublinear cost scaling. This leads to the computationally cheap and robust non-DL second-order Sparse Probabilistic Approximation for Regression Task Analysis (SPARTAn) algorithm, that can be efficiently applied to problems with millions of feature dimensions on a commodity laptop, when the state-of-the-art learning tools would require supercomputers. SPARTAn is compared to a range of commonly used regression learning tools on synthetic problems and on the prediction of the El Niño Southern Oscillation, the dominant interannual mode of tropical climate variability. The obtained SPARTAn learners provide more predictive, sparse, and physically explainable data descriptions, clearly discerning the important role of ocean temperature variability at the thermocline in the equatorial Pacific. SPARTAn provides an easily interpretable description of the timescales by which these thermocline temperature features evolve and eventually express at the surface, thereby enabling enhanced predictability of the key drivers of the interannual climate.


Subject(s)
El Nino-Southern Oscillation; Tropical Climate; Entropy; Temperature; Algorithms
6.
Proc Natl Acad Sci U S A ; 120(32): e2300558120, 2023 08 08.
Article in English | MEDLINE | ID: mdl-37523562

ABSTRACT

While sensory representations in the brain depend on context, it remains unclear how such modulations are implemented at the biophysical level, and how processing layers further in the hierarchy can extract useful features for each possible contextual state. Here, we demonstrate that dendritic N-Methyl-D-Aspartate spikes can, within physiological constraints, implement contextual modulation of feedforward processing. Such neuron-specific modulations exploit prior knowledge, encoded in stable feedforward weights, to achieve transfer learning across contexts. In a network of biophysically realistic neuron models with context-independent feedforward weights, we show that modulatory inputs to dendritic branches can solve linearly nonseparable learning problems with a Hebbian, error-modulated learning rule. We also demonstrate that local prediction of whether representations originate either from different inputs, or from different contextual modulations of the same input, results in representation learning of hierarchical feedforward weights across processing layers that accommodate a multitude of contexts.


Subject(s)
Models, Neurological; N-Methylaspartate; Learning/physiology; Neurons/physiology; Perception
7.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36653906

ABSTRACT

Spatially resolved transcriptomics technologies enable comprehensive measurement of gene expression patterns in the context of intact tissues. However, existing technologies suffer from either low resolution or shallow sequencing depth. Here, we present DIST, a deep learning-based method that imputes the gene expression profiles on unmeasured locations and enhances the gene expression for both original measured spots and imputed spots by self-supervised learning and transfer learning. We evaluate the performance of DIST for imputation, clustering, differential expression analysis and functional enrichment analysis. The results show that DIST can impute the gene expression accurately, enhance the gene expression for low-quality data, and help detect more biologically meaningful differentially expressed genes and pathways, thereby allowing for deeper insights into the biological processes.


Subject(s)
Deep Learning; Transcriptome; Gene Expression Profiling/methods; Cluster Analysis
8.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36869836

ABSTRACT

The rapid development of single-cell RNA sequencing (scRNA-seq) technology allows us to study gene expression heterogeneity at the cellular level. Cell annotation is the basis for subsequent downstream analysis in single-cell data mining. As more and more well-annotated scRNA-seq reference data become available, many automatic annotation methods have sprung up in order to simplify the cell annotation process on unlabeled target data. However, existing methods rarely explore the fine-grained semantic knowledge of novel cell types absent from the reference data, and they are usually susceptible to batch effects on the classification of seen cell types. Taking into consideration the limitations above, this paper proposes a new and practical task called generalized cell type annotation and discovery for scRNA-seq data whereby target cells are labeled with either seen cell types or cluster labels, instead of a unified 'unassigned' label. To accomplish this, we carefully design a comprehensive evaluation benchmark and propose a novel end-to-end algorithmic framework called scGAD. Specifically, scGAD first builds the intrinsic correspondences on seen and novel cell types by retrieving geometrically and semantically mutual nearest neighbors as anchor pairs. Together with the similarity affinity score, a soft anchor-based self-supervised learning module is then designed to transfer the known label information from reference data to target data and aggregate the new semantic knowledge within target data in the prediction space. To enhance the inter-type separation and intra-type compactness, we further propose a confidential prototype self-supervised learning paradigm to implicitly capture the global topological structure of cells in the embedding space. Such a bidirectional dual alignment mechanism between embedding space and prediction space can better handle batch effect and cell type shift. 
Extensive results on massive simulation datasets and real datasets demonstrate the superiority of scGAD over various state-of-the-art clustering and annotation methods. We also implement marker gene identification to validate the effectiveness of scGAD in clustering novel cell types and their biological significance. To the best of our knowledge, we are the first to introduce this new and practical task and propose an end-to-end algorithmic framework to solve it. Our method scGAD is implemented in Python using the PyTorch machine-learning library, and it is freely available at https://github.com/aimeeyaoyao/scGAD.


Subject(s)
Algorithms; Gene Expression Profiling; Gene Expression Profiling/methods; Single-Cell Analysis/methods; Computer Simulation; Cluster Analysis; Sequence Analysis, RNA/methods
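
The scGAD abstract above relies on mutual nearest neighbors between reference and target cells as anchor pairs. The sketch below shows only the generic geometric idea under simple assumptions: cells are points in a small embedding space, distance is Euclidean, and a pair is kept when each cell is among the other's `k` nearest neighbors across datasets. The semantic criterion and affinity score mentioned in the abstract are not modeled, and the function names are hypothetical.

```python
import math


def k_nearest(query, points, k):
    """Indices of the k points closest to query (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda j: math.dist(query, points[j]))
    return set(order[:k])


def mutual_nearest_neighbors(reference, target, k=1):
    """Return (ref_index, target_index) pairs that are neighbors both ways."""
    ref_to_tgt = [k_nearest(r, target, k) for r in reference]
    tgt_to_ref = [k_nearest(t, reference, k) for t in target]
    return [
        (i, j)
        for i, neighbors in enumerate(ref_to_tgt)
        for j in sorted(neighbors)
        if i in tgt_to_ref[j]
    ]


reference = [(0.0, 0.0), (10.0, 10.0)]
target = [(0.0, 1.0), (10.0, 9.0), (5.0, 5.0)]
anchors = mutual_nearest_neighbors(reference, target, k=1)
```

In this toy input the third target cell has no mutual partner, which mirrors the situation where a target cell may belong to a type absent from the reference.
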
9.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37088980

ABSTRACT

Immunofluorescence patterns of anti-nuclear antibodies (ANAs) on human epithelial cell (HEp-2) substrates are important biomarkers for the diagnosis of autoimmune diseases. There are growing clinical requirements for an automatic readout and classification of ANA immunofluorescence patterns for HEp-2 images following the taxonomy recommended by the International Consensus on Antinuclear Antibody Patterns (ICAP). In this study, a comprehensive collection of HEp-2 specimen images covering a broad range of ANA patterns was established and manually annotated by experienced laboratory experts. By utilizing a supervised learning methodology, an automatic immunofluorescence pattern classification framework for HEp-2 specimen images was developed. The framework consists of a module for HEp-2 cell detection and cell-level feature extraction, followed by an image-level classifier that is capable of recognizing all 14 classes of ANA immunofluorescence patterns as recommended by ICAP. Performance analysis indicated an accuracy of 92.05% on the validation dataset and 87% on an independent test dataset, which has surpassed the performance of human examiners on the same test dataset. The proposed framework is expected to contribute to the automatic ANA pattern recognition in clinical laboratories to facilitate efficient and precise diagnosis of autoimmune diseases.


Subject(s)
Antibodies, Antinuclear; Autoimmune Diseases; Humans; Fluorescent Antibody Technique; Antibodies, Antinuclear/analysis; Autoimmune Diseases/diagnosis; Epithelial Cells; Supervised Machine Learning
10.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36592051

ABSTRACT

MOTIVATION: Molecular property prediction is a significant requirement in AI-driven drug design and discovery, aiming to predict the molecular property information (e.g. toxicity) based on the mined biomolecular knowledge. Although graph neural networks have been proven powerful in predicting molecular property, unbalanced labeled data and poor generalization capability for new-synthesized molecules are always key issues that hinder further improvement of molecular encoding performance. RESULTS: We propose a novel self-supervised representation learning scheme based on a Cascaded Attention Network and Graph Contrastive Learning (CasANGCL). We design a new graph network variant, designated as cascaded attention network, to encode local-global molecular representations. We construct a two-stage contrast predictor framework to tackle the label imbalance problem of training molecular samples, which is an integrated end-to-end learning scheme. Moreover, we utilize the information-flow scheme for training our network, which explicitly captures the edge information in the node/graph representations and obtains more fine-grained knowledge. Our model achieves an 81.9% ROC-AUC average performance on 661 tasks from seven challenging benchmarks, showing better portability and generalizations. Further visualization studies indicate our model's better representation capacity and provide interpretability.


Subject(s)
Benchmarking; Learning; Drug Design; Neural Networks, Computer
11.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37771003

ABSTRACT

A microbial community maintains its ecological dynamics via metabolite crosstalk. Hence, knowledge of the metabolome, alongside its populace, would help us understand the functionality of a community and also predict how it will change in atypical conditions. Methods that employ low-cost metagenomic sequencing data can predict the metabolic potential of a community, that is, its ability to produce or utilize specific metabolites. These, in turn, can potentially serve as markers of biochemical pathways that are associated with different communities. We developed MMIP (Microbiome Metabolome Integration Platform), a web-based analytical and predictive tool that can be used to compare the taxonomic content, diversity variation and the metabolic potential between two sets of microbial communities from targeted amplicon sequencing data. MMIP is capable of highlighting statistically significant taxonomic, enzymatic and metabolic attributes as well as learning-based features associated with one group in comparison with another. Furthermore, MMIP can predict linkages among species or groups of microbes in the community, specific enzyme profiles, compounds or metabolites associated with such a group of organisms. With MMIP, we aim to provide a user-friendly, online web server for performing key microbiome-associated analyses of targeted amplicon sequencing data, predicting metabolite signature, and using learning-based linkage analysis, without the need for initial metabolomic analysis, and thereby helping in hypothesis generation.


Subject(s)
Metabolome; Microbiota; Metabolomics/methods; Internet
12.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37930026

ABSTRACT

Artificial intelligence-based molecular property prediction plays a key role in molecular design such as bioactive molecules and functional materials. In this study, we propose a self-supervised pretraining deep learning (DL) framework, called functional group bidirectional encoder representations from transformers (FG-BERT), pretrained on ~1.45 million unlabeled drug-like molecules, to learn meaningful representations of molecules from functional groups. The pretrained FG-BERT framework can be fine-tuned to predict molecular properties. Compared to state-of-the-art (SOTA) machine learning and DL methods, we demonstrate the high performance of FG-BERT in evaluating molecular properties in tasks involving physical chemistry, biophysics and physiology across 44 benchmark datasets. In addition, FG-BERT utilizes attention mechanisms to focus on FG features that are critical to the target properties, thereby providing excellent interpretability for downstream training tasks. Collectively, FG-BERT does not require any artificially crafted features as input and has excellent interpretability, providing an out-of-the-box framework for developing SOTA models for a variety of molecule (especially for drug) discovery tasks.


Subject(s)
Algorithms; Artificial Intelligence; Benchmarking; Machine Learning
13.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38033291

ABSTRACT

Although substantial efforts have been made using graph neural networks (GNNs) for artificial intelligence (AI)-driven drug discovery, effective molecular representation learning remains an open challenge, especially in the case of insufficient labeled molecules. Recent studies suggest that big GNN models pre-trained by self-supervised learning on unlabeled datasets enable better transfer performance in downstream molecular property prediction tasks. However, the approaches in these studies require multiple complex self-supervised tasks and large-scale datasets, which are time-consuming, computationally expensive and difficult to pre-train end-to-end. Here, we design a simple yet effective self-supervised strategy to simultaneously learn local and global information about molecules, and further propose a novel bi-branch masked graph transformer autoencoder (BatmanNet) to learn molecular representations. BatmanNet features two tailored complementary and asymmetric graph autoencoders to reconstruct the missing nodes and edges, respectively, from a masked molecular graph. With this design, BatmanNet can effectively capture the underlying structure and semantic information of molecules, thus improving the performance of molecular representation. BatmanNet achieves state-of-the-art results for multiple drug discovery tasks, including molecular properties prediction, drug-drug interaction and drug-target interaction, on 13 benchmark datasets, demonstrating its great potential and superiority in molecular representation learning.


Subject(s)
Artificial Intelligence; Benchmarking; Drug Delivery Systems; Drug Discovery; Neural Networks, Computer
14.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37950905

ABSTRACT

Cancer genomics is dedicated to elucidating the genes and pathways that contribute to cancer progression and development. Identifying cancer genes (CGs) associated with the initiation and progression of cancer is critical for characterization of molecular-level mechanism in cancer research. In recent years, the growing availability of high-throughput molecular data and advancements in deep learning technologies has enabled the modelling of complex interactions and topological information within genomic data. Nevertheless, because of the limited labelled data, pinpointing CGs from a multitude of potential mutations remains an exceptionally challenging task. To address this, we propose a novel deep learning framework, termed self-supervised masked graph learning (SMG), which comprises SMG reconstruction (pretext task) and task-specific fine-tuning (downstream task). In the pretext task, the nodes of multi-omic featured protein-protein interaction (PPI) networks are randomly substituted with a defined mask token. The PPI networks are then reconstructed using the graph neural network (GNN)-based autoencoder, which explores the node correlations in a self-prediction manner. In the downstream tasks, the pre-trained GNN encoder embeds the input networks into feature graphs, whereas a task-specific layer proceeds with the final prediction. To assess the performance of the proposed SMG method, benchmarking experiments are performed on three node-level tasks (identification of CGs, essential genes and healthy driver genes) and one graph-level task (identification of disease subnetwork) across eight PPI networks. Benchmarking experiments and performance comparison with existing state-of-the-art methods demonstrate the superiority of SMG on multi-omic feature engineering.


Subject(s)
Neoplasms; Oncogenes; Mutation; Benchmarking; Genes, Essential; Genomics; Neoplasms/genetics
15.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37903413

ABSTRACT

Accurate prediction of drug-target affinity (DTA) is of vital importance in early-stage drug discovery, facilitating the identification of drugs that can effectively interact with specific targets and regulate their activities. While wet experiments remain the most reliable method, they are time-consuming and resource-intensive, resulting in limited data availability that poses challenges for deep learning approaches. Existing methods have primarily focused on developing techniques based on the available DTA data, without adequately addressing the data scarcity issue. To overcome this challenge, we present the Semi-Supervised Multi-task training (SSM) framework for DTA prediction, which incorporates three simple yet highly effective strategies: (1) A multi-task training approach that combines DTA prediction with masked language modeling using paired drug-target data. (2) A semi-supervised training method that leverages large-scale unpaired molecules and proteins to enhance drug and target representations. This approach differs from previous methods that only employed molecules or proteins in pre-training. (3) The integration of a lightweight cross-attention module to improve the interaction between drugs and targets, further enhancing prediction accuracy. Through extensive experiments on benchmark datasets such as BindingDB, DAVIS and KIBA, we demonstrate the superior performance of our framework. Additionally, we conduct case studies on specific drug-target binding activities, virtual screening experiments, drug feature visualizations and real-world applications, all of which showcase the significant potential of our work. In conclusion, our proposed SSM-DTA framework addresses the data limitation challenge in DTA prediction and yields promising results, paving the way for more efficient and accurate drug discovery processes.


Subject(s)
Benchmarking; Drug Discovery; Drug Delivery Systems
16.
Methods ; 229: 115-124, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38950719

ABSTRACT

Single-cell RNA-sequencing (scRNA-seq) enables the investigation of intricate mechanisms governing cell heterogeneity and diversity. Clustering analysis remains a pivotal tool in scRNA-seq for discerning cell types. However, persistent challenges arise from noise, high dimensionality, and dropout in single-cell data. Despite the proliferation of scRNA-seq clustering methods, these often focus on extracting representations from individual cell expression data, neglecting potential intercellular relationships. To overcome this limitation, we introduce scGAAC, a novel clustering method based on an attention-based graph convolutional autoencoder. By leveraging structural information between cells through a graph attention autoencoder, scGAAC uncovers latent relationships while extracting representation information from single-cell gene expression patterns. An attention fusion module amalgamates the learned features of the graph attention autoencoder and the autoencoder through attention weights. Ultimately, a self-supervised learning policy guides model optimization. scGAAC, a hypothesis-free framework, performs better on four real scRNA-seq datasets than most state-of-the-art methods. The scGAAC implementation is publicly available on GitHub at: https://github.com/labiip/scGAAC.


Subject(s)
Sequence Analysis, RNA; Single-Cell Analysis; Single-Cell Analysis/methods; Humans; Cluster Analysis; Sequence Analysis, RNA/methods; RNA-Seq/methods; Algorithms; Software
17.
Methods ; 230: 140-146, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39179191

ABSTRACT

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a single-stranded RNA virus, which mainly causes respiratory and enteric diseases and is responsible for the outbreak of coronavirus disease 2019 (COVID-19). Numerous studies have demonstrated that SARS-CoV-2 infection will lead to a significant dysregulation of protein post-translational modification profile in human cells. The accurate recognition of phosphorylation sites in host cells will contribute to a deep understanding of the pathogenic mechanisms of SARS-CoV-2 and also help to screen drugs and compounds with antiviral potential. Therefore, there is a need to develop cost-effective and high-precision computational strategies for specifically identifying SARS-CoV-2-infected phosphorylation sites. In this work, we first implemented a custom neural network model (named PhosBERT) on the basis of a pre-trained protein language model of ProtBert, which was a self-supervised learning approach developed on the Bidirectional Encoder Representation from Transformers (BERT) architecture. PhosBERT was then trained and validated on serine (S) and threonine (T) phosphorylation dataset and tyrosine (Y) phosphorylation dataset with 5-fold cross-validation, respectively. Independent validation results showed that PhosBERT could identify S/T phosphorylation sites with an accuracy of 81.9% and an AUC (area under the receiver operating characteristic curve) of 0.896. For Y phosphorylation sites, the prediction accuracy and AUC reached 87.1% and 0.902, respectively. These results indicate that the proposed model has good prediction ability and stability and would provide a new approach for studying SARS-CoV-2 phosphorylation sites.

18.
Proc Natl Acad Sci U S A ; 119(38): e2202113119, 2022 09 20.
Article in English | MEDLINE | ID: mdl-36095183

ABSTRACT

We propose a method for supervised learning with multiple sets of features ("views"). The multiview problem is especially important in biology and medicine, where "-omics" data, such as genomics, proteomics, and radiomics, are measured on a common set of samples. "Cooperative learning" combines the usual squared-error loss of predictions with an "agreement" penalty to encourage the predictions from different data views to agree. By varying the weight of the agreement penalty, we get a continuum of solutions that include the well-known early and late fusion approaches. Cooperative learning chooses the degree of agreement (or fusion) in an adaptive manner, using a validation set or cross-validation to estimate test set prediction error. One version of our fitting procedure is modular, where one can choose different fitting mechanisms (e.g., lasso, random forests, boosting, or neural networks) appropriate for different data views. In the setting of cooperative regularized linear regression, the method combines the lasso penalty with the agreement penalty, yielding feature sparsity. The method can be especially powerful when the different data views share some underlying relationship in their signals that can be exploited to boost the signals. We show that cooperative learning achieves higher predictive accuracy on simulated data and real multiomics examples of labor-onset prediction. By leveraging aligned signals and allowing flexible fitting mechanisms for different modalities, cooperative learning offers a powerful approach to multiomics data fusion.


Subject(s)
Genomics; Neural Networks, Computer; Supervised Machine Learning; Genomics/methods
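
The cooperative learning abstract above describes an objective that adds an "agreement" penalty to the usual squared-error loss. A minimal sketch for two views follows, assuming the prediction is the sum of the two per-view predictions and that the penalty weight is a single scalar; the names `cooperative_loss`, `pred_x`, `pred_z`, and `rho` are illustrative and not taken from the source.

```python
def cooperative_loss(y, pred_x, pred_z, rho):
    """Squared-error fit term plus a rho-weighted agreement penalty.

    y      : observed responses
    pred_x : predictions contributed by the first data view
    pred_z : predictions contributed by the second data view
    rho    : weight of the agreement penalty (assumed scalar, >= 0)
    """
    # Fit term: how far the combined prediction is from the response.
    fit = sum((yi - (px + pz)) ** 2 for yi, px, pz in zip(y, pred_x, pred_z))
    # Agreement term: how much the two views' predictions disagree.
    agreement = sum((px - pz) ** 2 for px, pz in zip(pred_x, pred_z))
    return 0.5 * fit + 0.5 * rho * agreement


# Both candidates fit y exactly, but only the second is penalized for disagreement.
balanced = cooperative_loss([1.0, 2.0], [0.5, 1.0], [0.5, 1.0], rho=2.0)
lopsided = cooperative_loss([1.0, 2.0], [1.0, 2.0], [0.0, 0.0], rho=2.0)
```

With `rho` set to zero the penalty vanishes and only the fit term remains; larger values push the per-view predictions toward each other, which is the continuum the abstract refers to.
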
19.
Proc Natl Acad Sci U S A ; 119(30): e2204379119, 2022 07 26.
Article in English | MEDLINE | ID: mdl-35858450

ABSTRACT

Prediction errors guide many forms of learning, providing teaching signals that help us improve our performance. Implicit motor adaptation, for instance, is thought to be driven by sensory prediction errors (SPEs), which occur when the expected and observed consequences of a movement differ. Traditionally, SPE computation is thought to require movement execution. However, recent work suggesting that the brain can generate sensory predictions based on motor imagery or planning alone calls this assumption into question. Here, by measuring implicit motor adaptation during a visuomotor task, we tested whether motor planning and well-timed sensory feedback are sufficient for adaptation. Human participants were cued to reach to a target and were, on a subset of trials, rapidly cued to withhold these movements. Errors displayed both on trials with and without movements induced single-trial adaptation. Learning following trials without movements persisted even when movement trials had never been paired with errors and when the direction of movement and sensory feedback trajectories were decoupled. These observations indicate that the brain can compute errors that drive implicit adaptation without generating overt movements, leading to the adaptation of motor commands that are not overtly produced.


Subject(s)
Learning; Psychomotor Performance; Adaptation, Physiological; Feedback, Sensory; Humans; Movement
20.
BMC Bioinformatics ; 25(1): 25, 2024 Jan 15.
Article in English | MEDLINE | ID: mdl-38221640

ABSTRACT

With the growing number of single-cell datasets collected under more complex experimental conditions, there is an opportunity to leverage single-cell variability to reveal deeper insights into how cells respond to perturbations. Many existing approaches rely on discretizing the data into clusters for differential gene expression (DGE), effectively ironing out any information unveiled by the single-cell variability across cell-types. In addition, DGE often assumes a statistical distribution that, if erroneous, can lead to false positive differentially expressed genes. Here, we present Cellograph: a semi-supervised framework that uses graph neural networks to quantify the effects of perturbations at single-cell granularity. Cellograph not only measures how prototypical cells are of each condition but also learns a latent space that is amenable to interpretable data visualization and clustering. The learned gene weight matrix from training reveals pertinent genes driving the differences between conditions. We demonstrate the utility of our approach on publicly-available datasets including cancer drug therapy, stem cell reprogramming, and organoid differentiation. Cellograph outperforms existing methods for quantifying the effects of experimental perturbations and offers a novel framework to analyze single-cell data using deep learning.


Subject(s)
Data Visualization; Neural Networks, Computer; Cell Differentiation; Cluster Analysis; RNA