Results 1 - 20 of 2,392
1.
Cell ; 187(6): 1508-1526.e16, 2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38442711

ABSTRACT

Dorsal root ganglia (DRG) somatosensory neurons detect mechanical, thermal, and chemical stimuli acting on the body. Achieving a holistic view of how different DRG neuron subtypes relay neural signals from the periphery to the CNS has been challenging with existing tools. Here, we develop and curate a mouse genetic toolkit that allows for interrogating the properties and functions of distinct cutaneous targeting DRG neuron subtypes. These tools have enabled a broad morphological analysis, which revealed distinct cutaneous axon arborization areas and branching patterns of the transcriptionally distinct DRG neuron subtypes. Moreover, in vivo physiological analysis revealed that each subtype has a distinct threshold and range of responses to mechanical and/or thermal stimuli. These findings support a model in which morphologically and physiologically distinct cutaneous DRG sensory neuron subtypes tile mechanical and thermal stimulus space to collectively encode a wide range of natural stimuli.


Subject(s)
Ganglia, Spinal , Sensory Receptor Cells , Single-Cell Gene Expression Analysis , Animals , Mice , Ganglia, Spinal/cytology , Sensory Receptor Cells/cytology , Skin/innervation
2.
Cell ; 183(3): 620-635.e22, 2020 Oct 29.
Article in English | MEDLINE | ID: mdl-33035454

ABSTRACT

Hippocampal activity represents many behaviorally important variables, including context, an animal's location within a given environmental context, time, and reward. Using longitudinal calcium imaging in mice, multiple large virtual environments, and differing reward contingencies, we derived a unified probabilistic model of CA1 representations centered on a single feature: the field propensity. Each cell's propensity governs how many place fields it has per unit space, predicts its reward-related activity, and is preserved across distinct environments and over months. Propensity is broadly distributed (with many low, and some very high, propensity cells) and thus strongly shapes hippocampal representations. This results in a range of spatial codes, from sparse to dense. Propensity varied ∼10-fold between adjacent cells in salt-and-pepper fashion, indicating substantial functional differences within a presumed cell type. Intracellular recordings linked propensity to cell excitability. The stability of each cell's propensity across conditions suggests this fundamental property has anatomical, transcriptional, and/or developmental origins.


Subject(s)
Hippocampus/anatomy & histology , Hippocampus/physiology , Animals , Behavior, Animal/physiology , Biophysical Phenomena , Calcium/metabolism , Male , Mice, Inbred C57BL , Models, Neurological , Pyramidal Cells/physiology , Reward , Task Performance and Analysis , Time Factors
3.
Proc Natl Acad Sci U S A ; 121(26): e2312335121, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38889151

ABSTRACT

Predicting the effects of one or more mutations on the in vivo or in vitro properties of a wild-type protein is a major computational challenge, due to the presence of epistasis, that is, of interactions between amino acids in the sequence. We introduce a computationally efficient procedure to build minimal epistatic models to predict mutational effects by combining evolutionary (homologous sequence) data with limited mutational-scan data. Mutagenesis measurements guide the selection of links in a sparse graphical model, while the parameters on the nodes and the edges are inferred from sequence data. We show, on 10 mutational scans, that our pipeline performs comparably to state-of-the-art deep networks trained on much more data, while requiring far fewer parameters and hence being more interpretable. In particular, the identified interactions adapt to the wild-type protein and to the fitness or biochemical property experimentally measured, mostly focus on key functional sites, and are not necessarily related to structural contacts. Therefore, our method is able to extract information relevant for one mutational experiment from homologous sequence data reflecting the multitude of structural and functional constraints acting on proteins throughout evolution.


Subject(s)
Mutation , Proteins , Proteins/genetics , Proteins/metabolism , Proteins/chemistry , Epistasis, Genetic , Evolution, Molecular , Computational Biology/methods
4.
Am J Hum Genet ; 110(2): 314-325, 2023 Feb 2.
Article in English | MEDLINE | ID: mdl-36610401

ABSTRACT

Admixture estimation plays a crucial role in ancestry inference and genome-wide association studies (GWASs). Computer programs such as ADMIXTURE and STRUCTURE are commonly employed to estimate the admixture proportions of sample individuals. However, these programs can be overwhelmed by the computational burdens imposed by the 10^5 to 10^6 samples and millions of markers commonly found in modern biobanks. An attractive strategy is to run these programs on a set of ancestry-informative SNP markers (AIMs) that exhibit substantially different frequencies across populations. Unfortunately, existing methods for identifying AIMs require knowing ancestry labels for a subset of the sample. This supervised learning approach creates a chicken-and-egg scenario. In this paper, we present an unsupervised, scalable framework that seamlessly carries out AIM selection and likelihood-based estimation of admixture proportions. Our simulated and real data examples show that this approach is scalable to modern biobank datasets. OpenADMIXTURE, our Julia implementation of the method, is open source and available for free.


Subject(s)
Biological Specimen Banks , Genome-Wide Association Study , Humans , Genome-Wide Association Study/methods , Likelihood Functions , Population Groups , Software , Genetics, Population
5.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-38904542

ABSTRACT

The inherent heterogeneity of cancer contributes to highly variable responses to any anticancer treatments. This underscores the need to first identify precise biomarkers through complex multi-omics datasets that are now available. Although much research has focused on this aspect, identifying biomarkers associated with distinct drug responders still remains a major challenge. Here, we develop MOMLIN, a multi-modal and -omics machine learning integration framework, to enhance drug-response prediction. MOMLIN jointly utilizes sparse correlation algorithms and class-specific feature selection algorithms, which identify multi-modal and -omics-associated interpretable components. MOMLIN was applied to breast cancer datasets from 147 patients (clinical, mutation, gene expression, tumor microenvironment cells and molecular pathways) to analyze drug-response class predictions for non-responders and variable responders. Notably, MOMLIN achieves an average AUC of 0.989, which is at least 10% greater than current state-of-the-art methods (data integration analysis for biomarker discovery using latent components, multi-omics factor analysis, sparse canonical correlation analysis). Moreover, MOMLIN not only detects known individual biomarkers such as genes at mutation/expression level but, most importantly, also correlates multi-modal and -omics network biomarkers for each response class. For example, an interaction between ER-negative-HMCN1-COL5A1 mutations-FBXO2-CSF3R expression-CD8 emerges as a multimodal biomarker for responders, potentially affecting antimicrobial peptides and FLT3 signaling pathways. In contrast, for resistance cases, a distinct combination of lymph node-TP53 mutation-PON3-ENSG00000261116 lncRNA expression-HLA-E-T-cell exclusions emerged as multimodal biomarkers, possibly impacting the neurotransmitter release cycle pathway. MOMLIN, therefore, is expected to advance precision medicine, for example by detecting context-specific multi-omics network biomarkers and better predicting drug-response classifications.


Subject(s)
Breast Neoplasms , Machine Learning , Humans , Breast Neoplasms/genetics , Breast Neoplasms/drug therapy , Breast Neoplasms/metabolism , Female , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Algorithms , Antineoplastic Agents/therapeutic use , Antineoplastic Agents/pharmacology , Computational Biology/methods , Genomics/methods
6.
Brief Bioinform ; 25(5), 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39228303

ABSTRACT

Recent advances in spatial transcriptomics (ST) enable measurement of the transcriptome within intact biological tissues while preserving spatial information, offering biologists unprecedented opportunities to comprehensively understand the tissue micro-environment, where spatial domains are the basic units of tissues. Although great efforts have been devoted to spatial domain identification, existing methods still have many shortcomings, such as ignoring local information and the relations among spatial domains, so alternatives are needed. Here, we present a novel algorithm for spatial domain identification in Spatial Transcriptomics data with Structure Correlation and Self-Representation (ST-SCSR), which integrates local information, global information, and similarity of spatial domains. Specifically, ST-SCSR utilizes matrix tri-factorization to simultaneously decompose the expression profiles and the spatial network of spots, where expressional and spatial features of spots are fused via the shared factor matrix, which is interpreted as the similarity of spatial domains. Furthermore, ST-SCSR learns an affinity graph of spots by manipulating expressional and spatial features, where local preservation and sparse constraints are employed, thereby enhancing the quality of the graph. The experimental results demonstrate that ST-SCSR not only outperforms state-of-the-art algorithms in terms of accuracy, but also identifies many potentially interesting patterns.


Subject(s)
Algorithms , Gene Expression Profiling , Transcriptome , Gene Expression Profiling/methods , Computational Biology/methods , Humans
7.
Proc Natl Acad Sci U S A ; 120(7): e2206994120, 2023 Feb 14.
Article in English | MEDLINE | ID: mdl-36763535

ABSTRACT

Recent advances in high-resolution imaging techniques and particle-based simulation methods have enabled the precise microscopic characterization of collective dynamics in various biological and engineered active matter systems. In parallel, data-driven algorithms for learning interpretable continuum models have shown promising potential for the recovery of underlying partial differential equations (PDEs) from continuum simulation data. By contrast, learning macroscopic hydrodynamic equations for active matter directly from experiments or particle simulations remains a major challenge, especially when continuum models are not known a priori or analytic coarse graining fails, as often is the case for nondilute and heterogeneous systems. Here, we present a framework that leverages spectral basis representations and sparse regression algorithms to discover PDE models from microscopic simulation and experimental data, while incorporating the relevant physical symmetries. We illustrate the practical potential through a range of applications, from a chiral active particle model mimicking nonidentical swimming cells to recent microroller experiments and schooling fish. In all these cases, our scheme learns hydrodynamic equations that reproduce the self-organized collective dynamics observed in the simulations and experiments. This inference framework makes it possible to measure a large number of hydrodynamic parameters in parallel and directly from video data.

8.
Genet Epidemiol ; 2024 May 15.
Article in English | MEDLINE | ID: mdl-38751238

ABSTRACT

Somatic changes like copy number aberrations (CNAs) and epigenetic alterations like methylation have pivotal effects on disease outcomes and prognosis in cancer, by regulating gene expressions, that drive critical biological processes. To identify potential biomarkers and molecular targets and understand how they impact disease outcomes, it is important to identify key groups of CNAs, the associated methylation, and the gene expressions they impact, through a joint integrative analysis. Here, we propose a novel analysis pipeline, the joint sparse canonical correlation analysis (jsCCA), an extension of sCCA, to effectively identify an ensemble of CNAs, methylation sites and gene (expression) components in the context of disease endpoints, especially tumor characteristics. Our approach detects potentially orthogonal gene components that are highly correlated with sets of methylation sites which in turn are correlated with sets of CNA sites. It then identifies the genes within these components that are associated with the outcome. Further, we aggregate the effect of each gene expression set on tumor stage by constructing "gene component scores" and test its interaction with traditional risk factors. Analyzing clinical and genomic data on 515 renal clear cell carcinoma (ccRCC) patients from the TCGA-KIRC, we found eight gene components to be associated with methylation sites, regulated by groups of proximally located CNA sites. Association analysis with tumor stage at diagnosis identified a novel association of expression of ASAH1 gene trans-regulated by methylation of several genes including SIX5 and by CNAs in the 10q25 region including TCF7L2. Further analysis to quantify the overall effect of gene sets on tumor stage, revealed that two of the eight gene components have significant interaction with smoking in relation to tumor stage. These gene components represent distinct biological functions including immune function, inflammatory responses, and hypoxia-regulated pathways. Our findings suggest that jsCCA analysis can identify interpretable and important genes, regulatory structures, and clinically consequential pathways. Such methods are warranted for comprehensive analysis of multimodal data especially in cancer genomics.

9.
Mol Biol Evol ; 41(7), 2024 Jul 3.
Article in English | MEDLINE | ID: mdl-38916040

ABSTRACT

Phylogenomic analyses of long sequences, consisting of many genes and genomic segments, reconstruct organismal relationships with high statistical confidence. But, inferred relationships can be sensitive to excluding just a few sequences. Currently, there is no direct way to identify fragile relationships and the associated individual gene sequences in species. Here, we introduce novel metrics for gene-species sequence concordance and clade probability derived from evolutionary sparse learning models. We validated these metrics using fungi, plant, and animal phylogenomic datasets, highlighting the ability of the new metrics to pinpoint fragile clades and the sequences responsible. The new approach does not necessitate the investigation of alternative phylogenetic hypotheses, substitution models, or repeated data subset analyses. Our methodology offers a streamlined approach to evaluating major inferred clades and identifying sequences that may distort reconstructed phylogenies using large datasets.


Subject(s)
Genomics , Phylogeny , Animals , Genomics/methods , Models, Genetic , Evolution, Molecular , Plants/genetics , Fungi/genetics
10.
Biostatistics ; 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38916966

ABSTRACT

Bayesian graphical models are powerful tools to infer complex relationships in high dimension, yet are often fraught with computational and statistical challenges. If exploited in a principled way, the increasing information collected alongside the data of primary interest constitutes an opportunity to mitigate these difficulties by guiding the detection of dependence structures. For instance, gene network inference may be informed by the use of publicly available summary statistics on the regulation of genes by genetic variants. Here we present a novel Gaussian graphical modeling framework to identify and leverage information on the centrality of nodes in conditional independence graphs. Specifically, we consider a fully joint hierarchical model to simultaneously infer (i) sparse precision matrices and (ii) the relevance of node-level information for uncovering the sought-after network structure. We encode such information as candidate auxiliary variables using a spike-and-slab submodel on the propensity of nodes to be hubs, which allows hypothesis-free selection and interpretation of a sparse subset of relevant variables. As efficient exploration of large posterior spaces is needed for real-world applications, we develop a variational expectation conditional maximization algorithm that scales inference to hundreds of samples, nodes and auxiliary variables. We illustrate and exploit the advantages of our approach in simulations and in a gene network study which identifies hub genes involved in biological pathways relevant to immune-mediated diseases.

11.
Brief Bioinform ; 24(4), 2023 Jul 20.
Article in English | MEDLINE | ID: mdl-37406190

ABSTRACT

Studies have confirmed that the occurrence of many complex diseases in the human body is closely related to the microbial community, and microbes can affect tumorigenesis and metastasis by regulating the tumor microenvironment. However, there are still large gaps in the clinical observation of the microbiota in disease. Although biological experiments are accurate in identifying disease-associated microbes, they are also time-consuming and expensive. Computational models for the effective identification of disease-related microbes can shorten this process and reduce capital and time costs. Accordingly, this paper presents a model named DSAE_RF that predicts latent microbe-disease associations by combining multi-source features and deep learning. DSAE_RF calculates four similarities between microbes and diseases, which are then used as feature vectors for the disease-microbe pairs. Next, reliable negative samples are screened by k-means clustering, and a deep sparse autoencoder neural network is used to extract effective features of the disease-microbe pairs. On this foundation, a random forest classifier predicts the associations between microbes and diseases. To assess the performance of the model, 10-fold cross-validation is implemented on the same dataset. As a result, the AUC and AUPR of the model are 0.9448 and 0.9431, respectively. Furthermore, we conduct a variety of experiments, including comparison of negative sample selection methods, comparison with different models and classifiers, Kolmogorov-Smirnov test and t-test, ablation experiments, robustness analysis, and case studies on Covid-19 and colorectal cancer. The results demonstrate the reliability and availability of our model.


Subject(s)
COVID-19 , Deep Learning , Microbiota , Humans , Reproducibility of Results , Algorithms , Computational Biology/methods
12.
Brief Bioinform ; 24(1), 2023 Jan 19.
Article in English | MEDLINE | ID: mdl-36573486

ABSTRACT

Viral infection involves a large number of protein-protein interactions (PPIs) between the virus and the host, and the identification of these PPIs plays an important role in revealing viral infection and pathogenesis. Existing computational models focus on predicting whether human proteins and viral proteins interact, and rarely take into account the types of diseases associated with these interactions. Although there are computational models based on a matrix and tensor decomposition for predicting multi-type biological interaction relationships, these methods cannot effectively model high-order nonlinear relationships of biological entities and are not suitable for integrating multiple features. To this end, we propose a novel computational framework, LTDSSL, to determine human-virus PPIs under different disease types. LTDSSL utilizes logistic functions to model nonlinear associations, sets importance levels to emphasize the importance of observed interactions and utilizes sparse subspace learning of multiple features to improve model performance. Experimental results show that LTDSSL has better predictive performance for both new disease types and new triples than the state-of-the-art methods. In addition, the case study further demonstrates that LTDSSL can effectively predict human-viral PPIs under various disease types.


Subject(s)
Protein Interaction Mapping , Viruses , Humans , Protein Interaction Mapping/methods , Viral Proteins/metabolism , Viruses/metabolism
13.
Proc Natl Acad Sci U S A ; 119(30): e2122788119, 2022 Jul 26.
Article in English | MEDLINE | ID: mdl-35867822

ABSTRACT

Compositional analysis is based on the premise that a relatively small proportion of taxa are differentially abundant, while the ratios of the relative abundances of the remaining taxa remain unchanged. Most existing methods use log-transformed data, but log-transformation of data with pervasive zero counts is problematic, and these methods cannot always control the false discovery rate (FDR). Further, high-throughput microbiome data such as 16S amplicon or metagenomic sequencing are subject to experimental biases that are introduced in every step of the experimental workflow. McLaren et al. [eLife 8, e46923 (2019)] have recently proposed a model for how these biases affect relative abundance data. Motivated by this model, we show that the odds ratios in a logistic regression comparing counts in two taxa are invariant to experimental biases. With this motivation, we propose logistic compositional analysis (LOCOM), a robust logistic regression approach to compositional analysis, that does not require pseudocounts. Inference is based on permutation to account for overdispersion and small sample sizes. Traits can be either binary or continuous, and adjustment for confounders is supported. Our simulations indicate that LOCOM always preserved FDR and had much improved sensitivity over existing methods. In contrast, analysis of composition of microbiomes (ANCOM) and ANCOM with bias correction (ANCOM-BC)/ANOVA-Like Differential Expression tool (ALDEx2) had inflated FDR when the effect sizes were small and large, respectively. Only LOCOM was robust to experimental biases in every situation. The flexibility of our method for a variety of microbiome studies is illustrated by the analysis of data from two microbiome studies. Our R package LOCOM is publicly available.


Subject(s)
Microbiota , Logistic Models , Metagenomics/methods , Microbiota/genetics , Sequence Analysis
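
The bias-invariance argument in the LOCOM abstract above can be checked with a few lines of arithmetic. The sketch below assumes the multiplicative bias model the abstract attributes to McLaren et al., in which each taxon's observed abundance is its true abundance times a taxon-specific efficiency. The function names, abundances, and efficiencies are made up for illustration; this is not the LOCOM R package.

```python
def observe(true_abundances, efficiencies):
    """Apply a multiplicative, taxon-specific bias and renormalize to proportions."""
    biased = [a * e for a, e in zip(true_abundances, efficiencies)]
    total = sum(biased)
    return [b / total for b in biased]


def odds_ratio(sample_a, sample_b, taxon, reference):
    """Odds of `taxon` versus `reference` in sample A, divided by the same odds in sample B."""
    odds_a = sample_a[taxon] / sample_a[reference]
    odds_b = sample_b[taxon] / sample_b[reference]
    return odds_a / odds_b


# Hypothetical true relative abundances of three taxa in two samples.
case = [0.50, 0.30, 0.20]
control = [0.20, 0.30, 0.50]

# Hypothetical measurement efficiencies (unknown in a real experiment).
efficiencies = [4.0, 1.0, 0.25]

true_or = odds_ratio(case, control, taxon=0, reference=2)
observed_or = odds_ratio(observe(case, efficiencies),
                         observe(control, efficiencies),
                         taxon=0, reference=2)

print(true_or, observed_or)
```

The two printed values agree up to floating-point rounding even though the observed proportions themselves are heavily distorted: the efficiencies appear in both the numerator and the denominator of the odds ratio and cancel.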
14.
Proc Natl Acad Sci U S A ; 119(8), 2022 Feb 22.
Article in English | MEDLINE | ID: mdl-35181603

ABSTRACT

High-frequency (HF) signals are ubiquitous in the industrial world and are of great use for monitoring of industrial assets. Most deep-learning tools are designed for inputs of fixed and/or very limited size and many successful applications of deep learning to the industrial context use as inputs extracted features, which are a manually and often arduously obtained compact representation of the original signal. In this paper, we propose a fully unsupervised deep-learning framework that is able to extract a meaningful and sparse representation of raw HF signals. We embed in our architecture important properties of the fast discrete wavelet transform (FDWT) such as 1) the cascade algorithm; 2) the conjugate quadrature filter property that links together the wavelet, the scaling, and transposed filter functions; and 3) the coefficient denoising. Using deep learning, we make this architecture fully learnable: Both the wavelet bases and the wavelet coefficient denoising become learnable. To achieve this objective, we propose an activation function that performs a learnable hard thresholding of the wavelet coefficients. With our framework, the denoising FDWT becomes a fully learnable unsupervised tool that does not require any type of pre- or postprocessing or any prior knowledge on wavelet transform. We demonstrate the benefits of embedding all these properties on three machine-learning tasks performed on open-source sound datasets. We perform an ablation study of the impact of each property on the performance of the architecture, achieve results well above baseline, and outperform other state-of-the-art methods.
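
As a rough illustration of the denoising step named in the abstract above, the following sketch applies one level of a discrete wavelet transform and hard-thresholds the detail coefficients. It assumes a fixed Haar filter and a hand-picked threshold, whereas the paper makes both the wavelet bases and the threshold learnable; the cascade over multiple levels is omitted, and all function names are hypothetical.

```python
import math


def haar_forward(signal):
    """One level of the orthonormal Haar transform (signal length must be even)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed sample pairs."""
    s = 1.0 / math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) * s, (a - d) * s])
    return signal


def hard_threshold(coeffs, threshold):
    """Keep coefficients whose magnitude exceeds the threshold; zero the rest."""
    return [c if abs(c) > threshold else 0.0 for c in coeffs]


def denoise(signal, threshold):
    """Transform, threshold the detail coefficients only, and reconstruct."""
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, hard_threshold(detail, threshold))


# A flat signal with small jitter and one genuine spike at index 4.
noisy = [1.02, 0.98, 1.01, 0.99, 5.0, 1.0, 1.03, 0.97]
smoothed = denoise(noisy, threshold=0.1)
print(smoothed)
```

With this threshold the small jitter between neighbouring samples is removed while the large jump at index 4 survives, which is the behaviour hard thresholding is meant to provide. A threshold of zero leaves the signal unchanged.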

15.
Proc Natl Acad Sci U S A ; 119(33): e2115335119, 2022 Aug 16.
Article in English | MEDLINE | ID: mdl-35947616

ABSTRACT

We propose that coding and decoding in the brain are achieved through digital computation using three principles: relative ordinal coding of inputs, random connections between neurons, and belief voting. Due to randomization and despite the coarseness of the relative codes, we show that these principles are sufficient for coding and decoding sequences with error-free reconstruction. In particular, the number of neurons needed grows only linearly while the size of the input repertoire grows exponentially. We illustrate our model by reconstructing sequences with repertoires on the order of a billion items. From this, we derive the Shannon equations for the capacity limit to learn and transfer information in the neural population, which is then generalized to any type of neural network. Following the maximum entropy principle of efficient coding, we show that random connections serve to decorrelate redundant information in incoming signals, creating more compact codes for neurons and, therefore, conveying a larger amount of information. Hence, despite the unreliability of the relative codes, only a few neurons are necessary to discriminate the original signal without error. Finally, we discuss the significance of this digital computation model regarding neurobiological findings in the brain and more generally with artificial intelligence algorithms, with a view toward a neural information theory and the design of digital neural networks.


Subject(s)
Artificial Intelligence , Brain , Models, Neurological , Algorithms , Brain/physiology , Neural Networks, Computer , Neurons/physiology
16.
Nano Lett ; 24(21): 6255-6261, 2024 May 29.
Article in English | MEDLINE | ID: mdl-38743662

ABSTRACT

In this study, we clarify the liquid structure formed at the interface between LiCoO2 (LCO), the cathode material of Li-ion batteries, and propylene carbonate (PC), which is used as a solvent in the electrolyte, on a molecular scale. We apply sparse modeling-based modal analysis to force spectroscopy data measured by frequency modulation atomic force microscopy (FM-AFM) and show that each component in the FM-AFM force curve, such as oscillatory solvation force, background, and noise, can be automatically decomposed. Moreover, by combining detailed force curve analysis with solid/liquid interface simulations based on first-principles calculation, we have identified that there are distinct damped vibrational modes in the force curves at the LCO/PC interface with a period of about 0.57 nm and those with shorter periods, which likely correspond to the solvation forces associated with bulk-state PC molecules and those with PC molecules in "lying down" orientations.

17.
Nano Lett ; 24(7): 2149-2156, 2024 Feb 21.
Article in English | MEDLINE | ID: mdl-38329715

ABSTRACT

The integration time and signal-to-noise ratio are inextricably linked when performing scanning probe microscopy based on raster scanning. This often yields a large lower bound on the measurement time, for example, in nano-optical imaging experiments performed using a scanning near-field optical microscope (SNOM). Here, we utilize sparse scanning augmented with Gaussian process regression to bypass the time constraint. We apply this approach to image charge-transfer polaritons in graphene residing on ruthenium trichloride (α-RuCl3) and obtain key features such as polariton damping and dispersion. Critically, nano-optical SNOM imaging data obtained via sparse sampling are in good agreement with those extracted from traditional raster scans but require 11 times fewer sampled points. As a result, Gaussian process-aided sparse spiral scans offer a major decrease in scanning time.
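
The idea of filling in a sparsely sampled scan with Gaussian process regression can be sketched in one dimension. This is a minimal illustration, not the authors' pipeline: it assumes a zero-mean prior, a squared-exponential kernel with a hand-picked length scale, and a sine wave standing in for the measured signal, while the paper works with two-dimensional SNOM images sampled along a spiral. All names and values are hypothetical.

```python
import math


def rbf(x1, x2, length_scale=1.0):
    """Squared-exponential covariance between two scalar positions."""
    return math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def gp_fit(xs, ys, noise=1e-6):
    """Return weights alpha = (K + noise * I)^-1 y used by the posterior mean."""
    k = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    return solve(k, ys)


def gp_mean(x_new, xs, alpha):
    """Posterior mean at an unsampled position."""
    return sum(rbf(x_new, x) * w for x, w in zip(xs, alpha))


# Eight irregularly spaced "measurements" of a smooth signal.
xs = [0.0, 0.7, 1.5, 2.4, 3.1, 4.0, 5.2, 6.0]
ys = [math.sin(x) for x in xs]
alpha = gp_fit(xs, ys)

# Estimate the signal at a position that was never sampled.
estimate = gp_mean(2.0, xs, alpha)
print(estimate, math.sin(2.0))
```

In practice the kernel hyperparameters would be fitted to the data and the posterior variance would indicate where additional samples are most informative; both are left out here to keep the sketch short.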

18.
J Neurosci ; 43(22): 4129-4143, 2023 May 31.
Article in English | MEDLINE | ID: mdl-37185098

ABSTRACT

The mechanisms involved in transforming early visual signals to curvature representations in V4 are unknown. We propose a hierarchical model that reveals V1/V2 encodings that are essential components for this transformation to the reported curvature representations in V4. Then, by relaxing the often-imposed prior of a single Gaussian, V4 shape selectivity is learned in the last layer of the hierarchy from Macaque V4 responses. We found that V4 cells integrate multiple shape parts from the full spatial extent of their receptive fields with similar excitatory and inhibitory contributions. Our results uncover new details in existing data about shape selectivity in V4 neurons that with additional experiments can enhance our understanding of processing in this area. Accordingly, we propose designs for a stimulus set that allow removing shape parts without disturbing the curvature signal to isolate part contributions to V4 responses.

SIGNIFICANCE STATEMENT: Selectivity to convex and concave shape parts in V4 neurons has been repeatedly reported. Nonetheless, the mechanisms that yield such selectivities in the ventral stream remain unknown. We propose a hierarchical computational model that incorporates findings of the various visual areas involved in shape processing and suggest mechanisms that transform the shape signal from low-level features to convex/concave part representations. Learning shape selectivity from Macaque V4 responses in the final processing stage in our model, we found that V4 neurons integrate shape parts from the full spatial extent of their receptive field with both facilitatory and inhibitory contributions. These results reveal hidden information in existing V4 data that with additional experiments can enhance our understanding of processing in V4.


Subject(s)
Form Perception , Visual Cortex , Animals , Visual Cortex/physiology , Form Perception/physiology , Macaca , Neurons/physiology , Brain , Visual Pathways/physiology , Photic Stimulation
19.
BMC Bioinformatics ; 25(1): 132, 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38539064

ABSTRACT

BACKGROUND: Classifying breast cancer subtypes is crucial for clinical diagnosis and treatment. However, the early symptoms of breast cancer may not be apparent. Rapid advances in high-throughput sequencing technology have generated a large volume of multi-omics biological data. Leveraging and integrating the available multi-omics data can effectively enhance the accuracy of identifying breast cancer subtypes. However, few efforts focus on identifying the associations among different omics data to predict breast cancer subtypes. RESULTS: In this paper, we propose a differential sparse canonical correlation analysis network (DSCCN) for classifying breast cancer subtypes. DSCCN performs differential analysis on multi-omics expression data to identify differentially expressed (DE) genes and adopts sparse canonical correlation analysis (SCCA) to mine highly correlated features between multi-omics DE-genes. Meanwhile, DSCCN uses a multi-task deep learning neural network to separately train on the correlated DE-genes to predict breast cancer subtypes, which tackles the data heterogeneity problem in integrating multi-omics data. CONCLUSIONS: The experimental results show that by mining the associations among multi-omics data, DSCCN classifies breast cancer subtypes more accurately than the existing methods.


Subject(s)
Breast Neoplasms , Deep Learning , Humans , Female , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Multiomics , Canonical Correlation Analysis
20.
J Cell Mol Med ; 28(18): e70071, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39300612

ABSTRACT

The use of matrix completion methods to predict the association between microbes and diseases can effectively improve treatment efficiency. However, the similarity measures used in the existing methods are often influenced by various factors such as neighbourhood size, choice of similarity metric, or multiple parameters for similarity fusion, making reliable similarity estimation challenging. Additionally, matrix completion is currently limited by the sparsity of the initial association matrix, which restricts its predictive performance. To address these problems, we propose a matrix completion method based on adaptive neighbourhood similarity and sparse constraints (ANS-SCMC) for predicting potential microbe-disease associations. Adaptive neighbourhood similarity learning dynamically uses the decomposition results as effective information for the next learning iteration by simultaneously performing local manifold structure learning and decomposition. This approach effectively preserves fine local structure information and avoids the influence of weight parameters directly involved in similarity measurement. Additionally, the sparse constraint-based matrix completion approach can better handle the sparsity challenge in the association matrix. Finally, the proposed algorithm achieved significantly higher predictive performance in validation than several commonly used prediction methods proposed to date. Furthermore, in the case study, the prediction algorithm achieved an accuracy of up to 80% for the top 10 microbes associated with type 1 diabetes and 100% for Crohn's disease.


Subject(s)
Algorithms , Humans , Computational Biology/methods , Microbiota , Crohn Disease/microbiology