Search | VHL Search Portal

1.

Precision probiotics supplement strategy in aging population based on gut microbiome composition.

Chuang, Yi-Fang; Fan, Kang-Chen; Su, Yin-Yuan; Wu, Ming-Fong; Chiu, Yen-Ling; Liu, Yi-Chien; Lin, Chen-Ching.

Brief Bioinform ; 25(4), 2024 May 23.

Article in English | MEDLINE | ID: mdl-39038938

ABSTRACT

With the increasing prevalence of age-related chronic diseases burdening healthcare systems, there is a pressing need for innovative management strategies. Our study focuses on the gut microbiota, essential for metabolic, nutritional, and immune functions, which undergoes significant changes with aging. These changes can impair intestinal function, leading to altered microbial diversity and composition that potentially influence health outcomes and disease progression. Using advanced metagenomic sequencing, we explore the potential of personalized probiotic supplements in 297 older adults by analyzing their gut microbiota. We identified distinctive Lactobacillus and Bifidobacterium signatures in the gut microbiota of older adults, revealing probiotic patterns associated with various population characteristics, microbial compositions, cognitive functions, and neuroimaging results. These insights suggest that tailored probiotic supplements, designed to match an individual's probiotic profile, could offer an innovative method for addressing age-related diseases and functional declines. Our findings enhance the existing evidence base for probiotic use among older adults, highlighting the opportunity to create more targeted and effective probiotic strategies. However, additional research is required to validate our results and further assess the impact of precision probiotics on aging populations. Future studies should employ longitudinal designs and larger cohorts to conclusively demonstrate the benefits of tailored probiotic treatments.

Subject(s)

Aging , Dietary Supplements , Gastrointestinal Microbiome , Probiotics , Probiotics/therapeutic use , Probiotics/administration & dosage , Humans , Aged , Female , Male , Aged, 80 and over , Middle Aged , Lactobacillus/genetics , Metagenomics/methods , Bifidobacterium

2.

GMFGRN: a matrix factorization and graph neural network approach for gene regulatory network inference.

Li, Shuo; Liu, Yan; Shen, Long-Chen; Yan, He; Song, Jiangning; Yu, Dong-Jun.

Brief Bioinform ; 25(2), 2024 Jan 22.

Article in English | MEDLINE | ID: mdl-38261340

ABSTRACT

The recent advances of single-cell RNA sequencing (scRNA-seq) have enabled reliable profiling of gene expression at the single-cell level, providing opportunities for accurate inference of gene regulatory networks (GRNs) on scRNA-seq data. Most methods for inferring GRNs suffer from the inability to eliminate transitive interactions or necessitate expensive computational resources. To address these, we present a novel method, termed GMFGRN, for accurate graph neural network (GNN)-based GRN inference from scRNA-seq data. GMFGRN employs GNN for matrix factorization and learns representative embeddings for genes. For transcription factor-gene pairs, it utilizes the learned embeddings to determine whether they interact with each other. The extensive suite of benchmarking experiments encompassing eight static scRNA-seq datasets alongside several state-of-the-art methods demonstrated mean improvements of 1.9 and 2.5% over the runner-up in area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC). In addition, across four time-series datasets, maximum enhancements of 2.4 and 1.3% in AUROC and AUPRC were observed in comparison to the runner-up. Moreover, GMFGRN requires significantly less training time and memory consumption, with time and memory consumed <10% compared to the second-best method. These findings underscore the substantial potential of GMFGRN in the inference of GRNs. It is publicly available at https://github.com/Lishuoyy/GMFGRN.

Subject(s)

Benchmarking , Gene Regulatory Networks , Area Under Curve , Learning , Neural Networks, Computer

3.

scINRB: single-cell gene expression imputation with network regularization and bulk RNA-seq data.

Kang, Yue; Zhang, Hongyu; Guan, Jinting.

Brief Bioinform ; 25(3), 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38600665

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) facilitates the study of cell type heterogeneity and the construction of cell atlases. However, due to its limitations, many genes may be detected as having zero expression, i.e. dropout events, leading to bias in downstream analyses and hindering the identification and characterization of cell types and cell functions. Although many imputation methods have been developed, their performance is generally lower than expected across different kinds and dimensions of data and application scenarios. Therefore, developing an accurate and robust single-cell gene expression data imputation method is still essential. To maintain the original cell-cell and gene-gene correlations and leverage bulk RNA sequencing (bulk RNA-seq) data, we propose scINRB, a single-cell gene expression imputation method with network regularization and bulk RNA-seq data. scINRB adopts network-regularized non-negative matrix factorization to ensure that the imputed data maintains the cell-cell and gene-gene similarities and also approaches the gene average expression calculated from bulk RNA-seq data. To evaluate the performance, we test scINRB on simulated and experimental datasets and compare it with other commonly used imputation methods. The results show that scINRB recovers gene expression accurately even in the case of high dropout rates and dimensions, preserves cell-cell and gene-gene similarities and improves various downstream analyses including visualization, clustering and trajectory inference.

Subject(s)

Algorithms , Single-Cell Analysis , RNA-Seq , Single-Cell Analysis/methods , Sequence Analysis, RNA/methods , Cluster Analysis , Gene Expression , Gene Expression Profiling , Software

4.

scMNMF: a novel method for single-cell multi-omics clustering based on matrix factorization.

Qiu, Yushan; Guo, Dong; Zhao, Pu; Zou, Quan.

Brief Bioinform ; 25(3), 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38754408

ABSTRACT

MOTIVATION: The technology for analyzing single-cell multi-omics data has advanced rapidly and has provided comprehensive and accurate cellular information by exploring cell heterogeneity in genomics, transcriptomics, epigenomics, metabolomics and proteomics data. However, because of the high-dimensional and sparse characteristics of single-cell multi-omics data, as well as the limitations of various analysis algorithms, the clustering performance is generally poor. Matrix factorization is an unsupervised, dimensionality reduction-based method that can cluster individuals and discover related omics variables from different blocks. Here, we present a novel algorithm that performs joint dimensionality reduction learning and cell clustering analysis on single-cell multi-omics data using non-negative matrix factorization that we named scMNMF. We formulate the objective function of joint learning as a constrained optimization problem and derive the corresponding iterative formulas through alternating iterative algorithms. The major advantage of the scMNMF algorithm remains its capability to explore hidden related features among omics data. Additionally, the feature selection for dimensionality reduction and cell clustering mutually influence each other iteratively, leading to a more effective discovery of cell types. We validated the performance of the scMNMF algorithm using two simulated and five real datasets. The results show that scMNMF outperformed seven other state-of-the-art algorithms in various measurements. AVAILABILITY AND IMPLEMENTATION: scMNMF code can be found at https://github.com/yushanqiu/scMNMF.

Subject(s)

Algorithms , Single-Cell Analysis , Single-Cell Analysis/methods , Cluster Analysis , Humans , Genomics/methods , Computational Biology/methods , Proteomics/methods , Metabolomics/methods , Epigenomics/methods , Multiomics

5.

ST-SCSR: identifying spatial domains in spatial transcriptomics data via structure correlation and self-representation.

Zhang, Min; Zhang, Wensheng; Ma, Xiaoke.

Brief Bioinform ; 25(5), 2024 Jul 25.

Article in English | MEDLINE | ID: mdl-39228303

ABSTRACT

Recent advances in spatial transcriptomics (ST) enable measurements of the transcriptome within intact biological tissues by preserving spatial information, offering biologists unprecedented opportunities to comprehensively understand the tissue micro-environment, where spatial domains are basic units of tissues. Although great efforts have been devoted to identifying spatial domains, existing methods still have many shortcomings, such as ignoring local information and relations of spatial domains, so alternatives are required to solve these problems. Here, a novel algorithm is proposed for spatial domain identification in Spatial Transcriptomics data with Structure Correlation and Self-Representation (ST-SCSR), which integrates local information, global information, and similarity of spatial domains. Specifically, ST-SCSR utilizes matrix tri-factorization to simultaneously decompose expression profiles and the spatial network of spots, where expressional and spatial features of spots are fused via the shared factor matrix that is interpreted as the similarity of spatial domains. Furthermore, ST-SCSR learns an affinity graph of spots by manipulating expressional and spatial features, where local preservation and sparse constraints are employed, thereby enhancing the quality of the graph. The experimental results demonstrate that ST-SCSR not only outperforms state-of-the-art algorithms in terms of accuracy, but also identifies many potentially interesting patterns.

Subject(s)

Algorithms , Gene Expression Profiling , Transcriptome , Gene Expression Profiling/methods , Computational Biology/methods , Humans

6.

Biolinguistic graph fusion model for circRNA-miRNA association prediction.

Guo, Lu-Xiang; Wang, Lei; You, Zhu-Hong; Yu, Chang-Qing; Hu, Meng-Lei; Zhao, Bo-Wei; Li, Yang.

Brief Bioinform ; 25(2), 2024 Jan 22.

Article in English | MEDLINE | ID: mdl-38426324

ABSTRACT

Emerging clinical evidence suggests that sophisticated associations between circular ribonucleic acids (RNAs) (circRNAs) and microRNAs (miRNAs) are a critical regulatory factor of various pathological processes and play a critical role in most intricate human diseases. Nonetheless, identifying these associations via wet-lab experiments is error-prone and labor-intensive, and numerous existing computational methods for predicting novel circRNA-miRNA associations (CMAs) rely only on single correlation data. Considering the inadequacy of existing machine learning models, we propose a new model named BGF-CMAP, which combines the gradient boosting decision tree with natural language processing and graph embedding methods to infer associations between circRNAs and miRNAs. Specifically, BGF-CMAP extracts sequence attribute features and interaction behavior features by Word2vec and two homogeneous graph embedding algorithms, large-scale information network embedding and graph factorization, respectively. Comprehensive experimental analysis revealed that BGF-CMAP successfully predicted the complex relationship between circRNAs and miRNAs with an accuracy of 82.90% and an area under the receiver operating characteristic curve of 0.9075. Furthermore, 23 of the top 30 miRNA-associated circRNAs of the studies on data were confirmed in relevant experiments, showing that the BGF-CMAP model is superior to others. BGF-CMAP can serve as a helpful model to provide a scientific theoretical basis for the study of CMA prediction.

Subject(s)

MicroRNAs , Humans , MicroRNAs/genetics , RNA, Circular/genetics , ROC Curve , Machine Learning , Algorithms , Computational Biology/methods

7.

Genomic loci influence patterns of structural covariance in the human brain.

Wen, Junhao; Nasrallah, Ilya M; Abdulkadir, Ahmed; Satterthwaite, Theodore D; Yang, Zhijian; Erus, Guray; Robert-Fitzgerald, Timothy; Singh, Ashish; Sotiras, Aristeidis; Boquet-Pujadas, Aleix; Mamourian, Elizabeth; Doshi, Jimit; Cui, Yuhan; Srinivasan, Dhivya; Skampardoni, Ioanna; Chen, Jiong; Hwang, Gyujoon; Bergman, Mark; Bao, Jingxuan; Veturi, Yogasudha; Zhou, Zhen; Yang, Shu; Dazzan, Paola; Kahn, Rene S; Schnack, Hugo G; Zanetti, Marcus V; Meisenzahl, Eva; Busatto, Geraldo F; Crespo-Facorro, Benedicto; Pantelis, Christos; Wood, Stephen J; Zhuo, Chuanjun; Shinohara, Russell T; Gur, Ruben C; Gur, Raquel E; Koutsouleris, Nikolaos; Wolf, Daniel H; Saykin, Andrew J; Ritchie, Marylyn D; Shen, Li; Thompson, Paul M; Colliot, Olivier; Wittfeld, Katharina; Grabe, Hans J; Tosun, Duygu; Bilgel, Murat; An, Yang; Marcus, Daniel S; LaMontagne, Pamela; Heckbert, Susan R.

Proc Natl Acad Sci U S A ; 120(52): e2300842120, 2023 Dec 26.

Article in English | MEDLINE | ID: mdl-38127979

ABSTRACT

Normal and pathologic neurobiological processes influence brain morphology in coordinated ways that give rise to patterns of structural covariance (PSC) across brain regions and individuals during brain aging and diseases. The genetic underpinnings of these patterns remain largely unknown. We apply a stochastic multivariate factorization method to a diverse population of 50,699 individuals (12 studies and 130 sites) and derive data-driven, multi-scale PSCs of regional brain size. PSCs were significantly correlated with 915 genomic loci in the discovery set, 617 of which are newly identified, and 72% were independently replicated. Key pathways influencing PSCs involve reelin signaling, apoptosis, neurogenesis, and appendage development, while pathways of breast cancer indicate potential interplays between brain metastasis and PSCs associated with neurodegeneration and dementia. Using support vector machines, multi-scale PSCs effectively derive imaging signatures of several brain diseases. Our results elucidate genetic and biological underpinnings that influence structural covariance patterns in the human brain.

Subject(s)

Brain Neoplasms , Magnetic Resonance Imaging , Humans , Magnetic Resonance Imaging/methods , Brain/pathology , Brain Mapping/methods , Genomics , Brain Neoplasms/pathology

8.

Fragmentation landscape of cell-free DNA revealed by deconvolutional analysis of end motifs.

Zhou, Ze; Ma, Mary-Jane L; Chan, Rebecca W Y; Lam, W K Jacky; Peng, Wenlei; Gai, Wanxia; Hu, Xi; Ding, Spencer C; Ji, Lu; Zhou, Qing; Cheung, Peter P H; Yu, Stephanie C Y; Teoh, Jeremy Y C; Szeto, Cheuk-Chun; Wong, John; Wong, Vincent W S; Wong, Grace L H; Chan, Stephen L; Hui, Edwin P; Ma, Brigette B Y; Chan, Anthony T C; Chiu, Rossa W K; Chan, K C Allen; Lo, Y M Dennis; Jiang, Peiyong.

Proc Natl Acad Sci U S A ; 120(17): e2220982120, 2023 Apr 25.

Article in English | MEDLINE | ID: mdl-37075072

ABSTRACT

Cell-free DNA (cfDNA) fragmentation is nonrandom, at least partially mediated by various DNA nucleases, forming characteristic cfDNA end motifs. However, there is a paucity of tools for deciphering the relative contributions of cfDNA cleavage patterns related to underlying fragmentation factors. In this study, through non-negative matrix factorization algorithm, we used 256 5' 4-mer end motifs to identify distinct types of cfDNA cleavage patterns, referred to as "founder" end-motif profiles (F-profiles). F-profiles were associated with different DNA nucleases based on whether such patterns were disrupted in nuclease-knockout mouse models. Contributions of individual F-profiles in a cfDNA sample could be determined by deconvolutional analysis. We analyzed 93 murine cfDNA samples of different nuclease-deficient mice and identified six types of F-profiles. F-profiles I, II, and III were linked to deoxyribonuclease 1 like 3 (DNASE1L3), deoxyribonuclease 1 (DNASE1), and DNA fragmentation factor subunit beta (DFFB), respectively. We revealed that 42.9% of plasma cfDNA molecules were attributed to DNASE1L3-mediated fragmentation, whereas 43.4% of urinary cfDNA molecules involved DNASE1-mediated fragmentation. We further demonstrated that the relative contributions of F-profiles were useful to inform pathological states, such as autoimmune disorders and cancer. Among the six F-profiles, the use of F-profile I could inform the human patients with systemic lupus erythematosus. F-profile VI could be used to detect individuals with hepatocellular carcinoma, with an area under the receiver operating characteristic curve of 0.97. F-profile VI was more prominent in patients with nasopharyngeal carcinoma undergoing chemoradiotherapy. We proposed that this profile might be related to oxidative stress.

Subject(s)

Cell-Free Nucleic Acids , Humans , Mice , Animals , Cell-Free Nucleic Acids/genetics , Deoxyribonucleases/genetics , Mice, Knockout , Endonucleases/genetics , DNA Fragmentation , Endodeoxyribonucleases/genetics
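
Note on record 8: the abstract describes factorizing the frequencies of 256 end motifs into "founder" profiles with non-negative matrix factorization and then deconvolving a sample into the contributions of those profiles. The sketch below illustrates that general idea only. It assumes plain Lee-Seung multiplicative updates with a Frobenius objective and a synthetic, noise-free matrix; the function names `nmf` and `deconvolve`, the rank of 3, and the simulated data are hypothetical choices for illustration and are not the authors' implementation.

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0, eps=1e-12):
    """Factor V (motifs x samples) as W @ H using multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_motifs, n_samples = V.shape
    W = rng.random((n_motifs, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Rescale so every column of W is a motif-frequency profile summing to 1;
    # the inverse scaling on H keeps the product W @ H unchanged.
    scale = W.sum(axis=0, keepdims=True)
    return W / scale, H * scale.T

def deconvolve(sample, W, n_iter=500, eps=1e-12):
    """Estimate non-negative contributions of fixed profiles W to one sample."""
    h = np.full(W.shape[1], 1.0 / W.shape[1])
    for _ in range(n_iter):
        h *= (W.T @ sample) / (W.T @ W @ h + eps)
    return h / h.sum()  # report contributions as fractions of the total

# Synthetic stand-in data (assumption): 3 hidden profiles over 256 motifs,
# mixed in random proportions across 40 samples.
rng = np.random.default_rng(1)
true_profiles = rng.dirichlet(np.ones(256), size=3).T  # shape (256, 3)
true_mix = rng.dirichlet(np.ones(3), size=40).T        # shape (3, 40)
V = true_profiles @ true_mix                           # shape (256, 40)

W, H = nmf(V, rank=3)
contrib = deconvolve(V[:, 0], W)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("relative reconstruction error:", rel_error)
print("profile contributions for sample 0:", contrib)
```

Recovered profiles are defined only up to a permutation of columns, so matching them to known factors (nucleases, in the abstract) needs external evidence such as knockout data.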

9.

Quantifying common and distinct information in single-cell multimodal data with Tilted Canonical Correlation Analysis.

Lin, Kevin Z; Zhang, Nancy R.

Proc Natl Acad Sci U S A ; 120(32): e2303647120, 2023 Aug 08.

Article in English | MEDLINE | ID: mdl-37523521

ABSTRACT

Multimodal single-cell technologies profile multiple modalities for each cell simultaneously, enabling a more thorough characterization of cell populations. Existing dimension-reduction methods for multimodal data capture the "union of information," producing a lower-dimensional embedding that combines the information across modalities. While these tools are useful, we focus on a fundamentally different task of separating and quantifying the information among cells that is shared between the two modalities as well as unique to only one modality. Hence, we develop Tilted Canonical Correlation Analysis (Tilted-CCA), a method that decomposes a paired multimodal dataset into three lower-dimensional embeddings: one embedding captures the "intersection of information," representing the geometric relations among the cells that are common to both modalities, while the remaining two embeddings capture the "distinct information for a modality," representing the modality-specific geometric relations. We analyze single-cell multimodal datasets sequencing RNA alongside surface antibodies (i.e., CITE-seq) as well as RNA alongside chromatin accessibility (i.e., 10x) for blood cells and developing neurons via Tilted-CCA. These analyses show that Tilted-CCA enables meaningful visualization and quantification of the cross-modal information. Finally, Tilted-CCA's framework allows us to perform two specific downstream analyses. First, for single-cell datasets that simultaneously profile transcriptome and surface antibody markers, we show that Tilted-CCA helps design the target antibody panel to complement the transcriptome best. Second, for developmental single-cell datasets that simultaneously profile transcriptome and chromatin accessibility, we show that Tilted-CCA helps identify development-informative genes and distinguish between transient versus terminal cell types.

Subject(s)

Algorithms , Canonical Correlation Analysis , Transcriptome , Single-Cell Analysis/methods

10.

SMURF: embedding single-cell RNA-seq data with matrix factorization preserving self-consistency.

Pu, Juhua; Wang, Bingchen; Liu, Xingwu; Chen, Lingxi; Li, Shuai Cheng.

Brief Bioinform ; 24(2), 2023 Mar 19.

Article in English | MEDLINE | ID: mdl-36715274

ABSTRACT

The advance in single-cell RNA-sequencing (scRNA-seq) sheds light on cell-specific transcriptomic studies of cell development, complex diseases and cancers. Nevertheless, scRNA-seq techniques suffer from 'dropout' events, and imputation tools are proposed to address the sparsity. Here, rather than imputation, we propose a tool, SMURF, to extract the low-dimensional embeddings from cells and genes utilizing matrix factorization with a mixture of Poisson-Gamma divergence as objective while preserving self-consistency. SMURF exhibits feasible cell subpopulation discovery efficacy with obtained cell embeddings on replicated in silico and eight wet lab scRNA datasets with ground truth cell types. Furthermore, SMURF can reduce the cell embedding to a 1D-oval space to recover the time course of the cell cycle. SMURF can also serve as an imputation tool; the in silico data assessment shows that SMURF has the most robust gene expression recovery power, with low root mean square error and high Pearson correlation. Moreover, SMURF recovers the gene distribution for the WM989 Drop-seq data. SMURF is available at https://github.com/deepomicslab/SMURF.

Subject(s)

Single-Cell Gene Expression Analysis , Software , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Gene Expression Profiling , Cluster Analysis

11.

Cooperative driver pathways discovery by multiplex network embedding.

Wang, Jun; Chen, Xi; Wu, Zhengtian; Guo, Maozu; Yu, Guoxian.

Brief Bioinform ; 24(3), 2023 May 19.

Article in English | MEDLINE | ID: mdl-37000166

ABSTRACT

Cooperative driver pathways discovery helps researchers to study the pathogenesis of cancer. However, most discovery methods mainly focus on genomics data, and neglect the known pathway information and other related multi-omics data; thus they cannot faithfully decipher the carcinogenic process. We propose CDPMiner (Cooperative Driver Pathways Miner) to discover cooperative driver pathways by multiplex network embedding, which can jointly model relational and attribute information of multi-type molecules. CDPMiner first uses the pathway topology to quantify the weight of genes in different pathways, and optimizes the relations between genes and pathways. Then it constructs an attributed multiplex network consisting of micro RNAs, long noncoding RNAs, genes and pathways, embeds the network through deep joint matrix factorization to mine more essential information for pathway-level analysis and reconstructs the pathway interaction network. Finally, CDPMiner leverages the reconstructed network and mutation data to define the driver weight between pathways to discover cooperative driver pathways. Experimental results on Breast invasive carcinoma and Stomach adenocarcinoma datasets show that CDPMiner can effectively fuse multi-omics data to discover more driver pathways, which indeed cooperatively trigger cancers and are valuable for carcinogenesis analysis. Ablation study justifies CDPMiner for a more comprehensive analysis of cancer by fusing multi-omics data.

Subject(s)

Algorithms , Breast Neoplasms , Humans , Female , Genomics/methods , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Mutation , Carcinogenesis/genetics

12.

Predicting metabolite-disease associations based on auto-encoder and non-negative matrix factorization.

Gao, Hongyan; Sun, Jianqiang; Wang, Yukun; Lu, Yuer; Liu, Liyu; Zhao, Qi; Shuai, Jianwei.

Brief Bioinform ; 24(5), 2023 Sep 20.

Article in English | MEDLINE | ID: mdl-37466194

ABSTRACT

Metabolism refers to a series of orderly chemical reactions used to maintain life activities in organisms. In healthy individuals, metabolism remains within a normal range. However, specific diseases can lead to abnormalities in the levels of certain metabolites, causing them to either increase or decrease. Detecting these deviations in metabolite levels can aid in diagnosing a disease. Traditional biological experiments often rely on considerable manpower for repeated experiments, which is time consuming and labor intensive. To address this issue, we develop a deep learning model based on the auto-encoder and non-negative matrix factorization, named MDA-AENMF, to predict the potential associations between metabolites and diseases. We integrate a variety of similarity networks and then acquire the characteristics of both metabolites and diseases through three specific modules. First, we get the disease characteristics from the five-layer auto-encoder module. Later, in the non-negative matrix factorization module, we extract both the metabolite and disease characteristics. Furthermore, the graph attention auto-encoder module helps us obtain metabolite characteristics. After obtaining the features from the three modules, these characteristics are merged into a single, comprehensive feature vector for each metabolite-disease pair. Finally, we send the corresponding feature vector and label to the multi-layer perceptron for training. In 5-fold cross-validation, the model achieves an area under the receiver operating characteristic curve of 0.975 and an area under the precision-recall curve of 0.973, which are superior to those of existing state-of-the-art predictive methods. Through case studies, most of the new associations obtained by MDA-AENMF have been verified, further highlighting the reliability of MDA-AENMF in predicting the potential relationships between metabolites and diseases.

Subject(s)

Algorithms , Neural Networks, Computer , Humans , Reproducibility of Results
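
Note on record 12: the abstract reports an area under the receiver operating characteristic curve (AUROC) and an area under the precision-recall curve (AUPRC). As a hedged illustration of what those two numbers measure, the sketch below computes both from scratch on toy labels and scores. The helper names `auroc` and `average_precision` and the toy data are assumptions for illustration. Average precision is used here as one common step-wise estimate of the area under the precision-recall curve, and it may differ from the estimator the authors used.

```python
import numpy as np

def auroc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as one half.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]  # all positive-negative score pairs
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))

def average_precision(labels, scores):
    """Mean of the precision values at the rank of each true positive."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")  # highest score first
    hits = labels[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(precision_at_k[hits].sum() / hits.sum())

# Toy example (assumption): 1 marks a known association, 0 an unlabeled pair.
y_true = [1, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.1]
print(auroc(y_true, y_score))              # 0.75
print(average_precision(y_true, y_score))  # 0.8333...
```

In a k-fold setting these functions would be applied to the held-out pairs of each fold and the results averaged.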

13.

UNMF: a unified nonnegative matrix factorization for multi-dimensional omics data.

Abe, Ko; Shimamura, Teppei.

Brief Bioinform ; 24(5), 2023 Sep 20.

Article in English | MEDLINE | ID: mdl-37478378

ABSTRACT

Factor analysis, ranging from principal component analysis to nonnegative matrix factorization, represents a foremost approach in analyzing multi-dimensional data to extract valuable patterns, and is increasingly being applied in the context of multi-dimensional omics datasets represented in tensor form. However, traditional analytical methods are heavily dependent on the format and structure of the data itself, and if these change even slightly, the analyst must change their data analysis strategy and techniques and spend a considerable amount of time on data preprocessing. Additionally, many traditional methods cannot be applied as-is in the presence of missing values in the data. We present a new statistical framework, unified nonnegative matrix factorization (UNMF), for finding informative patterns in messy biological data sets. UNMF is designed for tidy data format and structure, making data analysis easier and simplifying the development of data analysis tools. UNMF can handle a wide range of data structures and formats, and works seamlessly with tensor data including missing observations and repeated measurements. The usefulness of UNMF is demonstrated through its application to several multi-dimensional omics data, offering user-friendly and unified features for analysis and integration. Its application holds great potential for the life science community. UNMF is implemented with R and is available from GitHub (https://github.com/abikoushi/moltenNMF).

Subject(s)

Algorithms , Multiomics , Principal Component Analysis , Factor Analysis, Statistical

14.

Flexible model-based non-negative matrix factorization with application to mutational signatures.

Laursen, Ragnhild; Maretty, Lasse; Hobolth, Asger.

Stat Appl Genet Mol Biol ; 23(1), 2024 Jan 01.

Article in English | MEDLINE | ID: mdl-38753402

ABSTRACT

Somatic mutations in cancer can be viewed as a mixture distribution of several mutational signatures, which can be inferred using non-negative matrix factorization (NMF). Mutational signatures have previously been parametrized using either simple mono-nucleotide interaction models or general tri-nucleotide interaction models. We describe a flexible and novel framework for identifying biologically plausible parametrizations of mutational signatures, and in particular for estimating di-nucleotide interaction models. Our novel estimation procedure is based on the expectation-maximization (EM) algorithm and regression in the log-linear quasi-Poisson model. We show that di-nucleotide interaction signatures are statistically stable and sufficiently complex to fit the mutational patterns. Di-nucleotide interaction signatures often strike the right balance between appropriately fitting the data and avoiding over-fitting. They provide a better fit to data and are biologically more plausible than mono-nucleotide interaction signatures, and the parametrization is more stable than the parameter-rich tri-nucleotide interaction signatures. We illustrate our framework in a large simulation study where we compare to state-of-the-art methods, and show results for three data sets of somatic mutation counts from patients with cancer in the breast, liver and urinary tract.

Subject(s)

Algorithms , Mutation , Neoplasms , Humans , Neoplasms/genetics , Models, Genetic , Computer Simulation , Models, Statistical
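
Note on record 14: the abstract treats mutation counts as a mixture of signatures inferred by NMF under a Poisson-type model. The sketch below shows only an unparametrized baseline: NMF with multiplicative updates for the generalized Kullback-Leibler divergence, which is equivalent to maximizing a Poisson likelihood with mean W @ H. It does not implement the di-nucleotide parametrization or the quasi-Poisson regression step described by the authors. The 96 mutation types, 30 patients, 3 signatures, and the names `gkl` and `poisson_nmf` are assumptions chosen for illustration.

```python
import numpy as np

def gkl(V, WH, eps=1e-12):
    """Generalized Kullback-Leibler divergence D(V || WH)."""
    # Cells with a zero count contribute only WH, because 0 * log(0) := 0.
    ratio = np.where(V > 0, V / (WH + eps), 1.0)
    return float(np.sum(V * np.log(ratio) - V + WH))

def poisson_nmf(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Fit V (types x patients) as W @ H with KL multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + 0.1
    H = rng.random((rank, V.shape[1])) + 0.1
    for _ in range(n_iter):
        W *= ((V / (W @ H + eps)) @ H.T) / (H.sum(axis=1) + eps)
        H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
    # Normalize signatures to probability vectors; exposures absorb the scale.
    scale = W.sum(axis=0)
    return W / scale, H * scale[:, None]

# Simulated counts (assumption): 3 signatures over 96 mutation types.
rng = np.random.default_rng(2)
signatures = rng.dirichlet(np.ones(96), size=3).T        # shape (96, 3)
exposures = rng.gamma(2.0, 200.0, size=(3, 30))          # shape (3, 30)
V = rng.poisson(signatures @ exposures).astype(float)    # shape (96, 30)

W, H = poisson_nmf(V, rank=3)
print("divergence after fitting:", gkl(V, W @ H))
```

Each column of W is an estimated signature and each column of H holds one patient's exposures, here expressed as expected mutation counts per signature.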

15.

JLONMFSC: Clustering scRNA-seq data based on joint learning of non-negative matrix factorization and subspace clustering.

Lan, Wei; Liu, Mingyang; Chen, Jianwei; Ye, Jin; Zheng, Ruiqing; Zhu, Xiaoshu; Peng, Wei.

Methods ; 222: 1-9, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38128706

ABSTRACT

The development of single cell RNA sequencing (scRNA-seq) has provided new perspectives to study biological problems at the single cell level. One of the key issues in scRNA-seq data analysis is to divide cells into several clusters for discovering the heterogeneity and diversity of cells. However, the existing scRNA-seq data are high-dimensional, sparse, and noisy, which challenges the existing single-cell clustering methods. In this study, we propose a joint learning framework (JLONMFSC) for clustering scRNA-seq data. In our method, the dimension of the original data is reduced to minimize the effect of noise. In addition, the graph regularized matrix factorization is used to learn the local features. Further, the Low-Rank Representation (LRR) subspace clustering is utilized to learn the global features. Finally, the joint learning of local features and global features is performed to obtain the results of clustering. We compare the proposed algorithm with eight state-of-the-art algorithms for clustering performance on six datasets, and the experimental results demonstrate that the JLONMFSC achieves better performance in all datasets. The code is available at https://github.com/lanbiolab/JLONMFSC.

Subject(s)

Gene Expression Profiling , Single-Cell Gene Expression Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Cluster Analysis

16.

Computational chromatography: A machine learning strategy for demixing individual chemical components in complex mixtures.

Bajomo, Mary M; Ju, Yilong; Zhou, Jingyi; Elefterescu, Simina; Farr, Corbin; Zhao, Yiping; Neumann, Oara; Nordlander, Peter; Patel, Ankit; Halas, Naomi J.

Proc Natl Acad Sci U S A ; 119(52): e2211406119, 2022 Dec 27.

Article in English | MEDLINE | ID: mdl-36534806

ABSTRACT

Surface-enhanced Raman spectroscopy (SERS) holds exceptional promise as a streamlined chemical detection strategy for biological and environmental contaminants compared with current laboratory methods. Priority pollutants such as polycyclic aromatic hydrocarbons (PAHs), detectable in water and soil worldwide and known to induce multiple adverse health effects upon human exposure, are typically found in multicomponent mixtures. By combining the molecular fingerprinting capabilities of SERS with the signal separation and detection capabilities of machine learning (ML), we examine whether individual PAHs can be identified through an analysis of the SERS spectra of multicomponent PAH mixtures. We have developed an unsupervised ML method we call Characteristic Peak Extraction, a dimensionality reduction algorithm that extracts characteristic SERS peaks based on counts of detected peaks of the mixture. By analyzing the SERS spectra of two-component and four-component PAH mixtures where the concentration ratios of the various components vary, this algorithm is able to extract the spectra of each unknown component in the mixture of unknowns, which is then subsequently identified against a SERS spectral library of PAHs. Combining the molecular fingerprinting capabilities of SERS with the signal separation and detection capabilities of ML, this effort is a step toward the computational demixing of unknown chemical components occurring in complex multicomponent mixtures.

Subject(s)

Environmental Pollutants , Polycyclic Aromatic Hydrocarbons , Humans , Polycyclic Aromatic Hydrocarbons/analysis , Spectrum Analysis, Raman/methods , Water , Environmental Pollutants/analysis , Complex Mixtures , Machine Learning

17.

Revealing the Distribution of Lithium Compounds in Lithium Dendrites by Four-Dimensional Electron Microscopy Analysis.

Wang, Zeyu; Zhai, Wenbo; Yu, Yi.

Nano Lett ; 24(8): 2537-2543, 2024 Feb 28.

Article in English | MEDLINE | ID: mdl-38372692

ABSTRACT

Characterizing the microstructure of radiation- and chemical-sensitive lithium dendrites and their solid electrolyte interphase (SEI) is an important task when investigating the performance and reliability of lithium-ion batteries. Widely used methods, such as cryogenic high-resolution transmission electron microscopy and related spectroscopy, can reveal the local structure at the nanometer and atomic scales; however, they cannot show the distribution of the various crystal phases along the dendrite over a large field of view. In this work, two four-dimensional electron microscopy diffractive imaging methods, scanning electron nanodiffraction (SEND) and scanning convergent beam electron diffraction (SCBED), are employed to demonstrate a new pathway for characterizing sensitive lithium dendrite samples at room temperature and over a large field of view. Combined with the non-negative matrix factorization (NMF) algorithm, these methods clearly identify the orientations of different lithium metal grains along the dendrite as well as the different lithium compounds in the SEI layer.
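Several entries in this listing rely on non-negative matrix factorization (NMF). As a minimal sketch of the general technique, the code below factors a non-negative matrix V into W @ H using the standard Lee-Seung multiplicative updates for the squared (Frobenius) error. It is not the pipeline used in the paper above: the function name `nmf`, the toy data, the rank, and the iteration count are assumptions, and the rows of V merely stand in for flattened diffraction patterns at different scan positions.

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Factor non-negative V (m x n) as W (m x rank) @ H (rank x n)
    with Lee-Seung multiplicative updates for the Frobenius loss."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data (assumption): 40 "scan positions", each a non-negative mix of 2 "patterns".
rng = np.random.default_rng(1)
patterns = rng.random((2, 30))
weights = rng.random((40, 2))
V = weights @ patterns

W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Here each row of H plays the role of a recovered component pattern and each column of W gives that component's weight at every scan position, which is what allows components to be mapped across a field of view.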

18.

Denoiseit: denoising gene expression data using rank based isolation trees.

Jeon, Jaemin; Suk, Youjeong; Kim, Sang Cheol; Jo, Hye-Yeong; Kim, Kwangsoo; Jung, Inuk.

BMC Bioinformatics ; 25(1): 271, 2024 Aug 21.

Article in English | MEDLINE | ID: mdl-39169300

ABSTRACT

BACKGROUND: Selecting informative genes or eliminating uninformative ones before any downstream gene expression analysis is a standard task with great impact on the results. A carefully curated gene set significantly enhances the likelihood of identifying meaningful biomarkers. METHOD: In contrast to the conventional forward gene search methods that focus on selecting highly informative genes, we propose a backward search method, DenoiseIt, that aims to remove potential outlier genes, yielding a robust gene set with reduced noise. The gene set constructed by DenoiseIt is expected to capture biologically significant genes while pruning irrelevant ones to the greatest extent possible. It therefore also enhances the quality of downstream comparative gene expression analysis. DenoiseIt uses non-negative matrix factorization in conjunction with isolation forests to identify outlier rank features and remove their associated genes. RESULTS: DenoiseIt was applied to both bulk and single-cell RNA-seq data collected from TCGA and a COVID-19 cohort to show that it proficiently identified and removed genes exhibiting expression anomalies confined to specific samples rather than a known group. DenoiseIt was also shown to reduce the level of technical noise while preserving a higher proportion of biologically relevant genes compared to existing methods. The DenoiseIt software is publicly available on GitHub at https://github.com/cobi-git/DenoiseIt.

Subject(s)

COVID-19 , Gene Expression Profiling , Humans , COVID-19/genetics , COVID-19/virology , Gene Expression Profiling/methods , Software , Algorithms , Computational Biology/methods , SARS-CoV-2/genetics , RNA-Seq/methods

19.

eSVD-DE: cohort-wide differential expression in single-cell RNA-seq data using exponential-family embeddings.

Lin, Kevin Z; Qiu, Yixuan; Roeder, Kathryn.

BMC Bioinformatics ; 25(1): 113, 2024 Mar 15.

Article in English | MEDLINE | ID: mdl-38486150

ABSTRACT

BACKGROUND: Single-cell RNA-sequencing (scRNA) datasets are becoming increasingly popular in clinical and cohort studies, but there is a lack of methods to investigate differentially expressed (DE) genes among such datasets with numerous individuals. While numerous methods exist to find DE genes for scRNA data from limited individuals, differential-expression testing for large cohorts of case and control individuals using scRNA data poses unique challenges due to substantial effects of human variation, i.e., individual-level confounding covariates that are difficult to account for in the presence of sparsely observed genes. RESULTS: We develop eSVD-DE, a matrix factorization that pools information across genes and removes confounding covariate effects, followed by a novel two-sample test in mean expression between case and control individuals. In general, differential testing after dimension reduction yields an inflation of Type-1 errors. However, we overcome this by testing for differences between the case and control individuals' posterior mean distributions via a hierarchical model. In previously published datasets of various biological systems, eSVD-DE is more accurate and more powerful than other DE methods typically repurposed for analyzing cohort-wide differential expression. CONCLUSIONS: eSVD-DE offers a novel and powerful way to test for DE genes among cohorts after performing a dimension reduction. Accurate identification of differential expression on the individual level, instead of the cell level, is important for linking scRNA-seq studies to our understanding of the human population.

Subject(s)

Gene Expression Profiling , Single-Cell Gene Expression Analysis , Humans , Gene Expression Profiling/methods , Software , Single-Cell Analysis/methods

20.

Cauchy hyper-graph Laplacian nonnegative matrix factorization for single-cell RNA-sequencing data analysis.

Wang, Gao-Fei; Shen, Longying.

BMC Bioinformatics ; 25(1): 169, 2024 Apr 29.

Article in English | MEDLINE | ID: mdl-38684942

ABSTRACT

Advances in single-cell RNA sequencing (scRNA-seq) technology have led to many important biological findings. With this technology, it is now possible to investigate the connections among individual cells, genes, and diseases. Clustering is frequently used to analyze single-cell data. However, biological data usually contain a large amount of noise, and traditional clustering methods are sensitive to noise. In addition, the higher-order spatial information that can be obtained from the data alone is insufficient. As a result, obtaining trustworthy clustering results is challenging. We propose Cauchy hyper-graph Laplacian non-negative matrix factorization (CHLNMF) as a new approach to address these issues. In CHLNMF, we replace the Euclidean-distance-based measurement of conventional non-negative matrix factorization (NMF) with the Cauchy loss function (CLF), which can lessen the influence of noise. The model also incorporates a hyper-graph constraint, which takes into account the high-order relationships among the samples. The optimal solution of the CHLNMF model is then found using a half-quadratic optimization approach. Finally, using seven scRNA-seq datasets, we compare CHLNMF with nine other leading methods. Analysis of the experimental results confirms the validity of our technique.
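To illustrate why swapping a squared Euclidean loss for a Cauchy loss reduces sensitivity to noise, the sketch below compares the two on a few residuals. The abstract does not give the exact form used in CHLNMF, so this assumes one common form, ln(1 + r²/c²) with scale c; the function names and the `scale` parameter are assumptions. The weight function is the factor 1 / (1 + r²/c²) that appears, up to a constant, when this loss is differentiated, and it is the kind of per-residual weight that half-quadratic or reweighting schemes typically iterate on.

```python
import math

def squared_loss(residual):
    """Squared error: grows quadratically, so large residuals dominate."""
    return residual ** 2

def cauchy_loss(residual, scale=1.0):
    """Cauchy loss (assumed form): grows only logarithmically for large residuals."""
    return math.log(1.0 + (residual / scale) ** 2)

def half_quadratic_weight(residual, scale=1.0):
    """Weight implied by the Cauchy loss: near 1 for small residuals,
    approaching 0 for large ones, so outliers are down-weighted."""
    return 1.0 / (1.0 + (residual / scale) ** 2)

# Compare a small, a moderate, and an outlier-sized residual.
residuals = [0.1, 1.0, 10.0]
table = [(r, squared_loss(r), cauchy_loss(r), half_quadratic_weight(r))
         for r in residuals]
```

With `scale=1.0`, a residual of 10 contributes 100 to the squared loss but only about 4.6 to the Cauchy loss, which is the sense in which a single noisy entry has less influence on the fit.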

Subject(s)

Algorithms , Sequence Analysis, RNA , Single-Cell Analysis , Single-Cell Analysis/methods , Sequence Analysis, RNA/methods , Humans , Cluster Analysis , Computational Biology/methods
