Search | VHL Regional Portal

Identification of Interpretable Clusters and Associated Signatures in Breast Cancer Single-Cell Data: A Topic Modeling Approach.

Malagoli, Gabriele; Valle, Filippo; Barillot, Emmanuel; Caselle, Michele; Martignetti, Loredana.

Cancers (Basel) ; 16(7)2024 Mar 29.

Article in English | MEDLINE | ID: mdl-38611028

ABSTRACT

Topic modeling is a popular technique in machine learning and natural language processing, where a corpus of text documents is classified into themes or topics using word frequency analysis. This approach has proven successful in various biological data analysis applications, such as predicting cancer subtypes with high accuracy and identifying genes, enhancers, and stable cell types simultaneously from sparse single-cell epigenomics data. The advantage of using a topic model is that it not only serves as a clustering algorithm, but it can also explain clustering results by providing word probability distributions over topics. Our study proposes a novel topic modeling approach for clustering single cells and detecting topics (gene signatures) in single-cell datasets that measure multiple omics simultaneously. We applied this approach to examine the transcriptional heterogeneity of luminal and triple-negative breast cancer cells using patient-derived xenograft models with acquired resistance to chemotherapy and targeted therapy. Through this approach, we identified protein-coding genes and long non-coding RNAs (lncRNAs) that group thousands of cells into biologically similar clusters, accurately distinguishing drug-sensitive and -resistant breast cancer types. In comparison to standard state-of-the-art clustering analyses, our approach offers an optimal partitioning of genes into topics and cells into clusters simultaneously, producing easily interpretable clustering outcomes. Additionally, we demonstrate that an integrative clustering approach, which combines the information from mRNAs and lncRNAs treated as disjoint omics layers, enhances the accuracy of cell classification.
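
The interpretability point above can be made concrete with a small sketch. It assumes a topic model has already been fitted and has returned two tables: a per-cell topic distribution and a per-topic gene distribution. The names `theta`, `phi`, `assign_clusters`, and `top_genes`, the gene labels, and every number are hypothetical and chosen for illustration; this is not the authors' algorithm, only a picture of how one fitted model can yield both clusters and gene signatures.

```python
# Toy illustration: every name and number below is made up.

# theta[cell][k]: probability of topic k in that cell (each row sums to 1).
theta = {
    "cell_1": [0.90, 0.10],
    "cell_2": [0.80, 0.20],
    "cell_3": [0.15, 0.85],
}

# phi[k][gene]: probability of that gene within topic k.
phi = [
    {"GENE_A": 0.6, "GENE_B": 0.3, "GENE_C": 0.1},
    {"GENE_A": 0.1, "GENE_B": 0.2, "GENE_C": 0.7},
]


def assign_clusters(cell_topics):
    """Assign each cell to the index of its most probable topic."""
    return {
        cell: max(range(len(probs)), key=probs.__getitem__)
        for cell, probs in cell_topics.items()
    }


def top_genes(topic_genes, n=2):
    """Return the n most probable genes per topic, i.e. its signature."""
    return [sorted(topic, key=topic.get, reverse=True)[:n] for topic in topic_genes]


clusters = assign_clusters(theta)
signatures = top_genes(phi)
```

Here the cluster labels and the gene lists come from the same fitted model, which is what lets each cluster be read through the genes that dominate its topic.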

Integrated microRNA and proteome analysis of cancer datasets with MoPC.

Lovino, Marta; Ficarra, Elisa; Martignetti, Loredana.

PLoS One ; 19(3): e0289699, 2024.

Article in English | MEDLINE | ID: mdl-38512819

ABSTRACT

MicroRNAs (miRNAs) are small molecules that play an essential role in regulating gene expression by post-transcriptional gene silencing. Their study is crucial in revealing the fundamental processes underlying pathologies and, in particular, cancer. To date, most studies on miRNA regulation consider the effect of specific miRNAs on specific target mRNAs, providing wet-lab validation. However, few tools have been developed to explain miRNA-mediated regulation at the protein level. In this paper, we present the MoPC computational tool, which relies on the partial correlation between mRNAs and proteins, conditioned on miRNA expression, to predict miRNA-target interactions in multi-omic datasets. MoPC returns the list of significant miRNA-target interactions and plots the significant correlations on a heatmap in which the miRNAs and targets are ordered by chromosomal location. The software was applied to three TCGA/CPTAC datasets (breast, glioblastoma, and lung cancer), returning enriched results in three independent target databases.
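
As a rough illustration of the quantity the abstract names, the sketch below computes a first-order partial correlation between an mRNA profile and a protein profile while controlling for one miRNA profile. It uses the textbook formula r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The abstract does not describe MoPC's actual estimator or significance test, so the function names and toy values here are assumptions, not the tool's API.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical toy profiles across four samples (not real data).
mirna = [1, -1, -1, 1]
mrna = [2, 0, -2, 0]      # miRNA signal plus an independent component
protein = [2, -2, 0, 0]   # miRNA signal plus a different independent component

r_raw = pearson(mrna, protein)
r_partial = partial_correlation(mrna, protein, mirna)
```

In this toy data the raw mRNA-protein correlation is 0.5, while the partial correlation is essentially zero: the association is fully accounted for by the shared miRNA profile.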

Subjects

MicroRNAs , Neoplasms , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Proteome/genetics , Proteome/metabolism , Neoplasms/genetics , Software , Messenger RNA/genetics , Messenger RNA/metabolism , Computational Biology/methods , Gene Expression Profiling , Neoplastic Gene Expression Regulation

Representation and quantification of module activity from omics data with rROMA.

Najm, Matthieu; Cornet, Matthieu; Albergante, Luca; Zinovyev, Andrei; Sermet-Gaudelus, Isabelle; Stoven, Véronique; Calzone, Laurence; Martignetti, Loredana.

NPJ Syst Biol Appl ; 10(1): 8, 2024 Jan 19.

Article in English | MEDLINE | ID: mdl-38242871

ABSTRACT

The efficiency of analyzing high-throughput data in systems biology has been demonstrated in numerous studies, where molecular data, such as transcriptomics and proteomics, offer great opportunities for understanding the complexity of biological processes. One important aspect of data analysis in systems biology is the shift from a reductionist approach that focuses on individual components to a more integrative perspective that considers the system as a whole, with the emphasis moving from differential expression of individual genes to the activity of gene sets. Here, we present the rROMA software package for fast and accurate computation of the activity of gene sets with coordinated expression. The rROMA package incorporates significant improvements in the calculation algorithm, along with the implementation of several functions for statistical analysis and visualizing results. These additions greatly expand the package's capabilities and offer valuable tools for data analysis and interpretation. It is an open-source package available on GitHub at: www.github.com/sysbio-curie/rROMA. Based on publicly available transcriptomic datasets, we applied rROMA to cystic fibrosis, highlighting biological mechanisms potentially involved in the establishment and progression of the disease and the associated genes. Results indicate that rROMA can detect disease-related active signaling pathways using transcriptomic and proteomic data. The results notably identified a significant mechanism relevant to cystic fibrosis, raised awareness of a possible bias related to cell culture, and uncovered an intriguing gene that warrants further investigation.
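
One common way to summarize the activity of a coordinately expressed gene set is the first principal component of its expression matrix, with the fraction of variance it explains as a measure of coordination. The abstract does not spell out rROMA's algorithm, and rROMA itself is an R package, so the Python sketch below is only an assumption-laden illustration of that general idea: it does not call rROMA, and `module_activity`, its helpers, and the toy matrix are hypothetical.

```python
import math


def center_rows(matrix):
    """Subtract each gene's mean across samples (rows are genes)."""
    centered = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered.append([value - mean for value in row])
    return centered


def gene_covariance(centered):
    """Gene-by-gene sample covariance of a centered genes x samples matrix."""
    n = len(centered[0])
    return [[sum(a * b for a, b in zip(gi, gj)) / (n - 1) for gj in centered]
            for gi in centered]


def leading_component(cov, iterations=200):
    """Power iteration for the top eigenvector and eigenvalue.

    Assumes the all-ones start vector is not orthogonal to the top
    eigenvector and that the covariance matrix is not all zeros.
    """
    vec = [1.0] * len(cov)
    for _ in range(iterations):
        nxt = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(v * sum(c * w for c, w in zip(row, vec))
                     for v, row in zip(vec, cov))
    return vec, eigenvalue


def module_activity(expression):
    """Return gene weights, per-sample scores, and variance explained by PC1."""
    centered = center_rows(expression)
    cov = gene_covariance(centered)
    weights, eigenvalue = leading_component(cov)
    explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
    n_samples = len(centered[0])
    scores = [sum(w * gene[s] for w, gene in zip(weights, centered))
              for s in range(n_samples)]
    return weights, scores, explained


# Hypothetical gene set: three genes (rows) measured in four samples (columns).
toy_expression = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [-1, -2, -3, -4],
]
weights, scores, explained = module_activity(toy_expression)
```

Because the three toy genes vary in perfect lockstep, the first component explains essentially all of the variance; a gene set without coordinated expression would give a much lower fraction.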

Subjects

Cystic Fibrosis , Proteomics , Humans , Proteomics/methods , Gene Expression Profiling/methods , Transcriptome/genetics , Systems Biology/methods
