Search | VHL Regional Portal

SigPrimedNet: A Signaling-Informed Neural Network for scRNA-seq Annotation of Known and Unknown Cell Types.

Gundogdu, Pelin; Alamo, Inmaculada; Nepomuceno-Chamorro, Isabel A; Dopazo, Joaquin; Loucera, Carlos.

Biology (Basel) ; 12(4), 2023 Apr 10.

Article in English | MEDLINE | ID: mdl-37106779

ABSTRACT

Single-cell RNA sequencing is increasing our understanding of the behavior of complex tissues or organs by providing unprecedented detail on the complex cell-type landscape at the level of individual cells. Cell type definition and functional annotation are key steps to understanding the molecular processes behind the underlying cellular communication machinery. However, the exponential growth of scRNA-seq data has made the task of manually annotating cells infeasible, due not only to the unparalleled resolution of the technology but also to the ever-increasing heterogeneity of the data. Many supervised and unsupervised methods have been proposed to automatically annotate cells. Supervised approaches for cell-type annotation outperform unsupervised methods except when new (unknown) cell types are present. Here, we introduce SigPrimedNet, an artificial neural network approach that leverages (i) efficient training by means of a sparsity-inducing signaling circuits-informed layer, (ii) feature representation learning through supervised training, and (iii) unknown cell-type identification by fitting an anomaly detection method on the learned representation. We show that SigPrimedNet can efficiently annotate known cell types while keeping a low false-positive rate for unseen cells across a set of publicly available datasets. In addition, the learned representation acts as a proxy for signaling circuit activity measurements, which provide useful estimates of cell functionality.
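The idea of a sparsity-inducing, circuit-informed layer can be illustrated with a masked linear layer. The following is a minimal sketch under stated assumptions, not the authors' implementation: the gene names, circuit names, weights, and the helper functions `build_mask` and `masked_layer` are all hypothetical, and a real model would learn the weights during supervised training.

```python
# Hypothetical toy data: which genes belong to which signaling circuit.
CIRCUITS = {
    "circuit_A": {"g1", "g2"},
    "circuit_B": {"g2", "g3"},
}
GENES = ["g1", "g2", "g3", "g4"]


def build_mask(genes, circuits):
    """Return a circuits x genes 0/1 matrix: 1.0 where the gene is in the circuit."""
    return [[1.0 if g in members else 0.0 for g in genes]
            for members in circuits.values()]


def masked_layer(x, weights, mask):
    """Linear layer whose weights are zeroed wherever the mask is 0.

    Each output unit only sees the genes of its own circuit, so the
    layer is sparse by construction.
    """
    return [sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
            for w_row, m_row in zip(weights, mask)]


mask = build_mask(GENES, CIRCUITS)
# Hypothetical weights; in practice these would be trained.
weights = [[0.5, 0.5, 0.5, 0.5],
           [1.0, 1.0, 1.0, 1.0]]
expression = [2.0, 4.0, 6.0, 8.0]  # one cell's expression over GENES
activity = masked_layer(expression, weights, mask)
```

Gene `g4` belongs to no circuit in this toy example, so it never contributes to any output unit, whatever its weight.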

Ten quick tips for biomarker discovery and validation analyses using machine learning.

Diaz-Uriarte, Ramon; Gómez de Lope, Elisa; Giugno, Rosalba; Fröhlich, Holger; Nazarov, Petr V; Nepomuceno-Chamorro, Isabel A; Rauschenberger, Armin; Glaab, Enrico.

PLoS Comput Biol ; 18(8): e1010357, 2022 08.

Article in English | MEDLINE | ID: mdl-35951526

Subject(s)

Biomedical Research; Machine Learning; Biomarkers; Computational Biology

Pairwise gene GO-based measures for biclustering of high-dimensional expression data.

Nepomuceno, Juan A; Troncoso, Alicia; Nepomuceno-Chamorro, Isabel A; Aguilar-Ruiz, Jesús S.

BioData Min ; 11: 4, 2018.

Article in English | MEDLINE | ID: mdl-29610579

ABSTRACT

BACKGROUND: Biclustering algorithms search for groups of genes that share the same behavior under a subset of samples in gene expression data. Nowadays, the biological knowledge available in public repositories can be used to drive these algorithms to find biclusters composed of functionally coherent groups of genes. On the other hand, a distance among genes can be defined according to the information stored about them in Gene Ontology (GO). Gene pairwise GO semantic similarity measures report a value for each pair of genes which establishes their functional similarity. A scatter search-based algorithm that optimizes a merit function integrating GO information is studied in this paper. This merit function uses a term that addresses the information through a GO measure. RESULTS: The effect of two different gene pairwise GO measures on the performance of the algorithm is analyzed. Firstly, three well-known yeast datasets with approximately one thousand genes are studied. Secondly, a group of human datasets related to clinical data of cancer is also explored by the algorithm. Most of these data are high-dimensional datasets composed of a huge number of genes. The resultant biclusters reveal groups of genes linked by the same functionality when the search procedure is driven by one of the proposed GO measures. Furthermore, a qualitative biological study of a group of biclusters shows their relevance from a cancer disease perspective. CONCLUSIONS: It can be concluded that the integration of biological information improves the performance of the biclustering process. The two different GO measures studied show an improvement in the results obtained for the yeast dataset. However, if datasets are composed of a huge number of genes, only one of them really improves the algorithm performance. This second case constitutes a clear option to explore interesting datasets from a clinical point of view.

Building Transcriptional Association Networks in Cytoscape with RegNetC.

Nepomuceno-Chamorro, Isabel A; Marquez-Chamorro, Alfonso; Aguilar-Ruiz, Jesus S.

IEEE/ACM Trans Comput Biol Bioinform ; 12(4): 823-4, 2015.

Article in English | MEDLINE | ID: mdl-26357322

ABSTRACT

The Regression Network plugin for Cytoscape (RegNetC) implements the RegNet algorithm for the inference of transcriptional association networks from gene expression profiles. This algorithm is a model tree-based method that detects the relationship between each gene and the remaining genes simultaneously, instead of analyzing each pair of genes individually as correlation-based methods do. Model trees are a very useful technique to estimate gene expression values by regression models, and they favour localized similarities over more global similarity; missing such local similarities is one of the major drawbacks of correlation-based methods. Here, we present an integrated software suite, named RegNetC, as a Cytoscape plugin that can operate on its own as well. RegNetC produces, according to user-defined parameters, the resulting transcriptional gene association network in .sif format for visualization and analysis, and it interoperates with other Cytoscape plugins; the network can also be exported for publication figures. In addition to the network, the RegNetC plugin also provides the quantitative relationships between the expression values of the genes involved in the inferred network, i.e., those defined by the regression models.
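For readers unfamiliar with the .sif output mentioned above, the following minimal sketch shows how a list of gene associations could be serialized in Cytoscape's simple interaction format, where each line holds a source node, an interaction type, and a target node. The gene names, the `to_sif` helper, and the interaction label `assoc` are hypothetical and are not taken from RegNetC.

```python
def to_sif(edges, interaction="assoc"):
    """Format (source, target) pairs as SIF lines: source<TAB>type<TAB>target."""
    return "\n".join(f"{src}\t{interaction}\t{dst}" for src, dst in edges)


# Hypothetical inferred associations between genes.
edges = [("geneA", "geneB"), ("geneA", "geneC")]
sif_text = to_sif(edges)
print(sif_text)
```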

Subject(s)

Gene Expression Profiling/methods; Software; Systems Biology/methods; Algorithms; Gene Regulatory Networks; Linear Models

Integrating biological knowledge based on functional annotations for biclustering of gene expression data.

Nepomuceno, Juan A; Troncoso, Alicia; Nepomuceno-Chamorro, Isabel A; Aguilar-Ruiz, Jesús S.

Comput Methods Programs Biomed ; 119(3): 163-80, 2015 May.

Article in English | MEDLINE | ID: mdl-25843807

ABSTRACT

Gene expression data analysis is based on the assumption that co-expressed genes imply co-regulated genes. This assumption is being reformulated because the co-expression of a group of genes may be the result of an independent activation with respect to the same experimental condition and not due to the same regulatory regime. For this reason, traditional techniques have recently been improved by using prior biological knowledge from open-access repositories together with gene expression data. Biclustering is an unsupervised machine learning technique that searches for patterns in gene expression data matrices. A scatter search-based biclustering algorithm that integrates biological information is proposed in this paper. In addition to the gene expression data matrix, the only input of the algorithm is a direct annotation file that relates each gene to a set of terms from a biological repository where genes are annotated. Two different biological measures, FracGO and SimNTO, are proposed to integrate this information by adding it to the fitness function to be optimized in the scatter search scheme. The measure FracGO is based on biological enrichment, and SimNTO is based on the overlap among GO annotations of pairs of genes. Experimental results evaluate the proposed algorithm on two datasets and show that the algorithm performs better when biological knowledge is integrated. Moreover, an analysis and comparison of the two biological measures is presented, and it is concluded that the differences depend on both the data source and how the annotation file has been built when GO is used. It is also shown that the proposed algorithm obtains a greater number of enriched biclusters than other classical biclustering algorithms typically used as benchmarks, and an analysis of the overlap among biclusters reveals that the biclusters obtained overlap little. The proposed methodology is a general-purpose algorithm which allows the integration of biological information from several sources and can be extended to other biclustering algorithms based on the optimization of a merit function.
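To make the notion of an overlap-based pairwise measure concrete, the sketch below scores a group of genes by the mean Jaccard overlap of their annotation term sets. This is only an illustration of the general idea: it is not the definition of SimNTO or FracGO, and the gene names, term labels, and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical direct annotation file content: gene -> set of annotation terms.
ANNOTATIONS = {
    "g1": {"t1", "t2"},
    "g2": {"t2", "t3"},
    "g3": {"t2"},
}


def jaccard(a, b):
    """Shared terms divided by all terms of the two genes (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_overlap(genes, annotations):
    """Average annotation overlap over all gene pairs of a candidate bicluster."""
    pairs = list(combinations(genes, 2))
    if not pairs:
        return 0.0
    total = sum(jaccard(annotations[x], annotations[y]) for x, y in pairs)
    return total / len(pairs)


score = mean_pairwise_overlap(["g1", "g2", "g3"], ANNOTATIONS)
```

A term of this kind could be combined with an expression-based term in a merit function so that the search favours biclusters whose genes share annotations.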

Subject(s)

Algorithms; Gene Expression Profiling/statistics & numerical data; Molecular Sequence Annotation/statistics & numerical data; Unsupervised Machine Learning/statistics & numerical data; Cluster Analysis; Data Mining; Databases, Genetic/statistics & numerical data; Gene Ontology/statistics & numerical data; Genes, Fungal; Knowledge Bases; Yeasts/genetics

Transcriptional response to cardiac injury in the zebrafish: systematic identification of genes with highly concordant activity across in vivo models.

Rodius, Sophie; Nazarov, Petr V; Nepomuceno-Chamorro, Isabel A; Jeanty, Céline; González-Rosa, Juan Manuel; Ibberson, Mark; da Costa, Ricardo M Benites; Xenarios, Ioannis; Mercader, Nadia; Azuaje, Francisco.

BMC Genomics ; 15: 852, 2014 Oct 03.

Article in English | MEDLINE | ID: mdl-25280539

ABSTRACT

BACKGROUND: Zebrafish is a clinically-relevant model of heart regeneration. Unlike mammals, it has a remarkable heart repair capacity after injury, and promises novel translational applications. Amputation and cryoinjury models are key research tools for understanding injury response and regeneration in vivo. An understanding of the transcriptional responses following injury is needed to identify key players of heart tissue repair, as well as potential targets for boosting this property in humans. RESULTS: We investigated amputation and cryoinjury in vivo models of heart damage in the zebrafish through unbiased, integrative analyses of independent molecular datasets. To detect genes with potential biological roles, we derived computational prediction models with microarray data from heart amputation experiments. We focused on a top-ranked set of genes highly activated in the early post-injury stage, whose activity was further verified in independent microarray datasets. Next, we performed independent validations of expression responses with qPCR in a cryoinjury model. Across in vivo models, the top candidates showed highly concordant responses at 1 and 3 days post-injury, which highlights the predictive power of our analysis strategies and the possible biological relevance of these genes. Top candidates are significantly involved in cell fate specification and differentiation, and include heart failure markers such as periostin, as well as potential new targets for heart regeneration. For example, ptgis and ca2 were overexpressed, while usp2a, a regulator of the p53 pathway, was down-regulated in our in vivo models. Interestingly, a high activity of ptgis and ca2 has been previously observed in failing hearts from rats and humans. CONCLUSIONS: We identified genes with potential critical roles in the response to cardiac damage in the zebrafish. Their transcriptional activities are reproducible in different in vivo models of cardiac injury.

Subject(s)

Heart Injuries/metabolism; Animals; Computational Biology; Cytochrome P-450 Enzyme System/genetics; Cytochrome P-450 Enzyme System/metabolism; Disease Models, Animal; Endopeptidases/genetics; Endopeptidases/metabolism; Heart/physiology; Heart Injuries/genetics; Heart Injuries/pathology; Myocardium/metabolism; Myocardium/pathology; Oligonucleotide Array Sequence Analysis; Real-Time Polymerase Chain Reaction; Regeneration; Time Factors; Transcriptome; Tumor Suppressor Protein p53/genetics; Tumor Suppressor Protein p53/metabolism; Zebrafish; Zebrafish Proteins/genetics; Zebrafish Proteins/metabolism

CarGene: characterisation of sets of genes based on metabolic pathways analysis.

Aguilar-Ruiz, Jesus S; Rodriguez-Baena, Domingo S; Diaz-Diaz, Norberto; Nepomuceno-Chamorro, Isabel A.

Int J Data Min Bioinform ; 5(5): 558-73, 2011.

Article in English | MEDLINE | ID: mdl-22145534

ABSTRACT

The great amount of biological information provides scientists with an incomparable framework for testing the results of new algorithms. Several tools have been developed for analysing gene enrichment, and most of them are Gene Ontology-based tools. We developed a Kyoto Encyclopedia of Genes and Genomes (KEGG)-based tool that provides a friendly graphical environment for analysing gene enrichment. The tool integrates two statistical corrections and simultaneously analyses the information about many groups of genes in both a visual and a textual manner. We tested the usefulness of our approach on a previous analysis (Huttenshower et al.). Furthermore, our tool is freely available (http://www.upo.es/eps/bigs/cargene.html).
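Pathway enrichment of a gene set is commonly assessed with a hypergeometric test followed by a multiple-testing correction. The sketch below shows that general recipe with a Bonferroni correction. It is an assumption for illustration only: the abstract does not say which test or which two corrections the tool uses, and the function names and numbers here are hypothetical.

```python
from math import comb


def hypergeom_pvalue(N, K, n, k):
    """P(X >= k): chance of seeing at least k pathway genes in a set of n genes
    drawn from N genes in total, K of which belong to the pathway."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1.0."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Hypothetical example: 10 genes in total, 5 in the pathway, a gene set of 5,
# all of which fall in the pathway.
p_single = hypergeom_pvalue(10, 5, 5, 5)
adjusted = bonferroni([0.01, 0.6])
```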

Subject(s)

Proteins/genetics; Software; Animals; Databases, Factual; Gene Expression; Genes; Genome; Metabolic Networks and Pathways/genetics; Proteins/metabolism

Inferring gene regression networks with model trees.

Nepomuceno-Chamorro, Isabel A; Aguilar-Ruiz, Jesus S; Riquelme, Jose C.

BMC Bioinformatics ; 11: 517, 2010 Oct 15.

Article in English | MEDLINE | ID: mdl-20950452

ABSTRACT

BACKGROUND: Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes, building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful for determining whether two genes have a strong global similarity, but they do not detect local similarities. RESULTS: We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph of all the relationships among output and input genes is built, taking into account whether each pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: the Saccharomyces cerevisiae and E. coli data sets. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the Regulon database) is used as a control to compare the results to those of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. CONCLUSIONS: REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can fit different linear regressions to separate areas of the search space, favoring the inference of localized similarities over a more global similarity. Furthermore, experimental results show the good performance of REGNET.
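The step of keeping only statistically significant gene pairs while controlling the false discovery rate can be sketched with the Benjamini-Hochberg step-up procedure. This is an assumption for illustration: the abstract does not name the procedure REGNET applies, and the edge p-values and the `benjamini_hochberg` helper below are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    # Find the largest rank whose p-value is under its step-up threshold.
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for candidate gene-gene relationships.
edge_pvalues = {
    ("g1", "g2"): 0.001,
    ("g1", "g3"): 0.02,
    ("g2", "g3"): 0.03,
    ("g3", "g4"): 0.5,
}
flags = benjamini_hochberg(list(edge_pvalues.values()))
kept_edges = [edge for edge, keep in zip(edge_pvalues, flags) if keep]
```

Only the edges in `kept_edges` would then be added to the association graph.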

Subject(s)

Computational Biology/methods; Gene Regulatory Networks; Databases, Genetic; Escherichia coli/genetics; Linear Models; Saccharomyces cerevisiae/genetics; Transcription, Genetic
