ABSTRACT
Asthma is a common chronic airway disease worldwide. Because of its clinical and genetic heterogeneity, the cellular and molecular processes underlying asthma are highly complex and remain poorly understood. To discover novel biomarkers and molecular mechanisms, several studies have examined gene expression patterns in epithelium through microarray analysis. However, few robust, specific biomarkers have been identified, and some results are inconsistent. A more robust analysis is therefore needed. Herein, an integrated gene expression analysis of ten independent, publicly available microarray datasets of bronchial epithelial cells from 348 asthmatic patients and 208 healthy controls was performed. As a result, 78 up-regulated and 75 down-regulated genes were identified in the bronchial epithelium of asthmatics. Comprehensive functional enrichment and pathway analysis revealed that response to chemical stimulus, extracellular region, pathways in cancer, and arachidonic acid metabolism were the four most significantly enriched terms. In the protein-protein interaction network, three main communities associated with cytoskeleton, response to lipid, and regulation of response to stimulus were established, and the six most highly ranked hub genes (up-regulated CD44, KRT6A, CEACAM5, and SERPINB2, and down-regulated LTF and MUC5B) were identified and should be considered as new biomarkers. Pathway cross-talk analysis highlights that signaling pathways mediated by IL-4/13 and the transcription factors HIF-1α and FOXA1 play crucial roles in the pathogenesis of asthma. Interestingly, three chemicals, the polyphenol catechin, the antibiotic lomefloxacin, and the natural alkaloid boldine, were predicted as potential drugs for asthma treatment. Taken together, our findings shed new light on the common molecular pathogenesis mechanisms of asthma and provide theoretical support for further clinical therapeutic studies.
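The abstract does not say how hub genes were ranked within the protein-protein interaction network. As an illustration only, the sketch below ranks nodes by degree (number of interaction partners), one common way to define a hub. The function name `top_hubs` and the node labels `G1` to `G5` are hypothetical and are not taken from the study.

```python
from collections import Counter


def top_hubs(edges, k):
    """Return the k nodes with the most interaction partners.

    `edges` is an iterable of (node_a, node_b) pairs describing an
    undirected network. Ties are broken alphabetically so the result
    is deterministic.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(degree, key=lambda node: (-degree[node], node))[:k]


# Toy network with hypothetical node labels (not real interaction data).
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"), ("G4", "G5")]
print(top_hubs(toy_edges, 2))
```

Degree is only one possible ranking criterion; betweenness or other centrality measures may order the same network differently.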
Subject(s)
Asthma/diagnosis; Systems Biology/methods; Asthma/genetics; Asthma/metabolism; Asthma/pathology; Biomarkers/analysis; Biomarkers/metabolism; Gene Expression Regulation; Gene Regulatory Networks; Humans; Protein Interaction Maps; Transcriptome
ABSTRACT
BACKGROUND: Single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) have led to groundbreaking advancements in life sciences. To develop bioinformatics tools for scRNA-seq and SRT data and perform unbiased benchmarks, data simulation has been widely adopted because it provides explicit ground truth and generates customized datasets. However, the performance of simulation methods under multiple scenarios has not been comprehensively assessed, making it challenging to choose suitable methods without practical guidelines. RESULTS: We systematically evaluated 49 simulation methods developed for scRNA-seq and/or SRT data in terms of accuracy, functionality, scalability, and usability using 152 reference datasets derived from 24 platforms. SRTsim, scDesign3, ZINB-WaVE, and scDesign2 have the best accuracy performance across various platforms. Unexpectedly, some methods tailored to scRNA-seq data have potential compatibility for simulating SRT data. Lun, SPARSim, and scDesign3-tree outperform other methods under corresponding simulation scenarios. Phenopath, Lun, Simple, and MFA yield high scalability scores but cannot generate realistic simulated data. Users should consider the trade-offs between method accuracy and scalability (or functionality) when making decisions. Additionally, execution errors are mainly caused by failed parameter estimations and the appearance of missing or infinite values in calculations. We provide practical guidelines for method selection, a standard pipeline Simpipe ( https://github.com/duohongrui/simpipe ; https://doi.org/10.5281/zenodo.11178409 ), and an online tool Simsite ( https://www.ciblab.net/software/simshiny/ ) for data simulation. CONCLUSIONS: No method performs best on all criteria, thus a good-yet-not-the-best method is recommended if it solves problems effectively and reasonably. Our comprehensive work provides crucial insights for developers on modeling gene expression data and facilitates the simulation process for users.
Subject(s)
Gene Expression Profiling; Single-Cell Analysis; Single-Cell Analysis/methods; Gene Expression Profiling/methods; Humans; Software; Computer Simulation; Transcriptome; Computational Biology/methods; Sequence Analysis, RNA/methods; RNA-Seq/methods; RNA-Seq/standards
ABSTRACT
With the rapid development of single-cell RNA-sequencing techniques, various computational methods and tools have been proposed to analyze these high-throughput data, accelerating the discovery of potential biological information. As one of the core steps of single-cell transcriptome data analysis, clustering plays a crucial role in identifying cell types and interpreting cellular heterogeneity. However, the results generated by different clustering methods differ, and these unstable partitions can affect the accuracy of the analysis to a certain extent. To overcome this challenge and obtain more accurate results, clustering ensembles are now frequently applied to cluster analysis of single-cell transcriptome datasets, and the results generated by clustering ensembles are generally more reliable than those from most single clustering partitions. In this review, we summarize applications and challenges of the clustering ensemble method in single-cell transcriptome data analysis, and provide constructive thoughts and references for researchers in this field.
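To make the idea of a clustering ensemble concrete, here is a minimal sketch of one generic consensus scheme, evidence accumulation over a co-association matrix. It assumes each base clustering is given as a list of labels, one per cell. Two cells are linked when they share a cluster in more than a chosen fraction of the base partitions, and the connected groups become the consensus clusters. The function name `consensus_clusters` and the threshold of 0.5 are illustrative assumptions, not a method taken from the review.

```python
from itertools import combinations


def consensus_clusters(partitions, threshold=0.5):
    """Combine several clusterings of the same cells into one partition.

    `partitions` is a list of label lists, one label per cell. Label
    values need not match across partitions; only co-membership counts.
    """
    n = len(partitions[0])
    parent = list(range(n))  # union-find forest over cells

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        together = sum(1 for p in partitions if p[i] == p[j])
        if together / len(partitions) > threshold:
            parent[find(j)] = find(i)  # merge the two groups

    # Relabel groups 0, 1, 2, ... in order of first appearance.
    labels, seen = [], {}
    for i in range(n):
        labels.append(seen.setdefault(find(i), len(seen)))
    return labels


# Three toy partitions of four cells; the third one disagrees on cell 2.
toy = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
print(consensus_clusters(toy))  # [0, 0, 1, 1]
```

In the toy input the first two partitions agree up to label renaming, so the single disagreement in the third partition is outvoted.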
Subject(s)
Single-Cell Analysis; Single-Cell Gene Expression Analysis; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Transcriptome/genetics; Cluster Analysis; Gene Expression Profiling/methods; Algorithms
ABSTRACT
For accurate gene expression quantification, normalization of gene expression data against reliable reference genes is required. The expression levels of commonly used reference genes are known to vary considerably under different experimental conditions, which limits their use for data normalization. In this study, an unbiased identification of reference genes in Caenorhabditis elegans was performed based on 145 microarray datasets (2296 gene array samples) covering different developmental stages, different tissues, drug treatments, lifestyle, and various stresses. As a result, thirteen housekeeping genes (rps-23, rps-26, rps-27, rps-16, rps-2, rps-4, rps-17, rpl-24.1, rpl-27, rpl-33, rpl-36, rpl-35, and rpl-15) with enhanced stability were identified using six popular normalization algorithms and the RankAggreg method. Functional enrichment analysis revealed that these genes were significantly overrepresented in GO terms or KEGG pathways related to ribosomes. Validation analysis using recently published datasets revealed that the expression of the newly identified candidate reference genes was more stable than that of the commonly used reference genes. Based on these results, we recommend rpl-33 and rps-26 as the optimal reference genes for microarray data validation and rps-2 and rps-4 for RNA-sequencing data validation. More importantly, rps-23, the most stable gene, should be a promising reference gene for both data types. This study presents, for the first time, a large-scale, microarray-data-driven, genome-wide identification of stable reference genes for normalizing gene expression data, and provides a potential guideline for selecting universal internal reference genes in C. elegans for quantitative gene expression analysis.
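The general idea of scoring candidate reference genes with several stability measures and then merging the rankings can be sketched as follows. This is a simplified illustration. The two measures used here (coefficient of variation and relative range) and the mean-rank aggregation are stand-ins chosen for brevity; they are not the six algorithms or the RankAggreg procedure used in the study. Gene names and expression values are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def relative_range(values):
    """Spread between extremes, scaled by the mean."""
    return (max(values) - min(values)) / mean(values)


def rank_genes(expression, score):
    """Rank genes by a stability score; rank 1 is the most stable."""
    scores = {gene: score(vals) for gene, vals in expression.items()}
    ordered = sorted(scores, key=lambda gene: (scores[gene], gene))
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}


def aggregate(rankings):
    """Order genes by their mean rank across all rankings."""
    genes = list(rankings[0])
    mean_rank = {g: mean(r[g] for r in rankings) for g in genes}
    return sorted(genes, key=lambda g: (mean_rank[g], g))


# Hypothetical expression values across four samples.
toy_expression = {
    "gene_a": [10, 10, 10, 10],
    "gene_b": [8, 12, 8, 12],
    "gene_c": [2, 18, 2, 18],
}
rankings = [
    rank_genes(toy_expression, coefficient_of_variation),
    rank_genes(toy_expression, relative_range),
]
print(aggregate(rankings))  # ['gene_a', 'gene_b', 'gene_c']
```

Mean-rank aggregation treats every measure equally, which is the simplest design choice; weighted or optimization-based aggregation may order genes differently when the individual rankings disagree.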