Search | Secretaria de Estado da Saúde

Characterizing efficient feature selection for single-cell expression analysis.

Cho, Juok; Baik, Bukyung; Nguyen, Hai C T; Park, Daeui; Nam, Dougu.

Brief Bioinform ; 25(4), 2024 May 23.

Article in English | MEDLINE | ID: mdl-38975891

ABSTRACT

Unsupervised feature selection is a critical step for efficient and accurate analysis of single-cell RNA-seq data. Previous benchmarks used two different criteria to compare feature selection methods: (i) the proportion of ground-truth marker genes included in the selected features and (ii) the accuracy of cell clustering using ground-truth cell types. Here, we systematically compare the performance of 11 feature selection methods for both criteria. We first demonstrate the discordance between these criteria and suggest using the latter. We then compare the distributions of mean expression of the selected genes across feature selection methods. We show that lowly expressed genes exhibit very high coefficients of variation and are mostly excluded by high-performance methods. In particular, high-deviation- and high-expression-based methods outperform the method widely used in the Seurat package in clustering cells and data visualization. We further show they also enable a clear separation of the same cell type from different tissues as well as accurate estimation of cell trajectories.
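As a rough illustration of the point about lowly expressed genes, the sketch below computes a per-gene coefficient of variation (standard deviation divided by mean) and then ranks genes by mean expression. The gene names, the toy counts, and both helper functions are hypothetical and are not taken from the paper or from any package it benchmarks.

```python
import statistics


def coefficient_of_variation(counts):
    """Return sd / mean for one gene's counts across cells (population sd)."""
    mean = statistics.fmean(counts)
    if mean == 0:
        return float("inf")  # CV is undefined for an all-zero gene
    return statistics.pstdev(counts) / mean


def select_high_expression(expr, n_features):
    """Pick the n_features genes with the highest mean expression."""
    ranked = sorted(expr, key=lambda g: statistics.fmean(expr[g]), reverse=True)
    return ranked[:n_features]


# Toy counts (made-up values): one list of 8 cells per gene.
expr = {
    "gene_low": [0, 0, 0, 1, 0, 0, 0, 0],
    "gene_mid": [2, 3, 1, 4, 2, 3, 2, 3],
    "gene_high": [20, 25, 18, 30, 22, 27, 19, 24],
}

cv = {gene: coefficient_of_variation(values) for gene, values in expr.items()}
selected = select_high_expression(expr, n_features=2)
```

In this toy example the nearly silent gene has by far the largest coefficient of variation, so a selection rule driven by the coefficient of variation alone would favor it, whereas ranking by mean expression leaves it out.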

Subjects

Single-Cell Analysis ; Single-Cell Analysis/methods ; Cluster Analysis ; Humans ; Gene Expression Profiling/methods ; Algorithms ; Computational Biology/methods ; Sequence Analysis, RNA/methods ; RNA-Seq/methods

Biclustering analysis of transcriptome big data identifies condition-specific microRNA targets.

Yoon, Sora; Nguyen, Hai C T; Jo, Woobeen; Kim, Jinhwan; Chi, Sang-Mun; Park, Jiyoung; Kim, Seon-Young; Nam, Dougu.

Nucleic Acids Res ; 47(9): e53, 2019 05 21.

Article in English | MEDLINE | ID: mdl-30820547

ABSTRACT

We present a novel approach to identify human microRNA (miRNA) regulatory modules (mRNA targets and relevant cell conditions) by biclustering a large collection of mRNA fold-change data for sequence-specific targets. Bicluster targets were assessed using validated messenger RNA (mRNA) targets and exhibited, on average, a 17.0% (median 19.4%) improvement in gain in certainty (sensitivity + specificity). The net gain was further increased up to 32.0% (median 33.4%) by incorporating functional networks of targets. We analyzed cancer-specific biclusters and found that the PI3K/Akt signaling pathway is strongly enriched with targets of a few miRNAs in breast cancer and diffuse large B-cell lymphoma. Indeed, five independent prognostic miRNAs were identified, and repression of bicluster targets and pathway activity by miR-29 was experimentally validated. In total, 29 898 biclusters for 459 human miRNAs were collected in the BiMIR database, where biclusters are searchable by miRNA, tissue, disease, keyword and target gene.
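The abstract scores target predictions by gain in certainty, which it describes as sensitivity + specificity. A minimal sketch of that score on made-up gene sets follows; the function name, the gene identifiers, and the two prediction sets are hypothetical and do not reproduce the paper's evaluation pipeline.

```python
def gain_in_certainty(predicted, validated, universe):
    """Return sensitivity + specificity of a predicted target set.

    predicted: genes called targets; validated: experimentally validated
    targets; universe: all genes considered. All three are Python sets.
    """
    negatives = universe - validated
    if not validated or not negatives:
        raise ValueError("need at least one validated and one non-validated gene")
    sensitivity = len(predicted & validated) / len(validated)
    specificity = len(negatives - predicted) / len(negatives)
    return sensitivity + specificity


# Toy data (made-up): 10 genes, 4 of them validated targets.
universe = {f"g{i}" for i in range(1, 11)}
validated = {"g1", "g2", "g3", "g4"}
sequence_based = {"g1", "g2", "g3", "g5", "g6", "g7", "g8"}  # broad prediction
bicluster_based = {"g1", "g2", "g3", "g5"}                   # narrower prediction

score_sequence = gain_in_certainty(sequence_based, validated, universe)
score_bicluster = gain_in_certainty(bicluster_based, validated, universe)
```

Here both prediction sets recover the same three validated targets, but the narrower set makes fewer false calls, so its specificity and therefore its combined score are higher.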

Subjects

Big Data ; Gene Expression Profiling/methods ; Gene Regulatory Networks/genetics ; MicroRNAs/genetics ; Breast Neoplasms/genetics ; Breast Neoplasms/pathology ; Databases, Genetic ; Female ; Gene Expression Regulation, Neoplastic/genetics ; Humans ; Lymphoma, Large B-Cell, Diffuse/genetics ; Lymphoma, Large B-Cell, Diffuse/pathology ; Phosphatidylinositol 3-Kinases/genetics ; Prognosis ; Proto-Oncogene Proteins c-akt/genetics ; Signal Transduction/genetics ; Transcriptome/genetics

Efficient pathway enrichment and network analysis of GWAS summary data using GSA-SNP2.

Yoon, Sora; Nguyen, Hai C T; Yoo, Yun J; Kim, Jinhwan; Baik, Bukyung; Kim, Sounkou; Kim, Jin; Kim, Sangsoo; Nam, Dougu.

Nucleic Acids Res ; 46(10): e60, 2018 06 01.

Article in English | MEDLINE | ID: mdl-29562348

ABSTRACT

Pathway-based analysis in genome-wide association studies (GWAS) is widely used to uncover novel multi-genic functional associations. Many of these pathway-based methods have been used to test the enrichment of the associated genes in the pathways, but exhibited low power and were highly affected by free parameters. We present a novel method and software, GSA-SNP2, for pathway enrichment analysis of GWAS P-value data. GSA-SNP2 provides high power, decent type I error control and fast computation by incorporating the random set model and a SNP-count adjusted gene score. In a comparative study using simulated and real GWAS data, GSA-SNP2 exhibited high power and best prioritized gold standard positive pathways compared with six existing enrichment-based methods and two self-contained methods (an alternative pathway analysis approach). Based on these results, the difference between pathway analysis approaches was investigated and the effects of the gene correlation structures on the pathway enrichment analysis were also discussed. In addition, GSA-SNP2 is able to visualize protein interaction networks within and across the significant pathways so that the user can prioritize the core subnetworks for further studies. GSA-SNP2 is freely available at https://sourceforge.net/projects/gsasnp2.
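For readers unfamiliar with the random set model mentioned above, the following sketch computes a generic random-set z-score: the mean gene score of a pathway is compared with the mean and variance expected for a random gene set of the same size drawn without replacement. It is not the GSA-SNP2 implementation; the SNP-count adjustment of gene scores is omitted, and the gene names and scores are made up.

```python
import math
import statistics


def random_set_z(gene_scores, pathway):
    """Z-score of a pathway's mean gene score under the random set model.

    gene_scores: dict mapping gene -> score (assumed already adjusted).
    pathway: iterable of gene names; genes without a score are ignored.
    """
    scores = list(gene_scores.values())
    n_total = len(scores)
    members = [gene_scores[g] for g in pathway if g in gene_scores]
    m = len(members)
    if m == 0 or m >= n_total:
        raise ValueError("pathway must contain between 1 and N-1 scored genes")
    mu = statistics.fmean(scores)
    sigma2 = statistics.pvariance(scores)
    # Variance of the mean of m scores sampled without replacement from N.
    var_mean = (sigma2 / m) * (n_total - m) / (n_total - 1)
    return (statistics.fmean(members) - mu) / math.sqrt(var_mean)


# Toy gene scores (made-up values, e.g. adjusted -log10 P per gene).
gene_scores = {"A": 3.0, "B": 2.5, "C": 0.2, "D": 0.1, "E": 0.3, "F": 0.4}
z_enriched = random_set_z(gene_scores, {"A", "B"})
z_depleted = random_set_z(gene_scores, {"C", "D"})

# One-sided P-value from the standard normal approximation.
p_enriched = 0.5 * math.erfc(z_enriched / math.sqrt(2))
```

The finite-population factor (N - m) / (N - 1) shrinks the variance as the pathway approaches the size of the whole gene list, which is what distinguishes this score from a plain one-sample z-test.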

Subjects

Genome-Wide Association Study/methods ; Software ; Asian People/genetics ; Body Height/genetics ; Databases, Genetic ; Diabetes Mellitus, Type 2/genetics ; Humans ; Polymorphism, Single Nucleotide ; Programming Languages ; Protein Interaction Maps

Benchmarking integration of single-cell differential expression.

Nguyen, Hai C T; Baik, Bukyung; Yoon, Sora; Park, Taesung; Nam, Dougu.

Nat Commun ; 14(1): 1570, 2023 03 21.

Article in English | MEDLINE | ID: mdl-36944632

ABSTRACT

Integration of single-cell RNA sequencing data between different samples has been a major challenge for analyzing cell populations. However, strategies to integrate differential expression analysis of single-cell data remain underinvestigated. Here, we benchmark 46 workflows for differential expression analysis of single-cell data with multiple batches. We show that batch effects, sequencing depth and data sparsity substantially impact their performance. Notably, we find that the use of batch-corrected data rarely improves the analysis for sparse data, whereas batch covariate modeling improves the analysis for substantial batch effects. We show that for low-depth data, single-cell techniques based on zero-inflation models degrade performance, whereas the analysis of uncorrected data using limmatrend, the Wilcoxon test and the fixed effects model performs well. We suggest several high-performance methods under different conditions based on various simulation and real data analyses. Additionally, we demonstrate that differential expression analysis for a specific cell type outperforms that of large-scale bulk sample data in prioritizing disease-related genes.
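One of the approaches the abstract reports as performing well on uncorrected low-depth data is the Wilcoxon test. The sketch below is a plain two-sample Wilcoxon rank-sum (Mann-Whitney U) test for a single gene, using midranks for ties and a tie-corrected normal approximation without continuity correction. It is a textbook version written for illustration, not one of the benchmarked workflows, and the counts are made up.

```python
import math
from collections import Counter


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Return (U statistic of x, two-sided P) by normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    u1 = sum(midranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # every value is identical, so there is no evidence
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Toy counts for one gene (made-up): sparse in one group, higher in the other.
group_a = [0, 0, 1, 0, 2, 0, 1, 0]
group_b = [3, 5, 2, 4, 6, 3, 5, 4]
u_stat, p_value = wilcoxon_rank_sum(group_a, group_b)
```

Because the test uses only ranks, it needs no distributional model for the many zeros in sparse data, although on its own it does not account for batch as a covariate.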

Subjects

Benchmarking ; Data Analysis ; Sequence Analysis, RNA/methods ; Benchmarking/methods ; Computer Simulation ; Workflow ; Single-Cell Analysis/methods ; Gene Expression Profiling/methods
