DenoiseIt: denoising gene expression data using rank based isolation trees.
BMC Bioinformatics; 25(1): 271, 2024 Aug 21.
Article | En | MEDLINE | ID: mdl-39169300
ABSTRACT
BACKGROUND:
Selecting informative genes or eliminating uninformative ones before any downstream gene expression analysis is a standard task with great impact on the results. A carefully curated gene set significantly enhances the likelihood of identifying meaningful biomarkers.
METHOD:
In contrast to conventional forward gene search methods, which focus on selecting highly informative genes, we propose a backward search method, DenoiseIt, that aims to remove potential outlier genes, yielding a robust gene set with reduced noise. The gene set constructed by DenoiseIt is expected to capture biologically significant genes while pruning irrelevant ones to the greatest extent possible. It therefore also enhances the quality of downstream comparative gene expression analysis. DenoiseIt uses non-negative matrix factorization in conjunction with isolation forests to identify outlier rank features and remove their associated genes.
RESULTS:
DenoiseIt was applied to both bulk and single-cell RNA-seq data collected from TCGA and a COVID-19 cohort to show that it proficiently identified and removed genes exhibiting expression anomalies confined to specific samples rather than a known group. DenoiseIt was also shown to reduce the level of technical noise while preserving a higher proportion of biologically relevant genes compared to existing methods. The DenoiseIt software is publicly available on GitHub at https://github.com/cobi-git/DenoiseIt.
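The idea stated under METHOD (non-negative matrix factorization combined with isolation forests to flag outlier rank features and drop their associated genes) can be illustrated with a rough sketch. This is not the published DenoiseIt implementation. The function name `denoise_sketch`, the parameters `n_ranks` and `contamination`, the synthetic data, and the rule that assigns each gene to the rank with its largest NMF loading are all assumptions made for illustration, using scikit-learn's `NMF` and `IsolationForest`.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.ensemble import IsolationForest


def denoise_sketch(X, n_ranks=8, contamination=0.25, seed=0):
    """Illustrative only: X is a non-negative (genes x samples) matrix.

    Returns a boolean mask of genes to keep and the indices of the
    rank features flagged as outliers.
    """
    # Factorize X ~ W @ H: W is (genes x ranks), H is (ranks x samples).
    nmf = NMF(n_components=n_ranks, init="nndsvda", max_iter=500,
              random_state=seed)
    W = nmf.fit_transform(X)
    H = nmf.components_

    # Treat each rank (a row of H) as one point described by its
    # per-sample activity, and let an isolation forest flag unusual ranks.
    forest = IsolationForest(n_estimators=200, contamination=contamination,
                             random_state=seed)
    labels = forest.fit_predict(H)          # -1 = outlier, 1 = inlier
    outlier_ranks = np.flatnonzero(labels == -1)

    # Assumption: a gene "belongs" to the rank with its largest loading in W.
    dominant_rank = W.argmax(axis=1)
    keep = ~np.isin(dominant_rank, outlier_ranks)
    return keep, outlier_ranks


# Synthetic example (hypothetical data): 200 genes, 30 samples, with a
# handful of genes spiking in a single sample.
rng = np.random.default_rng(0)
X = rng.gamma(shape=2.0, scale=1.0, size=(200, 30))
X[:10, 3] += 50.0

keep, outlier_ranks = denoise_sketch(X)
X_denoised = X[keep]
print(f"flagged ranks: {outlier_ranks.tolist()}, genes kept: {int(keep.sum())}")
```

In this sketch the number of ranks and the contamination level directly control how many genes are removed, so both would need tuning on real data. The repository linked above is the place to check the authors' actual procedure.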
Full text: 1
Database: MEDLINE
Main subject: Gene Expression Profiling / COVID-19
Limits: Humans
Language: En
Year: 2024
Document type: Article