DenoiseIt: denoising gene expression data using rank based isolation trees.
Jeon, Jaemin; Suk, Youjeong; Kim, Sang Cheol; Jo, Hye-Yeong; Kim, Kwangsoo; Jung, Inuk.
  • Jeon J; Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-gu, Seoul, 08826, Republic of Korea.
  • Suk Y; School of Computer Science and Engineering, Kyungpook National University, Buk-gu, Daegu, 41566, Republic of Korea.
  • Kim SC; Division of Healthcare and Artificial Intelligence, Department of Precision Medicine, Korea National Institute of Health, Korea Disease Control and Prevention Agency, Osong, CheongJu, 28159, Republic of Korea.
  • Jo HY; Division of Healthcare and Artificial Intelligence, Department of Precision Medicine, Korea National Institute of Health, Korea Disease Control and Prevention Agency, Osong, CheongJu, 28159, Republic of Korea.
  • Kim K; Department of Transdisciplinary Medicine, Seoul National University Hospital, Jongno-gu, Seoul, 03080, Republic of Korea. kwangsookim@snu.ac.kr.
  • Jung I; Department of Medicine, Seoul National University, Jongno-gu, Seoul, 03080, Republic of Korea. kwangsookim@snu.ac.kr.
BMC Bioinformatics ; 25(1): 271, 2024 Aug 21.
Article in English | MEDLINE | ID: mdl-39169300
ABSTRACT

BACKGROUND:

Selecting informative genes or eliminating uninformative ones before any downstream gene expression analysis is a standard task with great impact on the results. A carefully curated gene set significantly enhances the likelihood of identifying meaningful biomarkers.

METHOD:

In contrast to conventional forward gene search methods, which focus on selecting highly informative genes, we propose a backward search method, DenoiseIt, that aims to remove potential outlier genes, yielding a robust gene set with reduced noise. The gene set constructed by DenoiseIt is expected to capture biologically significant genes while pruning irrelevant ones to the greatest extent possible, which in turn enhances the quality of downstream comparative gene expression analysis. DenoiseIt uses non-negative matrix factorization in conjunction with isolation forests to identify outlier rank features and remove their associated genes.
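The combination of non-negative matrix factorization and isolation forests described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `denoise_sketch`, its parameters, the choice to score each rank by its sample profile, and the rule that assigns a gene to the rank with its largest loading are all assumptions made for illustration, using scikit-learn's `NMF` and `IsolationForest`.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.ensemble import IsolationForest


def denoise_sketch(X, n_ranks=10, contamination=0.2, seed=0):
    """Hypothetical helper: drop genes tied to outlier NMF ranks.

    X is a non-negative (genes x samples) expression matrix.
    """
    X = np.asarray(X, dtype=float)

    # Factorize X ~ W @ H: W is (genes x ranks), H is (ranks x samples).
    nmf = NMF(n_components=n_ranks, init="nndsvda", max_iter=500,
              random_state=seed)
    W = nmf.fit_transform(X)
    H = nmf.components_

    # Assumption: treat each rank's sample profile (a row of H) as one
    # observation and let an isolation forest flag the ranks that are
    # easiest to isolate.
    forest = IsolationForest(contamination=contamination, random_state=seed)
    labels = forest.fit_predict(H)  # -1 = outlier, 1 = inlier
    outlier_ranks = np.flatnonzero(labels == -1)

    # Assumption: a gene "belongs" to the rank with its largest loading in W.
    dominant_rank = W.argmax(axis=1)
    keep = ~np.isin(dominant_rank, outlier_ranks)
    return X[keep], keep, outlier_ranks
```

Called as `X_kept, keep, outlier_ranks = denoise_sketch(X)`, the sketch returns the filtered matrix, a boolean mask over the input genes, and the indices of the flagged ranks. The number of ranks and the contamination rate are arbitrary defaults here and would need tuning on real data.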

RESULTS:

DenoiseIt was applied to both bulk and single-cell RNA-seq data collected from TCGA and a COVID-19 cohort, where it identified and removed genes exhibiting expression anomalies confined to specific samples rather than a known group. DenoiseIt was also shown to reduce the level of technical noise while preserving a higher proportion of biologically relevant genes compared to existing methods. The DenoiseIt software is publicly available on GitHub at https://github.com/cobi-git/DenoiseIt.

Full text: 1 Database: MEDLINE Main subject: Gene Expression Profiling / COVID-19 Limit: Humans Language: En Year: 2024 Document type: Article
