Omada: robust clustering of transcriptomes through multiple testing.

Kariotis, Sokratis; Tan, Pei Fang; Lu, Haiping; Rhodes, Christopher J; Wilkins, Martin R; Lawrie, Allan; Wang, Dennis

Affiliation

Kariotis S; Singapore Institute for Clinical Sciences, Agency for Science, Technology and Research (A*STAR), 30 Medical Dr, 117609, Singapore, Republic of Singapore.
Tan PF; Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis St, Matrix, 138671, Singapore, Republic of Singapore.
Lu H; National Heart and Lung Institute, Imperial College London, Guy Scadding Building, Dovehouse St, SW3 6LY, London, United Kingdom.
Rhodes CJ; Singapore Institute for Clinical Sciences, Agency for Science, Technology and Research (A*STAR), 30 Medical Dr, 117609, Singapore, Republic of Singapore.
Wilkins MR; Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis St, Matrix, 138671, Singapore, Republic of Singapore.
Lawrie A; Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, S1 4DP, Sheffield, United Kingdom.
Wang D; National Heart and Lung Institute, Imperial College London, Guy Scadding Building, Dovehouse St, SW3 6LY, London, United Kingdom.

Gigascience ; 13, 2024 01 02.

Article in En | MEDLINE | ID: mdl-38991852

ABSTRACT

BACKGROUND:

Cohort studies increasingly collect biosamples for molecular profiling and are observing molecular heterogeneity. High-throughput RNA sequencing is providing large datasets capable of reflecting disease mechanisms. Clustering approaches have produced a number of tools to help dissect complex heterogeneous datasets, but selecting the appropriate method and parameters to perform exploratory clustering analysis of transcriptomic data requires deep understanding of machine learning and extensive computational experimentation. Tools that assist with such decisions without prior field knowledge are nonexistent. To address this, we have developed Omada, a suite of tools aiming to automate these processes and make robust unsupervised clustering of transcriptomic data more accessible through automated machine learning-based functions.

FINDINGS:

The efficiency of each tool was tested with 7 datasets characterized by different expression signal strengths to capture a wide spectrum of RNA expression datasets. Our toolkit's decisions reflected the real number of stable partitions in datasets where the subgroups are discernible. Within datasets with less clear biological distinctions, our tools either formed stable subgroups with different expression profiles and robust clinical associations or revealed signs of problematic data such as biased measurements.

CONCLUSIONS:

In conclusion, Omada successfully automates the robust unsupervised clustering of transcriptomic data, making advanced analysis accessible and reliable even for those without extensive machine learning expertise. Implementation of Omada is available at http://bioconductor.org/packages/omada/.

Subjects

Gene Expression Profiling; Software; Transcriptome; Cluster Analysis; Gene Expression Profiling/methods; Humans; Computational Biology/methods; Machine Learning; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, RNA/methods; Algorithms

Keywords

cluster analysis; gene expression; software toolkit; unsupervised learning

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Software / Gene Expression Profiling / Transcriptome Limits: Humans Language: En Journal: Gigascience Year of publication: 2024 Document type: Article
