Strategy for improved characterization of human metabolic phenotypes using a COmbined Multi-block Principal components Analysis with Statistical Spectroscopy (COMPASS).

Loo, Ruey Leng; Chan, Queenie; Antti, Henrik; Li, Jia V; Ashrafian, H; Elliott, Paul; Stamler, Jeremiah; Nicholson, Jeremy K; Holmes, Elaine; Wist, Julien

Affiliation

Loo RL; Centre for Computational and Systems Medicine, Perth, WA 6150, Australia.
Chan Q; The Australian National Phenome Centre, Health Futures Institute, Murdoch University, Perth, WA 6150, Australia.
Antti H; Department of Epidemiology and Biostatistics, London W2 1PG, UK.
Li JV; MRC Centre for Environment and Health, School of Public Health, Imperial College London, London W2 1PG, UK.
Ashrafian H; Department of Chemistry, Umeå Universitet, 901 87 Umeå, Sweden.
Elliott P; Department of Surgery and Cancer, Imperial College London, London W2 1PG, UK.
Stamler J; Department of Surgery and Cancer, Imperial College London, London W2 1PG, UK.
Nicholson JK; Department of Epidemiology and Biostatistics, London W2 1PG, UK.
Holmes E; MRC Centre for Environment and Health, School of Public Health, Imperial College London, London W2 1PG, UK.
Wist J; Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA.

Bioinformatics; 36(21): 5229-5236, 2021 Jan 29.

Article in English | MEDLINE | ID: mdl-32692809

ABSTRACT

MOTIVATION:

Large-scale population omics data can provide insight into associations between gene-environment interactions and disease. However, existing dimension reduction modelling techniques are often inefficient for extracting detailed information from these complex datasets.

RESULTS:

Here, we present an interactive software pipeline for exploratory analyses of population-based nuclear magnetic resonance spectral data using a COmbined Multi-block Principal components Analysis with Statistical Spectroscopy (COMPASS) within the R-library hastaLaVista framework. Principal component analysis models are generated for a sequential series of spectral regions (blocks) to provide more granular detail defining sub-populations within the dataset. Molecular identification of key differentiating signals is subsequently achieved by implementing Statistical TOtal Correlation SpectroscopY on the full spectral data to define feature patterns. Finally, the distributions of cross-correlation of the reference patterns across the spectral dataset are used to provide population statistics for identifying underlying features arising from drug intake, latent diseases and diet. The COMPASS method thus provides an efficient semi-automated approach for screening population datasets.

AVAILABILITY AND IMPLEMENTATION:

Source code is available at https://github.com/cheminfo/COMPASS.

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.
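The first two stages described under RESULTS (block-wise PCA, then Statistical TOtal Correlation SpectroscopY) can be illustrated with a minimal Python sketch. This is not the authors' R implementation and does not use the hastaLaVista API. The function names (`split_into_blocks`, `block_pca_scores`, `stocsy`), the block size, and the synthetic data are all hypothetical and chosen only for illustration. PCA is computed here via a plain SVD on mean-centred data, and the correlation step is a Pearson correlation of one "driver" variable against every other variable, which is the usual way such a correlation trace is formed. The final cross-correlation stage is not sketched.

```python
import numpy as np


def split_into_blocks(n_vars, block_size):
    """Return (start, stop) index pairs covering the spectrum in sequential blocks."""
    return [(s, min(s + block_size, n_vars)) for s in range(0, n_vars, block_size)]


def block_pca_scores(X, blocks, n_components=2):
    """Fit an independent PCA (via SVD) on each spectral block.

    X has shape (n_samples, n_vars). Returns one score matrix per block.
    """
    scores = []
    for start, stop in blocks:
        Xb = X[:, start:stop]
        Xb = Xb - Xb.mean(axis=0)              # mean-centre each variable
        U, S, _ = np.linalg.svd(Xb, full_matrices=False)
        k = min(n_components, len(S))
        scores.append(U[:, :k] * S[:k])        # scores = left vectors * singular values
    return scores


def stocsy(X, driver_idx):
    """Pearson correlation of the driver variable with every variable in X."""
    Xc = X - X.mean(axis=0)
    d = Xc[:, driver_idx]
    num = Xc.T @ d
    den = np.sqrt((Xc ** 2).sum(axis=0) * (d ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(den > 0, num / den, 0.0)


# Synthetic stand-in for a spectral matrix (assumption: 60 samples, 200 variables).
rng = np.random.default_rng(0)
n_samples, n_vars = 60, 200
X = rng.normal(0.0, 0.05, size=(n_samples, n_vars))

# Hypothetical sub-population: the first 15 samples carry a compound
# that produces two covarying peaks, at variables 40 and 150.
conc = np.zeros(n_samples)
conc[:15] = rng.uniform(1.0, 2.0, size=15)
X[:, 40] += conc
X[:, 150] += 0.5 * conc

blocks = split_into_blocks(n_vars, block_size=50)   # four blocks of 50 variables
scores = block_pca_scores(X, blocks)                # one PCA per block
r = stocsy(X, driver_idx=40)                        # correlation trace from peak at 40

# Variables that correlate strongly with the driver form a candidate feature pattern.
pattern = np.flatnonzero(r > 0.9)
print("blocks:", blocks)
print("pattern variables:", pattern.tolist())
```

In this toy setting the first component of the block containing variable 40 separates the 15 spiked samples from the rest, and the correlation trace links the peak at variable 150 to the driver even though it lies in a different block. Real spectra would need preprocessing (alignment, normalisation, scaling) that is outside the scope of this sketch.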

Subjects

Software; Humans; Phenotype; Principal Component Analysis; Spectrum Analysis

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Software Limit: Humans Language: English Publication year: 2021 Document type: Article