Biomarker identification by interpretable maximum mean discrepancy.
Bioinformatics; 40(Suppl 1): i501-i510, 2024 06 28.
Article in En | MEDLINE | ID: mdl-38940158
ABSTRACT
MOTIVATION: In many biomedical applications, we are confronted with paired groups of samples, such as treated versus control. The aim is to detect discriminating features, i.e. biomarkers, based on high-dimensional (omics) data. This problem can be phrased more generally as a two-sample problem requiring statistical significance testing to establish differences, and interpretations to identify distinguishing features. The multivariate maximum mean discrepancy (MMD) test quantifies group-level differences, whereas statistically significantly associated features are usually found by univariate feature selection. Currently, few general-purpose methods simultaneously perform multivariate feature selection and two-sample testing.
RESULTS: We introduce a sparse, interpretable, and optimized MMD test (SpInOpt-MMD) that enables two-sample testing and feature selection in the same experiment. SpInOpt-MMD is a versatile method, and we demonstrate its application to a variety of synthetic and real-world data types, including images, gene expression measurements, and text data. SpInOpt-MMD is effective at identifying relevant features at small sample sizes and outperforms other feature selection methods, such as SHapley Additive exPlanations and univariate association analysis, in several experiments.
AVAILABILITY AND IMPLEMENTATION: The code and links to our public data are available at https://github.com/BorgwardtLab/spinoptmmd.
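For readers unfamiliar with the MMD two-sample test named in the abstract, the following is a minimal sketch in Python with NumPy. It is not the authors' SpInOpt-MMD implementation and does not reproduce its optimisation or sparsity constraints. The function names, the Gaussian kernel, the bandwidth of 2.0, the toy data, and the per-feature `weights` vector are all assumptions made for illustration. The sketch only shows how a kernel MMD statistic can be computed, how a permutation test gives a p-value, and how weighting individual features changes which differences the statistic can see.

```python
import numpy as np


def rbf_kernel(a, b, weights, bandwidth):
    """Gaussian kernel on feature-weighted inputs (weights are non-negative)."""
    aw = a * np.sqrt(weights)
    bw = b * np.sqrt(weights)
    # Pairwise weighted squared Euclidean distances between rows of a and b.
    sq = (aw ** 2).sum(1)[:, None] + (bw ** 2).sum(1)[None, :] - 2.0 * aw @ bw.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))


def mmd2_biased(x, y, weights, bandwidth=2.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    kxx = rbf_kernel(x, x, weights, bandwidth)
    kyy = rbf_kernel(y, y, weights, bandwidth)
    kxy = rbf_kernel(x, y, weights, bandwidth)
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()


def permutation_pvalue(x, y, weights, bandwidth=2.0, n_perm=200, seed=0):
    """Permutation test: reshuffle group labels and recompute the statistic."""
    rng = np.random.default_rng(seed)
    observed = mmd2_biased(x, y, weights, bandwidth)
    pooled = np.vstack([x, y])
    n = len(x)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        stat = mmd2_biased(pooled[idx[:n]], pooled[idx[n:]], weights, bandwidth)
        count += int(stat >= observed)
    # Add-one correction keeps the p-value strictly positive.
    return observed, (count + 1) / (n_perm + 1)


# Toy data (hypothetical): two groups of 40 samples with 5 features,
# where only feature 0 differs between the groups.
rng = np.random.default_rng(1)
x = rng.normal(size=(40, 5))
y = rng.normal(size=(40, 5))
y[:, 0] += 2.0

uniform = np.ones(5)
stat, p = permutation_pvalue(x, y, uniform)

# Putting all weight on one feature shows which feature carries the difference.
only_first = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
only_last = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
mmd_first = mmd2_biased(x, y, only_first)
mmd_last = mmd2_biased(x, y, only_last)
print(f"MMD^2 = {stat:.3f}, permutation p-value = {p:.3f}")
print(f"weight on feature 0: {mmd_first:.3f}, weight on feature 4: {mmd_last:.3f}")
```

In this toy setting the statistic is much larger when the weight sits on the shifted feature than on an unshifted one, which is the intuition behind selecting features through learned weights. How SpInOpt-MMD actually learns and sparsifies such weights is described in the paper and the linked repository, not here.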
Collection: 01-internacional
Database: MEDLINE
Main subject: Biomarkers
Limits: Humans
Language: En
Journal: Bioinformatics
Journal subject: Medical Informatics
Year: 2024
Document type: Article
Affiliation country: Switzerland
Country of publication: United Kingdom