Navigating the pitfalls of applying machine learning in genomics.
Nat Rev Genet; 23(3): 169-181, 2022 Mar.
Article in English | MEDLINE | ID: mdl-34837041
The scale of genetic, epigenomic, transcriptomic, cheminformatic and proteomic data available today, coupled with easy-to-use machine learning (ML) toolkits, has propelled the application of supervised learning in genomics research. However, the assumptions behind the statistical models and performance evaluations in ML software frequently are not met in biological systems. In this Review, we illustrate the impact of several common pitfalls encountered when applying supervised ML in genomics. We explore how the structure of genomics data can bias performance evaluations and predictions. To address the challenges associated with applying cutting-edge ML methods to genomics, we describe solutions and appropriate use cases where ML modelling shows great potential.
Full text: 1
Collections: 01-internacional
Database: MEDLINE
Main subject: Genomics / Machine Learning
Type of study: Prognostic_studies / Risk_factors_studies
Limits: Animals / Humans
Language: English
Journal: Nat Rev Genet
Journal subject: Genetics
Year of publication: 2022
Document type: Article
Affiliation country: United States