Supervised learning on phylogenetically distributed data.
Bioinformatics; 36(Suppl_2): i895-i902, 2020-12-30.
Article | En | MEDLINE | ID: mdl-33381838
ABSTRACT
MOTIVATION: The ability to develop robust machine-learning (ML) models is considered imperative to the adoption of ML techniques in biology and medicine. This challenge is particularly acute when the data available for training are not independent and identically distributed (iid), in which case trained models are vulnerable to out-of-distribution generalization problems. Of particular interest are problems where data correspond to observations made on phylogenetically related samples (e.g. antibiotic resistance data).
RESULTS: We introduce DendroNet, a new approach to train neural networks in the context of evolutionary data. DendroNet explicitly accounts for the relatedness of the training/testing data, while allowing the model to evolve along the branches of the phylogenetic tree, hence accommodating potential changes in the rules that relate genotypes to phenotypes. Using simulated data, we demonstrate that DendroNet produces models that can be significantly better than non-phylogenetically aware approaches. DendroNet also outperforms other approaches at two biological tasks of significant practical importance: antibiotic resistance prediction in bacteria and trophic level prediction in fungi. AVAILABILITY AND IMPLEMENTATION: https://github.com/BlanchetteLab/DendroNet.
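The idea of a model that "evolves along the branches" of a tree can be pictured with a small sketch. This is not the DendroNet implementation from the repository above. It is an illustration under stated assumptions: a logistic model whose weight vector at each node equals its parent's weights plus a per-branch delta, with an L1 penalty that keeps the deltas small so related samples share similar models. The function names (`node_weights`, `tree_loss`), the parent-array tree encoding, and the penalty strength `lam` are all hypothetical.

```python
import numpy as np

def node_weights(parent, root_w, deltas):
    """Accumulate per-branch deltas from the root down to every node.

    parent[i] is the index of node i's parent (-1 for the root). Nodes are
    assumed to be ordered so that a parent always precedes its children.
    """
    weights = np.zeros_like(deltas, dtype=float)
    for i, p in enumerate(parent):
        # The root uses root_w directly; every other node inherits and shifts.
        weights[i] = root_w if p < 0 else weights[p] + deltas[i]
    return weights

def tree_loss(parent, root_w, deltas, leaf_ids, X, y, lam=0.1):
    """Binary cross-entropy at the leaves plus an L1 penalty on branch deltas."""
    w = node_weights(parent, root_w, deltas)[leaf_ids]  # one weight row per sample
    logits = np.sum(w * X, axis=1)
    probs = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-12  # guards against log(0)
    bce = -np.mean(y * np.log(probs + eps) + (1 - y) * np.log(1 - probs + eps))
    non_root = np.asarray(parent) >= 0
    penalty = lam * np.sum(np.abs(deltas[non_root]))
    return bce + penalty

# Toy tree: node 0 is the root, nodes 1 and 2 are its leaf children.
parent = [-1, 0, 0]
root_w = np.array([1.0, -1.0])
deltas = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, -0.5]])
X = np.array([[1.0, 0.0], [0.0, 1.0]])  # one feature row per leaf
y = np.array([1.0, 0.0])
loss = tree_loss(parent, root_w, deltas, leaf_ids=[1, 2], X=X, y=y)
```

In this sketch, minimizing the loss over `root_w` and `deltas` (for example by gradient descent) would let the genotype-to-phenotype rule drift between clades while penalizing large changes on any single branch.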
Full text:
1
Collection:
01-internacional
Database:
MEDLINE
Main subject:
Neural Networks, Computer
/
Machine Learning
Type of study:
Prognostic_studies
Language:
En
Journal:
Bioinformatics
Journal subject:
MEDICAL INFORMATICS
Year:
2020
Document type:
Article
Affiliation country:
Canada