Repeated Decision Stumping Distils Simple Rules from Single-Cell Data.
Croydon-Veleslavov, Ivan A; Stumpf, Michael P H.
Affiliation
  • Croydon-Veleslavov IA; Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, United Kingdom.
  • Stumpf MPH; Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, United Kingdom.
J Comput Biol; 31(1): 21-40, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38170180
ABSTRACT
Single-cell data afford unprecedented insights into molecular processes. But the complexity and size of these data sets have proved challenging and have given rise to a large armory of statistical and machine learning approaches. The majority of these approaches focus on either describing features of the data or making predictions and classifying unlabeled samples. In this study, we introduce repeated decision stumping (ReDX) as a method to distill simple models from single-cell data. We develop decision trees of depth one (hence "stumps") to identify, in an inductive manner, gene products involved in driving cell fate transitions. In applications to published data, we are able to discover the key players involved in these processes in an unbiased manner and without prior knowledge. Our algorithm deliberately targets the simplest possible candidate hypotheses that can be extracted from complex high-dimensional data. There are three reasons for this: (1) the predictions become straightforwardly testable hypotheses; (2) the identified candidates form the basis for further mechanistic model development, for example, for engineering and synthetic biology interventions; and (3) this approach complements existing descriptive modeling approaches and frameworks. The approach is computationally efficient, has remarkable predictive power, including in simulation studies where the ground truth is known, and yields robust and statistically stable predictors: the same set of candidates is generated by applying the algorithm to different subsamples of experimental data.
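The abstract does not spell out the procedure, so the following is only a minimal sketch of what repeatedly fitting depth-one trees could look like, not the authors' implementation. It assumes that each round picks the single gene and threshold that best separate two cell states, records that gene, removes it from the candidate pool, and stops once the best remaining stump falls below an accuracy cutoff. The function names (`best_stump`, `repeated_stumping`), the `min_accuracy` parameter, the use of plain accuracy as the split criterion, and the toy expression matrix are all hypothetical.

```python
def best_stump(X, y, features):
    """Return (accuracy, feature, threshold) of the best depth-one split."""
    n = len(y)
    best = (0.0, -1, 0.0)
    for j in features:
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            hits = sum((row[j] > t) == bool(label) for row, label in zip(X, y))
            acc = max(hits, n - hits) / n  # either side may be the "1" class
            if acc > best[0]:
                best = (acc, j, t)
    return best


def repeated_stumping(X, y, min_accuracy=0.9):
    """Assumed loop: fit a stump, record its gene, drop the gene, repeat."""
    remaining = list(range(len(X[0])))
    selected = []
    while remaining:
        acc, j, t = best_stump(X, y, remaining)
        if j < 0 or acc < min_accuracy:
            break  # no remaining gene separates the states well enough
        selected.append((j, t, acc))
        remaining.remove(j)
    return selected


# Toy data (made up): 6 cells x 3 genes; genes 0 and 2 track the cell state.
X = [
    [0.1, 5.0, 9.0],
    [0.2, 1.0, 8.5],
    [0.3, 4.0, 9.5],
    [2.1, 4.5, 1.0],
    [2.5, 1.5, 0.5],
    [2.2, 5.5, 1.5],
]
y = [0, 0, 0, 1, 1, 1]

genes = [j for j, _, _ in repeated_stumping(X, y)]
print(genes)
```

On this toy matrix the loop selects genes 0 and 2 and then stops, because the uninformative gene 1 cannot reach the cutoff. Each selected gene comes with a single threshold, which is what makes the output readable as a directly testable hypothesis in the sense of reason (1) above.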

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Algorithms / Machine Learning | Study type: Prognostic_studies | Language: En | Journal: J Comput Biol | Journal subject: MOLECULAR BIOLOGY / MEDICAL INFORMATICS | Year: 2024 | Document type: Article | Country of affiliation: United Kingdom
...