Leveraging external knowledge on molecular interactions in classification methods for risk prediction of patients.

Porzelius, Christine; Johannes, Marc; Binder, Harald; Beissbarth, Tim

Affiliation

Porzelius C; Freiburg Center for Data Analysis and Modeling, University of Freiburg, Eckerstraße 1, 79104 Freiburg, Germany. cp@imbi.uni-freiburg.de

Biom J; 53(2): 190-201, 2011 Mar.

Article in English | MEDLINE | ID: mdl-21328603

ABSTRACT

Classification of patients based on molecular markers, for example into different risk groups, is a modern field in medical research. The aim of this classification is often a better diagnosis or individualized therapy. The search for molecular markers often utilizes extremely high-dimensional data sets (e.g. gene-expression microarrays). However, in situations where the number of measured markers (genes) is intrinsically higher than the number of available patients, standard methods from statistical learning fail to deal correctly with this so-called "curse of dimensionality". Feature or dimension reduction techniques based on statistical models also promise only limited success. Several recent methods explore how to quantify and incorporate biological prior knowledge of molecular interactions and known cellular processes into the feature selection process. This article aims to give an overview of such current methods as well as the databases from which this external knowledge can be obtained. For illustration, two recent methods are compared in detail: a feature selection approach for support vector machines and a boosting approach for regression models. As a practical example, data on patients with acute lymphoblastic leukemia are considered, where the binary endpoint "relapse within first year" is to be predicted.

Subject(s)

Gene Expression Profiling; Gene Expression Regulation, Leukemic; Area Under Curve; Bayes Theorem; Databases, Genetic; Gene Expression Regulation; Humans; Models, Genetic; Models, Statistical; Multivariate Analysis; Oligonucleotide Array Sequence Analysis; Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics; Probability; Regression Analysis; Risk

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Gene Expression Regulation, Leukemic / Gene Expression Profiling | Study type: Diagnostic_studies / Etiology_studies / Prognostic_studies / Risk_factors_studies | Limit: Humans | Language: English | Journal: Biom J | Year: 2011 | Document type: Article | Country of affiliation: Germany
