A general framework for a reliable multivariate analysis and pattern recognition in high-dimensional epidemiological data, based on cluster robustness: a tutorial to enrich the epidemiologists' toolkit.
Lefèvre, T; Chauvin, P.
Affiliation
  • Lefèvre T; Inserm, UMR_S 1136, department of social epidemiology, Pierre-Louis institute of epidemiology and public health, 27, rue de Chaligny, 75012 Paris, France; UMR_S 1136, UPMC université Paris 06, Sorbonne universités, 75646 Paris, France; Inserm, UMR_S 1101, laboratory of medical information processing, 29609 Brest, France; UMR_S 1101, université de Bretagne occidentale, 29609 Brest, France; UMR_S 1101, institut Mines-Telecom, Telecom Bretagne, 29609 Brest, France. Electronic address: thomas.lefevr
  • Chauvin P; Inserm, UMR_S 1136, department of social epidemiology, Pierre-Louis institute of epidemiology and public health, 27, rue de Chaligny, 75012 Paris, France; UMR_S 1136, UPMC université Paris 06, Sorbonne universités, 75646 Paris, France. Electronic address: pierre.chauvin@inserm.fr.
Rev Epidemiol Sante Publique ; 63(1): 9-19, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25604830
ABSTRACT

BACKGROUND:

An epidemiologist's toolbox contains three main types of statistical tools: comparisons of means and proportions, linear or logistic regression models, and Cox-type regression models. All of these techniques have multivariate formulations, so that biases can be accounted for. Nonetheless, there is an entire set of natively, massively multivariate techniques that rest on weaker assumptions than classical statistical techniques and that seem to be underestimated by, or unknown to, most epidemiologists. These techniques are used for pattern recognition or clustering, that is, for retrieving homogeneous groups in data without any prior assumption about these groups. They are widely used in related fields such as genetics and biomolecular studies.

METHODS:

Most clustering techniques require tuning specific parameters so that groups can be identified in data. A critical parameter to set is the number of groups the technique needs to discover. Different approaches to find the optimal number of groups are available, such as the silhouette approach and the robustness approach. This article presents the key aspects of clustering techniques (how proximity between observations is defined and how to find the number of groups), two archetypal techniques (namely the k-means and PAM algorithms) and how they relate to more classical statistical approaches.
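As an illustration of the silhouette approach described above, here is a minimal sketch in plain Python. It is not the authors' implementation: the synthetic data, the Euclidean distance, the function names (`dist`, `kmeans`, `mean_silhouette`), the range of candidate group counts and the number of random restarts are all assumptions made for the example. It runs a basic k-means for several candidate numbers of groups and keeps the number that gives the highest mean silhouette width.

```python
import math
import random

def dist(a, b):
    """Euclidean distance (an assumption; other dissimilarities can be swapped in)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, rng, max_iter=100):
    """Basic Lloyd-style k-means from one random start; returns one label per point."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def mean_silhouette(points, labels):
    """Mean silhouette width: (b - a) / max(a, b), averaged over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0  # silhouette is undefined for a single group
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Synthetic toy data: three well-separated groups of 30 observations each.
rng = random.Random(0)
true_centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for cx, cy in true_centers for _ in range(30)]

# For each candidate k, keep the best silhouette over 30 random restarts.
scores = {}
for k in range(2, 7):
    scores[k] = max(mean_silhouette(points, kmeans(points, k, rng)) for _ in range(30))
best_k = max(scores, key=scores.get)
print("selected number of groups:", best_k)
```

Repeating each run from several random starts is one common way to reduce the dependence of k-means on its initial centers; whether it matches the procedure used in the article is not stated in this abstract.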

RESULTS:

Through a simple theoretical example and a real data application, we provide a complete framework within which classical epidemiological concerns can be reconsidered. We show how to (i) identify whether distinct groups exist in data, (ii) identify the optimal number of groups in data, (iii) label each observation according to its own group and (iv) analyze the groups identified against separate, explanatory data. In addition, we explain how to achieve consistent results while removing sensitivity to initial conditions.
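Steps (iii) and (iv) can be pictured with a small sketch, again in plain Python. Everything in it is hypothetical: the cluster labels, the binary explanatory variable `exposed` (assumed not to have been used for clustering) and the helper names `crosstab` and `chi_square`. The abstract does not specify which analysis the authors apply to the groups; a cross-tabulation with a Pearson chi-square statistic is shown only as one familiar possibility.

```python
from collections import Counter

# Hypothetical group labels from a clustering step (one per observation) and a
# hypothetical binary explanatory variable (1 = exposed, 0 = not exposed).
labels  = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
exposed = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0]

def crosstab(labels, values):
    """Contingency table with one row per group and one column per value."""
    counts = Counter(zip(labels, values))
    groups = sorted(set(labels))
    levels = sorted(set(values))
    table = [[counts[(g, v)] for v in levels] for g in groups]
    return groups, levels, table

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, dof

groups, levels, table = crosstab(labels, exposed)
# Proportion of exposed observations within each group.
proportions = {g: row[levels.index(1)] / sum(row) for g, row in zip(groups, table)}
stat, dof = chi_square(table)
print(proportions, stat, dof)
```

With counts this small the chi-square approximation is unreliable, so the statistic here only illustrates the mechanics; on real data an exact test or a larger sample would be needed.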

CONCLUSIONS:

Clustering techniques, in conjunction with methods for parameter tuning, provide the epidemiologist with substantial additional tools. They differ from the usual approaches based on hypothesis testing because no assumptions are made about the data and because they are natively multivariate.

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Research Design / Epidemiologic Studies / Cluster Analysis / Multivariate Analysis Study type: Observational_studies / Prognostic_studies Limits: Humans Language: En Journal: Rev Epidemiol Sante Publique Year: 2015 Document type: Article

...