Variable ranking based on the estimated degree of separation for two distributions of data by the length of the receiver operating characteristic curve.
Maswadeh, Waleed M; Snyder, A Peter.
Affiliation
  • Maswadeh WM; U.S. Army Edgewood Chemical Biological Center (ECBC), RDECOM, ATTN: RDCB-DRD-P, Building E3160, Edgewood Area, Aberdeen Proving Ground, MD 21010-5424, USA. Electronic address: waleed.m.maswadeh.civ@mail.mil.
  • Snyder AP; Bel Air, MD 21015, USA.
Anal Chim Acta; 876: 39-48, 2015 May 30.
Article in English | MEDLINE | ID: mdl-25998456
ABSTRACT
Variable responses are fundamental to all experiments, and they can consist of information-rich, redundant, and low-intensity signals. A dataset can consist of a collection of variable responses over multiple classes or groups. Usually, variables that contain very little information are removed from a dataset; sometimes all the variables are used in the data analysis phase. It is common practice to discriminate between two distributions of data; however, there is no formal algorithm to arrive at a degree of separation (DS) between two distributions of data. The DS is defined herein as the average of the sum of the areas from the probability density functions (PDFs) of A and B that contain ≥ a given percentage of A and/or B. Thus, DS90 is the average of the sum of the PDF areas of A and B that contain ≥90% of A and/or B. To arrive at a DS value, two synthesized PDFs or very large experimental datasets are required. Experimentally, it is common practice to generate relatively small datasets. Therefore, the challenge was to find a statistical parameter that can be used on small datasets to estimate, and correlate highly with, the DS90 parameter. Established statistical methods include the overlap area of the two data distribution profiles, Welch's t-test, the Kolmogorov-Smirnov (K-S) test, the Mann-Whitney-Wilcoxon test, and the area under the receiver operating characteristic (ROC) curve (AUC). The area between the ROC curve and the diagonal (ACD) and the length of the ROC curve (LROC) are introduced. The established, ACD, and LROC methods were correlated to the DS90 when applied to many pairs of synthesized PDFs. The LROC method provided the best linear correlation with, and estimation of, the DS90. The estimated DS90 from the LROC (DS90-LROC) is applied, as an example, to a database of three Italian wines consisting of thirteen variable responses for variable ranking consideration.
An important highlight of the DS90-LROC method is that it uses the LROC to test all variables, one at a time, with all pairs of classes in a dataset.
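
As a rough illustration of the idea in the abstract, the following Python sketch computes the length of a ROC curve for two samples and uses it to score every variable against every pair of classes. It is not the authors' implementation. The function names `roc_length` and `rank_variables`, the evenly spaced threshold grid, the `n_thresholds` parameter, and the choice to rank by the mean score over class pairs are all assumptions made here. The linear mapping from LROC to DS90 is not given in the abstract, so the sketch ranks on the raw curve length, which runs from about √2 (complete overlap) to 2 (complete separation).

```python
from itertools import combinations

import numpy as np


def roc_length(a, b, n_thresholds=50):
    """Length of a binned ROC curve for samples a and b.

    Assumption: the ROC is evaluated on an evenly spaced threshold grid and
    the points are joined by straight segments. The result depends on the
    grid: an unbinned staircase ROC without ties always has length 2.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    # Sweep thresholds from high to low so the curve runs (0, 0) -> (1, 1).
    thresholds = np.linspace(hi, lo, n_thresholds)
    fpr = np.array([0.0] + [np.mean(a >= t) for t in thresholds])
    tpr = np.array([0.0] + [np.mean(b >= t) for t in thresholds])
    # Sum of Euclidean segment lengths between consecutive ROC points.
    return float(np.sum(np.hypot(np.diff(fpr), np.diff(tpr))))


def rank_variables(X, y, n_thresholds=50):
    """Score each variable, one at a time, on every pair of classes.

    Returns (order, scores, pairs): variable indices sorted from most to
    least separating by mean score (an assumed aggregation), the
    variables-by-pairs score matrix, and the list of class pairs.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pairs = list(combinations(np.unique(y), 2))
    scores = np.array([
        [roc_length(X[y == c1, j], X[y == c2, j], n_thresholds)
         for c1, c2 in pairs]
        for j in range(X.shape[1])
    ])
    order = np.argsort(-scores.mean(axis=1))
    return order, scores, pairs


if __name__ == "__main__":
    # Toy data (made up): variable 0 separates the classes, variable 1 does not.
    X = np.array([[0, 5], [1, 6], [2, 7],
                  [10, 5], [11, 6], [12, 7],
                  [20, 5], [21, 6], [22, 7]])
    y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
    order, scores, pairs = rank_variables(X, y)
    print("ranking:", order.tolist())
    print("scores per class pair:", scores.round(3).tolist())
```

The same functions could be tried on any labelled table with numeric columns, for instance the three-class, thirteen-variable dataset returned by scikit-learn's `load_wine`, although the abstract does not state that this is the database the authors used. With small samples, a finer threshold grid pushes every score toward 2, so `n_thresholds` should be chosen with the sample size in mind.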

Full text: 1 Collections: 01-international Database: MEDLINE Study type: Prognostic_studies Language: En Publication year: 2015 Document type: Article
