Genetic test bed for feature selection.
Choudhary, Ashish; Brun, Marcel; Hua, Jianping; Lowey, James; Suh, Ed; Dougherty, Edward R.
  • Choudhary A; Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
Bioinformatics; 22(7): 837-42, 2006 Apr 01.
Article in English | MEDLINE | ID: mdl-16428263
ABSTRACT
MOTIVATION:

Given a large set of potential features, such as the set of all gene-expression values from a microarray, it is necessary to find a small subset with which to classify. The task of finding an optimal feature set of a given size is inherently combinatorial because, to assure optimality, all feature sets of that size must be checked. Thus, numerous suboptimal feature-selection algorithms have been proposed. There are strong impediments to evaluating feature-selection algorithms on real data when data are limited, a common situation in genetic classification. The difficulty is threefold. First, there are no class-conditional distributions from which to draw data points, only a single small labeled sample. Second, there are no test data with which to estimate the feature-set errors, and one must depend on a training-data-based error estimator. Finally, there is no optimal feature set with which to compare the feature sets found by the algorithms.
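
To give a sense of why checking all feature sets of a given size is combinatorial, the short sketch below counts the candidate sets for a few feature counts and subset sizes. The numbers used are hypothetical and chosen only for illustration; they are not taken from the dataset described in this record.

```python
from math import comb


def num_feature_sets(n_features: int, subset_size: int) -> int:
    """Number of distinct feature sets of a given size (binomial coefficient)."""
    return comb(n_features, subset_size)


# Hypothetical feature counts and subset sizes, for illustration only.
for n in (50, 100, 1000):
    for k in (2, 3, 5):
        print(f"n={n:5d} k={k}: {num_feature_sets(n, k):,} candidate sets")
```

The count grows quickly with both the number of features and the subset size, which is why an exhaustive search needs substantial computation.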

RESULTS:

This paper describes a genetic test bed for the evaluation of feature-selection algorithms. It begins with a large biological feature-label dataset that is used as an empirical distribution and, using massively parallel computation, finds the top feature sets of various sizes based on a given sample size and classification rule. The user can draw random samples from the data, apply a proposed algorithm, and evaluate the proficiency of the proposed algorithm via three different measures (code provided). A key feature of the test bed is that, once a dataset is input, a single command creates the entire test bed relative to the dataset. The particular dataset used for the first version of the test bed comes from a microarray-based classification study that analyzes a large number of microarrays, prepared with RNA from breast tumor samples from each of 295 patients.
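
The workflow described above (treat a labeled dataset as an empirical distribution, rank all feature sets of a given size, then score a proposed algorithm on random samples) might be sketched as follows. This is a toy illustration under stated assumptions, not the authors' software: the synthetic data, the nearest-class-mean rule, the `select_by_mean_gap` selector, and the single "excess error" measure are all hypothetical stand-ins, and the abstract does not specify the three measures the test bed actually reports.

```python
import random
from itertools import combinations
from statistics import mean


def make_data(rng, n_points=60, n_features=6):
    """Synthetic stand-in for the feature-label dataset; features 0 and 1 carry signal."""
    data = []
    for i in range(n_points):
        y = i % 2
        x = [rng.gauss(1.5 * y if f < 2 else 0.0, 1.0) for f in range(n_features)]
        data.append((x, y))
    return data


def draw_sample(data, n, rng):
    """Draw a class-balanced sample of size n; the remaining points act as test data."""
    idx = []
    for label in (0, 1):
        pool = [i for i, (_, y) in enumerate(data) if y == label]
        idx += rng.sample(pool, n // 2)
    chosen = set(idx)
    train = [data[i] for i in idx]
    test = [p for i, p in enumerate(data) if i not in chosen]
    return train, test


def class_means(train, feats):
    """Per-class mean of each listed feature, computed on the training sample."""
    return {lab: [mean(x[f] for x, y in train if y == lab) for f in feats]
            for lab in (0, 1)}


def error_rate(train, test, feats):
    """Assumed classification rule: nearest class mean, restricted to `feats`."""
    cen = class_means(train, feats)
    wrong = 0
    for x, y in test:
        d = {lab: sum((x[f] - c) ** 2 for f, c in zip(feats, cen[lab]))
             for lab in (0, 1)}
        wrong += min(d, key=d.get) != y
    return wrong / len(test)


def select_by_mean_gap(train, k, n_features):
    """Hypothetical algorithm under test: keep the k features with the largest class-mean gap."""
    cen = class_means(train, range(n_features))
    gap = [abs(cen[0][f] - cen[1][f]) for f in range(n_features)]
    top = sorted(range(n_features), key=lambda f: -gap[f])[:k]
    return tuple(sorted(top))


rng = random.Random(0)
data = make_data(rng)
n_features, k, n, reps = 6, 2, 20, 30
samples = [draw_sample(data, n, rng) for _ in range(reps)]

# Exhaustive ranking of every feature set of size k by average held-out error.
ranking = {fs: mean(error_rate(tr, te, fs) for tr, te in samples)
           for fs in combinations(range(n_features), k)}
best = min(ranking, key=ranking.get)

# Average error of the sets chosen by the algorithm, compared with the best fixed set.
algo_err = mean(error_rate(tr, te, select_by_mean_gap(tr, k, n_features))
                for tr, te in samples)
print("best set:", best, "error:", round(ranking[best], 3))
print("algorithm error:", round(algo_err, 3),
      "excess:", round(algo_err - ranking[best], 3))
```

In this sketch the exhaustive ranking covers only 15 feature sets; on a real expression dataset the same step is the part that would call for massively parallel computation.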

AVAILABILITY:

The software and supplementary material are available at http://public.tgen.org/tgen-cb/support/testbed/

CONTACT:

edward@ece.tamu.edu
Subject(s)
Database: MEDLINE Main subject: Algorithms / Computer Simulation / Gene Expression Profiling Type of study: Prognostic_studies Limits: Female / Humans Language: En Year: 2006 Document type: Article