Results 1 - 8 of 8
1.
IEEE Trans Pattern Anal Mach Intell ; 30(4): 735-40, 2008 Apr.
Article in English | MEDLINE | ID: mdl-18276977

ABSTRACT

We derive a tight dependency-related bound on the difference between the Naïve Bayes (NB) error and Bayes error for two binary features and two equiprobable classes. A measure of discrepancy of feature dependencies is proposed for multiple features. Its correlation with NB is shown using 23 real data sets.
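The setting in this abstract (two binary features, two equiprobable classes) is small enough to compute both errors exactly. The sketch below is only an illustration of the two quantities being compared, not the bound derived in the paper; the function names and the example distributions are assumptions made here.

```python
from itertools import product

CELLS = list(product((0, 1), repeat=2))  # all values of (x1, x2)

def bayes_error(p0, p1, prior0=0.5):
    """Error of the optimal rule: in each cell, the smaller joint mass is lost."""
    prior1 = 1.0 - prior0
    return sum(min(prior0 * p0[x], prior1 * p1[x]) for x in CELLS)

def marginals(p):
    """Marginal distributions of x1 and x2 under one class-conditional pmf."""
    m1 = [sum(p[(a, b)] for b in (0, 1)) for a in (0, 1)]
    m2 = [sum(p[(a, b)] for a in (0, 1)) for b in (0, 1)]
    return m1, m2

def naive_bayes_error(p0, p1, prior0=0.5):
    """Error of the rule that scores each class by prior * product of marginals."""
    prior1 = 1.0 - prior0
    m0, m1 = marginals(p0), marginals(p1)
    err = 0.0
    for a, b in CELLS:
        score0 = prior0 * m0[0][a] * m0[1][b]
        score1 = prior1 * m1[0][a] * m1[1][b]
        if score0 >= score1:          # NB labels the cell as class 0 (ties go to 0)
            err += prior1 * p1[(a, b)]
        else:                         # NB labels the cell as class 1
            err += prior0 * p0[(a, b)]
    return err

# Hypothetical XOR-like example: features are strongly dependent within each class.
xor0 = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
xor1 = {(0, 0): 0.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.0}
print(bayes_error(xor0, xor1), naive_bayes_error(xor0, xor1))
```

In the XOR-like case the marginals of the two classes coincide, so the NB rule cannot separate the classes even though the Bayes error is zero. When the features are independent within each class, the two errors agree.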


Subjects
Algorithms , Artificial Intelligence , Bayes Theorem , Statistical Data Interpretation , Automated Pattern Recognition/methods , Computer-Assisted Signal Processing , Computer Simulation , Logistic Models , Reproducibility of Results , Sensitivity and Specificity
2.
Comput Biol Med ; 37(8): 1194-202, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17222398

ABSTRACT

Scrapie is a neurodegenerative disease in small ruminants. A data set of 3113 records of sheep reported to the Scrapie Notifications Database in Great Britain has been studied. Clinical signs were recorded as present/absent in each animal by veterinary officials (VO) and a post-mortem diagnosis was made. In an attempt to detect healthy animals within the set of suspects using only the clinical signs, 18 classification methods were applied, ranging from simple linear classifiers to classifier ensembles such as Bagging, AdaBoost and Random Forests. The results suggest that the clinical classification by the VO was adequate, as no further differentiation within the set of suspects was feasible.


Subjects
Computer-Assisted Diagnosis/veterinary , Scrapie/diagnosis , Animals , Computer Simulation , Factual Databases , Computer-Assisted Diagnosis/classification , Computer-Assisted Diagnosis/statistics & numerical data , ROC Curve , Scrapie/classification , Sheep , United Kingdom
3.
IEEE Trans Pattern Anal Mach Intell ; 28(11): 1798-808, 2006 Nov.
Article in English | MEDLINE | ID: mdl-17063684

ABSTRACT

Many clustering algorithms, including cluster ensembles, rely on a random component. Stability of the results across different runs is considered to be an asset of the algorithm. The cluster ensembles considered here are based on k-means clusterers. Each clusterer is assigned a random target number of clusters, k, and is started from a random initialization. Here, we use 10 artificial and 10 real data sets to study ensemble stability with respect to random k and random initialization. The data sets were chosen to have a small number of clusters (two to seven) and a moderate number of data points (up to a few hundred). Pairwise stability is defined as the adjusted Rand index between pairs of clusterers in the ensemble, averaged across all pairs. Nonpairwise stability is defined as the entropy of the consensus matrix of the ensemble. An experimental comparison with the stability of the standard k-means algorithm was carried out for k from 2 to 20. The results revealed that ensembles are generally more stable, markedly so for larger k. To establish whether stability can serve as a cluster validity index, we first looked at the relationship between stability and accuracy with respect to the number of clusters, k. We found that such a relationship strongly depends on the data set, varying from almost perfect positive correlation (0.97, for the glass data) to almost perfect negative correlation (-0.93, for the crabs data). We propose a new combined stability index to be the sum of the pairwise individual and ensemble stabilities. This index was found to correlate better with the ensemble accuracy. Following the hypothesis that a point of stability of a clustering algorithm corresponds to a structure found in the data, we used the stability measures to pick the number of clusters. The combined stability index gave the best results.
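The pairwise stability described in this abstract (adjusted Rand index averaged over all pairs of clusterers) can be sketched in a few lines. This is a minimal illustration using the standard pair-counting form of the adjusted Rand index; the function names are chosen here, the label vectors are assumed to come from any clusterer, and each vector is assumed to cover the same two or more points.

```python
from collections import Counter
from itertools import combinations
from math import comb

def adjusted_rand_index(a, b):
    """Adjusted Rand index between two label vectors over the same points."""
    n = len(a)                                   # assumes n >= 2
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)        # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:                    # degenerate case, e.g. one cluster in both
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

def pairwise_stability(labelings):
    """Mean adjusted Rand index over all pairs of labelings in the ensemble."""
    scores = [adjusted_rand_index(a, b) for a, b in combinations(labelings, 2)]
    return sum(scores) / len(scores)

# Hypothetical ensemble of three clusterings of four points.
ensemble = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
print(pairwise_stability(ensemble))
```

The index ignores how clusters are named, so the first two labelings in the example count as identical partitions.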


Subjects
Algorithms , Artificial Intelligence , Cluster Analysis , Image Enhancement/methods , Computer-Assisted Image Interpretation/methods , Information Storage and Retrieval/methods , Automated Pattern Recognition/methods , Computer Simulation , Statistical Models , Random Allocation , Reproducibility of Results , Sensitivity and Specificity
4.
IEEE Trans Pattern Anal Mach Intell ; 28(10): 1619-30, 2006 Oct.
Article in English | MEDLINE | ID: mdl-16986543

ABSTRACT

We propose a method for generating classifier ensembles based on feature extraction. To create the training data for a base classifier, the feature set is randomly split into K subsets (K is a parameter of the algorithm) and Principal Component Analysis (PCA) is applied to each subset. All principal components are retained in order to preserve the variability information in the data. Thus, K axis rotations take place to form the new features for a base classifier. The idea of the rotation approach is to encourage simultaneously individual accuracy and diversity within the ensemble. Diversity is promoted through the feature extraction for each base classifier. Decision trees were chosen here because they are sensitive to rotation of the feature axes, hence the name "forest." Accuracy is sought by keeping all principal components and also using the whole data set to train each base classifier. Using WEKA, we examined the Rotation Forest ensemble on a random selection of 33 benchmark data sets from the UCI repository and compared it with Bagging, AdaBoost, and Random Forest. The results were favorable to Rotation Forest and prompted an investigation into the diversity-accuracy landscape of the ensemble models. Diversity-error diagrams revealed that Rotation Forest ensembles construct individual classifiers which are more accurate than those in AdaBoost and Random Forest, and more diverse than those in Bagging, sometimes more accurate as well.


Subjects
Algorithms , Artificial Intelligence , Cluster Analysis , Information Storage and Retrieval/methods , Statistical Models , Automated Pattern Recognition/methods , Computer Simulation , Computer-Assisted Numerical Analysis , Principal Component Analysis , Reproducibility of Results , Sensitivity and Specificity
5.
IEEE Trans Neural Netw Learn Syst ; 25(1): 69-80, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24806645

ABSTRACT

When classifiers are deployed in real-world applications, it is assumed that the distribution of the incoming data matches the distribution of the data used to train the classifier. This assumption is often incorrect, which necessitates some form of change detection or adaptive classification. While there has been a lot of work on change detection based on the classification error monitored over the course of the operation of the classifier, finding changes in multidimensional unlabeled data is still a challenge. Here, we propose to apply principal component analysis (PCA) for feature extraction prior to the change detection. Supported by a theoretical example, we argue that the components with the lowest variance should be retained as the extracted features because they are more likely to be affected by a change. We chose a recently proposed semiparametric log-likelihood change detection criterion that is sensitive to changes in both mean and variance of the multidimensional distribution. An experiment with 35 datasets and an illustration with a simple video segmentation demonstrate the advantage of using extracted features compared to raw data. Further analysis shows that feature extraction through PCA is beneficial, specifically for data with multiple balanced classes.

6.
IEEE Trans Cybern ; 44(3): 342-54, 2014 Mar.
Article in English | MEDLINE | ID: mdl-23757554

ABSTRACT

This paper describes a general method to address partial occlusions for human detection in still images. The random subspace method (RSM) is chosen for building a classifier ensemble robust against partial occlusions. The component classifiers are chosen on the basis of their individual and combined performance. The main contribution of this work lies in our approach's capability to improve the detection rate when partial occlusions are present without compromising the detection performance on non-occluded data. In contrast to many recent approaches, we propose a method which does not require manual labeling of body parts, defining any semantic spatial components, or using additional data coming from motion or stereo. Moreover, the method can be easily extended to other object classes. The experiments are performed on three large datasets: the INRIA person dataset, the Daimler Multicue dataset, and a new challenging dataset, called PobleSec, in which a considerable number of targets are partially occluded. The different approaches are evaluated at the classification and detection levels for both partially occluded and non-occluded data. The experimental results show that our detector outperforms state-of-the-art approaches in the presence of partial occlusions, while offering performance and reliability similar to those of the holistic approach on non-occluded data. The datasets used in our experiments have been made publicly available for benchmarking purposes.


Subjects
Biometric Identification/methods , Computer-Assisted Image Interpretation/methods , Machine Learning , Automated Pattern Recognition/methods , Photography/methods , Whole Body Imaging/methods , Algorithms , Computer Simulation , Statistical Data Interpretation , Humans , Image Enhancement/methods , Statistical Models , Reproducibility of Results , Sensitivity and Specificity , Subtraction Technique
7.
Magn Reson Imaging ; 28(4): 583-93, 2010 May.
Article in English | MEDLINE | ID: mdl-20096528

ABSTRACT

Functional magnetic resonance imaging (fMRI) is becoming a forefront brain-computer interface tool. To decipher brain patterns, fast, accurate and reliable classifier methods are needed. The support vector machine (SVM) classifier has been traditionally used. Here we argue that state-of-the-art methods from pattern recognition and machine learning, such as classifier ensembles, offer more accurate classification. This study compares 18 classification methods on a publicly available real data set due to Haxby et al. [Science 293 (2001) 2425-2430]. The data comes from a single-subject experiment, organized in 10 runs where eight classes of stimuli were presented in each run. The comparisons were carried out on voxel subsets of different sizes, selected through seven popular voxel selection methods. We found that, while SVM was robust, accurate and scalable, some classifier ensemble methods demonstrated significantly better performance. The best classifiers were found to be the random subspace ensemble of SVM classifiers, rotation forest and ensembles with random linear and random spherical oracle.


Subjects
Brain/pathology , Magnetic Resonance Imaging/methods , Algorithms , Artificial Intelligence , Bayes Theorem , Brain Mapping , Computer Simulation , Computers , Humans , Image Enhancement , Automated Pattern Recognition/methods , Reproducibility of Results , Computer-Assisted Signal Processing , Statistics as Topic
8.
IEEE Trans Med Imaging ; 29(2): 531-42, 2010 Feb.
Article in English | MEDLINE | ID: mdl-20129853

ABSTRACT

Classification of brain images obtained through functional magnetic resonance imaging (fMRI) poses a serious challenge to pattern recognition and machine learning due to the extremely large feature-to-instance ratio. This calls for revision and adaptation of the current state-of-the-art classification methods. We investigate the suitability of the random subspace (RS) ensemble method for fMRI classification. RS samples from the original feature set and builds one (base) classifier on each subset. The ensemble assigns a class label by either majority voting or averaging of output probabilities. Looking for guidelines for setting the two parameters of the method (ensemble size and feature sample size), we introduce three criteria calculated through these parameters: usability of the selected feature sets, coverage of the set of "important" features, and feature set diversity. Optimized together, these criteria work toward producing accurate and diverse individual classifiers. RS was tested on three fMRI datasets from single-subject experiments: the Haxby data (Haxby, 2001) and two datasets collected in-house. We found that RS with support vector machines (SVM) as the base classifier outperformed single classifiers as well as some of the most widely used classifier ensembles such as bagging, AdaBoost, random forest, and rotation forest. The closest rivals were the single SVM and bagging of SVM classifiers. We use kappa-error diagrams to understand the success of RS.
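The random subspace procedure summarized in this abstract (sample a feature subset, train one base classifier per subset, combine by majority vote) can be sketched as follows. This is a minimal illustration under stated assumptions: a nearest-centroid rule stands in for the SVM base classifier used in the paper, features are sampled without replacement within each subset, and all function names and the toy data are invented here.

```python
import random
from collections import Counter

def fit_centroids(X, y, features):
    """Base classifier (stand-in for SVM): class means on the chosen features."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_centroid(centroids, features, row):
    """Label of the nearest class centroid, measured on the chosen features only."""
    def dist2(center):
        return sum((row[f] - center[j]) ** 2 for j, f in enumerate(features))
    return min(centroids, key=lambda label: dist2(centroids[label]))

def fit_random_subspace(X, y, ensemble_size, sample_size, seed=0):
    """Train one base classifier per randomly sampled feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    ensemble = []
    for _ in range(ensemble_size):
        features = rng.sample(range(n_features), sample_size)
        ensemble.append((features, fit_centroids(X, y, features)))
    return ensemble

def predict_majority(ensemble, row):
    """Combine the base classifiers' labels by majority vote."""
    votes = Counter(predict_centroid(c, f, row) for f, c in ensemble)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: 4 instances, 6 features, two well-separated classes.
X = [[0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 1, 0],
     [10, 9, 10, 9, 10, 9], [9, 10, 9, 10, 9, 10]]
y = [0, 0, 1, 1]
ensemble = fit_random_subspace(X, y, ensemble_size=5, sample_size=3)
print(predict_majority(ensemble, [0.2] * 6), predict_majority(ensemble, [9.5] * 6))
```

The two arguments `ensemble_size` and `sample_size` correspond to the two parameters named in the abstract. Averaging of output probabilities, the other combination rule mentioned, would need a base classifier that outputs probabilities and is not shown here.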


Subjects
Algorithms , Brain/physiology , Magnetic Resonance Imaging/methods , Automated Pattern Recognition/methods , Adult , Computer Simulation , Humans , Male , Multivariate Analysis , Reproducibility of Results