Results 1 - 20 of 26
2.
Neural Netw ; 100: 39-48, 2018 Apr.
Article in English | MEDLINE | ID: mdl-29475014

ABSTRACT

The scalability of low-rank representation (LRR) to large-scale data is still a major research issue, because computing a singular value decomposition (SVD) in every optimization iteration is extremely time-consuming, especially for large matrices. Several methods have been proposed to speed up LRR, but they remain computationally heavy, and their overall representation results have been found to degrade. In this paper, a novel method called accelerated LRR (ALRR) is proposed for large-scale data. The proposed method integrates matrix factorization with nuclear-norm minimization to find a low-rank representation. The large square matrix of representation coefficients is transformed into a significantly smaller square matrix, on which SVD can be computed efficiently. The size of the transformed matrix does not depend on the number of data points, and the cost of optimizing ALRR is linear in the number of data points. The proposed ALRR is convex, accurate, robust, and efficient for large-scale data. ALRR is compared with state-of-the-art methods in subspace clustering and semi-supervised classification on real image datasets. The results verify the effectiveness and superiority of the proposed ALRR method.


Subject(s)
Pattern Recognition, Visual/classification , Statistics as Topic/classification , Supervised Machine Learning/classification , Algorithms , Artificial Intelligence/classification , Cluster Analysis , Learning
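As a hedged illustration of the idea in the abstract above (not the paper's own derivation): the standard LRR problem, followed by one well-known identity that lets a nuclear norm be evaluated on a much smaller matrix. The symbols X, Z, E, λ, Q, M, n, and d are assumptions introduced here; the abstract does not state the actual ALRR factorization.

```latex
% Standard LRR: n data points in the columns of X, coefficients Z, error term E
\min_{Z,E}\ \|Z\|_* + \lambda \|E\|_{2,1}
\quad \text{s.t.} \quad X = XZ + E, \qquad Z \in \mathbb{R}^{n \times n}

% Assumed factorization: Z = Q M Q^\top with Q \in \mathbb{R}^{n \times d},
% Q^\top Q = I_d, and d \ll n. The nonzero singular values of Z equal those of M, so
\|Z\|_* = \|Q M Q^\top\|_* = \|M\|_*, \qquad M \in \mathbb{R}^{d \times d}
```

Under this assumption, each iteration would need an SVD of a d x d matrix only, whose size is independent of n.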
3.
Neural Netw ; 96: 101-114, 2017 Dec.
Article in English | MEDLINE | ID: mdl-28987974

ABSTRACT

In this paper, a novel imbalance learning method for binary classes is proposed, named Post-Boosting of classification boundary for Imbalanced data (PBI), which can significantly improve the performance of the classification boundary of any trained neural network (NN). The PBI procedure consists of two steps: an (imbalanced) NN learning method is first applied to produce a classification boundary, which is then adjusted by PBI under the geometric mean (G-mean) criterion. For imbalanced data, the geometric mean of the accuracies on the minority and majority classes is considered, which is statistically more suitable than the commonly used accuracy metric. PBI also has the following advantages over traditional imbalance methods: (i) PBI can significantly improve the classification accuracy on the minority class while improving or maintaining that on the majority class; (ii) PBI is suitable for large data sets, even with a high imbalance ratio (up to 0.001). For the evaluation of (i), a new metric called Majority loss/Minority advance ratio (MMR) is proposed, which evaluates the ratio of the loss on the majority class to the advance on the minority class. Experiments were conducted with PBI and several imbalance learning methods on benchmark datasets of different sizes, imbalance ratios, and dimensionalities. The experimental results show that PBI outperforms the other imbalance learning methods on almost all datasets.


Subject(s)
Machine Learning/classification , Neural Networks, Computer , Statistics as Topic/classification , Algorithms , Biometry
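To make the G-mean criterion in the abstract above concrete, here is a minimal sketch. It is not the PBI algorithm itself: the threshold search in `best_threshold`, the function names, and the toy scores are all assumptions chosen for illustration. It shows why plain accuracy can look good on imbalanced data while G-mean exposes a useless boundary.

```python
import math

def g_mean(y_true, y_pred):
    """Geometric mean of the minority (1) and majority (0) class accuracies."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    tpr = sum(1 for p in pos if p == 1) / len(pos)  # accuracy on minority class
    tnr = sum(1 for p in neg if p == 0) / len(neg)  # accuracy on majority class
    return math.sqrt(tpr * tnr)

def accuracy(y_true, y_pred):
    """Plain accuracy over all samples."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def best_threshold(y_true, scores):
    """Hypothetical post-hoc step: pick the score cutoff that maximizes G-mean."""
    best_t, best_g = None, -1.0
    for t in sorted(set(scores)):
        g = g_mean(y_true, [1 if s >= t else 0 for s in scores])
        if g > best_g:
            best_t, best_g = t, g
    return best_t, best_g

# Toy data: 2 minority samples among 10; scores are made-up classifier outputs.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.45, 0.40, 0.35, 0.30, 0.20, 0.20, 0.10, 0.10, 0.05, 0.05]

# A default cutoff of 0.5 labels every sample as majority.
default_pred = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(y_true, default_pred), g_mean(y_true, default_pred))
print(best_threshold(y_true, scores))
```

On this toy data the default boundary has high accuracy but a G-mean of zero, because it never detects the minority class; moving the cutoff recovers both classes.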
4.
Neural Netw ; 75: 150-61, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26797472

ABSTRACT

This paper presents new Radial Basis Function (RBF) learning methods for classification problems. The proposed methods use heuristics to determine the spreads, the centers, and the number of hidden neurons of the network so that higher efficiency is achieved with fewer neurons, while the learning algorithm remains fast and simple. To keep the network size limited, neurons are added to the network recursively until a termination condition is met. Each neuron covers part of the training data. The termination condition is to cover all training data or to reach the maximum number of neurons. In each step, the center and spread of the new neuron are selected to maximize its coverage. Maximizing the coverage of the neurons leads to a network with fewer neurons and indeed a lower VC dimension and better generalization. Using the power exponential distribution function as the activation function of the hidden neurons, and in light of the new learning approaches, it is proved that all data become linearly separable in the space of hidden layer outputs, which implies that there exist linear output layer weights with zero training error. The proposed methods are applied to some well-known datasets, and the simulation results, compared with SVM and some other leading RBF learning methods, show satisfactory and comparable performance.


Subject(s)
Heuristics , Machine Learning , Neural Networks, Computer , Statistics as Topic/classification , Algorithms , Humans , Neurons/physiology , Time Factors
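The add-neurons-until-covered loop described in the abstract above can be sketched as a greedy cover. This is an illustration under stated assumptions, not the paper's heuristics: the rule that a neuron's spread is the distance to the nearest point of another class, the function name `greedy_rbf_cover`, and the toy data are all hypothetical. The sketch also assumes at least two classes are present.

```python
import math

def greedy_rbf_cover(points, labels, max_neurons=10):
    """Greedily place (center, radius, label) neurons to cover the training set.

    Assumption (not from the abstract): a neuron centered on a training point
    covers same-class points closer than the nearest point of another class.
    """
    uncovered = set(range(len(points)))
    neurons = []
    # Terminate when all data are covered or the neuron budget is exhausted.
    while uncovered and len(neurons) < max_neurons:
        best = None
        for i in sorted(uncovered):
            # Spread: distance from the candidate center to the nearest other-class point.
            radius = min(math.dist(points[i], points[j])
                         for j in range(len(points)) if labels[j] != labels[i])
            covered = {j for j in uncovered
                       if labels[j] == labels[i]
                       and math.dist(points[i], points[j]) < radius}
            # Keep the candidate that covers the most still-uncovered points.
            if best is None or len(covered) > len(best[2]):
                best = (i, radius, covered)
        center, radius, covered = best
        neurons.append((points[center], radius, labels[center]))
        uncovered -= covered
    return neurons

# Toy 1-D data: three points of class 0, two points of class 1.
points = [(0.0,), (1.0,), (2.0,), (5.0,), (6.0,)]
labels = [0, 0, 0, 1, 1]
neurons = greedy_rbf_cover(points, labels)
print(neurons)
```

Choosing the candidate with the largest coverage at each step is what keeps the neuron count small; here two neurons suffice for five points.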
5.
Arthritis Care Res (Hoboken) ; 68(5): 612-20, 2016 May.
Article in English | MEDLINE | ID: mdl-26414884

ABSTRACT

OBJECTIVE: Knee osteoarthritis (OA) is a broadly applied diagnosis that may describe multiple subtypes of pain. The purpose of this study was to identify phenotypes of knee OA, using measures from the following pain-related domains: 1) knee OA pathology, 2) psychological distress, and 3) altered pain neurophysiology. METHODS: Data were selected from a total of 3,494 participants at visit 6 of the Osteoarthritis Initiative study. Latent class analysis was applied to the following variables: radiographic OA severity, quadriceps strength, body mass index, the Charlson Comorbidity Index (CCI), the Center for Epidemiologic Studies Depression Scale, the Coping Strategies Questionnaire-Catastrophizing subscale, number of bodily pain sites, and knee joint tenderness at 4 sites. The resulting classes were compared on the following demographic and clinical factors: age, sex, pain severity, disability, walking speed, and use of arthritis-related health care. RESULTS: A 4-class model was identified. Class 1 (4% of the study population) had higher CCI scores. Class 2 (24%) had higher knee joint sensitivity. Class 3 (10%) had greater psychological distress. Class 4 (62%) had less radiographic OA, little psychological involvement, greater strength, and less pain sensitivity. Additionally, class 1 was the oldest, on average. Class 4 was the youngest and had the lowest disability and the least pain. Class 3 had the worst disability and the most pain. CONCLUSION: Four distinct pain phenotypes of knee OA were identified. Psychological factors, comorbidity status, and joint sensitivity appear to be important in defining phenotypes of knee OA-related pain.


Subject(s)
Osteoarthritis, Knee/classification , Osteoarthritis, Knee/diagnosis , Pain Measurement/classification , Pain/classification , Pain/diagnosis , Phenotype , Aged , Cross-Sectional Studies , Female , Follow-Up Studies , Humans , Longitudinal Studies , Male , Middle Aged , Osteoarthritis, Knee/epidemiology , Pain/epidemiology , Pain Measurement/methods , Statistics as Topic/classification , Statistics as Topic/methods
10.
Neural Netw ; 31: 53-72, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22497802

ABSTRACT

This paper presents a survey as well as an empirical comparison and evaluation of seven kernels on graphs and two related similarity matrices, which we collectively refer to as "kernels on graphs" for simplicity. They are the exponential diffusion kernel, the Laplacian exponential diffusion kernel, the von Neumann diffusion kernel, the regularized Laplacian kernel, the commute-time (or resistance-distance) kernel, the random-walk-with-restart similarity matrix, and finally, a kernel first introduced in this paper (the regularized commute-time kernel) and two kernels defined in some of our previous work and further investigated in this paper (the Markov diffusion kernel and the relative-entropy diffusion matrix). The kernel-on-graphs approach is simple and intuitive. It is illustrated by applying the nine kernels to a collaborative-recommendation task, viewed as a link prediction problem, and to a semi-supervised classification task, both on several databases. The methods compute proximity measures between nodes that help in studying the structure of the graph. Our comparisons suggest that the regularized commute-time and the Markov diffusion kernels perform best on the investigated tasks, closely followed by the regularized Laplacian kernel.


Subject(s)
Databases, Factual/classification , Markov Chains , Statistics as Topic/classification , Random Allocation
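For orientation, five of the kernels named in the abstract above are commonly written as follows in the graph-kernel literature. This is a reference sketch, not a transcription from the paper: the symbols A (adjacency matrix), D (diagonal degree matrix), L (graph Laplacian), and α (a positive parameter) are assumptions, and the paper's own parameterization may differ.

```latex
% A: adjacency matrix, D: diagonal degree matrix, L = D - A: graph Laplacian
K_{\mathrm{ED}}  = \exp(\alpha A)            % exponential diffusion kernel
K_{\mathrm{LED}} = \exp(-\alpha L)           % Laplacian exponential diffusion kernel
K_{\mathrm{VND}} = (I - \alpha A)^{-1}, \quad 0 < \alpha < \rho(A)^{-1}   % von Neumann diffusion kernel
K_{\mathrm{RL}}  = (I + \alpha L)^{-1}       % regularized Laplacian kernel
K_{\mathrm{CT}}  = L^{+}                     % commute-time kernel (Moore-Penrose pseudoinverse of L)
```

Here ρ(A) denotes the spectral radius of A; the bound on α keeps the von Neumann series convergent.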
12.
Article in English | MEDLINE | ID: mdl-19963901

ABSTRACT

This work examines support vector machine (SVM) classification of complex fMRI data, both in the image domain and in the acquired k-space data. We achieve high classification accuracy using the magnitude data in both domains. Additionally, we maintain high classification accuracy even when using only partial k-space data. Thus we demonstrate the feasibility of using k-space data for classification, enabling rapid real-time acquisition and classification.


Subject(s)
Algorithms , Magnetic Resonance Imaging/methods , Statistics as Topic/classification , Humans
13.
Talanta ; 76(3): 602-9, 2008 Jul 30.
Article in English | MEDLINE | ID: mdl-18585327

ABSTRACT

Missing elements and outliers often occur in experimental data. The presence of outliers makes it difficult to estimate the parameters of any least squares model, while missing values hinder the proper identification of outliers. Therefore, approaches that can handle incomplete data containing outliers are highly valued. In this paper, we present the expectation-maximization robust soft independent modeling of class analogy approach (EM-S-SIMCA), based on the recently introduced spherical SIMCA method. Several important issues are discussed, such as choosing the complexity of the model with the leverage correction procedure, selecting training and test sets using methods of uniform design for incomplete data, and predicting new samples containing missing elements. The results of a comparison study showed that EM-S-SIMCA outperforms the classic expectation-maximization SIMCA method. The performance of the method was illustrated on simulated and real data sets and led to satisfactory results.


Subject(s)
Least-Squares Analysis , Statistics as Topic/classification , Classification , Research Design , Statistics as Topic/methods
16.
Stat Appl Genet Mol Biol ; 5: Article16, 2006.
Article in English | MEDLINE | ID: mdl-17049027

ABSTRACT

This note is a comment on the article "Dimension Reduction for Classification with Gene Expression Microarray Data" that appeared in Statistical Applications in Genetics and Molecular Biology (Dai et al., 2006).


Subject(s)
Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Statistics as Topic/classification , Computational Biology , Statistics as Topic/methods
17.
Parasitology ; 132(Pt 2): 157-67, 2006 Feb.
Article in English | MEDLINE | ID: mdl-16472413

ABSTRACT

A full understanding of the immune system and its responses to infection by different pathogens is important for the development of anti-parasitic vaccines. A growing number of large-scale experimental techniques, such as microarrays, are being used to gain a better understanding of the immune system. To analyse the data generated by these experiments, methods such as clustering are widely used. However, individual applications of these methods tend to analyse the experimental data without taking publicly available biological and immunological knowledge into account systematically and in an unbiased manner. To make best use of the experimental investment, to benefit from existing evidence, and to support the findings in the experimental data, available biological information should be included in the analysis in a systematic manner. In this review we present a classification of tasks that shows how experimental data produced by studies of the immune system can be placed in a broader biological context. Taking into account available evidence, the classification can be used to identify different ways of analysing the experimental data systematically. We have used the classification to identify alternative ways of analysing microarray data, and illustrate its application using studies of immune responses in mice to infection with the intestinal nematode parasites Trichuris muris and Heligmosomoides polygyrus.


Subject(s)
Data Collection/classification , Data Collection/methods , Genomics , Immunity, Active/genetics , Task Performance and Analysis , Allergy and Immunology/classification , Animals , Data Collection/standards , Genomics/methods , Mice , Nematospiroides dubius/immunology , Oligonucleotide Array Sequence Analysis/methods , Statistics as Topic/classification , Statistics as Topic/methods , Statistics as Topic/standards , Strongylida Infections/genetics , Strongylida Infections/immunology , Trichuriasis/genetics , Trichuriasis/immunology , Trichuris/immunology
18.
Zhonghua Yi Xue Za Zhi ; 85(27): 1936-40, 2005 Jul 20.
Article in Chinese | MEDLINE | ID: mdl-16255993

ABSTRACT

OBJECTIVE: To identify the main reason why so many people fail to grasp statistics and to propose a "triple-type theory of statistics" that addresses the problem in a creative way. METHODS: Based on long experience in teaching and research in statistics, the "three-type theory" was proposed and clarified. Examples were provided to demonstrate that the 3 types, i.e., the expressive type, the prototype, and the standardized type, are essential for applying statistics rationally in both theory and practice; further instances demonstrate that the "three types" are correlated with each other. The theory can help people see the essence of problems in experimental design and statistical analysis in medical research by interpreting and analyzing them. RESULTS: Investigations reveal that for some questions the three types are identical; for some, the prototype is also the standardized type; for others, the three types are distinct from each other. In some multifactor experimental studies, no standardized type corresponding to the prototype exists at all, because the researchers committed the mistake of "incomplete control" when setting up the experimental groups. This problem should be solved by the concept and method of "division". CONCLUSION: Once the "triple-type" for each question is clarified, a proper experimental design and statistical method can be chosen easily. The "triple-type theory of statistics" can help people avoid statistical mistakes, or at least dramatically decrease the misuse rate, and can improve the quality, level, and speed of biomedical research that applies statistics. It can also help improve the quality of statistical textbooks and the effectiveness of statistics teaching, and it demonstrates how to advance biomedical statistics.


Subject(s)
Biomedical Research , Statistics as Topic/classification , Statistics as Topic/methods
19.
Article in Russian | MEDLINE | ID: mdl-16028534

ABSTRACT

Mathematical statistics deals with abstract notions, while medicine solves complicated and many-sided problems. For this reason medical statistics faces some moot points in the interpretation of a number of notions and the classification of statistical indices. In the present article the definition of variables and statistical indices is formulated and their characterization is given. An attempt is made to provide the systematization and natural classification of the latter. Statistical indices are defined as the characteristics of statistical totalities. To classify statistical indices, the most essential signs are used: the character of a variable (external relations), the trend of study (internal content), the form of expression (calculation), derived indices and characteristics (comparison and the results of analysis).


Subject(s)
Communicable Diseases/epidemiology , Statistics as Topic/classification , Algorithms , Animals , Communicable Diseases/classification , Humans
20.
J Zhejiang Univ Sci ; 5(9): 1165-8, 2004 Sep.
Article in English | MEDLINE | ID: mdl-15323015

ABSTRACT

This study investigated the characteristics that people perceive in tables and graphs and the data types for which people consider each of the two displays most appropriate. Participants in this survey were 195 teachers and undergraduates from four universities in Beijing. The results showed that people hold different attitudes towards the two forms of display.


Subject(s)
Computer Graphics/classification , Computer Graphics/statistics & numerical data , Consumer Behavior/statistics & numerical data , Data Interpretation, Statistical , Information Storage and Retrieval/methods , User-Computer Interface , Visual Perception , China , Data Collection , Humans , Information Storage and Retrieval/classification , Statistics as Topic/classification , Statistics as Topic/methods
...