1.
Data Min Knowl Discov ; 37(4): 1473-1517, 2023.
Article in English | MEDLINE | ID: mdl-37424877

ABSTRACT

It has been shown that unsupervised outlier detection methods can be adapted to the one-class classification problem (Janssens and Postma, in: Proceedings of the 18th annual Belgian-Dutch conference on machine learning, pp 56-64, 2009; Janssens et al., in: Proceedings of the 2009 ICMLA international conference on machine learning and applications, IEEE Computer Society, pp 147-153, 2009. 10.1109/ICMLA.2009.16). In this paper, we focus on the comparison of one-class classification algorithms with such adapted unsupervised outlier detection methods, improving on previous comparison studies in several important aspects. We study a number of one-class classification and unsupervised outlier detection methods in a rigorous experimental setup, comparing them on a large number of datasets with different characteristics, using different performance measures. In contrast to previous comparison studies, where the models (algorithms, parameters) are selected using examples from both classes (outlier and inlier), here we also study and compare different approaches for model selection in the absence of examples from the outlier class, which is more realistic for practical applications since labeled outliers are rarely available. Our results show that, overall, SVDD (support vector data description) and GMM (Gaussian mixture model) are top performers, regardless of whether the ground truth is used for parameter selection or not. However, in specific application scenarios, other methods exhibited better performance. Combining one-class classifiers into ensembles showed better performance than individual methods in terms of accuracy, as long as the ensemble members are properly selected. Supplementary Information: The online version contains supplementary material available at 10.1007/s10618-023-00931-x.
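As a rough illustration of the kind of comparison described above, the sketch below fits two common one-class models on inlier-only training data: scikit-learn's OneClassSVM (with an RBF kernel, a close relative of SVDD) and a Gaussian mixture model with a likelihood threshold. The synthetic data, the 5% threshold, and all parameter values are arbitrary choices for the example, not the paper's experimental setup.

    # Hedged sketch: one-class scoring with an SVDD-like model and a GMM.
    import numpy as np
    from sklearn.svm import OneClassSVM
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    X_train = rng.normal(0.0, 1.0, size=(500, 2))              # inliers only
    X_test = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),    # inliers
                        rng.uniform(-6.0, 6.0, size=(10, 2))]) # likely outliers

    # SVDD-like decision boundary via a one-class SVM with an RBF kernel
    ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_train)
    svm_pred = ocsvm.predict(X_test)                           # +1 inlier, -1 outlier

    # Density-based alternative: Gaussian mixture with a log-likelihood threshold
    gmm = GaussianMixture(n_components=2, random_state=0).fit(X_train)
    threshold = np.quantile(gmm.score_samples(X_train), 0.05)
    gmm_pred = np.where(gmm.score_samples(X_test) >= threshold, 1, -1)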

5.
IEEE Trans Pattern Anal Mach Intell ; 42(9): 2096-2112, 2020 Sep.
Article in English | MEDLINE | ID: mdl-31027043

ABSTRACT

The application of clustering involves the interpretation of objects placed in multi-dimensional spaces. The clustering task itself is inherently subjective; the optimal solution can be extremely costly to discover and is sometimes unreachable or nonexistent. This introduces a trade-off between accuracy and computational effort, especially since engineering applications usually work well with suboptimal solutions. In such applied scenarios, cluster validation is mandatory to refine algorithms and to ensure that solutions are meaningful. Validity indices are commonly intended to benchmark diverse clustering setups; they are therefore coefficients of a relative nature, i.e., useful only when compared to one another. In this paper, we propose a validation methodology that enables absolute evaluations of clustering results. Our method performs geometric measurements of the solution space and provides a coherent interpretation of the data structure by using indices based on inter- and intra-cluster distances, density, and multimodality within clusters. Tests and comparisons with well-known indices show that our validation methodology improves the robustness of the clustering application for knowledge discovery. While clustering is often performed as a black-box technique, our index is interpretable and therefore allows for the implementation of systems enriched with self-checking capabilities.
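For context, the sketch below computes two standard relative validity indices based on inter- and intra-cluster distances (silhouette and Davies-Bouldin) for several candidate numbers of clusters. It illustrates the relative comparison the abstract contrasts with; it is not the authors' absolute index, and the synthetic blobs and the range of k are arbitrary choices for the example.

    # Hedged sketch: relative cluster validation with distance-based indices.
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import silhouette_score, davies_bouldin_score

    X, _ = make_blobs(n_samples=600, centers=4, cluster_std=1.0, random_state=0)

    for k in (2, 3, 4, 5, 6):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        print(k,
              round(silhouette_score(X, labels), 3),      # higher is better
              round(davies_bouldin_score(X, labels), 3))  # lower is better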

6.
Data Min Knowl Discov ; 33(6): 1894-1952, 2019.
Article in English | MEDLINE | ID: mdl-32831623

ABSTRACT

Semi-supervised learning is drawing increasing attention in the era of big data, as the gap between the abundance of cheap, automatically collected unlabeled data and the scarcity of labeled data, which are laborious and expensive to obtain, is growing dramatically. In this paper, we first introduce a unified view of density-based clustering algorithms. We then build upon this view to bridge the areas of semi-supervised clustering and classification under a common umbrella of density-based techniques. We show that there are close relations between density-based clustering algorithms and the graph-based approach to transductive classification. These relations are then used as the basis for a new framework for semi-supervised classification built from density-based clustering building blocks. This framework is not only efficient and effective but also statistically sound. In addition, we generalize the core algorithm in our framework, HDBSCAN*, so that it can also perform semi-supervised clustering by directly taking advantage of any fraction of labeled data that may be available. Experimental results on a large collection of datasets show the advantages of the proposed approach both for semi-supervised classification and for semi-supervised clustering.
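The abstract relates density-based clustering to graph-based transductive classification; the sketch below shows that general idea with scikit-learn's LabelSpreading over a k-nearest-neighbor graph, using only a handful of labels. It is a generic stand-in, not the authors' HDBSCAN*-based framework; the two-moons data, the 10 labeled points, and k = 7 are arbitrary choices for the example.

    # Hedged sketch: graph-based transductive classification with few labels.
    import numpy as np
    from sklearn.datasets import make_moons
    from sklearn.semi_supervised import LabelSpreading

    X, y = make_moons(n_samples=400, noise=0.08, random_state=0)
    y_partial = np.full_like(y, -1)            # -1 marks unlabeled points
    labeled = np.random.default_rng(0).choice(len(y), size=10, replace=False)
    y_partial[labeled] = y[labeled]            # keep only a few labels

    model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y_partial)
    print("transductive accuracy:", (model.transduction_ == y).mean())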

7.
Article in English | MEDLINE | ID: mdl-20671325

ABSTRACT

Automatic classification of proteins using machine learning is an important problem that has received significant attention in the literature. One feature of this problem is that expert-defined hierarchies of protein classes exist and can potentially be exploited to improve classification performance. In this article, we investigate empirically whether this is the case for two such hierarchies. We compare multiclass classification techniques that exploit the information in those class hierarchies and those that do not, using logistic regression, decision trees, bagged decision trees, and support vector machines as the underlying base learners. In particular, we compare hierarchical and flat variants of ensembles of nested dichotomies. The latter have been shown to deliver strong classification performance in multiclass settings. We present experimental results for synthetic, fold recognition, enzyme classification, and remote homology detection data. Our results show that exploiting the class hierarchy improves performance on the synthetic data but not in the case of the protein classification problems. Based on this, we recommend that strong flat multiclass methods be used as a baseline to establish the benefit of exploiting class hierarchies in this area.
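The recommendation above, using strong flat multiclass methods as a baseline, can be sketched with the same base learners named in the abstract. The digits dataset and the hyperparameters below are arbitrary stand-ins for a protein feature matrix and a tuned setup; ensembles of nested dichotomies are not part of scikit-learn and are not shown here.

    # Hedged sketch: flat multiclass baselines with the paper's base learners.
    from sklearn.datasets import load_digits
    from sklearn.ensemble import BaggingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_digits(return_X_y=True)          # placeholder for protein features
    flat_baselines = {
        "logistic regression": LogisticRegression(max_iter=2000),
        "decision tree": DecisionTreeClassifier(random_state=0),
        "bagged trees": BaggingClassifier(DecisionTreeClassifier(),
                                          n_estimators=50, random_state=0),
        "SVM": SVC(kernel="rbf", gamma="scale"),
    }
    for name, clf in flat_baselines.items():
        print(name, cross_val_score(clf, X, y, cv=5).mean().round(3))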


Subject(s)
Algorithms; Artificial Intelligence; Proteins/chemistry; Proteins/classification; Computing Methodologies; Molecular Sequence Data; Pattern Recognition, Automated; Protein Folding
8.
J Bioinform Comput Biol ; 7(2): 269-85, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19340915

ABSTRACT

In the past decade, many automated prediction methods for the subcellular localization of proteins have been proposed, utilizing a wide range of principles and learning approaches. Based on an experimental evaluation of different methods and their theoretical properties, we propose to combine a well-balanced set of existing approaches into new, ensemble-based prediction methods. The experimental evaluation shows that our ensembles improve substantially over the underlying base methods.
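A minimal sketch of the ensemble idea follows, combining several base predictors by soft voting with scikit-learn's VotingClassifier. The three classifiers and the iris data are generic stand-ins assumed for illustration only; they are not the subcellular-localization predictors evaluated in the paper.

    # Hedged sketch: a soft-voting ensemble over heterogeneous base classifiers.
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB

    X, y = load_iris(return_X_y=True)            # placeholder for protein features
    ensemble = VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
                    ("nb", GaussianNB())],
        voting="soft",                           # average predicted probabilities
    )
    print("ensemble CV accuracy:",
          cross_val_score(ensemble, X, y, cv=5).mean().round(3))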


Subject(s)
Algorithms; Proteins/chemistry; Proteins/metabolism; Sequence Analysis, Protein/methods; Subcellular Fractions/chemistry; Subcellular Fractions/metabolism; Amino Acid Sequence; Molecular Sequence Data; Structure-Activity Relationship