Results 1 - 8 of 8
1.
BMC Med Inform Decis Mak ; 21(1): 303, 2021 11 01.
Article in English | MEDLINE | ID: mdl-34724933

ABSTRACT

BACKGROUND: Accurately predicting which patients with chronic heart failure (CHF) are particularly vulnerable to adverse outcomes is of crucial importance to support clinical decision making. The goal of the current study was to examine the predictive value of machine learning (ML) and traditional statistical techniques for long-term heart failure (HF) hospitalisation and all-cause mortality in CHF patients, using a Dutch health insurance claims database. METHODS: Our study population consisted of 25,776 patients with a CHF diagnosis code between 2012 and 2014. One-year and three-year follow-up HF hospitalisation (1446 and 3220 patients, respectively) and all-cause mortality (2434 and 7882 patients, respectively) were measured from 2015 to 2018. The area under the receiver operating characteristic (ROC) curve (AUC) was calculated after modelling the data using Logistic Regression, Random Forest, Elastic Net regression and Neural Networks. RESULTS: AUC values ranged from 0.710 to 0.732 for 1-year HF hospitalisation, 0.705 to 0.733 for 3-year HF hospitalisation, 0.765 to 0.787 for 1-year mortality and 0.764 to 0.791 for 3-year mortality. Elastic Net performed best for all endpoints. Differences between techniques were small and only statistically significant between Elastic Net and Logistic Regression compared with Random Forest for 3-year HF hospitalisation. CONCLUSION: In this study, based on a health insurance claims database, we found clear predictive value for long-term HF hospitalisation and mortality of CHF patients using ML techniques as compared with traditional statistics.
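The abstract names four model families and the AUC as the evaluation metric. The sketch below is only an illustration of that comparison, assuming scikit-learn and a synthetic binary-outcome dataset in place of the (non-public) claims data; it is not the authors' pipeline.

```python
# Minimal sketch (not the authors' pipeline): comparing the four model families
# from the abstract by AUC on a synthetic, imbalanced binary-outcome dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

# Placeholder data: 5000 "patients", 40 claims-derived features, ~10% positives.
X, y = make_classification(n_samples=5000, n_features=40, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "Elastic Net": LogisticRegression(penalty="elasticnet", solver="saga",
                                      l1_ratio=0.5, C=1.0, max_iter=5000),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```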


Subject(s)
Heart Failure, Hospitalization, Humans, Logistic Models, Machine Learning, ROC Curve
2.
Forensic Sci Res ; 3(3): 230-239, 2018.
Article in English | MEDLINE | ID: mdl-30483673

ABSTRACT

Law enforcement agencies have a restricted area in which their powers apply, called their jurisdiction. These restrictions also apply to the Internet. However, on the Internet, the physical borders of the jurisdiction, typically country borders, are hard to discover. In our case, it is hard to establish whether someone involved in criminal online behavior is indeed a Dutch citizen. We propose a way to overcome the arduous task of manually investigating whether a user on an Internet forum is Dutch or not. More precisely, we aim to detect that a given English text was written by a native Dutch author. To develop a detector, we follow a machine learning approach, which requires preparing a specific training corpus. To obtain a corpus that is representative of online forums, we collected a large number of English forum posts from Dutch and non-Dutch authors on Reddit. To learn a detection model, we used a bag-of-words representation to capture potential misspellings, grammatical errors or unusual turns of phrase that are characteristic of the authors' mother tongue. For this learning task, we compare the linear support vector machine and regularized logistic regression using the appropriate performance metrics: F1 score, precision, and average precision. Our results show that logistic regression with frequency-based feature selection performs best at predicting Dutch natives. Further study should be directed at the general applicability of the results, that is, at finding out whether the developed models carry over to other forums with comparably high performance.
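As a rough illustration of the described setup, the following sketch combines a bag-of-words representation with a linear SVM and regularized logistic regression and reports F1, precision, and average precision. The toy posts and labels are invented placeholders, not the Reddit corpus, and the model is scored on its own training data purely to keep the example short.

```python
# Hedged sketch, not the authors' corpus or code: bag-of-words features with a
# linear SVM vs. regularized logistic regression, scored with F1, precision,
# and average precision.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score, precision_score, average_precision_score

# Tiny placeholder corpus: label 1 = hypothetical Dutch-native English, 0 = other.
texts = ["I make a photo tomorrow", "He has the book readed",
         "I will take a picture tomorrow", "He has read the book"]
labels = [1, 1, 0, 0]

vec = CountVectorizer()                 # the real study also applied a frequency-based vocabulary cut-off
X = vec.fit_transform(texts)

for name, clf in [("LinearSVC", LinearSVC()),
                  ("LogisticRegression", LogisticRegression(max_iter=1000))]:
    clf.fit(X, labels)
    pred = clf.predict(X)               # scored on the training data for brevity only
    score = clf.decision_function(X)
    print(name,
          "F1=%.2f" % f1_score(labels, pred),
          "precision=%.2f" % precision_score(labels, pred),
          "AP=%.2f" % average_precision_score(labels, score))
```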

3.
IEEE Trans Pattern Anal Mach Intell ; 27(9): 1417-29, 2005 Sep.
Article in English | MEDLINE | ID: mdl-16173185

ABSTRACT

We present the Nearest Subclass Classifier (NSC), a classification algorithm that unifies the flexibility of the nearest neighbor classifier with the robustness of the nearest mean classifier. The algorithm is based on the Maximum Variance Cluster algorithm and, as such, belongs to the class of prototype-based classifiers. The variance constraint parameter of the cluster algorithm serves to regularize the classifier, that is, to prevent overfitting. With a low variance constraint value, the classifier turns into the nearest neighbor classifier and, with a high variance constraint value, it becomes the nearest mean classifier, with the respective properties. In other words, the number of prototypes ranges from the whole training set to only one per class. In the experiments, we compared the NSC to several other prototype-based methods with regard to performance and data set compression ratio. On several data sets, the NSC performed similarly to the k-nearest neighbor classifier, which is a well-established classifier in many domains. The NSC also has favorable storage requirements and classification speed, so it offers a good compromise between classification performance and efficiency.
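The NSC itself relies on the Maximum Variance Cluster algorithm, which has no standard library implementation; the sketch below therefore only illustrates its two limiting cases named in the abstract, the nearest mean classifier and the nearest neighbor classifier, using scikit-learn on the Iris data.

```python
# Limiting cases of the NSC only: one prototype per class (nearest mean) vs.
# every training point as a prototype (1-nearest neighbor).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid

X, y = load_iris(return_X_y=True)
for name, clf in [("nearest mean", NearestCentroid()),
                  ("1-nearest neighbor", KNeighborsClassifier(n_neighbors=1))]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: cross-validated accuracy = {acc:.3f}")
```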


Subject(s)
Algorithms, Artificial Intelligence, Cluster Analysis, Statistical Models, Automated Pattern Recognition/methods
4.
IEEE Trans Pattern Anal Mach Intell ; 27(9): 1496-500, 2005 Sep.
Article in English | MEDLINE | ID: mdl-16173192

ABSTRACT

In this paper, we specifically focus on high-dimensional data sets for which the number of dimensions is an order of magnitude higher than the number of objects. From a classifier design standpoint, such small sample size problems pose some interesting challenges. The first challenge is to find, among all hyperplanes that separate the classes, a separating hyperplane that generalizes well to future data. A second important task is to determine which features are required to distinguish the classes. To attack these problems, we propose the LESS (Lowest Error in a Sparse Subspace) classifier, which efficiently finds linear discriminants in a sparse subspace. In contrast with most classifiers for high-dimensional data sets, the LESS classifier incorporates a (simple) data model. Further, by means of a regularization parameter, the classifier establishes a suitable trade-off between subspace sparseness and classification accuracy. In the experiments, we show how LESS performs on several high-dimensional data sets and compare its performance to related state-of-the-art classifiers such as linear ridge regression with the LASSO and the Support Vector Machine. It turns out that LESS performs competitively while using fewer dimensions.
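LESS is not available in common libraries, so the following sketch merely illustrates the general idea of a sparse linear classifier on a small-sample, high-dimensional problem, using an L1-penalized logistic regression as a stand-in and reporting how many dimensions the sparse solution actually uses.

```python
# Stand-in for LESS: an L1-penalized linear model on a "wide" dataset
# (far more features than objects), with a count of the retained dimensions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 60 objects, 1000 dimensions: a typical small sample size setting.
X, y = make_classification(n_samples=60, n_features=1000, n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)  # C controls sparseness
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
print("dimensions used:", np.count_nonzero(clf.coef_))
```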


Subject(s)
Algorithms, Artificial Intelligence, Information Storage and Retrieval/methods, Statistical Models, Automated Pattern Recognition/methods, Cluster Analysis, Computer Simulation
5.
IEEE Trans Image Process ; 12(3): 304-16, 2003.
Article in English | MEDLINE | ID: mdl-18237910

ABSTRACT

Clustering is inherently a difficult problem, both with respect to the definition of adequate models and with respect to the optimization of those models. We present a model for the clustering problem that does not require a priori knowledge of the number of clusters. This property is useful in, among other areas, the image segmentation domain, which we specifically address. Further, we propose a cellular coevolutionary algorithm for the optimization of the model. Within this scheme, multiple agents are placed in a regular two-dimensional (2-D) grid representing the image, which imposes neighboring relations on them. The agents cooperatively consider pixel migration from one agent to the other in order to improve the homogeneity of the ensemble of image regions they represent. If the union of the regions of neighboring agents is homogeneous, the agents form alliances. On the other hand, if an agent discovers a deviant subject, it isolates that subject. In the experiments, we show the effectiveness of the proposed method and compare it to other segmentation algorithms. The efficiency can easily be improved further by exploiting the intrinsic parallelism of the proposed method.
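As a highly simplified illustration of the merging criterion only (not the full coevolutionary scheme), the sketch below "forms an alliance" between two neighboring regions when the grey-value variance of their union stays below a homogeneity threshold; the threshold value and patch sizes are arbitrary assumptions.

```python
# Toy homogeneity test for merging neighboring image regions.
import numpy as np

def unions_are_homogeneous(region_a, region_b, max_variance=25.0):
    """Return True if the combined pixel set is homogeneous enough to merge."""
    union = np.concatenate([region_a.ravel(), region_b.ravel()])
    return union.var() <= max_variance

# Two flat, similarly dark patches merge; a bright patch stays isolated.
rng = np.random.default_rng(0)
dark_a = np.full((8, 8), 40.0) + rng.normal(0, 2, (8, 8))
dark_b = np.full((8, 8), 42.0) + rng.normal(0, 2, (8, 8))
bright = np.full((8, 8), 200.0)
print(unions_are_homogeneous(dark_a, dark_b))   # True: alliance formed
print(unions_are_homogeneous(dark_a, bright))   # False: deviant region isolated
```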

6.
IEEE Trans Pattern Anal Mach Intell ; 32(7): 1271-83, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20489229

ABSTRACT

This paper studies automatic image classification by modeling soft assignment in the popular codebook model. The codebook model describes an image as a bag of discrete visual words selected from a vocabulary, where the frequency distributions of visual words in an image allow classification. One inherent component of the codebook model is the assignment of discrete visual words to continuous image features. Despite the clear mismatch of this hard assignment with the nature of continuous features, the approach has been applied successfully for some years. In this paper, we investigate four types of soft assignment of visual words to image features. We demonstrate that explicitly modeling visual word assignment ambiguity improves classification performance compared with the hard assignment of the traditional codebook model. The traditional codebook model is compared against our method on five well-known data sets: 15 natural scenes, Caltech-101, Caltech-256, and Pascal VOC 2007/2008. We demonstrate that large codebook vocabulary sizes severely degrade the performance of the traditional model, whereas the proposed model performs consistently. Moreover, we show that our method profits in high-dimensional feature spaces and reaps higher benefits as the number of image categories increases.
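A minimal sketch of the hard versus soft assignment step, assuming a k-means codebook, randomly generated stand-in descriptors, and a Gaussian kernel with an arbitrary width; the paper's actual descriptors, vocabulary sizes, and kernel choices are not reproduced here.

```python
# Hard assignment votes only for the nearest codeword; soft (kernel) assignment
# spreads each feature's vote over all codewords by distance.
import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(500, 16))        # stand-in for local image features
codebook = KMeans(n_clusters=8, n_init=10, random_state=0).fit(descriptors)

features = rng.normal(size=(50, 16))            # features of one "image"
dists = cdist(features, codebook.cluster_centers_)

hard = np.bincount(dists.argmin(axis=1), minlength=8).astype(float)
hard /= hard.sum()

sigma = 1.0                                     # assumed kernel width
weights = np.exp(-dists**2 / (2 * sigma**2))
soft = (weights / weights.sum(axis=1, keepdims=True)).sum(axis=0)
soft /= soft.sum()
print("hard histogram:", np.round(hard, 3))
print("soft histogram:", np.round(soft, 3))
```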

7.
J Forensic Sci ; 54(3): 628-38, 2009 May.
Article in English | MEDLINE | ID: mdl-19432739

ABSTRACT

In this research, we examined whether fixed pattern noise, or more specifically Photo Response Non-Uniformity (PRNU), can be used to identify the source camera of heavily JPEG-compressed digital photographs with a resolution of 640 × 480 pixels. We extracted PRNU patterns from both reference and questioned images using a two-dimensional Gaussian filter and compared these patterns by calculating the correlation coefficient between them. Both the closed-set and open-set problems were addressed. For the closed-set problem, this yielded high accuracies of 83% for single images and 100% for around 20 simultaneously identified questioned images, with the correct source camera chosen from a set of 38 cameras of four different types. For the open-set problem, decision levels were obtained for several numbers of simultaneously identified questioned images. The corresponding false rejection rates were unsatisfactory for single images but improved for simultaneous identification of multiple images.
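The following sketch only illustrates the matching idea under simplifying assumptions: the noise residual is taken as the image minus a Gaussian-smoothed version of itself, the reference PRNU pattern is the average residual of a few images from one camera, and patterns are compared with the correlation coefficient. The simulated sensor pattern and image values are placeholders, not forensic data.

```python
# Simplified PRNU-style matching: residual extraction, reference averaging,
# and correlation between reference and questioned patterns.
import numpy as np
from scipy.ndimage import gaussian_filter

def residual(img, sigma=2.0):
    """Noise residual: image minus its Gaussian-smoothed version."""
    return img - gaussian_filter(img, sigma)

rng = np.random.default_rng(0)
prnu = rng.normal(0, 0.02, (480, 640))                       # simulated sensor pattern
shots = [rng.normal(0.5, 0.1, (480, 640)) * (1 + prnu) for _ in range(5)]

reference = np.mean([residual(s) for s in shots[:4]], axis=0)  # camera reference pattern
questioned = residual(shots[4])                                 # questioned image residual
corr = np.corrcoef(reference.ravel(), questioned.ravel())[0, 1]
print("correlation with reference pattern:", round(corr, 3))
```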

8.
Bioinformatics ; 21(19): 3755-62, 2005 Oct 01.
Article in English | MEDLINE | ID: mdl-15817694

ABSTRACT

MOTIVATION: Microarray gene expression data are increasingly employed to identify sets of marker genes that accurately predict disease development and outcome in cancer. Many computational approaches have been proposed to construct such predictors. However, there is, as yet, no objective way to evaluate whether a new approach truly improves on the current state of the art. In addition, no 'standard' computational approach has emerged that enables robust outcome prediction. RESULTS: An important contribution of this work is the description of a principled training and validation protocol, which allows objective evaluation of the complete methodology for constructing a predictor. We review the possible choices of computational approaches, with specific emphasis on predictor choice and reporter selection strategies. Employing this training-validation protocol, we evaluated different reporter selection strategies and predictors on six gene expression datasets of varying degrees of difficulty. We demonstrate that simple reporter selection strategies (forward filtering and shrunken centroids) work surprisingly well and outperform partial least squares in four of the six datasets. Similarly, simple predictors, such as the nearest mean classifier, outperform more complex classifiers. Our training-validation protocol provides a robust methodology to evaluate the performance of new computational approaches and to objectively compare outcome predictions on different datasets.
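To illustrate the central point of such a training-validation protocol, the sketch below places reporter (gene) selection inside the cross-validation loop so it is redone on every training fold; the filter selector, the number of selected genes, and the synthetic data are assumptions, and the nearest mean classifier stands in for the simple predictors mentioned in the abstract.

```python
# Reporter selection must live inside cross-validation to avoid selection bias;
# a Pipeline guarantees the selector is refit on each training fold.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import NearestCentroid
from sklearn.pipeline import Pipeline

# Placeholder "expression" data: 100 samples, 2000 genes, few informative ones.
X, y = make_classification(n_samples=100, n_features=2000, n_informative=20, random_state=0)

pipe = Pipeline([("select", SelectKBest(f_classif, k=50)),   # reporter (gene) selection
                 ("classify", NearestCentroid())])           # simple nearest mean classifier
print("cross-validated accuracy:", cross_val_score(pipe, X, y, cv=5).mean())
```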


Subject(s)
Algorithms, Tumor Biomarkers/analysis, Computer-Assisted Diagnosis/methods, Gene Expression Profiling/methods, Neoplasm Proteins/analysis, Neoplasms/diagnosis, Neoplasms/metabolism, Oligonucleotide Array Sequence Analysis/methods, Animals, Feedback, Humans, Reproducibility of Results, Sensitivity and Specificity, Software Validation