Results 1 - 13 of 13
1.
IEEE Trans Pattern Anal Mach Intell ; 45(7): 8311-8323, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37015369

ABSTRACT

Classic embedded feature selection algorithms are often divided into two large groups: tree-based algorithms and LASSO variants. The two approaches focus on different goals: tree-based algorithms provide a clear explanation of which variables are used to trigger a certain output, whereas LASSO-like approaches sacrifice detailed explanations in favor of higher accuracy. In this paper, we present a novel embedded feature selection algorithm, called End-to-End Feature Selection (E2E-FS), that aims to provide both accuracy and explainability. Despite having non-convex regularization terms, our algorithm, like the LASSO approach, is solved with gradient descent techniques, introducing restrictions that force the model to select at most a fixed maximum number of features to be used subsequently by the classifier. Although these are hard restrictions, the experimental results show that this algorithm can be used with any learning model that is trained with gradient descent.
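As a hedged illustration of the general idea (not the authors' exact E2E-FS formulation), the sketch below trains a soft feature-selection mask jointly with a linear classifier by gradient descent, penalizing the mask whenever more than k gates are open; the class name, penalty, and toy data are our own assumptions.

```python
# Sketch only: a gradient-trained feature-selection mask with a budget of k
# active features, in the spirit of embedded selection; not the E2E-FS code.
import torch
import torch.nn as nn

class GatedLinear(nn.Module):
    def __init__(self, n_features, n_classes, k):
        super().__init__()
        self.gate_logits = nn.Parameter(torch.zeros(n_features))
        self.classifier = nn.Linear(n_features, n_classes)
        self.k = k

    def forward(self, x):
        gates = torch.sigmoid(self.gate_logits)   # soft 0/1 mask per feature
        return self.classifier(x * gates), gates

model = GatedLinear(n_features=500, n_classes=2, k=20)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
x = torch.randn(64, 500)                 # toy batch; real data would go here
y = torch.randint(0, 2, (64,))
for _ in range(200):
    opt.zero_grad()
    logits, gates = model(x)
    # only penalize when more than k gates are open (a hard-budget surrogate)
    budget_penalty = torch.relu(gates.sum() - model.k)
    loss = nn.functional.cross_entropy(logits, y) + 0.1 * budget_penalty
    loss.backward()
    opt.step()
```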

2.
Med Biol Eng Comput ; 60(5): 1333-1345, 2022 May.
Article in English | MEDLINE | ID: mdl-35316469

ABSTRACT

The number of interconnected devices surrounding us every day, such as personal wearables, cars, and smart homes, has increased in recent years. These Internet of Things devices monitor many processes and can run machine learning models for pattern recognition, and even decision making, with the added advantage of reducing network congestion by allowing computation near the data sources. Their main restriction is their low computational capacity. Thus, machine learning algorithms are needed that maintain accuracy while exploiting mechanisms suited to these devices, such as low-precision versions. In this paper, low-precision mutual information-based feature selection algorithms are applied to DNA microarray datasets, showing that 16-bit, and sometimes even 8-bit, representations of these algorithms can be used without significant changes in the final classification results.
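A minimal sketch of the low-precision idea, under our own assumptions rather than the paper's implementation: each feature is quantized to 8-bit integer levels before mutual information with the class label is computed and used to rank features.

```python
# Sketch: 8-bit quantization followed by mutual-information feature ranking.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2000))          # toy "microarray": 100 samples
y = rng.integers(0, 2, size=100)

# map each feature onto 256 integer levels (an 8-bit representation)
lo, hi = X.min(axis=0), X.max(axis=0)
X8 = np.floor(255 * (X - lo) / (hi - lo + 1e-12)).astype(np.uint8)

mi = mutual_info_classif(X8, y, discrete_features=True, random_state=0)
top = np.argsort(mi)[::-1][:50]           # keep the 50 highest-MI features
```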


Subject(s)
Algorithms , Machine Learning , Information Storage and Retrieval , Microarray Analysis
3.
Appl Intell (Dordr) ; 52(6): 6413-6431, 2022.
Article in English | MEDLINE | ID: mdl-34764619

ABSTRACT

In this study, we analyze the capability of several state-of-the-art machine learning methods to predict whether patients diagnosed with CoVid-19 (CoronaVirus disease 2019) will need different levels of hospital care (regular hospital admission or intensive care unit admission) during the course of their illness, using only demographic and clinical data. For this research, a dataset of 10,454 patients from 14 hospitals in Galicia (Spain) was used. Each patient is characterized by 833 variables: two are age and gender, and the others are records of diseases or conditions in the patient's medical history. In addition, each patient's history of hospital or intensive care unit (ICU) admissions due to CoVid-19 is available; this clinical history serves to label each patient and thus to assess the model's predictions. Our aim is to identify which model delivers the best accuracy for both hospital and ICU admission using only demographic variables and some structured clinical data, and to identify which of those variables are most relevant in each case. The experimental results show that the best models are those that use oversampling as a preprocessing step to balance the class distribution. Using these models and all available features, we achieved an area under the curve (AUC) of 76.1% and 80.4% for predicting the need for hospital and ICU admission, respectively. Furthermore, after applying feature selection and oversampling techniques, we verified experimentally that the most relevant variables for classification are age and gender, since using only these two features does not degrade the models' performance on the two prediction problems.
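An illustrative pipeline in the spirit of the study (the actual models, features, and patient data differ): oversample the minority class on the training split only, then report the AUC on a held-out split.

```python
# Sketch: SMOTE oversampling on the training data, AUC on untouched test data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=2000, n_features=50, weights=[0.9, 0.1],
                           random_state=0)   # imbalanced toy cohort
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)  # balance classes
clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"AUC: {auc:.3f}")
```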

4.
Comput Biol Med ; 112: 103375, 2019 09.
Article in English | MEDLINE | ID: mdl-31382212

ABSTRACT

Feature selection is a preprocessing technique that identifies the key features of a given problem. It has traditionally been applied in a wide range of problems that include biological data processing, finance, and intrusion detection systems. In particular, feature selection has been successfully used in medical applications, where it can not only reduce dimensionality but also help us understand the causes of a disease. We describe some basic concepts related to medical applications and provide some necessary background information on feature selection. We review the most recent feature selection methods developed for and applied in medical problems, covering prolific research fields such as medical imaging, biomedical signal processing, and DNA microarray data analysis. A case study of two medical applications that includes actual patient data is used to demonstrate the suitability of applying feature selection methods in medical problems and to illustrate how these methods work in real-world scenarios.
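As a minimal, hedged illustration of the filter-style selection this review covers (not any specific method from it), the snippet below scores each feature against the label and keeps the top k on a public medical dataset.

```python
# Sketch: filter feature selection, keeping the k highest-scoring features.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True)
selector = SelectKBest(score_func=f_classif, k=10).fit(X, y)
X_reduced = selector.transform(X)          # (569, 30) -> (569, 10)
```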


Subject(s)
Algorithms , Models, Biological , Pattern Recognition, Automated , Humans
5.
Methods Mol Biol ; 1986: 65-85, 2019.
Article in English | MEDLINE | ID: mdl-31115885

ABSTRACT

The advent of DNA microarray datasets has stimulated a new line of research in both bioinformatics and machine learning. This type of data is used to collect information from tissue and cell samples regarding gene expression differences that could be useful for disease diagnosis or for distinguishing specific types of tumor. Microarray data classification is a difficult challenge for machine learning researchers due to the high number of features and small sample sizes. This chapter is devoted to reviewing the microarray databases most frequently used in the literature. We also make the interested reader aware of the problems posed by the data characteristics in this domain, such as class imbalance, data complexity, and the so-called dataset shift.


Subject(s)
Databases, Genetic , Oligonucleotide Array Sequence Analysis , Humans , Neoplasms/genetics , Sample Size
6.
Methods Mol Biol ; 1986: 123-152, 2019.
Article in English | MEDLINE | ID: mdl-31115887

ABSTRACT

A typical characteristic of microarray data is a very high number of features (in the order of thousands), while the number of examples is usually less than 100. In the context of microarray classification, this poses a challenge for machine learning methods, which can suffer from overfitting and thus degraded performance. A common solution is to apply a dimensionality reduction technique before classification to reduce the number of features. This chapter focuses on one of the best-known dimensionality reduction techniques: feature selection. We show how feature selection can help improve classification accuracy in several microarray data scenarios.
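A short sketch of the workflow this chapter motivates, under the standard caveat that selection must happen inside the cross-validation loop (here via a Pipeline) so that test folds never influence which features are kept; the data are synthetic stand-ins for a microarray.

```python
# Sketch: feature selection nested inside cross-validation to avoid
# selection bias on an n << d (microarray-like) problem.
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=80, n_features=5000, n_informative=20,
                           random_state=0)   # 80 samples, 5000 features
pipe = Pipeline([
    ("select", SelectKBest(mutual_info_classif, k=50)),
    ("clf", LinearSVC(max_iter=5000)),
])
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```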


Subject(s)
Algorithms , Oligonucleotide Array Sequence Analysis/methods , Bayes Theorem , Databases, Genetic , Support Vector Machine
7.
Methods Mol Biol ; 1986: 283-293, 2019.
Article in English | MEDLINE | ID: mdl-31115895

ABSTRACT

This chapter briefly discusses the current situation in microarray data analysis and prospects for the future, considering the competition between microarray technologies and high-throughput technologies from a data analysis point of view. The current limitations of DNA microarrays are important for forecasting challenges and future trends in microarray data analysis; these include data analysis techniques for increasing sample sizes, new feature selection methods, deep learning techniques, and covariate significance testing as well as false discovery rate methods, among other procedures for better interpretability of the results.
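As a worked example of one of the procedures named above, here is a plain-NumPy sketch of Benjamini-Hochberg false discovery rate control applied to per-gene p-values (the p-values are toy numbers).

```python
# Sketch: Benjamini-Hochberg step-up procedure for FDR control.
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of rejected hypotheses at FDR level alpha."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * (np.arange(1, m + 1) / m)   # BH step-up line
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest i with p_(i) <= i*alpha/m
        reject[order[:k + 1]] = True
    return reject

pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205])
print(benjamini_hochberg(pvals))           # rejects the two smallest here
```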


Subject(s)
Microarray Analysis/methods , Microarray Analysis/trends , Algorithms , Deep Learning , Humans
8.
Adv Exp Med Biol ; 1065: 607-626, 2018.
Article in English | MEDLINE | ID: mdl-30051410

ABSTRACT

Medicine will experience many changes in the coming years because the so-called "medicine of the future" will be increasingly proactive, featuring four basic elements: predictive, personalized, preventive, and participatory. Drivers for these changes include the digitization of medical data and the availability of computational tools that can handle massive volumes of data. Thus, the need to apply machine-learning methods in medicine has increased dramatically in recent years, while facing challenges related to an unprecedented number of clinically relevant features and highly specific diagnostic tests. Advances in data-storage technology and progress in genome studies have enabled the collection of vast amounts of patient clinical details, permitting the extraction of valuable information. In consequence, big-data analytics is becoming a mandatory technology in the clinical domain. Machine learning and big-data analytics can be used in the field of cardiology, for example, for the prediction of individual risk factors for cardiovascular disease, for clinical decision support, and for practicing precision medicine using genomic information. Several projects employ machine-learning techniques to address the classification and prediction of heart failure (HF) subtypes and use unbiased clustering analysis with dense phenomapping to identify phenotypically distinct HF categories. In this chapter, these ideas are presented further, and a computerized model that distinguishes between two major HF phenotypes on the basis of ventricular-volume data analysis is discussed in detail.
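A hedged sketch of the unsupervised side of this program: clustering patients on standardized phenotype features to look for distinct HF subgroups. The data and feature count are invented; the cited projects use far denser phenomapping.

```python
# Sketch: k-means clustering of standardized patient phenotype features.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
phenotypes = rng.normal(size=(300, 12))   # toy matrix: patients x features
Z = StandardScaler().fit_transform(phenotypes)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
```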


Subject(s)
Big Data , Cardiology/methods , Data Mining/methods , Databases, Factual , Heart Failure , Machine Learning , Cluster Analysis , Heart Failure/classification , Heart Failure/diagnosis , Heart Failure/physiopathology , Heart Failure/therapy , Humans , Prognosis , Terminology as Topic
9.
JAMA Ophthalmol ; 134(6): 651-7, 2016 Jun 01.
Article in English | MEDLINE | ID: mdl-27077667

ABSTRACT

IMPORTANCE: Published definitions of plus disease in retinopathy of prematurity (ROP) reference arterial tortuosity and venous dilation within the posterior pole based on a standard published photograph. One possible explanation for limited interexpert reliability in diagnosing plus disease is that experts deviate from the published definitions. OBJECTIVE: To identify the vascular features used by experts to diagnose plus disease through quantitative image analysis. DESIGN, SETTING, AND PARTICIPANTS: A computer-based image analysis system (Imaging and Informatics in ROP [i-ROP]) was developed using a set of 77 digital fundus images, and the system was designed to classify images compared with a reference standard diagnosis (RSD). System performance was analyzed as a function of the field of view (circular crops with a radius of 1-6 disc diameters) and vessel subtype (arteries only, veins only, or all vessels). Routine ROP screening was conducted from June 29, 2011, to October 14, 2014, in neonatal intensive care units at 8 academic institutions, with a subset of 73 images independently classified by 11 ROP experts for validation. The RSD was compared with the majority diagnosis of the experts. MAIN OUTCOMES AND MEASURES: The primary outcome measure was the accuracy of the i-ROP system's classification of plus disease against the RSD, as a function of the field of view and vessel type. Secondary outcome measures included the accuracy of the 11 experts compared with the RSD. RESULTS: Accuracy of plus disease diagnosis by the i-ROP computer-based system was highest (95%; 95% CI, 94%-95%) when it incorporated vascular tortuosity from both arteries and veins and used the widest field of view (6-disc diameter radius). Accuracy was 90% or less when using only arterial tortuosity and 85% or less when using a 2- to 3-disc diameter view similar to the standard published photograph. Diagnostic accuracy of the i-ROP system (95%) was comparable to that of the 11 expert physicians (mean, 87%; range, 79%-99%). CONCLUSIONS AND RELEVANCE: Experts in ROP appear to consider findings beyond the posterior retina when diagnosing plus disease and to consider tortuosity of both arteries and veins, in contrast with published definitions. It is feasible for a computer-based image analysis system to perform comparably to ROP experts using manually segmented images.
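One common tortuosity index is arc length divided by chord length; the sketch below computes it from ordered points along a segmented vessel. This is a generic illustration of the kind of vascular feature such systems quantify, not necessarily the i-ROP system's exact metric.

```python
# Sketch: arc-over-chord tortuosity for one segmented vessel centerline.
import numpy as np

def tortuosity(points):
    """points: (n, 2) array of ordered (x, y) samples along one vessel."""
    pts = np.asarray(points, dtype=float)
    arc = np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))
    chord = np.linalg.norm(pts[-1] - pts[0])
    return arc / chord                      # 1.0 for a straight vessel

t = np.linspace(0, np.pi, 100)
wavy = np.column_stack([t, 0.2 * np.sin(4 * t)])
print(tortuosity(wavy))                     # > 1, grows with waviness
```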


Subject(s)
Arteries/abnormalities , Image Processing, Computer-Assisted , Joint Instability/diagnosis , Retinal Vessels/pathology , Retinopathy of Prematurity/diagnosis , Skin Diseases, Genetic/diagnosis , Vascular Malformations/diagnosis , Diagnosis, Computer-Assisted , Expert Systems , Humans , Infant, Newborn , Infant, Premature , Intensive Care Units, Neonatal , Joint Instability/classification , Reproducibility of Results , Retinopathy of Prematurity/classification , Skin Diseases, Genetic/classification , Vascular Malformations/classification
10.
Transl Vis Sci Technol ; 4(6): 5, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26644965

ABSTRACT

PURPOSE: We developed and evaluated the performance of a novel computer-based image analysis system for grading plus disease in retinopathy of prematurity (ROP), and identified the image features, shapes, and sizes that best correlate with expert diagnosis. METHODS: A dataset of 77 wide-angle retinal images from infants screened for ROP was collected. A reference standard diagnosis was determined for each image by combining image grading from 3 experts with the clinical diagnosis from ophthalmoscopic examination. Manually segmented images were cropped into a range of shapes and sizes, and a computer algorithm was developed to extract tortuosity and dilation features from arteries and veins. Each feature was fed into our system to identify the set of characteristics that yielded the highest performance against the reference standard; we refer to the result as the "i-ROP" system. RESULTS: Among the tested crop shapes, sizes, and measured features, point-based measurements of combined arterial and venous tortuosity, together with a large circular crop (radius 6 times the disc diameter), provided the highest diagnostic accuracy. The i-ROP system achieved 95% accuracy in classifying pre-plus and plus disease compared with the reference standard. This was comparable to the performance of the 3 individual experts (96%, 94%, 92%) and significantly higher than the mean performance of 31 nonexperts (81%). CONCLUSIONS: This comprehensive analysis of computer-based plus disease diagnosis suggests that it may be feasible to develop a fully automated system, based on wide-angle retinal images, that performs comparably to expert graders at three-level plus disease discrimination. TRANSLATIONAL RELEVANCE: Computer-based image analysis, using objective and quantitative retinal vascular features, has the potential to complement clinical ROP diagnosis by ophthalmologists.

11.
Clin Med Insights Cardiol ; 9(Suppl 1): 57-71, 2015.
Article in English | MEDLINE | ID: mdl-26052231

ABSTRACT

BACKGROUND: Heart failure (HF) manifests as at least two subtypes. The current paradigm distinguishes the two using both the ejection fraction (EF) metric and a constraint on end-diastolic volume. About half of all HF patients exhibit preserved EF, whereas the classical type of HF shows a reduced EF. Common practice often sets the cut-off point at or near EF = 50%, thus defining a linear divider. However, a rationale for this safe choice is lacking, and the assumption of strict linearity has not been justified. Additionally, some studies eliminate patients from consideration for HF if 40% < EF < 50% (the gray zone). Thus, there is a need for documented classification guidelines that resolve the gray zone ambiguity and crisply delineate the transitions between phenotypes. METHODS: Machine learning (ML) models are applied to classify HF subtypes within the ventricular volume domain, rather than by the use of EF alone. Various ML models, both unsupervised and supervised, are employed to establish a foundation for classification. Data on 48 HF patients are employed as the training set for the subsequent classification of Monte Carlo-generated surrogate HF patients (n = 403). Next, we map the consequences of an EF cut-off differing from 50% (as proposed for women) and analyze HF candidates not covered by the current rules. RESULTS: On the training set, the Support Vector Machine method yields the best results (test error 4.06%) and covers the gray zone and other clinically relevant HF candidates. End-systolic volume (ESV) emerges as a more logical discriminator than the EF of the prevailing paradigm. CONCLUSIONS: Selected ML models offer promise for classifying HF patients (including the gray zone) when driven by ventricular volume data. The ML analysis indicates that ESV has a role in the development of guidelines to parse HF subtypes. The documented curvilinear relationship between EF and ESV suggests that the assumption of a linear EF divider may not be of general utility over the complete clinically relevant range.
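A hedged sketch of the classification setting described above: an SVM over the ventricular-volume plane (EDV, ESV), with EF derived as (EDV - ESV) / EDV. The data and the EF = 50% labeling rule are synthetic stand-ins, not the study's 48-patient training set.

```python
# Sketch: SVM classification of HF subtype in the (EDV, ESV) volume domain.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
edv = rng.uniform(80, 300, size=200)          # end-diastolic volume (mL)
ef = rng.uniform(0.15, 0.75, size=200)        # ejection fraction
esv = edv * (1 - ef)                          # end-systolic volume (mL)
label = (ef < 0.5).astype(int)                # toy stand-in for reduced EF

X = np.column_stack([edv, esv])
clf = SVC(kernel="rbf").fit(X, label)
print(clf.score(X, label))                    # training accuracy on toy data
```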

12.
IEEE J Biomed Health Inform ; 18(4): 1485-93, 2014 Jul.
Article in English | MEDLINE | ID: mdl-25014945

ABSTRACT

Dry eye is a symptomatic disease that affects a wide range of the population and has a negative impact on their daily activities. It can be diagnosed by analyzing the interference patterns of the tear film lipid layer and classifying them into one of the Guillon categories. The manual process performed by experts is not only affected by subjective factors but is also very time consuming. In this paper, we propose a general methodology for the automatic classification of the tear film lipid layer, using color and texture information to characterize the images and feature selection methods to reduce processing time. The proposed methodology proved adequate, achieving classification rates over 97% while remaining robust and providing unbiased results. It can also be applied in real time, allowing significant time savings for the experts.
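As a hedged sketch of the texture half of such a pipeline (the paper combines color and texture before feature selection), the snippet below extracts grey-level co-occurrence features, one common texture descriptor, from a toy image patch.

```python
# Sketch: GLCM texture features as input to a later feature-selection step.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(3)
patch = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # toy image patch

glcm = graycomatrix(patch, distances=[1, 2], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)
features = [graycoprops(glcm, prop).ravel()
            for prop in ("contrast", "homogeneity", "energy", "correlation")]
feature_vector = np.concatenate(features)     # input to feature selection
```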


Subject(s)
Image Processing, Computer-Assisted/methods , Lipids/chemistry , Tears/chemistry , Adult , Humans , Microscopy, Video , Support Vector Machine , Young Adult
13.
Neural Netw ; 24(8): 888-96, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21703822

ABSTRACT

Gene-expression microarrays are a novel technology that allows the examination of tens of thousands of genes at a time. At this scale, manual observation is not feasible, and machine learning methods are being developed to handle these new data. Specifically, since the number of genes is very high, feature selection methods have proven valuable for dealing with these unbalanced (high-dimensionality, low-cardinality) data sets. In this work, the FVQIT (Frontier Vector Quantization using Information Theory) classifier is employed to classify twelve DNA gene-expression microarray data sets covering different kinds of cancer. A comparative study with other well-known classifiers is performed. The proposed approach shows competitive results, outperforming all the other classifiers.
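FVQIT itself is not available as a standard library component, so the sketch below only reproduces the comparative protocol: several well-known classifiers evaluated with the same cross-validation on one toy high-dimensional data set.

```python
# Sketch: a comparative study loop over baseline classifiers.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=100, n_features=3000, n_informative=30,
                           random_state=0)   # microarray-like toy data
for name, clf in [("linear SVM", LinearSVC(max_iter=5000)),
                  ("naive Bayes", GaussianNB()),
                  ("3-NN", KNeighborsClassifier(n_neighbors=3))]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f}")
```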


Subject(s)
Databases, Genetic , Information Theory , Microarray Analysis/classification , Algorithms , Artificial Intelligence , DNA, Neoplasm/genetics , Entropy , Fuzzy Logic , Humans , Microarray Analysis/methods , Models, Genetic , Models, Statistical , Reproducibility of Results , Software