Results 1 - 13 of 13
1.
Stat Sin ; 30: 1857-1879, 2020.
Article in English | MEDLINE | ID: mdl-33311956

ABSTRACT

Due to the heterogeneity of many chronic diseases, precise personalized medicine, also known as precision medicine, has drawn increasing attention in the scientific community. One main goal of precision medicine is to develop the most effective tailored therapy for each individual patient. To that end, one needs to incorporate individual characteristics to detect a proper individual treatment rule (ITR), by which suitable decisions on treatment assignments can be made to optimize patients' clinical outcome. For binary treatment settings, outcome weighted learning (OWL) and several of its variations have been proposed recently to estimate the ITR by optimizing the conditional expected outcome given patients' information. However, for multiple treatment scenarios, it remains unclear how to use OWL effectively. It can be shown that some direct extensions of OWL for multiple treatments, such as one-versus-one and one-versus-rest methods, can yield suboptimal performance. In this paper, we propose a new learning method, named Multicategory Outcome weighted Margin-based Learning (MOML), for estimating ITR with multiple treatments. Our proposed method is very general and covers OWL as a special case. We show Fisher consistency for the estimated ITR, and establish convergence rate properties. Variable selection using the sparse ℓ1 penalty is also considered. Analyses of simulated examples and of a type 2 diabetes mellitus observational study demonstrate the competitive performance of the proposed method.
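To make the binary OWL idea concrete, the following is a minimal sketch of an outcome-weighted hinge objective for a linear treatment rule. It is not the MOML method of the paper. The function names, the +1/-1 treatment coding, the known propensity values, and the ridge penalty `lam` are all assumptions introduced for illustration; the weighting of each patient by outcome divided by propensity follows the usual formulation of binary OWL.

```python
def hinge(margin):
    """Hinge loss: zero once the margin reaches 1."""
    return max(0.0, 1.0 - margin)


def linear_score(w, b, x):
    """Value of the linear decision function w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def owl_objective(w, b, X, A, R, propensity, lam=0.0):
    """Outcome-weighted hinge objective for the rule sign(w.x + b).

    X: covariate vectors; A: received treatments coded +1/-1;
    R: non-negative outcomes (larger is better);
    propensity: probability of the treatment each patient actually received.
    All names here are hypothetical, chosen for this sketch.
    """
    total = 0.0
    for x, a, r, p in zip(X, A, R, propensity):
        # Patients with good outcomes get large weights, so the rule is
        # pushed to agree with the treatments that worked well.
        total += (r / p) * hinge(a * linear_score(w, b, x))
    return total / len(X) + lam * sum(wi * wi for wi in w)


def recommend(w, b, x):
    """Treatment recommended by the fitted linear rule."""
    return 1 if linear_score(w, b, x) >= 0 else -1
```

A rule that reproduces well-performing treatment assignments with margin at least 1 drives the objective to zero, while the all-zero rule pays the full outcome weight of every patient.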

2.
Neuroimage ; 175: 230-245, 2018 07 15.
Article in English | MEDLINE | ID: mdl-29596980

ABSTRACT

With the development of advanced imaging techniques, scientists are interested in identifying imaging biomarkers that are related to different subtypes or transitional stages of various cancers, neuropsychiatric diseases, and neurodegenerative diseases, among many others. In this paper, we propose a novel spatial multi-category angle-based classifier (SMAC) for the efficient identification of such imaging biomarkers. The proposed SMAC not only utilizes the spatial structure of high-dimensional imaging data but also handles both binary and multi-category classification problems. We introduce an efficient algorithm based on the alternating direction method of multipliers (ADMM) to solve the large-scale optimization problem for SMAC. Both our simulation and real data experiments demonstrate the usefulness of SMAC.


Subject(s)
Algorithms; Alzheimer Disease/diagnostic imaging; Brain/diagnostic imaging; Cognitive Dysfunction/diagnostic imaging; Image Processing, Computer-Assisted/methods; Neuroimaging/methods; Aged; Aged, 80 and over; Biomarkers; Classification; Female; Humans; Male
3.
Evol Comput ; 26(1): 43-66, 2018.
Article in English | MEDLINE | ID: mdl-27982696

ABSTRACT

Many real-world problems involve massive amounts of data. Under these circumstances learning algorithms often become prohibitively expensive, making scalability a pressing issue to be addressed. A common approach is to perform sampling to reduce the size of the dataset and enable efficient learning. Alternatively, one customizes learning algorithms to achieve scalability. In either case, the key challenge is to obtain algorithmic efficiency without compromising the quality of the results. In this article we discuss a meta-learning algorithm (PSBML) that combines concepts from spatially structured evolutionary algorithms (SSEAs) with concepts from ensemble and boosting methodologies to achieve the desired scalability property. We present both theoretical and empirical analyses which show that PSBML preserves a critical property of boosting, specifically, convergence to a distribution centered around the margin. We then present additional empirical analyses showing that this meta-level algorithm provides a general and effective framework that can be used in combination with a variety of learning classifiers. We perform extensive experiments to investigate the trade-off achieved between scalability and accuracy, and robustness to noise, on both synthetic and real-world data. These empirical results corroborate our theoretical analysis, and demonstrate the potential of PSBML in achieving scalability without sacrificing accuracy.


Subject(s)
Algorithms; Artificial Intelligence; Computer Simulation; Models, Theoretical; Databases, Factual; Humans
4.
Proteins ; 85(9): 1724-1740, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28598584

ABSTRACT

Due to Ca2+-dependent binding and the sequence diversity of Calmodulin (CaM) binding proteins, identifying CaM interactions and binding sites in the wet lab is tedious and costly. Therefore, computational methods for this purpose are crucial to the design of such wet-lab experiments. We present an algorithm suite called CaMELS (CalModulin intEraction Learning System) for predicting proteins that interact with CaM as well as their binding sites using sequence information alone. CaMELS offers state-of-the-art accuracy for both CaM interaction and binding site prediction and can aid biologists in studying CaM binding proteins. For CaM interaction prediction, CaMELS uses protein sequence features coupled with a large-margin classifier. CaMELS models the binding site prediction problem using multiple instance machine learning with a custom optimization algorithm which allows more effective learning over imprecisely annotated CaM-binding sites during training. CaMELS has been extensively benchmarked using a variety of data sets, mutagenic studies, proteome-wide Gene Ontology enrichment analyses and protein structures. Our experiments indicate that CaMELS outperforms simple motif-based search and other existing methods for interaction and binding site prediction. We have also found that the whole sequence of a protein, rather than just its binding site, is important for predicting its interaction with CaM. Using the machine learning model in CaMELS, we have identified important features of protein sequences for CaM interaction prediction as well as characteristic amino acid sub-sequences and their relative position for identifying CaM binding sites. Python code for training and evaluating CaMELS together with a webserver implementation is available at the URL: http://faculty.pieas.edu.pk/fayyaz/software.html#camels.


Subject(s)
Calmodulin-Binding Proteins/chemistry; Calmodulin/chemistry; Proteome/genetics; Software; Algorithms; Amino Acid Sequence; Binding Sites; Calmodulin-Binding Proteins/genetics; Computer Simulation; Protein Binding; Proteome/chemistry
5.
Neural Netw ; 178: 106457, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38908166

ABSTRACT

This study introduces a novel hyperparameter in the Softmax function to regulate the rate of gradient decay, which is dependent on sample probability. Our theoretical and empirical analyses reveal that both model generalization and calibration are significantly influenced by the gradient decay rate, particularly as confidence probability increases. Notably, the gradient decay varies in a convex or concave manner with rising sample probability. When employing a smaller gradient decay, we observe a curriculum learning sequence. This sequence highlights hard samples only after easy samples are adequately trained, and allows well-separated samples to receive a higher gradient, effectively reducing intra-class distances. However, this approach has a drawback: small gradient decay tends to exacerbate model overconfidence, shedding light on the calibration issues prevalent in modern neural networks. In contrast, a larger gradient decay addresses these issues effectively, surpassing even models that utilize post-calibration methods. Our findings provide substantial evidence that large margin Softmax can influence the local Lipschitz constraint by manipulating the probability-dependent gradient decay rate. This research contributes a fresh perspective and understanding of the interplay between large margin Softmax, curriculum learning, and model calibration through an exploration of gradient decay rates. Additionally, we propose a novel warm-up strategy that dynamically adjusts the gradient decay for a smoother L-constraint in early training, thereby mitigating overconfidence in the final model.


Subject(s)
Neural Networks, Computer; Calibration; Algorithms; Probability; Humans; Machine Learning
6.
Neural Netw ; 154: 165-178, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35882084

ABSTRACT

Multi-metric learning plays a significant role in improving the generalization of distance-based algorithms, since a single metric is sometimes insufficient to handle complex data. Metric learning automatically adjusts the distances between samples so that intra-class samples are compact while inter-class samples are as far apart as possible. To better achieve this aim, we propose a novel multi-metric learning framework based on pair constraints instead of triplet constraints, which reduces the computational burden. To solve the problem effectively, we first propose a multi-birth metric learning model (termed MBML) in which, for each class, a global metric and a local metric are jointly trained. Both global and local structural information are used to better describe the samples. Two alternating iterative algorithms are then developed to optimize MBML, and their convergence and complexity are analyzed theoretically. Moreover, a fast diagonal multi-metric learning method based on binary constraints is proposed; its problem can be reformulated as a linear program, giving fast training, a low computational burden, and globally optimal solutions. Numerical experiments are carried out on datasets of different scales and types, including an artificial dataset, benchmark datasets, and an image database, covering binary and multi-class problems. The experimental results confirm the feasibility and effectiveness of the proposed methods.


Subject(s)
Artificial Intelligence; Pattern Recognition, Automated; Algorithms; Databases, Factual; Learning; Pattern Recognition, Automated/methods
7.
Signal Image Video Process ; 16(7): 1991-1999, 2022.
Article in English | MEDLINE | ID: mdl-35469317

ABSTRACT

Today, we are facing the COVID-19 pandemic. Accordingly, properly wearing face masks has become vital as an effective way to prevent the rapid spread of COVID-19. This research develops an Efficient Mask-Net method for low-power devices, such as mobile and embedded devices with low-memory requirements. The method identifies face mask-wearing conditions in two different schemes: I. Correctly Face Mask (CFM), Incorrectly Face Mask (IFM), and Not Face Mask (NFM) wearing; II. Uncovered Chin IFM, Uncovered Nose IFM, and Uncovered Nose and Mouth IFM. The proposed method can also be helpful to unmask the face for face authentication based on unconstrained 2D facial images in the wild. In this study, deep convolutional neural networks (CNNs) were employed as feature extractors. Then, deep features were fed to a recently proposed large margin piecewise linear (LMPL) classifier. In the experimental study, lightweight and very powerful mobile implementations of CNN models were evaluated, where the novel "EfficientNetb0" deep feature extractor with the LMPL classifier outperformed well-known end-to-end CNN models, as well as conventional image classification methods. It achieved high accuracies of 99.53% and 99.64% on the two tasks, respectively.

8.
Neural Netw ; 142: 509-521, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34298266

ABSTRACT

In some multiple instance learning (MIL) applications, positive bags are sparse (i.e., they contain only a small fraction of positive instances). To deal with the imbalanced data that results, we present a novel MIL method based on a small sphere and large margin approach (SSLM-MIL). Owing to the large margin, SSLM-MIL enforces the desired constraint that every positive bag contains at least one positive instance. Moreover, our framework is flexible enough to accommodate the resulting non-convex optimization problem, which we solve using the concave-convex procedure (CCCP). However, CCCP can be computationally inefficient because of the number of outer iterations it requires. Inspired by existing safe screening rules, which effectively reduce computation time by discarding inactive instances, we propose a strategy to reduce the scale of the optimization problem. Specifically, we construct a screening rule in the inner solver and another rule for propagating screened instances between iterations of CCCP. To the best of our knowledge, this is the first attempt to introduce safe instance screening to a non-convex hypersphere support vector machine. Experiments on thirty-one benchmark datasets demonstrate the safety and effectiveness of our approach.
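The bag-level constraint described in this abstract (at least one positive instance in every positive bag) can be illustrated with a short sketch. It uses a plain linear instance score rather than the hypersphere model of SSLM-MIL, and it assumes the standard MIL convention that negative bags contain no positive instances; every function name and the `margin` parameter are hypothetical.

```python
def instance_score(w, b, x):
    """Linear score of a single instance (assumed model, not SSLM-MIL)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def bag_score(w, b, bag):
    """A bag is scored by its most positive instance."""
    return max(instance_score(w, b, x) for x in bag)


def satisfies_bag_constraints(w, b, bags, labels, margin=0.0):
    """Check the MIL constraints for bags labelled +1 or -1.

    Positive bag: at least one instance scores >= margin.
    Negative bag: every instance scores <= -margin.
    """
    for bag, y in zip(bags, labels):
        s = bag_score(w, b, bag)
        if y == 1 and s < margin:
            return False
        if y == -1 and s > -margin:
            return False
    return True
```

Because only the maximum-scoring instance decides a positive bag, a sparse positive bag is handled the same way as a dense one.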


Subject(s)
Support Vector Machine
9.
Comput Biol Med ; 139: 104927, 2021 12.
Article in English | MEDLINE | ID: mdl-34688172

ABSTRACT

The world has experienced epidemics of coronavirus infections several times over the last two decades. Recent studies have shown that medical imaging techniques can be useful in developing an automatic computer-aided diagnosis system to detect pandemic diseases with high accuracy at an early stage. In this study, a large margin piecewise linear (LMPL) classifier was developed to distinguish COVID-19 from a wide range of viral pneumonias, including SARS and MERS, using chest x-ray images. In the proposed method, a preprocessing pipeline was employed. Moreover, deep pre- and post-rectified linear unit (ReLU) features were extracted using the well-known VGG-Net19, which was fine-tuned to optimize transfer learning. Afterward, canonical correlation analysis was performed for feature fusion, and the fused deep features were passed into the LMPL classifier. The introduced method reached the highest performance in comparison with related state-of-the-art methods for two different schemes (normal, COVID-19, and typical viral pneumonia) and (COVID-19, SARS, and MERS pneumonia) with 99.39% and 98.86% classification accuracy, respectively.


Subject(s)
COVID-19; Deep Learning; Pneumonia, Viral; Canonical Correlation Analysis; Humans; Neural Networks, Computer; SARS-CoV-2
10.
Stat Anal Data Min ; 9(2): 75-88, 2016 Apr.
Article in English | MEDLINE | ID: mdl-27326311

ABSTRACT

High-dimensional classification problems are prevalent in a wide range of modern scientific applications. Despite the large number of candidate classification techniques available, practitioners often face the dilemma of choosing between linear and general nonlinear classifiers. Specifically, simple linear classifiers have good interpretability, but may have limitations in handling data with complex structures. In contrast, general nonlinear classifiers are more flexible, but may lose interpretability and have a higher tendency to overfit. In this paper, we consider data with potential latent subgroups in the classes of interest. We propose a new method, namely the Composite Large Margin Classifier (CLM), to address the issue of classification with latent subclasses. The CLM aims to find three linear functions simultaneously: one linear function to split the data into two parts, with each part being classified by a different linear classifier. Our method has comparable prediction accuracy to a general nonlinear classifier, and it maintains the interpretability of traditional linear classifiers. We demonstrate the competitive performance of the CLM through comparisons with several existing linear and nonlinear classifiers by Monte Carlo experiments. Analysis of the Alzheimer's disease classification problem using CLM not only provides a lower classification error in discriminating cases and controls, but also identifies subclasses in controls that are more likely to develop the disease in the future.
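The three-function structure described in this abstract can be sketched as a prediction rule: one linear function routes a sample to one of two linear classifiers. The sketch covers prediction only, with hand-picked coefficients; how the CLM fits the three functions jointly is not described in the abstract and is not reproduced here. All names are hypothetical.

```python
def linear(w, b, x):
    """Value of the linear function w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def composite_predict(split, clf_pos, clf_neg, x):
    """Route x with the splitting function, then classify it linearly.

    split, clf_pos, clf_neg: (weights, intercept) pairs. clf_pos is used
    where the splitting function is non-negative, clf_neg elsewhere.
    """
    w_s, b_s = split
    w, b = clf_pos if linear(w_s, b_s, x) >= 0 else clf_neg
    return 1 if linear(w, b, x) >= 0 else -1


# Hand-picked example: an XOR-like pattern that no single linear
# classifier separates, handled by three linear functions.
split = ([1.0, 0.0], 0.0)      # split on the first coordinate
clf_pos = ([0.0, 1.0], 0.0)    # right half: label follows x2
clf_neg = ([0.0, -1.0], 0.0)   # left half: label follows -x2
```

Each of the three pieces remains a readable set of linear coefficients, which is the interpretability argument made in the abstract.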

11.
J Mach Learn Res ; 14: 1349-1386, 2013 May 01.
Article in English | MEDLINE | ID: mdl-24415909

ABSTRACT

Hard and soft classifiers are two important groups of techniques for classification problems. Logistic regression and Support Vector Machines are typical examples of soft and hard classifiers respectively. The essential difference between these two groups is whether one needs to estimate the class conditional probability for the classification task or not. In particular, soft classifiers predict the label based on the obtained class conditional probabilities, while hard classifiers bypass the estimation of probabilities and focus on the decision boundary. In practice, for the goal of accurate classification, it is unclear which one to use in a given situation. To tackle this problem, the Large-margin Unified Machine (LUM) was recently proposed as a unified family to embrace both groups. The LUM family enables one to study the behavior change from soft to hard binary classifiers. For multicategory cases, however, the concept of soft and hard classification becomes less clear. In that case, class probability estimation becomes more involved as it requires estimation of a probability vector. In this paper, we propose a new Multicategory LUM (MLUM) framework to investigate the behavior of soft versus hard classification under multicategory settings. Our theoretical and numerical results help to shed some light on the nature of multicategory classification and its transition behavior from soft to hard classifiers. The numerical results suggest that the proposed tuned MLUM yields very competitive performance.
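The soft-versus-hard distinction drawn in this abstract can be shown for the binary linear case. This is a minimal sketch, not the LUM or MLUM family: the soft rule passes the linear score through a logistic link to estimate a class probability and then thresholds it, while the hard rule uses only the sign of the score. The function names and the 0.5 threshold are assumptions made for illustration.

```python
import math


def score(w, b, x):
    """Linear decision function w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_predict(w, b, x):
    """Soft classification: estimate P(y = +1 | x), then threshold it."""
    p = sigmoid(score(w, b, x))
    return (1 if p >= 0.5 else -1), p


def hard_predict(w, b, x):
    """Hard classification: only the side of the boundary matters."""
    return 1 if score(w, b, x) >= 0 else -1
```

With the same coefficients the two rules return the same label; the difference lies in what is estimated during fitting, a probability in the soft case and only the boundary in the hard case.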

12.
Article in English | MEDLINE | ID: mdl-24363545

ABSTRACT

Large margin classifiers have been shown to be very useful in many applications. The Support Vector Machine is a canonical example of large margin classifiers. Despite their flexibility and ability in handling high dimensional data, many large margin classifiers have serious drawbacks when the data are noisy, especially when there are outliers in the data. In this paper, we propose a new weighted large margin classification technique. The weights are chosen adaptively with data. The proposed classifiers are shown to be robust to outliers and thus are able to produce more accurate classification results.

13.
J Am Stat Assoc ; 108(502): 553-565, 2013 Jun 01.
Article in English | MEDLINE | ID: mdl-24039320

ABSTRACT

Constructing classification rules for accurate diagnosis of a disorder is an important goal in medical practice. In many clinical applications, no clinically significant anatomical or physiological deviation exists to identify the gold standard disease status and inform the development of classification algorithms. Despite the absence of perfect disease class identifiers, there are usually one or more disease-informative auxiliary markers along with feature variables comprising known symptoms. Existing statistical learning approaches do not effectively draw information from auxiliary prognostic markers. We propose a large margin classification method, with particular emphasis on the support vector machine (SVM), assisted by available informative markers in order to classify disease without knowing a subject's true disease status. We view this task as statistical learning in the presence of missing data, and introduce a pseudo-EM algorithm for the classification. A major distinction from a regular EM algorithm is that we do not model the distribution of missing data given the observed feature variables either parametrically or semiparametrically. We also propose a sparse variable selection method embedded in the pseudo-EM algorithm. Theoretical examination shows that the proposed classification rule is Fisher consistent, and that under a linear rule, the proposed selection has an oracle variable selection property and the estimated coefficients are asymptotically normal. We apply the methods to build decision rules for including subjects in clinical trials of a new psychiatric disorder and present four applications to data available at the UCI Machine Learning Repository.
