Results 1 - 7 of 7
1.
IEEE Trans Pattern Anal Mach Intell ; 44(12): 10186-10195, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34941500

ABSTRACT

The estimation of nested functions (i.e., functions of functions) is one of the central reasons for the success and popularity of machine learning. Today, artificial neural networks are the predominant class of algorithms in this area, known as representational learning. Here, we introduce Representational Gradient Boosting (RGB), a nonparametric algorithm that estimates functions with multi-layer architectures obtained using backpropagation in the space of functions. RGB does not need to assume a functional form in the nodes or output (e.g., linear models or rectified linear units), but rather estimates these transformations. RGB can be seen as an optimized stacking procedure where a meta algorithm learns how to combine different classes of functions (e.g., Neural Networks (NN) and Gradient Boosting (GB)), while building and optimizing them jointly in an attempt to compensate each other's weaknesses. This highlights a stark difference with current approaches to meta-learning that combine models only after they have been built independently. We showed that providing optimized stacking is one of the main advantages of RGB over current approaches. Additionally, due to the nested nature of RGB we also showed how it improves over GB in problems that have several high-order interactions. Finally, we investigate both theoretically and in practice the problem of recovering nested functions and the value of prior knowledge.
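The stacking idea described above can be given a concrete, if greatly simplified, illustration. The sketch below is not the authors' RGB algorithm (which estimates the node transformations themselves via backpropagation in the space of functions); it only shows a neural network and gradient boosting being fit to each other's residuals so that the two model classes compensate for each other's weaknesses. The dataset, model sizes, and the single alternating pass are illustrative assumptions.

    # Illustrative only: alternating residual fits between an MLP and gradient
    # boosting, in contrast to stacking models that were built independently.
    # This is NOT the RGB algorithm from the paper.
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.datasets import make_friedman1
    from sklearn.model_selection import train_test_split

    X, y = make_friedman1(n_samples=2000, noise=1.0, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    nn = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
    gb = GradientBoostingRegressor(n_estimators=200, random_state=0)

    # Pass 1: fit the network, then let boosting correct its residuals.
    nn.fit(X_tr, y_tr)
    gb.fit(X_tr, y_tr - nn.predict(X_tr))

    # Pass 2: refit the network against what boosting still leaves unexplained,
    # so each model class covers the other's weaknesses.
    nn.fit(X_tr, y_tr - gb.predict(X_tr))

    pred = nn.predict(X_te) + gb.predict(X_te)
    print("test MSE:", np.mean((pred - y_te) ** 2))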


Subject(s)
Algorithms , Neural Networks (Computer) , Machine Learning
2.
Proc Natl Acad Sci U S A ; 117(35): 21175-21184, 2020 Sep 01.
Article in English | MEDLINE | ID: mdl-32817416

ABSTRACT

A method for decision tree induction is presented. Given a set of predictor variables x and two outcome variables y and z associated with each x, the goal is to identify those values of x for which the respective distributions of y | x and z | x, or selected properties of those distributions such as means or quantiles, are most different. Contrast trees provide a lack-of-fit measure for statistical models of such statistics, or for the complete conditional distribution p(y | x), as a function of x. They are easily interpreted and can be used as diagnostic tools to reveal and then understand the inaccuracies of models produced by any learning method. A corresponding contrast-boosting strategy is described for remedying any uncovered errors, thereby producing potentially more accurate predictions. This leads to a distribution-boosting strategy for directly estimating the full conditional distribution of y at each x under no assumptions concerning its shape, form, or parametric representation.
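The diagnostic use of contrast trees can be approximated with standard tools. The sketch below is not the contrast-tree algorithm itself; it grows an ordinary regression tree on the discrepancy between an outcome y and a fitted model's prediction z, then reports the regions of x where the two differ most on average. The dataset, the linear model being diagnosed, and the tree size are illustrative assumptions.

    # Rough sketch of the contrast idea: partition x-space and compare the
    # mean of y with the mean of the model prediction z within each region.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.datasets import make_friedman1

    X, y = make_friedman1(n_samples=5000, noise=1.0, random_state=0)
    z = LinearRegression().fit(X, y).predict(X)   # the model to be diagnosed

    contrast = DecisionTreeRegressor(max_leaf_nodes=8, min_samples_leaf=200)
    contrast.fit(X, y - z)                        # mean discrepancy per region

    leaves = contrast.apply(X)
    for leaf_id in np.unique(leaves):
        in_leaf = leaves == leaf_id
        gap = (y - z)[in_leaf].mean()
        print(f"leaf {leaf_id}: mean(y) - mean(z) = {gap:+.3f}, n = {in_leaf.sum()}")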

4.
Proc Natl Acad Sci U S A ; 117(9): 4571-4577, 2020 Mar 03.
Article in English | MEDLINE | ID: mdl-32071251

ABSTRACT

Machine learning is proving invaluable across disciplines. However, its success is often limited by the quality and quantity of available data, while its adoption is limited by the level of trust afforded by given models. Human vs. machine performance is commonly compared empirically to decide whether a certain task should be performed by a computer or an expert. In reality, the optimal learning strategy may involve combining the complementary strengths of humans and machines. Here, we present expert-augmented machine learning (EAML), an automated method that guides the extraction of expert knowledge and its integration into machine-learned models. We used a large dataset of intensive-care patient data to derive 126 decision rules that predict hospital mortality. Using an online platform, we asked 15 clinicians to assess the relative risk of the subpopulation defined by each rule compared to the total sample. We compared the clinician-assessed risk to the empirical risk and found that, while clinicians agreed with the data in most cases, there were notable exceptions where they overestimated or underestimated the true risk. Studying the rules with greatest disagreement, we identified problems with the training data, including one miscoded variable and one hidden confounder. Filtering the rules based on the extent of disagreement between clinician-assessed risk and empirical risk, we improved performance on out-of-sample data and were able to train with less data. EAML provides a platform for automated creation of problem-specific priors, which help build robust and dependable machine-learning models in critical applications.
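The rule-filtering step described above can be sketched in a few lines. Everything below is hypothetical: the Rule fields, the example rules, and the disagreement threshold are invented for illustration, and the extraction of the decision rules themselves is omitted.

    # Hypothetical sketch of filtering rules by expert/data disagreement.
    from dataclasses import dataclass

    @dataclass
    class Rule:
        description: str        # e.g. "age > 80 and lactate > 4" (made up)
        empirical_risk: float   # relative risk observed in the training data
        expert_risk: float      # median relative risk assigned by clinicians

    def filter_rules(rules, max_disagreement=0.5):
        """Keep rules where experts and data roughly agree; flag the rest for
        manual review (possible miscoding or hidden confounding)."""
        kept, flagged = [], []
        for r in rules:
            (kept if abs(r.expert_risk - r.empirical_risk) <= max_disagreement
                  else flagged).append(r)
        return kept, flagged

    rules = [
        Rule("age > 80 and lactate > 4", empirical_risk=2.1, expert_risk=2.3),
        Rule("admitted on a weekend", empirical_risk=1.8, expert_risk=1.0),
    ]
    kept, flagged = filter_rules(rules)
    print(len(kept), "rules kept,", len(flagged), "flagged for review")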


Subject(s)
Expert Systems , Machine Learning/standards , Medical Informatics/methods , Data Management/methods , Database Management Systems , Medical Informatics/standards
5.
Proc Natl Acad Sci U S A ; 116(40): 19887-19893, 2019 Oct 01.
Article in English | MEDLINE | ID: mdl-31527280

ABSTRACT

The expansion of machine learning to high-stakes application domains such as medicine, finance, and criminal justice, where making informed decisions requires clear understanding of the model, has increased the interest in interpretable machine learning. The widely used Classification and Regression Trees (CART) have played a major role in health sciences, due to their simple and intuitive explanation of predictions. Ensemble methods like gradient boosting can improve the accuracy of decision trees, but at the expense of the interpretability of the generated model. Additive models, such as those produced by gradient boosting, and full interaction models, such as CART, have been investigated largely in isolation. We show that these models exist along a spectrum, revealing previously unseen connections between these approaches. This paper introduces a rigorous formalization for the additive tree, an empirically validated learning technique for creating a single decision tree, and shows that this method can produce models equivalent to CART or gradient boosted stumps at the extremes by varying a single parameter. Although the additive tree is designed primarily to provide both the model interpretability and predictive performance needed for high-stakes applications like medicine, it can also produce hybrid models between CART and boosted stumps that outperform either of these approaches.
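The additive tree itself is not part of standard libraries, but the two extremes it interpolates between are easy to exhibit with scikit-learn: a single CART-style tree (full interactions) and gradient-boosted stumps (a purely additive model). A minimal sketch with illustrative data and parameters:

    # The two ends of the spectrum the paper describes; the additive tree
    # would sit between them, controlled by a single parameter.
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score

    X, y = load_breast_cancer(return_X_y=True)

    cart = DecisionTreeClassifier(max_depth=4, random_state=0)      # full-interaction extreme
    stumps = GradientBoostingClassifier(max_depth=1, n_estimators=300,
                                        random_state=0)             # purely additive extreme

    print("CART           :", cross_val_score(cart, X, y, cv=5).mean())
    print("boosted stumps :", cross_val_score(stumps, X, y, cv=5).mean())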


Subject(s)
Algorithms , Decision Trees , Machine Learning , Factual Databases , Statistical Models , Programming Languages
6.
J Am Stat Assoc ; 106(495): 1125-1138, 2011.
Article in English | MEDLINE | ID: mdl-25580042

ABSTRACT

We address the problem of sparse selection in linear models. A number of nonconvex penalties have been proposed in the literature for this purpose, along with a variety of convex-relaxation algorithms for finding good solutions. In this article we pursue a coordinate-descent approach for optimization, and study its convergence properties. We characterize the properties of penalties suitable for this approach, study their corresponding threshold functions, and describe a df-standardizing reparametrization that assists our pathwise algorithm. The MC+ penalty is ideally suited to this task, and we use it to demonstrate the performance of our algorithm. Certain technical derivations and experiments related to this article are included in the Supplementary Materials section.
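The threshold function of the MC+ penalty is the so-called firm threshold, which makes the coordinate-descent update easy to write down. The sketch below assumes the columns of X are scaled so that (1/n) * sum_i x_ij^2 = 1, and it omits the df-standardizing reparametrization and the pathwise warm starts used in the article; the lambda and gamma values in the example are illustrative.

    # Naive coordinate descent with the MC+ (minimax concave) penalty.
    import numpy as np

    def mcp_threshold(z, lam, gamma):
        """Univariate MC+ solution (firm threshold), for gamma > 1."""
        if abs(z) <= lam:
            return 0.0
        if abs(z) <= gamma * lam:
            return np.sign(z) * (abs(z) - lam) / (1.0 - 1.0 / gamma)
        return z  # beyond gamma*lam the penalty is flat: no shrinkage

    def coordinate_descent(X, y, lam, gamma, n_iter=100):
        n, p = X.shape
        beta = np.zeros(p)
        for _ in range(n_iter):
            for j in range(p):
                r = y - X @ beta + X[:, j] * beta[j]   # partial residual
                z = X[:, j] @ r / n                    # univariate LS coefficient
                beta[j] = mcp_threshold(z, lam, gamma)
        return beta

    # Example on simulated data with two nonzero coefficients.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    X /= np.sqrt((X ** 2).mean(axis=0))                # standardize columns
    y = X @ np.array([3.0, -2.0] + [0.0] * 8) + rng.standard_normal(200)
    print(coordinate_descent(X, y, lam=0.5, gamma=3.0).round(2))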

7.
Stat Med ; 22(9): 1365-81, 2003 May 15.
Article in English | MEDLINE | ID: mdl-12704603

ABSTRACT

Predicting future outcomes based on knowledge obtained from past observational data is a common application in a wide variety of areas of scientific research. In the present paper, prediction will be focused on various grades of cervical preneoplasia and neoplasia. Statistical tools used for prediction should of course possess predictive accuracy, and preferably meet secondary requirements such as speed, ease of use, and interpretability of the resulting predictive model. A new automated procedure based on an extension (called 'boosting') of classification and regression tree (CART) models is described. The resulting tool is a fast 'off-the-shelf' procedure for classification and regression that is competitive in accuracy with more customized approaches, while being fairly automatic to use (little tuning), and highly robust, especially when applied to less than clean data. Additional tools are presented for interpreting and visualizing the results of such multiple additive regression tree (MART) models.
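MART corresponds to what scikit-learn now exposes as gradient-boosted trees, so an 'off-the-shelf' run is straightforward to sketch. The dataset below is a stand-in, not the cervical preneoplasia and neoplasia data analysed in the paper, and the hyperparameters are illustrative.

    # Off-the-shelf gradient boosting (MART-style) with minimal tuning.
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    mart = GradientBoostingClassifier(n_estimators=300, max_depth=3,
                                      learning_rate=0.05, random_state=0)
    mart.fit(X_tr, y_tr)
    print("held-out accuracy:", mart.score(X_te, y_te))

    # Relative variable importance is one of the interpretation tools for such
    # models; partial-dependence plots (sklearn.inspection) are another.
    print("top features:", mart.feature_importances_.argsort()[::-1][:5])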


Subject(s)
Epidemiologic Methods , Statistical Models , Regression Analysis , Squamous Cell Carcinoma/epidemiology , Female , Humans , Predictive Value of Tests , Uterine Cervical Neoplasms/epidemiology