Results 1 - 6 of 6
1.
BMC Med Res Methodol ; 23(1): 144, 2023 06 19.
Article in English | MEDLINE | ID: mdl-37337173

ABSTRACT

BACKGROUND: Machine learning tools such as random forests provide important opportunities for modeling large, complex modern data generated in medicine. Unfortunately, when it comes to understanding why machine learning models are predictive, applied research continues to rely on 'out of bag' (OOB) variable importance metrics (VIMPs) that are known within the statistics community to have considerable shortcomings. After explaining the limitations of OOB VIMPs - including bias towards correlated features and limited interpretability - we describe a modern approach called 'knockoff VIMPs' and explain its advantages.
METHODS: We first evaluate current VIMP practices through an in-depth literature review of 50 recent random forest manuscripts. Next, we recommend organized and interpretable strategies for analysis with knockoff VIMPs, including computing them for groups of features and considering multiple model performance metrics. To demonstrate methods, we develop a random forest to predict 5-year incident stroke in the Sleep Heart Health Study and compare results based on OOB and knockoff VIMPs.
RESULTS: Nearly all papers in the literature review contained substantial limitations in their use of VIMPs. In our demonstration, using OOB VIMPs for individual variables suggested two highly correlated lung function variables (forced expiratory volume, forced vital capacity) as the best predictors of incident stroke, followed by age and height. Using an organized analytic approach that considered knockoff VIMPs of both groups of features and individual features, the largest contributions to model sensitivity were medications (especially cardiovascular) and measured medical risk factors, while the largest contributions to model specificity were age, diastolic blood pressure, self-reported medical risk factors, polysomnography features, and pack-years of smoking. Thus, we reach very different conclusions about stroke risk factors using OOB VIMPs versus knockoff VIMPs.
CONCLUSIONS: The near-ubiquitous reliance on OOB VIMPs may provide misleading results for researchers who use such methods to guide their research. Given the rapid pace of scientific inquiry using machine learning, it is essential to bring modern knockoff VIMPs that are interpretable and unbiased into widespread applied practice to steer researchers using random forest machine learning toward more meaningful results.
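
The idea of scoring groups of features against more than one performance metric can be sketched in a few lines. This is only an illustration under stated assumptions: it uses plain joint permutation of a feature group as a stand-in, not the knockoff construction the abstract describes (generating knockoff copies is not shown here), and the toy model, the data, and every name in the block (`grouped_permutation_importance`, `toy_model`, `groups`) are hypothetical.

```python
import random


def sensitivity(y_true, y_pred):
    """Fraction of true events (label 1) that the model flags."""
    preds = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(preds) / len(preds) if preds else 0.0


def specificity(y_true, y_pred):
    """Fraction of non-events (label 0) that the model leaves unflagged."""
    preds = [p for t, p in zip(y_true, y_pred) if t == 0]
    return sum(1 - p for p in preds) / len(preds) if preds else 0.0


def grouped_permutation_importance(predict, X, y, groups, metric,
                                   n_repeats=20, seed=0):
    """Mean drop in `metric` when all columns of a group are shuffled jointly.

    Assumption: simple permutation replaces the knockoff step, so this does
    not remove the correlated-feature bias discussed in the abstract.
    """
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    importance = {}
    for name, cols in groups.items():
        drops = []
        for _ in range(n_repeats):
            order = list(range(len(X)))
            rng.shuffle(order)
            X_perm = []
            for i, row in enumerate(X):
                new_row = list(row)
                for c in cols:
                    # Take the whole group from one donor row so that
                    # within-group structure is preserved.
                    new_row[c] = X[order[i]][c]
                X_perm.append(new_row)
            permuted = metric(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importance[name] = sum(drops) / n_repeats
    return importance


# Hypothetical data: columns are age, diastolic blood pressure, pure noise.
X = [[40 + i, 70 + (i * 7) % 30, (i * 13) % 10] for i in range(40)]


def toy_model(row):
    """Hypothetical fixed rule standing in for a fitted random forest."""
    return 1 if row[0] > 60 or row[1] > 95 else 0


y = [toy_model(row) for row in X]
groups = {"age": [0], "blood_pressure": [1], "noise": [2]}

vimp_sens = grouped_permutation_importance(toy_model, X, y, groups, sensitivity)
vimp_spec = grouped_permutation_importance(toy_model, X, y, groups, specificity)
```

Reporting `vimp_sens` and `vimp_spec` side by side mirrors the abstract's point that a feature group can matter for sensitivity without mattering for specificity, and vice versa.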


Subject(s)
Random Forest , Stroke , Humans , Benchmarking , Machine Learning , Stroke/diagnosis , Stroke/epidemiology , Sleep
2.
J Sleep Res ; 30(6): e13386, 2021 12.
Article in English | MEDLINE | ID: mdl-33991144

ABSTRACT

Clarifying whether physiological sleep measures predict mortality could inform risk screening; however, such investigations should account for complex and potentially non-linear relationships among health risk factors. We aimed to establish the predictive utility of polysomnography (PSG)-assessed sleep measures for mortality using a novel permutation random forest (PRF) machine learning framework. Data collected from the years 1995 to present are from the Sleep Heart Health Study (SHHS; n = 5,734) and the Wisconsin Sleep Cohort Study (WSCS; n = 1,015), and include initial assessments of sleep and health, and up to 15 years of follow-up for all-cause mortality. We applied PRF models to quantify the predictive abilities of 24 measures grouped into five domains: PSG-assessed sleep (four measures), self-reported sleep (three), health (eight), health behaviours (four), and sociodemographic factors (five). A 10-fold repeated internal validation (WSCS and SHHS combined) and external validation (training in SHHS; testing in WSCS) were used to compute unbiased variable importance metrics and associated p values. We observed that health, sociodemographic factors, and PSG-assessed sleep domains predicted mortality using both external validation and repeated internal validation. The PSG-assessed sleep efficiency and the percentage of sleep time with oxygen saturation <90% were among the most predictive individual measures. Multivariable Cox regression also revealed the PSG-assessed sleep domain to be predictive, with very low sleep efficiency and high hypoxaemia conferring the highest risk. These findings, coupled with the emergence of new low-burden technologies for objectively assessing sleep and overnight oxygen saturation, suggest that consideration of physiological sleep measures may improve risk screening.


Subject(s)
Sleep , Adult , Cohort Studies , Humans , Machine Learning
3.
Assessment ; 27(4): 840-854, 2020 06.
Article in English | MEDLINE | ID: mdl-29457474

ABSTRACT

Accuracy has several elements, not all of which have received equal attention in the field of clinical psychology. Calibration, the degree to which a probabilistic estimate of an event reflects the true underlying probability of the event, has largely been neglected in the field of clinical psychology in favor of other components of accuracy such as discrimination (e.g., sensitivity, specificity, area under the receiver operating characteristic curve). Although it is frequently overlooked, calibration is a critical component of accuracy with particular relevance for prognostic models and risk-assessment tools. With advances in personalized medicine and the increasing use of probabilistic (0% to 100%) estimates and predictions in mental health research, the need for careful attention to calibration has become increasingly important.
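
The distinction between calibration and discrimination can be made concrete with a small sketch. The data and the function names (`brier_score`, `calibration_table`) are made up for illustration and are not taken from the article. In this toy case the predictions separate events from non-events perfectly, yet they are poorly calibrated: cases given a 70% estimate all experience the event, and cases given 30% never do.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def calibration_table(probs, outcomes, n_bins=5):
    """Group predictions into equal-width bins and compare, per bin,
    the mean predicted probability with the observed event rate.

    Returns a list of (mean_predicted, observed_rate, count) tuples,
    skipping empty bins.
    """
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, o))
    table = []
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(o for _, o in members) / len(members)
        table.append((mean_pred, observed, len(members)))
    return table


# Hypothetical predictions: perfect ranking of events over non-events,
# but the stated probabilities are too timid in both directions.
probs = [0.3, 0.3, 0.3, 0.3, 0.7, 0.7, 0.7, 0.7]
outcomes = [0, 0, 0, 0, 1, 1, 1, 1]

table = calibration_table(probs, outcomes)
score = brier_score(probs, outcomes)
```

In a well-calibrated model the first two entries of each row in `table` would be close to each other; a discrimination metric alone would not reveal the gap seen here.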


Subject(s)
Clinical Psychology , Calibration , Humans , Probability , Prognosis , ROC Curve
4.
J Comput Graph Stat ; 26(3): 589-597, 2017.
Article in English | MEDLINE | ID: mdl-30906174

ABSTRACT

While statistical learning methods have proved powerful tools for predictive modeling, the black-box nature of the models they produce can severely limit their interpretability and the ability to conduct formal inference. However, the natural structure of ensemble learners like bagged trees and random forests has been shown to admit desirable asymptotic properties when base learners are built with proper subsamples. In this work, we demonstrate that by defining an appropriate grid structure on the covariate space, we may carry out formal hypothesis tests for both variable importance and underlying additive model structure. To our knowledge, these tests represent the first statistical tools for investigating the underlying regression structure in a context such as random forests. We develop notions of total and partial additivity and further demonstrate that testing can be carried out at no additional computational cost by estimating the variance within the process of constructing the ensemble. Furthermore, we propose a novel extension of these testing procedures utilizing random projections in order to allow for computationally efficient testing procedures that retain high power even when the grid size is much larger than that of the training set.

5.
Eval Program Plann ; 60: 284-292, 2017 02.
Article in English | MEDLINE | ID: mdl-27590739

ABSTRACT

Planning and evaluating projects often involves input from many stakeholders. Fusing and organizing many different ideas, opinions, and interpretations into a coherent and acceptable plan or project evaluation is challenging. This is especially true when seeking contributions from a large number of participants, particularly when not all can participate in group discussions, or when some prefer to contribute their perspectives anonymously. One of the major breakthroughs in the area of evaluation and program planning has been the use of graphical tools to represent the brainstorming process. This provides a quantitative framework for organizing ideas and general concepts into simple-to-interpret graphs. We developed a new, open-source concept mapping software called R-CMap, which is implemented in R. This software provides a graphical user interface to guide users through the analytical process of concept mapping. The R-CMap software allows users to generate a variety of plots, including cluster maps, point rating and cluster rating maps, as well as pattern matching and go-zone plots. Additionally, R-CMap is capable of generating detailed reports that contain useful statistical summaries of the data. The plots and reports can be embedded in Microsoft Office tools such as Word and PowerPoint, where users may manually adjust various plot and table features to achieve the best visual results in their presentations and official reports. The graphical user interface of R-CMap allows users to define cluster names, change the number of clusters, select rating variables for relevant plots, and importantly, select subsets of respondents by demographic criteria. The latter is particularly useful to project managers in order to identify different patterns of preferences by subpopulations. R-CMap is user-friendly, and does not require any programming experience. However, proficient R users can add to its functionality by directly accessing built-in functions in R and sharing new features with the concept mapping community.


Subject(s)
Cluster Analysis , Group Processes , Program Development/methods , Program Evaluation/methods , Research Design , Software Design , Cooperative Behavior , Empirical Research , Humans , Reproducibility of Results , User-Computer Interface
6.
Nat Commun ; 7: 13666, 2016 12 14.
Article in English | MEDLINE | ID: mdl-27966532

ABSTRACT

Altered DNA methylation is common in cancer and often considered an early event in tumorigenesis. However, the sources of heterogeneity of DNA methylation among tumours remain poorly defined. Here we capitalize on the availability of multi-platform data on thousands of human tumours to build integrative models of DNA methylation. We quantify the contribution of clinical and molecular factors in explaining intertumoral variability in DNA methylation. We show that the levels of a set of metabolic genes involved in the methionine cycle are predictive of several features of DNA methylation in tumours, including the methylation of cancer genes. Finally, we demonstrate that patients whose DNA methylation can be predicted from the methionine cycle exhibited improved survival over cases where this regulation is disrupted. This study represents a comprehensive analysis of the determinants of methylation and demonstrates the surprisingly large interaction between metabolism and DNA methylation variation. Together, our results quantify links between tumour metabolism and epigenetics and outline clinical implications.


Subject(s)
DNA Methylation , Biological Models , Neoplasms/genetics , Genetic Epigenesis , Humans , Neoplasms/metabolism , Survival Analysis