Results 1-20 of 37
1.
Proc Natl Acad Sci U S A ; 119(11): e2111547119, 2022 03 15.
Article in English | MEDLINE | ID: mdl-35275788

ABSTRACT

Significance: With the increase in artificial intelligence in real-world applications, there is interest in building hybrid systems that take both human and machine predictions into account. Previous work has shown the benefits of separately combining the predictions of diverse machine classifiers or groups of people. Using a Bayesian modeling framework, we extend these results by systematically investigating the factors that influence the performance of hybrid combinations of human and machine classifiers while taking into account the unique ways human and algorithmic confidence are expressed.


Subject(s)
Artificial Intelligence; Bayes Theorem; Humans
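Why pooling human and machine judgments can help is easiest to see with the textbook independent-Bayes rule, in which each calibrated probability adds its evidence in log-odds space. This is a minimal sketch under a conditional-independence assumption, not the Bayesian model from the paper (which also models how humans and algorithms express confidence); `combine_probabilities` is a hypothetical helper.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def combine_probabilities(probs, prior=0.5):
    """Pool calibrated P(class = 1) estimates from several sources for one item.

    Assumes the sources are conditionally independent given the true class and
    were calibrated under the same prior, so each adds
    logit(p_i) - logit(prior) of evidence in log-odds space.
    """
    prior_logit = logit(prior)
    total = prior_logit + sum(logit(p) - prior_logit for p in probs)
    return 1.0 / (1.0 + math.exp(-total))


# A human and a model that each say 0.8 pool to odds 4 * 4 = 16, i.e. 16/17.
print(round(combine_probabilities([0.8, 0.8]), 4))  # 0.9412
```

Treating the two sources as independent overstates confidence whenever their errors are correlated.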
2.
Bioinformatics ; 39(10)2023 10 03.
Article in English | MEDLINE | ID: mdl-37756695

ABSTRACT

MOTIVATION: Precise identification of cancer cells in patient samples is essential for accurate diagnosis and clinical monitoring but has been a significant challenge in machine learning approaches for cancer precision medicine. In most scenarios, training data are only available with disease annotation at the subject or sample level. Traditional approaches separate the classification process into multiple steps that are optimized independently. Recent methods either focus on predicting sample-level diagnosis without identifying individual pathologic cells or are less effective for identifying heterogeneous cancer cell phenotypes. RESULTS: We developed a generalized end-to-end differentiable model, the Cell Scoring Neural Network (CSNN), which takes sample-level training data and predicts the diagnosis of the testing samples and the identity of the diagnostic cells in the sample, simultaneously. The cell-level density differences between samples are linked to the sample diagnosis, which allows the probabilities of individual cells being diagnostic to be calculated using backpropagation. We applied CSNN to two independent clinical flow cytometry datasets for leukemia diagnosis. In both qualitative and quantitative assessments, CSNN outperformed preexisting neural network modeling approaches for both cancer diagnosis and cell-level classification. Post hoc decision trees and 2D dot plots were generated for interpretation of the identified cancer cells, showing that the identified cell phenotypes match the cancer endotypes observed clinically in patient cohorts. Independent data clustering analysis confirmed the identified cancer cell populations. AVAILABILITY AND IMPLEMENTATION: The source code of CSNN and datasets used in the experiments are publicly available on GitHub (http://github.com/erobl/csnn). Raw FCS files can be downloaded from FlowRepository (ID: FR-FCM-Z6YK).


Subject(s)
Hematologic Neoplasms; Neoplasms; Humans; Neural Networks, Computer; Neoplasms/diagnosis; Flow Cytometry/methods; Software
3.
Biometrics ; 79(2): 826-840, 2023 06.
Article in English | MEDLINE | ID: mdl-35142367

ABSTRACT

In data collection for predictive modeling, underrepresentation of certain groups, based on gender, race/ethnicity, or age, may yield less accurate predictions for these groups. Recently, this issue of fairness in predictions has attracted significant attention, as data-driven models are increasingly utilized to perform crucial decision-making tasks. Existing methods to achieve fairness in the machine learning literature typically build a single prediction model in a manner that encourages fair prediction performance for all groups. These approaches have two major limitations: (i) fairness is often achieved by compromising accuracy for some groups; (ii) the underlying relationship between dependent and independent variables may not be the same across groups. We propose a joint fairness model (JFM) approach for logistic regression models for binary outcomes that estimates group-specific classifiers using a joint modeling objective function that incorporates fairness criteria for prediction. We introduce an accelerated smoothing proximal gradient algorithm to solve the convex objective function, and present the key asymptotic properties of the JFM estimates. Through simulations, we demonstrate the efficacy of the JFM in achieving good prediction performance and across-group parity, in comparison with the single fairness model, group-separate model, and group-ignorant model, especially when the minority group's sample size is small. Finally, we demonstrate the utility of the JFM method in a real-world example to obtain fair risk predictions for underrepresented older patients diagnosed with coronavirus disease 2019 (COVID-19).


Subject(s)
COVID-19; Humans; Logistic Models; Algorithms
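The JFM abstract describes group-specific logistic classifiers fitted under one joint, convex objective that includes a fairness criterion. The sketch below only evaluates an objective of that shape; the specific penalties are assumptions of ours (an L1 fusion term pulling the two groups' coefficients together and a squared gap between the groups' mean linear predictors as the parity term), and the accelerated smoothing proximal gradient solver is not shown.

```python
import math


def log_loss(beta, X, y):
    """Mean logistic negative log-likelihood, computed stably."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(b * x for b, x in zip(beta, xi))
        # log(1 + exp(z)) - y * z, with log(1 + exp(z)) written to avoid overflow
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - yi * z
    return total / len(y)


def mean_linear_predictor(beta, X):
    return sum(sum(b * x for b, x in zip(beta, xi)) for xi in X) / len(X)


def joint_objective(beta_a, beta_b, data_a, data_b, lam_fuse=0.1, lam_fair=1.0):
    """Hypothetical joint-fairness objective for two groups a and b (convex in the betas)."""
    (X_a, y_a), (X_b, y_b) = data_a, data_b
    fit = log_loss(beta_a, X_a, y_a) + log_loss(beta_b, X_b, y_b)
    # L1 fusion penalty: pulls the group-specific coefficients toward each other.
    fuse = lam_fuse * sum(abs(u - v) for u, v in zip(beta_a, beta_b))
    # Parity penalty: squared gap in mean linear predictor (affine in beta, so convex).
    gap = mean_linear_predictor(beta_a, X_a) - mean_linear_predictor(beta_b, X_b)
    return fit + fuse + lam_fair * gap ** 2


# Toy data: an intercept column plus one feature (illustrative values only).
group_a = ([[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]], [1, 0, 1])
group_b = ([[1.0, 1.5], [1.0, -0.5]], [1, 0])
print(joint_objective([0.0, 0.0], [0.0, 0.0], group_a, group_b))  # about 2 * log(2) = 1.386
```

Each term is convex in the coefficients (log-loss, an L1 norm, and the square of an affine function); the non-smooth L1 term is the part a smoothing or proximal step has to handle.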
4.
Cytometry A ; 97(3): 296-307, 2020 03.
Article in English | MEDLINE | ID: mdl-31691488

ABSTRACT

High-throughput single-cell cytometry technologies have significantly improved our understanding of cellular phenotypes to support translational research and the clinical diagnosis of hematological and immunological diseases. However, subjective and ad hoc manual gating analysis does not adequately handle the increasing volume and heterogeneity of cytometry data for optimal diagnosis. Prior work has shown that machine learning can be applied to classify cytometry samples effectively. However, many of the machine learning classification results are either difficult to interpret without using characteristics of cell populations to make the classification, or suboptimal due to the use of inaccurate cell population characteristics derived from gating boundaries. To date, little has been done to optimize both the gating boundaries and the diagnostic accuracy simultaneously. In this work, we describe a fully discriminative machine learning approach that can simultaneously learn feature representations (e.g., combinations of coordinates of gating boundaries) and classifier parameters for optimizing clinical diagnosis from cytometry measurements. The approach starts from an initial gating position and then refines the position of the gating boundaries by gradient descent until a set of globally optimized gates across different samples is achieved. The learning procedure is constrained by regularization terms encoding domain knowledge that encourage the algorithm to seek interpretable results. We evaluate the proposed approach using both simulated and real data, producing classification results on par with those generated via human expertise, in terms of both the positions of the gating boundaries and the diagnostic accuracy. © 2019 The Authors. Cytometry Part A published by Wiley Periodicals, Inc. on behalf of International Society for Advancement of Cytometry.


Subject(s)
Algorithms; Machine Learning; Flow Cytometry; Humans
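Refining a gate boundary by gradient descent requires the gated cell fraction to be differentiable in the boundary position. A minimal 1-D sketch replaces a hard threshold gate with a sigmoid (soft) gate of width `s`, then moves the threshold `t` by gradient ascent to widen the gap in gated fraction between positive and negative samples. The objective, `s`, the learning rate, and the toy data are our assumptions; the paper's method also learns classifier parameters jointly and adds domain-knowledge regularization, both omitted here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_gate_fraction(cells, t, s):
    """Soft fraction of cells above threshold t: mean of sigmoid((x - t) / s)."""
    return sum(sigmoid((x - t) / s) for x in cells) / len(cells)


def soft_gate_grad(cells, t, s):
    """Derivative in t of soft_gate_fraction."""
    total = 0.0
    for x in cells:
        g = sigmoid((x - t) / s)
        total += -g * (1.0 - g) / s
    return total / len(cells)


def refine_gate(pos_samples, neg_samples, t=2.5, s=0.5, lr=1.0, steps=500):
    """Gradient ascent on t for mean gated fraction (positives) minus (negatives)."""
    for _ in range(steps):
        g_pos = sum(soft_gate_grad(c, t, s) for c in pos_samples) / len(pos_samples)
        g_neg = sum(soft_gate_grad(c, t, s) for c in neg_samples) / len(neg_samples)
        t += lr * (g_pos - g_neg)
    return t


# Toy 1-D marker intensities: positive samples carry an extra high population.
positives = [[1, 2, 3, 4, 6, 7], [1, 2, 3, 4, 6.5, 7.5]]
negatives = [[1, 2, 3, 4], [1.5, 2.5, 3.5, 4.0]]
t_opt = refine_gate(positives, negatives)
print(round(t_opt, 2), [round(soft_gate_fraction(c, t_opt, 0.5), 2) for c in positives + negatives])
```

Shrinking `s` makes the soft gate closer to a hard one but also makes gradients vanish away from the cells, so `s` is a tuning choice.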
5.
Proc Natl Acad Sci U S A ; 114(33): 8689-8692, 2017 Aug 15.
Article in English | MEDLINE | ID: mdl-28784795

ABSTRACT

Data science has attracted a lot of attention, promising to turn vast amounts of data into useful predictions and insights. In this article, we ask why scientists should care about data science. To answer, we discuss data science from three perspectives: statistical, computational, and human. Although each of the three is a critical component of data science, we argue that the effective combination of all three components is the essence of what data science is about.

6.
Int J Wildland Fire ; 28(11): 861-873, 2019 Sep 17.
Article in English | MEDLINE | ID: mdl-34045840

ABSTRACT

Fires in boreal forests of Alaska are changing, threatening human health and ecosystems. Given expected increases in fire activity with climate warming, insight into the controls on fire size from the time of ignition is necessary. Such insight may be increasingly useful for fire management, especially in cases where many ignitions occur in a short time period. Here we investigated the controls and predictability of final fire size at the time of ignition. Using decision trees, we show that ignitions can be classified as leading to small, medium or large fires with 50.4 ± 5.2% accuracy. This was accomplished using two variables: vapour pressure deficit and the fraction of spruce cover near the ignition point. The model predicted that 40% of ignitions would lead to large fires, and those ultimately accounted for 75% of the total burned area. Other machine learning classification algorithms, including random forests and multi-layer perceptrons, were tested but did not outperform the simpler decision tree model. Applying the model to areas with intensive human management resulted in overprediction of large fires, as expected. This type of simple classification system could offer insight into optimal resource allocation, helping to maintain a historical fire regime and protect Alaskan ecosystems.
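The fire-size classifier above is a decision tree on two predictors, vapour pressure deficit and spruce fraction near the ignition point. The sketch below shows only the core step of growing such a tree: choosing the single binary split that minimizes weighted Gini impurity over the three size classes. The feature keys, toy values, and labels are invented for illustration and are not data from the study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Return (feature, threshold, weighted_gini) for the best single binary split."""
    best = (None, None, float("inf"))
    n = len(rows)
    for f in features:
        values = sorted({r[f] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= thr]
            right = [y for r, y in zip(rows, labels) if r[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, thr, score)
    return best


# Hypothetical ignitions: vapour pressure deficit and spruce cover fraction.
rows = [
    {"vpd": 5.0, "spruce": 0.1}, {"vpd": 6.0, "spruce": 0.2},
    {"vpd": 12.0, "spruce": 0.3}, {"vpd": 13.0, "spruce": 0.4},
    {"vpd": 14.0, "spruce": 0.8}, {"vpd": 15.0, "spruce": 0.9},
]
labels = ["small", "small", "medium", "medium", "large", "large"]
print(best_split(rows, labels, ["vpd", "spruce"]))
```

A full tree applies `best_split` recursively to each side until a depth or purity limit is reached.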

7.
PLoS Genet ; 10(7): e1004520, 2014 Jul.
Article in English | MEDLINE | ID: mdl-25079073

ABSTRACT

Mammary gland branching morphogenesis and ductal homeostasis rely on mammary stem cell function for the maintenance of basal and luminal cell compartments. The mechanisms of transcriptional regulation of the basal cell compartment are currently unknown. We explored these mechanisms in the basal cell compartment and identified the Co-factor of LIM domains (CLIM/LDB/NLI) as a transcriptional regulator that maintains these cells. Clims act within the basal cell compartment to promote branching morphogenesis by maintaining the number and proliferative potential of basal mammary epithelial stem cells. Clim2, in a complex with LMO4, supports mammary stem cells by directly targeting the Fgfr2 promoter in basal cells to increase its expression. Strikingly, Clims also coordinate basal-specific transcriptional programs to preserve luminal cell identity. These basal-derived cues inhibit epidermis-like differentiation of the luminal cell compartment and enhance the expression of luminal cell-specific oncogenes ErbB2 and ErbB3. Consistently, basal-expressed Clims promote the initiation and progression of breast cancer in the MMTV-PyMT tumor model, and the Clim-regulated branching morphogenesis gene network is a prognostic indicator of poor breast cancer outcome in humans.


Subject(s)
Adaptor Proteins, Signal Transducing/genetics; Breast Neoplasms/genetics; DNA-Binding Proteins/genetics; LIM Domain Proteins/genetics; Neoplasms, Basal Cell/genetics; Receptor, Fibroblast Growth Factor, Type 2/genetics; Transcription Factors/genetics; Breast Neoplasms/metabolism; Carcinogenesis/genetics; Cell Differentiation/genetics; Epithelial Cells/metabolism; Epithelial Cells/pathology; Female; Gene Expression Regulation, Neoplastic; Humans; Mammary Glands, Human/metabolism; Mammary Glands, Human/pathology; Neoplasms, Basal Cell/metabolism; Promoter Regions, Genetic; Protein Structure, Tertiary; Receptor, ErbB-2/genetics; Stem Cells/metabolism; Stem Cells/pathology
8.
Proc Natl Acad Sci U S A ; 109(29): 11758-63, 2012 Jul 17.
Article in English | MEDLINE | ID: mdl-22753467

ABSTRACT

The role of the circadian clock in skin and the identity of genes participating in its chronobiology remain largely unknown, leading us to define the circadian transcriptome of mouse skin at two different stages of the hair cycle, telogen and anagen. The circadian transcriptomes of telogen and anagen skin are largely distinct, with the former dominated by genes involved in cell proliferation and metabolism. The expression of many metabolic genes is antiphasic to cell cycle-related genes, the former peaking during the day and the latter at night. Consistently, accumulation of reactive oxygen species, a byproduct of oxidative phosphorylation, and S-phase are antiphasic to each other in telogen skin. Furthermore, the circadian variation in S-phase is controlled by BMAL1 intrinsic to keratinocytes, because keratinocyte-specific deletion of Bmal1 obliterates time-of-day-dependent synchronicity of cell division in the epidermis leading to a constitutively elevated cell proliferation. In agreement with higher cellular susceptibility to UV-induced DNA damage during S-phase, we found that mice are most sensitive to UVB-induced DNA damage in the epidermis at night. Because in the human epidermis maximum numbers of keratinocytes go through S-phase in the late afternoon, we speculate that in humans the circadian clock imposes regulation of epidermal cell proliferation so that skin is at a particularly vulnerable stage during times of maximum UV exposure, thus contributing to the high incidence of human skin cancers.


Subject(s)
ARNTL Transcription Factors/metabolism; Cell Proliferation; Circadian Rhythm/genetics; DNA Damage/genetics; Epidermal Cells; Metabolic Networks and Pathways/genetics; Transcriptome/genetics; ARNTL Transcription Factors/genetics; Animals; Bromodeoxyuridine; Cell Cycle/physiology; Circadian Rhythm/physiology; Colchicine; DNA Damage/physiology; Enzyme-Linked Immunosorbent Assay; Epidermis/radiation effects; Immunohistochemistry; Male; Metabolic Networks and Pathways/physiology; Mice; Mice, Inbred C57BL; Microarray Analysis; Polymerase Chain Reaction; Transcriptome/physiology; Ultraviolet Rays/adverse effects
9.
J Forensic Sci ; 69(4): 1289-1303, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38558223

ABSTRACT

We investigate likelihood ratio models motivated by digital forensics problems involving time-stamped user-generated event data from a device or account. Of specific interest are scenarios where the data may have been generated by a single individual (the device/account owner) or by two different individuals (the device/account owner and someone else), such as instances in which an account was hacked or a device was stolen before being associated with a crime. Existing likelihood ratio methods in this context require that a precise time is specified at which the device or account is purported to have changed hands (the changepoint); this is the known changepoint likelihood ratio model. In this paper, we develop a likelihood ratio model that instead accommodates uncertainty in the changepoint using Bayesian techniques, that is, an unknown changepoint likelihood ratio model. We show that the likelihood ratio in this case can be calculated in closed form as an expression that is straightforward to compute. In experiments with simulated changepoints using real-world data sets, the results demonstrate that the unknown changepoint model attains comparable performance to the known changepoint model that uses a perfectly specified changepoint, and considerably outperforms the known changepoint model that uses a misspecified changepoint, illustrating the benefit of capturing uncertainty in the changepoint.
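Accommodating an unknown changepoint with Bayesian techniques can be illustrated with a deliberately simple event model in which each event is a categorical symbol drawn either from the owner's distribution `p_owner` or from another person's `p_other`. The known-changepoint likelihood ratio compares "owner for the first k events, someone else afterward" against "owner throughout"; averaging it over a prior on k gives an unknown-changepoint ratio. The categorical model, the uniform prior, and all names are our assumptions; the paper's own closed-form expression is not reproduced here.

```python
import math


def log_lik(events, p):
    """Log-likelihood of a sequence of categorical events under distribution p."""
    return sum(math.log(p[e]) for e in events)


def known_changepoint_lr(events, k, p_owner, p_other):
    """LR of 'owner for events[:k], someone else for events[k:]' vs 'owner throughout'."""
    num = log_lik(events[:k], p_owner) + log_lik(events[k:], p_other)
    den = log_lik(events, p_owner)
    return math.exp(num - den)


def unknown_changepoint_lr(events, p_owner, p_other):
    """Average the known-changepoint LR over a uniform prior on k = 1, ..., n - 1."""
    ks = range(1, len(events))
    return sum(known_changepoint_lr(events, k, p_owner, p_other) for k in ks) / len(ks)


# Hypothetical activity profiles: the owner is mostly active by day, the other person at night.
p_owner = {"day": 0.8, "night": 0.2}
p_other = {"day": 0.2, "night": 0.8}
events = ["day"] * 4 + ["night"] * 4
print(round(known_changepoint_lr(events, 4, p_owner, p_other), 2))  # 256.0
print(round(unknown_changepoint_lr(events, p_owner, p_other), 2))   # 60.57
```

Averaging over k gives up some evidential strength relative to the ratio at the true changepoint, in exchange for not having to specify that changepoint.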

10.
Psychol Rev ; 130(6): 1566-1591, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37589709

ABSTRACT

Developing an accurate model of another agent's knowledge is central to communication and cooperation between agents. In this article, we propose a hierarchical framework of knowledge assessment that explains how people construct mental models of their own knowledge and the knowledge of others. Our framework posits that people integrate information about their own and others' knowledge via Bayesian inference. To evaluate this claim, we conduct an experiment in which participants repeatedly assess their own performance (a metacognitive task) and the performance of another person (a type of theory of mind task) on the same image classification tasks. We contrast the hierarchical framework with simpler alternatives that assume different degrees of differentiation between mental models of self and others. Our model accurately captures participants' assessment of their own performance and the performance of others in the task: Initially, people rely on their own self-assessment process to reason about the other person's performance, leading to similar self- and other-performance predictions. As more information about the other person's ability becomes available, the mental model for the other person becomes increasingly distinct from the mental model of self. Simulation studies also confirm that our framework explains a wide range of findings about human knowledge assessment of themselves and others. (PsycInfo Database Record (c) 2024 APA, all rights reserved).


Subject(s)
Metacognition; Theory of Mind; Humans; Bayes Theorem; Knowledge; Models, Psychological
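A toy version of the pattern described above, where people first reason about another person through their own self-assessment and then differentiate as evidence accumulates, is a Beta-Binomial update whose prior over the other person's accuracy is centred on one's own accuracy. This conjugate model, the pseudo-count `strength`, and the function name are our simplifications, not the hierarchical model fitted in the paper.

```python
def estimate_other_accuracy(self_accuracy, other_correct, other_trials, strength=10.0):
    """Posterior mean of another person's accuracy under a Beta prior centred on one's own.

    The prior Beta(a, b) has mean self_accuracy and pseudo-count a + b = strength,
    so with no observations the estimate equals one's own accuracy.
    """
    a = strength * self_accuracy
    b = strength * (1.0 - self_accuracy)
    return (a + other_correct) / (a + b + other_trials)


print(round(estimate_other_accuracy(0.8, 0, 0), 3))       # 0.8: no evidence, mirror self
print(round(estimate_other_accuracy(0.8, 3, 10), 3))      # 0.55: partly shifted
print(round(estimate_other_accuracy(0.8, 300, 1000), 3))  # 0.305: evidence dominates
```

Larger `strength` values make the estimate cling to the self-assessment for longer.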
11.
Proc Mach Learn Res ; 219: 128-149, 2023 Aug.
Article in English | MEDLINE | ID: mdl-38707261

ABSTRACT

Survival analysis is a general framework for predicting the time until a specific event occurs, often in the presence of censoring. Although this framework is widely used in practice, few studies to date have considered fairness for time-to-event outcomes, despite recent significant advances in the algorithmic fairness literature more broadly. In this paper, we propose a framework to achieve demographic parity in survival analysis models by minimizing the mutual information between predicted time-to-event and sensitive attributes. We show that our approach effectively minimizes mutual information to encourage statistical independence of time-to-event predictions and sensitive attributes. Furthermore, we propose four types of disparity assessment metrics based on common survival analysis metrics. Through experiments on multiple benchmark datasets, we demonstrate that by minimizing the dependence between the prediction and the sensitive attributes, our method can systematically improve the fairness of survival predictions and is robust to censoring.
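The fairness criterion above rests on the mutual information between predicted time-to-event and a sensitive attribute. The sketch below estimates that quantity after the fact with a plug-in estimator on binned predictions; the equal-width binning, `n_bins`, and the function names are our assumptions, and the estimator the paper minimizes during training is not shown.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate (in nats) of I(X; Y) from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum over observed (x, y) of p(x, y) * log(p(x, y) / (p(x) p(y))).
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def bin_predictions(times, n_bins=4):
    """Map predicted times to equal-width bin indices 0..n_bins-1."""
    lo, hi = min(times), max(times)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal predictions
    return [min(int((t - lo) / width), n_bins - 1) for t in times]


predicted = [2.0, 3.0, 10.0, 11.0, 2.0, 3.0, 10.0, 11.0]
aligned = ["a", "a", "b", "b", "a", "a", "b", "b"]   # group fully determines the bin
balanced = ["a", "b", "a", "b", "a", "b", "a", "b"]  # group unrelated to the bin
bins = bin_predictions(predicted)
print(mutual_information(bins, aligned), math.log(2))  # equal: the bin reveals the group
print(mutual_information(bins, balanced))              # 0.0: parity on this binning
```

An MI of zero means the binned predictions carry no information about the attribute, which is demographic parity at the resolution of the bins.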

12.
medRxiv ; 2023 Feb 10.
Article in English | MEDLINE | ID: mdl-36798344

ABSTRACT

Motivation: Precise identification of cancer cells in patient samples is essential for accurate diagnosis and clinical monitoring but has been a significant challenge in machine learning approaches for cancer precision medicine. In most scenarios, training data are only available with disease annotation at the subject or sample level. Traditional approaches separate the classification process into multiple steps that are optimized independently. Recent methods either focus on predicting sample-level diagnosis without identifying individual pathologic cells or are less effective for identifying heterogeneous cancer cell phenotypes. Results: We developed a generalized end-to-end differentiable model, the Cell Scoring Neural Network (CSNN), which takes the available sample-level training data and predicts both the diagnosis of the testing samples and the identity of the diagnostic cells in the sample, simultaneously. The cell-level density differences between samples are linked to the sample diagnosis, which allows the probabilities of individual cells being diagnostic to be calculated using backpropagation. We applied CSNN to two independent clinical flow cytometry datasets for leukemia diagnosis. In both qualitative and quantitative assessments, CSNN outperformed preexisting neural network modeling approaches for both cancer diagnosis and cell-level classification. Post hoc decision trees and 2D dot plots were generated for interpretation of the identified cancer cells, showing that the identified cell phenotypes match the cancer endotypes observed clinically in patient cohorts. Independent data clustering analysis confirmed the identified cancer cell populations. Availability: The source code of CSNN and datasets used in the experiments are publicly available on GitHub and FlowRepository. Contact: Edgar E. Robles: roblesee@uci.edu and Yu Qian: mqian@jcvi.org. Supplementary information: Supplementary data are available on GitHub and at Bioinformatics online.

13.
Nat Commun ; 14(1): 3822, 2023 Jun 28.
Article in English | MEDLINE | ID: mdl-37380668

ABSTRACT

Climate-driven changes in precipitation amounts and their seasonal variability are expected in many continental-scale regions during the remainder of the 21st century. However, much less is known about future changes in the predictability of seasonal precipitation, an important earth system property relevant for climate adaptation. Here, on the basis of CMIP6 models that capture the present-day teleconnections between seasonal precipitation and previous-season sea surface temperature (SST), we show that climate change is expected to alter the SST-precipitation relationships and thus our ability to predict seasonal precipitation by 2100. Specifically, in the tropics, seasonal precipitation predictability from SSTs is projected to increase throughout the year, except in northern Amazonia during boreal winter. Concurrently, in the extra-tropics predictability is likely to increase in central Asia during boreal spring and winter. The altered predictability, together with enhanced interannual variability of seasonal precipitation, poses new opportunities and challenges for regional water management.

14.
PLoS Genet ; 5(7): e1000573, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19629164

ABSTRACT

Hair follicles undergo recurrent cycling of controlled growth (anagen), regression (catagen), and relative quiescence (telogen) with a defined periodicity. Taking a genomics approach to study gene expression during synchronized mouse hair follicle cycling, we discovered that, in addition to circadian fluctuation, CLOCK-regulated genes are also modulated in phase with the hair growth cycle. During telogen and early anagen, circadian clock genes are prominently expressed in the secondary hair germ, which contains precursor cells for the growing follicle. Analysis of Clock and Bmal1 mutant mice reveals a delay in anagen progression, and the secondary hair germ cells show decreased levels of phosphorylated Rb and lack mitotic cells, suggesting that circadian clock genes regulate anagen progression via their effect on the cell cycle. Consistent with a block at the G1 phase of the cell cycle, we show a significant upregulation of p21 in Bmal1 mutant skin. While circadian clock mechanisms have been implicated in a variety of diurnal biological processes, our findings indicate that circadian clock genes may be utilized to modulate the progression of non-diurnal cyclic processes.


Subject(s)
Hair Follicle/physiology; Hair/growth & development; Skin Physiological Phenomena; ARNTL Transcription Factors; Animals; Basic Helix-Loop-Helix Transcription Factors/genetics; Basic Helix-Loop-Helix Transcription Factors/metabolism; Biological Clocks; CLOCK Proteins; Circadian Rhythm; Gene Expression Profiling; Mice; Mice, Inbred C57BL; Trans-Activators/genetics; Trans-Activators/metabolism
15.
Proc Mach Learn Res ; 162: 5286-5308, 2022 Jul.
Article in English | MEDLINE | ID: mdl-37016636

ABSTRACT

Despite recent advances in algorithmic fairness, methodologies for achieving fairness with generalized linear models (GLMs) have yet to be explored in general, even though GLMs are widely used in practice. In this paper we introduce two fairness criteria for GLMs based on equalizing expected outcomes or log-likelihoods. We prove that for GLMs both criteria can be achieved via a convex penalty term based solely on the linear components of the GLM, thus permitting efficient optimization. We also derive theoretical properties for the resulting fair GLM estimator. To empirically demonstrate the efficacy of the proposed fair GLM, we compare it with other well-known fair prediction methods on an extensive set of benchmark datasets for binary classification and regression. In addition, we demonstrate that the fair GLM can generate fair predictions for a range of response variables beyond binary and continuous outcomes.
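The abstract states that both fairness criteria reduce to a convex penalty on the GLM's linear components. One plausible instance for the expected-outcome criterion, which we assume here rather than take from the paper, is the squared gap between the two groups' mean linear predictors added to a Poisson GLM's negative log-likelihood; the gap is affine in β, so its square, and the penalized objective, stay convex. The sketch fits this by plain gradient descent; `fit_fair_poisson_glm`, the step size, and the toy data are hypothetical.

```python
import math


def fit_fair_poisson_glm(X, y, group, lam, lr=0.02, steps=5000):
    """Gradient descent on mean Poisson NLL (log link) + lam * gap**2.

    gap = mean(X @ beta | group 'a') - mean(X @ beta | group 'b'), the
    difference in average linear predictors, which is affine in beta.
    """
    n, d = len(X), len(X[0])
    idx_a = [i for i in range(n) if group[i] == "a"]
    idx_b = [i for i in range(n) if group[i] == "b"]
    # Group-mean feature vectors; the gap is their difference dotted with beta.
    mean_a = [sum(X[i][j] for i in idx_a) / len(idx_a) for j in range(d)]
    mean_b = [sum(X[i][j] for i in idx_b) / len(idx_b) for j in range(d)]
    diff = [u - v for u, v in zip(mean_a, mean_b)]
    beta = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            mu = math.exp(sum(b * x for b, x in zip(beta, xi)))  # inverse log link
            for j in range(d):
                grad[j] += (mu - yi) * xi[j] / n
        gap = sum(b * c for b, c in zip(beta, diff))
        for j in range(d):
            grad[j] += 2.0 * lam * gap * diff[j]
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta, sum(b * c for b, c in zip(beta, diff))


# Toy counts: intercept column plus one covariate whose distribution differs by group.
X = [[1, 0.0], [1, 0.5], [1, 1.0], [1, 1.0], [1, 1.5], [1, 2.0]]
y = [1, 1, 2, 2, 3, 4]
group = ["a", "a", "a", "b", "b", "b"]
for lam in (0.0, 10.0):
    beta, gap = fit_fair_poisson_glm(X, y, group, lam)
    print(lam, [round(b, 3) for b in beta], round(gap, 3))
```

Raising `lam` shrinks the gap in average linear predictor at some cost in fit; the intercept cancels out of the gap, so only the covariate's coefficient is penalized here.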

16.
Sci Data ; 9(1): 249, 2022 05 30.
Article in English | MEDLINE | ID: mdl-35637186

ABSTRACT

Changing wildfire regimes in the western US and other fire-prone regions pose considerable risks to human health and ecosystem function. However, our understanding of wildfire behavior is still limited by a lack of data products that systematically quantify fire spread, behavior and impacts. Here we develop a novel object-based system for tracking the progression of individual fires using 375 m Visible Infrared Imaging Radiometer Suite active fire detections. At each half-daily time step, fire pixels are clustered according to their spatial proximity, and are either appended to an existing active fire object or are assigned to a new object. This automatic system allows us to update the attributes of each fire event, delineate the fire perimeter, and identify the active fire front shortly after satellite data acquisition. Using this system, we mapped the history of California fires during 2012-2020. Our approach and data stream may be useful for calibration and evaluation of fire spread models, estimation of near-real-time wildfire emissions, and as a means of prescribing initial conditions in fire forecast models.
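The per-time-step update described above, in which each new active-fire pixel is either appended to an existing fire object or starts a new one based on spatial proximity, can be sketched as a greedy nearest-object assignment. The distance threshold `max_dist_km`, the planar kilometre coordinates, and the omission of object merging and perimeter or fire-front updates are our simplifications, not parameters of the published system.

```python
import math
from dataclasses import dataclass, field


@dataclass
class FireObject:
    fire_id: int
    pixels: list = field(default_factory=list)  # (x_km, y_km) detections


def update_fire_objects(fires, new_pixels, max_dist_km=5.0):
    """Append each new detection to the nearest fire within max_dist_km, else start a new fire."""
    next_id = max((f.fire_id for f in fires), default=-1) + 1
    for px in new_pixels:
        best, best_d = None, max_dist_km
        for f in fires:
            d = min(math.dist(px, q) for q in f.pixels)
            if d <= best_d:
                best, best_d = f, d
        if best is None:
            fires.append(FireObject(next_id, [px]))
            next_id += 1
        else:
            best.pixels.append(px)
    return fires


# One time step: two nearby detections form one fire; a distant one starts another.
fires = update_fire_objects([], [(0.0, 0.0), (0.4, 0.0), (50.0, 50.0)])
print([(f.fire_id, len(f.pixels)) for f in fires])  # [(0, 2), (1, 1)]
```

Calling the function again with the next time step's detections extends the same list of objects, which is what lets each fire's attributes be updated over time.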

17.
Bioinformatics ; 26(6): 770-6, 2010 Mar 15.
Article in English | MEDLINE | ID: mdl-20147305

ABSTRACT

MOTIVATION: Time-course gene expression datasets provide important insights into dynamic aspects of biological processes, such as circadian rhythms, cell cycle and organ development. In a typical microarray time-course experiment, measurements are obtained at each time point from multiple replicate samples. Accurately recovering the gene expression patterns from experimental observations is made challenging by both measurement noise and variation among replicates' rates of development. Prior work on this topic has focused on inference of expression patterns assuming that the replicate times are synchronized. We develop a statistical approach that simultaneously infers both (i) the underlying (hidden) expression profile for each gene, as well as (ii) the biological time for each individual replicate. Our approach is based on Gaussian process regression (GPR) combined with a probabilistic model that accounts for uncertainty about the biological development time of each replicate. RESULTS: We apply GPR with uncertain measurement times to a microarray dataset of mRNA expression for the hair-growth cycle in mouse back skin, predicting both profile shapes and biological times for each replicate. The predicted time shifts show high consistency with independently obtained morphological estimates of relative development. We also show that the method systematically reduces prediction error on out-of-sample data, significantly reducing the mean squared error in a cross-validation study. AVAILABILITY: Matlab code for GPR with uncertain time shifts is available at http://sli.ics.uci.edu/Code/GPRTimeshift/ CONTACT: ihler@ics.uci.edu.


Subject(s)
Computational Biology/methods; Oligonucleotide Array Sequence Analysis/methods; Animals; Mice; Models, Statistical
18.
Proc Mach Learn Res ; 146: 159-170, 2021 Mar.
Article in English | MEDLINE | ID: mdl-35372850

ABSTRACT

Dynamic survival analysis is a variant of traditional survival analysis where time-to-event predictions are updated as new information arrives about an individual over time. In this paper we propose a new approach to dynamic survival analysis based on learning a global parametric distribution, followed by individualization via truncating and renormalizing that distribution at different locations over time. We combine this approach with a likelihood-based loss that includes predictions at every time step within an individual's history, rather than just including one term per individual. The combination of this loss and model results in an interpretable approach to dynamic survival, requiring less fine tuning than existing methods, while still achieving good predictive performance. We evaluate the approach on the problem of predicting hospital mortality for a dataset with over 6900 COVID-19 patients.
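Truncating and renormalizing a global distribution amounts to conditioning it on survival past a location on its time axis: if S(t) is the population survival function and an individual is placed at location u, their survival over the next Δ time units is S(u + Δ) / S(u). In the sketch below the Weibull family, its parameter values, and passing `u` directly as an argument are illustrative assumptions; in the paper, where each individual sits on the distribution is learned from their history.

```python
import math


def weibull_survival(t, shape=1.5, scale=10.0):
    """Population survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def individual_survival(delta, u, shape=1.5, scale=10.0):
    """P(T > u + delta | T > u): S truncated at location u and renormalized."""
    return weibull_survival(u + delta, shape, scale) / weibull_survival(u, shape, scale)


# Same population curve, two individuals placed at different locations u.
early, late = individual_survival(3.0, u=1.0), individual_survival(3.0, u=8.0)
print(round(early, 3), round(late, 3))  # late < early because the hazard rises when shape > 1
```

With `shape=1` the Weibull reduces to an exponential, which is memoryless, so the location would no longer affect the prediction; a rising hazard is what makes the placement informative.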

19.
ArXiv ; 2021 May 10.
Article in English | MEDLINE | ID: mdl-34012993

ABSTRACT

In data collection for predictive modeling, under-representation of certain groups, based on gender, race/ethnicity, or age, may yield less-accurate predictions for these groups. Recently, this issue of fairness in predictions has attracted significant attention, as data-driven models are increasingly utilized to perform crucial decision-making tasks. Existing methods to achieve fairness in the machine learning literature typically build a single prediction model in a manner that encourages fair prediction performance for all groups. These approaches have two major limitations: i) fairness is often achieved by compromising accuracy for some groups; ii) the underlying relationship between dependent and independent variables may not be the same across groups. We propose a Joint Fairness Model (JFM) approach for logistic regression models for binary outcomes that estimates group-specific classifiers using a joint modeling objective function that incorporates fairness criteria for prediction. We introduce an Accelerated Smoothing Proximal Gradient Algorithm to solve the convex objective function, and present the key asymptotic properties of the JFM estimates. Through simulations, we demonstrate the efficacy of the JFM in achieving good prediction performance and across-group parity, in comparison with the single fairness model, group-separate model, and group-ignorant model, especially when the minority group's sample size is small. Finally, we demonstrate the utility of the JFM method in a real-world example to obtain fair risk predictions for under-represented older patients diagnosed with coronavirus disease 2019 (COVID-19).

20.
Proc Mach Learn Res ; 149: 648-673, 2021 Aug.
Article in English | MEDLINE | ID: mdl-35425906

ABSTRACT

The widespread availability of high-dimensional electronic healthcare record (EHR) datasets has led to significant interest in using such data to derive clinical insights and make risk predictions. More specifically, techniques from machine learning are being increasingly applied to the problem of dynamic survival analysis, where updated time-to-event risk predictions are learned as a function of the full covariate trajectory from EHR datasets. EHR data presents unique challenges in the context of dynamic survival analysis, involving a variety of decisions about data representation, modeling, interpretability, and clinically meaningful evaluation. In this paper we propose a new approach to dynamic survival analysis which addresses some of these challenges. Our modeling approach is based on learning a global parametric distribution to represent population characteristics and then dynamically locating individuals on the time-axis of this distribution conditioned on their histories. For evaluation we also propose a new version of the dynamic C-Index for clinically meaningful evaluation of dynamic survival models. To validate our approach we conduct dynamic risk prediction on three real-world datasets, involving COVID-19 severe outcomes, cardiovascular disease (CVD) onset, and primary biliary cirrhosis (PBC) time-to-transplant. We find that our proposed modeling approach is competitive with other well-known statistical and machine learning approaches for dynamic risk prediction, while offering potential advantages in terms of interpretability of predictions at the individual level.
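The paper's new dynamic C-Index is not reproduced here. For orientation, the sketch below computes the standard static Harrell's concordance index that dynamic variants extend: among comparable pairs, meaning pairs where the subject with the shorter observed time actually had the event, it is the fraction in which that subject also received the higher predicted risk, with ties in risk counted as one half. The function name is ours.

```python
def harrell_c_index(times, events, risks):
    """Harrell's C: concordance between risk scores and observed survival times.

    times: observed times; events: 1 if the event occurred, 0 if censored;
    risks: higher value means predicted to fail sooner.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


times, events = [2, 4, 6, 8], [1, 1, 0, 1]
print(harrell_c_index(times, events, [0.9, 0.7, 0.4, 0.1]))  # 1.0: risks perfectly ordered
print(harrell_c_index(times, events, [0.1, 0.4, 0.7, 0.9]))  # 0.0: risks fully reversed
```

A dynamic version has to decide, at each prediction time, which subjects are still at risk and over what horizon pairs are compared, which is where clinically motivated choices enter.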
