Results 1 - 20 of 37
1.
J Forensic Sci ; 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38558223

ABSTRACT

We investigate likelihood ratio models motivated by digital forensics problems involving time-stamped user-generated event data from a device or account. Of specific interest are scenarios where the data may have been generated by a single individual (the device/account owner) or by two different individuals (the device/account owner and someone else), such as instances in which an account was hacked or a device was stolen before being associated with a crime. Existing likelihood ratio methods in this context require that a precise time is specified at which the device or account is purported to have changed hands (the changepoint); this is the known changepoint likelihood ratio model. In this paper, we develop a likelihood ratio model that instead accommodates uncertainty in the changepoint using Bayesian techniques, that is, an unknown changepoint likelihood ratio model. We show that the likelihood ratio in this case can be calculated in closed form as an expression that is straightforward to compute. In experiments with simulated changepoints using real-world data sets, the results demonstrate that the unknown changepoint model attains comparable performance to the known changepoint model that uses a perfectly specified changepoint, and considerably outperforms the known changepoint model that uses a misspecified changepoint, illustrating the benefit of capturing uncertainty in the changepoint.
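
As a rough illustration of how uncertainty in the changepoint can be averaged out, the sketch below marginalizes the two-individual likelihood over a discrete grid of candidate changepoints under a prior. It is not the closed-form expression derived in the paper. The function names, the discrete grid, the uniform default prior, and the choice to place the single-individual hypothesis in the numerator are all assumptions made for illustration.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_lr_unknown_changepoint(log_lik_same, log_lik_diff_by_tau, log_prior_tau=None):
    """Log likelihood ratio with the changepoint tau integrated out.

    log_lik_same:        log p(data | one individual generated all events)
    log_lik_diff_by_tau: log p(data | two individuals, changepoint tau),
                         one entry per candidate tau (hypothetical grid)
    log_prior_tau:       log prior over the candidates (uniform if None)
    """
    n = len(log_lik_diff_by_tau)
    if log_prior_tau is None:
        log_prior_tau = [-math.log(n)] * n
    # Marginal likelihood under the two-individual hypothesis:
    # sum over tau of p(tau) * p(data | tau), computed in log space.
    log_marginal_diff = log_sum_exp(
        [lp + ll for lp, ll in zip(log_prior_tau, log_lik_diff_by_tau)]
    )
    return log_lik_same - log_marginal_diff
```

A known-changepoint model corresponds to a prior that puts all its mass on one candidate; spreading the prior over several candidates is what lets the ratio tolerate a misspecified changepoint.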

2.
Bioinformatics ; 39(10)2023 10 03.
Article in English | MEDLINE | ID: mdl-37756695

ABSTRACT

MOTIVATION: Precise identification of cancer cells in patient samples is essential for accurate diagnosis and clinical monitoring but has been a significant challenge in machine learning approaches for cancer precision medicine. In most scenarios, training data are only available with disease annotation at the subject or sample level. Traditional approaches separate the classification process into multiple steps that are optimized independently. Recent methods either focus on predicting sample-level diagnosis without identifying individual pathologic cells or are less effective for identifying heterogeneous cancer cell phenotypes. RESULTS: We developed a generalized end-to-end differentiable model, the Cell Scoring Neural Network (CSNN), which takes sample-level training data and predicts the diagnosis of the testing samples and the identity of the diagnostic cells in the sample, simultaneously. The cell-level density differences between samples are linked to the sample diagnosis, which allows the probabilities of individual cells being diagnostic to be calculated using backpropagation. We applied CSNN to two independent clinical flow cytometry datasets for leukemia diagnosis. In both qualitative and quantitative assessments, CSNN outperformed preexisting neural network modeling approaches for both cancer diagnosis and cell-level classification. Post hoc decision trees and 2D dot plots were generated for interpretation of the identified cancer cells, showing that the identified cell phenotypes match the cancer endotypes observed clinically in patient cohorts. Independent data clustering analysis confirmed the identified cancer cell populations. AVAILABILITY AND IMPLEMENTATION: The source code of CSNN and datasets used in the experiments are publicly available on GitHub (http://github.com/erobl/csnn). Raw FCS files can be downloaded from FlowRepository (ID: FR-FCM-Z6YK).


Subjects
Hematologic Neoplasms; Neoplasms; Humans; Neural Networks, Computer; Neoplasms/diagnosis; Flow Cytometry/methods; Software
3.
Psychol Rev ; 130(6): 1566-1591, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37589709

ABSTRACT

Developing an accurate model of another agent's knowledge is central to communication and cooperation between agents. In this article, we propose a hierarchical framework of knowledge assessment that explains how people construct mental models of their own knowledge and the knowledge of others. Our framework posits that people integrate information about their own and others' knowledge via Bayesian inference. To evaluate this claim, we conduct an experiment in which participants repeatedly assess their own performance (a metacognitive task) and the performance of another person (a type of theory of mind task) on the same image classification tasks. We contrast the hierarchical framework with simpler alternatives that assume different degrees of differentiation between mental models of self and others. Our model accurately captures participants' assessment of their own performance and the performance of others in the task: Initially, people rely on their own self-assessment process to reason about the other person's performance, leading to similar self- and other-performance predictions. As more information about the other person's ability becomes available, the mental model for the other person becomes increasingly distinct from the mental model of self. Simulation studies also confirm that our framework explains a wide range of findings about human knowledge assessment of themselves and others. (PsycInfo Database Record (c) 2024 APA, all rights reserved).


Subjects
Metacognition; Theory of Mind; Humans; Bayes Theorem; Knowledge; Models, Psychological
5.
Nat Commun ; 14(1): 3822, 2023 Jun 28.
Article in English | MEDLINE | ID: mdl-37380668

ABSTRACT

Climate-driven changes in precipitation amounts and their seasonal variability are expected in many continental-scale regions during the remainder of the 21st century. However, much less is known about future changes in the predictability of seasonal precipitation, an important earth system property relevant for climate adaptation. Here, on the basis of CMIP6 models that capture the present-day teleconnections between seasonal precipitation and previous-season sea surface temperature (SST), we show that climate change is expected to alter the SST-precipitation relationships and thus our ability to predict seasonal precipitation by 2100. Specifically, in the tropics, seasonal precipitation predictability from SSTs is projected to increase throughout the year, except the northern Amazonia during boreal winter. Concurrently, in the extra-tropics predictability is likely to increase in central Asia during boreal spring and winter. The altered predictability, together with enhanced interannual variability of seasonal precipitation, poses new opportunities and challenges for regional water management.

6.
medRxiv ; 2023 Feb 10.
Article in English | MEDLINE | ID: mdl-36798344

ABSTRACT

Motivation: Precise identification of cancer cells in patient samples is essential for accurate diagnosis and clinical monitoring but has been a significant challenge in machine learning approaches for cancer precision medicine. In most scenarios, training data are only available with disease annotation at the subject or sample level. Traditional approaches separate the classification process into multiple steps that are optimized independently. Recent methods either focus on predicting sample-level diagnosis without identifying individual pathologic cells or are less effective for identifying heterogeneous cancer cell phenotypes. Results: We developed a generalized end-to-end differentiable model, the Cell Scoring Neural Network (CSNN), which takes the available sample-level training data and predicts both the diagnosis of the testing samples and the identity of the diagnostic cells in the sample, simultaneously. The cell-level density differences between samples are linked to the sample diagnosis, which allows the probabilities of individual cells being diagnostic to be calculated using backpropagation. We applied CSNN to two independent clinical flow cytometry datasets for leukemia diagnosis. In both qualitative and quantitative assessments, CSNN outperformed preexisting neural network modeling approaches for both cancer diagnosis and cell-level classification. Post hoc decision trees and 2D dot plots were generated for interpretation of the identified cancer cells, showing that the identified cell phenotypes match the cancer endotypes observed clinically in patient cohorts. Independent data clustering analysis confirmed the identified cancer cell populations. Availability: The source code of CSNN and datasets used in the experiments are publicly available on GitHub and FlowRepository. Contact: Edgar E. Robles: roblesee@uci.edu and Yu Qian: mqian@jcvi.org. Supplementary information: Supplementary data are available on GitHub and at Bioinformatics online.

7.
Biometrics ; 79(2): 826-840, 2023 06.
Article in English | MEDLINE | ID: mdl-35142367

ABSTRACT

In data collection for predictive modeling, underrepresentation of certain groups, based on gender, race/ethnicity, or age, may yield less accurate predictions for these groups. Recently, this issue of fairness in predictions has attracted significant attention, as data-driven models are increasingly utilized to perform crucial decision-making tasks. Existing methods to achieve fairness in the machine learning literature typically build a single prediction model in a manner that encourages fair prediction performance for all groups. These approaches have two major limitations: (i) fairness is often achieved by compromising accuracy for some groups; (ii) the underlying relationship between dependent and independent variables may not be the same across groups. We propose a joint fairness model (JFM) approach for logistic regression models for binary outcomes that estimates group-specific classifiers using a joint modeling objective function that incorporates fairness criteria for prediction. We introduce an accelerated smoothing proximal gradient algorithm to solve the convex objective function, and present the key asymptotic properties of the JFM estimates. Through simulations, we demonstrate the efficacy of the JFM in achieving good prediction performance and across-group parity, in comparison with the single fairness model, group-separate model, and group-ignorant model, especially when the minority group's sample size is small. Finally, we demonstrate the utility of the JFM method in a real-world example to obtain fair risk predictions for underrepresented older patients diagnosed with coronavirus disease 2019 (COVID-19).


Subjects
COVID-19; Humans; Logistic Models; Algorithms
8.
Proc Mach Learn Res ; 219: 128-149, 2023 Aug.
Article in English | MEDLINE | ID: mdl-38707261

ABSTRACT

Survival analysis is a general framework for predicting the time until a specific event occurs, often in the presence of censoring. Although this framework is widely used in practice, few studies to date have considered fairness for time-to-event outcomes, despite recent significant advances in the algorithmic fairness literature more broadly. In this paper, we propose a framework to achieve demographic parity in survival analysis models by minimizing the mutual information between predicted time-to-event and sensitive attributes. We show that our approach effectively minimizes mutual information to encourage statistical independence of time-to-event predictions and sensitive attributes. Furthermore, we propose four types of disparity assessment metrics based on common survival analysis metrics. Through experiments on multiple benchmark datasets, we demonstrate that by minimizing the dependence between the prediction and the sensitive attributes, our method can systematically improve the fairness of survival predictions and is robust to censoring.
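
To make the quantity being minimized concrete, the sketch below computes the empirical mutual information between a discretized prediction and a sensitive attribute. It only measures dependence on already-binned values; it is not the training objective or estimator used in the paper, and the function name and the binning of predictions into discrete labels are assumptions.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences.

    xs: e.g. binned time-to-event predictions (hypothetical binning)
    ys: e.g. sensitive-attribute labels, same length as xs
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) )
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi
```

A value of zero means the binned predictions carry no information about the attribute in the sample, which is the statistical-independence target that demographic parity refers to.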

9.
Sci Data ; 9(1): 249, 2022 05 30.
Article in English | MEDLINE | ID: mdl-35637186

ABSTRACT

Changing wildfire regimes in the western US and other fire-prone regions pose considerable risks to human health and ecosystem function. However, our understanding of wildfire behavior is still limited by a lack of data products that systematically quantify fire spread, behavior and impacts. Here we develop a novel object-based system for tracking the progression of individual fires using 375 m Visible Infrared Imaging Radiometer Suite active fire detections. At each half-daily time step, fire pixels are clustered according to their spatial proximity, and are either appended to an existing active fire object or are assigned to a new object. This automatic system allows us to update the attributes of each fire event, delineate the fire perimeter, and identify the active fire front shortly after satellite data acquisition. Using this system, we mapped the history of California fires during 2012-2020. Our approach and data stream may be useful for calibration and evaluation of fire spread models, estimation of near-real-time wildfire emissions, and as means for prescribing initial conditions in fire forecast models.
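
The proximity-clustering step described above can be sketched as a connected-components search over pixel coordinates. This is a minimal illustration, not the authors' system: the function name, the use of planar (x, y) coordinates, and the distance threshold are assumptions, and the step that appends clusters to existing fire objects is not shown.

```python
import math

def cluster_by_proximity(points, max_dist):
    """Group (x, y) points into clusters of mutually reachable neighbours.

    Two points end up in the same cluster if a chain of points links them,
    with each consecutive pair no more than max_dist apart.
    Returns a sorted list of sorted index lists.
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            # Collect every unvisited point close enough to point i.
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= max_dist]
            for j in near:
                unvisited.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(sorted(cluster))
    return sorted(clusters)
```

The pairwise search is quadratic in the number of pixels per time step; a spatial index would be the natural replacement for large detection sets.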

10.
Proc Natl Acad Sci U S A ; 119(11): e2111547119, 2022 03 15.
Article in English | MEDLINE | ID: mdl-35275788

ABSTRACT

Significance: With the increase in artificial intelligence in real-world applications, there is interest in building hybrid systems that take both human and machine predictions into account. Previous work has shown the benefits of separately combining the predictions of diverse machine classifiers or groups of people. Using a Bayesian modeling framework, we extend these results by systematically investigating the factors that influence the performance of hybrid combinations of human and machine classifiers while taking into account the unique ways human and algorithmic confidence is expressed.


Subjects
Artificial Intelligence; Bayes Theorem; Humans
11.
Proc Mach Learn Res ; 162: 5286-5308, 2022 Jul.
Article in English | MEDLINE | ID: mdl-37016636

ABSTRACT

Despite recent advances in algorithmic fairness, methodologies for achieving fairness with generalized linear models (GLMs) have yet to be explored in general, despite GLMs being widely used in practice. In this paper we introduce two fairness criteria for GLMs based on equalizing expected outcomes or log-likelihoods. We prove that for GLMs both criteria can be achieved via a convex penalty term based solely on the linear components of the GLM, thus permitting efficient optimization. We also derive theoretical properties for the resulting fair GLM estimator. To empirically demonstrate the efficacy of the proposed fair GLM, we compare it with other well-known fair prediction methods on an extensive set of benchmark datasets for binary classification and regression. In addition, we demonstrate that the fair GLM can generate fair predictions for a range of response variables, other than binary and continuous outcomes.

12.
Nat Clim Chang ; 11: 143-151, 2021 Feb 01.
Article in English | MEDLINE | ID: mdl-34163539

ABSTRACT

Future changes in the position of the intertropical convergence zone (ITCZ; a narrow band of heavy precipitation in the tropics) with climate change could affect the livelihood and food security of billions of people. Although models predict a future narrowing of the ITCZ, uncertainties remain large regarding its future position, with most past work focusing on zonal-mean shifts. Here we use projections from 27 state-of-the-art (CMIP6) climate models and document a robust zonally-varying ITCZ response to the SSP3-7.0 scenario by 2100, with a northward shift over eastern Africa and the Indian Ocean, and a southward shift in the eastern Pacific and Atlantic Oceans. The zonally-varying response is consistent with changes in the divergent atmospheric energy transport, and sector-mean shifts of the energy flux equator. Our analysis provides insight about mechanisms influencing the future position of the tropical rainbelt, and may allow for more robust projections of climate change impacts.

13.
ArXiv ; 2021 May 10.
Article in English | MEDLINE | ID: mdl-34012993

ABSTRACT

In data collection for predictive modeling, under-representation of certain groups, based on gender, race/ethnicity, or age, may yield less-accurate predictions for these groups. Recently, this issue of fairness in predictions has attracted significant attention, as data-driven models are increasingly utilized to perform crucial decision-making tasks. Existing methods to achieve fairness in the machine learning literature typically build a single prediction model in a manner that encourages fair prediction performance for all groups. These approaches have two major limitations: i) fairness is often achieved by compromising accuracy for some groups; ii) the underlying relationship between dependent and independent variables may not be the same across groups. We propose a Joint Fairness Model (JFM) approach for logistic regression models for binary outcomes that estimates group-specific classifiers using a joint modeling objective function that incorporates fairness criteria for prediction. We introduce an Accelerated Smoothing Proximal Gradient Algorithm to solve the convex objective function, and present the key asymptotic properties of the JFM estimates. Through simulations, we demonstrate the efficacy of the JFM in achieving good prediction performance and across-group parity, in comparison with the single fairness model, group-separate model, and group-ignorant model, especially when the minority group's sample size is small. Finally, we demonstrate the utility of the JFM method in a real-world example to obtain fair risk predictions for under-represented older patients diagnosed with coronavirus disease 2019 (COVID-19).

14.
Patient Educ Couns ; 104(8): 2098-2105, 2021 08.
Article in English | MEDLINE | ID: mdl-33468364

ABSTRACT

OBJECTIVE: Train machine learning models that automatically predict emotional valence of patient and physician in primary care visits. METHODS: Using transcripts from 353 primary care office visits with 350 patients and 84 physicians (Cook, 2002 [1], Tai-Seale et al., 2015 [2]), we developed two machine learning models (a recurrent neural network with a hierarchical structure and a logistic regression classifier) to recognize the emotional valence (positive, negative, neutral) (Posner et al., 2005 [3]) of each utterance. We examined the agreement of human-generated ratings of emotional valence with machine learning model ratings of emotion. RESULTS: The agreement of emotion ratings from the recurrent neural network model with human ratings was comparable to that of human-human inter-rater agreement. The weighted-average of the correlation coefficients for the recurrent neural network model with human raters was 0.60, and the human rater agreement was also 0.60. CONCLUSIONS: The recurrent neural network model predicted the emotional valence of patients and physicians in primary care visits with similar reliability as human raters. PRACTICE IMPLICATIONS: As the first machine learning-based evaluation of emotion recognition in primary care visit conversations, our work provides valuable baselines for future applications that might help monitor patient emotional signals, supporting physicians in empathic communication, or examining the role of emotion in patient-centered care.


Subjects
Emotions; Physicians; Communication; Humans; Office Visits; Primary Health Care; Reproducibility of Results
15.
Proc Mach Learn Res ; 149: 648-673, 2021 Aug.
Article in English | MEDLINE | ID: mdl-35425906

ABSTRACT

The widespread availability of high-dimensional electronic healthcare record (EHR) datasets has led to significant interest in using such data to derive clinical insights and make risk predictions. More specifically, techniques from machine learning are being increasingly applied to the problem of dynamic survival analysis, where updated time-to-event risk predictions are learned as a function of the full covariate trajectory from EHR datasets. EHR data presents unique challenges in the context of dynamic survival analysis, involving a variety of decisions about data representation, modeling, interpretability, and clinically meaningful evaluation. In this paper we propose a new approach to dynamic survival analysis which addresses some of these challenges. Our modeling approach is based on learning a global parametric distribution to represent population characteristics and then dynamically locating individuals on the time-axis of this distribution conditioned on their histories. For evaluation we also propose a new version of the dynamic C-Index for clinically meaningful evaluation of dynamic survival models. To validate our approach we conduct dynamic risk prediction on three real-world datasets, involving COVID-19 severe outcomes, cardiovascular disease (CVD) onset, and primary biliary cirrhosis (PBC) time-to-transplant. We find that our proposed modeling approach is competitive with other well-known statistical and machine learning approaches for dynamic risk prediction, while offering potential advantages in terms of interpretability of predictions at the individual level.

16.
Proc Mach Learn Res ; 146: 159-170, 2021 Mar.
Article in English | MEDLINE | ID: mdl-35372850

ABSTRACT

Dynamic survival analysis is a variant of traditional survival analysis where time-to-event predictions are updated as new information arrives about an individual over time. In this paper we propose a new approach to dynamic survival analysis based on learning a global parametric distribution, followed by individualization via truncating and renormalizing that distribution at different locations over time. We combine this approach with a likelihood-based loss that includes predictions at every time step within an individual's history, rather than just including one term per individual. The combination of this loss and model results in an interpretable approach to dynamic survival, requiring less fine tuning than existing methods, while still achieving good predictive performance. We evaluate the approach on the problem of predicting hospital mortality for a dataset with over 6900 COVID-19 patients.
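
The "truncate and renormalize" idea can be illustrated with a conditional survival function: given that an individual is located at position s on the time axis, the remaining survival curve is S(t) / S(s). The sketch below is an assumption-laden illustration, not the paper's model: the Weibull family, the parameter names, and treating the location s as a given number rather than something learned from the individual's history are all choices made here.

```python
import math

def weibull_survival(t, scale, shape):
    """Global survival function S(t) = exp(-(t / scale) ** shape), t >= 0."""
    return math.exp(-((t / scale) ** shape))

def conditional_survival(t, s, scale, shape):
    """P(T > t | T > s): the global curve truncated at s and renormalized.

    s is the individual's current location on the global time axis
    (hypothetical input; a model would infer it from the history).
    """
    if t <= s:
        return 1.0
    return weibull_survival(t, scale, shape) / weibull_survival(s, scale, shape)
```

Because every individual shares the same global curve and differs only in s, a prediction can be read directly as "where this person sits on the population timeline".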

17.
J Adv Model Earth Syst ; 12(9): e2019MS001955, 2020 Sep.
Article in English | MEDLINE | ID: mdl-33042387

ABSTRACT

Fire emissions of gases and aerosols alter atmospheric composition and have substantial impacts on climate, ecosystem function, and human health. Warming climate and human expansion in fire-prone landscapes exacerbate fire impacts and call for more effective management tools. Here we developed a global fire forecasting system that predicts monthly emissions using past fire data and climate variables for lead times of 1 to 6 months. Using monthly fire emissions from the Global Fire Emissions Database (GFED) as the prediction target, we fit a statistical time series model, the Autoregressive Integrated Moving Average model with eXogenous variables (ARIMAX), in over 1,300 different fire regions. Optimized parameters were then used to forecast future emissions. The forecast system took into account information about region-specific seasonality, long-term trends, recent fire observations, and climate drivers representing both large-scale climate variability and local fire weather. We cross-validated the forecast skill of the system with different combinations of predictors and forecast lead times. The reference model, which combined endogenous and exogenous predictors with a 1 month forecast lead time, explained 52% of the variability in the global fire emissions anomaly, considerably exceeding the performance of a reference model that assumed persistent emissions during the forecast period. The system also successfully resolved detailed spatial patterns of fire emissions anomalies in regions with significant fire activity. This study bridges the gap between the efforts of near-real-time fire forecasts and seasonal fire outlooks and represents a step toward establishing an operational global fire, smoke, and carbon cycle forecasting system.

18.
J Clim ; 34(2): 737-754, 2020 Dec 23.
Article in English | MEDLINE | ID: mdl-34045793

ABSTRACT

Understanding the physical drivers of seasonal hydroclimatic variability and improving predictive skill remains a challenge with important socioeconomic and environmental implications for many regions around the world. Physics-based deterministic models show limited ability to predict precipitation as the lead time increases, due to imperfect representation of physical processes and incomplete knowledge of initial conditions. Similarly, statistical methods drawing upon established climate teleconnections have low prediction skill due to the complex nature of the climate system. Recently, promising data-driven approaches have been proposed, but they often suffer from overparameterization and overfitting due to the short observational record, and they often do not account for spatiotemporal dependencies among covariates (i.e., predictors such as sea surface temperatures). This study addresses these challenges via a predictive model based on a graph-guided regularizer that simultaneously promotes similarity of predictive weights for highly correlated covariates and enforces sparsity in the covariate domain. This approach both decreases the effective dimensionality of the problem and identifies the most predictive features without specifying them a priori. We use large ensemble simulations from a climate model to construct this regularizer, reducing the structural uncertainty in the estimation. We apply the learned model to predict winter precipitation in the southwestern United States using sea surface temperatures over the entire Pacific basin, and demonstrate its superiority compared to other regularization approaches and statistical models informed by known teleconnections. Our results highlight the potential to combine optimally the space-time structure of predictor variables learned from climate models with new graph-based regularizers to improve seasonal prediction.

19.
Cytometry A ; 97(3): 296-307, 2020 03.
Article in English | MEDLINE | ID: mdl-31691488

ABSTRACT

High-throughput single-cell cytometry technologies have significantly improved our understanding of cellular phenotypes to support translational research and the clinical diagnosis of hematological and immunological diseases. However, subjective and ad hoc manual gating analysis does not adequately handle the increasing volume and heterogeneity of cytometry data for optimal diagnosis. Prior work has shown that machine learning can be applied to classify cytometry samples effectively. However, many of the machine learning classification results are either difficult to interpret without using characteristics of cell populations to make the classification, or suboptimal due to the use of inaccurate cell population characteristics derived from gating boundaries. To date, little has been done to optimize both the gating boundaries and the diagnostic accuracy simultaneously. In this work, we describe a fully discriminative machine learning approach that can simultaneously learn feature representations (e.g., combinations of coordinates of gating boundaries) and classifier parameters for optimizing clinical diagnosis from cytometry measurements. The approach starts from an initial gating position and then refines the position of the gating boundaries by gradient descent until a set of globally optimized gates across different samples is achieved. The learning procedure is constrained by regularization terms encoding domain knowledge that encourage the algorithm to seek interpretable results. We evaluate the proposed approach using both simulated and real data, producing classification results on par with those generated via human expertise, in terms of both the positions of the gating boundaries and the diagnostic accuracy. © 2019 The Authors. Cytometry Part A published by Wiley Periodicals, Inc. on behalf of International Society for Advancement of Cytometry.


Subjects
Algorithms; Machine Learning; Flow Cytometry; Humans
20.
J Am Med Inform Assoc ; 26(12): 1493-1504, 2019 12 01.
Article in English | MEDLINE | ID: mdl-31532490

ABSTRACT

OBJECTIVE: Amid electronic health records, laboratory tests, and other technology, office-based patient and provider communication is still the heart of primary medical care. Patients typically present multiple complaints, requiring physicians to decide how to balance competing demands. How this time is allocated has implications for patient satisfaction, payments, and quality of care. We investigate the effectiveness of machine learning methods for automated annotation of medical topics in patient-provider dialog transcripts. MATERIALS AND METHODS: We used dialog transcripts from 279 primary care visits to predict talk-turn topic labels. Different machine learning models were trained to operate on single or multiple local talk-turns (logistic classifiers, support vector machines, gated recurrent units) as well as sequential models that integrate information across talk-turn sequences (conditional random fields, hidden Markov models, and hierarchical gated recurrent units). RESULTS: Evaluation was performed using cross-validation to measure 1) classification accuracy for talk-turns and 2) precision, recall, and F1 scores at the visit level. Experimental results showed that sequential models had higher classification accuracy at the talk-turn level and higher precision at the visit level. Independent models had higher recall scores at the visit level compared with sequential models. CONCLUSIONS: Incorporating sequential information across talk-turns improves the accuracy of topic prediction in patient-provider dialog by smoothing out noisy information from talk-turns. Although the results are promising, more advanced prediction techniques and larger labeled datasets will likely be required to achieve prediction performance appropriate for real-world clinical applications.


Subjects
Communication; Machine Learning; Natural Language Processing; Neural Networks, Computer; Physician-Patient Relations; Aged; Datasets as Topic; Humans; Medical Records; Middle Aged; Office Visits; Primary Health Care; Tape Recording
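
For the visit-level precision, recall, and F1 scores mentioned in the abstract above, one plausible reading is a set comparison between the topics predicted for a visit and the topics actually discussed. The sketch below follows that reading; the set-based definition, the function name, and the topic labels in the usage note are assumptions rather than the paper's evaluation code.

```python
def precision_recall_f1(predicted, actual):
    """Precision, recall, and F1 for one visit's predicted vs. actual topics."""
    predicted, actual = set(predicted), set(actual)
    true_pos = len(predicted & actual)
    # Guard against empty sets to avoid division by zero.
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

For example, predicting the hypothetical topics {"diet", "pain", "sleep", "meds"} for a visit that covered only {"diet", "pain"} gives full recall but a precision of 0.5, the kind of trade-off the abstract reports between independent and sequential models.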