Results 1 - 20 of 964
1.
Proc Natl Acad Sci U S A ; 121(33): e2408731121, 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39106305

ABSTRACT

AI is now an integral part of everyday decision-making, assisting us in both routine and high-stakes choices. These AI models often learn from human behavior, assuming this training data is unbiased. However, we report five studies that show that people change their behavior to instill desired routines into AI, indicating this assumption is invalid. To show this behavioral shift, we recruited participants to play the ultimatum game, where they were asked to decide whether to accept proposals of monetary splits made by either other human participants or AI. Some participants were informed their choices would be used to train an AI proposer, while others did not receive this information. Across five experiments, we found that people modified their behavior to train AI to make fair proposals, regardless of whether they could directly benefit from the AI training. After completing this task once, participants were invited to complete this task again but were told their responses would not be used for AI training. People who had previously trained AI persisted with this behavioral shift, indicating that the new behavioral routine had become habitual. This work demonstrates that using human behavior as training data has more consequences than previously thought, since it can lead AI to perpetuate human biases and cause people to form habits that deviate from how they would normally act. Therefore, this work underscores a problem for AI algorithms that aim to learn unbiased representations of human preferences.


Subjects
Artificial Intelligence, Decision Making, Humans, Decision Making/physiology, Male, Female, Adult, Choice Behavior/physiology, Young Adult
2.
Proc Natl Acad Sci U S A ; 120(38): e2301781120, 2023 09 19.
Article in English | MEDLINE | ID: mdl-37695896

ABSTRACT

Across many cultural contexts, women perform the majority of household labor. This gendered distribution of labor is often unequal and thus represents one of the most frequently experienced forms of daily inequality because it occurs within one's own home. Young children are often passive observers of their family's distribution of labor, and yet little is known about the developmental onset of their perceptions of it. By the preschool age, children also show strong normative feelings about both equal resource distribution and gender stereotypes. To investigate the developmental onset of children's recognition of the (in)equality of household labor, we interviewed 3- to 10-y-old children in two distinct cultural contexts (US and China) and surveyed their caregivers about who does more household labor across a variety of tasks. Even at the youngest ages and in both cultural contexts, children's reports largely matched their parents', with both populations reporting that mothers do the majority of household labor. Both children and parents judged this to be generally fair, suggesting that children are observant of the gendered distribution of labor within their households and show normalization of inequality from a young age. Our results point to the preschool years as a critical developmental period during which it is important to have parent-child discussions about structural constraints surrounding gender norms and household labor.


Subjects
Cross-Cultural Comparison, Gender Equity, Gender Role, Work, Preschool Child, Female, Humans, Asian People, China, East Asian People, Emotions, Child, United States, Gender Equity/ethnology, Gender Equity/psychology, Social Norms/ethnology, Work/psychology, Housekeeping, Family Characteristics/ethnology
3.
Proc Natl Acad Sci U S A ; 120(18): e2213709120, 2023 05 02.
Article in English | MEDLINE | ID: mdl-37094137

ABSTRACT

The philosopher John Rawls proposed the Veil of Ignorance (VoI) as a thought experiment to identify fair principles for governing a society. Here, we apply the VoI to an important governance domain: artificial intelligence (AI). In five incentive-compatible studies (N = 2,508), including two preregistered protocols, participants choose principles to govern an AI assistant from behind the veil: that is, without knowledge of their own relative position in the group. Compared to participants who have this information, we find a consistent preference for a principle that instructs the AI assistant to prioritize the worst-off. Neither risk attitudes nor political preferences adequately explain these choices. Instead, they appear to be driven by elevated concerns about fairness: without prompting, participants who reason behind the VoI more frequently explain their choice in terms of fairness, compared to those in the control condition. Moreover, we find initial support for the ability of the VoI to elicit more robust preferences: in the studies presented here, the VoI increases the likelihood of participants continuing to endorse their initial choice in a subsequent round where they know how they will be affected by the AI intervention and have a self-interested motivation to change their mind. These results emerge in both a descriptive and an immersive game. Our findings suggest that the VoI may be a suitable mechanism for selecting distributive principles to govern AI.


Subjects
Artificial Intelligence, Societies, Humans, Social Justice
4.
Proc Natl Acad Sci U S A ; 120(9): e2204781120, 2023 02 28.
Article in English | MEDLINE | ID: mdl-36827260

ABSTRACT

Machine learning (ML) techniques are increasingly prevalent in education, from their use in predicting student dropout to assisting in university admissions and facilitating the rise of massive open online courses (MOOCs). Given the rapid growth of these novel uses, there is a pressing need to investigate how ML techniques support long-standing education principles and goals. In this work, we shed light on this complex landscape, drawing on qualitative insights from interviews with education experts. These interviews comprise in-depth evaluations of ML for education (ML4Ed) papers published in preeminent applied ML conferences over the past decade. Our central research goal is to critically examine how the stated or implied education and societal objectives of these papers are aligned with the ML problems they tackle. That is, to what extent do the technical problem formulation, objectives, approach, and interpretation of results align with the education problem at hand? We find that a cross-disciplinary gap exists and is particularly salient in two parts of the ML life cycle: the formulation of an ML problem from education goals and the translation of predictions to interventions. We use these insights to propose an extended ML life cycle, which may also apply to the use of ML in other domains. Our work joins a growing number of meta-analytical studies across education and ML research as well as critical analyses of the societal impact of ML. Specifically, it fills a gap between the prevailing technical understanding of machine learning and the perspective of education researchers working with students and in policy.


Subjects
Goals, Machine Learning, Students, Humans
5.
Cereb Cortex ; 34(2)2024 01 31.
Article in English | MEDLINE | ID: mdl-38383724

ABSTRACT

Human behavior often aligns with fairness norms, either voluntarily or under external pressure, such as sanctions. Prior research has identified distinct neural activation patterns associated with voluntary and sanction-based compliance or non-compliance with fairness norms. However, a gap remains in understanding the underlying neural connectivity patterns and potential sex-based differences. To address this, we conducted a study using a monetary allocation game and functional magnetic resonance imaging to examine how neural activity and connectivity differ between sexes across three norm compliance conditions: voluntary, sanction-based, and voluntary post-sanctions. Fifty-five adults (27 females) participated, revealing that punishment influenced decisions, leading to strategic calculations and reduced generosity in voluntary compliance post-sanctions. Moreover, there were sex-based differences in neural activation and connectivity across the different compliance conditions. Specifically, the connectivity between the right dorsolateral prefrontal cortex and right dorsal anterior insula appeared to mediate intuitive preferences, with variations across norm compliance conditions and sexes. These findings imply potential sex-based differences in intuitive motivation for diverse norm compliance conditions. Our insights contribute to a better understanding of the neural pathways involved in fairness norm compliance and clarify sex-based differences, offering implications for future investigations into psychiatric and neurological disorders characterized by atypical socialization and mentalizing.


Subjects
Magnetic Resonance Imaging, Social Behavior, Adult, Female, Humans, Sex Characteristics, Motivation, Insular Cortex
6.
Proc Natl Acad Sci U S A ; 119(4)2022 01 25.
Article in English | MEDLINE | ID: mdl-35046023

ABSTRACT

The gold-standard approaches for gleaning statistically valid conclusions from data involve random sampling from the population. Collecting properly randomized data, however, can be challenging, so modern statistical methods, including propensity score reweighting, aim to enable valid inferences when random sampling is not feasible. We put forth an approach for making inferences based on available data from a source population that may differ in composition in unknown ways from an eventual target population. Whereas propensity scoring requires a separate estimation procedure for each different target population, we show how to build a single estimator, based on source data alone, that allows for efficient and accurate estimates on any downstream target data. We demonstrate, theoretically and empirically, that our target-independent approach to inference, which we dub "universal adaptability," is competitive with target-specific approaches that rely on propensity scoring. Our approach builds on a surprising connection between the problem of inferences in unspecified target populations and the multicalibration problem, studied in the burgeoning field of algorithmic fairness. We show how the multicalibration framework can be employed to yield valid inferences from a single source population across a diverse set of target populations.
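As a point of reference for the propensity-score baseline this abstract contrasts with its "universal adaptability" estimator, here is a minimal, hedged sketch of inverse-propensity reweighting toward a shifted target population. It is not the authors' method; the data, variable names, and the logistic propensity model are all illustrative assumptions.

```python
# Hedged sketch: inverse-propensity reweighting of a source sample toward a
# target covariate distribution (the baseline the abstract compares against).
# Data and names are illustrative, not from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Source and target populations differ in covariate composition.
X_source = rng.normal(loc=0.0, size=(5000, 3))
y_source = (X_source[:, 0] + rng.normal(size=5000) > 0).astype(float)  # outcome observed only in source
X_target = rng.normal(loc=0.5, size=(3000, 3))                          # shifted target covariates

# 1. Model membership (source vs. target) to obtain propensity scores.
X_all = np.vstack([X_source, X_target])
is_target = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])
clf = LogisticRegression(max_iter=1000).fit(X_all, is_target)
p_target = clf.predict_proba(X_source)[:, 1]

# 2. Reweight source units by the odds of belonging to the target population.
weights = p_target / (1.0 - p_target)
weights /= weights.mean()

# 3. Propensity-weighted estimate of the target-population outcome mean.
naive_mean = y_source.mean()
reweighted_mean = np.average(y_source, weights=weights)
print(f"naive source mean: {naive_mean:.3f}  reweighted estimate: {reweighted_mean:.3f}")
```

Note that this baseline must be re-fit for every new target sample, which is exactly the per-target estimation step the abstract's single-estimator approach is designed to avoid.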

7.
Proc Natl Acad Sci U S A ; 119(11): e2115293119, 2022 03 15.
Article in English | MEDLINE | ID: mdl-35259009

ABSTRACT

Significance: Decision makers now use algorithmic personalization for resource allocation decisions in many domains (e.g., medical treatments, hiring decisions, product recommendations, or dynamic pricing). An inherent risk of personalization is disproportionate targeting of individuals from certain protected groups. Existing solutions that firms use to avoid this bias often do not eliminate the bias and may even exacerbate it. We propose BEAT (bias-eliminating adapted trees) to ensure balanced allocation of resources across individuals, guaranteeing both group and individual fairness, while still leveraging the value of personalization. We validate our method using simulations as well as an online experiment with N = 3,146 participants. BEAT is easy to implement in practice, has desirable scalability properties, and is applicable to many personalization problems.
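BEAT itself is not reproduced here; as a hedged illustration of the problem it targets, the sketch below audits whether a personalized targeting policy allocates a resource at different rates across a protected group. The scores, group labels, and 0.5 threshold are illustrative assumptions, not the paper's setup.

```python
# Hedged sketch: auditing disproportionate targeting by a personalization policy.
# Not the BEAT algorithm; it only measures the imbalance BEAT is designed to remove.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
group = rng.integers(0, 2, size=n)                  # protected attribute (0/1)
score = rng.beta(2, 5, size=n) + 0.10 * group       # personalization score correlated with group
targeted = score > 0.5                               # resource-allocation decision

rate_0 = targeted[group == 0].mean()
rate_1 = targeted[group == 1].mean()
print(f"targeting rate, group 0: {rate_0:.3f}")
print(f"targeting rate, group 1: {rate_1:.3f}")
print(f"group-level disparity:   {abs(rate_1 - rate_0):.3f}")
```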

8.
Neuroimage ; 290: 120565, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38453102

ABSTRACT

People tend to perceive the same information differently depending on whether it is expressed in an individual or a group frame. It has also been found that the individual (vs. group) frame of expression tends to lead to more charitable giving and greater tolerance of wealth inequality. However, little is known about whether the same resource allocation in social interactions elicits distinct responses depending on proposer type. Using the second-party punishment task, this study examined whether the same allocation from different proposers (individual vs. group) leads to differences in recipient behavior and the underlying neural mechanisms. Behavioral results showed that reaction times were longer in the unfair (vs. fair) condition, and this difference was more pronounced when the proposer was an individual (vs. a group). Neural results showed that proposer type (individual vs. group) influenced early automatic processing (indicated by AN1, P2, and central alpha band), middle processing (indicated by MFN and right frontal theta band), and late elaborative processing (indicated by P3 and parietal alpha band) of fairness in resource allocation. These results revealed that more attentional resources were captured by the group proposer in the early stage of fairness processing, and more cognitive resources were consumed by processing group-proposed unfair allocations in the late stage, possibly because group proposers are less identifiable than individual proposers. The findings provide behavioral and neural evidence for the cognitive differences produced by "individual/group" framing. They also deliver insights into social governance issues, such as punishing individual and/or group violations.


Subjects
Electroencephalography, Experimental Games, Humans, Evoked Potentials/physiology, Social Interaction, Punishment/psychology
9.
Cancer ; 130(12): 2101-2107, 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-38554271

ABSTRACT

Modern artificial intelligence (AI) tools built on high-dimensional patient data are reshaping oncology care, helping to improve goal-concordant care, decrease cancer mortality rates, and increase workflow efficiency and scope of care. However, data-related concerns and human biases that seep into algorithms during development and post-deployment phases affect performance in real-world settings, limiting the utility and safety of AI technology in oncology clinics. To this end, the authors review the current potential and limitations of predictive AI for cancer diagnosis and prognostication, as well as of generative AI, specifically modern chatbots that interface with patients and clinicians. They conclude the review with a discussion of ongoing challenges and regulatory opportunities in the field.


Subjects
Artificial Intelligence, Medical Oncology, Neoplasms, Humans, Medical Oncology/methods, Neoplasms/therapy, Neoplasms/diagnosis, Algorithms, Prognosis
10.
Biostatistics ; 2023 Aug 31.
Article in English | MEDLINE | ID: mdl-37660301

ABSTRACT

Along with the increasing availability of health data has come the rise of data-driven models to inform decision making and policy. These models have the potential to benefit both patients and health care providers but can also exacerbate health inequities. Existing "algorithmic fairness" methods for measuring and correcting model bias fall short of what is needed for health policy in two key ways. First, methods typically focus on a single grouping along which discrimination may occur rather than considering multiple, intersecting groups. Second, in clinical applications, risk prediction is typically used to guide treatment, creating distinct statistical issues that invalidate most existing techniques. We present novel unfairness metrics that address both challenges. We also develop a complete framework of estimation and inference tools for our metrics, including the unfairness value ("u-value"), used to determine the relative extremity of unfairness, and standard errors and confidence intervals employing an alternative to the standard bootstrap. We demonstrate application of our framework to a COVID-19 risk prediction model deployed in a major Midwestern health system.
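The paper's u-value and its bootstrap alternative are not reproduced here. As a hedged sketch of the intersectional-group idea only, the code below compares a model's calibration (observed event rate vs. mean predicted risk) within groups formed by crossing two attributes, using a plain bootstrap for rough uncertainty; the data, group definitions, and replicate count are all illustrative assumptions.

```python
# Hedged sketch: intersectional calibration gaps with a plain bootstrap.
# Not the paper's u-value or inference procedure; data and groups are simulated.
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
race = rng.integers(0, 3, size=n)
sex = rng.integers(0, 2, size=n)
risk_pred = np.clip(rng.beta(2, 8, size=n) + 0.05 * (race == 2), 0, 1)
outcome = rng.binomial(1, np.clip(risk_pred + 0.03 * (sex == 1), 0, 1))

def calib_gap(idx):
    """Observed event rate minus mean predicted risk within a subgroup."""
    return outcome[idx].mean() - risk_pred[idx].mean()

for r in range(3):
    for s in range(2):
        idx = np.where((race == r) & (sex == s))[0]
        boot = [calib_gap(rng.choice(idx, size=len(idx), replace=True)) for _ in range(200)]
        lo, hi = np.percentile(boot, [2.5, 97.5])
        print(f"race={r} sex={s}: gap={calib_gap(idx):+.3f}  95% CI [{lo:+.3f}, {hi:+.3f}]")
```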

11.
Psychol Sci ; 35(1): 93-107, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38190225

ABSTRACT

We examined how 5- to 8-year-olds (N = 51; mean age = 83 months; 27 female, 24 male; 69% White, 12% Black/African American, 8% Asian/Asian American, 6% Hispanic, 6% not reported) and adults (N = 18; mean age = 20.13 years; 11 female, 7 male) accepted or rejected different distributions of resources between themselves and others. We used a reach-tracking method to track finger movement in 3D space over time. This allowed us to dissociate two inhibitory processes. One involved pausing motor responses to detect conflict between observed information and how participants thought resources should be divided; the other involved resolving the conflict between the response and the alternative. Reasoning about disadvantageous inequities involved more of the first system, and this was stable across development. Reasoning about advantageous inequities involved more of the second system and showed more of a developmental progression. Generally, reach tracking offers an on-line measure of inhibitory control for the study of cognition.


Subjects
Judgment, Social Behavior, Adult, Child, Female, Humans, Male, Young Adult, Cognition, Problem Solving
12.
Psychol Sci ; 35(5): 529-542, 2024 May.
Article in English | MEDLINE | ID: mdl-38593467

ABSTRACT

Countless policies are crafted with the intention of punishing all who do wrong or rewarding only those who do right. However, this requires accommodating certain mistakes: some who do not deserve to be punished might be, and some who deserve to be rewarded might not be. Six preregistered experiments (N = 3,484 U.S. adults) reveal that people are more willing to accept this trade-off in principle, before errors occur, than in practice, after errors occur. The result is an asymmetry such that for punishments, people believe it is more important to prevent false negatives (e.g., criminals escaping justice) than to fix them, and more important to fix false positives (e.g., wrongful convictions) than to prevent them. For rewards, people believe it is more important to prevent false positives (e.g., welfare fraud) than to fix them and more important to fix false negatives (e.g., improperly denied benefits) than to prevent them.


Subjects
Punishment, Humans, Adult, Male, Female, Reward, Young Adult
13.
Brain Topogr ; 2024 Jan 10.
Article in English | MEDLINE | ID: mdl-38200358

ABSTRACT

Altruistic punishment is a primary response to violations of social norms, and its neural mechanism has attracted extensive research attention. In the present studies, we applied low-frequency repetitive transcranial magnetic stimulation (rTMS) to the bilateral dorsolateral prefrontal cortex (DLPFC) while participants engaged in a modified Ultimatum Game (Study 1) and a third-party punishment game (Study 2) to explore how bilateral DLPFC disruption affects people's perception of fairness-norm violations and their altruistic punishment decisions in gain and loss contexts. Typically, punishers intervene more often against, and show more social outrage towards, Dictators/Proposers who unfairly distribute losses than those who unfairly share gains. We found that disrupting the function of the left DLPFC in second-party punishment, and of the bilateral DLPFC in third-party punishment, with rTMS effectively obliterated this difference, making participants punish unfairly shared gains as often as they would usually punish unfairly shared losses. In altruistic punishment that maintains social fairness norms, inhibiting right DLPFC function biased individual information integration, whereas inhibiting left DLPFC function impaired the assessment of how severely fairness norms were violated and weakened impulse control, attenuating the moderating effect of gain and loss contexts on altruistic punishment. Our findings emphasize that the DLPFC is closely related to altruistic punishment and provide causal neuroscientific evidence for this link.

14.
Brain Topogr ; 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38448713

ABSTRACT

Social norms and altruistic punitive behaviours are both based on the integration of information from multiple contexts. Individual behavioural performance can be altered by loss and gain contexts, which produce different mental states and subjective perceptions. In this study, we used event-related potential and time-frequency techniques to examine performance on a third-party punishment task and to explore the neural mechanisms underlying context-dependent differences in punishment decisions. The results indicated that individuals were more likely to reject unfairness in the context of loss (vs. gain) and to increase punishment as unfairness increased. In contrast, fairness appeared to cause an early increase in cognitive control signal enhancement, as indicated by the P2 amplitude and theta oscillations, and a later increase in emotional and motivational salience during decision-making in gain vs. loss contexts, as indicated by the medial frontal negativity and beta oscillations. In summary, individuals were more willing to sanction violations of social norms in the loss context than in the gain context and rejecting unfair losses induced more equity-related cognitive conflict than accepting unfair gains, highlighting the importance of context (i.e., gain vs. loss) in equity-related social decision-making processes.

15.
J Biomed Inform ; 156: 104671, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38876452

ABSTRACT

Electronic phenotyping is a fundamental task that identifies specific groups of patients, and it plays an important role in precision medicine in the era of digital health. Phenotyping provides real-world evidence for other related biomedical research and clinical tasks, e.g., disease diagnosis, drug development, and clinical trials. With the development of electronic health records, the performance of electronic phenotyping has been significantly boosted by advanced machine learning techniques. In the healthcare domain, precision and fairness are both essential aspects that should be taken into consideration. However, most related efforts go into designing phenotyping models with higher accuracy, and little attention is paid to the fairness perspective of phenotyping. Neglecting bias in phenotyping leaves subgroups of patients underrepresented, which in turn affects downstream healthcare activities such as patient recruitment in clinical trials. In this work, we aim to bridge this gap through a comprehensive experimental study that identifies the bias existing in electronic phenotyping models and evaluates the performance of widely used debiasing methods on these models. We choose pneumonia and sepsis as our target phenotypes. We benchmark nine electronic phenotyping methods, ranging from rule-based to data-driven approaches. We also evaluate the performance of five bias mitigation strategies covering pre-processing, in-processing, and post-processing. Through extensive experiments, we summarize several findings about the bias identified in phenotyping and about the key properties of the bias mitigation strategies.
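To make the kind of bias audit described here concrete, below is a minimal, hedged sketch that measures a subgroup performance gap (sensitivity) for a toy phenotyping classifier. The model, features, subgroup variable, and metric choice are illustrative assumptions, not the paper's benchmark.

```python
# Hedged sketch: measuring subgroup performance gaps for a phenotyping classifier.
# Simulated data; not the paper's cohorts, models, or metrics.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

rng = np.random.default_rng(3)
n = 8_000
X = rng.normal(size=(n, 10))
subgroup = rng.integers(0, 2, size=n)                                 # e.g., a demographic attribute
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + 0.5 * subgroup))))    # phenotype label

model = LogisticRegression(max_iter=1000).fit(X, y)                   # phenotyping model (subgroup not a feature)
pred = model.predict(X)

# Equal-opportunity style audit: does sensitivity differ across subgroups?
for g in (0, 1):
    mask = subgroup == g
    tpr = recall_score(y[mask], pred[mask])
    print(f"subgroup {g}: sensitivity (TPR) = {tpr:.3f}, prevalence = {y[mask].mean():.3f}")
```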


Subjects
Electronic Health Records, Machine Learning, Phenotype, Humans, Precision Medicine/methods, Sepsis/diagnosis, Bias, Pneumonia/diagnosis, Computational Biology/methods, Algorithms
16.
J Biomed Inform ; 154: 104646, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38677633

ABSTRACT

OBJECTIVES: Artificial intelligence (AI) systems have the potential to revolutionize clinical practices, including improving diagnostic accuracy and surgical decision-making, while also reducing costs and manpower. However, it is important to recognize that these systems may perpetuate social inequities or demonstrate biases, such as those based on race or gender. Such biases can occur before, during, or after the development of AI models, making it critical to understand and address potential biases to enable the accurate and reliable application of AI models in clinical settings. To mitigate bias concerns during model development, we surveyed recent publications on debiasing methods in the fields of biomedical natural language processing (NLP) and computer vision (CV). We then discuss the methods, such as data perturbation and adversarial learning, that have been applied in the biomedical domain to address bias. METHODS: We searched PubMed, the ACM Digital Library, and IEEE Xplore for relevant articles published between January 2018 and December 2023 using multiple combinations of keywords. We then automatically filtered the resulting 10,041 articles using loose constraints and manually inspected the abstracts of the remaining 890 articles to identify the 55 articles included in this review. Additional articles cited in those references are also included. We discuss each method and compare its strengths and weaknesses. Finally, we review other potential methods from the general domain that could be applied to biomedicine to address bias and improve fairness. RESULTS: The bias of AI in biomedicine can originate from multiple sources, such as insufficient data, sampling bias, and the use of health-irrelevant features or race-adjusted algorithms. Existing debiasing methods that focus on algorithms can be categorized as distributional or algorithmic. Distributional methods include data augmentation, data perturbation, data reweighting, and federated learning. Algorithmic approaches include unsupervised representation learning, adversarial learning, disentangled representation learning, loss-based methods, and causality-based methods.
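Among the distributional approaches listed above, data reweighting is easy to illustrate. Below is a minimal, hedged sketch of a pre-processing reweighing scheme that assigns instance weights so the protected attribute becomes statistically independent of the label in training; the attribute and label names are illustrative, and this is not the code of any surveyed paper.

```python
# Hedged sketch: pre-processing "reweighing" for debiasing training data.
# Weights = expected joint probability under independence / observed joint probability.
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
a = rng.integers(0, 2, size=n)                      # protected attribute
y = rng.binomial(1, 0.3 + 0.2 * a)                   # label correlated with the attribute

weights = np.empty(n)
for ai in (0, 1):
    for yi in (0, 1):
        mask = (a == ai) & (y == yi)
        expected = (a == ai).mean() * (y == yi).mean()   # rate if a and y were independent
        observed = mask.mean()
        weights[mask] = expected / observed

# Weighted label rates are now (approximately) equal across attribute groups.
for ai in (0, 1):
    mask = a == ai
    print(f"group {ai}: raw positive rate {y[mask].mean():.3f}, "
          f"weighted positive rate {np.average(y[mask], weights=weights[mask]):.3f}")
```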


Subjects
Artificial Intelligence, Bias, Natural Language Processing, Humans, Surveys and Questionnaires, Machine Learning, Algorithms
17.
J Biomed Inform ; 155: 104656, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38782170

ABSTRACT

OBJECTIVE: Healthcare continues to grapple with the persistent issue of treatment disparities, sparking concerns regarding the equitable allocation of treatments in clinical practice. While various fairness metrics have emerged to assess fairness in decision-making processes, a growing focus has been on causality-based fairness concepts due to their capacity to mitigate confounding effects and reason about bias. However, the application of causal fairness notions to evaluating the fairness of clinical decision-making with electronic health record (EHR) data remains an understudied domain. This study aims to address the methodological gap in assessing the causal fairness of treatment allocation with electronic health records data. In addition, we investigate the impact of social determinants of health on the assessment of causal fairness of treatment allocation. METHODS: We propose a causal fairness algorithm to assess fairness in clinical decision-making. Our algorithm accounts for the heterogeneity of patient populations and identifies potential unfairness in treatment allocation by conditioning on patients who have the same likelihood to benefit from the treatment. We apply this framework to a patient cohort with coronary artery disease derived from an EHR database to evaluate the fairness of treatment decisions. RESULTS: Our analysis reveals notable disparities in coronary artery bypass grafting (CABG) allocation among different patient groups. Women were found to be 4.4%-7.7% less likely to receive CABG than men in two out of four treatment response strata. Similarly, Black or African American patients were 5.4%-8.7% less likely to receive CABG than others in three out of four response strata. These results were similar when social determinants of health (insurance and area deprivation index) were dropped from the algorithm. These findings highlight the presence of disparities in treatment allocation among similar patients, suggesting potential unfairness in the clinical decision-making process. CONCLUSION: This study introduces a novel approach for assessing the fairness of treatment allocation in healthcare. By incorporating responses to treatment into the fairness framework, our method explores the potential of quantifying fairness from a causal perspective using EHR data. Our research advances the methodological development of fairness assessment in healthcare and highlights the importance of causality in determining treatment fairness.
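The conditioning idea described in the METHODS can be sketched simply: estimate each patient's likelihood of benefiting, stratify on that estimate, and compare treatment rates across groups within each stratum. The sketch below is a hedged illustration under simulated data; the benefit model, quartile cut points, and group variable are assumptions, not the study's algorithm.

```python
# Hedged sketch: compare treatment-allocation rates across groups among patients
# with a similar estimated likelihood of benefit. Simulated data; illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 12_000
X = rng.normal(size=(n, 5))                            # clinical covariates
female = rng.integers(0, 2, size=n)
benefit = 1 / (1 + np.exp(-X[:, 0]))                   # true (unobserved) benefit probability
treated = rng.binomial(1, np.clip(benefit - 0.05 * female, 0, 1))  # allocation with a gender gap
responded = rng.binomial(1, benefit)                   # observed treatment-response proxy

# Estimate likelihood of benefit from covariates, then stratify into quartiles.
benefit_hat = LogisticRegression(max_iter=1000).fit(X, responded).predict_proba(X)[:, 1]
strata = np.digitize(benefit_hat, np.quantile(benefit_hat, [0.25, 0.5, 0.75]))

for s in range(4):
    mask = strata == s
    rate_f = treated[mask & (female == 1)].mean()
    rate_m = treated[mask & (female == 0)].mean()
    print(f"benefit stratum {s}: treated women {rate_f:.3f} vs men {rate_m:.3f} "
          f"(gap {rate_m - rate_f:+.3f})")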


Subjects
Algorithms, Electronic Health Records, Humans, Male, Female, Clinical Decision-Making, Coronary Artery Disease/therapy, Healthcare Disparities, Middle Aged, Social Determinants of Health, Causality
18.
J Biomed Inform ; 149: 104545, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37992791

ABSTRACT

Liver transplantation is a life-saving procedure for patients with end-stage liver disease. There are two main challenges in liver transplantation: finding the best-matching patient for a donor and ensuring transplant equity among different subpopulations. The current MELD scoring system evaluates a patient's mortality risk if an organ is not received within 90 days. However, donor-patient matching should also consider post-transplant risk factors, such as cardiovascular disease and chronic rejection, which are common complications after transplant. Accurate prediction of these risk scores remains a significant challenge. In this study, we used predictive models to address these challenges. Specifically, we proposed a deep learning model to predict multiple risk factors after a liver transplant. By formulating it as a multi-task learning problem, the proposed deep neural network was trained to simultaneously predict five post-transplant risks and achieve equally good performance on each by exploiting task-balancing techniques. We also proposed a novel fairness-achieving algorithm to ensure prediction fairness across different subpopulations. We used electronic health records of 160,360 liver transplant patients, including demographic information, clinical variables, and laboratory values, collected from United States liver transplant records from 1987 to 2018. The model's performance was evaluated using metrics such as AUROC and AUPRC. Our experimental results highlight the success of our multi-task model in achieving task balance while maintaining accuracy, significantly reducing the task discrepancy by 39%. Further application of the fairness-achieving algorithm substantially reduced fairness disparity among all sensitive attributes (gender, age group, and race/ethnicity) for each risk factor. This underlines the value of integrating fairness considerations into the task-balancing framework, ensuring robust and fair predictions across multiple tasks and diverse demographic groups.
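To illustrate the multi-task, task-balanced setup in code form, here is a minimal, hedged sketch of a shared-encoder network with five prediction heads and a simple learnable task-weighting (uncertainty-weighting) loss. The architecture, dimensions, and random stand-in data are illustrative assumptions; this is not the study's model or its fairness-achieving algorithm.

```python
# Hedged sketch: shared-encoder multi-task network with learnable task weights.
# Illustrative of task balancing in general, not the paper's implementation.
import torch
import torch.nn as nn

class MultiTaskRisk(nn.Module):
    def __init__(self, n_features: int, n_tasks: int = 5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
        )
        self.heads = nn.ModuleList(nn.Linear(32, 1) for _ in range(n_tasks))
        self.log_vars = nn.Parameter(torch.zeros(n_tasks))  # learnable per-task weights

    def forward(self, x):
        h = self.encoder(x)
        return [head(h).squeeze(-1) for head in self.heads]

    def balanced_loss(self, logits, targets):
        bce = nn.BCEWithLogitsLoss()
        total = 0.0
        for t, (logit, target) in enumerate(zip(logits, targets)):
            precision = torch.exp(-self.log_vars[t])
            total = total + precision * bce(logit, target) + self.log_vars[t]
        return total

# Toy training step on random data (stand-in for post-transplant risk labels).
x = torch.randn(256, 40)
ys = [torch.randint(0, 2, (256,)).float() for _ in range(5)]
model = MultiTaskRisk(n_features=40)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = model.balanced_loss(model(x), ys)
opt.zero_grad()
loss.backward()
opt.step()
print(f"balanced multi-task loss: {loss.item():.3f}")
```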


Subjects
Deep Learning, Liver Transplantation, Humans, United States, Tissue Donors, Neural Networks (Computer), Risk Factors
19.
J Biomed Inform ; 154: 104654, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38740316

ABSTRACT

OBJECTIVES: We evaluated methods for preparing electronic health record data to reduce bias before applying artificial intelligence (AI). METHODS: We created methods for transforming raw data into a data framework for applying machine learning and natural language processing techniques to predict falls and fractures. Strategies such as including and reporting multiple races, mixing data sources (outpatient, inpatient, structured codes, and unstructured notes), and addressing missingness were applied to the raw data to reduce bias. The raw data were carefully curated using validated definitions to create variables such as age, race, gender, and healthcare utilization. Clinical, statistical, and data expertise informed the formation of these variables. The research team included experts with diverse professional and demographic backgrounds to incorporate diverse perspectives. RESULTS: For the prediction of falls, information extracted from radiology reports was converted to a matrix for applying machine learning. The processing of the data resulted in an input of 5,377,673 reports to the machine learning algorithm, of which 45,304 were flagged as positive and 5,332,369 as negative for falls. The processed data had lower missingness and better representation of race and diagnosis codes. For fractures, specialized algorithms extracted snippets of text around the keyword "femoral" from dual X-ray absorptiometry (DXA) scans to identify femoral neck T-scores, which are important for predicting fracture risk. The natural language processing algorithms yielded 98% accuracy and a 2% error rate. The methods for preparing data for input to artificial intelligence processes are reproducible and can be applied to other studies. CONCLUSION: The life cycle of data from raw to analytic form includes data governance, cleaning, management, and analysis. When applying artificial intelligence methods, input data must be prepared optimally to reduce algorithmic bias, as biased output is harmful. Building AI-ready data frameworks that improve efficiency can contribute to transparency and reproducibility. The roadmap for the application of AI involves applying specialized techniques to input data, some of which are suggested here. This study highlights data curation aspects to be considered when preparing data for the application of artificial intelligence to reduce bias.
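As a hedged illustration of the keyword-anchored extraction step described for fractures, the sketch below pulls a femoral neck T-score from free-text DXA report snippets with a regular expression. The regex and report wording are illustrative assumptions, not the study's pipeline.

```python
# Hedged sketch: extracting a femoral neck T-score from DXA report text.
# Example reports and the pattern are illustrative, not from the study.
import re

reports = [
    "DXA: Femoral neck T-score -2.7, consistent with osteoporosis.",
    "Lumbar spine T-score -1.1; left femoral neck T-score: -1.8.",
    "Hip views unremarkable; no densitometry performed.",
]

pattern = re.compile(r"femoral\s+neck\s+T[- ]?score[:\s]*(-?\d+(?:\.\d+)?)", re.IGNORECASE)

for text in reports:
    match = pattern.search(text)
    t_score = float(match.group(1)) if match else None
    print(f"{t_score!r:>6}  <- {text}")
```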


Subjects
Accidental Falls, Algorithms, Artificial Intelligence, Electronic Health Records, Machine Learning, Natural Language Processing, Humans, Accidental Falls/prevention & control, Bone Fractures, Female
20.
J Biomed Inform ; 156: 104664, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38851413

ABSTRACT

OBJECTIVE: Guidance on how to evaluate accuracy and algorithmic fairness across subgroups is missing for clinical models that flag patients for an intervention when the health care resources to administer that intervention are limited. We aimed to propose a framework of metrics that fits this specific use case. METHODS: We evaluated the following metrics and applied them to a Veterans Health Administration clinical model that flags patients at risk of overdose or a suicidal event among outpatients who were prescribed opioids (N = 405,817): receiver operating characteristic (ROC) curve and area under the curve, precision-recall curve, calibration (reliability) curve, false positive rate, false negative rate, and false omission rate. In addition, we developed a new approach to visualize false positives and false negatives that we named 'per true positive bars.' We demonstrate the utility of these metrics for our use case for three cohorts of patients at the highest risk (top 0.5%, 1.0%, and 5.0%) by evaluating algorithmic fairness across the following age groups: <=30, 31-50, 51-65, and >65 years old. RESULTS: Metrics that allowed us to assess group differences more clearly were the false positive rate, false negative rate, false omission rate, and the new 'per true positive bars'. Metrics with limited utility for our use case were the ROC curve and area under the curve, the calibration (reliability) curve, and the precision-recall curve. CONCLUSION: There is no "one size fits all" approach to model performance monitoring and bias analysis. Our work informs future researchers and clinicians who seek to evaluate the accuracy and fairness of predictive models that identify patients to intervene on in the context of limited health care resources. In terms of ease of interpretation and utility for our use case, the new 'per true positive bars' may be the most intuitive to a range of stakeholders and facilitate choosing a threshold that allows weighing false positives against false negatives, which is especially important when predicting severe adverse events.
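The error-rate metrics recommended above are straightforward to compute per subgroup at a top-k% flagging threshold. The sketch below does this on simulated data as a hedged illustration; the risk scores, outcomes, age groups, and 1% threshold are assumptions, not the VHA model or its cohorts.

```python
# Hedged sketch: subgroup false positive, false negative, and false omission
# rates at a fixed top-k% flagging threshold. Simulated, illustrative data only.
import numpy as np

rng = np.random.default_rng(6)
n = 50_000
age_group = rng.choice(["<=30", "31-50", "51-65", ">65"], size=n)
risk = rng.beta(1.5, 20, size=n)
event = rng.binomial(1, np.clip(risk * 2, 0, 1))        # adverse event indicator

top_pct = 0.01                                           # flag the top 1% highest-risk patients
threshold = np.quantile(risk, 1 - top_pct)
flagged = risk >= threshold

for g in ["<=30", "31-50", "51-65", ">65"]:
    m = age_group == g
    tp = np.sum(flagged[m] & (event[m] == 1))
    fp = np.sum(flagged[m] & (event[m] == 0))
    fn = np.sum(~flagged[m] & (event[m] == 1))
    tn = np.sum(~flagged[m] & (event[m] == 0))
    fpr = fp / (fp + tn)
    fnr = fn / (fn + tp)
    false_omission = fn / (fn + tn)
    print(f"{g:>6}: FPR={fpr:.4f}  FNR={fnr:.4f}  FOR={false_omission:.4f}")
```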


Subjects
Algorithms, Clinical Decision Support Systems, Humans, Middle Aged, Adult, Aged, Reproducibility of Results, ROC Curve, Female, Male, United States, United States Department of Veterans Affairs