Results 1 - 20 of 996
1.
Algorithms ; 17(4)2024 Apr.
Article in English | MEDLINE | ID: mdl-38962581

ABSTRACT

Breast cancer is the most common cancer affecting women globally. Despite the significant impact of deep learning models on breast cancer diagnosis and treatment, achieving fairness or equitable outcomes across diverse populations remains a challenge when some demographic groups are underrepresented in the training data. We quantified the bias of models trained to predict breast cancer stage from a dataset consisting of 1000 biopsies from 842 patients provided by AIM-Ahead (Artificial Intelligence/Machine Learning Consortium to Advance Health Equity and Researcher Diversity). Notably, the majority of data (over 70%) were from White patients. We found that prior to post-processing adjustments, all deep learning models we trained consistently performed better for White patients than for non-White patients. After model calibration, we observed mixed results, with only some models demonstrating improved performance. This work provides a case study of bias in breast cancer medical imaging models and highlights the challenges in using post-processing to attempt to achieve fairness.
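
The abstract does not detail the post-processing adjustment; the sketch below illustrates one common post-processing step of this kind on synthetic data: choosing group-specific decision thresholds so that the true positive rate is roughly equalized across demographic groups. The data, group labels, and target rate are illustrative assumptions, not the study's actual method.

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
# Illustrative synthetic scores, labels, and group membership (not the study's data).
y_true = rng.integers(0, 2, 1000)
y_score = np.clip(0.3 * y_true + rng.normal(0.4, 0.25, 1000), 0, 1)
group = rng.choice(["White", "non-White"], 1000, p=[0.7, 0.3])

target_tpr = 0.80
thresholds = {}
for g in np.unique(group):
    mask = group == g
    fpr, tpr, thr = roc_curve(y_true[mask], y_score[mask])
    # Smallest threshold whose TPR reaches the target within this group.
    thresholds[g] = thr[np.argmax(tpr >= target_tpr)]

# Apply the group-specific thresholds to produce final predictions.
y_pred = np.array([y_score[i] >= thresholds[group[i]] for i in range(len(y_score))])
print(thresholds)
```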

2.
Diagn Interv Radiol ; 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38953330

ABSTRACT

Although artificial intelligence (AI) methods hold promise for medical imaging-based prediction tasks, their integration into medical practice may present a double-edged sword due to bias (i.e., systematic errors). AI algorithms have the potential to mitigate cognitive biases in human interpretation, but extensive research has highlighted the tendency of AI systems to internalize biases within their models. Whether intentional or not, such biases may ultimately lead to unintended consequences in the clinical setting, potentially compromising patient outcomes. This concern is particularly important in medical imaging, where AI has been embraced more rapidly and widely than in any other medical field. A comprehensive understanding of bias at each stage of the AI pipeline is therefore essential for developing AI solutions that are not only less biased but also widely applicable. This international collaborative review effort aims to increase awareness within the medical imaging community of the importance of proactively identifying and addressing AI bias before its negative consequences are realized. The authors begin with the fundamentals of bias, explaining its different definitions and delineating its various potential sources. Strategies for detecting and identifying bias are then outlined, followed by a review of techniques for its avoidance and mitigation. Finally, ethical dimensions, challenges encountered, and future prospects are discussed.

3.
Article in English | MEDLINE | ID: mdl-38960729

ABSTRACT

OBJECTIVE: This study aims to develop machine learning models that provide both accurate and equitable predictions of 2-year stroke risk for patients with atrial fibrillation across diverse racial groups. MATERIALS AND METHODS: Our study utilized structured electronic health record (EHR) data from the All of Us Research Program. Machine learning models (LightGBM) were used to capture the relationships between stroke risk and the predictors used by the widely recognized CHADS2 and CHA2DS2-VASc scores. We mitigated racial disparity by creating a representative tuning set, customizing tuning criteria, and setting binary thresholds separately for subgroups. We constructed a hold-out test set that not only supports temporal validation but also includes a larger proportion of Black/African Americans for fairness validation. RESULTS: Compared to the original CHADS2 and CHA2DS2-VASc scores, significant improvements were achieved by modeling their predictors with machine learning (area under the receiver operating characteristic curve improved from near 0.70 to above 0.80). Furthermore, applying our disparity mitigation strategies effectively enhanced model fairness compared to the conventional cross-validation approach. DISCUSSION: Modeling the CHADS2 and CHA2DS2-VASc risk factors with LightGBM and our disparity mitigation strategies achieved decent discriminative performance and excellent fairness performance. In addition, this approach provides a complete interpretation of each predictor. These results highlight its potential utility in clinical practice. CONCLUSIONS: Our research presents a practical example of addressing clinical challenges through All of Us Research Program data. The disparity mitigation framework we propose is adaptable across various models and data modalities, demonstrating broad potential in clinical informatics.
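
As a rough illustration of the subgroup-specific thresholding described above, the sketch below fits a LightGBM classifier on hypothetical CHA2DS2-VASc-style predictors and then sets a separate binary threshold within each racial subgroup. The column names and the flag rate are assumptions, not the study's schema or tuning procedure.

```python
import numpy as np
import pandas as pd
import lightgbm as lgb

rng = np.random.default_rng(0)
n = 2000
# Hypothetical frame with CHA2DS2-VASc-style predictors and a race column;
# the column names are illustrative, not the study's actual schema.
df = pd.DataFrame({
    "age": rng.integers(40, 90, n),
    "hypertension": rng.integers(0, 2, n),
    "diabetes": rng.integers(0, 2, n),
    "prior_stroke": rng.integers(0, 2, n),
    "heart_failure": rng.integers(0, 2, n),
    "race": rng.choice(["Black", "White"], n),
    "stroke_2yr": rng.integers(0, 2, n),
})

features = ["age", "hypertension", "diabetes", "prior_stroke", "heart_failure"]
model = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)
model.fit(df[features], df["stroke_2yr"])
prob = model.predict_proba(df[features])[:, 1]

# Separate binary thresholds per subgroup, e.g. flagging the top 20% within each group.
for race, idx in df.groupby("race").groups.items():
    thr = np.quantile(prob[idx], 0.80)
    print(race, "threshold:", round(float(thr), 3))
```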

4.
Curr Opin Psychol ; 58: 101836, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38981371

ABSTRACT

Algorithmic bias has emerged as a critical challenge in the age of responsible production of artificial intelligence (AI). This paper reviews recent research on algorithmic bias and proposes increased engagement of psychological and social science research to understand antecedents and consequences of algorithmic bias. Through the lens of the 3-D Dependable AI Framework, this article explores how social science disciplines, such as psychology, can contribute to identifying and mitigating bias at the Design, Develop, and Deploy stages of the AI life cycle. Finally, we propose future research directions to further address the complexities of algorithmic bias and its societal implications.

6.
Meta Radiol ; 2(3)2024 Sep.
Article in English | MEDLINE | ID: mdl-38947177

ABSTRACT

Unfairness in artificial intelligence and machine learning models, often caused by imbalanced datasets, has long been a concern. While many efforts aim to minimize model bias, this study suggests that traditional fairness evaluation methods may themselves be biased, highlighting the need for a proper evaluation scheme with multiple evaluation metrics, since results vary under different criteria. Moreover, the limited data size of minority groups introduces significant data uncertainty, which can undermine judgements of fairness. This paper introduces an innovative evaluation approach that estimates data uncertainty in minority groups through bootstrapping from majority groups to obtain a more objective statistical assessment. Extensive experiments reveal that traditional evaluation methods might have drawn inaccurate conclusions about model fairness. The proposed method delivers an unbiased fairness assessment by adeptly addressing the inherent complications of model evaluation on imbalanced datasets. The results show that such a comprehensive evaluation can provide more confidence when adopting these models.
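
A minimal sketch of the bootstrapping idea described above, on synthetic predictions: majority-group subsamples of the minority group's size are repeatedly drawn to estimate how much an evaluation metric (here AUROC) fluctuates purely because of the small sample. The data and the metric choice are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
# Illustrative synthetic predictions (not the paper's data).
n_major, n_minor = 5000, 150
y_major = rng.integers(0, 2, n_major)
s_major = np.clip(0.30 * y_major + rng.normal(0.4, 0.25, n_major), 0, 1)
y_minor = rng.integers(0, 2, n_minor)
s_minor = np.clip(0.25 * y_minor + rng.normal(0.4, 0.25, n_minor), 0, 1)

# Bootstrap majority-group subsamples of the minority group's size to estimate
# how much the metric fluctuates purely due to the small sample.
boot_auc = []
for _ in range(2000):
    idx = rng.choice(n_major, size=n_minor, replace=True)
    if len(np.unique(y_major[idx])) < 2:      # AUROC needs both classes present
        continue
    boot_auc.append(roc_auc_score(y_major[idx], s_major[idx]))

lo, hi = np.percentile(boot_auc, [2.5, 97.5])
print("minority AUROC:", round(roc_auc_score(y_minor, s_minor), 3))
print("majority AUROC at minority sample size, 95% interval:", round(lo, 3), "-", round(hi, 3))
```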

7.
Data Brief ; 55: 110598, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38974007

ABSTRACT

In online food delivery apps, customers write reviews to reflect their experiences. However, certain restaurants use a "review event" strategy to solicit favorable reviews from customers and boost their revenue. A review event is a marketing strategy in which a restaurant owner gives customers free services in return for a promise to write a review. Nevertheless, current datasets of app reviews for food delivery services neglect this situation. Furthermore, there appears to be an absence of datasets with reviews written in Korean. To address this gap, this paper presents a dataset containing reviews obtained from restaurants on a Korean app that use a review event strategy. A total of 128,668 reviews were gathered from 136 restaurants by crawling reviews with the Selenium library in Python. The dataset provides detailed information for each review, including the ordered dishes, the time each review was written, whether a food image is included in the review, and various star ratings (total, taste, quantity, and delivery). This dataset supports an innovative process for preparing AI training data to achieve fair AI by providing a bias-free dataset of food delivery app reviews, with data poisoning attacks as an example. Additionally, the dataset is beneficial for researchers who are examining review events or analyzing the sentiment of food delivery app reviews.
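
For readers unfamiliar with the collection approach, the sketch below shows the general shape of review crawling with Selenium. The URL and CSS selectors are placeholders, not the actual app's page structure, and a locally installed browser driver is assumed.

```python
# Minimal Selenium crawling sketch; the URL and selectors below are hypothetical placeholders.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()                                 # assumes a Chrome driver is available
driver.get("https://example.com/restaurant/123/reviews")    # placeholder URL
time.sleep(2)                                               # crude wait for dynamic content

rows = []
for card in driver.find_elements(By.CSS_SELECTOR, ".review-card"):   # hypothetical class names
    rows.append({
        "text": card.find_element(By.CSS_SELECTOR, ".review-text").text,
        "rating": card.find_element(By.CSS_SELECTOR, ".star-total").text,
        "written_at": card.find_element(By.CSS_SELECTOR, ".review-date").text,
        "has_image": len(card.find_elements(By.CSS_SELECTOR, "img")) > 0,
    })
driver.quit()
print(len(rows), "reviews collected")
```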

8.
Diagnostics (Basel) ; 14(13)2024 Jun 26.
Article in English | MEDLINE | ID: mdl-39001244

ABSTRACT

Primary Immune Thrombocytopenia (ITP) is a rare autoimmune disease characterised by the immune-mediated destruction of peripheral blood platelets, leading to low platelet counts and bleeding. The diagnosis and effective management of ITP are challenging because there is no established test to confirm the disease and no biomarker with which to predict treatment response and outcome. In this work, we conduct a feasibility study to assess whether machine learning can be applied effectively to the diagnosis of ITP using routine blood tests and demographic data in a non-acute outpatient setting. Various ML models, including Logistic Regression, Support Vector Machine, k-Nearest Neighbor, Decision Tree and Random Forest, were applied to data from the UK Adult ITP Registry and a general haematology clinic. Two different approaches were investigated: a demographic-unaware and a demographic-aware approach. We conduct extensive experiments to evaluate the predictive performance of these models and approaches, as well as their bias. The results revealed that the Decision Tree and Random Forest models were both superior and fair, achieving nearly perfect predictive and fairness scores, with platelet count identified as the most significant variable. Models not provided with demographic information performed better in terms of predictive accuracy but showed lower fairness scores, illustrating a trade-off between predictive performance and fairness.
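
A minimal sketch of the demographic-unaware versus demographic-aware comparison on synthetic blood-test-style data; the feature names, synthetic labels, and Random Forest settings are illustrative assumptions, not the registry data or the study's configuration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
n = 1500
# Synthetic stand-in for routine blood tests plus demographics (not registry data).
df = pd.DataFrame({
    "platelet_count": rng.normal(150, 80, n),
    "haemoglobin": rng.normal(13, 2, n),
    "wbc": rng.normal(7, 2, n),
    "age": rng.integers(18, 90, n),
    "sex": rng.integers(0, 2, n),
})
df["itp"] = (df["platelet_count"] < 100).astype(int)   # toy label for illustration only

blood_only = ["platelet_count", "haemoglobin", "wbc"]
with_demo = blood_only + ["age", "sex"]

train, test = train_test_split(df, test_size=0.3, random_state=0)
for label, cols in [("demographic-unaware", blood_only), ("demographic-aware", with_demo)]:
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(train[cols], train["itp"])
    print(label, "accuracy:", round(accuracy_score(test["itp"], clf.predict(test[cols])), 3))
```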

9.
J Biomed Inform ; : 104692, 2024 Jul 13.
Article in English | MEDLINE | ID: mdl-39009174

ABSTRACT

BACKGROUND: An inherent difference exists between male and female bodies; the historical under-representation of females in clinical trials has widened this gap in existing healthcare data. The fairness of clinical decision-support tools is at risk when they are developed on biased data. This paper aims to quantitatively assess gender bias in risk prediction models. We aim to generalize our findings by performing this investigation on multiple use cases at different hospitals. METHODS: First, we conduct a thorough analysis of the source data to find gender-based disparities. Secondly, we assess model performance for different gender groups at different hospitals and on different use cases. Performance evaluation is quantified using the area under the receiver-operating characteristic curve (AUROC). Lastly, we investigate the clinical implications of these biases by analyzing the underdiagnosis and overdiagnosis rates and by decision curve analysis (DCA). We also investigate the influence of model calibration on mitigating gender-related disparities in decision-making processes. RESULTS: Our data analysis reveals notable variations in incidence rates, AUROC, and overdiagnosis rates across genders, hospitals and clinical use cases. However, the underdiagnosis rate is consistently higher in the female population. In general, the female population exhibits lower incidence rates and the models perform worse when applied to this group. Furthermore, the decision curve analysis demonstrates that there is no statistically significant difference in the model's clinical utility across gender groups within the range of thresholds of interest. CONCLUSION: The presence of gender bias within risk prediction models varies across clinical use cases and healthcare institutions. Although inherent differences are observed between male and female populations at the data source level, this variance does not affect the parity of clinical utility. In conclusion, the evaluations conducted in this study highlight the significance of continuously monitoring gender-based disparities from various perspectives for clinical risk prediction models.
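
A minimal sketch of the per-gender AUROC comparison described in the methods, using synthetic predictions; the data are illustrative, not the hospitals' records.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)
# Illustrative predictions and outcomes with a sex indicator (not real hospital data).
n = 4000
sex = rng.choice(["F", "M"], n)
y_true = rng.integers(0, 2, n)
y_score = np.clip(0.3 * y_true + rng.normal(0.4, 0.25, n), 0, 1)

aucs = {}
for g in ("F", "M"):
    mask = sex == g
    aucs[g] = roc_auc_score(y_true[mask], y_score[mask])
    print(f"AUROC ({g}): {aucs[g]:.3f}")
print(f"AUROC gap (F - M): {aucs['F'] - aucs['M']:.3f}")
```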

10.
Soc Dev ; 33(2)2024 May.
Article in English | MEDLINE | ID: mdl-38993500

ABSTRACT

This study examined children's responses to targeted and collective punishment. Thirty-six 4-5-year-olds and 36 6-7-year-olds (36 females; 54 White; data collected 2018-2019 in the United States) experienced three classroom punishment situations: Targeted (only transgressing student punished), Collective (one student transgressed, all students punished), and Baseline (all students transgressed, all punished). The older children evaluated collective punishment as less fair than targeted, whereas younger children evaluated both similarly. Across ages, children distributed fewer resources to teachers who administered collective than targeted punishment, and rated transgressors more negatively and distributed fewer resources to transgressors in Collective and Targeted than Baseline. These findings demonstrate children's increasing understanding of punishment and point to the potential impact of different forms of punishment on children's social lives.

11.
Indian Dermatol Online J ; 15(3): 504-506, 2024.
Article in English | MEDLINE | ID: mdl-38845668

ABSTRACT

Topical corticosteroid (TC) abuse is a common problem worldwide. One recently emerging concern is the adulteration of fairness creams with TC. The presence of TC in skin-whitening cosmetic creams can be detected by high-performance liquid chromatography (HPLC). Since HPLC is expensive, time-consuming and not easily available, we suggest the histamine wheal test as a simple and inexpensive test to detect the presence of topical steroids in fairness creams.

12.
Cortex ; 177: 53-67, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38838559

ABSTRACT

How to fairly allocate goods is a key issue of social decision-making. Extensive research demonstrates that people do not selfishly maximize their own benefits, but instead also consider how others are affected. However, most accounts of the psychological processes underlying fairness-related behavior implicitly assume that assessments of fairness are somewhat stable. In this paper, we present results of a novel task, the Re-Allocation Game, in which two players receive an allocation determined by the computer and, on half of the trials, one player has the subsequent possibility to change this allocation. Importantly, prior to the receipt of the allocation, players were shown either their respective financial situations, their respective performance on a previous simple task, or random information, while being scanned using functional neuroimaging. As expected, our results demonstrate that, when given the opportunity, participants allocated on average almost half the money to anonymous others. However, our findings further show that participants used the provided information in a dynamic manner, revealing that the underlying principle based on which people re-allocate money - namely merit, need, or equality - switches dynamically. On the neural level, we identified activity in the right and left dorsolateral prefrontal cortices related to context-independent inequity and context-dependent fairness information, respectively, when viewing the computer-generated allocations. At the same time, activity in temporoparietal regions and the precuneus represented these different types of fairness-related information in adjacent and partially overlapping clusters. Finally, we observed that the activity pattern in the precuneus and putamen was most clearly related to participants' subsequent re-allocation decisions. Together, our findings suggest that participants judge an allocation as fair or unfair using a network associated with cognitive control and theory-of-mind, while dynamically switching between what might constitute a fair allocation in a particular context.

13.
Front Public Health ; 12: 1403866, 2024.
Article in English | MEDLINE | ID: mdl-38841685

ABSTRACT

Children with disability face many barriers to participating in community sports. Little Athletics Australia aims to increase fair and meaningful inclusion via a new structure which will enable all children to take part in the same contest by competing for their 'personal best' score. Named the True Inclusion Method (TIM), this new structure will be piloted in 13 sites across six states. Formative evaluation of this pilot will critique TIM and its implementation using observations of events, and interviews and surveys with child athletes with and without disability, their parents/carers and Little Athletics volunteers. Implementation outcomes are acceptability, appropriateness, adoption, feasibility and fidelity. Qualitative data will be analysed thematically. TIM is designed to encourage inclusive participation by children with disability in sporting events, and to improve the competitive experience for all children by celebrating personal achievement and fostering fun.


Subject(s)
Disabled Children , Sports , Humans , Child , Disabled Children/rehabilitation , Australia , Male , Female , Pilot Projects , Adolescent , Program Evaluation
14.
J Cancer Policy ; 41: 100492, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38908820

ABSTRACT

Whole genome sequencing (WGS) of a tumour may sometimes reveal additional potential targets for medical treatment. Practice variation in the use of WGS is therefore a source of unequal access to targeted therapies and, as a consequence, of disparities in health outcomes. Moreover, this may be even more significant if patients seek access to WGS by paying a relatively limited amount of money out of pocket, sometimes effectively buying themselves a ticket to (very) expensive publicly funded treatments. Should the resulting unequal access to WGS be considered unfair? Drawing on current practice in the Dutch healthcare system, which is known as egalitarian, we argue that differences between hospitals in the employment of WGS are a consequence of the fact that medical innovation and its subsequent uptake inevitably take time. Consequently, temporal inequalities in access can be deemed acceptable, or at least tolerated, because, and insofar as, ultimately all patients benefit. However, we argue against allowing a practice of out-of-pocket payments for WGS in publicly funded healthcare systems, for four reasons: allowing private spending favours patients with higher socio-economic status significantly more than practice variation between hospitals does; it may lead to displacement of publicly funded health care; it does not help to ultimately benefit all; and it may undermine the solidaristic ethos essential for egalitarian healthcare systems.

15.
Article in English | MEDLINE | ID: mdl-38918321

ABSTRACT

BACKGROUND: While precision medicine algorithms can be used to improve health outcomes, concerns have been raised about racial equity and unintentional harm from encoded biases. In this study, we evaluated the fairness of common individual- and community-level proxies of pediatric socioeconomic status (SES), such as insurance status and the community deprivation index, that are often utilized in precision medicine algorithms. METHODS: Using 2012-2021 vital records obtained from the Ohio Department of Health, we geocoded and matched each residential birth address to a census tract to obtain the community deprivation index. We then conducted sensitivity and specificity analyses to determine the degree of match between deprivation index, insurance status, and birthing parent education level for all, Black, and White children, to assess whether there were differences based on race. RESULTS: We found that the community deprivation index and insurance status fail to accurately represent individual SES, either alone or in combination. The deprivation index had a sensitivity of 61.2% and a specificity of 74.1%, while insurance status had a higher sensitivity of 91.6% but a lower specificity of 60.1%. Furthermore, these inconsistencies differed by race across all proxies evaluated, with greater sensitivities for Black children but greater specificities for White children. CONCLUSION: This may explain some of the racial disparities present in precision medicine algorithms that utilize SES proxies. Future studies should examine how to mitigate the biases introduced by using SES proxies, potentially by incorporating additional data on housing conditions.
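
A minimal sketch of the sensitivity/specificity analysis by race, assuming a binary SES reference (e.g., education-based) and a binary proxy (e.g., public insurance); the synthetic data and the 0.8 proxy agreement rate are illustrative assumptions, not the Ohio vital records.

```python
import numpy as np
import pandas as pd

def sens_spec(reference, proxy):
    """Sensitivity and specificity of a binary proxy against a binary reference."""
    tp = np.sum((reference == 1) & (proxy == 1))
    tn = np.sum((reference == 0) & (proxy == 0))
    fn = np.sum((reference == 1) & (proxy == 0))
    fp = np.sum((reference == 0) & (proxy == 1))
    return tp / (tp + fn), tn / (tn + fp)

rng = np.random.default_rng(3)
n = 10000
# Illustrative data: 1 = low SES by each indicator (not the study's records).
df = pd.DataFrame({
    "race": rng.choice(["Black", "White"], n),
    "low_ses_reference": rng.integers(0, 2, n),        # e.g. education-based reference
})
# Noisy proxy, e.g. public insurance status, agreeing with the reference 80% of the time.
df["proxy_public_insurance"] = np.where(
    rng.random(n) < 0.8, df["low_ses_reference"], 1 - df["low_ses_reference"])

for race, sub in df.groupby("race"):
    se, sp = sens_spec(sub["low_ses_reference"].to_numpy(),
                       sub["proxy_public_insurance"].to_numpy())
    print(f"{race}: sensitivity={se:.3f}, specificity={sp:.3f}")
```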

16.
J Biomed Inform ; 156: 104664, 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38851413

ABSTRACT

OBJECTIVE: Guidance on how to evaluate accuracy and algorithmic fairness across subgroups is missing for clinical models that flag patients for an intervention when health care resources to administer that intervention are limited. We aimed to propose a framework of metrics that fits this specific use case. METHODS: We evaluated the following metrics, applied to a Veterans Health Administration clinical model that flags patients at risk of overdose or a suicidal event for intervention among outpatients who were prescribed opioids (N = 405,817): the receiver operating characteristic (ROC) curve and area under the curve, the precision-recall curve, the calibration (reliability) curve, the false positive rate, the false negative rate, and the false omission rate. In addition, we developed a new approach to visualize false positives and false negatives that we named 'per true positive bars.' We demonstrate the utility of these metrics for our use case in three cohorts of patients at the highest risk (top 0.5%, 1.0%, and 5.0%) by evaluating algorithmic fairness across the following age groups: <=30, 31-50, 51-65, and >65 years old. RESULTS: Metrics that allowed us to assess group differences most clearly were the false positive rate, false negative rate, false omission rate, and the new 'per true positive bars'. Metrics with limited utility for our use case were the ROC curve and area under the curve, the calibration (reliability) curve, and the precision-recall curve. CONCLUSION: There is no "one size fits all" approach to model performance monitoring and bias analysis. Our work informs future researchers and clinicians who seek to evaluate the accuracy and fairness of predictive models that identify patients to intervene on in the context of limited health care resources. In terms of ease of interpretation and utility for our use case, the new 'per true positive bars' may be the most intuitive to a range of stakeholders and facilitate choosing a threshold that allows weighing false positives against false negatives, which is especially important when predicting severe adverse events.
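
A minimal sketch of computing the false positive, false negative, and false omission rates per age group for a top-k% flagging rule, on synthetic scores; the prevalence, score distribution, and 1.0% flag rate are illustrative assumptions, not the VHA model or data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
# Illustrative risk scores, outcomes, and age groups (not VHA data).
n = 50000
age_group = rng.choice(["<=30", "31-50", "51-65", ">65"], n)
y_true = (rng.random(n) < 0.05).astype(int)           # rare adverse event, ~5% prevalence
score = np.clip(0.3 * y_true + rng.normal(0.1, 0.1, n), 0, 1)

top_pct = 0.01                                        # flag the top 1.0% as the highest-risk cohort
flag = score >= np.quantile(score, 1 - top_pct)

rows = []
for g in ["<=30", "31-50", "51-65", ">65"]:
    m = age_group == g
    tp = np.sum(flag[m] & (y_true[m] == 1)); fp = np.sum(flag[m] & (y_true[m] == 0))
    fn = np.sum(~flag[m] & (y_true[m] == 1)); tn = np.sum(~flag[m] & (y_true[m] == 0))
    rows.append({"age_group": g,
                 "FPR": fp / (fp + tn),
                 "FNR": fn / (fn + tp),
                 "FOR": fn / (fn + tn)})
print(pd.DataFrame(rows).round(4))
```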

17.
J Biomed Inform ; 156: 104677, 2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38876453

ABSTRACT

OBJECTIVE: Existing approaches to fairness evaluation often overlook systematic differences in the social determinants of health, such as demographics and socioeconomics, among comparison groups, potentially leading to inaccurate or even contradictory conclusions. This study aims to evaluate racial disparities in predicting mortality among patients with chronic diseases using a fairness detection method that accounts for these systematic differences. METHODS: We created five datasets from Mass General Brigham's electronic health records (EHR), each focusing on a different chronic condition: congestive heart failure (CHF), chronic kidney disease (CKD), chronic obstructive pulmonary disease (COPD), chronic liver disease (CLD), and dementia. For each dataset, we developed separate machine learning models to predict 1-year mortality and examined racial disparities by comparing prediction performance between Black and White individuals. We compared the racial fairness evaluation conducted on the overall Black and White populations with that conducted on Black individuals and matched White individuals identified by propensity score matching, in which the systematic differences were mitigated. RESULTS: We identified significant differences between Black and White individuals in age, gender, marital status, education level, smoking status, health insurance type, body mass index, and Charlson comorbidity index (p-value < 0.001). When examining the matched Black and White subpopulations identified through propensity score matching, significant differences persisted for only particular covariates: we observed weaker significance levels in the CHF cohort for insurance type (p = 0.043), in the CKD cohort for insurance type (p = 0.005) and education level (p = 0.016), and in the dementia cohort for body mass index (p = 0.041), with no significant differences for the other covariates. When examining mortality prediction models across the five study cohorts, we compared fairness evaluations before and after mitigating the systematic differences. We revealed significant differences in the CHF cohort, with p-values of 0.021 and 0.001 for the F1 measure and sensitivity of the AdaBoost model, and p-values of 0.014 and 0.003 for the F1 measure and sensitivity of the MLP model, respectively. DISCUSSION AND CONCLUSION: This study contributes to research on fairness assessment by focusing on the examination of systematic disparities and underscores the potential for revealing racial bias in machine learning models used in clinical settings.
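
A minimal sketch of the propensity score matching step, assuming a logistic regression propensity model and greedy 1:1 nearest-neighbour matching on synthetic covariates; the covariates, cohort sizes, and matching details (with replacement, no caliper) are illustrative assumptions, not the study's protocol.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(11)
# Illustrative cohort with covariates that differ systematically by race (not MGB EHR data).
n = 6000
df = pd.DataFrame({
    "black": rng.integers(0, 2, n),
    "age": rng.normal(65, 12, n),
    "bmi": rng.normal(28, 5, n),
    "charlson": rng.poisson(3, n),
})
df.loc[df["black"] == 1, "age"] -= 4                  # inject a systematic difference

covs = ["age", "bmi", "charlson"]
ps_model = LogisticRegression(max_iter=1000).fit(df[covs], df["black"])
df["ps"] = ps_model.predict_proba(df[covs])[:, 1]

black = df[df["black"] == 1]
white = df[df["black"] == 0]
# 1:1 nearest-neighbour matching on the propensity score (with replacement, for brevity).
nn = NearestNeighbors(n_neighbors=1).fit(white[["ps"]])
_, idx = nn.kneighbors(black[["ps"]])
matched_white = white.iloc[idx.ravel()]

print("mean age before matching:", round(black["age"].mean(), 1), "vs", round(white["age"].mean(), 1))
print("mean age after matching: ", round(black["age"].mean(), 1), "vs", round(matched_white["age"].mean(), 1))
```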

18.
J Biomed Inform ; 156: 104671, 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38876452

ABSTRACT

Electronic phenotyping is a fundamental task that identifies specific groups of patients, and it plays an important role in precision medicine in the era of digital health. Phenotyping provides real-world evidence for other related biomedical research and clinical tasks, e.g., disease diagnosis, drug development, and clinical trials. With the development of electronic health records, the performance of electronic phenotyping has been significantly boosted by advanced machine learning techniques. In the healthcare domain, precision and fairness are both essential aspects that should be taken into consideration, yet most related efforts have been put into designing phenotyping models with higher accuracy, and little attention has been paid to the fairness perspective of phenotyping. The neglect of bias in phenotyping leads to subgroups of patients being underrepresented, which in turn affects downstream healthcare activities such as patient recruitment in clinical trials. In this work, we bridge this gap through a comprehensive experimental study that identifies the bias existing in electronic phenotyping models and evaluates the performance of widely used debiasing methods on these models. We choose pneumonia and sepsis as our phenotyping target diseases. We benchmark 9 electronic phenotyping methods, ranging from rule-based to data-driven approaches. Meanwhile, we evaluate the performance of 5 bias mitigation strategies covering pre-processing, in-processing, and post-processing. Through these extensive experiments, we summarize several insightful findings about the bias identified in phenotyping and key considerations for bias mitigation strategies in phenotyping.
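
As one concrete example of a pre-processing mitigation strategy of the kind benchmarked here, the sketch below applies reweighing in the style of Kamiran and Calders to synthetic phenotyping-style data; the features, protected attribute, and downstream classifier are illustrative assumptions, not the paper's benchmarked methods.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(21)
# Illustrative phenotyping-style data with a protected attribute (not real EHR data).
n = 8000
df = pd.DataFrame({
    "sex": rng.integers(0, 2, n),
    "lab1": rng.normal(0, 1, n),
    "lab2": rng.normal(0, 1, n),
})
df["phenotype"] = ((df["lab1"] + 0.5 * df["sex"] + rng.normal(0, 1, n)) > 0.5).astype(int)

# Pre-processing mitigation via reweighing: weight each record so that the protected
# attribute and the label look statistically independent in the training data.
p_a = df["sex"].map(df["sex"].value_counts(normalize=True))
p_y = df["phenotype"].map(df["phenotype"].value_counts(normalize=True))
p_ay = df.groupby(["sex", "phenotype"])["lab1"].transform("size") / len(df)
weights = (p_a * p_y / p_ay).to_numpy()

clf = LogisticRegression(max_iter=1000)
clf.fit(df[["lab1", "lab2", "sex"]], df["phenotype"], sample_weight=weights)
print("weight range:", round(float(weights.min()), 3), "to", round(float(weights.max()), 3))
```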

19.
J Biomed Inform ; 156: 104683, 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38925281

ABSTRACT

OBJECTIVE: Despite the increased availability of methodologies to identify algorithmic bias, the operationalization of bias evaluation for healthcare predictive models is still limited. Therefore, this study proposes a process for bias evaluation through an empirical assessment of common hospital readmission models. The process includes selecting bias measures, interpreting them, determining disparity impact, and identifying potential mitigations. METHODS: This retrospective analysis evaluated the racial bias of four common models predicting 30-day unplanned readmission (i.e., the LACE Index, the HOSPITAL Score, and the CMS readmission measure applied as is and retrained). The models were assessed using 2.4 million adult inpatient discharges in Maryland from 2016 to 2019. Fairness metrics that are model-agnostic, easy to compute, and interpretable were implemented and appraised to select the most appropriate bias measures. The impact of changing the models' risk thresholds on these measures was further assessed to guide the selection of optimal thresholds to control and mitigate bias. RESULTS: Four bias measures were selected for the predictive task: zero-one-loss difference, false negative rate (FNR) parity, false positive rate (FPR) parity, and the generalized entropy index. Based on these measures, the HOSPITAL Score and the retrained CMS measure demonstrated the lowest racial bias. White patients showed a higher FNR, while Black patients showed a higher FPR and zero-one loss. As the models' risk thresholds changed, trade-offs between model fairness and overall performance were observed, and the assessment showed that all models' default thresholds were reasonable for balancing accuracy and bias. CONCLUSIONS: This study proposes an Applied Framework to Assess Fairness of Predictive Models (AFAFPM) and demonstrates the process using 30-day hospital readmission models as the example. It suggests the feasibility of applying algorithmic bias assessment to determine optimized risk thresholds so that predictive models can be used more equitably and accurately. It is evident that a combination of qualitative and quantitative methods and a multidisciplinary team are necessary to identify, understand and respond to algorithmic bias in real-world healthcare settings. Users should also apply multiple bias measures to ensure a more comprehensive, tailored, and balanced view. The results of bias measures, however, must be interpreted with caution and considered within the larger operational, clinical, and policy context.
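
A minimal sketch of the four selected bias measures on synthetic readmission predictions: zero-one-loss difference, FNR parity, FPR parity, and the generalized entropy index (benefit formulation with alpha = 2). The synthetic data and error rates are illustrative assumptions, not the Maryland discharge data.

```python
import numpy as np

def group_rates(y_true, y_pred):
    fn = np.sum((y_true == 1) & (y_pred == 0)); tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1)); tn = np.sum((y_true == 0) & (y_pred == 0))
    return {"zero_one_loss": np.mean(y_true != y_pred),
            "FNR": fn / (fn + tp),
            "FPR": fp / (fp + tn)}

def generalized_entropy_index(y_true, y_pred, alpha=2):
    # Benefit formulation: b_i = y_hat_i - y_i + 1.
    b = y_pred - y_true + 1
    mu = b.mean()
    return np.mean((b / mu) ** alpha - 1) / (alpha * (alpha - 1))

rng = np.random.default_rng(9)
# Illustrative readmission predictions with a race label (not the Maryland data).
n = 20000
race = rng.choice(["Black", "White"], n)
y_true = (rng.random(n) < 0.15).astype(int)
y_pred = np.where(rng.random(n) < 0.8, y_true, 1 - y_true)

rates = {g: group_rates(y_true[race == g], y_pred[race == g]) for g in ("Black", "White")}
for metric in ("zero_one_loss", "FNR", "FPR"):
    print(f"{metric} difference (Black - White):",
          round(rates["Black"][metric] - rates["White"][metric], 4))
print("generalized entropy index:", round(generalized_entropy_index(y_true, y_pred), 4))
```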

20.
J Med Internet Res ; 26: e50295, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38941134

ABSTRACT

Artificial intelligence (AI)-based clinical decision support systems are gaining momentum by relying on a greater volume and variety of secondary use data. However, the uncertainty, variability, and biases in real-world data environments still pose significant challenges to the development of health AI, its routine clinical use, and its regulatory frameworks. Health AI should be resilient against real-world environments throughout its lifecycle, including the training and prediction phases and maintenance during production, and health AI regulations should evolve accordingly. Data quality issues, variability over time or across sites, information uncertainty, human-computer interaction, and fundamental rights assurance are among the most relevant challenges. If health AI is not designed resiliently with regard to these real-world data effects, potentially biased data-driven medical decisions can risk the safety and fundamental rights of millions of people. In this viewpoint, we review the challenges, requirements, and methods for resilient AI in health and provide a research framework to improve the trustworthiness of next-generation AI-based clinical decision support.


Subject(s)
Artificial Intelligence , Decision Support Systems, Clinical , Humans