Results 1 - 20 of 89
1.
Lancet Digit Health ; 6(8): e589-e594, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39059890

ABSTRACT

The development and commercialisation of medical decision systems based on artificial intelligence (AI) far outpaces our understanding of their value for clinicians. Although applicable across many forms of medicine, we focus on characterising the diagnostic decisions of radiologists through the concept of ecologically bounded reasoning, review the differences between clinician decision making and medical AI model decision making, and reveal how these differences pose fundamental challenges for integrating AI into radiology. We argue that clinicians are contextually motivated, mentally resourceful decision makers, whereas AI models are contextually stripped, correlational decision makers, and discuss misconceptions about clinician-AI interaction stemming from this misalignment of capabilities. We outline how future research on clinician-AI interaction could better address the cognitive considerations of decision making and be used to enhance the safety and usability of AI models in high-risk medical decision-making contexts.


Subjects
Artificial Intelligence , Clinical Decision-Making , Humans , Clinical Decision-Making/methods , Cognition , Decision Support Systems, Clinical , Radiology
2.
medRxiv ; 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39072010

ABSTRACT

Background: There are known racial disparities in the organ transplant allocation system in the United States. However, prior work has yet to establish if transplant center decisions on offer acceptance, the final step in the allocation process, contribute to these disparities. Objective: To estimate racial differences in the acceptance of organ offers by transplant center physicians on behalf of their patients. Design: Retrospective cohort analysis using data from the Scientific Registry of Transplant Recipients (SRTR) on patients who received an offer for a heart, liver, or lung transplant between January 1, 2010 and December 31, 2020. Setting: Nationwide, waitlist-based. Patients: 32,268 heart transplant candidates, 102,823 liver candidates, and 25,780 lung candidates, all aged 18 or older. Measurements: 1) Association between offer acceptance and two race-based variables: candidate race and donor-candidate race match; 2) association between offer rejection and time to patient mortality. Results: Black race was associated with significantly lower odds of offer acceptance for livers (OR=0.93, CI: 0.88-0.98) and lungs (OR=0.80, CI: 0.73-0.87). Donor-candidate race match was associated with significantly higher odds of offer acceptance for hearts (OR=1.11, CI: 1.06-1.16), livers (OR=1.10, CI: 1.06-1.13), and lungs (OR=1.13, CI: 1.07-1.19). Rejecting an offer was associated with lower survival times for all three organs (heart hazard ratio=1.16, CI: 1.09-1.23; liver HR=1.74, CI: 1.66-1.82; lung HR=1.21, CI: 1.15-1.28). Limitations: Our study analyzed the observational SRTR dataset, which has known limitations. Conclusion: Offer acceptance decisions are associated with inequity in the organ allocation system. Our findings demonstrate the additional barriers that Black patients face in accessing organ transplants and demonstrate the need for standardized practice, continuous distribution policies, and better organ procurement.

3.
Cancer Cell ; 42(6): 915-918, 2024 Jun 10.
Article in English | MEDLINE | ID: mdl-38861926

ABSTRACT

Experts discuss the challenges and opportunities of using artificial intelligence (AI) to study the evolution of cancer cells and their microenvironment, improve diagnosis, predict treatment response, and ensure responsible implementation in the clinic.


Subjects
Artificial Intelligence , Neoplasms , Tumor Microenvironment , Humans , Neoplasms/therapy , Neoplasms/genetics , Neoplasms/pathology
4.
Nat Med ; 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38942996

ABSTRACT

As artificial intelligence (AI) rapidly approaches human-level performance in medical imaging, it is crucial that it does not exacerbate or propagate healthcare disparities. Previous research established AI's capacity to infer demographic data from chest X-rays, leading to a key concern: do models using demographic shortcuts have unfair predictions across subpopulations? In this study, we conducted a thorough investigation into the extent to which medical AI uses demographic encodings, focusing on potential fairness discrepancies within both in-distribution training sets and external test sets. Our analysis covers three key medical imaging disciplines (radiology, dermatology and ophthalmology) and incorporates data from six global chest X-ray datasets. We confirm that medical imaging AI leverages demographic shortcuts in disease classification. Although correcting shortcuts algorithmically effectively addresses fairness gaps to create 'locally optimal' models within the original data distribution, this optimality is not true in new test settings. Surprisingly, we found that models with less encoding of demographic attributes are often most 'globally optimal', exhibiting better fairness during model evaluation in new test environments. Our work establishes best practices for medical imaging models that maintain their performance and fairness in deployments beyond their initial training contexts, underscoring critical considerations for AI clinical deployments across populations and sites.

6.
Sci Rep ; 14(1): 4516, 2024 02 24.
Article in English | MEDLINE | ID: mdl-38402362

ABSTRACT

While novel oral anticoagulants are increasingly used to reduce risk of stroke in patients with atrial fibrillation, vitamin K antagonists such as warfarin continue to be used extensively for stroke prevention across the world. While effective in reducing the risk of strokes, the complex pharmacodynamics of warfarin make it difficult to use clinically, with many patients experiencing under- and/or over-anticoagulation. In this study we employed a novel implementation of deep reinforcement learning to provide clinical decision support to optimize time in therapeutic International Normalized Ratio (INR) range. We used a novel semi-Markov decision process formulation of the Batch-Constrained deep Q-learning algorithm to develop a reinforcement learning model to dynamically recommend optimal warfarin dosing to achieve INR of 2.0-3.0 for patients with atrial fibrillation. The model was developed using data from 22,502 patients in the warfarin treated groups of the pivotal randomized clinical trials of edoxaban (ENGAGE AF-TIMI 48), apixaban (ARISTOTLE) and rivaroxaban (ROCKET AF). The model was externally validated on data from 5730 warfarin-treated patients in a fourth trial of dabigatran (RE-LY) using multilevel regression models to estimate the relationship between center-level algorithm-consistent dosing, time in therapeutic INR range (TTR), and a composite clinical outcome of stroke, systemic embolism or major hemorrhage. External validation showed a positive association between center-level algorithm-consistent dosing and TTR (R² = 0.56). Each 10% increase in algorithm-consistent dosing at the center level independently predicted a 6.78% improvement in TTR (95% CI 6.29, 7.28; p < 0.001) and an 11% decrease in the composite clinical outcome (HR 0.89; 95% CI 0.81, 1.00; p = 0.015). These results were comparable to those of a rules-based clinical algorithm used for benchmarking, for which each 10% increase in algorithm-consistent dosing independently predicted a 6.10% increase in TTR (95% CI 5.67, 6.54; p < 0.001) and a 10% decrease in the composite outcome (HR 0.90; 95% CI 0.83, 0.98; p = 0.018). Our findings suggest that a deep reinforcement learning algorithm can optimize time in therapeutic range for patients taking warfarin. A digital clinical decision support system to promote algorithm-consistent warfarin dosing could optimize time in therapeutic range and improve clinical outcomes in atrial fibrillation globally.


Subjects
Atrial Fibrillation , Stroke , Humans , Administration, Oral , Anticoagulants , Atrial Fibrillation/complications , Atrial Fibrillation/drug therapy , Atrial Fibrillation/chemically induced , Machine Learning , Rivaroxaban/therapeutic use , Stroke/prevention & control , Stroke/chemically induced , Treatment Outcome , Warfarin , Randomized Controlled Trials as Topic
7.
J Clin Oncol ; 42(14): 1625-1634, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38359380

ABSTRACT

PURPOSE: For patients with advanced cancer, early consultations with palliative care (PC) specialists reduce costs, improve quality of life, and prolong survival. However, capacity limitations prevent all patients from receiving PC shortly after diagnosis. We evaluated whether a prognostic machine learning system could promote early PC, given existing capacity. METHODS: Using population-level administrative data in Ontario, Canada, we assembled a cohort of patients with incurable cancer who received palliative-intent systemic therapy between July 1, 2014, and December 30, 2019. We developed a machine learning system that predicted death within 1 year of each treatment using demographics, cancer characteristics, treatments, symptoms, laboratory values, and history of acute care admissions. We trained the system in patients who started treatment before July 1, 2017, and evaluated the potential impact of the system on PC in subsequent patients. RESULTS: Among 560,210 treatments received by 54,628 patients, death occurred within 1 year of 45.2% of treatments. The machine learning system recommended the same number of PC consultations observed with usual care at the 60.0% 1-year risk of death, with a first-alarm positive predictive value of 69.7% and an outcome-level sensitivity of 74.9%. Compared with usual care, system-guided care could increase early PC by 8.5% overall (95% CI, 7.5 to 9.5; P < .001) and by 15.3% (95% CI, 13.9 to 16.6; P < .001) among patients who live 6 months beyond their first treatment, without requiring more PC consultations in total or substantially increasing PC among patients with a prognosis exceeding 2 years. CONCLUSION: Prognostic machine learning systems could increase early PC despite existing resource constraints. These results demonstrate an urgent need to deploy and evaluate prognostic systems in real-time clinical practice to increase access to early PC.


Subjects
Machine Learning , Neoplasms , Palliative Care , Referral and Consultation , Humans , Palliative Care/methods , Neoplasms/therapy , Male , Female , Referral and Consultation/statistics & numerical data , Aged , Middle Aged , Ontario , Aged, 80 and over , Prognosis
8.
PLOS Digit Health ; 3(1): e0000417, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38236824

ABSTRACT

The study provides a comprehensive review of OpenAI's Generative Pre-trained Transformer 4 (GPT-4) technical report, with an emphasis on applications in high-risk settings like healthcare. A diverse team, including experts in artificial intelligence (AI), natural language processing, public health, law, policy, social science, healthcare research, and bioethics, analyzed the report against established peer review guidelines. The GPT-4 report shows a significant commitment to transparent AI research, particularly in creating a systems card for risk assessment and mitigation. However, it reveals limitations such as restricted access to training data, inadequate confidence and uncertainty estimations, and concerns over privacy and intellectual property rights. Key strengths identified include the considerable time and economic investment in transparent AI research and the creation of a comprehensive systems card. On the other hand, the lack of clarity in training processes and data raises concerns about encoded biases and interests in GPT-4. The report also lacks confidence and uncertainty estimations, crucial in high-risk areas like healthcare, and fails to address potential privacy and intellectual property issues. Furthermore, this study emphasizes the need for diverse, global involvement in developing and evaluating large language models (LLMs) to ensure broad societal benefits and mitigate risks. The paper presents recommendations such as improving data transparency, developing accountability frameworks, establishing confidence standards for LLM outputs in high-risk settings, and enhancing industry research review processes. It concludes that while GPT-4's report is a step towards open discussions on LLMs, more extensive interdisciplinary reviews are essential for addressing bias, harm, and risk concerns, especially in high-risk domains. The review aims to expand the understanding of LLMs in general and highlights the need for new forms of reflection on how LLMs are reviewed, the data required for effective evaluation, and how to address critical issues like bias and risk.

9.
Sci Transl Med ; 16(731): eadg4517, 2024 Jan 24.
Article in English | MEDLINE | ID: mdl-38266105

ABSTRACT

The human retina is a multilayered tissue that offers a unique window into systemic health. Optical coherence tomography (OCT) is widely used in eye care and allows the noninvasive, rapid capture of retinal anatomy in exquisite detail. We conducted genotypic and phenotypic analyses of retinal layer thicknesses using macular OCT images from 44,823 UK Biobank participants. We performed OCT layer cross-phenotype association analyses (OCT-XWAS), associating retinal thicknesses with 1866 incident conditions (median 10-year follow-up) and 88 quantitative traits and blood biomarkers. We performed genome-wide association studies (GWASs), identifying inherited genetic markers that influence retinal layer thicknesses and replicated our associations among the LIFE-Adult Study (N = 6313). Last, we performed a comparative analysis of phenome- and genome-wide associations to identify putative causal links between retinal layer thicknesses and both ocular and systemic conditions. Independent associations with incident mortality were detected for thinner photoreceptor segments (PSs) and, separately, ganglion cell complex layers. Phenotypic associations were detected between thinner retinal layers and ocular, neuropsychiatric, cardiometabolic, and pulmonary conditions. A GWAS of retinal layer thicknesses yielded 259 unique loci. Consistency between epidemiologic and genetic associations suggested links between a thinner retinal nerve fiber layer with glaucoma, thinner PS with age-related macular degeneration, and poor cardiometabolic and pulmonary function with a thinner PS. In conclusion, we identified multiple inherited genetic loci and acquired systemic cardio-metabolic-pulmonary conditions associated with thinner retinal layers and identify retinal layers wherein thinning is predictive of future ocular and systemic conditions.


Subjects
Cardiovascular Diseases , Genome-Wide Association Study , Adult , Humans , Tomography, Optical Coherence , Face , Retina/diagnostic imaging
11.
NPJ Digit Med ; 6(1): 237, 2023 Dec 20.
Article in English | MEDLINE | ID: mdl-38123810

ABSTRACT

Stress is associated with numerous chronic health conditions, both mental and physical. However, the heterogeneity of these associations at the individual level is poorly understood. While data generated from individuals in their day-to-day lives "in the wild" may best represent the heterogeneity of stress, gathering these data and separating signals from noise is challenging. In this work, we report findings from a major data collection effort using Digital Health Technologies (DHTs) and frontline healthcare workers. We provide insights into stress "in the wild", by using robust methods for its identification from multimodal data and quantifying its heterogeneity. Here we analyze data from the Stress and Recovery in Frontline COVID-19 Workers study following 365 frontline healthcare workers for 4-6 months using wearable devices and smartphone app-based measures. Causal discovery is used to learn how the causal structure governing an individual's self-reported symptoms and physiological features from DHTs differs between non-stress and potential stress states. Our methods uncover robust representations of potential stress states across a population of frontline healthcare workers. These representations reveal high levels of inter- and intra-individual heterogeneity in stress. We leverage multiple stress definitions that span different modalities (from subjective to physiological) to obtain a comprehensive view of stress, as these differing definitions rarely align in time. We show that these different stress definitions can be robustly represented as changes in the underlying causal structure on and off stress for individuals. This study is an important step toward better understanding potential underlying processes generating stress in individuals.

12.
Nat Hum Behav ; 7(11): 1833-1835, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37985904
14.
J Natl Compr Canc Netw ; 21(10): 1029-1037.e21, 2023 10.
Article in English | MEDLINE | ID: mdl-37856226

ABSTRACT

BACKGROUND: Emergency department visits and hospitalizations frequently occur during systemic therapy for cancer. We developed and evaluated a longitudinal warning system for acute care use. METHODS: Using a retrospective population-based cohort of patients who started intravenous systemic therapy for nonhematologic cancers between July 1, 2014, and June 30, 2020, we randomly separated patients into cohorts for model training, hyperparameter tuning and model selection, and system testing. Predictive features included static features, such as demographics, cancer type, and treatment regimens, and dynamic features, such as patient-reported symptoms and laboratory values. The longitudinal warning system predicted the probability of acute care utilization within 30 days after each treatment session. Machine learning systems were developed in the training and tuning cohorts and evaluated in the testing cohort. Sensitivity analyses considered feature importance, other acute care endpoints, and performance within subgroups. RESULTS: The cohort included 105,129 patients who received 1,216,385 treatment sessions. Acute care followed 182,444 (15.0%) treatments within 30 days. The ensemble model achieved an area under the receiver operating characteristic curve of 0.742 (95% CI, 0.739-0.745) and was well calibrated in the test cohort. Important predictive features included prior acute care use, treatment regimen, and laboratory tests. If the system was set to alarm approximately once every 15 treatments, 25.5% of acute care events would be preceded by an alarm, and 47.4% of patients would experience acute care after an alarm. The system underestimated risk for some treatment regimens and potentially underserved populations such as females and non-English speakers. CONCLUSIONS: Machine learning warning systems can detect patients at risk for acute care utilization, which can aid in preventive intervention and facilitate tailored treatment. Future research should address potential biases and prospectively evaluate impact after system deployment.


Subjects
Neoplasms , Female , Humans , Retrospective Studies , Neoplasms/diagnosis , Neoplasms/drug therapy , Machine Learning , Hospitalization , Emergency Service, Hospital
15.
Nat Med ; 29(11): 2929-2938, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37884627

ABSTRACT

Artificial intelligence as a medical device is increasingly being applied to healthcare for diagnosis, risk stratification and resource allocation. However, a growing body of evidence has highlighted the risk of algorithmic bias, which may perpetuate existing health inequity. This problem arises in part because of systemic inequalities in dataset curation, unequal opportunity to participate in research and inequalities of access. This study aims to explore existing standards, frameworks and best practices for ensuring adequate data diversity in health datasets. Exploring the body of existing literature and expert views is an important step towards the development of consensus-based guidelines. The study comprises two parts: a systematic review of existing standards, frameworks and best practices for healthcare datasets; and a survey and thematic analysis of stakeholder views of bias, health equity and best practices for artificial intelligence as a medical device. We found that the need for dataset diversity was well described in literature, and experts generally favored the development of a robust set of guidelines, but there were mixed views about how these could be implemented practically. The outputs of this study will be used to inform the development of standards for transparency of data diversity in health datasets (the STANDING Together initiative).


Subjects
Artificial Intelligence , Delivery of Health Care , Humans , Consensus , Systematic Reviews as Topic
17.
medRxiv ; 2023 May 17.
Article in English | MEDLINE | ID: mdl-37292770

ABSTRACT

The human retina is a complex multi-layered tissue which offers a unique window into systemic health and disease. Optical coherence tomography (OCT) is widely used in eye care and allows the non-invasive, rapid capture of retinal measurements in exquisite detail. We conducted genome- and phenome-wide analyses of retinal layer thicknesses using macular OCT images from 44,823 UK Biobank participants. We performed phenome-wide association analyses, associating retinal thicknesses with 1,866 incident ICD-based conditions (median 10-year follow-up) and 88 quantitative traits and blood biomarkers. We performed genome-wide association analyses, identifying inherited genetic markers which influence the retina, and replicated our associations among 6,313 individuals from the LIFE-Adult Study. Lastly, we performed a comparative analysis of phenome- and genome-wide associations to identify putative causal links between systemic conditions, retinal layer thicknesses, and ocular disease. Independent associations with incident mortality were detected for photoreceptor thinning and ganglion cell complex thinning. Significant phenotypic associations were detected between retinal layer thinning and ocular, neuropsychiatric, cardiometabolic and pulmonary conditions. Genome-wide association of retinal layer thicknesses yielded 259 loci. Consistency between epidemiologic and genetic associations suggested putative causal links between thinning of the retinal nerve fiber layer with glaucoma, photoreceptor segment with AMD, as well as poor cardiometabolic and pulmonary function with PS thinning, among other findings. In conclusion, retinal layer thinning predicts risk of future ocular and systemic disease. Furthermore, systemic cardio-metabolic-pulmonary conditions promote retinal thinning. Retinal imaging biomarkers, integrated into electronic health records, may inform risk prediction and potential therapeutic strategies.

18.
Transl Psychiatry ; 13(1): 210, 2023 Jun 16.
Article in English | MEDLINE | ID: mdl-37328465

ABSTRACT

Advancements in artificial intelligence (AI) are enabling the development of clinical support tools (CSTs) in psychiatry to facilitate the review of patient data and inform clinical care. To promote their successful integration and prevent over-reliance, it is important to understand how psychiatrists will respond to information provided by AI-based CSTs, particularly if it is incorrect. We conducted an experiment to examine psychiatrists' perceptions of AI-based CSTs for treating major depressive disorder (MDD) and to determine whether perceptions interacted with the quality of CST information. Eighty-three psychiatrists read clinical notes about a hypothetical patient with MDD and reviewed two CSTs embedded within a single dashboard: the note's summary and a treatment recommendation. Psychiatrists were randomised to believe the source of CSTs was either AI or another psychiatrist, and across four notes, CSTs provided either correct or incorrect information. Psychiatrists rated the CSTs on various attributes. Ratings for note summaries were less favourable when psychiatrists believed the notes were generated with AI as compared to another psychiatrist, regardless of whether the notes provided correct or incorrect information. A smaller preference for psychiatrist-generated information emerged in ratings of attributes that reflected the summary's accuracy or its inclusion of important information from the full clinical note. Ratings for treatment recommendations were also less favourable when their perceived source was AI, but only when recommendations were correct. There was little evidence that clinical expertise or familiarity with AI impacted results. These findings suggest that psychiatrists prefer human-derived CSTs. This preference was less pronounced for ratings that may have prompted a deeper review of CST information (i.e. a comparison with the full clinical note to evaluate the summary's accuracy or completeness, assessing an incorrect treatment recommendation), suggesting a role of heuristics. Future work should explore other contributing factors and downstream implications for integrating AI into psychiatric care.


Subjects
Decision Support Systems, Clinical , Depressive Disorder, Major , Psychiatry , Humans , Artificial Intelligence , Depression , Depressive Disorder, Major/drug therapy
19.
Crit Care Explor ; 5(5): e0897, 2023 May.
Article in English | MEDLINE | ID: mdl-37151895

ABSTRACT

Hospital early warning systems that use machine learning (ML) to predict clinical deterioration are increasingly being used to aid clinical decision-making. However, it is not known how ML predictions complement physician and nurse judgment. Our objective was to train and validate an ML model to predict patient deterioration and compare model predictions with real-world physician and nurse predictions. DESIGN: Retrospective and prospective cohort study. SETTING: Academic tertiary care hospital. PATIENTS: Adult general internal medicine hospitalizations. MEASUREMENTS AND MAIN RESULTS: We developed and validated a neural network model to predict in-hospital death and ICU admission in 23,528 hospitalizations between April 2011 and April 2019. We then compared model predictions with 3,374 prospectively collected predictions from nurses, residents, and attending physicians about their own patients in 960 hospitalizations between April 30 and August 28, 2019. ML model predictions achieved clinician-level accuracy for predicting ICU admission or death (ML median F1 score 0.32 [interquartile range (IQR) 0.30-0.34], AUC 0.77 [IQR 0.76-0.78]; clinicians median F1 score 0.33 [IQR 0.30-0.35], AUC 0.64 [IQR 0.63-0.66]). ML predictions were more accurate than clinicians for ICU admission. Of all ICU admissions and deaths, 36% occurred in hospitalizations where the model and clinicians disagreed. Combining human and model predictions detected 49% of clinical deterioration events, improving sensitivity by 16% compared with clinicians alone and 24% compared with the model alone while maintaining a positive predictive value of 33%, thus keeping false alarms at a clinically acceptable level. CONCLUSIONS: ML models can complement clinician judgment to predict clinical deterioration in hospital. These findings demonstrate important opportunities for human-computer collaboration to improve prognostication and personalized medicine in hospital.

20.
Sci Adv ; 9(19): eabq0701, 2023 05 10.
Article in English | MEDLINE | ID: mdl-37163590

ABSTRACT

As governments and industry turn to increased use of automated decision systems, it becomes essential to consider how closely such systems can reproduce human judgment. We identify a core potential failure, finding that annotators label objects differently depending on whether they are being asked a factual question or a normative question. This challenges a natural assumption maintained in many standard machine-learning (ML) data acquisition procedures: that there is no difference between predicting the factual classification of an object and an exercise of judgment about whether an object violates a rule premised on those facts. We find that using factual labels to train models intended for normative judgments introduces a notable measurement error. We show that models trained using factual labels yield significantly different judgments than those trained using normative labels and that the impact of this effect on model performance can exceed that of other factors (e.g., dataset size) that routinely attract attention from ML researchers and practitioners.


Subjects
Judgment , Machine Learning , Humans , Government