Pesquisa | BVS - MINISTÉRIO DA SAÚDE

1.

Closing the gap between open source and commercial large language models for medical evidence summarization.

Zhang, Gongbo; Jin, Qiao; Zhou, Yiliang; Wang, Song; Idnay, Betina; Luo, Yiming; Park, Elizabeth; Nestor, Jordan G; Spotnitz, Matthew E; Soroush, Ali; Campion, Thomas R; Lu, Zhiyong; Weng, Chunhua; Peng, Yifan.

NPJ Digit Med ; 7(1): 239, 2024 Sep 09.

Artigo em Inglês | MEDLINE | ID: mdl-39251804

RESUMO

Large language models (LLMs) hold great promise in summarizing medical evidence. Most recent studies focus on the application of proprietary LLMs. Using proprietary LLMs introduces multiple risk factors, including a lack of transparency and vendor dependency. While open-source LLMs allow better transparency and customization, their performance falls short compared to the proprietary ones. In this study, we investigated to what extent fine-tuning open-source LLMs can further improve their performance. Utilizing a benchmark dataset, MedReview, consisting of 8161 pairs of systematic reviews and summaries, we fine-tuned three broadly-used, open-sourced LLMs, namely PRIMERA, LongT5, and Llama-2. Overall, the performance of open-source models was all improved after fine-tuning. The performance of fine-tuned LongT5 is close to GPT-3.5 with zero-shot settings. Furthermore, smaller fine-tuned models sometimes even demonstrated superior performance compared to larger zero-shot models. The above trends of improvement were manifested in both a human evaluation and a larger-scale GPT4-simulated evaluation.

2.

Author Correction: Harnessing the power of longitudinal medical imaging for eye disease prognosis using Transformer-based sequence modeling.

Holste, Gregory; Lin, Mingquan; Zhou, Ruiwen; Wang, Fei; Liu, Lei; Yan, Qi; Van Tassel, Sarah H; Kovacs, Kyle; Chew, Emily Y; Lu, Zhiyong; Wang, Zhangyang; Peng, Yifan.

NPJ Digit Med ; 7(1): 240, 2024 Sep 09.

Artigo em Inglês | MEDLINE | ID: mdl-39251870

3.

Deep learning models to predict primary open-angle glaucoma.

Zhou, Ruiwen; Miller, J Philip; Gordon, Mae; Kass, Michael; Lin, Mingquan; Peng, Yifan; Li, Fuhai; Feng, Jiarui; Liu, Lei.

Stat (Int Stat Inst) ; 13(1)2024.

Artigo em Inglês | MEDLINE | ID: mdl-39220673

RESUMO

Glaucoma is a major cause of blindness and vision impairment worldwide, and visual field (VF) tests are essential for monitoring the conversion of glaucoma. While previous studies have primarily focused on using VF data at a single time point for glaucoma prediction, there has been limited exploration of longitudinal trajectories. Additionally, many deep learning techniques treat the time-to-glaucoma prediction as a binary classification problem (glaucoma Yes/No), resulting in the misclassification of some censored subjects into the nonglaucoma category and decreased power. To tackle these challenges, we propose and implement several deep-learning approaches that naturally incorporate temporal and spatial information from longitudinal VF data to predict time-to-glaucoma. When evaluated on the Ocular Hypertension Treatment Study (OHTS) dataset, our proposed convolutional neural network (CNN)-long short-term memory (LSTM) emerged as the top-performing model among all those examined. The implementation code can be found online (https://github.com/rivenzhou/VF_prediction).

4.

Neoadjuvant chemotherapy with capecitabine combined with oxaliplatin for mid-low locally advanced rectal cancer with negative mesorectal fascia: Long-term outcomes of a prospective trial (PKUCH-R03 trial).

Chen, Nan; Zhao, Minghe; Yao, Yunfeng; Wang, Lin; Peng, Yifan; Sun, Tingting; Zhan, Tiancheng; Zhao, Jun; Wu, Aiwen.

Chin J Cancer Res ; 36(4): 410-420, 2024 Aug 30.

Artigo em Inglês | MEDLINE | ID: mdl-39246707

RESUMO

Objective: To evaluate the safety and efficacy of neoadjuvant chemotherapy (NCT) in mid-low locally advanced rectal cancer with negative mesorectal fascia (MRF). Methods: This prospective, single-arm phase II trial was designed and conducted at Peking University Cancer Hospital. The patients who provided consent received 3 months of NCT (capecitabine and oxaliplatin, CapOX) followed by total mesorectal excision (TME). The primary endpoint was the rate of pathological complete response (pCR). Results: From January 2019 through December 2021, a total of 53 patients were enrolled, 7.5% of whom experienced grade 3-4 adverse events during NCT. The pCR rate was 17.0% for the entire cohort, and the overall rate of postoperative complications was 37.7% (1.9% of grade IIIa patients). The 3-year disease-free survival rate was 91.4%, and 23.5% (12/51) of the patients suffered from major low anterior resection syndrome (LARS). Postoperative complications were independently associated with major LARS. Conclusions: For patients with mid-low rectal cancer with negative MRF, 3 months of NCT were found to yield a favorable tumor response with acceptable toxicity. With fair long-term survival, the NCT regimen could be associated with low rates of perioperative complications as well as acceptable anal function.

5.

Some remarks on the argument appealing to nature against synthetic biology.

Lei, Ruipeng; Peng, Yifan; He, Yutian; Li, Jun.

Front Bioeng Biotechnol ; 12: 1428832, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-39119275

RESUMO

This paper will focus on analyzing the argument with appealing to nature against synthetic biology and provide a counter-argument against it through demonstrating the ambiguity of the concept of nature, denying the existence of a morally significant line between natural and non/unnatural, and disproving the allegations against synthetic biology raised by the argument appealing to nature. The paper consists of two parts following a brief introduction. The first part will describe the argument appealing to nature against synthetic biology, and identify the deficiencies of the argument per se, e.g., the ambiguity of the concept 'nature'; and the problems in the morally significant line between the natural and the non/unnatural. The second part will discuss the allegations to synthetic biology stemming from this argument, e.g., committing metaphysical and ethical mistakes, and doing possible harms to the environment.

6.

Large language models in biomedicine and health: current research landscape and future directions.

Lu, Zhiyong; Peng, Yifan; Cohen, Trevor; Ghassemi, Marzyeh; Weng, Chunhua; Tian, Shubo.

J Am Med Inform Assoc ; 31(9): 1801-1811, 2024 Sep 01.

Artigo em Inglês | MEDLINE | ID: mdl-39169867

Assuntos

Pesquisa Biomédica , Processamento de Linguagem Natural , Humanos , Informática Médica

7.

Call for papers: Special issue on biomedical multimodal large language models - novel approaches and applications.

Bian, Jiang; Peng, Yifan; Mendonca, Eneida; Banerjee, Imon; Xu, Hua.

J Biomed Inform ; 157: 104703, 2024 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-39111608

Assuntos

Processamento de Linguagem Natural , Humanos , Informática Médica/métodos

8.

Improving Fairness of Automated Chest Radiograph Diagnosis by Contrastive Learning.

Lin, Mingquan; Li, Tianhao; Sun, Zhaoyi; Holste, Gregory; Ding, Ying; Wang, Fei; Shih, George; Peng, Yifan.

Radiol Artif Intell ; : e230342, 2024 Aug 21.

Artigo em Inglês | MEDLINE | ID: mdl-39166973

RESUMO

"Just Accepted" papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content. Purpose To develop an artificial intelligence model that utilizes supervised contrastive learning to minimize bias in chest radiograph (CXR) diagnosis. Materials and Methods In this retrospective study, the proposed method was evaluated on two datasets: the Medical Imaging and Data Resource Center (MIDRC) dataset with 77,887 CXRs from 27,796 patients collected as of April 20, 2023 for COVID-19 diagnosis, and the NIH Chest x-ray 14 (NIH-CXR) dataset with 112,120 CXRs from 30,805 patients collected between 1992 and 2015. In the NIH-CXR dataset, thoracic abnormalities included atelectasis, cardiomegaly, effusion, infiltration, mass, nodule, pneumonia, pneumothorax, consolidation, edema, emphysema, fibrosis, pleural thickening, or hernia. The proposed method utilized supervised contrastive learning with carefully selected positive and negative samples to generate fair image embeddings, which were fine-tuned for subsequent tasks to reduce bias in CXR diagnosis. The method was evaluated using the marginal area under the receiver operating characteristic curve (AUC) difference (ΔmAUC). Results The proposed model showed a significant decrease in bias across all subgroups compared with the baseline models, as evidenced by a paired T-test (P < .001). The ΔmAUCs obtained by the proposed method were 0.01 (95% CI, 0.01-0.01), 0.21 (95% CI, 0.21-0.21), and 0.10 (95% CI, 0.10-0.10) for sex, race, and age subgroups, respectively, on MIDRC, and 0.01 (95% CI, 0.01-0.01) and 0.05 (95% CI, 0.05-0.05) for sex and age subgroups, respectively, on NIH-CXR. Conclusion Employing supervised contrastive learning can mitigate bias in CXR diagnosis, addressing concerns of fairness and reliability in deep learning-based diagnostic methods. ©RSNA, 2024.

9.

Harnessing the power of longitudinal medical imaging for eye disease prognosis using Transformer-based sequence modeling.

Holste, Gregory; Lin, Mingquan; Zhou, Ruiwen; Wang, Fei; Liu, Lei; Yan, Qi; Van Tassel, Sarah H; Kovacs, Kyle; Chew, Emily Y; Lu, Zhiyong; Wang, Zhangyang; Peng, Yifan.

NPJ Digit Med ; 7(1): 216, 2024 Aug 16.

Artigo em Inglês | MEDLINE | ID: mdl-39152209

RESUMO

Deep learning has enabled breakthroughs in automated diagnosis from medical imaging, with many successful applications in ophthalmology. However, standard medical image classification approaches only assess disease presence at the time of acquisition, neglecting the common clinical setting of longitudinal imaging. For slow, progressive eye diseases like age-related macular degeneration (AMD) and primary open-angle glaucoma (POAG), patients undergo repeated imaging over time to track disease progression and forecasting the future risk of developing a disease is critical to properly plan treatment. Our proposed Longitudinal Transformer for Survival Analysis (LTSA) enables dynamic disease prognosis from longitudinal medical imaging, modeling the time to disease from sequences of fundus photography images captured over long, irregular time periods. Using longitudinal imaging data from the Age-Related Eye Disease Study (AREDS) and Ocular Hypertension Treatment Study (OHTS), LTSA significantly outperformed a single-image baseline in 19/20 head-to-head comparisons on late AMD prognosis and 18/20 comparisons on POAG prognosis. A temporal attention analysis also suggested that, while the most recent image is typically the most influential, prior imaging still provides additional prognostic value.

10.

Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine.

Jin, Qiao; Chen, Fangyuan; Zhou, Yiliang; Xu, Ziyang; Cheung, Justin M; Chen, Robert; Summers, Ronald M; Rousseau, Justin F; Ni, Peiyun; Landsman, Marc J; Baxter, Sally L; Al'Aref, Subhi J; Li, Yijia; Chen, Alexander; Brejt, Josef A; Chiang, Michael F; Peng, Yifan; Lu, Zhiyong.

NPJ Digit Med ; 7(1): 190, 2024 Jul 23.

Artigo em Inglês | MEDLINE | ID: mdl-39043988

RESUMO

Recent studies indicate that Generative Pre-trained Transformer 4 with Vision (GPT-4V) outperforms human physicians in medical challenge tasks. However, these evaluations primarily focused on the accuracy of multi-choice questions alone. Our study extends the current scope by conducting a comprehensive analysis of GPT-4V's rationales of image comprehension, recall of medical knowledge, and step-by-step multimodal reasoning when solving New England Journal of Medicine (NEJM) Image Challenges-an imaging quiz designed to test the knowledge and diagnostic capabilities of medical professionals. Evaluation results confirmed that GPT-4V performs comparatively to human physicians regarding multi-choice accuracy (81.6% vs. 77.8%). GPT-4V also performs well in cases where physicians incorrectly answer, with over 78% accuracy. However, we discovered that GPT-4V frequently presents flawed rationales in cases where it makes the correct final choices (35.5%), most prominent in image comprehension (27.2%). Regardless of GPT-4V's high accuracy in multi-choice questions, our findings emphasize the necessity for further in-depth evaluations of its rationales before integrating such multimodal AI models into clinical workflows.

11.

The expression and clinical significance of CFAP65 in colon cancer.

Li, Yunze; Ran, Dongmei; Basnet, Shiva; Zhang, Buzhe; Pei, Hongjing; Dan, Chenchen; Zhang, Zixuan; Zhang, Liang; Lu, Tianyu; Peng, Yifan; Du, Changzheng.

BMC Gastroenterol ; 24(1): 222, 2024 Jul 11.

Artigo em Inglês | MEDLINE | ID: mdl-38992586

RESUMO

BACKGROUND: CFAP65 (cilia and flagella associated protein 65) is a fundamental protein in the development and formation of ciliated flagella, but few studies have focused on its role in cancer. This study aimed to investigate the prognostic significance of CFAP65 in colon cancer. METHODS: The functionally enriched genes related to CFAP65 were analyzed through the Gene Ontology (GO) database. Subsequently, CFAP65 expression levels in colon cancer were evaluated by reverse transcription and quantitative polymerase chain reaction (RT-qPCR) and immunoblotting in 20 pairs of frozen samples, including tumors and their matched paratumor tissue. Furthermore, protein expression of CFAP65 in 189 colon cancer patients were assessed via immunohistochemical staining. The correlations between CFAP65 expression and clinical features as well as long-term survival were statistically analyzed. RESULTS: CFAP65-related genes are significantly enriched on cellular processes of cell motility, ion channels, and GTPase-associated signaling. The expression of CFAP65 was significantly higher in colon cancer tissue compared to paratumor tissue. The proportion of high expression and low expression of CFAP65 in the clinical samples of colon cancer were 61.9% and 38.1%, respectively, and its expression level was not associated with the clinical parameters including gender, age, tumor location, histological differentiation, tumor stage, vascular invasion and mismatch repair deficiency. The five-year disease-free survival rate of the patients with CFAP65 low expression tumors was significantly lower than that those with high expression tumors (56.9% vs. 72.6%, P = 0.03), but the overall survival rate has no significant difference (69% vs. 78.6%, P = 0.171). The cox hazard regression analysis model showed that CFAP65 expression, tumor stage and tumor location were independent prognostic factors. CONCLUSIONS: In conclusion, we demonstrate CFAP65 is a potential predictive marker for tumor progression in colon cancer.

Assuntos

Biomarcadores Tumorais , Neoplasias do Colo , Humanos , Neoplasias do Colo/genética , Neoplasias do Colo/patologia , Neoplasias do Colo/metabolismo , Neoplasias do Colo/mortalidade , Masculino , Feminino , Pessoa de Meia-Idade , Prognóstico , Biomarcadores Tumorais/genética , Biomarcadores Tumorais/metabolismo , Idoso , Proteínas dos Microfilamentos/genética , Proteínas dos Microfilamentos/metabolismo , Relevância Clínica , Proteínas de Membrana , Proteínas de Neoplasias

12.

Error-compensation network for ringing artifact reduction in holographic displays.

Yuan, Ganzhangqin; Zhou, Mi; Peng, Yifan; Chen, Muku; Geng, Zihan.

Opt Lett ; 49(11): 3210-3213, 2024 Jun 01.

Artigo em Inglês | MEDLINE | ID: mdl-38824365

RESUMO

Recent advances in learning-based computer-generated holography (CGH) have unlocked novel possibilities for crafting phase-only holograms. However, existing approaches primarily focus on the learning ability of network modules, often neglecting the impact of diffraction propagation models. The resulting ringing artifacts, emanating from the Gibbs phenomenon in the propagation model, can degrade the quality of reconstructed holographic images. To this end, we explore a diffraction propagation error-compensation network that can be easily integrated into existing CGH methods. This network is designed to correct propagation errors by predicting residual values, thereby aligning the diffraction process closely with an ideal state and easing the learning burden of the network. Simulations and optical experiments demonstrate that our method, when applied to state-of-the-art HoloNet and CCNN, achieves PSNRs of up to 32.47 dB and 29.53 dB, respectively, surpassing baseline methods by 3.89 dB and 0.62 dB. Additionally, real-world experiments have confirmed a significant reduction in ringing artifacts. We envision this approach being applied to a variety of CGH algorithms, paving the way for improved holographic displays.

13.

Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challenge.

Holste, Gregory; Zhou, Yiliang; Wang, Song; Jaiswal, Ajay; Lin, Mingquan; Zhuge, Sherry; Yang, Yuzhe; Kim, Dongkyun; Nguyen-Mau, Trong-Hieu; Tran, Minh-Triet; Jeong, Jaehyup; Park, Wongi; Ryu, Jongbin; Hong, Feng; Verma, Arsh; Yamagishi, Yosuke; Kim, Changhyun; Seo, Hyeryeong; Kang, Myungjoo; Celi, Leo Anthony; Lu, Zhiyong; Summers, Ronald M; Shih, George; Wang, Zhangyang; Peng, Yifan.

Med Image Anal ; 97: 103224, 2024 Oct.

Artigo em Inglês | MEDLINE | ID: mdl-38850624

RESUMO

Many real-world image recognition problems, such as diagnostic medical imaging exams, are "long-tailed" - there are a few common findings followed by many more relatively rare conditions. In chest radiography, diagnosis is both a long-tailed and multi-label problem, as patients often present with multiple findings simultaneously. While researchers have begun to study the problem of long-tailed learning in medical image recognition, few have studied the interaction of label imbalance and label co-occurrence posed by long-tailed, multi-label disease classification. To engage with the research community on this emerging topic, we conducted an open challenge, CXR-LT, on long-tailed, multi-label thorax disease classification from chest X-rays (CXRs). We publicly release a large-scale benchmark dataset of over 350,000 CXRs, each labeled with at least one of 26 clinical findings following a long-tailed distribution. We synthesize common themes of top-performing solutions, providing practical recommendations for long-tailed, multi-label medical image classification. Finally, we use these insights to propose a path forward involving vision-language foundation models for few- and zero-shot disease classification.

Assuntos

Radiografia Torácica , Humanos , Radiografia Torácica/métodos , Interpretação de Imagem Radiográfica Assistida por Computador/métodos , Doenças Torácicas/diagnóstico por imagem , Doenças Torácicas/classificação , Algoritmos

14.

Erratum for: Evaluating GPT-4V (GPT-4 with Vision) on Detection of Radiologic Findings on Chest Radiographs.

Zhou, Yiliang; Ong, Hanley; Kennedy, Patrick; Wu, Carol C; Kazam, Jacob; Hentel, Keith; Flanders, Adam; Shih, George; Peng, Yifan.

Radiology ; 311(2): e249016, 2024 May.

Artigo em Inglês | MEDLINE | ID: mdl-38805735

15.

Full-colour 3D holographic augmented-reality displays with metasurface waveguides.

Gopakumar, Manu; Lee, Gun-Yeal; Choi, Suyeon; Chao, Brian; Peng, Yifan; Kim, Jonghyun; Wetzstein, Gordon.

Nature ; 629(8013): 791-797, 2024 May.

Artigo em Inglês | MEDLINE | ID: mdl-38720077

RESUMO

Emerging spatial computing systems seamlessly superimpose digital information on the physical environment observed by a user, enabling transformative experiences across various domains, such as entertainment, education, communication and training1-3. However, the widespread adoption of augmented-reality (AR) displays has been limited due to the bulky projection optics of their light engines and their inability to accurately portray three-dimensional (3D) depth cues for virtual content, among other factors4,5. Here we introduce a holographic AR system that overcomes these challenges using a unique combination of inverse-designed full-colour metasurface gratings, a compact dispersion-compensating waveguide geometry and artificial-intelligence-driven holography algorithms. These elements are co-designed to eliminate the need for bulky collimation optics between the spatial light modulator and the waveguide and to present vibrant, full-colour, 3D AR content in a compact device form factor. To deliver unprecedented visual quality with our prototype, we develop an innovative image formation model that combines a physically accurate waveguide model with learned components that are automatically calibrated using camera feedback. Our unique co-design of a nanophotonic metasurface waveguide and artificial-intelligence-driven holographic algorithms represents a significant advancement in creating visually compelling 3D AR experiences in a compact wearable device.

16.

Evaluating GPT-V4 (GPT-4 with Vision) on Detection of Radiologic Findings on Chest Radiographs.

Zhou, Yiliang; Ong, Hanley; Kennedy, Patrick; Wu, Carol C; Kazam, Jacob; Hentel, Keith; Flanders, Adam; Shih, George; Peng, Yifan.

Radiology ; 311(2): e233270, 2024 05.

Artigo em Inglês | MEDLINE | ID: mdl-38713028

RESUMO

Background Generating radiologic findings from chest radiographs is pivotal in medical image analysis. The emergence of OpenAI's generative pretrained transformer, GPT-4 with vision (GPT-4V), has opened new perspectives on the potential for automated image-text pair generation. However, the application of GPT-4V to real-world chest radiography is yet to be thoroughly examined. Purpose To investigate the capability of GPT-4V to generate radiologic findings from real-world chest radiographs. Materials and Methods In this retrospective study, 100 chest radiographs with free-text radiology reports were annotated by a cohort of radiologists, two attending physicians and three residents, to establish a reference standard. Of 100 chest radiographs, 50 were randomly selected from the National Institutes of Health (NIH) chest radiographic data set, and 50 were randomly selected from the Medical Imaging and Data Resource Center (MIDRC). The performance of GPT-4V at detecting imaging findings from each chest radiograph was assessed in the zero-shot setting (where it operates without prior examples) and few-shot setting (where it operates with two examples). Its outcomes were compared with the reference standard with regards to clinical conditions and their corresponding codes in the International Statistical Classification of Diseases, Tenth Revision (ICD-10), including the anatomic location (hereafter, laterality). Results In the zero-shot setting, in the task of detecting ICD-10 codes alone, GPT-4V attained an average positive predictive value (PPV) of 12.3%, average true-positive rate (TPR) of 5.8%, and average F1 score of 7.3% on the NIH data set, and an average PPV of 25.0%, average TPR of 16.8%, and average F1 score of 18.2% on the MIDRC data set. When both the ICD-10 codes and their corresponding laterality were considered, GPT-4V produced an average PPV of 7.8%, average TPR of 3.5%, and average F1 score of 4.5% on the NIH data set, and an average PPV of 10.9%, average TPR of 4.9%, and average F1 score of 6.4% on the MIDRC data set. With few-shot learning, GPT-4V showed improved performance on both data sets. When contrasting zero-shot and few-shot learning, there were improved average TPRs and F1 scores in the few-shot setting, but there was not a substantial increase in the average PPV. Conclusion Although GPT-4V has shown promise in understanding natural images, it had limited effectiveness in interpreting real-world chest radiographs. © RSNA, 2024 Supplemental material is available for this article.

Assuntos

Radiografia Torácica , Humanos , Radiografia Torácica/métodos , Estudos Retrospectivos , Feminino , Masculino , Pessoa de Meia-Idade , Interpretação de Imagem Radiográfica Assistida por Computador/métodos , Idoso , Adulto

17.

Deep learning with noisy labels in medical prediction problems: a scoping review.

Wei, Yishu; Deng, Yu; Sun, Cong; Lin, Mingquan; Jiang, Hongmei; Peng, Yifan.

J Am Med Inform Assoc ; 31(7): 1596-1607, 2024 Jun 20.

Artigo em Inglês | MEDLINE | ID: mdl-38814164

RESUMO

OBJECTIVES: Medical research faces substantial challenges from noisy labels attributed to factors like inter-expert variability and machine-extracted labels. Despite this, the adoption of label noise management remains limited, and label noise is largely ignored. To this end, there is a critical need to conduct a scoping review focusing on the problem space. This scoping review aims to comprehensively review label noise management in deep learning-based medical prediction problems, which includes label noise detection, label noise handling, and evaluation. Research involving label uncertainty is also included. METHODS: Our scoping review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. We searched 4 databases, including PubMed, IEEE Xplore, Google Scholar, and Semantic Scholar. Our search terms include "noisy label AND medical/healthcare/clinical," "uncertainty AND medical/healthcare/clinical," and "noise AND medical/healthcare/clinical." RESULTS: A total of 60 papers met inclusion criteria between 2016 and 2023. A series of practical questions in medical research are investigated. These include the sources of label noise, the impact of label noise, the detection of label noise, label noise handling techniques, and their evaluation. Categorization of both label noise detection methods and handling techniques are provided. DISCUSSION: From a methodological perspective, we observe that the medical community has been up to date with the broader deep-learning community, given that most techniques have been evaluated on medical data. We recommend considering label noise as a standard element in medical research, even if it is not dedicated to handling noisy labels. Initial experiments can start with easy-to-implement methods, such as noise-robust loss functions, weighting, and curriculum learning.

Assuntos

Aprendizado Profundo , Humanos , Pesquisa Biomédica

18.

A survey of recent methods for addressing AI fairness and bias in biomedicine.

Yang, Yifan; Lin, Mingquan; Zhao, Han; Peng, Yifan; Huang, Furong; Lu, Zhiyong.

J Biomed Inform ; 154: 104646, 2024 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-38677633

RESUMO

OBJECTIVES: Artificial intelligence (AI) systems have the potential to revolutionize clinical practices, including improving diagnostic accuracy and surgical decision-making, while also reducing costs and manpower. However, it is important to recognize that these systems may perpetuate social inequities or demonstrate biases, such as those based on race or gender. Such biases can occur before, during, or after the development of AI models, making it critical to understand and address potential biases to enable the accurate and reliable application of AI models in clinical settings. To mitigate bias concerns during model development, we surveyed recent publications on different debiasing methods in the fields of biomedical natural language processing (NLP) or computer vision (CV). Then we discussed the methods, such as data perturbation and adversarial learning, that have been applied in the biomedical domain to address bias. METHODS: We performed our literature search on PubMed, ACM digital library, and IEEE Xplore of relevant articles published between January 2018 and December 2023 using multiple combinations of keywords. We then filtered the result of 10,041 articles automatically with loose constraints, and manually inspected the abstracts of the remaining 890 articles to identify the 55 articles included in this review. Additional articles in the references are also included in this review. We discuss each method and compare its strengths and weaknesses. Finally, we review other potential methods from the general domain that could be applied to biomedicine to address bias and improve fairness. RESULTS: The bias of AIs in biomedicine can originate from multiple sources such as insufficient data, sampling bias and the use of health-irrelevant features or race-adjusted algorithms. Existing debiasing methods that focus on algorithms can be categorized into distributional or algorithmic. Distributional methods include data augmentation, data perturbation, data reweighting methods, and federated learning. Algorithmic approaches include unsupervised representation learning, adversarial learning, disentangled representation learning, loss-based methods and causality-based methods.

Assuntos

Inteligência Artificial , Viés , Processamento de Linguagem Natural , Humanos , Inquéritos e Questionários , Aprendizado de Máquina , Algoritmos

19.

Leveraging generative AI for clinical evidence synthesis needs to ensure trustworthiness.

Zhang, Gongbo; Jin, Qiao; Jered McInerney, Denis; Chen, Yong; Wang, Fei; Cole, Curtis L; Yang, Qian; Wang, Yanshan; Malin, Bradley A; Peleg, Mor; Wallace, Byron C; Lu, Zhiyong; Weng, Chunhua; Peng, Yifan.

J Biomed Inform ; 153: 104640, 2024 May.

Artigo em Inglês | MEDLINE | ID: mdl-38608915

RESUMO

Evidence-based medicine promises to improve the quality of healthcare by empowering medical decisions and practices with the best available evidence. The rapid growth of medical evidence, which can be obtained from various sources, poses a challenge in collecting, appraising, and synthesizing the evidential information. Recent advancements in generative AI, exemplified by large language models, hold promise in facilitating the arduous task. However, developing accountable, fair, and inclusive models remains a complicated undertaking. In this perspective, we discuss the trustworthiness of generative AI in the context of automated summarization of medical evidence.

Assuntos

Inteligência Artificial , Medicina Baseada em Evidências , Humanos , Confiança , Processamento de Linguagem Natural

20.

Identifying social determinants of health from clinical narratives: A study of performance, documentation ratio, and potential bias.

Yu, Zehao; Peng, Cheng; Yang, Xi; Dang, Chong; Adekkanattu, Prakash; Gopal Patra, Braja; Peng, Yifan; Pathak, Jyotishman; Wilson, Debbie L; Chang, Ching-Yuan; Lo-Ciganic, Wei-Hsuan; George, Thomas J; Hogan, William R; Guo, Yi; Bian, Jiang; Wu, Yonghui.

J Biomed Inform ; 153: 104642, 2024 May.

Artigo em Inglês | MEDLINE | ID: mdl-38621641

RESUMO

OBJECTIVE: To develop a natural language processing (NLP) package to extract social determinants of health (SDoH) from clinical narratives, examine the bias among race and gender groups, test the generalizability of extracting SDoH for different disease groups, and examine population-level extraction ratio. METHODS: We developed SDoH corpora using clinical notes identified at the University of Florida (UF) Health. We systematically compared 7 transformer-based large language models (LLMs) and developed an open-source package - SODA (i.e., SOcial DeterminAnts) to facilitate SDoH extraction from clinical narratives. We examined the performance and potential bias of SODA for different race and gender groups, tested the generalizability of SODA using two disease domains including cancer and opioid use, and explored strategies for improvement. We applied SODA to extract 19 categories of SDoH from the breast (n = 7,971), lung (n = 11,804), and colorectal cancer (n = 6,240) cohorts to assess patient-level extraction ratio and examine the differences among race and gender groups. RESULTS: We developed an SDoH corpus using 629 clinical notes of cancer patients with annotations of 13,193 SDoH concepts/attributes from 19 categories of SDoH, and another cross-disease validation corpus using 200 notes from opioid use patients with 4,342 SDoH concepts/attributes. We compared 7 transformer models and the GatorTron model achieved the best mean average strict/lenient F1 scores of 0.9122 and 0.9367 for SDoH concept extraction and 0.9584 and 0.9593 for linking attributes to SDoH concepts. There is a small performance gap (â¼4%) between Males and Females, but a large performance gap (>16 %) among race groups. The performance dropped when we applied the cancer SDoH model to the opioid cohort; fine-tuning using a smaller opioid SDoH corpus improved the performance. The extraction ratio varied in the three cancer cohorts, in which 10 SDoH could be extracted from over 70 % of cancer patients, but 9 SDoH could be extracted from less than 70 % of cancer patients. Individuals from the White and Black groups have a higher extraction ratio than other minority race groups. CONCLUSIONS: Our SODA package achieved good performance in extracting 19 categories of SDoH from clinical narratives. The SODA package with pre-trained transformer models is available at https://github.com/uf-hobi-informatics-lab/SODA_Docker.

Assuntos

Narração , Processamento de Linguagem Natural , Determinantes Sociais da Saúde , Humanos , Feminino , Masculino , Viés , Registros Eletrônicos de Saúde , Documentação/métodos , Mineração de Dados/métodos

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA