Results 1 - 20 of 323
2.
Med Image Anal; 97: 103224, 2024 May 31.
Article in English | MEDLINE | ID: mdl-38850624

ABSTRACT

Many real-world image recognition problems, such as diagnostic medical imaging exams, are "long-tailed": there are a few common findings, followed by many more relatively rare conditions. In chest radiography, diagnosis is both a long-tailed and multi-label problem, as patients often present with multiple findings simultaneously. While researchers have begun to study the problem of long-tailed learning in medical image recognition, few have studied the interaction of label imbalance and label co-occurrence posed by long-tailed, multi-label disease classification. To engage the research community on this emerging topic, we conducted an open challenge, CXR-LT, on long-tailed, multi-label thorax disease classification from chest X-rays (CXRs). We publicly release a large-scale benchmark dataset of over 350,000 CXRs, each labeled with at least one of 26 clinical findings following a long-tailed distribution. We synthesize common themes of top-performing solutions, providing practical recommendations for long-tailed, multi-label medical image classification. Finally, we use these insights to propose a path forward involving vision-language foundation models for few- and zero-shot disease classification.
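The abstract does not prescribe a training recipe, but a common first-line remedy for long-tailed multi-label classification is frequency-based loss reweighting. A minimal PyTorch sketch, where the label matrix and weighting scheme are our own illustrative assumptions, not the challenge's method:

```python
import torch
import torch.nn as nn

# Hypothetical label matrix: N studies x 26 findings, multi-label and
# long-tailed, standing in for CXR-LT-style annotations.
labels = (torch.rand(1000, 26) < 0.05).float()

# Inverse-frequency positive weights: rare findings receive a larger
# pos_weight, counteracting the long tail in the per-class BCE loss.
pos_counts = labels.sum(dim=0).clamp(min=1)
neg_counts = labels.shape[0] - pos_counts
criterion = nn.BCEWithLogitsLoss(pos_weight=neg_counts / pos_counts)

logits = torch.randn(1000, 26)  # stand-in for model outputs
loss = criterion(logits, labels)
```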

3.
ArXiv; 2024 Feb 05.
Article in English | MEDLINE | ID: mdl-38903741

ABSTRACT

Searching for a related article based on a reference article is an integral part of scientific research. PubMed, like many academic search engines, has a "similar articles" feature that recommends articles relevant to the one a user is currently viewing. Explaining recommended items can be of great utility to users, particularly in the literature search process. With more than a million biomedical papers published each year, explaining the recommended similar articles would help researchers and clinicians search for related articles. Nonetheless, the majority of current literature recommendation systems lack explanations for their suggestions. We employ a post hoc approach to explaining recommendations by identifying relevant tokens in the titles of similar articles. Our major contribution is building PubCLogs by repurposing 5.6 million pairs of co-clicked articles from PubMed's user query logs. Using our PubCLogs dataset, we train Highlight Similar Article Title (HSAT), a transformer-based model designed to select the most relevant parts of the title of a similar article based on the title and abstract of a seed article. HSAT demonstrates strong performance in our empirical evaluations, achieving an F1 score of 91.72 percent on the PubCLogs test set, considerably outperforming several baselines including BM25 (70.62), MPNet (67.11), MedCPT (62.22), GPT-3.5 (46.00), and GPT-4 (64.89). Additional evaluations on a separate, manually annotated test set further verify HSAT's performance. Moreover, participants in our user study indicated a preference for HSAT, due to its superior balance between conciseness and comprehensiveness. Our study suggests that repurposing the user query logs of academic search engines is a promising way to train state-of-the-art models for explaining literature recommendations.
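Beyond "transformer-based token selection", HSAT's architecture is not spelled out here; one plausible reading is token classification over a (seed article, candidate title) sentence pair. A hedged sketch, where the backbone and input packing are assumptions rather than the released model (the classification head below is untrained and would need fine-tuning on PubCLogs-style data):

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Assumed backbone; the actual HSAT base model is not specified here.
name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name, num_labels=2)

seed = "Seed article title. Seed article abstract text ..."    # hypothetical
candidate = "Candidate similar-article title to be highlighted"

# Pack seed and candidate as a sentence pair; the head scores each token
# of the candidate title as relevant (1) or not (0).
enc = tokenizer(seed, candidate, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**enc).logits        # shape: (1, seq_len, 2)
highlight = logits.argmax(-1)[0]        # 1 marks tokens worth highlighting
```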

4.
ArXiv; 2024 Feb 17.
Article in English | MEDLINE | ID: mdl-38903745

ABSTRACT

In radiology, Artificial Intelligence (AI) has significantly advanced report generation, but automatic evaluation of these AI-produced reports remains challenging. Current metrics, such as conventional Natural Language Generation (NLG) metrics and Clinical Efficacy (CE) measures, often either fall short of capturing the semantic intricacies of clinical contexts or overemphasize clinical details, undermining report clarity. To overcome these issues, our proposed method synergizes the expertise of professional radiologists with Large Language Models (LLMs), such as GPT-3.5 and GPT-4. Utilizing In-Context Instruction Learning (ICIL) and Chain of Thought (CoT) reasoning, our approach aligns LLM evaluations with radiologist standards, enabling detailed comparisons between human-written and AI-generated reports. This is further enhanced by a regression model that aggregates sentence-level evaluation scores. Experimental results show that our "Detailed GPT-4 (5-shot)" model achieves a score of 0.48, outperforming the METEOR metric by 0.19, while our "Regressed GPT-4" model shows even greater alignment with expert evaluations, exceeding the best existing metric by a margin of 0.35. Moreover, the robustness of our explanations has been validated through a thorough iterative strategy. We plan to publicly release annotations from radiology experts, setting a new standard for accuracy in future assessments. This underscores the potential of our approach in enhancing the quality assessment of AI-driven medical reports.
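As a toy illustration of the regression step, sentence-level evaluation scores can be aggregated into a report-level score with ordinary least squares; the features and numbers below are invented for illustration, not the study's data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical per-report features derived from sentence-level LLM
# evaluations: mean score, min score, fraction of flagged sentences.
X = np.array([[4.2, 2.0, 0.10],
              [3.1, 1.0, 0.35],
              [4.8, 4.0, 0.00],
              [2.5, 1.0, 0.50]])
y = np.array([4.0, 2.5, 5.0, 2.0])   # expert report-level scores

# The regression aggregates sentence evaluations into one report score.
reg = LinearRegression().fit(X, y)
print(reg.predict([[4.0, 3.0, 0.05]]))
```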

5.
ArXiv; 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38903736

ABSTRACT

Expert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases, but it cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED, the Enzyme Chemistry Relation Extraction Dataset, a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) methods, such as (large) language models, that can assist enzyme curation. EnzChemRED consists of 1,210 expert-curated PubMed abstracts in which enzymes and the chemical reactions they catalyze are annotated using identifiers from the UniProt Knowledgebase (UniProtKB) and the ontology of Chemical Entities of Biological Interest (ChEBI). We show that fine-tuning pre-trained language models with EnzChemRED can significantly boost their ability to identify mentions of proteins and chemicals in text (Named Entity Recognition, or NER) and to extract the chemical conversions in which they participate (Relation Extraction, or RE), with an average F1 score of 86.30% for NER, 86.66% for RE on chemical conversion pairs, and 83.79% for RE on chemical conversion pairs with linked enzymes. We combine the best-performing methods after fine-tuning on EnzChemRED to create an end-to-end pipeline for knowledge extraction from text and apply it to abstracts at PubMed scale, creating a draft map of enzyme functions in the literature to guide curation efforts in UniProtKB and the reaction knowledgebase Rhea. The EnzChemRED corpus is freely available at https://ftp.expasy.org/databases/rhea/nlp/.
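The abstract does not fix a model architecture for the RE step; one standard formulation is sequence classification over text with inline entity markers. A hedged sketch under that assumption (backbone, markers, and label set are ours, and the head shown is untrained, to be fine-tuned on EnzChemRED-style pairs):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed backbone and binary label set; entity markers are illustrative.
name = "dmis-lab/biobert-base-cased-v1.2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

# RE cast as sequence classification: does the marked enzyme catalyze the
# marked chemical conversion?
text = ("The enzyme <e1>hexokinase</e1> catalyzes the conversion of "
        "<e2>glucose</e2> to glucose 6-phosphate.")
enc = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = model(**enc).logits.softmax(-1)  # [P(no relation), P(relation)]
```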

6.
ArXiv; 2024 May 25.
Article in English | MEDLINE | ID: mdl-38903746

ABSTRACT

Gene set knowledge discovery is essential for advancing human functional genomics. Recent studies have shown promising performance by harnessing the power of Large Language Models (LLMs) on this task. Nonetheless, their results are subject to several limitations common to LLMs, such as hallucinations. In response, we present GeneAgent, a first-of-its-kind language agent featuring a self-verification capability. It autonomously interacts with various biological databases and leverages relevant domain knowledge to improve accuracy and reduce hallucinations. Benchmarked on 1,106 gene sets from different sources, GeneAgent consistently outperforms standard GPT-4 by a significant margin. Moreover, a detailed manual review confirms the effectiveness of the self-verification module in minimizing hallucinations and generating more reliable analytical narratives. To demonstrate its practical utility, we applied GeneAgent to seven novel gene sets derived from mouse B2905 melanoma cell lines; expert evaluations show that GeneAgent offers novel insights into gene functions and thereby expedites knowledge discovery.
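The abstract describes self-verification only at a high level; a schematic of the claim-check-revise cycle might look like the following, where every function is a hypothetical placeholder rather than GeneAgent's released interface:

```python
# Minimal sketch of a self-verification agent loop in the spirit described
# above. All functions are stand-ins, not GeneAgent's actual API.

def llm_generate(prompt: str) -> str:
    return "Claim A. Claim B"           # stand-in for an LLM call

def query_bio_database(claim: str) -> list:
    return ["evidence-id"]              # stand-in for a database lookup

def analyze_gene_set(genes, max_rounds=3):
    narrative = llm_generate(f"Describe the shared function of: {genes}")
    for _ in range(max_rounds):
        claims = [c for c in narrative.split(". ") if c]
        evidence = {c: query_bio_database(c) for c in claims}
        unsupported = [c for c, e in evidence.items() if not e]
        if not unsupported:             # every claim found database support
            return narrative
        narrative = llm_generate(       # self-verification: revise and retry
            f"Revise, correcting unsupported claims {unsupported}: {narrative}")
    return narrative

print(analyze_gene_set(["TP53", "MDM2", "CDKN1A"]))
```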

8.
Nucleic Acids Res; 52(W1): W540-W546, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38572754

ABSTRACT

PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases and chemicals. It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text articles from the PMC open access subset, updated weekly. PubTator 3.0's online interface and API utilize these precomputed entity relations and synonyms to provide advanced search capabilities and enable large-scale analyses, streamlining many complex information needs. We showcase the retrieval quality of PubTator 3.0 using a series of entity pair queries, demonstrating that PubTator 3.0 retrieves a greater number of articles than either PubMed or Google Scholar, with higher precision in the top 20 results. We further show that integrating ChatGPT (GPT-4) with PubTator APIs dramatically improves the factuality and verifiability of its responses. In summary, PubTator 3.0 offers a comprehensive set of features and tools that allow researchers to navigate the ever-expanding wealth of biomedical literature, expediting research and unlocking valuable insights for scientific discovery.
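As a usage illustration, precomputed annotations can be fetched over the API; the endpoint path and response layout below are assumptions to verify against the PubTator 3.0 documentation before use:

```python
import requests

# Endpoint path and payload layout are assumptions; check them against
# https://www.ncbi.nlm.nih.gov/research/pubtator3/ before relying on this.
url = ("https://www.ncbi.nlm.nih.gov/research/pubtator3-api/"
       "publications/export/biocjson")
resp = requests.get(url, params={"pmids": "38572754"}, timeout=30)
resp.raise_for_status()

# BioC-JSON documents carry per-passage entity annotations
# (gene, disease, chemical, variant, ...).
for doc in resp.json().get("PubTator3", []):
    for passage in doc.get("passages", []):
        for ann in passage.get("annotations", []):
            print(ann["infons"].get("type"), "->", ann["text"])
```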


Subject(s)
PubMed, Artificial Intelligence, Humans, Software, Data Mining/methods, Semantics, Internet
9.
J Biomed Inform; 153: 104640, 2024 May.
Article in English | MEDLINE | ID: mdl-38608915

ABSTRACT

Evidence-based medicine promises to improve the quality of healthcare by empowering medical decisions and practices with the best available evidence. The rapid growth of medical evidence, which can be obtained from various sources, poses a challenge in collecting, appraising, and synthesizing the evidential information. Recent advancements in generative AI, exemplified by large language models, hold promise in facilitating the arduous task. However, developing accountable, fair, and inclusive models remains a complicated undertaking. In this perspective, we discuss the trustworthiness of generative AI in the context of automated summarization of medical evidence.


Subject(s)
Artificial Intelligence, Evidence-Based Medicine, Humans, Trust, Natural Language Processing
10.
J Med Internet Res; 26: e56655, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38630520

ABSTRACT

BACKGROUND: Although patients have easy access to their electronic health records and laboratory test result data through patient portals, laboratory test results are often confusing and hard to understand. Many patients turn to web-based forums or question-and-answer (Q&A) sites to seek advice from their peers. The quality of answers from social Q&A sites on health-related questions varies significantly, and not all responses are accurate or reliable. Large language models (LLMs) such as ChatGPT have opened a promising avenue for patients to have their questions answered. OBJECTIVE: We aimed to assess the feasibility of using LLMs to generate relevant, accurate, helpful, and unharmful responses to laboratory test-related questions asked by patients and to identify potential issues that can be mitigated using augmentation approaches. METHODS: We collected laboratory test result-related Q&A data from Yahoo! Answers and selected 53 Q&A pairs for this study. Using the LangChain framework and the ChatGPT web portal, we generated responses to the 53 questions from 5 LLMs: GPT-4, GPT-3.5, LLaMA 2, MedAlpaca, and ORCA_mini. We assessed the similarity of their answers using standard Q&A similarity-based evaluation metrics, including Recall-Oriented Understudy for Gisting Evaluation, Bilingual Evaluation Understudy, Metric for Evaluation of Translation With Explicit Ordering, and Bidirectional Encoder Representations from Transformers Score. We used an LLM-based evaluator to judge whether a target model had higher quality in terms of relevance, correctness, helpfulness, and safety than the baseline model. We performed a manual evaluation with medical experts for all the responses to 7 selected questions on the same 4 aspects. RESULTS: Regarding the similarity of the responses from the 4 LLMs, with the GPT-4 output used as the reference answer, the responses from GPT-3.5 were the most similar, followed by those from LLaMA 2, ORCA_mini, and MedAlpaca. Human answers from the Yahoo data were scored the lowest and, thus, as the least similar to GPT-4-generated answers. The results of the win-rate and medical expert evaluations both showed that GPT-4's responses achieved better scores than all the other LLM responses and human responses on all 4 aspects (relevance, correctness, helpfulness, and safety). LLM responses occasionally also suffered from a lack of interpretation in one's medical context, incorrect statements, and a lack of references. CONCLUSIONS: By evaluating LLMs in generating responses to patients' laboratory test result-related questions, we found that, compared to the other 4 LLMs and human answers from a Q&A website, GPT-4's responses were more accurate, helpful, relevant, and safer. There were cases in which GPT-4's responses were inaccurate and not individualized. We identified a number of ways to improve the quality of LLM responses, including prompt engineering, prompt augmentation, retrieval-augmented generation, and response evaluation.
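For readers reproducing the similarity step, the four named metrics (ROUGE, BLEU, METEOR, BERTScore) have common open-source implementations; a sketch using the rouge-score, NLTK, and bert-score packages (the library choice is ours, not necessarily the study's):

```python
# pip install rouge-score nltk bert-score
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from bert_score import score as bert_score

reference = "Your ALT level is mildly elevated; repeat testing is advised."
candidate = "Your ALT is slightly high, so a repeat test is recommended."

# ROUGE-1 / ROUGE-L against the reference answer
rouge = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
print(rouge.score(reference, candidate))

# Sentence-level BLEU with smoothing (short texts score 0 otherwise)
bleu = sentence_bleu([reference.split()], candidate.split(),
                     smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {bleu:.3f}")

# BERTScore: semantic similarity from contextual embeddings
P, R, F1 = bert_score([candidate], [reference], lang="en")
print(f"BERTScore F1: {F1.item():.3f}")
```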


Subject(s)
Artificial Intelligence, Electronic Health Records, Humans, Language
11.
Sci Total Environ; 929: 172646, 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-38653417

ABSTRACT

Agroforestry waste and cow manure pollute the environment; of the two, agroforestry waste is particularly difficult to degrade. Composting is an effective way to dispose of agroforestry waste; however, the low degradation efficiency of the lignocellulose in agroforestry waste limits composting humification. This study investigated lignocellulose degradation and composting humification in full-size apple wood and cow manure composting processes by applying different pretreatments (acidic, alkaline, and high-temperature) to the apple wood. Physicochemical characterization and metagenome sequencing were combined to analyze functions in the carbohydrate-active enzymes database (CAZy). In this way, microbial communities and functions were linked across the composting process, and the lignocellulose degradation mechanism was elaborated. The results showed that the addition of apple wood increased the compost humus (HS) yield, and pretreatment of the apple wood enhanced lignocellulose degradation during composting. In addition, pretreatment improved physicochemical properties of the compost such as temperature, pH, electrical conductivity (EC), ammonium nitrogen (NH4+), and nitrate nitrogen (NO3-); the acid-treated apple wood compost (AcAWC) achieved the highest temperature of 58.4 °C, effectively promoting nitrification, with NO3- ultimately reaching 0.127 g/kg. In all composts, microbial networks contained a high proportion of positively correlated connections, indicating that microorganisms promoted the composting process through cooperation. The proportions of glycosyltransferases (GT) and glycoside hydrolases (GH) promoted the separation and degradation of lignocellulose during composting to form HS. Notably, the adverse effects of the alkali-treated apple wood compost on bacteria were greater. AcAWC showed significant correlations between bacterial and fungal communities and both lignin and hemicellulose, and had more biomarkers associated with lignocellulose degradation and humification. Its lignin degradation rate was 24.57% and its HS yield increased by 27.49%. AcAWC is therefore confirmed to enhance lignocellulose degradation and promote compost humification by altering the properties of the apple wood and establishing a richer microbial community.


Subject(s)
Composting, Lignin, Malus, Manure, Wood, Lignin/metabolism, Animals, Cattle, Biomass, Humic Substances, Environmental Biodegradation
12.
J Biomed Inform; 154: 104646, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38677633

ABSTRACT

OBJECTIVES: Artificial intelligence (AI) systems have the potential to revolutionize clinical practice, including improving diagnostic accuracy and surgical decision-making, while also reducing costs and manpower. However, it is important to recognize that these systems may perpetuate social inequities or demonstrate biases, such as those based on race or gender. Such biases can occur before, during, or after the development of AI models, making it critical to understand and address potential biases to enable the accurate and reliable application of AI models in clinical settings. To address bias concerns during model development, we surveyed recent publications on debiasing methods in the fields of biomedical natural language processing (NLP) and computer vision (CV), and we discuss the methods, such as data perturbation and adversarial learning, that have been applied in the biomedical domain. METHODS: We searched PubMed, the ACM Digital Library, and IEEE Xplore for relevant articles published between January 2018 and December 2023 using multiple combinations of keywords. We then automatically filtered the resulting 10,041 articles with loose constraints and manually inspected the abstracts of the remaining 890 articles to identify the 55 articles included in this review. Additional articles from their references are also included. We discuss each method and compare its strengths and weaknesses. Finally, we review other potential methods from the general domain that could be applied to biomedicine to address bias and improve fairness. RESULTS: Bias in biomedical AI can originate from multiple sources, such as insufficient data, sampling bias, and the use of health-irrelevant features or race-adjusted algorithms. Existing debiasing methods that focus on algorithms can be categorized as distributional or algorithmic. Distributional methods include data augmentation, data perturbation, data reweighting, and federated learning. Algorithmic approaches include unsupervised representation learning, adversarial learning, disentangled representation learning, loss-based methods, and causality-based methods.
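Among the distributional methods listed, data reweighting is the simplest to illustrate: upweight samples from under-represented groups so each group contributes equally to the training loss. A minimal sketch with invented group labels:

```python
import numpy as np

# Hypothetical group membership for six training samples.
groups = np.array(["A", "A", "A", "B", "A", "B"])

# Balanced per-sample weights: weight = n_samples / (n_groups * group_count),
# so the minority group B receives larger weights.
_, inverse, counts = np.unique(groups, return_inverse=True,
                               return_counts=True)
weights = (len(groups) / (len(counts) * counts))[inverse]
print(weights)  # B samples weigh 1.5, A samples 0.75

# These weights can then scale most losses, e.g. in PyTorch:
# loss = (criterion_no_reduction(logits, y) * torch.tensor(weights)).mean()
```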


Subject(s)
Artificial Intelligence, Bias, Natural Language Processing, Humans, Surveys and Questionnaires, Machine Learning, Algorithms
13.
Ophthalmology; 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38657840

ABSTRACT

PURPOSE: To update the Age-Related Eye Disease Study (AREDS) simplified severity scale for risk of late age-related macular degeneration (AMD), including incorporation of reticular pseudodrusen (RPD), and to perform external validation on the Age-Related Eye Disease Study 2 (AREDS2). DESIGN: Post hoc analysis of 2 clinical trial cohorts: AREDS and AREDS2. PARTICIPANTS: Participants with no late AMD in either eye at baseline in AREDS (n = 2719) and AREDS2 (n = 1472). METHODS: Five-year rates of progression to late AMD were calculated according to levels 0 to 4 on the simplified severity scale after 2 updates: (1) noncentral geographic atrophy (GA) considered part of the outcome, rather than a risk feature, and (2) scale separation according to RPD status (determined by validated deep learning grading of color fundus photographs). MAIN OUTCOME MEASURES: Five-year rate of progression to late AMD (defined as neovascular AMD or any GA). RESULTS: In the AREDS, after the first scale update, the 5-year rates of progression to late AMD for levels 0 to 4 were 0.3%, 4.5%, 12.9%, 32.2%, and 55.6%, respectively. For the final simplified severity scale, the 5-year progression rates for levels 0 to 4 were 0.3%, 4.3%, 11.6%, 26.7%, and 50.0%, respectively, for participants without RPD at baseline and 2.8%, 8.0%, 29.0%, 58.7%, and 72.2%, respectively, for participants with RPD at baseline. In external validation on the AREDS2, for levels 2 to 4, the progression rates were similar: 15.0%, 27.7%, and 45.7% (RPD absent) and 26.2%, 46.0%, and 73.0% (RPD present), respectively. CONCLUSIONS: The AREDS AMD simplified severity scale has been modernized with 2 important updates. The new scale for individuals without RPD has 5-year progression rates of approximately 0.5%, 4%, 12%, 25%, and 50%, such that the rates on the original scale remain accurate. The new scale for individuals with RPD has 5-year progression rates of approximately 3%, 8%, 30%, 60%, and 70%, that is, approximately double those rates for most levels. The updated scale fits current definitions of late AMD, has increased prognostic accuracy, and seems generalizable to similar populations, yet remains simple enough for broad risk categorization. FINANCIAL DISCLOSURE(S): Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
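For quick reference, the final scale's reported AREDS progression rates can be expressed as a lookup table; the values below are copied from the abstract, while the helper function is our own convenience wrapper:

```python
# 5-year progression rates (%) to late AMD, by simplified severity scale
# level (0-4) and RPD status, as quoted in the abstract (AREDS cohort).
PROGRESSION_5YR = {
    False: {0: 0.3, 1: 4.3, 2: 11.6, 3: 26.7, 4: 50.0},  # RPD absent
    True:  {0: 2.8, 1: 8.0, 2: 29.0, 3: 58.7, 4: 72.2},  # RPD present
}

def five_year_risk(level: int, rpd: bool) -> float:
    """Return the 5-year late-AMD progression rate for a scale level."""
    return PROGRESSION_5YR[rpd][level]

print(five_year_risk(3, rpd=True))   # 58.7
```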

14.
ArXiv; 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38529075

ABSTRACT

Background: Even though patients have easy access to their electronic health records and lab test results through patient portals, lab results are often confusing and hard to understand. Many patients turn to online forums or question-and-answer (Q&A) sites to seek advice from their peers. However, the quality of answers from social Q&A sites on health-related questions varies significantly, and not all responses are accurate or reliable. Large language models (LLMs) such as ChatGPT have opened a promising avenue for patients to get their questions answered. Objective: We aim to assess the feasibility of using LLMs to generate relevant, accurate, helpful, and unharmful responses to lab test-related questions asked by patients and to identify potential issues that can be mitigated with augmentation approaches. Methods: We first collected lab test result-related question-and-answer data from Yahoo! Answers and selected 53 Q&A pairs for this study. Using the LangChain framework and the ChatGPT web portal, we generated responses to the 53 questions from four LLMs: GPT-4, Meta LLaMA 2, MedAlpaca, and ORCA_mini. We assessed the similarity of their answers using standard QA similarity-based evaluation metrics, including ROUGE, BLEU, METEOR, and BERTScore. We also utilized an LLM-based evaluator to judge whether a target model had higher quality in terms of relevance, correctness, helpfulness, and safety than the baseline model. Finally, we performed a manual evaluation with medical experts for all the responses to seven selected questions on the same four aspects. Results: Regarding the similarity of the responses from the four LLMs, where the GPT-4 output was used as the reference answer, the responses from LLaMA 2 were the most similar, followed by those from ORCA_mini and MedAlpaca. Human answers from the Yahoo data were scored lowest and thus least similar to GPT-4-generated answers. The results of the win rate and medical expert evaluations both showed that GPT-4's responses achieved better scores than all the other LLM responses and human responses on all four aspects (relevance, correctness, helpfulness, and safety). However, LLM responses occasionally also suffered from a lack of interpretation in one's medical context, incorrect statements, and a lack of references. Conclusions: By evaluating LLMs in generating responses to patients' lab test result-related questions, we find that, compared to the other three LLMs and the human answers from the Q&A website, GPT-4's responses are more accurate, helpful, relevant, and safer. However, there are cases in which GPT-4's responses are inaccurate and not individualized. We identified a number of ways to improve the quality of LLM responses, including prompt engineering, prompt augmentation, retrieval-augmented generation, and response evaluation.

15.
Bioinformatics; 40(4), 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38514400

ABSTRACT

MOTIVATION: Large Language Models (LLMs) have the potential to revolutionize the field of Natural Language Processing, excelling not only in text generation and reasoning tasks but also in zero-/few-shot learning, swiftly adapting to new tasks with minimal fine-tuning. LLMs have also demonstrated great promise in biomedical and healthcare applications. However, when it comes to Named Entity Recognition (NER), particularly within the biomedical domain, LLMs fall short of the effectiveness exhibited by fine-tuned domain-specific models. One key reason is that NER is typically conceptualized as a sequence labeling task, whereas LLMs are optimized for text generation and reasoning. RESULTS: We developed an instruction-based learning paradigm that transforms biomedical NER from a sequence labeling task into a generation task. This paradigm is end-to-end and streamlines training and evaluation by automatically repurposing pre-existing biomedical NER datasets. We further developed BioNER-LLaMA using the proposed paradigm with LLaMA-7B as the foundational LLM. We conducted extensive testing of BioNER-LLaMA on three widely recognized biomedical NER datasets covering entities related to diseases, chemicals, and genes. The results revealed that BioNER-LLaMA consistently achieved F1 scores 5% to 30% higher than the few-shot learning performance of GPT-4 on datasets with different biomedical entities. We show that a general-domain LLM can match the performance of rigorously fine-tuned PubMedBERT models and PMC-LLaMA, a biomedical-specific language model. Our findings underscore the potential of the proposed paradigm for developing general-domain LLMs that can rival SOTA performance in multi-task, multi-domain scenarios in biomedical and health applications. AVAILABILITY AND IMPLEMENTATION: Datasets and other resources are available at https://github.com/BIDS-Xu-Lab/BioNER-LLaMA.
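The core move, recasting sequence labeling as generation, can be illustrated by converting a BIO-tagged sentence into an instruction-response pair; the template below is our own illustration, not necessarily the BioNER-LLaMA prompt:

```python
tokens = ["Mutations", "in", "BRCA1", "cause", "breast", "cancer"]
tags   = ["O", "O", "B-GENE", "O", "B-DISEASE", "I-DISEASE"]

def bio_to_instruction(tokens, tags):
    """Turn BIO-labeled tokens into a generation-style training example."""
    spans, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):           # open a new entity span
            current = [tag[2:], [tok]]
            spans.append(current)
        elif tag.startswith("I-") and current:
            current[1].append(tok)         # extend the open span
        else:
            current = None                 # O tag closes any open span
    entities = "; ".join(f"{' '.join(ws)} [{t}]" for t, ws in spans)
    prompt = ("Extract all disease and gene mentions from the sentence: "
              + " ".join(tokens))
    return {"instruction": prompt, "response": entities or "None"}

print(bio_to_instruction(tokens, tags))
# response: 'BRCA1 [GENE]; breast cancer [DISEASE]'
```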


Subject(s)
New World Camelids, Deep Learning, Animals, Language, Natural Language Processing
16.
J Am Chem Soc; 146(12): 7950-7955, 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38483267

ABSTRACT

Single-site catalysts (SSCs) achieve a high catalytic performance through atomically dispersed active sites. A challenge facing the development of SSCs is aggregation of active catalytic species. Reducing the loading of these sites to very low levels is a common strategy to mitigate aggregation and sintering; however, this limits the tools that can be used to characterize the SSCs. Here we report a sintering-resistant SSC with high loading that is achieved by incorporating Anderson-Evans polyoxometalate clusters (POMs, MMo6O24, M = Rh/Pt) within NU-1000, a Zr-based metal-organic framework (MOF). The dual confinement provided by isolating the active site within the POM, then isolating the POMs within the MOF, facilitates the formation of isolated noble metal sites with low coordination numbers via exsolution from the POM during activation. The high loading (up to 3.2 wt %) that can be achieved without sintering allowed the local structure transformation in the POM cluster and the surrounding MOF to be evaluated using in situ X-ray scattering with pair distribution function (PDF) analysis. Notably, the Rh/Pt···Mo distance in the active catalyst is shorter than the M···M bond lengths in the respective bulk metals. Models of the active cluster structure were identified based on the PDF data with complementary computation and X-ray absorption spectroscopy analysis.

17.
ArXiv; 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38529077

ABSTRACT

Objectives: Artificial intelligence (AI) systems have the potential to revolutionize clinical practice, including improving diagnostic accuracy and surgical decision-making, while also reducing costs and manpower. However, it is important to recognize that these systems may perpetuate social inequities or demonstrate biases, such as those based on race or gender. Such biases can occur before, during, or after the development of AI models, making it critical to understand and address potential biases to enable the accurate and reliable application of AI models in clinical settings. To address bias concerns during model development, we surveyed recent publications on debiasing methods in the fields of biomedical natural language processing (NLP) and computer vision (CV), and we discuss the methods, such as data perturbation and adversarial learning, that have been applied in the biomedical domain. Methods: We searched PubMed, the ACM Digital Library, and IEEE Xplore for relevant articles published between January 2018 and December 2023 using multiple combinations of keywords. We then automatically filtered the resulting 10,041 articles with loose constraints and manually inspected the abstracts of the remaining 890 articles to identify the 55 articles included in this review. Additional articles from their references are also included. We discuss each method and compare its strengths and weaknesses. Finally, we review other potential methods from the general domain that could be applied to biomedicine to address bias and improve fairness. Results: Bias in biomedical AI can originate from multiple sources, such as insufficient data, sampling bias, and the use of health-irrelevant features or race-adjusted algorithms. Existing debiasing methods that focus on algorithms can be categorized as distributional or algorithmic. Distributional methods include data augmentation, data perturbation, data reweighting, and federated learning. Algorithmic approaches include unsupervised representation learning, adversarial learning, disentangled representation learning, loss-based methods, and causality-based methods.

18.
Comput Med Imaging Graph; 114: 102363, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38447381

ABSTRACT

Reliable localization of lymph nodes (LNs) in multi-parametric MRI (mpMRI) studies plays a major role in the assessment of lymphadenopathy and staging of metastatic disease. Radiologists routinely measure nodal size in order to distinguish benign from malignant nodes, which then require cancer staging. However, identification of lymph nodes is a cumbersome task due to their myriad appearances in mpMRI studies. Multiple sequences are acquired in mpMRI studies, including T2 fat-suppressed (T2FS) and diffusion-weighted imaging (DWI) sequences, among others; consequently, sizing LNs is challenging due to the variety of signal intensities across these sequences. Furthermore, radiologists can miss potentially metastatic LNs during a busy clinical day. To address these imaging and workflow challenges, we propose a computer-aided detection (CAD) pipeline to detect both benign and malignant LNs in the body for their subsequent measurement. We employed the recently proposed Dynamic Head (DyHead) neural network to detect LNs in mpMRI studies acquired with a variety of scanners and exam protocols. The T2FS and DWI series were co-registered, and a selective augmentation technique called Intra-Label LISA (ILL) was used to blend the two volumes with an interpolation factor drawn from a Beta distribution. In this way, ILL diversified the samples the model encountered during training, while removing the requirement for both sequences to be present at test time. Our results showed a mean average precision (mAP) of 53.5% and a sensitivity of ∼78% with ILL at 4 FP/vol. This corresponded to improvements of ≥10% in mAP and ≥12% in sensitivity at 4 FP/vol (p < 0.05) over current LN detection approaches evaluated on the same dataset. We also established the out-of-distribution robustness of the DyHead model by training it on data acquired by a Siemens Aera scanner and testing it on data from Siemens Verio, Siemens Biograph mMR, and Philips Achieva scanners. Our pilot work represents an important first step towards automated detection, segmentation, and classification of lymph nodes in mpMRI.
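The ILL blending step can be sketched in a few lines of NumPy: mix the co-registered T2FS and DWI volumes with an interpolation factor drawn from a Beta distribution. Volume shapes and Beta parameters below are assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
t2fs = rng.random((32, 256, 256))   # stand-in T2 fat-suppressed volume
dwi  = rng.random((32, 256, 256))   # stand-in co-registered DWI volume

lam = rng.beta(1.0, 1.0)            # interpolation factor ~ Beta(a, b)
blended = lam * t2fs + (1 - lam) * dwi

# Because the sequences are co-registered, LN boxes are shared, so the
# blended volume keeps the same detection targets. Training on such blends
# diversifies inputs and removes the need for both sequences at test time.
```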


Subject(s)
Multiparametric Magnetic Resonance Imaging, Humans, Lymphatic Metastasis/diagnostic imaging, Lymphatic Metastasis/pathology, Diffusion Magnetic Resonance Imaging/methods, Lymph Nodes/diagnostic imaging, Neoplasm Staging
19.
ArXiv; 2024 Jan 19.
Article in English | MEDLINE | ID: mdl-38410657

ABSTRACT

PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases, and chemicals. It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text articles from the PMC open access subset, updated weekly. PubTator 3.0's online interface and API utilize these precomputed entity relations and synonyms to provide advanced search capabilities and enable large-scale analyses, streamlining many complex information needs. We showcase the retrieval quality of PubTator 3.0 using a series of entity pair queries, demonstrating that PubTator 3.0 retrieves a greater number of articles than either PubMed or Google Scholar, with higher precision in the top 20 results. We further show that integrating ChatGPT (GPT-4) with PubTator APIs dramatically improves the factuality and verifiability of its responses. In summary, PubTator 3.0 offers a comprehensive set of features and tools that allow researchers to navigate the ever-expanding wealth of biomedical literature, expediting research and unlocking valuable insights for scientific discovery.

20.
ArXiv; 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38410646

ABSTRACT

Recent studies indicate that Generative Pre-trained Transformer 4 with Vision (GPT-4V) outperforms human physicians in medical challenge tasks. However, these evaluations primarily focused on the accuracy of multiple-choice questions alone. Our study extends the current scope by conducting a comprehensive analysis of GPT-4V's rationales of image comprehension, recall of medical knowledge, and step-by-step multimodal reasoning when solving New England Journal of Medicine (NEJM) Image Challenges, an imaging quiz designed to test the knowledge and diagnostic capabilities of medical professionals. Evaluation results confirmed that GPT-4V performs comparably to human physicians in multiple-choice accuracy (81.6% vs. 77.8%). GPT-4V also performs well in cases that physicians answer incorrectly, achieving over 78% accuracy. However, we discovered that GPT-4V frequently presents flawed rationales in cases where it makes the correct final choice (35.5%), most prominently in image comprehension (27.2%). Despite GPT-4V's high accuracy on multiple-choice questions, our findings emphasize the necessity of further in-depth evaluations of its rationales before integrating such multimodal AI models into clinical workflows.
