Search | Virtual Health Library

Localizing in-domain adaptation of transformer-based biomedical language models.

Buonocore, Tommaso Mario; Crema, Claudio; Redolfi, Alberto; Bellazzi, Riccardo; Parimbelli, Enea.

J Biomed Inform ; 144: 104431, 2023 Aug.

Article in English | MEDLINE | ID: mdl-37385327

ABSTRACT

In the era of digital healthcare, the huge volumes of textual information generated every day in hospitals constitute an essential but underused asset that could be exploited with task-specific, fine-tuned biomedical language representation models, improving patient care and management. For such specialized domains, previous research has shown that fine-tuning models stemming from broad-coverage checkpoints can benefit substantially from additional training rounds over large-scale in-domain resources. However, these resources are often unreachable for less-resourced languages like Italian, preventing local medical institutions from employing in-domain adaptation. In order to reduce this gap, our work investigates two accessible approaches to derive biomedical language models in languages other than English, taking Italian as a concrete use-case: one based on neural machine translation of English resources, favoring quantity over quality; the other based on a high-grade, narrow-scoped corpus natively written in Italian, thus preferring quality over quantity. Our study shows that data quantity is a harder constraint than data quality for biomedical adaptation, but the concatenation of high-quality data can improve model performance even when dealing with relatively size-limited corpora. The models published from our investigations have the potential to unlock important research opportunities for Italian hospitals and academia. Finally, the set of lessons learned from the study constitutes valuable insights towards a solution to build biomedical language models that are generalizable to other less-resourced languages and different domain settings.

Subjects

Language, Natural Language Processing, Humans, Records, Italy, Unified Medical Language System

Advancing Italian biomedical information extraction with transformers-based models: Methodological insights and multicenter practical application.

Crema, Claudio; Buonocore, Tommaso Mario; Fostinelli, Silvia; Parimbelli, Enea; Verde, Federico; Fundarò, Cira; Manera, Marina; Ramusino, Matteo Cotta; Capelli, Marco; Costa, Alfredo; Binetti, Giuliano; Bellazzi, Riccardo; Redolfi, Alberto.

J Biomed Inform ; 148: 104557, 2023 Dec.

Article in English | MEDLINE | ID: mdl-38012982

ABSTRACT

The introduction of computerized medical records in hospitals has reduced burdensome activities like manual writing and information fetching. However, the data contained in medical records are still far underutilized, primarily because extracting data from unstructured textual medical records takes time and effort. Information Extraction, a subfield of Natural Language Processing, can help clinical practitioners overcome this limitation by using automated text-mining pipelines. In this work, we created the first Italian neuropsychiatric Named Entity Recognition dataset, PsyNIT, and used it to develop a Transformers-based model. Moreover, we collected and leveraged three external independent datasets to implement an effective multicenter model, with overall F1-score 84.77 %, Precision 83.16 %, Recall 86.44 %. The lessons learned are: (i) the crucial role of a consistent annotation process and (ii) a fine-tuning strategy that combines classical methods with a "low-resource" approach. This allowed us to establish methodological guidelines that pave the way for Natural Language Processing studies in less-resourced languages.

Subjects

Data Mining, Language, Humans, Data Mining/methods, Electronic Health Records, Italy, Natural Language Processing, Multicenter Studies as Topic
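
The multicenter scores quoted in the abstract above can be sanity-checked with a few lines of arithmetic. The sketch below assumes that the overall F1-score is the harmonic mean of the overall precision and recall; the abstract does not say how the scores were averaged, and the function name is chosen only for this illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, in percent.
precision, recall = 83.16, 86.44

# Assumption: overall F1 is derived from overall precision and recall.
print(round(f1_score(precision, recall), 2))  # 84.77
```

Under that assumption, the computed value agrees with the reported F1-score of 84.77 %.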

Online content on eating disorders: a natural language processing study.

Tarchi, Livio; Buonocore, Tommaso Mario; Selvi, Giulia; Ricca, Valdo; Castellini, Giovanni.

J Commun Healthc ; : 1-10, 2024 Jul 23.

Article in English | MEDLINE | ID: mdl-39041376

ABSTRACT

BACKGROUND: Online content can inform the personal risk of developing an eating disorder, and it can influence the time and motivation to seek treatment. Patients routinely seek information online, and access to information is crucial for both prevention and treatment. The primary aim of the current study was to quantify the readability scores of online content on eating disorders using natural language processing algorithms, across two languages: English and Italian. METHODS: Unique terms related to single diagnoses were searched using Google®. The content available on Wikipedia was also assessed. Readability was defined according to the Flesch Readability Ease (FRE) and the Rate Readability Index (RIX). The scientific support of retrieved content and the authoritativeness of sources were measured through standardized variables. RESULTS: In Italian, online content was more likely to be published by private psychotherapy institutes or by websites that promote diet advice or weight loss. In both languages, the most readable content was on Anorexia Nervosa (RIX 4.18, FRE-en 59.6, FRE-it 41.69), Bulimia Nervosa (RIX 3.99, FRE-en 66.27, FRE-it 39.66) or Binge Eating (RIX 4.01, FRE-en 68.10, FRE-it 38.62). English sources consistently had more references than Italian pages (range 35-182 vs 1-163, respectively) and had a higher percentage of citations available in the target language. These references were mainly peer-reviewed articles or clinical manuals. CONCLUSION: Attention should be given to developing online content for Muscle Dysmorphia and Orthorexia Nervosa, as well as improving the overall readability of online content on eating disorders, especially for languages other than English.
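
As a rough illustration of the two readability measures named in the METHODS above, the sketch below computes RIX (long words per sentence, where a long word has seven or more letters) and the standard English Flesch formula, commonly known as Flesch Reading Ease, from raw counts. The tokenisation rules and function names are assumptions made for this example. The study's actual tooling, and the Italian adaptation of the Flesch formula behind the FRE-it scores, are not described in the abstract and are not reproduced here.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: breaks on '.', '!' and '?' (an assumption)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def split_words(text: str) -> list[str]:
    """Runs of Unicode letters, so accented Italian words stay whole."""
    return re.findall(r"[^\W\d_]+", text)


def rix(text: str) -> float:
    """RIX = number of long words (7+ letters) / number of sentences."""
    sentences = split_sentences(text)
    long_words = [w for w in split_words(text) if len(w) >= 7]
    return len(long_words) / len(sentences)


def flesch_reading_ease(n_words: int, n_sentences: int, n_syllables: int) -> float:
    """Standard English formula; syllable counting is left to the caller."""
    return 206.835 - 1.015 * (n_words / n_sentences) - 84.6 * (n_syllables / n_words)


sample = "Anorexia nervosa is an eating disorder. It can be treated."
print(rix(sample))  # 4 long words over 2 sentences
print(flesch_reading_ease(n_words=100, n_sentences=5, n_syllables=150))
```

Lower RIX and higher Flesch scores both indicate easier text, which is why the two measures move in opposite directions in the RESULTS.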

A synthetic dataset of liver disorder patients.

Nicora, Giovanna; Buonocore, Tommaso Mario; Parimbelli, Enea.

Data Brief ; 47: 108921, 2023 Apr.

Article in English | MEDLINE | ID: mdl-36747982

ABSTRACT

The data in this article include 10,000 synthetic patients with liver disorders, characterized by 70 different variables, including clinical features and patient outcomes such as hospital admission or surgery. Patient data are generated to simulate real patient data as closely as possible, using a publicly available Bayesian network describing a causal model for liver disorders. By varying the network parameters, we also generated an additional set of 500 patients with characteristics that deviated from the initial patient population. We provide an overview of the synthetic data generation process and the associated scripts for generating the cohorts. This dataset can be useful for training and validating machine learning models, especially under the effect of dataset shift between training and testing sets.
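
The generation process described above, sampling patients from a Bayesian network and then perturbing its parameters to obtain a shifted cohort, can be sketched with ancestral sampling. The three-node network, its variable names, and all probabilities below are hypothetical and chosen only for illustration; they are not the published 70-variable liver-disorder network or the authors' scripts.

```python
import random

# Hypothetical toy network (NOT the published liver-disorder network):
# alcoholism -> cirrhosis -> admission
BASELINE = {
    "p_alcoholism": 0.10,
    "p_cirrhosis_given": {True: 0.40, False: 0.05},  # keyed by alcoholism
    "p_admission_given": {True: 0.70, False: 0.10},  # keyed by cirrhosis
}


def sample_patient(params: dict, rng: random.Random) -> dict:
    """Ancestral sampling: draw each node after its parent, in causal order."""
    alcoholism = rng.random() < params["p_alcoholism"]
    cirrhosis = rng.random() < params["p_cirrhosis_given"][alcoholism]
    admission = rng.random() < params["p_admission_given"][cirrhosis]
    return {"alcoholism": alcoholism, "cirrhosis": cirrhosis, "admission": admission}


def sample_cohort(params: dict, n: int, seed: int) -> list[dict]:
    """Draw n independent synthetic patients with a fixed seed."""
    rng = random.Random(seed)
    return [sample_patient(params, rng) for _ in range(n)]


def prevalence(cohort: list[dict], variable: str) -> float:
    """Fraction of patients for whom the variable is True."""
    return sum(p[variable] for p in cohort) / len(cohort)


# Baseline cohort, sized like the one in the abstract.
baseline = sample_cohort(BASELINE, 10_000, seed=0)

# Shifted cohort: same structure, one perturbed parameter (illustrative value).
shifted_params = {**BASELINE, "p_alcoholism": 0.50}
shifted = sample_cohort(shifted_params, 500, seed=1)

for name, cohort in [("baseline", baseline), ("shifted", shifted)]:
    print(name, len(cohort), round(prevalence(cohort, "cirrhosis"), 3))
```

Changing a single root probability also changes the prevalence of every downstream variable, which is one simple way to produce the kind of dataset shift the abstract refers to.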

Why did AI get this one wrong? - Tree-based explanations of machine learning model predictions.

Parimbelli, Enea; Buonocore, Tommaso Mario; Nicora, Giovanna; Michalowski, Wojtek; Wilk, Szymon; Bellazzi, Riccardo.

Artif Intell Med ; 135: 102471, 2023 Jan.

Article in English | MEDLINE | ID: mdl-36628785

ABSTRACT

Increasingly complex learning methods such as boosting, bagging and deep learning have made ML models more accurate, but harder to interpret and explain, culminating in black-box machine learning models. Model developers and users alike are often presented with a trade-off between performance and intelligibility, especially in high-stakes applications like medicine. In the present article we propose a novel methodological approach for generating explanations for the predictions of a generic machine learning model, given a specific instance for which the prediction has been made. The method, named AraucanaXAI, is based on surrogate, locally-fitted classification and regression trees that are used to provide post-hoc explanations of the prediction of a generic machine learning model. Advantages of the proposed XAI approach include superior fidelity to the original model, the ability to deal with non-linear decision boundaries, and native support for both classification and regression problems. We provide a packaged, open-source implementation of the AraucanaXAI method and evaluate its behaviour in a number of different settings that are commonly encountered in medical applications of AI. These include potential disagreement between the model prediction and the physician's expert opinion, and low reliability of the prediction due to data scarcity.

Subjects

Cognition, Medicine, Reproducibility of Results, Machine Learning
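
The general idea of a locally fitted surrogate tree, as described in the abstract above, can be illustrated with a deliberately simplified sketch. It is not the AraucanaXAI implementation: the black-box rule, the neighbourhood size, and the depth-one tree (a decision stump) are all assumptions made for this example, and the published package may build its neighbourhood and trees differently.

```python
import random


def black_box(x):
    """Stand-in for any opaque model: a non-linear rule on two features."""
    return int(x[0] * x[1] > 0.25)


def nearest_neighbours(instance, data, k):
    """The k rows closest to the instance in squared Euclidean distance."""
    return sorted(data, key=lambda r: sum((a - b) ** 2 for a, b in zip(r, instance)))[:k]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    """Most frequent 0/1 label (ties go to 1)."""
    return int(2 * sum(labels) >= len(labels))


def fit_stump(rows, labels):
    """Depth-one tree: the (feature, threshold) split with lowest weighted Gini."""
    best = None
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    return best[1:]


rng = random.Random(0)
data = [(rng.random(), rng.random()) for _ in range(500)]
instance = (0.5, 0.5)  # the prediction we want to explain

# 1. Local neighbourhood; 2. labels from the black box; 3. surrogate fit.
rows = nearest_neighbours(instance, data, k=50)
labels = [black_box(r) for r in rows]
feature, threshold, left_label, right_label = fit_stump(rows, labels)

# Fidelity: how often the surrogate agrees with the black box locally.
surrogate = [left_label if r[feature] <= threshold else right_label for r in rows]
fidelity = sum(s == y for s, y in zip(surrogate, labels)) / len(rows)
print(f"if x[{feature}] <= {threshold:.3f}: {left_label} else: {right_label}")
print(f"local fidelity: {fidelity:.2f}")
```

The printed rule is the explanation, and the fidelity score indicates how far it can be trusted near the instance. A deeper tree would typically track a non-linear boundary more closely, at the cost of a longer explanation.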
