Search | BVS Integralidade em Saúde

1.

End-to-end pseudonymization of fine-tuned clinical BERT models : Privacy preservation with maintained data utility.

Vakili, Thomas; Henriksson, Aron; Dalianis, Hercules.

BMC Med Inform Decis Mak ; 24(1): 162, 2024 Jun 12.

Article in English | MEDLINE | ID: mdl-38915012

ABSTRACT

Many state-of-the-art results in natural language processing (NLP) rely on large pre-trained language models (PLMs). These models consist of large numbers of parameters that are tuned using vast amounts of training data. These factors cause the models to memorize parts of their training data, making them vulnerable to various privacy attacks. This is cause for concern, especially when these models are applied in the clinical domain, where data are very sensitive. Training data pseudonymization is a privacy-preserving technique that aims to mitigate these problems. This technique automatically identifies and replaces sensitive entities with realistic but non-sensitive surrogates. Pseudonymization has yielded promising results in previous studies. However, no previous study has applied pseudonymization to both the pre-training data of PLMs and the fine-tuning data used to solve clinical NLP tasks. This study evaluates the effects on the predictive performance of end-to-end pseudonymization of Swedish clinical BERT models fine-tuned for five clinical NLP tasks. A large number of statistical tests are performed, revealing minimal harm to performance when using pseudonymized fine-tuning data. The results also show no deterioration from end-to-end pseudonymization of pre-training and fine-tuning data. These results demonstrate that pseudonymizing training data to reduce privacy risks can be done without harming data utility for training PLMs.

Subjects

Natural Language Processing , Humans , Privacy , Sweden , Anonyms and Pseudonyms , Computer Security/standards , Confidentiality/standards , Electronic Health Records/standards

2.

Peripheral Oxygen Saturation Facilitates Assessment of Respiratory Dysfunction in the Sequential Organ Failure Assessment Score With Implications for the Sepsis-3 Criteria.

Valik, John Karlsson; Mellhammar, Lisa; Sundén-Cullberg, Jonas; Ward, Logan; Unge, Christian; Dalianis, Hercules; Henriksson, Aron; Strålin, Kristoffer; Linder, Adam; Nauclér, Pontus.

Crit Care Med ; 50(3): e272-e283, 2022 03 01.

Article in English | MEDLINE | ID: mdl-34406170

ABSTRACT

OBJECTIVES: Sequential Organ Failure Assessment score is the basis of the Sepsis-3 criteria and requires arterial blood gas analysis to assess respiratory function. Peripheral oxygen saturation is a noninvasive alternative but is included in neither the Sequential Organ Failure Assessment score nor Sepsis-3. We aimed to assess the association between worst peripheral oxygen saturation during onset of suspected infection and mortality. DESIGN: Cohort study of hospital admissions from a main cohort and emergency department visits from four external validation cohorts between 2011 and 2018. Data were collected from electronic health records and prospectively by study investigators. SETTING: Eight academic and community hospitals in Sweden and Canada. PATIENTS: Adult patients with suspected infection episodes. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: The main cohort included 19,396 episodes (median age, 67.0 [53.0-77.0]; 9,007 [46.4%] women; 1,044 [5.4%] died). The validation cohorts included 10,586 episodes (range of median age, 61.0-76.0; women 42.1-50.2%; mortality 2.3-13.3%). Peripheral oxygen saturation levels 96-95% were not significantly associated with increased mortality in the main or pooled validation cohorts. At peripheral oxygen saturation 94%, the adjusted odds ratio of death was 1.56 (95% CI, 1.10-2.23) in the main cohort and 1.36 (95% CI, 1.00-1.85) in the pooled validation cohorts and increased gradually below this level. Respiratory assessment using peripheral oxygen saturation 94-91% and less than 91% to generate 1 and 2 Sequential Organ Failure Assessment points, respectively, improved the discrimination of the Sequential Organ Failure Assessment score from area under the receiver operating characteristics 0.75 (95% CI, 0.74-0.77) to 0.78 (95% CI, 0.77-0.80; p < 0.001). Peripheral oxygen saturation/FiO2 ratio had slightly better predictive performance compared with peripheral oxygen saturation alone, but the clinical impact was minor. CONCLUSIONS: These findings provide evidence for assessing respiratory function with peripheral oxygen saturation in the Sequential Organ Failure Assessment score and the Sepsis-3 criteria. Our data support using peripheral oxygen saturation thresholds 94% and 90% to get 1 and 2 Sequential Organ Failure Assessment respiratory points, respectively. This has important implications primarily for emergency practice, rapid response teams, surveillance, research, and resource-limited settings.
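The scoring bands described in this abstract can be illustrated with a small sketch. This is only an illustration and not the authors' code: the function name is hypothetical, and it assumes that saturation above 94% scores 0 points, that 91-94% scores 1 point, and that values below 91% score 2 points.

```python
def spo2_respiratory_points(spo2_percent: float) -> int:
    """Map the worst peripheral oxygen saturation (%) to respiratory points.

    Hypothetical helper based on the bands stated in the abstract:
    below 91% gives 2 points, 91-94% gives 1 point, above 94% gives 0.
    """
    if spo2_percent < 91:
        return 2
    if spo2_percent <= 94:
        return 1
    return 0


if __name__ == "__main__":
    for value in (96, 94, 91, 90):
        print(value, spo2_respiratory_points(value))
```

For whole-number readings, "less than 91%" and "90% or lower" describe the same band, which is how the thresholds in the results and the conclusions line up.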

Subjects

Intensive Care Units , Organ Dysfunction Scores , Oxygen Consumption/physiology , Oxygen Saturation/physiology , Sepsis/blood , Sepsis/mortality , Aged , Cohort Studies , Female , Hospital Mortality , Humans , Male , Middle Aged , Oxygen/blood , Retrospective Studies , Systemic Inflammatory Response Syndrome

3.

Implementation of specialised attention mechanisms: ICD-10 classification of Gastrointestinal discharge summaries in English, Spanish and Swedish.

Blanco, Alberto; Remmer, Sonja; Pérez, Alicia; Dalianis, Hercules; Casillas, Arantza.

J Biomed Inform ; 130: 104050, 2022 06.

Article in English | MEDLINE | ID: mdl-35346854

ABSTRACT

Multi-label classification according to the International Classification of Diseases (ICD) is an Extreme Multi-label Classification task aiming to categorise health records according to a set of relevant ICD codes. We implemented PlaBERT, a new multi-label text classification head with per-label attention, on top of a BERT model. The model assessment is conducted on Electronic Health Records, conveying Discharge Summaries in three languages - English, Spanish, and Swedish. The study focuses on 157 diagnostic codes from the ICD. We additionally measure the labelling noise to estimate the consistency of the gold standard. Our specialised attention mechanism computes attention weights for each input token and label pair, obtaining the specific relevance of every word concerning each ICD code. The PlaBERT model outputs the computed attention importance for each token and label, allowing for visualisation. Our best results are 40.65, 38.36, and 41.13 F1-Score points on the English, Spanish and Swedish datasets, respectively, for the 157 gastrointestinal codes. Besides, Precision is the metric that most significantly improves owing to the attention mechanism of PlaBERT, with an increase of 44.63, 40.93, and 12.92 points, respectively, for the Spanish, Swedish and English datasets.
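The idea of computing one attention weight per token and label pair can be sketched as follows. This is a generic label-wise dot-product attention written in plain Python, not the PlaBERT implementation: the function names, the use of one query vector per label, and the toy vectors are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def per_label_attention(token_vectors, label_queries):
    """Return (weights, contexts) for every label.

    weights[l][t] is the attention of label l on token t;
    contexts[l] is the attention-weighted sum of token vectors for label l.
    Assumes each label has its own (hypothetical) query vector.
    """
    weights, contexts = [], []
    for query in label_queries:
        # One score per token: dot product between token vector and label query.
        scores = [sum(h * q for h, q in zip(vec, query)) for vec in token_vectors]
        alpha = softmax(scores)
        context = [
            sum(a * vec[d] for a, vec in zip(alpha, token_vectors))
            for d in range(len(query))
        ]
        weights.append(alpha)
        contexts.append(context)
    return weights, contexts


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy token representations
    queries = [[1.0, 0.0], [0.0, 1.0]]             # toy queries for two labels
    attn, ctx = per_label_attention(tokens, queries)
    print(attn)
```

The `weights` matrix is the quantity one would visualise to see which words matter for which code; a classifier would then score each label from its own context vector.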

Subjects

International Classification of Diseases , Language , Electronic Health Records , Humans , Natural Language Processing , Patient Discharge , Sweden

4.

Semi-supervised medical entity recognition: A study on Spanish and Swedish clinical corpora.

Pérez, Alicia; Weegar, Rebecka; Casillas, Arantza; Gojenola, Koldo; Oronoz, Maite; Dalianis, Hercules.

J Biomed Inform ; 71: 16-30, 2017 07.

Article in English | MEDLINE | ID: mdl-28526460

ABSTRACT

OBJECTIVE: The goal of this study is to investigate entity recognition within Electronic Health Records (EHRs) focusing on Spanish and Swedish. Of particular importance is a robust representation of the entities. In our case, we utilized unsupervised methods to generate such representations. METHODS: The significance of this work stands on its experimental layout. The experiments were carried out under the same conditions for both languages. Several classification approaches were explored: maximum probability, CRF, Perceptron and SVM. The classifiers were enhanced by means of ensembles of semantic spaces and ensembles of Brown trees. In order to mitigate sparsity of data, without a significant increase in the dimension of the decision space, we propose the use of clustered approaches of the hierarchical Brown clustering represented by trees and vector quantization for each semantic space. RESULTS: The results showed that the semi-supervised approaches significantly improved standard supervised techniques for both languages. Moreover, clustering the semantic spaces contributed to the quality of the entity recognition while keeping the dimension of the feature-space two orders of magnitude lower than when directly using the semantic spaces. CONCLUSIONS: The contributions of this study are: (a) a set of thorough experiments that enable comparisons regarding the influence of different types of features on different classifiers, exploring two languages other than English; and (b) the use of ensembles of clusters of Brown trees and semantic spaces on EHRs to tackle the problem of scarcity of available annotated data.

Subjects

Electronic Health Records , Machine Learning , Semantics , Cluster Analysis , Data Curation , Humans , Sweden

5.

Ensembles of randomized trees using diverse distributed representations of clinical events.

Henriksson, Aron; Zhao, Jing; Dalianis, Hercules; Boström, Henrik.

BMC Med Inform Decis Mak ; 16 Suppl 2: 69, 2016 07 21.

Article in English | MEDLINE | ID: mdl-27459846

ABSTRACT

BACKGROUND: Learning deep representations of clinical events based on their distributions in electronic health records has been shown to allow for subsequent training of higher-performing predictive models compared to the use of shallow, count-based representations. The predictive performance may be further improved by utilizing multiple representations of the same events, which can be obtained by, for instance, manipulating the representation learning procedure. The question, however, remains how to make best use of a set of diverse representations of clinical events - modeled in an ensemble of semantic spaces - for the purpose of predictive modeling. METHODS: Three different ways of exploiting a set of (ten) distributed representations of four types of clinical events - diagnosis codes, drug codes, measurements, and words in clinical notes - are investigated in a series of experiments using ensembles of randomized trees. Here, the semantic space ensembles are obtained by varying the context window size in the representation learning procedure. The proposed method trains a forest wherein each tree is built from a bootstrap replicate of the training set whose entire original feature set is represented in a randomly selected set of semantic spaces - corresponding to the considered data types - of a given context window size. RESULTS: The proposed method significantly outperforms concatenating the multiple representations of the bagged dataset; it also significantly outperforms representing, for each decision tree, only a subset of the features in a randomly selected set of semantic spaces. A follow-up analysis indicates that the proposed method exhibits less diversity while significantly improving average tree performance. It is also shown that the size of the semantic space ensemble has a significant impact on predictive performance and that performance tends to improve as the size increases. 
CONCLUSIONS: The strategy for utilizing a set of diverse distributed representations of clinical events when constructing ensembles of randomized trees has a significant impact on predictive performance. The most successful strategy - significantly outperforming the considered alternatives - involves randomly sampling distributed representations of the clinical events when building each decision tree in the forest.

Subjects

Decision Trees , Drug-Related Side Effects and Adverse Reactions , Electronic Health Records , Machine Learning , Theoretical Models , Pharmacovigilance , Humans , Semantics

6.

Identifying adverse drug event information in clinical notes with distributional semantic representations of context.

Henriksson, Aron; Kvist, Maria; Dalianis, Hercules; Duneld, Martin.

J Biomed Inform ; 57: 333-49, 2015 Oct.

Article in English | MEDLINE | ID: mdl-26291578

ABSTRACT

For the purpose of post-marketing drug safety surveillance, which has traditionally relied on the voluntary reporting of individual cases of adverse drug events (ADEs), other sources of information are now being explored, including electronic health records (EHRs), which give us access to enormous amounts of longitudinal observations of the treatment of patients and their drug use. Adverse drug events, which can be encoded in EHRs with certain diagnosis codes, are, however, heavily underreported. It is therefore important to develop capabilities to process, by means of computational methods, the more unstructured EHR data in the form of clinical notes, where clinicians may describe and reason around suspected ADEs. In this study, we report on the creation of an annotated corpus of Swedish health records for the purpose of learning to identify information pertaining to ADEs present in clinical notes. To this end, three key tasks are tackled: recognizing relevant named entities (disorders, symptoms, drugs), labeling attributes of the recognized entities (negation, speculation, temporality), and relationships between them (indication, adverse drug event). For each of the three tasks, leveraging models of distributional semantics - i.e., unsupervised methods that exploit co-occurrence information to model, typically in vector space, the meaning of words - and, in particular, combinations of such models, is shown to improve the predictive performance. The ability to make use of such unsupervised methods is critical when faced with large amounts of sparse and high-dimensional data, especially in domains where annotated resources are scarce.

Subjects

Drug-Related Side Effects and Adverse Reactions , Electronic Health Records , Semantics , Data Curation , Data Mining , Humans

7.

Automatic recognition of disorders, findings, pharmaceuticals and body structures from clinical text: an annotation and machine learning study.

Skeppstedt, Maria; Kvist, Maria; Nilsson, Gunnar H; Dalianis, Hercules.

J Biomed Inform ; 49: 148-58, 2014 Jun.

Article in English | MEDLINE | ID: mdl-24508177

ABSTRACT

Automatic recognition of clinical entities in the narrative text of health records is useful for constructing applications for documentation of patient care, as well as for secondary usage in the form of medical knowledge extraction. There are a number of named entity recognition studies on English clinical text, but less work has been carried out on clinical text in other languages. This study was performed on Swedish health records, and focused on four entities that are highly relevant for constructing a patient overview and for medical hypothesis generation, namely the entities: Disorder, Finding, Pharmaceutical Drug and Body Structure. The study had two aims: to explore how well named entity recognition methods previously applied to English clinical text perform on similar texts written in Swedish; and to evaluate whether it is meaningful to divide the more general category Medical Problem, which has been used in a number of previous studies, into the two more granular entities, Disorder and Finding. Clinical notes from a Swedish internal medicine emergency unit were annotated for the four selected entity categories, and the inter-annotator agreement between two pairs of annotators was measured, resulting in an average F-score of 0.79 for Disorder, 0.66 for Finding, 0.90 for Pharmaceutical Drug and 0.80 for Body Structure. A subset of the developed corpus was thereafter used for finding suitable features for training a conditional random fields model. Finally, a new model was trained on this subset, using the best features and settings, and its ability to generalise to held-out data was evaluated. This final model obtained an F-score of 0.81 for Disorder, 0.69 for Finding, 0.88 for Pharmaceutical Drug, 0.85 for Body Structure and 0.78 for the combined category Disorder+Finding. 
The obtained results, which are in line with or slightly lower than those for similar studies on English clinical text, many of them conducted using a larger training data set, show that the approaches used for English are also suitable for Swedish clinical text. However, a small proportion of the errors made by the model are less likely to occur in English text, showing that results might be improved by further tailoring the system to clinical Swedish. The entity recognition results for the individual entities Disorder and Finding show that it is meaningful to separate the general category Medical Problem into these two more granular entity types, e.g. for knowledge mining of co-morbidity relations and disorder-finding relations.

Subjects

Artificial Intelligence , Automation , Disease , Humans

8.

Improving Quality of ICD-10 (International Statistical Classification of Diseases, Tenth Revision) Coding Using AI: Protocol for a Crossover Randomized Controlled Trial.

Chomutare, Taridzo; Lamproudis, Anastasios; Budrionis, Andrius; Svenning, Therese Olsen; Hind, Lill Irene; Ngo, Phuong Dinh; Mikalsen, Karl Øyvind; Dalianis, Hercules.

JMIR Res Protoc ; 13: e54593, 2024 Mar 12.

Article in English | MEDLINE | ID: mdl-38470476

ABSTRACT

BACKGROUND: Computer-assisted clinical coding (CAC) tools are designed to help clinical coders assign standardized codes, such as the ICD-10 (International Statistical Classification of Diseases, Tenth Revision), to clinical texts, such as discharge summaries. Maintaining the integrity of these standardized codes is important both for the functioning of health systems and for ensuring data used for secondary purposes are of high quality. Clinical coding is an error-prone cumbersome task, and the complexity of modern classification systems such as the ICD-11 (International Classification of Diseases, Eleventh Revision) presents significant barriers to implementation. To date, there have only been a few user studies; therefore, our understanding is still limited regarding the role CAC systems can play in reducing the burden of coding and improving the overall quality of coding. OBJECTIVE: The objective of the user study is to generate both qualitative and quantitative data for measuring the usefulness of a CAC system, Easy-ICD, that was developed for recommending ICD-10 codes. Specifically, our goal is to assess whether our tool can reduce the burden on clinical coders and also improve coding quality. METHODS: The user study is based on a crossover randomized controlled trial study design, where we measure the performance of clinical coders when they use our CAC tool versus when they do not. Performance is measured by the time it takes them to assign codes to both simple and complex clinical texts as well as the coding quality, that is, the accuracy of code assignment. RESULTS: We expect the study to provide us with a measurement of the effectiveness of the CAC system compared to manual coding processes, both in terms of time use and coding quality. 
Positive outcomes from this study will imply that CAC tools hold the potential to reduce the burden on health care staff and will have major implications for the adoption of artificial intelligence-based CAC innovations to improve coding practice. Results are expected to be published in summer 2024. CONCLUSIONS: The planned user study promises a greater understanding of the impact CAC systems might have on clinical coding in real-life settings, especially with regard to coding time and quality. Further, the study may add new insights on how to meaningfully exploit current clinical text mining capabilities, with a view to reducing the burden on clinical coders, thus lowering the barriers and paving a more sustainable path to the adoption of modern coding systems, such as the new ICD-11. TRIAL REGISTRATION: clinicaltrials.gov NCT06286865; https://clinicaltrials.gov/study/NCT06286865. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): DERR1-10.2196/54593.

9.

De-identifying Norwegian Clinical Text using Resources from Swedish and Danish.

Lamproudis, Anastasios; Mora, Sara; Svenning, Therese Olsen; Torsvik, Torbjørn; Chomutare, Taridzo; Ngo, Phuong Dinh; Dalianis, Hercules.

AMIA Annu Symp Proc ; 2023: 456-464, 2023.

Article in English | MEDLINE | ID: mdl-38222432

ABSTRACT

The lack of relevant annotated datasets represents one key limitation in the application of Natural Language Processing techniques in a broad number of tasks, among them Protected Health Information (PHI) identification in Norwegian clinical text. In this work, the possibility of exploiting resources from Swedish, a language very closely related to Norwegian, is explored. The Swedish dataset is annotated with PHI information. Different processing and text augmentation techniques are evaluated, along with their impact on the final performance of the model. The augmentation techniques, such as injection and generation of both Norwegian and Scandinavian Named Entities into the Swedish training corpus, were shown to increase performance in the de-identification task for both Danish and Norwegian text. This trend was also confirmed by the evaluation of model performance on a sample Norwegian gastro surgical clinical text.

Subjects

Electronic Health Records , Language , Humans , Sweden , Natural Language Processing , Denmark

10.

Using a Large Open Clinical Corpus for Improved ICD-10 Diagnosis Coding.

Lamproudis, Anastasios; Svenning, Therese Olsen; Torsvik, Torbjørn; Chomutare, Taridzo; Budrionis, Andrius; Dinh Ngo, Phuong; Vakili, Thomas; Dalianis, Hercules.

AMIA Annu Symp Proc ; 2023: 465-473, 2023.

Article in English | MEDLINE | ID: mdl-38222373

ABSTRACT

With the recent advances in natural language processing and deep learning, the development of tools that can assist medical coders in ICD-10 diagnosis coding and increase their efficiency in coding discharge summaries is significantly more viable than before. To that end, one important component in the development of these models is the datasets used to train them. In this study, such datasets are presented, and it is shown that one of them can be used to develop a BERT-based language model that can consistently perform well in assigning ICD-10 codes to discharge summaries written in Swedish. Most importantly, it can be used in a coding support setup where a tool can recommend potential codes to the coders. This reduces the range of potential codes to consider and, in turn, reduces the workload of the coder. Moreover, the de-identified and pseudonymised dataset is open to use for academic users.

Subjects

International Classification of Diseases , Patient Discharge , Humans , Natural Language Processing , Clinical Coding

11.

Predicting sepsis onset using a machine learned causal probabilistic network algorithm based on electronic health records data.

Valik, John Karlsson; Ward, Logan; Tanushi, Hideyuki; Johansson, Anders F; Färnert, Anna; Mogensen, Mads Lause; Pickering, Brian W; Herasevich, Vitaly; Dalianis, Hercules; Henriksson, Aron; Nauclér, Pontus.

Sci Rep ; 13(1): 11760, 2023 07 20.

Article in English | MEDLINE | ID: mdl-37474597

ABSTRACT

Sepsis is a leading cause of mortality and early identification improves survival. With increasing digitalization of health care data, automated sepsis prediction models hold promise to aid in prompt recognition. Most previous studies have focused on the intensive care unit (ICU) setting. Yet only a small proportion of sepsis develops in the ICU and there is an apparent clinical benefit in identifying patients earlier in the disease trajectory. In this cohort of 82,852 hospital admissions and 8038 sepsis episodes classified according to the Sepsis-3 criteria, we demonstrate that a machine learned score can predict sepsis onset within 48 h using sparse routine electronic health record data outside the ICU. Our score was based on a causal probabilistic network model-SepsisFinder-which has similarities with clinical reasoning. A prediction was generated hourly on all admissions, provided that a new variable was registered. Compared to the National Early Warning Score (NEWS2), which is an established method to identify sepsis, the SepsisFinder triggered earlier and had a higher area under receiver operating characteristic curve (AUROC) (0.950 vs. 0.872), as well as area under precision-recall curve (APR) (0.189 vs. 0.149). A machine learning comparator based on a gradient-boosting decision tree model had similar AUROC (0.949) and higher APR (0.239) than SepsisFinder but triggered later than both NEWS2 and SepsisFinder. The precision of SepsisFinder increased if screening was restricted to the earlier admission period and in episodes with bloodstream infection. Furthermore, the SepsisFinder signaled a median of 5.5 h prior to antibiotic administration. Identifying a high-risk population with this method could be used to tailor clinical interventions and improve patient care.

Subjects

Electronic Health Records , Sepsis , Humans , Retrospective Studies , Sepsis/diagnosis , Sepsis/epidemiology , Algorithms , Hospitalization , ROC Curve , Intensive Care Units , Hospital Mortality

12.

The accuracy of fully automated algorithms for surveillance of healthcare-onset Clostridioides difficile infections in hospitalized patients.

van der Werff, Suzanne Desirée; Fritzing, Mikael; Tanushi, Hideyuki; Henriksson, Aron; Dalianis, Hercules; Ternhag, Anders; Färnert, Anna; Nauclér, Pontus.

Antimicrob Steward Healthc Epidemiol ; 2(1): e43, 2022.

Article in English | MEDLINE | ID: mdl-36310782

ABSTRACT

We developed and validated a set of fully automated surveillance algorithms for healthcare-onset CDI using electronic health records. In a validation data set of 750 manually annotated admissions, the algorithm based on International Classification of Disease, Tenth Revision (ICD-10) code A04.7 had insufficient sensitivity. Algorithms based on microbiological test results with or without addition of symptoms performed well.

13.

Factuality levels of diagnoses in Swedish clinical text.

Velupillai, Sumithra; Dalianis, Hercules; Kvist, Maria.

Stud Health Technol Inform ; 169: 559-63, 2011.

Article in English | MEDLINE | ID: mdl-21893811

ABSTRACT

Different levels of knowledge certainty, or factuality levels, are expressed in clinical health record documentation. This information is currently not fully exploited, as the subtleties expressed in natural language cannot easily be machine analyzed. Extracting relevant information from knowledge-intensive resources such as electronic health records can be used for improving health care in general by e.g. building automated information access systems. We present an annotation model of six factuality levels linked to diagnoses in Swedish clinical assessments from an emergency ward. Our main findings are that overall agreement is fairly high (0.7/0.58 F-measure, 0.73/0.6 Cohen's κ, Intra/Inter). These distinctions are important for knowledge models, since only approx. 50% of the diagnoses are affirmed with certainty. Moreover, our results indicate that there are patterns inherent in the diagnosis expressions themselves conveying factuality levels, showing that certainty is not only dependent on context cues.
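Agreement figures such as the Cohen's κ values reported here follow a standard formula, sketched below for two annotators. The function name and the toy labels are hypothetical and do not reproduce the study's six factuality levels or its data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long lists of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    annotator_1 = ["certain", "certain", "possible", "possible"]
    annotator_2 = ["certain", "possible", "possible", "possible"]
    print(cohens_kappa(annotator_1, annotator_2))  # 0.5
```

Intra-annotator agreement would compare the same annotator's labels from two passes, and inter-annotator agreement would compare two different annotators; the calculation is identical in both cases.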

Subjects

Clinical Decision Support Systems , Electronic Health Records , Emergency Medical Services , Algorithms , Computer-Assisted Diagnosis , Differential Diagnosis , Electronic Data Processing , Humans , Language , Medical Informatics/methods , Observer Variation , Reproducibility of Results , Software , Sweden , Terminology as Topic

14.

Louhi 2014: Special issue on health text mining and information analysis.

Velupillai, Sumithra; Duneld, Martin; Henriksson, Aron; Kvist, Maria; Skeppstedt, Maria; Dalianis, Hercules.

BMC Med Inform Decis Mak ; 15 Suppl 2: S1, 2015.

Article in English | MEDLINE | ID: mdl-26099575

Subjects

Data Mining/methods , Health Information Management/organization & administration , Health Information Systems/organization & administration , Congresses as Topic , Health Information Management/methods , Humans

15.

Natural language processing and machine learning to enable automatic extraction and classification of patients' smoking status from electronic medical records.

Caccamisi, Andrea; Jørgensen, Leif; Dalianis, Hercules; Rosenlund, Mats.

Ups J Med Sci ; 125(4): 316-324, 2020 Nov.

Article in English | MEDLINE | ID: mdl-32696698

ABSTRACT

BACKGROUND: The electronic medical record (EMR) offers unique possibilities for clinical research, but some important patient attributes are not readily available due to its unstructured properties. We applied text mining using machine learning to enable automatic classification of unstructured information on smoking status from Swedish EMR data. METHODS: Data on patients' smoking status from EMRs were used to develop 32 different predictive models that were trained using Weka, changing sentence frequency, classifier type, tokenization, and attribute selection in a database of 85,000 classified sentences. The models were evaluated using F-score and accuracy based on out-of-sample test data including 8500 sentences. The error weight matrix was used to select the best model, assigning a weight to each type of misclassification and applying it to the model confusion matrices. The best performing model was then compared to a rule-based method. RESULTS: The best performing model was based on the Support Vector Machine (SVM) Sequential Minimal Optimization (SMO) classifier using a combination of unigrams and bigrams as tokens. Sentence frequency and attributes selection did not improve model performance. SMO achieved 98.14% accuracy and 0.981 F-score versus 79.32% and 0.756 for the rule-based model. CONCLUSION: A model using machine-learning algorithms to automatically classify patients' smoking status was successfully developed. Such algorithms may enable automatic assessment of smoking status and other unstructured data directly from EMRs without manual classification of complete case notes.
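The model-selection step, in which a weight is assigned to each type of misclassification and applied to each model's confusion matrix, can be sketched as below. The class order, the weights, and the confusion matrices are invented for illustration; the abstract does not report the values used in the study.

```python
def weighted_error(confusion, weights):
    """Sum of confusion-matrix counts multiplied by their misclassification weights.

    confusion[i][j] counts items of true class i predicted as class j;
    weights[i][j] is the (hypothetical) cost of that outcome, 0 on the diagonal.
    """
    return sum(
        count * weight
        for conf_row, weight_row in zip(confusion, weights)
        for count, weight in zip(conf_row, weight_row)
    )


def select_best_model(confusions, weights):
    """Return the name of the model with the lowest weighted error."""
    return min(confusions, key=lambda name: weighted_error(confusions[name], weights))


if __name__ == "__main__":
    # Hypothetical class order: smoker, ex-smoker, non-smoker.
    weights = [[0, 1, 3],
               [1, 0, 1],
               [3, 1, 0]]
    confusions = {
        "model_a": [[50, 5, 0], [4, 40, 6], [1, 3, 60]],
        "model_b": [[48, 2, 5], [3, 44, 3], [4, 2, 58]],
    }
    for name, matrix in confusions.items():
        print(name, weighted_error(matrix, weights))
    print(select_best_model(confusions, weights))
```

In this toy example both models make 19 errors, yet the weighting separates them because one confuses the two most distant classes more often, which is the kind of distinction plain accuracy cannot make.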

Subjects

Electronic Health Records , Machine Learning , Natural Language Processing , Smoking , Tobacco Use Disorder/diagnosis , Algorithms , Automation , Bayes Theorem , Data Mining , False Positive Reactions , Humans , Medical Informatics , Observer Variation , Automated Pattern Recognition , ROC Curve , Reproducibility of Results , Research Design , Software , Support Vector Machine , Sweden/epidemiology , Tobacco Use Disorder/epidemiology

16.

De-Identifying Swedish EHR Text Using Public Resources in the General Domain.

Chomutare, Taridzo; Yigzaw, Kassaye Yitbarek; Budrionis, Andrius; Makhlysheva, Alexandra; Godtliebsen, Fred; Dalianis, Hercules.

Stud Health Technol Inform ; 270: 148-152, 2020 Jun 16.

Article in English | MEDLINE | ID: mdl-32570364

ABSTRACT

Sensitive data is normally required to develop rule-based models or train machine learning-based models for de-identifying electronic health record (EHR) clinical notes, and this presents important problems for patient privacy. In this study, we add non-sensitive public datasets to EHR training data: (i) scientific medical text and (ii) Wikipedia word vectors. The data, all in Swedish, is used to train a deep learning model using recurrent neural networks. Tests on pseudonymized Swedish EHR clinical notes showed improved precision and recall from 55.62% and 80.02% with the base EHR embedding layer, to 85.01% and 87.15% when Wikipedia word vectors are added. These results suggest that non-sensitive text from the general domain can be used to train robust models for de-identifying Swedish clinical text, and this could be useful in cases where the data is both sensitive and in low-resource languages.

Subjects

Electronic Health Records , Language , Machine Learning , Natural Language Processing , Sweden

17.

Validation of automated sepsis surveillance based on the Sepsis-3 clinical criteria against physician record review in a general hospital population: observational study using electronic health records data.

Valik, John Karlsson; Ward, Logan; Tanushi, Hideyuki; Müllersdorf, Kajsa; Ternhag, Anders; Aufwerber, Ewa; Färnert, Anna; Johansson, Anders F; Mogensen, Mads Lause; Pickering, Brian; Dalianis, Hercules; Henriksson, Aron; Herasevich, Vitaly; Nauclér, Pontus.

BMJ Qual Saf ; 29(9): 735-745, 2020 09.

Article in English | MEDLINE | ID: mdl-32029574

ABSTRACT

BACKGROUND: Surveillance of sepsis incidence is important for directing resources and evaluating quality-of-care interventions. The aim was to develop and validate a fully-automated Sepsis-3 based surveillance system in non-intensive care wards using electronic health record (EHR) data, and demonstrate utility by determining the burden of hospital-onset sepsis and variations between wards. METHODS: A rule-based algorithm was developed using EHR data from a cohort of all adult patients admitted to an academic centre between July 2012 and December 2013. Time in intensive care units was censored. To validate algorithm performance, a stratified random sample of 1000 hospital admissions (674 with and 326 without suspected infection) was classified according to the Sepsis-3 clinical criteria (suspected infection defined as having any culture taken and at least two doses of antimicrobials administered, and an increase in Sequential Organ Failure Assessment (SOFA) score by ≥2 points) and the likelihood of infection by physician medical record review. RESULTS: In total, 82 653 hospital admissions were included. The Sepsis-3 clinical criteria determined by physician review were met in 343 of 1000 episodes. Among them, 313 (91%) had possible, probable or definite infection. Based on this reference, the algorithm achieved sensitivity 0.887 (95% CI: 0.799 to 0.964), specificity 0.985 (95% CI: 0.978 to 0.991), positive predictive value 0.881 (95% CI: 0.833 to 0.926) and negative predictive value 0.986 (95% CI: 0.973 to 0.996). When applied to the total cohort, taking into account the sampling proportions of those with and without suspected infection, the algorithm identified 8599 (10.4%) sepsis episodes. The burden of hospital-onset sepsis (>48 hours after admission) and related in-hospital mortality varied between wards. CONCLUSIONS: A fully-automated Sepsis-3 based surveillance algorithm using EHR data performed well compared with physician medical record review in non-intensive care wards, and exposed variations in hospital-onset sepsis incidence between wards.
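
As a rough illustration of the rule stated in the abstract, the sketch below flags an episode when suspected infection (any culture taken and at least two antimicrobial doses) coincides with a SOFA increase of at least 2 points, the Sepsis-3 threshold. The `Episode` record and its field names are hypothetical, and the sketch ignores the timing windows and censoring of intensive care time that a real surveillance algorithm would need; it is not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    """Hypothetical per-admission summary; field names are assumptions."""
    cultures_taken: int
    antimicrobial_doses: int
    baseline_sofa: int
    max_sofa: int

def suspected_infection(ep: Episode) -> bool:
    # Any culture taken and at least two antimicrobial doses administered.
    return ep.cultures_taken >= 1 and ep.antimicrobial_doses >= 2

def meets_sepsis3(ep: Episode) -> bool:
    # Suspected infection plus a SOFA increase of at least 2 points.
    sofa_increase = ep.max_sofa - ep.baseline_sofa
    return suspected_infection(ep) and sofa_increase >= 2

example = Episode(cultures_taken=1, antimicrobial_doses=3,
                  baseline_sofa=1, max_sofa=4)
print(meets_sepsis3(example))
```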

Subjects

Physicians , Sepsis , Adult , Electronic Health Records , Female , HIV Infections , Hospital Mortality , General Hospitals , Humans , Intensive Care Units , Retrospective Studies

18.

Knowledge patterns for online health portal development.

Andrenucci, Andrea; Dalianis, Hercules; Velupillai, Sumithra.

Health Informatics J ; 25(4): 1779-1799, 2019 12.

Article in English | MEDLINE | ID: mdl-30232926

ABSTRACT

This article describes the development and evaluation of a set of knowledge patterns that provide guidelines and implications of design for developers of mental health portals. The knowledge patterns were based on three foundations: (1) knowledge integration of language technology approaches; (2) experiments with language technology applications and (3) user studies of portal interaction. A mixed-methods approach was employed for the evaluation of the knowledge patterns: formative workshops with knowledge pattern experts and summative surveys with experts in specific domains. The formative evaluation improved the cohesion of the patterns. The results of the summative evaluation showed that the problems discussed in the patterns were relevant for the domain, and that the knowledge embedded was useful to solve them. Ten patterns out of thirteen achieved an average score above 4.0, which is a positive result that leads us to conclude that they can be used as guidelines for developing health portals.

Subjects

Knowledge , Patient Portals , Program Development/methods , Humans , Program Development/statistics & numerical data , Program Evaluation/methods , Program Evaluation/statistics & numerical data , Surveys and Questionnaires

19.

Detecting hospital-acquired infections: A document classification approach using support vector machines and gradient tree boosting.

Ehrentraut, Claudia; Ekholm, Markus; Tanushi, Hideyuki; Tiedemann, Jörg; Dalianis, Hercules.

Health Informatics J ; 24(1): 24-42, 2018 03.

Article in English | MEDLINE | ID: mdl-27496862

ABSTRACT

Hospital-acquired infections pose a significant risk to patient health, while their surveillance is an additional workload for hospital staff. Our overall aim is to build a surveillance system that reliably detects all patient records that potentially include hospital-acquired infections. This is to reduce the burden of having the hospital staff manually check patient records. This study focuses on the application of text classification using support vector machines and gradient tree boosting to the problem. Support vector machines and gradient tree boosting have never been applied to the problem of detecting hospital-acquired infections in Swedish patient records, and according to our experiments, they lead to encouraging results. The best result is yielded by gradient tree boosting, at 93.7 percent recall, 79.7 percent precision and 85.7 percent F1 score when using stemming. We can show that simple preprocessing techniques and parameter tuning can lead to high recall (which we aim for in screening patient records) with appropriate precision for this task.
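
For readers unfamiliar with the metrics reported above, the following minimal sketch shows how precision, recall, and F1 are derived from counts of true positives, false positives, and false negatives. The counts are invented for illustration and are not taken from the study.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Return (precision, recall, F1) from raw counts; 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts for a screening setting: few infected records
# are missed (high recall) at the cost of some false alarms.
p, r, f1 = precision_recall_f1(tp=90, fp=30, fn=10)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

In a screening setting such as the one described, a missed record (false negative) is usually considered worse than a false alarm, which is why recall is prioritised over precision.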

Subjects

Data Analysis , Iatrogenic Disease , Infections/diagnosis , Machine Learning/standards , Support Vector Machine/standards , Electronic Health Records/statistics & numerical data , Humans , Infections/classification , Infections/etiology , Machine Learning/statistics & numerical data , Mass Screening/methods , Mass Screening/standards

20.

Clinical Natural Language Processing in languages other than English: opportunities and challenges.

Névéol, Aurélie; Dalianis, Hercules; Velupillai, Sumithra; Savova, Guergana; Zweigenbaum, Pierre.

J Biomed Semantics ; 9(1): 12, 2018 03 30.

Article in English | MEDLINE | ID: mdl-29602312

ABSTRACT

BACKGROUND: Natural language processing applied to clinical text or aimed at a clinical outcome has been thriving in recent years. This paper offers the first broad overview of clinical Natural Language Processing (NLP) for languages other than English. Recent studies are summarized to offer insights and outline opportunities in this area. MAIN BODY: We envision three groups of intended readers: (1) NLP researchers leveraging experience gained in other languages, (2) NLP researchers faced with establishing clinical text processing in a language other than English, and (3) clinical informatics researchers and practitioners looking for resources in their languages in order to apply NLP techniques and tools to clinical practice and/or investigation. We review work in clinical NLP in languages other than English. We classify these studies into three groups: (i) studies describing the development of new NLP systems or components de novo, (ii) studies describing the adaptation of NLP architectures developed for English to another language, and (iii) studies focusing on a particular clinical application. CONCLUSION: We show the advantages and drawbacks of each method, and highlight the appropriate application context. Finally, we identify major challenges and opportunities that will affect the impact of NLP on clinical practice and public health studies in a context that encompasses English as well as other languages.

Subjects

Natural Language Processing , Humans , Semantics
