Results 1 - 12 of 12
1.
J Biomed Inform ; 150: 104586, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38191011

ABSTRACT

BACKGROUND: Halbert L. Dunn's concept of wellness is multi-dimensional, encompassing social and mental well-being. Neglecting these dimensions over time can negatively affect an individual's mental health. Experience from in-person therapy sessions shows that underlying factors of mental disturbance, if triggered, may lead to severe mental health disorders. OBJECTIVE: We introduce a fine-grained approach to identifying indicators of wellness dimensions and marking their presence in self-narrated posts on the Reddit social media platform. DESIGN AND METHOD: We present MultiWD, a curated and annotated dataset of 3281 instances designed to facilitate the identification of multiple wellness dimensions in Reddit posts. We introduce the task of identifying wellness dimensions and apply state-of-the-art classifiers to this multi-label classification task. RESULTS: Our findings highlight the comparative performance of fine-tuned large language models against a fine-tuned BERT model. We set BERT as the baseline model for tagging wellness dimensions in user-penned text, with an F1 score of 76.69. CONCLUSION: Our findings underscore the need for trustworthy, domain-specific knowledge infusion to develop more comprehensive and contextually aware AI models for tagging and extracting wellness dimensions.
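The multi-label setup described above can be sketched minimally as follows; the dimension names, thresholding rule, and example predictions are hypothetical illustrations, not taken from the MultiWD dataset or the paper's BERT baseline:

```python
# Minimal sketch of multi-label wellness-dimension tagging and macro-F1
# scoring. Labels and scores are invented for illustration only.

LABELS = ["social", "mental", "physical", "spiritual"]  # hypothetical dimensions

def tag(scores, threshold=0.5):
    """Turn per-label probabilities into a multi-label prediction set."""
    return {label for label, s in zip(LABELS, scores) if s >= threshold}

def macro_f1(gold, pred):
    """Macro-averaged F1 over labels for parallel lists of gold/predicted sets."""
    f1s = []
    for label in LABELS:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

gold = [{"social", "mental"}, {"mental"}, {"physical"}]
pred = [tag([0.9, 0.7, 0.2, 0.1]),
        tag([0.1, 0.8, 0.3, 0.2]),
        tag([0.4, 0.6, 0.9, 0.1])]
score = macro_f1(gold, pred)  # -> 0.7 on this toy example
```

A real classifier would produce the per-label scores from the post text; the thresholding and averaging steps are the same.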


Subject(s)
Mental Disorders, Social Media, Humans, Mental Health, Awareness
2.
BMC Med Inform Decis Mak ; 23(1): 20, 2023 Jan 26.
Article in English | MEDLINE | ID: mdl-36703154

ABSTRACT

BACKGROUND: Extracting relevant information about infectious diseases is an essential task. However, a significant obstacle in supporting public health research is the lack of methods for effectively mining large amounts of health data. OBJECTIVE: This study aims to use natural language processing (NLP) to extract key information (clinical factors, social determinants of health) from published cases in the literature. METHODS: The proposed framework integrates a data layer for preparing a data cohort from clinical case reports; an NLP layer for finding clinical and demographic named entities and relations in the texts; and an evaluation layer for benchmarking performance and analysis. The focus of this study is extracting valuable information from COVID-19 case reports. RESULTS: The named entity recognition implementation in the NLP layer achieves a performance gain of about 1-3% over benchmark methods. Furthermore, even without extensive data labeling, the relation extraction method outperforms benchmark methods in accuracy by 1-8%. A thorough examination reveals the disease's presence and symptom prevalence in patients. CONCLUSIONS: A similar approach can be generalized to other infectious diseases, and it is worthwhile to reuse prior knowledge acquired through transfer learning when researching them.
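The three-layer design can be illustrated with a toy sketch; the lexicons and the sentence-level co-occurrence rule below are simplified stand-ins for the trained NER and relation-extraction models described above, not the paper's implementation:

```python
# Toy data -> NLP -> output sketch of the layered framework.
import re

SYMPTOMS = {"fever", "cough", "fatigue"}      # hypothetical clinical lexicon
SDOH = {"homeless", "smoker", "unemployed"}   # hypothetical SDoH lexicon

def data_layer(report):
    """Prepare a case report: lowercase and split into sentences."""
    return [s.strip() for s in re.split(r"[.!?]", report.lower()) if s.strip()]

def nlp_layer(sentences):
    """Find clinical/SDoH entities; relate entities sharing a sentence."""
    entities, relations = [], []
    for sent in sentences:
        tokens = set(re.findall(r"[a-z]+", sent))
        found = [(t, "SYMPTOM") for t in tokens & SYMPTOMS]
        found += [(t, "SDOH") for t in tokens & SDOH]
        entities.extend(found)
        # naive relation rule: entities co-occurring in one sentence are linked
        for i in range(len(found)):
            for j in range(i + 1, len(found)):
                relations.append((found[i][0], found[j][0]))
    return entities, relations

sents = data_layer("The patient, a smoker, presented with fever. Cough developed later.")
ents, rels = nlp_layer(sents)
```

An evaluation layer would then compare `ents` and `rels` against gold annotations, which is where the reported 1-3% and 1-8% gains are measured.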


Subject(s)
COVID-19, Natural Language Processing, Humans, Publications
3.
BMC Bioinformatics ; 23(1): 210, 2022 Jun 02.
Article in English | MEDLINE | ID: mdl-35655148

ABSTRACT

BACKGROUND: Due to the growing amount of COVID-19 research literature, medical experts, clinical scientists, and researchers frequently struggle to stay up to date on the most recent findings. There is a pressing need to help researchers and practitioners mine and respond to COVID-19-related questions on time. METHODS: This paper introduces CoQUAD, a question-answering system that efficiently extracts answers to COVID-19 questions. Two datasets are provided in this work: a reference-standard dataset built using the CORD-19 and LitCOVID initiatives, and a gold-standard dataset prepared by experts from the public health domain. CoQUAD has a Retriever component, built on the BM25 algorithm, that searches the reference-standard dataset for documents relevant to a COVID-19 question. It also has a Reader component, a Transformer-based model (MPNet), that reads the retrieved paragraphs and finds the answers to the question. In contrast to previous works, CoQUAD can answer questions related to early, mid, and post-COVID-19 topics. RESULTS: Extensive experiments on the CoQUAD Retriever and Reader modules show that CoQUAD provides effective and relevant answers to COVID-19-related questions posed in natural language, with high accuracy. Compared to state-of-the-art baselines, CoQUAD outperforms the previous models, achieving an exact match ratio of 77.50% and an F1 score of 77.10%. CONCLUSION: CoQUAD is a question-answering system that mines COVID-19 literature using natural language processing techniques to help the research community find the most recent findings and answer related questions.
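The BM25 retrieval stage can be sketched in a few lines; the toy corpus and parameter values below are illustrative, not CoQUAD's actual index or configuration:

```python
# Minimal BM25 scorer: ranks tokenized documents against a tokenized query.
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each document in `corpus` (lists of tokens) against `query`."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    df = Counter(t for d in corpus for t in set(d))  # document frequencies
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        s = 0.0
        for term in query:
            if df[term] == 0:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            num = tf[term] * (k1 + 1)
            den = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            s += idf * num / den
        scores.append(s)
    return scores

corpus = [
    "covid symptoms include fever and cough".split(),
    "vaccines reduce covid transmission".split(),
    "weather forecast for tomorrow".split(),
]
query = "covid fever symptoms".split()
best = max(range(len(corpus)), key=lambda i: bm25_scores(query, corpus)[i])
```

The top-scoring documents would then be handed to the Reader model, which extracts the answer span.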


Subject(s)
Benchmarking, COVID-19, Algorithms, Humans, Language, Natural Language Processing
4.
Sci Rep ; 13(1): 8591, 2023 May 26.
Article in English | MEDLINE | ID: mdl-37237101

ABSTRACT

The ability to extract critical information about an infectious disease in a timely manner is essential for population health research. The lack of procedures for mining large amounts of health data is a major impediment. The goal of this research is to use natural language processing (NLP) to extract key information (clinical factors, social determinants of health) from free text. The proposed framework describes database construction, NLP modules for locating clinical and non-clinical (social determinant) information, and a detailed evaluation protocol for assessing results and demonstrating the effectiveness of the framework. The use of COVID-19 case reports is demonstrated for data construction and pandemic surveillance. The proposed approach outperforms benchmark methods in F1 score by about 1-3%. A thorough examination reveals the disease's presence as well as the frequency of symptoms in patients. The findings suggest that prior knowledge gained through transfer learning can be useful when researching infectious diseases with similar presentations, in order to accurately predict patient outcomes.


Subject(s)
COVID-19, Natural Language Processing, Humans, COVID-19/epidemiology, Electronic Health Records, Records, Pandemics
5.
Article in English | MEDLINE | ID: mdl-37738190

ABSTRACT

Clinical decision-making is complex and time-intensive. To help in this effort, clinical recommender systems (RS) have been designed to provide healthcare practitioners with personalized advice. However, designing an effective clinical RS poses challenges due to the multifaceted nature of clinical data and the demand for tailored recommendations. In this paper, we introduce a 2-Stage Recommendation framework for clinical decision-making, which leverages a publicly accessible dataset of electronic health records. In the first stage, a deep neural network-based model extracts a set of candidate items, such as diagnoses, medications, and prescriptions, from a patient's electronic health records. In the second stage, a deep learning model ranks the candidates and pinpoints the most relevant items for healthcare providers. Both the retriever and the ranker are pre-trained transformer models stacked together as a pipeline. To validate our model, we compared its performance against several baseline models using different evaluation metrics. The results reveal that our proposed model attains a performance gain of approximately 12.3% macro-average F1 over the second-best-performing baseline. Qualitative analysis across various dimensions also confirms the model's high performance. Furthermore, we discuss challenges such as data availability and privacy concerns, and shed light on future exploration in this domain.
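The retrieve-then-rank idea can be sketched with toy components; the co-occurrence table and ranker scores below are hand-made stand-ins for the paper's two stacked transformer models, and all item names are hypothetical:

```python
# Sketch of a two-stage clinical recommender: stage 1 gathers candidates,
# stage 2 re-ranks them. All data here is invented for illustration.

CO_OCCURRENCE = {  # hypothetical condition -> candidate medications
    "diabetes": ["metformin", "insulin"],
    "hypertension": ["lisinopril", "amlodipine"],
    "infection": ["amoxicillin"],
}

RANK_SCORE = {  # hypothetical ranker output (higher = more relevant)
    "metformin": 0.9, "insulin": 0.6, "lisinopril": 0.8,
    "amlodipine": 0.4, "amoxicillin": 0.7,
}

def retrieve(conditions):
    """Stage 1: gather candidate items for every recorded condition."""
    candidates = []
    for c in conditions:
        for item in CO_OCCURRENCE.get(c, []):
            if item not in candidates:
                candidates.append(item)
    return candidates

def rank(candidates, top_n=2):
    """Stage 2: order candidates by ranker score and keep the top N."""
    return sorted(candidates, key=lambda i: RANK_SCORE[i], reverse=True)[:top_n]

recommendations = rank(retrieve(["diabetes", "hypertension"]))
```

Splitting the work this way lets a cheap first stage cut the item space before an expensive ranking model runs, which is the usual motivation for two-stage recommender designs.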

6.
Article in English | MEDLINE | ID: mdl-37680768

ABSTRACT

Background: Substance use, including the non-medical use of prescription medications, is a global health problem resulting in hundreds of thousands of overdose deaths and other health problems. Social media has emerged as a potent source of information for studying substance use-related behaviours and their consequences. Mining large-scale social media data on the topic requires natural language processing (NLP) and machine learning frameworks customized for this problem. Our objective in this research is to develop a framework for content analysis of Twitter chatter about the non-medical use of a set of prescription medications. Methods: We collected Twitter data for four medications: fentanyl and morphine (opioids), alprazolam (a benzodiazepine), and Adderall® (a stimulant), and identified posts indicating non-medical use with an automatic machine learning classifier. In our NLP framework, we applied supervised named entity recognition (NER) to identify other substances mentioned, symptoms, and adverse events. We applied unsupervised topic modelling to identify latent topics associated with the chatter for each medication. Results: The quantitative analysis demonstrated that the proposed NER approach identifies substance-related entities with a high degree of accuracy compared to the baseline methods. The performance of the topic modelling was also notable. The qualitative analysis revealed knowledge about the use, non-medical use, and side effects of these medications in individuals and communities. Conclusions: NLP-based analyses of Twitter chatter associated with prescription medications belonging to different categories provide multi-faceted insights about their use and consequences. Our framework can be applied to chatter about other substances. Further research can validate the predictive value of this information for the prevention, assessment, and management of substance use disorders.
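As a crude stand-in for the topic-modelling step, one can rank the most frequent content words in the chatter collected for each medication; real topic models (e.g. LDA) infer latent topics rather than raw frequencies, and the posts below are invented examples, not actual Twitter data:

```python
# Surface salient terms per medication's chatter (toy frequency-based sketch).
import re
from collections import Counter

STOPWORDS = {"a", "all", "and", "for", "i", "is", "it", "me", "my",
             "no", "on", "the", "to", "took"}

def salient_terms(posts, top_n=3):
    """Return the `top_n` most frequent non-stopword terms across posts."""
    counts = Counter(
        w for post in posts
        for w in re.findall(r"[a-z]+", post.lower())
        if w not in STOPWORDS
    )
    return [term for term, _ in counts.most_common(top_n)]

adderall_posts = [  # invented example chatter
    "took adderall to focus for finals, no sleep again",
    "adderall without a script, focus is unreal",
    "no sleep, focus all night on adderall",
]
terms = salient_terms(adderall_posts)  # -> ["adderall", "focus", "sleep"]
```

Even this naive view hints at the themes (non-medical use for focus, sleep loss) that a proper topic model would surface with latent structure.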

7.
Healthc Anal (N Y) ; 2: 100068, 2022 Nov.
Article in English | MEDLINE | ID: mdl-37520616

ABSTRACT

Coronavirus disease (COVID-19) is an infectious disease caused by the SARS-CoV-2 virus. Due to the rapidly growing literature on COVID-19, it is hard to obtain precise, up-to-date information about the virus, and practitioners, front-line workers, and researchers need expert-specific methods to stay current on scientific knowledge and research findings. This problem motivates the design of the COVID-19 Search Engine (CO-SE), an algorithmic system that finds relevant documents for each user query and answers complex questions by searching a large corpus of publications. CO-SE has a retriever component built on a TF-IDF vectorizer that retrieves the relevant documents from the system. It also has a reader component, a Transformer-based model that reads the retrieved paragraphs and finds the answers related to the query. The proposed model outperforms previous models, obtaining an exact match ratio of 71.45% and a semantic answer similarity score of 78.55%. It also performs well on other benchmark datasets, demonstrating the generalizability of the proposed approach.
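The TF-IDF retrieval stage can be sketched from first principles; the toy corpus below is illustrative and this is not the CO-SE implementation:

```python
# Minimal TF-IDF retriever: vectorize documents, score by cosine similarity.
import math
from collections import Counter

def tfidf_vectors(corpus):
    """Build a sparse TF-IDF vector (dict) for each tokenized document."""
    N = len(corpus)
    df = Counter(t for d in corpus for t in set(d))
    idf = {t: math.log(N / df[t]) + 1 for t in df}
    return [{t: c * idf[t] for t, c in Counter(d).items()} for d in corpus], idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

corpus = [
    "sars cov 2 causes covid 19".split(),
    "covid 19 vaccines are effective".split(),
    "stock markets closed higher".split(),
]
vecs, idf = tfidf_vectors(corpus)
query = Counter("covid 19 vaccines".split())
qvec = {t: c * idf.get(t, 0.0) for t, c in query.items()}
best = max(range(len(corpus)), key=lambda i: cosine(qvec, vecs[i]))
```

As in CoQUAD above, the retriever only narrows the corpus; the reader model then extracts the answer from the top documents.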

8.
Int J Data Sci Anal ; 13(4): 335-362, 2022.
Article in English | MEDLINE | ID: mdl-35128038

ABSTRACT

Fake news is a real problem in today's world, and it has become more extensive and harder to identify. A major challenge in fake news detection is detecting it in the early phase. Another challenge is the unavailability or shortage of labelled data for training detection models. We propose a novel fake news detection framework that addresses both challenges. The framework exploits information from news articles and social contexts to detect fake news. The proposed model is based on a Transformer architecture with two parts: an encoder that learns useful representations from the fake news data and a decoder that predicts future behaviour based on past observations. We also incorporate many features from the news content and social contexts to improve classification. In addition, we propose an effective labelling technique to address the label shortage problem. Experimental results on real-world data show that our model detects fake news with higher accuracy than the baselines within a few minutes after it begins to propagate (early detection).

9.
Artif Intell Rev ; 55(1): 749-800, 2022.
Article in English | MEDLINE | ID: mdl-34305252

ABSTRACT

Nowadays, more and more readers consume news online, where they have access to millions of articles from multiple sources. To help users find the right and relevant content, news recommender systems (NRS) are developed to relieve information overload and suggest news items that might be of interest to readers. In this paper, we highlight the major challenges faced by NRS and identify possible solutions from the state of the art. Our discussion is divided into two parts. In the first part, we present an overview of the recommendation solutions, datasets, evaluation criteria beyond accuracy, and recommendation platforms used in NRS. We also discuss two popular classes of models that have been used successfully in recent years. In the second part, we focus on deep neural networks as solutions for building NRS. Unlike previous surveys, we study the effects of news recommendations on user behaviour and suggest possible remedies to mitigate those effects. By providing state-of-the-art knowledge, this survey can help researchers and professional practitioners better understand recent developments in news recommendation algorithms, and it sheds light on potential new directions.

10.
Int J Data Sci Anal ; : 1-21, 2022 Sep 01.
Article in English | MEDLINE | ID: mdl-36065448

ABSTRACT

With the increasing use of data-centric systems and algorithms in machine learning, fairness is receiving considerable attention in the academic and broader literature. This paper introduces Dbias (https://pypi.org/project/Dbias/), an open-source Python package for ensuring fairness in news articles. Dbias takes any text and determines whether it is biased; it then detects biased words in the text, masks them, and suggests a set of sentences with new words that are bias-free or at least less biased. We conduct extensive experiments to assess the performance of Dbias against existing fairness models, and we also test its individual components to see how effective they are. The experimental results show that Dbias outperforms all the baselines in terms of accuracy and fairness. We make the package publicly available for developers and practitioners to mitigate bias in textual data (such as news articles) and to encourage extensions of this work.
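The detect-mask-suggest flow can be illustrated with a toy pipeline; the lexicon and replacements below are invented for illustration, and this is not Dbias's actual API (see the PyPI page for the real interface):

```python
# Toy detect -> mask -> suggest pipeline in the spirit of the description above.
import re

BIASED_TERMS = {  # hypothetical biased word -> more neutral suggestion
    "hysterical": "upset",
    "thug": "person",
}

def detect(text):
    """Return the biased words found in `text`."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w in BIASED_TERMS]

def mask(text):
    """Replace each detected biased word with a [MASK] placeholder."""
    for w in detect(text):
        text = re.sub(w, "[MASK]", text, flags=re.IGNORECASE)
    return text

def suggest(text):
    """Replace each detected biased word with a less biased alternative."""
    for w in detect(text):
        text = re.sub(w, BIASED_TERMS[w], text, flags=re.IGNORECASE)
    return text

masked = mask("The hysterical crowd gathered.")   # "The [MASK] crowd gathered."
fixed = suggest("The hysterical crowd gathered.") # "The upset crowd gathered."
```

In the real package, the detection and suggestion steps are learned models rather than a fixed lexicon, but the pipeline shape is the same.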

11.
Viruses ; 14(12), 2022 Dec 11.
Article in English | MEDLINE | ID: mdl-36560764

ABSTRACT

The clinical application of detecting COVID-19 factors is a challenging task. Existing named entity recognition models are usually trained on a limited set of named entities. Besides clinical factors, non-clinical factors such as social determinants of health (SDoH) are also important for studying infectious disease. In this paper, we propose a generalizable machine learning approach that improves on previous efforts by recognizing a large number of clinical risk factors and SDoH. The novelty of the proposed method lies in the careful combination of several deep neural networks, including a BiLSTM-CNN-CRF model and a transformer-based embedding layer. Experimental results on a cohort of COVID-19 data prepared from PubMed articles show the superiority of the proposed approach: compared to other methods, it achieves a performance gain of about 1-5% in macro- and micro-average F1 scores. Clinical practitioners and researchers can use this approach to obtain accurate information regarding clinical risks and SDoH factors, and can use the pipeline as a tool in responding to this pandemic and preparing for future ones.
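In a BiLSTM-CNN-CRF tagger, the CRF layer picks the best tag sequence over the network's per-token scores via Viterbi decoding. The sketch below shows just that decoding step; the tag set and the hand-made score matrices are illustrative, not the paper's trained parameters:

```python
# Viterbi decoding over emission and transition scores (the CRF layer's job).

TAGS = ["O", "B-RISK", "I-RISK"]  # hypothetical tag set

def viterbi(emissions, transitions):
    """emissions[t][i]: score of tag i at step t; transitions[i][j]: i -> j."""
    n_tags = len(emissions[0])
    score = list(emissions[0])
    back = []
    for em in emissions[1:]:
        prev, ptr, score = score[:], [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: prev[i] + transitions[i][j])
            score.append(prev[best_i] + transitions[best_i][j] + em[j])
            ptr.append(best_i)
        back.append(ptr)
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(back):          # follow backpointers to recover the path
        path.append(ptr[path[-1]])
    return [TAGS[i] for i in reversed(path)]

# Transition scores strongly penalize I-RISK directly after O.
trans = [[0.0, 0.0, -10.0],
         [0.0, 0.0, 1.0],
         [0.0, 0.0, 1.0]]
emis = [[2.0, 3.0, 0.0],   # step 0 favours B-RISK
        [1.0, 0.0, 2.0],   # step 1 slightly favours I-RISK
        [3.0, 0.0, 0.0]]   # step 2 favours O
tags = viterbi(emis, trans)  # -> ["B-RISK", "I-RISK", "O"]
```

The point of the CRF on top of the BiLSTM-CNN features is exactly this: invalid sequences such as O followed by I-RISK are suppressed globally rather than token by token.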


Subject(s)
COVID-19, Natural Language Processing, Humans, COVID-19/diagnosis, Neural Networks (Computer), Machine Learning, Electronic Health Records
12.
PLOS Digit Health ; 1(12): e0000152, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36812589

ABSTRACT

BACKGROUND: Despite significant advances in biomedical named entity recognition methods, the clinical application of these systems continues to face many challenges: (1) most methods are trained on a limited set of clinical entities; (2) they rely heavily on large amounts of data for both pre-training and prediction, making their use in production impractical; (3) they do not consider non-clinical entities that are also related to a patient's health, such as social, economic, or demographic factors. METHODS: In this paper, we develop Bio-Epidemiology-NER (https://pypi.org/project/Bio-Epidemiology-NER/), an open-source Python package for detecting biomedical named entities in text. The approach is based on a Transformer-based system trained on a dataset annotated with many named entities (medical, clinical, biomedical, and epidemiological). It improves on previous efforts in three ways: (1) it recognizes many clinical entity types, such as medical risk factors, vital signs, drugs, and biological functions; (2) it is easily configurable, reusable, and can scale up for training and inference; (3) it also considers non-clinical factors (such as age, gender, race, and social history) that influence health outcomes. At a high level, it consists of four phases: pre-processing, data parsing, named entity recognition, and named entity enhancement. RESULTS: Experimental results show that our pipeline outperforms other methods on three benchmark datasets, with macro- and micro-average F1 scores around 90 percent and above. CONCLUSION: The package is made publicly available for researchers, doctors, clinicians, and anyone who wants to extract biomedical named entities from unstructured biomedical texts.
