Results 1 - 20 of 154
1.
BMC Bioinformatics ; 25(1): 281, 2024 Aug 27.
Article in English | MEDLINE | ID: mdl-39192204

ABSTRACT

BACKGROUND: Mining the vast pool of biomedical literature to extract accurate responses and relevant references is challenging due to the domain's interdisciplinary nature, specialized jargon, and continuous evolution. Early natural language processing (NLP) approaches often led to incorrect answers as they failed to comprehend the nuances of natural language. However, transformer models have significantly advanced the field by enabling the creation of large language models (LLMs), enhancing question-answering (QA) tasks. Despite these advances, current LLM-based solutions for specialized domains like biology and biomedicine still struggle to generate up-to-date responses while avoiding "hallucination", that is, generating plausible but factually incorrect responses. RESULTS: Our work focuses on enhancing prompts using a retrieval-augmented architecture to guide LLMs in generating meaningful responses for biomedical QA tasks. We evaluated two approaches: one relying on text embedding and vector similarity in a high-dimensional space, and our proposed method, which uses explicit signals in user queries to extract meaningful contexts. For robust evaluation, we tested these methods on 50 specific and challenging questions from diverse biomedical topics, comparing their performance against a baseline model, BM25. Retrieval performance of our method was significantly better than that of the others, achieving a median Precision@10 (the fraction of the top 10 retrieved chunks that are relevant) of 0.95. We used GPT-4, OpenAI's most advanced LLM, to maximize answer quality and manually assessed the LLM-generated responses. Our method achieved a median answer quality score of 2.5, surpassing both the baseline model and the text embedding-based approach. We developed a QA bot, WeiseEule ( https://github.com/wasimaftab/WeiseEule-LocalHost ), which utilizes these methods for comparative analysis and also offers advanced features for review writing and identifying relevant articles for citation. CONCLUSIONS: Our findings highlight the importance of prompt enhancement methods that utilize explicit signals in user queries over traditional text embedding-based approaches to improve LLM-generated responses for specialized queries in specialized domains such as biology and biomedicine. By providing users complete control over the information fed into the LLM, our approach addresses some of the major drawbacks of existing web-based chatbots and LLM-based QA systems, including hallucinations and the generation of irrelevant or outdated responses.
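
The Precision@10 figure quoted in this abstract can be illustrated with a small sketch. The code below is not taken from WeiseEule; the function names, the chunk identifiers, and the convention of always dividing by k (even when fewer than k chunks are retrieved) are assumptions made for illustration.

```python
from statistics import median


def precision_at_k(retrieved_ids, relevant_ids, k=10):
    """Fraction of the top-k retrieved chunks that are relevant.

    Assumption: the denominator is always k, so a query that returns
    fewer than k chunks is penalised for the missing ones.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    hits = sum(1 for chunk_id in retrieved_ids[:k] if chunk_id in relevant)
    return hits / k


def median_precision_at_k(runs, k=10):
    """Median Precision@k over (retrieved_ids, relevant_ids) pairs, one per question."""
    return median(precision_at_k(retrieved, relevant, k) for retrieved, relevant in runs)


# Hypothetical example: 9 of the 10 retrieved chunks are judged relevant.
retrieved = list("abcdefghij")
relevant = set("abcdefghi")
print(precision_at_k(retrieved, relevant, k=10))
```

Reporting the median rather than the mean across questions, as the abstract does, makes the summary less sensitive to a few questions with very poor retrieval.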


Subjects
Data Mining , Natural Language Processing , Data Mining/methods , Information Storage and Retrieval/methods
2.
BMC Med Inform Decis Mak ; 24(1): 18, 2024 Jan 19.
Article in English | MEDLINE | ID: mdl-38243204

ABSTRACT

OBJECTIVE: To develop a Chinese Diabetes Mellitus Ontology (CDMO) and explore methods for constructing high-quality Chinese biomedical ontologies. MATERIALS AND METHODS: We used various data sources, including Chinese clinical practice guidelines, expert consensus, literature, and hospital information system database schema, to build the CDMO. We combined top-down and bottom-up strategies and integrated text mining and cross-lingual ontology mapping. The ontology was validated by clinical experts and ontology development tools, and its application was validated through clinical decision support and Chinese natural language medical question answering. RESULTS: The current CDMO consists of 3,752 classes, 182 fine-grained object properties with hierarchical relationships, 108 annotation properties, and over 12,000 mappings to other well-known medical ontologies in English. Based on the CDMO and clinical practice guidelines, we developed 200 rules for diabetes diagnosis, treatment, diet, and medication recommendations using the Semantic Web Rule Language. By injecting ontology knowledge, CDMO enhances the performance of the T5 model on a real-world Chinese medical question answering dataset related to diabetes. CONCLUSION: CDMO has fine-grained semantic relationships and extensive annotation information, providing a foundation for medical artificial intelligence applications in Chinese contexts, including the construction of medical knowledge graphs, clinical decision support systems, and automated medical question answering. Furthermore, the development process incorporated natural language processing and cross-lingual ontology mapping to improve both the quality of the ontology and the efficiency of its development. This workflow offers a methodological reference for the efficient development of other high-quality Chinese as well as non-English medical ontologies.


Subjects
Biological Ontologies , Diabetes Mellitus , Humans , Artificial Intelligence , Language , Semantics , Diabetes Mellitus/diagnosis
3.
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi ; 41(3): 560-568, 2024 Jun 25.
Article in Chinese | MEDLINE | ID: mdl-38932543

ABSTRACT

Recent studies have introduced attention models for medical visual question answering (MVQA). In medical research, not only is the modeling of "visual attention" crucial, but the modeling of "question attention" is equally significant. To facilitate bidirectional reasoning in the attention processes involving medical images and questions, a new MVQA architecture, named MCAN, is proposed. This architecture incorporates a cross-modal co-attention network, FCAF, which identifies key words in questions and principal parts in images. Through a meta-learning channel attention module (MLCA), weights are adaptively assigned to each word and region, reflecting the model's focus on specific words and regions during reasoning. Additionally, this study designed and developed a medical domain-specific word embedding model, Med-GloVe, to further enhance the model's accuracy and practical value. Experimental results indicate that the proposed MCAN improves accuracy by 7.7% on free-form questions in the Path-VQA dataset and by 4.4% on closed-form questions in the VQA-RAD dataset, effectively improving the accuracy of medical visual question answering.


Subjects
Neural Networks, Computer , Humans , Attention , Algorithms
4.
Brief Bioinform ; 22(2): 781-799, 2021 03 22.
Article in English | MEDLINE | ID: mdl-33279995

ABSTRACT

More than 50 000 papers have been published about COVID-19 since the beginning of 2020 and several hundred new papers continue to be published every day. This incredible rate of scientific productivity leads to information overload, making it difficult for researchers, clinicians and public health officials to keep up with the latest findings. Automated text mining techniques for searching, reading and summarizing papers are helpful for addressing information overload. In this review, we describe the many resources that have been introduced to support text mining applications over the COVID-19 literature; specifically, we discuss the corpora, modeling resources, systems and shared tasks that have been introduced for COVID-19. We compile a list of 39 systems that provide functionality such as search, discovery, visualization and summarization over the COVID-19 literature. For each system, we provide a qualitative description and assessment of the system's performance, unique data or user interface features and modeling decisions. Many systems focus on search and discovery, though several systems provide novel features, such as the ability to summarize findings over multiple documents or linking between scientific articles and clinical trials. We also describe the public corpora, models and shared tasks that have been introduced to help reduce repeated effort among community members; some of these resources (especially shared tasks) can provide a basis for comparing the performance of different systems. Finally, we summarize promising results and open challenges for text mining the COVID-19 literature.


Subjects
COVID-19/epidemiology , Data Mining/methods , COVID-19/virology , Humans , SARS-CoV-2/isolation & purification
5.
J Biomed Inform ; 142: 104382, 2023 06.
Article in English | MEDLINE | ID: mdl-37156393

ABSTRACT

The article presents a workflow to create a question-answering system whose knowledge base combines knowledge graphs and scientific publications on coronaviruses. It is based on the experience gained in modeling evidence from research articles to provide answers to questions in natural language. The work contains best practices for acquiring scientific publications, tuning language models to identify and normalize relevant entities, creating representational models based on probabilistic topics, and formalizing an ontology that describes the associations between domain concepts supported by the scientific literature. All the resources generated in the domain of coronavirus are available openly as part of the Drugs4COVID initiative, and can be (re)-used independently or as a whole. They can be exploited by scientific communities conducting research related to SARS-CoV-2/COVID-19 and also by therapeutic communities, laboratories, etc., wishing to find and understand relationships between symptoms, drugs, active ingredients and their documentary evidence.


Subjects
COVID-19 , Humans , SARS-CoV-2 , Pattern Recognition, Automated , Publications
6.
J Biomed Inform ; 146: 104486, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37722445

ABSTRACT

Large neural-based Pre-trained Language Models (PLMs) have recently gained much attention due to their noteworthy performance in many downstream Information Retrieval (IR) and Natural Language Processing (NLP) tasks. PLMs can be categorized as either general-purpose, which are trained on resources such as large-scale Web corpora, or domain-specific, which are trained on in-domain or mixed-domain corpora. While domain-specific PLMs have shown promising performance on domain-specific tasks, they are significantly more computationally expensive than general-purpose PLMs as they have to be either retrained or trained from scratch. The objective of our work in this paper is to explore whether it would be possible to leverage general-purpose PLMs to achieve performance competitive with domain-specific PLMs without the need for expensive retraining of the PLMs for domain-specific tasks. By focusing specifically on the recent BioASQ Biomedical Question Answering task, we show how different general-purpose PLMs exhibit synergistic behaviour in terms of performance, which can lead to a notable overall performance improvement when they are used in tandem with each other. More concretely, given a set of general-purpose PLMs, we propose a self-supervised method for training a classifier that systematically selects the PLM that is most likely to answer the question correctly on a per-input basis. We show that through such a selection strategy, the performance of general-purpose PLMs can become competitive with domain-specific PLMs while remaining computationally light since there is no need to retrain the large language model itself. We run experiments on the BioASQ dataset, which is a large-scale biomedical question-answering benchmark. We show that our proposed selection strategy yields statistically significant performance improvements for general-purpose language models, averaging 16.7% when using only lighter models such as DistilBERT and DistilRoBERTa and 14.2% when using relatively larger models such as BERT and RoBERTa, so that their performance becomes competitive with domain-specific large language models such as PubMedBERT.

7.
J Biomed Inform ; 137: 104272, 2023 01.
Article in English | MEDLINE | ID: mdl-36563828

ABSTRACT

BACKGROUND: Secondary use of health data is a valuable source of knowledge that boosts observational studies, leading to important discoveries in the medical and biomedical sciences. The fundamental guiding principle for performing a successful observational study is the research question and the approach in advance of executing a study. However, in multi-centre studies, finding suitable datasets to support the study is challenging, time-consuming, and sometimes impossible without a deep understanding of each dataset. METHODS: We propose a strategy for retrieving biomedical datasets of interest that were semantically annotated, using an interface built by applying a methodology for transforming natural language questions into formal language queries. The advantages of creating biomedical semantic data are enhanced by using natural language interfaces to issue complex queries without manipulating a logical query language. RESULTS: Our methodology was validated using Alzheimer's disease datasets published in a European platform for sharing and reusing biomedical data. We converted data to semantic information format using biomedical ontologies in everyday use in the biomedical community and published it as a FAIR endpoint. We have considered natural language questions of three types: single-concept questions, questions with exclusion criteria, and multi-concept questions. Finally, we analysed the performance of the question-answering module we used and its limitations. The source code is publicly available at https://bioinformatics-ua.github.io/BioKBQA/. CONCLUSION: We propose a strategy for using information extracted from biomedical data and transformed into a semantic format using open biomedical ontologies. Our method uses natural language to formulate questions to be answered by this semantic data without the direct use of formal query languages.


Subjects
Natural Language Processing , Semantics , Software , Language , Databases, Factual
8.
J Biomed Inform ; 144: 104416, 2023 08.
Article in English | MEDLINE | ID: mdl-37321443

ABSTRACT

This paper describes contextualized medication event extraction for automatically identifying medication change events with their contexts from clinical notes. The striding named entity recognition (NER) model extracts medication name spans from an input text sequence using a sliding-window approach. Specifically, the striding NER model separates the input sequence into a set of overlapping subsequences of 512 tokens with 128 tokens of stride, processing each subsequence using a large pre-trained language model and aggregating the outputs from the subsequences. The event and context classification has been done with multi-turn question-answering (QA) and span-based models. The span-based model classifies the span of each medication name using the span representation of the language model. In the QA model, event classification is augmented with questions in classifying the change events of each medication name and the context of the change events, while the model architecture is a classification style that is the same as the span-based model. We evaluated our extraction system on the n2c2 2022 Track 1 dataset, which is annotated for medication extraction (ME), event classification (EC), and context classification (CC) from clinical notes. Our system is a pipeline of the striding NER model for ME and the ensemble of the span-based and QA-based models for EC and CC. Our system achieved a combined F-score of 66.47% for the end-to-end contextualized medication event extraction (Release 1), which is the highest score among the participants of the n2c2 2022 Track 1.
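
The sliding-window step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that "128 tokens of stride" means each window starts 128 tokens after the previous one (some libraries use the word for the overlap instead), and the aggregation helper `merge_spans` is a hypothetical simplification that only converts window-local spans to global positions and removes duplicates.

```python
def striding_windows(tokens, window_size=512, step=128):
    """Split a token list into overlapping windows.

    Returns (start_offset, window_tokens) pairs. Assumption: consecutive
    windows start `step` tokens apart, so they overlap by
    window_size - step tokens.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    if not tokens:
        return []
    windows = []
    start = 0
    while True:
        windows.append((start, tokens[start:start + window_size]))
        if start + window_size >= len(tokens):
            break  # this window already reaches the end of the sequence
        start += step
    return windows


def merge_spans(window_predictions):
    """Aggregate per-window span predictions (hypothetical helper).

    `window_predictions` holds (start_offset, [(local_begin, local_end), ...])
    pairs; spans found in several overlapping windows are kept once.
    """
    merged = set()
    for offset, spans in window_predictions:
        for begin, end in spans:
            merged.add((begin + offset, end + offset))
    return sorted(merged)


# A 1000-token input yields windows starting at 0, 128, 256, 384 and 512.
starts = [start for start, _ in striding_windows(list(range(1000)))]
print(starts)
```

Because every token appears in several windows, a medication name near a window boundary is still seen with context on both sides in at least one window, which is the usual motivation for the overlap.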


Subjects
Medication Systems , Natural Language Processing , Humans , Language , Data Mining , Electronic Health Records
9.
BMC Med Inform Decis Mak ; 23(1): 119, 2023 07 13.
Article in English | MEDLINE | ID: mdl-37442993

ABSTRACT

BACKGROUND: Kampo medicine is widely used in Japan; however, most physicians and pharmacists have insufficient knowledge and experience in it. Although a chatbot-style system using machine learning and natural language processing has been used in some clinical settings and proven useful, the system developed specifically for the Japanese language using this method has not been validated by research. The purpose of this study is to develop a novel drug information provision system for Kampo medicines using a natural language classifier® (NLC®) based on IBM Watson. METHODS: The target Kampo formulas were 33 formulas listed in the 17th revision of the Japanese Pharmacopoeia. The information included in the system comes from the package inserts of Kampo medicines, Manuals for Management of Individual Serious Adverse Drug Reactions, and data on off-label usage. The system developed in this study classifies questions about the drug information of Kampo formulas input by natural language into preset questions and outputs preset answers for the questions. The system uses morphological analysis, synonym conversion by thesaurus, and NLC®. We fine-tuned the information registered into NLC® and increased the thesaurus. To validate the system, 900 validation questions were provided by six pharmacists who were classified into high or low levels of knowledge and experience of Kampo medicines and three pharmacy students. RESULTS: The precision, recall, and F-measure of the system performance were 0.986, 0.915, and 0.949, respectively. The results were stable even with differences in the amount of expertise of the question authors. CONCLUSIONS: We developed a system using natural language classification that can give appropriate answers to most of the validation questions.
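
The three figures reported in this abstract can be cross-checked with a short calculation. The sketch below assumes the reported F-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state the weighting, so the `beta` parameter is included only to make that assumption explicit.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1.0 gives the harmonic mean of precision and recall."""
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0:
        return 0.0  # both precision and recall are zero
    return (1 + beta_sq) * precision * recall / denominator


# Values from the abstract: precision 0.986, recall 0.915.
print(round(f_measure(0.986, 0.915), 3))  # prints 0.949
```

Under that assumption the result matches the reported F-measure of 0.949.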


Subjects
Medicine, Kampo , Physicians , Humans , Natural Language Processing , Pharmacists , Technology , Japan
10.
Sensors (Basel) ; 23(11)2023 Jun 02.
Article in English | MEDLINE | ID: mdl-37300022

ABSTRACT

Fault diagnosis is crucial for repairing aircraft and ensuring their proper functioning. However, with the higher complexity of aircraft, some traditional diagnosis methods that rely on experience are becoming less effective. Therefore, this paper explores the construction and application of an aircraft fault knowledge graph to improve the efficiency of fault diagnosis for maintenance engineers. Firstly, this paper analyzes the knowledge elements required for aircraft fault diagnosis, and defines a schema layer of a fault knowledge graph. Secondly, with deep learning as the main method and heuristic rules as the auxiliary method, fault knowledge is extracted from structured and unstructured fault data, and a fault knowledge graph for a certain type of craft is constructed. Finally, a fault question-answering system based on a fault knowledge graph was developed, which can accurately answer questions from maintenance engineers. The practical implementation of our proposed methodology highlights how knowledge graphs provide an effective means of managing aircraft fault knowledge, ultimately assisting engineers in identifying fault roots accurately and quickly.


Subjects
Aircraft , Pattern Recognition, Automated , Engineering , Heuristics , Knowledge
11.
Entropy (Basel) ; 25(4)2023 Apr 10.
Article in English | MEDLINE | ID: mdl-37190427

ABSTRACT

Along with the explosion of ChatGPT, the artificial intelligence question-answering system has been pushed to a climax. Intelligent question-answering enables computers to simulate people's behavior habits of understanding a corpus through machine learning, so as to answer questions in professional fields. How to obtain more accurate answers to personalized questions in professional fields is the core content of intelligent question-answering research. As one of the key technologies of intelligent question-answering, the accuracy of text matching is related to the development of the intelligent question-answering community. Aiming to solve the problem of polysemy of text, the Enhanced Representation through Knowledge Integration (ERNIE) model is used to obtain the word vector representation of text, which makes up for the lack of prior knowledge in the traditional word vector representation model. Additionally, there are also problems of homophones and polyphones in Chinese, so this paper introduces the phonetic character sequence of the text to distinguish them. In addition, aiming at the problem that there are many proper nouns in the insurance field that are difficult to identify, after conventional part-of-speech tagging, proper nouns are distinguished by especially defining their parts of speech. After the above three types of text-based semantic feature extensions, this paper also uses the Bi-directional Long Short-Term Memory (BiLSTM) and TextCNN models to extract the global features and local features of the text, respectively. It can obtain the feature representation of the text more comprehensively. Thus, the text matching model integrating BiLSTM and TextCNN fusing Multi-Feature (namely MFBT) is proposed for the insurance question-answering community. The MFBT model aims to solve the problems that affect the answer selection in the insurance question-answering community, such as proper nouns, nonstandard sentences and sparse features. 
Taking the question-and-answer data of the insurance library as the sample, the MFBT text-matching model is compared and evaluated with other models. The experimental results show that the MFBT text-matching model has higher evaluation index values, including accuracy, recall and F1, than other models. The model trained by historical search data can better help users in the insurance question-and-answer community obtain the answers they need and improve their satisfaction.

12.
Med J Islam Repub Iran ; 37: 134, 2023.
Article in English | MEDLINE | ID: mdl-38318401

ABSTRACT

Background: The study of various aspects of information behavior has attracted the attention of many researchers. This study used the structural equation modeling method to identify factors affecting respondents' strategies in answering health-related questions on social question answering (SQA) websites. Methods: The study population in this quantitative-applied survey included all respondents answering health-related questions on national and international SQA websites, among whom 431 individuals were selected as the sample using SPSS SAMPLE POWER software and convenience sampling. The data were collected using the Respondents' Motivations and Strategies Questionnaire and the Social Support Questionnaire. The items of these questionnaires are scored on a 5-point Likert scale. The conceptual research model was evaluated using the structural equation modeling method, and the collected data were analyzed in SPSS 26.0 and AMOS 24.0. Results: The authors identified and analyzed the factors influencing respondents' strategies and the relationships between these factors. Motivations, social support, sex, age, income, level of education, amount of activity per week, and response time are effective on response strategies with factor loadings of 0.61, 0.56, 0.50, 0.53, 0.31, 0.66, 0.53, and 0.65, respectively. The variable determination coefficient of response strategies in the structural equation model is reported to be 0.55 and significant. Finally, response strategies can be predicted based on the independent variables. Conclusion: In order to enhance response strategies, it is important to promote effective response behaviors, as determined by the components that influence response strategies. The quality of related online services, such as expert question-answering and digital reference services, can be improved with the help of the present findings.

13.
BMC Bioinformatics ; 23(1): 136, 2022 Apr 15.
Article in English | MEDLINE | ID: mdl-35428175

ABSTRACT

BACKGROUND: Medical information has rapidly increased on the internet and has become one of the main targets of search engine use. However, medical information on the internet is subject to the problems of quality and accessibility, so ordinary users are unable to obtain answers to their medical questions conveniently. As a solution, researchers build medical question answering (QA) systems. However, research on medical QA in the Chinese language lags behind work on English-based systems. This lag is mainly due to the difficulty of constructing a high-quality knowledge base and the underutilization of medical corpora in the Chinese language. RESULTS: This study developed an end-to-end solution to implement a medical QA system for the Chinese language with low cost and time. First, we created a high-quality medical knowledge graph from hospital data (electronic health/medical records) in a nearly automatic manner that trained a supervised model based on data labeled using bootstrapping techniques. Then, we designed a QA system based on a memory-based neural network and attention mechanism. Finally, we trained the system to generate answers from the knowledge base and a QA corpus on the internet. CONCLUSIONS: Bootstrapping and deep neural network techniques can construct a knowledge graph from electronic health/medical records with satisfactory precision and coverage. Our proposed context bridge mechanisms perform training with a variety of language features. Our QA system can achieve state-of-the-art quality in answering medical questions with constrained topics. As we evaluated, complex Chinese language processing techniques, such as segmentation and parsing, were not necessary for practice and complex architectures were not necessary to build the QA system. Lastly, we created an application using our method for internet QA usage.


Subjects
Language , Neural Networks, Computer , China , Electronic Health Records , Natural Language Processing
14.
BMC Bioinformatics ; 23(1): 210, 2022 Jun 02.
Article in English | MEDLINE | ID: mdl-35655148

ABSTRACT

BACKGROUND: Due to the growing amount of COVID-19 research literature, medical experts, clinical scientists, and researchers frequently struggle to stay up to date on the most recent findings. There is a pressing need to assist researchers and practitioners in mining and responding to COVID-19-related questions on time. METHODS: This paper introduces CoQUAD, a question-answering system that can extract answers related to COVID-19 questions in an efficient manner. There are two datasets provided in this work: a reference-standard dataset built using the CORD-19 and LitCOVID initiatives, and a gold-standard dataset prepared by experts from the public health domain. CoQUAD has a Retriever component based on the BM25 algorithm that searches the reference-standard dataset for relevant documents based on a question related to COVID-19. CoQUAD also has a Reader component that consists of a Transformer-based model, namely MPNet, which is used to read the paragraphs and find the answers related to a question from the retrieved documents. In comparison to previous works, the proposed CoQUAD system can answer questions related to early, mid, and post-COVID-19 topics. RESULTS: Extensive experiments on CoQUAD Retriever and Reader modules show that CoQUAD can provide effective and relevant answers to any COVID-19-related questions posed in natural language, with a higher level of accuracy. When compared to state-of-the-art baselines, CoQUAD outperforms the previous models, achieving an exact match ratio score of 77.50% and an F1 score of 77.10%. CONCLUSION: CoQUAD is a question-answering system that mines COVID-19 literature using natural language processing techniques to help the research community find the most recent findings and answer any related questions.
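
The BM25 ranking that the Retriever component relies on can be sketched in a few lines. This is a generic illustration of Okapi BM25, not CoQUAD's code: the parameter values `k1=1.5` and `b=0.75`, the non-negative IDF variant, and the toy documents are all assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenised document against the query with Okapi BM25.

    Assumptions: k1 and b are common default-style values, and the IDF
    uses the "+1 inside the log" variant so it never goes negative.
    """
    n_docs = len(docs_tokens)
    if n_docs == 0:
        return []
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for doc in docs_tokens:
        doc_freq.update(set(doc))

    scores = []
    for doc in docs_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


# Hypothetical toy corpus, already tokenised and lower-cased.
docs = [
    ["covid", "vaccine", "efficacy"],
    ["bridge", "maintenance"],
    ["covid", "symptoms", "covid"],
]
scores = bm25_scores(["covid", "vaccine"], docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # document indices, best match first
```

In a retriever-reader design of the kind the abstract describes, only the top-ranked documents would be passed on to the reader model for answer extraction.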


Subjects
Benchmarking , COVID-19 , Algorithms , Humans , Language , Natural Language Processing
15.
J Biomed Inform ; 128: 104040, 2022 04.
Article in English | MEDLINE | ID: mdl-35259544

ABSTRACT

Searching for health information online is becoming customary for more and more consumers every day, which makes the need for efficient and reliable question answering systems more pressing. An important contributor to the success rates of these systems is their ability to fully understand the consumers' questions. However, these questions are frequently longer than needed and mention peripheral information that is not useful in finding relevant answers. Question summarization is one of the potential solutions to simplifying long and complex consumer questions before attempting to find an answer. In this paper, we study the task of abstractive summarization for real-world consumer health questions. We develop an abstractive question summarization model that leverages the semantic interpretation of a question via recognition of medical entities, which enables generation of informative summaries. Towards this, we propose multiple Cloze tasks (i.e. the task of filling in missing words in a given context) to identify the key medical entities, which push the model toward better coverage in question-focus recognition. Additionally, we infuse the decoder inputs with question-type information to generate question-type driven summaries. When evaluated on the MeQSum benchmark corpus, our framework outperformed the state-of-the-art method by 10.2 ROUGE-L points. We also conducted a manual evaluation to assess the correctness of the generated summaries.
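
To make the idea of a Cloze task concrete, the sketch below builds masked training pairs from a question and a list of medical entities. It is only an illustration of the general technique: the function name, the `[MASK]` token, the example question, and the assumption that entities are already recognised are all hypothetical, and the Cloze tasks in the paper may be defined differently.

```python
def make_cloze_examples(question, entities, mask_token="[MASK]"):
    """Create (masked_question, answer) pairs, one per entity found.

    Assumption: entities come from an upstream medical entity recogniser;
    only the first occurrence of each entity is masked.
    """
    examples = []
    for entity in entities:
        index = question.find(entity)
        if index == -1:
            continue  # entity not present verbatim in the question
        masked = question[:index] + mask_token + question[index + len(entity):]
        examples.append((masked, entity))
    return examples


# Hypothetical consumer health question with two recognised entities.
pairs = make_cloze_examples(
    "Can metformin cause lactic acidosis?",
    ["metformin", "lactic acidosis"],
)
for masked_question, answer in pairs:
    print(masked_question, "->", answer)
```

A model trained to recover the masked entity is pushed to attend to the medical focus of the question, which is the effect the abstract attributes to its Cloze objectives.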


Subjects
Semantics
16.
J Biomed Inform ; 134: 104183, 2022 10.
Article in English | MEDLINE | ID: mdl-36038063

ABSTRACT

Medical Visual Question Answering (VQA) aims to answer questions related to given medical images and has tremendous potential in healthcare services. However, research on medical VQA still faces challenges, particularly in how to learn a fine-grained multimodal semantic representation from a relatively small volume of data resources for answer prediction. Moreover, the long-tailed label distribution of medical VQA data frequently results in poor model performance. To this end, we propose a novel bi-level representation learning model with two reasoning modules to learn bi-level representations for the medical VQA task. One is sentence-level reasoning to learn sentence-level semantic representations from multimodal input. The other is token-level reasoning that employs an attention mechanism to generate a multimodal contextual vector by fusing image features and word embeddings. The contextual vector is used to filter irrelevant semantic representations from sentence-level reasoning to generate a fine-grained multimodal representation. Furthermore, a label-distribution-smooth margin loss is proposed to minimize the generalization error bound on long-tailed distribution datasets by modifying the margin bound of different labels in the training set. On the standard VQA-Rad and PathVQA datasets, the proposed model achieves accuracies of 0.7605 and 0.5434 and F1-scores of 0.7741 and 0.5288, respectively, outperforming a set of state-of-the-art baseline models.


Subjects
Machine Learning , Semantics , Delivery of Health Care , Language , Learning
17.
BMC Med Inform Decis Mak ; 22(Suppl 1): 153, 2022 07 07.
Article in English | MEDLINE | ID: mdl-35799177

ABSTRACT

BACKGROUND: Dietary supplements (DS) have been widely used by consumers, but the information around the efficacy and safety of DS is disparate or incomplete, thus creating barriers for consumers to find information effectively. Conversational agent (CA) systems have been applied to the healthcare domain, but despite the widespread use of DS there is no such system to answer consumers' questions about DS use. In this study, we develop the first CA system for DS use. METHODS: Our CA system for DS use, developed on the MindMeld framework, consists of three components: question understanding, DS knowledge base, and answer generation. We collected and annotated 1509 questions to develop a natural language understanding module (e.g., question type classifier, named entity recognizer), which was then integrated into the MindMeld framework. The CA then queries the DS knowledge base (i.e., iDISK) and generates answers using rule-based slot filling techniques. We evaluated the algorithms of each component and the CA system as a whole. RESULTS: CNN is the best question classifier with an F1 score of 0.81, and CRF is the best named entity recognizer with an F1 score of 0.87. The system achieves an overall accuracy of 81% and an average score of 1.82, with a succ@3+ score of 76.2% and a succ@2+ score of approximately 66%. CONCLUSION: This study develops the first CA system for DS use using the MindMeld framework and iDISK domain knowledge base.


Subjects
Algorithms, Natural Language Processing, Dietary Supplements, Humans, Language
18.
Sensors (Basel) ; 22(3), 2022 Jan 28.
Article in English | MEDLINE | ID: mdl-35161790

ABSTRACT

Visual question answering (VQA) is a multimodal task: given an image and a question related to it, a model must determine the correct answer. The attention mechanism has become a de facto component of almost all VQA models. Most recent VQA approaches use the dot product to calculate intra-modality and inter-modality attention between visual and language features. In this paper, the bilinear attention network (BAN) method is used to calculate attention instead. We propose a deep multimodality bilinear attention network (DMBA-NET) framework with two basic attention units (BAN-GA and BAN-SA) to construct inter-modality and intra-modality relations. These two units are the core of the network framework and can be cascaded in depth. In addition, we encode the question using the dynamic word vectors of BERT (Bidirectional Encoder Representations from Transformers) and then use self-attention to process the question features further. We then sum them with the features obtained by BAN-GA and BAN-SA before the final classification. Without using the Visual Genome dataset for augmentation, our model reaches an accuracy of 70.85% on the test-std split of VQA 2.0.
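The difference between dot-product and bilinear attention scores can be sketched in a few lines. This is a simplified illustration, not the DMBA-NET or BAN implementation: it uses a full weight matrix W where BAN uses low-rank bilinear pooling, and the function names, vectors, and weights are assumptions chosen for the example.

```python
import math


def softmax(scores):
    """Normalize scores into attention weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_score(q, v):
    """Dot-product score; q and v must have the same dimension."""
    return sum(a * b for a, b in zip(q, v))


def bilinear_score(q, W, v):
    """Bilinear score q^T W v; W has shape len(q) x len(v), so dimensions may differ."""
    return sum(q[i] * W[i][j] * v[j] for i in range(len(q)) for j in range(len(v)))


def attend(scores, values):
    """Weighted sum of value vectors using softmax-normalized scores."""
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * val[d] for w, val in zip(weights, values)) for d in range(dim)]


# Hypothetical question vector (2-d) and image-region features (3-d).
question = [1.0, 2.0]
regions = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
W = [[0.2, 0.1, 0.0], [0.0, 0.3, 0.4]]  # would be learned in a real model

scores = [bilinear_score(question, W, r) for r in regions]
print(softmax(scores))
print(attend(scores, regions))
```

With W set to the identity matrix, the bilinear score reduces to the dot product, which shows that the bilinear form is the more general of the two.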


Subjects
Language, Natural Language Processing, Research Design
19.
Entropy (Basel) ; 24(12), 2022 Dec 10.
Article in English | MEDLINE | ID: mdl-36554210

ABSTRACT

In bridge management, large amounts of domain information accumulate, such as basic attributes, structural defects, and technical conditions. However, this valuable information is not fully utilized, resulting in insufficient knowledge services in the field of bridge management. To tackle these problems, this paper proposes a complex knowledge base question answering (C-KBQA) framework for intelligent bridge management based on multi-task learning (MTL) and cross-task constraints (CTC). First, with C-KBQA as the main task and part-of-speech (POS) tagging, topic entity extraction (TEE), and question classification (QC) as auxiliary tasks, an MTL framework is built by sharing encoders and parameters, thereby effectively avoiding the error propagation problem of the pipeline model. Second, cross-task semantic constraints are provided for different subtasks via POS embeddings, entity embeddings, and question-type embeddings. Finally, using template matching, relevant query statements are generated and interaction with the knowledge base is established. The experimental results show that the proposed model outperforms the mainstream models it is compared with in terms of TEE and QC on bridge management datasets, and that its performance in C-KBQA is outstanding.
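The final template-matching step, which turns a predicted question type and topic entity into a query statement, might look roughly like this. The abstract does not name the query language or the schema, so the Cypher-style syntax, the Bridge schema, and the names QUERY_TEMPLATES and build_query are assumptions made for this sketch only.

```python
# Hypothetical query templates keyed by question type. The Cypher-style syntax
# and the Bridge schema are assumptions, not details from the paper.
QUERY_TEMPLATES = {
    "attribute": "MATCH (b:Bridge {name: $entity}) RETURN b.span_length",
    "defect": "MATCH (b:Bridge {name: $entity})-[:HAS_DEFECT]->(d) RETURN d.type",
    "defect_count": "MATCH (b:Bridge {name: $entity})-[:HAS_DEFECT]->(d) RETURN count(d)",
}


def build_query(question_type, topic_entity):
    """Select the template for the question type and bind the topic entity as a parameter."""
    template = QUERY_TEMPLATES.get(question_type)
    if template is None:
        raise ValueError(f"no template for question type: {question_type}")
    return template, {"entity": topic_entity}


# The question type and topic entity would come from the QC and TEE subtasks.
query, params = build_query("defect_count", "Example Bridge")
print(query)
print(params)
```

Passing the entity as a parameter rather than pasting it into the query string keeps the templates fixed and avoids malformed queries when entity names contain quotes.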

20.
Distrib Parallel Databases ; 40(2-3): 409-440, 2022.
Article in English | MEDLINE | ID: mdl-36097541

ABSTRACT

Natural language processing over structured data has become a growing research field, both within the relational database and the Semantic Web communities, with significant efforts devoted to question answering over knowledge graphs (KGQA). However, many of these approaches are either specifically targeted at open-domain question answering using DBpedia, or require large training datasets to translate a natural language question to SPARQL in order to query the knowledge graph. Hence, these approaches often cannot be applied directly to complex scientific datasets where no prior training data is available. In this paper, we focus on the challenges of natural language processing over knowledge graphs of scientific datasets. In particular, we introduce Bio-SODA, a natural language processing engine that does not require training data in the form of question-answer pairs for generating SPARQL queries. Bio-SODA uses a generic graph-based approach for translating user questions to a ranked list of SPARQL candidate queries. Furthermore, Bio-SODA uses a novel ranking algorithm that includes node centrality as a measure of relevance for selecting the best SPARQL candidate query. Our experiments with real-world datasets across several scientific domains, including the official bioinformatics Question Answering over Linked Data (QALD) challenge, as well as the CORDIS dataset of European projects, show that Bio-SODA outperforms publicly available KGQA systems by an F1-score of at least 20% and by an even higher factor on more complex bioinformatics datasets. Finally, we introduce Bio-SODA UX, a graphical user interface designed to assist users in the exploration of large knowledge graphs and in dynamically disambiguating natural language questions that target the data available in these graphs.
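Using node centrality as a relevance signal for ranking candidate queries can be illustrated with a small sketch. This is not Bio-SODA's ranking algorithm: the abstract does not specify the centrality measure or how it is combined with other signals, so degree centrality, the mean-score rule, the toy schema graph, and the candidate names below are all assumptions.

```python
def degree_centrality(edges):
    """Degree centrality of each node in an undirected graph: degree / (n - 1)."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


def rank_candidates(candidates, centrality):
    """Sort (query_name, matched_nodes) pairs by the mean centrality of their nodes."""
    def score(nodes):
        return sum(centrality.get(node, 0.0) for node in nodes) / len(nodes)
    return sorted(candidates, key=lambda c: score(c[1]), reverse=True)


# Hypothetical schema graph and candidate queries (names are placeholders).
edges = [("Gene", "Protein"), ("Gene", "Species"), ("Gene", "Disease"), ("Protein", "Anatomy")]
centrality = degree_centrality(edges)

candidates = [
    ("candidate_anatomy", ["Anatomy", "Protein"]),
    ("candidate_gene", ["Gene", "Species"]),
]
for name, nodes in rank_candidates(candidates, centrality):
    print(name, nodes)
```

Candidates whose matched nodes sit at well-connected parts of the graph rank first, which reflects the intuition that central nodes are more likely to be what an ambiguous keyword refers to.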
