Results 1 - 11 of 11
1.
BMC Bioinformatics; 20(1): 511, 2019 Oct 22.
Article in English | MEDLINE | ID: mdl-31640539

ABSTRACT

BACKGROUND: One of the challenges in large-scale information retrieval (IR) is developing fine-grained and domain-specific methods to answer natural language questions. Despite the availability of numerous sources and datasets for answer retrieval, Question Answering (QA) remains a challenging problem due to the difficulty of the question understanding and answer extraction tasks. One of the promising tracks investigated in QA is mapping new questions to formerly answered questions that are "similar". RESULTS: We propose a novel QA approach based on Recognizing Question Entailment (RQE), and we describe the QA system and resources that we built and evaluated on real medical questions. First, we compare logistic regression and deep learning methods for RQE using different kinds of datasets, including textual inference, question similarity, and entailment in both the open and clinical domains. Second, we combine IR models with the best RQE method to select entailed questions and rank the retrieved answers. To study the end-to-end QA approach, we built the MedQuAD collection of 47,457 question-answer pairs from trusted medical sources, which we introduce and share in the scope of this paper. Following the evaluation process used in TREC 2017 LiveQA, we find that our approach exceeds the best official score of the medical task by 29.8%. CONCLUSIONS: The evaluation results support the relevance of question entailment for QA and highlight the effectiveness of combining IR and RQE for future QA efforts. Our findings also show that relying on a restricted set of reliable answer sources can bring a substantial improvement in medical QA.
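As an illustration of the IR-plus-RQE reranking idea described in this abstract, here is a minimal sketch in which TF-IDF cosine similarity stands in for both the IR retrieval model and the trained RQE classifier; the question-answer pairs, function names, and blending weight are invented for the example, not taken from the paper.

```python
# Minimal sketch of reranking answered questions with a blend of IR and
# RQE-style scores. TF-IDF cosine similarity is a stand-in for both the IR
# model and the trained RQE classifier; all names and data are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

qa_pairs = [  # stand-in for MedQuAD-style question-answer pairs
    ("What are the symptoms of glaucoma?", "Symptoms include gradual vision loss ..."),
    ("How is glaucoma treated?", "Treatments include eye drops, laser therapy ..."),
]

def rank_answers(new_question, qa_pairs, alpha=0.5):
    """Blend an IR retrieval score with an RQE-like entailment score."""
    questions = [q for q, _ in qa_pairs]
    vec = TfidfVectorizer().fit(questions + [new_question])
    ir_scores = cosine_similarity(vec.transform([new_question]),
                                  vec.transform(questions))[0]
    rqe_scores = ir_scores  # placeholder: a real system would call an RQE model here
    blended = alpha * ir_scores + (1 - alpha) * rqe_scores
    return sorted(zip(blended, qa_pairs), reverse=True)

for score, (q, a) in rank_answers("What treatment options exist for glaucoma?", qa_pairs):
    print(f"{score:.3f}  {q}")
```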


Subject(s)
Deep Learning, Information Storage and Retrieval/methods, Medical Informatics, Logistic Models, Unified Medical Language System
2.
BMC Bioinformatics; 19(1): 34, 2018 Feb 06.
Article in English | MEDLINE | ID: mdl-29409442

ABSTRACT

BACKGROUND: Consumers increasingly use online resources for their health information needs. While current search engines can address these needs to some extent, they generally do not take into account that most health information needs are complex and can only be fully expressed in natural language. Consumer health question answering (QA) systems aim to fill this gap. A major challenge in developing consumer health QA systems is extracting relevant semantic content from the natural language questions (question understanding). To develop effective question understanding tools, question corpora semantically annotated for relevant question elements are needed. In this paper, we present a two-part consumer health question corpus annotated with several semantic categories: named entities, question triggers/types, question frames, and question topic. The first part (CHQA-email) consists of relatively long email requests received by the U.S. National Library of Medicine (NLM) customer service, while the second part (CHQA-web) consists of shorter questions posed to the MedlinePlus search engine as queries. Each question has been annotated by two annotators. The annotation methodology is largely the same between the two parts of the corpus; however, we also explain and justify the differences between them. Additionally, we provide information about corpus characteristics, inter-annotator agreement, and our attempts to measure annotation confidence in the absence of adjudication of annotations. RESULTS: The resulting corpus consists of 2614 questions (CHQA-email: 1740, CHQA-web: 874). Problems are the most frequent named entities, while treatment and general information questions are the most common question types. Inter-annotator agreement was generally modest: question types and topics yielded the highest agreement, while agreement for the more complex frame annotations was lower. Agreement in CHQA-web was consistently higher than in CHQA-email. Pairwise inter-annotator agreement proved most useful in estimating annotation confidence. CONCLUSIONS: To our knowledge, our corpus is the first focusing on the annotation of uncurated consumer health questions. It is currently used to develop machine learning-based methods for question understanding. We make the corpus publicly available to stimulate further research on consumer health QA.
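For readers unfamiliar with pairwise inter-annotator agreement, the toy computation below shows the kind of statistic involved; Cohen's kappa is used here as a representative chance-corrected measure, and the labels are invented rather than drawn from the corpus.

```python
# Toy pairwise inter-annotator agreement on question-type labels.
# Cohen's kappa is shown as a representative chance-corrected measure;
# the labels below are invented, not taken from CHQA.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["treatment", "information", "cause", "treatment", "information"]
annotator_b = ["treatment", "information", "treatment", "treatment", "information"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.3f}")  # 1.0 = perfect agreement, 0 = chance level
```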


Subject(s)
Health Status, Surveys and Questionnaires, Electronic Mail, Humans, Semantics, Web Browser
3.
J Biomed Inform; 58: 122-132, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26432353

ABSTRACT

Pharmacovigilance (PV) is defined by the World Health Organization as the science and activities related to the detection, assessment, understanding, and prevention of adverse effects or any other drug-related problem. An essential aspect of PV is acquiring knowledge about Drug-Drug Interactions (DDIs). The shared tasks on DDI-Extraction organized in 2011 and 2013 highlighted the importance of this issue and provided benchmarks for Drug Name Recognition, DDI extraction, and DDI classification. In this paper, we present our text mining systems for these tasks and evaluate their results on the DDI-Extraction benchmarks. Our systems rely on machine learning techniques using both feature-based and kernel-based methods. The results obtained for drug name recognition are encouraging. For DDI extraction, our hybrid system combining a feature-based method and a kernel-based method was ranked second in the DDI-Extraction-2011 challenge, and our two-step system for DDI detection and classification was ranked first in the DDI-Extraction-2013 task at SemEval. We discuss our methods and results and give pointers to future work.
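The two-step detection-then-classification design mentioned above can be sketched at toy scale as follows; the sentences, labels, and bag-of-words features are invented, and a plain logistic regression replaces the feature- and kernel-based systems actually used.

```python
# Toy two-step DDI pipeline: step 1 detects whether a sentence describes an
# interaction; step 2 classifies the interaction type for positives only.
# Data, features, and models are simplified stand-ins for the real systems.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

sentences = ["DrugA increases the plasma concentration of DrugB.",
             "DrugC potentiates the sedative effect of DrugD.",
             "DrugE and DrugF were each well tolerated."]
is_ddi = [1, 1, 0]                     # step 1 labels: interaction present?
ddi_type = ["mechanism", "effect"]     # step 2 labels, for the two positives

vec = CountVectorizer().fit(sentences)
detector = LogisticRegression().fit(vec.transform(sentences), is_ddi)
positives = [s for s, y in zip(sentences, is_ddi) if y == 1]
classifier = LogisticRegression().fit(vec.transform(positives), ddi_type)

def extract(sentence):
    x = vec.transform([sentence])
    if detector.predict(x)[0] == 0:
        return "no interaction"
    return classifier.predict(x)[0]

print(extract("DrugA increases the plasma concentration of DrugB."))
```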


Subject(s)
Data Mining, Drug Interactions, Machine Learning, Pharmacovigilance
4.
J Am Med Inform Assoc; 30(8): 1448-1455, 2023 Jul 19.
Article in English | MEDLINE | ID: mdl-37100768

ABSTRACT

OBJECTIVE: Social determinants of health (SDOH) are nonmedical factors that can influence health outcomes. This paper seeks to extract SDOH from clinical texts in the context of the National NLP Clinical Challenges (n2c2) 2022 Track 2 Task. MATERIALS AND METHODS: Annotated and unannotated data from the Medical Information Mart for Intensive Care III (MIMIC-III) corpus, the Social History Annotation Corpus, and an in-house corpus were used to develop 2 deep learning models that used classification and sequence-to-sequence (seq2seq) approaches. RESULTS: The seq2seq approach had the highest overall F1 scores in the challenge's 3 subtasks: 0.901 on the extraction subtask, 0.774 on the generalizability subtask, and 0.889 on the learning transfer subtask. DISCUSSION: Both approaches rely on SDOH event representations that were designed to be compatible with transformer-based pretrained models, with the seq2seq representation supporting an arbitrary number of overlapping and sentence-spanning events. Models with adequate performance could be produced quickly, and the remaining mismatch between representation and task requirements was then addressed in postprocessing. The classification approach used rules to generate entity relationships from its sequence of token labels, while the seq2seq approach used constrained decoding and a constraint solver to recover entity text spans from its sequence of potentially ambiguous tokens. CONCLUSION: We proposed 2 different approaches to extract SDOH from clinical texts with high accuracy. However, accuracy suffers on text from new healthcare institutions not present in the training data, and thus generalization remains an important topic for future study.
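As a concrete picture of the postprocessing step in the classification approach, the sketch below converts a sequence of BIO token labels into entity spans; the label set and sentence are invented, and the real systems also handle overlapping and sentence-spanning events that this toy decoder does not.

```python
# Toy conversion of BIO token labels into entity spans, illustrating the kind
# of rule-based postprocessing described above. Labels and text are invented.
def bio_to_spans(tokens, labels):
    """Collapse BIO labels into (entity_type, text) spans."""
    spans, start, etype = [], None, None
    for i, lab in enumerate(labels + ["O"]):   # trailing "O" flushes the last span
        if (lab == "O" or lab.startswith("B-")) and start is not None:
            spans.append((etype, " ".join(tokens[start:i])))
            start, etype = None, None
        if lab.startswith("B-") or (lab.startswith("I-") and start is None):
            start, etype = i, lab[2:]          # also tolerates stray I- tags
    return spans

tokens = ["Patient", "smokes", "one", "pack", "daily"]
labels = ["O", "B-Tobacco", "B-Amount", "I-Amount", "B-Frequency"]
print(bio_to_spans(tokens, labels))
# [('Tobacco', 'smokes'), ('Amount', 'one pack'), ('Frequency', 'daily')]
```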


Subject(s)
Language, Social Determinants of Health, Natural Language Processing
5.
Sci Data; 10(1): 586, 2023 Sep 06.
Article in English | MEDLINE | ID: mdl-37673893

ABSTRACT

Recent immense breakthroughs in generative models such as GPT-4 have precipitated a re-imagining of the ubiquitous use of these models across applications. One area that can benefit from improvements in artificial intelligence (AI) is healthcare. Note generation from doctor-patient encounters, and the associated electronic medical record documentation, is one of the most arduous and time-consuming tasks for physicians, and a prime potential beneficiary of advances in generative models. However, with such advances, benchmarking is more critical than ever. Whether for studying model weaknesses or developing new evaluation metrics, shared open datasets are an imperative part of understanding the current state of the art. Unfortunately, because clinic encounter conversations are not routinely recorded and are difficult to share ethically due to patient confidentiality, there are no sufficiently large clinic dialogue-note datasets to benchmark this task. Here we present the Ambient Clinical Intelligence Benchmark (ACI-BENCH) corpus, the largest dataset to date tackling the problem of AI-assisted note generation from visit dialogue. We also present the benchmark performances of several common state-of-the-art approaches.
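To make the benchmarking step concrete, here is a minimal evaluation sketch using ROUGE from the third-party rouge-score package (pip install rouge-score); the note snippets are invented and the metric choice is illustrative, not necessarily the paper's exact protocol.

```python
# Minimal sketch of scoring a generated note against a reference note with
# ROUGE, via the `rouge-score` package. Snippets are invented examples.
from rouge_score import rouge_scorer

reference = "HPI: 45-year-old male with 3 days of productive cough and fever."
generated = "HPI: 45 year old man reporting fever and productive cough for three days."

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
for name, score in scorer.score(reference, generated).items():
    print(f"{name}: precision={score.precision:.2f} "
          f"recall={score.recall:.2f} f1={score.fmeasure:.2f}")
```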


Subject(s)
Artificial Intelligence, Benchmarking, Health Facilities, Humans, Electronic Health Records
6.
J Am Med Inform Assoc; 27(2): 194-201, 2020 Feb 01.
Article in English | MEDLINE | ID: mdl-31592532

ABSTRACT

OBJECTIVE: Consumers increasingly turn to the internet in search of health-related information, and they want their questions answered with short and precise passages rather than having to analyze lists of relevant documents returned by search engines and read each document to find an answer. We aim to answer consumer health questions with information from reliable sources. MATERIALS AND METHODS: We combine knowledge-based, traditional machine learning, and deep learning approaches to understand consumers' questions and select the best answers from consumer-oriented sources. We evaluate the end-to-end system and its components on simple questions generated in a pilot development of the MedlinePlus Alexa skill, as well as on the short and long real-life questions submitted to the National Library of Medicine by consumers. RESULTS: Our system achieves 78.7% mean average precision and 87.9% mean reciprocal rank on simple Alexa questions, and 44.5% mean average precision and 51.6% mean reciprocal rank on real-life questions submitted to the National Library of Medicine by consumers. DISCUSSION: The ensemble of deep learning, domain knowledge, and traditional approaches recognizes question type and focus well in the simple questions but leaves room for improvement on the real-life consumers' questions. Information retrieval approaches alone are sufficient for finding answers to simple Alexa questions. Answering real-life questions, however, benefits from a combination of information retrieval and inference approaches. CONCLUSION: A pilot practical implementation of the research needed to help consumers find reliable answers to their health-related questions demonstrates that, for most questions, reliable answers exist and can be found automatically with acceptable accuracy.
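The two reported metrics can be computed as follows; the relevance judgments below are invented, with each inner list marking which ranked answers for one question are relevant.

```python
# How mean reciprocal rank (MRR) and mean average precision (MAP) are computed;
# each inner list marks ranked answers for one question as relevant (1) or not (0).
def mean_reciprocal_rank(runs):
    rr = [next((1.0 / (i + 1) for i, rel in enumerate(run) if rel), 0.0)
          for run in runs]
    return sum(rr) / len(rr)

def mean_average_precision(runs):
    aps = []
    for run in runs:
        hits, precisions = 0, []
        for i, rel in enumerate(run):
            if rel:
                hits += 1
                precisions.append(hits / (i + 1))  # precision at each hit
        aps.append(sum(precisions) / hits if hits else 0.0)
    return sum(aps) / len(aps)

runs = [[1, 0, 1], [0, 1, 0]]  # invented relevance judgments for two questions
print(mean_reciprocal_rank(runs), mean_average_precision(runs))  # 0.75 0.667
```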


Subject(s)
Consumer Health Information, Deep Learning, Information Storage and Retrieval/methods, MedlinePlus, Datasets as Topic, Humans, Machine Learning, Natural Language Processing, Pilot Projects, Search Engine, Unified Medical Language System
7.
Sci Data; 5: 180251, 2018 Nov 20.
Article in English | MEDLINE | ID: mdl-30457565

ABSTRACT

Radiology images are an essential part of clinical decision making and population screening, e.g., for cancer. Automated systems could help clinicians cope with large numbers of images by answering questions about image contents. Visual Question Answering (VQA) in the medical domain, an emerging area of artificial intelligence, explores approaches to this form of clinical decision support. The success of such machine learning tools hinges on the availability and design of collections of medical images augmented with question-answer pairs directed at the content of the image. We introduce VQA-RAD, the first manually constructed dataset in which clinicians asked naturally occurring questions about radiology images and provided reference answers. Manual categorization of the images and questions provides insight into clinically relevant tasks and the natural language used to phrase them. Evaluating with well-known algorithms, we demonstrate the richer quality of this dataset compared with other, automatically constructed ones. We propose VQA-RAD to encourage the community to design VQA tools with the goal of improving patient care.
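To show the shape of the data involved, here is a hypothetical sketch of VQA-style entries and a naive exact-match evaluation; the field names, image identifiers, and dummy predictor are invented, not the dataset's actual schema.

```python
# Hypothetical VQA-style entries (image, question, reference answer) and a
# naive exact-match evaluation; field names and values are invented.
vqa_items = [
    {"image": "chest_xray_001.jpg", "question": "Is there a pneumothorax?", "answer": "no"},
    {"image": "head_ct_002.jpg", "question": "What plane is this image in?", "answer": "axial"},
]

def exact_match_accuracy(items, predict):
    hits = sum(predict(it["image"], it["question"]).strip().lower() == it["answer"]
               for it in items)
    return hits / len(items)

# A dummy predictor that always answers "no": right once, wrong once here.
print(exact_match_accuracy(vqa_items, lambda image, question: "no"))  # 0.5
```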


Subject(s)
Machine Learning, Radiology Information Systems, Algorithms, Data Analysis, Data Mining, Humans, Image Processing, Computer-Assisted/methods, Radiography/methods, Radiology Information Systems/classification, Radiology Information Systems/standards
8.
AMIA Annu Symp Proc; 2016: 310-318, 2016.
Article in English | MEDLINE | ID: mdl-28269825

ABSTRACT

With the increasing heterogeneity and specialization of medical texts, automated question answering is becoming more and more challenging. In this context, answering a given medical question by retrieving similar questions that have already been answered by human experts seems to be a promising solution. In this paper, we propose a new approach for the detection of similar questions based on Recognizing Question Entailment (RQE). In particular, we consider Frequently Asked Questions (FAQs) as a valuable and widespread source of information. Our final goal is to automatically provide an existing answer if an FAQ similar to a consumer health question exists. We evaluate our approach using consumer health questions received by the National Library of Medicine and FAQs collected from NIH websites. Our first results are promising and suggest the feasibility of our approach as a valuable complement to classic question answering approaches.
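The "answer only if a similar FAQ exists" decision can be sketched as below, with TF-IDF cosine similarity standing in for the trained RQE classifier; the FAQ entries and the threshold value are invented.

```python
# Sketch of abstaining unless a sufficiently similar FAQ exists. TF-IDF cosine
# similarity stands in for a trained RQE model; data and threshold are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

faqs = {"What causes high blood pressure?": "Common causes include ...",
        "How can I lower my cholesterol?": "Diet changes and exercise ..."}

def answer(question, threshold=0.3):
    faq_questions = list(faqs)
    vec = TfidfVectorizer().fit(faq_questions + [question])
    sims = cosine_similarity(vec.transform([question]),
                             vec.transform(faq_questions))[0]
    best = sims.argmax()
    return faqs[faq_questions[best]] if sims[best] >= threshold else None

print(answer("What causes high blood pressure in adults?"))  # similar FAQ found
print(answer("Is it safe to fly after surgery?"))            # abstains: None
```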


Subject(s)
Consumer Health Information, Expert Systems, Information Seeking Behavior, Information Storage and Retrieval/methods, Algorithms, Databases as Topic, Humans, Internet, Machine Learning, National Library of Medicine (U.S.), United States
9.
J Biomed Semantics; 7(1): 48, 2016 Aug 08.
Article in English | MEDLINE | ID: mdl-27502477

ABSTRACT

BACKGROUND: The increasing number of open-access ontologies and their key role in several applications, such as decision-support systems, highlight the importance of their validation. Human expertise is crucial for validating ontologies from a domain point of view. However, the growing number of ontologies and their fast evolution over time make manual validation challenging. METHODS: We propose a novel semi-automatic approach based on the generation of natural language (NL) questions to support the validation of ontologies and their evolution. The proposed approach includes the automatic generation, factorization, and ordering of NL questions from medical ontologies. The final validation and correction are performed by submitting these questions to domain experts and automatically analyzing their feedback. We also propose a second approach for the validation of mappings impacted by ontology changes; this method exploits the context of the changes to propose correction alternatives presented as multiple-choice questions. RESULTS: This research provides a question optimization strategy that maximizes the validation of ontology entities with a reduced number of questions. We evaluate our approach on the validation of three medical ontologies. We also evaluate the feasibility and efficiency of our mapping validation approach in the context of ontology evolution. These experiments are performed with different versions of SNOMED-CT and ICD9. CONCLUSIONS: The experimental results suggest the feasibility and adequacy of our approach for supporting the validation of interconnected and evolving ontologies. The results also suggest that taking RDFS and OWL entailment into account helps reduce the number of questions and the validation time. Applying our approach to validate mapping evolution also shows the difficulty of adapting mappings as ontologies evolve over time and highlights the importance of semi-automatic validation.
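The question-generation idea can be pictured with a toy example; the axioms and question template are invented, and the real approach also factorizes and orders the questions, which this sketch omits.

```python
# Toy generation of yes/no validation questions from subsumption axioms.
# Axioms and template are invented; the real approach also factorizes and
# orders questions to minimize expert effort.
subclass_axioms = [("myocardial infarction", "heart disease"),
                   ("asthma", "respiratory tract disease")]

def generate_questions(axioms):
    # An expert answering "no" flags a suspect axiom for correction.
    return [f"Is every case of '{child}' a kind of '{parent}'?"
            for child, parent in axioms]

for question in generate_questions(subclass_axioms):
    print(question)
```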


Subject(s)
Biological Ontologies, Natural Language Processing, Feasibility Studies
10.
J Biomed Semantics; 2 Suppl 5: S4, 2011 Oct 06.
Article in English | MEDLINE | ID: mdl-22166723

ABSTRACT

BACKGROUND: Information extraction is a complex task that is necessary for developing high-precision information retrieval tools. In this paper, we present the MeTAE (Medical Texts Annotation and Exploration) platform. MeTAE allows (i) extracting and annotating medical entities and relationships from medical texts and (ii) semantically exploring the produced RDF annotations. RESULTS: Our annotation approach relies on linguistic patterns and domain knowledge and consists of two steps: (i) recognition of medical entities and (ii) identification of the correct semantic relation between each pair of entities. The first step is achieved by an enhanced use of MetaMap, which improves the precision obtained by MetaMap by 19.59% in our evaluation. The second step relies on linguistic patterns built semi-automatically from a corpus selected according to semantic criteria. We evaluate our system's ability to identify medical entities of 16 types. We also evaluate the extraction of treatment relations between a treatment (e.g., medication) and a problem (e.g., disease), obtaining 75.72% precision and 60.46% recall. CONCLUSIONS: According to our experiments, using an external sentence segmenter and noun phrase chunker may improve the precision of MetaMap-based medical entity recognition. Our pattern-based relation extraction method obtains good precision and recall with respect to related work. A more precise comparison with related approaches remains difficult, however, given the differences in corpora and in the exact nature of the extracted relations. Selecting MEDLINE articles through queries related to known drug-disease pairs enabled us to obtain a more focused corpus of relevant examples of treatment relations than a more general MEDLINE query would.
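A single linguistic pattern of the kind described can be caricatured with a regular expression; the pattern and sentence are invented, whereas the real system builds many such patterns semi-automatically over MetaMap-annotated text.

```python
# A regex-based caricature of pattern-based relation extraction: one invented
# pattern linking a treatment to a problem. The real system uses many patterns
# built semi-automatically from a semantically selected corpus.
import re

PATTERN = re.compile(r"(?P<treatment>\w+) is used to treat (?P<problem>[\w ]+)", re.I)

text = "Metformin is used to treat type 2 diabetes."
match = PATTERN.search(text)
if match:
    print(("treats", match.group("treatment"), match.group("problem").strip()))
# ('treats', 'Metformin', 'type 2 diabetes')
```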

11.
J Am Med Inform Assoc; 18(5): 588-93, 2011.
Article in English | MEDLINE | ID: mdl-21597105

ABSTRACT

OBJECTIVE: This paper describes the approaches the authors developed while participating in the i2b2/VA 2010 challenge to automatically extract medical concepts and annotate assertions on concepts and relations between concepts. DESIGN: The authors' approaches rely on both rule-based and machine-learning methods. Natural language processing is used to extract features from the input texts; these features are then used in the authors' machine-learning approaches. The authors used Conditional Random Fields for concept extraction, and Support Vector Machines for assertion and relation annotation. Depending on the task, the authors tested various combinations of rule-based and machine-learning methods. RESULTS: The authors' assertion annotation system obtained an F-measure of 0.931, ranking fifth out of 21 participants at the i2b2/VA 2010 challenge. The authors' relation annotation system ranked third out of 16 participants with an F-measure of 0.709. The 0.773 F-measure the authors obtained on concept extraction did not reach the top 10. CONCLUSION: On the one hand, the authors confirm that using only machine-learning methods is highly dependent on the annotated training data, and thus yields better results for well-represented classes. On the other hand, using only a rule-based method was not sufficient to deal with new types of data. Finally, hybrid approaches combining machine-learning and rule-based methods yielded higher scores.
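The hybrid rule-plus-classifier design for assertion annotation can be sketched at toy scale as follows; the contexts, labels, features, and the single "denies" rule are invented for illustration.

```python
# Toy hybrid assertion annotator: a hand-written rule can pre-empt a trained
# SVM, mirroring the rule-plus-ML combinations described above. Data invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

contexts = ["patient denies chest pain", "chest pain on exertion",
            "no evidence of pneumonia", "pneumonia confirmed on x-ray"]
assertions = ["absent", "present", "absent", "present"]

vec = CountVectorizer(ngram_range=(1, 2)).fit(contexts)
clf = LinearSVC().fit(vec.transform(contexts), assertions)

def annotate(context):
    if "denies" in context:  # rule fires before the classifier
        return "absent"
    return clf.predict(vec.transform([context]))[0]

print(annotate("patient denies shortness of breath"))  # rule fires: 'absent'
```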


Subject(s)
Data Mining, Decision Support Systems, Clinical, Electronic Health Records, Natural Language Processing, Support Vector Machine, Expert Systems, Humans, Semantics, Unified Medical Language System