Results 1 - 20 of 30
1.
J Behav Med ; 46(1-2): 253-275, 2023 04.
Article in English | MEDLINE | ID: mdl-35635593

ABSTRACT

Our study focused on discovering how vaccine hesitancy is framed in Twitter discourse, allowing us to recognize at scale all tweets that evoke any of the hesitancy framings, as well as the stance of the tweet authors towards the frame. By categorizing the hesitancy framings that propagate misinformation, address issues of trust in vaccines, or highlight moral issues or civil rights, we were able to empirically recognize their ontological commitments. The ontological commitments of vaccine hesitancy framings, coupled with the stance of tweet authors, allowed us to identify hesitancy profiles for the two most controversial yet effective and underutilized vaccines for which there remains substantial reluctance among the public: the Human Papillomavirus and the COVID-19 vaccines. The discovered hesitancy profiles inform public health messaging approaches to effectively reach Twitter users, with promise to shift or bolster vaccine attitudes.


Subjects
COVID-19 , Social Media , Vaccines , Humans , COVID-19 Vaccines , Attitude to Health , Vaccination
2.
J Am Med Inform Assoc ; 30(2): 329-339, 2023 01 18.
Article in English | MEDLINE | ID: mdl-36394232

ABSTRACT

OBJECTIVE: The rapidly growing body of communications during the COVID-19 pandemic posed a challenge to information seekers, who struggled to find answers to their specific and changing information needs. We designed a Question Answering (QA) system capable of answering ad-hoc questions about the COVID-19 disease, its causal virus SARS-CoV-2, and the recommended response to the pandemic. MATERIALS AND METHODS: The QA system incorporates, in addition to relevance models, automatic generation of questions from relevant sentences. We relied on entailment between questions for (1) pinpointing answers and (2) selecting novel answers early in the list of its results. RESULTS: The QA system produced state-of-the-art results when processing questions asked by experts (eg, researchers, scientists, or clinicians) and competitive results when processing questions asked by consumers of health information. Although state-of-the-art models for question generation and question entailment were used, more than half of the answers were missed, due to the limitations of the relevance models employed. DISCUSSION: Although question entailment enabled by automatic question generation is the cornerstone of our QA system's architecture, question entailment did not prove to always be reliable or sufficient in ranking the answers. Question entailment should be enhanced with additional inferential capabilities. CONCLUSION: The QA system presented in this article produced state-of-the-art results processing expert questions and competitive results processing consumer questions. Improvements should be considered by using better relevance models and enhanced inference methods. Moreover, experts and consumers have different answer expectations, which should be accounted for in future QA development.


Subjects
COVID-19 , Information Storage and Retrieval , Humans , Communication , Pandemics , SARS-CoV-2 , Deep Learning
3.
Front Digit Health ; 4: 819228, 2022.
Article in English | MEDLINE | ID: mdl-35966142

ABSTRACT

Social media offers a unique opportunity to widely disseminate HPV vaccine messaging to reach youth and parents, given the information channel has become mainstream with 330 million monthly users in the United States and 4.2 billion users worldwide. Yet, a gap remains on how to adapt evidence-based vaccine interventions for the in vivo competitive social media messaging environment and what strategies to employ to make vaccine messages go viral. Push-pull and RE-AIM dissemination frameworks guided our adaptation of a National Cancer Institute video-based HPV vaccine cancer control program, the HPV Vaccine Decision Narratives, for the social media environment. We also aimed to understand how dissemination might differ across three platforms, namely Instagram, TikTok, and Twitter, to increase reach and engagement. Centering theory and a question-answer framework guided the adaptation process of segmenting vaccine decision story videos into shorter coherent segments for social media. Twelve strategies were implemented over 4 months to build a following and disseminate the intervention. The evaluation showed that all platforms increased following, but Instagram and TikTok outperformed Twitter on impressions, followers, engagement, and reach metrics. Although TikTok increased reach the most (unique accounts that viewed content), Instagram increased followers, engagement, and impressions the most. For Instagram, the top performer, six of 12 strategies contributed to increasing reach, including the use of videos, more than 11 hashtags, COVID-19 hashtags, mentions, and follow-for-follow strategies. This observational social media study identified dissemination strategies that significantly increased the reach of vaccine messages in a real-world competitive social media messaging environment. Engagement presented greater challenges. 
Results inform the planning and adaptation considerations necessary for transforming public health HPV vaccine interventions for social media environments, with unique considerations depending on the platform.

4.
J Biomed Inform ; 124: 103955, 2021 12.
Article in English | MEDLINE | ID: mdl-34800722

ABSTRACT

Enormous hope in the efficacy of vaccines recently became a successful reality in the fight against the COVID-19 pandemic. However, vaccine hesitancy, fueled by exposure to social media misinformation about COVID-19 vaccines, became a major hurdle. Therefore, it is essential to automatically detect where misinformation about COVID-19 vaccines is spread on social media and what kind of misinformation is discussed, so that inoculation interventions can be delivered at the right time and in the right place, in addition to interventions designed to address vaccine hesitancy. This paper addresses the first step in tackling hesitancy against COVID-19 vaccines, namely the automatic detection of known misinformation about the vaccines on Twitter, the social media platform that has the highest volume of conversations about COVID-19 and its vaccines. We present CoVaxLies, a new dataset of tweets judged relevant to several misinformation targets about COVID-19 vaccines, on which a novel method of detecting misinformation was developed. Our method organizes CoVaxLies in a Misinformation Knowledge Graph, casting misinformation detection as a graph link prediction problem. The misinformation detection method detailed in this paper takes advantage of the link scoring functions provided by several knowledge embedding methods. The experimental results demonstrate the superiority of this method when compared with the classification-based methods currently in wide use.
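The abstract casts misinformation detection as link prediction scored by knowledge-embedding functions, without naming which ones. As a hedged illustration only, the sketch below uses a TransE-style score, the negative distance between head + relation and tail, which is one widely known link scoring function. The tweet and target identifiers, the relation name `evokes`, and the toy 2-D vectors are all hypothetical and are not taken from CoVaxLies.

```python
import math

# Toy 2-D embeddings; every name and vector here is hypothetical.
ENTITY = {
    "tweet_1": [0.0, 0.0],
    "target_infertility": [1.0, 0.1],
    "target_microchip": [-0.9, 0.8],
}
RELATION = {"evokes": [1.0, 0.0]}


def transe_score(head, relation, tail):
    """TransE-style plausibility: negative L2 distance between head + relation and tail."""
    h, r, t = ENTITY[head], RELATION[relation], ENTITY[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_targets(tweet, relation, targets):
    """Rank candidate misinformation targets for a tweet by link score, best first."""
    return sorted(targets, key=lambda t: transe_score(tweet, relation, t), reverse=True)


ranking = rank_targets("tweet_1", "evokes", ["target_microchip", "target_infertility"])
print(ranking)  # ['target_infertility', 'target_microchip']
```

In a link-prediction setting, a tweet is linked to whichever target yields the highest score (or to any target above a tuned threshold); other embedding methods simply swap in a different scoring function.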


Subjects
COVID-19 , Social Media , COVID-19 Vaccines , Communication , Humans , Pandemics , SARS-CoV-2 , Vaccine Hesitancy
5.
J Am Med Inform Assoc ; 27(10): 1556-1567, 2020 10 01.
Article in English | MEDLINE | ID: mdl-33029619

ABSTRACT

OBJECTIVE: We explored how knowledge embeddings (KEs) learned from the Unified Medical Language System (UMLS) Metathesaurus impact the quality of relation extraction on 2 diverse sets of biomedical texts. MATERIALS AND METHODS: Two forms of KEs were learned for concepts and relation types from the UMLS Metathesaurus, namely lexicalized knowledge embeddings (LKEs) and unlexicalized KEs. A knowledge embedding encoder (KEE) enabled learning either LKEs or unlexicalized KEs, as well as neural models capable of producing LKEs for mentions of biomedical concepts in texts and for relation types that are not encoded in the UMLS Metathesaurus. This allowed us to design the relation extraction with knowledge embeddings (REKE) system, which incorporates either LKEs or unlexicalized KEs produced for relation types of interest and their arguments. RESULTS: The incorporation of either LKEs or unlexicalized KEs in REKE advances the state of the art in relation extraction on 2 relation extraction datasets: the 2010 i2b2/VA dataset and the 2013 Drug-Drug Interaction Extraction Challenge corpus. Moreover, the impact of LKEs is superior, achieving F1 scores of 78.2 and 82.0, respectively. DISCUSSION: REKE not only highlights the importance of incorporating knowledge encoded in the UMLS Metathesaurus in a novel way, through 2 possible forms of KEs, but it also showcases the subtleties of incorporating KEs in relation extraction systems. CONCLUSIONS: Incorporating LKEs informed by the UMLS Metathesaurus in a relation extraction system operating on biomedical texts shows significant promise. We present the REKE system, which establishes new state-of-the-art results for relation extraction on 2 datasets when using LKEs.


Subjects
Information Storage and Retrieval/methods , Knowledge Bases , Unified Medical Language System , Deep Learning
6.
J Biomed Inform ; 98: 103265, 2019 10.
Article in English | MEDLINE | ID: mdl-31470094

ABSTRACT

The identification of medical concepts, their attributes and the relations between concepts in a large corpus of Electroencephalography (EEG) reports is a crucial step in the development of an EEG-specific patient cohort retrieval system. However, the recognition of multiple types of medical concepts, along with the many attributes characterizing them, is challenging, and so is the recognition of the possible relations between them, especially when desiring to make use of active learning. To address these challenges, in this paper we present the Self-Attention Concept, Attribute and Relation (SACAR) identifier, which relies on a powerful encoding mechanism based on the recently introduced Transformer neural architecture (Dehghani et al., 2018). The SACAR identifier enabled us to consider a recently introduced framework for active learning which uses deep imitation learning for its selection policy. Our experimental results show that SACAR was able to identify medical concepts more precisely and exhibited enhanced recall, compared with previous methods. Moreover, SACAR achieves superior performance in attribute classification for attribute categories of interest, while identifying the relations between concepts with performance competitive with our previous techniques. As a multi-task network, SACAR achieves this performance on the three prediction tasks simultaneously, with a single, complex neural network. The learning curves obtained in the active learning process when using the novel Active Learning Policy Neural Network (ALPNN) show a significant increase in performance as the active learning progresses. These promising results enable the extraction of clinical knowledge available in a large collection of EEG reports.


Subjects
Deep Learning , Electroencephalography , Medical Informatics/methods , Bayes Theorem , Brain/physiopathology , Cohort Studies , Electronic Data Processing , Epilepsy/diagnosis , Humans , Information Storage and Retrieval , Natural Language Processing , Neural Networks, Computer , Problem-Based Learning
7.
AMIA Jt Summits Transl Sci Proc ; 2019: 543-552, 2019.
Article in English | MEDLINE | ID: mdl-31259009

ABSTRACT

Incorporating the knowledge encoded in the Unified Medical Language System (UMLS) in deep learning methods requires learning knowledge embeddings from the knowledge graphs available in UMLS: the Metathesaurus and the Semantic Network. In this paper we present a technique using Generative Adversarial Networks (GANs) for learning UMLS embeddings and showcase their usage in a clinical prediction model. When the UMLS embeddings are available, the predictions improve by up to 6.9% absolute F1 score.

8.
AMIA Annu Symp Proc ; 2019: 627-636, 2019.
Article in English | MEDLINE | ID: mdl-32308857

ABSTRACT

Learning how to automatically align biomedical ontologies has been a long-standing goal, given their ever-growing content and the many applications that rely on them. Because the knowledge graphs underlying biomedical ontologies enable neural learning techniques to acquire knowledge embeddings as representations of these ontologies, neural learning can also consider ontology alignments. In this paper, we present the Knowledge-graph Alignment & Embedding Generative Adversarial Network (KAEGAN) which learns (a) to represent the relational knowledge from two distinct biomedical ontologies in the form of knowledge embeddings and (b) to use them for ontology alignment, by also relying on the ontology semantics. KAEGAN is a Generative Adversarial Network trained using bootstrapping to iteratively improve the learned alignments. Experimental results show promise, demonstrating that jointly learning ontology alignment and knowledge representation improves upon learning either in isolation.


Subjects
Biological Ontologies , Models, Theoretical , Neural Networks, Computer , Vocabulary, Controlled , Machine Learning , Semantics
9.
JAMIA Open ; 1(2): 265-275, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30474078

ABSTRACT

OBJECTIVE: We explored how judgements provided by physicians can be used to learn relevance models that enhance the quality of patient cohorts retrieved from Electronic Health Records (EHRs) collections. METHODS: A very large number of features were extracted from patient cohort descriptions as well as EHR collections. The features were used to investigate retrieving (1) neurology-specific patient cohorts from the de-identified Temple University Hospital electroencephalography (EEG) Corpus as well as (2) the more general cohorts evaluated in the TREC Medical Records Track (TRECMed) from the de-identified hospital records provided by the University of Pittsburgh Medical Center. The features informed a learning relevance model (LRM) that took advantage of relevance judgements provided by physicians. The LRM implements a pairwise learning-to-rank framework, which enables our learning patient cohort retrieval (L-PCR) system to learn from physicians' feedback. RESULTS AND DISCUSSION: We evaluated the L-PCR system against state-of-the-art traditional patient cohort retrieval systems, and observed a 27% improvement when operating on EEGs and a 53% improvement when operating on TRECMed EHRs, showing the promise of the L-PCR system. We also performed extensive feature analyses to reveal the most effective strategies for representing cohort descriptions as queries, encoding EHRs, and measuring cohort relevance. CONCLUSION: The L-PCR system has significant promise for reliably retrieving patient cohorts from EHRs in multiple settings when trained with relevance judgments. When provided with additional cohort descriptions, the L-PCR system will continue to learn, thus offering a potential solution to the performance barriers of current cohort retrieval systems.
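The LRM is described as a pairwise learning-to-rank framework trained on physicians' relevance judgments. The sketch below shows only the general pairwise idea, assuming a linear scoring function trained with a simple perceptron on feature-difference vectors. The two features, the record names, and the judgments are made up; the paper's actual features and learner are not specified here and may differ.

```python
def train_pairwise(pairs, n_features, epochs=20, lr=1.0):
    """Learn weights w so that w.x_better > w.x_worse for every judged pair.

    pairs: list of (x_better, x_worse) feature vectors, e.g. derived from
    physicians' relevance judgments (toy data below).
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            if margin <= 0:  # pair mis-ordered or tied: perceptron update
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def score(w, x):
    """Linear relevance score of one record."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical records with 2 features: [query-term overlap, age-criterion match]
docs = {"A": [0.9, 1.0], "B": [0.5, 0.0], "C": [0.1, 1.0]}
# Hypothetical judgments: A is more relevant than B, and B more relevant than C.
pairs = [(docs["A"], docs["B"]), (docs["B"], docs["C"])]
w = train_pairwise(pairs, n_features=2)
ranking = sorted(docs, key=lambda d: score(w, docs[d]), reverse=True)
print(ranking)  # ['A', 'B', 'C']
```

Learning from pairs rather than absolute labels means the judgments only need to say which of two cohorts is more relevant, which is the property that lets such a system keep learning from new physician feedback.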

10.
Article in English | MEDLINE | ID: mdl-29888040

ABSTRACT

As medical science continues to advance, health care professionals and researchers are increasingly turning to clinical trials to obtain evidence supporting best-practice treatment options. While clinical trial registries such as ClinicalTrials.gov aim to facilitate these needs, it has been shown that many trials in the registry do not contain links to their published results. To address this problem, we present NCT Link, a system for automatically linking registered clinical trials to published MEDLINE articles reporting their results. NCT Link incorporates state-of-the-art deep learning and information retrieval techniques by automatically learning a Deep Highway Network (DHN) that estimates the likelihood that a MEDLINE article reports the results of a clinical trial. Our experimental results indicate that NCT Link obtains 30%-58% improved performance over previously reported automatic systems, suggesting that NCT Link could become a valuable tool for health care providers seeking to deliver best-practice medical care informed by evidence of clinical trials, as well as for (a) researchers investigating selective publication and reporting of clinical trial outcomes, and (b) study designers seeking to avoid unnecessary duplication of research efforts.
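NCT Link is built on a Deep Highway Network. For orientation, the sketch below implements one highway layer in its commonly cited form, y = H(x)·T(x) + x·(1 − T(x)), where T is a sigmoid "transform gate" that decides how much of the input is transformed versus carried through. The tanh choice for H, the identity weights, and the bias values are assumptions for illustration, not NCT Link's architecture or trained parameters.

```python
import math


def matvec(W, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def highway_layer(x, W_h, b_h, W_t, b_t):
    """One highway layer: y = H(x) * T(x) + x * (1 - T(x)), applied elementwise.

    H uses tanh and the transform gate T uses a sigmoid; both choices and all
    weights are illustrative assumptions, not NCT Link's trained parameters.
    """
    h = [math.tanh(v + b) for v, b in zip(matvec(W_h, x), b_h)]
    t = [sigmoid(v + b) for v, b in zip(matvec(W_t, x), b_t)]
    return [hi * ti + xi * (1.0 - ti) for hi, ti, xi in zip(h, t, x)]


x = [0.5, -0.2]
identity = [[1.0, 0.0], [0.0, 1.0]]
# A strongly negative gate bias closes the gate (T ~ 0): the layer carries x through.
carried = highway_layer(x, identity, [0.0, 0.0], identity, [-50.0, -50.0])
# A strongly positive gate bias opens the gate (T ~ 1): the output is ~ tanh(W_h x + b_h).
transformed = highway_layer(x, identity, [0.0, 0.0], identity, [50.0, 50.0])
```

The carry path is what makes deep stacks of such layers easier to train than plain feed-forward stacks, since each layer can default to passing its input through unchanged.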

11.
AMIA Jt Summits Transl Sci Proc ; 2017: 156-165, 2018.
Article in English | MEDLINE | ID: mdl-29888063

ABSTRACT

The automatic identification of relations between medical concepts in a large corpus of Electroencephalography (EEG) reports is an important step in the development of an EEG-specific patient cohort retrieval system as well as in the acquisition of EEG-specific knowledge from this corpus. EEG-specific relations involve medical concepts that are not typically mentioned in the same sentence or even the same section of a report, thus requiring extraction techniques that can handle such long-distance dependencies. To address this challenge, we present a novel framework which combines the advantages of a deep learning framework employing Dynamic Relational Memory (DRM) with active learning. While DRM enables the prediction of long-distance relations, active learning provides a mechanism for accurately identifying relations with minimal training data, obtaining a 5-fold cross-validation F1 score of 0.7475 on a set of 140 EEG reports selected with active learning. The results obtained with our novel framework show great promise.

12.
AMIA Annu Symp Proc ; 2018: 1018-1027, 2018.
Article in English | MEDLINE | ID: mdl-30815145

ABSTRACT

Detecting negation in biomedical texts entails the automatic identification of negation cues (e.g. "never", "not", "no longer") as well as the scope of these cues. When medical concepts or terms are identified within the scope of a negation cue, their polarity is inferred as "negative". All the other concepts or words receive a positive polarity. Correctly inferring the polarity is essential for patient cohort retrieval systems, as all inclusion criteria need to be automatically assigned positive polarity, whereas exclusion criteria should receive negative polarity. Motivated by the recent development of techniques using deep learning, we have experimented with a neural negation detection technique and compared it against an existing neural polarity recognition system, both of which were incorporated in a patient cohort system operating on clinical electroencephalography (EEG) reports. Our experiments indicate that the neural negation detection method produces better patient cohorts than the polarity recognition method.
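The polarity rule stated above (concepts inside a negation cue's scope become negative, everything else positive) can be illustrated with a deliberately simple rule-based stand-in. The cue list and the fixed five-token forward window are assumptions made for this sketch; the paper itself evaluates a neural negation detector, which learns cue scope rather than using a fixed window.

```python
NEGATION_CUES = {"no", "not", "never"}  # assumed single-token cue list
SCOPE_WINDOW = 5  # assumed scope: up to 5 tokens after a cue


def assign_polarity(tokens, concepts):
    """Map each single-token concept found in `tokens` to "negative" or "positive".

    A concept is negative if it falls inside the scope of any negation cue;
    if a concept occurs more than once, its last occurrence decides.
    """
    in_scope = set()
    for i, tok in enumerate(tokens):
        if tok.lower() in NEGATION_CUES:
            in_scope.update(range(i + 1, min(i + 1 + SCOPE_WINDOW, len(tokens))))
    return {
        tok: ("negative" if i in in_scope else "positive")
        for i, tok in enumerate(tokens)
        if tok in concepts
    }


sentence = "Patient reports seizures but no spikes were seen".split()
result = assign_polarity(sentence, {"seizures", "spikes"})
print(result)  # {'seizures': 'positive', 'spikes': 'negative'}
```

A fixed window is the main weakness here: rule-based systems usually also end a cue's scope at words such as "but", and learning scope from data is precisely what a neural detector is meant to improve.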


Subjects
Deep Learning , Electroencephalography , Information Storage and Retrieval , Natural Language Processing , Cohort Studies , Electronic Data Processing , Humans
13.
AMIA Jt Summits Transl Sci Proc ; 2017: 112-121, 2017.
Article in English | MEDLINE | ID: mdl-28815118

ABSTRACT

Secondary use of electronic health records (EHRs) often relies on the ability to automatically identify and extract information from EHRs. Unfortunately, EHRs are known to suffer from a variety of idiosyncrasies; most prevalently, they have been shown to often omit or underspecify information. Adapting traditional machine learning methods for inferring underspecified information relies on manually specifying features characterizing the specific information to recover (e.g. particular findings, test results, or physician's impressions). By contrast, in this paper, we present a method for jointly (1) automatically extracting word- and report-level features and (2) inferring underspecified information from EHRs. Our approach accomplishes these two tasks jointly by combining recent advances in deep neural learning with access to textual data in electroencephalogram (EEG) reports. We evaluate the performance of our model on the problem of inferring the neurologist's overall impression (normal or abnormal) from electroencephalogram (EEG) reports and report an accuracy of 91.4%, precision of 94.4%, recall of 91.2%, and F1 measure of 92.8% (a 40% improvement over the performance obtained using Doc2Vec). These promising results demonstrate the power of our approach, while error analysis reveals remaining obstacles as well as areas for future improvement.
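As a quick consistency check on the reported figures: F1 is the harmonic mean of precision and recall, and the stated precision (94.4%) and recall (91.2%) reproduce the stated F1 of 92.8% after rounding. The helper name `f1` is introduced here for illustration.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    return 2 * precision * recall / (precision + recall)


print(round(f1(94.4, 91.2), 1))  # 92.8
```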

14.
AMIA Jt Summits Transl Sci Proc ; 2017: 229-238, 2017.
Article in English | MEDLINE | ID: mdl-28815135

ABSTRACT

The annotation of a large corpus of Electroencephalography (EEG) reports is a crucial step in the development of an EEG-specific patient cohort retrieval system. The annotation of multiple types of EEG-specific medical concepts, along with their polarity and modality, is challenging, especially when automatically performed on Big Data. To address this challenge, we present a novel framework which combines the advantages of active and deep learning while producing annotations that capture a variety of attributes of medical concepts. Results obtained through our novel framework show great promise.

15.
J Biomed Inform ; 75S: S71-S84, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28576748

ABSTRACT

This paper presents a novel method for automatically recognizing symptom severity by using natural language processing of psychiatric evaluation records to extract features that are processed by machine learning techniques to assign a severity score to each record evaluated in the 2016 RDoC for Psychiatry Challenge from CEGS/N-GRID. The natural language processing techniques focused on (a) discerning the discourse information expressed in questions and answers; (b) identifying medical concepts that relate to mental disorders; and (c) accounting for the role of negation. The machine learning techniques rely on the assumptions that (1) the severity of a patient's positive valence symptoms exists on a latent continuous spectrum and (2) all the patient's answers and narratives documented in the psychological evaluation records are informed by the patient's latent severity score along this spectrum. These assumptions motivated our two-step machine learning framework for automatically recognizing psychological symptom severity. In the first step, the latent continuous severity score is inferred from each record; in the second step, the severity score is mapped to one of the four discrete severity levels used in the CEGS/N-GRID challenge. We evaluated three methods for inferring the latent severity score associated with each record: (i) pointwise ridge regression; (ii) pairwise comparison-based classification; and (iii) a hybrid approach combining pointwise regression and the pairwise classifier. The second step was implemented using a tree of cascading support vector machine (SVM) classifiers. While the official evaluation results indicate that all three methods are promising, the hybrid approach not only outperformed the pairwise and pointwise methods, but also produced the second highest performance of all submissions to the CEGS/N-GRID challenge with a normalized MAE score of 84.093% (where higher numbers indicate better performance). 
These evaluation results enabled us to observe that, for this task, considering pairwise information can produce more accurate severity scores than pointwise regression - an approach widely used in other systems for assigning severity scores. Moreover, our analysis indicates that using a cascading SVM tree outperforms traditional SVM classification methods for the purpose of determining discrete severity levels.
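The second step maps a latent continuous score onto one of four discrete severity levels through a tree of cascading binary classifiers. The sketch below shows only the cascade structure. It assumes a balanced tree (first separating the two lower levels from the two higher ones, then splitting each pair) and replaces each trained SVM with a hypothetical threshold on the latent score. The levels are numbered 0 to 3, and the thresholds are illustrative values, not the authors' or the challenge's.

```python
from typing import Callable

Classifier = Callable[[float], bool]  # True means "take the higher branch"


def make_threshold(t: float) -> Classifier:
    # Hypothetical stand-in for a trained binary SVM: a simple cut on the latent score.
    return lambda s: s >= t


def cascade_level(score: float, root: Classifier, low: Classifier, high: Classifier) -> int:
    """Balanced cascade: root separates {0, 1} from {2, 3}; each child splits its pair."""
    if root(score):
        return 3 if high(score) else 2
    return 1 if low(score) else 0


root, low, high = make_threshold(0.5), make_threshold(0.25), make_threshold(0.75)
levels = [cascade_level(s, root, low, high) for s in (0.1, 0.3, 0.6, 0.9)]
print(levels)  # [0, 1, 2, 3]
```

Splitting a four-way decision into binary decisions lets each node specialize on one boundary, which is one plausible reason a cascade can beat a single multi-class SVM on this task.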


Subjects
Pattern Recognition, Automated , Psychological Tests , Humans , Machine Learning , Natural Language Processing , Severity of Illness Index
16.
AMIA Annu Symp Proc ; 2017: 770-779, 2017.
Article in English | MEDLINE | ID: mdl-29854143

ABSTRACT

Successful diagnosis and management of neurological dysfunction relies on proper communication between the neurologist and the primary physician (or other specialists). Because this communication is documented within medical records, the ability to automatically infer the clinical correlations for a patient from his or her medical records would provide an important step towards enabling health care systems to automatically identify patients requiring additional follow-up, as well as to flag any unexpected clinical correlations for review. In this paper, we present a Deep Section Recovery Model (DSRM) which applies deep neural learning on a large body of EEG reports in order to infer the expected clinical correlations for a patient from the information in a given EEG report by (1) automatically extracting word- and report-level features from the report and (2) inferring the most likely clinical correlations and expressing those clinical correlations in natural language. We evaluated the performance of the DSRM by removing the clinical correlation sections from EEG reports and measuring how well the model could recover that information from the remainder of the report. The DSRM obtained a 17% improvement over the top-performing baseline, highlighting not only the power of the DSRM but also the promise of automatically recognizing unexpected clinical correlations in the future.


Subjects
Deep Learning , Electroencephalography , Models, Neurological , Natural Language Processing , Nervous System Diseases/diagnosis , Female , Humans , Male , Neurology
17.
AMIA Annu Symp Proc ; 2017: 1233-1242, 2017.
Article in English | MEDLINE | ID: mdl-29854192

ABSTRACT

While biomedical ontologies have traditionally been used to guide the identification of concepts or relations in biomedical data, recent advances in deep learning are able to capture high-quality knowledge from textual data and represent it in graphical structures. As opposed to the top-down methodology used in the generation of ontologies, which starts with the principled design of the upper ontology, the bottom-up methodology enabled by deep learning encodes the likelihood that concepts share certain relations, as evidenced by data. In this paper, we present a knowledge representation produced by deep learning methods, called Medical Knowledge Embeddings (MKE), that encode medical concepts related to the study of epilepsy and the relations between them. Many of the epilepsy-relevant medical concepts from MKE are not yet available in existing biomedical ontologies, but are mentioned in vast collections of epilepsy-related medical records which also imply their relationships. The evaluation of the MKE indicates high accuracy of the medical concepts automatically identified from clinical text as well as promising results in terms of correctness and completeness of relations produced by deep learning.


Subjects
Biological Ontologies , Deep Learning , Electroencephalography , Epilepsy , Data Accuracy , Humans , Medical Records
18.
Article in English | MEDLINE | ID: mdl-27595044

ABSTRACT

The wealth of clinical information provided by the advent of electronic health records offers an exciting opportunity to improve the quality of patient care. Of particular importance are the risk factors, which indicate possible diagnoses, and the medications which treat them. By analysing which risk factors and medications were mentioned at different times in patients' EHRs, we are able to construct a patient's clinical chronology. This chronology enables us not only to predict how a new patient's risk factors may progress, but also to discover patterns of interactions between risk factors and medications. We present a novel probabilistic model of patients' clinical chronologies and demonstrate how this model can be used to (1) predict the way a new patient's risk factors may evolve over time, (2) identify patients with irregular chronologies, and (3) discover the interactions between pairs of risk factors, and between risk factors and medications, over time. Moreover, the model proposed in this paper does not rely on (nor specify) any prior knowledge about any interactions between the risk factors and medications it represents. Thus, our model can be easily applied to any arbitrary set of risk factors and medications derived from a new dataset.

19.
AMIA Annu Symp Proc ; 2016: 1794-1803, 2016.
Article in English | MEDLINE | ID: mdl-28269938

ABSTRACT

Clinical electroencephalography (EEG) is the most important investigation in the diagnosis and management of epilepsies. An EEG records the electrical activity along the scalp and measures spontaneous electrical activity of the brain. Because the EEG signal is complex, its interpretation is known to produce only moderate inter-observer agreement among neurologists. This problem can be addressed by providing clinical experts with the ability to automatically retrieve similar EEG signals and EEG reports through a patient cohort retrieval system operating on a vast archive of EEG data. In this paper, we present a multi-modal EEG patient cohort retrieval system called MERCuRY which leverages the heterogeneous nature of EEG data by processing both the clinical narratives from EEG reports and the raw electrode potentials derived from the recorded EEG signal data. At the core of MERCuRY is a novel multimodal clinical indexing scheme which relies on EEG data representations obtained through deep learning. The index is used by two clinical relevance models that we have generated for identifying patient cohorts satisfying the inclusion and exclusion criteria expressed in natural language queries. Evaluations of the MERCuRY system measured the relevance of the patient cohorts, obtaining a MAP score of 69.87% and an NDCG of 83.21%.
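MERCuRY's cohorts are evaluated with MAP and NDCG. For readers unfamiliar with these ranking metrics, the sketch below computes average precision for one ranked list with binary relevance, MAP as the mean over queries, and NDCG with graded relevance and a log2 rank discount. This is one common formulation; the abstract does not state the exact variant, rank cutoff, or relevance grades used, and the toy relevance lists are invented.

```python
import math


def average_precision(ranked_rel, total_relevant):
    """AP for one query: ranked_rel[i] is 1 if the (i+1)-th result is relevant."""
    hits, precision_sum = 0, 0.0
    for i, rel in enumerate(ranked_rel, start=1):
        if rel:
            hits += 1
            precision_sum += hits / i  # precision at each relevant rank
    return precision_sum / total_relevant if total_relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked_rel, total_relevant) pairs, one per query."""
    return sum(average_precision(r, n) for r, n in runs) / len(runs)


def ndcg(gains):
    """DCG = sum(gain_i / log2(i + 1)), normalized by the ideal ordering.

    Simplification: the ideal ordering is built from the retrieved gains only.
    """
    dcg = sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))
    ideal = sum(g / math.log2(i + 1)
                for i, g in enumerate(sorted(gains, reverse=True), start=1))
    return dcg / ideal if ideal > 0 else 0.0


# Two toy queries: relevant hits at ranks 1 and 3 (of 2 relevant); at rank 2 (of 1).
print(mean_average_precision([([1, 0, 1, 0], 2), ([0, 1], 1)]))
print(ndcg([2, 0, 1]))
```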


Subjects
Abstracting and Indexing , Electroencephalography , Epilepsy/physiopathology , Information Storage and Retrieval , Information Systems , Electrodes , Humans , Machine Learning , Natural Language Processing , Neural Networks, Computer
20.
LREC Int Conf Lang Resour Eval ; 2016: 4621-4628, 2016 May.
Article in English | MEDLINE | ID: mdl-28649676

ABSTRACT

Our ability to understand language often relies on common-sense knowledge: background information the speaker can assume is known by the reader. Similarly, our comprehension of the language used in complex domains relies on access to domain-specific knowledge. Capturing common-sense and domain-specific knowledge can be achieved by taking advantage of recent advances in open information extraction (IE) techniques and, more importantly, of knowledge embeddings, which are multi-dimensional representations of concepts and relations. Building a knowledge graph for representing common-sense knowledge, in which concepts discerned from noun phrases are cast as vertices and lexicalized relations are cast as edges, leads to learning the embeddings of common-sense knowledge accounting for semantic compositionality as well as implied knowledge. Common-sense knowledge is acquired from a vast collection of blogs and books as well as from WordNet. Similarly, medical knowledge is learned from two large sets of electronic health records. The evaluation results of these two forms of knowledge are promising: the same knowledge acquisition methodology based on learning knowledge embeddings works well both for common-sense knowledge and for medical knowledge. Interestingly, the common-sense knowledge that we have acquired was evaluated as being less neutral than the medical knowledge, as it often reflected the opinion of the knowledge utterer. In addition, the acquired medical knowledge was evaluated as more plausible than the common-sense knowledge, reflecting the complexity of acquiring common-sense knowledge due to the pragmatics and economicity of language.
