RESUMEN
OBJECTIVE AND BACKGROUND: The exponential growth of the unstructured data available in biomedical literature, and Electronic Health Record (EHR), requires powerful novel technologies and architectures to unlock the information hidden in the unstructured data. The success of smart healthcare applications such as clinical decision support systems, disease diagnosis systems, and healthcare management systems depends on knowledge that is understandable by machines to interpret and infer new knowledge from it. In this regard, ontological data models are expected to play a vital role to organize, integrate, and make informative inferences with the knowledge implicit in that unstructured data and represent the resultant knowledge in a form that machines can understand. However, constructing such models is challenging because they demand intensive labor, domain experts, and ontology engineers. Such requirements impose a limit on the scale or scope of ontological data models. We present a framework that will allow mitigating the time-intensity to build ontologies and achieve machine interoperability. METHODS: Empowered by linked biomedical ontologies, our proposed novel Automated Ontology Generation Framework consists of five major modules: a) Text Processing using compute on demand approach. b) Medical Semantic Annotation using N-Gram, ontology linking and classification algorithms, c) Relation Extraction using graph method and Syntactic Patterns, d), Semantic Enrichment using RDF mining, e) Domain Inference Engine to build the formal ontology. RESULTS: Quantitative evaluations show 84.78% recall, 53.35% precision, and 67.70% F-measure in terms of disease-drug concepts identification; 85.51% recall, 69.61% precision, and F-measure 76.74% with respect to taxonomic relation extraction; and 77.20% recall, 40.10% precision, and F-measure 52.78% with respect to biomedical non-taxonomic relation extraction. CONCLUSION: We present an automated ontology generation framework that is empowered by Linked Biomedical Ontologies. This framework integrates various natural language processing, semantic enrichment, syntactic pattern, and graph algorithm based techniques. Moreover, it shows that using Linked Biomedical Ontologies enables a promising solution to the problem of automating the process of disease-drug ontology generation.
Asunto(s)
Ontologías Biológicas/estadística & datos numéricos , Algoritmos , Minería de Datos , Sistemas de Apoyo a Decisiones Clínicas , Enfermedad , Quimioterapia , Humanos , Bases del Conocimiento , Aprendizaje Automático , SemánticaRESUMEN
BACKGROUND: Fulfilling the vision of Semantic Web requires an accurate data model for organizing knowledge and sharing common understanding of the domain. Fitting this description, ontologies are the cornerstones of Semantic Web and can be used to solve many problems of clinical information and biomedical engineering, such as word sense disambiguation, semantic similarity, question answering, ontology alignment, etc. Manual construction of ontology is labor intensive and requires domain experts and ontology engineers. To downsize the labor-intensive nature of ontology generation and minimize the need for domain experts, we present a novel automated ontology generation framework, Linked Open Data approach for Automatic Biomedical Ontology Generation (LOD-ABOG), which is empowered by Linked Open Data (LOD). LOD-ABOG performs concept extraction using knowledge base mainly UMLS and LOD, along with Natural Language Processing (NLP) operations; and applies relation extraction using LOD, Breadth first Search (BSF) graph method, and Freepal repository patterns. RESULTS: Our evaluation shows improved results in most of the tasks of ontology generation compared to those obtained by existing frameworks. We evaluated the performance of individual tasks (modules) of proposed framework using CDR and SemMedDB datasets. For concept extraction, evaluation shows an average F-measure of 58.12% for CDR corpus and 81.68% for SemMedDB; F-measure of 65.26% and 77.44% for biomedical taxonomic relation extraction using datasets of CDR and SemMedDB, respectively; and F-measure of 52.78% and 58.12% for biomedical non-taxonomic relation extraction using CDR corpus and SemMedDB, respectively. Additionally, the comparison with manually constructed baseline Alzheimer ontology shows F-measure of 72.48% in terms of concepts detection, 76.27% in relation extraction, and 83.28% in property extraction. Also, we compared our proposed framework with ontology-learning framework called "OntoGain" which shows that LOD-ABOG performs 14.76% better in terms of relation extraction. CONCLUSION: This paper has presented LOD-ABOG framework which shows that current LOD sources and technologies are a promising solution to automate the process of biomedical ontology generation and extract relations to a greater extent. In addition, unlike existing frameworks which require domain experts in ontology development process, the proposed approach requires involvement of them only for improvement purpose at the end of ontology life cycle.
Asunto(s)
Ontologías Biológicas , Investigación Biomédica , Procesamiento de Lenguaje Natural , Semántica , Humanos , Bases del ConocimientoRESUMEN
Venous thromboembolism (VTE) is the third most common cardiovascular disorder. It affects people of both genders at ages as young as 20 years. The increased number of VTE cases with a high fatality rate of 25% at first occurrence makes preventive measures essential. Clinical narratives are a rich source of knowledge and should be included in the diagnosis and treatment processes, as they may contain critical information on risk factors. It is very important to make such narrative blocks of information usable for searching, health analytics, and decision-making. This paper proposes a Semantic Extraction and Sentiment Assessment of Risk Factors (SESARF) framework. Unlike traditional machine-learning approaches, SESARF, which consists of two main algorithms, namely, ExtractRiskFactor and FindSeverity, prepares a feature vector as the input to a support vector machine (SVM) classifier to make a diagnosis. SESARF matches and maps the concepts of VTE risk factors and finds adjectives and adverbs that reflect their levels of severity. SESARF uses a semantic- and sentiment-based approach to analyze clinical narratives of electronic health records (EHR) and then predict a diagnosis of VTE. We use a dataset of 150 clinical narratives, 80% of which are used to train our prediction classifier support vector machine, with the remaining 20% used for testing. Semantic extraction and sentiment analysis results yielded precisions of 81% and 70%, respectively. Using a support vector machine, prediction of patients with VTE yielded precision and recall values of 54.5% and 85.7%, respectively.