ABSTRACT
The paper presents a method for recommending augmentations that address conceptual gaps in textbooks. Question-answer (QA) pairs from community question-answering (cQA) forums have been observed to offer precise and comprehensive illustrations of concepts. Our proposed method retrieves QA pairs for a target concept to suggest two types of augmentations: basic and supplementary. Basic augmentations are suggested for concepts for which a textbook lacks fundamental references. We identified such deficiencies using a supervised machine-learning approach trained on 12 features of the textbook's discourse. Supplementary augmentations, which provide additional references, are suggested for all concepts. Retrieved QA pairs were filtered to ensure that they are comprehensive for the target students. The proposed augmentation system was deployed through a web-based interface. We collected 28 Indian textbooks and manually curated them to create gold standards for assessing the proposed system. We quantified the quality of these augmentations by analyzing expert opinions and by adopting an equivalent pretest-posttest setup with students. We evaluated the usability of the interface from the students' responses. Both system-based and human-based evaluations indicated that the suggested augmentations addressed concept-specific deficiencies and provided additional material to stimulate learning interest. The learning interface was easy to use and presented these augmentations effectively.
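The abstract states how the two augmentation types are assigned but does not describe the classifier, the retriever, or the filter. The minimal sketch below shows only that routing logic under stated assumptions: `is_deficient` is a hypothetical stand-in for the trained classifier over the 12 discourse features, `retrieve_basic` and `retrieve_supplementary` are hypothetical QA-pair retrievers, and `keep` is a hypothetical filter. The threshold rule in the demo is a placeholder, not the paper's model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class Augmentations:
    # Basic: only for concepts flagged as lacking fundamental references.
    basic: List[str] = field(default_factory=list)
    # Supplementary: suggested for every concept.
    supplementary: List[str] = field(default_factory=list)


def suggest_augmentations(
    concepts: Sequence[str],
    features: Dict[str, Sequence[float]],
    is_deficient: Callable[[Sequence[float]], bool],  # hypothetical classifier stand-in
    retrieve_basic: Callable[[str], List[str]],  # hypothetical retriever
    retrieve_supplementary: Callable[[str], List[str]],  # hypothetical retriever
    keep: Callable[[str], bool],  # hypothetical QA-pair filter
) -> Dict[str, Augmentations]:
    suggestions: Dict[str, Augmentations] = {}
    for concept in concepts:
        # Supplementary augmentations are retrieved for all concepts.
        supplementary = [qa for qa in retrieve_supplementary(concept) if keep(qa)]
        basic: List[str] = []
        # Basic augmentations only when the classifier flags a deficiency.
        if is_deficient(features[concept]):
            basic = [qa for qa in retrieve_basic(concept) if keep(qa)]
        suggestions[concept] = Augmentations(basic, supplementary)
    return suggestions


# Demo with toy data: 12 features per concept; the threshold rule is a placeholder.
demo = suggest_augmentations(
    concepts=["osmosis", "friction"],
    features={"osmosis": [0.0] * 12, "friction": [0.5] * 12},
    is_deficient=lambda f: sum(f) < 1.0,
    retrieve_basic=lambda c: [f"basic:{c}", "off-topic"],
    retrieve_supplementary=lambda c: [f"supp:{c}"],
    keep=lambda qa: qa != "off-topic",
)
```

Keeping the classifier, retrievers, and filter as injected callables separates the assignment rule (basic only for deficient concepts, supplementary for all) from the components the abstract leaves unspecified.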
ABSTRACT
In the current biomedical data movement, numerous efforts have been made to convert and normalize large volumes of traditional structured and unstructured data (e.g., EHRs, reports) into semi-structured data (e.g., RDF, OWL). As more semi-structured data enter the biomedical community, data integration and knowledge discovery across heterogeneous domains have become important research problems. At the application level, detecting related concepts among medical ontologies is an important goal of life science research. It is even more crucial to determine how different concepts are related within a single ontology or across multiple ontologies by analyzing the predicates in different knowledge bases. However, given today's information explosion, it is extremely difficult for biomedical researchers to find existing or potential predicates that link cross-domain concepts without support from schema pattern analysis. Therefore, a mechanism is needed that performs predicate-oriented pattern analysis to partition heterogeneous ontologies into smaller, closely related topics and generates queries to discover cross-domain knowledge from each topic. In this paper, we present such a model: it performs predicate-oriented pattern analysis based on how closely predicates are related and generates a similarity matrix. Based on this similarity matrix, we apply a novel unsupervised learning algorithm to partition large data sets into smaller, closely related topics and to generate meaningful queries that discover knowledge across a set of interlinked data sources. We implemented a prototype system named BmQGen and evaluated the proposed model on a colorectal surgical cohort from the Mayo Clinic.
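The abstract names the pipeline (predicate similarity matrix, unsupervised partitioning into topics, per-topic query generation) but not the similarity measure or the clustering algorithm. The sketch below is a stand-in, not BmQGen's method, and rests on loudly stated assumptions: each predicate is described by the set of (subject class, object class) pairs it connects, similarity is Jaccard overlap, and topics are the connected components obtained by joining predicate pairs whose similarity meets a threshold. The predicate names, the `http://example.org/` base IRI, and the threshold are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # assumed (subject class, object class) usage pair


def jaccard(a: Set[Pair], b: Set[Pair]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(usage: Dict[str, Set[Pair]]) -> Tuple[List[str], List[List[float]]]:
    # Symmetric predicate-by-predicate similarity matrix (Jaccard is an assumption).
    preds = sorted(usage)
    n = len(preds)
    sim = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = jaccard(usage[preds[i]], usage[preds[j]])
    return preds, sim


def partition(preds: List[str], sim: List[List[float]], threshold: float) -> List[List[str]]:
    # Stand-in clustering: union-find joins predicates whose similarity >= threshold.
    parent = list(range(len(preds)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(preds)), 2):
        if sim[i][j] >= threshold:
            parent[find(i)] = find(j)
    groups: Dict[int, List[str]] = {}
    for i, p in enumerate(preds):
        groups.setdefault(find(i), []).append(p)
    return sorted(groups.values())


def topic_query(topic: List[str], base: str = "http://example.org/") -> str:
    # Hypothetical query template: a SPARQL UNION over the topic's predicates.
    branches = " UNION ".join(f"{{ ?s <{base}{p}> ?o }}" for p in topic)
    return f"SELECT DISTINCT ?s ?o WHERE {{ {branches} }}"


usage = {
    "treats": {("Drug", "Disease"), ("Procedure", "Disease")},
    "prevents": {("Drug", "Disease")},
    "locatedIn": {("Tumor", "Organ")},
}
preds, sim = similarity_matrix(usage)
topics = partition(preds, sim, threshold=0.5)
queries = [topic_query(t) for t in topics]
```

With these toy inputs, `prevents` and `treats` share one of their two usage pairs (similarity 0.5) and fall into one topic, while `locatedIn` forms its own. A real system would replace the threshold-based grouping with whatever unsupervised algorithm the paper uses.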