Search | Virtual Health Library

Reuse of termino-ontological resources and text corpora for building a multilingual domain ontology: an application to Alzheimer's disease.

Dramé, Khadim; Diallo, Gayo; Delva, Fleur; Dartigues, Jean François; Mouillet, Evelyne; Salamon, Roger; Mougin, Fleur.

J Biomed Inform ; 48: 171-82, 2014 Apr.

Article in English | MEDLINE | ID: mdl-24382429

ABSTRACT

Ontologies are useful tools for sharing and exchanging knowledge. However, ontology construction is complex and often time-consuming. In this paper, we present a method for building a bilingual domain ontology from textual and termino-ontological resources, intended for semantic annotation and information retrieval of textual documents. This method combines two approaches: ontology learning from texts and the reuse of existing terminological resources. It consists of four steps: (i) term extraction from domain-specific corpora (in French and English) using textual analysis tools, (ii) clustering of terms into concepts organized according to the UMLS Metathesaurus, (iii) ontology enrichment through the alignment of French and English terms using parallel corpora and the integration of new concepts, and (iv) refinement and validation of results by domain experts. These validated results are formalized into a domain ontology dedicated to Alzheimer's disease and related syndromes, which is available online (http://lesim.isped.u-bordeaux2.fr/SemBiP/ressources/ontoAD.owl). The ontology currently includes 5765 concepts linked by 7499 taxonomic relationships and 10,889 non-taxonomic relationships. Among these results, 439 concepts absent from the UMLS were created and 608 new synonymous French terms were added. The proposed method is sufficiently flexible to be applied to other domains.
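Step (ii) of the method above, grouping extracted terms into concepts, can be illustrated with a minimal Python sketch. The abstract does not describe how the lookup against the UMLS Metathesaurus is performed, so everything here is an assumption: `TOY_LOOKUP`, the `TOY:` identifiers, and `cluster_terms` are hypothetical stand-ins, not real UMLS data or the authors' implementation.

```python
from collections import defaultdict

# Hypothetical term-to-concept table standing in for a terminology lookup.
# The identifiers are invented for illustration and are NOT real UMLS CUIs.
TOY_LOOKUP = {
    "alzheimer disease": "TOY:0001",
    "maladie d'alzheimer": "TOY:0001",
    "memory loss": "TOY:0002",
    "perte de mémoire": "TOY:0002",
}


def cluster_terms(terms, lookup):
    """Group terms that share a concept identifier.

    Returns (concepts, unmatched): a dict mapping each identifier to its
    terms, and the list of terms with no entry in the lookup table.
    """
    concepts = defaultdict(list)
    unmatched = []
    for term in terms:
        concept_id = lookup.get(term.lower().strip())
        if concept_id is None:
            unmatched.append(term)
        else:
            concepts[concept_id].append(term)
    return dict(concepts), unmatched


extracted = ["Alzheimer disease", "maladie d'Alzheimer", "memory loss", "caregiver burden"]
concepts, unmatched = cluster_terms(extracted, TOY_LOOKUP)
print(concepts)
print(unmatched)
```

In this sketch, English and French terms that resolve to the same identifier end up in one concept, while unmatched terms are set aside as candidates for new concepts pending expert review, which loosely mirrors steps (iii) and (iv).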

Subjects

Alzheimer Disease/diagnosis, Alzheimer Disease/physiopathology, Language, Medical Informatics/methods, Algorithms, Classification, Humans, Information Storage and Retrieval, Reproducibility of Results, Semantics, Software, Unified Medical Language System, Controlled Vocabulary

Towards a bilingual Alzheimer's disease terminology acquisition using a parallel corpus.

Drame, Khadim; Diallo, Gayo; Mougin, Fleur.

Stud Health Technol Inform ; 180: 179-83, 2012.

Article in English | MEDLINE | ID: mdl-22874176

ABSTRACT

In this paper, we present a method for acquiring bilingual terminology on Alzheimer's disease using a parallel corpus. NLP techniques are used to parse English and French texts in order to extract candidate terms. These terms are then matched automatically using an approach that combines two alignment techniques: one based on an association score computed between two terms, and another based on their morphological similarity. This method provided good results on an Alzheimer's disease-related corpus, with a precision of 73%.
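The combination of the two alignment techniques can be sketched in Python. The abstract does not name the association score or the morphological measure, so this sketch assumes a Dice coefficient over aligned sentence pairs and a character-level ratio from `difflib.SequenceMatcher`; the function names, the `weight` parameter, and the toy corpus are all hypothetical.

```python
from difflib import SequenceMatcher


def dice_score(term_en, term_fr, aligned_pairs):
    """Dice coefficient of co-occurrence across aligned (English, French) sentences."""
    n_en = sum(1 for en, _ in aligned_pairs if term_en.lower() in en.lower())
    n_fr = sum(1 for _, fr in aligned_pairs if term_fr.lower() in fr.lower())
    n_both = sum(
        1
        for en, fr in aligned_pairs
        if term_en.lower() in en.lower() and term_fr.lower() in fr.lower()
    )
    if n_en + n_fr == 0:
        return 0.0
    return 2 * n_both / (n_en + n_fr)


def morph_similarity(a, b):
    """Character-level similarity in [0, 1], useful for cognates."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def align_terms(en_terms, fr_terms, aligned_pairs, weight=0.5):
    """Map each English term to the French term with the best combined score."""
    alignments = {}
    for en in en_terms:
        alignments[en] = max(
            fr_terms,
            key=lambda fr: weight * dice_score(en, fr, aligned_pairs)
            + (1 - weight) * morph_similarity(en, fr),
        )
    return alignments


# Toy parallel corpus (invented for illustration).
aligned_pairs = [
    ("memory loss is common in dementia", "la perte de mémoire est fréquente dans la démence"),
    ("dementia affects cognition", "la démence affecte la cognition"),
    ("memory loss worsens over time", "la perte de mémoire s'aggrave avec le temps"),
]
alignments = align_terms(
    ["dementia", "memory loss"], ["démence", "perte de mémoire"], aligned_pairs
)
print(alignments)
```

The association score captures translations that look nothing alike ("memory loss" and "perte de mémoire"), while the morphological score helps with cognates ("dementia" and "démence"); weighting the two equally is an arbitrary choice for this sketch.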

Subjects

Algorithms, Alzheimer Disease/classification, Natural Language Processing, Automated Pattern Recognition/methods, Semantics, Terminology as Topic, Controlled Vocabulary, Humans

Large scale biomedical texts classification: a kNN and an ESA-based approaches.

Dramé, Khadim; Mougin, Fleur; Diallo, Gayo.

J Biomed Semantics ; 7: 40, 2016 Jun 16.

Article in English | MEDLINE | ID: mdl-27312781

ABSTRACT

BACKGROUND: With the large and increasing volume of textual data, automated methods for identifying significant topics to classify textual documents have received growing interest. While many efforts have been made in this direction, it remains a real challenge. Moreover, the issue is even more complex as full texts are not always freely available. Using only partial information to annotate these documents is therefore promising but remains very challenging. METHODS: We propose two classification methods: a k-nearest neighbours (kNN)-based approach and an explicit semantic analysis (ESA)-based approach. Although the kNN-based approach is widely used in text classification, it needs to be improved to perform well in this specific classification problem, which deals with partial information. Compared to existing kNN-based methods, our method uses classical Machine Learning (ML) algorithms for ranking the labels. Additional features are also investigated in order to improve the classifiers' performance. In addition, we combine several learning algorithms with various techniques for fixing the number of relevant topics. On the other hand, ESA seems promising for this classification task as it yielded interesting results in related issues, such as semantic relatedness computation between texts and text classification. Unlike existing works, which use ESA for enriching the bag-of-words approach with additional knowledge-based features, our ESA-based method builds a standalone classifier. Furthermore, we investigate whether the results of this method could be useful as a complementary feature of our kNN-based approach.
RESULTS: Experimental evaluations performed on large standard annotated datasets, provided by the BioASQ organizers, show that the kNN-based method with the Random Forest learning algorithm achieves good performance compared with the current state-of-the-art methods, reaching a competitive f-measure of 0.55, while the ESA-based approach surprisingly yielded unsatisfactory results. CONCLUSIONS: We have proposed simple classification methods suitable for annotating textual documents using only partial information. They are therefore adequate for large multi-label classification, particularly in the biomedical domain. Thus, our work contributes to the extraction of relevant information from unstructured documents in order to facilitate their automated processing. Consequently, it could be used for various purposes, including document indexing and information retrieval.
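The general idea of kNN-based multi-label annotation can be sketched in Python. This is a simplified illustration, not the method evaluated above: the abstract describes ranking candidate labels with learned models such as Random Forest, whereas this sketch swaps in similarity-weighted voting with a fixed threshold. The bag-of-words representation, the function names, the `k` and `threshold` parameters, and the toy training set are all assumptions.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words term counts for a whitespace-tokenised text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def knn_labels(query, training, k=2, threshold=0.5):
    """Collect labels from the k most similar documents and keep those whose
    similarity-weighted share of the vote reaches the threshold."""
    q = bow(query)
    scored = [(cosine(q, bow(text)), labels) for text, labels in training]
    neighbours = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return []
    votes = Counter()
    for sim, labels in neighbours:
        for label in labels:
            votes[label] += sim
    return sorted(label for label, v in votes.items() if v / total >= threshold)


# Toy annotated documents (invented for illustration).
training = [
    ("alzheimer disease memory decline", ["Alzheimer Disease", "Memory Disorders"]),
    ("alzheimer disease amyloid plaques", ["Alzheimer Disease", "Amyloid"]),
    ("heart failure treatment", ["Heart Failure"]),
]
predicted = knn_labels("alzheimer disease memory", training)
print(predicted)
```

The fixed threshold plays the role of the step that decides how many topics to assign to a document; in a learned variant, the per-label vote shares would instead become input features for a ranking model.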

Subjects

Biological Ontologies, Biomedical Research, Data Mining, Machine Learning, Semantics, Humans, Natural Language Processing
