1.
J Biomed Inform ; 42(5): 950-66, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19535011

ABSTRACT

In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains.


Subject(s)
Information Storage and Retrieval/methods , Medical Records , Natural Language Processing , Semantics , Abstracting and Indexing , Biomedical Research , Guidelines as Topic , Humans , Internet , Models, Statistical , Neoplasms , Terminology as Topic , User-Computer Interface
2.
BMC Bioinformatics ; 9 Suppl 11: S3, 2008 Nov 19.
Article in English | MEDLINE | ID: mdl-19025689

ABSTRACT

BACKGROUND: The Clinical E-Science Framework (CLEF) project has built a system to extract clinically significant information from the textual component of medical records in order to support clinical research, evidence-based healthcare and genotype-meets-phenotype informatics. One part of this system is the identification of relationships between clinically important entities in the text. Typical approaches to relationship extraction in this domain have used full parses, domain-specific grammars, and large knowledge bases encoding domain knowledge. In other areas of biomedical NLP, statistical machine learning (ML) approaches are now routinely applied to relationship extraction. We report on the novel application of these statistical techniques to the extraction of clinical relationships. RESULTS: We have designed and implemented an ML-based system for relation extraction, using support vector machines, and trained and tested it on a corpus of oncology narratives hand-annotated with clinically important relationships. Over a class of seven relation types, the system achieves an average F1 score of 72%, only slightly behind an indicative measure of human inter-annotator agreement on the same task. We investigate the effectiveness of different features for this task, how extraction performance varies between inter- and intra-sentential relationships, and examine the amount of training data needed to learn various relationships. CONCLUSION: We have shown that it is possible to extract important clinical relationships from text, using supervised statistical ML techniques, at levels of accuracy approaching those of human annotators. Given the importance of relation extraction as an enabling technology for text mining and given also the ready adaptability of systems based on our supervised learning approach to other clinical relationship extraction tasks, this result has significance for clinical text mining more generally, though further work to confirm our encouraging results should be carried out on a larger sample of narratives and relationship types.
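
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how an F1 score averaged over several relation types might be computed. The abstract does not say whether the reported average is macro- or micro-averaged, so macro-averaging here is an assumption; the relation-type names and all counts are hypothetical and are not taken from the paper.

```python
def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall for one relation type."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(counts):
    """Unweighted mean of the per-type F1 scores (assumed averaging scheme)."""
    return sum(f1_score(*c) for c in counts.values()) / len(counts)


# Hypothetical (true positive, false positive, false negative) counts
# for two made-up relation types; not data from the paper.
counts = {
    "has_finding": (8, 2, 2),
    "has_location": (6, 2, 4),
}
average = macro_f1(counts)
```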


Subject(s)
Information Storage and Retrieval/methods , Natural Language Processing , Algorithms , Humans , Medical Records Systems, Computerized
3.
BMC Bioinformatics ; 9 Suppl 11: S7, 2008 Nov 19.
Article in English | MEDLINE | ID: mdl-19025693

ABSTRACT

BACKGROUND: Like text in other domains, biomedical documents contain a range of terms with more than one possible meaning. These ambiguities form a significant obstacle to the automatic processing of biomedical texts. Previous approaches to resolving this problem have made use of various sources of information including linguistic features of the context in which the ambiguous term is used and domain-specific resources, such as UMLS. MATERIALS AND METHODS: We compare various sources of information including ones which have been previously used and a novel one: MeSH terms. Evaluation is carried out using a standard test set (the NLM-WSD corpus). RESULTS: The best performance is obtained using a combination of linguistic features and MeSH terms. Performance of our system exceeds previously published results for systems evaluated using the same data set. CONCLUSION: Disambiguation of biomedical terms benefits from the use of information from a variety of sources. In particular, MeSH terms have proved to be useful and should be used if available.


Subject(s)
Documentation/methods , Information Storage and Retrieval , Medical Subject Headings , Pattern Recognition, Automated/methods , Terminology as Topic , Algorithms , Artificial Intelligence , Databases, Bibliographic , Linguistics , MEDLINE , Natural Language Processing , Unified Medical Language System
4.
IEEE Trans Pattern Anal Mach Intell ; 40(12): 3045-3058, 2018 Dec.
Article in English | MEDLINE | ID: mdl-29990152

ABSTRACT

Deep CNN-based object detection systems have achieved remarkable success on several large-scale object detection benchmarks. However, training such detectors requires a large number of labeled bounding boxes, which are more difficult to obtain than image-level annotations. Previous work addresses this issue by transforming image-level classifiers into object detectors. This is done by modeling the differences between the two on categories with both image-level and bounding box annotations, and transferring this information to convert classifiers to detectors for categories without bounding box annotations. We improve this previous work by incorporating knowledge about object similarities from visual and semantic domains during the transfer process. The intuition behind our proposed method is that visually and semantically similar categories should exhibit more common transferable properties than dissimilar categories, e.g. a better detector would result by transforming the differences between a dog classifier and a dog detector onto the cat class, than would by transforming from the violin class. Experimental results on the challenging ILSVRC2013 detection dataset demonstrate that each of our proposed object similarity based knowledge transfer methods outperforms the baseline methods. We found strong evidence that visual similarity and semantic relatedness are complementary for the task, and when combined notably improve detection, achieving state-of-the-art detection performance in a semi-supervised setting.

5.
J Am Med Inform Assoc ; 22(5): 987-92, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25971437

ABSTRACT

OBJECTIVE: Literature-based discovery (LBD) aims to identify "hidden knowledge" in the medical literature by: (1) analyzing documents to identify pairs of explicitly related concepts (terms), then (2) hypothesizing novel relations between pairs of unrelated concepts that are implicitly related via a shared concept to which both are explicitly related. Many LBD approaches use simple techniques to identify semantically weak relations between concepts, for example, document co-occurrence. These generate huge numbers of hypotheses, difficult for humans to assess. More complex techniques rely on linguistic analysis, for example, shallow parsing, to identify semantically stronger relations. Such approaches generate fewer hypotheses, but may miss hidden knowledge. The authors investigate this trade-off in detail, comparing techniques for identifying related concepts to discover which are most suitable for LBD. MATERIALS AND METHODS: A generic LBD system that can utilize a range of relation types was developed. Experiments were carried out comparing a number of techniques for identifying relations. Two approaches were used for evaluation: replication of existing discoveries and the "time slicing" approach.(1) RESULTS: Previous LBD discoveries could be replicated using relations based either on document co-occurrence or linguistic analysis. Using relations based on linguistic analysis generated many fewer hypotheses, but a significantly greater proportion of them were candidates for hidden knowledge. DISCUSSION AND CONCLUSION: The use of linguistic analysis-based relations improves accuracy of LBD without overly damaging coverage. LBD systems often generate huge numbers of hypotheses, which are infeasible to manually review. Improving their accuracy has the potential to make these systems significantly more usable.
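
The two-step procedure described in the OBJECTIVE can be illustrated with a minimal sketch that uses document co-occurrence, the simplest relation type the abstract mentions. This is not the authors' system: the function names `build_relations` and `hypothesize` are hypothetical, and the three toy documents are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_relations(documents):
    """Step 1: treat two concepts as explicitly related if they co-occur in a document."""
    related = defaultdict(set)
    for concepts in documents:
        for a, b in combinations(sorted(set(concepts)), 2):
            related[a].add(b)
            related[b].add(a)
    return related


def hypothesize(related, source):
    """Step 2: propose concepts not directly related to `source` that share a linking concept."""
    hypotheses = defaultdict(set)
    for linking in related[source]:
        for target in related[linking]:
            if target != source and target not in related[source]:
                hypotheses[target].add(linking)
    return dict(hypotheses)


# Toy documents, each reduced to the set of concepts it mentions (invented data).
docs = [
    ["raynaud disease", "blood viscosity"],
    ["blood viscosity", "fish oil"],
    ["raynaud disease", "vasoconstriction"],
]
relations = build_relations(docs)
candidates = hypothesize(relations, "raynaud disease")
```

Here `candidates` maps each hypothesized target concept to the linking concepts that support it. With co-occurrence as the relation, realistic collections yield very large candidate sets, which is the trade-off the abstract examines.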


Subject(s)
Information Storage and Retrieval/methods , Natural Language Processing , Linguistics
6.
AMIA Annu Symp Proc ; : 625-9, 2007 Oct 11.
Article in English | MEDLINE | ID: mdl-18693911

ABSTRACT

The Clinical E-Science Framework (CLEF) project is building a framework for the capture, integration and presentation of clinical information: for clinical research, evidence-based health care and genotype-meets-phenotype informatics. A significant portion of the information required by such a framework originates as text, even in EHR-savvy organizations. CLEF uses Information Extraction (IE) to make this unstructured information available. An important part of IE is the identification of semantic entities and relationships. Typical approaches require human annotated documents to provide both evaluation standards and material for system development. CLEF has a corpus of clinical narratives, histopathology reports and imaging reports from 20 thousand patients. We describe the selection of a subset of this corpus for manual annotation of clinical entities and relationships. We describe an annotation methodology and report encouraging initial results of inter-annotator agreement. Comparisons are made between different text sub-genres, and between annotators with different skills.


Subject(s)
Information Storage and Retrieval/methods , Medical Records Systems, Computerized , Natural Language Processing , Humans , Semantics