Results 1 - 20 of 48
1.
J Biomed Inform ; 130: 104073, 2022 06.
Article in English | MEDLINE | ID: mdl-35427797

ABSTRACT

A vast amount of crucial information about patients resides solely in unstructured clinical narrative notes. There has been growing interest in the clinical Named Entity Recognition (NER) task using deep learning models. Such approaches require sufficient annotated data. However, there are few publicly available annotated corpora in the medical field due to the sensitive nature of clinical text. In this paper, we tackle this problem by building privacy-preserving shareable models for French clinical Named Entity Recognition using the mimic learning approach, which enables knowledge transfer from a teacher model trained on a private corpus to a student model. This student model could be publicly shared without any access to the original sensitive data. We evaluated three privacy-preserving models using three medical corpora and compared the performance of our models to those of baseline models such as dictionary-based models. An overall macro F-measure of 70.6% could be achieved by a student model trained using silver annotations produced by the teacher model, compared to 85.7% for the original private teacher model. Our results revealed that these privacy-preserving mimic learning models offer a good compromise between performance and data privacy preservation.
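The teacher-to-student transfer described in this abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the lexicon-based teacher, the entity labels, and the frequency-based student are hypothetical stand-ins, not the deep learning models evaluated in the paper. The point is only the data flow: the teacher labels shareable text, and the student learns from those silver labels without seeing private data.

```python
from collections import Counter, defaultdict

# Hypothetical lexicon standing in for a teacher trained on a private corpus.
TEACHER_LEXICON = {"fièvre": "PROBLEM", "paracétamol": "TREATMENT", "radiographie": "TEST"}

def teacher_tag(tokens):
    """Label each token with the teacher's prediction ('O' means no entity)."""
    return [TEACHER_LEXICON.get(tok.lower(), "O") for tok in tokens]

def make_silver_corpus(public_sentences):
    """Run the teacher over shareable text to produce silver annotations."""
    return [(tokens, teacher_tag(tokens)) for tokens in public_sentences]

def train_student(silver_corpus):
    """Train a deliberately simple student: most frequent silver label per token."""
    counts = defaultdict(Counter)
    for tokens, labels in silver_corpus:
        for tok, lab in zip(tokens, labels):
            counts[tok.lower()][lab] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}

def student_tag(student, tokens):
    """Tag new text with the student only; unseen tokens default to 'O'."""
    return [student.get(tok.lower(), "O") for tok in tokens]

# Toy public sentences (invented for this example).
public_sentences = [
    ["Le", "patient", "présente", "une", "fièvre"],
    ["Prescription", "de", "paracétamol"],
]
student = train_student(make_silver_corpus(public_sentences))
print(student_tag(student, ["Fièvre", "traitée", "par", "paracétamol"]))
```

The student only knows what the teacher expressed on the public text, which is one intuition for why a gap between teacher and student performance is to be expected.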


Subject(s)
Narration , Privacy , Humans , Natural Language Processing
2.
BMC Bioinformatics ; 17(1): 392, 2016 Sep 22.
Article in English | MEDLINE | ID: mdl-27659604

ABSTRACT

BACKGROUND: Clinical trial registries may allow for producing a global mapping of health research. However, health conditions are not described with standardized taxonomies in registries. Previous work analyzed clinical trial registries to improve the retrieval of relevant clinical trials for patients. However, no previous work has classified clinical trials across diseases using a standardized taxonomy allowing a comparison between global health research and global burden across diseases. We developed a knowledge-based classifier of health conditions studied in registered clinical trials towards categories of diseases and injuries from the Global Burden of Diseases (GBD) 2010 study. The classifier relies on the UMLS® knowledge source (Unified Medical Language System®) and on heuristic algorithms for parsing data. It maps trial records to a 28-class grouping of the GBD categories by automatically extracting UMLS concepts from text fields and by projecting concepts between medical terminologies. The classifier allows deriving pathways between the clinical trial record and candidate GBD categories using natural language processing and links between knowledge sources, and selects the relevant GBD classification based on rules of prioritization across the pathways found. We compared automatic and manual classifications for an external test set of 2,763 trials. We automatically classified 109,603 interventional trials registered before February 2014 at WHO ICTRP. RESULTS: In the external test set, the classifier identified the exact GBD categories for 78 % of the trials. It had very good performance for most of the 28 categories, especially "Neoplasms" (sensitivity 97.4 %, specificity 97.5 %). The sensitivity was moderate for trials not relevant to any GBD category (53 %) and low for trials of injuries (16 %). 
For the 109,603 trials registered at WHO ICTRP, the classifier did not assign any GBD category to 20.5 % of trials while the most common GBD categories were "Neoplasms" (22.8 %) and "Diabetes" (8.9 %). CONCLUSIONS: We developed and validated a knowledge-based classifier allowing for automatically identifying the diseases studied in registered trials by using the taxonomy from the GBD 2010 study. This tool is freely available to the research community and can be used for large-scale public health studies.
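The per-category sensitivity and specificity reported above can be computed one category at a time against a manual reference. The sketch below assumes, for simplicity, a single category per trial; the toy labels are invented and are not data from the study.

```python
def sensitivity_specificity(gold, predicted, category):
    """One-vs-rest sensitivity and specificity for a single category."""
    tp = fn = tn = fp = 0
    for g, p in zip(gold, predicted):
        if g == category:
            if p == category:
                tp += 1  # trial in the category, correctly identified
            else:
                fn += 1  # trial in the category, missed
        else:
            if p == category:
                fp += 1  # trial outside the category, wrongly assigned
            else:
                tn += 1  # trial outside the category, correctly left out
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec

# Toy manual (gold) and automatic classifications, invented for illustration.
gold = ["Neoplasms", "Neoplasms", "Diabetes", "Injuries", "Diabetes", "Neoplasms"]
pred = ["Neoplasms", "Neoplasms", "Diabetes", "Neoplasms", "Diabetes", "Diabetes"]
sens, spec = sensitivity_specificity(gold, pred, "Neoplasms")
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```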

4.
BMC Bioinformatics ; 15: 266, 2014 Aug 07.
Article in English | MEDLINE | ID: mdl-25099227

ABSTRACT

BACKGROUND: Natural Language Processing (NLP) has been shown to be effective for analyzing the content of radiology reports and identifying diagnoses or patient characteristics. We evaluate the combination of NLP and machine learning to detect thromboembolic disease diagnosis and incidental clinically relevant findings from angiography and venography reports written in French. We model thromboembolic diagnosis and incidental findings as a set of concepts, modalities and relations between concepts that can be used as features by a supervised machine learning algorithm. A corpus of 573 radiology reports was de-identified and manually annotated with the support of NLP tools by a physician for relevant concepts, modalities and relations. A machine learning classifier was trained on the dataset interpreted by a physician for diagnosis of deep-vein thrombosis, pulmonary embolism and clinically relevant incidental findings. Decision models accounted for the imbalanced nature of the data and exploited the structure of the reports. RESULTS: The best model achieved an F measure of 0.98 for pulmonary embolism identification, 1.00 for deep vein thrombosis, and 0.80 for incidental clinically relevant findings. The use of concepts, modalities and relations improved performance in all cases. CONCLUSIONS: This study demonstrates the benefits of developing an automated method to identify medical concepts, modality and relations from radiology reports in French. An end-to-end automatic system for annotation and classification which could be applied to other radiology reports databases would be valuable for epidemiological surveillance, performance monitoring, and accreditation in French hospitals.


Subject(s)
Computational Biology/methods , Incidental Findings , Natural Language Processing , Pulmonary Embolism/diagnostic imaging , Radiology , Research Report , Tomography, X-Ray Computed , Algorithms , Humans
5.
J Biomed Inform ; 50: 151-61, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24380818

ABSTRACT

BACKGROUND: To facilitate research applying Natural Language Processing to clinical documents, tools and resources are needed for the automatic de-identification of Electronic Health Records. OBJECTIVE: This study investigates methods for developing a high-quality reference corpus for the de-identification of clinical documents in French. METHODS: A corpus comprising a variety of clinical document types covering several medical specialties was pre-processed with two automatic de-identification systems from the MEDINA suite of tools: a rule-based system and a system using Conditional Random Fields (CRF). The pre-annotated documents were revised by two human annotators trained to mark ten categories of Protected Health Information (PHI). The human annotators worked independently and were blind to the system that produced the pre-annotations they were revising. The best pre-annotation system was applied to another random selection of 100 documents. After revision by one annotator, this set was used to train a statistical de-identification system.
RESULTS: Two gold standard sets of 100 documents were created based on the consensus of two human revisions of the automatic pre-annotations. The annotation experiment showed that (i) automatic pre-annotation obtained with the rule-based system performed better (F=0.813) than the CRF system (F=0.519), (ii) the human annotators spent more time revising the pre-annotations obtained with the rule-based system (from 102 to 160 minutes for 50 documents), compared to the CRF system (from 93 to 142 minutes for 50 documents), (iii) the quality of human annotation is higher when pre-annotations are obtained with the rule-based system (F-measure ranging from 0.970 to 0.987), compared to the CRF system (F-measure ranging from 0.914 to 0.981). Finally, only 20 documents from the training set were needed for the statistical system to outperform the pre-annotation systems that were trained on corpora from a medical speciality and hospital different from those in the reference corpus developed herein. CONCLUSION: We find that better pre-annotations increase the quality of the reference corpus but require more revision time. A statistical de-identification method outperforms our rule-based system when as few as 20 custom training documents are available.
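To give a sense of what rule-based pre-annotation for de-identification involves, here is a minimal sketch that masks two kinds of identifiers with regular expressions. The patterns, category names, and sample note are hypothetical and chosen for illustration; they are not the rules of the MEDINA system, which the abstract does not detail.

```python
import re

# Hypothetical patterns for illustration; real PHI rules are far more extensive.
PHI_PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")),      # e.g. 12/03/2013
    ("PHONE", re.compile(r"\b0\d(?:[ .]?\d{2}){4}\b")),      # e.g. 01 23 45 67 89
]

def deidentify(text):
    """Replace every match of each pattern with a category placeholder."""
    for label, pattern in PHI_PATTERNS:
        text = pattern.sub(f"<{label}>", text)
    return text

# Invented sample note.
note = "Patient vu le 12/03/2013, joignable au 01 23 45 67 89."
print(deidentify(note))
```

In a pre-annotation workflow such as the one described, output of this kind would be a starting point for human revision rather than a final result.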


Subject(s)
Electronic Health Records , France , Humans , Natural Language Processing
6.
BMC Bioinformatics ; 14: 146, 2013 Apr 30.
Article in English | MEDLINE | ID: mdl-23631733

ABSTRACT

BACKGROUND: Most of the institutional and research information in the biomedical domain is available in the form of English text. Even in countries where English is an official language, such as the United States, language can be a barrier for accessing biomedical information for non-native speakers. Recent progress in machine translation suggests that this technique could help make English texts accessible to speakers of other languages. However, the lack of adequate specialized corpora needed to train statistical models currently limits the quality of automatic translations in the biomedical domain. RESULTS: We show how a large-sized parallel corpus can automatically be obtained for the biomedical domain, using the MEDLINE database. The corpus generated in this work comprises article titles obtained from MEDLINE and abstract text automatically retrieved from journal websites, which substantially extends the corpora used in previous work. After assessing the quality of the corpus for two language pairs (English/French and English/Spanish) we use the Moses package to train a statistical machine translation model that outperforms previous models for automatic translation of biomedical text. CONCLUSIONS: We have built translation data sets in the biomedical domain that can easily be extended to other languages available in MEDLINE. These sets can successfully be applied to train statistical machine translation models. While further progress should be made by incorporating out-of-domain corpora and domain-specific lexicons, we believe that this work improves the automatic translation of biomedical texts.


Subject(s)
MEDLINE , Translating , Linguistics/methods , Models, Statistical , Publishing
7.
Stud Health Technol Inform ; 302: 768-772, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203492

ABSTRACT

Previous work has successfully used machine learning and natural language processing for the phenotyping of Rheumatoid Arthritis (RA) patients in hospitals within the United States and France. Our goal is to evaluate the adaptability of RA phenotyping algorithms to a new hospital, both at the patient and encounter levels. Two algorithms are adapted and evaluated with a newly developed RA gold standard corpus, including annotations at the encounter level. The adapted algorithms offer comparably good performance for patient-level phenotyping on the new corpus (F1 0.68 to 0.82), but lower performance for encounter-level (F1 0.54). Regarding adaptation feasibility and cost, the first algorithm incurred a heavier adaptation burden because it required manual feature engineering. However, it is less computationally intensive than the second, semi-supervised, algorithm.


Subject(s)
Arthritis, Rheumatoid , Electronic Health Records , Humans , Algorithms , Arthritis, Rheumatoid/diagnosis , Machine Learning , Natural Language Processing
8.
Bioinformatics ; 27(23): 3306-12, 2011 Dec 01.
Article in English | MEDLINE | ID: mdl-21998156

ABSTRACT

MOTIVATION: Research in the biomedical domain can have a major impact through open sharing of the data produced. For this reason, it is important to be able to identify instances of data production and deposition for potential re-use. Herein, we report on the automatic identification of data deposition statements in research articles. RESULTS: We apply machine learning algorithms to sentences extracted from full-text articles in PubMed Central in order to automatically determine whether a given article contains a data deposition statement, and retrieve the specific statements. With a Support Vector Machine classifier using conditional random field determined deposition features, articles containing deposition statements are correctly identified with 81% F-measure. An error analysis shows that almost half of the articles classified as containing a deposition statement by our method but not by the gold standard do indeed contain a deposition statement. In addition, our system was used to process articles in PubMed Central, predicting that a total of 52 932 articles report data deposition, many of which are not currently included in the Secondary Source Identifier [si] field for MEDLINE citations. AVAILABILITY: All annotated datasets described in this study are freely available from the NLM/NCBI website at http://www.ncbi.nlm.nih.gov/CBBresearch/Fellows/Neveol/DepositionDataSets.zip CONTACT: aurelie.neveol@nih.gov; john.wilbur@nih.gov; zhiyong.lu@nih.gov SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Databases, Genetic , Molecular Sequence Data , PubMed , Support Vector Machine , Artificial Intelligence , Automation , MEDLINE , National Library of Medicine (U.S.) , United States
10.
J Med Libr Assoc ; 100(3): 176-83, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22879806

ABSTRACT

BACKGROUND: As more scientific work is published, it is important to improve access to the biomedical literature. Since 2000, when Medical Subject Headings (MeSH) Concepts were introduced, the MeSH Thesaurus has been concept based. Nevertheless, information retrieval is still performed at the MeSH Descriptor or Supplementary Concept level. OBJECTIVE: The study assesses the benefit of using MeSH Concepts for indexing and information retrieval. METHODS: Three sets of queries were built for thirty-two rare diseases and twenty-two chronic diseases: (1) using PubMed Automatic Term Mapping (ATM), (2) using Catalog and Index of French-language Health Internet (CISMeF) ATM, and (3) extrapolating the MEDLINE citations that should be indexed with a MeSH Concept. RESULTS: Type 3 queries retrieve significantly fewer results than type 1 or type 2 queries (about 18,000 citations versus 200,000 for rare diseases; about 300,000 citations versus 2,000,000 for chronic diseases). CISMeF ATM also provides better precision than PubMed ATM for both disease categories. DISCUSSION: Using MeSH Concept indexing instead of ATM could, in theory, improve retrieval performance under the current indexing policy. However, using MeSH Concept information retrieval and indexing rules would be a fundamentally better approach. These modifications have already been implemented in the CISMeF search engine.


Subject(s)
Abstracting and Indexing/statistics & numerical data , Databases as Topic/statistics & numerical data , Medical Subject Headings/statistics & numerical data , Terminology as Topic , Algorithms , Chronic Disease , Electronic Data Processing , France , Humans , Information Storage and Retrieval , Language , MEDLINE/statistics & numerical data , Quality Control , Rare Diseases
11.
BMC Bioinformatics ; 12 Suppl 3: S3, 2011 Jun 09.
Article in English | MEDLINE | ID: mdl-21658290

ABSTRACT

BACKGROUND: Patient records contain valuable information regarding explanation of diagnosis, progression of disease, prescription and/or effectiveness of treatment, and more. Automatic recognition of clinically important concepts and the identification of relationships between those concepts in patient records are preliminary steps for many important applications in medical informatics, ranging from quality of care to hypothesis generation. METHODS: In this work we describe an approach that facilitates the automatic recognition of eight relationships defined between medical problems, treatments and tests. Unlike the traditional bag-of-words representation, in this work, we represent a relationship with a scheme of five distinct context-blocks determined by the position of concepts in the text. As a preliminary step to relationship recognition, and in order to provide an end-to-end system, we also addressed the automatic extraction of medical problems, treatments and tests. Our approach combined the outcome of a statistical model for concept recognition and simple natural language processing features in a conditional random fields model. A set of 826 patient records from the 4th i2b2 challenge was used for training and evaluating the system. RESULTS: Results show that our concept recognition system achieved an F-measure of 0.870 for exact span concept detection. Moreover the context-block representation of relationships was more successful (F-Measure = 0.775) at identifying relationships than bag-of-words (F-Measure = 0.402). Most importantly, the performance of the end-to-end system of relationship extraction using automatically extracted concepts (F-Measure = 0.704) was comparable to that obtained using manually annotated concepts (F-Measure = 0.711), and their difference was not statistically significant. 
CONCLUSIONS: We extracted important clinical relationships from text in an automated manner, starting with concept recognition, and ending with relationship identification. The advantage of the context-blocks representation scheme was the correct management of word position information, which may be critical in identifying certain relationships. Our results may serve as a benchmark for comparison to other systems developed on i2b2 challenge data. Finally, our system may serve as a preliminary step for other discovery tasks in medical informatics.
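The abstract states that each relationship is represented by five context-blocks determined by the position of the concepts, without enumerating them. One plausible reading, offered here purely as an assumption, is a split into the text before the first concept, the first concept, the text between, the second concept, and the text after. The function name, block names, and sample sentence below are hypothetical.

```python
def context_blocks(tokens, span1, span2):
    """Split a tokenized sentence into five blocks around two concept spans.

    span1 and span2 are (start, end) token offsets with end exclusive,
    and span1 must end before span2 starts.
    """
    (s1, e1), (s2, e2) = span1, span2
    if not (0 <= s1 < e1 <= s2 < e2 <= len(tokens)):
        raise ValueError("spans must be ordered, non-empty and non-overlapping")
    return {
        "before": tokens[:s1],
        "concept1": tokens[s1:e1],
        "between": tokens[e1:s2],
        "concept2": tokens[s2:e2],
        "after": tokens[e2:],
    }

# Invented sentence: a medical problem ("chest pain") and a treatment ("aspirin").
tokens = "The chest pain improved after aspirin was given".split()
blocks = context_blocks(tokens, (1, 3), (5, 6))
for name, words in blocks.items():
    print(name, words)
```

Unlike a single bag of words, features drawn from each block separately keep track of where a word sits relative to the two concepts, which matches the advantage the authors attribute to the scheme.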


Subject(s)
Medical Records Systems, Computerized , Medical Records, Problem-Oriented , Models, Statistical , Natural Language Processing , Humans
12.
J Biomed Inform ; 44(2): 310-8, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21094696

ABSTRACT

Information processing algorithms require significant amounts of annotated data for training and testing. The availability of such data is often hindered by the complexity and high cost of production. In this paper, we investigate the benefits of a state-of-the-art tool to help with the semantic annotation of a large set of biomedical queries. Seven annotators were recruited to annotate a set of 10,000 PubMed® queries with 16 biomedical and bibliographic categories. About half of the queries were annotated from scratch, while the other half were automatically pre-annotated and manually corrected. The impact of the automatic pre-annotations was assessed on several aspects of the task: time, number of actions, annotator satisfaction, inter-annotator agreement, quality and number of the resulting annotations. The analysis of annotation results showed that the number of required hand annotations is 28.9% lower when using pre-annotated results from automatic tools. As a result, the overall annotation time was substantially lower when pre-annotations were used, while inter-annotator agreement was significantly higher. In addition, there was no statistically significant difference in the semantic distribution or number of annotations produced when pre-annotations were used. The annotated query corpus is freely available to the research community. This study shows that automatic pre-annotations are found helpful by most annotators. Our experience suggests using an automatic tool to assist large-scale manual annotation projects. This helps speed up annotation and improve annotation consistency while maintaining high quality of the final annotations.
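The abstract does not say which statistic was used for inter-annotator agreement. As an illustration only, the sketch below computes Cohen's kappa, a common chance-corrected agreement measure for two annotators; the category names and labels are invented.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

# Invented annotations of six queries by two annotators.
annotator_a = ["Gene", "Disorder", "Gene", "Author", "Disorder", "Gene"]
annotator_b = ["Gene", "Disorder", "Author", "Author", "Disorder", "Gene"]
kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.2f}")
```

Here the annotators agree on 5 of 6 items (observed 0.833) against a chance agreement of 0.333, giving a kappa of 0.75.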


Subject(s)
PubMed , Semantics , Information Storage and Retrieval , Natural Language Processing , Vocabulary, Controlled
13.
Stud Health Technol Inform ; 169: 492-6, 2011.
Article in English | MEDLINE | ID: mdl-21893798

ABSTRACT

BACKGROUND: Following a recent change in the indexing policy for the French quality-controlled health gateway CISMeF, multiple terminologies are now being used for indexing in addition to MeSH®. OBJECTIVE: To evaluate precision and recall of super-concepts for information retrieval in a multi-terminology paradigm compared to MeSH-only. METHODS: We evaluate the relevance of resources retrieved by multi-terminology super-concept and MeSH-only super-concept queries. RESULTS: Recall was 8-14% higher for multi-terminology super-concepts compared to MeSH-only super-concepts. Precision decreased from 0.66 for MeSH-only super-concepts to 0.61 for multi-terminology super-concepts. Retrieval performance was found to vary significantly depending on the super-concepts (p<10^-4) and indexing methods (manual vs automatic; p<0.004). CONCLUSION: A multi-terminology paradigm contributes to increasing recall but lowers precision. Automated tools for indexing are not accurate enough to allow very precise information retrieval.
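The recall gain and precision loss reported here follow the usual retrieval trade-off, which a small set-based sketch can make concrete. The resource identifiers and both result lists are invented toy data, not results from CISMeF.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Invented relevance judgments and result lists for a single query.
relevant = {"r1", "r2", "r3", "r4", "r5"}
mesh_only = ["r1", "r2", "r9"]                            # narrower query
multi_terminology = ["r1", "r2", "r3", "r8", "r9", "r10"]  # broader query

p_mesh, r_mesh = precision_recall(mesh_only, relevant)
p_multi, r_multi = precision_recall(multi_terminology, relevant)
print(f"MeSH-only:         precision={p_mesh:.2f} recall={r_mesh:.2f}")
print(f"multi-terminology: precision={p_multi:.2f} recall={r_multi:.2f}")
```

In this toy case the broader query finds one more relevant resource (recall 0.40 to 0.60) at the cost of more irrelevant ones (precision 0.67 to 0.50), the same direction of change as in the abstract.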


Subject(s)
Abstracting and Indexing , Information Storage and Retrieval/methods , Medical Informatics/methods , Algorithms , Catalogs as Topic , Electronic Data Processing , Humans , Internet , Medical Subject Headings , Reproducibility of Results , Software , Statistics as Topic , Terminology as Topic
14.
J Am Med Inform Assoc ; 28(3): 504-515, 2021 03 01.
Article in English | MEDLINE | ID: mdl-33319904

ABSTRACT

BACKGROUND: The increasing complexity of data streams and computational processes in modern clinical health information systems makes reproducibility challenging. Clinical natural language processing (NLP) pipelines are routinely leveraged for the secondary use of data. Workflow management systems (WMS) have been widely used in bioinformatics to handle the reproducibility bottleneck. OBJECTIVE: To evaluate if WMS and other bioinformatics practices could impact the reproducibility of clinical NLP frameworks. MATERIALS AND METHODS: Based on the literature across multiple research fields (NLP, bioinformatics and clinical informatics) we selected articles which (1) review reproducibility practices and (2) highlight a set of rules or guidelines to ensure tool or pipeline reproducibility. We aggregate insight from the literature to define reproducibility recommendations. Finally, we assess the compliance of 7 NLP frameworks to the recommendations. RESULTS: We identified 40 reproducibility features from 8 selected articles. Frameworks based on WMS match more than 50% of features (26 features for LAPPS Grid, 22 features for OpenMinted) compared to 18 features for current clinical NLP frameworks (cTakes, CLAMP) and 17 features for GATE, ScispaCy, and Textflows. DISCUSSION: 34 recommendations are endorsed by at least 2 articles from our selection. Overall, 15 features were adopted by every NLP Framework. Nevertheless, frameworks based on WMS had a better compliance with the features. CONCLUSION: NLP frameworks could benefit from lessons learned from the bioinformatics field (eg, public repositories of curated tools and workflows or use of containers for shareability) to enhance the reproducibility in a clinical setting.
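The "more than 50% of features" statement can be checked directly from the counts given in the abstract (40 features in total). The short calculation below uses only those reported figures; the grouping of frameworks into dictionary keys is a convenience of this sketch.

```python
TOTAL_FEATURES = 40  # reproducibility features identified in the study

# Matched feature counts as reported in the abstract.
matched = {
    "LAPPS Grid": 26,
    "OpenMinted": 22,
    "cTakes / CLAMP": 18,
    "GATE / ScispaCy / Textflows": 17,
}

share = {name: count / TOTAL_FEATURES for name, count in matched.items()}
for name, value in share.items():
    print(f"{name}: {value:.1%}")
```

This gives 65.0% and 55.0% for the two WMS-based frameworks, against 45.0% and 42.5% for the others, consistent with the 50% threshold stated in the results.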


Subject(s)
Natural Language Processing , Reproducibility of Results , Computational Biology , Database Management Systems , Medical Informatics
15.
Stud Health Technol Inform ; 281: 1031-1035, 2021 May 27.
Article in English | MEDLINE | ID: mdl-34042835

ABSTRACT

Diversity, inclusion and interdisciplinary collaboration are drivers for healthcare innovation and adoption of new, technology-mediated services. The importance of diversity has been highlighted by the United Nations in SDG5 "Achieve gender equality and empower all women and girls", to drive adoption of social and digital innovation. Women play an instrumental role in health care and are in a position to bring about significant changes to support ongoing digitalization and transformation. At the same time, women are underrepresented in Science, Technology, Engineering and Mathematics (STEM). To some extent, the same holds for health care informatics. This paper sums up input to strategies for peer mentoring to ensure diversity in health informatics, to target systemic inequalities and build sustainable, intergenerational communities, improve digital health literacy and build capacity in digital health without losing the human touch.


Subject(s)
Medical Informatics , Mentoring , Engineering , Female , Humans , Leadership , Mentors
16.
J Biomed Inform ; 42(5): 814-23, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19166973

ABSTRACT

The volume of biomedical literature has experienced explosive growth in recent years. This is reflected in the corresponding increase in the size of MEDLINE, the largest bibliographic database of biomedical citations. Indexers at the US National Library of Medicine (NLM) need efficient tools to help them accommodate the ensuing workload. After reviewing issues in the automatic assignment of Medical Subject Headings (MeSH terms) to biomedical text, we focus more specifically on the new subheading attachment feature for NLM's Medical Text Indexer (MTI). Natural Language Processing, statistical, and machine learning methods of producing automatic MeSH main heading/subheading pair recommendations were assessed independently and combined. The best combination achieves 48% precision and 30% recall. After validation by NLM indexers, a suitable combination of the methods presented in this paper was integrated into MTI as a subheading attachment feature producing MeSH indexing recommendations compliant with current state-of-the-art indexing practice.
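The abstract reports precision and recall but no combined score. For readers who want a single figure, the balanced F-measure can be derived from the two reported values; the result below is a derived calculation, not a number stated by the authors.

```python
def f_measure(precision, recall, beta=1.0):
    """F-measure: weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)

# Precision and recall of the best combination, as reported in the abstract.
f1 = f_measure(0.48, 0.30)
print(f"F1 = {f1:.3f}")
```

With precision 0.48 and recall 0.30, F1 = 2 x 0.48 x 0.30 / (0.48 + 0.30), which is approximately 0.369.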


Subject(s)
Abstracting and Indexing/methods , Artificial Intelligence , MEDLINE , Medical Subject Headings , Natural Language Processing , Dictionaries, Medical as Topic , Evaluation Studies as Topic , Humans , User-Computer Interface
17.
Database (Oxford) ; 2019, 2019 01 01.
Article in English | MEDLINE | ID: mdl-31697361

ABSTRACT

Curated databases of scientific literature play an important role in helping researchers find relevant literature, but populating such databases is a labour-intensive and time-consuming process. One such database is the freely accessible Comet Core Outcome Set database, which was originally populated using manual screening in an annually updated systematic review. In order to reduce the workload and facilitate more timely updates, we are evaluating machine learning methods to reduce the number of references that need to be screened. In this study we have evaluated a machine learning approach based on logistic regression to automatically rank the candidate articles. Data from the original systematic review and its first four review updates were used to train the model and evaluate performance. We estimated that using automatic screening would yield a workload reduction of at least 75% while keeping the number of missed references around 2%. We judged this to be an acceptable trade-off for this systematic review, and the method is now being used for the next round of the Comet database update.
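The trade-off between workload reduction and missed references can be made concrete with a small sketch. It assumes that a model has already assigned each candidate reference a relevance score and that only the top-ranked references are screened manually. The scores, labels, and cutoff are invented, and the definitions of the two quantities are one reasonable reading, not necessarily those used in the study.

```python
def screening_tradeoff(scores, labels, n_screened):
    """Screen the n_screened highest-scoring references and skip the rest.

    labels holds 1 for relevant references and 0 for irrelevant ones.
    Returns (workload_reduction, missed_fraction).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    screened = ranked[:n_screened]
    total_relevant = sum(labels)
    found = sum(label for _, label in screened)
    # Share of candidate references that never need manual screening.
    workload_reduction = 1 - n_screened / len(ranked)
    # Share of relevant references left in the unscreened remainder.
    missed_fraction = (total_relevant - found) / total_relevant if total_relevant else 0.0
    return workload_reduction, missed_fraction

# Invented model scores and true relevance for eight candidate references.
scores = [0.95, 0.90, 0.80, 0.40, 0.35, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

reduction, missed = screening_tradeoff(scores, labels, n_screened=4)
print(f"workload reduction={reduction:.0%}, missed={missed:.0%}")
```

Moving the cutoff changes both numbers together: screening only the top two references in this toy list would save 75% of the work but miss one of the three relevant references.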


Subject(s)
Data Curation , Data Mining , Databases, Factual , Machine Learning , Systematic Reviews as Topic
18.
Syst Rev ; 8(1): 243, 2019 10 28.
Article in English | MEDLINE | ID: mdl-31661028

ABSTRACT

BACKGROUND: The large and increasing number of new studies published each year is making literature identification in systematic reviews ever more time-consuming and costly. Technological assistance has been suggested as an alternative to the conventional, manual study identification to mitigate the cost, but previous literature has mainly evaluated methods in terms of recall (search sensitivity) and workload reduction. There is a need to also evaluate whether screening prioritization methods lead to the same results and conclusions as exhaustive manual screening. In this study, we examined the impact of one screening prioritization method based on active learning on sensitivity and specificity estimates in systematic reviews of diagnostic test accuracy. METHODS: We simulated the screening process in 48 Cochrane reviews of diagnostic test accuracy and re-ran 400 meta-analyses based on at least 3 studies. We compared screening prioritization (with technological assistance) and screening in randomized order (standard practice without technology assistance). We examined if the screening could have been stopped before identifying all relevant studies while still producing reliable summary estimates. For all meta-analyses, we also examined the relationship between the number of relevant studies and the reliability of the final estimates. RESULTS: The main meta-analysis in each systematic review could have been performed after screening an average of 30% of the candidate articles (range 0.07 to 100%). No systematic review would have required screening more than 2308 studies, whereas manual screening would have required screening up to 43,363 studies. Despite an average 70% recall, the estimation error would have been 1.3% on average, compared to an average 2% estimation error expected when replicating summary estimate calculations.
CONCLUSION: Screening prioritization coupled with stopping criteria in diagnostic test accuracy reviews can reliably detect when the screening process has identified a sufficient number of studies to perform the main meta-analysis with an accuracy within pre-specified tolerance limits. However, many of the systematic reviews did not identify a sufficient number of studies for the meta-analyses to be accurate within a 2% limit even with exhaustive manual screening, i.e., using current practice.


Subject(s)
Automation , Diagnostic Tests, Routine , Mass Screening , Humans , Diagnostic Tests, Routine/standards , Reproducibility of Results , Research Design , Sensitivity and Specificity , Systematic Reviews as Topic , Meta-Analysis as Topic
19.
BMC Bioinformatics ; 9 Suppl 11: S11, 2008 Nov 19.
Article in English | MEDLINE | ID: mdl-19025687

ABSTRACT

BACKGROUND: Indexing is a crucial step in any information retrieval system. In MEDLINE, a widely used database of the biomedical literature, the indexing process involves the selection of Medical Subject Headings in order to describe the subject matter of articles. The need for automatic tools to assist MEDLINE indexers in this task is growing with the increasing number of publications being added to MEDLINE. METHODS: In this paper, we describe the use and the customization of Inductive Logic Programming (ILP) to infer indexing rules that may be used to produce automatic indexing recommendations for MEDLINE indexers. RESULTS: Our results show that this original ILP-based approach outperforms manual rules when they exist. In addition, the use of ILP rules also improves the overall performance of the Medical Text Indexer (MTI), a system producing automatic indexing recommendations for MEDLINE. CONCLUSION: We expect the sets of ILP rules obtained in this experiment to be integrated into MTI.


Subject(s)
Abstracting and Indexing/methods , MEDLINE , Medical Subject Headings , Natural Language Processing , Algorithms , Artificial Intelligence , Programming Languages
20.
Yearb Med Inform ; 27(1): 193-198, 2018 Aug.
Article in English | MEDLINE | ID: mdl-30157523

ABSTRACT

OBJECTIVES: To summarize recent research and present a selection of the best papers published in 2017 in the field of clinical Natural Language Processing (NLP). METHODS: A survey of the literature was performed by the two editors of the NLP section of the International Medical Informatics Association (IMIA) Yearbook. Bibliographic databases PubMed and Association of Computational Linguistics (ACL) Anthology were searched for papers with a focus on NLP efforts applied to clinical texts or aimed at a clinical outcome. A total of 709 papers were automatically ranked and then manually reviewed based on title and abstract. A shortlist of 15 candidate best papers was selected by the section editors and peer-reviewed by independent external reviewers to come to the three best clinical NLP papers for 2017. RESULTS: Clinical NLP best papers provide a contribution that ranges from methodological studies to the application of research results to practical clinical settings. They draw from text genres as diverse as clinical narratives across hospitals and languages or social media. CONCLUSIONS: Clinical NLP continued to thrive in 2017, with an increasing number of contributions towards applications compared to fundamental methods. Methodological work explores deep learning and system adaptation across language variants. Research results continue to translate into freely available tools and corpora, mainly for the English language.


Subject(s)
Natural Language Processing , Health Personnel , Humans , Medical Informatics