Search | VHL CLAP/SMR-PAHO/WHO

1.

Privacy-preserving mimic models for clinical named entity recognition in French.

Bannour, Nesrine; Wajsbürt, Perceval; Rance, Bastien; Tannier, Xavier; Névéol, Aurélie.

J Biomed Inform ; 130: 104073, 2022 06.

Article in English | MEDLINE | ID: mdl-35427797

ABSTRACT

A vast amount of crucial information about patients resides solely in unstructured clinical narrative notes. There has been growing interest in the clinical Named Entity Recognition (NER) task using deep learning models. Such approaches require sufficient annotated data. However, few annotated corpora are publicly available in the medical field due to the sensitive nature of clinical text. In this paper, we tackle this problem by building privacy-preserving shareable models for French clinical Named Entity Recognition using the mimic learning approach, which transfers knowledge from a teacher model trained on a private corpus to a student model. This student model can be publicly shared without any access to the original sensitive data. We evaluated three privacy-preserving models using three medical corpora and compared the performance of our models to those of baseline models such as dictionary-based models. An overall macro F-measure of 70.6% could be achieved by a student model trained using silver annotations produced by the teacher model, compared to 85.7% for the original private teacher model. Our results revealed that these privacy-preserving mimic learning models offer a good compromise between performance and data privacy preservation.
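For readers unfamiliar with the metric, the following is a minimal sketch of how a macro F-measure is usually computed: an F-measure per entity type, then an unweighted mean across types. The entity labels and counts are hypothetical and are not taken from the paper; the abstract does not state which exact variant the authors used.

```python
def f_measure(tp, fp, fn):
    """Balanced F-measure from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f_measure(counts_by_type):
    """Unweighted mean of per-type F-measures, so rare types weigh as much as frequent ones."""
    scores = [f_measure(*counts) for counts in counts_by_type.values()]
    return sum(scores) / len(scores)


# Hypothetical (tp, fp, fn) counts per entity type, for illustration only.
example_counts = {
    "DISORDER": (8, 2, 2),
    "DRUG": (5, 5, 5),
    "PROCEDURE": (0, 1, 3),
}
macro_score = macro_f_measure(example_counts)
```

Because each entity type contributes equally, a type the model never recognizes pulls the macro score down sharply even when it is rare in the corpus.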

Subjects

Narration; Privacy; Humans; Natural Language Processing

2.

Automatic classification of registered clinical trials towards the Global Burden of Diseases taxonomy of diseases and injuries.

Atal, Ignacio; Zeitoun, Jean-David; Névéol, Aurélie; Ravaud, Philippe; Porcher, Raphaël; Trinquart, Ludovic.

BMC Bioinformatics ; 17(1): 392, 2016 Sep 22.

Article in English | MEDLINE | ID: mdl-27659604

ABSTRACT

BACKGROUND: Clinical trial registries may allow for producing a global mapping of health research. However, health conditions are not described with standardized taxonomies in registries. Previous work analyzed clinical trial registries to improve the retrieval of relevant clinical trials for patients. However, no previous work has classified clinical trials across diseases using a standardized taxonomy allowing a comparison between global health research and global burden across diseases. We developed a knowledge-based classifier of health conditions studied in registered clinical trials towards categories of diseases and injuries from the Global Burden of Diseases (GBD) 2010 study. The classifier relies on the UMLS® knowledge source (Unified Medical Language System®) and on heuristic algorithms for parsing data. It maps trial records to a 28-class grouping of the GBD categories by automatically extracting UMLS concepts from text fields and by projecting concepts between medical terminologies. The classifier allows deriving pathways between the clinical trial record and candidate GBD categories using natural language processing and links between knowledge sources, and selects the relevant GBD classification based on rules of prioritization across the pathways found. We compared automatic and manual classifications for an external test set of 2,763 trials. We automatically classified 109,603 interventional trials registered before February 2014 at WHO ICTRP. RESULTS: In the external test set, the classifier identified the exact GBD categories for 78 % of the trials. It had very good performance for most of the 28 categories, especially "Neoplasms" (sensitivity 97.4 %, specificity 97.5 %). The sensitivity was moderate for trials not relevant to any GBD category (53 %) and low for trials of injuries (16 %). 
For the 109,603 trials registered at WHO ICTRP, the classifier did not assign any GBD category to 20.5 % of trials while the most common GBD categories were "Neoplasms" (22.8 %) and "Diabetes" (8.9 %). CONCLUSIONS: We developed and validated a knowledge-based classifier allowing for automatically identifying the diseases studied in registered trials by using the taxonomy from the GBD 2010 study. This tool is freely available to the research community and can be used for large-scale public health studies.

3.

Ten simple rules to make your research more sustainable.

Ligozat, Anne-Laure; Névéol, Aurélie; Daly, Bénédicte; Frenoux, Emmanuelle.

PLoS Comput Biol ; 16(9): e1008148, 2020 09.

Article in English | MEDLINE | ID: mdl-32970666

Subjects

Biomedical Research; Carbon Footprint/standards; Cooperative Behavior; Sustainable Development; Conservation of Natural Resources; Health Knowledge, Attitudes, Practice; Humans; Internet; Travel

4.

Natural language processing of radiology reports for the detection of thromboembolic diseases and clinically relevant incidental findings.

Pham, Anne-Dominique; Névéol, Aurélie; Lavergne, Thomas; Yasunaga, Daisuke; Clément, Olivier; Meyer, Guy; Morello, Rémy; Burgun, Anita.

BMC Bioinformatics ; 15: 266, 2014 Aug 07.

Article in English | MEDLINE | ID: mdl-25099227

ABSTRACT

BACKGROUND: Natural Language Processing (NLP) has been shown to be effective for analyzing the content of radiology reports and identifying diagnoses or patient characteristics. We evaluate the combination of NLP and machine learning to detect thromboembolic disease diagnosis and incidental clinically relevant findings from angiography and venography reports written in French. We model thromboembolic diagnosis and incidental findings as a set of concepts, modalities and relations between concepts that can be used as features by a supervised machine learning algorithm. A corpus of 573 radiology reports was de-identified and manually annotated with the support of NLP tools by a physician for relevant concepts, modalities and relations. A machine learning classifier was trained on the dataset interpreted by a physician for diagnosis of deep-vein thrombosis, pulmonary embolism and clinically relevant incidental findings. Decision models accounted for the imbalanced nature of the data and exploited the structure of the reports. RESULTS: The best model achieved an F measure of 0.98 for pulmonary embolism identification, 1.00 for deep vein thrombosis, and 0.80 for incidental clinically relevant findings. The use of concepts, modalities and relations improved performance in all cases. CONCLUSIONS: This study demonstrates the benefits of developing an automated method to identify medical concepts, modality and relations from radiology reports in French. An end-to-end automatic system for annotation and classification that could be applied to other radiology report databases would be valuable for epidemiological surveillance, performance monitoring, and accreditation in French hospitals.

Subjects

Computational Biology/methods; Incidental Findings; Natural Language Processing; Pulmonary Embolism/diagnostic imaging; Radiology; Research Report; Tomography, X-Ray Computed; Algorithms; Humans

5.

De-identification of clinical notes in French: towards a protocol for reference corpus development.

Grouin, Cyril; Névéol, Aurélie.

J Biomed Inform ; 50: 151-61, 2014 Aug.

Article in English | MEDLINE | ID: mdl-24380818

ABSTRACT

BACKGROUND: To facilitate research applying Natural Language Processing to clinical documents, tools and resources are needed for the automatic de-identification of Electronic Health Records. OBJECTIVE: This study investigates methods for developing a high-quality reference corpus for the de-identification of clinical documents in French. METHODS: A corpus comprising a variety of clinical document types covering several medical specialties was pre-processed with two automatic de-identification systems from the MEDINA suite of tools: a rule-based system and a system using Conditional Random Fields (CRF). The pre-annotated documents were revised by two human annotators trained to mark ten categories of Protected Health Information (PHI). The human annotators worked independently and were blind to the system that produced the pre-annotations they were revising. The best pre-annotation system was applied to another random selection of 100 documents. After revision by one annotator, this set was used to train a statistical de-identification system.
RESULTS: Two gold standard sets of 100 documents were created based on the consensus of two human revisions of the automatic pre-annotations. The annotation experiment showed that (i) automatic pre-annotation obtained with the rule-based system performed better (F=0.813) than the CRF system (F=0.519), (ii) the human annotators spent more time revising the pre-annotations obtained with the rule-based system (from 102 to 160 minutes for 50 documents), compared to the CRF system (from 93 to 142 minutes for 50 documents), (iii) the quality of human annotation is higher when pre-annotations are obtained with the rule-based system (F-measure ranging from 0.970 to 0.987), compared to the CRF system (F-measure ranging from 0.914 to 0.981). Finally, only 20 documents from the training set were needed for the statistical system to outperform the pre-annotation systems that were trained on corpora from a medical speciality and hospital different from those in the reference corpus developed herein. CONCLUSION: We find that better pre-annotations increase the quality of the reference corpus but require more revision time. A statistical de-identification method outperforms our rule-based system when as few as 20 custom training documents are available.

Subjects

Electronic Health Records; France; Humans; Natural Language Processing

6.

Combining MEDLINE and publisher data to create parallel corpora for the automatic translation of biomedical text.

Yepes, Antonio Jimeno; Prieur-Gaston, Elise; Névéol, Aurélie.

BMC Bioinformatics ; 14: 146, 2013 Apr 30.

Article in English | MEDLINE | ID: mdl-23631733

ABSTRACT

BACKGROUND: Most of the institutional and research information in the biomedical domain is available in the form of English text. Even in countries where English is an official language, such as the United States, language can be a barrier for accessing biomedical information for non-native speakers. Recent progress in machine translation suggests that this technique could help make English texts accessible to speakers of other languages. However, the lack of adequate specialized corpora needed to train statistical models currently limits the quality of automatic translations in the biomedical domain. RESULTS: We show how a large-sized parallel corpus can automatically be obtained for the biomedical domain, using the MEDLINE database. The corpus generated in this work comprises article titles obtained from MEDLINE and abstract text automatically retrieved from journal websites, which substantially extends the corpora used in previous work. After assessing the quality of the corpus for two language pairs (English/French and English/Spanish) we use the Moses package to train a statistical machine translation model that outperforms previous models for automatic translation of biomedical text. CONCLUSIONS: We have built translation data sets in the biomedical domain that can easily be extended to other languages available in MEDLINE. These sets can successfully be applied to train statistical machine translation models. While further progress should be made by incorporating out-of-domain corpora and domain-specific lexicons, we believe that this work improves the automatic translation of biomedical texts.

Subjects

MEDLINE; Translating; Linguistics/methods; Models, Statistical; Publishing

7.

Evaluating the Portability of Rheumatoid Arthritis Phenotyping Algorithms: A Case Study on French EHRs.

Fabacher, Thibaut; Sauleau, Erik-André; Leclerc Du Sablon, Noémie; Bergier, Hugo; Gottenberg, Jacques-Eric; Coulet, Adrien; Névéol, Aurélie.

Stud Health Technol Inform ; 302: 768-772, 2023 May 18.

Article in English | MEDLINE | ID: mdl-37203492

ABSTRACT

Previous work has successfully used machine learning and natural language processing for the phenotyping of Rheumatoid Arthritis (RA) patients in hospitals within the United States and France. Our goal is to evaluate the adaptability of RA phenotyping algorithms to a new hospital, both at the patient and encounter levels. Two algorithms are adapted and evaluated with a newly developed RA gold standard corpus, including annotations at the encounter level. The adapted algorithms offer comparably good performance for patient-level phenotyping on the new corpus (F1 0.68 to 0.82), but lower performance for encounter-level (F1 0.54). Regarding adaptation feasibility and cost, the first algorithm incurred a heavier adaptation burden because it required manual feature engineering. However, it is less computationally intensive than the second, semi-supervised, algorithm.

Subjects

Arthritis, Rheumatoid; Electronic Health Records; Humans; Algorithms; Arthritis, Rheumatoid/diagnosis; Machine Learning; Natural Language Processing

8.

Extraction of data deposition statements from the literature: a method for automatically tracking research results.

Névéol, Aurélie; Wilbur, W John; Lu, Zhiyong.

Bioinformatics ; 27(23): 3306-12, 2011 Dec 01.

Article in English | MEDLINE | ID: mdl-21998156

ABSTRACT

MOTIVATION: Research in the biomedical domain can have a major impact through open sharing of the data produced. For this reason, it is important to be able to identify instances of data production and deposition for potential re-use. Herein, we report on the automatic identification of data deposition statements in research articles. RESULTS: We apply machine learning algorithms to sentences extracted from full-text articles in PubMed Central in order to automatically determine whether a given article contains a data deposition statement, and retrieve the specific statements. With a Support Vector Machine classifier using deposition features determined by a conditional random field, articles containing deposition statements are correctly identified with 81% F-measure. An error analysis shows that almost half of the articles classified as containing a deposition statement by our method but not by the gold standard do indeed contain a deposition statement. In addition, our system was used to process articles in PubMed Central, predicting that a total of 52 932 articles report data deposition, many of which are not currently included in the Secondary Source Identifier [si] field for MEDLINE citations. AVAILABILITY: All annotated datasets described in this study are freely available from the NLM/NCBI website at http://www.ncbi.nlm.nih.gov/CBBresearch/Fellows/Neveol/DepositionDataSets.zip CONTACT: aurelie.neveol@nih.gov; john.wilbur@nih.gov; zhiyong.lu@nih.gov SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms; Databases, Genetic; Molecular Sequence Data; PubMed; Support Vector Machine; Artificial Intelligence; Automation; MEDLINE; National Library of Medicine (U.S.); United States

9.

Contribution of Natural Language Processing in Predicting Rehospitalization Risk.

Norman, Christopher; Nguyen, Thu Van; Névéol, Aurélie.

Med Care ; 55(8): 781, 2017 08.

Article in English | MEDLINE | ID: mdl-28549001

Subjects

Natural Language Processing; Patient Readmission; Humans; Risk; Risk Factors

10.

Improving information retrieval using Medical Subject Headings Concepts: a test case on rare and chronic diseases.

Darmoni, Stéfan J; Soualmia, Lina F; Letord, Catherine; Jaulent, Marie-Christine; Griffon, Nicolas; Thirion, Benoît; Névéol, Aurélie.

J Med Libr Assoc ; 100(3): 176-83, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22879806

ABSTRACT

BACKGROUND: As more scientific work is published, it is important to improve access to the biomedical literature. Since 2000, when Medical Subject Headings (MeSH) Concepts were introduced, the MeSH Thesaurus has been concept based. Nevertheless, information retrieval is still performed at the MeSH Descriptor or Supplementary Concept level. OBJECTIVE: The study assesses the benefit of using MeSH Concepts for indexing and information retrieval. METHODS: Three sets of queries were built for thirty-two rare diseases and twenty-two chronic diseases: (1) using PubMed Automatic Term Mapping (ATM), (2) using Catalog and Index of French-language Health Internet (CISMeF) ATM, and (3) extrapolating the MEDLINE citations that should be indexed with a MeSH Concept. RESULTS: Type 3 queries retrieve significantly fewer results than type 1 or type 2 queries (about 18,000 citations versus 200,000 for rare diseases; about 300,000 citations versus 2,000,000 for chronic diseases). CISMeF ATM also provides better precision than PubMed ATM for both disease categories. DISCUSSION: Using MeSH Concept indexing instead of ATM could, in theory, improve retrieval performance under the current indexing policy. However, using MeSH Concept information retrieval and indexing rules would be a fundamentally better approach. These modifications have already been implemented in the CISMeF search engine.

Subjects

Abstracting and Indexing/statistics & numerical data; Databases as Topic/statistics & numerical data; Medical Subject Headings/statistics & numerical data; Terminology as Topic; Algorithms; Chronic Disease; Electronic Data Processing; France; Humans; Information Storage and Retrieval; Language; MEDLINE/statistics & numerical data; Quality Control; Rare Diseases

11.

A context-blocks model for identifying clinical relationships in patient records.

Islamaj Dogan, Rezarta; Névéol, Aurélie; Lu, Zhiyong.

BMC Bioinformatics ; 12 Suppl 3: S3, 2011 Jun 09.

Article in English | MEDLINE | ID: mdl-21658290

ABSTRACT

BACKGROUND: Patient records contain valuable information regarding explanation of diagnosis, progression of disease, prescription and/or effectiveness of treatment, and more. Automatic recognition of clinically important concepts and the identification of relationships between those concepts in patient records are preliminary steps for many important applications in medical informatics, ranging from quality of care to hypothesis generation. METHODS: In this work we describe an approach that facilitates the automatic recognition of eight relationships defined between medical problems, treatments and tests. Unlike the traditional bag-of-words representation, in this work, we represent a relationship with a scheme of five distinct context-blocks determined by the position of concepts in the text. As a preliminary step to relationship recognition, and in order to provide an end-to-end system, we also addressed the automatic extraction of medical problems, treatments and tests. Our approach combined the outcome of a statistical model for concept recognition and simple natural language processing features in a conditional random fields model. A set of 826 patient records from the 4th i2b2 challenge was used for training and evaluating the system. RESULTS: Results show that our concept recognition system achieved an F-measure of 0.870 for exact span concept detection. Moreover, the context-block representation of relationships was more successful (F-Measure = 0.775) at identifying relationships than bag-of-words (F-Measure = 0.402). Most importantly, the performance of the end-to-end system of relationship extraction using automatically extracted concepts (F-Measure = 0.704) was comparable to that obtained using manually annotated concepts (F-Measure = 0.711), and their difference was not statistically significant.
CONCLUSIONS: We extracted important clinical relationships from text in an automated manner, starting with concept recognition, and ending with relationship identification. The advantage of the context-blocks representation scheme was the correct management of word position information, which may be critical in identifying certain relationships. Our results may serve as a benchmark for comparison with other systems developed on i2b2 challenge data. Finally, our system may serve as a preliminary step for other discovery tasks in medical informatics.
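The abstract says only that a relationship is represented by five context-blocks determined by the position of the concepts. As a rough illustration, the sketch below assumes the five blocks are the text before the first concept, the first concept, the text between the two concepts, the second concept, and the text after it. That division, the function name, and the example sentence are assumptions made here, not details confirmed by the paper.

```python
def context_blocks(tokens, span_a, span_b):
    """Split a token list into five position-based blocks around two concepts.

    span_a and span_b are (start, end) token offsets with end exclusive.
    The spans are assumed not to overlap; they may be given in either order.
    """
    first, second = sorted([span_a, span_b])
    return {
        "before": tokens[:first[0]],
        "concept_1": tokens[first[0]:first[1]],
        "between": tokens[first[1]:second[0]],
        "concept_2": tokens[second[0]:second[1]],
        "after": tokens[second[1]:],
    }


# Hypothetical sentence with a treatment ("aspirin") and a problem ("chest pain").
sentence = "The patient was given aspirin for chest pain today".split()
blocks = context_blocks(sentence, span_a=(4, 5), span_b=(6, 8))
```

Unlike a single bag of words, keeping the blocks separate preserves where a cue word such as "for" occurs relative to the two concepts, which is the word-position information the conclusions highlight.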

Subjects

Medical Records Systems, Computerized; Medical Records, Problem-Oriented; Models, Statistical; Natural Language Processing; Humans

12.

Semi-automatic semantic annotation of PubMed queries: a study on quality, efficiency, satisfaction.

Névéol, Aurélie; Islamaj Dogan, Rezarta; Lu, Zhiyong.

J Biomed Inform ; 44(2): 310-8, 2011 Apr.

Article in English | MEDLINE | ID: mdl-21094696

ABSTRACT

Information processing algorithms require significant amounts of annotated data for training and testing. The availability of such data is often hindered by the complexity and high cost of production. In this paper, we investigate the benefits of a state-of-the-art tool to help with the semantic annotation of a large set of biomedical queries. Seven annotators were recruited to annotate a set of 10,000 PubMed® queries with 16 biomedical and bibliographic categories. About half of the queries were annotated from scratch, while the other half were automatically pre-annotated and manually corrected. The impact of the automatic pre-annotations was assessed on several aspects of the task: time, number of actions, annotator satisfaction, inter-annotator agreement, quality and number of the resulting annotations. The analysis of annotation results showed that the number of required hand annotations is 28.9% lower when using pre-annotated results from automatic tools. As a result, the overall annotation time was substantially lower when pre-annotations were used, while inter-annotator agreement was significantly higher. In addition, there was no statistically significant difference in the semantic distribution or number of annotations produced when pre-annotations were used. The annotated query corpus is freely available to the research community. This study shows that automatic pre-annotations are found helpful by most annotators. Our experience suggests using an automatic tool to assist large-scale manual annotation projects. This helps reduce annotation time and improve annotation consistency while maintaining high quality of the final annotations.

Subjects

PubMed; Semantics; Information Storage and Retrieval; Natural Language Processing; Vocabulary, Controlled

13.

Evaluation of multi-terminology super-concepts for information retrieval.

Griffon, Nicolas; Soualmia, Lina F; Névéol, Aurélie; Massari, Philippe; Thirion, Benoit; Dahamna, Badisse; Darmoni, Stefan J.

Stud Health Technol Inform ; 169: 492-6, 2011.

Article in English | MEDLINE | ID: mdl-21893798

ABSTRACT

BACKGROUND: Following a recent change in the indexing policy for the French quality-controlled health gateway CISMeF, multiple terminologies are now being used for indexing in addition to MeSH®. OBJECTIVE: To evaluate precision and recall of super-concepts for information retrieval in a multi-terminology paradigm compared to MeSH-only. METHODS: We evaluate the relevance of resources retrieved by multi-terminology super-concepts and MeSH-only super-concepts queries. RESULTS: Recall was 8-14% higher for multi-terminology super-concepts compared to MeSH-only super-concepts. Precision decreased from 0.66 for MeSH-only super-concepts to 0.61 for multi-terminology super-concepts. Retrieval performance was found to vary significantly depending on the super-concepts (p<10^-4) and indexing methods (manual vs automatic; p<0.004). CONCLUSION: A multi-terminology paradigm helps increase recall but lowers precision. Automated tools for indexing are not accurate enough to allow very precise information retrieval.

Subjects

Abstracting and Indexing; Information Storage and Retrieval/methods; Medical Informatics/methods; Algorithms; Catalogs as Topic; Electronic Data Processing; Humans; Internet; Medical Subject Headings; Reproducibility of Results; Software; Statistics as Topic; Terminology as Topic

14.

Can reproducibility be improved in clinical natural language processing? A study of 7 clinical NLP suites.

Digan, William; Névéol, Aurélie; Neuraz, Antoine; Wack, Maxime; Baudoin, David; Burgun, Anita; Rance, Bastien.

J Am Med Inform Assoc ; 28(3): 504-515, 2021 03 01.

Article in English | MEDLINE | ID: mdl-33319904

ABSTRACT

BACKGROUND: The increasing complexity of data streams and computational processes in modern clinical health information systems makes reproducibility challenging. Clinical natural language processing (NLP) pipelines are routinely leveraged for the secondary use of data. Workflow management systems (WMS) have been widely used in bioinformatics to handle the reproducibility bottleneck. OBJECTIVE: To evaluate if WMS and other bioinformatics practices could impact the reproducibility of clinical NLP frameworks. MATERIALS AND METHODS: Based on the literature across multiple research fields (NLP, bioinformatics and clinical informatics) we selected articles which (1) review reproducibility practices and (2) highlight a set of rules or guidelines to ensure tool or pipeline reproducibility. We aggregate insight from the literature to define reproducibility recommendations. Finally, we assess the compliance of 7 NLP frameworks with the recommendations. RESULTS: We identified 40 reproducibility features from 8 selected articles. Frameworks based on WMS match more than 50% of features (26 features for LAPPS Grid, 22 features for OpenMinted) compared to 18 features for current clinical NLP frameworks (cTakes, CLAMP) and 17 features for GATE, ScispaCy, and Textflows. DISCUSSION: 34 recommendations are endorsed by at least 2 articles from our selection. Overall, 15 features were adopted by every NLP framework. Nevertheless, frameworks based on WMS had better compliance with the features. CONCLUSION: NLP frameworks could benefit from lessons learned from the bioinformatics field (eg, public repositories of curated tools and workflows or use of containers for shareability) to enhance reproducibility in a clinical setting.

Subjects

Natural Language Processing; Reproducibility of Results; Computational Biology; Database Management Systems; Medical Informatics

15.

Diversity in Health Informatics: Mentoring and Leadership.

Moen, Anne; Chronaki, Catherine; Petelos, Elena; Voulgaraki, Despina; Turk, Eva; Névéol, Aurélie.

Stud Health Technol Inform ; 281: 1031-1035, 2021 May 27.

Article in English | MEDLINE | ID: mdl-34042835

ABSTRACT

Diversity, inclusion and interdisciplinary collaboration are drivers for healthcare innovation and adoption of new, technology-mediated services. The importance of diversity has been highlighted by the United Nations in SDG5 "Achieve gender equality and empower all women and girls", to drive adoption of social and digital innovation. Women play an instrumental role in health care and are in a position to bring about significant changes to support ongoing digitalization and transformation. At the same time, women are underrepresented in Science, Technology, Engineering and Mathematics (STEM). To some extent, the same holds for health care informatics. This paper sums up input to strategies for peer mentoring to ensure diversity in health informatics, to target systemic inequalities and build sustainable, intergenerational communities, improve digital health literacy and build capacity in digital health without losing the human touch.

Subjects

Medical Informatics; Mentoring; Engineering; Female; Humans; Leadership; Mentors

16.

A recent advance in the automatic indexing of the biomedical literature.

Névéol, Aurélie; Shooshan, Sonya E; Humphrey, Susanne M; Mork, James G; Aronson, Alan R.

J Biomed Inform ; 42(5): 814-23, 2009 Oct.

Article in English | MEDLINE | ID: mdl-19166973

ABSTRACT

The volume of biomedical literature has experienced explosive growth in recent years. This is reflected in the corresponding increase in the size of MEDLINE, the largest bibliographic database of biomedical citations. Indexers at the US National Library of Medicine (NLM) need efficient tools to help them accommodate the ensuing workload. After reviewing issues in the automatic assignment of Medical Subject Headings (MeSH terms) to biomedical text, we focus more specifically on the new subheading attachment feature for NLM's Medical Text Indexer (MTI). Natural Language Processing, statistical, and machine learning methods of producing automatic MeSH main heading/subheading pair recommendations were assessed independently and combined. The best combination achieves 48% precision and 30% recall. After validation by NLM indexers, a suitable combination of the methods presented in this paper was integrated into MTI as a subheading attachment feature producing MeSH indexing recommendations compliant with current state-of-the-art indexing practice.

Subjects

Abstracting and Indexing/methods; Artificial Intelligence; MEDLINE; Medical Subject Headings; Natural Language Processing; Dictionaries, Medical as Topic; Evaluation Studies as Topic; Humans; User-Computer Interface

17.

Evaluation of an automatic article selection method for timelier updates of the Comet Core Outcome Set database.

Norman, Christopher R; Gargon, Elizabeth; Leeflang, Mariska M G; Névéol, Aurélie; Williamson, Paula R.

Database (Oxford) ; 2019, 2019 01 01.

Article in English | MEDLINE | ID: mdl-31697361

ABSTRACT

Curated databases of scientific literature play an important role in helping researchers find relevant literature, but populating such databases is a labour-intensive and time-consuming process. One such database is the freely accessible Comet Core Outcome Set database, which was originally populated using manual screening in an annually updated systematic review. In order to reduce the workload and facilitate more timely updates we are evaluating machine learning methods to reduce the number of references that need to be screened. In this study we have evaluated a machine learning approach based on logistic regression to automatically rank the candidate articles. Data from the original systematic review and its first four review updates were used to train the model and evaluate performance. We estimated that using automatic screening would yield a workload reduction of at least 75% while keeping the number of missed references around 2%. We judged this to be an acceptable trade-off for this systematic review, and the method is now being used for the next round of the Comet database update.
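The trade-off between workload reduction and missed references can be made concrete with a small sketch. It assumes each candidate article already has a relevance score (in the study these would come from the logistic regression ranker, which is not reproduced here) and that reviewers screen only the top-ranked fraction. The function name, scores, and labels are hypothetical.

```python
import math


def screening_tradeoff(scores, labels, screen_fraction):
    """Return (workload_reduction, missed_fraction) when only the top-ranked
    fraction of candidates is screened manually.

    scores: model relevance scores, higher means more likely relevant.
    labels: 1 for a truly relevant reference, 0 otherwise.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_screened = math.ceil(len(ranked) * screen_fraction)
    found = sum(label for _, label in ranked[:n_screened])
    total_relevant = sum(labels)
    workload_reduction = 1 - n_screened / len(ranked)
    missed_fraction = 1 - found / total_relevant if total_relevant else 0.0
    return workload_reduction, missed_fraction


# Hypothetical scores and relevance labels for eight candidate articles.
example_scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.05, 0.03, 0.02]
example_labels = [1, 1, 0, 1, 0, 0, 0, 0]
reduction, missed = screening_tradeoff(example_scores, example_labels, 0.25)
```

In this toy case, screening the top quarter saves 75% of the workload but misses one of three relevant references. The study's reported figure of around 2% missed depends on how well its ranker places relevant articles near the top.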

Subjects

Data Curation; Data Mining; Databases, Factual; Machine Learning; Systematic Reviews as Topic

18.

Measuring the impact of screening automation on meta-analyses of diagnostic test accuracy.

Norman, Christopher R; Leeflang, Mariska M G; Porcher, Raphaël; Névéol, Aurélie.

Syst Rev ; 8(1): 243, 2019 10 28.

Article in English | MEDLINE | ID: mdl-31661028

ABSTRACT

BACKGROUND: The large and increasing number of new studies published each year is making literature identification in systematic reviews ever more time-consuming and costly. Technological assistance has been suggested as an alternative to the conventional, manual study identification to mitigate the cost, but previous literature has mainly evaluated methods in terms of recall (search sensitivity) and workload reduction. There is a need to also evaluate whether screening prioritization methods lead to the same results and conclusions as exhaustive manual screening. In this study, we examined the impact of one screening prioritization method based on active learning on sensitivity and specificity estimates in systematic reviews of diagnostic test accuracy. METHODS: We simulated the screening process in 48 Cochrane reviews of diagnostic test accuracy and reran 400 meta-analyses based on at least 3 studies. We compared screening prioritization (with technological assistance) and screening in randomized order (standard practice without technology assistance). We examined if the screening could have been stopped before identifying all relevant studies while still producing reliable summary estimates. For all meta-analyses, we also examined the relationship between the number of relevant studies and the reliability of the final estimates. RESULTS: The main meta-analysis in each systematic review could have been performed after screening an average of 30% of the candidate articles (range 0.07 to 100%). No systematic review would have required screening more than 2308 studies, whereas manual screening would have required screening up to 43,363 studies. Despite an average 70% recall, the estimation error would have been 1.3% on average, compared to an average 2% estimation error expected when replicating summary estimate calculations.
CONCLUSION: Screening prioritization coupled with stopping criteria in diagnostic test accuracy reviews can reliably detect when the screening process has identified a sufficient number of studies to perform the main meta-analysis with an accuracy within pre-specified tolerance limits. However, many of the systematic reviews did not identify enough studies for the meta-analyses to be accurate within a 2% limit, even with exhaustive manual screening, i.e., using current practice.

Subjects

Automation , Diagnostic Tests, Routine , Mass Screening , Humans , Diagnostic Tests, Routine/standards , Reproducibility of Results , Research Design , Sensitivity and Specificity , Systematic Reviews as Topic , Meta-Analysis as Topic

19.

Automatic inference of indexing rules for MEDLINE.

Névéol, Aurélie; Shooshan, Sonya E; Claveau, Vincent.

BMC Bioinformatics ; 9 Suppl 11: S11, 2008 Nov 19.

Article in English | MEDLINE | ID: mdl-19025687

ABSTRACT

BACKGROUND: Indexing is a crucial step in any information retrieval system. In MEDLINE, a widely used database of the biomedical literature, the indexing process involves the selection of Medical Subject Headings in order to describe the subject matter of articles. The need for automatic tools to assist MEDLINE indexers in this task is growing with the increasing number of publications being added to MEDLINE. METHODS: In this paper, we describe the use and the customization of Inductive Logic Programming (ILP) to infer indexing rules that may be used to produce automatic indexing recommendations for MEDLINE indexers. RESULTS: Our results show that this original ILP-based approach outperforms manual rules when they exist. In addition, the use of ILP rules also improves the overall performance of the Medical Text Indexer (MTI), a system producing automatic indexing recommendations for MEDLINE. CONCLUSION: We expect the sets of ILP rules obtained in this experiment to be integrated into MTI.

Subjects

Abstracting and Indexing/methods , MEDLINE , Medical Subject Headings , Natural Language Processing , Algorithms , Artificial Intelligence , Programming Languages

20.

Expanding the Diversity of Texts and Applications: Findings from the Section on Clinical Natural Language Processing of the International Medical Informatics Association Yearbook.

Névéol, Aurélie; Zweigenbaum, Pierre.

Yearb Med Inform ; 27(1): 193-198, 2018 Aug.

Article in English | MEDLINE | ID: mdl-30157523

ABSTRACT

OBJECTIVES: To summarize recent research and present a selection of the best papers published in 2017 in the field of clinical Natural Language Processing (NLP). METHODS: A survey of the literature was performed by the two editors of the NLP section of the International Medical Informatics Association (IMIA) Yearbook. The bibliographic databases PubMed and the Association for Computational Linguistics (ACL) Anthology were searched for papers with a focus on NLP efforts applied to clinical texts or aimed at a clinical outcome. A total of 709 papers were automatically ranked and then manually reviewed based on title and abstract. A shortlist of 15 candidate best papers was selected by the section editors and peer-reviewed by independent external reviewers to arrive at the three best clinical NLP papers for 2017. RESULTS: Clinical NLP best papers provide a contribution that ranges from methodological studies to the application of research results to practical clinical settings. They draw from text genres as diverse as clinical narratives across hospitals and languages or social media. CONCLUSIONS: Clinical NLP continued to thrive in 2017, with an increasing number of contributions towards applications compared to fundamental methods. Methodological work explores deep learning and system adaptation across language variants. Research results continue to translate into freely available tools and corpora, mainly for the English language.

Subjects

Natural Language Processing , Health Personnel , Humans , Medical Informatics
