Results 1 - 20 of 1,191
1.
J Biomed Semantics ; 13(1): 19, 2022 07 15.
Article in English | MEDLINE | ID: mdl-35841031

ABSTRACT

BACKGROUND: Ontology matching should contribute to the interoperability aspect of FAIR data (Findable, Accessible, Interoperable, and Reusable). Multiple data sources can use different ontologies for annotating their data, thus creating the need for dynamic ontology matching services. In this experimental study, we assessed the performance of ontology matching systems in the context of a real-life application from the rare disease domain. Additionally, we present a method for analyzing top-level classes to improve precision. RESULTS: We included three ontologies (NCIt, SNOMED CT, ORDO) and three matching systems (AgreementMakerLight 2.0, FCA-Map, LogMap 2.0). We evaluated the performance of the matching systems against reference alignments from BioPortal and the Unified Medical Language System Metathesaurus (UMLS). Then, we analyzed the top-level ancestors of matched classes to detect incorrect mappings without consulting a reference alignment. To detect such incorrect mappings, we manually matched semantically equivalent top-level classes of ontology pairs. AgreementMakerLight 2.0, FCA-Map, and LogMap 2.0 had F1-scores of 0.55, 0.46, and 0.55 for BioPortal and 0.66, 0.53, and 0.58 for the UMLS, respectively. Using vote-based consensus alignments increased performance across the board. Evaluation with manually created top-level hierarchy mappings revealed that on average 90% of the mappings' classes belonged to top-level classes that matched. CONCLUSIONS: Our findings show that the included ontology matching systems automatically produced mappings that were modestly accurate according to our evaluation. The hierarchical analysis of mappings seems promising when no reference alignments are available. All in all, the systems show potential to be implemented as part of an ontology matching service for querying FAIR data. Future research should focus on developing methods for the evaluation of mappings used in such mapping services, leading to their implementation in a FAIR data ecosystem.
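
The vote-based consensus and F1 evaluation mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the voting rule, so the `min_votes=2` threshold, the function names, and the toy class identifiers below are all assumptions for illustration, not the authors' implementation.

```python
from collections import Counter


def consensus_alignment(alignments, min_votes=2):
    """Keep mappings proposed by at least `min_votes` systems (assumed rule)."""
    votes = Counter(m for alignment in alignments for m in set(alignment))
    return {mapping for mapping, count in votes.items() if count >= min_votes}


def precision_recall_f1(predicted, reference):
    """Score a set of predicted mappings against a reference alignment."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical mappings (source class, target class) from three systems.
system_a = {("A:1", "B:1"), ("A:2", "B:2"), ("A:3", "B:9")}
system_b = {("A:1", "B:1"), ("A:2", "B:2"), ("A:4", "B:4")}
system_c = {("A:1", "B:1"), ("A:4", "B:4"), ("A:5", "B:7")}
reference = {("A:1", "B:1"), ("A:2", "B:2"), ("A:4", "B:4"), ("A:6", "B:6")}

consensus = consensus_alignment([system_a, system_b, system_c])
precision, recall, f1 = precision_recall_f1(consensus, reference)
```

In this toy data the mappings proposed by only one system are dropped, which raises precision at no cost to recall; real alignments need not behave so cleanly.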


Subjects
Biological Ontologies, Ecosystem, Consensus, Information Storage and Retrieval, Systematized Nomenclature of Medicine, Unified Medical Language System
2.
BMC Med Inform Decis Mak ; 22(Suppl 1): 88, 2022 07 07.
Article in English | MEDLINE | ID: mdl-35799294

ABSTRACT

BACKGROUND: Since no effective therapies exist for Alzheimer's disease (AD), prevention has become more critical through lifestyle status changes and interventions. Analyzing electronic health records (EHRs) of patients with AD can help us better understand lifestyle's effect on AD. However, lifestyle information is typically stored in clinical narratives. Thus, the objective of the study was to compare different natural language processing (NLP) models on classifying the lifestyle statuses (e.g., physical activity and excessive diet) from clinical texts in English. METHODS: Based on the collected concept unique identifiers (CUIs) associated with the lifestyle status, we extracted all related EHRs for patients with AD from the Clinical Data Repository (CDR) of the University of Minnesota (UMN). We automatically generated labels for the training data by using a rule-based NLP algorithm. We conducted weak supervision for pre-trained Bidirectional Encoder Representations from Transformers (BERT) models and three traditional machine learning models as baseline models on the weakly labeled training corpus. These models include the BERT base model, PubMedBERT (abstracts + full text), PubMedBERT (only abstracts), Unified Medical Language System (UMLS) BERT, Bio BERT, Bio-clinical BERT, logistic regression, support vector machine, and random forest. We performed two case studies, physical activity and excessive diet, to validate the effectiveness of BERT models in classifying lifestyle status. All models were evaluated and compared on the developed Gold Standard Corpus (GSC) for the two case studies. The rule-based model used for weak supervision was also tested on the GSC for comparison. RESULTS: The UMLS BERT model achieved the best performance for classifying status of physical activity, with precision, recall, and F-1 scores of 0.93, 0.93, and 0.92, respectively. Regarding classifying excessive diet, the Bio-clinical BERT model showed the best performance with precision, recall, and F-1 scores of 0.93, 0.93, and 0.93, respectively. CONCLUSION: The proposed approach leveraging weak supervision could significantly increase the sample size, which is required for training the deep learning models. By comparing with the traditional machine learning models, the study also demonstrates the high performance of BERT models for classifying lifestyle status for Alzheimer's disease in clinical notes.


Subjects
Alzheimer Disease, Deep Learning, Humans, Life Style, Natural Language Processing, Unified Medical Language System
3.
Stud Health Technol Inform ; 295: 289-292, 2022 Jun 29.
Article in English | MEDLINE | ID: mdl-35773865

ABSTRACT

Contextualized word embeddings have proved to be highly successful quantitative representations of words that make it possible to efficiently solve various tasks such as clinical entity normalization in unstructured texts. In this paper, we investigate how the Saussurean sign theory can be used as a qualitative explainable AI (XAI) method for word embeddings. Our assumption is that the main goal of XAI is to produce confidence and/or trust, which can be gained through quantitative as well as qualitative approaches. One important result is that the differential structure of language as explained by Saussure corresponds to the possibility of adding and subtracting word embeddings. On the other hand, these mathematical structures provide insights into the inner workings of natural language.


Subjects
Biomedical Research, Natural Language Processing, Language, Unified Medical Language System
4.
Stud Health Technol Inform ; 290: 187-191, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35672997

ABSTRACT

Most clinical texts, including breast cancer patient summaries (BCPSs), are written as narrative documents that are difficult for decision support systems to process. Annotators have been developed to extract the relevant content of such documents: MetaMap and cTAKES work with the English language and perform concept mapping using UMLS, while SIFR and ECMT work with the French language and provide concepts using various terminologies. We compared the four annotators on a sample of 25 French BCPSs, pre-processed to manage acronyms and translated into English. We observed that MetaMap extracted the largest number of UMLS concepts (15,458), followed by SIFR (3,784), ECMT (1,962), and cTAKES (1,769). Each annotator extracted specific valuable information not proposed by the other annotators. Because they are complementary, all annotators should be used in sequence to optimize the results.


Subjects
Breast Neoplasms, Natural Language Processing, Breast Neoplasms/diagnostic imaging, Female, Humans, Language, Unified Medical Language System
5.
Stud Health Technol Inform ; 290: 1060-1061, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35673205

ABSTRACT

BACKGROUND: CIG languages consist of approach-specific concepts. More widely used concepts, such as those in UMLS, are not typically used. OBJECTIVE: An evaluation of UMLS concept sufficiency for CIG definition. METHOD: A popular guideline is mapped to UMLS concepts with NLP. Results are reviewed to evaluate gaps and appropriateness. RESULTS: A significant portion of the guideline text mapped to UMLS concepts. CONCLUSIONS: The approach has shown promise and highlighted further challenges.


Subjects
Natural Language Processing, Unified Medical Language System, Computers, Language, Semantics
6.
J Biomed Inform ; 131: 104118, 2022 07.
Article in English | MEDLINE | ID: mdl-35690349

ABSTRACT

OBJECTIVE: To propose a new vector-based relatedness metric that derives word vectors from the intrinsic structure of biomedical ontologies, without consulting external resources such as large-scale biomedical corpora. MATERIALS AND METHODS: SNOMED CT on the mapping layer of UMLS was used as a testbed ontology. Vectors were created for every concept at the end of all semantic relations (attribute-value relations and descendants as well as the is_a relation) of the defining concept. The cosine similarity between the averages of those vectors with respect to each defining concept was computed to produce a final semantic relatedness. RESULTS: Two benchmark sets that include a total of 62 biomedical term pairs were used for evaluation. Spearman's rank coefficient of the current method was 0.655, 0.744, and 0.742 with the relatedness rated by physicians, coders, and medical experts, respectively. The proposed method was comparable to a word-embedding method and outperformed path-based, information content-based, and another multiple relation-based relatedness metric. DISCUSSION: The current study demonstrated that the addition of attribute relations to the is_a hierarchy of SNOMED CT better conforms to the human sense of relatedness than models based on taxonomic relations. The current approach also showed that it is robust to the design inconsistency of ontologies. CONCLUSION: Unlike the previous vector-based approach, the current study exploited the intrinsic semantic structure of an ontology, precluding the need for external textual resources to obtain context information of defining terms. Future research is recommended to prove the validity of the current method with other biomedical ontologies.
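
The final step described in this abstract, taking the cosine similarity between the averages of the vectors attached to each defining concept, can be sketched as follows. How the per-concept vectors are derived from SNOMED CT is not specified here, so the toy vectors and the helper names (`mean_vector`, `relatedness`) are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise average of equally sized vectors."""
    count = len(vectors)
    return [sum(components) / count for components in zip(*vectors)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def relatedness(vectors_a, vectors_b):
    """Cosine similarity between the averaged vectors of two defining concepts."""
    return cosine_similarity(mean_vector(vectors_a), mean_vector(vectors_b))


# Hypothetical vectors for the concepts reached from each defining concept.
concept_a_vectors = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
concept_b_vectors = [[0.0, 1.0, 0.0], [0.0, 1.0, 1.0]]

score = relatedness(concept_a_vectors, concept_b_vectors)
```

Averaging before comparing means a concept with many related concepts and one with few are placed on the same scale, which is one plausible reason for the design.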


Subjects
Biological Ontologies, Systematized Nomenclature of Medicine, Humans, Natural Language Processing, Semantics, Unified Medical Language System
7.
J Biomed Inform ; 131: 104120, 2022 07.
Article in English | MEDLINE | ID: mdl-35709900

ABSTRACT

OBJECTIVE: Develop a novel methodology to create a comprehensive knowledge graph (SuppKG) to represent a domain with limited coverage in the Unified Medical Language System (UMLS), specifically dietary supplement (DS) information for discovering drug-supplement interactions (DSI), by leveraging biomedical natural language processing (NLP) technologies and a DS domain terminology. MATERIALS AND METHODS: We created SemRepDS (an extension of an NLP tool, SemRep), capable of extracting semantic relations from abstracts by leveraging a DS-specific terminology (iDISK) containing 28,884 DS terms not found in the UMLS. PubMed abstracts were processed using SemRepDS to generate semantic relations, which were then filtered using a PubMedBERT model to remove incorrect relations before generating SuppKG. Two discovery pathways were applied to SuppKG to identify potential DSIs, which were then compared with an existing DSI database and also evaluated by medical professionals for mechanistic plausibility. RESULTS: SemRepDS returned 158.5% more DS entities and 206.9% more DS relations than SemRep. The fine-tuned PubMedBERT model, which significantly outperformed other machine learning and BERT models, obtained an F1 score of 0.8605 and removed 43.86% of semantic relations, improving the precision of the relations by 26.4% over pre-filtering. SuppKG consists of 56,635 nodes and 595,222 directed edges with 2,928 DS-specific nodes and 164,738 edges. Manual review of findings identified 182 of 250 (72.8%) proposed DS-Gene-Drug and 77 of 100 (77%) proposed DS-Gene1-Function-Gene2-Drug pathways to be mechanistically plausible. DISCUSSION: With added DS terminology to the UMLS, SemRepDS has the capability to find more DS-specific semantic relationships from PubMed than SemRep. The utility of the resulting SuppKG was demonstrated using discovery patterns to find novel DSIs. CONCLUSION: For a domain with limited coverage in the traditional terminology (e.g., UMLS), we demonstrated an approach to leverage domain terminology and improve existing NLP tools to generate a more comprehensive knowledge graph for the downstream task. Even though this study focuses on DSI, the method may be adapted to other domains.


Subjects
Natural Language Processing, Unified Medical Language System, Dietary Supplements, PubMed, Semantics
8.
Stud Health Technol Inform ; 290: 116-119, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35672982

ABSTRACT

BACKGROUND: Terminology integration at the scale of the UMLS Metathesaurus (i.e., over 200 source vocabularies) remains challenging despite recent advances in ontology alignment techniques based on neural networks. OBJECTIVES: To improve the performance of the neural network architecture we developed for predicting synonymy between terms in the UMLS Metathesaurus, specifically through the addition of an attention layer. METHODS: We modify our original Siamese neural network architecture with Long Short-Term Memory (LSTM) and create two variants by (1) adding an attention layer on top of the existing LSTM, and (2) replacing the existing LSTM layer by an attention layer. RESULTS: Adding an attention layer to the LSTM layer resulted in increasing precision to 92.38% (+3.63%) and F1 score to 91.74% (+1.13%), with limited impact on recall at 91.12% (-1.42%). CONCLUSIONS: Although limited, this increase in precision substantially reduces the false positive rate and minimizes the need for manual curation.
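
As a quick consistency check on the reported figures, the F1 score can be recomputed from the stated precision and recall using the standard harmonic-mean definition. The function name below is illustrative; the two inputs are the rounded percentages from the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
f1 = f1_score(92.38, 91.12)
print(f"F1 = {f1:.2f}%")
```

The result is about 91.75%, which agrees with the reported F1 score to within the rounding of the published precision and recall.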


Subjects
Computer Neural Networks, Unified Medical Language System, Attention
9.
Artif Intell Med ; 128: 102311, 2022 06.
Article in English | MEDLINE | ID: mdl-35534148

ABSTRACT

BACKGROUND: The development of electronic health records has provided a large volume of unstructured biomedical information. Extracting patient characteristics from these data has become a major challenge, especially in languages other than English. METHODS: Inspired by the French Text Mining Challenge (DEFT 2021) [1] in which we participated, our study proposes a multilabel classification of clinical narratives, allowing us to automatically extract the main features of a patient report. Our system is an end-to-end pipeline from raw text to labels with two main steps: named entity recognition and multilabel classification. Both steps use a transformer-based neural network architecture. To train our final classifier, we extended the dataset with all English and French Unified Medical Language System (UMLS) vocabularies related to human diseases. We focus our study on the multilingualism of training resources and models, with experiments combining French and English in different ways (multilingual embeddings or translation). RESULTS: We obtained an overall average micro-F1 score of 0.811 for the multilingual version, 0.807 for the French-only version, and 0.797 for the translated version. CONCLUSION: Our study proposes an original multilabel classification of French clinical notes for patient phenotyping. We show that a multilingual algorithm trained on annotated real clinical notes and UMLS vocabularies leads to the best results.


Subjects
Multilingualism, Natural Language Processing, Data Mining, Humans, Language, Unified Medical Language System
10.
Stud Health Technol Inform ; 292: 23-27, 2022 May 16.
Article in English | MEDLINE | ID: mdl-35575844

ABSTRACT

Among medical applications of natural language processing (NLP), word sense disambiguation (WSD) estimates alternative meanings from text around homonyms. Recently developed NLP methods include word vectors that combine easy computability with nuanced semantic representations. Here we explore the utility of simple linear WSD classifiers based on aggregating word vectors from a modern biomedical NLP library in homonym contexts. We evaluated eight WSD tasks that consider literature abstracts as textual contexts. Discriminative performance was measured in held-out annotations as the median area under sensitivity-specificity curves (AUC) across tasks and 200 bootstrap repetitions. We find that classifiers trained on domain-specific vectors outperformed those from a general language model by 4.0 percentage points, and that a preprocessing step of filtering stopwords and punctuation marks enhanced discrimination by another 0.7 points. The best models achieved a median AUC of 0.992 (interquartile range 0.975 - 0.998). These improvements suggest that more advanced WSD methods might also benefit from leveraging domain-specific vectors derived from large biomedical corpora.


Subjects
Natural Language Processing, Unified Medical Language System, Algorithms, Language, Semantics
11.
Stud Health Technol Inform ; 294: 357-361, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612096

ABSTRACT

The distributed nature of our digital healthcare and the rapid emergence of new data sources prevent a compelling overview and the joint use of new data. Data integration, e.g., with metadata and semantic annotations, is expected to overcome this challenge. In this paper, we present an approach to predict UMLS codes for given German metadata using recurrent neural networks. The augmentation of the training dataset using the Medical Subject Headings (MeSH), particularly the German translations, also improved the model accuracy. The model demonstrates robust performance with 75% accuracy and aims to show that increasingly sophisticated machine learning tools can already play a significant role in data integration.


Subjects
Metadata, Semantics, Information Storage and Retrieval, Medical Subject Headings, Computer Neural Networks, Unified Medical Language System
12.
Stud Health Technol Inform ; 294: 844-848, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612223

ABSTRACT

The wide adoption of Electronic Health Records (EHR) in hospitals provides unique opportunities for high throughput phenotyping of patients. The phenotype extraction from narrative reports can be performed by using either dictionary-based or data-driven methods. We developed a hybrid pipeline using deep learning to enrich the UMLS Metathesaurus for automatic detection of phenotypes from EHRs. The pipeline was evaluated on a French database of patients with a rare disease characterized by skeletal abnormalities, Jeune syndrome. The results showed a 2.5-fold improvement regarding the number of detected skeletal abnormalities compared to the baseline extraction using the standard release of UMLS. Our method can help enrich the coverage of the UMLS and improve phenotyping, especially for languages other than English.


Subjects
Deep Learning, Unified Medical Language System, Algorithms, Electronic Health Records, Ellis-Van Creveld Syndrome, Humans, Rare Diseases/diagnosis
13.
Stud Health Technol Inform ; 294: 854-858, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612225

ABSTRACT

In health sciences, high-quality text embeddings may augment qualitative data analysis of large amounts of text by enabling, e.g., searching and clustering of health information. This study aimed to evaluate three different sentence-level embedding methods in clustering sentences in nursing narratives from individual patients' hospital care episodes. Two of these embeddings are generated from language models based on the BERT framework, and the third on the Sent2Vec method. These embedding methods were used to cluster sentences from 20 patient care episodes and the results were manually evaluated. Findings suggest that the best clusters were produced by the embeddings from a BERT model fine-tuned for the proxy task of predicting subject headings for nursing text.


Subjects
Language, Natural Language Processing, Cluster Analysis, Humans, Unified Medical Language System
14.
Stud Health Technol Inform ; 294: 868-869, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612229

ABSTRACT

We address the problem of semantic labeling of terms in two French medical corpora with the subset of the UMLS. We perform two experiments relying on the structure of words and terms, and on their context: 1) the semantic label of already identified terms is predicted; 2) the terms are detected in raw texts and their semantic label is predicted. Our results show over 0.90 F-measure.


Subjects
Semantics, Unified Medical Language System, Natural Language Processing
15.
BMC Med Inform Decis Mak ; 22(1): 114, 2022 04 29.
Article in English | MEDLINE | ID: mdl-35488252

ABSTRACT

BACKGROUND: Health providers create Electronic Health Records (EHRs) to describe the conditions and procedures used to treat their patients. Medical notes entered by medical staff in the form of free text are a particularly insightful component of EHRs. There is a great interest in applying machine learning tools on medical notes in numerous medical informatics applications. Learning vector representations, or embeddings, of terms in the notes, is an important pre-processing step in such applications. However, learning good embeddings is challenging because medical notes are rich in specialized terminology, and the number of available EHRs in practical applications is often very small. METHODS: In this paper, we propose a novel algorithm to learn embeddings of medical terms from a limited set of medical notes. The algorithm, called definition2vec, exploits external information in the form of medical term definitions. It is an extension of a skip-gram algorithm that incorporates textual definitions of medical terms provided by the Unified Medical Language System (UMLS) Metathesaurus. RESULTS: To evaluate the proposed approach, we used a publicly available Medical Information Mart for Intensive Care (MIMIC-III) EHR data set. We performed quantitative and qualitative experiments to measure the usefulness of the learned embeddings. The experimental results show that definition2vec keeps the semantically similar medical terms together in the embedding vector space even when they are rare or unobserved in the corpus. We also demonstrate that learned vector embeddings are helpful in downstream medical informatics applications. CONCLUSION: This paper shows that medical term definitions can be helpful when learning embeddings of rare or previously unseen medical terms from a small corpus of specialized documents such as medical notes.


Subjects
Electronic Health Records, Unified Medical Language System, Algorithms, Humans, Machine Learning
16.
Stud Health Technol Inform ; 288: 100-112, 2022 Feb 01.
Article in English | MEDLINE | ID: mdl-35102832

ABSTRACT

Donald A.B. Lindberg, M.D., arrived at the U.S. National Library of Medicine in 1984 and quickly launched the Unified Medical Language System (UMLS) research and development project to help computers understand biomedical meaning and to enable retrieval and integration of information from disparate electronic sources, e.g., patient records, biomedical literature, and knowledge bases. This chapter focuses on how Lindberg's thinking, preferred ways of working, and decision-making guided UMLS goals and development, and on what made the UMLS markedly "new and different" and ahead of its time.


Subjects
Knowledge Bases, Unified Medical Language System, Humans, National Library of Medicine (U.S.), United States
17.
J Biomed Inform ; 127: 104005, 2022 03.
Article in English | MEDLINE | ID: mdl-35144000

ABSTRACT

Consumers from non-medical backgrounds often look for information regarding a specific medical information need; however, they are limited by their lack of medical knowledge and may not be able to find reputable resources. As a case study, we investigate reducing this knowledge barrier to allow consumers to achieve search effectiveness comparable to that of an expert, or a medical professional, for COVID-19 related questions. We introduce and evaluate a hybrid index model that allows a consumer to formulate queries using consumer language to find relevant answers to COVID-19 questions. Our aim is to reduce performance degradation between medical professional queries and those of a consumer. We use a universal sentence embedding model to project consumer queries into the same semantic space as professional queries. We then incorporate sentence embeddings into a search framework alongside an inverted index. Documents from this index are retrieved using a novel scoring function that considers sentence embeddings and BM25 scoring. We find that our framework alleviates the expertise disparity, which we validate using an additional set of crowdsourced-consumer-queries even in an unsupervised setting. We also propose an extension of our method, where the sentence encoder is optimised in a supervised setup. Our framework allows for a consumer to search using consumer queries to match the search performance with that of a professional.
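
The abstract states that documents are ranked by a scoring function that considers both sentence embeddings and BM25, but it does not give the formula. The sketch below shows one common way such signals can be combined, a weighted sum of min-max-normalized BM25 scores and cosine similarity. The interpolation weight `alpha`, the normalization, and all names and toy values are assumptions, not the authors' scoring function.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def min_max(scores):
    """Rescale scores to [0, 1]; all zeros if every score is identical."""
    low, high = min(scores), max(scores)
    if high == low:
        return [0.0] * len(scores)
    return [(s - low) / (high - low) for s in scores]


def hybrid_rank(bm25_scores, query_vec, doc_vecs, alpha=0.5):
    """Return document indices ordered by an assumed lexical/semantic mix."""
    lexical = min_max(bm25_scores)
    semantic = [cosine(query_vec, d) for d in doc_vecs]
    combined = [alpha * l + (1 - alpha) * s for l, s in zip(lexical, semantic)]
    return sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)


# Hypothetical precomputed BM25 scores and sentence embeddings for 3 documents.
bm25_scores = [1.0, 9.0, 5.0]
query_vec = [1.0, 0.0]
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

ranking = hybrid_rank(bm25_scores, query_vec, doc_vecs, alpha=0.4)
```

With these toy values the document that is moderately strong on both signals is ranked first, ahead of documents that are strong on only one, which is the behavior a hybrid index is meant to encourage.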


Subjects
COVID-19, Information Storage and Retrieval, Humans, Natural Language Processing, SARS-CoV-2, Unified Medical Language System
18.
J Biomed Inform ; 126: 103983, 2022 02.
Article in English | MEDLINE | ID: mdl-34990838

ABSTRACT

OBJECTIVE: This paper aims to propose knowledge-aware embedding, a critical tool for medical term normalization. METHODS: We develop CODER (Cross-lingual knowledge-infused medical term embedding) via contrastive learning based on a medical knowledge graph (KG) named the Unified Medical Language System, and similarities are calculated utilizing both terms and relation triplets from the KG. Training with relations injects medical knowledge into embeddings and can potentially improve their performance as machine learning features. RESULTS: We evaluate CODER based on zero-shot term normalization, semantic similarity, and relation classification benchmarks, and the results show that CODER outperforms various state-of-the-art biomedical word embeddings, concept embeddings, and contextual embeddings. CONCLUSION: CODER embeddings excellently reflect semantic similarity and relatedness of medical concepts. One can use CODER for embedding-based medical term normalization or to provide features for machine learning. Similar to other pretrained language models, CODER can also be fine-tuned for specific tasks. Codes and models are available at https://github.com/GanjinZero/CODER.


Subjects
Natural Language Processing, Unified Medical Language System, Language, Machine Learning, Semantics
19.
Health Inf Manag ; 51(1): 23-31, 2022 Jan.
Article in English | MEDLINE | ID: mdl-32691638

ABSTRACT

OBJECTIVE: This study tests coverage of SNOMED CT as an expansion source in the process of automated expansion of clinical terms found in discharge summaries. Term expansion is commonly used as a technique in knowledge extraction, query formulation and semantic modelling among other applications. However, characteristics of the sources might affect credibility of outputs, and coverage is one of them. METHOD: We developed an automated method for testing coverage of more than one source at a time. We used several methods to clean our corpus of discharge summaries before we extracted text fragments as candidates for clinical concepts. We then used Unified Medical Language System (UMLS) sources and UMLS REST API to filter concepts from the pool of text fragments. Statistical measures like true positive rate and false negative rate were used to decide on the coverage of the source. We also tested the coverage of the individual SNOMED CT hierarchies using the same methods. RESULTS: Findings suggest that a combination of four terminologies tested (SNOMED CT, NCI, LNC and MSH) achieves over 90% of coverage for term expansion. We also found that the SNOMED CT hierarchies that hold clinically relevant concepts provided 60% of coverage. CONCLUSION: We believe that our findings and the method we developed will be of use to both scientists and practitioners working in the domain of knowledge extraction.


Subjects
Patient Discharge, Systematized Nomenclature of Medicine, Humans, Semantics, Unified Medical Language System
20.
Annu Int Conf IEEE Eng Med Biol Soc ; 2021: 2294-2298, 2021 11.
Article in English | MEDLINE | ID: mdl-34891745

ABSTRACT

The identification of rare diseases from clinical notes with Natural Language Processing (NLP) is challenging due to the few cases available for machine learning and the need for data annotation from clinical experts. We propose a method using ontologies and weak supervision. The approach includes two steps: (i) Text-to-UMLS, linking text mentions to concepts in the Unified Medical Language System (UMLS) with a named entity linking tool (e.g., SemEHR) and weak supervision based on customised rules and Bidirectional Encoder Representations from Transformers (BERT) based contextual representations, and (ii) UMLS-to-ORDO, matching UMLS concepts to rare diseases in the Orphanet Rare Disease Ontology (ORDO). Using MIMIC-III US intensive care discharge summaries as a case study, we show that the Text-to-UMLS process can be greatly improved with weak supervision, without any annotated data from domain experts. Our analysis shows that the overall pipeline processing discharge summaries can surface rare disease cases, which are mostly uncaptured in manual ICD codes of the hospital admissions.


Subjects
Medical Records, Rare Diseases, Unified Medical Language System, Humans, Machine Learning, Natural Language Processing, Rare Diseases/diagnosis