Results 1 - 13 of 13
1.
Cureus ; 11(10): e6010, 2019 Oct 28.
Article in English | MEDLINE | ID: mdl-31815074

ABSTRACT

Currently, radiation oncology-specific electronic medical records (EMRs) allow providers to input the radiation treatment site using free text. The purpose of this study is to develop a natural language processing (NLP) tool to extract encoded data from radiation treatment sites in an EMR. Treatment sites were extracted from all patients who completed treatment in our department from April 1, 2011, to April 30, 2013. A system was designed to extract the Unified Medical Language System (UMLS) concept codes using a sample of 11,018 unique site names from 31,118 radiation therapy (RT) sites. Among those, 5,500 unique site name strings that constitute approximately half of the sample were set aside as a test set to evaluate the final system. A dictionary and calculated n-gram statistics using UMLS concepts from related semantic types were combined with manually encoded data. There was an average of 2.2 sites per patient. Prior to extraction, the 20 most common unique treatment sites were used 4,215 times (38.3%). The most common treatment site was whole brain RT, which was entered using 27 distinct terms for a total of 1,063 times. The customized NLP solution displayed great gains as compared to other systems, with a recall of 0.99 and a precision of 0.99. A customized NLP tool extracted encoded data from radiation treatment sites in an EMR with high accuracy. This can be integrated into a repository of demographic, genomic, treatment, and outcome data to advance personalized oncologic care.
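
As a rough illustration of the kind of mapping described in this abstract, the sketch below normalizes free-text site names to codes using an exact dictionary lookup followed by a fuzzy match. It is not the authors' system: the character-trigram Dice similarity is a stand-in for their n-gram statistics, and the dictionary entries, code strings (`SITE_WHOLE_BRAIN` and so on), function names, and threshold are all hypothetical.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return a multiset of character n-grams for a lowercased, space-padded string."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def dice(a, b):
    """Dice coefficient between two n-gram multisets (0.0 to 1.0)."""
    overlap = sum((a & b).values())
    total = sum(a.values()) + sum(b.values())
    return 2 * overlap / total if total else 0.0


# Hypothetical dictionary: these codes are placeholders, not real UMLS identifiers.
SITE_DICTIONARY = {
    "whole brain": "SITE_WHOLE_BRAIN",
    "left breast": "SITE_LEFT_BREAST",
    "prostate": "SITE_PROSTATE",
}


def encode_site(free_text, threshold=0.5):
    """Map a free-text site name to a code: exact lookup first, then best n-gram match."""
    key = free_text.lower().strip()
    if key in SITE_DICTIONARY:
        return SITE_DICTIONARY[key]
    grams = char_ngrams(key)
    best_term = max(SITE_DICTIONARY, key=lambda t: dice(grams, char_ngrams(t)))
    if dice(grams, char_ngrams(best_term)) >= threshold:
        return SITE_DICTIONARY[best_term]
    return None  # no sufficiently similar dictionary entry
```

With this toy dictionary, a misspelling such as "whole brian" still resolves to the whole-brain placeholder code, while an unrelated string returns `None`. A real system would need a far larger dictionary and a threshold tuned on held-out data.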

2.
J Am Med Inform Assoc ; 26(11): 1163-1171, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31562516

ABSTRACT

OBJECTIVE: Track 1 of the 2018 National NLP Clinical Challenges shared tasks focused on identifying which patients in a corpus of longitudinal medical records meet and do not meet identified selection criteria. MATERIALS AND METHODS: To address this challenge, we annotated American English clinical narratives for 288 patients according to whether they met these criteria. We chose criteria from existing clinical trials that represented a variety of natural language processing tasks, including concept extraction, temporal reasoning, and inference. RESULTS: A total of 47 teams participated in this shared task, with 224 participants in total. The participants represented 18 countries, and the teams submitted 109 total system outputs. The best-performing system achieved a micro F1 score of 0.91 using a rule-based approach. The top 10 teams used rule-based and hybrid systems to approach the problems. DISCUSSION: Clinical narratives are open to interpretation, particularly in cases where the selection criterion may be underspecified. This leaves room for annotators to use domain knowledge and intuition in selecting patients, which may lead to error in system outputs. However, teams who consulted medical professionals while building their systems were more likely to have high recall for patients, which is preferable for patient selection systems. CONCLUSIONS: There is not yet a 1-size-fits-all solution for natural language processing systems approaching this task. Future research in this area can look to examining criteria requiring even more complex inferences, temporal reasoning, and domain knowledge.


Subject(s)
Clinical Trials as Topic/methods , Data Mining/methods , Machine Learning , Natural Language Processing , Patient Selection , Datasets as Topic , Humans
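
The micro F1 score reported for the shared task above pools true positives, false positives, and false negatives across all classes before computing precision and recall. A minimal sketch of that calculation follows; the function name and the per-criterion counts in the usage note are hypothetical and are not taken from the challenge data.

```python
def micro_f1(counts):
    """Micro-averaged F1 from per-class (tp, fp, fn) tuples.

    Counts are pooled across classes first, so frequent classes
    weigh more than rare ones.
    """
    counts = list(counts)
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, `micro_f1([(8, 2, 0), (1, 0, 3)])` pools to 9 true positives, 2 false positives, and 3 false negatives, giving 18/23, or about 0.78.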
3.
Stud Health Technol Inform ; 264: 1041-1045, 2019 Aug 21.
Article in English | MEDLINE | ID: mdl-31438083

ABSTRACT

Natural language processing (NLP) technologies have been successfully applied to cancer research by enabling automated phenotypic information extraction from narratives in electronic health records (EHRs) such as pathology reports; however, developing customized NLP solutions requires substantial effort. To facilitate the adoption of NLP in cancer research, we have developed a set of customizable modules for extracting comprehensive types of cancer-related information in pathology reports (e.g., tumor size, tumor stage, and biomarkers), by leveraging the existing CLAMP system, which provides user-friendly interfaces for building customized NLP solutions for individual needs. Evaluation using annotated data at Vanderbilt University Medical Center showed that CLAMP-Cancer could extract diverse types of cancer information with good F-measures (0.80-0.98). We then applied CLAMP-Cancer to an information extraction task at Mayo Clinic and showed that we can quickly build a customized NLP system with comparable performance with an existing system at Mayo Clinic. CLAMP-Cancer is freely available for academic use.


Subject(s)
Information Storage and Retrieval , Neoplasms , Electronic Health Records , Humans , Natural Language Processing , Research Report
4.
J Am Med Inform Assoc ; 25(3): 300-308, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-29346583

ABSTRACT

OBJECTIVE: Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain. MATERIALS AND METHODS: DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, is developed through the National Institutes of Health-funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium. It consists of 2 main components: (1) a data ingestion pipeline that collects and transforms original metadata information to a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries. In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine. RESULTS AND CONCLUSION: Our manual review shows that the ingestion pipeline could achieve an accuracy of 90% and core elements of DATS had varied frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the fraction of relevant results among the top 10 search results) of 0.6022, by implementing advanced natural language processing and terminology services. Currently, we have made the DataMed system publicly available as an open source package for the biomedical community.
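
The P@10 figure in this abstract is a precision-at-k metric, conventionally computed per query and then averaged over queries. The sketch below shows that standard calculation only. The function names and document identifiers are hypothetical, and it does not reproduce the inferred measures used in the DataMed evaluation.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k ranked results that are relevant.

    Divides by k even when fewer than k results are returned,
    which is the usual convention for P@k.
    """
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


def mean_precision_at_k(runs, k=10):
    """Average P@k over (ranked_ids, relevant_ids) pairs, one pair per query."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(precision_at_k(r, rel, k) for r, rel in runs) / len(runs)
```

For instance, a query whose top 10 results contain 6 relevant datasets scores 0.6, so a mean near 0.60 corresponds to about 6 relevant datasets in the top 10 for a typical query.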

5.
J Am Med Inform Assoc ; 25(3): 331-336, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-29186491

ABSTRACT

Existing general clinical natural language processing (NLP) systems such as MetaMap and Clinical Text Analysis and Knowledge Extraction System have been successfully applied to information extraction from clinical text. However, end users often have to customize existing systems for their individual tasks, which can require substantial NLP skills. Here we present CLAMP (Clinical Language Annotation, Modeling, and Processing), a newly developed clinical NLP toolkit that provides not only state-of-the-art NLP components, but also a user-friendly graphic user interface that can help users quickly build customized NLP pipelines for their individual applications. Our evaluation shows that the CLAMP default pipeline achieved good performance on named entity recognition and concept encoding. We also demonstrate the efficiency of the CLAMP graphic user interface in building customized, high-performance NLP pipelines with 2 use cases, extracting smoking status and lab test values. CLAMP is publicly available for research use, and we believe it is a unique asset for the clinical NLP community.

6.
AMIA Jt Summits Transl Sci Proc ; 2017: 268-277, 2017.
Article in English | MEDLINE | ID: mdl-28815141

ABSTRACT

Metastatic patterns of spread at the time of cancer recurrence are one of the most important prognostic factors in estimation of clinical course and survival of the patient. This information is not easily accessible since it's rarely recorded in a structured format. This paper describes a system for categorization of pathology reports by specimen site and the detection of metastatic status within the report. A clinical NLP pipeline was developed using sentence boundary detection, tokenization, section identification, part-of-speech tagger, and chunker with some rule based methods to extract metastasis site and status in combination with five types of information related to tumor metastases: histological type, grade, specimen site, metastatic status indicators and the procedure. The system achieved a recall of 0.84 and 0.88 precision for metastatic status detection, and 0.89 recall and 0.93 precision for metastasis site detection. This study demonstrates the feasibility of applying NLP technologies to extract valuable metastases information from pathology reports and we believe that it will greatly benefit studies on cancer metastases that utilize EHRs.

8.
J Am Med Inform Assoc ; 24(e1): e79-e86, 2017 Apr 01.
Article in English | MEDLINE | ID: mdl-27539197

ABSTRACT

OBJECTIVE: The goal of this study was to develop a practical framework for recognizing and disambiguating clinical abbreviations, thereby improving current clinical natural language processing (NLP) systems' capability to handle abbreviations in clinical narratives. METHODS: We developed an open-source framework for clinical abbreviation recognition and disambiguation (CARD) that leverages our previously developed methods, including: (1) machine learning based approaches to recognize abbreviations from a clinical corpus, (2) clustering-based semiautomated methods to generate possible senses of abbreviations, and (3) profile-based word sense disambiguation methods for clinical abbreviations. We applied CARD to clinical corpora from Vanderbilt University Medical Center (VUMC) and generated 2 comprehensive sense inventories for abbreviations in discharge summaries and clinic visit notes. Furthermore, we developed a wrapper that integrates CARD with MetaMap, a widely used general clinical NLP system. RESULTS AND CONCLUSION: CARD detected 27 317 and 107 303 distinct abbreviations from discharge summaries and clinic visit notes, respectively. Two sense inventories were constructed for the 1000 most frequent abbreviations in these 2 corpora. Using the sense inventories created from discharge summaries, CARD achieved an F1 score of 0.755 for identifying and disambiguating all abbreviations in a corpus from the VUMC discharge summaries, which is superior to MetaMap and Apache's clinical Text Analysis Knowledge Extraction System (cTAKES). Using additional external corpora, we also demonstrated that the MetaMap-CARD wrapper improved MetaMap's performance in recognizing disorder entities in clinical notes. The CARD framework, 2 sense inventories, and the wrapper for MetaMap are publicly available at https://sbmi.uth.edu/ccb/resources/abbreviation.htm . We believe the CARD framework can be a valuable resource for improving abbreviation identification in clinical NLP systems.


Subject(s)
Abbreviations as Topic , Electronic Health Records , Machine Learning , Natural Language Processing , Humans , Patient Discharge
9.
BMC Syst Biol ; 10 Suppl 3: 67, 2016 08 26.
Article in English | MEDLINE | ID: mdl-27585838

ABSTRACT

BACKGROUND: Information about drug-drug interactions (DDIs) supported by scientific evidence is crucial for establishing computational knowledge bases for applications like pharmacovigilance. Since new reports of DDIs are rapidly accumulating in the scientific literature, text-mining techniques for automatic DDI extraction are critical. We propose a novel approach for automated pharmacokinetic (PK) DDI detection that incorporates syntactic and semantic information into graph kernels, to address the problem of sparseness associated with syntactic-structural approaches. First, we used a novel all-path graph kernel using shallow semantic representation of sentences. Next, we statistically integrated fine-granular semantic classes into the dependency and shallow semantic graphs. RESULTS: When evaluated on the PK DDI corpus, our approach significantly outperformed the original all-path graph kernel that is based on dependency structure. Our system that combined dependency graph kernel with semantic classes achieved the best F-scores of 81.94% for in vivo PK DDIs and 69.34% for in vitro PK DDIs, respectively. Further, combining shallow semantic graph kernel with semantic classes achieved the highest precisions of 84.88% for in vivo PK DDIs and 74.83% for in vitro PK DDIs, respectively. CONCLUSIONS: We presented a graph kernel based approach to combine syntactic and semantic information for extracting pharmacokinetic DDIs from biomedical literature. Experimental results showed that our proposed approach could extract PK DDIs from literature effectively, which significantly enhanced the performance of the original all-path graph kernel based on dependency structure.


Subject(s)
Biomedical Research , Computational Biology/methods , Computer Graphics , Drug Interactions , Pharmacokinetics , Publications , Semantics , Data Mining
10.
Article in English | MEDLINE | ID: mdl-26306271

ABSTRACT

A computable knowledge base containing relations between diseases and lab tests would be a great resource for many biomedical informatics applications. This paper describes our initial step towards establishing a comprehensive knowledge base of disease and lab tests relations utilizing three public on-line resources. LabTestsOnline, MedlinePlus and Wikipedia are integrated to create a freely available, computable disease-lab test knowledgebase. Disease and lab test concepts are identified using MetaMap and relations between diseases and lab tests are determined based on source-specific rules. Experimental results demonstrate a high precision for relation extraction, with Wikipedia achieving the highest precision of 87%. Combining the three sources reached a recall of 51.40%, when compared with a subset of disease-lab test relations extracted from a reference book. Moreover, we found additional disease-lab test relations from on-line resources, indicating they are complementary to existing reference books for building a comprehensive disease and lab test relation knowledge base.

11.
BMC Syst Biol ; 9 Suppl 4: S2, 2015.
Article in English | MEDLINE | ID: mdl-26100720

ABSTRACT

BACKGROUND: Computational pharmacology can uniquely address some issues in the process of drug development by providing a macroscopic view and a deeper understanding of drug action. Specifically, the network-assisted approach is promising for the inference of drug repurposing. However, the drug-target associations coming from different sources and various assays have much noise, leading to an inflation of the inference errors. To reduce the inference errors, it is necessary and critical to create a comprehensive and weighted data set of drug-target associations. RESULTS: In this study, we created a weighted and integrated drug-target interactome (WinDTome) to provide a comprehensive resource of drug-target associations for computational pharmacology. We first collected drug-target interactions from six commonly used drug-target centered data sources including DrugBank, KEGG, TTD, MATADOR, PDSP K(i) Database, and BindingDB. Then, we employed the record linkage method to normalize drugs and targets to the unique identifiers by utilizing the public data sources including PubChem, Entrez Gene, and UniProt. To assess the reliability of the drug-target associations, we assigned two scores (Score_S and Score_R) to each drug-target association based on their data sources and publication references. Consequently, the WinDTome contains 546,196 drug-target associations among 303,018 compounds and 4,113 genes. To assess the application of the WinDTome, we designed a network-based approach for drug repurposing using the mental disorder schizophrenia (SCZ) as a case. Starting from 41 known SCZ drugs and their targets, we inferred a total of 264 potential SCZ drugs through the associations of drug-target with Score_S higher than two in WinDTome and human protein-protein interactions. Among the 264 SCZ-related drugs, 39 drugs have been investigated in clinical trials for SCZ treatment and 74 drugs for the treatment of other mental disorders, respectively. Compared with the results using other Score_S cutoff values, single data source, or the data from STITCH, the inference of 264 SCZ-related drugs had the highest performance. CONCLUSIONS: The WinDTome generated in this study contains comprehensive drug-target associations with confidence scores. Its application to the SCZ drug repurposing demonstrated that the WinDTome is promising to serve as a useful resource for drug repurposing.


Subject(s)
Computational Biology/methods , Drug Repositioning , Molecular Targeted Therapy , Pharmaceutical Preparations/metabolism , Schizophrenia/drug therapy , Databases, Pharmaceutical , Humans , Protein Binding , Proteins/metabolism , Schizophrenia/metabolism
12.
Comput Biol Med ; 40(11-12): 900-11, 2010.
Article in English | MEDLINE | ID: mdl-20970122

ABSTRACT

This paper describes an information extraction system that extracts and converts the available information in free text Turkish radiology reports into a structured information model using manually created extraction rules and domain ontology. The ontology provides flexibility in the design of extraction rules, and determines the information model for the extracted semantic information. Although our information extraction system mainly concentrates on abdominal radiology reports, the system can be used in another field of medicine by adapting its ontology and extraction rule set. We achieved very high precision and recall results during the evaluation of the developed system with unseen radiology reports.


Subject(s)
Models, Theoretical , Radiology/methods , Semantics , Software , Translating , Evaluation Studies as Topic , Humans , Radiology/instrumentation , Research Report , Turkey
13.
Ophthalmic Plast Reconstr Surg ; 24(3): 201-6, 2008.
Article in English | MEDLINE | ID: mdl-18520835

ABSTRACT

PURPOSE: To review the clinical and histopathologic features, treatment, and outcomes of eyelid basal cell carcinomas. METHODS: The clinical records and histopathologic specimens of 311 patients with eyelid basal cell carcinomas were reviewed and analyzed retrospectively. The main outcome measures are patient demographics, clinical characteristics, lesion size, duration of lesion, histologic subtypes, presence of orbital and perineural invasion, severity of peritumorous inflammation, treatment modalities, recurrence rate, tumor-related death, and prognostic features. RESULTS: Two hundred ninety patients underwent surgery, whereas the others received radiotherapy or chemotherapy. The most common histologic subtypes were infiltrative, nodular, and basosquamous basal cell carcinomas. Nearly one-third (29.9%) of the patients were previously recurrent. Orbital and perineural invasion rates were 17.04% and 10.6%, respectively. Recurrent basal cell carcinomas were larger, with longer duration of lesion and a higher rate of orbital and perineural invasion. Basosquamous basal cell carcinomas were more likely to have prior recurrences, larger lesion size, and the highest rate of orbital invasion. Perineural invasion was most frequent in morpheaform and basosquamous subtypes. Peritumorous inflammation differed between subtypes and was highest in the superficial subtype. The recurrence rate was 7.39% in total. The death of 2 patients was tumor-related. CONCLUSIONS: In this large case series from a single center, the outcomes were worse than previously reported due to delay in treatment and previous inadequate treatments. Adverse prognostic factors associated with secondary orbital invasion are previous recurrences, aggressive histologic subtypes, longer duration of lesion, larger lesion size, and the presence of perineural invasion.


Subject(s)
Carcinoma, Basal Cell/pathology , Eyelid Neoplasms/pathology , Neoplasm Recurrence, Local , Orbital Neoplasms/pathology , Adult , Aged , Aged, 80 and over , Carcinoma, Basal Cell/diagnostic imaging , Carcinoma, Basal Cell/mortality , Carcinoma, Basal Cell/therapy , Eyelid Neoplasms/diagnostic imaging , Eyelid Neoplasms/mortality , Eyelid Neoplasms/therapy , Female , Humans , Male , Middle Aged , Neoplasm Invasiveness , Orbital Neoplasms/diagnostic imaging , Orbital Neoplasms/mortality , Orbital Neoplasms/therapy , Retrospective Studies , Survival Rate , Tomography, X-Ray Computed , Treatment Outcome , Turkey/epidemiology