Search | VHL Regional Portal

The extraction of complex relationships and their conversion to biological expression language (BEL): overview of the BioCreative VI (2017) BEL track.

Madan, Sumit; Szostak, Justyna; Komandur Elayavilli, Ravikumar; Tsai, Richard Tzong-Han; Ali, Mehdi; Qian, Longhua; Rastegar-Mojarad, Majid; Hoeng, Julia; Fluck, Juliane.

Database (Oxford) ; 2019, 2019 Jan 01.

Article in English | MEDLINE | ID: mdl-31603193

ABSTRACT

Knowledge of the molecular interactions of biological and chemical entities and their involvement in biological processes or clinical phenotypes is important for data interpretation. Unfortunately, this knowledge is mostly embedded in the literature in a form that is unavailable to automated data analysis procedures. Biological expression language (BEL) is a syntax that allows the structured representation of a broad range of biological relationships. It is used in various settings to extract such knowledge and transform it into BEL networks. To support curators' tedious and time-intensive extraction work with automated methods, we developed the BEL track within the framework of the BioCreative challenges. Within the BEL track, we provide training data and an evaluation environment to encourage the text mining community to tackle the automatic extraction of complex BEL relationships. In BioCreative VI (2017), the 2015 BEL track was repeated with new test data. Although this second iteration of the BEL task achieved only minor improvements in retrieving text snippets for given statements, a significant increase in BEL statement extraction performance from provided sentences was observed. The best-performing system reached an F-score of 32% for the extraction of complete BEL statements, rising to 49% when the named entities were given. This time, besides rule-based systems, new methods involving hierarchical sequence labeling and neural networks were applied for BEL statement extraction.

Subject(s)

Data Mining , Databases, Factual , Neural Networks, Computer , Vocabulary, Controlled
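A BEL statement is typically a subject term, a relationship, and an object term, for example `p(HGNC:AKT1) increases act(p(HGNC:MTOR))`. The sketch below splits such a triple into its three parts by tracking parenthesis depth. It is a minimal illustration under stated assumptions, not tooling from the BEL track: the function name `split_bel_statement` is hypothetical, the relationship set covers only a few common BEL relationships and their short forms, and nested statements are not handled.

```python
# Hypothetical helper for illustration; covers only simple "<term> <relationship> <term>" triples.
RELATIONSHIPS = {
    "increases", "decreases", "directlyIncreases", "directlyDecreases",
    "association", "positiveCorrelation", "negativeCorrelation",
    "->", "-|", "=>", "=|", "--",  # short forms of the relationships above
}


def split_bel_statement(statement: str) -> tuple[str, str, str]:
    """Return (subject, relationship, object) for a simple BEL triple."""
    tokens: list[str] = []
    current: list[str] = []
    depth = 0
    for ch in statement.strip():
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
        if ch == " " and depth == 0:
            # A space outside all parentheses separates top-level tokens.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    if current:
        tokens.append("".join(current))
    if len(tokens) != 3 or tokens[1] not in RELATIONSHIPS:
        raise ValueError(f"not a simple BEL triple: {statement!r}")
    return tokens[0], tokens[1], tokens[2]
```

Calling `split_bel_statement("p(HGNC:AKT1) increases act(p(HGNC:MTOR))")` returns `("p(HGNC:AKT1)", "increases", "act(p(HGNC:MTOR))")`. Tracking depth rather than splitting on every space keeps multi-argument terms such as `complex(...)` intact.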

Extracting chemical-protein relations using attention-based neural networks.

Liu, Sijia; Shen, Feichen; Komandur Elayavilli, Ravikumar; Wang, Yanshan; Rastegar-Mojarad, Majid; Chaudhary, Vipin; Liu, Hongfang.

Database (Oxford) ; 2018, 2018 Jan 01.

Article in English | MEDLINE | ID: mdl-30295724

ABSTRACT

Relation extraction is an important task in natural language processing. In this paper, we describe our approach for BioCreative VI Task 5: text mining chemical-protein interactions. We investigate multiple deep neural network (DNN) models, including convolutional neural networks, recurrent neural networks (RNNs) and attention-based RNNs (ATT-RNNs), to extract chemical-protein relations. Our experimental results indicate that ATT-RNN models outperform the same models without attention, and that the attention-based gated recurrent unit (ATT-GRU) achieves the best micro-averaged F1 score among the tested DNNs, 0.527 on the test set. In addition, the word-level attention weights show that the attention mechanism is effective at selecting the most important trigger words when trained with semantic relation labels, without the need for semantic parsing or feature engineering. The source code of this work is available at https://github.com/ohnlp/att-chemprot.

Subject(s)

Algorithms , Databases, Chemical , Databases, Protein , Neural Networks, Computer , Proteins/chemistry
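Word-level attention, as described in this abstract, assigns each token's RNN hidden state a weight and pools the states into one sentence vector; those weights are what reveal the trigger words. The following is a minimal pure-Python sketch of a simplified dot-product attention pooling, assuming a fixed context vector `u`. It is not the authors' ATT-GRU, which may score tokens differently (for example through a learned projection and `tanh`), and the function names are hypothetical.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: list[list[float]], context_vector: list[float]):
    """Simplified word-level attention over RNN hidden states.

    hidden_states: T vectors (one per token), each of length d.
    context_vector: a vector u of length d (learned in a real model; fixed here).
    Returns (weights, pooled): weights[t] is the attention on token t and
    pooled is the attention-weighted sum of the hidden states.
    """
    # Score each token by the dot product of its hidden state with u.
    scores = [sum(h_i * u_i for h_i, u_i in zip(h, context_vector)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[j] for w, h in zip(weights, hidden_states)) for j in range(dim)]
    return weights, pooled
```

For hidden states `[[1, 0], [0, 1], [1, 1]]` and `u = [2, 0]`, the first and third tokens score equally and receive most of the weight, while the second token is largely ignored; in a trained model, the high-weight tokens are the ones inspected as candidate trigger words.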

Impact of Patient Reminders on Papanicolaou Test Completion for High-Risk Patients Identified by a Clinical Decision Support System.

MacLaughlin, Kathy L; Kessler, Maya E; Komandur Elayavilli, Ravikumar; Hickey, Branden C; Scheitel, Marianne R; Wagholikar, Kavishwar B; Liu, Hongfang; Kremers, Walter K; Chaudhry, Rajeev.

J Womens Health (Larchmt) ; 27(5): 569-574, 2018 05.

Article in English | MEDLINE | ID: mdl-29297754

ABSTRACT

BACKGROUND: A clinical decision support system (CDSS) for cervical cancer screening identifies patients due for routine cervical cancer screening. Yet high-risk patients who require more frequent screening or earlier follow-up to address past abnormal results are not identified. We aimed to assess the effect of a complex CDSS that incorporates national guidelines for high-risk patient screening and abnormal result management, its implementation to identify patients overdue for testing, and the outcome of sending a targeted recommendation for follow-up. MATERIALS AND METHODS: At three primary care clinics affiliated with an academic medical center, a reminder recommending an appointment for Papanicolaou (Pap) testing or Pap and human papillomavirus cotesting was sent to high-risk women aged 18 through 65 years (intervention group) identified by the CDSS as overdue for testing. Historical control patients, who did not receive a reminder, were identified by the CDSS 1 year before the date when reminders were sent to the intervention group. Test completion rates were compared between the intervention and control groups using a generalized estimating equation extension. RESULTS: Across the three sites, the average completion rate of recommended follow-up testing was significantly higher in the intervention group, at 23.7% (61/257), than in the control group, at 3.3% (17/516) (p < 0.001). CONCLUSIONS: A CDSS with enhanced capabilities to identify high-risk women due for cervical cancer testing beyond routine screening intervals, with subsequent patient notification, has the potential to decrease cervical precancer and cancer by improving adherence to guideline-compliant follow-up and needed treatment.

Subject(s)

Decision Support Systems, Clinical , Early Detection of Cancer/statistics & numerical data , Mass Screening , Papanicolaou Test/statistics & numerical data , Patient Compliance/statistics & numerical data , Reminder Systems/statistics & numerical data , Uterine Cervical Neoplasms/diagnosis , Vaginal Smears/statistics & numerical data , Adult , Aged , Female , Humans , Middle Aged , Socioeconomic Factors , Uterine Cervical Neoplasms/prevention & control
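The reported percentages follow directly from the counts in the RESULTS section. The short check below recomputes them; `completion_rate` is a hypothetical helper. These are crude pooled proportions only: the study's significance test used a generalized estimating equation model across the three sites, which this sketch does not reproduce.

```python
def completion_rate(completed: int, total: int) -> float:
    """Fraction of patients who completed the recommended test."""
    return completed / total


intervention = completion_rate(61, 257)  # reminder sent
control = completion_rate(17, 516)       # historical controls, no reminder

print(f"intervention: {intervention:.1%}")                # 23.7%
print(f"control: {control:.1%}")                          # 3.3%
print(f"crude difference: {intervention - control:.1%}")  # 20.4%
```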

Effect of a Novel Clinical Decision Support Tool on the Efficiency and Accuracy of Treatment Recommendations for Cholesterol Management.

Scheitel, Marianne R; Kessler, Maya E; Shellum, Jane L; Peters, Steve G; Milliner, Dawn S; Liu, Hongfang; Komandur Elayavilli, Ravikumar; Poterack, Karl A; Miksch, Timothy A; Boysen, Jennifer; Hankey, Ron A; Chaudhry, Rajeev.

Appl Clin Inform ; 8(1): 124-136, 2017 Feb 08.

Article in English | MEDLINE | ID: mdl-28174820

ABSTRACT

BACKGROUND: The 2013 American College of Cardiology / American Heart Association Guidelines for the Treatment of Blood Cholesterol emphasize treatment based on cardiovascular risk. But finding time in a primary care visit to manually calculate cardiovascular risk and prescribe treatment based on risk is challenging. We developed an informatics-based clinical decision support tool, MayoExpertAdvisor, to deliver automated cardiovascular risk scores and guideline-based treatment recommendations based on patient-specific data in the electronic health record (EHR). OBJECTIVE: To assess the impact of our clinical decision support tool on the efficiency and accuracy of clinician calculation of cardiovascular risk and its effect on the delivery of guideline-consistent treatment recommendations. METHODS: Clinicians were asked to review the EHRs of selected patients. We evaluated the amount of time and the number of clicks and keystrokes needed to calculate cardiovascular risk and provide a treatment recommendation with and without our clinical decision support tool. We also compared the treatment recommendations arrived at by clinicians with and without the use of our tool to those recommended by the guidelines. RESULTS: With MayoExpertAdvisor, clinicians saved 3 minutes and 38 seconds in completing both tasks, used 94 fewer clicks and 23 fewer keystrokes, and improved accuracy from a baseline of 60.61% to 100% for both the risk score calculation and the guideline-consistent treatment recommendation. CONCLUSION: Informatics solutions can greatly improve the efficiency and accuracy of individualized treatment recommendations and have the potential to increase guideline compliance.

Subject(s)

Anticholesteremic Agents/therapeutic use , Cholesterol/metabolism , Decision Support Systems, Clinical , Anticholesteremic Agents/pharmacology , Cardiovascular Diseases/therapy , Electronic Health Records , Primary Health Care , Risk Factors , Surveys and Questionnaires

Leveraging word embeddings and medical entity extraction for biomedical dataset retrieval using unstructured texts.

Wang, Yanshan; Rastegar-Mojarad, Majid; Komandur-Elayavilli, Ravikumar; Liu, Hongfang.

Database (Oxford) ; 2017, 2017 Jan 01.

Article in English | MEDLINE | ID: mdl-31725862

ABSTRACT

The recent movement towards open data in the biomedical domain has generated a large number of publicly accessible datasets. The Big Data to Knowledge data indexing project, biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE), has gathered these datasets in a one-stop portal aiming to facilitate their reuse and accelerate scientific advances. However, as the number of biomedical datasets stored and indexed increases, it becomes increasingly challenging to retrieve the datasets relevant to researchers' queries. In this article, we propose an information retrieval (IR) system to tackle this problem and implement it for the bioCADDIE Dataset Retrieval Challenge. The system leverages the unstructured texts of each dataset, including its title and description, and utilizes a state-of-the-art IR model, medical named entity extraction techniques, query expansion with deep learning-based word embeddings and a re-ranking strategy to enhance retrieval performance. In empirical experiments, we compared the proposed system with 11 baseline systems using the bioCADDIE Dataset Retrieval Challenge datasets. The experimental results show that the proposed system outperforms the other systems in terms of inferred Average Precision (infAP) and inferred normalized Discounted Cumulative Gain (infNDCG), implying that the proposed system is a viable option for biomedical dataset retrieval. Database URL: https://github.com/yanshanwang/biocaddie2016mayodata.
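Query expansion with word embeddings, one component named above, adds terms whose vectors lie close to the query terms' vectors. The sketch below shows the idea with cosine similarity over a hypothetical toy embedding table; the 3-dimensional vectors, the `k` and `min_sim` thresholds, and the function names are assumptions for illustration. A real system would load embeddings trained on biomedical text, and the abstract does not specify the expansion rule the authors used.

```python
import math

# Hypothetical toy word vectors (a real system would load trained embeddings).
EMBEDDINGS = {
    "cancer": [0.9, 0.1, 0.0],
    "tumor": [0.85, 0.15, 0.05],
    "carcinoma": [0.8, 0.2, 0.1],
    "gene": [0.1, 0.9, 0.0],
    "expression": [0.2, 0.8, 0.1],
    "mouse": [0.0, 0.1, 0.95],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def expand_query(terms: list[str], embeddings: dict, k: int = 1, min_sim: float = 0.9) -> list[str]:
    """Append up to k nearest neighbours (similarity >= min_sim) for each query term."""
    expanded = list(terms)
    for term in terms:
        vec = embeddings.get(term)
        if vec is None:
            continue  # out-of-vocabulary terms are kept but not expanded
        neighbours = sorted(
            ((cosine(vec, v), w) for w, v in embeddings.items() if w != term and w not in expanded),
            reverse=True,
        )
        for sim, word in neighbours[:k]:
            if sim >= min_sim:
                expanded.append(word)
    return expanded
```

With this toy table, `expand_query(["cancer"], EMBEDDINGS)` returns `["cancer", "tumor"]`, and `k=2` also adds `"carcinoma"`. The similarity floor keeps weakly related terms from diluting the query, which matters because expansion can hurt precision as easily as it helps recall.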

BELTracker: evidence sentence retrieval for BEL statements.

Rastegar-Mojarad, Majid; Komandur Elayavilli, Ravikumar; Liu, Hongfang.

Database (Oxford) ; 2016, 2016.

Article in English | MEDLINE | ID: mdl-27173525

ABSTRACT

Biological expression language (BEL) is one of the main formal representation models of biological networks. The primary source of information for curating biological networks in BEL representation has been the literature. Identifying relevant articles and the corresponding evidence statements for curating and validating BEL statements remains a challenge. In this paper, we describe BELTracker, a tool used to retrieve and rank evidence sentences from PubMed abstracts and full-text articles for a given BEL statement (per the 2015 task requirements of the BioCreative V BEL Task). The system comprises three main components: (i) translation of a given BEL statement into an information retrieval (IR) query, (ii) retrieval of relevant PubMed citations and (iii) finding and ranking the evidence sentences in those citations. BELTracker uses a combination of approaches based on traditional IR, machine learning and heuristics to accomplish the task. The system identified and ranked at least one fully relevant evidence sentence in the top 10 retrieved sentences for 72 of the 97 BEL statements in the test set. BELTracker achieved a precision of 0.392, 0.532 and 0.615 when evaluated by the task organizers with three criteria, namely full, relaxed and context criteria, respectively. Our team at Mayo Clinic was the only participant in this task. BELTracker is available for public use as a RESTful API. Database URL: http://www.openbionlp.org:8080/BelTracker/finder/Given_BEL_Statement.

Subject(s)

Computational Biology/methods , Data Mining/methods , Internet , Natural Language Processing , Software , Data Curation , Semantics
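The first BELTracker component translates a BEL statement into an IR query. One simple way to do this, sketched below as an assumption rather than BELTracker's actual logic, is to pull the entity names out of the `NAMESPACE:value` arguments (quoted or unquoted) and join them with a Boolean operator. The pattern and the name `bel_to_query` are hypothetical; a real translation would likely also add synonyms for each entity.

```python
import re

# Matches NAMESPACE:value or NAMESPACE:"quoted value" inside a BEL term.
ENTITY_PATTERN = re.compile(r'([A-Za-z][A-Za-z0-9_]*):(?:"([^"]+)"|([^,()\s]+))')


def bel_to_query(statement: str, operator: str = "AND") -> str:
    """Build a Boolean query from the distinct entity names in a BEL statement."""
    names: list[str] = []
    for match in ENTITY_PATTERN.finditer(statement):
        name = match.group(2) or match.group(3)  # quoted value, else bare value
        if name not in names:
            names.append(name)
    # Quote multi-word names so a search engine treats them as phrases.
    terms = [f'"{n}"' if " " in n else n for n in names]
    return f" {operator} ".join(terms)
```

For example, `bel_to_query("p(HGNC:AKT1) increases act(p(HGNC:MTOR))")` returns `"AKT1 AND MTOR"`, and a quoted argument such as `CHEBI:"nitric oxide"` becomes the phrase `"nitric oxide"`.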

Assessing the Need of Discourse-Level Analysis in Identifying Evidence of Drug-Disease Relations in Scientific Literature.

Rastegar-Mojarad, Majid; Komandur Elayavilli, Ravikumar; Li, Dingcheng; Liu, Hongfang.

Stud Health Technol Inform ; 216: 539-43, 2015.

Article in English | MEDLINE | ID: mdl-26262109

ABSTRACT

Relation extraction typically involves extracting relations between two or more entities occurring within a single sentence or across multiple sentences. In this study, we investigated the significance of extracting information from multiple sentences, specifically in the context of drug-disease relation discovery. We used multiple resources, such as Semantic Medline (a literature-based resource) and Medline search (for filtering spurious results), and inferred 8,772 potential drug-disease pairs. Our analysis revealed that 6,450 (73.5%) of the 8,772 potential drug-disease relations did not occur within a single sentence. Moreover, only 537 of the drug-disease pairs matched the curated gold standard in the Comparative Toxicogenomics Database (CTD), a trusted resource for drug-disease relations. Of these 537, about 76% (407) occur across multiple sentences. Our analysis revealed that the drug-disease pairs inferred from Semantic Medline or retrieved from CTD could be extracted from multiple sentences in the literature. This highlights the need for discourse-level analysis in extracting relations from the biomedical literature.

Subject(s)

Data Mining/methods , Drug-Related Side Effects and Adverse Reactions/classification , MEDLINE , Natural Language Processing , Periodicals as Topic/classification , Semantics , Humans , Machine Learning , Needs Assessment , Science
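The central measurement here is whether a drug-disease pair co-occurs within one sentence or only across sentences. The sketch below classifies a passage that way, assuming a naive sentence splitter and case-insensitive substring matching; the function names are hypothetical, and the study itself worked from Semantic Medline and Medline search rather than this simple procedure.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): a sentence ends at ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def cooccurrence_level(text: str, drug: str, disease: str) -> str:
    """Classify where a drug-disease pair co-occurs in a passage.

    Returns "single-sentence" if both mentions appear in one sentence,
    "multi-sentence" if they only appear in different sentences,
    and "absent" if either mention is missing from the passage.
    Matching is plain substring search, so "aspirin" would also match "aspirinate".
    """
    drug, disease = drug.lower(), disease.lower()
    sentences = [s.lower() for s in split_sentences(text)]
    if any(drug in s and disease in s for s in sentences):
        return "single-sentence"
    if any(drug in s for s in sentences) and any(disease in s for s in sentences):
        return "multi-sentence"
    return "absent"
```

A pair such as metformin and diabetes in "Metformin was started. The patient's diabetes improved." is classified as `"multi-sentence"`: a sentence-level extractor would miss it, which is exactly the case the 73.5% figure counts.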

A Frequency-based Strategy of Obtaining Sentences from Clinical Data Repository for Crowdsourcing.

Li, Dingcheng; Rastegar Mojarad, Majid; Li, Yanpeng; Sohn, Sunghwan; Mehrabi, Saeed; Komandur Elayavilli, Ravikumar; Yu, Yue; Liu, Hongfang.

Stud Health Technol Inform ; 216: 1033-4, 2015.

Article in English | MEDLINE | ID: mdl-26262333

ABSTRACT

In clinical NLP, one major barrier to adopting crowdsourcing for NLP annotation is the confidentiality of protected health information (PHI) in clinical narratives. In this paper, we investigated a frequency-based approach to extracting sentences without PHI. Our approach is based on the assumption that sentences appearing frequently tend to contain no PHI. Both manual and automatic evaluations of 500 sentences drawn from the 7.9 million sentences with a frequency greater than one found no PHI among them. These promising results suggest that such sentences could potentially be released to obtain sentence-level NLP annotations via crowdsourcing.

Subject(s)

Crowdsourcing/methods , Data Interpretation, Statistical , Electronic Health Records/classification , Machine Learning , Natural Language Processing , Semantics , Language , Minnesota , Pattern Recognition, Automated/methods , Terminology as Topic , Vocabulary, Controlled
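The selection step amounts to counting identical sentences across the repository and keeping those that occur more than once. Below is a minimal sketch of that filter, assuming whitespace normalization before counting; the function names are hypothetical. Frequency alone is a heuristic, not a guarantee: the paper still checked a sample of the selected sentences for PHI manually and automatically.

```python
from collections import Counter


def normalize(sentence: str) -> str:
    # Collapse runs of whitespace so trivially different copies are counted together.
    return " ".join(sentence.split())


def frequent_sentences(sentences, min_count: int = 2) -> dict[str, int]:
    """Return sentences occurring at least min_count times, with their counts."""
    counts = Counter(normalize(s) for s in sentences)
    return {s: c for s, c in counts.items() if c >= min_count}
```

With the default `min_count=2` (frequency greater than one), a boilerplate sentence such as "No acute distress." repeated across notes is kept, while a one-off sentence mentioning a specific date or clinician is dropped.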

Comprehensive temporal information detection from clinical text: medical events, time, and TLINK identification.

Sohn, Sunghwan; Wagholikar, Kavishwar B; Li, Dingcheng; Jonnalagadda, Siddhartha R; Tao, Cui; Komandur Elayavilli, Ravikumar; Liu, Hongfang.

J Am Med Inform Assoc ; 20(5): 836-42, 2013.

Article in English | MEDLINE | ID: mdl-23558168

ABSTRACT

BACKGROUND: Mayo Clinic developed temporal information detection systems for the 2012 i2b2 Natural Language Processing Challenge. OBJECTIVE: To construct automated systems for EVENT/TIMEX3 extraction and temporal link (TLINK) identification from clinical text. MATERIALS AND METHODS: The i2b2 organizers provided 190 annotated discharge summaries as the training set and 120 discharge summaries as the test set. Our Event system used a conditional random field classifier with a variety of features, including lexical information, natural language elements and medical ontology. The TIMEX3 system employed a rule-based method using regular expression pattern matching and systematic reasoning to determine normalized values. The TLINK system employed both rule-based reasoning and machine learning. All three systems were built on the Apache Unstructured Information Management Architecture (UIMA) framework. RESULTS: Our TIMEX3 system performed best among the challenge teams (F-measure 0.900, value accuracy 0.731). The Event system achieved an F-measure of 0.870 and the TLINK system an F-measure of 0.537. CONCLUSIONS: Our TIMEX3 system demonstrated the capability of regular expression rules to extract and normalize time information. The Event and TLINK machine learning systems required well-defined feature sets to perform well. We could also leverage expert knowledge as part of the machine learning features to further improve TLINK identification performance.

Subject(s)

Artificial Intelligence , Electronic Health Records , Information Storage and Retrieval/methods , Natural Language Processing , Patient Discharge Summaries , Humans , Time
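The TIMEX3 approach described above pairs regular-expression matching with normalization to a standard value. The sketch below illustrates that pattern for just two date formats, normalizing each match to ISO 8601; it is an assumption-laden illustration, not the Mayo Clinic rule set. The patterns, the US month/day/year reading of numeric dates, and the name `extract_dates` are all hypothetical.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Assumption: numeric dates are US-style month/day/year.
NUMERIC_DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")
# "March 18, 2012"-style dates; month names matched case-insensitively.
WRITTEN_DATE = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2}),\s*(\d{4})\b", re.IGNORECASE)


def extract_dates(text: str) -> list[tuple[str, str]]:
    """Return (surface text, ISO 8601 value) pairs in order of appearance."""
    found = []
    for m in NUMERIC_DATE.finditer(text):
        month, day, year = int(m.group(1)), int(m.group(2)), int(m.group(3))
        found.append((m.start(), m.group(0), year, month, day))
    for m in WRITTEN_DATE.finditer(text):
        month = MONTHS[m.group(1).lower()]
        found.append((m.start(), m.group(0), int(m.group(3)), month, int(m.group(2))))
    results = []
    for _, surface, year, month, day in sorted(found):
        try:
            results.append((surface, date(year, month, day).isoformat()))
        except ValueError:
            continue  # skip impossible dates such as 13/45/2012
    return results
```

For "Admitted 03/14/2012; discharged March 18, 2012." the function returns `[("03/14/2012", "2012-03-14"), ("March 18, 2012", "2012-03-18")]`. Relative expressions such as "two days later", which the i2b2 task also covers, would need additional reasoning against a reference date.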
