Results 1 - 20 of 32
1.
BMC Med Inform Decis Mak ; 19(Suppl 3): 69, 2019 04 04.
Article in English | MEDLINE | ID: mdl-30943957

ABSTRACT

BACKGROUND: The Health Information Technology for Economic and Clinical Health Act (HITECH) has greatly accelerated the adoption of electronic health records (EHRs), with the promise of better clinical decisions and patient outcomes. One of the core criteria for "Meaningful Use" of EHRs is a problem list that shows the most important health problems faced by a patient. Implementing problem lists in EHRs has the potential to help practitioners provide customized care. However, it remains an open question how to leverage problem lists to provide tailored care in different practice settings, and the bottleneck lies in understanding the associations between problem lists and practice settings. METHODS: In this study, using sampled clinical documents associated with a cohort of patients who received their primary care at Mayo Clinic, we investigated the associations between problem lists and practice settings through natural language processing (NLP) and topic modeling techniques. Specifically, after practice settings and problem lists were normalized, the χ2 test, term frequency-inverse document frequency (TF-IDF), and enrichment analysis were used to choose representative concepts for each setting. Latent Dirichlet Allocation (LDA) was then used to train topic models and to predict potential practice settings using similarity metrics based on the problem concepts representative of each practice setting. Evaluation was conducted through 5-fold cross-validation, and Recall@k, Precision@k, and F1@k were calculated. RESULTS: Our method can generate prioritized and meaningful problem lists corresponding to specific practice settings. For practice setting prediction, recall increases from 0.719 (k = 2) to 0.931 (k = 10), precision increases from 0.882 (k = 2) to 0.931 (k = 10), and F1 increases from 0.790 (k = 2) to 0.931 (k = 10). 
CONCLUSION: To the best of our knowledge, our study is the first to attempt to discover the association between problem lists and hospital practice settings. In the future, we plan to investigate how to provide more tailored care by utilizing the association between problem lists and practice settings revealed in this study.


Subject(s)
Meaningful Use , Medical Informatics , Algorithms , Electronic Health Records , Hospitals , Humans , Natural Language Processing , Primary Health Care
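The Recall@k, Precision@k, and F1@k scores in the abstract above can be computed per document from a ranked list of predicted practice settings. The abstract does not give its exact formulas, so the following is a minimal Python sketch assuming the common per-list definitions (hits in the top k divided by the number of gold settings, and by the number of items returned, respectively); the function names and the setting names in the example are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

def metrics_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> Tuple[float, float, float]:
    """Recall@k, Precision@k and F1@k for one ranked list of predicted settings."""
    top_k = list(ranked)[:k]
    hits = sum(1 for item in top_k if item in relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(top_k) if top_k else 0.0
    # F1 is the harmonic mean of precision and recall; zero when nothing is hit.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return recall, precision, f1

def mean_metrics_at_k(predictions: List[Sequence[str]], gold: List[Set[str]], k: int) -> Tuple[float, float, float]:
    """Average the per-document scores over an evaluation fold."""
    per_doc = [metrics_at_k(r, g, k) for r, g in zip(predictions, gold)]
    n = len(per_doc)
    return (
        sum(m[0] for m in per_doc) / n,
        sum(m[1] for m in per_doc) / n,
        sum(m[2] for m in per_doc) / n,
    )
```

Averaging F1 per document, as in this sketch, generally differs from computing F1 from averaged precision and recall, so reported F1 values need not equal the harmonic mean of the reported precision and recall.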
2.
J Biomed Inform ; 87: 12-20, 2018 11.
Article in English | MEDLINE | ID: mdl-30217670

ABSTRACT

BACKGROUND: Word embeddings have been widely used in biomedical Natural Language Processing (NLP) applications because their vector representations can capture useful semantic properties and linguistic relationships between words. Different textual resources (e.g., Wikipedia and biomedical literature corpora) have been used in biomedical NLP to train word embeddings, and these embeddings are commonly leveraged as feature input to downstream machine learning models. However, there has been little work evaluating word embeddings trained from different textual resources. METHODS: In this study, we empirically evaluated word embeddings trained from four different corpora, namely clinical notes, biomedical publications, Wikipedia, and news. For the former two resources, we trained word embeddings using unstructured electronic health record (EHR) data available at Mayo Clinic and articles (MedLit) from PubMed Central, respectively. For the latter two, we used publicly available pre-trained word embeddings, GloVe and Google News. The evaluation was done qualitatively and quantitatively. For the qualitative evaluation, we randomly selected medical terms from three categories (i.e., disorder, symptom, and drug) and manually inspected the five most similar words computed by the embeddings for each term. We also analyzed the word embeddings through a 2-dimensional visualization plot of 377 medical terms. For the quantitative evaluation, we conducted both intrinsic and extrinsic evaluations. For the intrinsic evaluation, we evaluated the word embeddings' ability to capture medical semantics by measuring the semantic similarity between medical terms using four published datasets: Pedersen's dataset, Hliaoutakis's dataset, MayoSRS, and UMNSRS. 
For the extrinsic evaluation, we applied word embeddings to multiple downstream biomedical NLP applications, including clinical information extraction (IE), biomedical information retrieval (IR), and relation extraction (RE), with data from shared tasks. RESULTS: The qualitative evaluation shows that the word embeddings trained from EHR and MedLit find more similar medical terms than those trained from GloVe and Google News. The intrinsic quantitative evaluation shows that the semantic similarity captured by the word embeddings trained from EHR is closer to human experts' judgments on all four tested datasets. The extrinsic quantitative evaluation shows that the word embeddings trained on EHR achieved the best F1 score of 0.900 for the clinical IE task; no word embeddings improved performance on the biomedical IR task; and the word embeddings trained on Google News had the best overall F1 score of 0.790 for the RE task. CONCLUSION: Based on the evaluation results, we draw the following conclusions. First, the word embeddings trained from EHR and MedLit capture the semantics of medical terms better, and find semantically related medical terms closer to human experts' judgments, than those trained from GloVe and Google News. Second, there is no consistent global ranking of word embeddings across all downstream biomedical NLP applications. However, adding word embeddings as extra features improves results on most downstream tasks. Finally, word embeddings trained from biomedical domain corpora do not necessarily perform better than those trained from general domain corpora for any given downstream biomedical NLP task.


Subject(s)
Electronic Health Records , Machine Learning , Medical Informatics/methods , Natural Language Processing , Unified Medical Language System , Adolescent , Adult , Aged , Family Health , Female , Humans , Information Storage and Retrieval , Linguistics , Male , Middle Aged , Minnesota , Models, Statistical , Probability , PubMed , Semantics , Young Adult
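The intrinsic evaluation in the abstract above compares embedding similarities with expert ratings. A minimal Python sketch, assuming cosine similarity between term vectors and Pearson correlation against the human scores (Spearman rank correlation is an equally common choice); the function names are hypothetical, and the vectors and ratings in the example are made-up toy values, not data from MayoSRS or UMNSRS.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two scored pairs")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sx * sy)

def intrinsic_score(
    embeddings: Dict[str, List[float]],
    rated_pairs: List[Tuple[str, str, float]],
) -> float:
    """Correlate embedding cosine similarities with human similarity ratings.

    Pairs containing an out-of-vocabulary term are skipped.
    """
    model, human = [], []
    for term_a, term_b, rating in rated_pairs:
        if term_a in embeddings and term_b in embeddings:
            model.append(cosine(embeddings[term_a], embeddings[term_b]))
            human.append(rating)
    return pearson(model, human)
```

Skipping out-of-vocabulary pairs means embeddings with smaller vocabularies are scored on fewer pairs, so coverage is worth reporting alongside the correlation.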
3.
J Biomed Inform ; 77: 34-49, 2018 01.
Article in English | MEDLINE | ID: mdl-29162496

ABSTRACT

BACKGROUND: With the rapid adoption of electronic health records (EHRs), it is desirable to harvest information and knowledge from EHRs to support automated systems at the point of care and to enable secondary use of EHRs for clinical and translational research. One critical component used to facilitate the secondary use of EHR data is information extraction (IE), which automatically extracts and encodes clinical information from text. OBJECTIVES: In this literature review, we present a review of recently published research on clinical IE applications. METHODS: A literature search was conducted for articles published from January 2009 to September 2016 in Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and ACM Digital Library. RESULTS: A total of 1917 publications were identified for title and abstract screening. Of these publications, 263 articles were selected and are discussed in this review in terms of publication venues and data sources, clinical IE tools, methods, and applications in the areas of disease- and drug-related studies and clinical workflow optimization. CONCLUSIONS: Clinical IE has been used for a wide range of applications; however, there is a considerable gap between clinical studies using EHR data and studies using clinical IE. This study enabled us to gain a more concrete understanding of the gap and to provide potential solutions to bridge it.


Subject(s)
Electronic Health Records , Information Storage and Retrieval/methods , Medical Informatics/trends , Humans , Meaningful Use , Natural Language Processing , Research Design
4.
J Med Internet Res ; 19(10): e342, 2017 10 16.
Article in English | MEDLINE | ID: mdl-29038097

ABSTRACT

BACKGROUND: Self-management is crucial to diabetes care, and providing expert-vetted content for answering patients' questions is crucial in facilitating patient self-management. OBJECTIVE: The aim is to investigate the use of information retrieval techniques in recommending patient education materials for patients' diabetes-related questions. METHODS: We compared two retrieval algorithms, one based on Latent Dirichlet Allocation topic modeling (topic modeling-based model) and one based on semantic groups (semantic group-based model), with a baseline retrieval model, the vector space model (VSM), in recommending diabetic patient education materials for diabetes-related questions posted on the TuDiabetes forum. The evaluation was based on a gold standard dataset of 50 randomly selected diabetes-related questions, for which the relevance of diabetic education materials to each question was manually assigned by two experts. Performance was assessed using the precision of top-ranked documents. RESULTS: We retrieved 7510 diabetes-related questions from the forum and 144 diabetic patient education materials from the patient education database at Mayo Clinic. The rate at which words in each corpus mapped to the Unified Medical Language System (UMLS) differed significantly between the two corpora (P<.001). The topic modeling-based model outperformed the other retrieval algorithms. For example, for the top-retrieved document, the precision of the topic modeling-based, semantic group-based, and VSM models was 67.0%, 62.8%, and 54.3%, respectively. CONCLUSIONS: This study demonstrated that topic modeling can mitigate vocabulary differences and achieved the best performance in recommending education materials for answering patients' questions. One direction for future work is to assess the generalizability of our findings and to extend our study to other disease areas, other patient education material resources, and other online forums.


Subject(s)
Diabetes Mellitus/therapy , Information Storage and Retrieval/methods , Patient Education as Topic/methods , Databases, Factual , Humans , Surveys and Questionnaires
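The vector space model (VSM) baseline and the precision of top-ranked documents described in the abstract above can be sketched in a few lines of Python. This is a minimal illustration assuming raw term frequency weighted by a smoothed IDF, log(N/df) + 1, with cosine ranking; the abstract does not specify the weighting, the function names are hypothetical, and the question and material titles in the example are invented stand-ins for TuDiabetes posts and Mayo Clinic education materials.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Sequence, Set

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())

def build_idf(docs: List[List[str]]) -> Dict[str, float]:
    # Smoothed IDF: log(N / df) + 1, so terms present in every document keep weight 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}

def tfidf(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    # Query terms unseen in the collection have no IDF and are dropped.
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}

def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_materials(question: str, materials: List[str]) -> List[int]:
    """Return material indices ordered by cosine similarity to the question."""
    docs = [tokenize(m) for m in materials]
    idf = build_idf(docs)
    doc_vecs = [tfidf(d, idf) for d in docs]
    q_vec = tfidf(tokenize(question), idf)
    scores = [cosine(q_vec, d) for d in doc_vecs]
    return sorted(range(len(materials)), key=lambda i: scores[i], reverse=True)

def precision_at_k(ranking: Sequence[int], relevant: Set[int], k: int) -> float:
    return sum(1 for i in ranking[:k] if i in relevant) / k
```

Because a pure VSM only rewards exact word matches, a question saying "check" gains nothing from a material saying "checking"; this is the kind of vocabulary mismatch the abstract reports topic modeling mitigating.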
5.
Bioinformatics ; 31(12): 1981-7, 2015 Jun 15.
Article in English | MEDLINE | ID: mdl-25657332

ABSTRACT

MOTIVATION: Genome-wide association studies (GWASs) are effective for describing the genetic complexity of common diseases. Phenome-wide association studies (PheWASs) offer an alternative and complementary approach to GWAS, using data embedded in the electronic health record (EHR) to define the phenome. International Classification of Diseases, Ninth Revision (ICD9) codes are frequently used to define the phenome, but using ICD9 codes alone misses other clinically relevant information in the EHR that can be used for PheWAS analyses and discovery. RESULTS: As an alternative to ICD9 coding, a text-based phenome was defined by 23 384 clinically relevant terms extracted from Marshfield Clinic's EHR. Five single nucleotide polymorphisms (SNPs) with known phenotypic associations were genotyped in 4235 individuals and tested for association across the text-based phenome. All five genotyped SNPs were associated with the expected terms (P<0.02), most at or near the top of their respective PheWAS rankings. Raw association results indicate that text data performed equivalently to ICD9 coding and demonstrate the utility of information beyond ICD9 coding for PheWAS.


Subject(s)
Computational Biology/methods , Disease/genetics , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study , Phenotype , Polymorphism, Single Nucleotide/genetics , Genotype , Humans
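Associating one SNP across a text-based phenome, as in the abstract above, amounts to one test per term followed by ranking the terms by p-value. The abstract does not state the test used; PheWAS analyses commonly fit a logistic regression per phenotype with covariates, so the following Python sketch is a simplified stand-in: a carrier versus non-carrier 2x2 Pearson chi-square test (1 degree of freedom) for each term. The function names and the data in the example are hypothetical, and small counts like those in the example would call for an exact test in practice.

```python
import math
from typing import Dict, List, Set, Tuple

def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows: carriers / non-carriers; columns: term present / absent.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def p_value_1df(chi2: float) -> float:
    # Upper tail of the chi-square distribution with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

def phewas_scan(
    carrier: Dict[str, bool],
    terms_by_person: Dict[str, Set[str]],
    phenome: List[str],
) -> List[Tuple[str, float]]:
    """Test every phenome term against carrier status; return (term, p) sorted by p."""
    results = []
    carriers = [p for p, is_carrier in carrier.items() if is_carrier]
    others = [p for p, is_carrier in carrier.items() if not is_carrier]
    for term in phenome:
        a = sum(1 for p in carriers if term in terms_by_person.get(p, set()))
        c = sum(1 for p in others if term in terms_by_person.get(p, set()))
        chi2 = chi2_2x2(a, len(carriers) - a, c, len(others) - c)
        results.append((term, p_value_1df(chi2)))
    return sorted(results, key=lambda r: r[1])
```

With tens of thousands of terms, as in the text-based phenome above, a multiple-testing correction would normally be applied before calling any single association significant.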
6.
J Biomed Inform ; 55: 206-17, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25917055

ABSTRACT

Although potential drug-drug interactions (PDDIs) are a significant source of preventable drug-related harm, there is currently no single complete source of PDDI information. In the current study, all publicly available sources of PDDI information that could be identified using a comprehensive and broad search were combined into a single dataset. The combined dataset merged fourteen different sources, including 5 clinically oriented information sources, 4 Natural Language Processing (NLP) corpora, and 5 bioinformatics/pharmacovigilance information sources. As a comprehensive PDDI source, the merged dataset might benefit the pharmacovigilance text mining community by making it possible to compare the representativeness of NLP corpora for PDDI text extraction tasks and by specifying elements that can be useful for future PDDI extraction. An analysis of the overlap between and across the data sources showed that there was little overlap. Even comprehensive PDDI lists such as DrugBank, KEGG, and the NDF-RT had less than 50% overlap with each other. Moreover, all of the comprehensive lists had incomplete coverage of two data sources that focus on PDDIs of interest in most clinical settings. Based on this information, we think that systems that provide access to the comprehensive lists, such as APIs into RxNorm, should be careful to inform users that the lists may be incomplete with respect to PDDIs that drug experts suggest clinicians be aware of. In spite of the low degree of overlap, several dozen cases were identified where PDDI information provided in drug product labeling might be augmented by the merged dataset. Moreover, the combined dataset was also shown to improve the performance of an existing PDDI NLP pipeline and a recently published PDDI pharmacovigilance protocol. 
Future work will focus on improvement of the methods for mapping between PDDI information sources, identifying methods to improve the use of the merged dataset in PDDI NLP algorithms, integrating high-quality PDDI information from the merged dataset into Wikidata, and making the combined dataset accessible as Semantic Web Linked Data.


Subject(s)
Adverse Drug Reaction Reporting Systems/organization & administration , Data Mining/methods , Database Management Systems/organization & administration , Databases, Factual , Drug Interactions , Natural Language Processing , Internet/organization & administration , Machine Learning , Medical Record Linkage/methods , Pharmacovigilance
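Measuring overlap between PDDI sources, as in the abstract above, first requires normalizing drug pairs so that the order of the two drugs and spelling case do not matter. A minimal Python sketch, assuming overlap is reported directionally as the share of one source's pairs found in another (the abstract does not define the measure) and that drug names are already mapped to a common vocabulary; the function names are hypothetical, and the two sources in the example are toy lists, not DrugBank, KEGG, or NDF-RT data.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Pair = FrozenSet[str]

def normalize(pairs: Iterable[Tuple[str, str]]) -> Set[Pair]:
    """Make (drug1, drug2) and (drug2, drug1) identical and ignore case and padding."""
    return {frozenset((a.strip().lower(), b.strip().lower())) for a, b in pairs}

def pairwise_overlap(sources: Dict[str, Set[Pair]]) -> Dict[Tuple[str, str], float]:
    """Share of source X's interaction pairs that also appear in source Y, for every ordered (X, Y)."""
    result: Dict[Tuple[str, str], float] = {}
    for x, y in combinations(sorted(sources), 2):
        shared = len(sources[x] & sources[y])
        result[(x, y)] = shared / len(sources[x]) if sources[x] else 0.0
        result[(y, x)] = shared / len(sources[y]) if sources[y] else 0.0
    return result
```

Reporting both directions matters because a small list can be almost entirely contained in a large one while covering only a sliver of it.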
7.
Database (Oxford) ; 2019, 2019 01 01.
Article in English | MEDLINE | ID: mdl-31603193

ABSTRACT

Knowledge of the molecular interactions of biological and chemical entities, and of their involvement in biological processes or clinical phenotypes, is important for data interpretation. Unfortunately, this knowledge is mostly embedded in the literature in a form that is unavailable to automated data analysis procedures. Biological Expression Language (BEL) is a syntax that allows the structured representation of a broad range of biological relationships. It is used in various settings to extract such knowledge and transform it into BEL networks. To support the tedious and time-intensive extraction work of curators with automated methods, we developed the BEL track within the framework of the BioCreative challenges. Within the BEL track, we provide training data and an evaluation environment to encourage the text mining community to tackle the automatic extraction of complex BEL relationships. At BioCreative VI in 2017, the BEL track first held in 2015 was repeated with new test data. Although only minor improvements in retrieving text snippets for given statements were achieved during this second iteration of the BEL task, BEL statement extraction from provided sentences improved significantly. The best-performing system reached an F-score of 32% for the extraction of complete BEL statements, which increased to 49% when named entities were given. This time, besides rule-based systems, new methods involving hierarchical sequence labeling and neural networks were applied to BEL statement extraction.


Subject(s)
Data Mining , Databases, Factual , Neural Networks, Computer , Vocabulary, Controlled
8.
JMIR Cancer ; 4(1): e10, 2018 May 15.
Article in English | MEDLINE | ID: mdl-29764801

ABSTRACT

BACKGROUND: Patient education materials given to breast cancer survivors may not be a good fit for their information needs. Needs may change over time, be forgotten, or be misreported, for a variety of reasons. An automated content analysis of survivors' postings to online health forums can identify expressed information needs over a span of time and be repeated regularly at low cost. Identifying these unmet needs can guide improvements to existing education materials and the creation of new resources. OBJECTIVE: The primary goals of this project are to assess the unmet information needs of breast cancer survivors from their own perspectives and to identify gaps between information needs and current education materials. METHODS: This approach applies computational methods for content modeling and supervised text classification to data from online health forums to identify explicit and implicit requests for health-related information. Potential gaps between needs and education materials are identified using techniques from information retrieval. RESULTS: We provide a new taxonomy for the classification of sentences in online health forum data. We selected 260 postings from two online health forums, yielding 4179 sentences for coding. After annotating the data and training alternative one-versus-others classifiers, a random forest-based approach achieved F1 scores from 66% (Other, dataset2) to 90% (Medical, dataset1) on the primary information types. A total of 136 expressions of need were used to generate queries to indexed education materials. On examining the top two pages retrieved for each query, all coders judged 12% (17/136) of queries to have relevant content, and at least one coder judged 33% (45/136) to have relevant content. CONCLUSIONS: Text from online health forums can be analyzed effectively using automated methods. 
Our analysis confirms that breast cancer survivors have many information needs that are not covered by the written documents they typically receive, as our results suggest that at most a third of breast cancer survivors' questions would be addressed by the materials currently provided to them.

9.
Database (Oxford) ; 2018, 2018 01 01.
Article in English | MEDLINE | ID: mdl-30295724

ABSTRACT

Relation extraction is an important task in natural language processing. In this paper, we describe our approach to BioCreative VI Task 5: text mining chemical-protein interactions. We investigate multiple deep neural network (DNN) models, including convolutional neural networks, recurrent neural networks (RNNs), and attention-based RNNs (ATT-RNNs), to extract chemical-protein relations. Our experimental results indicate that ATT-RNN models outperform the same models without attention, and that the attention-based gated recurrent unit (ATT-GRU) achieves the best micro-averaged F1 score among the tested DNNs, 0.527 on the test set. In addition, analysis of the word-level attention weights shows that the attention mechanism is effective at selecting the most important trigger words when trained with semantic relation labels, without the need for semantic parsing or feature engineering. The source code of this work is available at https://github.com/ohnlp/att-chemprot.


Subject(s)
Algorithms , Databases, Chemical , Databases, Protein , Neural Networks, Computer , Proteins/chemistry
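The micro-averaged F1 score reported in the abstract above pools true positives, false positives, and false negatives across all relation classes before computing precision and recall. A minimal Python sketch, assuming a negative label (here the hypothetical "NONE") that is excluded from scoring; the positive set in the example assumes the five evaluated ChemProt groups (CPR:3, 4, 5, 6, 9), the function name is hypothetical, and the gold and predicted labels are toy values.

```python
from typing import Sequence, Set

def micro_f1(gold: Sequence[str], pred: Sequence[str], positive: Set[str]) -> float:
    """Micro-averaged F1 over the positive relation classes.

    Counts are pooled across classes. Predicting a negative label contributes
    no false positive; confusing two positive classes yields one FP and one FN.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p in positive and p == g:
            tp += 1
            continue
        if p in positive:
            fp += 1  # wrong class, or a relation predicted where none exists
        if g in positive:
            fn += 1  # gold relation missed or assigned the wrong class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

Unlike a macro average, this pooled score is dominated by the most frequent classes, which is worth keeping in mind when comparing systems on imbalanced relation types.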
10.
Front Pharmacol ; 9: 875, 2018.
Article in English | MEDLINE | ID: mdl-30131701

ABSTRACT

Multiple data sources are preferred in adverse drug event (ADE) surveillance owing to the inadequacies of any single source. However, analytic methods to monitor potential ADEs after prolonged drug exposure are still lacking. In this study, we propose a method to screen for potential ADEs by combining the FDA Adverse Event Reporting System (FAERS) and Electronic Medical Records (EMRs). The proposed method uses natural language processing (NLP) techniques to extract treatment outcome information captured in unstructured text and adopts a case-crossover design in the EMR. Performance was evaluated using two ADE knowledge bases: the Adverse Drug Reaction Classification System (ADReCS) and SIDER. We tested our method on ADE signal detection for conventional disease-modifying antirheumatic drugs (DMARDs) in rheumatoid arthritis patients. Recall increased greatly when FAERS was combined with the EMR, compared with FAERS alone or the EMR alone, especially under the flexible mapping strategy. Precision (FAERS + EMR) in detecting ADEs was higher when ADReCS was used as the gold standard than when SIDER was used. In addition, signals detected from the EMR overlapped considerably with signals detected from FAERS or the ADE knowledge bases, implying the importance of the EMR for pharmacovigilance. ADE signals detected from the EMR and/or FAERS but absent from existing knowledge bases provide hypotheses for future study.
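In a case-crossover design, each patient who experienced the event serves as their own control: drug exposure in a hazard window just before the event is compared with exposure in an earlier control window for the same patient. The abstract above does not give its windows or estimator, so this Python sketch assumes the textbook 1:1 matched form, in which the odds ratio is the number of patients exposed only in the hazard window divided by the number exposed only in the control window, with an approximate confidence interval on the log scale. The function name and the counts in the example are hypothetical.

```python
import math
from typing import List, Tuple

def case_crossover_or(windows: List[Tuple[bool, bool]], z: float = 1.96) -> Tuple[float, float, float]:
    """Matched-pair odds ratio with an approximate 95% confidence interval.

    windows holds one (exposed_in_hazard_window, exposed_in_control_window)
    tuple per patient who had the event. Concordant patients carry no information.
    """
    hazard_only = sum(1 for hazard, control in windows if hazard and not control)
    control_only = sum(1 for hazard, control in windows if control and not hazard)
    if hazard_only == 0 or control_only == 0:
        raise ValueError("odds ratio undefined without discordant patients in both directions")
    odds_ratio = hazard_only / control_only
    # Standard error of log(OR) for matched pairs: sqrt(1/b + 1/c).
    se = math.sqrt(1 / hazard_only + 1 / control_only)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper
```

Because each patient is compared with themselves, time-invariant confounders such as sex or genetics cancel out, which is one reason the design suits EMR data.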

11.
Database (Oxford) ; 2017(1), 2017 01 01.
Article in English | MEDLINE | ID: mdl-28365720

ABSTRACT

Extracting meaningful relationships with semantic significance from biomedical literature is often challenging. The BioCreative V Track 4 challenge organized, for the first time, a comprehensive shared task to test the robustness of text-mining algorithms in extracting semantically meaningful assertions from evidence statements in biomedical text. In this work, we tested the ability of a rule-based semantic parser to extract Biological Expression Language (BEL) statements from evidence sentences culled from biomedical literature as part of the BioCreative V Track 4 challenge. The system achieved an overall best F-measure of 21.29% in extracting the complete BEL statement. For relation extraction, the system achieved an F-measure of 65.13% on the test data set. Our system achieved the best performance in five of the six criteria adopted for evaluation by the task organizers. An inability to derive semantic inferences and limitations in the rule sets that map textual extractions to BEL functions were among the reasons for the low performance in extracting complete BEL statements. After the shared task, we also evaluated the impact of different NER components on the ability to extract BEL statements from the test data sets, and made a single change in the rule sets that translate relation extractions into BEL statements. Together these yielded a marked improvement of over 20% in BELMiner's overall performance in extracting BEL statements on the test set. The system is available as a REST API at http://54.146.11.205:8484/BELXtractor/finder/. Database URL: http://54.146.11.205:8484/BELXtractor/finder/.


Subject(s)
Data Mining/methods , Natural Language Processing , Software , Semantics
12.
AMIA Jt Summits Transl Sci Proc ; 2017: 95-103, 2017.
Article in English | MEDLINE | ID: mdl-28815115

ABSTRACT

The use of multiple data sources has been preferred in the surveillance of adverse drug events (ADEs) because of the shortcomings of any single source. In this study, we proposed a framework in which ADEs associated with drugs of interest are systematically discovered from the FDA's Adverse Event Reporting System (AERS) and then validated by mining unstructured clinical notes from Electronic Medical Records (EMRs). This framework has two features. First, higher priority was given to clinical practice during signal detection and validation. Second, normalization by NLP facilitated interoperation between AERS-DM and the EMR. To demonstrate this methodology, we investigated potential ADEs associated with drugs (at the class level) in rheumatoid arthritis (RA) patients. The results demonstrated the feasibility and sufficient accuracy of the framework. The framework can serve as an interface between the informatics and medical domains to facilitate ADE discovery.

13.
Database (Oxford) ; 2017, 2017 Jan 01.
Article in English | MEDLINE | ID: mdl-31725862

ABSTRACT

The recent movement towards open data in the biomedical domain has generated a large number of publicly accessible datasets. The Big Data to Knowledge data indexing project, biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE), has gathered these datasets in a one-stop portal aimed at facilitating their reuse to accelerate scientific advances. However, as the number of biomedical datasets stored and indexed increases, it becomes increasingly challenging to retrieve the datasets relevant to researchers' queries. In this article, we propose an information retrieval (IR) system to tackle this problem and implement it for the bioCADDIE Dataset Retrieval Challenge. The system leverages the unstructured text of each dataset, including its title and description, and uses a state-of-the-art IR model, medical named entity extraction techniques, query expansion with deep learning-based word embeddings, and a re-ranking strategy to enhance retrieval performance. In empirical experiments, we compared the proposed system with 11 baseline systems using the bioCADDIE Dataset Retrieval Challenge datasets. The experimental results show that the proposed system outperforms the other systems in terms of inferred Average Precision and inferred normalized Discounted Cumulative Gain, implying that it is a viable option for biomedical dataset retrieval. Database URL: https://github.com/yanshanwang/biocaddie2016mayodata.
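The normalized Discounted Cumulative Gain used in the challenge described above can be illustrated with its standard, non-inferred form. Inferred nDCG additionally estimates the score from a sample of relevance judgments, which this minimal Python sketch does not attempt; it assumes linear gains and a log2(rank + 1) discount, and the function names, document IDs, and graded judgments are hypothetical toy values.

```python
import math
from typing import Dict, Sequence

def dcg(gains: Sequence[float]) -> float:
    # The document at 1-based rank i is discounted by log2(i + 1).
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

def ndcg_at_k(ranking: Sequence[str], qrels: Dict[str, int], k: int) -> float:
    """nDCG@k with linear gains; unjudged documents count as non-relevant."""
    gains = [qrels.get(doc, 0) for doc in ranking[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0
```

Treating unjudged documents as non-relevant penalizes systems that retrieve documents the assessors never saw, which is the bias the inferred measures are designed to reduce.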

14.
AMIA Jt Summits Transl Sci Proc ; 2017: 104-113, 2017.
Article in English | MEDLINE | ID: mdl-28815117

ABSTRACT

Family history is an important component of modern clinical care, especially in the era of precision medicine. Family history information in the Electronic Health Record (EHR) system is usually stored in structured format as well as in free-text format. In this study, we systematically analyzed a family history text corpus drawn from 3 million clinical notes of patients receiving their primary care at Mayo Clinic. Family members, medical problems, and their associations were analyzed and reported. Our findings showed strong agreement between positive/negated medical problems mentioned in the diagnosis report and in the family history, as measured by observed agreement and random agreement. We also found that a family history of some medical problems was documented up to 10-15 years before the diagnosis date of those problems. Finally, two patient cases were studied to show the medical problems in the diagnosis and family history along a timeline.
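Observed agreement and random (chance) agreement, mentioned in the abstract above, are the two ingredients of Cohen's kappa. A minimal Python sketch, assuming each item is a medical problem whose status (positive or negated) is recorded both in the diagnosis report and in the family history; the abstract does not say that kappa itself was reported, and the function name and labels in the example are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple

def agreement(labels_a: Sequence[str], labels_b: Sequence[str]) -> Tuple[float, float, float]:
    """Observed agreement p_o, chance ("random") agreement p_e, and Cohen's kappa."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both sources pick the same label independently.
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0
    return p_o, p_e, kappa
```

Comparing p_o with p_e matters when one status dominates: if nearly every problem is positive in both sources, high observed agreement can arise largely by chance.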

15.
Int J Med Inform ; 108: 78-84, 2017 12.
Article in English | MEDLINE | ID: mdl-29132635

ABSTRACT

Clinical registries are designed to collect information relating to a particular condition for research or quality improvement. Intuitively, informatics in the area of data management and extraction plays a central role in clinical registries. For various reasons, such as a lack of informatics awareness or expertise, there may be little informatics involvement in designing clinical registries. In this paper, we studied a clinical registry from two critical perspectives where informatics can play a role: data quality and interoperability. We evaluated these two aspects of an existing registry, the Gynecology Surgery Registry, by mapping the data elements and value sets used in the registry to a standardized terminology, SNOMED-CT. The results showed that the majority of the values are ad hoc and that only 6 of 91 procedures in the registry could be mapped to SNOMED-CT. To tackle this issue, we assessed the feasibility of an automated data abstraction process by training machine learning classifiers on existing manually extracted data. These classifiers achieved a reasonable average F-measure of 0.94. We concluded that more informatics engagement is needed to improve the interoperability, reusability, and quality of the registry.


Subject(s)
Health Information Interoperability/standards , Information Storage and Retrieval/methods , Quality Improvement , Registries/standards , Systematized Nomenclature of Medicine , Humans , Machine Learning
16.
AMIA Jt Summits Transl Sci Proc ; 2017: 522-530, 2017.
Article in English | MEDLINE | ID: mdl-28815152

ABSTRACT

Physicians' biographical pages are essential in providing information about their specialties. However, some physicians do not have biographical pages, or their current pages are not comprehensive. We hypothesize that physicians' specialty information can be mined from the Electronic Medical Records (EMRs) of their patients. We proposed an automated physician specialty populating (PSP) system that analyzes physician-ascertained diagnoses in EMRs, aggregates them to an appropriate granularity based on the current biographical pages, and populates the biographical pages accordingly. In this study, we applied the system to EMR data from Mayo Clinic and evaluated it against the current biographical pages under various ranking strategies. Preliminary results demonstrated that using EMR data is a scalable and systematic way to populate physicians' biographical pages.

17.
Article in English | MEDLINE | ID: mdl-27173525

ABSTRACT

Biological Expression Language (BEL) is one of the main formal representation models of biological networks. The primary source of information for curating biological networks in BEL representation has been the literature. It remains a challenge to identify relevant articles and the corresponding evidence statements for curating and validating BEL statements. In this paper, we describe BELTracker, a tool for retrieving and ranking evidence sentences from PubMed abstracts and full-text articles for a given BEL statement (per the 2015 task requirements of the BioCreative V BEL task). The system comprises three main components: (i) translation of a given BEL statement into an information retrieval (IR) query, (ii) retrieval of relevant PubMed citations, and (iii) finding and ranking the evidence sentences in those citations. BELTracker combines multiple approaches based on traditional IR, machine learning, and heuristics to accomplish the task. The system identified and ranked at least one fully relevant evidence sentence among the top 10 retrieved sentences for 72 of the 97 BEL statements in the test set. BELTracker achieved precisions of 0.392, 0.532, and 0.615 when evaluated by the task organizers with three criteria, namely the full, relaxed, and context criteria, respectively. Our team at Mayo Clinic was the only participant in this task. BELTracker is publicly available as a RESTful API. Database URL: http://www.openbionlp.org:8080/BelTracker/finder/Given_BEL_Statement.


Subject(s)
Computational Biology/methods , Data Mining/methods , Internet , Natural Language Processing , Software , Data Curation , Semantics
18.
JMIR Res Protoc ; 5(2): e121, 2016 Jun 16.
Article in English | MEDLINE | ID: mdl-27311964

ABSTRACT

BACKGROUND: Drug repurposing (defined as discovering new indications for existing drugs) could play a significant role in drug development, especially considering the declining success rates of developing novel drugs. Typically, new indications for existing medications are identified by accident. However, new technologies and a large number of available resources enable the development of systematic approaches to identify and validate drug-repurposing candidates. Patients today report their experiences with medications on social media and reveal side effects as well as beneficial effects of those medications. OBJECTIVE: Our aim was to assess the feasibility of using patient reviews from social media to identify potential candidates for drug repurposing. METHODS: We retrieved patient reviews of 180 medications from an online forum, WebMD. Using dictionary-based and machine learning approaches, we identified disease names in the reviews. Several publicly available resources were used to exclude comments containing known indications and adverse drug effects. After manually reviewing some of the remaining comments, we implemented a rule-based system to identify beneficial effects. RESULTS: The dictionary-based system and machine learning system identified 2178 and 6171 disease names respectively in 64,616 patient comments. We provided a list of 10 common patterns that patients used to report any beneficial effects or uses of medication. After manually reviewing the comments tagged by our rule-based system, we identified five potential drug repurposing candidates. CONCLUSIONS: To our knowledge, this is the first study to consider using social media data to identify drug-repurposing candidates. We found that even a rule-based system, with a limited number of rules, could identify beneficial effect mentions in patient comments. Our preliminary study shows that social media has the potential to be used in drug repurposing.

19.
AMIA Annu Symp Proc ; 2016: 789-798, 2016.
Article in English | MEDLINE | ID: mdl-28269875

ABSTRACT

Classification of drug-drug interactions (DDIs) from the medical literature is important for preventing medication-related errors. Most existing machine learning approaches are based on supervised learning. However, the dynamic nature of drug knowledge, combined with the enormous and rapidly growing volume of biomedical literature, makes supervised DDI classification methods prone to overfitting their training corpora, so they may not meet the needs of real-world applications. In this paper, we propose a relation classification framework based on topic modeling (RelTM), augmented with distant supervision, for the task of DDI classification from biomedical text. The uniqueness of RelTM lies in its two-level sampling from both DDIs and drug entities. Through this design, RelTM takes both relation features and drug mention features into consideration. An efficient inference algorithm for the model using Gibbs sampling is also proposed. Compared to previous supervised models, our approach does not require human effort such as annotation and labeling, which is an advantage in big data applications. Meanwhile, the combination with distant supervision allows RelTM to incorporate rich existing knowledge resources provided by domain experts. The experimental results on the 2013 DDI challenge corpus reach an F1 score of 48%, showing the effectiveness of RelTM.
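Distant supervision, which the abstract credits with removing the need for manual annotation, is commonly implemented by aligning text with an existing knowledge base: every pair of drugs co-mentioned in a sentence receives whatever relation the knowledge base records for that pair. The sketch below shows only this generic labeling heuristic, not RelTM's topic model or its Gibbs sampler. The knowledge base `KNOWN_DDIS`, its interaction-type labels, and the function `distant_labels` are hypothetical and illustrative.

```python
from itertools import combinations

# Hypothetical knowledge base: unordered drug pair -> interaction type.
# The pairs and labels are illustrative, not taken from the paper.
KNOWN_DDIS = {
    frozenset({"warfarin", "aspirin"}): "effect",
    frozenset({"ketoconazole", "simvastatin"}): "mechanism",
}


def distant_labels(sentence, drug_mentions, kb):
    """Label each co-mentioned drug pair in a sentence via the knowledge base.

    Pairs that the knowledge base does not contain are labeled "none".
    Returns a list of (sentence, drug_a, drug_b, label) tuples.
    """
    drugs = sorted({d.lower() for d in drug_mentions})  # dedupe, normalize case
    labeled = []
    for a, b in combinations(drugs, 2):
        label = kb.get(frozenset({a, b}), "none")
        labeled.append((sentence, a, b, label))
    return labeled
```

The labels produced this way are noisy: a sentence mentioning two interacting drugs does not necessarily describe their interaction. Probabilistic models such as the one the abstract proposes are one way to tolerate that noise.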


Subject(s)
Algorithms, Drug Interactions, Machine Learning, Humans, Information Storage and Retrieval/methods
20.
Article in English | MEDLINE | ID: mdl-27189611

ABSTRACT

Over recent decades, the process of discovering new drugs has been extremely costly and slow despite enormous investment in pharmaceutical research. Drug repurposing enables researchers to speed up the discovery of other conditions that existing drugs can effectively treat, at low cost and with fast FDA approval. Here, we introduce 'RE:fine Drugs', a freely available interactive website for integrated search and discovery of drug-repurposing candidates from GWAS and PheWAS repurposing datasets constructed using methods previously reported in Nature Biotechnology. 'RE:fine Drugs' demonstrates how candidates for drug repurposing can be identified and prioritized by novelty based on the theory of transitive Drug-Gene-Disease triads. This public website provides a starting point for the research, industry, clinical and regulatory communities to accelerate the investigation and validation of new therapeutic uses of old drugs. Database URL: http://drug-repurposing.nationwidechildrens.org.
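The transitive Drug-Gene-Disease triad idea can be sketched as a join of two relations: if a drug targets a gene and that gene is associated with a disease (for example, through a GWAS or PheWAS signal), the drug becomes a candidate for that disease. The sketch below assumes toy in-memory dictionaries; the function name `repurposing_candidates`, its inputs, and the ranking by number of supporting genes are all assumptions for illustration, not the scoring used by 'RE:fine Drugs'.

```python
from collections import defaultdict


def repurposing_candidates(drug_targets, gene_diseases, known_indications):
    """Infer drug -> disease candidates through shared genes (triads).

    drug_targets:      dict mapping drug -> set of target genes
    gene_diseases:     dict mapping gene -> set of associated diseases
    known_indications: dict mapping drug -> set of already-known indications
    Returns (drug, disease, supporting_genes) tuples, most-supported first.
    """
    support = defaultdict(set)
    # Close each Drug-Gene edge with every Gene-Disease edge for that gene.
    for drug, genes in drug_targets.items():
        for gene in genes:
            for disease in gene_diseases.get(gene, ()):
                support[(drug, disease)].add(gene)
    # Drop pairs that are already known indications: they are not novel.
    candidates = [
        (drug, disease, sorted(genes))
        for (drug, disease), genes in support.items()
        if disease not in known_indications.get(drug, set())
    ]
    # Assumed ranking: more supporting genes first, then alphabetical.
    candidates.sort(key=lambda c: (-len(c[2]), c[0], c[1]))
    return candidates
```

With toy inputs such as `{"drugA": {"G1", "G2"}}` for targets and `{"G1": {"D1", "D2"}, "G2": {"D2"}}` for associations, and `D1` already a known indication of `drugA`, only `("drugA", "D2", ["G1", "G2"])` remains as a candidate, supported by two genes.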


Subject(s)
Computational Biology/methods, Databases, Pharmaceutical, Drug Repositioning, User-Computer Interface, Drug Therapy, Humans