1.
Stud Health Technol Inform ; 310: 649-653, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269889

ABSTRACT

Several studies have shown that about 80% of the medical information in an electronic health record is only available through unstructured data. Resources such as medical terminologies in languages other than English are limited and restrain the NLP tools. We propose here to leverage English based resources in other languages using a combination of translation, word alignment, entity extraction and term normalization (TAXN). We implement this extraction pipeline in an open-source library called "medkit". We demonstrate the interest of this approach through a specific use-case: enriching a phenotypic dictionary for post-acute sequelae in COVID-19 (PASC). TAXN proved to be efficient to propose new synonyms of UMLS terms using a corpus of 70 articles in French with 356 terms enriched with at least one validated new synonym. This study was based on freely available deep-learning models.


Subject(s)
Multilingualism; Humans; Language; Disease Progression; Electronic Health Records
2.
Stud Health Technol Inform ; 302: 768-772, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203492

ABSTRACT

Previous work has successfully used machine learning and natural language processing for the phenotyping of Rheumatoid Arthritis (RA) patients in hospitals within the United States and France. Our goal is to evaluate the adaptability of RA phenotyping algorithms to a new hospital, both at the patient and encounter levels. Two algorithms are adapted and evaluated with a newly developed RA gold standard corpus, including annotations at the encounter level. The adapted algorithms offer comparably good performance for patient-level phenotyping on the new corpus (F1 0.68 to 0.82), but lower performance for encounter-level (F1 0.54). Regarding adaptation feasibility and cost, the first algorithm incurred a heavier adaptation burden because it required manual feature engineering. However, it is less computationally intensive than the second, semi-supervised, algorithm.
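
The F1 values quoted above are harmonic means of precision and recall. A minimal sketch of that computation follows; the function name and the counts are hypothetical and are not taken from the study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall."""
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion counts for a patient-level evaluation.
patient_level = f1_score(tp=41, fp=9, fn=9)
print(round(patient_level, 2))
```

With these made-up counts, precision and recall are both 41/50, so the score lands at the upper end of the patient-level range reported in the abstract.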


Subject(s)
Arthritis, Rheumatoid; Electronic Health Records; Humans; Algorithms; Arthritis, Rheumatoid/diagnosis; Machine Learning; Natural Language Processing
3.
J Med Internet Res ; 24(10): e39698, 2022 Oct 31.
Article in English | MEDLINE | ID: mdl-36315239

ABSTRACT

Advances in digital medicine now make it possible to use digital twin systems (DTS), which combine (1) extensive patient monitoring through the use of multiple sensors and (2) personalized adaptation of patient care through the use of software. After the artificial pancreas system already operational in children with type 1 diabetes, new DTS could be developed for real-time monitoring and management of children with chronic diseases. Just as providing care for children is a specific discipline (pediatrics) because of their particular characteristics and needs, providing digital care for children also presents particular challenges. This article reviews the technical challenges, mainly related to the problem of data collection in children; the ethical challenges, including the need to preserve the child's place in their care when using DTS; the legal challenges and the dual need to guarantee the safety of DTS for children and to ensure their access to DTS; and the societal challenges, including the needs to maintain human contact and trust between the child and the pediatrician and to limit DTS to specific uses to avoid contributing to a surveillance society and, at another level, to climate change.


Subject(s)
Diabetes Mellitus, Type 1; Trust; Child; Humans; Adolescent; Chronic Disease; Family; Diabetes Mellitus, Type 1/therapy
4.
Stud Health Technol Inform ; 290: 91-95, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35672977

ABSTRACT

INTRODUCTION: Chemotherapies against cancers are often interrupted due to severe drug toxicities, reducing treatment opportunities. For this reason, the detection of toxicities and their severity from EHRs is of importance for many downstream applications. However, toxicity information is dispersed in various sources in the EHRs, making its extraction challenging. METHODS: We introduce OntoTox, an ontology designed to represent chemotherapy toxicities, their attributes and provenance. We illustrated the interest of OntoTox by integrating toxicities and grading information extracted from three heterogeneous sources: EHR questionnaires, semi-structured tables, and free-text. RESULTS: We instantiated 53,510, 2,366 and 54,420 toxicities from questionnaires, tables and free-text respectively, and compared the complementarity and redundancy of the three sources. DISCUSSION: We illustrated with this preliminary study the potential of OntoTox to guide the integration of multiple sources, and identified that the three sources are only moderately overlapping, stressing the need for a common representation.


Subject(s)
Drug-Related Side Effects and Adverse Reactions; Neoplasms; Drug-Related Side Effects and Adverse Reactions/prevention & control; Electronic Health Records; Humans; Information Storage and Retrieval; Neoplasms/drug therapy; Surveys and Questionnaires
6.
J Biomed Semantics ; 12(1): 16, 2021 Aug 18.
Article in English | MEDLINE | ID: mdl-34407869

ABSTRACT

BACKGROUND: Transfer learning aims at enhancing machine learning performance on a problem by reusing labeled data originally designed for a related, but distinct problem. In particular, domain adaptation consists, for a specific task, in reusing training data developed for the same task but a distinct domain. This is particularly relevant to the applications of deep learning in Natural Language Processing, because they usually require large annotated corpora that may not exist for the targeted domain, but exist for side domains. RESULTS: In this paper, we experiment with transfer learning for the task of relation extraction from biomedical texts, using the TreeLSTM model. We empirically show the impact of TreeLSTM alone and with domain adaptation by obtaining better performances than the state of the art on two biomedical relation extraction tasks and equal performances for two others, for which little annotated data are available. Furthermore, we propose an analysis of the role that syntactic features may play in transfer learning for relation extraction. CONCLUSION: Given the difficulty of manually annotating corpora in the biomedical domain, the proposed transfer learning method offers a promising alternative to achieve good relation extraction performances for domains associated with scarce resources. Also, our analysis illustrates the importance that syntax plays in transfer learning, underlining the importance in this domain of favoring approaches that embed syntactic features.


Subject(s)
Machine Learning; Natural Language Processing
7.
BMC Med Inform Decis Mak ; 21(1): 171, 2021 May 26.
Article in English | MEDLINE | ID: mdl-34039343

ABSTRACT

BACKGROUND: Adverse drug reactions (ADRs) are statistically characterized within randomized clinical trials and postmarketing pharmacovigilance, but their molecular mechanism remains unknown in most cases. This is true even for hepatic or skin toxicities, which are classically monitored during drug design. Aside from clinical trials, many elements of knowledge about drug ingredients are available in open-access knowledge graphs, such as their properties, interactions, or involvements in pathways. In addition, drug classifications that label drugs as either causative or not for several ADRs have been established. METHODS: We propose in this paper to mine knowledge graphs for identifying biomolecular features that may enable automatically reproducing expert classifications that distinguish drugs causative or not for a given type of ADR. In an Explainable AI perspective, we explore simple classification techniques such as Decision Trees and Classification Rules because they provide human-readable models, which explain the classification itself, but may also provide elements of explanation for molecular mechanisms behind ADRs. In summary, (1) we mine a knowledge graph for features; (2) we train classifiers to distinguish, on the basis of extracted features, drugs associated or not with two commonly monitored ADRs: drug-induced liver injuries (DILI) and severe cutaneous adverse reactions (SCAR); (3) we isolate features that are both efficient in reproducing expert classifications and interpretable by experts (i.e., Gene Ontology terms, drug targets, or pathway names); and (4) we manually evaluate in a mini-study how they may be explanatory. RESULTS: Extracted features reproduce with good fidelity classifications of drugs causative or not for DILI and SCAR (Accuracy = 0.74 and 0.81, respectively). Experts fully agreed that 73% and 38% of the most discriminative features are possibly explanatory for DILI and SCAR, respectively; and partially agreed (2/3) for 90% and 77% of them. CONCLUSION: Knowledge graphs provide sufficiently diverse features to enable simple and explainable models to distinguish between drugs that are causative or not for ADRs. In addition to explaining classifications, most discriminative features appear to be good candidates for investigating ADR mechanisms further.


Subject(s)
Drug-Related Side Effects and Adverse Reactions; Pattern Recognition, Automated; Adverse Drug Reaction Reporting Systems; Artificial Intelligence; Feasibility Studies; Humans; Pharmacovigilance
8.
Sci Data ; 7(1): 3, 2020 Jan 02.
Article in English | MEDLINE | ID: mdl-31896797

ABSTRACT

Pharmacogenomics (PGx) studies how individual gene variations impact drug response phenotypes, which makes PGx-related knowledge a key component towards precision medicine. A significant part of the state-of-the-art knowledge in PGx is accumulated in scientific publications, where it is hardly reusable by humans or software. Natural language processing techniques have been developed to guide experts who curate this amount of knowledge. But existing works are limited by the absence of a high-quality annotated corpus focusing on the PGx domain. In particular, this absence restricts the use of supervised machine learning. This article introduces PGxCorpus, a manually annotated corpus, designed to fill this gap and to enable the automatic extraction of PGx relationships from text. It comprises 945 sentences from 911 PubMed abstracts, annotated with PGx entities of interest (mainly gene variations, genes, drugs and phenotypes), and relationships between those. In this article, we present the corpus itself, its construction and a baseline experiment that illustrates how it may be leveraged to synthesize and summarize PGx knowledge.


Subject(s)
Data Curation; Pharmacogenetics; Supervised Machine Learning; Humans; PubMed
9.
J Am Heart Assoc ; 8(14): e011874, 2019 Jul 16.
Article in English | MEDLINE | ID: mdl-31291803

ABSTRACT

BACKGROUND: Risk assessment is the cornerstone for atherosclerotic cardiovascular disease (ASCVD) treatment decisions. The Pooled Cohort Equations (PCE) have not been validated in disaggregated Asian or Hispanic populations, who have heterogeneous cardiovascular risk and outcomes. METHODS AND RESULTS: We used electronic health record data from adults aged 40 to 79 years from a community-based, outpatient healthcare system in northern California between January 1, 2006 and December 31, 2015, without ASCVD and not on statins. We examined the calibration and discrimination of the PCE and recalibrated the equations for disaggregated race/ethnic subgroups. The cohort included 231,622 adults with a mean age of 53.1 (SD 9.7) years and 54.3% women. There were 56,130 Asian (Chinese, Asian Indian, Filipino, Japanese, Vietnamese, and other Asian) and 19,760 Hispanic (Mexican, Puerto Rican, and other Hispanic) patients. There were 2703 events (332 and 189 in Asian and Hispanic patients, respectively) during an average of 3.9 (SD 1.5) years of follow-up. The PCE overestimated risk for non-Hispanic whites (NHWs), African Americans, Asians, and Hispanics by 20% to 60%. The extent of overestimation of ASCVD risk varied by disaggregated racial/ethnic subgroups, with a predicted-to-observed ratio of ASCVD events ranging from 1.1 for Puerto Rican patients to 1.9 for Chinese patients. The PCE had adequate discrimination, although it varied significantly by race/ethnic subgroups (C-indices 0.66-0.83). Recalibration of the PCE did not significantly improve its performance. CONCLUSIONS: Using electronic health record data from a large, real-world population, we found that the PCE generally overestimated ASCVD risk, with marked heterogeneity by disaggregated Asian and Hispanic subgroups.
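
The predicted-to-observed ratio mentioned above compares the number of events a risk model expects with the number actually seen; values above 1 indicate overestimation. The sketch below is a simplified illustration under two assumptions that the abstract does not confirm: each patient has one predicted probability, and the predictions and the observed events cover the same time horizon. The function name and all numbers are hypothetical.

```python
def predicted_to_observed(predicted_risks: list[float], observed_events: int) -> float:
    """Ratio of expected events (sum of predicted risks) to observed events."""
    if observed_events <= 0:
        raise ValueError("observed_events must be positive")
    expected_events = sum(predicted_risks)
    return expected_events / observed_events


# 100 hypothetical patients whose predicted risks sum to about 5 expected events.
predicted = [0.02, 0.05, 0.10, 0.03] * 25
ratio = predicted_to_observed(predicted, observed_events=4)
print(f"{ratio:.2f}")
```

Here the model expects roughly five events where four occurred, a ratio of about 1.25, which would fall inside the 1.1 to 1.9 range reported for the subgroups.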


Subject(s)
Asian/statistics & numerical data; Atherosclerosis/ethnology; Hispanic or Latino/statistics & numerical data; Adult; Atherosclerosis/epidemiology; China/ethnology; Electronic Health Records; Female; Humans; India/ethnology; Japan/ethnology; Male; Mexican Americans/statistics & numerical data; Middle Aged; Philippines/ethnology; Puerto Rico/ethnology; Risk Assessment/methods; Vietnam/ethnology
10.
BMC Bioinformatics ; 20(Suppl 4): 139, 2019 Apr 18.
Article in English | MEDLINE | ID: mdl-30999867

ABSTRACT

BACKGROUND: Pharmacogenomics (PGx) studies how genomic variations impact variations in drug response phenotypes. Knowledge in pharmacogenomics is typically composed of units that have the form of ternary relationships gene variant - drug - adverse event. Such a relationship states that an adverse event may occur for patients having the specified gene variant and being exposed to the specified drug. State-of-the-art knowledge in PGx is mainly available in reference databases such as PharmGKB and reported in scientific biomedical literature. However, PGx knowledge can also be discovered from clinical data, such as Electronic Health Records (EHRs), and in this case, may either correspond to new knowledge or confirm state-of-the-art knowledge that lacks "clinical counterpart" or validation. For this reason, there is a need for automatic comparison of knowledge units from distinct sources. RESULTS: In this article, we propose an approach, based on Semantic Web technologies, to represent and compare PGx knowledge units. To this end, we developed PGxO, a simple ontology that represents PGx knowledge units and their components. Combined with PROV-O, an ontology developed by the W3C to represent provenance information, PGxO enables encoding and associating provenance information to PGx relationships. Additionally, we introduce a set of rules to reconcile PGx knowledge, i.e. to identify when two relationships, potentially expressed using different vocabularies and levels of granularity, refer to the same, or to different knowledge units. We evaluated our ontology and rules by populating PGxO with knowledge units extracted from PharmGKB (2701), the literature (65,720) and from discoveries reported in EHR analysis studies (only 10, manually extracted); and by testing their similarity. We called PGxLOD (PGx Linked Open Data) the resulting knowledge base that represents and reconciles knowledge units of those various origins. CONCLUSIONS: The proposed ontology and reconciliation rules constitute a first step toward a more complete framework for knowledge comparison in PGx. In this direction, the experimental instantiation of PGxO, named PGxLOD, illustrates the ability and difficulties of reconciling various existing knowledge sources.
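
The ternary knowledge unit described above (gene variant, drug, adverse event, plus provenance) can be pictured as a small record type. The sketch below is only an illustration in plain Python: it is not PGxO or its reconciliation rules, it handles vocabulary differences through a hypothetical synonym table, and it does not address differing levels of granularity. All class names, function names, and example values are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PGxUnit:
    """One ternary knowledge unit plus the source it came from."""
    gene_variant: str
    drug: str
    adverse_event: str
    source: str  # provenance label, e.g. a database or a study


def normalize(term: str, synonyms: dict[str, str]) -> str:
    """Lower-case a term and map it to a preferred label when one is known."""
    key = term.strip().lower()
    return synonyms.get(key, key)


def same_unit(a: PGxUnit, b: PGxUnit, synonyms: dict[str, str]) -> bool:
    """Two units match when all three normalized components are equal.

    Provenance is deliberately ignored, since the goal is to compare
    units that come from different sources.
    """
    pairs = (
        (a.gene_variant, b.gene_variant),
        (a.drug, b.drug),
        (a.adverse_event, b.adverse_event),
    )
    return all(normalize(x, synonyms) == normalize(y, synonyms) for x, y in pairs)


# Illustrative values only; not taken from PGxLOD.
SYNONYMS = {"coumadin": "warfarin", "haemorrhage": "hemorrhage"}
from_database = PGxUnit("CYP2C9*3", "warfarin", "hemorrhage", "PharmGKB")
from_literature = PGxUnit("CYP2C9*3", "Coumadin", "Haemorrhage", "literature")

print(same_unit(from_database, from_literature, SYNONYMS))
```

Keeping the source as a separate field mirrors the idea of attaching provenance to each relationship without letting it influence the comparison.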


Subject(s)
Knowledge Bases; Pharmacogenetics; Data Mining; Databases, Factual; Electronic Health Records; Humans; Tissue Banks
11.
Sci Rep ; 8(1): 15558, 2018 Oct 22.
Article in English | MEDLINE | ID: mdl-30349060

ABSTRACT

Prescribing the right drug with the right dose is a central tenet of precision medicine. We examined the use of patients' prior Electronic Health Records to predict a reduction in drug dosage. We focus on drugs that interact with the P450 enzyme family, because their dosage is known to be sensitive and variable. We extracted diagnostic codes, conditions reported in clinical notes, and laboratory orders from Stanford's clinical data warehouse to construct cohorts of patients that either did or did not need a dose change. After feature selection, we trained models to predict the patients who will (or will not) require a dose change after being prescribed one of 34 drugs across 23 drug classes. Overall, we can predict (AUC ≥ 0.70-0.95) a dose reduction for 23 drugs and 22 drug classes. Several of these drugs are associated with clinical guidelines that recommend dose reduction exclusively in the case of adverse reaction. For these cases, a reduction in dosage may be considered as a surrogate for an adverse reaction, which our system could indirectly help predict and prevent. Our study illustrates the role machine learning may take in providing guidance in setting the starting dose for drugs associated with response variability.


Subject(s)
Cytochrome P-450 Enzyme Inhibitors/administration & dosage; Cytochrome P-450 Enzyme Inhibitors/adverse effects; Drug Dosage Calculations; Electronic Health Records/statistics & numerical data; Humans; Machine Learning
12.
J Biomed Semantics ; 8(1): 29, 2017 Aug 22.
Article in English | MEDLINE | ID: mdl-28830518

ABSTRACT

BACKGROUND: Patient data, such as electronic health records or adverse event reporting systems, constitute an essential resource for studying Adverse Drug Events (ADEs). We explore an original approach to identify frequently associated ADEs in subgroups of patients. RESULTS: Because ADEs have complex manifestations, we use formal concept analysis and its pattern structures, a mathematical framework that allows generalization using domain knowledge formalized in medical ontologies. Results obtained with three different settings and two different datasets show that this approach is flexible and allows extraction of association rules at various levels of generalization. CONCLUSIONS: The chosen approach permits an expressive representation of a patient's ADEs. Extracted association rules point to distinct ADEs that occur in the same group of patients, and could serve as a basis for a recommendation system. The proposed representation is flexible and can be extended to make use of additional ontologies and various patient records.


Subject(s)
Biological Ontologies; Drug-Related Side Effects and Adverse Reactions; Pattern Recognition, Automated; Electronic Health Records; Humans; Phenotype
13.
J Biomed Semantics ; 8(1): 16, 2017 Apr 20.
Article in English | MEDLINE | ID: mdl-28427468

ABSTRACT

BACKGROUND: A standard task in pharmacogenomics research is identifying genes that may be involved in drug response variability, i.e., pharmacogenes. Because genomic experiments tended to generate many false positives, computational approaches based on the use of background knowledge have been proposed. Until now, only molecular networks or the biomedical literature were used, whereas many other resources are available. METHOD: We propose here to consume a diverse and larger set of resources using linked data related either to genes, drugs or diseases. One of the advantages of linked data is that they are built on a standard framework that facilitates the joint use of various sources, and thus facilitates considering features of various origins. We propose a selection and linkage of data sources relevant to pharmacogenomics, including for example DisGeNET and ClinVar. We use machine learning to identify and prioritize the pharmacogenes that are most likely valid, considering the selected linked data. This identification relies on the classification of gene-drug pairs as either pharmacogenomically associated or not and was tested with two machine learning methods (random forest and graph kernel), whose results are compared in this article. RESULTS: We assembled a set of linked data relative to pharmacogenomics, of 2,610,793 triples, coming from six distinct resources. Learning from these data, random forest enables identifying valid pharmacogenes with an F-measure of 0.73, on a 10-fold cross-validation, whereas graph kernel achieves an F-measure of 0.81. A list of top candidates proposed by both approaches is provided and how they were obtained is discussed.


Subject(s)
Computational Biology/methods; Machine Learning; Pharmacogenetics; Semantic Web; Computer Graphics; Data Mining; Phenotype
14.
J Am Med Inform Assoc ; 19(e1): e177-86, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22494789

ABSTRACT

BACKGROUND: Profiling the allocation and trend of research activity is of interest to funding agencies, administrators, and researchers. However, the lack of a common classification system hinders the comprehensive and systematic profiling of research activities. This study introduces ontology-based annotation as a method to overcome this difficulty. Analyzing over a decade of funding data and publication data, the trends of disease research are profiled across topics, across institutions, and over time. RESULTS: This study introduces and explores the notions of research sponsorship and allocation and shows that leaders of research activity can be identified within specific disease areas of interest, such as those with high mortality or high sponsorship. The funding profiles of disease topics readily cluster themselves in agreement with the ontology hierarchy and closely mirror the funding agency priorities. Finally, four temporal trends are identified among research topics. CONCLUSIONS: This work utilizes disease ontology (DO)-based annotation to profile effectively the landscape of biomedical research activity. By using DO in this manner a use-case driven mechanism is also proposed to evaluate the utility of classification hierarchies.


Subject(s)
Bibliometrics; Biomedical Research/classification; Biomedical Research/statistics & numerical data; Disease/classification; Humans; Periodicals as Topic; Research Personnel; Research Support as Topic/statistics & numerical data
15.
Pharmacogenomics ; 13(2): 201-12, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22256869

ABSTRACT

Understanding how each individual's genetics and physiology influences pharmaceutical response is crucial to the realization of personalized medicine and the discovery and validation of pharmacogenomic biomarkers is key to its success. However, integration of genotype and phenotype knowledge in medical information systems remains a critical challenge. The inability to easily and accurately integrate the results of biomolecular studies with patients' medical records and clinical reports prevents us from realizing the full potential of pharmacogenomic knowledge for both drug development and clinical practice. Herein, we describe approaches using Semantic Web technologies, in which pharmacogenomic knowledge relevant to drug development and medical decision support is represented in such a way that it can be efficiently accessed both by software and human experts. We suggest that this approach increases the utility of data, and that such computational technologies will become an essential part of personalized medicine, alongside diagnostics and pharmaceutical products.


Subject(s)
Databases, Genetic/trends; Information Systems; Pharmacogenetics/trends; Precision Medicine/methods; Precision Medicine/trends; Humans; Internet/trends; Semantics
16.
Web Semant ; 9(3): 316-324, 2011 Sep 01.
Article in English | MEDLINE | ID: mdl-21918645

ABSTRACT

The volume of publicly available data in biomedicine is constantly increasing. However, these data are stored in different formats and on different platforms. Integrating these data will enable us to facilitate the pace of medical discoveries by providing scientists with a unified view of this diverse information. Under the auspices of the National Center for Biomedical Ontology (NCBO), we have developed the Resource Index, a growing, large-scale ontology-based index of more than twenty heterogeneous biomedical resources. The resources come from a variety of repositories maintained by organizations from around the world. We use a set of over 200 publicly available ontologies contributed by researchers in various domains to annotate the elements in these resources. We use the semantics that the ontologies encode, such as different properties of classes, the class hierarchies, and the mappings between ontologies, in order to improve the search experience for the Resource Index user. Our user interface enables scientists to search the multiple resources quickly and efficiently using domain terms, without even being aware that there is semantics "under the hood."

17.
J Biomed Semantics ; 2 Suppl 2: S10, 2011 May 17.
Article in English | MEDLINE | ID: mdl-21624156

ABSTRACT

BACKGROUND: Advances in Natural Language Processing (NLP) techniques enable the extraction of fine-grained relationships mentioned in biomedical text. The variability and the complexity of natural language in expressing similar relationships cause the extracted relationships to be highly heterogeneous, which makes the construction of knowledge bases difficult and poses a challenge in using these for data mining or question answering. RESULTS: We report on the semi-automatic construction of the PHARE relationship ontology (the PHArmacogenomic RElationships Ontology) consisting of 200 curated relations from over 40,000 heterogeneous relationships extracted via text-mining. These heterogeneous relations are then mapped to the PHARE ontology using synonyms, entity descriptions and hierarchies of entities and roles. Once mapped, relationships can be normalized and compared using the structure of the ontology to identify relationships that have similar semantics but different syntax. We compare and contrast the manual procedure with a fully automated approach using WordNet to quantify the degree of integration enabled by iterative curation and refinement of the PHARE ontology. The result of such integration is a repository of normalized biomedical relationships, named PHARE-KB, which can be queried using Semantic Web technologies such as SPARQL and can be visualized in the form of a biological network. CONCLUSIONS: The PHARE ontology serves as a common semantic framework to integrate more than 40,000 relationships pertinent to pharmacogenomics. The PHARE ontology forms the foundation of a knowledge base named PHARE-KB. Once populated with relationships, PHARE-KB (i) can be visualized in the form of a biological network to guide human tasks such as database curation and (ii) can be queried programmatically to guide bioinformatics applications such as the prediction of molecular interactions. PHARE is available at http://purl.bioontology.org/ontology/PHARE.
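
The synonym-based mapping step described above, in which differently worded relationships collapse onto one curated relation type, can be sketched in a few lines. This is a toy illustration, not the PHARE ontology or its mapping procedure: the synonym table, the relation labels, and the entity names are all hypothetical.

```python
# Hypothetical synonym table: surface verb -> normalized relation type.
RELATION_SYNONYMS = {
    "inhibits": "inhibits",
    "blocks": "inhibits",
    "suppresses": "inhibits",
    "metabolizes": "metabolizes",
}


def normalize_relation(subject: str, verb: str, obj: str):
    """Map an extracted (subject, verb, object) triple to its normalized form.

    Returns None when the verb has no entry in the synonym table, so that
    unmapped relationships can be set aside for manual curation.
    """
    canonical = RELATION_SYNONYMS.get(verb.strip().lower())
    if canonical is None:
        return None
    return (subject, canonical, obj)


# Three differently worded extractions; the first two express the same relation.
extracted = [
    ("drug A", "blocks", "gene B"),
    ("drug A", "Inhibits", "gene B"),
    ("drug A", "binds", "gene B"),
]
mapped = (normalize_relation(*triple) for triple in extracted)
normalized = {triple for triple in mapped if triple is not None}
print(normalized)
```

After mapping, the first two extractions become identical and can be compared or counted as one relationship, while the unmapped third is dropped from the normalized set.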

18.
Adv Exp Med Biol ; 696: 357-66, 2011.
Article in English | MEDLINE | ID: mdl-21431576

ABSTRACT

One current challenge in biomedicine is to analyze large amounts of complex biological data for extracting domain knowledge. This work focuses on the use of knowledge-based techniques such as knowledge discovery (KD) and knowledge representation (KR) in pharmacogenomics, where knowledge units represent genotype-phenotype relationships in the context of a given treatment. An objective is to design a knowledge base (KB, here also referred to as an ontology) and then to use it in the KD process itself. A method is proposed for dealing with two main tasks: (1) building a KB from heterogeneous data related to genotype, phenotype, and treatment, and (2) applying KD techniques on knowledge assertions for extracting genotype-phenotype relationships. An application was carried out on a clinical trial concerned with the variability of drug response to montelukast treatment. Genotype-genotype and genotype-phenotype associations were retrieved together with new associations, allowing the extension of the initial KB. This experiment shows the potential of KR and KD processes, especially for designing a KB, checking KB consistency, and reasoning for problem solving.


Subject(s)
Pharmacogenetics/statistics & numerical data; Acetates/pharmacology; Anti-Asthmatic Agents/pharmacology; Asthma/drug therapy; Asthma/genetics; Computational Biology; Cyclopropanes; Data Interpretation, Statistical; Data Mining/statistics & numerical data; Databases, Genetic; Genetic Association Studies/statistics & numerical data; Humans; Knowledge Bases; Logistic Models; Quinolines/pharmacology; Sulfides
19.
Pharmacogenomics ; 11(10): 1467-89, 2010 Oct.
Article in English | MEDLINE | ID: mdl-21047206

ABSTRACT

The biomedical literature holds our understanding of pharmacogenomics, but it is dispersed across many journals. In order to integrate our knowledge, connect important facts across publications and generate new hypotheses we must organize and encode the contents of the literature. By creating databases of structured pharmacogenomic knowledge, we can make the value of the literature much greater than the sum of the individual reports. We can, for example, generate candidate gene lists or interpret surprising hits in genome-wide association studies. Text mining automatically adds structure to the unstructured knowledge embedded in millions of publications, and recent years have seen a surge in work on biomedical text mining, some specific to pharmacogenomics literature. These methods enable extraction of specific types of information and can also provide answers to general, systemic queries. In this article, we describe the main tasks of text mining in the context of pharmacogenomics, summarize recent applications and anticipate the next phase of text mining applications.


Subject(s)
Data Mining/trends; Pharmacogenetics/methods; Animals; Data Mining/methods; Databases, Genetic/trends; Humans; Information Storage and Retrieval/methods; Information Storage and Retrieval/trends; Pharmacogenetics/statistics & numerical data; Pharmacogenetics/trends
20.
J Biomed Inform ; 43(6): 1009-19, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20723615

ABSTRACT

Most pharmacogenomics knowledge is contained in the text of published studies, and is thus not available for automated computation. Natural Language Processing (NLP) techniques for extracting relationships in specific domains often rely on hand-built rules and domain-specific ontologies to achieve good performance. In a new and evolving field such as pharmacogenomics (PGx), rules and ontologies may not be available. Recent progress in syntactic NLP parsing in the context of a large corpus of pharmacogenomics text provides new opportunities for automated relationship extraction. We describe an ontology of PGx relationships built starting from a lexicon of key pharmacogenomic entities and a syntactic parse of more than 87 million sentences from 17 million MEDLINE abstracts. We used the syntactic structure of PGx statements to systematically extract commonly occurring relationships and to map them to a common schema. Our extracted relationships have a 70-87.7% precision and involve not only key PGx entities such as genes, drugs, and phenotypes (e.g., VKORC1, warfarin, clotting disorder), but also critical entities that are frequently modified by these key entities (e.g., VKORC1 polymorphism, warfarin response, clotting disorder treatment). The result of our analysis is a network of 40,000 relationships between more than 200 entity types with clear semantics. This network is used to guide the curation of PGx knowledge and provide a computable resource for knowledge discovery.


Subject(s)
Pharmacogenetics/methods; Semantics; Databases, Factual; MEDLINE; Natural Language Processing; Terminology as Topic; United States