Search | Regional Portal of the BVS (Virtual Health Library)

PGxCorpus, a manually annotated corpus for pharmacogenomics.

Legrand, Joël; Gogdemir, Romain; Bousquet, Cédric; Dalleau, Kevin; Devignes, Marie-Dominique; Digan, William; Lee, Chia-Ju; Ndiaye, Ndeye-Coumba; Petitpain, Nadine; Ringot, Patrice; Smaïl-Tabbone, Malika; Toussaint, Yannick; Coulet, Adrien.

Sci Data ; 7(1): 3, 2020 Jan 02.

Article in English | MEDLINE | ID: mdl-31896797

ABSTRACT

Pharmacogenomics (PGx) studies how individual gene variations impact drug response phenotypes, which makes PGx-related knowledge a key component of precision medicine. A significant part of the state-of-the-art knowledge in PGx is accumulated in scientific publications, where it is hard for humans or software to reuse. Natural language processing techniques have been developed to guide experts who curate this body of knowledge. However, existing work is limited by the absence of a high-quality annotated corpus focused on the PGx domain. In particular, this absence restricts the use of supervised machine learning. This article introduces PGxCorpus, a manually annotated corpus designed to fill this gap and to enable the automatic extraction of PGx relationships from text. It comprises 945 sentences from 911 PubMed abstracts, annotated with PGx entities of interest (mainly gene variations, genes, drugs and phenotypes) and the relationships between them. In this article, we present the corpus itself, its construction and a baseline experiment that illustrates how it may be leveraged to synthesize and summarize PGx knowledge.
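
To make the idea of a sentence "annotated with PGx entities and relationships" concrete, here is a minimal Python sketch of one possible in-memory representation. It is not the distribution format of PGxCorpus; the class names, the entity and relation labels, and the example sentence are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A labeled character span (end offset is exclusive)."""
    start: int
    end: int
    label: str  # hypothetical label, e.g. "Gene" or "Drug"


@dataclass(frozen=True)
class Relation:
    """A typed link between two entities, referenced by their list index."""
    head: int
    tail: int
    label: str  # hypothetical relation type


@dataclass
class AnnotatedSentence:
    text: str
    entities: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def add_entity(self, phrase, label):
        """Locate the first occurrence of `phrase` and record it as an entity."""
        start = self.text.index(phrase)
        self.entities.append(Entity(start, start + len(phrase), label))
        return len(self.entities) - 1

    def span_text(self, i):
        e = self.entities[i]
        return self.text[e.start:e.end]

    def triples(self):
        """Return each relation as (head text, relation label, tail text)."""
        return [(self.span_text(r.head), r.label, self.span_text(r.tail))
                for r in self.relations]


# Invented example sentence, not taken from the corpus.
sentence = AnnotatedSentence("CYP2D6 variants affect codeine response.")
gene = sentence.add_entity("CYP2D6", "Gene")
drug = sentence.add_entity("codeine", "Drug")
sentence.relations.append(Relation(gene, drug, "influences"))
```

Calling `sentence.triples()` flattens the annotations into (entity, relation, entity) tuples, which is the shape a supervised relation-extraction model would typically be trained to predict.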

Subject(s)

Data Curation; Pharmacogenetics; Supervised Machine Learning; Humans; PubMed

PGxO and PGxLOD: a reconciliation of pharmacogenomic knowledge of various provenances, enabling further comparison.

Monnin, Pierre; Legrand, Joël; Husson, Graziella; Ringot, Patrice; Tchechmedjiev, Andon; Jonquet, Clément; Napoli, Amedeo; Coulet, Adrien.

BMC Bioinformatics ; 20(Suppl 4): 139, 2019 Apr 18.

Article in English | MEDLINE | ID: mdl-30999867

ABSTRACT

BACKGROUND: Pharmacogenomics (PGx) studies how genomic variations impact variations in drug response phenotypes. Knowledge in pharmacogenomics is typically composed of units that take the form of ternary relationships: gene variant - drug - adverse event. Such a relationship states that an adverse event may occur in patients who have the specified gene variant and are exposed to the specified drug. State-of-the-art knowledge in PGx is mainly available in reference databases such as PharmGKB and reported in the scientific biomedical literature. However, PGx knowledge can also be discovered from clinical data, such as Electronic Health Records (EHRs), and in this case may either correspond to new knowledge or confirm state-of-the-art knowledge that lacks a "clinical counterpart" or validation. For this reason, there is a need for automatic comparison of knowledge units from distinct sources. RESULTS: In this article, we propose an approach, based on Semantic Web technologies, to represent and compare PGx knowledge units. To this end, we developed PGxO, a simple ontology that represents PGx knowledge units and their components. Combined with PROV-O, an ontology developed by the W3C to represent provenance information, PGxO enables encoding provenance information and associating it with PGx relationships. Additionally, we introduce a set of rules to reconcile PGx knowledge, i.e. to identify when two relationships, potentially expressed using different vocabularies and levels of granularity, refer to the same or to different knowledge units. We evaluated our ontology and rules by populating PGxO with knowledge units extracted from PharmGKB (2701), the literature (65,720) and discoveries reported in EHR analysis studies (only 10, manually extracted), and by testing their similarity. We named the resulting knowledge base, which represents and reconciles knowledge units of those various origins, PGxLOD (PGx Linked Open Data).
CONCLUSIONS: The proposed ontology and reconciliation rules constitute a first step toward a more complete framework for knowledge comparison in PGx. In this direction, the experimental instantiation of PGxO, named PGxLOD, illustrates both the feasibility and the difficulties of reconciling various existing knowledge sources.
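
The ternary knowledge unit and the idea of reconciliation described above can be illustrated with a short Python sketch. This is not PGxO, PROV-O, or the authors' rule set, which are expressed with Semantic Web technologies and also handle differing levels of granularity. The `KnowledgeUnit` class, the `SYNONYMS` table, and the example values are assumptions made for illustration, and the single rule shown only checks equality after vocabulary normalization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeUnit:
    """A gene variant - drug - adverse event relationship with its provenance."""
    gene_variant: str
    drug: str
    adverse_event: str
    source: str  # provenance, e.g. "PharmGKB", "literature", "EHR"


# Hypothetical synonym table mapping alternative names to a preferred term.
SYNONYMS = {"coumadin": "warfarin"}


def normalize(term, synonyms):
    """Lowercase a term and map it to its preferred name when one is known."""
    key = term.strip().lower()
    return synonyms.get(key, key)


def same_unit(a, b, synonyms):
    """Naive rule: two units match if all three components agree after
    normalization. Provenance is deliberately ignored in the comparison."""
    pairs = [
        (a.gene_variant, b.gene_variant),
        (a.drug, b.drug),
        (a.adverse_event, b.adverse_event),
    ]
    return all(normalize(x, synonyms) == normalize(y, synonyms) for x, y in pairs)


# Illustrative values only; they are not drawn from PGxLOD.
from_database = KnowledgeUnit("CYP2C9*3", "Warfarin", "bleeding", "PharmGKB")
from_ehr = KnowledgeUnit("CYP2C9*3", "Coumadin", "Bleeding", "EHR")
unrelated = KnowledgeUnit("CYP2D6*4", "codeine", "sedation", "literature")
```

Here `same_unit(from_database, from_ehr, SYNONYMS)` holds even though the two units use different drug names and come from different sources, while `unrelated` matches neither. Keeping `source` on the unit but out of the comparison mirrors the separation between a relationship and its provenance.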

Subject(s)

Knowledge Bases; Pharmacogenetics; Data Mining; Databases, Factual; Electronic Health Records; Humans; Tissue Banks

Learning from biomedical linked data to suggest valid pharmacogenes.

Dalleau, Kevin; Marzougui, Yassine; Da Silva, Sébastien; Ringot, Patrice; Ndiaye, Ndeye Coumba; Coulet, Adrien.

J Biomed Semantics ; 8(1): 16, 2017 Apr 20.

Article in English | MEDLINE | ID: mdl-28427468

ABSTRACT

BACKGROUND: A standard task in pharmacogenomics research is identifying genes that may be involved in drug response variability, i.e., pharmacogenes. Because genomic experiments tend to generate many false positives, computational approaches based on the use of background knowledge have been proposed. Until now, only molecular networks or the biomedical literature were used, whereas many other resources are available. METHOD: We propose here to consume a larger and more diverse set of resources using linked data related to genes, drugs or diseases. One of the advantages of linked data is that they are built on a standard framework that facilitates the joint use of various sources, and thus makes it easier to consider features of various origins. We propose a selection and linkage of data sources relevant to pharmacogenomics, including for example DisGeNET and ClinVar. We use machine learning to identify and prioritize the pharmacogenes that are most likely to be valid, considering the selected linked data. This identification relies on the classification of gene-drug pairs as either pharmacogenomically associated or not, and was tested with two machine learning methods (random forest and graph kernel), whose results are compared in this article. RESULTS: We assembled a set of linked data relevant to pharmacogenomics, comprising 2,610,793 triples from six distinct resources. Learning from these data, random forest identifies valid pharmacogenes with an F-measure of 0.73 in 10-fold cross-validation, whereas graph kernel achieves an F-measure of 0.81. A list of top candidates proposed by both approaches is provided, and how they were obtained is discussed.
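
For readers unfamiliar with the evaluation terms above, the following Python sketch shows how an F-measure is computed for a binary classification of gene-drug pairs and how items can be partitioned into 10 folds. It is not the authors' pipeline: the classifier itself is omitted, the labels are invented, and the fold splitter is a plain contiguous split with no shuffling or stratification.

```python
def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = associated)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists; fold sizes differ by at most one."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


# Invented labels for six gene-drug pairs (1 = pharmacogenomically associated).
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1]
score = f_measure(y_true, y_pred)  # 3 TP, 1 FP, 1 FN

folds = list(k_fold_indices(25, k=10))
```

In cross-validation, a model is trained on each `train` index list and scored on the matching `test` list, and the per-fold scores are then aggregated into a single reported figure.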

Subject(s)

Computational Biology/methods; Machine Learning; Pharmacogenetics; Semantic Web; Computer Graphics; Data Mining; Phenotype
