Unsupervised text mining for assessing and augmenting GWAS results.

Ailem, Melissa; Role, François; Nadif, Mohamed; Demenais, Florence

Affiliation

Ailem M; LIPADE, Université Paris Descartes, Sorbonne Paris Cité, Paris F-75006, France.
Role F; LIPADE, Université Paris Descartes, Sorbonne Paris Cité, Paris F-75006, France.
Nadif M; LIPADE, Université Paris Descartes, Sorbonne Paris Cité, Paris F-75006, France.
Demenais F; INSERM, Genetic Variation and Human Diseases Unit, UMR-946, Paris F-75010, France; Institut Universitaire d'Hématologie, Université Paris Diderot, Sorbonne Paris Cité, Paris F-75010, France.

J Biomed Inform ; 60: 252-9, 2016 Apr.

Article in En | MEDLINE | ID: mdl-26911523

ABSTRACT

Text mining can assist in the analysis and interpretation of large-scale biomedical data, helping biologists quickly and cheaply confirm hypothesized relationships between biological entities. We address this question in the context of genome-wide association studies (GWAS), an actively emerging field that has helped identify many genes associated with multifactorial diseases. These studies can identify groups of genes associated with the same phenotype but provide no information about the relationships between these genes. Our objective is therefore to augment GWAS results with unsupervised text mining techniques, namely text-based cosine similarity comparisons and clustering applied to candidate and random gene vectors. We propose a generic framework, which we used to characterize the relationships between 10 genes reported by a previous GWAS to be associated with asthma. This experiment showed that the similarities between these 10 genes were significantly stronger than expected by chance (one-sided p-value < 0.01). Clustering the observed and randomly selected genes also generated hypotheses about potential functional relationships between these genes and thus contributed to the discovery of new candidate genes for asthma.
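The abstract compares text-based cosine similarities among candidate genes against those of randomly selected genes to obtain a one-sided p-value. The sketch below shows one way such a comparison could work. It assumes that each gene is a term-weight vector (for example, TF-IDF weights over literature mentioning the gene) and that the test statistic is the mean pairwise cosine similarity. The abstract confirms neither choice. The names `cosine`, `mean_pairwise_similarity`, and `empirical_p_value`, the gene labels, and the vectors are all hypothetical toy data, not the authors' implementation or results.

```python
import itertools
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a gene with no associated text has no measurable similarity
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all unordered pairs of vectors (assumed statistic)."""
    pairs = list(itertools.combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def empirical_p_value(candidates, background, n_draws=1000, seed=0):
    """One-sided empirical p-value: how often a random gene set of the same
    size is at least as internally similar as the candidate set."""
    rng = random.Random(seed)
    observed = mean_pairwise_similarity(list(candidates.values()))
    pool = list(background.values())
    k = len(candidates)
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(pool, k)  # random gene set, drawn without replacement
        if mean_pairwise_similarity(draw) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_draws + 1)


# Toy usage with hypothetical genes: a 30-term vocabulary with binary weights.
DIM = 30


def one_hot(terms):
    return [1.0 if i in terms else 0.0 for i in range(DIM)]


# Candidate genes share terms 0-2 (a common "theme") plus one private term each.
candidates = {f"GENE_{i}": one_hot({0, 1, 2, 3 + i}) for i in range(5)}
# Background genes each mention 3 randomly chosen terms.
bg_rng = random.Random(42)
background = {
    f"BG_{j}": one_hot(set(bg_rng.sample(range(DIM), 3))) for j in range(200)
}

observed, p = empirical_p_value(candidates, background, n_draws=999)
print(f"mean cosine = {observed:.2f}, one-sided p = {p:.3f}")
```

Because the toy candidates share a common set of terms, their mean pairwise similarity far exceeds that of random background sets, so the empirical p-value is small. With real data, the result depends on how gene vectors are built and on which background genes are eligible for sampling.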

Subject(s)

Computational Biology/methods; Data Mining/methods; Genome-Wide Association Study; Algorithms; Asthma/genetics; Cluster Analysis; Genetic Predisposition to Disease; Genome, Human; Genomics; Humans; Phenotype; Polymorphism, Single Nucleotide

Key words

Asthma; Clustering; GWAS; Unsupervised text mining

Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Computational Biology / Genome-Wide Association Study / Data Mining
Limits: Humans
Language: En
Journal: J Biomed Inform
Journal subject: Medical Informatics
Year: 2016
Type: Article
Affiliation country: France