Large-Scale Validation of Hypothesis Generation Systems via Candidate Ranking.
Sybrandt, Justin; Shtutman, Michael; Safro, Ilya.
Affiliation
  • Sybrandt J; Clemson University, School of Computing, Clemson, USA.
  • Shtutman M; University of South Carolina, Drug Discovery and Biomedical Sciences, Columbia, USA.
  • Safro I; Clemson University, School of Computing, Clemson, USA.
Proc IEEE Int Conf Big Data; 2018: 1494-1503, 2018 Dec.
Article in En | MEDLINE | ID: mdl-35789222
ABSTRACT
The first step of many research projects is to define and rank a short list of candidates for study. Given the rapid pace of modern scientific progress, some researchers turn to automated hypothesis generation (HG) systems to aid this process. These systems can identify implicit or overlooked connections within a large scientific corpus, and while their importance grows alongside the pace of science, they lack thorough validation. Without any standard numerical evaluation method, many researchers validate general-purpose HG systems by rediscovering a handful of historical findings, and some wishing to be more thorough may run laboratory experiments based on automatic suggestions. These methods are expensive, time-consuming, and cannot scale. Thus, we present a numerical evaluation framework for validating HG systems that leverages thousands of validation hypotheses. This method evaluates an HG system by its ability to rank hypotheses by plausibility, a process reminiscent of human candidate selection. Because HG systems, specifically those that produce topic models, do not produce a ranking criterion, we additionally present novel metrics to quantify the plausibility of hypotheses given topic model system output. Finally, we demonstrate that our proposed validation method aligns with real-world research goals by deploying it within MOLIERE, our recent topic-driven HG system, to automatically generate a set of candidate genes related to HIV-associated neurodegenerative disease (HAND). By performing laboratory experiments based on this candidate set, we discover a new connection between HAND and Dead Box RNA Helicase 3 (DDX3). Reproducibility code, validation data, and results can be found at sybrandt.com/2018/validation.
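
The idea of judging a system by how well it ranks hypotheses by plausibility can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `ranking_auc`, the split of validation hypotheses into "plausible" and "implausible" groups, and the scores themselves are all hypothetical. It assumes only that some plausibility metric has already assigned a number to each validation hypothesis, and it summarizes ranking quality as the fraction of plausible/implausible pairs that are ordered correctly (equivalent to the area under the ROC curve).

```python
from typing import Sequence


def ranking_auc(positive_scores: Sequence[float],
                negative_scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs in which the positive
    hypothesis receives the higher plausibility score.
    Ties count as half a correct ordering."""
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one score in each group")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical plausibility scores; real values would come from an HG system.
plausible = [0.9, 0.8, 0.4]    # hypotheses assumed to be true connections
implausible = [0.7, 0.3, 0.2]  # hypotheses assumed to be noise

auc = ranking_auc(plausible, implausible)
print(f"ranking AUC = {auc:.3f}")
```

A value near 1.0 would mean the metric almost always ranks plausible hypotheses above implausible ones, while 0.5 corresponds to a random ordering. The pairwise loop is quadratic, which is fine for illustration; a sort-based computation would be preferable for thousands of hypotheses.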
Full text: 1 Collections: 01-international Database: MEDLINE Study type: Prognostic_studies Language: En Journal: Proc IEEE Int Conf Big Data Publication year: 2018 Document type: Article