Búsqueda | BVS CLAP/SMR-OPS/OMS

A method for discovering and inferring appropriate eligibility criteria in clinical trial protocols without labeled data.

Restificar, Angelo; Korkontzelos, Ioannis; Ananiadou, Sophia.

BMC Med Inform Decis Mak ; 13 Suppl 1: S6, 2013.

Artículo en Inglés | MEDLINE | ID: mdl-23566239

RESUMEN

BACKGROUND: We consider the user task of designing clinical trial protocols and propose a method that discovers and outputs the most appropriate eligibility criteria from a potentially huge set of candidates. Each document d in our collection D is a clinical trial protocol which itself contains a set of eligibility criteria. Given a small set of sample documentsD',|D'|âª|D|, a user has initially identified as relevant e.g., via a user query interface, our scoring method automatically suggests eligibility criteria from D, D â D', by ranking them according to how appropriate they are to the clinical trial protocol currently being designed. The appropriateness is measured by the degree to which they are consistent with the user-supplied sample documents D'. METHOD: We propose a novel three-step method called LDALR which views documents as a mixture of latent topics. First, we infer the latent topics in the sample documents using Latent Dirichlet Allocation (LDA). Next, we use logistic regression models to compute the probability that a given candidate criterion belongs to a particular topic. Lastly, we score each criterion by computing its expected value, the probability-weighted sum of the topic proportions inferred from the set of sample documents. Intuitively, the greater the probability that a candidate criterion belongs to the topics that are dominant in the samples, the higher its expected value or score. RESULTS: Our experiments have shown that LDALR is 8 and 9 times better (resp., for inclusion and exclusion criteria) than randomly choosing from a set of candidates obtained from relevant documents. In user simulation experiments using LDALR, we were able to automatically construct eligibility criteria that are on the average 75% and 70% (resp., for inclusion and exclusion criteria) similar to the correct eligibility criteria. CONCLUSIONS: We have proposed LDALR, a practical method for discovering and inferring appropriate eligibility criteria in clinical trial protocols without labeled data. Results from our experiments suggest that LDALR models can be used to effectively find appropriate eligibility criteria from a large repository of clinical trial protocols.

Asunto(s)

Algoritmos , Ensayos Clínicos como Asunto/métodos , Documentación , Determinación de la Elegibilidad/normas , Selección de Paciente , Factores de Edad , Protocolos Clínicos/clasificación , Documentación/estadística & datos numéricos , Humanos , Modelos Logísticos , Proyectos de Investigación , Factores Sexuales

Detecting experimental techniques and selecting relevant documents for protein-protein interactions from biomedical literature.

Wang, Xinglong; Rak, Rafal; Restificar, Angelo; Nobata, Chikashi; Rupp, C J; Batista-Navarro, Riza Theresa B; Nawaz, Raheel; Ananiadou, Sophia.

BMC Bioinformatics ; 12 Suppl 8: S11, 2011 Oct 03.

Artículo en Inglés | MEDLINE | ID: mdl-22151769

RESUMEN

BACKGROUND: The selection of relevant articles for curation, and linking those articles to experimental techniques confirming the findings became one of the primary subjects of the recent BioCreative III contest. The contest's Protein-Protein Interaction (PPI) task consisted of two sub-tasks: Article Classification Task (ACT) and Interaction Method Task (IMT). ACT aimed to automatically select relevant documents for PPI curation, whereas the goal of IMT was to recognise the methods used in experiments for identifying the interactions in full-text articles. RESULTS: We proposed and compared several classification-based methods for both tasks, employing rich contextual features as well as features extracted from external knowledge sources. For IMT, a new method that classifies pair-wise relations between every text phrase and candidate interaction method obtained promising results with an F1 score of 64.49%, as tested on the task's development dataset. We also explored ways to combine this new approach and more conventional, multi-label document classification methods. For ACT, our classifiers exploited automatically detected named entities and other linguistic information. The evaluation results on the BioCreative III PPI test datasets showed that our systems were very competitive: one of our IMT methods yielded the best performance among all participants, as measured by F1 score, Matthew's Correlation Coefficient and AUC iP/R; whereas for ACT, our best classifier was ranked second as measured by AUC iP/R, and also competitive according to other metrics. CONCLUSIONS: Our novel approach that converts the multi-class, multi-label classification problem to a binary classification problem showed much promise in IMT. Nevertheless, on the test dataset the best performance was achieved by taking the union of the output of this method and that of a multi-class, multi-label document classifier, which indicates that the two types of systems complement each other in terms of recall. For ACT, our system exploited a rich set of features and also obtained encouraging results. We examined the features with respect to their contributions to the classification results, and concluded that contextual words surrounding named entities, as well as the MeSH headings associated with the documents were among the main contributors to the performance.

Asunto(s)

Minería de Datos , Proteómica , Humanos , Publicaciones Periódicas como Asunto , Mapeo de Interacción de Proteínas , Proteínas/metabolismo

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA