Results 1 - 11 of 11
1.
BMC Med Inform Decis Mak ; 19(1): 96, 2019 05 08.
Article in English | MEDLINE | ID: mdl-31068178

ABSTRACT

OBJECTIVE: Assessing risks of bias in randomized controlled trials (RCTs) is an important but laborious task when conducting systematic reviews. RobotReviewer (RR), an open-source machine learning (ML) system, semi-automates bias assessments. We conducted a user study of RobotReviewer, evaluating time saved and usability of the tool. MATERIALS AND METHODS: Systematic reviewers applied the Cochrane Risk of Bias tool to four randomly selected RCT articles. Reviewers judged: whether an RCT was at low, or high/unclear risk of bias for each bias domain in the Cochrane tool (Version 1); and highlighted article text justifying their decision. For a random two of the four articles, the process was semi-automated: users were provided with ML-suggested bias judgments and text highlights. Participants could amend the suggestions if necessary. We measured time taken for the task, ML suggestions, usability via the System Usability Scale (SUS) and collected qualitative feedback. RESULTS: For 41 volunteers, semi-automation was quicker than manual assessment (mean 755 vs. 824 s; relative time 0.75, 95% CI 0.62-0.92). Reviewers accepted 301/328 (91%) of the ML Risk of Bias (RoB) judgments, and 202/328 (62%) of text highlights without change. Overall, ML suggested text highlights had a recall of 0.90 (SD 0.14) and precision of 0.87 (SD 0.21) with respect to the users' final versions. Reviewers assigned the system a mean 77.7 SUS score, corresponding to a rating between "good" and "excellent". CONCLUSIONS: Semi-automation (where humans validate machine learning suggestions) can improve the efficiency of evidence synthesis. Our system was rated highly usable, and expedited bias assessment of RCTs.
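The recall and precision figures above compare the ML-suggested highlights with each reviewer's final highlights. A minimal sketch of that kind of comparison follows; representing a highlight as a set of sentence indices is an assumption made here for illustration, and the abstract does not say how the study matched text spans.

```python
def highlight_precision_recall(suggested, final):
    """Compare ML-suggested highlights with a reviewer's final highlights.

    Both arguments are collections of sentence indices (an assumed
    representation; the study may have matched text differently).
    """
    suggested, final = set(suggested), set(final)
    overlap = len(suggested & final)
    # Precision: share of suggested highlights the reviewer kept.
    precision = overlap / len(suggested) if suggested else 0.0
    # Recall: share of the final highlights that the ML had suggested.
    recall = overlap / len(final) if final else 0.0
    return precision, recall


# Hypothetical case: the ML highlighted sentences 3, 4 and 9; the reviewer
# kept 3 and 4, dropped 9, and added sentence 12.
precision, recall = highlight_precision_recall({3, 4, 9}, {3, 4, 12})
```

In this made-up case both values are 2/3, because two of the three suggestions survive and two of the three final highlights came from the ML.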


Subject(s)
Bias , Machine Learning , Randomized Controlled Trials as Topic , Feedback , Humans , Prospective Studies , Risk Assessment
2.
Hum Mutat ; 36(7): 712-9, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25871441

ABSTRACT

Next-generation sequencing in clinical diagnostics is providing valuable genomic variant data, which can be used to support healthcare decisions. In silico tools to predict pathogenicity are crucial to assess such variants and we have evaluated a new tool, Combined Annotation Dependent Depletion (CADD), and its classification of gene variants in Lynch syndrome by using a set of 2,210 DNA mismatch repair gene variants. These had already been classified by experts from InSiGHT's Variant Interpretation Committee. Overall, we found CADD scores do predict pathogenicity (Spearman's ρ = 0.595, P < 0.001). However, we discovered 31 major discrepancies between the InSiGHT classification and the CADD scores; these were explained in favor of the expert classification using population allele frequencies, cosegregation analyses, disease association studies, or a second-tier test. Of 751 variants that could not be clinically classified by InSiGHT, CADD indicated that 47 variants were worth further study to confirm their putative pathogenicity. We demonstrate CADD is valuable in prioritizing variants in clinically relevant genes for further assessment by expert classification teams.


Subject(s)
Computational Biology , DNA Mismatch Repair , Genetic Variation , Models, Molecular , Colorectal Neoplasms, Hereditary Nonpolyposis/genetics , Genetic Association Studies , High-Throughput Nucleotide Sequencing , Humans , Software
3.
J Am Med Inform Assoc ; 27(12): 1903-1912, 2020 12 09.
Article in English | MEDLINE | ID: mdl-32940710

ABSTRACT

OBJECTIVE: Randomized controlled trials (RCTs) are the gold standard method for evaluating whether a treatment works in health care but can be difficult to find and make use of. We describe the development and evaluation of a system to automatically find and categorize all new RCT reports. MATERIALS AND METHODS: Trialstreamer continuously monitors PubMed and the World Health Organization International Clinical Trials Registry Platform, looking for new RCTs in humans using a validated classifier. We combine machine learning and rule-based methods to extract information from the RCT abstracts, including free-text descriptions of trial PICO (populations, interventions/comparators, and outcomes) elements and map these snippets to normalized MeSH (Medical Subject Headings) vocabulary terms. We additionally identify sample sizes, predict the risk of bias, and extract text conveying key findings. We store all extracted data in a database, which we make freely available for download, and via a search portal, which allows users to enter structured clinical queries. Results are ranked automatically to prioritize larger and higher-quality studies. RESULTS: As of early June 2020, we have indexed 673 191 publications of RCTs, of which 22 363 were published in the first 5 months of 2020 (142 per day). We additionally include 304 111 trial registrations from the International Clinical Trials Registry Platform. The median trial sample size was 66. CONCLUSIONS: We present an automated system for finding and categorizing RCTs. This yields a novel resource: a database of structured information automatically extracted for all published RCTs in humans. We make daily updates of this database available on our website (https://trialstreamer.robotreviewer.net).


Subject(s)
Data Curation , Data Management , Databases, Factual , Randomized Controlled Trials as Topic , Bias , Evidence-Based Medicine , Humans , Medical Subject Headings
4.
Res Synth Methods ; 9(4): 602-614, 2018 Dec.
Article in English | MEDLINE | ID: mdl-29314757

ABSTRACT

Machine learning (ML) algorithms have proven highly accurate for identifying Randomized Controlled Trials (RCTs) but are not used much in practice, in part because the best way to make use of the technology in a typical workflow is unclear. In this work, we evaluate ML models for RCT classification (support vector machines, convolutional neural networks, and ensemble approaches). We trained and optimized support vector machine and convolutional neural network models on the titles and abstracts of the Cochrane Crowd RCT set. We evaluated the models on an external dataset (Clinical Hedges), allowing direct comparison with traditional database search filters. We estimated area under receiver operating characteristics (AUROC) using the Clinical Hedges dataset. We demonstrate that ML approaches better discriminate between RCTs and non-RCTs than widely used traditional database search filters at all sensitivity levels; our best-performing model also achieved the best results to date for ML in this task (AUROC 0.987, 95% CI, 0.984-0.989). We provide practical guidance on the role of ML in (1) systematic reviews (high-sensitivity strategies) and (2) rapid reviews and clinical question answering (high-precision strategies) together with recommended probability cutoffs for each use case. Finally, we provide open-source software to enable these approaches to be used in practice.
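The distinction between high-sensitivity and high-precision cutoffs can be illustrated with a small sketch. The probabilities and labels below are invented, the function names are hypothetical, and the cutoffs it finds are not the ones recommended in the paper.

```python
def sensitivity_precision(probs, labels, cutoff):
    """Sensitivity and precision when items with prob >= cutoff count as RCTs."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, precision


def highest_cutoff_for_sensitivity(probs, labels, target):
    """Largest observed probability whose sensitivity reaches `target`."""
    for cutoff in sorted(set(probs), reverse=True):
        sensitivity, _ = sensitivity_precision(probs, labels, cutoff)
        if sensitivity >= target:
            return cutoff
    return min(probs)  # fall back to accepting everything


# Invented classifier outputs: 1 = RCT, 0 = not an RCT.
probs = [0.95, 0.90, 0.70, 0.40, 0.20, 0.05]
labels = [1, 1, 0, 1, 0, 0]

# Systematic review setting: do not miss any RCT, accept more screening.
review_cutoff = highest_cutoff_for_sensitivity(probs, labels, target=1.0)
review_stats = sensitivity_precision(probs, labels, review_cutoff)

# Rapid review setting: a stricter cutoff trades sensitivity for precision.
rapid_stats = sensitivity_precision(probs, labels, 0.90)
```

On this toy data the high-sensitivity cutoff is 0.40 (sensitivity 1.0, precision 0.75), while a cutoff of 0.90 gives precision 1.0 at the cost of missing one of the three RCTs.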


Subject(s)
Databases, Bibliographic , Information Storage and Retrieval/methods , Machine Learning , Randomized Controlled Trials as Topic , Review Literature as Topic , Algorithms , Evidence-Based Medicine , Humans , Information Storage and Retrieval/standards , ROC Curve , Registries , Reproducibility of Results , Search Engine , Sensitivity and Specificity , Subject Headings , Support Vector Machine
5.
Article in English | MEDLINE | ID: mdl-29093610

ABSTRACT

We present RobotReviewer, an open-source web-based system that uses machine learning and NLP to semi-automate biomedical evidence synthesis, to aid the practice of Evidence-Based Medicine. RobotReviewer processes full-text journal articles (PDFs) describing randomized controlled trials (RCTs). It appraises the reliability of RCTs and extracts text describing key trial characteristics (e.g., descriptions of the population) using novel NLP methods. RobotReviewer then automatically generates a report synthesising this information. Our goal is for RobotReviewer to automatically extract and synthesise the full-range of structured data needed to inform evidence-based practice.

6.
J Am Med Inform Assoc ; 23(1): 193-201, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26104742

ABSTRACT

OBJECTIVE: To develop and evaluate RobotReviewer, a machine learning (ML) system that automatically assesses bias in clinical trials. From a (PDF-formatted) trial report, the system should determine risks of bias for the domains defined by the Cochrane Risk of Bias (RoB) tool, and extract supporting text for these judgments. METHODS: We algorithmically annotated 12,808 trial PDFs using data from the Cochrane Database of Systematic Reviews (CDSR). Trials were labeled as being at low or high/unclear risk of bias for each domain, and sentences were labeled as being informative or not. This dataset was used to train a multi-task ML model. We estimated the accuracy of ML judgments versus humans by comparing trials with two or more independent RoB assessments in the CDSR. Twenty blinded experienced reviewers rated the relevance of supporting text, comparing ML output with equivalent (human-extracted) text from the CDSR. RESULTS: By retrieving the top 3 candidate sentences per document (top3 recall), the best ML text was rated more relevant than text from the CDSR, but not significantly (60.4% ML text rated 'highly relevant' v 56.5% of text from reviews; difference +3.9%, [-3.2% to +10.9%]). Model RoB judgments were less accurate than those from published reviews, though the difference was <10% (overall accuracy 71.0% with ML v 78.3% with CDSR). CONCLUSION: Risk of bias assessment may be automated with reasonable accuracy. Automatically identified text supporting bias assessment is of equal quality to the manually identified text in the CDSR. This technology could substantially reduce reviewer workload and expedite evidence syntheses.
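Retrieving the top 3 candidate sentences per document amounts to ranking sentences by a model score and keeping the best three. The sketch below assumes each sentence already has a score; the scores, sentences, and the `top_k_hit` check are illustrative and are not the paper's model or its evaluation protocol.

```python
import heapq


def top_k_sentences(scored_sentences, k=3):
    """Return the k highest-scoring (score, sentence) pairs, best first."""
    return heapq.nlargest(k, scored_sentences, key=lambda pair: pair[0])


def top_k_hit(scored_sentences, relevant, k=3):
    """True if any of the top-k sentences is in the set of relevant ones."""
    return any(s in relevant for _, s in top_k_sentences(scored_sentences, k))


# Invented scores for one trial report and one bias domain.
scored = [
    (0.91, "Allocation was concealed using sealed opaque envelopes."),
    (0.15, "Participants were recruited from two hospitals."),
    (0.78, "A computer-generated list determined the sequence."),
    (0.40, "Outcome assessors were unaware of group assignment."),
    (0.05, "The study was funded by a charity."),
]
top3 = top_k_sentences(scored)
```

Here `top3` holds the sentences scored 0.91, 0.78 and 0.40, in that order.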


Subject(s)
Algorithms , Bias , Clinical Trials as Topic , Machine Learning , Peer Review, Research/methods , Data Mining , Databases as Topic , Natural Language Processing , Review Literature as Topic
7.
Article in English | MEDLINE | ID: mdl-27746703

ABSTRACT

Systematic reviews underpin Evidence Based Medicine (EBM) by addressing precise clinical questions via comprehensive synthesis of all relevant published evidence. Authors of systematic reviews typically define a Population/Problem, Intervention, Comparator, and Outcome (a PICO criteria) of interest, and then retrieve, appraise and synthesize results from all reports of clinical trials that meet these criteria. Identifying PICO elements in the full-texts of trial reports is thus a critical yet time-consuming step in the systematic review process. We seek to expedite evidence synthesis by developing machine learning models to automatically extract sentences from articles relevant to PICO elements. Collecting a large corpus of training data for this task would be prohibitively expensive. Therefore, we derive distant supervision (DS) with which to train models using previously conducted reviews. DS entails heuristically deriving 'soft' labels from an available structured resource. However, we have access only to unstructured, free-text summaries of PICO elements for corresponding articles; we must derive from these the desired sentence-level annotations. To this end, we propose a novel method - supervised distant supervision (SDS) - that uses a small amount of direct supervision to better exploit a large corpus of distantly labeled instances by learning to pseudo-annotate articles using the available DS. We show that this approach tends to outperform existing methods with respect to automated PICO extraction.
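To make the idea of distant supervision concrete, the sketch below derives noisy sentence-level labels from a free-text PICO summary using plain token overlap. This is a deliberately simple stand-in heuristic, not the supervised distant supervision (SDS) method proposed in the paper; the threshold, function names, and example text are all assumptions.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def distant_labels(sentences, summary, threshold=0.5):
    """Label a sentence 1 if it covers at least `threshold` of the summary tokens."""
    summary_tokens = tokens(summary)
    labels = []
    for sentence in sentences:
        overlap = len(tokens(sentence) & summary_tokens)
        coverage = overlap / len(summary_tokens) if summary_tokens else 0.0
        labels.append(1 if coverage >= threshold else 0)
    return labels


# Invented population summary from a review, and sentences from a trial report.
summary = "adults with type 2 diabetes"
sentences = [
    "We enrolled adults with type 2 diabetes from three clinics.",
    "The primary outcome was change in HbA1c.",
    "Funding was provided by a national grant.",
]
labels = distant_labels(sentences, summary)
```

Only the first sentence is labeled as describing the population. Labels produced this way are noisy, which is the weakness that a small amount of direct supervision is meant to offset.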

8.
IEEE J Biomed Health Inform ; 19(4): 1406-12, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25966488

ABSTRACT

Systematic reviews, which summarize the entirety of the evidence pertaining to a specific clinical question, have become critical for evidence-based decision making in healthcare. But such reviews have become increasingly onerous to produce due to the exponentially expanding biomedical literature base. This study proposes a step toward mitigating this problem by automating risk of bias assessment in systematic reviews, in which reviewers determine whether study results may be affected by biases (e.g., poor randomization or blinding). Conducting risk of bias assessment is an important but onerous task. We thus describe a machine learning approach to automate this assessment, using the standard Cochrane Risk of Bias Tool which assesses seven common types of bias. Training such a system would typically require a large labeled corpus, which would be prohibitively expensive to collect here. Instead, we use distant supervision, using data from the Cochrane Database of Systematic Reviews (a large repository of systematic reviews), to pseudoannotate a corpus of 2200 clinical trial reports in PDF format. We then develop a joint model which, using the full text of a clinical trial report as input, predicts the risks of bias while simultaneously extracting the text fragments supporting these assessments. This study represents a step toward automating or semiautomating extraction of data necessary for the synthesis of clinical trials.


Subject(s)
Bias , Clinical Trials as Topic , Risk Assessment/methods , Humans , Medical Informatics/methods , Natural Language Processing
9.
Source Code Biol Med ; 10: 14, 2015.
Article in English | MEDLINE | ID: mdl-26587054

ABSTRACT

BACKGROUND: Today researchers can choose from many bioinformatics protocols for all types of life sciences research, computational environments and coding languages. Although the majority of these are open source, few of them possess all virtues to maximize reuse and promote reproducible science. Wikipedia has proven a great tool to disseminate information and enhance collaboration between users with varying expertise and background to author qualitative content via crowdsourcing. However, it remains an open question whether the wiki paradigm can be applied to bioinformatics protocols. RESULTS: We piloted PyPedia, a wiki where each article is both implementation and documentation of a bioinformatics computational protocol in the Python language. Hyperlinks within the wiki can be used to compose complex workflows and induce reuse. A RESTful API enables code execution outside the wiki. Initial content of PyPedia contains articles for population statistics, bioinformatics format conversions and genotype imputation. Use of the easy-to-learn wiki syntax effectively lowers the barriers to bring expert programmers and less computer-savvy researchers on the same page. CONCLUSIONS: PyPedia demonstrates how a wiki can provide a collaborative development, sharing and even execution environment for biologists and bioinformaticians that complements existing resources, useful for local and multi-center research teams. AVAILABILITY: PyPedia is available online at: http://www.pypedia.com. The source code and installation instructions are available at: https://github.com/kantale/PyPedia_server. The PyPedia Python library is available at: https://github.com/kantale/pypedia. PyPedia is open-source, available under the BSD 2-Clause License.

10.
Biopreserv Biobank ; 13(3): 178-82, 2015 Jun.
Article in English | MEDLINE | ID: mdl-26035007

ABSTRACT

Owners of biobanks are in an unfortunate position: on the one hand, they need to protect the privacy of their participants, whereas on the other, their usefulness relies on the disclosure of the data they hold. Existing methods for Statistical Disclosure Control attempt to find a balance between utility and confidentiality, but come at a cost for the analysts of the data. We outline an alternative perspective to the balance between confidentiality and utility. By combining the generation of synthetic data with the automated execution of data analyses, biobank owners can guarantee the privacy of their participants, yet allow the analysts to work in an unrestricted manner.


Subject(s)
Biological Specimen Banks , Databases as Topic , Disclosure , Statistics as Topic/methods , Humans , Statistics, Nonparametric
11.
J Am Med Inform Assoc ; 22(1): 65-75, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25361575

ABSTRACT

OBJECTIVE: Pooling data across biobanks is necessary to increase statistical power, reveal more subtle associations, and synergize the value of data sources. However, searching for desired data elements among the thousands of available elements and harmonizing differences in terminology, data collection, and structure, is arduous and time consuming. MATERIALS AND METHODS: To speed up biobank data pooling we developed BiobankConnect, a system to semi-automatically match desired data elements to available elements by: (1) annotating the desired elements with ontology terms using BioPortal; (2) automatically expanding the query for these elements with synonyms and subclass information using OntoCAT; (3) automatically searching available elements for these expanded terms using Lucene lexical matching; and (4) shortlisting relevant matches sorted by matching score. RESULTS: We evaluated BiobankConnect using human curated matches from EU-BioSHaRE, searching for 32 desired data elements in 7461 available elements from six biobanks. We found 0.75 precision at rank 1 and 0.74 recall at rank 10 compared to a manually curated set of relevant matches. In addition, best matches chosen by BioSHaRE experts ranked first in 63.0% and in the top 10 in 98.4% of cases, indicating that our system has the potential to significantly reduce manual matching work. CONCLUSIONS: BiobankConnect provides an easy user interface to significantly speed up the biobank harmonization process. It may also prove useful for other forms of biomedical data integration. All the software can be downloaded as a MOLGENIS open source app from http://www.github.com/molgenis, with a demo available at http://www.biobankconnect.org.
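The matching steps described above (expand the query with synonyms, search the available elements lexically, shortlist by score) can be sketched as follows. This is an illustration only: the synonym table is a hypothetical stand-in for the BioPortal and OntoCAT expansion, and Jaccard token overlap replaces Lucene lexical matching, so the scores are not comparable to those of BiobankConnect.

```python
import re

# Hypothetical synonym table standing in for ontology-based query expansion.
SYNONYMS = {"hypertension": ["high blood pressure"]}


def tokens(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def expand(query, synonyms):
    """The query itself plus any synonyms listed for it."""
    return [query] + synonyms.get(query.lower(), [])


def jaccard(a, b):
    """Token-set overlap between two strings, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def shortlist(query, available, synonyms, top_n=10):
    """Rank available elements by their best match against the expanded query."""
    terms = expand(query, synonyms)
    scored = [(max(jaccard(t, el) for t in terms), el) for el in available]
    scored = [pair for pair in scored if pair[0] > 0]  # drop non-matches
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# Invented data element labels from a biobank catalogue.
available = [
    "History of high blood pressure",
    "Body mass index",
    "Systolic blood pressure measurement",
]
matches = shortlist("hypertension", available, SYNONYMS)
```

Without the synonym expansion the query "hypertension" would match none of these labels; with it, "History of high blood pressure" ranks first and "Systolic blood pressure measurement" second.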


Subject(s)
Abstracting and Indexing , Biological Ontologies , Computational Biology , Datasets as Topic , Software , Humans , Systems Integration , User-Computer Interface