Results 1 - 14 of 14
1.
J Epidemiol; 34(8): 380-386, 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-38105001

ABSTRACT

BACKGROUND: We evaluated the applicability of automated citation screening in developing clinical practice guidelines. METHODS: We prospectively compared the efficiency of citation screening between the conventional (Rayyan) and semi-automated (ASReview) methods. We searched the literature for five clinical questions (CQs) in the development of the Japanese Clinical Practice Guidelines for the Management of Sepsis and Septic Shock. Objective measurements of the time required to complete citation screening were recorded. Following the first screening round, in the primary analysis, the sensitivity, specificity, positive predictive value, and overall screening time were calculated for both procedures, using the semi-automated tool as the index test and the results of the conventional method as the reference standard. In the secondary analysis, the same parameters were compared between the two procedures using the final list of included studies after the second screening session as the reference standard. RESULTS: Among the five CQs after the first screening session, the lowest and highest sensitivity, specificity, and positive predictive values were 0.241 and 0.795; 0.991 and 1.000; and 0.482 and 0.929, respectively. In the secondary analysis, the highest sensitivity and specificity of the semi-automated citation screening were 1.000 and 0.997, respectively. The overall screening time per 100 studies was significantly shorter with semi-automated than with conventional citation screening. CONCLUSION: The potential advantages of the semi-automated method (shorter screening time and higher discriminatory rate for the final list of studies) warrant further validation.


Subject(s)
Practice Guidelines as Topic, Software, Workload, Humans, Prospective Studies, Workload/statistics & numerical data, Sensitivity and Specificity, Japan
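To make the headline figures concrete, here is a minimal sketch of how sensitivity, specificity and positive predictive value can be computed from paired include/exclude decisions; the decision vectors and function name are invented for illustration.

```python
# Sketch: computing sensitivity, specificity and PPV for a semi-automated
# screening run against a reference standard (e.g., conventional screening).
# The example decision vectors are made up for illustration.

def screening_metrics(reference, candidate):
    """reference/candidate: lists of 0/1 include decisions per citation."""
    tp = sum(1 for r, c in zip(reference, candidate) if r == 1 and c == 1)
    tn = sum(1 for r, c in zip(reference, candidate) if r == 0 and c == 0)
    fp = sum(1 for r, c in zip(reference, candidate) if r == 0 and c == 1)
    fn = sum(1 for r, c in zip(reference, candidate) if r == 1 and c == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }

reference = [1, 1, 0, 0, 1, 0, 0, 0]   # conventional (Rayyan) decisions
candidate = [1, 0, 0, 0, 1, 0, 1, 0]   # semi-automated (ASReview) decisions
print(screening_metrics(reference, candidate))
```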
2.
Syst Rev; 12(1): 211, 2023 Nov 13.
Article in English | MEDLINE | ID: mdl-37957691

ABSTRACT

BACKGROUND: Conducting a systematic review is a time- and resource-intensive multi-step process. Enhancing efficiency without sacrificing accuracy and rigor during the screening phase of a systematic review is of interest to the scientific community. METHODS: This case study compares the screening performance of a title-only (Ti/O) screening approach to the more conventional title-plus-abstract (Ti + Ab) screening approach. Both the Ti/O and Ti + Ab screening approaches were performed simultaneously during first-level screening of a systematic review investigating the relationship between dietary patterns and risk factors and incidence of sarcopenia. The qualitative and quantitative performance of each screening approach was compared against the final results of studies included in the systematic review, published elsewhere, which used the standard Ti + Ab approach. A statistical analysis was conducted, and contingency tables were used to compare each screening approach in terms of false inclusions and false exclusions and the subsequent sensitivity, specificity, accuracy, and positive predictive power. RESULTS: Thirty-eight citations were included in the final analysis, published elsewhere. The current case study found that the Ti/O first-level screening approach correctly identified 22 citations and falsely excluded 16 citations, most often due to titles lacking a clear indicator of study design or outcomes relevant to the systematic review eligibility criteria. The Ti + Ab approach correctly identified 36 citations and falsely excluded 2 citations due to limited population and intervention descriptions in the abstract. Our analysis revealed that the performance of the Ti + Ab first-level screening was statistically different from the average performance of both approaches (chi-squared: 5.21, p = 0.0225), while the Ti/O approach was not (chi-squared: 2.92, p = 0.0874). The positive predictive power of the first-level screening was 14.3% and 25.5% for the Ti/O and Ti + Ab approaches, respectively. In terms of sensitivity, 57.9% of studies were correctly identified at the first-level screening stage using the Ti/O approach versus 94.7% by the Ti + Ab approach. CONCLUSIONS: In the current case study comparing two screening approaches, the Ti + Ab screening approach captured more relevant studies than the Ti/O approach by including a higher number of accurately eligible citations. Ti/O screening may increase the likelihood of missing evidence, leading to evidence selection bias. SYSTEMATIC REVIEW REGISTRATION: PROSPERO Protocol Number: CRD42020172655.


Subject(s)
Sarcopenia, Humans, Sarcopenia/diagnosis, Research Design
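The abstract's contingency-table analysis can be illustrated with the reported counts (22 correctly identified and 16 falsely excluded for Ti/O; 36 and 2 for Ti+Ab). This sketch runs a generic chi-squared test on that 2x2 table; it is not the paper's exact comparison against the averaged performance of both approaches.

```python
# Sketch: chi-squared comparison of two first-level screening approaches,
# using the counts reported in the abstract. Illustrative only; the paper
# tested each approach against the average of both, not one versus the other.
from scipy.stats import chi2_contingency

table = [
    [22, 16],  # Ti/O:   correctly identified, falsely excluded
    [36, 2],   # Ti+Ab:  correctly identified, falsely excluded
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.4f}, dof={dof}")
```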
3.
Syst Rev; 12(1): 68, 2023 Apr 15.
Article in English | MEDLINE | ID: mdl-37061711

ABSTRACT

OBJECTIVE: To investigate the usefulness and performance metrics of three freely available software tools (Rayyan®, Abstrackr® and Colandr®) for title screening in systematic reviews. STUDY DESIGN AND SETTING: In this methodological study, the usefulness of the tools for screening titles in systematic reviews was investigated by comparing the number of titles identified by software-assisted screening with those identified by manual screening in a previously published systematic review. To test performance, the sensitivity, specificity, false-negative rate, proportion missed, workload savings and time savings were calculated. A purpose-built survey was used to evaluate the raters' experiences of the tools' performance. RESULTS: Rayyan® was the most sensitive tool, and raters correctly identified 78% of the true positives. All three tools were specific, and raters correctly identified 99% of the true negatives. The tools also had similar values for precision, proportion missed, workload savings and time savings. Rayyan®, Abstrackr® and Colandr® had false-negative rates of 21%, 39% and 34%, respectively. Rayyan® received the best rating (35/40) from the raters. CONCLUSION: Rayyan®, Abstrackr® and Colandr® are useful tools and yielded good performance metrics for systematic title screening. Rayyan® ranked best both on the quantitative evaluation and from the raters' perspective. The most important finding of this study is that the use of software to screen titles did not remove any title that would have met the inclusion criteria for the final review, making these tools valuable resources for facilitating the screening process.


Subject(s)
Machine Learning, Software, Humans, Systematic Reviews as Topic, Workload
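A short sketch of two of the metrics named above, false-negative rate and workload savings, computed from raw counts; the counts are invented, and definitions of workload savings vary between papers.

```python
# Sketch: screening-performance figures of the kind reported above,
# computed from raw counts. Example counts are invented for illustration.

def false_negative_rate(fn: int, tp: int) -> float:
    """Share of truly relevant titles that were missed."""
    return fn / (fn + tp)

def workload_saving(excluded_by_tool: int, total: int) -> float:
    """Share of titles removed from manual full review (one common definition)."""
    return excluded_by_tool / total

print(false_negative_rate(fn=21, tp=79))         # 0.21, cf. Rayyan's 21%
print(workload_saving(excluded_by_tool=900, total=1000))  # 0.90
```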
4.
Res Synth Methods; 14(2): 156-172, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35798691

ABSTRACT

We aimed to evaluate the performance of supervised machine learning algorithms in predicting articles relevant for full-text review in a systematic review. Overall, 16,430 manually screened titles/abstracts, including 861 references identified as relevant for full-text review, were used for the analysis. Of these, 40% (n = 6573) were subdivided for training (70%) and testing (30%) the algorithms. The remaining 60% (n = 9857) were used as a validation set. We evaluated down- and up-sampling methods and compared unigram, bigram, and singular value decomposition (SVD) approaches. For each approach, Naïve Bayes, support vector machines (SVM), regularized logistic regression, neural networks, random forest, LogitBoost, and XGBoost were implemented using simple term-frequency or Tf-Idf feature representations. Performance was evaluated using sensitivity, specificity, precision and area under the curve (AUC). We combined the predictions of the best-performing algorithms (Youden index ≥0.3 with sensitivity/specificity ≥70%/60%). In a down-sampled unigram approach, Naïve Bayes, the SVM/quanteda text models with Tf-Idf, and the linear SVM from the e1071 package with Tf-Idf achieved >90% sensitivity at >65% specificity. Combining the predictions of the 10 best-performing algorithms improved performance, reaching 95% sensitivity and 64% specificity in the validation set. The crude screening burden was reduced by 61% (5979 records; adjusted: 80.3%) with a 5% (27 records) false-negative rate. All other approaches yielded relatively poorer performance. The down-sampled unigram approach achieved good performance on our data. Combining the predictions of algorithms improved sensitivity while the screening burden was reduced by almost two-thirds. Machine learning approaches to title/abstract screening should be investigated further, with a view to refining these tools and automating their implementation.


Subject(s)
Algorithms, Machine Learning, Bayes Theorem, Sensitivity and Specificity, Data Collection
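One configuration from the family of models described above can be sketched as a unigram Tf-Idf pipeline with a linear SVM; the toy data are placeholders, and class_weight="balanced" stands in for the paper's down-sampling step.

```python
# Sketch of one configuration from the family described above: unigram
# Tf-Idf features with a linear SVM. Data and labels are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

titles_abstracts = [
    "randomized trial of drug A in adults",
    "case report of rare condition",
    "systematic review protocol for intervention B",
    "cohort study of exposure C and outcome D",
]
relevant = [1, 0, 1, 0]  # 1 = ordered for full-text review

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 1)),  # unigrams; (1, 2) would add bigrams
    LinearSVC(class_weight="balanced"),   # stand-in for the down-sampling step
)
model.fit(titles_abstracts, relevant)
print(model.predict(["pragmatic trial of drug A"]))
```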
5.
Syst Rev; 10(1): 98, 2021 Apr 05.
Article in English | MEDLINE | ID: mdl-33820560

ABSTRACT

BACKGROUND: Accepted systematic review (SR) methodology requires citation screening by two reviewers to maximise retrieval of eligible studies. We hypothesized that records could be excluded by a single reviewer without loss of sensitivity under two conditions: the record was ineligible for multiple reasons, or the record was ineligible for one or more specific reasons that could be reliably assessed. METHODS: Twenty-four SRs performed at CHEO, a pediatric health care and research centre in Ottawa, Canada, were divided into derivation and validation sets. Exclusion criteria during abstract screening were sorted into 11 specific categories, with loss in sensitivity determined by individual category and by number of exclusion criteria endorsed. Five single-reviewer algorithms that combined individual categories and multiple exclusion criteria were then tested on the derivation and validation sets, with success defined a priori as less than 5% loss of sensitivity. RESULTS: The 24 SRs included 930 eligible and 27,390 ineligible citations. The reviews were mostly focused on pediatrics (70.8%, N=17/24) but covered various specialties. Using a single reviewer to exclude any citation led to an average loss of sensitivity of 8.6% (95% CI, 6.0-12.1%). Excluding citations with ≥2 exclusion criteria led to a 1.2% average loss of sensitivity (95% CI, 0.5-3.1%). Five specific exclusion criteria performed with perfect sensitivity: conference abstract, ineligible age group, case report/series, not human research, and review article. In the derivation set, the five algorithms achieved a loss of sensitivity ranging from 0.0 to 1.9% and work saved ranging from 14.8 to 39.1%. In the validation set, the loss of sensitivity for all five algorithms remained below 2.6%, with work saved between 10.5% and 48.2%. CONCLUSIONS: The findings suggest that targeted application of single-reviewer screening, considering both the type and number of exclusion criteria, could retain sensitivity and significantly decrease workload. Further research is required to investigate the potential for combining this approach with crowdsourcing or machine learning methodologies.


Subject(s)
Algorithms, Machine Learning, Systematic Reviews as Topic, Child, Humans, Canada, Mass Screening, Research
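The decision rule the authors tested can be sketched as follows; the five "safe" categories are taken from the abstract, while the function itself is an illustrative reconstruction, not the authors' code.

```python
# Sketch of the kind of rule tested above: accept a single reviewer's
# exclusion when it cites >= 2 criteria or one of the "safe" categories
# that performed with perfect sensitivity in the study.

SAFE_CATEGORIES = {
    "conference abstract", "ineligible age group",
    "case report/series", "not human research", "review article",
}

def single_reviewer_exclusion_ok(criteria: set[str]) -> bool:
    """True if this citation can be excluded without a second reviewer."""
    return len(criteria) >= 2 or bool(criteria & SAFE_CATEGORIES)

print(single_reviewer_exclusion_ok({"review article"}))                       # True
print(single_reviewer_exclusion_ok({"wrong intervention"}))                   # False
print(single_reviewer_exclusion_ok({"wrong intervention", "wrong outcome"}))  # True
```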
6.
Syst Rev; 9(1): 73, 2020 Apr 02.
Article in English | MEDLINE | ID: mdl-32241297

ABSTRACT

BACKGROUND: Improving the speed of systematic review (SR) development is key to supporting evidence-based medicine. Machine learning tools that semi-automate citation screening might improve efficiency. Few studies have assessed the use of screening prioritization functionality or compared two tools head to head. In this project, we compared the performance of two machine-learning tools for potential use in citation screening. METHODS: Using 9 evidence reports previously completed by the ECRI Institute Evidence-based Practice Center team, we compared the performance of Abstrackr and EPPI-Reviewer, two off-the-shelf citation screening tools, for identifying relevant citations. Screening prioritization functionality was tested for 3 large reports and 6 small reports on a range of clinical topics. The large report topics were imaging for pancreatic cancer, indoor allergen reduction, and inguinal hernia repair. We trained Abstrackr and EPPI-Reviewer and screened all citations in 10% increments. In Task 1, we inputted whether an abstract was ordered for full-text screening; in Task 2, we inputted whether an abstract was included in the final report. For both tasks, screening continued until all studies ordered and included for the actual reports were identified. We assessed the potential reductions in hypothetical screening burden (the proportion of citations screened to identify all included studies) offered by each tool for all 9 reports. RESULTS: For the 3 large reports, both EPPI-Reviewer and Abstrackr performed well, with potential reductions in screening burden of 4 to 49% (Abstrackr) and 9 to 60% (EPPI-Reviewer). Both tools had markedly poorer performance for 1 large report (inguinal hernia), possibly due to its heterogeneous key questions. Based on McNemar's test for paired proportions in the 3 large reports, EPPI-Reviewer outperformed Abstrackr for identifying articles ordered for full-text review, but Abstrackr performed better in 2 of 3 reports for identifying articles included in the final report. For small reports, both tools provided benefits, but EPPI-Reviewer generally outperformed Abstrackr in both tasks, although these results were often not statistically significant. CONCLUSIONS: Abstrackr and EPPI-Reviewer performed well, but prioritization accuracy varied greatly across reports. Our work suggests that screening prioritization functionality is a promising modality, offering efficiency gains without giving up human involvement in the screening process.


Subject(s)
Machine Learning, Mass Screening, Evidence-Based Medicine, Humans, Research, Systematic Reviews as Topic
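McNemar's test on paired tool decisions for the same citations, as used above to compare the two tools, can be sketched like this; the cell counts are invented.

```python
# Sketch: McNemar's test on paired tool decisions over the same citations,
# as used to compare EPPI-Reviewer and Abstrackr. Counts are invented.
from statsmodels.stats.contingency_tables import mcnemar

# Rows: Abstrackr (correct, incorrect); columns: EPPI-Reviewer (correct, incorrect).
table = [
    [880, 15],  # both correct / only Abstrackr correct
    [40, 65],   # only EPPI-Reviewer correct / both incorrect
]
result = mcnemar(table, exact=True)  # exact binomial test on the discordant pairs
print(f"statistic={result.statistic}, p-value={result.pvalue:.4f}")
```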
7.
J Biomed Inform; 94: 103202, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31075531

ABSTRACT

CONTEXT: Citation screening (also called study selection) is a phase of the systematic review process that has attracted growing interest in the use of text mining (TM) methods to reduce the time and effort required. Search results are usually imbalanced between the relevant and irrelevant classes of returned citations. Class imbalance, among other factors, has been a persistent problem that impairs the performance of TM models, particularly in the context of automatic citation screening for systematic reviews. As a result, the performance of classification models using basic title and abstract data alone has ordinarily fallen short of expectations. OBJECTIVE: In this study, we explore the effects of using full bibliography data in addition to title and abstract on text classification performance for automatic citation screening. METHODS: We experiment with binary and Word2vec feature representations and SVM models using 4 software engineering (SE) and 15 medical review datasets. We build and compare 3 types of models (binary non-linear, Word2vec linear and Word2vec non-linear kernels) on each dataset using the two feature sets. RESULTS: The bibliography-enriched data exhibited consistently improved performance in terms of recall, work saved over sampling (WSS) and Matthews correlation coefficient (MCC) in 3 of the 4 SE datasets, which are fairly large. For the medical datasets the results vary; however, in the majority of cases the performance is the same or better. CONCLUSION: Including the bibliography data can potentially improve model performance, but to date the results are inconclusive.


Subject(s)
Bibliographies as Topic, Data Mining/methods, Automation, Computational Biology/methods, Models, Theoretical
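The two headline metrics, work saved over sampling (WSS) and the Matthews correlation coefficient (MCC), can be computed from confusion-matrix counts as follows; the example counts are invented.

```python
# Sketch: work saved over sampling (WSS) and Matthews correlation
# coefficient (MCC) from confusion-matrix counts. Counts are invented.
import math

def wss(tn: int, fn: int, total: int, recall: float) -> float:
    """Work saved over sampling at a given recall level (e.g., WSS@95)."""
    return (tn + fn) / total - (1.0 - recall)

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(wss(tn=700, fn=5, total=1000, recall=0.95))  # 0.655
print(mcc(tp=95, tn=700, fp=200, fn=5))
```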
8.
Syst Rev; 8(1): 23, 2019 Jan 15.
Article in English | MEDLINE | ID: mdl-30646959

ABSTRACT

BACKGROUND: Here, we outline a method of applying existing machine learning (ML) approaches to aid citation screening in an ongoing broad and shallow systematic review of preclinical animal studies. The aim is to achieve a high-performing algorithm, comparable to human screening, that can reduce the human resources required for this step of a systematic review. METHODS: We applied ML approaches to a broad systematic review of animal models of depression at the citation screening stage. We tested two independently developed ML approaches, which used different classification models and feature sets. We recorded the performance of the ML approaches on an unseen validation set of papers using sensitivity, specificity and accuracy. We aimed to achieve 95% sensitivity and to maximise specificity. The classification model providing the most accurate predictions was applied to the remaining unseen records in the dataset and will be used in the next stage of the preclinical biomedical sciences systematic review. We used a cross-validation technique to assign ML inclusion likelihood scores to the human-screened records, to identify potential errors made during the human screening process (error analysis). RESULTS: The ML approaches reached 98.7% sensitivity based on learning from a training set of 5749 records, with an inclusion prevalence of 13.2%. The highest level of specificity reached was 86%. Performance was assessed on an independent validation dataset. Human errors in the training and validation sets were successfully identified by using the inclusion likelihoods assigned by the ML model to highlight discrepancies. Training the ML algorithm on the corrected dataset improved specificity without compromising sensitivity. Error-analysis correction led to a 3% improvement in sensitivity and specificity, which increased the precision and accuracy of the ML algorithm. CONCLUSIONS: This work has confirmed the performance and applicability of ML algorithms for screening in systematic reviews of preclinical animal studies. It has highlighted a novel use of ML algorithms to identify human error. This needs to be confirmed in other reviews with different inclusion prevalences, but it represents a promising approach to integrating human decisions and automation in systematic review methodology.


Subject(s)
Algorithms, Machine Learning, Systematic Reviews as Topic, Animals, Humans, Bibliometrics, Depressive Disorder, Models, Animal, Sensitivity and Specificity, Workload
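The error-analysis idea, assigning cross-validated inclusion likelihoods to human-screened records and flagging confident disagreements, might look roughly like this; the model, threshold and toy data are assumptions, not the authors' pipeline.

```python
# Sketch of the error-analysis idea: cross-validated inclusion probabilities
# for already-screened records, flagging confident disagreements with the
# human label for re-checking. Model choice and threshold are illustrative.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

texts = ["rat model of depression", "review of depression scales",
         "mouse chronic stress model", "editorial on animal welfare"] * 5
human_labels = np.array([1, 0, 1, 0] * 5)  # 1 = human screener included

pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
proba = cross_val_predict(pipe, texts, human_labels,
                          cv=5, method="predict_proba")[:, 1]

# Flag records where the model is confident and disagrees with the human.
flagged = [(t, round(p, 2)) for t, p, y in zip(texts, proba, human_labels)
           if abs(p - y) > 0.8]
print(flagged)
```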
9.
Syst Rev; 7(1): 166, 2018 Oct 20.
Article in English | MEDLINE | ID: mdl-30340633

ABSTRACT

BACKGROUND: Systematic information retrieval generally requires a two-step study selection process conducted by two persons independently of one another (double-screening approach). To increase efficiency, two methods seem promising and will be tested in the planned study: the use of text mining to prioritize search results, and the involvement of only one person in the study selection process (single-screening approach). The aim of the present study is to examine the following questions related to the study selection process: Can the use of the Rayyan or EPPI Reviewer tools to prioritize the results of study selection increase efficiency? How accurately does a single-screening approach identify relevant studies? Which advantages or disadvantages (e.g., shortened screening time or an increase in the number of full texts ordered) does a single-screening versus a double-screening approach have? METHODS: Our study is a prospective analysis of study selection processes based on benefit assessments of drug and non-drug interventions. It consists of two parts: firstly, the evaluation of a single-screening approach based on a sample size calculation (11 study selection processes, comprising 33 single screenings) and involving different screening tools, and secondly, the evaluation of the conventional double-screening approach based on five conventional study selection processes. In addition, the advantages and disadvantages of the single-screening versus the double-screening approach with regard to the outcomes "number of full texts ordered" and "time required for study selection" are analyzed. The previous work experience of the screeners is considered a potential effect modifier. DISCUSSION: No study comparing the features of prioritization tools is currently available. Our study can thus contribute to filling this evidence gap. It is also the first to investigate a range of questions surrounding the screening process and to include an a priori sample size calculation, thus enabling statistical conclusions. In addition, the impact of missing studies on the conclusion of a benefit assessment is calculated. SYSTEMATIC REVIEW REGISTRATION: Not applicable.


Subject(s)
Information Storage and Retrieval/methods, Systematic Reviews as Topic, Data Mining, Humans, Research Design
10.
Syst Rev; 6(1): 233, 2017 Nov 25.
Article in English | MEDLINE | ID: mdl-29178925

ABSTRACT

BACKGROUND: Citation screening for scoping searches and rapid reviews is time-consuming and inefficient, often requiring days or sometimes months to complete. We examined the reliability of PICo-based title-only screening using keyword searches based on the PICo elements Participants, Interventions and Comparators, but not the Outcomes. METHODS: A convenience sample of 10 datasets, derived from the literature searches of completed systematic reviews, was used to test PICo-based title-only screening. Search terms for screening were generated from the inclusion criteria of each review, specifically the PICo elements. Synonyms for the PICo terms were sought, including alternatives for clinical conditions, trade names of generic drugs and abbreviations for clinical conditions, interventions and comparators. The MeSH database, Wikipedia, Google searches and online thesauri were used to assist in generating terms. Title-only screening was performed by five reviewers independently in EndNote X7 reference management software using the Boolean OR operator. Outcome measures were the recall of included studies and the reduction in screening effort. Recall is the proportion of included studies retrieved using PICo title-only screening out of the total number of included studies in the original reviews. The percentage reduction in screening effort is the proportion of records not needing screening because the method eliminates them from the screening set. RESULTS: Across the 10 reviews, the reduction in screening effort ranged from 11 to 78%, with a median reduction of 53%. In nine systematic reviews, the recall of included studies was 100%. In one review (oxygen therapy), four of five reviewers missed the same included study (median recall 67%). A post hoc analysis was performed on the dataset with the lowest reduction in screening effort (11%): it was rescreened using only the intervention and comparator keywords, omitting the keywords for participants. The reduction in screening effort increased to 57%, and the recall of included studies was maintained (100%). CONCLUSIONS: In this sample of datasets, PICo-based title-only screening was able to expedite citation screening for scoping searches and rapid reviews by reducing the number of citations needing screening, but it requires a thorough workup of potential synonyms and alternative terms. Further research evaluating the feasibility of this technique with heterogeneous datasets in different fields would be useful to inform its generalisability.


Subject(s)
Databases, Bibliographic, Information Storage and Retrieval/methods, Review Literature as Topic, Humans
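PICo-based title-only screening amounts to a keyword OR-filter over titles. A minimal sketch with invented keywords and titles, including the two outcome measures defined above:

```python
# Sketch: PICo-based title-only screening as a keyword OR-filter, plus the
# reduction-in-screening-effort measure. Keywords and titles are invented.

pico_terms = {"sepsis", "septic", "vasopressor",
              "noradrenaline", "norepinephrine"}  # P + I/C synonyms

titles = [
    "Norepinephrine versus dopamine in septic shock",
    "Knee arthroplasty outcomes at five years",
    "Early vasopressor use in sepsis: an RCT",
]

kept = [t for t in titles
        if any(term in t.lower() for term in pico_terms)]

reduction_in_effort = 1 - len(kept) / len(titles)
print(kept)
print(f"screening effort reduced by {reduction_in_effort:.0%}")
# Recall would be: retrieved included studies / all included studies.
```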
11.
J Biomed Inform; 73: 1-13, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28711679

ABSTRACT

CONTEXT: Independent validation of published scientific results through study replication is a precondition for accepting the validity of such results. In computational research, full replication is often unrealistic for independent validation of results; study reproduction has therefore been justified as the minimum acceptable standard for evaluating the validity of scientific claims. The application of text mining techniques to citation screening in the context of systematic literature reviews is a relatively young and growing computational area with high relevance for software engineering, medical research and other fields. However, there is little work so far on reproduction studies in the area. OBJECTIVE: In this paper, we investigate the reproducibility of studies in this area based on the information contained in published articles, and we propose reporting guidelines that could improve reproducibility. METHODS: The study was approached in two ways. Initially, we attempted to reproduce results from six studies, which were based on the same raw dataset. Then, based on this experience, we identified the steps considered essential to successful reproduction of text mining experiments and characterized them to measure how reproducible a study is given the information provided on these steps. Thirty-three articles were systematically assessed for reproducibility using this approach. RESULTS: Our work revealed that it is currently difficult, if not impossible, to independently reproduce the results published in any of the studies investigated. The lack of information about the datasets used limits the reproducibility of about 80% of the studies assessed. Information about the machine learning algorithms is also inadequate in about 27% of the papers. On the plus side, the third-party software tools used are mostly free and available. CONCLUSIONS: The reproducibility potential of most of the studies can be significantly improved if more attention is paid to the information provided on the datasets used, how they were partitioned and utilized, and how any randomization was controlled. We introduce a checklist of the information that needs to be provided to ensure that a published study can be reproduced.


Subject(s)
Checklist, Data Mining, Review Literature as Topic, Biomedical Research, Humans, Publications, Reproducibility of Results
12.
J Biomed Inform; 72: 67-76, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28648605

ABSTRACT

Citation screening, an integral process within systematic reviews that identifies citations relevant to the underlying research question, is a time-consuming and resource-intensive task. During the screening task, analysts manually assign a label to each citation to designate whether it is eligible for inclusion in the review. Recently, several studies have explored the use of active learning in text classification to reduce the human workload involved in the screening task. However, existing approaches require a significant number of manually labelled citations for the text classification to achieve robust performance. In this paper, we propose a semi-supervised method that identifies relevant citations as early as possible in the screening process by exploiting the pairwise similarities between labelled and unlabelled citations to improve classification performance without additional manual labelling effort. Our approach is based on the hypothesis that similar citations share the same label (e.g., if one citation should be included, then other similar citations should also be included). To calculate the similarity between labelled and unlabelled citations, we investigate two different feature spaces, namely a bag-of-words and a spectral embedding based on the bag-of-words. The semi-supervised method propagates the classification codes of manually labelled citations to neighbouring unlabelled citations in the feature space. The automatically labelled citations are combined with the manually labelled citations to form an augmented training set. For evaluation purposes, we apply our method to reviews from the clinical and public health domains. The results show that our semi-supervised method with label propagation achieves statistically significant improvements over two state-of-the-art active learning approaches across both clinical and public health reviews.


Subject(s)
Review Literature as Topic, Automation, Data Curation, Humans, Natural Language Processing
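A rough sketch of the label-propagation idea, using scikit-learn's LabelSpreading over bag-of-words features as a stand-in for the paper's own propagation method; the toy citations are invented.

```python
# Sketch of the semi-supervised idea: propagate labels from screened to
# unscreened citations through bag-of-words similarity. LabelSpreading is
# a stand-in for the paper's own propagation method.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.semi_supervised import LabelSpreading

texts = ["statin trial in adults", "statin RCT cardiovascular outcomes",
         "qualitative study of diet", "interview study on eating habits"]
labels = np.array([1, -1, 0, -1])  # -1 marks unlabelled citations

X = CountVectorizer().fit_transform(texts).toarray()  # bag-of-words features
model = LabelSpreading(kernel="knn", n_neighbors=2).fit(X, labels)
print(model.transduction_)  # propagated include/exclude codes
```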
13.
Transl Pediatr; 6(1): 18-26, 2017 Jan.
Article in English | MEDLINE | ID: mdl-28164026

ABSTRACT

BACKGROUND: Completing large systematic reviews and keeping them up to date poses significant challenges, mainly because of the toll required of a small group of experts to screen and extract potentially eligible citations. Automated approaches have so far failed to provide an accessible and adaptable tool for the research community. Over the past decade, crowdsourcing has become attractive in the scientific field, and implementing it in citation screening could save the investigative team significant work and decrease the time to publication. METHODS: Citations from the 2015 update of a pediatric vitamin D systematic review were uploaded to an online platform designed for crowdsourcing the screening process (http://www.CHEORI.org/en/CrowdScreenOverview). Three sets of exclusion criteria were used for screening, with a review of abstracts at level one and full-text eligibility determined through two screening stages. Two trained reviewers, who had participated in the initial systematic review, established citation eligibility. In parallel, each citation received four independent assessments from an untrained crowd with a medical background. Citations were retained or excluded if they received three congruent assessments; otherwise, they were reviewed by the principal investigator. Measured outcomes included the sensitivity of the crowd in retaining eligible studies, and potential work saved, defined as citations sorted by the crowd (excluded or retained) without involvement of the principal investigator. RESULTS: A total of 148 citations were identified for screening, of which 20 met the eligibility criteria (true positives). The four reviewers from the crowd agreed completely on 63% (95% CI: 57-69%) of assessments, and achieved a sensitivity of 100% (95% CI: 88-100%) and a specificity of 99% (95% CI: 96-100%). Potential work saved for the research team was 84% (95% CI: 77-89%) at the abstract screening stage, and 73% (95% CI: 67-79%) through all three levels. Different thresholds for citation retention and exclusion were also assessed. With an algorithm favoring sensitivity (a citation excluded only if all four reviewers agreed), sensitivity was maintained at 100%, with a decrease in potential work saved to 66% (95% CI: 59-71%). In contrast, increasing the threshold required for retention (excluding all citations not obtaining 3/4 retain assessments) decreased sensitivity to 85% (95% CI: 65-96%), while improving potential workload saved to 92% (95% CI: 88-95%). CONCLUSIONS: This study demonstrates the accuracy of crowdsourcing for systematic review citation screening, with retention of all eligible articles and a significant reduction in the work required of the investigative team. Together, these findings suggest that crowdsourcing could represent a significant advancement in the area of systematic review. Future directions include further study to assess validity across medical fields and to determine the capacity of a non-medical crowd.
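The retention/exclusion rule described above (three congruent assessments out of four, otherwise escalation) can be sketched as a small triage function; the thresholds mirror the abstract, while the code itself is illustrative.

```python
# Sketch of the agreement rule described above: retain or exclude when at
# least 3 of 4 crowd assessments agree, otherwise escalate to the PI.

def triage(votes: list[int]) -> str:
    """votes: 1 = retain, 0 = exclude, from four crowd reviewers."""
    if sum(votes) >= 3:
        return "retain"
    if sum(votes) <= 1:          # i.e., >= 3 exclude votes
        return "exclude"
    return "refer to principal investigator"

print(triage([1, 1, 1, 0]))  # retain
print(triage([0, 0, 1, 0]))  # exclude
print(triage([1, 1, 0, 0]))  # refer to principal investigator
```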

14.
J Biomed Inform; 62: 59-65, 2016 Aug.
Article in English | MEDLINE | ID: mdl-27293211

ABSTRACT

Systematic reviews require expert reviewers to manually screen thousands of citations in order to identify all articles relevant to the review. Active learning text classification is a supervised machine learning approach that has been shown to significantly reduce the manual annotation workload by semi-automating the citation screening process of systematic reviews. In this paper, we present a new topic detection method that induces an informative representation of studies to improve the performance of the underlying active learner. Our proposed topic detection method uses a neural network-based vector space model to capture semantic similarities between documents. We first represent the documents within the vector space and cluster them into a predefined number of clusters. The centroids of the clusters are treated as latent topics, and each document is then represented as a mixture of latent topics. For evaluation purposes, we employ the active learning strategy using both our novel topic detection method and a baseline topic model (i.e., Latent Dirichlet Allocation). The results demonstrate that our method achieves a high sensitivity in identifying eligible studies and a significantly reduced manual annotation cost compared to the baseline method. This observation is consistent across two clinical and three public health reviews. The tool introduced in this work is available from https://nactem.ac.uk/pvtopic/.


Subject(s)
Machine Learning, Semantics, Classification, Humans, Review Literature as Topic, Support Vector Machine
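The topic-detection pipeline, embedding documents, clustering them, treating centroids as latent topics and representing each document as a topic mixture, might be sketched as follows; Tf-Idf and a softmax over centroid distances stand in for the paper's neural vector space model and its (unspecified here) mixture weighting.

```python
# Sketch of the topic-detection idea above: embed documents, cluster them,
# treat centroids as latent topics, and represent each document as a soft
# mixture over those topics. Tf-Idf stands in for the neural embedding.
import numpy as np
from scipy.special import softmax
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["sepsis fluid resuscitation trial", "septic shock vasopressors",
        "influenza vaccine uptake survey", "vaccination hesitancy study"]

X = TfidfVectorizer().fit_transform(docs).toarray()
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Distance to each centroid -> soft topic mixture per document.
distances = kmeans.transform(X)            # shape (n_docs, n_topics)
mixtures = softmax(-distances, axis=1)     # nearer topic -> higher weight
print(np.round(mixtures, 2))
```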