Results 1 - 4 of 4
1.
PLoS One; 15(1): e0227742, 2020.
Article in English | MEDLINE | ID: mdl-31935267

ABSTRACT

BACKGROUND: Automated approaches to improve the efficiency of systematic reviews are greatly needed. When testing any of these approaches, the criterion standard of comparison (gold standard) is usually human reviewers. Yet human reviewers make errors when including and excluding references.
OBJECTIVES: To determine citation false-inclusion and false-exclusion rates during abstract screening by pairs of independent reviewers. These rates can help in designing, testing, and implementing automated approaches.
METHODS: We identified all systematic reviews conducted between 2010 and 2017 by an evidence-based practice center in the United States. Eligible reviews had to follow standard systematic review procedures with dual independent screening of abstracts and full texts, in which citation inclusion by one reviewer prompted automatic inclusion through the next level of screening. Disagreements between reviewers during full-text screening were reconciled via consensus or arbitration by a third reviewer. A false inclusion or exclusion was defined as a decision made by a single reviewer that was inconsistent with the final list of included studies.
RESULTS: We analyzed a total of 139,467 citations that underwent 329,332 inclusion and exclusion decisions from 86 unique reviewers. The final systematic reviews included 5.48% of the potential references identified through bibliographic database searches (95% confidence interval (CI): 2.38% to 8.58%). After abstract screening, the total error rate (false inclusions plus false exclusions) was 10.76% (95% CI: 7.43% to 14.09%).
CONCLUSIONS: This study suggests substantial false-inclusion and false-exclusion rates by human reviewers. When judging the validity of a future automated study-selection algorithm, it is important to keep in mind that the gold standard is not perfect, and that achieving error rates similar to those of humans may be adequate and can save resources and time.
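As a rough illustration of how such rates are computed, here is a sketch that scores single-reviewer screening decisions against the final included-study list and attaches a simple Wald-style 95% confidence interval. The data structures and function names are illustrative assumptions, not taken from the study:

```python
import math

def screening_error_rate(decisions, final_included):
    """Fraction of single-reviewer decisions (false inclusions plus
    false exclusions) that disagree with the final included-study list.

    decisions: iterable of (citation_id, reviewer_included: bool)
    final_included: set of citation_ids in the published review
    """
    decisions = list(decisions)
    errors = sum(
        1 for cid, included in decisions
        if included != (cid in final_included)
    )
    return errors / len(decisions)

def wald_ci_95(p, n):
    """Simple Wald 95% confidence interval for a proportion p over n trials."""
    half = 1.96 * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)
```

Note that with p = 0.1076 over 329,332 decisions, a naive interval computed this way would be far narrower than the one the paper reports; the authors' CI presumably accounts for clustering by review and reviewer, which this sketch ignores.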


Subjects
Systematic Reviews as Topic; Abstracting and Indexing; Algorithms; Databases, Bibliographic; Electronic Data Processing; Humans; Research Design
2.
BMC Cancer; 6: 291, 2006 Dec 18.
Article in English | MEDLINE | ID: mdl-17176480

ABSTRACT

BACKGROUND: Breast cancer in women is increasingly frequent, and care is complex, onerous, and expensive, all of which lend urgency to improvements in care. Quality measurement is essential to monitor effectiveness and to guide improvements in healthcare.
METHODS: Ten databases, including Medline, were searched electronically to identify measures assessing the quality of breast cancer care in women (diagnosis, treatment, follow-up, documentation of care). Eligible studies measured adherence to standards of breast cancer care in women diagnosed with, or in treatment for, any histological type of adenocarcinoma of the breast. Reference lists of studies, review articles, web sites, and files of experts were searched manually. Evidence appraisal entailed dual independent assessments of data (e.g., indicators used in quality measurement), and the extent of each quality indicator's scientific validation as a measure was assessed. The American Society of Clinical Oncology (ASCO) was asked to contribute quality measures under development.
RESULTS: Sixty relevant reports identified 58 studies with 143 indicators assessing adherence to quality breast cancer care. A paucity of validated indicators (n = 12), most of which assessed quality of life, permitted only a qualitative data synthesis. Most quality indicators evaluated processes of care.
CONCLUSION: While some studies revealed patterns of under-use of care, all adherence data require confirmation using validated quality measures. ASCO's current development of a set of quality measures relating to breast cancer care may hold the key to conducting definitive studies.


Subjects
Breast Neoplasms/therapy; Female; Humans; Outcome and Process Assessment, Health Care; Patient Selection; Treatment Outcome
3.
Artif Intell Med; 51(1): 17-25, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21084178

ABSTRACT

OBJECTIVE: To determine whether the automatic classification of documents can be useful in systematic reviews on medical topics, and specifically whether its performance can be enhanced by using the particular protocol of questions employed by the human reviewers to create multiple classifiers.
METHODS AND MATERIALS: The test collection is the data used in a large-scale systematic review on the dissemination of health care services for elderly people. From a group of 47,274 abstracts marked by human reviewers to be included in or excluded from further screening, we randomly selected 20,000 as a training set, with the remaining 27,274 becoming a separate test set. As the machine learning algorithm we used complement naïve Bayes. We tested both a global classification method, where a single classifier is trained on instances of abstracts and their classification (i.e., included or excluded), and a novel per-question classification method that trains multiple classifiers for each abstract, exploiting the specific protocol (questions) of the systematic review. For the per-question method we tested four ways of combining the results of the classifiers trained for the individual questions. As evaluation measures, we calculated precision and recall for several settings of the two methods. It is most important not to exclude any relevant documents (i.e., to attain high recall on the class of interest), but it is also desirable to exclude most of the non-relevant documents (i.e., to attain high precision on the class of interest) in order to reduce the human workload.
RESULTS: For the global method, the highest recall was 67.8% and the highest precision was 37.9%. For the per-question method, the highest recall was 99.2% and the highest precision was 63%. The human-machine workflow proposed in this paper achieved a recall of 99.6% and a precision of 17.8%.
CONCLUSION: The per-question method, which combines classifiers following the specific protocol of the review, leads to better recall than the global method. Because neither method is reliable enough to classify abstracts by itself, the technology should be applied in a semi-automatic way, with a human expert still involved. When the workflow includes one human expert and the trained automatic classifier, recall improves to an acceptable level, showing that automatic classification techniques can reduce the human workload in building a systematic review.
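The per-question idea can be sketched with a minimal complement naive Bayes (in the spirit of Rennie et al.'s formulation) and an OR-style combination of per-question classifiers. Everything here is an illustrative assumption, not the authors' implementation: the toy tokenized abstracts, the question labels, and the OR rule (the paper itself tests four combination rules on its own feature pipeline):

```python
import math
from collections import Counter

class TinyComplementNB:
    """Minimal complement naive Bayes over token-count features:
    for each class, model the token distribution of its COMPLEMENT
    (all other classes) and assign the class whose complement fits worst."""

    def fit(self, docs, labels, alpha=1.0):
        self.classes = sorted(set(labels))
        vocab = {t for d in docs for t in d}
        self.weights = {}
        for c in self.classes:
            # Count tokens in documents NOT belonging to class c.
            comp = Counter()
            for d, y in zip(docs, labels):
                if y != c:
                    comp.update(d)
            denom = sum(comp.values()) + alpha * len(vocab)
            self.weights[c] = {
                t: math.log((comp[t] + alpha) / denom) for t in vocab
            }
        return self

    def predict(self, doc):
        # Smallest complement log-likelihood wins; unseen tokens score 0.
        scores = {
            c: sum(self.weights[c].get(t, 0.0) for t in doc)
            for c in self.classes
        }
        return min(scores, key=scores.get)

def include(doc, question_classifiers):
    """OR-combination: keep an abstract if any per-question classifier
    votes 'relevant' (1) -- one of several plausible combination rules."""
    return any(clf.predict(doc) == 1 for clf in question_classifiers)
```

An OR rule favors recall over precision, which matches the screening setting described above: a missed relevant abstract is far more costly than an extra abstract passed on to a human reviewer.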


Subjects
Abstracting and Indexing; Artificial Intelligence; Bibliometrics; Data Mining; Databases, Bibliographic; Publications/classification; Systematic Reviews as Topic; Aged; Aged, 80 and over; Algorithms; Evidence-Based Medicine; Health Services for the Aged; Humans; Pattern Recognition, Automated; Workflow
4.
J Am Med Inform Assoc; 17(4): 446-53, 2010.
Article in English | MEDLINE | ID: mdl-20595313

ABSTRACT

OBJECTIVE: To determine whether a factorized version of the complement naïve Bayes (FCNB) classifier can reduce the time experts spend reviewing journal articles for inclusion in systematic reviews of drug class efficacy for disease treatment.
DESIGN: The proposed classifier was evaluated on a test collection built from 15 systematic drug class reviews used in previous work. The FCNB classifier was constructed to classify each article as containing high-quality, drug class-specific evidence or not. Weight engineering (WE) techniques were added to reduce underestimation of Medical Subject Headings (MeSH)-based and Publication Type (PubType)-based features. Cross-validation experiments were performed to evaluate the classifier's parameters and performance.
MEASUREMENTS: Work saved over sampling (WSS) at no less than 95% recall was used as the main measure of performance.
RESULTS: The minimum workload reduction for a single-topic systematic review, achieved with an FCNB/WE classifier, was 8.5%; the maximum was 62.2%, and the average over the 15 topics was 33.5%. This is 15.0% higher than the average workload reduction obtained using a voting-perceptron-based automated citation classification system.
CONCLUSION: The FCNB/WE classifier is simple, easy to implement, and reduces the workload significantly more than previously achieved. The results support its usefulness as a machine-learning algorithm for automating systematic reviews of drug class efficacy for disease treatment.
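The WSS measure used here has a standard closed form, WSS@R = (TN + FN)/N - (1 - R): the fraction of screening effort saved relative to random sampling while still retrieving at least a fraction R of the relevant articles. A minimal sketch (toy labels are illustrative):

```python
def wss(y_true, y_pred, recall_floor=0.95):
    """Work saved over sampling at a recall floor:
    WSS@R = (TN + FN) / N - (1 - R).
    y_true / y_pred: 1 = relevant / include, 0 = irrelevant / exclude.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    recall = tp / (tp + fn)
    if recall < recall_floor:
        raise ValueError("recall floor not met; WSS@R is undefined here")
    # (tn + fn) / N is the fraction of articles the screener never reads.
    return (tn + fn) / len(y_true) - (1 - recall_floor)
```

For example, a classifier that safely excludes 60 of 100 citations while keeping all 20 relevant ones yields WSS@95% = 0.60 - 0.05 = 0.55, i.e. 55% of the screening work saved.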


Subjects
Decision Support Techniques; Drug Therapy; Evidence-Based Medicine/classification; Information Storage and Retrieval/classification; Review Literature as Topic; Algorithms; Automation; Bayes Theorem; Efficiency; Humans