ABSTRACT
PURPOSE: Software tools are seeing increased use in three-dimensional treatment planning. However, these tools are frequently placed in clinical use without careful evaluation. This study demonstrates the application of a rigorous evaluation methodology, using blinded peer review, to an automated software tool that produces ICRU-50 planning target volumes (PTVs).

METHODS AND MATERIALS: Seven physicians involved in three-dimensional treatment planning, from three different institutions, participated in the evaluation. Four physicians drew partial PTVs on nine test cases, consisting of four nasopharynx and five lung primaries. Using the same information provided to the human experts, the computer tool generated PTVs for comparison. The remaining three physicians, designated evaluators, individually reviewed the PTVs for acceptability. To exclude bias, the evaluators were blinded to the source (human or computer) of the PTVs they reviewed. Their scores for the PTVs were statistically examined to determine whether the computer tool performed as well as the human experts.

RESULTS: The computer tool was as successful as the human experts in generating PTVs. Failures were primarily attributable to insufficient margins around the clinical target volume and to encroachment upon critical structures. In a qualitative analysis, the human and computer experts displayed similar types and distributions of errors.

CONCLUSIONS: Rigorous evaluation of computer-based radiotherapy tools requires comparison to current practice and can reveal areas for improvement before the tool enters clinical practice.
Subjects
Expert Systems , Radiotherapy Dosage , Radiotherapy Planning, Computer-Assisted/methods , Humans , Lung Neoplasms/radiotherapy , Nasopharyngeal Neoplasms/radiotherapy , Observer Variation , Regression Analysis , Reproducibility of Results

ABSTRACT
PURPOSE: Three-dimensional treatment planning depends upon exact and consistent delineation of target volumes. This study tested whether different physicians from different institutions vary significantly in their creation of planning target volumes (PTVs).

METHODS AND MATERIALS: Eight physicians from three different institutions created partial planning target volumes for nine clinical test cases. Their target volumes were evaluated qualitatively and quantitatively. Quantitative results were tested for significant differences.

RESULTS: Qualitative analysis showed the physicians to vary in (a) the margin placed around the clinical target volume, (b) the margin used near critical structures, and (c) handling of concavities in the clinical target volume. Quantitative analysis showed these variations to result in statistically significant differences in the measured volume of the physicians' planning target volumes.

CONCLUSIONS: Individual physicians and institutions differ significantly in their creation of planning target volumes, suggesting individual and institutional differences in the working definition of the PTV. Implications of this fact are discussed, along with areas where standardization can be improved.
Subjects
Radiation Oncology/standards , Radiotherapy Planning, Computer-Assisted/standards , Analysis of Variance , Humans , Radiotherapy Dosage

ABSTRACT
This paper reports the evaluation of an expert system whose output is a three-dimensional geometric solid. Evaluating such an output raises the problems of establishing a comparison standard, and of identifying and classifying deviations from that standard. Our evaluation design used a panel of physicians for the first task and a separate panel of expert judges for the second. We found that multi-parameter or multi-dimensional expert system outputs, such as this system's, may result in lower overall performance scores and greater variation in acceptability among different physicians. We surmise that these effects are a consequence of the larger number of factors that may be deemed unacceptable. The effects appear, however, to be equal for computer and human output. This evaluation design is thus applicable to other expert systems producing similarly complex output.
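As a concrete illustration of the quantitative comparison described in the second abstract (testing whether physicians differ significantly in the measured volumes of their PTVs), the following is a minimal sketch using one-way analysis of variance. The volume figures below are invented for illustration; the studies' actual measurements are not reproduced in these abstracts.

```python
# Sketch: one-way ANOVA testing whether measured PTV volumes (cm^3)
# differ significantly across physicians. All numbers are hypothetical.
from scipy.stats import f_oneway

# Each list holds one physician's PTV volumes over the same test cases
physician_a = [142.1, 98.4, 210.7, 175.3]
physician_b = [160.8, 120.2, 245.9, 199.6]
physician_c = [131.5, 90.1, 198.3, 168.0]

f_stat, p_value = f_oneway(physician_a, physician_b, physician_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("PTV volumes differ significantly across physicians")
```

Note that because each physician contoured the same nine cases, a repeated-measures design would be the stricter analysis; the one-way test above is shown only to make the idea of the significance test concrete.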