ABSTRACT
Background and purpose: Glioblastoma (GBM) patients have a dismal prognosis. Tumours typically recur within months of surgical resection and post-operative chemoradiation. Multiparametric magnetic resonance imaging (mpMRI) biomarkers promise to improve GBM outcomes by identifying likely regions of infiltrative tumour in tumour probability (TP) maps. These regions could be treated with an escalated dose via dose-painting radiotherapy to achieve higher rates of tumour control. Crucial to the technical validation of dose-painting based on imaging biomarkers is the repeatability of the derived dose prescriptions. Here, we quantify the repeatability of dose-painting prescriptions derived from mpMRI. Materials and methods: TP maps were calculated with a clinically validated model that linearly combined apparent diffusion coefficient (ADC) and relative cerebral blood volume (rBV) data, or ADC and relative cerebral blood flow (rBF) data. Maps were generated for 11 GBM patients who underwent two mpMRI scans a short interval apart before chemoradiation. A linear dose mapping function was applied to obtain dose-painting prescription (DP) maps for each session. Voxel-wise and group-wise repeatability metrics were calculated for parametric, TP, and DP maps within radiotherapy margins. Results: DP maps derived from mpMRI were repeatable between imaging sessions (ICC > 0.85). ADC maps were more repeatable than rBV and rBF maps (Wilcoxon test, p = 0.001). TP maps obtained from the combination of ADC and rBF were the most stable (median ICC: 0.89). Conclusions: Dose-painting prescriptions derived from an mpMRI model of tumour infiltration have a good level of repeatability and can be used to generate reliable dose-painting plans for GBM patients.
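The abstract states only that a linear dose mapping function converts TP maps into DP maps. A minimal sketch of such a mapping follows. It assumes a hypothetical prescription range of `d_min` = 60 Gy (at TP = 0) to `d_max` = 72 Gy (at TP = 1). These bounds and the function name are illustrative and are not taken from the study.

```python
def linear_dose_map(tp_values, d_min=60.0, d_max=72.0):
    """Map tumour probabilities in [0, 1] to doses in Gy by linear interpolation.

    d_min and d_max are hypothetical prescription bounds, not the study's values.
    """
    doses = []
    for p in tp_values:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"tumour probability out of range: {p}")
        # Linear interpolation: p = 0 -> d_min, p = 1 -> d_max
        doses.append(d_min + (d_max - d_min) * p)
    return doses


# TP values for four voxels inside the radiotherapy margin
print(linear_dose_map([0.0, 0.25, 0.5, 1.0]))  # [60.0, 63.0, 66.0, 72.0]
```

Because the mapping is linear, test-retest differences in TP are scaled by the constant factor `d_max - d_min`. Any TP variability therefore carries over directly into the dose prescription.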
ABSTRACT
Imaging biomarkers require technical, biological, and clinical validation to be translated into robust tools in research or clinical settings. This study contributes to the technical validation of radiomic features from magnetic resonance imaging (MRI) by evaluating the repeatability of features from four MR sequences: pre-contrast T1- and T2-weighted images, pre-contrast quantitative T1 maps (qT1), and contrast-enhanced T1-weighted images. Fifty-one patients with colorectal cancer liver metastases were scanned twice, up to 7 days apart. Repeatability was quantified using the intraclass correlation coefficient (ICC) and the repeatability coefficient (RC), and the impact of non-Gaussian feature distributions and image normalisation was evaluated. Most radiomic features had non-Gaussian distributions, but Box-Cox transformations enabled ICCs and RCs to be calculated appropriately for an average of 97% of features across sequences. ICCs ranged from 0.30 to 0.99, with volume and other shape features tending to be the most repeatable; volume ICC was > 0.98 for all sequences. Nineteen percent of features from non-normalised images exhibited significantly different ICCs in pair-wise sequence comparisons. Normalisation tended to increase ICCs for pre-contrast T1- and T2-weighted images and to decrease ICCs for qT1 maps. RCs tended to vary more between sequences than ICCs, showing that evaluations of feature performance depend on the chosen metric. This work suggests that feature-specific repeatability, for specific combinations of MR sequence and pre-processing steps, should be evaluated when selecting robust radiomic features as biomarkers for specific studies. In addition, because different repeatability metrics can provide different insights into a specific feature, the appropriate metric should be chosen in a study-specific context.
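The abstract does not state which ICC form was used. The sketch below assumes the one-way random-effects ICC(1,1) and the conventional RC = 1.96·√2·wSD (about 2.77 times the within-subject SD) to summarise test-retest pairs. Any Box-Cox transformation or image normalisation would be applied to the feature values before this step. The function name is hypothetical.

```python
import math


def icc_and_rc(pairs):
    """One-way random-effects ICC(1,1) and repeatability coefficient for
    test-retest data given as (scan1, scan2) tuples, one per patient."""
    n, k = len(pairs), 2
    subject_means = [(a + b) / k for a, b in pairs]
    grand_mean = sum(subject_means) / n
    # Between-subject mean square
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square (pooled variance of the repeated scans)
    ss_within = sum((a - m) ** 2 + (b - m) ** 2
                    for (a, b), m in zip(pairs, subject_means))
    ms_within = ss_within / (n * (k - 1))
    icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    wsd = math.sqrt(ms_within)        # within-subject standard deviation
    rc = 1.96 * math.sqrt(2) * wsd    # approximately 2.77 * wSD
    return icc, rc


icc, rc = icc_and_rc([(10, 12), (20, 18), (30, 32), (40, 38)])
print(round(icc, 3), round(rc, 2))
```

ICC is unitless and depends on the spread of values between patients. RC is expressed in the feature's own units. This difference is one reason the two metrics can rank features differently.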
ABSTRACT
Variance stabilization is an important step in the statistical assessment of quantitative imaging biomarkers. The objective of this study is to compare the Log and the Box-Cox transformations for variance stabilization in the context of assessing the performance of a particular quantitative imaging biomarker, the estimation of lung nodule volume from computed tomography images. First, a model is developed to generate and characterize repeated measurements typically observed in computed tomography lung nodule volume estimation. Given this model, we derive the parameter of the Box-Cox transformation that stabilizes the variance of the measurements across lung nodule volumes. Second, simulated, phantom, and clinical datasets are used to compare the Log and the Box-Cox transformations. Two metrics are used for quantifying the stability of the measurements across the transformed lung nodule volumes: the coefficient of variation for the standard deviation and the repeatability coefficient. The results for simulated, phantom, and clinical datasets show that the Box-Cox transformation generally had better variance stabilization performance compared to the Log transformation for lung nodule volume estimates from computed tomography scans.
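The abstract does not reproduce the paper's measurement model, so the following is an illustration rather than the authors' derivation. It uses a standard delta-method argument and assumes the SD of repeated volume estimates grows as a power of the true volume, SD = c·μ^α (`c` and `alpha` are hypothetical parameters). Since the Box-Cox derivative is x^(λ−1), the transformed SD scales as μ^(α+λ−1). It is therefore constant when λ = 1 − α. The log transformation (λ = 0) is optimal only when α = 1, that is, when the coefficient of variation is constant.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a positive measurement; lam == 0 gives log(x)."""
    if x <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam


def stabilising_lambda(alpha):
    """Delta-method lambda when the SD of repeated measurements grows as mean**alpha."""
    return 1 - alpha


def approx_sd_after_transform(mean, c, alpha, lam):
    """First-order SD on the transformed scale under the hypothetical error
    model SD = c * mean**alpha: SD * g'(mean) = c * mean**(alpha + lam - 1)."""
    return c * mean ** (alpha + lam - 1)


# With alpha = 0.5, lambda = 0.5 keeps the SD flat across volumes; log does not
for volume in (100.0, 900.0):
    print(volume,
          approx_sd_after_transform(volume, 0.1, 0.5, stabilising_lambda(0.5)),
          approx_sd_after_transform(volume, 0.1, 0.5, 0.0))
```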
Subjects
Lung Neoplasms , Solitary Pulmonary Nodule , Biomarkers , Humans , Lung/diagnostic imaging , Lung Neoplasms/diagnostic imaging , Phantoms, Imaging , Reproducibility of Results , Solitary Pulmonary Nodule/diagnostic imaging , Tomography, X-Ray Computed
ABSTRACT
RATIONALE AND OBJECTIVES: Quantitative imaging biomarkers (QIBs) are increasingly being adopted into clinical practice to monitor changes in patients' conditions. The repeatability coefficient (RC) is the clinical cut-point used to distinguish changes in a biomarker's measurements that are due to measurement error from changes that exceed measurement error and thus indicate real change in the patient. Imaging biomarkers have characteristics that make the repeatability coefficient difficult to estimate, including nonconstant error, non-Gaussian distributions, and measurement error that must be estimated from small studies. METHODS: We conducted a Monte Carlo simulation study to investigate how well three statistical methods for estimating the repeatability coefficient perform under five settings common for QIBs. RESULTS: When the measurement error is constant and replicates are normally distributed, all of the statistical methods perform well. When the measurement error is proportional to the true value, approaches that use the log transformation or the coefficient of variation perform similarly. For other common settings, none of the methods for estimating the repeatability coefficient performs adequately. CONCLUSION: Many of the common approaches to estimating the repeatability coefficient perform well in only a limited range of scenarios. The optimal approach depends strongly on the pattern of within-subject variability; a precision profile is therefore critical in evaluating the technical performance of QIBs. Asymmetric bounds for detecting regression vs progression can be implemented and should be used when clinically appropriate.
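The abstract compares three RC estimators without naming them. The sketch below illustrates why the bounds can be asymmetric. It assumes measurement error proportional to the true value, estimates the within-subject SD on the log scale from replicate pairs, and back-transforms the RC into percent-change limits. The function name and input format are hypothetical.

```python
import math


def log_scale_rc_bounds(pairs):
    """Asymmetric percent-change limits from replicate pairs, assuming
    measurement error proportional to the true value (constant CV)."""
    # Within-subject variance on the log scale from paired replicates: mean(d^2) / 2
    diffs = [math.log(b) - math.log(a) for a, b in pairs]
    wsd_log = math.sqrt(sum(d * d for d in diffs) / (2 * len(diffs)))
    rc_log = 1.96 * math.sqrt(2) * wsd_log
    lower = 100 * (math.exp(-rc_log) - 1)  # % decrease needed to call regression
    upper = 100 * (math.exp(rc_log) - 1)   # % increase needed to call progression
    return lower, upper


lower, upper = log_scale_rc_bounds([(100, 110), (50, 55), (80, 76)])
print(f"regression below {lower:.1f}%, progression above +{upper:.1f}%")
```

Because exp(x) − 1 > 1 − exp(−x) for x > 0, the increase needed to declare progression is always larger in magnitude than the decrease needed to declare regression.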
Subjects
Biomarkers/analysis , Diagnostic Imaging , Computer Simulation , Disease Progression , Humans , Monte Carlo Method , Reproducibility of Results
ABSTRACT
OBJECTIVE: To evaluate the frequency and adequacy of statistical analyses in a general radiology journal when reporting a reliability analysis for a diagnostic test. MATERIALS AND METHODS: Sixty-three studies of diagnostic test accuracy (DTA) and 36 studies reporting reliability analyses published in the Korean Journal of Radiology between 2012 and 2016 were analyzed. Studies were judged using the methodological guidelines of the Radiological Society of North America-Quantitative Imaging Biomarkers Alliance (RSNA-QIBA) and the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative. DTA studies were evaluated by nine editorial board members of the journal. Reliability studies were evaluated by study reviewers experienced with reliability analysis. RESULTS: Thirty-one (49.2%) of the 63 DTA studies did not include a reliability analysis when one was deemed necessary. Among the 36 reliability studies, proper statistical methods were used in all (5/5) studies dealing with dichotomous/nominal data, 46.7% (7/15) of studies dealing with ordinal data, and 95.2% (20/21) of studies dealing with continuous data. Statistical methods were described in sufficient detail regarding weighted kappa in 28.6% (2/7) of studies, and regarding the model and assumptions of the intraclass correlation coefficient in 35.3% (6/17) and 29.4% (5/17) of studies, respectively. Reliability parameters were used as if they were agreement parameters in 23.1% (3/13) of studies. The terms reproducibility and repeatability were used incorrectly in 20% (3/15) of studies. CONCLUSION: Greater attention to the importance of reporting reliability, thorough description of the related statistical methods, efforts not to neglect agreement parameters, and better use of relevant terminology are necessary.
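The abstract notes that weighted kappa was often under-described, and the weighting scheme is the detail a reader needs to reproduce it. The sketch below computes Cohen's weighted kappa and makes that choice explicit through linear or quadratic disagreement weights. The function and argument names are illustrative.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Cohen's weighted kappa for two raters on an ordinal scale.

    `categories` lists the ordered scale (at least two levels); `weights`
    selects linear or quadratic disagreement weights."""
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    k, n = len(categories), len(rater_a)
    index = {c: i for i, c in enumerate(categories)}
    # Observed joint proportions
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    # Marginal proportions for each rater
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    # Weighted observed disagreement vs disagreement expected by chance
    obs_dis = sum(w(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - obs_dis / exp_dis


print(weighted_kappa([1, 2, 3, 3, 2], [1, 3, 3, 2, 2], [1, 2, 3], weights="quadratic"))
```

On scales with three or more levels, linear and quadratic weights generally give different values. Reporting which one was used is therefore part of a sufficient description.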
Subjects
Peer Review, Research , User-Computer Interface , Diagnostic Tests, Routine , Humans , Reproducibility of Results
ABSTRACT
BACKGROUND: Current surgical treatment for adolescent idiopathic scoliosis (AIS) involves correction in both the coronal and sagittal planes, and thorough assessment of these parameters is essential for evaluating surgical results. However, various definitions of thoracic kyphosis (TK) have been proposed, and the intra- and inter-rater reproducibility of these measures has not been determined. The purpose of this study was therefore to determine the intra- and inter-rater reproducibility of several TK measurements used in the assessment of AIS. METHODS: Twenty patients (90% female) surgically treated for AIS with alternate-level pedicle screw fixation were included in the study. Three raters independently evaluated pre- and postoperative standing lateral plain radiographs. For each radiograph, several definitions of TK were measured, as well as L1-S1 and nonfixed lumbar lordosis. All variables were measured twice, 14 days apart, and a mixed effects model was used to determine the repeatability coefficient (RC), a measure of agreement between repeated measurements. The intra- and inter-rater intraclass correlation coefficients (ICCs) were also determined as measures of reliability. RESULTS: Preoperative median Cobb angle was 58° (range 41°-86°), and median surgical curve correction was 68% (range 49-87%). Overall intra-rater RC was highest for T2-T12 and nonfixed TK (11°) and lowest for T4-T12 and T5-T12 (8°). Inter-rater RC was highest for T1-T12, T1-nonfixed, and nonfixed TK (13°) and lowest for T5-T12 (9°). Agreement varied substantially between pre- and postoperative radiographs. Inter-rater ICC was highest for T4-T12 (0.92; 95% CI 0.88-0.95) and T5-T12 (0.92; 95% CI 0.88-0.95) and lowest for T1-nonfixed (0.80; 95% CI 0.72-0.88). CONCLUSIONS: Considerable variation was noted for all TK measurements. Intra- and inter-rater reproducibility was best for T4-T12 and T5-T12. Future studies should consider adopting a relevant minimum difference as the limit for true change in TK.
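The abstract reports inter-rater ICCs without naming the ICC model. A common choice when several raters measure the same radiographs is the two-way random-effects, absolute-agreement, single-rater ICC(2,1) of Shrout and Fleiss. The sketch below assumes that model, complete data, and a hypothetical function name.

```python
def icc_2_1(ratings):
    """Two-way random-effects, absolute-agreement, single-rater ICC(2,1).

    `ratings` is a list of rows, one per radiograph, each holding one value
    per rater (no missing values)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between radiographs
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_error = ss_total - ss_rows - ss_cols                 # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Three raters measuring TK (degrees) on four radiographs (illustrative values)
print(icc_2_1([[40, 42, 41], [25, 27, 24], [33, 35, 36], [48, 47, 50]]))
```

The rater mean square appears in the denominator, so a systematic offset between raters lowers this ICC. A consistency-type ICC would ignore that offset.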
ABSTRACT
BACKGROUND: Quantitative measurement procedures need to be accurate and precise to justify their clinical use. Precision reflects the deviation of groups of measurements from one another and is often expressed as proportions of agreement, standard errors of measurement, coefficients of variation, or the Bland-Altman plot. We suggest variance component analysis (VCA) to estimate the influence of errors due to single elements of a PET scan (scanner, time point, observer, etc.), to express the composite uncertainty of repeated measurements, and to obtain relevant repeatability coefficients (RCs), which have a unique relation to Bland-Altman plots. Here, we present this approach for the assessment of intra- and inter-observer variation with PET/CT, using data from two clinical studies as examples. METHODS: In study 1, 30 patients were scanned pre-operatively for the assessment of ovarian cancer, and their scans were assessed twice by the same observer to study intra-observer agreement. In study 2, 14 patients with glioma were scanned up to five times. The resulting 49 scans were assessed by three observers to examine inter-observer agreement. Outcome variables were SUVmax in study 1 and cerebral total hemispheric glycolysis (THG) in study 2. RESULTS: In study 1, the RC was 2.46, equal to half the width of the Bland-Altman limits of agreement. In study 2, the RC for identical conditions (same scanner, patient, time point, and observer) was 2392; allowing for different scanners increased the RC to 2543. Inter-observer differences were negligible compared with differences owing to other factors: -10 (95% CI: -352 to 332) between observers 1 and 2, and 28 (95% CI: -313 to 370) between observers 1 and 3. CONCLUSIONS: VCA is an appealing approach for weighing different sources of variation against each other, summarised as RCs. The linear mixed effects models involved require carefully considered sample sizes, because variance components are difficult to estimate with sufficient accuracy.
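The sketch below assumes the usual linear mixed-model reading rather than taking it from the paper. Under that reading, two measurements that differ in a set of random factors have a difference variance of twice the sum of those factors' variances, so RC = 1.96·√(2Σσ²). With no bias, this equals 1.96 times the SD of the paired differences, which is half the width of the Bland-Altman limits of agreement. The second function backs out the extra variance implied when the RC rises from one condition to another, here using the RCs reported for study 2. Function names are hypothetical.

```python
import math


def rc_from_components(variances):
    """Repeatability coefficient when two measurements differ in the listed
    random effects (each variance enters the difference twice)."""
    return 1.96 * math.sqrt(2 * sum(variances))


def component_from_rcs(rc_small, rc_large):
    """Variance of the additional factor that raises the RC from rc_small to
    rc_large, under the same RC = 1.96 * sqrt(2 * sum of variances) model."""
    return ((rc_large / 1.96) ** 2 - (rc_small / 1.96) ** 2) / 2


# Study 2: identical conditions (RC 2392) vs different scanners (RC 2543)
residual_var = component_from_rcs(0.0, 2392.0)
scanner_var = component_from_rcs(2392.0, 2543.0)
print(residual_var, scanner_var, rc_from_components([residual_var, scanner_var]))
```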
ABSTRACT
AIM: To compare the precision of the Topcon SP-3000P noncontact specular microscope (NCSM) and the DGH 500 ultrasound pachymeter (USP). METHODS: Triplicate measurements of central corneal thickness (CCT) were taken in 100 eyes with the NCSM and the USP at 2 visits 1 week apart. Repeatability was assessed from the differences among the 3 readings for each subject. Coefficients of repeatability and reproducibility were computed. RESULTS: Mean CCT measured with the NCSM was 518.53 ± 34.96 µm (range 417.33-592.67) in session 1 and 516.94 ± 33.60 µm (range 431.67-582.67) in session 2; with the USP, it was 546.69 ± 36.62 µm (range 457.33-617.00) in session 1 and 549.78 ± 35.26 µm (range 454.00-618.67) in session 2. USP CCT measurements were consistently higher than NCSM measurements in both sessions: by 28.17 ± 19.20 µm (mean ± SD) in session 1 and 32.81 ± 14.04 µm (mean ± SD) in session 2. The repeatability coefficient of the NCSM was better than that of the USP in both sessions (±10 µm vs ±12 µm in session 1; ±8 µm vs ±10 µm in session 2). The reproducibility coefficient of the NCSM was about half that of the USP (±21 µm vs ±41 µm). CONCLUSION: The SP-3000P NCSM is a more precise and reproducible instrument for measuring CCT than the USP, although both instruments are reliable and useful for measuring CCT.
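The abstract reports the USP-NCSM offset as the mean ± SD of the paired differences. The Bland-Altman sketch below works from such paired readings. It assumes one averaged CCT value per eye per instrument. The function name and example values are hypothetical.

```python
import statistics


def bland_altman(method_a, method_b):
    """Mean bias (b - a) and 95% limits of agreement for paired readings."""
    diffs = [b - a for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical CCT values (µm) for three eyes: NCSM first, USP second
bias, (low, high) = bland_altman([500, 510, 520], [528, 540, 546])
print(bias, low, high)
```

A consistent positive bias, as reported for the USP, shifts both limits upward. The width of the limits reflects only the scatter of the differences.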
ABSTRACT
The objective of this work was to estimate the repeatability of forage traits of Panicum and to determine the number of evaluation cuts needed to select Panicum genotypes reliably. Data were taken from a trial with 15 evaluation cuts conducted between 21/11/2002 and 08/04/2005 at the experimental field of Embrapa Gado de Leite in Valença, RJ, Brazil. Twenty-three genotypes of Panicum maximum were evaluated in experimental plots arranged in a randomized complete block design with three replications. Repeatability coefficients were estimated for fresh forage yield (PMV), forage dry matter yield (PMS), leaf dry matter yield (PMSF), leaf percentage in PMS (%FOL), and plant height (AP), using the analysis of variance, principal components, and structural analysis methods. For all traits, the effects of genotype, cut, and the genotype × cut interaction were significant (P < 0.01). For a coefficient of determination of 85%, the numbers of evaluations (cuts) required to determine the true value of the genotypes were 10, 9, 7, 11, and 3 for PMV, PMS, PMSF, %FOL, and AP, respectively. The principal components and structural analysis methods (based on the correlation matrix) agreed for all traits evaluated. Ten evaluation cuts are sufficient to discriminate the true value of Panicum genotypes with a reliability above 85% for most of the traits evaluated.
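The abstract does not give the formula linking repeatability to the number of cuts. The sketch below assumes the standard relationship used in plant-breeding repeatability studies, R² = ηr / [1 + (η − 1)r], solved for η = R²(1 − r) / [r(1 − R²)]. The repeatability value in the example is hypothetical, because the study's estimates are not reported in the abstract. Function names are illustrative.

```python
import math


def measurements_needed(r, r2_target):
    """Smallest number of cuts whose mean reaches r2_target, given repeatability r."""
    eta = r2_target * (1 - r) / (r * (1 - r2_target))
    # Small tolerance so floating-point noise does not push an exact integer up by one
    return math.ceil(eta - 1e-9)


def determination(r, n):
    """Coefficient of determination of the mean of n cuts (Spearman-Brown form)."""
    return n * r / (1 + (n - 1) * r)


# Hypothetical repeatability r = 0.5 with the 85% target used in the study
n = measurements_needed(0.5, 0.85)
print(n, determination(0.5, n))
```

Lower repeatability drives η up sharply as R² approaches 1. This is consistent with the abstract's observation that the traits need different numbers of cuts, from 3 for AP to 11 for %FOL.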