1.
Clin Exp Dermatol ; 47(9): 1658-1665, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35426450

ABSTRACT

BACKGROUND: Previous studies of second opinions in the diagnosis of melanocytic skin lesions have examined blinded second opinions, which do not reflect usual clinical practice. The current study, conducted in the USA, investigated both blinded and nonblinded second opinions for their impact on diagnostic accuracy. METHODS: In total, 100 melanocytic skin biopsy cases, ranging from benign to invasive melanoma, were interpreted by 74 dermatopathologists. Subsequently, 151 dermatopathologists performed nonblinded second and third reviews. We compared the accuracy of single reviewers, second opinions obtained from independent, blinded reviewers and second opinions obtained from sequential, nonblinded reviewers. Accuracy was defined with respect to a consensus reference diagnosis. RESULTS: The mean case-level diagnostic accuracy of single reviewers was 65.3% (95% CI 63.4-67.2%). Second opinions arising from sequential, nonblinded reviewers significantly improved accuracy to 69.9% (95% CI 68.0-71.7%; P < 0.001). Similarly, second opinions arising from blinded reviewers improved upon the accuracy of single reviewers (69.2%; 95% CI 68.0-71.7%). Nonblinded reviewers were more likely than blinded reviewers to give diagnoses in the same diagnostic classes as the first diagnosis. Nonblinded reviewers tended to be more confident when they agreed with previous reviewers, even with inaccurate diagnoses. CONCLUSION: We found that both blinded and nonblinded second reviewers offered a similar modest improvement in diagnostic accuracy compared with single reviewers. Obtaining second opinions with knowledge of previous reviews tends to generate agreement among reviews, and may generate unwarranted confidence in an inaccurate diagnosis. Combining aspects of both blinded and nonblinded review in practice may leverage the advantages while mitigating the disadvantages of each approach. 
Specifically, a second pathologist could give an initial diagnosis blinded to the results of the first pathologist, with subsequent nonblinded discussion between the two pathologists if their diagnoses differ.


Subject(s)
Melanoma , Skin Neoplasms , Humans , Melanocytes/pathology , Melanoma/diagnosis , Melanoma/pathology , Pathologists , Referral and Consultation , Skin Neoplasms/diagnosis , Skin Neoplasms/pathology
2.
Biom J ; 63(6): 1223-1240, 2021 08.
Article in English | MEDLINE | ID: mdl-33871887

ABSTRACT

Biomarkers abound in many areas of clinical research, and often investigators are interested in combining them for diagnosis, prognosis, or screening. In many applications, the true positive rate (TPR) for a biomarker combination at a prespecified, clinically acceptable false positive rate (FPR) is the most relevant measure of predictive capacity. We propose a distribution-free method for constructing biomarker combinations by maximizing the TPR while constraining the FPR. Theoretical results demonstrate desirable properties of biomarker combinations produced by the new method. In simulations, the biomarker combination provided by our method demonstrated improved operating characteristics in a variety of scenarios when compared with alternative methods for constructing biomarker combinations. Thus, use of our method could lead to the development of better biomarker combinations, increasing the likelihood of clinical adoption.
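The quantity targeted in the abstract above, the TPR of a combination at a capped FPR, can be sketched empirically as follows. This is only an illustration: the function names, the two-marker setup, and the brute-force scan over weight directions are assumptions made here, and the scan is a stand-in rather than the distribution-free estimator the authors propose.

```python
import math

def tpr_at_fpr(case_scores, control_scores, max_fpr):
    """Empirical TPR at a threshold whose empirical FPR is at most max_fpr.

    A subject is called positive when its score is strictly above the threshold.
    """
    ranked = sorted(control_scores, reverse=True)
    # Number of controls allowed above the threshold (epsilon guards float error).
    allowed = math.floor(max_fpr * len(ranked) + 1e-9)
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    return sum(s > threshold for s in case_scores) / len(case_scores)

def combine(weights, markers):
    """Linear combination of marker values (hypothetical helper)."""
    return sum(w * x for w, x in zip(weights, markers))

def grid_search_direction(cases, controls, max_fpr, steps=360):
    """Brute-force stand-in: scan unit-length weight vectors for two markers."""
    best_tpr, best_w = -1.0, None
    for i in range(steps):
        theta = 2 * math.pi * i / steps
        w = (math.cos(theta), math.sin(theta))
        tpr = tpr_at_fpr([combine(w, x) for x in cases],
                         [combine(w, x) for x in controls], max_fpr)
        if tpr > best_tpr:
            best_tpr, best_w = tpr, w
    return best_tpr, best_w
```

Because the scan includes the direction that uses the first marker alone, the returned TPR can never be lower than that single marker's TPR at the same FPR cap.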


Subject(s)
Mass Screening , Biomarkers , False Positive Reactions , Probability , Prognosis
3.
Cancer Epidemiol Biomarkers Prev ; 29(12): 2575-2582, 2020 12.
Article in English | MEDLINE | ID: mdl-33172885

ABSTRACT

The cancer early-detection biomarker field was, compared with the therapeutic arena, in its infancy when the Early Detection Research Network (EDRN) was initiated in 2000. The EDRN has played a crucial role in changing the culture and the ways people conduct biomarker studies. The EDRN proposed biomarker developmental guidelines and biomarker pivotal trial study design standards, created biomarker reference sets and functioned as an unbiased broker for the field, implemented the most rigorous blinding policy in the biomarker field, developed an array of statistical and computational tools for early-detection biomarker evaluations, and developed a multidisciplinary team-science approach. We reviewed these contributions made by the EDRN and their impacts on maturing the field. Future challenges and opportunities in cancer early-detection biomarker translational research are discussed, particularly in strengthening the biomarker discovery pipeline and conducting more efficient biomarker validation studies. See all articles in this CEBP Focus section, "NCI Early Detection Research Network: Making Cancer Detection Possible."


Subject(s)
Biomarkers, Tumor/metabolism , Biomedical Research/methods , Early Detection of Cancer , Humans
4.
Biometrics ; 76(3): 843-852, 2020 09.
Article in English | MEDLINE | ID: mdl-31732971

ABSTRACT

Referral strategies based on risk scores and medical tests are commonly proposed. Direct assessment of their clinical utility requires implementing the strategy and is not possible in the early phases of biomarker research. Prior to late-phase studies, net benefit measures can be used to assess the potential clinical impact of a proposed strategy. Validation studies, in which the biomarker defines a prespecified referral strategy, are a gold standard approach to evaluating biomarker potential. Uncertainty, quantified by a confidence interval, is important to consider when deciding whether a biomarker warrants an impact study, does not demonstrate clinical potential, or that more data are needed. We establish distribution theory for empirical estimators of net benefit and propose empirical estimators of variance. The primary results are for the most commonly employed estimators of net benefit: from cohort and unmatched case-control samples, and for point estimates and net benefit curves. Novel estimators of net benefit under stratified two-phase and categorically matched case-control sampling are proposed and distribution theory developed. Results for common variants of net benefit and for estimation from right-censored outcomes are also presented. We motivate and demonstrate the methodology with examples from lung cancer research and highlight its application to study design.
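As a rough illustration of the net benefit estimate discussed in the abstract above, the sketch below computes an empirical net benefit from a cohort sample. It assumes the common decision-curve definition (true positives minus false positives weighted by the odds of the risk threshold, per person), which the abstract does not spell out. The percentile bootstrap interval is a generic substitute used here for simplicity, not the variance estimators the paper derives.

```python
import random

def net_benefit(referred, diseased, threshold):
    """Empirical net benefit in a cohort: TP/n - (FP/n) * t / (1 - t)."""
    n = len(referred)
    tp = sum(1 for r, d in zip(referred, diseased) if r and d)
    fp = sum(1 for r, d in zip(referred, diseased) if r and not d)
    return tp / n - (fp / n) * threshold / (1 - threshold)

def bootstrap_ci(referred, diseased, threshold, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (a generic stand-in, see lead-in)."""
    rng = random.Random(seed)
    n = len(referred)
    stats = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(net_benefit([referred[i] for i in idx],
                                 [diseased[i] for i in idx], threshold))
    stats.sort()
    lower = stats[int((alpha / 2) * reps)]
    upper = stats[int((1 - alpha / 2) * reps) - 1]
    return lower, upper

# Hypothetical toy cohort: 1 = referred / diseased, 0 = not.
referred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
diseased = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
estimate = net_benefit(referred, diseased, threshold=0.2)
interval = bootstrap_ci(referred, diseased, threshold=0.2)
```

With a cohort this small the interval will be wide, which is the situation in which the abstract's "more data are needed" conclusion would apply.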


Subject(s)
Research Design , Biomarkers , Case-Control Studies , Humans , Uncertainty
5.
JAMA Netw Open ; 2(10): e1912597, 2019 10 02.
Article in English | MEDLINE | ID: mdl-31603483

ABSTRACT

Importance: Histopathologic criteria have limited diagnostic reliability for a range of cutaneous melanocytic lesions. Objective: To evaluate the association of second-opinion strategies by general pathologists and dermatopathologists with the overall reliability of diagnosis of difficult melanocytic lesions. Design, Setting, and Participants: This diagnostic study used samples from the Melanoma Pathology Study, which comprises 240 melanocytic lesion samples selected from a dermatopathology laboratory in Bellevue, Washington, and represents the full spectrum of lesions from common nevi to invasive melanoma. Five sets of 48 samples were evaluated independently by 187 US pathologists from July 15, 2013, through May 23, 2016. Data analysis was performed from April 2016 through November 2017. Main Outcomes and Measures: Accuracy of diagnosis, defined as concordance with an expert consensus diagnosis of 3 experienced pathologists, was assessed after applying 10 different second-opinion strategies. Results: Among the 187 US pathologists examining the 240 lesion samples, 113 were general pathologists (65 men [57.5%]; mean age at survey, 53.7 years [range, 33.0-79.0 years]) and 74 were dermatopathologists (49 men [66.2%]; mean age at survey, 46.4 years [range, 33.0-77.0 years]). Among the 8976 initial case interpretations, physicians desired second opinions for 3899 (43.4%), most often for interpretation of severely dysplastic nevi. The overall misclassification rate was highest when interpretations did not include second opinions and initial reviewers were all general pathologists lacking subspecialty training (52.8%; 95% CI, 51.3%-54.3%). When considering different second opinion strategies, the misclassification of melanocytic lesions was lowest when the first, second, and third consulting reviewers were subspecialty-trained dermatopathologists and when all lesions were subject to second opinions (36.7%; 95% CI, 33.1%-40.7%).
When the second opinion strategies were compared with single interpretations without second opinions, the reductions in misclassification rates for some of the strategies were statistically significant, but none of the strategies eliminated diagnostic misclassification. Melanocytic lesions in the middle of the diagnostic spectrum had the highest misclassification rates (eg, moderately or severely dysplastic nevus, Spitz nevus, melanoma in situ, and pathologic stage [p]T1a invasive melanoma). Variability of in situ and thin invasive melanoma was relatively intractable to all examined strategies. Conclusions and Relevance: The results of this study suggest that second opinions rendered by dermatopathologists improve reliability of melanocytic lesion diagnosis. However, discordance among pathologists remained high.


Subject(s)
Diagnostic Errors/statistics & numerical data , Melanoma/pathology , Pathologists/statistics & numerical data , Referral and Consultation , Skin Neoplasms/pathology , Adult , Aged , Clinical Competence , Dermatologists , Diagnostic Errors/prevention & control , Female , Humans , Male , Middle Aged , Pathologists/standards , Washington , Melanoma, Cutaneous Malignant
6.
JAMA Netw Open ; 1(1)2018 05.
Article in English | MEDLINE | ID: mdl-30556054

ABSTRACT

IMPORTANCE: The recently updated American Joint Committee on Cancer (AJCC) classification of cancer staging, the AJCC Cancer Staging Manual, 8th edition (AJCC 8), includes revisions to definitions of T1a vs T1b or greater. The Melanoma Pathology Study database affords a comparison of pathologists' concordance and reproducibility in the microstaging of melanoma according to both the existing 7th edition (AJCC 7) and the new AJCC 8. OBJECTIVE: To compare AJCC 7 and AJCC 8 to examine whether changes to the definitions of T1a and T1b or greater are associated with changes in concordance and reproducibility. DESIGN, SETTING, AND PARTICIPANTS: In this diagnostic study conducted as part of the national Melanoma Pathology Study across US states, 187 pathologists interpreting melanocytic skin lesions in practice completed 4342 independent case interpretations of 116 invasive melanoma cases. A consensus reference diagnosis and participating pathologists' interpretations were classified into the Melanocytic Pathology Assessment Tool and Hierarchy for Diagnosis class IV (T1a) or class V (≥T1b) using both the AJCC 7 and AJCC 8 criteria. MAIN OUTCOMES AND MEASURES: Concordance with consensus reference diagnosis, interobserver reproducibility, and intraobserver reproducibility. RESULTS: For T1a diagnoses, participating pathologists' concordance with the consensus reference diagnosis increased from 44% (95% CI, 41%-48%) to 54% (95% CI, 51%-57%) using AJCC 7 and AJCC 8 criteria, respectively. The concordance for cases of T1b or greater increased from 72% (95% CI, 69%-75%) to 78% (95% CI, 75%-80%). Intraobserver reproducibility of diagnoses also improved, increasing from 59% (95% CI, 56%-63%) to 64% (95% CI, 62%-67%) for T1a invasive melanoma, and from 74% (95% CI, 71%-76%) to 77% (95% CI, 74%-79%) for T1b or greater invasive melanoma cases. CONCLUSIONS AND RELEVANCE: Melanoma staging in AJCC 8 shows greater reproducibility and higher concordance with a reference standard.
Improved classification of invasive melanoma can be expected after implementation of AJCC 8, suggesting a positive impact on patients. However, despite improvement, concordance and reproducibility remain low.


Subject(s)
Melanoma/diagnosis , Neoplasm Staging/methods , Neoplasm Staging/standards , Consensus , Guidelines as Topic , Humans , Logistic Models , Pathologists , Reproducibility of Results , Societies, Medical , United States
7.
Cancer Epidemiol ; 56: 83-89, 2018 10.
Article in English | MEDLINE | ID: mdl-30099328

ABSTRACT

BACKGROUND: Biomarker candidates are often ranked using P-values. Standard P-value calculations use normal or logit-normal approximations, which may not be correct for small P-values and small sample sizes common in discovery research. METHODS: We compared exact P-values, correct by definition, with logit-normal approximations in a simulated study of 40 cases and 160 controls. The key measure of biomarker performance was sensitivity at 90% specificity. Data for 3000 uninformative false markers and 30 informative true markers were generated randomly. We also analyzed real data for 2371 plasma protein markers measured in 121 breast cancer cases and 121 controls. RESULTS: In our simulation, using the same discovery criterion, exact P-values led to discovery of 24 true and 82 false biomarkers, while logit-normal approximate P-values yielded 20 true and 106 false biomarkers. The estimated true discovery rate was substantially off for approximate P-values: logit-normal estimated 42 but found 20. The exact method estimated 22, very close to 24, which was the actual number of true discoveries. Although these results are based on one specific simulation, qualitatively similar results were obtained from 10 random repetitions. With real data, ranking candidate biomarkers by exact P-values, versus approximate P-values, resulted in a very different ordering of these markers. CONCLUSIONS: Exact P-values, which correspond to permutation tests with non-parametric rank statistics such as empirical ROC statistics, are preferred over approximate P-values. Approximate P-values can lead to inappropriate biomarker selection rules and incorrect conclusions. IMPACT: Exact P-values in place of approximate P-values in discovery research may improve the yield of biomarkers that validate clinically.
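The link the abstract above draws between exact P-values and permutation tests can be illustrated with a small sketch. The statistic is the empirical sensitivity at 90% specificity, and the exact P-value is obtained by enumerating every way of relabelling the pooled subjects as cases. Full enumeration is only feasible for very small samples, and this brute-force approach is an assumption made for illustration, not necessarily how the authors computed their P-values.

```python
import math
from itertools import combinations

def sens_at_spec(cases, controls, spec=0.9):
    """Empirical sensitivity at a threshold giving at least `spec` specificity."""
    ranked = sorted(controls, reverse=True)
    # Controls allowed above the threshold (epsilon guards float error).
    allowed = math.floor((1 - spec) * len(ranked) + 1e-9)
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    return sum(c > threshold for c in cases) / len(cases)

def exact_permutation_p(cases, controls, spec=0.9):
    """One-sided exact P-value: share of relabellings at least as extreme."""
    pooled = list(cases) + list(controls)
    observed = sens_at_spec(cases, controls, spec)
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(cases)):
        chosen = set(idx)
        perm_cases = [pooled[i] for i in chosen]
        perm_controls = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if sens_at_spec(perm_cases, perm_controls, spec) >= observed - 1e-12:
            hits += 1
    return hits / total

# Hypothetical marker values: 3 cases, 10 controls.
p_value = exact_permutation_p([9, 8, 7], [1, 2, 3, 4, 5, 6, 0.5, 0.4, 0.3, 0.2])
```

With 3 cases and 10 controls there are only 286 relabellings, so the smallest attainable P-value is coarse. A normal approximation can misrepresent that discreteness, which is the concern the abstract raises.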


Subject(s)
Biomarkers/analysis , Computational Biology/methods , Computational Biology/standards , Data Interpretation, Statistical , Models, Statistical , Humans
8.
J Am Acad Dermatol ; 79(1): 52-59.e5, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29524584

ABSTRACT

BACKGROUND: Diagnostic interpretations of melanocytic skin lesions vary widely among pathologists, yet the underlying reasons remain unclear. OBJECTIVE: Identify pathologist characteristics associated with rates of accuracy and reproducibility. METHODS: Pathologists independently interpreted the same set of biopsy specimens from melanocytic lesions on 2 occasions. Diagnoses were categorized into 1 of 5 classes according to the Melanocytic Pathology Assessment Tool and Hierarchy for Diagnosis system. Reproducibility was determined by pathologists' concordance of diagnoses across 2 occasions. Accuracy was defined by concordance with a consensus reference standard. Associations of pathologist characteristics with reproducibility and accuracy were assessed individually and in multivariable logistic regression models. RESULTS: Rates of diagnostic reproducibility and accuracy were highest among pathologists with board certification and/or fellowship training in dermatopathology and in those with 5 or more years of experience. In addition, accuracy was high among pathologists with a higher proportion of melanocytic lesions in their caseload composition and higher volume of melanocytic lesions. LIMITATIONS: Data gathered in a test set situation by using a classification tool not currently in clinical use. CONCLUSION: Diagnoses are more accurate among pathologists with specialty training and those with more experience interpreting melanocytic lesions. These findings support the practice of referring difficult cases to more experienced pathologists to improve diagnostic accuracy, although the impact of these referrals on patient outcomes requires additional research.


Subject(s)
Melanoma/pathology , Pathologists , Pathology, Clinical/standards , Skin Neoplasms/pathology , Biopsy, Needle , Clinical Competence , Consensus , Delphi Technique , Female , Humans , Male , Observer Variation , Melanoma, Cutaneous Malignant
9.
BMJ ; 357: j2813, 2017 Jun 28.
Article in English | MEDLINE | ID: mdl-28659278

ABSTRACT

Objective To quantify the accuracy and reproducibility of pathologists' diagnoses of melanocytic skin lesions. Design Observer accuracy and reproducibility study. Setting 10 US states. Participants Skin biopsy cases (n=240), grouped into sets of 36 or 48. Pathologists from 10 US states were randomized to independently interpret the same set on two occasions (phases 1 and 2), at least eight months apart. Main outcome measures Pathologists' interpretations were condensed into five classes: I (eg, nevus or mild atypia); II (eg, moderate atypia); III (eg, severe atypia or melanoma in situ); IV (eg, pathologic stage T1a (pT1a) early invasive melanoma); and V (eg, ≥pT1b invasive melanoma). Reproducibility was assessed by intraobserver and interobserver concordance rates, and accuracy by concordance with three reference diagnoses. Results In phase 1, 187 pathologists completed 8976 independent case interpretations resulting in an average of 10 (SD 4) different diagnostic terms applied to each case. Among pathologists interpreting the same cases in both phases, when pathologists diagnosed a case as class I or class V during phase 1, they gave the same diagnosis in phase 2 for the majority of cases (class I 76.7%; class V 82.6%). However, the intraobserver reproducibility was lower for cases interpreted as class II (35.2%), class III (59.5%), and class IV (63.2%). Average interobserver concordance rates were lower, but with similar trends. Accuracy using a consensus diagnosis of experienced pathologists as reference varied by class: I, 92% (95% confidence interval 90% to 94%); II, 25% (22% to 28%); III, 40% (37% to 44%); IV, 43% (39% to 46%); and V, 72% (69% to 75%).
It is estimated that at a population level, 82.8% (81.0% to 84.5%) of melanocytic skin biopsy diagnoses would have their diagnosis verified if reviewed by a consensus reference panel of experienced pathologists, with 8.0% (6.2% to 9.9%) of cases overinterpreted by the initial pathologist and 9.2% (8.8% to 9.6%) underinterpreted. Conclusion Diagnoses spanning moderately dysplastic nevi to early stage invasive melanoma were neither reproducible nor accurate in this large study of pathologists in the USA. Efforts to improve clinical practice should include using a standardized classification system, acknowledging uncertainty in pathology reports, and developing tools such as molecular markers to support pathologists' visual assessments.


Subject(s)
Clinical Competence/statistics & numerical data , Melanoma/diagnosis , Nevus, Pigmented/diagnosis , Pathology, Clinical/standards , Skin Neoplasms/diagnosis , Adult , Biopsy , Diagnosis, Differential , Diagnostic Errors , Humans , Middle Aged , Observer Variation , Reproducibility of Results , United States , Melanoma, Cutaneous Malignant
10.
Eur J Cancer ; 80: 39-47, 2017 07.
Article in English | MEDLINE | ID: mdl-28535496

ABSTRACT

BACKGROUND: Diagnostic agreement among pathologists is 84% for ductal carcinoma in situ (DCIS). Studies of interpretive variation according to grade are limited. METHODS: A national sample of 115 pathologists interpreted 240 breast pathology test set cases in the Breast Pathology Study and their interpretations were compared to expert consensus interpretations. We assessed agreement of pathologists' interpretations with a consensus reference diagnosis of DCIS dichotomised into low- and high-grade lesions. Generalised estimating equations were used in logistic regression models of rates of under- and over-interpretation of DCIS by grade. RESULTS: We evaluated 2097 independent interpretations of DCIS (512 low-grade DCIS and 1585 high-grade DCIS). Agreement with reference diagnoses was 46% (95% confidence interval [CI] 42-51) for low-grade DCIS and 83% (95% CI 81-86) for high-grade DCIS. The proportion of reference low-grade DCIS interpretations over-interpreted by pathologists (i.e. categorised as either high-grade DCIS or invasive cancer) was 23% (95% CI 19-28); 30% (95% CI 26-34) were interpreted as a lower diagnostic category (atypia or benign proliferative). Reference high-grade DCIS was under-interpreted in 14% (95% CI 12-16) of observations and only over-interpreted 3% (95% CI 2-4). CONCLUSION: Grade is a major factor when examining pathologists' variability in diagnosing DCIS, with much lower agreement for low-grade DCIS cases compared to high-grade. These findings support the hypothesis that low-grade DCIS poses a greater interpretive challenge than high-grade DCIS, which should be considered when developing DCIS management strategies.


Subject(s)
Breast Neoplasms/pathology , Carcinoma, Ductal, Breast/pathology , Carcinoma, Intraductal, Noninfiltrating/pathology , Adult , Aged , Biopsy , Breast Neoplasms/diagnosis , Carcinoma, Ductal, Breast/diagnosis , Carcinoma, Intraductal, Noninfiltrating/diagnosis , Clinical Competence , Female , Humans , Logistic Models , Middle Aged , Neoplasm Staging , Observer Variation , Pathology, Clinical/standards
11.
J Pathol Inform ; 8: 12, 2017.
Article in English | MEDLINE | ID: mdl-28382226

ABSTRACT

BACKGROUND: Digital whole slide imaging may be useful for obtaining second opinions and is used in many countries. However, the U.S. Food and Drug Administration requires verification studies. METHODS: Pathologists were randomized to interpret one of four sets of breast biopsy cases during two phases, separated by ≥9 months, using glass slides or digital format (sixty cases per set, one slide per case, n = 240 cases). Accuracy was assessed by comparing interpretations to a consensus reference standard. Intraobserver reproducibility was assessed by comparing the agreement of interpretations on the same cases between two phases. Estimated probabilities of confirmation by a reference panel (i.e., predictive values) were obtained by incorporating data on the population prevalence of diagnoses. RESULTS: Sixty-five percent of responding pathologists were eligible, and 252 consented to randomization; 208 completed Phase I (115 glass, 93 digital); and 172 completed Phase II (86 glass, 86 digital). Accuracy was slightly higher using glass compared to digital format and varied by category: invasive carcinoma, 96% versus 93% (P = 0.04); ductal carcinoma in situ (DCIS), 84% versus 79% (P < 0.01); atypia, 48% versus 43% (P = 0.08); and benign without atypia, 87% versus 82% (P < 0.01). There was a small decrease in intraobserver agreement when the format changed compared to when glass slides were used in both phases (P = 0.08). Predictive values for confirmation by a reference panel using glass versus digital were: invasive carcinoma, 98% and 97% (not significant [NS]); DCIS, 70% and 57% (P = 0.007); atypia, 38% and 28% (P = 0.002); and benign without atypia, 97% and 96% (NS). CONCLUSIONS: In this large randomized study, digital format interpretations were similar to glass slide interpretations of benign and invasive cancer cases. However, cases in the middle of the spectrum, where more inherent variability exists, may be more problematic in digital format. 
Future studies evaluating the effect these findings exert on clinical practice and patient outcomes are required.

12.
Ann Surg Oncol ; 24(5): 1234-1241, 2017 May.
Article in English | MEDLINE | ID: mdl-27913946

ABSTRACT

BACKGROUND: Surgeons may receive a different diagnosis when a breast biopsy is interpreted by a second pathologist. The extent to which diagnostic agreement by the same pathologist varies at two time points is unknown. METHODS: Pathologists from eight U.S. states independently interpreted 60 breast specimens, one glass slide per case, on two occasions separated by ≥9 months. Reproducibility was assessed by comparing interpretations between the two time points; associations between reproducibility (intraobserver agreement rates); and characteristics of pathologists and cases were determined and also compared with interobserver agreement of baseline interpretations. RESULTS: Sixty-five percent of invited, responding pathologists were eligible and consented; 49 interpreted glass slides in both study phases, resulting in 2940 interpretations. Intraobserver agreement rates between the two phases were 92% [95% confidence interval (CI) 88-95] for invasive breast cancer, 84% (95% CI 81-87) for ductal carcinoma-in-situ, 53% (95% CI 47-59) for atypia, and 84% (95% CI 81-86) for benign without atypia. When comparing all study participants' case interpretations at baseline, interobserver agreement rates were 89% (95% CI 84-92) for invasive cancer, 79% (95% CI 76-81) for ductal carcinoma-in-situ, 43% (95% CI 41-45) for atypia, and 77% (95% CI 74-79) for benign without atypia. CONCLUSIONS: Interpretive agreement between two time points by the same individual pathologist was low for atypia and was similar to observed rates of agreement for atypia between different pathologists. Physicians and patients should be aware of the diagnostic challenges associated with a breast biopsy diagnosis of atypia when considering treatment and surveillance decisions.


Subject(s)
Breast Neoplasms/pathology , Breast/pathology , Carcinoma, Ductal, Breast/pathology , Carcinoma, Intraductal, Noninfiltrating/pathology , Pathologists , Adult , Biopsy , Breast Density , Clinical Competence , Female , Humans , Middle Aged , Observer Variation , Reproducibility of Results , Time Factors , United States
13.
BMJ ; 353: i3069, 2016 Jun 22.
Article in English | MEDLINE | ID: mdl-27334105

ABSTRACT

OBJECTIVE:  To evaluate the potential effect of second opinions on improving the accuracy of diagnostic interpretation of breast histopathology. DESIGN:  Simulation study. SETTING:  12 different strategies for acquiring independent second opinions. PARTICIPANTS:  Interpretations of 240 breast biopsy specimens by 115 pathologists, one slide for each case, compared with reference diagnoses derived by expert consensus. MAIN OUTCOME MEASURES:  Misclassification rates for individual pathologists and for 12 simulated strategies for second opinions. Simulations compared accuracy of diagnoses from single pathologists with that of diagnoses based on pairing interpretations from first and second independent pathologists, where resolution of disagreements was by an independent third pathologist. 12 strategies were evaluated in which acquisition of second opinions depended on initial diagnoses, assessment of case difficulty or borderline characteristics, pathologists' clinical volumes, or whether a second opinion was required by policy or desired by the pathologists. The 240 cases included benign without atypia (10% non-proliferative, 20% proliferative without atypia), atypia (30%), ductal carcinoma in situ (DCIS, 30%), and invasive cancer (10%). Overall misclassification rates and agreement statistics depended on the composition of the test set, which included a higher prevalence of difficult cases than in typical practice. RESULTS:  Misclassification rates significantly decreased (P<0.001) with all second opinion strategies except for the strategy limiting second opinions only to cases of invasive cancer. The overall misclassification rate decreased from 24.7% to 18.1% when all cases received second opinions (P<0.001). Obtaining both first and second opinions from pathologists with a high volume (≥10 breast biopsy specimens weekly) resulted in the lowest misclassification rate in this test set (14.3%, 95% confidence interval 10.9% to 18.0%). 
Obtaining second opinions only for cases with initial interpretations of atypia, DCIS, or invasive cancer decreased the over-interpretation of benign cases without atypia from 12.9% to 6.0%. Atypia cases had the highest misclassification rate after single interpretation (52.2%), remaining at more than 34% in all second opinion scenarios. CONCLUSION:  Second opinions can statistically significantly improve diagnostic agreement for pathologists' interpretations of breast biopsy specimens; however, variability in diagnosis will not be completely eliminated, especially for breast specimens with atypia.
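The pairing rule described in the abstract above (first and second independent interpretations, with an independent third pathologist resolving disagreements) can be sketched as below. The function names and the four toy cases are hypothetical, and the sketch covers only the "all cases receive a second opinion" strategy, not the study's 12 strategies or its data.

```python
def resolve(first, second, third):
    """Keep the shared diagnosis if the first two agree, else defer to the third."""
    return first if first == second else third

def misclassification_rate(diagnoses, reference):
    """Share of cases whose diagnosis differs from the reference diagnosis."""
    wrong = sum(d != r for d, r in zip(diagnoses, reference))
    return wrong / len(reference)

# Hypothetical toy data, one entry per case.
reference = ["atypia", "DCIS", "benign", "invasive"]
first = ["benign", "DCIS", "benign", "invasive"]
second = ["atypia", "DCIS", "atypia", "invasive"]
third = ["atypia", "atypia", "benign", "DCIS"]

final = [resolve(a, b, c) for a, b, c in zip(first, second, third)]
single_rate = misclassification_rate(first, reference)
second_opinion_rate = misclassification_rate(final, reference)
```

The third interpretation is only consulted where the first two disagree, so its errors on agreed cases (cases 2 and 4 here) do not affect the final diagnosis.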


Subject(s)
Breast Neoplasms/pathology , Breast/pathology , Carcinoma, Ductal, Breast/pathology , Carcinoma, Intraductal, Noninfiltrating/pathology , Diagnosis, Differential , Diagnostic Errors/statistics & numerical data , Referral and Consultation , Adult , Biopsy , Breast Neoplasms/diagnosis , Carcinoma, Ductal, Breast/diagnosis , Carcinoma, Intraductal, Noninfiltrating/diagnosis , Consensus , Female , Humans , Male , Middle Aged , Observer Variation , Random Allocation
14.
Ann Intern Med ; 164(10): 649-55, 2016 05 17.
Article in English | MEDLINE | ID: mdl-26999810

ABSTRACT

BACKGROUND: The effect of physician diagnostic variability on accuracy at a population level depends on the prevalence of diagnoses. OBJECTIVE: To estimate how diagnostic variability affects accuracy from the perspective of a U.S. woman aged 50 to 59 years having a breast biopsy. DESIGN: Applied probability using Bayes' theorem. SETTING: B-Path (Breast Pathology) Study comparing pathologists' interpretations of a single biopsy slide versus a reference consensus interpretation from 3 experts. PARTICIPANTS: 115 practicing pathologists (6900 total interpretations from 240 distinct cases). MEASUREMENTS: A single representative slide from each of the 240 cases was used to estimate the proportion of biopsies with a diagnosis that would be verified if the same slide were interpreted by a reference group of 3 expert pathologists. Probabilities of confirmation (predictive values) were estimated using B-Path Study results and prevalence of biopsy diagnoses for women aged 50 to 59 years in the Breast Cancer Surveillance Consortium. RESULTS: Overall, if 1 representative slide were used per case, 92.3% (95% CI, 91.4% to 93.1%) of breast biopsy diagnoses would be verified by reference consensus diagnoses, with 4.6% (CI, 3.9% to 5.3%) overinterpreted and 3.2% (CI, 2.7% to 3.6%) underinterpreted. Verification of invasive breast cancer and benign without atypia diagnoses is highly probable; estimated predictive values were 97.7% (CI, 96.5% to 98.7%) and 97.1% (CI, 96.7% to 97.4%), respectively. Verification is less probable for atypia (53.6% overinterpreted and 8.6% underinterpreted) and ductal carcinoma in situ (DCIS) (18.5% overinterpreted and 11.8% underinterpreted). LIMITATIONS: Estimates are based on a testing situation with 1 slide used per case and without access to second opinions. Population-adjusted estimates may differ for women from other age groups, unscreened women, or women in different practice settings. 
CONCLUSION: This analysis, based on interpretation of a single breast biopsy slide per case, predicts a low likelihood that a diagnosis of atypia or DCIS would be verified by a reference consensus diagnosis. This diagnostic grey zone should be considered in clinical management decisions in patients with these diagnoses. PRIMARY FUNDING SOURCE: National Cancer Institute.
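The "applied probability using Bayes' theorem" step in the abstract above can be sketched as follows: per-category accuracy from a test set is combined with population prevalence to give the probability that a reference panel would confirm a diagnosis. The two-category inputs are hypothetical placeholders chosen for easy arithmetic, not the B-Path or Breast Cancer Surveillance Consortium figures.

```python
def predictive_values(p_interp_given_ref, prevalence):
    """Return P(reference = r | interpretation = i) via Bayes' theorem.

    p_interp_given_ref[r][i] is P(interpretation = i | reference = r);
    prevalence[r] is P(reference = r) in the target population.
    """
    categories = list(prevalence)
    result = {}
    for i in categories:
        joint = {r: p_interp_given_ref[r][i] * prevalence[r] for r in categories}
        total = sum(joint.values())
        result[i] = {r: joint[r] / total for r in categories}
    return result

# Hypothetical two-category inputs, NOT the study's estimates.
accuracy = {
    "benign": {"benign": 0.9, "cancer": 0.1},
    "cancer": {"benign": 0.2, "cancer": 0.8},
}
prevalence = {"benign": 0.9, "cancer": 0.1}
confirmed = predictive_values(accuracy, prevalence)
```

In this toy setting a "cancer" interpretation is confirmed with probability 0.08 / 0.17, under one half, even though 80% of reference cancers are read correctly. The low prevalence drives the result, which is why the abstract stresses that population-level accuracy depends on the prevalence of diagnoses.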


Subject(s)
Biopsy , Breast Neoplasms/diagnosis , Clinical Competence , Pathologists/standards , Bayes Theorem , Breast Carcinoma In Situ/diagnosis , Carcinoma, Ductal, Breast/diagnosis , Female , Humans , Middle Aged , Reference Standards
15.
Clin Chem ; 62(5): 737-42, 2016 05.
Article in English | MEDLINE | ID: mdl-27001493

ABSTRACT

BACKGROUND: Many cancer biomarker research studies seek to develop markers that can accurately detect or predict future onset of disease. To design and evaluate these studies, one must specify the levels of accuracy sought. However, justified target levels are rarely available. METHODS: We describe a way to calculate target levels of sensitivity and specificity for a biomarker intended to be applied in a defined clinical context. The calculation requires knowledge of the prevalence or incidence of cases in the clinical population and the ratio of benefit associated with the clinical consequences of a positive biomarker test in cases (true positive) to cost associated with a positive biomarker test in controls (false positive). Guidance is offered on soliciting the cost/benefit ratio. The calculations are based on the longstanding decision theory concept of providing a net benefit on average in the population, and they rely on some assumptions about uniformity of costs and benefits to those tested. RESULTS: Calculations are illustrated with 3 applications: predicting colon cancer recurrence in stage 1 patients; predicting interval breast cancer (between mammography screenings); and screening for ovarian cancer. CONCLUSIONS: It is feasible to specify target levels of biomarker performance that enable evaluation of the potential clinical impact of biomarkers in early-phase studies. Nevertheless, biomarkers meeting the criteria should still be tested rigorously in studies that measure the actual impact on patient outcomes of using the biomarker to make clinical decisions.
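One way to read the calculation described in the abstract above is as a break-even condition on expected net benefit. With prevalence p, benefit B per true positive and cost C per false positive, the average net benefit p * TPR * B - (1 - p) * FPR * C is non-negative when TPR / FPR >= ((1 - p) / p) / (B / C). The sketch below encodes that inequality. It is derived here from the abstract's wording and is an assumption, not a formula quoted from the paper, and the example inputs are hypothetical.

```python
def min_tpr_for_fpr(fpr, prevalence, benefit_cost_ratio):
    """Smallest TPR giving non-negative expected net benefit at a given FPR."""
    return fpr * (1 - prevalence) / (prevalence * benefit_cost_ratio)

def max_fpr_for_tpr(tpr, prevalence, benefit_cost_ratio):
    """Largest FPR still giving non-negative expected net benefit at a given TPR."""
    return tpr * prevalence * benefit_cost_ratio / (1 - prevalence)

# Hypothetical inputs: 1% prevalence, benefit judged 10 times the cost.
target_tpr = min_tpr_for_fpr(fpr=0.05, prevalence=0.01, benefit_cost_ratio=10)
```

Under these inputs a marker with a 5% false positive rate would need a sensitivity of about 49.5% to break even. A returned value above 1 would mean no sensitivity is high enough at that false positive rate.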


Subject(s)
Biomarkers, Tumor/analysis , Breast Neoplasms/diagnosis , Colonic Neoplasms/diagnosis , Ovarian Neoplasms/diagnosis , Aged , Female , Humans , Middle Aged , Sensitivity and Specificity
16.
Stat Biosci ; 7(2): 282-295, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26504496

ABSTRACT

The Net Reclassification Index (NRI) is a very popular measure for evaluating the improvement in prediction performance gained by adding a marker to a set of baseline predictors. However, the statistical properties of this novel measure have not been explored in depth. We demonstrate the alarming result that the NRI statistic calculated on a large test dataset using risk models derived from a training set is likely to be positive even when the new marker has no predictive information. A related theoretical example is provided in which an incorrect risk function that includes an uninformative marker is proven to erroneously yield a positive NRI. Some insight into this phenomenon is provided. Since large values for the NRI statistic may simply be due to use of poorly fitting risk models, we suggest caution in using the NRI as the basis for marker evaluation. Other measures of prediction performance improvement, such as measures derived from the ROC curve, the net benefit function and the Brier score, cannot be made large merely by use of poorly fitting risk functions.
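For readers unfamiliar with the statistic, a minimal sketch of the category-free form of the NRI follows. The abstract does not specify which NRI variant was studied, so the choice of the category-free version, the function name and the input layout are assumptions made for illustration only.

```python
def category_free_nri(old_risk, new_risk, outcome):
    """Category-free NRI for paired risk predictions.

    old_risk, new_risk: predicted risks from the baseline and expanded models.
    outcome: 1 for an event, 0 for a non-event.
    NRI = [P(up | event) - P(down | event)]
        + [P(down | non-event) - P(up | non-event)]
    """
    events_up = events_down = n_events = 0
    nonevents_up = nonevents_down = n_nonevents = 0
    for old, new, y in zip(old_risk, new_risk, outcome):
        if y == 1:
            n_events += 1
            events_up += new > old
            events_down += new < old
        else:
            n_nonevents += 1
            nonevents_up += new > old
            nonevents_down += new < old
    if n_events == 0 or n_nonevents == 0:
        raise ValueError("need at least one event and one non-event")
    event_nri = (events_up - events_down) / n_events
    nonevent_nri = (nonevents_down - nonevents_up) / n_nonevents
    return event_nri + nonevent_nri


if __name__ == "__main__":
    # Toy data: the new model moves every risk in the correct direction.
    old = [0.2, 0.2, 0.2, 0.2]
    new = [0.3, 0.3, 0.1, 0.1]
    y = [1, 1, 0, 0]
    print(category_free_nri(old, new, y))
```

Because the statistic only counts the direction of each change in predicted risk, any systematic shift between two fitted models moves it, which is consistent with the caution the abstract recommends.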

18.
J Natl Cancer Inst ; 107(8), 2015 Aug.
Article in English | MEDLINE | ID: mdl-26109106

ABSTRACT

Developing biomarkers that can predict whether patients are likely to benefit from an intervention is a pressing objective in many areas of medicine. Recent guidance documents have recommended that the accuracy of predictive biomarkers, ie, sensitivity, specificity, and positive and negative predictive values, should be assessed. We clarify the meanings of these entities for predictive markers and demonstrate that generally they cannot be estimated from data without making strong untestable assumptions. Language suggesting that predictive biomarkers can identify patients who benefit from an intervention is also widespread. We show that in general one cannot estimate the chance that a patient will benefit from treatment. We recommend instead that predictive biomarkers be evaluated with respect to their ability to predict clinical outcomes among patients treated and among patients receiving standard of care, and the population impact of treatment rules based on those predictions. Ideally these entities are estimated from a randomized trial comparing the experimental intervention with standard of care.


Subject(s)
Biomarkers, Tumor , Molecular Targeted Therapy , Predictive Value of Tests , Sensitivity and Specificity , Antineoplastic Agents/pharmacology , Concept Formation , Humans , Molecular Targeted Therapy/methods , Randomized Controlled Trials as Topic , Reproducibility of Results , Treatment Outcome
19.
Stat Med ; 34(27): 3503-15, 2015 Nov 30.
Article in English | MEDLINE | ID: mdl-26112650

ABSTRACT

Biomarkers that predict the efficacy of treatment can potentially improve clinical outcomes and decrease medical costs by allowing treatment to be provided only to those most likely to benefit. We consider the design of a randomized clinical trial in which one objective is to evaluate a treatment selection marker. The marker may be measured prospectively or retrospectively using samples collected at baseline. We describe and contrast criteria around which the trial can be designed. An existing approach focuses on determining if there is a statistical interaction between the marker and treatment. We propose three alternative approaches based on estimating clinically relevant measures of improvement in outcomes with use of the marker. Importantly, our approaches accommodate the common scenario in which the marker-based rule for recommending treatment is developed with data from the trial. Sample sizes are calculated for powering a trial to assess these criteria in the context of adjuvant chemotherapy for the treatment of estrogen-receptor-positive, node-positive breast cancer. In this example, we find that larger sample sizes are generally required for assessing clinical impact than for simply evaluating if there is a statistical interaction between marker and treatment. We also find that retrospectively selecting a case-control subset of subjects for marker evaluation can lead to large efficiency gains, especially if cases and controls are matched on treatment assignment.


Subject(s)
Biomarkers , Patient Selection , Research Design , Breast Neoplasms , Female , Humans , Models, Statistical , Randomized Controlled Trials as Topic , Research Design/statistics & numerical data , Treatment Outcome
20.
Cancer Epidemiol Biomarkers Prev ; 24(6): 944-50, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25837819

ABSTRACT

BACKGROUND: Biomarker discovery research has yielded few biomarkers that validate for clinical use. A contributing factor may be poor study designs. METHODS: The goal in discovery research is to identify a subset of potentially useful markers from a large set of candidates assayed on case and control samples. We recommend the PRoBE design for selecting samples. We propose sample size calculations that require specifying: (i) a definition for biomarker performance; (ii) the proportion of useful markers the study should identify (Discovery Power); and (iii) the tolerable number of useless markers amongst those identified (False Leads Expected, FLE). RESULTS: We apply the methodology to a study of 9,000 candidate biomarkers for risk of colon cancer recurrence where a useful biomarker has positive predictive value ≥ 30%. We find that 40 patients with recurrence and 160 without recurrence suffice to filter out 98% of useless markers (2% FLE) while identifying 95% of useful biomarkers (95% Discovery Power). Alternative methods for sample size calculation required more assumptions. CONCLUSIONS: Biomarker discovery research should utilize quality biospecimen repositories and include sample sizes that enable markers meeting prespecified performance characteristics for well-defined clinical applications to be identified. IMPACT: The scientific rigor of discovery research should be improved.
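The two design quantities in the RESULTS above translate into expected counts with simple arithmetic, sketched below. The abstract reports the 9,000 candidates, the 95% Discovery Power and the 2% rate of useless markers passing the filter, but it does not say how many candidates are truly useful; the value of `n_useful` used here is purely hypothetical, as are the function and parameter names.

```python
def expected_discovery_counts(n_candidates, n_useful, discovery_power, false_lead_rate):
    """Expected numbers of useful markers found and of false leads.

    discovery_power: proportion of truly useful markers the study identifies.
    false_lead_rate: proportion of useless markers that pass the filter.
    """
    if not 0 <= n_useful <= n_candidates:
        raise ValueError("n_useful must be between 0 and n_candidates")
    n_useless = n_candidates - n_useful
    expected_true_leads = discovery_power * n_useful
    expected_false_leads = false_lead_rate * n_useless
    return expected_true_leads, expected_false_leads


if __name__ == "__main__":
    # 9,000 candidates, 95% power and 2% false-lead rate come from the abstract;
    # n_useful=100 is a hypothetical value chosen only for illustration.
    true_leads, false_leads = expected_discovery_counts(9000, 100, 0.95, 0.02)
    print("expected useful markers identified:", true_leads)
    print("expected false leads:", false_leads)
```

The sketch shows why the tolerable false-lead rate matters: when useful markers are rare among thousands of candidates, even a small pass-through rate for useless markers can yield more false leads than true ones.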


Subject(s)
Biomarkers, Tumor/metabolism , Colonic Neoplasms/diagnosis , Neoplasm Recurrence, Local/diagnosis , Research Design , Sample Size , Biomedical Research , Cohort Studies , Colonic Neoplasms/metabolism , Humans , Neoplasm Recurrence, Local/metabolism , Prognosis