ABSTRACT
BACKGROUND: A Generalized Linear Mixed Model (GLMM) is recommended to meta-analyze diagnostic test accuracy studies (DTAs) based on aggregate or individual participant data. Since a GLMM does not have a closed-form likelihood function or parameter solutions, computational methods are conventionally used to approximate the likelihoods and obtain parameter estimates. The most commonly used computational methods are the Iteratively Reweighted Least Squares (IRLS), the Laplace approximation (LA), and the Adaptive Gauss-Hermite quadrature (AGHQ). Although these methods are widely used, it has not been clear how they compare and perform in the context of an aggregate data meta-analysis (ADMA) of DTAs. METHODS: We compared and evaluated the performance of three commonly used computational methods for GLMM (the IRLS, the LA, and the AGHQ) via a comprehensive simulation study and real-life data examples, in the context of an ADMA of DTAs. By varying several parameters in our simulations, we assessed the performance of the three methods in terms of bias, root mean squared error, confidence interval (CI) width, coverage of the 95% CI, convergence rate, and computational speed. RESULTS: For most of the scenarios, especially when the meta-analytic data were not sparse (i.e., there were no or a negligible number of studies with perfect diagnosis), the three computational methods were comparable for the estimation of sensitivity and specificity. However, the LA had the largest bias and root mean squared error for pooled sensitivity and specificity when the meta-analytic data were sparse. Moreover, the AGHQ took a longer computational time to converge relative to the other two methods, although it had the best convergence rate. CONCLUSIONS: We recommend practitioners and researchers carefully choose an appropriate computational algorithm when fitting a GLMM to an ADMA of DTAs. We do not recommend the LA for sparse meta-analytic data sets. However, either the AGHQ or the IRLS can be used regardless of the characteristics of the meta-analytic data.
Subjects
Computer Simulation; Diagnostic Tests, Routine; Meta-Analysis as Topic; Humans; Diagnostic Tests, Routine/methods; Diagnostic Tests, Routine/standards; Diagnostic Tests, Routine/statistics & numerical data; Linear Models; Algorithms; Likelihood Functions; Sensitivity and Specificity
ABSTRACT
BACKGROUND: Selective reporting of results from only well-performing cut-offs leads to biased estimates of accuracy in primary studies of questionnaire-based screening tools and in meta-analyses that synthesize results. Individual participant data meta-analysis (IPDMA) of sensitivity and specificity at each cut-off via bivariate random-effects models (BREMs) can overcome this problem. However, IPDMA is laborious and depends on the ability to successfully obtain primary datasets, and BREMs ignore the correlation between cut-offs within primary studies. METHODS: We compared the performance of three recent multiple cut-off models developed by Steinhauser et al., Jones et al., and Hoyer and Kuss, that account for missing cut-offs when meta-analyzing diagnostic accuracy studies with multiple cut-offs, to BREMs fitted at each cut-off. We used data from 22 studies of the accuracy of the Edinburgh Postnatal Depression Scale (EPDS; 4475 participants, 758 major depression cases). We fitted each of the three multiple cut-off models and BREMs to a dataset with results from only published cut-offs from each study (published data) and an IPD dataset with results for all cut-offs (full IPD data). We estimated pooled sensitivity and specificity with 95% confidence intervals (CIs) for each cut-off and the area under the curve. RESULTS: Compared to the BREMs fitted to the full IPD data, the Steinhauser et al., Jones et al., and Hoyer and Kuss models fitted to the published data produced similar receiver operating characteristic curves; though, the Hoyer and Kuss model had lower area under the curve, mainly due to estimating slightly lower sensitivity at lower cut-offs. When fitting the three multiple cut-off models to the full IPD data, a similar pattern of results was observed. 
Importantly, all models had similar 95% CIs for sensitivity and specificity, and the CI width increased with cut-off levels for sensitivity and decreased with an increasing cut-off for specificity, even for the BREMs, which treat each cut-off separately. CONCLUSIONS: Multiple cut-off models appear to be the favorable methods when only published data are available. While collecting IPD is expensive and time consuming, IPD can facilitate subgroup analyses that cannot be conducted with published data only.
Subjects
Depression; Tool Use Behavior; Humans; Depression/diagnosis; Sensitivity and Specificity; Psychiatric Status Rating Scales; Diagnostic Tests, Routine
ABSTRACT
The seven-item Hospital Anxiety and Depression Scale Depression subscale (HADS-D) and the total score of the 14-item HADS (HADS-T) are both used for major depression screening. Compared to the HADS-D, the HADS-T includes anxiety items and requires more time to complete. We compared the screening accuracy of the HADS-D and HADS-T for major depression detection. We conducted an individual participant data meta-analysis and fit bivariate random effects models to assess diagnostic accuracy among participants with both HADS-D and HADS-T scores. We identified optimal cutoffs, estimated sensitivity and specificity with 95% confidence intervals, and compared screening accuracy across paired cutoffs via two-stage and individual-level models. We used a 0.05 equivalence margin to assess equivalency in sensitivity and specificity. 20,700 participants (2,285 major depression cases) from 98 studies were included. Cutoffs of ≥7 for the HADS-D (sensitivity 0.79 [0.75, 0.83], specificity 0.78 [0.75, 0.80]) and ≥15 for the HADS-T (sensitivity 0.79 [0.76, 0.82], specificity 0.81 [0.78, 0.83]) minimized the distance to the top-left corner of the receiver operating characteristic curve. Across all sets of paired cutoffs evaluated, differences of sensitivity between HADS-T and HADS-D ranged from -0.05 to 0.01 (0.00 at paired optimal cutoffs), and differences of specificity were within 0.03 for all cutoffs (0.02-0.03). The pattern was similar among outpatients, although the HADS-T was slightly (not nonequivalently) more specific among inpatients. The accuracy of HADS-T was equivalent to the HADS-D for detecting major depression. In most settings, the shorter HADS-D would be preferred. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
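The "distance to the top-left corner" criterion mentioned in this abstract can be illustrated with a short sketch. It assumes the usual Euclidean definition, sqrt((1 - sensitivity)^2 + (1 - specificity)^2); the function names are hypothetical, and only the values for HADS-D ≥7 and HADS-T ≥15 come from the abstract. The neighbouring cutoffs in `illustrative_hads_d` are invented for illustration and are not study results.

```python
import math


def distance_to_top_left(sensitivity, specificity):
    """Euclidean distance from an ROC point to the ideal corner (Se = 1, Sp = 1)."""
    return math.hypot(1.0 - sensitivity, 1.0 - specificity)


def best_cutoff(accuracy_by_cutoff):
    """Return the cutoff whose (sensitivity, specificity) pair lies closest to the corner."""
    return min(
        accuracy_by_cutoff,
        key=lambda cutoff: distance_to_top_left(*accuracy_by_cutoff[cutoff]),
    )


# Values reported in the abstract for the two optimal cutoffs.
hads_d_7 = distance_to_top_left(0.79, 0.78)
hads_t_15 = distance_to_top_left(0.79, 0.81)

# Cutoffs 6 and 8 below are HYPOTHETICAL values, included only so that
# best_cutoff has something to compare against; cutoff 7 is from the abstract.
illustrative_hads_d = {6: (0.86, 0.68), 7: (0.79, 0.78), 8: (0.70, 0.85)}
chosen = best_cutoff(illustrative_hads_d)

print(round(hads_d_7, 3), round(hads_t_15, 3), chosen)
```

The criterion weights sensitivity and specificity equally, so it is one possible way of defining an "optimal" cutoff rather than the only one.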
Subjects
Depressive Disorder, Major; Humans; Depressive Disorder, Major/diagnosis; Depression/diagnosis; Psychiatric Status Rating Scales; Sensitivity and Specificity; Anxiety/diagnosis; Mass Screening
ABSTRACT
OBJECTIVE: To update a previous individual participant data meta-analysis and determine the accuracy of the Patient Health Questionnaire-9 (PHQ-9), the most commonly used depression screening tool in general practice, for detecting major depression overall and by study or participant subgroups. DESIGN: Systematic review and individual participant data meta-analysis. DATA SOURCES: Medline, Medline In-Process, and Other Non-Indexed Citations via Ovid, PsycINFO, Web of Science searched through 9 May 2018. REVIEW METHODS: Eligible studies administered the PHQ-9 and classified current major depression status using a validated semistructured diagnostic interview (designed for clinician administration), fully structured interview (designed for lay administration), or the Mini International Neuropsychiatric Interview (MINI; a brief interview designed for lay administration). A bivariate random effects meta-analytic model was used to obtain point and interval estimates of pooled PHQ-9 sensitivity and specificity at cut-off values 5-15, separately, among studies that used semistructured diagnostic interviews (eg, Structured Clinical Interview for Diagnostic and Statistical Manual), fully structured interviews (eg, Composite International Diagnostic Interview), and the MINI. Meta-regression was used to investigate whether PHQ-9 accuracy correlated with reference standard categories and participant characteristics. RESULTS: Data from 44 503 total participants (27 146 additional from the update) were obtained from 100 of 127 eligible studies (42 additional studies; 79% eligible studies; 86% eligible participants). Among studies with a semistructured interview reference standard, pooled PHQ-9 sensitivity and specificity (95% confidence interval) at the standard cut-off value of ≥10, which maximised combined sensitivity and specificity, were 0.85 (0.79 to 0.89) and 0.85 (0.82 to 0.87), respectively. 
Specificity was similar across reference standards, but sensitivity in studies with semistructured interviews was 7-24% (median 21%) higher than with fully structured reference standards and 2-14% (median 11%) higher than with the MINI across cut-off values. Across reference standards and cut-off values, specificity was 0-10% (median 3%) higher for men and 0-12% (median 5%) higher for people aged 60 or older. CONCLUSIONS: Researchers and clinicians could use results to determine outcomes, such as total number of positive screens and false positive screens, at different PHQ-9 cut-off values for different clinical settings using the knowledge translation tool at www.depressionscreening100.com/phq. STUDY REGISTRATION: PROSPERO CRD42014010673.
Subjects
Depressive Disorder, Major/diagnosis; Patient Health Questionnaire/standards; Adult; Age Factors; Depressive Disorder, Major/epidemiology; Female; Humans; Male; Middle Aged; Patient Health Questionnaire/statistics & numerical data; Psychiatric Status Rating Scales/standards; Psychiatric Status Rating Scales/statistics & numerical data; ROC Curve; Reference Standards; Sex Factors
ABSTRACT
OBJECTIVE: To evaluate the accuracy of the depression subscale of the Hospital Anxiety and Depression Scale (HADS-D) to screen for major depression among people with physical health problems. DESIGN: Systematic review and individual participant data meta-analysis. DATA SOURCES: Medline, Medline In-Process and Other Non-Indexed Citations, PsycInfo, and Web of Science (from inception to 25 October 2018). REVIEW METHODS: Eligible datasets included HADS-D scores and major depression status based on a validated diagnostic interview. Primary study data and study level data extracted from primary reports were combined. For HADS-D cut-off thresholds of 5-15, a bivariate random effects meta-analysis was used to estimate pooled sensitivity and specificity, separately, in studies that used semi-structured diagnostic interviews (eg, Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders), fully structured interviews (eg, Composite International Diagnostic Interview), and the Mini International Neuropsychiatric Interview. One stage meta-regression was used to examine whether accuracy was associated with reference standard categories and the characteristics of participants. Sensitivity analyses were done to assess whether including published results from studies that did not provide raw data influenced the results. RESULTS: Individual participant data were obtained from 101 of 168 eligible studies (60%; 25 574 participants (72% of eligible participants), 2549 with major depression). Combined sensitivity and specificity was maximised at a cut-off value of seven or higher for semi-structured interviews, fully structured interviews, and the Mini International Neuropsychiatric Interview. 
Among studies with a semi-structured interview (57 studies, 10 664 participants, 1048 with major depression), sensitivity and specificity were 0.82 (95% confidence interval 0.76 to 0.87) and 0.78 (0.74 to 0.81) for a cut-off value of seven or higher, 0.74 (0.68 to 0.79) and 0.84 (0.81 to 0.87) for a cut-off value of eight or higher, and 0.44 (0.38 to 0.51) and 0.95 (0.93 to 0.96) for a cut-off value of 11 or higher. Accuracy was similar across reference standards and subgroups and when published results from studies that did not contribute data were included. CONCLUSIONS: When screening for major depression, a HADS-D cut-off value of seven or higher maximised combined sensitivity and specificity. A cut-off value of eight or higher generated similar combined sensitivity and specificity but was less sensitive and more specific. To identify medically ill patients with depression with the HADS-D, lower cut-off values could be used to avoid false negatives and higher cut-off values to reduce false positives and identify people with higher symptom levels. TRIAL REGISTRATION: PROSPERO CRD42015016761.
Subjects
Depressive Disorder, Major/psychology; Hospitalization; Psychometrics; Diagnostic and Statistical Manual of Mental Disorders; Humans; Psychiatric Status Rating Scales; Sensitivity and Specificity
ABSTRACT
OBJECTIVES: Estimates of depression prevalence in pregnancy and postpartum are based on the Edinburgh Postnatal Depression Scale (EPDS) more than on any other method. We aimed to determine if any EPDS cutoff can accurately and consistently estimate depression prevalence in individual studies. METHODS: We analyzed datasets that compared EPDS scores to Structured Clinical Interview for DSM (SCID) major depression status. Random-effects meta-analysis was used to compare prevalence with EPDS cutoffs versus the SCID. RESULTS: Seven thousand three hundred and fifteen participants (1017 SCID major depression) from 29 primary studies were included. For EPDS cutoffs used to estimate prevalence in recent studies (≥9 to ≥14), pooled prevalence estimates ranged from 27.8% (95% CI: 22.0%-34.5%) for EPDS ≥ 9 to 9.0% (95% CI: 6.8%-11.9%) for EPDS ≥ 14; pooled SCID major depression prevalence was 9.0% (95% CI: 6.5%-12.3%). EPDS ≥14 provided pooled prevalence closest to SCID-based prevalence but differed from SCID prevalence in individual studies by a mean absolute difference of 5.1% (95% prediction interval: -13.7%, 12.3%). CONCLUSION: EPDS ≥14 approximated SCID-based prevalence overall, but considerable heterogeneity in individual studies is a barrier to using it for prevalence estimation.
Subjects
Depression, Postpartum; Depressive Disorder, Major; Depression; Depressive Disorder, Major/diagnosis; Depressive Disorder, Major/epidemiology; Female; Humans; Pregnancy; Prevalence; Psychiatric Status Rating Scales
ABSTRACT
INTRODUCTION: No studies have examined factors associated with fear in any group of people vulnerable during COVID-19 due to pre-existing medical conditions. OBJECTIVE: To investigate factors associated with fear of consequences of COVID-19 among people living with a pre-existing medical condition, the autoimmune disease systemic sclerosis (SSc; scleroderma), including country. METHODS: Pre-COVID-19 data from the Scleroderma Patient-centered Intervention Network (SPIN) Cohort were linked to COVID-19 data collected in April 2020. Multivariable linear regression was used to assess factors associated with continuous scores of the 10-item COVID-19 Fears Questionnaire for Chronic Medical Conditions, controlling for pre-COVID-19 anxiety symptoms. RESULTS: Compared to France (N = 156), COVID-19 Fear scores among participants from the United Kingdom (N = 50) were 0.12 SD (95% CI 0.03 to 0.21) higher; scores for Canada (N = 97) and the United States (N = 128) were higher, but not statistically significant. Greater interference of breathing problems was associated with higher fears due to COVID-19 (Standardized regression coefficient = 0.12, 95% CI 0.01 to 0.23). Participants with higher financial resources adequacy scores had lower COVID-19 Fear scores (Standardized coefficient = -0.18, 95% CI -0.28 to -0.09). CONCLUSIONS: Fears due to COVID-19 were associated with clinical and functional vulnerabilities in this chronically ill population. This suggests that interventions may benefit from addressing specific clinical issues that apply to specific populations. Financial resources, health policies and political influences may also be important. The needs of people living with chronic illness during a pandemic may differ depending on the social and political context in which they live.
Subjects
COVID-19/psychology; Fear; Scleroderma, Systemic/therapy; Adult; Aged; COVID-19/epidemiology; Canada/epidemiology; Chronic Disease; Cohort Studies; Female; France/epidemiology; Humans; Male; Middle Aged; Patient-Centered Care; Risk Factors; Surveys and Questionnaires; United Kingdom/epidemiology; United States/epidemiology
ABSTRACT
OBJECTIVE: To evaluate the Edinburgh Postnatal Depression Scale (EPDS) for screening to detect major depression in pregnant and postpartum women. DESIGN: Individual participant data meta-analysis. DATA SOURCES: Medline, Medline In-Process and Other Non-Indexed Citations, PsycINFO, and Web of Science (from inception to 3 October 2018). ELIGIBILITY CRITERIA FOR SELECTING STUDIES: Eligible datasets included EPDS scores and major depression classification based on validated diagnostic interviews. Bivariate random effects meta-analysis was used to estimate EPDS sensitivity and specificity compared with semi-structured, fully structured (Mini International Neuropsychiatric Interview (MINI) excluded), and MINI diagnostic interviews separately using individual participant data. One stage meta-regression was used to examine accuracy by reference standard categories and participant characteristics. RESULTS: Individual participant data were obtained from 58 of 83 eligible studies (70%; 15 557 of 22 788 eligible participants (68%), 2069 with major depression). Combined sensitivity and specificity was maximised at a cut-off value of 11 or higher across reference standards. Among studies with a semi-structured interview (36 studies, 9066 participants, 1330 with major depression), sensitivity and specificity were 0.85 (95% confidence interval 0.79 to 0.90) and 0.84 (0.79 to 0.88) for a cut-off value of 10 or higher, 0.81 (0.75 to 0.87) and 0.88 (0.85 to 0.91) for a cut-off value of 11 or higher, and 0.66 (0.58 to 0.74) and 0.95 (0.92 to 0.96) for a cut-off value of 13 or higher, respectively. Accuracy was similar across reference standards and subgroups, including for pregnant and postpartum women. CONCLUSIONS: An EPDS cut-off value of 11 or higher maximised combined sensitivity and specificity; a cut-off value of 13 or higher was less sensitive but more specific. To identify pregnant and postpartum women with higher symptom levels, a cut-off of 13 or higher could be used. 
Lower cut-off values could be used if the intention is to avoid false negatives and identify most patients who meet diagnostic criteria. REGISTRATION: PROSPERO (CRD42015024785).
Subjects
Depression, Postpartum/psychology; Depressive Disorder, Major/psychology; Pregnancy Complications/psychology; Psychometrics; Female; Humans; Pregnancy; Prenatal Care; Sensitivity and Specificity
ABSTRACT
OBJECTIVE: Fear associated with medical vulnerability should be considered when assessing mental health among individuals with chronic medical conditions during the COVID-19 pandemic. The objective was to develop and validate the COVID-19 Fears Questionnaire for Chronic Medical Conditions. METHODS: Fifteen initial items were generated based on suggestions from 121 people with the chronic autoimmune disease systemic sclerosis (SSc; scleroderma). Patients in a COVID-19 SSc cohort completed items between April 9 and 27, 2020. Exploratory factor analysis (EFA) and item analysis were used to select items for inclusion. Cronbach's alpha and Pearson correlations were used to evaluate internal consistency reliability and convergent validity. Factor structure was confirmed with confirmatory factor analysis (CFA) in follow-up data collection two weeks later. RESULTS: 787 participants completed baseline measures; 563 of them completed the follow-up assessment. Ten of 15 initial items were included in the final questionnaire. EFA suggested that a single dimension explained the data reasonably well. There were no indications of floor or ceiling effects. Cronbach's alpha was 0.91. Correlations between the COVID-19 Fears Questionnaire and measures of anxiety (r = 0.53), depressive symptoms (r = 0.44), and perceived stress (r = 0.50) supported construct validity. CFA supported the single-factor structure (χ²(35) = 311.2, p < 0.001, Tucker-Lewis Index = 0.97, Comparative Fit Index = 0.96, Root Mean Square Error of Approximation = 0.12). CONCLUSION: The COVID-19 Fears Questionnaire for Chronic Medical Conditions can be used to assess fear among people at risk due to pre-existing medical conditions during the COVID-19 pandemic.
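For readers unfamiliar with the internal consistency statistic reported above, the following is a minimal sketch of Cronbach's alpha using its standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores). The function name and the toy responses are hypothetical; they are not the study's data, and the abstract does not describe the software the authors used.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Transpose so that each tuple holds one item's scores across respondents.
    items = list(zip(*responses))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# HYPOTHETICAL toy responses (3 respondents, 2 items), not data from the study.
toy_responses = [[1, 2], [2, 4], [3, 6]]
toy_alpha = cronbach_alpha(toy_responses)
print(round(toy_alpha, 3))
```

Values near 1 indicate that the items vary together closely; the same variance estimator must be used for the items and for the total score so that the ratio is consistent.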
Subjects
COVID-19/psychology; Chronic Disease/psychology; Fear/psychology; Patient-Centered Care/standards; Scleroderma, Systemic/psychology; Surveys and Questionnaires/standards; Adult; Aged; COVID-19/epidemiology; Chronic Disease/epidemiology; Cohort Studies; Cross-Sectional Studies; Female; Humans; Male; Middle Aged; Pandemics; Patient-Centered Care/methods; Psychometrics/methods; Psychometrics/standards; Reproducibility of Results; Scleroderma, Systemic/epidemiology
ABSTRACT
OBJECTIVE: The Maternal Mental Health in Canada, 2018/2019, survey reported that 18% of 7,085 mothers who recently gave birth reported "feelings consistent with postpartum depression" based on scores ≥7 on a 5-item version of the Edinburgh Postpartum Depression Scale (EPDS-5). The EPDS-5 was designed as a screening questionnaire, not to classify disorders or estimate prevalence; the extent to which EPDS-5 results reflect depression prevalence is unknown. We investigated EPDS-5 ≥7 performance relative to major depression prevalence based on a validated diagnostic interview, the Structured Clinical Interview for DSM (SCID). METHODS: We searched Medline, Medline In-Process & Other Non-Indexed Citations, PsycINFO, and the Web of Science Core Collection through June 2016 for studies with data sets with item response data to calculate EPDS-5 scores and that used the SCID to ascertain depression status. We conducted an individual participant data meta-analysis to estimate pooled percentage of EPDS-5 ≥7, pooled SCID major depression prevalence, and the pooled difference in prevalence. RESULTS: A total of 3,958 participants from 19 primary studies were included. Pooled prevalence of SCID major depression was 9.2% (95% confidence interval [CI] 6.0% to 13.7%), pooled percentage of participants with EPDS-5 ≥7 was 16.2% (95% CI 10.7% to 23.8%), and pooled difference was 8.0% (95% CI 2.9% to 13.2%). In the 19 included studies, mean and median ratios of EPDS-5 to SCID prevalence were 2.1 and 1.4 times. CONCLUSIONS: Prevalence estimated based on EPDS-5 ≥7 appears to be substantially higher than the prevalence of major depression. Validated diagnostic interviews should be used to establish prevalence.
Subjects
Depression, Postpartum/epidemiology; Depression, Postpartum/psychology; Mass Screening/methods; Mothers/psychology; Canada/epidemiology; Depression, Postpartum/diagnosis; Depressive Disorder, Major; Evidence-Based Medicine; Female; Humans; Pregnancy; Prevalence; Psychiatric Status Rating Scales
ABSTRACT
OBJECTIVES: Validated diagnostic interviews are required to classify depression status and estimate prevalence of disorder, but screening tools are often used instead. We used individual participant data meta-analysis to compare prevalence based on standard Hospital Anxiety and Depression Scale - depression subscale (HADS-D) cutoffs of ≥8 and ≥11 versus Structured Clinical Interview for DSM (SCID) major depression and determined if an alternative HADS-D cutoff could more accurately estimate prevalence. METHODS: We searched Medline, Medline In-Process & Other Non-Indexed Citations via Ovid, PsycINFO, and Web of Science (inception-July 11, 2016) for studies comparing HADS-D scores to SCID major depression status. Pooled prevalence and pooled differences in prevalence for HADS-D cutoffs versus SCID major depression were estimated. RESULTS: 6005 participants (689 SCID major depression cases) from 41 primary studies were included. Pooled prevalence was 24.5% (95% Confidence Interval (CI): 20.5%, 29.0%) for HADS-D ≥8, 10.7% (95% CI: 8.3%, 13.8%) for HADS-D ≥11, and 11.6% (95% CI: 9.2%, 14.6%) for SCID major depression. HADS-D ≥11 was closest to SCID major depression prevalence, but the 95% prediction interval for the difference that could be expected for HADS-D ≥11 versus SCID in a new study was -21.1% to 19.5%. CONCLUSIONS: HADS-D ≥8 substantially overestimates depression prevalence. Of all possible cutoff thresholds, HADS-D ≥11 was closest to the SCID, but there was substantial heterogeneity in the difference between HADS-D ≥11 and SCID-based estimates. HADS-D should not be used as a substitute for a validated diagnostic interview.
Subjects
Depression/epidemiology; Depressive Disorder, Major/diagnosis; Adult; Aged; Depressive Disorder, Major/classification; Female; Humans; Male; Middle Aged; Prevalence
ABSTRACT
Importance: The Patient Health Questionnaire depression module (PHQ-9) is a 9-item self-administered instrument used for detecting depression and assessing severity of depression. The Patient Health Questionnaire-2 (PHQ-2) consists of the first 2 items of the PHQ-9 (which assess the frequency of depressed mood and anhedonia) and can be used as a first step to identify patients for evaluation with the full PHQ-9. Objective: To estimate PHQ-2 accuracy alone and combined with the PHQ-9 for detecting major depression. Data Sources: MEDLINE, MEDLINE In-Process & Other Non-Indexed Citations, PsycINFO, and Web of Science (January 2000-May 2018). Study Selection: Eligible data sets compared PHQ-2 scores with major depression diagnoses from a validated diagnostic interview. Data Extraction and Synthesis: Individual participant data were synthesized with bivariate random-effects meta-analysis to estimate pooled sensitivity and specificity of the PHQ-2 alone among studies using semistructured, fully structured, or Mini International Neuropsychiatric Interview (MINI) diagnostic interviews separately and in combination with the PHQ-9 vs the PHQ-9 alone for studies that used semistructured interviews. The PHQ-2 score ranges from 0 to 6, and the PHQ-9 score ranges from 0 to 27. Results: Individual participant data were obtained from 100 of 136 eligible studies (44 318 participants; 4572 with major depression [10%]; mean [SD] age, 49 [17] years; 59% female). Among studies that used semistructured interviews, PHQ-2 sensitivity and specificity (95% CI) were 0.91 (0.88-0.94) and 0.67 (0.64-0.71) for cutoff scores of 2 or greater and 0.72 (0.67-0.77) and 0.85 (0.83-0.87) for cutoff scores of 3 or greater. Sensitivity was significantly greater for semistructured vs fully structured interviews. Specificity was not significantly different across the types of interviews. 
The area under the receiver operating characteristic curve was 0.88 (0.86-0.89) for semistructured interviews, 0.82 (0.81-0.84) for fully structured interviews, and 0.87 (0.85-0.88) for the MINI. There were no significant subgroup differences. For semistructured interviews, sensitivity for PHQ-2 scores of 2 or greater followed by PHQ-9 scores of 10 or greater (0.82 [0.76-0.86]) was not significantly different than PHQ-9 scores of 10 or greater alone (0.86 [0.80-0.90]); specificity for the combination was significantly but minimally higher (0.87 [0.84-0.89] vs 0.85 [0.82-0.87]). The area under the curve was 0.90 (0.89-0.91). The combination was estimated to reduce the number of participants needing to complete the full PHQ-9 by 57% (56%-58%). Conclusions and Relevance: In an individual participant data meta-analysis of studies that compared PHQ scores with major depression diagnoses, the combination of PHQ-2 (with cutoff ≥2) followed by PHQ-9 (with cutoff ≥10) had similar sensitivity but higher specificity compared with PHQ-9 cutoff scores of 10 or greater alone. Further research is needed to understand the clinical and research value of this combined approach to screening.
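The two-step rule described in this abstract (PHQ-2 ≥2 followed by PHQ-9 ≥10) can be sketched as a simple classification function. This is only an illustration of the screening logic, not the bivariate meta-analytic model used in the study; the function names are hypothetical and the records below are invented for illustration and are not participant data.

```python
def screens_positive(phq2, phq9, phq2_cutoff=2, phq9_cutoff=10):
    """Two-step rule: only people at or above the PHQ-2 cutoff go on to the PHQ-9."""
    if phq2 < phq2_cutoff:
        return False  # screened out at step 1; the full PHQ-9 is not needed
    return phq9 >= phq9_cutoff


def summarize(records, phq2_cutoff=2, phq9_cutoff=10):
    """Return (sensitivity, specificity, fraction spared the full PHQ-9).

    Each record is (phq2_score, phq9_score, has_major_depression).
    """
    tp = fn = tn = fp = spared = 0
    for phq2, phq9, is_case in records:
        if phq2 < phq2_cutoff:
            spared += 1
        positive = screens_positive(phq2, phq9, phq2_cutoff, phq9_cutoff)
        if is_case:
            tp += positive
            fn += not positive
        else:
            fp += positive
            tn += not positive
    return tp / (tp + fn), tn / (tn + fp), spared / len(records)


# HYPOTHETICAL records for illustration only (PHQ-2 is part of PHQ-9, so phq2 <= phq9).
toy_records = [
    (0, 1, False), (1, 4, False), (1, 11, True), (2, 6, False),
    (3, 12, False), (4, 15, True), (5, 20, True), (0, 3, False),
]
sensitivity, specificity, fraction_spared = summarize(toy_records)
print(sensitivity, specificity, fraction_spared)
```

The third toy record shows how the combination can lose sensitivity: a case with a PHQ-9 score of 11 is missed because the PHQ-2 score falls below the first-step cutoff.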
Subjects
Depressive Disorder, Major/diagnosis; Mass Screening/methods; Patient Health Questionnaire; Adult; Depressive Disorder, Major/classification; Female; Humans; Interviews as Topic; Male; ROC Curve; Sensitivity and Specificity
ABSTRACT
Due to the inevitable inter-study correlation between test sensitivity (Se) and test specificity (Sp), mostly because of threshold variability, hierarchical or bivariate random-effects models are widely used to perform a meta-analysis of diagnostic test accuracy studies. Conventionally, these models assume that the random effects follow the bivariate normal distribution. However, inference based on the well-established bivariate random-effects models may lead to misleading conclusions when outlying and influential studies are present, since such studies can strongly influence parameter estimates due to their disproportionate weight. Therefore, we developed a new robust bivariate random-effects model that accommodates outlying and influential observations and gives robust statistical inference by down-weighting the effect of outlying and influential studies. The marginal model and the Monte Carlo expectation-maximization algorithm for our proposed model have been derived. A simulation study has been carried out to validate the proposed method and compare it against the standard methods. Regardless of the parameters varied in our simulations, the proposed model produced robust point estimates of Se and Sp compared to the standard models. Moreover, our proposed model resulted in precise estimates as it yielded the narrowest confidence intervals. The proposed model also generated similar point and interval estimates of Se and Sp to those of the standard models when there were no outlying and influential studies. Two published meta-analyses have also been used to illustrate the methods.
Subjects
Algorithms; Diagnostic Tests, Routine; Computer Simulation; Research Design; Sensitivity and Specificity
ABSTRACT
Hierarchical models are recommended for meta-analyzing diagnostic test accuracy (DTA) studies. The bivariate random-effects model is currently widely used to synthesize a pair of test sensitivity and specificity using logit transformation across studies. This model assumes a bivariate normal distribution for the random effects. However, this assumption is restrictive and can be violated. When the assumption fails, inferences could be misleading. In this paper, we extended the current bivariate random-effects model by assuming a flexible bivariate skew-normal distribution for the random effects in order to robustly model logit sensitivities and logit specificities. The marginal distribution of the proposed model is analytically derived so that parameter estimation can be performed using standard likelihood methods. The method of weighted-average is adopted to estimate the overall logit-transformed sensitivity and specificity. An extensive simulation study is carried out to investigate the performance of the proposed model compared to other standard models. Overall, the proposed model performs better in terms of confidence interval width of the average logit-transformed sensitivity and specificity compared to the standard bivariate linear mixed model and bivariate generalized linear mixed model. Simulations have also shown that the proposed model performed better than the well-established bivariate linear mixed model in terms of bias and comparable with regard to the root mean squared error (RMSE) of the between-study (co)variances. The proposed method is also illustrated using a published meta-analysis dataset.
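For orientation, the between-study level of the standard bivariate random-effects model that this abstract takes as its starting point is commonly written as below. The notation is an assumption made here for illustration (study index $i$, mean logits $\mu_{Se}$ and $\mu_{Sp}$, between-study standard deviations $\sigma_{Se}$ and $\sigma_{Sp}$, correlation $\rho$) and is not taken from the paper, which replaces the normal distribution with a bivariate skew-normal one.

```latex
\begin{pmatrix}
  \operatorname{logit}(Se_i) \\
  \operatorname{logit}(Sp_i)
\end{pmatrix}
\sim
N\!\left(
  \begin{pmatrix} \mu_{Se} \\ \mu_{Sp} \end{pmatrix},
  \begin{pmatrix}
    \sigma_{Se}^{2} & \rho\,\sigma_{Se}\sigma_{Sp} \\
    \rho\,\sigma_{Se}\sigma_{Sp} & \sigma_{Sp}^{2}
  \end{pmatrix}
\right),
\qquad
\operatorname{logit}(p) = \log\frac{p}{1-p}.
```

Pooled sensitivity and specificity are then obtained by back-transforming the mean logits, for example $Se = 1/(1 + e^{-\mu_{Se}})$.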
Subjects
Routine Diagnostic Tests, Logistic Models, Research Design, Computer Simulation, Routine Diagnostic Tests/standards, Humans, Linear Models, Sensitivity and Specificity
ABSTRACT
OBJECTIVES: Depression symptom questionnaires are not intended for diagnostic classification. Patient Health Questionnaire-9 (PHQ-9) scores ≥10 are nonetheless often used to estimate depression prevalence. We compared PHQ-9 ≥10 prevalence to Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders (SCID) major depression prevalence and assessed whether an alternative PHQ-9 cutoff could more accurately estimate prevalence. STUDY DESIGN AND SETTING: Individual participant data meta-analysis of datasets comparing PHQ-9 scores to SCID major depression status. RESULTS: A total of 9,242 participants (1,389 SCID major depression cases) from 44 primary studies were included. Pooled PHQ-9 ≥10 prevalence was 24.6% (95% confidence interval [CI]: 20.8%, 28.9%); pooled SCID major depression prevalence was 12.1% (95% CI: 9.6%, 15.2%); and pooled difference was 11.9% (95% CI: 9.3%, 14.6%). The mean study-level ratio of PHQ-9 ≥10 prevalence to SCID-based prevalence was 2.5. PHQ-9 ≥14 and the PHQ-9 diagnostic algorithm provided prevalence estimates closest to SCID major depression prevalence, but study-level prevalence differed from SCID-based prevalence by an average absolute difference of 4.8% for PHQ-9 ≥14 (95% prediction interval: -13.6%, 14.5%) and 5.6% for the PHQ-9 diagnostic algorithm (95% prediction interval: -16.4%, 15.0%). CONCLUSION: PHQ-9 ≥10 substantially overestimates depression prevalence. There is too much heterogeneity to correct statistically in individual studies.
Subjects
Depression/epidemiology, Adolescent, Adult, Aged, Factual Databases, Diagnostic and Statistical Manual of Mental Disorders, Female, Humans, Male, Middle Aged, Patient Health Questionnaire, Prevalence, Young Adult
ABSTRACT
Bivariate random-effects models are currently widely used to synthesize pairs of test sensitivity and specificity across studies. Inferences drawn based on these models may be distorted in the presence of outlying or influential studies. Currently, subjective methods such as inspection of forest plots are used to identify outlying studies in meta-analysis of diagnostic test accuracy studies. We proposed objective methods based on solid statistical reasoning for identifying outlying and/or influential studies. The proposed methods have been validated in a simulation study and illustrated on data from two published meta-analyses. Our methods outperform the currently used ad hoc methods and avoid their subjectivity. The proposed methods can be used as a sensitivity analysis tool concurrently with the current bivariate random-effects models or as a preliminary analysis tool for robust models that accommodate outlying and/or influential studies in meta-analysis of diagnostic test accuracy studies.
Subjects
Routine Diagnostic Tests, Meta-Analysis as Topic, Computer Simulation, Sensitivity and Specificity
ABSTRACT
Diagnostic or screening tests are widely used in medical fields to classify patients according to their disease status. Several statistical models for meta-analysis of diagnostic test accuracy studies have been developed to synthesize test sensitivity and specificity of a diagnostic test of interest. Because of the correlation between test sensitivity and specificity, modeling the two measures using a bivariate model is recommended. In this paper, we extend the current standard bivariate linear mixed model (LMM) by proposing two variance-stabilizing transformations: the arcsine square root and the Freeman-Tukey double arcsine transformation. We compared the performance of the proposed methods with the standard method through simulations using several performance measures. The simulation results showed that our proposed methods performed better than the standard LMM in terms of bias, root mean square error, and coverage probability in most of the scenarios, even when data were generated assuming the standard LMM. We also illustrated the methods using two real data sets.
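For readers unfamiliar with the two variance-stabilizing transformations named above, the following minimal sketch shows their textbook definitions applied to a proportion x/n. The function names and example counts are hypothetical, and the sum form of the Freeman-Tukey transformation is assumed here; the paper may use a different scaling (for example, half of this quantity).

```python
import math

def arcsine_sqrt(x, n):
    """Arcsine square root transform of x/n.

    Returns the transformed value and its approximate variance 1/(4n),
    which does not depend on the underlying proportion.
    """
    return math.asin(math.sqrt(x / n)), 1 / (4 * n)

def freeman_tukey(x, n):
    """Freeman-Tukey double arcsine transform (sum form, an assumption).

    Returns the transformed value and its approximate variance 1/(n + 0.5).
    """
    t = math.asin(math.sqrt(x / (n + 1))) + math.asin(math.sqrt((x + 1) / (n + 1)))
    return t, 1 / (n + 0.5)

# Hypothetical study in which all 50 diseased participants test positive
# (observed sensitivity of 1). Both transforms are finite here without a
# continuity correction, unlike the logit.
print(arcsine_sqrt(50, 50))
print(freeman_tukey(50, 50))
```

Because the approximate variances depend only on the sample size, the transformed estimates remain usable for studies with observed proportions of exactly 0 or 1.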
Subjects
Biometry/methods, Diagnosis, Meta-Analysis as Topic, Coronary Artery Disease/diagnosis, Female, Humans, Statistical Models, Multivariate Analysis, Stochastic Processes, Uterine Cervical Diseases/diagnosis
ABSTRACT
We systematically reviewed and analyzed the available data for galactomannan (GM), β-D-glucan (BG), and polymerase chain reaction (PCR)-based assays to detect invasive fungal disease (IFD) in pediatric patients with cancer or undergoing hematopoietic stem cell transplantation (HSCT) when used as screening tools during immunosuppression or as diagnostic tests in patients presenting with symptoms such as fever during neutropenia (FN). Of 1532 studies screened, 25 studies reported on GM (n = 19), BG (n = 3), and PCR (n = 11). All fungal biomarkers demonstrated highly variable sensitivity, specificity, and positive predictive values, and these were generally poor in both clinical settings. GM negative predictive values were high, ranging from 85% to 100% for screening and 70% to 100% in the diagnostic setting, but failure to identify non-Aspergillus molds limits its usefulness. Future work could focus on the usefulness of combinations of fungal biomarkers in pediatric cancer and HSCT.
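The relationship between the accuracy measures mentioned above can be sketched with Bayes' theorem. The function below is a minimal illustration; its name and the input values are hypothetical and are not results from the review.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence               # diseased and test-positive
    fn = (1 - sensitivity) * prevalence         # diseased and test-negative
    tn = specificity * (1 - prevalence)         # non-diseased and test-negative
    fp = (1 - specificity) * (1 - prevalence)   # non-diseased and test-positive
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv

# Hypothetical inputs: a test with modest sensitivity applied where
# the disease is uncommon.
ppv, npv = predictive_values(sensitivity=0.5, specificity=0.9, prevalence=0.1)
print(round(ppv, 3), round(npv, 3))
```

With these assumed inputs the negative predictive value is much higher than the positive predictive value, which shows how a test can have a high NPV at low prevalence even when its sensitivity is limited.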