Results 1 - 20 of 182
4.
BMJ Med ; 3(1): e000817, 2024.
Article in English | MEDLINE | ID: mdl-38375077

ABSTRACT

Objectives: To conduct a systematic review of studies externally validating the ADNEX (Assessment of Different NEoplasias in the adneXa) model for diagnosis of ovarian cancer and to present a meta-analysis of its performance. Design: Systematic review and meta-analysis of external validation studies. Data sources: Medline, Embase, Web of Science, Scopus, and Europe PMC, from 15 October 2014 to 15 May 2023. Eligibility criteria for selecting studies: All external validation studies of the performance of ADNEX, with any study design and any study population of patients with an adnexal mass. Two independent reviewers extracted the data. Disagreements were resolved by discussion. Reporting quality of the studies was scored with the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) reporting guideline, and methodological conduct and risk of bias with PROBAST (Prediction model Risk Of Bias Assessment Tool). Random effects meta-analyses of the area under the receiver operating characteristic curve (AUC), sensitivity and specificity at the 10% risk of malignancy threshold, and net benefit and relative utility at the 10% risk of malignancy threshold were performed. Results: 47 studies (17 007 tumours) were included, with a median study sample size of 261 (range 24-4905). On average, 61% of TRIPOD items were reported. Handling of missing data, justification of sample size, and model calibration were rarely described. 91% of validations were at high risk of bias, mainly because of the unexplained exclusion of incomplete cases, small sample size, or no assessment of calibration. The summary AUC to distinguish benign from malignant tumours in patients who underwent surgery was 0.93 (95% confidence interval 0.92 to 0.94, 95% prediction interval 0.85 to 0.98) for ADNEX with the serum biomarker cancer antigen 125 (CA125) as a predictor (9202 tumours, 43 centres, 18 countries, and 21 studies) and 0.93 (95% confidence interval 0.91 to 0.94, 95% prediction interval 0.85 to 0.98) for ADNEX without CA125 (6309 tumours, 31 centres, 13 countries, and 12 studies). The estimated probability that the model is clinically useful in a new centre was 95% (with CA125) and 91% (without CA125). When the analysis was restricted to studies with a low risk of bias, summary AUC values were 0.93 (with CA125) and 0.91 (without CA125), and the estimated probabilities that the model is clinically useful were 89% (with CA125) and 87% (without CA125). Conclusions: The results of the meta-analysis indicated that ADNEX performed well in distinguishing between benign and malignant tumours in populations from different countries and settings, regardless of whether the serum biomarker CA125 was used as a predictor. A key limitation was that calibration was rarely assessed. Systematic review registration: PROSPERO CRD42022373182.
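The pooling of study-level AUCs with a confidence interval and a wider prediction interval, as reported above, can be sketched with a DerSimonian-Laird random-effects meta-analysis on the logit scale. This is a minimal illustration with made-up AUCs and standard errors, not the paper's actual analysis; the 1.96 multiplier for the prediction interval is a normal approximation (a t-distribution with k-2 degrees of freedom is often preferred).

```python
import math

def dl_random_effects(estimates, variances):
    """DerSimonian-Laird random-effects pooling of study estimates."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    k = len(estimates)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)          # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return mu, se, tau2

def logit(p): return math.log(p / (1 - p))
def expit(x): return 1 / (1 + math.exp(-x))

# Hypothetical study-level AUCs and standard errors of logit(AUC) (assumed)
aucs = [0.91, 0.94, 0.89, 0.95, 0.92]
ses = [0.15, 0.10, 0.20, 0.12, 0.18]
y = [logit(a) for a in aucs]
v = [s ** 2 for s in ses]

mu, se, tau2 = dl_random_effects(y, v)
ci = (expit(mu - 1.96 * se), expit(mu + 1.96 * se))
# prediction interval for a new centre adds the between-study variance
pi_se = math.sqrt(se ** 2 + tau2)
pi = (expit(mu - 1.96 * pi_se), expit(mu + 1.96 * pi_se))
print(f"summary AUC {expit(mu):.3f}, 95% CI {ci[0]:.3f}-{ci[1]:.3f}, "
      f"95% PI {pi[0]:.3f}-{pi[1]:.3f}")
```

The prediction interval is wider than the confidence interval whenever between-study heterogeneity (tau2) is positive, which is what makes it the right summary for performance in a new centre.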

5.
Stat Med ; 43(6): 1119-1134, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38189632

ABSTRACT

Tuning hyperparameters, such as the regularization parameter in Ridge or Lasso regression, is often aimed at improving the predictive performance of risk prediction models. In this study, various hyperparameter tuning procedures for clinical prediction models were systematically compared and evaluated in low-dimensional data. The focus was on out-of-sample predictive performance (discrimination, calibration, and overall prediction error) of risk prediction models developed using Ridge, Lasso, Elastic Net, or Random Forest. The influence of sample size, number of predictors and events fraction on performance of the hyperparameter tuning procedures was studied using extensive simulations. The results indicate important differences between tuning procedures in calibration performance, while generally showing similar discriminative performance. The one-standard-error rule for tuning applied to cross-validation (1SE CV) often resulted in severe miscalibration. Standard non-repeated and repeated cross-validation (both 5-fold and 10-fold) performed similarly well and outperformed the other tuning procedures. Bootstrap showed a slight tendency to more severe miscalibration than standard cross-validation-based tuning procedures. Differences between tuning procedures were larger for smaller sample sizes, lower events fractions and fewer predictors. These results imply that the choice of tuning procedure can have a profound influence on the predictive performance of prediction models. The results support the application of standard 5-fold or 10-fold cross-validation that minimizes out-of-sample prediction error. Despite an increased computational burden, we found no clear benefit of repeated over non-repeated cross-validation for hyperparameter tuning. We warn against the potentially detrimental effects on model calibration of the popular 1SE CV rule for tuning prediction models in low-dimensional settings.
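The contrast between standard cross-validation tuning and the 1SE rule can be illustrated with scikit-learn. The dataset, penalty grid, and fold-based standard error below are illustrative assumptions, not the study's simulation design; a ridge-penalised logistic regression stands in for the risk prediction model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Hypothetical low-dimensional setting: 500 patients, 10 predictors
X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                           random_state=0)

Cs = np.logspace(-3, 2, 20)   # smaller C = stronger ridge penalty
grid = GridSearchCV(LogisticRegression(penalty="l2", max_iter=5000),
                    {"C": Cs}, cv=10, scoring="neg_log_loss")
grid.fit(X, y)

mean = -grid.cv_results_["mean_test_score"]            # CV log-loss per C
se = grid.cv_results_["std_test_score"] / np.sqrt(10)  # crude SE over folds

best = int(np.argmin(mean))   # standard CV: minimise out-of-sample error
# 1SE rule: the most heavily penalised model within one SE of the minimum
one_se = int(np.min(np.where(mean <= mean[best] + se[best])[0]))

print("C (min CV):", Cs[best], " C (1SE rule):", Cs[one_se])
```

The 1SE rule always selects a penalty at least as strong as the CV minimum; the abstract's warning is that this extra shrinkage, while harmless for discrimination, can severely miscalibrate predicted risks in low-dimensional settings.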


Subjects
Research Design, Humans, Computer Simulation, Sample Size
6.
Br J Cancer ; 130(6): 934-940, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38243011

ABSTRACT

BACKGROUND: Several diagnostic prediction models to help clinicians discriminate between benign and malignant adnexal masses are available. This study is a head-to-head comparison of the performance of the Assessment of Different NEoplasias in the adneXa (ADNEX) model with that of the Risk of Ovarian Malignancy Algorithm (ROMA). METHODS: This is a retrospective study based on prospectively included consecutive women with an adnexal tumour scheduled for surgery at five oncology centres and one non-oncology centre in four countries between 2015 and 2019. The reference standard was histology. Model performance for ADNEX and ROMA was evaluated regarding discrimination, calibration, and clinical utility. RESULTS: The primary analysis included 894 patients, of whom 434 (49%) had a malignant tumour. The area under the receiver operating characteristic curve (AUC) was 0.92 (95% CI 0.88-0.95) for ADNEX with CA125, 0.90 (0.84-0.94) for ADNEX without CA125, and 0.85 (0.80-0.89) for ROMA. ROMA, and to a lesser extent ADNEX, underestimated the risk of malignancy. Clinical utility was highest for ADNEX. ROMA had no clinical utility at decision thresholds <27%. CONCLUSIONS: ADNEX had better ability to discriminate between benign and malignant adnexal tumours and higher clinical utility than ROMA. CLINICAL TRIAL REGISTRATION: clinicaltrials.gov NCT01698632 and NCT02847832.
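Clinical utility here refers to net benefit, the quantity behind decision curve analysis: at a risk threshold t, net benefit = TP/n - (FP/n) * t/(1-t). A minimal sketch with simulated predicted risks from a stronger and a weaker hypothetical model (not ADNEX or ROMA themselves):

```python
import numpy as np

def net_benefit(y_true, risk, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    y_true = np.asarray(y_true)
    risk = np.asarray(risk)
    n = len(y_true)
    treat = risk >= threshold
    tp = np.sum(treat & (y_true == 1))   # true positives among those treated
    fp = np.sum(treat & (y_true == 0))   # false positives among those treated
    return tp / n - (fp / n) * threshold / (1 - threshold)

# Simulated cohort with ~49% malignancy, as in the study; risks are made up
rng = np.random.default_rng(1)
y = rng.binomial(1, 0.49, 900)
strong = np.clip(0.49 + 0.35 * (2 * y - 1) + rng.normal(0, 0.15, 900), 0.01, 0.99)
weak = np.clip(0.49 + 0.15 * (2 * y - 1) + rng.normal(0, 0.20, 900), 0.01, 0.99)

for t in (0.10, 0.27):
    treat_all = np.mean(y) - (1 - np.mean(y)) * t / (1 - t)
    print(f"threshold {t:.2f}: NB(strong)={net_benefit(y, strong, t):.3f} "
          f"NB(weak)={net_benefit(y, weak, t):.3f} treat-all={treat_all:.3f}")
```

A model "has no clinical utility" at a threshold when its net benefit does not exceed the better of treat-all and treat-none, which is the sense in which ROMA failed below the 27% threshold.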


Subjects
Adnexal Diseases, Ovarian Neoplasms, Humans, Female, Retrospective Studies, Ultrasonography, Ovarian Neoplasms/diagnosis, Ovarian Neoplasms/surgery, Ovarian Neoplasms/pathology, Adnexal Diseases/diagnosis, Adnexal Diseases/surgery, Adnexal Diseases/pathology, Algorithms, Sensitivity and Specificity, CA-125 Antigen
9.
BMC Med Res Methodol ; 23(1): 276, 2023 11 24.
Article in English | MEDLINE | ID: mdl-38001421

ABSTRACT

BACKGROUND: Assessing malignancy risk is important for choosing appropriate management of ovarian tumors. We compared six algorithms that estimate the probabilities that an ovarian tumor is benign, borderline malignant, stage I primary invasive, stage II-IV primary invasive, or secondary metastatic. METHODS: This retrospective cohort study used 5909 patients recruited from 1999 to 2012 for model development and 3199 patients recruited from 2012 to 2015 for model validation. Patients were recruited at oncology referral or general centers and underwent an ultrasound examination and surgery ≤ 120 days later. We developed models using standard multinomial logistic regression (MLR), Ridge MLR, random forest (RF), XGBoost, neural networks (NN), and support vector machines (SVM). We used nine clinical and ultrasound predictors and developed models with and without CA125. RESULTS: Most tumors were benign (3980 in the development and 1688 in the validation data); secondary metastatic tumors were least common (246 and 172, respectively). The c-statistic (AUROC) to discriminate benign from any type of malignant tumor ranged from 0.89 to 0.92 for models with CA125 and from 0.89 to 0.91 for models without. The multiclass c-statistic ranged from 0.41 (SVM) to 0.55 (XGBoost) for models with CA125, and from 0.42 (SVM) to 0.51 (standard MLR) for models without. Multiclass calibration was best for RF and XGBoost. Estimated probabilities of a benign tumor in the same patient often differed by more than 0.2 (20 percentage points) depending on the model. Net benefit for diagnosing malignancy was similar across algorithms at the commonly used 10% risk threshold but was slightly higher for RF at higher thresholds. Comparing models, between 3% (XGBoost vs. NN, with CA125) and 30% (NN vs. SVM, without CA125) of patients fell on opposite sides of the 10% threshold. CONCLUSION: Although several models had similarly good performance, individual probability estimates varied substantially.
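The finding that models with similar aggregate performance can disagree on individual probabilities is easy to reproduce. This sketch uses simulated 5-class data and scikit-learn stand-ins for two of the study's algorithms (standard MLR and RF); the data and any resulting numbers are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Simulated 5-category outcome with 9 predictors, mimicking the study setup
X, y = make_classification(n_samples=3000, n_features=9, n_informative=6,
                           n_classes=5, n_clusters_per_class=1, random_state=0)
X_dev, X_val, y_dev, y_val = train_test_split(X, y, random_state=0)

mlr = LogisticRegression(max_iter=5000).fit(X_dev, y_dev)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_dev, y_dev)

# Per-patient disagreement in the predicted probability of class 0 ("benign")
p_mlr = mlr.predict_proba(X_val)[:, 0]
p_rf = rf.predict_proba(X_val)[:, 0]
diff = np.abs(p_mlr - p_rf)
print(f"patients where the two models differ by >0.2: {np.mean(diff > 0.2):.2f}")
```

Both models can rank patients similarly (comparable c-statistics) while still placing many individuals on opposite sides of a clinical decision threshold.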


Subjects
Ovarian Neoplasms, Female, Humans, Retrospective Studies, Uncertainty, Ovarian Neoplasms/diagnostic imaging, Ovarian Neoplasms/pathology, Logistic Models, Algorithms, CA-125 Antigen
10.
ERJ Open Res ; 9(6)2023 Nov.
Article in English | MEDLINE | ID: mdl-37965232

ABSTRACT

Background: While patients with COPD often cite weather conditions as a reason for inactivity, little is known about the relationship between physical activity (PA) and weather conditions. The present study investigated the association of day-to-day weather changes with PA in patients with COPD and explored which patient characteristics were related to being more or less influenced by weather conditions. Methods: In this longitudinal analysis, device-based day-by-day step counts were objectively measured in COPD patients for up to 12 months. Daily meteorological data (temperature, precipitation, wind speed, hours of sunlight and daylight) were linked to the daily step count, and individual and multivariable relationships were investigated using mixed-effects models. An individual R2 was calculated for every subject to estimate the influence of weather conditions at the patient level and its relationship with patient characteristics. Results: We included 50 patients with a mean±sd follow-up time of 282±93 days, totalling 14 117 patient-days. Daily temperature showed a positive linear pattern up to an inflexion point, after which a negative association with increasing temperature was observed (p<0.0001). Sunshine and daylight time had a positive association with PA (p<0.0001). Precipitation and wind speed were negatively associated with PA (p<0.0001). The median per-patient R2 for overall weather conditions was 0.08, ranging from 0.00 to 0.42. No strong associations between patient characteristics and per-patient R2 were observed. Conclusion: Weather conditions are partly associated with PA in patients with COPD, yet the overall variance in PA explained by weather conditions is rather low and varies strongly between individuals.
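The per-patient R2 above quantifies, for one individual, how much of the day-to-day variation in step count a fit on the weather variables explains. A minimal sketch using ordinary least squares on simulated data for a single hypothetical patient (the study's mixed-effects modelling is not reproduced here):

```python
import numpy as np

def per_patient_r2(weather, steps):
    """R^2 of an OLS fit of one patient's daily steps on weather variables
    (columns of `weather`, e.g. temperature and precipitation)."""
    X = np.column_stack([np.ones(len(steps)), weather])  # add intercept
    beta, *_ = np.linalg.lstsq(X, steps, rcond=None)
    resid = steps - X @ beta
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((steps - steps.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Hypothetical patient: 280 days, steps loosely tied to temperature and rain
rng = np.random.default_rng(0)
temp = rng.normal(12, 8, 280)
rain = rng.exponential(2, 280)
steps = 5000 + 80 * temp - 150 * rain + rng.normal(0, 1500, 280)

r2 = per_patient_r2(np.column_stack([temp, rain]), steps)
print(f"per-patient R^2: {r2:.2f}")
```

Computing this quantity per subject, as the study did, yields a distribution of R2 values whose spread shows how unevenly weather sensitivity is distributed across patients.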

11.
J Clin Epidemiol ; 161: 127-139, 2023 09.
Article in English | MEDLINE | ID: mdl-37536503

ABSTRACT

OBJECTIVES: To systematically review the risk of bias and applicability of published prediction models for the risk of central line-associated bloodstream infection (CLA-BSI) in hospitalized patients. STUDY DESIGN AND SETTING: Systematic review of the literature in PubMed, Embase, Web of Science Core Collection, and Scopus up to July 10, 2023. Two authors independently appraised risk models using the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) and assessed their risk of bias and applicability using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). RESULTS: Sixteen studies were included, describing 37 models. When studies presented multiple algorithms, we focused on the model that the study authors selected as the best. We eventually appraised 19 models, of which 15 were regression models and four were machine learning models. All models were at high risk of bias, primarily due to inappropriate proxy outcomes, predictors that are unavailable at prediction time in clinical practice, inadequate sample size, neglect of missing data, lack of model validation, and absence of calibration assessment. Eighteen of the 19 models raised high concern for applicability; one model raised unclear concern for applicability due to incomplete reporting. CONCLUSION: We did not identify a prediction model of potential clinical use. There is a pressing need to develop an applicable model for CLA-BSI.


Subjects
Sepsis, Humans, Bias, Prognosis, Sepsis/epidemiology
13.
J Clin Epidemiol ; 157: 120-133, 2023 05.
Article in English | MEDLINE | ID: mdl-36935090

ABSTRACT

OBJECTIVES: In biomedical research, spin is the overinterpretation of findings, and it is a growing concern. To date, the presence of spin has not been evaluated in prognostic model research in oncology, including studies developing and validating models for individualized risk prediction. STUDY DESIGN AND SETTING: We conducted a systematic review, searching MEDLINE and EMBASE for oncology-related studies that developed and validated a prognostic model using machine learning published between 1st January, 2019, and 5th September, 2019. We used existing spin frameworks and described areas of highly suggestive spin practices. RESULTS: We included 62 publications (including 152 developed models; 37 validated models). Reporting was inconsistent between methods and the results in 27% of studies due to additional analysis and selective reporting. Thirty-two studies (out of 36 applicable studies) reported comparisons between developed models in their discussion and predominantly used discrimination measures to support their claims (78%). Thirty-five studies (56%) used an overly strong or leading word in their title, abstract, results, discussion, or conclusion. CONCLUSION: The potential for spin needs to be considered when reading, interpreting, and using studies that developed and validated prognostic models in oncology. Researchers should carefully report their prognostic model research using words that reflect their actual results and strength of evidence.


Subjects
Medical Oncology, Research, Humans, Prognosis, Machine Learning
17.
BMC Med ; 21(1): 70, 2023 02 24.
Article in English | MEDLINE | ID: mdl-36829188

ABSTRACT

BACKGROUND: Clinical prediction models should be validated before implementation in clinical practice. But is favorable performance at internal validation or one external validation sufficient to claim that a prediction model works well in the intended clinical context? MAIN BODY: We argue to the contrary because (1) patient populations vary, (2) measurement procedures vary, and (3) populations and measurements change over time. Hence, we have to expect heterogeneity in model performance between locations and settings, and across time. It follows that prediction models are never truly validated. This does not imply that validation is not important. Rather, the current focus on developing new models should shift to a focus on more extensive, well-conducted, and well-reported validation studies of promising models. CONCLUSION: Principled validation strategies are needed to understand and quantify heterogeneity, monitor performance over time, and update prediction models when appropriate. Such strategies will help to ensure that prediction models stay up-to-date and safe to support clinical decision-making.

18.
Stat Methods Med Res ; 32(3): 555-571, 2023 03.
Article in English | MEDLINE | ID: mdl-36660777

ABSTRACT

AIMS: Multinomial logistic regression models allow one to predict the risk of a categorical outcome with > 2 categories. When developing such a model, researchers should ensure the number of participants (n) is appropriate relative to the number of events (Ek) and the number of predictor parameters (pk) for each category k. We propose three criteria to determine the minimum n required, in light of existing criteria developed for binary outcomes. PROPOSED CRITERIA: The first criterion aims to minimise model overfitting. The second aims to minimise the difference between the observed and adjusted Nagelkerke R2. The third criterion aims to ensure the overall risk is estimated precisely. For criterion (i), we show the sample size must be based on the anticipated Cox-Snell R2 of the distinct 'one-to-one' logistic regression models corresponding to the sub-models of the multinomial logistic regression, rather than on the overall Cox-Snell R2 of the multinomial logistic regression. EVALUATION OF CRITERIA: We tested the performance of proposed criterion (i) through a simulation study and found that it resulted in the desired level of overfitting. Criteria (ii) and (iii) were natural extensions of previously proposed criteria for binary outcomes and did not require evaluation through simulation. SUMMARY: We illustrated how to implement the sample size criteria through a worked example considering the development of a multinomial risk prediction model for tumour type when presented with an ovarian mass. Code is provided for the simulation and the worked example. We will embed our proposed criteria within the pmsampsize R library and Stata modules.
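For intuition, criterion (i) parallels the shrinkage-based minimum sample size criterion for binary outcomes, n >= p / ((S - 1) ln(1 - R2_CS / S)) for a target shrinkage factor S (typically 0.9). The sketch below applies that binary-outcome formula to each hypothetical sub-model and takes the most demanding result; the parameter counts and R2 values are assumed, and this is not the pmsampsize implementation.

```python
import math

def min_n_shrinkage(p, r2_cs, shrinkage=0.9):
    """Minimum sample size so the expected uniform shrinkage factor is at
    least `shrinkage`, given p predictor parameters and an anticipated
    Cox-Snell R^2 (binary-outcome criterion aimed at limiting overfitting)."""
    return math.ceil(p / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage)))

# Hypothetical multinomial model: apply the criterion to each constituent
# logistic sub-model and take the largest n (all (p_k, R2_CS,k) values assumed)
submodels = [(9, 0.15), (9, 0.10), (9, 0.20), (9, 0.12)]
n_required = max(min_n_shrinkage(p, r2) for p, r2 in submodels)
print("minimum n across sub-models:", n_required)
```

Note that a *smaller* anticipated R2 demands a *larger* sample for the same shrinkage target, so the weakest sub-model typically drives the overall requirement.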


Subjects
Logistic Models, Humans, Sample Size, Computer Simulation
19.
J Clin Epidemiol ; 154: 75-84, 2023 02.
Article in English | MEDLINE | ID: mdl-36528232

ABSTRACT

OBJECTIVES: To assess improvement in the completeness of reporting of coronavirus disease 2019 (COVID-19) prediction models after the peer review process. STUDY DESIGN AND SETTING: Studies included in a living systematic review of COVID-19 prediction models, with both preprint and peer-reviewed published versions available, were assessed. The primary outcome was the change in percentage adherence to the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) reporting guidelines between preprint and published manuscripts. RESULTS: Nineteen studies were identified, including seven (37%) model development studies, two (11%) external validations of existing models, and 10 (53%) papers reporting on both development and external validation of the same model. Median percentage adherence among preprint versions was 33% (min-max: 10 to 68%). The percentage adherence to TRIPOD components increased from preprint to publication in 11/19 studies (58%), with adherence unchanged in the remaining eight studies. The median change in adherence was just 3 percentage points (pp; min-max: 0-14 pp) across all studies. No association was observed between the change in percentage adherence and preprint score, journal impact factor, or time between journal submission and acceptance. CONCLUSIONS: The preprint reporting quality of COVID-19 prediction modeling studies was poor and did not improve much after peer review, suggesting peer review had a trivial effect on the completeness of reporting during the pandemic.


Subjects
COVID-19, Humans, COVID-19/epidemiology, Prognosis, Pandemics
20.
Ann Intern Med ; 176(1): 105-114, 2023 01.
Article in English | MEDLINE | ID: mdl-36571841

ABSTRACT

Risk prediction models need thorough validation to assess their performance. Validation of models for survival outcomes poses challenges due to the censoring of observations and the varying time horizon at which predictions can be made. This article describes measures to evaluate predictions and the potential improvement in decision making from survival models based on Cox proportional hazards regression. As a motivating case study, the authors consider the prediction of the composite outcome of recurrence or death (the "event") in patients with breast cancer after surgery. They developed a simple Cox regression model with 3 predictors, as in the Nottingham Prognostic Index, in 2982 women (1275 events over 5 years of follow-up) and externally validated this model in 686 women (285 events over 5 years). Improvement in performance was assessed after the addition of progesterone receptor as a prognostic biomarker. The model predictions can be evaluated across the full range of observed follow-up times or for the event occurring by the end of a fixed time horizon of interest. The authors first discuss recommended statistical measures that evaluate model performance in terms of discrimination, calibration, or overall performance. Further, they evaluate the potential clinical utility of the model to support clinical decision making according to a net benefit measure. They provide SAS and R code to illustrate internal and external validation. The authors recommend the proposed set of performance measures for transparent reporting of the validity of predictions from survival models.
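One standard discrimination measure for survival models is Harrell's concordance index, which handles censoring by restricting attention to "usable" pairs, where one patient is known to fail before the other is last seen. A minimal pure-Python sketch with hypothetical validation data (the article's own code examples are in SAS and R):

```python
def harrell_c(time, event, risk):
    """Harrell's concordance index: among usable pairs, the fraction where
    the patient with the higher predicted risk fails earlier.
    `event` is 1 for an observed event, 0 for a censored observation."""
    concordant = tied = usable = 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # a pair is usable only if i has an observed event before j's time
            if event[i] == 1 and time[i] < time[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    tied += 1
    return (concordant + 0.5 * tied) / usable

# Hypothetical external validation data: follow-up time, event indicator,
# and the model's predicted risk score (e.g. the Cox linear predictor)
time = [2.0, 5.0, 1.0, 8.0, 3.0, 6.0]
event = [1, 0, 1, 0, 1, 1]
risk = [1.8, 0.4, 2.5, 0.1, 0.3, 0.9]
print(f"Harrell's C: {harrell_c(time, event, risk):.2f}")
```

A value of 0.5 indicates no discrimination and 1.0 perfect ranking; for a fixed time horizon, time-dependent AUC variants are used instead, which is one reason the article distinguishes the two evaluation settings.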


Subjects
Breast Neoplasms, Humans, Female, Proportional Hazards Models, Prognosis