Results 1 - 20 of 413
1.
Nat Methods ; 21(2): 182-194, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38347140

ABSTRACT

Validation metrics are key for tracking scientific progress and bridging the current chasm between artificial intelligence research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately. Although taking into account the individual strengths, weaknesses and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multistage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides a reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Although focused on biomedical image analysis, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. The work serves to enhance global comprehension of a key topic in image analysis validation.


Subject(s)
Artificial Intelligence
2.
Eur Heart J ; 44(32): 3073-3081, 2023 08 22.
Article in English | MEDLINE | ID: mdl-37452732

ABSTRACT

AIMS: Risk stratification is used for decisions regarding need for imaging in patients with clinically suspected acute pulmonary embolism (PE). The aim was to develop a clinical prediction model that provides an individualized, accurate probability estimate for the presence of acute PE in patients with suspected disease based on readily available clinical items and D-dimer concentrations. METHODS AND RESULTS: An individual patient data meta-analysis was performed based on sixteen cross-sectional or prospective studies with data from 28 305 adult patients with clinically suspected PE from various clinical settings, including primary care, emergency care, hospitalized and nursing home patients. A multilevel logistic regression model was built and validated including ten a priori defined objective candidate predictors to predict objectively confirmed PE at baseline or venous thromboembolism (VTE) during follow-up of 30 to 90 days. Multiple imputation was used for missing data. Backward elimination was performed with a P-value <0.10. Discrimination (c-statistic with 95% confidence intervals [CI] and prediction intervals [PI]) and calibration (outcome:expected [O:E] ratio and calibration plot) were evaluated based on internal-external cross-validation. The accuracy of the model was subsequently compared with algorithms based on the Wells score and D-dimer testing. The final model included age (in years), sex, previous VTE, recent surgery or immobilization, haemoptysis, cancer, clinical signs of deep vein thrombosis, inpatient status, D-dimer (in µg/L), and an interaction term between age and D-dimer. The pooled c-statistic was 0.87 (95% CI, 0.85-0.89; 95% PI, 0.77-0.93) and overall calibration was very good (pooled O:E ratio, 0.99; 95% CI, 0.87-1.14; 95% PI, 0.55-1.79). The model slightly overestimated VTE probability in the lower range of estimated probabilities. 
Discrimination of the current model in the validation data sets was better than that of the Wells score combined with a D-dimer threshold based on age (c-statistic 0.73; 95% CI, 0.70-0.75) or structured clinical pretest probability (c-statistic 0.79; 95% CI, 0.76-0.81). CONCLUSION: The present model provides an absolute, individualized probability of PE presence in a broad population of patients with suspected PE, with very good discrimination and calibration. Its clinical utility needs to be evaluated in a prospective management or impact study. REGISTRATION: PROSPERO ID 89366.
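The abstract above reports discrimination as a c-statistic and calibration as an observed:expected (O:E) ratio. As a rough illustration only, the sketch below computes both for a single validation set of predicted probabilities and observed binary outcomes. The study itself pooled these measures across studies using internal-external cross-validation and meta-analysis, which this sketch does not attempt; the toy data are hypothetical.

```python
from itertools import product


def c_statistic(probs, outcomes):
    """Concordance (c) statistic for a binary outcome.

    Proportion of (event, non-event) pairs in which the event has the higher
    predicted probability; tied predictions count as half-concordant.
    """
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    non_events = [p for p, y in zip(probs, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for p_event, p_non_event in product(events, non_events):
        if p_event > p_non_event:
            score += 1.0
        elif p_event == p_non_event:
            score += 0.5
    return score / (len(events) * len(non_events))


def oe_ratio(probs, outcomes):
    """Observed:expected ratio (calibration-in-the-large).

    Observed number of events divided by the sum of predicted probabilities:
    values above 1 mean risk is underestimated, values below 1 overestimated.
    """
    return sum(outcomes) / sum(probs)


# Hypothetical toy data, not taken from the study.
probs = [0.10, 0.40, 0.35, 0.80]
outcomes = [0, 0, 1, 1]
print(c_statistic(probs, outcomes))  # 0.75
print(oe_ratio(probs, outcomes))
```

A pooled O:E ratio of 0.99, as reported above, corresponds to almost no systematic over- or underestimation on average, even though local miscalibration (for example in the lower probability range) can still be present.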


Subject(s)
Pulmonary Embolism , Venous Thromboembolism , Adult , Humans , Venous Thromboembolism/diagnosis , Venous Thromboembolism/epidemiology , Prospective Studies , Cross-Sectional Studies , Models, Statistical , Prognosis , Pulmonary Embolism/diagnosis , Pulmonary Embolism/epidemiology , Fibrin Fibrinogen Degradation Products/analysis
3.
Stat Med ; 42(19): 3508-3528, 2023 08 30.
Article in English | MEDLINE | ID: mdl-37311563

ABSTRACT

External validation of the discriminative ability of prediction models is of key importance. However, the interpretation of such evaluations is challenging, as the ability to discriminate depends on both the sample characteristics (ie, case-mix) and the generalizability of predictor coefficients, but most discrimination indices do not provide any insight into their respective contributions. To disentangle differences in discriminative ability across external validation samples due to a lack of model generalizability from differences in sample characteristics, we propose propensity-weighted measures of discrimination. These weighted metrics, which are derived from propensity scores for sample membership, are standardized for case-mix differences between the model development and validation samples, allowing for a fair comparison of discriminative ability in terms of model characteristics in a target population of interest. We illustrate our methods with the validation of eight prediction models for deep vein thrombosis in 12 external validation data sets and assess our methods in a simulation study. In the illustrative example, propensity score standardization reduced between-study heterogeneity of discrimination, indicating that between-study variability was partially attributable to case-mix. The simulation study showed that only flexible propensity-score methods (allowing for non-linear effects) produced unbiased estimates of model discrimination in the target population, and only when the positivity assumption was met. Propensity score-based standardization may facilitate the interpretation of (heterogeneity in) discriminative ability of a prediction model as observed across multiple studies, and may guide model updating strategies for a particular target population. Careful propensity score modeling with attention to non-linear relations is recommended.
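To make the idea of a propensity-weighted discrimination measure more concrete, the sketch below computes a weighted c-statistic in which each (event, non-event) pair counts with the product of the two individuals' weights. The weighting scheme is an assumption of this sketch, not necessarily the authors' exact estimator: each validation individual gets odds weight p / (1 - p), where p is a hypothetical, separately modelled probability of belonging to the target population rather than the validation sample. Fitting that membership model (flexibly, as the abstract recommends) is not shown.

```python
def odds_weights(membership_probs):
    """Convert membership propensities into odds weights p / (1 - p).

    membership_probs[i] is assumed to be the modelled probability that a
    validation individual with these characteristics belongs to the target
    population rather than the validation sample (assumption of this sketch).
    """
    return [p / (1.0 - p) for p in membership_probs]


def weighted_c_statistic(probs, outcomes, weights):
    """c-statistic where each (event, non-event) pair has weight w_i * w_j."""
    numerator = 0.0
    denominator = 0.0
    for p_i, y_i, w_i in zip(probs, outcomes, weights):
        if y_i != 1:
            continue  # i must be an event
        for p_j, y_j, w_j in zip(probs, outcomes, weights):
            if y_j != 0:
                continue  # j must be a non-event
            pair_weight = w_i * w_j
            denominator += pair_weight
            if p_i > p_j:
                numerator += pair_weight
            elif p_i == p_j:
                numerator += 0.5 * pair_weight
    return numerator / denominator
```

With all weights equal to 1 this reduces to the ordinary c-statistic; upweighting individuals who resemble the target population shifts the estimate towards the discrimination expected there, which is the intuition behind standardizing for case-mix.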


Subject(s)
Benchmarking , Diagnosis-Related Groups , Humans , Computer Simulation
4.
Br J Clin Pharmacol ; 89(2): 751-761, 2023 02.
Article in English | MEDLINE | ID: mdl-36102068

ABSTRACT

AIM: To investigate the effects of off-label non-vitamin K oral anticoagulant (NOAC) dose reduction compared with on-label standard dosing in atrial fibrillation (AF) patients in routine care. METHODS: Population-based cohort study using data from the United Kingdom Clinical Practice Research Datalink, comparing adults with non-valvular AF receiving an off-label reduced NOAC dose to patients receiving an on-label standard dose. Outcomes were ischaemic stroke, major/non-major bleeding and mortality. Inverse probability of treatment weighting and inverse probability of censoring weighting on the propensity score were applied to adjust for confounding and informative censoring. RESULTS: Off-label dose reduction occurred in 2466 patients (8.0%), compared with 18 108 (58.5%) on-label standard-dose users. Median age was 80 years (interquartile range [IQR] 73.0-86.0) versus 72 years (IQR 66-78), respectively. Incidence rates were higher in the off-label dose reduction group compared to the on-label standard dose group, for ischaemic stroke (0.94 vs 0.70 per 100 person years), major bleeding (1.48 vs 0.83), non-major bleeding (6.78 vs 6.16) and mortality (10.12 vs 3.72). Adjusted analyses resulted in a hazard ratio of 0.95 (95% confidence interval [CI] 0.57-1.60) for ischaemic stroke, 0.88 (95% CI 0.57-1.35) for major bleeding, 0.81 (95% CI 0.67-0.98) for non-major bleeding and 1.34 (95% CI 1.12-1.61) for mortality. CONCLUSION: In this large population-based study, the hazards for ischaemic stroke and major bleeding were low, and similar in AF patients receiving an off-label reduced NOAC dose compared with on-label standard dose users, while non-major bleeding risk appeared to be lower and mortality risk higher. Caution towards prescribing an off-label reduced NOAC dose is therefore required.
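The crude incidence rates above are expressed per 100 person-years, and the adjusted hazard ratios rely on inverse probability of treatment weighting (IPTW) on the propensity score. The sketch below shows both calculations under stated assumptions: the propensity scores are taken as already estimated, stabilized weights targeting the whole population are assumed (the abstract does not specify the estimand or weight form), and the inverse probability of censoring weighting also used in the study is not shown.

```python
def iptw_weights(treated, propensity, stabilized=True):
    """Inverse probability of treatment weights from propensity scores.

    treated[i] is 1 for the exposure of interest (e.g. an off-label reduced
    dose) and 0 for the comparator; propensity[i] is the estimated probability
    of exposure given covariates. Stabilized weights put the marginal exposure
    probability in the numerator instead of 1.
    """
    if stabilized:
        p_treated = sum(treated) / len(treated)
        num_treated, num_control = p_treated, 1.0 - p_treated
    else:
        num_treated = num_control = 1.0
    return [
        num_treated / e if t == 1 else num_control / (1.0 - e)
        for t, e in zip(treated, propensity)
    ]


def incidence_rate_per_100_py(events, person_years):
    """Crude incidence rate per 100 person-years."""
    return 100.0 * events / person_years
```

The weights would then be passed to a weighted survival model (for example a weighted Cox regression) to obtain adjusted hazard ratios; that step depends on the modelling library and is left out here.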


Subject(s)
Atrial Fibrillation , Brain Ischemia , Ischemic Stroke , Stroke , Humans , Aged , Aged, 80 and over , Anticoagulants , Atrial Fibrillation/complications , Atrial Fibrillation/drug therapy , Atrial Fibrillation/chemically induced , Stroke/epidemiology , Stroke/etiology , Stroke/prevention & control , Cohort Studies , Brain Ischemia/epidemiology , Brain Ischemia/prevention & control , Brain Ischemia/chemically induced , Off-Label Use , Drug Tapering , Hemorrhage/chemically induced , Hemorrhage/epidemiology , Hemorrhage/drug therapy , Ischemic Stroke/chemically induced , Ischemic Stroke/complications , Ischemic Stroke/drug therapy , Administration, Oral
5.
Ann Intern Med ; 175(2): 244-255, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34904857

ABSTRACT

BACKGROUND: How diagnostic strategies for suspected pulmonary embolism (PE) perform in relevant patient subgroups defined by sex, age, cancer, and previous venous thromboembolism (VTE) is unknown. PURPOSE: To evaluate the safety and efficiency of the Wells and revised Geneva scores combined with fixed and adapted D-dimer thresholds, as well as the YEARS algorithm, for ruling out acute PE in these subgroups. DATA SOURCES: MEDLINE from 1 January 1995 until 1 January 2021. STUDY SELECTION: 16 studies assessing at least 1 diagnostic strategy. DATA EXTRACTION: Individual-patient data from 20 553 patients. DATA SYNTHESIS: Safety was defined as the diagnostic failure rate (the predicted 3-month VTE incidence after exclusion of PE without imaging at baseline). Efficiency was defined as the proportion of individuals classified by the strategy as "PE considered excluded" without imaging tests. Across all strategies, efficiency was highest in patients younger than 40 years (47% to 68%) and lowest in patients aged 80 years or older (6.0% to 23%) or patients with cancer (9.6% to 26%). However, efficiency improved considerably in these subgroups when pretest probability-dependent D-dimer thresholds were applied. Predicted failure rates were highest for strategies with adapted D-dimer thresholds, with failure rates varying between 2% and 4% in the predefined patient subgroups. LIMITATIONS: Between-study differences in scoring predictor items and D-dimer assays, as well as the presence of differential verification bias, in particular for classifying fatal events and subsegmental PE cases, all of which may have led to an overestimation of the predicted failure rates of adapted D-dimer thresholds. CONCLUSION: Overall, all strategies showed acceptable safety, with pretest probability-dependent D-dimer thresholds having not only the highest efficiency but also the highest predicted failure rate. 
From an efficiency perspective, this individual-patient data meta-analysis supports application of adapted D-dimer thresholds. PRIMARY FUNDING SOURCE: Dutch Research Council. (PROSPERO: CRD42018089366).
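The abstract compares fixed with adapted D-dimer thresholds. One widely used adapted threshold, not spelled out in the abstract and included here only as an assumption for illustration, is the age-adjusted cut-off: age × 10 µg/L (FEU) for patients older than 50 years, and the conventional 500 µg/L otherwise, applied to patients whose clinical pretest probability is not high. Pretest probability-dependent thresholds such as those in the YEARS algorithm follow a different rule and are not shown.

```python
CONVENTIONAL_CUTOFF_UG_L = 500.0  # conventional D-dimer cut-off, µg/L FEU


def age_adjusted_cutoff(age_years):
    """Age-adjusted D-dimer cut-off (µg/L FEU): age x 10 above 50 years, else 500."""
    if age_years > 50:
        return 10.0 * age_years
    return CONVENTIONAL_CUTOFF_UG_L


def pe_excluded_without_imaging(non_high_pretest_probability, ddimer_ug_l, age_years):
    """PE is considered excluded when pretest probability is not high and the
    D-dimer concentration is below the age-adjusted cut-off (sketch only)."""
    return non_high_pretest_probability and ddimer_ug_l < age_adjusted_cutoff(age_years)
```

For an 80-year-old with a non-high pretest probability, a D-dimer of 650 µg/L falls below the adjusted cut-off of 800 µg/L, so imaging would be avoided; under the fixed 500 µg/L threshold it would not. This is the mechanism by which adapted thresholds raise efficiency in older patients, at the cost of a possibly higher failure rate.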


Subject(s)
Neoplasms , Pulmonary Embolism , Venous Thromboembolism , Fibrin Fibrinogen Degradation Products , Humans , Neoplasms/complications , Neoplasms/diagnosis , Probability , Pulmonary Embolism/diagnosis , Pulmonary Embolism/epidemiology , Venous Thromboembolism/diagnosis , Venous Thromboembolism/epidemiology
6.
Eur Heart J ; 43(31): 2921-2930, 2022 08 14.
Article in English | MEDLINE | ID: mdl-35639667

ABSTRACT

The medical field has seen a rapid increase in the development of artificial intelligence (AI)-based prediction models. With the introduction of such AI-based prediction model tools and software in cardiovascular patient care, the cardiovascular researcher and healthcare professional are challenged to understand the opportunities as well as the limitations of the AI-based predictions. In this article, we present 12 critical questions for cardiovascular health professionals to ask when confronted with an AI-based prediction model. We aim to support medical professionals to distinguish the AI-based prediction models that can add value to patient care from the AI that does not.


Subject(s)
Artificial Intelligence , Cardiovascular Diseases , Health Personnel , Humans , Software
7.
PLoS Med ; 19(1): e1003905, 2022 01.
Article in English | MEDLINE | ID: mdl-35077453

ABSTRACT

BACKGROUND: The challenging clinical dilemma of detecting pulmonary embolism (PE) in suspected patients is encountered in a variety of healthcare settings. We hypothesized that the optimal diagnostic approach to detect these patients in terms of safety and efficiency depends on underlying PE prevalence, case mix, and physician experience, overall reflected by the type of setting where patients are initially assessed. The objective of this study was to assess the capability of ruling out PE by available diagnostic strategies across all possible settings. METHODS AND FINDINGS: We performed a literature search (MEDLINE) followed by an individual patient data (IPD) meta-analysis (MA; 23 studies), including patients from self-referral emergency care (n = 12,612), primary healthcare clinics (n = 3,174), referred secondary care (n = 17,052), and hospitalized or nursing home patients (n = 2,410). Multilevel logistic regression was performed to evaluate diagnostic performance of the Wells and revised Geneva rules, both using fixed and adapted D-dimer thresholds to age or pretest probability (PTP), and of the YEARS algorithm and the Pulmonary Embolism Rule-out Criteria (PERC). All strategies were tested separately in each healthcare setting. Following studies done in this field, the primary diagnostic metrics estimated from the models were the "failure rate" of each strategy, i.e., the proportion of missed PE among patients categorized as "PE excluded", and the "efficiency", defined as the proportion of patients categorized as "PE excluded" among all patients. In self-referral emergency care, the PERC algorithm excludes PE in 21% of suspected patients at a failure rate of 1.12% (95% confidence interval [CI] 0.74 to 1.70), whereas this increases to 6.01% (4.09 to 8.75) in patients referred to secondary care at an efficiency of 10%.
In patients from primary healthcare and those referred to secondary care, strategies adjusting D-dimer to PTP are the most efficient (range: 43% to 62%) at a failure rate ranging between 0.25% and 3.06%, with higher failure rates observed in patients referred to secondary care. For this latter setting, strategies adjusting D-dimer to age are associated with a lower failure rate ranging between 0.65% and 0.81%, yet are also less efficient (range: 33% to 35%). For all strategies, failure rates are highest in hospitalized or nursing home patients, ranging between 1.68% and 5.13%, at an efficiency ranging between 15% and 30%. The main limitation of the primary analyses was that the diagnostic performance of each strategy was compared in different sets of studies since the availability of items used in each diagnostic strategy differed across included studies; however, sensitivity analyses suggested that the findings were robust. CONCLUSIONS: The capability of safely and efficiently ruling out PE of available diagnostic strategies differs for different healthcare settings. The findings of this IPD MA help in determining the optimum diagnostic strategies for ruling out PE per healthcare setting, balancing the trade-off between failure rate and efficiency of each strategy.
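The two metrics defined in this abstract (failure rate among patients classified as "PE excluded", and efficiency as the share of all patients classified that way) can be written directly as crude proportions. The sketch below does that for one hypothetical set of classifications; note that the study estimated these quantities from multilevel logistic regression models rather than as raw proportions, so this only illustrates the definitions.

```python
def failure_rate_and_efficiency(pe_excluded, has_pe):
    """Crude failure rate and efficiency of a rule-out strategy.

    pe_excluded[i]: the strategy classified patient i as "PE excluded".
    has_pe[i]: patient i actually had PE (or VTE during follow-up).
    failure rate = missed PE / patients classified "PE excluded"
    efficiency   = patients classified "PE excluded" / all patients
    """
    n_patients = len(pe_excluded)
    n_excluded = sum(1 for e in pe_excluded if e)
    n_missed = sum(1 for e, d in zip(pe_excluded, has_pe) if e and d)
    efficiency = n_excluded / n_patients
    failure_rate = n_missed / n_excluded if n_excluded else float("nan")
    return failure_rate, efficiency


# Hypothetical example: 10 patients, 4 ruled out, 1 of whom had PE.
excluded = [True] * 4 + [False] * 6
disease = [False, False, False, True, True, True, False, False, False, False]
print(failure_rate_and_efficiency(excluded, disease))  # (0.25, 0.4)
```

The trade-off discussed in the abstract follows from these definitions: loosening a rule-out criterion moves more patients into the "PE excluded" group, which raises efficiency but also tends to add missed cases to the failure-rate numerator.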


Subject(s)
Data Interpretation, Statistical , Delivery of Health Care/methods , Pulmonary Embolism/diagnosis , Pulmonary Embolism/epidemiology , Delivery of Health Care/statistics & numerical data , Humans , Pulmonary Embolism/therapy
8.
Eur Respir J ; 59(2), 2022 02.
Article in English | MEDLINE | ID: mdl-34172467

ABSTRACT

INTRODUCTION: The individual prognostic factors for coronavirus disease 2019 (COVID-19) are unclear. For this reason, we aimed to present a state-of-the-art systematic review and meta-analysis on the prognostic factors for adverse outcomes in COVID-19 patients. METHODS: We systematically reviewed PubMed from 1 January 2020 to 26 July 2020 to identify non-overlapping studies examining the association of any prognostic factor with any adverse outcome in patients with COVID-19. Random-effects meta-analysis was performed, and between-study heterogeneity was quantified using the I² statistic. Presence of small-study effects was assessed by applying Egger's regression test. RESULTS: We identified 428 eligible articles, which were used in a total of 263 meta-analyses examining the association of 91 unique prognostic factors with 11 outcomes. Angiotensin-converting enzyme inhibitors, obstructive sleep apnoea, pharyngalgia, history of venous thromboembolism, sex, coronary heart disease, cancer, chronic liver disease, COPD, dementia, any immunosuppressive medication, peripheral arterial disease, rheumatological disease and smoking were associated with at least one outcome and had >1000 events, p<0.005, I²<50%, 95% prediction interval excluding the null value, and absence of small-study effects in the respective meta-analysis. The risk of bias assessment using the Quality in Prognosis Studies tool indicated high risk of bias in 302 out of 428 articles for study participation, 389 articles for adjustment for other prognostic factors and 396 articles for statistical analysis and reporting. CONCLUSIONS: Our findings could be used for prognostic model building and guide patient selection for randomised clinical trials.
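The I² statistic used above as a heterogeneity filter (I² < 50%) is commonly derived from Cochran's Q with inverse-variance weights: I² = max(0, (Q − df) / Q), where df is the number of studies minus one. The sketch below computes both from per-study effect estimates (for example log odds ratios) and their variances; it assumes that standard definition and is not taken from the review's own code.

```python
def cochran_q_and_i2(estimates, variances):
    """Cochran's Q and the I^2 heterogeneity statistic.

    estimates: per-study effect sizes (e.g. log odds ratios).
    variances: their within-study variances.
    Q uses fixed-effect inverse-variance weights; I^2 = max(0, (Q - df) / Q).
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


# Hypothetical estimates with unit variances.
print(cochran_q_and_i2([0.0, 2.0, 4.0], [1.0, 1.0, 1.0]))  # (8.0, 0.75)
```

I² expresses the share of total variation across studies attributed to between-study heterogeneity rather than chance; it is truncated at zero when Q is smaller than its degrees of freedom.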


Subject(s)
COVID-19 , Bias , Humans , Prognosis , SARS-CoV-2
9.
BMC Med ; 20(1): 406, 2022 10 24.
Article in English | MEDLINE | ID: mdl-36280827

ABSTRACT

BACKGROUND: The diagnostic accuracy of unsupervised self-testing with rapid antigen diagnostic tests (Ag-RDTs) is mostly unknown. We studied the diagnostic accuracy of a self-performed SARS-CoV-2 saliva and nasal Ag-RDT in the general population. METHODS: This large cross-sectional study consecutively included unselected individuals aged ≥ 16 years presenting for SARS-CoV-2 testing at three public health service test sites. Participants underwent molecular test sampling and received two self-tests (the Hangzhou AllTest Biotech saliva self-test and the SD Biosensor nasal self-test by Roche Diagnostics) to perform themselves at home. Diagnostic accuracy of both self-tests was assessed with molecular testing as reference. RESULTS: Out of 2819 participants, 6.5% had a positive molecular test. Overall sensitivities were 46.7% (39.3-54.2%) for the saliva Ag-RDT and 68.9% (61.6-75.6%) for the nasal Ag-RDT. With a viral load cut-off (≥ 5.2 log10 SARS-CoV-2 E-gene copies/mL) as a proxy of infectiousness, these sensitivities increased to 54.9% (46.4-63.3%) and 83.9% (76.9-89.5%), respectively. For the nasal Ag-RDT, sensitivities were 78.5% (71.1-84.8%) and 22.6% (9.6-41.1%) in those symptomatic and asymptomatic at the time of sampling, which increased to 90.4% (83.8-94.9%) and 38.9% (17.3-64.3%) after applying the viral load cut-off. In those with and without prior SARS-CoV-2 infection, sensitivities were 36.8% (16.3-61.6%) and 72.7% (65.1-79.4%). Specificities were > 99% and > 99%, positive predictive values > 70% and > 90%, and negative predictive values > 95% and > 95%, for the saliva and nasal Ag-RDT, respectively, in most analyses. Most participants considered the self-performing and result interpretation (very) easy for both self-tests. CONCLUSIONS: The Hangzhou AllTest Biotech saliva self Ag-RDT is not reliable for SARS-CoV-2 detection, overall, and in all studied subgroups. 
The SD Biosensor nasal self Ag-RDT had high sensitivity in individuals with symptoms and in those without prior SARS-CoV-2 infection but low sensitivity in asymptomatic individuals and those with a prior SARS-CoV-2 infection which warrants further investigation.
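Positive and negative predictive values, unlike sensitivity and specificity, depend on how common infection is in the tested population (6.5% molecular-positive in this study). The sketch below derives PPV and NPV from sensitivity, specificity and prevalence via Bayes' rule; the example inputs are hypothetical and are not meant to reproduce the study's reported values, which were computed directly from observed counts.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV from sensitivity, specificity and prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical inputs: same test, two different prevalences.
print(predictive_values(0.8, 0.9, 0.5))
print(predictive_values(0.8, 0.9, 0.1))
```

Holding sensitivity and specificity fixed, lowering prevalence reduces PPV and raises NPV, which is why predictive values from one test-site population may not transfer directly to another.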


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , COVID-19/diagnosis , Cross-Sectional Studies , COVID-19 Testing , Saliva , Sensitivity and Specificity , Antigens, Viral
10.
BMC Med ; 20(1): 97, 2022 02 24.
Article in English | MEDLINE | ID: mdl-35197052

ABSTRACT

BACKGROUND: Rapid antigen diagnostic tests (Ag-RDTs) are the most widely used point-of-care tests for detecting SARS-CoV-2 infection. Since the accuracy may have altered by changes in SARS-CoV-2 epidemiology, indications for testing, sampling and testing procedures, and roll-out of COVID-19 vaccination, we evaluated the performance of three prevailing SARS-CoV-2 Ag-RDTs. METHODS: In this cross-sectional study, we consecutively enrolled individuals aged >16 years presenting for SARS-CoV-2 testing at three Dutch public health service COVID-19 test sites. In the first phase, participants underwent either BD-Veritor System (Becton Dickinson), PanBio (Abbott), or SD-Biosensor (Roche Diagnostics) testing with routine sampling procedures. In a subsequent phase, participants underwent SD-Biosensor testing with a less invasive sampling method (combined oropharyngeal-nasal [OP-N] swab). Diagnostic accuracies were assessed against molecular testing. RESULTS: Six thousand nine hundred fifty-five of 7005 participants (99%) with results from both an Ag-RDT and a molecular reference test were analysed. SARS-CoV-2 prevalence and overall sensitivities were 13% (188/1441) and 69% (129/188, 95% CI 62-75) for BD-Veritor, 8% (173/2056) and 69% (119/173, 61-76) for PanBio, and 12% (215/1769) and 74% (160/215, 68-80) for SD-Biosensor with routine sampling and 10% (164/1689) and 75% (123/164, 68-81) for SD-Biosensor with OP-N sampling. In those symptomatic or asymptomatic at sampling, sensitivities were 72-83% and 54-56%, respectively. Above a viral load cut-off (≥5.2 log10 SARS-CoV-2 E-gene copies/mL), sensitivities were 86% (125/146, 79-91) for BD-Veritor, 89% (108/121, 82-94) for PanBio, and 88% (160/182, 82-92) for SD-Biosensor with routine sampling and 84% (118/141, 77-89) with OP-N sampling. Specificities were >99% for all tests in most analyses. 
Sixty-one per cent of false-negative Ag-RDT participants returned for testing within 14 days (median: 3 days, interquartile range 3) of whom 90% tested positive. CONCLUSIONS: Overall sensitivities of three SARS-CoV-2 Ag-RDTs were 69-75%, increasing to ≥86% above a viral load cut-off. The decreased sensitivity among asymptomatic participants and high positivity rate during follow-up in false-negative Ag-RDT participants emphasise the need for education of the public about the importance of re-testing after an initial negative Ag-RDT should symptoms develop. For SD-Biosensor, the diagnostic accuracy with OP-N and deep nasopharyngeal sampling was similar; adopting the more convenient sampling method might reduce the threshold for professional testing.
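The prevalences and overall sensitivities in this abstract are simple ratios of the counts it reports. The sketch below recomputes them from those counts as a consistency check; the counts are copied from the abstract, while the function and variable names are illustrative only. Confidence intervals are not recomputed because the abstract does not state which interval method was used.

```python
# (test and sampling, molecular-positive, participants analysed, Ag-RDT positive among them)
REPORTED = [
    ("BD-Veritor", 188, 1441, 129),
    ("PanBio", 173, 2056, 119),
    ("SD-Biosensor, routine sampling", 215, 1769, 160),
    ("SD-Biosensor, OP-N sampling", 164, 1689, 123),
]


def summarise_reported(rows):
    """Return {label: (prevalence %, sensitivity %)} rounded to whole percentages."""
    summary = {}
    for label, positives, analysed, true_positives in rows:
        prevalence = 100 * positives / analysed
        sensitivity = 100 * true_positives / positives
        summary[label] = (round(prevalence), round(sensitivity))
    return summary


for label, (prev, sens) in summarise_reported(REPORTED).items():
    print(f"{label}: prevalence {prev}%, sensitivity {sens}%")

# The four groups together account for all 6955 analysed participants.
print(sum(analysed for _, _, analysed, _ in REPORTED))  # 6955
```

The rounded values match the abstract (13%/69%, 8%/69%, 12%/74% and 10%/75%), and the group sizes add up to the 6955 participants analysed.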


Subject(s)
COVID-19 , Adolescent , Antigens, Viral/analysis , COVID-19 Testing , COVID-19 Vaccines , Cross-Sectional Studies , Humans , SARS-CoV-2 , Sensitivity and Specificity
11.
BMC Med Res Methodol ; 22(1): 12, 2022 Jan 13.
Article in English | MEDLINE | ID: mdl-35026997

ABSTRACT

BACKGROUND: While many studies have consistently found incomplete reporting of regression-based prediction model studies, evidence is lacking for machine learning-based prediction model studies. We aim to systematically review the adherence of Machine Learning (ML)-based prediction model studies to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) Statement. METHODS: We included articles reporting on development or external validation of a multivariable prediction model (either diagnostic or prognostic) developed using supervised ML for individualized predictions across all medical fields. We searched PubMed from 1 January 2018 to 31 December 2019. Data extraction was performed using the 22-item checklist for reporting of prediction model studies ( www.TRIPOD-statement.org ). We measured the overall adherence per article and per TRIPOD item. RESULTS: Our search identified 24,814 articles, of which 152 articles were included: 94 (61.8%) prognostic and 58 (38.2%) diagnostic prediction model studies. Overall, articles adhered to a median of 38.7% (IQR 31.0-46.4%) of TRIPOD items. No article fully adhered to complete reporting of the abstract and very few reported the flow of participants (3.9%, 95% CI 1.8 to 8.3), appropriate title (4.6%, 95% CI 2.2 to 9.2), blinding of predictors (4.6%, 95% CI 2.2 to 9.2), model specification (5.2%, 95% CI 2.4 to 10.8), and model's predictive performance (5.9%, 95% CI 3.1 to 10.9). There was often complete reporting of source of data (98.0%, 95% CI 94.4 to 99.3) and interpretation of the results (94.7%, 95% CI 90.0 to 97.3). CONCLUSION: Similar to prediction model studies developed using conventional regression-based techniques, the completeness of reporting is poor. Essential information to decide to use the model (i.e. model specification and its performance) is rarely reported. 
However, some items and sub-items of TRIPOD might be less suitable for ML-based prediction model studies and thus, TRIPOD requires extensions. Overall, there is an urgent need to improve the reporting quality and usability of research to avoid research waste. SYSTEMATIC REVIEW REGISTRATION: PROSPERO, CRD42019161764.


Subject(s)
Checklist , Models, Statistical , Humans , Machine Learning , Prognosis , Supervised Machine Learning
12.
BMC Med Res Methodol ; 22(1): 101, 2022 04 08.
Article in English | MEDLINE | ID: mdl-35395724

ABSTRACT

BACKGROUND: Describe and evaluate the methodological conduct of prognostic prediction models developed using machine learning methods in oncology. METHODS: We conducted a systematic review in MEDLINE and Embase between 01/01/2019 and 05/09/2019, for studies developing a prognostic prediction model using machine learning methods in oncology. We used the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement, Prediction model Risk Of Bias ASsessment Tool (PROBAST) and CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) to assess the methodological conduct of included publications. Results were summarised by modelling type: regression-, non-regression-based and ensemble machine learning models. RESULTS: Sixty-two publications met inclusion criteria developing 152 models across all publications. Forty-two models were regression-based, 71 were non-regression-based and 39 were ensemble models. A median of 647 individuals (IQR: 203 to 4059) and 195 events (IQR: 38 to 1269) were used for model development, and 553 individuals (IQR: 69 to 3069) and 50 events (IQR: 17.5 to 326.5) for model validation. A higher number of events per predictor was used for developing regression-based models (median: 8, IQR: 7.1 to 23.5), compared to alternative machine learning (median: 3.4, IQR: 1.1 to 19.1) and ensemble models (median: 1.7, IQR: 1.1 to 6). Sample size was rarely justified (n = 5/62; 8%). Some or all continuous predictors were categorised before modelling in 24 studies (39%). 46% (n = 24/62) of models reporting predictor selection before modelling used univariable analyses, a common method across all modelling types. Ten out of 24 models for time-to-event outcomes accounted for censoring (42%). A split sample approach was the most popular method for internal validation (n = 25/62, 40%). Calibration was reported in 11 studies.
Less than half of models were reported or made available. CONCLUSIONS: The methodological conduct of machine learning based clinical prediction models is poor. Guidance is urgently needed, with increased awareness and education of minimum prediction modelling standards. Particular focus is needed on sample size estimation, development and validation analysis methods, and ensuring the model is available for independent validation, to improve quality of machine learning based clinical prediction models.


Subject(s)
Machine Learning , Medical Oncology , Research Design , Bias , Humans , Prognosis
13.
Diabetologia ; 64(7): 1550-1562, 2021 07.
Article in English | MEDLINE | ID: mdl-33904946

ABSTRACT

AIMS/HYPOTHESIS: Approximately 25% of people with type 2 diabetes experience a foot ulcer and their risk of amputation is 10-20 times higher than that of people without type 2 diabetes. Prognostic models can aid in targeted monitoring but an overview of their performance is lacking. This study aimed to systematically review prognostic models for the risk of foot ulcer or amputation and quantify their predictive performance in an independent cohort. METHODS: A systematic review identified studies developing prognostic models for foot ulcer or amputation over a minimum follow-up of 1 year applicable to people with type 2 diabetes. After data extraction and risk of bias assessment (both in duplicate), selected models were externally validated in a prospective cohort with a 5 year follow-up in terms of discrimination (C statistics) and calibration (calibration plots). RESULTS: We identified 21 studies with 34 models predicting polyneuropathy, foot ulcer or amputation. Eleven models were validated in 7624 participants, of whom 485 developed an ulcer and 70 underwent amputation. The models for foot ulcer showed C statistics (95% CI) ranging from 0.54 (0.54, 0.54) to 0.81 (0.75, 0.86) and models for amputation showed C statistics (95% CI) ranging from 0.63 (0.55, 0.71) to 0.86 (0.78, 0.94). Most models underestimated the ulcer or amputation risk in the highest risk quintiles. Three models performed well to predict a combined endpoint of amputation and foot ulcer (C statistics >0.75). CONCLUSIONS/INTERPRETATION: Thirty-four prognostic models for the risk of foot ulcer or amputation were identified. Although the performance of the models varied considerably, three models performed well to predict foot ulcer or amputation and may be applicable to clinical practice.
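Statements such as "most models underestimated the risk in the highest risk quintiles" come from grouped calibration: individuals are sorted by predicted risk, split into equally sized groups, and the mean predicted risk in each group is compared with the observed event proportion. The sketch below shows that grouping for a binary outcome with hypothetical data; it ignores censoring over the 5-year follow-up, which a real time-to-event validation would need to handle (for example with Kaplan-Meier estimates per group).

```python
def calibration_by_quantile(probs, outcomes, n_groups=5):
    """Grouped calibration: (mean predicted risk, observed proportion) per risk group.

    Individuals are sorted by predicted risk and split into n_groups groups of
    (nearly) equal size; n_groups=5 gives risk quintiles. Censoring is ignored.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    n = len(order)
    groups = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not idx:
            continue
        mean_predicted = sum(probs[i] for i in idx) / len(idx)
        observed = sum(outcomes[i] for i in idx) / len(idx)
        groups.append((mean_predicted, observed))
    return groups


# Hypothetical data: 10 individuals, 5 groups of 2.
probs = [0.05, 0.05, 0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.5, 0.5]
outcomes = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
for predicted, observed in calibration_by_quantile(probs, outcomes):
    print(f"predicted {predicted:.2f} vs observed {observed:.2f}")
```

In this toy example the top group has a mean predicted risk of 0.50 but an observed proportion of 1.00; observed exceeding predicted is what "underestimated risk" means on a calibration plot.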


Subject(s)
Amputation, Surgical , Diabetes Mellitus, Type 2/diagnosis , Diabetic Foot/diagnosis , Adult , Amputation, Surgical/statistics & numerical data , Cohort Studies , Diabetes Mellitus, Type 2/complications , Diabetes Mellitus, Type 2/epidemiology , Diabetic Foot/epidemiology , Diabetic Foot/etiology , Female , Foot Ulcer/diagnosis , Foot Ulcer/epidemiology , Foot Ulcer/etiology , Humans , Male , Models, Statistical , Prognosis , Risk Assessment , Risk Factors
14.
Stat Med ; 40(15): 3533-3559, 2021 07 10.
Article in English | MEDLINE | ID: mdl-33948970

ABSTRACT

Prediction models often yield inaccurate predictions for new individuals. Large data sets from pooled studies or electronic healthcare records may alleviate this with an increased sample size and variability in sample characteristics. However, existing strategies for prediction model development generally do not account for heterogeneity in predictor-outcome associations between different settings and populations. This limits the generalizability of developed models (even from large, combined, clustered data sets) and necessitates local revisions. We aim to develop methodology for producing prediction models that require less tailoring to different settings and populations. We adopt internal-external cross-validation to assess and reduce heterogeneity in models' predictive performance during the development. We propose a predictor selection algorithm that optimizes the (weighted) average performance while minimizing its variability across the hold-out clusters (or studies). Predictors are added iteratively until the estimated generalizability is optimized. We illustrate this by developing a model for predicting the risk of atrial fibrillation and updating an existing one for diagnosing deep vein thrombosis, using individual participant data from 20 cohorts (N = 10 873) and 11 diagnostic studies (N = 10 014), respectively. Meta-analysis of calibration and discrimination performance in each hold-out cluster shows that trade-offs between average and heterogeneity of performance occurred. Our methodology enables the assessment of heterogeneity of prediction model performance during model development in multiple or clustered data sets, thereby informing researchers on predictor selection to improve the generalizability to different settings and populations, and reduce the need for model tailoring. Our methodology has been implemented in the R package metamisc.
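Internal-external cross-validation, as described above, holds out each cluster (cohort or study) in turn, develops the model on the remaining clusters, and evaluates it on the held-out one. The sketch below shows only that loop, with the model-fitting and evaluation steps passed in as functions, and summarises hold-out performance with an unweighted mean and standard deviation. It is an assumption-laden simplification: the authors' predictor selection algorithm and their meta-analysis of hold-out performance (implemented in the R package metamisc) are not reproduced, and the toy "model" below is hypothetical.

```python
from statistics import mean, pstdev


def internal_external_cv(clusters, fit, evaluate):
    """Leave-one-cluster-out (internal-external) cross-validation.

    clusters: dict mapping cluster name -> list of records.
    fit(records) -> model; evaluate(model, records) -> performance value.
    Returns {held-out cluster name: performance}.
    """
    performance = {}
    for held_out, test_records in clusters.items():
        train_records = [
            record
            for name, records in clusters.items()
            if name != held_out
            for record in records
        ]
        model = fit(train_records)
        performance[held_out] = evaluate(model, test_records)
    return performance


def summarise_performance(performance):
    """Unweighted mean and (population) SD of hold-out performance."""
    values = list(performance.values())
    return mean(values), pstdev(values)


# Hypothetical toy model: predict the mean outcome; score with mean absolute error.
def fit_mean(ys):
    return sum(ys) / len(ys)


def mean_abs_error(model, ys):
    return sum(abs(y - model) for y in ys) / len(ys)


results = internal_external_cv({"A": [1.0, 1.0], "B": [3.0, 3.0], "C": [2.0]}, fit_mean, mean_abs_error)
print(results, summarise_performance(results))
```

Looking at both the average and the spread of hold-out performance mirrors the trade-off in the abstract: a candidate model that performs slightly worse on average but more consistently across clusters may generalize better to new settings.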


Subject(s)
Research Design , Calibration , Humans
15.
Stat Med ; 40(13): 3066-3084, 2021 06 15.
Article in English | MEDLINE | ID: mdl-33768582

ABSTRACT

Individual participant data (IPD) from multiple sources allows external validation of a prognostic model across multiple populations. Often this reveals poor calibration, potentially causing poor predictive performance in some populations. However, rather than discarding the model outright, it may be possible to modify the model to improve performance using recalibration techniques. We use IPD meta-analysis to identify the simplest method to achieve good model performance. We examine four options for recalibrating an existing time-to-event model across multiple populations: (i) shifting the baseline hazard by a constant, (ii) re-estimating the shape of the baseline hazard, (iii) adjusting the prognostic index as a whole, and (iv) adjusting individual predictor effects. For each strategy, IPD meta-analysis examines (heterogeneity in) model performance across populations. Additionally, the probability of achieving good performance in a new population can be calculated allowing ranking of recalibration methods. In an applied example, IPD meta-analysis reveals that the existing model had poor calibration in some populations, and large heterogeneity across populations. However, re-estimation of the intercept substantially improved the expected calibration in new populations, and reduced between-population heterogeneity. Comparing recalibration strategies showed that re-estimating both the magnitude and shape of the baseline hazard gave the highest predicted probability of good performance in a new population. In conclusion, IPD meta-analysis allows a prognostic model to be externally validated in multiple settings, and enables recalibration strategies to be compared and ranked to decide on the least aggressive recalibration strategy to achieve acceptable external model performance without discarding existing model information.
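For option (i), shifting the baseline hazard by a constant, the shift has a closed-form maximum-likelihood estimate when the original baseline hazard and prognostic index are held fixed: setting the derivative of the log-likelihood to zero gives exp(alpha) = observed events / expected events. The sketch below assumes a proportional hazards model S(t | x) = exp(-H0(t) exp(LP)); the function names are hypothetical, and options (ii) and (iv), which require refitting the baseline hazard shape or the individual coefficients, are not shown.

```python
import math
from typing import Sequence


def recalibration_in_the_large(events: Sequence[int],
                               expected: Sequence[float]) -> float:
    """Estimate alpha in h(t | x) = exp(alpha) * h0(t) * exp(LP), with the
    original baseline hazard h0 and prognostic index LP held fixed.

    `expected[i]` is H0(t_i) * exp(LP_i): the original model's cumulative hazard
    for individual i at their own follow-up time t_i. Setting the score
    sum(d_i) - exp(alpha) * sum(expected_i) to zero gives
    alpha = log(observed events / expected events)."""
    return math.log(sum(events) / sum(expected))


def survival(cum_base_hazard: float, lp: float,
             alpha: float = 0.0, slope: float = 1.0) -> float:
    """S(t | x) = exp(-H0(t) * exp(alpha + slope * LP)).

    alpha != 0 shifts the baseline hazard by a constant factor (option i);
    slope != 1 additionally rescales the prognostic index as a whole (option iii)."""
    return math.exp(-cum_base_hazard * math.exp(alpha + slope * lp))
```

A positive alpha means the original model predicted fewer events than were observed in that population, so recalibrated survival probabilities are lower.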


Subject(s)
Data Analysis , Research Design , Calibration , Humans , Meta-Analysis as Topic , Probability , Prognosis
16.
Stat Med ; 40(26): 5961-5981, 2021 11 20.
Article in English | MEDLINE | ID: mdl-34402094

ABSTRACT

Randomized trials typically estimate average relative treatment effects, but decisions on the benefit of a treatment are possibly better informed by more individualized predictions of the absolute treatment effect. In the case of a binary outcome, these predictions of absolute individualized treatment effect require knowledge of the individual's risk without treatment and incorporation of a possibly differential treatment effect (ie, varying with patient characteristics). In this article, we lay out the causal structure of individualized treatment effect in terms of potential outcomes and describe the required assumptions that underlie a causal interpretation of its prediction. Subsequently, we describe regression models and model estimation techniques that can be used to move from average to more individualized treatment effect predictions. We focus mainly on logistic regression-based methods that are both well-known and naturally provide the required probabilistic estimates. We incorporate key components from both causal inference and prediction research to arrive at individualized treatment effect predictions. While the separate components are well known, their successful amalgamation is very much an ongoing field of research. We cut the problem down to its essentials in the setting of a randomized trial, discuss the importance of a clear definition of the estimand of interest, provide insight into the required assumptions, and give guidance with respect to modeling and estimation options. Simulated data illustrate the potential of different modeling options across scenarios that vary both average treatment effect and treatment effect heterogeneity. Two applied examples illustrate individualized treatment effect prediction in randomized trial data.
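A minimal sketch of the quantity being predicted, assuming a logistic model with a treatment-by-covariate interaction: the predicted absolute treatment effect for a patient is the difference between the predicted risks with and without treatment. The coefficients are hypothetical placeholders rather than estimates from the paper; fitting the model and checking the causal assumptions the authors discuss are outside the scope of the sketch.

```python
import math


def expit(z: float) -> float:
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_risk_difference(x: float, b0: float, b_x: float,
                              b_t: float, b_tx: float) -> float:
    """Predicted absolute treatment effect for a patient with covariate x under
    logit P(Y=1 | x, t) = b0 + b_x*x + b_t*t + b_tx*t*x (t = 1 treated, 0 control).

    Returns P(Y=1 | x, t=1) - P(Y=1 | x, t=0); negative values mean the
    treatment lowers the predicted risk of the (harmful) outcome."""
    risk_control = expit(b0 + b_x * x)
    risk_treated = expit(b0 + b_x * x + b_t + b_tx * x)
    return risk_treated - risk_control


# Same hypothetical odds ratio of 0.5 (no interaction) at two baseline risks:
log_or = math.log(0.5)
print(predicted_risk_difference(0.0, 0.0, 1.0, log_or, 0.0))   # baseline risk 0.5
print(predicted_risk_difference(-3.0, 0.0, 1.0, log_or, 0.0))  # baseline risk ~0.047
```

Even with no interaction term (`b_tx = 0`, a constant odds ratio), the absolute effect shrinks as baseline risk falls, which is why the individual's risk without treatment enters the prediction.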


Subject(s)
Randomized Controlled Trials as Topic , Causality , Humans , Longitudinal Studies
17.
BMC Health Serv Res ; 21(1): 298, 2021 Apr 01.
Article in English | MEDLINE | ID: mdl-33794869

ABSTRACT

BACKGROUND: Recent attempts at active disinvestment (i.e. withdrawal of reimbursement by means of a policy decision) of reimbursed healthcare interventions in the Netherlands have differed in their outcome: some attempts were successful, with interventions actually being disinvested. Other attempts were terminated at some point, implying unsuccessful disinvestment. This study aimed to obtain insight into recent active disinvestment processes, and to explore what aspects affect their outcome. METHODS: Semi-structured interviews were conducted from January to December 2018 with stakeholders (e.g. patients, policymakers, physicians) who were involved in the policy process of five cases for which the full or partial withdrawal of reimbursement was considered in the Netherlands between 2007 and 2017: benzodiazepines, medication for Fabry disease, quit smoking programme, psychoanalytic therapy and maternity care assistance. These cases covered both interventions that were eventually disinvested and interventions for which reimbursement was maintained after consideration. Interviews were transcribed verbatim, double coded and analyzed using thematic analysis. RESULTS: The 37 interviews showed that support for disinvestment from stakeholders, especially from healthcare providers and policymakers, strongly affected the outcome of the disinvestment process. Furthermore, the institutional role of stakeholders as legitimized by the Dutch health insurance system, their financial interests in maintaining or discontinuing reimbursement, and the possibility of relieving the consequences of disinvestment for current patients affected the outcome of the disinvestment process as well. Poor organization of patient groups may make it difficult for patients to exert pressure, which may contribute to successful disinvestment. No evidence was found of a consistent role of the formal Dutch package criteria (i.e. effectiveness, cost-effectiveness, necessity and feasibility) in active disinvestment processes. CONCLUSIONS: Contextual factors as well as the possibility of relieving the consequences of disinvestment for current patients are important determinants of the outcome of active disinvestment processes. These results provide insight into active disinvestment processes and their determinants, and provide guidance to policymakers for a potentially more successful approach for future active disinvestment processes.


Subject(s)
Maternal Health Services , Cost-Benefit Analysis , Delivery of Health Care , Female , Humans , Netherlands , Pregnancy , Qualitative Research
18.
Ann Intern Med ; 2020 Jun 02.
Article in English | MEDLINE | ID: mdl-32479165

ABSTRACT

Clear and informative reporting in titles and abstracts is essential to help readers and reviewers identify potentially relevant studies and decide whether to read the full text. Although the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) statement provides general recommendations for reporting titles and abstracts, more detailed guidance seems to be desirable. The authors present TRIPOD for Abstracts, a checklist and corresponding guidance for reporting prediction model studies in abstracts. A list of 32 potentially relevant items was the starting point for a modified Delphi procedure involving 110 experts, of whom 71 (65%) participated in the web-based survey. After 2 Delphi rounds, the experts agreed on 21 items as being essential to report in abstracts of prediction model studies. This number was reduced by merging some of the items. In a third round, participants provided feedback on a draft version of TRIPOD for Abstracts. The final checklist contains 12 items and applies to journal and conference abstracts that describe the development or external validation of a diagnostic or prognostic prediction model, or the added value of predictors to an existing model, regardless of the clinical domain or statistical approach used.

19.
Eur Heart J ; 41(30): 2836-2844, 2020 08 07.
Article in English | MEDLINE | ID: mdl-32112556

ABSTRACT

AIMS: To evaluate whether integrated care for atrial fibrillation (AF) can be safely orchestrated in primary care. METHODS AND RESULTS: The ALL-IN trial was a cluster randomized, open-label, pragmatic non-inferiority trial performed in primary care practices in the Netherlands. We randomized 26 practices: 15 to the integrated care intervention and 11 to usual care. The integrated care intervention consisted of (i) quarterly AF check-ups by trained nurses in primary care, also focusing on possibly interfering comorbidities, (ii) monitoring of anticoagulation therapy in primary care, and finally (iii) easy-access availability of consultations from cardiologists and anticoagulation clinics. The primary endpoint was all-cause mortality during 2 years of follow-up. In the intervention arm, 527 out of 941 eligible AF patients aged ≥65 years provided informed consent to undergo the intervention. These 527 patients were compared with 713 AF patients in the control arm receiving usual care. Median age was 77 (interquartile range 72-83) years. The all-cause mortality rate was 3.5 per 100 patient-years in the intervention arm vs. 6.7 per 100 patient-years in the control arm [adjusted hazard ratio (HR) 0.55; 95% confidence interval (CI) 0.37-0.82]. For non-cardiovascular mortality, the adjusted HR was 0.47 (95% CI 0.27-0.82). For other adverse events, no statistically significant differences were observed. CONCLUSION: In this cluster randomized trial, integrated care for elderly AF patients in primary care showed a 45% reduction in all-cause mortality when compared with usual care.
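The 45% figure corresponds to the adjusted hazard ratio expressed as a relative reduction; the crude ratio of the two reported mortality rates, which ignores the covariate adjustment, is shown alongside for comparison. This is an illustrative calculation from the numbers in the abstract, not an analysis reported in the paper.

```latex
\begin{aligned}
1 - \mathrm{HR}_{\text{adj}} &= 1 - 0.55 = 0.45 && \text{(45\% relative reduction in the hazard of death)}\\
\frac{3.5}{6.7} &\approx 0.52 && \text{(crude mortality rate ratio, intervention vs. usual care)}
\end{aligned}
```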


Subject(s)
Atrial Fibrillation , Cardiologists , Aged , Aged, 80 and over , Anticoagulants/therapeutic use , Atrial Fibrillation/therapy , Comorbidity , Humans , Netherlands/epidemiology , Primary Health Care
20.
Diabetologia ; 63(6): 1110-1119, 2020 06.
Article in English | MEDLINE | ID: mdl-32246157

ABSTRACT

AIMS/HYPOTHESIS: The aims of this study were to identify all published prognostic models predicting retinopathy risk applicable to people with type 2 diabetes, to assess their quality and accuracy, and to validate their predictive accuracy in a head-to-head comparison using an independent type 2 diabetes cohort. METHODS: A systematic search was performed in PubMed and Embase in December 2019. Studies that met the following criteria were included: (1) the model was applicable in type 2 diabetes; (2) the outcome was retinopathy; and (3) follow-up was more than 1 year. Screening, data extraction (using the checklist for critical appraisal and data extraction for systematic reviews of prediction modelling studies [CHARMS]) and risk of bias assessment (by prediction model risk of bias assessment tool [PROBAST]) were performed independently by two reviewers. Selected models were externally validated in the large Hoorn Diabetes Care System (DCS) cohort in the Netherlands. Retinopathy risk was calculated using baseline data and compared with retinopathy incidence over 5 years. Calibration after intercept adjustment and discrimination (Harrell's C statistic) were assessed. RESULTS: Twelve studies were included in the systematic review, reporting on 16 models. Outcomes ranged from referable retinopathy to blindness. Discrimination was reported in seven studies with C statistics ranging from 0.55 (95% CI 0.54, 0.56) to 0.84 (95% CI 0.78, 0.88). Five studies reported on calibration. Eight models could be compared head-to-head in the DCS cohort (N = 10,715). Most of the models underestimated retinopathy risk. Validating the models against different severities of retinopathy, C statistics ranged from 0.51 (95% CI 0.49, 0.53) to 0.89 (95% CI 0.88, 0.91). CONCLUSIONS/INTERPRETATION: Several prognostic models can accurately predict retinopathy risk in a population-based type 2 diabetes cohort. Most of the models include easy-to-measure predictors enhancing their applicability. Tailoring retinopathy screening frequency based on accurate risk predictions may increase the efficiency and cost-effectiveness of diabetic retinopathy care. REGISTRATION: PROSPERO registration ID CRD42018089122.
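The discrimination measure used above, Harrell's C statistic, can be sketched for right-censored follow-up data as follows. This is an illustrative implementation, not the code used in the study; in particular, pairs with tied follow-up times are simply skipped, whereas established survival packages apply their own tie-handling rules. Higher values in `risks` are assumed to mean a higher predicted risk of an earlier event.

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[int],
              risks: Sequence[float]) -> float:
    """Harrell's C for right-censored data.

    A pair (i, j) is usable when subject i had the event and was observed for
    a shorter time than subject j (so j was still at risk). The pair is
    concordant when the model gave i the higher predicted risk; tied
    predictions count as half. Pairs with tied times are skipped here."""
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a usable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking, which is the scale on which the C statistics of 0.51 to 0.89 above are reported.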


Subject(s)
Diabetes Mellitus, Type 2/complications , Diabetes Mellitus, Type 2/epidemiology , Diabetic Retinopathy/epidemiology , Diabetic Retinopathy/etiology , Animals , Humans , Netherlands/epidemiology , Primary Health Care/statistics & numerical data , Prognosis , Risk Assessment/methods