1.
Nat Methods ; 21(2): 195-212, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38347141

ABSTRACT

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint: a structured representation of the given problem that captures all aspects relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.


Subject(s)
Algorithms; Image Processing, Computer-Assisted; Machine Learning; Semantics
2.
J Am Soc Nephrol ; 2024 Oct 16.
Article in English | MEDLINE | ID: mdl-39412887

ABSTRACT

BACKGROUND: Prognostic models are becoming increasingly relevant in clinical trials as potential surrogate endpoints, and in patient management as clinical decision support tools. However, the impact of competing risks on model performance remains poorly investigated. We aimed to carefully assess the performance of competing risk and noncompeting risk models in the context of kidney transplantation, where allograft failure and death with a functioning graft are two competing outcomes. METHODS: We included 11,046 kidney transplant recipients enrolled in 10 countries. We developed prediction models for long-term kidney graft failure both without accounting for the competing risk of death with a functioning graft (i.e., censoring such deaths) and with accounting for it, using Cox, Fine-Gray, and cause-specific Cox regression models. To this end, we followed a detailed and transparent analytical framework for competing and noncompeting risk modelling, and carefully assessed the models' development, stability, discrimination, calibration, overall fit, clinical utility, and generalizability in external validation cohorts and subpopulations. More than 15 metrics were used to provide an exhaustive assessment of model performance. RESULTS: Among 11,046 recipients in the derivation and validation cohorts, 1,497 (14%) lost their graft and 1,003 (9%) died with a functioning graft after a median follow-up post-risk evaluation of 4.7 years (IQR 2.7-7.0). The cumulative incidence of graft loss was similarly estimated by Kaplan-Meier and Aalen-Johansen methods (17% versus 16% in the derivation cohort). Cox and competing risk models showed similar and stable risk estimates for predicting long-term graft failure (average mean absolute prediction error of 0.0140, 0.0138 and 0.0135 for Cox, Fine-Gray, and cause-specific Cox models, respectively). Discrimination and overall fit were comparable in the validation cohorts, with concordance index ranging from 0.76 to 0.87. 
Across various subpopulations and clinical scenarios, the models performed well and similarly, although in some high-risk groups (such as donors over 65 years old), the findings suggest a trend towards moderately improved calibration when using a competing risk approach. CONCLUSIONS: Competing and noncompeting risk models performed similarly in predicting long-term kidney graft failure.
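The abstract compares a Kaplan-Meier estimate (deaths with a functioning graft treated as censoring) with an Aalen-Johansen estimate (death treated as a competing event) of the cumulative incidence of graft loss. The sketch below computes both on a toy dataset; the event coding (0 = censored, 1 = graft failure, 2 = death with a functioning graft) and the function name are assumptions for illustration, not the study's code.

```python
from typing import Sequence, Tuple


def cumulative_incidence(
    times: Sequence[float], events: Sequence[int], horizon: float
) -> Tuple[float, float]:
    """Return (1 - Kaplan-Meier, Aalen-Johansen) estimates of cause-1 incidence by `horizon`.

    Event coding is a hypothetical convention: 0 = censored,
    1 = graft failure (event of interest), 2 = death with a functioning graft.
    """
    km_surv = 1.0   # KM "survival" from cause 1, treating cause 2 as censoring
    all_surv = 1.0  # event-free survival from any cause, used by Aalen-Johansen
    aj_cif = 0.0    # Aalen-Johansen cumulative incidence of cause 1
    for t in sorted(set(times)):
        if t > horizon:
            break
        at_risk = sum(1 for s in times if s >= t)
        d1 = sum(1 for s, e in zip(times, events) if s == t and e == 1)
        d2 = sum(1 for s, e in zip(times, events) if s == t and e == 2)
        km_surv *= 1.0 - d1 / at_risk
        # Cause-1 hazard at t, weighted by the probability of being event-free just before t
        aj_cif += all_surv * d1 / at_risk
        all_surv *= 1.0 - (d1 + d2) / at_risk
    return 1.0 - km_surv, aj_cif


# Toy data: one early death, two graft failures, one censored observation
km, aj = cumulative_incidence([1, 2, 3, 4], [2, 1, 1, 0], horizon=4)
print(f"1 - KM: {km:.3f}, Aalen-Johansen: {aj:.3f}")  # 1 - KM: 0.667, Aalen-Johansen: 0.500
```

Because censoring the deaths treats those patients as if they could still lose their graft later, 1 - KM is never smaller than the Aalen-Johansen estimate; the gap stays small when competing deaths are rare relative to graft failures, consistent with the 17% versus 16% reported above.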

3.
Am J Epidemiol ; 193(2): 377-388, 2024 Feb 05.
Article in English | MEDLINE | ID: mdl-37823269

ABSTRACT

Propensity score analysis is a common approach to addressing confounding in nonrandomized studies. Its implementation, however, requires important assumptions (e.g., positivity). The disease risk score (DRS) is an alternative confounding score that can relax some of these assumptions. Like the propensity score, the DRS summarizes multiple confounders into a single score, on which conditioning by matching allows the estimation of causal effects. However, matching relies on arbitrary choices for pruning out data (e.g., matching ratio, algorithm, and caliper width) and may be computationally demanding. Alternatively, weighting methods, common in propensity score analysis, are easy to implement and may entail fewer choices, yet none have been developed for the DRS. Here we present 2 weighting approaches: One derives directly from inverse probability weighting; the other, named target distribution weighting, relates to importance sampling. We empirically show that inverse probability weighting and target distribution weighting display performance comparable to matching techniques in terms of bias but outperform them in terms of efficiency (mean squared error) and computational speed (up to >870 times faster in an illustrative study). We illustrate implementation of the methods in 2 case studies where we investigate placebo treatments for multiple sclerosis and administration of aspirin in stroke patients.
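The abstract says weighting is common in propensity score analysis but does not give the formulas of its two new DRS-based weights. As background only, the sketch below shows the familiar propensity-score version: a normalised (Hajek) inverse probability weighted estimate of the average treatment effect, assuming the scores have already been estimated. It is not the authors' DRS weighting or target distribution weighting; the function name is hypothetical.

```python
import numpy as np


def ipw_ate(treated, outcome, score):
    """Normalised (Hajek) inverse probability weighted average treatment effect.

    `score` is an already-estimated propensity score P(treated = 1 | confounders);
    how it is estimated lies outside this sketch.
    """
    t = np.asarray(treated, dtype=float)
    y = np.asarray(outcome, dtype=float)
    e = np.asarray(score, dtype=float)
    # Positivity: every unit must have a non-zero chance of either treatment
    if np.any((e <= 0) | (e >= 1)):
        raise ValueError("positivity violated: scores must lie strictly between 0 and 1")
    w_treated = t / e              # weights for treated units
    w_control = (1 - t) / (1 - e)  # weights for untreated units
    mean_treated = np.sum(w_treated * y) / np.sum(w_treated)
    mean_control = np.sum(w_control * y) / np.sum(w_control)
    return mean_treated - mean_control


print(ipw_ate([1, 1, 0, 0], [3, 5, 1, 1], [0.5, 0.5, 0.5, 0.5]))  # 3.0
```

With a constant score the weights cancel and the estimate reduces to a plain difference in means; the positivity check mirrors the assumption the abstract names as a limitation of propensity score methods.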


Subject(s)
Stroke; Humans; Propensity Score; Risk Factors; Bias; Causality; Stroke/epidemiology; Stroke/etiology; Computer Simulation
4.
BJOG ; 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39118202

ABSTRACT

OBJECTIVES: Accurate assessment of gestational age (GA) is important at both individual and population levels. The most accurate way to estimate GA in women who book late in pregnancy is unknown. The aim of this study was to externally validate the accuracy of equations for GA estimation in late pregnancy and to identify the best equation for estimating GA in women who do not receive an ultrasound scan until the second or third trimester. DESIGN: This was a prospective, observational cross-sectional study. SETTING: 57 prenatal care centres, France. PARTICIPANTS: Women with a singleton pregnancy and a previous 11-14-week dating scan that gave the observed GA were recruited over an 8-week period. They underwent a standardised ultrasound examination at one time point during the pregnancy (15-43 weeks), measuring 12 foetal biometric parameters that have previously been identified as useful for GA estimation. MAIN OUTCOME MEASURES: A total of 189 equations that estimate GA based on foetal biometry were examined and compared with GA estimation based on foetal crown-rump length (CRL). Comparisons between the observed GA and the estimated GA were made using R2, calibration slope and intercept. RMSE, mean difference and 95% range of error were also calculated. RESULTS: A total of 2741 pregnant women were examined. After exclusions, 2339 participants were included. In the 20 best performing equations, the intercept ranged from -0.22 to 0.30, the calibration slope from 0.96 to 1.03 and the RMSE from 0.67 to 0.87. Overall, multiparameter models outperformed single-parameter models. Both the 95% range of error and mean difference increased with gestation. Commonly used models based on measurement of the head circumference alone were not amongst the best performing models and were associated with higher 95% error and mean difference. CONCLUSIONS: We provide strong evidence that GA-specific equations based on multiparameter models should be used to estimate GA in late pregnancy. 
However, as all methods of GA assessment in late pregnancy are associated with large prediction intervals, efforts to improve access to early antenatal ultrasound must remain a priority. TRIAL REGISTRATION: The proposal for this study and the corresponding methodological review was registered on PROSPERO international register of systematic reviews (registration number: CRD4201913776).
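The agreement measures listed above can be computed from paired observed and estimated GA values as in the sketch below. The abstract does not state the regression direction or the sign convention, so regressing observed GA on estimated GA and defining the error as estimated minus observed are assumptions; the function name is hypothetical.

```python
import math
from statistics import mean


def ga_agreement(observed, estimated):
    """Agreement between observed (dating-scan) GA and equation-estimated GA, in weeks.

    Calibration slope and intercept come from an ordinary least squares regression
    of observed GA on estimated GA (an assumed convention).
    """
    n = len(observed)
    x_bar, y_bar = mean(estimated), mean(observed)
    sxx = sum((x - x_bar) ** 2 for x in estimated)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(estimated, observed))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    diffs = [e - o for e, o in zip(estimated, observed)]  # estimated minus observed
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(estimated, observed))
    ss_tot = sum((y - y_bar) ** 2 for y in observed)
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": 1 - ss_res / ss_tot,
        "rmse": rmse,
        "mean_diff": mean(diffs),
    }


# An equation that overestimates GA by a constant half week
print(ga_agreement([20, 25, 30, 35], [20.5, 25.5, 30.5, 35.5]))
```

A constant bias leaves the slope at 1 and R2 at 1 but shifts the intercept, mean difference and RMSE, which is why the study reports several measures side by side.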

5.
Nature ; 618(7964): 238, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37280286
6.
BMC Med ; 21(1): 406, 2023 10 26.
Article in English | MEDLINE | ID: mdl-37880689

ABSTRACT

BACKGROUND: The aim of this study was to forecast future patient demand for shoulder replacement surgery in England and investigate any geographic and socioeconomic inequalities in service provision and patient outcomes. METHODS: For this cohort study, all elective shoulder replacements carried out by NHS hospitals and NHS-funded care in England from 1999 to 2020 were identified using Hospital Episode Statistics data. Eligible patients were aged 18 years and older. Shoulder replacements for malignancy or acute trauma were excluded. Population estimates and projections were obtained from the Office for National Statistics. Standardised incidence rates and the risks of serious adverse events (SAEs) and revision surgery were calculated and stratified by geographical region, socioeconomic deprivation, sex, and age band. Hospital costs for each admission were calculated using Healthcare Resource Group codes and NHS Reference Costs based on the National Reimbursement System. Projected rates and hospital costs were predicted until the year 2050 for two scenarios of future growth. RESULTS: A total of 77,613 elective primary and 5847 revision shoulder replacements were available for analysis. Between 1999 and 2020, the standardised incidence of primary shoulder replacements in England quadrupled from 2.6 to 10.4 per 100,000 population, increasing predominantly in patients aged over 65 years. As many as 1 in 6 patients needed to travel to a different region for their surgery indicating inequality of service provision. A temporal increase in SAEs was observed: the 30-day risk increased from 1.3 to 4.8% and the 90-day risk increased from 2.4 to 6.0%. Patients from the more deprived socioeconomic groups appeared to have a higher risk of SAEs and revision surgery. Shoulder replacements are forecast to increase by up to 234% by 2050 in England, reaching 20,912 procedures per year with an associated annual cost to hospitals of £235 million. 
CONCLUSIONS: This study reports a rising incidence of shoulder replacements, regional disparities in service provision, and an overall increasing risk of SAEs, especially in more deprived socioeconomic groups. These findings highlight the need for better healthcare planning to match local population demand, while more research is needed to understand and prevent the increase observed in SAEs.


Subject(s)
Arthroplasty, Replacement, Shoulder; Humans; Cohort Studies; England/epidemiology; Hospitals; Hospitalization
7.
BMC Med ; 21(1): 502, 2023 12 18.
Article in English | MEDLINE | ID: mdl-38110939

ABSTRACT

BACKGROUND: Each year, thousands of clinical prediction models are developed to make predictions (e.g. estimated risk) to inform individual diagnosis and prognosis in healthcare. However, most are not reliable for use in clinical practice. MAIN BODY: We discuss how the creation of a prediction model (e.g. using regression or machine learning methods) depends on the sample and size of the data used to develop it: were a different sample of the same size drawn from the same overarching population, the developed model could be very different, even when the same model development methods are used. In other words, for each model created, there exists a multiverse of other potential models for that sample size and, crucially, an individual's predicted value (e.g. estimated risk) may vary greatly across this multiverse. The more an individual's prediction varies across the multiverse, the greater the instability. We show how small development datasets lead to greater variation between the models in the multiverse, often with vastly unstable individual predictions, and explain how this can be exposed by using bootstrapping and presenting instability plots. We recommend that healthcare researchers seek to use large model development datasets to reduce instability concerns. This is especially important to ensure reliability across subgroups and improve model fairness in practice. CONCLUSIONS: Instability is concerning because an individual's predicted value is used to guide their counselling, resource prioritisation, and clinical decision making. If different samples lead to different models with very different predictions for the same individual, then this should cast doubt on using a particular model for that individual. Therefore, visualising, quantifying and reporting the instability in individual-level predictions is essential when proposing a new model.


Subject(s)
Models, Statistical; Humans; Prognosis; Reproducibility of Results
8.
J Pediatr ; 258: 113370, 2023 07.
Article in English | MEDLINE | ID: mdl-37059387

ABSTRACT

OBJECTIVE: To review systematically and assess the accuracy of prediction models for bronchopulmonary dysplasia (BPD) at 36 weeks of postmenstrual age. STUDY DESIGN: Searches were conducted in MEDLINE and EMBASE. Studies published between 1990 and 2022 were included if they developed or validated a prediction model for BPD or the combined outcome death/BPD at 36 weeks in the first 14 days of life in infants born preterm. Data were extracted independently by 2 authors following the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (ie, CHARMS) and PRISMA guidelines. Risk of bias was assessed using the Prediction model Risk Of Bias ASsessment Tool (ie, PROBAST). RESULTS: Sixty-five studies were reviewed, including 158 development and 108 externally validated models. Median c-statistic of 0.84 (range 0.43-1.00) was reported at model development, and 0.77 (range 0.41-0.97) at external validation. All models were rated at high risk of bias, due to limitations in the analysis part. Meta-analysis of the validated models revealed increased c-statistics after the first week of life for both the BPD and death/BPD outcome. CONCLUSIONS: Although BPD prediction models perform satisfactorily, they were all at high risk of bias. Methodologic improvement and complete reporting are needed before they can be considered for use in clinical practice. Future research should aim to validate and update existing models.
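The review summarises discrimination with the c-statistic (median 0.84 at development, 0.77 at external validation). For a binary outcome it equals the area under the ROC curve: the proportion of event/non-event pairs in which the event received the higher predicted risk. The sketch below computes it by brute-force pair counting for clarity rather than speed; the function name is hypothetical.

```python
from itertools import product


def c_statistic(risks, outcomes):
    """Concordance (c) statistic for a binary outcome; tied predictions count as half-concordant."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for r_event, r_non in product(events, non_events):
        if r_event > r_non:
            score += 1.0
        elif r_event == r_non:
            score += 0.5
    return score / (len(events) * len(non_events))


print(c_statistic([0.9, 0.7, 0.4, 0.2], [1, 0, 1, 0]))  # 0.75
```

A value of 0.5 means the model ranks patients no better than chance, so the reported range down to 0.41 at external validation includes models that ranked worse than chance in some cohorts.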


Subject(s)
Bronchopulmonary Dysplasia; Infant, Premature; Infant; Infant, Newborn; Humans; Bronchopulmonary Dysplasia/epidemiology
9.
BMC Med Res Methodol ; 23(1): 188, 2023 08 19.
Article in English | MEDLINE | ID: mdl-37598153

ABSTRACT

BACKGROUND: Having an appropriate sample size is important when developing a clinical prediction model. We aimed to review how sample size is considered in studies developing a prediction model for a binary outcome. METHODS: We searched PubMed for studies published between 01/07/2020 and 30/07/2020 and reviewed the sample size calculations used to develop the prediction models. Using the available information, we calculated the minimum sample size that would be needed to estimate overall risk and minimise overfitting in each study and summarised the difference between the calculated and used sample size. RESULTS: A total of 119 studies were included, of which nine (8%) provided a sample size justification. The recommended minimum sample size could be calculated for 94 studies: 73% (95% CI: 63-82%) used sample sizes lower than required to estimate overall risk and minimise overfitting, including 26% of studies that used sample sizes lower than required to estimate overall risk only. A similar proportion of studies did not meet the ≥ 10 EPV criterion (75%, 95% CI: 66-84%). The median deficit in the number of events used to develop a model was 75 [IQR: 234 lower to 7 higher], which reduced to 63 [IQR: 225 lower to 7 higher] if the total available data (before any data splitting) were used. Studies that met the minimum required sample size had a median c-statistic of 0.84 (IQR: 0.80 to 0.9), and studies where the minimum sample size was not met had a median c-statistic of 0.83 (IQR: 0.75 to 0.9). Studies that met the ≥ 10 EPP criterion had a median c-statistic of 0.80 (IQR: 0.73 to 0.84). CONCLUSIONS: Prediction models are often developed with no sample size calculation; as a consequence, many are too small to precisely estimate the overall risk. We encourage researchers to justify, perform and report sample size calculations when developing a prediction model.

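The abstract does not give the formulas behind "the minimum sample size needed to estimate overall risk and minimise overfitting". The sketch below assumes two widely used criteria for binary outcomes of the kind proposed by Riley and colleagues: (i) estimating the overall risk to within a margin of ±0.05, and (ii) an expected uniform shrinkage of at least 0.9, given an anticipated Cox-Snell R2. A further criterion on optimism in Nagelkerke R2 is omitted, and the required n is usually taken as the maximum across criteria. The margins, inputs and function names are illustrative assumptions, not necessarily the review's exact calculations.

```python
import math


def n_for_overall_risk(prevalence, margin=0.05):
    """Minimum n so the 95% CI for the overall outcome risk has half-width <= `margin`."""
    return math.ceil((1.96 / margin) ** 2 * prevalence * (1 - prevalence))


def n_for_shrinkage(n_params, r2_cs, shrinkage=0.9):
    """Minimum n for an expected uniform shrinkage >= `shrinkage`.

    `r2_cs` is the anticipated Cox-Snell R-squared; `n_params` counts candidate predictor parameters.
    """
    return math.ceil(n_params / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage)))


def events_per_variable(n_events, n_params):
    """Events per candidate predictor parameter (the '10 EPV' rule of thumb uses this ratio)."""
    return n_events / n_params


print(n_for_overall_risk(0.5))      # 385
print(n_for_shrinkage(10, 0.1))     # 850
print(events_per_variable(60, 10))  # 6.0
```

With 10 parameters and an outcome prevalence of 0.1, the 850 participants required by the shrinkage criterion would yield about 85 events (8.5 EPV), illustrating why the review treats the EPV rule and the calculated minimum sample size as separate checks.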

Subject(s)
Models, Statistical; Research Personnel; Humans; Prognosis; PubMed
10.
Age Ageing ; 52(3)2023 03 01.
Article in English | MEDLINE | ID: mdl-36995136

ABSTRACT

BACKGROUND: Alzheimer's disease (AD) is the most common cause of dementia, and this progressive neurological disorder is associated with substantial mortality and morbidity. We aimed to report the burden of AD and other types of dementia in the Middle East and North Africa (MENA) region, by age, sex and sociodemographic index (SDI), for the period 1990-2019. METHODS: Publicly accessible data on the prevalence, deaths and disability-adjusted life years (DALYs) due to AD and other types of dementia were retrieved from the Global Burden of Disease 2019 project for all MENA countries from 1990 to 2019. RESULTS: In 2019, the age-standardised point prevalence of dementia was 777.6 per 100,000 population in MENA, which was 3.0% higher than in 1990. The age-standardised death and DALY rates of dementia were 25.5 and 387.0 per 100,000, respectively. In 2019, the highest DALY rate was observed in Afghanistan and the lowest in Egypt. That same year, the age-standardised point prevalence, death and DALY rates increased with advancing age and were higher for females in all age groups. From 1990 to 2019, the DALY rate of dementia decreased with increasing SDI up to 0.4, then increased slightly up to an SDI of 0.75, followed by a decrease for the remaining SDI levels. CONCLUSIONS: The point prevalence of AD and other types of dementia has increased over the past three decades, and in 2019 the corresponding regional burden was higher than the global average.

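This entry and the other Global Burden of Disease entries below report age-standardised rates per 100,000 and percentage changes since 1990. Direct age standardisation weights each age-specific rate by the share of that age group in a standard population. GBD uses its own world standard population; the three age bands and weights below are invented for illustration, and the function names are hypothetical.

```python
def age_standardised_rate(cases, population, standard_weights, per=100_000):
    """Directly age-standardised rate per `per` population.

    Each age-specific rate (cases / population) is weighted by the standard
    population's share in that age band; the weights must sum to 1.
    """
    if abs(sum(standard_weights) - 1.0) > 1e-9:
        raise ValueError("standard weights must sum to 1")
    rates = [c / p for c, p in zip(cases, population)]
    return per * sum(w * r for w, r in zip(standard_weights, rates))


def percent_change(new, old):
    """Relative change from `old` to `new`, in percent."""
    return 100.0 * (new - old) / old


# Hypothetical young, middle and old age bands with made-up standard weights
rate = age_standardised_rate(
    cases=[10, 50, 200],
    population=[100_000, 50_000, 20_000],
    standard_weights=[0.5, 0.3, 0.2],
)
print(round(rate, 1))  # 235.0
```

Because every population is weighted by the same standard age structure, an age-standardised rate can fall over time even while case counts rise with population ageing, which is why the entries report counts and age-standardised rates separately.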

Subject(s)
Alzheimer Disease; Female; Humans; Quality-Adjusted Life Years; Alzheimer Disease/diagnosis; Alzheimer Disease/epidemiology; Global Burden of Disease; Prevalence; Africa, Northern/epidemiology; Middle East/epidemiology; Global Health
11.
Int J Eat Disord ; 56(2): 394-406, 2023 02.
Article in English | MEDLINE | ID: mdl-36301044

ABSTRACT

OBJECTIVE: We aimed to report the burden of bulimia nervosa (BN) in the Middle East and North Africa (MENA) region by age, sex, and sociodemographic index (SDI), for the period 1990-2019. METHODS: Estimates of the prevalence, incidence, and disability-adjusted life-years (DALYs) attributable to BN were retrieved from the Global Burden of Disease study 2019, between 1990 and 2019, for the 21 countries in the MENA region. The counts and age-standardized rates (per 100,000) were presented, along with their corresponding 95% uncertainty intervals. RESULTS: In 2019, the estimated regional age-standardized point prevalence and incidence rates of BN were 168.3 (115.0-229.6) and 178.6 (117.0-255.6) per 100,000, which represented 22.0% (17.5-27.2) and 10.4% (7.1-14.7) increases, respectively, since 1990. Moreover, in 2019 the regional age-standardized DALY rate was 35.5 (20.6-55.5) per 100,000, which was 22.2% (16.7-28.2) higher than in 1990. In 2019, Qatar (58.6 [34.3-92.5]) and Afghanistan (18.4 [10.6-29.2]) had the highest and lowest age-standardized DALY rates, respectively. Regionally, the age-standardized point prevalence of BN peaked in the 30-34 age group and was more prevalent among women. In addition, there was a generally positive association between SDI and the burden of BN across the measurement period. DISCUSSION: In the MENA region, the burden of BN has increased over the last three decades. Cost-effective preventive measures are needed in the region, especially in the high SDI countries. PUBLIC SIGNIFICANCE: This study reports the estimated burden of BN in the MENA region and shows that its burden has increased over the last three decades.


Subject(s)
Bulimia Nervosa; Humans; Female; Bulimia Nervosa/epidemiology; Quality-Adjusted Life Years; Global Burden of Disease; Middle East/epidemiology; Africa, Northern/epidemiology; Prevalence; Incidence
12.
Qual Life Res ; 32(2): 507-518, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36169788

ABSTRACT

PURPOSE: In order to enable cost-utility analysis of shoulder pain conditions and treatments, this study aimed to develop and evaluate mapping algorithms to estimate the EQ-5D health index from the Oxford Shoulder Score (OSS) when health outcomes are only assessed with the OSS. METHODS: 5437 paired OSS and EQ-5D questionnaire responses from four national multicentre randomised controlled trials investigating different shoulder pathologies and treatments were split into training and testing samples. Separate EQ-5D-3L and EQ-5D-5L analyses were undertaken. Transfer to utility (TTU) regression (univariate linear, polynomial, spline, multivariable linear, two-part logistic-linear, tobit and adjusted limited dependent variable mixture models) and response mapping (ordered logistic regression and seemingly unrelated regression (SUR)) models were developed on the training sample. These were internally validated, and their performance evaluated on the testing sample. Model performance was evaluated over 100-fold repeated training-testing sample splits. RESULTS: For the EQ-5D-3L analysis, the multivariable linear and splines models had the lowest mean square error (MSE) of 0.0415. The SUR model had the lowest mean absolute error (MAE) of 0.136. Model performance was greatest in the mid-range and best health states, and lowest in poor health states. For the EQ-5D-5L analyses, the multivariable linear and splines models had the lowest MSE (0.0241-0.0278) while the SUR models had the lowest MAE (0.105-0.113). CONCLUSION: The developed models now allow accurate estimation of the EQ-5D health index when only the OSS responses are available as a measure of patient-reported health outcome.
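The evaluation scheme described above (fit on a training sample, score MSE and MAE on a testing sample, repeat over 100 random splits) can be sketched for the simplest transfer-to-utility models, univariate linear and polynomial regression. The 50/50 split, the synthetic OSS-to-utility relationship and the function name are assumptions for illustration; none of them come from the study.

```python
import numpy as np


def repeated_split_errors(x, y, n_repeats=100, train_frac=0.5, degree=1, seed=0):
    """Mean test-sample MSE and MAE of a polynomial mapping y ~ x over repeated random splits."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_train = int(round(train_frac * len(x)))
    mses, maes = [], []
    for _ in range(n_repeats):
        idx = rng.permutation(len(x))
        train, test = idx[:n_train], idx[n_train:]
        coefs = np.polyfit(x[train], y[train], deg=degree)  # fit on the training sample only
        residuals = y[test] - np.polyval(coefs, x[test])     # evaluate on the held-out sample
        mses.append(np.mean(residuals ** 2))
        maes.append(np.mean(np.abs(residuals)))
    return float(np.mean(mses)), float(np.mean(maes))


# Synthetic data: OSS totals range from 0 to 48; the utility relationship is invented
rng = np.random.default_rng(42)
oss = rng.integers(0, 49, size=400)
utility = np.minimum(1.0, 0.1 + 0.018 * oss + rng.normal(0, 0.15, size=400))

for degree in (1, 2):
    mse, mae = repeated_split_errors(oss, utility, degree=degree)
    print(f"degree {degree}: MSE = {mse:.4f}, MAE = {mae:.3f}")
```

MSE penalises large errors more heavily than MAE, so the two can rank models differently, as in the study, where the linear and spline models had the lowest MSE while the SUR model had the lowest MAE.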


Subject(s)
Quality of Life; Shoulder; Humans; Quality of Life/psychology; Surveys and Questionnaires; Pain; Logistic Models; Algorithms
13.
Inj Prev ; 29(6): 461-473, 2023 Nov 27.
Article in English | MEDLINE | ID: mdl-37620010

ABSTRACT

INTRODUCTION: Musculoskeletal injury (MSK-I) mitigation and prevention programmes (MSK-IMPPs) have been developed and implemented across militaries worldwide. Although programme efficacy is often reported, development and implementation details are often overlooked, limiting their scalability, sustainability and effectiveness. This scoping review aimed to identify the following in military populations: (1) barriers and facilitators to implementing and scaling MSK-IMPPs; (2) gaps in MSK-IMPP research and (3) future research priorities. METHODS: A scoping review assessed literature from inception to April 2022 that included studies on MSK-IMPP implementation and/or effectiveness in military populations. Barriers and facilitators to implementing these programmes were identified. RESULTS: Of 132 articles, most were primary research studies (90; 68.2%); the remainder were review papers (42; 31.8%). Among primary studies, 3 (3.3%) investigated only women, 62 (69%) only men and 25 (27.8%) both. Barriers included limited resources, lack of stakeholder engagement, competing military priorities and equipment-related factors. Facilitators included strong stakeholder engagement, targeted programme design, involvement/proximity of MSK-I experts, providing MSK-I mitigation education, low burden on resources and emphasising end-user acceptability. Research gaps included variability in reported MSK-I outcomes and no consensus on relevant surveillance metrics and definitions. CONCLUSION: Despite a robust body of literature, there is a dearth of information about programme implementation, specifically barriers or facilitators to success. Additionally, variability in outcomes and lack of consensus on MSK-I definitions may affect the development, implementation, evaluation and comparison of MSK-IMPPs. There is a need for international consensus on definitions and optimal data reporting elements when conducting injury risk mitigation research in the military.


Subject(s)
Military Personnel; Musculoskeletal Diseases; Male; Humans; Female; Musculoskeletal Diseases/prevention & control; Program Evaluation
14.
BMC Musculoskelet Disord ; 24(1): 59, 2023 Jan 23.
Article in English | MEDLINE | ID: mdl-36683025

ABSTRACT

BACKGROUND: Low back pain (LBP) is the most common musculoskeletal disorder globally. Providing region- and nation-specific information on the burden of low back pain is critical for local healthcare policy makers. The present study aimed to report, compare, and contextualize the prevalence, incidence and years lived with disability (YLDs) of low back pain in the Middle East and North Africa (MENA) region by age, sex and sociodemographic index (SDI), from 1990 to 2019. METHODS: Publicly available data were obtained from the Global Burden of Disease (GBD) study 2019. The burden of LBP was reported for the 21 countries located in the MENA region, from 1990 to 2019. All estimates were reported as counts and age-standardised rates per 100,000 population, together with their corresponding 95% uncertainty intervals (UIs). RESULTS: In 2019, the age-standardised point prevalence and incidence rate per 100,000 in MENA were 7668.2 (95% UI 6798.0 to 8363.3) and 3215.9 (95% UI 2838.8 to 3638.3), which were 5.8% (4.3 to 7.4) and 4.4% (3.4 to 5.5) lower than in 1990, respectively. Furthermore, the regional age-standardised YLD rate in 2019 was 862.0 (605.5 to 1153.3) per 100,000, which was 6.0% (4.2 to 7.7) lower than in 1990. In 2019, Turkey [953.6 (671.3 to 1283.5)] and Lebanon [727.2 (511.5 to 966.0)] had the highest and lowest age-standardised YLD rates, respectively. No country in the MENA region showed increases in the age-standardised prevalence, incidence or YLD rates of LBP over the measurement period. In 2019, the number of prevalent cases was highest in the 35-39 age group, with males having a higher number of cases in all age groups. In addition, the age-standardised YLD rates for males in the MENA region were higher than the global estimates in almost all age groups, in both 1990 and 2019. Finally, the burden of LBP was not associated with the level of socio-economic development during the measurement period. 
CONCLUSION: The burden attributable to LBP in the MENA region decreased slightly from 1990 to 2019. Furthermore, the burden among males was higher than the global average. Consequently, more integrated healthcare interventions are needed to more effectively alleviate the burden of low back pain in this region.


Subject(s)
Low Back Pain; Male; Humans; Low Back Pain/diagnosis; Low Back Pain/epidemiology; Prevalence; Incidence; Global Burden of Disease; Africa, Northern/epidemiology; Turkey; Global Health; Quality-Adjusted Life Years
15.
Biom J ; 65(8): e2200302, 2023 12.
Article in English | MEDLINE | ID: mdl-37466257

ABSTRACT

Clinical prediction models estimate an individual's risk of a particular health outcome. A developed model is a consequence of the development dataset and model-building strategy, including the sample size, number of predictors, and analysis method (e.g., regression or machine learning). We raise the concern that many models are developed using small datasets that lead to instability in the model and its predictions (estimated risks). We define four levels of model stability in estimated risks moving from the overall mean to the individual level. Through simulation and case studies of statistical and machine learning approaches, we show instability in a model's estimated risks is often considerable, and ultimately manifests itself as miscalibration of predictions in new data. Therefore, we recommend researchers always examine instability at the model development stage and propose instability plots and measures to do so. This entails repeating the model-building steps (those used to develop the original prediction model) in each of multiple (e.g., 1000) bootstrap samples, to produce multiple bootstrap models, and deriving (i) a prediction instability plot of bootstrap model versus original model predictions; (ii) the mean absolute prediction error (mean absolute difference between individuals' original and bootstrap model predictions), and (iii) calibration, classification, and decision curve instability plots of bootstrap models applied in the original sample. A case study illustrates how these instability assessments help reassure (or not) whether model predictions are likely to be reliable (or not), while informing a model's critical appraisal (risk of bias rating), fairness, and further validation requirements.
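The procedure described above can be sketched as follows: refit the model in each bootstrap sample, apply every bootstrap model to the original sample, and average the absolute differences from the original model's predictions (the mean absolute prediction error, MAPE). The sketch assumes unpenalised logistic regression on two predictors, fitted by Newton-Raphson, as the whole model-building strategy, and uses simulated data; in a real application every development step, such as variable selection or tuning, would be repeated inside the loop. All names are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Unpenalised logistic regression by Newton-Raphson; returns [intercept, coefficients...]."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        # Tiny ridge term only guards against a singular Hessian
        hessian = X1.T @ (X1 * (p * (1 - p))[:, None]) + 1e-8 * np.eye(X1.shape[1])
        beta = beta + np.linalg.solve(hessian, X1.T @ (y - p))
    return beta


def predict_risk(beta, X):
    return 1.0 / (1.0 + np.exp(-(beta[0] + X @ beta[1:])))


def prediction_instability(X, y, n_boot=200, seed=0):
    """Return original predictions, bootstrap-model predictions (n_boot x n) and overall MAPE."""
    rng = np.random.default_rng(seed)
    p_orig = predict_risk(fit_logistic(X, y), X)
    p_boot = np.empty((n_boot, len(y)))
    for b in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))  # resample individuals with replacement
        # Bootstrap model is applied back to the ORIGINAL sample
        p_boot[b] = predict_risk(fit_logistic(X[idx], y[idx]), X)
    mape_per_person = np.abs(p_boot - p_orig).mean(axis=0)
    return p_orig, p_boot, float(mape_per_person.mean())


# Hypothetical simulated data: true model logit(p) = -1 + 0.8*x1 - 0.5*x2
rng = np.random.default_rng(1)


def simulate(n):
    X = rng.normal(size=(n, 2))
    p = 1.0 / (1.0 + np.exp(-(-1 + 0.8 * X[:, 0] - 0.5 * X[:, 1])))
    return X, rng.binomial(1, p).astype(float)


for n in (100, 2000):
    X, y = simulate(n)
    _, _, mape = prediction_instability(X, y, n_boot=100)
    print(f"n = {n}: MAPE = {mape:.3f}")
```

A prediction instability plot, as proposed above, would scatter each row of `p_boot` against `p_orig`; the smaller development sample should show a visibly wider band and a larger MAPE.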


Subject(s)
Machine Learning; Models, Statistical; Humans; Prognosis; Computer Simulation
16.
J Strength Cond Res ; 37(5): 1057-1063, 2023 May 01.
Article in English | MEDLINE | ID: mdl-36730571

ABSTRACT

ABSTRACT: Bullock, GS, Shanley, E, Thigpen, CA, Arden, NK, Noonan, TK, Kissenberth, MJ, Wyland, DJ, and Collins, GS. Improving clinical utility of real-world prediction models: updating through recalibration. J Strength Cond Res 37(5): 1057-1063, 2023-Prediction models can aid clinicians in identifying at-risk athletes. However, sport and clinical practice patterns continue to change, causing predictive drift and potentially suboptimal prediction model performance. Thus, there is a need to temporally recalibrate previously developed baseball arm injury models. The purpose of this study was to perform temporal recalibration on a previously developed injury prediction model and assess model performance in professional baseball pitchers. An arm injury prediction model was developed on data from a prospective cohort of minor league pitchers from 2009 to 2019. Data for the 2015-2019 seasons were used for temporal recalibration and model performance assessment. Temporal recalibration consisted of intercept-only recalibration and full model redevelopment. Model performance was investigated by assessing Nagelkerke's R-square, calibration in the large, calibration, and discrimination. Decision curves compared the original model, the temporally recalibrated model, and current best evidence-based practice. One hundred seventy-eight pitchers participated in the 2015-2019 seasons, with 1.63 arm injuries per 1,000 athlete exposures. The temporally recalibrated intercept model demonstrated the best discrimination (0.81 [95% confidence interval [CI]: 0.73, 0.88]) and R-square (0.32) compared with the original model (0.74 [95% CI: 0.69, 0.80]; R-square: 0.32) and the redeveloped model (0.80 [95% CI: 0.73, 0.87]; R-square: 0.30). The temporally recalibrated intercept model demonstrated an improved net benefit of 0.34 compared with current best evidence-based practice. The temporally recalibrated intercept model demonstrated the best model performance and clinical utility. 
Updating prediction models can account for changes in sport training over time and improve professional baseball arm injury outcomes.
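Intercept-only recalibration keeps the original model's linear predictor (LP) fixed and re-estimates only an intercept correction `a` in logit(p) = a + LP on the new data; decision curves then compare net benefit across risk thresholds. The sketch below fits `a` by one-parameter Newton-Raphson and uses the standard decision-curve net benefit, TP/n - FP/n × pt/(1 - pt). The simulated linear predictor, the drift of -0.7 and the function names are hypothetical, not the study's model or data.

```python
import numpy as np


def recalibrate_intercept(linear_predictor, y, n_iter=25):
    """Estimate intercept correction `a` in logit(p) = a + LP, keeping LP fixed (an offset)."""
    lp = np.asarray(linear_predictor, dtype=float)
    y = np.asarray(y, dtype=float)
    a = 0.0
    for _ in range(n_iter):  # one-parameter Newton-Raphson on the log-likelihood
        p = 1.0 / (1.0 + np.exp(-(a + lp)))
        a += np.sum(y - p) / np.sum(p * (1 - p))
    return a


def net_benefit(risk, y, threshold):
    """Decision-curve net benefit of intervening when predicted risk >= threshold."""
    risk = np.asarray(risk, dtype=float)
    y = np.asarray(y, dtype=int)
    treat = risk >= threshold
    n = len(y)
    tp = np.sum(treat & (y == 1))
    fp = np.sum(treat & (y == 0))
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical new-season data in which baseline risk has drifted downwards
rng = np.random.default_rng(0)
lp = rng.normal(-2.0, 1.0, size=500)                  # original model's linear predictor
y = rng.binomial(1, 1 / (1 + np.exp(-(lp - 0.7))))    # simulated outcomes after drift
a = recalibrate_intercept(lp, y)
p_updated = 1 / (1 + np.exp(-(a + lp)))
print(f"intercept correction: {a:.2f}")
print(f"mean predicted {p_updated.mean():.3f} vs observed {y.mean():.3f}")
print(f"net benefit at 10% threshold: {net_benefit(p_updated, y, 0.10):.4f}")
```

At the maximum-likelihood intercept the mean predicted risk equals the observed event rate, which is what "calibration in the large" checks; because the update shifts all predictions equally on the logit scale, it leaves the ranking of pitchers, and therefore the c-statistic, unchanged.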


Subject(s)
Arm Injuries , Baseball , Humans , Prospective Studies , Baseball/injuries , Athletes , Seasons
17.
Cancer ; 128(9): 1840-1852, 2022 05 01.
Article in English | MEDLINE | ID: mdl-35239973

ABSTRACT

BACKGROUND: Alcohol consumption is a risk factor for a number of communicable and non-communicable diseases, including several types of cancer. This article reports the burden of cancers attributable to alcohol consumption by age, sex, location, sociodemographic index (SDI), and cancer type from 1990 to 2019. METHODS: The Comparative Risk Assessment approach was used in the 2019 Global Burden of Disease study to report the burden of cancers attributable to alcohol consumption between 1990 and 2019. RESULTS: In 2019, an estimated 494.7 thousand cancer deaths (95% uncertainty interval [UI], 439.7 to 554.1) and 13.0 million cancer disability-adjusted life-years (DALYs; 95% UI, 11.6 to 14.5) globally were attributable to alcohol consumption. The alcohol-attributable DALYs were much higher in men (10.5 million; 95% UI, 9.2 to 11.8) than in women (2.5 million; 95% UI, 2.2 to 2.9). The global age-standardized death and DALY rates of cancers attributable to alcohol decreased by 14.7% (95% UI, 6.4% to 23%) and 18.1% (95% UI, 9.2% to 26.5%), respectively, over the study period. Central Europe had the highest age-standardized death rate attributable to alcohol consumption (10.3; 95% UI, 8.7 to 12.0). Moreover, there was an overall positive association between SDI and the regional age-standardized DALY rate for alcohol-attributable cancers. CONCLUSIONS: Despite decreases in age-standardized deaths and DALYs, substantial numbers of cancer deaths and DALYs are still attributable to alcohol consumption. Because the burden is higher in males, the elderly, and developed regions (based on SDI), these groups and regions should be prioritized in any prevention programs.
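Age-standardized rates, which the trends in this abstract rely on, remove differences in age structure so that regions and years can be compared fairly. A minimal Python sketch of direct standardization follows, assuming age-specific counts and populations are already available; the helper names and the standard-population weights in the example are hypothetical placeholders, not the GBD world standard population. The relative change over a period, as in the reported decline in the age-standardized death rate, is (new - old)/old x 100.

```python
from typing import Sequence


def age_standardized_rate(cases: Sequence[float], population: Sequence[float],
                          standard_weights: Sequence[float],
                          per: float = 100_000) -> float:
    """Direct standardization: weight each age-specific rate by the share of
    a reference (standard) population falling in that age group."""
    if not (len(cases) == len(population) == len(standard_weights)):
        raise ValueError("need one entry per age group in every input")
    total_weight = sum(standard_weights)
    rate = sum((c / p) * (w / total_weight)
               for c, p, w in zip(cases, population, standard_weights))
    return rate * per


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means a decrease)."""
    return (new - old) / old * 100.0


# Hypothetical two-age-group example with equal standard weights.
asr_start = age_standardized_rate([10, 40], [100_000, 100_000], [0.5, 0.5])
asr_end = age_standardized_rate([8, 36], [100_000, 100_000], [0.5, 0.5])
print(asr_start, asr_end, percent_change(asr_start, asr_end))
```

Because both years are weighted by the same standard population, a change in the standardized rate reflects changes in age-specific rates rather than population ageing.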


Subject(s)
Disability-Adjusted Life Years , Neoplasms , Aged , Alcohol Drinking/adverse effects , Alcohol Drinking/epidemiology , Female , Global Burden of Disease , Global Health , Humans , Male , Neoplasms/epidemiology , Quality-Adjusted Life Years , Risk Factors
18.
Br J Cancer ; 126(4): 533-550, 2022 03.
Article in English | MEDLINE | ID: mdl-34703006

ABSTRACT

Apart from high-risk scenarios such as the presence of highly penetrant genetic mutations, breast screening typically comprises mammography or tomosynthesis strategies defined by age. However, age-based screening ignores the range of breast cancer risks that individual women may possess and is antithetical to the ambitions of personalised early detection. Whilst screening mammography reduces breast cancer mortality, this comes at the risk of potentially significant harms, including overdiagnosis with overtreatment and psychological morbidity associated with false positives. In risk-stratified screening, individualised risk assessment may inform screening intensity/interval, starting age, imaging modality, or even decisions not to screen. However, clear evidence for its benefits and harms needs to be established. In this scoping review, the authors summarise the established and emerging evidence regarding several critical dependencies for successful risk-stratified breast screening: risk prediction model performance, epidemiological studies, retrospective clinical evaluations, health economic evaluations and qualitative research on feasibility and acceptability. Family history, breast density or reproductive factors are not, on their own, suitable for precisely estimating risk, and risk prediction models increasingly incorporate combinations of demographic, clinical, genetic and imaging-related parameters. Clinical evaluations of risk-stratified screening are currently limited. Epidemiological evidence is sparse, and randomised trials began only in recent years.


Subject(s)
Breast Neoplasms/diagnostic imaging , Genetic Predisposition to Disease/genetics , Mammography/methods , Breast Neoplasms/genetics , Clinical Decision-Making , Early Detection of Cancer , Female , Humans , Practice Guidelines as Topic , Retrospective Studies , Sensitivity and Specificity
19.
Radiology ; 304(1): 50-62, 2022 07.
Article in English | MEDLINE | ID: mdl-35348381

ABSTRACT

Background Patients with fractures are a common emergency presentation and may be misdiagnosed at radiologic imaging. An increasing number of studies apply artificial intelligence (AI) techniques to fracture detection as an adjunct to clinician diagnosis. Purpose To perform a systematic review and meta-analysis comparing the diagnostic performance in fracture detection between AI and clinicians in peer-reviewed publications and the gray literature (ie, articles published on preprint repositories). Materials and Methods Multiple electronic databases were searched for studies published between January 2018 and July 2020 (search updated June 2021). Any primary research study that developed and/or validated AI for fracture detection at any imaging modality was included; studies that evaluated image segmentation algorithms were excluded. Pooled sensitivity and specificity were calculated by meta-analysis with a hierarchical model. Risk of bias was assessed by using a modified Prediction Model Study Risk of Bias Assessment Tool (PROBAST) checklist. Results Included for analysis were 42 studies, with 115 contingency tables extracted from 32 studies (55 061 images). Thirty-seven studies identified fractures on radiographs and five studies identified fractures on CT images. For internal validation test sets, the pooled sensitivity was 92% (95% CI: 88, 93) for AI and 91% (95% CI: 85, 95) for clinicians, and the pooled specificity was 91% (95% CI: 88, 93) for AI and 92% (95% CI: 89, 92) for clinicians. For external validation test sets, the pooled sensitivity was 91% (95% CI: 84, 95) for AI and 94% (95% CI: 90, 96) for clinicians, and the pooled specificity was 91% (95% CI: 81, 95) for AI and 94% (95% CI: 91, 95) for clinicians. There were no statistically significant differences between clinician and AI performance. Twenty-two of 42 studies (52%) were judged to have high risk of bias. 
Meta-regression identified multiple sources of heterogeneity in the data, including risk of bias and fracture type. Conclusion Artificial intelligence (AI) and clinicians had comparable reported diagnostic performance in fracture detection, suggesting that AI technology holds promise as a diagnostic adjunct in future clinical practice. Clinical trial registration no. CRD42020186641. © RSNA, 2022. Online supplemental material is available for this article. See also the editorial by Cohen and McInnes in this issue.
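Each contingency table extracted in a diagnostic accuracy review reduces to four counts, from which sensitivity and specificity follow directly. The Python sketch below shows only that per-table step, with a Wilson score interval as one common choice for a single table's 95% CI; it does not implement the hierarchical model the authors used for pooling, and the class name, function name and counts are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """2x2 table for one test set: reference standard vs. AI (or clinician) call."""
    tp: int  # fracture present, called positive
    fp: int  # fracture absent, called positive
    fn: int  # fracture present, called negative
    tn: int  # fracture absent, called negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a single binomial proportion."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


table = ContingencyTable(tp=90, fp=8, fn=10, tn=92)
print(table.sensitivity(), table.specificity())
print(wilson_interval(table.tp, table.tp + table.fn))
```

Pooling many such tables is where the hierarchical (bivariate) model matters, since it accounts for the correlation between sensitivity and specificity across studies and for between-study heterogeneity.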


Subject(s)
Artificial Intelligence , Fractures, Bone , Algorithms , Fractures, Bone/diagnostic imaging , Humans , Sensitivity and Specificity
20.
Hum Reprod ; 37(8): 1919-1931, 2022 07 30.
Article in English | MEDLINE | ID: mdl-35586937

ABSTRACT

STUDY QUESTION: What is the global, regional and national burden of polycystic ovary syndrome (PCOS), by age and socio-demographic index (SDI), over the period 1990-2019? SUMMARY ANSWER: In 2019, the global age-standardized point prevalence, incidence and years lived with disability (YLD) rates of PCOS were 1677.8, 59.8 and 14.7 per 100 000 population, representing increases of 30.4%, 29.5% and 29.9% since 1990, respectively. WHAT IS KNOWN ALREADY: Data from the Global Burden of Disease (GBD) study 2017 showed that the global age-standardized PCOS incidence rate increased by 1.45% over the period 1990-2017. STUDY DESIGN, SIZE, DURATION: A systematic analysis of PCOS prevalence, incidence and YLDs across 204 countries and territories was performed. PARTICIPANTS/MATERIALS, SETTING, METHODS: Data on the point prevalence, annual incidence and YLDs due to PCOS were retrieved from the GBD study 2019 for 204 countries and territories from 1990 to 2019. The counts and age-standardized rates (per 100 000) are presented, along with their corresponding 95% uncertainty intervals (UIs). MAIN RESULTS AND THE ROLE OF CHANCE: In 2019, the global age-standardized point prevalence and annual incidence rates for PCOS were 1677.8 (95% UI: 1166.0 to 2192.4) and 59.8 (95% UI: 41.7 to 78.9) per 100 000, which represents a 30.4% and 29.5% increase since 1990, respectively. Moreover, the global age-standardized YLD rate in 2019 was 14.7 (6.3-29.5), an increase of 29.9% since 1990. In 2019, Italy (7897.0), Japan (6298.7) and New Zealand (5419.1) had the highest estimated age-standardized point prevalences of PCOS. Globally, the number of prevalent cases and the point prevalence of PCOS peaked in the 25-29 years and 40-44 years age groups, respectively. Positive associations were found between the burden of PCOS and the SDI at the regional and national levels. LIMITATIONS, REASONS FOR CAUTION: Variation in how PCOS was defined is a major limitation that prevents valid comparisons between different regions. 
WIDER IMPLICATIONS OF THE FINDINGS: Globally, the burden of PCOS has increased at an alarming rate, making it a major public health concern. Increasing public awareness about this common condition, improving management options and increasing support to reduce factors that lead to further complications need to be public health priorities. STUDY FUNDING/COMPETING INTEREST(S): The Bill and Melinda Gates Foundation, which was not involved in any way in the preparation of this manuscript, funded the GBD study. The Shahid Beheshti University of Medical Sciences, Tehran, Iran (Grant No. 28709) also supported the present report. The authors declare no competing interests. TRIAL REGISTRATION NUMBER: N/A.
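In the GBD framework, years lived with disability are obtained by multiplying the prevalence of each health state (sequela) by its disability weight, a value between 0 (full health) and 1 (equivalent to death), and summing over sequelae. The Python sketch below shows only that step, assuming hypothetical sequela counts and disability weights; it omits GBD's comorbidity adjustment, age standardization and uncertainty propagation, and it is not the GBD code.

```python
from typing import Sequence


def years_lived_with_disability(prevalent_cases: Sequence[float],
                                disability_weights: Sequence[float]) -> float:
    """Prevalence-based YLDs: sum over sequelae of cases x disability weight."""
    if len(prevalent_cases) != len(disability_weights):
        raise ValueError("need one disability weight per sequela")
    for w in disability_weights:
        if not 0.0 <= w <= 1.0:
            raise ValueError("disability weights must lie in [0, 1]")
    return sum(c * w for c, w in zip(prevalent_cases, disability_weights))


def rate_per_100k(count: float, population: float) -> float:
    """Express a count as a crude rate per 100 000 population."""
    return count / population * 100_000


# Hypothetical sequelae: the counts and weights are placeholders, not GBD inputs.
yld = years_lived_with_disability([1_000, 500], [0.02, 0.10])
print(yld, rate_per_100k(yld, 1_000_000))
```

Because YLDs scale with prevalence, a rise in point prevalence of a similar magnitude to the rise in the YLD rate, as reported here, is what this prevalence-based formulation would lead one to expect.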


Subject(s)
Polycystic Ovary Syndrome , Disability-Adjusted Life Years , Female , Global Burden of Disease , Global Health , Humans , Incidence , Iran , Polycystic Ovary Syndrome/complications , Polycystic Ovary Syndrome/epidemiology , Prevalence