Search | BVS España Search Portal

1.

Metrics reloaded: recommendations for image analysis validation.

Maier-Hein, Lena; Reinke, Annika; Godau, Patrick; Tizabi, Minu D; Buettner, Florian; Christodoulou, Evangelia; Glocker, Ben; Isensee, Fabian; Kleesiek, Jens; Kozubek, Michal; Reyes, Mauricio; Riegler, Michael A; Wiesenfarth, Manuel; Kavur, A Emre; Sudre, Carole H; Baumgartner, Michael; Eisenmann, Matthias; Heckmann-Nötzel, Doreen; Rädsch, Tim; Acion, Laura; Antonelli, Michela; Arbel, Tal; Bakas, Spyridon; Benis, Arriel; Blaschko, Matthew B; Cardoso, M Jorge; Cheplygina, Veronika; Cimini, Beth A; Collins, Gary S; Farahani, Keyvan; Ferrer, Luciana; Galdran, Adrian; van Ginneken, Bram; Haase, Robert; Hashimoto, Daniel A; Hoffman, Michael M; Huisman, Merel; Jannin, Pierre; Kahn, Charles E; Kainmueller, Dagmar; Kainz, Bernhard; Karargyris, Alexandros; Karthikesalingam, Alan; Kofler, Florian; Kopp-Schneider, Annette; Kreshuk, Anna; Kurc, Tahsin; Landman, Bennett A; Litjens, Geert; Madani, Amin.

Nat Methods ; 21(2): 195-212, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38347141

ABSTRACT

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint, a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.

Subject(s)

Algorithms; Image Processing, Computer-Assisted; Machine Learning; Semantics

2.

Prognostic Biomarkers in Kidney Transplantation: A Systematic Review and Critical Appraisal.

Raynaud, Marc; Al-Awadhi, Solaf; Louis, Kevin; Zhang, Huanxi; Su, Xiaojun; Goutaudier, Valentin; Wang, Jiali; Demir, Zeynep; Wei, Yongcheng; Truchot, Agathe; Bouquegneau, Antoine; Del Bello, Arnaud; Bailly, Élodie; Lombardi, Yannis; Maanaoui, Mehdi; Giarraputo, Alessia; Naser, Sofia; Divard, Gillian; Aubert, Olivier; Murad, Mohammad Hassan; Wang, Changxi; Liu, Longshan; Bestard, Oriol; Naesens, Maarten; Friedewald, John J; Lefaucheur, Carmen; Riella, Leonardo; Collins, Gary; Ioannidis, John P A; Loupy, Alexandre.

J Am Soc Nephrol ; 35(2): 177-188, 2024 Feb 01.

Article in English | MEDLINE | ID: mdl-38053242

ABSTRACT

SIGNIFICANCE STATEMENT: Why are there so few biomarkers accepted by health authorities and implemented in clinical practice, despite the high and growing number of biomarker studies in medical research? In this meta-epidemiological study, including 804 studies that were critically appraised by expert reviewers, the authors have identified all prognostic kidney transplant biomarkers and showed overall suboptimal study designs, methods, results, interpretation, reproducible research standards, and transparency. The authors also demonstrated for the first time that the limited number of studies challenged the added value of their candidate biomarkers against standard-of-care routine patient monitoring parameters. Most biomarker studies tended to be single-center, retrospective studies with a small number of patients and clinical events. Less than 5% of the studies performed an external validation. The authors also showed the poor transparency reporting and identified a data beautification phenomenon. These findings suggest that there is much wasted research effort in transplant biomarker medical research and highlight the need to produce more rigorous studies so that more biomarkers may be validated and successfully implemented in clinical practice. BACKGROUND: Despite the increasing number of biomarker studies published in the transplant literature over the past 20 years, demonstrations of their clinical benefit and their implementation in routine clinical practice are lacking. We hypothesized that suboptimal design, data, methodology, and reporting might contribute to this phenomenon. METHODS: We formed a consortium of experts in systematic reviews, nephrologists, methodologists, and epidemiologists. A systematic literature search was performed in PubMed, Embase, Scopus, Web of Science, and Cochrane Library between January 1, 2005, and November 12, 2022 (PROSPERO ID: CRD42020154747).
All English language, original studies investigating the association between a biomarker and kidney allograft outcome were included. The final set of publications was assessed by expert reviewers. After data collection, two independent reviewers randomly evaluated the inconsistencies for 30% of the references for each reviewer. If more than 5% of inconsistencies were observed for one given reviewer, a re-evaluation was conducted for all the references of the reviewer. The biomarkers were categorized according to their type and the biological milieu from which they were measured. The study characteristics related to the design, methods, results, and their interpretation were assessed, as well as reproducible research practices and transparency indicators. RESULTS: A total of 7372 publications were screened and 804 studies met the inclusion criteria. A total of 1143 biomarkers were assessed among the included studies from blood ( n =821, 71.8%), intragraft ( n =169, 14.8%), or urine ( n =81, 7.1%) compartments. The number of studies significantly increased, with a median, yearly number of 31.5 studies (interquartile range [IQR], 23.8-35.5) between 2005 and 2012 and 57.5 (IQR, 53.3-59.8) between 2013 and 2022 ( P < 0.001). A total of 655 studies (81.5%) were retrospective, while 595 (74.0%) used data from a single center. The median number of patients included was 232 (IQR, 96-629) with a median follow-up post-transplant of 4.8 years (IQR, 3.0-6.2). Only 4.7% of studies were externally validated. A total of 346 studies (43.0%) did not adjust their biomarker for key prognostic factors, while only 3.1% of studies adjusted the biomarker for standard-of-care patient monitoring factors. Data sharing, code sharing, and registration occurred in 8.8%, 1.1%, and 4.6% of studies, respectively. A total of 158 studies (20.0%) emphasized the clinical relevance of the biomarker, despite the reported nonsignificant association of the biomarker with the outcome measure. 
A total of 288 studies assessed rejection as an outcome. We showed that these rejection studies shared the same characteristics as other studies. CONCLUSIONS: Biomarker studies in kidney transplantation lack validation, rigorous design and methodology, accurate interpretation, and transparency. Higher standards are needed in biomarker research to prove the clinical utility and support clinical use.

Subject(s)

Kidney Transplantation; Humans; Prognosis; Retrospective Studies; Systematic Reviews as Topic; Biomarkers

3.

Competing and Noncompeting Risk Models for Predicting Kidney Allograft Failure.

Truchot, Agathe; Raynaud, Marc; Helanterä, Ilkka; Aubert, Olivier; Kamar, Nassim; Divard, Gillian; Astor, Brad; Legendre, Christophe; Hertig, Alexandre; Buchler, Matthias; Crespo, Marta; Akalin, Enver; Pujol, Gervasio Soler; Ribeiro de Castro, Maria Cristina; Matas, Arthur J; Ulloa, Camilo; Jordan, Stanley C; Huang, Edmund; Juric, Ivana; Basic-Jukic, Nikolina; Coemans, Maarten; Naesens, Maarten; Friedewald, John J; Silva, Helio Tedesco; Lefaucheur, Carmen; Segev, Dorry L; Collins, Gary S; Loupy, Alexandre.

J Am Soc Nephrol ; 2024 Oct 16.

Article in English | MEDLINE | ID: mdl-39412887

ABSTRACT

BACKGROUND: Prognostic models are becoming increasingly relevant in clinical trials as potential surrogate endpoints, and for patient management as clinical decision support tools. However, the impact of competing risks on model performance remains poorly investigated. We aimed to carefully assess the performance of competing risk and noncompeting risk models in the context of kidney transplantation, where allograft failure and death with a functioning graft are two competing outcomes. METHODS: We included 11,046 kidney transplant recipients enrolled in 10 countries. We developed prediction models for long-term kidney graft failure prediction, without accounting (i.e., censoring) and accounting for the competing risk of death with a functioning graft, using Cox, Fine-Gray, and cause-specific Cox regression models. To this aim, we followed a detailed and transparent analytical framework for competing and noncompeting risk modelling, and carefully assessed the models' development, stability, discrimination, calibration, overall fit, clinical utility, and generalizability in external validation cohorts and subpopulations. More than 15 metrics were used to provide an exhaustive assessment of model performance. RESULTS: Among 11,046 recipients in the derivation and validation cohorts, 1,497 (14%) lost their graft and 1,003 (9%) died with a functioning graft after a median follow-up post-risk evaluation of 4.7 years (IQR 2.7-7.0). The cumulative incidence of graft loss was similarly estimated by Kaplan-Meier and Aalen-Johansen methods (17% versus 16% in the derivation cohort). Cox and competing risk models showed similar and stable risk estimates for predicting long-term graft failure (average mean absolute prediction error of 0.0140, 0.0138 and 0.0135 for Cox, Fine-Gray, and cause-specific Cox models, respectively). Discrimination and overall fit were comparable in the validation cohorts, with concordance index ranging from 0.76 to 0.87. 
Across various subpopulations and clinical scenarios, the models performed well and similarly, although in some high-risk groups (such as donors over 65 years old), the findings suggest a trend towards moderately improved calibration when using a competing risk approach. CONCLUSIONS: Competing and noncompeting risk models performed similarly in predicting long-term kidney graft failure.

4.

Confounder Adjustment Using the Disease Risk Score: A Proposal for Weighting Methods.

Nguyen, Tri-Long; Debray, Thomas P A; Youn, Bora; Simoneau, Gabrielle; Collins, Gary S.

Am J Epidemiol ; 193(2): 377-388, 2024 Feb 05.

Article in English | MEDLINE | ID: mdl-37823269

ABSTRACT

Propensity score analysis is a common approach to addressing confounding in nonrandomized studies. Its implementation, however, requires important assumptions (e.g., positivity). The disease risk score (DRS) is an alternative confounding score that can relax some of these assumptions. Like the propensity score, the DRS summarizes multiple confounders into a single score, on which conditioning by matching allows the estimation of causal effects. However, matching relies on arbitrary choices for pruning out data (e.g., matching ratio, algorithm, and caliper width) and may be computationally demanding. Alternatively, weighting methods, common in propensity score analysis, are easy to implement and may entail fewer choices, yet none have been developed for the DRS. Here we present 2 weighting approaches: One derives directly from inverse probability weighting; the other, named target distribution weighting, relates to importance sampling. We empirically show that inverse probability weighting and target distribution weighting display performance comparable to matching techniques in terms of bias but outperform them in terms of efficiency (mean squared error) and computational speed (up to >870 times faster in an illustrative study). We illustrate implementation of the methods in 2 case studies where we investigate placebo treatments for multiple sclerosis and administration of aspirin in stroke patients.

Subject(s)

Stroke; Humans; Propensity Score; Risk Factors; Bias; Causality; Stroke/epidemiology; Stroke/etiology; Computer Simulation

5.

Cyanide Trapping of Iminium Ion Reactive Metabolites: Implications for Clinical Hepatotoxicity.

Miao, Xiusheng; Dear, Gordon J; Beaumont, Claire; Vitulli, Giovanni; Collins, Gary; Gorycki, Peter D; Harrell, Andrew W; Sakatis, Melanie Z.

Chem Res Toxicol ; 37(5): 698-710, 2024 May 20.

Article in English | MEDLINE | ID: mdl-38619497

ABSTRACT

Reactive metabolite formation is a major mechanism of hepatotoxicity. Although reactive electrophiles can be soft or hard in nature, screening strategies have generally focused on the use of glutathione trapping assays to screen for soft electrophiles, with many data sets available to support their use. The use of a similar assay for hard electrophiles using cyanide as the trapping agent is far less common, and there is a lack of studies with sufficient supporting data. Using a set of 260 compounds with a defined hepatotoxicity status by the FDA, a comprehensive literature search yielded cyanide trapping data on an unbalanced set of 20 compounds that were all clinically hepatotoxic. Thus, a further set of 19 compounds was selected to generate cyanide trapping data, resulting in a more balanced data set of 39 compounds. Analysis of the data demonstrated that the cyanide trapping assay had high specificity (92%) and a positive predictive value (83%) such that hepatotoxic compounds would be confidently flagged. Structural analysis of the adducts formed revealed artifactual methylated cyanide adducts to also occur, highlighting the importance of full structural identification to confirm the nature of the adduct formed. The assay was demonstrated to add the most value for compounds containing typical structural alerts for hard electrophile formation: half of the severe hepatotoxins with these structural alerts formed cyanide adducts, while none of the severe hepatotoxins with no relevant structural alerts formed adducts. The assay conditions used included cytosolic enzymes (e.g., aldehyde oxidase) and an optimized cyanide concentration to minimize the inhibition of cytochrome P450 enzymes by cyanide. Based on the demonstrated added value of this assay, it is to be initiated for use at GSK as part of the integrated hepatotoxicity strategy, with its performance being reviewed periodically as more data is generated.

Subject(s)

Chemical and Drug Induced Liver Injury; Cyanides; Cyanides/metabolism; Cyanides/chemistry; Humans; Chemical and Drug Induced Liver Injury/metabolism; Chemical and Drug Induced Liver Injury/etiology; Imines/chemistry; Imines/metabolism; Liver/metabolism; Liver/drug effects; Molecular Structure

6.

External validation of models to estimate gestational age in the second and third trimester using ultrasound: A prospective multicentre observational study.

Self, Alice; Schlussel, Michael; Collins, Gary S; Dhombres, Ferdinand; Fries, Nicolas; Haddad, Georges; Salomon, Laurent J; Massoud, Mona; Papageorghiou, Aris T.

BJOG ; 2024 Aug 08.

Article in English | MEDLINE | ID: mdl-39118202

ABSTRACT

OBJECTIVES: Accurate assessment of gestational age (GA) is important at both individual and population levels. The most accurate way to estimate GA in women who book late in pregnancy is unknown. The aim of this study was to externally validate the accuracy of equations for GA estimation in late pregnancy and to identify the best equation for estimating GA in women who do not receive an ultrasound scan until the second or third trimester. DESIGN: This was a prospective, observational cross-sectional study. SETTING: 57 prenatal care centres, France. PARTICIPANTS: Women with a singleton pregnancy and a previous 11-14-week dating scan that gave the observed GA were recruited over an 8-week period. They underwent a standardised ultrasound examination at one time point during the pregnancy (15-43 weeks), measuring 12 foetal biometric parameters that have previously been identified as useful for GA estimation. MAIN OUTCOME MEASURES: A total of 189 equations that estimate GA based on foetal biometry were examined and compared with GA estimation based on foetal CRL. Comparisons between the observed GA and the estimated GA were made using R², calibration slope and intercept. RMSE, mean difference and 95% range of error were also calculated. RESULTS: A total of 2741 pregnant women were examined. After exclusions, 2339 participants were included. In the 20 best performing equations, the intercept ranged from -0.22 to 0.30, the calibration slope from 0.96 to 1.03 and the RMSE from 0.67 to 0.87. Overall, multiparameter models outperformed single-parameter models. Both the 95% range of error and mean difference increased with gestation. Commonly used models based on measurement of the head circumference alone were not amongst the best performing models and were associated with higher 95% error and mean difference. CONCLUSIONS: We provide strong evidence that GA-specific equations based on multiparameter models should be used to estimate GA in late pregnancy.
However, as all methods of GA assessment in late pregnancy are associated with large prediction intervals, efforts to improve access to early antenatal ultrasound must remain a priority. TRIAL REGISTRATION: The proposal for this study and the corresponding methodological review was registered on PROSPERO international register of systematic reviews (registration number: CRD4201913776).
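
The comparison measures named in this abstract (calibration slope, calibration intercept and RMSE) can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes the slope and intercept come from an ordinary least-squares regression of observed GA on estimated GA, and the function names and example values are hypothetical.

```python
import math


def calibration_line(estimated, observed):
    """Least-squares slope and intercept of observed GA regressed on estimated GA.

    Assumption: calibration is summarised by a simple linear fit; a slope of 1
    and an intercept of 0 would indicate perfect calibration.
    """
    n = len(estimated)
    mean_e = sum(estimated) / n
    mean_o = sum(observed) / n
    sxx = sum((e - mean_e) ** 2 for e in estimated)
    sxy = sum((e - mean_e) * (o - mean_o) for e, o in zip(estimated, observed))
    slope = sxy / sxx
    intercept = mean_o - slope * mean_e
    return slope, intercept


def rmse(estimated, observed):
    """Root mean squared error between estimated and observed GA."""
    squared = [(e - o) ** 2 for e, o in zip(estimated, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical values in weeks, for illustration only.
estimated_ga = [20.0, 25.0, 30.0, 35.0]
observed_ga = [20.5, 25.5, 30.5, 35.5]
slope, intercept = calibration_line(estimated_ga, observed_ga)
error = rmse(estimated_ga, observed_ga)
```

In this toy example every estimate is half a week too low, so the fit has a slope of 1, an intercept of 0.5 and an RMSE of 0.5 weeks.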

7.

ChatGPT: standard reporting guidelines for responsible use.

Cacciamani, Giovanni E; Collins, Gary S; Gill, Inderbir S.

Nature ; 618(7964): 238, 2023 Jun.

Article in English | MEDLINE | ID: mdl-37280286

8.

Evidence of questionable research practices in clinical prediction models.

White, Nicole; Parsons, Rex; Collins, Gary; Barnett, Adrian.

BMC Med ; 21(1): 339, 2023 09 04.

Article in English | MEDLINE | ID: mdl-37667344

ABSTRACT

BACKGROUND: Clinical prediction models are widely used in health and medical research. The area under the receiver operating characteristic curve (AUC) is a frequently used estimate to describe the discriminatory ability of a clinical prediction model. The AUC is often interpreted relative to thresholds, with "good" or "excellent" models defined at 0.7, 0.8 or 0.9. These thresholds may create targets that result in "hacking", where researchers are motivated to re-analyse their data until they achieve a "good" result. METHODS: We extracted AUC values from PubMed abstracts to look for evidence of hacking. We used histograms of the AUC values in bins of size 0.01 and compared the observed distribution to a smooth distribution from a spline. RESULTS: The distribution of 306,888 AUC values showed clear excesses above the thresholds of 0.7, 0.8 and 0.9 and shortfalls below the thresholds. CONCLUSIONS: The AUCs for some models are over-inflated, which risks exposing patients to sub-optimal clinical decision-making. Greater modelling transparency is needed, including published protocols, and data and code sharing.
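
The binning step described in the METHODS can be sketched in a few lines. This is an illustrative approximation only: the abstract compares the histogram with a smooth distribution from a spline, whereas the sketch below swaps in a simple average of neighbouring bins as the smooth reference. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def bin_counts(aucs, width=0.01):
    """Count AUC values per bin of the given width (bin index = floor(auc / width))."""
    counts = Counter()
    for a in aucs:
        # round() guards against floating-point artefacts such as 0.29 / 0.01 = 28.999...
        counts[math.floor(round(a / width, 9))] += 1
    return counts


def excess_over_neighbours(counts, idx, half_window=2):
    """Observed count in bin idx minus the mean count of its neighbouring bins.

    Assumption: the neighbour mean stands in for the spline-based smooth
    distribution used in the study.
    """
    neighbours = [
        counts.get(idx + k, 0)
        for k in range(-half_window, half_window + 1)
        if k != 0
    ]
    expected = sum(neighbours) / len(neighbours)
    return counts.get(idx, 0) - expected


# Toy data: 10 values in every bin from 0.65 to 0.75, plus 5 extra values at 0.70.
toy_aucs = [b / 100 for b in range(65, 76) for _ in range(10)] + [0.70] * 5
toy_counts = bin_counts(toy_aucs)
toy_excess = excess_over_neighbours(toy_counts, 70)
```

A positive excess in the bin starting at a threshold, paired with a shortfall in the bin just below it, is the pattern the abstract describes.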

Subject(s)

Biomedical Research; Models, Statistical; Humans; Prognosis; ROC Curve

9.

Shoulder replacement surgery's rising demand, inequality of provision, and variation in outcomes: cohort study using Hospital Episode Statistics for England.

Valsamis, Epaminondas Markos; Pinedo-Villanueva, Rafael; Sayers, Adrian; Collins, Gary S; Rees, Jonathan L.

BMC Med ; 21(1): 406, 2023 10 26.

Article in English | MEDLINE | ID: mdl-37880689

ABSTRACT

BACKGROUND: The aim of this study was to forecast future patient demand for shoulder replacement surgery in England and investigate any geographic and socioeconomic inequalities in service provision and patient outcomes. METHODS: For this cohort study, all elective shoulder replacements carried out by NHS hospitals and NHS-funded care in England from 1999 to 2020 were identified using Hospital Episode Statistics data. Eligible patients were aged 18 years and older. Shoulder replacements for malignancy or acute trauma were excluded. Population estimates and projections were obtained from the Office for National Statistics. Standardised incidence rates and the risks of serious adverse events (SAEs) and revision surgery were calculated and stratified by geographical region, socioeconomic deprivation, sex, and age band. Hospital costs for each admission were calculated using Healthcare Resource Group codes and NHS Reference Costs based on the National Reimbursement System. Projected rates and hospital costs were predicted until the year 2050 for two scenarios of future growth. RESULTS: A total of 77,613 elective primary and 5847 revision shoulder replacements were available for analysis. Between 1999 and 2020, the standardised incidence of primary shoulder replacements in England quadrupled from 2.6 to 10.4 per 100,000 population, increasing predominantly in patients aged over 65 years. As many as 1 in 6 patients needed to travel to a different region for their surgery indicating inequality of service provision. A temporal increase in SAEs was observed: the 30-day risk increased from 1.3 to 4.8% and the 90-day risk increased from 2.4 to 6.0%. Patients from the more deprived socioeconomic groups appeared to have a higher risk of SAEs and revision surgery. Shoulder replacements are forecast to increase by up to 234% by 2050 in England, reaching 20,912 procedures per year with an associated annual cost to hospitals of £235 million. 
CONCLUSIONS: This study reports a rising incidence of shoulder replacements, regional disparities in service provision, and an overall increasing risk of SAEs, especially in more deprived socioeconomic groups. These findings highlight the need for better healthcare planning to match local population demand, while more research is needed to understand and prevent the increase observed in SAEs.

Subject(s)

Arthroplasty, Replacement, Shoulder; Humans; Cohort Studies; England/epidemiology; Hospitals; Hospitalization

10.

Clinical prediction models and the multiverse of madness.

Riley, Richard D; Pate, Alexander; Dhiman, Paula; Archer, Lucinda; Martin, Glen P; Collins, Gary S.

BMC Med ; 21(1): 502, 2023 12 18.

Article in English | MEDLINE | ID: mdl-38110939

ABSTRACT

BACKGROUND: Each year, thousands of clinical prediction models are developed to make predictions (e.g. estimated risk) to inform individual diagnosis and prognosis in healthcare. However, most are not reliable for use in clinical practice. MAIN BODY: We discuss how the creation of a prediction model (e.g. using regression or machine learning methods) is dependent on the sample and size of data used to develop it: were a different sample of the same size used from the same overarching population, the developed model could be very different even when the same model development methods are used. In other words, for each model created, there exists a multiverse of other potential models for that sample size and, crucially, an individual's predicted value (e.g. estimated risk) may vary greatly across this multiverse. The more an individual's prediction varies across the multiverse, the greater the instability. We show how small development datasets lead to more different models in the multiverse, often with vastly unstable individual predictions, and explain how this can be exposed by using bootstrapping and presenting instability plots. We recommend healthcare researchers seek to use large model development datasets to reduce instability concerns. This is especially important to ensure reliability across subgroups and improve model fairness in practice. CONCLUSIONS: Instability is concerning as an individual's predicted value is used to guide their counselling, resource prioritisation, and clinical decision making. If different samples lead to different models with very different predictions for the same individual, then this should cast doubt on using a particular model for that individual. Therefore, visualising, quantifying and reporting the instability in individual-level predictions is essential when proposing a new model.
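
The bootstrapping idea in this abstract can be illustrated with a deliberately simple sketch. It is not the authors' procedure: the "model" here is just the observed event proportion within each level of a single categorical predictor, chosen so the example needs no external libraries, and all names and data are hypothetical.

```python
import random


def fit_group_risk(data):
    """Toy 'model': observed event proportion within each predictor group.

    data is a list of (group, outcome) pairs with outcome coded 0 or 1.
    """
    totals, events = {}, {}
    for x, y in data:
        totals[x] = totals.get(x, 0) + 1
        events[x] = events.get(x, 0) + y
    return {x: events[x] / totals[x] for x in totals}


def bootstrap_predictions(data, x_new, n_boot=200, seed=0):
    """Refit the toy model on bootstrap resamples and collect predictions for x_new."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]  # resample with replacement
        model = fit_group_risk(sample)
        if x_new in model:  # the group can be absent from a small resample
            preds.append(model[x_new])
    return preds


def instability_range(preds):
    """Spread of one individual's predicted risk across the bootstrap models."""
    return min(preds), max(preds)


# Hypothetical development data: 20 individuals in two groups.
toy_data = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5
toy_preds = bootstrap_predictions(toy_data, x_new=1)
low, high = instability_range(toy_preds)
```

Plotting each individual's original prediction against the range of bootstrap predictions gives the kind of instability plot the abstract refers to; a wide range signals that the prediction depends heavily on the particular development sample.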

Subject(s)

Models, Statistical; Humans; Prognosis; Reproducibility of Results

11.

Prediction Models for Bronchopulmonary Dysplasia in Preterm Infants: A Systematic Review and Meta-Analysis.

Romijn, Michelle; Dhiman, Paula; Finken, Martijn J J; van Kaam, Anton H; Katz, Trixie A; Rotteveel, Joost; Schuit, Ewoud; Collins, Gary S; Onland, Wes; Torchin, Heloise.

J Pediatr ; 258: 113370, 2023 07.

Article in English | MEDLINE | ID: mdl-37059387

ABSTRACT

OBJECTIVE: To review systematically and assess the accuracy of prediction models for bronchopulmonary dysplasia (BPD) at 36 weeks of postmenstrual age. STUDY DESIGN: Searches were conducted in MEDLINE and EMBASE. Studies published between 1990 and 2022 were included if they developed or validated a prediction model for BPD or the combined outcome death/BPD at 36 weeks in the first 14 days of life in infants born preterm. Data were extracted independently by 2 authors following the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (ie, CHARMS) and PRISMA guidelines. Risk of bias was assessed using the Prediction model Risk Of Bias ASsessment Tool (ie, PROBAST). RESULTS: Sixty-five studies were reviewed, including 158 development and 108 externally validated models. Median c-statistic of 0.84 (range 0.43-1.00) was reported at model development, and 0.77 (range 0.41-0.97) at external validation. All models were rated at high risk of bias, due to limitations in the analysis part. Meta-analysis of the validated models revealed increased c-statistics after the first week of life for both the BPD and death/BPD outcome. CONCLUSIONS: Although BPD prediction models perform satisfactorily, they were all at high risk of bias. Methodologic improvement and complete reporting are needed before they can be considered for use in clinical practice. Future research should aim to validate and update existing models.

Subject(s)

Bronchopulmonary Dysplasia; Infant, Premature; Infant; Infant, Newborn; Humans; Bronchopulmonary Dysplasia/epidemiology

12.

Sample size requirements are not being considered in studies developing prediction models for binary outcomes: a systematic review.

Dhiman, Paula; Ma, Jie; Qi, Cathy; Bullock, Garrett; Sergeant, Jamie C; Riley, Richard D; Collins, Gary S.

BMC Med Res Methodol ; 23(1): 188, 2023 08 19.

Article in English | MEDLINE | ID: mdl-37598153

ABSTRACT

BACKGROUND: Having an appropriate sample size is important when developing a clinical prediction model. We aimed to review how sample size is considered in studies developing a prediction model for a binary outcome. METHODS: We searched PubMed for studies published between 01/07/2020 and 30/07/2020 and reviewed the sample size calculations used to develop the prediction models. Using the available information, we calculated the minimum sample size that would be needed to estimate overall risk and minimise overfitting in each study and summarised the difference between the calculated and used sample size. RESULTS: A total of 119 studies were included, of which nine studies provided sample size justification (8%). The recommended minimum sample size could be calculated for 94 studies: 73% (95% CI: 63-82%) used sample sizes lower than required to estimate overall risk and minimise overfitting, including 26% of studies that used sample sizes lower than required to estimate overall risk only. A similar number of studies did not meet the ≥ 10 EPV criteria (75%, 95% CI: 66-84%). The median deficit of the number of events used to develop a model was 75 (IQR: 234 lower to 7 higher), which reduced to 63 if the total available data (before any data splitting) was used (IQR: 225 lower to 7 higher). Studies that met the minimum required sample size had a median c-statistic of 0.84 (IQR: 0.80 to 0.9) and studies where the minimum sample size was not met had a median c-statistic of 0.83 (IQR: 0.75 to 0.9). Studies that met the ≥ 10 EPP criteria had a median c-statistic of 0.80 (IQR: 0.73 to 0.84). CONCLUSIONS: Prediction models are often developed with no sample size calculation; as a consequence, many are too small to precisely estimate the overall risk. We encourage researchers to justify, perform and report sample size calculations when developing a prediction model.
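
Two of the quantities mentioned in this abstract lend themselves to a short worked sketch. The code below is an illustration under stated assumptions, not the review's own calculation: it uses the standard normal-approximation formula for estimating a proportion, n = (1.96/δ)² φ(1−φ), with an assumed margin of error δ = 0.05, and it computes events per predictor parameter as a plain ratio. The overfitting criterion mentioned in the abstract is not covered, and the function names are hypothetical.

```python
import math


def min_n_overall_risk(outcome_proportion, margin=0.05, z=1.96):
    """Minimum sample size to estimate the overall risk within +/- margin.

    Assumption: normal approximation for a proportion; margin and z are
    illustrative defaults, not values taken from the abstract.
    """
    phi = outcome_proportion
    return math.ceil((z / margin) ** 2 * phi * (1 - phi))


def events_per_parameter(n_events, n_parameters):
    """Events per candidate predictor parameter (often compared against 10)."""
    return n_events / n_parameters


n_needed = min_n_overall_risk(0.5)        # 385 for an outcome proportion of 0.5
epp = events_per_parameter(100, 10)       # 10.0
meets_rule_of_thumb = epp >= 10
```

Comparing `n_needed` with the sample size actually used gives the kind of deficit the RESULTS summarise, although the published figures rest on the fuller set of criteria.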

Subject(s)

Models, Statistical; Research Personnel; Humans; Prognosis; PubMed

13.

The burden of Alzheimer's disease and other types of dementia in the Middle East and North Africa region, 1990-2019.

Safiri, Saeid; Noori, Maryam; Nejadghaderi, Seyed Aria; Mousavi, Seyed Ehsan; Sullman, Mark J M; Araj-Khodaei, Mostafa; Collins, Gary S; Kolahi, Ali-Asghar; Gharagozli, Kurosh.

Age Ageing ; 52(3)2023 03 01.

Article in English | MEDLINE | ID: mdl-36995136

ABSTRACT

BACKGROUND: Alzheimer's disease (AD) is the most common cause of dementia and this progressive neurological disorder is associated with substantial mortality and morbidity. We aimed to report the burden of AD and other types of dementia in the Middle East and North Africa (MENA) region, by age, sex and sociodemographic index (SDI), for the period 1990-2019. METHODS: Publicly accessible data on the prevalence, death and disability-adjusted life years (DALYs) because of AD, and other types of dementia, were retrieved from the Global Burden of Disease 2019 project for all MENA countries from 1990 to 2019. RESULTS: In 2019, the age-standardised point prevalence of dementia was 777.6 per 100,000 population in MENA, which was 3.0% higher than in 1990. The age-standardised death and DALY rates of dementia were 25.5 and 387.0 per 100,000, respectively. In 2019, the highest DALY rate was observed in Afghanistan and the lowest rate was in Egypt. That same year, the age-standardised point prevalence, death and DALY rates increased with advancing age and were higher for females of all age groups. From 1990 to 2019, the DALY rate of dementia decreased with increasing SDI up to 0.4, then slightly increased up to an SDI of 0.75, followed by a decrease for the remaining SDI levels. CONCLUSIONS: The point prevalence of AD and other types of dementia has increased over the past three decades, and in 2019, the corresponding regional burden was higher than the global average.

Subject(s)

Alzheimer Disease; Female; Humans; Quality-Adjusted Life Years; Alzheimer Disease/diagnosis; Alzheimer Disease/epidemiology; Global Burden of Disease; Prevalence; Africa, Northern/epidemiology; Middle East/epidemiology; Global Health

14.

The estimated burden of bulimia nervosa in the Middle East and North Africa region, 1990-2019.

Safiri, Saeid; Noori, Maryam; Nejadghaderi, Seyed Aria; Shamekh, Ali; Karamzad, Nahid; Sullman, Mark J M; Grieger, Jessica A; Collins, Gary S; Abdollahi, Morteza; Kolahi, Ali-Asghar.

Int J Eat Disord ; 56(2): 394-406, 2023 02.

Article in English | MEDLINE | ID: mdl-36301044

ABSTRACT

OBJECTIVE: We aimed to report the burden of bulimia nervosa (BN) in the Middle East and North Africa (MENA) region by age, sex, and sociodemographic index (SDI), for the period 1990-2019. METHODS: Estimates of the prevalence, incidence, and disability-adjusted life-years (DALYs) attributable to BN were retrieved from the Global Burden of Disease study 2019, between 1990 and 2019, for the 21 countries in the MENA region. The counts and age-standardized rates (per 100,000) were presented, along with their corresponding 95% uncertainty intervals. RESULTS: In 2019, the estimated regional age-standardized point prevalence and incidence rates of BN were 168.3 (115.0-229.6) and 178.6 (117.0-255.6) per 100,000, which represented 22.0% (17.5-27.2) and 10.4% (7.1-14.7) increases, respectively, since 1990. Moreover, in 2019 the regional age-standardized DALY rate was 35.5 (20.6-55.5) per 100,000, which was 22.2% (16.7-28.2) higher than in 1990. In 2019, Qatar (58.6 [34.3-92.5]) and Afghanistan (18.4 [10.6-29.2]) had the highest and lowest age-standardized DALY rates, respectively. Regionally, the age-standardized point prevalence of BN peaked in the 30-34 age group and was more prevalent among women. In addition, there was a generally positive association between SDI and the burden of BN across the measurement period. DISCUSSION: In the MENA region, the burden of BN has increased over the last three decades. Cost-effective preventive measures are needed in the region, especially in the high SDI countries. PUBLIC SIGNIFICANCE: This study reports the estimated burden of BN in the MENA region and shows that its burden has increased over the last three decades.

Subject(s)

Bulimia Nervosa, Humans, Female, Bulimia Nervosa/epidemiology, Quality-Adjusted Life Years, Global Burden of Disease, Middle East/epidemiology, Northern Africa/epidemiology, Prevalence, Incidence

15.

Mapping the Oxford Shoulder Score onto the EQ-5D utility index.

Valsamis, Epaminondas M; Beard, David; Carr, Andrew; Collins, Gary S; Brealey, Stephen; Rangan, Amar; Santos, Rita; Corbacho, Belen; Rees, Jonathan L; Pinedo-Villanueva, Rafael.

Qual Life Res ; 32(2): 507-518, 2023 Feb.

Article in English | MEDLINE | ID: mdl-36169788

ABSTRACT

PURPOSE: In order to enable cost-utility analysis of shoulder pain conditions and treatments, this study aimed to develop and evaluate mapping algorithms to estimate the EQ-5D health index from the Oxford Shoulder Score (OSS) when health outcomes are only assessed with the OSS. METHODS: 5437 paired OSS and EQ-5D questionnaire responses from four national multicentre randomised controlled trials investigating different shoulder pathologies and treatments were split into training and testing samples. Separate EQ-5D-3L and EQ-5D-5L analyses were undertaken. Transfer to utility (TTU) regression (univariate linear, polynomial, spline, multivariable linear, two-part logistic-linear, tobit and adjusted limited dependent variable mixture models) and response mapping (ordered logistic regression and seemingly unrelated regression (SUR)) models were developed on the training sample. These were internally validated, and their performance evaluated on the testing sample. Model performance was evaluated over 100-fold repeated training-testing sample splits. RESULTS: For the EQ-5D-3L analysis, the multivariable linear and splines models had the lowest mean square error (MSE) of 0.0415. The SUR model had the lowest mean absolute error (MAE) of 0.136. Model performance was greatest in the mid-range and best health states, and lowest in poor health states. For the EQ-5D-5L analyses, the multivariable linear and splines models had the lowest MSE (0.0241-0.0278) while the SUR models had the lowest MAE (0.105-0.113). CONCLUSION: The developed models now allow accurate estimation of the EQ-5D health index when only the OSS responses are available as a measure of patient-reported health outcome.

Subject(s)

Quality of Life, Shoulder, Humans, Quality of Life/psychology, Surveys and Questionnaires, Pain, Logistic Models, Algorithms

16.

Barriers and facilitators to implementation of musculoskeletal injury mitigation programmes for military service members around the world: a scoping review.

Bullock, Garrett S; Dartt, Carolyn E; Ricker, Emily A; Fallowfield, Joanne L; Arden, Nigel; Clifton, Daniel; Danelson, Kerry; Fraser, John J; Gomez, Christina; Greenlee, Tina A; Gregory, Alexandria; Gribbin, Timothy; Losciale, Justin; Molloy, Joseph M; Nicholson, Kristen F; Polich, Julia-Grace; Räisänen, Anu; Shah, Karishma; Smuda, Michael; Teyhen, Deydre S; Allard, Rhonda J; Collins, Gary S; de la Motte, Sarah J; Rhon, Daniel I.

Inj Prev ; 29(6): 461-473, 2023 Nov 27.

Article in English | MEDLINE | ID: mdl-37620010

ABSTRACT

INTRODUCTION: Musculoskeletal injury (MSK-I) mitigation and prevention programmes (MSK-IMPPs) have been developed and implemented across militaries worldwide. Although programme efficacy is often reported, development and implementation details are often overlooked, limiting their scalability, sustainability and effectiveness. This scoping review aimed to identify the following in military populations: (1) barriers and facilitators to implementing and scaling MSK-IMPPs; (2) gaps in MSK-IMPP research and (3) future research priorities. METHODS: A scoping review assessed literature from inception to April 2022 that included studies on MSK-IMPP implementation and/or effectiveness in military populations. Barriers and facilitators to implementing these programmes were identified. RESULTS: Of 132 articles, most were primary research studies (90; 68.2%); the remainder were review papers (42; 31.8%). Among primary studies, 3 (3.3%) investigated only women, 62 (69%) only men and 25 (27.8%) both. Barriers included limited resources, lack of stakeholder engagement, competing military priorities and equipment-related factors. Facilitators included strong stakeholder engagement, targeted programme design, involvement/proximity of MSK-I experts, providing MSK-I mitigation education, low burden on resources and emphasising end-user acceptability. Research gaps included variability in reported MSK-I outcomes and no consensus on relevant surveillance metrics and definitions. CONCLUSION: Despite a robust body of literature, there is a dearth of information about programme implementation; specifically, barriers or facilitators to success. Additionally, variability in outcomes and lack of consensus on MSK-I definitions may affect the development, implementation, evaluation and comparison of MSK-IMPPs. There is a need for international consensus on definitions and optimal data reporting elements when conducting injury risk mitigation research in the military.

Subject(s)

Military Personnel, Musculoskeletal Diseases, Male, Humans, Female, Musculoskeletal Diseases/prevention & control, Program Evaluation

17.

The burden of low back pain and its association with socio-demographic variables in the Middle East and North Africa region, 1990-2019.

Safiri, Saeid; Nejadghaderi, Seyed Aria; Noori, Maryam; Sullman, Mark J M; Collins, Gary S; Kaufman, Jay S; Hill, Catherine L; Kolahi, Ali-Asghar.

BMC Musculoskelet Disord ; 24(1): 59, 2023 Jan 23.

Article in English | MEDLINE | ID: mdl-36683025

ABSTRACT

BACKGROUND: Low back pain (LBP) is the most common musculoskeletal disorder globally. Providing region- and national-specific information on the burden of low back pain is critical for local healthcare policy makers. The present study aimed to report, compare, and contextualize the prevalence, incidence and years lived with disability (YLDs) of low back pain in the Middle East and North Africa (MENA) region by age, sex and sociodemographic index (SDI), from 1990 to 2019. METHODS: Publicly available data were obtained from the Global Burden of Disease (GBD) study 2019. The burden of LBP was reported for the 21 countries located in the MENA region, from 1990 to 2019. All estimates were reported as counts and age-standardised rates per 100,000 population, together with their corresponding 95% uncertainty intervals (UIs). RESULTS: In 2019, the age-standardised point prevalence and incidence rate per 100,000 in MENA were 7668.2 (95% UI 6798.0 to 8363.3) and 3215.9 (95% UI 2838.8 to 3638.3), which were 5.8% (4.3 to 7.4) and 4.4% (3.4 to 5.5) lower than in 1990, respectively. Furthermore, the regional age-standardised YLD rate in 2019 was 862.0 (605.5 to 1153.3) per 100,000, which was 6.0% (4.2 to 7.7) lower than in 1990. In 2019, Turkey [953.6 (671.3 to 1283.5)] and Lebanon [727.2 (511.5 to 966.0)] had the highest and lowest age-standardised YLD rates, respectively. There was no country in the MENA region that showed increases in the age-standardised prevalence, incidence or YLD rates of LBP over the measurement period. Furthermore, in 2019 the number of prevalent cases was highest in the 35-39 age group, with males having a higher number of cases in all age groups. In addition, the age-standardised YLD rates for males in the MENA region were higher than the global estimates in almost all age groups, in both 1990 and 2019. Furthermore, the burden of LBP was not associated with the level of socio-economic development during the measurement period. CONCLUSION: The burden attributable to LBP in the MENA region decreased slightly from 1990 to 2019. Furthermore, the burden among males was higher than the global average. Consequently, more integrated healthcare interventions are needed to more effectively alleviate the burden of low back pain in this region.

Subject(s)

Low Back Pain, Male, Humans, Low Back Pain/diagnosis, Low Back Pain/epidemiology, Prevalence, Incidence, Global Burden of Disease, Northern Africa/epidemiology, Turkey, Global Health, Quality-Adjusted Life Years

18.

Stability of clinical prediction models developed using statistical or machine learning methods.

Riley, Richard D; Collins, Gary S.

Biom J ; 65(8): e2200302, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37466257

ABSTRACT

Clinical prediction models estimate an individual's risk of a particular health outcome. A developed model is a consequence of the development dataset and model-building strategy, including the sample size, number of predictors, and analysis method (e.g., regression or machine learning). We raise the concern that many models are developed using small datasets that lead to instability in the model and its predictions (estimated risks). We define four levels of model stability in estimated risks moving from the overall mean to the individual level. Through simulation and case studies of statistical and machine learning approaches, we show instability in a model's estimated risks is often considerable, and ultimately manifests itself as miscalibration of predictions in new data. Therefore, we recommend researchers always examine instability at the model development stage and propose instability plots and measures to do so. This entails repeating the model-building steps (those used to develop the original prediction model) in each of multiple (e.g., 1000) bootstrap samples, to produce multiple bootstrap models, and deriving (i) a prediction instability plot of bootstrap model versus original model predictions; (ii) the mean absolute prediction error (mean absolute difference between individuals' original and bootstrap model predictions), and (iii) calibration, classification, and decision curve instability plots of bootstrap models applied in the original sample. A case study illustrates how these instability assessments help reassure (or not) whether model predictions are likely to be reliable (or not), while informing a model's critical appraisal (risk of bias rating), fairness, and further validation requirements.

Subject(s)

Machine Learning, Statistical Models, Humans, Prognosis, Computer Simulation
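
Editorial illustration for entry 18: the bootstrap instability check described in the abstract can be sketched roughly as follows. This is not the authors' code. It assumes a plain logistic regression as a stand-in for "the model-building steps", uses simulated data, and all function names (`fit_logistic`, `predict_risk`, `bootstrap_instability`) are hypothetical. It computes only the mean absolute prediction error, not the instability plots.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, damping=1e-8):
    """Fit a logistic regression by Newton-Raphson; returns [intercept, slopes...]."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        lp = np.clip(Xd @ beta, -30, 30)          # guard against overflow in exp
        p = 1.0 / (1.0 + np.exp(-lp))
        grad = Xd.T @ (y - p)
        # tiny damping term keeps the Hessian invertible
        hess = (Xd * (p * (1.0 - p))[:, None]).T @ Xd + damping * np.eye(Xd.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def predict_risk(beta, X):
    """Estimated risks for the rows of X under coefficients beta."""
    Xd = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-np.clip(Xd @ beta, -30, 30)))

def bootstrap_instability(X, y, n_boot=200, seed=0):
    """Refit the model in each bootstrap sample and predict in the original sample."""
    rng = np.random.default_rng(seed)
    original = predict_risk(fit_logistic(X, y), X)
    boot_preds = np.empty((n_boot, len(y)))
    for b in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))   # resample with replacement
        boot_preds[b] = predict_risk(fit_logistic(X[idx], y[idx]), X)
    # mean absolute difference between original and bootstrap predictions, per individual
    mape = np.abs(boot_preds - original).mean(axis=0)
    return original, boot_preds, mape

# Simulated development data (assumption: two predictors, n = 200)
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
true_lp = -1.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_lp)))

original, boot_preds, mape = bootstrap_instability(X, y, n_boot=200)
print("average mean absolute prediction error:", round(float(mape.mean()), 3))
```

Plotting each row of `boot_preds` against `original` would give a prediction instability plot of the kind the abstract mentions; a wider scatter around the diagonal indicates less stable estimated risks.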

19.

Improving Clinical Utility of Real-World Prediction Models: Updating Through Recalibration.

Bullock, Garrett S; Shanley, Ellen; Thigpen, Charles A; Arden, Nigel K; Noonan, Thomas K; Kissenberth, Michael J; Wyland, Douglas J; Collins, Gary S.

J Strength Cond Res ; 37(5): 1057-1063, 2023 May 01.

Article in English | MEDLINE | ID: mdl-36730571

ABSTRACT

Prediction models can aid clinicians in identifying at-risk athletes. However, sport and clinical practice patterns continue to change, causing predictive drift and potential suboptimal prediction model performance. Thus, there is a need to temporally recalibrate previously developed baseball arm injury models. The purpose of this study was to perform temporal recalibration on a previously developed injury prediction model and assess model performance in professional baseball pitchers. An arm injury prediction model was developed on data from a prospective cohort from 2009 to 2019 on minor league pitchers. Data for the 2015-2019 seasons were used for temporal recalibration and model performance assessment. Temporal recalibration constituted intercept-only and full model redevelopment. Model performance was investigated by assessing Nagelkerke's R-square, calibration in the large, calibration, and discrimination. Decision curves compared the original model, temporal recalibrated model, and current best evidence-based practice. One hundred seventy-eight pitchers participated in the 2015-2019 seasons with 1.63 arm injuries per 1,000 athlete exposures. The temporal recalibrated intercept model demonstrated the best discrimination (0.81 [95% confidence interval [CI]: 0.73, 0.88]) and R-square (0.32) compared with the original model (0.74 [95% CI: 0.69, 0.80]; R-square: 0.32) and the redeveloped model (0.80 [95% CI: 0.73, 0.87]; R-square: 0.30). The temporal recalibrated intercept model demonstrated an improved net benefit of 0.34 compared with current best evidence-based practice. The temporal recalibrated intercept model demonstrated the best model performance and clinical utility. Updating prediction models can account for changes in sport training over time and improve professional baseball arm injury outcomes.

Subject(s)

Arm Injuries, Baseball, Humans, Prospective Studies, Baseball/injuries, Athletes, Seasons
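
Editorial illustration for entry 19: intercept-only recalibration, as named in the abstract, can be sketched as re-estimating a single intercept shift while the original model's linear predictor is held fixed as an offset. This is a minimal sketch on simulated data, not the study's model or data. The names `expit` and `recalibrate_intercept` and the simulated drift of 0.7 on the logit scale are assumptions for illustration.

```python
import numpy as np

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))

def recalibrate_intercept(lp_original, y_new, n_iter=50):
    """Maximum-likelihood intercept shift, treating lp_original as a fixed offset."""
    shift = 0.0
    for _ in range(n_iter):
        p = expit(lp_original + shift)
        # one-dimensional Newton step: score divided by information
        shift += np.sum(y_new - p) / np.sum(p * (1.0 - p))
    return shift

# Simulated "new seasons" in which baseline risk has drifted upward (assumption)
rng = np.random.default_rng(1)
lp_original = rng.normal(-2.0, 1.0, size=500)          # original model's linear predictor
y_new = rng.binomial(1, expit(lp_original + 0.7))      # outcomes under drifted baseline

shift = recalibrate_intercept(lp_original, y_new)
updated_risk = expit(lp_original + shift)
print("estimated shift:", round(float(shift), 2))
print("mean updated risk vs observed rate:",
      round(float(updated_risk.mean()), 3), round(float(y_new.mean()), 3))
```

After the update, the mean predicted risk matches the observed event rate (calibration in the large), while the ranking of individuals, and hence discrimination, is unchanged because every linear predictor moves by the same amount.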

20.

Global, regional, and national cancer deaths and disability-adjusted life-years (DALYs) attributable to alcohol consumption in 204 countries and territories, 1990-2019.

Safiri, Saeid; Nejadghaderi, Seyed Aria; Karamzad, Nahid; Carson-Chahhoud, Kristin; Bragazzi, Nicola Luigi; Sullman, Mark J M; Almasi-Hashiani, Amir; Mansournia, Mohammad Ali; Collins, Gary S; Kaufman, Jay S; Kolahi, Ali-Asghar.

Cancer ; 128(9): 1840-1852, 2022 May 01.

Article in English | MEDLINE | ID: mdl-35239973

ABSTRACT

BACKGROUND: Alcohol consumption is a risk factor for a number of communicable and non-communicable diseases, including several types of cancer. This article reports the burden of cancers attributable to alcohol consumption by age, sex, location, sociodemographic index (SDI), and cancer type from 1990 to 2019. METHODS: The Comparative Risk Assessment approach was used in the 2019 Global Burden of Disease study to report the burden of cancers attributable to alcohol consumption between 1990 and 2019. RESULTS: In 2019, there were globally an estimated 494.7 thousand cancer deaths (95% uncertainty interval [UI], 439.7 to 554.1) and 13.0 million cancer disability-adjusted life-years (DALYs; 95% UI, 11.6 to 14.5) that were attributable to alcohol consumption. The alcohol-attributable DALYs were much higher in men (10.5 million; 95% UI, 9.2 to 11.8) than women (2.5 million; 95% UI, 2.2 to 2.9). The global age-standardized death and DALY rates of cancers attributable to alcohol decreased by 14.7% (95% UI, 6.4% to 23%) and 18.1% (95% UI, 9.2% to 26.5%), respectively, over the study period. Central Europe had the highest age-standardized death rates that were attributable to alcohol consumption (10.3; 95% UI, 8.7 to 12.0). Moreover, there was an overall positive association between SDI and the regional age-standardized DALY rate for alcohol-attributable cancers. CONCLUSIONS: Despite decreases in age-standardized deaths and DALYs, substantial numbers of cancer deaths and DALYs are still attributable to alcohol consumption. Because there is a higher burden in males, the elderly, and developed regions (based on SDI), these groups and regions should be prioritized in any prevention programs.

Subject(s)

Disability-Adjusted Life Years, Neoplasms, Aged, Alcohol Drinking/adverse effects, Alcohol Drinking/epidemiology, Female, Global Burden of Disease, Global Health, Humans, Male, Neoplasms/epidemiology, Quality-Adjusted Life Years, Risk Factors
