Results 1 - 20 of 631
1.
Proc Natl Acad Sci U S A ; 121(32): e2403490121, 2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39078672

ABSTRACT

A typical empirical study involves choosing a sample, a research design, and an analysis path. Variation in such choices across studies leads to heterogeneity in results that introduces an additional layer of uncertainty, limiting the generalizability of published scientific findings. We provide a framework for studying heterogeneity in the social sciences and divide heterogeneity into population, design, and analytical heterogeneity. Our framework suggests that after accounting for heterogeneity, the probability that the tested hypothesis is true for the average population, design, and analysis path can be much lower than implied by nominal error rates of statistically significant individual studies. We estimate each type's heterogeneity from 70 multilab replication studies, 11 prospective meta-analyses of studies employing different experimental designs, and 5 multianalyst studies. In our data, population heterogeneity tends to be relatively small, whereas design and analytical heterogeneity are large. Our results should, however, be interpreted cautiously due to the limited number of studies and the large uncertainty in the heterogeneity estimates. We discuss several ways to parse and account for heterogeneity in the context of different methodologies.

2.
Proc Natl Acad Sci U S A ; 120(23): e2215572120, 2023 Jun 06.
Article in English | MEDLINE | ID: mdl-37252958

ABSTRACT

Does competition affect moral behavior? This fundamental question has been debated among leading scholars for centuries, and more recently, it has been tested in experimental studies yielding a body of rather inconclusive empirical evidence. A potential source of ambivalent empirical results on the same hypothesis is design heterogeneity, that is, variation in true effect sizes across various reasonable experimental research protocols. To provide further evidence on whether competition affects moral behavior and to examine whether the generalizability of a single experimental study is jeopardized by design heterogeneity, we invited independent research teams to contribute experimental designs to a crowd-sourced project. In a large-scale online data collection, 18,123 experimental participants were randomly allocated to 45 randomly selected experimental designs out of 95 submitted designs. We find a small adverse effect of competition on moral behavior in a meta-analysis of the pooled data. The crowd-sourced design of our study allows for a clean identification and estimation of the variation in effect sizes above and beyond what could be expected due to sampling variance. We find substantial design heterogeneity, estimated to be about 1.6 times as large as the average standard error of the effect size estimates of the 45 research designs, indicating that the informativeness and generalizability of results based on a single experimental design are limited. Drawing strong conclusions about the underlying hypotheses in the presence of substantive design heterogeneity requires moving toward much larger data collections on various experimental designs testing the same hypothesis.
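The comparison of design heterogeneity with the average standard error in this abstract can be illustrated with a small sketch. The abstract does not name its estimator, so this assumes the DerSimonian-Laird moment estimator of the between-design variance τ²; the function names are hypothetical and the study's own meta-analytic model may differ.

```python
import math


def dersimonian_laird_tau(effects, std_errors):
    """Between-design SD (tau) via the DerSimonian-Laird moment estimator (assumed method)."""
    w = [1.0 / se**2 for se in std_errors]                      # inverse-variance weights
    sw = sum(w)
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sw    # fixed-effect pooled mean
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    k = len(effects)
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                          # truncate negative values at 0
    return math.sqrt(tau2)


def heterogeneity_ratio(effects, std_errors):
    """Tau divided by the average standard error of the design-level estimates."""
    tau = dersimonian_laird_tau(effects, std_errors)
    return tau / (sum(std_errors) / len(std_errors))


if __name__ == "__main__":
    # Hypothetical design-level effect sizes and standard errors
    print(heterogeneity_ratio([0.0, 0.2, 0.4, 0.6, 0.8], [0.1] * 5))
```

A ratio well above 1, as reported here, means that differences between designs are large relative to the sampling noise of any single design.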

3.
Biostatistics ; 25(2): 289-305, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-36977366

ABSTRACT

Causally interpretable meta-analysis combines information from a collection of randomized controlled trials to estimate treatment effects in a target population in which experimentation may not be possible but from which covariate information can be obtained. In such analyses, a key practical challenge is the presence of systematically missing data when some trials have collected data on one or more baseline covariates, but other trials have not, such that the covariate information is missing for all participants in the latter. In this article, we provide identification results for potential (counterfactual) outcome means and average treatment effects in the target population when covariate data are systematically missing from some of the trials in the meta-analysis. We propose three estimators for the average treatment effect in the target population, examine their asymptotic properties, and show that they have good finite-sample performance in simulation studies. We use the estimators to analyze data from two large lung cancer screening trials and target population data from the National Health and Nutrition Examination Survey (NHANES). To accommodate the complex survey design of the NHANES, we modify the methods to incorporate survey sampling weights and allow for clustering.


Subject(s)
Early Detection of Cancer , Lung Neoplasms , Humans , Nutrition Surveys , Lung Neoplasms/epidemiology , Computer Simulation , Research Design
4.
Proc Natl Acad Sci U S A ; 119(30): e2120377119, 2022 Jul 26.
Article in English | MEDLINE | ID: mdl-35858443

ABSTRACT

This initiative systematically examined the extent to which a large set of archival research findings generalizes across contexts. We repeated the key analyses for 29 original strategic management effects in the same context (direct reproduction) as well as in 52 novel time periods and geographies; 45% of the reproductions returned results matching the original reports, as did 55% of tests in different spans of years and 40% of tests in novel geographies. Some original findings were associated with multiple new tests. Reproducibility was the best predictor of generalizability: for the findings that proved directly reproducible, 84% emerged in other available time periods and 57% emerged in other geographies. Overall, only limited empirical evidence emerged for context sensitivity. In a forecasting survey, independent scientists were able to anticipate which effects would find support in tests in new samples.

5.
Neuroimage ; 285: 120497, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38142755

ABSTRACT

Major depressive disorder (MDD) is a serious and heterogeneous psychiatric disorder that needs accurate diagnosis. Resting-state functional MRI (rsfMRI), which captures multiple perspectives on brain structure, function, and connectivity, is increasingly applied in the diagnosis and pathological research of MDD. Various machine learning algorithms have been developed to exploit the rich information in rsfMRI and discriminate MDD patients from normal controls. Despite recent advances, MDD discrimination accuracy has room for further improvement, and the generalizability and interpretability of the discrimination methods have not been sufficiently addressed either. Here, we propose a machine learning method (MFMC) for MDD discrimination by concatenating multiple features and stacking multiple classifiers. MFMC is tested on the REST-meta-MDD data set, which contains 2428 subjects collected from 25 different sites. MFMC yields 96.9% MDD discrimination accuracy, demonstrating a significant improvement over existing methods. In addition, the generalizability of MFMC is validated by its good performance when the training and testing subjects are from independent sites. The use of XGBoost as the meta classifier allows us to probe the decision process of MFMC. We identify 13 feature values related to 9 brain regions, including the posterior cingulate gyrus, the orbital part of the superior frontal gyrus, and the angular gyrus, which contribute most to the classification and also demonstrate significant differences at the group level. Using these 13 feature values alone reaches 87% of MFMC's full performance with all feature values. These features may serve as clinically useful diagnostic and prognostic biomarkers for MDD in the future.


Subject(s)
Depressive Disorder, Major , Humans , Depressive Disorder, Major/diagnostic imaging , Depressive Disorder, Major/pathology , Brain Mapping/methods , Magnetic Resonance Imaging/methods , Brain , Machine Learning
6.
Am J Epidemiol ; 2024 Aug 31.
Article in English | MEDLINE | ID: mdl-39218437

ABSTRACT

Comparisons of treatments, interventions, or exposures are of central interest in epidemiology, but direct comparisons are not always possible for practical or ethical reasons. Here, we detail a fusion approach to compare treatments across studies. The motivating example entails comparing the risk of the composite outcome of death, AIDS, or greater than a 50% CD4 cell count decline in people with HIV when assigned triple versus mono antiretroviral therapy, using data from the AIDS Clinical Trial Group (ACTG) 175 (mono versus dual therapy) and ACTG 320 (dual versus triple therapy). We review a set of identification assumptions and estimate the risk difference using an inverse probability weighting estimator that leverages the shared trial arms (dual therapy). We also propose a fusion diagnostic, based on comparing the shared arms, that may indicate violations of the identification assumptions. Applying the data fusion estimator and diagnostic to the ACTG trials indicates that triple therapy reduces risk compared with monotherapy in individuals with baseline CD4 counts between 50 and 300 cells/mm3. Bridged treatment comparisons address questions that none of the constituent data sources could address alone, but valid fusion-based inference requires careful consideration of the underlying assumptions.
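The bridging idea in this abstract can be sketched in its simplest, unweighted form. This sketch assumes the participants of the two trials are already exchangeable, so it deliberately omits the inverse probability weighting the authors use to align the trial populations; the risks in the example are hypothetical, not ACTG results, and the function names are illustrative only.

```python
def risk(outcomes):
    """Proportion of participants with the composite outcome (1 = event, 0 = none)."""
    return sum(outcomes) / len(outcomes)


def bridged_risk_difference(risk_mono_175, risk_dual_175, risk_dual_320, risk_triple_320):
    """Triple vs. mono risk difference chained through the shared dual-therapy arm.

    Unweighted simplification: (triple - dual) from ACTG 320 plus
    (dual - mono) from ACTG 175.
    """
    rd_triple_vs_dual = risk_triple_320 - risk_dual_320
    rd_dual_vs_mono = risk_dual_175 - risk_mono_175
    return rd_triple_vs_dual + rd_dual_vs_mono


def shared_arm_diagnostic(risk_dual_175, risk_dual_320):
    """Difference between the two dual-therapy arms; values far from 0
    suggest that the trial populations or the assumptions do not line up."""
    return risk_dual_320 - risk_dual_175


if __name__ == "__main__":
    # Hypothetical risks for illustration only
    print(bridged_risk_difference(0.30, 0.22, 0.20, 0.12))
    print(shared_arm_diagnostic(0.22, 0.20))
```

When the two shared arms agree exactly, the bridged estimate reduces to the triple-therapy risk in ACTG 320 minus the monotherapy risk in ACTG 175; a nonzero diagnostic is what motivates weighting in the full method.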

7.
Am J Epidemiol ; 193(5): 741-750, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38456780

ABSTRACT

Epidemiologists are attempting to address research questions of increasing complexity by developing novel methods for combining information from diverse sources. Cole et al. (Am J Epidemiol. 2023;192(3):467-474) provide 2 examples of the process of combining information to draw inferences about a population proportion. In this commentary, we consider combining information to learn about a target population as an epidemiologic activity and distinguish it from more conventional meta-analyses. We examine possible rationales for combining information and discuss broad methodological considerations, with an emphasis on study design, assumptions, and sources of uncertainty.


Subject(s)
Epidemiologic Methods , Humans , Meta-Analysis as Topic , Epidemiologic Studies , Epidemiologic Research Design , Uncertainty
8.
Am J Epidemiol ; 193(8): 1176-1181, 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-38629587

ABSTRACT

External validity is an important part of epidemiologic research. To validly estimate effects in specific external target populations using a chosen effect measure (ie, "transport"), some methods require that one account for all effect measure modifiers (EMMs). However, little is known about how including other variables that are not EMMs (ie, non-EMMs) in adjustment sets affects estimates. Using simulations, we evaluated how inclusion of non-EMMs affected estimation of the transported risk difference (RD) by assessing the impacts of covariates that (1) differ (or not) between the trial and the target, (2) are associated with the outcome (or not), and (3) modify the RD (or not). We assessed variation and bias when covariates with each possible combination of these factors were used to transport RDs using outcome modeling or inverse odds weighting. Inclusion of variables that differed in distribution between the populations but were non-EMMs reduced precision, regardless of whether they were associated with the outcome. However, non-EMMs associated with selection did not amplify bias resulting from omission of necessary EMMs. Including all variables associated with the outcome may result in unnecessarily imprecise estimates when estimating treatment effects in external target populations.
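Inverse odds weighting, one of the two transport approaches evaluated in this abstract, can be sketched for the simplest case of a single discrete covariate. This sketch assumes the inverse odds of selection are estimated nonparametrically by counting trial and target participants in each covariate stratum (real analyses usually fit a logistic selection model), and the function names are hypothetical.

```python
from collections import Counter


def inverse_odds_weights(trial_x, target_x):
    """Inverse odds of selection weights for a discrete covariate, estimated by counts.

    In the stacked trial + target data, P(S=0|x) / P(S=1|x) = n_target(x) / n_trial(x).
    """
    n_trial = Counter(trial_x)
    n_target = Counter(target_x)
    missing = set(n_target) - set(n_trial)
    if missing:  # positivity violation: target strata with no trial participants
        raise ValueError(f"no trial participants with x in {sorted(missing)}")
    return {x: n_target[x] / n_trial[x] for x in n_trial}


def transported_risk_difference(trial_rows, target_x):
    """Weighted risk difference in the target; trial_rows are (x, a, y) with binary a and y."""
    rows = list(trial_rows)
    w = inverse_odds_weights([x for x, _, _ in rows], target_x)

    def weighted_risk(arm):
        num = sum(w[x] * y for x, a, y in rows if a == arm)
        den = sum(w[x] for x, a, _ in rows if a == arm)
        return num / den

    return weighted_risk(1) - weighted_risk(0)


if __name__ == "__main__":
    # Hypothetical trial: treatment only matters when x = 1 (x is an effect measure modifier)
    trial = [(0, 1, 0), (0, 1, 0), (0, 0, 0), (0, 0, 0),
             (1, 1, 0), (1, 1, 0), (1, 0, 1), (1, 0, 1)]
    target = [0, 1, 1, 1]  # target population has more x = 1 than the trial
    print(transported_risk_difference(trial, target))
```

Because x modifies the risk difference in this toy example, reweighting toward the target's covariate mix changes the estimate; a covariate that is not an effect measure modifier would leave the point estimate unchanged in expectation but, as the abstract reports, could still cost precision.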


Subject(s)
Bias , Humans , Computer Simulation
9.
Am J Epidemiol ; 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38973744

ABSTRACT

The literature shows heterogeneous age-standardized dementia incidence rates across US Asian American, Native Hawaiian, and Pacific Islanders (AANHPI), but no estimates of population-representative dementia incidence exist due to a lack of AANHPI longitudinal probability samples. We compared harmonized characteristics between AANHPI Kaiser Permanente Northern California members (KPNC cohort) and the target population of AANHPI adults aged 60+ with private or Medicare insurance using the California Health Interview Survey. We used stabilized inverse odds of selection weights (sIOSW) to estimate ethnicity-specific crude and age-standardized dementia incidence rates and cumulative risk by age 90 in the target population. Differences between the KPNC cohort and target population varied by ethnicity. sIOSW eliminated most differences in larger ethnic groups; some differences remained in smaller groups. Estimated crude dementia incidence rates using sIOSW (versus unweighted) were similar in Chinese, Filipinos, Pacific Islanders, and Vietnamese, and higher in Japanese, Koreans, and South Asians. Unweighted and weighted age-standardized incidence rates differed for South Asians. Unweighted and weighted cumulative risk were similar for all groups. We estimated the first population-representative dementia incidence rates and cumulative risk in AANHPI ethnic groups. We encountered some estimation problems, and weighted estimates were imprecise, highlighting challenges in using weighting to extend inferences to target populations.

10.
Am J Epidemiol ; 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38904459

ABSTRACT

When analyzing a selected sample from a general population, selection bias can arise relative to the causal average treatment effect (ATE) for the general population, and also relative to the ATE for the selected sample itself. We provide simple graphical rules that indicate: (1) whether a selected-sample analysis will be unbiased for each ATE; (2) whether adjusting for certain covariates could eliminate selection bias. The rules can easily be checked in a standard single-world intervention graph. When the treatment could affect selection, a third estimand of potential scientific interest is the "net treatment difference", namely the net change in outcomes that would occur for the selected sample if all members of the general population were treated versus not treated, including any effects of the treatment on which individuals are in the selected sample. We provide graphical rules for this estimand as well. We decompose bias in a selected-sample analysis relative to the general-population ATE into: (1) "internal bias" relative to the net treatment difference; (2) "net-external bias", a discrepancy between the net treatment difference and the general-population ATE. Each bias can be assessed unambiguously via a distinct graphical rule, providing new conceptual insight into the mechanisms by which certain causal structures produce selection bias.

11.
Am J Epidemiol ; 2024 Jun 18.
Article in English | MEDLINE | ID: mdl-38896054

ABSTRACT

Cardiovascular disease (CVD) is a leading cause of death globally. Angiotensin-converting enzyme inhibitors (ACEi) and angiotensin receptor blockers (ARB), compared in the ONTARGET trial, each prevent CVD. However, trial results may not be generalisable, and their effectiveness in underrepresented groups is unclear. Using trial emulation methods within routine-care data to validate findings, we explored the generalisability of ONTARGET results. For people prescribed an ACEi/ARB in the UK Clinical Practice Research Datalink GOLD from 1/1/2001-31/7/2019, we applied trial criteria and propensity-score methods to create an ONTARGET trial-eligible cohort. Comparing ARB to ACEi, we estimated hazard ratios for the primary composite trial outcome (cardiovascular death, myocardial infarction, stroke, or hospitalisation for heart failure) and secondary outcomes. As the pre-specified criteria confirming trial emulation were met, we then explored treatment heterogeneity among three trial-underrepresented subgroups: females, those aged ≥75 years, and those with chronic kidney disease (CKD). In the trial-eligible population (n=137,155), results for the primary outcome demonstrated similar effects of ARB and ACEi (HR 0.97 [95% CI: 0.93, 1.01]), meeting the pre-specified validation criteria. When extending this outcome to trial-underrepresented groups, similar treatment effects were observed by sex, age, and CKD. This suggests that ONTARGET trial findings are generalisable to trial-underrepresented subgroups.

12.
Hum Brain Mapp ; 45(3): e26631, 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38379514

ABSTRACT

Aberrant brain network development represents a putative aetiological component in mental disorders, which typically emerge during childhood and adolescence. Previous studies have identified resting-state functional connectivity (RSFC) patterns reflecting psychopathology, but their generalisability to other samples and politico-cultural contexts has not been established. We investigated whether a previously identified cross-diagnostic case-control and autism spectrum disorder (ASD)-specific pattern of RSFC (discovery sample; aged 5-21 from New York City, USA; n = 1666) could be validated in a Norwegian convenience-based youth sample (validation sample; aged 9-25 from Oslo, Norway; n = 531). As a test of generalisability, we investigated whether these diagnosis-derived RSFC patterns were sensitive to levels of symptom burden in both samples, based on an independent measure of symptom burden. Both the cross-diagnostic and ASD-specific RSFC patterns were validated across samples. Connectivity patterns were significantly associated with thematically appropriate symptom dimensions in the discovery sample. In the validation sample, the ASD-specific RSFC pattern showed a weak, inverse relationship with symptoms of conduct problems, hyperactivity, and prosociality, while the cross-diagnostic pattern was not significantly linked to symptoms. Diagnosis-derived connectivity patterns in a developmental clinical US sample were validated in a convenience sample of Norwegian youth; however, they were not associated with mental health symptoms.


Subject(s)
Autism Spectrum Disorder , Humans , Adolescent , Autism Spectrum Disorder/diagnostic imaging , Brain Mapping/methods , Symptom Burden , Brain/diagnostic imaging , Norway , Magnetic Resonance Imaging/methods
13.
Hum Brain Mapp ; 45(6): e26683, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38647035

ABSTRACT

Machine learning (ML) approaches are increasingly being applied to neuroimaging data. Studies in neuroscience typically have to rely on a limited set of training data, which may impair the generalizability of ML models. However, it is still unclear which kind of training sample is best suited to optimize generalization performance. In the present study, we systematically investigated the generalization performance of sex classification models trained on the parcelwise connectivity profile of either single samples or compound samples of two different sizes. Generalization performance was quantified in terms of mean across-sample classification accuracy and spatial consistency of accurately classifying parcels. Our results indicate that the generalization performance of parcelwise classifiers (pwCs) trained on single dataset samples depends on the specific test samples. Certain datasets seem to "match" in the sense that classifiers trained on a sample from one dataset achieved a high accuracy when tested on the respective other one and vice versa. The pwCs trained on the compound samples demonstrated the highest overall generalization performance for all test samples, including one derived from a dataset not included in building the training samples. Thus, our results indicate that both a large sample size and a heterogeneous data composition of a training sample have a central role in achieving generalizable results.


Subject(s)
Connectome , Machine Learning , Magnetic Resonance Imaging , Humans , Female , Male , Adult , Connectome/methods , Sex Characteristics , Datasets as Topic , Young Adult , Brain/diagnostic imaging , Brain/physiology
14.
Biostatistics ; 24(2): 309-326, 2023 Apr 14.
Article in English | MEDLINE | ID: mdl-34382066

ABSTRACT

Scientists frequently generalize population level causal quantities such as average treatment effect from a source population to a target population. When the causal effects are heterogeneous, differences in subject characteristics between the source and target populations may make such a generalization difficult and unreliable. Reweighting or regression can be used to adjust for such differences when generalizing. However, these methods typically suffer from large variance if there is limited covariate distribution overlap between the two populations. We propose a generalizability score to address this issue. The score can be used as a yardstick to select target subpopulations for generalization. A simplified version of the score avoids using any outcome information and thus can prevent deliberate biases associated with inadvertent access to such information. Both simulation studies and real data analysis demonstrate convincing results for such selection.


Subject(s)
Research Design , Humans , Propensity Score , Computer Simulation , Causality , Bias
15.
Biostatistics ; 24(3): 728-742, 2023 Jul 14.
Article in English | MEDLINE | ID: mdl-35389429

ABSTRACT

Prediction models are often built and evaluated using data from a population that differs from the target population in which model-derived predictions are intended to be used. In this article, we present methods for evaluating model performance in the target population when some observations are right censored. The methods assume that outcome and covariate data are available from a source population used for model development, and that covariates, but no outcome data, are available from the target population. We evaluate the finite sample performance of the proposed estimators using simulations and apply the methods to transport a prediction model built using data from a lung cancer screening trial to a nationally representative population of participants eligible for lung cancer screening.


Subject(s)
Early Detection of Cancer , Lung Neoplasms , Humans , Models, Statistical , Computer Simulation
16.
NMR Biomed ; 37(9): e5163, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38649140

ABSTRACT

Quantitative Susceptibility Mapping (QSM) is an advanced magnetic resonance imaging (MRI) technique to quantify the magnetic susceptibility of the tissue under investigation. Deep learning methods have shown promising results in deconvolving the susceptibility distribution from the measured local field obtained from the MR phase. Although existing deep learning based QSM methods can produce high-quality reconstructions, they are highly biased toward the training data distribution, which limits their generalizability. This work proposes a hybrid two-step reconstruction approach to improve deep learning based QSM reconstruction. The susceptibility map predicted by the deep learning methods is refined within the framework developed in this work to ensure consistency with the measured local field. The developed method was validated on existing deep learning and model-based deep learning methods for susceptibility mapping of the brain. The developed method resulted in improved reconstruction for MRI volumes obtained with different acquisition settings, including deep learning models trained on constrained (limited) data settings.


Subject(s)
Brain , Deep Learning , Magnetic Resonance Imaging , Magnetic Resonance Imaging/methods , Humans , Brain/diagnostic imaging , Brain Mapping/methods , Image Processing, Computer-Assisted/methods , Male , Female , Algorithms , Adult
17.
Eur J Nucl Med Mol Imaging ; 51(7): 1937-1954, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38326655

ABSTRACT

PURPOSE: Total metabolic tumor volume (TMTV) segmentation has significant value, enabling quantitative imaging biomarkers for lymphoma management. In this work, we tackle the challenging task of automated tumor delineation in lymphoma from PET/CT scans using a cascaded approach. METHODS: Our study included 1418 2-[18F]FDG PET/CT scans from four different centers. The dataset was divided into 900 scans for development/validation/testing phases and 518 for multi-center external testing. The former consisted of 450 lymphoma, lung cancer, and melanoma scans, along with 450 negative scans, while the latter consisted of lymphoma patients from different centers with diffuse large B cell, primary mediastinal large B cell, and classic Hodgkin lymphoma cases. Our approach involves resampling PET/CT images into different voxel sizes in the first step, followed by training multi-resolution 3D U-Nets on each resampled dataset using a fivefold cross-validation scheme. The models trained on different data splits were ensembled. After applying soft voting to the predicted masks, in the second step, we input the probability-averaged predictions, along with the input imaging data, into another 3D U-Net. Models were trained with a semi-supervised loss. We additionally considered the effectiveness of using test time augmentation (TTA) to improve the segmentation performance after training. In addition to quantitative analysis, including Dice score (DSC) and TMTV comparisons, a qualitative evaluation was conducted by nuclear medicine physicians. RESULTS: Our cascaded soft-voting guided approach achieved an average DSC of 0.68 ± 0.12 on the internal test data from the development dataset and an average DSC of 0.66 ± 0.18 on the multi-site external data (n = 518), significantly outperforming (p < 0.001) state-of-the-art (SOTA) approaches including nnU-Net and SWIN UNETR. 
While TTA yielded enhanced performance gains for some of the comparator methods, its impact on our cascaded approach was found to be negligible (DSC: 0.66 ± 0.16). Our approach reliably quantified TMTV, with a correlation of 0.89 with the ground truth (p < 0.001). Furthermore, in terms of visual assessment, concordance between quantitative evaluations and clinician feedback was observed in the majority of cases. The average relative error (ARE) and the absolute error (AE) in TMTV prediction on external multi-centric dataset were ARE = 0.43 ± 0.54 and AE = 157.32 ± 378.12 (mL) for all the external test data (n = 518), and ARE = 0.30 ± 0.22 and AE = 82.05 ± 99.78 (mL) when the 10% outliers (n = 53) were excluded. CONCLUSION: TMTV-Net demonstrates strong performance and generalizability in TMTV segmentation across multi-site external datasets, encompassing various lymphoma subtypes. A negligible reduction of 2% in overall performance during testing on external data highlights robust model generalizability across different centers and cancer types, likely attributable to its training with resampled inputs. Our model is publicly available, allowing easy multi-site evaluation and generalizability analysis on datasets from different institutions.
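The Dice score and the TMTV error measures reported in this abstract can be sketched per patient as follows. This assumes binary masks flattened to 0/1 sequences and a known voxel volume in mL; the function names are hypothetical, scoring two empty masks as a perfect match is a convention chosen here, and the abstract's ARE and AE values are averages of such per-patient errors over the test set.

```python
def dice(pred, truth):
    """Dice similarity coefficient for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 1.0 if total == 0 else 2.0 * inter / total  # empty vs. empty treated as a perfect match


def tmtv_ml(mask, voxel_volume_ml):
    """Total metabolic tumor volume: number of foreground voxels times voxel volume (mL)."""
    return sum(1 for v in mask if v) * voxel_volume_ml


def tmtv_errors(pred_ml, true_ml):
    """Per-patient (relative error, absolute error in mL) of a predicted TMTV."""
    ae = abs(pred_ml - true_ml)
    return ae / true_ml, ae


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]  # hypothetical 4-voxel masks
    reference = [1, 0, 1, 0]
    print(dice(predicted, reference))
    print(tmtv_errors(150.0, 100.0))
```

Because the relative error divides by the true volume, patients with very small tumors can dominate the average, which is consistent with the abstract reporting errors both with and without outliers.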


Subject(s)
Image Processing, Computer-Assisted , Lymphoma , Positron Emission Tomography Computed Tomography , Tumor Burden , Humans , Positron Emission Tomography Computed Tomography/methods , Lymphoma/diagnostic imaging , Image Processing, Computer-Assisted/methods , Fluorodeoxyglucose F18 , Automation , Male , Female
18.
Reprod Biol Endocrinol ; 22(1): 59, 2024 May 22.
Article in English | MEDLINE | ID: mdl-38778327

ABSTRACT

BACKGROUND: Deep learning has been increasingly investigated for assisting clinical in vitro fertilization (IVF). The first technical step in many tasks is to visually detect and locate sperm, oocytes, and embryos in images. For clinical deployment of such deep learning models, the fact that different clinics use different image acquisition hardware and different sample preprocessing protocols raises concern about whether the accuracy of a deep learning model reported by one clinic could be reproduced in another clinic. Here we aim to investigate the effect of each imaging factor on the generalizability of object detection models, using sperm analysis as a pilot example. METHODS: Ablation studies were performed using state-of-the-art models for detecting human sperm to quantitatively assess how model precision (false-positive detection) and recall (missed detection) were affected by imaging magnification, imaging mode, and sample preprocessing protocols. The results led to the hypothesis that the richness of image acquisition conditions in a training dataset deterministically affects model generalizability. The hypothesis was tested by first enriching the training dataset with a wide range of imaging conditions, then validated through internal blind tests on new samples and external multi-center clinical validations. RESULTS: Ablation experiments revealed that removing subsets of data from the training dataset significantly reduced model precision. Removing raw sample images from the training dataset caused the largest drop in model precision, whereas removing 20x images caused the largest drop in model recall. By incorporating different imaging and sample preprocessing conditions into a rich training dataset, the model achieved an intraclass correlation coefficient (ICC) of 0.97 (95% CI: 0.94-0.99) for precision, and an ICC of 0.97 (95% CI: 0.93-0.99) for recall. 
Multi-center clinical validation showed no significant differences in model precision or recall across different clinics and applications. CONCLUSIONS: The results validated the hypothesis that the richness of data in the training dataset is a key factor impacting model generalizability. These findings highlight the importance of diversity in a training dataset for model evaluation and suggest that future deep learning models in andrology and reproductive medicine should incorporate comprehensive feature sets for enhanced generalizability across clinics.
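The precision and recall measures used in this abstract can be sketched for a single image. The abstract does not say how detections were matched to annotated sperm, so this assumes greedy one-to-one matching of axis-aligned boxes at an intersection-over-union threshold of 0.5; the threshold, the box format, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(pred_boxes, true_boxes, thresh=0.5):
    """Greedy one-to-one matching: unmatched predictions count as false positives,
    unmatched ground-truth boxes count as missed detections (false negatives)."""
    unmatched = list(true_boxes)
    tp = 0
    for p in pred_boxes:
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) >= thresh:
            unmatched.remove(best)  # each ground-truth box can be matched only once
            tp += 1
    fp = len(pred_boxes) - tp
    fn = len(unmatched)
    precision = tp / (tp + fp) if pred_boxes else 0.0
    recall = tp / (tp + fn) if true_boxes else 0.0
    return precision, recall


if __name__ == "__main__":
    truth = [(0, 0, 10, 10), (20, 20, 30, 30)]  # hypothetical annotated sperm
    preds = [(0, 0, 10, 10), (50, 50, 60, 60)]  # one hit, one spurious detection
    print(precision_recall(preds, truth))
```

In this framing, a drop in precision after an ablation means more spurious detections, and a drop in recall means more missed sperm.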


Subject(s)
Deep Learning , Spermatozoa , Humans , Pilot Projects , Male , Spermatozoa/physiology , Fertilization in Vitro/methods , Image Processing, Computer-Assisted/methods , Semen Analysis/methods
19.
J Magn Reson Imaging ; 2024 May 11.
Article in English | MEDLINE | ID: mdl-38733369

ABSTRACT

BACKGROUND: Radiomics models trained on data from one center typically show a decline of performance when applied to data from external centers, hindering their introduction into large-scale clinical practice. Current expert recommendations suggest using only reproducible radiomics features isolated by multiscanner test-retest experiments, which might help to overcome the problem of limited generalizability to external data. PURPOSE: To evaluate the influence of using only a subset of robust radiomics features, defined in a prior in vivo multi-MRI-scanner test-retest study, on the performance and generalizability of radiomics models. STUDY TYPE: Retrospective. POPULATION: Patients with monoclonal plasma cell disorders. Training set (117 MRIs from center 1); internal test set (42 MRIs from center 1); external test set (143 MRIs from centers 2-8). FIELD STRENGTH/SEQUENCE: 1.5T and 3.0T; T1-weighted turbo spin echo. ASSESSMENT: The task for the radiomics models was to noninvasively predict plasma cell infiltration, determined by bone marrow biopsy, from MRI. Radiomics machine learning models, including linear regressor, support vector regressor (SVR), and random forest regressor (RFR), were trained on data from center 1, using either all radiomics features or only reproducible radiomics features. Models were tested on an internal (center 1) and a multicentric external data set (centers 2-8). STATISTICAL TESTS: Pearson correlation coefficient r and mean absolute error (MAE) between predicted and actual plasma cell infiltration. Fisher's z-transformation, Wilcoxon signed-rank test, Wilcoxon rank-sum test; significance level P < 0.05. RESULTS: When using only reproducible features compared with all features, the performance of the SVR on the external test set significantly improved (r = 0.43 vs. r = 0.18 and MAE = 22.6 vs. MAE = 28.2). 
For the RFR, the performance on the external test set deteriorated when using only reproducible instead of all radiomics features (r = 0.33 vs. r = 0.44, P = 0.29 and MAE = 21.9 vs. MAE = 20.5, P = 0.10). CONCLUSION: Using only reproducible radiomics features improved the external performance of some, but not all, machine learning models and did not automatically improve the external performance of the overall best radiomics model. TECHNICAL EFFICACY: Stage 2.
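The evaluation metrics listed under STATISTICAL TESTS can be sketched with the standard library. The abstract does not describe exactly how Fisher's z-transformation was applied, so this only shows the transformation used to build an approximate confidence interval for a single correlation; the function names and the example values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between predicted and actual values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mae(pred, actual):
    """Mean absolute error, in the units of the target (here, % plasma cell infiltration)."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a correlation via Fisher's z (requires |r| < 1 and n > 3)."""
    z = math.atanh(r)                   # variance-stabilizing transform
    half = z_crit / math.sqrt(n - 3)    # approximate SE of z is 1 / sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)


if __name__ == "__main__":
    predicted = [30.0, 45.0, 20.0, 60.0]  # hypothetical infiltration predictions (%)
    observed = [25.0, 50.0, 30.0, 55.0]
    print(pearson_r(predicted, observed), mae(predicted, observed))
    print(fisher_ci(0.43, 143))
```

Reporting r and MAE together is useful because r measures how well predictions track the ranking of patients, whereas MAE measures the size of the errors on the original scale.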

20.
Pharmacol Res ; 200: 107074, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38232909

ABSTRACT

To date, no population-based studies have specifically explored the external validity of pivotal randomized clinical trials (RCTs) of biologics simultaneously for a broad spectrum of immuno-mediated inflammatory diseases (IMIDs). The aims of this study were, firstly, to compare patients' characteristics and the median treatment duration of biologics approved for IMIDs between RCTs and the real-world setting (RW); secondly, to assess the proportion of biologic users treated for IMIDs in the real-world setting who would not have been eligible for inclusion in the pivotal RCT for each indication of use. Using the Italian VALORE distributed database (66,639 incident biologic users), we found that adult patients with IMIDs treated with biologics in the Italian real-world setting were substantially older (mean age ± SD: 50 ± 15 years) than those enrolled in pivotal RCTs (45 ± 15 years). In the real-world setting, certolizumab pegol was more commonly used by adult women with psoriasis/ankylosing spondylitis (F/M ratio: 1.8-1.9) compared to RCTs (F/M ratio: 0.5-0.6). The median treatment duration (weeks) of incident biologic users in RW was significantly higher than the duration of pivotal RCTs in almost all indications for use and most biologics (4-100 vs. 6-167). Furthermore, almost half (46.4%) of biologic users from RW settings would have been ineligible for inclusion in the respective indication-specific pivotal RCTs. The main reasons were advanced age, recent history of cancer, and presence of other concomitant IMIDs. These findings suggest that post-marketing surveillance of biologics should be prioritized for those patients.


Subject(s)
Biological Products , Psoriasis , Adult , Female , Humans , Biological Products/adverse effects , Immunomodulating Agents , Italy , Psoriasis/drug therapy