Search | Biblioteca Virtual en Salud Fronteriza (Border Health Virtual Library)

1.

Holzmeister, Felix; Johannesson, Magnus; Böhm, Robert; Dreber, Anna; Huber, Jürgen; Kirchler, Michael.

Proc Natl Acad Sci U S A ; 121(32): e2403490121, 2024 Aug 06.

Article in English | MEDLINE | ID: mdl-39078672

ABSTRACT

A typical empirical study involves choosing a sample, a research design, and an analysis path. Variation in such choices across studies leads to heterogeneity in results that introduces an additional layer of uncertainty, limiting the generalizability of published scientific findings. We provide a framework for studying heterogeneity in the social sciences and divide heterogeneity into population, design, and analytical heterogeneity. Our framework suggests that after accounting for heterogeneity, the probability that the tested hypothesis is true for the average population, design, and analysis path can be much lower than implied by nominal error rates of statistically significant individual studies. We estimate each type of heterogeneity from 70 multilab replication studies, 11 prospective meta-analyses of studies employing different experimental designs, and 5 multianalyst studies. In our data, population heterogeneity tends to be relatively small, whereas design and analytical heterogeneity are large. Our results should, however, be interpreted cautiously due to the limited number of studies and the large uncertainty in the heterogeneity estimates. We discuss several ways to parse and account for heterogeneity in the context of different methodologies.

2.

Competition and moral behavior: A meta-analysis of forty-five crowd-sourced experimental designs.

Huber, Christoph; Dreber, Anna; Huber, Jürgen; Johannesson, Magnus; Kirchler, Michael; Weitzel, Utz; Abellán, Miguel; Adayeva, Xeniya; Ay, Fehime Ceren; Barron, Kai; Berry, Zachariah; Bönte, Werner; Brütt, Katharina; Bulutay, Muhammed; Campos-Mercade, Pol; Cardella, Eric; Claassen, Maria Almudena; Cornelissen, Gert; Dawson, Ian G J; Delnoij, Joyce; Demiral, Elif E; Dimant, Eugen; Doerflinger, Johannes Theodor; Dold, Malte; Emery, Cécile; Fiala, Lenka; Fiedler, Susann; Freddi, Eleonora; Fries, Tilman; Gasiorowska, Agata; Glogowsky, Ulrich; Gorny, Paul M; Gretton, Jeremy David; Grohmann, Antonia; Hafenbrädl, Sebastian; Handgraaf, Michel; Hanoch, Yaniv; Hart, Einav; Hennig, Max; Hudja, Stanton; Hütter, Mandy; Hyndman, Kyle; Ioannidis, Konstantinos; Isler, Ozan; Jeworrek, Sabrina; Jolles, Daniel; Juanchich, Marie; Kc, Raghabendra Pratap; Khadjavi, Menusch; Kugler, Tamar.

Proc Natl Acad Sci U S A ; 120(23): e2215572120, 2023 Jun 06.

Article in English | MEDLINE | ID: mdl-37252958

ABSTRACT

Does competition affect moral behavior? This fundamental question has been debated among leading scholars for centuries, and more recently, it has been tested in experimental studies yielding a body of rather inconclusive empirical evidence. A potential source of ambivalent empirical results on the same hypothesis is design heterogeneity, that is, variation in true effect sizes across various reasonable experimental research protocols. To provide further evidence on whether competition affects moral behavior and to examine whether the generalizability of a single experimental study is jeopardized by design heterogeneity, we invited independent research teams to contribute experimental designs to a crowd-sourced project. In a large-scale online data collection, 18,123 experimental participants were randomly allocated to 45 randomly selected experimental designs out of 95 submitted designs. We find a small adverse effect of competition on moral behavior in a meta-analysis of the pooled data. The crowd-sourced design of our study allows for a clean identification and estimation of the variation in effect sizes above and beyond what could be expected due to sampling variance. We find substantial design heterogeneity (estimated to be about 1.6 times as large as the average standard error of the effect size estimates of the 45 research designs), indicating that the informativeness and generalizability of results based on a single experimental design are limited. Drawing strong conclusions about the underlying hypotheses in the presence of substantive design heterogeneity requires moving toward much larger data collections on various experimental designs testing the same hypothesis.
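Design heterogeneity is commonly summarized by tau, the standard deviation of true effect sizes across designs, and the abstract reports it relative to the average standard error. The sketch below is a minimal illustration assuming the widely used DerSimonian-Laird moment estimator; the study's own estimator may differ, and the function names are hypothetical.

```python
import math


def dersimonian_laird(effects, ses):
    """Return (random-effects pooled mean, tau) via the DerSimonian-Laird moment estimator."""
    if len(effects) != len(ses) or len(effects) < 2:
        raise ValueError("need at least two estimates with matching standard errors")
    w = [1.0 / se**2 for se in ses]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)  # negative moment estimates are truncated at zero
    w_re = [1.0 / (se**2 + tau2) for se in ses]  # random-effects weights
    mu_re = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return mu_re, math.sqrt(tau2)


def heterogeneity_ratio(effects, ses):
    """Express tau in units of the average standard error of the individual estimates."""
    _, tau = dersimonian_laird(effects, ses)
    return tau / (sum(ses) / len(ses))
```

For five hypothetical designs with effects 0.1, -0.2, 0.3, 0.0, and -0.4, each with a standard error of 0.1, tau is about 0.25, roughly 2.5 times the average standard error; identical effects give tau = 0.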

3.

Systematically missing data in causally interpretable meta-analysis.

Steingrimsson, Jon A; Barker, David H; Bie, Ruofan; Dahabreh, Issa J.

Biostatistics ; 25(2): 289-305, 2024 Apr 15.

Article in English | MEDLINE | ID: mdl-36977366

ABSTRACT

Causally interpretable meta-analysis combines information from a collection of randomized controlled trials to estimate treatment effects in a target population in which experimentation may not be possible but from which covariate information can be obtained. In such analyses, a key practical challenge is the presence of systematically missing data when some trials have collected data on one or more baseline covariates, but other trials have not, such that the covariate information is missing for all participants in the latter. In this article, we provide identification results for potential (counterfactual) outcome means and average treatment effects in the target population when covariate data are systematically missing from some of the trials in the meta-analysis. We propose three estimators for the average treatment effect in the target population, examine their asymptotic properties, and show that they have good finite-sample performance in simulation studies. We use the estimators to analyze data from two large lung cancer screening trials and target population data from the National Health and Nutrition Examination Survey (NHANES). To accommodate the complex survey design of the NHANES, we modify the methods to incorporate survey sampling weights and allow for clustering.

Subject(s)

Early Detection of Cancer; Lung Neoplasms; Humans; Nutrition Surveys; Lung Neoplasms/epidemiology; Computer Simulation; Research Design

4.

Examining the generalizability of research findings from archival data.

Delios, Andrew; Clemente, Elena Giulia; Wu, Tao; Tan, Hongbin; Wang, Yong; Gordon, Michael; Viganola, Domenico; Chen, Zhaowei; Dreber, Anna; Johannesson, Magnus; Pfeiffer, Thomas; Uhlmann, Eric Luis.

Proc Natl Acad Sci U S A ; 119(30): e2120377119, 2022 Jul 26.

Article in English | MEDLINE | ID: mdl-35858443

ABSTRACT

This initiative systematically examined the extent to which a large set of archival research findings generalizes across contexts. We repeated the key analyses for 29 original strategic management effects in the same context (direct reproduction) as well as in 52 novel time periods and geographies; 45% of the reproductions returned results matching the original reports, as did 55% of tests in different spans of years and 40% of tests in novel geographies. Some original findings were associated with multiple new tests. Reproducibility was the best predictor of generalizability: for the findings that proved directly reproducible, 84% emerged in other available time periods and 57% emerged in other geographies. Overall, only limited empirical evidence emerged for context sensitivity. In a forecasting survey, independent scientists were able to anticipate which effects would find support in tests in new samples.

5.

Multi-feature concatenation and multi-classifier stacking: An interpretable and generalizable machine learning method for MDD discrimination with rsfMRI.

Luo, Yunsong; Chen, Wenyu; Zhan, Ling; Qiu, Jiang; Jia, Tao.

Neuroimage ; 285: 120497, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38142755

ABSTRACT

Major depressive disorder (MDD) is a serious and heterogeneous psychiatric disorder that requires accurate diagnosis. Resting-state functional MRI (rsfMRI), which captures multiple perspectives on brain structure, function, and connectivity, is increasingly applied in the diagnosis and pathological research of MDD. Various machine learning algorithms have been developed to exploit the rich information in rsfMRI and discriminate MDD patients from normal controls. Despite recent advances, MDD discrimination accuracy still has room for improvement, and the generalizability and interpretability of discrimination methods have not been sufficiently addressed. Here, we propose a machine learning method (MFMC) for MDD discrimination that concatenates multiple features and stacks multiple classifiers. MFMC is tested on the REST-meta-MDD data set, which contains 2428 subjects collected from 25 different sites. MFMC yields 96.9% MDD discrimination accuracy, a significant improvement over existing methods. In addition, the generalizability of MFMC is validated by its good performance when the training and testing subjects come from independent sites. Using XGBoost as the meta classifier allows us to probe the decision process of MFMC. We identify 13 feature values related to 9 brain regions, including the posterior cingulate gyrus, the orbital part of the superior frontal gyrus, and the angular gyrus, which contribute most to the classification and also show significant differences at the group level. These 13 feature values alone can reach 87% of the performance MFMC achieves with all feature values. These features may serve as clinically useful diagnostic and prognostic biomarkers for MDD in the future.

Subject(s)

Depressive Disorder, Major; Humans; Depressive Disorder, Major/diagnostic imaging; Depressive Disorder, Major/pathology; Brain Mapping/methods; Magnetic Resonance Imaging/methods; Brain; Machine Learning

6.

Invited Commentary: Combining Information to Answer Epidemiologic Questions About a Target Population.

Dahabreh, Issa J.

Am J Epidemiol ; 193(5): 741-750, 2024 May 07.

Article in English | MEDLINE | ID: mdl-38456780

ABSTRACT

Epidemiologists are attempting to address research questions of increasing complexity by developing novel methods for combining information from diverse sources. Cole et al. (Am J Epidemiol. 2023;192(3):467-474) provide 2 examples of the process of combining information to draw inferences about a population proportion. In this commentary, we consider combining information to learn about a target population as an epidemiologic activity and distinguish it from more conventional meta-analyses. We examine possible rationales for combining information and discuss broad methodological considerations, with an emphasis on study design, assumptions, and sources of uncertainty.

Subject(s)

Epidemiologic Methods; Humans; Meta-Analysis as Topic; Epidemiologic Studies; Epidemiologic Research Design; Uncertainty

7.

Variable selection when estimating effects in external target populations.

Webster-Clark, Michael; Ross, Rachael K; Keil, Alexander P; Platt, Robert W.

Am J Epidemiol ; 193(8): 1176-1181, 2024 Aug 05.

Article in English | MEDLINE | ID: mdl-38629587

ABSTRACT

External validity is an important part of epidemiologic research. To validly estimate effects in specific external target populations using a chosen effect measure (i.e., "transport"), some methods require that one account for all effect measure modifiers (EMMs). However, little is known about how including other variables that are not EMMs (i.e., non-EMMs) in adjustment sets affects estimates. Using simulations, we evaluated how inclusion of non-EMMs affected estimation of the transported risk difference (RD) by assessing the impacts of covariates that (1) differ (or not) between the trial and the target, (2) are associated with the outcome (or not), and (3) modify the RD (or not). We assessed variation and bias when covariates with each possible combination of these factors were used to transport RDs using outcome modeling or inverse odds weighting. Inclusion of variables that differed in distribution between the populations but were non-EMMs reduced precision, regardless of whether they were associated with the outcome. However, non-EMMs associated with selection did not amplify bias resulting from omission of necessary EMMs. Including all variables associated with the outcome may result in unnecessarily imprecise estimates when estimating treatment effects in external target populations.
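Transport by outcome modeling can be pictured in its simplest form: estimate the risk difference within each covariate stratum of the trial, then average those stratum-specific RDs over the target population's covariate distribution. The sketch below assumes a single discrete covariate and a saturated (stratified) outcome model; it is an illustration, not the authors' simulation code, and the function name is hypothetical.

```python
from collections import defaultdict


def transported_risk_difference(trial, target_covariates):
    """Standardize stratum-specific trial risk differences to the target covariate distribution.

    trial: iterable of (x, a, y) with discrete covariate x, treatment a in {0, 1}, binary outcome y.
    target_covariates: iterable of x values observed in the target (no outcomes needed).
    """
    counts = defaultdict(lambda: [[0, 0], [0, 0]])  # counts[x][a] = [events, participants]
    for x, a, y in trial:
        counts[x][a][0] += y
        counts[x][a][1] += 1
    target = list(target_covariates)
    total = 0.0
    for x in target:
        # Positivity: every target stratum needs trial participants in both arms.
        if x not in counts or 0 in (counts[x][0][1], counts[x][1][1]):
            raise ValueError(f"no trial participants in both arms for stratum {x!r}")
        (e0, n0), (e1, n1) = counts[x]
        total += e1 / n1 - e0 / n0  # stratum-specific risk difference estimated in the trial
    return total / len(target)
```

A saturated model keeps the example transparent; with several covariates one would typically fit a regression outcome model and average its predicted risks over target members instead.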

Subject(s)

Bias; Humans; Computer Simulation

8.

Simple graphical rules for assessing selection bias in general-population and selected-sample treatment effects.

Mathur, Maya B; Shpitser, Ilya.

Am J Epidemiol ; 2024 Jun 20.

Article in English | MEDLINE | ID: mdl-38904459

ABSTRACT

When analyzing a selected sample from a general population, selection bias can arise relative to the causal average treatment effect (ATE) for the general population, and also relative to the ATE for the selected sample itself. We provide simple graphical rules that indicate: (1) whether a selected-sample analysis will be unbiased for each ATE; and (2) whether adjusting for certain covariates could eliminate selection bias. The rules can easily be checked in a standard single-world intervention graph. When the treatment could affect selection, a third estimand of potential scientific interest is the "net treatment difference", namely the net change in outcomes that would occur for the selected sample if all members of the general population were treated versus not treated, including any effects of the treatment on which individuals are in the selected sample. We provide graphical rules for this estimand as well. We decompose bias in a selected-sample analysis relative to the general-population ATE into: (1) "internal bias" relative to the net treatment difference; and (2) "net-external bias", a discrepancy between the net treatment difference and the general-population ATE. Each bias can be assessed unambiguously via a distinct graphical rule, providing new conceptual insight into the mechanisms by which certain causal structures produce selection bias.

9.

Estimating dementia incidence in insured older Asian Americans and Pacific Islanders in California: an application of inverse odds of selection weights.

Hayes-Larson, Eleanor; Zhou, Yixuan; Wu, Yingyan; Rojas-Saunero, Paloma L; Seamans, Marissa J; Gee, Gilbert C; Brookmeyer, Ron; Gilsanz, Paola; Whitmer, Rachel A; Mayeda, Elizabeth Rose.

Am J Epidemiol ; 2024 Jul 05.

Article in English | MEDLINE | ID: mdl-38973744

ABSTRACT

The literature shows heterogeneous age-standardized dementia incidence rates across US Asian American, Native Hawaiian, and Pacific Islanders (AANHPI), but no estimates of population-representative dementia incidence exist due to a lack of AANHPI longitudinal probability samples. Using the California Health Interview Survey, we compared harmonized characteristics between AANHPI Kaiser Permanente Northern California members (KPNC cohort) and the target population of AANHPI adults aged 60+ with private or Medicare insurance. We used stabilized inverse odds of selection weights (sIOSW) to estimate ethnicity-specific crude and age-standardized dementia incidence rates and cumulative risk by age 90 in the target population. Differences between the KPNC cohort and target population varied by ethnicity. sIOSW eliminated most differences in larger ethnic groups; some differences remained in smaller groups. Estimated crude dementia incidence rates using sIOSW (versus unweighted) were similar in Chinese, Filipinos, Pacific Islanders, and Vietnamese, and higher in Japanese, Koreans, and South Asians. Unweighted and weighted age-standardized incidence rates differed for South Asians. Unweighted and weighted cumulative risk were similar for all groups. We estimated the first population-representative dementia incidence rates and cumulative risk in AANHPI ethnic groups. We encountered some estimation problems, and weighted estimates were imprecise, highlighting the challenges of using weighting to extend inferences to target populations.
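Stabilized inverse odds of selection weights reweight cohort members so that their covariate distribution resembles the target's. In general, sIOSW(x) = [P(S=0|x) / P(S=1|x)] * [P(S=1) / P(S=0)], where S=1 marks the cohort and S=0 the target sample. As a simplifying assumption, the sketch below replaces the fitted selection model (and any survey weights) with empirical stratum frequencies, so each weight reduces to the target share of a stratum divided by its cohort share. The function names are hypothetical.

```python
from collections import Counter


def stabilized_iosw(cohort_strata, target_strata):
    """Stabilized inverse odds of selection weight for each cohort member.

    With empirical stratum frequencies in place of a selection model,
    sIOSW(x) = (target share of stratum x) / (cohort share of stratum x).
    """
    n1, n0 = len(cohort_strata), len(target_strata)
    c1, c0 = Counter(cohort_strata), Counter(target_strata)
    missing = set(c0) - set(c1)
    if missing:  # positivity: every target stratum must appear in the cohort
        raise ValueError(f"target strata absent from cohort: {sorted(missing)}")
    return [(c0[x] / n0) / (c1[x] / n1) for x in cohort_strata]


def weighted_rate(events, person_years, weights):
    """Weighted crude incidence rate: sum(w * events) / sum(w * person-time)."""
    num = sum(w * e for w, e in zip(weights, events))
    den = sum(w * t for w, t in zip(weights, person_years))
    return num / den
```

When every target stratum is represented in the cohort, these weights average to 1 over cohort members, which is one reason stabilization is used: the weighted cohort keeps roughly its original effective size.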

10.

Cardiorenal effects of Angiotensin-converting enzyme inhibitors and Angiotensin receptor blockers in people underrepresented in trials: analysis of routinely collected data with emulation of a reference trial (ONTARGET).

Baptiste, Paris J; Wong, Angel Y S; Schultze, Anna; Clase, Catherine M; Leyrat, Clémence; Williamson, Elizabeth; Powell, Emma; Mann, Johannes F E; Cunnington, Marianne; Teo, Koon; Bangdiwala, Shrikant I; Gao, Peggy; Tomlinson, Laurie; Wing, Kevin.

Am J Epidemiol ; 2024 Jun 18.

Article in English | MEDLINE | ID: mdl-38896054

ABSTRACT

Cardiovascular disease (CVD) is a leading cause of death globally. Angiotensin-converting enzyme inhibitors (ACEi) and angiotensin receptor blockers (ARB), compared in the ONTARGET trial, each prevent CVD. However, trial results may not be generalisable, and their effectiveness in underrepresented groups is unclear. Using trial emulation methods within routine-care data to validate findings, we explored the generalisability of ONTARGET results. For people prescribed an ACEi/ARB in the UK Clinical Practice Research Datalink GOLD from 1 January 2001 to 31 July 2019, we applied trial criteria and propensity-score methods to create an ONTARGET trial-eligible cohort. Comparing ARB to ACEi, we estimated hazard ratios for the primary composite trial outcome (cardiovascular death, myocardial infarction, stroke, or hospitalisation for heart failure) and secondary outcomes. As the pre-specified criteria confirming trial emulation were met, we then explored treatment heterogeneity among three trial-underrepresented subgroups: females, those aged ≥75 years, and those with chronic kidney disease (CKD). In the trial-eligible population (n=137,155), results for the primary outcome demonstrated similar effects of ARB and ACEi (HR 0.97 [95% CI: 0.93, 1.01]), meeting the pre-specified validation criteria. When this outcome was extended to trial-underrepresented groups, similar treatment effects were observed by sex, age, and CKD. This suggests that ONTARGET trial findings are generalisable to trial-underrepresented subgroups.

11.

Testing the sensitivity of diagnosis-derived patterns in functional brain networks to symptom burden in a Norwegian youth sample.

Voldsbekk, Irene; Kjelkenes, Rikka; Frogner, Erik R; Westlye, Lars T; Alnaes, Dag.

Hum Brain Mapp ; 45(3): e26631, 2024 Feb 15.

Article in English | MEDLINE | ID: mdl-38379514

ABSTRACT

Aberrant brain network development represents a putative aetiological component in mental disorders, which typically emerge during childhood and adolescence. Previous studies have identified resting-state functional connectivity (RSFC) patterns reflecting psychopathology, but their generalisability to other samples and politico-cultural contexts has not been established. We investigated whether a previously identified cross-diagnostic case-control and autism spectrum disorder (ASD)-specific pattern of RSFC (discovery sample; aged 5-21 from New York City, USA; n = 1666) could be validated in a Norwegian convenience-based youth sample (validation sample; aged 9-25 from Oslo, Norway; n = 531). As a test of generalisability, we investigated whether these diagnosis-derived RSFC patterns were sensitive to levels of symptom burden in both samples, based on an independent measure of symptom burden. Both the cross-diagnostic and ASD-specific RSFC patterns were validated across samples. Connectivity patterns were significantly associated with thematically appropriate symptom dimensions in the discovery sample. In the validation sample, the ASD-specific RSFC pattern showed a weak, inverse relationship with symptoms of conduct problems, hyperactivity, and prosociality, while the cross-diagnostic pattern was not significantly linked to symptoms. Diagnosis-derived connectivity patterns in a developmental clinical US sample were validated in a convenience sample of Norwegian youth; however, they were not associated with mental health symptoms.

Subject(s)

Autism Spectrum Disorder; Humans; Adolescent; Autism Spectrum Disorder/diagnostic imaging; Brain Mapping/methods; Symptom Burden; Brain/diagnostic imaging; Norway; Magnetic Resonance Imaging/methods

12.

Sex classification from functional brain connectivity: Generalization to multiple datasets.

Wiersch, Lisa; Friedrich, Patrick; Hamdan, Sami; Komeyer, Vera; Hoffstaedter, Felix; Patil, Kaustubh R; Eickhoff, Simon B; Weis, Susanne.

Hum Brain Mapp ; 45(6): e26683, 2024 Apr 15.

Article in English | MEDLINE | ID: mdl-38647035

ABSTRACT

Machine learning (ML) approaches are increasingly being applied to neuroimaging data. Studies in neuroscience typically have to rely on a limited set of training data, which may impair the generalizability of ML models. However, it is still unclear which kind of training sample is best suited to optimize generalization performance. In the present study, we systematically investigated the generalization performance of sex classification models trained on the parcelwise connectivity profile of either single samples or compound samples of two different sizes. Generalization performance was quantified in terms of mean across-sample classification accuracy and spatial consistency of accurately classifying parcels. Our results indicate that the generalization performance of parcelwise classifiers (pwCs) trained on single dataset samples is dependent on the specific test samples. Certain datasets seem to "match" in the sense that classifiers trained on a sample from one dataset achieved a high accuracy when tested on the respective other dataset and vice versa. The pwCs trained on the compound samples demonstrated the highest overall generalization performance for all test samples, including one derived from a dataset not included in building the training samples. Thus, our results indicate that both a large sample size and a heterogeneous data composition of a training sample have a central role in achieving generalizable results.

Subject(s)

Connectome; Machine Learning; Magnetic Resonance Imaging; Humans; Female; Male; Adult; Connectome/methods; Sex Characteristics; Datasets as Topic; Young Adult; Brain/diagnostic imaging; Brain/physiology

13.

A generalizability score for aggregate causal effect.

Chen, Rui; Chen, Guanhua; Yu, Menggang.

Biostatistics ; 24(2): 309-326, 2023 Apr 14.

Article in English | MEDLINE | ID: mdl-34382066

ABSTRACT

Scientists frequently generalize population level causal quantities such as average treatment effect from a source population to a target population. When the causal effects are heterogeneous, differences in subject characteristics between the source and target populations may make such a generalization difficult and unreliable. Reweighting or regression can be used to adjust for such differences when generalizing. However, these methods typically suffer from large variance if there is limited covariate distribution overlap between the two populations. We propose a generalizability score to address this issue. The score can be used as a yardstick to select target subpopulations for generalization. A simplified version of the score avoids using any outcome information and thus can prevent deliberate biases associated with inadvertent access to such information. Both simulation studies and real data analysis demonstrate convincing results for such selection.

Subject(s)

Research Design; Humans; Propensity Score; Computer Simulation; Causality; Bias

14.

Extending prediction models for use in a new target population with failure time outcomes.

Steingrimsson, Jon A.

Biostatistics ; 24(3): 728-742, 2023 Jul 14.

Article in English | MEDLINE | ID: mdl-35389429

ABSTRACT

Prediction models are often built and evaluated using data from a population that differs from the target population in which model-derived predictions are intended to be used. In this article, we present methods for evaluating model performance in the target population when some observations are right censored. The methods assume that outcome and covariate data are available from a source population used for model development and that covariates, but no outcome data, are available from the target population. We evaluate the finite-sample performance of the proposed estimators using simulations and apply the methods to transport a prediction model built using data from a lung cancer screening trial to a nationally representative population of participants eligible for lung cancer screening.

Subject(s)

Early Detection of Cancer; Lung Neoplasms; Humans; Models, Statistical; Computer Simulation

15.

DF-QSM: Data Fidelity based Hybrid Approach for Improved Quantitative Susceptibility Mapping of the Brain.

Paluru, Naveen; Susan Mathew, Raji; Yalavarthy, Phaneendra K.

NMR Biomed ; 37(9): e5163, 2024 Sep.

Article in English | MEDLINE | ID: mdl-38649140

ABSTRACT

Quantitative Susceptibility Mapping (QSM) is an advanced magnetic resonance imaging (MRI) technique to quantify the magnetic susceptibility of the tissue under investigation. Deep learning methods have shown promising results in deconvolving the susceptibility distribution from the measured local field obtained from the MR phase. Although existing deep learning based QSM methods can produce high-quality reconstructions, they are highly biased toward the training data distribution, which limits their generalizability. This work proposes a hybrid two-step reconstruction approach to improve deep learning based QSM reconstruction. The susceptibility map predicted by the deep learning methods is refined within the framework developed in this work to ensure consistency with the measured local field. The developed method was validated on existing deep learning and model-based deep learning methods for susceptibility mapping of the brain. It improved reconstruction for MRI volumes obtained with different acquisition settings, including for deep learning models trained in constrained (limited) data settings.

Subject(s)

Brain; Deep Learning; Magnetic Resonance Imaging; Magnetic Resonance Imaging/methods; Humans; Brain/diagnostic imaging; Brain Mapping/methods; Image Processing, Computer-Assisted/methods; Male; Female; Algorithms; Adult

16.

TMTV-Net: fully automated total metabolic tumor volume segmentation in lymphoma PET/CT images - a multi-center generalizability analysis.

Yousefirizi, Fereshteh; Klyuzhin, Ivan S; O, Joo Hyun; Harsini, Sara; Tie, Xin; Shiri, Isaac; Shin, Muheon; Lee, Changhee; Cho, Steve Y; Bradshaw, Tyler J; Zaidi, Habib; Bénard, François; Sehn, Laurie H; Savage, Kerry J; Steidl, Christian; Uribe, Carlos F; Rahmim, Arman.

Eur J Nucl Med Mol Imaging ; 51(7): 1937-1954, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38326655

ABSTRACT

PURPOSE: Total metabolic tumor volume (TMTV) segmentation has significant value in enabling quantitative imaging biomarkers for lymphoma management. In this work, we tackle the challenging task of automated tumor delineation in lymphoma from PET/CT scans using a cascaded approach. METHODS: Our study included 1418 2-[18F]FDG PET/CT scans from four different centers. The dataset was divided into 900 scans for development/validation/testing phases and 518 for multi-center external testing. The former consisted of 450 lymphoma, lung cancer, and melanoma scans, along with 450 negative scans, while the latter consisted of lymphoma patients from different centers with diffuse large B cell, primary mediastinal large B cell, and classic Hodgkin lymphoma cases. In the first step, our approach resamples PET/CT images into different voxel sizes and trains multi-resolution 3D U-Nets on each resampled dataset using a fivefold cross-validation scheme. The models trained on different data splits were ensembled. After soft voting is applied to the predicted masks, in the second step the probability-averaged predictions, along with the input imaging data, are fed into another 3D U-Net. Models were trained with a semi-supervised loss. We additionally considered the effectiveness of test time augmentation (TTA) in improving segmentation performance after training. In addition to quantitative analysis, including Dice score (DSC) and TMTV comparisons, a qualitative evaluation was conducted by nuclear medicine physicians. RESULTS: Our cascaded soft-voting guided approach achieved an average DSC of 0.68 ± 0.12 on the internal test data from the development dataset and an average DSC of 0.66 ± 0.18 on the multi-site external data (n = 518), significantly outperforming (p < 0.001) state-of-the-art (SOTA) approaches including nnU-Net and SWIN UNETR. While TTA yielded performance gains for some of the comparator methods, its impact on our cascaded approach was negligible (DSC: 0.66 ± 0.16). Our approach reliably quantified TMTV, with a correlation of 0.89 with the ground truth (p < 0.001). Furthermore, in terms of visual assessment, quantitative evaluations and clinician feedback were concordant in the majority of cases. The average relative error (ARE) and the absolute error (AE) in TMTV prediction on the external multi-centric dataset were ARE = 0.43 ± 0.54 and AE = 157.32 ± 378.12 mL for all the external test data (n = 518), and ARE = 0.30 ± 0.22 and AE = 82.05 ± 99.78 mL when the 10% outliers (n = 53) were excluded. CONCLUSION: TMTV-Net demonstrates strong performance and generalizability in TMTV segmentation across multi-site external datasets encompassing various lymphoma subtypes. A negligible reduction of 2% in overall performance during testing on external data highlights robust model generalizability across different centers and cancer types, likely attributable to its training with resampled inputs. Our model is publicly available, allowing easy multi-site evaluation and generalizability analysis on datasets from different institutions.
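Three quantities in this abstract are easy to compute by hand: soft voting (averaging voxelwise probabilities from several models), the Dice score, and the absolute and relative TMTV errors. The sketch below uses flat Python lists as stand-in masks. In TMTV-Net the averaged probabilities feed a second 3D U-Net rather than a fixed threshold, so the 0.5 cutoff here is an assumption for illustration only, and the function names are hypothetical.

```python
def soft_vote(prob_maps, threshold=0.5):
    """Average voxelwise foreground probabilities across models, then threshold."""
    n_models = len(prob_maps)
    averaged = [sum(voxel) / n_models for voxel in zip(*prob_maps)]
    return averaged, [1 if p >= threshold else 0 for p in averaged]


def dice(pred, truth):
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for flat binary (0/1) masks."""
    inter = sum(1 for a, b in zip(pred, truth) if a and b)
    size = sum(pred) + sum(truth)
    return 1.0 if size == 0 else 2.0 * inter / size  # two empty masks count as perfect overlap


def tmtv_errors(pred_ml, true_ml):
    """Absolute error (mL) and relative error of a predicted total metabolic tumor volume."""
    ae = abs(pred_ml - true_ml)
    return ae, ae / true_ml
```

Because the relative error divides by the true volume, small tumors can produce large ARE values even when the absolute error is modest, which may help explain why the reported ARE and AE distributions behave differently.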

Subject(s)

Image Processing, Computer-Assisted; Lymphoma; Positron Emission Tomography Computed Tomography; Tumor Burden; Humans; Positron Emission Tomography Computed Tomography/methods; Lymphoma/diagnostic imaging; Image Processing, Computer-Assisted/methods; Fluorodeoxyglucose F18; Automation; Male; Female

17.

Testing the generalizability and effectiveness of deep learning models among clinics: sperm detection as a pilot study.

Wang, Jiaqi; Jin, Yufei; Jiang, Aojun; Chen, Wenyuan; Shan, Guanqiao; Gu, Yifan; Ming, Yue; Li, Jichang; Yue, Chunfeng; Huang, Zongjie; Librach, Clifford; Lin, Ge; Wang, Xibu; Zhao, Huan; Sun, Yu; Zhang, Zhuoran.

Reprod Biol Endocrinol ; 22(1): 59, 2024 May 22.

Article in English | MEDLINE | ID: mdl-38778327

ABSTRACT

BACKGROUND: Deep learning has been increasingly investigated for assisting clinical in vitro fertilization (IVF). The first technical step in many tasks is to visually detect and locate sperm, oocytes, and embryos in images. For clinical deployment of such deep learning models, different clinics use different image acquisition hardware and different sample preprocessing protocols, raising the concern over whether the reported accuracy of a deep learning model by one clinic could be reproduced in another clinic. Here we aim to investigate the effect of each imaging factor on the generalizability of object detection models, using sperm analysis as a pilot example. METHODS: Ablation studies were performed using state-of-the-art models for detecting human sperm to quantitatively assess how model precision (false-positive detection) and recall (missed detection) were affected by imaging magnification, imaging mode, and sample preprocessing protocols. The results led to the hypothesis that the richness of image acquisition conditions in a training dataset deterministically affects model generalizability. The hypothesis was tested by first enriching the training dataset with a wide range of imaging conditions, then validated through internal blind tests on new samples and external multi-center clinical validations. RESULTS: Ablation experiments revealed that removing subsets of data from the training dataset significantly reduced model precision. Removing raw sample images from the training dataset caused the largest drop in model precision, whereas removing 20x images caused the largest drop in model recall. by incorporating different imaging and sample preprocessing conditions into a rich training dataset, the model achieved an intraclass correlation coefficient (ICC) of 0.97 (95% CI: 0.94-0.99) for precision, and an ICC of 0.97 (95% CI: 0.93-0.99) for recall. 
Multi-center clinical validation showed no significant differences in model precision or recall across different clinics and applications. CONCLUSIONS: The results validated the hypothesis that the richness of data in the training dataset is a key factor impacting model generalizability. These findings highlight the importance of diversity in a training dataset for model evaluation and suggest that future deep learning models in andrology and reproductive medicine should incorporate comprehensive feature sets for enhanced generalizability across clinics.
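The two error types tracked in the ablation study are false-positive detections, which lower precision, and missed detections, which lower recall. Both reduce to simple ratios over detection counts. The following is a minimal Python sketch. It assumes that predicted boxes have already been matched to annotated sperm, for example by an overlap threshold. The abstract does not state the matching rule, and the function name and counts are hypothetical.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) for an object detector.

    tp: detections matched to an annotated sperm (true positives)
    fp: detections with no matching annotation (false positives)
    fn: annotated sperm with no matching detection (missed detections)
    """
    # Precision: fraction of detections that are real sperm.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of real sperm that were detected.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for one image set (not data from the study).
p, r = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.90, recall=0.75
```

In this framing, removing training images that suppress spurious detections shows up as a rise in `fp`, while removing images that teach the model to find small or faint cells shows up as a rise in `fn`.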

Subject(s)

Deep Learning , Spermatozoa , Humans , Pilot Projects , Male , Spermatozoa/physiology , Fertilization in Vitro/methods , Image Processing, Computer-Assisted/methods , Semen Analysis/methods

18.

Reproducible Radiomics Features from Multi-MRI-Scanner Test-Retest-Study: Influence on Performance and Generalizability of Models.

Wennmann, Markus; Rotkopf, Lukas T; Bauer, Fabian; Hielscher, Thomas; Kächele, Jessica; Mai, Elias K; Weinhold, Niels; Raab, Marc-Steffen; Goldschmidt, Hartmut; Weber, Tim F; Schlemmer, Heinz-Peter; Delorme, Stefan; Maier-Hein, Klaus; Neher, Peter.

J Magn Reson Imaging ; 2024 May 11.

Article in English | MEDLINE | ID: mdl-38733369

ABSTRACT

BACKGROUND: Radiomics models trained on data from one center typically show a decline in performance when applied to data from external centers, which hinders their introduction into large-scale clinical practice. Current expert recommendations suggest using only reproducible radiomics features identified in multiscanner test-retest experiments, which might help overcome the problem of limited generalizability to external data. PURPOSE: To evaluate the influence of using only a subset of robust radiomics features, defined in a prior in vivo multi-MRI-scanner test-retest study, on the performance and generalizability of radiomics models. STUDY TYPE: Retrospective. POPULATION: Patients with monoclonal plasma cell disorders. Training set (117 MRIs from center 1); internal test set (42 MRIs from center 1); external test set (143 MRIs from centers 2-8). FIELD STRENGTH/SEQUENCE: 1.5T and 3.0T; T1-weighted turbo spin echo. ASSESSMENT: The task for the radiomics models was to noninvasively predict, from MRI, plasma cell infiltration as determined by bone marrow biopsy. Radiomics machine learning models were trained on data from center 1, using either all radiomics features or only reproducible radiomics features. The models were a linear regressor, a support vector regressor (SVR), and a random forest regressor (RFR). Models were tested on an internal data set (center 1) and a multicentric external data set (centers 2-8). STATISTICAL TESTS: Pearson correlation coefficient r and mean absolute error (MAE) between predicted and actual plasma cell infiltration. Fisher's z-transformation, Wilcoxon signed-rank test, Wilcoxon rank-sum test; significance level P < 0.05. RESULTS: When using only reproducible features compared with all features, the performance of the SVR on the external test set significantly improved (r = 0.43 vs. r = 0.18 and MAE = 22.6 vs. MAE = 28.2).
For the RFR, the performance on the external test set deteriorated when using only reproducible instead of all radiomics features (r = 0.33 vs. r = 0.44, P = 0.29 and MAE = 21.9 vs. MAE = 20.5, P = 0.10). CONCLUSION: Using only reproducible radiomics features improved the external performance of some, but not all, machine learning models. It did not automatically improve the external performance of the overall best radiomics model. TECHNICAL EFFICACY: Stage 2.
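The two performance metrics and Fisher's z-transformation listed under STATISTICAL TESTS can be illustrated in a few lines. The following is a minimal Python sketch. It treats the 143 external MRIs as 143 independent observations, which is an assumption. It computes an approximate 95% confidence interval for a single Pearson r, plus MAE on made-up values. The function names are hypothetical. This is not the authors' exact test: comparing two models evaluated on the same test set involves dependent correlations, and the abstract does not give that procedure.

```python
import math


def pearson_r_ci(r: float, n: int, z_crit: float = 1.959964) -> tuple[float, float]:
    """Approximate 95% CI for a Pearson correlation via Fisher's z-transformation."""
    z = math.atanh(r)            # Fisher's z = 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n - 3)  # approximate standard error of z
    # Build the interval on the z scale, then map it back to the r scale.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# r = 0.43 is the SVR external-test correlation reported in the abstract;
# n = 143 assumes one observation per external MRI.
low, high = pearson_r_ci(0.43, 143)
print(f"r = 0.43, approx. 95% CI {low:.2f} to {high:.2f}")

# Hypothetical infiltration percentages, not study data.
print(mean_absolute_error([30.0, 50.0], [25.0, 60.0]))  # 7.5
```

The transformation is used because the sampling distribution of r is skewed near the bounds of -1 and 1. The sampling distribution of atanh(r) is approximately normal, with a variance that depends only on n.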

19.

Comparing clinical trial population representativeness to real-world users of 17 biologics approved for immune-mediated inflammatory diseases: An external validity analysis of 66,639 biologic users from the Italian VALORE project.

Ingrasciotta, Ylenia; Spini, Andrea; L'Abbate, Luca; Fiore, Elena Sofia; Carollo, Massimo; Ientile, Valentina; Isgrò, Valentina; Cavazzana, Anna; Biasi, Valeria; Rossi, Paola; Ejlli, Lucian; Belleudi, Valeria; Poggi, Francesca; Sapigni, Ester; Puccini, Aurora; Ancona, Domenica; Stella, Paolo; Pollina Addario, Sebastiano; Allotta, Alessandra; Leoni, Olivia; Zanforlini, Martina; Tuccori, Marco; Gini, Rosa; Trifirò, Gianluca.

Pharmacol Res ; 200: 107074, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38232909

ABSTRACT

To date, no population-based studies have specifically explored the external validity of pivotal randomized clinical trials (RCTs) of biologics across a broad spectrum of immune-mediated inflammatory diseases (IMIDs) simultaneously. This study had two aims. The first was to compare patient characteristics and the median treatment duration of biologics approved for IMIDs between pivotal RCTs and the real-world (RW) setting. The second was to assess the proportion of biologic users treated for IMIDs in the RW setting who would not have been eligible for inclusion in the pivotal RCT for each indication of use. The analysis used the Italian VALORE distributed database (66,639 incident biologic users). Adult patients with IMIDs treated with biologics in the Italian RW setting were substantially older (mean age ± SD: 50 ± 15 years) than those enrolled in pivotal RCTs (45 ± 15 years). In the RW setting, certolizumab pegol was more commonly used by adult women with psoriasis/ankylosing spondylitis (F/M ratio: 1.8-1.9) than in RCTs (F/M ratio: 0.5-0.6). The median treatment duration (weeks) of incident biologic users in the RW setting was significantly longer than the duration of pivotal RCTs for almost all indications of use and most biologics (4-100 vs. 6-167). Furthermore, almost half (46.4%) of biologic users in RW settings would have been ineligible for inclusion in the respective indication-specific pivotal RCTs. The main reasons were advanced age, a recent history of cancer, and the presence of other concomitant IMIDs. These findings suggest that post-marketing surveillance of biologics should be prioritized for those patients.
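The eligibility analysis checks each real-world user against a pivotal trial's exclusion criteria and reports the share who fail at least one. The following is a minimal Python sketch of that bookkeeping. It covers the three main exclusion reasons named in the abstract. The age cap of 75 years and the 5-year cancer washout are hypothetical thresholds, since actual criteria differ per pivotal trial and indication. The class and function names are also hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BiologicUser:
    age: int
    years_since_cancer: Optional[float]  # None means no cancer history recorded
    other_imids: int                     # number of concomitant IMIDs


def exclusion_reasons(
    p: BiologicUser,
    max_age: int = 75,                   # hypothetical age cap
    cancer_washout_years: float = 5.0,   # hypothetical washout period
) -> list[str]:
    """List the (hypothetical) trial exclusion criteria this user meets."""
    reasons = []
    if p.age > max_age:
        reasons.append("advanced age")
    if p.years_since_cancer is not None and p.years_since_cancer < cancer_washout_years:
        reasons.append("recent history of cancer")
    if p.other_imids > 0:
        reasons.append("concomitant IMID")
    return reasons


def ineligible_share(users: list[BiologicUser]) -> float:
    """Fraction of users who meet at least one exclusion criterion."""
    return sum(1 for u in users if exclusion_reasons(u)) / len(users)


# Hypothetical users, not VALORE data.
users = [
    BiologicUser(age=50, years_since_cancer=None, other_imids=0),  # eligible
    BiologicUser(age=80, years_since_cancer=None, other_imids=0),  # too old
    BiologicUser(age=60, years_since_cancer=2.0, other_imids=0),   # recent cancer
    BiologicUser(age=45, years_since_cancer=None, other_imids=1),  # other IMID
]
print(ineligible_share(users))  # 0.75
```

Returning the list of reasons, rather than a single boolean, makes it possible to tabulate the main causes of ineligibility as well as the overall share.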

Subject(s)

Biological Products , Psoriasis , Adult , Female , Humans , Biological Products/adverse effects , Immunomodulating Agents , Italy , Psoriasis/drug therapy

20.

Fusing trial data for treatment comparisons: Single vs multi-span bridging.

Shook-Sa, Bonnie E; Zivich, Paul N; Rosin, Samuel P; Edwards, Jessie K; Adimora, Adaora A; Hudgens, Michael G; Cole, Stephen R.

Stat Med ; 43(4): 793-815, 2024 02 20.

Article in English | MEDLINE | ID: mdl-38110289

ABSTRACT

While randomized controlled trials (RCTs) are critical for establishing the efficacy of new therapies, the comparisons that can be made directly from trial data are limited. RCTs are limited to a small number of comparator arms and often compare a new therapy with a standard of care that has already proven efficacious. It is sometimes of interest to estimate the efficacy of the new therapy relative to a treatment that was not evaluated in the same trial, such as a placebo or an alternative therapy that was evaluated in a different trial. Such dual-study comparisons are challenging because of potential differences between trial populations that can affect the outcome. In this article, two bridging estimators are considered that allow for comparisons of treatments evaluated in different trials, accounting for measured differences in trial populations. A "multi-span" estimator leverages a shared arm between two trials, while a "single-span" estimator does not require a shared arm. A diagnostic statistic that compares the outcome in the standardized shared arms is provided. The two estimators are compared in simulations, where both estimators demonstrate minimal empirical bias and nominal confidence interval coverage when the identification assumptions are met. The estimators are applied to data from the AIDS Clinical Trials Group 320 and 388 to compare the efficacy of two-drug vs four-drug antiretroviral therapy on CD4 cell counts among persons with advanced HIV. The single-span approach requires weaker identification assumptions and was more efficient in simulations and the application.
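Both estimators account for measured differences in trial populations by standardizing outcomes to a common population. The following is a minimal Python sketch of that generic idea only: direct standardization of arm means over a single discrete covariate. It is not the article's single-span or multi-span estimator, whose details the abstract does not give. The function name, strata, and outcome values are hypothetical. A contrast built this way is only meaningful under assumptions of the kind the abstract calls identification assumptions. These include no unmeasured outcome-relevant differences between trial populations and positivity, meaning every target stratum is represented in each arm.

```python
from collections import Counter, defaultdict


def standardized_arm_mean(arm_rows, target_strata):
    """Standardize one trial arm's mean outcome to a target population.

    arm_rows: (stratum, outcome) pairs observed in a single trial arm.
    target_strata: stratum label for each person in the target population.
    Returns the sum over strata s of P_target(s) * mean outcome in the arm within s.
    """
    totals = defaultdict(float)
    counts = Counter()
    for stratum, outcome in arm_rows:
        totals[stratum] += outcome
        counts[stratum] += 1

    target_counts = Counter(target_strata)
    n_target = len(target_strata)
    missing = set(target_counts) - set(counts)
    if missing:
        # Positivity check: every target stratum must be observed in the arm.
        raise ValueError(f"no arm data for strata: {sorted(missing)}")
    return sum(
        (k / n_target) * (totals[s] / counts[s]) for s, k in target_counts.items()
    )


# Hypothetical CD4-like outcomes by a binary covariate (not ACTG data).
arm_a = [("low", 100.0), ("low", 120.0), ("high", 200.0)]  # arm from trial 1
arm_b = [("low", 90.0), ("high", 150.0), ("high", 170.0)]  # arm from trial 2
target = ["low", "high", "high", "high"]

mean_a = standardized_arm_mean(arm_a, target)  # 0.25*110 + 0.75*200 = 177.5
mean_b = standardized_arm_mean(arm_b, target)  # 0.25*90 + 0.75*160 = 142.5
print(mean_a - mean_b)  # 35.0
```

Standardizing both arms to the same target population makes the difference of means a comparison over one covariate distribution rather than two. The raw difference of trial means would mix the treatment contrast with population differences.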

Subject(s)

Anti-Retroviral Agents , Humans , Bias
