1.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38216539

ABSTRACT

In the drug development process, approximately 30% of failures are attributed to drug safety issues. In particular, the first-in-human (FIH) trial of a new drug represents one of the highest safety risks, and initial dose selection is crucial for ensuring safety in clinical trials. With traditional dose estimation methods, which extrapolate data from animals to humans, catastrophic events have occurred during Phase I clinical trials due to interspecies differences in compound sensitivity and unknown molecular mechanisms. To address this issue, this study proposes a CrossFuse-extreme gradient boosting (XGBoost) method that can directly predict the maximum recommended daily dose of a compound based on existing human research data, providing a reference for FIH dose selection. This method not only integrates multiple features, including molecular representations, physicochemical properties and compound-protein interactions, but also improves feature selection based on cross-validation. The results demonstrate that the CrossFuse-XGBoost method not only improves prediction accuracy compared to that of existing locally weighted methods [k-nearest neighbor (k-NN) and variable k-NN (v-NN)] but also solves the low prediction coverage issue of v-NN, achieving full coverage of the external validation set and enabling more reliable predictions. Furthermore, this study offers a high level of interpretability by identifying the importance of different features in model construction. The 241 features with the most significant impact on the maximum recommended daily dose were selected, providing references for optimizing the structure of new compounds and guiding experimental research. The datasets and source code are freely available at https://github.com/cqmu-lq/CrossFuse-XGBoost.


Subject(s)
Research Design , Software , Animals , Humans , Cluster Analysis
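The cross-validation-driven feature selection described in entry 1 can be sketched generically. This is an editorial illustration on synthetic data, not the authors' CrossFuse-XGBoost pipeline: scikit-learn's gradient boosting stands in for XGBoost, and the feature matrix, subset sizes, and scoring choice are all assumptions for the sketch.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Toy stand-in for a compound feature matrix: 200 "compounds", 30 features,
# only 8 of which are informative for the simulated dose target.
X, y = make_regression(n_samples=200, n_features=30, n_informative=8,
                       noise=5.0, random_state=0)

# Rank features by importance from a model fit on all features ...
full_model = GradientBoostingRegressor(random_state=0).fit(X, y)
ranking = np.argsort(full_model.feature_importances_)[::-1]

# ... then keep the subset size that maximizes cross-validated R^2.
best_k, best_score = None, -np.inf
for k in (5, 10, 20, 30):
    cols = ranking[:k]
    score = cross_val_score(GradientBoostingRegressor(random_state=0),
                            X[:, cols], y, cv=5, scoring="r2").mean()
    if score > best_score:
        best_k, best_score = k, score

print(best_k, round(best_score, 3))
```

The same loop applies unchanged if XGBoost's `XGBRegressor` is substituted for the scikit-learn estimator.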
2.
J Mol Evol ; 92(2): 181-206, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38502220

ABSTRACT

Ancestral sequence reconstruction (ASR) is a phylogenetic method widely used to analyze the properties of ancient biomolecules and to elucidate mechanisms of molecular evolution. Despite its increasingly widespread application, the accuracy of ASR is currently unknown, as it is generally impossible to compare resurrected proteins to the true ancestors. Which evolutionary models are best for ASR? How accurate are the resulting inferences? Here we answer these questions using a cross-validation method to reconstruct each extant sequence in an alignment with ASR methodology, a method we term "extant sequence reconstruction" (ESR). We thus can evaluate the accuracy of ASR methodology by comparing ESR reconstructions to the corresponding known true sequences. We find that a common measure of the quality of a reconstructed sequence, the average probability, is indeed a good estimate of the fraction of correct amino acids when the evolutionary model is accurate or overparameterized. However, the average probability is a poor measure for comparing reconstructions from different models, because, surprisingly, a more accurate phylogenetic model often results in reconstructions with lower probability. While better (more predictive) models may produce reconstructions with lower sequence identity to the true sequences, better models nevertheless produce reconstructions that are more biophysically similar to true ancestors. In addition, we find that a large fraction of sequences sampled from the reconstruction distribution may have fewer errors than the single most probable (SMP) sequence reconstruction, despite the fact that the SMP has the lowest expected error of all possible sequences. Our results emphasize the importance of model selection for ASR and the usefulness of sampling sequence reconstructions for analyzing ancestral protein properties. 
ESR is a powerful method for validating the evolutionary models used for ASR and can be applied in practice to any phylogenetic analysis of real biological sequences. Most significantly, ESR uses ASR methodology to provide a general method by which the biophysical properties of resurrected proteins can be compared to the properties of the true protein.


Subject(s)
Biological Evolution , Proteins , Phylogeny , Proteins/genetics , Proteins/chemistry , Molecular Evolution , Amino Acids
3.
Hum Brain Mapp ; 45(5): e26555, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38544418

ABSTRACT

Novel features derived from imaging and artificial intelligence systems are commonly coupled to construct computer-aided diagnosis (CAD) systems that are intended as clinical support tools or for investigation of complex biological patterns. This study used sulcal patterns from structural images of the brain as the basis for classifying patients with schizophrenia from unaffected controls. Statistical, machine learning and deep learning techniques were sequentially applied as a demonstration of how a CAD system might be comprehensively evaluated in the absence of prior empirical work or extant literature to guide development, and the availability of only small sample datasets. Sulcal features of the entire cerebral cortex were derived from 58 schizophrenia patients and 56 healthy controls. No similar CAD system using sulcal features from the entire cortex has been reported. We considered all the stages in a CAD system workflow: preprocessing, feature selection and extraction, and classification. The explainable AI techniques Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) were applied to detect the relevance of features to classification. At each stage, alternatives were compared in terms of their performance in the context of a small sample. Differentiating sulcal patterns were located in temporal and precentral areas, as well as the collateral fissure. We also verified the benefits of applying dimensionality reduction techniques and validation methods, such as resubstitution with upper bound correction, to optimize performance.


Subject(s)
Artificial Intelligence , Schizophrenia , Humans , Schizophrenia/diagnostic imaging , Neuroimaging , Machine Learning , Diagnosis, Computer-Assisted
4.
BMC Plant Biol ; 24(1): 222, 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38539100

ABSTRACT

BACKGROUND: Genomic selection (GS) is an efficient breeding strategy to improve quantitative traits. It is necessary to calculate genomic estimated breeding values (GEBVs) for GS. This study investigated the prediction accuracy of GEBVs for five fruit traits including fruit weight, fruit width, fruit height, pericarp thickness, and Brix. Two tomato germplasm collections (TGC1 and TGC2) were used as training populations, consisting of 162 and 191 accessions, respectively. RESULTS: Large phenotypic variations for the fruit traits were found in these collections, and the 51K Axiom™ SNP array yielded 31,142 high-confidence SNPs. Prediction accuracy was evaluated using different cross-validation methods, GS models, and marker sets in three training populations (TGC1, TGC2, and combined). For cross-validation, LOOCV was as effective as k-fold across traits and training populations. The parametric (RR-BLUP, Bayes A, and Bayesian LASSO) and non-parametric (RKHS, SVM, and random forest) models showed different prediction accuracies (0.594-0.870) between traits and training populations. Of these, random forest was the best model for fruit weight (0.780-0.835), fruit width (0.791-0.865), and pericarp thickness (0.643-0.866). The effect of marker density was trait-dependent and reached a plateau for each trait with 768-12,288 SNPs. Two additional sets of 192 and 96 SNPs from GWAS revealed higher prediction accuracies for the fruit traits compared to the 31,142 SNPs and eight subsets. CONCLUSION: Our study explored several factors to increase the prediction accuracy of GEBVs for fruit traits in tomato. The results can facilitate development of advanced GS strategies with cost-effective marker sets for improving fruit traits as well as other traits. Consequently, GS can be successfully applied to accelerate the tomato breeding process for developing elite cultivars.


Subject(s)
Solanum lycopersicum , Solanum lycopersicum/genetics , Bayes Theorem , Fruit/genetics , Plant Breeding , Phenotype , Genomics/methods , Polymorphism, Single Nucleotide/genetics , Models, Genetic , Genotype
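Entry 4 compares LOOCV with k-fold cross-validation for genomic prediction. A minimal sketch of that comparison, assuming simulated marker data and a random forest as in the study (the data, dimensions, and correlation-based accuracy metric here are illustrative, not the authors' 31,142-SNP panel):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_predict

# Simulated marker data: 100 accessions x 200 SNP-like features.
X, y = make_regression(n_samples=100, n_features=200, n_informative=40,
                       noise=10.0, random_state=1)

model = RandomForestRegressor(n_estimators=30, random_state=1)

# In GS studies, prediction accuracy is usually the correlation between
# observed phenotypes and cross-validated (GEBV-like) predictions.
pred_k = cross_val_predict(model, X, y,
                           cv=KFold(5, shuffle=True, random_state=1))
pred_loo = cross_val_predict(model, X, y, cv=LeaveOneOut())

acc_k = np.corrcoef(y, pred_k)[0, 1]
acc_loo = np.corrcoef(y, pred_loo)[0, 1]
print(round(acc_k, 3), round(acc_loo, 3))
```

With real panels, the trade-off is computational: LOOCV fits one model per accession, which the abstract's finding suggests buys little accuracy over 5-fold.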
5.
New Phytol ; 243(1): 111-131, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38708434

ABSTRACT

Leaf traits are essential for understanding many physiological and ecological processes. Partial least squares regression (PLSR) models with leaf spectroscopy are widely applied for trait estimation, but their transferability across space, time, and plant functional types (PFTs) remains unclear. We compiled a novel dataset of paired leaf traits and spectra, with 47 393 records for > 700 species and eight PFTs at 101 globally distributed locations across multiple seasons. Using this dataset, we conducted an unprecedented comprehensive analysis to assess the transferability of PLSR models in estimating leaf traits. While PLSR models demonstrate commendable performance in predicting chlorophyll content, carotenoid, leaf water, and leaf mass per area prediction within their training data space, their efficacy diminishes when extrapolating to new contexts. Specifically, extrapolating to locations, seasons, and PFTs beyond the training data leads to reduced R2 (0.12-0.49, 0.15-0.42, and 0.25-0.56) and increased NRMSE (3.58-18.24%, 6.27-11.55%, and 7.0-33.12%) compared with nonspatial random cross-validation. The results underscore the importance of incorporating greater spectral diversity in model training to boost its transferability. These findings highlight potential errors in estimating leaf traits across large spatial domains, diverse PFTs, and time due to biased validation schemes, and provide guidance for future field sampling strategies and remote sensing applications.


Subject(s)
Plant Leaves , Plant Leaves/physiology , Plant Leaves/anatomy & histology , Least-Squares Analysis , Quantitative Trait, Heritable , Chlorophyll/metabolism , Seasons , Models, Biological , Water , Carotenoids/metabolism
6.
Glob Chang Biol ; 30(1): e17019, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37987241

ABSTRACT

Correlative species distribution models are widely used to quantify past shifts in ranges or communities, and to predict future outcomes under ongoing global change. Practitioners confront a wide range of potentially plausible models for ecological dynamics, but most specific applications only consider a narrow set. Here, we clarify that certain model structures can embed restrictive assumptions about key sources of forecast uncertainty into an analysis. To evaluate forecast uncertainties and our ability to explain community change, we fit and compared 39 candidate multi- or joint species occupancy models to avian incidence data collected at 320 sites across California during the early 20th century and resurveyed a century later. We found massive (>20,000 LOOIC) differences in within-time information criterion across models. Poorer fitting models omitting multivariate random effects predicted less variation in species richness changes and smaller contemporary communities, with considerable variation in predicted spatial patterns in richness changes across models. The top models suggested avian environmental associations changed across time, contemporary avian occupancy was influenced by previous site-specific occupancy states, and that both latent site variables and species associations with these variables also varied over time. Collectively, our results recapitulate that simplified model assumptions not only impact predictive fit but may mask important sources of forecast uncertainty and mischaracterize the current state of system understanding when seeking to describe or project community responses to global change. We recommend that researchers seeking to make long-term forecasts prioritize characterizing forecast uncertainty over seeking to present a single best guess. 
To do so reliably, we urge practitioners to employ models capable of characterizing the key sources of forecast uncertainty, where predictors, parameters and random effects may vary over time or further interact with previous occurrence states.


Subject(s)
Climate Change , Climate , Animals , Uncertainty , Birds/physiology , Forecasting
7.
J Magn Reson Imaging ; 59(5): 1630-1642, 2024 May.
Article in English | MEDLINE | ID: mdl-37584329

ABSTRACT

BACKGROUND: Uncontrollable body movements are typical symptoms of Parkinson's disease (PD), which result in inconsistent findings regarding resting-state functional connectivity (rsFC) networks, especially for group difference clusters. A systematic method for identifying motion-associated data is therefore needed. PURPOSE: To determine data censoring criteria using a quantitative cross-validation-based data censoring (CVDC) method and to improve the detection of rsFC deficits in PD. STUDY TYPE: Prospective. SUBJECTS: Forty-one PD patients (68.63 ± 9.17 years, 44% female) and 20 healthy controls (66.83 ± 12.94 years, 55% female). FIELD STRENGTH/SEQUENCE: 3-T, T1-weighted gradient echo and EPI sequences. ASSESSMENT: Clusters with significant differences between groups were found in three visual networks, default network, and right sensorimotor network. Five-fold cross-validation tests were performed using multiple motion exclusion criteria, and the selected criteria were determined based on cluster sizes, significance values, and Dice coefficients among the cross-validation tests. As a reference method, whole brain rsFC comparisons between groups were analyzed using an FMRIB Software Library (FSL) pipeline with default settings. STATISTICAL TESTS: Group difference clusters were calculated using nonparametric permutation statistics of FSL-randomize. The family-wise error was corrected. Demographic information was evaluated using independent sample t-tests and Pearson's Chi-squared tests. The level of statistical significance was set at P < 0.05.
Furthermore, the CVDC method was capable of detecting subtle rsFC deficits in the medial sensorimotor network and auditory network that were unobservable using the conventional pipeline. DATA CONCLUSION: The CVDC method may provide superior sensitivity and improved reproducibility for detecting rsFC deficits in PD. LEVEL OF EVIDENCE: 1 TECHNICAL EFFICACY: Stage 2.


Subject(s)
Parkinson Disease , Humans , Female , Male , Parkinson Disease/diagnostic imaging , Magnetic Resonance Imaging/methods , Reproducibility of Results , Prospective Studies , Brain/diagnostic imaging , Brain Mapping/methods
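Entry 7's selected criterion (censor frames with framewise displacement > 0.55 mm) can be applied as below. The abstract gives only the threshold; the FD formula here is the widely used Power-style definition (sum of absolute parameter differences, rotations converted to millimeters on a 50 mm sphere), which is an assumption on our part, and the motion trace is simulated.

```python
import numpy as np

def framewise_displacement(motion, head_radius=50.0):
    """Power-style FD: sum of absolute backward differences of the six
    realignment parameters; rotations (radians) are converted to arc
    length in mm on a sphere of the given radius."""
    d = np.abs(np.diff(motion, axis=0))
    d[:, 3:] *= head_radius               # rotations -> mm
    return np.concatenate([[0.0], d.sum(axis=1)])

rng = np.random.default_rng(0)
# 200 volumes x 6 parameters (3 translations in mm, 3 rotations in rad),
# simulated as a slow random walk with one abrupt movement injected.
motion = np.cumsum(rng.normal(scale=0.002, size=(200, 6)), axis=0)
motion[120] += 1.0

fd = framewise_displacement(motion)
keep = fd <= 0.55        # the exclusion criterion selected in the study
print(int((~keep).sum()), round(float(fd.max()), 2))
```

The CVDC contribution is in how the 0.55 mm value is chosen (cross-validated cluster stability), not in the censoring step itself.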
8.
Stat Med ; 43(6): 1119-1134, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38189632

ABSTRACT

Tuning hyperparameters, such as the regularization parameter in Ridge or Lasso regression, is often aimed at improving the predictive performance of risk prediction models. In this study, various hyperparameter tuning procedures for clinical prediction models were systematically compared and evaluated in low-dimensional data. The focus was on out-of-sample predictive performance (discrimination, calibration, and overall prediction error) of risk prediction models developed using Ridge, Lasso, Elastic Net, or Random Forest. The influence of sample size, number of predictors and events fraction on performance of the hyperparameter tuning procedures was studied using extensive simulations. The results indicate important differences between tuning procedures in calibration performance, while generally showing similar discriminative performance. The one-standard-error rule for tuning applied to cross-validation (1SE CV) often resulted in severe miscalibration. Standard non-repeated and repeated cross-validation (both 5-fold and 10-fold) performed similarly well and outperformed the other tuning procedures. Bootstrap showed a slight tendency to more severe miscalibration than standard cross-validation-based tuning procedures. Differences between tuning procedures were larger for smaller sample sizes, lower events fractions and fewer predictors. These results imply that the choice of tuning procedure can have a profound influence on the predictive performance of prediction models. The results support the application of standard 5-fold or 10-fold cross-validation that minimizes out-of-sample prediction error. Despite an increased computational burden, we found no clear benefit of repeated over non-repeated cross-validation for hyperparameter tuning. We warn against the potentially detrimental effects on model calibration of the popular 1SE CV rule for tuning prediction models in low-dimensional settings.


Subject(s)
Research Design , Humans , Computer Simulation , Sample Size
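The tuning procedures contrasted in entry 8 (minimum-CV-error versus the one-standard-error rule) differ only in how the final regularization strength is read off the CV curve. A minimal self-contained sketch for Lasso, with invented data; the 1SE rule picks the most-regularized alpha whose mean CV error lies within one standard error of the minimum:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

X, y = make_regression(n_samples=120, n_features=10, n_informative=4,
                       noise=20.0, random_state=2)

alphas = np.logspace(-2, 2, 30)
cv = KFold(5, shuffle=True, random_state=2)
fold_mse = np.empty((len(alphas), cv.get_n_splits()))
for i, a in enumerate(alphas):
    for j, (tr, te) in enumerate(cv.split(X)):
        model = Lasso(alpha=a, max_iter=10_000).fit(X[tr], y[tr])
        fold_mse[i, j] = np.mean((y[te] - model.predict(X[te])) ** 2)

mean_mse = fold_mse.mean(axis=1)
se_mse = fold_mse.std(axis=1, ddof=1) / np.sqrt(cv.get_n_splits())

i_min = mean_mse.argmin()
alpha_min = alphas[i_min]
# 1SE rule: heaviest penalty still within one SE of the minimum error.
within = mean_mse <= mean_mse[i_min] + se_mse[i_min]
alpha_1se = alphas[within].max()
print(alpha_min, alpha_1se)
```

Because `alpha_1se >= alpha_min` by construction, the 1SE rule always shrinks coefficients at least as hard, which is the mechanism behind the miscalibration the study warns about in low-dimensional settings.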
9.
Stat Med ; 43(20): 3921-3942, 2024 Sep 10.
Article in English | MEDLINE | ID: mdl-38951867

ABSTRACT

For survival analysis applications we propose a novel procedure for identifying subgroups with large treatment effects, with focus on subgroups where treatment is potentially detrimental. The approach, termed forest search, is relatively simple and flexible. All possible subgroups are screened and selected based on hazard ratio thresholds indicative of harm, with assessment according to the standard Cox model. By reversing the role of treatment one can seek to identify substantial benefit. We apply a splitting consistency criterion to identify a subgroup considered "maximally consistent with harm." The type-1 error and power for subgroup identification can be quickly approximated by numerical integration. To aid inference we describe a bootstrap bias-corrected Cox model estimator with variance estimated by a jackknife approximation. We provide a detailed evaluation of operating characteristics in simulations and compare the proposal to virtual twins and generalized random forests, finding that it performs favorably. In particular, in our simulation setting, the proposed approach favorably controls the type-1 error for falsely identifying heterogeneity, with higher power and classification accuracy for substantial heterogeneous effects. Two real data applications are provided for publicly available datasets from clinical trials in oncology and HIV.


Subject(s)
Computer Simulation , HIV Infections , Proportional Hazards Models , Humans , Survival Analysis
10.
Stat Med ; 43(13): 2487-2500, 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-38621856

ABSTRACT

Precision medicine aims to identify specific patient subgroups that may benefit more from a particular treatment than the population as a whole. Existing definitions of the best subgroup in subgroup analysis are based on a single outcome and do not consider multiple outcomes, particularly outcomes of different types. In this article, we introduce a definition of the best subgroup under a multiple-outcome setting with continuous, binary, and censored time-to-event outcomes. Our definition provides a trade-off between the subgroup size and the conditional average treatment effects (CATE) in the subgroup with respect to each of the outcomes, while taking the relative contribution of the outcomes into account. We conduct simulations to illustrate the proposed definition. By examining the outcomes of urinary tract infection and renal scarring in the RIVUR clinical trial, we identify a subgroup of children that would benefit the most from long-term antimicrobial prophylaxis.


Subject(s)
Computer Simulation , Precision Medicine , Urinary Tract Infections , Humans , Urinary Tract Infections/drug therapy , Treatment Outcome , Models, Statistical , Child
11.
Stat Med ; 43(8): 1640-1659, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38351516

ABSTRACT

The regression discontinuity (RD) design is a widely utilized approach for assessing treatment effects. It involves assigning treatment based on the value of an observed covariate in relation to a fixed threshold. Although the RD design has been widely employed across various problems, its application to specific data types has received limited attention. For instance, there has been little research on utilizing the RD design when the outcome variable exhibits zero-inflation. This study introduces a novel RD estimator using local likelihood, which overcomes the limitations of the local linear regression model, a popular approach for estimating treatment effects in RD design, by accounting for the data type of the outcome variable. To determine the optimal bandwidth, we propose a modified Ludwig-Miller cross-validation method. A set of simulations is carried out, involving binary, count, and zero-inflated outcome variables, to showcase the superior performance of the suggested method over local linear regression models. Subsequently, the proposed local likelihood model is employed on HIV care data, where antiretroviral therapy eligibility is determined by a CD4 count threshold. A comparison is made between the results obtained using the local likelihood model and those obtained using local linear regression.


Subject(s)
Anti-HIV Agents , HIV Infections , Humans , South Africa , Anti-HIV Agents/therapeutic use , HIV Infections/drug therapy , Linear Models , Research Design
12.
Stat Med ; 43(11): 2096-2121, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38488240

ABSTRACT

Excessive zeros in multivariate count data are often observed in scenarios of biomedicine and public health. To provide a better analysis of this type of data, we first develop a marginalized multivariate zero-inflated Poisson (MZIP) regression model to directly interpret the overall exposure effects on marginal means. Then, we define a multiple Pearson residual for our newly developed MZIP regression model by simultaneously taking heterogeneity and correlation into consideration. Furthermore, a new model averaging prediction method is introduced based on the multiple Pearson residual, and the asymptotic optimality of this model averaging prediction is proved. Simulations and two empirical applications in medicine are used to illustrate the effectiveness of the proposed method.


Subject(s)
Computer Simulation , Models, Statistical , Humans , Poisson Distribution , Multivariate Analysis , Regression Analysis , Data Interpretation, Statistical
13.
Clin Transplant ; 38(4): e15316, 2024 04.
Article in English | MEDLINE | ID: mdl-38607291

ABSTRACT

BACKGROUND: Graft failure following liver transplantation (LTx) remains a persistent problem. While traditional risk scores for LTx have limited accuracy, the potential of machine learning (ML) in this area remains uncertain, despite its promise in other transplant domains. This study aims to determine ML's predictive limitations in LTx by replicating methods used in previous heart transplant research. METHODS: This study utilized the UNOS STAR database, selecting 64,384 adult patients who underwent LTx between 2010 and 2020. Gradient boosting models (XGBoost and LightGBM) were used to predict 14-, 30-, and 90-day graft failure and compared to a conventional logistic regression model. Models were evaluated using both shuffled and rolling cross-validation (CV) methodologies. Model performance was assessed using the AUC across validation iterations. RESULTS: Comparing predictive models for 14-, 30-, and 90-day graft survival, LightGBM consistently outperformed the other models, achieving the highest AUCs of .740, .722, and .700 under shuffled CV. Under rolling CV, however, accuracy declined for every ML algorithm. The analysis revealed influential factors for graft survival prediction across all models, including total bilirubin, medical condition, recipient age, and donor AST, among others. Several features, such as donor age and recipient diabetes history, were important in two out of three models. CONCLUSIONS: LightGBM enhances short-term graft survival predictions post-LTx. However, due to changing medical practices and selection criteria, continuous model evaluation is essential. Future studies should focus on temporal variations, clinical implications, and ensuring model transparency for broader medical utility.


Subject(s)
Liver Transplantation , Adult , Humans , Liver Transplantation/adverse effects , Research Design , Algorithms , Bilirubin , Machine Learning
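The shuffled-versus-rolling CV gap in entry 13 arises whenever the outcome-predictor relationship drifts over time. A small illustrative simulation (invented drift, logistic regression rather than the study's gradient boosting): shuffled folds mix eras so the drift averages out, while time-ordered folds must extrapolate into an unseen era.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, TimeSeriesSplit, cross_val_score

rng = np.random.default_rng(3)
n = 3000
# Records ordered by "listing date"; the effect of feature 0 drifts from
# positive to negative over the study period (a caricature of changing
# practice), while the effect of feature 1 stays constant.
t = np.linspace(0.0, 1.0, n)
X = rng.normal(size=(n, 5))
logit = X[:, 0] * (1.5 - 3.0 * t) + X[:, 1]
y = (logit + rng.normal(scale=0.3, size=n) > 0).astype(int)

model = LogisticRegression()
auc_shuffled = cross_val_score(model, X, y, scoring="roc_auc",
                               cv=KFold(5, shuffle=True, random_state=3)).mean()
auc_rolling = cross_val_score(model, X, y, scoring="roc_auc",
                              cv=TimeSeriesSplit(n_splits=5)).mean()
print(round(auc_shuffled, 3), round(auc_rolling, 3))
```

Rolling CV (train on the past, test on the future) is the honest estimate of deployed performance here, which is why the abstract calls for continuous model re-evaluation.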
14.
Ecol Appl ; 34(4): e2966, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38629509

ABSTRACT

Generating spatial predictions of species distribution is a central task for research and policy. Currently, correlative species distribution models (cSDMs) are among the most widely used tools for this purpose. However, a fundamental assumption of cSDMs, that species distributions are in equilibrium with their environment, is rarely fulfilled in real data and limits the applicability of cSDMs for dynamic projections. Process-based, dynamic SDMs (dSDMs) promise to overcome these limitations as they explicitly represent transient dynamics and enhance spatiotemporal transferability. Software tools for implementing dSDMs are becoming increasingly available, but their parameter estimation can be complex. Here, we test the feasibility of calibrating and validating a dSDM using long-term monitoring data of Swiss red kites (Milvus milvus). This population has shown strong increases in abundance and a progressive range expansion over the last decades, indicating a nonequilibrium situation. We construct an individual-based model using the RangeShiftR modeling platform and use Bayesian inference for model calibration. This allows the integration of heterogeneous data sources, such as parameter estimates from published literature and observational data from monitoring schemes, with a coherent assessment of parameter uncertainty. Our monitoring data encompass counts of breeding pairs at 267 sites across Switzerland over 22 years. We validate our model using a spatial-block cross-validation scheme and assess predictive performance with a rank-correlation coefficient. Our model showed very good predictive accuracy of spatial projections and represented well the observed population dynamics over the last two decades. Results suggest that reproductive success was a key factor driving the observed range expansion. According to our model, the Swiss red kite population fills large parts of its current range but has potential for further increases in density. 
We demonstrate the practicality of data integration and validation for dSDMs using RangeShiftR. This approach can improve predictive performance compared to cSDMs. The workflow presented here can be adopted for any population for which some prior knowledge on demographic and dispersal parameters as well as spatiotemporal observations of abundance or presence/absence are available. The fitted model provides improved quantitative insights into the ecology of a species, which can greatly aid conservation and management efforts.


Subject(s)
Models, Biological , Population Dynamics , Animals , Switzerland , Falconiformes/physiology , Environmental Monitoring/methods , Time Factors , Bayes Theorem
15.
Value Health ; 2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39094686

ABSTRACT

OBJECTIVES: Reimbursement decisions for new Alzheimer's disease (AD) treatments are informed by economic evaluations. An open-source model with an intuitive structure for model cross-validation can support the transparency and credibility of such evaluations. We describe the new International Pharmaco-Economic Collaboration on Alzheimer's Disease (IPECAD) open-source model framework (version 2) for the health-economic evaluation of early AD treatment and use it for cross-validation and addressing uncertainty. METHODS: A cohort state-transition model using a categorized composite domain (cognition and function) was developed by replicating an existing reference model and testing it for internal validity. Then, features of existing Institute for Clinical and Economic Review (ICER) and Alzheimer's Disease Archimedes Condition-Event Simulator (AD-ACE) models assessing lecanemab treatment were implemented for model cross-validation. Additional uncertainty scenarios were performed on choice of efficacy outcome from trial, natural disease progression, treatment effect waning and stopping rules, and other methodological choices. The model is available open-source as R code, spreadsheet, and web-based version via https://github.com/ronhandels/IPECAD. RESULTS: In the IPECAD model, incremental life-years, quality-adjusted life-year (QALY) gains, and cost savings were 21% to 31% smaller compared with the ICER model and 36% to 56% smaller compared with the AD-ACE model. IPECAD model results were particularly sensitive to assumptions about treatment effect waning and stopping rules and the choice of efficacy outcome from trial. CONCLUSIONS: We demonstrated the ability of the new IPECAD open-source model framework for researchers and decision makers to cross-validate other (Health Technology Assessment submission) models and perform additional uncertainty analyses, setting an example for open science in AD decision modeling and supporting important reimbursement decisions.
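The cohort state-transition structure underlying entry 15 can be sketched in a few lines. Everything numeric below is illustrative (hypothetical states, transition probabilities, utilities, and treatment effect), not IPECAD's calibrated inputs; the sketch only shows the mechanics of a discounted QALY calculation over a cohort trace.

```python
import numpy as np

# Hypothetical annual transition matrix over five states:
# MCI -> mild -> moderate -> severe -> dead (absorbing).
P_natural = np.array([
    [0.75, 0.20, 0.00, 0.00, 0.05],
    [0.00, 0.65, 0.25, 0.00, 0.10],
    [0.00, 0.00, 0.60, 0.25, 0.15],
    [0.00, 0.00, 0.00, 0.75, 0.25],
    [0.00, 0.00, 0.00, 0.00, 1.00],
])
utilities = np.array([0.73, 0.69, 0.53, 0.38, 0.0])  # illustrative QALY weights

def qalys(P, cycles=20, discount=0.035):
    state = np.array([1.0, 0, 0, 0, 0])   # whole cohort starts in MCI
    total = 0.0
    for cycle in range(cycles):
        total += (state @ utilities) / (1 + discount) ** cycle
        state = state @ P                 # advance the cohort one cycle
    return total

# Hypothetical treatment: 30% slower MCI -> mild progression.
P_treat = P_natural.copy()
P_treat[0, 1] *= 0.7
P_treat[0, 0] = 1 - P_treat[0, 1:].sum()  # keep the row summing to 1

gain = qalys(P_treat) - qalys(P_natural)
print(round(gain, 3))
```

Treatment effect waning and stopping rules, which the abstract flags as the most sensitive assumptions, would enter here as time-varying transition matrices rather than a single fixed `P_treat`.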

16.
BMC Med Res Methodol ; 24(1): 83, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38589775

ABSTRACT

BACKGROUND: The timing of treatment is an essential factor in its efficacy for cancer patients, so patients who will not respond to the current therapy should receive a different treatment as early as possible. Machine learning models can be built to classify responders and non-responders. Such classification models predict the probability of a patient being a responder, and most methods use a probability threshold of 0.5 to convert the probabilities into binary group membership. However, the cutoff of 0.5 is not always the optimal choice. METHODS: In this study, we propose a novel data-driven approach to selecting a better cutoff value based on the optimal cross-validation technique. To illustrate our novel method, we applied it to three clinical trial datasets of small-cell lung cancer patients. We used two of the datasets to build a scoring system to segment patients, and then applied the resulting models to segment patients in the test data. RESULTS: We found that, in the test data, the predicted responders and non-responders had significantly different long-term survival outcomes. Our proposed method segments patients better than the standard approach using a cutoff of 0.5. Comparing clinical outcomes of responders versus non-responders, our method yielded a p-value of 0.009 with a hazard ratio of 0.668 under the Cox proportional hazards model, and a p-value of 0.011 under the accelerated failure time model, confirming a significant difference between responders and non-responders. In contrast, the standard approach yielded a p-value of 0.194 with a hazard ratio of 0.823 under the Cox proportional hazards model, and a p-value of 0.240 under the accelerated failure time model, indicating that responders and non-responders do not differ significantly in survival. CONCLUSION: In summary, our novel prediction method can successfully segment new patients into responders and non-responders.
Clinicians can use our prediction to decide if a patient should receive a different treatment or stay with the current treatment.
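The cutoff-selection idea can be sketched with out-of-fold probabilities: scan candidate thresholds and keep the one that maximizes a chosen metric. A minimal illustration on synthetic data; the logistic regression model, balanced accuracy as the criterion, and 5-fold CV are placeholder assumptions, not the authors' exact pipeline:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for an imbalanced responder/non-responder dataset.
X, y = make_classification(n_samples=300, n_features=10,
                           weights=[0.7, 0.3], random_state=0)

# Out-of-fold predicted probabilities of being a responder (5-fold CV).
proba = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                          cv=5, method="predict_proba")[:, 1]

# Scan candidate cutoffs and keep the one with the best balanced accuracy;
# the default 0.50 is one of the candidates, so it can never win unfairly.
cutoffs = np.linspace(0.05, 0.95, 91)
scores = [balanced_accuracy_score(y, proba >= c) for c in cutoffs]
best_cutoff = cutoffs[int(np.argmax(scores))]
print(f"optimal cutoff: {best_cutoff:.2f} (vs. default 0.50)")
```

In a survival setting, the balanced-accuracy criterion would be replaced by whatever cross-validated objective the scoring system optimizes.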


Subject(s)
Lung Neoplasms, Small Cell Lung Carcinoma, Humans, Lung Neoplasms/diagnosis, Lung Neoplasms/therapy, Small Cell Lung Carcinoma/therapy, Treatment Outcome, Research Design
17.
BMC Med Res Methodol ; 24(1): 123, 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38831346

ABSTRACT

In contemporary society, depression has emerged as a prominent mental disorder that exhibits exponential growth and exerts a substantial influence on premature mortality. Although numerous research applied machine learning methods to forecast signs of depression. Nevertheless, only a limited number of research have taken into account the severity level as a multiclass variable. Besides, maintaining the equality of data distribution among all the classes rarely happens in practical communities. So, the inevitable class imbalance for multiple variables is considered a substantial challenge in this domain. Furthermore, this research emphasizes the significance of addressing class imbalance issues in the context of multiple classes. We introduced a new approach Feature group partitioning (FGP) in the data preprocessing phase which effectively reduces the dimensionality of features to a minimum. This study utilized synthetic oversampling techniques, specifically Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic (ADASYN), for class balancing. The dataset used in this research was collected from university students by administering the Burn Depression Checklist (BDC). For methodological modifications, we implemented heterogeneous ensemble learning stacking, homogeneous ensemble bagging, and five distinct supervised machine learning algorithms. The issue of overfitting was mitigated by evaluating the accuracy of the training, validation, and testing datasets. To justify the effectiveness of the prediction models, balanced accuracy, sensitivity, specificity, precision, and f1-score indices are used. Overall, comprehensive analysis demonstrates the discrimination between the Conventional Depression Screening (CDS) and FGP approach. In summary, the results show that the stacking classifier for FGP with SMOTE approach yields the highest balanced accuracy, with a rate of 92.81%. 
The empirical evidence has demonstrated that the FGP approach, when combined with the SMOTE, able to produce better performance in predicting the severity of depression. Most importantly the optimization of the training time of the FGP approach for all of the classifiers is a significant achievement of this research.


Subject(s)
Algorithms, Depression, Machine Learning, Humans, Depression/diagnosis, Severity of Illness Index, Sensitivity and Specificity, Female
18.
BMC Med Res Methodol ; 24(1): 148, 2024 Jul 13.
Article in English | MEDLINE | ID: mdl-39003462

ABSTRACT

We propose a compartmental model for investigating smoking dynamics in an Italian region (Tuscany). Calibrating the model on local data from 1993 to 2019, we estimate the probabilities of starting and quitting smoking and the probability of smoking relapse. Then, we forecast the evolution of smoking prevalence until 2043 and assess the impact on mortality in terms of attributable deaths. We introduce elements of novelty with respect to previous studies in this field, including a formal definition of the equations governing the model dynamics and a flexible modelling of smoking probabilities based on cubic regression splines. We estimate model parameters by defining a two-step procedure and quantify the sampling variability via a parametric bootstrap. We propose the implementation of cross-validation on a rolling basis and variance-based Global Sensitivity Analysis to check the robustness of the results and support our findings. Our results suggest a decrease in smoking prevalence among males and stability among females, over the next two decades. We estimate that, in 2023, 18% of deaths among males and 8% among females are due to smoking. We test the use of the model in assessing the impact on smoking prevalence and mortality of different tobacco control policies, including the tobacco-free generation ban recently introduced in New Zealand.
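A compartmental smoking model of this kind can be sketched as a discrete-time system with never-, current-, and former-smoker compartments. The transition probabilities and initial shares below are invented for illustration and are not the Tuscany estimates; the paper additionally lets these probabilities vary smoothly over time via cubic regression splines rather than holding them constant:

```python
import numpy as np

def step(state, p_start=0.02, p_quit=0.04, p_relapse=0.01):
    """One-year update: never -> current (initiation), current -> former
    (cessation), former -> current (relapse). Total population is conserved."""
    never, current, former = state
    started = p_start * never
    quit = p_quit * current
    relapsed = p_relapse * former
    return np.array([never - started,
                     current + started + relapsed - quit,
                     former + quit - relapsed])

# Hypothetical initial population shares (never / current / former).
state = np.array([0.55, 0.25, 0.20])
for _ in range(20):  # project two decades ahead
    state = step(state)
print(f"projected smoking prevalence after 20 years: {state[1]:.3f}")
```

Calibration would fit the transition probabilities to observed prevalence series, with a parametric bootstrap around the fitted parameters to quantify sampling variability, as the abstract describes.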


Subject(s)
Forecasting, Smoking Cessation, Smoking, Humans, Italy/epidemiology, Female, Male, Smoking/epidemiology, Prevalence, Forecasting/methods, Smoking Cessation/statistics & numerical data, Adult, Middle Aged, Statistical Models
19.
BMC Med Imaging ; 24(1): 72, 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38532313

ABSTRACT

BACKGROUND: Quantitative determination of the correlation between cognitive ability and functional biomarkers in the aging brain is essential. To identify biomarkers associated with cognitive performance in older adults, this study combined an index model specific to resting-state functional connectivity (FC) with a supervised machine learning method. METHODS: Scores on conventional cognitive tests and resting-state functional MRI data were obtained for 98 healthy older individuals and 90 healthy young individuals from two public databases. Based on the test scores, the older cohort was categorized into two groups: excellent and poor. A resting-state FC scores model (rs-FCSM) was constructed for each older individual to determine the relative differences in FC among brain regions compared with the young cohort. Brain areas sensitive to test scores could then be identified using this model. To demonstrate the effectiveness of the constructed model, the scores of these brain areas were used as the feature matrix for training an extreme learning machine. Classification accuracy (CA) was then tested in separate groups and validated by N-fold cross-validation. RESULTS: The model could effectively classify the cognitive status of healthy older individuals according to the model scores of the frontal, temporal, and parietal lobes, with a mean accuracy of 86.67%, higher than that achieved by conventional correlation analysis. CONCLUSION: This classification study based on the rs-FCSM may facilitate early detection of age-related cognitive decline as well as help reveal the underlying pathological mechanisms.
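An extreme learning machine of the kind used here (a random, untrained hidden layer followed by closed-form least-squares output weights) can be sketched in a few lines; the toy features below stand in for the rs-FCSM brain-area scores and are not the study's data:

```python
import numpy as np

rng = np.random.default_rng(0)

class ELM:
    """Minimal extreme learning machine: random hidden layer, tanh activation,
    output weights solved in closed form via the Moore-Penrose pseudo-inverse."""

    def __init__(self, n_hidden=50, rng=rng):
        self.n_hidden, self.rng = n_hidden, rng

    def fit(self, X, y):
        # Hidden weights and biases are drawn once and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)
        # Least-squares fit of output weights to one-hot class targets.
        self.beta = np.linalg.pinv(H) @ np.eye(2)[y]
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return H @ self.beta  # class scores; argmax gives the label

# Toy two-class problem standing in for "excellent" vs. "poor" cognition.
X = rng.normal(size=(100, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
elm = ELM().fit(X, y)
pred = elm.predict(X).argmax(axis=1)
print(f"training accuracy: {(pred == y).mean():.2f}")
```

Because only the output layer is solved, training is a single linear-algebra step, which is the main appeal of ELMs over iteratively trained networks.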


Subject(s)
Brain, Cognition, Adolescent, Humans, Brain Mapping/methods, Magnetic Resonance Imaging/methods, Biomarkers
20.
World J Surg Oncol ; 22(1): 227, 2024 Aug 29.
Article in English | MEDLINE | ID: mdl-39198807

ABSTRACT

OBJECTIVE: Tongue squamous cell carcinoma (TSCC) accounts for 43.4% of oral cancers in China and has a poor prognosis. This study aimed to explore whether radiomics features extracted from preoperative magnetic resonance imaging (MRI) could predict overall survival (OS) in patients with TSCC. METHODS: The clinical imaging data of 232 patients with pathologically confirmed TSCC at Xiangyang No. 1 People's Hospital were retrospectively analyzed from February 2010 to October 2022. Based on 2-10 years of follow-up, patients were categorized into two groups: control (healthy survival, n = 148) and research (adverse events: recurrence or metastasis-related death, n = 84). A training and a test set were established using a 7:3 ratio and a time node. Radiomics features were extracted from axial T2-weighted imaging, contrast-enhanced T1-weighted imaging, and diffusion-weighted imaging (DWI) sequences. The corresponding radiomics scores were generated using the least absolute shrinkage and selection operator algorithm. Kaplan-Meier and multivariate Cox regression analyses were used to screen for independent factors affecting adverse events in patients with TSCC using clinical and pathological results. A novel nomogram was established to predict the probability of adverse events and OS in patients with TSCC. RESULTS: The incidence of adverse events within 2-10 years after surgery was 36.21%. Kaplan-Meier analysis revealed that hot pot consumption, betel nut chewing, platelet-lymphocyte ratio, drug use, neutrophil-lymphocyte ratio, Radscore, and other factors impacted TSCC survival. Multivariate Cox regression analysis revealed that the clinical stage (P < 0.001), hot pot consumption (P < 0.001), Radscore 1 (P = 0.01), and Radscore 2 (P < 0.001) were independent factors affecting TSCC-OS. The same result was validated by the XGBoost algorithm. 
The nomogram based on the aforementioned factors exhibited good discrimination (C-index 0.86/0.81) and calibration (P > 0.05) in the training and test sets, accurately predicting the risk of adverse events and survival. CONCLUSION: The nomogram constructed using clinical data and MRI radiomics parameters may accurately predict TSCC-OS noninvasively, thereby assisting clinicians in promptly modifying treatment strategies to improve patient prognosis.
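The LASSO step that condenses many radiomics features into a per-patient score (Radscore) can be sketched as follows. The synthetic feature matrix and binary event label are placeholders for the study's MRI features and follow-up outcomes, and a plain `LassoCV` on the event label is used here to keep the sketch dependency-free; the study's survival setting would typically use a Cox-penalized LASSO on time-to-event data instead:

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: 100 patients x 40 radiomics features; the adverse-event
# label depends on only a few of them, mimicking sparse true signal.
X = rng.normal(size=(100, 40))
event = (X[:, 0] - 0.8 * X[:, 3] + 0.3 * rng.normal(size=100) > 0).astype(float)

# Standardize, then let cross-validated LASSO shrink irrelevant coefficients
# to exactly zero, performing feature selection and scoring in one model.
Xs = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5, random_state=0).fit(Xs, event)

kept = np.flatnonzero(lasso.coef_)          # indices of retained features
radscore = Xs @ lasso.coef_ + lasso.intercept_  # per-patient Radscore
print(f"features retained by LASSO: {len(kept)} of {X.shape[1]}")
```

The resulting Radscore would then enter the multivariate Cox model alongside clinical covariates, as in the abstract's nomogram.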


Subject(s)
Magnetic Resonance Imaging, Nomograms, Tongue Neoplasms, Humans, Male, Female, Middle Aged, Tongue Neoplasms/pathology, Tongue Neoplasms/mortality, Tongue Neoplasms/diagnostic imaging, Tongue Neoplasms/surgery, Retrospective Studies, Pilot Projects, Survival Rate, Magnetic Resonance Imaging/methods, Magnetic Resonance Imaging/statistics & numerical data, Prognosis, Follow-Up Studies, Squamous Cell Carcinoma/diagnostic imaging, Squamous Cell Carcinoma/mortality, Squamous Cell Carcinoma/pathology, Squamous Cell Carcinoma/surgery, Aged, Adult, Squamous Cell Carcinoma of Head and Neck/diagnostic imaging, Squamous Cell Carcinoma of Head and Neck/mortality, Squamous Cell Carcinoma of Head and Neck/pathology, Squamous Cell Carcinoma of Head and Neck/surgery, Neoplasm Recurrence, Local/pathology, Neoplasm Recurrence, Local/diagnostic imaging, Neoplasm Recurrence, Local/mortality, Radiomics