ABSTRACT
Background: Studying causal relationships between different brain regions with fMRI has attracted great attention. To investigate causal relationships between brain regions, we need to identify both the brain network structure and the magnitude of the influences. Most current methods concentrate on estimating magnitudes rather than on identifying the connections, or structure, of the network. To address this problem, we proposed a nonlinear system identification method in which a polynomial kernel was adopted to approximate the relation between the system inputs and outputs. However, this method suffers from overfitting when it is applied directly to model the brain network. Methods: To overcome this limitation, this study applied the least absolute shrinkage and selection operator (LASSO) model selection method to identify both the brain region network and the connection strengths (system coefficients). From these coefficients, the causal influence is derived from the identified structure. The method was verified on the human visual cortex with phase-encoded designs. The functional data were pre-processed with motion correction, and the visual cortex regions were defined with a retinotopic mapping method. An eight-connection visual system network was adopted to validate the method. The proposed method identified both the connected visual networks and the associated coefficients from the LASSO model selection. Results: The results showed that this method can identify both network structures and the associated causal influences between different brain regions. Conclusions: System identification with the LASSO model selection algorithm is a powerful approach for fMRI effective connectivity studies.
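As a rough illustration of the approach described above (not the authors' implementation), the following sketch simulates a few region time series, expands them with a degree-2 polynomial basis, and lets LASSO shrink the coefficients of absent connections to zero; NumPy and scikit-learn are assumed, and all signals and the "true" connections are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
T, n_regions = 200, 4
X = rng.standard_normal((T, n_regions))          # hypothetical BOLD signals of source regions
# Target region driven only by regions 0 and 2 (the sparse "structure" to recover)
y = 0.8 * X[:, 0] - 0.5 * X[:, 2] ** 2 + 0.1 * rng.standard_normal(T)

# Polynomial expansion approximates the nonlinear input-output relation
poly = PolynomialFeatures(degree=2, include_bias=False)
Z = poly.fit_transform(X)

# LASSO shrinks coefficients of unconnected regions/terms to exactly zero
model = LassoCV(cv=5).fit(Z, y)
for name, coef in zip(poly.get_feature_names_out(), model.coef_):
    if abs(coef) > 1e-3:
        print(f"{name}: {coef:+.3f}")            # surviving terms define the identified network
```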
Subjects
Brain Mapping, Magnetic Resonance Imaging, Visual Cortex, Humans, Magnetic Resonance Imaging/methods, Visual Cortex/diagnostic imaging, Visual Cortex/physiology, Brain Mapping/methods, Algorithms, Nerve Net/diagnostic imaging, Nerve Net/physiology, Computer-Assisted Image Processing/methods, Brain/diagnostic imaging, Brain/physiology, Adult
ABSTRACT
Ensemble methods such as bagging and random forests are ubiquitous in various fields, from finance to genomics. Despite their prevalence, the question of the efficient tuning of ensemble parameters has received relatively little attention. This paper introduces a cross-validation method, ECV (Extrapolated Cross-Validation), for tuning the ensemble and subsample sizes in randomized ensembles. Our method builds on two primary ingredients: initial estimators for small ensemble sizes using out-of-bag errors and a novel risk extrapolation technique that leverages the structure of prediction risk decomposition. By establishing uniform consistency of our risk extrapolation technique over ensemble and subsample sizes, we show that ECV yields δ-optimal (with respect to the oracle-tuned risk) ensembles for squared prediction risk. Our theory accommodates general predictors, only requires mild moment assumptions, and allows for high-dimensional regimes where the feature dimension grows with the sample size. As a practical case study, we employ ECV to predict surface protein abundances from gene expressions in single-cell multiomics using random forests under a computational constraint on the maximum ensemble size. Compared to sample-split and K-fold cross-validation, ECV achieves higher accuracy by avoiding sample splitting. Meanwhile, its computational cost is considerably lower owing to the use of the risk extrapolation technique.
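A minimal sketch of the risk-extrapolation idea, assuming the squared risk of a bagged ensemble of size M decays approximately as a + b/M: out-of-bag errors of two small random forests (scikit-learn assumed) are used to extrapolate the risk of much larger ensembles without sample splitting. This is only a schematic of the ingredient, not the ECV estimator itself.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=20, noise=5.0, random_state=0)

def oob_mse(n_trees):
    # Out-of-bag squared error of a small forest (no sample splitting needed)
    rf = RandomForestRegressor(n_estimators=n_trees, max_samples=0.5,
                               oob_score=True, random_state=0).fit(X, y)
    return np.mean((y - rf.oob_prediction_) ** 2)

# Assume risk(M) ~ a + b / M, so two small ensembles suffice to extrapolate to larger sizes
m1, m2 = 10, 20
r1, r2 = oob_mse(m1), oob_mse(m2)
b = (r1 - r2) / (1.0 / m1 - 1.0 / m2)
a = r1 - b / m1
for m in (25, 100, 500):
    print(f"extrapolated risk at M={m}: {a + b / m:.2f}")
```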
ABSTRACT
This paper concerns structure learning or discovery of discrete generative models. It focuses on Bayesian model selection and the assimilation of training data or content, with a special emphasis on the order in which data are ingested. A key move in the ensuing schemes is to place priors on the selection of models, based upon expected free energy. In this setting, expected free energy reduces to a constrained mutual information, where the constraints inherit from priors over outcomes (i.e., preferred outcomes). The resulting scheme is first used to perform image classification on the MNIST dataset to illustrate the basic idea, and then tested on a more challenging problem of discovering models with dynamics, using a simple sprite-based visual disentanglement paradigm and the Tower of Hanoi (cf. blocks world) problem. In these examples, generative models are constructed autodidactically to recover (i.e., disentangle) the factorial structure of latent states and their characteristic paths or dynamics.
ABSTRACT
Background: In epidemiology, indicators such as the relative excess risk due to interaction (RERI), the attributable proportion (AP), and the synergy index (S) are commonly used to assess additive interaction between two variables. However, the results of these indicators are sometimes inconsistent in real-world applications, and it may be difficult to draw conclusions from them. Method: Based on the relationship between the RERI, AP, and S, we propose a method with consistent results, achieved by imposing the constraint e^θ3 - e^θ1 - e^θ2 + 1 = 0, whose results are simple and clear to interpret. We present two pathways to achieve this end: one completes the constraint by adding a regularization penalty term to the model likelihood function; the other uses model selection. Result: Using simulated and real data, our proposed methods effectively identified additive interactions and proved to be applicable to real-world data. Simulations were used to evaluate the performance of the methods in scenarios with and without additive interaction. The penalty term converged to 0 with increasing λ, and the final models matched the expected interaction status, demonstrating that regularized estimation can effectively identify additive interactions. Model selection was compared with classical methods (delta and bootstrap) across scenarios with different interaction strengths; the additive interactions were captured and the results aligned closely with those of the bootstrap. In the real-data example, the coefficients in the model without interaction adhered to the simplifying equation, reinforcing that there was no significant additive interaction between smoking and alcohol use on oral cancer risk. Conclusion: In summary, the model selection method based on the Hannan-Quinn criterion (HQ) appears to be a competitive alternative to the bootstrap method for identifying additive interactions. Furthermore, when the constraint is used with RERI, AP, and S to assess additive interaction, the results are more consistent and easier to interpret.
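For concreteness, a small sketch (assuming a log-linear model with coefficients θ1 and θ2 for the two single exposures and θ3 for the joint exposure) showing how RERI, AP, and S are computed and how coefficients satisfying the constraint force all three indicators to their no-interaction values (RERI = 0, AP = 0, S = 1):

```python
import numpy as np

def additive_interaction(theta1, theta2, theta3):
    """RERI, AP and S from log relative risks (or log odds ratios for a rare outcome)."""
    rr10, rr01, rr11 = np.exp([theta1, theta2, theta3])
    reri = rr11 - rr10 - rr01 + 1
    ap = reri / rr11
    s = (rr11 - 1) / (rr10 - 1 + rr01 - 1)
    return reri, ap, s

# Unconstrained (hypothetical) coefficients: the three indicators can disagree in magnitude
print(additive_interaction(0.4, 0.6, 1.3))

# Coefficients satisfying exp(t3) - exp(t1) - exp(t2) + 1 = 0 give RERI = 0, AP = 0, S = 1,
# i.e. all three indicators consistently report "no additive interaction"
t1, t2 = 0.4, 0.6
t3 = np.log(np.exp(t1) + np.exp(t2) - 1)
print(additive_interaction(t1, t2, t3))
```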
Subjects
Computer Simulation, Humans, Statistical Models, Epidemiologic Models, Smoking/epidemiology
ABSTRACT
Understanding how genetic variation affects gene expression is essential for a complete picture of the functional pathways that give rise to complex traits. Although numerous studies have established that many genes are differentially expressed in distinct human tissues and cell types, no tools exist for identifying the genes whose expression is differentially regulated. Here we introduce DRAB (differential regulation analysis by bootstrapping), a gene-based method for testing whether patterns of genetic regulation are significantly different between tissues or other biological contexts. DRAB first leverages the elastic net to learn context-specific models of local genetic regulation and then applies a novel bootstrap-based model comparison test to check their equivalency. Unlike previous model comparison tests, our proposed approach can determine whether population-level models have equal predictive performance by accounting for the variability of feature selection and model training. We validated DRAB on mRNA expression data from a variety of human tissues in the Genotype-Tissue Expression (GTEx) Project. DRAB yielded biologically reasonable results and had sufficient power to detect genes with tissue-specific regulatory profiles while effectively controlling false positives. By providing a framework that facilitates the prioritization of differentially regulated genes, our study enables future discoveries on the genetic architecture of molecular phenotypes.
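A schematic of the general idea, not the DRAB implementation: refit the whole elastic-net pipeline on bootstrap resamples in each context and examine the bootstrap distribution of the difference in out-of-sample prediction error. Genotypes, expression values, and the train/test split are simulated and purely illustrative; scikit-learn is assumed.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(1)
n, p = 300, 50
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)     # local genotypes (0/1/2)
beta_a = np.zeros(p); beta_a[:3] = [0.8, -0.5, 0.4]      # regulation in tissue A
beta_b = np.zeros(p); beta_b[2:5] = [0.4, 0.7, -0.6]     # different regulation in tissue B
expr_a = G @ beta_a + rng.normal(scale=1.0, size=n)
expr_b = G @ beta_b + rng.normal(scale=1.0, size=n)

def boot_error(y, seed):
    """Refit the whole pipeline (regularization tuning + training) on a bootstrap resample."""
    idx = np.random.default_rng(seed).integers(0, len(y), len(y))
    half = len(y) // 2
    model = ElasticNetCV(cv=5).fit(G[idx[:half]], y[idx[:half]])
    resid = y[idx[half:]] - model.predict(G[idx[half:]])
    return np.mean(resid ** 2)

# Bootstrap the difference in predictive performance of the two context-specific models
diffs = [boot_error(expr_a, s) - boot_error(expr_b, s) for s in range(50)]
lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"95% bootstrap interval for the error difference: ({lo:.3f}, {hi:.3f})")
```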
ABSTRACT
Reconstructing the evolutionary history of different groups of organisms provides insight into how life originated and diversified on Earth. Phylogenetic trees are commonly used to estimate this evolutionary history. Within Bayesian phylogenetics, a major step in estimating a tree is choosing an appropriate model of character evolution. While molecular sequence data are the most commonly used character data, morphological data remain a vital source of information. The use of morphological characters allows for the incorporation of fossil taxa and, despite advances in molecular sequencing, continues to play a significant role in neontology. Moreover, morphology is the main data source that allows us to unite extinct and extant taxa directly under the same generating process. We therefore require suitable models of morphological character evolution, the most common being the Lewis Mk model. While it is frequently used in both palaeobiology and neontology, it is not known whether the simple Mk substitution model, or any of its extensions, provides a sufficiently good description of the process of morphological evolution. In this study we investigate the impact of different morphological models on empirical tetrapod data sets. Specifically, we compare unpartitioned Mk models with models in which characters are partitioned by the number of observed states, both with and without allowing for rate variation across sites and accounting for ascertainment bias. We show that the choice of substitution model affects both topology and branch lengths, highlighting the importance of model choice. Through simulations, we validate the use of the model adequacy approach, posterior predictive simulation, for choosing an appropriate model. Additionally, we compare the performance of model adequacy with Bayesian model selection. We demonstrate that model selection approaches based on marginal likelihoods are not appropriate for choosing between models with partition schemes that vary in character state space (i.e., that vary in Q-matrix state size). Using posterior predictive simulations, we found that current variations of the Mk model often perform adequately in capturing the evolutionary dynamics that generated our data. We do not find a preference for a particular model extension across multiple data sets, indicating that there is no 'one size fits all' when it comes to morphological data and that careful consideration should be given to choosing models of discrete character evolution. By using suitable models of character evolution, we can increase our confidence in our phylogenetic estimates, which should in turn allow us to gain more accurate insights into the evolutionary history of both extinct and extant taxa.
ABSTRACT
Accurately predicting tree mortality in mixed forests is a challenge for conventional models because of large uncertainty, especially under a changing climate. Machine learning algorithms have the potential to predict individual tree mortality with higher accuracy by filtering the relevant climatic and environmental factors. In this study, the sensitivity of individual tree mortality to regional climate was validated by modeling seminatural mixed coniferous forests based on 25 years of observations in northeastern China. Three advanced machine learning and deep learning algorithms were employed: support vector machines, multi-layer perceptrons, and random forests. Mortality was predicted from multiple inherent and environmental factors, including tree size and growth, topography, competition, stand structure, and regional climate. All three types of models performed satisfactorily, with areas under the receiver operating characteristic curve (AUC) above 0.9. With tree growth, competition, and regional climate as input variables, a random forest model showed the highest explained variance score (0.862) and AUC (0.914). Because trees were vulnerable regardless of species, mortality could follow growth limitation induced by insufficient or excessive solar radiation during growing seasons, thermal insufficiency caused by cold winters, and annual moisture constraints in these mixed coniferous forests. Our findings could enrich basic knowledge of individual tree mortality caused by water and heat inadequacy under the negative impacts of global warming. Successful individual tree mortality modeling with advanced algorithms in mixed forests could assist adaptive forest ecology modeling over large areas.
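A minimal sketch of this kind of modeling setup (simulated stand-in predictors and mortality labels; scikit-learn assumed): fit a random forest classifier on tree-level and climate variables and score it with the area under the ROC curve.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Hypothetical predictors: diameter growth, competition index, growing-season radiation, winter temperature
cols = ["growth", "competition", "radiation", "winter_temp"]
X = rng.standard_normal((n, len(cols)))
# Simulated mortality probability rises with competition and falls with growth (illustrative only)
logit = -2.0 - 1.2 * X[:, 0] + 1.0 * X[:, 1] + 0.3 * np.abs(X[:, 2]) - 0.5 * X[:, 3]
dead = rng.random(n) < 1 / (1 + np.exp(-logit))

X_tr, X_te, y_tr, y_te = train_test_split(X, dead, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])
print(f"AUC = {auc:.3f}")
print(dict(zip(cols, np.round(rf.feature_importances_, 3))))   # which factors the model relies on
```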
Subjects
Forests, Trees, China, Climate Change, Machine Learning, Cold Temperature, Environmental Monitoring/methods
ABSTRACT
Students of biological allometry have used the logarithmic transformation for over a century to linearize bivariate distributions that are curvilinear on the arithmetic scale. When the distribution is linear, the equation for a straight line fitted to the distribution can be back-transformed to form a two-parameter power function for describing the original observations. However, many of the data in contemporary studies of allometry fail to meet the requirement for log-linearity, thereby precluding the use of the aforementioned protocol. Even when data are linear in logarithmic form, the two-parameter power equation estimated by back-transformation may yield a misleading or erroneous perception of pattern in the original distribution. A better approach to bivariate allometry is to forego transformation altogether and to fit multiple models to the untransformed observations by nonlinear regression, thereby creating a pool of candidate models with different functional forms and different assumptions regarding random error. The best model in the pool can then be identified by a selection procedure based on maximum likelihood. Two examples are presented to illustrate the power and versatility of these newer methods for studying allometric variation. It is always better to examine the original data when it is possible to do so.
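A brief sketch of the recommended workflow (simulated data, SciPy assumed): fit candidate models to the untransformed observations by nonlinear least squares and compare them with AIC computed from the normal log-likelihood.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(2)
x = np.linspace(1, 50, 80)
y = 2.0 * x ** 0.75 + rng.normal(scale=1.5, size=x.size)   # simulated allometric data, additive error

def power2(x, a, b):          # two-parameter power function
    return a * x ** b

def power3(x, a, b, c):       # three-parameter power function with an intercept
    return a * x ** b + c

def aic(y, yhat, k):
    """AIC for a least-squares fit assuming normal, homoscedastic error."""
    n = y.size
    rss = np.sum((y - yhat) ** 2)
    return n * np.log(rss / n) + 2 * (k + 1)   # +1 for the estimated error variance

for name, f, p0 in [("2-parameter power", power2, (1, 1)),
                    ("3-parameter power", power3, (1, 1, 0))]:
    popt, _ = curve_fit(f, x, y, p0=p0, maxfev=10000)
    print(f"{name}: AIC = {aic(y, f(x, *popt), len(popt)):.1f}, params = {np.round(popt, 3)}")
```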
Subjects
Biological Models, Algorithms, Likelihood Functions, Animals, Humans
ABSTRACT
We developed a novel machine learning (ML) algorithm with the goal of producing transparent models (i.e., understandable by humans) while also flexibly accounting for nonlinearity and interactions. Our method is based on ranked sparsity, and it allows for flexibility and user control in varying the shade of the opacity of black box machine learning methods. The main tenet of ranked sparsity is that an algorithm should be more skeptical of higher-order polynomials and interactions a priori compared to main effects, and hence, the inclusion of these more complex terms should require a higher level of evidence. In this work, we put our new ranked sparsity algorithm (as implemented in the open source R package, sparseR) to the test in a predictive model "bakeoff" (i.e., a benchmarking study of ML algorithms applied "out of the box", that is, with no special tuning). Algorithms were trained on a large set of simulated and real-world data sets from the Penn Machine Learning Benchmarks database, addressing both regression and binary classification problems. We evaluated the extent to which our human-centered algorithm can attain predictive accuracy that rivals popular black box approaches such as neural networks, random forests, and support vector machines, while also producing more interpretable models. Using out-of-bag error as a meta-outcome, we describe the properties of data sets in which human-centered approaches can perform as well as or better than black box approaches. We found that interpretable approaches predicted optimally or within 5% of the optimal method in most real-world data sets. We provide a more in-depth comparison of the performances of random forests to interpretable methods for several case studies, including exemplars in which algorithms performed similarly, and several cases when interpretable methods underperformed. This work provides a strong rationale for including human-centered transparent algorithms such as ours in predictive modeling applications.
ABSTRACT
Virtually all multicellular organisms on Earth live in symbiotic associations with complex microbial communities: the microbiome. This ancient relationship is of fundamental importance for both the host and the microbiome. Recently, the analyses of numerous microbiomes have revealed an incredible diversity and complexity of symbionts, with different mechanisms identified as potential drivers of this diversity. However, the interplay of ecological and evolutionary forces generating these complex associations is still poorly understood. Here we explore and summarise the suite of ecological and evolutionary mechanisms identified as relevant to different aspects of microbiome complexity and diversity. We argue that microbiome assembly is a dynamic product of ecology and evolution at various spatio-temporal scales. We propose a theoretical framework to classify mechanisms and build mechanistic host-microbiome models to link them to empirical patterns. We develop a cohesive foundation for the theoretical understanding of the combined effects of ecology and evolution on the assembly of complex symbioses.
ABSTRACT
Purpose: Metabolite amplitude estimates derived from linear-combination modeling of MR spectra depend upon the precise list of constituent metabolite basis functions used (the "basis set"). The absence of clear consensus on the "ideal" composition, or of objective criteria to determine the suitability of a particular basis set, contributes to the poor reproducibility of MRS. In this proof-of-concept study, we demonstrate a novel, data-driven approach for deciding the basis-set composition using the Bayesian information criterion (BIC). Methods: We developed an algorithm that iteratively adds metabolites to the basis set, informed by BIC scores at each modeling step. We investigated two quantitative "stopping conditions", referred to as max-BIC and zero-amplitude, and whether to optimize the selection of the basis set on a per-spectrum basis or at the group level. The algorithm was tested on two groups of synthetic in-vivo-like spectra representing healthy brain and tumor spectra, respectively, and the derived basis sets (and metabolite amplitude estimates) were compared to the ground truth. Results: All derived basis sets correctly identified the high-concentration metabolites and provided reasonable fits of the spectra. At the single-spectrum level, the two stopping conditions recovered the underlying basis set with 77-87% accuracy. When optimizing across a group, basis set determination accuracy improved to 84-92%. Conclusion: Data-driven determination of the basis-set composition is feasible. With refinement, this approach could provide a valuable data-driven way to derive or refine basis sets, reducing the operator bias of MRS analyses, enhancing the objectivity of quantitative analyses, and increasing the clinical viability of MRS.
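To make the iterative scheme concrete, here is a simplified sketch (random stand-in basis functions and spectrum, NumPy only; not the authors' algorithm): greedily add the candidate metabolite that most improves the BIC of a least-squares fit and stop once no addition lowers the BIC, loosely mimicking the max-BIC stopping condition.

```python
import numpy as np

rng = np.random.default_rng(3)
n_points, candidates = 512, ["NAA", "Cr", "Cho", "Glu", "Lac", "mI"]
basis = {m: np.abs(rng.standard_normal(n_points)) for m in candidates}   # stand-in basis functions
truth = 3.0 * basis["NAA"] + 1.5 * basis["Cr"] + 1.0 * basis["Cho"]
spectrum = truth + rng.normal(scale=0.5, size=n_points)

def bic(selected):
    """BIC of a least-squares fit of the spectrum with the selected basis functions."""
    if not selected:
        rss, k = np.sum(spectrum ** 2), 0
    else:
        A = np.column_stack([basis[m] for m in selected])
        coef, rss_arr, *_ = np.linalg.lstsq(A, spectrum, rcond=None)
        rss = rss_arr[0] if rss_arr.size else np.sum((spectrum - A @ coef) ** 2)
        k = len(selected)
    return n_points * np.log(rss / n_points) + k * np.log(n_points)

selected, pool = [], list(candidates)
while pool:
    scores = {m: bic(selected + [m]) for m in pool}
    best = min(scores, key=scores.get)
    if scores[best] >= bic(selected):     # stop once adding a metabolite no longer lowers BIC
        break
    selected.append(best); pool.remove(best)
print("selected basis set:", selected)
```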
ABSTRACT
Density-dependent population dynamic models strongly influence many of the world's most important harvest policies. Nearly all classic models (e.g., Beverton-Holt and Ricker) recommend that managers maintain a population size of roughly 40-50 percent of carrying capacity to maximize sustainable harvest, no matter the species' population growth rate. Such insights are the foundational logic behind most sustainability targets and biomass reference points for fisheries. However, a simple, less commonly used model, called the Hockey-Stick model, yields very different recommendations. We show that the optimal population size to maintain in this model, as a proportion of carrying capacity, is one over the population growth rate. This leads to more conservative optimal harvest policies for slow-growing species, compared to other models, if all models use the same growth rate and carrying capacity values. However, parameters typically are not fixed; they are estimated by model fitting. If the Hockey-Stick model leads to lower estimates of carrying capacity than other models, then the Hockey-Stick policy could yield lower absolute population size targets in practice. Therefore, to better understand the population size targets that may be recommended across real fisheries, we fit the Hockey-Stick, Ricker, and Beverton-Holt models to population time series for 284 fished species from the RAM Stock Assessment database. We found that the Hockey-Stick model usually recommended that fisheries maintain population sizes higher than all other models (in 69-81% of the data sets). Furthermore, in 77% of the data sets, the Hockey-Stick model recommended an optimal population target even higher than 60% of carrying capacity (a widely used target, thought to be conservative). However, there was considerable uncertainty in the model fitting. While Beverton-Holt fit several of the data sets best, the Hockey-Stick model also frequently fit similarly well. In general, the best-fitting model rarely had overwhelming support (a model probability greater than 95% was achieved in less than five percent of the data sets). A computational experiment, in which time series were simulated from all three models, revealed that Beverton-Holt often fit best even when it was not the true model, suggesting that fisheries data are likely too small and too noisy to resolve uncertainties in the functional forms of density-dependent growth. Therefore, sustainability targets may warrant revisiting, especially for slow-growing species.
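A small numerical check of the stated result, under one common parameterisation of the Hockey-Stick model (an assumption on our part): next-year biomass is min(λB, K), so the equilibrium surplus available for harvest is min(λB, K) - B, which is maximised at B = K/λ, i.e. at a fraction 1/λ of carrying capacity.

```python
import numpy as np

def optimal_fraction(growth_rate, K=1.0, grid=100001):
    """Population size (as a fraction of K) maximising sustainable harvest
    for the hockey-stick model B_next = min(growth_rate * B, K)."""
    B = np.linspace(0, K, grid)
    surplus = np.minimum(growth_rate * B, K) - B     # equilibrium harvest at stock size B
    return B[np.argmax(surplus)] / K

for lam in (1.2, 1.5, 2.0, 4.0):
    print(f"growth rate {lam}: optimal B/K = {optimal_fraction(lam):.3f} (1/lambda = {1 / lam:.3f})")
```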
Subjects
Conservation of Natural Resources, Fisheries, Fishes, Mathematical Concepts, Biological Models, Population Density, Population Dynamics, Fisheries/statistics & numerical data, Animals, Conservation of Natural Resources/statistics & numerical data, Population Dynamics/statistics & numerical data, Fishes/growth & development, Biomass, Computer Simulation
ABSTRACT
Modern developments in autonomous chemometric machine learning technology strive to relinquish the need for human intervention. However, such algorithms, as developed and used in chemometric multivariate calibration and classification applications, exclude crucial expert insight when difficult and safety-critical analysis situations arise, e.g., spectral-based medical decisions such as noninvasively determining whether a biopsy is cancerous. The prediction accuracy and interpolation capabilities of autonomous methods for new samples depend on the quality and scope of their training (calibration) data. Specifically, analysis patterns within target data not captured by the training data will produce undesirable outcomes. Alternatively, an immersive analytic approach allows the insertion of human expert judgment at key machine learning algorithm junctures, forming a sensemaking process performed in cooperation with a computer. The capacity of immersive virtual reality (IVR) environments to render human-comprehensible three-dimensional space simulating real-world encounters suggests their suitability as a hybrid immersive human-computer interface for data analysis tasks. Using IVR maximizes the human senses to capitalize on our instinctual perception of the physical environment, thereby leveraging our innate ability to recognize patterns and visualize thresholds crucial to reducing erroneous outcomes. In this first use of IVR as an immersive analytic tool for spectral data, we examine an integrated IVR real-time model selection algorithm for a recent model updating method that adapts a model from the original calibration domain to predict samples from shifted target domains. Using near-infrared data, analyte prediction errors from IVR-selected models are reduced compared with errors from an established autonomous model selection approach. The results demonstrate the viability of IVR as a human data analysis interface for spectral data analysis, including classification problems.
ABSTRACT
Phylogenetic tree reconstruction with molecular data is important in many fields of life science research. The gold standard in this discipline is phylogenetic tree reconstruction based on the Maximum Likelihood method. In this study, we present neural networks that predict the best model of sequence evolution and the correct topology for alignments of four nucleotide or amino acid sequences. We trained neural networks with different architectures using simulated alignments for a wide range of evolutionary models, model parameters, and branch lengths. By comparing the accuracy of model and topology prediction of the trained neural networks with the Maximum Likelihood and Neighbour Joining methods, we show that for quartet trees the neural network classifier outperforms the Neighbour Joining method and is in most cases as good as the Maximum Likelihood method at inferring the best model of sequence evolution and the best tree topology. These results are consistent for nucleotide and amino acid sequence data. We also show that our method is superior for model selection to previously published methods based on convolutional networks. Furthermore, we found that the neural network classifiers are much faster than the IQ-TREE implementation of the Maximum Likelihood method. Our results show that neural networks could become a true competitor to the Maximum Likelihood method in phylogenetic reconstruction.
Subjects
Machine Learning, Neural Networks (Computer), Phylogeny, Sequence Alignment, Likelihood Functions, Genetic Models, Molecular Evolution
ABSTRACT
Neurophysiological brain activity comprises rhythmic (periodic) and arrhythmic (aperiodic) signal elements, which are increasingly studied in relation to behavioral traits and clinical symptoms. Current methods for spectral parameterization of neural recordings rely on user-dependent parameter selection, which challenges the replicability and robustness of findings. Here, we introduce a principled approach to model selection, relying on the Bayesian information criterion, for static and time-resolved spectral parameterization of neurophysiological data. We present extensive tests of the approach with ground-truth and empirical magnetoencephalography recordings. Data-driven model selection enhances both the specificity and sensitivity of spectral and spectrogram decompositions, even in non-stationary contexts. Overall, the proposed spectral decomposition with data-driven model selection minimizes the reliance on user expertise and subjective choices, enabling more robust, reproducible, and interpretable research findings.
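As a simplified illustration of the selection step (not the authors' toolbox), the sketch below fits a synthetic power spectrum with an aperiodic (1/f) model and with an aperiodic-plus-Gaussian-peak model and lets the Bayesian information criterion decide whether the rhythmic component is warranted; SciPy is assumed and all signal parameters are made up.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(4)
freqs = np.linspace(2, 40, 200)
# Synthetic log-power: 1/f aperiodic component plus a 10 Hz (alpha) peak and noise
log_power = (2.0 - 1.2 * np.log10(freqs)
             + 0.6 * np.exp(-((freqs - 10) ** 2) / (2 * 1.5 ** 2))
             + rng.normal(scale=0.05, size=freqs.size))

def aperiodic(f, offset, exponent):
    return offset - exponent * np.log10(f)

def aperiodic_plus_peak(f, offset, exponent, amp, cf, bw):
    return aperiodic(f, offset, exponent) + amp * np.exp(-((f - cf) ** 2) / (2 * bw ** 2))

def bic(y, yhat, k):
    n = y.size
    return n * np.log(np.sum((y - yhat) ** 2) / n) + k * np.log(n)

p0_simple, p0_peak = (1.0, 1.0), (1.0, 1.0, 0.5, 10.0, 2.0)
fit_s, _ = curve_fit(aperiodic, freqs, log_power, p0=p0_simple)
fit_p, _ = curve_fit(aperiodic_plus_peak, freqs, log_power, p0=p0_peak, maxfev=10000)
bic_s = bic(log_power, aperiodic(freqs, *fit_s), len(p0_simple))
bic_p = bic(log_power, aperiodic_plus_peak(freqs, *fit_p), len(p0_peak))
print(f"BIC aperiodic only: {bic_s:.1f}  |  BIC with peak: {bic_p:.1f}")
print("selected model:", "aperiodic + peak" if bic_p < bic_s else "aperiodic only")
```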
ABSTRACT
Background: Patients with mild cognitive impairment (MCI) are at high risk of developing Alzheimer's disease and related dementias (ADRD), at an estimated annual rate above 10%. It is clinically and practically important to accurately predict the MCI-to-dementia conversion time. Objective: To accurately predict the MCI-to-dementia conversion time using easily available clinical data. Methods: The dementia diagnosis often falls between two clinical visits, and such a survival outcome is known as interval-censored data. We utilized a semi-parametric model and a random forest model for interval-censored data, in conjunction with a variable selection approach, to select important measures for predicting the conversion time from MCI to dementia. Two large AD cohort data sets were used to build, validate, and test the predictive model. Results: We found that the semi-parametric model improved the prediction of the conversion time for patients with MCI-to-dementia conversion and also had good predictive performance for all patients. Conclusions: Interval-censored data should be analyzed with models developed for interval-censored data to improve model performance.
Subjects
Cognitive Dysfunction, Dementia, Disease Progression, Humans, Cognitive Dysfunction/diagnosis, Female, Male, Aged, Dementia/diagnosis, Dementia/epidemiology, Dementia/psychology, Aged 80 and over, Cohort Studies, Time Factors, Statistical Models, Predictive Value of Tests, Alzheimer Disease/diagnosis, Neuropsychological Tests/statistics & numerical data
ABSTRACT
As the pandemic continues to pose challenges to global public health, developing effective predictive models has become an urgent research topic. This study aims to explore the application of multi-objective optimization methods in selecting infectious disease prediction models and evaluate their impact on improving prediction accuracy, generalizability, and computational efficiency. In this study, the NSGA-II algorithm was used to compare models selected by multi-objective optimization with those selected by traditional single-objective optimization. The results indicate that decision tree (DT) and extreme gradient boosting regressor (XGBoost) models selected through multi-objective optimization methods outperform those selected by other methods in terms of accuracy, generalizability, and computational efficiency. Compared to the ridge regression model selected through single-objective optimization methods, the decision tree (DT) and XGBoost models demonstrate significantly lower root mean square error (RMSE) on real datasets. This finding highlights the potential advantages of multi-objective optimization in balancing multiple evaluation metrics. However, this study's limitations suggest future research directions, including algorithm improvements, expanded evaluation metrics, and the use of more diverse datasets. The conclusions of this study emphasize the theoretical and practical significance of multi-objective optimization methods in public health decision support systems, indicating their wide-ranging potential applications in selecting predictive models.
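To illustrate the multi-objective selection step, a minimal sketch with hypothetical candidate models and scores (not the study's NSGA-II pipeline): extract the Pareto-optimal set over test RMSE, generalization gap, and training time with an exhaustive dominance check, which a full NSGA-II run would replace for larger search spaces.

```python
import numpy as np

# Hypothetical (RMSE, generalization gap, training time in s) for candidate models
candidates = {
    "ridge":   (12.4, 2.2, 0.5),
    "DT":      (9.8,  2.0, 0.4),
    "XGBoost": (8.9,  1.6, 3.1),
    "SVR":     (11.0, 1.3, 5.7),
    "MLP":     (9.5,  3.4, 12.0),
}

def dominates(a, b):
    """a dominates b if it is no worse on every objective and strictly better on at least one."""
    a, b = np.asarray(a), np.asarray(b)
    return np.all(a <= b) and np.any(a < b)

pareto = [name for name, score in candidates.items()
          if not any(dominates(other, score)
                     for oname, other in candidates.items() if oname != name)]
print("Pareto-optimal models:", pareto)
```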
ABSTRACT
Small-angle scattering (SAS) is a key experimental technique for analyzing nanoscale structures in various materials. In SAS data analysis, selecting an appropriate mathematical model for the scattering intensity is critical, as it generates a hypothesis of the structure of the experimental sample. Traditional model selection methods either rely on qualitative approaches or are prone to overfitting. This paper introduces an analytical method that applies Bayesian model selection to SAS measurement data, enabling a quantitative evaluation of the validity of mathematical models. The performance of the method is assessed through numerical experiments using artificial data for multicomponent spherical materials, demonstrating that this proposed analysis approach yields highly accurate and interpretable results. The ability of the method to analyze a range of mixing ratios and particle size ratios for mixed components is also discussed, along with its precision in model evaluation by the degree of fitting. The proposed method effectively facilitates quantitative analysis of nanoscale sample structures in SAS, which has traditionally been challenging, and is expected to contribute significantly to advancements in a wide range of fields.
ABSTRACT
Drug resistance is one of the biggest challenges in the fight against cancer. In particular, in the case of glioblastoma, the most lethal brain tumour, resistance to temozolomide (the standard of care drug for chemotherapy in this tumour) is one of the main reasons behind treatment failure and hence responsible for the poor prognosis of patients diagnosed with this disease. In this work, we combine the power of three-dimensional in vitro experiments of treated glioblastoma spheroids with mathematical models of tumour evolution and adaptation. We use a novel approach based on internal variables for modelling the acquisition of resistance to temozolomide that was observed in experiments for a group of treated spheroids. These internal variables describe the cell's phenotypic state, which depends on the history of drug exposure and affects cell behaviour. We use model selection to determine the most parsimonious model and calibrate it to reproduce the experimental data, obtaining a high level of agreement between the in vitro and in silico outcomes. A sensitivity analysis is carried out to investigate the impact of each model parameter in the predictions. More importantly, we show how the model is useful for answering biological questions, such as what is the intrinsic adaptation mechanism, or for separating the sensitive and resistant populations. We conclude that the proposed in silico framework, in combination with experiments, can be useful to improve our understanding of the mechanisms behind drug resistance in glioblastoma and to eventually set some guidelines for the design of new treatment schemes.
Subjects
Brain Neoplasms, Antineoplastic Drug Resistance, Glioblastoma, Biological Models, Temozolomide, Temozolomide/pharmacology, Temozolomide/therapeutic use, Glioblastoma/drug therapy, Humans, Antineoplastic Drug Resistance/drug effects, Brain Neoplasms/drug therapy, Brain Neoplasms/pathology, Alkylating Antineoplastic Agents/therapeutic use, Alkylating Antineoplastic Agents/pharmacology, Tumor Cell Line, Cellular Spheroids/drug effects, Dacarbazine/analogs & derivatives, Dacarbazine/therapeutic use, Dacarbazine/pharmacology, Computer Simulation, Physiological Adaptation
ABSTRACT
Purpose: Best current practice in the analysis of dynamic contrast-enhanced (DCE) MRI is to employ a voxel-by-voxel model selection from a hierarchy of nested models. This nested model selection (NMS) assumes that the observed time-trace of contrast-agent (CA) concentration within a voxel corresponds to a single physiologically nested model. However, admixtures of different models may exist within a voxel's CA time-trace. This study introduces an unsupervised feature engineering technique, the Kohonen self-organizing map (K-SOM), to estimate the voxel-wise probability of each nested model. Methods: Sixty-six immune-compromised RNU rats were implanted with human U-251N cancer cells, and DCE-MRI data were acquired from all the rat brains. The time-trace of the change in longitudinal relaxivity, ΔR1, was calculated for all brain voxels. DCE-MRI pharmacokinetic (PK) analysis was performed using NMS to estimate three model regions: Model 1, normal vasculature without leakage; Model 2, tumor tissue with leakage but without back-flux to the vasculature; and Model 3, tumor vessels with leakage and back-flux. Approximately 230,000 (229,314) normalized ΔR1 profiles of brain voxels, along with their NMS results, were used to build a K-SOM (topology size 8×8, trained with a competitive-learning algorithm) and a probability map for each model. K-fold nested cross-validation (NCV, k=10) was used to evaluate the performance of the K-SOM probabilistic NMS (PNMS) technique against the NMS technique. Results: The K-SOM PNMS estimates for the leaky tumor regions were strongly similar to their respective NMS regions (Dice similarity coefficient, DSC = 0.774 [CI: 0.731-0.823] and 0.866 [CI: 0.828-0.912] for Models 2 and 3, respectively). The mean percent differences (MPDs, NCV, k=10) between the permeability parameters estimated by the two techniques were -28%, +18%, and +24% for v_p, K^trans, and v_e, respectively. The K-SOM PNMS technique produced microvasculature parameters and NMS regions less impacted by the arterial-input-function dispersion effect. Conclusion: This study introduces an unsupervised model-averaging technique (K-SOM) to estimate the contribution of different nested models in PK analysis and provides a faster estimate of permeability parameters.
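A minimal sketch of the probability-map construction, assuming the third-party minisom package and random stand-in data in place of the ΔR1 profiles and NMS labels: train an 8×8 Kohonen map on the voxel time-traces and convert the per-node label counts into per-model probabilities.

```python
import numpy as np
from minisom import MiniSom   # third-party SOM implementation, assumed for this sketch

rng = np.random.default_rng(5)
n_voxels, n_timepoints = 3000, 60
profiles = rng.standard_normal((n_voxels, n_timepoints))   # stand-in normalized time-traces
nms_labels = rng.integers(1, 4, size=n_voxels)             # stand-in nested-model labels (1, 2, 3)

# Train an 8x8 Kohonen self-organizing map on the voxel time-traces
som = MiniSom(8, 8, n_timepoints, sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(profiles, 5000)

# For each SOM node, the fraction of mapped voxels carrying each nested-model label
# gives a per-node probability of Models 1-3
counts = np.zeros((8, 8, 3))
for x, label in zip(profiles, nms_labels):
    i, j = som.winner(x)
    counts[i, j, label - 1] += 1
prob = counts / np.clip(counts.sum(axis=2, keepdims=True), 1, None)

# Probabilistic NMS for a voxel: look up the probabilities of its winning node
i, j = som.winner(profiles[0])
print("P(Model 1, 2, 3) for this voxel:", np.round(prob[i, j], 3))
```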