Results 1 - 20 of 4,369
1.
Clin Chim Acta ; 564: 119928, 2024 Aug 18.
Article in English | MEDLINE | ID: mdl-39163897

ABSTRACT

BACKGROUND AND AIMS: Rheumatoid arthritis (RA) manifests through various symptoms and systemic manifestations. Diagnosis involves serological markers like rheumatoid factor (RF) and anti-citrullinated protein antibodies (ACPA). Past studies have shown the added value of likelihood ratios (LRs) in result interpretation. LRs can be combined with pretest probability to estimate posttest probability for RA. There is a lack of information on pretest probability. This study aimed to estimate pretest probabilities for RA. MATERIALS AND METHODS: This retrospective study included 133 consecutive RA patients and 651 consecutive disease controls presenting at a rheumatology outpatient clinic. Disease characteristics, risk factors associated with RA and laboratory parameters were documented for calculating pretest probabilities and LRs. RESULTS: Joint involvement, erosions, morning stiffness, and positive CRP and ESR tests significantly correlated with RA. Based on these factors, probabilities for RA were estimated. In addition, LRs for RA were established for RF and ACPA and combinations thereof. LRs increased with antibody levels and were highest for double high positivity. Posttest probabilities were estimated based on pretest probability and LR. CONCLUSION: By utilizing pretest probabilities for RA and LRs for RF and ACPA, posttest probabilities were estimated. Such an approach enhances diagnostic accuracy, offering laboratory professionals and clinicians insight into the value of serological testing during the diagnostic process.
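
As a minimal sketch of the arithmetic this abstract describes, the snippet below converts a pretest probability to odds, multiplies by a likelihood ratio, and converts the result back to a probability. The function name and the example numbers are illustrative assumptions, not values reported in the study.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Combine a pretest probability with a likelihood ratio (odds form of Bayes' rule)."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    # Probability -> odds
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    # Posttest odds = pretest odds x likelihood ratio
    posttest_odds = pretest_odds * likelihood_ratio
    # Odds -> probability
    return posttest_odds / (1.0 + posttest_odds)


# Hypothetical inputs: a 20% pretest probability and a test result with LR = 4.
example = posttest_probability(0.20, 4.0)
```

A likelihood ratio of 1 leaves the probability unchanged, which is a quick sanity check on any implementation of this conversion.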

2.
Biometrics ; 80(3), 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-39166461

ABSTRACT

In real-world applications involving multi-class ordinal discrimination, a common approach is to aggregate multiple predictive variables into a linear combination, aiming to develop a classifier with high prediction accuracy. Assessment of such multi-class classifiers often utilizes the hypervolume under ROC manifolds (HUM). When dealing with a substantial pool of potential predictors and achieving optimal HUM, it becomes imperative to conduct appropriate statistical inference. However, prevalent methodologies in existing literature are computationally expensive. We propose to use the jackknife empirical likelihood method to address this issue. Wilks' theorem is established under moderate conditions, and a power analysis under the Pitman alternative is provided. We also introduce a novel network-based rapid computation algorithm specifically designed for computing a general multi-sample U-statistic in our test procedure. To compare our approach against existing approaches, we conduct extensive simulations. Results demonstrate the superior performance of our method in terms of test size, power, and implementation time. Furthermore, we apply our method to analyze a real medical dataset and obtain some new findings.
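
To make the HUM concrete: for ordered classes it is commonly estimated as the fraction of cross-class tuples whose scores appear in the correct order, which is a multi-sample U-statistic. The sketch below is a brute-force illustration under that standard definition; it is not the network-based algorithm proposed in the paper, and the function name and toy scores are assumptions.

```python
from itertools import product


def empirical_hum(*class_scores):
    """Fraction of tuples (one score per class) that are strictly increasing.

    class_scores are given in the assumed ordinal order of the classes.
    Brute force: cost grows with the product of the class sizes.
    """
    total = 0
    ordered = 0
    for combo in product(*class_scores):
        total += 1
        # A tuple counts when every score is below the score of the next class.
        if all(a < b for a, b in zip(combo, combo[1:])):
            ordered += 1
    return ordered / total


# Toy scores for three ordered classes (illustrative only).
perfect = empirical_hum([1, 2], [3, 4], [5, 6])
partial = empirical_hum([1, 4], [2, 3], [5])
```

The product over class sizes is what makes naive evaluation expensive, which is the computational cost the abstract refers to.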


Subjects
Algorithms, Computer Simulation, Statistical Models, Humans, Likelihood Functions, ROC Curve, Biometry/methods
3.
J Appl Stat ; 51(11): 2214-2231, 2024.
Article in English | MEDLINE | ID: mdl-39157273

ABSTRACT

Nonparametric tests for equality of multivariate distributions are frequently desired in research. It is commonly required that test-procedures based on relatively small samples of vectors accurately control the corresponding Type I Error (TIE) rates. Often, in the multivariate testing, extensions of null-distribution-free univariate methods, e.g., Kolmogorov-Smirnov and Cramér-von Mises type schemes, are not exact, since their null distributions depend on underlying data distributions. The present paper extends the density-based empirical likelihood technique in order to nonparametrically approximate the most powerful test for the multivariate two-sample (MTS) problem, yielding an exact finite-sample test statistic. We rigorously apply one-to-one-mapping between the equality of vectors' distributions and the equality of distributions of relevant univariate linear projections. We establish a general algorithm that simplifies the use of projection pursuit, employing only a few of the infinitely many linear combinations of observed vectors' components. The displayed distribution-free strategy is employed in retrospective and group sequential manners. A novel MTS nonparametric procedure in the group sequential manner is proposed. The asymptotic consistency of the proposed technique is shown. Monte Carlo studies demonstrate that the proposed procedures exhibit extremely high and stable power characteristics across a variety of settings. Supplementary materials for this article are available online.

4.
J Appl Stat ; 51(11): 2197-2213, 2024.
Article in English | MEDLINE | ID: mdl-39157269

ABSTRACT

In this paper, we study robust estimation and empirical likelihood for the regression parameter in generalized linear models with right-censored data. A robust estimating equation is proposed to estimate the regression parameter, and the resulting estimator is consistent and asymptotically normal. A bias-corrected empirical log-likelihood ratio statistic of the regression parameter is constructed, and it is shown that the statistic converges weakly to a standard χ² distribution. The result can be directly used to construct a confidence region for the regression parameter. We use the bias correction method to directly calibrate the empirical log-likelihood ratio, which does not need to be multiplied by an adjustment factor. We also propose a method for selecting the tuning parameters in the loss function. Simulation studies show that the estimator of the regression parameter is robust and that the bias-corrected empirical likelihood performs better than the normal approximation method. An example of a real dataset from Alzheimer's disease studies shows that the proposed method can be applied to practical problems.

5.
Heliyon ; 10(15): e35040, 2024 Aug 15.
Article in English | MEDLINE | ID: mdl-39157407

ABSTRACT

In this paper, we explore the coefficient signs in weighted logistic regression, a variation of logistic regression that includes positive weights and is commonly used for handling uneven data sets and reject inference in credit scoring. Initially, we examine simple weighted logistic regression. Assuming full rank and overlap, we demonstrate that the slope's sign matches the sign of the difference in weighted averages of the independent variable across two groups, 1 and 0. We extend this analysis to multiple weighted logistic regression by employing two vectors: one representing the slopes and the other the differences in weighted averages of the independent variables across the groups. We establish that if one vector is zero, the other must also be zero. Additionally, we prove that if the slope vector isn't zero, the angle between these vectors will be acute. Our theoretical results can serve as a preliminary step prior to feature selection, which is important in logistic regression. Our numerical analysis further illustrates how our theoretical results can be applied to the well-known German Credit Data for reject inference. Additionally, we provide a detailed explanation of feature selection in our analysis.
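
The sign result for simple weighted logistic regression can be checked numerically. The sketch below fits an intercept and slope by Newton's method on a small made-up data set and compares the slope's sign with the sign of the difference in weighted means between groups 1 and 0. The data, weights, function names, and iteration count are all illustrative assumptions; this is not the authors' code.

```python
import math


def fit_weighted_logistic(x, y, w, iterations=50):
    """Newton's method for a weighted logistic regression with intercept b0 and slope b1."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = wi * (yi - p)            # weighted residual (gradient terms)
            g0 += r
            g1 += r * xi
            v = wi * p * (1.0 - p)       # weighted variance (information terms)
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (information matrix) * step = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def weighted_mean(values, weights):
    return sum(v * wt for v, wt in zip(values, weights)) / sum(weights)


# Toy data with overlap between the groups (not linearly separable).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0, 0, 1, 0, 1, 1]
w = [1.0, 2.0, 1.0, 1.0, 2.0, 1.0]

intercept, slope = fit_weighted_logistic(x, y, w)
mean_1 = weighted_mean([xi for xi, yi in zip(x, y) if yi == 1],
                       [wi for wi, yi in zip(w, y) if yi == 1])
mean_0 = weighted_mean([xi for xi, yi in zip(x, y) if yi == 0],
                       [wi for wi, yi in zip(w, y) if yi == 0])
difference = mean_1 - mean_0
```

In this toy example the weighted means are 3.75 for group 1 and 1.25 for group 0, so the abstract's result predicts a positive slope.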

6.
J Exp Child Psychol ; 247: 106037, 2024 Aug 12.
Article in English | MEDLINE | ID: mdl-39137505

ABSTRACT

Since the COVID-19 pandemic, both the public and researchers have raised questions regarding the potential impact of protective face-mask wearing on infants' development. Nevertheless, limited research has tested infants' response to protective face-mask wearing adults in real-life interactions and in neurodiverse populations. In addition, scarce attention was given to changes in interactive behavior of adults wearing a protective face-mask. The aims of the current study were (1) to examine differences in 12-month-old infants' behavioral response to an interactive parent wearing a protective face-mask during face-to-face interaction, (2) to investigate potential differences in infants at higher likelihood for autism (HL-ASD) as compared with general population (GP) counterparts, and (3) to explore significant differences in parents' behaviors while wearing or not wearing a protective face-mask. A total of 50 mother-infant dyads, consisting of 20 HL-ASD infants (siblings of individuals with autism) and 30 GP infants, participated in a 6-min face-to-face interaction. The interaction was videotaped through teleconferencing and comprised three 2-min episodes: (a) no mask, (b) mask, and (c) post-mask. Infants' emotionality and gaze direction, as well as mothers' vocal production and touching behaviors, were coded micro-analytically. Globally, GP infants exhibited more positive emotionality compared with their HL-ASD counterparts. Infants' negative emotionality and gaze avoidance did not differ statistically across episodes. Both groups of infants displayed a significant increase in looking time toward the caregiver during the mask episode. No statistically significant differences emerged in mothers' behaviors. These findings suggest that the use of protective face-masks might not negatively affect core dimensions of caregiver-infant interactions in GP and HL-ASD 12-month-old infants.

7.
Protein Sci ; 33(9): e5134, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39145435

ABSTRACT

Function and structure are strongly coupled in obligate oligomers such as triosephosphate isomerase (TIM). In animals and fungi, TIM monomers are inactive and unstable. Previously, we used ancestral sequence reconstruction to study TIM evolution and found that before these lineages diverged, the last opisthokonta common ancestor of TIM (LOCATIM) was an obligate oligomer that resembles those of extant TIMs. Notably, calorimetric evidence indicated that ancestral TIM monomers are more structured than extant ones. To further increase confidence about the function, structure, and stability of the LOCATIM, in this work, we applied two different inference methodologies and the worst plausible case scenario for both of them, to infer four sequences of this ancestor and test the robustness of their physicochemical properties. The extensive biophysical characterization of the four reconstructed sequences of LOCATIM showed very similar hydrodynamic and spectroscopic properties, as well as ligand-binding energetics and catalytic parameters. Their 3D structures were also conserved. Although differences were observed in melting temperature, all LOCATIMs showed reversible urea-induced unfolding transitions, and for those that reached equilibrium, high conformational stability was estimated (ΔGTot = 40.6-46.2 kcal/mol). The stability of the inactive monomeric intermediates was also high (ΔGunf = 12.6-18.4 kcal/mol), resembling some protozoan TIMs rather than the unstable monomer observed in extant opisthokonts. A comparative analysis of the 3D structure of ancestral and extant TIMs shows a correlation between the higher stability of the ancestral monomers and the presence of several hydrogen bonds located in the "bottom" part of the barrel.


Subjects
Triose-Phosphate Isomerase, Triose-Phosphate Isomerase/chemistry, Triose-Phosphate Isomerase/genetics, Triose-Phosphate Isomerase/metabolism, Animals, Molecular Evolution, Protein Multimerization, Molecular Models, Enzyme Stability
8.
Front Neurosci ; 18: 1392002, 2024.
Article in English | MEDLINE | ID: mdl-39099634

ABSTRACT

Background: Acupuncture, as an alternative and complementary therapy recommended by the World Health Organization for stroke treatment, holds potential in ameliorating neurofunctional deficits induced by ischemic stroke (IS). Understanding the immediate and long-term effects of acupuncture and their interrelation would contribute to a better comprehension of the mechanisms underlying acupuncture efficacy. Methods: Activation likelihood estimation (ALE) meta-analysis was used to analyze the brain activation patterns reported in 21 relevant functional neuroimaging studies. Among these studies, 12 focused on the immediate brain activation and 9 on the long-term activation. Single-dataset analyses were employed to identify both immediate and long-term brain activation of acupuncture treatment in IS patients, while contrast and conjunction analyses were utilized to explore distinctions and connections between the two. Results: According to the ALE analysis, immediately after acupuncture treatment, IS patients exhibited an enhanced cluster centered around the right precuneus (PCUN) and a reduced cluster centered on the left middle frontal gyrus (MFG). After long-term acupuncture treatment, IS patients showed an enhanced cluster in the left PCUN, along with two reduced clusters in the right insula (INS) and hippocampus (HIP), respectively. Additionally, in comparison to long-term acupuncture treatment, the right angular gyrus (ANG) demonstrated higher ALE scores immediately after acupuncture, whereas long-term acupuncture resulted in higher scores in the left superior parietal gyrus (SPG). The intersecting cluster activated by both of them was located in the left cuneus (CUN). Conclusion: The findings provide initial insights into both the immediate and long-term brain activation patterns of acupuncture treatment for IS, as well as the intricate interplay between them. Both immediate and long-term acupuncture treatments showed distinct patterns of brain activation, with the left CUN emerging as a crucial regulatory region in their association. Systematic Review Registration: https://www.crd.york.ac.uk/prospero/, CRD42023480834.

9.
Heliyon ; 10(14): e33598, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39100465

ABSTRACT

Recently, a novel improved adaptive Type-II progressive censoring strategy has been suggested in order to obtain adequate data from trials that require a lengthy amount of time. Considering this scheme, this paper focuses on various classical and Bayesian estimation challenges for the parameter and some reliability metrics of the XLindley distribution. Two classical estimation methods are considered to obtain point and interval estimates of the model parameter as well as of the reliability and hazard rate functions. In addition to the usual approaches, the Bayesian methodology is examined using the squared error loss function and the Markov chain Monte Carlo technique. The Bayes point estimates and credible intervals are obtained based on two forms of the posterior distribution. A simulation study is conducted under multiple scenarios to compare the conventional and Bayesian estimates. The simulation results demonstrate that the Bayesian approach using the likelihood function is superior for estimating the model parameter when compared with the other methods. In contrast, when estimating reliability metrics, it is advisable to utilize the Bayesian method with the spacings function. Two real-world data sets are analyzed to put the proposed approaches into practice, and the ideal progressive censoring strategy is chosen using some optimality criteria.

10.
Commun Stat Theory Methods ; 53(17): 6038-6054, 2024.
Article in English | MEDLINE | ID: mdl-39100716

ABSTRACT

Phase IV clinical trials are designed to monitor long-term side effects of medical treatment. For instance, childhood cancer survivors treated with chest radiation and/or anthracycline are often at risk of developing cardiotoxicity during their adulthood. Often the primary focus of a study could be on estimating the cumulative incidence of a particular outcome of interest such as cardiotoxicity. However, it is challenging to evaluate patients continuously, and usually this information is collected through cross-sectional surveys by following patients longitudinally. This leads to interval-censored data since the exact time of the onset of the toxicity is unknown. Rai et al. computed the transition intensity rate using a parametric model and estimated parameters using a maximum likelihood approach in an illness-death model. However, such an approach may not be suitable if the underlying parametric assumptions do not hold. This manuscript proposes a semi-parametric model, with a logit relationship for the treatment intensities in two groups, to estimate the transition intensity rates within the context of an illness-death model. The estimation of the parameters is done using an EM algorithm with profile likelihood. Results from the simulation studies suggest that the proposed approach is easy to implement and yields comparable results to the parametric model.

11.
Am J Hum Genet ; 2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39106865

ABSTRACT

Mendelian randomization (MR) utilizes genome-wide association study (GWAS) summary data to infer causal relationships between exposures and outcomes, offering a valuable tool for identifying disease risk factors. Multivariable MR (MVMR) estimates the direct effects of multiple exposures on an outcome. This study tackles the issue of highly correlated exposures commonly observed in metabolomic data, a situation where existing MVMR methods often face reduced statistical power due to multicollinearity. We propose a robust extension of the MVMR framework that leverages constrained maximum likelihood (cML) and employs a Bayesian approach for identifying independent clusters of exposure signals. Applying our method to the UK Biobank metabolomic data for the largest Alzheimer disease (AD) cohort through a two-sample MR approach, we identified two independent signal clusters for AD: glutamine and lipids, with posterior inclusion probabilities (PIPs) of 95.0% and 81.5%, respectively. Our findings corroborate the hypothesized roles of glutamate and lipids in AD, providing quantitative support for their potential involvement.

12.
BMC Genomics ; 25(1): 764, 2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39107741

ABSTRACT

BACKGROUND: Chemoreception is crucial for insect fitness, underlying for instance food-, host-, and mate finding. Chemicals in the environment are detected by receptors from three divergent gene families: odorant receptors (ORs), gustatory receptors (GRs), and ionotropic receptors (IRs). However, how the chemoreceptor gene families evolve in parallel with ecological specializations remains poorly understood, especially in the order Coleoptera. Hence, we sequenced the genome and annotated the chemoreceptor genes of the specialised ambrosia beetle Trypodendron lineatum (Coleoptera, Curculionidae, Scolytinae) and compared its chemoreceptor gene repertoires with those of other scolytines with different ecological adaptations, as well as a polyphagous cerambycid species. RESULTS: We identified 67 ORs, 38 GRs, and 44 IRs in T. lineatum ('Tlin'). Across gene families, T. lineatum has fewer chemoreceptors compared to related scolytines, the coffee berry borer Hypothenemus hampei and the mountain pine beetle Dendroctonus ponderosae, and clearly fewer receptors than the polyphagous cerambycid Anoplophora glabripennis. The comparatively low number of chemoreceptors is largely explained by the scarcity of large receptor lineage radiations, especially among the bitter taste GRs and the 'divergent' IRs, and the absence of alternatively spliced GR genes. Only one non-fructose sugar receptor was found, suggesting several sugar receptors have been lost. Also, we found no orthologue in the 'GR215 clade', which is widely conserved across Coleoptera. Two TlinORs are orthologous to ORs that are functionally conserved across curculionids, responding to 2-phenylethanol (2-PE) and green leaf volatiles (GLVs), respectively. CONCLUSIONS: Trypodendron lineatum reproduces inside the xylem of decaying conifers where it feeds on its obligate fungal mutualist Phialophoropsis ferruginea. 
Like previous studies, our results suggest that stenophagy correlates with small chemoreceptor numbers in wood-boring beetles; indeed, the few GRs may be due to its restricted fungal diet. The presence of TlinORs orthologous to those detecting 2-PE and GLVs in other species suggests these compounds are important for T. lineatum. Future functional studies should test this prediction, and chemoreceptor annotations should be conducted on additional ambrosia beetle species to investigate whether few chemoreceptors is a general trait in this specialized group of beetles.


Subjects
Odorant Receptors, Animals, Odorant Receptors/genetics, Odorant Receptors/metabolism, Coleoptera/genetics, Phylogeny, Insect Proteins/genetics, Insect Proteins/metabolism
13.
Forensic Sci Int Genet ; 73: 103111, 2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39128429

ABSTRACT

This study evaluates the performance of analysing surface DNA samples using massively parallel sequencing (MPS) compared to traditional capillary electrophoresis (CE). A total of 30 samples were collected from various surfaces in an office environment and were analysed with CE and MPS. These were compared against 60 reference samples (office inhabitants). To identify contributors, likelihood ratios (LRs) were calculated for MPS and CE data using the probabilistic genotyping software MPSproto and EuroForMix respectively. Although a higher number of sequences/peaks were observed per DNA profile in MPS compared to CE, LR values were found to be lower for MPS data formats. This might be the result of the increased complexity of MPS data, along with a possible elevation of unknown alleles and/or artefacts. The study highlights avenues for improving MPS data quality and analysis to facilitate more robust interpretation of challenging casework-like samples.

14.
Biom J ; 66(6): e202300185, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39101657

ABSTRACT

There has been growing research interest in developing methodology to evaluate health care providers' performance with respect to a patient outcome. Random and fixed effects models are traditionally used for such a purpose. We propose a new method, using a fusion penalty to cluster health care providers based on quasi-likelihood. Without any a priori knowledge of grouping information, our method provides a desirable data-driven approach for automatically clustering health care providers into different groups based on their performance. Further, the quasi-likelihood is more flexible and robust than the regular likelihood in that no distributional assumption is needed. An efficient alternating direction method of multipliers algorithm is developed to implement the proposed method. We show that the proposed method enjoys the oracle properties; namely, it performs as well as if the true group structure were known in advance. The consistency and asymptotic normality of the estimators are established. Simulation studies and analysis of the national kidney transplant registry data demonstrate the utility and validity of our method.


Subjects
Biometry, Health Personnel, Cluster Analysis, Likelihood Functions, Humans, Health Personnel/statistics & numerical data, Biometry/methods, Kidney Transplantation, Algorithms
15.
Biometrics ; 80(3), 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-39177025

ABSTRACT

Interval-censored failure time data frequently arise in various scientific studies where each subject experiences periodical examinations for the occurrence of the failure event of interest, and the failure time is only known to lie in a specific time interval. In addition, collected data may include multiple observed variables with a certain degree of correlation, leading to severe multicollinearity issues. This work proposes a factor-augmented transformation model to analyze interval-censored failure time data while reducing model dimensionality and avoiding multicollinearity elicited by multiple correlated covariates. We provide a joint modeling framework by comprising a factor analysis model to group multiple observed variables into a few latent factors and a class of semiparametric transformation models with the augmented factors to examine their and other covariate effects on the failure event. Furthermore, we propose a nonparametric maximum likelihood estimation approach and develop a computationally stable and reliable expectation-maximization algorithm for its implementation. We establish the asymptotic properties of the proposed estimators and conduct simulation studies to assess the empirical performance of the proposed method. An application to the Alzheimer's Disease Neuroimaging Initiative (ADNI) study is provided. An R package ICTransCFA is also available for practitioners. Data used in preparation of this article were obtained from the ADNI database.


Subjects
Alzheimer Disease, Computer Simulation, Statistical Models, Humans, Likelihood Functions, Algorithms, Neuroimaging, Factor Analysis, Statistical Data Interpretation, Time Factors
16.
J Med Toxicol ; 2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39179942

ABSTRACT

Likelihood ratios compare two values (i.e., case rates) in order to illustrate the magnitude of the difference between the two. This ratio increases the confidence one can have in a diagnostic test from a different vantage point than that of sensitivity and specificity. The calculations of likelihood ratios are presented along with a simplified approach. Likelihood ratios are another tool the toxicologist should employ in their understanding of statistics and probability.
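
For readers who want the calculation spelled out, the sketch below uses the standard textbook definitions of the positive and negative likelihood ratios in terms of sensitivity and specificity. The abstract does not give its own formulas, so this is an assumption about what the "calculations" refer to; the function name and example values are illustrative.

```python
def likelihood_ratios(sensitivity: float, specificity: float):
    """Return (LR+, LR-) from a test's sensitivity and specificity.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Hypothetical test with 90% sensitivity and 80% specificity.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
```

With these hypothetical inputs, a positive result is about 4.5 times as likely in a case as in a non-case, and a negative result about one eighth as likely.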

17.
R Soc Open Sci ; 11(8): 240733, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39169970

ABSTRACT

Parameter inference and uncertainty quantification are important steps when relating mathematical models to real-world observations and when estimating uncertainty in model predictions. However, methods for doing this can be computationally expensive, particularly when the number of unknown model parameters is large. The aim of this study is to develop and test an efficient profile likelihood-based method, which takes advantage of the structure of the mathematical model being used. We do this by identifying specific parameters that affect model output in a known way, such as a linear scaling. We illustrate the method by applying it to three toy models from different areas of the life sciences: (i) a predator-prey model from ecology; (ii) a compartment-based epidemic model from health sciences; and (iii) an advection-diffusion reaction model describing the transport of dissolved solutes from environmental science. We show that the new method produces results of comparable accuracy to existing profile likelihood methods but with substantially fewer evaluations of the forward model. We conclude that our method could provide a much more efficient approach to parameter inference for models where a structured approach is feasible. Computer code to apply the new method to user-supplied models and data is provided via a publicly accessible repository.
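
One way to picture the structured idea: if a parameter only rescales the model output linearly, its best value for any fixed setting of the other parameters has a closed form under Gaussian noise, so the numerical search only needs to cover the remaining parameters. The sketch below illustrates this with a hypothetical exponential-decay model y = a * exp(-theta * x); the model, names, and grid are assumptions for illustration and are not taken from the paper's repository.

```python
import math


def unit_output(x, theta):
    """Hypothetical model output before scaling: exponential decay with rate theta."""
    return [math.exp(-theta * xi) for xi in x]


def best_scale(y, g):
    """Closed-form least-squares scale a minimising sum((y - a*g)^2)."""
    return sum(yi * gi for yi, gi in zip(y, g)) / sum(gi * gi for gi in g)


def profile_over_theta(x, y, theta_grid):
    """For each theta, plug in the optimal scale and record the residual sum of squares."""
    results = []
    for theta in theta_grid:
        g = unit_output(x, theta)          # one forward-model evaluation per theta
        a = best_scale(y, g)               # scale parameter needs no extra evaluations
        sse = sum((yi - a * gi) ** 2 for yi, gi in zip(y, g))
        results.append((sse, theta, a))
    return results


# Noise-free synthetic data generated with a = 2.0 and theta = 0.5.
x = [0.5 * i for i in range(11)]
y = [2.0 * v for v in unit_output(x, 0.5)]
theta_grid = [0.1 * k for k in range(1, 11)]

sse_hat, theta_hat, a_hat = min(profile_over_theta(x, y, theta_grid))
```

With Gaussian noise of known variance, minimising the residual sum of squares is equivalent to maximising the likelihood, so the grid over theta traces a profile likelihood with the scale parameter already optimised out.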

18.
Stat Methods Med Res ; : 9622802241259175, 2024 Aug 28.
Article in English | MEDLINE | ID: mdl-39193788

ABSTRACT

The mixture of probabilistic regression models is one of the most common techniques to incorporate the information of covariates into learning of the population heterogeneity. Despite its flexibility, unreliable estimates can occur due to multicollinearity among covariates. In this paper, we develop Liu-type shrinkage methods through an unsupervised learning approach to estimate the model coefficients in the presence of multicollinearity. We evaluate the performance of our proposed methods via classification and stochastic versions of the expectation-maximization algorithm. We show using numerical simulations that the proposed methods outperform their Ridge and maximum likelihood counterparts. Finally, we apply our methods to analyze the bone mineral data of women aged 50 and older.

19.
bioRxiv ; 2024 Aug 09.
Article in English | MEDLINE | ID: mdl-39149245

ABSTRACT

The Augmented Hebbian Reweighting Model (AHRM) has been effectively utilized to model the collective performance of observers in various perceptual learning studies. In this work, we have introduced a novel hierarchical Bayesian Augmented Hebbian Reweighting Model (HB-AHRM) to simultaneously model the learning curves of individual participants and the entire population within a single framework. We have compared its performance to that of a Bayesian Inference Procedure (BIP), which independently estimates the posterior distributions of model parameters for each individual subject without employing a hierarchical structure. To cope with the substantial computational demands, we developed an approach to approximate the likelihood function in the AHRM with feature engineering and linear regression, increasing the speed of the estimation procedure by 20,000 times. The HB-AHRM has enabled us to compute the joint posterior distribution of hyperparameters and parameters at the population, observer, and test levels, facilitating statistical inferences across these levels. While we have developed this methodology within the context of a single experiment, the HB-AHRM and the associated modeling techniques can be readily applied to analyze data from various perceptual learning experiments and provide predictions of human performance at both the population and individual levels. The likelihood approximation concept introduced in this study may have broader utility in fitting other stochastic models lacking analytic forms.

20.
Accid Anal Prev ; 207: 107752, 2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39180851

ABSTRACT

The random parameters Generalized Linear Model (GLM) is frequently used to model speeding characteristics and capture the heterogeneous effects of factors. However, this statistical approach is seldom employed for prediction and generalization due to the challenge of transferring its predefined errors. Recently, the emergence of explainable AI techniques has illuminated a new path for analyzing factors associated with risky driving behaviors. Despite this, a gap remains in comparing results from machine and deep learning (ML/DL) approaches with those from the random parameters GLM. This study aims to apply the random parameters GLM and explainable deep learning to evaluate the heterogeneous effects of factors on taxis' high-range speeding likelihood. Initially, a Beta GLM with random parameters (BGLM-RP) is developed to model the high-range speeding likelihood among taxi drivers. Additionally, XGBoost, a simple convolutional neural network (Simple-CNN), a deeper CNN (DCNN), and a deeper CNN with self-attention (DCNN-SA) are developed. The quantified explanations and illustrations of the factors' heterogeneous effects from ML/DL models are derived from pseudo coefficients by decomposing factors' SHapley Additive exPlanations (SHAP) values. All the developed statistical, ML, and DL models are compared in terms of mean absolute errors and mean square errors on testing and full data. Results show that DCNN-SA excels in prediction on testing data, indicating its superior generalization capabilities, while BGLM-RP outperforms other models on full data. The DCNN-SA can reveal the heterogeneous effects of factors for both in-sample and out-of-sample data, which is not possible for the random parameters GLM. However, BGLM-RP can reveal larger magnitudes of the factors' heterogeneous effects for in-sample data. 
The signs and significances are identical between the varying coefficients from BGLM-RP and the pseudo coefficients from the ML/DL models, demonstrating the validity and rationale of using the proposed explanation framework to quantify the factors' effects in ML/DL models. The study also discusses the contributions of various factors to the high-range speeding likelihood of taxi drivers.
