Results 1 - 20 of 1,072
1.
Stat Methods Med Res ; : 9622802241268601, 2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39105419

ABSTRACT

The case-cohort design is a commonly used cost-effective sampling strategy for large cohort studies, where some covariates are expensive to measure or obtain. In this paper, we consider regression analysis under a case-cohort study with interval-censored failure time data, where the failure time is only known to fall within an interval instead of being exactly observed. A common approach to analyzing data from a case-cohort study is the inverse probability weighting approach, where only subjects in the case-cohort sample are used in estimation, and the subjects are weighted based on the probability of inclusion into the case-cohort sample. This approach, though consistent, is generally inefficient as it does not incorporate information outside the case-cohort sample. To improve efficiency, we first develop a sieve maximum weighted likelihood estimator under the Cox model based on the case-cohort sample and then propose a procedure to update this estimator by using information in the full cohort. We show that the updated estimator is consistent, asymptotically normal, and at least as efficient as the original estimator. The proposed method can flexibly incorporate auxiliary variables to improve estimation efficiency. A weighted bootstrap procedure is employed for variance estimation. Simulation results indicate that the proposed method works well in practical situations. An application to a Phase 3 HIV vaccine efficacy trial is provided for illustration.
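The weighted bootstrap used for variance estimation in entry 1 can be sketched in a few lines. This is a toy illustration, not the paper's sieve likelihood estimator: an inverse-probability-weighted mean under a known inclusion probability, with i.i.d. mean-one exponential multipliers perturbing the weights on each bootstrap draw.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical full cohort: only a subsample has the expensive covariate.
n = 2000
x = rng.normal(loc=1.0, scale=2.0, size=n)
p_include = 0.3                      # known sampling probability
sampled = rng.random(n) < p_include

# Inverse-probability-weighted (Hajek) estimate using only the subsample.
w = sampled / p_include
theta_hat = np.sum(w * x) / np.sum(w)

# Weighted bootstrap: perturb each subject's weight with an i.i.d.
# mean-one exponential multiplier and recompute the estimator.
B = 500
reps = np.empty(B)
for b in range(B):
    xi = rng.exponential(1.0, size=n)
    wb = xi * w
    reps[b] = np.sum(wb * x) / np.sum(wb)

se_hat = reps.std(ddof=1)            # bootstrap standard error
```

The standard error from the multiplier replicates tracks the usual analytic standard error of the subsample mean in this simple setting.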

2.
Front Psychol ; 15: 1400340, 2024.
Article in English | MEDLINE | ID: mdl-39021647

ABSTRACT

Background: Chronic pain's influence on emotional well-being can be significant. It may evoke feelings of despair, frustration, nervousness, and melancholy in individuals, which often manifest as reactions to enduring pain and disruptions in their daily lives. In this study, we seek to perform Bootstrap Exploratory Graph Analysis (EGA) on the Persian Version of the Perth Alexithymia Questionnaire (PAQ) in a cohort of people with chronic pain. Methods: The research concentrated on the population of individuals encountering chronic pain within Tehran province from 2022 to 2023. Ultimately, the analysis comprised information from 234 male participants (with a mean age of 30.59, SD = 6.84) and 307 female participants (with a mean age of 30.16, SD = 6.65). After data collection, statistical analysis was conducted using the EGAnet2.0.4 package in R.4.3.2 software. Results: The outcome of bootstrapped EGA unveiled a two-dimensional configuration of the PAQ comprising Factor 1 denoted as negative difficulty in describing and identifying feelings (N-DDIF) and Factor 2 characterized as general-externally orientated thinking (GEOT), representing robust structural integrity and item consistency (all items have stabilities > 0.70). Conclusion: These findings endorse the validity of the PAQ, as evidenced by its confirmation in a broader sample using a novel methodology consistent with existing literature on two-factor decentering models.

3.
Stat Med ; 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39044448

ABSTRACT

Logistic regression models are widely used in case-control data analysis, and testing the goodness-of-fit of their parametric model assumption is a fundamental research problem. In this article, we propose to enhance the power of the goodness-of-fit test by exploiting a monotonic density ratio model, in which the ratio of case and control densities is assumed to be a monotone function. We show that such a monotonic density ratio model is naturally induced by the retrospective case-control sampling design under the alternative hypothesis. The pool-adjacent-violator algorithm is adapted to solve for the constrained nonparametric maximum likelihood estimator under the alternative hypothesis. By measuring the discrepancy between this estimator and the semiparametric maximum likelihood estimator under the null hypothesis, we develop a new Kolmogorov-Smirnov-type statistic to test the goodness-of-fit for logistic regression models with case-control data. A bootstrap resampling procedure is suggested to approximate the p-value of the proposed test. Simulation results show that the type I error of the proposed test is well controlled and the power improvement is substantial in many cases. Three real data applications are also included for illustration.
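A bootstrap approximation to the p-value of a Kolmogorov-Smirnov-type statistic, as in entry 3, can be sketched as follows. This toy version uses a plain two-sample empirical-CDF distance in place of the paper's constrained-NPMLE discrepancy; resampling both groups from the pooled data mimics the null of a common distribution.

```python
import numpy as np

rng = np.random.default_rng(1)

def ks_stat(a, b):
    # Maximum absolute distance between the two empirical CDFs.
    grid = np.sort(np.concatenate([a, b]))
    Fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    Fb = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.max(np.abs(Fa - Fb))

cases = rng.normal(0.0, 1.0, 150)     # toy "case" sample
controls = rng.normal(0.0, 1.0, 200)  # toy "control" sample
obs = ks_stat(cases, controls)

# Bootstrap p-value: resample both groups from the pooled data.
pooled = np.concatenate([cases, controls])
B = 400
null = np.array([
    ks_stat(rng.choice(pooled, 150, replace=True),
            rng.choice(pooled, 200, replace=True))
    for _ in range(B)
])
p_value = (1 + np.sum(null >= obs)) / (B + 1)
```

The `(1 + #exceedances) / (B + 1)` form keeps the approximated p-value strictly positive.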

4.
BMC Bioinformatics ; 25(1): 236, 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-38997639

ABSTRACT

BACKGROUND: Homologous recombination deficiency (HRD) stands as a clinical indicator for discerning responsive outcomes to platinum-based chemotherapy and poly ADP-ribose polymerase (PARP) inhibitors. One of the conventional approaches to HRD prognostication has generally centered on identifying deleterious mutations within the BRCA1/2 genes, along with quantifying the genomic scars, such as Genomic Instability Score (GIS) estimation with scarHRD. However, the scarHRD method has limitations in scenarios involving tumors bereft of corresponding germline data. Although several RNA-seq-based HRD prediction algorithms have been developed, they mainly support cohort-wise classification, thereby yielding HRD status without furnishing an analogous quantitative metric akin to scarHRD. This study introduces the expHRD method, which operates as a novel transcriptome-based framework tailored to n-of-1-style HRD scoring. RESULTS: The prediction model has been established using the elastic net regression method in the Cancer Genome Atlas (TCGA) pan-cancer training set. The bootstrap technique derived the HRD geneset for applying the expHRD calculation. The expHRD demonstrated a notable correlation with scarHRD and superior performance in predicting HRD-high samples. We also performed intra- and extra-cohort evaluations for clinical feasibility in the TCGA-OV and the Genomic Data Commons (GDC) ovarian cancer cohort, respectively. The innovative web service designed for ease of use is poised to extend the realms of HRD prediction across diverse malignancies, with ovarian cancer standing as an emblematic example. CONCLUSIONS: Our novel approach leverages the transcriptome data, enabling the prediction of HRD status with remarkable precision. This innovative method addresses the challenges associated with limited available data, opening new avenues for utilizing transcriptomics to inform clinical decisions.


Subject(s)
Homologous Recombination , Neoplasms , Transcriptome , Humans , Transcriptome/genetics , Homologous Recombination/genetics , Neoplasms/genetics , Algorithms , Female , Gene Expression Profiling/methods
5.
Viruses ; 16(7)2024 Jul 11.
Article in English | MEDLINE | ID: mdl-39066276

ABSTRACT

Swine acute diarrhoea syndrome coronavirus (SADS-CoV; Coronaviridae, Rhinacovirus) was detected in 2017 in Guangdong Province (China), where it caused high mortality rates in piglets. According to previous studies, SADS-CoV evolved from horseshoe bat reservoirs. Here, we report the first five Rhinacovirus genomes sequenced in horseshoe bats from Vietnam and their comparisons with data published in China. Our phylogenetic analyses provided evidence for four groups: rhinacoviruses from Rhinolophus pusillus bats, including one from Vietnam; bat rhinacoviruses from Hainan; bat rhinacoviruses from Yunnan showing a divergent synonymous nucleotide composition; and SADS-CoV and related bat viruses, including four rhinacoviruses from Vietnam sampled in Rhinolophus affinis and Rhinolophus thomasi. Our phylogeographic analyses showed that bat rhinacoviruses from Dien Bien (Vietnam) share more affinities with those from Yunnan (China) and that the ancestor of SADS-CoVs arose in Rhinolophus affinis circulating in Guangdong. We detected sequencing errors and artificial chimeric genomes in published data. The two SADS-CoV genomes previously identified as recombinant could also be problematic. The reliable data currently available, therefore, suggests that all SADS-CoV strains originate from a single bat source and that the virus has been spreading in pig farms in several provinces of China for at least seven years since the first outbreak in August 2016.


Subject(s)
Alphacoronavirus , Chiroptera , Coronavirus Infections , Genome, Viral , Phylogeny , Swine Diseases , Animals , Chiroptera/virology , Vietnam/epidemiology , China/epidemiology , Swine , Swine Diseases/virology , Swine Diseases/epidemiology , Alphacoronavirus/genetics , Alphacoronavirus/classification , Alphacoronavirus/isolation & purification , Coronavirus Infections/veterinary , Coronavirus Infections/virology , Coronavirus Infections/epidemiology , Evolution, Molecular , Phylogeography
6.
Stat Med ; 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38951867

ABSTRACT

For survival analysis applications, we propose a novel procedure for identifying subgroups with large treatment effects, with a focus on subgroups where treatment is potentially detrimental. The approach, termed forest search, is relatively simple and flexible. All possible subgroups are screened and selected based on hazard ratio thresholds indicative of harm, with assessment according to the standard Cox model. By reversing the role of treatment, one can seek to identify substantial benefit. We apply a splitting consistency criterion to identify a subgroup considered "maximally consistent with harm." The type-1 error and power for subgroup identification can be quickly approximated by numerical integration. To aid inference, we describe a bootstrap bias-corrected Cox model estimator with variance estimated by a jackknife approximation. We provide a detailed evaluation of operating characteristics in simulations and compare to virtual twins and generalized random forests, where we find the proposal to have favorable performance. In particular, in our simulation setting, we find the proposed approach favorably controls the type-1 error for falsely identifying heterogeneity, with higher power and classification accuracy for substantial heterogeneous effects. Two real data applications are provided for publicly available datasets from clinical trials in oncology and HIV.
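The bootstrap bias correction and jackknife variance approximation mentioned in entry 6 can be illustrated on a deliberately simple estimator, the plug-in variance, which has a known downward bias, rather than a Cox model:

```python
import numpy as np

rng = np.random.default_rng(2)

def estimator(x):
    # Toy plug-in estimator with a known downward small-sample bias.
    return np.mean((x - np.mean(x)) ** 2)

x = rng.normal(0.0, 1.0, 40)
theta = estimator(x)

# Bootstrap bias correction: theta_bc = 2 * theta_hat - mean(bootstrap reps).
B = 1000
boot = np.array([estimator(rng.choice(x, x.size, replace=True))
                 for _ in range(B)])
theta_bc = 2 * theta - boot.mean()

# Jackknife variance approximation via leave-one-out replicates.
n = x.size
loo = np.array([estimator(np.delete(x, i)) for i in range(n)])
var_jack = (n - 1) / n * np.sum((loo - loo.mean()) ** 2)
```

Since the bootstrap replicates are biased in the same direction as the estimator itself, the correction pushes `theta_bc` back toward the true value of 1.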

7.
J Eval Clin Pract ; 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38959383

ABSTRACT

OBJECTIVES: Among the provisions within the Affordable Care Act (ACA), expanding Medicaid was arguably the greatest contributor to increasing access to care. For over a decade, researchers have investigated how Medicaid expansion impacted cancer outcomes. Over this same decade, statistical theory illuminated how state-based policy research could be compromised by invalid inference. After reviewing the literature to identify the inference strategies of state-based cancer registry Medicaid expansion research, this study aimed to assess how inference decisions could change the interpretation of Medicaid expansion's impact on staging, treatment, and mortality in cancer patients. DATA SOURCES: Cancer case data (2000-2019) was obtained from the Surveillance, Epidemiology, End Results (SEER) programme. Cases included all cancer sites combined, top 10 cancer sites combined, and three screening amenable cancers (colorectal, female breast, female cervical). STUDY DESIGN: A Difference-in-Differences design estimated the association between Medicaid expansion and four binary outcomes: distant stage, initiating treatment >1 month after diagnosis, no surgery recommendation, and death. Three inference techniques were compared: (1) traditional, (2) cluster, and (3) Wild Cluster Bootstrap. DATA COLLECTION: Data was accessed via SEER*Stat. PRINCIPAL FINDINGS: Estimating standard errors via traditional inference would suggest that Medicaid expansion was associated with delayed treatment initiation and surgery recommendations. Traditional and clustered inference also suggested that Medicaid expansion reduced mortality. Inference using Wild Cluster Bootstrap techniques never rejected the null hypotheses. CONCLUSIONS: This study reiterates the importance of explicit inference. Future state-based, cancer policy research can be improved by incorporating emerging techniques. These findings warrant caution when interpreting prior SEER research reporting significant effects of Medicaid expansion on cancer outcomes, especially studies that did not explicitly define their inference strategy.
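The wild cluster bootstrap compared in entry 7 can be sketched with a simplified, coefficient-based version: toy state-level data, Rademacher cluster sign flips, and the null imposed on the restricted fit. A full implementation would bootstrap cluster-robust t-statistics rather than raw coefficients.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy state-level panel: outcome, cluster-level "policy" regressor, cluster ids.
n_clusters, per = 20, 30
g = np.repeat(np.arange(n_clusters), per)
treated = (g < 10).astype(float)                 # policy varies at cluster level
y = 0.0 * treated + rng.normal(size=g.size) + rng.normal(size=n_clusters)[g]

X = np.column_stack([np.ones_like(treated), treated])
beta = np.linalg.lstsq(X, y, rcond=None)[0]      # unrestricted OLS fit

# Wild cluster bootstrap with the null imposed: fit without the policy
# term, then flip each cluster's residual block with a Rademacher sign.
beta0 = y.mean()                                  # restricted (intercept-only) fit
resid = y - beta0
B = 399
stat_null = np.empty(B)
for b in range(B):
    signs = rng.choice([-1.0, 1.0], size=n_clusters)
    yb = beta0 + signs[g] * resid
    stat_null[b] = np.linalg.lstsq(X, yb, rcond=None)[0][1]
p_value = (1 + np.sum(np.abs(stat_null) >= abs(beta[1]))) / (B + 1)
```

Because the cluster-level shock is shared within states, naive i.i.d. resampling here would understate the standard error; flipping whole clusters preserves the within-cluster dependence.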

8.
Biom J ; 66(5): e202300197, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38953619

ABSTRACT

In biomedical research, the simultaneous inference of multiple binary endpoints may be of interest. In such cases, an appropriate multiplicity adjustment is required that controls the family-wise error rate, which represents the probability of making incorrect test decisions. In this paper, we investigate two approaches that perform single-step p-value adjustments that also take into account the possible correlation between endpoints. A rather novel and flexible approach known as multiple marginal models is considered, which is based on stacking of the parameter estimates of the marginal models and deriving their joint asymptotic distribution. We also investigate a nonparametric vector-based resampling approach, and we compare both approaches with the Bonferroni method by examining the family-wise error rate and power for different parameter settings, including low proportions and small sample sizes. The results show that the resampling-based approach consistently outperforms the other methods in terms of power, while still controlling the family-wise error rate. The multiple marginal models approach, on the other hand, shows a more conservative behavior. However, it offers more versatility in application, allowing for more complex models or straightforward computation of simultaneous confidence intervals. The practical application of the methods is demonstrated using a toxicological dataset from the National Toxicology Program.
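The correlation-respecting, single-step resampling adjustment investigated in entry 8 can be sketched as a maxT-style bootstrap: a toy trial with three correlated binary endpoints, where centering each endpoint within arms imposes the joint null while preserving the correlation between endpoints.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy trial: 3 correlated binary endpoints, two arms of n subjects each.
n = 80
latent = rng.normal(size=(2 * n, 3)) + 0.7 * rng.normal(size=(2 * n, 1))
yobs = (latent > 0).astype(float)
arm = np.repeat([0, 1], n)

def t_stats(y, a):
    # Per-endpoint two-sample t-statistics.
    diff = y[a == 1].mean(axis=0) - y[a == 0].mean(axis=0)
    se = np.sqrt(y[a == 1].var(axis=0, ddof=1) / n
                 + y[a == 0].var(axis=0, ddof=1) / n)
    return diff / se

obs = t_stats(yobs, arm)

# Center each endpoint within arms so the joint null holds, then resample
# subjects within arms and track the maximum absolute t-statistic.
centered = yobs - np.where(arm[:, None] == 1,
                           yobs[arm == 1].mean(0), yobs[arm == 0].mean(0))
B = 500
maxt = np.empty(B)
for b in range(B):
    idx = np.concatenate([rng.choice(np.where(arm == 0)[0], n),
                          rng.choice(np.where(arm == 1)[0], n)])
    maxt[b] = np.max(np.abs(t_stats(centered[idx], arm)))

# Single-step adjusted p-values: compare each |t_j| to the maxT null.
p_adj = (1 + (maxt[None, :] >= np.abs(obs)[:, None]).sum(axis=1)) / (B + 1)
```

Because the null distribution of the maximum statistic absorbs the dependence between endpoints, this adjustment is less conservative than Bonferroni when the endpoints are correlated.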


Subject(s)
Biomedical Research , Biometry , Models, Statistical , Biometry/methods , Biomedical Research/methods , Sample Size , Endpoint Determination , Humans
9.
BMC Med Res Methodol ; 24(1): 148, 2024 Jul 13.
Article in English | MEDLINE | ID: mdl-39003462

ABSTRACT

We propose a compartmental model for investigating smoking dynamics in an Italian region (Tuscany). Calibrating the model on local data from 1993 to 2019, we estimate the probabilities of starting and quitting smoking and the probability of smoking relapse. Then, we forecast the evolution of smoking prevalence until 2043 and assess the impact on mortality in terms of attributable deaths. We introduce elements of novelty with respect to previous studies in this field, including a formal definition of the equations governing the model dynamics and a flexible modelling of smoking probabilities based on cubic regression splines. We estimate model parameters by defining a two-step procedure and quantify the sampling variability via a parametric bootstrap. We propose the implementation of cross-validation on a rolling basis and variance-based Global Sensitivity Analysis to check the robustness of the results and support our findings. Our results suggest a decrease in smoking prevalence among males and stability among females, over the next two decades. We estimate that, in 2023, 18% of deaths among males and 8% among females are due to smoking. We test the use of the model in assessing the impact on smoking prevalence and mortality of different tobacco control policies, including the tobacco-free generation ban recently introduced in New Zealand.
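The parametric bootstrap used in entry 9 to quantify sampling variability can be illustrated in miniature: estimate a parameter, simulate new data sets from the fitted model, and re-estimate on each draw. A binomial prevalence stands in for the compartmental model, and the counts are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy survey: estimate a smoking prevalence from counts.
n_surveyed, n_smokers = 1200, 276
p_hat = n_smokers / n_surveyed          # 0.23

# Parametric bootstrap: simulate counts from the fitted binomial model
# and refit on each simulated data set.
B = 2000
p_star = rng.binomial(n_surveyed, p_hat, size=B) / n_surveyed
se_boot = p_star.std(ddof=1)
ci = np.quantile(p_star, [0.025, 0.975])
```

The bootstrap standard error agrees closely with the analytic value sqrt(p(1-p)/n) ≈ 0.012 in this simple case; the value of the approach is that it applies unchanged to models with no closed-form variance.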


Subject(s)
Forecasting , Smoking Cessation , Smoking , Humans , Italy/epidemiology , Female , Male , Smoking/epidemiology , Prevalence , Forecasting/methods , Smoking Cessation/statistics & numerical data , Adult , Middle Aged , Models, Statistical
10.
Clin Epidemiol ; 16: 461-473, 2024.
Article in English | MEDLINE | ID: mdl-39049900

ABSTRACT

Purpose: Childhood cancer survivors experience interconnected symptoms, patterns of which can be elucidated by network analysis. However, current symptom networks are constructed based on the average survivor without considering individual heterogeneities. We propose to evaluate personal symptom network estimation using the Ising model with covariates through simulations and estimate personal symptom networks for adult childhood cancer survivors. Patients and Methods: We adopted the Ising model with covariates to construct networks by employing logistic regressions for estimating associations between binary symptoms. Simulation experiments assessed the robustness of this method in constructing personal symptom networks. Real-world data illustration included 1708 adult childhood cancer survivors from the St. Jude Lifetime Cohort Study (SJLIFE), a retrospective cohort study with prospective follow-up to characterize the etiology and late effects for childhood cancer survivors. Patients' baseline symptoms in 10 domains (cardiac, pulmonary, sensation, nausea, movement, pain, memory, fatigue, anxiety, depression) and individual characteristics (age, sex, race/ethnicity, attained education, personal income, and marital status) were self-reported via survey. Treatment variables (any chemo or radiation therapy) were obtained from medical records. A personal symptom network of the 10 domains was estimated using the Ising model, incorporating individual characteristics and treatment data. Results: Simulations confirmed the robustness of the Ising model with covariates in constructing personal symptom networks. Real-world data analysis identified age, sex, race/ethnicity, education, marital status, and treatment (any chemo and radiation therapy) as major factors influencing symptom co-occurrence. Older childhood cancer survivors showed stronger cardiac-fatigue associations. Survivors of racial/ethnic minorities had stronger pain-fatigue associations. Female survivors with above-college education demonstrated stronger pain-anxiety associations. Unmarried survivors who received radiation had a stronger association between movement and memory problems. Conclusion: The Ising model with covariates accurately estimates personal symptom networks. Individual heterogeneities exist in symptom co-occurrence patterns for childhood cancer survivors. The estimated personal symptom network offers insights into interconnected symptom experiences.

11.
Inquiry ; 61: 469580241266373, 2024.
Article in English | MEDLINE | ID: mdl-39066676

ABSTRACT

Improving the productivity of healthcare delivery and optimizing the allocation of regional healthcare resources are crucial for health providers. The objective of this study is to evaluate the productivity dynamics of healthcare delivery at the regional (provincial) level in China, to provide evidence-based policy implications. After a review of the literature, the actual number of open beds, number of occupational or assistant doctors, number of registered nurses, and number of other staff were selected as input variables. The number of diagnostic visits and number of discharged inpatients were adopted as the output indicators. The panel data of 31 provinces in mainland China from 2010 to 2019 were extracted from the Health Statistics Yearbook. A Bootstrap-Malmquist Data Envelopment Analysis (DEA) model was used to measure total factor productivity changes (TFPC) and their components. During the study period, the analysis of total factor productivity (TFP) in China revealed a declining trend with an average annual decline of 0.9% (ranging from 0.860 to 1.204). For each of the 31 provinces, the annual TFP scores varied from 0.971 to 1.029. On average, technical efficiency changes (TEC) showed a downward trend from 2010-2011 (0.980) to 2013-2014 (0.982), and then an upward trend in 2014-2015 (1.029) and the three consecutive years since 2016-2017 (1.000, 1.013, 1.009). Similarly, the trend in technological changes (TC) was consistent with the TEC from 2010 to 2019, fluctuating between 0.969 and 1.011 on average per year at the provincial level. Notably, the point of inflection appeared at 2013-2014. Regional healthcare inputs and outputs in mainland China saw an upward trend from 2010 to 2019. However, TFPC, TEC, and TC decreased across all 31 provinces. TFP experienced a declining trend from 2010 to 2014, followed by growth until 2019. This may be related to the new healthcare reform implemented since 2009, as service efficiency and capacity may undergo a reversal at the beginning of the reform.


Subject(s)
Delivery of Health Care , Efficiency, Organizational , China , Humans
12.
Risk Manag Healthc Policy ; 17: 1669-1685, 2024.
Article in English | MEDLINE | ID: mdl-38919406

ABSTRACT

Purpose: The aim of this study was to investigate the risk factors of postmenopausal special uterine leiomyoma pathological types or leiomyosarcoma and to develop a nomogram for clinical risk assessment, ultimately to reduce unnecessary surgical interventions and corresponding economic expenses. Methods: A total of 707 patients with complete information were enrolled from 1 August 2012 to 1 August 2022. Univariate and multivariate logistic regression models were used to analyse the association between variables and special uterine leiomyoma pathological types or leiomyosarcoma in postmenopausal patients. A nomogram for special uterine leiomyoma pathological types or leiomyosarcoma in postmenopausal patients was developed and validated by bootstrap resampling. The calibration curve was used to assess the accuracy of the model, and the receiver operating characteristic (ROC) curve and decision curve analysis (DCA) were compared with the clinical experience model. Results: The increasing trend after menopause, the diameter of the largest uterine fibroid, serum carcinoembryonic antigen 125 concentration, serum neutrophil to lymphocyte ratio, and serum phosphorus ion concentration were independent risk factors for special uterine leiomyoma pathological types or leiomyosarcoma in postmenopausal patients. We developed a user-friendly nomogram, which showed good diagnostic performance (AUC = 0.724). The model was consistent, and the calibration curve of our cohort was close to the ideal diagonal line. DCA indicated that the model has potential value for clinical application. Furthermore, our model was superior to the previous clinical experience model in terms of ROC and DCA. Conclusion: We have developed a prediction nomogram for special uterine leiomyoma pathological types or leiomyosarcoma in postmenopausal patients. This nomogram could serve as an important warning signal and evaluation method for special uterine leiomyoma pathological types or leiomyosarcoma in postmenopausal patients.
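Validating a prediction model by bootstrap resampling, as in entry 12, is commonly done with an optimism correction: refit the model on each bootstrap sample and average the gap between its in-sample and original-sample performance. A sketch on toy data follows, with a bare-bones logistic fit and a rank-based AUC standing in for the study's actual nomogram:

```python
import numpy as np

rng = np.random.default_rng(6)

def auc(score, y):
    # Rank-based (Mann-Whitney) AUC for binary outcomes.
    r = score.argsort().argsort() + 1
    n1 = y.sum()
    n0 = y.size - n1
    return (r[y == 1].sum() - n1 * (n1 + 1) / 2) / (n0 * n1)

def fit_logit(X, y, iters=200, lr=0.1):
    # Bare-bones logistic regression via gradient ascent.
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / y.size
    return w

# Toy cohort: intercept plus two predictors, one truly predictive.
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = (rng.random(n) < 1 / (1 + np.exp(-0.8 * X[:, 1]))).astype(float)

w = fit_logit(X, y)
apparent = auc(X @ w, y)

# Bootstrap optimism: (AUC of bootstrap model on its own sample)
# minus (AUC of the same model on the original data), averaged over B.
B = 100
opt = 0.0
for _ in range(B):
    idx = rng.integers(0, n, n)
    wb = fit_logit(X[idx], y[idx])
    opt += auc(X[idx] @ wb, y[idx]) - auc(X @ wb, y)
auc_corrected = apparent - opt / B
```

The corrected AUC estimates out-of-sample discrimination without holding data out, which is why it is a standard internal-validation companion to a nomogram.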

13.
Sensors (Basel) ; 24(11)2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38894454

ABSTRACT

The high-speed railway subgrade compaction quality is controlled by the compaction degree (K), with the maximum dry density (ρdmax) serving as a crucial indicator for its calculation. The current mechanisms and methods for determining the ρdmax still suffer from uncertainties, inefficiencies, and a lack of intelligence. These deficiencies can lead to insufficient assessments of high-speed railway subgrade compaction quality, further impacting the operational safety of high-speed railways. In this paper, a novel method for full-section assessment of high-speed railway subgrade compaction quality based on ML-interval prediction theory is proposed. Firstly, based on indoor vibration compaction tests, a method for determining the ρdmax based on the dynamic stiffness Krb turning point is proposed. Secondly, the PSO-OptimalML-AdaBoost (POA) model for predicting ρdmax is determined based on three typical machine learning (ML) algorithms: backpropagation neural network (BPNN), support vector regression (SVR), and random forest (RF). Thirdly, interval prediction theory is introduced to quantify the uncertainty in ρdmax prediction. Finally, based on the Bootstrap-POA-ANN interval prediction model and spatial interpolation algorithms, the interval distribution of ρdmax across the full section can be determined, and a model for full-section assessment of compaction quality is developed based on the compaction standard (95%). Moreover, the proposed method is applied to determine the optimal compaction thickness (H0) within the station subgrade test section in the southwest region. The results indicate that: (1) The PSO-BPNN-AdaBoost model performs best on the accuracy and error metrics and is selected as the POA model for predicting ρdmax. (2) The Bootstrap-POA-ANN interval prediction model for ρdmax can construct clear and reliable prediction intervals. (3) The model for full-section assessment of compaction quality can provide the full-section distribution interval for K. Compared with H0 of 50~60 cm and 60~70 cm, the compaction quality is better with H0 of 40~50 cm. The research findings can provide effective techniques for assessing the compaction quality of high-speed railway subgrades.

14.
Econom Rev ; 43(6): 345-378, 2024.
Article in English | MEDLINE | ID: mdl-38894875

ABSTRACT

This article proposes a powerful alternative to the t-test of the null hypothesis that a coefficient in a linear regression is equal to zero when a regressor is mismeasured. We assume there are two contaminated measurements of the regressor of interest. We allow the two measurement errors to be nonclassical in the sense that they may both be correlated with the true regressor, they may be correlated with each other, and we do not require any location normalizations on the measurement errors. We propose a new maximal t-statistic that is formed from the regression of the outcome onto a maximally weighted linear combination of the two measurements. The critical values of the test are easily computed via a multiplier bootstrap. In simulations, we show that this new test can be significantly more powerful than t-statistics based on OLS or IV estimates. Finally, we apply the proposed test to a study of returns to education based on twin data from the UK. With our maximal t-test, we can discover statistically significant returns to education when standard t-tests do not.

15.
J Appl Stat ; 51(7): 1227-1250, 2024.
Article in English | MEDLINE | ID: mdl-38835822

ABSTRACT

The main concern of this paper is providing a flexible discrete model that captures every kind of dispersion (equi-, over- and under-dispersion). Based on the balanced discretization method, a new discrete version of the Burr-Hatke distribution is introduced with the partial moment-preserving property. Some statistical properties of the new distribution are introduced, and the applicability of the proposed model is evaluated by considering counting series. A new integer-valued autoregressive (INAR) process based on mixing the Pegram and binomial thinning operators with discrete Burr-Hatke innovations is introduced, which can model contagious data properly. Different approaches for estimating the parameters of the new process are provided and compared through a Monte Carlo simulation scheme. The performance of the proposed process is evaluated on four data sets of daily COVID-19 death counts in Austria, Switzerland, Nigeria and Slovenia in comparison with some competitor INAR(1) models, along with a Pearson residual analysis of the fitted model. The goodness-of-fit measures affirm the adequacy of the proposed process in modeling all COVID-19 data sets. Fundamental prediction procedures are considered for the new process via classic, modified sieve bootstrap and Bayesian forecasting methods for all COVID-19 data sets, and it is concluded that the Bayesian forecasting approach provides more reliable results.
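The binomial thinning operator at the heart of the INAR(1) process in entry 15 can be simulated directly. In this sketch, Poisson innovations stand in for the paper's discrete Burr-Hatke innovations; each count "survives" to the next period independently with probability alpha.

```python
import numpy as np

rng = np.random.default_rng(11)

# INAR(1) via binomial thinning: X_t = alpha ∘ X_{t-1} + eps_t,
# where alpha ∘ X ~ Binomial(X, alpha) and eps_t are count innovations.
alpha, lam, T = 0.6, 2.0, 5000
x = np.empty(T, dtype=int)
x[0] = rng.poisson(lam / (1 - alpha))     # start near the stationary mean
for t in range(1, T):
    survivors = rng.binomial(x[t - 1], alpha)   # thinning of the past count
    x[t] = survivors + rng.poisson(lam)         # toy Poisson innovations

# Moment check: the stationary mean is lam / (1 - alpha) = 5.
mean_hat = x.mean()
```

Thinning keeps the process integer-valued, which is exactly why INAR models are preferred over Gaussian AR(1) models for count series such as daily death counts.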

16.
Sensors (Basel) ; 24(12)2024 Jun 18.
Article in English | MEDLINE | ID: mdl-38931731

ABSTRACT

Remote sensing products are typically assessed using a single accuracy estimate for the entire map, despite significant variations in accuracy across different map areas or classes. Estimating per-pixel uncertainty is a major challenge for enhancing the usability and potential of remote sensing products. This paper introduces the dataDriven open access tool, a novel statistical design-based approach that specifically addresses this issue by estimating per-pixel uncertainty through a bootstrap resampling procedure. Leveraging Sentinel-2 remote sensing data as auxiliary information, the capabilities of the Google Earth Engine cloud computing platform, and the R programming language, dataDriven can be applied in any world region and for any variable of interest. In this study, the dataDriven tool was tested in the Rincine forest estate study area (eastern Tuscany, Italy), focusing on volume density as the variable of interest. The average volume density was 0.042, corresponding to 420 m3 per hectare. The estimated pixel errors ranged between 93 m3 and 979 m3 per hectare and were 285 m3 per hectare on average. The ability to produce error estimates for each pixel in the map is a novel aspect in the context of the current advances in remote sensing and forest monitoring and assessment. It constitutes significant support in forest management applications and also a powerful communication tool, since it informs users about areas where map estimates are unreliable, while highlighting the areas where the information provided via the map is more trustworthy. In light of this, the dataDriven tool aims to support researchers and practitioners in the spatially exhaustive use of remote sensing-derived products and map validation.

17.
J Biopharm Stat ; : 1-24, 2024 Jun 22.
Article in English | MEDLINE | ID: mdl-38907670

ABSTRACT

In this paper, we present some results to make inference about the parameters of lower truncated proportional hazard rate models with the same baseline distributions based on three independent generalized order statistics samples. Then, especially by considering the results of the diagnostic tests for the non-diseased, early-diseased stage and fully diseased populations, we make inference about sensitivity to the early disease stage parameter. The maximum likelihood estimator, a generalized pivotal estimator and some Bayes estimators are obtained for different structures of prior distributions. The percentile bootstrap confidence interval, a generalized pivotal confidence interval and some Bayesian credible intervals are also presented. A Monte Carlo simulation study is used to evaluate the performances of the obtained point estimators and confidence/credible intervals and two competitors. We use two real datasets to illustrate the proposed methods.
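The percentile bootstrap confidence interval used in entry 17 is the simplest of the intervals mentioned: resample the data with replacement, recompute the estimator, and take empirical quantiles of the replicates. In this sketch a median on toy data stands in for the paper's sensitivity parameter.

```python
import numpy as np

rng = np.random.default_rng(8)

# Toy data and estimator: the median of an exponential sample.
x = rng.exponential(scale=2.0, size=120)

# Percentile bootstrap: quantiles of the bootstrap replicates form the CI.
B = 2000
meds = np.array([np.median(rng.choice(x, x.size, replace=True))
                 for _ in range(B)])
lo, hi = np.quantile(meds, [0.025, 0.975])
```

The same recipe applies to any estimator, which is why the percentile interval is the default companion to point estimators that lack a tractable variance formula.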

18.
HGG Adv ; 5(3): 100304, 2024 Jul 18.
Article in English | MEDLINE | ID: mdl-38720460

ABSTRACT

Genetic correlation refers to the correlation between genetic determinants of a pair of traits. When using individual-level data, it is typically estimated based on a bivariate model specification where the correlation between the two variables is identifiable and can be estimated from a covariance model that incorporates the genetic relationship between individuals, e.g., using a pre-specified kinship matrix. Inference relying on asymptotic normality of the genetic correlation parameter estimates may be inaccurate when the sample size is low, when the genetic correlation is close to the boundary of the parameter space, and when the heritability of at least one of the traits is low. We address this problem by developing a parametric bootstrap procedure to construct confidence intervals for genetic correlation estimates. The procedure simulates paired traits under a range of heritability and genetic correlation parameters, and it uses the population structure encapsulated by the kinship matrix. Heritabilities and genetic correlations are estimated using the closed-form, method-of-moments Haseman-Elston regression estimators. The proposed parametric bootstrap procedure is especially useful when genetic correlations are computed on pairs of thousands of traits measured on the same exact set of individuals. We demonstrate the parametric bootstrap approach on a proteomics dataset from the Jackson Heart Study.
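The parametric bootstrap in entry 18 simulates paired traits under the fitted parameters and recomputes the estimate on each draw; quantiles of the replicates then form the confidence interval. Below is a stripped-down analogue for a plain correlation, without the kinship structure that is the paper's key ingredient:

```python
import numpy as np

rng = np.random.default_rng(9)

# Toy stand-in: parametric bootstrap CI for a correlation estimate.
n = 60
rho_true = 0.4
z = rng.normal(size=(n, 2))
x1 = z[:, 0]
x2 = rho_true * z[:, 0] + np.sqrt(1 - rho_true**2) * z[:, 1]
rho_hat = np.corrcoef(x1, x2)[0, 1]

# Simulate pairs from the fitted bivariate normal model and re-estimate.
B = 1000
reps = np.empty(B)
for b in range(B):
    zb = rng.normal(size=(n, 2))
    a = zb[:, 0]
    c = rho_hat * zb[:, 0] + np.sqrt(1 - rho_hat**2) * zb[:, 1]
    reps[b] = np.corrcoef(a, c)[0, 1]
lo, hi = np.quantile(reps, [0.025, 0.975])
```

Because the replicates are generated at the estimated parameter values, the interval automatically respects the boundary of the parameter space, which is where normal-approximation intervals for correlations break down.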


Subject(s)
Models, Genetic , Humans , Protein Interaction Maps/genetics , Confidence Intervals , Computer Simulation , Algorithms , Phenotype
19.
Stat Med ; 43(15): 2894-2927, 2024 Jul 10.
Article in English | MEDLINE | ID: mdl-38738397

ABSTRACT

Estimating causal effects from large experimental and observational data has become increasingly prevalent in both industry and research. The bootstrap is an intuitive and powerful technique for constructing standard errors and confidence intervals of estimators. Its application, however, can be prohibitively demanding in settings involving large data, and modern causal inference estimators based on machine learning and optimization techniques exacerbate its computational burden. The bag of little bootstraps has been proposed in non-causal settings for large data but has not yet been applied to evaluate the properties of estimators of causal effects. In this article, we introduce a new bootstrap algorithm called the causal bag of little bootstraps for causal inference with large data. The new algorithm significantly improves the computational efficiency of the traditional bootstrap while providing consistent estimates and desirable confidence interval coverage. We describe its properties, provide practical considerations, and evaluate the performance of the proposed algorithm in terms of bias, coverage of 95% confidence intervals, and computational time in a simulation study. We apply it to evaluate the effect of hormone therapy on the average time to coronary heart disease using a large observational data set from the Women's Health Initiative.
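The computational trick behind the bag of little bootstraps is that each "resample" of size n is represented by multinomial weights on only b = n^gamma distinct points, so the estimator never touches more than b observations at a time. A minimal non-causal sketch, assuming a weighted estimator interface (the causal variant in the paper adds treatment/outcome structure on top of this skeleton):

```python
import numpy as np

def blb_stderr(data, estimator, s=10, gamma=0.7, r=50, seed=0):
    """Bag of little bootstraps: estimate the standard error of `estimator`
    by drawing s small subsets, resampling full-size weighted samples
    within each subset, and averaging the per-subset standard errors."""
    rng = np.random.default_rng(seed)
    n = len(data)
    b = int(n ** gamma)                     # subset size b = n^gamma << n
    ses = []
    for _ in range(s):
        subset = data[rng.choice(n, size=b, replace=False)]
        stats = []
        for _ in range(r):
            # multinomial counts summing to n: a full-size resample
            # represented compactly on only b distinct points
            w = rng.multinomial(n, np.full(b, 1.0 / b))
            stats.append(estimator(subset, w))
        ses.append(np.std(stats, ddof=1))
    return float(np.mean(ses))

def weighted_mean(x, w):
    return np.average(x, weights=w)

rng = np.random.default_rng(1)
data = rng.normal(0.0, 2.0, size=20000)
se = blb_stderr(data, weighted_mean, seed=2)
# theoretical SE of the mean here is 2 / sqrt(20000), about 0.014
```

Because the s subsets are independent, the outer loop parallelizes trivially, which is where the large-data savings come from.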


Subject(s)
Algorithms , Causality , Computer Simulation , Humans , Female , Confidence Intervals , Coronary Disease/epidemiology , Models, Statistical , Data Interpretation, Statistical , Bias , Observational Studies as Topic/methods , Observational Studies as Topic/statistics & numerical data
20.
J Am Stat Assoc ; 119(545): 297-307, 2024.
Article in English | MEDLINE | ID: mdl-38716406

ABSTRACT

The weighted nearest neighbors (WNN) estimator has been popularly used as a flexible and easy-to-implement nonparametric tool for mean regression estimation. The bagging technique is an elegant way to form WNN estimators with weights automatically assigned to the nearest neighbors (Steele, 2009; Biau et al., 2010); we refer to the resulting estimator as the distributional nearest neighbors (DNN) estimator for easy reference. Yet, there is a lack of distributional results for this estimator, limiting its application to statistical inference. Moreover, when the mean regression function has higher-order smoothness, DNN does not achieve the optimal nonparametric convergence rate, mainly because of its bias. In this work, we provide an in-depth technical analysis of the DNN, based on which we suggest a bias reduction approach: linearly combining two DNN estimators with different subsampling scales, resulting in the novel two-scale DNN (TDNN) estimator. The two-scale DNN estimator admits an equivalent WNN representation with weights given in explicit form, some of which are negative. We prove that, thanks to the use of negative weights, the two-scale DNN estimator enjoys the optimal nonparametric rate of convergence in estimating the regression function under the fourth-order smoothness condition. We further go beyond estimation and establish that the DNN and two-scale DNN are both asymptotically normal as the subsampling scales and sample size diverge to infinity. For practical implementation, we also provide variance estimators and a distribution estimator using the jackknife and bootstrap techniques for the two-scale DNN. These estimators can be exploited for constructing valid confidence intervals for nonparametric inference on the regression function. The theoretical results and appealing finite-sample performance of the suggested two-scale DNN method are illustrated with several simulation examples and a real data application.
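The DNN estimator can be approximated by Monte Carlo: average the 1-nearest-neighbor prediction over many random subsamples of size s. The two-scale combination then solves w1 + w2 = 1 and w1·s1^(-2/d) + w2·s2^(-2/d) = 0 so the leading bias terms cancel, which forces the weight on the smaller scale to be negative. A hedged sketch (the Monte Carlo approximation and all parameter choices here are illustrative, not the paper's exact construction):

```python
import numpy as np

def dnn_estimate(X, y, x0, s, B=200, seed=None):
    """Monte Carlo approximation of the DNN estimator: average the
    1-nearest-neighbor prediction over B random subsamples of size s."""
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = np.empty(B)
    for b in range(B):
        idx = rng.choice(n, size=s, replace=False)
        dist = np.linalg.norm(X[idx] - x0, axis=1)
        preds[b] = y[idx[np.argmin(dist)]]
    return preds.mean()

def tdnn_estimate(X, y, x0, s1, s2, B=200, seed=0):
    """Two-scale DNN: combine two DNN estimators so that the leading
    s^(-2/d) bias terms cancel; the smaller scale gets negative weight."""
    d = X.shape[1]
    a1, a2 = s1 ** (-2 / d), s2 ** (-2 / d)
    w1 = a2 / (a2 - a1)         # solves w1 + w2 = 1, w1*a1 + w2*a2 = 0
    w2 = 1 - w1
    return (w1 * dnn_estimate(X, y, x0, s1, B, seed)
            + w2 * dnn_estimate(X, y, x0, s2, B, seed + 1))

# Illustrative smooth regression: f(x) = x1 + x2, so f(0.5, 0.5) = 1
rng = np.random.default_rng(4)
X = rng.uniform(0, 1, size=(2000, 2))
y = X[:, 0] + X[:, 1] + rng.normal(0, 0.1, size=2000)
est = tdnn_estimate(X, y, np.array([0.5, 0.5]), s1=50, s2=100)
```

With s1 = 50 and s2 = 100 in d = 2 dimensions, the weights come out to w1 = -1 and w2 = 2, a concrete instance of the negative weights the abstract emphasizes.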
