1.
Heliyon ; 9(11): e22260, 2023 Nov.
Article in English | MEDLINE | ID: mdl-38058617

ABSTRACT

A two-parameter unit distribution, its regression model, and its extension to 0 and 1 inflation are introduced and studied. The distribution is called the unit upper truncated Weibull (UUTW) distribution, while the inflated variant is called the 0-1 inflated unit upper truncated Weibull (ZOIUUTW) distribution. The hazard rate function of the UUTW distribution can be increasing or J-shaped. The parameters of the proposed models are estimated by maximum likelihood. For the UUTW distribution, two practical examples involving household expenditure and maximum flood level data show its flexibility, and the proposed distribution tends to fit better than some competing unit distributions. In an application to a real data set, the proposed regression model describes the data adequately and models them better than the existing competing models. For the ZOIUUTW distribution, CD34+ data from cancer patients are analyzed to show the flexibility of the model in characterizing inflation at both endpoints of the unit interval.
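
The abstract does not give the ZOIUUTW density, so the following is only a generic sketch of how 0-1 inflation is commonly written for a unit distribution. The symbols $p_0$, $p_1$, $f(y;\theta)$, $n_0$ and $n_1$ are assumptions introduced here for illustration, and the paper may use a different parametrization.

```latex
% Generic 0-1 inflated density on [0, 1]; f(y; \theta) stands for any
% continuous density on (0, 1), such as the UUTW density (not reproduced here).
g(y) = p_0 \,\mathbb{1}\{y = 0\} + p_1 \,\mathbb{1}\{y = 1\}
       + (1 - p_0 - p_1)\, f(y;\theta)\, \mathbb{1}\{0 < y < 1\}

% With n observations, of which n_0 equal 0 and n_1 equal 1, the log-likelihood is
\ell(p_0, p_1, \theta) = n_0 \log p_0 + n_1 \log p_1
       + (n - n_0 - n_1) \log(1 - p_0 - p_1)
       + \sum_{i:\, 0 < y_i < 1} \log f(y_i;\theta)

% Under this parametrization the inflation part separates from \theta, giving
\hat{p}_0 = n_0 / n, \qquad \hat{p}_1 = n_1 / n
```

Under these assumptions the inflation probabilities are estimated by the sample proportions at the endpoints, and $\theta$ is estimated from the interior observations alone.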

2.
J Appl Stat ; 49(16): 4278-4293, 2022.
Article in English | MEDLINE | ID: mdl-36353301

ABSTRACT

In disease screening, a biomarker combination developed by combining multiple markers tends to have a higher sensitivity than an individual marker. Parametric methods for marker combination rely on the inverse of covariance matrices, which is often non-trivial to obtain for high-dimensional data generated by modern high-throughput technologies. Another common problem in disease diagnosis is the existence of a limit of detection (LOD) for an instrument: when a biomarker's value falls below the limit, it cannot be observed and is assigned an NA value. To handle these two challenges in combining high-dimensional biomarkers in the presence of an LOD, we propose a resample-replace lasso procedure. We first impute the values below the LOD and then use the graphical lasso method to estimate the means and precision matrices for the high-dimensional biomarkers. The simulation results show that our method outperforms alternative methods such as substituting NA values with LOD values or removing observations that have NA values. A real-data analysis of a protein profiling study of glioblastoma patients, grouped by survival status, indicates that the biomarker combination obtained through the proposed method is more accurate in distinguishing between the two groups.
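
The two baseline strategies named above (substituting the LOD for NA values, or removing observations that contain NA values) can be sketched in a few lines. This is a minimal illustration only: the data, the `LOD` value, and all function names are hypothetical, and the resample-replace lasso procedure itself is not reproduced because the abstract does not specify it.

```python
from statistics import mean

LOD = 1.0  # hypothetical limit of detection, shared by every marker

# Rows are subjects, columns are markers; None marks a value below the LOD.
data = [
    [2.3, None, 4.1],
    [1.8, 1.2, None],
    [3.0, 2.5, 3.7],
    [2.1, 1.9, 4.4],
]

def substitute_lod(rows, lod):
    """Baseline 1: replace every unobserved value with the LOD itself."""
    return [[lod if v is None else v for v in row] for row in rows]

def drop_incomplete(rows):
    """Baseline 2: keep only subjects with every marker observed."""
    return [row for row in rows if all(v is not None for v in row)]

def column_means(rows):
    """Mean of each marker across the retained subjects."""
    return [mean(col) for col in zip(*rows)]

substituted = substitute_lod(data, LOD)
complete = drop_incomplete(data)
print("means after LOD substitution:", column_means(substituted))
print("means after dropping rows:   ", column_means(complete))
```

Substitution keeps every subject but plugs in a single fixed value, while removal discards half of this toy sample. Both alter the estimated means, which is the kind of distortion an imputation step aims to reduce.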

3.
J Appl Stat ; 49(15): 3846-3867, 2022.
Article in English | MEDLINE | ID: mdl-36324479

ABSTRACT

In systematic reviews and meta-analyses, one is interested in combining information from a variety of sources in order to obtain unbiased and efficient pooled estimates of the mean treatment effect compared to a control group, along with the corresponding standard errors and confidence intervals, particularly when the source data are unavailable. However, many studies report other descriptive measures, such as the median and quartiles, in lieu of the mean and standard deviation. In this note we provide a theoretically optimal best linear unbiased estimator (BLUE) strategy for combining different types of summary information in order to pool results and estimate the overall treatment effect and the corresponding confidence intervals. Our approach is less biased and much more flexible than past attempts at solving this problem and can accommodate a variety of summary information across studies. We show that confidence intervals based on our methods have the appropriate coverage probabilities. Our proposed methods are theoretically justified and verified by simulation studies. The BLUE method is illustrated via a real data application.
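
To make the pooling problem concrete, the sketch below combines studies that report different summaries. It is not the BLUE strategy of the paper, whose details are not given in the abstract. It uses two standard ingredients instead: a large-sample normal approximation that converts quartiles to a mean and standard deviation, and inverse-variance weighting, which is the best linear unbiased estimator of a common mean when the study estimates are independent and unbiased with known variances. The study values and function names are hypothetical.

```python
from math import sqrt

def mean_sd_from_quartiles(q1, median, q3):
    """Approximate mean and SD from quartiles, assuming roughly normal data
    and a large sample (the normal IQR is about 1.349 standard deviations)."""
    return (q1 + median + q3) / 3.0, (q3 - q1) / 1.349

def inverse_variance_pool(estimates, std_errors):
    """Pool independent estimates using weights 1 / SE^2."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_se = sqrt(1.0 / total)
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)  # normal 95% CI
    return pooled, pooled_se, ci

# Hypothetical studies: two report mean and SD, one reports quartiles only.
studies = [
    {"mean": 5.2, "sd": 1.1, "n": 40},
    {"mean": 4.8, "sd": 0.9, "n": 55},
    {"q1": 4.1, "median": 5.0, "q3": 5.9, "n": 30},
]

estimates, std_errors = [], []
for s in studies:
    if "mean" in s:
        m, sd = s["mean"], s["sd"]
    else:
        m, sd = mean_sd_from_quartiles(s["q1"], s["median"], s["q3"])
    estimates.append(m)
    std_errors.append(sd / sqrt(s["n"]))  # standard error of the study mean

pooled, pooled_se, ci = inverse_variance_pool(estimates, std_errors)
print("pooled mean:", pooled, "SE:", pooled_se, "95% CI:", ci)
```

The quartile conversion treats the approximated mean as if it were as precise as a reported sample mean, which understates its uncertainty. Handling that difference properly is one reason a dedicated method is needed.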

4.
J Appl Stat ; 48(4): 583-604, 2021.
Article in English | MEDLINE | ID: mdl-35706989

ABSTRACT

This study examines the optimal selection of bandwidth and semi-metric for a functional partial linear model. Our proposed method begins by estimating the unknown error density using a kernel density estimator of residuals, where the regression function, consisting of parametric and nonparametric components, can be estimated by functional principal component and functional Nadaraya-Watson estimators. The estimation accuracy of the regression function and error density crucially depends on the optimal estimation of bandwidth and semi-metric. A Bayesian method is utilized to simultaneously estimate the bandwidths in the regression function and kernel error density by minimizing the Kullback-Leibler divergence. For estimating the regression function and error density, a series of simulation studies demonstrates that the functional partial linear model gives improved estimation and forecast accuracy compared with functional principal component regression and functional nonparametric regression. Using a spectroscopy dataset, the functional partial linear model yields better forecast accuracy than some commonly used functional regression models. As a by-product of the Bayesian method, a pointwise prediction interval can be obtained, and the marginal likelihood can be used to select the optimal semi-metric.
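
The roles of the bandwidth and the semi-metric are easiest to see in the functional Nadaraya-Watson estimator, which predicts a response as a kernel-weighted average of observed responses, with weights that shrink as the distance between curves grows. The sketch below is a minimal illustration under stated assumptions: curves are sampled on a common grid, a Gaussian kernel is used, and a discretized L2 distance stands in for the semi-metric. These choices and all names are illustrative, and the Bayesian bandwidth selection described in the abstract is not implemented.

```python
from math import exp, sqrt

def l2_distance(curve_a, curve_b):
    """Discretized L2 distance between two curves on a common grid."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(curve_a, curve_b)) / len(curve_a))

def nadaraya_watson(new_curve, curves, responses, bandwidth, distance=l2_distance):
    """Kernel-weighted average of responses with a Gaussian kernel."""
    weights = [exp(-0.5 * (distance(new_curve, c) / bandwidth) ** 2) for c in curves]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("bandwidth too small: all kernel weights underflowed")
    return sum(w * y for w, y in zip(weights, responses)) / total

# Toy data: three flat curves with responses equal to their level.
curves = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
responses = [0.0, 1.0, 2.0]

print(nadaraya_watson([1.0, 1.0, 1.0], curves, responses, bandwidth=0.5))
print(nadaraya_watson([0.0, 0.0, 0.0], curves, responses, bandwidth=0.1))
```

A small bandwidth makes the prediction follow the nearest curve, while a large one averages over all curves. Passing a different `distance` function changes which curves count as close, which is why both choices affect accuracy.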

5.
J Appl Stat ; 47(13-15): 2785-2807, 2020.
Article in English | MEDLINE | ID: mdl-35707433

ABSTRACT

Gamma-ray bursts (GRBs) have been confidently identified thus far and are ascribed to different physical scenarios, such as black hole mergers and the collapse of massive stars. The distribution of GRB durations, one of the main characteristics of GRBs, is bimodal. Hence, many authors have used mixture distribution models to fit them, which suffer from serious estimation problems under both classical and Bayesian approaches. Therefore, in this article we introduce a more flexible class of weighted bimodal distributions, called alpha two-piece skew normal (ATPSN), for modeling GRB duration data. Some of the main probabilistic and inferential properties of the distribution are discussed, and a simulation study is carried out to illustrate the performance of the MLEs.

6.
Ann Appl Stat ; 12(1): 510-539, 2018 Mar.
Article in English | MEDLINE | ID: mdl-29731954

ABSTRACT

Next-generation RNA sequencing (RNA-seq) technology has been widely used to assess full-length RNA isoform abundance in a high-throughput manner. RNA-seq data offer insight into gene expression levels and transcriptome structures, enabling us to better understand the regulation of gene expression and fundamental biological processes. Accurate isoform quantification from RNA-seq data is challenging due to the information loss in sequencing experiments. A recent accumulation of multiple RNA-seq data sets from the same tissue or cell type provides new opportunities to improve the accuracy of isoform quantification. However, existing statistical or computational methods for multiple RNA-seq samples either pool the samples into one sample or assign equal weights to the samples when estimating isoform abundance. These methods ignore the possible heterogeneity in the quality of different samples and could result in biased and non-robust estimates. In this article, we develop a method, which we call "joint modeling of multiple RNA-seq samples for accurate isoform quantification" (MSIQ), for more accurate and robust isoform quantification by integrating multiple RNA-seq samples under a Bayesian framework. Our method aims to (1) identify a consistent group of samples with homogeneous quality and (2) improve isoform quantification accuracy by jointly modeling multiple RNA-seq samples while allowing for higher weights on the consistent group. We show that MSIQ provides a consistent estimator of isoform abundance, and we demonstrate the accuracy and effectiveness of MSIQ compared with alternative methods through simulation studies on D. melanogaster genes. We justify MSIQ's advantages over existing approaches via application studies on real RNA-seq data from human embryonic stem cells, brain tissues, and the HepG2 immortalized cell line. We also perform a comprehensive analysis of how the isoform quantification accuracy would be affected by RNA-seq sample heterogeneity and different experimental protocols.

7.
Appl Math ; 32(4): 379-396, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29683128

ABSTRACT

Many modern biomedical studies have yielded survival data with high-throughput predictors. The goals of scientific research often lie in identifying predictive biomarkers, understanding biological mechanisms and making accurate and precise predictions. Variable screening is a crucial first step in achieving these goals. This work conducts a selective review of feature screening procedures for survival data with ultrahigh dimensional covariates. We present the main methodologies, along with the key conditions that ensure sure screening properties. The practical utility of these methods is examined via extensive simulations. We conclude the review with some future opportunities in this field.
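
Marginal screening procedures of the kind reviewed above share one pattern: compute a utility for each covariate separately, rank the covariates, and keep the top few. The sketch below illustrates that pattern for right-censored data using Harrell's concordance index as the marginal utility. This choice is an assumption made for illustration and is not necessarily one of the procedures covered in the review. The data and function names are hypothetical.

```python
def concordance(times, events, scores):
    """Harrell's C: among comparable pairs, the share in which the subject
    who fails earlier has the higher score (ties in score count as 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # the earlier time of a pair must be an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else 0.5

def screen(times, events, covariates, keep):
    """Rank covariates by |C - 0.5| and return the indices of the top `keep`."""
    utilities = [abs(concordance(times, events, col) - 0.5) for col in covariates]
    order = sorted(range(len(covariates)), key=lambda k: utilities[k], reverse=True)
    return order[:keep]

# Toy data: five subjects, one censored (event flag 0), three covariates.
times = [1, 2, 3, 4, 5]
events = [1, 1, 0, 1, 1]
covariates = [
    [5, 4, 3, 2, 1],  # high values fail early
    [2, 5, 1, 4, 3],  # weak association with failure time
    [1, 2, 3, 4, 5],  # high values fail late
]

print(screen(times, events, covariates, keep=2))
```

Using the distance of C from 0.5 keeps covariates that are strongly associated with survival in either direction. The pairwise loop costs quadratic time per covariate, so a practical implementation for ultrahigh-dimensional data would need something faster.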
