Pesquisa | Portal de Pesquisa da BVS

1.

Homogeneity tests of covariance for high-dimensional functional data with applications to event segmentation.

Zhong, Ping-Shou.

Biometrics ; 79(4): 3332-3344, 2023 12.

Artigo em Inglês | MEDLINE | ID: mdl-36807124

RESUMO

We consider inference problems for high-dimensional (HD) functional data with a dense number of T repeated measurements taken for a large number of p variables from a small number of n experimental units. The spatial and temporal dependence, high dimensionality, and dense number of repeated measurements pose theoretical and computational challenges. This paper has two aims; our first aim is to solve the theoretical and computational challenges in testing equivalence among covariance matrices from HD functional data. The second aim is to provide computationally efficient and tuning-free tools with guaranteed stochastic error control. The weak convergence of the stochastic process formed by the test statistics is established under the "large p, large T, and small n" setting. If the null is rejected, we further show that the locations of the change points can be estimated consistently. The estimator's rate of convergence is shown to depend on the data dimension, sample size, number of repeated measurements, and signal-to-noise ratio. We also show that our proposed computation algorithms can significantly reduce the computation time and are applicable to real-world data with a large number of HD-repeated measurements (e.g., functional magnetic resonance imaging (fMRI) data). Simulation results demonstrate both the finite sample performance and computational effectiveness of our proposed procedures. We observe that the empirical size of the test is well controlled at the nominal level, and the locations of multiple change points can be accurately identified. An application to fMRI data demonstrates that our proposed methods can identify event boundaries in the preface of the television series Sherlock. Code to implement the procedures is available in an R package named TechPhD.

Assuntos

Algoritmos , Imageamento por Ressonância Magnética , Simulação por Computador , Imageamento por Ressonância Magnética/métodos , Tamanho da Amostra

2.

An optimal kernel-based U-statistic method for quantitative gene-set association analysis.

He, Tao; Li, Shaoyu; Zhong, Ping-Shou; Cui, Yuehua.

Genet Epidemiol ; 43(2): 137-149, 2019 03.

Artigo em Inglês | MEDLINE | ID: mdl-30456931

RESUMO

Single-variant-based genome-wide association studies have successfully detected many genetic variants that are associated with a number of complex traits. However, their power is limited due to weak marginal signals and ignoring potential complex interactions among genetic variants. The set-based strategy was proposed to provide a remedy where multiple genetic variants in a given set (e.g., gene or pathway) are jointly evaluated, so that the systematic effect of the set is considered. Among many, the kernel-based testing (KBT) framework is one of the most popular and powerful methods in set-based association studies. Given a set of candidate kernels, the method has been proposed to choose the one with the smallest p-value. Such a method, however, can yield inflated Type 1 error, especially when the number of variants in a set is large. Alternatively one can get p values by permutations which, however, could be very time-consuming. In this study, we proposed an efficient testing procedure that cannot only control Type 1 error rate but also have power close to the one obtained under the optimal kernel in the candidate kernel set, for quantitative trait association studies. Our method, a maximum kernel-based U-statistic method, is built upon the KBT framework and is based on asymptotic results under a high-dimensional setting. Hence it can efficiently deal with the case where the number of variants in a set is much larger than the sample size. Both simulation and real data analysis demonstrate the advantages of the method compared with its counterparts.

Assuntos

Algoritmos , Estudos de Associação Genética/métodos , Estatística como Assunto , Simulação por Computador , Estudo de Associação Genômica Ampla , Humanos , Recém-Nascido , Modelos Genéticos

3.

Additive varying-coefficient model for nonlinear gene-environment interactions.

Wu, Cen; Zhong, Ping-Shou; Cui, Yuehua.

Stat Appl Genet Mol Biol ; 17(2)2018 02 08.

Artigo em Inglês | MEDLINE | ID: mdl-29420308

RESUMO

Gene-environment (G×E) interaction plays a pivotal role in understanding the genetic basis of complex disease. When environmental factors are measured continuously, one can assess the genetic sensitivity over different environmental conditions on a disease trait. Motivated by the increasing awareness of gene set based association analysis over single variant based approaches, we proposed an additive varying-coefficient model to jointly model variants in a genetic system. The model allows us to examine how variants in a gene set are moderated by an environment factor to affect a disease phenotype. We approached the problem from a variable selection perspective. In particular, we select variants with varying, constant and zero coefficients, which correspond to cases of G×E interaction, no G×E interaction and no genetic effect, respectively. The procedure was implemented through a two-stage iterative estimation algorithm via the smoothly clipped absolute deviation penalty function. Under certain regularity conditions, we established the consistency property in variable selection as well as effect separation of the two stage iterative estimators, and showed the optimal convergence rates of the estimates for varying effects. In addition, we showed that the estimate of non-zero constant coefficients enjoy the oracle property. The utility of our procedure was demonstrated through simulation studies and real data analysis.

Assuntos

Peso ao Nascer/genética , Interação Gene-Ambiente , Modelos Genéticos , Modelos Estatísticos , Polimorfismo de Nucleotídeo Único , Algoritmos , Índice de Massa Corporal , Idade Gestacional , Humanos , Recém-Nascido , Mães , Fenótipo

4.

Order-restricted inference for means with missing values.

Wang, Heng; Zhong, Ping-Shou.

Biometrics ; 73(3): 972-980, 2017 09.

Artigo em Inglês | MEDLINE | ID: mdl-28182830

RESUMO

Missing values appear very often in many applications, but the problem of missing values has not received much attention in testing order-restricted alternatives. Under the missing at random (MAR) assumption, we impute the missing values nonparametrically using kernel regression. For data with imputation, the classical likelihood ratio test designed for testing the order-restricted means is no longer applicable since the likelihood does not exist. This article proposes a novel method for constructing test statistics for assessing means with an increasing order or a decreasing order based on jackknife empirical likelihood (JEL) ratio. It is shown that the JEL ratio statistic evaluated under the null hypothesis converges to a chi-bar-square distribution, whose weights depend on missing probabilities and nonparametric imputation. Simulation study shows that the proposed test performs well under various missing scenarios and is robust for normally and nonnormally distributed data. The proposed method is applied to an Alzheimer's disease neuroimaging initiative data set for finding a biomarker for the diagnosis of the Alzheimer's disease.

Assuntos

Interpretação Estatística de Dados , Distribuição de Qui-Quadrado , Projetos de Pesquisa

5.

Testing a single regression coefficient in high dimensional linear models.

Lan, Wei; Zhong, Ping-Shou; Li, Runze; Wang, Hansheng; Tsai, Chih-Ling.

J Econom ; 195(1): 154-168, 2016 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-28663668

RESUMO

In linear regression models with high dimensional data, the classical z-test (or t-test) for testing the significance of each single regression coefficient is no longer applicable. This is mainly because the number of covariates exceeds the sample size. In this paper, we propose a simple and novel alternative by introducing the Correlated Predictors Screening (CPS) method to control for predictors that are highly correlated with the target covariate. Accordingly, the classical ordinary least squares approach can be employed to estimate the regression coefficient associated with the target covariate. In addition, we demonstrate that the resulting estimator is consistent and asymptotically normal even if the random errors are heteroscedastic. This enables us to apply the z-test to assess the significance of each covariate. Based on the p-value obtained from testing the significance of each covariate, we further conduct multiple hypothesis testing by controlling the false discovery rate at the nominal level. Then, we show that the multiple hypothesis testing achieves consistent model selection. Simulation studies and empirical examples are presented to illustrate the finite sample performance and the usefulness of the proposed method, respectively.

6.

A hidden Markov approach for ascertaining cSNP genotypes from RNA sequence data in the presence of allelic imbalance by exploiting linkage disequilibrium.

Steibel, Juan P; Wang, Heng; Zhong, Ping-Shou.

BMC Bioinformatics ; 16: 61, 2015 Feb 22.

Artigo em Inglês | MEDLINE | ID: mdl-25887316

RESUMO

BACKGROUND: Allelic specific expression (ASE) increases our understanding of the genetic control of gene expression and its links to phenotypic variation. ASE testing is implemented through binomial or beta-binomial tests of sequence read counts of alternative alleles at a cSNP of interest in heterozygous individuals. This requires prior ascertainment of the cSNP genotypes for all individuals. To meet the needs, we propose hidden Markov methods to call SNPs from next generation RNA sequence data when ASE possibly exists. RESULTS: We propose two hidden Markov models (HMMs), HMM-ASE and HMM-NASE that consider or do not consider ASE, respectively, in order to improve genotyping accuracy. Both HMMs have the advantages of calling the genotypes of several SNPs simultaneously and allow mapping error which, respectively, utilize the dependence among SNPs and correct the bias due to mapping error. In addition, HMM-ASE exploits ASE information to further improve genotype accuracy when the ASE is likely to be present. Simulation results indicate that the HMMs proposed demonstrate a very good prediction accuracy in terms of controlling both the false discovery rate (FDR) and the false negative rate (FNR). When ASE is present, the HMM-ASE had a lower FNR than HMM-NASE, while both can control the false discovery rate (FDR) at a similar level. By exploiting linkage disequilibrium (LD), a real data application demonstrate that the proposed methods have better sensitivity and similar FDR in calling heterozygous SNPs than the VarScan method. Sensitivity and FDR are similar to that of the BCFtools and Beagle methods. The resulting genotypes show good properties for the estimation of the genetic parameters and ASE ratios. CONCLUSIONS: We introduce HMMs, which are able to exploit LD and account for the ASE and mapping errors, to simultaneously call SNPs from the next generation RNA sequence data. The method introduced can reliably call for cSNP genotypes even in the presence of ASE and under low sequencing coverage. As a byproduct, the proposed method is able to provide predictions of ASE ratios for the heterozygous genotypes, which can then be used for ASE testing.

Assuntos

Desequilíbrio Alélico , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Desequilíbrio de Ligação , Cadeias de Markov , Polimorfismo de Nucleotídeo Único/genética , Análise de Sequência de RNA/métodos , Algoritmos , Alelos , Biologia Computacional/métodos , Genótipo , Humanos , RNA/genética

7.

Assessment of an Airway Curriculum in a Pulmonary and Critical Care Fellowship Program.

Imayama, Ikuyo; Haas, Kevin P; Binder, Ashley; Barac, Tanja; Holanday, John; Zhou, Xia; Zhong, Ping-Shou; Dudek, Steven M; Ferrer Marrero, Tirsa M.

ATS Sch ; 5(3): 420-432, 2024 Sep 30.

Artigo em Inglês | MEDLINE | ID: mdl-39371227

RESUMO

Background: Endotracheal intubations (EIs) in the intensive care unit are high-risk procedures often performed by pulmonary and critical care medicine (PCCM) providers. The Accreditation Council for Graduate Medical Education mandates PCCM fellows' competency in this procedure; however, the learning experiences vary across programs. After conducting a needs assessment, we developed a curriculum unique to our institution to supplement our fellows' existing EI experiences in the operating room and the intensive care unit. Objective: To assess the curriculum's short-term objectives: knowledge acquisition, maintenance, and practical skills 1 year after participation. Methods: We administered a survey to the graduating PCCM fellows for two consecutive years. We designed the comprehensive airway curriculum to include didactic lectures and simulation-based education. The knowledge acquisition and maintenance were measured by administering a 26-question knowledge survey before and after curriculum participation and after 1 year. The fellows also received a practical examination 1 year after participation. To compare knowledge survey scores, we used paired t tests and permutation tests. Results: In the needs assessment, 56% of graduating fellows believed they were proficient in performing EI, whereas 33% were undecided and 11% believed they were unprepared. Most believed they would need more than two courses after graduation to be confident in independently performing EIs. Most will only occasionally have backup for EI from anesthesiology or emergency medicine in their future jobs. One identified barrier to learning EI was the lack of a formal curriculum. In the knowledge assessment, nine first-year fellows participated in the curriculum. The cohort's mean presurvey score was 13.0 (standard deviation [SD], 4.5) versus 18.6 (SD, 3.6) mean postsurvey score. One year after participation, the mean survey score was 17 (SD, 1.2). The postsurvey and 1-year postparticipation survey scores were significantly higher than the presurvey scores (P < 0.05). One year after participation, the practical examination showed most fellows retained skills in EI using ramped position, video and direct laryngoscopy, bag-mask ventilation, and oropharyngeal airway placement. Conclusion: The airway curriculum enhances fellows' knowledge acquisition and maintenance 1 year after participation. The practical examination 1 year after participation highlighted the skills retained and those still needing improvement.

8.

Visual Outcomes of Cataract Surgery in Patients with Previous History of Implantable Collamer Lens.

Del Risco, Norma E; Talbot, Chad L; Moin, Kayvon A; Manion, Garrett N; Brown, Alex H; Walker, Stephen M; Zhong, Ping-Shou; Zhang, Hanting; Hoopes, Phillip C; Moshirfar, Majid.

J Clin Med ; 13(15)2024 Jul 23.

Artigo em Inglês | MEDLINE | ID: mdl-39124559

RESUMO

Background/Objectives: This retrospective case series analyzed visual outcomes in patients with a prior history of implantable collamer lens (ICL) implantation who underwent cataract extraction (CE). A secondary aim was to investigate the relationship between vault height and the rate of cataract development. Methods: Visual acuity and refraction measurements were collected after CE at one week, one month and six months. Vault height measurements were correlated to the time until symptomatic cataracts were removed. Results: A total of 44 eyes were analyzed at six months after CE with efficacy and safety indexes of 1.20 ± 1.11 and 1.50 ± 1.06, respectively. In addition, 70% of eyes had a post-operative uncorrected distance visual acuity (UDVA) within one line of pre-operative corrected distance visual acuity (CDVA). Refractive predictability at six months demonstrated that 43% and 69% of eyes were within ±0.25 D and ±0.50 D of SEQ target, respectively. Astigmatism measured by refractive cylinder was ≤0.25 D in 17% and ≤0.50 D in 34% of eyes pre-operatively compared to 40% and 60% of eyes, respectively, at six months post-operatively. Vault heights one week after ICL (p < 0.0081) and one week before CE (p < 0.0154) demonstrated a positive linear regression with the time until CE. Conclusions: This sample population achieved favorable visual outcomes six months after CE, similar to six months after ICL implantation. Patients with a history of ICL implantation will similarly have a good visual prognosis after CE.

9.

Characterization of breast lesions using multi-parametric diffusion MRI and machine learning.

Mehta, Rahul; Bu, Yangyang; Zhong, Zheng; Dan, Guangyu; Zhong, Ping-Shou; Zhou, Changyu; Hu, Weihong; Zhou, Xiaohong Joe; Xu, Maosheng; Wang, Shiwei; Karaman, M Muge.

Phys Med Biol ; 68(8)2023 04 03.

Artigo em Inglês | MEDLINE | ID: mdl-36808921

RESUMO

Objective. To investigate quantitative imaging markers based on parameters from two diffusion-weighted imaging (DWI) models, continuous-time random-walk (CTRW) and intravoxel incoherent motion (IVIM) models, for characterizing malignant and benign breast lesions by using a machine learning algorithm.Approach. With IRB approval, 40 women with histologically confirmed breast lesions (16 benign, 24 malignant) underwent DWI with 11b-values (50 to 3000 s/mm2) at 3T. Three CTRW parameters,Dm,α, andßand three IVIM parametersDdiff,Dperf, andfwere estimated from the lesions. A histogram was generated and histogram features of skewness, variance, mean, median, interquartile range; and the value of the 10%, 25% and 75% quantiles were extracted for each parameter from the regions-of-interest. Iterative feature selection was performed using the Boruta algorithm that uses the Benjamin Hochberg False Discover Rate to first determine significant features and then to apply the Bonferroni correction to further control for false positives across multiple comparisons during the iterative procedure. Predictive performance of the significant features was evaluated using Support Vector Machine, Random Forest, Naïve Bayes, Gradient Boosted Classifier (GB), Decision Trees, AdaBoost and Gaussian Process machine learning classifiers.Main Results. The 75% quantile, and median ofDm; 75% quantile off;mean, median, and skewness ofß;kurtosis ofDperf; and 75% quantile ofDdiffwere the most significant features. The GB differentiated malignant and benign lesions with an accuracy of 0.833, an area-under-the-curve of 0.942, and an F1 score of 0.87 providing the best statistical performance (p-value < 0.05) compared to the other classifiers.Significance. Our study has demonstrated that GB with a set of histogram features from the CTRW and IVIM model parameters can effectively differentiate malignant and benign breast lesions.

Assuntos

Neoplasias da Mama , Mama , Feminino , Humanos , Teorema de Bayes , Mama/diagnóstico por imagem , Mama/patologia , Neoplasias da Mama/diagnóstico por imagem , Neoplasias da Mama/patologia , Imagem de Difusão por Ressonância Magnética/métodos , Aprendizado de Máquina , Movimento (Física) , Reprodutibilidade dos Testes

10.

Homogeneity tests of covariance matrices with high-dimensional longitudinal data.

Zhong, Ping-Shou; Li, Runze; Santo, Shawn.

Biometrika ; 106(3): 619-634, 2019 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-31427823

RESUMO

This paper deals with the detection and identification of changepoints among covariances of high-dimensional longitudinal data, where the number of features is greater than both the sample size and the number of repeated measurements. The proposed methods are applicable under general temporal-spatial dependence. A new test statistic is introduced for changepoint detection, and its asymptotic distribution is established. If a changepoint is detected, an estimate of the location is provided. The rate of convergence of the estimator is shown to depend on the data dimension, sample size, and signal-to-noise ratio. Binary segmentation is used to estimate the locations of possibly multiple changepoints, and the corresponding estimator is shown to be consistent under mild conditions. Simulation studies provide the empirical size and power of the proposed test and the accuracy of the changepoint estimator. An application to a time-course microarray dataset identifies gene sets with significant gene interaction changes over time.

11.

Functional Mixed Effects Model for Small Area Estimation.

Maiti, Tapabrata; Sinha, Samiran; Zhong, Ping-Shou.

Scand Stat Theory Appl ; 43(3): 886-903, 2016 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-27795610

RESUMO

Functional data analysis has become an important area of research due to its ability of handling high dimensional and complex data structures. However, the development is limited in the context of linear mixed effect models, and in particular, for small area estimation. The linear mixed effect models are the backbone of small area estimation. In this article, we consider area level data, and fit a varying coefficient linear mixed effect model where the varying coefficients are semi-parametrically modeled via B-splines. We propose a method of estimating the fixed effect parameters and consider prediction of random effects that can be implemented using a standard software. For measuring prediction uncertainties, we derive an analytical expression for the mean squared errors, and propose a method of estimating the mean squared errors. The procedure is illustrated via a real data example, and operating characteristics of the method are judged using finite sample simulation studies.

12.

A set-based association test identifies sex-specific gene sets associated with type 2 diabetes.

He, Tao; Zhong, Ping-Shou; Cui, Yuehua.

Front Genet ; 5: 395, 2014.

Artigo em Inglês | MEDLINE | ID: mdl-25429300

RESUMO

Single variant analysis in genome-wide association studies (GWAS) has been proven to be successful in identifying thousands of genetic variants associated with hundreds of complex diseases. However, these identified variants only explain a small fraction of inheritable variability in many diseases, suggesting that other resources, such as multilevel genetic variations, may contribute to disease susceptibility. In this work, we proposed to combine genetic variants that belong to a gene set, such as at gene- and pathway-level to form an integrated signal aimed to identify major players that function in a coordinated manner conferring disease risk. The integrated analysis provides novel insight into disease etiology while individual signals could be easily missed by single variant analysis. We applied our approach to a genome-wide association study of type 2 diabetes (T2D) with male and female data analyzed separately. Novel sex-specific genes and pathways were identified to increase the risk of T2D. We also demonstrated the performance of signal integration through simulation studies.

13.

A powerful statistical method identifies novel loci associated with diastolic blood pressure triggered by nonlinear gene-environment interaction.

Wang, Honglang; He, Tao; Wu, Cen; Zhong, Ping-Shou; Cui, Yuehua.

BMC Proc ; 8(Suppl 1): S61, 2014.

Artigo em Inglês | MEDLINE | ID: mdl-25519336

RESUMO

The genetic basis of blood pressure often involves multiple genetic factors and their interactions with environmental factors. Gene-environment interaction is assumed to play an important role in determining individual blood pressure variability. Older people are more prone to high blood pressure than younger ones and the risk may not display a linear trend over the life span. However, which gene shows sensitivity to aging in its effect on blood pressure is not clear. In this work, we allowed the genetic effect to vary over time and propose a varying-coefficient model to identify potential genetic players that show nonlinear response across different age stages. We detected 2 novel loci, gene MIR1263 (a microRNA coding gene) on chromosome 3 and gene UNC13B on chromosome 9, that are nonlinearly associated with diastolic blood pressure. Further experimental validation is needed to confirm this finding.

14.

Statistical dissection of cyto-nuclear epistasis subject to genomic imprinting in line crosses.

He, Tao; Sa, Jian; Zhong, Ping-Shou; Cui, Yuehua.

PLoS One ; 9(3): e91702, 2014.

Artigo em Inglês | MEDLINE | ID: mdl-24643065

RESUMO

Cytoplasm contains important metabolism reaction organelles such as mitochondria and chloroplast (in plant). In particular, mitochondria contains special DNA information which can be passed to offsprings through maternal gametes, and has been confirmed to play a pivotal role in nuclear activities. Experimental evidences have documented the importance of cyto-nuclear interactions in affecting important biological traits. While studies have also pointed out the role of interaction between imprinting nuclear DNA and cytoplasm, no statistical method has been developed to efficiently model such effect and further quantify its effect size. In this work, we developed an efficient statistical model for genome-wide estimating and testing the cytoplasmic effect, nuclear DNA imprinting effect as well as the interaction between them under reciprocal backcross and F2 designs derived from inbred lines. Parameters are estimated under maximum likelihood framework implemented with the EM algorithm. Extensive simulations show good performance in a variety of scenarios. The utility of the method is demonstrated by analyzing a published data set in an F2 family derived from C3H/HeJBir and C57BL/6 J mouse strains. Important cyto-nuclear interactions were identified. Our approach provides a quantitative framework for identifying and estimating cyto-nuclear interactions subject to genomic imprinting involved in the genetic control of complex traits.

Assuntos

Núcleo Celular/genética , Citoplasma/genética , Epistasia Genética , Impressão Genômica , Modelos Genéticos , Algoritmos , Animais , Simulação por Computador , Cruzamentos Genéticos , Feminino , Masculino , Camundongos , Camundongos Endogâmicos C3H , Camundongos Endogâmicos C57BL , Locos de Características Quantitativas

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA