Abstract
In many longitudinal studies, the number and timing of measurements differ across study subjects. Statistical analysis of such data requires accounting for both the unbalanced study design and the unequal spacing of repeated measurements. This paper proposes a time-heterogeneous D-vine copula model that allows for time adjustment in the dependence structure of unequally spaced and potentially unbalanced longitudinal data. The proposed approach not only offers more flexibility than its time-homogeneous counterparts but also allows for parsimonious model specifications at the tree or vine level for a given D-vine structure. It further provides a robust strategy for specifying the joint distribution of non-Gaussian longitudinal data. The performance of the time-heterogeneous D-vine copula models is evaluated through simulation studies and a real data application. Our findings suggest that the proposed approach has better predictive performance than both the linear mixed-effects model and the time-homogeneous D-vine copula model.
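To make "time adjustment in the dependence structure" concrete, here is a minimal Python sketch under stated assumptions, not the authors' model. It evaluates the copula log-density of one subject's series under a D-vine built from Gaussian pair copulas, where the parameter of the edge joining measurements j and j + k is a function of the actual time gap between them. The Gaussian pair-copula family, the link `rho_fn`, the example link `ou_link` with its scale `tau`, and all function names are hypothetical choices for illustration. The values `u` are assumed to be already on the uniform scale (for example, after applying fitted marginal CDFs), and times are assumed strictly increasing. Because each subject is evaluated on its own series, subjects with different numbers of visits are handled by passing shorter or longer lists.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)
_EPS = 1e-12  # keeps inv_cdf arguments strictly inside (0, 1)


def _z(u):
    """Normal score of a value on the uniform scale, clamped away from 0 and 1."""
    return _STD_NORMAL.inv_cdf(min(max(u, _EPS), 1.0 - _EPS))


def gauss_pair_logpdf(a, b, rho):
    """Log-density of the bivariate Gaussian copula with parameter rho at (a, b)."""
    x, y = _z(a), _z(b)
    one_minus = 1.0 - rho * rho
    return (-0.5 * math.log(one_minus)
            - (rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * one_minus))


def gauss_h(a, b, rho):
    """h-function of the Gaussian pair copula: P(A <= a | B = b)."""
    return _STD_NORMAL.cdf((_z(a) - rho * _z(b)) / math.sqrt(1.0 - rho * rho))


def dvine_logpdf(u, times, rho_fn):
    """Log copula density of one subject's series under a Gaussian D-vine.

    u      : values on the uniform scale, in time order
    times  : measurement times of the same length; spacing may be unequal
    rho_fn : hypothetical link rho_fn(tree, gap) -> parameter in (-1, 1) for the edge
             joining measurements j and j + tree, where gap = times[j + tree] - times[j]
    """
    m = len(u)
    fwd = list(u)  # before tree k: F(u_j | u_{j+1}, ..., u_{j+k-1})
    bwd = list(u)  # before tree k: F(u_{j+k-1} | u_j, ..., u_{j+k-2})
    total = 0.0
    for k in range(1, m):  # tree k has m - k edges
        new_fwd, new_bwd = [], []
        for j in range(m - k):
            a, b = fwd[j], bwd[j + 1]
            rho = rho_fn(k, times[j + k] - times[j])
            total += gauss_pair_logpdf(a, b, rho)
            new_fwd.append(gauss_h(a, b, rho))  # F(u_j | u_{j+1}, ..., u_{j+k})
            new_bwd.append(gauss_h(b, a, rho))  # F(u_{j+k} | u_j, ..., u_{j+k-1})
        fwd, bwd = new_fwd, new_bwd
    return total


def ou_link(tree, gap, tau=2.0):
    """Hypothetical link: exp(-gap / tau) in tree 1, independence in higher trees."""
    return math.exp(-gap / tau) if tree == 1 else 0.0


# One subject with three unequally spaced visits (made-up values)
print(dvine_logpdf([0.2, 0.35, 0.9], [0.0, 0.5, 3.0], ou_link))
```

With `ou_link`, the Gaussian D-vine reduces to the copula of a Gaussian Markov process whose correlation between any two visits is exp(-|t_i - t_j| / tau), so the dependence automatically weakens across longer gaps. A time-homogeneous version would, roughly speaking, use one parameter per tree regardless of the gap.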
Subjects
Statistical Models , Research Design , Humans , Computer Simulation , Linear Models , Longitudinal Studies
Abstract
The genetic landscape of diseases associated with changes in bone mineral density (BMD), such as osteoporosis, is only partially understood. Here, we explored data from 3,823 mutant mouse strains for BMD, a measure that is frequently altered in a range of bone pathologies, including osteoporosis. A total of 200 genes were found to significantly affect BMD. This pool of BMD genes comprised 141 genes with previously unknown functions in bone biology and was complementary to pools derived from recent human studies. Nineteen of the 141 genes also caused skeletal abnormalities. Examination of the BMD genes in osteoclasts and osteoblasts highlighted BMD pathways in these cells, including vesicle transport; together with in silico bone turnover studies, this examination led to the prioritization of candidate genes for further investigation. Overall, the results add novel pathophysiological and molecular insight into bone health and disease.
Subjects
Bone Density/genetics , Gene Expression Regulation/genetics , Osteoblasts/metabolism , Osteoclasts/metabolism , Osteoporosis/genetics , Animals , Female , Gene Ontology , Genetic Pleiotropy , Genome-Wide Association Study , Genotype , Male , Mice , Transgenic Mice , Mutation , Osteoblasts/pathology , Osteoclasts/pathology , Osteoporosis/metabolism , Phenotype , Genetic Promoter Regions , Protein Interaction Maps , Sex Characteristics , Transcriptome
Abstract
MOTIVATION: High-throughput phenomic projects generate complex data from small treatment groups and large control groups; the large control groups increase the power of the analyses but introduce variation over time. A method is needed to utilize a set of temporally local controls that maximizes analytic power while minimizing noise from unspecified environmental factors. RESULTS: Here we introduce 'soft windowing', a methodological approach that selects a window of time that includes the most appropriate controls for analysis. Using phenotype data from the International Mouse Phenotyping Consortium (IMPC), adaptive windows were applied such that control data collected close in time to the mutants were assigned the maximal weight, while data collected earlier or later received less weight. We applied this method to IMPC data and compared the results with those obtained from a standard non-windowed approach. Validation using a resampling approach across 2.5 million analyses demonstrated a 10% reduction in false positives. We applied the method to our production analysis pipeline, which establishes genotype-phenotype associations by comparing mutant versus control data. From a set of 2082 mutant mouse lines, we report a 30% increase in significant P-values, as well as linkage to 106 versus 99 disease models via phenotype overlap with the soft-windowed and non-windowed approaches, respectively. Our method is generalizable and can benefit large-scale human phenomic projects such as the UK Biobank and the All of Us resources. AVAILABILITY AND IMPLEMENTATION: The method is freely available in the R package SmoothWin, available on CRAN http://CRAN.R-project.org/package=SmoothWin. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
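The core idea of soft windowing, giving controls measured near the mutants full weight and smoothly down-weighting controls measured earlier or later, can be sketched as follows. This is a minimal illustration under assumptions, not the SmoothWin implementation: the weight function (a rescaled logistic decay in the distance from the mutants' measurement dates), its parameters `half_width` and `scale`, and the function names are hypothetical. The package implements the authors' own weighting and adaptive window selection, which this sketch does not reproduce; here the window parameters are fixed by hand.

```python
import math


def soft_window_weight(t, start, end, half_width, scale):
    """Weight of a control animal measured at time t (hypothetical soft window).

    start, end : earliest and latest times at which the mutants were measured
    half_width : distance outside [start, end] at which the weight falls to about 0.5
    scale      : steepness; a smaller scale gives a sharper, more 'hard' window edge
    Returns exactly 1 inside [start, end] and decays smoothly towards 0 outside it.
    """
    d = max(0.0, start - t, t - end)  # distance from the mutants' measurement range
    z = (d - half_width) / scale
    if z > 700.0:  # math.exp would overflow; the weight is numerically zero here
        return 0.0
    return (1.0 + math.exp(-half_width / scale)) / (1.0 + math.exp(z))


def weighted_control_summary(times, values, start, end, half_width, scale):
    """Weighted mean and (plug-in) weighted variance of control measurements."""
    w = [soft_window_weight(t, start, end, half_width, scale) for t in times]
    total = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, values)) / total
    var = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, values)) / total
    return mean, var


# Controls measured on days 0..100, mutants on days 25-35 (made-up numbers)
days = [0, 10, 20, 30, 40, 100]
print([round(soft_window_weight(t, 25, 35, half_width=10, scale=2), 3) for t in days])
```

As `scale` shrinks the window approaches a hard cut-off at distance `half_width`; as `half_width` grows it approaches the non-windowed analysis in which every control counts equally. The resulting weights could then enter a weighted mutant-versus-control comparison, for example weighted least squares.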
Subjects
Population Health , Software , Animals , Genetic Association Studies , Humans , Mice , Phenotype
Abstract
The International Mouse Phenotyping Consortium (IMPC) systematically produces and phenotypes mouse lines with presumptive null mutations to provide insight into gene function. The IMPC now uses the programmable RNA-guided nuclease Cas9, owing to its greater capacity and flexibility, to efficiently generate null alleles in the C57BL/6N strain. In addition to constituting a valuable, novel, and accessible research resource, the 3313 knockout mouse lines produced using comparable protocols provide a rich dataset for analyzing the experimental and biological variables that affect in vivo gene engineering with Cas9. Mouse line production has two critical steps: generation of founders carrying the desired allele, and germline transmission (GLT) of that allele from founders to offspring. A systematic evaluation of the variables affecting success rates identified gene essentiality as the primary factor influencing successful production of null alleles. Collectively, our findings provide best-practice recommendations for using Cas9 to generate alleles in mouse essential genes, many of which are orthologs of genes linked to human disease.
Subjects
Gene Editing , Essential Genes , Knockout Mice , Animals , Mice , Gene Editing/methods , CRISPR-Cas Systems , Alleles , Inbred C57BL Mice , Male , Female , Genetic Engineering/methods , Phenotype
Abstract
Motivated by genetic association studies of SNPs with genotype uncertainty, we propose a generalization of the Kruskal-Wallis test that incorporates group uncertainty when comparing k samples. The extended test statistic is based on probability-weighted rank sums and follows an asymptotic chi-square distribution with k - 1 degrees of freedom under the null hypothesis. Simulation studies confirm the validity and robustness of the proposed test in finite samples. Application to a genome-wide association study of type 1 diabetic complications further demonstrates the utility of this generalized Kruskal-Wallis test for studies with group uncertainty. The method has been implemented as an open-source R program, GKW.
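One way to build a test from probability-weighted rank sums is sketched below in Python; it is an illustration under assumptions, not the GKW program or necessarily the paper's exact statistic. Each subject j has membership probabilities p_jg for the k groups (rows summing to 1), the weighted rank sum of group g is W_g = sum_j p_jg r_j, and, treating the ranks as a uniformly random permutation under the null hypothesis, Cov(W_g, W_h) = (N + 1)/12 (N sum_j p_jg p_jh - S_g S_h), where S_g = sum_j p_jg. The quadratic form of the centred sums in this covariance gives the statistic. The names `weighted_kw` and `_solve` are hypothetical, and the sketch assumes no tied values.

```python
def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def weighted_kw(values, probs):
    """Kruskal-Wallis-type statistic from probability-weighted rank sums (sketch).

    values : N observations (assumed to have no ties)
    probs  : N rows; row j holds the k group-membership probabilities of subject j
    Returns (statistic, degrees_of_freedom).
    """
    n, k = len(values), len(probs[0])
    ranks = [0.0] * n
    for r, j in enumerate(sorted(range(n), key=values.__getitem__), start=1):
        ranks[j] = float(r)
    size = [sum(p[g] for p in probs) for g in range(k)]  # expected group sizes S_g
    w = [sum(p[g] * r for p, r in zip(probs, ranks)) for g in range(k)]  # W_g
    d = [w[g] - size[g] * (n + 1) / 2.0 for g in range(k)]  # W_g minus its null mean
    # Null covariance when the ranks are a uniformly random permutation of 1..n
    cov = [[(n + 1) / 12.0 * (n * sum(p[g] * p[h] for p in probs) - size[g] * size[h])
            for h in range(k)] for g in range(k)]
    # The k weighted sums always add up to n(n + 1) / 2, so cov is singular;
    # dropping the last group gives an invertible (k - 1) x (k - 1) system.
    sub = [row[:k - 1] for row in cov[:k - 1]]
    x = _solve(sub, d[:k - 1])
    return sum(di * xi for di, xi in zip(d[:k - 1], x)), k - 1
```

With hard 0/1 assignments the statistic reduces exactly to the classical Kruskal-Wallis H, which is a useful sanity check. A p-value can be taken from the upper tail of a chi-square distribution with the returned degrees of freedom, for example with `scipy.stats.chi2.sf(stat, df)` if SciPy is available.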
Subjects
Biometry/methods , Genetic Association Studies/statistics & numerical data , Chi-Square Distribution , Computer Simulation , Type 1 Diabetes Mellitus/complications , Type 1 Diabetes Mellitus/genetics , Genotype , Humans , Statistical Models , Single Nucleotide Polymorphism , Nonparametric Statistics , Uncertainty
Abstract
The study of dependence between random variables is a mainstay of statistics. In many cases, the strength of dependence between two or more random variables varies with the value of a measured covariate. We propose inference for this type of variation using a conditional copula model in which the copula function belongs to a parametric copula family and the copula parameter varies with the covariate. To estimate the functional relationship between the copula parameter and the covariate, we propose a nonparametric approach based on local likelihood. Also important is the choice of the copula family that best represents a given set of data; the proposed framework naturally leads to a novel copula selection method based on cross-validated prediction errors. We derive the asymptotic bias and variance of the resulting local polynomial estimator and outline how to construct pointwise confidence intervals. The finite-sample performance of our method is investigated in simulation studies and illustrated using a subset of the Matched Multiple Birth data.
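The local-likelihood idea can be illustrated with a minimal Python sketch that assumes a Gaussian copula family and the simplest local-constant (degree-0) fit with a Gaussian kernel. The abstract describes a general local polynomial estimator; the degree-0 case, the grid search over the parameter, the toy data generator, and the names `local_gauss_copula_rho` and `simulate` are illustrative assumptions. At a covariate value x0, each pair contributes its copula log-density weighted by K_h(x_i - x0), and the parameter maximizing the weighted sum is the estimate at x0.

```python
import math
import random
from statistics import NormalDist

_ND = NormalDist()  # standard normal


def local_gauss_copula_rho(u, v, x, x0, h, grid_step=0.001):
    """Local-constant likelihood estimate of a Gaussian copula parameter at covariate x0.

    u, v : observations on the uniform scale, strictly inside (0, 1)
    x    : covariate values; h : bandwidth of a Gaussian kernel
    Maximises sum_i K_h(x_i - x0) * log c(u_i, v_i; rho) over a grid of rho values.
    """
    # Kernel-weighted sufficient statistics of the normal scores
    sw = sxx = sxy = 0.0
    for ui, vi, xi in zip(u, v, x):
        w = math.exp(-0.5 * ((xi - x0) / h) ** 2)  # kernel constant cancels in the argmax
        a, b = _ND.inv_cdf(ui), _ND.inv_cdf(vi)
        sw += w
        sxx += w * (a * a + b * b)
        sxy += w * a * b

    def loglik(rho):
        # Weighted Gaussian-copula log-likelihood, expressed through the sums above
        one_minus = 1.0 - rho * rho
        return (-0.5 * sw * math.log(one_minus)
                - (rho * rho * sxx - 2.0 * rho * sxy) / (2.0 * one_minus))

    steps = int(round(1.98 / grid_step))
    return max((-0.99 + i * grid_step for i in range(steps + 1)), key=loglik)


def simulate(n, seed):
    """Toy data (illustrative): Gaussian copula with rho(x) = -0.5 + x, x ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    u, v, x = [], [], []
    for _ in range(n):
        xi = rng.random()
        r = -0.5 + xi
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u.append(_ND.cdf(z1))
        v.append(_ND.cdf(r * z1 + math.sqrt(1.0 - r * r) * z2))
        x.append(xi)
    return u, v, x


u, v, x = simulate(4000, seed=1)
for x0 in (0.2, 0.5, 0.8):
    print(x0, local_gauss_copula_rho(u, v, x, x0, h=0.05))
```

For the Gaussian family the weighted log-likelihood depends on the data only through three kernel-weighted sums, so the grid search is cheap; other families, or higher-degree local polynomials, would need a numerical optimizer. Repeating the fit for several candidate families and comparing out-of-sample predictive performance is one way to mirror the cross-validated selection step described above.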