Results 1 - 20 of 135
1.
Stat Appl Genet Mol Biol ; 23(1)2024 01 01.
Article in English | MEDLINE | ID: mdl-38235525

ABSTRACT

Population stratification (PS) is one major source of confounding in both single nucleotide polymorphism (SNP) and haplotype association studies. To address PS, principal component regression (PCR) and linear mixed model (LMM) are the current standards for SNP associations, which are also commonly borrowed for haplotype studies. However, the underfitting and overfitting problems introduced by PCR and LMM, respectively, have yet to be addressed. Furthermore, there have been only a few theoretical approaches proposed to address PS specifically for haplotypes. In this paper, we propose a new method under the Bayesian LASSO framework, QBLstrat, to account for PS in identifying rare and common haplotypes associated with a continuous trait of interest. QBLstrat utilizes a large number of principal components (PCs) with appropriate priors to sufficiently correct for PS, while shrinking the estimates of unassociated haplotypes and PCs. We compare the performance of QBLstrat with the Bayesian counterparts of PCR and LMM and a current method, haplo.stats. Extensive simulation studies and real data analyses show that QBLstrat is superior in controlling false positives while maintaining competitive power for identifying true positives under PS.


Subject(s)
Models, Genetic , Polymorphism, Single Nucleotide , Haplotypes , Bayes Theorem , Phenotype , Genome-Wide Association Study
2.
Ann Hum Genet ; 87(6): 302-315, 2023 11.
Article in English | MEDLINE | ID: mdl-37771252

ABSTRACT

INTRODUCTION: Population stratification (PS) is a major source of confounding in population-based genetic association studies of quantitative traits. Principal component regression (PCR) and linear mixed model (LMM) are two commonly used approaches to account for PS in association studies. Previous studies have shown that LMM can be interpreted as including all principal components (PCs) as random-effect covariates. However, including all PCs in LMM may dilute the influence of relevant PCs in some scenarios, while including only a few preselected PCs in PCR may fail to fully capture the genetic diversity. MATERIALS AND METHODS: To address these shortcomings, we introduce Bayestrat, a method to detect associated variants with PS correction under the Bayesian LASSO framework. To adjust for PS, Bayestrat accommodates a large number of PCs and utilizes appropriate shrinkage priors to shrink the effects of nonassociated PCs. RESULTS: Simulation results show that Bayestrat consistently controls type I error rates and achieves higher power compared to its non-shrinkage counterparts, especially when the number of PCs included in the model is large. As a demonstration of the utility of Bayestrat, we apply it to the Multi-Ethnic Study of Atherosclerosis (MESA). Variants and genes associated with serum triglyceride or HDL cholesterol are identified in our analyses. DISCUSSION: The automatic and self-selection features of Bayestrat make it particularly suited to situations with complex underlying PS scenarios, where it is unknown a priori which PCs are potential confounders, yet the number that needs to be considered could be large in order to fully account for PS.
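
As a rough illustration of the principal component regression (PCR) baseline that this abstract contrasts with Bayestrat, the sketch below regresses a quantitative trait on one variant plus the top k PCs of a genotype matrix. It is not the Bayestrat method: no shrinkage priors are involved, and the function names `top_pcs` and `pcr_snp_effect` and the use of ordinary least squares are assumptions made only for this example.

```python
import numpy as np


def top_pcs(genotypes, k):
    """Return the top-k principal component scores of a samples x SNPs matrix.

    Hypothetical helper: columns are mean-centered, then scores are taken
    from the singular value decomposition.
    """
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]


def pcr_snp_effect(trait, snp, pcs):
    """Ordinary least squares estimate of a SNP effect, adjusting for PCs.

    The design matrix is [intercept, snp, PC1, ..., PCk]; the coefficient
    on the SNP column is returned.
    """
    design = np.column_stack([np.ones(len(trait)), snp, pcs])
    coef, *_ = np.linalg.lstsq(design, trait, rcond=None)
    return coef[1]


# Toy, noise-free data: the trait depends on the SNP and on the first PC.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(50, 10)).astype(float)
pcs = top_pcs(genotypes, k=2)
snp = rng.integers(0, 3, size=50).astype(float)
trait = 1.0 + 2.0 * snp + 0.5 * pcs[:, 0]
estimate = pcr_snp_effect(trait, snp, pcs)
```

Choosing k is the step the abstract flags as delicate: too few PCs may leave stratification uncorrected, which is the motivation given for including many PCs and shrinking the irrelevant ones.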


Subject(s)
Genome-Wide Association Study , Models, Genetic , Humans , Bayes Theorem , Genetic Association Studies , Computer Simulation , Linear Models , Phenotype
3.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33201180

ABSTRACT

The prevalence of dropout events is a serious problem for single-cell Hi-C (scHiC) data due to insufficient sequencing depth and data coverage, which brings difficulties in downstream studies such as clustering and structural analysis. Complicating things further is the fact that dropouts are confounded with structural zeros due to underlying properties, leading to observed zeros being a mixture of both types of events. Although a great deal of progress has been made in imputing dropout events for single cell RNA-sequencing (RNA-seq) data, little has been done in identifying structural zeros and imputing dropouts for scHiC data. In this paper, we adapted several methods from the single-cell RNA-seq literature for inference on observed zeros in scHiC data and evaluated their effectiveness. Through an extensive simulation study and real data analysis, we have shown that a couple of the adapted single-cell RNA-seq algorithms can be powerful for correctly identifying structural zeros and accurately imputing dropout values. Downstream analysis using the imputed values showed considerable improvement for clustering cells of the same types together over clustering results before imputation.


Subject(s)
Algorithms , Computer Simulation , RNA, Small Cytoplasmic , RNA-Seq , Single-Cell Analysis , Software , Humans , RNA, Small Cytoplasmic/genetics , RNA, Small Cytoplasmic/metabolism
4.
PLoS Comput Biol ; 18(6): e1010129, 2022 06.
Article in English | MEDLINE | ID: mdl-35696429

ABSTRACT

Single cell Hi-C techniques enable one to study cell to cell variability in chromatin interactions. However, single cell Hi-C (scHi-C) data suffer severely from sparsity, that is, the existence of excess zeros due to insufficient sequencing depth. Complicating the matter further is the fact that not all zeros are created equal: some are due to loci truly not interacting because of the underlying biological mechanism (structural zeros); others are indeed due to insufficient sequencing depth (sampling zeros or dropouts), especially for loci that interact infrequently. Differentiating between structural zeros and dropouts is important since correct inference would improve downstream analyses such as clustering and discovery of subtypes. Nevertheless, distinguishing between these two types of zeros has received little attention in the single cell Hi-C literature, where the issue of sparsity has been addressed mainly as a data quality improvement problem. To fill this gap, in this paper, we propose HiCImpute, a Bayesian hierarchical model that goes beyond data quality improvement by also identifying observed zeros that are in fact structural zeros. HiCImpute takes spatial dependencies of scHi-C 2D data structure into account while also borrowing information from similar single cells and bulk data, when such are available. Through an extensive set of analyses of synthetic and real data, we demonstrate the ability of HiCImpute for identifying structural zeros with high sensitivity, and for accurate imputation of dropout values. Downstream analyses using data improved from HiCImpute yielded much more accurate clustering of cell types compared to using observed data or data improved by several comparison methods. Most significantly, HiCImpute-improved data have led to the identification of subtypes within each of the excitatory neuronal cells of L4 and L5 in the prefrontal cortex.
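
The distinction between structural zeros and dropouts can be made concrete with a generic zero-inflated Poisson mixture. The sketch below is only an illustration of that idea and is not the HiCImpute model, which the abstract describes as a Bayesian hierarchical model with spatial and cross-cell borrowing. The function name and the parameters `pi_structural` (prior probability that a locus pair truly does not interact) and `mu` (expected count if it does interact) are assumptions for this example.

```python
import math


def prob_structural_zero(pi_structural, mu):
    """Posterior probability that an observed zero is a structural zero.

    Assumed model (zero-inflated Poisson): with probability pi_structural
    the pair never interacts and the count is always 0; otherwise the count
    is Poisson(mu), which yields 0 with probability exp(-mu).
    Bayes' rule then gives pi / (pi + (1 - pi) * exp(-mu)).
    """
    if not 0.0 <= pi_structural <= 1.0:
        raise ValueError("pi_structural must be in [0, 1]")
    if mu < 0.0:
        raise ValueError("mu must be non-negative")
    sampling_zero = (1.0 - pi_structural) * math.exp(-mu)
    total = pi_structural + sampling_zero
    return pi_structural / total if total > 0.0 else 0.0


# A zero at a pair expected to interact often is more likely structural
# than a zero at a pair expected to interact rarely.
rare_pair = prob_structural_zero(0.2, mu=0.1)
frequent_pair = prob_structural_zero(0.2, mu=5.0)
```

Under this assumed mixture, zeros at infrequently interacting loci stay ambiguous, which matches the abstract's remark that dropouts are especially common for such loci.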


Subject(s)
Chromatin , Chromosomes , Bayes Theorem , Cluster Analysis , Spatial Analysis
5.
Biometrics ; 79(4): 3445-3457, 2023 12.
Article in English | MEDLINE | ID: mdl-37066855

ABSTRACT

Finite mixtures of regressions (FMRs) are commonly used to model heterogeneous effects of covariates on a response variable in settings where there are unknown underlying subpopulations. FMRs, however, cannot accommodate situations where covariates' effects also vary according to an "index" variable, a setting known as finite mixture of varying coefficient regression (FM-VCR). Although complex, this situation occurs in real data applications: the osteocalcin (OCN) data analyzed in this manuscript presents a heterogeneous relationship where the effect of a genetic variant on OCN in each hidden subpopulation varies over time. Oftentimes, the number of covariates with varying coefficients also presents a challenge: in the OCN study, genetic variants on the same chromosome are considered jointly. The relative proportions of hidden subpopulations may also change over time. Nevertheless, existing methods cannot provide suitable solutions for accommodating all these features in real data applications. To fill this gap, we develop statistical methodologies based on regularized local-kernel likelihood for simultaneous parameter estimation and variable selection in sparse FM-VCR models. We study large-sample properties of the proposed methods. We then carry out a simulation study to evaluate the performance of various penalties adopted for our regularized approach and ascertain the ability of a BIC-type criterion for estimating the number of subpopulations. Finally, we applied the FM-VCR model to analyze the OCN data and identified several covariates, including genetic variants, that have age-dependent effects on OCN.


Subject(s)
Models, Statistical , Computer Simulation , Likelihood Functions
6.
Genet Epidemiol ; 45(1): 36-45, 2021 02.
Article in English | MEDLINE | ID: mdl-32864779

ABSTRACT

The breakthroughs in next generation sequencing have allowed us to access data consisting of both common and rare variants, and in particular to investigate the impact of rare genetic variation on complex diseases. Although rare genetic variants are thought to be important components in explaining genetic mechanisms of many diseases, discovering these variants remains challenging, and most studies are restricted to population-based designs. Further, despite the shift in the field of genome-wide association studies (GWAS) towards studying rare variants due to the "missing heritability" phenomenon, little is known about rare X-linked variants associated with complex diseases. For instance, there is evidence that X-linked genes are highly involved in brain development and cognition when compared with autosomal genes; however, like most GWAS for other complex traits, previous GWAS for mental diseases have provided poor resources to deal with identification of rare variant associations on X-chromosome. In this paper, we address the two issues described above by proposing a method that can be used to test X-linked variants using sequencing data on families. Our method is much more general than existing methods, as it can be applied to detect both common and rare variants, and is applicable to autosomes as well. Our simulation study shows that the method is efficient, and exhibits good operational characteristics. An application to the University of Miami Study on Genetics of Autism and Related Disorders also yielded encouraging results.


Subject(s)
Genes, X-Linked , Genome-Wide Association Study , Genetic Variation , High-Throughput Nucleotide Sequencing , Humans , Models, Genetic , Multifactorial Inheritance
7.
Brief Bioinform ; 20(4): 1205-1214, 2019 07 19.
Article in English | MEDLINE | ID: mdl-29091999

ABSTRACT

How chromosomes fold and how distal genomic elements interact with one another at a genomic scale have been actively pursued in the past decade following the seminal work describing the Chromosome Conformation Capture (3C) assay. Essentially, 3C-based technologies produce two-dimensional (2D) contact maps that capture interactions between genomic fragments. Accordingly, a plethora of analytical methods have been proposed to take a 2D contact map as input to recapitulate the underlying whole genome three-dimensional (3D) structure of the chromatin. However, their performance in terms of several factors, including data resolution and ability to handle contact map features, has not been sufficiently evaluated. This task is taken up in this article, in which we consider several recent and/or well-regarded methods, both optimization-based and model-based, for their aptness of producing 3D structures using contact maps generated based on a population of cells. These methods are evaluated and compared using both simulated and real data. Several criteria have been used. For simulated data sets, the focus is on accurate recapitulation of the entire structure given the existence of the gold standard. For real data sets, comparison with distances measured by Fluorescence in situ Hybridization and consistency with several genomic features of known biological functions are examined.


Subject(s)
Chromatin/chemistry , Chromatin/genetics , Animals , Chromatin/ultrastructure , Chromosomes, Human/chemistry , Chromosomes, Human/genetics , Chromosomes, Human/ultrastructure , Computational Biology/methods , Computer Simulation , Databases, Genetic , Genome, Human , Humans , Imaging, Three-Dimensional/methods , In Situ Hybridization, Fluorescence , Mice , Models, Genetic , Molecular Conformation
8.
J Cell Biochem ; 120(3): 3056-3070, 2019 03.
Article in English | MEDLINE | ID: mdl-30548288

ABSTRACT

Distal regulatory elements influence the activity of gene promoters through chromatin looping. Chromosome conformation capture (3C) methods permit identification of chromatin contacts across different regions of the genome. However, due to limitations in the resolution of these methods, the detection of functional chromatin interactions remains a challenge. In the current study, we employ an integrated approach to define and characterize the functional chromatin contacts of human pancreatic cancer cells. We applied tethered chromatin capture to define classes of chromatin domains on a genome-wide scale. We identified three types of structural domains (topologically associated, boundary, and gap) and investigated the functional relationships of these domains with respect to chromatin state and gene expression. We uncovered six distinct sub-domains associated with epigenetic states. Interestingly, specific epigenetically active domains are sensitive to treatment with histone acetyltransferase (HAT) inhibitors and decrease in H3K27 acetylation levels. To examine whether the subdomains that change upon drug treatment are functionally linked to transcription factor regulation, we compared TCF7L2 chromatin binding and gene regulation to HAT inhibition. We identified a subset of coding RNA genes that together can stratify pancreatic cancer patients into distinct survival groups. Overall, this study describes a process to evaluate the functional features of chromosome architecture and reveals the impact of epigenetic inhibitors on chromosome architecture and identifies genes that may provide insight into disease outcome.


Subject(s)
Benzoates/pharmacology , Bridged Bicyclo Compounds, Heterocyclic/pharmacology , Chromatin/metabolism , Gene Regulatory Networks , Pancreatic Neoplasms/genetics , Pyrazoles/pharmacology , Pyrimidinones/pharmacology , Transcription Factor 7-Like 2 Protein/metabolism , Cell Line, Tumor , Chromatin/chemistry , Chromatin/genetics , Chromatin Assembly and Disassembly , Chromosome Mapping , Epigenesis, Genetic/drug effects , Epigenomics , Gene Expression Regulation, Neoplastic/drug effects , Gene Regulatory Networks/drug effects , Humans , Nitrobenzenes , Pancreatic Neoplasms/metabolism , Pyrazolones , Transcription Factor 7-Like 2 Protein/genetics
9.
EMBO J ; 34(4): 502-16, 2015 Feb 12.
Article in English | MEDLINE | ID: mdl-25535248

ABSTRACT

Human transcription factors recognize specific DNA sequence motifs to regulate transcription. It is unknown whether a single transcription factor is able to bind to distinctly different motifs on chromatin, and if so, what determines the usage of specific motifs. By using a motif-resolution chromatin immunoprecipitation-exonuclease (ChIP-exo) approach, we find that agonist-liganded human androgen receptor (AR) and antagonist-liganded AR bind to two distinctly different motifs, leading to distinct transcriptional outcomes in prostate cancer cells. Further analysis on clinical prostate tissues reveals that the binding of AR to these two distinct motifs is involved in prostate carcinogenesis. Together, these results suggest that unique ligands may switch DNA motifs recognized by ligand-dependent transcription factors in vivo. Our findings also provide a broad mechanistic foundation for understanding ligand-specific induction of gene expression profiles.


Subject(s)
Androgen Receptor Antagonists/chemistry , Androgens/chemistry , DNA/metabolism , Prostatic Neoplasms/metabolism , Receptors, Androgen/metabolism , Androgen Receptor Antagonists/metabolism , Androgens/metabolism , Cell Proliferation/physiology , Chromatin Immunoprecipitation , Electrophoretic Mobility Shift Assay , Humans , Male , Reverse Transcriptase Polymerase Chain Reaction
10.
Hum Hered ; 83(4): 175-195, 2018.
Article in English | MEDLINE | ID: mdl-30799419

ABSTRACT

BACKGROUND: The variants identified in genome-wide association studies account for only a small fraction of disease heritability. A key to this "missing heritability" is believed to be rare variants. Specifically, we focus on rare haplotype variant (rHTV). The existing methods for detecting rHTV are mostly population-based, and as such, are susceptible to population stratification and admixture, leading to an inflated false-positive rate. Family-based methods are more robust in this respect. METHODS: We propose a method for detecting rHTVs associated with quantitative traits called family-based quantitative Bayesian LASSO (famQBL). FamQBL can analyze any type of pedigree and is based on a mixed model framework. We regularize the haplotype effects using Bayesian LASSO and estimate the posterior distributions using Markov chain Monte Carlo methods. RESULTS: We conduct simulation studies, including analyses of Genetic Analysis Workshop 18 simulated data, to study the properties of famQBL and compare with a standard family-based haplotype association test implemented in FBAT (family-based association test) software. We find famQBL to be more powerful than FBAT with well-controlled false-positive rates. We also apply famQBL to the Framingham Heart Study data and detect an rHTV associated with diastolic blood pressure. CONCLUSION: FamQBL can help uncover rHTVs associated with quantitative traits.


Subject(s)
Cardiovascular Diseases/genetics , Genome-Wide Association Study , Haplotypes , Polymorphism, Single Nucleotide , Quantitative Trait, Heritable , Bayes Theorem , Cardiovascular Diseases/pathology , Female , Humans , Male , Models, Genetic , Pedigree , Phenotype
11.
Genet Epidemiol ; 41(4): 363-371, 2017 05.
Article in English | MEDLINE | ID: mdl-28300291

ABSTRACT

Recent advances in genotyping with high-density markers allow researchers access to genomic variants including rare ones. Linkage disequilibrium (LD) is widely used to provide insight into evolutionary history. It is also the basis for association mapping in humans and other species. Better understanding of the genomic LD structure may lead to better-informed statistical tests that can improve the power of association studies. Although rare variant associations with common diseases (RVCD) have been extensively studied recently, there is very limited understanding, and even controversial view of LD structures among rare variants and between rare and common variants. In fact, many popular RVCD tests make the assumptions that rare variants are independent. In this report, we show that two commonly used LD measures are not capable of detecting LD when rare variants are involved. We present this argument from two perspectives, both the LD measures themselves and the computational issues associated with them. To address these issues, we propose an alternative LD measure, the polychoric correlation, that was originally designed for detecting associations among categorical variables. Using simulated as well as the 1000 Genomes data, we explore the performances of LD measures in detail and discuss their implications in association studies.
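
The abstract does not name the two commonly used LD measures it examines. Assuming, for illustration only, that they are the standard D' and r², the sketch below computes both from haplotype and allele frequencies and shows how they can disagree sharply when a rare allele sits entirely on the background of a common one. The function name `ld_measures` and its argument names are hypothetical, and the polychoric correlation proposed in the abstract is not implemented here.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a, p_b: frequencies of allele A and allele B (each strictly in (0, 1)).
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    # D' rescales D by its largest attainable magnitude given the allele
    # frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, d_prime, r2


# Rare allele A (1%) always found with common allele B (50%).
d, d_prime, r2 = ld_measures(p_ab=0.01, p_a=0.01, p_b=0.5)
```

In this toy case D' reaches its maximum while r² stays close to 0.01, because r² is capped by the mismatch in allele frequencies. This is one well-known way the two measures behave differently for rare variants; the abstract's own arguments may go further.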


Subject(s)
Genetic Variation , Genome-Wide Association Study , Chromosomes, Human, Pair 21/genetics , Computer Simulation , Gene Frequency/genetics , Genotype , Humans , Linkage Disequilibrium/genetics , Polymorphism, Single Nucleotide/genetics
12.
Brief Bioinform ; 17(6): 926-937, 2016 11.
Article in English | MEDLINE | ID: mdl-26454095

ABSTRACT

DNA methylation is a well-established epigenetic mark, whose pattern throughout the genome, especially in the promoter or CpG islands, may be modified in a cell at a disease stage. Recently developed probabilistic approaches allow distributing methylation signals at nucleotide resolution from MethylCap-seq data. Standard statistical methods for detecting differential methylation suffer from 'curse of dimensionality' and sparsity in signals, resulting in high false-positive rates. Strong correlation of signals between CG sites also yields spurious results. In this article, we review applicability of high-dimensional mean vector tests for detection of differentially methylated regions (DMRs) and compare and contrast such tests with other methods for detecting DMRs. Comprehensive simulation studies are conducted to highlight the performance of these tests under different settings. Based on our observation, we make recommendations on the optimal test to use. We illustrate the superiority of mean vector tests in detecting cancer-related canonical gene pathways, which are significantly enriched for acute myeloid leukemia and ovarian cancer.


Subject(s)
DNA Methylation , CpG Islands , Epigenomics , Humans , Leukemia, Myeloid, Acute , Promoter Regions, Genetic
13.
BMC Genet ; 19(Suppl 1): 67, 2018 09 17.
Article in English | MEDLINE | ID: mdl-30255768

ABSTRACT

BACKGROUND: Association studies using a single type of omics data have been successful in identifying disease-associated genetic markers, but the underlying mechanisms are unaddressed. To provide a possible explanation of how these genetic factors affect the disease phenotype, integration of multiple omics data is needed. RESULTS: We propose a novel method, LIPID (likelihood inference proposal for indirect estimation), that uses both single nucleotide polymorphism (SNP) and DNA methylation data jointly to analyze the association between a trait and SNPs. The total effect of SNPs is decomposed into direct and indirect effects, where the indirect effects are the focus of our investigation. Simulation studies show that LIPID performs better in various scenarios than existing methods. Application to the GAW20 data also leads to encouraging results, as the genes identified appear to be biologically relevant to the phenotype studied. CONCLUSIONS: The proposed LIPID method is shown to be meritorious in extensive simulations and in real-data analyses.


Subject(s)
Genome-Wide Association Study , Genomics/methods , DNA Methylation , Humans , Hypertriglyceridemia/drug therapy , Hypertriglyceridemia/genetics , Hypoglycemic Agents/therapeutic use , Polymorphism, Single Nucleotide
14.
Brief Bioinform ; 16(5): 759-68, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25596401

ABSTRACT

In recent years, a myriad of new statistical methods have been proposed for detecting associations of rare single-nucleotide variants (SNVs) with common diseases. These methods can be generally classified as 'collapsing' or 'haplotyping' based. The former is the predominant class, composed of most of the rare variant association methods proposed to date. However, recent works have suggested that haplotyping-based methods may offer advantages and can even be more powerful than collapsing methods in certain situations. In this article, we review and compare collapsing- versus haplotyping-based methods/software in terms of both power and type I error. For collapsing methods, we consider three approaches: Combined Multivariate and Collapsing, Sequence Kernel Association Test and Family-Based Association Test (FBAT): the first two are population based and are among the most popular; the last test is family based, a modification from the popular FBAT to accommodate rare SNVs. For haplotyping-based methods, we include Logistic Bayesian Lasso (LBL) for population data and family-based LBL (famLBL) for family (trio) data. These two methods are selected, as they can be used to test association for specific rare and common haplotypes. Our results show that haplotype methods can be more powerful than collapsing methods if there are interacting SNVs leading to larger haplotype effects. Even if only common SNVs are genotyped, haplotype methods can still detect specific rare haplotypes that tag rare causal SNVs. As expected, family-based methods are robust, whereas population-based methods are susceptible, to population substructure. However, the population-based haplotype approach appears to have smaller inflation of type I error than its collapsing counterparts.


Subject(s)
Genetic Predisposition to Disease , Haplotypes , Bayes Theorem , Case-Control Studies , Humans , Polymorphism, Single Nucleotide
15.
J Hum Genet ; 62(9): 819-829, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28424482

ABSTRACT

Detecting gene-environment interactions with rare variants is critical in dissecting the etiology of common diseases. Interactions with rare haplotype variants (rHTVs) are of particular interest. At the same time, complex sampling designs, such as stratified random sampling, are becoming increasingly popular for designing case-control studies, especially for recruiting controls. The US Kidney Cancer Study (KCS) is an example, wherein all available cases were included while the controls at each site were randomly selected from the population by frequency matching with cases based on age, sex and race. There is currently no rHTV association method that can account for such a complex sampling design. To fill this gap, we consider logistic Bayesian LASSO (LBL), an existing rHTV approach for case-control data, and show that its model can easily accommodate the complex sampling design. We study two extensions that include stratifying variables either as main effects only or with additional modeling of their interactions with haplotypes. We conduct extensive simulation studies to compare the complex sampling methods with the original LBL methods. We find that, when there is no interaction between haplotype and stratifying variables, both extensions perform well while the original LBL methods lead to inflated type I error rates. However, when such an interaction exists, it is necessary to include the interaction effect in the model to control the type I error rate. Finally, we analyze the KCS data and find a significant interaction between (current) smoking and a specific rHTV in the N-acetyltransferase 2 gene.


Subject(s)
Bayes Theorem , Gene-Environment Interaction , Genetic Association Studies/methods , Haplotypes , Models, Genetic , Algorithms , Computer Simulation , Gene Frequency , Genetic Variation , Humans , Inheritance Patterns , Odds Ratio
16.
Biometrics ; 73(1): 52-62, 2017 03.
Article in English | MEDLINE | ID: mdl-27214023

ABSTRACT

A gene may be controlled by distal enhancers and repressors, not merely by regulatory elements in its promoter. Spatial organization of chromosomes is the mechanism that brings genes and their distal regulatory elements into close proximity. Recent molecular techniques, coupled with Next Generation Sequencing (NGS) technology, enable genome-wide detection of physical contacts between distant genomic loci. In particular, Hi-C is an NGS-aided assay for the study of genome-wide spatial interactions. The availability of such data makes it possible to reconstruct the underlying three-dimensional (3D) spatial chromatin structure. In this article, we present the Poisson Random effect Architecture Model (PRAM) for such an inference. The main feature of PRAM that separates it from previous methods is that it addresses the issue of over-dispersion and takes correlations among contact counts into consideration, thereby achieving greater consistency with observed data. PRAM was applied to Hi-C data to illustrate its performance and to compare the predicted distances with those measured by a Fluorescence In Situ Hybridization (FISH) validation experiment. Further, PRAM was compared to other methods in the literature based on both real and simulated data.


Subject(s)
Chromatin/chemistry , Models, Biological , Models, Statistical , Spatial Analysis , Gene Expression Regulation , In Situ Hybridization, Fluorescence , Poisson Distribution
17.
Biometrics ; 73(1): 344-355, 2017 03.
Article in English | MEDLINE | ID: mdl-27478935

ABSTRACT

Finding rare variants and gene-environment interactions (GXE) is critical in dissecting complex diseases. We consider the problem of detecting GXE where G is a rare haplotype and E is a nongenetic factor. Such methods typically assume G-E independence, which may not hold in many applications. A pertinent example is lung cancer: there is evidence that variants on Chromosome 15q25.1 interact with smoking to affect the risk. However, these variants are associated with smoking behavior, rendering the assumption of G-E independence inappropriate. With the motivation of detecting GXE under G-E dependence, we extend an existing approach, logistic Bayesian LASSO, which assumes G-E independence (LBL-GXE-I) by modeling G-E dependence through a multinomial logistic regression (referred to as LBL-GXE-D). Unlike LBL-GXE-I, LBL-GXE-D controls type I error rates in all situations; however, it has reduced power when G-E independence holds. To control type I error without sacrificing power, we further propose a unified approach, LBL-GXE, to incorporate uncertainty in the G-E independence assumption by employing a reversible jump Markov chain Monte Carlo method. Our simulations show that LBL-GXE has power similar to that of LBL-GXE-I when G-E independence holds, yet has well-controlled type I errors in all situations. To illustrate the utility of LBL-GXE, we analyzed a lung cancer dataset and found several significant interactions in the 15q25.1 region, including one between a specific rare haplotype and smoking.


Subject(s)
Biometry/methods , Gene-Environment Interaction , Lung Neoplasms/etiology , Smoking/adverse effects , Chromosomes, Human, Pair 15 , Computer Simulation , Data Interpretation, Statistical , Genetic Variation , Haplotypes , Humans , Logistic Models , Lung Neoplasms/genetics , Models, Genetic , Risk , Smoking/genetics
18.
BMC Bioinformatics ; 17: 70, 2016 Feb 06.
Article in English | MEDLINE | ID: mdl-26852142

ABSTRACT

BACKGROUND: Assays capable of detecting genome-wide chromatin interactions have produced massive amounts of data and led to a much better understanding of chromosomal three-dimensional (3D) structure. As technology becomes more sophisticated, data of ever higher resolution are being produced, going from the initial 1 megabase (Mb) resolution to the current 10 kilobases (kb) or even 1 kb resolution. The availability of genome-wide interaction data necessitates the development of analytical methods to recover the underlying 3D spatial chromatin structure, but challenges abound. Most methods were proposed for analyzing data at low resolution (1 Mb), so their behavior on higher-resolution data is unknown. A key feature of such data is the high proportion of "0" contact counts among all available data, in other words, an excess of zeros. RESULTS: To address the excess of zeros, in this paper we propose a truncated Random effect EXpression (tREX) method that can handle data at various resolutions. We then assess the performance of tREX and a number of leading existing methods for recovering the underlying chromatin 3D structure. This was accomplished by creating in-silico data to mimic multiple levels of resolution and submitting the methods to a "stress test". Finally, we applied tREX and the comparison methods to a Hi-C dataset for which FISH measurements are available to evaluate estimation accuracy. CONCLUSION: The proposed tREX method achieves consistently good performance in all 30 simulated settings considered. It is not only robust to resolution level and underlying parameters, but also insensitive to model misspecification. This conclusion is based on observations made in terms of 3D structure estimation accuracy and preservation of topologically associated domains. Application of the methods to the human lymphoblastoid cell line data on chromosomes 14 and 22 further substantiates the superior performance of tREX: the 3D structure constructed by tREX is consistent with the FISH measurements, and the corresponding distances predicted by tREX have a higher correlation with the FISH measurements than those of any of the comparison methods. SOFTWARE: An open-source R-package is available at http://www.stat.osu.edu/~statgen/Software/tRex.
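
The "excess of zeros" described in the abstract above can be illustrated with a small simulation. The following Python sketch is not the tREX method and does not use the authors' R package; it only shows, under an assumed distance-decay Poisson model (the rate function, matrix size, and helper names `coarsen` and `zero_fraction` are hypothetical), how the proportion of zero contact counts may grow as bins get finer.

```python
import numpy as np


def coarsen(counts: np.ndarray, factor: int) -> np.ndarray:
    """Sum contact counts over factor x factor blocks (i.e. lower the resolution)."""
    n = counts.shape[0] - counts.shape[0] % factor  # drop ragged edge bins
    m = n // factor
    return counts[:n, :n].reshape(m, factor, m, factor).sum(axis=(1, 3))


def zero_fraction(counts: np.ndarray) -> float:
    """Proportion of distinct bin pairs (upper triangle) with zero contacts."""
    iu = np.triu_indices(counts.shape[0], k=1)
    return float(np.mean(counts[iu] == 0))


# Hypothetical fine-resolution contact matrix: expected counts decay with
# the distance between bins (an assumption made only for illustration).
rng = np.random.default_rng(0)
n_bins = 200
idx = np.arange(n_bins)
rate = 2.0 / (1.0 + np.abs(idx[:, None] - idx[None, :]))
upper = np.triu(rng.poisson(rate), k=1)
fine = upper + upper.T  # symmetric matrix with an empty diagonal

for factor in (1, 5, 10):
    frac = zero_fraction(coarsen(fine, factor))
    print(f"bins merged x{factor}: zero fraction = {frac:.3f}")
```

Merging bins pools counts, so the coarser matrices in this toy setting contain fewer zeros; real Hi-C data need not follow the assumed decay model.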


Subject(s)
Chromatin/chemistry , Human Chromosomes/chemistry , Computer Simulation , Lymphocytes/chemistry , Theoretical Models , Software , Cultured Cells , Human Genome , Humans , Fluorescence In Situ Hybridization
19.
Bioinformatics ; 31(10): 1648-54, 2015 May 15.
Article in English | MEDLINE | ID: mdl-25609792

ABSTRACT

MOTIVATION: Microbiota compositions have great implications for human health, such as in obesity and other conditions. As such, it is of great importance to cluster samples or taxa to visualize and discover community substructures. Graphical representation of metagenomic count data relies on two aspects: the measure of dissimilarity between samples/taxa and the algorithm used to estimate coordinates for studying microbiota communities. UniFrac is a dissimilarity measure commonly used in metagenomic research, but it requires a phylogenetic tree. Principal coordinate analysis (PCoA) is a popular algorithm for estimating two-dimensional (2D) coordinates for graphical representation, although alternative and higher-dimensional representations may reveal underlying community substructures invisible in 2D representations. RESULTS: We adapt a new measure of dissimilarity, penalized Kendall's τ-distance, which does not depend on a phylogenetic tree and is hence more readily applicable to a wider class of problems. Further, we propose to use metric multidimensional scaling (MDS) as an alternative to PCoA for graphical representation. We then devise a novel procedure for determining the number of clusters in conjunction with PAM (mPAM). We show superior performance with higher-dimensional representations. We further demonstrate the utility of mPAM for accurate clustering analysis, especially with higher-dimensional MDS models. Applications to two human microbiota datasets illustrate greater insights into the subcommunity structure with a higher-dimensional analysis.
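
As a rough illustration of the two ingredients named in the abstract above (a tree-free rank-based dissimilarity and metric MDS in more than two dimensions), here is a minimal Python sketch. It uses the plain, unpenalized Kendall τ-distance, so the penalization of the authors' measure is not reproduced, and metric MDS is fitted with a basic SMACOF (stress majorization) iteration chosen here for brevity. All function names, the toy count matrix, and the iteration settings are assumptions, and the mPAM procedure is not shown.

```python
from itertools import combinations

import numpy as np


def kendall_tau_distance(x, y) -> float:
    """Fraction of taxon pairs that samples x and y rank in opposite order.

    Plain (unpenalized) version: tied pairs are not counted as discordant.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    pairs = list(combinations(range(len(x)), 2))
    discordant = sum((x[i] - x[j]) * (y[i] - y[j]) < 0 for i, j in pairs)
    return float(discordant) / len(pairs)


def pairwise_euclid(X: np.ndarray) -> np.ndarray:
    """Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def stress(D: np.ndarray, X: np.ndarray) -> float:
    """Raw stress: squared mismatch between target and embedded distances."""
    iu = np.triu_indices(D.shape[0], k=1)
    return float(((D[iu] - pairwise_euclid(X)[iu]) ** 2).sum())


def metric_mds(D, n_dims: int = 3, n_iter: int = 300, seed: int = 0) -> np.ndarray:
    """Metric MDS via SMACOF with unit weights (Guttman transform updates)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    X = np.random.default_rng(seed).normal(size=(n, n_dims))
    for _ in range(n_iter):
        dist = pairwise_euclid(X)
        # ratio_ij = D_ij / dist_ij, set to 0 where embedded points coincide
        ratio = np.divide(D, dist, out=np.zeros_like(D), where=dist > 0)
        B = -ratio
        B[np.diag_indices(n)] = ratio.sum(axis=1)
        X = B @ X / n
    return X


# Toy count table (rows = samples, columns = taxa); values are made up.
counts = np.array([
    [30, 12, 5, 0, 1],
    [28, 15, 3, 1, 0],
    [2, 1, 9, 25, 40],
    [0, 3, 7, 31, 35],
])
n = counts.shape[0]
D = np.zeros((n, n))
for i, j in combinations(range(n), 2):
    D[i, j] = D[j, i] = kendall_tau_distance(counts[i], counts[j])

coords = metric_mds(D, n_dims=3)  # 3D coordinates, one row per sample
print(coords.shape, round(stress(D, coords), 4))
```

The resulting coordinates could then be passed to any clustering routine; the number of dimensions is a free choice in this sketch.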


Subject(s)
Algorithms , Bacteria/genetics , Computational Biology/methods , Computer Graphics , Information Storage and Retrieval/methods , Metagenomics , Statistical Models , Bacteria/classification , Cluster Analysis , Gastrointestinal Tract/microbiology , Humans , Multivariate Analysis , Phylogeny
20.
J Hum Genet ; 61(11): 965-975, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27412875

ABSTRACT

Although genome-wide association studies have successfully detected numerous associations between common variants and complex diseases, these variants typically explain only a small part of the heritable component of a disease. With the advent of next-generation sequencing, attention has turned to rare variants. Recently, a variety of approaches for detecting associations of rare variants have been proposed, including the Kullback-Leibler divergence-based tests (KLTs) for detecting genotypic differences between cases and controls. However, few of these approaches consider the linkage disequilibrium (LD) structure among rare and common variants. In this study, we propose two block-based association tests for testing the effects of rare variants on a disease. The main idea comes from the hypothesis that a region of interest may consist of two or more LD blocks such that single-nucleotide variants (SNVs) within each block are correlated, whereas SNVs in different blocks are independent or weakly correlated. Under this hypothesis, we propose two tests that generalize the KLTs by taking the block structure into account. A simulation study under various scenarios shows that the proposed methods have well-controlled type I error rates and outperform some leading methods in the literature. Moreover, application to the Dallas Heart Study data demonstrates the feasibility and performance of the two proposed methods in a realistic setting.
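
To make the notion of a Kullback-Leibler divergence between case and control genotype distributions concrete, the Python sketch below computes a symmetrized KL divergence for a single variant. It is not the KLT statistic or either of the block-based tests from the abstract above; the 0/1/2 genotype coding, the pseudocount of 0.5, and the function names are assumptions made for illustration.

```python
import numpy as np


def genotype_freqs(genotypes, pseudocount: float = 0.5) -> np.ndarray:
    """Smoothed frequencies of genotypes coded 0/1/2 (minor-allele counts).

    The pseudocount (an assumption of this sketch) keeps every frequency
    positive, which matters for rare variants with unobserved genotypes.
    """
    counts = np.bincount(np.asarray(genotypes, dtype=int), minlength=3)[:3]
    smoothed = counts + pseudocount
    return smoothed / smoothed.sum()


def kl_divergence(p, q) -> float:
    """KL divergence sum_k p_k * log(p_k / q_k) for strictly positive p, q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))


def symmetric_kl(cases, controls) -> float:
    """Symmetrized divergence between case and control genotype distributions."""
    p, q = genotype_freqs(cases), genotype_freqs(controls)
    return kl_divergence(p, q) + kl_divergence(q, p)


# Made-up genotypes at one rare variant.
cases = [0, 0, 1, 0, 1, 0, 0, 2, 0, 1]
controls = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
print(round(symmetric_kl(cases, controls), 4))
```

A divergence of zero means the smoothed genotype distributions coincide; turning such a quantity into a test would additionally require a reference distribution, for example by permuting case/control labels.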


Subject(s)
Genetic Association Studies/methods , Genetic Variation , Genetic Models , Algorithms , Computer Simulation , Datasets as Topic , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Humans , Linkage Disequilibrium , Single Nucleotide Polymorphism , Reproducibility of Results