Results 1 - 20 of 67
1.
Am J Hum Genet ; 111(9): 1899-1913, 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-39173627

ABSTRACT

Understanding the molecular mechanisms of complex traits is essential for developing targeted interventions. We analyzed liver expression quantitative-trait locus (eQTL) meta-analysis data on 1,183 participants to identify conditionally distinct signals. We found 9,013 eQTL signals for 6,564 genes; 23% of eGenes had two signals, and 6% had three or more signals. We then integrated the eQTL results with data from 29 cardiometabolic genome-wide association study (GWAS) traits and identified 1,582 GWAS-eQTL colocalizations for 747 eGenes. Non-primary eQTL signals accounted for 17% of all colocalizations. Isolating signals by conditional analysis prior to coloc resulted in 37% more colocalizations than using marginal eQTL and GWAS data, highlighting the importance of signal isolation. Isolating signals also led to stronger evidence of colocalization: among 343 eQTL-GWAS signal pairs in multi-signal regions, analyses that isolated the signals of interest resulted in higher posterior probability of colocalization for 41% of tests. Leveraging allelic heterogeneity, we predicted causal effects of gene expression on liver traits for four genes. To predict functional variants and regulatory elements, we colocalized eQTL with liver chromatin accessibility QTL (caQTL) and found 391 colocalizations, including 73 with non-primary eQTL signals and 60 eQTL signals that colocalized with both a caQTL and a GWAS signal. Finally, we used publicly available massively parallel reporter assays in HepG2 to highlight 14 eQTL signals that include at least one expression-modulating variant. This multi-faceted approach to unraveling the genetic underpinnings of liver-related traits could lead to therapeutic development.


Subjects
Genome-Wide Association Study , Liver , Quantitative Trait Loci , Humans , Alleles , Cardiovascular Diseases/genetics , Genetic Predisposition to Disease , Liver/metabolism , Phenotype , Polymorphism, Single Nucleotide
2.
Am J Hum Genet ; 109(10): 1894-1908, 2022 Oct 06.
Article in English | MEDLINE | ID: mdl-36206743

ABSTRACT

Individuals with cystic fibrosis (CF) develop complications of the gastrointestinal tract influenced by genetic variants outside of CFTR. Cystic fibrosis-related diabetes (CFRD) is a distinct form of diabetes with a variable age of onset that occurs frequently in individuals with CF, while meconium ileus (MI) is a severe neonatal intestinal obstruction affecting ∼20% of newborns with CF. CFRD and MI are slightly correlated traits with previous evidence of overlap in their genetic architectures. To better understand the genetic commonality between CFRD and MI, we used whole-genome-sequencing data from the CF Genome Project to perform genome-wide association. These analyses revealed variants at 11 loci (6 not previously identified) that associated with MI and at 12 loci (5 not previously identified) that associated with CFRD. Of these, variants at SLC26A9, CEBPB, and PRSS1 associated with both traits; variants at SLC26A9 and CEBPB increased risk for both traits, while variants at PRSS1, the higher-risk alleles for CFRD, conferred lower risk for MI. Furthermore, common and rare variants within the SLC26A9 locus associated with MI only or CFRD only. As expected, different loci modify risk of CFRD and MI; however, a subset exhibit pleiotropic effects indicating etiologic and mechanistic overlap between these two otherwise distinct complications of CF.


Subjects
Cystic Fibrosis , Diabetes Mellitus , Infant, Newborn, Diseases , Intestinal Obstruction , Cystic Fibrosis/complications , Cystic Fibrosis/genetics , Cystic Fibrosis Transmembrane Conductance Regulator/genetics , Diabetes Mellitus/genetics , Genome-Wide Association Study , Humans , Infant, Newborn , Intestinal Obstruction/complications , Intestinal Obstruction/genetics
3.
Am J Hum Genet ; 109(11): 1986-1997, 2022 Nov 03.
Article in English | MEDLINE | ID: mdl-36198314

ABSTRACT

Whole-genome sequencing (WGS) is the gold standard for fully characterizing genetic variation but is still prohibitively expensive for large samples. To reduce costs, many studies sequence only a subset of individuals or genomic regions, and genotype imputation is used to infer genotypes for the remaining individuals or regions without sequencing data. However, not all variants can be well imputed, and the current state-of-the-art imputation quality metric, denoted as standard Rsq, is poorly calibrated for lower-frequency variants. Here, we propose MagicalRsq, a machine-learning-based method that integrates variant-level imputation and population genetics statistics, to provide a better calibrated imputation quality metric. Leveraging WGS data from the Cystic Fibrosis Genome Project (CFGP), and whole-exome sequence data from UK Biobank (UKB), we performed comprehensive experiments to evaluate the performance of MagicalRsq compared to standard Rsq for partially sequenced studies. We found that MagicalRsq aligns better with true R² than standard Rsq in almost every situation evaluated, for both European and African ancestry samples. For example, when applying models trained from 1,992 CFGP sequenced samples to an independent 3,103 samples with no sequencing but TOPMed imputation from array genotypes, MagicalRsq, compared to standard Rsq, achieved net gains of 1.4 million rare, 117k low-frequency, and 18k common variants, where a net gain is the number of additional variants correctly distinguished by MagicalRsq relative to standard Rsq. MagicalRsq can serve as an improved post-imputation quality metric and will benefit downstream analysis by better distinguishing well-imputed variants from those poorly imputed. MagicalRsq is freely available on GitHub.
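
The "true" R-squared that MagicalRsq is compared against is not defined in the abstract. A common reading, assumed here, is the squared Pearson correlation between imputed dosages and sequenced genotypes at a variant. A minimal Python sketch under that assumption follows; the function name `true_r2` and the toy values are illustrative and come from neither the paper nor the MagicalRsq package.

```python
import numpy as np

def true_r2(dosage, genotype):
    """Squared Pearson correlation between imputed dosages and sequenced genotypes."""
    dosage = np.asarray(dosage, dtype=float)
    genotype = np.asarray(genotype, dtype=float)
    # A monomorphic variant has zero variance, so the correlation is undefined.
    if dosage.std() == 0 or genotype.std() == 0:
        return float("nan")
    r = np.corrcoef(dosage, genotype)[0, 1]
    return float(r ** 2)

# Toy example with made-up values (not study data).
genotypes = [0, 0, 1, 2, 0, 1]
dosages = [0.1, 0.0, 0.9, 1.8, 0.2, 1.1]
print(round(true_r2(dosages, genotypes), 3))
```

A quality metric is well calibrated, in this sense, when its value for a variant is close to what `true_r2` would return if sequencing data were available.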


Subjects
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Humans , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide/genetics , Calibration , Genotype , Machine Learning
4.
Am J Hum Genet ; 109(9): 1638-1652, 2022 Sep 01.
Article in English | MEDLINE | ID: mdl-36055212

ABSTRACT

Hypoxia-inducible factor prolyl hydroxylase inhibitors (HIF-PHIs) are currently under clinical development for treating anemia in chronic kidney disease (CKD), but it is important to monitor their cardiovascular safety. Genetic variants can be used as predictors to help inform the potential risk of adverse effects associated with drug treatments. We therefore aimed to use human genetics to help assess the risk of adverse cardiovascular events associated with therapeutically altered EPO levels to help inform clinical trials studying the safety of HIF-PHIs. By performing a genome-wide association meta-analysis of EPO (n = 6,127), we identified a cis-EPO variant (rs1617640) lying in the EPO promoter region. We validated this variant as most likely causal in controlling EPO levels by using genetic and functional approaches, including single-base gene editing. Using this variant as a partial predictor for therapeutic modulation of EPO and large genome-wide association data in Mendelian randomization tests, we found no evidence (at p < 0.05) that genetically predicted long-term rises in endogenous EPO, equivalent to a 2.2-unit increase, increased risk of coronary artery disease (CAD, OR [95% CI] = 1.01 [0.93, 1.07]), myocardial infarction (MI, OR [95% CI] = 0.99 [0.87, 1.15]), or stroke (OR [95% CI] = 0.97 [0.87, 1.07]). We could exclude increased odds of 1.15 for cardiovascular disease for a 2.2-unit EPO increase. A combination of genetic and functional studies provides a powerful approach to investigate the potential therapeutic profile of EPO-increasing therapies for treating anemia in CKD.
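
When a single variant serves as the instrument, a Mendelian randomization estimate is often formed as a Wald ratio: the variant's effect on the outcome divided by its effect on the exposure. The abstract does not say which estimator was used, so the Python sketch below only illustrates that general idea. The function `wald_ratio`, its parameters, and the first-order standard error are assumptions, and the `scale` argument merely mirrors the idea of reporting an odds ratio per 2.2-unit EPO increase.

```python
import math

def wald_ratio(beta_exposure, beta_outcome, se_outcome, scale=1.0):
    """Single-instrument Mendelian randomization via the Wald ratio.

    beta_exposure: variant effect on the exposure (per allele).
    beta_outcome:  variant effect on the outcome (log odds per allele).
    se_outcome:    standard error of beta_outcome.
    scale:         exposure change for which the odds ratio is reported.
    Returns (odds_ratio, ci_lower, ci_upper) with a 95% Wald interval.
    """
    if beta_exposure == 0:
        raise ValueError("instrument has no effect on the exposure")
    log_or = scale * beta_outcome / beta_exposure
    # First-order approximation: ignores uncertainty in beta_exposure.
    se = scale * se_outcome / abs(beta_exposure)
    return (math.exp(log_or),
            math.exp(log_or - 1.96 * se),
            math.exp(log_or + 1.96 * se))

# Hypothetical inputs, not values from the study.
odds_ratio, lower, upper = wald_ratio(0.5, 0.002, 0.01, scale=2.2)
print(f"OR = {odds_ratio:.2f} [{lower:.2f}, {upper:.2f}]")
```

An interval that spans 1 and whose upper bound stays below a chosen odds ratio is the kind of result that lets an analysis "exclude" a given increase in risk.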


Subjects
Anemia , Coronary Artery Disease , Myocardial Infarction , Renal Insufficiency, Chronic , Anemia/drug therapy , Anemia/genetics , Coronary Artery Disease/genetics , Genome-Wide Association Study , Humans , Mendelian Randomization Analysis , Myocardial Infarction/genetics , Renal Insufficiency, Chronic/genetics
5.
Hepatology ; 80(5): 1012-1025, 2024 Nov 01.
Article in English | MEDLINE | ID: mdl-38536042

ABSTRACT

BACKGROUND AND AIMS: It is not known why severe cystic fibrosis (CF) liver disease (CFLD) with portal hypertension occurs in only ~7% of people with CF. We aimed to identify genetic modifiers for severe CFLD to improve understanding of disease mechanisms. APPROACH AND RESULTS: Whole-genome sequencing was available in 4082 people with CF with pancreatic insufficiency (n = 516 with severe CFLD; n = 3566 without CFLD). We tested ~15.9 million single nucleotide polymorphisms (SNPs) for association with severe CFLD versus no-CFLD, using pre-modulator clinical phenotypes including (1) genetic variant (SERPINA1; Z allele) previously associated with severe CFLD; (2) candidate SNPs (n = 205) associated with non-CF liver diseases; (3) genome-wide association study of common/rare SNPs; (4) transcriptome-wide association; and (5) gene-level and pathway analyses. The Z allele was significantly associated with severe CFLD (p = 1.1 × 10⁻⁴). No significant candidate SNPs were identified. A genome-wide association study identified genome-wide significant SNPs in 2 loci and 2 suggestive loci. These 4 loci contained genes [significant, PKD1 (p = 8.05 × 10⁻¹⁰) and FNBP1 (p = 4.74 × 10⁻⁹); suggestive, DUSP6 (p = 1.51 × 10⁻⁷) and ANKUB1 (p = 4.69 × 10⁻⁷)] relevant to severe CFLD pathophysiology. The transcriptome-wide association identified 3 genes [CXCR1 (p = 1.01 × 10⁻⁶), AAMP (p = 1.07 × 10⁻⁶), and TRBV24 (p = 1.23 × 10⁻⁵)] involved in hepatic inflammation and innate immunity. Gene-ranked analyses identified pathways enriched in genes linked to multiple liver pathologies. CONCLUSION: These results identify loci/genes associated with severe CFLD that point to disease mechanisms involving hepatic fibrosis, inflammation, innate immune function, vascular pathology, intracellular signaling, actin cytoskeleton and tight junction integrity and mechanisms of hepatic steatosis and insulin resistance. These discoveries will facilitate mechanistic studies and the development of therapeutics for severe CFLD.


Subjects
Cystic Fibrosis , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Humans , Cystic Fibrosis/genetics , Cystic Fibrosis/complications , Female , Male , Adult , Severity of Illness Index , Liver Diseases/genetics , Child , Adolescent , alpha 1-Antitrypsin/genetics , Young Adult , Hypertension, Portal/genetics , Whole Genome Sequencing
6.
Hum Genomics ; 18(1): 92, 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39218963

ABSTRACT

Per- and poly-fluoroalkyl substances (PFAS) are emerging contaminants of concern because of their wide use, persistence, and potential to be hazardous to both humans and the environment. Several PFAS have been designated as substances of concern; however, most PFAS in commerce lack toxicology and exposure data to evaluate their potential hazards and risks. Cardiotoxicity has been identified as a likely human health concern, and cell-based assays are the most sensible approach for screening and prioritization of PFAS. Human-induced pluripotent stem cell (iPSC)-derived cardiomyocytes are a widely used method to test for cardiotoxicity, and recent studies showed that many PFAS affect these cells. Because iPSC-derived cardiomyocytes are available from different donors, they also can be used to quantify human variability in responses to PFAS. The primary objective of this study was to characterize potential human cardiotoxic hazard, risk, and inter-individual variability in responses to PFAS. A total of 56 PFAS from different subclasses were tested in concentration-response using human iPSC-derived cardiomyocytes from 16 donors without known heart disease. Kinetic calcium flux and high-content imaging were used to evaluate biologically-relevant phenotypes such as beat frequency, repolarization, and cytotoxicity. Of the tested PFAS, 46 showed concentration-response effects in at least one phenotype and donor; however, a wide range of sensitivities were observed across donors. Inter-individual variability in the effects could be quantified for 19 PFAS, and risk characterization could be performed for 20 PFAS based on available exposure information. For most tested PFAS, toxicodynamic variability was within a factor of 10 and the margins of exposure were above 100. This study identified PFAS that may pose cardiotoxicity risk and have high inter-individual variability. It also demonstrated the feasibility of using a population-based human in vitro method to quantify population variability and identify cardiotoxicity risks of emerging contaminants.


Subjects
Cardiotoxicity , Fluorocarbons , Induced Pluripotent Stem Cells , Myocytes, Cardiac , Humans , Induced Pluripotent Stem Cells/drug effects , Myocytes, Cardiac/drug effects , Myocytes, Cardiac/pathology , Cardiotoxicity/etiology , Fluorocarbons/toxicity , Environmental Pollutants/toxicity , Risk Assessment , Adult , Female , Male , Environmental Exposure/adverse effects
7.
BMC Bioinformatics ; 25(1): 147, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38605284

ABSTRACT

BACKGROUND: Expression quantitative trait locus (eQTL) analysis aims to detect the genetic variants that influence the expression of one or more genes. Gene-level eQTL testing forms a natural grouped-hypothesis testing strategy with clear biological importance. Methods to control family-wise error rate or false discovery rate for group testing have been proposed earlier, but may not be powerful or easily apply to eQTL data, for which certain structured alternatives may be defensible and may enable the researcher to avoid overly conservative approaches. RESULTS: In an empirical Bayesian setting, we propose a new method to control the false discovery rate (FDR) for grouped hypotheses. Here, each gene forms a group, with SNPs annotated to the gene corresponding to individual hypotheses. The heterogeneity of effect sizes in different groups is considered by the introduction of a random effects component. Our method, entitled Random Effects model and testing procedure for Group-level FDR control (REG-FDR), assumes a model for alternative hypotheses for the eQTL data and controls the FDR by adaptive thresholding. As a convenient alternate approach, we also propose Z-REG-FDR, an approximate version of REG-FDR, that uses only Z-statistics of association between genotype and expression for each gene-SNP pair. The performance of Z-REG-FDR is evaluated using both simulated and real data. Simulations demonstrate that Z-REG-FDR performs similarly to REG-FDR, but with much improved computational speed. CONCLUSION: Our results demonstrate that the Z-REG-FDR method performs favorably compared to other methods in terms of statistical power and control of FDR. It can be of great practical use for grouped hypothesis testing for eQTL analysis or similar problems in statistical genomics due to its fast computation and ability to be fit using only summary data.
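
For orientation, the standard Benjamini-Hochberg step-up procedure is the usual baseline for FDR control over a flat list of p-values. The Python sketch below shows that baseline only. It is not REG-FDR or Z-REG-FDR, and it ignores the gene-level grouping and random-effects model that the abstract describes. The function name and the example p-values are illustrative.

```python
import numpy as np

def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level alpha."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Step-up rule: find the largest rank i with p_(i) <= alpha * i / m.
    thresholds = alpha * np.arange(1, m + 1) / m
    passing = np.nonzero(ranked <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if passing.size:
        # Reject every hypothesis ranked at or below that largest rank.
        reject[order[: passing[-1] + 1]] = True
    return reject

# Made-up p-values for four hypotheses.
print(benjamini_hochberg([0.001, 0.02, 0.03, 0.5]))
```

A grouped method such as the one in the abstract differs in that the unit of discovery is the gene, with the SNP-level tests inside each gene treated as a structured group rather than as exchangeable entries in one list.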


Subjects
Genomics , Quantitative Trait Loci , Computer Simulation , Bayes Theorem , Genotype
8.
Am J Respir Crit Care Med ; 207(10): 1324-1333, 2023 May 15.
Article in English | MEDLINE | ID: mdl-36921087

ABSTRACT

Rationale: Lung disease is the major cause of morbidity and mortality in persons with cystic fibrosis (pwCF). Variability in CF lung disease has substantial non-CFTR (CF transmembrane conductance regulator) genetic influence. Identification of genetic modifiers has prognostic and therapeutic importance. Objectives: Identify genetic modifier loci and genes/pathways associated with pulmonary disease severity. Methods: Whole-genome sequencing data on 4,248 unique pwCF with pancreatic insufficiency and lung function measures were combined with imputed genotypes from an additional 3,592 patients with pancreatic insufficiency from the United States, Canada, and France. This report describes association of approximately 15.9 million SNPs using the quantitative Kulich normal residual mortality-adjusted (KNoRMA) lung disease phenotype in 7,840 pwCF using premodulator lung function data. Measurements and Main Results: Testing included common and rare SNPs, transcriptome-wide association, gene-level, and pathway analyses. Pathway analyses identified novel associations with genes that have key roles in organ development, and we hypothesize that these genes may relate to dysanapsis and/or variability in lung repair. Results confirmed and extended previous genome-wide association study findings. These whole-genome sequencing data provide finely mapped genetic information to support mechanistic studies. No novel primary associations with common single variants or rare variants were found. Multilocus effects at chr5p13 (SLC9A3/CEP72) and chr11p13 (EHF/APIP) were identified. Variant effect size estimates at associated loci were consistently ordered across the cohorts, indicating possible age or birth cohort effects. Conclusions: This premodulator genomic, transcriptomic, and pathway association study of 7,840 pwCF will facilitate mechanistic and postmodulator genetic studies and the development of novel therapeutics for CF lung disease.


Subjects
Cystic Fibrosis , Humans , Cystic Fibrosis/genetics , Genome-Wide Association Study/methods , Cystic Fibrosis Transmembrane Conductance Regulator/genetics , Patient Acuity , Lung , Microtubule-Associated Proteins/genetics
9.
Biom J ; 65(6): e2200029, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37212427

ABSTRACT

Multivariate heterogeneous responses and heteroskedasticity have attracted increasing attention in recent years. In genome-wide association studies, effective simultaneous modeling of multiple phenotypes would improve statistical power and interpretability. However, a flexible common modeling system for heterogeneous data types can pose computational difficulties. Here we build upon a previous method for multivariate probit estimation using a two-stage composite likelihood that exhibits favorable computational time while retaining attractive parameter estimation properties. We extend this approach to incorporate multivariate responses of heterogeneous data types (binary and continuous), and possible heteroskedasticity. Although the approach has wide applications, it would be particularly useful for genomics, precision medicine, or individual biomedical prediction. Using a genomics example, we explore statistical power and confirm that the approach performs well for hypothesis testing and coverage percentages under a wide variety of settings. The approach has the potential to better leverage genomics data and provide interpretable inference for pleiotropy, in which a locus is associated with multiple traits.


Subjects
Genome-Wide Association Study , Genomics , Genome-Wide Association Study/methods , Phenotype , Genomics/methods , Probability
10.
BMC Bioinformatics ; 23(1): 468, 2022 Nov 08.
Article in English | MEDLINE | ID: mdl-36348267

ABSTRACT

BACKGROUND: Studying the co-occurrence network structure of microbial samples is one of the critical approaches to understanding the perplexing and delicate relationship between the microbe, host, and diseases. It is also critical to develop a tool for investigating co-occurrence networks and differential abundance analyses to reveal the disease-related taxa-taxa relationship. In addition, it is also necessary to tighten the co-occurrence network into smaller modules to increase the ability for functional annotation and interpretability of these taxa-taxa relationships. Also, it is critical to retain the phylogenetic relationship among the taxa to identify differential abundance patterns, which can be used to resolve contradicting functions reported by different studies. RESULTS: In this article, we present Correlation and Consensus-based Cross-taxonomy Network Analysis (C3NA), a user-friendly R package for investigating compositional microbial sequencing data to identify and compare co-occurrence patterns across different taxonomic levels. C3NA contains two interactive graphical user interfaces (Shiny applications), one of them dedicated to the comparison between two diagnoses, e.g., disease versus control. We used C3NA to analyze two well-studied diseases, colorectal cancer and Crohn's disease. We discovered clusters of study- and disease-dependent taxa that overlap with known functional taxa studied by other discovery studies and differential abundance analyses. CONCLUSION: C3NA offers a new microbial data analysis pipeline for refined and enriched taxa-taxa co-occurrence network analyses, and the usability was further expanded via the built-in Shiny applications for interactive investigation.


Subjects
Phylogeny , Consensus
11.
Am J Respir Crit Care Med ; 197(1): 79-93, 2018 Jan 01.
Article in English | MEDLINE | ID: mdl-28853905

ABSTRACT

RATIONALE: The severity of cystic fibrosis (CF) lung disease varies widely, even for Phe508del homozygotes. Heritability studies show that more than 50% of the variability reflects non-cystic fibrosis transmembrane conductance regulator (CFTR) genetic variation; however, the full extent of the pertinent genetic variation is not known. OBJECTIVES: We sought to identify novel CF disease-modifying mechanisms using an integrated approach based on analyzing "in vivo" CF airway epithelial gene expression complemented with genome-wide association study (GWAS) data. METHODS: Nasal mucosal RNA from 134 patients with CF was used for RNA sequencing. We tested for associations of transcriptomic (gene expression) data with a quantitative phenotype of CF lung disease severity. Pathway analysis of CF GWAS data (n = 5,659 patients) was performed to identify novel pathways and assess the concordance of genomic and transcriptomic data. Association of gene expression with previously identified CF GWAS risk alleles was also tested. MEASUREMENTS AND MAIN RESULTS: Significant evidence of heritable gene expression was identified. Gene expression pathways relevant to airway mucosal host defense were significantly associated with CF lung disease severity, including viral infection, inflammation/inflammatory signaling, lipid metabolism, apoptosis, ion transport, Phe508del CFTR processing, and innate immune responses, including HLA (human leukocyte antigen) genes. Ion transport and CFTR processing pathways, as well as HLA genes, were identified across differential gene expression and GWAS signals. CONCLUSIONS: Transcriptomic analyses of CF airway epithelia, coupled to genomic (GWAS) analyses, highlight the role of heritable host defense variation in determining the pathophysiology of CF lung disease. The identification of these pathways provides opportunities to pursue targeted interventions to improve CF lung health.


Subjects
Cystic Fibrosis Transmembrane Conductance Regulator/genetics , Cystic Fibrosis/genetics , Genetic Variation , Lung Diseases/genetics , RNA/genetics , Adolescent , Adult , Cohort Studies , Cystic Fibrosis/complications , Cystic Fibrosis/pathology , Disease Progression , Female , Gene Expression Profiling , Gene Expression Regulation , Genome-Wide Association Study , Genomics , Humans , Lung Diseases/etiology , Lung Diseases/pathology , Male , Nasal Mucosa/pathology , Prognosis , RNA/analysis , Risk Assessment , Severity of Illness Index , Young Adult
12.
Am J Hum Genet ; 96(2): 318-28, 2015 Feb 05.
Article in English | MEDLINE | ID: mdl-25640674

ABSTRACT

Variation in cystic fibrosis (CF) phenotypes, including lung disease severity, age of onset of persistent Pseudomonas aeruginosa (P. aeruginosa) lung infection, and presence of meconium ileus (MI), has been partially explained by genome-wide association studies (GWASs). It is not expected that GWASs alone are sufficiently powered to uncover all heritable traits associated with CF phenotypic diversity. Therefore, we utilized gene expression association from lymphoblastoid cell lines from 754 p.Phe508del CF-affected homozygous individuals to identify genes and pathways. LPAR6, a G protein coupled receptor, associated with lung disease severity (false discovery rate q value = 0.0006). Additional pathway analyses, utilizing a stringent permutation-based approach, identified unique signals for all three phenotypes. Pathways associated with lung disease severity were annotated in three broad categories: (1) endomembrane function, containing p.Phe508del processing genes, providing evidence of the importance of p.Phe508del processing to explain lung phenotype variation; (2) HLA class I genes, extending previous GWAS findings in the HLA region; and (3) endoplasmic reticulum stress response genes. Expression pathways associated with lung disease were concordant for some endosome and HLA pathways, with pathways identified using GWAS associations from 1,978 CF-affected individuals. Pathways associated with age of onset of persistent P. aeruginosa infection were enriched for HLA class II genes, and those associated with MI were related to oxidative phosphorylation. Formal testing demonstrated that genes showing differential expression associated with lung disease severity were enriched for heritable genetic variation and expression quantitative traits. Gene expression provided a powerful tool to identify unrecognized heritable variation, complementing ongoing GWASs in this rare disease.


Subjects
Cystic Fibrosis/genetics , Cystic Fibrosis/pathology , Genes, MHC Class I/genetics , Genetic Variation , Phenotype , Receptors, Lysophosphatidic Acid/genetics , Endoplasmic Reticulum Stress/genetics , Gene Expression Profiling , Humans , Linear Models , Sequence Deletion/genetics
13.
Biometrics ; 74(2): 439-447, 2018 Jun.
Article in English | MEDLINE | ID: mdl-28853138

ABSTRACT

Genotype eigenvectors are widely used as covariates for control of spurious stratification in genetic association. Significance testing for the accompanying eigenvalues has typically been based on a standard Tracy-Widom limiting distribution for the largest eigenvalue, derived under white-noise assumptions. It is known that even modest local correlation among markers inflates the largest eigenvalues, even in the absence of true stratification. In addition, a few sample eigenvalues may be extreme, creating further complications in accurate testing. We explore several methods to identify appropriate null eigenvalue thresholds, while remaining sensitive to eigenvalues corresponding to population stratification. We introduce a novel block permutation approach, designed to produce an appropriate null eigenvalue distribution by eliminating long-range genomic correlation while preserving local correlation. We also propose a fast approach based on eigenvalue distribution modeling, using a simple fit criterion and the general Marcenko-Pastur equation under a simple discrete eigenvalue model. Block permutation and the model-based approach work well for pure simulations and for data resampled from the 1000 Genomes project. In contrast, we find that the standard approach of computing an "effective" number of markers does not perform well. The performance of the methods is also demonstrated for a motivating example from the International Cystic Fibrosis Consortium.
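
One way to read the block permutation idea is as follows: cut the marker axis into contiguous blocks, then shuffle the sample order independently within each block. Rows stay intact inside a block, so local correlation is preserved, while different blocks are decoupled, so long-range correlation is removed. The Python sketch below follows that reading. The block size, the column standardization, and the use of the largest eigenvalue as the statistic are assumptions, not details taken from the paper.

```python
import numpy as np

def block_permute(genotypes, block_size, rng):
    """Shuffle sample order independently within each contiguous marker block.

    genotypes: array of shape (n_samples, n_markers).
    """
    g = np.asarray(genotypes)
    n_samples, n_markers = g.shape
    permuted = np.empty_like(g)
    for start in range(0, n_markers, block_size):
        stop = min(start + block_size, n_markers)
        perm = rng.permutation(n_samples)
        permuted[:, start:stop] = g[perm, start:stop]
    return permuted

def top_eigenvalue(genotypes):
    """Largest eigenvalue of the sample-by-sample matrix of standardized markers."""
    g = np.asarray(genotypes, dtype=float)
    g = g - g.mean(axis=0)
    sd = g.std(axis=0)
    keep = sd > 0  # drop monomorphic markers
    g = g[:, keep] / sd[keep]
    singular_values = np.linalg.svd(g, compute_uv=False)
    return float(singular_values[0] ** 2 / g.shape[1])

def null_top_eigenvalues(genotypes, block_size, n_perm, seed=0):
    """Top eigenvalue under repeated block permutation (an empirical null)."""
    rng = np.random.default_rng(seed)
    return np.array([top_eigenvalue(block_permute(genotypes, block_size, rng))
                     for _ in range(n_perm)])

# Toy data: random genotypes with no stratification (not real data).
toy = np.random.default_rng(42).integers(0, 3, size=(50, 200))
null = null_top_eigenvalues(toy, block_size=20, n_perm=20)
print(top_eigenvalue(toy), np.quantile(null, 0.95))
```

An observed eigenvalue well above the upper quantile of the permuted values would then be taken as evidence of structure beyond local marker correlation.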


Subjects
Genetic Association Studies/methods , Models, Statistical , Computer Simulation , Cystic Fibrosis/genetics , Data Interpretation, Statistical , Genomics/methods , Genotype , Humans , Models, Genetic
14.
Biometrics ; 74(1): 155-164, 2018 Mar.
Article in English | MEDLINE | ID: mdl-28452052

ABSTRACT

The issue of robustness to family relationships in computing genotype ancestry scores such as eigenvector projections has received increased attention in genetic association, and is particularly challenging when sets of both unrelated individuals and closely related family members are included. The current standard is to compute loadings (left singular vectors) using unrelated individuals and to compute projected scores for remaining family members. However, projected ancestry scores from this approach suffer from shrinkage toward zero. We consider two main novel strategies: (i) matrix substitution based on decomposition of a target family-orthogonalized covariance matrix, and (ii) using family-averaged data to obtain loadings. We illustrate the performance via simulations, including resampling from 1000 Genomes Project data, and analysis of a cystic fibrosis dataset. The matrix substitution approach has similar performance to the current standard, but is simple and uses only a genotype covariance matrix, while the family-average method shows superior performance. Our approaches are accompanied by novel ancillary approaches that provide considerable insight, including individual-specific eigenvalue scree plots.
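
The "current standard" described above (loadings from unrelated individuals, projected scores for relatives) can be sketched in a few lines of Python. The layout of the genotype matrices as markers by individuals, the centering by marker means from the unrelated set, and the function name are assumptions made for this illustration; neither of the paper's two new strategies is implemented here.

```python
import numpy as np

def ancestry_scores(unrelated, relatives, k=2):
    """Project relatives onto loadings estimated from unrelated individuals.

    unrelated, relatives: genotype arrays of shape (n_markers, n_individuals).
    Returns (loadings, scores_unrelated, scores_relatives).
    """
    unrelated = np.asarray(unrelated, dtype=float)
    relatives = np.asarray(relatives, dtype=float)
    # Center each marker using the unrelated individuals only.
    center = unrelated.mean(axis=1, keepdims=True)
    u, s, vt = np.linalg.svd(unrelated - center, full_matrices=False)
    loadings = u[:, :k]  # left singular vectors, one column per component
    scores_unrelated = loadings.T @ (unrelated - center)
    scores_relatives = loadings.T @ (relatives - center)
    return loadings, scores_unrelated, scores_relatives

# Toy data: 100 markers, 30 unrelated individuals, 5 relatives (random, not real).
rng = np.random.default_rng(0)
unrelated = rng.integers(0, 3, size=(100, 30))
relatives = rng.integers(0, 3, size=(100, 5))
_, scores_u, scores_r = ancestry_scores(unrelated, relatives)
print(np.abs(scores_u).mean(), np.abs(scores_r).mean())
```

Because the loadings are tuned to the unrelated sample, scores for individuals outside it tend to be pulled toward zero, which is the shrinkage problem the abstract refers to.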


Subjects
Family , Genotype , Pedigree , Cystic Fibrosis/genetics , Genome , Humans , Models, Genetic , Principal Component Analysis
15.
Biometrics ; 74(2): 616-625, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29073327

ABSTRACT

The study of expression Quantitative Trait Loci (eQTL) is an important problem in genomics and biomedicine. While detection (testing) of eQTL associations has been widely studied, less work has been devoted to the estimation of eQTL effect size. To reduce false positives, detection methods frequently rely on linear modeling of rank-based normalized or log-transformed gene expression data. Unfortunately, these approaches do not correspond to the simplest model of eQTL action, and thus yield estimates of eQTL association that can be uninterpretable and inaccurate. In this article, we propose a new, log-of-linear model for eQTL action, termed ACME, that captures allelic contributions to cis-acting eQTLs in an additive fashion, yielding effect size estimates that correspond to a biologically coherent model of cis-eQTLs. We describe a non-linear least-squares algorithm to fit the model by maximum likelihood, and obtain corresponding p-values. We perform careful investigation of the model using a combination of simulated data and data from the Genotype Tissue Expression (GTEx) project. Our results reveal little evidence for dominance effects, a parsimonious result that accords with a simple biological model for allele-specific expression and supports use of the ACME model. We show that Type-I error is well-controlled under our approach in a realistic setting, so that rank-based normalizations are unnecessary. Furthermore, we show that such normalizations can be detrimental to power and estimation accuracy under the proposed model. We then show, through effect size analyses of whole-genome cis-eQTLs in the GTEx data, that using standard normalizations instead of ACME noticeably affects the ranking and sign of estimates.
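
Under one reading of the abstract, a log-of-linear model can be written as log(y) = log(b0 + b1 * s) + error, where y is expression and s is the allele count (0, 1, or 2); covariates are omitted here. The Python sketch below fits that form by non-linear least squares with SciPy, which coincides with maximum likelihood when the errors are Gaussian. The parameterization through eta = b1 / b0, the bound that keeps b0 + b1 * s positive, and the function name are assumptions made for the illustration; this is not the authors' ACME implementation.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_log_of_linear(expression, allele_count):
    """Fit log(y) = log(b0 + b1 * s) + error and return (b0, b1)."""
    y = np.log(np.asarray(expression, dtype=float))
    s = np.asarray(allele_count, dtype=float)

    def residuals(theta):
        log_b0, eta = theta
        # b0 + b1 * s = b0 * (1 + eta * s), so the log splits into two terms.
        return y - (log_b0 + np.log1p(eta * s))

    # eta > -1 / max(s) keeps the modelled mean strictly positive.
    lower_eta = -1.0 / max(s.max(), 1.0) + 1e-6
    fit = least_squares(residuals, x0=[y.mean(), 0.0],
                        bounds=([-np.inf, lower_eta], [np.inf, np.inf]))
    b0 = float(np.exp(fit.x[0]))
    b1 = b0 * float(fit.x[1])
    return b0, b1

# Simulated data with hypothetical parameters b0 = 10, b1 = 5.
rng = np.random.default_rng(0)
s = rng.integers(0, 3, size=500)
y = (10 + 5 * s) * np.exp(rng.normal(0, 0.2, size=500))
print(fit_log_of_linear(y, s))
```

In this form b1 is the expression contributed by each copy of the alternative allele on the original scale, which is what makes the effect size directly interpretable.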


Subjects
Linear Models , Quantitative Trait Loci , Algorithms , Alleles , Gene Expression , Humans , Statistics as Topic
16.
Hum Mol Genet ; 24(15): 4464-79, 2015 Aug 01.
Article in English | MEDLINE | ID: mdl-25935004

ABSTRACT

Obesity is an important component of the pathophysiology of chronic diseases. Identifying epigenetic modifications associated with elevated adiposity, including DNA methylation variation, may point to genomic pathways that are dysregulated in numerous conditions. The Illumina 450K Bead Chip array was used to assay DNA methylation in leukocyte DNA obtained from 2097 African American adults in the Atherosclerosis Risk in Communities (ARIC) study. Mixed-effects regression models were used to test the association of methylation beta value with concurrent body mass index (BMI) and waist circumference (WC), and BMI change, adjusting for batch effects and potential confounders. Replication using whole-blood DNA from 2377 White adults in the Framingham Heart Study and CD4+ T cell DNA from 991 Whites in the Genetics of Lipid Lowering Drugs and Diet Network Study was followed by testing using adipose tissue DNA from 648 women in the Multiple Tissue Human Expression Resource cohort. Seventy-six BMI-related probes, 164 WC-related probes and 8 BMI change-related probes passed the threshold for significance in ARIC (P < 1 × 10⁻⁷; Bonferroni), including probes in the recently reported HIF3A, CPT1A and ABCG1 regions. Replication using blood DNA was achieved for 37 BMI probes and 1 additional WC probe. Sixteen of these also replicated in adipose tissue, including 15 novel methylation findings near genes involved in lipid metabolism, immune response/cytokine signaling and other diverse pathways, including LGALS3BP, KDM2B, PBX1 and BBS2, among others. Adiposity traits are associated with DNA methylation at numerous CpG sites that replicate across studies despite variation in tissue type, ethnicity and analytic approaches.
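
A Bonferroni threshold is the family-wise error level divided by the number of tests, so a cutoff near 1 in 10 million corresponds to a level of 0.05 spread over several hundred thousand probes. The abstract does not give the number of probes analyzed, so the count in this small Python sketch is a hypothetical round figure, and the function name is illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that bounds the family-wise error at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

# Hypothetical probe count; the number actually tested is not stated in the abstract.
n_probes = 480_000
print(f"{bonferroni_threshold(0.05, n_probes):.2e}")
```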


Subjects
Atherosclerosis/genetics , DNA Methylation/genetics , Epigenesis, Genetic/genetics , Obesity/genetics , Black or African American/genetics , Aged , Atherosclerosis/pathology , Body Mass Index , Female , Genome-Wide Association Study , Humans , Lipid Metabolism/genetics , Male , Metabolic Networks and Pathways/genetics , Middle Aged , Obesity/pathology , Waist Circumference/genetics , White People/genetics
17.
Biostatistics ; 17(1): 54-64, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26040912

ABSTRACT

The problem of ranked inference arises in a number of settings in which the investigator wishes to perform parameter inference after ordering a set of [Formula: see text] statistics. In contrast to inference for a single hypothesis, the ranking procedure introduces considerable bias, a problem known as the "winner's curse" in genetic association. We introduce the projack (for Prediction by Re-Ordered Jackknife and Cross-Validation, [Formula: see text]-fold). The projack is a resampling-based procedure that provides low-bias estimates of the expected ranked effect size parameter for a set of possibly correlated [Formula: see text] statistics. The approach is flexible and has wide applicability to high-dimensional datasets, including those arising from genomics platforms. Initially motivated by the setting where original data are available for resampling, the projack can be extended to the situation where only the vector of [Formula: see text] values is available. We illustrate the projack for correction of the winner's curse in genetic association, although it can be used much more generally.
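The winner's curse itself can be illustrated with a small simulation. The sketch below is not the projack: it substitutes a plain split-sample estimate (rank on one independent estimate, read off the effect from another) to show why re-estimating after ranking removes most of the selection bias. All sizes, variances and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_features, n_reps = 1000, 200
true_effect = rng.normal(0.0, 0.5, size=n_features)

naive_bias, split_bias = [], []
for _ in range(n_reps):
    # Two independent noisy estimates of every effect (e.g., two sample halves)
    est_a = true_effect + rng.normal(0.0, 1.0, size=n_features)
    est_b = true_effect + rng.normal(0.0, 1.0, size=n_features)
    top = np.argmax(est_a)  # rank using the first estimate only
    # Naive: report the same estimate that was used for ranking
    naive_bias.append(est_a[top] - true_effect[top])
    # Split-sample: report the independent estimate for the selected feature
    split_bias.append(est_b[top] - true_effect[top])

mean_naive_bias = float(np.mean(naive_bias))
mean_split_bias = float(np.mean(split_bias))
print(mean_naive_bias, mean_split_bias)
```

The naive estimate for the top-ranked feature is inflated on average, while the independent estimate is approximately unbiased; the price of the split is that each half is noisier than the full sample, which is the trade-off resampling schemes aim to soften.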


Subjects
Data Interpretation, Statistical , Genome-Wide Association Study/methods , Bias , Humans , Psoriasis/genetics
18.
Microb Ecol Health Dis ; 28(1): 1303265, 2017.
Article in English | MEDLINE | ID: mdl-28572753

ABSTRACT

Background: Recent studies of various human microbiome habitats have revealed thousands of bacterial species and the existence of large variation in communities of microorganisms in the same habitats across individual human subjects. Previous efforts to summarize this diversity, notably in the human gut and vagina, have categorized microbiome profiles by clustering them into community state types (CSTs). The functional relevance of specific CSTs has not been established. Objective: We investigate whether CSTs can be used to assess dynamics in the microbiome. Design: We conduct a re-analysis of five sequencing-based microbiome surveys derived from vaginal samples with repeated measures. Results: We observe that detection of a CST transition is largely insensitive to choices in methods for normalization or clustering. We find that healthy subjects persist in a CST for two to three weeks or more on average, while those with evidence of dysbiosis tend to change more often. Changes in CST can be gradual or occur over less than one day. Upcoming CST changes and switches to high-risk CSTs can be predicted with high accuracy in certain scenarios. Finally, we observe that presence of Gardnerella vaginalis is a strong predictor of an upcoming CST change. Conclusion: Overall, our results show that the CST concept is useful for studying microbiome dynamics.

19.
Biostatistics ; 16(3): 611-25, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25792622

ABSTRACT

A number of biomedical problems require performing many hypothesis tests, with an attendant need to apply stringent thresholds. Often the data take the form of a series of predictor vectors, each of which must be compared with a single response vector, perhaps with nuisance covariates. Parametric tests of association are often used, but can result in inaccurate type I error at the extreme thresholds, even for large sample sizes. Furthermore, standard two-sided testing can reduce power compared with the doubled [Formula: see text]-value, due to asymmetry in the null distribution. Exact (permutation) testing is attractive, but can be computationally intensive and cumbersome. We present an approximation to exact association tests of trend that is accurate and fast enough for standard use in high-throughput settings, and can easily provide standard two-sided or doubled [Formula: see text]-values. The approach is shown to be equivalent under permutation to likelihood ratio tests for the most commonly used generalized linear models (GLMs). For linear regression, covariates are handled by working with covariate-residualized responses and predictors. For GLMs, stratified covariates can be handled in a manner similar to exact conditional testing. Simulations and examples illustrate the wide applicability of the approach. The accompanying mcc package is available on CRAN http://cran.r-project.org/web/packages/mcc/index.html.
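For orientation, a brute-force Monte Carlo permutation test of trend can be sketched as follows. This is the computationally intensive baseline that the abstract describes as cumbersome, not the approximation implemented in the mcc package. The function name `permutation_trend_test`, the choice of the centered cross-product as the statistic, and the absolute-value convention used for the two-sided p-value are assumptions of this sketch.

```python
import numpy as np

def permutation_trend_test(x, y, n_perm=2000, seed=0):
    """Monte Carlo permutation test of trend between predictor x and response y."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # centering makes the permutation null mean zero
    observed = float(xc @ y)
    null = np.array([xc @ rng.permutation(y) for _ in range(n_perm)])
    # Add-one correction keeps Monte Carlo p-values strictly positive
    p_upper = (1 + np.sum(null >= observed)) / (n_perm + 1)
    p_lower = (1 + np.sum(null <= observed)) / (n_perm + 1)
    # Doubled p-value: twice the smaller one-sided tail, capped at 1
    p_doubled = min(1.0, 2 * min(p_upper, p_lower))
    # Two-sided p-value (one common convention): compare absolute values
    p_two_sided = (1 + np.sum(np.abs(null) >= abs(observed))) / (n_perm + 1)
    return {"statistic": observed,
            "p_doubled": float(p_doubled),
            "p_two_sided": float(p_two_sided)}

# Hypothetical example: genotype-like predictor with a real trend in the response
rng = np.random.default_rng(42)
x = rng.binomial(2, 0.3, size=200)
y = 0.5 * x + rng.normal(0.0, 1.0, size=200)
result = permutation_trend_test(x, y)
print(result)
```

When the null distribution is skewed, the doubled and absolute-value p-values can differ, which is the asymmetry the abstract refers to. The smallest attainable Monte Carlo p-value is 1/(n_perm + 1), so extreme thresholds require very many permutations, which motivates a fast approximation.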


Subjects
High-Throughput Screening Assays/statistics & numerical data , Biostatistics , Breast Neoplasms/genetics , Computer Simulation , Cystic Fibrosis/genetics , Databases, Nucleic Acid/statistics & numerical data , Female , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Likelihood Functions , Linear Models , Polymorphism, Single Nucleotide , Sample Size , Software
20.
Biometrics ; 72(1): 165-74, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26259845

ABSTRACT

A variety of pathway/gene-set approaches have been proposed to provide evidence of higher-level biological phenomena in the association of expression with experimental condition or clinical outcome. Among these approaches, it has been repeatedly shown that resampling methods are far preferable to approaches that implicitly assume independence of genes. However, few approaches have been optimized for the specific characteristics of RNA-Seq transcription data, in which mapped tags produce discrete counts with varying library sizes, and with potential outliers or skewness patterns that violate parametric assumptions. We describe transformations to RNA-Seq data to improve power for linear associations with outcome and flexibly handle normalization factors. Using these transformations or alternate transformations, we apply recently developed null approximations to quadratic form statistics for both self-contained and competitive pathway testing. The approach provides a convenient integrated platform for RNA-Seq pathway testing. We demonstrate that the approach provides appropriate type I error control without actual permutation and is powerful under many settings in comparison to competing approaches. Pathway analysis of data from a study of F344 vs. HIV1Tg rats, and of sex differences in lymphoblastoid cell lines from humans, strongly supports the biological interpretability of the findings.


Subjects
Data Mining/methods , Databases, Nucleic Acid , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, RNA/methods , Signal Transduction/genetics , Transcription Factors/genetics , Algorithms , Animals , Data Interpretation, Statistical , Female , Male , Protein Interaction Mapping/methods , Rats , Sex Factors , Species Specificity