Results 1 - 20 of 51
1.
Am J Hum Genet ; 109(1): 81-96, 2022 01 06.
Article in English | MEDLINE | ID: mdl-34932938

ABSTRACT

Large-scale gene sequencing studies for complex traits have the potential to identify causal genes with therapeutic implications. We performed gene-based association testing of blood lipid levels with rare (minor allele frequency < 1%) predicted damaging coding variation by using sequence data from >170,000 individuals from multiple ancestries: 97,493 European, 30,025 South Asian, 16,507 African, 16,440 Hispanic/Latino, 10,420 East Asian, and 1,182 Samoan. We identified 35 genes associated with circulating lipid levels; some of these genes have not been previously associated with lipid levels when using rare coding variation from population-based samples. We prioritize 32 genes in array-based genome-wide association study (GWAS) loci based on aggregations of rare coding variants; three (EVI5, SH2B3, and PLIN1) had no prior association of rare coding variants with lipid levels. Most of our associated genes showed evidence of association among multiple ancestries. Finally, we observed an enrichment of gene-based associations for low-density lipoprotein cholesterol drug target genes and for genes closest to GWAS index single-nucleotide polymorphisms (SNPs). Our results demonstrate that gene-based associations can be beneficial for drug target development and provide evidence that the gene closest to the array-based GWAS index SNP is often the functional gene for blood lipid levels.


Subjects
Exome , Genetic Variation , Genome-Wide Association Study , Lipids/blood , Open Reading Frames , Alleles , Blood Glucose/genetics , Case-Control Studies , Computational Biology/methods , Databases, Genetic , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/metabolism , Genetic Predisposition to Disease , Genetics, Population , Genome-Wide Association Study/methods , Humans , Lipid Metabolism/genetics , Liver/metabolism , Liver/pathology , Molecular Sequence Annotation , Multifactorial Inheritance , Phenotype , Polymorphism, Single Nucleotide
2.
Genome Res ; 31(9): 1629-1637, 2021 09.
Article in English | MEDLINE | ID: mdl-34426515

ABSTRACT

The X Chromosome plays an important role in human development and disease. However, functional genomic and disease association studies of X genes greatly lag behind autosomal gene studies, in part owing to the unique biology of X-Chromosome inactivation (XCI). Because of XCI, most genes are only expressed from one allele. Yet, ∼30% of X genes "escape" XCI and are transcribed from both alleles, many only in a proportion of the population. Such interindividual differences are likely to be disease relevant, particularly for sex-biased disorders. To understand the functional biology for X-linked genes, we developed X-Chromosome inactivation for RNA-seq (XCIR), a novel approach to identify escape genes using bulk RNA-seq data. Our method, available as an R package, is more powerful than alternative approaches and is computationally efficient to handle large population-scale data sets. Using annotated XCI states, we examined the contribution of X-linked genes to the disease heritability in the United Kingdom Biobank data set. We show that escape and variable escape genes explain the largest proportion of X heritability, which is in large part attributable to X genes with Y homology. Finally, we investigated the role of each XCI state in sex-biased diseases and found that although XY homologous gene pairs have a larger overall effect size, enrichment for variable escape genes is significantly increased in female-biased diseases. Our results, for the first time, quantitate the importance of variable escape genes for the etiology of sex-biased disease, and our pipeline allows analysis of larger data sets for a broad range of phenotypes.


Subjects
Genes, X-Linked , X Chromosome Inactivation , Alleles , Animals , Female , Genomics , X Chromosome/genetics
3.
Psychol Med ; 52(5): 968-978, 2022 04.
Article in English | MEDLINE | ID: mdl-32762793

ABSTRACT

BACKGROUND: Substance use occurs at a high rate in persons with a psychiatric disorder. Genetically informative studies have the potential to elucidate the etiology of these phenomena. Recent developments in genome-wide association studies (GWAS) allow new avenues of investigation. METHOD: Using results of GWAS meta-analyses, we performed a factor analysis of the genetic correlation structure, a genome-wide search of shared loci, and causally informative tests for six substance use phenotypes (four smoking, one alcohol, and one cannabis use) and five psychiatric disorders (ADHD, anorexia, depression, bipolar disorder, and schizophrenia). RESULTS: Two correlated factors, externalizing and internalizing/psychosis, were found, although model fit was beneath conventional standards. Of 458 loci reported in previous univariate GWAS of substance use and psychiatric disorders, about 50% (230 loci) were pleiotropic, with an additional 111 pleiotropic loci not reported from past GWAS. Of the 341 pleiotropic loci, 152 were associated with both substance use and psychiatric disorders, implicating neurodevelopment, cell morphogenesis, biological adhesion pathways, and enrichment in 13 different brain tissues. Seventy-five and 114 pleiotropic loci were specific to either psychiatric disorders or substance use phenotypes, implicating neuronal signaling pathway and clathrin-binding functions/structures, respectively. No consistent evidence for phenotypic causation was found across different Mendelian randomization methods. CONCLUSIONS: Genetic etiology of substance use and psychiatric disorders is highly pleiotropic and involves shared neurodevelopmental path, neurotransmission, and intracellular trafficking. In aggregate, the patterns are not consistent with vertical pleiotropy, more likely reflecting horizontal pleiotropy or more complex forms of phenotypic causation.


Subjects
Mental Disorders , Schizophrenia , Substance-Related Disorders , Genetic Pleiotropy , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Mental Disorders/epidemiology , Mental Disorders/genetics , Phenotype , Polymorphism, Single Nucleotide , Schizophrenia/epidemiology , Schizophrenia/genetics , Substance-Related Disorders/epidemiology , Substance-Related Disorders/genetics
4.
Bioinformatics ; 36(19): 4951-4954, 2020 12 08.
Article in English | MEDLINE | ID: mdl-32756942

ABSTRACT

SUMMARY: Here, we present a highly efficient R package, seqminer2, for querying and retrieving sequence variants from biobank scale datasets of millions of individuals and hundreds of millions of genetic variants. Seqminer2 implements a novel variant-based index for querying VCF/BCF files. It improves the speed of query and retrieval by several orders of magnitude compared to the state-of-the-art tools based upon tabix. It also reimplements support for the BGEN and PLINK formats, which improves speed over alternative implementations. The improved efficiency and comprehensive support for popular file formats will facilitate method development, software prototyping and data analysis of biobank scale sequence datasets in R. AVAILABILITY AND IMPLEMENTATION: The seqminer2 R package is available from https://github.com/zhanxw/seqminer. Scripts used for the benchmarks are available in https://github.com/yang-lina/seqminer/blob/master/seqminer2%20benchmark%20script.txt. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Biological Specimen Banks , Software , Genotype , Humans
5.
Bioinformatics ; 36(12): 3811-3817, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32246825

ABSTRACT

MOTIVATION: Large-scale genome-wide association studies (GWAS) have resulted in the identification of a wide range of genetic variants related to a host of complex traits and disorders. Despite their success, the individual single-nucleotide polymorphism (SNP) analysis approach adopted in most current GWAS can be limited in that it is usually too simple biologically to elucidate a comprehensive genetic architecture of phenotypes and is statistically underpowered due to the heavy multiple-testing correction burden. On the other hand, multiple-SNP analyses (e.g. gene-based or region-based SNP-set analysis) are usually more powerful for examining the joint effects of a set of SNPs on the phenotype of interest. However, current multiple-SNP approaches can only draw an overall conclusion at the SNP-set level and do not directly inform which SNPs in the SNP-set are driving the overall genotype-phenotype association. RESULTS: In this article, we propose a new permutation-assisted tuning procedure in lasso (plasso) to identify phenotype-associated SNPs in a joint multiple-SNP regression model in GWAS. The tuning parameter of lasso determines the amount of shrinkage and is essential to the performance of variable selection. In the proposed plasso procedure, we first generate permutations as pseudo-SNPs that are not associated with the phenotype. Then, the lasso tuning parameter is delicately chosen to separate true signal SNPs and non-informative pseudo-SNPs. We illustrate plasso using simulations to demonstrate its superior performance over existing methods, and application of plasso to a real GWAS dataset yields additional insights into the genetic control of complex traits. AVAILABILITY AND IMPLEMENTATION: R code to implement the proposed methodology is available at https://github.com/xyz5074/plasso. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genetic Association Studies , Phenotype
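The permutation-assisted tuning idea in the plasso abstract above can be illustrated with a small pure-Python sketch. It is not the authors' implementation (their R code is at the URL given in the abstract). The stopping rule used here is an assumption: walk down a grid of lasso penalties and keep the last penalty at which no permuted pseudo-SNP has a nonzero coefficient. The function names, the penalty grid and the simulated data are likewise hypothetical.

```python
import random
from typing import List


def standardize(col: List[float]) -> List[float]:
    """Center a column and scale it to unit variance (column must be polymorphic)."""
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
    if sd == 0.0:
        raise ValueError("monomorphic column cannot be standardized")
    return [(v - mean) / sd for v in col]


def soft_threshold(z: float, lam: float) -> float:
    """Lasso shrinkage operator."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso(cols: List[List[float]], y: List[float], lam: float,
          beta: List[float], sweeps: int = 50) -> List[float]:
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1, standardized columns.

    `beta` is updated in place, so successive calls act as warm starts.
    """
    n = len(y)
    resid = [y[i] - sum(c[i] * b for c, b in zip(cols, beta)) for i in range(n)]
    for _ in range(sweeps):
        for j, col in enumerate(cols):
            # Covariance of column j with the residual, plus its own contribution.
            rho = sum(c * r for c, r in zip(col, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
    return beta


def plasso_select(genotypes: List[List[int]], phenotype: List[float],
                  seed: int = 0, n_lambdas: int = 20) -> List[int]:
    """Indices of real SNPs selected at the last penalty with no pseudo-SNP selected."""
    rng = random.Random(seed)
    p = len(genotypes[0])
    real = [standardize([row[j] for row in genotypes]) for j in range(p)]
    # Pseudo-SNPs: shuffle each real column across individuals, which breaks
    # any association with the phenotype.
    pseudo = []
    for col in real:
        shuffled = col[:]
        rng.shuffle(shuffled)
        pseudo.append(shuffled)
    cols = real + pseudo
    n = len(phenotype)
    mean_y = sum(phenotype) / n
    y = [v - mean_y for v in phenotype]
    lam_max = max(abs(sum(c * v for c, v in zip(col, y))) / n for col in cols)
    beta = [0.0] * len(cols)
    selected: List[int] = []
    for k in range(1, n_lambdas + 1):
        lasso(cols, y, lam_max * 0.85 ** k, beta)
        if any(b != 0.0 for b in beta[p:]):
            break  # a pseudo-SNP entered; keep the previous selection
        selected = [j for j in range(p) if beta[j] != 0.0]
    return selected


# Simulated toy data (hypothetical): 100 individuals, 5 SNPs, only SNP 0 is causal.
sim = random.Random(1)
geno = [[sum(sim.random() < 0.3 for _ in range(2)) for _ in range(5)] for _ in range(100)]
pheno = [2.0 * row[0] + sim.gauss(0.0, 1.0) for row in geno]
print(plasso_select(geno, pheno))
```

Because the selection is frozen just before the first pseudo-SNP enters, the returned indices always refer to real SNPs. How many weak signals survive depends on the penalty grid and on the particular permutation drawn.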
6.
PLoS Genet ; 14(7): e1007452, 2018 07.
Article in English | MEDLINE | ID: mdl-30016313

ABSTRACT

Meta-analysis of genetic association studies increases sample size and the power for mapping complex traits. Existing methods are mostly developed for datasets without missing values, i.e. the summary association statistics are measured for all variants in contributing studies. In practice, genotype imputation is not always effective. This may be the case when targeted genotyping/sequencing assays are used or when the un-typed genetic variant is rare. Therefore, contributed summary statistics often contain missing values. Existing methods for imputing missing summary association statistics and using imputed values in meta-analysis, approximate conditional analysis, or simple strategies such as complete case analysis all have theoretical limitations. Applying these approaches can bias genetic effect estimates and lead to seriously inflated type-I or type-II errors in conditional analysis, which is a critical tool for identifying independently associated variants. To address this challenge and complement imputation methods, we developed a method to combine summary statistics across participating studies and consistently estimate joint effects, even when the contributed summary statistics contain large amounts of missing values. Based on this estimator, we proposed a score statistic called PCBS (partial correlation based score statistic) for conditional analysis of single-variant and gene-level associations. Through extensive analysis of simulated and real data, we showed that the new method produces well-calibrated type-I errors and is substantially more powerful than existing approaches. We applied the proposed approach to one of the largest meta-analyses to date for the cigarettes-per-day phenotype. Using the new method, we identified multiple novel independently associated variants at known loci for tobacco use, which were otherwise missed by alternative methods. 
Together, the phenotypic variance explained by these variants was 1.1%, improving that of previously reported associations by 71%. These findings illustrate the extent of locus allelic heterogeneity and can help pinpoint causal variants.


Subjects
Data Analysis , Tobacco Products/statistics & numerical data , Tobacco Use/genetics , Alleles , Data Interpretation, Statistical , Datasets as Topic , Genetic Loci/genetics , Genome-Wide Association Study , Genotype , Humans , Phenotype , Polymorphism, Single Nucleotide
7.
Am J Hum Genet ; 101(1): 115-122, 2017 Jul 06.
Article in English | MEDLINE | ID: mdl-28669402

ABSTRACT

Massively parallel sequencing technologies provide great opportunities for discovering rare susceptibility variants involved in complex disease etiology via large-scale imputation and exome and whole-genome sequence-based association studies. Due to modest effect sizes, large sample sizes of tens to hundreds of thousands of individuals are required for adequately powered studies. Current analytical tools are obsolete when it comes to handling these large datasets. To facilitate the analysis of large-scale sequence-based studies, we developed SEQSpark which implements parallel processing based on Spark to increase the speed and efficiency of performing data quality control, annotation, and association analysis. To demonstrate the versatility and speed of SEQSpark, we analyzed whole-genome sequence data from the UK10K, testing for associations with waist-to-hip ratios. The analysis, which was completed in 1.5 hr, included loading data, annotation, principal component analysis, and single variant and rare variant aggregate association analysis of >9 million variants. For rare variant aggregate analysis, an exome-wide significant association (p < 2.5 × 10⁻⁶) was observed with CCDC62 (SKAT-O [p = 6.89 × 10⁻⁷], combined multivariate collapsing [p = 1.48 × 10⁻⁶], and burden of rare variants [p = 1.48 × 10⁻⁶]). SEQSpark was also used to analyze 50,000 simulated exomes and it required 1.75 hr for the analysis of a quantitative trait using several rare variant aggregate association methods. Additionally, the performance of SEQSpark was compared to Variant Association Tools and PLINK/SEQ. SEQSpark was always faster and in some situations computation was reduced to a hundredth of the time. SEQSpark will empower large sequence-based epidemiological studies to quickly elucidate genetic variation involved in the etiology of complex traits.


Subjects
Databases, Nucleic Acid , Exome/genetics , Genetic Variation , Genome-Wide Association Study/methods , Sequence Analysis, DNA/methods , Software , Humans , Principal Component Analysis , Waist-Hip Ratio
8.
Am J Hum Genet ; 99(1): 40-55, 2016 Jul 07.
Article in English | MEDLINE | ID: mdl-27346686

ABSTRACT

Platelet production, maintenance, and clearance are tightly controlled processes indicative of platelets' important roles in hemostasis and thrombosis. Platelets are common targets for primary and secondary prevention of several conditions. They are monitored clinically by complete blood counts, specifically with measurements of platelet count (PLT) and mean platelet volume (MPV). Identifying genetic effects on PLT and MPV can provide mechanistic insights into platelet biology and their role in disease. Therefore, we formed the Blood Cell Consortium (BCX) to perform a large-scale meta-analysis of Exomechip association results for PLT and MPV in 157,293 and 57,617 individuals, respectively. Using the low-frequency/rare coding variant-enriched Exomechip genotyping array, we sought to identify genetic variants associated with PLT and MPV. In addition to confirming 47 known PLT and 20 known MPV associations, we identified 32 PLT and 18 MPV associations not previously observed in the literature across the allele frequency spectrum, including rare large effect (FCER1A), low-frequency (IQGAP2, MAP1A, LY75), and common (ZMIZ2, SMG6, PEAR1, ARFGAP3/PACSIN2) variants. Several variants associated with PLT/MPV (PEAR1, MRVI1, PTGES3) were also associated with platelet reactivity. In concurrent BCX analyses, there was overlap of platelet-associated variants with red (MAP1A, TMPRSS6, ZMIZ2) and white (PEAR1, ZMIZ2, LY75) blood cell traits, suggesting common regulatory pathways with shared genetic architecture among these hematopoietic lineages. Our large-scale Exomechip analyses identified previously undocumented associations with platelet traits and further indicate that several complex quantitative hematological, lipid, and cardiovascular traits share genetic factors.


Subjects
Blood Platelets/metabolism , Exome/genetics , Genetic Variation/genetics , Female , Genome-Wide Association Study , Humans , Male , Mean Platelet Volume , Platelet Count
9.
Nucleic Acids Res ; 45(9): e75, 2017 May 19.
Article in English | MEDLINE | ID: mdl-28115622

ABSTRACT

Whole genome sequencing (WGS) is a promising strategy to unravel variants or genes responsible for human diseases and traits. However, there is a lack of robust platforms for a comprehensive downstream analysis. In the present study, we first proposed three novel algorithms (sequence gap-filled gene feature annotation, bit-block encoded genotypes, and sectional fast access to text lines) to address three fundamental problems. The three algorithms then formed the infrastructure of a robust parallel computing framework, KGGSeq, for integrating downstream analysis functions for whole genome sequencing data. KGGSeq has been equipped with a comprehensive set of analysis functions for quality control, filtration, annotation, pathogenic prediction and statistical tests. In the tests with whole genome sequencing data from 1000 Genomes Project, KGGSeq annotated several thousand more reliable non-synonymous variants than other widely used tools (e.g. ANNOVAR and SNPEff). It took only around half an hour on a small server with 10 CPUs to access genotypes of ∼60 million variants of 2504 subjects, while a popular alternative tool required around one day. KGGSeq's bit-block genotype format used 1.5% or less space to flexibly represent phased or unphased genotypes with multiple alleles and calculated genotypic correlation over 1000 times faster.


Subjects
Algorithms , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Humans
10.
Bioinformatics ; 32(9): 1423-6, 2016 05 01.
Article in English | MEDLINE | ID: mdl-27153000

ABSTRACT

MOTIVATION: Next-generation sequencing technologies have enabled the large-scale assessment of the impact of rare and low-frequency genetic variants for complex human diseases. Gene-level association tests are often performed to analyze rare variants, where multiple rare variants in a gene region are analyzed jointly. Applying gene-level association tests to analyze sequence data often requires integrating multiple heterogeneous sources of information (e.g. annotations, functional prediction scores, allele frequencies, genotypes and phenotypes) to determine the optimal analysis unit and prioritize causal variants. Given the complexity and scale of current sequence datasets and bioinformatics databases, there is a compelling need for more efficient software tools to facilitate these analyses. To answer this challenge, we developed RVTESTS, which implements a broad set of rare variant association statistics and supports the analysis of autosomal and X-linked variants for both unrelated and related individuals. RVTESTS also provides useful companion features for annotating sequence variants, integrating bioinformatics databases, performing data quality control and sample selection. We illustrate the advantages of RVTESTS in functionality and efficiency using the 1000 Genomes Project data. AVAILABILITY AND IMPLEMENTATION: RVTESTS is available on Linux, MacOS and Windows. Source code and executable files can be obtained at https://github.com/zhanxw/rvtests CONTACT: zhanxw@gmail.com; goncalo@umich.edu; dajiang.liu@outlook.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genetic Variation , Software , Animals , Genotype , High-Throughput Nucleotide Sequencing , Humans , Programming Languages
11.
Ann Neurol ; 80(5): 730-740, 2016 11.
Article in English | MEDLINE | ID: mdl-27717122

ABSTRACT

OBJECTIVE: In observational epidemiologic studies, higher plasma high-density lipoprotein cholesterol (HDL-C) has been associated with increased risk of intracerebral hemorrhage (ICH). DNA sequence variants that decrease cholesteryl ester transfer protein (CETP) gene activity increase plasma HDL-C; as such, medicines that inhibit CETP and raise HDL-C are in clinical development. Here, we test the hypothesis that CETP DNA sequence variants associated with higher HDL-C also increase risk for ICH. METHODS: We performed 2 candidate-gene analyses of CETP. First, we tested individual CETP variants in a discovery cohort of 1,149 ICH cases and 1,238 controls from 3 studies, followed by replication in 1,625 cases and 1,845 controls from 5 studies. Second, we constructed a genetic risk score comprised of 7 independent variants at the CETP locus and tested this score for association with HDL-C as well as ICH risk. RESULTS: Twelve variants within CETP demonstrated nominal association with ICH, with the strongest association at the rs173539 locus (odds ratio [OR] = 1.25, standard error [SE] = 0.06, p = 6.0 × 10⁻⁴) with no heterogeneity across studies (I² = 0%). This association was replicated in patients of European ancestry (p = 0.03). A genetic score of CETP variants found to increase HDL-C by ∼2.85 mg/dl in the Global Lipids Genetics Consortium was strongly associated with ICH risk (OR = 1.86, SE = 0.13, p = 1.39 × 10⁻⁶). INTERPRETATION: Genetic variants in CETP associated with increased HDL-C raise the risk of ICH. Given ongoing therapeutic development in CETP inhibition and other HDL-raising strategies, further exploration of potential adverse cerebrovascular outcomes may be warranted. Ann Neurol 2016;80:730-740.


Subjects
Cerebral Hemorrhage/genetics , Cholesterol Ester Transfer Proteins/genetics , Genetic Predisposition to Disease/genetics , Adult , Aged , Cholesterol, HDL/blood , Cholesterol, HDL/genetics , Female , Genotype , Humans , Male , Middle Aged , Polymorphism, Single Nucleotide
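The genetic risk score mentioned in the CETP abstract above combines several independent variants into one number per individual. A minimal sketch of that idea is a weighted sum of effect-allele counts. The variant names, weights and genotypes below are made-up placeholders rather than study values, and weighting each variant by its per-allele effect on HDL-C is an assumption about how such a score is typically built.

```python
from typing import Dict


def genetic_risk_score(genotypes: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele counts (0, 1 or 2) over the scored variants."""
    score = 0.0
    for variant, weight in weights.items():
        dosage = genotypes[variant]
        if dosage not in (0, 1, 2):
            raise ValueError(f"{variant}: allele count must be 0, 1 or 2")
        score += weight * dosage
    return score


# Hypothetical per-allele weights (for example, effect on HDL-C); NOT study values.
weights = {"variant_1": 0.5, "variant_2": 1.0, "variant_3": 0.25}
# Hypothetical individual: effect-allele counts at each scored variant.
person = {"variant_1": 2, "variant_2": 0, "variant_3": 1}

print(genetic_risk_score(person, weights))  # 0.5*2 + 1.0*0 + 0.25*1 = 1.25
```

The resulting score can then be used as a single predictor in a regression against the outcome, which is the role the score plays in the analysis the abstract describes.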
12.
Genet Epidemiol ; 39(8): 619-23, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26394715

ABSTRACT

Next-generation sequencing has enabled the study of a comprehensive catalogue of genetic variants for their impact on various complex diseases. Numerous consortia studies of complex traits have publicly released their summary association statistics, which have become an invaluable resource for learning the underlying biology, understanding the genetic architecture, and guiding clinical translations. There is great interest in the field in developing novel statistical methods for analyzing and interpreting results from these genotype-phenotype association studies. One popular platform for method development and data analysis is R. In order to enable these analyses in R, it is necessary to develop packages that can efficiently query files of summary association statistics, explore the linkage disequilibrium structure between variants, and integrate various bioinformatics databases. The complexity and scale of sequence datasets and databases pose significant computational challenges for method developers. To address these challenges and facilitate method development, we developed the R package SEQMINER for annotating and querying files of sequence variants (e.g., VCF/BCF files) and summary association statistics (e.g., METAL/RAREMETAL files), and for integrating bioinformatics databases. SEQMINER provides an infrastructure where novel methods can be distributed and applied to analyzing sequence datasets in practice. We illustrate the performance of SEQMINER using datasets from the 1000 Genomes Project. We show that SEQMINER is highly efficient and easy to use. It will greatly accelerate the process of applying statistical innovations to analyze and interpret sequence-based associations. The R package, its source code and documentation are available from http://cran.r-project.org/web/packages/seqminer and http://seqminer.genomic.codes/.


Subjects
Computational Biology/methods , Genetic Association Studies/methods , High-Throughput Nucleotide Sequencing/methods , Programming Languages , Base Sequence , Data Interpretation, Statistical , Databases, Factual , Genetic Variation/genetics , Genome, Human , Humans , Sequence Analysis, DNA , Software
13.
Genet Epidemiol ; 39(4): 227-38, 2015 May.
Article in English | MEDLINE | ID: mdl-25740221

ABSTRACT

Advances in exome sequencing and the development of exome genotyping arrays are enabling explorations of association between rare coding variants and complex traits. To ensure power for these rare variant analyses, a variety of association tests that group variants by gene or functional unit have been proposed. Here, we extend these tests to family-based studies. We develop family-based burden tests, variable frequency threshold tests and sequence kernel association tests. Through simulations, we compare the performance of different tests. We describe situations where family-based studies provide greater power than studies of unrelated individuals to detect rare variants associated with moderate to large changes in trait values. Broadly speaking, we find that when sample sizes are limited and only a modest fraction of all trait-associated variants can be identified, family samples are more powerful. Finally, we illustrate our approach by analyzing the relationship between coding variants and levels of high-density lipoprotein (HDL) cholesterol in 11,556 individuals from the HUNT and SardiNIA studies, demonstrating association for coding variants in the APOC3, CETP, LIPC, LIPG, and LPL genes and illustrating the value of family samples, meta-analysis, and gene-level tests. Our methods are implemented in freely available C++ code.


Subjects
Genetic Association Studies/methods , Genetic Variation/genetics , Models, Genetic , Software , Apolipoprotein C-III/genetics , Cholesterol Ester Transfer Proteins/genetics , Cholesterol, HDL/genetics , Computer Simulation , Exome/genetics , Family , Genotype , Humans , Lipase/genetics , Lipoprotein Lipase/genetics , Phenotype
14.
Am J Hum Genet ; 91(4): 585-96, 2012 Oct 05.
Article in English | MEDLINE | ID: mdl-23022102

ABSTRACT

Next-generation sequencing has led to many complex-trait rare-variant (RV) association studies. Although single-variant association analysis can be performed, it is grossly underpowered. Therefore, researchers have developed many RV association tests that aggregate multiple variant sites across a genetic region (e.g., gene), and test for the association between the trait and the aggregated genotype. After these aggregate tests detect an association, it is only possible to estimate the average genetic effect for a group of RVs. As a result of the "winner's curse," such an estimate can be biased. Although for common variants one can obtain unbiased estimates of genetic parameters by analyzing a replication sample, for RVs it is desirable to obtain unbiased genetic estimates for the study where the association is identified. This is because there can be substantial heterogeneity of RV sites and frequencies even among closely related populations. In order to obtain an unbiased estimate for aggregated RV analysis, we developed bootstrap-sample-split algorithms to reduce the bias of the winner's curse. The unbiased estimates are greatly important for understanding the population-specific contribution of RVs to the heritability of complex traits. We also demonstrate both theoretically and via simulations that for aggregate RV analysis the genetic variance for a gene or region will always be underestimated, sometimes substantially, because of the presence of noncausal variants or because of the presence of causal variants with effects of different magnitudes or directions. Therefore, even if RVs play a major role in the complex-trait etiologies, a portion of the heritability will remain missing, and the contribution of RVs to the complex-trait etiologies will be underestimated.


Subjects
Genome-Wide Association Study/methods , Models, Genetic , Algorithms , Gene Frequency , Genetic Loci , Genetic Predisposition to Disease , Genetic Variation , Genotype , Humans , Models, Statistical
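The abstract above describes rare-variant tests that aggregate multiple variant sites across a region and then test the trait against the aggregated genotype. The simplest form of that idea, a burden test in unrelated individuals, can be sketched as follows. This is a generic illustration, not the bootstrap-sample-split method of the paper. The toy genotypes, the trait values and the relaxed frequency cutoff of 0.2 (needed because six individuals cannot carry a variant below 1%) are all assumptions.

```python
import math
from typing import List, Tuple


def allele_freq(column: List[int]) -> float:
    """Frequency of the counted allele for diploid genotypes coded 0/1/2."""
    return sum(column) / (2 * len(column))


def burden_scores(genotypes: List[List[int]], maf_cutoff: float = 0.01) -> List[int]:
    """Per-individual allele count summed over variants rarer than the cutoff.

    Assumes the counted allele is the minor allele at every variant.
    """
    n_variants = len(genotypes[0])
    columns = [[row[j] for row in genotypes] for j in range(n_variants)]
    rare = [j for j, col in enumerate(columns) if 0 < allele_freq(col) < maf_cutoff]
    return [sum(row[j] for j in rare) for row in genotypes]


def linear_regression(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y on x; returns (slope, t statistic of the slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / std_err


# Toy data (hypothetical): 6 individuals, 3 variants; the third variant is common.
toy_genotypes = [
    [0, 0, 1],
    [1, 0, 2],
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 2],
    [0, 0, 1],
]
toy_trait = [1.0, 2.0, 1.0, 2.2, 0.8, 1.2]

burden = burden_scores(toy_genotypes, maf_cutoff=0.2)
slope, t_stat = linear_regression([float(b) for b in burden], toy_trait)
print(burden)           # [0, 1, 0, 1, 0, 0]
print(round(slope, 3))  # 1.1
```

The slope is the average trait shift per rare allele carried, which is exactly the "average genetic effect for a group of RVs" that the abstract says such tests are limited to estimating.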
15.
PLoS Genet ; 8(11): e1003075, 2012.
Article in English | MEDLINE | ID: mdl-23166519

ABSTRACT

Next-generation sequencing has made possible the detection of rare variant (RV) associations with quantitative traits (QT). Due to high sequencing cost, many studies can only sequence a modest number of selected samples with extreme QT. Therefore association testing in individual studies can be underpowered. Besides the primary trait, many clinically important secondary traits are often measured. It is highly beneficial if multiple studies can be jointly analyzed for detecting associations with commonly measured traits. However, analyzing secondary traits in selected samples can be biased if sample ascertainment is not properly modeled. Some methods exist for analyzing secondary traits in selected samples, where some burden tests can be implemented. However, p-values can only be evaluated analytically via asymptotic approximations, which may not be accurate. Additionally, potentially more powerful sequence kernel association tests, variable selection-based methods, and burden tests that require permutations cannot be incorporated. To overcome these limitations, we developed a unified method for analyzing secondary trait associations with RVs (STAR) in selected samples, incorporating all RV tests. Statistical significance can be evaluated either through permutations or analytically. STAR makes it possible to apply more powerful RV tests to analyze secondary trait associations. It also enables jointly analyzing multiple cohorts ascertained under different study designs, which greatly boosts power. The performance of STAR and commonly used RV association tests was comprehensively evaluated using simulation studies. STAR was also implemented to analyze a dataset from the SardiNIA project where samples with extreme low-density lipoprotein levels were sequenced. A significant association between LDLR and systolic blood pressure was identified, which is supported by pharmacogenetic studies.
In summary, for sequencing studies, STAR is an important tool for detecting secondary-trait RV associations.


Subjects
Genetic Predisposition to Disease , Genetic Variation , Genome-Wide Association Study , Quantitative Trait Loci/genetics , Blood Pressure/genetics , Computer Simulation , High-Throughput Nucleotide Sequencing , Humans , Italy , Lipoproteins, LDL/genetics , Models, Genetic , Phenotype , Software
16.
Nat Commun ; 15(1): 4260, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38769300

ABSTRACT

Transcriptome-wide association study (TWAS) is a popular approach to dissect the functional consequence of disease associated non-coding variants. Most existing TWAS use bulk tissues and may not have the resolution to reveal cell-type specific target genes. Single-cell expression quantitative trait loci (sc-eQTL) datasets are emerging. The largest bulk- and sc-eQTL datasets are most conveniently available as summary statistics, but have not been broadly utilized in TWAS. Here, we present a new method, EXPRESSO (EXpression PREdiction with Summary Statistics Only), to analyze sc-eQTL summary statistics, which also integrates 3D genomic data and epigenomic annotation to prioritize causal variants. EXPRESSO substantially improves on existing methods. We apply EXPRESSO to analyze multi-ancestry GWAS datasets for 14 autoimmune diseases. EXPRESSO uniquely identifies 958 novel gene x trait associations, which is 26% more than the second-best method. Among them, 492 are unique to cell type level analysis and missed by TWAS using whole blood. We also develop a cell type aware drug repurposing pipeline, which leverages EXPRESSO results to identify drug compounds that can reverse disease gene expressions in relevant cell types. Our results point to multiple drugs with therapeutic potential, including metformin for type 1 diabetes, and vitamin K for ulcerative colitis.


Subjects
Genome-Wide Association Study , Quantitative Trait Loci , Single-Cell Analysis , Humans , Single-Cell Analysis/methods , Genome-Wide Association Study/methods , Genetic Predisposition to Disease/genetics , Transcriptome/genetics , Autoimmune Diseases/genetics , Polymorphism, Single Nucleotide , Multifactorial Inheritance/genetics , Gene Expression Profiling/methods
17.
Nat Commun ; 15(1): 5357, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38918381

ABSTRACT

Large national-level electronic health record (EHR) datasets offer new opportunities for disentangling the role of genes and environment through deep phenotype information and approximate pedigree structures. Here we use the approximate geographical locations of patients as a proxy for spatially correlated community-level environmental risk factors. We develop a spatial mixed linear effect (SMILE) model that incorporates both genetic and environmental contributions. We extract EHR data and geographical locations from 257,620 nuclear families and compile 1083 disease outcome measurements from the MarketScan dataset. We augment the EHR with publicly available environmental data, including levels of particulate matter 2.5 (PM2.5), nitrogen dioxide (NO2), climate, and sociodemographic data. We refine the estimates of genetic heritability and quantify community-level environmental contributions. We also use wind speed and direction as instrumental variables to assess the causal effects of air pollution. In total, we find that PM2.5 or NO2 has statistically significant causal effects on 135 diseases, including respiratory, musculoskeletal, digestive, metabolic, and sleep disorders, where PM2.5 and NO2 tend to affect biologically distinct disease categories. These analyses showcase several robust strategies for jointly modeling genetic and environmental effects on disease risk using large EHR datasets and will benefit upcoming biobank studies in the era of precision medicine.
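
The abstract above refers to instrumental variables for assessing causal effects. As a hedged illustration of the general idea only, the sketch below compares an ordinary least-squares slope with a single-instrument (Wald ratio) estimate on simulated data with an unobserved confounder. It is not the SMILE model or the authors' analysis: the simulation design, the variable names, and the use of one generic instrument (standing in for something like wind) are all assumptions made for this example.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def ols_slope(exposure, outcome):
    """Naive regression slope; biased when a confounder is unobserved."""
    return cov(exposure, outcome) / cov(exposure, exposure)


def iv_slope(instrument, exposure, outcome):
    """Wald ratio: equals two-stage least squares with one instrument."""
    return cov(instrument, outcome) / cov(instrument, exposure)


def simulate(n, beta, seed):
    """Hypothetical data: the instrument affects the outcome only via exposure."""
    rng = random.Random(seed)
    instrument, exposure, outcome = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)               # instrument (e.g. a wind proxy)
        u = rng.gauss(0, 1)               # unobserved confounder
        x = z + u + rng.gauss(0, 1)       # exposure (e.g. pollution level)
        y = beta * x + 2.0 * u + rng.gauss(0, 1)  # outcome
        instrument.append(z)
        exposure.append(x)
        outcome.append(y)
    return instrument, exposure, outcome


if __name__ == "__main__":
    z, x, y = simulate(n=20000, beta=0.5, seed=0)
    print("OLS slope:", ols_slope(x, y))
    print("IV slope: ", iv_slope(z, x, y))
```

In this toy setup the confounder pushes the naive slope above the true effect, while the instrument-based estimate stays close to it, provided the instrument is independent of the confounder and influences the outcome only through the exposure.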


Subjects
Air Pollution , Nitrogen Dioxide , Particulate Matter , Humans , Air Pollution/adverse effects , Particulate Matter/adverse effects , Nitrogen Dioxide/adverse effects , Nitrogen Dioxide/analysis , Risk Factors , Environmental Exposure/adverse effects , Male , Female , Electronic Health Records , Air Pollutants/adverse effects , Air Pollutants/analysis , Air Pollutants/toxicity , Genetic Predisposition to Disease , Gene-Environment Interaction , Middle Aged , Adult
18.
Nat Commun ; 15(1): 2359, 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38504097

ABSTRACT

Genetic mechanisms of blood pressure (BP) regulation remain poorly defined. Using kidney-specific epigenomic annotations and 3D genome information, we generated and validated gene expression prediction models for the purpose of transcriptome-wide association studies in 700 human kidneys. We identified 889 kidney genes associated with BP, of which 399 were prioritised as contributors to BP regulation. Imputation of kidney proteome and microRNAome uncovered 97 renal proteins and 11 miRNAs associated with BP. Integration with plasma proteomics and metabolomics illuminated circulating levels of myo-inositol, 4-guanidinobutanoate and angiotensinogen as downstream effectors of several kidney BP genes (SLC5A11, AGMAT, AGT, respectively). We showed that genetically determined reduction in renal expression may mimic the effects of rare loss-of-function variants on kidney mRNA/protein and lead to an increase in BP (e.g., ENPEP). We demonstrated a strong correlation (r = 0.81) in expression of protein-coding genes between cells harvested from urine and the kidney, highlighting a diagnostic potential of urinary cell transcriptomics. We uncovered adenylyl cyclase activators as a repurposing opportunity for hypertension and illustrated examples of BP-elevating effects of anticancer drugs (e.g., tubulin polymerisation inhibitors). Collectively, our studies provide new biological insights into genetic regulation of BP with potential to drive clinical translation in hypertension.


Subjects
Hypertension , Proteome , Humans , Blood Pressure/genetics , Proteome/genetics , Proteome/metabolism , Transcriptome/genetics , Multiomics , Hypertension/metabolism , Kidney/metabolism , Sodium-Glucose Transport Proteins/genetics , Sodium-Glucose Transport Proteins/metabolism
19.
Am J Hum Genet ; 87(6): 790-801, 2010 Dec 10.
Article in English | MEDLINE | ID: mdl-21129725

ABSTRACT

There is solid evidence that complex traits can be caused by rare variants. Next-generation sequencing technologies are powerful tools for mapping rare variants. Confirmation of significant findings in stage 1 through replication in an independent stage 2 sample is necessary for association studies. For gene-based mapping of rare variants, two replication strategies are possible: (1) variant-based replication, wherein only variants from nucleotide sites uncovered in stage 1 are genotyped and followed up, and (2) sequence-based replication, wherein the gene region is sequenced in the replication sample and both known and novel variants are tested. The efficiency of the two strategies depends on the proportion of causative variants discovered in stage 1 and on sequencing/genotyping errors. With rigorous population genetic and phenotypic models, it is demonstrated that sequence-based replication is consistently more powerful. However, the power gain is small (1) for large-scale studies with thousands of individuals, because a large fraction of causative variant sites can be observed, and (2) for small- to medium-scale studies with a few hundred samples, because a large proportion of the locus population attributable risk can be explained by the uncovered variants. Therefore, genotyping can be a temporary solution for replicating genetic studies if stage 1 and 2 samples are drawn from the same population. However, sequence-based replication is advantageous if the stage 1 sample is small or novel variant discovery is also of interest. It is shown that currently attainable levels of sequencing error only minimally affect the comparison, and the advantage of sequence-based replication remains.


Subjects
Genetic Predisposition to Disease , Genome-Wide Association Study , Sequence Analysis, DNA , Humans , Models, Genetic , Probability
20.
Bioinformatics ; 28(13): 1745-51, 2012 Jul 01.
Article in English | MEDLINE | ID: mdl-22556370

ABSTRACT

MOTIVATION: Next-generation sequencing greatly increases the capacity to detect rare-variant complex-trait associations. However, it is still expensive to sequence a large number of samples, and therefore small datasets are often used. Given cost constraints, a potentially more powerful two-step strategy is to sequence a subset of the sample to discover variants, and genotype the identified variants in the remaining sample. If only cases are sequenced, directly combining sequence and genotype data will lead to inflated type-I errors in rare-variant association analysis. Although several methods have been developed to correct for the bias, they are either underpowered or theoretically invalid. We propose a new method, SEQCHIP, to integrate genotype and sequence data, which can be used with most existing rare-variant tests. RESULTS: It is demonstrated using both simulated and real datasets that the SEQCHIP method has controlled type-I errors and is substantially more powerful than all other currently available methods. AVAILABILITY: SEQCHIP is implemented in an R package and is available at http://linkage.rockefeller.edu/suzanne/seqchip/Seqchip.html.


Subjects
Genetic Association Studies/methods , Genetic Variation , Sequence Analysis, DNA/methods , Adenoma/genetics , Case-Control Studies , Colorectal Neoplasms/genetics , Genotype , Humans , Phenotype , Software