1.
Am J Hum Genet ; 109(1): 81-96, 2022 01 06.
Article in English | MEDLINE | ID: mdl-34932938

ABSTRACT

Large-scale gene sequencing studies for complex traits have the potential to identify causal genes with therapeutic implications. We performed gene-based association testing of blood lipid levels with rare (minor allele frequency < 1%) predicted damaging coding variation by using sequence data from >170,000 individuals from multiple ancestries: 97,493 European, 30,025 South Asian, 16,507 African, 16,440 Hispanic/Latino, 10,420 East Asian, and 1,182 Samoan. We identified 35 genes associated with circulating lipid levels; some of these genes have not been previously associated with lipid levels when using rare coding variation from population-based samples. We prioritize 32 genes in array-based genome-wide association study (GWAS) loci based on aggregations of rare coding variants; three (EVI5, SH2B3, and PLIN1) had no prior association of rare coding variants with lipid levels. Most of our associated genes showed evidence of association among multiple ancestries. Finally, we observed an enrichment of gene-based associations for low-density lipoprotein cholesterol drug target genes and for genes closest to GWAS index single-nucleotide polymorphisms (SNPs). Our results demonstrate that gene-based associations can be beneficial for drug target development and provide evidence that the gene closest to the array-based GWAS index SNP is often the functional gene for blood lipid levels.


Subjects
Exome; Genetic Variation; Genome-Wide Association Study; Lipids/blood; Open Reading Frames; Alleles; Blood Glucose/genetics; Case-Control Studies; Computational Biology/methods; Databases, Genetic; Diabetes Mellitus, Type 2/genetics; Diabetes Mellitus, Type 2/metabolism; Genetic Predisposition to Disease; Genetics, Population; Genome-Wide Association Study/methods; Humans; Lipid Metabolism/genetics; Liver/metabolism; Liver/pathology; Molecular Sequence Annotation; Multifactorial Inheritance; Phenotype; Polymorphism, Single Nucleotide
2.
Genome Res ; 31(9): 1629-1637, 2021 09.
Article in English | MEDLINE | ID: mdl-34426515

ABSTRACT

The X Chromosome plays an important role in human development and disease. However, functional genomic and disease association studies of X genes greatly lag behind autosomal gene studies, in part owing to the unique biology of X-Chromosome inactivation (XCI). Because of XCI, most genes are only expressed from one allele. Yet, ∼30% of X genes "escape" XCI and are transcribed from both alleles, many only in a proportion of the population. Such interindividual differences are likely to be disease relevant, particularly for sex-biased disorders. To understand the functional biology for X-linked genes, we developed X-Chromosome inactivation for RNA-seq (XCIR), a novel approach to identify escape genes using bulk RNA-seq data. Our method, available as an R package, is more powerful than alternative approaches and is computationally efficient to handle large population-scale data sets. Using annotated XCI states, we examined the contribution of X-linked genes to the disease heritability in the United Kingdom Biobank data set. We show that escape and variable escape genes explain the largest proportion of X heritability, which is in large part attributable to X genes with Y homology. Finally, we investigated the role of each XCI state in sex-biased diseases and found that although XY homologous gene pairs have a larger overall effect size, enrichment for variable escape genes is significantly increased in female-biased diseases. Our results, for the first time, quantitate the importance of variable escape genes for the etiology of sex-biased disease, and our pipeline allows analysis of larger data sets for a broad range of phenotypes.


Subjects
Genes, X-Linked; X Chromosome Inactivation; Alleles; Animals; Female; Genomics; X Chromosome/genetics
3.
Psychol Med ; 52(5): 968-978, 2022 04.
Article in English | MEDLINE | ID: mdl-32762793

ABSTRACT

BACKGROUND: Substance use occurs at a high rate in persons with a psychiatric disorder. Genetically informative studies have the potential to elucidate the etiology of these phenomena. Recent developments in genome-wide association studies (GWAS) allow new avenues of investigation. METHOD: Using results of GWAS meta-analyses, we performed a factor analysis of the genetic correlation structure, a genome-wide search of shared loci, and causally informative tests for six substance use phenotypes (four smoking, one alcohol, and one cannabis use) and five psychiatric disorders (ADHD, anorexia, depression, bipolar disorder, and schizophrenia). RESULTS: Two correlated factors, externalizing and internalizing/psychosis, were found, although model fit was beneath conventional standards. Of 458 loci reported in previous univariate GWAS of substance use and psychiatric disorders, about 50% (230 loci) were pleiotropic, with an additional 111 pleiotropic loci not reported in past GWAS. Of the 341 pleiotropic loci, 152 were associated with both substance use and psychiatric disorders, implicating neurodevelopment, cell morphogenesis, biological adhesion pathways, and enrichment in 13 different brain tissues. Seventy-five and 114 pleiotropic loci were specific to either psychiatric disorders or substance use phenotypes, implicating neuronal signaling pathways and clathrin-binding functions/structures, respectively. No consistent evidence for phenotypic causation was found across different Mendelian randomization methods. CONCLUSIONS: The genetic etiology of substance use and psychiatric disorders is highly pleiotropic and involves shared neurodevelopmental pathways, neurotransmission, and intracellular trafficking. In aggregate, the patterns are not consistent with vertical pleiotropy, more likely reflecting horizontal pleiotropy or more complex forms of phenotypic causation.


Subjects
Mental Disorders; Schizophrenia; Substance-Related Disorders; Genetic Pleiotropy; Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Mental Disorders/epidemiology; Mental Disorders/genetics; Phenotype; Polymorphism, Single Nucleotide; Schizophrenia/epidemiology; Schizophrenia/genetics; Substance-Related Disorders/epidemiology; Substance-Related Disorders/genetics
4.
Bioinformatics ; 36(19): 4951-4954, 2020 12 08.
Article in English | MEDLINE | ID: mdl-32756942

ABSTRACT

SUMMARY: Here, we present a highly efficient R package, seqminer2, for querying and retrieving sequence variants from biobank-scale datasets of millions of individuals and hundreds of millions of genetic variants. Seqminer2 implements a novel variant-based index for querying VCF/BCF files. It improves the speed of query and retrieval by several orders of magnitude compared to the state-of-the-art tools based upon tabix. It also reimplements support for the BGEN and PLINK formats, which improves speed over alternative implementations. The improved efficiency and comprehensive support for popular file formats will facilitate method development, software prototyping and data analysis of biobank-scale sequence datasets in R. AVAILABILITY AND IMPLEMENTATION: The seqminer2 R package is available from https://github.com/zhanxw/seqminer. Scripts used for the benchmarks are available at https://github.com/yang-lina/seqminer/blob/master/seqminer2%20benchmark%20script.txt. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
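
The idea of a variant-based index can be illustrated with a minimal Python sketch. This is not seqminer2's implementation or file format: it assumes an uncompressed VCF held in memory, and the function names `build_index` and `fetch` are hypothetical. It only shows how mapping each variant to a byte offset lets a query jump straight to one record instead of scanning a region.

```python
import io


def build_index(handle):
    """Map (chrom, pos) to the byte offset of the VCF record line.

    Simplification: a later record at the same position overwrites an
    earlier one, so multi-record sites are not handled.
    """
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b"#"):
            continue  # skip meta-information and header lines
        chrom, pos = line.split(b"\t", 2)[:2]
        index[(chrom.decode(), int(pos))] = offset
    return index


def fetch(handle, index, chrom, pos):
    """Return the record at (chrom, pos) as text, or None if absent."""
    offset = index.get((chrom, pos))
    if offset is None:
        return None
    handle.seek(offset)
    return handle.readline().decode().rstrip("\n")


# Toy VCF with two variants (illustrative data only).
vcf = io.BytesIO(
    b"##fileformat=VCFv4.2\n"
    b"#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    b"1\t1000\trs1\tA\tG\t.\tPASS\t.\n"
    b"1\t2000\trs2\tC\tT\t.\tPASS\t.\n"
)
index = build_index(vcf)
record = fetch(vcf, index, "1", 2000)
```

A real index over compressed VCF/BCF files would have to store offsets into compressed blocks rather than plain byte positions; that detail is left out here.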


Subjects
Biological Specimen Banks; Software; Genotype; Humans
5.
Bioinformatics ; 36(12): 3811-3817, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32246825

ABSTRACT

MOTIVATION: Large-scale genome-wide association studies (GWAS) have identified a wide range of genetic variants related to a host of complex traits and disorders. Despite their success, the individual single-nucleotide polymorphism (SNP) analysis approach adopted in most current GWAS can be limited in that it is usually too biologically simplistic to elucidate a comprehensive genetic architecture of phenotypes and is statistically underpowered due to the heavy multiple-testing correction burden. On the other hand, multiple-SNP analyses (e.g. gene-based or region-based SNP-set analysis) are usually more powerful for examining the joint effects of a set of SNPs on the phenotype of interest. However, current multiple-SNP approaches can only draw an overall conclusion at the SNP-set level and do not directly indicate which SNPs in the SNP-set are driving the overall genotype-phenotype association. RESULTS: In this article, we propose a new permutation-assisted tuning procedure in lasso (plasso) to identify phenotype-associated SNPs in a joint multiple-SNP regression model in GWAS. The tuning parameter of lasso determines the amount of shrinkage and is essential to the performance of variable selection. In the proposed plasso procedure, we first generate permutations as pseudo-SNPs that are not associated with the phenotype. Then, the lasso tuning parameter is carefully chosen to separate true signal SNPs from non-informative pseudo-SNPs. We illustrate plasso using simulations to demonstrate its superior performance over existing methods, and application of plasso to a real GWAS dataset yields additional insights into the genetic control of complex traits. AVAILABILITY AND IMPLEMENTATION: R code to implement the proposed methodology is available at https://github.com/xyz5074/plasso. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
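
The permutation-assisted tuning idea can be sketched in a few lines of Python. This is one plausible reading of the abstract, not the authors' R code: the stopping rule (walk the penalty downward and stop as soon as any permuted pseudo-SNP enters the model), the penalty grid, and all function names are assumptions made for illustration.

```python
import random


def standardize(col):
    """Center a column and scale it to unit variance (population form)."""
    n = len(col)
    mean = sum(col) / n
    centered = [v - mean for v in col]
    scale = (sum(v * v for v in centered) / n) ** 0.5 or 1.0
    return [v / scale for v in centered]


def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso(columns, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1.

    Assumes every column is standardized and y is centered.
    """
    n = len(y)
    beta = [0.0] * len(columns)
    resid = list(y)
    for _ in range(n_iter):
        max_change = 0.0
        for j, col in enumerate(columns):
            rho = sum(c * r for c, r in zip(col, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < 1e-9:
            break
    return beta


def plasso_select(columns, y, n_lambdas=20, seed=0):
    """Return indices of real SNPs selected before any pseudo-SNP enters."""
    rng = random.Random(seed)
    real = [standardize(c) for c in columns]
    pseudo = []
    for col in real:
        shuffled = col[:]
        rng.shuffle(shuffled)  # permutation breaks any link to the phenotype
        pseudo.append(shuffled)
    mean_y = sum(y) / len(y)
    yc = [v - mean_y for v in y]
    design = real + pseudo
    n = len(yc)
    lam_max = max(abs(sum(c * v for c, v in zip(col, yc))) / n for col in design)
    selected = []
    for k in range(1, n_lambdas + 1):
        lam = lam_max * 0.85 ** k  # hypothetical geometric grid
        beta = lasso(design, yc, lam)
        active = [j for j, b in enumerate(beta) if b != 0.0]
        if any(j >= len(real) for j in active):
            break  # a pseudo-SNP entered: keep the previous selection
        selected = active
    return selected


# Simulated toy data: only SNP 0 affects the phenotype.
rng = random.Random(1)
n, p = 60, 6
genotypes = [[rng.choice([0, 1, 2]) for _ in range(n)] for _ in range(p)]
phenotype = [2.0 * genotypes[0][i] + rng.gauss(0, 0.5) for i in range(n)]
selected = plasso_select(genotypes, phenotype)
```

Because the pseudo-SNPs are known nulls, the point at which they start entering the model gives a data-driven signal that the penalty has become too weak, which is what lets this scheme avoid cross-validation.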


Subjects
Genome-Wide Association Study; Polymorphism, Single Nucleotide; Genetic Association Studies; Phenotype
6.
PLoS Genet ; 14(7): e1007452, 2018 07.
Article in English | MEDLINE | ID: mdl-30016313

ABSTRACT

Meta-analysis of genetic association studies increases sample size and the power for mapping complex traits. Existing methods are mostly developed for datasets without missing values, i.e. the summary association statistics are measured for all variants in contributing studies. In practice, genotype imputation is not always effective. This may be the case when targeted genotyping/sequencing assays are used or when the un-typed genetic variant is rare. Therefore, contributed summary statistics often contain missing values. Existing methods for imputing missing summary association statistics and using imputed values in meta-analysis, approximate conditional analysis, or simple strategies such as complete case analysis all have theoretical limitations. Applying these approaches can bias genetic effect estimates and lead to seriously inflated type-I or type-II errors in conditional analysis, which is a critical tool for identifying independently associated variants. To address this challenge and complement imputation methods, we developed a method to combine summary statistics across participating studies and consistently estimate joint effects, even when the contributed summary statistics contain large amounts of missing values. Based on this estimator, we proposed a score statistic called PCBS (partial correlation based score statistic) for conditional analysis of single-variant and gene-level associations. Through extensive analysis of simulated and real data, we showed that the new method produces well-calibrated type-I errors and is substantially more powerful than existing approaches. We applied the proposed approach to one of the largest meta-analyses to date for the cigarettes-per-day phenotype. Using the new method, we identified multiple novel independently associated variants at known loci for tobacco use, which were otherwise missed by alternative methods. Together, the phenotypic variance explained by these variants was 1.1%, improving that of previously reported associations by 71%. These findings illustrate the extent of locus allelic heterogeneity and can help pinpoint causal variants.
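
For context, the setting the abstract describes can be sketched with a standard fixed-effect, inverse-variance-weighted meta-analysis of one variant in which some studies did not report the variant. This is not the PCBS estimator: it is the simple per-variant approach of skipping missing studies, the function name `fixed_effect_meta` is hypothetical, and the numbers are illustrative only.

```python
import math


def fixed_effect_meta(estimates):
    """Combine per-study (beta, se) pairs by inverse-variance weighting.

    A study that did not measure the variant is given as None and skipped.
    Returns (beta, se, z), or None when no study reported the variant.
    """
    observed = [e for e in estimates if e is not None]
    if not observed:
        return None
    weights = [1.0 / se ** 2 for _, se in observed]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, observed)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se


# Three contributing studies; the second did not genotype this variant.
beta, se, z = fixed_effect_meta([(0.10, 0.05), None, (0.20, 0.10)])
```

Skipping studies variant by variant works for marginal effects, but it means different variants are summarized over different subsets of studies, which is the kind of inconsistency that complicates conditional analysis.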


Subjects
Data Analysis; Tobacco Products/statistics & numerical data; Tobacco Use/genetics; Alleles; Data Interpretation, Statistical; Datasets as Topic; Genetic Loci/genetics; Genome-Wide Association Study; Genotype; Humans; Phenotype; Polymorphism, Single Nucleotide
7.
Am J Hum Genet ; 101(1): 115-122, 2017 Jul 06.
Article in English | MEDLINE | ID: mdl-28669402

ABSTRACT

Massively parallel sequencing technologies provide great opportunities for discovering rare susceptibility variants involved in complex disease etiology via large-scale imputation and exome and whole-genome sequence-based association studies. Due to modest effect sizes, large sample sizes of tens to hundreds of thousands of individuals are required for adequately powered studies. Current analytical tools are obsolete when it comes to handling these large datasets. To facilitate the analysis of large-scale sequence-based studies, we developed SEQSpark which implements parallel processing based on Spark to increase the speed and efficiency of performing data quality control, annotation, and association analysis. To demonstrate the versatility and speed of SEQSpark, we analyzed whole-genome sequence data from the UK10K, testing for associations with waist-to-hip ratios. The analysis, which was completed in 1.5 hr, included loading data, annotation, principal component analysis, and single variant and rare variant aggregate association analysis of >9 million variants. For rare variant aggregate analysis, an exome-wide significant association (p < 2.5 × 10⁻⁶) was observed with CCDC62 (SKAT-O [p = 6.89 × 10⁻⁷], combined multivariate collapsing [p = 1.48 × 10⁻⁶], and burden of rare variants [p = 1.48 × 10⁻⁶]). SEQSpark was also used to analyze 50,000 simulated exomes and it required 1.75 hr for the analysis of a quantitative trait using several rare variant aggregate association methods. Additionally, the performance of SEQSpark was compared to Variant Association Tools and PLINK/SEQ. SEQSpark was always faster and in some situations computation was reduced to a hundredth of the time. SEQSpark will empower large sequence-based epidemiological studies to quickly elucidate genetic variation involved in the etiology of complex traits.
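
The exome-wide significance threshold quoted above matches a Bonferroni correction of a 0.05 significance level over roughly 20,000 gene-level tests. The abstract does not state how the threshold was derived, so the gene count below is an assumption. A short Python check of the arithmetic against the three reported CCDC62 p-values:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests


# Assumption: about 20,000 genes tested at a family-wise alpha of 0.05.
threshold = bonferroni_threshold(0.05, 20_000)

# p-values reported in the abstract for CCDC62.
reported = {
    "SKAT-O": 6.89e-7,
    "combined multivariate collapsing": 1.48e-6,
    "burden of rare variants": 1.48e-6,
}
significant = {name: p < threshold for name, p in reported.items()}
```

Under that assumption, all three tests fall below the threshold, consistent with the abstract's description of the association as exome-wide significant.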


Subjects
Databases, Nucleic Acid; Exome/genetics; Genetic Variation; Genome-Wide Association Study/methods; Sequence Analysis, DNA/methods; Software; Humans; Principal Component Analysis; Waist-Hip Ratio
8.
Am J Hum Genet ; 99(1): 40-55, 2016 Jul 07.
Article in English | MEDLINE | ID: mdl-27346686

ABSTRACT

Platelet production, maintenance, and clearance are tightly controlled processes indicative of platelets' important roles in hemostasis and thrombosis. Platelets are common targets for primary and secondary prevention of several conditions. They are monitored clinically by complete blood counts, specifically with measurements of platelet count (PLT) and mean platelet volume (MPV). Identifying genetic effects on PLT and MPV can provide mechanistic insights into platelet biology and their role in disease. Therefore, we formed the Blood Cell Consortium (BCX) to perform a large-scale meta-analysis of Exomechip association results for PLT and MPV in 157,293 and 57,617 individuals, respectively. Using the low-frequency/rare coding variant-enriched Exomechip genotyping array, we sought to identify genetic variants associated with PLT and MPV. In addition to confirming 47 known PLT and 20 known MPV associations, we identified 32 PLT and 18 MPV associations not previously observed in the literature across the allele frequency spectrum, including rare large effect (FCER1A), low-frequency (IQGAP2, MAP1A, LY75), and common (ZMIZ2, SMG6, PEAR1, ARFGAP3/PACSIN2) variants. Several variants associated with PLT/MPV (PEAR1, MRVI1, PTGES3) were also associated with platelet reactivity. In concurrent BCX analyses, there was overlap of platelet-associated variants with red (MAP1A, TMPRSS6, ZMIZ2) and white (PEAR1, ZMIZ2, LY75) blood cell traits, suggesting common regulatory pathways with shared genetic architecture among these hematopoietic lineages. Our large-scale Exomechip analyses identified previously undocumented associations with platelet traits and further indicate that several complex quantitative hematological, lipid, and cardiovascular traits share genetic factors.


Subjects
Blood Platelets/metabolism; Exome/genetics; Genetic Variation/genetics; Female; Genome-Wide Association Study; Humans; Male; Mean Platelet Volume; Platelet Count
9.
Nucleic Acids Res ; 45(9): e75, 2017 May 19.
Article in English | MEDLINE | ID: mdl-28115622

ABSTRACT

Whole genome sequencing (WGS) is a promising strategy to unravel variants or genes responsible for human diseases and traits. However, there is a lack of robust platforms for comprehensive downstream analysis. In the present study, we first proposed three novel algorithms (sequence gap-filled gene feature annotation, bit-block encoded genotypes, and sectional fast access to text lines) to address three fundamental problems. The three algorithms then formed the infrastructure of a robust parallel computing framework, KGGSeq, for integrating downstream analysis functions for whole genome sequencing data. KGGSeq has been equipped with a comprehensive set of analysis functions for quality control, filtration, annotation, pathogenic prediction and statistical tests. In tests with whole genome sequencing data from the 1000 Genomes Project, KGGSeq annotated several thousand more reliable non-synonymous variants than other widely used tools (e.g. ANNOVAR and SNPEff). It took only around half an hour on a small server with 10 CPUs to access genotypes of ∼60 million variants of 2504 subjects, while a popular alternative tool required around one day. KGGSeq's bit-block genotype format used 1.5% or less space to flexibly represent phased or unphased genotypes with multiple alleles and was over 1000 times faster at calculating genotypic correlation.
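
The space saving from bit-level genotype encoding can be illustrated with a minimal Python sketch. It is not KGGSeq's bit-block format: it assumes unphased biallelic genotypes stored at two bits each, and the `MISSING` code and function names are hypothetical.

```python
MISSING = 3  # hypothetical 2-bit code for a missing genotype call


def pack_genotypes(genotypes):
    """Pack codes 0, 1, 2 (alt-allele count) or MISSING, four per byte."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2, MISSING):
            raise ValueError(f"unsupported genotype code: {g}")
        packed[i // 4] |= g << (2 * (i % 4))
    return bytes(packed)


def unpack_genotypes(packed, n):
    """Recover the first n genotype codes from the packed bytes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]


calls = [0, 1, 2, MISSING, 2, 0]
blob = pack_genotypes(calls)
restored = unpack_genotypes(blob, len(calls))
```

Six genotypes fit in two bytes here, against at least six bytes in a one-character-per-call text encoding. Packed bits also allow many genotypes to be compared with a single bitwise operation, which is the general reason such formats speed up correlation calculations.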


Subjects
Algorithms; Genome, Human; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Humans
10.
Bioinformatics ; 32(9): 1423-6, 2016 05 01.
Article in English | MEDLINE | ID: mdl-27153000

ABSTRACT

MOTIVATION: Next-generation sequencing technologies have enabled the large-scale assessment of the impact of rare and low-frequency genetic variants for complex human diseases. Gene-level association tests are often performed to analyze rare variants, where multiple rare variants in a gene region are analyzed jointly. Applying gene-level association tests to analyze sequence data often requires integrating multiple heterogeneous sources of information (e.g. annotations, functional prediction scores, allele frequencies, genotypes and phenotypes) to determine the optimal analysis unit and prioritize causal variants. Given the complexity and scale of current sequence datasets and bioinformatics databases, there is a compelling need for more efficient software tools to facilitate these analyses. To answer this challenge, we developed RVTESTS, which implements a broad set of rare variant association statistics and supports the analysis of autosomal and X-linked variants for both unrelated and related individuals. RVTESTS also provides useful companion features for annotating sequence variants, integrating bioinformatics databases, performing data quality control and sample selection. We illustrate the advantages of RVTESTS in functionality and efficiency using the 1000 Genomes Project data. AVAILABILITY AND IMPLEMENTATION: RVTESTS is available on Linux, MacOS and Windows. Source code and executable files can be obtained at https://github.com/zhanxw/rvtests CONTACT: zhanxw@gmail.com; goncalo@umich.edu; dajiang.liu@outlook.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
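
A gene-level test that analyzes multiple rare variants jointly can be sketched as a simple burden analysis in Python. This is not RVTESTS code or its command-line interface: the 1% frequency cutoff, the function names, and the toy data are assumptions, and a plain regression slope stands in for a full test statistic.

```python
def burden_scores(variants, maf_cutoff=0.01):
    """Count rare minor alleles per individual across a gene's variants.

    variants: one list per variant of alt-allele counts (0, 1 or 2).
    """
    n = len(variants[0])
    scores = [0] * n
    for genotypes in variants:
        alt_freq = sum(genotypes) / (2 * n)
        if min(alt_freq, 1 - alt_freq) >= maf_cutoff:
            continue  # common variant: excluded from the burden
        for i, g in enumerate(genotypes):
            # Count the minor allele, whichever allele that is.
            scores[i] += g if alt_freq <= 0.5 else 2 - g
    return scores


def regression_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


# Toy gene: two rare variants and one common variant in 200 individuals.
n = 200
rare_a = [1 if i in (0, 1) else 0 for i in range(n)]  # frequency 0.005
rare_b = [1 if i == 2 else 0 for i in range(n)]       # frequency 0.0025
common = [i % 2 for i in range(n)]                    # frequency 0.25
phenotype = [2.0 if i < 3 else (0.9 if i % 2 else 1.1) for i in range(n)]

scores = burden_scores([rare_a, rare_b, common])
slope = regression_slope(scores, phenotype)
```

Collapsing variants into one score trades resolution for power: three carriers are too few to test each variant alone, but together they give a single predictor whose effect on the phenotype can be estimated.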


Subjects
Genetic Variation; Software; Animals; Genotype; High-Throughput Nucleotide Sequencing; Humans; Programming Languages