1.
Cell ; 185(16): 3041-3055.e25, 2022 08 04.
Article in English | MEDLINE | ID: mdl-35917817

ABSTRACT

Rare copy-number variants (rCNVs) include deletions and duplications that occur infrequently in the global human population and can confer substantial risk for disease. In this study, we aimed to quantify the properties of haploinsufficiency (i.e., deletion intolerance) and triplosensitivity (i.e., duplication intolerance) throughout the human genome. We harmonized and meta-analyzed rCNVs from nearly one million individuals to construct a genome-wide catalog of dosage sensitivity across 54 disorders, which defined 163 dosage sensitive segments associated with at least one disorder. These segments were typically gene dense and often harbored dominant dosage sensitive driver genes, which we were able to prioritize using statistical fine-mapping. Finally, we designed an ensemble machine-learning model to predict probabilities of dosage sensitivity (pHaplo & pTriplo) for all autosomal genes, which identified 2,987 haploinsufficient and 1,559 triplosensitive genes, including 648 that were uniquely triplosensitive. This dosage sensitivity resource will provide broad utility for human disease research and clinical genetics.


Subjects
DNA Copy Number Variations ; Genome, Human ; DNA Copy Number Variations/genetics ; Gene Dosage ; Haploinsufficiency/genetics ; Humans
2.
Cell ; 184(20): 5247-5260.e19, 2021 09 30.
Article in English | MEDLINE | ID: mdl-34534445

ABSTRACT

3' untranslated region (3'UTR) variants are strongly associated with human traits and diseases, yet few have been causally identified. We developed the massively parallel reporter assay for 3'UTRs (MPRAu) to sensitively assay 12,173 3'UTR variants. We applied MPRAu to six human cell lines, focusing on genetic variants associated with genome-wide association studies (GWAS) and human evolutionary adaptation. MPRAu expands our understanding of 3'UTR function, suggesting that simple sequences predominately explain 3'UTR regulatory activity. We adapt MPRAu to uncover diverse molecular mechanisms at base pair resolution, including an adenylate-uridylate (AU)-rich element of LEPR linked to potential metabolic evolutionary adaptations in East Asians. We nominate hundreds of 3'UTR causal variants with genetically fine-mapped phenotype associations. Using endogenous allelic replacements, we characterize one variant that disrupts a miRNA site regulating the viral defense gene TRIM14 and one that alters PILRB abundance, nominating a causal variant underlying transcriptional changes in age-related macular degeneration.


Subjects
3' Untranslated Regions/genetics ; Biological Evolution ; Disease/genetics ; Genome-Wide Association Study ; Algorithms ; Alleles ; Gene Expression Regulation ; Genes, Reporter ; Genetic Variation ; Humans ; Phenotype ; Polymorphism, Single Nucleotide/genetics ; Polyribosomes/metabolism ; Quantitative Trait Loci/genetics ; RNA/genetics
3.
Cell ; 162(4): 738-50, 2015 Aug 13.
Article in English | MEDLINE | ID: mdl-26276630

ABSTRACT

The 2013-2015 West African epidemic of Ebola virus disease (EVD) reminds us of how little is known about biosafety level 4 viruses. Like Ebola virus, Lassa virus (LASV) can cause hemorrhagic fever with high case fatality rates. We generated a genomic catalog of almost 200 LASV sequences from clinical and rodent reservoir samples. We show that whereas the 2013-2015 EVD epidemic is fueled by human-to-human transmissions, LASV infections mainly result from reservoir-to-human infections. We elucidated the spread of LASV across West Africa and show that this migration was accompanied by changes in LASV genome abundance, fatality rates, codon adaptation, and translational efficiency. By investigating intrahost evolution, we found that mutations accumulate in epitopes of viral surface proteins, suggesting selection for immune escape. This catalog will serve as a foundation for the development of vaccines and diagnostics.


Subjects
Genome, Viral ; Lassa Fever/virology ; Lassa virus/genetics ; RNA, Viral/genetics ; Africa, Western/epidemiology ; Animals ; Biological Evolution ; Disease Reservoirs ; Ebolavirus/genetics ; Genetic Variation ; Glycoproteins/genetics ; Hemorrhagic Fever, Ebola/virology ; Humans ; Lassa Fever/epidemiology ; Lassa Fever/transmission ; Lassa virus/classification ; Lassa virus/physiology ; Murinae/genetics ; Mutation ; Nigeria/epidemiology ; Viral Proteins/genetics ; Zoonoses/epidemiology ; Zoonoses/virology
4.
Nature ; 626(8000): 799-807, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38326615

ABSTRACT

Linking variants from genome-wide association studies (GWAS) to underlying mechanisms of disease remains a challenge [1-3]. For some diseases, a successful strategy has been to look for cases in which multiple GWAS loci contain genes that act in the same biological pathway [1-6]. However, our knowledge of which genes act in which pathways is incomplete, particularly for cell-type-specific pathways or understudied genes. Here we introduce a method to connect GWAS variants to functions. This method links variants to genes using epigenomics data, links genes to pathways de novo using Perturb-seq and integrates these data to identify convergence of GWAS loci onto pathways. We apply this approach to study the role of endothelial cells in genetic risk for coronary artery disease (CAD), and discover 43 CAD GWAS signals that converge on the cerebral cavernous malformation (CCM) signalling pathway. Two regulators of this pathway, CCM2 and TLNRD1, are each linked to a CAD risk variant, regulate other CAD risk genes and affect atheroprotective processes in endothelial cells. These results suggest a model whereby CAD risk is driven in part by the convergence of causal genes onto a particular transcriptional pathway in endothelial cells. They highlight shared genes between common and rare vascular diseases (CAD and CCM), and identify TLNRD1 as a new, previously uncharacterized member of the CCM signalling pathway. This approach will be widely useful for linking variants to functions for other common polygenic diseases.


Subjects
Coronary Artery Disease ; Endothelial Cells ; Genome-Wide Association Study ; Hemangioma, Cavernous, Central Nervous System ; Humans ; Coronary Artery Disease/genetics ; Coronary Artery Disease/pathology ; Endothelial Cells/metabolism ; Endothelial Cells/pathology ; Genetic Predisposition to Disease/genetics ; Hemangioma, Cavernous, Central Nervous System/genetics ; Hemangioma, Cavernous, Central Nervous System/pathology ; Polymorphism, Single Nucleotide ; Epigenomics ; Signal Transduction/genetics ; Multifactorial Inheritance
5.
Nature ; 593(7858): 238-243, 2021 05.
Article in English | MEDLINE | ID: mdl-33828297

ABSTRACT

Genome-wide association studies (GWAS) have identified thousands of noncoding loci that are associated with human diseases and complex traits, each of which could reveal insights into the mechanisms of disease [1]. Many of the underlying causal variants may affect enhancers [2,3], but we lack accurate maps of enhancers and their target genes to interpret such variants. We recently developed the activity-by-contact (ABC) model to predict which enhancers regulate which genes and validated the model using CRISPR perturbations in several cell types [4]. Here we apply this ABC model to create enhancer-gene maps in 131 human cell types and tissues, and use these maps to interpret the functions of GWAS variants. Across 72 diseases and complex traits, ABC links 5,036 GWAS signals to 2,249 unique genes, including a class of 577 genes that appear to influence multiple phenotypes through variants in enhancers that act in different cell types. In inflammatory bowel disease (IBD), causal variants are enriched in predicted enhancers by more than 20-fold in particular cell types such as dendritic cells, and ABC achieves higher precision than other regulatory methods at connecting noncoding variants to target genes. These variant-to-function maps reveal an enhancer that contains an IBD risk variant and that regulates the expression of PPIF to alter the membrane potential of mitochondria in macrophages. Our study reveals principles of genome regulation, identifies genes that affect IBD and provides a resource and generalizable strategy to connect risk variants of common diseases to their molecular and cellular functions.


Subjects
Enhancer Elements, Genetic/genetics ; Genetic Predisposition to Disease ; Genetic Variation/genetics ; Genome, Human/genetics ; Genome-Wide Association Study ; Inflammatory Bowel Diseases/genetics ; Cell Line ; Chromosomes, Human, Pair 10/genetics ; Cyclophilins/genetics ; Dendritic Cells ; Female ; Humans ; Macrophages/metabolism ; Male ; Mitochondria/metabolism ; Organ Specificity/genetics ; Phenotype
6.
PLoS Genet ; 19(9): e1010932, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37721944

ABSTRACT

The eQTL Catalogue is an open database of uniformly processed human molecular quantitative trait loci (QTLs). We are continuously updating the resource to further increase its utility for interpreting genetic associations with complex traits. Over the past two years, we have increased the number of uniformly processed studies from 21 to 31 and added X chromosome QTLs for 19 compatible studies. We have also implemented Leafcutter to directly identify splice-junction usage QTLs in all RNA sequencing datasets. Finally, to improve the interpretability of transcript-level QTLs, we have developed static QTL coverage plots that visualise the association between the genotype and average RNA sequencing read coverage in the region for all 1.7 million fine mapped associations. To illustrate the utility of these updates to the eQTL Catalogue, we performed colocalisation analysis between vitamin D levels in the UK Biobank and all molecular QTLs in the eQTL Catalogue. Although most GWAS loci colocalised both with eQTLs and transcript-level QTLs, we found that visual inspection could sometimes be used to distinguish primary splicing QTLs from those that appear to be secondary consequences of large-effect gene expression QTLs. While these visually confirmed primary splicing QTLs explain just 6/53 of the colocalising signals, they are significantly less pleiotropic than eQTLs and identify a prioritised causal gene in 4/6 cases.


Subjects
Multifactorial Inheritance ; Quantitative Trait Loci ; Humans ; Quantitative Trait Loci/genetics ; Genotype ; Base Sequence ; Genome-Wide Association Study ; Polymorphism, Single Nucleotide
7.
Nature ; 559(7714): 350-355, 2018 07.
Article in English | MEDLINE | ID: mdl-29995854

ABSTRACT

The selective pressures that shape clonal evolution in healthy individuals are largely unknown. Here we investigate 8,342 mosaic chromosomal alterations, from 50 kb to 249 Mb long, that we uncovered in blood-derived DNA from 151,202 UK Biobank participants using phase-based computational techniques (estimated false discovery rate, 6-9%). We found six loci at which inherited variants associated strongly with the acquisition of deletions or loss of heterozygosity in cis. At three such loci (MPL, TM2D3-TARSL2, and FRA10B), we identified a likely causal variant that acted with high penetrance (5-50%). Inherited alleles at one locus appeared to affect the probability of somatic mutation, and at three other loci to be objects of positive or negative clonal selection. Several specific mosaic chromosomal alterations were strongly associated with future haematological malignancies. Our results reveal a multitude of paths towards clonal expansions with a wide range of effects on human health.


Subjects
Chromosome Aberrations ; Clone Cells/cytology ; Clone Cells/metabolism ; Hematopoiesis/genetics ; Mosaicism ; Adult ; Aged ; Alleles ; Biological Specimen Banks ; Chromosome Breakage ; Chromosome Fragile Sites/genetics ; Chromosomes, Human, Pair 10/genetics ; Female ; Health ; Hematologic Neoplasms/genetics ; Hematologic Neoplasms/mortality ; Humans ; Male ; Middle Aged ; Penetrance ; United Kingdom
8.
Hum Mol Genet ; 30(16): 1521-1534, 2021 07 28.
Article in English | MEDLINE | ID: mdl-33987664

ABSTRACT

It is important to study the genetics of complex traits in diverse populations. Here, we introduce covariate-adjusted linkage disequilibrium (LD) score regression (cov-LDSC), a method to estimate SNP-heritability ($h_g^2$) and its enrichment in homogeneous and admixed populations with summary statistics and in-sample LD estimates. In-sample LD can be estimated from a subset of the genome-wide association study samples, allowing our method to be applied efficiently to very large cohorts. In simulations, we show that unadjusted LDSC underestimates $h_g^2$ by 10-60% in admixed populations; in contrast, cov-LDSC is robustly accurate. We apply cov-LDSC to genotyping data from 8124 individuals, mostly of admixed ancestry, from the Slim Initiative in Genomic Medicine for the Americas study, and to approximately 161 000 Latino-ancestry individuals, 47 000 African American-ancestry individuals and 135 000 European-ancestry individuals, as classified by 23andMe. We estimate $h_g^2$ and detect heritability enrichment in three quantitative and five dichotomous phenotypes, making this, to our knowledge, the most comprehensive heritability-based analysis of admixed individuals to date. Most traits have high concordance of $h_g^2$ and consistent tissue-specific heritability enrichment among different populations. However, for age at menarche, we observe population-specific heritability estimates of $h_g^2$. We observe consistent patterns of tissue-specific heritability enrichment across populations; for example, in the limbic system for BMI, the per-standardized-annotation effect size $\tau^*$ is 0.16 ± 0.04, 0.28 ± 0.11 and 0.18 ± 0.03 in the Latino-, African American- and European-ancestry populations, respectively. Our approach is a powerful way to analyze genetic data for complex traits from admixed populations.


Subjects
Genetics, Population ; Genome-Wide Association Study/statistics & numerical data ; Linkage Disequilibrium/genetics ; Multifactorial Inheritance/genetics ; Genotyping Techniques/statistics & numerical data ; Humans ; Phenotype ; Polymorphism, Single Nucleotide/genetics ; Quantitative Trait, Heritable
9.
Nature ; 551(7678): 92-94, 2017 11 02.
Article in English | MEDLINE | ID: mdl-29059683

ABSTRACT

Breast cancer risk is influenced by rare coding variants in susceptibility genes, such as BRCA1, and many common, mostly non-coding variants. However, much of the genetic contribution to breast cancer risk remains unknown. Here we report the results of a genome-wide association study of breast cancer in 122,977 cases and 105,974 controls of European ancestry and 14,068 cases and 13,104 controls of East Asian ancestry. We identified 65 new loci that are associated with overall breast cancer risk at $P < 5 \times 10^{-8}$. The majority of credible risk single-nucleotide polymorphisms in these loci fall in distal regulatory elements, and by integrating in silico data to predict target genes in breast cells at each locus, we demonstrate a strong overlap between candidate target genes and somatic driver genes in breast tumours. We also find that heritability of breast cancer due to all single-nucleotide polymorphisms in regulatory features was 2-5-fold enriched relative to the genome-wide average, with strong enrichment for particular transcription factor binding sites. These results provide further insight into genetic susceptibility to breast cancer and will improve the use of genetic risk scores for individualized screening and prevention.


Subjects
Breast Neoplasms/genetics ; Genetic Loci ; Genetic Predisposition to Disease/genetics ; Genome-Wide Association Study ; Asia/ethnology ; Asian People/genetics ; Binding Sites/genetics ; Breast Neoplasms/diagnosis ; Computer Simulation ; Europe/ethnology ; Female ; Humans ; Multifactorial Inheritance/genetics ; Polymorphism, Single Nucleotide/genetics ; Regulatory Sequences, Nucleic Acid ; Risk Assessment ; Transcription Factors/metabolism ; White People/genetics
10.
Hum Mol Genet ; 29(7): 1057-1067, 2020 05 08.
Article in English | MEDLINE | ID: mdl-31595288

ABSTRACT

Regulatory variation plays a major role in complex disease, and cell type-specific binding of transcription factors (TFs) is critical to gene regulation. However, assessing the contribution of genetic variation in TF-binding sites to disease heritability is challenging, as binding is often cell type-specific and annotations from directly measured TF binding are not currently available for most cell type-TF pairs. We investigate approaches to annotate TF binding, including directly measured chromatin data and sequence-based predictions. We find that TF-binding annotations constructed by intersecting sequence-based TF-binding predictions with cell type-specific chromatin data explain a large fraction of heritability across a broad set of diseases and corresponding cell types; this strategy of constructing annotations addresses both the limitation that identical sequences may be bound or unbound depending on surrounding chromatin context and the limitation that sequence-based predictions are generally not cell type-specific. We partitioned the heritability of 49 diseases and complex traits using stratified linkage disequilibrium (LD) score regression with the baseline-LD model (which is not cell type-specific) plus the new annotations. We determined that 100 bp windows around MotifMap sequence-based TF-binding predictions intersected with a union of six cell type-specific chromatin marks (imputed using ChromImpute) performed best, with a 58% increase in heritability enrichment compared to the chromatin marks alone (11.6× vs. 7.3×, $P = 9 \times 10^{-14}$ for difference) and a 20% increase in cell type-specific signal conditional on annotations from the baseline-LD model ($P = 8 \times 10^{-11}$ for difference). Our results show that TF-binding annotations explain substantial disease heritability and can help refine genome-wide association signals.


Subjects
Chromatin/genetics ; Genetic Diseases, Inborn/genetics ; Molecular Sequence Annotation ; Transcription Factors/genetics ; Binding Sites/genetics ; Computational Biology ; Gene Expression Regulation/genetics ; Genetic Diseases, Inborn/classification ; Genetic Diseases, Inborn/pathology ; Humans ; Linkage Disequilibrium/genetics ; Multifactorial Inheritance/genetics ; Polymorphism, Single Nucleotide/genetics ; Protein Binding/genetics
11.
Am J Hum Genet ; 104(5): 896-913, 2019 05 02.
Article in English | MEDLINE | ID: mdl-31051114

ABSTRACT

Recent studies have highlighted the role of gene networks in disease biology. To formally assess this, we constructed a broad set of pathway, network, and pathway+network annotations and applied stratified LD score regression to 42 diseases and complex traits (average N = 323K) to identify enriched annotations. First, we analyzed 18,119 biological pathways. We identified 156 pathway-trait pairs whose disease enrichment was statistically significant (FDR < 5%) after conditioning on all genes and 75 known functional annotations (from the baseline-LD model), a stringent step that greatly reduced the number of pathways detected; most significant pathway-trait pairs were previously unreported. Next, for each of four published gene networks, we constructed probabilistic annotations based on network connectivity. For each gene network, the network connectivity annotation was strongly significantly enriched. Surprisingly, the enrichments were fully explained by excess overlap between network annotations and regulatory annotations from the baseline-LD model, validating the informativeness of the baseline-LD model and emphasizing the importance of accounting for regulatory annotations in gene network analyses. Finally, for each of the 156 enriched pathway-trait pairs, for each of the four gene networks, we constructed pathway+network annotations by annotating genes with high network connectivity to the input pathway. For each gene network, these pathway+network annotations were strongly significantly enriched for the corresponding traits. Once again, the enrichments were largely explained by the baseline-LD model. In conclusion, gene network connectivity is highly informative for disease architectures, but the information in gene networks may be subsumed by regulatory annotations, emphasizing the importance of accounting for known annotations.


Subjects
Computational Biology/methods ; Gene Regulatory Networks ; Genes/genetics ; Genetic Diseases, Inborn/genetics ; Multifactorial Inheritance/genetics ; Polymorphism, Single Nucleotide ; Quantitative Trait, Heritable ; Humans ; Molecular Sequence Annotation ; Phenotype ; Software
12.
Genet Epidemiol ; 43(2): 180-188, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30474154

ABSTRACT

Recent studies have examined the genetic correlations of single-nucleotide polymorphism (SNP) effect sizes across pairs of populations to better understand the genetic architectures of complex traits. These studies have estimated $\rho_g$, the cross-population correlation of joint-fit effect sizes at genotyped SNPs. However, the value of $\rho_g$ depends both on the cross-population correlation of true causal effect sizes ($\rho_b$) and on the similarity in linkage disequilibrium (LD) patterns in the two populations, which drive tagging effects. Here, we derive the value of the ratio $\rho_g / \rho_b$ as a function of LD in each population. By applying existing methods to obtain estimates of $\rho_g$, we can use this ratio to estimate $\rho_b$. Our estimates of $\rho_b$ were equal to 0.55 (SE = 0.14) between Europeans and East Asians averaged across nine traits in the Genetic Epidemiology Research on Adult Health and Aging data set, 0.54 (SE = 0.18) between Europeans and South Asians averaged across 13 traits in the UK Biobank data set, and 0.48 (SE = 0.06) and 0.65 (SE = 0.09) between Europeans and East Asians in summary statistic data sets for type 2 diabetes and rheumatoid arthritis, respectively. These results implicate substantially different causal genetic architectures across continental populations.


Subjects
Genetics, Population ; Adult ; Aging/genetics ; Arthritis, Rheumatoid/genetics ; Biological Specimen Banks ; Databases, Genetic ; Diabetes Mellitus, Type 2/genetics ; Genotype ; Humans ; Phenotype ; Quantitative Trait, Heritable ; United Kingdom
13.
Am J Hum Genet ; 100(4): 605-616, 2017 Apr 06.
Article in English | MEDLINE | ID: mdl-28343628

ABSTRACT

Genetic variants that modulate gene expression levels play an important role in the etiology of human diseases and complex traits. Although large-scale eQTL mapping studies routinely identify many local eQTLs, the molecular mechanisms by which genetic variants regulate expression remain unclear, particularly for distal eQTLs, which these studies are not well powered to detect. Here, we leveraged all variants (not just those that pass stringent significance thresholds) to analyze the functional architecture of local and distal regulation of gene expression in 15 human tissues by employing an extension of stratified LD-score regression that produces robust results in simulations. The top enriched functional categories in local regulation of peripheral-blood gene expression included coding regions (11.41×), conserved regions (4.67×), and four histone marks ($p < 5 \times 10^{-5}$ for all enrichments); local enrichments were similar across the 15 tissues. We also observed substantial enrichments for distal regulation of peripheral-blood gene expression: coding regions (4.47×), conserved regions (4.51×), and two histone marks ($p < 3 \times 10^{-7}$ for all enrichments). Analyses of the genetic correlation of gene expression across tissues confirmed that local regulation of gene expression is largely shared across tissues but that distal regulation is highly tissue specific. Our results elucidate the functional components of the genetic architecture of local and distal regulation of gene expression.


Subjects
Gene Expression Regulation ; Anxiety/genetics ; Computer Simulation ; Depression/genetics ; Humans ; Linkage Disequilibrium ; Organ Specificity ; Quantitative Trait Loci ; Regression Analysis ; Twins/genetics
14.
Am J Med Genet B Neuropsychiatr Genet ; 180(6): 428-438, 2019 09.
Article in English | MEDLINE | ID: mdl-30593698

ABSTRACT

Anorexia nervosa (AN) occurs nine times more often in females than in males. Although environmental factors likely play a role, the reasons for this imbalanced sex ratio remain unresolved. AN displays high genetic correlations with anthropometric and metabolic traits. Given sex differences in body composition, we investigated the possible metabolic underpinnings of female propensity for AN. We conducted sex-specific GWAS in a healthy and medication-free subsample of the UK Biobank (n = 155,961), identifying 77 genome-wide significant loci associated with body fat percentage (BF%) and 174 with fat-free mass (FFM). Partitioned heritability analysis showed an enrichment for central nervous tissue-associated genes for BF%, which was more prominent in females than males. Genetic correlations of BF% and FFM with the largest GWAS of AN by the Psychiatric Genomics Consortium were estimated to explore shared genomics. The genetic correlations of BF%$_{\text{male}}$ and BF%$_{\text{female}}$ with AN differed significantly from each other (p < .0001, δ = -0.17), suggesting that the female preponderance in AN may, in part, be explained by sex-specific anthropometric and metabolic genetic factors increasing liability to AN.


Subjects
Anorexia Nervosa/genetics ; Anorexia Nervosa/metabolism ; Body Composition/genetics ; Adipose Tissue/metabolism ; Adult ; Anorexia Nervosa/physiopathology ; Body Mass Index ; Case-Control Studies ; Databases, Genetic ; Female ; Genetic Predisposition to Disease/genetics ; Genome-Wide Association Study/methods ; Genomics/methods ; Humans ; Male ; Middle Aged ; Phenotype ; Sex Factors
15.
Am J Hum Genet ; 97(6): 775-89, 2015 Dec 03.
Article in English | MEDLINE | ID: mdl-26581902

ABSTRACT

The rate at which human genomes mutate is a central biological parameter that has many implications for our ability to understand demographic and evolutionary phenomena. We present a method for inferring mutation and gene-conversion rates by using the number of sequence differences observed in identical-by-descent (IBD) segments together with a reconstructed model of recent population-size history. This approach is robust to, and can quantify, the presence of substantial genotyping error, as validated in coalescent simulations. We applied the method to 498 trio-phased sequenced Dutch individuals and inferred a point mutation rate of $1.66 \times 10^{-8}$ per base per generation and a rate of $1.26 \times 10^{-9}$ for <20 bp indels. By quantifying how estimates varied as a function of allele frequency, we inferred the probability that a site is involved in non-crossover gene conversion as $5.99 \times 10^{-6}$. We found that recombination does not have observable mutagenic effects after gene conversion is accounted for and that local gene-conversion rates reflect recombination rates. We detected a strong enrichment of recent deleterious variation among mismatching variants found within IBD regions and observed summary statistics of local sharing of IBD segments to closely match previously proposed metrics of background selection; however, we found no significant effects of selection on our mutation-rate estimates. We detected no evidence of strong variation of mutation rates in a number of genomic annotations obtained from several recent studies. Our analysis suggests that a mutation-rate estimate higher than that reported by recent pedigree-based studies should be adopted in the context of DNA-based demographic reconstruction.


Subjects
Genome, Human ; Germ-Line Mutation ; Models, Genetic ; Mutation Rate ; Alleles ; Gene Frequency ; Haplotypes ; Humans ; INDEL Mutation ; Linear Models ; Recombination, Genetic
16.
Am J Hum Genet ; 97(4): 576-92, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26430803

ABSTRACT

Polygenic risk scores have shown great promise in predicting complex disease risk and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves linkage disequilibrium (LD)-based marker pruning and applying a p value threshold to association statistics, but this discards information and can reduce predictive accuracy. We introduce LDpred, a method that infers the posterior mean effect size of each marker by using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the approach of pruning followed by thresholding, particularly at large sample sizes. Accordingly, predicted $R^2$ increased from 20.1% to 25.3% in a large schizophrenia dataset and from 9.8% to 12.0% in a large multiple sclerosis dataset. A similar relative improvement in accuracy was observed for three additional large disease datasets and for non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
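
To make the phrase "posterior mean effect size" concrete, the following Python sketch works through the simplest special case: a Gaussian (infinitesimal) prior on effects and no LD between markers. It is an illustration only, not the LDpred implementation. The function names `shrink_effects_no_ld` and `polygenic_score`, the assumption of standardized genotypes and phenotype, and the toy numbers are all assumptions introduced here. With prior $\beta_j \sim N(0, h^2/M)$ and sampling model $\hat\beta_j \mid \beta_j \sim N(\beta_j, 1/N)$, the normal-normal posterior mean is $\hat\beta_j \cdot h^2 / (h^2 + M/N)$.

```python
import numpy as np


def shrink_effects_no_ld(beta_hat, n_samples, h2):
    """Posterior mean effects under a Gaussian prior, ignoring LD.

    Assumes (hypothetically) standardized genotypes and phenotype, so that
    each marginal estimate has sampling variance of about 1 / n_samples and
    each true effect has prior variance h2 / M.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    n_markers = beta_hat.size
    # Normal-normal shrinkage: prior_var / (prior_var + sampling_var),
    # rewritten as h2 / (h2 + M / N).
    shrink = h2 / (h2 + n_markers / n_samples)
    return shrink * beta_hat


def polygenic_score(genotypes, weights):
    """Weighted sum of genotypes (rows = individuals, columns = markers)."""
    return np.asarray(genotypes, dtype=float) @ np.asarray(weights, dtype=float)


# Toy example: 2 markers, 4 samples, h2 = 0.5, so the shrinkage factor is 0.5.
toy_beta_hat = np.array([0.2, -0.4])
toy_weights = shrink_effects_no_ld(toy_beta_hat, n_samples=4, h2=0.5)
toy_scores = polygenic_score([[1.0, 0.0], [0.0, 2.0]], toy_weights)
```

The shrinkage factor approaches 1 as the sample size grows relative to the number of markers, which is consistent with the abstract's point that discarding information matters more at large sample sizes. Handling LD requires the reference-panel step described in the abstract and is deliberately left out of this sketch.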


Subjects
Linkage Disequilibrium/genetics ; Models, Theoretical ; Multifactorial Inheritance/genetics ; Multiple Sclerosis/genetics ; Polymorphism, Single Nucleotide/genetics ; Schizophrenia/genetics ; Genome-Wide Association Study ; Genotype ; Humans ; Phenotype ; Prognosis ; Quantitative Trait Loci
17.
Bioinformatics ; 33(2): 272-279, 2017 01 15.
Article in English | MEDLINE | ID: mdl-27663502

ABSTRACT

MOTIVATION: LD score regression is a reliable and efficient method of using genome-wide association study (GWAS) summary-level results data to estimate the SNP heritability of complex traits and diseases, partition this heritability into functional categories, and estimate the genetic correlation between different phenotypes. Because the method relies on summary level results data, LD score regression is computationally tractable even for very large sample sizes. However, publicly available GWAS summary-level data are typically stored in different databases and have different formats, making it difficult to apply LD score regression to estimate genetic correlations across many different traits simultaneously. RESULTS: In this manuscript, we describe LD Hub - a centralized database of summary-level GWAS results for 173 diseases/traits from different publicly available resources/consortia and a web interface that automates the LD score regression analysis pipeline. To demonstrate functionality and validate our software, we replicated previously reported LD score regression analyses of 49 traits/diseases using LD Hub; and estimated SNP heritability and the genetic correlation across the different phenotypes. We also present new results obtained by uploading a recent atopic dermatitis GWAS meta-analysis to examine the genetic correlation between the condition and other potentially related traits. In response to the growing availability of publicly accessible GWAS summary-level results data, our database and the accompanying web interface will ensure maximal uptake of the LD score regression methodology, provide a useful database for the public dissemination of GWAS results, and provide a method for easily screening hundreds of traits for overlapping genetic aetiologies. 
AVAILABILITY AND IMPLEMENTATION: The web interface and instructions for using LD Hub are available at http://ldsc.broadinstitute.org/ CONTACT: jie.zheng@bristol.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Databases, Nucleic Acid , Genetic Diseases, Inborn/genetics , Genome-Wide Association Study/methods , Linkage Disequilibrium , Phenotype , Polymorphism, Single Nucleotide , Female , Genetic Predisposition to Disease , Humans , Male , Sample Size , Software
19.
Am J Hum Genet ; 95(5): 535-52, 2014 Nov 06.
Article in English | MEDLINE | ID: mdl-25439723

ABSTRACT

Regulatory and coding variants are known to be enriched with associations identified by genome-wide association studies (GWASs) of complex disease, but their contributions to trait heritability are currently unknown. We applied variance-component methods to imputed genotype data for 11 common diseases to partition the heritability explained by genotyped SNPs ($h_g^2$) across functional categories (while accounting for shared variance due to linkage disequilibrium). Extensive simulations showed that in contrast to current estimates from GWAS summary statistics, the variance-component approach partitions heritability accurately under a wide range of complex-disease architectures. Across the 11 diseases, DNaseI hypersensitivity sites (DHSs) from 217 cell types spanned 16% of imputed SNPs (and 24% of genotyped SNPs) but explained an average of 79% (SE = 8%) of $h_g^2$ from imputed SNPs (5.1× enrichment; $p = 3.7 \times 10^{-17}$) and 38% (SE = 4%) of $h_g^2$ from genotyped SNPs (1.6× enrichment; $p = 1.0 \times 10^{-4}$). Further enrichment was observed at enhancer DHSs and cell-type-specific DHSs. In contrast, coding variants, which span 1% of the genome, explained <10% of $h_g^2$ despite having the highest enrichment. We replicated these findings but found no significant contribution from rare coding variants in independent schizophrenia cohorts genotyped on GWAS and exome chips. Our results highlight the value of analyzing components of heritability to unravel the functional architecture of common disease.


Subjects
Genetic Diseases, Inborn/genetics , Genetic Variation/genetics , Genome-Wide Association Study/methods , Inheritance Patterns/genetics , Open Reading Frames/genetics , Regulatory Elements, Transcriptional/genetics , Computer Simulation , Humans , Models, Genetic
20.
Nat Genet ; 56(1): 162-169, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38036779

ABSTRACT

Fine-mapping aims to identify causal genetic variants for phenotypes. Bayesian fine-mapping algorithms (for example, SuSiE, FINEMAP, ABF and COJO-ABF) are widely used, but assessing posterior probability calibration remains challenging in real data, where model misspecification probably exists, and true causal variants are unknown. We introduce replication failure rate (RFR), a metric to assess fine-mapping consistency by downsampling. SuSiE, FINEMAP and COJO-ABF show high RFR, indicating potential overconfidence in their output. Simulations reveal that nonsparse genetic architecture can lead to miscalibration, while imputation noise, nonuniform distribution of causal variants and quality control filters have minimal impact. Here we present SuSiE-inf and FINEMAP-inf, fine-mapping methods modeling infinitesimal effects alongside fewer larger causal effects. Our methods show improved calibration, RFR and functional enrichment, competitive recall and computational efficiency. Notably, using our methods' posterior effect sizes substantially increases polygenic risk score accuracy over SuSiE and FINEMAP. Our work improves causal variant identification for complex traits, a fundamental goal of human genetics.


Subjects
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Humans , Bayes Theorem , Multifactorial Inheritance , Algorithms