Results 1 - 20 of 18,823
1.
Cell ; 182(2): 317-328.e10, 2020 07 23.
Article in English | MEDLINE | ID: mdl-32526205

ABSTRACT

Hepatocellular carcinoma (HCC) is an aggressive malignancy with its global incidence and mortality rate continuing to rise, although early detection and surveillance are suboptimal. We performed serological profiling of the viral infection history in 899 individuals from an NCI-UMD case-control study using a synthetic human virome, VirScan. We developed a viral exposure signature and validated the results in a longitudinal cohort with 173 at-risk patients who had long-term follow-up for HCC development. Our viral exposure signature significantly associated with HCC status among at-risk individuals in the validation cohort (area under the curve: 0.91 [95% CI 0.87-0.96] at baseline and 0.98 [95% CI 0.97-1] at diagnosis). The signature identified cancer patients prior to a clinical diagnosis and was superior to alpha-fetoprotein. In summary, we established a viral exposure signature that can predict HCC among at-risk patients prior to a clinical diagnosis, which may be useful in HCC surveillance.


Subjects
Hepatocellular Carcinoma/pathology , Liver Neoplasms/pathology , Virus Diseases/pathology , Adult , Aged , Area Under Curve , Hepatocellular Carcinoma/genetics , Hepatocellular Carcinoma/metabolism , Case-Control Studies , Cohort Studies , Genetic Databases , Female , Genome-Wide Association Study , Humans , Linkage Disequilibrium , Liver Neoplasms/genetics , Liver Neoplasms/metabolism , Male , Middle Aged , Single Nucleotide Polymorphism , ROC Curve , Risk Factors , Virus Diseases/complications , Young Adult , alpha-Fetoproteins/analysis
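
The area under the curve reported in the abstract above can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen control. The following is a minimal sketch of that rank-based definition; the function name `auc` and the score values are hypothetical illustrations, not data or code from the study.

```python
def auc(scores_pos, scores_neg):
    """Return the area under the ROC curve via pairwise comparison.

    Each (case, control) pair counts 1 if the case scores higher,
    0.5 if the two scores tie, and 0 otherwise.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical signature scores (assumption: higher means more HCC-like).
cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.6, 0.3, 0.2, 0.1]

print(auc(cases, controls))  # 15 of 16 pairs are ordered correctly: 0.9375
```

The pairwise form is quadratic in sample size, which is acceptable for a few hundred individuals; a sort-based rank sum would scale better for larger cohorts.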
2.
Cell ; 175(6): 1679-1687.e7, 2018 11 29.
Article in English | MEDLINE | ID: mdl-30343897

ABSTRACT

Multiple sclerosis is a complex neurological disease, with ∼20% of risk heritability attributable to common genetic variants, including >230 identified by genome-wide association studies. Multiple strands of evidence suggest that much of the remaining heritability is also due to additive effects of common variants rather than epistasis between these variants or mutations exclusive to individual families. Here, we show in 68,379 cases and controls that up to 5% of this heritability is explained by low-frequency variation in gene coding sequence. We identify four novel genes driving MS risk independently of common-variant signals, highlighting key pathogenic roles for regulatory T cell homeostasis and regulation, IFNγ biology, and NFκB signaling. As low-frequency variants do not show substantial linkage disequilibrium with other variants, and as coding variants are more interpretable and experimentally tractable than non-coding variation, our discoveries constitute a rich resource for dissecting the pathobiology of MS.


Subjects
Genetic Epistasis , Genetic Predisposition to Disease , Linkage Disequilibrium , Multiple Sclerosis/genetics , Mutation , Open Reading Frames , Female , Genome-Wide Association Study , Humans , Male , Multiple Sclerosis/immunology , Risk Factors
3.
Cell ; 175(3): 848-858.e6, 2018 10 18.
Article in English | MEDLINE | ID: mdl-30318150

ABSTRACT

In familial searching in forensic genetics, a query DNA profile is tested against a database to determine whether it represents a relative of a database entrant. We examine the potential for using linkage disequilibrium to identify pairs of profiles as belonging to relatives when the query and database rely on nonoverlapping genetic markers. Considering data on individuals genotyped with both microsatellites used in forensic applications and genome-wide SNPs, we find that ∼30%-32% of parent-offspring pairs and ∼35%-36% of sib pairs can be identified from the SNPs of one member of the pair and the microsatellites of the other. The method suggests the possibility of performing familial searches of microsatellite databases using query SNP profiles, or vice versa. It also reveals that privacy concerns arising from computations across multiple databases that share no genetic markers in common entail risks, not only for database entrants, but for their close relatives as well.


Subjects
Family , Forensic Genetics/methods , Population Genetics/methods , Genotyping Techniques/methods , Single Nucleotide Polymorphism , Female , Humans , Linkage Disequilibrium , Male , Microsatellite Repeats , Genetic Models , Statistical Models , Pedigree
4.
Nat Immunol ; 20(7): 824-834, 2019 07.
Article in English | MEDLINE | ID: mdl-31209403

ABSTRACT

Multiple genome-wide studies have identified associations between outcome of human immunodeficiency virus (HIV) infection and polymorphisms in and around the gene encoding the HIV co-receptor CCR5, but the functional basis for the strongest of these associations, rs1015164A/G, is unknown. We found that rs1015164 marks variation in an activating transcription factor 1 binding site that controls expression of the antisense long noncoding RNA (lncRNA) CCR5AS. Knockdown or enhancement of CCR5AS expression resulted in a corresponding change in CCR5 expression on CD4+ T cells. CCR5AS interfered with interactions between the RNA-binding protein Raly and the CCR5 3' untranslated region, protecting CCR5 messenger RNA from Raly-mediated degradation. Reduction in CCR5 expression through inhibition of CCR5AS diminished infection of CD4+ T cells with CCR5-tropic HIV in vitro. These data represent a rare determination of the functional importance of a genome-wide disease association where expression of a lncRNA affects HIV infection and disease progression.


Subjects
Gene Expression Regulation , Genetic Variation , HIV Infections/genetics , HIV Infections/virology , HIV-1 , Antisense RNA/genetics , Long Noncoding RNA/genetics , CCR5 Receptors/genetics , 3' Untranslated Regions , Alleles , Biomarkers , CD4-Positive T-Lymphocytes/immunology , CD4-Positive T-Lymphocytes/metabolism , CD4-Positive T-Lymphocytes/virology , Cell Membrane/metabolism , Reporter Genes , Genotype , HIV Infections/metabolism , Humans , Linkage Disequilibrium , Single Nucleotide Polymorphism , Population Groups/genetics , Prognosis , RNA Stability , Messenger RNA/genetics , Messenger RNA/metabolism , CCR5 Receptors/metabolism , Viral Load
5.
Nature ; 617(7962): 755-763, 2023 05.
Article in English | MEDLINE | ID: mdl-37198480

ABSTRACT

Despite broad agreement that Homo sapiens originated in Africa, considerable uncertainty surrounds specific models of divergence and migration across the continent1. Progress is hampered by a shortage of fossil and genomic data, as well as variability in previous estimates of divergence times1. Here we seek to discriminate among such models by considering linkage disequilibrium and diversity-based statistics, optimized for rapid, complex demographic inference2. We infer detailed demographic models for populations across Africa, including eastern and western representatives, and newly sequenced whole genomes from 44 Nama (Khoe-San) individuals from southern Africa. We infer a reticulated African population history in which present-day population structure dates back to Marine Isotope Stage 5. The earliest population divergence among contemporary populations occurred 120,000 to 135,000 years ago and was preceded by links between two or more weakly differentiated ancestral Homo populations connected by gene flow over hundreds of thousands of years. Such weakly structured stem models explain patterns of polymorphism that had previously been attributed to contributions from archaic hominins in Africa2-7. In contrast to models with archaic introgression, we predict that fossil remains from coexisting ancestral populations should be genetically and morphologically similar, and that only an inferred 1-4% of genetic differentiation among contemporary human populations can be attributed to genetic drift between stem populations. We show that model misspecification explains the variation in previous estimates of divergence times, and argue that studying a range of models is key to making robust inferences about deep history.


Subjects
Population Genetics , Human Migration , Phylogeny , Humans , Africa/ethnology , Fossils , Gene Flow , Genetic Drift , Genetic Introgression , Human Genome , Ancient History , Human Migration/history , Linkage Disequilibrium/genetics , Genetic Polymorphism , Time Factors
6.
Nature ; 606(7914): 527-534, 2022 06.
Article in English | MEDLINE | ID: mdl-35676474

ABSTRACT

Missing heritability in genome-wide association studies defines a major problem in genetic analyses of complex biological traits1,2. The solution to this problem is to identify all causal genetic variants and to measure their individual contributions3,4. Here we report a graph pangenome of tomato constructed by precisely cataloguing more than 19 million variants from 838 genomes, including 32 new reference-level genome assemblies. This graph pangenome was used for genome-wide association study analyses and heritability estimation of 20,323 gene-expression and metabolite traits. The average estimated trait heritability is 0.41 compared with 0.33 when using the single linear reference genome. This 24% increase in estimated heritability is largely due to resolving incomplete linkage disequilibrium through the inclusion of additional causal structural variants identified using the graph pangenome. Moreover, by resolving allelic and locus heterogeneity, structural variants improve the power to identify genetic factors underlying agronomically important traits leading to, for example, the identification of two new genes potentially contributing to soluble solid content. The newly identified structural variants will facilitate genetic improvement of tomato through both marker-assisted selection and genomic selection. Our study advances the understanding of the heritability of complex traits and demonstrates the power of the graph pangenome in crop breeding.


Subjects
Genetic Variation , Plant Genome , Genome-Wide Association Study , Plant Breeding , Solanum lycopersicum , Alleles , Agricultural Crops/genetics , Plant Genome/genetics , Linkage Disequilibrium , Solanum lycopersicum/genetics , Solanum lycopersicum/metabolism
7.
Nature ; 610(7933): 704-712, 2022 10.
Article in English | MEDLINE | ID: mdl-36224396

ABSTRACT

Common single-nucleotide polymorphisms (SNPs) are predicted to collectively explain 40-50% of phenotypic variation in human height, but identifying the specific variants and associated regions requires huge sample sizes1. Here, using data from a genome-wide association study of 5.4 million individuals of diverse ancestries, we show that 12,111 independent SNPs that are significantly associated with height account for nearly all of the common SNP-based heritability. These SNPs are clustered within 7,209 non-overlapping genomic segments with a mean size of around 90 kb, covering about 21% of the genome. The density of independent associations varies across the genome and the regions of increased density are enriched for biologically relevant genes. In out-of-sample estimation and prediction, the 12,111 SNPs (or all SNPs in the HapMap 3 panel2) account for 40% (45%) of phenotypic variance in populations of European ancestry but only around 10-20% (14-24%) in populations of other ancestries. Effect sizes, associated regions and gene prioritization are similar across ancestries, indicating that reduced prediction accuracy is likely to be explained by linkage disequilibrium and differences in allele frequency within associated regions. Finally, we show that the relevant biological pathways are detectable with smaller sample sizes than are needed to implicate causal genes and variants. Overall, this study provides a comprehensive map of specific genomic regions that contain the vast majority of common height-associated variants. Although this map is saturated for populations of European ancestry, further research is needed to achieve equivalent saturation in other ancestries.


Subjects
Body Height , Chromosome Mapping , Single Nucleotide Polymorphism , Humans , Body Height/genetics , Gene Frequency/genetics , Human Genome/genetics , Genome-Wide Association Study , Haplotypes/genetics , Linkage Disequilibrium/genetics , Single Nucleotide Polymorphism/genetics , Europe/ethnology , Sample Size , Phenotype
8.
Nature ; 602(7895): 106-111, 2022 02.
Article in English | MEDLINE | ID: mdl-34883497

ABSTRACT

Host genetic factors can confer resistance against malaria1, raising the question of whether this has led to evolutionary adaptation of parasite populations. Here we searched for association between candidate host and parasite genetic variants in 3,346 Gambian and Kenyan children with severe malaria caused by Plasmodium falciparum. We identified a strong association between sickle haemoglobin (HbS) in the host and three regions of the parasite genome, which is not explained by population structure or other covariates, and which is replicated in additional samples. The HbS-associated alleles include nonsynonymous variants in the gene for the acyl-CoA synthetase family member2-4 PfACS8 on chromosome 2, in a second region of chromosome 2, and in a region containing structural variation on chromosome 11. The alleles are in strong linkage disequilibrium and have frequencies that covary with the frequency of HbS across populations, in particular being much more common in Africa than other parts of the world. The estimated protective effect of HbS against severe malaria, as determined by comparison of cases with population controls, varies greatly according to the parasite genotype at these three loci. These findings open up a new avenue of enquiry into the biological and epidemiological significance of the HbS-associated polymorphisms in the parasite genome and the evolutionary forces that have led to their high frequency and strong linkage disequilibrium in African P. falciparum populations.


Subjects
Genotype , Sickle Hemoglobin/genetics , Host Adaptation/genetics , Falciparum Malaria/blood , Falciparum Malaria/parasitology , Parasites/genetics , Plasmodium falciparum/genetics , Alleles , Animals , Child , Female , Gambia/epidemiology , Protozoan Genes/genetics , Humans , Kenya/epidemiology , Linkage Disequilibrium , Falciparum Malaria/epidemiology , Male , Genetic Polymorphism
9.
Am J Hum Genet ; 111(7): 1448-1461, 2024 07 11.
Article in English | MEDLINE | ID: mdl-38821058

ABSTRACT

Both trio and population designs are popular study designs for identifying risk genetic variants in genome-wide association studies (GWASs). The trio design, as a family-based design, is robust to confounding due to population structure, whereas the population design is often more powerful due to larger sample sizes. Here, we propose KnockoffHybrid, a knockoff-based statistical method for hybrid analysis of both the trio and population designs. KnockoffHybrid provides a unified framework that brings together the advantages of both designs and produces powerful hybrid analysis while controlling the false discovery rate (FDR) in the presence of linkage disequilibrium and population structure. Furthermore, KnockoffHybrid has the flexibility to leverage different types of summary statistics for hybrid analyses, including expression quantitative trait loci (eQTL) and GWAS summary statistics. We demonstrate in simulations that KnockoffHybrid offers power gains over non-hybrid methods for the trio and population designs with the same number of cases while controlling the FDR with complex correlation among variants and population structure among subjects. In hybrid analyses of three trio cohorts for autism spectrum disorders (ASDs) from the Autism Speaks MSSNG, Autism Sequencing Consortium, and Autism Genome Project with GWAS summary statistics from the iPSYCH project and eQTL summary statistics from the MetaBrain project, KnockoffHybrid outperforms conventional methods by replicating several known risk genes for ASDs and identifying additional associations with variants in other genes, including the PRAME family genes involved in axon guidance and which may act as common targets for human speech/language evolution and related disorders.


Subjects
Autism Spectrum Disorder , Genome-Wide Association Study , Linkage Disequilibrium , Quantitative Trait Loci , Genome-Wide Association Study/methods , Humans , Autism Spectrum Disorder/genetics , Genetic Predisposition to Disease , Single Nucleotide Polymorphism , Computer Simulation , Genetic Models
10.
Am J Hum Genet ; 111(5): 990-995, 2024 05 02.
Article in English | MEDLINE | ID: mdl-38636510

ABSTRACT

Since genotype imputation was introduced, researchers have been relying on the estimated imputation quality from imputation software to perform post-imputation quality control (QC). However, this quality estimate (denoted as Rsq) performs less well for lower-frequency variants. We recently published MagicalRsq, a machine-learning-based imputation quality calibration, which leverages additional typed markers from the same cohort and outperforms Rsq as a QC metric. In this work, we extended the original MagicalRsq to allow cross-cohort model training and named the new model MagicalRsq-X. We removed the cohort-specific estimated minor allele frequency and included linkage disequilibrium scores and recombination rates as additional features. Leveraging whole-genome sequencing data from TOPMed, specifically participants in the BioMe, JHS, WHI, and MESA studies, we performed comprehensive cross-cohort evaluations for predominantly European and African ancestral individuals based on their inferred global ancestry with the 1000 Genomes and Human Genome Diversity Project data as reference. Our results suggest MagicalRsq-X outperforms Rsq in almost every setting, with 7.3%-14.4% improvement in squared Pearson correlation with true R², corresponding to 85-218 K variant gains. We further developed a metric to quantify the genetic distances of a target cohort relative to a reference cohort and showed that such metric largely explained the performance of MagicalRsq-X models. Finally, we found MagicalRsq-X saved up to 53 known genome-wide significant variants in one of the largest blood cell trait GWASs that would be missed using the original Rsq for QC. In conclusion, MagicalRsq-X shows superiority for post-imputation QC and benefits genetic studies by distinguishing well and poorly imputed lower-frequency variants.


Subjects
Gene Frequency , Genotype , Single Nucleotide Polymorphism , Software , Humans , Cohort Studies , Linkage Disequilibrium , Genome-Wide Association Study/methods , Human Genome , Quality Control , Machine Learning , Whole Genome Sequencing/standards , Whole Genome Sequencing/methods
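
The abstract above evaluates imputation quality by a squared Pearson correlation against the truth. As a rough illustration of that quantity for a single variant, the sketch below correlates imputed dosages with sequenced genotypes; the function name `squared_pearson` and the toy values are assumptions made for this example and do not come from MagicalRsq-X.

```python
def squared_pearson(x, y):
    """Return the squared Pearson correlation between two equal-length lists.

    Returns 0.0 when either list has no variance (for example, a variant
    that is monomorphic in the sample), where the correlation is undefined.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


# Hypothetical data for one variant in six individuals.
true_genotypes = [0, 1, 2, 0, 1, 2]                # allele counts from sequencing
imputed_dosages = [0.1, 0.9, 1.8, 0.3, 1.2, 1.7]   # expected allele counts from imputation

print(squared_pearson(true_genotypes, imputed_dosages))
```

A value near 1 indicates that the imputed dosages track the sequenced genotypes closely; software-reported quality estimates such as Rsq aim to approximate this without access to the truth.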
11.
Am J Hum Genet ; 111(5): 966-978, 2024 05 02.
Article in English | MEDLINE | ID: mdl-38701746

ABSTRACT

Replicability is the cornerstone of modern scientific research. Reliable identifications of genotype-phenotype associations that are significant in multiple genome-wide association studies (GWASs) provide stronger evidence for the findings. Current replicability analysis relies on the independence assumption among single-nucleotide polymorphisms (SNPs) and ignores the linkage disequilibrium (LD) structure. We show that such a strategy may produce either overly liberal or overly conservative results in practice. We develop an efficient method, ReAD, to detect replicable SNPs associated with the phenotype from two GWASs accounting for the LD structure. The local dependence structure of SNPs across two heterogeneous studies is captured by a four-state hidden Markov model (HMM) built on two sequences of p values. By incorporating information from adjacent locations via the HMM, our approach provides more accurate SNP significance rankings. ReAD is scalable, platform independent, and more powerful than existing replicability analysis methods with effective false discovery rate control. Through analysis of datasets from two asthma GWASs and two ulcerative colitis GWASs, we show that ReAD can identify replicable genetic loci that existing methods might otherwise miss.


Subjects
Asthma , Genome-Wide Association Study , Linkage Disequilibrium , Single Nucleotide Polymorphism , Genome-Wide Association Study/methods , Humans , Asthma/genetics , Markov Chains , Ulcerative Colitis/genetics , Reproducibility of Results , Phenotype , Genotype
12.
Am J Hum Genet ; 111(7): 1405-1419, 2024 07 11.
Article in English | MEDLINE | ID: mdl-38906146

ABSTRACT

Genome-wide association studies (GWASs) have identified numerous lung cancer risk-associated loci. However, decoding molecular mechanisms of these associations is challenging since most of these genetic variants are non-protein-coding with unknown function. Here, we implemented massively parallel reporter assays (MPRAs) to simultaneously measure the allelic transcriptional activity of risk-associated variants. We tested 2,245 variants at 42 loci from 3 recent GWASs in East Asian and European populations in the context of two major lung cancer histological types and exposure to benzo(a)pyrene. This MPRA approach identified one or more variants (median 11 variants) with significant effects on transcriptional activity at 88% of GWAS loci. Multimodal integration of lung-specific epigenomic data demonstrated that 63% of the loci harbored multiple potentially functional variants in linkage disequilibrium. While 22% of the significant variants showed allelic effects in both A549 (adenocarcinoma) and H520 (squamous cell carcinoma) cell lines, a subset of the functional variants displayed a significant cell-type interaction. Transcription factor analyses nominated potential regulators of the functional variants, including those with cell-type-specific expression and those predicted to bind multiple potentially functional variants across the GWAS loci. Linking functional variants to target genes based on four complementary approaches identified candidate susceptibility genes, including those affecting lung cancer cell growth. CRISPR interference of the top functional variant at 20q13.33 validated variant-to-gene connections, including RTEL1, SOX18, and ARFRP1. Our data provide a comprehensive functional analysis of lung cancer GWAS loci and help elucidate the molecular basis of heterogeneity and polygenicity underlying lung cancer susceptibility.


Subjects
Genetic Predisposition to Disease , Genome-Wide Association Study , Lung Neoplasms , Single Nucleotide Polymorphism , Humans , Lung Neoplasms/genetics , Lung Neoplasms/pathology , Linkage Disequilibrium , Multifactorial Inheritance/genetics , Tumor Cell Line , Alleles , A549 Cells
13.
Genome Res ; 34(2): 300-309, 2024 03 20.
Article in English | MEDLINE | ID: mdl-38355307

ABSTRACT

Expression and splicing quantitative trait loci (e/sQTL) are large contributors to phenotypic variability. Achieving sufficient statistical power for e/sQTL mapping requires large cohorts with both genotypes and molecular phenotypes, and so, the genomic variation is often called from short-read alignments, which are unable to comprehensively resolve structural variation. Here we build a pangenome from 16 HiFi haplotype-resolved cattle assemblies to identify small and structural variation and genotype them with PanGenie in 307 short-read samples. We find high (>90%) concordance of PanGenie-genotyped and DeepVariant-called small variation and confidently genotype close to 21 million small and 43,000 structural variants in the larger population. We validate 85% of these structural variants (with MAF > 0.1) directly with a subset of 25 short-read samples that also have medium coverage HiFi reads. We then conduct e/sQTL mapping with this comprehensive variant set in a subset of 117 cattle that have testis transcriptome data, and find 92 structural variants as causal candidates for eQTL and 73 for sQTL. We find that roughly half of the top associated structural variants affecting expression or splicing are transposable elements, such as SV-eQTL for STN1 and MYH7 and SV-sQTL for CEP89 and ASAH2. Extensive linkage disequilibrium between small and structural variation results in only 28 additional eQTL and 17 sQTL discovered when including SVs, although many top associated SVs are compelling candidates.


Subjects
Quantitative Trait Loci , RNA Splicing , Male , Cattle/genetics , Animals , Genotype , Phenotype , Linkage Disequilibrium , Genomic Structural Variation
14.
Genome Res ; 34(1): 70-84, 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38071472

ABSTRACT

Meiotic recombination is crucial for human genetic diversity and chromosome segregation accuracy. Understanding its variation across individuals and the processes by which it goes awry are long-standing goals in human genetics. Current approaches for inferring recombination landscapes rely either on population genetic patterns of linkage disequilibrium (LD), capturing a time-averaged view, or on direct detection of crossovers in gametes or multigeneration pedigrees, which limits data set scale and availability. Here, we introduce an approach for inferring sex-specific recombination landscapes using data from preimplantation genetic testing for aneuploidy (PGT-A). This method relies on low-coverage (<0.05×) whole-genome sequencing of in vitro fertilized (IVF) embryo biopsies. To overcome the data sparsity, our method exploits its inherent relatedness structure, knowledge of haplotypes from external population reference panels, and the frequent occurrence of monosomies in embryos, whereby the remaining chromosome is phased by default. Extensive simulations show our method's high accuracy, even at coverages as low as 0.02×. Applying this method to PGT-A data from 18,967 embryos, we mapped 70,660 recombination events with ∼150 kbp resolution, replicating established sex-specific recombination patterns. We observed a reduced total length of the female genetic map in trisomies compared with disomies, as well as chromosome-specific alterations in crossover distributions. Based on haplotype configurations in pericentromeric regions, our data indicate chromosome-specific propensities for different mechanisms of meiotic error. Our results provide a comprehensive view of the role of aberrant meiotic recombination in the origins of human aneuploidies and offer a versatile tool for mapping crossovers in low-coverage sequencing data from multiple siblings.


Subjects
Aneuploidy , Genetic Testing , Male , Humans , Female , Genetic Testing/methods , Chromosome Aberrations , Linkage Disequilibrium , Pedigree
15.
16.
Nature ; 600(7890): 675-679, 2021 12.
Article in English | MEDLINE | ID: mdl-34887591

ABSTRACT

Increased blood lipid levels are heritable risk factors of cardiovascular disease with varied prevalence worldwide owing to different dietary patterns and medication use1. Despite advances in prevention and treatment, in particular through reducing low-density lipoprotein cholesterol levels2, heart disease remains the leading cause of death worldwide3. Genome-wide association studies (GWAS) of blood lipid levels have led to important biological and clinical insights, as well as new drug targets, for cardiovascular disease. However, most previous GWAS4-23 have been conducted in European ancestry populations and may have missed genetic variants that contribute to lipid-level variation in other ancestry groups. These include differences in allele frequencies, effect sizes and linkage-disequilibrium patterns24. Here we conduct a multi-ancestry, genome-wide genetic discovery meta-analysis of lipid levels in approximately 1.65 million individuals, including 350,000 of non-European ancestries. We quantify the gain in studying non-European ancestries and provide evidence to support the expansion of recruitment of additional ancestries, even with relatively small sample sizes. We find that increasing diversity rather than studying additional individuals of European ancestry results in substantial improvements in fine-mapping functional variants and portability of polygenic prediction (evaluated in approximately 295,000 individuals from 7 ancestry groupings). Modest gains in the number of discovered loci and ancestry-specific variants were also achieved. As GWAS expand emphasis beyond the identification of genes and fundamental biology towards the use of genetic variants for preventive and precision medicine25, we anticipate that increased diversity of participants will lead to more accurate and equitable26 application of polygenic scores in clinical practice.


Subjects
Cardiovascular Diseases , Genome-Wide Association Study , Cardiovascular Diseases/genetics , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Humans , Linkage Disequilibrium , Multifactorial Inheritance , Single Nucleotide Polymorphism/genetics , Population Groups
17.
PLoS Genet ; 20(1): e1010929, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38271473

ABSTRACT

Genome-wide association studies (GWASs) have achieved remarkable success in associating thousands of genetic variants with complex traits. However, the presence of linkage disequilibrium (LD) makes it challenging to identify the causal variants. To address this critical gap from association to causation, many fine-mapping methods have been proposed to assign well-calibrated probabilities of causality to candidate variants, taking into account the underlying LD pattern. In this manuscript, we introduce a statistical framework that incorporates expression quantitative trait locus (eQTL) information to fine-mapping, built on the sum of single-effects (SuSiE) regression model. Our new method, SuSiE2, connects two SuSiE models, one for eQTL analysis and one for genetic fine-mapping. This is achieved by first computing the posterior inclusion probabilities (PIPs) from an eQTL-based SuSiE model with the expression level of the candidate gene as the phenotype. These calculated PIPs are then utilized as prior inclusion probabilities for risk variants in another SuSiE model for the trait of interest. By prioritizing functional variants within the candidate region using eQTL information, SuSiE2 improves SuSiE by increasing the detection rate of causal SNPs and reducing the average size of credible sets. We compared the performance of SuSiE2 with other multi-trait fine-mapping methods with respect to power, coverage, and precision through simulations and applications to the GWAS results of Alzheimer's disease (AD) and body mass index (BMI). Our results demonstrate the better performance of SuSiE2, both when the in-sample linkage disequilibrium (LD) matrix is used and when an external reference panel is used in inference.


Subjects
Genome-Wide Association Study , Quantitative Trait Loci , Quantitative Trait Loci/genetics , Genome-Wide Association Study/methods , Chromosome Mapping/methods , Linkage Disequilibrium , Phenotype , Single Nucleotide Polymorphism
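
The abstract above describes feeding PIPs from an eQTL model into a second model as prior inclusion probabilities. The sketch below illustrates only the general idea for a single causal effect, where the posterior weight of each variant is proportional to its prior weight times its Bayes factor. It is not the SuSiE2 implementation: the function name, the Bayes factors, the eQTL PIPs, and the step of normalizing PIPs into a prior are all assumptions made for illustration.

```python
def single_effect_posterior(bayes_factors, prior_weights):
    """Posterior inclusion probabilities under a one-causal-variant model.

    prior_weights are rescaled to sum to 1, then each posterior is
    prior_j * BF_j divided by the sum of prior_k * BF_k over all variants.
    """
    total_weight = sum(prior_weights)
    priors = [w / total_weight for w in prior_weights]
    unnormalized = [p * bf for p, bf in zip(priors, bayes_factors)]
    normalizer = sum(unnormalized)
    return [u / normalizer for u in unnormalized]


# Hypothetical inputs: two variants in tight LD have the same trait evidence.
trait_bayes_factors = [10.0, 10.0, 1.0]
eqtl_pips = [0.1, 0.8, 0.1]  # assumed output of a separate eQTL fine-mapping run

uniform = single_effect_posterior(trait_bayes_factors, [1.0, 1.0, 1.0])
informed = single_effect_posterior(trait_bayes_factors, eqtl_pips)

print(uniform)   # the two linked variants cannot be told apart
print(informed)  # the eQTL-supported variant receives most of the weight
```

With a uniform prior the two linked variants split the posterior evenly; the eQTL-informed prior breaks the tie, which is the intuition behind the smaller credible sets reported in the abstract.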
18.
PLoS Genet ; 20(4): e1011212, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38630784

ABSTRACT

Population differences in risk of disease are common, but the potential genetic basis for these differences is not well understood. A standard approach is to compare genetic risk across populations by testing for mean differences in polygenic scores, but existing studies that use this approach do not account for statistical noise in effect estimates (i.e., the GWAS betas) that arise due to the finite sample size of GWAS training data. Here, we show using Bayesian polygenic score methods that the level of uncertainty in estimates of genetic risk differences across populations is highly dependent on the GWAS training sample size, the polygenicity (number of causal variants), and genetic distance (FST) between the populations considered. We derive a Wald test for formally assessing the difference in genetic risk across populations, which we show to have calibrated type 1 error rates under a simplified assumption that all SNPs are independent, which we achieve in practise using linkage disequilibrium (LD) pruning. We further provide closed-form expressions for assessing the uncertainty in estimates of relative genetic risk across populations under the special case of an infinitesimal genetic architecture. We suggest that for many complex traits and diseases, particularly those with more polygenic architectures, current GWAS sample sizes are insufficient to detect moderate differences in genetic risk across populations, though more substantial differences in relative genetic risk (relative risk > 1.5) can be detected. We show that conventional approaches that do not account for sampling error from the training sample, such as using a simple t-test, have very high type 1 error rates. When applying our approach to prostate cancer, we demonstrate a higher genetic risk in African Ancestry men, with lower risk in men of European followed by East Asian ancestry.


Subjects
Multifactorial Inheritance , Prostatic Neoplasms , Male , Humans , Bayes Theorem , Risk Factors , Linkage Disequilibrium , Genome-Wide Association Study , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide
19.
PLoS Genet ; 20(7): e1011312, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39018328

ABSTRACT

Many traits are polygenic, affected by multiple genetic variants throughout the genome. Selection acting on these traits involves co-ordinated allele-frequency changes at these underlying variants, and this process has been extensively studied in random-mating populations. Yet many species self-fertilise to some degree, which incurs changes to genetic diversity, recombination and genome segregation. These factors cumulatively influence how polygenic selection is realised in nature. Here, we use analytical modelling and stochastic simulations to investigate to what extent self-fertilisation affects polygenic adaptation to a new environment. Our analytical solutions show that while selfing can increase adaptation to an optimum, it incurs linkage disequilibrium that can slow down the initial spread of favoured mutations due to selection interference, and favours the fixation of alleles with opposing trait effects. Simulations show that while selection interference is present, high levels of selfing (at least 90%) aid adaptation to a new optimum, yielding higher long-term fitness. If mutations are pleiotropic then only a few major-effect variants fix along with many neutral hitchhikers, with a transient increase in linkage disequilibrium. These results show potential advantages to self-fertilisation when adapting to a new environment, and how the mating system affects the genetic composition of polygenic selection.


Subjects
Linkage Disequilibrium , Models, Genetic , Multifactorial Inheritance , Selection, Genetic , Self-Fertilization , Selection, Genetic/genetics , Multifactorial Inheritance/genetics , Self-Fertilization/genetics , Mutation , Gene Frequency , Genetic Variation , Alleles , Computer Simulation , Adaptation, Physiological/genetics , Animals
20.
Am J Hum Genet ; 110(4): 575-591, 2023 04 06.
Article in English | MEDLINE | ID: mdl-37028392

ABSTRACT

Leveraging linkage disequilibrium (LD) patterns as representative of population substructure enables the discovery of additive association signals in genome-wide association studies (GWASs). Standard GWASs are well-powered to interrogate additive models; however, new approaches are required for investigating other modes of inheritance such as dominance and epistasis. Epistasis, or non-additive interaction between genes, exists across the genome but often goes undetected because of a lack of statistical power. Furthermore, the adoption of LD pruning as customary in standard GWASs excludes detection of sites that are in LD but might underlie the genetic architecture of complex traits. We hypothesize that uncovering long-range interactions between loci with strong LD due to epistatic selection can elucidate genetic mechanisms underlying common diseases. To investigate this hypothesis, we tested for associations between 23 common diseases and 5,625,845 epistatic SNP-SNP pairs (determined by Ohta's D statistics) in long-range LD (>0.25 cM). Across five disease phenotypes, we identified one significant and four near-significant associations that replicated in two large genotype-phenotype datasets (UK Biobank and eMERGE). The genes that were most likely involved in the replicated associations were (1) members of highly conserved gene families with complex roles in multiple pathways, (2) essential genes, and/or (3) genes that were associated in the literature with complex traits that display variable expressivity. These results support the highly pleiotropic and conserved nature of variants in long-range LD under epistatic selection. Our work supports the hypothesis that epistatic interactions regulate diverse clinical mechanisms and might especially be driving factors in conditions with a wide range of phenotypic outcomes.


Subjects
Epistasis, Genetic , Genome-Wide Association Study , Linkage Disequilibrium/genetics , Genotype , Biological Specimen Banks , United Kingdom , Polymorphism, Single Nucleotide/genetics