ABSTRACT
Genome-wide association studies (GWAS) have successfully identified thousands of associations between common genetic variants and human disease phenotypes, but the majority of these variants are non-coding, often requiring genetic fine-mapping, epigenomic profiling, and individual reporter assays to delineate potential causal variants. We employ a massively parallel reporter assay (MPRA) to simultaneously screen 2,756 variants in strong linkage disequilibrium with 75 sentinel variants associated with red blood cell traits. We show that this assay identifies elements with endogenous erythroid regulatory activity. Across 23 sentinel variants, we conservatively identified 32 MPRA functional variants (MFVs). We used targeted genome editing to demonstrate endogenous enhancer activity across 3 MFVs that predominantly affect the transcription of SMIM1, RBM38, and CD164. Functional follow-up of RBM38 delineates a key role for this gene in the alternative splicing program occurring during terminal erythropoiesis. Finally, we provide evidence for how common GWAS-nominated variants can disrupt cell-type-specific transcriptional regulatory pathways.
Subjects
Erythrocytes , Genetic Techniques , Genetic Variation , Alternative Splicing , Cell Line , Cell Lineage/genetics , Erythropoiesis/genetics , Gene Library , Reporter Genes , Humans , Nucleic Acid Regulatory Sequences , Genetic Transcription
ABSTRACT
The identification of genes that evolve under recessive natural selection is a long-standing goal of population genetics research that has important applications to the discovery of genes associated with disease. We found that commonly used methods to evaluate selective constraint at the gene level are highly sensitive to genes under heterozygous selection but ubiquitously fail to detect recessively evolving genes. Additionally, more sophisticated likelihood-based methods designed to detect recessivity similarly lack power for a human gene of realistic length from current population sample sizes. However, extensive simulations suggested that recessive genes may be detectable in aggregate. Here, we offer a method informed by population genetics simulations designed to detect recessive purifying selection in gene sets. Applying this to empirical gene sets produced significant enrichments for strong recessive selection in genes previously inferred to be under recessive selection in a consanguineous cohort and in genes involved in autosomal recessive monogenic disorders.
Subjects
Gene Frequency , Recessive Genes , Population Genetics , Genetic Selection , Algorithms , Alleles , Dominant Genes , Genetic Predisposition to Disease , Genetic Variation , Population Genetics/methods , Genomics/methods , Genotype , Humans , Inheritance Patterns , Likelihood Functions , Genetic Models , Mutation , United Kingdom
ABSTRACT
BACKGROUND: Venous thromboembolism (VTE) is a major cause of morbidity and mortality worldwide. Current risk assessment tools, such as the Caprini and Padua scores and Wells criteria, have limitations in their applicability and accuracy. This study aimed to develop machine learning models using structured electronic health record data to predict diagnosis and 1-year risk of VTE. METHODS: We trained and validated models on data from 159 001 participants in the Mount Sinai Data Warehouse. We then externally tested them on 401 723 participants in the UK Biobank and 123 039 participants in All of Us. All data sets contain populations of diverse ancestries and clinical histories. We used these data sets to develop small, medium, and large models with increasing numbers of features, spanning a range from optimizing portability to maximizing performance. We make trained models publicly available in click-and-run format at https://doi.org/10.17632/tkwzysr4y6.6. RESULTS: In the holdout and external test sets, respectively, models achieved areas under the receiver operating characteristic curve of 0.80 to 0.83 and 0.72 to 0.82 for VTE diagnosis prediction and 0.76 to 0.78 and 0.64 to 0.69 for 1-year risk prediction, significantly outperforming the Padua score. Models also demonstrated robust performance across different VTE types and patient subsets, including ethnicity, age, and surgical and hospitalization status. Models identified both established and novel clinical features contributing to VTE risk, offering valuable insights into its underlying pathophysiology. CONCLUSIONS: Machine learning models using structured electronic health record data can significantly improve VTE diagnosis and 1-year risk prediction in diverse populations. Model probability scores exist on a continuum, affecting mortality risk in both healthy individuals and VTE cases. 
Integrating these models into electronic health record systems to generate real-time predictions may enhance VTE risk assessment, early detection, and preventative measures, ultimately reducing the morbidity and mortality associated with VTE.
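The area under the receiver operating characteristic curve reported above can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen non-case. A minimal Python sketch of that pairwise definition follows; the function name `auroc` and the toy labels and scores are illustrative assumptions, not the study's code or data.

```python
def auroc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 for cases, 0 for non-cases. scores: model probabilities.
    Ties between a case and a non-case count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 cases and 3 non-cases.
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.3, 0.1]
toy_auc = auroc(toy_labels, toy_scores)  # 5 of 6 case/non-case pairs are ranked correctly
```

The quadratic loop is chosen for clarity; for cohorts of the size described here a rank-based implementation would be the practical choice.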
Subjects
Population Health , Venous Thromboembolism , Humans , Electronic Health Records , Risk Factors , Venous Thromboembolism/diagnosis , Venous Thromboembolism/epidemiology , Venous Thromboembolism/etiology , Risk Assessment , Machine Learning , Retrospective Studies
ABSTRACT
Genome-wide association studies (GWAS) have laid the foundation for investigations into the biology of complex traits, drug development and clinical guidelines. However, the majority of discovery efforts are based on data from populations of European ancestry [1-3]. In light of the differential genetic architecture that is known to exist between populations, bias in representation can exacerbate existing disease and healthcare disparities. Critical variants may be missed if they have a low frequency or are completely absent in European populations, especially as the field shifts its attention towards rare variants, which are more likely to be population-specific [4-10]. Additionally, effect sizes estimated in one population, and the risk prediction scores derived from them, may not accurately extrapolate to other populations [11,12]. Here we demonstrate the value of diverse, multi-ethnic participants in large-scale genomic studies. The Population Architecture using Genomics and Epidemiology (PAGE) study conducted a GWAS of 26 clinical and behavioural phenotypes in 49,839 non-European individuals. Using strategies tailored for analysis of multi-ethnic and admixed populations, we describe a framework for analysing diverse populations, identify 27 novel loci and 38 secondary signals at known loci, as well as replicate 1,444 GWAS catalogue associations across these traits. Our data show evidence of effect-size heterogeneity across ancestries for published GWAS associations, substantial benefits for fine-mapping using diverse cohorts and insights into clinical implications. In the United States, where minority populations have a disproportionately higher burden of chronic conditions [13], the lack of representation of diverse populations in genetic research will result in inequitable access to precision medicine for those with the highest burden of disease. 
We strongly advocate for continued, large genome-wide efforts in diverse populations to maximize genetic discovery and reduce health disparities.
Subjects
Asian People/genetics , Black People/genetics , Genome-Wide Association Study/methods , Hispanic or Latino/genetics , Minority Groups , Multifactorial Inheritance/genetics , Women's Health , Body Height/genetics , Cohort Studies , Female , Medical Genetics/methods , Health Equity/trends , Health Status Disparities , Humans , Male , United States
ABSTRACT
BACKGROUND: Binary diagnosis of coronary artery disease does not preserve the complexity of disease or quantify its severity or its associated risk of death; hence, a quantitative marker of coronary artery disease is warranted. We evaluated a quantitative marker of coronary artery disease derived from probabilities of a machine learning model. METHODS: In this cohort study, we developed and validated a coronary artery disease-predictive machine learning model using 95 935 electronic health records and assessed its probabilities as in-silico scores for coronary artery disease (ISCAD; range 0 [lowest probability] to 1 [highest probability]) in participants in two longitudinal biobank cohorts. We measured the association of ISCAD with clinical outcomes, namely coronary artery stenosis, obstructive coronary artery disease, multivessel coronary artery disease, all-cause death, and coronary artery disease sequelae. FINDINGS: Among 95 935 participants, 35 749 were from the BioMe Biobank (median age 61 years [IQR 18]; 14 599 [41%] were male and 21 150 [59%] were female; 5130 [14%] had diagnosed coronary artery disease) and 60 186 were from the UK Biobank (median age 62 [15] years; 25 031 [42%] male and 35 155 [58%] female; 8128 [14%] with diagnosed coronary artery disease). The model predicted coronary artery disease with an area under the receiver operating characteristic curve of 0·95 (95% CI 0·94-0·95; sensitivity of 0·94 [0·94-0·95] and specificity of 0·82 [0·81-0·83]) and 0·93 (0·92-0·93; sensitivity of 0·90 [0·89-0·90] and specificity of 0·88 [0·87-0·88]) in the BioMe validation and holdout sets, respectively, and 0·91 (0·91-0·91; sensitivity of 0·84 [0·83-0·84] and specificity of 0·83 [0·82-0·83]) in the UK Biobank external test set. ISCAD captured coronary artery disease risk from known risk factors, pooled cohort equations, and polygenic risk scores. 
Coronary artery stenosis increased quantitatively with ascending ISCAD quartiles (increase per quartile of 12 percentage points), including risk of obstructive coronary artery disease, multivessel coronary artery disease, and stenosis of major coronary arteries. Hazard ratios (HRs) and prevalence of all-cause death increased stepwise over ISCAD deciles (decile 1: HR 1·0 [95% CI 1·0-1·0], 0·2% prevalence; decile 6: 11 [3·9-31], 3·1% prevalence; and decile 10: 56 [20-158], 11% prevalence). A similar trend was observed for recurrent myocardial infarction. 12 (46%) undiagnosed individuals with high ISCAD (≥0·9) had clinical evidence of coronary artery disease according to the 2014 American College of Cardiology/American Heart Association Task Force guidelines. INTERPRETATION: Electronic health record-based machine learning was used to generate an in-silico marker for coronary artery disease that can non-invasively quantify atherosclerosis and risk of death on a continuous spectrum, and identify underdiagnosed individuals. FUNDING: National Institutes of Health.
Subjects
Coronary Artery Disease , Coronary Stenosis , Humans , Male , Female , Middle Aged , Coronary Artery Disease/diagnosis , Coronary Artery Disease/epidemiology , Cohort Studies , Predictive Value of Tests , Coronary Stenosis/diagnosis , Risk Factors , Machine Learning , Coronary Angiography
ABSTRACT
PURPOSE: We used a polygenic risk score (PRS) to identify high-risk groups for primary open-angle glaucoma (POAG) within population-based cohorts. DESIGN: Secondary analysis of 4 prospective population-based studies. PARTICIPANTS: We included four European-ancestry cohorts: the United States-based Nurses' Health Study, Nurses' Health Study 2, and the Health Professionals Follow-up Study and the Rotterdam Study (RS) in The Netherlands. The United States cohorts included female nurses and male health professionals ≤ 55 years of age. The RS included residents ≤ 45 years of age living in Rotterdam, The Netherlands. METHODS: Polygenic risk score weights were estimated by applying the lassosum method on imputed genotype and phenotype data from the UK Biobank. This resulted in 144 020 variants, single nucleotide polymorphisms and insertions or deletions, with nonzero βs that we used to calculate a PRS in the target populations. Using multivariable Cox proportional hazard models, we estimated the relationship between the standardized PRS and relative risk for POAG. Additionally, POAG prediction was tested by calculating these models' concordance (Harrell's C statistic). Finally, we assessed the association between PRS tertiles and glaucoma-related traits. MAIN OUTCOME MEASURES: The relative risk for POAG and Harrell's C statistic. RESULTS: Among 1046 patients and 38 809 control participants, the relative risk (95% confidence interval) for POAG for participants in the highest PRS quintile was 3.99 (3.08-5.18) times higher in the United States cohorts and 4.89 (2.93-8.17) times higher in the RS, compared with participants with median genetic risk (third quintile). Combining age, sex, intraocular pressure of more than 25 mmHg, and family history resulted in a meta-analyzed concordance of 0.75 (95% CI, 0.73-0.75). Adding the PRS to this model improved the concordance to 0.82 (95% CI, 0.80-0.84). 
In a meta-analysis of all cohorts, patients in the highest tertile showed a larger cup-to-disc ratio at diagnosis, by 0.10 (95% CI, 0.06-0.14), and a 2.07-fold increased risk of requiring glaucoma surgery (95% CI, 1.19-3.60). CONCLUSIONS: Incorporating a PRS into a POAG predictive model improves identification concordance from 0.75 up to 0.82, supporting its potential for guiding more cost-effective screening strategies. FINANCIAL DISCLOSURE(S): Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
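Harrell's C statistic, used above to compare models with and without the PRS, is the fraction of comparable patient pairs in which the patient who developed disease earlier also had the higher predicted risk. The Python sketch below illustrates the definition under simplifying assumptions: pairs with tied follow-up times are skipped, and the function name `harrell_c` and the toy data are hypothetical rather than taken from the study.

```python
def harrell_c(times, events, risks):
    """Harrell's concordance index for right-censored follow-up data.

    times:  follow-up time per subject
    events: 1 if the event was observed, 0 if the subject was censored
    risks:  predicted risk score (higher means earlier expected event)

    A pair (i, j) is comparable when subject i had the event and a strictly
    shorter follow-up time than subject j. Tied risks count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier-event member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: the third subject is censored at time 6.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.4, 0.5, 0.1]
toy_c = harrell_c(toy_times, toy_events, toy_risks)  # 4 of 5 comparable pairs are concordant
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported improvement from 0.75 to 0.82 should be read.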
ABSTRACT
Understanding the relationship between natural selection and phenotypic variation has been a long-standing challenge in human population genetics. With the emergence of biobank-scale datasets, along with new statistical metrics to approximate strength of purifying selection at the variant level, it is now possible to correlate a proxy of individual relative fitness with a range of medical phenotypes. We calculated a per-individual deleterious load score by summing the total number of derived alleles per individual after incorporating a weight that approximates strength of purifying selection. We assessed four methods for the weight, including GERP, phyloP, CADD, and fitCons. By quantitatively tracking each of these scores with the site frequency spectrum, we identified phyloP as the most appropriate weight. The phyloP-weighted load score was then calculated across 15,129,142 variants in 335,161 individuals from the UK Biobank and tested for association with 1,380 medical phenotypes. After correction for multiple testing, we observed a strong association of the load score amongst coding sites only with 27 traits including body mass, adiposity and metabolic rate. We further observed that the association signals were driven by common variants (derived allele frequency > 5%) with high phyloP score (phyloP > 2). Finally, through permutation analyses, we showed that the load score amongst coding sites had an excess of nominally significant associations with many medical phenotypes. These results suggest a broad impact of deleterious load on medical phenotypes and highlight the deleterious load score as a tool to disentangle the complex relationship between natural selection and medical phenotypes.
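The load score described above is a weighted count of derived alleles per individual, optionally restricted to a subset of sites such as coding variants. A minimal Python sketch follows, assuming the weight simply multiplies the derived-allele count at each site; the function name `load_score`, the mask argument, and all numbers are illustrative assumptions, not the authors' implementation.

```python
def load_score(derived_counts, weights, keep=None):
    """Sum of derived-allele counts (0, 1 or 2) times a per-variant weight.

    keep is an optional list of booleans selecting a subset of variants,
    for example coding sites only. When omitted, all variants are used.
    """
    if len(derived_counts) != len(weights):
        raise ValueError("derived_counts and weights must have the same length")
    if keep is None:
        keep = [True] * len(weights)
    if len(keep) != len(weights):
        raise ValueError("keep must have the same length as weights")
    return sum(c * w for c, w, k in zip(derived_counts, weights, keep) if k)


# Hypothetical toy individual with four variants.
counts = [2, 0, 1, 1]                  # derived alleles carried at each site
phylop_like = [3.0, 0.5, 2.5, 1.0]     # stand-in conservation weights (assumed values)
is_coding = [True, True, False, True]  # assumed annotation mask

genome_wide = load_score(counts, phylop_like)             # uses every site
coding_only = load_score(counts, phylop_like, is_coding)  # skips the third site
```

Keeping the site mask separate from the weights makes it straightforward to compare coding-only and genome-wide scores for the same individual.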
Subjects
Molecular Evolution , Genetic Fitness/genetics , Population Genetics , Genetic Selection/genetics , Alleles , Biological Specimen Banks , Body Mass Index , Female , Gene Frequency , Genetic Association Studies , Genetic Predisposition to Disease , Genetic Variation/genetics , Humans , Male , United Kingdom
ABSTRACT
Circulating biomarkers play a pivotal role in personalized medicine, offering potential for disease screening, prevention, and treatment. Despite established associations between numerous biomarkers and diseases, elucidating their causal relationships is challenging. Mendelian Randomization (MR) can address this issue by employing genetic instruments to discern causal links. Additionally, using multiple MR methods with overlapping results enhances the reliability of discovered relationships. Here, we report an MR study using multiple methods, including inverse variance weighted, simple mode, weighted mode, weighted median, and MR-Egger. We use the MR-Base resource (v0.5.6) from Hemani et al. 2018 to evaluate causal relationships between 212 circulating biomarkers (curated from UK Biobank analyses by the Neale lab and from Shin et al. 2014, Roederer et al. 2015, and Kettunen et al. 2016) and 99 complex diseases (curated from several consortia by MRC IEU and Biobank Japan). We report novel causal relationships found by four or more MR methods between glucose and bipolar disorder (Mean Effect Size estimate across methods: 0.39) and between cystatin C and bipolar disorder (Mean Effect Size: -0.31). Based on agreement in four or more methods, we also identify previously known links between urate and gout and between creatine and chronic kidney disease, as well as biomarkers that may be causal of cardiovascular conditions: apolipoprotein B, cholesterol, LDL, lipoprotein A, and triglycerides in coronary heart disease, as well as lipoprotein A, LDL, cholesterol, and apolipoprotein B in myocardial infarction. This Mendelian Randomization study not only corroborates known causal relationships between circulating biomarkers and diseases but also uncovers two novel biomarkers associated with bipolar disorder that warrant further investigation. 
Our findings provide insight into understanding how biological processes reflecting circulating biomarkers and their associated effects may contribute to disease etiology, which can eventually help improve precision diagnostics and intervention.
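Of the MR methods listed above, the inverse variance weighted (IVW) estimator is the simplest to write down: it combines per-variant ratios of outcome effect to exposure effect, weighting each by the precision of the outcome association. The Python sketch below shows the standard fixed-effect form under the usual assumption that exposure effects are measured without error; the function name `ivw_estimate` and the summary statistics are hypothetical and unrelated to the MR-Base data.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse variance weighted MR estimate and its standard error.

    beta_exposure: per-variant effects on the biomarker
    beta_outcome:  per-variant effects on the disease
    se_outcome:    standard errors of the outcome effects
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        weight = 1.0 / se ** 2          # precision of the outcome association
        numerator += weight * bx * by
        denominator += weight * bx ** 2
    if denominator == 0.0:
        raise ValueError("exposure effects are all zero")
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical instruments whose individual ratios (by / bx) all equal 0.5.
toy_bx = [0.1, 0.2, 0.3]
toy_by = [0.05, 0.10, 0.15]
toy_se = [0.01, 0.02, 0.03]
toy_effect, toy_effect_se = ivw_estimate(toy_bx, toy_by, toy_se)
```

Because every toy instrument implies the same ratio, the pooled estimate equals that ratio; with real instruments, disagreement between IVW and the median, mode, and MR-Egger estimates is what signals possible pleiotropy.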
Subjects
Biomarkers , Mendelian Randomization Analysis , Humans , Biomarkers/blood , Bipolar Disorder/genetics , Bipolar Disorder/blood , Cardiovascular Diseases/genetics , Cardiovascular Diseases/blood , Risk Factors , Cystatin C/blood , Cystatin C/genetics , Gout/genetics , Gout/blood
ABSTRACT
BACKGROUND: Venous thromboembolism (VTE) is a life-threatening vascular event with environmental and genetic determinants. Recent VTE genome-wide association studies (GWAS) meta-analyses involved nearly 30 000 VTE cases and identified up to 40 genetic loci associated with VTE risk, including loci not previously suspected to play a role in hemostasis. The aim of our research was to expand discovery of new genetic loci associated with VTE by using cross-ancestry genomic resources. METHODS: We present new cross-ancestry meta-analyzed GWAS results involving up to 81 669 VTE cases from 30 studies, with replication of novel loci in independent populations and loci characterization through in silico genomic interrogations. RESULTS: In our genetic discovery effort that included 55 330 participants with VTE (47 822 European, 6320 African, and 1188 Hispanic ancestry), we identified 48 novel associations, of which 34 were replicated after correction for multiple testing. In our combined discovery-replication analysis (81 669 VTE participants) and ancestry-stratified meta-analyses (European, African, and Hispanic), we identified another 44 novel associations, which are new candidate VTE-associated loci requiring replication. In total, across all GWAS meta-analyses, we identified 135 independent genomic loci significantly associated with VTE risk. A genetic risk score of the significantly associated loci in Europeans identified a 6-fold increase in risk for those in the top 1% of scores compared with those with average scores. We also identified 31 novel transcript associations in transcriptome-wide association studies and 8 novel candidate genes with protein quantitative-trait locus Mendelian randomization analyses. 
In silico interrogations of hemostasis and hematology traits and a large phenome-wide association analysis of the 135 GWAS loci provided insights to biological pathways contributing to VTE, with some loci contributing to VTE through well-characterized coagulation pathways and others providing new data on the role of hematology traits, particularly platelet function. Many of the replicated loci are outside of known or currently hypothesized pathways to thrombosis. CONCLUSIONS: Our cross-ancestry GWAS meta-analyses identified new loci associated with VTE. These findings highlight new pathways to thrombosis and provide novel molecules that may be useful in the development of improved antithrombosis treatments.
Subjects
Thrombosis , Venous Thromboembolism , Genetic Predisposition to Disease , Genome-Wide Association Study , Genomics , Humans , Single Nucleotide Polymorphism , Quantitative Trait Loci , Thrombosis/genetics , Venous Thromboembolism/diagnosis , Venous Thromboembolism/genetics
ABSTRACT
BACKGROUND: Lyme disease is the most prevalent vector-borne disease in the US, yet its host factors are poorly understood and diagnostic tests are limited. We evaluated patients in a large health system to uncover cholesterol's role in the susceptibility, severity, and machine learning-based diagnosis of Lyme disease. METHODS: A longitudinal health system cohort comprised 1 019 175 individuals with electronic health record data and 50 329 with linked genetic data. Associations of blood cholesterol level, cholesterol genetic scores comprising common genetic variants, and burden of rare loss-of-function (LoF) variants in cholesterol metabolism genes with Lyme disease were investigated. A portable machine learning model was constructed and tested to predict Lyme disease using routine lipid and clinical measurements. RESULTS: There were 3832 cases of Lyme disease. Increasing cholesterol was associated with greater risk of Lyme disease and hypercholesterolemia was more prevalent in Lyme disease cases than in controls. Cholesterol genetic scores and rare LoF variants in CD36 and LDLR were associated with Lyme disease risk. Serological profiling of cases revealed parallel trajectories of rising cholesterol and immunoglobulin levels over the disease course, including marked increases in individuals with LoF variants and high cholesterol genetic scores. The machine learning model predicted Lyme disease solely using routine lipid panel, blood count, and metabolic measurements. CONCLUSIONS: These results demonstrate the value of large-scale genetic and clinical data to reveal host factors underlying infectious disease biology, risk, and prognosis and the potential for their clinical translation to machine learning diagnostics that do not need specialized assays.
Subjects
Hypercholesterolemia , Lyme Disease , Humans , Lyme Disease/diagnosis , Lyme Disease/epidemiology , Cholesterol , Prognosis , Machine Learning
ABSTRACT
Diabetic retinopathy (DR) is a common consequence in type 2 diabetes (T2D) and a leading cause of blindness in working-age adults. Yet, its genetic predisposition is largely unknown. Here, we examined the polygenic architecture underlying DR by deriving and assessing a genome-wide polygenic risk score (PRS) for DR. We evaluated the PRS in 6079 individuals with T2D of European, Hispanic, African and other ancestries from a large-scale multi-ethnic biobank. Main outcomes were PRS association with DR diagnosis, symptoms and complications, and time to diagnosis, and transferability to non-European ancestries. We observed that PRS was significantly associated with DR. A standard deviation increase in PRS was accompanied by an adjusted odds ratio (OR) of 1.12 [95% confidence interval (CI) 1.04-1.20; P = 0.001] for DR diagnosis. When stratified by ancestry, PRS was associated with the highest OR in European ancestry (OR = 1.22, 95% CI 1.02-1.41; P = 0.049), followed by African (OR = 1.15, 95% CI 1.03-1.28; P = 0.028) and Hispanic ancestries (OR = 1.10, 95% CI 1.00-1.10; P = 0.050). Individuals in the top PRS decile had a 1.8-fold elevated risk for DR versus the bottom decile (P = 0.002). Among individuals without DR diagnosis, the top PRS decile had more DR symptoms than the bottom decile (P = 0.008). The PRS was associated with retinal hemorrhage (OR = 1.44, 95% CI 1.03-2.02; P = 0.03) and earlier DR presentation (10% probability of DR by 4 years in the top PRS decile versus 8 years in the bottom decile). These results establish the significant polygenic underpinnings of DR and indicate the need for more diverse ancestries in biobanks to develop multi-ancestral PRS.
Subjects
Type 2 Diabetes Mellitus/epidemiology , Diabetic Retinopathy/epidemiology , Genetic Predisposition to Disease , Genome-Wide Association Study , Adult , Aged , Black People/genetics , Type 2 Diabetes Mellitus/complications , Type 2 Diabetes Mellitus/genetics , Type 2 Diabetes Mellitus/pathology , Diabetic Retinopathy/complications , Diabetic Retinopathy/genetics , Diabetic Retinopathy/pathology , Hispanic or Latino/genetics , Humans , Middle Aged , Multifactorial Inheritance/genetics , Risk Assessment , Risk Factors , White People/genetics
ABSTRACT
BACKGROUND: Micronutrients, namely vitamins and minerals, are associated with cancer outcomes; however, their reported effects have been inconsistent across studies. We aimed to identify the causally estimated effects of micronutrients on cancer by applying the Mendelian randomization (MR) method, using single-nucleotide polymorphisms associated with micronutrient levels as instrumental variables. METHODS: We obtained instrumental variables of 14 genetically predicted micronutrient levels and applied two-sample MR to estimate their causal effects on 22 cancer outcomes from a meta-analysis of the UK Biobank (UKB) and FinnGen cohorts (overall cancer and 21 site-specific cancers, including breast, colorectal, lung, and prostate cancer), in addition to six major cancer outcomes and 20 cancer subset outcomes from cancer consortia. We used sensitivity MR methods, including weighted median, MR-Egger, and MR-PRESSO, to assess potential horizontal pleiotropy or heterogeneity. Genome-wide association summary statistical data of European descent were used for both exposure and outcome data, including up to 940,633 participants of European descent with 133,384 cancer cases. RESULTS: In total, 672 MR tests (14 micronutrients × 48 cancer outcomes) were performed. The following two associations met Bonferroni significance by the number of associations (P < 0.00016) in the UKB plus FinnGen cohorts: increased risk of breast cancer with magnesium levels (odds ratio [OR] = 1.281 per 1 standard deviation [SD] higher magnesium level, 95% confidence interval [CI] = 1.151 to 1.426, P < 0.0001) and increased risk of colorectal cancer with vitamin B12 level (OR = 1.22 per 1 SD higher vitamin B12 level, 95% CI = 1.107 to 1.345, P < 0.0001). These two associations remained significant in the analysis of the cancer consortia. No significant heterogeneity or horizontal pleiotropy was observed. Micronutrient levels were not associated with overall cancer risk. 
CONCLUSIONS: Our results may aid clinicians in deciding whether to regulate the intake of certain micronutrients, particularly in high-risk groups without nutritional deficiencies, and may help in the design of future clinical trials.
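The test counts and significance threshold quoted above can be checked with simple arithmetic. The Python sketch below assumes a family-wise α of 0.05 and that the threshold of P < 0.00016 was computed over the 14 × 22 tests in the UKB plus FinnGen analysis; both are inferences from the reported numbers rather than statements made in the abstract.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


n_micronutrients = 14
n_ukb_finngen_outcomes = 22      # overall cancer plus 21 site-specific cancers
n_consortia_outcomes = 6 + 20    # six major outcomes plus 20 subset outcomes

# 14 x (22 + 26) reproduces the 672 MR tests reported in total.
total_tests = n_micronutrients * (n_ukb_finngen_outcomes + n_consortia_outcomes)

# Assumed alpha of 0.05 spread over the 14 x 22 = 308 UKB plus FinnGen tests.
ukb_finngen_tests = n_micronutrients * n_ukb_finngen_outcomes
threshold = bonferroni_threshold(0.05, ukb_finngen_tests)
```

Under these assumptions the threshold is about 1.6 × 10^-4, which matches the reported cutoff; correcting over all 672 tests would instead give roughly 7.4 × 10^-5.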
Subjects
Breast Neoplasms , Micronutrients , Humans , Male , Genome-Wide Association Study , Magnesium , Mendelian Randomization Analysis , Female
ABSTRACT
A major goal of biomedicine is to understand the function of every gene in the human genome. Loss-of-function mutations can disrupt both copies of a given gene in humans and phenotypic analysis of such 'human knockouts' can provide insight into gene function. Consanguineous unions are more likely to result in offspring carrying homozygous loss-of-function mutations. In Pakistan, consanguinity rates are notably high. Here we sequence the protein-coding regions of 10,503 adult participants in the Pakistan Risk of Myocardial Infarction Study (PROMIS), designed to understand the determinants of cardiometabolic diseases in individuals from South Asia. We identified individuals carrying homozygous predicted loss-of-function (pLoF) mutations, and performed phenotypic analysis involving more than 200 biochemical and disease traits. We enumerated 49,138 rare (<1% minor allele frequency) pLoF mutations. These pLoF mutations are estimated to knock out 1,317 genes, each in at least one participant. Homozygosity for pLoF mutations at PLA2G7 was associated with absent enzymatic activity of soluble lipoprotein-associated phospholipase A2; at CYP2F1, with higher plasma interleukin-8 concentrations; at TREH, with lower concentrations of apoB-containing lipoprotein subfractions; at either A3GALT2 or NRG4, with markedly reduced plasma insulin C-peptide concentrations; and at SLC9A3R1, with mediators of calcium and phosphate signalling. Heterozygous deficiency of APOC3 has been shown to protect against coronary heart disease; we identified APOC3 homozygous pLoF carriers in our cohort. We recruited these human knockouts and challenged them with an oral fat load. Compared with family members lacking the mutation, individuals with APOC3 knocked out displayed marked blunting of the usual post-prandial rise in plasma triglycerides. 
Overall, these observations provide a roadmap for a 'human knockout project', a systematic effort to understand the phenotypic consequences of complete disruption of genes in humans.
Subjects
Consanguinity , DNA Mutational Analysis , Gene Deletion , Genes/genetics , Genetic Association Studies/methods , Homozygote , Phenotype , 1-Alkyl-2-acetylglycerophosphocholine Esterase/deficiency , 1-Alkyl-2-acetylglycerophosphocholine Esterase/genetics , Apolipoprotein C-III/deficiency , Apolipoprotein C-III/genetics , Cohort Studies , Coronary Disease/blood , Coronary Disease/genetics , Cytochrome P450 Family 2/genetics , Dietary Fats/pharmacology , Exome/genetics , Fasting/blood , Female , Gene Frequency , Humans , Interleukin-8/blood , Male , Middle Aged , Myocardial Infarction/blood , Myocardial Infarction/genetics , Neuregulins/genetics , Pakistan , Pedigree , Phosphoproteins/genetics , Postprandial Period , RNA Splice Sites/genetics , Reverse Genetics/methods , Sodium-Hydrogen Exchangers/genetics , Triglycerides/blood
ABSTRACT
Lipid levels are important markers for the development of cardio-metabolic diseases. Although hundreds of associated loci have been identified through genetic association studies, the contribution of genetic factors to variation in lipids is not fully understood, particularly in U.S. minority groups. We performed genome-wide association analyses for four lipid traits in over 45,000 ancestrally diverse participants from the Population Architecture using Genomics and Epidemiology (PAGE) Study, followed by a meta-analysis with several European ancestry studies. We identified nine novel lipid loci, five of which showed evidence of replication in independent studies. Furthermore, we discovered one novel gene in a PrediXcan analysis, minority-specific independent signals at eight previously reported loci, and potential functional variants at two known loci through fine-mapping. Systematic examination of known lipid loci revealed smaller effect estimates in African American and Hispanic ancestry populations than those in Europeans, and better performance of polygenic risk scores based on minority-specific effect estimates. Our findings provide new insight into the genetic architecture of lipid traits and highlight the importance of conducting genetic studies in diverse populations in the era of precision medicine.
Subjects
Lipids/blood , Lipids/genetics , Racial Groups/genetics , Genetic Databases , Female , Genome-Wide Association Study/methods , Genotype , Humans , Lipids/analysis , Male , Metagenomics/methods , Minority Groups , Multifactorial Inheritance/genetics , Phenotype , Single Nucleotide Polymorphism/genetics , United States/epidemiology
ABSTRACT
BACKGROUND AND PURPOSE: Stroke is the leading cause of death and long-term disability worldwide. Previous genome-wide association studies identified 51 loci associated with stroke (mostly ischemic) and its subtypes among predominantly European populations. Using whole-genome sequencing in ancestrally diverse populations from the Trans-Omics for Precision Medicine (TOPMed) Program, we aimed to identify novel variants, especially low-frequency or ancestry-specific variants, associated with all stroke, ischemic stroke and its subtypes (large artery, cardioembolic, and small vessel), and hemorrhagic stroke and its subtypes (intracerebral and subarachnoid). METHODS: Whole-genome sequencing data were available for 6833 stroke cases and 27 116 controls, including 22 315 European, 7877 Black, 2616 Hispanic/Latino, 850 Asian, 54 Native American, and 237 other ancestry participants. In TOPMed, we performed single variant association analysis examining 40 million common variants and aggregated association analysis focusing on rare variants. We also combined TOPMed European populations with over 28 000 additional European participants from the UK BioBank genome-wide array data through meta-analysis. RESULTS: In the single variant association analysis in TOPMed, we identified one novel locus, 13q33, for large artery stroke at whole-genome-wide significance (P<5.00×10⁻⁹) and 4 novel loci at genome-wide significance (P<5.00×10⁻⁸), all of which need confirmation in independent studies. Lead variants in all 5 loci are low-frequency but are more common in non-European populations. An aggregation of synonymous rare variants within the gene C6orf26 demonstrated suggestive evidence of association for hemorrhagic stroke (P<3.11×10⁻⁶). By meta-analyzing European ancestry samples in TOPMed and UK BioBank, we replicated several previously reported stroke loci including PITX2, HDAC9, ZFHX3, and LRCH1.
CONCLUSIONS: We present the first association analysis for stroke and its subtypes using whole-genome sequencing data from ancestrally diverse populations. While our findings suggest the potential benefits of combining whole-genome sequencing data with populations of diverse genetic backgrounds to identify possible low-frequency or ancestry-specific variants, they also highlight the need to increase genome coverage and sample sizes.
Subjects
Genetic Loci , Genetic Predisposition to Disease , Single Nucleotide Polymorphism , Precision Medicine , Racial Groups/genetics , Stroke/genetics , Aged , Aged 80 and over , Female , Genome-Wide Association Study , Humans , Male , Middle Aged , Whole Genome Sequencing
ABSTRACT
Genetic risk for coronary artery disease (CAD) is commonly measured with polygenic risk scores (PRS); yet, the relationship of atherosclerotic burden with PRS in healthy individuals not at high clinical risk for CAD (i.e., without a high pooled cohort equations [PCE] score) is unknown. Here, we implemented a novel recall-by-PRS strategy to measure coronary artery calcium (CAC) scores prospectively in 53 healthy individuals with extremely high PRS (median [IQR] PRS = 94% [83-98]) or extremely low PRS (median [IQR] PRS = 3.6% [1.2-10]). The high PRS group was associated with a 2.8-fold greater CAC than the low PRS group, adjusted for age, sex, BMI, smoking, and statin use, and had a 6.7-fold greater proportion of individuals with CAC exceeding 300 HU. These findings reveal that extreme PRS tracks with CAD risk even in those without high clinical risk and demonstrate proof of principle for recall-by-PRS approaches that should be assessed prospectively in larger trials.
Subjects
Calcium , Coronary Artery Disease , Dietary Calcium , Cohort Studies , Coronary Artery Disease/genetics , Humans , Risk Assessment , Risk Factors
ABSTRACT
Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
Subjects
Exome/genetics , Genetic Variation/genetics , DNA Mutational Analysis , Datasets as Topic , Humans , Phenotype , Proteome/genetics , Rare Diseases/genetics , Sample Size
ABSTRACT
Importance: Population-based assessment of disease risk associated with gene variants informs clinical decisions and risk stratification approaches. Objective: To evaluate the population-based disease risk of clinical variants in known disease predisposition genes. Design, Setting, and Participants: This cohort study included 72 434 individuals with 37 780 clinical variants who were enrolled in the BioMe Biobank from 2007 onwards with follow-up until December 2020 and the UK Biobank from 2006 to 2010 with follow-up until June 2020. Participants had linked exome and electronic health record data, were older than 20 years, and were of diverse ancestral backgrounds. Exposures: Variants previously reported as pathogenic or predicted to cause a loss of protein function by bioinformatic algorithms (pathogenic/loss-of-function variants). Main Outcomes and Measures: The primary outcome was the disease risk associated with clinical variants. The risk difference (RD) between the prevalence of disease in individuals with a variant allele (penetrance) vs in individuals with a normal allele was measured. Results: Among 72 434 study participants, 43 395 were from the UK Biobank (mean [SD] age, 57 [8.0] years; 24 065 [55%] women; 2948 [7%] non-European) and 29 039 were from the BioMe Biobank (mean [SD] age, 56 [16] years; 17 355 [60%] women; 19 663 [68%] non-European). Of 5360 pathogenic/loss-of-function variants, 4795 (89%) were associated with an RD less than or equal to 0.05. Mean penetrance was 6.9% (95% CI, 6.0%-7.8%) for pathogenic variants and 0.85% (95% CI, 0.76%-0.95%) for benign variants reported in ClinVar (difference, 6.0 [95% CI, 5.6-6.4] percentage points), with a median of 0% for both groups due to large numbers of nonpenetrant variants.
Penetrance of pathogenic/loss-of-function variants for late-onset diseases was modified by age: mean penetrance was 10.3% (95% CI, 9.0%-11.6%) in individuals 70 years or older and 8.5% (95% CI, 7.9%-9.1%) in individuals 20 years or older (difference, 1.8 [95% CI, 0.40-3.3] percentage points). Penetrance of pathogenic/loss-of-function variants was heterogeneous even in known disease predisposition genes, including BRCA1 (mean [range], 38% [0%-100%]), BRCA2 (mean [range], 38% [0%-100%]), and PALB2 (mean [range], 26% [0%-100%]). Conclusions and Relevance: In 2 large biobank cohorts, the estimated penetrance of pathogenic/loss-of-function variants was variable but generally low. Further research of population-based penetrance is needed to refine variant interpretation and clinical evaluation of individuals with these variant alleles.
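As an illustration of the risk difference (RD) measure defined in this abstract, a minimal sketch follows. It is not the study's code: the function names and the counts are hypothetical, and it assumes RD is simply the disease proportion among carriers of a variant allele (penetrance) minus the disease proportion among non-carriers.

```python
def penetrance(affected: int, total: int) -> float:
    """Proportion of individuals in a group who have the disease."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= affected <= total:
        raise ValueError("affected must be between 0 and total")
    return affected / total


def risk_difference(carriers_affected: int, carriers_total: int,
                    noncarriers_affected: int, noncarriers_total: int) -> float:
    """RD = disease proportion in carriers minus disease proportion in non-carriers."""
    return (penetrance(carriers_affected, carriers_total)
            - penetrance(noncarriers_affected, noncarriers_total))


# Hypothetical counts, chosen only for illustration (not taken from the study).
rd = risk_difference(carriers_affected=7, carriers_total=100,
                     noncarriers_affected=1000, noncarriers_total=50000)
print(f"Risk difference: {rd:.3f}")
```

With these made-up counts, carriers have a penetrance of 7% and non-carriers a prevalence of 2%, so the RD sits at the 0.05 cut-off used in the abstract.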
Subjects
Genetic Predisposition to Disease , Genetic Variation , Loss of Function Mutation , Penetrance , Aged , Biological Specimen Banks , Cohort Studies , Female , Humans , Male , Mutation , United Kingdom
ABSTRACT
Biobanks with exomes linked to electronic health records (EHRs) enable the study of genetic pleiotropy between rare variants and seemingly disparate diseases. We performed robust clinical phenotyping of rare, putatively deleterious variants (loss-of-function [LoF] and deleterious missense variants) in ERCC6, a gene implicated in inherited retinal disease. We analyzed 213,084 exomes, along with a targeted set of retinal, cardiac, and immune phenotypes from two large-scale EHR-linked biobanks. In the primary analysis, a burden of deleterious variants in ERCC6 was strongly associated with (1) retinal disorders; (2) cardiac and electrocardiogram perturbations; and (3) immunodeficiency and decreased immunoglobulin levels. Meta-analysis of results from the BioMe Biobank and UK Biobank showed a significant association of deleterious ERCC6 burden with retinal dystrophy (odds ratio [OR] = 2.6, 95% confidence interval [CI]: 1.5-4.6; p = 8.7 × 10⁻⁴), atypical atrial flutter (OR = 3.5, 95% CI: 1.9-6.5; p = 6.2 × 10⁻⁵), arrhythmia (OR = 1.5, 95% CI: 1.2-2.0; p = 2.7 × 10⁻³), and lymphocyte immunodeficiency (OR = 3.8, 95% CI: 2.1-6.8; p = 5.0 × 10⁻⁶). Carriers of ERCC6 LoF variants who lacked a diagnosis of these conditions exhibited increased symptoms, indicating underdiagnosis. These results reveal a unique genetic link among retinal, cardiac, and immune disorders and underscore the value of EHR-linked biobanks in assessing the full clinical profile of carriers of rare variants.
Subjects
Genetic Pleiotropy , Retinal Dystrophies , Cardiac Arrhythmias , DNA Helicases , DNA Repair Enzymes , Exome , Humans , Poly-ADP-Ribose Binding Proteins , Retinal Dystrophies/genetics , Exome Sequencing/methods
ABSTRACT
While sequence-based genetic tests have long been available for specific loci, especially for Mendelian disease, the rapidly falling costs of genome-wide genotyping arrays, whole-exome sequencing, and whole-genome sequencing are moving us toward a future where full genomic information might inform the prognosis and treatment of a variety of diseases, including complex disease. Similarly, the availability of large populations with full genomic information has enabled new insights about the etiology and genetic architecture of complex disease. Insights from the latest generation of genomic studies suggest that our categorization of diseases as complex may conceal a wide spectrum of genetic architectures and causal mechanisms that ranges from Mendelian forms of complex disease to complex regulatory structures underlying Mendelian disease. Here, we review these insights, along with advances in the prediction of disease risk and outcomes from full genomic information.