Results 1 - 20 of 336
1.
medRxiv ; 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-39040172

ABSTRACT

The number of assays on highly multiplexed proteomic platforms has grown ten-fold over the past 15 years, from fewer than 1,000 to >11,000. The leading aptamer-based and antibody-based platforms have different strengths. For example, Eldjarn et al. [1] demonstrated that the aptamer-based SomaScan 5k (4,907 assays, assessed in the Icelandic 36K) and the antibody-based Olink Explore 3072 (2,931 assays, assessed in the UK Biobank) had a similar number of cis-pQTLs among all targets (2,120 vs. 2,101) but Olink had a greater number of cis-pQTLs among the overlapping targets (1,164 vs. 1,467). Analysis of split plasma measures showed the SomaScan assays to be more precise: median coefficient of variation (CV) of 9.9% vs. 16.5% for Olink [1]. Precision of the newest versions of the platforms, SomaScan 11k (>11,000 assays, released in December 2023) and Olink Explore HT (>5,400 assays, released in July 2023), has not yet been established. We assessed the reproducibility of the SomaScan 11k and Olink Explore HT using split plasma samples from 102 Atherosclerosis Risk in Communities (ARIC) Study participants. We found that the SomaScan 11k assays had a median CV of 6.8% (vs. 6.6% for the subset of assays available on the SomaScan 5k) and the Olink Explore HT assays had a median CV of 35.7% (vs. 19.8% for the subset of assays available on the Olink Explore 3072). Across Olink assays, the CVs were strongly negatively correlated with protein detectability, i.e., percent of samples above the limit of detection (LOD). For the 4,443 overlapping assays, the distribution of between-platform correlations was bimodal, with a peak at r ∼0 and another, smaller peak at r ∼0.8. These findings on precision are consistent with the updated results by Eldjarn et al. [1] but indicate that precision of these two leading platforms in human plasma has diverged as the number of included proteins has increased.
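
The precision metric reported above, a median CV across assays from split samples, can be illustrated with a minimal Python sketch. The abstract does not state how the authors computed CV, so the convention used here (within-pair SD divided by pair mean, averaged over pairs for each assay, on the raw measurement scale) is an assumption, and all function names are hypothetical.

```python
import math
from statistics import median


def pair_cv(x1: float, x2: float) -> float:
    """CV for one split-sample pair: within-pair SD divided by the pair mean."""
    mean = (x1 + x2) / 2
    sd = abs(x1 - x2) / math.sqrt(2)  # sample SD of exactly two values
    return sd / mean


def assay_cv(pairs):
    """Assay-level CV: average of the pair CVs (assumed convention)."""
    return sum(pair_cv(a, b) for a, b in pairs) / len(pairs)


def platform_median_cv(assays):
    """Median of the assay-level CVs; `assays` maps assay name -> list of pairs."""
    return median(assay_cv(pairs) for pairs in assays.values())
```

For example, `platform_median_cv({"A": [(10.0, 12.0)], "B": [(5.0, 5.0)]})` takes the median of one noisy assay and one perfectly reproducible assay.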

2.
Environ Pollut ; : 124484, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38960120

ABSTRACT

Sundarban, a Ramsar site of India, has been encountering an ecological threat due to the presence of microplastic (MP) wastes generated from different anthropogenic sources. Clibanarius longitarsus, an intertidal hermit crab of Sundarban Biosphere Reserve, resides within the abandoned shell of a gastropod mollusc, Telescopium telescopium. We characterized and estimated the MP in the gills and gut of the hermit crab, as well as in the water present in its occupied gastropod shell. The average microplastic abundances in sea water, sand and sediment were 0.175 ± 0.145 MP L⁻¹, 42 ± 15.03 MP kg⁻¹ and 67.63 ± 24.13 MP kg⁻¹, respectively. The average microplastic load in the hermit crab was 1.94 ± 0.59 MP crab⁻¹, with 33.89% and 66.11% in gills and gut, respectively. Gastropod shell water exhibited accumulation of 1.69 ± 1.43 MP L⁻¹. Transparent and fibrous microplastics were documented as the dominant types in water, sand and sediment. Shell water exhibited the prevalence of green microplastics followed by transparent ones. Microscopic examination revealed that microplastics in the 100-300 µm size category were dominant across all abiotic compartments. ATR-FTIR and Raman spectroscopy confirmed polyethylene and polypropylene as the prevalent polymers among the five identified polymers of biotic and abiotic components. The target group index indicated green and black as the preferred microplastics of the crab. The ecological risk analysis indicated a considerable level of environmental pollution risk in Sundarban and its inhabiting organisms. This information base may facilitate the development of a mitigation strategy to limit the MP-induced ecological risk at Sundarban Biosphere Reserve.

3.
Diabetes ; 2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38869630

ABSTRACT

Genetic studies of non-traditional glycemic biomarkers, glycated albumin and fructosamine, can shed light on unknown aspects of type 2 diabetes genetics and biology. We performed a multi-phenotype GWAS of glycated albumin and fructosamine from 7,395 White and 2,016 Black participants in the Atherosclerosis Risk in Communities (ARIC) study on common variants from genotyped/imputed data. We discovered 2 genome-wide significant loci, one mapping to a known type 2 diabetes gene (ARAP1/STARD10) and another mapping to a novel region (UGT1A complex of genes), using multi-omics gene-mapping strategies in diabetes-relevant tissues. We identified additional loci that were ancestry- and sex-specific (e.g., PRKCA in African ancestry, FCGRT in European ancestry, TEX29 in males). Further, we implemented multi-phenotype gene-burden tests on whole-exome sequence data from 6,590 White and 2,309 Black ARIC participants. Ten variant sets annotated to genes across different variant aggregation strategies were exome-wide significant only in multi-ancestry analysis, of which CD1D, EGFL7/AGPAT2 and MIR126 had notable enrichment of rare predicted loss-of-function variants in African ancestry despite smaller sample sizes. Overall, 8 out of 14 discovered loci and genes were implicated in influencing these biomarkers via glycemic pathways, and most of them were not previously implicated in studies of type 2 diabetes. This study illustrates improved locus discovery and potential effector gene discovery by leveraging joint patterns of related biomarkers across the entire allele frequency spectrum in multi-ancestry analysis. Future investigation of the loci and genes potentially acting through glycemic pathways may help us better understand risk of developing type 2 diabetes.

4.
JAMIA Open ; 7(2): ooae055, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38938691

ABSTRACT

Objectives: Absolute risk models estimate an individual's future disease risk over a specified time interval. Applications that use server-side risk tooling, such as the R-based iCARE (R-iCARE), to build, validate, and apply absolute risk models face limitations in portability and privacy because they must send user data to remote servers to operate. We overcome this by porting iCARE to the web platform. Materials and Methods: We refactored R-iCARE into a Python package (Py-iCARE) and then compiled it to WebAssembly (Wasm-iCARE), a portable web module that operates within the privacy of the user's device. Results: We showcase the portability and privacy of Wasm-iCARE through 2 applications: for researchers to statistically validate risk models and to deliver them to end-users. Both applications run entirely on the client side, requiring no downloads or installations, and keep user data on-device during risk calculation. Conclusions: Wasm-iCARE fosters accessible and privacy-preserving risk tools, accelerating their validation and delivery.
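
To make the notion of an absolute risk model concrete, here is a deliberately simplified Python sketch. It is not the Py-iCARE or R-iCARE API: the function name and inputs are hypothetical, and it assumes a proportional-hazards form with yearly baseline hazards and no competing mortality, which a full absolute risk tool would normally account for.

```python
import math


def absolute_risk(baseline_hazards, relative_risk):
    """Probability of developing disease over the interval covered by
    `baseline_hazards` (one hazard per year for a reference individual),
    for a person whose hazard is multiplied by `relative_risk`.

    Simplifying assumption: no competing causes of death.
    """
    cumulative_hazard = sum(h * relative_risk for h in baseline_hazards)
    return 1.0 - math.exp(-cumulative_hazard)
```

With a constant baseline hazard of 0.001 per year over 10 years, doubling the relative risk roughly doubles the resulting absolute risk, because the cumulative hazard is small.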

5.
Am J Epidemiol ; 2024 May 29.
Article in English | MEDLINE | ID: mdl-38806447

ABSTRACT

Polygenic risk scores (PRS) are rapidly emerging as a way to measure disease risk by aggregating multiple genetic variants. Understanding the interplay of PRS with environmental factors is critical for interpreting and applying PRS in a wide variety of settings. We develop an efficient method for simultaneously modeling gene-environment correlations and interactions using PRS in case-control studies. We use a logistic-normal regression modeling framework to specify the disease risk and PRS distribution in the underlying population and propose joint inference across the two models using the retrospective likelihood of the case-control data. Extensive simulation studies demonstrate the flexibility of the method in trading off bias and efficiency for the estimation of various model parameters compared to standard logistic regression or a case-only analysis for gene-environment interactions, or a control-only analysis for gene-environment correlations. Finally, using simulated case-control data sets within the UK Biobank study, we demonstrate the ability of our method to recover results from the full prospective cohort for the detection of an interaction between long-term oral contraceptive use and PRS on the risk of breast cancer. This method is computationally efficient and implemented in a user-friendly R package.

6.
Nat Commun ; 15(1): 3238, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622117

ABSTRACT

Great efforts are being made to develop advanced polygenic risk scores (PRS) to improve the prediction of complex traits and diseases. However, most existing PRS are primarily trained on European ancestry populations, limiting their transferability to non-European populations. In this article, we propose a novel method for generating multi-ancestry Polygenic Risk scOres based on enSemble of PEnalized Regression models (PROSPER). PROSPER integrates genome-wide association studies (GWAS) summary statistics from diverse populations to develop ancestry-specific PRS with improved predictive power for minority populations. The method uses a combination of L1 (lasso) and L2 (ridge) penalty functions, a parsimonious specification of the penalty parameters across populations, and an ensemble step to combine PRS generated across different penalty parameters. We evaluate the performance of PROSPER and other existing methods on large-scale simulated and real datasets, including those from 23andMe Inc., the Global Lipids Genetics Consortium, and All of Us. Results show that PROSPER can substantially improve multi-ancestry polygenic prediction compared to alternative methods across a wide variety of genetic architectures. In real data analyses, for example, PROSPER increased out-of-sample prediction R² for continuous traits by an average of 70% compared to a state-of-the-art Bayesian method (PRS-CSx) in the African ancestry population. Further, PROSPER is computationally highly scalable for the analysis of large SNP contents and many diverse populations.
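
The combination of L1 and L2 penalties mentioned above can be sketched generically in Python. This shows only the textbook form of a combined lasso and ridge penalty added to a residual sum of squares; it is not PROSPER's actual objective, whose cross-population penalty structure is not given in the abstract. The function name and the tuning parameters `lam1` and `lam2` are assumptions for illustration.

```python
def penalized_loss(residual_ss, beta, lam1, lam2):
    """Residual sum of squares plus a combined L1 (lasso) and L2 (ridge) penalty.

    residual_ss: fit term already computed for the coefficient vector `beta`
    lam1, lam2:  non-negative penalty weights (hypothetical tuning parameters)
    """
    l1 = sum(abs(b) for b in beta)   # lasso term: encourages sparse effects
    l2 = sum(b * b for b in beta)    # ridge term: shrinks effect sizes smoothly
    return residual_ss + lam1 * l1 + lam2 * l2
```

Setting `lam2 = 0` recovers a pure lasso objective and `lam1 = 0` a pure ridge objective, so a grid over both weights yields a family of candidate scores that an ensemble step could then combine.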


Subject(s)
Genome-Wide Association Study , Population Health , Humans , Bayes Theorem , Multifactorial Inheritance/genetics , Black People/genetics , Genetic Risk Score , Risk Factors
7.
Cell Genom ; 4(4): 100539, 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38604127

ABSTRACT

Polygenic risk scores (PRSs) are now showing promising predictive performance on a wide variety of complex traits and diseases, but there exists a substantial performance gap across populations. We propose MUSSEL, a method for ancestry-specific polygenic prediction that borrows information in summary statistics from genome-wide association studies (GWASs) across multiple ancestry groups via Bayesian hierarchical modeling and ensemble learning. In our simulation studies and data analyses across four distinct studies, totaling 5.7 million participants with a substantial ancestral diversity, MUSSEL shows promising performance compared to alternatives. For example, MUSSEL has an average gain in prediction R² across 11 continuous traits of 40.2% and 49.3% compared to PRS-CSx and CT-SLEB, respectively, in the African ancestry population. The best-performing method, however, varies by GWAS sample size, target ancestry, trait architecture, and linkage disequilibrium reference samples; thus, ultimately a combination of methods may be needed to generate the most robust PRSs across diverse populations.


Subject(s)
Bivalvia , Multifactorial Inheritance , Humans , Animals , Multifactorial Inheritance/genetics , Genome-Wide Association Study/methods , Bayes Theorem , Phenotype , Genetic Risk Score
8.
HGG Adv ; 5(2): 100283, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38491773

ABSTRACT

Integrating results from genome-wide association studies (GWASs) and studies of molecular phenotypes such as gene expression can improve our understanding of the biological functions of trait-associated variants and can help prioritize candidate genes for downstream analysis. Using reference expression quantitative trait locus (eQTL) studies, several methods have been proposed to identify gene-trait associations, primarily based on gene expression imputation. To increase the statistical power by leveraging substantial eQTL sharing across tissues, meta-analysis methods aggregating such gene-based test results across multiple tissues or contexts have been developed as well. However, most existing meta-analysis methods have limited power to identify associations when the gene has weaker associations in only a few tissues and cannot identify the subset of tissues in which the gene is "activated." To address this, we developed a cross-tissue subset-based transcriptome-wide association study (CSTWAS) meta-analysis method that improves power under such scenarios and can extract the set of potentially associated tissues. To improve applicability, CSTWAS uses only GWAS summary statistics and pre-computed correlation matrices to identify a subset of tissues that have the maximal evidence of gene-trait association. Through numerical simulations, we found that CSTWAS maintains a well-calibrated type I error rate, improves power especially when there is a small number of associated tissues for a gene-trait association, and identifies an accurate associated tissue set. By analyzing GWAS summary statistics of three complex traits and diseases, we demonstrate that CSTWAS could identify biologically meaningful signals while providing an interpretation of disease etiology by extracting a set of potentially associated tissues.


Subject(s)
Genome-Wide Association Study , Transcriptome , Transcriptome/genetics , Genome-Wide Association Study/methods , Phenotype , Quantitative Trait Loci/genetics
9.
Biostatistics ; 2024 Mar 08.
Article in English | MEDLINE | ID: mdl-38459704

ABSTRACT

Mendelian randomization (MR) analysis is increasingly popular for testing the causal effect of exposures on disease outcomes using data from genome-wide association studies. In some settings, the underlying exposure, such as systemic inflammation, may not be directly observable, but measurements can be available on multiple biomarkers or other types of traits that are co-regulated by the exposure. We propose a method for MR analysis on latent exposures (MRLE), which tests the significance for, and the direction of, the effect of a latent exposure by leveraging information from multiple related traits. The method is developed by constructing a set of estimating functions based on the second-order moments of GWAS summary association statistics for the observable traits, under a structural equation model where genetic variants are assumed to have indirect effects through the latent exposure and potentially direct effects on the traits. Simulation studies show that MRLE has well-controlled type I error rates and enhanced power compared to single-trait MR tests under various types of pleiotropy. Applications of MRLE using genetic association statistics across five inflammatory biomarkers (CRP, IL-6, IL-8, TNF-α, and MCP-1) provide evidence for potential causal effects of inflammation on increasing the risk of coronary artery disease, colorectal cancer, and rheumatoid arthritis, while standard MR analysis for individual biomarkers fails to detect consistent evidence for such effects.
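
For context on the single-trait MR analyses that the abstract uses as a comparison, the widely used inverse-variance weighted (IVW) estimator can be sketched in Python as follows. This is not MRLE, whose estimating functions are not given here; the function name and argument layout are hypothetical, and the sketch assumes independent variants and fixed effects.

```python
def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-variant GWAS summary statistics.

    beta_exposure: variant effects on the exposure (one biomarker)
    beta_outcome:  variant effects on the outcome
    se_outcome:    standard errors of the outcome effects
    """
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        weight = 1.0 / se ** 2          # inverse variance of the outcome effect
        numerator += weight * bx * by
        denominator += weight * bx * bx
    if denominator == 0.0:
        raise ValueError("no variant has a non-zero effect on the exposure")
    return numerator / denominator
```

If every variant has the same ratio of outcome effect to exposure effect, the estimate equals that ratio regardless of the weights.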

10.
medRxiv ; 2024 Feb 18.
Article in English | MEDLINE | ID: mdl-38405761

ABSTRACT

Obesity is a recognised risk factor for many cancers and with rising global prevalence, has become a leading cause of cancer. Here we summarise the current evidence from both population-based epidemiologic investigations and experimental studies on the role of obesity in cancer development. This review presents a new meta-analysis using data from 40 million individuals and reports positive associations with 19 cancer types. Utilising major new data from East Asia, the meta-analysis also shows that the strength of obesity and cancer associations varies regionally, with stronger relative risks for several cancers in East Asia. This review also presents current evidence on the mechanisms linking obesity and cancer and identifies promising future research directions. These include the use of new imaging data to circumvent the methodological issues involved with body mass index and the use of omics technologies to resolve biologic mechanisms with greater precision and clarity.

11.
J Allergy Clin Immunol ; 153(4): 954-968, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38295882

ABSTRACT

Studies of asthma and allergy are generating increasing volumes of omics data for analysis and interpretation. The National Institute of Allergy and Infectious Diseases (NIAID) assembled a workshop comprising investigators studying asthma and allergic diseases using omics approaches, omics investigators from outside the field, and NIAID medical and scientific officers to discuss the following areas in asthma and allergy research: genomics, epigenomics, transcriptomics, microbiomics, metabolomics, proteomics, lipidomics, integrative omics, systems biology, and causal inference. Current states of the art, present challenges, novel and emerging strategies, and priorities for progress were presented and discussed for each area. This workshop report summarizes the major points and conclusions from this NIAID workshop. As a group, the investigators underscored the imperatives for rigorous analytic frameworks, integration of different omics data types, cross-disciplinary interaction, strategies for overcoming current limitations, and the overarching goal to improve scientific understanding and care of asthma and allergic diseases.


Subject(s)
Asthma , Hypersensitivity , United States , Humans , National Institute of Allergy and Infectious Diseases (U.S.) , Hypersensitivity/genetics , Asthma/etiology , Genomics , Proteomics , Metabolomics
12.
bioRxiv ; 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-36993331

ABSTRACT

Great efforts are being made to develop advanced polygenic risk scores (PRS) to improve the prediction of complex traits and diseases. However, most existing PRS are primarily trained on European ancestry populations, limiting their transferability to non-European populations. In this article, we propose a novel method for generating multi-ancestry Polygenic Risk scOres based on enSemble of PEnalized Regression models (PROSPER). PROSPER integrates genome-wide association studies (GWAS) summary statistics from diverse populations to develop ancestry-specific PRS with improved predictive power for minority populations. The method uses a combination of ℒ1 (lasso) and ℒ2 (ridge) penalty functions, a parsimonious specification of the penalty parameters across populations, and an ensemble step to combine PRS generated across different penalty parameters. We evaluate the performance of PROSPER and other existing methods on large-scale simulated and real datasets, including those from 23andMe Inc., the Global Lipids Genetics Consortium, and All of Us. Results show that PROSPER can substantially improve multi-ancestry polygenic prediction compared to alternative methods across a wide variety of genetic architectures. In real data analyses, for example, PROSPER increased out-of-sample prediction R² for continuous traits by an average of 70% compared to a state-of-the-art Bayesian method (PRS-CSx) in the African ancestry population. Further, PROSPER is computationally highly scalable for the analysis of large SNP contents and many diverse populations.

13.
Mar Pollut Bull ; 198: 115857, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38039580

ABSTRACT

Sundarbans, a Ramsar site of India, is contaminated with heterogeneous microplastic wastes. Boddart's goggle eye mudskipper and Rubicundus eelgoby are common gobies of the Sundarbans estuary that accumulate microplastics during their normal biological activities. We estimated the abundance of microplastics in water and sediment, and in the skin, gills, bucco-opercular cavity and gastrointestinal tract of these two goby fishes. Microplastic loads estimated in the gobies were 0.84 and 2.62 particles per fish species, with a dominance of transparent, fibrous microplastics 100-300 µm in length. ATR-FTIR and Raman spectroscopy revealed polyethylene as the prevalent polymer. Surface degradation and adsorption of contaminants on the microplastic surface were investigated by SEM-EDX analysis. The presence of hazardous polymers produced a high polymer hazard index and potential ecological risk index, which indicated an acute environmental threat to the Sundarbans estuary and its resident organisms. The current study provides a new information base on the abundance of microplastics and their ecological hazard in this biosphere reserve.


Subject(s)
Microplastics , Water Pollutants, Chemical , Animals , Plastics , Environmental Monitoring , Water Pollutants, Chemical/analysis , Ecosystem , Fishes , Polymers
14.
Nat Rev Genet ; 25(1): 8-25, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37620596

ABSTRACT

Polygenic risk scores (PRSs) summarize the genetic predisposition of a complex human trait or disease and may become a valuable tool for advancing precision medicine. However, PRSs that are developed in populations of predominantly European genetic ancestries can increase health disparities due to poor predictive performance in individuals of diverse and complex genetic ancestries. We describe genetic and modifiable risk factors that limit the transferability of PRSs across populations and review the strengths and weaknesses of existing PRS construction methods for diverse ancestries. Developing PRSs that benefit global populations in research and clinical settings provides an opportunity for innovation and is essential for health equity.
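
As a concrete illustration of how a PRS summarizes genetic predisposition, the standard additive form is a weighted sum of risk-allele dosages. The Python sketch below assumes per-variant weights (for example, GWAS effect sizes) are already available; the function name and inputs are hypothetical, and the construction methods reviewed in the article differ mainly in how those weights are derived.

```python
def polygenic_risk_score(dosages, weights):
    """Additive PRS for one individual.

    dosages: risk-allele counts per variant (0, 1, or 2, or imputed fractions)
    weights: per-variant effect-size weights, in the same variant order
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same variants")
    return sum(d * w for d, w in zip(dosages, weights))
```

Weights estimated in one ancestry group can transfer poorly to another because allele frequencies and linkage disequilibrium differ, which is the transferability problem the review discusses.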


Subject(s)
Genetic Predisposition to Disease , Humans , Risk Factors , Multifactorial Inheritance , Precision Medicine , Genome-Wide Association Study
15.
JAMA Netw Open ; 6(11): e2339254, 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37955902

ABSTRACT

Importance: Estimating absolute risk of lung cancer for never-smoking individuals is important to inform lung cancer screening programs. Objectives: To integrate data on environmental tobacco smoke (ETS), a known lung cancer risk factor, with a polygenic risk score (PRS) that captures overall genetic susceptibility, to estimate the absolute risk of lung adenocarcinoma (LUAD) among never-smokers in Taiwan. Design, Setting, and Participants: The analyses were conducted in never-smoking women in the Taiwan Genetic Epidemiology Study of Lung Adenocarcinoma, a case-control study. Participants were recruited between September 17, 2002, and March 30, 2011. Data analysis was performed from January 17 to July 15, 2022. Exposures: A PRS was derived using 25 genetic variants that achieved genome-wide significance (P < 5 × 10⁻⁸) in a recent genome-wide association study, and ETS was defined as never exposed, exposed at home or at work, and exposed at home and at work. Main Outcomes and Measures: The Individualized Coherent Absolute Risk Estimator software was used to estimate the lifetime absolute risk of LUAD in never-smoking women aged 40 years over a projected 40-year span among the controls by using the relative risk estimates for the PRS and ETS exposures, as well as age-specific lung cancer incidence rates for never-smokers in Taiwan. Likelihood ratio tests were conducted to assess an additive interaction between the PRS and ETS exposure. Results: Data were obtained on 1024 women with LUAD (mean [SD] age, 59.6 [11.4] years, 47.9% ever exposed to ETS at home, and 19.5% ever exposed to ETS at work) and 1024 controls (mean [SD] age, 58.9 [11.0] years, 37.0% ever exposed to ETS at home, and 14.3% ever exposed to ETS at work). The overall average lifetime 40-year absolute risk of LUAD estimated using PRS alone was 2.5% (range, 0.6%-10.3%) among women never exposed to ETS. When integrating both ETS and PRS data, the estimated absolute risk was 3.7% (range, 0.6%-14.5%) for women exposed to ETS at home or work and 5.3% (range, 1.2%-12.1%) for women exposed to ETS at home and work. A super-additive interaction between ETS and the PRS (P = 6.5 × 10⁻⁴ for interaction) was identified. Conclusions and Relevance: This study found differences in absolute risk of LUAD attributed to genetic susceptibility according to levels of ETS exposure in never-smoking women. Future studies are warranted to integrate these findings in expanded risk models for LUAD.


Subject(s)
Adenocarcinoma of Lung , Lung Neoplasms , Tobacco Smoke Pollution , Female , Humans , Middle Aged , Tobacco Smoke Pollution/adverse effects , Case-Control Studies , Early Detection of Cancer , Genetic Predisposition to Disease , Genome-Wide Association Study , Taiwan/epidemiology , Lung Neoplasms/etiology , Lung Neoplasms/genetics , Smoking , Risk Factors , Adenocarcinoma of Lung/epidemiology , Adenocarcinoma of Lung/genetics
16.
PeerJ ; 11: e15914, 2023.
Article in English | MEDLINE | ID: mdl-38025689

ABSTRACT

Background: Large carnivores play a crucial role in maintaining the balance of the ecosystem. Successful conservation initiatives have often led to a huge increase in predators which has often led to negative interactions with humans. Without the knowledge of the carrying capacity of the top predator, such decisions become challenging. Here, we have derived a new equation to estimate the carrying capacity of tigers based on the individual prey species density. Methods: We used tiger densities and respective prey densities of different protected areas. Relative prey abundance was used instead of absolute prey density as this could be a better surrogate of the prey preference. We used a regression approach to derive the species-wise equation. We have also scaled these coefficients accordingly to control the variation in the standard error (heteroscedasticity) of the tiger density. Furthermore, we have extended this regression equation for different species to different weight classes for more generalized application of the method. Results: The new equations performed considerably better compared to the earlier existing carrying capacity equations. Incorporating the species-wise approach in the equation also reflected the preference of the prey species for the tiger. This is the first carrying capacity equation where the individual prey densities are used to estimate the carnivore population density. The coefficient estimates of the model with the comparison with prey-predator power laws also reflect the differential effect of tigers on different prey species. The carrying capacity estimates will aid in a better understanding of the predator-prey interaction and will advance better management of the top predator.


Subject(s)
Carnivora , Tigers , Animals , Humans , Ecosystem , Conservation of Natural Resources , Population Density
17.
ArXiv ; 2023 Oct 13.
Article in English | MEDLINE | ID: mdl-37873020

ABSTRACT

Objective: Absolute risk models estimate an individual's future disease risk over a specified time interval. Applications utilizing server-side risk tooling, such as the R-based iCARE (R-iCARE), to build, validate, and apply absolute risk models face serious limitations in portability and privacy because they must send user data to remote servers to operate. Our objective was to overcome these limitations. Materials and Methods: We refactored R-iCARE into a Python package (Py-iCARE), then compiled it to WebAssembly (Wasm-iCARE): a portable web module that operates entirely within the privacy of the user's device. Results: We showcase the portability and privacy of Wasm-iCARE through two applications: for researchers to statistically validate risk models, and to deliver them to end-users. Both applications run entirely on the client side, requiring no downloads or installations, and keep user data on-device during risk calculation. Conclusions: Wasm-iCARE fosters accessible and privacy-preserving risk tools, accelerating their validation and delivery.

18.
Cancer Epidemiol Biomarkers Prev ; 32(11): 1477-1478, 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37698541

ABSTRACT

A recent study published in the journal claimed that genetic susceptibility to breast cancer occurs mainly due to rare inherited variants. The claim relies on a set of deductive arguments following observations on patterns of age-at-onset distribution of the disease among twin pairs. In this brief commentary, we point out a major gap in the given argument due to the interchangeable use of hazard rates and age-at-onset distribution, and thus conclude that the published study does not provide any evidence against polygenic risk of breast cancer due to common variants. See related article by Yasui et al., p. 1518.


Subject(s)
Breast Neoplasms , Genetic Predisposition to Disease , Humans , Female , Age of Onset , Twins , Breast Neoplasms/epidemiology , Breast Neoplasms/genetics , Multifactorial Inheritance
19.
Nat Genet ; 55(10): 1757-1768, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37749244

ABSTRACT

Polygenic risk scores (PRSs) increasingly predict complex traits; however, suboptimal performance in non-European populations raises concerns about clinical applications and health inequities. We developed CT-SLEB, a powerful and scalable method to calculate PRSs, using ancestry-specific genome-wide association study summary statistics from multiancestry training samples, integrating clumping and thresholding, empirical Bayes and superlearning. We evaluated CT-SLEB and nine alternative methods with large-scale simulated genome-wide association studies (~19 million common variants) and datasets from 23andMe, Inc., the Global Lipids Genetics Consortium, All of Us and UK Biobank, involving 5.1 million individuals of diverse ancestry, with 1.18 million individuals from four non-European populations across 13 complex traits. Results demonstrated that CT-SLEB significantly improves PRS performance in non-European populations compared with simple alternatives, with comparable or superior performance to a recent, computationally intensive method. Moreover, our simulation studies offered insights into sample size requirements and SNP density effects on multiancestry risk prediction.


Subject(s)
Multifactorial Inheritance , Population Health , Humans , Multifactorial Inheritance/genetics , Genome-Wide Association Study , Bayes Theorem , Polymorphism, Single Nucleotide/genetics , Risk Factors , Genetic Predisposition to Disease
20.
medRxiv ; 2023 Jun 16.
Article in English | MEDLINE | ID: mdl-37398180

ABSTRACT

Glycated hemoglobin, fasting glucose, glycated albumin, and fructosamine are biomarkers that reflect different aspects of the glycemic process. Genetic studies of these glycemic biomarkers can shed light on unknown aspects of type 2 diabetes genetics and biology. While several GWAS of glycated hemoglobin and fasting glucose exist, very few GWAS have focused on glycated albumin or fructosamine. We performed a multi-phenotype GWAS of glycated albumin and fructosamine from 7,395 White and 2,016 Black participants in the Atherosclerosis Risk in Communities (ARIC) study on the common variants from genotyped/imputed data. We found 2 genome-wide significant loci, one mapping to a known type 2 diabetes gene (ARAP1/STARD10, p = 2.8 × 10⁻⁸) and another mapping to a novel gene (UGT1A, p = 1.4 × 10⁻⁸), using multi-omics gene mapping strategies in diabetes-relevant tissues. We identified additional loci that were ancestry-specific (e.g., PRKCA from African ancestry individuals, p = 1.7 × 10⁻⁸) and sex-specific (TEX29 locus in males only, p = 3.0 × 10⁻⁸). Further, we implemented multi-phenotype gene-burden tests on whole-exome sequence data from 6,590 White and 2,309 Black ARIC participants. Eleven genes across different rare variant aggregation strategies were exome-wide significant only in multi-ancestry analysis. Four out of 11 genes had notable enrichment of rare predicted loss-of-function variants in African ancestry participants despite smaller sample size. Overall, 8 out of 15 loci/genes were implicated in influencing these biomarkers via glycemic pathways. This study illustrates improved locus discovery and potential effector gene discovery by leveraging joint patterns of related biomarkers across the entire allele frequency spectrum in multi-ancestry analyses. Most of the loci/genes we identified have not been previously implicated in studies of type 2 diabetes, and future investigation of the loci/genes potentially acting through glycemic pathways may help us better understand risk of developing type 2 diabetes.
