Results 1 - 8 of 8
1.
Nat Commun ; 15(1): 3238, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622117

ABSTRACT

Great efforts are being made to develop advanced polygenic risk scores (PRS) to improve the prediction of complex traits and diseases. However, most existing PRS are primarily trained on European ancestry populations, limiting their transferability to non-European populations. In this article, we propose a novel method for generating multi-ancestry Polygenic Risk scOres based on enSemble of PEnalized Regression models (PROSPER). PROSPER integrates genome-wide association studies (GWAS) summary statistics from diverse populations to develop ancestry-specific PRS with improved predictive power for minority populations. The method uses a combination of L1 (lasso) and L2 (ridge) penalty functions, a parsimonious specification of the penalty parameters across populations, and an ensemble step to combine PRS generated across different penalty parameters. We evaluate the performance of PROSPER and other existing methods on large-scale simulated and real datasets, including those from 23andMe Inc., the Global Lipids Genetics Consortium, and All of Us. Results show that PROSPER can substantially improve multi-ancestry polygenic prediction compared to alternative methods across a wide variety of genetic architectures. In real data analyses, for example, PROSPER increased out-of-sample prediction R² for continuous traits by an average of 70% compared to a state-of-the-art Bayesian method (PRS-CSx) in the African ancestry population. Further, PROSPER is computationally highly scalable for the analysis of large SNP contents and many diverse populations.
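The abstract does not state the PROSPER objective, so the following is only a generic sketch of how an L1 (lasso) and an L2 (ridge) penalty act together on a single regression coefficient. The function names and the one-dimensional setup are illustrative assumptions, not part of PROSPER.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def penalized_update(z: float, lam1: float, lam2: float) -> float:
    """Minimizer over b of 0.5*(b - z)**2 + lam1*abs(b) + 0.5*lam2*b**2.

    Hypothetical helper for illustration only.
    The L1 term (lam1) can set the coefficient exactly to zero;
    the L2 term (lam2) shrinks whatever survives the threshold.
    """
    return soft_threshold(z, lam1) / (1.0 + lam2)


if __name__ == "__main__":
    # Unpenalized estimates z and their penalized counterparts.
    for z in (3.0, 0.5, -3.0):
        print(z, penalized_update(z, lam1=1.0, lam2=1.0))
```

The L1 part yields sparse scores (many SNP weights exactly zero), while the L2 part stabilizes the remaining weights. How the penalties are shared across populations is specific to the paper and is not shown here.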


Subject(s)
Genome-Wide Association Study , Population Health , Humans , Bayes Theorem , Multifactorial Inheritance/genetics , Black People/genetics , Genetic Risk Score , Risk Factors
2.
Cell Genom ; 4(4): 100539, 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38604127

ABSTRACT

Polygenic risk scores (PRSs) are now showing promising predictive performance on a wide variety of complex traits and diseases, but there exists a substantial performance gap across populations. We propose MUSSEL, a method for ancestry-specific polygenic prediction that borrows information in summary statistics from genome-wide association studies (GWASs) across multiple ancestry groups via Bayesian hierarchical modeling and ensemble learning. In our simulation studies and data analyses across four distinct studies, totaling 5.7 million participants with a substantial ancestral diversity, MUSSEL shows promising performance compared to alternatives. For example, MUSSEL has an average gain in prediction R² across 11 continuous traits of 40.2% and 49.3% compared to PRS-CSx and CT-SLEB, respectively, in the African ancestry population. The best-performing method, however, varies by GWAS sample size, target ancestry, trait architecture, and linkage disequilibrium reference samples; thus, ultimately a combination of methods may be needed to generate the most robust PRSs across diverse populations.


Subject(s)
Bivalvia , Multifactorial Inheritance , Humans , Animals , Multifactorial Inheritance/genetics , Genome-Wide Association Study/methods , Bayes Theorem , Phenotype , Genetic Risk Score
3.
bioRxiv ; 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-36993331

ABSTRACT

Great efforts are being made to develop advanced polygenic risk scores (PRS) to improve the prediction of complex traits and diseases. However, most existing PRS are primarily trained on European ancestry populations, limiting their transferability to non-European populations. In this article, we propose a novel method for generating multi-ancestry Polygenic Risk scOres based on enSemble of PEnalized Regression models (PROSPER). PROSPER integrates genome-wide association studies (GWAS) summary statistics from diverse populations to develop ancestry-specific PRS with improved predictive power for minority populations. The method uses a combination of ℒ1 (lasso) and ℒ2 (ridge) penalty functions, a parsimonious specification of the penalty parameters across populations, and an ensemble step to combine PRS generated across different penalty parameters. We evaluate the performance of PROSPER and other existing methods on large-scale simulated and real datasets, including those from 23andMe Inc., the Global Lipids Genetics Consortium, and All of Us. Results show that PROSPER can substantially improve multi-ancestry polygenic prediction compared to alternative methods across a wide variety of genetic architectures. In real data analyses, for example, PROSPER increased out-of-sample prediction R² for continuous traits by an average of 70% compared to a state-of-the-art Bayesian method (PRS-CSx) in the African ancestry population. Further, PROSPER is computationally highly scalable for the analysis of large SNP contents and many diverse populations.

4.
Nat Genet ; 55(10): 1757-1768, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37749244

ABSTRACT

Polygenic risk scores (PRSs) increasingly predict complex traits; however, suboptimal performance in non-European populations raises concerns about clinical applications and health inequities. We developed CT-SLEB, a powerful and scalable method to calculate PRSs, using ancestry-specific genome-wide association study summary statistics from multiancestry training samples, integrating clumping and thresholding, empirical Bayes and superlearning. We evaluated CT-SLEB and nine alternative methods with large-scale simulated genome-wide association studies (~19 million common variants) and datasets from 23andMe, Inc., the Global Lipids Genetics Consortium, All of Us and UK Biobank, involving 5.1 million individuals of diverse ancestry, with 1.18 million individuals from four non-European populations across 13 complex traits. Results demonstrated that CT-SLEB significantly improves PRS performance in non-European populations compared with simple alternatives, with comparable or superior performance to a recent, computationally intensive method. Moreover, our simulation studies offered insights into sample size requirements and SNP density effects on multiancestry risk prediction.
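As a rough illustration of the thresholding half of "clumping and thresholding", a PRS can be computed as a weighted sum of allele counts over the SNPs whose GWAS p-value passes a cutoff. This sketch assumes per-SNP effect sizes and p-values are already available. It omits clumping (which needs linkage disequilibrium information), the empirical Bayes step, and the superlearning step, and its function name is hypothetical rather than taken from CT-SLEB.

```python
from typing import Sequence


def thresholded_prs(
    allele_counts: Sequence[float],
    effect_sizes: Sequence[float],
    p_values: Sequence[float],
    p_cutoff: float,
) -> float:
    """Sum effect_size * allele_count over SNPs with p-value below p_cutoff.

    Hypothetical helper: one individual's score at a single threshold.
    """
    if not (len(allele_counts) == len(effect_sizes) == len(p_values)):
        raise ValueError("all per-SNP inputs must have the same length")
    return sum(
        g * b
        for g, b, p in zip(allele_counts, effect_sizes, p_values)
        if p < p_cutoff
    )


if __name__ == "__main__":
    # Illustrative (made-up) values for three SNPs in one individual.
    counts = [0, 1, 2]
    betas = [0.1, 0.2, -0.3]
    pvals = [1e-8, 0.5, 1e-3]
    print(thresholded_prs(counts, betas, pvals, p_cutoff=0.01))
```

In practice the cutoff is a tuning parameter, and scores computed at several cutoffs can be compared or combined on a validation set.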


Subject(s)
Multifactorial Inheritance , Population Health , Humans , Multifactorial Inheritance/genetics , Genome-Wide Association Study , Bayes Theorem , Polymorphism, Single Nucleotide/genetics , Risk Factors , Genetic Predisposition to Disease
5.
bioRxiv ; 2023 Sep 21.
Article in English | MEDLINE | ID: mdl-37090648

ABSTRACT

Polygenic risk scores (PRS) are now showing promising predictive performance on a wide variety of complex traits and diseases, but there exists a substantial performance gap across different populations. We propose MUSSEL, a method for ancestry-specific polygenic prediction that borrows information in the summary statistics from genome-wide association studies (GWAS) across multiple ancestry groups. MUSSEL conducts Bayesian hierarchical modeling under a MUltivariate Spike-and-Slab model for effect-size distribution and incorporates an Ensemble Learning step using super learner to combine information across different tuning parameter settings and ancestry groups. In our simulation studies and data analyses of 16 traits across four distinct studies, totaling 5.7 million participants with a substantial ancestral diversity, MUSSEL shows promising performance compared to alternatives. The method, for example, has an average gain in prediction R² across 11 continuous traits of 40.2% and 49.3% compared to PRS-CSx and CT-SLEB, respectively, in the African ancestry population. The best-performing method, however, varies by GWAS sample size, target ancestry, underlying trait architecture, and the choice of reference samples for LD estimation; thus, ultimately, a combination of methods may be needed to generate the most robust PRS across diverse populations.

6.
IEEE/ACM Trans Comput Biol Bioinform ; 19(3): 1782-1793, 2022.
Article in English | MEDLINE | ID: mdl-33237867

ABSTRACT

Finding existing but undiscovered genome sequence mutations, or predicting potential ones, from real sequence data remains challenging. Motivated by this, we develop approaches to detect new, undiscovered genome sequences. Because discovering new genome sequences through biological experiments is resource-intensive, we aim to perform the detection task mathematically. However, little literature addresses how to detect new, undiscovered genome sequence mutations mathematically. We build a new framework based on the natural vector convex hull method, which conducts alignment-free sequence analysis. Our two newly developed approaches, Random-permutation Algorithm with Penalty (RAP) and Random-permutation Algorithm with Penalty and COnstrained Search (RAPCOS), use the geometric properties captured by natural vectors. In our experiment, we discover a mathematically new human immunodeficiency virus (HIV) genome sequence using some real HIV genome sequences. Significantly, the proposed methods are applicable to the new genome sequence detection challenge and have many good properties, such as robustness, rapid convergence, and fast computation.
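The abstract does not define the natural vector, so the sketch below assumes the 12-dimensional form commonly used in alignment-free sequence analysis: for each nucleotide, its count, its mean position, and its normalized second central moment of position. The 1-based position convention and the function name are assumptions. The convex hull step and the RAP/RAPCOS searches are not reproduced.

```python
from typing import List


def natural_vector(seq: str, alphabet: str = "ACGT") -> List[float]:
    """Return [counts..., mean positions..., second moments...] for seq.

    Assumed definition (not given in the abstract): for letter k with
    n_k occurrences at 1-based positions s_i in a sequence of length n,
      mu_k = sum(s_i) / n_k
      D2_k = sum((s_i - mu_k)**2) / (n_k * n)
    Letters that never occur contribute zeros.
    """
    n = len(seq)
    counts: List[float] = []
    means: List[float] = []
    moments: List[float] = []
    for letter in alphabet:
        positions = [i for i, ch in enumerate(seq, start=1) if ch == letter]
        n_k = len(positions)
        if n_k == 0:
            counts.append(0.0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu_k = sum(positions) / n_k
        d2_k = sum((p - mu_k) ** 2 for p in positions) / (n_k * n)
        counts.append(float(n_k))
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments


if __name__ == "__main__":
    print(natural_vector("ACGTA"))
```

Each sequence becomes a fixed-length point in Euclidean space regardless of its length, which is what makes geometric constructions such as convex hulls applicable without alignment.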


Subject(s)
Algorithms , Genome , Genome/genetics , Humans
7.
Entropy (Basel) ; 22(3), 2020 Mar 02.
Article in English | MEDLINE | ID: mdl-33286064

ABSTRACT

Traditional hypothesis-margin research focuses on obtaining large margins and on feature selection. In this work, we show that the robustness of margins is also critical and can be measured using entropy. In addition, our approach provides clear mathematical formulations and explanations to uncover feature interactions, which are often lacking in large hypothesis-margin based approaches. We design an algorithm, termed IMMIGRATE (Iterative max-min entropy margin-maximization with interaction terms), for training the weights associated with the interaction terms. IMMIGRATE simultaneously utilizes both local and global information and can be used as a base learner in Boosting. We evaluate IMMIGRATE on a wide range of tasks, in which it demonstrates exceptional robustness and achieves state-of-the-art results with high interpretability.
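For orientation, the sketch below computes the classical hypothesis margin of one sample as used in Relief-style methods: half the difference between the distance to its nearest sample of another class (nearest miss) and the distance to its nearest sample of the same class (nearest hit). This is an assumed baseline definition with a hypothetical function name. It uses plain Euclidean distance and does not implement IMMIGRATE's interaction weights or its entropy-based treatment of margins.

```python
import math
from typing import Sequence


def hypothesis_margin(
    X: Sequence[Sequence[float]], y: Sequence[int], i: int
) -> float:
    """0.5 * (distance to nearest miss - distance to nearest hit) for sample i."""
    hits = [
        math.dist(X[i], X[j])
        for j in range(len(X))
        if j != i and y[j] == y[i]
    ]
    misses = [
        math.dist(X[i], X[j])
        for j in range(len(X))
        if y[j] != y[i]
    ]
    if not hits or not misses:
        raise ValueError("sample needs at least one hit and one miss")
    return 0.5 * (min(misses) - min(hits))


if __name__ == "__main__":
    # Illustrative (made-up) points: two of class 0, one of class 1.
    X = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]
    y = [0, 0, 1]
    print(hypothesis_margin(X, y, 0))
```

A positive margin means the sample sits closer to its own class than to any other. Because only the single nearest hit and miss enter the formula, the value is sensitive to individual points, which is one reason to care about the robustness of margins.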

8.
Environ Pollut ; 243(Pt B): 1710-1718, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30408858

ABSTRACT

The estimation of PM2.5-related mortality is becoming increasingly important. The accuracy of results depends largely on the methods selected for PM2.5 exposure assessment and on the Concentration-Response (C-R) function. In this study, PM2.5 observations from the China National Environmental Monitoring Center, satellite-derived estimates, and widely collected geographic and socioeconomic variables were used to develop a national satellite-based Land Use Regression model and to evaluate PM2.5 exposure concentrations for 2013-2015 at a resolution of 1 km × 1 km. The population-weighted concentration declined from 72.52 µg/m³ in 2013 to 57.18 µg/m³ in 2015. The C-R function is another important component of health effect assessment, but most previous studies used the Integrated Exposure Regression (IER) function, which may currently underestimate the excess relative risk of exceeding the exposure range in China. A new Shape Constrained Health Impact Function (SCHIF) method, which was developed from a national cohort of 189,793 Chinese men, was adopted to estimate PM2.5-related premature deaths in China. Results showed that 2.19 million (2013), 1.94 million (2014), and 1.65 million (2015) premature deaths were attributable to long-term PM2.5 exposure, in contrast to previous estimates of around 1.1-1.7 million. The three provinces with the most premature deaths were Henan, Shandong, and Sichuan, while those with the fewest were Tibet, Hainan, and Qinghai. The proportions of premature deaths caused by specific diseases were 53.2% for stroke, 20.5% for ischemic heart disease, 16.8% for chronic obstructive pulmonary disease, and 9.5% for lung cancer. The IER function was also used to calculate PM2.5-related premature deaths at the same exposure levels used in the SCHIF method, and the comparison indicated that IER gave much lower estimates, by around 0.15-0.5 million premature deaths per year within 2013-2015.
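A population-weighted concentration is conventionally the sum of each grid cell's concentration times its population, divided by the total population. The sketch below assumes that convention (the abstract does not spell it out) and also reproduces the relative decline implied by the two reported values, 72.52 and 57.18 µg/m³. The function names and the two-cell example are illustrative.

```python
from typing import Sequence


def population_weighted_mean(
    concentrations: Sequence[float], populations: Sequence[float]
) -> float:
    """Sum(conc * pop) / sum(pop) over grid cells."""
    if len(concentrations) != len(populations):
        raise ValueError("inputs must have the same length")
    total_pop = sum(populations)
    if total_pop <= 0:
        raise ValueError("total population must be positive")
    weighted = sum(c * p for c, p in zip(concentrations, populations))
    return weighted / total_pop


def percent_decline(start: float, end: float) -> float:
    """Relative decline from start to end, in percent."""
    return (start - end) / start * 100.0


if __name__ == "__main__":
    # Made-up two-cell example: the more populous cell dominates the mean.
    print(population_weighted_mean([50.0, 100.0], [3.0, 1.0]))
    # Values reported in the abstract for 2013 and 2015 (µg/m³).
    print(round(percent_decline(72.52, 57.18), 2))
```

With the reported values, the decline from 2013 to 2015 works out to roughly 21%.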


Subject(s)
Air Pollutants/analysis , Environmental Exposure/adverse effects , Lung Neoplasms/mortality , Myocardial Ischemia/mortality , Particulate Matter/analysis , Pulmonary Disease, Chronic Obstructive/mortality , Stroke/mortality , Air Pollutants/toxicity , Air Pollution/analysis , China/epidemiology , Cohort Studies , Environmental Monitoring , Humans , Lung Neoplasms/chemically induced , Male , Mortality, Premature , Myocardial Ischemia/chemically induced , Particulate Matter/toxicity , Pulmonary Disease, Chronic Obstructive/chemically induced , Stroke/chemically induced , Tibet