Search | BVS España Search Portal

Evaluation of polygenic scoring methods in five biobanks shows larger variation between biobanks than methods and finds benefits of ensemble learning.

Monti, Remo; Eick, Lisa; Hudjashov, Georgi; Läll, Kristi; Kanoni, Stavroula; Wolford, Brooke N; Wingfield, Benjamin; Pain, Oliver; Wharrie, Sophie; Jermy, Bradley; McMahon, Aoife; Hartonen, Tuomo; Heyne, Henrike; Mars, Nina; Lambert, Samuel; Hveem, Kristian; Inouye, Michael; van Heel, David A; Mägi, Reedik; Marttinen, Pekka; Ripatti, Samuli; Ganna, Andrea; Lippert, Christoph.

Am J Hum Genet ; 111(7): 1431-1447, 2024 07 11.

Article in English | MEDLINE | ID: mdl-38908374

ABSTRACT

Methods of estimating polygenic scores (PGSs) from genome-wide association studies are increasingly utilized. However, independent method evaluation is lacking, and method comparisons are often limited. Here, we evaluate polygenic scores derived via seven methods in five biobank studies (totaling about 1.2 million participants) across 16 diseases and quantitative traits, building on a reference-standardized framework. We conducted meta-analyses to quantify the effects of method choice, hyperparameter tuning, method ensembling, and the target biobank on PGS performance. We found that no single method consistently outperformed all others. PGS effect sizes were more variable between biobanks than between methods within biobanks when methods were well tuned. Differences between methods were largest for the two investigated autoimmune diseases, seropositive rheumatoid arthritis and type 1 diabetes. For most methods, cross-validation was more reliable for tuning hyperparameters than automatic tuning (without the use of target data). For a given target phenotype, elastic net models combining PGS across methods (ensemble PGS) tuned in the UK Biobank provided consistent, high, and cross-biobank transferable performance, increasing PGS effect sizes (β coefficients) by a median of 5.0% relative to LDpred2 and MegaPRS (the two best-performing single methods when tuned with cross-validation). Our interactively browsable online results and open-source workflow prspipe provide a rich resource and reference for the analysis of polygenic scoring methods across biobanks.
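
The ensemble idea in the abstract above (an elastic net that combines the scores from several PGS methods into one score) can be illustrated with a minimal sketch. This is not the prspipe implementation: the data are simulated, the function names `soft_threshold` and `elastic_net` are hypothetical, the penalty settings are arbitrary, and a plain coordinate-descent solver stands in for whatever solver and tuning procedure the authors used.

```python
import random

def soft_threshold(value, threshold):
    """Soft-thresholding operator that handles the L1 part of the penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def elastic_net(X, y, alpha=0.001, l1_ratio=0.5, n_iter=200):
    """Fit weights by cyclic coordinate descent (no intercept).

    Minimises (1/2n)*||y - Xw||^2 + alpha*l1_ratio*||w||_1
              + 0.5*alpha*(1 - l1_ratio)*||w||^2.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # residual = y - Xw; w starts at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            # Covariance of column j with the partial residual
            # (column j's own contribution is added back in).
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            z = sum(c * c for c in col) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            delta = new_w - w[j]
            residual = [r - c * delta for c, r in zip(col, residual)]
            w[j] = new_w
    return w

# Assumption: simulated data, three per-method scores for 200 individuals.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
# Assumption: the phenotype depends on the first two scores only.
y = [0.5 * a + 0.3 * b for a, b, _ in X]

weights = elastic_net(X, y)
# The ensemble score is the weighted sum of the per-method scores.
ensemble_score = [sum(wj * xj for wj, xj in zip(weights, row)) for row in X]
```

In practice the scores from different PGS methods are strongly correlated, which is one reason a penalized model such as the elastic net is a natural choice for combining them; the independent columns used here only keep the sketch short.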

Subject(s)

Biological Specimen Banks , Genome-Wide Association Study , Multifactorial Inheritance , Humans , Multifactorial Inheritance/genetics , Phenotype , Type 1 Diabetes Mellitus/genetics , Single Nucleotide Polymorphism , Machine Learning

HAPNEST: efficient, large-scale generation and evaluation of synthetic datasets for genotypes and phenotypes.

Wharrie, Sophie; Yang, Zhiyu; Raj, Vishnu; Monti, Remo; Gupta, Rahul; Wang, Ying; Martin, Alicia; O'Connor, Luke J; Kaski, Samuel; Marttinen, Pekka; Palamara, Pier Francesco; Lippert, Christoph; Ganna, Andrea.

Bioinformatics ; 39(9), 2023 09 02.

Article in English | MEDLINE | ID: mdl-37647640

ABSTRACT

MOTIVATION: Existing methods for simulating synthetic genotype and phenotype datasets have limited scalability, constraining their usability for large-scale analyses. Moreover, a systematic approach for evaluating synthetic data quality and a benchmark synthetic dataset for developing and evaluating methods for polygenic risk scores are lacking. RESULTS: We present HAPNEST, a novel approach for efficiently generating diverse individual-level genotypic and phenotypic data. In comparison to alternative methods, HAPNEST shows faster computational speed and a lower degree of relatedness with reference panels, while generating datasets that preserve key statistical properties of real data. These desirable synthetic data properties enabled us to generate 6.8 million common variants and nine phenotypes with varying degrees of heritability and polygenicity across 1 million individuals. We demonstrate how HAPNEST can facilitate biobank-scale analyses through the comparison of seven methods to generate polygenic risk scoring across multiple ancestry groups and different genetic architectures. AVAILABILITY AND IMPLEMENTATION: A synthetic dataset of 1 008 000 individuals and nine traits for 6.8 million common variants is available at https://www.ebi.ac.uk/biostudies/studies/S-BSST936. The HAPNEST software for generating synthetic datasets is available as Docker/Singularity containers and open source Julia and C code at https://github.com/intervene-EU-H2020/synthetic_data.

Subject(s)

Benchmarking , Data Accuracy , Humans , Genotype , Phenotype , Multifactorial Inheritance

Micro-, meso-, macroscales: The effect of triangles on communities in networks.

Wharrie, Sophie; Azizi, Lamiae; Altmann, Eduardo G.

Phys Rev E ; 100(2-1): 022315, 2019 Aug.

Article in English | MEDLINE | ID: mdl-31574618

ABSTRACT

Mesoscale structures (communities) are used to understand the macroscale properties of complex networks, such as their functionality and formation mechanisms. Microscale structures are known to exist in most complex networks (e.g., large number of triangles or motifs), but they are absent in the simple random-graph models considered (e.g., as null models) in community-detection algorithms. In this paper we investigate the effect of microstructures on the appearance of communities in networks. We find that the presence of triangles alone leads to the appearance of communities even in methods designed to avoid the detection of communities in random networks. This shows that communities can emerge spontaneously from simple processes of motif generation happening at a microlevel. Our results are based on four widely used community-detection approaches (stochastic block model, spectral method, modularity maximization, and the Infomap algorithm) and three different generative network models (triadic closure, generalized configuration model, and random graphs with triangles).
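
The microscale quantity discussed in the abstract above, the number of triangles in a network, can be computed as in the following minimal sketch. The function name `triangle_stats` and the toy edge list are assumptions made for illustration and do not come from the paper; the sketch also reports transitivity (three times the number of triangles divided by the number of connected triples), a standard summary of how triangle-rich a graph is.

```python
from itertools import combinations

def triangle_stats(edges):
    """Return (number of triangles, transitivity) for an undirected simple graph."""
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    closed_pairs = 0  # neighbour pairs that are themselves connected
    triples = 0       # all neighbour pairs (connected triples centred on a node)
    for node, neighbours in adjacency.items():
        for a, b in combinations(neighbours, 2):
            triples += 1
            if b in adjacency[a]:
                closed_pairs += 1

    triangles = closed_pairs // 3  # each triangle is seen once from each corner
    transitivity = closed_pairs / triples if triples else 0.0
    return triangles, transitivity

# Toy example (hypothetical): one triangle 0-1-2 plus a pendant node 3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
n_triangles, transitivity = triangle_stats(edges)
```

For this toy graph there is one triangle and five connected triples, so the transitivity is 3/5.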
