1.
Nature ; 618(7966): 774-781, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37198491

ABSTRACT

Polygenic scores (PGSs) have limited portability across different groupings of individuals (for example, by genetic ancestries and/or social determinants of health), preventing their equitable use [1-3]. PGS portability has typically been assessed using a single aggregate population-level statistic (for example, R²) [4], ignoring inter-individual variation within the population. Here, using a large and diverse Los Angeles biobank [5] (ATLAS, n = 36,778) along with the UK Biobank [6] (UKBB, n = 487,409), we show that PGS accuracy decreases individual-to-individual along the continuum of genetic ancestries [7] in all considered populations, even within traditionally labelled 'homogeneous' genetic ancestries. The decreasing trend is well captured by a continuous measure of genetic distance (GD) from the PGS training data: Pearson correlation of -0.95 between GD and PGS accuracy averaged across 84 traits. When applying PGS models trained on individuals labelled as white British in the UKBB to individuals with European ancestries in ATLAS, individuals in the furthest GD decile have 14% lower accuracy relative to the closest decile; notably, the closest GD decile of individuals with Hispanic Latino American ancestries show similar PGS performance to the furthest GD decile of individuals with European ancestries. GD is significantly correlated with PGS estimates themselves for 82 of 84 traits, further emphasizing the importance of incorporating the continuum of genetic ancestries in PGS interpretation. Our results highlight the need to move away from discrete genetic ancestry clusters towards the continuum of genetic ancestries when considering PGSs.


Subject(s)
Multifactorial Inheritance; Racial Groups; Humans; Europe/ethnology; Hispanic or Latino/genetics; Multifactorial Inheritance/genetics; Racial Groups/genetics; United Kingdom; White People/genetics; European People/genetics; Los Angeles; Databases, Genetic
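
The decile comparison described in the abstract above can be illustrated with a small sketch. It assumes that GD is a Euclidean distance in principal-component space from the centre of the training data; that definition, the function names and the toy numbers are illustrative assumptions, not values or code from the paper.

```python
import math

def genetic_distance(pcs, training_center):
    """Euclidean distance between one individual's PC coordinates and the
    centre of the training data (assumed definition of GD)."""
    return math.sqrt(sum((x - c) ** 2 for x, c in zip(pcs, training_center)))

def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def decile_bins(values):
    """Assign each value a decile index from 0 (closest) to 9 (furthest) by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * 10 // len(values)
    return bins

# Hypothetical toy data: 20 individuals spread along the first PC,
# with an accuracy that falls linearly as GD grows.
training_center = (0.0, 0.0)
pcs = [(float(k), 0.0) for k in range(20)]
gd = [genetic_distance(p, training_center) for p in pcs]
accuracy = [1.0 - 0.01 * d for d in gd]

gd_accuracy_correlation = pearson(gd, accuracy)  # strongly negative in this toy case
gd_deciles = decile_bins(gd)                     # two individuals per decile
```

Comparing the mean accuracy of decile 0 with that of decile 9 would then mirror the closest-versus-furthest contrast reported in the abstract.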
2.
Am J Hum Genet ; 110(12): 2042-2055, 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-37944514

ABSTRACT

LDpred2 is a widely used Bayesian method for building polygenic scores (PGSs). LDpred2-auto can infer the two parameters from the LDpred model, the SNP heritability h² and polygenicity p, so that it does not require an additional validation dataset to choose best-performing parameters. The main aim of this paper is to properly validate the use of LDpred2-auto for inferring multiple genetic parameters. Here, we present a new version of LDpred2-auto that adds an optional third parameter α to its model, for modeling negative selection. We then validate the inference of these three parameters (or two, when using the previous model). We also show that LDpred2-auto provides per-variant probabilities of being causal that are well calibrated and can therefore be used for fine-mapping purposes. We also introduce a formula to infer the out-of-sample predictive performance r² of the resulting PGS directly from the Gibbs sampler of LDpred2-auto. Finally, we extend the set of HapMap3 variants recommended to use with LDpred2 with 37% more variants to improve the coverage of this set, and we show that this new set of variants captures 12% more heritability and provides 6% more predictive performance, on average, in UK Biobank analyses.


Subject(s)
Genome-Wide Association Study; Multifactorial Inheritance; Humans; Bayes Theorem; Genome-Wide Association Study/methods; Multifactorial Inheritance/genetics; Polymorphism, Single Nucleotide/genetics
3.
Am J Hum Genet ; 109(1): 12-23, 2022 01 06.
Article in English | MEDLINE | ID: mdl-34995502

ABSTRACT

The low portability of polygenic scores (PGSs) across global populations is a major concern that must be addressed before PGSs can be used for everyone in the clinic. Indeed, prediction accuracy has been shown to decay as a function of the genetic distance between the training and test cohorts. However, such cohorts differ not only in their genetic distance but also in their geographical distance and their data collection and assaying, conflating multiple factors. In this study, we examine the extent to which PGSs are transferable between ancestries by deriving polygenic scores for 245 curated traits from the UK Biobank data and applying them in nine ancestry groups from the same cohort. By restricting both training and testing to the UK Biobank data, we reduce the risk of environmental and genotyping confounding from using different cohorts. We define the nine ancestry groups at a sub-continental level, based on a simple, robust, and effective method that we introduce here. We then apply two different predictive methods to derive polygenic scores for all 245 phenotypes and show a systematic and dramatic reduction in portability of PGSs trained using Northwestern European individuals and applied to nine ancestry groups. These analyses demonstrate that prediction already drops off within European ancestries and reduces globally in proportion to genetic distance. Altogether, our study provides unique and robust insights into the PGS portability problem.


Subject(s)
Genetic Association Studies/methods; Genetic Predisposition to Disease; Genetics, Population/methods; Multifactorial Inheritance; Algorithms; Alleles; Biological Specimen Banks; Genetic Variation; Genome-Wide Association Study; Genotype; Humans; Models, Genetic; Phenotype; Reproducibility of Results; United Kingdom
4.
Am J Hum Genet ; 109(3): 417-432, 2022 03 03.
Article in English | MEDLINE | ID: mdl-35139346

ABSTRACT

Genome-wide association studies (GWASs) have revolutionized human genetics, allowing researchers to identify thousands of disease-related genes and possible drug targets. However, case-control status does not account for the fact that not all controls may have lived through their period of risk for the disorder of interest. This can be quantified by examining the age-of-onset distribution and the age of the controls or the age of onset for cases. The age-of-onset distribution may also depend on information such as sex and birth year. In addition, family history is not routinely included in the assessment of control status. Here, we present LT-FH++, an extension of the liability threshold model conditioned on family history (LT-FH), which jointly accounts for age of onset and sex as well as family history. Using simulations, we show that, when family history and the age-of-onset distribution are available, the proposed approach yields statistically significant power gains over LT-FH and large power gains over genome-wide association study by proxy (GWAX). We applied our method to four psychiatric disorders available in the iPSYCH data and to mortality in the UK Biobank and found 20 genome-wide significant associations with LT-FH++, compared to ten for LT-FH and eight for a standard case-control GWAS. As more genetic data with linked electronic health records become available to researchers, we expect methods that account for additional health information, such as LT-FH++, to become even more beneficial.


Subject(s)
Genetic Predisposition to Disease; Genome-Wide Association Study; Age of Onset; Case-Control Studies; Genome-Wide Association Study/methods; Humans; Medical History Taking
5.
Am J Hum Genet ; 108(6): 1001-1011, 2021 06 03.
Article in English | MEDLINE | ID: mdl-33964208

ABSTRACT

The accuracy of polygenic risk scores (PRSs) to predict complex diseases increases with the training sample size. PRSs are generally derived based on summary statistics from large meta-analyses of multiple genome-wide association studies (GWASs). However, it is now common for researchers to have access to large individual-level data as well, such as the UK Biobank data. To the best of our knowledge, it has not yet been explored how best to combine both types of data (summary statistics and individual-level data) to optimize polygenic prediction. The most widely used approach to combine data is the meta-analysis of GWAS summary statistics (meta-GWAS), but we show that it does not always provide the most accurate PRS. Through simulations and using 12 real case-control and quantitative traits from both iPSYCH and UK Biobank along with external GWAS summary statistics, we compare meta-GWAS with two alternative data-combining approaches, stacked clumping and thresholding (SCT) and meta-PRS. We find that, when large individual-level data are available, the linear combination of PRSs (meta-PRS) is both a simple alternative to meta-GWAS and often more accurate.


Subject(s)
Disease/genetics; Genetic Predisposition to Disease; Genome-Wide Association Study; Models, Statistical; Multifactorial Inheritance; Polymorphism, Single Nucleotide; Case-Control Studies; Humans; Phenotype
6.
Bioinformatics ; 38(13): 3477-3480, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35604078

ABSTRACT

MOTIVATION: Measuring genetic diversity is an important problem because increasing genetic diversity is a key to making new genetic discoveries, while also being a major source of confounding to be aware of in genetics studies. RESULTS: Using the UK Biobank data, a prospective cohort study with deep genetic and phenotypic data collected on almost 500 000 individuals from across the UK, we carefully define 21 distinct ancestry groups from all four corners of the world. These ancestry groups can serve as a global reference of worldwide populations, with a handful of applications. Here, we develop a method that uses allele frequencies and principal components derived from these ancestry groups to effectively measure ancestry proportions from allele frequencies of any genetic dataset. AVAILABILITY AND IMPLEMENTATION: This method is implemented in function snp_ancestry_summary of R package bigsnpr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Biological Specimen Banks; Genome-Wide Association Study; Humans; Genome-Wide Association Study/methods; Genotype; Prospective Studies; United Kingdom; Polymorphism, Single Nucleotide
7.
Am J Hum Genet ; 105(6): 1213-1221, 2019 12 05.
Article in English | MEDLINE | ID: mdl-31761295

ABSTRACT

Polygenic prediction has the potential to contribute to precision medicine. Clumping and thresholding (C+T) is a widely used method to derive polygenic scores. When using C+T, several p value thresholds are tested to maximize predictive ability of the derived polygenic scores. Along with this p value threshold, we propose to tune three other hyper-parameters for C+T. We implement an efficient way to derive thousands of different C+T scores corresponding to a grid over four hyper-parameters. For example, it takes a few hours to derive 123K different C+T scores for 300K individuals and 1M variants using 16 physical cores. We find that optimizing over these four hyper-parameters improves the predictive performance of C+T in both simulations and real data applications as compared to tuning only the p value threshold. A particularly large increase can be noted when predicting depression status, from an AUC of 0.557 (95% CI: [0.544-0.569]) when tuning only the p value threshold to an AUC of 0.592 (95% CI: [0.580-0.604]) when tuning all four hyper-parameters we propose for C+T. We further propose stacked clumping and thresholding (SCT), a polygenic score that results from stacking all derived C+T scores. Instead of choosing one set of hyper-parameters that maximizes prediction in some training set, SCT learns an optimal linear combination of all C+T scores by using an efficient penalized regression. We apply SCT to eight different case-control diseases in the UK biobank data and find that SCT substantially improves prediction accuracy with an average AUC increase of 0.035 over standard C+T.


Subject(s)
Algorithms; Disease/genetics; Genetic Predisposition to Disease; Genome-Wide Association Study; Multifactorial Inheritance/genetics; Polymorphism, Single Nucleotide; Biological Specimen Banks; Case-Control Studies; Computer Simulation; Humans; Models, Genetic; United Kingdom
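
As a rough illustration of clumping and thresholding over a grid of hyper-parameters, the sketch below derives one score per combination of two hyper-parameters: a clumping r² threshold and a p value threshold. The abstract does not name the other hyper-parameters, so they are left out; the variant records, the `toy_r2` lookup and all function names are hypothetical.

```python
def clump(variants, r2, r2_threshold):
    """Greedy clumping: visit variants from most to least significant and keep
    a variant only if it is not in LD (r2 >= threshold) with one already kept."""
    kept = []
    for v in sorted(variants, key=lambda v: v["p"]):
        if all(r2(v["id"], k["id"]) < r2_threshold for k in kept):
            kept.append(v)
    return kept

def ct_score(genotypes, variants, r2, r2_threshold, p_threshold):
    """Polygenic score from the clumped variants that pass the p value threshold."""
    selected = [v for v in clump(variants, r2, r2_threshold) if v["p"] <= p_threshold]
    return sum(v["beta"] * genotypes[v["id"]] for v in selected)

# Hypothetical summary statistics; rs1 and rs2 are in strong LD.
variants = [
    {"id": "rs1", "p": 1e-8, "beta": 0.5},
    {"id": "rs2", "p": 1e-6, "beta": 0.4},
    {"id": "rs3", "p": 0.01, "beta": -0.2},
    {"id": "rs4", "p": 0.5, "beta": 0.1},
]
ld = {frozenset(("rs1", "rs2")): 0.9}

def toy_r2(a, b):
    """Squared correlation between two variants (0 when not listed)."""
    return ld.get(frozenset((a, b)), 0.0)

# Allele counts for one hypothetical individual.
genotypes = {"rs1": 2, "rs2": 1, "rs3": 1, "rs4": 0}

# One C+T score per point of a small two-dimensional grid.
grid_scores = {
    (r, p): ct_score(genotypes, variants, toy_r2, r, p)
    for r in (0.2, 0.95)
    for p in (1e-7, 0.05)
}
```

A stacking step in the spirit of SCT would then fit a penalized regression on such a table of scores instead of picking a single grid point; that step is not shown here.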
8.
Bioinformatics ; 38(1): 255-256, 2021 12 22.
Article in English | MEDLINE | ID: mdl-34260708

ABSTRACT

MOTIVATION: A few algorithms have been developed for splitting the genome in nearly independent blocks of linkage disequilibrium. Due to the complexity of this problem, these algorithms rely on heuristics, which makes them suboptimal. RESULTS: Here, we develop an optimal solution for this problem using dynamic programming. AVAILABILITY: This is now implemented as function snp_ldsplit as part of R package bigsnpr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms; Linkage Disequilibrium; Software; Humans; Genome, Human; Computational Biology
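
To make the idea of an optimal split by dynamic programming concrete, here is a simplified sketch. It assumes the objective is to minimise the sum of squared correlations between variants placed in different blocks, under a maximum block size; the exact objective and constraints of the published method may differ, and this code does not reproduce the interface of snp_ldsplit. The cost function is recomputed naively, so it only suits toy inputs.

```python
def split_ld_blocks(r2, max_size):
    """Split variants 0..n-1 into consecutive blocks of at most max_size variants,
    minimising the summed squared correlation between variants in different blocks.
    Returns (minimal cost, list of exclusive block end indices)."""
    n = len(r2)

    def cost(s, e):
        # Squared correlation between block [s, e) and every variant after it,
        # so each cross-block pair is counted exactly once.
        return sum(r2[i][j] for i in range(s, e) for j in range(e, n))

    best = [float("inf")] * (n + 1)  # best[e]: minimal cost of splitting variants [0, e)
    prev = [0] * (n + 1)             # prev[e]: start of the last block in that solution
    best[0] = 0.0
    for e in range(1, n + 1):
        for s in range(max(0, e - max_size), e):
            candidate = best[s] + cost(s, e)
            if candidate < best[e]:
                best[e], prev[e] = candidate, s

    # Walk back through the stored block starts to recover the boundaries.
    ends = []
    e = n
    while e > 0:
        ends.append(e)
        e = prev[e]
    return best[n], ends[::-1]

# Hypothetical LD matrix: two independent groups of three correlated variants.
toy_r2_matrix = [
    [1.0 if i == j else (0.8 if i // 3 == j // 3 else 0.0) for j in range(6)]
    for i in range(6)
]
total_cost, block_ends = split_ld_blocks(toy_r2_matrix, max_size=3)
```

In this toy case the split falls between the two groups, so no correlated pair ends up in different blocks.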
9.
Bioinformatics ; 36(22-23): 5424-5431, 2021 Apr 01.
Article in English | MEDLINE | ID: mdl-33326037

ABSTRACT

MOTIVATION: Polygenic scores have become a central tool in human genetics research. LDpred is a popular method for deriving polygenic scores based on summary statistics and a matrix of correlation between genetic variants. However, LDpred has limitations that may reduce its predictive performance. RESULTS: Here, we present LDpred2, a new version of LDpred that addresses these issues. We also provide two new options in LDpred2: a 'sparse' option that can learn effects that are exactly 0, and an 'auto' option that directly learns the two LDpred parameters from data. We benchmark predictive performance of LDpred2 against the previous version on simulated and real data, demonstrating substantial improvements in robustness and predictive accuracy compared to LDpred1. We then show that LDpred2 also outperforms other polygenic score methods recently developed, with a mean AUC over the 8 real traits analyzed here of 65.1%, compared to 63.8% for lassosum, 62.9% for PRS-CS and 61.5% for SBayesR. Note that LDpred2 provides more accurate polygenic scores when run genome-wide, instead of per chromosome. AVAILABILITY AND IMPLEMENTATION: LDpred2 is implemented in R package bigsnpr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

10.
Mol Biol Evol ; 37(7): 2153-2154, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32343802

ABSTRACT

R package pcadapt is a user-friendly R package for performing genome scans for local adaptation. Here, we present version 4 of pcadapt which substantially improves computational efficiency while providing similar results. This improvement is made possible by using a different format for storing genotypes and a different algorithm for computing principal components of the genotype matrix, which is the most computationally demanding step in method pcadapt. These changes are seamlessly integrated into the existing pcadapt package, and users will experience a large reduction in computation time (by a factor of 20-60 in our analyses) as compared with previous versions.


Subject(s)
Adaptation, Biological; Genomics/methods; Software
11.
Bioinformatics ; 36(16): 4449-4457, 2020 08 15.
Article in English | MEDLINE | ID: mdl-32415959

ABSTRACT

MOTIVATION: Principal component analysis (PCA) of genetic data is routinely used to infer ancestry and control for population structure in various genetic analyses. However, conducting PCA analyses can be complicated and has several potential pitfalls. These pitfalls include (i) capturing linkage disequilibrium (LD) structure instead of population structure, (ii) projected PCs that suffer from shrinkage bias, (iii) detecting sample outliers and (iv) uneven population sizes. In this work, we explore these potential issues when using PCA, and present efficient solutions to these. Following applications to the UK Biobank and the 1000 Genomes project datasets, we make recommendations for best practices and provide efficient and user-friendly implementations of the proposed solutions in R packages bigsnpr and bigutilsr. RESULTS: For example, we find that PC19-PC40 in the UK Biobank capture complex LD structure rather than population structure. Using our automatic algorithm for removing long-range LD regions, we recover 16 PCs that capture population structure only. Therefore, we recommend using only 16-18 PCs from the UK Biobank to account for population structure confounding. We also show how to use PCA to restrict analyses to individuals of homogeneous ancestry. Finally, when projecting individual genotypes onto the PCA computed from the 1000 Genomes project data, we find a shrinkage bias that becomes large for PC5 and beyond. We then demonstrate how to obtain unbiased projections efficiently using bigsnpr. Overall, we believe this work would be of interest for anyone using PCA in their analyses of genetic data, as well as for other omics data. AVAILABILITY AND IMPLEMENTATION: R packages bigsnpr and bigutilsr can be installed from either CRAN or GitHub (see https://github.com/privefl/bigsnpr). A tutorial on the steps to perform PCA on 1000G data is available at https://privefl.github.io/bigsnpr/articles/bedpca.html. All code used for this paper is available at https://github.com/privefl/paper4-bedpca/tree/master/code. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genetics, Population; Software; Algorithms; Humans; Linkage Disequilibrium; Principal Component Analysis
12.
BMC Bioinformatics ; 21(1): 16, 2020 Jan 13.
Article in English | MEDLINE | ID: mdl-31931698

ABSTRACT

BACKGROUND: Cell-type heterogeneity of tumors is a key factor in tumor progression and response to chemotherapy. Tumor cell-type heterogeneity, defined as the proportion of the various cell-types in a tumor, can be inferred from DNA methylation of surgical specimens. However, confounding factors known to associate with methylation values, such as age and sex, complicate accurate inference of cell-type proportions. While reference-free algorithms have been developed to infer cell-type proportions from DNA methylation, a comparative evaluation of the performance of these methods is still lacking. RESULTS: Here we use simulations to evaluate several computational pipelines based on the software packages MeDeCom, EDec, and RefFreeEWAS. We identify that accounting for confounders, feature selection, and the choice of the number of estimated cell types are critical steps for inferring cell-type proportions. We find that removal of methylation probes which are correlated with confounder variables reduces the error of inference by 30-35%, and that selection of cell-type informative probes has similar effect. We show that Cattell's rule based on the scree plot is a powerful tool to determine the number of cell-types. Once the pre-processing steps are achieved, the three deconvolution methods provide comparable results. We observe that all the algorithms' performance improves when inter-sample variation of cell-type proportions is large or when the number of available samples is large. We find that under specific circumstances the methods are sensitive to the initialization method, suggesting that averaging different solutions or optimizing initialization is an avenue for future research. CONCLUSION: Based on the lessons learned, to facilitate pipeline validation and catalyze further pipeline improvement by the community, we develop a benchmark pipeline for inference of cell-type proportions and implement it in the R package medepir.


Subject(s)
Computational Biology/standards; DNA Methylation; Neoplasms/genetics; Algorithms; Computational Biology/methods; Computer Simulation; Humans; Software
14.
Bioinformatics ; 34(16): 2781-2787, 2018 08 15.
Article in English | MEDLINE | ID: mdl-29617937

ABSTRACT

Motivation: Genome-wide datasets produced for association studies have dramatically increased in size over the past few years, with modern datasets commonly including millions of variants measured in tens of thousands of individuals. This increase in data size is a major challenge severely slowing down genomic analyses, leading to some software becoming obsolete and researchers having limited access to diverse analysis tools. Results: Here we present two R packages, bigstatsr and bigsnpr, allowing for the analysis of large scale genomic data to be performed within R. To address large data size, the packages use memory-mapping for accessing data matrices stored on disk instead of in RAM. To perform data pre-processing and data analysis, the packages integrate most of the tools that are commonly used, either through transparent system calls to existing software, or through updated or improved implementation of existing methods. In particular, the packages implement fast and accurate computations of principal component analysis and association studies, functions to remove single nucleotide polymorphisms in linkage disequilibrium and algorithms to learn polygenic risk scores on millions of single nucleotide polymorphisms. We illustrate applications of the two R packages by analyzing a case-control genomic dataset for celiac disease, performing an association study and computing polygenic risk scores. Finally, we demonstrate the scalability of the R packages by analyzing a simulated genome-wide dataset including 500 000 individuals and 1 million markers on a single desktop computer. Availability and implementation: https://privefl.github.io/bigstatsr/ and https://privefl.github.io/bigsnpr/. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomics; Algorithms; Genome, Human; Humans; Multifactorial Inheritance; Polymorphism, Single Nucleotide; Software
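
The memory-mapping idea mentioned in the abstract above can be sketched with the Python standard library. The file layout used here (one byte per genotype, stored column by column) is an assumption made for illustration and is not the on-disk format of bigstatsr or bigsnpr; the function names are hypothetical.

```python
import mmap
import os
import tempfile

def write_matrix(path, columns):
    """Store a genotype matrix column by column, one byte per genotype (0, 1 or 2)."""
    with open(path, "wb") as f:
        for column in columns:
            f.write(bytes(column))

def read_column(path, n_rows, j):
    """Read column j through a read-only memory map, so only the pages that are
    actually touched are brought into RAM."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return list(mm[j * n_rows:(j + 1) * n_rows])

# Hypothetical matrix: 3 individuals (rows) by 4 variants (columns).
columns = [[0, 1, 2], [2, 2, 0], [1, 0, 1], [0, 0, 2]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "genotypes.bin")
    write_matrix(path, columns)
    second_variant = read_column(path, n_rows=3, j=1)
```

Storing the matrix column by column keeps each variant contiguous on disk, which suits per-variant computations such as association tests.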
15.
Bioinformatics ; 34(19): 3412-3414, 2018 10 01.
Article in English | MEDLINE | ID: mdl-29726908

ABSTRACT

Summary: Many genome-wide association studies and genome-wide screening for gene-environment (GxE) interactions have been performed to elucidate the underlying mechanisms of human traits and diseases. When the analyzed outcome is quantitative, the overall contribution of identified genetic variants to the outcome is often expressed as the percentage of phenotypic variance explained. This is commonly done using individual-level genotype data but it is challenging when results are derived through meta-analyses. Here, we present the R package 'VarExp', which allows for the estimation of the percentage of phenotypic variance explained using summary statistics only. It allows for a range of models to be evaluated, including marginal genetic effects, GxE interaction effects and both effects jointly. Its implementation integrates all recent methodological developments and does not need external data to be uploaded by users. Availability and implementation: The R package is available at https://gitlab.pasteur.fr/statistical-genetics/VarExp.git. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study; Genotype; Software; Computational Biology; Humans; Phenotype
16.
J Clin Endocrinol Metab ; 108(5): e89-e97, 2023 04 13.
Article in English | MEDLINE | ID: mdl-36413496

ABSTRACT

BACKGROUND: Resource trade-off theory suggests that increased performance on a given trait comes at the cost of decreased performance on other traits. METHODS: Growth data from 1889 subjects (996 girls) were used from the GrowUp1974 Gothenburg study. Energy Trade-Off (ETO) between height and weight for individuals with extreme body types was characterized using a novel ETO-Score (ETOS). Four extreme body types were defined based on height and ETOI at early adulthood: tall-slender, short-stout, short-slender, and tall-stout; their growth trajectories were assessed from ages 0.5-17.5 years. A GWAS using UK BioBank data was conducted to identify gene variants associated with height, BMI, and for the first time with ETOS. RESULTS: Height and ETOS trajectories show a two-hit pattern with profound changes during early infancy and at puberty for tall-slender and short-stout body types. Several loci (including FTO, ADCY3, GDF5) and pathways were identified by GWAS as being highly associated with ETOS. The most strongly associated pathways were related to "extracellular matrix," "signal transduction," "chromatin organization," and "energy metabolism." CONCLUSIONS: ETOS represents a novel anthropometric trait with utility in describing body types. We discovered multiple genomic loci and pathways probably involved in energy trade-off.


Subject(s)
Puberty; Somatotypes; Female; Humans; Adult; Infant; Child, Preschool; Child; Adolescent; Phenotype; Anthropometry; Energy Metabolism/genetics; Body Height/genetics; Alpha-Ketoglutarate-Dependent Dioxygenase FTO/genetics
17.
Biol Psychiatry ; 93(1): 29-36, 2023 01 01.
Article in English | MEDLINE | ID: mdl-35973856

ABSTRACT

BACKGROUND: Single nucleotide polymorphism-based heritability is a fundamental quantity in the genetic analysis of complex traits. For case-control phenotypes, for which the continuous distribution of risk in the population is unobserved, observed-scale heritability estimates must be transformed to the more interpretable liability scale. This article describes how the field standard approach incorrectly performs the liability correction in that it does not appropriately account for variation in the proportion of cases across the cohorts comprising the meta-analysis. We propose a simple solution that incorporates cohort-specific ascertainment using the summation of effective sample sizes across cohorts. This solution is applied at the stage of single nucleotide polymorphism-based heritability estimation and does not require generating updated meta-analytic genome-wide association study summary statistics. METHODS: We began by performing a series of simulations to examine the ability of the standard approach and our proposed approach to recapture liability-scale heritability in the population. We went on to examine the differences in estimates obtained from these 2 approaches for real data for 12 major case-control genome-wide association studies of psychiatric and neurologic traits. RESULTS: We found that the field standard approach for performing the liability conversion can downwardly bias estimates by as much as approximately 50% in simulation and approximately 30% in real data. CONCLUSIONS: Prior estimates of liability-scale heritability for genome-wide association study meta-analysis may be drastically underestimated. To this end, we strongly recommend using our proposed approach of using the sum of effective sample sizes across contributing cohorts to obtain unbiased estimates.


Subject(s)
Genome-Wide Association Study; Multifactorial Inheritance; Polymorphism, Single Nucleotide/genetics; Phenotype; Case-Control Studies
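
The two quantities discussed in the abstract above can be sketched numerically. The conversion below is the commonly used observed-to-liability scale formula, h²_liab = h²_obs · K²(1−K)² / (P(1−P)z²), where K is the population prevalence, P the sample prevalence and z the standard normal density at the liability threshold. The per-cohort effective sample size 4·n_cases·n_controls/(n_cases + n_controls) and the use of P = 0.5 once effective sample sizes are summed are assumptions about how the proposed approach is applied; the cohort sizes are invented.

```python
from statistics import NormalDist

def effective_n(n_cases, n_controls):
    """Effective sample size of one case-control cohort (assumed definition)."""
    return 4 * n_cases * n_controls / (n_cases + n_controls)

def sum_effective_n(cohorts):
    """Sum of effective sample sizes over (n_cases, n_controls) pairs."""
    return sum(effective_n(cases, controls) for cases, controls in cohorts)

def liability_h2(h2_observed, population_prevalence, sample_prevalence):
    """Convert observed-scale heritability to the liability scale."""
    k, p = population_prevalence, sample_prevalence
    threshold = NormalDist().inv_cdf(1 - k)  # liability threshold for prevalence k
    z = NormalDist().pdf(threshold)          # normal density at that threshold
    return h2_observed * (k * (1 - k)) ** 2 / (p * (1 - p) * z ** 2)

# Hypothetical meta-analysis of two cohorts with different case proportions.
cohorts = [(1000, 1000), (500, 4500)]
total_effective_n = sum_effective_n(cohorts)  # smaller than the total of 7000 samples

# With summed effective sample sizes, the sample prevalence is set to 0.5
# (an assumption about the recommended usage).
h2_liability = liability_h2(h2_observed=0.1, population_prevalence=0.01,
                            sample_prevalence=0.5)
```

The unbalanced cohort contributes 1800 effective samples out of its 5000, which illustrates why pooling raw sample sizes and a single overall case proportion can misstate ascertainment.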
18.
Nat Commun ; 14(1): 4702, 2023 08 05.
Article in English | MEDLINE | ID: mdl-37543680

ABSTRACT

The predictive performance of polygenic scores (PGS) is largely dependent on the number of samples available to train the PGS. Increasing the sample size for a specific phenotype is expensive and takes time, but this sample size can be effectively increased by using genetically correlated phenotypes. We propose a framework to generate multi-PGS from thousands of publicly available genome-wide association studies (GWAS) with no need to individually select the most relevant ones. In this study, the multi-PGS framework increases prediction accuracy over single PGS for all included psychiatric disorders and other available outcomes, with prediction R² increases of up to 9-fold for attention-deficit/hyperactivity disorder compared to a single PGS. We also generate multi-PGS for phenotypes without an existing GWAS and for case-case predictions. We benchmark the multi-PGS framework against other methods and highlight its potential application to new emerging biobanks.


Subject(s)
Attention Deficit Disorder with Hyperactivity; Genome-Wide Association Study; Humans; Attention Deficit Disorder with Hyperactivity/genetics; Phenotype; Multifactorial Inheritance/genetics
19.
Nat Commun ; 14(1): 5553, 2023 09 09.
Article in English | MEDLINE | ID: mdl-37689771

ABSTRACT

Proportional hazards models have been proposed to analyse time-to-event phenotypes in genome-wide association studies (GWAS). However, little is known about the ability of proportional hazards models to identify genetic associations under different generative models and when ascertainment is present. Here we propose the age-dependent liability threshold (ADuLT) model as an alternative to a Cox regression based GWAS, here represented by SPACox. We compare ADuLT, SPACox, and standard case-control GWAS in simulations under two generative models and with varying degrees of ascertainment as well as in the iPSYCH cohort. We find Cox regression GWAS to be underpowered when cases are strongly ascertained (cases are oversampled by a factor 5), regardless of the generative model used. ADuLT is robust to ascertainment in all simulated scenarios. Then, we analyse four psychiatric disorders in iPSYCH, ADHD, Autism, Depression, and Schizophrenia, with a strong case-ascertainment. Across these psychiatric disorders, ADuLT identifies 20 independent genome-wide significant associations, case-control GWAS finds 17, and SPACox finds 8, which is consistent with simulation results. As more genetic data are being linked to electronic health records, robust GWAS methods that can make use of age-of-onset information will help increase power in analyses for common health outcomes.


Subject(s)
Autistic Disorder; Genome-Wide Association Study; Humans; Computer Simulation; Electronic Health Records; Factor V
20.
Nat Commun ; 14(1): 852, 2023 02 15.
Article in English | MEDLINE | ID: mdl-36792583

ABSTRACT

The vitamin D binding protein (DBP), encoded by the group-specific component (GC) gene, is a component of the vitamin D system. In a genome-wide association study of DBP concentration in 65,589 neonates we identify 26 independent loci, 17 of which are in or close to the GC gene, with fine-mapping identifying 2 missense variants on chromosomes 12 and 17 (within SH2B3 and GSDMA, respectively). When adjusted for GC haplotypes, we find 15 independent loci distributed over 10 chromosomes. Mendelian randomization analyses identify a unidirectional effect of higher DBP concentration and (a) higher 25-hydroxyvitamin D concentration, and (b) a reduced risk of multiple sclerosis and rheumatoid arthritis. A phenome-wide association study confirms that higher DBP concentration is associated with a reduced risk of vitamin D deficiency. Our findings provide valuable insights into the influence of DBP on vitamin D status and a range of health outcomes.


Subject(s)
Genome-Wide Association Study; Vitamin D-Binding Protein; Infant, Newborn; Humans; Vitamin D-Binding Protein/genetics; Vitamin D/genetics; Calcifediol; Vitamins; Polymorphism, Single Nucleotide; Pore Forming Cytotoxic Proteins/genetics