ABSTRACT
Lichen planus (LP) is a T-cell-mediated inflammatory disease affecting squamous epithelia in many parts of the body, most often the skin and oral mucosa. Cutaneous LP is usually transient and oral LP (OLP) is most often chronic, so we performed a large-scale genetic and epidemiological study of LP to address whether the oral and non-oral subgroups have shared or distinct underlying pathologies and their overlap with autoimmune disease. Using lifelong records covering diagnoses, procedures, and clinic identity from 473,580 individuals in the FinnGen study, genome-wide association analyses were conducted on carefully constructed subcategories of OLP (n = 3,323) and non-oral LP (n = 4,356) and on the combined group. We identified 15 genome-wide significant associations in FinnGen and an additional 12 when meta-analyzed with UKBB (27 independent associations at 25 distinct genomic locations), most of which are shared between oral and non-oral LP. Many associations coincide with known autoimmune disease loci, consistent with the epidemiologic enrichment of LP with hypothyroidism and other autoimmune diseases. Notably, a third of the FinnGen associations demonstrate significant differences between OLP and non-OLP. We also observed a 13.6-fold risk for tongue cancer and an elevated risk for other oral cancers in OLP, in agreement with earlier reports that connect LP with higher cancer incidence. In addition to a large-scale dissection of LP genetics and comorbidities, our study demonstrates the use of comprehensive, multidimensional health registry data to address outstanding clinical questions and reveal underlying biological mechanisms in common but understudied diseases.
Subject(s)
Autoimmune Diseases , Genome-Wide Association Study , Lichen Planus, Oral , Mouth Neoplasms , Humans , Autoimmune Diseases/genetics , Lichen Planus, Oral/genetics , Lichen Planus, Oral/pathology , Mouth Neoplasms/genetics , Mouth Neoplasms/pathology , Female , Male , Genetic Heterogeneity , Middle Aged , Lichen Planus/genetics , Lichen Planus/pathology , Genetic Predisposition to Disease , Aged , Adult , Risk Factors , Polymorphism, Single Nucleotide
ABSTRACT
Methods of estimating polygenic scores (PGSs) from genome-wide association studies are increasingly utilized. However, independent method evaluation is lacking, and method comparisons are often limited. Here, we evaluate polygenic scores derived via seven methods in five biobank studies (totaling about 1.2 million participants) across 16 diseases and quantitative traits, building on a reference-standardized framework. We conducted meta-analyses to quantify the effects of method choice, hyperparameter tuning, method ensembling, and the target biobank on PGS performance. We found that no single method consistently outperformed all others. PGS effect sizes were more variable between biobanks than between methods within biobanks when methods were well tuned. Differences between methods were largest for the two investigated autoimmune diseases, seropositive rheumatoid arthritis and type 1 diabetes. For most methods, cross-validation was more reliable for tuning hyperparameters than automatic tuning (without the use of target data). For a given target phenotype, elastic net models combining PGS across methods (ensemble PGS) tuned in the UK Biobank provided consistent, high, and cross-biobank transferable performance, increasing PGS effect sizes (β coefficients) by a median of 5.0% relative to LDpred2 and MegaPRS (the two best-performing single methods when tuned with cross-validation). Our interactively browsable online results and open-source workflow prspipe provide a rich resource and reference for the analysis of polygenic scoring methods across biobanks.
Subject(s)
Biological Specimen Banks , Genome-Wide Association Study , Multifactorial Inheritance , Humans , Multifactorial Inheritance/genetics , Phenotype , Diabetes Mellitus, Type 1/genetics , Polymorphism, Single Nucleotide , Machine Learning
ABSTRACT
Understanding perturbations in circulating lipid levels that often occur years or decades before clinical symptoms may enhance our understanding of disease mechanisms and provide novel intervention opportunities. Here, we assessed whether polygenic scores (PGSs) for complex traits could detect lipid dysfunctions related to the traits and provide new biological insights. We constructed genome-wide PGSs (approximately 1 million genetic variants) for 50 complex traits in 7,169 Finnish individuals with routine clinical lipid profiles and lipidomics measurements (179 lipid species). We identified 678 associations (P < 9.0 × 10⁻⁵) involving 26 traits and 142 lipids. Most of these associations were also validated with the actual phenotype measurements where available (89.5% of 181 associations where the trait was available), suggesting that these associations represent early signs of physiological changes of the traits. We detected many known relationships (e.g., PGS for body mass index (BMI) and lysophospholipids, PGS for type 2 diabetes and triacylglycerols) and others that suggested potential targets for prevention strategies (e.g., PGS for venous thromboembolism and arachidonic acid). We also found an association of the PGS for favorable adiposity with increased sphingomyelin levels, suggesting a probable role of sphingomyelins in the increased risk for certain diseases in favorable adiposity (e.g., venous thromboembolism, as reported previously), despite its favorable metabolic effect. Altogether, our study provides a comprehensive characterization of lipidomic alterations in genetic predisposition for a wide range of complex traits. The study also demonstrates the potential of PGSs for complex traits to capture early, presymptomatic lipid alterations, highlighting their utility in understanding disease mechanisms and early disease detection.
Subject(s)
Genome-Wide Association Study , Lipids , Multifactorial Inheritance , Humans , Multifactorial Inheritance/genetics , Male , Female , Lipids/blood , Lipids/genetics , Middle Aged , Finland , Lipidomics/methods , Adult , Phenotype , Body Mass Index , Lipid Metabolism/genetics , Aged , Polymorphism, Single Nucleotide/genetics , Genetic Predisposition to Disease
ABSTRACT
Family history is the standard indirect measure of inherited susceptibility in clinical care, whereas polygenic risk scores (PRSs) have more recently demonstrated potential for more directly capturing genetic risk in many diseases. Few studies have systematically compared how these overlap and complement each other across common diseases. Within FinnGen (N = 306,418), we leverage family relationships, up to 50 years of nationwide registries, and genome-wide genotyping to examine the interplay of family history and genome-wide PRSs. We explore the dynamic for three types of family history across 24 common diseases: first- and second-degree family history and parental causes of death. Covering a large proportion of the burden of non-communicable diseases in adults, we show that family history and PRS are independent and not interchangeable measures, but instead provide complementary information on inherited disease susceptibility. The PRSs explained on average 10% of the effect of first-degree family history, and first-degree family history 3% of PRSs, and PRS effects were independent of both early- and late-onset family history. The PRS stratified the risk similarly in individuals with and without family history. In most diseases, including coronary artery disease, glaucoma, and type 2 diabetes, a positive family history with a high PRS was associated with a considerably elevated risk, whereas a low PRS compensated completely for the risk implied by positive family history. This study provides a catalogue of risk estimates for both family history of disease and PRSs and highlights opportunities for a more comprehensive way of assessing inherited disease risk across common diseases.
Subject(s)
Diabetes Mellitus, Type 2 , Genome-Wide Association Study , Adult , Humans , Diabetes Mellitus, Type 2/genetics , Multifactorial Inheritance/genetics , Genetic Predisposition to Disease , Medical History Taking , Risk Factors
ABSTRACT
Transcriptomics data have been integrated with genome-wide association studies (GWASs) to help understand disease/trait molecular mechanisms. The utility of metabolomics, integrated with transcriptomics and disease GWASs, to understand molecular mechanisms for metabolite levels or diseases has not been thoroughly evaluated. We performed probabilistic transcriptome-wide association and locus-level colocalization analyses to integrate transcriptomics results for 49 tissues in 706 individuals from the GTEx project, metabolomics results for 1,391 plasma metabolites in 6,136 Finnish men from the METSIM study, and GWAS results for 2,861 disease traits in 260,405 Finnish individuals from the FinnGen study. We found that genetic variants that regulate metabolite levels were more likely to influence gene expression and disease risk compared to the ones that do not. Integrating transcriptomics with metabolomics results prioritized 397 genes for 521 metabolites, including 496 previously identified gene-metabolite pairs with strong functional connections and suggested 33.3% of such gene-metabolite pairs shared the same causal variants with genetic associations of gene expression. Integrating transcriptomics and metabolomics individually with FinnGen GWAS results identified 1,597 genes for 790 disease traits. Integrating transcriptomics and metabolomics jointly with FinnGen GWAS results helped pinpoint metabolic pathways from genes to diseases. We identified putative causal effects of UGT1A1/UGT1A4 expression on gallbladder disorders through regulating plasma (E,E)-bilirubin levels, of SLC22A5 expression on nasal polyps and plasma carnitine levels through distinct pathways, and of LIPC expression on age-related macular degeneration through glycerophospholipid metabolic pathways. Our study highlights the power of integrating multiple sets of molecular traits and GWAS results to deepen understanding of disease pathophysiology.
Subject(s)
Genome-Wide Association Study , Transcriptome , Bilirubin , Carnitine , Glycerophospholipids , Humans , Male , Metabolomics , Quantitative Trait Loci/genetics , Solute Carrier Family 22 Member 5/genetics , Transcriptome/genetics
ABSTRACT
Exome-sequencing studies have generally been underpowered to identify deleterious alleles with a large effect on complex traits as such alleles are mostly rare. Because the population of northern and eastern Finland has expanded considerably and in isolation following a series of bottlenecks, individuals of these populations have numerous deleterious alleles at a relatively high frequency. Here, using exome sequencing of nearly 20,000 individuals from these regions, we investigate the role of rare coding variants in clinically relevant quantitative cardiometabolic traits. Exome-wide association studies for 64 quantitative traits identified 26 newly associated deleterious alleles. Of these 26 alleles, 19 are either unique to or more than 20 times more frequent in Finnish individuals than in other Europeans and show geographical clustering comparable to Mendelian disease mutations that are characteristic of the Finnish population. We estimate that sequencing studies of populations without this unique history would require hundreds of thousands to millions of participants to achieve comparable association power.
Subject(s)
Exome Sequencing , Genetic Association Studies/methods , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Quantitative Trait Loci/genetics , Alleles , Cholesterol, HDL/genetics , Cluster Analysis , Endpoint Determination , Finland , Geographic Mapping , Humans , Multifactorial Inheritance/genetics , Reproducibility of Results
ABSTRACT
An Amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
Lipidomic data often exhibit missing data points, which can be categorized as missing completely at random (MCAR), missing at random, or missing not at random (MNAR). In order to utilize statistical methods that require complete datasets or to improve the identification of potential effects in statistical comparisons, imputation techniques can be employed. In this study, we investigate commonly used methods such as zero, half-minimum, mean, and median imputation, as well as more advanced techniques such as k-nearest neighbor and random forest imputation. We employ a combination of simulation-based approaches and application to real datasets to assess the performance and effectiveness of these methods. Shotgun lipidomics datasets exhibit high correlations and missing values, often due to low analyte abundance, characterized as MNAR. In this context, k-nearest neighbor approaches based on correlation and truncated normal distributions demonstrate best performance. Importantly, both methods can effectively impute missing values independent of the type of missingness, the determination of which is nearly impossible in practice. The imputation methods still control the type I error rate.
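The simple single-value methods named above (zero, half-minimum, mean, and median imputation) can be illustrated with a minimal sketch. The function name `impute_column`, the use of `None` for a missing measurement, and the toy values are assumptions made for illustration only; this is not the study's code, and the k-nearest neighbor and random forest approaches are not shown.

```python
from statistics import mean, median


def impute_column(values, method):
    """Fill missing entries (None) of one lipid's measurements with a single value.

    Hypothetical helper for illustration; `method` is one of
    "zero", "half_min", "mean", "median".
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill_values = {
        "zero": 0.0,
        "half_min": min(observed) / 2.0,  # half of the smallest observed value
        "mean": mean(observed),
        "median": median(observed),
    }
    fill = fill_values[method]
    return [fill if v is None else v for v in values]


# Toy abundances for one lipid species across four samples (made-up numbers).
toy_lipid = [4.0, None, 2.0, 9.0]
for method in ("zero", "half_min", "mean", "median"):
    print(method, impute_column(toy_lipid, method))
```

Half-minimum imputation encodes the assumption that a value is missing because it fell below the detection limit (an MNAR mechanism), whereas mean and median imputation implicitly treat missingness as unrelated to abundance.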
Subject(s)
Lipidomics , Lipidomics/methods , Humans , Algorithms , Lipids/analysis , Data Interpretation, Statistical
ABSTRACT
Formalin-fixed paraffin-embedded (FFPE) tissues stored in biobanks and pathology archives are a vast but underutilized source for molecular studies on different diseases. Beyond being the "gold standard" for preservation of diagnostic human tissues, FFPE samples retain similar genetic information as matching blood samples, which could make FFPE samples an ideal resource for genomic analysis. However, research on this resource has been hindered by the perception that DNA extracted from FFPE samples is of poor quality. Here, we show that germline disease-predisposing variants and polygenic risk scores (PRS) can be identified from FFPE normal tissue (FFPE-NT) DNA with high accuracy. We optimized the performance of FFPE-NT DNA on a genome-wide array containing 657,675 variants. Via a series of testing and validation phases, we established a protocol for FFPE-NT genotyping with results comparable with blood genotyping. The median call rate of FFPE-NT samples in the validation phase was 99.85% (range 98.26%-99.94%) and median concordance with matching blood samples was 99.79% (range 98.85%-99.9%). We also demonstrated that a rare pathogenic PALB2 genetic variant predisposing to cancer can be correctly identified in FFPE-NT samples. We further imputed the FFPE-NT genotype data and calculated the FFPE-NT genome-wide PRS in 3 diseases and 4 disease risk variables. In all cases, FFPE-NT and matching blood PRS were highly concordant (all Pearson's r > 0.95). The ability to precisely genotype FFPE-NT on a genome-wide array enables translational genomics applications of archived FFPE-NT samples with the possibility to link to corresponding phenotypes and longitudinal health data.
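The two quality metrics reported above, call rate and concordance with matching blood samples, can be sketched as follows. The abstract does not give the exact definitions used in the study, so this sketch assumes that call rate is the fraction of variants with a non-missing genotype and that concordance is the fraction of identical genotypes among variants called in both samples. The function names and the toy genotypes are hypothetical.

```python
def call_rate(genotypes):
    """Fraction of variants with a called (non-missing) genotype."""
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)


def concordance(sample_a, sample_b):
    """Fraction of identical genotypes among variants called in both samples."""
    both_called = [
        (a, b) for a, b in zip(sample_a, sample_b)
        if a is not None and b is not None
    ]
    if not both_called:
        raise ValueError("no variants called in both samples")
    matches = sum(a == b for a, b in both_called)
    return matches / len(both_called)


# Toy genotypes at five variants (None marks a failed call); made-up data.
ffpe_nt = ["AA", "AG", None, "GG", "AG"]
blood = ["AA", "AG", "GG", "GG", "AA"]

print(call_rate(ffpe_nt))           # 4 of 5 variants called
print(concordance(ffpe_nt, blood))  # 3 of 4 jointly called genotypes agree
```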
Subject(s)
Formaldehyde , Genetic Risk Score , Humans , Genotype , Tissue Fixation/methods , DNA/genetics , Paraffin Embedding/methods
ABSTRACT
BACKGROUND: Hereditary factors, including single genetic variants and family history, can be used for targeting colorectal cancer (CRC) screening, but limited data exist on the impact of polygenic risk scores (PRS) on risk-based CRC screening. METHODS: Using longitudinal health and genomics data on 453,733 Finnish individuals including 8801 CRC cases, we estimated the impact of a genome-wide CRC PRS on CRC screening initiation age through population-calibrated incidence estimation over the life course in men and women. RESULTS: Compared to the cumulative incidence of CRC at age 60 in Finland (the current age for starting screening in Finland), a comparable cumulative incidence was reached 5 and 11 years earlier in persons with high PRS (80-99% and >99%, respectively), while those with a low PRS (< 20%) reached comparable incidence 7 years later. The PRS was associated with increased risk of post-colonoscopy CRC after negative colonoscopy (hazard ratio 1.76 per PRS SD, 95% CI 1.54-2.01). Moreover, the PRS predicted colorectal adenoma incidence and improved incident CRC risk prediction over non-genetic risk factors. CONCLUSIONS: Our findings demonstrate that a CRC PRS can be used for risk stratification of CRC, with further research needed to optimally integrate the PRS into risk-based screening.
Subject(s)
Colorectal Neoplasms , Genetic Risk Score , Male , Humans , Female , Middle Aged , Early Detection of Cancer , Colorectal Neoplasms/diagnosis , Colorectal Neoplasms/epidemiology , Colorectal Neoplasms/genetics , Risk , Colonoscopy , Risk Factors
ABSTRACT
The contribution of genome structural variation (SV) to quantitative traits associated with cardiometabolic diseases remains largely unknown. Here, we present the results of a study examining genetic association between SVs and cardiometabolic traits in the Finnish population. We used sensitive methods to identify and genotype 129,166 high-confidence SVs from deep whole-genome sequencing (WGS) data of 4,848 individuals. We tested the 64,572 common and low-frequency SVs for association with 116 quantitative traits and tested candidate associations using exome sequencing and array genotype data from an additional 15,205 individuals. We discovered 31 genome-wide significant associations at 15 loci, including 2 loci at which SVs have strong phenotypic effects: (1) a deletion of the ALB promoter that is greatly enriched in the Finnish population and causes decreased serum albumin level in carriers (p = 1.47 × 10⁻⁵⁴) and is also associated with increased levels of total cholesterol (p = 1.22 × 10⁻²⁸) and 14 additional cholesterol-related traits, and (2) a multi-allelic copy number variant (CNV) at PDPR that is strongly associated with pyruvate (p = 4.81 × 10⁻²¹) and alanine (p = 6.14 × 10⁻¹²) levels and resides within a structurally complex genomic region that has accumulated many rearrangements over evolutionary time. We also confirmed six previously reported associations, including five led by stronger signals in single nucleotide variants (SNVs) and one linking recurrent HP gene deletion and cholesterol levels (p = 6.24 × 10⁻¹⁰), which was also found to be strongly associated with increased glycoprotein level (p = 3.53 × 10⁻³⁵). Our study confirms that integrating SVs in trait-mapping studies will expand our knowledge of genetic factors underlying disease risk.
Subject(s)
Cardiovascular Diseases/genetics , Genomic Structural Variation/genetics , Alleles , Cholesterol/blood , DNA Copy Number Variations/genetics , Female , Finland , Genome, Human/genetics , Genotype , High-Throughput Nucleotide Sequencing , Humans , Male , Mitochondrial Proteins/genetics , Promoter Regions, Genetic/genetics , Pyruvate Dehydrogenase (Lipoamide)-Phosphatase/genetics , Pyruvic Acid/metabolism , Serum Albumin, Human/genetics
ABSTRACT
OBJECTIVES: Rheumatic diseases may impair reproductive success and pregnancy outcomes, but systematic evaluations across diseases are lacking. We conducted a nationwide cohort study to examine the impact of rheumatic diseases on reproductive health measures, comparing the impacts with those of other immune-mediated diseases (IMDs). METHODS: Out of all of the 5 339 804 Finnish citizens, individuals born 1964-1984 and diagnosed with any of the 19 IMDs before age 30 (women) or 35 (men) were matched with 20 controls by birth year, sex, and education. We used data from nationwide health registers to study the impact of IMDs on reproductive health measures, such as reproductive success and, for women, ever having experienced adverse maternal and perinatal outcomes. RESULTS: Several of the rheumatic diseases, particularly SLE, JIA, and seropositive RA, were associated with higher rates of childlessness and fewer children. The risks for pre-eclampsia, newborns being small for gestational age, preterm delivery, non-elective Caesarean sections, and need of neonatal intensive care were increased in many IMDs. Particularly, SLE, SS, type 1 diabetes, and Addison's disease showed >2-fold risks for some of these outcomes. In most rheumatic diseases, moderate (1.1-1.5-fold) risk increases were observed for diverse adverse pregnancy outcomes, with similar effects in IBD, celiac disease, asthma, ITP, and psoriasis. CONCLUSION: Rheumatic diseases have a broad impact on reproductive health, with effects comparable with that of several other IMDs. Of the rheumatic diseases, SLE and SS conferred the largest risk increases on perinatal adverse event outcomes.
Subject(s)
Pregnancy Outcome , Registries , Reproductive Health , Rheumatic Diseases , Humans , Female , Pregnancy , Rheumatic Diseases/epidemiology , Finland/epidemiology , Adult , Male , Reproductive Health/statistics & numerical data , Infant, Newborn , Pregnancy Complications/epidemiology , Pregnancy Complications/immunology , Premature Birth/epidemiology , Cohort Studies , Case-Control Studies
ABSTRACT
Information about individual-level genetic ancestry is central to population genetics, forensics and genomic medicine. So far, studies have typically considered genetic ancestry on a broad continental level, and there is much less understanding of how more detailed genetic ancestry profiles can be generated and how accurate and reliable they are. Here, we assess these questions by developing a framework for individual-level ancestry estimation within a single European country, Finland, and we apply the framework to track changes in the fine-scale genetic structure throughout the 20th century. We estimate the genetic ancestry for 18,463 individuals from the National FINRISK Study with respect to up to 10 genetically and geographically motivated Finnish reference groups and illustrate the annual changes in the fine-scale genetic structure over the decades from 1920s to 1980s for 12 geographic regions of Finland. We detected major changes after a sudden, internal migration related to World War II from the region of ceded Karelia to the other parts of the country as well as the effect of urbanization starting from the 1950s. We also show that while the level of genetic heterogeneity in general increases towards the present day, its rate of change has considerable differences between the regions. To our knowledge, this is the first study that estimates annual changes in the fine-scale ancestry profiles within a relatively homogeneous European country and demonstrates how such information captures a detailed spatial and temporal history of a population. We provide an interactive website for the general public to examine our results.
Subject(s)
Genetic Structures , Genetics, Population , Databases, Genetic , Finland , Genetic Heterogeneity , Geography , Human Migration , Humans , Models, Genetic
ABSTRACT
Protein-truncating variants (PTVs) affecting dyslipidemia risk may point to therapeutic targets for cardiometabolic disease. Our objective was to identify PTVs that were associated with both lipid levels and the risk of coronary artery disease (CAD) or type 2 diabetes (T2D) and assess their possible associations with risks of other diseases. To achieve this aim, we leveraged the enrichment of PTVs in the Finnish population and tested the association of low-frequency PTVs in 1,209 genes with serum lipid levels in the Finrisk Study (n = 23,435). We then tested which of the lipid-associated PTVs were also associated with the risks of T2D or CAD, as well as 2,683 disease endpoints curated in the FinnGen Study (n = 218,792). Two PTVs were associated with both lipid levels and the risk of CAD or T2D: triglyceride-lowering variants in ANGPTL8 (-24.0 [-30.4 to -16.9] mg/dL per rs760351239-T allele, P = 3.4 × 10⁻⁹) and ANGPTL4 (-14.4 [-18.6 to -9.8] mg/dL per rs746226153-G allele, P = 4.3 × 10⁻⁹). The risk of T2D was lower in carriers of the ANGPTL4 PTV (OR = 0.70 [0.60-0.81], P = 2.2 × 10⁻⁶) than noncarriers. The odds of CAD were 47% lower in carriers of a PTV in ANGPTL8 (OR = 0.53 [0.37-0.76], P = 4.5 × 10⁻⁴) than noncarriers. Finally, the phenome-wide scan of the ANGPTL8 PTV showed that the ANGPTL8 PTV carriers were less likely to use statin therapy (68,782 cases, OR = 0.52 [0.40-0.68], P = 1.7 × 10⁻⁶) compared to noncarriers. Our findings provide genetic evidence of potential long-term efficacy and safety of therapeutic targeting of dyslipidemias.
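As a small worked example of the arithmetic behind the statement that the odds of CAD were 47% lower for OR = 0.53, an odds ratio converts to a percent change in odds as (OR - 1) × 100. The helper name below is hypothetical, and the result describes odds, not absolute risk.

```python
def percent_change_in_odds(odds_ratio):
    """Percent change in odds implied by an odds ratio (negative means lower odds)."""
    return round((odds_ratio - 1.0) * 100.0, 1)


# Odds ratios quoted in the abstract above.
for label, odds_ratio in [
    ("ANGPTL8 PTV and CAD", 0.53),
    ("ANGPTL4 PTV and T2D", 0.70),
    ("ANGPTL8 PTV and statin use", 0.52),
]:
    print(label, percent_change_in_odds(odds_ratio))
```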
Subject(s)
Angiopoietin-Like Proteins/genetics , Coronary Artery Disease/drug therapy , Diabetes Mellitus, Type 2/drug therapy , Dyslipidemias/drug therapy , Peptide Hormones/genetics , Aged , Angiopoietin-Like Protein 8 , Coronary Artery Disease/blood , Coronary Artery Disease/genetics , Coronary Artery Disease/pathology , Diabetes Mellitus, Type 2/blood , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/pathology , Dyslipidemias/blood , Dyslipidemias/genetics , Dyslipidemias/pathology , Female , Genetic Predisposition to Disease , Humans , Hydroxymethylglutaryl-CoA Reductase Inhibitors/administration & dosage , Hydroxymethylglutaryl-CoA Reductase Inhibitors/adverse effects , Male , Middle Aged , Polymorphism, Single Nucleotide/genetics , Risk Factors , Triglycerides/blood
ABSTRACT
BACKGROUND: Machine learning (ML) classifiers are increasingly used for predicting cardiovascular disease (CVD) and related risk factors using omics data, although these outcomes are often categorical and imbalanced across classes. However, little is known about which ML classifier, omics data, or upstream dimension reduction strategy has the strongest influence on prediction quality in such settings. Our study aimed to illustrate and compare different machine learning strategies to predict CVD risk factors under different scenarios. METHODS: We compared the use of six ML classifiers in predicting CVD risk factors using blood-derived metabolomics, epigenetics and transcriptomics data. Upstream omic dimension reduction was performed using either unsupervised or semi-supervised autoencoders, whose downstream ML classifier performance we compared. CVD risk factors included systolic and diastolic blood pressure measurements and ultrasound-based biomarkers of left ventricular diastolic dysfunction (LVDD; E/e' ratio, E/A ratio, LAVI) collected from 1,249 Finnish participants, of whom 80% were used for model fitting. We predicted individuals with low, high or average levels of CVD risk factors, the latter class being the most common. We constructed multi-omic predictions using a meta-learner that weighted single-omic predictions. Model performance comparisons were based on the F1 score. Finally, we investigated whether learned omic representations from pre-trained semi-supervised autoencoders could improve outcome prediction in an external cohort using transfer learning. RESULTS: Depending on the ML classifier or omic used, the quality of single-omic predictions varied. Multi-omics predictions outperformed single-omics predictions in most cases, particularly in the prediction of individuals with high or low CVD risk factor levels. Semi-supervised autoencoders improved downstream predictions compared to the use of unsupervised autoencoders. In addition, median gains in Area Under the Curve by transfer learning compared to modelling from scratch ranged from 0.09 to 0.14 and 0.07 to 0.11 units for transcriptomic and metabolomic data, respectively. CONCLUSIONS: By illustrating the use of different machine learning strategies in different scenarios, our study provides a platform for researchers to evaluate how the choice of omics, ML classifiers, and dimension reduction can influence the quality of CVD risk factor predictions.
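Because the comparisons above rest on the F1 score in a three-class setting (low, average, high) with an over-represented average class, a minimal sketch of per-class and macro-averaged F1 may help. The abstract does not state which averaging scheme was used, so macro averaging is an assumption here, and the function names, labels, and predictions are made up for illustration.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score treating `label` as the positive class (one-vs-rest)."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Toy labels with "avg" as the most common class (made-up data).
y_true = ["low", "avg", "avg", "avg", "high", "avg"]
y_pred = ["low", "avg", "avg", "high", "high", "low"]
labels = ["low", "avg", "high"]

for lab in labels:
    print(lab, f1_for_class(y_true, y_pred, lab))
print("macro", macro_f1(y_true, y_pred, labels))
```

Unlike plain accuracy, a per-class F1 does not reward a classifier for always predicting the majority class, which is why it suits imbalanced outcomes of this kind.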
Subject(s)
Cardiovascular Diseases , Machine Learning , Humans , Middle Aged , Male , Female , Heart Disease Risk Factors , Adult , Metabolomics , Aged , Risk Factors , Risk Assessment , Finland , Multiomics
ABSTRACT
BACKGROUND: The influence of genetics and environment on the association of the plasma proteome with body mass index (BMI) and changes in BMI remains underexplored, and the links to other omics in these associations remain to be investigated. We characterized protein-BMI trajectory associations in adolescents and adults and how these connect to other omics layers. METHODS: Our study included two cohorts of longitudinally followed twins: FinnTwin12 (N = 651) and the Netherlands Twin Register (NTR) (N = 665). Follow-up comprised 4 BMI measurements over approximately 6 (NTR: 23-27 years old) to 10 years (FinnTwin12: 12-22 years old), with omics data collected at the last BMI measurement. BMI changes were calculated in latent growth curve models. Mixed-effects models were used to quantify the associations between the abundance of 439 plasma proteins with BMI at blood sampling and changes in BMI. In FinnTwin12, the sources of genetic and environmental variation underlying the protein abundances were quantified by twin models, as were the associations of proteins with BMI and BMI changes. In NTR, we investigated the association of gene expression of genes encoding proteins identified in FinnTwin12 with BMI and changes in BMI. We linked identified proteins and their coding genes to plasma metabolites and polygenic risk scores (PRS) applying mixed-effects models and correlation networks. RESULTS: We identified 66 and 14 proteins associated with BMI at blood sampling and changes in BMI, respectively. The average heritability of these proteins was 35%. Of the 66 BMI-protein associations, 43 and 12 showed genetic and environmental correlations, respectively, including 8 proteins showing both. Similarly, we observed 7 and 3 genetic and environmental correlations between changes in BMI and protein abundance, respectively. S100A8 gene expression was associated with BMI at blood sampling, and the PRG4 and CFI genes were associated with BMI changes. Proteins showed strong connections with metabolites and PRSs, but we observed no multi-omics connections among gene expression and other omics layers. CONCLUSIONS: Associations between the proteome and BMI trajectories are characterized by shared genetic, environmental, and metabolic etiologies. We observed few gene-protein pairs associated with BMI or changes in BMI at the proteome and transcriptome levels.
Subject(s)
Multiomics , Proteome , Humans , Adolescent , Young Adult , Adult , Child , Body Mass Index , Proteome/genetics , Twins, Monozygotic/genetics , Longitudinal Studies
ABSTRACT
Polygenic scores (PSs) are becoming a useful tool to identify individuals with high genetic risk for complex diseases, and several projects are currently testing their utility for translational applications. It is also tempting to use PSs to assess whether genetic variation can explain a part of the geographic distribution of a phenotype. However, it is not well known how the population genetic properties of the training and target samples affect the geographic distribution of PSs. Here, we evaluate geographic differences, and related biases, of PSs in Finland in a geographically well-defined sample of 2,376 individuals from the National FINRISK study. First, we detect geographic differences in PSs for coronary artery disease (CAD), rheumatoid arthritis, schizophrenia, waist-hip ratio (WHR), body-mass index (BMI), and height, but not for Crohn disease or ulcerative colitis. Second, we use height as a model trait to thoroughly assess the possible population genetic biases in PSs and apply similar approaches to the other phenotypes. Most importantly, we detect suspiciously large accumulations of geographic differences for CAD, WHR, BMI, and height, suggesting bias arising from the population's genetic structure rather than from a direct genotype-phenotype association. This work demonstrates how sensitive the geographic patterns of current PSs are for small biases even within relatively homogeneous populations and provides simple tools to identify such biases. A thorough understanding of the effects of population genetic structure on PSs is essential for translational applications of PSs.
Subject(s)
Genetic Markers , Genetics, Population , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide , Quantitative Trait, Heritable , Adult , Aged , Arthritis, Rheumatoid/epidemiology , Arthritis, Rheumatoid/genetics , Body Mass Index , Colitis, Ulcerative/epidemiology , Colitis, Ulcerative/genetics , Coronary Artery Disease/epidemiology , Coronary Artery Disease/genetics , Crohn Disease/epidemiology , Crohn Disease/genetics , Female , Finland/epidemiology , Genetic Association Studies , Geography , Humans , Male , Middle Aged , Risk Factors , Schizophrenia/epidemiology , Schizophrenia/genetics , Waist-Hip Ratio
ABSTRACT
Cytokines are essential regulatory components of the immune system, and their aberrant levels have been linked to many disease states. Despite increasing evidence that cytokines operate in concert, many of the physiological interactions between cytokines, and the shared genetic architecture that underlies them, remain unknown. Here, we aimed to identify and characterize genetic variants with pleiotropic effects on cytokines. Using three population-based cohorts (n = 9,263), we performed multivariate genome-wide association studies (GWAS) for a correlation network of 11 circulating cytokines, then combined our results in meta-analysis. We identified a total of eight loci significantly associated with the cytokine network, of which two (PDGFRB and ABO) had not been detected previously. In addition, conditional analyses revealed a further four secondary signals at three known cytokine loci. Integration, through the use of Bayesian colocalization analysis, of publicly available GWAS summary statistics with the cytokine network associations revealed shared causal variants between the eight cytokine loci and other traits; in particular, cytokine network variants at the ABO, SERPINE2, and ZFPM2 loci showed pleiotropic effects on the production of immune-related proteins, on metabolic traits such as lipoprotein and lipid levels, on blood-cell-related traits such as platelet count, and on disease traits such as coronary artery disease and type 2 diabetes.
Subject(s)
Biomarkers/analysis , Cardiovascular Diseases/genetics , Cytokines/genetics , Genetic Pleiotropy , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Adolescent , Adult , Aged , Blood Proteins/genetics , Blood Proteins/immunology , Cardiovascular Diseases/immunology , Cardiovascular Diseases/pathology , Child , Cytokines/immunology , Female , Follow-Up Studies , Gene Regulatory Networks , Genetic Predisposition to Disease , Genome, Human , Humans , Longitudinal Studies , Male , Middle Aged , Prognosis , Prospective Studies , Young Adult
ABSTRACT
BACKGROUND: The patatin-like phospholipase domain-containing 3 gene (PNPLA3)-148M variant is associated with liver steatosis, but its influence on the metabolism of triglyceride-rich lipoproteins remains unclear. Here, we investigated the kinetics of large, triglyceride-rich very-low-density lipoprotein (VLDL) particles (VLDL1) and of smaller VLDL2 particles in homozygotes for the PNPLA3-148M variant. METHODS AND RESULTS: The kinetics of apolipoprotein B100 (apoB100) and triglyceride in VLDL subfractions were analysed in nine subjects homozygous for PNPLA3-148M and nine subjects homozygous for PNPLA3-148I (controls). Liver fat was >3-fold higher in the 148M subjects. Production rates for apoB100 and triglyceride in VLDL1 did not differ significantly between the two groups. Likewise, production rates for VLDL2-apoB100 and VLDL2-triglyceride, and fractional clearance rates for both apoB100 and triglyceride in VLDL1 and VLDL2, were not significantly different. CONCLUSIONS: Despite the higher liver fat content in PNPLA3-148M homozygotes, there was no increase in VLDL production. Equally, VLDL production was maintained at normal levels despite the putative impairment in cytosolic lipid hydrolysis in these subjects.