ABSTRACT
Genotype-stratified variance of a quantitative trait could differ in the presence of gene-gene or gene-environment interactions. Genetic markers associated with phenotypic variance are thus considered promising candidates for follow-up interaction or joint location-scale analyses. However, as in studies of main effects, the X-chromosome is routinely excluded from "whole-genome" scans due to analytical challenges. Specifically, as males carry only one copy of the X-chromosome, the inherent sex-genotype dependency could bias the trait-genotype association, through sexual dimorphism in quantitative traits with sex-specific means or variances. Here we investigate phenotypic variance heterogeneity associated with X-chromosome single nucleotide polymorphisms (SNPs) and propose valid and powerful strategies. Among those, a generalized Levene's test has adequate power and remains robust to sexual dimorphism. An alternative approach is a sex-stratified analysis, but at the cost of slightly reduced power and modeling flexibility. We applied both methods to an Estonian study of gene expression quantitative trait loci (eQTL; n = 841), and two complex trait studies of height, hip, and waist circumferences, and body mass index from the Multi-Ethnic Study of Atherosclerosis (MESA; n = 2,073) and UK Biobank (UKB; n = 327,393). Consistent with previous eQTL findings on the mean, we found some but no conclusive evidence for cis regulators being enriched for variance association. SNP rs2681646 is associated with variance of waist circumference (p = 9.5E-07) at X-chromosome-wide significance in UKB, with a suggestive female-specific effect in MESA (p = 0.048). Collectively, an enrichment analysis using permuted UKB (p < 0.1) and MESA (p < 0.01) datasets suggests a possible polygenic structure for the variance of human height.
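As a rough illustration of the sex-stratified strategy mentioned above, the sketch below runs a classical median-centred Levene (Brown-Forsythe) test across genotype groups within each sex and then combines the two p-values. It is not the generalized Levene's test proposed in the study. The function names, the simulated data, and the use of Fisher's method to combine strata are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def levene_by_genotype(trait, genotype):
    """Median-centred Levene test of trait variance across genotype groups."""
    trait = np.asarray(trait, dtype=float)
    genotype = np.asarray(genotype)
    groups = [trait[genotype == g] for g in np.unique(genotype)]
    groups = [g for g in groups if len(g) >= 2]  # drop near-empty groups
    if len(groups) < 2:
        raise ValueError("need at least two genotype groups with >= 2 samples")
    return stats.levene(*groups, center="median")


def sex_stratified_levene(trait, genotype, sex):
    """Test each sex separately, then combine p-values (Fisher; an assumption)."""
    trait, genotype, sex = map(np.asarray, (trait, genotype, sex))
    pvals = {}
    for s in np.unique(sex):
        mask = sex == s
        pvals[str(s)] = levene_by_genotype(trait[mask], genotype[mask]).pvalue
    _, combined = stats.combine_pvalues(list(pvals.values()), method="fisher")
    return pvals, combined


# Simulated example: females carry 0/1/2 copies, males 0/1 copies (hypothetical coding).
rng = np.random.default_rng(1)
n = 2000
sex = np.repeat(["F", "M"], n)
genotype = np.concatenate([rng.integers(0, 3, n), rng.integers(0, 2, n)])
trait = rng.normal(0.0, 1.0 + genotype)  # trait SD grows with allele count
pvals, combined = sex_stratified_levene(trait, genotype, sex)
```

Stratifying by sex sidesteps the sex-genotype dependency because genotype groups are only ever compared within one sex, which is also why it cannot model a shared effect across sexes.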
Subject(s)
Chromosomes, Human, X/genetics , Genetic Heterogeneity , Multifactorial Inheritance/genetics , Quantitative Trait Loci/genetics , Computer Simulation , Female , Gene-Environment Interaction , Genome-Wide Association Study , Genotype , Humans , Male , Phenotype , Sex Characteristics , Waist Circumference
ABSTRACT
Complex traits can share a substantial proportion of their polygenic heritability. However, genome-wide polygenic correlations between pairs of traits can mask heterogeneity in their shared polygenic effects across loci. We propose a novel method (weighted maximum likelihood-regional polygenic correlation [RPC]) to evaluate polygenic correlation between two complex traits in small genomic regions using summary association statistics. Our method tests for evidence that the polygenic effect at a given region affects two traits concurrently. We show through simulations that our method is well calibrated, powerful, and more robust to misspecification of linkage disequilibrium than other methods under a polygenic model. As small genomic regions are more likely to harbor specific genetic effects, our method is ideal to identify heterogeneity in shared polygenic correlation across regions. We illustrate the usefulness of our method by addressing two questions related to cardiometabolic traits. First, we explored how RPC can inform on the strong epidemiological association between high-density lipoprotein cholesterol and coronary artery disease (CAD), suggesting a key role for triglycerides metabolism. Second, we investigated the potential role of PPARγ activators in the prevention of CAD. Our results provide a compelling argument that shared heritability between complex traits is highly heterogeneous across loci.
Subject(s)
Linkage Disequilibrium/genetics , Multifactorial Inheritance/genetics , Cholesterol, HDL/genetics , Computer Simulation , Coronary Artery Disease/drug therapy , Coronary Artery Disease/genetics , Genetic Loci , Genome, Human , Genome-Wide Association Study , Haplotypes/genetics , Humans , Models, Genetic , PPAR gamma/metabolism , Phenotype , Polymorphism, Single Nucleotide/genetics , Risk Factors , Thiazolidinediones/therapeutic use
ABSTRACT
BACKGROUND: Gas chromatography-olfactometry (GC-O) is the most frequently used method to estimate the sensory contribution of single odorants, but it disregards the interactions between volatiles. In order to select the key volatiles responsible for the aroma attributes of Congou black tea (Camellia sinensis), instrumental, sensory and multivariate statistical approaches were applied. RESULTS: Using sensory analysis, nine panellists developed eight descriptors: floral, sweet, fruity, green, roasted, oil, spicy, and off-odour. Linalool, (E)-furan linalool oxide, (Z)-pyran linalool oxide, methyl salicylate, β-myrcene, and phenylethyl alcohol, which were identified from the most representative samples by the GC-O procedure, were the essential aroma-active compounds in the formation of basic Congou black tea aroma. In addition, 136 volatiles were identified by gas chromatography-mass spectrometry (GC-MS), among which 55 compounds were determined as the key factors for six sensory attributes by partial least-square regression (PLSR) with variable importance of projection scores. CONCLUSION: Our results demonstrated that headspace solid-phase microextraction/GC-MS/GC-O was a fast approach for the isolation and quantification of aroma-active compounds. The PLSR method was also considered to be a useful tool in selecting important variables for sensory attributes. These two strategies, which allowed us to comprehensively evaluate the sensorial contribution of a single volatile from different perspectives, can be applied to related products for comprehensive quality control. © 2018 Society of Chemical Industry.
Subject(s)
Camellia sinensis/chemistry , Flavoring Agents/chemistry , Tea/chemistry , Volatile Organic Compounds/chemistry , Flavoring Agents/isolation & purification , Gas Chromatography-Mass Spectrometry , Humans , Least-Squares Analysis , Odorants/analysis , Olfactometry , Solid Phase Microextraction , Taste , Volatile Organic Compounds/isolation & purification
ABSTRACT
Increasing attention has focused on the significance of RNA in sperm, in light of its contribution to the birth and long-term health of a child, its role in sperm function, and its diagnostic potential. As the composition of sperm RNA is in flux, assigning specific roles to individual RNAs presents a significant challenge. For the first time, RNA-seq was used to characterize the population of coding and non-coding transcripts in human sperm. Examining RNA representation as a function of multiple methods of library preparation revealed unique features indicative of very specific and stage-dependent maturation and regulation of sperm RNA, illuminating their various transitional roles. Correlation of sperm transcript abundance with epigenetic marks suggested roles for these elements in the pre- and post-fertilization genome. Several classes of non-coding RNAs including lncRNAs, CARs, pri-miRNAs, novel elements and mRNAs have been identified which, based on factors including relative abundance, integrity in sperm, available knockout data of embryonic effect and presence or absence in the unfertilized human oocyte, are likely to be essential male factors critical to early post-fertilization development. The diverse and unique attributes of sperm transcripts that were revealed provide the first detailed analysis of the biology and anticipated clinical significance of spermatozoal RNAs.
Subject(s)
RNA/metabolism , Spermatozoa/metabolism , Epigenesis, Genetic , Fertilization/genetics , Humans , Male , MicroRNAs/metabolism , Polyadenylation , RNA/chemistry , RNA Isoforms/metabolism , RNA Precursors/metabolism , RNA Stability , RNA, Small Untranslated/metabolism , Sequence Analysis, RNA , Testis/metabolism
ABSTRACT
Activation of the major histocompatibility complex (MHC) by interferon-gamma (IFN-γ) is a fundamental step in the adaptive immune response to pathogens. Here, we show that reorganization of chromatin loop domains in the MHC is evident within the first 30 min of IFN-γ treatment of fibroblasts, and that further dynamic alterations occur up to 6 h. These very rapid changes occur at genomic sites which are occupied by CTCF and are close to IFN-γ-inducible MHC genes. Early responses to IFN-γ are thus initiated independently of CIITA, the master regulator of MHC class II genes, and prepare the MHC for subsequent induction of transcription.
Subject(s)
Interferon-gamma/pharmacology , Major Histocompatibility Complex , Repressor Proteins/metabolism , Binding Sites , CCCTC-Binding Factor , Cells, Cultured , Chromatin/chemistry , Chromatin/drug effects , Humans , Matrix Attachment Regions/drug effects , Transcription Factors/metabolism
ABSTRACT
It has been postulated that rare coding variants (RVs; MAF < 0.01) contribute to the "missing" heritability of complex traits. We developed a framework, the Rare variant heritability (RARity) estimator, to assess RV heritability (h²RV) without assuming a particular genetic architecture. We applied RARity to 31 complex traits in the UK Biobank (n = 167,348) and showed that gene-level RV aggregation suffers from a 79% (95% CI: 68-93%) loss of h²RV. Using unaggregated variants, 27 traits had h²RV > 5%, with height having the highest h²RV at 21.9% (95% CI: 19.0-24.8%). The total heritability, including common and rare variants, recovered pedigree-based estimates for 11 traits. RARity can estimate gene-level h²RV, enabling the assessment of gene-level characteristics and revealing 11 previously unreported gene-phenotype relationships. Finally, we demonstrated that in silico pathogenicity prediction (variant-level) and gene-level annotations do not generally enrich for RVs that over-contribute to complex trait variance, and thus, innovative methods are needed to predict RV functionality.
Subject(s)
Multifactorial Inheritance , Polymorphism, Single Nucleotide , Multifactorial Inheritance/genetics , Phenotype , Molecular Sequence Annotation , Genome-Wide Association Study , Models, Genetic
ABSTRACT
Post-fermented tea (PFT), a commonly consumed beverage worldwide, is characterized by the rapid growth of its microbial groups and the substantial changes they undergo. Consequently, PFT may contain mycotoxins such as B-type fumonisins (FBs). This study aimed to assess the intake of FBs through the consumption of PFT among consumers in Guangxi, China. A novel quantitative method using high-performance liquid chromatography-mass spectrometry was used to determine the FB concentration in PFT products. Additionally, a PFT consumption survey was conducted using a face-to-face questionnaire, recording respondents' body weight and PFT consumption patterns based on a three-day dietary recall method. Finally, a hazard index was calculated to estimate the health risk of FBs from the consumption of PFT products in Guangxi. The results revealed that the occurrence of FBs in PFT was 20% (24/120), with concentrations ranging from 2.14 to 18.28 µg/kg. The results of the survey showed that the average daily consumption of PFT by consumers was 9.19 ± 11.14 g. The deterministic risk assessment revealed that only 0.026% of the provisional maximum tolerable daily intake of FBs was consumed through PFT, indicating that FB contamination in PFT is not a public health risk.
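The deterministic calculation described above can be sketched as a two-step ratio: an estimated daily intake (EDI) per kilogram of body weight, divided by the tolerable daily intake. The example below is a minimal sketch, not the study's actual computation. The 60 kg body weight is a hypothetical value, the PMTDI of 2 µg/kg body weight per day is the commonly cited JECFA figure for fumonisins and is not stated in the abstract, and the code assumes all FBs in the dry tea reach the consumer. It therefore is not expected to reproduce the reported 0.026%, which depends on the survey data.

```python
def estimated_daily_intake(conc_ug_per_kg, tea_g_per_day, body_weight_kg):
    """EDI in micrograms of FB per kg body weight per day."""
    tea_kg_per_day = tea_g_per_day / 1000.0
    return conc_ug_per_kg * tea_kg_per_day / body_weight_kg


def hazard_index(edi_ug_per_kg_bw, pmtdi_ug_per_kg_bw=2.0):
    """Ratio of intake to the tolerable intake; values below 1 suggest low concern.

    The default PMTDI (2 ug/kg bw/day) is an assumption taken from JECFA,
    not a value given in the abstract.
    """
    return edi_ug_per_kg_bw / pmtdi_ug_per_kg_bw


# Highest reported concentration and mean consumption from the abstract;
# the 60 kg body weight is hypothetical.
edi = estimated_daily_intake(conc_ug_per_kg=18.28, tea_g_per_day=9.19, body_weight_kg=60.0)
hi = hazard_index(edi)
percent_of_pmtdi = 100.0 * hi
```

Even with the highest reported concentration, the intake under these assumptions stays far below the tolerable level, which is in line with the abstract's conclusion.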
ABSTRACT
Importance: Body mass index (BMI) is an easily obtained adiposity surrogate. However, there is variability in body composition and adipose tissue distribution between individuals with the same BMI, and there is controversy regarding the BMI associated with the lowest mortality risk. Objective: To evaluate which of BMI, fat mass index (FMI), and waist-to-hip ratio (WHR) has the strongest and most consistent association with mortality. Design, Setting, and Participants: This cohort study used incident deaths from the UK Biobank (UKB; 2006-2022), which includes data from 22 clinical assessment centers across the United Kingdom. UKB participants of British White ancestry (N = 387 672) were partitioned into a discovery cohort (n = 337 078) and validation cohort (n = 50 594), with the latter consisting of 25 297 deaths and 25 297 controls. The discovery cohort was used to derive genetically determined adiposity measures while the validation cohort was used for analyses. Exposure-outcome associations were analyzed through observational and mendelian randomization (MR) analyses. Exposures: BMI, FMI, and WHR. Main Outcomes and Measures: All-cause and cause-specific (cancer, cardiovascular disease [CVD], respiratory disease, or other causes) mortality. Results: There were 387 672 and 50 594 participants in our observational (mean [SD] age, 56.9 [8.0] years; 177 340 [45.9%] male, 210 332 [54.2%] female) and MR (mean [SD] age, 61.6 [6.2] years; 30 031 [59.3%] male, 20 563 [40.6%] female) analyses, respectively. Associations between measured BMI and FMI with all-cause mortality were J-shaped, whereas the association of WHR with all-cause mortality was linear using the hazard ratio (HR) scale (HR per SD increase of WHR, 1.41 [95% CI, 1.38-1.43]).
Genetically determined WHR had a stronger association with all-cause mortality than BMI (odds ratio [OR] per SD increase of WHR, 1.51 [95% CI, 1.32-1.72]; OR per SD increase of BMI, 1.29 [95% CI, 1.20-1.38]; P for heterogeneity = .02). This association was stronger in male than female participants (OR, 1.89 [95% CI, 1.54-2.32]; P for heterogeneity = .01). Unlike BMI or FMI, the genetically determined WHR-all-cause mortality association was consistent irrespective of observed BMI. Conclusions and Relevance: In this cohort study, WHR had the strongest and most consistent association with mortality irrespective of BMI. Clinical recommendations should consider focusing on adiposity distribution compared with mass.
Subject(s)
Adiposity , Obesity , Humans , Female , Male , Middle Aged , Cohort Studies , Obesity/epidemiology , Body Fat Distribution , Biomarkers
ABSTRACT
Identification of gene-by-environment interactions (GxE) is crucial to understand the interplay of environmental effects on complex traits. However, current methods evaluating GxE on biobank-scale datasets have limitations. We introduce MonsterLM, a multiple linear regression method that does not rely on model specification and provides unbiased estimates of variance explained by GxE. We demonstrate robustness of MonsterLM through comprehensive genome-wide simulations using real genetic data from 325,989 individuals. We estimate GxE using waist-to-hip-ratio, smoking, and exercise as the environmental variables on 13 outcomes (N = 297,529-325,989) in the UK Biobank. GxE variance is significant for 8 environment-outcome pairs, ranging from 0.009 to 0.071. The majority of GxE variance involves SNPs without strong marginal or interaction associations. We observe modest improvements in polygenic score prediction when incorporating GxE. Our results imply a significant contribution of GxE to complex trait variance and we show MonsterLM to be well-purposed to handle this with biobank-scale data.
Subject(s)
Biological Specimen Banks , Gene-Environment Interaction , Humans , Climate , Exercise , Linear Models
ABSTRACT
GABA depolarizes and excites central neurons during early development, becoming inhibitory and hyperpolarizing with maturation. This "developmental shift" occurs abruptly, reflecting a decrease in intracellular Cl(-) concentration ([Cl(-)](i)) and a hyperpolarizing shift in Cl(-) equilibrium potential due to upregulation of the K(+)-Cl(-) cotransporter KCC2b, a neuron-specific Cl(-) extruder. In contrast, primary afferent neurons (PANs) are depolarized by GABA throughout adulthood because of expression of NKCC1, a Na(+)-K(+)-2Cl(-) cotransporter that accumulates Cl(-) above equilibrium. The GABA(A)-mediated depolarization of PANs determines presynaptic inhibition in the spinal cord, a key mechanism gating somatosensory information. Little is known about developmental changes in Cl(-) transporter expression and Cl(-) homeostasis in PANs. Whether NKCC1 is expressed in PANs of all phenotypes or is restricted to subpopulations (e.g., nociceptors) is debatable. Likewise, whether PANs express KCC2s is controversial. We investigated NKCC1 and K(+)-Cl(-) cotransporter expression in rat and mouse dorsal root ganglion (DRG) neurons with molecular methods. Using fluorescence imaging microscopy, we measured [Cl(-)](i) in acutely dissociated rat DRG neurons (P0-P21) loaded with N-(ethoxycarbonylmethyl)-6-methoxyquinolinium bromide and classified with phenotypic markers. DRG neurons of all sizes express two NKCC1 mRNAs, one full-length and a shorter splice variant lacking exon 21. Immunolabeling with validated antibodies revealed ubiquitous expression of NKCC1 in DRG neurons irrespective of postnatal age and phenotype. As maturation progresses [Cl(-)](i) decreases gradually, persisting above equilibrium in >95% mature neurons. DRG neurons express mRNAs for KCC1, KCC3s, and KCC4, but not for KCC2s. Mechanisms underlying PANs' developmental changes in Cl(-) homeostasis are discussed and compared with those of central neurons.
Subject(s)
Ganglia, Spinal/growth & development , Sodium-Potassium-Chloride Symporters/physiology , Symporters/physiology , Animals , Animals, Newborn , Chlorides/analysis , Exons , Ganglia, Spinal/cytology , Ganglia, Spinal/drug effects , Male , Mice , Mice, Inbred C57BL , Quinolinium Compounds/pharmacology , Rats , Rats, Sprague-Dawley , Sodium-Potassium-Chloride Symporters/biosynthesis , Sodium-Potassium-Chloride Symporters/genetics , Solute Carrier Family 12, Member 2 , Symporters/biosynthesis , K Cl- Cotransporters
ABSTRACT
BACKGROUND: Atherosclerotic cardiovascular diseases (CVDs) are leading causes of death despite effective therapies and result in unnecessary morbidity and mortality throughout the world. We aimed to investigate the cost-effectiveness of polygenic risk scores (PRS) to guide statin therapy for Canadians with intermediate CVD risk and model its economic outlook. METHODS: This cost-utility analysis was conducted using UK Biobank prospective cohort study participants, with recruitment from 2006 to 2010, and at least 10 years of follow-up. We included nonrelated white British-descent participants (n=96 116) at intermediate CVD risk with no prior lipid lowering medication or statin-indicated conditions. A coronary artery disease PRS was used to inform decision to use statins. The effects of statin therapy with and without PRS, as well as CVD events were modelled to determine the incremental cost-effectiveness ratio from a Canadian public health care perspective. We discounted future costs and quality-adjusted life-years by 1.5% annually. RESULTS: The optimal economic strategy was when intermediate risk individuals with a PRS in the top 70% are eligible for statins while the lowest 1% are excluded. Base-case analysis at a genotyping cost of $70 produced an incremental cost-effectiveness ratio of $172 906 (143 685 USD) per quality-adjusted life-year. In the probabilistic sensitivity analysis, the intervention has approximately a 50% probability of being cost-effective at $179 100 (148 749 USD) per quality-adjusted life-year. At a $0 genotyping cost, representing individuals with existing genotyping information, PRS-guided strategies dominated standard care when 12% of the lowest PRS individuals were withheld from statins. With improved PRS predictive performance and lower genotyping costs, the incremental cost-effectiveness ratio demonstrates possible cost-effectiveness under thresholds of $150 000 and possibly $50 000 per quality-adjusted life-year. 
CONCLUSIONS: This study suggests that using PRS alongside existing guidelines might be cost-effective for CVD. Stronger predictiveness combined with decreased cost of PRS could further improve cost-effectiveness, providing an economic basis for its inclusion into clinical care.
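The incremental cost-effectiveness ratio (ICER) used in this analysis is the difference in discounted costs divided by the difference in discounted quality-adjusted life-years (QALYs) between two strategies. The sketch below illustrates that arithmetic with the 1.5% annual discount rate stated in the abstract. The function names, the yearly input streams, and the convention of leaving year 0 undiscounted are assumptions for illustration, not details taken from the study's model.

```python
def discounted_total(values_per_year, rate=0.015):
    """Sum a yearly stream, discounting year t by (1 + rate) ** t (year 0 undiscounted)."""
    return sum(v / (1.0 + rate) ** t for t, v in enumerate(values_per_year))


def icer(costs_new, qalys_new, costs_std, qalys_std, rate=0.015):
    """Incremental cost per QALY gained of the new strategy versus standard care."""
    d_cost = discounted_total(costs_new, rate) - discounted_total(costs_std, rate)
    d_qaly = discounted_total(qalys_new, rate) - discounted_total(qalys_std, rate)
    if d_qaly == 0:
        raise ZeroDivisionError("strategies yield identical discounted QALYs")
    return d_cost / d_qaly


# Hypothetical two-year streams per person (not study data):
# a one-off genotyping cost in year 0 and a small QALY gain in year 1.
example = icer(
    costs_new=[170.0, 100.0], qalys_new=[0.90, 0.901],
    costs_std=[100.0, 100.0], qalys_std=[0.90, 0.900],
)
```

A strategy "dominates" when it is both cheaper and yields more QALYs, in which case the ratio is negative and is usually reported as dominance instead of as a number.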
Subject(s)
Cardiovascular Diseases , Hydroxymethylglutaryl-CoA Reductase Inhibitors , Humans , Cost-Benefit Analysis , Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use , Cardiovascular Diseases/drug therapy , Cardiovascular Diseases/genetics , Cardiovascular Diseases/prevention & control , Prospective Studies , Canada , Risk Factors , Lipids
ABSTRACT
Gestational diabetes mellitus (GDM) affects 1 in 7 births and is associated with numerous adverse health outcomes for both mother and child. GDM is suspected to share a large common genetic background with type 2 diabetes (T2D). The aim of our study was to characterize different GDM polygenic risk scores (PRSs) and test their association with GDM using data from the South Asian Birth Cohort (START). PRSs were derived for 832 South Asian women from START using the pruning and thresholding (P + T), LDpred, and GraBLD methods. Weights were derived from a multi-ethnic and a white Caucasian study of the DIAGRAM consortium. GDM status was defined using South Asian-specific glucose values in response to an oral glucose tolerance test. Association with GDM was tested using logistic regression. Results were replicated in South Asian women from the UK Biobank (UKB) study. The top ranking P + T, LDpred and GraBLD PRSs were all based on DIAGRAM's multi-ethnic study. The best PRS was highly associated with GDM in START (AUC = 0.62, OR = 1.60 [95% CI = 1.44-1.69]), and in South Asian women from UKB (AUC = 0.65, OR = 1.69 [95% CI = 1.28-2.24]). Our results highlight the importance of combining genome-wide genotypes and summary statistics from large multi-ethnic studies to optimize PRSs in South Asians.
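A polygenic risk score of the thresholding kind is, at its core, a weighted sum of allele dosages over the SNPs that pass a p-value cut-off, and its discrimination can be summarized by the AUC. The sketch below shows only that core. The LD pruning step of P + T is omitted, the function names and the toy data are hypothetical, and the AUC is computed with the rank-based (Mann-Whitney) definition, not with the study's logistic regression.

```python
import numpy as np


def prs_threshold(dosages, betas, pvalues, p_cut=5e-4):
    """Weighted allele-count score using only SNPs with p-value below p_cut."""
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    keep = np.asarray(pvalues, dtype=float) < p_cut
    return dosages[:, keep] @ betas[keep]


def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


# Toy data: 4 individuals x 3 SNPs (rows are individuals, entries are allele dosages).
dosages = [[0, 1, 2], [2, 1, 0], [1, 1, 1], [2, 2, 2]]
betas = [0.2, 0.1, -0.3]        # per-allele weights from an external study
pvalues = [1e-6, 0.5, 1e-5]     # the second SNP fails the threshold
labels = [0, 1, 0, 1]           # 1 = case, 0 = control
scores = prs_threshold(dosages, betas, pvalues)
toy_auc = auc(scores, labels)
```

In this toy set the score ranks three of the four case-control pairs correctly. Real applications would tune `p_cut` and the pruning parameters on separate data, which is where the choice of source study for the weights matters.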
Subject(s)
Diabetes, Gestational/epidemiology , Genome-Wide Association Study/methods , Risk Assessment/methods , Adult , Asia/epidemiology , Asian People/statistics & numerical data , Diabetes Mellitus, Type 2/epidemiology , Diabetes Mellitus, Type 2/genetics , Diabetes, Gestational/genetics , Ethnicity/genetics , Ethnicity/statistics & numerical data , Female , Genetic Predisposition to Disease/genetics , Glucose Tolerance Test , Humans , Multifactorial Inheritance/genetics , Pregnancy , Prognosis , Risk Factors , White People/statistics & numerical data
ABSTRACT
To explore the relationship between the moisture content of withered tea leaves and their physical properties (i.e., elasticity, plasticity, flexibility, and texture) during withering, a texture analyzer was employed to test the elasticity and flexibility of withered tea leaves with different moisture contents. The texture was evaluated by computer vision technology. The withered tea leaves with different moisture contents were used to process congou black tea, which was then subjected to sensory evaluation. Results showed that good elasticity, optimal flexibility, and plasticity were achieved when the moisture content of the withered tea leaves of Fudingdabai (two leaves and one bud) ranged from 65.51% to 61.48%. The sensory evaluation of congou black tea revealed that moderate withering was better than long-term withering and that both moderate and long-term withering were better than no withering during processing. The moisture content was significantly correlated with the flexibility and plasticity of the withered tea leaves. When fresh tea leaves that had undergone moderate withering to a moisture content of 65.51-61.48% were used to process congou black tea, good tea shape and liquor color were achieved. This study provided new evidence that the moisture content of withered tea leaves significantly affected the quality of black tea.
Subject(s)
Food Handling/methods , Plant Leaves/chemistry , Taste , Tea/chemistry , China , Elasticity , Food Analysis , Plant Leaves/anatomy & histology , Pliability , Pressure
ABSTRACT
Microarray technology has great potential for improving our understanding of biological processes, medical conditions, and diseases. Often, microarray datasets are collected using different microarray platforms (provided by different companies) under different conditions in different laboratories. The cross-platform and cross-laboratory concordance of the microarray technology needs to be evaluated before it can be successfully and reliably applied in biological/clinical practice. New measures and techniques are proposed for comparing and evaluating the quality of microarray datasets generated from different platforms/laboratories. These measures and techniques are based on the following philosophy: the practical usefulness of the microarray technology may be confirmed if discriminating genes and classifiers, which are the focus of most, if not all, comparative investigations, discovered/trained from data collected in one lab/platform combination can be transferred to another lab/platform combination. The rationale is that the nondiscriminating genes might not be as strongly regulated as the discriminating genes by the biological process of the tissue cells under study, and hence they may behave more randomly than the discriminating genes. Our experimental results on microarray datasets generated from different platforms/laboratories using the reference mRNA samples in the Microarray Quality Control (MAQC) project showed that DNA microarrays can produce highly repeatable data in a cross-platform, cross-lab manner when one focuses on the discriminating genes and classifiers. In our comparative study, we compare samples of one type against samples of another type; the methodology can be applied to situations where one compares one arbitrary class of data against another.
Other findings include: (1) using three discriminating-gene/classifier-based methods to test the concordance between microarray datasets gave consistent results; (2) when noisy (nondiscriminating) genes were removed, the microarray datasets from different laboratories using a common platform were found to be highly concordant, and the data generated using most of the commercial platforms studied here were also found to be concordant with each other; (3) several series of artificial datasets with a known degree of difference were created to establish a bridge between consistency rate and P-value, allowing us to estimate the P-value if the consistency rate between two datasets is known.
Subject(s)
Algorithms , Database Management Systems , Databases, Protein , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Pattern Recognition, Automated/methods
ABSTRACT
OBJECTIVE: Observations of a metabolically unhealthy normal weight phenotype suggest that a lack of favorable adiposity contributes to an increased risk of type 2 diabetes. We aimed to identify causal blood biomarkers linking favorable adiposity with type 2 diabetes risk for use in cardiometabolic risk assessments. RESEARCH DESIGN AND METHODS: A weighted polygenic risk score (PRS) underpinning metabolically favorable adiposity was validated in the UK Biobank (n = 341,872) and the Outcome Reduction With Initial Glargine Intervention (ORIGIN Trial) (n = 8,197) and tested for association with 238 blood biomarkers. Associated biomarkers were investigated for causation with type 2 diabetes risk using Mendelian randomization and for its performance in predictive models for incident major adverse cardiovascular events (MACE). RESULTS: Of the 238 biomarkers tested, only insulin-like growth factor-binding protein (IGFBP)-3 concentration was associated with the PRS, where a 1 unit increase in PRS predicted a 0.28-SD decrease in IGFBP-3 blood levels (P < 0.05/238). Higher IGFBP-3 levels causally increased type 2 diabetes risk (odds ratio 1.26 per 1 SD genetically determined IGFBP-3 level [95% CI 1.11-1.43]) and predicted a higher incidence of MACE (hazard ratio 1.13 per 1 SD IGFBP-3 concentration [95% CI 1.07-1.20]). Adding IGFBP-3 concentrations to the standard clinical assessment of metabolic health enhanced the prediction of incident MACE, with a net reclassification improvement of 11.5% in normal weight individuals (P = 0.004). CONCLUSIONS: We identified IGFBP-3 as a novel biomarker linking a lack of favorable adiposity with type 2 diabetes risk and a predictive marker for incident cardiovascular events. Using IGFBP-3 blood concentrations may improve the risk assessment of cardiometabolic diseases.
Subject(s)
Adiposity/genetics , Cardiovascular Diseases/genetics , Diabetes Mellitus, Type 2/genetics , Insulin-Like Growth Factor Binding Protein 3/blood , Obesity, Metabolically Benign/blood , Biomarkers/blood , Cardiovascular Diseases/epidemiology , Diabetes Mellitus, Type 2/epidemiology , Female , Humans , Incidence , Male , Mendelian Randomization Analysis , Middle Aged , Obesity, Metabolically Benign/genetics , Odds Ratio , Phenotype , Proportional Hazards Models , Risk Assessment , Risk Factors , United Kingdom/epidemiology
ABSTRACT
Machine-learning techniques have helped solve a broad range of prediction problems, yet are not widely used to build polygenic risk scores for the prediction of complex traits. We propose a novel heuristic based on machine-learning techniques (GraBLD) to boost the predictive performance of polygenic risk scores. Gradient boosted regression trees were first used to optimize the weights of SNPs included in the score, followed by a novel regional adjustment for linkage disequilibrium. A calibration set with sample size of ~200 individuals was sufficient for optimal performance. GraBLD yielded prediction R² of 0.239 and 0.082 using GIANT summary association statistics for height and BMI in the UK Biobank study (N = 130 K; 1.98 M SNPs), explaining 46.9% and 32.7% of the overall polygenic variance, respectively. For diabetes status, the area under the receiver operating characteristic curve was 0.602 in the UK Biobank study using summary-level association statistics from the DIAGRAM consortium. GraBLD outperformed other polygenic score heuristics for the prediction of height (p < 2.2 × 10⁻¹⁶) and BMI (p < 1.57 × 10⁻⁴), and was equivalent to LDpred for diabetes. Results were independently validated in the Health and Retirement Study (N = 8,292; 688,398 SNPs). Our report demonstrates the use of machine-learning techniques, coupled with summary-level data from large genome-wide meta-analyses to improve the prediction of polygenic traits.
ABSTRACT
Despite considerable efforts, known genetic associations only explain a small fraction of predicted heritability. Regional associations combine information from multiple contiguous genetic variants and can improve variance explained at established association loci. However, regional associations are not easily amenable to estimation using summary association statistics because of sensitivity to linkage disequilibrium (LD). We now propose a novel method, LD Adjusted Regional Genetic Variance (LARGV), to estimate phenotypic variance explained by regional associations using summary statistics while accounting for LD. Our method is asymptotically equivalent to a multiple linear regression model when no interaction or haplotype effects are present. It has several applications, such as ranking of genetic regions according to variance explained or comparison of variance explained by two or more regions. Using height and BMI data from the Health and Retirement Study (N = 7,776), we show that most genetic variance lies in a small proportion of the genome and that previously identified linkage peaks have higher than expected regional variance.
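The stated equivalence with multiple linear regression can be illustrated with a standard identity: for standardized genotypes and phenotype, the regression R² of a region equals r' R⁻¹ r, where r holds the marginal SNP-trait correlations and R is the LD (SNP-SNP correlation) matrix. The sketch below checks this identity on simulated data. It is not the LARGV estimator itself, and the function name, the simulated "SNPs", and the use of in-sample LD are all assumptions for illustration.

```python
import numpy as np


def regional_r2(marginal_r, ld):
    """Variance explained by a region from marginal correlations and an LD matrix."""
    marginal_r = np.asarray(marginal_r, dtype=float)
    # Solve ld @ x = marginal_r instead of forming the inverse explicitly.
    return float(marginal_r @ np.linalg.solve(ld, marginal_r))


# Simulate four correlated predictors standing in for SNPs in one region.
rng = np.random.default_rng(0)
n, m = 5000, 4
shared = rng.normal(size=(n, 1))
X = 0.6 * shared + rng.normal(size=(n, m))
y = X @ np.array([0.1, 0.0, -0.05, 0.2]) + rng.normal(size=n)

# Standardize (mean 0, variance 1) so cross-products are correlations.
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = (y - y.mean()) / y.std()

r = X.T @ y / n          # summary statistics: marginal correlations
R = X.T @ X / n          # LD matrix estimated from the same sample
r2_summary = regional_r2(r, R)

# Reference value: R-squared from an ordinary least-squares fit on individual data.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r2_ols = 1.0 - np.sum((y - X @ beta) ** 2) / np.sum(y ** 2)
```

Here the two quantities agree to numerical precision because the LD matrix comes from the same individuals. With LD taken from a reference panel, or with many SNPs relative to sample size, additional adjustment is needed, which is the problem the abstract describes.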
Subject(s)
Genetic Association Studies/methods , Genetic Models , Statistical Models , Multifactorial Inheritance , Algorithms , Genetic Linkage , Genetic Variation , Genome-Wide Association Study/methods , Genotype , Humans , Linkage Disequilibrium , Phenotype , Single Nucleotide Polymorphism
ABSTRACT
MOTIVATION: It is commonly believed that suitable analysis of microarray gene expression profile data can lead to a better understanding of diseases and to better ways of diagnosing and treating them. To achieve those goals, it is of interest to discover from such data the gene interaction networks, and perhaps even the pathways, underlying given diseases. In this paper, we consider methods for efficiently discovering highly differentiative gene groups (HDGGs), which may provide insights into gene interaction networks. HDGGs are groups of genes which completely or nearly completely characterize the diseased or normal tissues. Discovering HDGGs is challenging because of the high dimensionality of the data. RESULTS: Our methods are based on the novel concept of gene clubs. A gene club consists of a set of genes with high potential to interact with each other. The methods can (i) efficiently discover signature HDGGs which completely characterize the diseased and the normal tissues respectively, (ii) find the strongest or near-strongest HDGGs containing any given gene, and (iii) find much stronger HDGGs than previous methods. As part of the experimental evaluation, the methods are applied to datasets including colon, prostate, ovarian, and breast cancer, and leukemia. Some of the genes in the extracted signature HDGGs have known biological functions, and some have attracted little attention in biology and medicine. We hope that appropriate study of them can lead to medical breakthroughs. Some HDGGs for colon and prostate cancers are listed here. The website listed below contains HDGGs for the other cancers. AVAILABILITY: HDGG is implemented in C++ and runs on Unix or Windows platforms. The code is available at: http://www.cs.wright.edu/~gdong/hdgg/.
Subject(s)
Algorithms , Gene Expression Profiling/methods , Genetic Models , Neoplasm Proteins/metabolism , Neoplasms/metabolism , Oligonucleotide Array Sequence Analysis/methods , Protein Interaction Mapping/methods , Computer Simulation , Statistical Data Interpretation , Genetic Databases , Neoplastic Gene Expression Regulation/genetics , Humans , Statistical Models , Neoplasm Proteins/genetics , Neoplasms/genetics , Reproducibility of Results , Sample Size , Sensitivity and Specificity , Statistics as Topic
ABSTRACT
A significant challenge to the effective application of RNA-seq to the complete transcript analysis of low quantity and/or degraded samples is the amplification of minimal input RNA to enable sequencing library construction. Several strategies have been commercialized to facilitate this goal. However, each strategy has its own specific protocols and methodology, and each may introduce unique bias and in some cases show a specific preference for a collection of sequences. Our wider investigation of human spermatozoal RNAs was able to reveal their complexity even though they are generally characterized by low quantity and high fragmentation. In this study, four commercially available RNA-seq amplification and library protocols for the preparation of low quantity/highly fragmented samples were assessed using human sperm RNAs: SMARTer™ Ultra Low RNA (SU) for Illumina® Sequencing, SeqPlex RNA Amplification (SP), Ovation® RNA-Seq System V2 (OR), and Ovation® RNA-Seq Formalin Fixed Paraffin Embedded System (FFPES). Further investigation analyzed how the end results were affected by two different library preparation methods that appeared best suited to this type of RNA, Encore NGS Multiplex System I (Enc) and Ovation Ultralow Library Systems (UL), along with other potential confounding factors such as FFPE preservation. Our results indicate that for each library preparation protocol, the differences in the initial amount of input RNA and the choice of RNA purification step do not generate marked differences in terms of RNA profiling. However, substantial disparity is introduced by the individual amplification methods used prior to library construction. These significant differences may be caused by the different priming methods or amplification strategies used in each of the four protocols examined.
The observation of intra-sample variation introduced by the choice of protocol highlights the role that external factors play in planning and subsequent reliable interpretation of results of any RNA-seq experiment.