ABSTRACT
Polygenic risk scores (PRSs) have been among the leading advances in biomedicine in recent years. As a proxy of genetic liability, PRSs are utilised across multiple fields and applications. While numerous statistical and machine learning methods have been developed to optimise their predictive accuracy, these typically distil genetic liability to a single number based on aggregation of an individual's genome-wide risk alleles. This results in a key loss of information about an individual's genetic profile, which could be critical given the functional sub-structure of the genome and the heterogeneity of complex disease. In this manuscript, we introduce a 'pathway polygenic' paradigm of disease risk, in which multiple genetic liabilities underlie complex diseases, rather than a single genome-wide liability. We describe a method and accompanying software, PRSet, for computing and analysing pathway-based PRSs, in which polygenic scores are calculated across genomic pathways for each individual. We evaluate the potential of pathway PRSs in two distinct ways, creating two major sections: (1) In the first section, we benchmark PRSet as a pathway enrichment tool, evaluating its capacity to capture GWAS signal in pathways. We find that for target sample sizes of >10,000 individuals, pathway PRSs have similar power for evaluating pathway enrichment as leading methods MAGMA and LD score regression, with the distinct advantage of providing individual-level estimates of genetic liability for each pathway, opening up a range of pathway-based PRS applications; (2) In the second section, we evaluate the performance of pathway PRSs for disease stratification. We show that using a supervised disease stratification approach, pathway PRSs (computed by PRSet) outperform two standard genome-wide PRSs (computed by C+T and lassosum) for classifying disease subtypes in 20 of 21 scenarios tested.
As the definition and functional annotation of pathways becomes increasingly refined, we expect pathway PRSs to offer key insights into the heterogeneity of complex disease and treatment response, to generate biologically tractable therapeutic targets from polygenic signal, and, ultimately, to provide a powerful path to precision medicine.
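As a rough illustration of the idea described in this abstract, the sketch below computes one additive score per pathway instead of a single genome-wide score. It is not PRSet's implementation, which the abstract does not detail; the SNP identifiers, effect sizes, allele dosages, and pathway names are all hypothetical, and any SNP selection or linkage disequilibrium handling is omitted.

```python
from typing import Dict, List


def pathway_prs(
    dosages: Dict[str, float],
    weights: Dict[str, float],
    pathways: Dict[str, List[str]],
) -> Dict[str, float]:
    """Return one additive score per pathway for a single individual.

    Each score is the sum of (effect size * allele dosage) over the SNPs
    assigned to that pathway. SNPs lacking a weight or a dosage are skipped.
    """
    scores = {}
    for name, snps in pathways.items():
        scores[name] = sum(
            weights[snp] * dosages[snp]
            for snp in snps
            if snp in weights and snp in dosages
        )
    return scores


# Hypothetical data for one individual (dosage = copies of the effect allele).
dosages = {"rs1": 2, "rs2": 1, "rs3": 0, "rs4": 1}
weights = {"rs1": 0.10, "rs2": -0.05, "rs3": 0.20, "rs4": 0.30}
pathways = {"pathway_A": ["rs1", "rs2"], "pathway_B": ["rs3", "rs4"]}

scores = pathway_prs(dosages, weights, pathways)

# A conventional genome-wide score collapses the same SNPs into one number.
genome_wide = sum(weights[snp] * dosages[snp] for snp in dosages)

print(scores, genome_wide)
```

In this toy case the two pathway scores differ even though they add up to the genome-wide score, which is the kind of per-individual detail the abstract argues a single number loses.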
Subject(s)
Genomics, Multifactorial Inheritance, Humans, Risk Factors, Multifactorial Inheritance/genetics, Genome-Wide Association Study, Software, Genetic Predisposition to Disease
ABSTRACT
The low portability of polygenic scores (PGSs) across global populations is a major concern that must be addressed before PGSs can be used for everyone in the clinic. Indeed, prediction accuracy has been shown to decay as a function of the genetic distance between the training and test cohorts. However, such cohorts differ not only in their genetic distance but also in their geographical distance and their data collection and assaying, conflating multiple factors. In this study, we examine the extent to which PGSs are transferable between ancestries by deriving polygenic scores for 245 curated traits from the UK Biobank data and applying them in nine ancestry groups from the same cohort. By restricting both training and testing to the UK Biobank data, we reduce the risk of environmental and genotyping confounding from using different cohorts. We define the nine ancestry groups at a sub-continental level, based on a simple, robust, and effective method that we introduce here. We then apply two different predictive methods to derive polygenic scores for all 245 phenotypes and show a systematic and dramatic reduction in portability of PGSs trained using Northwestern European individuals and applied to nine ancestry groups. These analyses demonstrate that prediction already drops off within European ancestries and reduces globally in proportion to genetic distance. Altogether, our study provides unique and robust insights into the PGS portability problem.
Subject(s)
Genetic Association Studies/methods, Genetic Predisposition to Disease, Genetics, Population/methods, Multifactorial Inheritance, Algorithms, Alleles, Biological Specimen Banks, Genetic Variation, Genome-Wide Association Study, Genotype, Humans, Models, Genetic, Phenotype, Reproducibility of Results, United Kingdom
ABSTRACT
Neisseria meningitidis protects itself from complement-mediated killing by binding complement factor H (FH). Previous studies associated susceptibility to meningococcal disease (MD) with variation in CFH, but the causal variants and underlying mechanism remained unknown. Here we attempted to define the association more accurately by sequencing the CFH-CFHR locus and imputing missing genotypes in previously obtained GWAS datasets of MD-affected individuals of European ancestry and matched controls. We identified a CFHR3 SNP that provides protection from MD (rs75703017, p value = 1.1 × 10(-16)) by decreasing the concentration of FH in the blood (p value = 1.4 × 10(-11)). We subsequently used dual-luciferase studies and CRISPR gene editing to establish that deletion of rs75703017 increased FH expression in hepatocytes by preventing promoter inhibition. Our data suggest that reduced concentrations of FH in the blood confer protection from MD; with reduced access to FH, N. meningitidis is less able to shield itself from complement-mediated killing.
Subject(s)
Complement Factor H, Meningococcal Infections, Blood Proteins/genetics, Complement Factor H/genetics, Complement System Proteins/genetics, Genetic Predisposition to Disease, Genotype, Humans, Meningococcal Infections/genetics
ABSTRACT
BACKGROUND: Evidence is urgently needed to support treatment decisions for children with multisystem inflammatory syndrome (MIS-C) associated with severe acute respiratory syndrome coronavirus 2. METHODS: We performed an international observational cohort study of clinical and outcome data regarding suspected MIS-C that had been uploaded by physicians onto a Web-based database. We used inverse-probability weighting and generalized linear models to evaluate intravenous immune globulin (IVIG) as a reference, as compared with IVIG plus glucocorticoids and glucocorticoids alone. There were two primary outcomes: the first was a composite of inotropic support or mechanical ventilation by day 2 or later or death; the second was a reduction in disease severity on an ordinal scale by day 2. Secondary outcomes included treatment escalation and the time until a reduction in organ failure and inflammation. RESULTS: Data were available regarding the course of treatment for 614 children from 32 countries from June 2020 through February 2021; 490 met the World Health Organization criteria for MIS-C. Of the 614 children with suspected MIS-C, 246 received primary treatment with IVIG alone, 208 with IVIG plus glucocorticoids, and 99 with glucocorticoids alone; 22 children received other treatment combinations, including biologic agents, and 39 received no immunomodulatory therapy. Receipt of inotropic or ventilatory support or death occurred in 56 patients who received IVIG plus glucocorticoids (adjusted odds ratio for the comparison with IVIG alone, 0.77; 95% confidence interval [CI], 0.33 to 1.82) and in 17 patients who received glucocorticoids alone (adjusted odds ratio, 0.54; 95% CI, 0.22 to 1.33). The adjusted odds ratios for a reduction in disease severity were similar in the two groups, as compared with IVIG alone (0.90 for IVIG plus glucocorticoids and 0.93 for glucocorticoids alone). The time until a reduction in disease severity was similar in the three groups. 
CONCLUSIONS: We found no evidence that recovery from MIS-C differed after primary treatment with IVIG alone, IVIG plus glucocorticoids, or glucocorticoids alone, although significant differences may emerge as more data accrue. (Funded by the European Union's Horizon 2020 Program and others; BATS ISRCTN number, ISRCTN69546370.)
Subject(s)
COVID-19 Drug Treatment, Glucocorticoids/therapeutic use, Immunoglobulins, Intravenous/therapeutic use, Systemic Inflammatory Response Syndrome/drug therapy, Adolescent, Antibodies, Viral, COVID-19/immunology, COVID-19/mortality, COVID-19/therapy, Child, Child, Preschool, Cohort Studies, Confidence Intervals, Drug Therapy, Combination, Female, Hospitalization, Humans, Immunomodulation, Male, Propensity Score, Regression Analysis, Respiration, Artificial, SARS-CoV-2/immunology, Systemic Inflammatory Response Syndrome/immunology, Systemic Inflammatory Response Syndrome/mortality, Systemic Inflammatory Response Syndrome/therapy, Treatment Outcome
ABSTRACT
BACKGROUND: Kawasaki disease (KD) is a systemic vasculitis that mainly affects children under 5 years of age. Up to 30% of patients develop coronary artery abnormalities, which are reduced with early treatment. Timely diagnosis of KD is challenging but may become more straightforward with the recent discovery of a whole-blood host response classifier that discriminates KD patients from patients with other febrile conditions. Here, we bridged this microarray-based classifier to a clinically applicable quantitative reverse transcription-polymerase chain reaction (qRT-PCR) assay: the Kawasaki Disease Gene Expression Profiling (KiDs-GEP) classifier. METHODS: We designed and optimized a qRT-PCR assay and applied it to a subset of samples previously used for the classifier discovery to reweight the original classifier. RESULTS: The performance of the KiDs-GEP classifier was comparable to the original classifier with a cross-validated area under the ROC curve of 0.964 [95% CI: 0.924-1.00] vs 0.992 [95% CI: 0.978-1.00], respectively. Both classifiers demonstrated similar trends over various disease conditions, with the clearest distinction between individuals diagnosed with KD vs viral infections. CONCLUSION: We successfully bridged the microarray-based classifier into the KiDs-GEP classifier, a more rapid and more cost-efficient qRT-PCR assay, bringing a diagnostic test for KD closer to the hospital clinical laboratory. IMPACT: A diagnostic test is needed for Kawasaki disease and is currently not available. We describe the development of a One-Step multiplex qRT-PCR assay and the subsequent modification (i.e., bridging) of the microarray-based host response classifier previously described by Wright et al. The bridged KiDs-GEP classifier performs well in discriminating Kawasaki disease patients from febrile controls. This host response clinical test for Kawasaki disease can be adapted to the hospital clinical laboratory.
Subject(s)
Mucocutaneous Lymph Node Syndrome, Child, Humans, Child, Preschool, Mucocutaneous Lymph Node Syndrome/diagnosis, Mucocutaneous Lymph Node Syndrome/genetics, Reverse Transcriptase Polymerase Chain Reaction, Gene Expression Profiling, Fever, ROC Curve
ABSTRACT
MOTIVATION: Integration of different omics data could markedly help to identify biological signatures, understand the missing heritability of complex diseases and ultimately achieve personalized medicine. Standard regression models used in Genome-Wide Association Studies (GWAS) identify loci with a strong effect size, whereas GWAS meta-analyses are often needed to capture weak loci contributing to the missing heritability. Development of novel machine learning algorithms for merging genotype data with other omics data is highly needed as it could enhance the prioritization of weak loci. RESULTS: We developed cNMTF (corrected non-negative matrix tri-factorization), an integrative algorithm based on clustering techniques of biological data. This method assesses the inter-relatedness between genotypes, phenotypes, the damaging effect of the variants and gene networks in order to identify loci-trait associations. cNMTF was used to prioritize genes associated with lipid traits in two population cohorts. We replicated 129 genes reported in GWAS world-wide and provided evidence that supports 85% of our findings (226 out of 265 genes), including recent associations in literature (NLGN1), regulators of lipid metabolism (DAB1) and pleiotropic genes for lipid traits (CARM1). Moreover, cNMTF performed efficiently against strong population structures by accounting for the individuals' ancestry. As the method is flexible in the incorporation of diverse omics data sources, it can be easily adapted to the user's research needs. AVAILABILITY AND IMPLEMENTATION: An R package (cnmtf) is available at https://lgl15.github.io/cnmtf_web/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Genome-Wide Association Study, Machine Learning, Gene Regulatory Networks, Genotype, Humans, Phenotype, Polymorphism, Single Nucleotide
ABSTRACT
BACKGROUND: Detection of genomic inversions remains challenging. Many existing methods primarily target inversions with a non-repetitive breakpoint, leaving inverted repeat (IR) mediated non-allelic homologous recombination (NAHR) inversions largely unexplored. RESULT: We present npInv, a novel tool specifically for detecting and genotyping NAHR inversions using long read sub-alignment of long read sequencing data. We benchmark npInv with other tools in both simulation and real data. We use npInv to generate a whole-genome inversion map for NA12878 consisting of 30 NAHR inversions (of which 15 are novel), including all previously known NAHR mediated inversions in NA12878 with flanking IR less than 7 kb. Our genotyping accuracy on this dataset was 94%. We used PCR to confirm the presence of two of these novel inversions. We show that there is a near-linear relationship between the length of flanking IR and the minimum inversion size, without inverted repeats. CONCLUSION: The application of npInv shows high accuracy in both simulation and real data. The results give deeper insight into understanding inversion.
Subject(s)
Chromosome Inversion/genetics, Genotype, Humans
ABSTRACT
The phenotypic effect of some single nucleotide polymorphisms (SNPs) depends on their parental origin. We present a novel approach to detect parent-of-origin effects (POEs) in genome-wide genotype data of unrelated individuals. The method exploits increased phenotypic variance in the heterozygous genotype group relative to the homozygous groups. We applied the method to >56,000 unrelated individuals to search for POEs influencing body mass index (BMI). Six lead SNPs were carried forward for replication in five family-based studies (of ~4,000 trios). Two SNPs replicated: the paternal rs2471083-C allele (located near the imprinted KCNK9 gene) and the paternal rs3091869-T allele (located near the SLC2A10 gene) increased BMI equally (beta = 0.11 (SD), P < 0.0027) compared to the respective maternal alleles. Real-time PCR experiments of lymphoblastoid cell lines from the CEPH families showed that expression of both genes was dependent on parental origin of the SNP alleles (P < 0.01). Our scheme opens new opportunities to exploit GWAS data of unrelated individuals to identify POEs and demonstrates that they play an important role in adult obesity.
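The variance signal this abstract relies on can be sketched as follows. If an allele's effect depends on which parent transmitted it, heterozygotes are a mixture of paternal and maternal carriers with different means, so their phenotypic variance is inflated relative to homozygotes. The code below only computes a descriptive variance ratio on hypothetical data; it is not the authors' test statistic, which the abstract does not specify, and the function name and example values are assumptions.

```python
from statistics import mean, variance
from typing import Sequence


def het_to_hom_variance_ratio(
    genotypes: Sequence[int], phenotypes: Sequence[float]
) -> float:
    """Ratio of heterozygote variance to pooled within-group homozygote variance.

    Genotypes are coded 0, 1, 2 (copies of one allele). Each homozygous group
    is centred on its own mean, so an ordinary additive effect does not
    inflate the denominator. Needs at least two heterozygotes and at least
    one homozygous group with two or more individuals.
    """
    groups = {0: [], 1: [], 2: []}
    for g, y in zip(genotypes, phenotypes):
        groups[g].append(y)

    het_var = variance(groups[1])

    sum_sq, dof = 0.0, 0
    for g in (0, 2):
        ys = groups[g]
        if len(ys) > 1:
            m = mean(ys)
            sum_sq += sum((y - m) ** 2 for y in ys)
            dof += len(ys) - 1
    return het_var / (sum_sq / dof)


# Hypothetical phenotypes: heterozygotes fall into two clusters, as expected
# if the paternal and maternal copies of the allele act differently.
genotypes = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
phenotypes = [1.0, 1.2, 0.8, 0.5, 2.5, 0.6, 2.4, 2.0, 2.2, 1.8]

ratio = het_to_hom_variance_ratio(genotypes, phenotypes)
print(ratio)
```

A ratio well above 1 would flag a SNP for follow-up; a real analysis would need a formal test and adjustment for covariates.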
Subject(s)
Glucose Transport Proteins, Facilitative/genetics, Obesity/genetics, Polymorphism, Single Nucleotide/genetics, Potassium Channels, Tandem Pore Domain/genetics, Adult, Body Mass Index, Female, Gene Expression Regulation, Genetic Predisposition to Disease, Genome-Wide Association Study, Genomic Imprinting, Genotype, Humans, Male, Obesity/pathology, White People/genetics
ABSTRACT
We present the analysis of a prospective multicentre study to investigate genetic effects on the prognosis of newly treated epilepsy. Patients with a new clinical diagnosis of epilepsy requiring medication were recruited and followed up prospectively. The clinical outcome was defined as freedom from seizures for a minimum of 12 months in accordance with the consensus statement from the International League Against Epilepsy (ILAE). Genetic effects on remission of seizures after starting treatment were analysed with and without adjustment for significant clinical prognostic factors, and the results from each cohort were combined using a fixed-effects meta-analysis. After quality control (QC), we analysed 889 newly treated epilepsy patients using 472 450 genotyped and 6.9 × 10(6) imputed single-nucleotide polymorphisms. Suggestive evidence for association (defined as Pmeta < 5.0 × 10(-7)) with remission of seizures after starting treatment was observed at three loci: 6p12.2 (rs492146, Pmeta = 2.1 × 10(-7), OR[G] = 0.57), 9p23 (rs72700966, Pmeta = 3.1 × 10(-7), OR[C] = 2.70) and 15q13.2 (rs143536437, Pmeta = 3.2 × 10(-7), OR[C] = 1.92). Genes of biological interest at these loci include PTPRD and ARHGAP11B (encoding functions implicated in neuronal development) and GSTA4 (a phase II biotransformation enzyme). Pathway analysis using two independent methods implicated a number of pathways in the prognosis of epilepsy, including KEGG categories 'calcium signaling pathway' and 'phosphatidylinositol signaling pathway'. Through a series of power curves, we conclude that it is unlikely any single common variant explains >4.4% of the variation in the outcome of newly treated epilepsy.
Subject(s)
Epilepsy/diagnosis, Epilepsy/genetics, Genome-Wide Association Study, Adult, Anticonvulsants/therapeutic use, Calcium Signaling/genetics, Chromosomes, Human, Pair 15, Chromosomes, Human, Pair 6, Chromosomes, Human, Pair 9, Epilepsy/drug therapy, Female, Genetic Predisposition to Disease, Genetic Variation, Humans, Male, Middle Aged, Phosphatidylinositols/genetics, Polymorphism, Single Nucleotide, Prognosis, Prospective Studies, Treatment Outcome, Young Adult
ABSTRACT
IMPORTANCE: Because clinical features do not reliably distinguish bacterial from viral infection, many children worldwide receive unnecessary antibiotic treatment, while bacterial infection is missed in others. OBJECTIVE: To identify a blood RNA expression signature that distinguishes bacterial from viral infection in febrile children. DESIGN, SETTING, AND PARTICIPANTS: Febrile children presenting to participating hospitals in the United Kingdom, Spain, the Netherlands, and the United States between 2009-2013 were prospectively recruited, comprising a discovery group and validation group. Each group was classified after microbiological investigation as having definite bacterial infection, definite viral infection, or indeterminate infection. RNA expression signatures distinguishing definite bacterial from viral infection were identified in the discovery group and diagnostic performance assessed in the validation group. Additional validation was undertaken in separate studies of children with meningococcal disease (n = 24) and inflammatory diseases (n = 48) and on published gene expression datasets. EXPOSURES: A 2-transcript RNA expression signature distinguishing bacterial infection from viral infection was evaluated against clinical and microbiological diagnosis. MAIN OUTCOMES AND MEASURES: Definite bacterial and viral infection was confirmed by culture or molecular detection of the pathogens. Performance of the RNA signature was evaluated in the definite bacterial and viral group and in the indeterminate infection group. RESULTS: The discovery group of 240 children (median age, 19 months; 62% male) included 52 with definite bacterial infection, of whom 36 (69%) required intensive care, and 92 with definite viral infection, of whom 32 (35%) required intensive care. Ninety-six children had indeterminate infection. Analysis of RNA expression data identified a 38-transcript signature distinguishing bacterial from viral infection. 
A smaller (2-transcript) signature (FAM89A and IFI44L) was identified by removing highly correlated transcripts. When this 2-transcript signature was implemented as a disease risk score in the validation group (130 children, with 23 definite bacterial, 28 definite viral, and 79 indeterminate infections; median age, 17 months; 57% male), all 23 patients with microbiologically confirmed definite bacterial infection were classified as bacterial (sensitivity, 100% [95% CI, 100%-100%]) and 27 of 28 patients with definite viral infection were classified as viral (specificity, 96.4% [95% CI, 89.3%-100%]). When applied to additional validation datasets from patients with meningococcal and inflammatory diseases, bacterial infection was identified with a sensitivity of 91.7% (95% CI, 79.2%-100%) and 90.0% (95% CI, 70.0%-100%), respectively, and with specificity of 96.0% (95% CI, 88.0%-100%) and 95.8% (95% CI, 89.6%-100%). Of the children in the indeterminate groups, 46.3% (63/136) were classified as having bacterial infection, although 94.9% (129/136) received antibiotic treatment. CONCLUSIONS AND RELEVANCE: This study provides preliminary data regarding test accuracy of a 2-transcript host RNA signature discriminating bacterial from viral infection in febrile children. Further studies are needed in diverse groups of patients to assess accuracy and clinical utility of this test in different clinical settings.
Subject(s)
Antigens/blood, Bacterial Infections/diagnosis, Cytoskeletal Proteins/blood, Fever/microbiology, Fever/virology, RNA/blood, Virus Diseases/diagnosis, Anti-Bacterial Agents/administration & dosage, Antigens/genetics, Area Under Curve, Bacterial Infections/complications, Bacterial Infections/genetics, Biomarkers/blood, Child, Preschool, Coinfection/diagnosis, Coinfection/microbiology, Coinfection/virology, Cytoskeletal Proteins/genetics, Diagnosis, Differential, Female, Fever/blood, Gene Expression Profiling, Genetic Markers, Humans, Infant, Logistic Models, Male, Prospective Studies, RNA/analysis, RNA/genetics, Risk, Sensitivity and Specificity, Severity of Illness Index, Virus Diseases/complications, Virus Diseases/genetics
ABSTRACT
Twin and family studies indicate that the timing of primary tooth eruption is highly heritable, with estimates typically exceeding 80%. To identify variants involved in primary tooth eruption, we performed a population-based genome-wide association study of 'age at first tooth' and 'number of teeth' using 5998 and 6609 individuals, respectively, from the Avon Longitudinal Study of Parents and Children (ALSPAC) and 5403 individuals from the 1966 Northern Finland Birth Cohort (NFBC1966). We tested 2 446 724 SNPs imputed in both studies. Analyses were controlled for the effect of gestational age, sex and age of measurement. Results from the two studies were combined using fixed effects inverse variance meta-analysis. We identified a total of 15 independent loci, with 10 loci reaching genome-wide significance (P < 5 × 10(-8)) for 'age at first tooth' and 11 loci for 'number of teeth'. Together, these associations explain 6.06% of the variation in 'age at first tooth' and 4.76% of the variation in 'number of teeth'. The identified loci included eight previously unidentified loci, some containing genes known to play a role in tooth and other developmental pathways, including an SNP in the protein-coding region of BMP4 (rs17563, P = 9.080 × 10(-17)). Three of these loci, containing the genes HMGA2, AJUBA and ADK, also showed evidence of association with craniofacial distances, particularly those indexing facial width. Our results suggest that the genome-wide association approach is a powerful strategy for detecting variants involved in tooth eruption, and potentially craniofacial growth and more generally organ development.
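The fixed-effects inverse-variance meta-analysis mentioned in this abstract follows a standard formula: each study's effect estimate is weighted by the reciprocal of its squared standard error. The sketch below applies that formula to one SNP across two studies; the effect sizes and standard errors are made up for illustration and are not taken from ALSPAC or NFBC1966.

```python
import math
from typing import Sequence, Tuple


def fixed_effects_meta(
    betas: Sequence[float], ses: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Combine per-study effect estimates with inverse-variance weights.

    Returns (pooled beta, pooled standard error, z score, two-sided p value
    under a normal approximation).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Hypothetical per-study estimates for a single SNP.
beta, se, z, p = fixed_effects_meta(betas=[0.10, 0.20], ses=[0.05, 0.05])
print(beta, se, z, p)
```

With equal standard errors the pooled estimate is the plain average of the two study estimates, and the pooled standard error shrinks by a factor of the square root of two.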
Subject(s)
Body Height/genetics, Face/anatomy & histology, Genetic Loci, Tooth Eruption/genetics, Chromosomes, Human, Dentition, Female, Finland, Genetic Pleiotropy, Genome-Wide Association Study, Humans, Longitudinal Studies, Polymorphism, Single Nucleotide
ABSTRACT
There are many known examples of multiple semi-independent associations at individual loci; such associations might arise either because of true allelic heterogeneity or because of imperfect tagging of an unobserved causal variant. This phenomenon is of great importance in monogenic traits but has not yet been systematically investigated and quantified in complex-trait genome-wide association studies (GWASs). Here, we describe a multi-SNP association method that estimates the effect of loci harboring multiple association signals by using GWAS summary statistics. Applying the method to a large anthropometric GWAS meta-analysis (from the Genetic Investigation of Anthropometric Traits consortium study), we show that for height, body mass index (BMI), and waist-to-hip ratio (WHR), 3%, 2%, and 1%, respectively, of additional phenotypic variance can be explained on top of the previously reported 10% (height), 1.5% (BMI), and 1% (WHR). The method also permitted a substantial increase (by up to 50%) in the number of loci that replicate in a discovery-validation design. Specifically, we identified 74 loci at which the multi-SNP, a linear combination of SNPs, explains significantly more variance than does the best individual SNP. A detailed analysis of multi-SNPs shows that most of the additional variability explained is derived from SNPs that are not in linkage disequilibrium with the lead SNP, suggesting a major contribution of allelic heterogeneity to the missing heritability.
Subject(s)
Genome-Wide Association Study, Polymorphism, Single Nucleotide, Quantitative Trait Loci, Body Mass Index, Humans, Lipids/blood, Lipids/genetics, Phenotype, Waist-Hip Ratio
ABSTRACT
Here we present BridgePRS, a novel Bayesian polygenic risk score (PRS) method that leverages shared genetic effects across ancestries to increase PRS portability. We evaluate BridgePRS via simulations and real UK Biobank data across 19 traits in individuals of African, South Asian and East Asian ancestry, using both UK Biobank and Biobank Japan genome-wide association study summary statistics; out-of-cohort validation is performed in the Mount Sinai (New York) BioMe biobank. BridgePRS is compared with the leading alternative, PRS-CSx, and two other PRS methods. Simulations suggest that the performance of BridgePRS relative to PRS-CSx increases as uncertainty increases: with lower trait heritability, higher polygenicity and greater between-population genetic diversity; and when causal variants are not present in the data. In real data, BridgePRS has a 61% larger average R(2) than PRS-CSx in out-of-cohort prediction of African ancestry samples in BioMe (P = 6 × 10(-5)). BridgePRS is a computationally efficient, user-friendly and powerful approach for PRS analyses in non-European ancestries.
Subject(s)
Genetic Predisposition to Disease, Genetic Risk Score, Humans, Risk Factors, Genome-Wide Association Study, Bayes Theorem, Polymorphism, Single Nucleotide/genetics, Multifactorial Inheritance/genetics
ABSTRACT
Polygenic risk scores (PRSs) have improved in predictive performance, but several challenges remain to be addressed before PRSs can be implemented in the clinic, including reduced predictive performance of PRSs in diverse populations, and the interpretation and communication of genetic results to both providers and patients. To address these challenges, the National Human Genome Research Institute-funded Electronic Medical Records and Genomics (eMERGE) Network has developed a framework and pipeline for return of a PRS-based genome-informed risk assessment to 25,000 diverse adults and children as part of a clinical study. From an initial list of 23 conditions, ten were selected for implementation based on PRS performance, medical actionability and potential clinical utility, including cardiometabolic diseases and cancer. Standardized metrics were considered in the selection process, with additional consideration given to strength of evidence in African and Hispanic populations. We then developed a pipeline for clinical PRS implementation (score transfer to a clinical laboratory, validation and verification of score performance), and used genetic ancestry to calibrate PRS mean and variance, utilizing genetically diverse data from 13,475 participants of the All of Us Research Program cohort to train and test model parameters. Finally, we created a framework for regulatory compliance and developed a PRS clinical report for return to providers and for inclusion in an additional genome-informed risk assessment. The initial experience from eMERGE can inform the approach needed to implement PRS-based testing in diverse clinical settings.
Subject(s)
Chronic Disease, Genetic Risk Score, Population Health, Adult, Child, Humans, Communication, Genetic Predisposition to Disease, Genome-Wide Association Study, Risk Factors, United States
ABSTRACT
We construct data exploration tools for recognizing important covariate patterns associated with a phenotype, with particular focus on searching for association with gene-gene patterns. To this end, we propose a new variable selection procedure that employs latent selection weights and compare it to an alternative formulation. The selection procedures are implemented in tandem with a Dirichlet process mixture model for the flexible clustering of genetic and epidemiological profiles. We illustrate our approach with the aid of simulated data and the analysis of a real data set from a genome-wide association study.
Subject(s)
Bayes Theorem, Genetic Association Studies/methods, Models, Genetic, Cluster Analysis, Computer Simulation, Genetic Predisposition to Disease, Genome-Wide Association Study, Humans, Lung Neoplasms/genetics, Models, Statistical, Phenotype, Polymorphism, Single Nucleotide
ABSTRACT
Rheumatoid arthritis (RA) is the commonest chronic, systemic, inflammatory disorder, affecting ~1% of the world population. It has a strong genetic component and a growing number of associated genes have been discovered in genome-wide association studies (GWAS), which nevertheless only account for 23% of the total genetic risk. We aimed to identify additional susceptibility loci through the analysis of GWAS in the context of biological function. We bridge the gap between pathway and gene-oriented analyses of GWAS, by introducing a pathway-driven gene stability-selection methodology that identifies potential causal genes in the top-associated disease pathways that may be driving the pathway association signals. We analysed the WTCCC and the NARAC studies of ~5000 and ~2000 subjects, respectively. We examined 700 pathways comprising ~8000 genes. Ranking pathways by significance revealed that the NARAC top-ranked ~6% lay within the top 10% of WTCCC. Gene selection on those pathways identified 58 genes in WTCCC and 61 in NARAC; 21 of those were common (P(overlap) < 10(-21)), of which 16 were novel discoveries. Among the identified genes, we validated 10 known RA associations in WTCCC and 13 in NARAC, not discovered using single-SNP approaches on the same data. Gene ontology functional enrichment analysis on the identified genes showed significant over-representation of signalling activity (P < 10(-29)) in both studies. Our findings suggest a novel model of RA genetic predisposition, which involves cell-membrane receptors and genes in second messenger signalling systems, in addition to genes that regulate immune responses, which have been the focus of interest previously.
Subject(s)
Arthritis, Rheumatoid/genetics, Genetic Predisposition to Disease, Genome-Wide Association Study, Humans, Polymorphism, Single Nucleotide, Signal Transduction/genetics, Signal Transduction/physiology
ABSTRACT
Tooth development is a highly heritable process which relates to other growth and developmental processes, and which interacts with the development of the entire craniofacial complex. Abnormalities of tooth development are common, with tooth agenesis being the most common developmental anomaly in humans. We performed a genome-wide association study of time to first tooth eruption and number of teeth at one year in 4,564 individuals from the 1966 Northern Finland Birth Cohort (NFBC1966) and 1,518 individuals from the Avon Longitudinal Study of Parents and Children (ALSPAC). We identified 5 loci at P < 5 × 10^-8, and 5 with suggestive association (P < 5 × 10^-6). The loci included several genes with links to tooth and other organ development (KCNJ2, EDA, HOXB2, RAD51L1, IGF2BP1, HMGA2, MSRB3). Genes at four of the identified loci are implicated in the development of cancer. A variant within the HOXB gene cluster was associated with occlusion defects requiring orthodontic treatment by age 31 years.
Subject(s)
Genetic Loci/genetics, Genome-Wide Association Study, Deciduous Tooth/growth & development, Alleles, England, Female, Finland, Genotype, Humans, Infant, Linkage Disequilibrium/genetics, Longitudinal Studies, Male, Meta-Analysis as Topic, Parturition, Single Nucleotide Polymorphism/genetics, Tooth Eruption/genetics
ABSTRACT
OBJECTIVES: We aimed to extend the Natural and Orthogonal Interaction (NOIA) framework, developed for modeling gene-gene interactions in the analysis of quantitative traits, to allow for reduced genetic models, dichotomous traits, and gene-environment interactions. We evaluate the performance of the NOIA statistical models using simulated data and lung cancer data. METHODS: The NOIA statistical models are developed for additive, dominant, and recessive genetic models as well as for a binary environmental exposure. Using the Kronecker product rule, a NOIA statistical model is built to model gene-environment interactions. By treating the genotypic values as the logarithm of odds, the NOIA statistical models are extended to the analysis of case-control data. RESULTS: Our simulations showed that, in most of the scenarios we simulated, power for testing associations while allowing for interaction is much higher with the NOIA statistical model than with functional models. When applied to lung cancer data, the NOIA statistical model gave much smaller p values for the main effects or the SNP-smoking interactions of some of the SNPs tested. CONCLUSION: The NOIA statistical models are usually more powerful than the functional models in detecting main effects and interaction effects for both quantitative traits and binary traits.
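The Kronecker product construction mentioned in METHODS can be illustrated with a small sketch. The abstract does not give the codings, so the additive and dominance columns below are an assumption based on the usual NOIA statistical parameterisation for a single biallelic locus, and the centred coding for the binary exposure is likewise assumed. The genotype and exposure frequencies are made-up illustrative values, and all function names are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def genotype_design(p11, p12, p22):
    """Assumed NOIA-style design for genotypes aa, Aa, AA with
    frequencies p11, p12, p22. Columns: mean, additive, dominance."""
    mean_count = p12 + 2 * p22                  # mean number of A alleles
    v = p11 + p22 - (p11 - p22) ** 2
    additive = [0 - mean_count, 1 - mean_count, 2 - mean_count]
    dominance = [-2 * p12 * p22 / v, 4 * p11 * p22 / v, -2 * p11 * p12 / v]
    return [[1.0, additive[i], dominance[i]] for i in range(3)]


def exposure_design(q):
    """Assumed centred design for a binary exposure with exposed
    frequency q. Rows: unexposed, exposed. Columns: mean, exposure."""
    return [[1.0, -q], [1.0, 1 - q]]


def weighted_gram(design, weights):
    """Frequency-weighted cross-products of the design columns."""
    n_cols = len(design[0])
    return [[sum(w * row[i] * row[j] for w, row in zip(weights, design))
             for j in range(n_cols)] for i in range(n_cols)]


# Illustrative frequencies only (not taken from the lung cancer data).
geno_freq = [0.5, 0.3, 0.2]
exposed_freq = 0.3

S_g = genotype_design(*geno_freq)
S_e = exposure_design(exposed_freq)

# Rows: (exposure, genotype) classes, exposure varying slowest.
# Columns: mean, a, d, e, a x e, d x e.
S_ge = kron(S_e, S_g)

# Joint class frequencies, assuming genotype and exposure are independent.
joint_freq = [pe * pg
              for pe in (1 - exposed_freq, exposed_freq)
              for pg in geno_freq]

gram = weighted_gram(S_ge, joint_freq)
for row in gram:
    print(["%.4f" % value for value in row])
```

When genotype and exposure are independent, the off-diagonal entries of `gram` vanish, which is the orthogonality property that distinguishes the statistical formulation from a functional one. The reduced (additive, dominant, recessive) models and the log-odds extension described in the abstract are not covered by this sketch.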
Subject(s)
Early Detection of Cancer/methods, Gene-Environment Interaction, Logistic Models, Lung Neoplasms/genetics, Case-Control Studies, Computer Simulation, Factual Databases, Gene Frequency, Genetic Loci, Genetic Predisposition to Disease, Population Genetics/methods, Genome-Wide Association Study, Humans, Lung Neoplasms/diagnosis, Genetic Models, Single Nucleotide Polymorphism, Heritable Quantitative Trait, Smoking/adverse effects
ABSTRACT
Polygenic Risk Scores (PRS) have huge potential to contribute to biomedical research and to a future of precision medicine, but to date their calculation relies largely on European-ancestry GWAS data. This global bias makes most PRS substantially less accurate in individuals of non-European ancestry. Here we present BridgePRS, a novel Bayesian PRS method that leverages shared genetic effects across ancestries to increase the accuracy of PRS in non-European populations. The performance of BridgePRS is evaluated in simulated data and real UK Biobank (UKB) data across 19 traits in African, South Asian and East Asian ancestry individuals, using both UKB and Biobank Japan GWAS summary statistics. BridgePRS is compared to the leading alternative, PRS-CSx, and two single-ancestry PRS methods adapted for trans-ancestry prediction. PRS trained in the UK Biobank are then validated out-of-cohort in the independent Mount Sinai (New York) BioMe Biobank. Simulations reveal that BridgePRS performance, relative to PRS-CSx, increases as uncertainty increases: with lower heritability, higher polygenicity, greater between-population genetic diversity, and when causal variants are not present in the data. Our simulation results are consistent with real data analyses in which BridgePRS has better predictive accuracy in African ancestry samples, especially in out-of-cohort prediction (into BioMe), which shows a 60% boost in mean R^2 compared to PRS-CSx (P = 2 × 10^-6). BridgePRS performs the full PRS analysis pipeline, is computationally efficient, and is a powerful method for deriving PRS in diverse and under-represented ancestry populations.