ABSTRACT
Host-mediated lung inflammation is present [1], and drives mortality [2], in the critical illness caused by coronavirus disease 2019 (COVID-19). Host genetic variants associated with critical illness may identify mechanistic targets for therapeutic development [3]. Here we report the results of the GenOMICC (Genetics Of Mortality In Critical Care) genome-wide association study in 2,244 critically ill patients with COVID-19 from 208 UK intensive care units. We have identified and replicated the following new genome-wide significant associations: on chromosome 12q24.13 (rs10735079, P = 1.65 × 10^-8) in a gene cluster that encodes antiviral restriction enzyme activators (OAS1, OAS2 and OAS3); on chromosome 19p13.2 (rs74956615, P = 2.3 × 10^-8) near the gene that encodes tyrosine kinase 2 (TYK2); on chromosome 19p13.3 (rs2109069, P = 3.98 × 10^-12) within the gene that encodes dipeptidyl peptidase 9 (DPP9); and on chromosome 21q22.1 (rs2236757, P = 4.99 × 10^-8) in the interferon receptor gene IFNAR2. We identified potential targets for repurposing of licensed medications: using Mendelian randomization, we found evidence that low expression of IFNAR2, or high expression of TYK2, are associated with life-threatening disease; and transcriptome-wide association in lung tissue revealed that high expression of the monocyte-macrophage chemotactic receptor CCR2 is associated with severe COVID-19. Our results identify robust genetic signals relating to key host antiviral defence mechanisms and mediators of inflammatory organ damage in COVID-19. Both mechanisms may be amenable to targeted treatment with existing drugs. However, large-scale randomized clinical trials will be essential before any change to clinical practice.
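As a small illustration of what "genome-wide significant" means for the four lead variants above, the sketch below compares each reported P value with the conventional cutoff of 5 × 10^-8. That cutoff is an assumption made for illustration; the abstract does not state which threshold the study applied.

```python
# Conventional genome-wide significance threshold (an assumption here;
# the abstract does not state the cutoff that was used).
GENOME_WIDE_ALPHA = 5e-8

# Lead variants and P values exactly as reported in the abstract.
reported_hits = {
    "rs10735079": 1.65e-8,   # 12q24.13, OAS1/OAS2/OAS3 cluster
    "rs74956615": 2.3e-8,    # 19p13.2, near TYK2
    "rs2109069": 3.98e-12,   # 19p13.3, within DPP9
    "rs2236757": 4.99e-8,    # 21q22.1, IFNAR2
}


def passes_threshold(p_value, alpha=GENOME_WIDE_ALPHA):
    """Return True when a P value is strictly below the threshold."""
    return p_value < alpha


# Map each variant to whether it clears the assumed threshold.
significant = {rsid: passes_threshold(p) for rsid, p in reported_hits.items()}

for rsid, is_significant in significant.items():
    print(rsid, reported_hits[rsid], is_significant)
```

Under this assumed cutoff the IFNAR2 signal (P = 4.99 × 10^-8) sits only just below the line, which is one reason replication matters for borderline associations.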
Subject(s)
COVID-19/genetics , COVID-19/physiopathology , Critical Illness , 2',5'-Oligoadenylate Synthetase/genetics , COVID-19/pathology , Chromosomes, Human, Pair 12/genetics , Chromosomes, Human, Pair 19/genetics , Chromosomes, Human, Pair 21/genetics , Critical Care , Dipeptidyl-Peptidases and Tripeptidyl-Peptidases/genetics , Drug Repositioning , Female , Genome-Wide Association Study , Humans , Inflammation/genetics , Inflammation/pathology , Inflammation/physiopathology , Lung/pathology , Lung/physiopathology , Lung/virology , Male , Multigene Family/genetics , Receptor, Interferon alpha-beta/genetics , Receptors, CCR2/genetics , TYK2 Kinase/genetics , United Kingdom
ABSTRACT
BACKGROUND: The U.K. 100,000 Genomes Project is in the process of investigating the role of genome sequencing in patients with undiagnosed rare diseases after usual care and the alignment of this research with health care implementation in the U.K. National Health Service. Other parts of this project focus on patients with cancer and infection. METHODS: We conducted a pilot study involving 4660 participants from 2183 families, among whom 161 disorders covering a broad spectrum of rare diseases were present. We collected data on clinical features with the use of Human Phenotype Ontology terms, undertook genome sequencing, applied automated variant prioritization on the basis of applied virtual gene panels and phenotypes, and identified novel pathogenic variants through research analysis. RESULTS: Diagnostic yields varied among family structures and were highest in family trios (both parents and a proband) and families with larger pedigrees. Diagnostic yields were much higher for disorders likely to have a monogenic cause (35%) than for disorders likely to have a complex cause (11%). Diagnostic yields for intellectual disability, hearing disorders, and vision disorders ranged from 40 to 55%. We made genetic diagnoses in 25% of the probands. A total of 14% of the diagnoses were made by means of the combination of research and automated approaches, which was critical for cases in which we found etiologic noncoding, structural, and mitochondrial genome variants and coding variants poorly covered by exome sequencing. Cohortwide burden testing across 57,000 genomes enabled the discovery of three new disease genes and 19 new associations. Of the genetic diagnoses that we made, 25% had immediate ramifications for clinical decision making for the patients or their relatives. CONCLUSIONS: Our pilot study of genome sequencing in a national health care system showed an increase in diagnostic yield across a range of rare diseases. 
(Funded by the National Institute for Health Research and others.)
Subject(s)
Genome, Human , Rare Diseases/genetics , Adolescent , Adult , Child , Child, Preschool , Family Characteristics , Female , Genetic Variation , Humans , Male , Middle Aged , Pilot Projects , Polymerase Chain Reaction , Rare Diseases/diagnosis , Sensitivity and Specificity , State Medicine , United Kingdom , Whole Genome Sequencing , Young Adult
ABSTRACT
The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of the heritability of this disease. Here, to test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome sequencing in 12,940 individuals from five ancestry groups. To increase statistical power, we expanded the sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support the idea that lower-frequency variants have a major role in predisposition to type 2 diabetes.
Subject(s)
Diabetes Mellitus, Type 2/genetics , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Alleles , DNA Mutational Analysis , Europe/ethnology , Exome , Genome-Wide Association Study , Genotyping Techniques , Humans , Sample Size
ABSTRACT
BACKGROUND & AIMS: Anti-tumor necrosis factor (anti-TNF) therapies are the most widely used biologic drugs for treating immune-mediated diseases, but repeated administration can induce the formation of anti-drug antibodies. The ability to identify patients at increased risk for development of anti-drug antibodies would facilitate selection of therapy and use of preventative strategies. METHODS: We performed a genome-wide association study to identify variants associated with time to development of anti-drug antibodies in a discovery cohort of 1240 biologic-naïve patients with Crohn's disease starting infliximab or adalimumab therapy. Immunogenicity was defined as an anti-drug antibody titer ≥10 AU/mL using a drug-tolerant enzyme-linked immunosorbent assay. Significant association signals were confirmed in a replication cohort of 178 patients with inflammatory bowel disease. RESULTS: The HLA-DQA1*05 allele, carried by approximately 40% of Europeans, significantly increased the rate of immunogenicity (hazard ratio [HR], 1.90; 95% confidence interval [CI], 1.60-2.25; P = 5.88 × 10^-13). The highest rates of immunogenicity, 92% at 1 year, were observed in patients treated with infliximab monotherapy who carried HLA-DQA1*05; conversely the lowest rates of immunogenicity, 10% at 1 year, were observed in patients treated with adalimumab combination therapy who did not carry HLA-DQA1*05. We confirmed this finding in the replication cohort (HR, 2.00; 95% CI, 1.35-2.98; P = 6.60 × 10^-4). This association was consistent for patients treated with adalimumab (HR, 1.89; 95% CI, 1.32-2.70) or infliximab (HR, 1.92; 95% CI, 1.57-2.33), and for patients treated with anti-TNF therapy alone (HR, 1.75; 95% CI, 1.37-2.22) or in combination with an immunomodulator (HR, 2.01; 95% CI, 1.57-2.58). CONCLUSIONS: In an observational study, we found a genome-wide significant association between HLA-DQA1*05 and the development of antibodies against anti-TNF agents. A randomized controlled biomarker trial is required to determine whether pretreatment testing for HLA-DQA1*05 improves patient outcomes by helping physicians select anti-TNF and combination therapies. ClinicalTrials.gov ID: NCT03088449.
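The hazard ratios above are reported with 95% confidence intervals, and a reader can roughly relate the two. The sketch below is not the authors' analysis. It assumes the interval is symmetric on the log scale (a Wald interval) and recovers an approximate standard error and z statistic from the rounded figures, so the resulting P value will only approximate the one reported. The function names are hypothetical.

```python
import math


def z_from_hazard_ratio(hr, ci_lower, ci_upper, z_crit=1.96):
    """Approximate the log-scale standard error and z statistic.

    Assumes a Wald interval: log(HR) +/- z_crit * SE.
    """
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z_crit)
    z = math.log(hr) / se
    return se, z


def two_sided_p(z):
    """Two-sided P value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Discovery-cohort figures quoted in the abstract: HR 1.90 (95% CI 1.60-2.25).
se, z = z_from_hazard_ratio(1.90, 1.60, 2.25)
print(f"SE(log HR) ~ {se:.3f}, z ~ {z:.2f}, P ~ {two_sided_p(z):.1e}")
```

Because the inputs are rounded to two decimals, small differences from the published P value are expected and do not indicate an inconsistency.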
Subject(s)
Adalimumab/immunology , Crohn Disease/therapy , HLA-DQ alpha-Chains/genetics , Infliximab/immunology , Tumor Necrosis Factor-alpha/antagonists & inhibitors , Adalimumab/therapeutic use , Adult , Alleles , Crohn Disease/blood , Female , Genome-Wide Association Study , Heterozygote , Humans , Infliximab/therapeutic use , Male , Middle Aged , Patient Selection , Tumor Necrosis Factor-alpha/immunology , Young Adult
ABSTRACT
Genome and exome sequencing in large cohorts enables characterization of the role of rare variation in complex diseases. Success in this endeavor, however, requires investigators to test a diverse array of genetic hypotheses which differ in the number, frequency and effect sizes of underlying causal variants. In this study, we evaluated the power of gene-based association methods to interrogate such hypotheses, and examined the implications for study design. We developed a flexible simulation approach, using 1000 Genomes data, to (a) generate sequence variation at human genes in up to 10K case-control samples, and (b) quantify the statistical power of a panel of widely used gene-based association tests under a variety of allelic architectures, locus effect sizes, and significance thresholds. For loci explaining ~1% of phenotypic variance underlying a common dichotomous trait, we find that all methods have low absolute power to achieve exome-wide significance (~5-20% power at α = 2.5 × 10^-6) in 3K individuals; even in 10K samples, power is modest (~60%). The combined application of multiple methods increases sensitivity, but does so at the expense of a higher false positive rate. MiST, SKAT-O, and KBAC have the highest individual mean power across simulated datasets, but we observe wide architecture-dependent variability in the individual loci detected by each test, suggesting that inferences about disease architecture from analysis of sequencing studies can differ depending on which methods are used. Our results imply that tens of thousands of individuals, extensive functional annotation, or highly targeted hypothesis testing will be required to confidently detect or exclude rare variant signals at complex disease loci.
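The exome-wide threshold of α = 2.5 × 10^-6 quoted above matches a Bonferroni correction of a 0.05 family-wise error rate across roughly 20,000 gene-based tests. The gene count is an assumption made here for illustration; the abstract gives only the resulting threshold. A minimal sketch of the arithmetic:

```python
def bonferroni_threshold(family_wise_alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return family_wise_alpha / n_tests


# Assumed number of gene-based tests (not stated in the abstract).
N_GENES = 20_000

exome_wide_alpha = bonferroni_threshold(0.05, N_GENES)
print(f"exome-wide alpha ~ {exome_wide_alpha:.2e}")
```

The same function gives the familiar genome-wide level if one assumes about one million independent common-variant tests instead.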
Subject(s)
Genetic Diseases, Inborn , Genetic Variation , Genome-Wide Association Study , Models, Theoretical , Alleles , Computer Simulation , Diabetes Mellitus, Type 2/genetics , Exome/genetics , Genetic Predisposition to Disease , Humans , Linkage Disequilibrium , Phenotype
ABSTRACT
We have investigated the evidence for positive selection in samples of African, European, and East Asian ancestry at 65 loci associated with susceptibility to type 2 diabetes (T2D) previously identified through genome-wide association studies. Selection early in human evolutionary history is predicted to lead to ancestral risk alleles shared between populations, whereas late selection would result in population-specific signals at derived risk alleles. By using a wide variety of tests based on the site frequency spectrum, haplotype structure, and population differentiation, we found no global signal of enrichment for positive selection when we considered all T2D risk loci collectively. However, in a locus-by-locus analysis, we found nominal evidence for positive selection at 14 of the loci. Selection favored the protective and risk alleles in similar proportions, rather than the risk alleles specifically as predicted by the thrifty gene hypothesis, and may not be related to influence on diabetes. Overall, we conclude that past positive selection has not been a powerful influence driving the prevalence of T2D risk alleles.
Subject(s)
Diabetes Mellitus, Type 2/epidemiology , Diabetes Mellitus, Type 2/genetics , Genetic Loci , Genetic Predisposition to Disease , Selection, Genetic , Alleles , Asian People/genetics , Black People/genetics , Gene Frequency , Genome-Wide Association Study , Haplotypes , Humans , Polymorphism, Single Nucleotide , Risk Factors , White People/genetics
ABSTRACT
Multiple sclerosis is a common disease of the central nervous system in which the interplay between inflammatory and neurodegenerative processes typically results in intermittent neurological disturbance followed by progressive accumulation of disability. Epidemiological studies have shown that genetic factors are primarily responsible for the substantially increased frequency of the disease seen in the relatives of affected individuals, and systematic attempts to identify linkage in multiplex families have confirmed that variation within the major histocompatibility complex (MHC) exerts the greatest individual effect on risk. Modestly powered genome-wide association studies (GWAS) have enabled more than 20 additional risk loci to be identified and have shown that multiple variants exerting modest individual effects have a key role in disease susceptibility. Most of the genetic architecture underlying susceptibility to the disease remains to be defined and is anticipated to require the analysis of sample sizes that are beyond the numbers currently available to individual research groups. In a collaborative GWAS involving 9,772 cases of European descent collected by 23 research groups working in 15 different countries, we have replicated almost all of the previously suggested associations and identified at least a further 29 novel susceptibility loci. Within the MHC we have refined the identity of the HLA-DRB1 risk alleles and confirmed that variation in the HLA-A gene underlies the independent protective effect attributable to the class I region. Immunologically relevant genes are significantly overrepresented among those mapping close to the identified loci and particularly implicate T-helper-cell differentiation in the pathogenesis of multiple sclerosis.
Subject(s)
Genetic Predisposition to Disease/genetics , Immunity, Cellular/immunology , Multiple Sclerosis/genetics , Multiple Sclerosis/immunology , Alleles , Cell Differentiation/immunology , Europe/ethnology , Genome, Human/genetics , Genome-Wide Association Study , HLA-A Antigens/genetics , HLA-DR Antigens/genetics , HLA-DRB1 Chains , Humans , Immunity, Cellular/genetics , Major Histocompatibility Complex/genetics , Polymorphism, Single Nucleotide/genetics , Sample Size , T-Lymphocytes, Helper-Inducer/cytology , T-Lymphocytes, Helper-Inducer/immunology
ABSTRACT
MOTIVATION: In sequencing studies of common diseases and quantitative traits, power to test rare and low frequency variants individually is weak. To improve power, a common approach is to combine statistical evidence from several genetic variants in a region. Major challenges are how to do the combining and which statistical framework to use. General approaches for testing association between rare variants and quantitative traits include aggregating genotypes and trait values, referred to as 'collapsing', or using a score-based variance component test. However, little attention has been paid to alternative models tailored for protein truncating variants. Recent studies have highlighted the important role that protein truncating variants, commonly referred to as 'loss of function' variants, may have on disease susceptibility and quantitative levels of biomarkers. We propose a Bayesian modelling framework for the analysis of protein truncating variants and quantitative traits. RESULTS: Our simulation results show that our models have an advantage over the commonly used methods. We apply our models to sequence and exome-array data and discover strong evidence of association between low plasma triglyceride levels and protein truncating variants at APOC3 (Apolipoprotein C3). AVAILABILITY: Software is available from http://www.well.ox.ac.uk/~rivas/mamba
Subject(s)
Mutation , Quantitative Trait Loci , Apolipoprotein C-III/genetics , Bayes Theorem , Diabetes Mellitus, Type 2/blood , Diabetes Mellitus, Type 2/genetics , Exome , Genome, Human , Genotype , Humans , Internet , Models, Genetic , Phenotype , Software Design , Triglycerides/blood
ABSTRACT
Statistical imputation of classical HLA alleles in case-control studies has become established as a valuable tool for identifying and fine-mapping signals of disease association in the MHC. Imputation into diverse populations has, however, remained challenging, mainly because of the additional haplotypic heterogeneity introduced by combining reference panels of different sources. We present an HLA type imputation model, HLA*IMP:02, designed to operate on a multi-population reference panel. HLA*IMP:02 is based on a graphical representation of haplotype structure. We present a probabilistic algorithm to build such models for the HLA region, accommodating genotyping error, haplotypic heterogeneity and the need for maximum accuracy at the HLA loci, generalizing the work of Browning and Browning (2007) and Ron et al. (1998). HLA*IMP:02 achieves an average 4-digit imputation accuracy on diverse European panels of 97% (call rate 97%). On non-European samples, 2-digit performance is over 90% for most loci and ethnicities where data are available. HLA*IMP:02 supports imputation of HLA-DPB1 and HLA-DRB3-5, is highly tolerant of missing data in the imputation panel and works on standard genotype data from popular genotyping chips. It is publicly available in source code and as a user-friendly web service framework.
Subject(s)
Computational Biology/methods , Genetics, Population/methods , HLA Antigens/genetics , Models, Genetic , Models, Immunological , Haplotypes , Humans , Polymorphism, Single Nucleotide , Principal Component Analysis , Racial Groups , Reproducibility of Results , Software
ABSTRACT
We built a reference panel with 342 million autosomal variants using 78,195 individuals from the Genomics England (GEL) dataset, achieving a phasing switch error rate of 0.18% for European samples and imputation quality of r^2 = 0.75 for variants with minor allele frequencies as low as 2 × 10^-4 in white British samples. The GEL-imputed UK Biobank genome-wide association analysis identified 70% of associations found by direct exome sequencing (P < 2.18 × 10^-11), while extending testing of rare variants to the entire genome. Coding variants dominated the rare-variant genome-wide association results, implying less disruptive effects of rare non-coding variants.
Subject(s)
Gene Frequency , Genome-Wide Association Study , Haplotypes , Polymorphism, Single Nucleotide , Humans , England , Exome Sequencing/methods , Genome, Human , Genome-Wide Association Study/methods , Genomics/methods , UK Biobank , United Kingdom , White People/genetics
ABSTRACT
PURPOSE: As part of the 100,000 Genomes Project, we set out to assess the potential viability and clinical impact of reporting genetic variants associated with drug-induced toxicity for patients with cancer recruited for whole-genome sequencing (WGS) as part of a genomic medicine service. METHODS: Germline WGS from 76,805 participants was analyzed for pharmacogenetic (PGx) variants in four genes (DPYD, NUDT15, TPMT, UGT1A1) associated with toxicity induced by five drugs used in cancer treatment (capecitabine, fluorouracil, mercaptopurine, thioguanine, irinotecan). Linking genomic data with prescribing and hospital incidence records, a phenome-wide association study (PheWAS) was performed to identify whether phenotypes indicative of adverse drug reactions (ADRs) were enriched in drug-exposed individuals with the relevant PGx variants. In a subset of 7,081 patients with cancer, DPYD variants were reported back to clinicians and outcomes were collected. RESULTS: We identified clinically relevant PGx variants across the four genes in 62.7% of participants in our cohort. Extending this to annual prescription numbers in England for the drugs affected by these PGx variants, approximately 14,540 patients per year could potentially benefit from a reduced dose or alternative drug to reduce the risk of ADRs. Validating PGx associations in a real-world data set, we found a significant association between PGx variants in DPYD and toxicity-related phenotypes in patients treated with capecitabine or fluorouracil. Reported DPYD variants were deemed informative for clinical decision making in a majority of cases. CONCLUSION: Reporting PGx variants from germline WGS relevant to patients with cancer alongside primary findings related to their cancer can be clinically informative, informing prescribing to reduce the risk of ADRs. Extending the range of actionable variants to those found in patients of non-European ancestry is important and will extend the potential clinical impact.
ABSTRACT
Repeat expansion disorders (REDs) are a devastating group of predominantly neurological diseases. Together they are common, affecting 1 in 3,000 people worldwide with population-specific differences. However, prevalence estimates of REDs are hampered by heterogeneous clinical presentation, variable geographic distributions and technological limitations leading to underascertainment. Here, leveraging whole-genome sequencing data from 82,176 individuals from different populations, we found an overall disease allele frequency of REDs of 1 in 283 individuals. Modeling disease prevalence using genetic data, age at onset and survival, we show that the expected number of people with REDs would be two to three times higher than currently reported figures, indicating underdiagnosis and/or incomplete penetrance. While some REDs are population specific, for example, Huntington disease-like 2 in Africans, most REDs are represented in all broad genetic ancestries (that is, Europeans, Africans, Americans, East Asians and South Asians), challenging the notion that some REDs are found only in specific populations. These results have worldwide implications for local and global health communities in the diagnosis and counseling of REDs.
ABSTRACT
The combining of genome-wide association (GWA) data across populations represents a major challenge for massive global meta-analyses. Genotype imputation using densely genotyped reference samples facilitates the combination of data across different genotyping platforms. HapMap data is typically used as a reference for single nucleotide polymorphism (SNP) imputation and tagging copy number polymorphisms (CNPs). However, the advantage of having population-specific reference panels for founder populations has not been evaluated. We looked at the properties and impact of adding 81 individuals from a founder population to HapMap3 reference data on imputation quality, CNP tagging, and power to detect association in simulations and in an independent cohort of 2138 individuals. The gain in SNP imputation accuracy was highest among low-frequency markers (minor allele frequency [MAF] < 5%), for which adding the population-specific samples to the reference set increased the median R^2 between imputed and genotyped SNPs from 0.90 to 0.94. Accuracy also increased in regions with high recombination rates. Similarly, a reference set with population-specific extension facilitated the identification of better tag-SNPs for a subset of CNPs; for 4% of CNPs the R^2 between SNP genotypes and CNP intensity in the independent population cohort was at least twice as high as without the extension. We conclude that even a relatively small population-specific reference set yields considerable benefits in SNP imputation, CNP tagging accuracy, and the power to detect associations in founder populations and population isolates in particular.
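The imputation accuracy quoted above is an R^2 between imputed and directly genotyped SNPs. One common way to compute such a figure, assumed here rather than taken from the paper, is the squared Pearson correlation between imputed allele dosages (values from 0 to 2) and the genotyped allele counts across individuals. The function name and the toy data are hypothetical.

```python
def squared_correlation(imputed_dosages, true_genotypes):
    """Squared Pearson correlation between dosages and genotype counts."""
    n = len(imputed_dosages)
    if n != len(true_genotypes) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")

    mean_x = sum(imputed_dosages) / n
    mean_y = sum(true_genotypes) / n

    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in imputed_dosages)
    syy = sum((y - mean_y) ** 2 for y in true_genotypes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(imputed_dosages, true_genotypes))

    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either input has no variation")
    return (sxy * sxy) / (sxx * syy)


# Toy example: five individuals at one SNP (made-up values).
dosages = [0.1, 0.9, 1.8, 1.2, 0.0]
genotypes = [0, 1, 2, 1, 0]
print(round(squared_correlation(dosages, genotypes), 3))
```

A per-SNP value like this would then be summarised across markers, for example by the median within a frequency bin, as the abstract reports.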
Subject(s)
DNA Copy Number Variations/genetics , Founder Effect , Genome-Wide Association Study/methods , White People/genetics , Finland , Gene Frequency , Genetics, Population , Genotype , Humans , Polymorphism, Single Nucleotide/genetics , Reproducibility of Results , Software
ABSTRACT
Since an association between the human leukocyte antigen (HLA) region and Hodgkin lymphoma (HL) was first reported in 1967, many studies have reported associations between HL risk and both single nucleotide polymorphism (SNP) and classic HLA allele variation in the major histocompatibility complex. However, population stratification and the extent and complexity of linkage disequilibrium within the major histocompatibility complex have hindered efforts to fine-map causal signals. Using SNP data to impute alleles at classic HLA loci, we have conducted an integrated analysis of HL risk within the HLA region in 582 early-onset HL cases and 4736 controls. We confirm that the strongest signal of association comes from an SNP located in the class II region, rs6903608 (odds ratio [OR] = 1.79, P = 6.63 × 10^-19), which is unlikely to be driven by association to HLA-DRB, DQA, or DQB alleles. In addition, we identify independent signals at rs2281389 (OR = 1.73, P = 6.31 × 10^-13), a SNP that maps closely to HLA-DPB1, and the class II HLA allele DQA1*02:01 (OR = 0.56, P = 1.51 × 10^-7). These data suggest that multiple independent loci within the HLA class II region contribute to the risk of developing early-onset HL.
Subject(s)
Chromosomes, Human, Pair 6 , HLA Antigens/genetics , Hodgkin Disease/epidemiology , Hodgkin Disease/genetics , Age of Onset , Genetic Predisposition to Disease/epidemiology , Genetic Predisposition to Disease/genetics , Humans , Models, Genetic , Polymorphism, Single Nucleotide , Risk Factors
ABSTRACT
A role for specific human leukocyte antigen (HLA) variants in the etiology of childhood acute lymphoblastic leukemia (ALL) has been extensively studied over the last 30 years, but no unambiguous association has been identified. To comprehensively study the relationship between genetic variation within the 4.5 Mb major histocompatibility complex genomic region and precursor B-cell (BCP) ALL risk, we analyzed 1075 observed and 8176 imputed single nucleotide polymorphisms and their related haplotypes in 824 BCP-ALL cases and 4737 controls. Using these genotypes we also imputed both common and rare alleles at class I (HLA-A, HLA-B, and HLA-C) and class II (HLA-DRB1, HLA-DQA1, and HLA-DQB1) HLA loci. Overall, we found no statistically significant association between variants and BCP-ALL risk. We conclude that major histocompatibility complex-defined variation in immune-mediated response is unlikely to be a major risk factor for BCP-ALL.
Subject(s)
Haplotypes/genetics , Histocompatibility Antigens Class II/genetics , Histocompatibility Antigens Class I/genetics , Polymorphism, Single Nucleotide/genetics , Precursor B-Cell Lymphoblastic Leukemia-Lymphoma/genetics , Alleles , Case-Control Studies , Child , Child, Preschool , Female , Genetic Predisposition to Disease , Genome-Wide Association Study , Genotype , Humans , Male , Precursor B-Cell Lymphoblastic Leukemia-Lymphoma/pathology
ABSTRACT
Repeat expansion disorders (REDs) are a devastating group of predominantly neurological diseases. Together they are common, affecting 1 in 3,000 people worldwide with population-specific differences. However, prevalence estimates of REDs are hampered by heterogeneous clinical presentation, variable geographic distributions, and technological limitations leading to under-ascertainment. Here, leveraging whole genome sequencing data from 82,176 individuals from different populations, we found an overall carrier frequency of REDs of 1 in 340 individuals. Modelling disease prevalence using genetic data, age at onset and survival, we show that REDs are up to 3-fold more prevalent than currently reported figures. While some REDs are population-specific, e.g. Huntington's disease type 2, most REDs are represented in all broad genetic ancestries, including Africans and Asians, challenging the notion that some REDs are found only in European populations. These results have worldwide implications for local and global health communities in the diagnosis and management of REDs.
ABSTRACT
MOTIVATION: Genetic variation at classical HLA alleles influences many phenotypes, including susceptibility to autoimmune disease, resistance to pathogens and the risk of adverse drug reactions. However, classical HLA typing methods are often prohibitively expensive for large-scale studies. We previously described a method for imputing classical alleles from linked SNP genotype data. Here, we present a modification of the original algorithm implemented in a freely available software suite that combines local data preparation and QC with probabilistic imputation through a remote server. RESULTS: We introduce two modifications to the original algorithm. First, we present a novel SNP selection function that leads to pronounced increases (up by 40% in some scenarios) in call rate. Second, we develop a parallelized model building algorithm that allows us to process a reference set of over 2500 individuals. In a validation experiment, we show that our framework produces highly accurate HLA type imputations at class I and class II loci for independent datasets: at call rates of 95-99%, imputation accuracy is between 92% and 98% at the four-digit level and over 97% at the two-digit level. We demonstrate utility of the method through analysis of a genome-wide association study for psoriasis where there is a known classical HLA risk allele (HLA-C*06:02). We show that the imputed allele shows stronger association with disease than any single SNP within the region. The imputation framework, HLA*IMP, provides a powerful tool for dissecting the architecture of genetic risk within the HLA. AVAILABILITY: HLA*IMP, implemented in C++ and Perl, is available from http://oxfordhla.well.ox.ac.uk and is free for academic use.
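The abstract reports accuracy at a given call rate, which suggests that imputations are only "called" when their posterior probability is high enough. The sketch below shows one plausible way to compute the two figures together. The posterior threshold, the record layout and the toy alleles are assumptions for illustration and are not taken from HLA*IMP itself.

```python
def call_rate_and_accuracy(imputations, threshold):
    """Summarise imputation performance at a posterior-probability cutoff.

    `imputations` is a list of (imputed_allele, posterior, true_allele)
    tuples. An imputation counts as called when posterior >= threshold;
    accuracy is the fraction of called imputations that match the truth.
    """
    if not imputations:
        raise ValueError("no imputations supplied")

    called = [(imputed, truth)
              for imputed, posterior, truth in imputations
              if posterior >= threshold]

    call_rate = len(called) / len(imputations)
    if not called:
        return call_rate, float("nan")

    n_correct = sum(1 for imputed, truth in called if imputed == truth)
    return call_rate, n_correct / len(called)


# Toy HLA-C imputations (made-up posteriors and truths).
toy = [
    ("C*06:02", 0.99, "C*06:02"),
    ("C*07:01", 0.95, "C*07:01"),
    ("C*07:02", 0.80, "C*07:01"),
    ("C*04:01", 0.55, "C*04:01"),
]
rate, accuracy = call_rate_and_accuracy(toy, threshold=0.7)
print(f"call rate {rate:.2f}, accuracy {accuracy:.2f}")
```

Raising the threshold typically trades call rate for accuracy, which is why the two numbers are usually quoted as a pair.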
Subject(s)
Alleles , HLA Antigens/genetics , Polymorphism, Single Nucleotide , Software , Algorithms , Genome-Wide Association Study , Genotype , Humans , Psoriasis/genetics
ABSTRACT
Irritable bowel syndrome (IBS) results from disordered brain-gut interactions. Identifying susceptibility genes could highlight the underlying pathophysiological mechanisms. We designed a digestive health questionnaire for UK Biobank and combined identified cases with IBS with independent cohorts. We conducted a genome-wide association study with 53,400 cases and 433,201 controls and replicated significant associations in a 23andMe panel (205,252 cases and 1,384,055 controls). Our study identified and confirmed six genetic susceptibility loci for IBS. Implicated genes included NCAM1, CADM2, PHF2/FAM120A, DOCK9, CKAP2/TPTE2P3 and BAG6. The first four are associated with mood and anxiety disorders, expressed in the nervous system, or both. Mirroring this, we also found strong genome-wide correlation between the risk of IBS and anxiety, neuroticism and depression (rg > 0.5). Additional analyses suggested this arises due to shared pathogenic pathways rather than, for example, anxiety causing abdominal symptoms. Implicated mechanisms require further exploration to help understand the altered brain-gut interactions underlying IBS.
Subject(s)
Anxiety Disorders/genetics , Irritable Bowel Syndrome/genetics , Mood Disorders/genetics , Aged , CD56 Antigen/genetics , Cell Adhesion Molecules/genetics , Cytoskeletal Proteins/genetics , Female , Genetic Predisposition to Disease , Genome-Wide Association Study , Guanine Nucleotide Exchange Factors/genetics , Homeodomain Proteins/genetics , Humans , Irritable Bowel Syndrome/epidemiology , Male , Middle Aged , Molecular Chaperones/genetics , Polymorphism, Single Nucleotide , United Kingdom/epidemiology
ABSTRACT
MOTIVATION: Short interfering RNA (siRNA)-induced RNA interference is an endogenous pathway in sequence-specific gene silencing. The potency of different siRNAs to inhibit a common target varies greatly and features affecting inhibition are of high current interest. The limited success in predicting siRNA potency reported so far could originate in the small number and the heterogeneity of available datasets, in addition to the knowledge-driven, empirical basis on which features thought to affect siRNA potency are often chosen. We attempt to overcome these problems by first constructing a meta-dataset of 6483 publicly available siRNAs (targeting mammalian mRNA), the largest to date, and then applying a Bayesian analysis which accommodates feature set uncertainty. A stochastic logistic regression-based algorithm is designed to explore a vast model space of 497 compositional, structural and thermodynamic features, identifying associations with siRNA potency. RESULTS: Our algorithm reveals a number of features associated with siRNA potency that are, to the best of our knowledge, either underreported in the literature, such as the anti-sense 5'-3' motif 'UCU', or not reported at all, such as the anti-sense 5'-3' motif 'ACGA'. These findings should aid in improving future siRNA potency predictions and might offer further insights into the working of the RNA-induced silencing complex (RISC).
Subject(s)
RNA Interference , RNA, Small Interfering/chemistry , Algorithms , Bayes Theorem , Models, Genetic , RNA-Induced Silencing Complex/chemistry , Sequence Analysis, RNA
ABSTRACT
Very-early-onset inflammatory bowel disease (VEO-IBD) is a heterogeneous phenotype associated with a spectrum of rare Mendelian disorders. Here, we perform whole-exome-sequencing and genome-wide genotyping in 145 patients (median age-at-diagnosis of 3.5 years), in whom no Mendelian disorders were clinically suspected. In five patients we detect a primary immunodeficiency or enteropathy, with clinical consequences (XIAP, CYBA, SH2D1A, PCSK1). We also present a case study of a VEO-IBD patient with a mosaic de novo, pathogenic allele in CYBB. The mutation is present in ~70% of phagocytes and sufficient to result in defective bacterial handling but not life-threatening infections. Finally, we show that VEO-IBD patients have, on average, higher IBD polygenic risk scores than population controls (99 patients and 18,780 controls; P < 4 × 10^-10), and replicate this finding in an independent cohort of VEO-IBD cases and controls (117 patients and 2,603 controls; P < 5 × 10^-10). This discovery indicates that a polygenic component operates in VEO-IBD pathogenesis.
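The abstract does not say how the IBD polygenic risk scores were constructed. A widely used form, assumed here purely for illustration, is a weighted sum of risk-allele dosages, with weights taken from per-variant effect sizes (for example log odds ratios) in an earlier association study. The variant names and weights below are hypothetical.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages over the scored variants.

    `weights` maps variant ID -> effect size (e.g. a log odds ratio);
    `dosages` maps variant ID -> risk-allele count or dosage (0 to 2).
    A KeyError is raised if a scored variant has no dosage.
    """
    return sum(weight * dosages[variant]
               for variant, weight in weights.items())


# Hypothetical effect sizes and one individual's dosages.
toy_weights = {"variant_A": 0.20, "variant_B": -0.10, "variant_C": 0.05}
toy_dosages = {"variant_A": 2, "variant_B": 1, "variant_C": 0}

score = polygenic_risk_score(toy_dosages, toy_weights)
print(round(score, 3))
```

Comparing the distribution of such scores between patients and controls is the kind of analysis the reported P values summarise.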