ABSTRACT
Fully understanding autism spectrum disorder (ASD) genetics requires whole-genome sequencing (WGS). We present the latest release of the Autism Speaks MSSNG resource, which includes WGS data from 5,100 individuals with ASD and 6,212 non-ASD parents and siblings (total n = 11,312). Examining a wide variety of genetic variants in MSSNG and the Simons Simplex Collection (SSC; n = 9,205), we identified ASD-associated rare variants in 718/5,100 individuals with ASD from MSSNG (14.1%) and 350/2,419 from SSC (14.5%). Considering genomic architecture, 52% were nuclear sequence-level variants, 46% were nuclear structural variants (including copy-number variants, inversions, large insertions, uniparental isodisomies, and tandem repeat expansions), and 2% were mitochondrial variants. Our study provides a guidebook for exploring genotype-phenotype correlations in families who carry ASD-associated rare variants and serves as an entry point to the expanded studies required to dissect the etiology in the ~85% of the ASD population that remains idiopathic.
Subjects
Autism Spectrum Disorder , Autistic Disorder , Humans , Autism Spectrum Disorder/genetics , Genetic Predisposition to Disease , DNA Copy Number Variations/genetics , Genomics
ABSTRACT
Interindividual variability in genes encoding drug-metabolizing enzymes, transporters, receptors, and human leukocyte antigens has a major impact on a patient's response to drugs with regard to efficacy and safety. Enabled by both technological and conceptual advances, the field of pharmacogenomics is developing rapidly. Major progress in omics profiling methods has enabled novel genotypic and phenotypic characterization of patients and biobanks. These developments are paralleled by advances in machine learning, which have allowed us to parse the immense wealth of data and establish novel genetic markers and polygenic models for drug selection and dosing. Pharmacogenomics has recently become more widespread in clinical practice to personalize treatment and to develop new drugs tailored to specific patient populations. In this review, we provide an overview of the latest developments in the field and discuss the way forward, including how to address the missing heritability, develop novel polygenic models, and further improve the clinical implementation of pharmacogenomics.
Subjects
Membrane Transport Proteins , Pharmacogenetics , Humans , Technology
ABSTRACT
Since genotype imputation was introduced, researchers have been relying on the estimated imputation quality from imputation software to perform post-imputation quality control (QC). However, this quality estimate (denoted as Rsq) performs less well for lower-frequency variants. We recently published MagicalRsq, a machine-learning-based imputation quality calibration, which leverages additional typed markers from the same cohort and outperforms Rsq as a QC metric. In this work, we extended the original MagicalRsq to allow cross-cohort model training and named the new model MagicalRsq-X. We removed the cohort-specific estimated minor allele frequency and included linkage disequilibrium scores and recombination rates as additional features. Leveraging whole-genome sequencing data from TOPMed, specifically participants in the BioMe, JHS, WHI, and MESA studies, we performed comprehensive cross-cohort evaluations for individuals of predominantly European and African ancestry, based on their inferred global ancestry with the 1000 Genomes and Human Genome Diversity Project data as reference. Our results suggest MagicalRsq-X outperforms Rsq in almost every setting, with 7.3%-14.4% improvement in squared Pearson correlation with true R², corresponding to gains of 85K-218K variants. We further developed a metric to quantify the genetic distance of a target cohort relative to a reference cohort and showed that this metric largely explained the performance of MagicalRsq-X models. Finally, we found MagicalRsq-X saved up to 53 known genome-wide significant variants in one of the largest blood cell trait GWASs that would be missed using the original Rsq for QC. In conclusion, MagicalRsq-X shows superiority for post-imputation QC and benefits genetic studies by distinguishing well and poorly imputed lower-frequency variants.
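The evaluation metric named in this abstract, the squared Pearson correlation between a quality estimate and the true R², can be sketched in a few lines. This is only an illustration of the metric, not the MagicalRsq-X code; the function name and the per-variant values below are hypothetical.

```python
def squared_pearson(x, y):
    """Return the squared Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov * cov / (var_x * var_y)


# Hypothetical per-variant values, for illustration only.
true_r2 = [0.10, 0.35, 0.60, 0.80, 0.95]
estimated_rsq = [0.30, 0.40, 0.55, 0.85, 0.90]
print(round(squared_pearson(estimated_rsq, true_r2), 3))
```

A calibrated estimate that tracks the true R² more closely would yield a value nearer to 1 on the same variants.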
Subjects
Gene Frequency , Genotype , Single Nucleotide Polymorphism , Software , Humans , Cohort Studies , Linkage Disequilibrium , Genome-Wide Association Study/methods , Human Genome , Quality Control , Machine Learning , Whole Genome Sequencing/standards , Whole Genome Sequencing/methods
ABSTRACT
Systemic sclerosis (SSc) is a heterogeneous, rare autoimmune fibrosing disorder affecting connective tissue. The etiology of SSc is largely unknown, and genome-wide association studies (GWASs) have suggested many genes as susceptibility loci of modest impact. Multiple factors can contribute to the pathological process of the disease, which makes it more difficult to identify possible disease-causing genetic alterations. In this study, we applied whole-genome sequencing (WGS) to 101 indexed family trios, supplemented with transcriptome sequencing of cultured fibroblasts from four patients and five family controls where available. Single nucleotide variants (SNVs) and copy number variants (CNVs) were examined, with emphasis on de novo variants. We also performed an enrichment test for rare variants in candidate genes previously proposed in association with SSc. We identified 42 exonic and 34 ncRNA de novo SNVs in the 101 trios, from a total of over 6,000 de novo variants genome-wide. We observed more de novo variants than expected in the PRKXP1 gene. We observed the same excess in the NEK7 gene, together with increased expression in the patient group. Additionally, we observed significant enrichment of rare variants in candidate genes in the patient cohort, further supporting the complex, multifactorial etiology of SSc. Our findings identify new candidate genes, including PRKXP1 and NEK7, for future studies in SSc. The rare-variant enrichment in previously proposed candidate genes suggests that more effort should be devoted to investigating the pathogenetic mechanisms associated with those genes.
Subjects
DNA Copy Number Variations , Genetic Predisposition to Disease , Genome-Wide Association Study , Single Nucleotide Polymorphism , Systemic Scleroderma , Whole Genome Sequencing , Humans , Systemic Scleroderma/genetics , Systemic Scleroderma/pathology , DNA Copy Number Variations/genetics , Male , Female , Adult , NIMA-Related Kinases/genetics , Middle Aged , Fibroblasts/metabolism , Fibroblasts/pathology
ABSTRACT
Long non-coding RNAs (lncRNAs) are known to perform important regulatory functions in lipid metabolism. Large-scale whole-genome sequencing (WGS) studies and new statistical methods for variant set tests now provide an opportunity to assess more associations between rare variants in lncRNA genes and complex traits across the genome. In this study, we used high-coverage WGS from 66,329 participants of diverse ancestries with measurement of blood lipids and lipoproteins (LDL-C, HDL-C, TC, and TG) in the National Heart, Lung, and Blood Institute (NHLBI) Trans-Omics for Precision Medicine (TOPMed) program to investigate the role of lncRNAs in lipid variability. We aggregated rare variants for 165,375 lncRNA genes based on their genomic locations and conducted rare-variant aggregate association tests using the STAAR (variant-set test for association using annotation information) framework. We performed STAAR conditional analysis adjusting for common variants in known lipid GWAS loci and rare-coding variants in nearby protein-coding genes. Our analyses revealed 83 rare lncRNA variant sets significantly associated with blood lipid levels, all of which were located in known lipid GWAS loci (in a ±500-kb window of a Global Lipids Genetics Consortium index variant). Notably, 61 out of 83 signals (73%) were conditionally independent of common regulatory variation and rare protein-coding variation at the same loci. We replicated 34 out of 61 (56%) conditionally independent associations using the independent UK Biobank WGS data. Our results expand the genetic architecture of blood lipids to rare variants in lncRNAs.
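The "known locus" definition used above (a ±500-kb window around an index variant) and the reported fractions can be checked with a short sketch. The function name and the coordinates are hypothetical, and the sketch assumes all positions lie on the same chromosome.

```python
WINDOW_BP = 500_000  # the ±500-kb window stated in the abstract


def in_known_locus(set_start, set_end, index_positions, window=WINDOW_BP):
    """True if the interval [set_start, set_end] overlaps the window
    around any index variant (all positions on one chromosome)."""
    return any(
        set_start <= pos + window and set_end >= pos - window
        for pos in index_positions
    )


# Hypothetical coordinates, for illustration only.
print(in_known_locus(1_000_000, 1_010_000, [1_400_000]))

# Fractions reported in the abstract.
print(round(100 * 61 / 83))  # conditionally independent signals, in percent
print(round(100 * 34 / 61))  # replicated associations, in percent
```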
Subjects
Long Noncoding RNA , Humans , Long Noncoding RNA/genetics , Genome-Wide Association Study , Precision Medicine , Whole Genome Sequencing/methods , Lipids/genetics , Single Nucleotide Polymorphism/genetics
ABSTRACT
The ongoing release of large-scale sequencing data in the UK Biobank allows for the identification of associations between rare variants and complex traits. SAIGE-GENE+ is a valid approach to conducting set-based association tests for quantitative and binary traits. However, for ordinal categorical phenotypes, applying SAIGE-GENE+ by treating the trait as quantitative or by binarizing it can cause inflated type I error rates or power loss. In this study, we propose a scalable and accurate method for rare-variant association tests, POLMM-GENE, in which we use a proportional odds logistic mixed model to characterize ordinal categorical phenotypes while adjusting for sample relatedness. POLMM-GENE fully utilizes the categorical nature of phenotypes and thus controls type I error rates well while remaining powerful. In the analyses of UK Biobank 450k whole-exome-sequencing data for five ordinal categorical traits, POLMM-GENE identified 54 gene-phenotype associations.
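To make the proportional odds idea concrete, the sketch below computes category probabilities for an ordinal trait from the standard cumulative-logit form P(Y ≤ j) = logistic(θ_j − η). It is a generic fixed-effects illustration, not POLMM-GENE itself: the random effect that adjusts for sample relatedness is omitted, and the cutpoints and linear predictor are hypothetical values.

```python
from math import exp


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + exp(-z))


def category_probabilities(cutpoints, eta):
    """Category probabilities under a proportional odds model.

    cutpoints: strictly increasing thresholds theta_1 < ... < theta_{J-1}
    eta: linear predictor (covariate and genotype effects) for one person
    """
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    # Cumulative probabilities P(Y <= j); the last category closes at 1.
    cumulative = [logistic(theta - eta) for theta in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical four-level trait; a larger eta shifts mass to higher levels.
print(category_probabilities([-1.0, 0.0, 1.5], eta=0.3))
```

Because one linear predictor shifts every cumulative probability at once, the ordering of the categories is used directly, which is what is lost when the trait is binarized.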
Subjects
Exome , Genome-Wide Association Study , Genome-Wide Association Study/methods , Exome/genetics , Biological Specimen Banks , Phenotype , Data Analysis , United Kingdom
ABSTRACT
Deleterious mutations in the X-linked gene encoding ornithine transcarbamylase (OTC) cause the most common urea cycle disorder, OTC deficiency. This rare but highly actionable disease can present with severe neonatal onset in males or with later onset in either sex. Individuals with neonatal onset appear normal at birth but rapidly develop hyperammonemia, which can progress to cerebral edema, coma, and death, outcomes ameliorated by rapid diagnosis and treatment. Here, we develop a high-throughput functional assay for human OTC and individually measure the impact of 1,570 variants, 84% of all SNV-accessible missense mutations. Comparison to existing clinical significance calls demonstrated that our assay distinguishes known benign from pathogenic variants, and variants associated with neonatal-onset disease from those associated with late-onset disease. This functional stratification allowed us to identify score ranges corresponding to clinically relevant levels of impairment of OTC activity. Examining the results of our assay in the context of protein structure further allowed us to identify a 13-amino-acid domain, the SMG loop, whose function appears to be required in human cells but not in yeast. Finally, inclusion of our data as PS3 evidence under the current ACMG guidelines, in a pilot reclassification of 34 variants with complete loss of activity, would change the classification of 22 from variants of unknown significance to clinically actionable likely pathogenic variants. These results illustrate how large-scale functional assays are especially powerful when applied to rare genetic diseases.
Subjects
Hyperammonemia , Ornithine Carbamoyltransferase Deficiency Disease , Ornithine Carbamoyltransferase , Humans , Amino Acid Substitution , Hyperammonemia/etiology , Hyperammonemia/genetics , Missense Mutation/genetics , Ornithine Carbamoyltransferase/genetics , Ornithine Carbamoyltransferase Deficiency Disease/genetics , Ornithine Carbamoyltransferase Deficiency Disease/diagnosis , Ornithine Carbamoyltransferase Deficiency Disease/therapy
ABSTRACT
Most genome-wide association studies are based on case-control designs, which provide abundant resources for secondary phenotype analyses. However, such studies suffer from biased sampling of primary phenotypes, and the traditional statistical methods can lead to seriously distorted analysis results when they are applied to secondary phenotypes without accounting for the biased sampling mechanism. To our knowledge, there are no statistical methods specifically tailored for rare variant association analysis with secondary phenotypes. In this article, we proposed two novel joint test statistics for identifying secondary-phenotype-associated rare variants based on prospective likelihood and retrospective likelihood, respectively. We also exploit the assumption of gene-environment independence in retrospective likelihood to improve the statistical power and adopt a two-step strategy to balance statistical power and robustness. Simulations and a real-data application are conducted to demonstrate the superior performance of our proposed methods.
ABSTRACT
Genome-wide association studies (GWAS) have provided an abundance of information about the genetic variants and loci that are associated with complex traits and diseases. However, because of linkage disequilibrium (LD) and because many loci lie in noncoding regions, it remains a challenge to pinpoint the causal genes. Gene network-based approaches, paired with network diffusion methods, have been proposed to prioritize causal genes and to boost statistical power in GWAS based on the assumption that trait-associated genes are clustered in a gene network. Due to the difficulty in mapping trait-associated variants to genes in GWAS, this assumption has never been directly or rigorously tested empirically. On the other hand, whole exome sequencing (WES) data focus on the protein-coding regions, directly identifying trait-associated genes. In this study, we tested the assumption by leveraging the recently available exome-based association statistics from the UK Biobank WES data along with two types of networks. We found that almost all trait-associated genes were significantly more proximal to each other than randomly selected genes within both networks. These results support the assumption that trait-associated genes are clustered in gene networks, which can be further leveraged to boost the power of GWAS, for example by introducing less stringent p value thresholds.
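One way to test whether a gene set is "more proximal than randomly selected genes" is a permutation test on mean pairwise shortest-path distance. The sketch below is an illustration under stated assumptions, not the study's procedure: the abstract does not name its distance measure, the graph is assumed to be connected and unweighted, and the toy network and function names are hypothetical.

```python
import random
from collections import deque
from itertools import combinations


def shortest_path_lengths(graph, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def mean_pairwise_distance(graph, genes):
    """Mean shortest-path distance over all pairs of the given genes."""
    dist_from = {g: shortest_path_lengths(graph, g) for g in genes}
    pairs = list(combinations(genes, 2))
    return sum(dist_from[a][b] for a, b in pairs) / len(pairs)


def empirical_p(graph, trait_genes, n_permutations=1000, seed=0):
    """Share of random gene sets at least as tightly clustered as observed."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(graph, trait_genes)
    nodes = sorted(graph)
    hits = sum(
        mean_pairwise_distance(graph, rng.sample(nodes, len(trait_genes)))
        <= observed
        for _ in range(n_permutations)
    )
    return (hits + 1) / (n_permutations + 1)


# Hypothetical toy network: a simple path A-B-C-D-E-F.
toy = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
    "D": ["C", "E"], "E": ["D", "F"], "F": ["E"],
}
print(mean_pairwise_distance(toy, ["A", "B", "C"]))
print(empirical_p(toy, ["A", "B", "C"], n_permutations=200))
```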
ABSTRACT
Identification of rare-variant associations is crucial to full characterization of the genetic architecture of complex traits and diseases. Essential in this process is the evaluation of novel methods in simulated data that mirror the distribution of rare variants and haplotype structure in real data. Additionally, importing real-variant annotation enables in silico comparison of methods, such as rare-variant association tests and polygenic scoring methods, that focus on putative causal variants. Existing simulation methods are either unable to employ real-variant annotation or severely under- or overestimate the number of singletons and doubletons, thereby reducing the ability to generalize simulation results to real studies. We present RAREsim, a flexible and accurate rare-variant simulation algorithm. Using parameters and haplotypes derived from real sequencing data, RAREsim efficiently simulates the expected variant distribution and enables real-variant annotations. We highlight RAREsim's utility across various genetic regions, sample sizes, ancestries, and variant classes.
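The singleton and doubleton counts that the abstract uses to judge simulators can be read off an allele-count spectrum. The sketch below shows only that bookkeeping on a hypothetical toy haplotype matrix; it is not part of RAREsim and does not reproduce its algorithm.

```python
from collections import Counter


def allele_count_spectrum(haplotypes):
    """Map each allele count to the number of variants carrying it.

    haplotypes: list of equal-length lists of 0/1 alleles, one per haplotype.
    Variants that are absent from the sample (allele count 0) are skipped.
    """
    n_variants = len(haplotypes[0])
    allele_counts = [sum(h[j] for h in haplotypes) for j in range(n_variants)]
    return Counter(ac for ac in allele_counts if ac > 0)


# Hypothetical toy data: 4 haplotypes, 4 variant sites.
toy_haplotypes = [
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
spectrum = allele_count_spectrum(toy_haplotypes)
print("singletons:", spectrum[1], "doubletons:", spectrum[2])
```

Comparing this spectrum between simulated and real haplotypes is one simple way to see whether a simulator under- or overestimates the rarest variants.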
Subjects
Genetic Variation , Research Design , Computer Simulation , Genetic Variation/genetics , Haplotypes/genetics , Humans , Genetic Models , Multifactorial Inheritance
ABSTRACT
Polygenic risk scores (PRSs) quantify the contribution of multiple genetic loci to an individual's likelihood of a complex trait or disease. However, existing PRSs estimate this likelihood with common genetic variants, excluding the impact of rare variants. Here, we report on a method to identify rare variants associated with outlier gene expression and integrate their impact into PRS predictions for body mass index (BMI), obesity, and bariatric surgery. Between the top and bottom 10%, we observed a 20.8% increase in risk for obesity (p = 3 × 10⁻¹⁴), 62.3% increase in risk for severe obesity (p = 1 × 10⁻⁶), and median 5.29 years earlier onset for bariatric surgery (p = 0.008), as a function of expression outlier-associated rare variant burden when controlling for common variant PRS. We show that these predictions were more significant than integrating the effects of rare protein-truncating variants (PTVs), observing a mean 19% increase in phenotypic variance explained with expression outlier-associated rare variants when compared with PTVs (p = 2 × 10⁻¹⁵). We replicated these findings by using data from the Million Veteran Program and demonstrated that PRSs across multiple traits and diseases can benefit from the inclusion of expression outlier-associated rare variants identified through population-scale transcriptome sequencing.
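A common-variant PRS is, in its simplest form, a weighted sum of allele dosages. The sketch below shows that sum and one possible way to add a rare-variant burden term to it. The additive combination, the weight, and every number are assumptions made for illustration; the abstract does not describe how the authors integrate the two components.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def combined_score(common_prs, rare_burden, rare_weight):
    """Hypothetical additive combination of a common-variant PRS with a
    count of expression outlier-associated rare variants."""
    return common_prs + rare_weight * rare_burden


# Hypothetical individual: three common variants and three rare variants.
prs = polygenic_score([0, 1, 2], [0.10, -0.20, 0.05])
print(combined_score(prs, rare_burden=3, rare_weight=0.02))
```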
Subjects
Multifactorial Inheritance , Obesity , Body Mass Index , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Multifactorial Inheritance/genetics , Obesity/genetics , Phenotype , Risk Factors
ABSTRACT
Atrial fibrillation (AF) is a globally prevalent cardiac arrhythmia with significant genetic underpinnings, as highlighted by recent large-scale genetic studies. A prominent clinical and genetic overlap exists between AF, heritable ventricular cardiomyopathies, and arrhythmia syndromes, underlining the potential of AF as an early indicator of severe ventricular disease in younger individuals. Indeed, several recent studies have demonstrated meaningful yields of rare pathogenic variants among early-onset AF patients (~4%-11%), most notably for cardiomyopathy genes in which rare variants are considered clinically actionable. Genetic testing thus presents a promising opportunity to identify monogenetic defects linked to AF and inherited cardiac conditions, such as cardiomyopathy, and may contribute to prognosis and management in early-onset AF patients. A first step towards recognizing this monogenic contribution was taken with the Class IIb recommendation for genetic testing in AF patients aged 45 years or younger by the 2023 American College of Cardiology/American Heart Association guidelines for AF. By identifying pathogenic genetic variants known to underlie inherited cardiomyopathies and arrhythmia syndromes, a personalized care pathway can be developed, encompassing more tailored screening, cascade testing, and potentially genotype-informed prognosis and preventive measures. However, this can only be ensured by frameworks that are developed and supported by all stakeholders. Ambiguity in test results, such as variants of uncertain significance, remains a major challenge, and as many as ~60% of people with early-onset AF might carry such variants. Patient education (including pretest counselling), training of genetic teams, selection of high-confidence genes, and careful reporting are strategies to mitigate this.
Further challenges to implementation include financial barriers, insurability issues, workforce limitations, and the need for standardized definitions in a fast-moving field. Moreover, the prevailing genetic evidence largely rests on populations of European descent, underscoring the need for diverse research cohorts and international collaboration. Embracing these challenges and the potential of genetic testing may improve AF care. However, further research (mechanistic, translational, and clinical) is urgently needed.
Subjects
Age of Onset , Atrial Fibrillation , Genetic Testing , Humans , Atrial Fibrillation/genetics , Atrial Fibrillation/diagnosis , Genetic Testing/methods , Genetic Predisposition to Disease/genetics , Middle Aged , Cardiomyopathies/genetics , Cardiomyopathies/diagnosis , Adult
ABSTRACT
AIMS/HYPOTHESIS: GLIS3 encodes a transcription factor involved in pancreatic beta cell development and function. Rare pathogenic, bi-allelic mutations in GLIS3 cause syndromic neonatal diabetes, whereas frequent SNPs at this locus associate with common type 2 diabetes risk. Because rare, functional variants located in other susceptibility genes for type 2 diabetes have already been shown to strongly increase individual risk for common type 2 diabetes, we aimed to investigate the contribution of rare pathogenic GLIS3 variants to type 2 diabetes. METHODS: GLIS3 was sequenced in 5,471 individuals from the Rare Variants Involved in Diabetes and Obesity (RaDiO) study. Variant pathogenicity was assessed following the criteria established by the American College of Medical Genetics and Genomics (ACMG). To address the pathogenic strong criterion number 3 (PS3), we conducted functional investigations of these variants using luciferase assays, focusing on the capacity of GLIS family zinc finger 3 (GLIS3) to bind to and activate the INS promoter. The association between rare pathogenic or likely pathogenic (P/LP) variants and type 2 diabetes risk (and other metabolic traits) was then evaluated. Finally, a meta-analysis combining association results from RaDiO, the 52K study (43,125 individuals) and the TOPMed study (44,083 individuals) was performed. RESULTS: Through targeted resequencing of GLIS3, we identified 105 rare variants that were carried by 395 participants from RaDiO. Among them, 49 variants decreased the activation of the INS promoter. Following ACMG criteria, 18 rare variants were classified as P/LP, showing an enrichment in the last two exons compared with the remaining exons (p<5×10⁻⁶; OR>3.5).
The burden of these P/LP variants was significantly higher in individuals with type 2 diabetes (p=3.0×10⁻³; OR 3.9 [95% CI 1.4, 12]), whereas adiposity, age at type 2 diabetes diagnosis and cholesterol levels were similar between variant carriers and non-carriers with type 2 diabetes. Interestingly, all carriers with type 2 diabetes were sensitive to oral sulfonylureas. A total of seven P/LP variants were identified in both the 52K and TOPMed studies. The meta-analysis of association results from RaDiO, 52K and TOPMed showed an enrichment of P/LP GLIS3 variants in individuals with type 2 diabetes (p=5.6×10⁻⁵; OR 2.1 [95% CI 1.4, 2.9]). CONCLUSIONS/INTERPRETATION: Rare P/LP GLIS3 variants do contribute to type 2 diabetes risk. By homology with the mouse GLIS3 protein, the variants located in the distal part of the protein could directly affect its functional activity by impacting its transactivation domain. Furthermore, rare P/LP GLIS3 variants seem to have a direct clinical effect on beta cell function, which could be improved by increasing insulin secretion via the use of sulfonylureas.
Subjects
Type 2 Diabetes Mellitus , Insulin-Secreting Cells , Mice , Animals , Newborn Infant , Humans , Type 2 Diabetes Mellitus/drug therapy , Type 2 Diabetes Mellitus/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , Gene Expression Regulation , Insulin-Secreting Cells/metabolism , Mutation , DNA-Binding Proteins/metabolism , Repressor Proteins/metabolism , Trans-Activators/metabolism
ABSTRACT
Current software packages for the analysis and the simulations of rare variants are only available for binary and continuous traits. Ravages provides solutions in a single R package to perform rare variant association tests for multicategory, binary and continuous phenotypes, to simulate datasets under different scenarios and to compute statistical power. Association tests can be run in the whole genome thanks to C++ implementation of most of the functions, using either RAVA-FIRST, a recently developed strategy to filter and analyse genome-wide rare variants, or user-defined candidate regions. Ravages also includes a simulation module that generates genetic data for cases who can be stratified into several subgroups and for controls. Through comparisons with existing programmes, we show that Ravages complements existing tools and will be useful to study the genetic architecture of complex diseases. Ravages is available on the CRAN at https://cran.r-project.org/web/packages/Ravages/ and maintained on Github at https://github.com/genostats/Ravages.
Subjects
Genetic Variation , Genetic Models , Humans , Computer Simulation , Phenotype , Software
ABSTRACT
The risk of congenital heart defects (CHDs) may be influenced by maternal genes, fetal genes, and their interactions. Existing methods commonly test the effects of maternal and fetal variants one-at-a-time and may have reduced statistical power to detect genetic variants with low minor allele frequencies. In this article, we propose a gene-based association test of interactions for maternal-fetal genotypes (GATI-MFG) using a case-mother and control-mother design. GATI-MFG can integrate the effects of multiple variants within a gene or genomic region and evaluate the joint effect of maternal and fetal genotypes while allowing for their interactions. In simulation studies, GATI-MFG had improved statistical power over alternative methods, such as the single-variant test and functional data analysis (FDA) under various disease scenarios. We further applied GATI-MFG to a two-phase genome-wide association study of CHDs for the testing of both common variants and rare variants using 947 CHD case mother-infant pairs and 1306 control mother-infant pairs from the National Birth Defects Prevention Study (NBDPS). After Bonferroni adjustment for 23,035 genes, two genes on chromosome 17, TMEM107 (p = 1.64e-06) and CTC1 (p = 2.0e-06), were identified for significant association with CHD in common variants analysis. Gene TMEM107 regulates ciliogenesis and ciliary protein composition and was found to be associated with heterotaxy. Gene CTC1 plays an essential role in protecting telomeres from degradation, which was suggested to be associated with cardiogenesis. Overall, GATI-MFG outperformed the single-variant test and FDA in the simulations, and the results of application to NBDPS samples are consistent with existing literature supporting the association of TMEM107 and CTC1 with CHDs.
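The Bonferroni adjustment mentioned above can be reproduced from the numbers in the abstract. The gene count and the two p values are taken from the text; the family-wise error level of 0.05 is an assumption, since the abstract does not state it.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


N_GENES = 23_035  # number of genes tested, from the abstract
ALPHA = 0.05      # assumed family-wise error level (not stated in the text)

threshold = bonferroni_threshold(ALPHA, N_GENES)
p_values = {"TMEM107": 1.64e-06, "CTC1": 2.0e-06}  # from the abstract
significant = {gene: p < threshold for gene, p in p_values.items()}

print(f"threshold = {threshold:.3e}")
print(significant)
```

Under that assumed level the per-gene threshold is 0.05/23,035, and both reported p values fall below it.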
Subjects
Genome-Wide Association Study , Congenital Heart Defects , Female , Humans , Genetic Models , Genotype , Congenital Heart Defects/genetics , Mothers , Case-Control Studies
ABSTRACT
Linkage analysis maps genetic loci for a heritable trait by identifying genomic regions with excess relatedness among individuals with similar trait values. Analysis may be conducted on related individuals from families, or on samples of unrelated individuals from a population. For allelically heterogeneous traits, population-based linkage analysis can be more powerful than genotypic-association analysis. Here, we focus on linkage analysis in a population sample, but use sequences rather than individuals as our unit of observation. Earlier investigations of sequence-based linkage mapping relied on known sequence relatedness, whereas we infer relatedness from the sequence data. We propose two ways to associate similarity in relatedness of sequences with similarity in their trait values and compare the resulting linkage methods to two genotypic-association methods. We also introduce a procedure to label case sequences as potential carriers or noncarriers of causal variants after an association has been found. This post hoc labeling of case sequences is based on inferred relatedness to other case sequences. Our simulation results indicate that methods based on sequence relatedness improve localization and perform as well as genotypic-association methods for detecting rare causal variants. Sequence-based linkage analysis therefore has potential to fine-map allelically heterogeneous disease traits.
Subjects
Genetic Models , Quantitative Trait Loci , Humans , Chromosome Mapping/methods , Phenotype , Genotype , Genetic Linkage , Linkage Disequilibrium
ABSTRACT
In genetic studies, many phenotypes have multiple naturally ordered discrete values. The phenotypes can be correlated with each other. If multiple correlated ordinal traits are analyzed simultaneously, the power of analysis may increase significantly while the false positives can be controlled well. In this study, we propose bivariate functional ordinal linear regression (BFOLR) models using latent regressions with cumulative logit link or probit link to perform a gene-based analysis for bivariate ordinal traits and sequencing data. In the proposed BFOLR models, genetic variant data are viewed as stochastic functions of physical positions, and the genetic effects are treated as a function of physical positions. The BFOLR models take the correlation of the two ordinal traits into account via latent variables. The BFOLR models are built upon functional data analysis which can be revised to analyze the bivariate ordinal traits and high-dimension genetic data. The methods are flexible and can analyze three types of genetic data: (1) rare variants only, (2) common variants only, and (3) a combination of rare and common variants. Extensive simulation studies show that the likelihood ratio tests of the BFOLR models control type I errors well and have good power performance. The BFOLR models are applied to analyze Age-Related Eye Disease Study data, in which two genes, CFH and ARMS2, are found to strongly associate with eye drusen size, drusen area, age-related macular degeneration (AMD) categories, and AMD severity scale.
Subjects
Macular Degeneration , Genetic Models , Humans , Phenotype , Macular Degeneration/genetics , Computer Simulation , Linear Models
ABSTRACT
ATPase, class 1, type 8 A, member 2 (ATP8A2) is a P4-ATPase with a critical role in phospholipid translocation across the plasma membrane. Pathogenic variants in ATP8A2 are known to cause cerebellar ataxia, impaired intellectual development, and disequilibrium syndrome 4 (CAMRQ4) which is often associated with encephalopathy, global developmental delay, and severe motor deficits. Here, we present a family with two siblings born from a consanguineous, first-cousin union from Sudan presenting with global developmental delay, intellectual disability, spasticity, ataxia, nystagmus, and thin corpus callosum. Whole exome sequencing revealed a homozygous missense variant in the nucleotide binding domain of ATP8A2 (p.Leu538Pro) that results in near complete loss of protein expression. This is in line with other missense variants in the same domain leading to protein misfolding and loss of ATPase function. In addition, by performing diffusion-weighted imaging, we identified bilateral hyperintensities in the posterior limbs of the internal capsule suggesting possible microstructural changes in axon tracts that had not been appreciated before and could contribute to the sensorimotor deficits in these individuals.
ABSTRACT
Common genetic variation throughout the genome, together with rare coding variants identified to date, explains about half of the inherited genetic component of epithelial ovarian cancer risk. It is likely that rare variation in the non-coding genome will explain some of the unexplained heritability, but identifying such variants is challenging. The primary problem is a lack of statistical power to identify individual risk variants by association, as power is a function of sample size, effect size and allele frequency. Power can be increased by using burden tests, which test for association with carrying any variant in a specified genomic region. This has the effect of increasing the putative effect allele frequency. PAX8 is a transcription factor that plays a critical role in tumour progression, migration and invasion. Furthermore, regulatory elements proximal to target genes of PAX8 are enriched for common ovarian cancer risk variants. We hypothesised that rare variation in PAX8 binding sites is also associated with ovarian cancer risk, but is unlikely to be associated with risk of breast, colorectal or endometrial cancer. We used publicly available whole-genome sequencing data from the UK 100,000 Genomes Project to evaluate the burden of rare variation in PAX8 binding sites across the genome. Data were available for 522 ovarian cancers, 2,984 breast cancers, 2,696 colorectal cancers, 836 endometrial cancers and 2,253 non-cancer controls. Active binding sites were defined using data from multiple PAX8 and H3K27 ChIPseq experiments. We found no association between the burden of rare variation in PAX8 binding sites (defined in several ways) and risk of ovarian, breast or endometrial cancer. An apparent association with colorectal cancer was likely to be a technical artefact, as a similar association was also detected for rare variation in random regions of the genome.
Despite the null result, this study provides a proof-of-principle for using burden testing to identify rare, non-coding germline genetic variation associated with disease. Larger sample sizes available from large-scale sequencing projects, together with improved understanding of the function of the non-coding genome, will increase the potential of similar studies in the future.
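The burden-test idea described in this abstract, collapsing every rare variant in a region into a single carrier indicator per person and then comparing carrier frequencies between cases and controls, can be sketched as follows. The two-sided Fisher exact test is used here as one simple choice of test and is an assumption; the abstract does not say which statistic the authors applied. The carrier counts are hypothetical, while the group sizes (522 ovarian cancers, 2,253 controls) come from the text.

```python
from math import comb


def carrier_status(genotypes):
    """Collapse per-variant alt-allele counts into one carrier flag each.

    genotypes: one list per individual, holding alt-allele counts at the
    rare variants inside the region of interest.
    """
    return [any(g > 0 for g in person) for person in genotypes]


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x carriers among the first row.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed.
    return sum(
        p for p in (prob(x) for x in range(low, high + 1))
        if p <= observed * (1 + 1e-9)
    )


print(carrier_status([[0, 0], [0, 1], [2, 0]]))

# Hypothetical carrier counts: 30 of 522 cases, 100 of 2,253 controls.
print(fisher_exact_two_sided(30, 522 - 30, 100, 2253 - 100))
```

Collapsing raises the frequency of the tested exposure from that of a single rare allele to that of carrying any qualifying variant, which is the power gain the abstract refers to.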
ABSTRACT
Precise interpretation of the effects of rare protein-truncating variants (PTVs) is important for accurate determination of variant impact. Current methods for assessing the ability of PTVs to induce nonsense-mediated decay (NMD) focus primarily on the position of the variant in the transcript. We used RNA sequencing of the Genotype Tissue Expression v.8 cohort to compute the efficiency of NMD using allelic imbalance for 2,320 rare (genome aggregation database minor allele frequency ≤ 1%) PTVs across 809 individuals in 49 tissues. We created an interpretable predictive model using penalized logistic regression in order to evaluate the comprehensive influence of variant annotation, tissue, and inter-individual variation on NMD. We found that variant position, allele frequency, the inclusion of ultra-rare and singleton variants, and conservation were predictive of allelic imbalance. Furthermore, we found that NMD effects were highly concordant across tissues and individuals. Due to this high consistency, we demonstrate in silico that utilizing peripheral tissues or cell lines provides accurate prediction of NMD for PTVs.
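Allelic imbalance at a heterozygous PTV can be summarized from RNA-seq read counts. The sketch below computes the fraction of reads carrying the PTV allele and a two-sided exact binomial test against balanced expression (0.5). Treating a reduced PTV-allele fraction as a sign of NMD is the usual reading of such data, but the exact efficiency measure used in the study is not given in the abstract, so the function names and the read counts here are hypothetical.

```python
from math import comb


def alt_allele_fraction(ref_reads, alt_reads):
    """Fraction of reads that carry the alternate (PTV) allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / total


def binomial_two_sided(alt_reads, total, p=0.5):
    """Exact two-sided binomial p value for observing alt_reads of total."""
    probs = [
        comb(total, k) * p ** k * (1 - p) ** (total - k)
        for k in range(total + 1)
    ]
    observed = probs[alt_reads]
    # Sum the probabilities of outcomes no more likely than the observed.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


# Hypothetical read counts at one heterozygous PTV in one tissue.
ref, alt = 30, 10
print(alt_allele_fraction(ref, alt))
print(binomial_two_sided(alt, ref + alt))
```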