1.
Mol Syst Biol ; 20(4): 362-373, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38355920

ABSTRACT

Unraveling the genetic sources of gene expression variation is essential to better understand the origins of phenotypic diversity in natural populations. Genome-wide association studies have identified thousands of variants involved in gene expression variation; however, the detected variants explain only part of the heritability. In fact, variants such as low-frequency and structural variants (SVs) are poorly captured in association studies. To assess the impact of these variants on gene expression variation, we explored a half-diallel panel composed of 323 hybrids originating from pairwise crosses of 26 natural Saccharomyces cerevisiae isolates. Using short- and long-read sequencing strategies, we established an exhaustive catalog of single nucleotide polymorphisms (SNPs) and SVs for this panel. Combining this dataset with the transcriptomes of all hybrids, we comprehensively mapped SNPs and SVs associated with gene expression variation. While SVs impact gene expression variation, SNPs exhibit a higher effect size with an overrepresentation of low-frequency variants compared to common ones. These results reinforce the importance of dissecting the heritability of complex traits with a comprehensive catalog of genetic variants at the population level.


Subject(s)
Genome-Wide Association Study , Saccharomyces cerevisiae , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Gene Expression , Polymorphism, Single Nucleotide/genetics , Genetic Variation
2.
Hum Reprod ; 2024 May 30.
Article in English | MEDLINE | ID: mdl-38815977

ABSTRACT

STUDY QUESTION: Can a genome-wide association study (GWAS) meta-analysis, including a large sample of young premenopausal women from a founder population from Northern Finland, identify novel genetic variants for circulating anti-Müllerian hormone (AMH) levels and provide insights into single-nucleotide polymorphism enrichment in different biological pathways and tissues involved in AMH regulation? SUMMARY ANSWER: The meta-analysis identified a total of six loci associated with AMH levels at P < 5 × 10⁻⁸, three of which were novel in or near CHEK2, BMP4, and EIF4EBP1, as well as highlighted significant enrichment in renal system vasculature morphogenesis, and the pituitary gland as the top associated tissue in tissue enrichment analysis. WHAT IS KNOWN ALREADY: AMH is expressed by preantral and small antral stage ovarian follicles in women, and variation in age-specific circulating AMH levels has been associated with several health conditions. However, the biological mechanisms underlying the association between health conditions and AMH levels are not yet fully understood. Previous GWAS have identified loci associated with AMH levels in pre-menopausal women, in or near MCM8, AMH, TEX41, and CDCA7. STUDY DESIGN, SIZE, DURATION: We performed a GWAS meta-analysis for circulating AMH level measurements in 9668 pre-menopausal women. PARTICIPANTS/MATERIALS, SETTING, METHODS: We performed a GWAS meta-analysis in which we combined 2619 AMH measurements (at age 31 years) from a prospective founder population cohort (Northern Finland Birth Cohort 1966, NFBC1966) with a previous GWAS meta-analysis that included 7049 pre-menopausal women (age range 15-48 years) (N = 9668). NFBC1966 AMH measurements were quantified using an automated assay.
We annotated the genetic variants, combined different data layers to prioritize potential candidate genes, described significant pathways and tissues enriched by the GWAS signals, identified plausible regulatory roles using colocalization analysis, and leveraged publicly available summary statistics to assess genetic and phenotypic correlations with multiple traits. MAIN RESULTS AND THE ROLE OF CHANCE: Three novel genome-wide significant loci were identified. One of these is in complete linkage disequilibrium with c.1100delC in CHEK2, which is found to be 4-fold enriched in the Finnish population compared to other European populations. We propose a plausible regulatory effect of some of the GWAS variants linked to AMH, as they colocalize with GWAS signals associated with gene expression levels of BMP4, TEX41, and EIF4EBP1. Gene set analysis highlighted significant enrichment in renal system vasculature morphogenesis, and tissue enrichment analysis ranked the pituitary gland as the top association. LARGE SCALE DATA: The GWAS meta-analysis summary statistics are available for download from the GWAS Catalogue with accession number GCST90428625. LIMITATIONS, REASONS FOR CAUTION: This study only included women of European ancestry and the lack of sufficiently sized relevant tissue data in gene expression datasets hinders the assessment of potential regulatory effects in reproductive tissues. WIDER IMPLICATIONS OF THE FINDINGS: Our results highlight the increased power of founder populations and larger sample sizes to boost the discovery of novel trait-associated variants underlying variation in AMH levels, which aided the characterization of GWAS signals enrichment in different biological pathways and plausible genetic regulatory effects linked with AMH level variation for the first time.
STUDY FUNDING/COMPETING INTEREST(S): This work has received funding from the European Union's Horizon 2020 Research and Innovation Programme under the MATER Marie Sklodowska-Curie Grant Agreement No. 813707 and Oulu University Scholarship Foundation and Paulon Säätiö Foundation (N.P.-G.), Academy of Finland, Sigrid Jusélius Foundation, Novo Nordisk, University of Oulu, Roche Diagnostics (T.T.P.). This work was supported by the Estonian Research Council Grant 1911 (R.M.). J.R. was supported by the European Union's Horizon 2020 Research and Innovation Program under Grant Agreements No. 874739 (LongITools), 824989 (EUCAN-Connect), 848158 (EarlyCause), and 733206 (LifeCycle). U.V. was supported by the Estonian Research Council grant PRG (PRG1291). The NFBC1966 received financial support from University of Oulu Grant No. 24000692, Oulu University Hospital Grant No. 24301140, and ERDF European Regional Development Fund Grant No. 539/2010 A31592. T.T.P. has received grants from Roche, Perkin Elmer, and honoraria for scientific presentations from Gedeon Richter, Exeltis, Astellas, Roche, Stragen, Astra Zeneca, Merck, MSD, Ferring, Duodecim, and Ajaton Terveys. For all other authors, there are no competing interests.

3.
Int J Mol Sci ; 24(18)2023 Sep 07.
Article in English | MEDLINE | ID: mdl-37762110

ABSTRACT

Whole-exome sequencing (WES) in families with an unexplained tendency for venous thromboembolism (VTE) may favor detection of low-frequency variants in genes with known contribution to hemostasis or associated with VTE-related phenotypes. WES analysis in six family members, three of whom were affected by documented VTE, filtered for MAF < 0.04 in 192 candidate genes, revealed 22 heterozygous (16 missense and six synonymous) variants in patients. Functional prediction by multi-component bioinformatics tools, implemented by a database/literature search, including ClinVar annotation and QTL analysis, prioritized 12 missense variants, three of which (CRP Leu61Pro, F2 Asn514Lys and NQO1 Arg139Trp) were present in all patients, and the frequent functional variants FGB Arg478Lys and IL1A Ala114Ser. Combinations of prioritized variants in each patient were used to infer functional protein interactions. Different interaction patterns, supported by high-quality evidence, included eight proteins intertwined in the "acute phase" (CRP, F2, SERPINA1 and IL1A) and/or in the "fibrinogen complex" (CRP, F2, PLAT, THBS1, VWF and FGB) significantly enriched terms. In a wide group of candidate genes, this approach highlighted six low-frequency variants (CRP Leu61Pro, F2 Asn514Lys, SERPINA1 Arg63Cys, THBS1 Asp901Glu, VWF Arg1399His and PLAT Arg164Trp), five of which were top ranked for predicted deleteriousness, which in different combinations may contribute to disease susceptibility in members of this family.


Subject(s)
Venous Thromboembolism , Humans , Venous Thromboembolism/genetics , Exome Sequencing , von Willebrand Factor/genetics , Genes, Regulator , Computational Biology
4.
Adv Exp Med Biol ; 1361: 37-54, 2022.
Article in English | MEDLINE | ID: mdl-35230682

ABSTRACT

Re-sequencing of the human genome by next-generation sequencing (NGS) has been widely applied to discover pathogenic genetic variants and/or causative genes accounting for various types of diseases including cancers. The advances in NGS have allowed the sequencing of the entire genome of patients and identification of disease-associated variants in a reasonable timeframe and cost. The core of the variant identification relies on accurate variant calling and annotation. Numerous algorithms have been developed to elucidate the repertoire of somatic and germline variants. Each algorithm has its own distinct strengths, weaknesses, and limitations due to the difference in the statistical modeling approach adopted and read information utilized. Accurate variant calling remains challenging due to the presence of sequencing artifacts and read misalignments. All of these can lead to the discordance of the variant calling results and even misinterpretation of the discovery. For somatic variant detection, multiple factors including chromosomal abnormalities, tumor heterogeneity, tumor-normal cross contaminations, unbalanced tumor/normal sample coverage, and variants with low allele frequencies add even more layers of complexity to accurate variant identification. Given the discordances and difficulties, ensemble approaches have emerged by harmonizing information from different algorithms to improve variant calling performance. In this chapter, we first introduce the general scheme of variant calling algorithms and potential challenges at distinct stages. We next review the existing workflows of variant calling and annotation, and finally explore the strategies deployed by different callers as well as their strengths and caveats. Overall, NGS-based variant identification with careful consideration allows reliable detection of pathogenic variants and selection of candidate variants for precision medicine.


Subject(s)
Genome, Human , High-Throughput Nucleotide Sequencing , Algorithms , Germ Cells , High-Throughput Nucleotide Sequencing/methods , Humans , Models, Statistical , Software
5.
Curr Issues Mol Biol ; 43(3): 1778-1793, 2021 Oct 27.
Article in English | MEDLINE | ID: mdl-34889895

ABSTRACT

Multiple Sclerosis (MS) is a complex multifactorial autoimmune disease, whose sex- and age-adjusted prevalence in Sardinia (Italy) is among the highest worldwide. To date, 233 loci have been associated with MS and almost 20% of risk heritability is attributable to common genetic variants, but many low-frequency and rare variants remain to be discovered. Here, we aimed to contribute to the understanding of the genetic basis of MS by investigating potentially functional rare variants. To this end, we analyzed thirteen multiplex Sardinian families with Immunochip genotyping data. For five families, Whole Exome Sequencing (WES) data were also available. Firstly, we performed a non-parametric Homozygosity Haplotype analysis for identifying the Region from Common Ancestor (RCA). Then, on these potential disease-linked RCA, we searched for the presence of rare variants shared by the affected individuals by analyzing WES data. We found: (i) a variant (43181034 T > G) in the splicing region on exon 27 of CUL9; (ii) a variant (50245517 A > C) in the splicing region on exon 16 of ATP9A; (iii) a non-synonymous variant (43223539 A > C) on exon 9 of TTBK1; (iv) a non-synonymous variant (42976917 A > C) on exon 9 of PPP2R5D; and (v) a variant (109859349-109859354) in the 3'UTR of MYO16.


Subject(s)
Exome Sequencing , Genetic Predisposition to Disease , Genetic Variation , Haplotypes , Homozygote , Multiple Sclerosis/diagnosis , Multiple Sclerosis/genetics , Alleles , Female , Genome-Wide Association Study , Humans , Italy , Male , Pedigree , Polymorphism, Single Nucleotide
6.
Mol Cell Biochem ; 476(7): 2703-2718, 2021 Jul.
Article in English | MEDLINE | ID: mdl-33666829

ABSTRACT

The zinc transporter 8 (ZnT8) plays an essential role in zinc homeostasis inside pancreatic β cells; its function is related to the stabilization of the insulin hexameric form. Genome-wide association studies (GWAS) have established both positive and negative relationships of ZnT8 variants with type 2 diabetes mellitus (T2DM), exposing a dual and controversial role. The first hypotheses about its role in T2DM indicated a higher risk of developing T2DM for loss of function; nevertheless, recent GWAS of ZnT8 loss-of-function mutations in humans have shown protection against T2DM. With regard to the ZnT8 role in T2DM, most studies have focused on rodent models and common high-risk variants; however, considerable differences between human and rodent models have been found and the new approaches have included lower-frequency variants as a tool to clarify gene functions, allowing a better understanding of the disease and offering possible therapeutic targets. Therefore, this review will discuss the physiological effects of the ZnT8 variants associated with a higher and lower risk of T2DM, emphasizing the low-frequency and rare variants.


Subject(s)
Diabetes Mellitus, Type 2 , Zinc Transporter 8 , Animals , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/metabolism , Genome-Wide Association Study , Humans , Zinc Transporter 8/deficiency , Zinc Transporter 8/metabolism
7.
Int J Mol Sci ; 22(18)2021 Sep 13.
Article in English | MEDLINE | ID: mdl-34576031

ABSTRACT

TREM2 is among the most well-known Alzheimer's disease (AD) risk genes; however, the functional roles of its AD-associated variants remain to be elucidated, and most known risk alleles are low-frequency variants whose investigation is challenging. Here, we utilized a splicing-guided aggregation method in which multiple low-frequency TREM2 variants were bundled together to investigate the functional impact of those variants on alternative splicing in AD. We analyzed whole genome sequencing (WGS) and RNA-seq data generated from cognitively normal elderly controls (CN) and AD patients in two independent cohorts, representing three regions in the frontal lobe of the human brain: the dorsolateral prefrontal cortex (CN = 213 and AD = 376), frontal pole (CN = 72 and AD = 175), and inferior frontal (CN = 63 and AD = 157). We observed an exon skipping event in the second exon of TREM2, with that exon tending to be more frequently skipped (p = 0.0012) in individuals having at least one low-frequency variant that caused loss-of-function for a splicing regulatory element. In addition, genes differentially expressed between AD patients with high vs. low skipping of the second exon (i.e., loss of a TREM2 functional domain) were significantly enriched in immune-related pathways. Our splicing-guided aggregation method thus provides new insight into the regulation of alternative splicing of the second exon of TREM2 by low-frequency variants and could be a useful tool for further exploring the potential molecular mechanisms of multiple, disease-associated, low-frequency variants.


Subject(s)
Alternative Splicing/genetics , Alzheimer Disease/genetics , Genetic Predisposition to Disease , Membrane Glycoproteins/genetics , Receptors, Immunologic/genetics , Aged , Alzheimer Disease/pathology , Brain/metabolism , Brain/pathology , Exons/genetics , Female , Gene Frequency/genetics , Genetic Variation/genetics , Humans , Male , RNA Splicing/genetics , RNA-Seq , Regulatory Sequences, Nucleic Acid/genetics , Whole Genome Sequencing
8.
Zhonghua Zhong Liu Za Zhi ; 43(7): 801-805, 2021 Jul 23.
Article in Chinese | MEDLINE | ID: mdl-34289576

ABSTRACT

Objective: To analyze the association between low-frequency variants of the ARID1A gene and primary liver cancer using a latent class model. Methods: The low-frequency variants of the ARID1A gene were combined according to functional region, and the combined variables were analyzed with the latent class model to obtain the latent variables. Logistic regression was then used to analyze the association between low-frequency variants of the ARID1A gene and primary liver cancer. Results: The low-frequency variants of the ARID1A gene were divided into three classes by the latent class model. Class 1 was mainly the unmutated population, accounting for 94.2% (2 454/2 603). Class 2 mainly carried transcriptional regulatory domain mutations, accounting for 4.8% (124/2 603). Class 3 predominantly carried exon mutations, accounting for about 1.0% (27/2 603). Using class 1 as the reference, mutations in the transcriptional regulatory domain were associated with a reduced risk of liver cancer (OR=0.601, 95% CI=0.364-0.992, P=0.046). Conclusion: The latent class model can identify low-frequency gene variants associated with liver cancer and can be extended to more genetic association studies of low-frequency variants related to complex diseases.
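
The reported odds ratio, confidence interval, and P value can be checked against one another. The sketch below recovers the standard error of the log odds ratio from the interval and derives a Wald z statistic and two-sided P value. It assumes the interval is a symmetric Wald interval on the log-odds scale, which the abstract does not state, and the function name `wald_p_from_ci` is purely illustrative.

```python
import math
from statistics import NormalDist


def wald_p_from_ci(odds_ratio, ci_low, ci_high, level=0.95):
    """Recover SE(log OR) from a CI and return (se, z, two-sided p).

    Assumes a symmetric Wald interval on the log-odds scale.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return se, z, p


# Values taken from the abstract: OR=0.601, 95% CI=0.364-0.992
se, z, p = wald_p_from_ci(0.601, 0.364, 0.992)
print(f"SE = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
```

Under that assumption the recomputed P value lands close to the reported P=0.046, so the three reported numbers are mutually consistent.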


Subject(s)
Liver Neoplasms , Nuclear Proteins , DNA-Binding Proteins , Humans , Latent Class Analysis , Liver Neoplasms/genetics , Mutation , Nuclear Proteins/genetics , Transcription Factors/genetics
9.
BMC Bioinformatics ; 21(1): 96, 2020 Mar 04.
Article in English | MEDLINE | ID: mdl-32131723

ABSTRACT

BACKGROUND: Duplex sequencing is the most accurate approach for identification of sequence variants present at very low frequencies. Its power comes from pooling together multiple descendants of both strands of original DNA molecules, which allows distinguishing true nucleotide substitutions from PCR amplification and sequencing artifacts. This strategy comes at a cost: sequencing the same molecule multiple times increases dynamic range but significantly diminishes coverage, making whole genome duplex sequencing prohibitively expensive. Furthermore, every duplex experiment produces a substantial proportion of singleton reads that cannot be used in the analysis and are thrown away. RESULTS: In this paper we demonstrate that a significant fraction of these reads contains PCR or sequencing errors within duplex tags. Correction of such errors allows "reuniting" these reads with their respective families, increasing the output of the method and making it more cost effective. CONCLUSIONS: We combine an error correction strategy with a number of algorithmic improvements in a new version of the duplex analysis software, Du Novo 2.0. It is written in Python, C, AWK, and Bash. It is open source and readily available through Galaxy, Bioconda, and Github: https://github.com/galaxyproject/dunovo.
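
The idea of "reuniting" singleton reads with their families can be illustrated with a toy tag-correction routine. This is a minimal sketch and not Du Novo's actual algorithm: it assumes tags are plain strings, that one mismatch is the allowed error, and that a singleton is reassigned only when exactly one larger family is within that distance. The names `hamming` and `reunite_singletons` are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length tags."""
    return sum(x != y for x, y in zip(a, b))


def reunite_singletons(tags, max_dist=1):
    """Map each singleton tag to a unique larger family within max_dist.

    tags: list of tag strings, one per read. Returns the corrected list.
    A singleton is reassigned only if exactly one non-singleton tag is
    close enough, so ambiguous cases are left untouched.
    """
    counts = Counter(tags)
    families = [t for t, n in counts.items() if n > 1]
    corrected = []
    for tag in tags:
        if counts[tag] == 1:
            near = [f for f in families
                    if len(f) == len(tag) and hamming(f, tag) <= max_dist]
            if len(near) == 1:
                tag = near[0]
        corrected.append(tag)
    return corrected


# Illustrative tags: "ACGTAA" differs from the "ACGTAC" family at one base.
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGTAA", "TTGCAT", "TTGCAT", "GGGGGG"]
fixed = reunite_singletons(reads)
print(Counter(fixed))
```

In this example the read tagged "ACGTAA" joins the "ACGTAC" family, while "GGGGGG" stays a singleton because no family is within one mismatch.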


Subject(s)
User-Computer Interface , Algorithms , DNA/chemistry , DNA/metabolism , Humans , Sequence Alignment , Sequence Analysis, DNA
10.
Breast Cancer Res Treat ; 167(1): 249-256, 2018 01.
Article in English | MEDLINE | ID: mdl-28913729

ABSTRACT

PURPOSE: Anthracyclines are widely used chemotherapeutic drugs that can cause progressive and irreversible cardiac damage and fatal heart failure. Several genetic variants associated with anthracycline-induced cardiotoxicity (AIC) have been identified, but they explain only a small proportion of the interindividual differences in AIC susceptibility. METHODS: In this study, we evaluated the association of low-frequency variants with risk of chronic AIC using the Illumina HumanExome BeadChip array in a discovery cohort of 61 anthracycline-treated breast cancer patients with replication in a second independent cohort of 83 anthracycline-treated pediatric cancer patients, using gene-based tests (SKAT-O). RESULTS: The most significant associated gene in the discovery cohort was ETFB (electron transfer flavoprotein beta subunit) involved in mitochondrial β-oxidation and ATP production (P = 4.16 × 10⁻⁴) and this association was replicated in an independent set of anthracycline-treated cancer patients (P = 2.81 × 10⁻³). Within ETFB, we found that the missense variant rs79338777 (p.Pro52Leu; c.155C > T) made the greatest contribution to the observed gene association and it was associated with increased risk of chronic AIC in the two cohorts separately and when combined (OR 9.00, P = 1.95 × 10⁻⁴, 95% CI 2.83-28.6). CONCLUSIONS: We identified and replicated a novel gene, ETFB, strongly associated with chronic AIC independently of age at tumor onset and related to anthracycline-mediated mitochondrial dysfunction. Although experimental verification and further studies in larger patient cohorts are required to confirm our finding, we demonstrated that exome array data analysis represents a valuable strategy to identify novel genes contributing to the susceptibility to chronic AIC.


Subject(s)
Anthracyclines/adverse effects , Breast Neoplasms/genetics , Cardiotoxicity/genetics , Electron-Transferring Flavoproteins/genetics , Adult , Aged , Breast Neoplasms/complications , Breast Neoplasms/drug therapy , Breast Neoplasms/pathology , Cancer Survivors , Cardiotoxicity/physiopathology , Exome/genetics , Female , Gene Expression Regulation, Neoplastic , Genetic Association Studies , Genetic Predisposition to Disease , Heart Failure/chemically induced , Heart Failure/genetics , Heart Failure/pathology , Humans , Middle Aged , Mitochondria/drug effects , Mitochondria/pathology
11.
Genet Epidemiol ; 40(5): 404-15, 2016 07.
Article in English | MEDLINE | ID: mdl-27230302

ABSTRACT

Studying gene-environment (G × E) interactions is important, as they extend our knowledge of the genetic architecture of complex traits and may help to identify novel variants not detected via analysis of main effects alone. The main statistical framework for studying G × E interactions uses a single regression model that includes both the genetic main and G × E interaction effects (the "joint" framework). The alternative "stratified" framework combines results from genetic main-effect analyses carried out separately within the exposed and unexposed groups. Although there have been several investigations using theory and simulation, an empirical comparison of the two frameworks is lacking. Here, we compare the two frameworks using results from genome-wide association studies of systolic blood pressure for 3.2 million low frequency and 6.5 million common variants across 20 cohorts of European ancestry, comprising 79,731 individuals. Our cohorts have sample sizes ranging from 456 to 22,983 and include both family-based and population-based samples. In cohort-specific analyses, the two frameworks provided similar inference for population-based cohorts. The agreement was reduced for family-based cohorts. In meta-analyses, agreement between the two frameworks was less than that observed in cohort-specific analyses, despite the increased sample size. In meta-analyses, agreement depended on (1) the minor allele frequency, (2) inclusion of family-based cohorts in meta-analysis, and (3) filtering scheme. The stratified framework appears to approximate the joint framework well only for common variants in population-based cohorts. We conclude that the joint framework is the preferred approach and should be used to control false positives when dealing with low-frequency variants and/or family-based cohorts.
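
The relationship between the two frameworks can be written out for a binary exposure. The notation below is an assumption introduced for illustration, since the abstract gives no formulas, and the standard-error expression assumes the exposed and unexposed strata are independent, which does not hold automatically in family-based cohorts.

```latex
% Joint framework: one regression with genetic main and interaction effects
\[
  \mathbb{E}[Y \mid G, E] = \beta_0 + \beta_G G + \beta_E E + \beta_{GE}\, G E ,
  \qquad E \in \{0, 1\}.
\]
% Stratified framework: separate main-effect regressions per exposure group
\[
  \mathbb{E}[Y \mid G, E = e] = \alpha_e + \gamma_e G , \qquad e = 0, 1 .
\]
% Mapping from stratified estimates to the joint parameters
\[
  \hat\beta_G = \hat\gamma_0 , \qquad
  \hat\beta_{GE} = \hat\gamma_1 - \hat\gamma_0 , \qquad
  \widehat{\mathrm{SE}}\bigl(\hat\beta_{GE}\bigr)
    = \sqrt{\widehat{\mathrm{SE}}(\hat\gamma_0)^2 + \widehat{\mathrm{SE}}(\hat\gamma_1)^2} .
\]
```

Here Y is the trait (systolic blood pressure in the study), G the genotype, and E the exposure indicator; the stratified interaction estimate is the difference between the exposed and unexposed genetic effects.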


Subject(s)
Blood Pressure/genetics , Gene-Environment Interaction , Smoking , Cohort Studies , Databases, Factual , Family , Gene Frequency , Genome-Wide Association Study , Genotype , Humans , Phenotype
12.
Annu Rev Genomics Hum Genet ; 15: 295-325, 2014.
Article in English | MEDLINE | ID: mdl-24821496

ABSTRACT

Human immunodeficiency virus (HIV) exhibits remarkable diversity in its genomic makeup and exists in any given individual as a complex distribution of closely related but nonidentical genomes called a viral quasispecies, which is subject to genetic variation, competition, and selection. This viral diversity clinically manifests as a selection of mutant variants based on viral fitness in treatment-naive individuals and based on drug-selective pressure in those on antiretroviral therapy (ART). The current standard-of-care ART consists of a combination of antiretroviral agents, which ensures maximal viral suppression while preventing the emergence of drug-resistant HIV variants. Unfortunately, transmission of drug-resistant HIV does occur, affecting 5% to >20% of newly infected individuals. To optimize therapy, clinicians rely on viral genotypic information obtained from conventional population sequencing-based assays, which cannot reliably detect viral variants that constitute <20% of the circulating viral quasispecies. These low-frequency variants can be detected by highly sensitive genotyping methods collectively grouped under the moniker of deep sequencing. Low-frequency variants have been correlated to treatment failures and HIV transmission, and detection of these variants is helping to inform strategies for vaccine development. Here, we discuss the molecular virology of HIV, viral heterogeneity, drug-resistance mutations, and the application of deep sequencing technologies in research and the clinical care of HIV-infected individuals.


Subject(s)
Genome, Viral , HIV Infections/genetics , HIV/genetics , High-Throughput Nucleotide Sequencing , Drug Resistance, Viral/genetics , HIV/chemistry , HIV/pathogenicity , HIV Infections/drug therapy , HIV Infections/epidemiology , HIV Infections/transmission , HIV Infections/virology , Humans , Mutation
13.
Mol Carcinog ; 56(2): 774-780, 2017 02.
Article in English | MEDLINE | ID: mdl-27479355

ABSTRACT

Genome-wide association studies have reported more than 100 independent common loci associated with breast cancer risk. The contribution of low-frequency or rare variants to breast cancer susceptibility has not been well explored. Thus, we applied exome chip to genotype >200 000 low-frequency and rare variants in 1064 breast cancer cases and 1125 cancer-free controls and subsequently validated promising associations in another 1040 breast cancer cases and 1240 controls. We identified two low-frequency nonsynonymous variants at FKBPL (rs200847762, OR = 0.34, 95% CI = 0.20-0.57, P = 4.31 × 10⁻⁵) and ARPC1B (rs1045012, OR = 0.56, 95% CI = 0.43-0.74, P = 4.30 × 10⁻⁵) associated with breast cancer risk. In stratification analyses, we found that the protective effect of rs200847762 was stronger in ER-positive breast cancer (OR = 0.18, 95% CI = 0.06-0.42) than that in ER-negative one (OR = 0.59, 95% CI = 0.31-1.05). Our findings indicate that low-frequency variants may also contribute to breast cancer susceptibility and genetic variants in 6p21.33 and 7q22.1 are important in breast carcinogenesis. © 2016 Wiley Periodicals, Inc.


Subject(s)
Actin-Related Protein 2-3 Complex/genetics , Breast Neoplasms/genetics , Immunophilins/genetics , Polymorphism, Genetic , Adult , Asian People/genetics , Breast/pathology , Breast Neoplasms/diagnosis , Breast Neoplasms/epidemiology , Case-Control Studies , China/epidemiology , Female , Gene Frequency , Genetic Predisposition to Disease , Genome-Wide Association Study , Genotype , Humans , Middle Aged , Tacrolimus Binding Proteins
14.
Diabetologia ; 59(5): 938-41, 2016 May.
Article in English | MEDLINE | ID: mdl-26993633

ABSTRACT

Over the last 10 years substantial progress has been made in our understanding of the genetic basis for type 2 diabetes and related traits. These developments have been facilitated by technological advancements that have allowed comprehensive genome-wide assessments of the impact of common genetic variation on disease risk. Current efforts are now focused on extending this to genetic variants in the rare and low-frequency spectrum by capitalising on next-generation sequencing technologies. This review discusses the important contributions that studies in isolated populations are making to this effort for diabetes and metabolic disease, drawing on specific examples from populations in Greece and Greenland. This review summarises a presentation given at the 'Exciting news in genetics of diabetes' symposium at the 2015 annual meeting of the EASD, with topics presented by Eleftheria Zeggini and Torben Hansen, and an overview by the Session Chair, Anna Gloyn.


Subject(s)
Diabetes Mellitus, Type 2/genetics , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study , Greece , Greenland , High-Throughput Nucleotide Sequencing , Humans
15.
Genet Epidemiol ; 39(7): 499-508, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26454253

ABSTRACT

Although genome-wide association studies (GWAS) have identified thousands of trait-associated genetic variants, there are relatively few findings on the X chromosome. For analysis of low-frequency variants (minor allele frequency <5%), investigators can use region- or gene-based tests where multiple variants are analyzed jointly to increase power. To date, there are no gene-based tests designed for association testing of low-frequency variants on the X chromosome. Here we propose three gene-based tests for the X chromosome: burden, sequence kernel association test (SKAT), and optimal unified SKAT (SKAT-O). Using simulated case-control and quantitative trait (QT) data, we evaluate the calibration and power of these tests as a function of (1) male:female sample size ratio; and (2) coding of haploid male genotypes for variants under X-inactivation. For case-control studies, all three tests are reasonably well-calibrated for all scenarios we evaluated. As expected, power for gene-based tests depends on the underlying genetic architecture of the genomic region analyzed. Studies with more (haploid) males are generally less powerful due to decreased number of chromosomes. Power generally is slightly greater when the coding scheme for male genotypes matches the true underlying model, but the power loss for misspecifying the (generally unknown) model is small. For QT studies, type I error and power results largely mirror those for binary traits. We demonstrate the use of these three gene-based tests for X-chromosome association analysis in simulated data and sequencing data from the Genetics of Type 2 Diabetes (GoT2D) study.
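
To make the role of male genotype coding concrete, the sketch below computes a simple burden score per individual over low-frequency X-chromosome variants. It is an illustration under stated assumptions, not the authors' implementation: the function name `burden_scores`, the unweighted sum, and the `male_het_code` parameter are all hypothetical, and the actual burden, SKAT, and SKAT-O tests involve variant weights and a regression step that are omitted here.

```python
def burden_scores(genotypes, is_male, mafs, maf_cutoff=0.05, male_het_code=2):
    """Per-individual count of minor alleles over low-frequency variants.

    genotypes: list of per-individual lists of minor-allele counts
               (females 0/1/2; males 0/1 because they are haploid on X).
    is_male:   list of booleans, one per individual.
    mafs:      minor allele frequency of each variant.
    male_het_code: value given to a male carrying the minor allele;
               2 treats him like a homozygous female (a coding that
               matches X-inactivation), 1 counts each allele copy equally.
    """
    keep = [j for j, f in enumerate(mafs) if f < maf_cutoff]
    scores = []
    for row, male in zip(genotypes, is_male):
        weight = male_het_code if male else 1
        scores.append(sum(row[j] * weight for j in keep))
    return scores


# Three individuals, three variants; the third variant is common and dropped.
genotypes = [[1, 0, 2], [1, 1, 0], [0, 0, 1]]
is_male = [False, True, False]
mafs = [0.01, 0.03, 0.20]

scores_xi = burden_scores(genotypes, is_male, mafs, male_het_code=2)
scores_plain = burden_scores(genotypes, is_male, mafs, male_het_code=1)
print(scores_xi, scores_plain)
```

The two calls differ only in how the male carrier is scored, which is the coding choice whose misspecification the abstract reports as having a small effect on power.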


Subject(s)
Chromosomes, Human, X/genetics , Genetic Association Studies/methods , Genetic Variation/genetics , Calibration , Case-Control Studies , Diabetes Mellitus, Type 2/genetics , Female , Gene Frequency/genetics , Genome-Wide Association Study , Genotype , Humans , Male , Phenotype , Sample Size , X Chromosome Inactivation/genetics
16.
Behav Genet ; 46(5): 693-704, 2016 09.
Article in English | MEDLINE | ID: mdl-27085880

ABSTRACT

Common SNPs in nicotinic acetylcholine receptor genes (CHRN genes) have been associated with drug behaviors and personality traits, but the influence of rare genetic variants is not well characterized. The goal of this project was to identify novel rare variants in CHRN genes in the Center for Antisocial Drug Dependence (CADD) and Genetics of Antisocial Drug Dependence (GADD) samples and to determine if low frequency variants are associated with antisocial drug dependence. Two samples of 114 and 200 individuals were selected using a case/control design including the tails of the phenotypic distribution of antisocial drug dependence. The capture, sequencing, and analysis of all variants in 16 CHRN genes (CHRNA1-7, 9, 10, CHRNB1-4, CHRND, CHRNG, CHRNE) were performed independently for each subject in each sample. Sequencing reads were aligned to the human reference sequence using BWA prior to variant calling with the Genome Analysis ToolKit (GATK). Low frequency variants (minor allele frequency < 0.05) were analyzed using SKAT-O and C-alpha to examine the distribution of rare variants among cases and controls. In our larger sample, the region containing the CHRNA6/CHRNB3 gene cluster was significantly associated with disease status using both SKAT-O and C-alpha (unadjusted p values <0.05). More low frequency variants in the CHRNA6/CHRNB3 gene region were observed in cases compared to controls. These data support a role for genetic variants in CHRN genes and antisocial drug behaviors.


Subject(s)
Gene Frequency/genetics , Genetic Association Studies , Genetic Predisposition to Disease , Receptors, Nicotinic/genetics , Adolescent , Antisocial Personality Disorder , Female , Humans , Male , Quality Control , Software , Young Adult
17.
Genet Epidemiol ; 37(6): 539-50, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23788246

ABSTRACT

In genome-wide association studies of binary traits, investigators typically use logistic regression to test common variants for disease association within studies, and combine association results across studies using meta-analysis. For common variants, logistic regression tests are well calibrated, and meta-analysis of study-specific association results is only slightly less powerful than joint analysis of the combined individual-level data. In recent sequencing and dense chip based association studies, investigators increasingly test low-frequency variants for disease association. In this paper, we seek to (1) identify the association test with maximal power among tests with well controlled type I error rate and (2) compare the relative power of joint and meta-analysis tests. We use analytic calculation and simulation to compare the empirical type I error rate and power of four logistic regression based tests: Wald, score, likelihood ratio, and Firth bias-corrected. We demonstrate for low-count variants (roughly minor allele count [MAC] < 400) that: (1) for joint analysis, the Firth test has the best combination of type I error and power; (2) for meta-analysis of balanced studies (equal numbers of cases and controls), the score test is best, but is less powerful than Firth test based joint analysis; and (3) for meta-analysis of sufficiently unbalanced studies, all four tests can be anti-conservative, particularly the score test. We also establish MAC as the key parameter determining test calibration for joint and meta-analysis.


Subject(s)
Genetic Variation , Logistic Models , Models, Genetic , Calibration , Case-Control Studies , Computer Simulation , Diabetes Mellitus, Type 2/genetics , Gene Frequency , Humans , Meta-Analysis as Topic
18.
G3 (Bethesda) ; 14(6), 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38656424

ABSTRACT

Identifying genuine polymorphic variants is a significant challenge in sequence data analysis, although detecting low-frequency variants in sequence data is essential for estimating demographic parameters and investigating genetic processes, such as selection, within populations. Arbuscular mycorrhizal (AM) fungi are multinucleate organisms, in which individual nuclei collectively operate as a population, and the extent of genetic variation across nuclei has long been an area of scientific interest. In this study, we investigated the patterns of polymorphism discovery and the alternate allele frequency distribution by comparing polymorphism discovery in 2 distinct genomic sequence datasets of the AM fungus model species, Rhizophagus irregularis strain DAOM197198. The 2 datasets used in this study are publicly available and were generated either from pooled spores and hyphae or amplified single nuclei from a single spore. We also estimated the intraorganismal variation within the DAOM197198 strain. Our results showed that the 2 datasets exhibited different frequency patterns for discovered variants. The whole-organism dataset showed a distribution spanning low-, intermediate-, and high-frequency variants, whereas the single-nucleus dataset predominantly featured low-frequency variants with smaller proportions in intermediate and high frequencies. Furthermore, single nucleotide polymorphism density estimates within both the whole organism and individual nuclei confirmed the low intraorganismal variation of the DAOM197198 strain and that most variants are rare. Our study highlights the methodological challenges associated with detecting low-frequency variants in AM fungal whole-genome sequence data and demonstrates that alternate alleles can be reliably identified in single nuclei of AM fungi.


Subject(s)
Glomeromycota , Mycorrhizae , Mycorrhizae/genetics , Glomeromycota/genetics , Genome, Fungal , Polymorphism, Single Nucleotide , Gene Frequency , Genetic Variation , Cell Nucleus/genetics , Fungi
19.
bioRxiv ; 2024 Apr 28.
Article in English | MEDLINE | ID: mdl-38712066

ABSTRACT

The evolution of gene expression responses is a critical component of adaptation to variable environments. Predicting how DNA sequence influences expression is challenging because the genotype-to-phenotype map is not well resolved for cis-regulatory elements, transcription factor binding, regulatory interactions, and epigenetic features, not to mention how these factors respond to the environment. We tested whether flexible machine learning models could learn some of the underlying cis-regulatory genotype-to-phenotype map. We tested this approach using cold-responsive transcriptome profiles in 5 diverse Arabidopsis thaliana accessions. We first tested for evidence that cis regulation plays a role in environmental response, finding 14 and 15 motifs that were significantly enriched within the upstream and downstream regions of cold-responsive differentially regulated genes (DEGs). We next applied convolutional neural networks (CNNs), which learn de novo cis-regulatory motifs in DNA sequences to predict expression response to environment. We found that CNNs predicted differential expression with moderate accuracy, with evidence that predictions were hindered by the biological complexity of regulation and the large potential regulatory code. Overall, DEGs between specific environments can be predicted based on variation in cis-regulatory sequences, although more information needs to be incorporated and better models may be required.

20.
Ann Hum Genet ; 77(4): 333-5, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23488943

ABSTRACT

In a recent paper in this journal, the use of variance-stabilising transformation techniques was proposed to overcome the problem of inadequacy in normality approximation when testing association for a low-frequency variant in a case-control study. It was shown that tests based on the variance-stabilising transformations are more powerful than Fisher's exact test while controlling for type I error rate. Earlier in the journal, another study had shown that the likelihood ratio test (LRT) is superior to Fisher's exact test, Wald's test, and Pearson's χ² test in testing association for low-frequency variants. Thus, it is of interest to make a direct comparison between the LRT and the tests based on the variance-stabilising transformations. In this commentary, we show that the LRT and the variance-stabilising transformation-based tests have comparable power greater than Fisher's exact test, Wald's test, and Pearson's χ² test.
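For readers unfamiliar with the likelihood ratio test discussed in this commentary, here is a minimal sketch of an LRT (G statistic) on a 2×2 table of allele counts in cases and controls. The function name and table layout are assumptions for the example, and the p-value uses the standard asymptotic χ² reference distribution with 1 degree of freedom; it is not the commentary's own code and says nothing about relative power.

```python
import math
from typing import Tuple


def lrt_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Likelihood ratio (G) test of independence for a 2x2 table.

    Rows are cases (a, b) and controls (c, d); columns are minor and
    major allele counts. Returns (G statistic, asymptotic p-value).
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("empty table")
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # cells with a zero count contribute nothing
                expected = row_totals[i] * col_totals[j] / n
                g += 2.0 * o * math.log(o / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding error
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value


if __name__ == "__main__":
    # Hypothetical counts: 12 minor alleles among 1000 case chromosomes,
    # 3 among 1000 control chromosomes.
    print(lrt_2x2(12, 988, 3, 997))
```

With counts this small the asymptotic approximation is only rough, which is the motivation for the comparisons described in the abstract.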


Subject(s)
Gene Frequency , Genetic Variation , Models, Statistical , Humans