ABSTRACT
The existing framework of Mendelian randomization (MR) infers the causal effect of one or multiple exposures on one single outcome. It is not designed to jointly model multiple outcomes, as would be necessary to detect causes of more than one outcome and would be relevant to model multimorbidity or other related disease outcomes. Here, we introduce multi-response Mendelian randomization (MR2), an MR method specifically designed for multiple outcomes to identify exposures that cause more than one outcome or, conversely, exposures that exert their effect on distinct responses. MR2 uses a sparse Bayesian Gaussian copula regression framework to detect causal effects while estimating the residual correlation between summary-level outcomes, i.e., the correlation that cannot be explained by the exposures, and vice versa. We show both theoretically and in a comprehensive simulation study how unmeasured shared pleiotropy induces residual correlation between outcomes irrespective of sample overlap. We also reveal how non-genetic factors that affect more than one outcome contribute to their correlation. We demonstrate that by accounting for residual correlation, MR2 has higher power to detect shared exposures causing more than one outcome. It also provides more accurate causal effect estimates than existing methods that ignore the dependence between related responses. Finally, we illustrate how MR2 detects shared and distinct causal exposures for five cardiovascular diseases in two applications considering cardiometabolic and lipidomic exposures and uncovers residual correlation between summary-level outcomes reflecting known relationships between cardiovascular diseases.
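The claim that unmeasured shared pleiotropy induces residual correlation between outcomes even without sample overlap can be illustrated with a small simulation. The sketch below is not the MR2 model: it uses plain per-outcome least squares rather than a Bayesian Gaussian copula regression, and the function names, effect sizes and number of variants are all hypothetical choices made for illustration.

```python
import random


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


def residual_correlation(pleiotropy_strength, n_variants=5000, seed=0):
    """Correlation of outcome residuals after regressing on one exposure.

    Each simulated genetic variant has an association with the exposure,
    an unmeasured pleiotropic effect shared by both outcomes, and
    independent noise per outcome (mimicking no sample overlap).
    """
    rng = random.Random(seed)
    theta = (0.3, 0.0)  # hypothetical causal effects on outcomes 1 and 2
    beta_x, beta_y = [], ([], [])
    for _ in range(n_variants):
        x = rng.gauss(0, 1)  # variant-exposure association
        u = rng.gauss(0, 1)  # unmeasured shared pleiotropic effect
        beta_x.append(x)
        for k in range(2):
            noise = rng.gauss(0, 1)  # independent across outcomes
            beta_y[k].append(theta[k] * x + pleiotropy_strength * u + noise)

    sxx = sum(x * x for x in beta_x)
    residuals = []
    for k in range(2):
        # Least-squares slope through the origin for outcome k.
        theta_hat = sum(x * y for x, y in zip(beta_x, beta_y[k])) / sxx
        residuals.append([y - theta_hat * x for x, y in zip(beta_x, beta_y[k])])
    return pearson(residuals[0], residuals[1])


with_pleiotropy = residual_correlation(pleiotropy_strength=1.0)
without_pleiotropy = residual_correlation(pleiotropy_strength=0.0)
```

Under these assumptions the residual correlation is close to a^2 / (a^2 + 1) for pleiotropy strength a, so it sits near 0.5 when a = 1 and near zero when the shared pathway is absent, although the noise terms are independent in both cases.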
Subjects
Cardiovascular Diseases, Humans, Cardiovascular Diseases/epidemiology, Cardiovascular Diseases/genetics, Bayes Theorem, Multimorbidity, Mendelian Randomization Analysis/methods, Causality, Genome-Wide Association Study
ABSTRACT
Understanding the penetrance of pathogenic variants identified as secondary findings (SFs) is of paramount importance with the growing availability of genetic testing. We estimated penetrance through large-scale analyses of individuals referred for diagnostic sequencing for hypertrophic cardiomyopathy (HCM; 10,400 affected individuals, 1,332 variants) and dilated cardiomyopathy (DCM; 2,564 affected individuals, 663 variants), using a cross-sectional approach comparing allele frequencies against reference populations (293,226 participants from UK Biobank and gnomAD). We generated updated prevalence estimates for HCM (1:543) and DCM (1:220). In aggregate, the penetrance by late adulthood of rare, pathogenic variants (23% for HCM, 35% for DCM) and likely pathogenic variants (7% for HCM, 10% for DCM) was substantial for dominant cardiomyopathy (CM). Penetrance was significantly higher for variant subgroups annotated as loss of function or ultra-rare and for males compared to females for variants in HCM-associated genes. We estimated variant-specific penetrance for 316 recurrent variants most likely to be identified as SFs (found in 51% of HCM- and 17% of DCM-affected individuals). 49 variants were observed at least ten times (14% of affected individuals) in HCM-associated genes. Median penetrance was 14.6% (±14.4% SD). We explore estimates of penetrance by age, sex, and ancestry and simulate the impact of including future cohorts. This dataset reports penetrance of individual variants at scale and will inform the management of individuals undergoing genetic screening for SFs. While most variants had low penetrance and the costs and harms of screening are unclear, some individuals with highly penetrant variants may benefit from SFs.
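The cross-sectional idea of comparing variant frequencies in affected individuals against a reference population can be written with Bayes' theorem as P(disease | variant) = P(variant | disease) × P(disease) / P(variant). The sketch below is a minimal illustration under that assumption, not the study's actual estimator: the function name and all carrier counts are hypothetical, and only the HCM prevalence of 1:543 is taken from the abstract.

```python
def penetrance(case_carriers, case_total, pop_carriers, pop_total, prevalence):
    """Bayes estimate of P(disease | variant), capped at 1.

    Treats carrier frequency as a stand-in for allele frequency, which is
    a simplifying assumption for rare dominant variants.
    """
    if case_total <= 0 or pop_total <= 0 or pop_carriers <= 0:
        raise ValueError("totals and population carrier count must be positive")
    freq_in_cases = case_carriers / case_total
    freq_in_population = pop_carriers / pop_total
    return min(1.0, prevalence * freq_in_cases / freq_in_population)


# Hypothetical counts: 20 carriers among 10,000 cases,
# 4 carriers among 200,000 reference participants.
estimate = penetrance(
    case_carriers=20,
    case_total=10_000,
    pop_carriers=4,
    pop_total=200_000,
    prevalence=1 / 543,
)
print(round(estimate, 3))
```

With these made-up counts the variant is 100 times more frequent in cases than in the reference population, so the estimate is 100/543, or about 18%. A real analysis would also need confidence intervals, since counts of rare variants are small.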
Subjects
Cardiomyopathies, Dilated Cardiomyopathy, Hypertrophic Cardiomyopathy, Female, Male, Humans, Adult, Penetrance, Cardiomyopathies/genetics, Dilated Cardiomyopathy/genetics, Gene Frequency
ABSTRACT
We present EPISPOT, a fully joint framework which exploits large panels of epigenetic annotations as variant-level information to enhance molecular quantitative trait locus (QTL) mapping. Thanks to a purpose-built Bayesian inferential algorithm, EPISPOT accommodates functional information for both cis and trans actions, including QTL hotspot effects. It effectively couples simultaneous QTL analysis of thousands of genetic variants and molecular traits with hypothesis-free selection of biologically interpretable annotations which directly contribute to the QTL effects. This unified, epigenome-aided learning boosts statistical power and sheds light on the regulatory basis of the uncovered hits; EPISPOT therefore marks an essential step toward improving the challenging detection and functional interpretation of trans-acting genetic variants and hotspots. We illustrate the advantages of EPISPOT in simulations emulating real-data conditions and in a monocyte expression QTL study, which confirms known hotspots and finds other signals, as well as plausible mechanisms of action. In particular, by highlighting the role of monocyte DNase-I sensitivity sites from >150 epigenetic annotations, we clarify the mediation effects and cell-type specificity of major hotspots close to the lysozyme gene. Our approach forgoes the daunting and underpowered task of one-annotation-at-a-time enrichment analyses for prioritizing cis and trans QTL hits and is tailored to any transcriptomic, proteomic, or metabolomic QTL problem. By enabling principled epigenome-driven QTL mapping transcriptome-wide, EPISPOT helps progress toward a better functional understanding of genetic regulation.
Subjects
Algorithms, Computer Simulation, Epigenome, Genetic Models, Mutation, Phenotype, Quantitative Trait Loci, Bayes Theorem, Chromosome Mapping, Humans
ABSTRACT
BACKGROUND: Lipoprotein-related traits have been consistently identified as risk factors for atherosclerotic cardiovascular disease, largely on the basis of studies of coronary artery disease (CAD). The relative contributions of specific lipoproteins to the risk of peripheral artery disease (PAD) have not been well defined. We leveraged large-scale genetic association data to investigate the effects of circulating lipoprotein-related traits on PAD risk. METHODS: Genome-wide association study summary statistics for circulating lipoprotein-related traits were used in the Mendelian randomization Bayesian model averaging framework to prioritize the most likely causal major lipoprotein and subfraction risk factors for PAD and CAD. Mendelian randomization was used to estimate the effect of apolipoprotein B (ApoB) lowering on PAD risk using gene regions proxying lipid-lowering drug targets. Genes relevant to prioritized lipoprotein subfractions were identified with transcriptome-wide association studies. RESULTS: ApoB was identified as the most likely causal lipoprotein-related risk factor for both PAD (marginal inclusion probability, 0.86; P=0.003) and CAD (marginal inclusion probability, 0.92; P=0.005). Genetic proxies for ApoB-lowering medications were associated with reduced risk of both PAD (odds ratio, 0.87 per 1-SD decrease in ApoB [95% CI, 0.84-0.91]; P=9×10⁻¹⁰) and CAD (odds ratio, 0.66 [95% CI, 0.63-0.69]; P=4×10⁻⁷³), with a stronger predicted effect of ApoB lowering on CAD (ratio of effects, 3.09 [95% CI, 2.29-4.60]; P<1×10⁻⁶). Extra-small very-low-density lipoprotein particle concentration was identified as the most likely subfraction associated with PAD risk (marginal inclusion probability, 0.91; P=2.3×10⁻⁴), whereas large low-density lipoprotein particle concentration was the most likely subfraction associated with CAD risk (marginal inclusion probability, 0.95; P=0.011). Genes associated with extra-small very-low-density lipoprotein particle and large low-density lipoprotein particle concentration included canonical ApoB pathway components, although gene-specific effects were variable. Lipoprotein(a) was associated with increased risk of PAD independently of ApoB (odds ratio, 1.04 [95% CI, 1.03-1.04]; P=1.0×10⁻³³). CONCLUSIONS: ApoB was prioritized as the major lipoprotein fraction causally responsible for both PAD and CAD risk. However, ApoB-lowering drug targets and ApoB-containing lipoprotein subfractions had diverse associations with atherosclerotic cardiovascular disease, and distinct subfraction-associated genes suggest possible differences in the role of lipoproteins in the pathogenesis of PAD and CAD.
Subjects
Apolipoproteins/metabolism, Disease Susceptibility, Peripheral Arterial Disease/epidemiology, Peripheral Arterial Disease/etiology, Alleles, Apolipoproteins/blood, Biomarkers, Gene Expression Profiling, Genetic Predisposition to Disease, Genome-Wide Association Study, Humans, Lipid Metabolism, Peripheral Arterial Disease/diagnosis, Peripheral Arterial Disease/metabolism, Public Health Surveillance, Heritable Quantitative Trait, Risk Assessment, Risk Factors, Transcriptome, United Kingdom/epidemiology
ABSTRACT
PURPOSE: Disruptions of genomic imprinting are associated with congenital imprinting disorders (CIDs) and other disease states, including cancer. CIDs are most often associated with altered methylation at imprinted differentially methylated regions (iDMRs). In some cases, multiple iDMRs are affected, causing multilocus imprinting disturbances (MLIDs). The availability of accurate, quantitative, and scalable high-throughput methods to interrogate multiple iDMRs simultaneously would enhance clinical diagnostics and research. METHODS: We report the development of a custom targeted methylation sequencing panel that covered the 63 iDMRs most relevant for CIDs and the detection of MLIDs. We tested it in 70 healthy controls and 147 individuals with CIDs. We distinguished loss and gain of methylation per differentially methylated region and classified high and moderate methylation alterations. RESULTS: Across a range of CIDs with a variety of molecular mechanisms, ImprintSeq performed at 98.4% sensitivity, 99.9% specificity, and 99.9% accuracy (when compared with previous diagnostic testing). ImprintSeq was highly sensitive for detecting MLIDs and enabled diagnostic criteria for MLID to be proposed. In a child with an extreme MLID profile, a probable genetic cause was identified. CONCLUSION: ImprintSeq provides a novel assay for clinical diagnostic and research studies of CIDs, MLIDs, and the role of disordered imprinting in human disease states.
Subjects
DNA Methylation, Genomic Imprinting, Child, DNA Methylation/genetics, Genomic Imprinting/genetics, Humans
ABSTRACT
PURPOSE: Accurate discrimination of benign and pathogenic rare variation remains a priority for clinical genome interpretation. State-of-the-art machine learning variant prioritization tools are imprecise and ignore important parameters defining gene-disease relationships, e.g., distinct consequences of gain-of-function versus loss-of-function variants. We hypothesized that incorporating disease-specific information would improve tool performance. METHODS: We developed a disease-specific variant classifier, CardioBoost, that estimates the probability of pathogenicity for rare missense variants in inherited cardiomyopathies and arrhythmias. We assessed CardioBoost's ability to discriminate known pathogenic from benign variants, prioritize disease-associated variants, and stratify patient outcomes. RESULTS: CardioBoost has high global discrimination accuracy (precision recall area under the curve [AUC] 0.91 for cardiomyopathies; 0.96 for arrhythmias), outperforming existing tools (4-24% improvement). CardioBoost obtains excellent accuracy (cardiomyopathies 90.2%; arrhythmias 91.9%) for variants classified with >90% confidence, and increases the proportion of variants classified with high confidence more than twofold compared with existing tools. Variants classified as disease-causing are associated with both disease status and clinical severity, including a 21% increased risk (95% confidence interval [CI] 11-29%) of severe adverse outcomes by age 60 in patients with hypertrophic cardiomyopathy. CONCLUSIONS: A disease-specific variant classifier outperforms state-of-the-art genome-wide tools for rare missense variants in inherited cardiac conditions ( https://www.cardiodb.org/cardioboost/ ), highlighting broad opportunities for improved pathogenicity prediction through disease specificity.
Subjects
Cardiomyopathies, Missense Mutation, Algorithms, Area Under Curve, Cardiomyopathies/diagnosis, Cardiomyopathies/genetics, Humans, Middle Aged, Missense Mutation/genetics, Virulence
ABSTRACT
Bisulfite amplicon sequencing has become the primary choice for single-base methylation quantification of multiple targets in parallel. The main limitation of this technology is a preferential amplification of an allele and strand in the PCR due to methylation state. This effect, known as 'PCR bias', causes inaccurate estimation of the methylation levels and calibration methods based on standard controls have been proposed to correct for it. Here, we present a Bayesian calibration tool, MethylCal, which can analyse jointly all CpGs within a CpG island (CGI) or a Differentially Methylated Region (DMR), avoiding 'one-at-a-time' CpG calibration. This enables more precise modeling of the methylation levels observed in the standard controls. It also provides accurate predictions of the methylation levels not considered in the controlled experiment, a feature that is paramount in the derivation of the corrected methylation degree. We tested the proposed method on eight independent assays (two CpG islands and six imprinting DMRs) and demonstrated its benefits, including the ability to detect outliers. We also evaluated MethylCal's calibration in two practical cases, a clinical diagnostic test on 18 patients potentially affected by Beckwith-Wiedemann syndrome, and 17 individuals with celiac disease. The calibration of the methylation levels obtained by MethylCal allows a clearer identification of patients undergoing loss or gain of methylation in borderline cases and could influence further clinical or treatment decisions.
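To make the notion of PCR bias and its correction concrete, the sketch below uses a simple one-parameter hyperbolic bias curve of the kind often fitted to standard controls. This is a swapped-in textbook-style model, not MethylCal's Bayesian joint calibration across CpGs; the function names and the bias value are hypothetical.

```python
def observed_methylation(true_fraction, bias):
    """Apparent methylation after PCR for a true fraction in [0, 1].

    bias > 1 means the methylated template is amplified preferentially;
    bias == 1 means no PCR bias.
    """
    return bias * true_fraction / (bias * true_fraction + 1.0 - true_fraction)


def corrected_methylation(observed_fraction, bias):
    """Invert the hyperbolic curve to recover the true fraction."""
    return observed_fraction / (bias - bias * observed_fraction + observed_fraction)


# Hypothetical assay with a twofold preference for the methylated allele.
apparent = observed_methylation(0.5, bias=2.0)
recovered = corrected_methylation(apparent, bias=2.0)
```

In this toy setting a truly half-methylated sample appears about two-thirds methylated, and inverting the curve restores 0.5. In practice the bias parameter would be estimated from the standard controls, one CpG at a time in this simple model, which is the limitation the joint approach described above is meant to avoid.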
Subjects
Bayes Theorem, Computational Biology/methods, CpG Islands/genetics, DNA Methylation, Genomic Imprinting, DNA Sequence Analysis/methods, Algorithms, Beckwith-Wiedemann Syndrome/diagnosis, Beckwith-Wiedemann Syndrome/genetics, Beckwith-Wiedemann Syndrome/therapy, Calibration, Celiac Disease/diagnosis, Celiac Disease/genetics, Celiac Disease/therapy, Humans, Voltage-Gated Potassium Channels/genetics, Long Noncoding RNA/genetics, Reproducibility of Results, Sensitivity and Specificity
ABSTRACT
BACKGROUND: Lymphangioleiomyomatosis (LAM) is a rare multisystem disease almost exclusively affecting women which causes loss of lung function, lymphatic abnormalities and angiomyolipomas. LAM occurs sporadically and in people with tuberous sclerosis complex (TSC). Loss of TSC gene function leads to dysregulated mechanistic target of rapamycin (mTOR) signalling. As mTOR is a regulator of lipid and nucleotide synthesis, we hypothesised that the serum metabolome would be altered in LAM and related to disease severity and activity. METHODS: Ultrahigh performance liquid chromatography-tandem mass spectroscopy was used to examine the serum metabolome of 79 closely phenotyped women with LAM, including 29 receiving treatment with an mTOR inhibitor and 43 healthy control women. RESULTS: Sphingolipid, fatty acid and phospholipid metabolites were associated with FEV1 in women with LAM (eg, behenoyl sphingomyelin adjusted (adj.) p=8.10 × 10⁻³). Those with higher disease-burden scores had abnormalities in fatty acids, phospholipids and lysolipids. Rate of loss of FEV1 was associated with differences in acyl-carnitine, acyl-glycines, acyl-glutamine, fatty acids, endocannabinoids and sphingolipids (eg, myristoleoylcarnitine adj. p=0.07). In TSC-LAM, rapamycin affected modules of interrelated metabolites which comprised linoleic acid, the tricarboxylic acid cycle, aminoacyl-tRNA biosynthesis, cysteine, methionine, arginine and proline metabolism. Metabolomic pathway analysis within modules reiterated the importance of glycerophospholipid metabolites (adj. p=0.047). CONCLUSIONS: Women with LAM have altered lipid metabolism. The associations between these metabolites, multiple markers of disease activity and their potential biological roles in cell survival and signalling, suggest that lipid species may be both disease-relevant biomarkers and potential therapeutic targets for LAM.
Subjects
Fatty Acids/blood, Lymphangioleiomyomatosis/blood, Lymphangioleiomyomatosis/drug therapy, Phospholipids/blood, Sphingolipids/blood, TOR Serine-Threonine Kinases/antagonists & inhibitors, Adult, Case-Control Studies, Female, Humans, Immunosuppressive Agents/therapeutic use, Sirolimus/therapeutic use
ABSTRACT
OBJECTIVE: Variant ataxia-telangiectasia is caused by mutations that allow some retained ataxia telangiectasia-mutated (ATM) kinase activity. Here, we describe the clinical features of the largest established cohort of individuals with variant ataxia-telangiectasia and explore genotype-phenotype correlations. METHODS: Cross-sectional data were collected retrospectively. Patients were classified as variant ataxia-telangiectasia based on retained ATM kinase activity. RESULTS: The study includes 57 individuals. Mean age at assessment was 37.5 years. Most had their first symptoms by age 10 (81%). There was a diagnostic delay of more than 10 years in 68% and more than 20 years in one third of probands. Disease severity was mild in one third of patients, and 43% were still ambulant 20 years after disease onset. Only one third had predominant ataxia, and 18% had a pure extrapyramidal presentation. Individuals with extrapyramidal presentations had milder neurological disease severity. There were no significant respiratory or immunological complications, but 25% of individuals had a history of malignancy. Missense mutations were associated with milder neurological disease severity, but with a higher risk of malignancy, compared to leaky splice site mutations. INTERPRETATION: Individuals with variant ataxia-telangiectasia require malignancy surveillance and tailored management. However, our data suggest the condition may sometimes be mis- or underdiagnosed because of atypical features, including exclusive extrapyramidal symptoms, normal eye movements, and normal alpha-fetoprotein levels in some individuals. Missense mutations are associated with milder neurological presentations, but a particularly high malignancy risk, and it is important for clinicians to be aware of these phenotypes. ANN NEUROL 2019;85:170-180.
Subjects
Ataxia Telangiectasia/diagnosis, Ataxia Telangiectasia/genetics, Basal Ganglia Diseases/diagnosis, Basal Ganglia Diseases/genetics, Genotype, Severity of Illness Index, Adolescent, Adult, Child, Cohort Studies, Cross-Sectional Studies, Female, Humans, Male, Middle Aged, Missense Mutation/genetics, Retrospective Studies, Young Adult
ABSTRACT
AIMS/HYPOTHESIS: The genetic risk of type 1 diabetes has been extensively studied. However, the genetic determinants of age at diagnosis (AAD) of type 1 diabetes remain relatively unexplained. Identification of AAD genes and pathways could provide insight into the earliest events in the disease process. METHODS: Using ImmunoChip data from 15,696 cases, we aimed to identify regions in the genome associated with AAD. RESULTS: Two regions were convincingly associated with AAD (p < 5 × 10⁻⁸): the MHC on 6p21, and 6q22.33. Fine-mapping of 6q22.33 identified two AAD-associated haplotypes in the region nearest to the genes encoding protein tyrosine phosphatase receptor kappa (PTPRK) and thymocyte-expressed molecule involved in selection (THEMIS). We examined the susceptibility to type 1 diabetes at these SNPs by performing a meta-analysis including 19,510 control participants. Although these SNPs were not associated with type 1 diabetes overall (p > 0.001), the SNP most associated with AAD, rs72975913, was associated with susceptibility to type 1 diabetes in those individuals diagnosed at less than 5 years old (p = 2.3 × 10⁻⁹). CONCLUSION/INTERPRETATION: PTPRK and its neighbour THEMIS are required for early development of the thymus, which we can assume influences the initiation of autoimmunity. Non-HLA genes may only be detectable as risk factors for the disease in individuals diagnosed under the age of 5 years because, after that period of immune development, their role in disease susceptibility has become redundant.
Subjects
Type 1 Diabetes Mellitus/diagnosis, Adult, Chromosomes/genetics, Type 1 Diabetes Mellitus/genetics, Early Diagnosis, Female, Genetic Predisposition to Disease/genetics, Haplotypes/genetics, Humans, Male, Middle Aged, Single Nucleotide Polymorphism/genetics
ABSTRACT
MOTIVATION: Analysing the joint association between a large set of responses and predictors is a fundamental statistical task in integrative genomics, exemplified by numerous expression Quantitative Trait Loci (eQTL) studies. Of particular interest are the so-called 'hotspots', important genetic variants that regulate the expression of many genes. Recently, attention has focussed on whether eQTLs are common to several tissues, cell-types or, more generally, conditions or whether they are specific to a particular condition. RESULTS: We have implemented MT-HESS, a Bayesian hierarchical model that analyses the association between a large set of predictors, e.g. SNPs, and many responses, e.g. gene expression, in multiple tissues, cells or conditions. Our Bayesian sparse regression algorithm goes beyond 'one-at-a-time' association tests between SNPs and responses and uses a fully multivariate model search across all linear combinations of SNPs, coupled with a model of the correlation between condition/tissue-specific responses. In addition, we use a hierarchical structure to leverage shared information across different genes, thus improving the detection of hotspots. We show the increase of power resulting from our new approach in an extensive simulation study. Our analysis of two case studies highlights new hotspots that would remain undetected by standard approaches and shows how greater prediction power can be achieved when several tissues are jointly considered. AVAILABILITY AND IMPLEMENTATION: C++ source code and documentation including compilation instructions are available under GNU licence at http://www.mrc-bsu.cam.ac.uk/software/.
Subjects
Algorithms, Bayes Theorem, Gene Expression Regulation, Gene Regulatory Networks, Inflammation/genetics, Inflammatory Bowel Diseases/genetics, Quantitative Trait Loci/genetics, Software, Animals, Type 1 Diabetes Mellitus/genetics, Genomics/methods, Humans, Theoretical Models, Organ Specificity, Single Nucleotide Polymorphism/genetics, Programming Languages, Rats, Tissue Distribution
ABSTRACT
Left ventricular mass (LVM) is a highly heritable trait and an independent risk factor for all-cause mortality. So far, genome-wide association studies have not identified the genetic factors that underlie LVM variation, and the regulatory mechanisms for blood-pressure-independent cardiac hypertrophy remain poorly understood. Unbiased systems genetics approaches in the rat now provide a powerful complementary tool to genome-wide association studies, and we applied integrative genomics to dissect a highly replicated, blood-pressure-independent LVM locus on rat chromosome 3p. Here we identified endonuclease G (Endog), which previously was implicated in apoptosis but not hypertrophy, as the gene at the locus, and we found a loss-of-function mutation in Endog that is associated with increased LVM and impaired cardiac function. Inhibition of Endog in cultured cardiomyocytes resulted in an increase in cell size and hypertrophic biomarkers in the absence of pro-hypertrophic stimulation. Genome-wide network analysis unexpectedly implicated ENDOG in fundamental mitochondrial processes that are unrelated to apoptosis. We showed direct regulation of ENDOG by ERR-α and PGC1α (which are master regulators of mitochondrial and cardiac function), interaction of ENDOG with the mitochondrial genome and ENDOG-mediated regulation of mitochondrial mass. At baseline, the Endog-deleted mouse heart had depleted mitochondria, mitochondrial dysfunction and elevated levels of reactive oxygen species, which were associated with enlarged and steatotic cardiomyocytes. Our study has further established the link between mitochondrial dysfunction, reactive oxygen species and heart disease and has uncovered a role for Endog in maladaptive cardiac hypertrophy.
Subjects
Cardiomegaly/enzymology, Cardiomegaly/pathology, Endodeoxyribonucleases/metabolism, Mitochondria/metabolism, Animals, Apoptosis, Body Weight/genetics, Cardiomegaly/genetics, Cardiomegaly/physiopathology, Cell Respiration, Mammalian Chromosomes/genetics, Genetic Crosses, Endodeoxyribonucleases/deficiency, Endodeoxyribonucleases/genetics, Female, Gene Expression Regulation, Mitochondrial Genes/genetics, Left Ventricular Hypertrophy/enzymology, Left Ventricular Hypertrophy/genetics, Left Ventricular Hypertrophy/pathology, Left Ventricular Hypertrophy/physiopathology, Lipid Metabolism, Male, Mitochondria/genetics, Mitochondria/pathology, Organ Size/genetics, Peroxisome Proliferator-Activated Receptor Gamma Coactivator 1-alpha, Quantitative Trait Loci/genetics, RNA-Binding Proteins/metabolism, Rats, Inbred Strain Rats, Reactive Oxygen Species/metabolism, Estrogen Receptors/metabolism, Transcription Factors/metabolism, ERRalpha Estrogen-Related Receptor
ABSTRACT
Recent high-throughput efforts such as ENCODE have generated a large body of genome-scale transcriptional data in multiple conditions (e.g., cell-types and disease states). Leveraging these data is especially important for network-based approaches to human disease, for instance to identify coherent transcriptional modules (subnetworks) that can inform functional disease mechanisms and pathological pathways. Yet, genome-scale network analysis across conditions is significantly hampered by the paucity of robust and computationally-efficient methods. Building on the Higher-Order Generalized Singular Value Decomposition, we introduce a new algorithmic approach for efficient, parameter-free and reproducible identification of network-modules simultaneously across multiple conditions. Our method can accommodate weighted (and unweighted) networks of any size and can similarly use co-expression or raw gene expression input data, without hinging upon the definition and stability of the correlation used to assess gene co-expression. In simulation studies, we demonstrated distinctive advantages of our method over existing methods, which was able to recover accurately both common and condition-specific network-modules without entailing ad-hoc input parameters as required by other approaches. We applied our method to genome-scale and multi-tissue transcriptomic datasets from rats (microarray-based) and humans (mRNA-sequencing-based) and identified several common and tissue-specific subnetworks with functional significance, which were not detected by other methods. In humans we recapitulated the crosstalk between cell-cycle progression and cell-extracellular matrix interactions processes in ventricular zones during neocortex expansion and further, we uncovered pathways related to development of later cognitive functions in the cortical plate of the developing brain which were previously unappreciated. 
Analyses of seven rat tissues identified a multi-tissue subnetwork of co-expressed heat shock protein (Hsp) and cardiomyopathy genes (Bag3, Cryab, Kras, Emd, Plec), which was significantly replicated using separate failing heart and liver gene expression datasets in humans, thus revealing a conserved functional role for Hsp genes in cardiovascular disease.
Subjects
Cardiomyopathies/genetics, Gene Regulatory Networks, Human Genome, Genetic Transcription, Algorithms, Animals, Cardiomyopathies/pathology, Cell Cycle Proteins/biosynthesis, Cell Cycle Proteins/genetics, Gene Expression, Gene Expression Profiling, Humans, Organ Specificity, Rats, Signal Transduction/genetics
ABSTRACT
MOTIVATION: As the number of studies looking at differences between DNA methylation increases, there is a growing demand to develop and benchmark statistical methods to analyse these data. To date no objective approach for the comparison of these methods has been developed and as such it remains difficult to assess which analysis tool is most appropriate for a given experiment. As a result, there is an unmet need for a DNA methylation data simulator that can accurately reproduce a wide range of experimental setups, and can be routinely used to compare the performance of different statistical models. RESULTS: We have developed WGBSSuite, a flexible stochastic simulation tool that generates single-base resolution DNA methylation data genome-wide. Several simulator parameters can be derived directly from real datasets provided by the user in order to mimic real case scenarios. Thus, it is possible to choose the most appropriate statistical analysis tool for a given simulated design. To show the usefulness of our simulator, we also report a benchmark of commonly used methods for differential methylation analysis. AVAILABILITY AND IMPLEMENTATION: WGBS code and documentation are available under GNU licence at http://www.wgbssuite.org.uk/. CONTACT: owen.rackham@imperial.ac.uk or l.bottolo@imperial.ac.uk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Benchmarking, Computer Simulation, DNA Methylation, Statistical Models, DNA Sequence Analysis/methods, Software, Sulfites/chemistry, Human Genome, Humans, Stochastic Processes
ABSTRACT
Combined analyses of gene networks and DNA sequence variation can provide new insights into the aetiology of common diseases that may not be apparent from genome-wide association studies alone. Recent advances in rat genomics are facilitating systems-genetics approaches. Here we report the use of integrated genome-wide approaches across seven rat tissues to identify gene networks and the loci underlying their regulation. We defined an interferon regulatory factor 7 (IRF7)-driven inflammatory network (IDIN) enriched for viral response genes, which represents a molecular biomarker for macrophages and which was regulated in multiple tissues by a locus on rat chromosome 15q25. We show that Epstein-Barr virus induced gene 2 (Ebi2, also known as Gpr183), which lies at this locus and controls B lymphocyte migration, is expressed in macrophages and regulates the IDIN. The human orthologous locus on chromosome 13q32 controlled the human equivalent of the IDIN, which was conserved in monocytes. IDIN genes were more likely to associate with susceptibility to type 1 diabetes (T1D), a macrophage-associated autoimmune disease, than randomly selected immune response genes (P = 8.85 × 10⁻⁶). The human locus controlling the IDIN was associated with the risk of T1D at single nucleotide polymorphism rs9585056 (P = 7.0 × 10⁻¹⁰; odds ratio, 1.15), which was one of five single nucleotide polymorphisms in this region associated with EBI2 (GPR183) expression. These data implicate IRF7 network genes and their regulatory locus in the pathogenesis of T1D.
Subjects
Type 1 Diabetes Mellitus/genetics, Genetic Loci/genetics, Genetic Predisposition to Disease/genetics, Innate Immunity/genetics, Viruses/immunology, Animals, Base Sequence, Human Chromosome Pair 13/genetics, Mammalian Chromosomes/genetics, Type 1 Diabetes Mellitus/immunology, Gene Regulatory Networks/genetics, Genome-Wide Association Study, Humans, Inflammation/genetics, Inflammation/immunology, Interferon Regulatory Factor-7/immunology, Macrophages/immunology, Macrophages/metabolism, Organ Specificity, Single Nucleotide Polymorphism/genetics, Quantitative Trait Loci/genetics, Rats, G-Protein-Coupled Receptors/genetics, G-Protein-Coupled Receptors/metabolism
ABSTRACT
Genome-wide association studies (GWAS) have yielded significant advances in defining the genetic architecture of complex traits and disease. Still, a major hurdle of GWAS is narrowing down multiple genetic associations to a few causal variants for functional studies. This becomes critical in multi-phenotype GWAS, where detection and interpretability of complex SNP(s)-trait(s) associations are complicated by complex linkage disequilibrium patterns between SNPs and correlation between traits. Here we propose a computationally efficient algorithm (GUESS) to explore complex genetic-association models and maximize genetic variant detection. We integrated our algorithm with a new Bayesian strategy for multi-phenotype analysis to identify the specific contribution of each SNP to different trait combinations and study genetic regulation of lipid metabolism in the Gutenberg Health Study (GHS). Despite the relatively small size of GHS (n = 3,175) when compared with the largest published meta-GWAS (n > 100,000), GUESS recovered most of the major associations and was better at refining multi-trait associations than alternative methods. Amongst the new findings provided by GUESS, we revealed a strong association of SORT1 with the TG-APOB phenotypic group and of LIPC with the TG-HDL phenotypic group. These associations were overlooked in the larger meta-GWAS and not revealed by competing approaches, and we replicated them in two independent cohorts. Moreover, we demonstrated the increased power of GUESS over alternative multi-phenotype approaches, both Bayesian and non-Bayesian, in a simulation study that mimics real-case scenarios. We showed that our parallel implementation based on Graphics Processing Units outperforms alternative multi-phenotype methods. Beyond multivariate modelling of multi-phenotypes, our Bayesian model employs a flexible hierarchical prior structure for genetic effects that adapts to any correlation structure of the predictors and increases the power to identify associated variants. This provides a powerful tool for the analysis of diverse genomic features, for instance including gene expression and exome sequencing data, where complex dependencies are present in the predictor space.
Subjects
Algorithms, Biological Evolution, Genome-Wide Association Study, Quantitative Trait Loci/genetics, Bayes Theorem, Exome/genetics, Gene Expression, Humans, Linkage Disequilibrium, Phenotype, Single Nucleotide Polymorphism/genetics
ABSTRACT
Technological advances in molecular biology over the past decade have given rise to high dimensional and complex datasets offering the possibility to investigate biological associations between a range of genomic features and complex phenotypes. The analysis of this novel type of data generated unprecedented computational challenges, which ultimately led to the definition and implementation of computationally efficient statistical models able to scale to genome-wide data, including Bayesian variable selection approaches. While extensive methodological work has been carried out in this area, only a few methods capable of handling hundreds of thousands of predictors have been implemented and distributed. Among these, we recently proposed GUESS, a computationally optimised algorithm that makes use of graphics processing unit capabilities and can accommodate multiple outcomes. In this paper we propose R2GUESS, an R package wrapping the original C++ source code. In addition to providing a user-friendly interface to the original code that automates its parametrisation and data handling, R2GUESS also incorporates many features to explore the data, to extend statistical inferences from the native algorithm (e.g., effect size estimation, significance assessment), and to visualize outputs from the algorithm. We first detail the model and its parametrisation, and describe in detail its optimised implementation. Based on two examples, we finally illustrate its statistical performance and flexibility.
ABSTRACT
Variable number tandem repeats (VNTRs) constitute a relatively under-examined class of genomic variants in the context of complex disease because of their sequence complexity and the challenges in assaying them. Recent large-scale genome-wide copy number variant mapping and association efforts have highlighted the need for improved methodology for association studies using these complex polymorphisms. Here we describe the in-depth investigation of a complex region on chromosome 8p21.2 encompassing the dedicator of cytokinesis 5 (DOCK5) gene. The region includes two VNTRs of complex sequence composition which flank a common 3975 base pair (bp) deletion, all three of which were genotyped by polymerase chain reaction and fragment analysis in a total of 2744 subjects. We have developed a novel VNTR association method named VNTRtest, suitable for association analysis of multi-allelic loci with binary and quantitative outcomes, and have used this approach to show significant association of the DOCK5 VNTRs with childhood and adult severe obesity (P(empirical) = 8.9 × 10⁻⁸ and P = 3.1 × 10⁻³, respectively), which we estimate explains ~0.8% of the phenotypic variance. We also identified an independent association between the 3975 bp deletion and obesity, explaining a further 0.46% of the variance (P(combined) = 1.6 × 10⁻³). Evidence for association between DOCK5 transcript levels and the 3975 bp deletion (P = 0.027) and both VNTRs (P(empirical) = 0.015) was also identified in adipose tissue from a Swedish family sample, providing support for a functional effect of the DOCK5 deletion and VNTRs. These findings highlight the potential role of DOCK5 in human obesity and illustrate a novel approach for analysis of the contribution of VNTRs to disease susceptibility through association studies.
Subjects
Guanine Nucleotide Exchange Factors/genetics, Minisatellite Repeats, Morbid Obesity/genetics, Adipose Tissue/physiology, Adult, Case-Control Studies, Child, Human Chromosome Pair 8, Cohort Studies, Dietary Fats, Gene Expression Regulation, Genetic Predisposition to Disease, Humans, Sequence Deletion
ABSTRACT
BACKGROUND: Imprinting disorders (ImpDis) comprise diseases which are caused by aberrant regulation of monoallelically and parent-of-origin-dependent expressed genes. A characteristic molecular change in ImpDis patients is aberrant methylation signatures at disease-specific loci, without an obvious DNA change at the specific differentially methylated region (DMR). However, there is a growing number of reports on multilocus imprinting disturbances (MLIDs), i.e. aberrant methylation at different DMRs in the same patient. These MLIDs account for a significant number of patients with specific ImpDis, and several reports indicate a central role of pathogenic maternal effect variants in their aetiology by affecting the maturation of the oocyte and the early embryo. Though several studies on the prevalence and the molecular causes of MLID have been conducted, homogeneous datasets comprising both genomic and methylation data are still lacking. RESULTS: Based on a cohort of 36 MLID patients, we here present both methylation data obtained from next-generation sequencing (NGS, ImprintSeq) approaches and whole-exome sequencing (WES) data. The compilation of methylation data did not reveal a disease-specific MLID episignature, nor was a predisposition to phenotypic modification apparent. In fact, this lack of epigenotype-phenotype correlation might be related to the mosaic distribution of imprinting defects and their functional relevance in specific tissues. CONCLUSIONS: Due to the higher sensitivity of NGS-based approaches, we suggest that ImprintSeq might be offered at reference centres for ImpDis patients with unusual phenotypes who are MLID negative by conventional tests. By WES, MLID causes other than the already known maternal effect variants could not be identified, either in the patients or in the maternal exomes. In cases with negative WES results, it is currently unclear to what extent either environmental factors or undetected genetic variants contribute to MLID.
Subjects
DNA Methylation, Genomics, Genotype, High-Throughput Nucleotide Sequencing
ABSTRACT
SUMMARY: ESS++ is a C++ implementation of a fully Bayesian variable selection approach for single and multiple response linear regression. ESS++ works well both when the number of observations is larger than the number of predictors and in the 'large p, small n' case. In the current version, ESS++ can handle several hundred observations, thousands of predictors and a few responses simultaneously. The core engine of ESS++ for the selection of relevant predictors is based on Evolutionary Monte Carlo. Our implementation is open source, allowing community-based alterations and improvements. AVAILABILITY: C++ source code and documentation including compilation instructions are available under GNU licence at http://bgx.org.uk/software/ESS.html.