ABSTRACT
Integrative genetic association methods have shown great promise in post-GWAS (genome-wide association study) analyses, in which one of the most challenging tasks is identifying putative causal genes and uncovering molecular mechanisms of complex traits. Recent studies suggest that prevailing computational approaches, including transcriptome-wide association studies (TWASs) and colocalization analysis, are individually imperfect, but their joint usage can yield robust and powerful inference results. This paper presents INTACT, a computational framework to integrate probabilistic evidence from these distinct types of analyses and implicate putative causal genes. This procedure is flexible and can work with a wide range of existing integrative analysis approaches. It has the unique ability to quantify the uncertainty of implicated genes, enabling rigorous control of false-positive discoveries. Taking advantage of this highly desirable feature, we further propose an efficient algorithm, INTACT-GSE, for gene set enrichment analysis based on the integrated probabilistic evidence. We examine the proposed computational methods and illustrate their improved performance over the existing approaches through simulation studies. We apply the proposed methods to analyze the multi-tissue eQTL data from the GTEx project and eight large-scale complex- and molecular-trait GWAS datasets from multiple consortia and the UK Biobank. Overall, we find that the proposed methods markedly improve the existing putative gene implication methods and are particularly advantageous in evaluating and identifying key gene sets and biological pathways underlying complex traits.
Subjects
Genome-Wide Association Study; Transcriptome; Humans; Transcriptome/genetics; Genome-Wide Association Study/methods; Multifactorial Inheritance/genetics; Quantitative Trait Loci/genetics; Computer Simulation; Polymorphism, Single Nucleotide/genetics; Genetic Predisposition to Disease
ABSTRACT
Many statistical genetics analysis methods make use of GWAS summary statistics. Best statistical practice requires evaluating these methods in realistic simulation experiments. However, simulating summary statistics by first simulating individual genotype and phenotype data is extremely computationally demanding. This high cost may force researchers to conduct overly simplistic simulations that fail to accurately measure method performance. Alternatively, summary statistics can be simulated directly from their theoretical distribution. Although this is a common need among statistical genetics researchers, no software packages exist for comprehensive GWAS summary statistic simulation. We present GWASBrewer, an open source R package for direct simulation of GWAS summary statistics. We show that statistics simulated by GWASBrewer have the same distribution as statistics generated from individual level data, and can be produced at a fraction of the computational expense. Additionally, GWASBrewer can simulate standard error estimates, something that is typically not done when sampling summary statistics directly. GWASBrewer is highly flexible, allowing the user to simulate data for multiple traits connected by causal effects and with complex distributions of effect sizes. We demonstrate example uses of GWASBrewer for evaluating Mendelian randomization, polygenic risk score, and heritability estimation methods.
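The idea of sampling summary statistics directly from their theoretical distribution can be illustrated with a minimal Python sketch. It is not GWASBrewer's implementation (GWASBrewer is an R package), and the function name `simulate_sumstats` is hypothetical. The sketch assumes independent variants (no linkage disequilibrium), a trait scaled to unit variance, and effects small enough that the residual variance is close to 1, so that each estimate is approximately normal around the true effect with standard error 1/sqrt(2 f (1 - f) N) for allele frequency f and sample size N.

```python
import numpy as np

def simulate_sumstats(beta, maf, n, rng):
    """Draw marginal effect estimates from their approximate sampling distribution.

    Assumptions (simplifications for illustration): independent variants,
    unit-variance trait, small effects, Hardy-Weinberg genotype variance 2f(1-f).
    """
    beta = np.asarray(beta, dtype=float)
    maf = np.asarray(maf, dtype=float)
    # Theoretical standard error of the per-variant regression estimate.
    se = 1.0 / np.sqrt(2.0 * maf * (1.0 - maf) * n)
    # One draw per variant, centred on the true effect.
    beta_hat = rng.normal(loc=beta, scale=se)
    return beta_hat, se

rng = np.random.default_rng(0)
m = 1000                                 # number of variants
maf = rng.uniform(0.05, 0.5, size=m)     # hypothetical allele frequencies
beta = np.zeros(m)
beta[:10] = 0.05                         # ten causal variants, the rest null
beta_hat, se = simulate_sumstats(beta, maf, n=50_000, rng=rng)
z = beta_hat / se                        # z-scores, as in a typical summary file
```

No genotype or phenotype matrix is ever created, which is where the computational saving comes from. The sketch returns the theoretical standard error rather than a simulated estimate of it, and it omits LD and multi-trait structure, all of which the abstract describes as supported by the package.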
ABSTRACT
Transcriptomics data have been integrated with genome-wide association studies (GWASs) to help understand disease/trait molecular mechanisms. The utility of metabolomics, integrated with transcriptomics and disease GWASs, to understand molecular mechanisms for metabolite levels or diseases has not been thoroughly evaluated. We performed probabilistic transcriptome-wide association and locus-level colocalization analyses to integrate transcriptomics results for 49 tissues in 706 individuals from the GTEx project, metabolomics results for 1,391 plasma metabolites in 6,136 Finnish men from the METSIM study, and GWAS results for 2,861 disease traits in 260,405 Finnish individuals from the FinnGen study. We found that genetic variants that regulate metabolite levels were more likely to influence gene expression and disease risk compared to the ones that do not. Integrating transcriptomics with metabolomics results prioritized 397 genes for 521 metabolites, including 496 previously identified gene-metabolite pairs with strong functional connections and suggested 33.3% of such gene-metabolite pairs shared the same causal variants with genetic associations of gene expression. Integrating transcriptomics and metabolomics individually with FinnGen GWAS results identified 1,597 genes for 790 disease traits. Integrating transcriptomics and metabolomics jointly with FinnGen GWAS results helped pinpoint metabolic pathways from genes to diseases. We identified putative causal effects of UGT1A1/UGT1A4 expression on gallbladder disorders through regulating plasma (E,E)-bilirubin levels, of SLC22A5 expression on nasal polyps and plasma carnitine levels through distinct pathways, and of LIPC expression on age-related macular degeneration through glycerophospholipid metabolic pathways. Our study highlights the power of integrating multiple sets of molecular traits and GWAS results to deepen understanding of disease pathophysiology.
Subjects
Genome-Wide Association Study; Transcriptome; Bilirubin; Carnitine; Glycerophospholipids; Humans; Male; Metabolomics; Quantitative Trait Loci/genetics; Solute Carrier Family 22 Member 5/genetics; Transcriptome/genetics
ABSTRACT
Analysis of de novo mutations (DNMs) from sequencing data of nuclear families has identified risk genes for many complex diseases, including multiple neurodevelopmental and psychiatric disorders. Most of these efforts have focused on mutations in protein-coding sequences. Evidence from genome-wide association studies (GWASs) strongly suggests that variants important to human diseases often lie in non-coding regions. Extending DNM-based approaches to non-coding sequences is challenging, however, because the functional significance of non-coding mutations is difficult to predict. We propose a statistical framework for analyzing DNMs from whole-genome sequencing (WGS) data. This method, TADA-Annotations (TADA-A), is a major advance of the TADA method we developed earlier for DNM analysis in coding regions. TADA-A is able to incorporate many functional annotations such as conservation and enhancer marks, to learn from data which annotations are informative of pathogenic mutations, and to combine both coding and non-coding mutations at the gene level to detect risk genes. It also supports meta-analysis of multiple DNM studies, while adjusting for study-specific technical effects. We applied TADA-A to WGS data of ~300 autism-affected family trios across five studies and discovered several autism risk genes. The software is freely available for all research uses.
Subjects
Chromosome Mapping; Genetic Predisposition to Disease; Mutation/genetics; Statistics as Topic; Whole Genome Sequencing; Autistic Disorder/genetics; Calibration; Enhancer Elements, Genetic/genetics; Humans; Molecular Sequence Annotation; Mutation Rate; RNA Splicing/genetics; Risk Factors; Exome Sequencing
ABSTRACT
Prior GWAS have identified loci associated with red blood cell (RBC) traits in populations of European, African, and Asian ancestry. These studies have not included individuals with an Amerindian ancestral background, such as Hispanics/Latinos, nor evaluated the full spectrum of genomic variation beyond single nucleotide variants. Using a custom genotyping array enriched for Amerindian ancestral content and 1000 Genomes imputation, we performed GWAS in 12,502 participants of the Hispanic Community Health Study and Study of Latinos (HCHS/SOL) for hematocrit, hemoglobin, RBC count, RBC distribution width (RDW), and RBC indices. Approximately 60% of previously reported RBC trait loci generalized to HCHS/SOL Hispanics/Latinos, including African ancestral alpha- and beta-globin gene variants. In addition to the known 3.8 kb alpha-globin copy number variant, we identified an Amerindian ancestral association in an alpha-globin regulatory region on chromosome 16p13.3 for mean corpuscular volume and mean corpuscular hemoglobin. We also discovered and replicated three genome-wide significant variants in previously unreported loci for RDW (SLC12A2 rs17764730, PSMB5 rs941718) and hematocrit (PROX1 rs3754140). Among the proxy variants at the SLC12A2 locus, we identified rs3812049, located in a bi-directional promoter between SLC12A2 (which encodes a red cell membrane ion-transport protein) and an upstream anti-sense long-noncoding RNA, LINC01184, as the likely causal variant. We further demonstrate that disruption of the regulatory element harboring rs3812049 affects transcription of SLC12A2 and LINC01184 in human erythroid progenitor cells. Together, these results reinforce the importance of genetic study of diverse ancestral populations, in particular Hispanics/Latinos.
Subjects
Homeodomain Proteins/genetics; Proteasome Endopeptidase Complex/genetics; RNA, Long Noncoding/genetics; Solute Carrier Family 12, Member 2/genetics; Tumor Suppressor Proteins/genetics; alpha-Globins/genetics; Erythrocyte Count; Erythrocytes; Female; Genome-Wide Association Study; Hemoglobins/genetics; Hispanic or Latino/genetics; Humans; Male; Polymorphism, Single Nucleotide; beta-Globins/genetics
ABSTRACT
Circulating white blood cell (WBC) counts (neutrophils, monocytes, lymphocytes, eosinophils, basophils) differ by ethnicity. The genetic factors underlying basal WBC traits in Hispanics/Latinos are unknown. We performed a genome-wide association study of total WBC and differential counts in a large, ethnically diverse US population sample of Hispanics/Latinos ascertained by the Hispanic Community Health Study and Study of Latinos (HCHS/SOL). We demonstrate that several previously known WBC-associated genetic loci (e.g. the African Duffy antigen receptor for chemokines null variant for neutrophil count) are generalizable to WBC traits in Hispanics/Latinos. We identified and replicated common and rare germ-line variants at FLT3 (a gene often somatically mutated in leukemia) associated with monocyte count. The common FLT3 variant rs76428106 has a large allele frequency differential between African and non-African populations. We also identified several novel genetic loci involving or regulating hematopoietic transcription factors (CEBPE-SLC7A7, CEBPA and CRBN-TRNT1) associated with basophil count. The minor allele of the CEBPE variant associated with lower basophil count has been previously associated with Amerindian ancestry and higher risk of acute lymphoblastic leukemia in Hispanics. Together, these data suggest that germline genetic variation affecting transcriptional and signaling pathways that underlie WBC development and lineage specification can contribute to inter-individual as well as ethnic differences in peripheral blood cell counts (normal hematopoiesis) in addition to susceptibility to leukemia (malignant hematopoiesis).
Subjects
CCAAT-Enhancer-Binding Proteins/genetics; Genome-Wide Association Study; Leukocyte Count; fms-Like Tyrosine Kinase 3/genetics; Black or African American/genetics; Basophils/cytology; Female; Gene Frequency; Hispanic or Latino/genetics; Humans; Lymphocytes/cytology; Male; Monocytes/cytology; Neutrophils/cytology; United States/epidemiology; White People/genetics
ABSTRACT
Platelets play an essential role in hemostasis and thrombosis. We performed a genome-wide association study of platelet count in 12,491 participants of the Hispanic Community Health Study/Study of Latinos by using a mixed-model method that accounts for admixture and family relationships. We discovered and replicated associations with five genes (ACTN1, ETV7, GABBR1-MOG, MEF2C, and ZBTB9-BAK1). Our strongest association was with Amerindian-specific variant rs117672662 (p value = 1.16 × 10⁻²⁸) in ACTN1, a gene implicated in congenital macrothrombocytopenia. rs117672662 exhibited allelic differences in transcriptional activity and protein binding in hematopoietic cells. Our results underscore the value of diverse populations to extend insights into the allelic architecture of complex traits.
Subjects
Genetic Association Studies/methods; Genetic Loci; Hispanic or Latino/genetics; Platelet Count; Actinin/genetics; Adolescent; Adult; Aged; Alleles; Gene Frequency; Genotype; Genotyping Techniques; Humans; MEF2 Transcription Factors/genetics; Membrane Proteins/genetics; Middle Aged; Phenotype; Polymorphism, Single Nucleotide; Receptors, GABA-B/genetics; Young Adult
ABSTRACT
RATIONALE: Lung function and chronic obstructive pulmonary disease (COPD) are heritable traits. Genome-wide association studies (GWAS) have identified numerous pulmonary function and COPD loci, primarily in cohorts of European ancestry. OBJECTIVES: Perform a GWAS of COPD phenotypes in Hispanic/Latino populations to identify loci not previously detected in European populations. METHODS: GWAS of lung function and COPD in Hispanic/Latino participants from a population-based cohort. We performed replication studies of novel loci in independent studies. MEASUREMENTS AND MAIN RESULTS: Among 11,822 Hispanic/Latino participants, we identified eight novel signals; three replicated in independent populations of European ancestry. A novel locus for FEV1 in ZSWIM7 (rs4791658; P = 4.99 × 10⁻⁹) replicated. A rare variant (minor allele frequency = 0.002) in HAL (rs145174011) was associated with FEV1/FVC (P = 9.59 × 10⁻⁹) in a region previously identified for COPD-related phenotypes; it remained significant in conditional analyses but did not replicate. Admixture mapping identified a novel region, with a variant in AGMO (rs41331850), associated with Amerindian ancestry and FEV1, which replicated. A novel locus for FEV1 identified among ever smokers (rs291231; P = 1.92 × 10⁻⁸) approached statistical significance for replication in admixed populations of African ancestry, and a novel SNP for COPD in PDZD2 (rs7709630; P = 1.56 × 10⁻⁸) regionally replicated. In addition, loci previously identified for lung function in European samples were associated in Hispanic/Latino participants in the Hispanic Community Health Study/Study of Latinos at the genome-wide significance level. CONCLUSIONS: We identified novel signals for lung function and COPD in a Hispanic/Latino cohort. Including admixed populations when performing genetic studies may identify variants contributing to genetic etiologies of COPD.
Subjects
Genetic Predisposition to Disease; Genome-Wide Association Study; Hispanic or Latino/genetics; Pulmonary Disease, Chronic Obstructive/genetics; White People/genetics; Adolescent; Adult; Aged; Cohort Studies; Europe; Female; Gene Frequency; Genetic Loci; Humans; Male; Middle Aged; Respiratory Function Tests; United States; Young Adult
ABSTRACT
Dental caries is the most common chronic disease worldwide, and exhibits profound disparities in the USA with racial and ethnic minorities experiencing disproportionate disease burden. Though heritable, the specific genes influencing risk of dental caries remain largely unknown. Therefore, we performed genome-wide association scans (GWASs) for dental caries in a population-based cohort of 12 000 Hispanic/Latino participants aged 18-74 years from the HCHS/SOL. Intra-oral examinations were used to generate two common indices of dental caries experience, which were tested for association with 27.7 M genotyped or imputed single-nucleotide polymorphisms separately in the six ancestry groups. A mixed-models approach was used, which adjusted for age, sex, recruitment site, five principal components of ancestry and additional features of the sampling design. Meta-analyses were used to combine GWAS results across ancestry groups. Heritability estimates ranged from 20-53% in the six ancestry groups. The most significant association observed via meta-analysis for both phenotypes was in the region of the NAMPT gene (rs190395159; P-value = 6 × 10⁻¹⁰), which is involved in many biological processes including periodontal healing. Another significant association was observed for rs72626594 (P-value = 3 × 10⁻⁸) downstream of BMP7, a tooth development gene. Other associations were observed in genes lacking known or plausible roles in dental caries. In conclusion, this was the largest GWAS of dental caries to date and the first to target Hispanic/Latino populations. Understanding the factors influencing dental caries susceptibility may lead to improvements in prediction, prevention and disease management, which may ultimately reduce the disparities in oral health across racial, ethnic and socioeconomic strata.
Subjects
Dental Caries/ethnology; Dental Caries/genetics; Hispanic or Latino/genetics; Adult; Aged; Community Health Centers; Female; Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Male; Middle Aged; Polymorphism, Single Nucleotide
ABSTRACT
Genomic phenotypes, such as DNA methylation and chromatin accessibility, can be used to characterize the transcriptional and regulatory activity of DNA within a cell. Recent technological advances have made it possible to measure such phenotypes very densely. This density often results in spatial structure, in the sense that measurements at nearby sites are very similar. In this article, we consider the task of comparing genomic phenotypes across experimental conditions, cell types, or disease subgroups. We propose a new method, Joint Adaptive Differential Estimation (JADE), which leverages the spatial structure inherent to genomic phenotypes. JADE simultaneously estimates smooth underlying group average genomic phenotype profiles and detects regions in which the average profile differs between groups. We evaluate JADE's performance in several biologically plausible simulation settings. We also consider an application to the detection of regions with differential methylation between mature skeletal muscle cells, myotubes, and myoblasts.
Subjects
DNA Methylation/genetics; Genome/genetics; Models, Genetic; Models, Statistical; Phenotype; Humans; Muscle Fibers, Skeletal/metabolism; Myoblasts, Skeletal/metabolism
ABSTRACT
It has been shown that EPA Method 3060A does not adequately extract Cr(VI) from chromium ore processing residue (COPR). We modified various parameters of EPA 3060A toward understanding the transformation of COPR minerals in the alkaline extraction and improving extraction of Cr(VI) from NIST SRM 2701, a standard COPR-contaminated soil. Aluminum and Si were the major elements dissolved from NIST 2701, and their concentrations in solution were correlated with Cr(VI). The extraction fluid leached additional Al and Si from the method-prescribed borosilicate glass vessels, which appeared to suppress the release of Cr(VI). Use of polytetrafluoroethylene vessels and intensive grinding of NIST 2701 increased the amount of Cr(VI) extracted. These modifications, combined with an increased extraction fluid to sample ratio of ≥900 mL g⁻¹ and 48-h extraction time, resulted in a maximum release of 1274 ± 7 mg kg⁻¹ Cr(VI). This is greater than the NIST 2701 certified value of 551 ± 35 mg kg⁻¹ but less than 3050 mg kg⁻¹ Cr(VI) previously estimated by X-ray absorption near edge structure spectroscopy. Some of the increased Cr(VI) may have resulted from oxidation of Cr(III) released from brownmillerite, which rapidly transformed during the extractions. Layered-double hydroxides remained stable during extractions and represent a potential residence for unextracted Cr(VI).
Subjects
Chromium; Industrial Waste; Soil Pollutants; Soil; X-Ray Absorption Spectroscopy
ABSTRACT
Newborns characterized as large and small for gestational age are at risk for increased mortality and morbidity during the first year of life as well as for obesity and dysglycemia as children and adults. The intrauterine environment and fetal genes contribute to the fetal size at birth. To define the genetic architecture underlying the newborn size, we performed a genome-wide association study (GWAS) in 4281 newborns in four ethnic groups from the Hyperglycemia and Adverse Pregnancy Outcome Study. We tested for association with newborn anthropometric traits (birth length, head circumference, birth weight, percent fat mass and sum of skinfolds) and newborn metabolic traits (cord glucose and C-peptide) under three models. Model 1 adjusted for field center, ancestry, neonatal gender, gestational age at delivery, parity, maternal age at oral glucose tolerance test (OGTT); Model 2 adjusted for Model 1 covariates, maternal body mass index (BMI) at OGTT, maternal height at OGTT, maternal mean arterial pressure at OGTT, maternal smoking and drinking; Model 3 adjusted for Model 2 covariates, maternal glucose and C-peptide at OGTT. Strong evidence for association was observed with measures of newborn adiposity (sum of skinfolds model 3 Z-score 7.356, P = 1.90 × 10⁻¹³, and to a lesser degree fat mass and birth weight) and a region on Chr3q25.31 mapping between CCNL and LEKR1. These findings were replicated in an independent cohort of 2296 newborns. This region has previously been shown to be associated with birth weight in Europeans. The current study suggests that association of this locus with birth weight is secondary to an effect on fat as opposed to lean body mass.
Subjects
Adiposity/genetics; Birth Weight/genetics; Chromosomes, Human, Pair 3/genetics; Cyclins/genetics; Ethnicity/genetics; Proteinase Inhibitory Proteins, Secretory/genetics; Racial Groups/genetics; Asian People/genetics; Black People/genetics; Body Mass Index; Caribbean Region; Cohort Studies; Female; Genome-Wide Association Study; Humans; Infant, Newborn; Linear Models; Male; Mexican Americans/genetics; Pregnancy; Serine Peptidase Inhibitor Kazal-Type 5; Thailand; White People/genetics
ABSTRACT
The proportion of the genome that is shared identical by descent (IBD) between pairs of individuals is often estimated in studies involving genome-wide SNP data. These estimates can be used to check pedigrees, estimate heritability, and adjust association analyses. We focus on the method of moments technique as implemented in PLINK [Purcell et al., 2007] and other software that estimates the proportions of the genome at which two individuals share 0, 1, or 2 alleles IBD. This technique is based on the assumption that the study sample is drawn from a single, homogeneous, randomly mating population. This assumption is violated if pedigree founders are drawn from multiple populations or include admixed individuals. In the presence of population structure, the method of moments estimator has an inflated variance and can be biased because it relies on sample-based allele frequency estimates. In the case of the PLINK estimator, which truncates genome-wide sharing estimates at zero and one to generate biologically interpretable results, the bias is most often towards over-estimation of relatedness between ancestrally similar individuals. Using simulated pedigrees, we are able to demonstrate and quantify the behavior of the PLINK method of moments estimator under different population structure conditions. We also propose a simple method based on SNP pruning for improving genome-wide IBD estimates when the assumption of a single, homogeneous population is violated.
Subjects
Genetics, Population; Models, Genetic; Polymorphism, Single Nucleotide; Computer Simulation; Genome-Wide Association Study; Humans; Latin America; Pedigree
ABSTRACT
Causal gene discovery methods are often evaluated using reference sets of causal genes, which are treated as gold standards (GS) for the purposes of evaluation. However, evaluation methods typically treat genes not in the GS positive set as known negatives rather than unknowns. This leads to inaccurate estimates of sensitivity, specificity, and AUC. Labeling biases in GS gene sets can also lead to inaccurate ordering of alternative causal gene discovery methods. We argue that the evaluation of causal gene discovery methods should rely on statistical techniques like those used for variant discovery rather than on comparison with GS gene sets.
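A small worked calculation shows how treating unlabeled genes as known negatives distorts evaluation metrics. The following Python sketch uses entirely hypothetical counts (1000 genes, 100 truly causal, 40 of those listed in the gold standard) and assumes that gold-standard membership is independent of whether the method detects a gene. It is an illustration of the argument, not an analysis from the paper.

```python
def rates(tp, fp, fn, tn):
    """Sensitivity, specificity and precision from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }

# All counts below are hypothetical and chosen only for illustration.
n_genes = 1000
n_causal = 100          # truly causal genes (unknown in practice)
n_gs = 40               # causal genes that happen to be listed in the GS
flagged_causal = 60     # causal genes flagged by the method
flagged_noncausal = 20  # non-causal genes flagged by the method
# Assumption: GS membership is independent of detection, so 60 * 40 / 100 = 24.
flagged_in_gs = flagged_causal * n_gs // n_causal

# Metrics against the (unobservable) truth.
true_rates = rates(
    tp=flagged_causal,
    fp=flagged_noncausal,
    fn=n_causal - flagged_causal,
    tn=n_genes - n_causal - flagged_noncausal,
)

# Metrics when every gene outside the GS is scored as a known negative.
n_flagged = flagged_causal + flagged_noncausal
apparent_rates = rates(
    tp=flagged_in_gs,
    fp=n_flagged - flagged_in_gs,
    fn=n_gs - flagged_in_gs,
    tn=(n_genes - n_gs) - (n_flagged - flagged_in_gs),
)

for name in true_rates:
    print(f"{name}: true={true_rates[name]:.3f} apparent={apparent_rates[name]:.3f}")
```

With these counts, specificity drops from 0.978 to 0.942 and precision from 0.750 to 0.300, because 36 correctly flagged causal genes are scored as false positives. Sensitivity is unchanged (0.600) only because of the independence assumption. If the gold standard is enriched for genes that are easier to detect, which is the kind of labeling bias the abstract mentions, sensitivity would be distorted as well.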
Subjects
Reference Standards; Humans; Databases, Genetic
ABSTRACT
Using administrative patient-care data such as Electronic Health Records (EHR) and medical/pharmaceutical claims for population-based scientific research has become increasingly common. With vast sample sizes leading to very small standard errors, researchers need to pay more attention to potential biases in the estimates of association parameters of interest, specifically to biases that do not diminish with increasing sample size. Of these multiple sources of bias, this paper focuses on selection bias. We present an analytic framework using directed acyclic graphs to guide applied researchers in dissecting how different sources of selection bias may affect estimates of the association between a binary outcome and an exposure (continuous or categorical) of interest. We consider four easy-to-implement weighting approaches to reduce selection bias, with accompanying variance formulae. Through a simulation study, we demonstrate when these approaches can rescue analyses of real-world data in practice. We compare these methods using a data example where our goal is to estimate the well-known association of cancer and biological sex, using EHR from a longitudinal biorepository at the University of Michigan Healthcare system. We provide annotated R code to implement these weighted methods with associated inference.
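The general idea of weighting to correct selection bias can be sketched with inverse probability weighting in a simulated cohort. This Python sketch is not one of the paper's four approaches and is not the authors' R code. It assumes, unrealistically, that the true selection probabilities are known, whereas in practice they must be estimated, and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# Target population: binary exposure x, binary outcome d with true OR = exp(0.7).
x = rng.binomial(1, 0.5, size=n)
p_d = 1.0 / (1.0 + np.exp(-(-2.0 + 0.7 * x)))
d = rng.binomial(1, p_d)

# Hypothetical selection probabilities, indexed [d, x]; selection depends
# jointly on outcome and exposure, which biases the naive odds ratio.
sel_table = np.array([[0.1, 0.2],
                      [0.3, 0.9]])
p_sel = sel_table[d, x]
selected = rng.random(n) < p_sel

def odds_ratio(d, x, w):
    """Odds ratio from a (possibly weighted) 2x2 table of outcome by exposure."""
    def cell(dv, xv):
        return w[(d == dv) & (x == xv)].sum()
    return (cell(1, 1) * cell(0, 0)) / (cell(1, 0) * cell(0, 1))

or_pop = odds_ratio(d, x, np.ones(n))                       # full population
or_naive = odds_ratio(d[selected], x[selected],
                      np.ones(selected.sum()))              # selected sample, unweighted
or_ipw = odds_ratio(d[selected], x[selected],
                    1.0 / p_sel[selected])                  # selected sample, weighted

print(f"population OR={or_pop:.2f} naive OR={or_naive:.2f} IPW OR={or_ipw:.2f}")
```

In expectation the naive odds ratio is inflated by the factor (0.9 × 0.1)/(0.3 × 0.2) = 1.5, and this bias does not shrink as n grows. Weighting each selected individual by the inverse of their selection probability reconstructs the population table, so the weighted estimate returns to the population value up to sampling noise.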
ABSTRACT
Rationale: Intermediate care (also termed "step-down" or "moderate care") has been proposed as a lower-cost alternative to care for patients who may not clearly benefit from intensive care unit admission. Intermediate care units may be appealing to hospitals in financial crisis, including those in rural areas. Outcomes of patients receiving intermediate care are not widely described. Objectives: To examine relationships among rurality, location of care, and mortality for mechanically ventilated patients. Methods: Medicare beneficiaries aged 65 years and older who received invasive mechanical ventilation between 2010 and 2019 were included. Multivariable logistic regression was used to estimate the association between admission to a rural or an urban hospital and 30-day mortality, with separate analyses for patients in general, intermediate, and intensive care. Models were adjusted for age, sex, area deprivation index, primary diagnosis, severity of illness, year, comorbidities, and hospital volume. Results: There were 2,752,492 hospitalizations for patients receiving mechanical ventilation from 2010 to 2019, and 193,745 patients (7.0%) were in rural hospitals. The proportion of patients in rural intermediate care increased from 4.1% in 2010 to 6.3% in 2019. Patient admissions to urban hospitals remained relatively stable. Patients in rural and urban intensive care units had similar adjusted 30-day mortality, at 46.7% (adjusted absolute risk difference -0.1% [95% confidence interval, -0.7% to 0.6%]; P = 0.88). However, adjusted 30-day mortality for patients in rural intermediate care was significantly higher (36.9%) than for patients in urban intermediate care (31.3%) (adjusted absolute risk difference 5.6% [95% confidence interval, 3.7% to 7.6%]; P < 0.001). Conclusions: Hospitalization in rural intermediate care was associated with increased mortality. There is a need to better understand how intermediate care is used across hospitals and to carefully evaluate the types of patients admitted to intermediate care units.
Subjects
Intensive Care Units; Medicare; Respiration, Artificial; Humans; Female; Male; Aged; Respiration, Artificial/statistics & numerical data; United States/epidemiology; Aged, 80 and over; Medicare/statistics & numerical data; Intensive Care Units/statistics & numerical data; Hospital Mortality/trends; Hospitals, Urban/statistics & numerical data; Hospitals, Rural/statistics & numerical data; Critical Care/statistics & numerical data; Retrospective Studies; Rural Population/statistics & numerical data; Logistic Models; Intermediate Care Facilities/statistics & numerical data
ABSTRACT
Metabolites are small molecules that are useful for estimating disease risk and elucidating disease biology. Nevertheless, their causal effects on human diseases have not been evaluated comprehensively. We performed two-sample Mendelian randomization to systematically infer the causal effects of 1,099 plasma metabolites measured in 6,136 Finnish men from the METSIM study on risk of 2,099 binary disease endpoints measured in 309,154 Finnish individuals from FinnGen. We identified evidence for 282 causal effects of 70 metabolites on 183 disease endpoints (FDR<1%). We found 25 metabolites with potential causal effects across multiple disease domains, including ascorbic acid 2-sulfate affecting 26 disease endpoints in 12 disease domains. Our study suggests that N-acetyl-2-aminooctanoate and glycocholenate sulfate affect risk of atrial fibrillation through two distinct metabolic pathways and that N-methylpipecolate may mediate the causal effect of N6, N6-dimethyllysine on anxious personality disorder. This study highlights the broad causal impact of plasma metabolites and widespread metabolic connections across diseases.
ABSTRACT
Mendelian randomization (MR) is a term that applies to the use of genetic variation to address causal questions about how modifiable exposures influence different outcomes. The principles of MR are based on Mendel's laws of inheritance and instrumental variable estimation methods, which enable the inference of causal effects in the presence of unobserved confounding. In this Primer, we outline the principles of MR, the instrumental variable conditions underlying MR estimation and some of the methods used for estimation. We go on to discuss how the assumptions underlying an MR study can be assessed and give methods of estimation that are robust to certain violations of these assumptions. We give examples of a range of studies in which MR has been applied, the limitations of current methods of analysis and the outlook for MR in the future. The difference between the assumptions required for MR analysis and other forms of non-interventional epidemiological studies means that MR can be used as part of a triangulation across multiple sources of evidence for causal inference.
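Two of the simplest instrumental variable estimators used in summary-data MR can be written in a few lines. The Python sketch below implements the per-variant Wald ratio and the fixed-effect inverse-variance weighted (IVW) estimator. It assumes independent variants that satisfy the instrumental variable conditions, uses first-order standard errors that ignore uncertainty in the variant-exposure associations, and runs on made-up input values. The function names are hypothetical.

```python
import numpy as np

def wald_ratio(beta_x, beta_y, se_y):
    """Per-variant causal estimate: variant-outcome over variant-exposure effect."""
    beta_x = np.asarray(beta_x, dtype=float)
    beta_y = np.asarray(beta_y, dtype=float)
    se_y = np.asarray(se_y, dtype=float)
    est = beta_y / beta_x
    se = se_y / np.abs(beta_x)   # first-order approximation
    return est, se

def ivw(beta_x, beta_y, se_y):
    """Fixed-effect IVW estimate combining Wald ratios across variants."""
    beta_x = np.asarray(beta_x, dtype=float)
    beta_y = np.asarray(beta_y, dtype=float)
    se_y = np.asarray(se_y, dtype=float)
    weights = beta_x**2 / se_y**2            # inverse variance of each Wald ratio
    est = np.sum(beta_x * beta_y / se_y**2) / np.sum(weights)
    se = np.sqrt(1.0 / np.sum(weights))
    return est, se

# Made-up summary statistics for three variants with a true causal effect of 0.5.
beta_x = [0.10, 0.20, 0.15]      # variant-exposure associations
beta_y = [0.05, 0.10, 0.075]     # variant-outcome associations
se_y = [0.01, 0.02, 0.015]       # standard errors of beta_y

ratios, ratio_se = wald_ratio(beta_x, beta_y, se_y)
ivw_est, ivw_se = ivw(beta_x, beta_y, se_y)
print(ratios, ivw_est, ivw_se)
```

The IVW estimate equals a weighted mean of the Wald ratios, so it inherits their bias whenever a variant affects the outcome through a path other than the exposure. That sensitivity is what motivates the robust estimation methods the Primer goes on to discuss.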
ABSTRACT
Few studies have explored the impact of rare variants (minor allele frequency < 1%) on highly heritable plasma metabolites identified in metabolomic screens. The Finnish population provides an ideal opportunity for such explorations, given the multiple bottlenecks and expansions that have shaped its history, and the enrichment for many otherwise rare alleles that has resulted. Here, we report genetic associations for 1391 plasma metabolites in 6136 men from the late-settlement region of Finland. We identify 303 novel association signals, more than one third at variants rare or enriched in Finns. Many of these signals identify genes not previously implicated in metabolite genome-wide association studies and suggest mechanisms for diseases and disease-related traits.
Subjects
Genome-Wide Association Study; Polymorphism, Single Nucleotide; Alleles; Finland; Gene Frequency; Genetic Predisposition to Disease; Genome-Wide Association Study/methods; Humans; Male; Phenotype
ABSTRACT
Mendelian randomization (MR) is a valuable tool for detecting causal effects by using genetic variant associations. Opportunities to apply MR are growing rapidly with the increasing number of genome-wide association studies (GWAS). However, existing MR methods rely on strong assumptions that are often violated, leading to false positives. Correlated horizontal pleiotropy, which arises when variants affect both traits through a heritable shared factor, remains a particularly challenging problem. We propose a new MR method, Causal Analysis Using Summary Effect estimates (CAUSE), that accounts for correlated and uncorrelated horizontal pleiotropic effects. We demonstrate, in simulations, that CAUSE avoids more false positives induced by correlated horizontal pleiotropy than other methods. Applied to traits studied in recent GWAS, CAUSE detects causal relationships that have strong literature support and avoids identifying most unlikely relationships. Our results suggest that shared heritable factors are common and may lead to many false positives using alternative methods.