Results 1 - 14 of 14
1.
Nature ; 590(7845): 290-299, 2021 02.
Article in English | MEDLINE | ID: mdl-33568819

ABSTRACT

The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.


Subjects
Genetic Variation/genetics , Genome, Human/genetics , Genomics , National Heart, Lung, and Blood Institute (U.S.) , Precision Medicine , Cytochrome P-450 CYP2D6/genetics , Haplotypes/genetics , Heterozygote , Humans , INDEL Mutation , Loss of Function Mutation , Mutagenesis , Phenotype , Polymorphism, Single Nucleotide , Population Density , Precision Medicine/standards , Quality Control , Sample Size , United States , Whole Genome Sequencing/standards
2.
Am J Hum Genet ; 110(9): 1522-1533, 2023 09 07.
Article in English | MEDLINE | ID: mdl-37607538

ABSTRACT

Population-scale biobanks linked to electronic health record data provide vast opportunities to extend our knowledge of human genetics and discover new phenotype-genotype associations. Given their dense phenotype data, biobanks can also facilitate replication studies on a phenome-wide scale. Here, we introduce the phenotype-genotype reference map (PGRM), a set of 5,879 genetic associations from 523 GWAS publications that can be used for high-throughput replication experiments. PGRM phenotypes are standardized as phecodes, ensuring interoperability between biobanks. We applied the PGRM to five ancestry-specific cohorts from four independent biobanks and found evidence of robust replications across a wide array of phenotypes. We show how the PGRM can be used to detect data corruption and to empirically assess parameters for phenome-wide studies. Finally, we use the PGRM to explore factors associated with replicability of GWAS results.


Subjects
Biological Specimen Banks , Data Science , Humans , Phenomics , Phenotype , Genotype
3.
Nucleic Acids Res ; 52(W1): W70-W77, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38709879

ABSTRACT

Polygenic scores (PGS) enable the prediction of genetic predisposition for a wide range of traits and diseases by calculating the weighted sum of allele dosages for genetic variants associated with the trait or disease in question. Present approaches for calculating PGS from genotypes are often inefficient and labor-intensive, limiting transferability into clinical applications. Here, we present 'Imputation Server PGS', an extension of the Michigan Imputation Server designed to automate a standardized calculation of polygenic scores based on imputed genotypes. This extends the widely used Michigan Imputation Server with new functionality, bringing the simplicity and efficiency of modern imputation to the PGS field. The service currently supports over 4489 published polygenic scores from publicly available repositories and provides extensive quality control, including ancestry estimation to report population stratification. An interactive report empowers users to screen and compare thousands of scores in a fast and intuitive way. Imputation Server PGS provides a user-friendly web service, facilitating the application of polygenic scores to a wide range of genetic studies and is freely available at https://imputationserver.sph.umich.edu.


Subjects
Genetic Predisposition to Disease , Multifactorial Inheritance , Software , Multifactorial Inheritance/genetics , Humans , Internet , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Genotype , Alleles , Genetic Risk Score
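The abstract of record 3 defines a polygenic score as the weighted sum of allele dosages. A minimal sketch of that arithmetic for one individual follows; the function name `polygenic_score` and the example numbers are hypothetical, and this is not the code used by Imputation Server PGS.

```python
def polygenic_score(dosages, weights):
    """Return the weighted sum of allele dosages for one individual.

    dosages: effect-allele dosage per variant (0.0 to 2.0 for imputed data).
    weights: per-variant effect weight, listed in the same variant order.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same variants")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical three-variant score; the numbers are made up for illustration.
example_score = polygenic_score([0.0, 1.0, 2.0], [0.5, -0.2, 0.1])
```

In practice the variant order, effect allele, and strand must match between the score file and the genotypes before the sum is meaningful; that harmonization is not shown here.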
4.
Am J Hum Genet ; 109(6): 1007-1015, 2022 06 02.
Article in English | MEDLINE | ID: mdl-35508176

ABSTRACT

Genotype imputation is an integral tool in genome-wide association studies, in which it facilitates meta-analysis, increases power, and enables fine-mapping. With the increasing availability of whole-genome-sequence datasets, investigators have access to a multitude of reference-panel choices for genotype imputation. In principle, combining all sequenced whole genomes into a single large panel would provide the best imputation performance, but this is often cumbersome or impossible due to privacy restrictions. Here, we describe meta-imputation, a method that allows imputation results generated using different reference panels to be combined into a consensus imputed dataset. Our meta-imputation method requires small changes to the output of existing imputation tools to produce necessary inputs, which are then combined using dynamically estimated weights that are tailored to each individual and genome segment. In the scenarios we examined, the method consistently outperforms imputation using a single reference panel and achieves accuracy comparable to imputation using a combined reference panel.


Subjects
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genome , Genome-Wide Association Study/methods , Genotype , Humans , Polymorphism, Single Nucleotide/genetics , Research Design
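Record 4 describes combining imputation results from several reference panels using weights. The sketch below shows only the final combination step as a normalized weighted average of per-panel dosages for one variant in one individual. The weights are supplied by the caller here; the paper's dynamic estimation of weights per individual and genome segment is not reproduced, and the name `combine_dosages` is hypothetical.

```python
def combine_dosages(panel_dosages, panel_weights):
    """Weighted average of one variant's dosage across reference panels.

    panel_dosages: imputed dosage from each panel (same individual, same variant).
    panel_weights: non-negative weight per panel; assumed given, not estimated here.
    """
    if len(panel_dosages) != len(panel_weights):
        raise ValueError("need exactly one weight per panel")
    if any(w < 0 for w in panel_weights):
        raise ValueError("weights must be non-negative")
    total = sum(panel_weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(d * w for d, w in zip(panel_dosages, panel_weights)) / total
```

For example, dosages of 1.0 and 2.0 with weights 3 and 1 combine to 1.25, which stays closer to the panel that carries more weight.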
5.
Am J Hum Genet ; 109(9): 1653-1666, 2022 09 01.
Article in English | MEDLINE | ID: mdl-35981533

ABSTRACT

Understanding the genetic basis of human diseases and traits is dependent on the identification and accurate genotyping of genetic variants. Deep whole-genome sequencing (WGS), the gold standard technology for SNP and indel identification and genotyping, remains very expensive for most large studies. Here, we quantify the extent to which array genotyping followed by genotype imputation can approximate WGS in studies of individuals of African, Hispanic/Latino, and European ancestry in the US and of Finnish ancestry in Finland (a population isolate). For each study, we performed genotype imputation by using the genetic variants present on the Illumina Core, OmniExpress, MEGA, and Omni 2.5M arrays with the 1000G, HRC, and TOPMed imputation reference panels. Using the Omni 2.5M array and the TOPMed panel, ≥90% of bi-allelic single-nucleotide variants (SNVs) are well imputed (r² > 0.8) down to minor-allele frequencies (MAFs) of 0.14% in African, 0.11% in Hispanic/Latino, 0.35% in European, and 0.85% in Finnish ancestries. There was little difference in TOPMed-based imputation quality among the arrays with >700k variants. Individual-level imputation quality varied widely between and within the three US studies. Imputation quality also varied across genomic regions, producing regions where even common (MAF > 5%) variants were consistently not well imputed across ancestries. The extent to which array genotyping and imputation can approximate WGS therefore depends on reference panel, genotype array, sample ancestry, and genomic location. Imputation quality by variant or genomic region can be queried with our new tool, RsqBrowser, now deployed on the Michigan Imputation Server.


Subjects
High-Throughput Nucleotide Sequencing , Polymorphism, Single Nucleotide , Gene Frequency/genetics , Genome-Wide Association Study , Genotype , Humans , Polymorphism, Single Nucleotide/genetics , Whole Genome Sequencing
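Record 5 calls a variant well imputed when r2 exceeds 0.8. The sketch below assumes that r2 means the squared Pearson correlation between imputed dosages and sequencing-based genotypes across individuals, which the abstract does not spell out. The function name is hypothetical.

```python
import math


def squared_correlation(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Returns NaN when either input has no variation, because the correlation
    is undefined in that case (for example, a monomorphic variant).
    """
    n = len(true_genotypes)
    if n == 0 or n != len(imputed_dosages):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(true_genotypes) / n
    mean_y = sum(imputed_dosages) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(true_genotypes, imputed_dosages))
    sxx = sum((x - mean_x) ** 2 for x in true_genotypes)
    syy = sum((y - mean_y) ** 2 for y in imputed_dosages)
    if sxx == 0 or syy == 0:
        return math.nan
    return (sxy * sxy) / (sxx * syy)
```

A variant would then pass the abstract's threshold when `squared_correlation(...) > 0.8`.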
6.
Hum Mol Genet ; 30(21): 2027-2039, 2021 10 13.
Article in English | MEDLINE | ID: mdl-33961016

ABSTRACT

Circulating cardiac troponin proteins are associated with structural heart disease and predict incident cardiovascular disease in the general population. However, the genetic contribution to cardiac troponin I (cTnI) concentrations and its causal effect on cardiovascular phenotypes are unclear. We combine data from two large population-based studies, the Trøndelag Health Study and the Generation Scotland Scottish Family Health Study, and perform a genome-wide association study of high-sensitivity cTnI concentrations with 48 115 individuals. We further use two-sample Mendelian randomization to investigate the causal effects of circulating cTnI on acute myocardial infarction (AMI) and heart failure (HF). We identified 12 genetic loci (8 novel) associated with cTnI concentrations. Associated protein-altering variants highlighted putative functional genes: CAND2, HABP2, ANO5, APOH, FHOD3, TNFAIP2, KLKB1 and LMAN1. Phenome-wide association tests in 1688 phecodes and 83 continuous traits in UK Biobank showed associations between a genetic risk score for cTnI and cardiac arrhythmias, metabolic and anthropometric measures. Using two-sample Mendelian randomization, we confirmed the non-causal role of cTnI in AMI (5948 cases, 355 246 controls). We found indications for a causal role of cTnI in HF (47 309 cases and 930 014 controls), but this was not supported by secondary analyses using left ventricular mass as outcome (18 257 individuals). Our findings clarify the biology underlying the heritable contribution to circulating cTnI and support cTnI as a non-causal biomarker for AMI in the general population. Using genetically informed methods for causal inference helps inform the role and value of measuring cTnI in the general population.


Subjects
Biomarkers , Genetics, Population , Genome-Wide Association Study , Troponin I/genetics , Alleles , Chromosome Mapping , Gene Expression , Genetic Variation , Mendelian Randomization Analysis , Organ Specificity , Quantitative Trait Loci , Troponin T/genetics
7.
Bioinformatics ; 37(22): 4248-4250, 2021 11 18.
Article in English | MEDLINE | ID: mdl-33989384

ABSTRACT

SUMMARY: The sparse allele vectors file format is an efficient storage format for large-scale DNA variation data and is designed for high throughput association analysis by leveraging techniques for fast deserialization of data into computer memory. A command line interface has been developed to complement the storage format and supports basic features like importing, exporting and subsetting. Additionally, a C++ programming API is available allowing for easy integration into analysis software. AVAILABILITY AND IMPLEMENTATION: https://github.com/statgen/savvy. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software , Alleles
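Record 7 describes a sparse storage format for allele data. The general idea of a sparse allele vector can be sketched as keeping only the non-reference entries together with their positions. This is an illustration of the concept only: it does not reflect the actual savvy file layout or its C++ API, and the function names are hypothetical.

```python
def to_sparse(alleles):
    """Encode a dense allele vector as (length, [(index, value), ...]).

    Only non-zero (non-reference) entries are kept, which saves space when
    most haplotypes carry the reference allele, as is typical for rare variants.
    """
    return len(alleles), [(i, a) for i, a in enumerate(alleles) if a != 0]


def to_dense(length, entries):
    """Rebuild the dense allele vector from its sparse encoding."""
    dense = [0] * length
    for index, value in entries:
        dense[index] = value
    return dense


# A variant carried by two of eight haplotypes needs only two stored entries.
length, entries = to_sparse([0, 0, 1, 0, 0, 0, 1, 0])
```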
8.
PLoS Genet ; 15(6): e1008202, 2019 06.
Article in English | MEDLINE | ID: mdl-31194742

ABSTRACT

Polygenic risk scores (PRS) are designed to serve as single summary measures that are easy to construct, condensing information from a large number of genetic variants associated with a disease. They have been used for stratification and prediction of disease risk. The primary focus of this paper is to demonstrate how we can combine PRS and electronic health records data to better understand the shared and unique genetic architecture and etiology of disease subtypes that may be both related and heterogeneous. PRS construction strategies often depend on the purpose of the study, the available data/summary estimates, and the underlying genetic architecture of a disease. We consider several choices for constructing a PRS using data obtained from various publicly-available sources including the UK Biobank and evaluate their abilities to predict not just the primary phenotype but also secondary phenotypes derived from electronic health records (EHR). This study was conducted using data from 30,702 unrelated, genotyped patients of recent European descent from the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort within Michigan Medicine. We examine the three most common skin cancer subtypes in the USA: basal cell carcinoma, cutaneous squamous cell carcinoma, and melanoma. Using these PRS for various skin cancer subtypes, we conduct a phenome-wide association study (PheWAS) within the MGI data to evaluate PRS associations with secondary traits. PheWAS results are then replicated using population-based UK Biobank data and compared across various PRS construction methods. We develop an accompanying visual catalog called PRSweb that provides detailed PheWAS results and allows users to directly compare different PRS construction methods.


Subjects
Genetic Predisposition to Disease , Genomics , Multifactorial Inheritance/genetics , Skin Neoplasms/genetics , Biological Specimen Banks , Electronic Health Records , Genome-Wide Association Study , Genotype , Humans , Michigan/epidemiology , Phenotype , Polymorphism, Single Nucleotide/genetics , Risk Factors , Skin Neoplasms/pathology , United Kingdom/epidemiology
9.
Genome Biol ; 24(1): 31, 2023 02 21.
Article in English | MEDLINE | ID: mdl-36810122

ABSTRACT

The current version of the human reference genome, GRCh38, contains a number of errors including 1.2 Mbp of falsely duplicated and 8.04 Mbp of collapsed regions. These errors impact the variant calling of 33 protein-coding genes, including 12 with medical relevance. Here, we present FixItFelix, an efficient remapping approach, together with a modified version of the GRCh38 reference genome that improves the subsequent analysis across these genes within minutes for an existing alignment file while maintaining the same coordinates. We showcase these improvements over multi-ethnic control samples, demonstrating improvements for population variant calling as well as eQTL studies.


Subjects
Genome, Human , Genomics , Humans , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA
10.
Nat Protoc ; 18(9): 2625-2641, 2023 09.
Article in English | MEDLINE | ID: mdl-37495751

ABSTRACT

The human leukocyte antigen (HLA) locus is associated with more complex diseases than any other locus in the human genome. In many diseases, HLA explains more heritability than all other known loci combined. In silico HLA imputation methods enable rapid and accurate estimation of HLA alleles in the millions of individuals that are already genotyped on microarrays. HLA imputation has been used to define causal variation in autoimmune diseases, such as type I diabetes, and in human immunodeficiency virus infection control. However, there are few guidelines on performing HLA imputation, association testing, and fine mapping. Here, we present a comprehensive tutorial to impute HLA alleles from genotype data. We provide detailed guidance on performing standard quality control measures for input genotyping data and describe options to impute HLA alleles and amino acids either locally or using the web-based Michigan Imputation Server, which hosts a multi-ancestry HLA imputation reference panel. We also offer best practice recommendations to conduct association tests to define the alleles, amino acids, and haplotypes that affect human traits. Along with the pipeline, we provide a step-by-step online guide with scripts and available software (https://github.com/immunogenomics/HLA_analyses_tutorial). This tutorial will be broadly applicable to large-scale genotyping data and will contribute to defining the role of HLA in human diseases across global populations.


Subjects
HLA Antigens , Histocompatibility Antigens Class I , Humans , Alleles , HLA Antigens/genetics , Genotype , Haplotypes , Amino Acids/genetics , Polymorphism, Single Nucleotide , Genome-Wide Association Study
11.
Sci Rep ; 12(1): 4516, 2022 03 16.
Article in English | MEDLINE | ID: mdl-35296692

ABSTRACT

The Environmental Determinants of Diabetes in the Young (TEDDY) study enrolled 8676 children, 3-4 months of age, born with HLA-susceptibility genotypes for islet autoimmunity (IA) and type 1 diabetes (T1D). Whole-genome sequencing (WGS) was performed in 1119 children in a nested case-control study design. Telomere length was estimated from WGS data using five tools: Computel, Telseq, Telomerecat, qMotif and Motif_counter. The estimated median telomere length was 5.10 kb (IQR 4.52-5.68 kb) using Computel. The age when the blood sample was drawn had a significant negative correlation with telomere length (P = 0.003). European children, particularly those from Finland (P = 0.041) and from Sweden (P = 0.001), had shorter telomeres than children from the U.S.A. Paternal age (P = 0.019) was positively associated with telomere length. First-degree relative status, presence of gestational diabetes in the mother, and maternal age did not have a significant impact on estimated telomere length. HLA-DR4/4 or HLA-DR4/X children had significantly longer telomeres compared to children with HLA-DR3/3 or HLA-DR3/9 haplogenotypes (P = 0.008). Estimated telomere length was not significantly different with respect to any IA (P = 0.377), IAA-first (P = 0.248), GADA-first (P = 0.248) or T1D (P = 0.861). These results suggest that telomere length has no major impact on the risk for IA, the first step to develop T1D. Nevertheless, telomere length was shorter in the T1D high prevalence populations, Finland and Sweden.


Subjects
Diabetes Mellitus, Type 1 , Islets of Langerhans , Autoantibodies , Autoimmunity/genetics , Case-Control Studies , Child , Female , Genetic Predisposition to Disease , Genotype , Humans , Telomere/genetics
12.
Genetics ; 218(1)2021 05 17.
Article in English | MEDLINE | ID: mdl-33720349

ABSTRACT

Traditional Hardy-Weinberg equilibrium (HWE) tests (the χ² test and the exact test) have long been used as a metric for evaluating genotype quality, as technical artifacts leading to incorrect genotype calls often can be identified as deviations from HWE. However, in data sets composed of individuals from diverse ancestries, HWE can be violated even without genotyping error, complicating the use of HWE testing to assess genotype data quality. In this manuscript, we present the Robust Unified Test for HWE (RUTH) to test for HWE while accounting for population structure and genotype uncertainty, and to evaluate the impact of population heterogeneity and genotype uncertainty on the standard HWE tests and alternative methods using simulated and real sequence data sets. Our results demonstrate that ignoring population structure or genotype uncertainty in HWE tests can inflate false-positive rates by many orders of magnitude. Our evaluations demonstrate different tradeoffs between false positives and statistical power across the methods, with RUTH consistently among the best across all evaluations. RUTH is implemented as a practical and scalable software tool to rapidly perform HWE tests across millions of markers and hundreds of thousands of individuals while supporting standard VCF/BCF formats. RUTH is publicly available at https://www.github.com/statgen/ruth.


Subjects
Gene Frequency/genetics , Genetics, Population/methods , Linkage Disequilibrium/genetics , Alleles , Genotype , Humans , Models, Genetic , Models, Statistical , Phenotype , Software
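Record 12 contrasts RUTH with the traditional chi-square HWE test. For reference, the traditional test for one biallelic variant can be sketched as below. This is the standard textbook test without continuity correction, not RUTH: it ignores population structure and genotype uncertainty, which is exactly the limitation the abstract describes. The function name is hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium from genotype counts.

    Returns (chi2, p_value) using 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        # Monomorphic variant: no deviation from HWE can be measured.
        return 0.0, 1.0
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

With counts of 25, 50, and 25 the observed and expected values coincide, so the statistic is 0 and the P value is 1; a sample with no heterozygotes at the same allele frequency produces a large statistic.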
13.
Nat Genet ; 52(6): 634-639, 2020 06.
Article in English | MEDLINE | ID: mdl-32424355

ABSTRACT

With very large sample sizes, biobanks provide an exciting opportunity to identify genetic components of complex traits. To analyze rare variants, region-based multiple-variant aggregate tests are commonly used to increase power for association tests. However, because of the substantial computational cost, existing region-based tests cannot analyze hundreds of thousands of samples while accounting for confounders such as population stratification and sample relatedness. Here we propose a scalable generalized mixed-model region-based association test, SAIGE-GENE, that is applicable to exome-wide and genome-wide region-based analysis for hundreds of thousands of samples and can account for unbalanced case-control ratios for binary traits. Through extensive simulation studies and analysis of the HUNT study with 69,716 Norwegian samples and the UK Biobank data with 408,910 White British samples, we show that SAIGE-GENE can efficiently analyze large-sample data (N > 400,000) with type I error rates well controlled.


Subjects
Biological Specimen Banks/statistics & numerical data , Case-Control Studies , Exome , Linear Models , Genetic Markers , Humans , Lipoproteins, HDL/genetics , Models, Genetic , Multifactorial Inheritance , Norway , United Kingdom , Waist-Hip Ratio
14.
Nat Genet ; 50(9): 1335-1341, 2018 09.
Article in English | MEDLINE | ID: mdl-30104761

ABSTRACT

In genome-wide association studies (GWAS) for thousands of phenotypes in large biobanks, most binary traits have substantially fewer cases than controls. Both of the widely used approaches, the linear mixed model and the recently proposed logistic mixed model, perform poorly; they produce large type I error rates when used to analyze unbalanced case-control phenotypes. Here we propose a scalable and accurate generalized mixed model association test that uses the saddlepoint approximation to calibrate the distribution of score test statistics. This method, SAIGE (Scalable and Accurate Implementation of GEneralized mixed model), provides accurate P values even when case-control ratios are extremely unbalanced. SAIGE uses state-of-art optimization strategies to reduce computational costs; hence, it is applicable to GWAS for thousands of phenotypes by large biobanks. Through the analysis of UK Biobank data of 408,961 samples from white British participants with European ancestry for > 1,400 binary phenotypes, we show that SAIGE can efficiently analyze large sample data, controlling for unbalanced case-control ratios and sample relatedness.


Subjects
Genome-Wide Association Study/methods , Case-Control Studies , Computer Simulation , Humans , Linear Models , Logistic Models , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide