Results 1-20 of 557
1.
Cell ; 180(3): 568-584.e23, 2020 02 06.
Article in English | MEDLINE | ID: mdl-31981491

ABSTRACT

We present the largest exome sequencing study of autism spectrum disorder (ASD) to date (n = 35,584 total samples, 11,986 with ASD). Using an enhanced analytical framework to integrate de novo and case-control rare variation, we identify 102 risk genes at a false discovery rate of 0.1 or less. Of these genes, 49 show higher frequencies of disruptive de novo variants in individuals ascertained to have severe neurodevelopmental delay, whereas 53 show higher frequencies in individuals ascertained to have ASD; comparing ASD cases with mutations in these groups reveals phenotypic differences. Expressed early in brain development, most risk genes have roles in regulation of gene expression or neuronal communication (i.e., mutations effect neurodevelopmental and neurophysiological changes), and 13 fall within loci recurrently hit by copy number variants. In cells from the human cortex, expression of risk genes is enriched in excitatory and inhibitory neuronal lineages, consistent with multiple paths to an excitatory-inhibitory imbalance underlying ASD.


Subjects
Autistic Disorder/genetics , Cerebral Cortex/growth & development , Exome Sequencing/methods , Gene Expression Regulation, Developmental , Neurobiology/methods , Case-Control Studies , Cell Lineage , Cohort Studies , Exome , Female , Gene Frequency , Genetic Predisposition to Disease , Humans , Male , Mutation, Missense , Neurons/metabolism , Phenotype , Sex Factors , Single-Cell Analysis/methods
2.
Nat Immunol ; 23(7): 1063-1075, 2022 07.
Article in English | MEDLINE | ID: mdl-35668320

ABSTRACT

Extracellular acidification occurs in inflamed tissue and the tumor microenvironment; however, a systematic study on how pH sensing contributes to tissue homeostasis is lacking. In the present study, we examine cell type-specific roles of the pH sensor G protein-coupled receptor 65 (GPR65) and its inflammatory disease-associated Ile231Leu-coding variant in inflammation control. GPR65 Ile231Leu knock-in mice are highly susceptible to both bacterial infection-induced and T cell-driven colitis. Mechanistically, GPR65 Ile231Leu elicits a cytokine imbalance through impaired helper type 17 T cell (TH17 cell) and TH22 cell differentiation and interleukin (IL)-22 production in association with altered cellular metabolism controlled through the cAMP-CREB-DGAT1 axis. In dendritic cells, GPR65 Ile231Leu elevates IL-12 and IL-23 release at acidic pH and alters endo-lysosomal fusion and degradation capacity, resulting in enhanced antigen presentation. The present study highlights GPR65 Ile231Leu as a multistep risk factor in intestinal inflammation and illuminates a mechanism by which pH sensing controls inflammatory circuits and tissue homeostasis.


Subjects
Colitis , Receptors, G-Protein-Coupled , Animals , Colitis/metabolism , Hydrogen-Ion Concentration , Inflammation/metabolism , Lysosomes/metabolism , Mice , Receptors, G-Protein-Coupled/genetics , Receptors, G-Protein-Coupled/metabolism , Th17 Cells/metabolism
3.
Cell ; 178(3): 714-730.e22, 2019 07 25.
Article in English | MEDLINE | ID: mdl-31348891

ABSTRACT

Genome-wide association studies (GWAS) have revealed risk alleles for ulcerative colitis (UC). To understand their cell type specificities and pathways of action, we generate an atlas of 366,650 cells from the colon mucosa of 18 UC patients and 12 healthy individuals, revealing 51 epithelial, stromal, and immune cell subsets, including BEST4+ enterocytes, microfold-like cells, and IL13RA2+IL11+ inflammatory fibroblasts, which we associate with resistance to anti-TNF treatment. Inflammatory fibroblasts, inflammatory monocytes, microfold-like cells, and T cells that co-express CD8 and IL-17 expand with disease, forming intercellular interaction hubs. Many UC risk genes are cell type specific and co-regulated within relatively few gene modules, suggesting convergence onto limited sets of cell types and pathways. Using this observation, we nominate and infer functions for specific risk genes across GWAS loci. Our work provides a framework for interrogating complex human diseases and mapping risk variants to cell types and pathways.


Subjects
Colitis, Ulcerative/pathology , Colon/metabolism , Adult , Aged , Antibodies, Monoclonal/therapeutic use , Bestrophins/metabolism , CD8 Antigens/metabolism , Case-Control Studies , Colitis, Ulcerative/drug therapy , Colitis, Ulcerative/metabolism , Colon/pathology , Enterocytes/cytology , Enterocytes/metabolism , Female , Genetic Loci , Genome-Wide Association Study , Humans , Interleukin-17/metabolism , Male , Middle Aged , Risk Factors , T-Lymphocytes/cytology , T-Lymphocytes/metabolism , Thrombospondins/metabolism , Tumor Necrosis Factor-alpha/immunology , Tumor Necrosis Factor-alpha/metabolism , Young Adult
4.
Cell ; 171(6): 1340-1353.e14, 2017 Nov 30.
Article in English | MEDLINE | ID: mdl-29195075

ABSTRACT

Approximately 15 genes have been directly associated with skin pigmentation variation in humans, leading to its characterization as a relatively simple trait. However, by assembling a global survey of quantitative skin pigmentation phenotypes, we demonstrate that pigmentation is more complex than previously assumed, with genetic architecture varying by latitude. We investigate polygenicity in the KhoeSan populations indigenous to southern Africa who have considerably lighter skin than equatorial Africans. We demonstrate that skin pigmentation is highly heritable, but known pigmentation loci explain only a small fraction of the variance. Rather, baseline skin pigmentation is a complex, polygenic trait in the KhoeSan. Despite this, we identify canonical and non-canonical skin pigmentation loci, including near SLC24A5, TYRP1, SMARCA2/VLDLR, and SNX13, using a genome-wide association approach complemented by targeted resequencing. By considering diverse, under-studied African populations, we show how the architecture of skin pigmentation can vary across humans subject to different local evolutionary pressures.


Subjects
Skin Pigmentation , Africa , Black People/genetics , Humans , Polymorphism, Single Nucleotide
5.
Nature ; 628(8008): 620-629, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38509369

ABSTRACT

Epstein-Barr virus (EBV) infection can engender severe B cell lymphoproliferative diseases[1,2]. The primary infection is often asymptomatic or causes infectious mononucleosis (IM), a self-limiting lymphoproliferative disorder[3]. Selective vulnerability to EBV has been reported in association with inherited mutations impairing T cell immunity to EBV[4]. Here we report biallelic loss-of-function variants in IL27RA that underlie an acute and severe primary EBV infection with a nevertheless favourable outcome requiring a minimal treatment. One mutant allele (rs201107107) was enriched in the Finnish population (minor allele frequency = 0.0068) and carried a high risk of severe infectious mononucleosis when homozygous. IL27RA encodes the IL-27 receptor alpha subunit[5,6]. In the absence of IL-27RA, phosphorylation of STAT1 and STAT3 by IL-27 is abolished in T cells. In in vitro studies, IL-27 exerts a synergistic effect on T-cell-receptor-dependent T cell proliferation[7] that is deficient in cells from the patients, leading to impaired expansion of potent anti-EBV effector cytotoxic CD8+ T cells. IL-27 is produced by EBV-infected B lymphocytes and an IL-27RA-IL-27 autocrine loop is required for the maintenance of EBV-transformed B cells. This potentially explains the eventual favourable outcome of the EBV-induced viral disease in patients with IL-27RA deficiency. Furthermore, we identified neutralizing anti-IL-27 autoantibodies in most individuals who developed sporadic infectious mononucleosis and chronic EBV infection. These results demonstrate the critical role of IL-27RA-IL-27 in immunity to EBV, but also the hijacking of this defence by EBV to promote the expansion of infected transformed B cells.


Subjects
Epstein-Barr Virus Infections , Interleukin-27 , Receptors, Interleukin , Adolescent , Adult , Child , Child, Preschool , Female , Humans , Infant , Male , Young Adult , Alleles , B-Lymphocytes/pathology , B-Lymphocytes/virology , CD8-Positive T-Lymphocytes/pathology , Epstein-Barr Virus Infections/complications , Epstein-Barr Virus Infections/genetics , Epstein-Barr Virus Infections/therapy , Finland , Gene Frequency , Herpesvirus 4, Human , Homozygote , Infectious Mononucleosis/complications , Infectious Mononucleosis/genetics , Infectious Mononucleosis/therapy , Interleukin-27/immunology , Interleukin-27/metabolism , Loss of Function Mutation , Receptors, Interleukin/deficiency , Receptors, Interleukin/genetics , Receptors, Interleukin/metabolism , Treatment Outcome
6.
Nature ; 625(7993): 92-100, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38057664

ABSTRACT

The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders[1-4], but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD)-the largest public open-access human genome allele frequency reference dataset-and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.


Subjects
Genome, Human , Genomics , Models, Genetic , Mutation , Humans , Access to Information , Databases, Genetic , Datasets as Topic , Gene Frequency , Genome, Human/genetics , Mutation/genetics , Selection, Genetic
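The constraint idea in the gnomAD abstract above (depletion of observed variation relative to what a mutational model expects) can be illustrated with a z-score. This is a minimal sketch assuming the observed count in a window is roughly Poisson around the expected count; the counts are hypothetical, `constraint_z` is an illustrative name, and the formula is the generic form of such a score rather than the published Gnocchi implementation.

```python
import math


def constraint_z(observed: int, expected: float) -> float:
    """Signed z-score for the depletion of variation in one genomic window.

    Assumes the observed count is roughly Poisson with mean `expected`, so its
    standard deviation is sqrt(expected). Positive scores mean fewer variants
    than expected, i.e. stronger constraint.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (expected - observed) / math.sqrt(expected)


# Hypothetical (observed, expected) counts for two windows.
windows = {"window_a": (100, 100.0), "window_b": (64, 100.0)}
for name, (obs, exp) in windows.items():
    print(name, round(constraint_z(obs, exp), 2))
```

Here `window_a` matches expectation (score 0.0), while `window_b` is depleted by 36 variants against a standard deviation of 10 (score 3.6).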
7.
Nature ; 631(8019): 134-141, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38867047

ABSTRACT

Mosaic loss of the X chromosome (mLOX) is the most common clonal somatic alteration in leukocytes of female individuals[1,2], but little is known about its genetic determinants or phenotypic consequences. Here, to address this, we used data from 883,574 female participants across 8 biobanks; 12% of participants exhibited detectable mLOX in approximately 2% of leukocytes. Female participants with mLOX had an increased risk of myeloid and lymphoid leukaemias. Genetic analyses identified 56 common variants associated with mLOX, implicating genes with roles in chromosomal missegregation, cancer predisposition and autoimmune diseases. Exome-sequence analyses identified rare missense variants in FBXO10 that confer a twofold increased risk of mLOX. Only a small fraction of associations was shared with mosaic Y chromosome loss, suggesting that distinct biological processes drive formation and clonal expansion of sex chromosome missegregation. Allelic shift analyses identified X chromosome alleles that are preferentially retained in mLOX, demonstrating variation at many loci under cellular selection. A polygenic score including 44 allelic shift loci correctly inferred the retained X chromosomes in 80.7% of mLOX cases in the top decile. Our results support a model in which germline variants predispose female individuals to acquiring mLOX, with the allelic content of the X chromosome possibly shaping the magnitude of clonal expansion.


Subjects
Aneuploidy , Chromosomes, Human, X , Clone Cells , Leukocytes , Mosaicism , Adult , Female , Humans , Male , Middle Aged , Alleles , Autoimmune Diseases/genetics , Biological Specimen Banks , Chromosome Segregation/genetics , Chromosomes, Human, X/genetics , Chromosomes, Human, Y/genetics , Clone Cells/metabolism , Clone Cells/pathology , Exome/genetics , F-Box Proteins/genetics , Genetic Predisposition to Disease/genetics , Germ-Line Mutation , Leukemia/genetics , Leukocytes/metabolism , Models, Genetic , Multifactorial Inheritance/genetics , Mutation, Missense/genetics
8.
Nat Rev Genet ; 23(9): 533-546, 2022 09.
Article in English | MEDLINE | ID: mdl-35501396

ABSTRACT

Human genetics can inform the biology and epidemiology of coronavirus disease 2019 (COVID-19) by pinpointing causal mechanisms that explain why some individuals become more severely affected by the disease upon infection by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus. Large-scale genetic association studies, encompassing both rare and common genetic variants, have used different study designs and multiple disease phenotype definitions to identify several genomic regions associated with COVID-19. Along with a multitude of follow-up studies, these findings have increased our understanding of disease aetiology and provided routes for management of COVID-19. Important emergent opportunities include the clinical translatability of genetic risk prediction, the repurposing of existing drugs, exploration of variable host effects of different viral strains, study of inter-individual variability in vaccination response and understanding the long-term consequences of SARS-CoV-2 infection. Beyond the current pandemic, these transferrable opportunities are likely to affect the study of many infectious diseases.


Subjects
COVID-19 , COVID-19/epidemiology , COVID-19/genetics , Humans , Molecular Epidemiology , Pandemics , SARS-CoV-2/genetics
9.
Nature ; 603(7899): 95-102, 2022 03.
Article in English | MEDLINE | ID: mdl-35197637

ABSTRACT

Genome-wide association studies (GWAS) have identified thousands of genetic variants linked to the risk of human disease. However, GWAS have so far remained largely underpowered in relation to identifying associations in the rare and low-frequency allelic spectrum and have lacked the resolution to trace causal mechanisms to underlying genes[1]. Here we combined whole-exome sequencing in 392,814 UK Biobank participants with imputed genotypes from 260,405 FinnGen participants (653,219 total individuals) to conduct association meta-analyses for 744 disease endpoints across the protein-coding allelic frequency spectrum, bridging the gap between common and rare variant studies. We identified 975 associations, with more than one-third being previously unreported. We demonstrate population-level relevance for mutations previously ascribed to causing single-gene disorders, map GWAS associations to likely causal genes, explain disease mechanisms, and systematically relate disease associations to levels of 117 biomarkers and clinical-stage drug targets. Combining sequencing and genotyping in two population biobanks enabled us to benefit from increased power to detect and explain disease associations, validate findings through replication and propose medical actionability for rare genetic variants. Our study provides a compendium of protein-coding variant associations for future insights into disease biology and drug discovery.


Subjects
Genome-Wide Association Study , Proteins , Gene Frequency/genetics , Genetic Predisposition to Disease/genetics , Genotype , Humans , Polymorphism, Single Nucleotide/genetics , Proteins/genetics , Exome Sequencing
10.
Am J Hum Genet ; 111(10): 2129-2138, 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39270648

ABSTRACT

Large-scale, multi-ethnic whole-genome sequencing (WGS) studies, such as the National Human Genome Research Institute Genome Sequencing Program's Centers for Common Disease Genomics (CCDG), play an important role in increasing diversity for genetic research. Before performing association analyses, assessing Hardy-Weinberg equilibrium (HWE) is a crucial step in quality control procedures to remove low quality variants and ensure valid downstream analyses. Diverse WGS studies contain ancestrally heterogeneous samples; however, commonly used HWE methods assume that the samples are homogeneous. Therefore, directly applying these to the whole dataset can yield statistically invalid results. To account for this heterogeneity, HWE can be tested on subsets of samples that have genetically homogeneous ancestries and the results aggregated at each variant. To facilitate valid HWE subset testing, we developed a semi-supervised learning approach that predicts homogeneous ancestries based on the genotype. This method provides a convenient tool for estimating HWE in the presence of population structure and missing self-reported race and ethnicities in diverse WGS studies. In addition, assessing HWE within the homogeneous ancestries provides reliable HWE estimates that will directly benefit downstream analyses, including association analyses in WGS studies. We applied our proposed method on the CCDG dataset, predicting homogeneous genetic ancestry groups for 60,545 multi-ethnic WGS samples to assess HWE within each group.


Subjects
Supervised Machine Learning , Whole Genome Sequencing , Humans , Whole Genome Sequencing/methods , Genome, Human , Genetics, Population/methods , Ethnicity/genetics , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Genotype
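Testing HWE within genetically homogeneous groups and then aggregating at each variant, as the CCDG abstract above describes, can be sketched with a per-group chi-square test. The group labels and genotype counts are hypothetical, the semi-supervised ancestry prediction step itself is not shown, and the Bonferroni-style aggregation is an assumption chosen for simplicity. The example shows why stratifying matters: two groups that are each in HWE but differ in allele frequency show a heterozygote deficit when pooled (the Wahlund effect).

```python
import math
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (hom-ref, het, hom-alt) genotype counts


def hwe_pvalue(counts: Counts) -> float:
    """1-df chi-square test of Hardy-Weinberg equilibrium within one group."""
    n_aa, n_ab, n_bb = counts
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic in this group, so the test is uninformative
    chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    # Chi-square survival function for 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def hwe_within_groups(groups: Dict[str, Counts]) -> float:
    """Test each ancestry group separately, then aggregate for the variant.

    Aggregation is a Bonferroni-adjusted minimum p-value (an assumption).
    """
    pvalues = [hwe_pvalue(c) for c in groups.values()]
    return min(1.0, len(pvalues) * min(pvalues))


# Two hypothetical groups, each in HWE but with different allele frequencies.
groups = {"group_1": (81, 18, 1), "group_2": (1, 18, 81)}
pooled = tuple(sum(g[i] for g in groups.values()) for i in range(3))
print(pooled, hwe_pvalue(pooled) < 1e-10)  # pooled data look out of HWE
print(hwe_within_groups(groups) > 0.99)    # stratified tests do not
```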
11.
Am J Hum Genet ; 111(6): 1047-1060, 2024 06 06.
Article in English | MEDLINE | ID: mdl-38776927

ABSTRACT

Lichen planus (LP) is a T-cell-mediated inflammatory disease affecting squamous epithelia in many parts of the body, most often the skin and oral mucosa. Cutaneous LP is usually transient and oral LP (OLP) is most often chronic, so we performed a large-scale genetic and epidemiological study of LP to address whether the oral and non-oral subgroups have shared or distinct underlying pathologies and their overlap with autoimmune disease. Using lifelong records covering diagnoses, procedures, and clinic identity from 473,580 individuals in the FinnGen study, genome-wide association analyses were conducted on carefully constructed subcategories of OLP (n = 3,323) and non-oral LP (n = 4,356) and on the combined group. We identified 15 genome-wide significant associations in FinnGen and an additional 12 when meta-analyzed with UKBB (27 independent associations at 25 distinct genomic locations), most of which are shared between oral and non-oral LP. Many associations coincide with known autoimmune disease loci, consistent with the epidemiologic enrichment of LP with hypothyroidism and other autoimmune diseases. Notably, a third of the FinnGen associations demonstrate significant differences between OLP and non-OLP. We also observed a 13.6-fold risk for tongue cancer and an elevated risk for other oral cancers in OLP, in agreement with earlier reports that connect LP with higher cancer incidence. In addition to a large-scale dissection of LP genetics and comorbidities, our study demonstrates the use of comprehensive, multidimensional health registry data to address outstanding clinical questions and reveal underlying biological mechanisms in common but understudied diseases.


Subjects
Autoimmune Diseases , Genome-Wide Association Study , Lichen Planus, Oral , Mouth Neoplasms , Humans , Autoimmune Diseases/genetics , Lichen Planus, Oral/genetics , Lichen Planus, Oral/pathology , Mouth Neoplasms/genetics , Mouth Neoplasms/pathology , Female , Male , Genetic Heterogeneity , Middle Aged , Lichen Planus/genetics , Lichen Planus/pathology , Genetic Predisposition to Disease , Aged , Adult , Risk Factors , Polymorphism, Single Nucleotide
12.
Genome Res ; 34(5): 796-809, 2024 06 25.
Article in English | MEDLINE | ID: mdl-38749656

ABSTRACT

Underrepresented populations are often excluded from genomic studies owing in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high-quality set of 4094 whole genomes from 80 populations in the HGDP and 1kGP with data from the Genome Aggregation Database (gnomAD) and identified over 153 million high-quality SNVs, indels, and SVs. We performed a detailed ancestry analysis of this cohort, characterizing population structure and patterns of admixture across populations, analyzing site frequency spectra, and measuring variant counts at global and subcontinental levels. We also show substantial added value from this data set compared with the prior versions of the component resources, typically combined via liftOver and variant intersection; for example, we catalog millions of new genetic variants, mostly rare, compared with previous releases. In addition to unrestricted individual-level public release, we provide detailed tutorials for conducting many of the most common quality-control steps and analyses with these data in a scalable cloud-computing environment and publicly release this new phased joint callset for use as a haplotype resource in phasing and imputation pipelines. This jointly called reference panel will serve as a key resource to support research of diverse ancestry populations.


Subjects
Databases, Genetic , Genome, Human , Humans , Human Genome Project , High-Throughput Nucleotide Sequencing/methods , Genetic Variation , Genomics/methods
13.
Cell ; 149(3): 525-37, 2012 Apr 27.
Article in English | MEDLINE | ID: mdl-22521361

ABSTRACT

Balanced chromosomal abnormalities (BCAs) represent a relatively untapped reservoir of single-gene disruptions in neurodevelopmental disorders (NDDs). We sequenced BCAs in patients with autism or related NDDs, revealing disruption of 33 loci in four general categories: (1) genes previously associated with abnormal neurodevelopment (e.g., AUTS2, FOXP1, and CDKL5), (2) single-gene contributors to microdeletion syndromes (MBD5, SATB2, EHMT1, and SNURF-SNRPN), (3) novel risk loci (e.g., CHD8, KIRREL3, and ZNF507), and (4) genes associated with later-onset psychiatric disorders (e.g., TCF4, ZNF804A, PDE10A, GRIN2B, and ANK3). We also discovered among neurodevelopmental cases a profoundly increased burden of copy-number variants from these 33 loci and a significant enrichment of polygenic risk alleles from genome-wide association studies of autism and schizophrenia. Our findings suggest a polygenic risk model of autism and reveal that some neurodevelopmental genes are sensitive to perturbation by multiple mutational mechanisms, leading to variable phenotypic outcomes that manifest at different life stages.


Subjects
Child Development Disorders, Pervasive/genetics , Chromosome Aberrations , Autistic Disorder/diagnosis , Autistic Disorder/genetics , Child , Child Development Disorders, Pervasive/diagnosis , Chromosome Breakage , Chromosome Deletion , DNA Copy Number Variations , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Nervous System/growth & development , Schizophrenia/genetics , Sequence Analysis, DNA , Signal Transduction
14.
Nature ; 593(7858): 238-243, 2021 05.
Article in English | MEDLINE | ID: mdl-33828297

ABSTRACT

Genome-wide association studies (GWAS) have identified thousands of noncoding loci that are associated with human diseases and complex traits, each of which could reveal insights into the mechanisms of disease[1]. Many of the underlying causal variants may affect enhancers[2,3], but we lack accurate maps of enhancers and their target genes to interpret such variants. We recently developed the activity-by-contact (ABC) model to predict which enhancers regulate which genes and validated the model using CRISPR perturbations in several cell types[4]. Here we apply this ABC model to create enhancer-gene maps in 131 human cell types and tissues, and use these maps to interpret the functions of GWAS variants. Across 72 diseases and complex traits, ABC links 5,036 GWAS signals to 2,249 unique genes, including a class of 577 genes that appear to influence multiple phenotypes through variants in enhancers that act in different cell types. In inflammatory bowel disease (IBD), causal variants are enriched in predicted enhancers by more than 20-fold in particular cell types such as dendritic cells, and ABC achieves higher precision than other regulatory methods at connecting noncoding variants to target genes. These variant-to-function maps reveal an enhancer that contains an IBD risk variant and that regulates the expression of PPIF to alter the membrane potential of mitochondria in macrophages. Our study reveals principles of genome regulation, identifies genes that affect IBD and provides a resource and generalizable strategy to connect risk variants of common diseases to their molecular and cellular functions.


Subjects
Enhancer Elements, Genetic/genetics , Genetic Predisposition to Disease , Genetic Variation/genetics , Genome, Human/genetics , Genome-Wide Association Study , Inflammatory Bowel Diseases/genetics , Cell Line , Chromosomes, Human, Pair 10/genetics , Cyclophilins/genetics , Dendritic Cells , Female , Humans , Macrophages/metabolism , Male , Mitochondria/metabolism , Organ Specificity/genetics , Phenotype
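The scoring step of the ABC model named above can be sketched as a normalized product. As we understand the original ABC formulation (which is not part of this listing), an element's activity is the geometric mean of its DNase-seq and H3K27ac signals, its contact is a 3D contact frequency with the gene's promoter, and its score is activity × contact divided by the sum of that product over all candidate elements for the gene. Element names and signal values here are hypothetical, and the distance windows and score thresholds used in practice are omitted.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Element:
    name: str
    dnase: float    # chromatin accessibility signal
    h3k27ac: float  # active-enhancer histone mark signal
    contact: float  # contact frequency with the target gene's promoter


def abc_scores(elements: List[Element]) -> Dict[str, float]:
    """Activity-by-contact score for each candidate element of one gene."""
    products = {
        # Activity is the geometric mean of the two chromatin signals.
        e.name: math.sqrt(e.dnase * e.h3k27ac) * e.contact
        for e in elements
    }
    total = sum(products.values())
    if total == 0:
        return {name: 0.0 for name in products}
    # Normalize so the scores for this gene sum to 1.
    return {name: value / total for name, value in products.items()}


elements = [
    Element("enh_1", dnase=4.0, h3k27ac=9.0, contact=0.5),
    Element("enh_2", dnase=1.0, h3k27ac=1.0, contact=1.0),
]
print(abc_scores(elements))  # {'enh_1': 0.75, 'enh_2': 0.25}
```

Because the score is relative, a strong but distant element can still dominate a weak nearby one, which is the behavior the model is designed to capture.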
15.
Am J Hum Genet ; 110(12): 2068-2076, 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-38000370

ABSTRACT

DNA sample contamination is a major issue in clinical and research applications of whole-genome and -exome sequencing. Even modest levels of contamination can substantially affect the overall quality of variant calls and lead to widespread genotyping errors. Currently, popular tools for estimating the contamination level use short-read data (BAM/CRAM files), which are expensive to store and manipulate and often not retained or shared widely. We propose a metric to estimate DNA sample contamination from variant-level whole-genome and -exome sequence data called CHARR, contamination from homozygous alternate reference reads, which leverages the infiltration of reference reads within homozygous alternate variant calls. CHARR uses a small proportion of variant-level genotype information and thus can be computed from single-sample gVCFs or callsets in VCF or BCF formats, as well as efficiently stored variant calls in Hail VariantDataset format. Our results demonstrate that CHARR accurately recapitulates results from existing tools with substantially reduced costs, improving the accuracy and efficiency of downstream analyses of ultra-large whole-genome and exome sequencing datasets.


Subjects
DNA , Trout , Humans , Animals , Sequence Analysis, DNA/methods , Genotype , Homozygote , High-Throughput Nucleotide Sequencing/methods , Software
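The reasoning behind CHARR can be sketched in a few lines: at a truly homozygous-alternate site, reference reads should come only from a contaminating sample, which carries the reference allele on a given chromosome with probability of about 1 - AF, where AF is the population alternate allele frequency. This is a simplified estimator in that spirit, assuming the tuple layout shown and no site filtering; `charr_like_estimate` is a hypothetical name, and the code is not the published CHARR definition or its Hail implementation.

```python
from typing import Iterable, Tuple

# (ref_reads, total_depth, population_alt_allele_frequency) at one hom-alt call
HomAltSite = Tuple[int, int, float]


def charr_like_estimate(sites: Iterable[HomAltSite]) -> float:
    """Mean reference-read fraction at hom-alt calls, scaled by 1 / (1 - AF).

    If a fraction c of reads comes from a contaminant, reference reads at a
    true hom-alt site are expected at roughly c * (1 - AF), so dividing by
    (1 - AF) gives a per-site estimate of c; the sites are then averaged.
    """
    ratios = []
    for ref_reads, depth, alt_af in sites:
        if depth == 0 or alt_af >= 1.0:
            continue  # no reads, or no reference allele left to leak in
        ratios.append((ref_reads / depth) / (1.0 - alt_af))
    if not ratios:
        raise ValueError("no informative homozygous-alternate sites")
    return sum(ratios) / len(ratios)


# Hypothetical hom-alt calls from one sample.
sites = [(1, 20, 0.5), (0, 30, 0.8), (2, 40, 0.0)]
print(round(charr_like_estimate(sites), 3))
```

Only per-site allele depths and a reference allele frequency are needed, which mirrors the abstract's point that such a metric can be computed from variant-level data rather than BAM/CRAM files.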
16.
Genome Res ; 33(6): 999-1005, 2023 06.
Article in English | MEDLINE | ID: mdl-37253541

ABSTRACT

Large-scale high-throughput sequencing data sets have been transformative for informing clinical variant interpretation and for use as reference panels for statistical and population genetic efforts. Although such resources are often treated as ground truth, we find that in widely used reference data sets such as the Genome Aggregation Database (gnomAD), some variants pass gold-standard filters, yet are systematically different in their genotype calls across genotype discovery approaches. The inclusion of such discordant sites in study designs involving multiple genotype discovery strategies could bias results and lead to false-positive hits in association studies owing to technological artifacts rather than a true relationship to the phenotype. Here, we describe this phenomenon of discordant genotype calls across genotype discovery approaches, characterize the error mode of wrong calls, provide a list of discordant sites identified in gnomAD that should be treated with caution in analyses, and present a metric and machine learning classifier trained on gnomAD data to identify likely discordant variants in other data sets. We find that different genotype discovery approaches have different sets of variants at which this problem occurs, but there are characteristic variant features that can be used to predict discordant behavior. Discordant sites are largely shared across ancestry groups, although different populations are powered for the discovery of different variants. We find that the most common error mode is that of a variant being heterozygous for one approach and homozygous for the other, with heterozygous in the genomes and homozygous reference in the exomes making up the majority of miscalls.


Subjects
Exome , Genetics, Population , Genotype , Heterozygote , Phenotype , Polymorphism, Single Nucleotide
17.
Blood ; 143(23): 2425-2432, 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38498041

ABSTRACT

The factor V Leiden (FVL; rs6025) and prothrombin G20210A (PTGM; rs1799963) polymorphisms are 2 of the most well-studied genetic risk factors for venous thromboembolism (VTE). However, double heterozygosity (DH) for FVL and PTGM remains poorly understood, with previous studies showing marked disagreement regarding thrombosis risk conferred by the DH genotype. Using multidimensional data from the UK Biobank (UKB) and FinnGen biorepositories, we evaluated the clinical impact of DH carrier status across 937,939 individuals. We found that 662 participants (0.07%) were DH carriers. After adjustment for age, sex, and ancestry, DH individuals experienced a markedly elevated risk of VTE compared with wild-type individuals (odds ratio [OR] = 5.24; 95% confidence interval [CI], 4.01-6.84; P = 4.8 × 10^-34), which approximated the risk conferred by FVL homozygosity. A secondary analysis restricted to UKB participants (N = 445,144) found that effect size estimates for the DH genotype remained largely unchanged (OR = 4.53; 95% CI, 3.42-5.90; P < 1 × 10^-16) after adjustment for commonly cited VTE risk factors, such as body mass index, blood type, and markers of inflammation. In contrast, the DH genotype was not associated with a significantly higher risk of any arterial thrombosis phenotype, including stroke, myocardial infarction, and peripheral artery disease. In summary, we leveraged population-scale genomic data sets to conduct, to our knowledge, the largest study to date on the DH genotype and were able to establish far more precise effect size estimates than previously possible. Our findings indicate that the DH genotype may occur as frequently as FVL homozygosity and may confer a similarly increased risk of VTE.


Subjects
Biological Specimen Banks , Factor V , Heterozygote , Prothrombin , Humans , Prothrombin/genetics , Factor V/genetics , Female , Male , Middle Aged , United Kingdom/epidemiology , Aged , Risk Factors , Venous Thromboembolism/genetics , Venous Thromboembolism/epidemiology , Adult , Thrombosis/genetics , Thrombosis/epidemiology , Thrombosis/etiology , Genetic Predisposition to Disease , Genotype , Polymorphism, Single Nucleotide , UK Biobank
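The headline DH estimate above is an odds ratio adjusted for age, sex, and ancestry. The underlying quantity can be illustrated with an unadjusted odds ratio and a Woolf (log-scale) 95% confidence interval from a 2 × 2 table. This is a minimal sketch: `odds_ratio_ci` is a hypothetical helper, the counts are invented for illustration, and the function does not reproduce the study's covariate-adjusted analysis.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    2 x 2 layout: a = carriers with VTE,     b = carriers without VTE,
                  c = non-carriers with VTE, d = non-carriers without VTE.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; apply a continuity correction first")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Hypothetical counts, not taken from the study.
or_, lo, hi = odds_ratio_ci(20, 80, 100, 2000)
print(f"OR = {or_:.2f}; 95% CI, {lo:.2f}-{hi:.2f}")
```

The interval is symmetric on the log scale, which is why reported odds-ratio intervals such as 4.01-6.84 around 5.24 look asymmetric on the natural scale.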
18.
Nature ; 581(7809): 459-464, 2020 05.
Article in English | MEDLINE | ID: mdl-32461653

ABSTRACT

Naturally occurring human genetic variants that are predicted to inactivate protein-coding genes provide an in vivo model of human gene inactivation that complements knockout studies in cells and model organisms. Here we report three key findings regarding the assessment of candidate drug targets using human loss-of-function variants. First, even essential genes, in which loss-of-function variants are not tolerated, can be highly successful as targets of inhibitory drugs. Second, in most genes, loss-of-function variants are sufficiently rare that genotype-based ascertainment of homozygous or compound heterozygous 'knockout' humans will await sample sizes that are approximately 1,000 times those presently available, unless recruitment focuses on consanguineous individuals. Third, automated variant annotation and filtering are powerful, but manual curation remains crucial for removing artefacts, and is a prerequisite for recall-by-genotype efforts. Our results provide a roadmap for human knockout studies and should guide the interpretation of loss-of-function variants in drug development.
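
Why homozygous or compound heterozygous "knockout" carriers are so rare, and why focusing on consanguineous individuals helps, follows from a simple Hardy-Weinberg model. The Python sketch below is an illustration under stated assumptions, not the paper's calculation: it assumes random mating apart from an inbreeding coefficient `f`, and pools all pLoF alleles in a gene into one combined allele of frequency `q`, so any two pLoF alleles count as a knockout. The function names and the value of `q` are hypothetical.

```python
def knockout_probability(q: float, f: float = 0.0) -> float:
    """P(two pLoF alleles at one gene) under Hardy-Weinberg with inbreeding.

    q: combined frequency of all pLoF alleles in the gene (assumption: pooled)
    f: inbreeding coefficient (0 for random mating; 1/16 for first-cousin offspring)
    """
    # With probability f both copies are identical by descent (pLoF with prob q);
    # otherwise the two copies are independent draws, both pLoF with prob q**2.
    return (1 - f) * q**2 + f * q


def expected_knockouts(n: int, q: float, f: float = 0.0) -> float:
    """Expected number of 'knockout' individuals in a sample of size n."""
    return n * knockout_probability(q, f)


q = 1e-3  # hypothetical combined pLoF frequency
print(expected_knockouts(100_000, q))          # outbred sample
print(expected_knockouts(100_000, q, f=1/16))  # first-cousin offspring
```

With q = 10⁻³, a random-mating sample of 100,000 yields about 0.1 expected knockouts, while the same number of first-cousin offspring yields about 6.3, because the f·q term dominates when q is small. Under random mating, expecting even one knockout requires n of roughly 1/q².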


Subjects
Genes, Essential/drug effects , Genes, Essential/genetics , Loss of Function Mutation/genetics , Molecular Targeted Therapy , Artifacts , Automation , Consanguinity , Exons/genetics , Gain of Function Mutation/genetics , Gene Frequency , Gene Knockdown Techniques , Heterozygote , Homozygote , Humans , Huntingtin Protein/genetics , Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/genetics , Neurodegenerative Diseases/genetics , Prion Proteins/genetics , Reproducibility of Results , Sample Size , tau Proteins/genetics
19.
Nature ; 581(7809): 452-458, 2020 05.
Article in English | MEDLINE | ID: mdl-32461655

ABSTRACT

The acceleration of DNA sequencing in samples from patients and population studies has resulted in extensive catalogues of human genetic variation, but the interpretation of rare genetic variants remains problematic. A notable example of this challenge is the existence of disruptive variants in dosage-sensitive disease genes, even in apparently healthy individuals. Here, by manual curation of putative loss-of-function (pLoF) variants in haploinsufficient disease genes in the Genome Aggregation Database (gnomAD)¹, we show that one explanation for this paradox involves alternative splicing of mRNA, which allows exons of a gene to be expressed at varying levels across different cell types. Currently, no existing annotation tool systematically incorporates information about exon expression into the interpretation of variants. We develop a transcript-level annotation metric known as the 'proportion expressed across transcripts', which quantifies isoform expression for variants. We calculate this metric using 11,706 tissue samples from the Genotype Tissue Expression (GTEx) project² and show that it can differentiate between weakly and highly evolutionarily conserved exons, a proxy for functional importance. We demonstrate that expression-based annotation selectively filters 22.8% of falsely annotated pLoF variants found in haploinsufficient disease genes in gnomAD, while removing less than 4% of high-confidence pathogenic variants in the same genes. Finally, we apply our expression filter to the analysis of de novo variants in patients with autism spectrum disorder and intellectual disability or developmental disorders to show that pLoF variants in weakly expressed regions have similar effect sizes to those of synonymous variants, whereas pLoF variants in highly expressed exons are most strongly enriched among cases.
Our annotation is fast, flexible and generalizable, making it possible for any variant file to be annotated with any isoform expression dataset, and will be valuable for the genetic diagnosis of rare diseases, the analysis of rare variant burden in complex disorders, and the curation and prioritization of variants in recall-by-genotype studies.
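
The core of the 'proportion expressed across transcripts' idea can be sketched as a ratio: in each tissue, the expression of the transcripts that include a variant divided by the total expression of the gene. The Python below is a simplified illustration under stated assumptions, not the published implementation: transcripts are given as hypothetical exon intervals with per-tissue TPM values, a variant counts as included when its position falls inside an exon, and per-tissue values are summarized by a plain mean. The published metric works from variant annotations on GTEx isoform quantifications and may differ in these details; all names here are hypothetical.

```python
from statistics import mean

# Hypothetical toy model: transcript ID -> (exon intervals, {tissue: TPM}).
Transcript = tuple[list[tuple[int, int]], dict[str, float]]


def contains(exons: list[tuple[int, int]], pos: int) -> bool:
    """True if pos lies inside any (inclusive) exon interval."""
    return any(start <= pos <= end for start, end in exons)


def expression_proportion_by_tissue(transcripts: dict[str, Transcript], pos: int) -> dict[str, float]:
    """Per tissue: TPM of transcripts containing pos / TPM of all transcripts."""
    tissues = sorted({t for _, tpm in transcripts.values() for t in tpm})
    result = {}
    for tissue in tissues:
        total = sum(tpm.get(tissue, 0.0) for _, tpm in transcripts.values())
        if total == 0:
            continue  # gene not expressed in this tissue: ratio undefined
        covering = sum(tpm.get(tissue, 0.0)
                       for exons, tpm in transcripts.values() if contains(exons, pos))
        result[tissue] = covering / total
    return result


def summarize(transcripts: dict[str, Transcript], pos: int) -> float:
    """Mean across expressed tissues (one simple summary; an assumption)."""
    values = expression_proportion_by_tissue(transcripts, pos)
    return mean(values.values()) if values else float("nan")


gene = {
    "tx1": ([(100, 200), (300, 400)], {"brain": 30.0, "liver": 10.0}),
    "tx2": ([(100, 200)], {"brain": 10.0, "liver": 30.0}),
}
print(expression_proportion_by_tissue(gene, 350))  # {'brain': 0.75, 'liver': 0.25}
print(summarize(gene, 150))                         # 1.0
```

A variant in an exon used only by weakly expressed isoforms receives a low value, which is the signal the abstract uses to flag likely false pLoF annotations.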


Subjects
Disease/genetics , Haploinsufficiency/genetics , Loss of Function Mutation/genetics , Molecular Sequence Annotation , Transcription, Genetic , Transcriptome/genetics , Autism Spectrum Disorder/genetics , Datasets as Topic , Developmental Disabilities/genetics , Exons/genetics , Female , Genotype , Humans , Intellectual Disability/genetics , Male , Molecular Sequence Annotation/standards , Poisson Distribution , RNA, Messenger/analysis , RNA, Messenger/genetics , Rare Diseases/diagnosis , Rare Diseases/genetics , Reproducibility of Results , Exome Sequencing
20.
Nature ; 586(7831): 769-775, 2020 10.
Article in English | MEDLINE | ID: mdl-33057200

ABSTRACT

Myeloproliferative neoplasms (MPNs) are blood cancers that are characterized by the excessive production of mature myeloid cells and arise from the acquisition of somatic driver mutations in haematopoietic stem cells (HSCs). Epidemiological studies indicate a substantial heritable component of MPNs that is among the highest known for cancers¹. However, only a limited number of genetic risk loci have been identified, and the underlying biological mechanisms that lead to the acquisition of MPNs remain unclear. Here, by conducting a large-scale genome-wide association study (3,797 cases and 1,152,977 controls), we identify 17 MPN risk loci (P < 5.0 × 10⁻⁸), 7 of which have not been previously reported. We find that there is a shared genetic architecture between MPN risk and several haematopoietic traits from distinct lineages; that there is an enrichment for MPN risk variants within accessible chromatin of HSCs; and that increased MPN risk is associated with longer telomere length in leukocytes and other clonal haematopoietic states, collectively suggesting that MPN risk is associated with the function and self-renewal of HSCs. We use gene mapping to identify modulators of HSC biology linked to MPN risk, and show through targeted variant-to-function assays that CHEK2 and GFI1B have roles in altering the function of HSCs to confer disease risk. Overall, our results reveal a previously unappreciated mechanism for inherited MPN risk through the modulation of HSC function.
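
The threshold P < 5.0 × 10⁻⁸ used above is the conventional genome-wide significance level, usually motivated as a Bonferroni correction of α = 0.05 for roughly one million independent common-variant tests. The minimal Python sketch below shows that arithmetic and a naive filter over summary statistics. The function names, variant IDs and P values are hypothetical, and real pipelines also group correlated hits into loci (for example by LD clumping), which this sketch omits.

```python
def bonferroni_threshold(alpha: float, n_tests: float) -> float:
    """Per-test significance level that controls the family-wise error rate at alpha."""
    return alpha / n_tests


GENOME_WIDE = bonferroni_threshold(0.05, 1e6)  # approximately 5e-8


def significant(pvalues: dict[str, float], threshold: float = GENOME_WIDE) -> list[str]:
    """IDs whose P value falls below the threshold, sorted by P value."""
    return sorted((v for v, p in pvalues.items() if p < threshold), key=pvalues.get)


# Hypothetical summary statistics (not from the study).
toy = {"variant_a": 3e-12, "variant_b": 4e-7, "variant_c": 2e-9}
print(significant(toy))  # ['variant_a', 'variant_c']
```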


Subjects
Genetic Predisposition to Disease/genetics , Hematopoietic Stem Cells/pathology , Myeloproliferative Disorders/genetics , Myeloproliferative Disorders/pathology , Neoplasms/genetics , Neoplasms/pathology , Cell Lineage/genetics , Cell Self Renewal , Checkpoint Kinase 2/genetics , Female , Humans , Leukocytes/pathology , Male , Proto-Oncogene Proteins/genetics , Repressor Proteins/genetics , Risk , Telomere Homeostasis