2.
Nature ; 625(7993): 92-100, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38057664

ABSTRACT

The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders1-4, but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.


Subject(s)
Human Genome , Genomics , Genetic Models , Mutation , Humans , Access to Information , Genetic Databases , Datasets as Topic , Gene Frequency , Human Genome/genetics , Mutation/genetics , Genetic Selection
3.
Commun Biol ; 5(1): 140, 2022 02 17.
Article in English | MEDLINE | ID: mdl-35177770

ABSTRACT

The Weddell seal (Leptonychotes weddellii) thrives in its extreme Antarctic environment. We generated the Weddell seal genome assembly and a high-quality annotation to investigate genome-wide evolutionary pressures that underlie its phenotype and to study genes implicated in hypoxia tolerance and a lipid-based metabolism. Genome-wide analyses included gene family expansion/contraction, positive selection, and diverged sequence (acceleration) compared to other placental mammals, identifying selection in coding and non-coding sequence in five pathways that may shape cardiovascular phenotype. Lipid metabolism as well as hypoxia genes contained more accelerated regions in the Weddell seal compared to genomic background. Top-significant genes were SUMO2 and EP300; both regulate hypoxia inducible factor signaling. Liver expression of four genes with the strongest acceleration signals differ between Weddell seals and a terrestrial mammal, sheep. We also report a high-density lipoprotein-like particle in Weddell seal serum not present in other mammals, including the shallow-diving harbor seal.


Subject(s)
Genome-Wide Association Study , Genome , Phocidae/genetics , Animals , Antarctic Regions , Gene Expression Regulation/physiology , Lipid Metabolism , Oxygen/metabolism , Phylogeny , Species Specificity
12.
Nature ; 587(7833): 246-251, 2020 11.
Article in English | MEDLINE | ID: mdl-33177663

ABSTRACT

New genome assemblies have been arriving at a rapidly increasing pace, thanks to decreases in sequencing costs and improvements in third-generation sequencing technologies1-3. For example, the number of vertebrate genome assemblies currently in the NCBI (National Center for Biotechnology Information) database4 increased by more than 50% to 1,485 assemblies in the year from July 2018 to July 2019. In addition to this influx of assemblies from different species, new human de novo assemblies5 are being produced, which enable the analysis of not only small polymorphisms, but also complex, large-scale structural differences between human individuals and haplotypes. This coming era and its unprecedented amount of data offer the opportunity to uncover many insights into genome evolution but also present challenges in how to adapt current analysis methods to meet the increased scale. Cactus6, a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes, and struggles in regions of highly duplicated sequences. Here we describe progressive extensions to Cactus to create Progressive Cactus, which enables the reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We describe results from an alignment of more than 600 amniote genomes, which is to our knowledge the largest multiple vertebrate genome alignment created so far.


Subject(s)
Genome/genetics , Genomics/methods , Sequence Alignment/methods , Software , Vertebrates/genetics , Amnion , Animals , Computer Simulation , Genomics/standards , Haplotypes , Humans , Quality Control , Sequence Alignment/standards , Software/standards
13.
Nat Commun ; 11(1): 2539, 2020 05 27.
Article in English | MEDLINE | ID: mdl-32461613

ABSTRACT

Multi-nucleotide variants (MNVs), defined as two or more nearby variants existing on the same haplotype in an individual, are a clinically and biologically important class of genetic variation. However, existing tools typically do not accurately classify MNVs, and understanding of their mutational origins remains limited. Here, we systematically survey MNVs in 125,748 whole exomes and 15,708 whole genomes from the Genome Aggregation Database (gnomAD). We identify 1,792,248 MNVs across the genome with constituent variants falling within 2 bp distance of one another, including 18,756 variants with a novel combined effect on protein sequence. Finally, we estimate the relative impact of known mutational mechanisms - CpG deamination, replication error by polymerase zeta, and polymerase slippage at repeat junctions - on the generation of MNVs. Our results demonstrate the value of haplotype-aware variant annotation, and refine our understanding of genome-wide mutational mechanisms of MNVs.


Subject(s)
Exome , Genetic Variation , Human Genome , CpG Islands , DNA Mutational Analysis , Genetic Databases , Humans , Mutation
14.
Nat Commun ; 11(1): 2523, 2020 05 27.
Article in English | MEDLINE | ID: mdl-32461616

ABSTRACT

Upstream open reading frames (uORFs) are tissue-specific cis-regulators of protein translation. Isolated reports have shown that variants that create or disrupt uORFs can cause disease. Here, in a systematic genome-wide study using 15,708 whole genome sequences, we show that variants that create new upstream start codons, and variants disrupting stop sites of existing uORFs, are under strong negative selection. This selection signal is significantly stronger for variants arising upstream of genes intolerant to loss-of-function variants. Furthermore, variants creating uORFs that overlap the coding sequence show signals of selection equivalent to coding missense variants. Finally, we identify specific genes where modification of uORFs likely represents an important disease mechanism, and report a novel uORF frameshift variant upstream of NF2 in neurofibromatosis. Our results highlight uORF-perturbing variants as an under-recognised functional class that contribute to penetrant human disease, and demonstrate the power of large-scale population sequencing data in studying non-coding variant classes.


Subject(s)
5' Untranslated Regions , Genetic Variation , Loss of Function Mutation , Proteins/genetics , Base Sequence , Human Genome , Humans , Open Reading Frames
15.
Nature ; 581(7809): 444-451, 2020 05.
Article in English | MEDLINE | ID: mdl-32461652

ABSTRACT

Structural variants (SVs) rearrange large segments of DNA1 and can have profound consequences in evolution and human disease2,3. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD)4 have become integral in the interpretation of single-nucleotide variants (SNVs)5. However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25-29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage6. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings7. This SV resource is freely distributed via the gnomAD browser8 and will have broad utility in population genetics, disease-association studies, and diagnostic screening.


Subject(s)
Disease/genetics , Genetic Variation , Medical Genetics/standards , Population Genetics/standards , Human Genome/genetics , Female , Genetic Testing , Genotyping Techniques , Humans , Male , Middle Aged , Mutation , Single Nucleotide Polymorphism/genetics , Racial Groups/genetics , Reference Standards , Genetic Selection , Whole Genome Sequencing
16.
Nature ; 581(7809): 459-464, 2020 05.
Article in English | MEDLINE | ID: mdl-32461653

ABSTRACT

Naturally occurring human genetic variants that are predicted to inactivate protein-coding genes provide an in vivo model of human gene inactivation that complements knockout studies in cells and model organisms. Here we report three key findings regarding the assessment of candidate drug targets using human loss-of-function variants. First, even essential genes, in which loss-of-function variants are not tolerated, can be highly successful as targets of inhibitory drugs. Second, in most genes, loss-of-function variants are sufficiently rare that genotype-based ascertainment of homozygous or compound heterozygous 'knockout' humans will await sample sizes that are approximately 1,000 times those presently available, unless recruitment focuses on consanguineous individuals. Third, automated variant annotation and filtering are powerful, but manual curation remains crucial for removing artefacts, and is a prerequisite for recall-by-genotype efforts. Our results provide a roadmap for human knockout studies and should guide the interpretation of loss-of-function variants in drug development.


Subject(s)
Essential Genes/drug effects , Essential Genes/genetics , Loss of Function Mutation/genetics , Molecular Targeted Therapy , Artifacts , Automation , Consanguinity , Exons/genetics , Gain of Function Mutation/genetics , Gene Frequency , Gene Knockdown Techniques , Heterozygote , Homozygote , Humans , Huntingtin Protein/genetics , Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/genetics , Neurodegenerative Diseases/genetics , Prion Proteins/genetics , Reproducibility of Results , Sample Size , tau Proteins/genetics
17.
Nature ; 581(7809): 452-458, 2020 05.
Article in English | MEDLINE | ID: mdl-32461655

ABSTRACT

The acceleration of DNA sequencing in samples from patients and population studies has resulted in extensive catalogues of human genetic variation, but the interpretation of rare genetic variants remains problematic. A notable example of this challenge is the existence of disruptive variants in dosage-sensitive disease genes, even in apparently healthy individuals. Here, by manual curation of putative loss-of-function (pLoF) variants in haploinsufficient disease genes in the Genome Aggregation Database (gnomAD)1, we show that one explanation for this paradox involves alternative splicing of mRNA, which allows exons of a gene to be expressed at varying levels across different cell types. Currently, no existing annotation tool systematically incorporates information about exon expression into the interpretation of variants. We develop a transcript-level annotation metric known as the 'proportion expressed across transcripts', which quantifies isoform expression for variants. We calculate this metric using 11,706 tissue samples from the Genotype Tissue Expression (GTEx) project2 and show that it can differentiate between weakly and highly evolutionarily conserved exons, a proxy for functional importance. We demonstrate that expression-based annotation selectively filters 22.8% of falsely annotated pLoF variants found in haploinsufficient disease genes in gnomAD, while removing less than 4% of high-confidence pathogenic variants in the same genes. Finally, we apply our expression filter to the analysis of de novo variants in patients with autism spectrum disorder and intellectual disability or developmental disorders to show that pLoF variants in weakly expressed regions have similar effect sizes to those of synonymous variants, whereas pLoF variants in highly expressed exons are most strongly enriched among cases. 
Our annotation is fast, flexible and generalizable, making it possible for any variant file to be annotated with any isoform expression dataset, and will be valuable for the genetic diagnosis of rare diseases, the analysis of rare variant burden in complex disorders, and the curation and prioritization of variants in recall-by-genotype studies.


Subject(s)
Disease/genetics , Haploinsufficiency/genetics , Loss of Function Mutation/genetics , Molecular Sequence Annotation , Genetic Transcription , Transcriptome/genetics , Autism Spectrum Disorder/genetics , Datasets as Topic , Developmental Disabilities/genetics , Exons/genetics , Female , Genotype , Humans , Intellectual Disability/genetics , Male , Molecular Sequence Annotation/standards , Poisson Distribution , Messenger RNA/analysis , Messenger RNA/genetics , Rare Diseases/diagnosis , Rare Diseases/genetics , Reproducibility of Results , Exome Sequencing
18.
Nature ; 581(7809): 434-443, 2020 05.
Article in English | MEDLINE | ID: mdl-32461654

ABSTRACT

Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes1. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.


Subject(s)
Exome/genetics , Essential Genes/genetics , Genetic Variation/genetics , Human Genome/genetics , Adult , Brain/metabolism , Cardiovascular Diseases/genetics , Cohort Studies , Genetic Databases , Female , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study , Humans , Loss of Function Mutation/genetics , Male , Mutation Rate , Proprotein Convertase 9/genetics , Messenger RNA/genetics , Reproducibility of Results , Exome Sequencing , Whole Genome Sequencing
19.
Nat Med ; 26(6): 869-877, 2020 06.
Article in English | MEDLINE | ID: mdl-32461697

ABSTRACT

Human genetic variants predicted to cause loss-of-function of protein-coding genes (pLoF variants) provide natural in vivo models of human gene inactivation and can be valuable indicators of gene function and the potential toxicity of therapeutic inhibitors targeting these genes1,2. Gain-of-kinase-function variants in LRRK2 are known to significantly increase the risk of Parkinson's disease3,4, suggesting that inhibition of LRRK2 kinase activity is a promising therapeutic strategy. While preclinical studies in model organisms have raised some on-target toxicity concerns5-8, the biological consequences of LRRK2 inhibition have not been well characterized in humans. Here, we systematically analyze pLoF variants in LRRK2 observed across 141,456 individuals sequenced in the Genome Aggregation Database (gnomAD)9, 49,960 exome-sequenced individuals from the UK Biobank and over 4 million participants in the 23andMe genotyped dataset. After stringent variant curation, we identify 1,455 individuals with high-confidence pLoF variants in LRRK2. Experimental validation of three variants, combined with previous work10, confirmed reduced protein levels in 82.5% of our cohort. We show that heterozygous pLoF variants in LRRK2 reduce LRRK2 protein levels but that these are not strongly associated with any specific phenotype or disease state. Our results demonstrate the value of large-scale genomic databases and phenotyping of human loss-of-function carriers for target validation in drug discovery.


Subject(s)
Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/genetics , Loss of Function Mutation/genetics , Adult , Aged , Aged 80 and Over , Biological Specimen Banks , Cell Line , Embryonic Stem Cells/metabolism , Female , Gain of Function Mutation/genetics , Heterozygote , Humans , Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/antagonists & inhibitors , Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/metabolism , Longevity/genetics , Lymphocytes/metabolism , Male , Middle Aged , Cardiac Myocytes/metabolism , Parkinson Disease/drug therapy , Parkinson Disease/genetics , Phenotype
20.
Proc Natl Acad Sci U S A ; 116(51): 25745-25755, 2019 12 17.
Article in English | MEDLINE | ID: mdl-31772017

ABSTRACT

Venom systems are key adaptations that have evolved throughout the tree of life and typically facilitate predation or defense. Despite venoms being model systems for studying a variety of evolutionary and physiological processes, many taxonomic groups remain understudied, including venomous mammals. Within the order Eulipotyphla, multiple shrew species and solenodons have oral venom systems. Despite morphological variation of their delivery systems, it remains unclear whether venom represents the ancestral state in this group or is the result of multiple independent origins. We investigated the origin and evolution of venom in eulipotyphlans by characterizing the venom system of the endangered Hispaniolan solenodon (Solenodon paradoxus). We constructed a genome to underpin proteomic identifications of solenodon venom toxins, before undertaking evolutionary analyses of those constituents, and functional assessments of the secreted venom. Our findings show that solenodon venom consists of multiple paralogous kallikrein 1 (KLK1) serine proteases, which cause hypotensive effects in vivo, and seem likely to have evolved to facilitate vertebrate prey capture. Comparative analyses provide convincing evidence that the oral venom systems of solenodons and shrews have evolved convergently, with the 4 independent origins of venom in eulipotyphlans outnumbering all other venom origins in mammals. We find that KLK1s have been independently coopted into the venom of shrews and solenodons following their divergence during the late Cretaceous, suggesting that evolutionary constraints may be acting on these genes. Consequently, our findings represent a striking example of convergent molecular evolution and demonstrate that distinct structural backgrounds can yield equivalent functions.


Subject(s)
Eutheria , Molecular Evolution , Genome/genetics , Shrews , Venoms/genetics , Animals , Eutheria/classification , Eutheria/genetics , Eutheria/physiology , Gene Duplication , Male , Phylogeny , Proteomics , Shrews/classification , Shrews/genetics , Shrews/physiology , Tissue Kallikreins/genetics