Results 1 - 17 of 17
1.
Nature ; 625(7993): 92-100, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38057664

ABSTRACT

The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders [1-4], but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.


Subject(s)
Human Genome, Genomics, Genetic Models, Mutation, Humans, Access to Information, Genetic Databases, Datasets as Topic, Gene Frequency, Human Genome/genetics, Mutation/genetics, Genetic Selection
2.
Am J Hum Genet ; 110(12): 2068-2076, 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-38000370

ABSTRACT

DNA sample contamination is a major issue in clinical and research applications of whole-genome and -exome sequencing. Even modest levels of contamination can substantially affect the overall quality of variant calls and lead to widespread genotyping errors. Currently, popular tools for estimating the contamination level use short-read data (BAM/CRAM files), which are expensive to store and manipulate and often not retained or shared widely. We propose a metric to estimate DNA sample contamination from variant-level whole-genome and -exome sequence data called CHARR, contamination from homozygous alternate reference reads, which leverages the infiltration of reference reads within homozygous alternate variant calls. CHARR uses a small proportion of variant-level genotype information and thus can be computed from single-sample gVCFs or callsets in VCF or BCF formats, as well as efficiently stored variant calls in Hail VariantDataset format. Our results demonstrate that CHARR accurately recapitulates results from existing tools with substantially reduced costs, improving the accuracy and efficiency of downstream analyses of ultra-large whole-genome and exome sequencing datasets.
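
The idea of measuring reference reads within homozygous alternate calls can be illustrated with a minimal sketch. It assumes a simplified, unnormalized version of the metric (the plain mean reference-read fraction); the published CHARR definition may differ, for example by adjusting for population allele frequency. The function name and the `(genotype, ref_depth, alt_depth)` tuple layout are hypothetical and are not taken from the paper or from Hail.

```python
from statistics import mean


def mean_ref_read_fraction(calls):
    """Average reference-read fraction across homozygous-alternate calls.

    `calls` is an iterable of (genotype, ref_depth, alt_depth) tuples.
    This layout is a hypothetical stand-in for per-variant allele depths.
    """
    fractions = []
    for genotype, ref_depth, alt_depth in calls:
        total = ref_depth + alt_depth
        # Only covered hom-alt calls are informative: in an uncontaminated
        # sample they should carry almost no reference reads.
        if genotype == "1/1" and total > 0:
            fractions.append(ref_depth / total)
    # Return None when no informative call is available.
    return mean(fractions) if fractions else None


# Toy input: two hom-alt calls and one heterozygous call (ignored).
calls = [("1/1", 1, 99), ("1/1", 0, 50), ("0/1", 20, 20)]
print(mean_ref_read_fraction(calls))
```

A higher value suggests more reference reads leaking into calls that should be purely alternate, which is the signal the abstract describes.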


Subject(s)
DNA, Trout, Humans, Animals, DNA Sequence Analysis/methods, Genotype, Homozygote, High-Throughput Nucleotide Sequencing/methods, Software
3.
Am J Hum Genet ; 110(9): 1454-1469, 2023 Sep 07.
Article in English | MEDLINE | ID: mdl-37595579

ABSTRACT

Short-read genome sequencing (GS) holds the promise of becoming the primary diagnostic approach for the assessment of autism spectrum disorder (ASD) and fetal structural anomalies (FSAs). However, few studies have comprehensively evaluated its performance against current standard-of-care diagnostic tests: karyotype, chromosomal microarray (CMA), and exome sequencing (ES). To assess the clinical utility of GS, we compared its diagnostic yield against these three tests in 1,612 quartet families including an individual with ASD and in 295 prenatal families. Our GS analytic framework identified a diagnostic variant in 7.8% of ASD probands, almost 2-fold more than CMA (4.3%) and 3-fold more than ES (2.7%). However, when we systematically captured copy-number variants (CNVs) from the exome data, the diagnostic yield of ES (7.4%) was brought much closer to, but did not surpass, GS. Similarly, we estimated that GS could achieve an overall diagnostic yield of 46.1% in unselected FSAs, representing a 17.2% increased yield over karyotype, 14.1% over CMA, and 4.1% over ES with CNV calling or 36.1% increase without CNV discovery. Overall, GS provided an added diagnostic yield of 0.4% and 0.8% beyond the combination of all three standard-of-care tests in ASD and FSAs, respectively. This corresponded to nine GS unique diagnostic variants, including sequence variants in exons not captured by ES, structural variants (SVs) inaccessible to existing standard-of-care tests, and SVs where the resolution of GS changed variant classification. Overall, this large-scale evaluation demonstrated that GS significantly outperforms each individual standard-of-care test while also outperforming the combination of all three tests, thus warranting consideration as the first-tier diagnostic approach for the assessment of ASD and FSAs.
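
The fold differences quoted for ASD probands can be checked with a short calculation. This is only an arithmetic sketch using the percentages stated in the abstract; the dictionary keys (including the label `ES+CNV` for exome sequencing with CNV calling) and the function name are illustrative assumptions.

```python
# Diagnostic yields (%) in ASD probands, as quoted in the abstract.
yields = {"GS": 7.8, "CMA": 4.3, "ES": 2.7, "ES+CNV": 7.4}


def fold_over(test):
    """Ratio of the GS diagnostic yield to the yield of another test."""
    return yields["GS"] / yields[test]


for test in ("CMA", "ES", "ES+CNV"):
    print(f"GS vs {test}: {fold_over(test):.2f}-fold")
```

The ratios come out slightly below 2 and 3 for CMA and ES, matching the abstract's "almost 2-fold" and "3-fold", and close to 1 once CNVs are called from the exome data.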


Subject(s)
Autism Spectrum Disorder, Female, Pregnancy, Humans, Autism Spectrum Disorder/diagnosis, Autism Spectrum Disorder/genetics, First Pregnancy Trimester, Prenatal Ultrasonography, Chromosome Mapping, Exome
4.
Nature ; 581(7809): 444-451, 2020 May.
Article in English | MEDLINE | ID: mdl-32461652

ABSTRACT

Structural variants (SVs) rearrange large segments of DNA [1] and can have profound consequences in evolution and human disease [2,3]. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD) [4] have become integral in the interpretation of single-nucleotide variants (SNVs) [5]. However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25-29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage [6]. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings [7]. This SV resource is freely distributed via the gnomAD browser [8] and will have broad utility in population genetics, disease-association studies, and diagnostic screening.


Subject(s)
Disease/genetics, Genetic Variation, Medical Genetics/standards, Population Genetics/standards, Human Genome/genetics, Female, Genetic Testing, Genotyping Techniques, Humans, Male, Middle Aged, Mutation, Single Nucleotide Polymorphism/genetics, Racial Groups/genetics, Reference Standards, Genetic Selection, Whole Genome Sequencing
5.
Nature ; 581(7809): 434-443, 2020 May.
Article in English | MEDLINE | ID: mdl-32461654

ABSTRACT

Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes [1]. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.


Subject(s)
Exome/genetics, Essential Genes/genetics, Genetic Variation/genetics, Human Genome/genetics, Adult, Brain/metabolism, Cardiovascular Diseases/genetics, Cohort Studies, Genetic Databases, Female, Genetic Predisposition to Disease/genetics, Genome-Wide Association Study, Humans, Loss of Function Mutation/genetics, Male, Mutation Rate, Proprotein Convertase 9/genetics, Messenger RNA/genetics, Reproducibility of Results, Exome Sequencing, Whole Genome Sequencing
10.
Biophys J ; 105(12): 2832-42, 2013 Dec 17.
Article in English | MEDLINE | ID: mdl-24359755

ABSTRACT

It has been observed experimentally that cells from failing hearts exhibit elevated levels of reactive oxygen species (ROS) upon increases in energetic workload. One proposed mechanism for this behavior is mitochondrial Ca(2+) mismanagement that leads to depletion of ROS scavengers. Here, we present a computational model to test this hypothesis. Previously published models of ROS production and scavenging were combined and reparameterized to describe ROS regulation in the cellular environment. Extramitochondrial Ca(2+) pulses were applied to simulate frequency-dependent changes in cytosolic Ca(2+). Model results show that decreased mitochondrial Ca(2+) uptake due to mitochondrial Ca(2+) uniporter inhibition (simulating Ru360) or elevated cytosolic Na(+), as in heart failure, leads to a decreased supply of NADH and NADPH upon increasing cellular workload. Oxidation of NADPH leads to oxidation of glutathione (GSH) and increased mitochondrial ROS levels, validating the Ca(2+) mismanagement hypothesis. The model goes on to predict that the ratio of steady-state [H2O2]m during 3 Hz pacing to [H2O2]m at rest is highly sensitive to the size of the GSH pool. The largest relative increase in [H2O2]m in response to pacing is shown to occur when the total GSH and GSSG is close to 1 mM, whereas pool sizes below 0.9 mM result in high resting H2O2 levels, a quantitative prediction only possible with a computational model.


Subject(s)
Free Radical Scavengers/metabolism, Glutathione/metabolism, Heart Failure/metabolism, Mitochondria/metabolism, Cardiovascular Models, Reactive Oxygen Species/metabolism, Animals, Calcium/metabolism, Humans, NAD/metabolism, Sodium/metabolism
11.
Biophys J ; 105(4): 1045-56, 2013 Aug 20.
Article in English | MEDLINE | ID: mdl-23972856

ABSTRACT

Elevated levels of reactive oxygen species (ROS) play a critical role in cardiac myocyte signaling in both healthy and diseased cells. Mitochondria represent the predominant cellular source of ROS, specifically the activity of complexes I and III. The model presented here explores the modulation of electron transport chain ROS production for state 3 and state 4 respiration and the role of substrates and respiratory inhibitors. Model simulations show that ROS production from complex III increases exponentially with membrane potential (ΔΨm) when in state 4. Complex I ROS release in the model can occur in the presence of NADH and succinate (reverse electron flow), leading to a highly reduced ubiquinone pool, displaying the highest ROS production flux in state 4. In the presence of ample ROS scavenging, total ROS production is moderate in state 3 and increases substantially under state 4 conditions. The ROS production model was extended by combining it with a minimal model of ROS scavenging. When the mitochondrial redox status was oxidized by increasing the proton permeability of the inner mitochondrial membrane, simulations with the combined model show that ROS levels initially decline as production drops off with decreasing ΔΨm and then increase as scavenging capacity is exhausted. Hence, this mechanistic model of ROS production demonstrates how ROS levels are controlled by mitochondrial redox balance.


Subject(s)
Computer Simulation, Mitochondria/metabolism, Myocardium/cytology, Reactive Oxygen Species/metabolism, Cell Respiration, Electron Transport Chain Complex Proteins/chemistry, Electron Transport Chain Complex Proteins/metabolism, Mitochondrial Membrane Potential, Biological Models, Molecular Models, Oxidation-Reduction, Protein Conformation
12.
bioRxiv ; 2023 Jun 28.
Article in English | MEDLINE | ID: mdl-37425834

ABSTRACT

DNA sample contamination is a major issue in clinical and research applications of whole genome and exome sequencing. Even modest levels of contamination can substantially affect the overall quality of variant calls and lead to widespread genotyping errors. Currently, popular tools for estimating the contamination level use short-read data (BAM/CRAM files), which are expensive to store and manipulate and often not retained or shared widely. We propose a new metric to estimate DNA sample contamination from variant-level whole genome and exome sequence data, CHARR, Contamination from Homozygous Alternate Reference Reads, which leverages the infiltration of reference reads within homozygous alternate variant calls. CHARR uses a small proportion of variant-level genotype information and thus can be computed from single-sample gVCFs or callsets in VCF or BCF formats, as well as efficiently stored variant calls in Hail VDS format. Our results demonstrate that CHARR accurately recapitulates results from existing tools with substantially reduced costs, improving the accuracy and efficiency of downstream analyses of ultra-large whole genome and exome sequencing datasets.

13.
Nat Genet ; 55(9): 1589-1597, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37604963

ABSTRACT

Copy number variants (CNVs) are major contributors to genetic diversity and disease. While standardized methods, such as the genome analysis toolkit (GATK), exist for detecting short variants, technical challenges have confounded uniform large-scale CNV analyses from whole-exome sequencing (WES) data. Given the profound impact of rare and de novo coding CNVs on genome organization and human disease, we developed GATK-gCNV, a flexible algorithm to discover rare CNVs from sequencing read-depth information, complete with open-source distribution via GATK. We benchmarked GATK-gCNV in 7,962 exomes from individuals in quartet families with matched genome sequencing and microarray data, finding up to 95% recall of rare coding CNVs at a resolution of more than two exons. We used GATK-gCNV to generate a reference catalog of rare coding CNVs in WES data from 197,306 individuals in the UK Biobank, and observed strong correlations between per-gene CNV rates and measures of mutational constraint, as well as rare CNV associations with multiple traits. In summary, GATK-gCNV is a tunable approach for sensitive and specific CNV discovery in WES data, with broad applications.


Subject(s)
DNA Copy Number Variations, Exome, Humans, Exome/genetics, Exome Sequencing, DNA Copy Number Variations/genetics, Chromosome Mapping, Exons
14.
Cell Genom ; 2(9): 100168, 2022 Sep 14.
Article in English | MEDLINE | ID: mdl-36778668

ABSTRACT

Genome-wide association studies have successfully discovered thousands of common variants associated with human diseases and traits, but the landscape of rare variations in human disease has not been explored at scale. Exome-sequencing studies of population biobanks provide an opportunity to systematically evaluate the impact of rare coding variations across a wide range of phenotypes to discover genes and allelic series relevant to human health and disease. Here, we present results from systematic association analyses of 4,529 phenotypes using single-variant and gene tests of 394,841 individuals in the UK Biobank with exome-sequence data. We find that the discovery of genetic associations is tightly linked to frequency and is correlated with metrics of deleteriousness and natural selection. We highlight biological findings elucidated by these data and release the dataset as a public resource alongside the Genebass browser for rapidly exploring rare-variant association results.

15.
Nat Commun ; 11(1): 2539, 2020 May 27.
Article in English | MEDLINE | ID: mdl-32461613

ABSTRACT

Multi-nucleotide variants (MNVs), defined as two or more nearby variants existing on the same haplotype in an individual, are a clinically and biologically important class of genetic variation. However, existing tools typically do not accurately classify MNVs, and understanding of their mutational origins remains limited. Here, we systematically survey MNVs in 125,748 whole exomes and 15,708 whole genomes from the Genome Aggregation Database (gnomAD). We identify 1,792,248 MNVs across the genome with constituent variants falling within 2 bp distance of one another, including 18,756 variants with a novel combined effect on protein sequence. Finally, we estimate the relative impact of known mutational mechanisms (CpG deamination, replication error by polymerase zeta, and polymerase slippage at repeat junctions) on the generation of MNVs. Our results demonstrate the value of haplotype-aware variant annotation, and refine our understanding of genome-wide mutational mechanisms of MNVs.


Subject(s)
Exome, Genetic Variation, Human Genome, CpG Islands, DNA Mutational Analysis, Genetic Databases, Humans, Mutation