Results 1 - 16 of 16
1.
bioRxiv ; 2024 Jan 10.
Article in English | MEDLINE | ID: mdl-38260295

ABSTRACT

The Variant Call Format (VCF) is widely used in genome sequencing but scales poorly. For instance, we estimate a 150,000-genome VCF would occupy 900 TiB, making it both costly and complicated to produce and analyze. The issue stems from VCF's requirement to densely represent both reference genotypes and allele-indexed arrays. These requirements lead to unnecessary data duplication and, ultimately, very large files. To address these challenges, we introduce the Scalable Variant Call Representation (SVCR). This representation reduces file sizes by ensuring they scale linearly with samples. SVCR achieves this by adopting reference blocks from the Genomic Variant Call Format (GVCF) and employing local allele indices. SVCR is also lossless and mergeable, allowing for N+1 and N+K incremental joint-calling. We present two implementations of SVCR: SVCR-VCF, which encodes SVCR in VCF format, and VDS, which uses Hail's native format. Our experiments confirm the linear scalability of SVCR-VCF and VDS, in contrast to the super-linear growth seen with standard VCF files. We also discuss the VDS Combiner, a scalable, open-source tool for producing a VDS from GVCFs, and unique features of VDS which enable rapid data analysis. SVCR, and VDS in particular, ensure the scientific community can generate, analyze, and disseminate genetics datasets with millions of samples.
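The abstract traces VCF's super-linear growth partly to allele-indexed arrays. A minimal sketch of that arithmetic, under stated assumptions: dense VCF stores, per sample, one AD value per allele and one PL value per possible diploid genotype, so the per-sample cost grows quadratically with the number of alleles at a site, while a local-allele encoding indexes those arrays only over the few alleles a sample carries. The helper names, and the assumption that each sample keeps two local alleles plus an LA-style list of their global indices, are illustrative choices, not the SVCR-VCF or VDS implementation.

```python
from math import comb


def n_diploid_genotypes(n_alleles: int) -> int:
    """Unordered diploid genotypes at a site with n_alleles alleles (ref + alts)."""
    # Pick 2 alleles with replacement: C(n + 1, 2) = n(n + 1) / 2.
    return comb(n_alleles + 1, 2)


def dense_entries_per_sample(n_alleles: int) -> int:
    """Dense VCF: AD has one value per allele, PL one value per genotype."""
    return n_alleles + n_diploid_genotypes(n_alleles)


def local_entries_per_sample(n_local: int = 2) -> int:
    """Hypothetical local-allele sizing: an LA-style list of the sample's
    alleles plus AD- and PL-like arrays indexed over those alleles only."""
    return n_local + n_local + n_diploid_genotypes(n_local)


def to_local_genotype(gt: tuple[int, int], local_alleles: list[int]) -> tuple[int, int]:
    """Rewrite a genotype in global allele indices as indices into local_alleles."""
    return tuple(local_alleles.index(a) for a in gt)


if __name__ == "__main__":
    for n in (2, 10, 50):
        print(n, dense_entries_per_sample(n), local_entries_per_sample())
    # Global genotype 0/37 at a 50-allele site, stored with local alleles [0, 37].
    print(to_local_genotype((0, 37), [0, 37]))
```

At a site with 50 alleles, each sample carries 50 + 1,275 = 1,325 dense entries versus 7 local ones. Because the number of alleles seen at a site tends to grow as samples are added, the dense per-sample cost grows with cohort size, whereas the local cost stays fixed.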

3.
Nature ; 625(7993): 92-100, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38057664

ABSTRACT

The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders[1-4], but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.


Subjects
Human Genome , Genomics , Genetic Models , Mutation , Humans , Access to Information , Genetic Databases , Datasets as Topic , Gene Frequency , Human Genome/genetics , Mutation/genetics , Genetic Selection
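The constraint map rests on comparing observed variation with what a mutational model expects. A minimal sketch, assuming each genomic window already has an expected variant count from such a model: a Poisson-style Z score, signed so that positive values mean depletion (fewer variants observed than expected). This is a simplified stand-in, not the Gnocchi model itself; the function names, window names and counts are hypothetical.

```python
from math import sqrt


def constraint_z(observed: int, expected: float) -> float:
    """Depletion Z score: positive when fewer variants are observed than
    expected, using sqrt(expected) as the Poisson standard deviation."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (expected - observed) / sqrt(expected)


def rank_windows(windows: list[tuple[str, int, float]]) -> list[str]:
    """windows: (name, observed, expected) triples.
    Returns window names ordered from most to least constrained."""
    scored = [(constraint_z(obs, exp), name) for name, obs, exp in windows]
    return [name for _, name in sorted(scored, reverse=True)]


if __name__ == "__main__":
    windows = [("win_a", 100, 100.0), ("win_b", 60, 100.0), ("win_c", 120, 100.0)]
    for name, obs, exp in windows:
        print(name, constraint_z(obs, exp))
    print(rank_windows(windows))
```

Here win_b (60 observed against 100 expected) scores 4.0 and ranks first, win_a scores 0.0, and win_c, with an excess of variation, scores -2.0. The paper's refined model enters only through the expected counts, which this sketch takes as given.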
4.
Science ; 379(6639): 1341-1348, 2023 03 31.
Article in English | MEDLINE | ID: mdl-36996212

ABSTRACT

Classical statistical genetics theory defines dominance as any deviation from a purely additive, or dosage, effect of a genotype on a trait, which is known as the dominance deviation. Dominance is well documented in plant and animal breeding. Outside of rare monogenic traits, however, evidence in humans is limited. We systematically examined common genetic variation across 1060 traits in a large population cohort (UK Biobank, N = 361,194 samples analyzed) for evidence of dominance effects. We then developed a computationally efficient method to rapidly assess the aggregate contribution of dominance deviations to heritability. Lastly, observing that dominance associations are inherently less correlated between sites at a genomic locus than their additive counterparts, we explored whether they may be leveraged to identify causal variants more confidently.


Subjects
Biological Specimen Banks , Dominant Genes , Genetic Variation , Multifactorial Inheritance , Animals , Humans , Breeding , Genotype , Genetic Models , Phenotype , Single Nucleotide Polymorphism , United Kingdom
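The abstract defines dominance as any deviation from a purely additive (dosage) effect. The classical single-locus version of that definition can be made concrete in a few lines: regress genotypic value on allele dosage, weighting genotypes by their Hardy-Weinberg frequencies, and call what the line leaves unexplained the dominance deviation. This is a textbook sketch with assumed genotypic values -a, d and +a, not the paper's computationally efficient heritability method.

```python
def dominance_deviations(p: float, a: float, d: float) -> tuple[float, list[float]]:
    """Single-locus decomposition under Hardy-Weinberg equilibrium.

    x = copies of allele A1 (frequency p); genotypic values are
    x=0 -> -a, x=1 -> d, x=2 -> +a.
    Returns (alpha, [delta_0, delta_1, delta_2]): alpha is the additive
    (dosage) effect from the frequency-weighted regression of genotypic
    value on x, and delta_x is the residual left by that linear fit.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    q = 1.0 - p
    xs = [0, 1, 2]
    freqs = [q * q, 2 * p * q, p * p]  # HWE genotype frequencies
    values = [-a, d, a]                # genotypic values
    mean_x = sum(f * x for f, x in zip(freqs, xs))
    mean_g = sum(f * g for f, g in zip(freqs, values))
    cov = sum(f * (x - mean_x) * (g - mean_g) for f, x, g in zip(freqs, xs, values))
    var = sum(f * (x - mean_x) ** 2 for f, x in zip(freqs, xs))
    alpha = cov / var                  # additive effect of one extra A1 copy
    fitted = [mean_g + alpha * (x - mean_x) for x in xs]
    deltas = [g - fit for g, fit in zip(values, fitted)]
    return alpha, deltas


if __name__ == "__main__":
    alpha, deltas = dominance_deviations(p=0.3, a=1.0, d=0.5)
    print(alpha, deltas)
```

The results match the textbook closed forms alpha = a + d(q - p) and deltas of -2p²d, 2pqd and -2q²d for 0, 1 and 2 copies, so with p = 0.3, a = 1 and d = 0.5 the additive effect is 1.2. When d = 0 every deviation is zero, which is the purely additive case.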
5.
Cell Genom ; 2(9): 100168, 2022 Sep 14.
Article in English | MEDLINE | ID: mdl-36778668

ABSTRACT

Genome-wide association studies have successfully discovered thousands of common variants associated with human diseases and traits, but the landscape of rare variations in human disease has not been explored at scale. Exome-sequencing studies of population biobanks provide an opportunity to systematically evaluate the impact of rare coding variations across a wide range of phenotypes to discover genes and allelic series relevant to human health and disease. Here, we present results from systematic association analyses of 4,529 phenotypes using single-variant and gene tests of 394,841 individuals in the UK Biobank with exome-sequence data. We find that the discovery of genetic associations is tightly linked to frequency and is correlated with metrics of deleteriousness and natural selection. We highlight biological findings elucidated by these data and release the dataset as a public resource alongside the Genebass browser for rapidly exploring rare-variant association results.
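The gene tests mentioned here aggregate rare variants within a gene. A minimal sketch of the simplest such test, a burden test: collapse each individual's qualifying rare alleles in a gene into one count, then estimate the phenotype's least-squares slope on that count. The 1% MAF cutoff, the absence of annotation filters and covariates, and the function names are assumptions for illustration. Production biobank analyses also adjust for covariates and relatedness and compute p-values, which this sketch does not.

```python
def burden_scores(genotypes: list[list[int]], maf: list[float], max_maf: float = 0.01) -> list[int]:
    """genotypes[i][j]: alt-allele count (0/1/2) of variant j in individual i.
    Returns each individual's count of alleles at variants with MAF < max_maf."""
    keep = [j for j, f in enumerate(maf) if f < max_maf]
    return [sum(row[j] for j in keep) for row in genotypes]


def burden_effect(burden: list[int], phenotype: list[float]) -> float:
    """Ordinary least-squares slope of the phenotype on the burden score."""
    n = len(burden)
    mean_b = sum(burden) / n
    mean_p = sum(phenotype) / n
    sxx = sum((b - mean_b) ** 2 for b in burden)
    if sxx == 0:
        raise ValueError("burden score does not vary; slope is undefined")
    sxy = sum((b - mean_b) * (y - mean_p) for b, y in zip(burden, phenotype))
    return sxy / sxx


if __name__ == "__main__":
    genotypes = [[0, 1, 0], [1, 0, 2], [0, 0, 0], [0, 0, 1]]
    maf = [0.001, 0.2, 0.005]  # the middle variant is common and is dropped
    burden = burden_scores(genotypes, maf)
    print(burden)
    print(burden_effect(burden, [1.0, 4.0, 1.0, 2.0]))
```

In this toy input the common middle variant is excluded, giving burdens of 0, 3, 0 and 1, and the phenotype rises by exactly one unit per rare allele. Collapsing trades allele-level resolution for power, which is why gene tests can detect signals that single-variant tests of very rare alleles miss.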

9.
Nature ; 581(7809): 452-458, 2020 05.
Article in English | MEDLINE | ID: mdl-32461655

ABSTRACT

The acceleration of DNA sequencing in samples from patients and population studies has resulted in extensive catalogues of human genetic variation, but the interpretation of rare genetic variants remains problematic. A notable example of this challenge is the existence of disruptive variants in dosage-sensitive disease genes, even in apparently healthy individuals. Here, by manual curation of putative loss-of-function (pLoF) variants in haploinsufficient disease genes in the Genome Aggregation Database (gnomAD)[1], we show that one explanation for this paradox involves alternative splicing of mRNA, which allows exons of a gene to be expressed at varying levels across different cell types. Currently, no existing annotation tool systematically incorporates information about exon expression into the interpretation of variants. We develop a transcript-level annotation metric known as the 'proportion expressed across transcripts', which quantifies isoform expression for variants. We calculate this metric using 11,706 tissue samples from the Genotype Tissue Expression (GTEx) project[2] and show that it can differentiate between weakly and highly evolutionarily conserved exons, a proxy for functional importance. We demonstrate that expression-based annotation selectively filters 22.8% of falsely annotated pLoF variants found in haploinsufficient disease genes in gnomAD, while removing less than 4% of high-confidence pathogenic variants in the same genes. Finally, we apply our expression filter to the analysis of de novo variants in patients with autism spectrum disorder and intellectual disability or developmental disorders to show that pLoF variants in weakly expressed regions have similar effect sizes to those of synonymous variants, whereas pLoF variants in highly expressed exons are most strongly enriched among cases.
Our annotation is fast, flexible and generalizable, making it possible for any variant file to be annotated with any isoform expression dataset, and will be valuable for the genetic diagnosis of rare diseases, the analysis of rare variant burden in complex disorders, and the curation and prioritization of variants in recall-by-genotype studies.


Subjects
Disease/genetics , Haploinsufficiency/genetics , Loss of Function Mutation/genetics , Molecular Sequence Annotation , Genetic Transcription , Transcriptome/genetics , Autism Spectrum Disorder/genetics , Datasets as Topic , Developmental Disabilities/genetics , Exons/genetics , Female , Genotype , Humans , Intellectual Disability/genetics , Male , Molecular Sequence Annotation/standards , Poisson Distribution , Messenger RNA/analysis , Messenger RNA/genetics , Rare Diseases/diagnosis , Rare Diseases/genetics , Reproducibility of Results , Exome Sequencing
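The 'proportion expressed across transcripts' can be read as a ratio: within one tissue, the expression of the transcripts on which a variant receives a given annotation, divided by the total expression of the gene. A minimal sketch under that reading; the transcript IDs, tissue names and TPM values are hypothetical, and averaging over tissues while skipping tissues where the gene is not expressed is an assumption rather than the paper's exact rule.

```python
import math


def pext(transcript_tpm: dict[str, float], annotated: set[str]) -> float:
    """Share of a gene's expression (in one tissue) coming from transcripts on
    which the variant carries the annotation of interest; NaN if the gene is silent."""
    total = sum(transcript_tpm.values())
    if total == 0:
        return math.nan
    hit = sum(tpm for tx, tpm in transcript_tpm.items() if tx in annotated)
    return hit / total


def mean_pext(per_tissue_tpm: dict[str, dict[str, float]], annotated: set[str]) -> float:
    """Average pext over tissues, skipping tissues where the gene is not expressed."""
    values = [pext(tpm, annotated) for tpm in per_tissue_tpm.values()]
    values = [v for v in values if not math.isnan(v)]
    return sum(values) / len(values) if values else math.nan


if __name__ == "__main__":
    tissues = {
        "tissue_A": {"TX1": 30.0, "TX2": 60.0, "TX3": 10.0},
        "tissue_B": {"TX1": 5.0, "TX2": 15.0, "TX3": 0.0},
        "tissue_C": {"TX1": 0.0, "TX2": 0.0, "TX3": 0.0},
    }
    # Suppose the pLoF annotation applies only on TX1 and TX3.
    print(pext(tissues["tissue_A"], {"TX1", "TX3"}))
    print(mean_pext(tissues, {"TX1", "TX3"}))
```

The variant scores 0.4 in tissue_A and 0.25 in tissue_B, averaging 0.325. Most of the gene's expression comes from TX2, where the pLoF annotation does not apply, which is the kind of weakly expressed region the abstract suggests filtering.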
10.
Nature ; 581(7809): 434-443, 2020 05.
Article in English | MEDLINE | ID: mdl-32461654

ABSTRACT

Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes[1]. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.


Subjects
Exome/genetics , Essential Genes/genetics , Genetic Variation/genetics , Human Genome/genetics , Adult , Brain/metabolism , Cardiovascular Diseases/genetics , Cohort Studies , Genetic Databases , Female , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study , Humans , Loss of Function Mutation/genetics , Male , Mutation Rate , Proprotein Convertase 9/genetics , Messenger RNA/genetics , Reproducibility of Results , Exome Sequencing , Whole Genome Sequencing
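Classifying genes by tolerance to inactivation can be illustrated with the observed/expected (o/e) ratio of predicted loss-of-function variants. A minimal sketch, assuming the observed count follows a Poisson distribution with mean expected × r for a ratio r on a fixed grid (a flat prior over the grid), and reporting the 95th percentile of the normalized likelihood as a conservative upper bound (the upper end of a two-sided 90% interval). This is in the spirit of gnomAD's LOEUF metric, but the grid limits, step and function names are our assumptions, not its published parameters.

```python
from math import exp, lgamma, log


def poisson_log_pmf(k: int, lam: float) -> float:
    """log P(K = k) for K ~ Poisson(lam), lam > 0."""
    return k * log(lam) - lam - lgamma(k + 1)


def oe_with_upper_bound(observed: int, expected: float, upper_q: float = 0.95,
                        step: float = 0.001, max_ratio: float = 2.0) -> tuple[float, float]:
    """Point estimate of observed/expected plus an upper bound read off a
    grid-based Poisson likelihood over r, with observed ~ Poisson(expected * r)."""
    n_steps = int(round(max_ratio / step))
    grid = [step * i for i in range(1, n_steps + 1)]  # start above 0 to keep lam > 0
    logs = [poisson_log_pmf(observed, expected * r) for r in grid]
    peak = max(logs)
    weights = [exp(lp - peak) for lp in logs]  # rescale to avoid underflow
    total = sum(weights)
    cumulative = 0.0
    upper = grid[-1]
    for r, w in zip(grid, weights):
        cumulative += w
        if cumulative / total >= upper_q:
            upper = r
            break
    return observed / expected, upper


if __name__ == "__main__":
    print(oe_with_upper_bound(0, 10.0))     # strongly depleted, small gene
    print(oe_with_upper_bound(100, 100.0))  # no depletion, large gene
```

A gene with 0 observed against 10 expected has o/e = 0 but an upper bound near 0.30, because a small expected count leaves the ratio uncertain. Ranking genes by the upper bound rather than the point estimate avoids calling small genes intolerant on too little evidence.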
12.
PLoS One ; 14(11): e0225206, 2019.
Article in English | MEDLINE | ID: mdl-31725765

ABSTRACT

The engineered AAV-PHP.B family of adeno-associated virus efficiently delivers genes throughout the mouse central nervous system. To guide their application across disease models, and to inspire the development of translational gene therapy vectors for targeting neurological diseases in humans, we sought to elucidate the host factors responsible for the CNS tropism of the AAV-PHP.B vectors. Leveraging CNS tropism differences across 13 mouse strains, we systematically determined a set of genetic variants that segregate with the permissivity phenotype, and rapidly identified LY6A as an essential receptor for the AAV-PHP.B vectors. Interfering with LY6A by CRISPR/Cas9-mediated Ly6a disruption or with blocking antibodies reduced transduction of mouse brain endothelial cells by AAV-PHP.eB, while ectopic expression of Ly6a increased AAV-PHP.eB transduction of HEK293T and CHO cells by 30-fold or more. Importantly, we demonstrate that this newly discovered mode of AAV binding and transduction can occur independently of other known AAV receptors. These findings illuminate the previously reported species- and strain-specific tropism characteristics of the AAV-PHP.B vectors and inform ongoing efforts to develop next-generation AAV vehicles for human CNS gene therapy.


Subjects
Blood-Brain Barrier/metabolism , Gene Transfer Techniques , Genetic Transduction , Transgenes , Animals , Ly Antigens/chemistry , Ly Antigens/genetics , Brain/metabolism , Cell Line , Dependovirus/genetics , Genetic Variation , Genetic Vectors/administration & dosage , Genetic Vectors/genetics , Humans , Membrane Proteins/chemistry , Membrane Proteins/genetics , Mice , Neurons/metabolism , Tropism
13.
Nat Commun ; 9(1): 3391, 2018 08 23.
Article in English | MEDLINE | ID: mdl-30140000

ABSTRACT

Large-scale deep-coverage whole-genome sequencing (WGS) is now feasible and offers potential advantages for locus discovery. We perform WGS in 16,324 participants from four ancestries at mean depth >29X and test genotypes for association with four quantitative traits: plasma total cholesterol, low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol, and triglycerides. Common variant association yields known loci, except for a few variants that were previously poorly imputed. Rare coding variant association yields known Mendelian dyslipidemia genes, but rare non-coding variant association detects no signals. A high LDL-C polygenic score built from 2 million SNPs (top 5th percentile) confers an effect size similar to that of a monogenic mutation (~30 mg/dl higher for each); however, among those with severe hypercholesterolemia, 23% have a high polygenic score and only 2% carry a monogenic mutation. At these sample sizes and for these phenotypes, the incremental value of WGS for discovery is limited, but WGS permits simultaneous assessment of monogenic and polygenic models of severe hypercholesterolemia.


Subjects
Human Genome , High-Throughput Nucleotide Sequencing/methods , Lipids/blood , Base Sequence , LDL Cholesterol/genetics , Gene Frequency/genetics , Genome-Wide Association Study , Humans , Genetic Models , Mutation/genetics
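The polygenic score in this abstract is, in the usual construction, a weighted sum of allele dosages across about 2 million SNPs, with individuals in the top 5th percentile labelled as having a high score. A minimal sketch of that arithmetic with toy weights and dosages; the weights are made up, the function names are hypothetical, and rank-based selection with arbitrary tie-breaking is an assumption.

```python
def polygenic_scores(dosages: list[list[float]], weights: list[float]) -> list[float]:
    """score_i = sum_j weights[j] * dosages[i][j], where a dosage is the
    number (0..2) of effect-allele copies individual i carries at SNP j."""
    return [sum(w * g for w, g in zip(weights, row)) for row in dosages]


def top_fraction(scores: list[float], fraction: float = 0.05) -> list[int]:
    """Indices (sorted) of the highest-scoring `fraction` of individuals,
    keeping at least one; ties are broken arbitrarily by sort order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_top = max(1, round(len(scores) * fraction))
    return sorted(ranked[:n_top])


if __name__ == "__main__":
    weights = [0.5, -0.2]  # hypothetical per-SNP effect sizes
    dosages = [[0, 0], [1, 0], [2, 0], [2, 2]]
    scores = polygenic_scores(dosages, weights)
    print(scores)
    print(top_fraction(scores, fraction=0.25))
```

With these toy inputs the scores are 0.0, 0.5, 1.0 and about 0.6, and the top quarter is individual 2. Because the score is additive over many variants, a high score can arise from many small effects, which is how it can match a single monogenic mutation in effect size.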
15.
Nat Commun ; 9(1): 2606, 2018 07 04.
Article in English | MEDLINE | ID: mdl-29973585

ABSTRACT

Lipoprotein(a), Lp(a), is a modified low-density lipoprotein particle that contains apolipoprotein(a), encoded by LPA; it is a highly heritable, causal risk factor for cardiovascular diseases, and its concentration varies across ancestries. Here, we use deep-coverage whole genome sequencing in 8392 individuals of European and African ancestry to discover and interpret both single-nucleotide variants and copy number (CN) variation associated with Lp(a). We observe several genetic determinants of Lp(a) that are unique to either Europeans or Africans. The common variant rs12740374 associated with Lp(a) cholesterol is an eQTL for SORT1 and is independent of LDL cholesterol. Observed associations of aggregates of rare non-coding variants are largely explained by LPA structural variation, namely the copy number of LPA kringle IV type 2 (KIV2-CN). Finally, we find that LPA risk genotypes confer greater relative risk for incident atherosclerotic cardiovascular diseases than directly measured Lp(a), and are significantly associated with measures of subclinical atherosclerosis in African Americans.


Subjects
Cardiovascular Diseases/genetics , DNA Copy Number Variations , Human Genome , Lipoprotein(a)/genetics , Single Nucleotide Polymorphism , Vesicular Transport Adaptor Proteins/blood , Vesicular Transport Adaptor Proteins/genetics , Black People , Cardiovascular Diseases/blood , Cardiovascular Diseases/diagnosis , Cardiovascular Diseases/ethnology , LDL Cholesterol/blood , Gene Expression , Genome-Wide Association Study , Humans , Lipoprotein(a)/blood , Quantitative Trait Loci , Risk Factors , White People , Whole Genome Sequencing
16.
Nat Neurosci ; 19(12): 1563-1565, 2016 12.
Article in English | MEDLINE | ID: mdl-27694993

ABSTRACT

Disruptive, damaging ultra-rare variants in highly constrained genes are enriched in individuals with neurodevelopmental disorders. In the general population, this class of variants was associated with a decrease in years of education (YOE). This effect was stronger among highly brain-expressed genes and explained more YOE variance than pathogenic copy number variation but less than common variants. Disruptive, damaging ultra-rare variants in highly constrained genes influence the determinants of YOE in the general population.


Subjects
DNA Copy Number Variations/genetics , Genetic Predisposition to Disease , Mutation/genetics , Neurodevelopmental Disorders/genetics , Education , Humans , Task Performance and Analysis