Results 1 - 20 of 48
1.
Cell ; 185(18): 3426-3440.e19, 2022 09 01.
Article in English | MEDLINE | ID: mdl-36055201

ABSTRACT

The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs as well as INDELs and SVs spanning the frequency spectrum. We also generated an improved reference imputation panel, making variants discovered here accessible for association studies.


Subject(s)
Genome, Human , Whole Genome Sequencing , Female , High-Throughput Nucleotide Sequencing/methods , Humans , INDEL Mutation , Male , Polymorphism, Single Nucleotide
2.
Am J Hum Genet ; 109(10): 1727-1741, 2022 10 06.
Article in English | MEDLINE | ID: mdl-36055244

ABSTRACT

Transcriptomics data have been integrated with genome-wide association studies (GWASs) to help understand disease/trait molecular mechanisms. The utility of metabolomics, integrated with transcriptomics and disease GWASs, to understand molecular mechanisms for metabolite levels or diseases has not been thoroughly evaluated. We performed probabilistic transcriptome-wide association and locus-level colocalization analyses to integrate transcriptomics results for 49 tissues in 706 individuals from the GTEx project, metabolomics results for 1,391 plasma metabolites in 6,136 Finnish men from the METSIM study, and GWAS results for 2,861 disease traits in 260,405 Finnish individuals from the FinnGen study. We found that genetic variants that regulate metabolite levels were more likely to influence gene expression and disease risk compared to the ones that do not. Integrating transcriptomics with metabolomics results prioritized 397 genes for 521 metabolites, including 496 previously identified gene-metabolite pairs with strong functional connections and suggested 33.3% of such gene-metabolite pairs shared the same causal variants with genetic associations of gene expression. Integrating transcriptomics and metabolomics individually with FinnGen GWAS results identified 1,597 genes for 790 disease traits. Integrating transcriptomics and metabolomics jointly with FinnGen GWAS results helped pinpoint metabolic pathways from genes to diseases. We identified putative causal effects of UGT1A1/UGT1A4 expression on gallbladder disorders through regulating plasma (E,E)-bilirubin levels, of SLC22A5 expression on nasal polyps and plasma carnitine levels through distinct pathways, and of LIPC expression on age-related macular degeneration through glycerophospholipid metabolic pathways. Our study highlights the power of integrating multiple sets of molecular traits and GWAS results to deepen understanding of disease pathophysiology.


Subject(s)
Genome-Wide Association Study , Transcriptome , Bilirubin , Carnitine , Glycerophospholipids , Humans , Male , Metabolomics , Quantitative Trait Loci/genetics , Solute Carrier Family 22 Member 5/genetics , Transcriptome/genetics
3.
Nature ; 604(7906): 437-446, 2022 04.
Article in English | MEDLINE | ID: mdl-35444317

ABSTRACT

The human reference genome is the most widely used resource in human genetics and is due for a major update. Its current structure is a linear composite of merged haplotypes from more than 20 people, with a single individual comprising most of the sequence. It contains biases and errors within a framework that does not represent global human genomic variation. A high-quality reference with global representation of common variants, including single-nucleotide variants, structural variants and functional elements, is needed. The Human Pangenome Reference Consortium aims to create a more sophisticated and complete human reference genome with a graph-based, telomere-to-telomere representation of global genomic diversity. Here we leverage innovations in technology, study design and global partnerships with the goal of constructing the highest-possible quality human pangenome reference. Our goal is to improve data representation and streamline analyses to enable routine assembly of complete diploid genomes. With attention to ethical frameworks, the human pangenome reference will contain a more accurate and diverse representation of global genomic variation, improve gene-disease association studies across populations, expand the scope of genomics research to the most repetitive and polymorphic regions of the genome, and serve as the ultimate genetic resource for future biomedical research and precision medicine.


Subject(s)
Genome, Human , Genomics , Genome, Human/genetics , Haplotypes/genetics , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA
4.
Nat Commun ; 13(1): 1644, 2022 03 28.
Article in English | MEDLINE | ID: mdl-35347128

ABSTRACT

Few studies have explored the impact of rare variants (minor allele frequency < 1%) on highly heritable plasma metabolites identified in metabolomic screens. The Finnish population provides an ideal opportunity for such explorations, given the multiple bottlenecks and expansions that have shaped its history, and the enrichment for many otherwise rare alleles that has resulted. Here, we report genetic associations for 1391 plasma metabolites in 6136 men from the late-settlement region of Finland. We identify 303 novel association signals, more than one third at variants rare or enriched in Finns. Many of these signals identify genes not previously implicated in metabolite genome-wide association studies and suggest mechanisms for diseases and disease-related traits.
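
As a rough illustration of the rare-variant threshold mentioned in this abstract (minor allele frequency < 1%), the sketch below computes a minor allele frequency from diploid genotype counts at a single biallelic site. The function names and the example counts are hypothetical and are not taken from the study.

```python
def minor_allele_frequency(hom_ref: int, het: int, hom_alt: int) -> float:
    """Return the minor allele frequency at a biallelic site.

    Arguments are counts of individuals with 0, 1 and 2 copies of the
    alternate allele (illustrative inputs, not study data).
    """
    n_individuals = hom_ref + het + hom_alt
    if n_individuals == 0:
        raise ValueError("at least one genotyped individual is required")
    # Each diploid individual contributes two alleles.
    alt_freq = (het + 2 * hom_alt) / (2 * n_individuals)
    # The minor allele is whichever of the two alleles is less frequent.
    return min(alt_freq, 1.0 - alt_freq)


def is_rare(maf: float, threshold: float = 0.01) -> bool:
    """Apply the 'MAF < 1%' rule; the default threshold mirrors the abstract."""
    return maf < threshold


# Hypothetical example: 100 heterozygous carriers among 10,000 individuals.
example_maf = minor_allele_frequency(hom_ref=9900, het=100, hom_alt=0)
print(example_maf, is_rare(example_maf))
```

Note that a strict `<` comparison also labels monomorphic sites (frequency 0) as rare; a real pipeline would usually filter those out first.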


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Alleles , Finland , Gene Frequency , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Humans , Male , Phenotype
5.
Science ; 376(6588): 44-53, 2022 04.
Article in English | MEDLINE | ID: mdl-35357919

ABSTRACT

Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.


Subject(s)
Genome, Human , Human Genome Project , Sequence Analysis, DNA/standards , Cell Line , Chromosomes, Artificial, Bacterial/genetics , Chromosomes, Human/genetics , Humans , Reference Values
6.
Cell Genom ; 2(1), 2022 Jan 12.
Article in English | MEDLINE | ID: mdl-35199087

ABSTRACT

The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL; https://anvilproject.org) was developed to address a widespread community need for a unified computing environment for genomics data storage, management, and analysis. In this perspective, we present AnVIL, describe its ecosystem and interoperability with other platforms, and highlight how this platform and associated initiatives contribute to improved genomic data sharing efforts. The AnVIL is a federated cloud platform designed to manage and store genomics and related data, enable population-scale analysis, and facilitate collaboration through the sharing of data, code, and analysis results. By inverting the traditional model of data sharing, the AnVIL eliminates the need for data movement while also adding security measures for active threat detection and monitoring and provides scalable, shared computing resources for any researcher. We describe the core data management and analysis components of the AnVIL, which currently consists of Terra, Gen3, Galaxy, RStudio/Bioconductor, Dockstore, and Jupyter, and describe several flagship genomics datasets available within the AnVIL. We continue to extend and innovate the AnVIL ecosystem by implementing new capabilities, including mechanisms for interoperability and responsible data sharing, while streamlining access management. The AnVIL opens many new opportunities for analysis, collaboration, and data sharing that are needed to drive research and to make discoveries through the joint analysis of hundreds of thousands to millions of genomes along with associated clinical and molecular data types.

7.
Genome Res ; 31(12): 2249-2257, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34544830

ABSTRACT

Structural variants (SVs) are an important source of human genome diversity, but their functional effects are poorly understood. We mapped 61,668 SVs in 613 individuals from the GTEx project and measured their effects on gene expression. We estimate that common SVs are causal at 2.66% of eQTLs, a 10.5-fold enrichment relative to their abundance in the genome. Duplications and deletions were the most impactful variant types, whereas the contribution of mobile element insertions was small (0.12% of eQTLs, 1.9-fold enriched). Multitissue analysis of eQTLs revealed that gene-altering SVs show more constitutive effects than other variant types, with 62.09% of coding SV-eQTLs active in all tissues with eQTL activity compared with 23.08% of coding SNV- and indel-eQTLs. Noncoding SVs, SNVs and indels show broadly similar patterns. We also identified 539 rare SVs associated with nearby gene expression outliers. Of these, 62.34% are noncoding SVs that affect gene expression but have modest enrichment at regulatory elements, showing that rare noncoding SVs are a major source of gene expression differences but remain difficult to predict from current annotations. Both common and rare SVs often affect the expression of multiple genes: SV-eQTLs affect an average of 1.82 nearby genes, whereas SNV- and indel-eQTLs affect an average of 1.09 genes, and 21.34% of rare expression-altering SVs show effects on two to nine different genes. We also observe significant effects on rare gene expression changes extending 1 Mb from the SV. This provides a mechanism by which individual SVs may have strong or pleiotropic effects on phenotypic variation.
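
The fold-enrichment figures in this abstract can be read as a ratio of two proportions: the share of eQTLs attributed to a variant class divided by that class's baseline share of variants. The sketch below is a back-of-the-envelope calculation under that assumption; the baseline values it prints are derived from the abstract's numbers and are not reported in the abstract itself, and the function names are hypothetical.

```python
def fold_enrichment(observed_share: float, baseline_share: float) -> float:
    """Ratio of the observed share to the baseline share (same units)."""
    if baseline_share <= 0:
        raise ValueError("baseline share must be positive")
    return observed_share / baseline_share


def implied_baseline(observed_share: float, fold: float) -> float:
    """Back-solve the baseline share implied by a reported fold enrichment."""
    if fold <= 0:
        raise ValueError("fold enrichment must be positive")
    return observed_share / fold


# Figures quoted in the abstract, in percent of eQTLs and fold enrichment.
common_sv_baseline = implied_baseline(observed_share=2.66, fold=10.5)
mei_baseline = implied_baseline(observed_share=0.12, fold=1.9)

# Derived, approximate baselines (assumption: enrichment is a simple ratio).
print(f"implied SV baseline:  {common_sv_baseline:.3f}%")
print(f"implied MEI baseline: {mei_baseline:.3f}%")
```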

8.
Hum Genomics ; 15(1): 34, 2021 06 07.
Article in English | MEDLINE | ID: mdl-34099068

ABSTRACT

BACKGROUND: Mitochondrial genome copy number (MT-CN) varies among humans and across tissues and is highly heritable, but its causes and consequences are not well understood. When measured by bulk DNA sequencing in blood, MT-CN may reflect a combination of the number of mitochondria per cell and cell-type composition. Here, we studied MT-CN variation in blood-derived DNA from 19184 Finnish individuals using a combination of genome (N = 4163) and exome sequencing (N = 19034) data as well as imputed genotypes (N = 17718). RESULTS: We identified two loci significantly associated with MT-CN variation: a common variant at the MYB-HBS1L locus (P = 1.6 × 10⁻⁸), which has previously been associated with numerous hematological parameters; and a burden of rare variants in the TMBIM1 gene (P = 3.0 × 10⁻⁸), which has been reported to protect against non-alcoholic fatty liver disease. We also found that MT-CN is strongly associated with insulin levels (P = 2.0 × 10⁻²¹) and other metabolic syndrome (metS)-related traits. Using a Mendelian randomization framework, we show evidence that MT-CN measured in blood is causally related to insulin levels. We then applied an MT-CN polygenic risk score (PRS) derived from Finnish data to the UK Biobank, where the association between the PRS and metS traits was replicated. Adjusting for cell counts largely eliminated these signals, suggesting that MT-CN affects metS via cell-type composition. CONCLUSION: These results suggest that measurements of MT-CN in blood-derived DNA partially reflect differences in cell-type composition and that these differences are causally linked to insulin and related traits.


Subject(s)
Apoptosis Regulatory Proteins/genetics , DNA Copy Number Variations/genetics , DNA, Mitochondrial/blood , GTP-Binding Proteins/genetics , Membrane Proteins/genetics , Proto-Oncogene Proteins c-myb/genetics , Adult , Aged , Cell Lineage/genetics , DNA, Mitochondrial/genetics , Female , Genetic Predisposition to Disease , Genome, Mitochondrial/genetics , Genome-Wide Association Study , Humans , Male , Mendelian Randomization Analysis , Middle Aged , Phenotype , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA , Exome Sequencing
9.
Cell ; 184(10): 2633-2648.e19, 2021 05 13.
Article in English | MEDLINE | ID: mdl-33864768

ABSTRACT

Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, among the thousands of lncRNA genes, it is still a major challenge to identify the subset with disease or trait relevance. To systematically characterize these lncRNA genes, we used Genotype Tissue Expression (GTEx) project v8 genetic and multi-tissue transcriptomic data to profile the expression, genetic regulation, cellular contexts, and trait associations of 14,100 lncRNA genes across 49 tissues for 101 distinct complex genetic traits. Using these approaches, we identified 1,432 lncRNA gene-trait associations, 800 of which were not explained by stronger effects of neighboring protein-coding genes. This included associations between lncRNA quantitative trait loci and inflammatory bowel disease, type 1 and type 2 diabetes, and coronary artery disease, as well as rare variant associations to body mass index.


Subject(s)
Disease/genetics , Multifactorial Inheritance/genetics , Population/genetics , RNA, Long Noncoding/genetics , Transcriptome , Coronary Artery Disease/genetics , Diabetes Mellitus, Type 1/genetics , Diabetes Mellitus, Type 2/genetics , Gene Expression Profiling , Genetic Variation , Humans , Inflammatory Bowel Diseases/genetics , Organ Specificity/genetics , Quantitative Trait Loci
10.
Am J Hum Genet ; 108(4): 583-596, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33798444

ABSTRACT

The contribution of genome structural variation (SV) to quantitative traits associated with cardiometabolic diseases remains largely unknown. Here, we present the results of a study examining genetic association between SVs and cardiometabolic traits in the Finnish population. We used sensitive methods to identify and genotype 129,166 high-confidence SVs from deep whole-genome sequencing (WGS) data of 4,848 individuals. We tested the 64,572 common and low-frequency SVs for association with 116 quantitative traits and tested candidate associations using exome sequencing and array genotype data from an additional 15,205 individuals. We discovered 31 genome-wide significant associations at 15 loci, including 2 loci at which SVs have strong phenotypic effects: (1) a deletion of the ALB promoter that is greatly enriched in the Finnish population and causes decreased serum albumin level in carriers (p = 1.47 × 10⁻⁵⁴) and is also associated with increased levels of total cholesterol (p = 1.22 × 10⁻²⁸) and 14 additional cholesterol-related traits, and (2) a multi-allelic copy number variant (CNV) at PDPR that is strongly associated with pyruvate (p = 4.81 × 10⁻²¹) and alanine (p = 6.14 × 10⁻¹²) levels and resides within a structurally complex genomic region that has accumulated many rearrangements over evolutionary time. We also confirmed six previously reported associations, including five led by stronger signals in single nucleotide variants (SNVs) and one linking recurrent HP gene deletion and cholesterol levels (p = 6.24 × 10⁻¹⁰), which was also found to be strongly associated with increased glycoprotein level (p = 3.53 × 10⁻³⁵). Our study confirms that integrating SVs in trait-mapping studies will expand our knowledge of genetic factors underlying disease risk.


Subject(s)
Cardiovascular Diseases/genetics , Genomic Structural Variation/genetics , Alleles , Cholesterol/blood , DNA Copy Number Variations/genetics , Female , Finland , Genome, Human/genetics , Genotype , High-Throughput Nucleotide Sequencing , Humans , Male , Mitochondrial Proteins/genetics , Promoter Regions, Genetic/genetics , Pyruvate Dehydrogenase (Lipoamide)-Phosphatase/genetics , Pyruvic Acid/metabolism , Serum Albumin, Human/genetics
11.
Science ; 372(6537), 2021 04 02.
Article in English | MEDLINE | ID: mdl-33632895

ABSTRACT

Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
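
The contiguity metric described in this abstract ("minimum contig length needed to cover 50% of the genome") is commonly called N50, or NG50 when measured against a genome size rather than the assembly size. The following is a minimal sketch of that calculation; the function name and the toy contig lengths are illustrative assumptions and do not reproduce the authors' pipeline.

```python
from typing import Iterable, Optional


def n50(contig_lengths: Iterable[int],
        genome_size: Optional[int] = None) -> Optional[int]:
    """Length of the contig at which the cumulative sum of contigs,
    taken from longest to shortest, first reaches half of the target size.

    With genome_size=None the target is the total assembly length (N50);
    with an explicit genome_size the result is the NG50 variant.
    Returns None if the contigs never cover half of the target.
    """
    lengths = sorted(contig_lengths, reverse=True)
    target = genome_size if genome_size is not None else sum(lengths)
    half = target / 2
    cumulative = 0
    for length in lengths:
        cumulative += length
        if cumulative >= half:
            return length
    return None


# Toy example with made-up contig lengths (not real assembly data).
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))                  # N50 against the assembly length
print(n50(toy_contigs, genome_size=40))  # NG50 against an assumed genome size
```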


Subject(s)
Genetic Variation , Genome, Human , Haplotypes , Female , Genotype , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Interspersed Repetitive Sequences , Male , Population Groups/genetics , Quantitative Trait Loci , Retroelements , Sequence Analysis, DNA , Sequence Inversion , Whole Genome Sequencing
12.
Mol Psychiatry ; 26(9): 4884-4895, 2021 09.
Article in English | MEDLINE | ID: mdl-33526825

ABSTRACT

Copy number variants (CNVs) are associated with syndromic and severe neurological and psychiatric disorders (SNPDs), such as intellectual disability, epilepsy, schizophrenia, and bipolar disorder. Although considered high-impact, CNVs are also observed in the general population. This presents a diagnostic challenge in evaluating their clinical significance. To estimate the phenotypic differences between CNV carriers and non-carriers regarding general health and well-being, we compared the impact of SNPD-associated CNVs on health, cognition, and socioeconomic phenotypes to the impact of three genome-wide polygenic risk scores (PRSs) in two Finnish cohorts (FINRISK, n = 23,053 and NFBC1966, n = 4895). The focus was on CNV carriers and PRS extremes who do not have an SNPD diagnosis. We identified high-risk CNVs (DECIPHER CNVs, risk gene deletions, or large [>1 Mb] CNVs) in 744 study participants (2.66%), 36 (4.8%) of whom had a diagnosed SNPD. In the remaining 708 unaffected carriers, we observed lower educational attainment (EA; OR = 0.77 [95% CI 0.66-0.89]) and lower household income (OR = 0.77 [0.66-0.89]). Income-associated CNVs also lowered household income (OR = 0.50 [0.38-0.66]), and CNVs with medical consequences lowered subjective health (OR = 0.48 [0.32-0.72]). The impact of PRSs was broader. At the lowest extreme of PRS for EA, we observed lower EA (OR = 0.31 [0.26-0.37]), lower income (OR = 0.66 [0.57-0.77]), lower subjective health (OR = 0.72 [0.61-0.83]), and increased mortality (Cox's HR = 1.55 [1.21-1.98]). PRS for intelligence had a similar impact, whereas PRS for schizophrenia did not affect these traits. We conclude that the majority of working-age individuals carrying high-risk CNVs without an SNPD diagnosis experience a modest impact on morbidity and mortality, as well as a limited impact on income and educational attainment, compared to individuals at the extreme end of common genetic variation. Our findings highlight that the contribution of traditional high-risk variants such as CNVs should be analyzed in a broader genetic context, rather than evaluated in isolation.


Subject(s)
DNA Copy Number Variations , Schizophrenia , Cognition , DNA Copy Number Variations/genetics , Educational Status , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study , Humans , Multifactorial Inheritance/genetics , Schizophrenia/genetics
13.
Nature ; 583(7814): 83-89, 2020 07.
Article in English | MEDLINE | ID: mdl-32460305

ABSTRACT

A key goal of whole-genome sequencing for studies of human genetics is to interrogate all forms of variation, including single-nucleotide variants, small insertion or deletion (indel) variants and structural variants. However, tools and resources for the study of structural variants have lagged behind those for smaller variants. Here we used a scalable pipeline to map and characterize structural variants in 17,795 deeply sequenced human genomes. We publicly release site-frequency data to create the largest, to our knowledge, whole-genome-sequencing-based structural variant resource so far. On average, individuals carry 2.9 rare structural variants that alter coding regions; these variants affect the dosage or structure of 4.2 genes and account for 4.0-11.2% of rare high-impact coding alleles. Using a computational model, we estimate that structural variants account for 17.2% of rare alleles genome-wide, with predicted deleterious effects that are equivalent to loss-of-function coding alleles; approximately 90% of such structural variants are noncoding deletions (mean 19.1 per genome). We report 158,991 ultra-rare structural variants and show that 2% of individuals carry ultra-rare megabase-scale structural variants, nearly half of which are balanced or complex rearrangements. Finally, we infer the dosage sensitivity of genes and noncoding elements, and reveal trends that relate to element class and conservation. This work will help to guide the analysis and interpretation of structural variants in the era of whole-genome sequencing.


Subject(s)
Genetic Variation , Genome, Human/genetics , Whole Genome Sequencing , Alleles , Case-Control Studies , Epigenesis, Genetic , Female , Gene Dosage/genetics , Genetics, Population , High-Throughput Nucleotide Sequencing , Humans , Male , Molecular Sequence Annotation , Quantitative Trait Loci , Racial Groups/genetics , Software
15.
Nature ; 572(7769): 323-328, 2019 08.
Article in English | MEDLINE | ID: mdl-31367044

ABSTRACT

Exome-sequencing studies have generally been underpowered to identify deleterious alleles with a large effect on complex traits as such alleles are mostly rare. Because the population of northern and eastern Finland has expanded considerably and in isolation following a series of bottlenecks, individuals of these populations have numerous deleterious alleles at a relatively high frequency. Here, using exome sequencing of nearly 20,000 individuals from these regions, we investigate the role of rare coding variants in clinically relevant quantitative cardiometabolic traits. Exome-wide association studies for 64 quantitative traits identified 26 newly associated deleterious alleles. Of these 26 alleles, 19 are either unique to or more than 20 times more frequent in Finnish individuals than in other Europeans and show geographical clustering comparable to Mendelian disease mutations that are characteristic of the Finnish population. We estimate that sequencing studies of populations without this unique history would require hundreds of thousands to millions of participants to achieve comparable association power.


Subject(s)
Exome Sequencing , Genetic Association Studies/methods , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Quantitative Trait Loci/genetics , Alleles , Cholesterol, HDL/genetics , Cluster Analysis , Endpoint Determination , Finland , Geographic Mapping , Humans , Multifactorial Inheritance/genetics , Reproducibility of Results
16.
Bioinformatics ; 35(22): 4782-4787, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31218349

ABSTRACT

SUMMARY: Large-scale human genetics studies are now employing whole genome sequencing with the goal of conducting comprehensive trait mapping analyses of all forms of genome variation. However, methods for structural variation (SV) analysis have lagged far behind those for smaller scale variants, and there is an urgent need to develop more efficient tools that scale to the size of human populations. Here, we present a fast and highly scalable software toolkit (svtools) and cloud-based pipeline for assembling high quality SV maps (including deletions, duplications, mobile element insertions, inversions and other rearrangements) in many thousands of human genomes. We show that this pipeline achieves similar variant detection performance to established per-sample methods (e.g. LUMPY), while providing fast and affordable joint analysis at the scale of ≥100 000 genomes. These tools will help enable the next generation of human genetics studies. AVAILABILITY AND IMPLEMENTATION: svtools is implemented in Python and freely available (MIT) from https://github.com/hall-lab/svtools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome, Human , Software , Humans , Sequence Deletion , Whole Genome Sequencing
17.
Cell ; 177(1): 70-84, 2019 03 21.
Article in English | MEDLINE | ID: mdl-30901550

ABSTRACT

Affordable genome sequencing technologies promise to revolutionize the field of human genetics by enabling comprehensive studies that interrogate all classes of genome variation, genome-wide, across the entire allele frequency spectrum. Ongoing projects worldwide are sequencing many thousands, and soon millions, of human genomes as part of various gene mapping studies, biobanking efforts, and clinical programs. However, while genome sequencing data production has become routine, genome analysis and interpretation remain challenging endeavors with many limitations and caveats. Here, we review the current state of technologies for genetic variant discovery, genotyping, and functional interpretation and discuss the prospects for future advances. We focus on germline variants discovered by whole-genome sequencing, genome-wide functional genomic approaches for predicting and measuring variant functional effects, and implications for studies of common and rare human disease.


Subject(s)
Genetic Variation/genetics , Genome, Human/genetics , Sequence Analysis, DNA/trends , Biological Specimen Banks , Chromosome Mapping/methods , Genetic Predisposition to Disease/genetics , Genetic Testing/trends , Genome-Wide Association Study , Genomics/methods , Genomics/trends , High-Throughput Nucleotide Sequencing/methods , Human Genome Project , Humans , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA/methods , Whole Genome Sequencing/methods , Whole Genome Sequencing/trends
18.
Nat Commun ; 9(1): 4038, 2018 10 02.
Article in English | MEDLINE | ID: mdl-30279509

ABSTRACT

Hundreds of thousands of human whole genome sequencing (WGS) datasets will be generated over the next few years. These data are more valuable in aggregate: joint analysis of genomes from many sources increases sample size and statistical power. A central challenge for joint analysis is that different WGS data processing pipelines cause substantial differences in variant calling in combined datasets, necessitating computationally expensive reprocessing. This approach is no longer tenable given the scale of current studies and data volumes. Here, we define WGS data processing standards that allow different groups to produce functionally equivalent (FE) results, yet still innovate on data processing pipelines. We present initial FE pipelines developed at five genome centers and show that they yield similar variant calling results and produce significantly less variability than sequencing replicates. This work alleviates a key technical bottleneck for genome aggregation and helps lay the foundation for community-wide human genetics studies.


Subject(s)
Human Genetics/standards , Whole Genome Sequencing/standards , Genome, Human , Humans
19.
Cell Rep ; 23(9): 2758-2769, 2018 05 29.
Article in English | MEDLINE | ID: mdl-29847804

ABSTRACT

Although aneuploidy is found in the majority of tumors, the degree of aneuploidy varies widely. It is unclear how cancer cells become aneuploid or how highly aneuploid tumors are different from those of more normal ploidy. We developed a simple computational method that measures the degree of aneuploidy or structural rearrangements of large chromosome regions of 522 human breast tumors from The Cancer Genome Atlas (TCGA). Highly aneuploid tumors overexpress activators of mitotic transcription and the genes encoding proteins that segregate chromosomes. Overexpression of three mitotic transcriptional regulators, E2F1, MYBL2, and FOXM1, is sufficient to increase the rate of lagging anaphase chromosomes in a non-transformed vertebrate tissue, demonstrating that this event can initiate aneuploidy. Highly aneuploid human breast tumors are also enriched in TP53 mutations. TP53 mutations co-associate with the overexpression of mitotic transcriptional activators, suggesting that these events work together to provide fitness to breast tumors.


Subject(s)
Aneuploidy , Breast Neoplasms/genetics , Anaphase/genetics , Animals , Breast Neoplasms/pathology , Chromosomal Instability , Chromosomes, Human/genetics , Embryo, Nonmammalian/metabolism , Female , Gene Frequency/genetics , Humans , Mitosis/genetics , Models, Genetic , Mutation/genetics , Phenotype , Transcription Factors/metabolism , Transcription, Genetic , Xenopus/embryology
20.
Nature ; 550(7675): 239-243, 2017 10 11.
Article in English | MEDLINE | ID: mdl-29022581

ABSTRACT

Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles, but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues, but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.


Subject(s)
Gene Expression Profiling , Genetic Variation/genetics , Organ Specificity/genetics , Bayes Theorem , Female , Genome, Human/genetics , Genomics , Genotype , Humans , Male , Models, Genetic , Sequence Analysis, RNA