Results 1 - 20 of 40
1.
Genet Epidemiol ; 2019 Sep 14.
Article in English | MEDLINE | ID: mdl-31520489

ABSTRACT

Copy number variants (CNVs) play an important role in a number of human diseases, but the accurate calling of CNVs remains challenging. Most current approaches to CNV detection use raw read alignments, which are computationally intensive to process. We use a regression tree-based approach to call germline CNVs from whole-genome sequencing (WGS, >18x) variant call sets in 6,898 samples across four European cohorts, and describe a rich large variation landscape comprising 1,320 CNVs. Eighty-one percent of detected events have been previously reported in the Database of Genomic Variants. Twenty-three percent of high-quality deletions affect entire genes, and we recapitulate known events such as the GSTM1 and RHD gene deletions. We test for association between the detected deletions and 275 protein levels in 1,457 individuals to assess the potential clinical impact of the detected CNVs. We describe complex CNV patterns underlying an association with levels of the CCL3 protein (MAF = 0.15, p = 3.6 × 10⁻¹²) at the CCL3L3 locus, and a novel cis-association between a low-frequency NOMO1 deletion and NOMO1 protein levels (MAF = 0.02, p = 2.2 × 10⁻⁷). This study demonstrates that existing population-wide WGS call sets can be mined for germline CNVs with minimal computational overhead, delivering insight into a less well-studied, yet potentially impactful class of genetic variant.

2.
Nat Commun ; 10(1): 357, 2019 01 21.
Article in English | MEDLINE | ID: mdl-30664637

ABSTRACT

Cranial growth and development is a complex process which affects the closely related traits of head circumference (HC) and intracranial volume (ICV). The underlying genetic influences shaping these traits during the transition from childhood to adulthood are little understood, but might include both age-specific genetic factors and low-frequency genetic variation. Here, we model the developmental genetic architecture of HC, showing this is genetically stable and correlated with genetic determinants of ICV. Investigating up to 46,000 children and adults of European descent, we identify association with final HC and/or final ICV + HC at 9 novel common and low-frequency loci, illustrating that genetic variation from a wide allele frequency spectrum contributes to cranial growth. The largest effects are reported for low-frequency variants within TP53, with 0.5 cm wider heads in increaser-allele carriers versus non-carriers during mid-childhood, suggesting a previously unrecognized role of TP53 transcripts in human cranial development.


Subjects
Alleles; Genetic Loci; Genetic Variation; RNA, Messenger/genetics; Skull/metabolism; Tumor Suppressor Protein p53/genetics; Adolescent; Adult; Aged; Aged, 80 and over; Cephalometry; Child; European Continental Ancestry Group; Female; Gene Expression Regulation, Developmental; Gene Frequency; Genome, Human; Humans; Male; Middle Aged; Skull/anatomy & histology
3.
Nat Genet ; 51(2): 343-353, 2019 02.
Article in English | MEDLINE | ID: mdl-30692680

ABSTRACT

Loci discovered by genome-wide association studies predominantly map outside protein-coding genes. The interpretation of the functional consequences of non-coding variants can be greatly enhanced by catalogs of regulatory genomic regions in cell lines and primary tissues. However, robust and readily applicable methods are still lacking by which to systematically evaluate the contribution of these regions to genetic variation implicated in diseases or quantitative traits. Here we propose a novel approach that leverages genome-wide association studies' findings with regulatory or functional annotations to classify features relevant to a phenotype of interest. Within our framework, we account for major sources of confounding not offered by current methods. We further assess enrichment of genome-wide association studies for 19 traits within Encyclopedia of DNA Elements- and Roadmap-derived regulatory regions. We characterize unique enrichment patterns for traits and annotations driving novel biological insights. The method is implemented in standalone software and an R package, to facilitate its application by the research community.


Subjects
Disease/genetics; Genome/genetics; Genome-Wide Association Study/methods; Genomics/methods; Humans; Molecular Sequence Annotation/methods; Phenotype; Polymorphism, Single Nucleotide/genetics; Quantitative Trait Loci/genetics; Regulatory Sequences, Nucleic Acid/genetics; Software
4.
Nat Commun ; 9(1): 5460, 2018 Dec 19.
Article in English | MEDLINE | ID: mdl-30568165

ABSTRACT

The original version of this Article contained an error in Fig. 2. In panel a, the two legend items "rare" and "common" were inadvertently swapped. This has been corrected in both the PDF and HTML versions of the Article.

6.
Nat Commun ; 9(1): 4674, 2018 11 07.
Article in English | MEDLINE | ID: mdl-30405126

ABSTRACT

The role of rare variants in complex traits remains uncharted. Here, we conduct deep whole genome sequencing of 1457 individuals from an isolated population, and test for rare variant burdens across six cardiometabolic traits. We identify a role for rare regulatory variation, which has hitherto been missed. We find evidence of rare variant burdens that are independent of established common variant signals (ADIPOQ and adiponectin, P = 4.2 × 10⁻⁸; APOC3 and triglyceride levels, P = 1.5 × 10⁻²⁶), and identify replicating evidence for a burden associated with triglyceride levels in FAM189B (P = 2.2 × 10⁻⁸), indicating a role for this gene in lipid metabolism.

7.
Methods Mol Biol ; 1793: 25-36, 2018.
Article in English | MEDLINE | ID: mdl-29876889

ABSTRACT

Thorough data quality control (QC) is a key step to the success of high-throughput genotyping approaches. Following extensive research several criteria and thresholds have been established for data QC at the sample and variant level. Sample QC is aimed at the identification and removal (when appropriate) of individuals with (1) low call rate, (2) discrepant sex or other identity-related information, (3) excess genome-wide heterozygosity and homozygosity, (4) relations to other samples, (5) ethnicity differences, (6) batch effects, and (7) contamination. Variant QC is aimed at identification and removal or refinement of variants with (1) low call rate, (2) call rate differences by phenotypic status, (3) gross deviation from Hardy-Weinberg Equilibrium (HWE), (4) bad genotype intensity plots, (5) batch effects, (6) differences in allele frequencies with published data sets, (7) very low minor allele counts (MAC), (8) low imputation quality score, (9) low variant quality score log-odds, and (10) few or low quality reads.
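
As a rough illustration of three of the variant-level criteria listed above (call rate, minor allele count, and deviation from HWE), the following Python sketch computes them for a single biallelic variant. The function name `variant_qc`, the genotype encoding, and all thresholds are assumptions made for this example; they are not taken from the chapter.

```python
def variant_qc(genotypes, min_call_rate=0.98, min_mac=5, max_hwe_chi2=23.9):
    """Compute simple QC metrics for one biallelic variant.

    genotypes: list of alternate-allele counts (0, 1 or 2); None marks a missing call.
    All thresholds are illustrative assumptions, not published cut-offs.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    call_rate = n / len(genotypes) if genotypes else 0.0

    n_hom_ref = called.count(0)
    n_het = called.count(1)
    n_hom_alt = called.count(2)
    alt_alleles = n_het + 2 * n_hom_alt
    # Minor allele count: the rarer of the two alleles among called samples
    mac = min(alt_alleles, 2 * n - alt_alleles)

    # Pearson chi-square (1 df) of observed vs. HWE-expected genotype counts;
    # left at 0.0 for monomorphic variants, where expected counts would be zero
    chi2 = 0.0
    if n > 0 and 0 < alt_alleles < 2 * n:
        p = alt_alleles / (2 * n)
        expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
        observed = [n_hom_ref, n_het, n_hom_alt]
        chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    passes = call_rate >= min_call_rate and mac >= min_mac and chi2 <= max_hwe_chi2
    return {"call_rate": call_rate, "mac": mac, "hwe_chi2": chi2, "passes": passes}


# A variant in perfect HWE proportions vs. one with no heterozygotes at all
balanced = variant_qc([0] * 25 + [1] * 50 + [2] * 25)
no_hets = variant_qc([0] * 50 + [2] * 50)
```

The chi-square approximation is used here only to keep the sketch short; with small genotype counts an exact test is generally the safer choice.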

8.
Lancet Haematol ; 5(6): e241-e251, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29780001

ABSTRACT

BACKGROUND: There are more than 300 known red blood cell (RBC) antigens and 33 platelet antigens that differ between individuals. Sensitisation to antigens is a serious complication that can occur in prenatal medicine and after blood transfusion, particularly for patients who require multiple transfusions. Although pre-transfusion compatibility testing largely relies on serological methods, reagents are not available for many antigens. Methods based on single-nucleotide polymorphism (SNP) arrays have been used, but typing for ABO and Rh, the most important blood groups, cannot be done with SNP typing alone. We aimed to develop a novel method based on whole-genome sequencing to identify RBC and platelet antigens. METHODS: This whole-genome sequencing study is a subanalysis of data from patients in the whole-genome sequencing arm of the MedSeq Project randomised controlled trial (NCT01736566) with no measured patient outcomes. We created a database of molecular changes in RBC and platelet antigens and developed an automated antigen-typing algorithm based on whole-genome sequencing (bloodTyper). This algorithm was iteratively improved to address cis-trans haplotype ambiguities and homologous gene alignments. Whole-genome sequencing data from 110 MedSeq participants (30× depth) were used to initially validate bloodTyper through comparison with conventional serology and SNP methods for typing of 38 RBC antigens in 12 blood-group systems and 22 human platelet antigens. bloodTyper was further validated with whole-genome sequencing data from 200 INTERVAL trial participants (15× depth) with serological comparisons. FINDINGS: We iteratively improved bloodTyper by comparing its typing results with conventional serological and SNP typing in three rounds of testing. The initial whole-genome sequencing typing algorithm was 99·5% concordant across the first 20 MedSeq genomes. Addressing discordances led to development of an improved algorithm that was 99·8% concordant for the remaining 90 MedSeq genomes. Additional modifications led to the final algorithm, which was 99·2% concordant across 200 INTERVAL genomes (or 99·9% after adjustment for the lower depth of coverage). INTERPRETATION: By enabling more precise antigen-matching of patients with blood donors, antigen typing based on whole-genome sequencing provides a novel approach to improve transfusion outcomes with the potential to transform the practice of transfusion medicine. FUNDING: National Human Genome Research Institute, Doris Duke Charitable Foundation, National Health Service Blood and Transplant, National Institute for Health Research, and Wellcome Trust.


Subjects
ABO Blood-Group System/genetics; Antigens, Human Platelet/genetics; Blood Grouping and Crossmatching/methods; Rh-Hr Blood-Group System/genetics; Whole Genome Sequencing; ABO Blood-Group System/classification; Adolescent; Adult; Aged; Aged, 80 and over; Algorithms; Antigens, Human Platelet/classification; Blood Platelets/immunology; Databases, Genetic; Erythrocytes/immunology; Genome, Human; Humans; Middle Aged; Randomized Controlled Trials as Topic; Rh-Hr Blood-Group System/classification; Young Adult
9.
Am J Hum Genet ; 100(6): 865-884, 2017 Jun 01.
Article in English | MEDLINE | ID: mdl-28552196

ABSTRACT

Deep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the broader allelic architecture of 12 anthropometric traits associated with height, body mass, and fat distribution in up to 267,616 individuals. We report 106 genome-wide significant signals that have not been previously identified, including 9 low-frequency variants pointing to functional candidates. Of the 106 signals, 6 are in genomic regions that have not been implicated with related traits before, 28 are independent signals at previously reported regions, and 72 represent previously reported signals for a different anthropometric trait. 71% of signals reside within genes and fine mapping resolves 23 signals to one or two likely causal variants. We confirm genetic overlap between human monogenic and polygenic anthropometric traits and find signal enrichment in cis expression QTLs in relevant tissues. Our results highlight the potential of WGS strategies to enhance biologically relevant discoveries across the frequency spectrum.


Subjects
Anthropometry; Genome, Human; Genome-Wide Association Study; Quantitative Trait Loci/genetics; Sequence Analysis, DNA/methods; Body Height/genetics; Cohort Studies; DNA Methylation/genetics; Databases, Genetic; Female; Genetic Variation; Humans; Lipodystrophy/genetics; Male; Meta-Analysis as Topic; Obesity/genetics; Physical Chromosome Mapping; Sex Characteristics; Syndrome; United Kingdom
10.
Genome Biol ; 18(1): 77, 2017 04 27.
Article in English | MEDLINE | ID: mdl-28449691

ABSTRACT

Despite thousands of genetic loci identified to date, a large proportion of genetic variation predisposing to complex disease and traits remains unaccounted for. Advances in sequencing technology enable focused explorations on the contribution of low-frequency and rare variants to human traits. Here we review experimental approaches and current knowledge on the contribution of these genetic variants in complex disease and discuss challenges and opportunities for personalised medicine.


Subjects
Genetic Association Studies/methods; Genetic Predisposition to Disease; Computational Biology/methods; Genetic Variation; Genome-Wide Association Study; Genomics/methods; Humans; Mutation Rate
11.
Cell ; 168(5): 830-842.e7, 2017 02 23.
Article in English | MEDLINE | ID: mdl-28235197

ABSTRACT

De novo copy number variants (dnCNVs) arising at multiple loci in a personal genome have usually been considered to reflect cancer somatic genomic instabilities. We describe a multiple dnCNV (MdnCNV) phenomenon in which individuals with genomic disorders carry five to ten constitutional dnCNVs. These CNVs originate from independent formation incidences, are predominantly tandem duplications or complex gains, exhibit breakpoint junction features reminiscent of replicative repair, and show increased de novo point mutations flanking the rearrangement junctions. The active CNV mutation shower appears to be restricted to a transient perizygotic period. We propose that a defect in the CNV formation process is responsible for the "CNV-mutator state," and this state is dampened after early embryogenesis. The constitutional MdnCNV phenomenon resembles chromosomal instability in various cancers. Investigations of this phenomenon may provide unique access to understanding genomic disorders, structural variant mutagenesis, human evolution, and cancer biology.


Subjects
Chromosome Aberrations; DNA Copy Number Variations; Genetic Diseases, Inborn/embryology; Genetic Diseases, Inborn/genetics; Genomic Instability; Mutation; Chromosome Breakpoints; Chromosome Duplication; DNA Replication; Embryonic Development; Female; Gametogenesis; Humans; Male
12.
Eur J Hum Genet ; 25(4): 477-484, 2017 04.
Article in English | MEDLINE | ID: mdl-28145424

ABSTRACT

Isolated populations with enrichment of variants due to recent population bottlenecks provide a powerful resource for identifying disease-associated genetic variants and genes. As a model of an isolate population, we sequenced the genomes of 1463 Finnish individuals as part of the Sequencing Initiative Suomi (SISu) Project. We compared the genomic profiles of the 1463 Finns to a sample of 1463 British individuals that were sequenced in parallel as part of the UK10K Project. Whereas there were no major differences in the allele frequency of common variants, a significant depletion of variants in the rare frequency spectrum was observed in Finns when comparing the two populations. On the other hand, we observed >2.1 million variants that were twice as frequent among Finns compared with Britons and 800 000 variants that were more than 10 times more frequent in Finns. Furthermore, in Finns we observed a relative proportional enrichment of variants in the minor allele frequency range between 2 and 5% (P < 2.2 × 10⁻¹⁶). When stratified by their functional annotations, loss-of-function variants showed the highest proportional enrichment in Finns (P = 0.0291). In the non-coding part of the genome, variants in conserved regions (P = 0.002) and promoters (P = 0.01) were also significantly enriched in the Finnish samples. These functional categories represent the highest a priori power for downstream association studies of rare variants using population isolates.


Subjects
Genome-Wide Association Study/standards; Polymorphism, Genetic; Population/genetics; Conserved Sequence; Finland; Genome, Human; Humans; Mutation Rate; United Kingdom
13.
Cell ; 167(5): 1415-1429.e19, 2016 11 17.
Article in English | MEDLINE | ID: mdl-27863252

ABSTRACT

Many common variants have been associated with hematological traits, but identification of causal genes and pathways has proven challenging. We performed a genome-wide association analysis in the UK Biobank and INTERVAL studies, testing 29.5 million genetic variants for association with 36 red cell, white cell, and platelet properties in 173,480 European-ancestry participants. This effort yielded hundreds of low frequency (<5%) and rare (<1%) variants with a strong impact on blood cell phenotypes. Our data highlight general properties of the allelic architecture of complex traits, including the proportion of the heritable component of each blood trait explained by the polygenic signal across different genome regulatory domains. Finally, through Mendelian randomization, we provide evidence of shared genetic pathways linking blood cell indices with complex pathologies, including autoimmune diseases, schizophrenia, and coronary heart disease and evidence suggesting previously reported population associations between blood cell indices and cardiovascular disease may be non-causal.


Subjects
Genetic Variation; Genome-Wide Association Study; Hematopoietic Stem Cells/metabolism; Immune System Diseases/genetics; Alleles; Cell Differentiation; European Continental Ancestry Group/genetics; Genetic Predisposition to Disease; Hematopoietic Stem Cells/pathology; Humans; Immune System Diseases/pathology; Polymorphism, Single Nucleotide; Quantitative Trait Loci
14.
Nat Genet ; 48(11): 1303-1312, 2016 11.
Article in English | MEDLINE | ID: mdl-27668658

ABSTRACT

Large-scale whole-genome sequence data sets offer novel opportunities to identify genetic variation underlying human traits. Here we apply genotype imputation based on whole-genome sequence data from the UK10K and 1000 Genomes Project into 35,981 study participants of European ancestry, followed by association analysis with 20 quantitative cardiometabolic and hematological traits. We describe 17 new associations, including 6 rare (minor allele frequency (MAF) < 1%) or low-frequency (1% < MAF < 5%) variants with platelet count (PLT), red blood cell indices (MCH and MCV) and HDL cholesterol. Applying fine-mapping analysis to 233 known and new loci associated with the 20 traits, we resolve the associations of 59 loci to credible sets of 20 or fewer variants and describe trait enrichments within regions of predicted regulatory function. These findings improve understanding of the allelic architecture of risk factors for cardiometabolic and hematological diseases and provide additional functional insights with the identification of potentially novel biological targets.


Subjects
Genetic Loci; Genome, Human; Genome-Wide Association Study; Heart Diseases/genetics; Hematologic Diseases/genetics; Female; Genetic Predisposition to Disease; Genetic Variation; Humans; Male; Quantitative Trait Loci; Sequence Analysis, DNA
16.
Am J Hum Genet ; 99(2): 481-8, 2016 08 04.
Article in English | MEDLINE | ID: mdl-27486782

ABSTRACT

Circulating blood cell counts and indices are important indicators of hematopoietic function and a number of clinical parameters, such as blood oxygen-carrying capacity, inflammation, and hemostasis. By performing whole-exome sequence association analyses of hematologic quantitative traits in 15,459 community-dwelling individuals, followed by in silico replication in up to 52,024 independent samples, we identified two previously undescribed coding variants associated with lower platelet count: a common missense variant in CPS1 (rs1047891, MAF = 0.33, discovery + replication p = 6.38 × 10⁻¹⁰) and a rare synonymous variant in GFI1B (rs150813342, MAF = 0.009, discovery + replication p = 1.79 × 10⁻²⁷). By performing CRISPR/Cas9 genome editing in hematopoietic cell lines and follow-up targeted knockdown experiments in primary human hematopoietic stem and progenitor cells, we demonstrate an alternative splicing mechanism by which the GFI1B rs150813342 variant suppresses formation of a GFI1B isoform that preferentially promotes megakaryocyte differentiation and platelet production. These results demonstrate how unbiased studies of natural variation in blood cell traits can provide insight into the regulation of human hematopoiesis.


Subjects
Alternative Splicing/genetics; DNA Mutational Analysis; Exome/genetics; Genetic Loci/genetics; Hematopoiesis/genetics; Proto-Oncogene Proteins/genetics; Repressor Proteins/genetics; Blood Platelets/cytology; CRISPR-Cas Systems; Gene Editing; Hematopoietic Stem Cells/cytology; Humans; Megakaryocytes/cytology; Platelet Count
17.
Nature ; 526(7571): 75-81, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26432246

ABSTRACT

Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.


Subjects
Genetic Variation/genetics; Genome, Human/genetics; Physical Chromosome Mapping; Amino Acid Sequence; Genetic Predisposition to Disease; Genetics, Medical; Genetics, Population; Genome-Wide Association Study; Genomics; Genotype; Haplotypes/genetics; Homozygote; Humans; Molecular Sequence Data; Mutation Rate; Polymorphism, Single Nucleotide/genetics; Quantitative Trait Loci/genetics; Sequence Analysis, DNA; Sequence Deletion/genetics
18.
Nature ; 526(7571): 112-7, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26367794

ABSTRACT

The extent to which low-frequency (minor allele frequency (MAF) between 1-5%) and rare (MAF ≤ 1%) variants contribute to complex traits and disease in the general population is mainly unknown. Bone mineral density (BMD) is highly heritable, a major predictor of osteoporotic fractures, and has been previously associated with common genetic variants, as well as rare, population-specific, coding variants. Here we identify novel non-coding genetic variants with large effects on BMD (n_total = 53,236) and fracture (n_total = 508,253) in individuals of European ancestry from the general population. Associations for BMD were derived from whole-genome sequencing (n = 2,882 from UK10K (ref. 10); a population-based genome sequencing consortium), whole-exome sequencing (n = 3,549), deep imputation of genotyped samples using a combined UK10K/1000 Genomes reference panel (n = 26,534), and de novo replication genotyping (n = 20,271). We identified a low-frequency non-coding variant near a novel locus, EN1, with an effect size fourfold larger than the mean of previously reported common variants for lumbar spine BMD (rs11692564(T), MAF = 1.6%, replication effect size = +0.20 s.d., P_meta = 2 × 10⁻¹⁴), which was also associated with a decreased risk of fracture (odds ratio = 0.85; P = 2 × 10⁻¹¹; n_cases = 98,742 and n_controls = 409,511). Using an En1(cre/flox) mouse model, we observed that conditional loss of En1 results in low bone mass, probably as a consequence of high bone turnover. We also identified a novel low-frequency non-coding variant with large effects on BMD near WNT16 (rs148771817(T), MAF = 1.2%, replication effect size = +0.41 s.d., P_meta = 1 × 10⁻¹¹). In general, there was an excess of association signals arising from deleterious coding and conserved non-coding variants. These findings provide evidence that low-frequency non-coding variants have large effects on BMD and fracture, thereby providing rationale for whole-genome sequencing and improved imputation reference panels to study the genetic architecture of complex traits and disease in the general population.


Subjects
Bone Density/genetics; Fractures, Bone/genetics; Genome, Human/genetics; Homeodomain Proteins/genetics; Animals; Bone and Bones/metabolism; Disease Models, Animal; Europe/ethnology; European Continental Ancestry Group/genetics; Exome/genetics; Female; Gene Frequency/genetics; Genetic Predisposition to Disease/genetics; Genetic Variation/genetics; Genomics; Genotype; Humans; Mice; Sequence Analysis, DNA; Wnt Proteins/genetics
19.
Nature ; 526(7571): 82-90, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26367797

ABSTRACT

The contribution of rare and low-frequency variants to human traits is largely unexplored. Here we describe insights from sequencing whole genomes (low read depth, 7×) or exomes (high read depth, 80×) of nearly 10,000 individuals from population-based and disease collections. In extensively phenotyped cohorts we characterize over 24 million novel sequence variants, generate a highly accurate imputation reference panel and identify novel alleles associated with levels of triglycerides (APOB), adiponectin (ADIPOQ) and low-density lipoprotein cholesterol (LDLR and RGAG1) from single-marker and rare variant aggregation tests. We describe population structure and functional annotation of rare and low-frequency variants, use the data to estimate the benefits of sequencing for association studies, and summarize lessons from disease-specific collections. Finally, we make available an extensive resource, including individual-level genetic and phenotypic data and web-based tools to facilitate the exploration of association results.


Subjects
Disease/genetics; Genetic Variation/genetics; Genome, Human/genetics; Health; Adiponectin/blood; Alleles; Cohort Studies; Exome/genetics; Female; Genetic Predisposition to Disease/genetics; Genetics, Medical; Genetics, Population; Genome-Wide Association Study; Genomics; Humans; Lipid Metabolism/genetics; Male; Molecular Sequence Annotation; Receptors, LDL/genetics; Reference Standards; Sequence Analysis, DNA; Triglycerides/blood; United Kingdom
20.
Nat Commun ; 6: 8111, 2015 Sep 14.
Article in English | MEDLINE | ID: mdl-26368830

ABSTRACT

Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants.


Subjects
European Continental Ancestry Group/genetics; Gene Frequency; Genetic Variation; Genotype; Haplotypes; Models, Statistical; Adolescent; Adult; Aged; Aged, 80 and over; Alleles; Genome, Human; Humans; Italy; Middle Aged; Models, Genetic; Polymorphism, Single Nucleotide; United Kingdom; Young Adult