ABSTRACT
Genomic studies in African populations provide unique opportunities to understand disease etiology, human diversity, and population history. In the largest study of its kind, comprising genome-wide data from 6,400 individuals and whole-genome sequences from 1,978 individuals from rural Uganda, we find evidence of geographically correlated fine-scale population substructure. Historically, the ancestry of modern Ugandans was best represented by a mixture of ancient East African pastoralists. We demonstrate the value of the largest sequence panel from Africa to date as an imputation resource. Examining 34 cardiometabolic traits, we show systematic differences in trait heritability between European and African populations, probably reflecting the differential impact of genes and environment. In a multi-trait pan-African GWAS of up to 14,126 individuals, we identify novel loci associated with anthropometric, hematological, lipid, and glycemic traits. We find that several functionally important signals are driven by Africa-specific variants, highlighting the value of studying diverse populations across the region.
Subjects
Black People/genetics , Genetic Predisposition to Disease , Genome, Human/genetics , Genomics , Female , Gene Frequency/genetics , Genome-Wide Association Study , Humans , Male , Polymorphism, Single Nucleotide/genetics , Uganda/epidemiology , Whole Genome Sequencing
ABSTRACT
Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.
Subjects
Genetic Variation/genetics , Genetics, Medical/trends , Genome, Human/genetics , Genomics/trends , Africa , Africa South of the Sahara , Asia/ethnology , Europe/ethnology , Humans , Risk Factors , Selection, Genetic/genetics
ABSTRACT
In recent years long-read technologies have moved from being a niche and specialist field to a point of relative maturity likely to feature frequently in the genomic landscape. Analogous to next generation sequencing, the cost of sequencing using long-read technologies has materially dropped whilst the instrument throughput continues to increase. Together these changes present the prospect of sequencing large numbers of individuals with the aim of fully characterizing genomes at high resolution. In this article, we will endeavour to present an introduction to long-read technologies showing: what long reads are; how they are distinct from short reads; why long reads are useful and how they are being used. We will highlight the recent developments in this field, and the applications and potential of these technologies in medical research, and clinical diagnostics and therapeutics.
Subjects
High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Algorithms , Animals , Genomics/methods , Humans
ABSTRACT
MOTIVATION: Very low-depth sequencing has been proposed as a cost-effective approach to capture low-frequency and rare variation in complex trait association studies. However, a full characterization of the genotype quality and association power for very low-depth sequencing designs is still lacking. RESULTS: We perform cohort-wide whole-genome sequencing (WGS) at low depth in 1239 individuals (990 at 1× depth and 249 at 4× depth) from an isolated population, and establish a robust pipeline for calling and imputing very low-depth WGS genotypes from standard bioinformatics tools. Using genotyping chip, whole-exome sequencing (75× depth) and high-depth (22×) WGS data in the same samples, we examine in detail the sensitivity of this approach, and show that imputed 1× WGS recapitulates 95.2% of variants found by imputed GWAS with an average minor allele concordance of 97% for common and low-frequency variants. In our study, 1× WGS further allowed the discovery of 140 844 true low-frequency variants with 73% genotype concordance when compared to high-depth WGS data. Finally, using association results for 57 quantitative traits, we show that very low-depth WGS is an efficient alternative to imputed GWAS chip designs, allowing the discovery of up to twice as many true association signals as the classical imputed GWAS design. AVAILABILITY AND IMPLEMENTATION: The HELIC genotype and WGS datasets have been deposited to the European Genome-phenome Archive (https://www.ebi.ac.uk/ega/home): EGAD00010000518; EGAD00010000522; EGAD00010000610; EGAD00001001636; EGAD00001001637. The peakplotter software is available at https://github.com/wtsi-team144/peakplotter, and the transformPhenotype app can be downloaded at https://github.com/wtsi-team144/transformPhenotype. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
High-Throughput Nucleotide Sequencing , Polymorphism, Single Nucleotide , Genotype , Humans , Multifactorial Inheritance , Whole Genome Sequencing
ABSTRACT
BACKGROUND: SAMtools and BCFtools are widely used programs for processing and analysing high-throughput sequencing data. They include tools for file format conversion and manipulation, sorting, querying, statistics, variant calling, and effect analysis amongst other methods. FINDINGS: The first version appeared online 12 years ago and has been maintained and further developed ever since, with many new features and improvements added over the years. The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines. CONCLUSION: Both SAMtools and BCFtools are freely available on GitHub under the permissive MIT licence, free for both non-commercial and commercial use. Both packages have been installed >1 million times via Bioconda. The source code and documentation are available from https://www.htslib.org.
Subjects
High-Throughput Nucleotide Sequencing , Software , Genome , Genomics
ABSTRACT
Kaposi's sarcoma-associated herpesvirus (KSHV) and Epstein-Barr Virus (EBV) establish life-long infections and are associated with malignancies. Striking geographic variation in incidence and the fact that virus alone is insufficient to cause disease suggest other co-factors are involved. Here we present an epidemiological analysis and a genome-wide association study (GWAS) in 4365 individuals from an African population cohort, to assess the influence of host genetic and non-genetic factors on virus antibody responses. EBV/KSHV co-infection (OR = 5.71 (1.58-7.12)), HIV positivity (OR = 2.22 (1.32-3.73)) and living in a more rural area (OR = 1.38 (1.01-1.89)) are strongly associated with immunogenicity. GWAS reveals associations with KSHV antibody response in the HLA-B/C region (p = 6.64 × 10⁻⁹). For EBV, associations are identified for VCA (rs71542439, p = 1.15 × 10⁻¹²). Human leucocyte antigen (HLA) and trans-ancestry fine-mapping substantiate that distinct variants in HLA-DQA1 (p = 5.24 × 10⁻⁴⁴) are driving associations for EBNA-1 in Africa. This study highlights complex interactions between KSHV and EBV, in addition to distinct genetic architectures resulting in important differences in pathogenesis and transmission.
Subjects
Antibodies, Viral/biosynthesis , Disease Resistance/genetics , Epstein-Barr Virus Infections/genetics , Henipavirus Infections/genetics , Host-Pathogen Interactions/genetics , Sarcoma, Kaposi/genetics , Adolescent , Adult , Antigens, Viral/genetics , Antigens, Viral/immunology , Capsid Proteins/genetics , Capsid Proteins/immunology , Coinfection , Epstein-Barr Virus Infections/epidemiology , Epstein-Barr Virus Infections/immunology , Epstein-Barr Virus Infections/virology , Epstein-Barr Virus Nuclear Antigens/genetics , Epstein-Barr Virus Nuclear Antigens/immunology , Female , Gene Expression , Genome-Wide Association Study , HIV/genetics , HIV/immunology , HIV/pathogenicity , HLA-DQ alpha-Chains/genetics , HLA-DQ alpha-Chains/immunology , Henipavirus Infections/epidemiology , Henipavirus Infections/immunology , Henipavirus Infections/virology , Herpesvirus 4, Human/genetics , Herpesvirus 4, Human/immunology , Herpesvirus 4, Human/pathogenicity , Herpesvirus 8, Human/genetics , Herpesvirus 8, Human/immunology , Herpesvirus 8, Human/pathogenicity , Host-Pathogen Interactions/immunology , Humans , Incidence , Male , Middle Aged , Rural Population , Sarcoma, Kaposi/epidemiology , Sarcoma, Kaposi/immunology , Sarcoma, Kaposi/virology , Uganda/epidemiology , Urban Population
ABSTRACT
Australia was one of the earliest regions outside Africa to be colonized by fully modern humans, with archaeological evidence for human presence by 47,000 years ago (47 kya) widely accepted [1, 2]. However, the extent of subsequent human entry before the European colonial age is less clear. The dingo reached Australia about 4 kya, indirectly implying human contact, which some have linked to changes in language and stone tool technology to suggest substantial cultural changes at the same time [3]. Genetic data of two kinds have been proposed to support gene flow from the Indian subcontinent to Australia at this time, as well: first, signs of South Asian admixture in Aboriginal Australian genomes have been reported on the basis of genome-wide SNP data [4]; and second, a Y chromosome lineage designated haplogroup C*, present in both India and Australia, was estimated to have a most recent common ancestor around 5 kya and to have entered Australia from India [5]. Here, we sequence 13 Aboriginal Australian Y chromosomes to re-investigate their divergence times from Y chromosomes in other continents, including a comparison of Aboriginal Australian and South Asian haplogroup C chromosomes. We find divergence times dating back to ~50 kya, thus excluding the Y chromosome as providing evidence for recent gene flow from India into Australia.
Subjects
Chromosomes, Human, Y/genetics , Native Hawaiian or Other Pacific Islander/genetics , Phylogeny , Australia , Gene Flow , Haplotypes , Humans , India , Male , Papua New Guinea/ethnology
ABSTRACT
Congenital heart defects (CHDs) have a neonatal incidence of 0.8-1% (refs. 1,2). Despite abundant examples of monogenic CHD in humans and mice, CHD has a low absolute sibling recurrence risk (~2.7%), suggesting a considerable role for de novo mutations (DNMs) and/or incomplete penetrance. De novo protein-truncating variants (PTVs) have been shown to be enriched among the 10% of 'syndromic' patients with extra-cardiac manifestations. We exome sequenced 1,891 probands, including both syndromic CHD (S-CHD, n = 610) and nonsyndromic CHD (NS-CHD, n = 1,281). In S-CHD, we confirmed a significant enrichment of de novo PTVs but not inherited PTVs in known CHD-associated genes, consistent with recent findings. Conversely, in NS-CHD we observed significant enrichment of PTVs inherited from unaffected parents in CHD-associated genes. We identified three genome-wide significant S-CHD disorders caused by DNMs in CHD4, CDK13 and PRKD1. Our study finds evidence for distinct genetic architectures underlying the low sibling recurrence risk in S-CHD and NS-CHD.