ABSTRACT
Genomic studies in African populations provide unique opportunities to understand disease etiology, human diversity, and population history. In the largest study of its kind, comprising genome-wide data from 6,400 individuals and whole-genome sequences from 1,978 individuals from rural Uganda, we find evidence of geographically correlated fine-scale population substructure. Historically, the ancestry of modern Ugandans was best represented by a mixture of ancient East African pastoralists. We demonstrate the value of the largest sequence panel from Africa to date as an imputation resource. Examining 34 cardiometabolic traits, we show systematic differences in trait heritability between European and African populations, probably reflecting the differential impact of genes and environment. In a multi-trait pan-African GWAS of up to 14,126 individuals, we identify novel loci associated with anthropometric, hematological, lipid, and glycemic traits. We find that several functionally important signals are driven by Africa-specific variants, highlighting the value of studying diverse populations across the region.
Subject(s)
Black People/genetics , Genetic Predisposition to Disease , Genome, Human/genetics , Genomics , Female , Gene Frequency/genetics , Genome-Wide Association Study , Humans , Male , Polymorphism, Single Nucleotide/genetics , Uganda/epidemiology , Whole Genome Sequencing
ABSTRACT
Risk of disease is multifactorial and can be shaped by socio-economic, demographic, cultural, environmental and genetic factors. Our understanding of the genetic determinants of disease risk has greatly advanced with the advent of genome-wide association studies (GWAS), which detect associations between genetic variants and complex traits or diseases by comparing populations of cases and controls. However, much of this discovery has occurred through GWAS of individuals of European ancestry, with limited representation of other populations, including from Africa, The Americas, Asia and Oceania. Population demography, genetic drift and adaptation to environments over thousands of years have led globally to the diversification of populations. This global genomic diversity can provide new opportunities for discovery and translation into therapies, as well as a better understanding of population disease risk. Large-scale multi-ethnic and representative biobanks and population health resources provide unprecedented opportunities to understand the genetic determinants of disease on a global scale.
ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
Pharmacogenomics is increasingly moving into mainstream clinical practice. Careful consideration must be paid to inclusion of diverse populations in research, translation and implementation, in the historical and social context of population stratification, to ensure that this leads to improvements in healthcare for all rather than increased health disparities. This review takes a broad and critical approach to the current role of diversity in pharmacogenomics and addresses potential pitfalls in order to raise awareness for prescribers. It also emphasizes evidence gaps and suggests approaches that may minimize negative consequences and promote health equality.
Subject(s)
Health Promotion , Pharmacogenetics , Humans
ABSTRACT
Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.
Subject(s)
Genetic Variation/genetics , Genetics, Medical/trends , Genome, Human/genetics , Genomics/trends , Africa , Africa South of the Sahara , Asia/ethnology , Europe/ethnology , Humans , Risk Factors , Selection, Genetic/genetics
ABSTRACT
In recent years long-read technologies have moved from being a niche and specialist field to a point of relative maturity likely to feature frequently in the genomic landscape. Analogous to next generation sequencing, the cost of sequencing using long-read technologies has materially dropped whilst the instrument throughput continues to increase. Together these changes present the prospect of sequencing large numbers of individuals with the aim of fully characterizing genomes at high resolution. In this article, we will endeavour to present an introduction to long-read technologies showing: what long reads are; how they are distinct from short reads; why long reads are useful and how they are being used. We will highlight the recent developments in this field, and the applications and potential of these technologies in medical research, and clinical diagnostics and therapeutics.
Subject(s)
High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Algorithms , Animals , Genomics/methods , Humans
ABSTRACT
The linear mixed model (LMM) is now routinely used to estimate heritability. Unfortunately, as we demonstrate, LMM estimates of heritability can be inflated when using a standard model. To help reduce this inflation, we used a more general LMM with two random effects: one based on genomic variants and one based on easily measured spatial location as a proxy for environmental effects. We investigated this approach with simulated data and with data from a Uganda cohort of 4,778 individuals for 34 phenotypes including anthropometric indices, blood factors, glycemic control, blood pressure, lipid tests, and liver function tests. For the genomic random effect, we used identity-by-descent estimates from accurately phased genome-wide data. For the environmental random effect, we constructed a covariance matrix based on a Gaussian radial basis function. Across the simulated and Ugandan data, narrow-sense heritability estimates were lower using the more general model. Thus, our approach addresses, in part, the issue of "missing heritability" in the sense that much of the heritability previously thought to be missing was fictional. Software is available at https://github.com/MicrosoftGenomics/FaST-LMM.
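As an illustration of the environmental random effect described above, the following is a minimal sketch of a covariance matrix built from a Gaussian radial basis function over spatial coordinates. It is not the FaST-LMM API and not necessarily the authors' exact parameterization: the function name `gaussian_rbf_covariance`, the use of planar Euclidean distance, and the `length_scale` parameter are all assumptions made for this example.

```python
import math


def gaussian_rbf_covariance(coords, length_scale):
    """Return K with K[i][j] = exp(-d_ij^2 / (2 * length_scale^2)).

    coords: list of (x, y) locations, one per individual (hypothetical input).
    length_scale: assumed kernel width, in the same units as coords.
    """
    if length_scale <= 0:
        raise ValueError("length_scale must be positive")
    n = len(coords)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            # Squared Euclidean distance between individuals i and j.
            sq_dist = dx * dx + dy * dy
            cov[i][j] = math.exp(-sq_dist / (2.0 * length_scale ** 2))
    return cov


# Three individuals: two living close together, one far away.
example = gaussian_rbf_covariance([(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)], 1.0)
```

Under this kernel, individuals at the same location have covariance 1, and covariance decays smoothly toward 0 as the distance between them grows, which is what lets spatial proximity stand in for shared environment.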
Subject(s)
Environment , Linear Models , Models, Genetic , Phenotype , Humans , Inheritance Patterns
ABSTRACT
Background: Human noroviruses (HuNoVs) are a prominent cause of gastroenteritis, yet fundamental questions remain regarding epidemiology, diversity, and immunity in sub-Saharan African children. We investigated HuNoV seroprevalence and genetic and sociodemographic risk factors in Ugandan children. Methods: We randomly screened 797 participants of a longitudinal birth cohort (Entebbe, EMaBS) and 378 from a cross-sectional survey (rural Lake Victoria, LaVIISWA), for antibodies against HuNoV genotypes by ELISA. We used linear regression modeling to test for associations between HuNoV antibody levels and sociodemographic factors, and with the human susceptibility rs601338 FUT2 secretor SNP and histo-blood group antigens (A/B/O). Results: Of EMaBS participants, 76.6% were seropositive by age 1, rising to 94.5% by age 2 years. Seroprevalence in 1-year-olds of the rural LaVIISWA survey was even higher (95%). In the birth cohort, 99% of seropositive 2-year-olds had responses to multiple HuNoV genotypes. We identified associations between secretor status and genogroup GII antibody levels (GII.4 P = 3.1 × 10^-52), as well as ABO and GI (GI.2 P = 2.1 × 10^-12). Conclusions: HuNoVs are highly prevalent in Ugandan children, indicating a substantial burden of diarrhea-associated morbidity with recurrent infections. Public health interventions, including vaccination, and increased surveillance are urgently needed.
Subject(s)
Antibodies, Viral/blood , Caliciviridae Infections/epidemiology , Caliciviridae Infections/virology , Genetic Variation , Genotype , Norovirus/classification , Norovirus/immunology , Blood Group Antigens/analysis , Caliciviridae Infections/genetics , Caliciviridae Infections/immunology , Child, Preschool , Cross-Sectional Studies , Demography , Disease Susceptibility , Enzyme-Linked Immunosorbent Assay , Female , Fucosyltransferases/genetics , Gastroenteritis/epidemiology , Gastroenteritis/genetics , Gastroenteritis/immunology , Gastroenteritis/virology , Humans , Infant , Infant, Newborn , Longitudinal Studies , Male , Norovirus/genetics , Polymorphism, Single Nucleotide , Pregnancy , Risk Factors , Seroepidemiologic Studies , Socioeconomic Factors , Uganda/epidemiology , Galactoside 2-alpha-L-fucosyltransferase
ABSTRACT
The predominantly African origin of all modern human populations is well established, but the route taken out of Africa is still unclear. Two alternative routes, via Egypt and Sinai or across the Bab el Mandeb strait into Arabia, have traditionally been proposed as feasible gateways in light of geographic, paleoclimatic, archaeological, and genetic evidence. Distinguishing among these alternatives has been difficult. We generated 225 whole-genome sequences (225 at 8× depth, of which 8 were increased to 30×; Illumina HiSeq 2000) from six modern Northeast African populations (100 Egyptians and five Ethiopian populations each represented by 25 individuals). West Eurasian components were masked out, and the remaining African haplotypes were compared with a panel of sub-Saharan African and non-African genomes. We showed that masked Northeast African haplotypes overall were more similar to non-African haplotypes and more frequently present outside Africa than were any sets of haplotypes derived from a West African population. Furthermore, the masked Egyptian haplotypes showed these properties more markedly than the masked Ethiopian haplotypes, pointing to Egypt as the more likely gateway in the exodus to the rest of the world. Using five Ethiopian and three Egyptian high-coverage masked genomes and the multiple sequentially Markovian coalescent (MSMC) approach, we estimated the genetic split times of Egyptians and Ethiopians from non-African populations at 55,000 and 65,000 years ago, respectively, whereas that of West Africans was estimated to be 75,000 years ago. Both the haplotype and MSMC analyses thus suggest a predominant northern route out of Africa via Egypt.
Subject(s)
Biological Evolution , Black People/genetics , Genome, Human/genetics , Human Migration/history , Base Sequence , Egypt, Ancient , Ethiopia , Geography , Haplotypes/genetics , High-Throughput Nucleotide Sequencing/methods , History, Ancient , Humans , Markov Chains , Models, Genetic , Molecular Sequence Data , Principal Component Analysis
ABSTRACT
Previous genome-wide association studies (GWAS) of HIV-1-infected populations have been underpowered to detect common variants with moderate impact on disease outcome and have not assessed the phenotypic variance explained by genome-wide additive effects. By combining the majority of available genome-wide genotyping data in HIV-infected populations, we tested for association between ~8 million variants and viral load (HIV RNA copies per milliliter of plasma) in 6,315 individuals of European ancestry. The strongest signal of association was observed in the HLA class I region that was fully explained by independent effects mapping to five variable amino acid positions in the peptide binding grooves of the HLA-B and HLA-A proteins. We observed a second genome-wide significant association signal in the chemokine (C-C motif) receptor (CCR) gene cluster on chromosome 3. Conditional analysis showed that this signal could not be fully attributed to the known protective CCR5Δ32 allele and the risk P1 haplotype, suggesting further causal variants in this region. Heritability analysis demonstrated that common human genetic variation, mostly in the HLA and CCR5 regions, explains 25% of the variability in viral load. This study suggests that analyses in non-European populations and of variant classes not assessed by GWAS should be priorities for the field going forward.
Subject(s)
Genetic Predisposition to Disease , HIV-1/genetics , Host-Pathogen Interactions/genetics , Polymorphism, Single Nucleotide/genetics , Viral Load/genetics , Adult , Alleles , Amino Acids/genetics , Chromosomes, Human, Pair 3/genetics , Genome-Wide Association Study , HLA-B Antigens/genetics , Humans , Inheritance Patterns/genetics , Physical Chromosome Mapping , Receptors, CCR5/genetics
ABSTRACT
Background: Previous genetic association studies of human immunodeficiency virus-1 (HIV-1) progression have focused on common human genetic variation ascertained through genome-wide genotyping. Methods: We sought to systematically assess the full spectrum of functional variation in protein coding gene regions on HIV-1 progression through exome sequencing of 1327 individuals. Genetic variants were tested individually and in aggregate across genes and gene sets for an influence on HIV-1 viral load. Results: Multiple single variants within the major histocompatibility complex (MHC) region were observed to be strongly associated with HIV-1 outcome, consistent with the known impact of classical HLA alleles. However, no single variant or gene located outside of the MHC region was significantly associated with HIV progression. Set-based association testing focusing on genes identified as being essential for HIV replication in genome-wide small interfering RNA (siRNA) and clustered regularly interspaced short palindromic repeats (CRISPR) studies did not reveal any novel associations. Conclusions: These results suggest that exonic variants with large effect sizes are unlikely to have a major contribution to host control of HIV infection.
Subject(s)
Exome Sequencing , HIV Infections/genetics , HIV Infections/virology , HIV-1/genetics , Host-Pathogen Interactions/genetics , Viral Load/genetics , Adult , Female , Genetic Predisposition to Disease , Genetic Variation , Genotype , Humans , Male , Middle Aged , Polymorphism, Single Nucleotide
Subject(s)
COVID-19/prevention & control , Guidelines as Topic , Schools , Humans , United Kingdom/epidemiology
Subject(s)
COVID-19 Vaccines , COVID-19 , Disease Transmission, Infectious , Health Policy , Patient Acceptance of Health Care , Public Health , Child , Humans , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19 Vaccines/administration & dosage , Disease Transmission, Infectious/prevention & control , Pandemics/prevention & control , SARS-CoV-2 , United Kingdom/epidemiology
ABSTRACT
Many existing cohorts contain a range of relatedness between genotyped individuals, either by design or by chance. Haplotype estimation in such cohorts is a central step in many downstream analyses. Using genotypes from six cohorts from isolated populations and two cohorts from non-isolated populations, we have investigated the performance of different phasing methods designed for nominally 'unrelated' individuals. We find that SHAPEIT2 produces much lower switch error rates in all cohorts compared to other methods, including those designed specifically for isolated populations. In particular, when large amounts of IBD sharing are present, SHAPEIT2 infers close to perfect haplotypes. Based on these results, we have developed a general strategy for phasing cohorts with any level of implicit or explicit relatedness between individuals. First, SHAPEIT2 is run ignoring all explicit family information. We then apply a novel HMM method (duoHMM) to combine the SHAPEIT2 haplotypes with any family information to infer the inheritance pattern of each meiosis at all sites across each chromosome. This allows the correction of switch errors, detection of recombination events and genotyping errors. We show that the method detects numbers of recombination events that align very well with expectations based on genetic maps, and that it infers far fewer spurious recombination events than Merlin. The method can also detect genotyping errors and infer recombination events in otherwise uninformative families, such as trios and duos. The detected recombination events can be used in association scans for recombination phenotypes. The method provides a simple and unified approach to haplotype estimation that will be of interest to researchers in the fields of human, animal and plant genetics.
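The switch error rate used above to compare phasing methods can be sketched as follows. The abstract does not spell out the calculation, so this assumes the commonly used definition: the fraction of consecutive heterozygous-site pairs at which the inferred phase flips relative to the true phase. The function name and input encoding are hypothetical, and this is not code from SHAPEIT2 or duoHMM.

```python
def switch_error_rate(true_hap, inferred_hap):
    """Fraction of adjacent heterozygous sites where the phase orientation flips.

    Both arguments list the allele (0 or 1) carried on the first haplotype
    at each heterozygous site, in genomic order (assumed encoding).
    """
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same sites")
    if len(true_hap) < 2:
        return 0.0  # No adjacent pair of sites, so no switch is possible.
    # True where the inferred first haplotype agrees with the truth.
    same = [t == i for t, i in zip(true_hap, inferred_hap)]
    # A switch occurs whenever agreement changes between neighbouring sites.
    switches = sum(1 for a, b in zip(same, same[1:]) if a != b)
    return switches / (len(same) - 1)


# One switch between the second and third sites, out of four adjacent pairs.
rate = switch_error_rate([0, 1, 0, 1, 1], [0, 1, 1, 0, 0])
```

Note that an inferred haplotype that is the complete mirror image of the truth scores zero switches under this definition, since the labelling of the two haplotypes is arbitrary.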
Subject(s)
Haplotypes/genetics , Chromosome Mapping/methods , Cohort Effect , Family , Genotype , Humans , Models, Genetic , Pedigree , Phenotype , Recombination, Genetic/genetics
Subject(s)
Communicable Disease Control/methods , Coronavirus Infections/prevention & control , Coronavirus Infections/transmission , Pandemics/prevention & control , Pneumonia, Viral/prevention & control , Pneumonia, Viral/transmission , Betacoronavirus , COVID-19 , Communicable Disease Control/standards , Consensus , Europe , Evidence-Based Medicine , Humans , Immunity, Herd , SARS-CoV-2
ABSTRACT
Vast quantities of open-source data from news reports, social media and other sources can be harnessed using artificial intelligence and machine learning, and utilised to generate valid early warning signals of emerging epidemics. Early warning signals from open-source data are not a replacement for traditional, validated disease surveillance, but provide a trigger for earlier investigation and diagnostics. This may yield earlier pathogen characterisation and genomic data, which can enable earlier vaccine development or deployment of vaccines. Early warning also provides a more feasible prospect of stamping out epidemics before they spread. Several such systems currently exist, but they are not widely used in public health practice, and only some are publicly available. Routine and widespread use of open-source intelligence, as well as training and capacity building in digital surveillance, will improve pandemic preparedness and early response capability.