ABSTRACT
Esophageal squamous cell carcinoma (ESCC) has a high disease burden in sub-Saharan Africa and a very poor prognosis. Genome-wide association studies (GWASs) of ESCC in predominantly East Asian populations indicate a substantial genetic contribution to its etiology, but no genome-wide studies have been done in populations of African ancestry. Here, we report a GWAS in 1,686 African individuals with ESCC and 3,217 population-matched control individuals to investigate its genetic etiology. We identified a genome-wide-significant risk locus on chromosome 9 upstream of FAM120A (rs12379660, p = 4.58 × 10⁻⁸, odds ratio = 1.28, 95% confidence interval = 1.22-1.34), as well as a potential African-specific risk locus on chromosome 2 (rs142741123, p = 5.49 × 10⁻⁸) within MYO1B. FAM120A is a component of oxidative stress-induced survival signals, and the associated variants at the FAM120A locus co-localized with highly significant cis-eQTLs in FAM120AOS in both esophageal mucosa and esophageal muscularis tissue. A trans-ethnic meta-analysis was then performed with the African ESCC study and a Chinese ESCC study in a combined total of 3,699 ESCC-affected individuals and 5,918 control individuals, which identified three genome-wide-significant loci on chromosome 9 at FAM120A (rs12379660, p_meta = 9.36 × 10⁻¹⁰), chromosome 10 at PLCE1 (rs7099485, p_meta = 1.48 × 10⁻⁸), and chromosome 22 at CHEK2 (rs1033667, p_meta = 1.47 × 10⁻⁹). This indicates the existence of both shared and distinct genetic risk loci for ESCC in African and Asian populations. Our GWAS of ESCC conducted in a population of African ancestry indicates a substantial genetic contribution to ESCC risk in Africa.
Subject(s)
Squamous Cell Carcinoma, Esophageal Neoplasms, Esophageal Squamous Cell Carcinoma, Humans, Squamous Cell Carcinoma/genetics, Squamous Cell Carcinoma/pathology, Case-Control Studies, East Asian People, Esophageal Neoplasms/genetics, Esophageal Neoplasms/epidemiology, Esophageal Neoplasms/pathology, Esophageal Squamous Cell Carcinoma/genetics, Genetic Predisposition to Disease, Genome-Wide Association Study, Single Nucleotide Polymorphism/genetics, African People
ABSTRACT
The African continent is regarded as the cradle of modern humans and African genomes contain more genetic variation than those from any other continent, yet only a fraction of the genetic diversity among African individuals has been surveyed [1]. Here we performed whole-genome sequencing analyses of 426 individuals (comprising 50 ethnolinguistic groups, including previously unsampled populations) to explore the breadth of genomic diversity across Africa. We uncovered more than 3 million previously undescribed variants, most of which were found among individuals from newly sampled ethnolinguistic groups, as well as 62 previously unreported loci that are under strong selection, which were predominantly found in genes that are involved in viral immunity, DNA repair and metabolism. We observed complex patterns of ancestral admixture and putative-damaging and novel variation, both within and between populations, alongside evidence that Zambia was a likely intermediate site along the routes of expansion of Bantu-speaking populations. Pathogenic variants in genes that are currently characterized as medically relevant were uncommon, but in other genes, variants denoted as 'likely pathogenic' in the ClinVar database were commonly observed. Collectively, these findings refine our current understanding of continental migration, identify gene flow and the response to human disease as strong drivers of genome-level population variation, and underscore the scientific imperative for a broader characterization of the genomic diversity of African individuals to understand human ancestry and improve health.
Subject(s)
Genetic Variation, Human Genome/genetics, Genomics, Health, Human Migration, Africa/ethnology, DNA Repair/genetics, Datasets as Topic, Female, Gene Flow, Medical Genetics, Population Genetics, Health/history, Ancient History, Human Migration/history, Humans, Immunity/genetics, Language, Male, Metabolism/genetics, Genetic Selection, Whole Genome Sequencing
ABSTRACT
The complex pathogenesis of rheumatoid arthritis (RA) is not fully understood, with few studies exploring the genomic contribution to RA in patients from Africa. We report a genome-wide association study (GWAS) of South-Eastern Bantu-Speaking South Africans (SEBSSAs) with seropositive RA (n = 531) and population controls (n = 2653). Association testing was performed using PLINK (logistic regression assuming an additive model) with sex, age, smoking and the first three principal components as covariates. The strong association with the Human Leukocyte Antigen (HLA) region, indexed by rs602457 (near HLA-DRB1), was replicated. An additional independent signal in the HLA region represented by the lead SNP rs2523593 (near the HLA-B gene; conditional P-value = 6.4 × 10⁻¹⁰) was detected. Although none of the non-HLA signals reached genome-wide significance (P < 5 × 10⁻⁸), 17 genomic regions showed suggestive association (P < 5 × 10⁻⁶). The GWAS replicated two known non-HLA associations with MMEL1 (rs2843401) and ANKRD55 (rs7731626) at a threshold of P < 5 × 10⁻³, providing, for the first time, evidence for replication of non-HLA signals for RA in sub-Saharan African populations. Meta-analysis with summary statistics from an African-American cohort (CLEAR study) replicated three additional non-HLA signals (rs11571302, rs2558210 and rs2422345 around KRT18P39-NPM1P33, CTLA4-ICOS and AL645568.1, respectively). Analysis based on genomic regions (200 kb windows) further replicated previously reported non-HLA signals around PADI4, CD28 and LIMK1. Although allele frequencies were overall strongly correlated between the SEBSSA and the CLEAR cohort, we observed some differences in effect size estimates for associated loci. The study highlights the need for conducting larger association studies across diverse African populations to inform precision medicine-based approaches for RA in Africa.
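The significance tiers quoted in this abstract (genome-wide at P < 5 × 10⁻⁸, suggestive at P < 5 × 10⁻⁶) can be illustrated with a minimal Python sketch. The function name `classify_p_value` and the two "hypothetical" SNP entries are assumptions made for illustration; only the thresholds and the conditional P-value for rs2523593 come from the abstract.

```python
# Thresholds as quoted in the abstract.
GENOME_WIDE = 5e-8
SUGGESTIVE = 5e-6


def classify_p_value(p):
    """Return the significance tier for a single association P-value."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P-value must lie in [0, 1]")
    if p < GENOME_WIDE:
        return "genome-wide significant"
    if p < SUGGESTIVE:
        return "suggestive"
    return "not significant"


# rs2523593 uses the conditional P-value reported in the abstract;
# the two hypothetical entries are invented values for illustration only.
examples = {
    "rs2523593": 6.4e-10,
    "hypothetical_snp_a": 2.0e-7,
    "hypothetical_snp_b": 0.03,
}
tiers = {snp: classify_p_value(p) for snp, p in examples.items()}
```

The comparisons are strict, so a P-value exactly equal to a threshold falls into the weaker tier, matching the "P <" wording used in the abstract.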
Subject(s)
Rheumatoid Arthritis, Genome-Wide Association Study, HLA Antigens, Humans, Rheumatoid Arthritis/genetics, Genetic Predisposition to Disease, HLA Antigens/genetics, HLA-DRB1 Chains/genetics, Lim Kinases/genetics, Single Nucleotide Polymorphism, South Africa
ABSTRACT
The presence of Early and Middle Stone Age human remains and associated archeological artifacts at various sites scattered across southern Africa suggests that this geographic region was one of the first abodes of anatomically modern humans. Although the presence of hunter-gatherer cultures in this region dates back to deep time, the peopling of southern Africa has largely been reshaped by three major sets of migrations over the last 2000 years. These migrations have led to a confluence of four distinct ancestries (San hunter-gatherer, East-African pastoralist, Bantu-speaker farmer and Eurasian) in populations from this region. In this review, we summarize recent insights into the refinement of timelines and routes of the migration of Bantu-speaking populations to southern Africa and their admixture with resident southern African Khoe-San populations. We highlight two recent studies providing evidence for the emergence of fine-scale population structure within some South-Eastern Bantu-speaker groups. We also draw attention to whole genome sequencing studies (current and ancient) that have both enhanced our understanding of the peopling of southern Africa and demonstrated a huge potential for novel variant discovery in populations from this region. Finally, we identify some of the major gaps and inconsistencies in our understanding and emphasize the importance of more systematic studies of southern African populations from diverse ethnolinguistic groups and geographic locations.
Subject(s)
Black People/genetics, Ancient DNA/analysis, Human Migration/history, Southern Africa/ethnology, Population Genetics, Haplotypes, Ancient History, Humans, Language, Whole Genome Sequencing
ABSTRACT
A single nucleotide polymorphism (SNP), 251 bases upstream from the IL-8 transcription start (-251A>T, rs4073), has been extensively investigated in cancers and inflammatory and infectious diseases in predominantly European and Asian populations. We sequenced the IL-8 gene of 109 black and 32 white South African (SA) individuals and conducted detailed characterization of gene variation and haplotype structure. IL-8 production in phytohaemagglutinin (PHA)-stimulated peripheral blood mononuclear cells (PBMCs) of a subset (black: N = 22; white: N = 32) of these individuals was measured using ELISA. Select variants were genotyped for additional black individuals (N = 141), and data from the 1000 Genomes Project were used for haplotype analysis and comparative purposes. In white individuals, the -251A>T SNP formed part of a prevalent six-variant haplotype [haplotype frequency (HF): 61%], Hap-1C, involving the following variants: -251A>T; +394T>G (rs2227307); +780C>T (rs2227306); +1240->A (rs2227541); +1635C>T (rs2227543) and +2770A>T (rs2227543). Hap-1C (-251T+394T+780C+1240+A+1635C+2770A) was composed of two three-variant sub-haplotypes [Hap-1Ca: -251T+394T+1240+A; Hap-1Cb: +780C+1635C+2770A] sharing similarities with haplotypes identified in the black population. Hap-1C was found to be present in European, East Asian and South Asian populations. Four haplotypes were identified in the black population, with the two prevalent haplotypes each comprising two variants: Hap-1B [-251A>T and +1240->A; -251T+1240+A; HF: 14%] and Hap-2B [-743T>C (rs2227532) and +2452A>C (rs2227545); -743C+2452C; HF: 13%]. Populations did not differ in unstimulated PBMC IL-8 production. Upon PHA stimulation, PBMCs from white individuals produced more IL-8 (P = 0.04), suggesting that the -251T allele is responsible for higher production; however, further analysis revealed that Hap-1C (and its constituent sub-haplotypes) did not associate with IL-8 production.
Populations did, however, differ in monocyte number, with the white population having significantly more monocytes than the black population (P = 0.025); furthermore, monocyte number strongly correlated with IL-8 production in both population groups (black: P = 0.0002, r = 0.71; white: P = 0.0005, r = 0.59). Hap-1B, Hap-2B, and a SNP located one base pair upstream of the IL-8 ATG start codon, +100C>T (rs2227538), were all associated with higher IL-8 production in the black population: individuals harbouring at least one of these haplotypes/variants had higher IL-8 production (P = 0.003) than individuals without. The black population was enriched for individuals harbouring Hap-1B and/or Hap-2B compared to the 1000 Genomes Project sub-Saharan African population (P = 0.006), suggesting that SA black individuals may be high IL-8 producers. Given the paucity of IL-8-related studies that have been conducted in populations from sub-Saharan Africa, this study has significantly increased our understanding of this important chemokine in the South African population.
Subject(s)
Ethnicity/genetics, Genetic Variation, Population Genetics, Haplotypes/genetics, Interleukin-8/genetics, Adult, Africa South of the Sahara, Alleles, Black People/genetics, Female, Gene Frequency, Humans, Interleukin-8/blood, Mononuclear Leukocytes/metabolism, Linkage Disequilibrium/genetics, Male, Middle Aged, Monocytes/drug effects, Monocytes/metabolism, Phytohemagglutinins/pharmacology, South Africa, White People/genetics, Young Adult
ABSTRACT
Genetic variation and susceptibility to disease are shaped by human demographic history and adaptation. We can now study the genomes of extant Africans and uncover traces of population migration, admixture, assimilation and selection by applying sophisticated computational algorithms. There are four major ethnolinguistic divisions among present day Africans: Hunter-gatherer populations in southern and central Africa; Nilo-Saharan speakers from north and northeast Africa; Afro-Asiatic speakers from north and east Africa; and Niger-Congo speakers who are the predominant ethnolinguistic group spread across most of sub-Saharan Africa. The enormous ethnolinguistic diversity in sub-Saharan African populations is largely paralleled by extensive genetic diversity and until a decade ago, little was known about detailed origins and divergence of these groups. Results from large-scale population genetic studies, and more recently whole genome sequence data, are unravelling the critical role of events like migration and admixture and environmental factors including diet, infectious diseases and climatic conditions in shaping current population diversity. It is now possible to start providing quantitative estimates of divergence times, population size and dynamic processes that have affected populations and their genetic risk for disease. Finally, the availability of ancient genomes from Africa provides historical insights of unprecedented depth. In this review, we highlight some key interpretations that have emerged from recent African genome studies.
Subject(s)
Biological Adaptation/genetics, Black People/genetics, Africa/ethnology, Biological Evolution, Ethnicity/genetics, Molecular Evolution, Genetic Techniques, Genetic Variation/genetics, Genetics, Population Genetics/methods, Genomics/methods, Haplotypes/genetics, Humans, Single Nucleotide Polymorphism/genetics, Whole Genome Sequencing/methods
ABSTRACT
The H3ABioNet pan-African bioinformatics network, which is funded to support the Human Heredity and Health in Africa (H3Africa) program, has developed node-assessment exercises to gauge the ability of its participating research and service groups to analyze typical genome-wide datasets being generated by H3Africa research groups. We describe a framework for the assessment of computational genomics analysis skills, which includes standard operating procedures, training and test datasets, and a process for administering the exercise. We present the experiences of 3 research groups that have taken the exercise and the impact on their ability to manage complex projects. Finally, we discuss the reasons why many H3ABioNet nodes have declined so far to participate and potential strategies to encourage them to do so.
Subject(s)
Black People/genetics, Genetic Databases, Genomics/methods, Database Management Systems, Developing Countries, Humans, Nigeria, South Africa
ABSTRACT
Socio-economic status of participants in many public health, epidemiological, and genome-wide association studies is an important trait of interest. It is often used in these studies as a measure of direct interest or as a covariate. The Africa Wits INDEPTH Partnership for Genomic and Environmental Research (AWI-Gen) explores genomic and environmental factors in non-communicable diseases, particularly cardio-metabolic disease. In Phase I of AWI-Gen, approximately 12,000 participants were recruited at six sites in four African countries. Participants were asked questions about asset ownership. This technical note describes how AWI-Gen computed socio-economic status from the asset register.
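The abstract does not state how the asset register is turned into a socio-economic status measure. As a rough illustration only, the Python sketch below assumes a simple count-based index (the fraction of listed assets owned) followed by rank-based quintiles. This is not necessarily the AWI-Gen method; the function names, asset names and participant IDs are all hypothetical.

```python
def asset_score(assets):
    """Fraction of listed assets that the household owns (0.0 to 1.0)."""
    if not assets:
        raise ValueError("asset register is empty")
    return sum(1 for owned in assets.values() if owned) / len(assets)


def quintiles(scores):
    """Assign each participant a quintile from 1 (lowest) to 5 (highest).

    Participants are ranked by score; ties keep their input order.
    """
    ranked = sorted(scores, key=lambda pid: scores[pid])
    n = len(ranked)
    return {pid: (rank * 5) // n + 1 for rank, pid in enumerate(ranked)}


# Hypothetical asset registers for five participants.
registers = {
    "P1": {"radio": True, "bicycle": False, "fridge": False, "phone": False},
    "P2": {"radio": True, "bicycle": True, "fridge": False, "phone": False},
    "P3": {"radio": True, "bicycle": True, "fridge": True, "phone": False},
    "P4": {"radio": True, "bicycle": True, "fridge": True, "phone": True},
    "P5": {"radio": False, "bicycle": False, "fridge": False, "phone": False},
}
scores = {pid: asset_score(assets) for pid, assets in registers.items()}
ses_quintile = quintiles(scores)
```

A count-based index treats every asset as equally informative; weighting assets differently is a common alternative, but the weights would have to come from the data and are outside the scope of this sketch.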
ABSTRACT
We hypothesized that subjects with heterozygous loss-of-function (LoF) ACE mutations are at risk for Alzheimer's disease (AD) because amyloid Aβ42, a primary component of the protein aggregates that accumulate in the brains of AD patients, is cleaved by ACE (angiotensin I-converting enzyme). Thus, decreased ACE activity in the brain, either due to genetic mutation or the effects of ACE inhibitors, could be a risk factor for AD. To explore this hypothesis in the current study, existing SNP databases were analyzed for LoF ACE mutations using four prediction tools, including PolyPhen-2, and compared with the topology of known ACE mutations already associated with AD. The combined frequency of >400 of these LoF-damaging ACE mutations in the general population is substantial (up to 5%) and comparable to the frequency of AD in the population older than 70 years, which indicates that the contribution of low ACE to the development of AD could be underappreciated. Our analysis suggests several mechanisms by which ACE mutations may be associated with Alzheimer's disease. Systematic analysis of blood ACE levels in patients with all ACE mutations is likely to have clinical significance because available sequencing data will help detect persons with increased risk of late-onset Alzheimer's disease. Patients with transport-deficient ACE mutations (about 20% of damaging ACE mutations) may benefit from preventive or therapeutic treatment with a combination of chemical and pharmacological (e.g., centrally acting ACE inhibitors) chaperones and proteasome inhibitors to restore impaired surface ACE expression, as was shown previously by our group for another transport-deficient ACE mutation, Q1069R.
ABSTRACT
Based on evaluations of imputation performed on a genotype dataset consisting of about 11,000 sub-Saharan African (SSA) participants, we show Trans-Omics for Precision Medicine (TOPMed) and the African Genome Resource (AGR) to be currently the best panels for imputing SSA datasets. We report notable differences in the number of single-nucleotide polymorphisms (SNPs) that are imputed by different panels in datasets from East, West, and South Africa. Comparisons with a subset of 95 SSA high-coverage whole-genome sequences (WGSs) show that despite being about 20-fold smaller, the AGR imputed dataset has higher concordance with the WGSs. Moreover, the level of concordance between imputed and WGS datasets was strongly influenced by the extent of Khoe-San ancestry in a genome, highlighting the need for integration of not only geographically but also ancestrally diverse WGS data in reference panels for further improvement in imputation of SSA datasets. Approaches that integrate imputed data from different panels could also lead to better imputation.
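The comparison between imputed and whole-genome-sequenced genotypes described above can be sketched as a per-individual concordance calculation. The abstract does not define the exact concordance metric used, so this Python example assumes the simplest one: the fraction of shared sites at which the imputed genotype equals the sequenced genotype. Genotypes are coded as alternate-allele counts (0, 1 or 2), and the function name, site names and values are hypothetical.

```python
def genotype_concordance(imputed, sequenced):
    """Fraction of shared sites where imputed and sequenced genotypes agree.

    Both arguments map a site identifier to an alternate-allele count (0, 1, 2).
    Sites present in only one of the two datasets are ignored.
    """
    shared = [site for site in imputed if site in sequenced]
    if not shared:
        raise ValueError("no shared sites to compare")
    matches = sum(1 for site in shared if imputed[site] == sequenced[site])
    return matches / len(shared)


# Hypothetical genotypes for one individual.
imputed = {"site1": 0, "site2": 1, "site3": 2, "site4": 1}
sequenced = {"site1": 0, "site2": 1, "site3": 1, "site4": 1, "site5": 2}
concordance = genotype_concordance(imputed, sequenced)
```

Computing this value per individual and relating it to an ancestry proportion would be one way to examine the kind of dependence on Khoe-San ancestry that the abstract reports.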
ABSTRACT
BACKGROUND: The three-dimensional structure of a protein can be described as a graph in which nodes represent residues and edges represent the strength of the non-covalent interactions between them. These protein contact networks can be separated into long- and short-range interaction networks depending on the positions of amino acids in the primary structure. Long-range interactions play a distinct role in determining the tertiary structure of a protein, while short-range interactions could largely contribute to secondary structure formation. In addition, the physicochemical properties and the linear arrangement of amino acids in the primary structure of a protein determine its three-dimensional structure. Here, we present an extensive analysis of protein contact subnetworks based on the London van der Waals interactions of amino acids at different length scales. We further subdivided those networks into hydrophobic, hydrophilic and charged residue networks and have tried to correlate their influence on the overall topology and organization of a protein. RESULTS: The largest connected component (LCC) of long (LRN)-, short (SRN)- and all-range (ARN) networks within proteins exhibits a transition behaviour when plotted against different interaction strengths of edges among amino acid nodes. While short-range networks, which have chain-like structures, exhibit a highly cooperative transition, long- and all-range networks, which are more similar to each other, have non-chain-like structures and show less cooperativity. Further, the hydrophobic residue subnetworks in long- and all-range networks have transition behaviours similar to those of the all-residue all-range networks, but the hydrophilic and charged residue networks do not. While the nature of the transitions of LCC sizes is the same in SRNs for thermophiles and mesophiles, there exists a clear difference in LRNs. 
The presence of larger interconnected clusters of long-range interactions in thermophiles than in mesophiles, even at higher interaction strengths between amino acids, gives extra stability to the tertiary structure of the thermophiles. All the subnetworks at different length scales (ARNs, LRNs and SRNs) show assortative mixing of their participating amino acids. While there exists a significantly higher percentage of hydrophobic subclusters than of others in ARNs and LRNs, we do not find assortative mixing behaviour in any of the subclusters in SRNs. The clustering coefficient of hydrophobic subclusters in the long-range network is the highest among the types of subnetworks. There exist highly cliquish hydrophobic nodes followed by charged nodes in LRNs and ARNs; on the other hand, we observe the highest dominance of charged residue cliques in short-range networks. Studies on the perimeter of the cliques also show higher occurrences of hydrophobic and charged residue cliques. CONCLUSIONS: The simple framework of protein contact networks and their subnetworks based on the London van der Waals force is able to capture several known properties of protein structure and can also reveal several new features. Thermophiles not only have a higher number of long-range interactions; they also have larger clusters of connected residues at higher interaction strengths among amino acids than their mesophilic counterparts. The framework can reestablish the significant role of long-range hydrophobic clusters in protein folding and stabilization; at the same time, it sheds light on the higher communication ability of hydrophobic subnetworks over the others. The results give an indication of the controlling role of hydrophobic subclusters in determining a protein's folding rate. 
The occurrences of higher perimeters of hydrophobic and charged cliques imply the role of charged residues as well as hydrophobic residues in stabilizing the distant part of primary structure of a protein through London van der Waals interaction.
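The central quantity in this abstract, the size of the largest connected component (LCC) as a function of the interaction-strength cutoff, can be illustrated with a small Python sketch. Residues are nodes, and each edge carries an interaction strength. The sequence-separation bounds that split the network into short-range and long-range subnetworks are not given in the abstract, so the values used here are assumptions, as are the function name, the toy edge list and the strength values.

```python
from collections import defaultdict


def largest_component_size(n_residues, edges, min_strength,
                           min_separation=1, max_separation=None):
    """Size of the largest connected component after filtering edges.

    edges: iterable of (i, j, strength) with residue indices 0..n_residues-1.
    An edge is kept if strength >= min_strength and its sequence separation
    |i - j| lies within [min_separation, max_separation].
    """
    adjacency = defaultdict(set)
    for i, j, strength in edges:
        separation = abs(i - j)
        if strength < min_strength or separation < min_separation:
            continue
        if max_separation is not None and separation > max_separation:
            continue
        adjacency[i].add(j)
        adjacency[j].add(i)

    seen = set()
    best = 0
    for start in range(n_residues):
        if start in seen:
            continue
        # Depth-first traversal of the component containing `start`.
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best


# Toy six-residue network; strengths are arbitrary illustrative numbers.
toy_edges = [(0, 1, 5.0), (1, 2, 3.0), (0, 5, 4.0), (2, 4, 1.0), (3, 4, 6.0)]

# All-range network: the LCC shrinks as the strength cutoff rises.
lcc_by_cutoff = [largest_component_size(6, toy_edges, c) for c in (0.0, 2.0, 4.5)]

# Assumed toy bounds: long-range means separation >= 3, short-range <= 2.
lcc_long = largest_component_size(6, toy_edges, 0.0, min_separation=3)
lcc_short = largest_component_size(6, toy_edges, 0.0, max_separation=2)
```

Plotting the LCC size against a fine grid of cutoffs for a real contact network would give the kind of transition curve the abstract describes; the toy network here only shows the mechanics of the calculation.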
Subject(s)
Hydrophobic and Hydrophilic Interactions, Protein Conformation, Proteins/chemistry, Amino Acids/chemistry, Cluster Analysis, Protein Folding
ABSTRACT
Genetic associations for lipid traits have identified hundreds of variants with clear differences across European, Asian and African studies. Based on a sub-Saharan-African GWAS for lipid traits in the population cross-sectional AWI-Gen cohort (N = 10,603) we report a novel LDL-C association in the GATB region (P-value = 1.56 × 10⁻⁸). Meta-analysis with four other African cohorts (N = 23,718) provides supporting evidence for the LDL-C association with the GATB/FHIP1A region and identifies a novel triglyceride association signal close to the FHIT gene (P-value = 2.66 × 10⁻⁸). Our data enable fine-mapping of several well-known lipid-trait loci including LDLR, PMFBP1 and LPA. The transferability of signals detected in two large global studies (GLGC and PAGE) consistently improves with an increase in the size of the African replication cohort. Polygenic risk score analysis shows increased predictive accuracy for LDL-C levels with the narrowing of genetic distance between the discovery dataset and our cohort. Novel discovery is enhanced with the inclusion of African data.
Subject(s)
Genome-Wide Association Study, Africa South of the Sahara, LDL Cholesterol/genetics, Cross-Sectional Studies, Humans
ABSTRACT
Aim: Despite the high disease burden of human immunodeficiency virus (HIV) infection and colorectal cancer (CRC) in South Africa (SA), treatment-relevant pharmacogenetic variants are understudied. Materials & methods: Using publicly available genotype and gene expression data, a bioinformatic pipeline was developed to identify liver expression quantitative trait loci (eQTLs). Results: A novel cis-eQTL, rs28967009, was identified for UGT1A1, which is predicted to upregulate UGT1A1 expression thereby potentially affecting the metabolism of dolutegravir and irinotecan, which are extensively prescribed in SA for HIV and colorectal cancer treatment, respectively. Conclusion: As increased UGT1A1 expression could affect the clinical outcome of dolutegravir and irinotecan treatment by increasing drug clearance, patients with the rs28967009A variant may require increased drug doses to reach therapeutic levels or should be prescribed alternative drugs.
Subject(s)
Anti-HIV Agents/pharmacokinetics, Anti-HIV Agents/therapeutic use, Antineoplastic Agents/pharmacokinetics, Antineoplastic Agents/therapeutic use, Colorectal Neoplasms/drug therapy, Colorectal Neoplasms/genetics, Glucuronosyltransferase/genetics, HIV Infections/drug therapy, HIV Infections/genetics, Phytogenic Antineoplastic Agents, Computational Biology, Genotype, 3-Ring Heterocyclic Compounds/pharmacokinetics, 3-Ring Heterocyclic Compounds/therapeutic use, Humans, Irinotecan/pharmacokinetics, Irinotecan/therapeutic use, Liver/enzymology, Oxazines/pharmacokinetics, Oxazines/therapeutic use, Piperazines/pharmacokinetics, Piperazines/therapeutic use, Pyridones/pharmacokinetics, Pyridones/therapeutic use, Quality Control, South Africa, Treatment Outcome, Up-Regulation
ABSTRACT
South Eastern Bantu-speaking (SEB) groups constitute more than 80% of the population in South Africa. Despite clear linguistic and geographic diversity, the genetic differences between these groups have not been systematically investigated. Based on genome-wide data of over 5000 individuals, representing eight major SEB groups, we provide strong evidence for fine-scale population structure that broadly aligns with geographic distribution and is also congruent with linguistic phylogeny (separation of Nguni, Sotho-Tswana and Tsonga speakers). Although differential Khoe-San admixture plays a key role, the structure persists after Khoe-San ancestry-masking. The timing of admixture, levels of sex-biased gene flow and population size dynamics also highlight differences in the demographic histories of individual groups. The comparisons with five Iron Age farmer genomes further support genetic continuity over ~400 years in certain regions of the country. Simulated trait genome-wide association studies further show that the observed population structure could have major implications for biomedical genomics research in South Africa.
Subject(s)
Black People/genetics, Demography, Gene Flow, Genome-Wide Association Study, Language, Human Y Chromosomes/genetics, Ethnicity, Female, Gene Frequency, Genetic Variation, Population Genetics, Genomics, Geography, Haplotypes, Humans, Linguistics, Male, Phylogeny, South Africa
ABSTRACT
Genetic characterisation of a non-coding region allelic variant, HLA-DPB1*34:01:01:03, in Black South African individuals.
Subject(s)
Black People/genetics, HLA-DP beta-Chains/genetics, INDEL Mutation, Single Nucleotide Polymorphism, Alleles, Black People/statistics & numerical data, Case-Control Studies, Gene Frequency, Genetic Association Studies, Genetic Carrier Screening, Population Genetics/methods, HIV Seropositivity/blood, HIV Seropositivity/ethnology, HIV Seropositivity/genetics, HIV-1, High-Throughput Nucleotide Sequencing, Histocompatibility Testing/methods, Humans, South Africa/epidemiology, Whole Genome Sequencing
ABSTRACT
Genetic characterisation of a novel intra-locus recombinant allele, HLA-DPB1*835:01:01:02, in Black South African individuals.
Subject(s)
Black People/genetics, HLA-DP beta-Chains/genetics, INDEL Mutation, Single Nucleotide Polymorphism, Genetic Recombination/physiology, 5' Untranslated Regions/genetics, Alleles, Black People/statistics & numerical data, Gene Frequency, Genetic Association Studies/methods, HIV Seropositivity/ethnology, HIV Seropositivity/genetics, HIV-1, Haplotypes, High-Throughput Nucleotide Sequencing/methods, Histocompatibility Testing/methods, Humans, Introns/genetics, Genetic Polymorphism, DNA Sequence Analysis, South Africa/epidemiology, Whole Genome Sequencing/methods
ABSTRACT
INTRODUCTION: Atherosclerosis is a key contributor to the burden of cardiovascular diseases (CVDs) and many epidemiological studies have reported on the effect of smoking on carotid intima-media thickness (cIMT) and its subsequent effect on CVD risk. Gene-environment interaction studies have contributed towards understanding some of the missing heritability of genome-wide association studies. Gene-smoking interactions on cIMT have been studied in non-African populations (European, Latino-American, and African American) but no comparable African research has been reported. Our aim was to investigate smoking-SNP interactions on cIMT in two West African populations by genome-wide analysis. MATERIALS AND METHODS: Only male participants from Burkina Faso (Nanoro = 993) and Ghana (Navrongo = 783) were included, as smoking was extremely rare among women. Phenotype and genotype data underwent stringent QC and genotype imputation was performed using the Sanger African Imputation Panel. Smoking prevalence among men was 13.3% in Nanoro and 42.5% in Navrongo. We analyzed gene-smoking interactions with PLINK after adjusting for covariates: age and 6 PCs (Model 1); age, BMI, blood pressure, fasting glucose, cholesterol levels, MVPA, and 6 PCs (Model 2). All analyses were performed at site level and for the combined data set. RESULTS: In Nanoro, we identified new gene-smoking interaction variants for cIMT within the previously described RCBTB1 region (rs112017404, rs144170770, and rs4941649) (Model 1: p = 1.35E-07; Model 2: p = 3.08E-08). In the combined sample, two novel intergenic interacting variants were identified, rs1192824 in the regulatory region of TBC1D8 (p = 5.90E-09) and rs77461169 (p = 4.48E-06) located in an upstream region of open chromatin. 
In silico functional analysis suggests the involvement of genes implicated in biological processes related to cell or biological adhesion and regulatory processes in gene-smoking interactions with cIMT (as evidenced by chromatin interactions and eQTLs). DISCUSSION: This is the first gene-smoking interaction study for cIMT, as a risk factor for atherosclerosis, in sub-Saharan African populations. In addition to replicating previously known signals for RCBTB1, we identified two novel genomic regions (TBC1D8, near BCHE) involved in this gene-environment interaction.
ABSTRACT
The Southern African Human Genome Programme is a national initiative that aspires to unlock the unique genetic character of southern African populations for a better understanding of human genetic diversity. In this pilot study the Southern African Human Genome Programme characterizes the genomes of 24 individuals (8 Coloured and 16 black southeastern Bantu-speakers) using deep whole-genome sequencing. A total of ~16 million unique variants are identified. Despite the shallow time depth since divergence between the two main southeastern Bantu-speaking groups (Nguni and Sotho-Tswana), principal component analysis and structure analysis reveal significant (p < 10⁻⁶) differentiation, and FST analysis identifies regions with high divergence. The Coloured individuals show evidence of varying proportions of admixture with Khoesan, Bantu-speakers, Europeans, and populations from the Indian sub-continent. Whole-genome sequencing data reveal extensive genomic diversity, increasing our understanding of the complex and region-specific history of African populations and highlighting its potential impact on biomedical research and genetic susceptibility to disease.