1.
Database (Oxford) ; 2024, 2024 Jan 10.
Article En | MEDLINE | ID: mdl-38204360

There is growing evidence that comprehensive and harmonized metadata are fundamental for effective public data reusability. However, it is often challenging to extract accurate metadata from public repositories. Of particular concern are the metagenomic data related to African individuals, which often omit important information about the particular features of these populations. As part of a collaborative consortium, H3ABioNet, we created a web portal, the African Human Microbiome Portal (AHMP), exclusively dedicated to metadata related to African human microbiome samples. Metadata were collected from various public repositories and then cleaned, curated and harmonized according to a pre-established guideline using ontology terms. These metadata sets can be accessed at https://microbiome.h3abionet.org/. This web portal is open access and offers an interactive visualization of 14 889 records from 70 bioprojects associated with 72 peer-reviewed research articles. It also offers the ability to download harmonized metadata according to the user's applied filters. The AHMP thereby supports metadata search and retrieval, thus facilitating access to relevant studies linked to the African human microbiome. Database URL: https://microbiome.h3abionet.org/.


Metadata , Microbiota , Humans , Metagenome , Databases, Factual , Metagenomics , Microbiota/genetics
2.
Cell Genom ; 3(6): 100332, 2023 Jun 14.
Article En | MEDLINE | ID: mdl-37388906

Based on evaluations of imputation performed on a genotype dataset consisting of about 11,000 sub-Saharan African (SSA) participants, we show Trans-Omics for Precision Medicine (TOPMed) and the African Genome Resource (AGR) to be currently the best panels for imputing SSA datasets. We report notable differences in the number of single-nucleotide polymorphisms (SNPs) that are imputed by different panels in datasets from East, West, and South Africa. Comparisons with a subset of 95 SSA high-coverage whole-genome sequences (WGSs) show that despite being about 20-fold smaller, the AGR imputed dataset has higher concordance with the WGSs. Moreover, the level of concordance between imputed and WGS datasets was strongly influenced by the extent of Khoe-San ancestry in a genome, highlighting the need for integration of not only geographically but also ancestrally diverse WGS data in reference panels for further improvement in imputation of SSA datasets. Approaches that integrate imputed data from different panels could also lead to better imputation.
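
The concordance comparison described above can be illustrated with a minimal sketch. The abstract does not state which concordance metric was used, so the definition here (the fraction of sites where the imputed genotype equals the sequenced genotype, skipping sites missing in either dataset) is an assumption, as are the function name `genotype_concordance`, the 0/1/2 alternate-allele coding, and the toy genotype vectors.

```python
def genotype_concordance(imputed, sequenced):
    """Fraction of jointly called sites with identical genotypes.

    Genotypes are coded as alternate-allele counts (0, 1, 2); None marks a
    missing call. This is an illustrative metric, not the study's own.
    """
    if len(imputed) != len(sequenced):
        raise ValueError("genotype vectors must cover the same sites")
    compared = 0
    matches = 0
    for imp, seq in zip(imputed, sequenced):
        if imp is None or seq is None:
            continue  # skip sites that are missing in either dataset
        compared += 1
        if imp == seq:
            matches += 1
    if compared == 0:
        raise ValueError("no sites with calls in both datasets")
    return matches / compared


# Hypothetical toy data: six sites for one individual.
imputed_calls = [0, 1, 2, 1, None, 0]
wgs_calls = [0, 1, 1, 1, 2, 0]
concordance = genotype_concordance(imputed_calls, wgs_calls)
print(f"{concordance:.2f}")  # 4 of 5 compared sites agree
```

In practice such a comparison would typically be stratified, for example by allele frequency or by ancestry proportion, which is the kind of breakdown the abstract reports for Khoe-San ancestry.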

3.
J Pers Med ; 12(2), 2022 Feb 11.
Article En | MEDLINE | ID: mdl-35207753

Genomics data are currently being produced at unprecedented rates, resulting in increased knowledge discovery and submission to public data repositories. Despite these advances, genomic information on African-ancestry populations remains significantly low compared with European- and Asian-ancestry populations. This information is typically segmented across several different biomedical data repositories, which often lack sufficient fine-grained structure and annotation to account for the diversity of African populations, leading to many challenges related to the retrieval, representation and findability of such information. To overcome these challenges, we developed the African Genomic Medicine Portal (AGMP), a database that contains metadata on genomic medicine studies conducted on African-ancestry populations. The metadata is curated from two public databases related to genomic medicine, PharmGKB and DisGeNET. The metadata retrieved from these source databases were limited to genomic variants that were associated with disease aetiology or treatment in the context of African-ancestry populations. Over 2000 variants relevant to populations of African ancestry were retrieved. Subsequently, domain experts curated and annotated additional information associated with the studies that reported the variants, including geographical origin, ethnolinguistic group, level of association significance and other relevant study information, such as study design and sample size, where available. The AGMP functions as a dedicated resource through which to access African-specific information on genomics as applied to health research, through querying variants, genes, diseases and drugs. The portal and its corresponding technical documentation, implementation code and content are publicly available.

4.
BMC Bioinformatics ; 19(1): 457, 2018 Nov 29.
Article En | MEDLINE | ID: mdl-30486782

BACKGROUND: The Pan-African bioinformatics network, H3ABioNet, comprises 27 research institutions in 17 African countries. H3ABioNet is part of the Human Heredity and Health in Africa program (H3Africa), an African-led research consortium funded by the US National Institutes of Health and the UK Wellcome Trust, aimed at using genomics to study and improve the health of Africans. A key role of H3ABioNet is to support H3Africa projects by building bioinformatics infrastructure such as portable and reproducible bioinformatics workflows for use on heterogeneous African computing environments. Processing and analysis of genomic data is an example of a big data application requiring complex interdependent data analysis workflows. Such bioinformatics workflows take the primary and secondary input data through several computationally-intensive processing steps using different software packages, where some of the outputs form inputs for other steps. Implementing scalable, reproducible, portable and easy-to-use workflows is particularly challenging. RESULTS: H3ABioNet has built four workflows to support (1) the calling of variants from high-throughput sequencing data; (2) the analysis of microbial populations from 16S rDNA sequence data; (3) genotyping and genome-wide association studies; and (4) single nucleotide polymorphism imputation. A week-long hackathon was organized in August 2016 with participants from six African bioinformatics groups, and US and European collaborators. Two of the workflows are built using the Common Workflow Language framework (CWL) and two using Nextflow. All the workflows are containerized for improved portability and reproducibility using Docker, and are publicly available for use by members of the H3Africa consortium and the international research community.
CONCLUSION: The H3ABioNet workflows have been implemented in view of offering ease of use for the end user and high levels of reproducibility and portability, all while following modern state of the art bioinformatics data processing protocols. The H3ABioNet workflows will service the H3Africa consortium projects and are currently in use. All four workflows are also publicly available for research scientists worldwide to use and adapt for their respective needs. The H3ABioNet workflows will help develop bioinformatics capacity and assist genomics research within Africa and serve to increase the scientific output of H3Africa and its Pan-African Bioinformatics Network.


Computational Biology/methods , Genomics/methods , Africa , Humans , Reproducibility of Results
5.
PLoS One ; 13(7): e0199461, 2018.
Article En | MEDLINE | ID: mdl-29979707

A chronic inflammatory state to a large extent explains sickle cell disease (SCD) pathophysiology. Nonetheless, the principal dysregulated factors affecting this major pathway and their mechanisms of action still have to be fully identified and elucidated. Integrating gene expression and genome-wide association study (GWAS) data analysis represents a novel approach to refining the identification of key mediators and functions in complex diseases. Here, we performed gene expression meta-analysis of five independent publicly available microarray datasets related to homozygous SS patients with SCD to identify a consensus SCD transcriptomic profile. The meta-analysis conducted using the MetaDE R package based on combining p values (maxP approach) identified 335 differentially expressed genes (DEGs; 224 upregulated and 111 downregulated). Functional gene set enrichment revealed the importance of several metabolic pathways, of innate immune responses, erythrocyte development, and hemostasis pathways. Advanced analyses of GWAS data generated within the framework of this study by means of the atSNP R package and SIFT tool identified 60 regulatory single-nucleotide polymorphisms (rSNPs) occurring in the promoter of 20 DEGs and a deleterious SNP, affecting CAMKK2 protein function. This novel database of candidate genes, transcription factors, and rSNPs associated with SCD provides new markers that may help to identify new therapeutic targets.
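
The p-value combination step mentioned above (the maxP approach) can be sketched as follows. This is the textbook maximum-p rule under the assumption that the studies are independent, not a reproduction of the MetaDE R package; the function name `maxp_combine`, the gene labels, and the per-study p-values are all hypothetical.

```python
def maxp_combine(p_values):
    """Combine independent per-study p-values for one gene with the maxP rule.

    Under the null hypothesis each p-value is Uniform(0, 1), so for k
    independent studies P(max p <= x) = x**k. The combined p-value is
    therefore (max p)**k, which is small only if every study is significant.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0.0 <= p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")
    k = len(p_values)
    return max(p_values) ** k


# Hypothetical per-study p-values for two made-up genes across five datasets.
per_gene = {
    "GENE_A": [0.001, 0.004, 0.010, 0.002, 0.020],  # significant everywhere
    "GENE_B": [0.001, 0.004, 0.600, 0.002, 0.020],  # one study disagrees
}
combined = {gene: maxp_combine(ps) for gene, ps in per_gene.items()}
for gene, p in combined.items():
    print(gene, f"{p:.3g}")
```

The design consequence is that maxP is conservative: a single non-significant study (as for the made-up GENE_B) is enough to keep a gene out of the consensus list, which suits the goal of a profile shared by all datasets.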


Anemia, Sickle Cell/genetics , Gene Expression Profiling , Genome-Wide Association Study , Transcriptome , Alleles , Computational Biology/methods , Data Mining , Databases, Genetic , Gene Ontology , Gene Regulatory Networks , Genotype , Humans , Polymorphism, Single Nucleotide
6.
AAS Open Res ; 1: 9, 2018.
Article En | MEDLINE | ID: mdl-32382696

The need for portable and reproducible genomics analysis pipelines is growing globally as well as in Africa, especially with the growth of collaborative projects like the Human Heredity and Health in Africa Consortium (H3Africa). The Pan-African H3Africa Bioinformatics Network (H3ABioNet) recognized the need for portable, reproducible pipelines adapted to heterogeneous compute environments, and for the nurturing of technical expertise in workflow languages and containerization technologies. To address this need, in 2016 H3ABioNet arranged its first Cloud Computing and Reproducible Workflows Hackathon, with the purpose of building key genomics analysis pipelines able to run on heterogeneous computing environments and meeting the needs of H3Africa research projects. This paper describes the preparations for this hackathon and reflects upon the lessons learned about its impact on building the technical and scientific expertise of African researchers. The workflows developed were made publicly available in GitHub repositories and deposited as container images on quay.io.

7.
Nat Commun ; 8(1): 2062, 2017 Dec 12.
Article En | MEDLINE | ID: mdl-29233967

The Southern African Human Genome Programme is a national initiative that aspires to unlock the unique genetic character of southern African populations for a better understanding of human genetic diversity. In this pilot study the Southern African Human Genome Programme characterizes the genomes of 24 individuals (8 Coloured and 16 black southeastern Bantu-speakers) using deep whole-genome sequencing. A total of ~16 million unique variants are identified. Despite the shallow time depth since divergence between the two main southeastern Bantu-speaking groups (Nguni and Sotho-Tswana), principal component analysis and structure analysis reveal significant (p < 10^-6) differentiation, and FST analysis identifies regions with high divergence. The Coloured individuals show evidence of varying proportions of admixture with Khoesan, Bantu-speakers, Europeans, and populations from the Indian sub-continent. Whole-genome sequencing data reveal extensive genomic diversity, increasing our understanding of the complex and region-specific history of African populations and highlighting its potential impact on biomedical research and genetic susceptibility to disease.


Black People/genetics , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Genome, Human , DNA Mutational Analysis/methods , Healthy Volunteers , Humans , Male , Mutation/genetics , Pilot Projects , Principal Component Analysis , South Africa
8.
PLoS Comput Biol ; 13(6): e1005419, 2017 Jun.
Article En | MEDLINE | ID: mdl-28570565

The H3ABioNet pan-African bioinformatics network, which is funded to support the Human Heredity and Health in Africa (H3Africa) program, has developed node-assessment exercises to gauge the ability of its participating research and service groups to analyze typical genome-wide datasets being generated by H3Africa research groups. We describe a framework for the assessment of computational genomics analysis skills, which includes standard operating procedures, training and test datasets, and a process for administering the exercise. We present the experiences of 3 research groups that have taken the exercise and the impact on their ability to manage complex projects. Finally, we discuss the reasons why many H3ABioNet nodes have declined so far to participate and potential strategies to encourage them to do so.


Black People/genetics , Databases, Genetic , Genomics/methods , Database Management Systems , Developing Countries , Humans , Nigeria , South Africa
9.
Glob Heart ; 12(2): 91-98, 2017 Jun.
Article En | MEDLINE | ID: mdl-28302555

BACKGROUND: Although pockets of bioinformatics excellence have developed in Africa, generally, large-scale genomic data analysis has been limited by the availability of expertise and infrastructure. H3ABioNet, a pan-African bioinformatics network, was established to build capacity specifically to enable H3Africa (Human Heredity and Health in Africa) researchers to analyze their data in Africa. Since the inception of the H3Africa initiative, H3ABioNet's role has evolved in response to changing needs from the consortium and the African bioinformatics community. OBJECTIVES: H3ABioNet set out to develop core bioinformatics infrastructure and capacity for genomics research in various aspects of data collection, transfer, storage, and analysis. METHODS AND RESULTS: Various resources have been developed to address genomic data management and analysis needs of H3Africa researchers and other scientific communities on the continent. NetMap was developed and used to build an accurate picture of network performance within Africa and between Africa and the rest of the world, and Globus Online has been rolled out to facilitate data transfer. A participant recruitment database was developed to monitor participant enrollment, and data is being harmonized through the use of ontologies and controlled vocabularies. The standardized metadata will be integrated to provide a search facility for H3Africa data and biospecimens. Because H3Africa projects are generating large-scale genomic data, facilities for analysis and interpretation are critical. H3ABioNet is implementing several data analysis platforms that provide a large range of bioinformatics tools or workflows, such as Galaxy, the Job Management System, and eBiokits. A set of reproducible, portable, and cloud-scalable pipelines to support the multiple H3Africa data types are also being developed and dockerized to enable execution on multiple computing infrastructures. 
In addition, new tools have been developed for analysis of the uniquely divergent African data and for downstream interpretation of prioritized variants. To provide support for these and other bioinformatics queries, an online bioinformatics helpdesk backed by broad consortium expertise has been established. Further support is provided by means of various modes of bioinformatics training. CONCLUSIONS: Over the past 4 years, the development of infrastructure support and human capacity through H3ABioNet has contributed significantly to the establishment of African scientific networks, data analysis facilities, and training programs. Here, we describe the infrastructure and how it has affected genomics and bioinformatics research in Africa.


Biomedical Research/methods , Computational Biology/trends , Genomics/methods , Africa , Humans
10.
F1000Res ; 3: 50, 2014.
Article En | MEDLINE | ID: mdl-25075288

SUMMARY: We present two web-based components for the display of Protein-Protein Interaction networks using different self-organizing layout methods: force-directed and circular. These components conform to the BioJS standard and can be rendered in an HTML5-compliant browser without the need for third-party plugins. We provide examples of interaction networks and how the components can be used to visualize them, and refer to a more complex tool that uses these components. AVAILABILITY: http://github.com/biojs/biojs; http://dx.doi.org/10.5281/zenodo.7753.

11.
BMC Genomics ; 15: 437, 2014 Jun 06.
Article En | MEDLINE | ID: mdl-24906912

BACKGROUND: Population differentiation is the result of demographic and evolutionary forces. Whole genome datasets from the 1000 Genomes Project (October 2012) provide an unbiased view of genetic variation across populations from Europe, Asia, Africa and the Americas. Common population-specific SNPs (MAF > 0.05) reflect a deep history and may have important consequences for health and wellbeing. Their interpretation is contextualised by currently available genome data. RESULTS: The identification of common population-specific (CPS) variants (SNPs and SSV) is influenced by admixture and the sample size under investigation. Nine of the populations in the 1000 Genomes Project (2 African, 2 Asian (including a merged Chinese group) and 5 European) revealed that the African populations (LWK and YRI), followed by the Japanese (JPT) have the highest number of CPS SNPs, in concordance with their histories and given the populations studied. Using two methods, sliding 50-SNP and 5-kb windows, the CPS SNPs showed distinct clustering across large genome segments and little overlap of clusters between populations. iHS enrichment score and the population branch statistic (PBS) analyses suggest that selective sweeps are unlikely to account for the clustering and population specificity. Of interest is the association of clusters close to recombination hotspots. Functional analysis of genes associated with the CPS SNPs revealed over-representation of genes in pathways associated with neuronal development, including axonal guidance signalling and CREB signalling in neurones. CONCLUSIONS: Common population-specific SNPs are non-randomly distributed throughout the genome and are significantly associated with recombination hotspots. Since the variant alleles of most CPS SNPs are the derived allele, they likely arose in the specific population after a split from a common ancestor. 
Their proximity to genes involved in specific pathways, including neuronal development, suggests evolutionary plasticity of selected genomic regions. Contrary to expectation, selective sweeps did not play a large role in the persistence of population-specific variation. This suggests a stochastic process towards population-specific variation which reflects demographic histories and may have some interesting implications for health and susceptibility to disease.
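
The population branch statistic (PBS) named in the results can be written out as a short sketch. It uses the standard definition, in which each pairwise FST is turned into a branch length T = -log(1 - FST) and the focal population's branch is (T_AB + T_AC - T_BC) / 2. The function names and the FST values below are hypothetical and are not taken from the study.

```python
import math


def branch_length(fst):
    """Convert a pairwise FST into a divergence-time-like branch length."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("FST must lie in [0, 1)")
    return -math.log(1.0 - fst)


def population_branch_statistic(fst_ab, fst_ac, fst_bc):
    """PBS for focal population A, given pairwise FST with populations B and C."""
    t_ab = branch_length(fst_ab)
    t_ac = branch_length(fst_ac)
    t_bc = branch_length(fst_bc)
    return (t_ab + t_ac - t_bc) / 2.0


# Hypothetical window: population A is diverged from both B and C,
# while B and C are indistinguishable from each other.
pbs = population_branch_statistic(fst_ab=0.3, fst_ac=0.3, fst_bc=0.0)
print(round(pbs, 4))
```

A large PBS in a window means the allele-frequency change is concentrated on the focal population's branch, which is why the statistic is used to look for selective sweeps specific to one population.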


Genetics, Population , Genome, Human , Polymorphism, Single Nucleotide , Racial Groups/genetics , Alleles , Computational Biology , Databases, Nucleic Acid , Evolution, Molecular , Humans , Recombination, Genetic , Selection, Genetic
12.
BMC Bioinformatics ; 15: 129, 2014 May 06.
Article En | MEDLINE | ID: mdl-24885165

BACKGROUND: Interaction between proteins is one of the most important mechanisms in the execution of cellular functions. The study of these interactions has provided insight into the functioning of an organism's processes. As of October 2013, Homo sapiens had over 170000 Protein-Protein interactions (PPI) registered in the Interologous Interaction Database, which is only one of the many public resources where protein interactions can be accessed. These numbers exemplify the volume of data that research on the topic has generated. Visualization of large data sets is a well known strategy to make sense of information, and protein interaction data is no exception. There are several tools that allow the exploration of this data, providing different methods to visualize protein network interactions. However, there is still no native web tool that allows this data to be explored interactively online. RESULTS: Given the advances that web technologies have made recently it is time to bring these interactive views to the web to provide an easily accessible forum to visualize PPI. We have created a Web-based Protein Interaction Network Visualizer: PINV, an open source, native web application that facilitates the visualization of protein interactions (http://biosual.cbio.uct.ac.za/pinv.html). We developed PINV as a set of components that follow the protocol defined in BioJS and use the D3 library to create the graphic layouts. We demonstrate the use of PINV with multi-organism interaction networks for a predicted target from Mycobacterium tuberculosis, its interacting partners and its orthologs. CONCLUSIONS: The resultant tool provides an attractive view of complex, fully interactive networks with components that allow the querying, filtering and manipulation of the visible subset. Moreover, as a web resource, PINV simplifies sharing and publishing, activities which are vital in today's research collaborative environments. 
The source code is freely available for download at https://github.com/4ndr01d3/biosual.


Protein Interaction Maps , Software , Computer Graphics , Humans , Internet , Protein Interaction Mapping
13.
PLoS One ; 8(2): e50695, 2013.
Article En | MEDLINE | ID: mdl-23383294

BACKGROUND: Measles virus (MV) causes T cell suppression by interference with phosphatidylinositol-3-kinase (PI3K) activation. We previously found that this interference affected the activity of splice regulatory proteins and a T cell inhibitory protein isoform was produced from an alternatively spliced pre-mRNA. HYPOTHESIS: Differentially regulated and alternatively spliced variant transcripts accumulating in response to PI3K abrogation in T cells potentially encode proteins involved in T cell silencing. METHODS: To test this hypothesis at the cellular level, we performed a Human Exon 1.0 ST Array on RNAs isolated from T cells stimulated only or stimulated after PI3K inhibition. We developed a simple algorithm based on a splicing index to detect genes that undergo alternative splicing (AS) or are differentially regulated (RG) upon T cell suppression. RESULTS: Applying our algorithm to the data, 9% of the genes were assigned as AS, while only 3% were attributed to RG. Though there are overlaps, AS and RG genes differed with regard to functional regulation, and were found to be enriched in different functional groups. AS genes targeted extracellular matrix (ECM)-receptor interaction and focal adhesion pathways, while RG genes were mainly enriched in cytokine-receptor interaction and Jak-STAT. When combined, AS/RG dependent alterations targeted pathways essential for T cell receptor signaling, cytoskeletal dynamics and cell cycle entry. CONCLUSIONS: PI3K abrogation interferes with key T cell activation processes through both differential expression and alternative splicing, which together actively contribute to T cell suppression.
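
The abstract does not spell out the splicing-index algorithm, so the following is only a sketch of the commonly used exon-array definition: each exon signal is normalized by its gene-level signal, and the splicing index is the log2 ratio of those normalized intensities between two conditions. The function names, the cutoff of 1.0 (a two-fold change), and the intensity values are hypothetical assumptions, not the authors' parameters.

```python
import math


def splicing_index(exon_a, gene_a, exon_b, gene_b):
    """log2 ratio of gene-normalized exon intensities (condition A vs. B)."""
    normalized_a = exon_a / gene_a
    normalized_b = exon_b / gene_b
    return math.log2(normalized_a / normalized_b)


def classify(exon_a, gene_a, exon_b, gene_b, cutoff=1.0):
    """Return the labels that apply: 'AS', 'RG', both, or neither.

    'AS' (alternative splicing): the exon changes relative to its gene.
    'RG' (differential regulation): the gene-level signal itself changes.
    The cutoff is a hypothetical threshold on the absolute log2 ratio.
    """
    labels = set()
    if abs(splicing_index(exon_a, gene_a, exon_b, gene_b)) >= cutoff:
        labels.add("AS")
    if abs(math.log2(gene_a / gene_b)) >= cutoff:
        labels.add("RG")
    return labels


# Hypothetical intensities: condition A = PI3K-inhibited, B = stimulated only.
si_skipped = splicing_index(exon_a=200.0, gene_a=400.0, exon_b=400.0, gene_b=400.0)
labels_skipped = classify(200.0, 400.0, 400.0, 400.0)   # exon drops, gene stable
labels_regulated = classify(100.0, 200.0, 400.0, 800.0)  # whole gene drops
print(si_skipped, labels_skipped, labels_regulated)
```

Returning a set rather than a single label reflects the abstract's remark that the AS and RG groups overlap.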


Algorithms , Gene Expression Regulation/immunology , Measles virus/metabolism , Phosphatidylinositol 3-Kinases/metabolism , Protein Isoforms/genetics , Suppressor Factors, Immunologic/metabolism , T-Lymphocytes/enzymology , DNA Primers/genetics , Gene Expression Profiling , Humans , Measles virus/immunology , Oligonucleotide Array Sequence Analysis , Phosphoinositide-3 Kinase Inhibitors , Reverse Transcriptase Polymerase Chain Reaction
14.
PLoS One ; 5(9): e12989, 2010 Sep 27.
Article En | MEDLINE | ID: mdl-20886000

Multiple factors underlie susceptibility to essential hypertension, including a significant genetic and ethnic component, and environmental effects. Blood pressure response of hypertensive individuals to salt is heterogeneous, but salt sensitivity appears more prevalent in people of indigenous African origin. The underlying genetics of salt-sensitive hypertension, however, are poorly understood. In this study, computational methods including text- and data-mining have been used to select and prioritize candidate aetiological genes for salt-sensitive hypertension. Additionally, we have compared allele frequencies and copy number variation for single nucleotide polymorphisms in candidate genes between indigenous Southern African and Caucasian populations, with the aim of identifying candidate genes with significant variability between the population groups: identifying genetic variability between population groups can exploit ethnic differences in disease prevalence to aid with prioritisation of good candidate genes. Our top-ranking candidate genes include parathyroid hormone precursor (PTH) and type-1 angiotensin II receptor (AGTR1). We propose that the candidate genes identified in this study warrant further investigation as potential aetiological genes for salt-sensitive hypertension.


Genetic Variation , Hypertension/ethnology , Hypertension/genetics , Sodium Chloride/metabolism , Africa, Southern/ethnology , Black People/genetics , Computational Biology , Gene Dosage , Gene Frequency , Humans , Hypertension/metabolism , Polymorphism, Single Nucleotide , White People/genetics
15.
Hum Genet ; 128(2): 145-53, 2010 Aug.
Article En | MEDLINE | ID: mdl-20490549

Admixed populations present unique opportunities to discover the genetic factors underlying many multifactorial diseases. The geographical position and complex history of South Africa have led to the establishment of the unique admixed population known as the South African Coloured. Not much is known about the genetic make-up of this population, and the historical record is patchy. We genotyped 959 individuals from the Western Cape area, self-identified as belonging to this population, using the Affymetrix 500k genotyping platform. This resulted in nearly 75,000 autosomal SNPs that could be compared with populations represented in the International HapMap Project and the Human Genome Diversity Project. Analysis by means of both the admixture and linkage models in STRUCTURE revealed that the major ancestral components of this population are predominantly Khoesan (32-43%), Bantu-speaking Africans (20-36%), European (21-28%) and a smaller Asian contribution (9-11%), depending on the model used. This is consistent with historical data. While of great historical and genealogical interest, this information is also essential for future admixture mapping of disease genes in this population.


Black People/genetics , Population Groups/genetics , Asian People/genetics , Ethnicity/genetics , Genome , Genotype , Geography , Humans , Male , Polymorphism, Single Nucleotide , Research , South Africa , White People/genetics
16.
Int J Plant Genomics ; 2008: 369601, 2008.
Article En | MEDLINE | ID: mdl-18483570

The Generation Challenge Programme (GCP) is a global crop research consortium directed toward crop improvement through the application of comparative biology and genetic resources characterization to plant breeding. A key consortium research activity is the development of a GCP crop bioinformatics platform to support GCP research. This platform includes the following: (i) shared, public platform-independent domain models, ontology, and data formats to enable interoperability of data and analysis flows within the platform; (ii) web service and registry technologies to identify, share, and integrate information across diverse, globally dispersed data sources, as well as to access high-performance computational (HPC) facilities for computationally intensive, high-throughput analyses of project data; and (iii) platform-specific middleware reference implementations of the domain model integrating a suite of public (largely open-access/-source) databases and software tools into a workbench to facilitate biodiversity analysis, comparative analysis of crop genomic data, and plant breeding decision making.

...