Results 1 - 20 of 54
1.
Nature ; 586(7831): 741-748, 2020 10.
Article in English | MEDLINE | ID: mdl-33116287

ABSTRACT

The African continent is regarded as the cradle of modern humans and African genomes contain more genetic variation than those from any other continent, yet only a fraction of the genetic diversity among African individuals has been surveyed. Here we performed whole-genome sequencing analyses of 426 individuals, comprising 50 ethnolinguistic groups and including previously unsampled populations, to explore the breadth of genomic diversity across Africa. We uncovered more than 3 million previously undescribed variants, most of which were found among individuals from newly sampled ethnolinguistic groups, as well as 62 previously unreported loci that are under strong selection, which were predominantly found in genes that are involved in viral immunity, DNA repair and metabolism. We observed complex patterns of ancestral admixture and putative-damaging and novel variation, both within and between populations, alongside evidence that Zambia was a likely intermediate site along the routes of expansion of Bantu-speaking populations. Pathogenic variants in genes that are currently characterized as medically relevant were uncommon, but in other genes, variants denoted as 'likely pathogenic' in the ClinVar database were commonly observed. Collectively, these findings refine our current understanding of continental migration, identify gene flow and the response to human disease as strong drivers of genome-level population variation, and underscore the scientific imperative for a broader characterization of the genomic diversity of African individuals to understand human ancestry and improve health.


Subjects
Genetic Variation ; Genome, Human/genetics ; Genomics ; Health ; Human Migration ; Africa/ethnology ; DNA Repair/genetics ; Datasets as Topic ; Female ; Gene Flow ; Genetics, Medical ; Genetics, Population ; Health/history ; History, Ancient ; Human Migration/history ; Humans ; Immunity/genetics ; Language ; Male ; Metabolism/genetics ; Selection, Genetic ; Whole Genome Sequencing
2.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34415019

ABSTRACT

Over the past few years, meta-analysis has become popular among biomedical researchers for detecting biomarkers across multiple cohort studies with increased predictive power. Combining datasets from different sources increases sample size, thus overcoming the issue related to limited sample size from each individual study and boosting the predictive power. This leads to an increased likelihood of more accurately predicting differentially expressed genes/proteins or significant biomarkers underlying the biological condition of interest. Currently, several meta-analysis methods and tools exist, each having its own strengths and limitations. In this paper, we survey existing meta-analysis methods, and assess the performance of different methods based on results from different datasets as well as assessment from prior knowledge of each method. This provides a reference summary of meta-analysis models and tools, which helps to guide end-users on the choice of appropriate models or tools for given types of datasets and enables developers to consider current advances when planning the development of new meta-analysis models and more practical integrative tools.


Subjects
Algorithms ; Data Analysis ; Meta-Analysis as Topic ; Software ; Decision Trees ; Humans ; Workflow
3.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33129201

ABSTRACT

Advances in high-throughput sequencing technologies have resulted in an exponential growth of publicly accessible biological datasets. In the 'big data' driven 'post-genomic' context, much work is being done to explore human protein-protein interactions (PPIs) for a systems-level analysis to uncover useful signals and gain more insights to advance current knowledge and answer specific biological and health questions. These PPIs are experimentally or computationally predicted and stored in different online databases, and some PPI resources are updated regularly. As with many biological datasets, such regular updates continuously render older PPI datasets potentially outdated. Moreover, while many of these interactions are shared between these online resources, each resource includes its own identified PPIs and none of these databases exhaustively contains all existing human PPI maps. In this context, it is essential to enable the integration of interaction datasets from different resources, to generate a PPI map with increased coverage and confidence. To allow researchers to produce an integrated human PPI dataset in real time, we introduce the integrated human protein-protein interaction network generator (IHP-PING) tool. IHP-PING is a flexible Python package which generates a human PPI network from freely available online resources. This tool extracts and integrates heterogeneous PPI datasets to generate a unified PPI network, which is stored locally for further applications.


Subjects
Databases, Protein ; Programming Languages ; Protein Interaction Mapping ; Protein Interaction Maps ; Humans
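
The integration step described in the abstract above (combining interaction lists from several resources into one network) can be illustrated with a minimal sketch. This is not IHP-PING's actual API; the function name `merge_ppi`, the source names and the protein identifiers are hypothetical, and interactions are assumed to be undirected pairs.

```python
from collections import defaultdict


def merge_ppi(sources):
    """Merge per-source interaction lists into one undirected network.

    `sources` maps a (hypothetical) resource name to an iterable of
    (protein_a, protein_b) pairs. The result maps each unordered pair
    to the set of resources that report it.
    """
    merged = defaultdict(set)
    for name, pairs in sources.items():
        for a, b in pairs:
            if a == b:
                continue  # self-interactions are skipped in this sketch
            # frozenset makes (a, b) and (b, a) the same key
            merged[frozenset((a, b))].add(name)
    return dict(merged)


# Hypothetical toy input, not real database content.
toy_sources = {
    "db_one": [("P1", "P2"), ("P2", "P3")],
    "db_two": [("P2", "P1"), ("P3", "P4")],
}
network = merge_ppi(toy_sources)
```

Keeping the set of supporting resources per interaction is one simple way to express "coverage and confidence": the union gives coverage, and the number of resources reporting a pair can serve as a crude confidence indicator.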
4.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33341897

ABSTRACT

Current variant calling (VC) approaches have been designed to leverage populations of long-range haplotypes and were benchmarked using populations of European descent, whereas most genetic diversity is found in non-European populations, such as African populations. When working with these genetically diverse populations, VC tools may produce false positive and false negative results, which may lead to misleading conclusions in the prioritization of mutations and in the clinical relevancy and actionability of genes. The most prominent question is which tool or pipeline has a high rate of sensitivity and precision when analysing African data with either low or high sequence coverage, given the high genetic diversity and heterogeneity of this data. Here, a total of 100 synthetic Whole Genome Sequencing (WGS) samples, mimicking the genetic profile of African and European subjects for different specific coverage levels (high/low), have been generated to assess the performance of nine different VC tools on these contrasting datasets. The performances of these tools were assessed in terms of false positive and false negative call rates by comparing the simulated golden variants to the variants identified by each VC tool. Combining our results on sensitivity and positive predictive value (PPV), VarDict [PPV = 0.999 and Matthews correlation coefficient (MCC) = 0.832] and BCFtools (PPV = 0.999 and MCC = 0.813) perform best when using African population data on high and low coverage data. Overall, current VC tools produce high false positive and false negative rates when analysing African compared with European data. This highlights the need for development of VC approaches with high sensitivity and precision tailored for populations characterized by high genetic variations and low linkage disequilibrium.


Subjects
Black People/genetics ; Databases, Nucleic Acid ; Genetic Variation ; Genome, Human ; White People/genetics ; Whole Genome Sequencing ; Humans ; Linkage Disequilibrium
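
The metrics named in the abstract above (sensitivity, PPV and MCC) follow from the standard confusion-matrix definitions. The sketch below is only an illustration: the function name and the counts are hypothetical, and how true negatives are counted for variant calls (for example, over all non-variant sites considered) is an assumption the abstract does not specify.

```python
import math


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Return sensitivity, PPV and MCC for one confusion matrix."""
    # Sensitivity: fraction of true variants that were called.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # PPV (precision): fraction of calls that are true variants.
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    # MCC uses all four cells; defined as 0 here when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "ppv": ppv, "mcc": mcc}


# Hypothetical counts, not taken from the study.
metrics = classification_metrics(tp=90, fp=10, fn=30, tn=870)
```

A caller with a high PPV can still have a modest MCC if it misses many true variants, which is why the abstract reports both figures.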
5.
Brief Bioinform ; 21(1): 144-155, 2020 Jan 17.
Article in English | MEDLINE | ID: mdl-30462157

ABSTRACT

Advances in human sequencing technologies, coupled with statistical and computational tools, have fostered the development of methods for dating admixture events. These methods have merits and drawbacks in estimating admixture events in multi-way admixed populations. Here, we first provide a comprehensive review and comparison of current methods pertinent to dating admixture events. Second, we assess various admixture dating tools by performing various simulations. Third, we apply the top two assessed methods to real data of a uniquely admixed population from South Africa. Results reveal that current admixture dating models are not sufficiently equipped to estimate ancient admixture events and to identify multi-faceted admixture events in complex multi-way admixed populations. We conclude with a discussion of research areas where further work on admixture dating methods is needed.

6.
Brief Bioinform ; 21(5): 1663-1675, 2020 09 25.
Article in English | MEDLINE | ID: mdl-31711157

ABSTRACT

Drug-like compounds are frequently denied approval and use owing to unexpected clinical side effects and cross-reactivity observed during clinical trials. These unexpected outcomes, which significantly increase the attrition rate, centre on the selected drug targets. These targets may be disease candidate proteins or genes, biological pathways, disease-associated microRNAs, disease-related biomarkers, abnormal molecular phenotypes, crucial nodes of biological networks or molecular functions. This is generally linked to several factors, including incomplete knowledge of the drug targets and unpredicted pharmacokinetic expressions upon target interaction or off-target effects. A method to identify targets, especially for polygenic diseases, is essential, and target identification constitutes a major bottleneck in drug development, with the fundamental stage being the identification and validation of drug targets of interest for further downstream processes. Thus, various computational methods have been developed to complement experimental approaches in drug discovery. Here, we present an overview of various computational methods and tools applied in predicting or validating drug targets and drug-like molecules. We provide an overview of their advantages and compare these methods to identify effective methods which are likely to lead to optimal results. We also explore major sources of drug failure, considering the challenges and opportunities involved. This review might guide researchers in selecting the most efficient approach or technique during the computational drug discovery process.


Subjects
Computational Biology/methods ; Drug Delivery Systems ; Biomarkers/metabolism ; Computer Simulation ; Drug Discovery ; Machine Learning ; Molecular Docking Simulation
8.
Brief Bioinform ; 20(2): 690-700, 2019 03 25.
Article in English | MEDLINE | ID: mdl-29701762

ABSTRACT

Thousands of genetic associations with diseases have been identified by genome-wide association studies (GWASs), which are conceptually a single-marker-based approach. There are potentially many uses of these identified variants, including a better understanding of the pathogenesis of diseases, new leads for studying underlying risk prediction and clinical prediction of treatment. However, because of inadequate power, GWAS might miss disease genes and/or pathways with weak genetic or strong epistatic effects. Driven by the need to extract useful information from GWAS summary statistics, post-GWAS approaches (PGAs) were introduced. Here, we dissect and discuss advances made in pathway/network-based PGAs, with a particular focus on protein-protein interaction networks that leverage GWAS summary statistics by combining effects of multiple loci, subnetworks or pathways to detect genetic signals associated with complex diseases. We conclude with a discussion of research areas where further work on summary statistic-based methods is needed.


Subjects
Computational Biology/methods ; Genome-Wide Association Study ; Epistasis, Genetic ; Humans ; Protein Interaction Maps
9.
Brief Bioinform ; 20(5): 1709-1724, 2019 09 27.
Article in English | MEDLINE | ID: mdl-30010715

ABSTRACT

Over the past decade, studies of admixed populations have increasingly gained interest in both medical and population genetics. These studies have so far shed light on the patterns of genetic variation throughout modern human evolution and have improved our understanding of the demographics and adaptive processes of human populations. To date, there exist about 20 methods or tools to deconvolve local ancestry. These methods have merits and drawbacks in estimating local ancestry in multiway admixed populations. In this article, we survey existing ancestry deconvolution methods, with special emphasis on multiway admixture, and compare these methods based on simulation results reported by different studies, computational approaches used, including mathematical and statistical models, and biological challenges related to each method. This should orient users on the choice of an appropriate method or tool for given population admixture characteristics and update researchers on current advances, challenges and opportunities behind existing ancestry deconvolution methods.


Subjects
Evolution, Molecular ; Genome, Human ; Models, Genetic ; Humans
10.
Malar J ; 20(1): 421, 2021 Oct 26.
Article in English | MEDLINE | ID: mdl-34702263

ABSTRACT

BACKGROUND: The emergence and spread of malaria drug resistance have resulted in the need to understand disease mechanisms and, importantly, identify essential targets and potential drug candidates. Malaria infection involves a complex interaction between the host and pathogen; thus, functional interactions between human and Plasmodium falciparum are essential to obtain a holistic view of the genetic architecture of malaria. Several functional interaction studies have extended the understanding of malaria disease, and integrating such datasets would provide further insights towards understanding drug resistance and/or genetic resistance/susceptibility, disease pathogenesis, and drug discovery. METHODS: This study curated and analysed data including pathogen and host selective genes, host and pathogen protein sequence data, protein-protein interaction datasets, and drug data from literature and databases to perform human-host and P. falciparum network-based analysis. An integrative computational framework is presented that was developed and found to be reasonably accurate based on various evaluations, applications, and experimental evidence of outputs produced from data-driven analysis. RESULTS: This approach revealed 8 hub protein targets essential for parasite and human host-directed malaria drug therapy. A semantic similarity approach identified 26 potentially repurposable drugs involved in regulating host immune response to inflammatory-driven disorders and/or inhibiting residual malaria infection that can be appropriated for malaria treatment. Further analysis of host-pathogen network shortest paths enabled the prediction of immune-related biological processes and pathways subverted by P. falciparum to increase its within-host survival. CONCLUSIONS: Host-pathogen network analysis reveals potential drug targets and biological processes and pathways subverted by P. falciparum to enhance its within-host survival. The results presented have implications for drug discovery and will inform experimental studies.


Subjects
Drug Discovery ; Drug Resistance/genetics ; Malaria, Falciparum/prevention & control ; Plasmodium falciparum/genetics ; Protein Interaction Mapping ; Protozoan Proteins/genetics ; Antimalarials/therapeutic use ; Computer Simulation ; Humans ; Plasmodium falciparum/drug effects
11.
Brief Bioinform ; 19(6): 1141-1152, 2018 11 27.
Article in English | MEDLINE | ID: mdl-28520909

ABSTRACT

Populations worldwide currently face several public health challenges, including growing prevalence of infections and the emergence of new pathogenic organisms. The cost and risk associated with drug development make the development of new drugs for several diseases, especially orphan or rare diseases, unappealing to the pharmaceutical industry. Proof of drug safety and efficacy is required before market approval, and rigorous testing makes the drug development process slow, expensive and frequently results in failure. This failure is often because of the use of irrelevant targets identified in the early steps of the drug discovery process, suggesting that target identification and validation are cornerstones for the success of drug discovery and development. Here, we present a large-scale data-driven integrative computational framework to extract essential targets and processes from an existing disease-associated data set and enhance target selection by leveraging drug-target-disease association at the systems level. We applied this framework to tuberculosis and Ebola virus diseases, combining heterogeneous data from multiple sources, including protein-protein functional interaction, functional annotation and pharmaceutical data sets. Results obtained demonstrate the effectiveness of the pipeline, leading to the extraction of essential drug targets and to the rational use of existing approved drugs. This provides an opportunity to move toward optimal target-based strategies for screening available drugs and for drug discovery. There is potential for this model to bridge the gap in the production of orphan disease therapies, offering a systematic approach to predict new uses for existing drugs, thereby harnessing their full therapeutic potential.


Subjects
Datasets as Topic ; Antitubercular Agents/chemistry ; Antitubercular Agents/pharmacology ; Antiviral Agents/chemistry ; Antiviral Agents/pharmacology ; Drug Development ; Ebolavirus/drug effects ; Hemorrhagic Fever, Ebola/genetics ; Host-Pathogen Interactions ; Humans ; Molecular Sequence Annotation ; Mycobacterium tuberculosis/drug effects ; Reproducibility of Results ; Tuberculosis/genetics
12.
BMC Med Genet ; 21(1): 125, 2020 06 05.
Article in English | MEDLINE | ID: mdl-32503527

ABSTRACT

BACKGROUND: Sickle cell disease (SCD) is a blood disorder caused by a point mutation on the beta globin gene resulting in the synthesis of abnormal hemoglobin. Fetal hemoglobin (HbF) reduces disease severity, but the levels vary from one individual to another. Most research has focused on common genetic variants which differ across populations and hence do not fully account for HbF variation. METHODS: We investigated rare and common genetic variants that influence HbF levels in 14 SCD patients to elucidate variants and pathways in SCD patients with extreme HbF levels (≥7.7% for high HbF and ≤2.5% for low HbF) in Tanzania. We performed targeted next generation sequencing (Illumina MiSeq) covering exonic and other significant fetal hemoglobin-associated loci, including BCL11A, MYB, HOXA9, HBB, HBG1, HBG2, CHD4, KLF1, MBD3, ZBTB7A and PGLYRP1. RESULTS: Results revealed a range of genetic variants, including bi-allelic and multi-allelic SNPs, frameshift insertions and deletions, some of which have functional importance. Notably, there were significantly more deletions in individuals with high HbF levels (11% vs 0.9%). We identified frameshift deletions in individuals with high HbF levels and frameshift insertions in individuals with low HbF. CHD4 and MBD3 genes, interacting in the same sub-network, were identified to have a significant number of pathogenic or non-synonymous mutations in individuals with low HbF levels, suggesting an important role of epigenetic pathways in the regulation of HbF synthesis. CONCLUSIONS: This study provides new insights into selecting essential variants and identifying potential biological pathways associated with extreme HbF levels in SCD by interrogating multiple genomic variants associated with HbF in SCD.


Subjects
Anemia, Sickle Cell/genetics ; Fetal Hemoglobin/genetics ; Genetic Variation ; Adolescent ; Child ; Child, Preschool ; Gene Regulatory Networks ; Humans ; Loss of Function Mutation/genetics ; Tanzania ; Young Adult
13.
Brief Bioinform ; 18(5): 886-901, 2017 09 01.
Article in English | MEDLINE | ID: mdl-27473066

ABSTRACT

Gene Ontology (GO) semantic similarity tools enable retrieval of semantic similarity scores, which incorporate biological knowledge embedded in the GO structure for comparing or classifying different proteins or list of proteins based on their GO annotations. This facilitates a better understanding of biological phenomena underlying the corresponding experiment and enables the identification of processes pertinent to different biological conditions. Currently, about 14 tools are available, which may play an important role in improving protein analyses at the functional level using different GO semantic similarity measures. Here we survey these tools to provide a comprehensive view of the challenges and advances made in this area to avoid redundant effort in developing features that already exist, or implementing ideas already proven to be obsolete in the context of GO. This helps researchers, tool developers, as well as end users, understand the underlying semantic similarity measures implemented through knowledge of pertinent features of, and issues related to, a particular tool. This should empower users to make appropriate choices for their biological applications and ensure effective knowledge discovery based on GO annotations.


Subjects
Gene Ontology ; Humans ; Molecular Sequence Annotation ; Semantics ; Surveys and Questionnaires
14.
Bioinformatics ; 33(19): 2995-3002, 2017 Oct 01.
Article in English | MEDLINE | ID: mdl-28957497

ABSTRACT

MOTIVATION: Recent technological advances in high-throughput sequencing and genotyping have facilitated an improved understanding of genomic structure and disease-associated genetic factors. In this context, simulation models can play a critical role in revealing various evolutionary and demographic effects on genomic variation, enabling researchers to assess existing and design novel analytical approaches. Although various simulation frameworks have been suggested, they do not account for natural selection in admixture processes. Most are tailored to a single chromosome or a genomic region, very few capture large-scale genomic data, and most are not accessible for genomic communities. RESULTS: Here we develop a multi-scenario genome-wide medical population genetics simulation framework called 'FractalSIM'. FractalSIM has the capability to accurately mimic and generate genome-wide data under various genetic models on genetic diversity, genomic variation affecting diseases and DNA sequence patterns of admixed and/or homogeneous populations. Moreover, the framework accounts for natural selection in both homogeneous and admixture processes. The outputs of FractalSIM have been assessed using popular tools, and the results demonstrated its capability to accurately mimic real scenarios. They can be used to evaluate the performance of a range of genomic tools from ancestry inference to genome-wide association studies. AVAILABILITY AND IMPLEMENTATION: The FractalSIM package is available at http://www.cbio.uct.ac.za/FractalSIM. CONTACT: emile.chimusa@uct.ac.za. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genetics, Population/methods ; Genomics/methods ; Genetic Variation ; Genome ; Genome-Wide Association Study ; High-Throughput Nucleotide Sequencing ; Humans ; Polymorphism, Single Nucleotide ; Selection, Genetic ; Sequence Analysis, DNA ; Software
15.
BMC Plant Biol ; 17(1): 218, 2017 Nov 23.
Article in English | MEDLINE | ID: mdl-29169324

ABSTRACT

BACKGROUND: Advances in forward and reverse genetic techniques have enabled the discovery and identification of several plant defence genes based on quantifiable disease phenotypes in mutant populations. Existing models for testing the effect of gene inactivation or genes causing these phenotypes do not take into account eventual uncertainty of these datasets and potential noise inherent in the biological experiment used, which may mask downstream analysis and limit the use of these datasets. Moreover, elucidating biological mechanisms driving the induced disease resistance and influencing these observable disease phenotypes has never been systematically tackled, eliciting the need for an efficient model to characterize completely the gene target under consideration. RESULTS: We developed a post-gene silencing bioinformatics (post-GSB) protocol which accounts for potential biases related to the disease phenotype datasets in assessing the contribution of the gene target to the plant defence response. The post-GSB protocol uses Gene Ontology semantic similarity and pathway datasets to generate an enriched process regulatory network based on the functional degeneracy of the plant proteome to help understand the induced plant defence response. We applied this protocol to investigate the effect of NPR1 gene silencing on changes in Arabidopsis thaliana plants following Pseudomonas syringae pathovar tomato strain DC3000 infection. Results indicated that the presence of a functionally active NPR1 reduced the plant's susceptibility to the infection, with about 99% of variability in Pseudomonas spore growth between npr1 mutant and wild-type samples. Moreover, the post-GSB protocol revealed the coordinated action of target-associated genes and pathways through an enriched process regulatory network, summarizing the potential target-based induced disease resistance mechanism. CONCLUSIONS: This protocol can improve the characterization of the gene target and, potentially, elucidate induced defence response by more effectively utilizing available phenotype information and plant proteome functional knowledge.


Subjects
Arabidopsis Proteins/genetics ; Arabidopsis/genetics ; Computational Biology/methods ; Plant Diseases/genetics ; Arabidopsis/microbiology ; Arabidopsis Proteins/physiology ; Datasets as Topic ; Gene Silencing ; Models, Genetic ; Mutation ; Phenotype ; Plant Diseases/microbiology ; Pseudomonas syringae/physiology
16.
Bioinformatics ; 32(4): 549-56, 2016 Feb 15.
Article in English | MEDLINE | ID: mdl-26508762

ABSTRACT

MOTIVATION: Despite numerous successful Genome-wide Association Studies (GWAS), detecting variants that have low disease risk still poses a challenge. GWAS may miss disease genes with weak genetic effects or strong epistatic effects due to the single-marker testing approach commonly used. GWAS may thus generate false negative or inconclusive results, suggesting the need for novel methods to combine effects of single nucleotide polymorphisms within a gene to increase the likelihood of fully characterizing the susceptibility gene. RESULTS: We developed ancGWAS, an algebraic graph-based centrality measure that accounts for linkage disequilibrium in identifying significant disease sub-networks by integrating the association signal from GWAS data sets into the human protein-protein interaction (PPI) network. We validated ancGWAS using an association study result from a breast cancer data set and the simulation of interactive disease loci in a simulated complex admixed population, as well as pathway-based GWAS simulation. This new approach holds promise for deconvoluting the interactions between genes underlying the pathogenesis of complex diseases. Results obtained yield a novel central breast cancer sub-network of the human interactome implicated in the proteoglycan syndecan-mediated signaling events pathway, which is known to play a major role in mesenchymal tumor cell proliferation, thus providing further insights into breast cancer pathogenesis. AVAILABILITY AND IMPLEMENTATION: The ancGWAS package and documents are available at http://www.cbio.uct.ac.za/~emile/software.html.


Subjects
Breast Neoplasms/pathology ; Genetics, Population ; Genome-Wide Association Study/methods ; Polymorphism, Single Nucleotide/genetics ; Protein Interaction Mapping/methods ; Signal Transduction ; Software ; Breast Neoplasms/epidemiology ; Breast Neoplasms/genetics ; Female ; Gene Regulatory Networks ; Genetic Predisposition to Disease ; Humans ; Linkage Disequilibrium
17.
Bioinformatics ; 32(3): 477-9, 2016 Feb 01.
Article in English | MEDLINE | ID: mdl-26476781

ABSTRACT

SUMMARY: Gene Ontology (GO) semantic similarity measures are being used for biological knowledge discovery based on GO annotations by integrating biological information contained in the GO structure into data analyses. To empower users to quickly compute, manipulate and explore these measures, we introduce A-DaGO-Fun (ADaptable Gene Ontology semantic similarity-based Functional analysis). It is a portable software package integrating all known GO information content-based semantic similarity measures and relevant biological applications associated with these measures. A-DaGO-Fun has the advantage not only of handling datasets from the current high-throughput genome-wide applications, but also allowing users to choose the most relevant semantic similarity approach for their biological applications and to adapt a given module to their needs. AVAILABILITY AND IMPLEMENTATION: A-DaGO-Fun is freely available to the research community at http://web.cbio.uct.ac.za/ITGOM/adagofun. It is implemented in Linux using Python under free software (GNU General Public Licence). CONTACT: gmazandu@cbio.uct.ac.za or Nicola.Mulder@uct.ac.za SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods ; Gene Ontology ; Genes ; Molecular Sequence Annotation/methods ; Proteins/genetics ; Semantics ; Software ; Databases, Factual ; Humans
18.
BMC Bioinformatics ; 15: 129, 2014 May 06.
Article in English | MEDLINE | ID: mdl-24885165

ABSTRACT

BACKGROUND: Interaction between proteins is one of the most important mechanisms in the execution of cellular functions. The study of these interactions has provided insight into the functioning of an organism's processes. As of October 2013, Homo sapiens had over 170,000 protein-protein interactions (PPI) registered in the Interologous Interaction Database, which is only one of the many public resources where protein interactions can be accessed. These numbers exemplify the volume of data that research on the topic has generated. Visualization of large data sets is a well-known strategy to make sense of information, and protein interaction data is no exception. There are several tools that allow the exploration of this data, providing different methods to visualize protein network interactions. However, there is still no native web tool that allows this data to be explored interactively online. RESULTS: Given the advances that web technologies have made recently, it is time to bring these interactive views to the web to provide an easily accessible forum to visualize PPI. We have created a Web-based Protein Interaction Network Visualizer: PINV, an open source, native web application that facilitates the visualization of protein interactions (http://biosual.cbio.uct.ac.za/pinv.html). We developed PINV as a set of components that follow the protocol defined in BioJS and use the D3 library to create the graphic layouts. We demonstrate the use of PINV with multi-organism interaction networks for a predicted target from Mycobacterium tuberculosis, its interacting partners and its orthologs. CONCLUSIONS: The resultant tool provides an attractive view of complex, fully interactive networks with components that allow the querying, filtering and manipulation of the visible subset. Moreover, as a web resource, PINV simplifies sharing and publishing, activities which are vital in today's collaborative research environments. The source code is freely available for download at https://github.com/4ndr01d3/biosual.


Subjects
Protein Interaction Maps ; Software ; Computer Graphics ; Humans ; Internet ; Protein Interaction Mapping
19.
BMC Bioinformatics ; 14: 284, 2013 Sep 25.
Article in English | MEDLINE | ID: mdl-24067102

ABSTRACT

BACKGROUND: The use of Gene Ontology (GO) data in protein analyses has largely contributed to the improved outcomes of these analyses. Several GO semantic similarity measures have been proposed in recent years and provide tools that allow the integration of biological knowledge embedded in the GO structure into different biological analyses. There is a need for a unified tool that provides the scientific community with the opportunity to explore these different GO similarity measure approaches and their biological applications. RESULTS: We have developed DaGO-Fun, an online tool available at http://web.cbio.uct.ac.za/ITGOM, which incorporates many different GO similarity measures for exploring, analyzing and comparing GO terms and proteins within the context of GO. It uses GO data and UniProt proteins with their GO annotations as provided by the Gene Ontology Annotation (GOA) project to precompute GO term information content (IC), enabling rapid response to user queries. CONCLUSIONS: The DaGO-Fun online tool presents the advantage of integrating all the relevant IC-based GO similarity measures, including topology- and annotation-based approaches, to facilitate effective exploration of these measures, thus enabling users to choose the most relevant approach for their application. Furthermore, this tool includes several biological applications related to GO semantic similarity scores, including the retrieval of genes based on their GO annotations, the clustering of functionally related genes within a set, and term enrichment analysis.


Subjects
Computational Biology/methods ; Gene Ontology ; Molecular Sequence Annotation/methods ; Software ; Cluster Analysis ; Databases, Genetic ; Genes/genetics ; Proteins/genetics ; Semantics
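
The precomputed information content (IC) mentioned in the abstract above can be illustrated with one common annotation-based definition, IC(t) = -log p(t), where p(t) is the fraction of annotations falling on term t or its descendants, together with Resnik's similarity (the IC of the most informative common ancestor). This is a minimal sketch and not DaGO-Fun's implementation: the toy ontology, the counts and the function names are hypothetical, and each protein is assumed to be annotated directly to a single term.

```python
import math

# Hypothetical toy ontology: term -> set of parent terms (not real GO IDs).
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "C": {"A"},
    "D": {"A", "B"},
}
# Hypothetical number of proteins annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "A": 2, "B": 3, "C": 4, "D": 1}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """Compute IC(t) = -log p(t) from cumulative annotation counts."""
    cumulative = {term: 0 for term in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        # An annotation to a term also counts for every ancestor.
        for anc in ancestors(term):
            cumulative[anc] += count
    total = cumulative["root"]
    return {t: -math.log(c / total) for t, c in cumulative.items() if c > 0}


IC = information_content()


def resnik(term_a, term_b):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(IC[t] for t in common)
```

Computing the IC table once and reusing it for every query is what makes pairwise similarity lookups cheap, which matches the abstract's point about precomputation enabling rapid responses.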
20.
Article in English | MEDLINE | ID: mdl-39247216

ABSTRACT

Background: Sickle cell disease (SCD) is a severe hereditary form of anemia that contributes between 50% and 80% of under-five mortality in Africa. Eleven thousand babies are born with SCD annually in Tanzania, which ranks 4th after Nigeria, the Democratic Republic of Congo and India. The absence of well-described SCD cohorts is a major barrier to health research in SCD in Africa. Objective: This paper describes the Sickle Pan African Consortium (SPARCO) database in Tanzania, covering its development, the design of the study instruments, data collection, analysis of data and management of data quality issues. Methods: The SPARCO registry used existing Muhimbili Sickle Cell Cohort (MSC) study case report forms (CRF) and later harmonized data elements from the SickleInAfrica consortium to develop Research Electronic Data Capture (REDCap) instruments. Patients were enrolled through various strategies, including mass screening following media sensitization and health education events during World Sickle Cell Day each June and the SCD awareness month in September. Additional patients were identified through active surveillance of previously participating patients in the MSC. Results: Three thousand eight hundred patients were enrolled between October 2017 and May 2021. Of these, 1,946 (51.21%) were males and 1,864 (48.79%) were females. The hemoglobin phenotype distribution was 3,762 (99%) HbSS, 3 (0.08%) HbSC and 35 (0.92%) HbSβ+ thalassemia. Hemoglobin levels, admission history, blood transfusion and painful events were recorded from December 2017 to May 2021. Conclusion: The Tanzania SPARCO registry will improve healthcare for SCD in Africa through the facilitation of collaborative data-driven research for SCD.
