GOMCL: a toolkit to cluster, evaluate, and extract non-redundant associations of Gene Ontology-based functions.
BMC Bioinformatics
; 21(1): 139, 2020 Apr 10.
Article
em En
| MEDLINE
| ID: mdl-32272889
BACKGROUND: Functional enrichment of genes and pathways based on Gene Ontology (GO) has been widely used to describe the results of various -omics analyses. GO terms statistically overrepresented within a set of a large number of genes are typically used to describe the main functional attributes of the gene set. However, these lists of overrepresented GO terms are often too large and contains redundant overlapping GO terms hindering informative functional interpretations. RESULTS: We developed GOMCL to reduce redundancy and summarize lists of GO terms effectively and informatively. This lightweight python toolkit efficiently identifies clusters within a list of GO terms using the Markov Clustering (MCL) algorithm, based on the overlap of gene members between GO terms. GOMCL facilitates biological interpretation of a large number of GO terms by condensing them into GO clusters representing non-overlapping functional themes. It enables visualizing GO clusters as a heatmap, networks based on either overlap of members or hierarchy among GO terms, and tables with depth and cluster information for each GO term. Each GO cluster generated by GOMCL can be evaluated and further divided into non-overlapping sub-clusters using the GOMCL-sub module. The outputs from both GOMCL and GOMCL-sub can be imported to Cytoscape for additional visualization effects. CONCLUSIONS: GOMCL is a convenient toolkit to cluster, evaluate, and extract non-redundant associations of Gene Ontology-based functions. GOMCL helps researchers to reduce time spent on manual curation of large lists of GO terms, minimize biases introduced by redundant GO terms in data interpretation, and batch processing of multiple GO enrichment datasets. A user guide, a test dataset, and the source code of GOMCL are available at https://github.com/Guannan-Wang/GOMCL and www.lsugenomics.org.
Palavras-chave
Texto completo:
1
Coleções:
01-internacional
Base de dados:
MEDLINE
Assunto principal:
Interface Usuário-Computador
/
Ontologia Genética
Tipo de estudo:
Health_economic_evaluation
/
Risk_factors_studies
Idioma:
En
Revista:
BMC Bioinformatics
Assunto da revista:
INFORMATICA MEDICA
Ano de publicação:
2020
Tipo de documento:
Article
País de afiliação:
Estados Unidos
País de publicação:
Reino Unido