PhyloCorrelate: inferring bacterial gene-gene functional associations through large-scale phylogenetic profiling.
Bioinformatics; 37(1): 17-22, 2021 Apr 09.
Article in English | MEDLINE | ID: mdl-33416870
ABSTRACT
MOTIVATION: Statistical detection of co-occurring genes across genomes, known as 'phylogenetic profiling', is a powerful bioinformatic technique for inferring gene-gene functional associations. However, this can be a challenging task given the size and complexity of phylogenomic databases, difficulty in accounting for phylogenetic structure, inconsistencies in genome annotation and substantial computational requirements.
RESULTS: We introduce PhyloCorrelate, a computational framework for gene co-occurrence analysis across large phylogenomic datasets. PhyloCorrelate implements a variety of co-occurrence metrics including standard correlation metrics and model-based metrics that account for phylogenetic history. By combining multiple metrics, we developed an optimized score that exhibits a superior ability to link genes with overlapping GO terms and KEGG pathways, enabling gene function prediction. Using genomic and functional annotation data from the Genome Taxonomy Database and AnnoTree, we performed all-by-all comparisons of gene occurrence profiles across the bacterial tree of life, totaling 154 217 052 comparisons for 28 315 genes across 27 372 bacterial genomes. All predictions are available in an online database, which instantaneously returns the top correlated genes for any PFAM, TIGRFAM or KEGG query. In total, PhyloCorrelate detected 29 762 high confidence associations between bacterial gene/protein pairs, and generated functional predictions for 834 DUFs and proteins of unknown function.
AVAILABILITY AND IMPLEMENTATION: PhyloCorrelate is available as a web-server at phylocorrelate.uwaterloo.ca as well as an R package for analysis of custom datasets. We anticipate that PhyloCorrelate will be broadly useful as a tool for predicting function and interactions for gene families.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
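ILLUSTRATIVE SKETCH: The idea behind phylogenetic profiling can be shown with a small example. The Python code below is not PhyloCorrelate's implementation (the tool itself is distributed as an R package and web server); it is a minimal sketch that assumes each gene is represented as a binary presence/absence vector over the same ordered set of genomes. The gene names, the toy profiles and the helper functions `jaccard`, `pearson` and `top_correlated` are hypothetical and chosen only for illustration. The two metrics shown are standard correlation-style measures; they do not account for phylogenetic history, which the abstract says the model-based metrics address.

```python
from math import sqrt


def jaccard(a, b):
    """Jaccard index of two binary presence/absence profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def pearson(a, b):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


# Hypothetical toy data: 1 = gene present in that genome, 0 = absent.
# Every profile covers the same six genomes in the same order.
profiles = {
    "geneA": [1, 1, 0, 0, 1, 0],
    "geneB": [1, 1, 0, 0, 1, 0],  # identical distribution to geneA
    "geneC": [0, 0, 1, 1, 0, 1],  # exact complement of geneA
    "geneD": [1, 0, 0, 0, 1, 1],  # partial overlap with geneA
}


def top_correlated(query, profiles, metric=jaccard):
    """Rank all other genes by similarity of their profile to the query's."""
    scores = [
        (other, metric(profiles[query], vec))
        for other, vec in profiles.items()
        if other != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for gene, score in top_correlated("geneA", profiles):
        print(gene, round(score, 3))
```

In this toy setting, a gene whose distribution across genomes matches the query's ranks first, which is the signal used to propose a functional association. On real data, closely related genomes share genes through common ancestry, so naive scores like these can overstate associations.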
Full text: 1
Database: MEDLINE
Study type: Prognostic_studies / Risk_factors_studies
Language: En
Journal: Bioinformatics
Journal subject: MEDICAL INFORMATICS
Publication year: 2021
Document type: Article
Affiliation country: Canada