Bipartite tight spectral clustering (BiTSC) algorithm for identifying conserved gene co-clusters in two species.
Sun, Yidan Eden; Zhou, Heather J; Li, Jingyi Jessica.
Affiliation
  • Sun YE; Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA.
  • Zhou HJ; Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA.
  • Li JJ; Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA.
Bioinformatics; 37(9): 1225-1233, 2021 06 09.
Article in En | MEDLINE | ID: mdl-32814973
MOTIVATION: Gene clustering is a widely used technique that has enabled computational prediction of unknown gene functions within a species. However, it remains a challenge to refine gene function prediction by leveraging evolutionarily conserved genes in another species. This challenge calls for a new computational algorithm to identify gene co-clusters in two species, so that genes in each co-cluster exhibit similar expression levels in each species and strong conservation between the species.
RESULTS: Here, we develop the bipartite tight spectral clustering (BiTSC) algorithm, which identifies gene co-clusters in two species based on gene orthology information and gene expression data. BiTSC implements a novel formulation that encodes gene orthology as a bipartite network and gene expression data as node covariates. This formulation allows BiTSC to adopt and combine the advantages of multiple unsupervised learning techniques: kernel enhancement, bipartite spectral clustering, consensus clustering, tight clustering and hierarchical clustering. As a result, BiTSC is a flexible and robust algorithm capable of identifying informative gene co-clusters without forcing all genes into co-clusters. Another advantage of BiTSC is that it does not rely on any distributional assumptions. Beyond cross-species gene co-clustering, BiTSC also has wide applications as a general algorithm for identifying tight node co-clusters in any bipartite network with node covariates. We demonstrate the accuracy and robustness of BiTSC through comprehensive simulation studies. In a real data example, we use BiTSC to identify conserved gene co-clusters of Drosophila melanogaster and Caenorhabditis elegans, and we perform a series of downstream analyses to both validate BiTSC and verify the biological significance of the identified co-clusters.
AVAILABILITY AND IMPLEMENTATION: The Python package BiTSC is open-access and available at https://github.com/edensunyidan/BiTSC.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
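The input formulation described in the abstract (orthology as a bipartite network, expression as node covariates) can be illustrated with a minimal sketch. This is not the BiTSC package's API or data format: the gene names, the expression values, and the helper functions `build_bipartite_adjacency` and `pearson` are all hypothetical, and Pearson correlation is used here only as one plausible way to compare expression vectors within a species.

```python
# Illustrative sketch only: hypothetical gene names and toy values,
# not the BiTSC package's API or data format.

def build_bipartite_adjacency(genes_a, genes_b, ortholog_pairs):
    """Return a len(genes_a) x len(genes_b) 0/1 matrix of orthology edges."""
    index_a = {gene: i for i, gene in enumerate(genes_a)}
    index_b = {gene: j for j, gene in enumerate(genes_b)}
    adjacency = [[0] * len(genes_b) for _ in genes_a]
    for gene_a, gene_b in ortholog_pairs:
        adjacency[index_a[gene_a]][index_b[gene_b]] = 1
    return adjacency


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Two sides of the bipartite network (hypothetical genes from two species).
fly_genes = ["fly1", "fly2", "fly3"]
worm_genes = ["worm1", "worm2"]

# Edges: one (fly gene, worm gene) pair per orthology relationship.
orthologs = [("fly1", "worm1"), ("fly2", "worm1"), ("fly3", "worm2")]

# Node covariates: one expression vector per gene, measured in its own species.
fly_expression = {
    "fly1": [1.0, 2.0, 3.0],
    "fly2": [2.0, 4.0, 6.0],
    "fly3": [3.0, 2.0, 1.0],
}

adjacency = build_bipartite_adjacency(fly_genes, worm_genes, orthologs)
sim_12 = pearson(fly_expression["fly1"], fly_expression["fly2"])
sim_13 = pearson(fly_expression["fly1"], fly_expression["fly3"])
```

In this toy input, fly1 and fly2 share an ortholog and have perfectly correlated expression, so they would be natural candidates for the same co-cluster as worm1, whereas fly3 differs on both counts. The clustering steps themselves (kernel enhancement, spectral clustering, consensus and tight clustering) are not shown.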
Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Gene Expression Profiling / Drosophila melanogaster | Type of study: Prognostic_studies | Limits: Animals | Language: En | Journal: Bioinformatics | Journal subject: Medical Informatics | Year: 2021 | Type: Article | Affiliation country: United States
