Search | BVS Regional Portal

RASMA: a reverse search algorithm for mining maximal frequent subgraphs.

Salem, Saeed; Alokshiya, Mohammed; Hasan, Mohammad Al.

BioData Min ; 14(1): 19, 2021 Mar 16.

Article in English | MEDLINE | ID: mdl-33726790

ABSTRACT

BACKGROUND: Given a collection of coexpression networks over a set of genes, identifying subnetworks that appear frequently is an important research problem known as mining frequent subgraphs. Maximal frequent subgraphs are a representative set of frequent subgraphs; a frequent subgraph is maximal if it does not have a super-graph that is frequent. In the bioinformatics discipline, methodologies for mining frequent and/or maximal frequent subgraphs can be used to discover interesting network motifs that elucidate complex interactions among genes, reflected through the edges of the frequent subnetworks. Further study of frequent coexpression subnetworks enhances the discovery of biological modules and biological signatures for gene expression and disease classification. RESULTS: We propose a reverse search algorithm, called RASMA, for mining frequent and maximal frequent subgraphs in a given collection of graphs. A key innovation in RASMA is a connected subgraph enumerator that uses a reverse-search strategy to enumerate connected subgraphs of an undirected graph. Using this enumeration strategy, RASMA obtains all maximal frequent subgraphs very efficiently. To overcome the computationally prohibitive task of enumerating all frequent subgraphs while mining for the maximal frequent subgraphs, RASMA employs several pruning strategies that substantially improve its overall runtime performance. Experimental results show that on large gene coexpression networks, the proposed algorithm efficiently mines biologically relevant maximal frequent subgraphs. CONCLUSION: Extracting recurrent gene coexpression subnetworks from multiple gene expression experiments enables the discovery of functional modules and subnetwork biomarkers. We have proposed a reverse search algorithm for mining maximal frequent subnetworks. Enrichment analysis of the extracted maximal frequent subnetworks reveals that subnetworks that are frequent are highly enriched with known biological ontologies.
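To make the definitions in the abstract above concrete, the following is a minimal brute-force sketch of what "maximal frequent subgraph" means. It is not the RASMA algorithm: it uses neither reverse search nor pruning, and it is only practical for tiny inputs. It assumes that all networks share the same gene labels, so a subgraph can be identified by its edge set, and that a subgraph is frequent when it occurs in at least `min_support` networks. The function names and the example data are hypothetical.

```python
from itertools import combinations


def is_connected(edges):
    """Return True if the given edge set forms one connected component."""
    edges = list(edges)
    if not edges:
        return False
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    start = edges[0][0]
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adj[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == len(adj)


def maximal_frequent_subgraphs(networks, min_support):
    """Brute force: enumerate connected edge subsets, keep the frequent
    ones, then keep those with no frequent connected proper superset."""
    # Normalise undirected edges so that (u, v) and (v, u) compare equal.
    nets = [frozenset(tuple(sorted(e)) for e in net) for net in networks]
    all_edges = sorted(set().union(*nets))
    # A frequent subgraph can only contain edges that are frequent themselves.
    frequent_edges = [
        e for e in all_edges if sum(e in n for n in nets) >= min_support
    ]
    frequent = []
    for k in range(1, len(frequent_edges) + 1):
        for combo in combinations(frequent_edges, k):
            candidate = frozenset(combo)
            support = sum(candidate <= n for n in nets)
            if support >= min_support and is_connected(candidate):
                frequent.append(candidate)
    return [s for s in frequent if not any(s < t for t in frequent)]


# Hypothetical toy coexpression networks over genes A..E.
networks = [
    [("A", "B"), ("B", "C"), ("C", "D")],
    [("A", "B"), ("B", "C"), ("D", "E")],
    [("B", "C"), ("C", "D"), ("A", "B")],
]
result = maximal_frequent_subgraphs(networks, min_support=2)
print([sorted(s) for s in result])
# prints [[('A', 'B'), ('B', 'C'), ('C', 'D')]]
```

The path A-B-C-D occurs in two of the three networks, so every smaller frequent subgraph is contained in it and only that one maximal subgraph is reported. The exponential enumeration here is exactly the cost that, according to the abstract, RASMA's reverse-search enumerator and pruning strategies are designed to avoid.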

GPU Accelerated Browser for Neuroimaging Genomics.

Zigon, Bob; Li, Huang; Yao, Xiaohui; Fang, Shiaofen; Hasan, Mohammad Al; Yan, Jingwen; Moore, Jason H; Saykin, Andrew J; Shen, Li.

Neuroinformatics ; 16(3-4): 393-402, 2018 Oct.

Article in English | MEDLINE | ID: mdl-29691798

ABSTRACT

Neuroimaging genomics is an emerging field that provides exciting opportunities to understand the genetic basis of brain structure and function. The unprecedented scale and complexity of the imaging and genomics data, however, have presented critical computational bottlenecks. In this work we present our initial efforts towards building an interactive visual exploratory system for mining big data in neuroimaging genomics. A GPU-accelerated browsing tool for neuroimaging genomics is created that implements the ANOVA algorithm for single nucleotide polymorphism (SNP)-based analysis and the VEGAS algorithm for gene-based analysis, and executes them at interactive rates. The ANOVA algorithm is 110 times faster than the 4-core OpenMP version, while the VEGAS algorithm is 375 times faster than its 4-core OpenMP counterpart. This approach lays a solid foundation for researchers to address the challenges of mining large-scale imaging genomics datasets via interactive visual exploration.

Subjects

Brain/diagnostic imaging; Data Mining/methods; Genomics/methods; Neuroimaging/methods; Polymorphism, Single Nucleotide/genetics; Web Browser; Algorithms; Alzheimer Disease/diagnostic imaging; Alzheimer Disease/genetics; Humans

Pathway and network analysis in proteomics.

Wu, Xiaogang; Hasan, Mohammad Al; Chen, Jake Yue.

J Theor Biol ; 362: 44-52, 2014 Dec 07.

Article in English | MEDLINE | ID: mdl-24911777

ABSTRACT

Proteomics is inherently a systems science that studies not only measured proteins and their expression in a cell, but also the interplay of proteins, protein complexes, signaling pathways, and network modules. There has been a rapid accumulation of Proteomics data in recent years. However, Proteomics data are highly variable, with results sensitive to data preparation methods, sample condition, instrument types, and analytical methods. To address the challenge in Proteomics data analysis, we review current tools being developed to incorporate biological function and network topological information. We categorize these tools into four types: tools with basic functional information and few topological features (e.g., GO category analysis), tools with rich functional information and few topological features (e.g., GSEA), tools with basic functional information and rich topological features (e.g., Cytoscape), and tools with rich functional information and rich topological features (e.g., PathwayExpress). We first review the potential application of these tools to Proteomics; then we review tools that can achieve automated learning of pathway modules and features, and tools that help perform integrated network visual analytics.

Subjects

Proteomics/methods; Algorithms; Animals; Colorectal Neoplasms/metabolism; Computational Biology; Databases, Protein; Female; Gene Expression Regulation; Humans; Male; Multigene Family; Ovarian Neoplasms/metabolism; Pattern Recognition, Automated; Prostatic Neoplasms/metabolism; Protein Array Analysis; Protein Interaction Mapping/methods; Proteins/metabolism; Software; Systems Biology
