Practical Approaches for Mining Frequent Patterns in Molecular Datasets.

Naulaerts, Stefan; Moens, Sandy; Engelen, Kristof; Berghe, Wim Vanden; Goethals, Bart; Laukens, Kris; Meysman, Pieter

Affiliation

Naulaerts S; Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium; Biomedical Informatics Research Center Antwerpen (Biomina), University of Antwerp/Antwerp University Hospital, Antwerp, Belgium.
Moens S; Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium.
Engelen K; Department of Computational Biology, Fondazione Edmund Mach, San Michele all'Adige, Trento, Italy.
Berghe WV; Department of Biomedical Sciences, Laboratory of Protein Science, Proteomics and Epigenetic Signaling (PPES), University of Antwerp, Antwerp, Belgium.
Goethals B; Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium.
Laukens K; Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium; Biomedical Informatics Research Center Antwerpen (Biomina), University of Antwerp/Antwerp University Hospital, Antwerp, Belgium.
Meysman P; Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium; Biomedical Informatics Research Center Antwerpen (Biomina), University of Antwerp/Antwerp University Hospital, Antwerp, Belgium.

Bioinform Biol Insights; 10: 37-47, 2016.

Article in English | MEDLINE | ID: mdl-27168722

ABSTRACT

Pattern detection is an inherent task in the analysis and interpretation of complex and continuously accumulating biological data. Numerous itemset mining algorithms have been developed in the last decade to efficiently detect specific pattern classes in data. Although many of these have proven their value for addressing bioinformatics problems, several factors still keep promising algorithms from gaining popularity in the life science community. Many of these issues stem from the low user-friendliness of these tools and the complexity of their output, which is often large, static, and consequently hard to interpret. Here, we apply three software implementations to common bioinformatics problems and illustrate some of the advantages and disadvantages of each, as well as inherent pitfalls of biological data mining. Frequent itemset mining exists in many different flavors, and users should choose their software based on their research question, programming proficiency, and the added value of extra features.

Keywords

Mycobacterium tuberculosis; frequent itemset mining; gene expression; protein domain structure; protein-protein interaction

Full text: 1 | Database: MEDLINE | Language: En | Publication year: 2016 | Document type: Article
