A multi-view genomic data simulator.

Fratello, Michele; Serra, Angela; Fortino, Vittorio; Raiconi, Giancarlo; Tagliaferri, Roberto; Greco, Dario

Affiliation

Fratello M; Department of Medical, Surgical, Neurological, Metabolic and Ageing Sciences, Second University of Napoli, Napoli, Italy. michele.fratello@unina2.it.
Serra A; Department of Computer Science, Fisciano, Italy. michele.fratello@unina2.it.
Fortino V; Department of Computer Science, Fisciano, Italy. aserra@unisa.it.
Raiconi G; Unit of Systems Toxicology and Nanosafety Research Centre, Finnish Institute of Occupational Health, FIOH, Helsinki, Finland. vittorio.fortino@ttl.fi.
Tagliaferri R; Department of Computer Science, Fisciano, Italy. gianni@unisa.it.
Greco D; Department of Computer Science, Fisciano, Italy. rtagliaferr@unisa.it.

BMC Bioinformatics; 16: 151, 2015 May 12.

Article in English | MEDLINE | ID: mdl-25962835

ABSTRACT

BACKGROUND:

OMICs technologies make it possible to assay the state of a large number of different features (e.g., mRNA expression, miRNA expression, copy number variation, DNA methylation) from the same samples. The objective of these experiments is usually to find a reduced set of significant features that can be used to differentiate the conditions assayed. For the development of novel computational feature selection methods, this task is challenging because of the lack of fully annotated biological datasets to be used for benchmarking. One way to tackle this problem is to generate appropriate synthetic datasets whose composition and behaviour are fully controlled and known a priori.

RESULTS:

Here we propose a novel method centred on the generation of networks of interactions among different biological molecules, especially those involved in regulating gene expression. Synthetic datasets are obtained from models based on ordinary differential equations with known parameters. Our results show that the generated datasets closely mimic the behaviour of real data, since popular data analysis methods are able to selectively identify the existing interactions.
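As a rough illustration of the idea of obtaining synthetic data from an ordinary differential equation model with known parameters, the Python sketch below integrates a toy three-molecule network in which one miRNA represses one mRNA. It is not the MVBioDataSim implementation or its API: the function name `simulate`, the sigmoid production term, the Euler integration scheme, and all parameter values are assumptions chosen only for this example.

```python
import math


def simulate(weights, max_rate, decay, x0, dt=0.01, steps=5000):
    """Euler-integrate dx_i/dt = max_rate_i * sigmoid(sum_j w[i][j] * x_j) - decay_i * x_i.

    weights[i][j] is the (assumed) regulatory effect of molecule j on molecule i:
    negative values repress, positive values activate, zero means no interaction.
    """
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        dx = []
        for i in range(n):
            drive = sum(weights[i][j] * x[j] for j in range(n))
            production = max_rate[i] / (1.0 + math.exp(-drive))
            dx.append(production - decay[i] * x[i])
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x


# Hypothetical toy network: index 0 = miRNA, index 1 = target mRNA, index 2 = unrelated mRNA.
max_rate = [2.0, 2.0, 2.0]
decay = [1.0, 1.0, 1.0]
x0 = [0.0, 0.0, 0.0]

no_edges = [[0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0]]
with_repression = [[0.0, 0.0, 0.0],
                   [-2.0, 0.0, 0.0],  # known ground truth: miRNA (0) represses mRNA (1)
                   [0.0, 0.0, 0.0]]

baseline = simulate(no_edges, max_rate, decay, x0)
regulated = simulate(with_repression, max_rate, decay, x0)
```

Because the interaction matrix is fixed in advance, the only difference between `baseline` and `regulated` is the planted repression edge, so an analysis method can be scored against a ground truth that is known exactly.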

CONCLUSIONS:

The proposed method can be used in conjunction with real biological datasets in the assessment of data mining techniques. Its main strength lies in the full control over the simulated data while retaining coherence with real biological processes. The R package MVBioDataSim is freely available to the scientific community at http://neuronelab.unisa.it/?p=1722.

Subjects

Algorithms; Computational Biology/methods; Computer Simulation; Gene Expression Profiling/methods; Gene Regulatory Networks; Genomics/methods; DNA Copy Number Variations; DNA Methylation; Datasets as Topic; Gene Expression Regulation; Humans; MicroRNAs/genetics

Full text: 1 Database: MEDLINE Main subject: Algorithms / Computer Simulation / Computational Biology / Gene Expression Profiling / Genomics / Gene Regulatory Networks Limit: Humans Language: En Year of publication: 2015 Document type: Article
