Quality versus accuracy: result of a reanalysis of protein-binding microarrays from the DREAM5 challenge by using BayesPI2 including dinucleotide interdependence.
Wang, Junbai.
Affiliation
  • Wang J; Pathology Department, Oslo University Hospital - Norwegian Radium Hospital, Montebello, Oslo, 0310, Norway. junbai.wang@rr-research.no.
BMC Bioinformatics ; 15: 289, 2014 Aug 27.
Article in English | MEDLINE | ID: mdl-25158938
ABSTRACT

BACKGROUND:

Computational modeling of transcription factor (TF) sequence specificity is an important research topic in regulatory genomics. A systematic comparison of 26 algorithms for learning TF-DNA binding specificity from in vitro protein-binding microarray (PBM) data was published recently, but the quality of the PBMs examined was not fully evaluated.

RESULTS:

Here, new quality-control parameters, such as the principal component analysis (PCA) ellipse, are proposed to assess data quality for either single or paired PBMs. Additionally, a biophysical model of TF-DNA interactions that includes adjacent dinucleotide interdependence was implemented in a new program, BayesPI2, in which sparse Bayesian learning and a relevance vector machine are used to estimate the unknown model parameters. Then, 66 mouse TFs from the DREAM5 challenge were classified into two groups (i.e. good vs. bad) based on the paired PBM quality-control parameters. Subsequently, computational methods for modeling TF sequence specificity were compared between the two groups.
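
As a rough illustration of what "adjacent dinucleotide interdependence" can mean in an energy-based model, the sketch below scores a site as a sum of per-position mononucleotide energies plus optional corrections for neighbouring base pairs, and converts the energy to an occupancy with a Fermi-Dirac-style function that is common in biophysical models. It is not the BayesPI2 implementation: the abstract does not give the functional form, and the function names, the toy energy values, and the chemical-potential parameter `mu` are all assumptions made for illustration.

```python
import math


def binding_energy(site, mono, di):
    """Sum mononucleotide energies plus adjacent-dinucleotide corrections.

    mono[i][base] : hypothetical energy of `base` at motif position i
    di[i][pair]   : hypothetical correction for the pair at positions i, i+1
                    (pairs that are not listed contribute 0.0)
    """
    if len(site) != len(mono):
        raise ValueError("site length must match the number of motif positions")
    energy = sum(mono[i][base] for i, base in enumerate(site))
    energy += sum(di[i].get(site[i:i + 2], 0.0) for i in range(len(site) - 1))
    return energy


def occupancy(energy, mu=0.0):
    """Fermi-Dirac-style binding probability; lower energy means tighter binding."""
    return 1.0 / (1.0 + math.exp(energy - mu))


# Toy 3-bp motif with made-up energies (arbitrary units), preferring "ACG".
mono = [
    {"A": -1.0, "C": 0.5, "G": 0.5, "T": 0.5},
    {"A": 0.5, "C": -1.0, "G": 0.5, "T": 0.5},
    {"A": 0.5, "C": 0.5, "G": -1.0, "T": 0.5},
]
di = [{"AC": -0.5}, {"CG": -0.5}]  # assumed favourable neighbour effects

e_indep = binding_energy("ACG", mono, [{}, {}])  # independent positions only
e_dep = binding_energy("ACG", mono, di)          # with dinucleotide terms
print(e_indep, e_dep, occupancy(e_indep), occupancy(e_dep))
```

With empty dinucleotide tables the model reduces to the position-independent case, which makes the two variants easy to compare on the same sequence.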

CONCLUSION:

Results indicate that both algorithm performance and the predicted TF-binding energy level of a motif are significantly influenced by PBM data quality, and that poor PBM data quality is linked to specific protein domains (e.g. the C2H2 DNA-binding domain). In particular, the new dinucleotide energy-dependent model (BayesPI2) offers a large improvement in prediction accuracy on test data over the simple energy-independent model for at least 21% of the analyzed TFs.
Subject(s)

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Oligonucleotides / Computational Biology / Protein Array Analysis Type of study: Prognostic_studies Limits: Animals Language: En Journal: BMC Bioinformatics Journal subject: MEDICAL INFORMATICS Year: 2014 Document type: Article Affiliation country: Norway