Results 1 - 6 of 6
1.
BMC Bioinformatics ; 15: 151, 2014 May 19.
Article in English | MEDLINE | ID: mdl-24885407

ABSTRACT

BACKGROUND: Presently, with the increasing number and complexity of available gene expression datasets, the combination of data from multiple microarray studies addressing a similar biological question is gaining importance. The analysis and integration of multiple datasets are expected to yield more reliable and robust results since they are based on a larger number of samples and the effects of the individual study-specific biases are diminished. This is supported by recent studies suggesting that important biological signals are often preserved or enhanced by multiple experiments. An approach to combining data from different experiments is the aggregation of their clusterings into a consensus or representative clustering solution which increases the confidence in the common features of all the datasets and reveals the important differences among them. RESULTS: We propose a novel generic consensus clustering technique that applies a Formal Concept Analysis (FCA) approach for the consolidation and analysis of clustering solutions derived from several microarray datasets. These datasets are initially divided into groups of related experiments with respect to a predefined criterion. Subsequently, a consensus clustering algorithm is applied to each group, resulting in a clustering solution per group. These solutions are pooled together and further analysed by employing FCA, which allows extracting valuable insights from the data and generating a gene partition over all the experiments. In order to validate the FCA-enhanced approach, two consensus clustering algorithms are adapted to incorporate the FCA analysis. Their performance is evaluated on gene expression data from a multi-experiment study examining the global cell-cycle control of fission yeast. The FCA results derived from both methods demonstrate that, although the two algorithms optimize different clustering characteristics, FCA is able to overcome and diminish these differences and preserve some relevant biological signals. CONCLUSIONS: The proposed FCA-enhanced consensus clustering technique is a general approach to the combination of clustering algorithms with FCA for deriving clustering solutions from multiple gene expression matrices. The experimental results presented herein demonstrate that it is a robust data integration technique able to produce a good-quality clustering solution that is representative of the whole set of expression matrices.
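
As a rough illustration of the pooling step (not the authors' implementation), the Python sketch below treats each gene as an object and each (group, cluster label) pair as an attribute of a formal context, then groups genes whose attribute sets (intents) coincide. The function name `partition_by_intent`, the group names and the toy labels are all hypothetical; a full FCA analysis would also build the concept lattice, which this sketch omits.

```python
from collections import defaultdict


def partition_by_intent(clusterings):
    """Group genes that share the same cluster label in every group.

    clusterings: dict mapping a group name to a dict {gene: cluster label}.
    Returns a list of gene lists, one per distinct intent.
    """
    # Only genes present in every group's clustering can be compared.
    common = set.intersection(*(set(labels) for labels in clusterings.values()))
    blocks = defaultdict(list)
    for gene in sorted(common):
        # The intent of a gene is its set of (group, cluster) attributes.
        intent = frozenset(
            (group, labels[gene]) for group, labels in clusterings.items()
        )
        blocks[intent].append(gene)
    return list(blocks.values())


# Hypothetical consensus clusterings for two groups of experiments.
clusterings = {
    "group_A": {"g1": 0, "g2": 0, "g3": 1, "g4": 1},
    "group_B": {"g1": 0, "g2": 0, "g3": 1, "g4": 0},
}
partition = partition_by_intent(clusterings)
print(partition)  # [['g1', 'g2'], ['g3'], ['g4']]
```

In this toy context g1 and g2 are clustered together in both groups and therefore stay together, while g3 and g4 are separated because the two groups disagree about them.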


Subjects
Algorithms, Gene Expression Profiling/methods, Oligonucleotide Array Sequence Analysis/methods, Cluster Analysis
2.
Bioinformatics ; 24(16): i63-9, 2008 Aug 15.
Article in English | MEDLINE | ID: mdl-18689841

ABSTRACT

SUMMARY: A novel integration approach targeting the combination of multi-experiment time series expression data is proposed. A recursive hybrid aggregation algorithm is initially employed to extract a set of genes, which are eventually of interest for the biological phenomenon under study. Next, a hierarchical merge procedure is specifically developed for the purpose of fusing together the multiple-experiment expression profiles of the selected genes. This employs dynamic time warping alignment techniques in order to account adequately for the potential phase shift between the different experiments. We subsequently demonstrate that the resulting gene expression profiles consistently reflect the behavior of the original expression profiles in the different experiments. SUPPLEMENTARY INFORMATION: Supplementary data are available at http://www.tu-plovdiv.bg/Container/bi/DataIntegration/


Subjects
Algorithms, Bacterial Proteins/metabolism, Genetic Databases, Gene Expression Profiling/methods, Information Storage and Retrieval/methods, Oligonucleotide Array Sequence Analysis/methods, Cell Cycle/physiology, Cell Cycle Proteins/metabolism, Saccharomyces/cytology, Saccharomyces/metabolism, Time Factors
3.
Anal Chem ; 80(10): 3783-90, 2008 May 15.
Article in English | MEDLINE | ID: mdl-18419139

ABSTRACT

As with every -omics technology, metabolomics requires new methodologies for data processing. Due to the large spectral size, a standard approach in NMR-based metabolomics implies the division of spectra into equally sized bins, thereby simplifying subsequent data analysis. Yet, disadvantages are the loss of information and the occurrence of artifacts caused by peak shifts. Here, a new binning algorithm, Adaptive Intelligent Binning (AI-Binning), which largely circumvents these problems, is presented. AI-Binning recursively identifies bin edges in existing bins, requires only minimal user input, and avoids the use of arbitrary parameters or reference spectra. The performance of AI-Binning is demonstrated using serum spectra from 40 hypertensive and 40 matched normotensive subjects from the Asklepios study. Hypertension is a major cardiovascular risk factor characterized by a complex biochemistry and, in most cases, an unknown origin. The binning algorithm resulted in an improved classification of hypertensive status compared with that of standard binning and facilitated the identification of relevant metabolites. Moreover, since the occurrence of noise variables is largely avoided, AI-Binned spectra can be unit-variance scaled. This enables the detection of relevant, low-intensity metabolites. These results demonstrate the power of AI-Binning and suggest the involvement of alpha-1 acid glycoproteins and choline biochemistry in hypertension.
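
The standard equal-width binning that this abstract takes as its baseline can be sketched as follows. This is not AI-Binning itself, whose edge-finding rule is not described here; the function name `equal_width_binning`, the bin width and the toy spectra are assumptions chosen only to show how a small peak shift across a fixed bin edge produces the artifact mentioned above.

```python
import numpy as np


def equal_width_binning(ppm, intensity, bin_width):
    """Sum spectral intensities inside consecutive fixed-width bins."""
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    start = ppm.min()
    n_bins = max(1, int(np.ceil((ppm.max() - start) / bin_width)))
    edges = start + bin_width * np.arange(n_bins + 1)
    # Assign each point to a bin; clip so the last point stays in the last bin.
    idx = np.clip(np.digitize(ppm, edges) - 1, 0, n_bins - 1)
    return edges, np.bincount(idx, weights=intensity, minlength=n_bins)


# Toy chemical-shift axis and two spectra with the same peak, slightly shifted.
ppm = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
spectrum_a = [0, 0, 0, 5, 0, 0, 0, 0]  # peak at 1.5 ppm
spectrum_b = [0, 0, 0, 0, 5, 0, 0, 0]  # same peak shifted to 2.0 ppm

_, binned_a = equal_width_binning(ppm, spectrum_a, bin_width=1.0)
_, binned_b = equal_width_binning(ppm, spectrum_b, bin_width=1.0)
print(binned_a)  # [0. 5. 0. 0.]
print(binned_b)  # [0. 0. 5. 0.]
```

The same peak lands in different bins in the two spectra, so a downstream classifier would see two unrelated variables. Placing bin edges adaptively, as the abstract describes, is intended to avoid this.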


Subjects
Algorithms, Hypertension/metabolism, Magnetic Resonance Spectroscopy/methods, Biomarkers/metabolism, Humans
4.
Bioinformatics ; 23(2): e64-70, 2007 Jan 15.
Article in English | MEDLINE | ID: mdl-17237107

ABSTRACT

MOTIVATION: The validity of periodic cell cycle regulation studies in plants is seriously compromised by the relatively poor quality of cell synchrony that is achieved for plant suspension cultures in comparison to yeast and mammals. The present state-of-the-art plant synchronization techniques cannot offer a complete cell cycle coverage and moreover a considerable loss of cell synchrony may occur toward the end of the sampling. One possible solution is to consider combining multiple datasets, produced by different synchronization techniques and thus covering different phases of the cell cycle, in order to arrive at a better cell cycle coverage. RESULTS: We propose a method that enables pasting expression profiles from different plant cell synchronization experiments and results in an expression curve that spans more than one cell cycle. The optimal pasting overlap is determined via a dynamic time warping alignment. Consequently, the different expression time series are merged together by aggregating the corresponding expression values lying within the overlap area. We demonstrate that the periodic analysis of the merged expression profiles produces more reliable p-values for periodicity. Subsequent Gene Ontology analysis of the results confirms that merging synchronization experiments is a more robust strategy for the selection of potentially periodic genes. Additional validation of the proposed algorithm on yeast data is also presented. AVAILABILITY: Results, benchmark sets and scripts are freely available at our website: http://www.psb.ugent.be/cbd/publications.php
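
The pasting step can be illustrated with a minimal Python sketch. It assumes the overlap length is already known (in the abstract it is determined by a dynamic time warping alignment, which is not reproduced here) and uses the arithmetic mean as the aggregation, which is only one plausible choice. The function name `paste_profiles` and the sample values are hypothetical.

```python
def paste_profiles(first, second, overlap):
    """Paste `second` onto the end of `first`, averaging the overlapping points.

    overlap: number of trailing points of `first` that coincide in time
    with the leading points of `second` (assumed to be given).
    """
    if not 0 <= overlap <= min(len(first), len(second)):
        raise ValueError("overlap must fit inside both profiles")
    split = len(first) - overlap
    head = list(first[:split])
    # Aggregate the values that lie within the overlap area (mean is an assumption).
    merged = [(x + y) / 2 for x, y in zip(first[split:], second[:overlap])]
    return head + merged + list(second[overlap:])


# Two hypothetical expression profiles from different synchronization experiments.
early = [1, 2, 3, 4]
late = [3, 5, 6]
print(paste_profiles(early, late, overlap=2))  # [1, 2, 3.0, 4.5, 6]
```

The merged curve is longer than either input, which is the property the abstract relies on to span more than one cell cycle.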


Subjects
Algorithms, Arabidopsis Proteins/metabolism, Arabidopsis/cytology, Arabidopsis/metabolism, Cell Cycle Proteins/metabolism, Cell Cycle/physiology, Oligonucleotide Array Sequence Analysis/methods, Arabidopsis/genetics, Arabidopsis Proteins/genetics, Computer Simulation, Gene Expression Profiling/methods, Information Storage and Retrieval/methods, Biological Models, Computer-Assisted Numerical Analysis
5.
J Bioinform Comput Biol ; 5(5): 1005-22, 2007 Oct.
Article in English | MEDLINE | ID: mdl-17933008

ABSTRACT

Gene expression microarray experiments frequently generate datasets with multiple values missing. However, most of the analysis, mining, and classification methods for gene expression data require a complete matrix of gene array values. Therefore, the accurate estimation of missing values in such datasets has been recognized as an important issue, and several imputation algorithms have already been proposed to the biological community. Most of these approaches, however, are not particularly suitable for time series expression profiles. In view of this, we propose a novel imputation algorithm, which is specially suited for the estimation of missing values in gene expression time series data. The algorithm utilizes Dynamic Time Warping (DTW) distance in order to measure the similarity between time expression profiles, and subsequently selects for each gene expression profile with missing values a dedicated set of candidate profiles for estimation. Three different DTW-based imputation (DTWimpute) algorithms have been considered: position-wise, neighborhood-wise, and two-pass imputation. These have initially been prototyped in Perl, and their accuracy has been evaluated on yeast expression time series data using several different parameter settings. The experiments have shown that the two-pass algorithm consistently outperforms the neighborhood-wise and the position-wise algorithms, in particular for datasets with a higher level of missing entries. The performance of the two-pass DTWimpute algorithm has further been benchmarked against the weighted K-Nearest Neighbors algorithm, which is widely used in the biological community; the former proved superior to the latter. Motivated by these findings, which clearly indicate the added value of DTW techniques for missing value estimation in time series data, we have built an optimized C++ implementation of the two-pass DTWimpute algorithm. The software also provides for a choice between three different initial rough imputation methods.


Subjects
Algorithms, Gene Expression Profiling/statistics & numerical data, Oligonucleotide Array Sequence Analysis/statistics & numerical data, Computational Biology, Statistical Data Interpretation, Genetic Databases/statistics & numerical data
6.
Bioinformatics ; 22(2): 251-2, 2006 Jan 15.
Article in English | MEDLINE | ID: mdl-16293669

ABSTRACT

An application tool for alignment, template matching and visualization of gene expression time series is presented. The core algorithm is based on dynamic time warping techniques used in the speech recognition field. These techniques allow for non-linear (elastic) alignment of temporal sequences of feature vectors and consequently enable detection of similar shapes with different phases. AVAILABILITY: The Java program, examples and a tutorial are available at http://www.psb.ugent.be/cbd/papers/gentxwarper/
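
To make the idea of elastic alignment concrete, here is a minimal Python sketch of the textbook dynamic time warping distance for one-dimensional series. It is not the tool's Java code; the function name `dtw_distance`, the absolute-difference local cost and the sample profiles are assumptions for illustration only.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    # cost[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# The same pulse shape, observed with a one-step phase shift.
profile = [0, 1, 2, 1, 0, 0]
shifted = [0, 0, 1, 2, 1, 0]
pointwise = sum(abs(x - y) for x, y in zip(profile, shifted))
print(dtw_distance(profile, shifted), pointwise)  # 0.0 4
```

A point-by-point comparison reports a difference of 4, whereas the warped alignment reports 0, which is how similar shapes with different phases can be detected.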


Subjects
Algorithms, Computer Graphics, Gene Expression Profiling/methods, Sequence Alignment/methods, DNA Sequence Analysis/methods, Software, User-Computer Interface, Time Factors