Results 1 - 19 of 19
1.
Methods Mol Biol ; 453: 363-77, 2008.
Article in English | MEDLINE | ID: mdl-18712314

ABSTRACT

The aim of this chapter is to present combinatorial optimization models and techniques for the analysis of microarray datasets. The chapter illustrates the application of a novel objective function that guides the search for high-quality solutions for sequential ordering of expression profiles. The approach is unsupervised and a metaheuristic method (a memetic algorithm) is used to provide high-quality solutions. For the problem of selecting discriminative groups of genes, we used a supervised method that has provided good results in a variety of datasets. This chapter illustrates the application of these models in an Alzheimer's disease microarray dataset.


Subjects
Computational Biology/methods , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Alzheimer Disease/genetics , Alzheimer Disease/metabolism , Animals , Gene Expression , Humans
2.
Methods Mol Biol ; 453: 379-92, 2008.
Article in English | MEDLINE | ID: mdl-18712315

ABSTRACT

This chapter illustrates the use of the combinatorial optimization models presented in Chapter 19 for the Feature Set selection and Gene Ordering problems to find genetic signatures for diseases using microarray data. We demonstrate the quality of this approach by using a microarray dataset from a mouse model of Parkinson's disease. The results are accompanied by a description of the currently known molecular functions and biological processes of the genes in the signatures.


Subjects
Gene Expression Profiling , Gene Order , Oligonucleotide Array Sequence Analysis , Parkinson Disease/genetics , Animals , Disease Models, Animal , Mice , Selection, Genetic
3.
BMC Med Genomics ; 10(1): 19, 2017 03 28.
Article in English | MEDLINE | ID: mdl-28351365

ABSTRACT

BACKGROUND: Basal-like constitutes an important molecular subtype of breast cancer characterised by an aggressive behaviour and a limited therapy response. The outcome of patients within this subtype is, however, divergent. Some individuals show an increased risk of dying in the first five years, and others a long-term survival of over ten years after the diagnosis. In this study, we aim at identifying markers associated with basal-like patients' survival and characterising subgroups with distinct disease outcome. METHODS: We explored the genomic and transcriptomic profiles of 351 basal-like samples from the METABRIC and ROCK data sets. Two selection methods, labelled Differential and Survival filters, were employed to determine genes/probes that are differentially expressed in tumour and control samples, and are associated with overall survival. These probes were further used to define molecular subgroups, which vary at the microRNA level and in DNA copy number. RESULTS: We identified the expression signature of 80 probes that distinguishes between two basal-like subgroups with distinct clinical features and survival outcomes. Genes included in this list have been mainly linked to cancer immune response, epithelial-mesenchymal transition and cell cycle. In particular, high levels of CXCR6, HCST, C3AR1 and FPR3 were found in Basal I; whereas HJURP, RRP12 and DNMT3B appeared over-expressed in Basal II. These genes exhibited the highest betweenness centrality and node degree values and play a key role in the basal-like breast cancer differentiation. Further molecular analysis revealed 17 miRNAs correlated to the subgroups, including hsa-miR-342-5p, -150, -155, -200c and -17. Additionally, increased percentages of gains/amplifications were detected on chromosomes 1q, 3q, 8q, 10p and 17q, and losses/deletions on 4q, 5q, 8p and X, associated with reduced survival. 
CONCLUSIONS: The proposed signature supports the existence of at least two subgroups of basal-like breast cancers with distinct disease outcome. The identification of patients at a low risk may impact clinical decision-making by reducing the prescription of high-dose chemotherapy and, consequently, avoiding adverse effects. The recognition of other aggressive features within this subtype may also be critical for improving individual care and for delineating more effective therapies for patients at high risk.


Subjects
Breast Neoplasms/genetics , Breast Neoplasms/pathology , Computational Biology , DNA Copy Number Variations , Gene Expression Profiling , Humans , MicroRNAs/genetics , Oligonucleotide Array Sequence Analysis , Survival Analysis
4.
PLoS One ; 11(4): e0152342, 2016.
Article in English | MEDLINE | ID: mdl-27050411

ABSTRACT

BACKGROUND: Alzheimer's disease (AD) is the most common form of dementia in older adults that damages the brain and results in impaired memory, thinking and behaviour. The identification of differentially expressed genes and related pathways among affected brain regions can provide more information on the mechanisms of AD. In the past decade, several studies have reported many genes that are associated with AD. This wealth of information has become difficult to follow and interpret as most of the results are conflicting. In that case, it is worth doing an integrated study of multiple datasets that helps to increase the total number of samples and the statistical power in detecting biomarkers. In this study, we present an integrated analysis of five different brain region datasets and introduce new genes that warrant further investigation. METHODS: The aim of our study is to apply a novel combinatorial optimisation based meta-analysis approach to identify differentially expressed genes that are associated with AD across brain regions. In this study, microarray gene expression data from 161 samples (74 non-demented controls, 87 AD) from the Entorhinal Cortex (EC), Hippocampus (HIP), Middle temporal gyrus (MTG), Posterior cingulate cortex (PC), Superior frontal gyrus (SFG) and visual cortex (VCX) brain regions were integrated and analysed using our method. The results are then compared to two popular meta-analysis methods, RankProd and GeneMeta, and to what can be obtained by analysing the individual datasets. RESULTS: We find genes related to AD that are consistent with existing studies, and new candidate genes not previously related to AD. Our study confirms the up-regulation of INFAR2 and PTMA along with the down-regulation of GPHN, RAB2A, PSMD14 and FGF. Novel genes PSMB2, WNK1, RPL15, SEMA4C, RWDD2A and LARGE are found to be differentially expressed across all brain regions.
Further investigation on these genes may provide new insights into the development of AD. In addition, we identified the presence of 23 non-coding features, including four miRNA precursors (miR-7, miR-570, miR-1229 and miR-6821), dysregulated across the brain regions. Furthermore, we compared our results with two popular meta-analysis methods RankProd and GeneMeta to validate our findings and performed a sensitivity analysis by removing one dataset at a time to assess the robustness of our results. These new findings may provide new insights into the disease mechanisms and thus make a significant contribution in the near future towards understanding, prevention and cure of AD.


Subjects
Alzheimer Disease/genetics , Brain/metabolism , Gene Expression Profiling , Biomarkers/metabolism , Brain/pathology , Brain Mapping , Humans
5.
PLoS One ; 11(1): e0146116, 2016.
Article in English | MEDLINE | ID: mdl-26764911

ABSTRACT

Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble is dependent on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10-fold cross-validation on training data for evaluating the quality of each candidate ensemble. In order to combine the base classifiers' decisions into the ensemble's output, we used the simple and widely used majority voting approach. The proposed algorithm, along with the random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β)-k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmarking datasets from the UCI Machine Learning Repository, one Alzheimer's disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction and we expect that the proposed GA-EoC would perform consistently in other cases.
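
The majority voting step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the GA-EoC implementation: the function name `majority_vote`, the tie-breaking rule and the example labels are assumptions made here for illustration only.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one ensemble label per sample.

    predictions: one inner list of predicted labels per base classifier,
    all of the same length. Ties go to the tied label that was voted first
    (an arbitrary choice made for this sketch).
    """
    n_samples = len(predictions[0])
    combined = []
    for i in range(n_samples):
        # Count the votes of every base classifier for sample i.
        votes = Counter(clf_preds[i] for clf_preds in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


# Hypothetical outputs of three base classifiers on three samples.
ensemble_output = majority_vote([
    ["AD", "control", "AD"],
    ["AD", "AD", "control"],
    ["control", "AD", "AD"],
])
print(ensemble_output)
```

In a search such as the one described, a fitness function would presumably score each candidate subset of base classifiers by the cross-validated accuracy of votes combined in this way.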


Subjects
Algorithms , Models, Theoretical
6.
PLoS One ; 11(8): e0157988, 2016.
Article in English | MEDLINE | ID: mdl-27571416

ABSTRACT

In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into the community detection problem in graphs by using the Jensen-Shannon distance, a dissimilarity measure originating in Information Theory. Moreover, we use graph theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 plays written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high quality clusters which reflect the commonalities in the literary style of the plays.
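
As a rough illustration of the dissimilarity measure named in this abstract, the sketch below computes the Jensen-Shannon distance (the square root of the Jensen-Shannon divergence, with base-2 logarithms) between two word-frequency vectors. The helper names and the toy counts are assumptions for illustration; the paper's own preprocessing is not reproduced here.

```python
import math


def to_distribution(counts):
    """Turn raw word counts into relative frequencies that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence; lies in [0, 1]."""
    def kl(a, b):
        # Kullback-Leibler divergence; terms with a zero numerator contribute 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    # Mixture distribution: strictly positive wherever p or q is positive.
    m = [(x + y) / 2 for x, y in zip(p, q)]
    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(jsd, 0.0))


# Hypothetical counts of the same four words in two plays.
play_a = to_distribution([10, 0, 5, 5])
play_b = to_distribution([2, 8, 5, 5])
print(jensen_shannon_distance(play_a, play_b))
```

A proximity graph could then be built by connecting each play to the plays at the smallest distance from it, which is the kind of input a community detection method operates on.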


Subjects
Algorithms , Cluster Analysis
7.
BioData Min ; 9: 2, 2016.
Article in English | MEDLINE | ID: mdl-26770261

ABSTRACT

BACKGROUND: Multi-gene lists and single sample predictor models are currently used to reduce the multidimensional complexity of breast cancers, and to identify intrinsic subtypes. The perceived inability of some models to deal with the challenges of processing high-dimensional data, however, limits the accurate characterisation of these subtypes. Towards the development of robust strategies, we designed an iterative approach to consistently discriminate intrinsic subtypes and improve class prediction in the METABRIC dataset. FINDINGS: In this study, we employed the CM1 score to identify the most discriminative probes for each group, and an ensemble learning technique to assess the ability of these probes to assign subtype labels using 24 different classifiers. Our analysis comprises an iterative computation of these methods and statistical measures performed on a set of over 2000 samples. The refined labels assigned using this iterative approach proved to be more consistent and in better agreement with clinicopathological markers and patients' overall survival than those originally provided by the PAM50 method. CONCLUSIONS: The assignment of intrinsic subtypes has a significant impact in translational research for both understanding and managing breast cancer. The refined labelling, therefore, provides more accurate and reliable information by improving the source of fundamental science prior to clinical applications in medicine.

8.
PLoS One ; 10(6): e0127702, 2015.
Article in English | MEDLINE | ID: mdl-26106884

ABSTRACT

BACKGROUND: The joint study of multiple datasets has become a common technique for increasing statistical power in detecting biomarkers obtained from smaller studies. The approach generally followed is based on the fact that as the total number of samples increases, we expect to have greater power to detect associations of interest. This methodology has been applied to genome-wide association and transcriptomic studies due to the availability of datasets in the public domain. While this approach is well established in biostatistics, the introduction of new combinatorial optimization models to address this issue has not been explored in depth. In this study, we introduce a new model for the integration of multiple datasets and we show its application in transcriptomics. METHODS: We propose a new combinatorial optimization problem that addresses the core issue of biomarker detection in integrated datasets. Optimal solutions for this model deliver a feature selection from a panel of prospective biomarkers. The model we propose is a generalised version of the (α,β)-k-Feature Set problem. We illustrate the performance of this new methodology via a challenging meta-analysis task involving six prostate cancer microarray datasets. The results are then compared to the popular RankProd meta-analysis tool and to what can be obtained by analysing the individual datasets by statistical and combinatorial methods alone. RESULTS: Application of the integrated method resulted in a more informative signature than the rank-based meta-analysis or individual dataset results, and overcomes problems arising from real world datasets. The set of genes identified is highly significant in the context of prostate cancer. The method used does not rely on homogenisation or transformation of values to a common scale, and at the same time is able to capture markers associated with subgroups of the disease.


Subjects
Databases, Genetic , Genome, Human , Prostatic Neoplasms/genetics , Transcriptome/genetics , Gene Expression Regulation, Neoplastic , Humans , Male , Oligonucleotide Array Sequence Analysis , Prostatic Neoplasms/pathology
9.
PLoS One ; 10(7): e0129711, 2015.
Article in English | MEDLINE | ID: mdl-26132585

ABSTRACT

BACKGROUND: The prediction of breast cancer intrinsic subtypes has been introduced as a valuable strategy to determine patient diagnosis and prognosis, and therapy response. The PAM50 method, based on the expression levels of 50 genes, uses a single sample predictor model to assign subtype labels to samples. Intrinsic errors reported within this assay demonstrate the challenge of identifying and understanding the breast cancer groups. In this study, we aim to: a) identify novel biomarkers for subtype individuation by exploring the competence of a newly proposed method named CM1 score, and b) apply ensemble learning, as opposed to the use of a single classifier, for sample subtype assignment. The overarching objective is to improve class prediction. METHODS AND FINDINGS: The microarray transcriptome data sets used in this study are: the METABRIC breast cancer data recorded for over 2000 patients, and the public integrated source from ROCK database with 1570 samples. We first computed the CM1 score to identify the probes with highly discriminative patterns of expression across samples of each intrinsic subtype. We further assessed the ability of 42 selected probes to assign correct subtype labels using 24 different classifiers from the Weka software suite. For comparison, the same method was applied to the list of 50 genes from the PAM50 method. CONCLUSIONS: The CM1 score portrayed 30 novel biomarkers for predicting breast cancer subtypes, with the confirmation of the role of 12 well-established genes. Intrinsic subtypes assigned using the CM1 list and the ensemble of classifiers are more consistent and homogeneous than the original PAM50 labels. The new subtypes show accurate distributions of current clinical markers ER, PR and HER2, and survival curves in the METABRIC and ROCK data sets. Remarkably, the paradoxical attribution of the original labels reinforces the limitations of employing single sample classifiers to predict breast cancer intrinsic subtypes.


Subjects
Biomarkers, Tumor , Breast Neoplasms/diagnosis , Breast Neoplasms/genetics , Breast Neoplasms/mortality , Cluster Analysis , Computational Biology/methods , Datasets as Topic , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Genomics/methods , Humans , Prognosis , Reproducibility of Results , Transcriptome
10.
Microarrays (Basel) ; 2(2): 131-52, 2013 May 21.
Article in English | MEDLINE | ID: mdl-27605185

ABSTRACT

While Illumina microarrays can be used successfully for detecting small gene expression changes due to their high degree of technical replicability, there is little information on how different normalization and differential expression analysis strategies affect outcomes. To evaluate this, we assessed concordance across gene lists generated by applying different combinations of normalization strategy and analytical approach to two Illumina datasets with modest expression changes. In addition to using traditional statistical approaches, we also tested an approach based on combinatorial optimization. We found that the choice of both normalization strategy and analytical approach considerably affected outcomes, in some cases leading to substantial differences in gene lists and subsequent pathway analysis results. Our findings suggest that important biological phenomena may be overlooked when there is a routine practice of using only one approach to investigate all microarray datasets. Analytical artefacts of this kind are likely to be especially relevant for datasets involving small fold changes, where inherent technical variation, if not adequately minimized by effective normalization, may overshadow true biological variation. This report provides some basic guidelines for optimizing outcomes when working with Illumina datasets involving small expression changes.

11.
PLoS One ; 7(8): e44000, 2012.
Article in English | MEDLINE | ID: mdl-22937144

ABSTRACT

BACKGROUND: The analysis of biological networks has become a major challenge due to the recent development of high-throughput techniques that are rapidly producing very large data sets. The exploding volumes of biological data demand extreme computational power and special computing facilities (i.e. super-computers). An inexpensive solution, such as General Purpose computation based on Graphics Processing Units (GPGPU), can be adapted to tackle this challenge, but the limitation of the device internal memory can pose a new problem of scalability. An efficient data and computational parallelism with partitioning is required to provide a fast and scalable solution to this problem. RESULTS: We propose an efficient parallel formulation of the k-Nearest Neighbour (kNN) search problem, which is a popular method for classifying objects in several fields of research, such as pattern recognition, machine learning and bioinformatics. Being very simple and straightforward, the performance of the kNN search degrades dramatically for large data sets, since the task is computationally intensive. The proposed approach is not only fast but also scalable to large-scale instances. Based on our approach, we implemented a software tool GPU-FS-kNN (GPU-based Fast and Scalable k-Nearest Neighbour) for CUDA enabled GPUs. The basic approach is simple and adaptable to other available GPU architectures. We observed speed-ups of 50-60 times compared with CPU implementation on a well-known breast microarray study and its associated data sets. CONCLUSION: Our GPU-based Fast and Scalable k-Nearest Neighbour search technique (GPU-FS-kNN) provides a significant performance improvement for nearest neighbour computation in large-scale networks. Source code and the software tool are available under the GNU Public License (GPL) at https://sourceforge.net/p/gpufsknn/.
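
The idea of partitioning a kNN search so that the full distance matrix never has to fit in memory can be sketched on the CPU in plain Python. This is only an illustration of chunked brute-force search, not the GPU-FS-kNN CUDA code; the function name `knn_chunked`, the `chunk_size` parameter and the use of squared Euclidean distance are assumptions made here.

```python
import heapq


def knn_chunked(points, k, chunk_size=2):
    """Return, for every point, the indices of its k nearest neighbours.

    The reference set is scanned one chunk at a time and only a running
    top-k list is kept per query, so memory use stays proportional to
    n * k rather than n * n.
    """
    n = len(points)
    best = [[] for _ in range(n)]  # per-query list of (squared distance, index)
    for start in range(0, n, chunk_size):
        chunk = range(start, min(start + chunk_size, n))
        for qi, query in enumerate(points):
            # Distances from this query to the current chunk only.
            candidates = best[qi] + [
                (sum((a - b) ** 2 for a, b in zip(query, points[j])), j)
                for j in chunk
                if j != qi
            ]
            # Merge with the running result and keep the k smallest.
            best[qi] = heapq.nsmallest(k, candidates)
    return [[j for _, j in neighbours] for neighbours in best]


# Hypothetical 2-D points forming two well-separated pairs.
points = [(0, 0), (1, 0), (5, 0), (6, 0)]
print(knn_chunked(points, k=1))
```

On a GPU, each chunk would presumably be sized to the device memory and the inner distance computations run in parallel, which is where the reported speed-up would come from.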


Subjects
Computational Biology/methods , Programming Languages , Software , Algorithms , Cluster Analysis
12.
PLoS One ; 7(4): e34341, 2012.
Article in English | MEDLINE | ID: mdl-22485168

ABSTRACT

BACKGROUND: Recent Alzheimer's disease (AD) research has focused on finding biomarkers to identify disease at the pre-clinical stage of mild cognitive impairment (MCI), allowing treatment to be initiated before irreversible damage occurs. Many studies have examined brain imaging or cerebrospinal fluid but there is also growing interest in blood biomarkers. The Alzheimer's Disease Neuroimaging Initiative (ADNI) has generated data on 190 plasma analytes in 566 individuals with MCI, AD or normal cognition. We conducted independent analyses of this dataset to identify plasma protein signatures predicting pre-clinical AD. METHODS AND FINDINGS: We focused on identifying signatures that discriminate cognitively normal controls (n = 54) from individuals with MCI who subsequently progress to AD (n = 163). Based on p value, apolipoprotein E (APOE) showed the strongest difference between these groups (p = 2.3 × 10^-13). We applied a multivariate approach based on combinatorial optimization ((α,β)-k Feature Set Selection), which retains information about individual participants and maintains the context of interrelationships between different analytes, to identify the optimal set of analytes (signature) to discriminate these two groups. We identified 11-analyte signatures achieving values of sensitivity and specificity between 65% and 86% for both MCI and AD groups, depending on whether APOE was included and other factors. Classification accuracy was improved by considering "meta-features," representing the difference in relative abundance of two analytes, with an 8-meta-feature signature consistently achieving sensitivity and specificity both over 85%. Generating signatures based on longitudinal rather than cross-sectional data further improved classification accuracy, returning sensitivities and specificities of approximately 90%.
CONCLUSIONS: Applying these novel analysis approaches to the powerful and well-characterized ADNI dataset has identified sets of plasma biomarkers for pre-clinical AD. While studies of independent test sets are required to validate the signatures, these analyses provide a starting point for developing a cost-effective and minimally invasive test capable of diagnosing AD in its pre-clinical stages.


Subjects
Alzheimer Disease/diagnosis , Blood Proteins/metabolism , Cognitive Dysfunction/diagnosis , Proteome/metabolism , Aged , Aged, 80 and over , Alzheimer Disease/blood , Apolipoproteins E/blood , Apolipoproteins E/genetics , Biomarkers/blood , Case-Control Studies , Cognitive Dysfunction/blood , Early Diagnosis , Female , Genotype , Humans , Male , Middle Aged , Multivariate Analysis , Sensitivity and Specificity
13.
PLoS One ; 7(9): e45535, 2012.
Article in English | MEDLINE | ID: mdl-23029078

ABSTRACT

BACKGROUND: One primary goal of transcriptomic studies is identifying gene expression patterns correlating with disease progression. This is usually achieved by considering transcripts that independently pass an arbitrary threshold (e.g. p<0.05). In diseases involving severe perturbations of multiple molecular systems, such as Alzheimer's disease (AD), this univariate approach often results in a large list of seemingly unrelated transcripts. We utilised a powerful multivariate clustering approach to identify clusters of RNA biomarkers strongly associated with markers of AD progression. We discuss the value of considering pairs of transcripts which, in contrast to individual transcripts, helps avoid natural human transcriptome variation that can overshadow disease-related changes. METHODOLOGY/PRINCIPAL FINDINGS: We re-analysed a dataset of hippocampal transcript levels in nine controls and 22 patients with varying degrees of AD. A large-scale clustering approach determined groups of transcript probe sets that correlate strongly with measures of AD progression, including both clinical and neuropathological measures and quantifiers of the characteristic transcriptome shift from control to severe AD. This enabled identification of restricted groups of highly correlated probe sets from an initial list of 1,372 previously published by our group. We repeated this analysis on an expanded dataset that included all pair-wise combinations of the 1,372 probe sets. As clustering of this massive dataset is infeasible using standard computational tools, we adapted and re-implemented a clustering algorithm that uses an external-memory algorithmic approach. This identified various pairs that strongly correlated with markers of AD progression and highlighted important biological pathways potentially involved in AD pathogenesis.
CONCLUSIONS/SIGNIFICANCE: Our analyses demonstrate that, although there exists a relatively large molecular signature of AD progression, only a small number of transcripts recurrently cluster with different markers of AD progression. Furthermore, considering the relationship between two transcripts can highlight important biological relationships that are missed when considering either transcript in isolation.


Subjects
Alzheimer Disease/genetics , Gene Expression Profiling , Transcriptome , Algorithms , Alzheimer Disease/pathology , Biomarkers , Cluster Analysis , Computational Biology/methods , Databases, Genetic , Disease Progression , Humans , Molecular Sequence Annotation , Reproducibility of Results
14.
PLoS One ; 6(1): e14468, 2011 Jan 18.
Article in English | MEDLINE | ID: mdl-21267077

ABSTRACT

BACKGROUND: The visualization of large volumes of data is a computationally challenging task that often promises rewarding new insights. There is great potential in the application of new algorithms and models from combinatorial optimisation. Datasets often contain "hidden regularities" and a combined identification and visualization method should reveal these structures and present them in a way that helps analysis. While several methodologies exist, including those that use non-linear optimization algorithms, severe limitations exist even when working with only a few hundred objects. METHODOLOGY/PRINCIPAL FINDINGS: We present a new data visualization approach (QAPgrid) that reveals patterns of similarities and differences in large datasets of objects for which a similarity measure can be computed. Objects are assigned to positions on an underlying square grid in a two-dimensional space. We use the Quadratic Assignment Problem (QAP) as a mathematical model to provide an objective function for assignment of objects to positions on the grid. We employ a Memetic Algorithm (a powerful metaheuristic) to tackle the large instances of this NP-hard combinatorial optimization problem, and we show its performance on the visualization of real data sets. CONCLUSIONS/SIGNIFICANCE: Overall, the results show that the QAPgrid algorithm is able to produce a layout that represents the relationships between objects in the data set. Furthermore, it also represents the relationships between clusters that are fed into the algorithm. We apply the QAPgrid on the 84 Indo-European languages instance, producing a near-optimal layout. Next, we produce a layout of 470 world universities with an observed high degree of correlation with the score used by the Academic Ranking of World Universities compiled by Shanghai Jiao Tong University, without the need of an ad hoc weighting of attributes.
Finally, our Gene Ontology-based study on Saccharomyces cerevisiae fully demonstrates the scalability and precision of our method as a novel alternative tool for functional genomics.
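
To make the role of the Quadratic Assignment Problem in this abstract concrete, the sketch below evaluates a QAP-style cost for a placement of objects on grid cells: each pair contributes its similarity multiplied by the distance between the assigned cells, so similar objects are cheaper when placed close together. The use of similarity as the flow term, Euclidean distance between cells, and all names and values are assumptions for illustration; the exact QAPgrid objective is not given in the abstract.

```python
def grid_distance(cell_a, cell_b):
    """Euclidean distance between two (row, column) grid cells."""
    (r1, c1), (r2, c2) = cell_a, cell_b
    return ((r1 - r2) ** 2 + (c1 - c2) ** 2) ** 0.5


def qap_cost(similarity, placement):
    """Sum over ordered pairs of similarity[i][j] * distance(cell_i, cell_j).

    similarity: square matrix of pairwise similarities between objects.
    placement:  placement[i] is the (row, column) cell assigned to object i.
    """
    n = len(similarity)
    return sum(
        similarity[i][j] * grid_distance(placement[i], placement[j])
        for i in range(n)
        for j in range(n)
        if i != j
    )


# Hypothetical instance: objects 0 and 1 are very similar, object 2 is not.
similarity = [
    [0, 9, 1],
    [9, 0, 1],
    [1, 1, 0],
]
adjacent = [(0, 0), (0, 1), (1, 1)]   # objects 0 and 1 in neighbouring cells
separated = [(0, 0), (1, 1), (0, 1)]  # objects 0 and 1 placed diagonally
print(qap_cost(similarity, adjacent), qap_cost(similarity, separated))
```

A metaheuristic such as a memetic algorithm would search over placements to reduce this cost, since enumerating all assignments is impractical beyond very small instances.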


Subjects
Algorithms , Computer Graphics , Databases, Factual , Models, Theoretical , Cluster Analysis , Genomics/methods , Methods , Saccharomyces cerevisiae/genetics
15.
PLoS One ; 6(3): e17481, 2011 Mar 24.
Article in English | MEDLINE | ID: mdl-21479255

ABSTRACT

BACKGROUND: In November 2007 a study published in Nature Medicine proposed a simple test based on the abundance of 18 proteins in blood to predict the onset of clinical symptoms of Alzheimer's Disease (AD) two to six years before these symptoms manifest. Later, another study, published in PLoS ONE, showed that only five proteins (IL-1, IL-3, EGF, TNF-α and G-CSF) have overall better prediction accuracy. These classifiers are based on the abundance of 120 proteins. Such values were standardised by a Z-score transformation, which means that their values are relative to the average of all others. METHODOLOGY: The original datasets from the Nature Medicine paper are further studied using methods from combinatorial optimisation and Information Theory. We expand the original dataset by also including all pair-wise differences of z-score values of the original dataset ("metafeatures"). Using an exact algorithm to solve the resulting Feature Set problem, used to tackle the feature selection problem, we found signatures that contain either only features, metafeatures or both, and evaluated their predictive performance on the independent test set. CONCLUSIONS: It was possible to show that a specific pattern of cell signalling imbalance in blood plasma has valuable information to distinguish between NDC and AD samples. The obtained signatures were able to predict AD in patients that already had a Mild Cognitive Impairment (MCI) with up to 84% sensitivity, while also maintaining a strong prediction accuracy of 90% on an independent dataset with Non Demented Controls (NDC) and AD samples. The novel biomarkers uncovered with this method now confirm ANG-2, IL-11, PDGF-BB, CCL15/MIP-1δ; and support the joint measurement of other signalling proteins not previously discussed: GM-CSF, NT-3, IGFBP-2 and VEGF-B.
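
The expansion of a dataset with "metafeatures" described in this abstract can be sketched as follows: every pair of features contributes one extra column holding the difference of their values. The function name, the key format and the sample values are hypothetical; for n original features this yields n + n(n-1)/2 columns, so 120 proteins would expand to 120 + 7,140 = 7,260.

```python
from itertools import combinations


def add_metafeatures(sample):
    """Return the original features plus all pairwise differences.

    sample: dict mapping a feature name to its (already standardised) value.
    Each metafeature is keyed "A-B" and holds sample["A"] - sample["B"].
    """
    expanded = dict(sample)
    for a, b in combinations(sorted(sample), 2):
        expanded[f"{a}-{b}"] = sample[a] - sample[b]
    return expanded


# Hypothetical z-scores of three proteins in one blood sample.
sample = {"P1": 1.0, "P2": -0.5, "P3": 0.25}
expanded = add_metafeatures(sample)
print(len(expanded), expanded["P1-P2"])
```

A feature selection method would then be run on the expanded table, so a signature can mix original features and metafeatures.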


Subjects
Alzheimer Disease/blood , Alzheimer Disease/diagnosis , Blood Proteins , Signal Transduction , Alzheimer Disease/metabolism , Alzheimer Disease/pathology , Biomarkers/blood , Databases, Protein , Early Diagnosis , Humans
16.
PLoS One ; 5(8): e12262, 2010 Aug 18.
Article in English | MEDLINE | ID: mdl-20805891

ABSTRACT

BACKGROUND: It is a commonly accepted belief that cancer cells modify their transcriptional state during the progression of the disease. We propose that the progression of cancer cells towards malignant phenotypes can be efficiently tracked using high-throughput technologies that follow the gradual changes observed in the gene expression profiles by employing Shannon's mathematical theory of communication. Methods based on Information Theory can then quantify the divergence of cancer cells' transcriptional profiles from those of normally appearing cells of the originating tissues. The relevance of the proposed methods can be evaluated using microarray datasets available in the public domain but the method is in principle applicable to other high-throughput methods. METHODOLOGY/PRINCIPAL FINDINGS: Using melanoma and prostate cancer datasets we illustrate how it is possible to employ Shannon Entropy and the Jensen-Shannon divergence to trace the progression of transcriptional changes in the disease. We establish how the variations of these two measures correlate with established biomarkers of cancer progression. The Information Theory measures allow us to identify novel biomarkers for both progressive and relatively more sudden transcriptional changes leading to malignant phenotypes. At the same time, the methodology was able to validate a large number of genes and processes that seem to be implicated in the progression of melanoma and prostate cancer. CONCLUSIONS/SIGNIFICANCE: We thus present a quantitative guiding rule, a new unifying hallmark of cancer: the cancer cell's transcriptome changes lead to measurable observed transitions of Normalized Shannon Entropy values (as measured by high-throughput technologies). At the same time, tumor cells increment their divergence from the normal tissue profile increasing their disorder via creation of states that we might not directly measure.
This unifying hallmark allows, via the the Jensen-Shannon divergence, to identify the arrow of time of the processes from the gene expression profiles, and helps to map the phenotypical and molecular hallmarks of specific cancer subtypes. The deep mathematical basis of the approach allows us to suggest that this principle is, hopefully, of general applicability for other diseases.
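
As a reading aid, the two measures named in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes each sample is a vector of non-negative expression values that is rescaled to sum to 1, and the function names (`normalize`, `normalized_shannon_entropy`, `jensen_shannon_divergence`) are hypothetical.

```python
import math


def normalize(profile):
    """Rescale non-negative expression values so they sum to 1 (assumed preprocessing)."""
    total = sum(profile)
    if total <= 0 or any(x < 0 for x in profile):
        raise ValueError("profile must be non-negative with a positive total")
    return [x / total for x in profile]


def shannon_entropy(p):
    """Shannon entropy in bits of a probability vector; zero entries contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def normalized_shannon_entropy(profile):
    """Entropy divided by its maximum, log2(number of probes), giving a value in [0, 1]."""
    p = normalize(profile)
    if len(p) < 2:
        return 0.0
    return shannon_entropy(p) / math.log2(len(p))


def jensen_shannon_divergence(profile_a, profile_b):
    """Jensen-Shannon divergence in bits: H((P+Q)/2) - (H(P) + H(Q)) / 2."""
    p, q = normalize(profile_a), normalize(profile_b)
    if len(p) != len(q):
        raise ValueError("profiles must have the same number of probes")
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(mixture) - (shannon_entropy(p) + shannon_entropy(q)) / 2
```

Under these assumptions, a sample's divergence from a reference profile (for example, an average over normal-tissue samples) would be `jensen_shannon_divergence(sample, reference)`. With base-2 logarithms the value lies between 0 (identical profiles) and 1 (profiles with no probes in common).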


Subjects
Tumor Biomarkers/genetics , Computational Biology/methods , Entropy , Neoplasms/genetics , Animals , Tumor Biomarkers/metabolism , Gene Expression Profiling , Neoplasm Genes/genetics , Humans , Male , Melanoma/genetics , Melanoma/metabolism , Melanoma/pathology , Mice , Neoplasms/metabolism , Neoplasms/pathology , Oligonucleotide Array Sequence Analysis , Prostatic Neoplasms/genetics , Prostatic Neoplasms/metabolism , Prostatic Neoplasms/pathology
17.
PLoS One ; 5(4): e10153, 2010 Apr 13.
Article in English | MEDLINE | ID: mdl-20405009

ABSTRACT

BACKGROUND: Alzheimer's disease (AD) is characterized by a neurodegenerative progression that alters cognition. On a phenotypical level, cognition is evaluated by means of the Mini-Mental State Examination (MMSE), and the post-mortem Neurofibrillary Tangle (NFT) count helps to confirm an AD diagnosis. The MMSE evaluates different aspects of cognition including orientation, short-term memory (retention and recall), attention and language. As there is a normal cognitive decline with aging, and NFT can only be counted post-mortem, the identification of brain gene expression biomarkers from these phenotypical measures has been elusive. METHODOLOGY/PRINCIPAL FINDINGS: We have reanalysed a microarray dataset contributed in 2004 by Blalock et al. of 31 samples corresponding to hippocampus gene expression from 22 AD subjects of varying degrees of severity and 9 controls. Instead of relying only on correlations of gene expression with the associated MMSE and NFT measures, and by using modern bioinformatics methods based on information theory and combinatorial optimization, we uncovered a 1,372-probe gene expression signature that shows high consensus with established markers of progression in AD. The signature reveals alterations in calcium, insulin, phosphatidylinositol and wnt-signalling. Among the gene probes most correlated with AD severity, we found those linked to synaptic function, neurofilament bundle assembly and neuronal plasticity. CONCLUSIONS/SIGNIFICANCE: A transcription factor analysis of the 1,372-probe signature reveals significant associations with the EGR/KROX family of proteins, MAZ, and E2F1. The homologues of EGR1 (zif268, Egr-1 or Zenk), together with other members of the EGR family, are increasingly recognized as playing a key role in neuronal plasticity in the brain. These results indicate a degree of commonality between putative genes involved in AD and prion-induced neurodegenerative processes that warrants further investigation.


Subjects
Alzheimer Disease/diagnosis , Biomarkers/analysis , Cognition Disorders/diagnosis , Gene Expression Profiling , Hippocampus/metabolism , Alzheimer Disease/genetics , Cognition Disorders/genetics , Computational Biology/methods , Hippocampus/pathology , Humans , Neuronal Plasticity , Oligonucleotide Array Sequence Analysis/methods , Transcription Factors
18.
PLoS One ; 5(12): e14176, 2010 Dec 01.
Article in English | MEDLINE | ID: mdl-21152067

ABSTRACT

BACKGROUND: Several lines of evidence suggest that transcription factors are involved in the pathogenesis of Multiple Sclerosis (MS), but complete mapping of the whole network has been elusive. One of the reasons is that there are several clinical subtypes of MS, and transcription factors that may be involved in one subtype may not be involved in others. We investigate the possibility that this network could be mapped using microarray technologies and contemporary bioinformatics methods on a dataset derived from whole blood in 99 untreated MS patients (36 Relapsing-Remitting MS, 43 Primary Progressive MS, and 20 Secondary Progressive MS) and 45 age-matched healthy controls. METHODOLOGY/PRINCIPAL FINDINGS: We have used two different analytical methodologies, a non-standard differential expression analysis and a differential co-expression analysis, which have converged on a significant number of regulatory motifs that are statistically overrepresented in genes that are either differentially expressed or differentially co-expressed between cases and controls (e.g., V$KROX_Q6, p-value < 3.31E-6; V$CREBP1_Q2, p-value < 9.93E-6; V$YY1_02, p-value < 1.65E-5). CONCLUSIONS/SIGNIFICANCE: Our analysis uncovered a network of transcription factors that potentially dysregulate several genes in MS or one or more of its disease subtypes. The most significant transcription factor motifs were for the Early Growth Response EGR/KROX family, ATF2, YY1 (Yin and Yang 1), E2F-1/DP-1 and E2F-4/DP-2 heterodimers, SOX5, and CREB and ATF families. These transcription factors are involved in early T-lymphocyte specification and commitment as well as in oligodendrocyte dedifferentiation and development, both pathways that have significant biological plausibility in MS causation.
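
The abstract reports p-values for motif overrepresentation but does not state which test produced them. One common choice for this kind of question is a one-sided hypergeometric test, sketched below purely as an assumption; the function name `hypergeom_upper_tail` and its parameters are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) when `draws` genes are sampled without replacement
    from `population` genes, of which `successes` carry the motif.

    Assumed test for illustration; not confirmed as the method used in the paper.
    """
    if not (0 <= successes <= population and 0 <= draws <= population):
        raise ValueError("successes and draws must lie between 0 and population")
    if observed < 0:
        raise ValueError("observed must be non-negative")
    upper = min(successes, draws)
    # Exact integer arithmetic; math.comb returns 0 for impossible combinations.
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / comb(population, draws)
```

For instance, with made-up numbers, `hypergeom_upper_tail(20000, 400, 300, 20)` would give the chance of seeing at least 20 motif-bearing genes in a list of 300 when 400 of 20,000 genes carry the motif. Testing many motifs at once would also call for a multiple-testing correction.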


Subjects
Gene Expression Profiling , Genome-Wide Association Study , Multiple Sclerosis/blood , Messenger RNA/metabolism , Transcription Factors/metabolism , Adolescent , Adult , Aged , Aged 80 and over , Case-Control Studies , Cohort Studies , Female , Humans , Male , Middle Aged , Multiple Sclerosis/metabolism , Oligodendroglia/cytology
19.
J Neurosci Methods ; 181(2): 257-67, 2009 Jul 30.
Article in English | MEDLINE | ID: mdl-19445963

ABSTRACT

In this sequel to our previous work [Rosso OA, Mendes A, Rostas JA, Hunter M, Moscato P. Distinguishing childhood absence epilepsy patients from controls by the analysis of their background brain electrical activity. J. Neurosci. Methods 2009;177:461-68], we extend the analysis of background electroencephalography (EEG), recorded with scalp electrodes in a clinical setting, in children with childhood absence epilepsy (CAE) and control individuals. The same set of individuals was considered: five CAE patients, all right-handed females aged 6-8 years. The EEG was obtained using bipolar connections from a standard 10-20 electrode placement. The functional activity between electrodes was evaluated using a wavelet decomposition in conjunction with the Wootters distance. In the previous study, a Kruskal-Wallis statistical test was used to select the pairs of electrodes with differentiated behavior between CAE and control samples (classes). In this contribution, we present the results for a combinatorial optimization approach to select the pairs of electrodes. The new method produces a better separation between the classes while using a smaller number of features (pairs of electrodes): it almost halves the number of features and also improves the separation between the CAE and control samples. The new results strengthen the hypothesis that mostly fronto-central electrodes carry useful information and patterns that can help to discriminate CAE cases from controls. Finally, we provide a comprehensive set of tests and an in-depth explanation of the method and results.
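
For orientation, the Wootters statistical distance between two discrete probability distributions P and Q is arccos(sum_i sqrt(p_i * q_i)). The sketch below assumes each electrode signal has already been summarized as a probability vector (for example, relative wavelet energies per frequency band); how the paper builds those vectors is not given in the abstract, and the function name `wootters_distance` is hypothetical.

```python
import math


def wootters_distance(p, q):
    """Wootters statistical distance arccos(sum_i sqrt(p_i * q_i)), in radians.

    `p` and `q` are assumed to be probability vectors of equal length,
    e.g. relative wavelet energies per band (an assumption, not the paper's stated input).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    if any(x < 0 for x in p) or any(x < 0 for x in q):
        raise ValueError("probabilities must be non-negative")
    if abs(sum(p) - 1.0) > 1e-9 or abs(sum(q) - 1.0) > 1e-9:
        raise ValueError("each distribution must sum to 1")
    overlap = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp to 1.0 so rounding error cannot push acos outside its domain.
    return math.acos(min(1.0, overlap))
```

The distance is 0 for identical distributions and reaches pi/2 when the two distributions share no support, so each pair of electrodes yields one bounded feature value per recording.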


Subjects
Brain/physiopathology , Electrodiagnosis , Absence Epilepsy/physiopathology , Child , Electrodes , Electroencephalography , Female , Humans , Neurological Models , Computer-Assisted Signal Processing