1.
BMC Bioinformatics ; 10: 53, 2009 Feb 07.
Article in English | MEDLINE | ID: mdl-19200394

ABSTRACT

BACKGROUND: Information extraction from microarrays has not yet been widely used in diagnostic or prognostic decision-support systems, due to the diversity of results produced by the available techniques, their instability on different data sets and the inability to relate statistical significance to biological relevance. Thus, there is an urgent need to address the statistical framework of microarray analysis and identify its drawbacks and limitations, which will enable us to thoroughly compare methodologies under the same experimental set-up and associate results with confidence intervals meaningful to clinicians. In this study we consider gene-selection algorithms with the aim of revealing inefficiencies in performance evaluation and addressing aspects that can reduce uncertainty in algorithmic validation. RESULTS: A computational study is performed on the performance of several gene selection methodologies on publicly available microarray data. Three basic types of experimental scenarios are evaluated, i.e. the independent test-set and the 10-fold cross-validation (CV) using maximum and average performance measures. Feature selection methods behave differently under different validation strategies. The performance results from CV do not match those from the independent test-set well, except for the support vector machines (SVM) and the least squares SVM methods. However, these wrapper methods achieve variable (often low) performance, whereas the hybrid methods attain consistently higher accuracies. The use of an independent test-set within CV is important for the evaluation of the predictive power of algorithms. The optimal size of the selected gene-set also appears to be dependent on the evaluation scheme. The consistency of selected genes over variation of the training-set is another aspect important in reducing uncertainty in the evaluation of the derived gene signature. In all cases the presence of outlier samples can seriously affect algorithmic performance. CONCLUSION: Multiple parameters can influence the selection of a gene-signature and its predictive power, thus possible biases in validation methods must always be accounted for. This paper illustrates that independent test-set evaluation reduces the bias of CV, and case-specific measures reveal stability characteristics of the gene-signature over changes of the training set. Moreover, frequency measures on gene selection address the algorithmic consistency in selecting the same gene signature under different training conditions. These issues contribute to the development of an objective evaluation framework and aid the derivation of statistically consistent gene signatures that could eventually be correlated with biological relevance. The benefits of the proposed framework are supported by the evaluation results and methodological comparisons performed for several gene-selection algorithms on three publicly available datasets.
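
The frequency measure mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring function (`t_like_score`), the toy data generator (`make_toy_data`) and all parameter values are assumptions chosen for illustration. The sketch re-selects the top-k genes on each training portion of a 10-fold split and reports how often each gene is chosen.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def make_toy_data(n_samples=40, n_genes=20, n_informative=2, shift=4.0, seed=0):
    """Hypothetical synthetic data: the first n_informative genes differ by class."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]
    rows = []
    for label in labels:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in range(n_informative):
            row[g] += shift * label
        rows.append(row)
    return rows, labels


def t_like_score(values, labels):
    """Illustrative filter score: class-mean difference over summed class spreads."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return abs(mean(pos) - mean(neg)) / (pstdev(pos) + pstdev(neg) + 1e-9)


def top_k_genes(X, y, k):
    """Rank genes by the filter score and return the indices of the best k."""
    n_genes = len(X[0])
    scores = [t_like_score([row[g] for row in X], y) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]


def selection_frequency(X, y, k, n_folds=10, seed=0):
    """Fraction of training folds in which each gene enters the top-k set."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    counts = Counter()
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        counts.update(top_k_genes([X[i] for i in train], [y[i] for i in train], k))
    return {gene: count / n_folds for gene, count in counts.items()}


if __name__ == "__main__":
    X, y = make_toy_data()
    print(selection_frequency(X, y, k=2))
```

A gene with frequency close to 1 is selected regardless of which samples are held out, while genes with low frequency depend on the particular training set.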


Subject(s)
Algorithms; Oligonucleotide Array Sequence Analysis/methods; Computational Biology/methods; Databases, Genetic; Gene Expression Profiling/methods
2.
Comput Methods Programs Biomed ; 91(1): 22-35, 2008 Jul.
Article in English | MEDLINE | ID: mdl-18423925

ABSTRACT

OBJECTIVE: The problem of gene selection has been extensively studied in a number of scientific works using various kinds of methods. However, the application of a linear neuron is a novel approach possessing several advantages. In this work we propose to study the behavior of such a linear neuron, appropriately adapted and trained to the problem of gene selection in the DNA microarray experiment. METHODS AND MATERIALS: We explore the proposed approach in terms of an accuracy evaluation criterion, which is used to assess the performance of the proposed methodology, but we also evaluate the produced results in terms of cluster quality and survival prediction. Cluster quality reflects the ability of the method to select differentially expressed genes, which in turn leads to better clustering and survival prediction. RESULTS: We directly compare the proposed methodology with RFE-SVM, a well known and broadly accepted method demonstrating remarkable performance on various data sets of clinical interest. CONCLUSIONS: Conducted computational experiments show that the proposed approach can be efficiently used within the field of gene selection producing high-quality results in terms of accuracy and robustness.


Subject(s)
Biomarkers, Tumor/analysis; Diagnosis, Computer-Assisted/methods; Gene Expression Profiling/methods; Neoplasm Proteins/analysis; Neoplasms/diagnosis; Neoplasms/mortality; Risk Assessment/methods; Survival Analysis; Humans; Neoplasms/metabolism; Neural Networks, Computer; Pattern Recognition, Automated/methods; Prognosis; Reproducibility of Results; Risk Factors; Sensitivity and Specificity; Survival Rate
3.
Comput Biol Med ; 38(8): 894-912, 2008 Aug.
Article in English | MEDLINE | ID: mdl-18656182

ABSTRACT

OBJECTIVE: The problem of marker selection in DNA microarray analysis has been addressed so far by two basic types of approaches, the so-called filter and wrapper methods. Wrapper methods operate in a recursive fashion where feature (gene) weights are re-evaluated and change dynamically from iteration to iteration, while in filter methods feature weights remain fixed. Our objective in this study is to show that the application of filter criteria in a recursive fashion, where weights are potentially adjusted from cycle to cycle, produces noticeable improvement in the generalization performance measured on independent test sets. METHODS AND MATERIALS: To this end we explore the behavior of two well-known and broadly accepted pattern recognition approaches, namely the support vector machines (SVM) and a single linear neuron (LN), properly adapted to the problem of marker selection. Within this context we also show how the kernel ability of SVM could be employed in a practical manner to provide alternative ways to approach the problem of reliable marker selection. RESULTS: We explore how the proposed approaches behave in two application domains (breast cancer and leukemia), achieving comparable or even better results than those reported in the related bibliography. An important advantage of these approaches is their ability to derive stable performance without deteriorating due to the complexity of the application domain. Validation is performed using internal leave-one-out (ILOO) and 10-fold cross-validation as well as independent test set evaluation. CONCLUSIONS: Results show that the proposed methodologies achieve remarkable performance and indicate that applying filter criteria in a wrapper fashion ('wrapper filtering criteria') provides a useful tool for marker selection. The contribution of this study is threefold. First, it provides a methodology to apply filter criteria in a wrapper way (which is a new approach); second, it introduces a fundamental pattern recognition component, namely the single neuron (which is a linear estimator), and explores its behavior on marker selection; and third, it demonstrates an approach to exploit the kernel ability of SVMs in a practical and effective manner.
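
The recursive re-evaluation of gene weights described here can be sketched as follows. This is a generic illustration under stated assumptions, not the method of the paper: the neuron is trained with a plain least-mean-squares (delta) rule, one gene is removed per cycle, and the names `train_linear_neuron`, `recursive_elimination` and `make_toy_data`, along with the learning rate and epoch count, are hypothetical.

```python
import random


def make_toy_data(n_samples=60, n_genes=8, seed=0):
    """Hypothetical synthetic data: genes 0 and 1 follow the class target, the rest are noise."""
    rng = random.Random(seed)
    targets = [1.0 if i % 2 == 0 else -1.0 for i in range(n_samples)]
    rows = []
    for t in targets:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        row[0] = 2.0 * t + 0.5 * row[0]
        row[1] = 2.0 * t + 0.5 * row[1]
        rows.append(row)
    return rows, targets


def train_linear_neuron(X, y, epochs=200, lr=0.01):
    """Fit weights and bias of a single linear neuron with the LMS (delta) rule."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for row, target in zip(X, y):
            out = sum(wi * xi for wi, xi in zip(w, row)) + b
            err = target - out
            w = [wi + lr * err * xi for wi, xi in zip(w, row)]
            b += lr * err
    return w, b


def recursive_elimination(X, y, n_keep):
    """Retrain on the surviving genes each cycle and drop the one with the smallest |weight|."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        sub = [[row[g] for g in active] for row in X]
        w, _ = train_linear_neuron(sub, y)
        worst = min(range(len(active)), key=lambda j: abs(w[j]))
        del active[worst]
    return active


if __name__ == "__main__":
    X, y = make_toy_data()
    print(recursive_elimination(X, y, n_keep=2))
```

Because the neuron is retrained after every removal, a gene's weight in one cycle can differ from its weight in the next, which is the property that distinguishes this recursive scheme from a single-pass filter ranking.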


Subject(s)
Neurons; Algorithms; Gene Expression; Oligonucleotide Array Sequence Analysis
4.
Artif Intell Med ; 53(1): 57-71, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21767937

ABSTRACT

OBJECTIVE: Gene expression patterns that distinguish clinically significant disease subclasses may not only play a prominent role in diagnosis, but also lead to therapeutic strategies that tailor the treatment to the particular biology of each disease. Nevertheless, gene expression signatures derived through statistical feature-extraction procedures on population datasets have received rightful criticism, since they share few genes in common, even when derived from the same dataset. We focus on knowledge complementarities conveyed by two or more gene-expression signatures by means of embedded biological processes and pathways, which alternatively form a meta-knowledge platform of analysis towards a more global, robust and powerful solution. METHODS: The main contribution of this work is the introduction and study of an approach for integrating different gene signatures based on the underlying biological knowledge, in an attempt to derive a unified global solution. It is further recognized that one group's signature does not perform well on another group's data, due to incompatibilities of microarray technologies and the experimental design. We assess this cross-platform aspect, showing that a unified solution derived on the basis of both statistical and biological validation may also help in overcoming such inconsistencies. RESULTS: Based on the proposed approach we derived a unified 69-gene signature, which significantly outperforms the initial signatures, achieving an accuracy of 0.73 on 234 new patients with 81% sensitivity and 64% specificity. The same signature manages to reveal the two prognostic groups on an additional dataset of 286 new patients obtained through a different experimental protocol and microarray platform. Furthermore, it manages to derive two clusters in a dataset from a different platform, showing remarkable difference on both gene-expression and survival-prediction levels.
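
For readers less familiar with the metrics reported above, the following minimal sketch shows how accuracy, sensitivity and specificity are computed from confusion-matrix counts. The function name `classification_metrics` and the counts in the usage example are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # fraction of true positives detected
    specificity = tn / (tn + fp)          # fraction of true negatives detected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(classification_metrics(tp=8, fn=2, tn=6, fp=4))
```

Accuracy is the prevalence-weighted average of sensitivity and specificity, so the same sensitivity and specificity yield a different accuracy when the proportion of positive cases changes.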


Subject(s)
Gene Expression Profiling/methods; Oligonucleotide Array Sequence Analysis/methods; Breast Neoplasms/genetics; Databases, Factual; Female; Humans; Knowledge Bases; Sensitivity and Specificity
5.
IEEE Trans Inf Technol Biomed ; 15(1): 155-63, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20813648

ABSTRACT

The concept of gene signature overlap has been addressed previously in a number of research papers. A common conclusion is the absence of significant overlap. In this paper, we verify the aforementioned fact, but we also assess the issue of similarities not on the gene level, but on the biology level hidden underneath a given signature. We proceed by taking into account the biological knowledge that exists among different signatures, and use it as a means of integrating them and refining their statistical significance on the datasets. In this form, by integrating biological knowledge with information stemming from data distributions, we derive a unified signature that is significantly improved over its predecessors in terms of performance and robustness. Our motive behind this approach is to assess the problem of evaluating different signatures not in a competitive but rather in a complementary manner, where one is treated as a pool of knowledge contributing to a global and unified solution.
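
Gene-level overlap between two signatures can be quantified with a simple set comparison. The sketch below uses the Jaccard index, which is an assumed choice of measure and not necessarily the one used in the paper. The function name `signature_overlap` and the gene identifiers are placeholders.

```python
def signature_overlap(sig_a, sig_b):
    """Return the shared genes and the Jaccard index of two gene signatures."""
    a, b = set(sig_a), set(sig_b)
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard


if __name__ == "__main__":
    # Placeholder identifiers, not real gene names.
    first = ["G1", "G2", "G3", "G4"]
    second = ["G3", "G4", "G5"]
    print(signature_overlap(first, second))
```

A low Jaccard index at the gene level does not rule out similarity at the level of biological processes or pathways, which is the comparison the abstract argues for.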


Subject(s)
Biomarkers, Tumor/genetics; Computational Biology/methods; Gene Expression Profiling/methods; Oligonucleotide Array Sequence Analysis; Algorithms; Area Under Curve; Breast Neoplasms/genetics; Cluster Analysis; Databases, Nucleic Acid; Female; Humans; Kaplan-Meier Estimate; ROC Curve
6.
Article in English | MEDLINE | ID: mdl-19963602

ABSTRACT

The concept of deriving a gene signature in breast cancer has been addressed by different research groups, each one proposing a different solution with minor overlap among them. Unifying results among different research groups is still an open issue. In this study we evaluate two published signatures, namely the 70-gene signature of the Netherlands group and a 57-gene signature published in our previous study, and propose an evaluation platform under which these signatures can be compared effectively. After such an evaluation, we proceed with a unified signature and assess its performance, which improves on that of the initial signatures.


Subject(s)
Breast Neoplasms/diagnosis; Breast Neoplasms/genetics; Gene Expression Profiling/methods; Gene Expression Regulation, Neoplastic; Oligonucleotide Array Sequence Analysis/instrumentation; Algorithms; Area Under Curve; Computational Biology/methods; Female; Gene Expression Regulation; Genomics; Humans; Models, Biological; Oligonucleotide Array Sequence Analysis/methods; Prognosis; Recurrence; Reproducibility of Results
7.
Article in English | MEDLINE | ID: mdl-18002933

ABSTRACT

The problem of marker selection in DNA microarray analysis has been mostly addressed by linear methods. RFE-SVM is a representative example, in which a linear kernel is used as the basic tool to address the problem. A single neuron, on the other hand, is also known to be a linear estimator. In this study we explore such a single neuron to address the problem of marker selection.


Subject(s)
Biomarkers, Tumor/genetics; Breast Neoplasms/genetics; Gene Expression Profiling/methods; Gene Expression Regulation, Neoplastic; Neural Networks, Computer; Oligonucleotide Array Sequence Analysis/methods; Biomarkers, Tumor/biosynthesis; Breast Neoplasms/metabolism; Female; Humans; Neurons