Results 1 - 17 of 17
1.
Arthritis Rheumatol ; 67(11): 2905-15, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26195278

ABSTRACT

OBJECTIVE: Inflammatory mediators, such as prostaglandin E2 (PGE2) and interleukin-1β (IL-1β), are produced by osteoarthritic (OA) joint tissue, where they may contribute to disease pathogenesis. We undertook the present study to examine whether inflammation, evidenced in plasma and peripheral blood leukocytes (PBLs), reflects the presence, progression, or specific symptoms of symptomatic knee OA. METHODS: Patients with symptomatic knee OA were enrolled in a 24-month prospective study of radiographic progression. Standardized knee radiographs were obtained at baseline and 24 months. At baseline, levels of the plasma lipids PGE2 and 15-hydroxyeicosatetraenoic acid (15-HETE) were measured, and transcriptome analysis of PBLs was performed by microarray and quantitative polymerase chain reaction. RESULTS: Baseline PGE2 synthase (PGES) levels determined by PBL microarray gene expression and plasma PGE2 levels distinguished patients with symptomatic knee OA from non-OA controls (area under the receiver operating characteristic curve [AUC] 0.87 and 0.89, respectively, P < 0.0001). Baseline plasma 15-HETE levels were significantly elevated in patients with symptomatic knee OA versus non-OA controls (P < 0.0195). In the 146 patients who completed the 24-month study, elevated baseline expression of IL-1β, tumor necrosis factor α, and cyclooxygenase 2 (COX-2) messenger RNA in PBLs predicted higher risk of radiographic progression as evidenced by joint space narrowing (JSN). In a multivariate model, AUC point estimates of models containing COX-2 in combination with demographic traits overlapped the confidence interval of the base model in 2 of the 3 JSN outcome measures (JSN >0.0 mm, JSN >0.2 mm, and JSN >0.5 mm; AUC 0.62-0.67). CONCLUSION: The inflammatory plasma lipid biomarkers PGE2 and 15-HETE identify patients with symptomatic knee OA, and the PBL inflammatory transcriptome identifies a subset of patients with symptomatic knee OA who are at increased risk of radiographic progression. These findings may reflect low-grade inflammation in OA and may be useful as diagnostic and prognostic biomarkers in clinical development of disease-modifying OA drugs.
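
Illustrative note: the abstract above summarizes how well a single marker separates patients from controls as an AUC. A minimal sketch of that quantity follows, assuming the rank-based (Mann-Whitney) definition of AUC. The function name and all marker values are hypothetical and are not data from the study.

```python
def roc_auc(case_values, control_values):
    """AUC as the probability that a randomly chosen case scores
    higher than a randomly chosen control (ties count as 0.5)."""
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical marker levels in arbitrary units (assumption, not study data).
oa_patients = [4.1, 3.8, 5.0, 2.9, 4.4]
controls = [2.0, 3.1, 2.5, 1.8, 3.9]

print(roc_auc(oa_patients, controls))
```

With these made-up values, 22 of the 25 case-control pairs are ordered correctly, so the sketch reports an AUC of 0.88. An AUC of 0.5 would mean the marker carries no discriminating signal.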


Subject(s)
Dinoprostone/blood; Hydroxyeicosatetraenoic Acids/blood; Inflammation/pathology; Knee Joint/pathology; Osteoarthritis, Knee/pathology; Aged; Biomarkers/blood; Disease Progression; Female; Humans; Inflammation/blood; Knee Joint/diagnostic imaging; Male; Middle Aged; Osteoarthritis, Knee/blood; Osteoarthritis, Knee/diagnostic imaging; Prognosis; Prospective Studies; Radiography
2.
PLoS One ; 10(2): e0118132, 2015.
Article in English | MEDLINE | ID: mdl-25705890

ABSTRACT

Field of cancerization in the airway epithelium has been increasingly examined to understand early pathogenesis of non-small cell lung cancer. However, the extent of field of cancerization throughout the lung airways is unclear. Here we sought to determine the differential gene and microRNA expression associated with field of cancerization in the peripheral airway epithelial cells of patients with lung adenocarcinoma. We obtained peripheral airway brushings from smoker controls (n=13) and from the lung contralateral to the tumor in cancer patients (n=17). We performed gene and microRNA expression profiling on these peripheral airway epithelial cells using Affymetrix GeneChip and TaqMan Array. Integrated gene and microRNA analysis was performed to identify significant molecular pathways. We identified 26 mRNAs and 5 miRNAs that were significantly (FDR <0.1) up-regulated and 38 mRNAs and 12 miRNAs that were significantly down-regulated in the cancer patients when compared to smoker controls. Functional analysis identified differential transcriptomic expression related to tumorigenesis. Integration of miRNA-mRNA data into interaction network analysis showed modulation of the extracellular signal-regulated kinase/mitogen-activated protein kinase (ERK/MAPK) pathway in the contralateral lung field of cancerization. In conclusion, patients with lung adenocarcinoma have tumor-related molecules and pathways in histologically normal-appearing peripheral airway epithelial cells, a substantial distance from the tumor itself. This finding can potentially provide new biomarkers for early detection of lung cancer and novel therapeutic targets.
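
Illustrative note: the abstract above reports transcripts selected at FDR <0.1 but does not name the procedure used. The sketch below assumes the Benjamini-Hochberg step-up procedure, a common way to control the false discovery rate. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.1):
    """Return one boolean per hypothesis: True if it is rejected
    (called significant) at the requested false discovery rate."""
    m = len(p_values)
    # Indices of the hypotheses ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose p-value satisfies p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical per-transcript p-values (assumption, not study data).
p = [0.001, 0.04, 0.03, 0.5, 0.012]
print(benjamini_hochberg(p, fdr=0.1))
```

The step-up rule rejects every hypothesis ranked at or below the cutoff, even one whose own p-value exceeds its rank threshold. This is why the loop records the largest qualifying rank rather than stopping at the first failure.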


Subject(s)
Adenocarcinoma/genetics; Gene Expression Profiling; Lung Neoplasms/genetics; MicroRNAs/genetics; RNA, Messenger/genetics; Respiratory System/metabolism; Adenocarcinoma/metabolism; Aged; Cell Transformation, Neoplastic/genetics; Cell Transformation, Neoplastic/metabolism; Epithelial Cells/metabolism; Extracellular Signal-Regulated MAP Kinases/metabolism; Female; Humans; Lung Neoplasms/metabolism; Male; Middle Aged; Oligonucleotide Array Sequence Analysis; Reverse Transcriptase Polymerase Chain Reaction; Smoking
3.
Sci Rep ; 4: 4411, 2014 Mar 21.
Article in English | MEDLINE | ID: mdl-24651673

ABSTRACT

The spectrum of modern molecular high-throughput assaying includes diverse technologies such as microarray gene expression, miRNA expression, proteomics, DNA methylation, among many others. Now that these technologies have matured and become increasingly accessible, the next frontier is to collect "multi-modal" data for the same set of subjects and conduct integrative, multi-level analyses. While multi-modal data does contain distinct biological information that can be useful for answering complex biology questions, its value for predicting clinical phenotypes and contributions of each type of input remain unknown. We obtained 47 datasets/predictive tasks that in total span over 9 data modalities and executed analytic experiments for predicting various clinical phenotypes and outcomes. First, we analyzed each modality separately using uni-modal approaches based on several state-of-the-art supervised classification and feature selection methods. Then, we applied integrative multi-modal classification techniques. We have found that gene expression is the most predictively informative modality. Other modalities such as protein expression, miRNA expression, and DNA methylation also provide highly predictive results, which are often statistically comparable but not superior to gene expression data. Integrative multi-modal analyses generally do not increase predictive signal compared to gene expression data.


Subject(s)
Computational Biology/statistics & numerical data; DNA, Neoplasm/genetics; MicroRNAs/genetics; Neoplasm Proteins/genetics; Neoplasms/diagnosis; RNA, Neoplasm/genetics; DNA Methylation; DNA, Neoplasm/metabolism; Datasets as Topic; Diagnostic Imaging; Female; Gene Dosage; Gene Expression; Humans; Male; MicroRNAs/metabolism; Neoplasm Proteins/metabolism; Neoplasms/genetics; Neoplasms/mortality; Neoplasms/pathology; Prognosis; RNA, Neoplasm/metabolism; Survival Analysis
4.
BMC Syst Biol ; 7 Suppl 5: S1, 2013.
Article in English | MEDLINE | ID: mdl-24564859

ABSTRACT

BACKGROUND: Oncogenic mechanisms in small-cell lung cancer (SCLC) remain poorly understood, leaving this tumor with the worst prognosis among all lung cancers. Unlike other cancer types, sequencing genomic approaches have been of limited success in small-cell lung cancer, i.e., no mutated oncogenes with potential driver characteristics have emerged, as is the case for activating mutations of epidermal growth factor receptor in non-small-cell lung cancer. Differential gene expression analysis has also produced SCLC signatures with limited application, since they are generally not robust across datasets. Nonetheless, additional genomic approaches are warranted, due to the increasing availability of suitable small-cell lung cancer datasets. Gene co-expression network approaches are a recent and promising avenue, since they have been successful in identifying gene modules that drive phenotypic traits in several biological systems, including other cancer types. RESULTS: We derived an SCLC-specific classifier from weighted gene co-expression network analysis (WGCNA) of a lung cancer dataset. The classifier, termed SCLC-specific hub network (SSHN), robustly separates SCLC from other lung cancer types across multiple datasets and multiple platforms, including RNA-seq and shotgun proteomics. The classifier was also conserved in SCLC cell lines. SSHN is enriched for co-expressed signaling network hubs strongly associated with the SCLC phenotype. Twenty of these hubs are actionable kinases with oncogenic potential, among which spleen tyrosine kinase (SYK) exhibits one of the highest overall statistical associations to SCLC. In patient tissue microarrays and cell lines, SCLC can be separated into SYK-positive and -negative. SYK siRNA decreases proliferation rate and increases cell death of SYK-positive SCLC cell lines, suggesting a role for SYK as an oncogenic driver in a subset of SCLC. CONCLUSIONS: SCLC treatment has thus far been limited to chemotherapy and radiation. Our WGCNA analysis identifies SYK both as a candidate biomarker to stratify SCLC patients and as a potential therapeutic target. In summary, WGCNA represents an alternative strategy to large scale sequencing for the identification of potential oncogenic drivers, based on a systems view of signaling networks. This strategy is especially useful in cancer types where no actionable mutations have emerged.
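
Illustrative note: the abstract above relies on weighted co-expression networks and their hubs. The sketch below shows the core idea under stated assumptions: an unsigned adjacency defined as the absolute Pearson correlation raised to a soft-threshold power, and a gene's connectivity defined as the sum of its adjacencies. The power of 6, the gene names, and the expression values are hypothetical, and this is not the authors' pipeline.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def connectivity(expression, beta=6):
    """expression maps gene -> expression values across samples.
    Returns gene -> sum of |correlation| ** beta over all other genes."""
    genes = list(expression)
    k = {g: 0.0 for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            adjacency = abs(pearson(expression[gi], expression[gj])) ** beta
            k[gi] += adjacency
            k[gj] += adjacency
    return k


# Hypothetical expression profiles over five samples (assumption).
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.1, 9.8],  # tracks geneA closely
    "geneC": [5.0, 4.1, 2.9, 2.2, 1.0],  # anti-correlated with geneA
    "geneD": [3.0, 1.0, 4.0, 1.5, 3.2],  # largely unrelated to the others
}
k = connectivity(toy)
hub = max(k, key=k.get)
print(hub, k)
```

Raising the correlation to a power suppresses weak correlations while keeping strong ones near 1, so genes with many strong partners stand out as hubs. In this toy data the unrelated geneD ends up with the lowest connectivity.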


Subject(s)
Gene Expression Profiling; Gene Regulatory Networks; Intracellular Signaling Peptides and Proteins/metabolism; Lung Neoplasms/metabolism; Lung Neoplasms/pathology; Oncogene Proteins/metabolism; Protein-Tyrosine Kinases/metabolism; Small Cell Lung Carcinoma/metabolism; Small Cell Lung Carcinoma/pathology; Cell Line, Tumor; Cell Proliferation; Cell Survival; Gene Knockdown Techniques; Humans; Intracellular Signaling Peptides and Proteins/deficiency; Intracellular Signaling Peptides and Proteins/genetics; Molecular Targeted Therapy; Oncogene Proteins/deficiency; Oncogene Proteins/genetics; Protein-Tyrosine Kinases/deficiency; Protein-Tyrosine Kinases/genetics; Proteomics; Syk Kinase
5.
PLoS One ; 7(6): e39790, 2012.
Article in English | MEDLINE | ID: mdl-22761902

ABSTRACT

We have developed a mouse model of atherosclerotic plaque regression in which an atherosclerotic aortic arch from a hyperlipidemic donor is transplanted into a normolipidemic recipient, resulting in rapid elimination of cholesterol and monocyte-derived macrophage cells (CD68+) from transplanted vessel walls. To gain a comprehensive view of the differences in gene expression patterns in macrophages associated with regressing compared with progressing atherosclerotic plaque, we compared mRNA expression patterns in CD68+ macrophages extracted from plaque in aortic arches transplanted into normolipidemic or into hyperlipidemic recipients. In CD68+ cells from regressing plaque we observed that genes associated with the contractile apparatus responsible for cellular movement (e.g. actin and myosin) were up-regulated whereas genes related to cell adhesion (e.g. cadherins, vinculin) were down-regulated. In addition, CD68+ cells from regressing plaque were characterized by enhanced expression of genes associated with an anti-inflammatory M2 macrophage phenotype, including arginase I, CD163 and the C-lectin receptor. Our analysis suggests that in regressing plaque CD68+ cells preferentially express genes that reduce cellular adhesion, enhance cellular motility, and overall act to suppress inflammation.


Subject(s)
Atherosclerosis/pathology; Macrophages/metabolism; Transcriptome; Animals; Antigens, CD/genetics; Antigens, CD/immunology; Antigens, Differentiation, Myelomonocytic/genetics; Antigens, Differentiation, Myelomonocytic/immunology; Apolipoproteins E/genetics; Atherosclerosis/genetics; Macrophages/immunology; Mice; Mice, Inbred C57BL; Mice, Knockout; Real-Time Polymerase Chain Reaction
6.
BMC Genomics ; 13 Suppl 8: S22, 2012.
Article in English | MEDLINE | ID: mdl-23282373

ABSTRACT

BACKGROUND: The discovery of molecular pathways is a challenging problem and its solution relies on the identification of causal molecular interactions in genomics data. Causal molecular interactions can be discovered using randomized experiments; however, such experiments are often costly, infeasible, or unethical. Fortunately, algorithms that infer causal interactions from observational data have been in development for decades, predominantly in the quantitative sciences, and many of them have recently been applied to genomics data. While these algorithms can infer unoriented causal interactions between involved molecular variables (i.e., without specifying which one is the cause and which one is the effect), causally orienting all inferred molecular interactions was assumed to be an unsolvable problem until recently. In this work, we use transcription factor-target gene regulatory interactions in three different organisms to evaluate a new family of methods that, given observational data for just two causally related variables, can determine which one is the cause and which one is the effect. RESULTS: We have found that a particular family of causal orientation methods (IGCI Gaussian) is often able to accurately infer directionality of causal interactions, and that these methods usually outperform other causal orientation techniques. We also introduced a novel ensemble technique for causal orientation that combines decisions of individual causal orientation methods. The ensemble method was found to be more accurate than the best individual causal orientation method in the tested data. CONCLUSIONS: This work represents a first step towards establishing context for practical use of causal orientation methods in the genomics domain. We have found that some causal orientation methodologies yield accurate predictions of causal orientation in genomics data, and we have improved on this capability with a novel ensemble method. Our results suggest that these methods have the potential to facilitate reconstruction of molecular pathways by minimizing the number of required randomized experiments to find causal directionality and by avoiding experiments that are infeasible and/or unethical.
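
Illustrative note: the abstract above names "IGCI Gaussian" without describing it. The sketch below assumes this refers to information-geometric causal inference with a Gaussian reference measure (each variable standardized to zero mean and unit variance) and uses the slope-based estimator. Both choices, the function names, and the toy data are assumptions, and the sketch is not the authors' implementation.

```python
import math
import statistics


def _standardize(values):
    """Rescale to zero mean and unit variance (Gaussian reference measure)."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def igci_slope_score(cause, effect):
    """Slope-based IGCI score for the hypothesis cause -> effect.
    The direction with the lower score is taken as more plausible."""
    pairs = sorted(zip(_standardize(cause), _standardize(effect)))
    total, used = 0.0, 0
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx != 0 and dy != 0:  # skip ties, where the log-slope is undefined
            total += math.log(abs(dy / dx))
            used += 1
    return total / used


def infer_direction(x, y):
    """Return "X->Y" or "Y->X" by comparing the two directional scores."""
    if igci_slope_score(x, y) < igci_slope_score(y, x):
        return "X->Y"
    return "Y->X"


# Toy deterministic relation y = x**3 on an even grid (assumption).
xs = [i / 199 for i in range(200)]
ys = [v ** 3 for v in xs]
print(infer_direction(xs, ys))
```

For this noise-free monotonic example the two scores are exact negatives of each other, so the sign of either one decides the direction. Real expression data are noisy, and the abstract's ensemble combines several such decisions. The sketch makes no claim about how that combination is done.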


Subject(s)
Algorithms; Genomics; Area Under Curve; Databases, Factual; Escherichia coli/genetics; Escherichia coli/metabolism; Escherichia coli Proteins/genetics; Escherichia coli Proteins/metabolism; Gene Regulatory Networks; Humans; Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/genetics; Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/metabolism; ROC Curve; Receptor, Notch1/genetics; Receptor, Notch1/metabolism; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae Proteins/metabolism; Transcription Factor RelA/genetics; Transcription Factor RelA/metabolism
7.
Biol Direct ; 6: 15, 2011 Feb 28.
Article in English | MEDLINE | ID: mdl-21356087

ABSTRACT

BACKGROUND: Pathway databases are becoming increasingly important and almost omnipresent in most types of biological and translational research. However, little is known about the quality and completeness of pathways stored in these databases. The present study conducts a comprehensive assessment of transcriptional regulatory pathways in humans for seven well-studied transcription factors: MYC, NOTCH1, BCL6, TP53, AR, STAT1, and RELA. The employed benchmarking methodology first involves integrating genome-wide binding with functional gene expression data to derive direct targets of transcription factors. Then the lists of experimentally obtained direct targets are compared with relevant lists of transcriptional targets from 10 commonly used pathway databases. RESULTS: The results of this study show that for the majority of pathway databases, the overlap between experimentally obtained target genes and targets reported in transcriptional regulatory pathway databases is surprisingly small and often is not statistically significant. The only exception is the MetaCore pathway database, which yields a statistically significant intersection with experimental results in 84% of cases. Additionally, we suggest that the lists of experimentally derived direct targets obtained in this study can be used to reveal new biological insight in transcriptional regulation and suggest novel putative therapeutic targets in cancer. CONCLUSIONS: Our study opens a debate on the validity of using many popular pathway databases to obtain transcriptional regulatory targets. We conclude that the choice of pathway databases should be informed by solid scientific evidence and rigorous empirical evaluation. REVIEWERS: This article was reviewed by Prof. Wing Hung Wong, Dr. Thiago Motta Venancio (nominated by Dr. L Aravind), and Prof. Geoff J McLachlan.
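
Illustrative note: the abstract above judges whether the overlap between an experimental target list and a database target list is statistically significant, without naming the test. The sketch below assumes a one-sided hypergeometric test, a common choice for gene-list overlaps. The function name and all counts are hypothetical.

```python
from math import comb


def overlap_p_value(universe, experimental, database, overlap):
    """Probability of seeing at least `overlap` shared genes when a list of
    `database` genes is drawn at random from `universe` genes, of which
    `experimental` are experimentally derived targets."""
    upper = min(experimental, database)
    tail = sum(
        comb(experimental, k) * comb(universe - experimental, database - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(universe, database)


# Hypothetical counts (assumption): 20,000 genes in total, 300 experimental
# targets, 150 targets listed in a database, 10 genes shared by both lists.
print(overlap_p_value(20000, 300, 150, 10))
```

A small p-value means the shared genes are unlikely to arise by chance alone. A large one corresponds to the "not statistically significant" overlaps described in the abstract.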


Subject(s)
Gene Regulatory Networks/genetics; Genome, Human/genetics; Transcription, Genetic; Databases, Genetic; Humans; Reference Standards; Transcription Factors/metabolism
8.
Cancer Cell ; 18(3): 268-81, 2010 Sep 14.
Article in English | MEDLINE | ID: mdl-20832754

ABSTRACT

It was previously shown that the NF-κB pathway is downstream of oncogenic Notch1 in T cell acute lymphoblastic leukemia (T-ALL). Here, we visualize Notch-induced NF-κB activation using both human T-ALL cell lines and animal models. We demonstrate that Hes1, a canonical Notch target and transcriptional repressor, is responsible for sustaining IKK activation in T-ALL. Hes1 exerts its effects by repressing the deubiquitinase CYLD, a negative IKK complex regulator. CYLD expression was found to be significantly suppressed in primary T-ALL. Finally, we demonstrate that IKK inhibition is a promising option for the targeted therapy of T-ALL as specific suppression of IKK expression and function affected both the survival of human T-ALL cells and the maintenance of the disease in vivo.


Subject(s)
Basic Helix-Loop-Helix Transcription Factors/metabolism; Homeodomain Proteins/metabolism; Leukemia, T-Cell/metabolism; NF-kappa B/metabolism; Receptors, Notch/metabolism; Tumor Suppressor Proteins/antagonists & inhibitors; Animals; Basic Helix-Loop-Helix Transcription Factors/genetics; Cell Differentiation/physiology; Cell Growth Processes/physiology; Cell Survival/physiology; Deubiquitinating Enzyme CYLD; Genes, Tumor Suppressor; Homeodomain Proteins/genetics; Humans; Leukemia, T-Cell/genetics; Leukemia, T-Cell/pathology; Mice; Mice, Inbred BALB C; Mice, Inbred C57BL; Mice, Knockout; NF-kappa B/genetics; Receptors, Notch/genetics; Signal Transduction; Transcription Factor HES-1; Transcription Factor RelA/metabolism; Tumor Suppressor Proteins/genetics; Tumor Suppressor Proteins/metabolism
9.
PLoS One ; 4(3): e4922, 2009.
Article in English | MEDLINE | ID: mdl-19290050

ABSTRACT

BACKGROUND: Critical to the development of molecular signatures from microarray and other high-throughput data is testing the statistical significance of the produced signature in order to ensure its statistical reproducibility. While current best practices emphasize sufficiently powered univariate tests of differential expression, little is known about the factors that affect the statistical power of complex multivariate analysis protocols for high-dimensional molecular signature development. METHODOLOGY/PRINCIPAL FINDINGS: We show that choices of specific components of the analysis (i.e., error metric, classifier, error estimator and event balancing) have large and compounding effects on statistical power. The effects are demonstrated empirically by an analysis of 7 of the largest microarray cancer outcome prediction datasets and supplementary simulations, and by contrasting them to prior analyses of the same data. CONCLUSIONS/SIGNIFICANCE: The findings of the present study have two important practical implications. First, high-throughput studies, by avoiding under-powered data analysis protocols, can achieve substantial economies in the sample size required to demonstrate statistical significance of predictive signal. Factors that affect power are identified and studied. A much smaller sample than previously thought may be sufficient for exploratory studies as long as these factors are taken into consideration when designing and executing the analysis. Second, previous highly-cited claims that microarray assays may not be able to predict disease outcomes better than chance are shown by our experiments to be due to under-powered data analysis combined with inappropriate statistical tests.


Subject(s)
Data Interpretation, Statistical; Oligonucleotide Array Sequence Analysis; Humans; Neoplasms/genetics
10.
BMC Bioinformatics ; 9: 319, 2008 Jul 22.
Article in English | MEDLINE | ID: mdl-18647401

ABSTRACT

BACKGROUND: Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of gene expression microarray technology with several molecular signatures on their way toward clinical deployment. Use of the most accurate classification algorithms available for microarray gene expression data is a critical ingredient in order to develop the best possible molecular signatures for patient care. As suggested by a large body of literature to date, support vector machines can be considered "best of class" algorithms for classification of such data. Recent work, however, suggests that random forest classifiers may outperform support vector machines in this domain. RESULTS: In the present paper we identify methodological biases of prior work comparing random forests and support vector machines and conduct a new rigorous evaluation of the two algorithms that corrects these limitations. Our experiments use 22 diagnostic and prognostic datasets and show that support vector machines outperform random forests, often by a large margin. Our data also underline the importance of sound research design in benchmarking and comparison of bioinformatics algorithms. CONCLUSION: We found that, both on average and in the majority of microarray datasets, random forests are outperformed by support vector machines, both when no gene selection is performed and when several popular gene selection methods are used.


Subject(s)
Artificial Intelligence; Biomarkers, Tumor/analysis; Computational Biology/methods; Decision Trees; Oligonucleotide Array Sequence Analysis/methods; Algorithms; Databases, Genetic; Gene Expression Profiling/methods; Humans; Neoplasms/genetics; Pattern Recognition, Automated/methods; Random Allocation; Validation Studies as Topic
12.
PLoS One ; 2(9): e958, 2007 Sep 26.
Article in English | MEDLINE | ID: mdl-17895998

ABSTRACT

BACKGROUND: The development of new high-throughput genotyping technologies has allowed fast evaluation of single nucleotide polymorphisms (SNPs) on a genome-wide scale. Several recent genome-wide association studies employing these technologies suggest that panels of SNPs can be a useful tool for predicting cancer susceptibility and discovery of potentially important new disease loci. METHODOLOGY/PRINCIPAL FINDINGS: In the present paper we undertake a careful examination of the relative significance of genetics, environmental factors, and biases of the data analysis protocol that was used in a previously published genome-wide association study. That prior study reported a nearly perfect discrimination of esophageal cancer patients and healthy controls on the basis of only genetic information. On the other hand, our results strongly suggest that SNPs in this dataset are not statistically linked to the phenotype, while several environmental factors and especially family history of esophageal cancer (a proxy to both environmental and genetic factors) have only a modest association with the disease. CONCLUSIONS/SIGNIFICANCE: The main component of the previously claimed strong discriminatory signal is due to several data analysis pitfalls that in combination led to the strongly optimistic results. Such pitfalls are preventable and should be avoided in future studies since they create misleading conclusions and generate many false leads for subsequent research.


Subject(s)
Esophageal Neoplasms/genetics; Genetic Predisposition to Disease/genetics; Genome, Human; Polymorphism, Single Nucleotide; Age Factors; Alcohol Drinking/adverse effects; Environment; Esophageal Neoplasms/etiology; Family Health; Female; Genotype; Humans; Male; Oligonucleotide Array Sequence Analysis; Principal Component Analysis; Risk Factors; Smoking/adverse effects
13.
AMIA Annu Symp Proc ; : 686-90, 2007 Oct 11.
Article in English | MEDLINE | ID: mdl-18693924

ABSTRACT

Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of gene expression microarray technology with several molecular signatures on their way toward clinical deployment. Use of the most accurate decision support algorithms available for microarray gene expression data is a critical ingredient in order to develop the best possible molecular signatures for patient care. As suggested by a large body of literature to date, support vector machines can be considered "best of class" algorithms for classification of such data. Recent work, however, found that random forest classifiers outperform support vector machines. In the present paper we point to several biases of this prior work and conduct a new unbiased evaluation of the two algorithms. Our experiments using 18 diagnostic and prognostic datasets show that support vector machines outperform random forests, often by a large margin.


Subject(s)
Algorithms; Decision Making, Computer-Assisted; Decision Trees; Gene Expression Profiling/methods; Neoplasms/genetics; Oligonucleotide Array Sequence Analysis/methods; Computational Biology/methods; Humans; Neoplasms/classification; Pattern Recognition, Automated/methods
14.
Cancer Inform ; 2: 133-62, 2007 Feb 16.
Article in English | MEDLINE | ID: mdl-19458765

ABSTRACT

Sound data analysis is critical to the success of modern molecular medicine research that involves collection and interpretation of mass-throughput data. The novel nature and high-dimensionality in such datasets pose a series of nontrivial data analysis problems. This technical commentary discusses the problems of over-fitting, error estimation, curse of dimensionality, causal versus predictive modeling, integration of heterogeneous types of data, and lack of standard protocols for data analysis. We attempt to shed light on the nature and causes of these problems and to outline viable methodological approaches to overcome them.

15.
Int J Med Inform ; 74(7-8): 491-503, 2005 Aug.
Article in English | MEDLINE | ID: mdl-15967710

ABSTRACT

The success of treatment of patients with cancer depends on establishing an accurate diagnosis. To this end, we have built a system called GEMS (gene expression model selector) for the automated development and evaluation of high-quality cancer diagnostic models and biomarker discovery from microarray gene expression data. In order to determine and equip the system with the best performing diagnostic methodologies in this domain, we first conducted a comprehensive evaluation of classification algorithms using 11 cancer microarray datasets. In this paper we present a preliminary evaluation of the system with five new datasets. The performance of the models produced automatically by GEMS is comparable to or better than the results obtained by human analysts. Additionally, we performed a cross-dataset evaluation of the system. This involved using a dataset to build a diagnostic model and to estimate its future performance, then applying this model and evaluating its performance on a different dataset. We found that models produced by GEMS indeed perform well in independent samples and, furthermore, the cross-validation performance estimates output by the system approximate well the error obtained by the independent validation. GEMS is freely available for download for non-commercial use from http://www.gems-system.org.


Subject(s)
Biomarkers/analysis; Gene Expression; Neoplasms/diagnosis; Neoplasms/genetics; Oligonucleotide Array Sequence Analysis/methods; Algorithms; Diagnosis, Computer-Assisted; Humans; Models, Genetic; Software; United States
16.
Bioinformatics ; 21(5): 631-43, 2005 Mar 01.
Article in English | MEDLINE | ID: mdl-15374862

ABSTRACT

MOTIVATION: Cancer diagnosis is one of the most important emerging clinical applications of gene expression microarray technology. We are seeking to develop a computer system for powerful and reliable cancer diagnostic model creation based on microarray data. To keep a realistic perspective on clinical applications we focus on multicategory diagnosis. To equip the system with the optimum combination of classifier, gene selection and cross-validation methods, we performed a systematic and comprehensive evaluation of several major algorithms for multicategory classification, several gene selection methods, multiple ensemble classifier methods and two cross-validation designs using 11 datasets spanning 74 diagnostic categories and 41 cancer types and 12 normal tissue types. RESULTS: Multicategory support vector machines (MC-SVMs) are the most effective classifiers in performing accurate cancer diagnosis from gene expression data. The MC-SVM techniques by Crammer and Singer, Weston and Watkins and one-versus-rest were found to be the best methods in this domain. MC-SVMs outperform other popular machine learning algorithms, such as k-nearest neighbors, backpropagation and probabilistic neural networks, often to a remarkable degree. Gene selection techniques can significantly improve the classification performance of both MC-SVMs and other non-SVM learning algorithms. Ensemble classifiers do not generally improve performance of the best non-ensemble models. These results guided the construction of a software system GEMS (Gene Expression Model Selector) that automates high-quality model construction and enforces sound optimization and performance estimation procedures. This is the first such system to be informed by a rigorous comparative analysis of the available algorithms and datasets. AVAILABILITY: The software system GEMS is available for download from http://www.gems-system.org for non-commercial use. CONTACT: alexander.statnikov@vanderbilt.edu.
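
Illustrative note: the abstract above evaluates classifiers under cross-validation designs but does not detail them. The sketch below assumes a stratified k-fold design, in which every diagnostic category is spread evenly across folds. This matters for multicategory data with small classes. The function names and class labels are hypothetical, and this is not the GEMS implementation.

```python
from collections import defaultdict


def stratified_folds(labels, n_folds):
    """Assign each sample index to a fold so that every class is
    distributed as evenly as possible across the folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        # Deal the samples of one class round-robin over the folds.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


def train_test_splits(labels, n_folds):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    folds = stratified_folds(labels, n_folds)
    for held_out in range(n_folds):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


# Hypothetical diagnostic labels for 12 samples (assumption).
labels = ["ALL"] * 6 + ["AML"] * 3 + ["normal"] * 3
for train, test in train_test_splits(labels, n_folds=3):
    print(sorted(test))
```

In practice the sample order within each class would be shuffled first, and any gene selection or parameter tuning would be repeated inside each training split so that the held-out fold stays untouched.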


Subject(s)
Algorithms; Artificial Intelligence; Biomarkers, Tumor/metabolism; Diagnosis, Computer-Assisted/methods; Gene Expression Profiling/methods; Genetic Testing/methods; Neoplasm Proteins/metabolism; Neoplasms/diagnosis; Neoplasms/metabolism; Oligonucleotide Array Sequence Analysis/methods; Cluster Analysis; Genetic Predisposition to Disease/genetics; Humans; Neoplasm Proteins/genetics; Neoplasms/genetics; Pattern Recognition, Automated/methods; Reproducibility of Results; Sensitivity and Specificity; Software; User-Computer Interface
17.
Stud Health Technol Inform ; 107(Pt 2): 813-7, 2004.
Article in English | MEDLINE | ID: mdl-15360925

ABSTRACT

Cancer diagnosis is a major clinical application area of gene expression microarray technology. We are seeking to develop a system for cancer diagnostic model creation based on microarray data. In order to equip the system with the optimal combination of data modeling methods, we performed a comprehensive evaluation of several major classification algorithms, gene selection methods, and cross-validation designs using 11 datasets spanning 74 diagnostic categories (41 cancer types and 12 normal tissue types). The Multi-Category Support Vector Machine techniques by Crammer and Singer, Weston and Watkins, and one-versus-rest were found to be the best methods, and they outperform other learning algorithms such as K-Nearest Neighbors and Neural Networks, often to a remarkable degree. Gene selection techniques are shown to significantly improve classification performance. These results guided the development of a software system that fully automates cancer diagnostic model construction with quality on par with or better than previously published results derived by expert human analysts.


Subject(s)
Algorithms; Gene Expression Profiling/classification; Neoplasms/diagnosis; Neoplasms/genetics; Oligonucleotide Array Sequence Analysis; Diagnosis, Computer-Assisted; Expert Systems; Factor Analysis, Statistical; Gene Expression; Humans; Pattern Recognition, Automated; Software