1.
Bioinformatics; 34(6): 985-993, 2018 Mar 15.
Article in English | MEDLINE | ID: mdl-29048458

ABSTRACT

Summary: Gene-based supervised machine learning classification models have been widely used to differentiate disease states, predict disease progression and determine effective treatment options. However, many of these classifiers are sensitive to noise and frequently do not replicate in external validation sets. For complex, heterogeneous diseases, these classifiers are further limited by being unable to capture varying combinations of genes that lead to the same phenotype. Pathway-based classification can overcome these challenges by using robust, aggregate features to represent biological mechanisms. In this work, we developed a novel pathway-based approach, PRObabilistic Pathway Score (PROPS), which uses genes to calculate individualized pathway scores for classification. Unlike previous individualized pathway-based classification methods that use gene sets, we incorporate gene interactions using probabilistic graphical models to more accurately represent the underlying biology and achieve better performance. We apply our method to differentiate two similar complex diseases, ulcerative colitis (UC) and Crohn's disease (CD), which are the two main types of inflammatory bowel disease (IBD). Using five IBD datasets, we compare our method against four gene-based and four alternative pathway-based classifiers in distinguishing CD from UC. We demonstrate superior classification performance and provide biological insight into the top pathways separating CD from UC. Availability and Implementation: PROPS is available as an R package, which can be downloaded from http://simtk.org/home/props or from Bioconductor. Contact: rbaltman@stanford.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
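The abstract describes individualized pathway scores computed with probabilistic graphical models but does not give the model details. The sketch below is a minimal illustration under stated assumptions, not the PROPS package API: each pathway is a small directed network whose nodes are linear-Gaussian (each gene regressed on at most one parent, to keep the code short), parameters are fitted on a reference set of samples, and a sample's pathway score is its log-likelihood under that network. The gene names, edges and expression values are hypothetical.

```python
import math
from statistics import mean

MIN_VAR = 1e-6  # floor on variances so a perfect fit cannot produce log(0)


def fit_node(child, parent=None):
    """Least-squares fit of child = a + b * parent + noise; returns (a, b, var).

    Assumption: each gene has at most one parent in the pathway network.
    """
    if parent is None:
        a = mean(child)
        var = mean((y - a) ** 2 for y in child)
        return a, 0.0, max(var, MIN_VAR)
    mx, my = mean(parent), mean(child)
    sxx = sum((x - mx) ** 2 for x in parent)
    sxy = sum((x - mx) * (y - my) for x, y in zip(parent, child))
    b = sxy / sxx
    a = my - b * mx
    var = mean((y - a - b * x) ** 2 for x, y in zip(parent, child))
    return a, b, max(var, MIN_VAR)


def fit_pathway(edges, train):
    """edges maps gene -> parent gene (or None); train maps gene -> expression values."""
    return {
        gene: fit_node(train[gene], train[parent] if parent else None)
        for gene, parent in edges.items()
    }


def pathway_score(edges, params, sample):
    """Log-likelihood of one sample under the fitted linear-Gaussian network."""
    score = 0.0
    for gene, parent in edges.items():
        a, b, var = params[gene]
        mu = a + b * sample[parent] if parent else a
        score += -0.5 * (math.log(2 * math.pi * var) + (sample[gene] - mu) ** 2 / var)
    return score
```

For example, with a hypothetical chain G1 → G2 → G3 fitted on four reference samples, a sample whose genes follow the learned dependencies scores higher than one that breaks them. In a classifier of the kind described, each pathway's score would become one feature per sample.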


Subject(s)
Colitis, Ulcerative/diagnosis , Computational Biology/methods , Crohn Disease/diagnosis , Metabolic Networks and Pathways , Supervised Machine Learning , Adult , Child , Colitis, Ulcerative/genetics , Colitis, Ulcerative/metabolism , Colitis, Ulcerative/therapy , Crohn Disease/genetics , Crohn Disease/metabolism , Crohn Disease/therapy , Diagnosis, Differential , Disease Progression , Gene Regulatory Networks , Humans , Models, Biological , Protein Interaction Maps
2.
Bioinformatics; 28(8): 1114-21, 2012 Apr 15.
Article in English | MEDLINE | ID: mdl-22355083

ABSTRACT

MOTIVATION: The interpretation of high-throughput datasets has remained one of the central challenges of computational biology over the past decade. Furthermore, as the amount of biological knowledge increases, it becomes more and more difficult to integrate this large body of knowledge in a meaningful manner. In this article, we propose a particular solution to both of these challenges. METHODS: We integrate available biological knowledge by constructing a network of molecular interactions of a specific kind: causal interactions. The resulting causal graph can be queried to suggest molecular hypotheses that explain the variations observed in a high-throughput gene expression experiment. We show that a simple scoring function can discriminate between a large number of competing molecular hypotheses about the upstream cause of the changes observed in a gene expression profile. We then develop an analytical method for computing the statistical significance of each score. This analytical method also helps assess the effects of random or adversarial noise on the predictive power of our model. RESULTS: Our results show that the causal graph we constructed from known biological literature is extremely robust to random noise and to missing or spurious information. We demonstrate the power of our causal reasoning model on two specific examples, one from a cancer dataset and the other from a cardiac hypertrophy experiment. We conclude that causal reasoning models provide a valuable addition to the biologist's toolkit for the interpretation of gene expression data. AVAILABILITY AND IMPLEMENTATION: R source code for the method is available upon request.
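The abstract states that a simple scoring function ranks competing upstream hypotheses against a gene expression profile but does not spell it out. The sketch below assumes one plausible form: a hypothesis is "regulator R changes in direction d", predictions are propagated along signed causal edges (the sign of the first shortest path wins), and the score is the number of correct minus incorrect predictions among genes with an observed change. The graph, gene names and depth limit are hypothetical, and the analytical significance computation described in the abstract is not shown.

```python
from collections import deque


def predict_directions(graph, regulator, direction, max_depth=3):
    """Propagate a hypothesised change (+1 up, -1 down) along signed edges.

    graph maps node -> list of (target, sign). Each reachable gene is
    assigned the sign of the first shortest path found by breadth-first search.
    """
    predicted = {}
    seen = {regulator}
    queue = deque([(regulator, direction, 0)])
    while queue:
        node, sign, depth = queue.popleft()
        if depth == max_depth:
            continue
        for target, edge_sign in graph.get(node, []):
            if target in seen:
                continue
            seen.add(target)
            predicted[target] = sign * edge_sign
            queue.append((target, sign * edge_sign, depth + 1))
    return predicted


def score_hypothesis(graph, regulator, direction, observed, max_depth=3):
    """Number of correct minus incorrect predictions for genes in observed."""
    predicted = predict_directions(graph, regulator, direction, max_depth)
    correct = incorrect = 0
    for gene, sign in predicted.items():
        if gene in observed:
            if observed[gene] == sign:
                correct += 1
            else:
                incorrect += 1
    return correct - incorrect
```

Scoring every (regulator, direction) pair and sorting by score gives a ranked list of candidate upstream causes; a hypothesis and its opposite direction receive scores of opposite sign.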


Subject(s)
Breast Neoplasms/genetics , Cardiomegaly/genetics , Computational Biology/methods , Gene Expression Profiling , Algorithms , Humans , Models, Biological
3.
Sci Rep; 8(1): 1237, 2018 Jan 19.
Article in English | MEDLINE | ID: mdl-29352257

ABSTRACT

Discovery of robust diagnostic or prognostic biomarkers is key to optimizing therapeutic benefit for select patient cohorts, an idea commonly referred to as precision medicine. Most discovery studies that derive such markers from high-dimensional transcriptomics datasets are weakly powered, with sample sizes in the tens of patients. Therefore, highly regularized statistical approaches are essential for making generalizable predictions. At the same time, prior knowledge-driven approaches have been successfully applied to the manual interpretation of high-dimensional transcriptomics datasets. In this work, we assess the impact of combining two orthogonal approaches for the discovery of biomarker signatures, namely (1) well-known lasso-based regression approaches and their more recent derivative, the group lasso, and (2) the discovery of significant upstream regulators in literature-derived biological networks. Our method integrates both approaches in a weighted group-lasso model and differentially weights gene sets based on inferred active regulatory mechanisms. Using nested cross-validation as well as independent clinical datasets, we demonstrate that our approach leads to increased accuracy and generalizable results. We implement our approach in a computationally efficient, user-friendly R package called creNET. The package can be downloaded at https://github.com/kouroshz/creNet and is accompanied by a parsed version of the STRING DB database.
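A weighted group lasso penalizes each gene set g by λ·w_g·‖β_g‖₂, so groups with a smaller weight are shrunk less and are more likely to stay in the model. The sketch below is a minimal proximal-gradient solver for that penalty, assuming a squared-error loss (the abstract does not state the loss creNET uses) and hypothetical group weights standing in for the regulator-derived weights; it is not the creNET API. The key step is block soft-thresholding, which zeroes a whole group at once.

```python
import math


def group_soft_threshold(beta, groups, weights, thresh):
    """Proximal operator of thresh * sum_g weights[g] * ||beta_g||_2."""
    out = list(beta)
    for g, idx in groups.items():
        norm = math.sqrt(sum(beta[i] ** 2 for i in idx))
        # Shrink the whole group toward zero; drop it entirely if its norm is small.
        shrink = max(0.0, 1.0 - thresh * weights[g] / norm) if norm > 0 else 0.0
        for i in idx:
            out[i] = shrink * beta[i]
    return out


def weighted_group_lasso(X, y, groups, weights, lam, step, iters=500):
    """Proximal gradient descent on
    (1 / 2n) * ||y - X beta||^2 + lam * sum_g weights[g] * ||beta_g||_2.

    step should not exceed 1 / (largest eigenvalue of X^T X / n).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        resid = [sum(X[r][j] * beta[j] for j in range(p)) - y[r] for r in range(n)]
        grad = [sum(X[r][j] * resid[r] for r in range(n)) / n for j in range(p)]
        beta = [b - step * g for b, g in zip(beta, grad)]
        beta = group_soft_threshold(beta, groups, weights, step * lam)
    return beta
```

With a toy design of four orthogonal features in two hypothetical groups, where the second group carries a weak signal, giving that group a large weight (2.0) removes it exactly, while a small weight (0.5) keeps a shrunken coefficient.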


Subject(s)
Biomarkers/analysis , Gene Regulatory Networks , Phenotype , Software , Humans
4.
Inflamm Bowel Dis; 24(3): 471-481, 2018 Feb 15.
Article in English | MEDLINE | ID: mdl-29462399

ABSTRACT

Background: Monogenic diseases have been shown to contribute to complex disease risk and may hold new insights into the underlying biological mechanisms of inflammatory bowel disease (IBD). Methods: We analyzed associations between Mendelian diseases and IBD using Optum's de-identified electronic health records database, which covers over 55 million patients. Using the significant Mendelian diseases, we performed pathway enrichment analysis and constructed a model using gene expression datasets to differentiate Crohn's disease (CD), ulcerative colitis (UC), and healthy samples. Results: We found that 50 Mendelian diseases were significantly associated with IBD, 40 of which were significantly associated with both CD and UC. Our results for CD replicated those of previous studies. The enriched pathways consisted mainly of immune and metabolic processes, with a focus on tolerance and oxidative stress. Our 3-way classifier for UC, CD, and healthy samples yielded an accuracy of 72%. Conclusions: Mendelian diseases that are significantly associated with IBD may reveal novel insights into the genetic architecture of IBD.
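The abstract mentions pathway enrichment analysis on the genes behind the significant Mendelian diseases without naming the test. A common choice is a one-sided hypergeometric test (equivalent to a one-sided Fisher exact test), sketched below as an assumption, together with a Bonferroni correction as one simple way to adjust across many pathways. All gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_selected, n_overlap):
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    X is the overlap between n_selected genes drawn without replacement from
    n_background genes, of which n_pathway belong to the pathway.
    """
    upper = min(n_pathway, n_selected)
    hits = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return hits / comb(n_background, n_selected)


def bonferroni(p_values):
    """Bonferroni-adjust a dict of pathway -> p-value, capped at 1."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}
```

For instance, with a hypothetical background of 20,000 genes, a 100-gene pathway and 200 disease-associated genes, about one overlapping gene is expected by chance, so an overlap of 10 yields a very small p-value.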


Subject(s)
Genetic Diseases, Inborn/complications , Genetic Predisposition to Disease , Inflammatory Bowel Diseases/genetics , Data Mining , Female , Gene Expression , Humans , Inflammatory Bowel Diseases/complications , Male , Risk Factors
5.
BMC Bioinformatics; 5: 195, 2004 Dec 10.
Article in English | MEDLINE | ID: mdl-15588317

ABSTRACT

BACKGROUND: Gecko (Gene Expression: Computation and Knowledge Organization) is a complete, high-capacity, centralized gene expression analysis system, developed in response to the needs of a distributed user community. RESULTS: Based on a client-server architecture, with a centralized repository of typically many tens of thousands of Affymetrix scans, Gecko includes automatic processing pipelines for uploading data from remote sites, a database, a computational engine implementing approximately 50 different analysis tools, and a client application. Among the available analysis tools are clustering methods, principal component analysis, supervised classification including feature selection and cross-validation, multi-factorial ANOVA, statistical contrast calculations, and various post-processing tools for extracting data at given error rates or significance levels. Because of its open architecture, Gecko also allows new algorithms to be integrated. The Gecko framework is very general: non-Affymetrix and non-gene-expression data can be analyzed as well. A unique feature of the Gecko architecture is the concept of the Analysis Tree (actually a directed acyclic graph), in which all successive results of ongoing analyses are saved. This approach has proven invaluable in allowing a large (approximately 100 users) and distributed community to share results and to return repeatedly, over a span of years, to older and potentially very complex analyses of gene expression data. CONCLUSIONS: The Gecko system is being made publicly available as free software at http://sourceforge.net/projects/geckoe. In whole or in part, the Gecko framework should prove useful to users and system developers with a broad range of analysis needs.
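The Analysis Tree is described as a directed acyclic graph in which every successive result is saved, so a result can depend on several earlier ones. Gecko's actual data model is not described in the abstract; the sketch below is a hypothetical, in-memory illustration of the idea. Acyclicity is enforced by requiring every parent to exist before a child is added, and `lineage` recovers the full chain of results that a given analysis was derived from. The result names and stored values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisNode:
    name: str
    result: object
    parents: list = field(default_factory=list)


class AnalysisDAG:
    """Stores every analysis result together with the results it was derived from."""

    def __init__(self):
        self.nodes = {}

    def add(self, name, result, parents=()):
        if name in self.nodes:
            raise ValueError(f"result {name!r} already exists")
        missing = [p for p in parents if p not in self.nodes]
        if missing:
            raise KeyError(f"unknown parent(s): {missing}")
        # Parents must already exist, so a new node can never close a cycle.
        self.nodes[name] = AnalysisNode(name, result, list(parents))
        return self.nodes[name]

    def lineage(self, name):
        """Return name and all its ancestors, each once, parents before children."""
        order, seen = [], set()

        def visit(node):
            if node in seen:
                return
            seen.add(node)
            for parent in self.nodes[node].parents:
                visit(parent)
            order.append(node)

        visit(name)
        return order
```

Because results are only ever appended, an analysis revisited years later can still be traced back through every intermediate step to the raw scans it came from.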


Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Software , Carcinoma/classification , Carcinoma/genetics , Carcinoma/pathology , Cell Line, Tumor , Cluster Analysis , Computational Biology/statistics & numerical data , Gene Expression Profiling/classification , Gene Expression Profiling/statistics & numerical data , Gene Expression Regulation, Neoplastic/genetics , Genes, Neoplasm/genetics , Humans , Kidney Neoplasms/classification , Kidney Neoplasms/genetics , Kidney Neoplasms/pathology , Oligonucleotide Array Sequence Analysis/classification , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Software Design , User-Computer Interface