Results 1 - 20 of 39
1.
BMC Psychiatry ; 15: 30, 2015 Mar 16.
Article in English | MEDLINE | ID: mdl-25886446

ABSTRACT

BACKGROUND: Predicting Posttraumatic Stress Disorder (PTSD) is a pre-requisite for targeted prevention. Current research has identified group-level risk-indicators, many of which (e.g., head trauma, receiving opiates) concern but a subset of survivors. Identifying interchangeable sets of risk indicators may increase the efficiency of early risk assessment. The study goal is to use supervised machine learning (ML) to uncover interchangeable, maximally predictive combinations of early risk indicators. METHODS: Data variables (features) reflecting event characteristics, emergency department (ED) records and early symptoms were collected in 957 trauma survivors within ten days of ED admission, and used to predict PTSD symptom trajectories during the following fifteen months. A Target Information Equivalence Algorithm (TIE*) identified all minimal sets of features (Markov Boundaries; MBs) that maximized the prediction of a non-remitting PTSD symptom trajectory when integrated in a support vector machine (SVM). The predictive accuracy of each set of predictors was evaluated in a repeated 10-fold cross-validation and expressed as average area under the Receiver Operating Characteristics curve (AUC) for all validation trials. RESULTS: The average number of MBs per cross validation was 800. MBs' mean AUC was 0.75 (95% range: 0.67-0.80). The average number of features per MB was 18 (range: 12-32) with 13 features present in over 75% of the sets. CONCLUSIONS: Our findings support the hypothesized existence of multiple and interchangeable sets of risk indicators that equally and exhaustively predict non-remitting PTSD. ML's ability to increase prediction versatility is a promising step towards developing algorithmic, knowledge-based, personalized prediction of post-traumatic psychopathology.
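The evaluation described in this abstract, an AUC averaged over the validation folds of a repeated 10-fold cross-validation, can be sketched in plain Python. This is only an illustration under stated assumptions: the TIE* algorithm and the SVM are not reproduced, `fit_predict` is a hypothetical callable standing in for whatever learner is trained on each fold, and the function names and the default of three repeats are illustrative rather than taken from the study.

```python
import random
from statistics import mean


def auc(labels, scores):
    """Area under the ROC curve: the probability that a randomly chosen
    positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_kfold(n_samples, k=10, repeats=3, seed=0):
    """Yield (train_idx, val_idx) pairs for `repeats` shuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for fold in range(k):
            val = order[fold::k]  # every k-th shuffled index forms one fold
            held_out = set(val)
            train = [i for i in order if i not in held_out]
            yield train, val


def mean_cv_auc(labels, fit_predict, k=10, repeats=3, seed=0):
    """Average validation AUC over all folds of a repeated k-fold design.

    `fit_predict(train_idx, val_idx)` is a hypothetical callable: it trains
    on the training indices and returns one score per validation index.
    """
    fold_aucs = []
    for train, val in repeated_kfold(len(labels), k, repeats, seed):
        val_labels = [labels[i] for i in val]
        if len(set(val_labels)) < 2:
            continue  # AUC is undefined when a fold holds a single class
        fold_aucs.append(auc(val_labels, fit_predict(train, val)))
    return mean(fold_aucs)
```

In practice `fit_predict` would wrap model training (an SVM in the study); the splitting here is not stratified, so folds containing a single class are skipped rather than scored.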


Subject(s)
Psychological Adaptation/physiology , Artificial Intelligence , Post-Traumatic Stress Disorders , Wounds and Injuries , Adult , Algorithms , Early Diagnosis , Female , Humans , Male , Middle Aged , Prognosis , ROC Curve , Risk Assessment , Risk Factors , Post-Traumatic Stress Disorders/diagnosis , Post-Traumatic Stress Disorders/etiology , Post-Traumatic Stress Disorders/physiopathology , Post-Traumatic Stress Disorders/prevention & control , Translational Medical Research , Wounds and Injuries/complications , Wounds and Injuries/psychology
2.
BMC Genomics ; 13 Suppl 8: S22, 2012.
Article in English | MEDLINE | ID: mdl-23282373

ABSTRACT

BACKGROUND: The discovery of molecular pathways is a challenging problem and its solution relies on the identification of causal molecular interactions in genomics data. Causal molecular interactions can be discovered using randomized experiments; however such experiments are often costly, infeasible, or unethical. Fortunately, algorithms that infer causal interactions from observational data have been in development for decades, predominantly in the quantitative sciences, and many of them have recently been applied to genomics data. While these algorithms can infer unoriented causal interactions between involved molecular variables (i.e., without specifying which one is the cause and which one is the effect), causally orienting all inferred molecular interactions was assumed to be an unsolvable problem until recently. In this work, we use transcription factor-target gene regulatory interactions in three different organisms to evaluate a new family of methods that, given observational data for just two causally related variables, can determine which one is the cause and which one is the effect. RESULTS: We have found that a particular family of causal orientation methods (IGCI Gaussian) is often able to accurately infer directionality of causal interactions, and that these methods usually outperform other causal orientation techniques. We also introduced a novel ensemble technique for causal orientation that combines decisions of individual causal orientation methods. The ensemble method was found to be more accurate than any best individual causal orientation method in the tested data. CONCLUSIONS: This work represents a first step towards establishing context for practical use of causal orientation methods in the genomics domain. We have found that some causal orientation methodologies yield accurate predictions of causal orientation in genomics data, and we have improved on this capability with a novel ensemble method. 
Our results suggest that these methods have the potential to facilitate reconstruction of molecular pathways by minimizing the number of required randomized experiments to find causal directionality and by avoiding experiments that are infeasible and/or unethical.
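The idea of an ensemble that combines the decisions of individual causal orientation methods can be illustrated with a simple majority vote. This is a minimal sketch under an explicit assumption: the abstract does not say how the decisions are combined, so the voting rule, the function name `ensemble_orientation`, and the `"X->Y"` / `"Y->X"` labels are all hypothetical.

```python
from collections import Counter


def ensemble_orientation(decisions, tie_value="undecided"):
    """Combine per-method orientation calls for one variable pair.

    `decisions` holds one label per individual method, e.g. "X->Y" or
    "Y->X". Majority voting is an assumed combination rule, used here
    purely for illustration.
    """
    counts = Counter(decisions)
    if not counts:
        return tie_value
    ranked = counts.most_common()  # sorted by vote count, highest first
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_value  # no strict majority between the top two labels
    return ranked[0][0]
```

For example, `ensemble_orientation(["X->Y", "X->Y", "Y->X"])` returns `"X->Y"`, while an even split returns the tie value.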


Subject(s)
Algorithms , Genomics , Area Under Curve , Factual Databases , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Gene Regulatory Networks , Humans , Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/genetics , Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/metabolism , ROC Curve , Notch1 Receptor/genetics , Notch1 Receptor/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Transcription Factor RelA/genetics , Transcription Factor RelA/metabolism
3.
Genomics ; 97(1): 7-18, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20951196

ABSTRACT

De-novo reverse-engineering of genome-scale regulatory networks is an increasingly important objective for biological and translational research. While many methods have been recently developed for this task, their absolute and relative performance remains poorly understood. The present study conducts a rigorous performance assessment of 32 computational methods/variants for de-novo reverse-engineering of genome-scale regulatory networks by benchmarking these methods in 15 high-quality datasets and gold-standards of experimentally verified mechanistic knowledge. The results of this study show that some methods need to be substantially improved upon, while others should be used routinely. Our results also demonstrate that several univariate methods provide a "gatekeeper" performance threshold that should be applied when method developers assess the performance of their novel multivariate algorithms. Finally, the results of this study can be used to show practical utility and to establish guidelines for everyday use of reverse-engineering algorithms, aiming towards creation of automated data-analysis protocols and software systems.


Subject(s)
Computational Biology/methods , Gene Regulatory Networks , Genome , Algorithms , Computational Biology/standards , Computational Biology/statistics & numerical data , Nucleic Acid Databases , Methods , Multivariate Analysis
4.
PLoS Comput Biol ; 6(5): e1000790, 2010 May 20.
Article in English | MEDLINE | ID: mdl-20502670

ABSTRACT

Molecular signatures are computational or mathematical models created to diagnose disease and other phenotypes and to predict clinical outcomes and response to treatment. It is widely recognized that molecular signatures constitute one of the most important translational and basic science developments enabled by recent high-throughput molecular assays. A perplexing phenomenon that characterizes high-throughput data analysis is the ubiquitous multiplicity of molecular signatures. Multiplicity is a special form of data analysis instability in which different analysis methods used on the same data, or different samples from the same population lead to different but apparently maximally predictive signatures. This phenomenon has far-reaching implications for biological discovery and development of next generation patient diagnostics and personalized treatments. Currently the causes and interpretation of signature multiplicity are unknown, and several, often contradictory, conjectures have been made to explain it. We present a formal characterization of signature multiplicity and a new efficient algorithm that offers theoretical guarantees for extracting the set of maximally predictive and non-redundant signatures independent of distribution. The new algorithm identifies exactly the set of optimal signatures in controlled experiments and yields signatures with significantly better predictivity and reproducibility than previous algorithms in human microarray gene expression datasets. Our results shed light on the causes of signature multiplicity, provide computational tools for studying it empirically and introduce a framework for in silico bioequivalence of this important new class of diagnostic and personalized medicine modalities.


Subject(s)
Algorithms , Computational Biology/methods , Genetic Databases , Area Under Curve , Computer Simulation , Gene Expression Profiling , Humans , Markov Chains , Reproducibility of Results
5.
BMC Bioinformatics ; 9: 319, 2008 Jul 22.
Article in English | MEDLINE | ID: mdl-18647401

ABSTRACT

BACKGROUND: Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of gene expression microarray technology with several molecular signatures on their way toward clinical deployment. Use of the most accurate classification algorithms available for microarray gene expression data is a critical ingredient in order to develop the best possible molecular signatures for patient care. As suggested by a large body of literature to date, support vector machines can be considered "best of class" algorithms for classification of such data. Recent work, however, suggests that random forest classifiers may outperform support vector machines in this domain. RESULTS: In the present paper we identify methodological biases of prior work comparing random forests and support vector machines and conduct a new rigorous evaluation of the two algorithms that corrects these limitations. Our experiments use 22 diagnostic and prognostic datasets and show that support vector machines outperform random forests, often by a large margin. Our data also underlines the importance of sound research design in benchmarking and comparison of bioinformatics algorithms. CONCLUSION: We found that both on average and in the majority of microarray datasets, random forests are outperformed by support vector machines both in the settings when no gene selection is performed and when several popular gene selection methods are used.


Subject(s)
Artificial Intelligence , Tumor Biomarkers/analysis , Computational Biology/methods , Decision Trees , Oligonucleotide Array Sequence Analysis/methods , Algorithms , Genetic Databases , Gene Expression Profiling/methods , Humans , Neoplasms/genetics , Automated Pattern Recognition/methods , Random Allocation , Validation Studies as Topic
6.
J Am Med Inform Assoc ; 13(4): 446-55, 2006.
Article in English | MEDLINE | ID: mdl-16622165

ABSTRACT

OBJECTIVE: The present study explores the discriminatory performance of existing and novel gold-standard-specific machine learning (GSS-ML) focused filter models (i.e., models built specifically for a retrieval task and a gold standard against which they are evaluated) and compares their performance to citation count and impact factors, and non-specific machine learning (NS-ML) models (i.e., models built for a different task and/or different gold standard). DESIGN: Three gold standard corpora were constructed using the SSOAB bibliography, the ACPJ-cited treatment articles, and the ACPJ-cited etiology articles. Citation counts and impact factors were obtained for each article. Support vector machine models were used to classify the articles using combinations of content, impact factors, and citation counts as predictors. MEASUREMENTS: Discriminatory performance was estimated using the area under the receiver operating characteristic curve and n-fold cross-validation. RESULTS: For all three gold standards and tasks, GSS-ML filters outperformed citation count, impact factors, and NS-ML filters. Combinations of content with impact factor or citation count produced no or negligible improvements to the GSS machine learning filters. CONCLUSIONS: These experiments provide evidence that when building information retrieval filters focused on a retrieval task and corresponding gold standard, the filter models have to be built specifically for this task and gold standard. Under those conditions, machine learning filters outperform standard citation metrics. Furthermore, citation counts and impact factors add marginal value to discriminatory performance. Previous research that claimed better performance of citation metrics than machine learning in one of the corpora examined here is attributed to using machine learning filters built for a different gold standard and task.


Subject(s)
Artificial Intelligence , Bibliometrics , Information Storage and Retrieval/methods , MEDLINE , Area Under Curve , ROC Curve , Regression Analysis
7.
Sci Rep ; 6: 22558, 2016 Mar 04.
Article in English | MEDLINE | ID: mdl-26939894

ABSTRACT

Reverse-engineering of causal pathways that implicate diseases and vital cellular functions is a fundamental problem in biomedicine. Discovery of the local causal pathway of a target variable (that consists of its direct causes and direct effects) is essential for effective intervention and can facilitate accurate diagnosis and prognosis. Recent research has provided several active learning methods that can leverage passively observed high-throughput data to draft causal pathways and then refine the inferred relations with a limited number of experiments. The current study provides a comprehensive evaluation of the performance of active learning methods for local causal pathway discovery in real biological data. Specifically, 54 active learning methods/variants from 3 families of algorithms were applied for local causal pathways reconstruction of gene regulation for 5 transcription factors in S. cerevisiae. Four aspects of the methods' performance were assessed, including adjacency discovery quality, edge orientation accuracy, complete pathway discovery quality, and experimental cost. The results of this study show that some methods provide significant performance benefits over others and therefore should be routinely used for local causal pathway discovery tasks. This study also demonstrates the feasibility of local causal pathway reconstruction in real biological systems with significant quality and low experimental cost.


Subject(s)
Gene Expression Regulation , Biological Models , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/physiology , Transcription Factors/metabolism , Algorithms , Biological Ontologies , Computer Simulation , Feasibility Studies , Gene Expression Profiling , Gene Regulatory Networks , High-Throughput Screening Assays , Humans , Problem-Based Learning , Research Design , Saccharomyces cerevisiae Proteins/genetics , Transcription Factors/genetics
8.
PLoS One ; 11(3): e0151174, 2016.
Article in English | MEDLINE | ID: mdl-27028297

ABSTRACT

Conventional research methodologies and data analytic approaches in psychiatric research are unable to reliably infer causal relations without experimental designs, or to make inferences about the functional properties of the complex systems in which psychiatric disorders are embedded. This article describes a series of studies to validate a novel hybrid computational approach, the Complex Systems-Causal Network (CS-CN) method, designed to integrate causal discovery within a complex systems framework for psychiatric research. The CS-CN method was first applied to an existing dataset on psychopathology in 163 children hospitalized with injuries (validation study). Next, it was applied to a much larger dataset of traumatized children (replication study). Finally, the CS-CN method was applied in a controlled experiment using a 'gold standard' dataset for causal discovery and compared with other methods for accurately detecting causal variables (resimulation controlled experiment). The CS-CN method successfully detected a causal network of 111 variables and 167 bivariate relations in the initial validation study. This causal network had well-defined adaptive properties, and a set of variables was found that disproportionately contributed to these properties. Modeling the removal of these variables resulted in significant loss of adaptive properties. The CS-CN method was successfully applied in the replication study and performed better than traditional statistical methods, and similarly to state-of-the-art causal discovery algorithms in the causal detection experiment. The CS-CN method was validated, replicated, and yielded both novel and previously validated findings related to risk factors and potential treatments of psychiatric disorders. The novel approach yields both fine-grained (micro) and high-level (macro) insights and thus represents a promising approach for complex systems-oriented research in psychiatry.


Subject(s)
Psychiatry/methods , Adolescent , Child , Cluster Analysis , Humans , Psychological Models , Systems Analysis , Wounds and Injuries/psychology
9.
J Am Med Inform Assoc ; 12(2): 207-16, 2005.
Article in English | MEDLINE | ID: mdl-15561789

ABSTRACT

OBJECTIVE: Finding the best scientific evidence that applies to a patient problem is becoming exceedingly difficult due to the exponential growth of medical publications. The objective of this study was to apply machine learning techniques to automatically identify high-quality, content-specific articles for one time period in internal medicine and compare their performance with previous Boolean-based PubMed clinical query filters of Haynes et al. DESIGN: The selection criteria of the ACP Journal Club for articles in internal medicine were the basis for identifying high-quality articles in the areas of etiology, prognosis, diagnosis, and treatment. Naive Bayes, a specialized AdaBoost algorithm, and linear and polynomial support vector machines were applied to identify these articles. MEASUREMENTS: The machine learning models were compared in each category with each other and with the clinical query filters using area under the receiver operating characteristic curves, 11-point average recall precision, and a sensitivity/specificity match method. RESULTS: In most categories, the data-induced models have sensitivity, specificity, and precision better than or comparable to those of the clinical query filters. The polynomial support vector machine models perform the best among all learning methods in ranking the articles as evaluated by area under the receiver operating characteristic curve and 11-point average recall precision. CONCLUSION: This research shows that, using machine learning methods, it is possible to automatically build models for retrieving high-quality, content-specific articles using inclusion or citation by the ACP Journal Club as a gold standard in a given time period in internal medicine that perform better than the 1994 PubMed clinical query filters.
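The 11-point average recall precision used here to compare article rankings can be illustrated with the standard interpolated form of the metric. This is a sketch under an assumption: the abstract does not state which variant was computed, so the code below uses the common definition (at each recall level 0.0, 0.1, ..., 1.0, take the highest precision reached at that recall or beyond, then average the 11 values). The function name is hypothetical.

```python
def eleven_point_average_precision(ranked_relevance):
    """Interpolated 11-point average precision for one ranked list.

    `ranked_relevance` is a list of 0/1 flags ordered from the
    best-ranked article to the worst (1 = relevant, i.e. high quality).
    """
    total_relevant = sum(ranked_relevance)
    if total_relevant == 0:
        raise ValueError("the ranking contains no relevant items")

    # (hits so far, rank) after each retrieved article
    points = []
    hits = 0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        hits += relevant
        points.append((hits, rank))

    interpolated = []
    for step in range(11):  # recall levels 0.0, 0.1, ..., 1.0
        # recall >= step/10, compared in integers to avoid float rounding
        reachable = [h / r for h, r in points if 10 * h >= step * total_relevant]
        interpolated.append(max(reachable, default=0.0))
    return sum(interpolated) / 11
```

A ranking that places every relevant article first scores 1.0; pushing relevant articles further down the list lowers the precision available at the higher recall levels and therefore the average.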


Subject(s)
Artificial Intelligence , Information Storage and Retrieval/methods , Internal Medicine , Algorithms , Area Under Curve , MEDLINE , Medical Subject Headings , ROC Curve , Selection Bias
10.
Int J Med Inform ; 74(7-8): 491-503, 2005 Aug.
Article in English | MEDLINE | ID: mdl-15967710

ABSTRACT

The success of treatment of patients with cancer depends on establishing an accurate diagnosis. To this end, we have built a system called GEMS (gene expression model selector) for the automated development and evaluation of high-quality cancer diagnostic models and biomarker discovery from microarray gene expression data. In order to determine and equip the system with the best performing diagnostic methodologies in this domain, we first conducted a comprehensive evaluation of classification algorithms using 11 cancer microarray datasets. In this paper we present a preliminary evaluation of the system with five new datasets. The performance of the models produced automatically by GEMS is comparable to or better than the results obtained by human analysts. Additionally, we performed a cross-dataset evaluation of the system. This involved using a dataset to build a diagnostic model and to estimate its future performance, then applying this model and evaluating its performance on a different dataset. We found that models produced by GEMS indeed perform well in independent samples and, furthermore, that the cross-validation performance estimates output by the system closely approximate the error obtained by the independent validation. GEMS is freely available for download for non-commercial use from http://www.gems-system.org.


Subject(s)
Biomarkers/analysis , Gene Expression , Neoplasms/diagnosis , Neoplasms/genetics , Oligonucleotide Array Sequence Analysis/methods , Algorithms , Computer-Assisted Diagnosis , Humans , Genetic Models , Software , United States
11.
AMIA Annu Symp Proc ; 2015: 2043-52, 2015.
Article in English | MEDLINE | ID: mdl-26958304

ABSTRACT

Brain science is a frontier research area with great promise for understanding, preventing, and treating multiple diseases affecting millions of patients. Its key task of reconstructing neuronal brain connectivity poses unique Big Data Analysis challenges distinct from those in clinical or "-omics" domains. Our goal is to understand the strengths and limitations of reconstruction algorithms, measure performance and its determinants, and ultimately enhance performance and applicability. We devised a set of experiments in a well-controlled setting using an established gold-standard based on calcium fluorescence time series recordings of thousands of neurons sampled from a previously validated neuronal model of complex time-varying causal neuronal connections. Following empirical testing of several state-of-the-art reconstruction algorithms, and using the best-performing algorithms, we constructed features of a classifier and predicted the presence or absence of connections using meta-learning. This approach combines information-theoretic, feature construction, and pattern recognition meta-learning methods to considerably improve the Area under ROC curve performance. Our data are very promising toward the feasibility of reliably reconstructing complex neuronal connectivity.


Subject(s)
Algorithms , Brain/physiology , Neurons , Humans , Learning , Statistics as Topic
12.
J Affect Disord ; 184: 170-5, 2015 Sep 15.
Article in English | MEDLINE | ID: mdl-26093830

ABSTRACT

BACKGROUND: Pre-deployment identification of soldiers at risk for long-term posttraumatic stress psychopathology after homecoming is important to guide decisions about deployment. Early post-deployment identification can direct early interventions to those in need and thereby prevent the development of chronic psychopathology. Both hold significant public health benefits given the large numbers of deployed soldiers, but have so far not been achieved. Here, we aim to assess the potential for pre- and early post-deployment prediction of resilience or posttraumatic stress development in soldiers by application of machine learning (ML) methods. METHODS: ML feature selection and prediction algorithms were applied to a prospective cohort of 561 Danish soldiers deployed to Afghanistan in 2009 to identify unique risk indicators and forecast long-term posttraumatic stress responses. RESULTS: Robust pre- and early post-deployment risk indicators were identified, and included individual PTSD symptoms as well as total level of PTSD symptoms, previous trauma and treatment, negative emotions, and thought suppression. The predictive performance of these risk indicators combined was assessed by cross-validation. Together, these indicators forecasted long-term posttraumatic stress responses with high accuracy (pre-deployment: AUC = 0.84 (95% CI = 0.81-0.87), post-deployment: AUC = 0.88 (95% CI = 0.85-0.91)). LIMITATIONS: This study utilized a previously collected data set and was therefore not designed to exhaust the potential of ML methods. Further, the study relied solely on self-reported measures. CONCLUSIONS: Pre-deployment and early post-deployment identification of risk for long-term posttraumatic psychopathology are feasible and could greatly reduce the public health costs of war.


Subject(s)
Machine Learning , Military Personnel/psychology , Post-Traumatic Stress Disorders/psychology , Adult , Afghan Campaign 2001- , Algorithms , Cohort Studies , Denmark , Emotions , Female , Humans , Longitudinal Studies , Male , Predictive Value of Tests , Prospective Studies , Psychological Resilience , Risk Assessment , Support Vector Machine
13.
Arthritis Rheumatol ; 67(11): 2905-15, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26195278

ABSTRACT

OBJECTIVE: Inflammatory mediators, such as prostaglandin E2 (PGE2) and interleukin-1β (IL-1β), are produced by osteoarthritic (OA) joint tissue, where they may contribute to disease pathogenesis. We undertook the present study to examine whether inflammation, evidenced in plasma and peripheral blood leukocytes (PBLs), reflects the presence, progression, or specific symptoms of symptomatic knee OA. METHODS: Patients with symptomatic knee OA were enrolled in a 24-month prospective study of radiographic progression. Standardized knee radiographs were obtained at baseline and 24 months. At baseline, levels of the plasma lipids PGE2 and 15-hydroxyeicosatetraenoic acid (15-HETE) were measured, and transcriptome analysis of PBLs was performed by microarray and quantitative polymerase chain reaction. RESULTS: Baseline PGE2 synthase (PGES) levels determined by PBL microarray gene expression and plasma PGE2 levels distinguished patients with symptomatic knee OA from non-OA controls (area under the receiver operating characteristic curve [AUC] 0.87 and 0.89, respectively, P < 0.0001). Baseline plasma 15-HETE levels were significantly elevated in patients with symptomatic knee OA versus non-OA controls (P < 0.0195). In the 146 patients who completed the 24-month study, elevated baseline expression of IL-1β, tumor necrosis factor α, and cyclooxygenase 2 (COX-2) messenger RNA in PBLs predicted higher risk of radiographic progression as evidenced by joint space narrowing (JSN). In a multivariate model, AUC point estimates of models containing COX-2 in combination with demographic traits overlapped the confidence interval of the base model in 2 of the 3 JSN outcome measures (JSN >0.0 mm, JSN >0.2 mm, and JSN >0.5 mm; AUC 0.62-0.67). CONCLUSION: The inflammatory plasma lipid biomarkers PGE2 and 15-HETE identify patients with symptomatic knee OA, and the PBL inflammatory transcriptome identifies a subset of patients with symptomatic knee OA who are at increased risk of radiographic progression. These findings may reflect low-grade inflammation in OA and may be useful as diagnostic and prognostic biomarkers in clinical development of disease-modifying OA drugs.


Subject(s)
Dinoprostone/blood , Hydroxyeicosatetraenoic Acids/blood , Inflammation/pathology , Knee Joint/pathology , Knee Osteoarthritis/pathology , Aged , Biomarkers/blood , Disease Progression , Female , Humans , Inflammation/blood , Knee Joint/diagnostic imaging , Male , Middle Aged , Knee Osteoarthritis/blood , Knee Osteoarthritis/diagnostic imaging , Prognosis , Prospective Studies , Radiography
14.
PLoS One ; 10(2): e0118132, 2015.
Article in English | MEDLINE | ID: mdl-25705890

ABSTRACT

Field of cancerization in the airway epithelium has been increasingly examined to understand early pathogenesis of non-small cell lung cancer. However, the extent of field of cancerization throughout the lung airways is unclear. Here we sought to determine the differential gene and microRNA expressions associated with field of cancerization in the peripheral airway epithelial cells of patients with lung adenocarcinoma. We obtained peripheral airway brushings from smoker controls (n=13) and from the lung contralateral to the tumor in cancer patients (n=17). We performed gene and microRNA expression profiling on these peripheral airway epithelial cells using Affymetrix GeneChip and TaqMan Array. Integrated gene and microRNA analysis was performed to identify significant molecular pathways. We identified 26 mRNAs and 5 miRNAs that were significantly (FDR <0.1) up-regulated and 38 mRNAs and 12 miRNAs that were significantly down-regulated in the cancer patients when compared to smoker controls. Functional analysis identified differential transcriptomic expressions related to tumorigenesis. Integration of miRNA-mRNA data into interaction network analysis showed modulation of the extracellular signal-regulated kinase/mitogen-activated protein kinase (ERK/MAPK) pathway in the contralateral lung field of cancerization. In conclusion, patients with lung adenocarcinoma have tumor related molecules and pathways in histologically normal appearing peripheral airway epithelial cells, a substantial distance from the tumor itself. This finding can potentially provide new biomarkers for early detection of lung cancer and novel therapeutic targets.


Subject(s)
Adenocarcinoma/genetics , Gene Expression Profiling , Lung Neoplasms/genetics , MicroRNAs/genetics , Messenger RNA/genetics , Respiratory System/metabolism , Adenocarcinoma/metabolism , Aged , Neoplastic Cell Transformation/genetics , Neoplastic Cell Transformation/metabolism , Epithelial Cells/metabolism , Extracellular Signal-Regulated MAP Kinases/metabolism , Female , Humans , Lung Neoplasms/metabolism , Male , Middle Aged , Oligonucleotide Array Sequence Analysis , Reverse Transcriptase Polymerase Chain Reaction , Smoking
15.
Stud Health Technol Inform ; 107(Pt 2): 813-7, 2004.
Article in English | MEDLINE | ID: mdl-15360925

ABSTRACT

Cancer diagnosis is a major clinical application area of gene expression microarray technology. We are seeking to develop a system for cancer diagnostic model creation based on microarray data. In order to equip the system with the optimal combination of data modeling methods, we performed a comprehensive evaluation of several major classification algorithms, gene selection methods, and cross-validation designs using 11 datasets spanning 74 diagnostic categories (41 cancer types and 12 normal tissue types). The Multi-Category Support Vector Machine techniques by Crammer and Singer, Weston and Watkins, and one-versus-rest were found to be the best methods, and they outperform other learning algorithms such as K-Nearest Neighbors and Neural Networks, often to a remarkable degree. Gene selection techniques are shown to significantly improve classification performance. These results guided the development of a software system that fully automates cancer diagnostic model construction with quality on par with or better than previously published results derived by expert human analysts.


Subject(s)
Algorithms , Gene Expression Profiling/classification , Neoplasms/diagnosis , Neoplasms/genetics , Oligonucleotide Array Sequence Analysis , Computer-Assisted Diagnosis , Expert Systems , Factor Analysis , Gene Expression , Humans , Automated Pattern Recognition , Software
16.
PLoS One ; 9(9): e106479, 2014.
Article in English | MEDLINE | ID: mdl-25215507

ABSTRACT

De-novo reverse-engineering of genome-scale regulatory networks is a fundamental problem of biological and translational research. One of the major obstacles in developing and evaluating approaches for de-novo gene network reconstruction is the absence of high-quality genome-scale gold-standard networks of direct regulatory interactions. To establish a foundation for assessing the accuracy of de-novo gene network reverse-engineering, we constructed high-quality genome-scale gold-standard networks of direct regulatory interactions in Saccharomyces cerevisiae that incorporate binding and gene knockout data. Then we used 7 performance metrics to assess accuracy of 18 statistical association-based approaches for de-novo network reverse-engineering in 13 different datasets spanning over 4 data types. We found that most reconstructed networks had statistically significant accuracies. We also determined which statistical approaches and datasets/data types lead to networks with better reconstruction accuracies. While we found that de-novo reverse-engineering of the entire network is a challenging problem, it is possible to reconstruct sub-networks around some transcription factors with good accuracy. The latter transcription factors can be identified by assessing their connectivity in the inferred networks. Overall, this study provides the gene network reverse-engineering community with a rigorous assessment of the accuracy of S. cerevisiae gene network reconstruction and variability in performance of various approaches for learning both the entire network and sub-networks around transcription factors.


Subject(s)
Gene Regulatory Networks/genetics , Fungal Genome/genetics , Saccharomyces cerevisiae/genetics , Algorithms , Genetic Databases , Mutation/genetics , Predictive Value of Tests , Protein Binding/genetics , ROC Curve , Reference Standards , Reverse Genetics , Transcription Factors/metabolism
17.
J Psychiatr Res ; 59: 68-76, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25260752

ABSTRACT

There is broad interest in predicting the clinical course of mental disorders from early, multimodal clinical and biological information. Current computational models, however, constitute a significant barrier to realizing this goal. The early identification of trauma survivors at risk of post-traumatic stress disorder (PTSD) is plausible given the disorder's salient onset and the abundance of putative biological and clinical risk indicators. This work evaluates the ability of Machine Learning (ML) forecasting approaches to identify and integrate a panel of unique predictive characteristics and determine their accuracy in forecasting non-remitting PTSD from information collected within 10 days of a traumatic event. Data on event characteristics, emergency department observations, and early symptoms were collected in 957 trauma survivors, followed for fifteen months. An ML feature selection algorithm identified a set of predictors that rendered all others redundant. Support Vector Machines (SVMs) as well as other ML classification algorithms were used to evaluate the forecasting accuracy of i) ML-selected features, ii) all available features without selection, and iii) Acute Stress Disorder (ASD) symptoms alone. SVM also compared the prediction of a) PTSD diagnostic status at 15 months to b) posterior probability of membership in an empirically derived non-remitting PTSD symptom trajectory. Results are expressed as mean Area Under Receiver Operating Characteristics Curve (AUC). The feature selection algorithm identified 16 predictors, present in ≥ 95% of cross-validation trials. The accuracy of predicting non-remitting PTSD from that set (AUC = .77) did not differ from predicting from all available information (AUC = .78). Predicting from ASD symptoms was not better than chance (AUC = .60). The prediction of PTSD status was less accurate than that of membership in a non-remitting trajectory (AUC = .71). ML methods may fill a critical gap in forecasting PTSD. The ability to identify and integrate unique risk indicators makes this a promising approach for developing algorithms that infer probabilistic risk of chronic posttraumatic stress psychopathology based on complex sources of biological, psychological, and social information.
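
As a reading aid for the metric reported in this abstract, the following is a minimal sketch of how a mean AUC over cross-validation folds can be computed. It is not the authors' code: the function names `auc` and `mean_auc` and the toy folds are hypothetical, and the AUC is computed through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half).

```python
from statistics import mean


def auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 class labels (1 = positive case).
    scores: classifier scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Each positive/negative pair contributes 1 if ranked correctly, 0.5 if tied.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    """Average the per-fold AUCs, as in a cross-validation summary."""
    return mean(auc(labels, scores) for labels, scores in folds)


# Hypothetical toy data: one (labels, scores) pair per validation fold.
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]),
    ([1, 1, 0, 0], [0.8, 0.4, 0.4, 0.1]),
]
print(mean_auc(folds))
```

An AUC of 0.5 corresponds to chance-level ranking, which is the baseline the abstract compares the ASD-only prediction against.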


Subject(s)
Artificial Intelligence , Post-Traumatic Stress Disorders/diagnosis , Acute Traumatic Stress Disorders/psychology , Adolescent , Adult , Aged , Algorithms , Female , Follow-Up Studies , Humans , Male , Middle Aged , Predictive Value of Tests , Psychiatric Status Rating Scales , ROC Curve , Reproducibility of Results , Risk Factors , Post-Traumatic Stress Disorders/prevention & control , Young Adult
18.
PLoS One ; 9(2): e89987, 2014.
Article in English | MEDLINE | ID: mdl-24587168

ABSTRACT

The extreme diversity of HIV-1 strains presents a formidable challenge for HIV-1 vaccine design. Although antibodies (Abs) can neutralize HIV-1 and potentially protect against infection, antibodies that target the immunogenic viral surface protein gp120 have widely variable and poorly predictable cross-strain reactivity. Here, we developed a novel computational approach, the Method of Dynamic Epitopes, for identification of neutralization epitopes targeted by anti-HIV-1 monoclonal antibodies (mAbs). Our data demonstrate that this approach, based purely on calculated energetics and 3D structural information, accurately predicts the presence of neutralization epitopes targeted by V3-specific mAbs 2219 and 447-52D in any HIV-1 strain. The method was used to calculate the range of conservation of these specific epitopes across all circulating HIV-1 viruses. Accurately identifying an Ab-targeted neutralization epitope in a virus by computational means enables easy prediction of the breadth of reactivity of specific mAbs across the diversity of thousands of different circulating HIV-1 variants and facilitates rational design and selection of immunogens mimicking specific mAb-targeted epitopes in a multivalent HIV-1 vaccine. The defined epitopes can also be used for the purpose of epitope-specific analyses of breakthrough sequences recorded in vaccine clinical trials. Thus, our study is a prototype for a valuable tool for rational HIV-1 vaccine design.


Subject(s)
Monoclonal Antibodies/immunology , Neutralizing Antibodies/immunology , Computational Biology , Epitopes/immunology , HIV Antibodies/immunology , HIV Envelope Protein gp120/immunology , HIV-1/immunology , Peptide Fragments/immunology , AIDS Vaccines/immunology , Amino Acid Sequence , Conserved Sequence , HIV Envelope Protein gp120/chemistry , HIV-1/genetics , Molecular Models , Molecular Sequence Data , Peptide Fragments/chemistry , Protein Conformation , Species Specificity , Thermodynamics
19.
Sci Rep ; 4: 4411, 2014 Mar 21.
Article in English | MEDLINE | ID: mdl-24651673

ABSTRACT

The spectrum of modern molecular high-throughput assaying includes diverse technologies such as microarray gene expression, miRNA expression, proteomics, DNA methylation, among many others. Now that these technologies have matured and become increasingly accessible, the next frontier is to collect "multi-modal" data for the same set of subjects and conduct integrative, multi-level analyses. While multi-modal data does contain distinct biological information that can be useful for answering complex biology questions, its value for predicting clinical phenotypes and contributions of each type of input remain unknown. We obtained 47 datasets/predictive tasks that in total span over 9 data modalities and executed analytic experiments for predicting various clinical phenotypes and outcomes. First, we analyzed each modality separately using uni-modal approaches based on several state-of-the-art supervised classification and feature selection methods. Then, we applied integrative multi-modal classification techniques. We have found that gene expression is the most predictively informative modality. Other modalities such as protein expression, miRNA expression, and DNA methylation also provide highly predictive results, which are often statistically comparable but not superior to gene expression data. Integrative multi-modal analyses generally do not increase predictive signal compared to gene expression data.


Subject(s)
Computational Biology/statistics & numerical data , Neoplasm DNA/genetics , MicroRNAs/genetics , Neoplasm Proteins/genetics , Neoplasms/diagnosis , Neoplasm RNA/genetics , DNA Methylation , Neoplasm DNA/metabolism , Datasets as Topic , Diagnostic Imaging , Female , Gene Dosage , Gene Expression , Humans , Male , MicroRNAs/metabolism , Neoplasm Proteins/metabolism , Neoplasms/genetics , Neoplasms/mortality , Neoplasms/pathology , Prognosis , Neoplasm RNA/metabolism , Survival Analysis
20.
J Mach Learn Res ; 14: 499-566, 2013 Feb.
Article in English | MEDLINE | ID: mdl-25285052

ABSTRACT

Algorithms for Markov boundary discovery from data constitute an important recent development in machine learning, primarily because they offer a principled solution to the variable/feature selection problem and give insight into local causal structure. Over the last decade many sound algorithms have been proposed to identify a single Markov boundary of the response variable. Even though faithful distributions and, more broadly, distributions that satisfy the intersection property always have a single Markov boundary, other distributions/data sets may have multiple Markov boundaries of the response variable. The latter distributions/data sets are common in practical data-analytic applications, and there are several reasons why it is important to induce multiple Markov boundaries from such data. However, there are currently no sound and efficient algorithms that can accomplish this task. This paper describes a family of algorithms, TIE*, that can discover all Markov boundaries in a distribution. The broad applicability as well as efficiency of the new algorithmic family is demonstrated in an extensive benchmarking study that involved comparison with 26 state-of-the-art algorithms/variants in 15 data sets from a diversity of application domains.
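
To make the notion of multiple Markov boundaries concrete, the sketch below builds a small, entirely hypothetical distribution in which the response T has two of them. It is not the TIE* algorithm, only a brute-force conditional-independence check on a toy example. The variables X1, X2, T, the probability 0.9, and the helper names `marginal` and `independent` are assumptions made for illustration, and all variables are assumed binary. Because X2 is an exact copy of X1, each of {X1} and {X2} renders the other variable independent of T, while the empty set does not.

```python
from itertools import product

# Hypothetical joint distribution over (X1, X2, T), stored as
# {(x1, x2, t): probability}. X2 is an exact copy of X1, and T equals
# X1 with probability 0.9.
joint = {}
for x1, t in product((0, 1), repeat=2):
    p_t_given_x1 = 0.9 if t == x1 else 0.1
    joint[(x1, x1, t)] = 0.5 * p_t_given_x1


def marginal(joint, idx):
    """Sum the joint distribution down to the variables at positions idx."""
    out = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def independent(joint, a, b, given=()):
    """Check whether variable a is independent of variable b given `given`.

    Tests P(a, b, z) * P(z) == P(a, z) * P(b, z) for every assignment of
    the (binary) variables involved.
    """
    given = tuple(given)
    p_abz = marginal(joint, (a, b) + given)
    p_az = marginal(joint, (a,) + given)
    p_bz = marginal(joint, (b,) + given)
    p_z = marginal(joint, given)
    for va, vb, *rest in product((0, 1), repeat=2 + len(given)):
        vz = tuple(rest)
        lhs = p_abz.get((va, vb) + vz, 0.0) * p_z.get(vz, 0.0)
        rhs = p_az.get((va,) + vz, 0.0) * p_bz.get((vb,) + vz, 0.0)
        if abs(lhs - rhs) > 1e-12:
            return False
    return True


X1, X2, T = 0, 1, 2
print(independent(joint, T, X2, given=(X1,)))  # {X1} shields T from X2
print(independent(joint, T, X1, given=(X2,)))  # {X2} shields T from X1
print(independent(joint, T, X1))               # the empty set is not enough
```

In this toy case both single-variable sets are minimal, so both qualify as Markov boundaries of T, which is the situation the abstract describes for distributions that violate the intersection property.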
