Results 1 - 11 of 11
1.
J Psychiatr Res ; 99: 62-68, 2018 04.
Article in English | MEDLINE | ID: mdl-29407288

ABSTRACT

Major depressive disorder (MDD) is one of the most prevalent psychiatric disorders and is commonly treated with antidepressant drugs. However, response to antidepressants varies widely between patients. Machine learning (ML) models may be useful for predicting treatment outcomes. A sample of 186 MDD patients who received duloxetine for up to 8 weeks were categorized as "responders" based on a reduction in MADRS score of more than 50% from baseline, or as "remitters" based on a MADRS score ≤10 at end point. The initial dataset (N = 186) was randomly divided into training and test sets in a nested 5-fold cross-validation, in which 80% was used as a training set and the remaining 20% made up five independent test sets. We performed genome-wide logistic regression to identify potentially significant variants related to duloxetine response/remission and extracted the most promising predictors using LASSO regression. Subsequently, classification-regression trees (CRT) and support vector machines (SVM) were applied to construct models, using ten-fold cross-validation. With regard to response, none of the pairs performed significantly better than chance (accuracy p > .1). For remission, SVM achieved moderate performance, with accuracy = 0.52, sensitivity = 0.58 and specificity = 0.46, while CRT yielded 0.51 for all three measures. The best-performing SVM fold was characterized by accuracy = 0.66 (p = .071), sensitivity = 0.70 and specificity = 0.61. In this study, the potential of using GWAS data to predict duloxetine outcomes was examined using ML models. The models were characterized by a promising sensitivity, but specificity remained moderate at best. The inclusion of additional non-genetic variables to create integrated models may improve prediction.
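A minimal Python sketch of the outcome labelling and evaluation arithmetic described above. It assumes that the ">50% change" criterion means a relative reduction from the baseline MADRS score; all function names are hypothetical, the split function covers only the outer 80/20 partition of the nested cross-validation (no GWAS, LASSO, CRT or SVM step is shown), and none of this is the authors' code.

```python
import random


def classify_outcome(baseline_madrs: float, endpoint_madrs: float) -> dict:
    """Label one patient with the response/remission thresholds from the abstract."""
    if baseline_madrs <= 0:
        raise ValueError("baseline MADRS score must be positive")
    # Assumption: "change >50%" is read as a relative reduction from baseline.
    reduction = (baseline_madrs - endpoint_madrs) / baseline_madrs
    return {
        "responder": reduction > 0.50,      # more than 50% reduction from baseline
        "remitter": endpoint_madrs <= 10,   # MADRS score of 10 or less at end point
    }


def sensitivity_specificity_accuracy(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for boolean labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


def outer_five_fold_splits(n_patients: int, seed: int = 0):
    """Shuffle patient indices and return five (train, test) splits of roughly 80%/20%."""
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::5] for k in range(5)]
    splits = []
    for k in range(5):
        # Training set: every fold except the k-th, which is held out as the test set.
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train, folds[k]))
    return splits
```

For example, `classify_outcome(30, 12)` labels a patient as a responder (a 60% reduction) but not as a remitter (end-point score above 10). Sensitivity and specificity are reported separately because, as the abstract notes, a model can look promising on one while remaining weak on the other.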


Subject(s)
Depressive Disorder, Major/drug therapy , Depressive Disorder, Major/genetics , Duloxetine Hydrochloride/pharmacology , Genome-Wide Association Study , Serotonin and Noradrenaline Reuptake Inhibitors/pharmacology , Support Vector Machine , Adult , Duloxetine Hydrochloride/administration & dosage , Female , Humans , Male , Middle Aged , Polymorphism, Single Nucleotide , Prognosis , Sensitivity and Specificity , Serotonin and Noradrenaline Reuptake Inhibitors/administration & dosage
2.
Nucleic Acids Res ; 46(D1): D360-D370, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29194489

ABSTRACT

MicroRNAs are important regulators of gene expression, acting by binding to the transcripts of the genes they regulate. Even with modern high-throughput technologies, detecting all possible microRNA targets is laborious and expensive. For this reason, several computational microRNA-target prediction tools have been developed, each with its own strengths and limitations. Integrating different tools has been a successful approach to minimizing the shortcomings of individual databases. Here, we present mirDIP v4.1, which provides nearly 152 million human microRNA-target predictions collected across 30 different resources. We also introduce an integrative score, statistically inferred from the obtained predictions and assigned to each unique microRNA-target interaction, to provide a unified measure of confidence. We demonstrate that integrating predictions across multiple resources does not accumulate prediction bias toward particular biological processes or pathways. mirDIP v4.1 is freely available at http://ophid.utoronto.ca/mirDIP/.


Subject(s)
Databases, Genetic , MicroRNAs/metabolism , RNA, Messenger/metabolism , Humans , RNA, Messenger/chemistry
3.
J Integr Bioinform ; 14(2)2017 Jul 05.
Article in English | MEDLINE | ID: mdl-28678736

ABSTRACT

Distinct bacteria are able to cope with highly diverse lifestyles; for instance, they can be free-living or host-associated. These organisms must therefore possess a large and varied genomic arsenal to withstand different environmental conditions. To facilitate the identification of genomic features that might influence bacterial adaptation to a specific niche, we introduce LifeStyle-Specific-Islands (LiSSI). LiSSI combines evolutionary sequence analysis with statistical learning (Random Forest with feature selection, model tuning and robustness analysis). In summary, our strategy aims to identify conserved, consecutive homologous sequences (islands) in genomes and to determine the most discriminant islands for each lifestyle.


Subject(s)
Acclimatization/genetics , Bacteria/genetics , Genome, Bacterial/genetics , Genomic Islands/genetics , Genomics/methods , Conserved Sequence/genetics , Evolution, Molecular , Machine Learning
4.
Metabolites ; 5(2): 344-63, 2015 Jun 10.
Article in English | MEDLINE | ID: mdl-26065494

ABSTRACT

Computational breath analysis is a growing research area that aims to identify volatile organic compounds (VOCs) in human breath to assist next-generation medical diagnostics. Although inexpensive and non-invasive bioanalytical technologies exist for detecting metabolites in exhaled air and in bacterial/fungal vapor, and although the first studies on the power of supervised machine learning methods for profiling the resulting data have been conducted, we lack methods to extract hidden data features that emerge from confounding factors. Here, we present Carotta, a new cluster analysis framework dedicated to uncovering such hidden substructures with sophisticated unsupervised statistical learning methods. We study the power of transitivity clustering and hierarchical clustering to identify groups of VOCs with similar expression behavior over most patient breath samples and/or groups of patients with a similar VOC intensity pattern. This enables the discovery of dependencies between metabolites. On the one hand, this allows us to eliminate the effect of potential confounding factors, such as smoking, that hinder disease classification. On the other hand, we may also identify VOCs associated with disease subtypes or concomitant diseases. Carotta is open-source software with an intuitive graphical user interface that supports data handling, analysis and visualization. The back-end is modular, allowing easy extension with plugins in the future, such as new clustering methods and statistics. It does not require much prior knowledge or technical skill to operate. We demonstrate its power and applicability on an artificial dataset, and we also apply Carotta to a real-world dataset on chronic obstructive pulmonary disease (COPD). While the artificial data serve as a proof of concept, we demonstrate how Carotta finds candidate markers in our real dataset that are associated with confounders rather than with the primary disease (COPD) or with bronchial carcinoma (BC). Carotta is publicly available at http://carotta.compbio.sdu.dk [1].
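A sketch of the kind of unsupervised grouping described above: average-linkage hierarchical clustering of VOCs whose intensity profiles across the same breath samples are correlated. The distance 1 - r (Pearson correlation), the stopping threshold and all names are assumptions made for illustration; this is a generic textbook technique, not Carotta's implementation, and transitivity clustering is not shown.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (undefined for constant profiles)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster_vocs(profiles, max_distance):
    """Average-linkage agglomerative clustering with distance 1 - r.

    profiles maps a VOC name to its intensities across the same breath samples.
    Merging stops once the closest pair of clusters is farther apart than max_distance.
    """
    names = list(profiles)
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }

    def linkage(c1, c2):
        # Average of all pairwise VOC distances between the two clusters.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [frozenset([name]) for name in names]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) > max_distance:
            break  # remaining clusters are too dissimilar to merge
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [set(c) for c in clusters]
```

For example, two VOCs whose intensities rise together across samples and two whose intensities fall together end up in two separate clusters at `max_distance=0.5`. A group of VOCs that co-varies with a non-disease factor such as smoking is the kind of substructure the abstract describes as a potential confounder.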

5.
J Integr Bioinform ; 11(2): 236, 2014 Jun 13.
Article in English | MEDLINE | ID: mdl-24953305

ABSTRACT

Selecting the most promising treatment strategy for breast cancer crucially depends on determining the correct subtype. In recent years, gene expression profiling has been investigated as an alternative to histochemical methods. Since databases like TCGA provide easy and unrestricted access to gene expression data for hundreds of patients, the challenge is to extract a minimal optimal set of genes with good prognostic properties from a large bulk of genes that make only a moderate contribution to classification. Several studies have successfully applied machine learning algorithms to solve this so-called gene selection problem. However, more diverse data from other OMICS technologies, including methylation, are available. We hypothesize that combining methylation and gene expression data could lead to a substantially improved classification model, since the resulting model would reflect differences not only at the transcriptomic but also at the epigenetic level. We compared random forest-derived classification models based on gene expression data alone and on methylation data alone with a model based on the combined features and with a model based on the gold standard PAM50. We obtained bootstrap errors of 10-20% and classification errors of 1-50%, depending on breast cancer subtype and model. The gene expression model was clearly superior to the methylation model; this was also reflected in the combined model, which mainly selected features from gene expression data. However, the methylation model was able to identify unique features not considered relevant by the gene expression model, which might provide deeper insights into breast cancer subtype differentiation at the epigenetic level.


Subject(s)
Breast Neoplasms/classification , DNA Methylation , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Algorithms , Artificial Intelligence , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Computational Biology/methods , Epigenesis, Genetic , Female , Gene Expression , Humans , Oligonucleotide Array Sequence Analysis/methods , Prognosis , Reproducibility of Results , Software
6.
Brief Funct Genomics ; 13(5): 398-408, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24855068

ABSTRACT

We review the level of genomic specificity regarding actinobacterial pathogenicity. As actinobacteria occupy various niches in diverse habitats, one may assume the existence of lifestyle-specific genomic features. We include 240 actinobacteria classified into four pathogenicity classes: human pathogens (HPs), broad-spectrum pathogens (BPs), opportunistic pathogens (OPs) and non-pathogens (NPs). We hypothesize: (H1) Pathogens (HPs and BPs) possess specific pathogenicity signature genes. (H2) The same holds for OPs. (H3) Broad-spectrum pathogens and exclusively human pathogens cannot be distinguished from each other because of an observation bias, i.e. many HPs might yet be unclassified BPs. (H4) There is no intrinsic genomic characteristic of OPs compared with pathogens, as small mutations are likely to play a more dominant role in surviving the immune system. To study these hypotheses, we implemented a bioinformatics pipeline that combines evolutionary sequence analysis with statistical learning methods (Random Forest with feature selection, model tuning and robustness analysis). Essentially, we present orthologous gene sets that computationally distinguish pathogens from NPs (H1). We further show a clear limit in differentiating OPs from both NPs (H2) and pathogens (H4). HPs also cannot be distinguished from bacteria annotated as BPs based only on a small set of orthologous genes (H3), as many HPs might also target a broad range of mammals but have not been annotated accordingly. In conclusion, we illustrate that even in the post-genome era and despite next-generation sequencing technology, our ability to efficiently draw real-world conclusions, such as pathogenicity classification, remains quite limited.


Subject(s)
Computational Biology/methods , Genome, Bacterial/genetics , Genomics/methods , Humans
8.
J Integr Bioinform ; 10(2): 218, 2013 Apr 02.
Article in English | MEDLINE | ID: mdl-23545212

ABSTRACT

Over the last decade, the evaluation of odors and vapors in human breath has gained increasing attention, particularly in the diagnostics of pulmonary diseases. Ion mobility spectrometry coupled with multi-capillary columns (MCC/IMS) is a well-known technology for detecting volatile organic compounds (VOCs) in air. It is a comparatively inexpensive, non-invasive, high-throughput method that can handle the moisture present in exhaled human breath and allows VOCs to be characterized at very low concentrations. To identify discriminating compounds as biomarkers, a clear understanding of the detailed composition of human breath is necessary. Therefore, in addition to the clinical studies, there is a need for a flexible and comprehensive centralized data repository capable of gathering all kinds of related information. Moreover, there is a demand for automated data integration and semi-automated data analysis, in particular with regard to the rapid data accumulation arising from the high-throughput nature of MCC/IMS technology. Here, we present a comprehensive database application and analysis platform that combines metabolic maps with heterogeneous biomedical data in a well-structured manner. The design of the database is based on a hybrid of the entity-attribute-value (EAV) model and the EAV-CR model, which incorporates the concepts of classes and relationships. Additionally, it offers an intuitive user interface that provides easy and quick access to the platform's functionality: automated data integration and integrity validation, a versioning and roll-back strategy, data retrieval, and semi-automatic data mining and machine learning capabilities. The platform will support MCC/IMS-based biomarker identification and validation. The software, schemata, data sets and further information are publicly available at http://imsdb.mpi-inf.mpg.de.
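A minimal sketch of the plain entity-attribute-value (EAV) layout mentioned above, using Python's built-in `sqlite3` module. All table, column and function names are hypothetical; the EAV-CR notion of classes is only hinted at by an `entity_type` column, relationships are not modelled, and this is not the platform's actual schema.

```python
import sqlite3


def build_eav_store():
    """Create an in-memory EAV schema: one row per (entity, attribute, value) triple."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE entity (id INTEGER PRIMARY KEY, entity_type TEXT NOT NULL);
        CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE eav_value (
            entity_id INTEGER NOT NULL REFERENCES entity(id),
            attribute_id INTEGER NOT NULL REFERENCES attribute(id),
            value TEXT,
            PRIMARY KEY (entity_id, attribute_id)
        );
        """
    )
    return conn


def set_value(conn, entity_id, attr_name, value):
    """Register the attribute if it is new, then insert or overwrite the value."""
    conn.execute("INSERT OR IGNORE INTO attribute (name) VALUES (?)", (attr_name,))
    (attr_id,) = conn.execute(
        "SELECT id FROM attribute WHERE name = ?", (attr_name,)
    ).fetchone()
    conn.execute(
        "INSERT OR REPLACE INTO eav_value (entity_id, attribute_id, value) VALUES (?, ?, ?)",
        (entity_id, attr_id, str(value)),
    )


def get_entity(conn, entity_id):
    """Pivot the triples of one entity back into an ordinary dictionary."""
    rows = conn.execute(
        """
        SELECT a.name, v.value
        FROM eav_value AS v JOIN attribute AS a ON a.id = v.attribute_id
        WHERE v.entity_id = ?
        """,
        (entity_id,),
    ).fetchall()
    return dict(rows)
```

The design trade-off is that a new kind of measurement or clinical variable becomes a new row in `attribute` rather than a schema change, which suits heterogeneous biomedical data, at the cost of storing values untyped and needing a pivot query to read a record back.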


Subject(s)
Air/analysis , Biomarkers/analysis , Breath Tests/methods , Databases as Topic , Exhalation , Spectrum Analysis/methods , Decision Trees , Humans , Ions , Software , Time Factors
9.
Metabolites ; 3(2): 277-93, 2013 Apr 16.
Article in English | MEDLINE | ID: mdl-24957992

ABSTRACT

Ion mobility spectrometry with pre-separation by multi-capillary columns (MCC/IMS) has become an established, inexpensive, non-invasive bioanalytics technology for detecting volatile organic compounds (VOCs), with various metabolomics applications in medical research. To pave the way for this technology towards daily use in medical practice, several steps still have to be taken. With respect to modern biomarker research, one of the most important tasks is the automatic classification of patient-specific data sets into different groups, for instance healthy or not. Although sophisticated machine learning methods exist, an indispensable preprocessing step is reliable and robust peak detection without manual intervention. In this work we evaluate four state-of-the-art approaches for automated IMS-based peak detection: local maxima search, watershed transformation with IPHEx, region-merging with VisualNow, and peak model estimation (PME). We manually generated a gold standard with the aid of a domain expert (manual) and compare the performance of the four peak calling methods with respect to two distinct criteria. First, we utilize established machine learning methods and systematically study their classification performance based on the four peak detectors' results. Second, we investigate the classification variance and robustness regarding perturbation and overfitting. Our main finding is that classification accuracy is almost equally good for all methods, both the manually created gold standard and the four automatic peak finding methods. In addition, we note that all tools, manual and automatic, are similarly robust against perturbations. However, the classification performance is more robust against overfitting when PME is used as the peak calling preprocessor. In summary, we conclude that all methods, although small differences exist, are largely reliable and enable a wide spectrum of real-world biomedical applications.
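Of the four approaches, local maxima search is the simplest to illustrate. The sketch below treats an MCC/IMS measurement as a 2-D intensity matrix (retention time by drift time, an assumed layout) and reports every cell that exceeds a hypothetical intensity threshold and all eight of its neighbours. It is a generic illustration, not the implementation evaluated in the study, and IPHEx, VisualNow and PME are not shown.

```python
def local_maxima_peaks(matrix, min_intensity):
    """Return (row, col, intensity) for cells strictly above all 8 neighbours and min_intensity.

    matrix is a list of equal-length rows, e.g. retention time (rows) by drift time (columns).
    Flat-topped peaks (plateaus) are not reported because the comparison is strict.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    peaks = []
    for i in range(n_rows):
        for j in range(n_cols):
            value = matrix[i][j]
            if value < min_intensity:
                continue  # ignore low-intensity background
            # Collect the up-to-8 surrounding cells, clipped at the matrix border.
            neighbours = [
                matrix[a][b]
                for a in range(max(0, i - 1), min(n_rows, i + 2))
                for b in range(max(0, j - 1), min(n_cols, j + 2))
                if (a, b) != (i, j)
            ]
            if all(value > n for n in neighbours):
                peaks.append((i, j, value))
    return peaks
```

The threshold plays the role of a crude noise filter; a real pipeline would normally de-noise the spectrum first, since isolated noise spikes also satisfy the local-maximum test.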

10.
Metabolites ; 2(4): 733-55, 2012 Oct 16.
Article in English | MEDLINE | ID: mdl-24957760

ABSTRACT

Ion mobility spectrometry combined with multi-capillary columns (MCC/IMS) is a well-known technology for detecting volatile organic compounds (VOCs). We may use MCC/IMS to scan human exhaled air, bacterial colonies or cell lines, for example, thereby gaining information about human health status or infection threats. We may further study the metabolic response of living cells to external perturbations. The instrument is comparatively inexpensive, robust and easy to use in everyday practice. However, the potential of the MCC/IMS methodology depends on the successful application of computational approaches for analyzing the large volume of data sets it produces. Here, we review the state of the art and highlight existing challenges. First, we address methods for raw data handling, data storage and visualization. Afterwards, we introduce de-noising, peak picking and other pre-processing approaches. We discuss statistical methods for analyzing correlations between peaks and diseases or medical treatment. Finally, we study up-to-date machine learning techniques for identifying robust biomarker molecules that allow patients to be classified into healthy and diseased groups. We conclude that MCC/IMS coupled with sophisticated computational methods has the potential to successfully address a broad range of biomedical questions. While most data pre-processing steps can be solved satisfactorily, some computational challenges in statistical learning and model validation remain.

11.
Philos Trans A Math Phys Eng Sci ; 367(1895): 1971-92, 2009 May 28.
Article in English | MEDLINE | ID: mdl-19380321

ABSTRACT

Modelling human and animal metabolism is impeded by the lack of accurate quantitative parameters and the large number of biochemical reactions. This problem may be tackled by: (i) study of modules of the network independently; (ii) ensemble simulations to explore many plausible parameter combinations; (iii) analysis of 'sloppy' parameter behaviour, revealing interdependent parameter combinations with little influence; (iv) multiscale analysis that combines molecular and whole network data; and (v) measuring metabolic flux (rate of flow) in vivo via stable isotope labelling. For the latter method, carbon transition networks were modelled with systems of ordinary differential equations, but we show that coloured Petri nets provide a more intuitive graphical approach. Analysis of parameter sensitivities shows that only a few parameter combinations have a large effect on predictions. Model analysis of high-energy phosphate transport indicates that membrane permeability, inaccurately known at the organellar level, can be well determined from whole-organ responses. Ensemble simulations that take into account the imprecision of measured molecular parameters contradict the popular hypothesis that high-energy phosphate transport in heart muscle is mostly by phosphocreatine. Combining modular, multiscale, ensemble and sloppy modelling approaches with in vivo flux measurements may prove indispensable for the modelling of the large human metabolic system.
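A toy illustration of points (ii) ensemble simulation and (iii) "sloppy" parameter combinations, assuming a hypothetical one-compartment model dC/dt = k_in - k_out * C rather than any model from the paper. The nominal rates, the factor-of-two uncertainty range and the function names are all assumptions; forward Euler integration is used only for simplicity.

```python
import random
import statistics


def simulate(k_in, k_out, c0=0.0, t_end=10.0, dt=0.01):
    """Forward-Euler integration of the toy model dC/dt = k_in - k_out * C; returns C(t_end)."""
    c = c0
    for _ in range(int(round(t_end / dt))):
        c += dt * (k_in - k_out * c)
    return c


def ensemble(n_members=200, seed=42):
    """Simulate many plausible parameter sets and summarise the spread of predictions."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_members):
        # Assumption: each rate is known only to within a factor of two of its nominal value.
        k_in = 1.0 * 2 ** rng.uniform(-1, 1)
        k_out = 0.5 * 2 ** rng.uniform(-1, 1)
        predictions.append(simulate(k_in, k_out))
    return statistics.mean(predictions), statistics.stdev(predictions)
```

For example, `simulate(1.0, 0.5, t_end=50.0)` and `simulate(2.0, 1.0, t_end=50.0)` both settle near C = 2.0: at steady state only the ratio k_in/k_out matters, so scaling both rates together is a "sloppy" direction that the steady-state prediction cannot constrain, while the ensemble's standard deviation shows how far the individual parameter uncertainty propagates to the prediction.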


Subject(s)
Metabolism , Models, Biological , Animals , Humans , Myocardium/metabolism , Phosphates/metabolism