Results 1 - 14 of 14
1.
Mol Inform ; 42(3): e2200232, 2023 03.
Article in English | MEDLINE | ID: mdl-36529710

ABSTRACT

Maximum common substructures (MCS) have received a lot of attention in the chemoinformatics community. They are typically used as a similarity measure between molecules, showing high predictive performance when used in classification tasks, while being easily explainable substructures. In the present work, we applied the Pairwise Maximum Common Subgraph Feature Generation (PMCSFG) algorithm to automatically detect toxicophores (structural alerts) and to compute fingerprints based on MCS. We present a comparison between our MCS-based fingerprints and 12 well-known chemical fingerprints when used as features in machine learning models. We provide an experimental evaluation and discuss the usefulness of the different methods on mutagenicity data. The features generated by the MCS method achieve state-of-the-art performance when predicting mutagenicity, while being more interpretable than traditional chemical fingerprints.


Subject(s)
Algorithms; Mutagens; Mutagens/chemistry; Mutagenesis; Machine Learning
2.
Proc Natl Acad Sci U S A ; 116(36): 18142-18147, 2019 09 03.
Article in English | MEDLINE | ID: mdl-31420515

ABSTRACT

One of the most challenging tasks in modern science is the development of systems biology models: Existing models are often very complex but generally have low predictive performance. The construction of high-fidelity models will require hundreds/thousands of cycles of model improvement, yet few current systems biology research studies complete even a single cycle. We combined multiple software tools with integrated laboratory robotics to execute three cycles of model improvement of the prototypical eukaryotic cellular transformation, the yeast (Saccharomyces cerevisiae) diauxic shift. In the first cycle, a model outperforming the best previous diauxic shift model was developed using bioinformatic and systems biology tools. In the second cycle, the model was further improved using automatically planned experiments. In the third cycle, hypothesis-led experiments improved the model to a greater extent than achieved using high-throughput experiments. All of the experiments were formalized and communicated to a cloud laboratory automation system (Eve) for automatic execution, and the results stored on the semantic web for reuse. The final model adds a substantial amount of knowledge about the yeast diauxic shift: 92 genes (+45%), and 1,048 interactions (+147%). This knowledge is also relevant to understanding cancer, the immune system, and aging. We conclude that systems biology software tools can be combined and integrated with laboratory robots in closed-loop cycles.


Subject(s)
Computational Biology; Gene Expression Regulation, Fungal; Robotics; Saccharomyces cerevisiae; Software; Systems Biology; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism
3.
PLoS One ; 13(4): e0195997, 2018.
Article in English | MEDLINE | ID: mdl-29698494

ABSTRACT

MOTIVATION: Graphlets are small network patterns that can be counted in order to characterise the structure of a network (topology). As part of a topology optimisation process, one could use graphlet counts to iteratively modify a network and keep track of the graphlet counts, in order to achieve certain topological properties. Up until now, however, graphlets were not suited as a metric for performing topology optimisation; when millions of minor changes are made to the network structure it becomes computationally intractable to recalculate all the graphlet counts for each of the edge modifications. RESULTS: IncGraph is a method for calculating the differences in graphlet counts with respect to the network in its previous state, which is much more efficient than calculating the graphlet occurrences from scratch at every edge modification made. In comparison to static counting approaches, our findings show IncGraph reduces the execution time by several orders of magnitude. The usefulness of this approach was demonstrated by developing a graphlet-based metric to optimise gene regulatory networks. IncGraph is able to quickly quantify the topological impact of small changes to a network, which opens novel research opportunities to study changes in topologies in evolving or online networks, or develop graphlet-based criteria for topology optimisation. AVAILABILITY: IncGraph is freely available as an open-source R package on CRAN (incgraph). The development version is also available on GitHub (rcannood/incgraph).


Subject(s)
Software; Algorithms; Gene Regulatory Networks; Models, Biological
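IncGraph's central observation is that an edge modification only changes the graphlets that contain that edge, so counts can be updated by a local difference instead of a full recount. A minimal Python sketch of that idea for the simplest graphlet, the triangle, follows. It is not the incgraph R package, and the class and function names are hypothetical.

```python
from itertools import combinations


class IncrementalTriangleCounter:
    """Running triangle count for an undirected simple graph (hypothetical helper).

    Only triangles through the modified edge (u, v) can change, and there is
    exactly one such triangle per common neighbour of u and v.
    """

    def __init__(self):
        self.adj = {}        # node -> set of neighbours
        self.triangles = 0   # current number of triangles

    def _neighbours(self, node):
        # Create an empty neighbour set the first time a node is seen.
        return self.adj.setdefault(node, set())

    def add_edge(self, u, v):
        """Insert edge (u, v); return the change in the triangle count."""
        if u == v or v in self._neighbours(u):
            return 0  # self-loops and duplicate edges are ignored
        delta = len(self._neighbours(u) & self._neighbours(v))
        self.adj[u].add(v)
        self.adj[v].add(u)
        self.triangles += delta
        return delta

    def remove_edge(self, u, v):
        """Delete edge (u, v); return the (non-positive) change in the count."""
        if v not in self.adj.get(u, set()):
            return 0  # edge not present
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        delta = -len(self.adj[u] & self.adj[v])
        self.triangles += delta
        return delta


def count_triangles_from_scratch(adj):
    """Brute-force recount over all node triples, used only as a cross-check."""
    return sum(
        1
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )


if __name__ == "__main__":
    counter = IncrementalTriangleCounter()
    for u, v in [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]:
        counter.add_edge(u, v)
    print(counter.triangles)  # → 2 (triangles 1-2-3 and 2-3-4)
```

Each update costs one set intersection rather than a pass over all node triples, which is the kind of saving the abstract attributes to incremental counting; the from-scratch function exists only to check the running total.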
4.
PLoS Comput Biol ; 14(4): e1006097, 2018 04.
Article in English | MEDLINE | ID: mdl-29684010

ABSTRACT

Transposable elements (TEs) are repetitive nucleotide sequences that make up a large portion of eukaryotic genomes. They can move and duplicate within a genome, increasing genome size and contributing to genetic diversity within and across species. Accurate identification and classification of TEs present in a genome is an important step towards understanding their effects on genes and their role in genome evolution. We introduce TE-Learner, a framework based on machine learning that automatically identifies TEs in a given genome and assigns a classification to them. We present an implementation of our framework towards LTR retrotransposons, a particular type of TEs characterized by having long terminal repeats (LTRs) at their boundaries. We evaluate the predictive performance of our framework on the well-annotated genomes of Drosophila melanogaster and Arabidopsis thaliana and we compare our results for three LTR retrotransposon superfamilies with the results of three widely used methods for TE identification or classification: RepeatMasker, Censor and LtrDigest. In contrast to these methods, TE-Learner is the first to incorporate machine learning techniques, outperforming these methods in terms of predictive performance while being able to learn models and make predictions efficiently. Moreover, we show that our method was able to identify TEs that none of the above methods could find, and we investigated TE-Learner's predictions that did not correspond to an official annotation. It turns out that many of these predictions are in fact strongly homologous to a known TE.


Subject(s)
Machine Learning; Retroelements; Terminal Repeat Sequences; Animals; Arabidopsis/genetics; Arabidopsis Proteins/genetics; Computational Biology; Conserved Sequence; DNA, Plant/genetics; Decision Trees; Drosophila Proteins/genetics; Drosophila melanogaster/genetics; Evolution, Molecular; Genome, Insect; Genome, Plant; Software
5.
Mol Inform ; 36(10)2017 10.
Article in English | MEDLINE | ID: mdl-28590546

ABSTRACT

This article introduces a new type of structural fragment called a geometrical pattern. Such geometrical patterns are defined as molecular graphs that include a labelling of atoms together with constraints on interatomic distances. The discovery of geometrical patterns in a chemical dataset relies on the induction of multiple decision trees combined in random forests. Each computational step corresponds to a refinement of a preceding set of constraints, extending a previous geometrical pattern. This paper focuses on the mutagenicity of chemicals via the definition of structural alerts in relation to these geometrical patterns. An experimental assessment of the main geometrical patterns follows, showing how they can efficiently give rise to the definition of a chemical feature related to a chemical function or a chemical property. Geometrical patterns have provided a valuable and innovative approach to bring new information for discovering and assessing structural characteristics in relation to a particular biological phenotype.


Subject(s)
Mutagenesis/physiology; Carcinogens/chemistry; Mutagenesis/genetics; Mutagenicity Tests; Mutagens/chemistry; Structure-Activity Relationship
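The abstract defines a geometrical pattern as labelled atoms plus constraints on interatomic distances. As a hedged illustration of what it could mean for a molecule to satisfy such a pattern, the sketch below checks labels and distance intervals on one 3D conformer by brute force. It leaves out the bond (graph) part of the pattern and the random-forest induction described in the paper; the `(labels, constraints)` layout, the function name and the toy coordinates are all assumptions.

```python
from itertools import permutations
from math import dist


def matches(pattern, atoms):
    """Check whether a molecule contains a (hypothetical) geometrical pattern.

    pattern: (labels, constraints), where labels is a list of element symbols
             and constraints maps a pair of pattern positions (i, j) to a
             (min, max) interatomic distance in angstroms.
    atoms:   list of (element, (x, y, z)) tuples for one 3D conformer.

    Brute force over ordered atom selections, so only suitable for small
    patterns; bonds (the graph part of the pattern) are ignored here.
    """
    labels, constraints = pattern
    for chosen in permutations(range(len(atoms)), len(labels)):
        # Each pattern position must map to an atom with the same label.
        if any(atoms[a][0] != label for a, label in zip(chosen, labels)):
            continue
        # Every constrained pair must lie within its distance interval.
        if all(
            lo <= dist(atoms[chosen[i]][1], atoms[chosen[j]][1]) <= hi
            for (i, j), (lo, hi) in constraints.items()
        ):
            return True
    return False


if __name__ == "__main__":
    # Toy coordinates, invented for illustration.
    toy = [
        ("N", (0.0, 0.0, 0.0)),
        ("O", (1.22, 0.0, 0.0)),
        ("O", (-0.61, 1.06, 0.0)),
        ("C", (0.0, -1.47, 0.0)),
    ]
    n_o_o = (["N", "O", "O"], {(0, 1): (1.1, 1.3), (0, 2): (1.1, 1.3)})
    print(matches(n_o_o, toy))  # → True
```

In the paper, each refinement step tightens or adds constraints; in this representation that would amount to narrowing an interval or adding a pair to `constraints`.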
6.
Expert Rev Proteomics ; 13(5): 495-511, 2016 05.
Article in English | MEDLINE | ID: mdl-27031651

ABSTRACT

With the current expanded technical capabilities to perform mass spectrometry-based biomedical proteomics experiments, an improved focus on the design of experiments is crucial. Because ignoring the importance of good design leads to an unprecedented rate of false discoveries that would poison our results, more and more tools are being developed to help researchers design proteomic experiments. In this review, we apply statistical thinking to go through the entire proteomics workflow for biomarker discovery and validation, and relate the considerations that should be made at the level of hypothesis building, technology selection, experimental design and the optimization of the experimental parameters.


Subject(s)
Mass Spectrometry/methods; Proteomics/methods; Research Design; Humans; Proteomics/statistics & numerical data; Proteomics/trends
7.
J R Soc Interface ; 12(104): 20141289, 2015 Mar 06.
Article in English | MEDLINE | ID: mdl-25652463

ABSTRACT

There is an urgent need to make drug discovery cheaper and faster. This will enable the development of treatments for diseases currently neglected for economic reasons, such as tropical and orphan diseases, and generally increase the supply of new drugs. Here, we report the Robot Scientist 'Eve' designed to make drug discovery more economical. A Robot Scientist is a laboratory automation system that uses artificial intelligence (AI) techniques to discover scientific knowledge through cycles of experimentation. Eve integrates and automates library-screening, hit-confirmation, and lead generation through cycles of quantitative structure activity relationship learning and testing. Using econometric modelling we demonstrate that the use of AI to select compounds economically outperforms standard drug screening. For further efficiency Eve uses a standardized form of assay to compute Boolean functions of compound properties. These assays can be quickly and cheaply engineered using synthetic biology, enabling more targets to be assayed for a given budget. Eve has repositioned several drugs against specific targets in parasites that cause tropical diseases. One validated discovery is that the anti-cancer compound TNP-470 is a potent inhibitor of dihydrofolate reductase from the malaria-causing parasite Plasmodium vivax.


Subject(s)
Drug Design; Drug Repositioning; Rare Diseases/drug therapy; Technology, Pharmaceutical/trends; Algorithms; Antineoplastic Agents/therapeutic use; Automation; Drug Evaluation, Preclinical; Humans; Malaria, Vivax/drug therapy; Models, Statistical; Plasmodium vivax/drug effects; Quantitative Structure-Activity Relationship; Regression Analysis; Reproducibility of Results; Software; Tropical Medicine
8.
Biol Direct ; 10: 1, 2015 Jan 07.
Article in English | MEDLINE | ID: mdl-25564011

ABSTRACT

BACKGROUND: A key challenge in the field of HIV-1 protein evolution is the identification of coevolving amino acids at the molecular level. In the past decades, many sequence-based methods have been designed to detect position-specific coevolution within and between different proteins. However, an ensemble coevolution system that integrates different methods to improve the detection of HIV-1 protein coevolution has not been developed. RESULTS: We integrated 27 sequence-based prediction methods published between 2004 and 2013 into an ensemble coevolution system. This system allowed combinations of different sequence-based methods for coevolution predictions. Using HIV-1 protein structures and experimental data, we evaluated the performance of individual and combined sequence-based methods in the prediction of HIV-1 intra- and inter-protein coevolution. We showed that sequence-based methods clustered according to their methodology, and a combination of four methods outperformed any of the 27 individual methods. This four-method combination estimated that HIV-1 intra-protein coevolving positions were mainly located in functional domains and were in physical contact with each other in the protein tertiary structures. In the analysis of HIV-1 inter-protein coevolving positions between Gag and protease, protease drug resistance positions near the active site mostly coevolved with Gag cleavage positions (V128, S373-T375, A431, F448-P453) and Gag C-terminal positions (S489-Q500) under selective pressure of protease inhibitors. CONCLUSIONS: This study presents a new ensemble coevolution system which detects position-specific coevolution using combinations of 27 different sequence-based methods. Our findings highlight key coevolving residues within HIV-1 structural proteins and between Gag and protease, shedding light on HIV-1 intra- and inter-protein coevolution.


Subject(s)
Computational Biology/methods; Evolution, Molecular; HIV Protease/genetics; HIV-1/genetics; gag Gene Products, Human Immunodeficiency Virus/genetics; Area Under Curve; Databases, Protein; Gene Products, gag/chemistry; Humans; Models, Molecular; Models, Statistical; Protein Binding; Protein Structure, Tertiary; Reproducibility of Results; Viral Proteins/chemistry
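The abstract does not list the 27 sequence-based methods it combines. As one classic example of a position-specific, sequence-based coevolution score, mutual information (MI) between two alignment columns can be sketched as below. Whether plain MI is among the paper's 27 methods is not stated here; the function names and the toy alignment are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(alignment, i, j):
    """Mutual information (in nats) between columns i and j of an alignment.

    alignment: list of equal-length aligned sequences (strings).
    Raw frequencies only: no pseudocounts, gap filtering, or correction for
    phylogeny, so this is a baseline score rather than a tuned method.
    """
    n = len(alignment)
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    freq_i, freq_j = Counter(col_i), Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * log(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


def rank_pairs(alignment):
    """All column pairs sorted by decreasing mutual information."""
    length = len(alignment[0])
    scores = [
        ((i, j), mutual_information(alignment, i, j))
        for i in range(length)
        for j in range(i + 1, length)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy = ["AKL", "AKL", "GRL", "GRL"]  # invented: columns 0 and 1 covary
    print(rank_pairs(toy)[0])  # top pair is (0, 1)
```

An ensemble in the spirit of the abstract would compute several such per-pair scores and combine them; the combination rule the authors used is not described in the abstract and is not modelled here.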
9.
Proteomics ; 14(4-5): 353-66, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24323524

ABSTRACT

Machine learning is a subdiscipline within artificial intelligence that focuses on algorithms that allow computers to learn to solve a (complex) problem from existing data. This ability can be used to generate a solution to a particularly intractable problem, provided that enough data are available to train and subsequently evaluate an algorithm. Since MS-based proteomics has no shortage of complex problems, and since public data are becoming available in ever-growing amounts, machine learning is fast becoming a very popular tool in the field. We therefore present here an overview of the different applications of machine learning in proteomics that together cover nearly the entire wet- and dry-lab workflow, and that address key bottlenecks in experiment planning and design, as well as in data processing and analysis.


Subject(s)
Artificial Intelligence; Computational Biology; Proteomics/methods; Reference Standards; Research Design
10.
Bioinformatics ; 29(15): 1913-4, 2013 Aug 01.
Article in English | MEDLINE | ID: mdl-23709496

ABSTRACT

SUMMARY: We present PIUS, a tool that identifies peptides from tandem mass spectrometry data by analyzing the six-frame translation of a complete genome. It differs from earlier studies that have performed such a genomic search in two ways: (i) it considers a larger search space and (ii) it is designed for natural peptide identification rather than proteomics. Unlike other peptidomics tools designed for genome-wide searches, PIUS does not limit the analysis to a set of sequences that match a list of de novo reconstructions. AVAILABILITY: Source code, executables and a detailed technical report are freely available at http://dtai.cs.kuleuven.be/ml/systems/pius. CONTACT: eduardo.costa@cs.kuleuven.be SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Peptides/chemistry; Software; Tandem Mass Spectrometry; Algorithms; Animals; Cell Line; Databases, Protein; Genome; Genomics; Mice; Peptides/analysis; Proteomics/methods; Sequence Analysis, Protein
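PIUS searches the six-frame translation of a complete genome, that is, the three reading frames of the forward strand plus the three of its reverse complement. A minimal sketch of that translation step follows, assuming the standard genetic code (NCBI translation table 1) and mapping any codon containing a non-ACGT character to 'X'. The function names and the `+1`/`-1` frame labels are hypothetical, and the peptide-spectrum matching that PIUS performs on top of the translation is not shown.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG x TCAG x TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an uppercase DNA string (other characters kept)."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; '*' marks stop codons, 'X' unknown codons."""
    return "".join(
        CODON_TABLE.get(dna[p:p + 3], "X") for p in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return {frame_label: protein} for three forward and three reverse frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

For a genome-scale search, the translated frames are usually split at stop codons ('*') into candidate stretches before peptides are matched against spectra; how PIUS organizes that search space is described in its technical report, not here.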
11.
J Proteome Res ; 12(5): 2253-9, 2013 May 03.
Article in English | MEDLINE | ID: mdl-23517142

ABSTRACT

Trypsin is the workhorse protease in mass spectrometry-based proteomics experiments and is used to digest proteins into more readily analyzable peptides. To identify these peptides after mass spectrometric analysis, the actual digestion has to be mimicked as faithfully as possible in silico. In this paper we introduce CP-DT (Cleavage Prediction with Decision Trees), an algorithm based on a decision tree ensemble that was learned on publicly available peptide identification data from the PRIDE repository. We demonstrate that CP-DT is able to accurately predict tryptic cleavage: tests on three independent data sets show that CP-DT significantly outperforms the Keil rules that are currently used to predict tryptic cleavage. Moreover, the trees generated by CP-DT can make predictions efficiently and are interpretable by domain experts.


Subject(s)
Models, Biological; Trypsin/chemistry; Algorithms; Amino Acid Sequence; Animals; Artificial Intelligence; Data Interpretation, Statistical; Decision Trees; Humans; Proteolysis; Proteomics
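Mimicking tryptic digestion in silico amounts to deciding, for each K or R, whether trypsin cuts after it. As a rule-based baseline of the kind CP-DT is meant to improve on, the sketch below implements the widely used simplified rule "cleave after K or R unless the next residue is P", with optional missed cleavages. It is an assumption-laden simplification, neither the Keil rules themselves nor CP-DT, and `tryptic_digest` is a hypothetical name.

```python
import re

# Zero-width split point: right after K or R, unless the next residue is P.
CLEAVAGE_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(protein, missed_cleavages=0):
    """Return peptides from an in silico trypsin digest (simplified K/R-not-before-P rule).

    missed_cleavages: maximum number of internal cleavage sites a peptide may
    span, as is often allowed in database searches.
    """
    # Splitting on an empty match needs Python 3.7+; drop the empty tail piece.
    fragments = [f for f in CLEAVAGE_SITE.split(protein) if f]
    peptides = []
    for start in range(len(fragments)):
        for extra in range(missed_cleavages + 1):
            end = start + extra + 1
            if end <= len(fragments):
                peptides.append("".join(fragments[start:end]))
    return peptides


if __name__ == "__main__":
    print(tryptic_digest("MKWVTRPAEK"))  # → ['MK', 'WVTRPAEK']
```

A learned predictor such as CP-DT replaces the fixed regular expression with a per-site decision based on the surrounding residues, which is where it can capture exceptions that a single hand-written rule misses.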
12.
BMC Med Inform Decis Mak ; 11: 64, 2011 Oct 25.
Article in English | MEDLINE | ID: mdl-22027016

ABSTRACT

BACKGROUND: The intensive care unit (ICU) length of stay (LOS) of patients undergoing cardiac surgery may vary considerably, and is often difficult to predict within the first hours after admission. The early clinical evolution of a cardiac surgery patient might be predictive of the patient's LOS. The purpose of the present study was to develop a predictive model for ICU discharge after non-emergency cardiac surgery, by analyzing the first 4 hours of data in the computerized medical record of these patients with Gaussian processes (GP), a machine learning technique. METHODS: Non-interventional study. Predictive modeling, separate development (n = 461) and validation (n = 499) cohorts. GP models were developed to predict the probability of ICU discharge the day after surgery (classification task), and to predict the day of ICU discharge as a discrete variable (regression task). GP predictions were compared with predictions by EuroSCORE, nurses and physicians. The classification task was evaluated using aROC for discrimination, and Brier Score, Brier Score Scaled, and the Hosmer-Lemeshow test for calibration. The regression task was evaluated by comparing median actual and predicted discharge, loss penalty function (LPF) ((actual-predicted)/actual) and calculating root mean squared relative errors (RMSRE). RESULTS: Median (P25-P75) ICU length of stay was 3 (2-5) days. For classification, the GP model showed an aROC of 0.758, which was significantly higher than the predictions by nurses, but not better than EuroSCORE and physicians. The GP had the best calibration, with a Brier Score of 0.179 and Hosmer-Lemeshow p-value of 0.382. For regression, GP had the highest proportion of patients with a correctly predicted day of discharge (40%), which was significantly better than the EuroSCORE (p < 0.001) and nurses (p = 0.044) but equivalent to physicians. GP had the lowest RMSRE (0.408) of all predictive models.
CONCLUSIONS: A GP model that uses PDMS (patient data management system) data from the first 4 hours after ICU admission of scheduled adult cardiac surgery patients was able to predict discharge from the ICU as both a classification and a regression task. The GP model demonstrated significantly better discriminative power than the EuroSCORE and the ICU nurses, and was at least as good as predictions made by ICU physicians. The GP model was the only well-calibrated model.


Subject(s)
Intensive Care Units/organization & administration; Models, Theoretical; Patient Discharge; Surgical Procedures, Operative; Adult; Artificial Intelligence; Cardiac Surgical Procedures; Electronic Health Records; Humans; Length of Stay; Normal Distribution
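The abstract evaluates calibration with the Brier score and the scaled Brier score, and the regression task with the per-patient relative error (actual-predicted)/actual and its root mean square (RMSRE). A minimal sketch of these metrics follows. The scaled Brier score uses one common definition, 1 - Brier/Brier_max with Brier_max = prevalence * (1 - prevalence), which the abstract itself does not spell out; the function names are hypothetical.

```python
from math import sqrt


def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def scaled_brier_score(y_true, p_pred):
    """1 - Brier / Brier_max (assumed definition).

    Brier_max = prevalence * (1 - prevalence) is the Brier score obtained by
    always predicting the observed prevalence, so 0 means no better than that
    and 1 means perfect. Undefined if all outcomes are identical.
    """
    prevalence = sum(y_true) / len(y_true)
    return 1 - brier_score(y_true, p_pred) / (prevalence * (1 - prevalence))


def rmsre(actual, predicted):
    """Root mean squared relative error built on (actual - predicted) / actual."""
    rel = [(a - p) / a for a, p in zip(actual, predicted)]
    return sqrt(sum(r * r for r in rel) / len(rel))
```

Lower Brier and RMSRE values are better; dividing by `actual` makes an error of one day weigh more for a short stay than for a long one, which is presumably why a relative rather than absolute error was chosen for discharge days.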
13.
Stud Health Technol Inform ; 150: 590-4, 2009.
Article in English | MEDLINE | ID: mdl-19745380

ABSTRACT

This work studies the impact of using dynamic information as features in a machine learning algorithm for the task of classifying critically ill patients into two classes according to the time they need to reach a stable state after coronary bypass surgery: less or more than nine hours. Different dynamic features were extracted from five physiological variables. These feature sets subsequently served as inputs for a Gaussian process, and the prediction results were compared with the case where only admission data were used for the classification. The dynamic features, especially the cepstral coefficients (aROC: 0.749, Brier score: 0.206), resulted in higher performance than static admission data (aROC: 0.547, Brier score: 0.247). In all cases, the Gaussian process classifier outperformed logistic regression.


Subject(s)
Information Storage and Retrieval; Statistics as Topic/methods; Aged; Belgium; Female; Humans; Intensive Care Units; Male; Middle Aged; Normal Distribution; Ventilator Weaning/statistics & numerical data
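The aROC values reported in this abstract (0.749 with cepstral features versus 0.547 with admission data) can be read as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case, with 0.5 meaning chance-level discrimination. A minimal pairwise (Mann-Whitney) sketch of that computation follows; it is O(n_pos * n_neg), suited to small cohorts, and is not the authors' code. The function name is hypothetical.

```python
def area_under_roc(y_true, scores):
    """aROC as P(score of a random positive > score of a random negative).

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg;
    tied scores count as one half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0   # correctly ordered pair
            elif sp == sn:
                wins += 0.5   # tie
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(area_under_roc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # → 0.75
```

Read this way, an aROC of 0.547 orders a positive/negative pair correctly only slightly more often than a coin flip, which puts the gap to the dynamic features in context.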
14.
Best Pract Res Clin Anaesthesiol ; 23(1): 127-43, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19449621

ABSTRACT

Computerization in healthcare in general, and in the operating room (OR) and intensive care unit (ICU) in particular, is on the rise. This leads to large patient databases, with specific properties. Machine learning techniques are able to examine and to extract knowledge from large databases in an automatic way. Although the number of potential applications for these techniques in medicine is large, few medical doctors are familiar with their methodology, advantages and pitfalls. A general overview of machine learning techniques, with a more detailed discussion of some of these algorithms, is presented in this review.


Subject(s)
Artificial Intelligence; Databases, Factual; Information Storage and Retrieval/methods; Medical Records Systems, Computerized; Algorithms; Computational Biology/methods; Decision Support Systems, Clinical; Humans; Intensive Care Units; Neural Networks, Computer; Operating Rooms/methods; Software