Búsqueda | Portal de Búsqueda de la BVS España

1.

The Bigger Fish: A Comparison of Meta-Learning QSAR Models on Low-Resourced Aquatic Toxicity Regression Tasks.

Schlender, Thalea; Viljanen, Markus; van Rijn, Jan N; Mohr, Felix; Peijnenburg, Willie Jgm; Hoos, Holger H; Rorije, Emiel; Wong, Albert.

Environ Sci Technol ; 57(46): 17818-17830, 2023 Nov 21.

Artículo en Inglés | MEDLINE | ID: mdl-37315216

RESUMEN

Toxicological information as needed for risk assessments of chemical compounds is often sparse. Unfortunately, gathering new toxicological information experimentally often involves animal testing. Simulated alternatives, e.g., quantitative structure-activity relationship (QSAR) models, are preferred to infer the toxicity of new compounds. Aquatic toxicity data collections consist of many related tasksâeach predicting the toxicity of new compounds on a given species. Since many of these tasks are inherently low-resource, i.e., involve few associated compounds, this is challenging. Meta-learning is a subfield of artificial intelligence that can lead to more accurate models by enabling the utilization of information across tasks. In our work, we benchmark various state-of-the-art meta-learning techniques for building QSAR models, focusing on knowledge sharing between species. Specifically, we employ and compare transformational machine learning, model-agnostic meta-learning, fine-tuning, and multi-task models. Our experiments show that established knowledge-sharing techniques outperform single-task approaches. We recommend the use of multi-task random forest models for aquatic toxicity modeling, which matched or exceeded the performance of other approaches and robustly produced good results in the low-resource settings we studied. This model functions on a species level, predicting toxicity for multiple species across various phyla, with flexible exposure duration and on a large chemical applicability domain.

Asunto(s)

Inteligencia Artificial , Relación Estructura-Actividad Cuantitativa , Animales , Peces

2.

The CLAIRE COVID-19 initiative: approach, experiences and recommendations.

Bontempi, Gianluca; Chavarriaga, Ricardo; eD Canck, Hans; Girardi, Emanuela; Hoos, Holger; Kilbane-Dawe, Iarla; Ball, Tonio; Nowé, Ann; Sousa, Jose; Bacciu, Davide; Aldinucci, Marco; eD Domenico, Manlio; Saffiotti, Alessandro; Maratea, Marco.

Ethics Inf Technol ; 23(Suppl 1): 127-133, 2021.

Artículo en Inglés | MEDLINE | ID: mdl-33584129

RESUMEN

A volunteer effort by Artificial Intelligence (AI) researchers has shown it can deliver significant research outcomes rapidly to help tackle COVID-19. Within two months, CLAIRE's self-organising volunteers delivered the World's first comprehensive curated repository of COVID-19-related datasets useful for drug-repurposing, drafted review papers on the role CT/X-ray scan analysis and robotics could play, and progressed research in other areas. Given the pace required and nature of voluntary efforts, the teams faced a number of challenges. These offer insights in how better to prepare for future volunteer scientific efforts and large scale, data-dependent AI collaborations in general. We offer seven recommendations on how to best leverage such efforts and collaborations in the context of managing future crises.

3.

Successive Statistical and Structure-Based Modeling to Identify Chemically Novel Kinase Inhibitors.

Burggraaff, Lindsey; Lenselink, Eelke B; Jespers, Willem; van Engelen, Jesper; Bongers, Brandon J; González, Marina Gorostiola; Liu, Rongfang; Hoos, Holger H; van Vlijmen, Herman W T; IJzerman, Adriaan P; van Westen, Gerard J P.

J Chem Inf Model ; 60(9): 4283-4295, 2020 09 28.

Artículo en Inglés | MEDLINE | ID: mdl-32343143

RESUMEN

Kinases are frequently studied in the context of anticancer drugs. Their involvement in cell responses, such as proliferation, differentiation, and apoptosis, makes them interesting subjects in multitarget drug design. In this study, a workflow is presented that models the bioactivity spectra for two panels of kinases: (1) inhibition of RET, BRAF, SRC, and S6K, while avoiding inhibition of MKNK1, TTK, ERK8, PDK1, and PAK3, and (2) inhibition of AURKA, PAK1, FGFR1, and LKB1, while avoiding inhibition of PAK3, TAK1, and PIK3CA. Both statistical and structure-based models were included, which were thoroughly benchmarked and optimized. A virtual screening was performed to test the workflow for one of the main targets, RET kinase. This resulted in 5 novel and chemically dissimilar RET inhibitors with remaining RET activity of <60% (at a concentration of 10 µM) and similarities with known RET inhibitors from 0.18 to 0.29 (Tanimoto, ECFP6). The four more potent inhibitors were assessed in a concentration range and proved to be modestly active with a pIC50 value of 5.1 for the most active compound. The experimental validation of inhibitors for RET strongly indicates that the multitarget workflow is able to detect novel inhibitors for kinases, and hence, this workflow can potentially be applied in polypharmacology modeling. We conclude that this approach can identify new chemical matter for existing targets. Moreover, this workflow can easily be applied to other targets as well.

Asunto(s)

Antineoplásicos , Proteínas Proto-Oncogénicas c-ret , Antineoplásicos/farmacología , Diseño de Fármacos , Polifarmacología , Inhibidores de Proteínas Quinasas/farmacología

4.

Automated Algorithm Selection: Survey and Perspectives.

Kerschke, Pascal; Hoos, Holger H; Neumann, Frank; Trautmann, Heike.

Evol Comput ; 27(1): 3-45, 2019.

Artículo en Inglés | MEDLINE | ID: mdl-30475672

RESUMEN

It has long been observed that for practically any computational problem that has been intensely studied, different instances are best solved using different algorithms. This is particularly pronounced for computationally hard problems, where in most cases, no single algorithm defines the state of the art; instead, there is a set of algorithms with complementary strengths. This performance complementarity can be exploited in various ways, one of which is based on the idea of selecting, from a set of given algorithms, for each problem instance to be solved the one expected to perform best. The task of automatically selecting an algorithm from a given set is known as the per-instance algorithm selection problem and has been intensely studied over the past 15 years, leading to major improvements in the state of the art in solving a growing number of discrete combinatorial problems, including propositional satisfiability and AI planning. Per-instance algorithm selection also shows much promise for boosting performance in solving continuous and mixed discrete/continuous optimisation problems. This survey provides an overview of research in automated algorithm selection, ranging from early and seminal works to recent and promising application areas. Different from earlier work, it covers applications to discrete and continuous problems, and discusses algorithm selection in context with conceptually related approaches, such as algorithm configuration, scheduling, or portfolio selection. Since informative and cheaply computable problem instance features provide the basis for effective per-instance algorithm selection systems, we also provide an overview of such features for discrete and continuous problems. Finally, we provide perspectives on future work in the area and discuss a number of open research challenges.

Asunto(s)

Algoritmos , Simulación por Computador , Almacenamiento y Recuperación de la Información/métodos , Reconocimiento de Normas Patrones Automatizadas/métodos , Técnicas de Apoyo para la Decisión , Humanos , Encuestas y Cuestionarios

5.

Automatic Configuration of Multi-Objective Local Search Algorithms for Permutation Problems.

Blot, Aymeric; Kessaci, Marie-Éléonore; Jourdan, Laetitia; Hoos, Holger H.

Evol Comput ; 27(1): 147-171, 2019.

Artículo en Inglés | MEDLINE | ID: mdl-30407875

RESUMEN

Automatic algorithm configuration (AAC) is becoming a key ingredient in the design of high-performance solvers for challenging optimisation problems. However, most existing work on AAC deals with configuration procedures that optimise a single performance metric of a given, single-objective algorithm. Of course, these configurators can also be used to optimise the performance of multi-objective algorithms, as measured by a single performance indicator. In this work, we demonstrate that better results can be obtained by using a native, multi-objective algorithm configuration procedure. Specifically, we compare three AAC approaches: one considering only the hypervolume indicator, a second optimising the weighted sum of hypervolume and spread, and a third that simultaneously optimises these complementary indicators, using a genuinely multi-objective approach. We assess these approaches by applying them to a highly-parametric local search framework for two widely studied multi-objective optimisation problems, the bi-objective permutation flowshop and travelling salesman problems. Our results show that multi-objective algorithms are indeed best configured using a multi-objective configurator.

Asunto(s)

Algoritmos , Simulación por Computador , Almacenamiento y Recuperación de la Información/métodos , Modelos Teóricos , Reconocimiento de Normas Patrones Automatizadas/métodos , Solución de Problemas , Humanos

6.

Leveraging TSP Solver Complementarity through Machine Learning.

Kerschke, Pascal; Kotthoff, Lars; Bossek, Jakob; Hoos, Holger H; Trautmann, Heike.

Evol Comput ; 26(4): 597-620, 2018.

Artículo en Inglés | MEDLINE | ID: mdl-28836836

RESUMEN

The Travelling Salesperson Problem (TSP) is one of the best-studied NP-hard problems. Over the years, many different solution approaches and solvers have been developed. For the first time, we directly compare five state-of-the-art inexact solvers-namely, LKH, EAX, restart variants of those, and MAOS-on a large set of well-known benchmark instances and demonstrate complementary performance, in that different instances may be solved most effectively by different algorithms. We leverage this complementarity to build an algorithm selector, which selects the best TSP solver on a per-instance basis and thus achieves significantly improved performance compared to the single best solver, representing an advance in the state of the art in solving the Euclidean TSP. Our in-depth analysis of the selectors provides insight into what drives this performance improvement.

7.

Critical assessment of automated flow cytometry data analysis techniques.

Aghaeepour, Nima; Finak, Greg; Hoos, Holger; Mosmann, Tim R; Brinkman, Ryan; Gottardo, Raphael; Scheuermann, Richard H.

Nat Methods ; 10(3): 228-38, 2013 Mar.

Artículo en Inglés | MEDLINE | ID: mdl-23396282

RESUMEN

Traditional methods for flow cytometry (FCM) data processing rely on subjective manual gating. Recently, several groups have developed computational methods for identifying cell populations in multidimensional FCM data. The Flow Cytometry: Critical Assessment of Population Identification Methods (FlowCAP) challenges were established to compare the performance of these methods on two tasks: (i) mammalian cell population identification, to determine whether automated algorithms can reproduce expert manual gating and (ii) sample classification, to determine whether analysis pipelines can identify characteristics that correlate with external variables (such as clinical outcome). This analysis presents the results of the first FlowCAP challenges. Several methods performed well as compared to manual gating or external variables using statistical performance measures, which suggests that automated methods have reached a sufficient level of maturity and accuracy for reliable use in FCM data analysis.

Asunto(s)

Biología Computacional , Citometría de Flujo/métodos , Procesamiento de Imagen Asistido por Computador , Algoritmos , Animales , Análisis por Conglomerados , Interpretación Estadística de Datos , Citometría de Flujo/normas , Citometría de Flujo/estadística & datos numéricos , Enfermedad Injerto contra Huésped/sangre , Enfermedad Injerto contra Huésped/patología , Humanos , Leucocitos Mononucleares/patología , Leucocitos Mononucleares/virología , Linfoma de Células B Grandes Difuso/sangre , Linfoma de Células B Grandes Difuso/patología , Reproducibilidad de los Resultados , Sensibilidad y Especificidad , Programas Informáticos , Fiebre del Nilo Occidental/sangre , Fiebre del Nilo Occidental/patología , Fiebre del Nilo Occidental/virología

8.

Enhanced flowType/RchyOptimyx: a BioConductor pipeline for discovery in high-dimensional cytometry data.

O'Neill, Kieran; Jalali, Adrin; Aghaeepour, Nima; Hoos, Holger; Brinkman, Ryan R.

Bioinformatics ; 30(9): 1329-30, 2014 May 01.

Artículo en Inglés | MEDLINE | ID: mdl-24407226

RESUMEN

We present a significantly improved version of the flowType and RchyOptimyx BioConductor-based pipeline that is both 14 times faster and can accommodate multiple levels of biomarker expression for up to 96 markers. With these improvements, the pipeline is positioned to be an integral part of data analysis for high-throughput experiments on high-dimensional single-cell assay platforms, including flow cytometry, mass cytometry and single-cell RT-qPCR.

Asunto(s)

Citometría de Flujo/métodos , Antígenos CD/análisis , Biomarcadores/análisis , Programas Informáticos

9.

Ensemble-based prediction of RNA secondary structures.

Aghaeepour, Nima; Hoos, Holger H.

BMC Bioinformatics ; 14: 139, 2013 Apr 24.

Artículo en Inglés | MEDLINE | ID: mdl-23617269

RESUMEN

BACKGROUND: Accurate structure prediction methods play an important role for the understanding of RNA function. Energy-based, pseudoknot-free secondary structure prediction is one of the most widely used and versatile approaches, and improved methods for this task have received much attention over the past five years. Despite the impressive progress that as been achieved in this area, existing evaluations of the prediction accuracy achieved by various algorithms do not provide a comprehensive, statistically sound assessment. Furthermore, while there is increasing evidence that no prediction algorithm consistently outperforms all others, no work has been done to exploit the complementary strengths of multiple approaches. RESULTS: In this work, we present two contributions to the area of RNA secondary structure prediction. Firstly, we use state-of-the-art, resampling-based statistical methods together with a previously published and increasingly widely used dataset of high-quality RNA structures to conduct a comprehensive evaluation of existing RNA secondary structure prediction procedures. The results from this evaluation clarify the performance relationship between ten well-known existing energy-based pseudoknot-free RNA secondary structure prediction methods and clearly demonstrate the progress that has been achieved in recent years. Secondly, we introduce AveRNA, a generic and powerful method for combining a set of existing secondary structure prediction procedures into an ensemble-based method that achieves significantly higher prediction accuracies than obtained from any of its component procedures. CONCLUSIONS: Our new, ensemble-based method, AveRNA, improves the state of the art for energy-based, pseudoknot-free RNA secondary structure prediction by exploiting the complementary strengths of multiple existing prediction procedures, as demonstrated using a state-of-the-art statistical resampling approach. In addition, AveRNA allows an intuitive and effective control of the trade-off between false negative and false positive base pair predictions. Finally, AveRNA can make use of arbitrary sets of secondary structure prediction procedures and can therefore be used to leverage improvements in prediction accuracy offered by algorithms and energy models developed in the future. Our data, MATLAB software and a web-based version of AveRNA are publicly available at http://www.cs.ubc.ca/labs/beta/Software/AveRNA.

Asunto(s)

Algoritmos , ARN/química , Conformación de Ácido Nucleico , Programas Informáticos

10.

Early immunologic correlates of HIV protection can be identified from computational analysis of complex multivariate T-cell flow cytometry assays.

Aghaeepour, Nima; Chattopadhyay, Pratip K; Ganesan, Anuradha; O'Neill, Kieran; Zare, Habil; Jalali, Adrin; Hoos, Holger H; Roederer, Mario; Brinkman, Ryan R.

Bioinformatics ; 28(7): 1009-16, 2012 Apr 01.

Artículo en Inglés | MEDLINE | ID: mdl-22383736

RESUMEN

MOTIVATION: Polychromatic flow cytometry (PFC), has enormous power as a tool to dissect complex immune responses (such as those observed in HIV disease) at a single cell level. However, analysis tools are severely lacking. Although high-throughput systems allow rapid data collection from large cohorts, manual data analysis can take months. Moreover, identification of cell populations can be subjective and analysts rarely examine the entirety of the multidimensional dataset (focusing instead on a limited number of subsets, the biology of which has usually already been well-described). Thus, the value of PFC as a discovery tool is largely wasted. RESULTS: To address this problem, we developed a computational approach that automatically reveals all possible cell subsets. From tens of thousands of subsets, those that correlate strongly with clinical outcome are selected and grouped. Within each group, markers that have minimal relevance to the biological outcome are removed, thereby distilling the complex dataset into the simplest, most clinically relevant subsets. This allows complex information from PFC studies to be translated into clinical or resource-poor settings, where multiparametric analysis is less feasible. We demonstrate the utility of this approach in a large (n=466), retrospective, 14-parameter PFC study of early HIV infection, where we identify three T-cell subsets that strongly predict progression to AIDS (only one of which was identified by an initial manual analysis). AVAILABILITY: The 'flowType: Phenotyping Multivariate PFC Assays' package is available through Bioconductor. Additional documentation and examples are available at: www.terryfoxlab.ca/flowsite/flowType/ SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: rbrinkman@bccrc.ca.

Asunto(s)

Biología Computacional/métodos , Citometría de Flujo , Infecciones por VIH/inmunología , Subgrupos de Linfocitos T/inmunología , Biomarcadores/análisis , Humanos , Inmunofenotipificación/métodos , Valor Predictivo de las Pruebas , Modelos de Riesgos Proporcionales , Estudios Retrospectivos , Subgrupos de Linfocitos T/citología

11.

Analysis of energy-based algorithms for RNA secondary structure prediction.

Hajiaghayi, Monir; Condon, Anne; Hoos, Holger H.

BMC Bioinformatics ; 13: 22, 2012 Feb 01.

Artículo en Inglés | MEDLINE | ID: mdl-22296803

RESUMEN

BACKGROUND: RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives that leverage partition function calculations find the structure with maximum expected accuracy (MEA) or pseudo-expected accuracy (pseudo-MEA) methods. Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. RESULTS: We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms. Second, on our large datasets, the algorithm with best overall accuracy is a pseudo MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods, there is no clear winner in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1 with 1 meaning perfect structure predictions) and is obtained with a thermodynamic parameter set obtained by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived). CONCLUSIONS: Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and Transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large data sets.

Asunto(s)

Algoritmos , ARN/química , Conformación de Ácido Nucleico , Valor Predictivo de las Pruebas , Ribonucleasa P/química , Termodinámica

12.

Computational approaches for RNA energy parameter estimation.

Andronescu, Mirela; Condon, Anne; Hoos, Holger H; Mathews, David H; Murphy, Kevin P.

RNA ; 16(12): 2304-18, 2010 Dec.

Artículo en Inglés | MEDLINE | ID: mdl-20940338

RESUMEN

Methods for efficient and accurate prediction of RNA structure are increasingly valuable, given the current rapid advances in understanding the diverse functions of RNA molecules in the cell. To enhance the accuracy of secondary structure predictions, we developed and refined optimization techniques for the estimation of energy parameters. We build on two previous approaches to RNA free-energy parameter estimation: (1) the Constraint Generation (CG) method, which iteratively generates constraints that enforce known structures to have energies lower than other structures for the same molecule; and (2) the Boltzmann Likelihood (BL) method, which infers a set of RNA free-energy parameters that maximize the conditional likelihood of a set of reference RNA structures. Here, we extend these approaches in two main ways: We propose (1) a max-margin extension of CG, and (2) a novel linear Gaussian Bayesian network that models feature relationships, which effectively makes use of sparse data by sharing statistical strength between parameters. We obtain significant improvements in the accuracy of RNA minimum free-energy pseudoknot-free secondary structure prediction when measured on a comprehensive set of 2518 RNA molecules with reference structures. Our parameters can be used in conjunction with software that predicts RNA secondary structures, RNA hybridization, or ensembles of structures. Our data, software, results, and parameter sets in various formats are freely available at http://www.cs.ubc.ca/labs/beta/Projects/RNA-Params.

Asunto(s)

Biología Computacional/métodos , Metabolismo Energético/fisiología , ARN/química , ARN/metabolismo , Estadística como Asunto/métodos , Algoritmos , Animales , Composición de Base , Secuencia de Bases , Biología Computacional/estadística & datos numéricos , Humanos , Modelos Teóricos , Datos de Secuencia Molecular , Conformación de Ácido Nucleico , Reproducibilidad de los Resultados , Sensibilidad y Especificidad , Análisis de Secuencia de ARN

13.

RchyOptimyx: cellular hierarchy optimization for flow cytometry.

Aghaeepour, Nima; Jalali, Adrin; O'Neill, Kieran; Chattopadhyay, Pratip K; Roederer, Mario; Hoos, Holger H; Brinkman, Ryan R.

Cytometry A ; 81(12): 1022-30, 2012 Dec.

Artículo en Inglés | MEDLINE | ID: mdl-23044634

RESUMEN

Analysis of high-dimensional flow cytometry datasets can reveal novel cell populations with poorly understood biology. Following discovery, characterization of these populations in terms of the critical markers involved is an important step, as this can help to both better understand the biology of these populations and aid in designing simpler marker panels to identify them on simpler instruments and with fewer reagents (i.e., in resource poor or highly regulated clinical settings). However, current tools to design panels based on the biological characteristics of the target cell populations work exclusively based on technical parameters (e.g., instrument configurations, spectral overlap, and reagent availability). To address this shortcoming, we developed RchyOptimyx (cellular hieraRCHY OPTIMization), a computational tool that constructs cellular hierarchies by combining automated gating with dynamic programming and graph theory to provide the best gating strategies to identify a target population to a desired level of purity or correlation with a clinical outcome, using the simplest possible marker panels. RchyOptimyx can assess and graphically present the trade-offs between marker choice and population specificity in high-dimensional flow or mass cytometry datasets. We present three proof-of-concept use cases for RchyOptimyx that involve 1) designing a panel of surface markers for identification of rare populations that are primarily characterized using their intracellular signature; 2) simplifying the gating strategy for identification of a target cell population; 3) identification of a non-redundant marker set to identify a target cell population.

Asunto(s)

Células de la Médula Ósea/citología , Citometría de Flujo/métodos , Programas Informáticos , Algoritmos , Antígenos CD/análisis , Antígenos CD/inmunología , Biomarcadores/análisis , Células de la Médula Ósea/inmunología , Biología Computacional/métodos , Infecciones por VIH/inmunología , Humanos , Inmunofenotipificación/métodos , Interleucina-7/inmunología , Lipopolisacáridos/inmunología , Fenotipo , Coloración y Etiquetado , Linfocitos T/citología , Linfocitos T/inmunología

14.

VPint: value propagation-based spatial interpolation.

Arp, Laurens; Baratchi, Mitra; Hoos, Holger.

Data Min Knowl Discov ; 36(5): 1647-1678, 2022.

Artículo en Inglés | MEDLINE | ID: mdl-35789913

RESUMEN

Given the common problem of missing data in real-world applications from various fields, such as remote sensing, ecology and meteorology, the interpolation of missing spatial and spatio-temporal data can be of tremendous value. Existing methods for spatial interpolation, most notably Gaussian processes and spatial autoregressive models, tend to suffer from (a) a trade-off between modelling local or global spatial interaction, (b) the assumption there is only one possible path between two points, and (c) the assumption of homogeneity of intermediate locations between points. Addressing these issues, we propose a value propagation-based spatial interpolation method called VPint, inspired by Markov reward processes (MRPs), and introduce two variants thereof: (i) a static discount (SD-MRP) and (ii) a data-driven weight prediction (WP-MRP) variant. Both these interpolation variants operate locally, while implicitly accounting for global spatial relationships in the entire system through recursion. We evaluated our proposed methods by comparing the mean absolute error, root mean squared error, peak signal-to-noise ratio and structural similarity of interpolated grid cells to those of 8 common baselines. Our analysis involved detailed experiments on a synthetic and two real-world datasets, as well as experiments on convergence and scalability. Empirical results demonstrate the competitive advantage of VPint on randomly missing data, where it performed better than baselines in terms of mean absolute error and structural similarity, as well as spatially clustered missing data, where it performed best on 2 out of 3 datasets.

15.

Rapid cell population identification in flow cytometry data.

Aghaeepour, Nima; Nikolic, Radina; Hoos, Holger H; Brinkman, Ryan R.

Cytometry A ; 79(1): 6-13, 2011 Jan.

Artículo en Inglés | MEDLINE | ID: mdl-21182178

RESUMEN

We have developed flowMeans, a time-efficient and accurate method for automated identification of cell populations in flow cytometry (FCM) data based on K-means clustering. Unlike traditional K-means, flowMeans can identify concave cell populations by modelling a single population with multiple clusters. flowMeans uses a change point detection algorithm to determine the number of sub-populations, enabling the method to be used in high throughput FCM data analysis pipelines. Our approach compares favorably to manual analysis by human experts and current state-of-the-art automated gating algorithms. flowMeans is freely available as an open source R package through Bioconductor.

Asunto(s)

Citometría de Flujo/métodos , Citometría de Flujo/estadística & datos numéricos , Algoritmos , Automatización , Análisis por Conglomerados , Enfermedad Injerto contra Huésped/sangre , Humanos , Linfoma de Células B Grandes Difuso/patología , Modelos Estadísticos

16.

Predicting the 9-year course of mood and anxiety disorders with automated machine learning: A comparison between auto-sklearn, naïve Bayes classifier, and traditional logistic regression.

van Eeden, Wessel A; Luo, Chuan; van Hemert, Albert M; Carlier, Ingrid V E; Penninx, Brenda W; Wardenaar, Klaas J; Hoos, Holger; Giltay, Erik J.

Psychiatry Res ; 299: 113823, 2021 05.

Artículo en Inglés | MEDLINE | ID: mdl-33667949

RESUMEN

BACKGROUND: Predicting the onset and course of mood and anxiety disorders is of clinical importance but remains difficult. We compared the predictive performances of traditional logistic regression, basic probabilistic machine learning (ML) methods, and automated ML (Auto-sklearn). METHODS: Data were derived from the Netherlands Study of Depression and Anxiety. We compared how well multinomial logistic regression, a naïve Bayes classifier, and Auto-sklearn predicted depression and anxiety diagnoses at a 2-, 4-, 6-, and 9-year follow up, operationalized as binary or categorical variables. Predictor sets included demographic and self-report data, which can be easily collected in clinical practice at two initial time points (baseline and 1-year follow up). RESULTS: At baseline, participants were 42.2 years old, 66.5% were women, and 53.6% had a current mood or anxiety disorder. The three methods were similarly successful in predicting (mental) health status, with correct predictions for up to 79% (95% CI 75-81%). However, Auto-sklearn was superior when assessing a more complex dataset with individual item scores. CONCLUSIONS: Automated ML methods added only limited value, compared to traditional data modelling when predicting the onset and course of depression and anxiety. However, they hold potential for automatization and may be better suited for complex datasets.

Asunto(s)

Trastornos de Ansiedad , Aprendizaje Automático , Adulto , Ansiedad/diagnóstico , Trastornos de Ansiedad/diagnóstico , Teorema de Bayes , Femenino , Humanos , Modelos Logísticos

17.

RNA STRAND: the RNA secondary structure and statistical analysis database.

Andronescu, Mirela; Bereg, Vera; Hoos, Holger H; Condon, Anne.

BMC Bioinformatics ; 9: 340, 2008 Aug 13.

Artículo en Inglés | MEDLINE | ID: mdl-18700982

RESUMEN

BACKGROUND: The ability to access, search and analyse secondary structures of a large set of known RNA molecules is very important for deriving improved RNA energy models, for evaluating computational predictions of RNA secondary structures and for a better understanding of RNA folding. Currently there is no database that can easily provide these capabilities for almost all RNA molecules with known secondary structures. RESULTS: In this paper we describe RNA STRAND - the RNA secondary STRucture and statistical ANalysis Database, a curated database containing known secondary structures of any type and organism. Our new database provides a wide collection of known RNA secondary structures drawn from public databases, searchable and downloadable in a common format. Comprehensive statistical information on the secondary structures in our database is provided using the RNA Secondary Structure Analyser, a new tool we have developed to analyse RNA secondary structures. The information thus obtained is valuable for understanding to which extent and with which probability certain structural motifs can appear. We outline several ways in which the data provided in RNA STRAND can facilitate research on RNA structure, including the improvement of RNA energy models and evaluation of secondary structure prediction programs. In order to keep up-to-date with new RNA secondary structure experiments, we offer the necessary tools to add solved RNA secondary structures to our database and invite researchers to contribute to RNA STRAND. CONCLUSION: RNA STRAND is a carefully assembled database of trusted RNA secondary structures, with easy on-line tools for searching, analyzing and downloading user selected entries, and is publicly available at http://www.rnasoft.ca/strand.

Asunto(s)

Sistemas de Administración de Bases de Datos , Bases de Datos Genéticas , Modelos Químicos , Modelos Moleculares , ARN/química , ARN/ultraestructura , Interfaz Usuario-Computador , Gráficos por Computador , Simulación por Computador , Almacenamiento y Recuperación de la Información/métodos , Conformación de Ácido Nucleico

18.

Correlation between the secondary structure of pre-mRNA introns and the efficiency of splicing in Saccharomyces cerevisiae.

Rogic, Sanja; Montpetit, Ben; Hoos, Holger H; Mackworth, Alan K; Ouellette, Bf Francis; Hieter, Philip.

BMC Genomics ; 9: 355, 2008 Jul 29.

Artículo en Inglés | MEDLINE | ID: mdl-18664289

RESUMEN

BACKGROUND: Secondary structure interactions within introns have been shown to be essential for efficient splicing of several yeast genes. The nature of these base-pairing interactions and their effect on splicing efficiency were most extensively studied in ribosomal protein gene RPS17B (previously known as RP51B). It was determined that complementary pairing between two sequence segments located downstream of the 5' splice site and upstream of the branchpoint sequence promotes efficient splicing of the RPS17B pre-mRNA, presumably by shortening the branchpoint distance. However, no attempts were made to compute a shortened, 'structural' branchpoint distance and thus the functional relationship between this distance and the splicing efficiency remains unknown. RESULTS: In this paper we use computational RNA secondary structure prediction to analyze the secondary structure of the RPS17B intron. We show that it is necessary to consider suboptimal structure predictions and to compute the structural branchpoint distances in order to explain previously published splicing efficiency results. Our study reveals that there is a tight correlation between this distance and splicing efficiency levels of intron mutants described in the literature. We experimentally test this correlation on additional RPS17B mutants and intron mutants within two other yeast genes. CONCLUSION: The proposed model of secondary structure requirements for efficient splicing is the first attempt to specify the functional relationship between pre-mRNA secondary structure and splicing. Our findings provide further insights into the role of pre-mRNA secondary structure in gene splicing in yeast and also offer basis for improvement of computational methods for splice site identification and gene-finding.

Asunto(s)

Intrones , Precursores del ARN/genética , Empalme del ARN , ARN Mensajero/genética , Saccharomyces cerevisiae/genética , Algoritmos , Emparejamiento Base , Biología Computacional , Genes Fúngicos , Genoma Fúngico , Mutación , Conformación de Ácido Nucleico , ARN de Hongos/genética , Proteínas Ribosómicas/genética , Proteínas de Saccharomyces cerevisiae/genética

19.

Efficient parameter estimation for RNA secondary structure prediction.

Andronescu, Mirela; Condon, Anne; Hoos, Holger H; Mathews, David H; Murphy, Kevin P.

Bioinformatics ; 23(13): i19-28, 2007 Jul 01.

Artículo en Inglés | MEDLINE | ID: mdl-17646296

RESUMEN

MOTIVATION: Accurate prediction of RNA secondary structure from the base sequence is an unsolved computational challenge. The accuracy of predictions made by free energy minimization is limited by the quality of the energy parameters in the underlying free energy model. The most widely used model, the Turner99 model, has hundreds of parameters, and so a robust parameter estimation scheme should efficiently handle large data sets with thousands of structures. Moreover, the estimation scheme should also be trained using available experimental free energy data in addition to structural data. RESULTS: In this work, we present constraint generation (CG), the first computational approach to RNA free energy parameter estimation that can be efficiently trained on large sets of structural as well as thermodynamic data. Our CG approach employs a novel iterative scheme, whereby the energy values are first computed as the solution to a constrained optimization problem. Then the newly computed energy parameters are used to update the constraints on the optimization function, so as to better optimize the energy parameters in the next iteration. Using our method on biologically sound data, we obtain revised parameters for the Turner99 energy model. We show that by using our new parameters, we obtain significant improvements in prediction accuracy over current state of-the-art methods. AVAILABILITY: Our CG implementation is available at http://www.rnasoft.ca/CG/.

Asunto(s)

Algoritmos , Modelos Químicos , Modelos Moleculares , ARN/química , ARN/ultraestructura , Análisis de Secuencia de ARN/métodos , Secuencia de Bases , Simulación por Computador , Datos de Secuencia Molecular , Conformación de Ácido Nucleico

20.

An adaptive bin framework search method for a beta-sheet protein homopolymer model.

Shmygelska, Alena; Hoos, Holger H.

BMC Bioinformatics ; 8: 136, 2007 Apr 24.

Artículo en Inglés | MEDLINE | ID: mdl-17451609

RESUMEN

BACKGROUND: The problem of protein structure prediction consists of predicting the functional or native structure of a protein given its linear sequence of amino acids. This problem has played a prominent role in the fields of biomolecular physics and algorithm design for over 50 years. Additionally, its importance increases continually as a result of an exponential growth over time in the number of known protein sequences in contrast to a linear increase in the number of determined structures. Our work focuses on the problem of searching an exponentially large space of possible conformations as efficiently as possible, with the goal of finding a global optimum with respect to a given energy function. This problem plays an important role in the analysis of systems with complex search landscapes, and particularly in the context of ab initio protein structure prediction. RESULTS: In this work, we introduce a novel approach for solving this conformation search problem based on the use of a bin framework for adaptively storing and retrieving promising locally optimal solutions. Our approach provides a rich and general framework within which a broad range of adaptive or reactive search strategies can be realized. Here, we introduce adaptive mechanisms for choosing which conformations should be stored, based on the set of conformations already stored in memory, and for biasing choices when retrieving conformations from memory in order to overcome search stagnation. CONCLUSION: We show that our bin framework combined with a widely used optimization method, Monte Carlo search, achieves significantly better performance than state-of-the-art generalized ensemble methods for a well-known protein-like homopolymer model on the face-centered cubic lattice.

Asunto(s)

Bases de Datos de Proteínas , Modelos Químicos , Modelos Moleculares , Proteínas/química , Proteínas/ultraestructura , Alineación de Secuencia/métodos , Análisis de Secuencia de Proteína/métodos , Secuencia de Aminoácidos , Simulación por Computador , Almacenamiento y Recuperación de la Información/métodos , Datos de Secuencia Molecular , Conformación Proteica

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA