Results 1 - 20 of 74
1.
Mutagenesis ; 34(1): 33-40, 2019 03 06.
Article in English | MEDLINE | ID: mdl-30541036

ABSTRACT

Valid and predictive models for classifying Ames mutagenicity have been developed using conformal prediction. The models are Random Forest models built on signature molecular descriptors. The investigation indicates that, when not-strongly mutagenic compounds (class B) are excluded, the validity for mutagenic compounds increases for predictions based on the combined public and Division of Genetics and Mutagenesis, National Institute of Health Sciences of Japan (DGM/NIHS) data, but less so when only the DGM/NIHS data are used. The models built on the combined data yield valid predictions only for the majority, non-mutagenic, class, whereas the models built on DGM/NIHS data alone are valid for both classes, i.e. mutagenic and non-mutagenic compounds. These results demonstrate the importance of data consistency, manifested in the superior predictive quality and validity of the models based only on DGM/NIHS-generated data compared with a combination of these data and public data sources.
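Validity here means that, at a chosen confidence level, the fraction of compounds whose true class falls inside the conformal prediction set is at least that level. "Valid for both classes" means this holds separately for mutagenic and non-mutagenic compounds. The Python sketch below shows that per-class check. It assumes the prediction sets have already been produced by some conformal classifier, and the toy sets and labels are invented for illustration.

```python
from collections import defaultdict

def classwise_validity(prediction_sets, true_labels):
    """Per class, the fraction of examples whose true label is in the prediction set."""
    covered = defaultdict(int)
    totals = defaultdict(int)
    for pred_set, label in zip(prediction_sets, true_labels):
        totals[label] += 1
        if label in pred_set:
            covered[label] += 1
    return {label: covered[label] / totals[label] for label in totals}

# Hypothetical prediction sets from a conformal classifier at some confidence level.
sets = [{"non-mutagenic"}, {"mutagenic", "non-mutagenic"}, {"non-mutagenic"},
        {"mutagenic"}, set(), {"non-mutagenic"}]
labels = ["non-mutagenic", "mutagenic", "mutagenic",
          "mutagenic", "non-mutagenic", "non-mutagenic"]
print(classwise_validity(sets, labels))
```

A class would count as valid at confidence 1 - ε when its coverage is at least 1 - ε, up to sampling noise on a finite test set.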


Subjects
Mutagenicity Tests/trends , Mutagens/toxicity , Quantitative Structure-Activity Relationship , Computer Simulation , Japan , Mutagenesis/genetics
2.
J Chem Inf Model ; 59(3): 962-972, 2019 03 25.
Article in English | MEDLINE | ID: mdl-30408959

ABSTRACT

The volume of high-throughput screening data has increased considerably since the beginning of the era of automated biochemical and cell-based assays. This information-rich data source provides tremendous repurposing opportunities for data mining. It was recently shown that biochemical or cell-based assay results can be compiled into so-called high-throughput screening fingerprints (HTSFPs), a new type of descriptor describing molecular bioactivity profiles that can be applied in virtual screening, iterative screening, and target deconvolution. However, studies of HTSFPs and machine learning have so far mainly focused on predicting the outcome of molecules in single high-throughput assays, and the modeling of compounds' biochemical assay activities toward a panel of target proteins has not been reported. In this article, we compare how our in-house HTSFPs perform at this task when combined with multitask deep learning versus the single-task support vector machine method, in terms of both hit identification and scaffold hopping potential. The performance of the two HTSFP models was reported relative to that of multitask deep learning and support vector machine models built with the structural descriptor ECFP. Moreover, we investigated the effect of high-throughput screening false positives and negatives on the performance of the generated models. Our results showed that the two fingerprints yielded similar performance and diverse hits with very little overlap, demonstrating the orthogonality of bioactivity-profile-based descriptors and structural descriptors. Therefore, modeling compound activity data using ECFPs together with HTSFPs increases the scaffold hopping potential of the predictive models.
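HTSFPs describe a compound by its assay readouts rather than its structure. The minimal sketch below assumes the fingerprint is a vector of per-assay z-scores, with each assay standardized across the compounds tested in it, and that untested compound-assay pairs are simply set to 0, i.e. the assay mean. The authors' in-house construction may differ, and the table values are hypothetical.

```python
import math

def hts_fingerprint(activity_table):
    """Turn a compound x assay activity table into z-scored fingerprints.

    activity_table: dict compound_id -> list of assay readouts (None = not tested).
    Untested entries stay at 0.0 (the assay mean after standardization); this
    imputation is an assumption for illustration.
    """
    compounds = list(activity_table)
    n_assays = len(next(iter(activity_table.values())))
    fingerprints = {c: [0.0] * n_assays for c in compounds}
    for j in range(n_assays):
        values = [activity_table[c][j] for c in compounds if activity_table[c][j] is not None]
        if len(values) < 2:
            continue  # cannot standardize an assay with fewer than two readouts
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
        if std == 0:
            continue  # constant assay carries no information
        for c in compounds:
            v = activity_table[c][j]
            if v is not None:
                fingerprints[c][j] = (v - mean) / std
    return fingerprints

# Hypothetical percent-inhibition readouts for three compounds in three assays.
table = {
    "cpd1": [10.0, 80.0, None],
    "cpd2": [20.0, 20.0, 5.0],
    "cpd3": [90.0, 50.0, 15.0],
}
print(hts_fingerprint(table))
```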


Subjects
Drug Evaluation, Preclinical/methods , High-Throughput Screening Assays/methods , Machine Learning , Neural Networks, Computer
3.
J Chem Inf Model ; 59(3): 1230-1237, 2019 03 25.
Article in English | MEDLINE | ID: mdl-30726080

ABSTRACT

Iterative screening has emerged as a promising approach to increase the efficiency of high-throughput screening (HTS) campaigns in drug discovery. By learning from a subset of the compound library, predictive models can infer which compounds to screen next. One of the challenges of iterative screening is deciding how many iterations to perform, mainly because the prospective hit rate of any given iteration is difficult to estimate. In this article, a novel method based on Venn-ABERS predictors is proposed. The method provides accurate estimates of the number of hits retrieved in any given iteration during an HTS campaign. These estimates provide the information needed to decide on the number of iterations that maximizes the screening outcome. Thus, this method offers a prospective screening strategy for early-stage drug discovery.
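Venn-ABERS predictors turn a model's raw scores into a pair of probabilities. They do this by fitting isotonic regression twice, once with the new compound hypothetically labeled inactive and once labeled active. The sketch below assumes the inductive variant: a separate calibration set with model scores and 0/1 hit labels, combined through the commonly used merge p = p1 / (1 - p0 + p1). Summing the merged probabilities over a planned batch then gives an estimate of the number of hits that batch would yield. Function names and data are hypothetical, and this is not the authors' code.

```python
def isotonic_fit(xs, ys):
    """Pool-adjacent-violators: least-squares non-decreasing fit of ys against xs.

    Returns a dict mapping each distinct x to its fitted value.
    """
    # Collapse tied x values into (sum of y, count) so they share one fitted value.
    groups = {}
    for x, y in zip(xs, ys):
        total, count = groups.get(x, (0.0, 0))
        groups[x] = (total + y, count + 1)
    blocks = []  # each block: [sum_y, count, list of x values it covers]
    for x in sorted(groups):
        total, count = groups[x]
        blocks.append([total, count, [x]])
        # Merge backwards while a block's mean exceeds the mean of the block after it.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total2, count2, xs2 = blocks.pop()
            blocks[-1][0] += total2
            blocks[-1][1] += count2
            blocks[-1][2].extend(xs2)
    return {x: total / count for total, count, covered in blocks for x in covered}

def venn_abers(cal_scores, cal_labels, test_score):
    """Inductive Venn-ABERS: returns (p0, p1, merged probability of label 1)."""
    p = []
    for hypothetical_label in (0, 1):
        fit = isotonic_fit(list(cal_scores) + [test_score], list(cal_labels) + [hypothetical_label])
        p.append(fit[test_score])
    p0, p1 = p
    return p0, p1, p1 / (1.0 - p0 + p1)

def expected_hits(cal_scores, cal_labels, batch_scores):
    """Estimated number of hits in a batch: the sum of merged Venn-ABERS probabilities."""
    return sum(venn_abers(cal_scores, cal_labels, s)[2] for s in batch_scores)

# Hypothetical model scores and hit labels (1 = hit) from an earlier screening iteration.
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
cal_labels = [0, 0, 1, 0, 0, 1, 1, 1]
print(venn_abers(cal_scores, cal_labels, 0.5))
print(expected_hits(cal_scores, cal_labels, [0.5, 0.85, 0.95]))
```

The interval [p0, p1] also carries information: a wide gap signals that the calibration data say little about compounds scored like this one.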


Subjects
Computational Biology/methods , Drug Evaluation, Preclinical/methods , High-Throughput Screening Assays , Machine Learning , Quantitative Structure-Activity Relationship
4.
J Chem Inf Model ; 58(5): 1132-1140, 2018 05 29.
Article in English | MEDLINE | ID: mdl-29701973

ABSTRACT

Making predictions with an associated confidence is highly desirable, as it facilitates decision making and resource prioritization. Conformal regression is a machine learning framework that allows the user to define the required confidence and delivers predictions that are guaranteed to be correct to the selected extent. In this study, we apply conformal regression to model molecular properties and bioactivity values and investigate different ways of scaling the resulting prediction intervals to make the regressors as efficient (i.e., the intervals as narrow) as possible. Different algorithms for estimating the prediction uncertainty were used to normalize the prediction ranges, and the different approaches were evaluated on 29 publicly available data sets. Our results show that the most efficient conformal regressors are obtained when the natural exponential of the ensemble standard deviation from the underlying random forest is used to scale the prediction intervals, but other approaches were almost as efficient. This approach afforded an average prediction range of 1.65 pIC50 units at the 80% confidence level when applied to bioactivity modeling. The choice of nonconformity function has a pronounced impact on the average prediction range, with a difference of close to one log unit in bioactivity between the tightest and widest prediction ranges. Overall, conformal regression is a robust approach for generating bioactivity predictions with associated confidence.
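The normalized nonconformity score scales each calibration residual by exp(σ), where σ is the standard deviation of the random forest trees' predictions. Compounds the ensemble disagrees on therefore receive wider intervals. The minimal inductive conformal regression sketch below works under that assumption. Some formulations add a smoothing constant to the denominator; it is omitted here. The random forest itself is left out, and its predictions and standard deviations are passed in as plain lists.

```python
import math

def conformal_quantile(scores, confidence):
    """k-th smallest calibration score, k = ceil((n + 1) * confidence)."""
    ordered = sorted(scores)
    k = math.ceil((len(ordered) + 1) * confidence)
    return ordered[k - 1] if k <= len(ordered) else math.inf

def normalized_intervals(cal_y, cal_pred, cal_std, test_pred, test_std, confidence=0.8):
    """Prediction intervals scaled by exp(ensemble standard deviation)."""
    # Nonconformity: absolute residual divided by exp(sigma) of the tree ensemble.
    alphas = [abs(y - p) / math.exp(s) for y, p, s in zip(cal_y, cal_pred, cal_std)]
    q = conformal_quantile(alphas, confidence)
    # Each interval half-width is q * exp(sigma) for that test compound.
    return [(p - q * math.exp(s), p + q * math.exp(s)) for p, s in zip(test_pred, test_std)]

# Hypothetical calibration data: residuals 0.1 ... 0.9, ensemble std 0 everywhere.
cal_y = [0.1 * i for i in range(1, 10)]
cal_pred = [0.0] * 9
cal_std = [0.0] * 9
intervals = normalized_intervals(cal_y, cal_pred, cal_std, [5.0, 5.0], [0.0, math.log(2.0)])
print(intervals)
```

The second test compound has a larger ensemble spread, so its interval comes out twice as wide as the first.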


Subjects
Informatics/methods , Machine Learning , Quantitative Structure-Activity Relationship , Uncertainty , Decision Making
5.
J Arthroplasty ; 33(1): 51-54, 2018 01.
Article in English | MEDLINE | ID: mdl-28844765

ABSTRACT

BACKGROUND: Considerable blood loss requiring transfusion is frequently reported after total hip and knee arthroplasties (THA and TKA). The purpose of this study is to review transfusion rates in contemporary THA and TKA performed with optimized perioperative protocols, including minimized surgical trauma and optimal perioperative patient care. METHODS: This retrospective study included 1442 consecutive patients who received either a primary THA or a TKA from the same high-volume surgeon between January 2008 and December 2015. Demographic and surgical data were collected from the patients' medical records. Estimated blood loss, decline in hemoglobin, and use of transfusion were registered. RESULTS: One THA (0.13%) and 3 TKAs (0.44%) required postoperative blood transfusion. Average measured bleeding was 253 ± 142 mL and 207 ± 169 mL in THA and TKA, respectively. Average decline in hemoglobin was 23.5 ± 11.4 g/L and 22.9 ± 11.6 g/L for THA and TKA, respectively. CONCLUSION: In contemporary THA and TKA, perioperative protocols and patient optimization can decrease the rate of blood transfusion to near zero.


Subjects
Arthroplasty, Replacement, Hip/statistics & numerical data , Arthroplasty, Replacement, Knee/statistics & numerical data , Blood Transfusion/statistics & numerical data , Adult , Aged , Aged, 80 and over , Checklist , Female , Hemoglobins/analysis , Hemorrhage , Humans , Male , Middle Aged , Perioperative Care , Postoperative Period , Retrospective Studies
6.
J Chem Inf Model ; 57(7): 1591-1598, 2017 07 24.
Article in English | MEDLINE | ID: mdl-28628322

ABSTRACT

Conformal prediction has been proposed as a more rigorous way to define prediction confidence than other application domain concepts previously used for QSAR modeling. One main advantage of such a method is that it provides a prediction region, potentially with multiple predicted labels, which contrasts with the single-valued (regression) or single-label (classification) output of standard QSAR modeling algorithms. Standard conformal prediction might not be suitable for imbalanced data sets. Therefore, Mondrian cross-conformal prediction (MCCP), which combines Mondrian inductive conformal prediction with cross-fold calibration sets, has been introduced. In this study, the MCCP method was applied to 18 publicly available data sets with imbalance levels ranging from 1:10 to 1:1000 (ratio of active to inactive compounds). Our results show that MCCP in general performed well on bioactivity data sets with various imbalance levels. More importantly, compared with standard machine learning methods, the method not only provides prediction confidence and prediction regions but also produces valid predictions for the minority class. In addition, a compound-similarity-based nonconformity measure was investigated. Our results demonstrate that although it gives valid predictions, its efficiency is much worse than that of model-dependent metrics.
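In Mondrian conformal prediction, a test compound's p-value for a class is computed only against calibration compounds of that class. This is what keeps the minority class valid. A cross-conformal variant averages those p-values over folds, with each fold having its own model and calibration split. The sketch assumes nonconformity scores (for example, 1 minus the model's predicted probability for the class) have already been computed per fold. Names and numbers are illustrative.

```python
def mondrian_p_value(cal_alphas, cal_labels, test_alpha, label):
    """Class-conditional p-value: compare only with calibration examples of `label`."""
    same_class = [a for a, y in zip(cal_alphas, cal_labels) if y == label]
    at_least_as_strange = sum(1 for a in same_class if a >= test_alpha)
    return (at_least_as_strange + 1) / (len(same_class) + 1)

def mccp_p_values(folds, labels=(0, 1)):
    """Average Mondrian p-values over cross-validation folds.

    Each fold is (cal_alphas, cal_labels, test_alphas), where test_alphas maps
    label -> nonconformity of the test compound under that fold's model.
    """
    return {
        label: sum(mondrian_p_value(ca, cl, ta[label], label) for ca, cl, ta in folds) / len(folds)
        for label in labels
    }

# Hypothetical nonconformity scores for one test compound from two folds.
folds = [
    ([0.1, 0.4, 0.2, 0.9, 0.3], [1, 0, 0, 1, 0], {0: 0.25, 1: 0.8}),
    ([0.3, 0.2, 0.6, 0.1, 0.5], [0, 1, 0, 1, 0], {0: 0.35, 1: 0.7}),
]
print(mccp_p_values(folds))
```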


Subjects
Informatics/methods , Quantitative Structure-Activity Relationship , Algorithms , Molecular Conformation
7.
Eat Weight Disord ; 21(4): 607-616, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27170194

ABSTRACT

PURPOSE: The main aim of this clinical study was to explore how adolescent patients with eating disorders and their parents report their perceived self-image, using the Structural Analysis of Social Behavior (SASB), before and after treatment in an intensive outpatient program. A further aim was to relate the self-image of the young patients to the outcome measures body mass index (BMI) and Children's Global Assessment Scale (C-GAS) score. METHODS: A total of 93 individuals (32 adolescents, 34 mothers, and 27 fathers) completed the SASB self-report questionnaire before and after family-based treatment combined with an individual approach at a child and youth psychiatry day care unit. The patients were also assessed using the C-GAS, and their BMI was calculated. RESULTS: The self-image (SASB) of the adolescent patients was negative before treatment and changed to positive after treatment, especially on the clusters self-love (higher) and self-blame (lower). The C-GAS score rose significantly, and the change in self-love correlated positively with the change in C-GAS score. Increased self-love was an important factor, explaining 26% of the variance. BMI also increased significantly, but without any correlation with the change in SASB. The patients' fathers scored low on the self-protection cluster. Mothers' profiles were in line with a non-clinical group. CONCLUSIONS: The results indicate that the self-image of adolescent patients changes from negative to positive alongside a mainly positive outcome of the eating disorder (ED) after treatment. Low self-protection according to SASB among fathers suggests the need for greater focus on their involvement.


Subjects
Body Image/psychology , Family Therapy , Feeding and Eating Disorders/psychology , Feeding and Eating Disorders/therapy , Self Concept , Adolescent , Child , Female , Humans , Male , Outpatients , Parents , Treatment Outcome
8.
J Chem Inf Model ; 55(1): 125-34, 2015 Jan 26.
Article in English | MEDLINE | ID: mdl-25406036

ABSTRACT

We consider gross, systematic, and random experimental errors in relation to their impact on the predictive ability of QSAR/QSPR DMPK models used in early drug discovery. Models whose training sets contained fewer but repeatedly measured data points, with a defined threshold for the random error, gave prediction improvements ranging from 3.3% to 23.0% on an external test set, compared with models built from training sets in which each molecule was defined by a single measurement. Similarly, models built on data with low experimental uncertainty gave prediction improvements ranging from 3.3% to 27.5% compared with models built on data with higher experimental uncertainty.
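One way to build a training set of "fewer but repeatedly measured" data points is to keep only compounds whose replicate measurements have a spread below a random-error threshold, and to use the replicate mean as the label. The minimal sketch below assumes the threshold is applied to the sample standard deviation. The threshold value, the minimum replicate count, and the data are hypothetical.

```python
from statistics import mean, stdev

def aggregate_replicates(measurements, max_std, min_replicates=2):
    """Keep compounds with repeated measurements whose spread is below a threshold.

    measurements: dict compound_id -> list of measured values (e.g. log units).
    Returns dict compound_id -> mean value for compounds that pass the filter.
    """
    kept = {}
    for compound, values in measurements.items():
        if len(values) < min_replicates:
            continue  # single measurements are excluded in this variant
        if stdev(values) <= max_std:
            kept[compound] = mean(values)
    return kept

# Hypothetical log-scale DMPK measurements with replicates.
data = {"A": [1.2, 1.3, 1.25], "B": [2.0], "C": [0.5, 1.5], "D": [3.1, 3.0]}
print(aggregate_replicates(data, max_std=0.2))
```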


Subjects
Pharmaceutical Preparations/metabolism , Quantitative Structure-Activity Relationship , Animals , Drug Discovery , Drug Evaluation, Preclinical/methods , Humans , Pharmacokinetics , Research Design
9.
J Chem Inf Model ; 55(1): 19-25, 2015 Jan 26.
Article in English | MEDLINE | ID: mdl-25493610

ABSTRACT

Growing data sets, with correspondingly longer analysis times, are hampering predictive modeling in drug discovery. Model building can be carried out on high-performance computing clusters, but these can be expensive to purchase and maintain. We have evaluated ligand-based modeling on cloud computing resources, where computations are parallelized and run on the Amazon Elastic Compute Cloud. We trained models on open data sets of varying sizes for the end points logP and Ames mutagenicity and compared them with model building parallelized on a traditional high-performance computing cluster. We show that while high-performance computing results in faster model building, the use of cloud computing resources is feasible for large data sets and scales well within cloud instances. An additional advantage of cloud computing is that the cost of building predictive models can be easily quantified, and a choice can be made between speed and economy. Easy access to computational resources with no up-front investment makes cloud computing an attractive alternative for scientists, especially those without access to a supercomputer, and our study shows that it enables cost-efficient modeling of large data sets on demand within reasonable time.


Subjects
Computational Biology/methods , Computing Methodologies , Databases, Chemical , Drug Discovery/methods , Quantitative Structure-Activity Relationship , Databases, Factual , Internet , Ligands , Software
10.
Regul Toxicol Pharmacol ; 71(2): 279-84, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25559551

ABSTRACT

Conformal prediction is presented as a framework that fulfills the OECD principles on (Q)SAR. It offers an intuitive extension to the application of machine learning methods to structure-activity data, where the focus is on predictions with predefined confidence levels. A conformal predictor will make correct predictions on new compounds at a rate corresponding to a user-defined confidence level. The confidence level can be altered depending on the situation in which the predictor is used, which allows flexibility and adaptation to the risks the user is willing to take. We demonstrate the usefulness of conformal prediction by applying it to two publicly available CAESAR binary classification data sets.
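Given one p-value per class, a conformal classifier outputs every class whose p-value exceeds the significance level 1 - confidence. The same model can therefore return an empty set, a single label, or both labels as the user-chosen confidence changes. The minimal sketch below uses invented p-values. Conventions differ on whether a p-value exactly equal to the significance level is included.

```python
def prediction_set(p_values, confidence):
    """Include every label whose p-value exceeds the significance level 1 - confidence."""
    significance = 1.0 - confidence
    return {label for label, p in p_values.items() if p > significance}

# Hypothetical p-values for one compound.
p = {"active": 0.40, "inactive": 0.08}
for conf in (0.50, 0.70, 0.95):
    print(conf, prediction_set(p, conf))
```

Raising the confidence from 0.50 to 0.95 moves this compound from an empty set, to a single label, to both labels: higher confidence costs specificity.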


Subjects
Databases, Factual , Drug and Narcotic Control/legislation & jurisprudence , Models, Theoretical , Molecular Conformation , Drug and Narcotic Control/methods , Forecasting , Quantitative Structure-Activity Relationship
11.
BMC Fam Pract ; 16: 21, 2015 Feb 21.
Article in English | MEDLINE | ID: mdl-25888369

ABSTRACT

BACKGROUND: Many physicians in Sweden, as in other countries, find the certification of sickness absence (COSA) particularly burdensome. Issuing COSAs has also been perceived as a work-environment problem among physicians. General practitioners (GPs) make up the highest proportion of physicians in Sweden who experience difficulties with COSA. Swedish authorities have launched several initiatives, through changes to the social security system, to improve the rehabilitation of people who are ill and to decrease the number of sick-leave days used. The aim of this study was to describe how GPs in Sweden perceive their work with COSA after these changes. METHODS: A descriptive design with a qualitative, inductive focus-group discussion (FGD) approach was used. RESULTS: Four categories emerged from the analysis of FGDs with GPs in Sweden: 1) Physicians' difficulties in their professional role; 2) Collaboration with other professionals facilitates the COSA; 3) Physicians' approach in relation to the patient; 4) An easier COSA process. CONCLUSIONS: Swedish GPs still perceived COSA as a burdensome task. However, system changes in recent years have facilitated COSA-related work. Cooperation with other professionals on COSA was perceived positively.


Subjects
Absenteeism , General Practitioners , Social Security , Attitude of Health Personnel , Documentation , Focus Groups , General Practitioners/psychology , Humans , Medical Records , Professional Role , Qualitative Research , Sick Leave/legislation & jurisprudence , Sick Leave/statistics & numerical data , Social Security/legislation & jurisprudence , Sweden
12.
Bioinformatics ; 29(2): 286-9, 2013 Jan 15.
Article in English | MEDLINE | ID: mdl-23178637

ABSTRACT

SUMMARY: Bioclipse, a graphical workbench for the life sciences, provides functionality for managing and visualizing life science data. We introduce Bioclipse-R, which integrates Bioclipse and the statistical programming language R. The synergy between Bioclipse and R is demonstrated by the construction of a decision support system for anticancer drug screening and mutagenicity prediction, which shows how Bioclipse-R can be used to perform complex tasks from within a single software system. AVAILABILITY AND IMPLEMENTATION: Bioclipse-R is implemented as a set of Java plug-ins for Bioclipse based on the R-package rj. Source code and binary packages are available from https://github.com/bioclipse and http://www.bioclipse.net/bioclipse-r, respectively. CONTACT: martin.eklund@farmbio.uu.se SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Biological Science Disciplines , Computer Graphics , Software , Antineoplastic Agents/chemistry , Antineoplastic Agents/pharmacology , Antineoplastic Agents/toxicity , Data Interpretation, Statistical , Mutagenesis , Programming Languages , Quantitative Structure-Activity Relationship , Systems Integration
13.
J Chem Inf Model ; 54(10): 2945-52, 2014 Oct 27.
Article in English | MEDLINE | ID: mdl-25275755

ABSTRACT

Structural alerts have been one of the backbones of computational toxicology and have applications in many areas, including cosmetic, environmental, and pharmaceutical toxicology. The development of structural alerts has always involved a manual analysis of existing data related to a relevant end point, followed by the determination of substructures that appear to be related to a specific outcome. The substructures are then analyzed for their utility in subsequent validation studies, which at times have stretched over years or even decades. With higher-throughput methods now being employed in many areas of toxicology, data sets are growing at an unprecedented rate. This growth has made manual analysis of data sets impractical in many cases. This report outlines a fully automatic method that highlights significant substructures in toxicologically important data sets. The method identifies important substructures by computationally breaking chemical structures into fragments and analyzing those fragments for their contribution to the given activity through the calculation of a p-value and a substructure accuracy. The method is intended to aid the expert in locating and analyzing alerts, either by retrieving alerts automatically or by enhancing existing alerts. The method has been applied to a data set of Ames mutagenicity results and compared with the substructures generated by manual curation of the same data set, as well as with another computationally based substructure identification method. The results show that this method can retrieve significant substructures quickly, that the substructures are comparable and in some cases superior to those derived from manual curation, that the substructures found cover all previously known substructures, and that they can be used to make reasonably accurate predictions of Ames activity.
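The abstract scores each fragment by a p-value and a substructure accuracy. The minimal sketch below assumes accuracy is the fraction of fragment-containing compounds that are active, and that the p-value comes from a one-sided binomial test of that active count against the data set's overall active rate. The paper's exact test may differ, and the counts are hypothetical.

```python
from math import comb

def fragment_statistics(n_with_fragment, n_active_with_fragment, base_active_rate):
    """Substructure accuracy and a one-sided binomial enrichment p-value.

    accuracy = fraction of fragment-containing compounds that are active;
    p-value  = P(X >= observed actives), X ~ Binomial(n_with_fragment, base_active_rate).
    """
    k, a, p = n_with_fragment, n_active_with_fragment, base_active_rate
    accuracy = a / k
    p_value = sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(a, k + 1))
    return accuracy, p_value

# Hypothetical fragment: present in 20 compounds, 18 of them Ames positive,
# in a data set where about half of all compounds are positive.
accuracy, p_value = fragment_statistics(20, 18, 0.5)
print(f"accuracy={accuracy:.2f}, p-value={p_value:.2e}")
```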


Subjects
Models, Chemical , Mutagens/chemistry , Small Molecule Libraries/chemistry , Animals , Computer Simulation , Datasets as Topic , Drug Design , Humans , Molecular Conformation , Mutagenicity Tests , Mutagens/toxicity , Predictive Value of Tests , Small Molecule Libraries/toxicity , Structure-Activity Relationship
14.
J Chem Inf Model ; 54(3): 837-43, 2014 Mar 24.
Article in English | MEDLINE | ID: mdl-24460242

ABSTRACT

Feature selection is an important part of contemporary QSAR analysis. In a recently published paper, we investigated the performance of different feature selection methods in a large number of in silico experiments conducted using real QSAR data sets. However, an interesting question that we did not address is whether certain feature selection methods are better than others in combination with certain learning methods, in terms of producing models with high prediction accuracy. In this report, we extend our previous investigation by using four different feature selection methods (wrapper, ReliefF, MARS, and elastic nets) together with eight learners (MARS, elastic net, random forest, SVM, neural networks, multiple linear regression, PLS, kNN) in an empirical investigation to address this question. The results indicate that state-of-the-art learners (random forest, SVM, and neural networks) do not gain prediction accuracy from feature selection, and we found no evidence that any particular feature selection method is especially well suited for use in combination with a particular learner.


Subjects
Algorithms , Quantitative Structure-Activity Relationship , Least-Squares Analysis , Linear Models , Neural Networks, Computer , Software , Support Vector Machine
15.
J Chem Inf Model ; 54(6): 1596-603, 2014 Jun 23.
Article in English | MEDLINE | ID: mdl-24797111

ABSTRACT

Conformal prediction is introduced as an alternative approach to domain applicability estimation. The advantages of using conformal prediction are as follows: First, the approach is based on a consistent and well-defined mathematical framework. Second, the understanding of the confidence level concept in conformal predictions is straightforward, e.g. a confidence level of 0.8 means that the conformal predictor will commit, at most, 20% errors (i.e., true values outside the assigned prediction range). Third, the confidence level can be varied depending on the situation where the model is to be applied and the consequences of such changes are readily understandable, i.e. prediction ranges are increased or decreased, and the changes can immediately be inspected. We demonstrate the usefulness of conformal prediction by applying it to 10 publicly available data sets.
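In the simplest inductive form, the prediction range at confidence 0.8 is the point prediction plus or minus the k-th smallest absolute calibration residual, with k = ceil((n + 1) × 0.8). The minimal sketch below implements that rule and adds a synthetic check that the observed error rate on new data lands near 20%. The "model" is a trivial constant predictor used purely for illustration.

```python
import math
import random

def icp_regression_intervals(cal_residuals, test_predictions, confidence=0.8):
    """Inductive conformal regression with the absolute residual as nonconformity."""
    ordered = sorted(abs(r) for r in cal_residuals)
    k = math.ceil((len(ordered) + 1) * confidence)
    half_width = ordered[k - 1] if k <= len(ordered) else math.inf
    return [(p - half_width, p + half_width) for p in test_predictions]

def error_rate(intervals, true_values):
    """Fraction of true values that fall outside their prediction range."""
    misses = sum(1 for (lo, hi), y in zip(intervals, true_values) if not lo <= y <= hi)
    return misses / len(true_values)

# Synthetic check: a model that always predicts 0 for targets with Gaussian noise.
rng = random.Random(0)
cal_residuals = [rng.gauss(0.0, 1.0) for _ in range(999)]
test_truth = [rng.gauss(0.0, 1.0) for _ in range(5000)]
intervals = icp_regression_intervals(cal_residuals, [0.0] * len(test_truth), confidence=0.8)
print(f"observed error rate: {error_rate(intervals, test_truth):.3f}")
```

Because calibration and test data here are exchangeable, the observed error rate should sit close to 1 - 0.8, with some sampling noise.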


Subjects
Computer Simulation , Models, Chemical , Quantitative Structure-Activity Relationship , Molecular Conformation , Regression Analysis
16.
J Chem Inf Model ; 54(11): 3211-7, 2014 Nov 24.
Article in English | MEDLINE | ID: mdl-25318024

ABSTRACT

QSAR modeling using molecular signatures and support vector machines with a radial basis function kernel is increasingly used for virtual screening in drug discovery. This method has three free parameters: C, γ, and signature height. C is a penalty parameter that limits overfitting, γ controls the width of the radial basis function kernel, and the signature height determines how much of the molecule is described by each atom signature. Determining optimal values for these parameters is time-consuming, so good default values could save considerable computational cost. The goal of this project was to investigate whether such default values could be found, using seven public QSAR data sets spanning a wide range of end points and both a bit version and a count version of the molecular signatures. On the basis of the experiments performed, we recommend signature heights 0 to 2 for the count version of the signature fingerprints and heights 0 to 3 for the bit version. These should be combined with a support vector machine using C in the range 1 to 100 and γ in the range 0.001 to 0.1. When data sets are small or longer run times are not a problem, there is reason to consider adding height 3 to the count fingerprint and using a wider grid search. However, marked improvements should not be expected.
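The recommended defaults can be written as a small search grid. The minimal sketch below assumes that "heights 0 to 2" means all signature heights from 0 through 2 are combined into one descriptor, so height is a fixed setting rather than a grid axis. It also assumes C and γ are sampled at one point per decade. The dictionary layout is hypothetical.

```python
from itertools import product

# Recommended ranges from the abstract; the one-point-per-decade sampling is an assumption.
SIGNATURE_HEIGHTS = {"count": (0, 2), "bit": (0, 3)}
C_VALUES = [1, 10, 100]
GAMMA_VALUES = [0.001, 0.01, 0.1]

def default_grid(fingerprint="count"):
    """List of hypothetical parameter settings for an RBF-kernel SVM on signatures."""
    lo, hi = SIGNATURE_HEIGHTS[fingerprint]
    heights = list(range(lo, hi + 1))
    return [{"heights": heights, "C": c, "gamma": g}
            for c, g in product(C_VALUES, GAMMA_VALUES)]

grid = default_grid("bit")
print(len(grid), grid[0])
```

Each dictionary's C and gamma map onto the parameters of the same name in, for example, scikit-learn's SVC with kernel='rbf'. The heights would be passed to whatever code generates the signature descriptors.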


Subjects
Drug Evaluation, Preclinical/methods , Support Vector Machine , Benchmarking , Quantitative Structure-Activity Relationship
17.
J Chem Inf Model ; 54(10): 2647-53, 2014 Oct 27.
Article in English | MEDLINE | ID: mdl-25230336

ABSTRACT

When evaluating a potential drug candidate, it is desirable to predict target interactions in silico prior to synthesis in order to assess, e.g., secondary pharmacology. This can be done by looking at the known target binding profiles of similar compounds using chemical similarity searching. The purpose of this study was to construct chemical fingerprints based on the molecular signature descriptor and to evaluate their performance for target binding prediction. For the comparison, we used the area under the receiver operating characteristic curve (AUC), complemented with net reclassification improvement (NRI). We created two open source signature fingerprints, a bit version and a count version, and evaluated their performance against a set of established fingerprints with regard to predicting binding targets using Tanimoto-based similarity searching on publicly available data sets extracted from ChEMBL. The results showed that the count version of the signature fingerprint performed on par with well-established fingerprints such as ECFP. The count version slightly outperformed the bit version; however, the count version is more complex and requires more computing time and memory, so its use should probably be evaluated on a case-by-case basis. The NRI-based tests complemented the AUC-based ones and showed signs of higher power.
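For the bit version, Tanimoto similarity is the size of the intersection over the size of the union of the two feature sets. For counts, a common generalization is a·b / (a·a + b·b - a·b), which reduces to the bit formula when all counts are 1. The minimal sketch below assumes that count formula and sparse {signature: count} dictionaries. The feature keys, targets, and top-k lookup are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity of sparse fingerprints given as {feature: count}.

    Uses a.b / (a.a + b.b - a.b); with all counts equal to 1 this is
    intersection size over union size.
    """
    dot = sum(count * b.get(feature, 0) for feature, count in a.items())
    denom = sum(c * c for c in a.values()) + sum(c * c for c in b.values()) - dot
    return dot / denom if denom else 0.0

def predict_targets(query, reference, top_k=3):
    """Targets of the top_k reference compounds most similar to the query."""
    ranked = sorted(reference, key=lambda ref: tanimoto(query, ref["fp"]), reverse=True)
    return [ref["target"] for ref in ranked[:top_k]]

# Hypothetical signature counts and target annotations.
reference = [
    {"fp": {"sig_C": 2, "sig_O": 2}, "target": "target_X"},
    {"fp": {"sig_N": 3}, "target": "target_Y"},
    {"fp": {"sig_C": 2, "sig_N": 1, "sig_O": 1}, "target": "target_Z"},
]
query = {"sig_C": 2, "sig_N": 1, "sig_O": 1}
print(predict_targets(query, reference, top_k=2))
```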


Subjects
Drug Design , Models, Chemical , Molecular Imprinting/methods , Software , Area Under Curve , Computer Simulation , Databases, Chemical , Ligands , Molecular Structure , ROC Curve
18.
J Chem Inf Model ; 54(4): 1117-28, 2014 Apr 28.
Article in English | MEDLINE | ID: mdl-24684732

ABSTRACT

In a recent study, we presented a novel quantitative-structure-activity-relationship (QSAR) approach, combining R-group signatures and nonlinear support-vector-machines (SVM), to build interpretable local models for congeneric compound sets. Here, we outline further refinements in the fingerprint scheme for the purpose of analyzing and visualizing structure-activity relationships (SAR). The concept of distance encoded R-group signature descriptors is introduced, and we explore the influence of different signature encoding schemes on both interpretability and predictive power of the SVM models using ten public data sets. The R-group and atomic gradients provide a way to interpret SVM models and enable detailed analysis of structure-activity relationships within substituent groups. We discuss applications of the method and show how it can be used to analyze nonadditive SAR and provide intuitive and powerful SAR visualizations.
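For an RBF-kernel SVM, the decision function is f(x) = Σ_i c_i exp(-γ‖x - x_i‖²) + b, where c_i are the signed dual coefficients. Its gradient is therefore ∇f(x) = Σ_i c_i exp(-γ‖x - x_i‖²) · (-2γ)(x - x_i). The minimal sketch below computes both, then sums gradient components over the descriptors belonging to each R-group. How the authors map gradients to atoms and R-groups is not given here, so the grouping is an assumption and the toy model is invented.

```python
import math

def rbf_decision_and_gradient(x, support_vectors, dual_coefs, intercept, gamma):
    """Value and gradient of f(x) = sum_i c_i * exp(-gamma * ||x - x_i||^2) + b."""
    value = intercept
    grad = [0.0] * len(x)
    for sv, coef in zip(support_vectors, dual_coefs):
        diff = [xj - sj for xj, sj in zip(x, sv)]
        kernel = math.exp(-gamma * sum(d * d for d in diff))
        value += coef * kernel
        for j, d in enumerate(diff):
            # d/dx_j of exp(-gamma * ||x - x_i||^2) = kernel * (-2 * gamma * d)
            grad[j] += coef * kernel * (-2.0 * gamma * d)
    return value, grad

def group_gradient(grad, feature_groups):
    """Sum gradient components over the descriptor indices of each (hypothetical) group."""
    return {group: sum(grad[j] for j in indices) for group, indices in feature_groups.items()}

# Toy model: two support vectors in a three-descriptor space.
svs = [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
coefs = [1.0, -1.0]
value, grad = rbf_decision_and_gradient([0.5, 0.0, 1.0], svs, coefs, intercept=0.1, gamma=0.5)
print(value, group_gradient(grad, {"R1": [0, 1], "R2": [2]}))
```

For a binary scikit-learn SVC with kernel='rbf', the inputs would correspond to support_vectors_, dual_coef_[0], intercept_[0] and the numeric gamma used for fitting.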


Subjects
Nonlinear Dynamics , Quantitative Structure-Activity Relationship , Support Vector Machine
19.
J Chem Inf Model ; 54(2): 431-41, 2014 Feb 24.
Article in English | MEDLINE | ID: mdl-24490838

ABSTRACT

The vastness of chemical space and the relatively small coverage by experimental data recording molecular properties require us to identify subspaces, or domains, for which we can confidently apply QSAR models. The prediction of QSAR models in these domains is reliable, and potential subsequent investigations of such compounds would find that the predictions closely match the experimental values. Standard approaches in QSAR assume that predictions are more reliable for compounds that are "similar" to those in subspaces with denser experimental data. Here, we report on a study of an alternative set of techniques recently proposed in the machine learning community. These methods quantify prediction confidence through estimation of the prediction error at the point of interest. Our study includes 20 public QSAR data sets with continuous response and assesses the quality of 10 reliability scoring methods by observing their correlation with prediction error. We show that these new alternative approaches can outperform standard reliability scores that rely only on similarity to compounds in the training set. The results also indicate that the quality of reliability scoring methods is sensitive to data set characteristics and to the regression method used in QSAR. We demonstrate that at the cost of increased computational complexity these dependencies can be leveraged by integration of scores from various reliability estimation approaches. The reliability estimation techniques described in this paper have been implemented in an open source add-on package ( https://bitbucket.org/biolab/orange-reliability ) to the Orange data mining suite.
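The study assesses each reliability score by how well it correlates with prediction error. The minimal sketch below uses Spearman rank correlation with average ranks for ties. The choice of Spearman, the example uncertainty scores, and the errors are assumptions. For an uncertainty-type score, a good method shows a strong positive correlation with absolute error; for a confidence-type score, a strong negative one.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank for the tie block order[i..j]
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical reliability (uncertainty) scores and absolute prediction errors.
uncertainty = [0.1, 0.4, 0.2, 0.8, 0.3]
abs_error = [0.05, 0.5, 0.1, 0.9, 0.45]
print(spearman(uncertainty, abs_error))
```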


Subjects
Artificial Intelligence , Drug Discovery/methods , Quantitative Structure-Activity Relationship , Algorithms , Regression Analysis , Time Factors
20.
J Cheminform ; 16(1): 75, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38943219

ABSTRACT

Conformal prediction has seen many applications in pharmaceutical science, as it can calibrate the outputs of machine learning models and produce valid prediction intervals. We here present the open source software CPSign, a complete implementation of conformal prediction for cheminformatics modeling. CPSign implements inductive and transductive conformal prediction for classification and regression, and probabilistic prediction with the Venn-ABERS methodology. The main chemical representation is signatures, but other types of descriptors are also supported. The main modeling methodology is support vector machines (SVMs), but additional modeling methods are supported via an extension mechanism, e.g. DeepLearning4J models. We also describe features for visualizing results from conformal models, including calibration and efficiency plots, as well as features for publishing predictive models as REST services. We compare CPSign against other common cheminformatics modeling approaches, including random forest and a directed message-passing neural network. The results show that CPSign produces robust predictive performance with comparable predictive efficiency, and with superior runtime and lower hardware requirements than neural-network-based models. CPSign has been used in several studies and is in production use in multiple organizations. CPSign can work directly with chemical input files and perform descriptor calculation and SVM modeling in the conformal prediction framework, all within a single software package with a low footprint and fast execution time. This makes it a convenient yet flexible package for training, deploying, and predicting on chemical data. CPSign can be downloaded from GitHub (https://github.com/arosbio/cpsign).

Scientific contribution: CPSign provides a single software package that allows users to perform data preprocessing and modeling and to make predictions directly on chemical structures, using conformal and probabilistic prediction. Building and evaluating new models can be achieved at a high abstraction level without sacrificing flexibility or predictive performance. This is showcased by a method evaluation against contemporary modeling approaches, in which CPSign performs on par with a state-of-the-art deep-learning-based model.
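Calibration and efficiency plots are typically built by sweeping the significance level. Calibration compares the observed error rate with the significance level, and efficiency tracks how large the prediction sets are. The minimal, library-independent sketch below assumes per-class p-values are already available. It does not use the CPSign API, and the p-values are invented.

```python
def calibration_and_efficiency(p_values, true_labels, significance_levels):
    """For each significance level: (level, observed error rate, mean prediction-set size)."""
    rows = []
    for eps in significance_levels:
        sets = [{label for label, p in pv.items() if p > eps} for pv in p_values]
        errors = sum(1 for s, y in zip(sets, true_labels) if y not in s)
        mean_size = sum(len(s) for s in sets) / len(sets)
        rows.append((eps, errors / len(sets), mean_size))
    return rows

# Hypothetical per-class p-values for four test compounds.
p_values = [{"A": 0.6, "B": 0.05}, {"A": 0.2, "B": 0.7},
            {"A": 0.3, "B": 0.15}, {"A": 0.04, "B": 0.5}]
truth = ["A", "B", "B", "B"]
for eps, err, size in calibration_and_efficiency(p_values, truth, [0.1, 0.25]):
    print(f"significance={eps:.2f} error_rate={err:.2f} mean_set_size={size:.2f}")
```

Plotting the error rate against the significance level gives the calibration curve: a well-calibrated model stays on or below the diagonal. The mean set size gives the efficiency curve, where smaller is better.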
