1.
J Chem Inf Model ; 64(9): 3670-3688, 2024 May 13.
Article in English | MEDLINE | ID: mdl-38686880

ABSTRACT

Neural network models have become a popular machine-learning technique for the toxicity prediction of chemicals. However, due to their complex structure, it is difficult to understand the predictions made by these models, which limits confidence in them. Current techniques to tackle this problem, such as SHAP or integrated gradients, provide insights by attributing importance to the input features of individual compounds. While these methods have produced promising results in some cases, they do not shed light on how representations of compounds are transformed in hidden layers, which constitute how neural networks learn. We present a novel technique to interpret neural networks which identifies chemical substructures in training data found to be responsible for the activation of hidden neurons. For individual test compounds, the importance of hidden neurons is determined, and the associated substructures are leveraged to explain the model prediction. Using structural alerts for mutagenicity from the Derek Nexus expert system as ground truth, we demonstrate the validity of the approach and show that model explanations are competitive with and complementary to explanations obtained from an established feature attribution method.


Subjects
Neural Networks, Computer , Machine Learning
2.
J Comput Aided Mol Des ; 34(7): 783-803, 2020 07.
Article in English | MEDLINE | ID: mdl-32112286

ABSTRACT

Reaction-based de novo design refers to the in-silico generation of novel chemical structures by combining reagents using structural transformations derived from known reactions. The driver for using reaction-based transformations is to increase the likelihood of the designed molecules being synthetically accessible. We have previously described a reaction-based de novo design method based on reaction vectors which are transformation rules that are encoded automatically from reaction databases. A limitation of reaction vectors is that they account for structural changes that occur at the core of a reaction only, and they do not consider the presence of competing functionalities that can compromise the reaction outcome. Here, we present the development of a Reaction Class Recommender to enhance the reaction vector framework. The recommender is intended to be used as a filter on the reaction vectors that are applied during de novo design to reduce the combinatorial explosion of in-silico molecules produced while limiting the generated structures to those which are most likely to be synthesisable. The recommender has been validated using an external data set extracted from the recent medicinal chemistry literature and in two simulated de novo design experiments. Results suggest that the use of the recommender drastically reduces the number of solutions explored by the algorithm while preserving the chance of finding relevant solutions and increasing the global synthetic accessibility of the designed molecules.


Subjects
Drug Design , Algorithms , Chemistry Techniques, Synthetic/methods , Chemistry Techniques, Synthetic/statistics & numerical data , Chemistry, Pharmaceutical/methods , Chemistry, Pharmaceutical/statistics & numerical data , Computer Simulation , Computer-Aided Design , Databases, Chemical , Databases, Pharmaceutical , Humans , Machine Learning , Small Molecule Libraries
3.
J Chem Inf Model ; 59(1): 98-116, 2019 01 28.
Article in English | MEDLINE | ID: mdl-30462505

ABSTRACT

A framework is presented for the calculation of novel alignment-free descriptors of molecular shape. The methods are based on the technique of spectral geometry which has been developed in the field of computer vision where it has shown impressive performance for the comparison of deformable objects such as people and animals. Spectral geometry techniques encode shape by capturing the curvature of the surface of an object into a compact, information-rich representation that is alignment-free while also being invariant to isometric deformations, that is, changes that do not distort distances over the surface. Here, we adapt the technique to the new domain of molecular shape representation. We describe a series of parametrization steps aimed at optimizing the method for this new domain. Our focus here is on demonstrating that the basic approach is able to capture a molecular shape into a compact and information-rich descriptor. We demonstrate improved performance in virtual screening over a more established alignment-free method and impressive performance compared to a more accurate, but much more computationally demanding, alignment-based approach.


Subjects
Image Processing, Computer-Assisted , Molecular Structure , Algorithms , Computer Simulation , Databases, Chemical , Models, Molecular
4.
J Chem Inf Model ; 59(10): 4167-4187, 2019 10 28.
Article in English | MEDLINE | ID: mdl-31529948

ABSTRACT

Reaction classification has often been considered an important task for many different applications, and has traditionally been accomplished using hand-coded rule-based approaches. However, the availability of large collections of reactions enables data-driven approaches to be developed. We present the development and validation of a 336-class machine learning-based classification model integrated within a Conformal Prediction (CP) framework to associate reaction class predictions with confidence estimations. We also propose a data-driven approach for "dynamic" reaction fingerprinting to maximize the effectiveness of reaction encoding, as well as developing a novel reaction classification system that organizes labels into four hierarchical levels (SHREC: Sheffield Hierarchical REaction Classification). We show that the performance of the CP augmented model can be improved by defining confidence thresholds to detect predictions that are less likely to be false. For example, the external validation of the model reports 95% of predictions as correct by filtering out less than 15% of the uncertain classifications. The application of the model is demonstrated by classifying two reaction data sets: one extracted from an industrial ELN and the other from the medicinal chemistry literature. We show how confidence estimations and class compositions across different levels of information can be used to gain immediate insights on the nature of reaction collections and hidden relationships between reaction classes.


Subjects
Chemistry, Pharmaceutical , Databases, Chemical , Machine Learning , Models, Chemical , Molecular Structure
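The confidence filtering described in entry 4 can be illustrated with a minimal inductive conformal prediction sketch. This is not the authors' implementation: it assumes a nonconformity score of 1 minus the predicted class probability, and the calibration scores, class names, and threshold are hypothetical.

```python
def p_value(calibration_scores, test_score):
    """Share of calibration nonconformity scores at least as large as the test score."""
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def conformal_predict(class_probs, calibration_scores):
    """Return (label, confidence, credibility) for one reaction.

    Assumption: nonconformity = 1 - predicted probability of the class.
    Confidence is 1 minus the second-largest p-value; credibility is the largest.
    """
    p_values = {
        label: p_value(calibration_scores, 1.0 - prob)
        for label, prob in class_probs.items()
    }
    ranked = sorted(p_values.items(), key=lambda item: item[1], reverse=True)
    best_label, best_p = ranked[0]
    second_p = ranked[1][1] if len(ranked) > 1 else 0.0
    return best_label, 1.0 - second_p, best_p


def keep_confident(predictions, min_confidence):
    """Discard predictions whose confidence falls below the chosen threshold."""
    return [p for p in predictions if p[1] >= min_confidence]


# Hypothetical calibration scores and class probabilities, for illustration only.
calibration = [0.05, 0.15, 0.2, 0.3, 0.6]
probs = {"amide coupling": 0.9, "esterification": 0.08, "reduction": 0.02}
prediction = conformal_predict(probs, calibration)
print(prediction[0], keep_confident([prediction], min_confidence=0.8))
```

Raising `min_confidence` trades coverage for accuracy, which is the kind of trade-off the abstract reports for its external validation.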
5.
J Chem Inf Model ; 55(9): 1781-803, 2015 Sep 28.
Article in English | MEDLINE | ID: mdl-26237649

ABSTRACT

Knowledge Discovery in Databases (KDD) refers to the use of methodologies from machine learning, pattern recognition, statistics, and other fields to extract knowledge from large collections of data, where the knowledge is not explicitly available as part of the database structure. In this paper, we describe four modern data mining techniques, Rough Set Theory (RST), Association Rule Mining (ARM), Emerging Pattern Mining (EP), and Formal Concept Analysis (FCA), and we have attempted to give an exhaustive list of their chemoinformatics applications. One of the main strengths of these methods is their descriptive ability. When used to derive rules, for example, in structure-activity relationships, the rules have clear physical meaning. This review has shown that there are close relationships between the methods. Often apparent differences lie in the way in which the problem under investigation has been formulated, which can lead to the natural adoption of one or other method. For example, the idea of a structural alert, as a structure which is present in toxic and absent in nontoxic compounds, leads to the natural formulation of an Emerging Pattern search. Despite the similarities between the methods, each has its strengths. RST is useful for dealing with uncertain and noisy data. Its main chemoinformatics applications so far have been in feature extraction and feature reduction, the latter often as input to another data mining method, such as a Support Vector Machine (SVM). ARM has mostly been used for frequent subgraph mining. EP and FCA have both been used to mine both structural and nonstructural patterns for classification of both active and inactive molecules. Since their introduction in the 1980s and 1990s, RST, ARM, EP, and FCA have found wide-ranging applications, with many thousands of citations in Web of Science, but their adoption by the chemoinformatics community has been relatively slow. Advances, both in computer power and in algorithm development, mean that there is the potential to apply these techniques to larger data sets and thus to different problems in the future.


Subjects
Algorithms , Informatics , Molecular Structure , Pharmacological Phenomena , Structure-Activity Relationship
6.
J Chem Inf Model ; 54(12): 3302-19, 2014 Dec 22.
Article in English | MEDLINE | ID: mdl-25379955

ABSTRACT

Spectral clustering involves placing objects into clusters based on the eigenvectors and eigenvalues of an associated matrix. The technique was first applied to molecular data by Brewer [J. Chem. Inf. Model. 2007, 47, 1727-1733] who demonstrated its use on a very small dataset of 125 COX-2 inhibitors. We have determined suitable parameters for spectral clustering using a wide variety of molecular descriptors and several datasets of a few thousand compounds and compared the results of clustering using a nonoverlapping version of Brewer's use of Sarker and Boyer's algorithm with that of Ward's and k-means clustering. We then replaced the exact eigendecomposition method with two different approximate methods and concluded that Singular Value Decomposition is the most appropriate method for clustering larger compound collections of up to 100,000 compounds. We have also used spectral clustering with the Tversky coefficient to generate two sets of clusters linked by a common set of eigenvalues and have used this novel approach to cluster sets of fragments such as those used in fragment-based drug design.


Subjects
Algorithms , Statistics as Topic/methods , Cluster Analysis , Cyclooxygenase 2 Inhibitors/pharmacology , Drug Discovery
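Entry 6 mentions the Tversky coefficient, which can be sketched as below for fingerprints stored as sets of on-bit indices. The fingerprints and weights here are hypothetical; the abstract does not say which weights or descriptors the authors used.

```python
def tversky(a, b, alpha, beta):
    """Tversky coefficient between two fingerprints given as sets of on-bits.

    alpha weights features found only in a, beta those found only in b.
    alpha = beta = 1 gives the Tanimoto coefficient; alpha = beta = 0.5 gives Dice.
    """
    a, b = set(a), set(b)
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    return common / denominator if denominator else 0.0


# Hypothetical on-bit sets for two fragments.
fragment_a = {1, 2, 3, 4}
fragment_b = {3, 4, 5}

print(tversky(fragment_a, fragment_b, 1.0, 1.0))  # symmetric (Tanimoto) case
print(tversky(fragment_a, fragment_b, 1.0, 0.0))  # share of a's bits found in b
print(tversky(fragment_a, fragment_b, 0.0, 1.0))  # share of b's bits found in a
```

With unequal weights the coefficient is asymmetric, so the resulting similarity matrix is generally not symmetric either.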
7.
J Chem Inf Model ; 54(7): 1864-79, 2014 Jul 28.
Article in English | MEDLINE | ID: mdl-24873983

ABSTRACT

Knowledge-based systems for toxicity prediction are typically based on rules, known as structural alerts, that describe relationships between structural features and different toxic effects. The identification of structural features associated with toxicological activity can be a time-consuming process and often requires significant input from domain experts. Here, we describe an emerging pattern mining method for the automated identification of activating structural features in toxicity data sets that is designed to help expedite the process of alert development. We apply the contrast pattern tree mining algorithm to generate a set of emerging patterns of structural fragment descriptors. Using the emerging patterns it is possible to form hierarchical clusters of compounds that are defined by the presence of common structural features and represent distinct chemical classes. The method has been tested on a large public in vitro mutagenicity data set and a public hERG channel inhibition data set and is shown to be effective at identifying common toxic features and recognizable classes of toxicants. We also describe how knowledge developers can use emerging patterns to improve the specificity and sensitivity of an existing expert system.


Subjects
Data Mining/methods , Toxicology , Algorithms , Endpoint Determination , Ether-A-Go-Go Potassium Channels/antagonists & inhibitors , Mutagenicity Tests , Potassium Channel Blockers/toxicity
8.
Mol Inform ; 43(4): e202300183, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38258328

ABSTRACT

De novo design has been a hotly pursued topic for many years. Most recent developments have involved the use of deep learning methods for generative molecular design. Despite increasing levels of algorithmic sophistication, the design of molecules that are synthetically accessible remains a major challenge. Reaction-based de novo design takes a conceptually simpler approach and aims to address synthesisability directly by mimicking synthetic chemistry and driving structural transformations by known reactions that are applied in a stepwise manner. However, the use of a small number of hand-coded transformations restricts the chemical space that can be accessed and there are few examples in the literature where molecules and their synthetic routes have been designed and executed successfully. Here we describe the application of reaction-based de novo design to the design of synthetically accessible and biologically active compounds as proof-of-concept of our reaction vector-based software. Reaction vectors are derived automatically from known reactions and allow access to a wide region of synthetically accessible chemical space. The design was aimed at producing molecules that are active against PARP1 and which have improved brain penetration properties compared to existing PARP1 inhibitors. We synthesised a selection of the designed molecules according to the provided synthetic routes and tested them experimentally. The results demonstrate that reaction vectors can be applied to the design of novel molecules of biological relevance that are also synthetically accessible.


Subjects
Drug Design , Poly(ADP-ribose) Polymerase Inhibitors , Poly(ADP-ribose) Polymerase Inhibitors/chemistry , Poly(ADP-ribose) Polymerase Inhibitors/pharmacology , Poly(ADP-ribose) Polymerase Inhibitors/chemical synthesis , Humans , Poly (ADP-Ribose) Polymerase-1/antagonists & inhibitors , Poly (ADP-Ribose) Polymerase-1/metabolism , Software
9.
J Chem Inf Model ; 52(3): 757-69, 2012 Mar 26.
Article in English | MEDLINE | ID: mdl-22324299

ABSTRACT

Molecular interaction fields provide a useful description of ligand binding propensity and have found widespread use in computer-aided drug design, for example, to characterize protein binding sites and in small molecular applications, such as three-dimensional quantitative structure-activity relationships, physicochemical property prediction, and virtual screening. However, the grids on which the field data are stored are typically very large, consisting of thousands of data points, which make them cumbersome to store and manipulate. The wavelet transform is a commonly used data compression technique, for example, in signal processing and image compression. Here we use the wavelet transform to encode molecular interaction fields as wavelet thumbnails, which represent the original grid data in significantly reduced volumes. We describe a method for aligning wavelet thumbnails based on extracting extrema from the thumbnails and subsequently use them for virtual screening. We demonstrate that wavelet thumbnails provide an effective method of capturing the three-dimensional information encoded in a molecular interaction field.


Subjects
Drug Evaluation, Preclinical/methods , Models, Molecular , Area Under Curve , Ligands , Molecular Conformation , Proteins/metabolism , User-Computer Interface
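The compression idea in entry 9 can be illustrated with a one-dimensional Haar wavelet transform, in which repeated averaging steps shrink the data to a coarse "thumbnail". The abstract does not name the wavelet family or the grid handling, so the Haar choice, the 1D signal, and the function names are assumptions made only for this sketch.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length sequence."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = 1 / math.sqrt(2)
    approximation = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approximation, detail


def thumbnail(signal, levels):
    """Keep only the approximation coefficients after the given number of levels."""
    approximation = list(signal)
    for _ in range(levels):
        approximation, _detail = haar_step(approximation)
    return approximation


# Hypothetical field values sampled along one grid axis.
field_line = [4.0, 4.0, 2.0, 2.0, 0.0, 0.0, 6.0, 6.0]
print(thumbnail(field_line, levels=2))  # two coefficients instead of eight
```

Each level halves the number of stored values; a three-dimensional grid would apply the same step along each axis in turn.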
10.
J Chem Inf Model ; 52(11): 3074-87, 2012 Nov 26.
Article in English | MEDLINE | ID: mdl-23092382

ABSTRACT

The design of new alerts, that is, collections of structural features observed to result in toxicological activity, can be a slow process and may require significant input from toxicology and chemistry experts. A method has therefore been developed to help automate alert identification by mining descriptions of activating structural features directly from toxicity data sets. The method is based on jumping emerging pattern mining which is applied to a set of toxic and nontoxic compounds that are represented using atom pair descriptors. Using the resulting jumping emerging patterns, it is possible to cluster toxic compounds into groups defined by the presence of shared structural features and to arrange the clusters into hierarchies. The methodology has been tested on a number of data sets for Ames mutagenicity, oestrogenicity, and hERG channel inhibition end points. These tests have shown the method to be effective at clustering the data sets around minimal jumping-emerging structural patterns and finding descriptions of potentially activating structural features. Furthermore, the mined structural features have been shown to be related to some of the known alerts for all three tested end points.


Subjects
Data Mining/methods , Estrogens/chemistry , Mutagens/chemistry , Pattern Recognition, Automated/methods , Cluster Analysis , Estrogens/toxicity , Ether-A-Go-Go Potassium Channels/antagonists & inhibitors , Humans , Mutagens/toxicity
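A jumping emerging pattern, as used in entry 10, is a feature combination that occurs in at least one toxic compound and in no nontoxic compound. The brute-force sketch below finds the minimal ones. It stands in for the authors' mining algorithm, which the abstract does not detail, and uses hypothetical feature names rather than atom pair descriptors.

```python
from itertools import combinations


def minimal_jumping_emerging_patterns(toxic, nontoxic, max_size=3):
    """Enumerate minimal feature sets present in toxic but absent from nontoxic compounds."""
    toxic = [frozenset(c) for c in toxic]
    nontoxic = [frozenset(c) for c in nontoxic]
    features = sorted(set().union(*toxic))
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(features, size):
            pattern = frozenset(combo)
            # Skip supersets of a pattern already found: they are not minimal.
            if any(smaller <= pattern for smaller in found):
                continue
            in_toxic = any(pattern <= compound for compound in toxic)
            in_nontoxic = any(pattern <= compound for compound in nontoxic)
            if in_toxic and not in_nontoxic:
                found.append(pattern)
    return found


# Hypothetical compounds described by sets of structural features.
toxic_set = [{"nitro", "aromatic"}, {"aromatic", "amine"}]
nontoxic_set = [{"aromatic", "hydroxyl"}, {"amine", "hydroxyl"}]

patterns = minimal_jumping_emerging_patterns(toxic_set, nontoxic_set)
for pattern in patterns:
    members = [i for i, c in enumerate(toxic_set) if pattern <= c]
    print(sorted(pattern), "covers toxic compounds", members)
```

Grouping toxic compounds by the pattern they contain, as in the final loop, is the starting point for the clustering the abstract describes.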
11.
J Comput Aided Mol Des ; 26(4): 451-72, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22538643

ABSTRACT

A program for overlaying multiple flexible molecules has been developed. Candidate overlays are generated by a novel fingerprint algorithm, scored on three objective functions (union volume, hydrogen-bond match, and hydrophobic match), and ranked by constrained Pareto ranking. A diverse subset of the best ranked solutions is chosen using an overlay-dissimilarity metric. If necessary, the solutions can be optimised. A multi-objective genetic algorithm can be used to find additional overlays with a given mapping of chemical features but different ligand conformations. The fingerprint algorithm may also be used to produce constrained overlays, in which user-specified chemical groups are forced to be superimposed. The program has been tested on several sets of ligands, for each of which the true overlay is known from protein-ligand crystal structures. Both objective and subjective success criteria indicate that good results are obtained on the majority of these sets.


Subjects
Algorithms , Molecular Structure
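The Pareto ranking step mentioned in entry 11 can be sketched as follows. This is plain, unconstrained Pareto ranking, not the constrained variant the abstract refers to. It assumes every objective is minimised, so the two match scores are negated, and all score values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_rank(scores):
    """Rank each solution by how many others dominate it (0 means non-dominated)."""
    return [sum(dominates(other, s) for other in scores) for s in scores]


# Hypothetical overlays scored as (union volume, -hydrogen-bond match, -hydrophobic match).
overlays = [
    (100.0, -5.0, -3.0),
    (120.0, -5.0, -3.0),
    (110.0, -6.0, -2.0),
    (130.0, -4.0, -1.0),
]
print(pareto_rank(overlays))
```

Overlays with rank 0 form the non-dominated front, from which a diverse subset could then be picked.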
12.
J Cheminform ; 14(1): 32, 2022 Jun 07.
Article in English | MEDLINE | ID: mdl-35672779

ABSTRACT

Recently, imputation techniques have been adapted to predict activity values among sparse bioactivity matrices, showing improvements in predictive performance over traditional QSAR models. These models are able to use experimental activity values for auxiliary assays when predicting the activity of a test compound on a specific assay. In this study, we tested three different multi-task imputation techniques on three classification-based toxicity datasets: two of small scale (12 assays each) and one large scale with 417 assays. Moreover, we analyzed in detail the improvements shown by the imputation models. We found that test compounds that were dissimilar to training compounds, as well as test compounds with a large number of experimental values for other assays, showed the largest improvements. We also investigated the impact of sparsity on the improvements seen as well as the relatedness of the assays being considered. Our results show that even a small amount of additional information can provide imputation methods with a strong boost in predictive performance over traditional single task and multi-task predictive models.

13.
Mol Inform ; 41(4): e2100207, 2022 04.
Article in English | MEDLINE | ID: mdl-34750989

ABSTRACT

Reaction-based de novo design refers to the generation of synthetically accessible molecules using transformation rules extracted from known reactions in the literature. In this context, we have previously described the extraction of reaction vectors from a reactions database and their coupling with a structure generation algorithm for the generation of novel molecules from a starting material. An issue when designing molecules from a starting material is the combinatorial explosion of possible product molecules that can be generated, especially for multistep syntheses. Here, we present the development of RENATE, a reaction-based de novo design tool, which is based on a pseudo-retrosynthetic fragmentation of a reference ligand and an inside-out approach to de novo design. The reference ligand is fragmented; each fragment is used to search for similar fragments as building blocks; the building blocks are combined into products using reaction vectors; and a synthetic route is suggested for each product molecule. The RENATE methodology is presented followed by a retrospective validation to recreate a set of approved drugs. Results show that RENATE can generate very similar or even identical structures to the corresponding input drugs, hence validating the fragmentation, search, and design heuristics implemented in the tool.


Subjects
Algorithms , Ligands , Retrospective Studies
14.
Chem Sci ; 12(10): 3768-3785, 2021 Feb 01.
Article in English | MEDLINE | ID: mdl-34163650

ABSTRACT

Amyloid β oligomers (Aβo) are the main toxic species in Alzheimer's disease, which have been targeted for single drug treatment with very little success. In this work we report a new approach for identifying functional Aβo binding compounds. A tailored library of 971 fluorine containing compounds was selected by a computational method, developed to generate molecular diversity. These compounds were screened for Aβo binding by a combined 19F and STD NMR technique. Six hits were evaluated in three parallel biochemical and functional assays. Two compounds disrupted Aβo binding to its receptor PrPC in HEK293 cells. They reduced the pFyn levels triggered by Aβo treatment in neuroprogenitor cells derived from human induced pluripotent stem cells (hiPSC). Inhibitory effects on pTau production in cortical neurons derived from hiPSC were also observed. These drug-like compounds connect three of the pillars in Alzheimer's disease pathology, i.e. prion, Aβ and Tau, affecting three different pathways through specific binding to Aβo and are, indeed, promising candidates for further development.

15.
J Chem Inf Model ; 50(10): 1872-86, 2010 Oct 25.
Article in English | MEDLINE | ID: mdl-20873842

ABSTRACT

Previous studies of the analysis of molecular matched pairs (MMPs) have often assumed that the effect of a substructural transformation on a molecular property is independent of the context (i.e., the local structural environment in which that transformation occurs). Experiments with large sets of hERG, solubility, and lipophilicity data demonstrate that the inclusion of contextual information can enhance the predictive power of MMP analyses, with significant trends (both positive and negative) being identified that are not apparent when using conventional, context-independent approaches.


Subjects
Drug Design , Ether-A-Go-Go Potassium Channels/antagonists & inhibitors , Ether-A-Go-Go Potassium Channels/metabolism , Algorithms , Databases, Factual , Ether-A-Go-Go Potassium Channels/chemistry , Humans , Ligands , Lipids/chemistry , Molecular Structure , Solubility
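The point made in entry 15 is that pooling all instances of a transformation can hide trends that appear once the local context is taken into account. It can be shown with a small grouping sketch. The transformation label, context labels, and property changes below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_change(pairs, use_context):
    """Average property change per transformation, optionally split by context."""
    groups = defaultdict(list)
    for transformation, context, delta in pairs:
        key = (transformation, context) if use_context else (transformation,)
        groups[key].append(delta)
    return {key: mean(deltas) for key, deltas in groups.items()}


# Hypothetical matched pairs: (transformation, local context, change in property).
matched_pairs = [
    ("H>>F", "aromatic", 0.4),
    ("H>>F", "aromatic", 0.6),
    ("H>>F", "aliphatic", -0.5),
    ("H>>F", "aliphatic", -0.5),
]

print(mean_change(matched_pairs, use_context=False))  # pooled: effects cancel out
print(mean_change(matched_pairs, use_context=True))   # split: opposite trends appear
```

In this toy data the pooled mean is close to zero, while the context-aware grouping reveals one positive and one negative trend.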
16.
Curr Opin Chem Biol ; 12(3): 372-8, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18331851

ABSTRACT

The high costs associated with high-throughput screening (HTS), coupled with the limited coverage and bias of current screening collections, are such that diversity analysis continues to be an important criterion in lead generation. Whereas early approaches to diversity analysis were based on traditional descriptors such as two-dimensional fingerprints, a recent emphasis has been on assessing scaffold coverage to ensure that a variety of different chemotypes are represented. Moreover, whether designing diverse or focused libraries, it is widely recognised that designs should aim to achieve a balance in a number of different properties, and multiobjective optimisation provides an effective way of achieving such designs.


Subjects
Drug Design , Drug Evaluation, Preclinical/methods , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology , Combinatorial Chemistry Techniques , Reproducibility of Results
17.
J Chem Inf Model ; 49(12): 2761-73, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19908873

ABSTRACT

Two methods are described for biasing conformational search during pharmacophore elucidation using a multiobjective genetic algorithm (MOGA). The MOGA explores conformation on-the-fly while simultaneously aligning a set of molecules such that their pharmacophoric features are maximally overlaid. By using a clique detection method to generate overlays of precomputed conformations to initialize the population (rather than starting from random), the speed of the algorithm has been increased by 2 orders of magnitude. This increase in speed has enabled the program to be applied to greater numbers of molecules than was previously possible. Furthermore, it was found that biasing the conformations explored during search time to those found in the Cambridge Structural Database could also improve the quality of the results.


Subjects
Algorithms , Drug Discovery/methods , Molecular Conformation , Cyclin-Dependent Kinase 2/antagonists & inhibitors , Cyclin-Dependent Kinase 2/chemistry , Cyclin-Dependent Kinase 2/genetics , Databases, Factual , Enzyme Inhibitors/chemistry , Enzyme Inhibitors/pharmacology , Humans , Ligands , Models, Molecular , Mutation , Tetrahydrofolate Dehydrogenase/chemistry , Tetrahydrofolate Dehydrogenase/genetics , Thermodynamics , Thrombin/antagonists & inhibitors , Thrombin/chemistry , Thrombin/genetics
18.
Elife ; 8, 2019 06 10.
Article in English | MEDLINE | ID: mdl-31180326

ABSTRACT

Adgrg6 (Gpr126) is an adhesion class G protein-coupled receptor with a conserved role in myelination of the peripheral nervous system. In the zebrafish, mutation of adgrg6 also results in defects in the inner ear: otic tissue fails to down-regulate versican gene expression and morphogenesis is disrupted. We have designed a whole-animal screen that tests for rescue of both up- and down-regulated gene expression in mutant embryos, together with analysis of weak and strong alleles. From a screen of 3120 structurally diverse compounds, we have identified 68 that reduce versican b expression in the adgrg6 mutant ear, 41 of which also restore myelin basic protein gene expression in Schwann cells of mutant embryos. Nineteen compounds unable to rescue a strong adgrg6 allele provide candidates for molecules that may interact directly with the Adgrg6 receptor. Our pipeline provides a powerful approach for identifying compounds that modulate GPCR activity, with potential impact for future drug design.


Subjects
Ear, Inner/metabolism , Myelin Sheath/metabolism , Peripheral Nervous System/metabolism , Receptors, G-Protein-Coupled/metabolism , Zebrafish Proteins/metabolism , Animals , Ear, Inner/drug effects , Ear, Inner/embryology , Embryo, Nonmammalian/drug effects , Embryo, Nonmammalian/embryology , Embryo, Nonmammalian/metabolism , Gene Expression Regulation, Developmental/drug effects , Molecular Structure , Mutation , Myelin Sheath/drug effects , Peripheral Nervous System/drug effects , Proteoglycans/genetics , Proteoglycans/metabolism , Receptors, G-Protein-Coupled/genetics , Schwann Cells/drug effects , Schwann Cells/metabolism , Signal Transduction/drug effects , Signal Transduction/genetics , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology , Zebrafish , Zebrafish Proteins/genetics
19.
ChemMedChem ; 13(6): 607-613, 2018 03 20.
Article in English | MEDLINE | ID: mdl-29314719

ABSTRACT

Bioisosterism is an important concept in the lead optimisation phase of drug discovery where the aim is to make modifications to parts of a molecule in order to improve some properties while maintaining others. We present an analysis of bioisosteric fragments extracted from the ligands in an established data set consisting of 121 protein targets. A pairwise analysis is carried out of all ligands for a given target. The ligands are fragmented using the BRICS fragmentation scheme and a pair of fragments is deemed to be bioisosteric if they occupy a similar volume of the protein binding site. We consider two levels of generality, one which does not consider the number of attachment points in the fragments and a more restricted case in which both fragments are required to have the same number of attachments. We investigate the extent to which the bioisosteric pairs that are found are common across different targets.


Subjects
Algorithms , Computational Biology , Databases, Protein , Proteins/chemistry , Drug Discovery , Ligands , Protein Conformation
20.
J Cheminform ; 10(1): 26, 2018 May 22.
Article in English | MEDLINE | ID: mdl-29789977

ABSTRACT

There has been a growing interest in multitask prediction in chemoinformatics, helped by the increasing use of deep neural networks in this field. This technique is applied to multitarget data sets, where compounds have been tested against different targets, with the aim of developing models to predict a profile of biological activities for a given compound. However, multitarget data sets tend to be sparse; i.e., not all compound-target combinations have experimental values. There has been little research on the effect of missing data on the performance of multitask methods. We have used two complete data sets to simulate sparseness by removing data from the training set. Different models to remove the data were compared. These sparse sets were used to train two different multitask methods, deep neural networks and Macau, which is a Bayesian probabilistic matrix factorization technique. Results from both methods were remarkably similar and showed that the performance decrease because of missing data is at first small before accelerating after large amounts of data are removed. This work provides a first approximation to assess how much data is required to produce good performance in multitask prediction exercises.
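The sparseness simulation described in entry 20 can be sketched by masking a chosen fraction of cells in a complete compound-by-target matrix. The abstract says several removal models were compared; the version below shows only uniform random removal, and the matrix values and function name are hypothetical.

```python
import random


def sparsify(matrix, fraction_removed, seed=0):
    """Return a copy of the matrix with a fraction of known values set to None.

    Assumption: cells are removed uniformly at random, which is only one of the
    possible removal models.
    """
    rng = random.Random(seed)
    known = [
        (i, j)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value is not None
    ]
    n_remove = int(round(fraction_removed * len(known)))
    removed = set(rng.sample(known, n_remove))
    return [
        [None if (i, j) in removed else value for j, value in enumerate(row)]
        for i, row in enumerate(matrix)
    ]


# Hypothetical complete activity matrix: 4 compounds by 5 targets.
complete = [[float(i + j) for j in range(5)] for i in range(4)]
sparse = sparsify(complete, fraction_removed=0.5)
missing = sum(value is None for row in sparse for value in row)
print(missing, "of", 4 * 5, "values removed")
```

Training the same model on matrices produced with increasing `fraction_removed` gives a performance-versus-sparsity curve of the kind the abstract discusses.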
