1.
J Cheminform ; 14(1): 62, 2022 Sep 15.
Article in English | MEDLINE | ID: mdl-36109826

ABSTRACT

MOTIVATION: Compound structure identification relies on increasingly sophisticated computational tools, among which machine learning tools are a recent addition that is rapidly gaining importance. These tools, of which the method titled Compound Structure Identification: Input Output Kernel Regression (CSI:IOKR) is an excellent example, have been used to elucidate compound structure from mass spectral (MS) data with significant accuracy, confidence and speed. They have, however, largely focused on data from liquid chromatography coupled to tandem mass spectrometry (LC-MS). Gas chromatography coupled to mass spectrometry (GC-MS) is an alternative that offers several advantages over LC-MS, including higher data reproducibility. Of special importance is the substantial compound coverage offered by GC-MS, further expanded by derivatization procedures such as silylation, which can improve the volatility, thermal stability and chromatographic peak shape of semi-volatile analytes. Despite these advantages and the increasing size of compound databases and MS libraries, GC-MS data have not yet been used by machine learning approaches to compound structure identification. RESULTS: This study presents a successful application of the CSI:IOKR machine learning method to the identification of environmental contaminants from GC-MS spectra. We use CSI:IOKR as an alternative to exhaustive search of MS libraries, independent of instrumental platform and data processing software. We use a comprehensive dataset of GC-MS spectra of trimethylsilyl derivatives and their molecular structures, derived from a large commercially available MS library, to train a model that maps between spectra and molecular structures. We test the learned model on a different dataset of GC-MS spectra of trimethylsilyl derivatives of environmental contaminants, generated in-house and made publicly available. The results show that 37% (resp. 50%) of the tested compounds are correctly ranked among the top 10 (resp. 20) candidate compounds suggested by the model. Even though spectral comparisons with reference standards or de novo structural elucidation are necessary to validate the predictions, machine learning provides efficient candidate prioritization and reduces the time spent on compound annotation.
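The headline result is a top-k ranking accuracy: the share of test spectra whose true structure appears among the first k candidates the model proposes. A minimal sketch of that metric follows, assuming each query comes with an already ranked list of candidate identifiers; `top_k_accuracy` and the compound identifiers are hypothetical, and the sketch does not reproduce CSI:IOKR itself.

```python
from typing import Sequence


def top_k_accuracy(ranked_candidates: Sequence[Sequence[str]],
                   true_ids: Sequence[str],
                   k: int) -> float:
    """Fraction of queries whose true structure is among the first k candidates."""
    if len(ranked_candidates) != len(true_ids):
        raise ValueError("need one ranked candidate list per query")
    if not true_ids:
        return 0.0
    # Count queries where the correct identifier appears in the top k of its ranking.
    hits = sum(
        1 for ranking, truth in zip(ranked_candidates, true_ids)
        if truth in ranking[:k]
    )
    return hits / len(true_ids)


if __name__ == "__main__":
    # Hypothetical rankings for three query spectra (identifiers are made up).
    rankings = [
        ["cmpd_A", "cmpd_B", "cmpd_C"],
        ["cmpd_D", "cmpd_E", "cmpd_F"],
        ["cmpd_G", "cmpd_H", "cmpd_I"],
    ]
    truth = ["cmpd_A", "cmpd_F", "cmpd_X"]
    print(top_k_accuracy(rankings, truth, k=1))
    print(top_k_accuracy(rankings, truth, k=3))
```

Reporting the metric at two cut-offs (top 10 and top 20, as in the abstract) shows how quickly the correct structure enters the shortlist as the list a chemist has to inspect grows.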

2.
Sci Data ; 9(1): 229, 2022 05 24.
Article in English | MEDLINE | ID: mdl-35610234

ABSTRACT

We present six datasets containing telemetry data of the Mars Express Spacecraft (MEX), a spacecraft orbiting Mars operated by the European Space Agency. The data, consisting of context data and thermal power consumption measurements, capture the status of the spacecraft over three Martian years, sampled at six different time resolutions ranging from 1 min to 60 min. From a data analysis point of view, these data are challenging even for the most sophisticated state-of-the-art artificial intelligence methods. In particular, given their heterogeneity, complexity and magnitude, they can be employed in a variety of scenarios and analyzed through the prism of different machine learning tasks, such as multi-target regression, learning from data streams, anomaly detection and clustering. Analyzing MEX's telemetry data is critical for supporting important decisions about the spacecraft's status and operation, extracting novel knowledge and monitoring the spacecraft's health; the data can also be used to benchmark artificial intelligence methods designed for a variety of tasks.
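The datasets are released at six time resolutions between 1 min and 60 min. The sketch below shows one generic way a coarser series can be derived from a finer one, by averaging readings within fixed-width time bins. Whether the published resolutions were produced this way is an assumption, and `resample_mean` is a hypothetical helper, not part of the data release.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def resample_mean(samples: List[Tuple[int, float]],
                  window_s: int) -> List[Tuple[int, float]]:
    """Average (timestamp_s, value) samples into bins of width window_s seconds.

    Each output timestamp is the start of its bin; bins with no samples are skipped.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    bins: Dict[int, List[float]] = defaultdict(list)
    for t, value in samples:
        # Floor the timestamp to the start of its bin.
        bins[t - t % window_s].append(value)
    return [(start, mean(values)) for start, values in sorted(bins.items())]


if __name__ == "__main__":
    # Hypothetical 1-min readings (timestamps in seconds) downsampled to 2-min bins.
    readings = [(0, 1.0), (60, 2.0), (120, 3.0), (180, 4.0)]
    print(resample_mean(readings, window_s=120))  # [(0, 1.5), (120, 3.5)]
```

Averaging suits power-consumption channels; categorical context channels would need a different aggregate, such as the most frequent value in each bin.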

3.
Comput Biol Med ; 141: 105001, 2022 02.
Article in English | MEDLINE | ID: mdl-34782112

ABSTRACT

Many clinical studies follow patients over time and record the time until the occurrence of an event of interest (e.g., recovery, death, …). When patients drop out of the study or their event does not occur before the study ends, the collected dataset is said to contain censored observations. Given the rise of personalized medicine, clinicians are often interested in accurate risk prediction models that predict, for unseen patients, a survival profile, including the expected time until the event. Survival analysis methods are used in this context to detect associations or to compare subpopulations of patients. In this article, we propose to cast the time-to-event prediction task as a multi-target regression task, with censored observations modeled as partially labeled examples. We then apply semi-supervised learning to the resulting data representation. More specifically, we use semi-supervised predictive clustering trees and ensembles thereof. Empirical results on eleven real-life datasets show predictive performance superior or equivalent to that of three competitor methods. Moreover, the proposed approach yields smaller models than random survival forests, another tree ensemble method. Finally, we illustrate the informative feature selection mechanism of our method by interpreting the splits induced by a single tree model when predicting survival for amyotrophic lateral sclerosis patients.
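The abstract does not spell out the target encoding, so the following is only one plausible way to turn a survival record into a partially labeled multi-target example, assumed here for illustration. Each target answers "has the event occurred by time t?" at a fixed set of horizons. For an observed event every target is known; for a censored patient the targets up to the censoring time are 0 and the later ones are missing (`None`), which is exactly what a semi-supervised multi-target learner can exploit. `survival_targets` is a hypothetical helper, not the authors' code.

```python
from typing import List, Optional, Sequence


def survival_targets(time: float, event: bool,
                     horizons: Sequence[float]) -> List[Optional[int]]:
    """Encode one patient as binary 'event has occurred by t' targets.

    event=True:  the event was observed at `time`, so every target is known
                 (0 before `time`, 1 from `time` on).
    event=False: the patient was censored at `time`, i.e. known to be event-free
                 up to `time`; later targets are unknown (None).
    """
    targets: List[Optional[int]] = []
    for t in horizons:
        if event:
            targets.append(1 if t >= time else 0)
        else:
            targets.append(0 if t <= time else None)
    return targets


if __name__ == "__main__":
    horizons = [1, 2, 3, 4]  # hypothetical evaluation times, e.g. in years
    print(survival_targets(2.5, True, horizons))   # [0, 0, 1, 1]
    print(survival_targets(2.0, False, horizons))  # [0, 0, None, None]
```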


Subjects
Supervised Machine Learning, Cluster Analysis, Humans, Multivariate Analysis, Survival Analysis
4.
Cell Death Dis ; 13(1): 2, 2021 12 17.
Article in English | MEDLINE | ID: mdl-34916483

ABSTRACT

Therapies that halt the progression of fibrosis are limited and ineffective. Activated myofibroblasts are emerging as important targets in the progression of fibrotic diseases. Previously, we performed a high-throughput screen on lung fibroblasts and subsequently demonstrated that inhibiting myofibroblast activation prevents lung fibrosis in bleomycin-treated mice. High-throughput screens are an ideal method for repurposing drugs, yet they have an intrinsic limitation: the size of the library itself. Here, we exploited the data from our "wet" screen and used "dry" machine learning analysis to virtually screen millions of compounds, identifying novel anti-fibrotic hits that target myofibroblast differentiation, many of which were structurally related to dopamine. We synthesized and validated several compounds ex vivo ("wet") and confirmed that both dopamine and its derivative TS1 are powerful inhibitors of myofibroblast activation. Using RNAi-mediated knock-down, we further demonstrated that both molecules act through dopamine receptor 3 and exert their anti-fibrotic effect by inhibiting the canonical transforming growth factor β pathway. Furthermore, molecular modelling confirmed that TS1 can bind both human and mouse dopamine receptor 3. The anti-fibrotic effect on human cells was confirmed using primary fibroblasts from idiopathic pulmonary fibrosis patients. Finally, TS1 prevented and reversed disease progression in a murine model of lung fibrosis. Both our interdisciplinary approach and our novel compound TS1 are promising tools for understanding and combating lung fibrosis.
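The abstract does not say which model drove the "dry" screen or how structural relatedness to dopamine was measured. A widely used measure for this kind of comparison is the Tanimoto coefficient on binary molecular fingerprints, and the sketch below ranks a toy library against a query with it. The fingerprints are hypothetical sets of on-bit indices, not real fingerprints of dopamine or TS1, and `rank_by_similarity` is an illustrative helper.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(query: FrozenSet[int],
                       library: Dict[str, FrozenSet[int]]) -> List[Tuple[str, float]]:
    """Return (compound_id, similarity) pairs, most similar first."""
    scored = [(cid, tanimoto(query, fp)) for cid, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy fingerprints; bit indices are made up for illustration.
    query = frozenset({1, 2, 3, 4})
    library = {
        "hit_x": frozenset({1, 2, 3, 4}),
        "hit_y": frozenset({1, 2, 5}),
        "hit_z": frozenset({9}),
    }
    print(rank_by_similarity(query, library))
```

In a real screen the fingerprints would come from a cheminformatics toolkit, and similarity ranking would typically complement, not replace, a model trained on the wet-screen readouts.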


Subjects
Bleomycin/adverse effects, Drug Discovery/methods, Drug Screening Assays, Antitumor/methods, High-Throughput Screening Assays/methods, Idiopathic Pulmonary Fibrosis/chemically induced, Idiopathic Pulmonary Fibrosis/therapy, Lung Diseases/chemically induced, Lung Diseases/therapy, Machine Learning/standards, Myofibroblasts/metabolism, Animals, Cell Differentiation, Humans, Idiopathic Pulmonary Fibrosis/pathology, Lung Diseases/pathology, Mice, Transfection
5.
PeerJ Comput Sci ; 7: e506, 2021.
Article in English | MEDLINE | ID: mdl-33987461

ABSTRACT

Semi-supervised learning combines supervised and unsupervised learning approaches to learn predictive models from both labeled and unlabeled data. It is most appropriate for problems where labeled examples are difficult to obtain but unlabeled examples are readily available (e.g., drug repurposing). Semi-supervised predictive clustering trees (SSL-PCTs) are a prominent method for semi-supervised learning that achieves good performance on various predictive modeling tasks, including structured output prediction tasks. Their main drawback, however, is that the learning time scales quadratically with the number of features. In contrast to axis-parallel trees, which use only individual features to split the data, oblique predictive clustering trees (SPYCTs) use linear combinations of features. This makes the splits more flexible and expressive and often leads to better predictive performance. With a carefully designed criterion function, efficient optimization techniques can be used to learn oblique splits. In this paper, we propose semi-supervised oblique predictive clustering trees (SSL-SPYCTs). We adjust the split learning to take unlabeled examples into account while remaining efficient. The main advantage over SSL-PCTs is that the proposed method scales linearly with the number of features. The experimental evaluation confirms the theoretical computational advantage and shows that SSL-SPYCTs often outperform SSL-PCTs and supervised PCTs in both single-tree and ensemble settings. We also show that SSL-SPYCTs produce more meaningful feature importance scores than supervised SPYCTs when the amount of labeled data is limited.
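To make "oblique split" and "take unlabeled examples into account" concrete, the sketch below routes examples by the sign of w · x + b and scores a candidate split with an impurity that mixes target variance (computed on labeled examples only) with feature variance (computed on all examples, labeled or not). The weighted mix is assumed here to mirror the general idea behind SSL-PCT heuristics; normalization is omitted, and the weights w are supplied by hand rather than learned by optimization as in SPYCTs. It is therefore an illustration of the ingredients, not the SSL-SPYCT algorithm. All function names and the `alpha` parameter are hypothetical.

```python
from statistics import pvariance
from typing import List, Optional, Sequence, Tuple

# (feature vector, target value or None for an unlabeled example)
Example = Tuple[Sequence[float], Optional[float]]


def oblique_split(examples: List[Example], w: Sequence[float], b: float):
    """Send an example left when w . x + b > 0 (a linear combination of features)."""
    left, right = [], []
    for x, y in examples:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        (left if score > 0 else right).append((x, y))
    return left, right


def ssl_impurity(examples: List[Example], alpha: float) -> float:
    """alpha * target variance (labeled only) + (1 - alpha) * mean feature variance (all)."""
    if not examples:
        return 0.0
    labels = [y for _, y in examples if y is not None]
    target_var = pvariance(labels) if len(labels) > 1 else 0.0
    n_features = len(examples[0][0])
    feature_var = sum(
        pvariance([x[j] for x, _ in examples]) for j in range(n_features)
    ) / n_features if len(examples) > 1 else 0.0
    return alpha * target_var + (1 - alpha) * feature_var


def split_cost(examples: List[Example], w: Sequence[float], b: float,
               alpha: float = 0.5) -> float:
    """Size-weighted impurity of the two children; lower means a better split."""
    left, right = oblique_split(examples, w, b)
    n = len(examples)
    return sum(len(part) / n * ssl_impurity(part, alpha) for part in (left, right))


if __name__ == "__main__":
    data: List[Example] = [
        ([0.0, 0.0], 1.0),
        ([0.1, 0.0], 1.0),
        ([1.0, 1.0], 5.0),
        ([1.1, 1.0], None),  # unlabeled, still shapes the feature-variance term
    ]
    print(split_cost(data, w=[1.0, 1.0], b=-1.0))   # separates the two clusters
    print(split_cost(data, w=[1.0, 0.0], b=-0.05))  # cuts through a cluster
```

Evaluating one oblique split costs one dot product per example, i.e. time linear in the number of features; the scalability gains described in the abstract come from how the weights themselves are learned, which this sketch leaves out.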

6.
Comput Biol Med ; 130: 104197, 2021 03.
Article in English | MEDLINE | ID: mdl-33429140

ABSTRACT

Machine learning methods are commonly used for predicting molecular properties to accelerate material and drug design. An important part of this process is deciding how to represent the molecules. Typically, machine learning methods expect examples represented by vectors of values, and many methods for calculating molecular feature representations have been proposed. In this paper, we perform a comprehensive comparison of different molecular features, including traditional methods such as fingerprints and molecular descriptors, and recently proposed learnable representations based on neural networks. Feature representations are evaluated on 11 benchmark datasets, used for predicting properties and measures such as mutagenicity, melting points, activity, solubility, and IC50. Our experiments show that several molecular features work similarly well across all benchmark datasets. The representation that stands out most is Spectrophores, which performs significantly worse than the other features on most datasets. Molecular descriptors from the PaDEL library seem very well suited for predicting physical properties of molecules. Despite their simplicity, MACCS fingerprints performed very well overall. The results show that learnable representations achieve performance competitive with expert-based representations. However, task-specific representations (graph convolutions and Weave methods) rarely offer any benefit, even though they are computationally more demanding. Lastly, combining different molecular feature representations typically does not give a noticeable improvement in performance compared to individual feature representations.
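The abstract does not state how representations were combined; simple concatenation of per-molecule vectors is assumed here as the most common approach. Because fingerprints are 0/1 bits while descriptors span very different numeric ranges, the sketch z-scores each descriptor column before appending it to the fingerprint bits. The helper names and toy values are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def standardize_columns(rows: List[Sequence[float]]) -> List[List[float]]:
    """Z-score each column across molecules; constant columns become all zeros."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c)) for c in cols]
    return [
        [(v - mu) / sd if sd > 0 else 0.0 for v, (mu, sd) in zip(row, stats)]
        for row in rows
    ]


def combine_representations(fingerprints: List[Sequence[int]],
                            descriptors: List[Sequence[float]]) -> List[List[float]]:
    """Concatenate fingerprint bits with standardized descriptor values per molecule."""
    if len(fingerprints) != len(descriptors):
        raise ValueError("one fingerprint and one descriptor vector per molecule")
    scaled = standardize_columns(descriptors)
    return [list(map(float, fp)) + desc for fp, desc in zip(fingerprints, scaled)]


if __name__ == "__main__":
    fps = [[1, 0], [0, 1]]                 # toy 2-bit fingerprints
    desc = [[10.0, 3.0], [20.0, 3.0]]      # toy descriptors; second column is constant
    print(combine_representations(fps, desc))
    # [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 1.0, 0.0]]
```

In a real pipeline the standardization statistics should be computed on the training split only and reused for test molecules; the sketch fits them on all rows for brevity.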


Subjects
Machine Learning, Neural Networks, Computer, Drug Design