Results 1 - 16 of 16
1.
Sci Data ; 9(1): 229, 2022 05 24.
Article in English | MEDLINE | ID: mdl-35610234

ABSTRACT

We present six datasets containing telemetry data of the Mars Express Spacecraft (MEX), a spacecraft orbiting Mars operated by the European Space Agency. The data, consisting of context data and thermal power consumption measurements, capture the status of the spacecraft over three Martian years, sampled at six different time resolutions that range from 1 min to 60 min. From a data analysis point of view, these data are challenging even for the more sophisticated state-of-the-art artificial intelligence methods. In particular, given the heterogeneity, complexity, and magnitude of the data, they can be employed in a variety of scenarios and analyzed through the prism of different machine learning tasks, such as multi-target regression, learning from data streams, anomaly detection, clustering, etc. Analyzing MEX's telemetry data is critical for aiding very important decisions regarding the spacecraft's status and operation, extracting novel knowledge, and monitoring the spacecraft's health, but the data can also be used to benchmark artificial intelligence methods designed for a variety of tasks.

2.
Sci Rep ; 12(1): 7267, 2022 05 04.
Article in English | MEDLINE | ID: mdl-35508507

ABSTRACT

Multilabel classification (MLC) is a machine learning task where the goal is to learn to label an example with multiple labels simultaneously. It receives increasing interest from the machine learning community, as evidenced by the increasing number of papers and methods that appear in the literature. Hence, ensuring proper, correct, robust, and trustworthy benchmarking is of utmost importance for the further development of the field. We believe that this can be achieved by adhering to the recently emerged data management standards, such as the FAIR (Findable, Accessible, Interoperable, and Reusable) and TRUST (Transparency, Responsibility, User focus, Sustainability, and Technology) principles. We introduce an ontology-based online catalogue of MLC datasets originating from various application domains following these principles. The catalogue extensively describes many MLC datasets with comprehensible meta-features, MLC-specific semantic descriptions, and different data provenance information. The MLC data catalogue is available at: http://semantichub.ijs.si/MLCdatasets .


Subjects
Machine Learning , Semantics , Publications
3.
Cell Death Dis ; 13(1): 2, 2021 12 17.
Article in English | MEDLINE | ID: mdl-34916483

ABSTRACT

Therapies halting the progression of fibrosis are ineffective and limited. Activated myofibroblasts are emerging as important targets in the progression of fibrotic diseases. Previously, we performed a high-throughput screen on lung fibroblasts and subsequently demonstrated that the inhibition of myofibroblast activation is able to prevent lung fibrosis in bleomycin-treated mice. High-throughput screens are an ideal method of repurposing drugs, yet they contain an intrinsic limitation, which is the size of the library itself. Here, we exploited the data from our "wet" screen and used "dry" machine learning analysis to virtually screen millions of compounds, identifying novel anti-fibrotic hits which target myofibroblast differentiation, many of which were structurally related to dopamine. We synthesized and validated several compounds ex vivo ("wet") and confirmed that both dopamine and its derivative TS1 are powerful inhibitors of myofibroblast activation. We further used RNAi-mediated knock-down and demonstrated that both molecules act through the dopamine receptor 3 and exert their anti-fibrotic effect by inhibiting the canonical transforming growth factor β pathway. Furthermore, molecular modelling confirmed the capability of TS1 to bind both human and mouse dopamine receptor 3. The anti-fibrotic effect on human cells was confirmed using primary fibroblasts from idiopathic pulmonary fibrosis patients. Finally, TS1 prevented and reversed disease progression in a murine model of lung fibrosis. Both our interdisciplinary approach and our novel compound TS1 are promising tools for understanding and combating lung fibrosis.


Subjects
Bleomycin/adverse effects , Drug Discovery/methods , Drug Screening Assays, Antitumor/methods , High-Throughput Screening Assays/methods , Idiopathic Pulmonary Fibrosis/chemically induced , Idiopathic Pulmonary Fibrosis/therapy , Lung Diseases/chemically induced , Lung Diseases/therapy , Machine Learning/standards , Myofibroblasts/metabolism , Animals , Cell Differentiation , Humans , Idiopathic Pulmonary Fibrosis/pathology , Lung Diseases/pathology , Mice , Transfection
4.
PeerJ Comput Sci ; 7: e506, 2021.
Article in English | MEDLINE | ID: mdl-33987461

ABSTRACT

Semi-supervised learning combines supervised and unsupervised learning approaches to learn predictive models from both labeled and unlabeled data. It is most appropriate for problems where labeled examples are difficult to obtain but unlabeled examples are readily available (e.g., drug repurposing). Semi-supervised predictive clustering trees (SSL-PCTs) are a prominent method for semi-supervised learning that achieves good performance on various predictive modeling tasks, including structured output prediction tasks. The main issue, however, is that the learning time scales quadratically with the number of features. In contrast to axis-parallel trees, which only use individual features to split the data, oblique predictive clustering trees (SPYCTs) use linear combinations of features. This makes the splits more flexible and expressive and often leads to better predictive performance. With a carefully designed criterion function, we can use efficient optimization techniques to learn oblique splits. In this paper, we propose semi-supervised oblique predictive clustering trees (SSL-SPYCTs). We adjust the split learning to take unlabeled examples into account while remaining efficient. The main advantage over SSL-PCTs is that the proposed method scales linearly with the number of features. The experimental evaluation confirms the theoretical computational advantage and shows that SSL-SPYCTs often outperform SSL-PCTs and supervised PCTs in both single-tree and ensemble settings. We also show that SSL-SPYCTs are better at producing meaningful feature importance scores than supervised SPYCTs when the amount of labeled data is limited.
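
The difference between the two kinds of split described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation and says nothing about how SPYCTs learn their splits; the function names, weights, threshold, and toy points are all assumptions chosen for illustration. It only shows the form of the test applied at a tree node.

```python
from typing import List, Sequence, Tuple


def axis_parallel_split(x: Sequence[float], feature: int, threshold: float) -> bool:
    """Axis-parallel test: compare a single feature against a threshold."""
    return x[feature] <= threshold


def oblique_split(x: Sequence[float], weights: Sequence[float], bias: float) -> bool:
    """Oblique test: compare a linear combination of all features against zero."""
    return sum(w * v for w, v in zip(weights, x)) + bias <= 0.0


# Hypothetical two-feature examples (illustrative values only).
points: List[Tuple[float, float]] = [(1.0, 3.0), (3.0, 1.0), (2.5, 3.0), (1.5, 1.0)]

# Axis-parallel: "is feature 0 at most 2.0?"
left_axis = [axis_parallel_split(p, feature=0, threshold=2.0) for p in points]

# Oblique: "is feature 0 minus feature 1 at most 0?", a diagonal boundary
# that no single-feature threshold can reproduce on these points.
left_oblique = [oblique_split(p, weights=(1.0, -1.0), bias=0.0) for p in points]

print(left_axis)     # [True, False, False, True]
print(left_oblique)  # [True, False, True, False]
```

The two tests send the last two points to different sides, which is the extra flexibility the abstract attributes to oblique splits.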

5.
Comput Biol Med ; 130: 104197, 2021 03.
Article in English | MEDLINE | ID: mdl-33429140

ABSTRACT

Machine learning methods are commonly used for predicting molecular properties to accelerate material and drug design. An important part of this process is deciding how to represent the molecules. Typically, machine learning methods expect examples represented by vectors of values, and many methods for calculating molecular feature representations have been proposed. In this paper, we perform a comprehensive comparison of different molecular features, including traditional methods such as fingerprints and molecular descriptors, and recently proposed learnable representations based on neural networks. Feature representations are evaluated on 11 benchmark datasets, used for predicting properties and measures such as mutagenicity, melting points, activity, solubility, and IC50. Our experiments show that several molecular features work similarly well over all benchmark datasets. The ones that stand out most are Spectrophores, which give significantly worse performance than other features on most datasets. Molecular descriptors from the PaDEL library seem very well suited for predicting physical properties of molecules. Despite their simplicity, MACCS fingerprints performed very well overall. The results show that learnable representations achieve competitive performance compared to expert based representations. However, task-specific representations (graph convolutions and Weave methods) rarely offer any benefits, even though they are computationally more demanding. Lastly, combining different molecular feature representations typically does not give a noticeable improvement in performance compared to individual feature representations.


Subjects
Machine Learning , Neural Networks, Computer , Drug Design
6.
Comput Biol Med ; 128: 104143, 2021 01.
Article in English | MEDLINE | ID: mdl-33307385

ABSTRACT

The task of biomarker discovery is best translated to the machine learning task of feature ranking. Namely, the goal of biomarker discovery is to identify a set of potentially viable targets for addressing a given biological status. This is aligned with the definition of feature ranking and its goal - to produce a list of features ordered by their importance for the target concept. This differs from the task of feature selection (typically used for biomarker discovery) in that it catches viable biomarkers that have redundant or overlapping information with often highly important biomarkers, while with feature selection this is not the case. We propose to use a methodology for evaluating feature rankings to assess the quality of a given feature ranking and to discover the best cut-off point. We demonstrate the effectiveness of the proposed methodology on 10 datasets containing data about embryonal tumors. We evaluate the two most commonly used feature ranking algorithms (random forests and RReliefF) and, using the evaluation methodology, identify a set of viable biomarkers that have been confirmed to be related to cancer.


Subjects
Neoplasms, Germ Cell and Embryonal , Neoplasms , Algorithms , Biomarkers , Humans , Machine Learning
7.
Trends Food Sci Technol ; 104: 268-272, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32905099

ABSTRACT

BACKGROUND: The COVID-19 pandemic affects all aspects of human life, including food consumption. The changes in the food production and supply processes introduce changes to the global dietary patterns. SCOPE AND APPROACH: To study the COVID-19 impact on the food consumption process, we have analyzed two data sets that consist of food preparation recipes published before (69,444) and during the quarantine (10,009) period. Since working with large data sets is a time-consuming task, we have applied a recently proposed artificial intelligence approach called DietHub. The approach uses the recipe preparation description (i.e. text) and automatically provides a list of main ingredients annotated using the Hansard semantic tags. After extracting the semantic tags of the ingredients for every recipe, we have compared the food consumption patterns between the two data sets by comparing the relative frequency of the ingredients that compose the recipes. KEY FINDINGS AND CONCLUSIONS: Using the AI methodology, the changes in the food consumption patterns before and during the COVID-19 pandemic are obvious. The highest positive difference in the food consumption can be found in foods such as "Pulses/plants producing pulses", "Pancake/Tortilla/Outcake", and "Soup/pottage", which increase by 300%, 280%, and 100%, respectively. Conversely, the largest decrease in consumption can be found for foods such as "Order Perciformes (type of fish)", "Corn/cereals/grain", and "Wine-making", with a reduction of 50%, 40%, and 30%, respectively. This kind of analysis is valuable in times of crisis and emergencies, and is a very good example of the scientific support that regulators require in order to respond quickly and appropriately.
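
The comparison of relative ingredient frequencies described in this abstract can be sketched as follows. This is an illustrative sketch only: the abstract does not give the exact formula, so the definition used here (share of recipes containing a tag, and percentage change of that share relative to the "before" set) is an assumption, as are the function names and the toy tags and recipes.

```python
from collections import Counter
from typing import Dict, List


def relative_frequencies(recipes: List[List[str]]) -> Dict[str, float]:
    """Share of recipes in which each semantic tag occurs at least once."""
    counts = Counter(tag for tags in recipes for tag in set(tags))
    return {tag: count / len(recipes) for tag, count in counts.items()}


def percent_change(before: List[List[str]], during: List[List[str]]) -> Dict[str, float]:
    """Percentage change in relative frequency for every tag seen before."""
    freq_before = relative_frequencies(before)
    freq_during = relative_frequencies(during)
    return {
        tag: 100.0 * (freq_during.get(tag, 0.0) - share) / share
        for tag, share in freq_before.items()
    }


# Hypothetical tagged recipes; real input would be the extracted semantic tags.
recipes_before = [["pulses", "soup"], ["wine"], ["wine", "soup"], ["fish"]]
recipes_during = [["pulses", "soup"], ["pulses"]]

change = percent_change(recipes_before, recipes_during)
print(change["pulses"])  # 300.0
print(change["soup"])    # 0.0
print(change["wine"])    # -100.0
```

Normalizing by the number of recipes in each set matters here because the two collections differ greatly in size (69,444 versus 10,009 recipes), so raw counts would not be comparable.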

8.
PeerJ Comput Sci ; 6: e310, 2020.
Article in English | MEDLINE | ID: mdl-33816961

ABSTRACT

In this article, we propose a method for evaluating feature ranking algorithms. A feature ranking algorithm estimates the importance of descriptive features when predicting the target variable, and the proposed method evaluates the correctness of these importance values by computing the error measures of two chains of predictive models. The models in the first chain are built on nested sets of top-ranked features, while the models in the other chain are built on nested sets of bottom-ranked features. We investigate which predictive models are appropriate for building these chains, showing empirically that the proposed method gives meaningful results and can detect differences in feature ranking quality. This is first demonstrated on synthetic data, and then on several real-world classification benchmark problems.
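
The two chains of models described in this abstract can be sketched on a toy problem. This is a minimal illustration, not the authors' procedure: the choice of a 1-nearest-neighbour model, the leave-one-out error, the function names, and the tiny dataset (in which feature 0 separates the classes and feature 2 is deliberately misleading) are all assumptions.

```python
from typing import List, Sequence


def loo_1nn_error(X: Sequence[Sequence[float]], y: Sequence[int],
                  features: Sequence[int]) -> float:
    """Leave-one-out error of a 1-nearest-neighbour classifier
    that only looks at the given subset of features."""
    mistakes = 0
    for i, xi in enumerate(X):
        best_j, best_dist = -1, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue
            dist = sum((xi[f] - xj[f]) ** 2 for f in features)
            if dist < best_dist:
                best_j, best_dist = j, dist
        mistakes += y[best_j] != y[i]
    return mistakes / len(X)


def chain_errors(X, y, order: Sequence[int]) -> List[float]:
    """Errors of models built on nested feature sets order[:1], order[:2], ..."""
    return [loo_1nn_error(X, y, order[:k]) for k in range(1, len(order) + 1)]


# Hypothetical data: feature 0 is informative, feature 2 is misleading.
X = [(0.0, 0.0, 5.0), (0.1, 1.0, 1.0), (0.2, 0.0, 3.0),
     (1.0, 1.0, 5.1), (1.1, 0.0, 1.1), (1.2, 1.0, 3.1)]
y = [0, 0, 0, 1, 1, 1]

ranking = [0, 1, 2]  # assumed ranking, most important feature first
top_chain = chain_errors(X, y, ranking)           # top-ranked features first
bottom_chain = chain_errors(X, y, ranking[::-1])  # bottom-ranked features first

print(top_chain[0], bottom_chain[0])  # 0.0 1.0
```

For a good ranking, the top-ranked chain should reach low error with few features, while the bottom-ranked chain should stay poor until the important features are finally added; the gap between the two error curves is what signals ranking quality in this sketch.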

9.
J Dairy Sci ; 102(11): 10639-10656, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31447146

ABSTRACT

Nutrient management on grazed grasslands is of critical importance to maintain productivity levels, as grass is the cheapest feed for ruminants and underpins these meat and milk production systems. Many attempts have been made to model the relationships between controllable (crop and soil fertility management) and noncontrollable influencing factors (weather, soil drainage) and nutrient/productivity levels. However, to the best of our knowledge, not much research has been performed on modeling the interconnections between the influencing factors on one hand and nutrient uptake/herbage production on the other hand, by using data-driven modeling techniques. Our paper proposes using predictive clustering trees (PCT) to build models from data from dairy farms in the Republic of Ireland. The PCT models show good accuracy in estimating herbage production and nutrient uptake. They are also interpretable and are found to embody knowledge that is in accordance with existing theoretical understanding of the task at hand. Moreover, if we combine multiple PCT into an ensemble of PCT (random forest of PCT), we can achieve improved accuracy of the estimates. In practical terms, the number of grazings, which is proportionally related to soil drainage class, is one of the most important factors that moderates the herbage production potential and nutrient uptake. Furthermore, we found the nutrient (N, P, and K) uptake and herbage nutrient concentration to be conservative in fields that had medium yield potential (11 t of dry matter per hectare on average), whereas nutrient uptake was more variable and potentially limiting in fields that had higher and lower herbage production. Our models also show that phosphorus is the most limiting nutrient for herbage production across the fields on these Irish dairy farms, followed by nitrogen and potassium.


Subjects
Animal Feed , Cattle/metabolism , Dairying/methods , Machine Learning , Nutrients/metabolism , Animal Feed/analysis , Animals , Diet/veterinary , Female , Ireland , Lactation , Milk , Poaceae
10.
Nat Commun ; 9(1): 358, 2018 01 24.
Article in English | MEDLINE | ID: mdl-29367740

ABSTRACT

Antibiotic resistance poses rapidly increasing global problems in combatting multidrug-resistant (MDR) infectious diseases like MDR tuberculosis, calling for novel approaches including host-directed therapies (HDT). Intracellular pathogens like Salmonellae and Mycobacterium tuberculosis (Mtb) exploit host pathways to survive. Only very few HDT compounds targeting host pathways are currently known. In a library of pharmacologically active compounds (LOPAC)-based drug-repurposing screen, we identify multiple compounds that target receptor tyrosine kinases (RTKs) and inhibit intracellular Mtb and Salmonellae more potently than currently known HDT compounds. By developing a data-driven in silico model based on confirmed targets from public databases, we successfully predict additional efficacious HDT compounds. These compounds target host RTK signaling and inhibit intracellular (MDR) Mtb. A complementary human kinome siRNA screen independently confirms the role of RTK signaling and kinases (BLK, ABL1, and NTRK1) in host control of Mtb. These approaches validate RTK signaling as a druggable host pathway for HDT against intracellular bacteria.


Subjects
Anti-Bacterial Agents/pharmacology , Enzyme Inhibitors/pharmacology , Mycobacterium tuberculosis/drug effects , Receptor Protein-Tyrosine Kinases/antagonists & inhibitors , Salmonella Infections/enzymology , Salmonella typhimurium/drug effects , Tuberculosis/enzymology , Cell Line , Computational Biology , Drug Resistance, Bacterial , Host-Pathogen Interactions/drug effects , Humans , Mycobacterium tuberculosis/genetics , Mycobacterium tuberculosis/physiology , Receptor Protein-Tyrosine Kinases/genetics , Receptor Protein-Tyrosine Kinases/metabolism , Salmonella Infections/genetics , Salmonella Infections/microbiology , Salmonella typhimurium/genetics , Salmonella typhimurium/physiology , Signal Transduction/drug effects , Tuberculosis/genetics , Tuberculosis/microbiology
11.
PLoS One ; 11(12): e0169116, 2016.
Article in English | MEDLINE | ID: mdl-28036382

ABSTRACT

The food- and airborne fungal genus Wallemia comprises seven xerophilic and halophilic species: W. sebi, W. mellicola, W. canadensis, W. tropicalis, W. muriae, W. hederae and W. ichthyophaga. All listed species are adapted to low water activity and can contaminate food preserved with high amounts of salt or sugar. In relation to food safety, the effect of high salt and sugar concentrations on the production of secondary metabolites by this toxigenic fungus was investigated. The secondary metabolite profiles of 30 strains of the listed species were examined using general growth media, known to support the production of secondary metabolites, supplemented with different concentrations of NaCl, glucose and MgCl2. In more than two hundred extracts, approximately one hundred different compounds were detected using high-performance liquid chromatography-diode array detection (HPLC-DAD). Although the genome data analysis of W. mellicola (previously W. sebi sensu lato) and W. ichthyophaga revealed a low number of secondary metabolite clusters, a substantial number of secondary metabolites were detected under different conditions. Machine learning analysis of the obtained dataset showed that NaCl has a higher influence on the production of secondary metabolites than the other tested solutes. Mass spectrometric analysis of selected extracts revealed that NaCl in the medium affects the production of some compounds with substantial biological activities (wallimidione, walleminol, walleminone, UCA 1064-A and UCA 1064-B). In particular, an increase in NaCl concentration from 5% to 15% in the growth media increased the production of the toxic metabolites wallimidione, walleminol and walleminone.


Subjects
Basidiomycota/genetics , Basidiomycota/metabolism , Extreme Environments , Mycotoxins/metabolism , Secondary Metabolism/genetics , Sodium Chloride/metabolism , Azasteroids/metabolism , Basidiomycota/classification , Cholestadienols/metabolism , Chromatography, High Pressure Liquid , Food Contamination , Food Microbiology , Glucose/metabolism , Magnesium Chloride/metabolism , Secondary Metabolism/physiology , Sesquiterpenes/metabolism
12.
Comput Methods Programs Biomed ; 122(2): 136-48, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26254827

ABSTRACT

The paper addresses the issue of non-invasive real-time prediction of hidden inner body temperature variables during therapeutic cooling or heating and proposes a solution that uses computer simulations and machine learning. The proposed approach is applied to a real-world problem in the domain of biomedicine - prediction of inner knee temperatures during therapeutic cooling (cryotherapy) after anterior cruciate ligament (ACL) reconstructive surgery. A validated simulation model of the cryotherapeutic treatment is used to generate a substantial amount of diverse data from different simulation scenarios. We apply machine learning methods to the simulated data to construct a predictive model that provides a prediction for the inner temperature variable based on other system variables whose measurement is more feasible, i.e. skin temperatures. First, we perform feature ranking using the RReliefF method. Next, based on the feature ranking results, we investigate the predictive performance and time/memory efficiency of several predictive modeling methods: linear regression, regression trees, model trees, and ensembles of regression and model trees. Results have shown that using only temperatures from skin sensors as input attributes gives excellent prediction for the temperature in the knee center. Moreover, satisfactory predictive accuracy is also achieved using a short history of temperatures from just two skin sensors (placed anterior and posterior to the knee) as input variables. The model trees perform the best, with prediction error in the same range as the accuracy of the simulated data (0.1°C). Furthermore, they satisfy the requirements for small memory size and real-time response. We successfully validate the best performing model tree with real data from in vivo temperature measurement from a patient undergoing cryotherapy after ACL reconstruction.


Subjects
Anterior Cruciate Ligament Reconstruction/rehabilitation , Hypothermia, Induced/methods , Knee/physiopathology , Models, Biological , Therapy, Computer-Assisted/methods , Thermography/methods , Body Temperature , Computer Simulation , Computer Systems , Humans , Knee/surgery , Machine Learning , Reproducibility of Results , Sensitivity and Specificity , Thermal Conductivity
13.
Comput Med Imaging Graph ; 39: 14-26, 2015 Jan.
Article in English | MEDLINE | ID: mdl-24997992

ABSTRACT

In this paper, we present the approach that we applied to the medical modality classification tasks at the ImageCLEF evaluation forum. More specifically, we used the modality classification databases from the ImageCLEF competitions in 2011, 2012 and 2013, described by four visual and one textual types of features, and combinations thereof. We used local binary patterns, color and edge directivity descriptors, fuzzy color and texture histogram and scale-invariant feature transform (and its variant opponentSIFT) as visual features and the standard bag-of-words textual representation coupled with TF-IDF weighting. The results from the extensive experimental evaluation identify the SIFT and opponentSIFT features as the best performing features for modality classification. Next, the low-level fusion of the visual features improves the predictive performance of the classifiers. This is because the different features are able to capture different aspects of an image, their combination offering a more complete representation of the visual content in an image. Moreover, adding textual features further increases the predictive performance. Finally, the results obtained with our approach are the best results reported on these databases so far.


Subjects
Algorithms , Documentation/methods , Image Interpretation, Computer-Assisted/methods , Natural Language Processing , Pattern Recognition, Automated/methods , Radiology Information Systems/organization & administration , Artificial Intelligence , Image Enhancement/methods , Reproducibility of Results , Sensitivity and Specificity , Terminology as Topic , User-Computer Interface
14.
Front Microbiol ; 5: 708, 2014.
Article in English | MEDLINE | ID: mdl-25566222

ABSTRACT

It is well known that a few halophilic bacteria and archaea as well as certain fungi can grow at the highest concentrations of NaCl. However, data about possible life at extremely high concentrations of various other kosmotropic (stabilizing; like NaCl, KCl, and MgSO4) and chaotropic (destabilizing) salts (NaBr, MgCl2, and CaCl2) are scarce for prokaryotes and almost absent for the eukaryotic domain including fungi. Fungi from diverse (extreme) environments were tested for their ability to grow at the highest concentrations of kosmotropic and chaotropic salts ever recorded to support life. The majority of fungi showed preference for relatively high concentrations of kosmotropes. However, our study revealed the outstanding tolerance of several fungi to high concentrations of MgCl2 (up to 2.1 M) or CaCl2 (up to 2.0 M) without compensating kosmotropic salts. A few species, for instance Hortaea werneckii, Eurotium amstelodami, Eurotium chevalieri and Wallemia ichthyophaga, are able to thrive in media with the highest salinities of all salts (except for CaCl2 in the case of W. ichthyophaga). The upper concentration of MgCl2 to support fungal life in the absence of kosmotropes (2.1 M) is much higher than previously determined to be the upper limit for microbial growth (1.26 M). No fungal representatives showed exclusive preference for only chaotropic salts (being obligate chaophiles). Nevertheless, our study expands the knowledge of possible active life by a diverse set of fungi in biologically detrimental chaotropic environments.

15.
J Environ Qual ; 40(6): 1972-82, 2011.
Article in English | MEDLINE | ID: mdl-22031581

ABSTRACT

The amount of biosolids recycled in agriculture has steadily increased during the last decades. However, few models are available to predict the accompanying risks, mainly due to the presence of trace element and organic contaminants, and benefits for soil fertility of their application. This paper deals with using data mining to assess the benefits and risks of biosolids application in agriculture. The analyzed data come from a 10-yr field experiment in northeast France focusing on the effects of biosolid application and mineral fertilization on soil fertility and contamination. Biosolids were applied at agriculturally recommended rates. Biosolids had a significant effect on soil fertility, causing in particular a persistent increase in plant-available phosphorus (P) relative to plots receiving mineral fertilizer. However, soil fertility at seeding and crop management method had greater effects than biosolid application on soil fertility at harvest, especially soil nitrogen (N) content. Levels of trace elements and organic contaminants in soils remained below legal threshold values. Levels of extractable metals correlated more strongly than total metal levels with other factors. Levels of organic contaminants, particularly polycyclic aromatic hydrocarbons, were linked to total metal levels in biosolids and treated soil. This study confirmed that biosolid application at rates recommended for agriculture is a safe option for increasing soil fertility. However, the quality of the biosolids selected has to be taken into account. The results also indicate the power of data mining in examining links between parameters in complex data sets.


Subjects
Data Mining , Refuse Disposal/methods , Soil/chemistry , Agriculture , Decision Support Techniques , Metals , Models, Theoretical , Organic Chemicals , Soil Pollutants
16.
BMC Bioinformatics ; 11: 2, 2010 Jan 02.
Article in English | MEDLINE | ID: mdl-20044933

ABSTRACT

BACKGROUND: S. cerevisiae, A. thaliana and M. musculus are well-studied organisms in biology and the sequencing of their genomes was completed many years ago. It is still a challenge, however, to develop methods that assign biological functions to the ORFs in these genomes automatically. Different machine learning methods have been proposed to this end, but it remains unclear which method is to be preferred in terms of predictive performance, efficiency and usability. RESULTS: We study the use of decision tree based models for predicting the multiple functions of ORFs. First, we describe an algorithm for learning hierarchical multi-label decision trees. These can simultaneously predict all the functions of an ORF, while respecting a given hierarchy of gene functions (such as FunCat or GO). We present new results obtained with this algorithm, showing that the trees found by it exhibit clearly better predictive performance than the trees found by previously described methods. Nevertheless, the predictive performance of individual trees is lower than that of some recently proposed statistical learning methods. We show that ensembles of such trees are more accurate than single trees and are competitive with state-of-the-art statistical learning and functional linkage methods. Moreover, the ensemble method is computationally efficient and easy to use. CONCLUSIONS: Our results suggest that decision tree based methods are a state-of-the-art, efficient and easy-to-use approach to ORF function prediction.


Subjects
Decision Trees , Gene Expression Profiling/methods , Algorithms , Artificial Intelligence , Computational Biology , Genes , Open Reading Frames/genetics