Results 1 - 17 of 17
1.
Sci Rep ; 13(1): 13391, 2023 Aug 17.
Article in English | MEDLINE | ID: mdl-37592002

ABSTRACT

The spread of misinformation on social media can lead to inappropriate behaviors that can make disasters worse. In our study, we focused on tweets containing misinformation about earthquake predictions and analyzed their dynamics. To this end, we retrieved 82,129 tweets over a period of 2 years (March 2020-March 2022) and hand-labeled 4157 tweets. We used RoBERTa to classify the complete dataset and analyzed the results. We found that (1) there are significantly more not-misinformation than misinformation tweets; (2) earthquake predictions are continuously present on Twitter with peaks after felt events; and (3) prediction misinformation tweets sometimes link or tag official earthquake notifications from credible sources. These insights indicate that official institutions present on social media should continuously address misinformation (even in quiet times when no event occurred), check that their institution is not tagged/linked in misinformation tweets, and provide authoritative sources that can be used to support their arguments against unfounded earthquake predictions.

2.
Mol Inform ; 42(4): e2200186, 2023 04.
Article in English | MEDLINE | ID: mdl-36617991

ABSTRACT

QSAR models are widely and successfully used in many research areas. The success of such models highly depends on molecular descriptors typically classified as 1D, 2D, 3D, or 4D. While 3D information is likely important, e. g., for modeling ligand-protein binding, previous comparisons between the performances of 2D and 3D descriptors were inconclusive. Yet in such comparisons the modeled ligands were not necessarily represented by their bioactive conformations. With this in mind, we mined the PDB for sets of protein-ligand complexes sharing the same protein for which uniform activity data were reported. The results, totaling 461 structures spread across six series were compiled into a carefully curated, first of its kind dataset in which each ligand is represented by its bioactive conformation. Next, each set was characterized by 2D, 3D and 2D + 3D descriptors and modeled using three machine learning algorithms, namely, k-Nearest Neighbors, Random Forest and Lasso Regression. Models' performances were evaluated on external test sets derived from the parent datasets either randomly or in a rational manner. We found that many more significant models were obtained when combining 2D and 3D descriptors. We attribute these improvements to the ability of 2D and 3D descriptors to code for different, yet complementary molecular properties.


Subjects
Proteins; Quantitative Structure-Activity Relationship; Ligands; Molecular Conformation; Algorithms
3.
Soc Media Soc ; 8(4): 20563051221126051, 2022.
Article in English | MEDLINE | ID: mdl-36245701

ABSTRACT

The coronavirus disease 2019 (COVID-19) pandemic was an unexpected event and resulted in catastrophic consequences with long-lasting behavioral effects. People began to seek explanations for different aspects of COVID-19 and resorted to conspiracy narratives. The objective of this article is to analyze the changes in the discussion of different COVID-19 conspiracy theories throughout the pandemic on Twitter. We have collected a data set of 1.269 million tweets associated with the discussion on conspiracy theories between January 2020 and November 2021. The data set includes tweets related to eight conspiracy theories: the 5G, Big Pharma, Bill Gates, biological weapon, exaggeration, FilmYourHospital, genetically modified organism (GMO), and the vaccines conspiracy. The analysis highlights several behaviors in the discussion of conspiracy theories and allows categorizing them into four groups. The first group consists of conspiracy theories that peaked at the beginning of the pandemic and sharply declined afterwards, including the 5G and FilmYourHospital conspiracies. The second group comprises the Big Pharma and vaccination-related conspiracies, whose role increased as the pandemic progressed. The third group consists of conspiracies that remained persistent throughout the pandemic, such as the exaggeration and Bill Gates conspiracies. The fourth group consists of those that had multiple peaks at different times of the pandemic, including the GMO and biological weapon conspiracies. In addition, the number of new COVID-19 cases was found to be a significant predictor of the following week's tweet frequency for most of the conspiracies.

4.
Mol Inform ; 41(1): e2000173, 2022 01.
Article in English | MEDLINE | ID: mdl-32985106

ABSTRACT

The ever-growing data acquisition speed represents a challenge for data analysis in materials sciences in general and the field of solar cells in particular. This is because many unsupervised and supervised learning algorithms require model re-derivation when presented with new samples which are markedly different from those used for model construction. Dynamic segmentation addresses this problem by continuously updating the cluster structure, for example, by splitting old clusters or opening new ones, as new samples are presented. In this work we present the application of a Dynamic Classification Unit (DCU) to the study of the photovoltaic space. Using a database of 1165 metal oxide-based solar cells, constructed from five libraries, we demonstrate that the DCU algorithm, when initiated with only 10 % of the database, correctly classified 82 % of the remaining 90 % of the samples. At the same time the algorithm unveiled the presence of interesting trends, outliers and compositional activity cliffs. These abilities may prove useful for the analysis of the photovoltaic space and in turn may contribute to the design of solar cells with improved properties. We suggest that DCU and other dynamic clustering methods will find wide applications in the rapidly developing field of materials informatics.


Subjects
Algorithms; Materials Science; Cluster Analysis; Databases, Factual; Oxides/chemistry
5.
J Chem Inf Model ; 58(12): 2428-2439, 2018 12 24.
Article in English | MEDLINE | ID: mdl-30485100

ABSTRACT

Visualizing high-dimensional data by projecting them into a two- or three-dimensional space is a popular approach in many scientific fields, including computer-aided drug design and cheminformatics. In contrast, dimensionality reduction techniques have been far less explored for materials informatics. Nevertheless, similar to their usefulness in analyzing the space of, e.g., drug-like molecules, such techniques could provide useful insights on materials space, including an intuitive grasp of the overall distribution of samples, the identification of interesting trends, including the formation of materials clusters and the presence of activity cliffs and outliers, and rational navigation through this space in the search for new materials. Here we present the first application of four dimensionality reduction techniques, namely, principal component analysis (PCA), kernel PCA, Isomap, and diffusion map, to visualize and analyze a part of the materials space populated by solar cells made of metal oxides. Solar cells in general and metal-oxide-based solar cells in particular hold the promise of contributing to the world's search for clean and affordable energy resources. With the exception of PCA, these methods have seldom been used to visualize chemistry space and almost never been used to visualize materials space. For this purpose, we integrated five metal-oxide-based solar cell libraries into a uniform database and subjected it to dimensionality reduction by all four methods, comparing their performances using various criteria such as maintaining the local environment of samples and the clustering structure in the low-dimensional space. We also looked at the number of outliers produced by each method and analyzed common outliers. 
We found that PCA performs best in terms of the ability to correctly maintain the local environment of samples, whereas Isomap does the best job of assigning class membership on the basis of the identities of nearest neighbors (i.e., it is the best classifier). We also found that many of the outliers identified by all of the methods could be rationalized. We suggest that the methods used in this work could be extended to study other types of solar cells, thereby setting the ground for further analysis of the photovoltaic (PV) space as well as other regions of materials space.


Subjects
Data Mining; Small Molecule Libraries; Solar Energy; Materials Science
6.
Mol Inform ; 37(9-10): e1800067, 2018 09.
Article in English | MEDLINE | ID: mdl-30022619

ABSTRACT

This work describes the integration of several data mining and machine learning tools for researching photovoltaic (PV) solar cell libraries into a unified workflow embedded within a GUI-supported Decision Support System (DSS), named PV Analyzer. The analyzer's workflow is composed of several data analysis components including basic statistical and visualization methods as well as an algorithm for building predictive machine learning models. The analyzer allows for the identification of interesting trends within the libraries, not easily observable using simple bi-parametric correlations. This may lead to new insights into factors affecting solar cell performance with the ultimate goal of designing better solar cells. The analyzer was developed using MATLAB version R2014a and consequently could be easily extended by adding additional tools and algorithms. Furthermore, while in our hands the analyzer has been primarily used in the area of PV cells, it is equally applicable to the analysis of any other dataset composed of activities as dependent variables and descriptors as independent variables.


Subjects
Machine Learning; Software; Solar Energy; Quantitative Structure-Activity Relationship
7.
Front Chem ; 6: 162, 2018.
Article in English | MEDLINE | ID: mdl-29868564

ABSTRACT

Data mining approaches can uncover underlying patterns in chemical and pharmacological property space decisive for drug discovery and development. Two of the most common approaches are visualization and machine learning methods. Visualization methods use dimensionality reduction techniques in order to reduce multi-dimension data into 2D or 3D representations with a minimal loss of information. Machine learning attempts to find correlations between specific activities or classifications for a set of compounds and their features by means of recurring mathematical models. Both models take advantage of the different and deep relationships that can exist between features of compounds, and helpfully provide classification of compounds based on such features or in case of visualization methods uncover underlying patterns in the feature space. Drug-likeness has been studied from several viewpoints, but here we provide the first implementation in chemoinformatics of the t-Distributed Stochastic Neighbor Embedding (t-SNE) method for the visualization and the representation of chemical space, and the use of different machine learning methods separately and together to form a new ensemble learning method called AL Boost. The models obtained from AL Boost synergistically combine decision tree, random forests (RF), support vector machine (SVM), artificial neural network (ANN), k nearest neighbors (kNN), and logistic regression models. In this work, we show that together they form a predictive model that not only improves the predictive force but also decreases bias. This resulted in a corrected classification rate of over 0.81, as well as higher sensitivity and specificity rates for the models. In addition, separation and good models were also achieved for disease categories such as antineoplastic compounds and nervous system diseases, among others. 
Such models can be used to guide decisions on the feature landscape of compounds and their likeness to either drugs or other characteristics, such as specific or multiple disease categories or organs of action of a molecule.

8.
J Cheminform ; 9(1): 34, 2017 Jun 06.
Article in English | MEDLINE | ID: mdl-29086047

ABSTRACT

An important aspect of chemoinformatics and material-informatics is the usage of machine learning algorithms to build Quantitative Structure Activity Relationship (QSAR) models. The RANdom SAmple Consensus (RANSAC) algorithm is a predictive modeling tool widely used in the image processing field for cleaning datasets from noise. RANSAC could be used as a "one stop shop" algorithm for developing and validating QSAR models, performing outlier removal, descriptor selection, model development and predictions for test set samples using applicability domain. For "future" predictions (i.e., for samples not included in the original test set) RANSAC provides a statistical estimate for the probability of obtaining reliable predictions, i.e., predictions within a pre-defined number of standard deviations from the true values. In this work we describe the first application of RANSAC in material informatics, focusing on the analysis of solar cells. We demonstrate that for three datasets representing different metal oxide (MO) based solar cell libraries RANSAC-derived models select descriptors previously shown to correlate with key photovoltaic properties and lead to good predictive statistics for these properties. These models were subsequently used to predict the properties of virtual solar cell libraries, highlighting interesting dependencies of PV properties on MO compositions.
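The general RANSAC idea named in this abstract (fit a model to small random samples, keep the model with the largest consensus set, then refit on that set) can be illustrated with a minimal sketch. It is a textbook one-descriptor line fit, not the authors' QSAR workflow; the function names, the tolerance `tol`, the iteration count and the toy data are all assumptions made for illustration.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ransac_line(points, n_iter=200, tol=0.5, seed=0):
    """Return (slope, intercept, inliers) for the largest consensus set found."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        # Minimal sample: two points define a candidate line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical line cannot be written as y = a*x + b
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        # Consensus set: points whose residual is within the tolerance.
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Final model: least-squares refit on the consensus set only.
    slope, intercept = fit_line(best_inliers)
    return slope, intercept, best_inliers


# Hypothetical toy data: 20 points on y = 2x + 1 plus 5 gross outliers.
clean = [(float(x), 2.0 * x + 1.0) for x in range(20)]
noisy = [(20.0, -30.0), (21.0, 80.0), (22.0, -10.0), (23.0, 95.0), (24.0, 5.0)]
slope, intercept, inliers = ransac_line(clean + noisy)
```

In this toy case the outliers are excluded from the consensus set, so the refit is driven only by the points that agree with the dominant trend. This is the sense in which RANSAC combines outlier removal with model development.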

9.
Mol Inform ; 35(11-12): 622-628, 2016 12.
Article in English | MEDLINE | ID: mdl-27870244

ABSTRACT

Material informatics may provide meaningful insights and powerful predictions for the development of new and efficient Metal Oxide (MO) based solar cells. The main objective of this paper is to establish the usefulness of data reduction and visualization methods for analyzing data sets emerging from multiple all-MOs solar cell libraries. For this purpose, two libraries, TiO2|Co3O4 and TiO2|Co3O4|MoO3, differing only by the presence of a MoO3 layer in the latter, were analyzed with Principal Component Analysis and Self-Organizing Maps. Both analyses suggest that the addition of the MoO3 layer to the TiO2|Co3O4 library has affected the overall photovoltaic (PV) activity profile of the solar cells, making the two libraries clearly distinguishable from one another. Furthermore, while MoO3 had an overall favorable effect on PV parameters, a sub-population of cells was identified which were either indifferent to its presence or even demonstrated a reduction in several parameters.


Subjects
Data Mining/methods; Cobalt/chemistry; Molybdenum/chemistry; Oxides/chemistry; Solar Energy; Titanium/chemistry
10.
Mol Inform ; 35(11-12): 568-579, 2016 12.
Article in English | MEDLINE | ID: mdl-27870246

ABSTRACT

Material informatics is engaged with the application of informatic principles to materials science in order to assist in the discovery and development of new materials. Central to the field is the application of data mining techniques and in particular machine learning approaches, often referred to as Quantitative Structure Activity Relationship (QSAR) modeling, to derive predictive models for a variety of materials-related "activities". Such models can accelerate the development of new materials with favorable properties and provide insight into the factors governing these properties. Here we provide a comparison between medicinal chemistry/drug design and materials-related QSAR modeling and highlight the importance of developing new, materials-specific descriptors. We survey some of the most recent QSAR models developed in materials science with focus on energetic materials and on solar cells. Finally we present new examples of material-informatic analyses of solar cells libraries produced from metal oxides using combinatorial material synthesis. Different analyses lead to interesting physical insights as well as to the design of new cells with potentially improved photovoltaic parameters.


Subjects
Combinatorial Chemistry Techniques/methods; Information Science/methods; Materials Science/methods; Data Mining/methods; Drug Design; Metals/chemistry; Models, Statistical; Quantitative Structure-Activity Relationship
12.
J Chem Inf Model ; 55(12): 2507-18, 2015 Dec 28.
Article in English | MEDLINE | ID: mdl-26553402

ABSTRACT

Quantitative structure activity relationship (QSAR) or quantitative structure property relationship (QSPR) models are developed to correlate activities for sets of compounds with their structure-derived descriptors by means of mathematical models. The presence of outliers, namely, compounds that differ in some respect from the rest of the data set, compromises the ability of statistical methods to derive QSAR models with good prediction statistics. Hence, outliers should be removed from data sets prior to model derivation. Here we present a new multi-objective genetic algorithm for the identification and removal of outliers based on the k nearest neighbors (kNN) method. The algorithm was used to remove outliers from three different data sets of pharmaceutical interest (logBBB, factor 7 inhibitors, and dihydrofolate reductase inhibitors), and its performance was compared with that of five other methods for outlier removal. The results suggest that the new algorithm provides filtered data sets that (1) better maintain the internal diversity of the parent data sets and (2) give rise to QSAR models with much better prediction statistics. Equally good filtered data sets in terms of these metrics were obtained when another objective function was added to the algorithm (termed "preservation"), forcing it to remove certain compounds with low probability only. This option is highly useful when specific compounds should preferably be kept in the final data set either because they have favorable activities or because they represent interesting molecular scaffolds. We expect this new algorithm to be useful in future QSAR applications.


Subjects
Algorithms; Drug Discovery/methods; Models, Theoretical; Quantitative Structure-Activity Relationship; Models, Molecular
13.
J Comput Chem ; 36(8): 493-506, 2015 Mar 30.
Article in English | MEDLINE | ID: mdl-25503870

ABSTRACT

Datasets of molecular compounds often contain outliers, that is, compounds which are different from the rest of the dataset. Outliers, while often interesting, may affect data interpretation, model generation, and decision making, and therefore should be removed from the dataset prior to modeling efforts. Here, we describe a new method for the iterative identification and removal of outliers based on a k-nearest neighbors optimization algorithm. We demonstrate for three different datasets that the removal of outliers using the new algorithm provides filtered datasets which are better than those provided by four alternative outlier removal procedures as well as by random compound removal in two important aspects: (1) they better maintain the diversity of the parent datasets; (2) they give rise to quantitative structure activity relationship (QSAR) models with much better prediction statistics. The new algorithm is, therefore, suitable for the pretreatment of datasets prior to QSAR modeling.
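The abstract does not spell out the optimization algorithm, so the following is only a simplified stand-in for the idea of iterative, kNN-based outlier removal. It assumes that a compound's "outlierness" is its mean Euclidean distance to its k nearest neighbors in descriptor space, and it greedily drops the worst compound one at a time. The scoring rule, the function names and the two-descriptor toy data are assumptions, not the published method.

```python
import math


def knn_score(i, points, k):
    """Mean Euclidean distance from points[i] to its k nearest neighbors."""
    dists = sorted(math.dist(points[i], q)
                   for j, q in enumerate(points) if j != i)
    return sum(dists[:k]) / k


def remove_outliers(points, n_remove, k=3):
    """Iteratively drop the point with the largest kNN score.

    Scores are recomputed after every removal, because deleting one
    point changes the neighborhoods of the points that remain.
    """
    kept = list(points)
    removed = []
    for _ in range(n_remove):
        worst = max(range(len(kept)), key=lambda i: knn_score(i, kept, k))
        removed.append(kept.pop(worst))
    return kept, removed


# Hypothetical descriptor space: a tight 4 x 4 grid plus two distant points.
cluster = [(x * 0.1, y * 0.1) for x in range(4) for y in range(4)]
data = cluster + [(5.0, 5.0), (-4.0, 6.0)]
kept, removed = remove_outliers(data, n_remove=2)
```

Recomputing the scores inside the loop is what makes the procedure iterative rather than a one-shot filter. A real application would also need a stopping criterion instead of a fixed `n_remove`.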

14.
Mol Inform ; 34(6-7): 367-79, 2015 06.
Article in English | MEDLINE | ID: mdl-27490383

ABSTRACT

Growth in energy demands, coupled with the need for clean energy, is likely to make solar cells an important part of future energy resources. In particular, cells entirely made of metal oxides (MOs) have the potential to provide clean and affordable energy if their power conversion efficiencies are improved. Such improvements require the development of new MOs, which could benefit from combining combinatorial material sciences for producing solar cell libraries with data mining tools to direct synthesis efforts. In this work we developed a data mining workflow and applied it to the analysis of two recently reported solar cell libraries based on titanium and copper oxides. Our results demonstrate that QSAR models with good prediction statistics for multiple solar cell properties could be developed and that these models highlight important factors affecting these properties in accord with experimental findings. The resulting models are therefore suitable for designing better solar cells.


Subjects
Data Mining/methods; Machine Learning; Metals/chemistry; Models, Theoretical; Oxides/chemistry; Solar Energy
15.
J Chem Inf Model ; 54(6): 1567-77, 2014 Jun 23.
Article in English | MEDLINE | ID: mdl-24802762

ABSTRACT

Representative subsets selected from within larger data sets are useful in many chemoinformatics applications including the design of information-rich compound libraries, the selection of compounds for biological evaluation, and the development of reliable quantitative structure-activity relationship (QSAR) models. Such subsets can overcome many of the problems typical of diverse subsets, most notably the tendency of the latter to focus on outliers. Yet only a few algorithms for the selection of representative subsets have been reported in the literature. Here we report on the development of two algorithms for the selection of representative subsets from within parent data sets based on the optimization of a newly devised representativeness function either alone or simultaneously with the MaxMin function. The performances of the new algorithms were evaluated using several measures representing their ability to produce (1) subsets which are, on average, close to data set compounds; (2) subsets which, on average, span the same space as spanned by the entire data set; (3) subsets mirroring the distribution of biological indications in a parent data set; and (4) test sets which are well predicted by qualitative QSAR models built on data set compounds. We demonstrate that for three data sets (containing biological indication data, logBBB permeation data, and Plasmodium falciparum inhibition data), subsets obtained using the new algorithms are more representative than subsets obtained by hierarchical clustering, k-means clustering, or the MaxMin optimization at least in three of these measures.
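The MaxMin function mentioned in this abstract is commonly optimized with a greedy farthest-point heuristic, sketched below on hypothetical two-descriptor data. The newly devised representativeness function is not defined in the abstract and is therefore not reproduced here. The function name, the choice of starting compound and the data are assumptions.

```python
import math


def maxmin_select(points, n_select, start=0):
    """Greedy MaxMin: repeatedly add the point farthest from the current subset."""
    selected = [start]
    # min_dist[i] is the distance from point i to its closest selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < n_select:
        # Pick the point whose nearest selected neighbor is farthest away.
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected


# Hypothetical data: three near-identical compounds, one moderately
# different compound, and one extreme outlier at (9, 9).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (9.0, 9.0)]
picked = maxmin_select(points, n_select=3)
```

Starting from compound 0, the heuristic picks the outlier at index 4 first and the point at index 3 next, which illustrates the tendency of diversity-driven subsets to focus on outliers that the abstract contrasts with representative subsets.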


Subjects
Algorithms; Databases, Pharmaceutical; Drug Discovery/methods; Quantitative Structure-Activity Relationship; Antiparasitic Agents/pharmacology; Cluster Analysis; Humans; Malaria, Falciparum/drug therapy; Models, Biological; Models, Molecular; Plasmodium falciparum/drug effects
16.
J Phys Chem A ; 117(33): 7737-41, 2013 Aug 22.
Article in English | MEDLINE | ID: mdl-23886075

ABSTRACT

Medium variations usually affect the shape of the bimolecular nucleophilic reaction profile at the reactants' and products' ends and, to a much lesser extent, the shape around the transition state. In water, the reactions of extended allylic systems such as F(-) + H-(CH=CH)n-CH2-F → F-CH2-(CH=CH)n-H + F(-) have been computationally shown (for n = 2) to have a single transition state. As the polarity is decreased the transition state is gradually transformed into a double-humped profile that then changes smoothly through a triple-well profile into a single-well profile where the symmetric structure of the transition state is retained. The depth of the well is ca. 16 kcal/mol for n = 2 and reaches 40 kcal/mol for n = 7, resembling the stability of a weak chemical bond. This is traced to electrostatic effects as well as to the effect of an intermediate VB configuration. In the analogous polyynes, a stable adduct is already formed at n = 1. This is attributed to the formation of the relatively stable vinylic carbanion. As the number of acetylene units increases, the vinylic geometry (a CCC angle of 123°) is gradually lost until at n = 5 the adduct attains a linear geometry.

17.
J Phys Chem A ; 117(24): 5023-7, 2013 Jun 20.
Article in English | MEDLINE | ID: mdl-23705974

ABSTRACT

Computational studies at the B3LYP/6-31+G* level were carried out on the addition of pyridine to polyynes (C6-C18) and on the protonation of polyynes by methyl ammonium fluoride under electric fields of 2.5 and 5 MV/cm. The electric field in each case was oriented along the polyyne axis in a direction that enhances the reaction by stabilizing the incipient dipole. It was found that the reaction of pyridine addition is endothermic with a late transition state. The longer the polyyne and the stronger the field, the more efficient the electric field catalysis was. Extrapolation of the data to long polyynes shows that at 1000 nm an electric field of 50 000 V/cm will reduce the barrier by 10 kcal/mol. This reduction is equivalent to 7 orders of magnitude in rate enhancement. A similar barrier reduction could be achieved with a 2.5 MV/cm field at a polyyne length of 20 nm. Protonation reactions were found to be much more affected by the electric field. A reduction of the reaction barrier by 10 kcal/mol using a 2.5 MV/cm electric field could be achieved at a polyyne length of 10 nm. Thus the electric field along the long axis of a substrate could induce a gradient of reactivity which could, in principle, enable the barcoding of substrates by using a sequence of reactants having different reactivities.
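The quoted equivalence between a 10 kcal/mol barrier reduction and roughly 7 orders of magnitude in rate can be checked with a short calculation. The sketch assumes a simple Arrhenius-type dependence with an unchanged pre-exponential factor, so that the rate ratio is exp(ΔE / RT), and a temperature of 298.15 K. The abstract does not state a temperature, so that value and the function name are assumptions.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def rate_enhancement_orders(barrier_drop_kcal, temperature_k=298.15):
    """Orders of magnitude of rate gain for a given barrier reduction.

    Assumes k_new / k_old = exp(barrier_drop / (R * T)), i.e. only the
    activation barrier changes; the result is log10 of that ratio.
    """
    return barrier_drop_kcal / (R_KCAL * temperature_k * math.log(10))


orders = rate_enhancement_orders(10.0)
```

Under these assumptions the result is a little above 7, consistent with the figure quoted in the abstract; each 1.36 kcal/mol of barrier reduction corresponds to about one order of magnitude at room temperature.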
