ABSTRACT
Using generative deep learning models and reinforcement learning together can effectively generate new molecules with desired properties. By employing a multi-objective scoring function, thousands of high-scoring molecules can be generated, making this approach useful for drug discovery and material science. However, the application of these methods can be hindered by computationally expensive or time-consuming scoring procedures, particularly when a large number of function calls are required as feedback in the reinforcement learning optimization. Here, we propose the use of double-loop reinforcement learning with simplified molecular-input line-entry system (SMILES) augmentation to improve the efficiency and speed of the optimization. By adding an inner loop that augments the generated SMILES strings to non-canonical SMILES for use in additional reinforcement learning rounds, we can both reuse the scoring calculations on the molecular level, thereby speeding up the learning process, and offer additional protection against mode collapse. We find that employing between 5 and 10 augmentation repetitions is optimal for the scoring functions tested and is further associated with an increased diversity in the generated compounds, improved reproducibility of the sampling runs, and the generation of molecules of higher similarity to known ligands.
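The double-loop idea can be illustrated with a control-flow sketch. This is only an assumption about how such a loop might be organized, not the authors' implementation: `sample`, `score`, `augment` and `update` are hypothetical callables supplied by the caller (for example, a cheminformatics toolkit would provide the SMILES randomization behind `augment`).

```python
from typing import Callable, List, Sequence


def double_loop_step(
    sample: Callable[[], List[str]],
    score: Callable[[str], float],
    augment: Callable[[str], str],
    update: Callable[[Sequence[str], Sequence[float]], None],
    n_augment: int = 5,
) -> List[float]:
    """Run one outer RL step plus n_augment inner steps that reuse the scores.

    All four callables are hypothetical placeholders for the agent's sampler,
    the scoring function, a SMILES randomizer and the policy update.
    """
    smiles_batch = sample()                      # outer loop: agent proposes SMILES
    scores = [score(s) for s in smiles_batch]    # expensive scoring, once per molecule
    update(smiles_batch, scores)                 # regular RL update
    for _ in range(n_augment):                   # inner loop
        # alternative SMILES strings for the same molecules
        variants = [augment(s) for s in smiles_batch]
        # the molecule-level scores are reused, not recomputed
        update(variants, scores)
    return scores
```

With `n_augment=5`, each batch triggers six policy updates but only one round of scoring calls, which is where the saving comes from when scoring dominates the run time.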
Subjects
Drug Design , Neural Networks, Computer , Reproducibility of Results , Drug Discovery/methods
ABSTRACT
Classification is a very common image processing task. The accuracy of the classified map is typically assessed through a comparison with real-world situations or with available reference data to estimate the reliability of the classification results. Common accuracy assessment approaches are based on an error matrix and provide a measure of the overall accuracy. A frequently used index is the Kappa index. As the Kappa index has increasingly been criticized, various alternative measures have been investigated, with minimal success in practice. In this article, we introduce a novel index that overcomes these limitations. Unlike Kappa, it is not sensitive to asymmetric distributions. The quantity and allocation disagreement index (QADI) computes the degree of disagreement between the classification results and reference maps by counting wrongly labeled pixels as A and quantifying the difference in the pixel count for each class between the classified map and reference data as Q. These values are then used to determine a quantitative QADI value, which indicates the degree of disagreement and difference between a classification result and training data. It can also be used to generate a graph that indicates the degree to which each factor contributes to the disagreement. The efficiency of Kappa and QADI was compared in six use cases. The results indicate that the QADI index generates more reliable classification accuracy assessments than the traditional Kappa. We also developed a toolbox in a GIS software environment.
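The error-matrix quantities mentioned above can be sketched as follows. This is a minimal illustration, assuming the commonly used definitions of Cohen's Kappa and of quantity and allocation disagreement; the exact formula that combines A and Q into the QADI value is not given in the abstract and is deliberately not reproduced here. The function name and the matrix orientation (rows: classified, columns: reference) are assumptions.

```python
from typing import Dict, List


def agreement_summary(matrix: List[List[int]]) -> Dict[str, float]:
    """Summarise a square error matrix (rows: classified, columns: reference)."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_tot = [sum(matrix[i]) for i in range(k)]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]

    observed = sum(matrix[i][i] for i in range(k)) / n          # overall accuracy
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    # Kappa is undefined when chance agreement is already 1 (single class).
    kappa = (observed - expected) / (1 - expected) if expected < 1 else float("nan")

    # Quantity disagreement: mismatch in class totals between map and reference.
    quantity = sum(abs(row_tot[i] - col_tot[i]) for i in range(k)) / (2 * n)
    # Allocation disagreement: the remaining part of the total disagreement.
    allocation = (1 - observed) - quantity

    return {
        "overall_accuracy": observed,
        "kappa": kappa,
        "quantity_disagreement": quantity,
        "allocation_disagreement": allocation,
    }
```

For the matrix `[[40, 10], [5, 45]]`, the overall accuracy is 0.85 and the total disagreement of 0.15 splits into a quantity part of 0.05 and an allocation part of 0.10.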
Subjects
Image Processing, Computer-Assisted , Remote Sensing Technology , Image Processing, Computer-Assisted/methods , Remote Sensing Technology/methods , Reproducibility of Results , Software
ABSTRACT
This work evaluates the performance of three machine learning (ML) techniques, namely logistic regression (LGR), linear regression (LR), and support vector machines (SVM), and two multi-criteria decision-making (MCDM) techniques, namely analytical hierarchy process (AHP) and the technique for order of preference by similarity to ideal solution (TOPSIS), for mapping landslide susceptibility in the Chitral district, northern Pakistan. Moreover, we create landslide inventory maps from LANDSAT-8 satellite images through the change vector analysis (CVA) change detection method. The change detection yields more than 500 landslide spots. After some manual post-processing correction, the landslide inventory spots are randomly split into two sets with a 70/30 ratio for training and validating the performance of the ML techniques. Sixteen topographical, hydrological, and geological landslide-related factors of the study area are prepared as GIS layers. They are used to produce landslide susceptibility maps (LSMs) with weighted overlay techniques using different weights of landslide-related factors. The accuracy assessment shows that the ML techniques outperform the MCDM methods, while SVM yields the highest accuracy of 88% for the resulting LSM.
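For readers unfamiliar with TOPSIS, the ranking step can be sketched generically. This is a textbook-style sketch under stated assumptions (vector normalization, Euclidean distances, no all-zero criterion column); the function name, the weights and the example values are hypothetical and do not reflect the factor weights used in the study.

```python
import math
from typing import List, Sequence


def topsis(
    matrix: Sequence[Sequence[float]],
    weights: Sequence[float],
    benefit: Sequence[bool],
) -> List[float]:
    """Return the closeness to the ideal solution for each alternative (row)."""
    n_crit = len(weights)
    # Vector-normalise each criterion column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]

    columns = list(zip(*weighted))
    # Ideal: best value per criterion; anti-ideal: worst value per criterion.
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]

    closeness = []
    for row in weighted:
        d_pos = math.dist(row, ideal)   # distance to the ideal solution
        d_neg = math.dist(row, anti)    # distance to the anti-ideal solution
        total = d_pos + d_neg
        closeness.append(d_neg / total if total > 0 else 0.5)
    return closeness
```

A closeness of 1 means the alternative coincides with the ideal solution and 0 means it coincides with the anti-ideal one; in a susceptibility setting each map cell would be one alternative.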
Subjects
Landslides , Geographic Information Systems , Logistic Models , Pakistan , Support Vector Machine
ABSTRACT
In many parts of the world, lake drying is caused by water management failures, while the phenomenon is exacerbated by climate change. Lake Urmia in Northern Iran is drying up at such an alarming rate that it is considered to be a dying lake, which has dire consequences for the whole region. While salinization caused by a dying lake is well understood and known to influence the local and regional food production, other potential impacts by dying lakes are as yet unknown. The food production in the Urmia region is predominantly regional and relies on local water sources. To explore the current and projected impacts of the dying lake on food production, we investigated changes in the climatic conditions, land use, and land degradation for the period 1990-2020. We examined the environmental impacts of lake drought on food production using an integrated scenario-based geoinformation framework. The results show that the lake drought has significantly affected and reduced food production over the past three decades. Based on a combination of cellular automaton and Markov modeling, we project the food production for the next 30 years and predict it will reduce further. The results of this study emphasize the critical environmental impacts of the Urmia Lake drought on food production in the region. We hope that the results will encourage authorities and environmental planners to counteract these issues and take steps to support food production. As our proposed integrated geoinformation approach considers both the extensive impacts of global climate change and the factors associated with dying lakes, we consider it to be suitable to investigate the relationships between environmental degradation and scenario-based food production in other regions with dying lakes around the world.
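The Markov part of a cellular automaton-Markov projection can be sketched as below. This is a simplified illustration only: it estimates class-to-class transition probabilities from two co-registered land-use maps and projects class shares forward, and it omits the cellular automaton step that allocates the changes spatially. The function names and class labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence

Matrix = Dict[str, Dict[str, float]]


def class_shares(cells: Sequence[str], classes: Sequence[str]) -> Dict[str, float]:
    """Fraction of cells in each land-use class."""
    counts = Counter(cells)
    return {c: counts[c] / len(cells) for c in classes}


def transition_matrix(before: Sequence[str], after: Sequence[str],
                      classes: Sequence[str]) -> Matrix:
    """Estimate transition probabilities from two co-registered maps."""
    counts = {src: Counter() for src in classes}
    for src, dst in zip(before, after):
        counts[src][dst] += 1
    matrix: Matrix = {}
    for src in classes:
        total = sum(counts[src].values())
        # A class absent from the first map is assumed to stay unchanged.
        matrix[src] = {
            dst: counts[src][dst] / total if total else float(dst == src)
            for dst in classes
        }
    return matrix


def project_shares(shares: Dict[str, float], matrix: Matrix, steps: int) -> Dict[str, float]:
    """Apply the transition matrix repeatedly to the class shares."""
    classes = list(matrix)
    for _ in range(steps):
        shares = {dst: sum(shares[src] * matrix[src][dst] for src in classes)
                  for dst in classes}
    return shares
```

Each projection step covers the same time interval as the gap between the two input maps, so projecting 30 years from maps 10 years apart would take three steps.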
Subjects
Environmental Monitoring , Lakes , Climate Change , Iran , Water , Water Supply
ABSTRACT
This study deals with the issue of greenwashing, i.e. the false portrayal of companies as environmentally friendly. The analysis focuses on the US metal industry, which is a major emission source of sulfur dioxide (SO2), one of the most harmful air pollutants. One way to monitor the distribution of atmospheric SO2 concentrations is through satellite data from the Sentinel-5P programme, which represents a major advance due to its unprecedented spatial resolution. In this paper, Sentinel-5P remote sensing data was combined with a plant-level firm database to investigate the relationship between the US metal industry and SO2 concentrations using a spatial regression analysis. Additionally, this study considered web text data, classifying companies based on their websites in order to depict their self-portrayal on the topic of sustainability. In doing so, we investigated the topic of greenwashing, i.e. whether or not a positive self-portrayal regarding sustainability is related to lower local SO2 concentrations. Our results indicated a general, positive correlation between the number of employees in the metal industry and local SO2 concentrations. The web-based analysis showed that only 8% of companies in the metal industry could be classified as engaged in sustainability based on their websites. The regression analyses indicated that these self-reported "sustainable" companies had a weaker effect on local SO2 concentrations compared to their "non-sustainable" counterparts, which we interpreted as an indication of the absence of general greenwashing in the US metal industry. However, the large share of firms without a website and lack of specificity of the text classification model were limitations to our methodology.
Subjects
Air Pollutants , Air Pollution , Air Pollutants/analysis , Air Pollution/analysis , Data Mining , Environmental Monitoring , Humans , Industry , Metals/analysis , Regression Analysis , Sulfur Dioxide/analysis
ABSTRACT
Exploring the origin of multi-target activity of small molecules and designing new multi-target compounds are highly topical issues in pharmaceutical research. We have investigated the ability of a generative neural network to create multi-target compounds. Data sets of experimentally confirmed multi-target, single-target, and consistently inactive compounds were extracted from public screening data considering positive and negative assay results. These data sets were used to fine-tune the REINVENT generative model via transfer learning to systematically recognize multi-target compounds, distinguish them from single-target or inactive compounds, and construct new multi-target compounds. During fine-tuning, the model showed a clear tendency to increasingly generate multi-target compounds and structural analogs. Our findings indicate that generative models can be adopted for de novo multi-target compound design.
Subjects
Drug Design , Neural Networks, Computer
ABSTRACT
Earthquakes and heavy rainfalls are the two leading causes of landslides around the world. Since they often occur across large areas, landslide detection requires rapid and reliable automatic detection approaches. Currently, deep learning (DL) approaches, especially different convolutional neural network and fully convolutional network (FCN) algorithms, are reliably achieving cutting-edge accuracies in automatic landslide detection. However, these successful applications of various DL approaches have thus far been based on very high resolution satellite images (e.g., GeoEye and WorldView), making it easier to achieve such high detection performances. In this study, we use freely available Sentinel-2 data and ALOS digital elevation model to investigate the application of two well-known FCN algorithms, namely the U-Net and residual U-Net (or so-called ResU-Net), for landslide detection. To our knowledge, this is the first application of FCN for landslide detection only from freely available data. We adapt the algorithms to the specific aim of landslide detection, then train and test with data from three different case study areas located in Western Taitung County (Taiwan), Shuzheng Valley (China), and Eastern Iburi (Japan). We characterize three different window size sample patches to train the algorithms. Our results also contain a comprehensive transferability assessment achieved through different training and testing scenarios in the three case studies. The highest f1-score value of 73.32% was obtained by ResU-Net, trained with a dataset from Japan, and tested on China's holdout testing area using the sample patch size of 64 × 64 pixels.
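The F1-score reported above is the harmonic mean of precision and recall for the landslide class. A minimal sketch of the pixel-wise computation follows, assuming flattened binary masks (1 for landslide, 0 for background); the function name and the convention of returning 0.0 when there are no true positives are assumptions.

```python
from typing import Sequence


def f1_score(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Pixel-wise F1-score for binary masks (1 = landslide, 0 = background)."""
    tp = sum(1 for p, r in zip(predicted, reference) if p and r)
    fp = sum(1 for p, r in zip(predicted, reference) if p and not r)
    fn = sum(1 for p, r in zip(predicted, reference) if not p and r)
    if tp == 0:
        return 0.0  # convention chosen here; precision or recall is 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because true negatives do not enter the formula, the score is not inflated by the large background area that is typical of landslide scenes.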
ABSTRACT
The world's poorest countries were hit hardest by COVID-19 due to their limited capacities to combat the pandemic. Urban water supply and water consumption have been affected by the pandemic, which intensified the existing deficits in urban water supply and sanitation services. In this study, we develop an integrated spatial analysis approach to investigate the impacts of COVID-19 on multi-dimensional Urban Water Consumption Patterns (UWCPs) with the aim of forecasting the water demand. We selected the Tabriz metropolitan area as a case study area and applied an integrated approach of GIS spatial analysis and regression-based autocorrelation assessment to develop the UWCPs for 2018, 2019 and 2020. We then employed GIS-based multi-criteria decision analysis and a CA-Markov model to analyze the water demand under the impacts of COVID-19 and to forecast the UWCPs for 2021, 2022 and 2023. In addition, we tested the spatial uncertainty of the prediction maps using the Dempster-Shafer theory. The results show that domestic water consumption increased by 17.57% during 2020 as a result of the COVID-19 pandemic. The maximum increase in water consumption was observed in spring 2020 (April-June), when strict quarantine regulations were in place. Based on our results, the annual water deficit in Tabriz increased from about 18% to about 30% in 2020. In addition, our projections show that this may further increase to about 40-45% in 2021. Relevant stakeholders can use the findings to develop evidence-informed strategies for sustainable water resource management in the post-COVID era. This research also makes other significant contributions. From the environmental perspective, since COVID-19 has affected resource management in many parts of the world, the proposed method can be applied to similar contexts to mitigate the adverse impacts and develop better-informed recovery plans.
Subjects
COVID-19 , Pandemics , Humans , Iran/epidemiology , SARS-CoV-2 , Water , Water Supply
ABSTRACT
AIM: Generating a data and software infrastructure for evaluating multi-target compound (MT-CPD) design via deep generative modeling. METHODOLOGY: The REINVENT 2.0 approach for generative modeling was extended for MT-CPD design and a large benchmark data set was curated. EXEMPLARY RESULTS & DATA: Proof-of-concept for deep generative MT-CPD design was established. Custom code and the benchmark set comprising 2809 MT-CPDs, 61,928 single-target and 295,395 inactive compounds from biological screens are made freely available. LIMITATIONS & NEXT STEPS: MT-CPD design via deep learning is still at its conceptual stages. It will be required to demonstrate experimental impact. The data and software we provide enable further investigation of MT-CPD design and generation of candidate molecules for experimental programs.
ABSTRACT
Traditional soil salinity studies are time-consuming and expensive, especially over large areas. This study proposes an innovative data-driven deep learning convolutional neural network (DL-CNN) approach for SSD mapping. Multi-spectral remote sensing data, including the Landsat image series, make frequent assessment of SSD possible in various regions of the world. Therefore, Landsat 7 ETM+ and 8 OLI images were acquired for the years 2005, 2010, 2015 and 2019. In total, 704 sample points were collected from the top 20 cm of the soil surface, of which 70% were used to train the network and the remaining 30% to validate it. The DL-CNN model was trained using remote sensing (RS)-derived variables (land surface temperature (LST), soil moisture (SM) and evapotranspiration) and geospatial data such as NDVI and land use. To train the CNN, ReLU, cross-entropy and Adam were employed as the activation function, loss function and optimizer, respectively. The results indicated overall accuracy (OA) values of 0.94.02, 0.93.99, 0.94.87 and 0.95.0 for the years 2005, 2010, 2015 and 2019, respectively. These accuracies demonstrate that the automated DL-CNN performs better for SSD mapping than RS soil salinity indexes. Furthermore, the FR and WOE models were applied to generate a geospatial assessment of the DL-CNN classification results. According to the FR model, land use, LST, LST and NDVI, with frequency ratios of 0.98.25, 0.94.03, 0.97.23 and 0.96.36, were selected as the most effective factors for SSD in the study area for the years 2005, 2010, 2015 and 2019, respectively. Based on the WOE model, land use, LST, land use and NDVI, with WOE values of 0.88.25, 0.91.88, 0.87.43 and 0.89.02, were ranked as the most efficient variables for SSD for the years 2005, 2010, 2015 and 2019, respectively. In sum, the introduced method can be recommended for SSD spatial modelling in other areas with similar environmental conditions.
ABSTRACT
Land subsidence (LS) in arid and semi-arid areas, such as Iran, is a significant threat to sustainable land management. The purpose of this study is to predict the LS distribution by generating land subsidence susceptibility models (LSSMs) for the Shahroud plain in Iran using three different multi-criteria decision making (MCDM) and five different artificial intelligence (AI) models. The MCDM models we used are the VlseKriterijumska Optimizacija I Kompromisno Resenje (VIKOR), Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) and Complex Proportional Assessment (COPRAS), and the AI models are the extreme gradient boosting (XGBoost), Cubist, Elasticnet, Bayesian multivariate adaptive regression spline (BMARS) and conditional random forest (Cforest) methods. We used the Receiver Operating Characteristic (ROC) curve, Area Under Curve (AUC) and different statistical indices, i.e. accuracy, sensitivity, specificity, F score, Kappa, Mean Absolute Error (MAE) and Nash-Sutcliffe Criteria (NSC), to validate and evaluate the methods. Based on the different validation techniques, the Cforest method yielded the best results with minimum and maximum values of 0.04 and 0.99, respectively. According to the Cforest model, 30.55% of the study area is extremely vulnerable to land subsidence. The results of our research will be of great help to planners and policy makers in the identification of the most vulnerable regions and the implementation of appropriate development strategies in this area.
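The Area Under Curve used for validation can be computed without drawing the ROC curve, through its rank interpretation: the probability that a randomly chosen positive location receives a higher susceptibility score than a randomly chosen negative one. The sketch below assumes binary labels and counts ties as one half; the function name is hypothetical and the pairwise loop is meant for illustration rather than for large rasters.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0      # positive ranked above negative
            elif p == q:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of subsidence and non-subsidence locations.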
Subjects
Artificial Intelligence , Bayes Theorem , Iran , ROC Curve
ABSTRACT
Remote sensing sensors play an increasingly important role in the monitoring and evaluation of natural hazard susceptibility and risk. The present study aims to assess the flash-flood potential in a small catchment in Romania, using information provided by remote sensing sensors and Geographic Information Systems (GIS) databases as input data for four ensemble models. In a first phase, with the help of high-resolution satellite images from the Google Earth application, 481 points affected by torrential processes were acquired, and another 481 points were randomly positioned in areas without torrential processes. Seventy percent of the dataset was kept as training data, while the other 30% was assigned to the validation sample. Further, in order to train the machine learning models, information regarding the 10 flash-flood predictors was extracted at the training sample locations. Finally, the following four ensembles were used to calculate the Flash-Flood Potential Index (FFPI) across the Bâsca Chiojdului river basin: Deep Learning Neural Network-Frequency Ratio (DLNN-FR), Deep Learning Neural Network-Weights of Evidence (DLNN-WOE), Alternating Decision Trees-Frequency Ratio (ADT-FR) and Alternating Decision Trees-Weights of Evidence (ADT-WOE). The models' performance was assessed using several statistical metrics. In terms of sensitivity, the highest value (0.985) was achieved by the DLNN-FR model, while the lowest (0.866) was obtained by the ADT-FR ensemble. The specificity analysis shows that the highest value (0.991) was achieved by the DLNN-WOE algorithm and the lowest (0.892) by ADT-FR. During the training procedure, the models achieved overall accuracies between 0.878 (ADT-FR) and 0.985 (DLNN-WOE). The K-index again shows that the best-performing model was DLNN-WOE (0.97). The FFPI values revealed that the surfaces with high and very high flash-flood susceptibility cover between 46.57% (DLNN-FR) and 59.38% (ADT-FR) of the study zone. Validation with the Receiver Operating Characteristic (ROC) curve shows that the FFPI derived from DLNN-WOE gives the most precise results, with an Area Under Curve of 0.96.
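The two bivariate statistics that the ensembles build on can be written down compactly. The sketch below uses the standard definitions of the frequency ratio and of the weights of evidence (positive weight, negative weight and contrast) for a single predictor class; how the study feeds these coefficients into the neural network and decision-tree models is not described in the abstract and is not shown. Function names and cell counts are hypothetical, event cells are assumed to be included in the class and total cell counts, and degenerate classes (no events or no non-events) are not handled.

```python
import math
from typing import Tuple


def frequency_ratio(events_in_class: int, events_total: int,
                    cells_in_class: int, cells_total: int) -> float:
    """Share of events in the class divided by the share of area in the class."""
    return (events_in_class / events_total) / (cells_in_class / cells_total)


def weights_of_evidence(events_in_class: int, events_total: int,
                        cells_in_class: int, cells_total: int) -> Tuple[float, float, float]:
    """Return (W+, W-, contrast) for one predictor class."""
    nonevents_in_class = cells_in_class - events_in_class
    nonevents_total = cells_total - events_total
    p_class_given_event = events_in_class / events_total
    p_class_given_nonevent = nonevents_in_class / nonevents_total
    w_plus = math.log(p_class_given_event / p_class_given_nonevent)
    w_minus = math.log((1 - p_class_given_event) / (1 - p_class_given_nonevent))
    return w_plus, w_minus, w_plus - w_minus
```

A frequency ratio above 1 (equivalently, a positive W+) indicates that torrential points are over-represented in the class relative to its area.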
ABSTRACT
Compounds with the ability to interact with multiple targets, also called promiscuous compounds, provide the basis for polypharmacological drug discovery. In recent years, a plethora of structural analogs with different promiscuity has been identified. Nevertheless, the molecular origins of promiscuity remain to be elucidated. In this study, we systematically extracted different structural analogs with varying promiscuity using the matched molecular pair (MMP) formalism from public biological screening and medicinal chemistry data. Care was taken to eliminate all compounds with potential false-positive activity annotations from the analysis. Promiscuity predictions were then attempted at the level of compound pairs representing promiscuity cliffs (PCs; formed by analogs with large promiscuity differences) and corresponding non-PC MMPs (analog pairs without significant promiscuity differences). To address this prediction task, different machine learning models were generated and the results were compared with single compound predictions. PCs encoding promiscuity differences were found to contain more structure-promiscuity relationship information than sets of individual promiscuous compounds. In addition, feature analysis was carried out revealing key contributions to the correct prediction of PCs and non-PC MMPs via machine learning.
Subjects
Machine Learning , Polypharmacology , Deep Learning , Humans , Structure-Activity Relationship
ABSTRACT
The present research examines the landslide susceptibility in the Rudraprayag district of Uttarakhand, India, using the conditional probability (CP) statistical technique, the boosted regression tree (BRT) machine learning algorithm, and the CP-BRT ensemble approach to improve the accuracy of the BRT model. The models' outcomes were cross-checked using four folds of the data. The locations of existing landslides were detected by general field surveys and relevant records. A total of 220 previous landslide locations were obtained, presented as an inventory map, and divided into four folds to calibrate and validate the models. For modelling the landslide susceptibility, twelve landslide conditioning factors (LCFs) were used. Two statistical measures, i.e. the mean absolute error (MAE) and the root mean square error (RMSE), one statistical test, i.e. the Friedman rank test, as well as the receiver operating characteristic (ROC), efficiency and precision were used to validate the produced landslide models. The results of the accuracy measures revealed that all models have good potential to recognize the landslide susceptibility in the Garhwal Himalayan region. Among these models, the ensemble model achieved a higher accuracy (precision: 0.829, efficiency: 0.833, AUC: 89.460, RMSE: 0.069 and MAE: 0.141) than the individual models. According to the outcome of the ensemble simulations, the BRT model's predictive accuracy was enhanced by integrating it with the statistical model (CP). The study showed that areas of fallow land, plantation fields, and roadsides at elevations of more than 1500 m, with steep slopes of 24° to 87° and eroding hills, are highly susceptible to landslides. The findings of this work could help in minimizing the landslide risk in the Western Himalaya and its adjoining areas with similar landscapes and geological characteristics.
ABSTRACT
In de novo molecular design, recurrent neural networks (RNN) have been shown to be effective methods for sampling and generating novel chemical structures. Using a technique called reinforcement learning (RL), an RNN can be tuned to target a particular section of chemical space with optimized desirable properties using a scoring function. However, ligands generated by current RL methods so far tend to have relatively low diversity, and sometimes even result in duplicate structures when optimizing towards desired properties. Here, we propose a new method to address the low diversity issue in RL for molecular design. Memory-assisted RL is an extension of the known RL, with the introduction of a so-called memory unit. As proof of concept, we applied our method to generate structures with a desired AlogP value. In a second case study, we applied our method to design ligands for the dopamine type 2 receptor and the 5-hydroxytryptamine type 1A receptor. For both receptors, a machine learning model was developed to predict whether generated molecules were active or not for the receptor. In both case studies, it was found that memory-assisted RL led to the generation of more compounds predicted to be active having higher chemical diversity, thus achieving better coverage of chemical space of known ligands compared to established RL methods.
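One way to picture a memory unit is as a set of buckets that remember what has already been generated and withhold the reward once a region of chemical space is saturated. The sketch below is an assumption-laden illustration, not the published algorithm: the class name, the `key` function that assigns a molecule to a bucket (for example, a scaffold or a similarity cluster computed by a cheminformatics toolkit), the bucket size and the choice to zero the score for duplicates are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Hashable, Set


class MemoryUnit:
    """Remember generated molecules and withhold reward for crowded buckets."""

    def __init__(self, key: Callable[[str], Hashable], bucket_size: int) -> None:
        self.key = key                    # hypothetical: maps a SMILES to a bucket id
        self.bucket_size = bucket_size    # hypothetical capacity per bucket
        self.buckets: DefaultDict[Hashable, Set[str]] = defaultdict(set)

    def adjust(self, smiles: str, score: float) -> float:
        """Return the score to use as RL reward after consulting the memory."""
        bucket = self.buckets[self.key(smiles)]
        if smiles in bucket:
            return 0.0                    # exact duplicate: no reward
        if len(bucket) >= self.bucket_size:
            return 0.0                    # bucket saturated: push the agent elsewhere
        bucket.add(smiles)
        return score
```

In an RL loop, `adjust` would wrap the scoring function so that the agent stops being rewarded for revisiting the same structures and is nudged toward unexplored ones.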
ABSTRACT
In the past few years, we have witnessed a renaissance of the field of molecular de novo drug design. The advancements in deep learning and artificial intelligence (AI) have triggered an avalanche of ideas on how to translate such techniques to a variety of domains including the field of drug design. A range of architectures have been devised to find the optimal way of generating chemical compounds by using either graph- or string (SMILES)-based representations. With this application note, we aim to offer the community a production-ready tool for de novo design, called REINVENT. It can be effectively applied on drug discovery projects that are striving to resolve either exploration or exploitation problems while navigating the chemical space. It can facilitate the idea generation process by bringing to the researcher's attention the most promising compounds. REINVENT's code is publicly available at https://github.com/MolecularAI/Reinvent.
Subjects
Artificial Intelligence , Drug Design , Drug Discovery
ABSTRACT
Gully erosion is a form of natural disaster and one of the land loss mechanisms causing severe problems worldwide. This study aims to delineate the areas with the most severe gully erosion susceptibility (GES) using the machine learning techniques Random Forest (RF), Gradient Boosted Regression Tree (GBRT), Naïve Bayes Tree (NBT), and Tree Ensemble (TE). The gully inventory map (GIM) consists of 120 gullies, of which 84 (70%) were used for training and 36 (30%) were used to validate the models. Fourteen gully conditioning factors (GCFs) were used for GES modeling, and the relationships between the GCFs and gully erosion were assessed using the weight-of-evidence (WofE) model. The GES maps were prepared using RF, GBRT, NBT, and TE and were validated using the area under the receiver operating characteristic (AUROC) curve, the seed cell area index (SCAI) and five statistical measures: precision (PPV), false discovery rate (FDR), accuracy, mean absolute error (MAE), and root mean squared error (RMSE). Nearly 7% of the basin has high to very high susceptibility to gully erosion. Validation results demonstrated the excellent ability of these models to predict the GES. Of the analyzed models, the RF (AUROC = 0.96, PPV = 1.00, FDR = 0.00, accuracy = 0.87, MAE = 0.11, RMSE = 0.19 for the validation dataset) is accurate enough for modeling and better suited for GES modeling than the other models. Therefore, the RF model can be used to model the GES areas not only in this river basin but also in other areas with the same geo-environmental conditions.
ABSTRACT
Driver behavior is considered the most critical and uncertain criterion in the study of traffic safety. Identifying and categorizing driver behavior with the Fuzzy Analytic Hierarchy Process (FAHP) can address this uncertainty by capturing the ambiguity in drivers' thinking styles. The main goal of this paper is to examine the significant driver behavior criteria that influence traffic safety in different traffic cultures, namely Hungary, Turkey, Pakistan and China. The study used the FAHP framework to compare and quantify driver behavior criteria arranged in a three-level hierarchical structure. The FAHP procedure computed the weight factors and ranked the significant driver behavior criteria based on pairwise comparisons (PCs) of drivers' responses to the Driver Behavior Questionnaire (DBQ). At level 1, "violations" was the most significant driver behavior criterion in all regions except Hungary. At level 2, "aggressive violations" was the most significant criterion in all regions except Turkey. At level 3, Hungarian and Turkish drivers rated "drive with alcohol use" as the most significant criterion, while Pakistani and Chinese drivers rated "fail to yield pedestrian" as the most significant. Finally, Kendall's agreement test was performed to measure the degree of agreement between the observed groups at each level of the hierarchical structure. The methodology can easily be transferred to other study areas, and the results can help drivers in each region focus on the highlighted criteria to reduce fatal and serious-injury traffic accidents.
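The agreement test mentioned above can be illustrated with Kendall's coefficient of concordance, W. The sketch assumes that each group supplies a complete ranking of the same criteria without ties (no tie correction is applied); the function name and the example rankings are hypothetical and are not taken from the study.

```python
from typing import Sequence


def kendalls_w(rankings: Sequence[Sequence[int]]) -> float:
    """Kendall's W for m complete rankings of n items (ranks 1..n, no ties)."""
    m = len(rankings)          # number of groups (raters)
    n = len(rankings[0])       # number of ranked criteria
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    # Sum of squared deviations of the rank sums from their mean.
    s = sum((x - mean_sum) ** 2 for x in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))
```

W ranges from 0 (no agreement between the groups) to 1 (identical rankings); for example, the three rankings `[1, 2, 3]`, `[1, 3, 2]` and `[2, 1, 3]` give W of about 0.44.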
Subjects
Automobile Driving , Cultural Characteristics , Accidents, Traffic , Adult , China , Female , Humans , Hungary , Male , Pakistan , Safety , Turkey , Young Adult
ABSTRACT
A long-standing question in GIScience is whether geographic information systems (GIS) facilitate an adequate quantifiable representation of the concept of place. Considering the difficulties of quantifying elusive concepts related to place, several researchers focus on more tangible dimensions of the human understanding of place. The most common approaches are semantic enrichment of spatial information and holistic conceptualization of the notion of place. However, these approaches emphasize either space or human meaning, or they mainly exist as concepts without practically proven usable artifacts. A partial answer to this problem was proposed by the function-based model that treats place as functional space. This paper focuses primarily on the level of composition, describing and formalizing it as a rule-based framework with the following objectives: (a) contribute to the formalization efforts of the notion of place and its integration within GIS and (b) maintain tangible properties intertwined with the human understanding of place. The operationalization potential of the proposed framework is illustrated with an example of identifying the shopping areas in an urban region. The results show that the proposed model is able to capture all shopping malls as well as other areas that are not explicitly labeled as such but still function similarly to a shopping mall.