ABSTRACT
This article proposes a methodology that uses machine learning algorithms to extract actions from structured chemical synthesis procedures, thereby bridging the gap between chemistry and natural language processing. The proposed pipeline combines ML algorithms and scripts to extract relevant data from USPTO and EPO patents and to transform experimental procedures into structured actions. It comprises two primary tasks: classifying patent paragraphs to select those that describe chemical procedures, and converting chemical procedure sentences into a structured, simplified format. We employ artificial neural networks: long short-term memory (LSTM) networks, bidirectional LSTMs, transformers, and a fine-tuned T5 model. Our results show that the bidirectional LSTM classifier achieved the highest accuracy (0.939) in the first task, while the Transformer model attained the highest BLEU score (0.951) in the second task. The developed pipeline enables the creation of a dataset of chemical reactions and their procedures in a structured format. Such a dataset makes the information contained in synthesis procedures easier for researchers to access and use, and it facilitates the application of AI-based approaches to streamline synthetic pathways, predict reaction outcomes, and optimize experimental conditions.
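To make the BLEU figure reported for the second task more concrete, the following is a minimal sketch of a sentence-level BLEU score: the geometric mean of clipped n-gram precisions multiplied by a brevity penalty. It is an illustration under stated assumptions, not the evaluation code used in the article. The function names, the whitespace tokenization, the single-reference setting, the absence of smoothing, and the example action strings are all hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed, single-reference BLEU on whitespace tokens (assumption)."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            # Without smoothing, one empty precision makes the score zero.
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Hypothetical structured-action strings, invented for illustration only.
reference = "ADD water ; STIR for 2 h ; FILTER"
prediction = "ADD water ; STIR for 2 h ; FILTER"
print(sentence_bleu(prediction, reference))
```

A prediction identical to the reference scores 1.0, and a prediction that shares no 4-gram with the reference scores 0.0 in this unsmoothed variant. Corpus-level BLEU, which is what is usually reported for a model, aggregates n-gram counts over all sentences before taking the precisions.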
ABSTRACT
In this research, we propose a process for developing normal-phase liquid chromatography solvent systems. In contrast to the development of conditions via thin-layer chromatography (TLC), this process is based on an architecture of two hierarchically connected neural network-based components. Trained on a large database of reaction procedures, these two components carry out the machine-learning-based prediction of chromatographic purification conditions, i.e., the solvents and the ratio between them. We build two datasets and test various molecular vectorization approaches, such as extended-connectivity fingerprints, learned embeddings, and auto-encoders, along with different types of deep neural networks, to demonstrate a novel method for modeling chromatographic solvent systems that employs two neural networks in sequence. We then present our findings and provide insights into the most effective methods for solving these prediction tasks. Our approach results in a system of two neural networks with long short-term memory (LSTM)-based auto-encoders: the first predicts solvent labels (classification accuracy of 0.950 ± 0.001) and, when two solvents are predicted, the second predicts the ratio between them (R² of 0.982 ± 0.001). Our approach can be used as a guidance instrument in laboratories to accelerate scouting for suitable chromatography conditions.
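The two-stage design described above can be sketched as a simple dispatch: a classifier proposes the solvent labels, and a regressor is consulted for the ratio only when exactly two solvents are predicted. The sketch also includes the coefficient of determination used to score the second stage. Everything named here (`predict_conditions`, `r_squared`, the stub models, the feature vector, and the solvent names) is a hypothetical placeholder for illustration, not the architecture or code from the paper.

```python
from typing import Callable, Optional, Sequence, Tuple

Features = Sequence[float]


def predict_conditions(
    features: Features,
    solvent_classifier: Callable[[Features], Tuple[str, ...]],
    ratio_regressor: Callable[[Features, Tuple[str, ...]], float],
) -> Tuple[Tuple[str, ...], Optional[float]]:
    """Run the first model, then the second only for two-solvent systems."""
    solvents = solvent_classifier(features)
    ratio = ratio_regressor(features, solvents) if len(solvents) == 2 else None
    return solvents, ratio


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Stub models standing in for trained networks (assumption, not real models).
def stub_classifier(features: Features) -> Tuple[str, ...]:
    return ("hexane", "ethyl acetate")


def stub_regressor(features: Features, solvents: Tuple[str, ...]) -> float:
    return 0.8  # hypothetical fraction of the first solvent


print(predict_conditions([0.1, 0.2, 0.3], stub_classifier, stub_regressor))
print(r_squared([0.5, 0.7, 0.9], [0.5, 0.7, 0.9]))
```

Keeping the ratio model behind a conditional means it is trained and evaluated only on two-solvent examples, which matches the abstract's statement that the second network applies "in the case of two solvents".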
ABSTRACT
Here, we report on the design, synthesis, and biological evaluation of 4-thiazolidinone (rhodanine) derivatives targeting Mycobacterium tuberculosis (Mtb) trans-2-enoyl-acyl carrier protein reductase (InhA). Compounds having bulky aromatic substituents at position 5 and a tryptophan residue at position N-3 of the rhodanine ring were the most active against InhA, with IC50 values ranging from 2.7 to 30 µM. The experimental data showed consistent correlations with computational studies. The antimicrobial activity of the compounds was assessed against Mycobacterium marinum (Mm) (a model for Mtb), Pseudomonas aeruginosa (Pa), Legionella pneumophila (Lp), and Enterococcus faecalis (Ef) by using anti-infective, antivirulence, and antibiotic assays. Nineteen out of 34 compounds reduced Mm virulence at 10 µM. Compound 33 exhibited promising antibiotic activity against Mm, with a MIC of 0.21 µM, and showed up to 89% reduction of Lp growth in an anti-infective assay at 30 µM. Compound 32 showed high antibiotic activity against Ef, with a MIC of 0.57 µM.