Results 1 - 20 of 91
1.
Digit Discov ; 3(1): 23-33, 2024 Jan 17.
Article in English | MEDLINE | ID: mdl-38239898

ABSTRACT

In light of the pressing need for practical materials and molecular solutions to renewable energy and health problems, to name just two examples, one wonders how to accelerate research and development in the chemical sciences, so as to shorten the time it takes to bring materials from initial discovery to commercialization. Artificial intelligence (AI)-based techniques, in particular, are having a transformative and accelerating impact on many, if not most, technological domains. To shed light on these questions, the authors and participants gathered in person for the ASLLA Symposium on the theme of 'Accelerated Chemical Science with AI' at Gangneung, Republic of Korea. We present the findings, ideas, comments, and often contentious opinions expressed during four panel discussions related to the respective general topics: 'Data', 'New applications', 'Machine learning algorithms', and 'Education'. All discussions were recorded, transcribed into text using OpenAI's Whisper, and summarized using LG AI Research's EXAONE LLM, followed by revision by all authors. For the broader benefit of current researchers, educators in higher education, and academic bodies such as associations, publishers, librarians, and companies, we provide chemistry-specific recommendations and summarize the resulting conclusions.

2.
Sci Rep ; 14(1): 552, 2024 01 04.
Article in English | MEDLINE | ID: mdl-38177656

ABSTRACT

In designing functional biological sequences with machine learning, the activity predictor tends to be inaccurate due to a shortage of data. Top-ranked sequences are thus unlikely to contain effective ones. This paper proposes to take prediction stability into account to provide domain experts with a reasonable list of sequences to choose from. In our approach, multiple prediction models are trained by subsampling the training set, and a multi-objective optimization problem is solved in which one objective is the average activity and the other is the standard deviation. The Pareto front represents a list of sequences with the whole spectrum of activity and stability. Using this method, we designed VHH (variable domain of the heavy chain of heavy-chain antibodies) antibodies based on the dataset obtained from deep mutational screening. To solve the multi-objective optimization problem, we employed our sequence design software MOQA, which uses quantum annealing. By applying several selection criteria to 19,778 designed sequences, five sequences were selected for wet-lab validation. One sequence, 16 mutations away from the closest training sequence, was successfully expressed and found to possess the desired binding specificity. Our whole-spectrum approach provides a balanced way of dealing with prediction uncertainty, and can possibly be applied to an extensive search of functional sequences.


Subject(s)
Antibodies , Protein Engineering , Machine Learning
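
The mean/standard-deviation trade-off described in this abstract can be illustrated with a small sketch. It assumes each candidate already has predictions from several subsampled models, and it uses a brute-force Pareto filter rather than the MOQA software or quantum annealing named in the abstract. The sequence names and numbers are hypothetical.

```python
from statistics import mean, pstdev

def summarize(predictions):
    """Map each sequence to (mean, std) of its ensemble predictions."""
    return {seq: (mean(p), pstdev(p)) for seq, p in predictions.items()}

def pareto_front(stats):
    """Keep sequences that no other sequence beats on both objectives
    (higher mean activity, lower standard deviation)."""
    front = []
    for seq, (m, s) in stats.items():
        dominated = any(
            (m2 >= m and s2 <= s) and (m2 > m or s2 < s)
            for other, (m2, s2) in stats.items()
            if other != seq
        )
        if not dominated:
            front.append(seq)
    return sorted(front)

# Hypothetical activity predictions from four subsampled models
predictions = {
    "SEQ_A": [0.9, 0.5, 0.7, 0.3],  # decent mean, large spread
    "SEQ_B": [0.5, 0.5, 0.5, 0.5],  # moderate mean, perfectly stable
    "SEQ_C": [0.4, 0.2, 0.4, 0.2],  # low mean, some spread
    "SEQ_D": [0.8, 0.8, 0.7, 0.9],  # high mean, small spread
}
front = pareto_front(summarize(predictions))
print(front)
```

Here the front keeps both the most active candidate and the most stable one, which is the kind of list a domain expert could then choose from.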
3.
Patterns (N Y) ; 4(12): 100890, 2023 Dec 08.
Article in English | MEDLINE | ID: mdl-38106611

ABSTRACT

Predictive pattern mining is an approach used to construct prediction models when the input is represented by structured data, such as sets, graphs, and sequences. The main idea behind predictive pattern mining is to build a prediction model by considering sub-structures, such as subsets, subgraphs, and subsequences (referred to as patterns), present in the structured data as features of the model. The primary challenge in predictive pattern mining lies in the exponential growth of the number of patterns with the complexity of the structured data. In this study, we propose the safe pattern pruning method to address the explosion of pattern numbers in predictive pattern mining. We also discuss how it can be effectively employed throughout the entire model building process in practical data analysis. To demonstrate the effectiveness of the proposed method, we conduct numerical experiments on regression and classification problems involving sets, graphs, and sequences.

4.
Patterns (N Y) ; 4(12): 100846, 2023 Dec 08.
Article in English | MEDLINE | ID: mdl-38106610

ABSTRACT

The efficient treatment of polymer waste is a major challenge for marine sustainability. It is useful to reveal the factors that dominate the degradability of polymer materials for developing polymer materials in the future. The small number of available datasets on degradability and the diversity of their experimental means and conditions hinder large-scale analysis. In this study, we have developed a platform for evaluating the degradability of polymers that is suitable for such data, using a rank-based machine learning technique based on RankSVM. We then made a ranking model to evaluate the degradability of polymers, integrating three datasets on the degradability of polymers that are measured by different means and conditions. Analysis of this ranking model with a decision tree revealed factors that dominate the degradability of polymers.
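
One way to see why a rank-based method suits datasets measured under different conditions: training pairs can be formed only within each dataset, so absolute values are never compared across datasets. The sketch below shows that pairwise construction and the hinge loss used by RankSVM-style models. The feature vectors, degradability values, and weight vector are hypothetical, and this is not the authors' platform.

```python
def pairwise_examples(datasets):
    """Build (x_i - x_j, label) examples from pairs inside the same dataset only.
    label is +1 if sample i is more degradable than sample j, else -1."""
    pairs = []
    for samples in datasets:
        for i, (xi, yi) in enumerate(samples):
            for xj, yj in samples[i + 1:]:
                if yi == yj:
                    continue  # ties carry no ranking information
                diff = [a - b for a, b in zip(xi, xj)]
                pairs.append((diff, 1 if yi > yj else -1))
    return pairs

def hinge_loss(w, pairs):
    """Average pairwise hinge loss of a linear ranking function w."""
    total = 0.0
    for diff, label in pairs:
        score = sum(wk * dk for wk, dk in zip(w, diff))
        total += max(0.0, 1.0 - label * score)
    return total / len(pairs)

# Two hypothetical datasets with incomparable measurement scales
dataset_1 = [([1.0, 0.0], 10.0), ([0.0, 1.0], 2.0)]
dataset_2 = [([2.0, 0.0], 0.9), ([0.0, 0.0], 0.1), ([1.0, 1.0], 0.5)]

pairs = pairwise_examples([dataset_1, dataset_2])
print(len(pairs), hinge_loss([1.0, 0.0], pairs))
```

A weight vector that orders every within-dataset pair correctly with margin reaches zero loss, even though the two datasets use different scales.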

5.
J Chem Theory Comput ; 19(19): 6770-6781, 2023 Oct 10.
Article in English | MEDLINE | ID: mdl-37729470

ABSTRACT

Density functional theory (DFT) is a significant computational tool that has substantially influenced chemistry, physics, and materials science. DFT necessitates parametrized approximation for determining an expected value. Hence, to predict the properties of a given molecule using DFT, appropriate parameters of the functional should be set for each molecule. Herein, we optimize the parameters of range-separated functionals (LC-BLYP and CAM-B3LYP) via Bayesian optimization (BO) to satisfy Koopmans' theorem. Our results demonstrate the effectiveness of the BO in optimizing functional parameters. Particularly, Koopmans' theorem-compliant LC-BLYP (KTLC-BLYP) shows results comparable to the experimental UV-absorption values. Furthermore, we prepared an optimized parameter dataset of KTLC-BLYP for over 3000 molecules through BO for satisfying Koopmans' theorem. We have developed a machine learning model on this dataset to predict the parameters of the LC-BLYP functional for a given molecule. The prediction model automatically predicts the appropriate parameters for a given molecule and calculates the corresponding values. The approach in this paper would be useful to develop new functionals and to update the previously developed functionals.
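
The condition being optimized can be written out as follows. This is a sketch of the commonly used Koopmans-type tuning criterion for a range-separation parameter, here called $\mu$ (an assumed symbol); the abstract does not state the exact objective the authors minimized, and CAM-B3LYP has additional parameters that are not shown.

```latex
% Koopmans-type condition: the HOMO energy of the N-electron system
% equals minus its ionization potential (IP)
\varepsilon_{\mathrm{HOMO}}^{N}(\mu) = -\,\mathrm{IP}(\mu),
\qquad
\mathrm{IP}(\mu) = E^{N-1}(\mu) - E^{N}(\mu)

% A scalar objective that an optimizer such as BO could minimize over \mu
J(\mu) = \left| \varepsilon_{\mathrm{HOMO}}^{N}(\mu) + E^{N-1}(\mu) - E^{N}(\mu) \right|
```

Each evaluation of $J(\mu)$ requires quantum chemical calculations on the neutral and ionized molecule, which is why a sample-efficient optimizer is attractive here.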

6.
Sci Rep ; 13(1): 14306, 2023 08 31.
Article in English | MEDLINE | ID: mdl-37653108

ABSTRACT

Automatic mitosis detection from video is an essential step in analyzing proliferative behaviour of cells. In existing studies, a conventional object detector such as Unet is combined with a link prediction algorithm to find correspondences between parent and daughter cells. However, they do not take into account the biological constraint that a cell in a frame can correspond to up to two cells in the next frame. Our model called GNN-DOL enables mitosis detection by complementing a graph neural network (GNN) with a differentiable optimization layer (DOL) that implements the constraint. In time-lapse microscopy sequences cultured under four different conditions, we observed that the layer substantially improved detection performance in comparison with GNN-based link prediction. Our results illustrate the importance of incorporating biological knowledge explicitly into deep learning models.


Subject(s)
Cell Nucleus Division , Mitosis , Neural Networks, Computer , Algorithms , Knowledge
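
The biological constraint in this abstract (a cell maps to at most two cells in the next frame) can be made concrete with a simple greedy stand-in. This is not the differentiable optimization layer of GNN-DOL; it only shows what the constraint does to a set of scored candidate links. The extra rule that each cell in the next frame has a single parent is an assumption of this sketch, and the scores and cell names are hypothetical.

```python
def select_links(scored_links, max_children=2):
    """Greedily keep the highest-scoring (score, parent, child) links so that
    each parent keeps at most `max_children` children and each child has one parent."""
    children = {}
    taken = set()
    kept = []
    for score, parent, child in sorted(scored_links, reverse=True):
        if child in taken or len(children.get(parent, [])) >= max_children:
            continue  # link would violate a constraint
        children.setdefault(parent, []).append(child)
        taken.add(child)
        kept.append((parent, child))
    return kept, children

def mitoses(children):
    """Parents that end up with exactly two children are mitosis candidates."""
    return sorted(p for p, c in children.items() if len(c) == 2)

# Hypothetical link scores between frame t (a, b) and frame t+1 (x, y, z)
links = [
    (0.9, "a", "x"),
    (0.8, "a", "y"),
    (0.7, "a", "z"),  # rejected: "a" already has two children
    (0.6, "b", "z"),
    (0.5, "b", "x"),  # rejected: "x" already has a parent
]
kept, children = select_links(links)
print(kept, mitoses(children))
```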
7.
ACS Med Chem Lett ; 14(5): 577-582, 2023 May 11.
Article in English | MEDLINE | ID: mdl-37197452

ABSTRACT

Increasing the variety of antimicrobial peptides is crucial in meeting the global challenge of multi-drug-resistant bacterial pathogens. While several deep-learning-based peptide design pipelines are reported, they may not be optimal in data efficiency. High efficiency requires a well-compressed latent space, where optimization is likely to fail due to numerous local minima. We present a multi-objective peptide design pipeline based on a discrete latent space and D-Wave quantum annealer with the aim of solving the local minima problem. To achieve multi-objective optimization, multiple peptide properties are encoded into a score using non-dominated sorting. Our pipeline is applied to design therapeutic peptides that are antimicrobial and non-hemolytic at the same time. From 200 000 peptides designed by our pipeline, four peptides proceeded to wet-lab validation. Three of them showed high antimicrobial activity, and two are non-hemolytic. Our results demonstrate how quantum-based optimizers can be taken advantage of in real-world medical studies.
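
Non-dominated sorting, which this abstract uses to fold several properties into one score, can be sketched as follows. The code assigns each candidate the index of the Pareto front it falls in (0 is best), assuming every property is to be maximized. How the authors convert front indices into the final score is not described in the abstract, and the property values below are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and
    strictly better on at least one (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def nondominated_ranks(points):
    """Return the front index of each point: 0 for the Pareto front,
    1 for the front that remains after removing front 0, and so on."""
    ranks = [None] * len(points)
    remaining = set(range(len(points)))
    front = 0
    while remaining:
        current = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in current:
            ranks[i] = front
        remaining -= current
        front += 1
    return ranks

# Hypothetical (antimicrobial score, non-hemolytic score) per peptide
peptides = [(0.9, 0.2), (0.5, 0.8), (0.4, 0.7), (0.3, 0.1)]
ranks = nondominated_ranks(peptides)
print(ranks)
```

The first two peptides trade one property against the other and share the best front, while the remaining two are each beaten on both properties by some other peptide.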

8.
J Chem Inf Model ; 63(8): 2360-2369, 2023 04 24.
Article in English | MEDLINE | ID: mdl-37036083

ABSTRACT

In the presence of structural data, one sometimes needs to compare 3D ligands. We design an overlay-free method to rank order 3D molecules in the pharmacophore feature space. The proposed encoding includes only two fittable parameters, is sparse, and is not too high dimensional. At the cost of an additional parameter, to delineate the binding site from a protein-ligand complex, the method can compare binding sites. The method was benchmarked on the LIT-PCBA data set for ligand-based virtual screening experiments and the sc-PDB and a Vertex data set when comparing binding sites. In similarity searches, the proposed method outperforms an open-source software doing optimal superposition of ligand-based pharmacophores and RDKit's 3D pharmacophore fingerprints. When comparing binding sites, the method is competitive with state-of-the-art approaches. On a single CPU core, up to 374,000 ligands/s or 132,000 binding sites/s can be rank ordered. The "AutoCorrelation of Pharmacophore Features" open-source software is released at https://github.com/tsudalab/ACP4.


Subject(s)
Pharmacophore , Software , Ligands , Binding Sites
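
An autocorrelation-style encoding is "overlay-free" because it depends only on distances between features, which do not change when a molecule is rotated or translated. The sketch below counts feature pairs by type pair and distance bin. It is not the ACP4 implementation; treating a distance cutoff and a bin width as the two fittable parameters is an assumption of this sketch, and the feature coordinates are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import dist

def autocorrelation(features, cutoff=10.0, bin_width=1.0):
    """Count feature pairs keyed by (type_1, type_2, distance bin).
    Pairs at or beyond `cutoff` are ignored; types are sorted so that
    (donor, acceptor) and (acceptor, donor) share one key."""
    counts = Counter()
    for (t1, p1), (t2, p2) in combinations(features, 2):
        d = dist(p1, p2)
        if d >= cutoff:
            continue
        key = (*sorted((t1, t2)), int(d // bin_width))
        counts[key] += 1
    return counts

# Hypothetical pharmacophore features: (type, 3D coordinates)
ligand = [
    ("donor", (0.0, 0.0, 0.0)),
    ("acceptor", (3.0, 0.0, 0.0)),
    ("donor", (0.0, 4.0, 0.0)),
]
# The same ligand translated in space: the encoding should not change
shifted = [(t, (x + 1.0, y + 2.0, z + 3.0)) for t, (x, y, z) in ligand]

print(autocorrelation(ligand))
```

Because the result is a sparse dictionary of counts, two molecules can be compared directly without first superimposing them.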
9.
MAbs ; 15(1): 2168470, 2023.
Article in English | MEDLINE | ID: mdl-36683172

ABSTRACT

Despite the advances in surface-display systems for directed evolution, variants with high affinity are not always enriched due to undesirable biases that increase target-unrelated variants during biopanning. Here, our goal was to design a library containing improved variants from the information of the "weakly enriched" library where functional variants were weakly enriched. Deep sequencing for the previous biopanning result, where no functional antibody mimetics were experimentally identified, revealed that weak enrichment was partly due to undesirable biases during phage infection and amplification steps. The clustering analysis of the deep sequencing data from appropriate steps revealed no distinct sequence patterns, but a Bayesian machine learning model trained with the selected deep sequencing data supplied nine clusters with distinct sequence patterns. Phage libraries were designed on the basis of the sequence patterns identified, and four improved variants with target-specific affinity (EC50 = 80-277 nM) were identified by biopanning. The selection and use of deep sequencing data without undesirable bias enabled us to extract the information on prospective variants. In summary, the use of appropriate deep sequencing data and machine learning with the sequence data has the possibility of finding sequence space where functional variants are enriched.


Subject(s)
Bacteriophages , Peptide Library , Carrier Proteins , Bayes Theorem , Prospective Studies , Bacteriophages/genetics , High-Throughput Nucleotide Sequencing
10.
J Chem Theory Comput ; 19(3): 713-717, 2023 Feb 14.
Article in English | MEDLINE | ID: mdl-36689311

ABSTRACT

Artificial force has been proven useful to get over energy barriers and quickly search a large portion of the energy landscape. This work proposes a method based on graph neural networks to optimize the choice of transformation patterns to examine and accelerate energy landscape exploration. In open search from glutathione, the search efficiency was largely improved in comparison to random selection. We also applied transfer learning from glutathione to tuftsin, resulting in further efficiency gains.

11.
Methods Mol Biol ; 2552: 125-139, 2023.
Article in English | MEDLINE | ID: mdl-36346589

ABSTRACT

This chapter describes the application of constrained geometric simulations for prediction of antibody structural dynamics. We utilize the constrained geometric simulation method FRODAN, which is a low computational complexity alternative to molecular dynamics (MD) simulations that can rapidly explore flexible motions in protein structures. FRODAN is highly suited for conformational dynamics analysis of large proteins, complexes, intrinsically disordered proteins, and dynamics that occur on longer biologically relevant time scales that are normally inaccessible to classical MD simulations. This approach predicts protein dynamics at an all-atom scale while retaining realistic covalent bonding, maintaining dihedral angles in energetically good conformations, and avoiding steric clashes, in addition to performing other geometric and stereochemical criteria checks. In this chapter, we apply FRODAN to showcase its applicability for probing functionally relevant dynamics of IgG2a, including large-amplitude domain-domain motions and motions of complementarity determining region (CDR) loops. As was suggested in previous experimental studies, our simulations show that antibodies can explore a large range of conformational space.


Subject(s)
Intrinsically Disordered Proteins , Molecular Dynamics Simulation , Protein Conformation , Complementarity Determining Regions , Antibodies
12.
J Chem Inf Model ; 62(18): 4427-4434, 2022 09 26.
Article in English | MEDLINE | ID: mdl-36074116

ABSTRACT

To obtain observable physical or molecular properties such as ionization potential and fluorescent wavelength with quantum chemical (QC) computation, multi-step computation manipulated by a human is required. Hence, automating the multi-step computational process and making it a black box that can be handled by anybody are important for effective database construction and fast realistic material design through the framework of black-box optimization where machine learning algorithms are introduced as a predictor. Here, we propose a Python library, QCforever, to automate the computation of some molecular properties and chemical phenomena induced by molecules. This tool requires only a molecule file to provide its observable properties, automating both the computation of molecular properties (ionization potential, fluorescence, etc.) and the output analysis that provides the multiple values used to evaluate a molecule. By incorporating the tool in black-box optimization, we can explore molecules that have the desired properties within the limitations of QC computation.


Subject(s)
Algorithms , Machine Learning , Databases, Factual , Humans
13.
Sci Rep ; 12(1): 13955, 2022 08 17.
Article in English | MEDLINE | ID: mdl-35977989

ABSTRACT

Within the microbial rhodopsin family, heliorhodopsins (HeRs) form a phylogenetically distinct group of light-harvesting retinal proteins with largely unknown functions. We have determined the 1.97 Å resolution X-ray crystal structure of Thermoplasmatales archaeon SG8-52-1 heliorhodopsin (TaHeR) in the presence of NaCl under acidic conditions (pH 4.5), which complements the known 2.4 Å TaHeR structure acquired at pH 8.0. The low pH structure revealed that the hydrophilic Schiff base cavity (SBC) accommodates a chloride anion to stabilize the protonated retinal Schiff base when its primary counterion (Glu-108) is neutralized. Comparison of the two structures at different pH revealed conformational changes connecting the SBC and the extracellular loop linking helices A-B. We corroborated this intramolecular signaling transduction pathway with computational studies, which revealed allosteric network changes propagating from the perturbed SBC to the intracellular and extracellular space, suggesting TaHeR may function as a sensory rhodopsin. This intramolecular signaling mechanism may be conserved among HeRs, as similar changes were observed for HeR 48C12 between its pH 8.8 and pH 4.3 structures. We additionally performed DEER experiments, which suggest that TaHeR forms possible dimer-of-dimer associations that may be integral to its putative functionality as a light sensor in binding a transducer protein.


Subject(s)
Chlorides , Schiff Bases , Binding Sites , Electron Spin Resonance Spectroscopy , Hydrogen-Ion Concentration , Rhodopsin/chemistry , Rhodopsins, Microbial/chemistry , Schiff Bases/chemistry , Signal Transduction
14.
Sci Technol Adv Mater ; 23(1): 352-360, 2022.
Article in English | MEDLINE | ID: mdl-35693890

ABSTRACT

Recently, artificial intelligence (AI)-enabled de novo molecular generators (DNMGs) have automated molecular design based on data-driven or simulation-based property estimates. In some domains like the game of Go where AI surpassed human intelligence, humans are trying to learn from AI about the best strategy of the game. To understand DNMG's strategy of molecule optimization, we propose an algorithm called characteristic functional group monitoring (CFGM). Given a time series of generated molecules, CFGM monitors statistically enriched functional groups in comparison to the training data. In the task of absorption wavelength maximization of pure organic molecules (consisting of H, C, N, and O), we successfully identified a strategic change from diketone and aniline derivatives to quinone derivatives. In addition, CFGM led us to a hypothesis that 1,2-quinone is an unconventional chromophore, which was verified with chemical synthesis. This study shows the possibility that human experts can learn from DNMGs to expand their ability to discover functional molecules.
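
The idea of monitoring functional groups that are enriched relative to the training data can be sketched with a smoothed frequency ratio. The abstract does not specify the statistic used by CFGM, so the log2 ratio with a pseudo-count below is only a stand-in. Functional groups are assumed to be already extracted as sets of labels per molecule, and all example data are hypothetical.

```python
from math import log2

def enrichment(generated, training, pseudo=1.0):
    """log2 ratio of smoothed functional-group frequencies,
    generated molecules versus training molecules."""
    groups = set().union(*generated, *training)
    scores = {}
    for g in groups:
        f_gen = (sum(g in m for m in generated) + pseudo) / (len(generated) + 2 * pseudo)
        f_trn = (sum(g in m for m in training) + pseudo) / (len(training) + 2 * pseudo)
        scores[g] = log2(f_gen / f_trn)
    return scores

# Hypothetical functional-group sets, one set per molecule
training = [{"aniline"}, {"diketone"}, {"aniline", "diketone"}, {"ether"}]
generated = [{"quinone"}, {"quinone", "aniline"}, {"quinone"}, {"diketone"}]

scores = enrichment(generated, training)
top = max(scores, key=scores.get)
print(top, round(scores[top], 3))
```

Running the same calculation on successive windows of the generated time series would show when the most enriched group changes, which is the kind of strategic shift the abstract reports.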

15.
Phys Chem Chem Phys ; 24(17): 10305-10310, 2022 May 04.
Article in English | MEDLINE | ID: mdl-35437567

ABSTRACT

Reaction path finding methods construct a graph connecting reactants and products in a quantum chemical energy landscape. They are useful in elucidating various reactions and provide footsteps for designing new reactions. Their enormous computational cost, however, limits their application to relatively simple reactions. This paper engages in accelerating reaction path finding by introducing the principles of algorithmic search. A new method called RRT/SC-AFIR is devised by combining rapidly exploring random tree (RRT) and single component artificial force induced reaction (SC-AFIR). Using 96 cores, our method succeeded in constructing a reaction graph for Fritsch-Buttenberg-Wiechell rearrangement within a time limit of 3 days, while the conventional methods could not. Our results illustrate that the algorithm theory provides refreshing and beneficial viewpoints on quantum chemical methodologies.
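
For readers unfamiliar with rapidly exploring random trees, the sketch below shows the basic RRT loop in a toy 2D space: sample a random target, find the nearest existing node, and extend a bounded step toward the target. It does not involve any quantum chemistry. How RRT/SC-AFIR maps this loop onto an energy landscape is not detailed in the abstract, so the 2D setting, step size, and bounds are assumptions made purely for illustration.

```python
import math
import random

def rrt(start, n_iter=200, step=0.5, bounds=(0.0, 10.0), seed=0):
    """Grow a tree from `start` by repeatedly stepping from the nearest
    node toward a uniformly sampled target point."""
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}  # node index -> index of its parent
    for _ in range(n_iter):
        target = (rng.uniform(*bounds), rng.uniform(*bounds))
        nearest = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], target))
        d = math.dist(nodes[nearest], target)
        if d == 0.0:
            continue  # target coincides with an existing node
        scale = min(1.0, step / d)  # never move farther than `step`
        x, y = nodes[nearest]
        new = (x + scale * (target[0] - x), y + scale * (target[1] - y))
        parent[len(nodes)] = nearest
        nodes.append(new)
    return nodes, parent

nodes, parent = rrt((5.0, 5.0), n_iter=50)
print(len(nodes))
```

Because targets are sampled uniformly, nodes near unexplored regions are selected for extension more often, which biases the tree toward unexplored space.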

16.
Sci Adv ; 8(10): eabj3906, 2022 Mar 11.
Article in English | MEDLINE | ID: mdl-35263133

ABSTRACT

Designing fluorescent molecules requires considering multiple interrelated molecular properties, as opposed to properties that correlate straightforwardly with molecular structure, such as light absorption of molecules. In this study, we have used a de novo molecule generator (DNMG) coupled with quantum chemical computation (QC) to develop fluorescent molecules, which are garnering significant attention in various disciplines. Using massive parallel computation (1024 cores, 5 days), the DNMG has produced 3643 candidate molecules. We have selected an unreported molecule and seven reported molecules and synthesized them. Photoluminescence spectrum measurements demonstrated that the DNMG can successfully design fluorescent molecules with 75% accuracy (n = 6/8) and create an unreported molecule that emits fluorescence detectable by the naked eye.

17.
J Chem Phys ; 156(4): 044117, 2022 Jan 28.
Article in English | MEDLINE | ID: mdl-35105077

ABSTRACT

To develop useful drugs and materials, chemists synthesize diverse molecules by trying various reactants and reaction routes. Toward automating this process, we propose a deep generative model, called cascaded variational autoencoder (casVAE), for synthesizable molecular design. It generates a reaction tree, where the reactants are chosen from commercially available compounds and the synthesis route is constructed as a tree of reaction templates. The first part of casVAE is designed to generate a molecule called a surrogate product, while the second part constructs a reaction tree that synthesizes it. In benchmarking, casVAE showed its ability to generate reaction trees that yield high-quality and synthesizable molecules. An implementation of casVAE is publicly available at https://github.com/tsudalab/rxngenerator.

18.
Molecules ; 27(3), 2022 Jan 26.
Article in English | MEDLINE | ID: mdl-35164065

ABSTRACT

The entry of SARS-CoV-2, the causative agent of COVID-19, into human host cells is mediated by the SARS-CoV-2 spike (S) glycoprotein, which critically depends on the formation of complexes involving the spike protein receptor-binding domain (RBD) and the human cellular membrane receptor angiotensin-converting enzyme 2 (hACE2). Using classical site density functional theory (SDFT) and structural bioinformatics methods, we investigate binding and conformational properties of these complexes and study the overlooked role of water-mediated interactions. Analysis of the three-dimensional reference interaction site model (3DRISM) of SDFT indicates that water-mediated interactions in the form of additional water bridges strongly increase the binding between the SARS-CoV-2 spike protein and hACE2 compared to the SARS-CoV-1-hACE2 complex. By analyzing structures of SARS-CoV-2 and SARS-CoV-1, we find that the homotrimer SARS-CoV-2 S receptor-binding domain (RBD) has expanded in size, indicating a large conformational change relative to the SARS-CoV-1 S protein. The protomer with the up-conformational form of RBD, which binds with hACE2, exhibits stronger intermolecular interactions at the RBD-ACE2 interface, with differential distributions and the inclusion of specific H-bonds in the CoV-2 complex. Further interface analysis has shown that interfacial water promotes and stabilizes the formation of the CoV-2/hACE2 complex. This interaction causes a significant structural rigidification of the spike protein, favoring proteolytic processing of the S protein for the fusion of the viral and cellular membranes. Moreover, conformational dynamics simulations of RBD motions in SARS-CoV-2 and SARS-CoV-1 point to the role in modification of the RBD dynamics and their impact on infectivity.


Subject(s)
Angiotensin-Converting Enzyme 2/ultrastructure , SARS-CoV-2/metabolism , Spike Glycoprotein, Coronavirus/ultrastructure , Angiotensin-Converting Enzyme 2/metabolism , COVID-19/metabolism , COVID-19/physiopathology , Computational Biology/methods , Density Functional Theory , Humans , Models, Theoretical , Protein Binding , Protein Domains , SARS-CoV-2/pathogenicity , Spike Glycoprotein, Coronavirus/metabolism , Spike Glycoprotein, Coronavirus/physiology , Structure-Activity Relationship
19.
ACS Med Chem Lett ; 13(1): 70-75, 2022 Jan 13.
Article in English | MEDLINE | ID: mdl-35047110

ABSTRACT

A large amount of bioactivity assay data is already accumulated in public databases, but the integration of these data sets for quantitative structure-activity relationship (QSAR) studies is not straightforward due to differences in experimental methods and settings. We present an efficient deep-learning-based approach called Deep Preference Data Integration (DPDI). For integrating outcome variables of different assay types, a surrogate variable is introduced, and a neural network is trained such that the total order induced by the surrogate variable is maximally consistent with given data sets. In a task of predicting efficacy of factor Xa inhibitors, DPDI successfully integrated 2959 molecules distributed in 129 assay data sets. In most of our experiments, data integration improved prediction accuracy strongly in interpolation and extrapolation tasks, indicating that DPDI is an effective tool for QSAR studies.

20.
J Cheminform ; 13(1): 88, 2021 Nov 14.
Article in English | MEDLINE | ID: mdl-34775976

ABSTRACT

BACKGROUND: In recent years, in silico molecular design is regaining interest. To generate on a computer molecules with optimized properties, scoring functions can be coupled with a molecular generator to design novel molecules with a desired property profile. RESULTS: In this article, a simple method is described to generate only valid molecules at high frequency ([Formula: see text] molecule/s using a single CPU core), given a molecular training set. The proposed method generates diverse SMILES (or DeepSMILES) encoded molecules while also showing some propensity at training set distribution matching. When working with DeepSMILES, the method reaches peak performance ([Formula: see text] molecule/s) because it relies almost exclusively on string operations. The "Fast Assembly of SMILES Fragments" software is released as open-source at https://github.com/UnixJunkie/FASMIFRA . Experiments regarding speed, training set distribution matching, molecular diversity and benchmark against several other methods are also shown.
