Results 1 - 20 of 91
1.
J Chem Inf Model ; 63(8): 2360-2369, 2023 04 24.
Article in English | MEDLINE | ID: mdl-37036083

ABSTRACT

In the presence of structural data, one sometimes needs to compare 3D ligands. We design an overlay-free method to rank order 3D molecules in the pharmacophore feature space. The proposed encoding includes only two fittable parameters, is sparse, and is not too high dimensional. At the cost of an additional parameter, to delineate the binding site from a protein-ligand complex, the method can compare binding sites. The method was benchmarked on the LIT-PCBA data set for ligand-based virtual screening experiments and the sc-PDB and a Vertex data set when comparing binding sites. In similarity searches, the proposed method outperforms an open-source software doing optimal superposition of ligand-based pharmacophores and RDKit's 3D pharmacophore fingerprints. When comparing binding sites, the method is competitive with state-of-the-art approaches. On a single CPU core, up to 374,000 ligands/s or 132,000 binding sites/s can be rank ordered. The "AutoCorrelation of Pharmacophore Features" open-source software is released at https://github.com/tsudalab/ACP4.


Subjects
Pharmacophore; Software; Ligands; Binding Sites
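
The overlay-free comparison described in entry 1 can be illustrated with a generic autocorrelation encoding. This is a minimal sketch, not the ACP4 implementation: the function names, the `cutoff` and `bin_width` parameters (hypothetical stand-ins for the two fittable parameters) and the use of a min/max Tanimoto score are all assumptions made for illustration. Because only inter-feature distances are counted, the encoding does not change when a molecule is translated or rotated, so no superposition is needed.

```python
from collections import Counter
from itertools import combinations
from math import dist


def autocorrelation_vector(features, cutoff=5.0, bin_width=1.0):
    """Encode a 3D molecule as sparse counts of feature pairs per distance bin.

    features: list of (feature_type, (x, y, z)) tuples.
    cutoff, bin_width: hypothetical parameters, assumed for this sketch.
    """
    counts = Counter()
    for (type_a, pos_a), (type_b, pos_b) in combinations(features, 2):
        d = dist(pos_a, pos_b)
        if d >= cutoff:
            continue  # ignore pairs that are too far apart
        pair = tuple(sorted((type_a, type_b)))  # unordered feature-type pair
        counts[(pair, int(d // bin_width))] += 1
    return counts


def tanimoto(u, v):
    """Min/max Tanimoto similarity between two sparse count vectors."""
    keys = set(u) | set(v)
    num = sum(min(u[k], v[k]) for k in keys)
    den = sum(max(u[k], v[k]) for k in keys)
    return num / den if den else 0.0
```

Ranking a library then amounts to sorting candidate molecules by `tanimoto(query_vector, candidate_vector)` in decreasing order.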
2.
Acc Chem Res ; 54(6): 1334-1346, 2021 03 16.
Article in English | MEDLINE | ID: mdl-33635621

ABSTRACT

In chemistry and materials science, researchers and engineers discover, design, and optimize chemical compounds or materials with their professional knowledge and techniques. At the highest level of abstraction, this process is formulated as black-box optimization. For instance, the trial-and-error process of synthesizing various molecules for better material properties can be regarded as optimizing a black-box function describing the relation between a chemical formula and its properties. Various black-box optimization algorithms have been developed in the machine learning and statistics communities. Recently, a number of researchers have reported successful applications of such algorithms to chemistry. They include the design of photofunctional molecules and medical drugs, optimization of thermal emission materials and high Li-ion conductive solid electrolytes, and discovery of a new phase in inorganic thin films for solar cells. There are a wide variety of algorithms available for black-box optimization, such as Bayesian optimization, reinforcement learning, and active learning. Practitioners need to select an appropriate algorithm or, in some cases, develop novel algorithms to meet their demands. It is also necessary to determine how to best combine machine learning techniques with quantum mechanics- and molecular mechanics-based simulations, and experiments. In this Account, we give an overview of recent studies regarding automated discovery, design, and optimization based on black-box optimization. The Account covers the following algorithms: Bayesian optimization to optimize the chemical or physical properties, an optimization method using a quantum annealer, best-arm identification, gray-box optimization, and reinforcement learning. In addition, we introduce active learning and boundless objective-free exploration, which may not fall into the category of black-box optimization. Data quality and quantity are key for the success of these automated discovery techniques. As laboratory automation and robotics are put forward, automated discovery algorithms would be able to match human performance at least in some domains in the near future.

3.
J Chem Inf Model ; 62(18): 4427-4434, 2022 09 26.
Article in English | MEDLINE | ID: mdl-36074116

ABSTRACT

To obtain observable physical or molecular properties such as ionization potential and fluorescent wavelength with quantum chemical (QC) computation, multi-step computation manipulated by a human is required. Hence, automating the multi-step computational process and making it a black box that can be handled by anybody are important for effective database construction and fast realistic material design through the framework of black-box optimization where machine learning algorithms are introduced as a predictor. Here, we propose a Python library, QCforever, to automate the computation of some molecular properties and chemical phenomena induced by molecules. This tool requires only a molecule file to provide its observable properties, automating both the computation of molecular properties (ionization potential, fluorescence, etc.) and the output analysis that yields the multiple values used to evaluate a molecule. By incorporating the tool in black-box optimization, we can explore molecules that have the properties we desire within the limitations of QC computation.


Subjects
Algorithms; Machine Learning; Databases, Factual; Humans
4.
Phys Chem Chem Phys ; 24(17): 10305-10310, 2022 May 04.
Article in English | MEDLINE | ID: mdl-35437567

ABSTRACT

Reaction path finding methods construct a graph connecting reactants and products in a quantum chemical energy landscape. They are useful in elucidating various reactions and provide footsteps for designing new reactions. Their enormous computational cost, however, limits their application to relatively simple reactions. This paper engages in accelerating reaction path finding by introducing the principles of algorithmic search. A new method called RRT/SC-AFIR is devised by combining rapidly exploring random tree (RRT) and single component artificial force induced reaction (SC-AFIR). Using 96 cores, our method succeeded in constructing a reaction graph for Fritsch-Buttenberg-Wiechell rearrangement within a time limit of 3 days, while the conventional methods could not. Our results illustrate that the algorithm theory provides refreshing and beneficial viewpoints on quantum chemical methodologies.

5.
J Chem Phys ; 156(4): 044117, 2022 Jan 28.
Article in English | MEDLINE | ID: mdl-35105077

ABSTRACT

To develop useful drugs and materials, chemists synthesize diverse molecules by trying various reactants and reaction routes. Toward automating this process, we propose a deep generative model, called cascaded variational autoencoder (casVAE), for synthesizable molecular design. It generates a reaction tree, where the reactants are chosen from commercially available compounds and the synthesis route is constructed as a tree of reaction templates. The first part of casVAE is designed to generate a molecule called a surrogate product, while the second part constructs a reaction tree that synthesizes it. In benchmarking, casVAE showed its ability to generate reaction trees that yield high-quality and synthesizable molecules. An implementation of casVAE is publicly available at https://github.com/tsudalab/rxngenerator.

6.
Sci Technol Adv Mater ; 23(1): 352-360, 2022.
Article in English | MEDLINE | ID: mdl-35693890

ABSTRACT

Recently, artificial intelligence (AI)-enabled de novo molecular generators (DNMGs) have automated molecular design based on data-driven or simulation-based property estimates. In some domains like the game of Go where AI surpassed human intelligence, humans are trying to learn from AI about the best strategy of the game. To understand DNMG's strategy of molecule optimization, we propose an algorithm called characteristic functional group monitoring (CFGM). Given a time series of generated molecules, CFGM monitors statistically enriched functional groups in comparison to the training data. In the task of absorption wavelength maximization of pure organic molecules (consisting of H, C, N, and O), we successfully identified a strategic change from diketone and aniline derivatives to quinone derivatives. In addition, CFGM led us to a hypothesis that 1,2-quinone is an unconventional chromophore, which was verified with chemical synthesis. This study shows the possibility that human experts can learn from DNMGs to expand their ability to discover functional molecules.

7.
Molecules ; 27(3), 2022 Jan 26.
Article in English | MEDLINE | ID: mdl-35164065

ABSTRACT

The entry of SARS-CoV-2, the causative agent of COVID-19, into human host cells is mediated by the SARS-CoV-2 spike (S) glycoprotein, which critically depends on the formation of complexes involving the spike protein receptor-binding domain (RBD) and the human cellular membrane receptor angiotensin-converting enzyme 2 (hACE2). Using classical site density functional theory (SDFT) and structural bioinformatics methods, we investigate binding and conformational properties of these complexes and study the overlooked role of water-mediated interactions. Analysis of the three-dimensional reference interaction site model (3DRISM) of SDFT indicates that water-mediated interactions in the form of additional water bridges strongly increase the binding between the SARS-CoV-2 spike protein and hACE2 compared to the SARS-CoV-1-hACE2 complex. By analyzing structures of SARS-CoV-2 and SARS-CoV-1, we find that the homotrimer SARS-CoV-2 S receptor-binding domain (RBD) has expanded in size, indicating a large conformational change relative to the SARS-CoV-1 S protein. The protomer with the up-conformational form of RBD, which binds with hACE2, exhibits stronger intermolecular interactions at the RBD-ACE2 interface, with differential distributions and the inclusion of specific H-bonds in the CoV-2 complex. Further interface analysis has shown that interfacial water promotes and stabilizes the formation of the CoV-2/hACE2 complex. This interaction causes a significant structural rigidification of the spike protein, favoring proteolytic processing of the S protein for the fusion of the viral and cellular membrane. Moreover, conformational dynamics simulations of RBD motions in SARS-CoV-2 and SARS-CoV-1 point to the role in modification of the RBD dynamics and their impact on infectivity.


Subjects
Angiotensin-Converting Enzyme 2/ultrastructure; SARS-CoV-2/metabolism; Spike Glycoprotein, Coronavirus/ultrastructure; Angiotensin-Converting Enzyme 2/metabolism; COVID-19/metabolism; COVID-19/physiopathology; Computational Biology/methods; Density Functional Theory; Humans; Models, Theoretical; Protein Binding; Protein Domains; SARS-CoV-2/pathogenicity; Spike Glycoprotein, Coronavirus/metabolism; Spike Glycoprotein, Coronavirus/physiology; Structure-Activity Relationship
8.
Environ Health Prev Med ; 26(1): 51, 2021 Apr 23.
Article in English | MEDLINE | ID: mdl-33892635

ABSTRACT

BACKGROUND: The Fujiwara-kyo Osteoporosis Risk in Men (FORMEN) study was launched to investigate risk factors for osteoporotic fractures, interactions of osteoporosis with other non-communicable chronic diseases, and effects of fracture on QOL and mortality. METHODS: FORMEN baseline study participants (in 2007 and 2008) included 2012 community-dwelling men (aged 65-93 years) in Nara prefecture, Japan. Clinical follow-up surveys were conducted 5 and 10 years after the baseline survey, and 1539 and 906 men completed them, respectively. Supplemental mail, telephone, and visit surveys were conducted with non-participants to obtain outcome information. Survival and fracture outcomes were determined for 2006 men, with 566 deaths identified and 1233 men remaining in the cohort at 10-year follow-up. COMMENTS: The baseline survey covered a wide range of bone health-related indices including bone mineral density, trabecular microarchitecture assessment, vertebral imaging for detecting vertebral fractures, and biochemical markers of bone turnover, as well as comprehensive geriatric assessment items. Follow-up surveys were conducted to obtain outcomes including osteoporotic fracture, cardiovascular diseases, initiation of long-term care, and mortality. A complete list of publications relating to the FORMEN study can be found at https://www.med.kindai.ac.jp/pubheal/FORMEN/Publications.html .


Subjects
Bone Density; Cardiovascular Diseases/epidemiology; Long-Term Care/statistics & numerical data; Osteoporosis/epidemiology; Osteoporotic Fractures/epidemiology; Aged; Cardiovascular Diseases/etiology; Cohort Studies; Geriatric Assessment; Humans; Independent Living; Japan/epidemiology; Male; Middle Aged; Osteoporosis/complications; Osteoporosis/etiology; Osteoporotic Fractures/etiology; Risk Factors
9.
Sci Technol Adv Mater ; 21(1): 552-561, 2020 Jul 30.
Article in English | MEDLINE | ID: mdl-32939179

ABSTRACT

Nuclear magnetic resonance (NMR) spectroscopy is an effective tool for identifying molecules in a sample. Although many previously observed NMR spectra are accumulated in public databases, they cover only a tiny fraction of the chemical space, and molecule identification is typically accomplished manually based on expert knowledge. Herein, we propose NMR-TS, a machine-learning-based Python library, to automatically identify a molecule from its NMR spectrum. NMR-TS discovers candidate molecules whose NMR spectra match the target spectrum by using deep learning and density functional theory (DFT)-computed spectra. As a proof-of-concept, we identify prototypical metabolites from their computed spectra. After an average of 5451 DFT runs for each spectrum, six of the nine molecules are identified correctly, and proximal molecules are obtained in the other cases. This encouraging result implies that de novo molecule generation can contribute to the fully automated identification of chemical structures. NMR-TS is available at https://github.com/tsudalab/NMR-TS.

10.
Bioinformatics ; 34(17): 3047-3049, 2018 09 01.
Article in English | MEDLINE | ID: mdl-29659720

ABSTRACT

Summary: Exhaustive detection of multi-loci markers from genome-wide association study datasets is a computationally challenging problem. This paper presents a massively parallel algorithm for finding all significant combinations of alleles and introduces a software tool termed MP-LAMP that can be easily deployed in a cloud platform, such as Amazon Web Services, as well as in an in-house computer cluster. Multi-loci marker detection is an unbalanced tree search problem that cannot be parallelized by simple tree-splitting using generic parallel programming frameworks, such as Map-Reduce. We employ work stealing and periodic reduce-broadcast to decrease the running time almost linearly with the number of cores. Availability and implementation: MP-LAMP is available at https://github.com/tsudalab/mp-lamp. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Cloud Computing; Algorithms; Humans; Software
11.
Bioinformatics ; 34(5): 770-778, 2018 03 01.
Article in English | MEDLINE | ID: mdl-29040432

ABSTRACT

Motivation: Fast and accurate prediction of protein-ligand binding structures is indispensable for structure-based drug design and accurate estimation of binding free energy of drug candidate molecules in drug discovery. Recently, accurate pose prediction methods based on short Molecular Dynamics (MD) simulations, such as MM-PBSA and MM-GBSA, among generated docking poses have been used. Since molecular structures obtained from MD simulation depend on the initial condition, taking the average over different initial conditions leads to better accuracy. Prediction accuracy of protein-ligand binding poses can be improved with multiple runs at different initial velocities. Results: This paper shows that a machine learning method, called Best Arm Identification, can optimally control the number of MD runs for each binding pose. It allows us to identify a correct binding pose with a minimum number of total runs. Our experiment using three proteins and eight inhibitors showed that the computational cost can be reduced substantially without sacrificing accuracy. This method can be applied for controlling all kinds of molecular simulations to obtain the best results under restricted computational resources. Availability and implementation: Code and data are available on GitHub at https://github.com/tsudalab/bpbi. Contact: terayama@cbms.k.u-tokyo.ac.jp or tsuda@k.u-tokyo.ac.jp. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Drug Discovery/methods; Ligands; Machine Learning; Molecular Dynamics Simulation; Proteins/chemistry; Computational Biology/methods; Protein Binding; Protein Conformation; Proteins/metabolism
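
The idea of best arm identification in entry 11 can be sketched with a textbook successive-elimination loop. This is an illustration under stated assumptions, not the algorithm from the paper or the `bpbi` repository: each arm stands for one binding pose, `pull(arm)` is a hypothetical callback that returns one noisy score in [0, 1] (for example a rescaled result of one MD run, higher is better), and the Hoeffding-style confidence radius is one common choice among several.

```python
import math
import random


def successive_elimination(pull, n_arms, delta=0.05, max_rounds=2000):
    """Return the index of the arm with the highest mean reward.

    pull(arm) must return a noisy reward in [0, 1]; it is a hypothetical
    stand-in for one expensive evaluation such as a single MD run.
    """
    active = list(range(n_arms))
    sums = [0.0] * n_arms
    for t in range(1, max_rounds + 1):
        # Evaluate every arm that is still a candidate exactly once.
        for arm in active:
            sums[arm] += pull(arm)
        means = {arm: sums[arm] / t for arm in active}
        # Hoeffding-style confidence radius, shared by all active arms.
        radius = math.sqrt(math.log(4.0 * n_arms * t * t / delta) / (2.0 * t))
        best = max(means.values())
        # Drop arms whose upper bound falls below the leader's lower bound.
        active = [a for a in active if means[a] + radius >= best - radius]
        if len(active) == 1:
            break
    return max(active, key=lambda a: sums[a])
```

Clearly inferior arms are dropped after few evaluations, so most of the budget is spent on distinguishing the close contenders, which is the source of the cost savings the abstract reports.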
12.
J Chem Phys ; 151(21): 215104, 2019 Dec 07.
Article in English | MEDLINE | ID: mdl-31822094

ABSTRACT

Computational techniques for accurate and efficient prediction of protein-protein complex structures are widely used for elucidating protein-protein interactions, which play important roles in biological systems. Recently, it has been reported that selecting a structure similar to the native structure among generated structure candidates (decoys) is possible by calculating binding free energies of the decoys based on all-atom molecular dynamics (MD) simulations with explicit solvent and the solution theory in the energy representation, which is called evERdock. A recent version of evERdock achieves a higher-accuracy decoy selection by introducing MD relaxation and multiple MD simulations/energy calculations; however, huge computational cost is required. In this paper, we propose an efficient decoy selection method using evERdock and the best arm identification (BAI) framework, which is one of the techniques of reinforcement learning. The BAI framework realizes an efficient selection by suppressing calculations for nonpromising decoys and preferentially calculating for the promising ones. We evaluate the performance of the proposed method for decoy selection problems of three protein-protein complex systems. The results show that computational costs are successfully reduced by a factor of 4.05 (in the best case) compared to a standard decoy selection approach without sacrificing accuracy.


Subjects
Machine Learning; Molecular Dynamics Simulation; Proteins/chemistry; Protein Binding; Protein Conformation
13.
J Comput Chem ; 39(32): 2679-2689, 2018 12 15.
Article in English | MEDLINE | ID: mdl-30515903

ABSTRACT

Protein-drug binding mode prediction from the apo-protein structure is challenging because drug binding often induces significant protein conformational changes. Here, the authors report a computational workflow that incorporates a novel pocket generation method. First, the closed protein pocket is expanded by repeatedly filling virtual atoms during molecular dynamics (MD) simulations. Second, after ligand docking toward the prepared pocket structures, binding mode candidates are ranked by MD/Molecular Mechanics Poisson-Boltzmann Surface Area. The authors validated the workflow using CDK2 kinase, which has an especially closed ATP-binding pocket in the apo-form, and several inhibitors. The crystallographic pose coincided with the top-ranked docking pose for 59% (34/58) of the compounds and was within the top five-ranked ones for 88% (51/58), while those estimated by a conventional prediction protocol were 9% (5/58) and 50% (29/58), respectively. The study demonstrates that the prediction accuracy is significantly improved by preceding pocket expansion, leading to generation of conformationally diverse binding mode candidates. © 2018 Wiley Periodicals, Inc.


Subjects
Cyclin-Dependent Kinase 2/chemistry; Molecular Dynamics Simulation; Protein Kinase Inhibitors/chemistry; Binding Sites; Cyclin-Dependent Kinase 2/antagonists & inhibitors; Humans; Ligands; Models, Molecular; Molecular Structure; Protein Kinase Inhibitors/pharmacology
14.
J Chem Phys ; 148(24): 241716, 2018 Jun 28.
Article in English | MEDLINE | ID: mdl-29960333

ABSTRACT

Heteroatom doping has endowed graphene with manifold aspects of material properties and boosted its applications. The atomic structure determination of doped graphene is vital to understand its material properties. Motivated by the recently synthesized boron-doped graphene with relatively high concentration, here we employ machine learning methods to search for the most stable structures of doped boron atoms in graphene, in conjunction with atomistic simulations. From the determined stable structures, we find that in free-standing pristine graphene, the doped boron atoms energetically prefer to substitute for the carbon atoms at different sublattice sites and that the para configuration of the boron-boron pair is dominant in the cases of high boron concentrations. The boron doping can increase the work function of graphene by 0.7 eV for a boron content higher than 3.1%.

15.
BMC Bioinformatics ; 18(1): 468, 2017 Nov 06.
Article in English | MEDLINE | ID: mdl-29110632

ABSTRACT

BACKGROUND: Artificially synthesized RNA molecules provide important ways for creating a variety of novel functional molecules. State-of-the-art RNA inverse folding algorithms can design simple and short RNA sequences of specific GC content that fold into the target RNA structure. However, their performance is not satisfactory in complicated cases. RESULT: We present a new inverse folding algorithm called MCTS-RNA, which uses Monte Carlo tree search (MCTS), a technique that has recently shown exceptional performance in Computer Go, to represent and discover the essential part of the sequence space. To obtain high accuracy, initial sequences generated by MCTS are further improved by a series of local updates. Our algorithm can control the GC content precisely and can deal with pseudoknot structures. Using common benchmark datasets for evaluation, MCTS-RNA showed a lot of promise as a standard method of RNA inverse folding. CONCLUSION: MCTS-RNA is available at https://github.com/tsudalab/MCTS-RNA .


Subjects
Algorithms; RNA/chemistry; Internet; Monte Carlo Method; Nucleic Acid Conformation; RNA Folding; Sequence Analysis, RNA; User-Computer Interface
16.
Bioinformatics ; 32(22): 3513-3515, 2016 11 15.
Article in English | MEDLINE | ID: mdl-27412093

ABSTRACT

One of the major issues in genome-wide association studies is to solve the missing heritability problem. While considering epistatic interactions among multiple SNPs may contribute to solving this problem, existing software cannot detect statistically significant high-order interactions. We propose software named LAMPLINK, which employs a cutting-edge method to enumerate statistically significant SNP combinations from genome-wide case-control data. LAMPLINK is implemented as a set of additional functions to PLINK, and hence existing procedures with PLINK remain applicable. Applied to the 1000 Genomes Project data, LAMPLINK detected a combination of five SNPs that are statistically significantly accumulated in the Japanese population. AVAILABILITY AND IMPLEMENTATION: LAMPLINK is available at http://a-terada.github.io/lamplink/ CONTACT: terada@cbms.k.u-tokyo.ac.jp or sese.jun@aist.go.jp Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Genome-Wide Association Study; Polymorphism, Single Nucleotide; Software; Animals; Genome; Humans
17.
Sci Technol Adv Mater ; 18(1): 972-976, 2017.
Article in English | MEDLINE | ID: mdl-29435094

ABSTRACT

Automatic design of organic materials requires black-box optimization in a vast chemical space. In conventional molecular design algorithms, a molecule is built as a combination of predetermined fragments. Recently, deep neural network models such as variational autoencoders and recurrent neural networks (RNNs) have been shown to be effective in de novo design of molecules without any predetermined fragments. This paper presents a novel Python library ChemTS that explores the chemical space by combining Monte Carlo tree search and an RNN. In a benchmarking problem of optimizing the octanol-water partition coefficient and synthesizability, our algorithm showed superior efficiency in finding high-scoring molecules. ChemTS is available at https://github.com/tsudalab/ChemTS.
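
Monte Carlo tree search, which entry 17 combines with an RNN, repeatedly decides which child node to descend into. A minimal sketch of one widely used selection rule, UCB1, is given here for orientation; it is not taken from the ChemTS code, and the function name, the `(visit_count, total_reward)` node representation and the default exploration constant are assumptions made for illustration.

```python
import math


def ucb1_select(children, exploration=math.sqrt(2)):
    """Return the index of the child with the highest UCB1 score.

    children: list of (visit_count, total_reward) pairs, a hypothetical
    minimal node representation chosen for this sketch.
    """
    parent_visits = sum(visits for visits, _ in children)

    def score(child):
        visits, total_reward = child
        if visits == 0:
            return math.inf  # unvisited children are always tried first
        mean = total_reward / visits  # exploitation term
        bonus = exploration * math.sqrt(math.log(parent_visits) / visits)
        return mean + bonus  # the bonus shrinks as a child is visited more

    return max(range(len(children)), key=lambda i: score(children[i]))
```

In a molecule generator, each child would correspond to the next symbol appended to a partial molecular string, and the reward would come from scoring a completed molecule.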

18.
Sci Technol Adv Mater ; 18(1): 498-503, 2017.
Article in English | MEDLINE | ID: mdl-28804525

ABSTRACT

Complex materials design is often represented as a black-box combinatorial optimization problem. In this paper, we present a novel Python library called MDTS (Materials Design using Tree Search). Our algorithm employs a Monte Carlo tree search approach, which has shown exceptional performance in computer Go. Unlike evolutionary algorithms that require user intervention to set parameters appropriately, MDTS has no tuning parameters and works autonomously in various problems. In comparison to a Bayesian optimization package, our algorithm showed competitive search efficiency and superior scalability. We succeeded in designing large Silicon-Germanium (Si-Ge) alloy structures that Bayesian optimization could not deal with due to excessive computational cost. MDTS is available at https://github.com/tsudalab/MDTS.

19.
Sci Technol Adv Mater ; 18(1): 756-765, 2017.
Article in English | MEDLINE | ID: mdl-29152012

ABSTRACT

We propose a novel representation of materials named an 'orbital-field matrix (OFM)', which is based on the distribution of valence shell electrons. We demonstrate that this new representation can be highly useful in mining material data. Experimental investigation shows that the formation energies of crystalline materials, atomization energies of molecular materials, and local magnetic moments of the constituent atoms in bimetal alloys of lanthanide metal and transition-metal can be predicted with high accuracy using the OFM. Knowledge regarding the role of the coordination numbers of the transition-metal and lanthanide elements in determining the local magnetic moments of the transition-metal sites can be acquired directly from decision tree regression analyses using the OFM.

20.
BMC Bioinformatics ; 17(1): 363, 2016 Sep 13.
Article in English | MEDLINE | ID: mdl-27620863

ABSTRACT

BACKGROUND: Single-cell RNA sequencing is fast becoming one of the standard methods for gene expression measurement, providing unique insights into cellular processes. A number of methods, based on general dimensionality reduction techniques, have been suggested to help infer and visualise the underlying structure of cell populations from single-cell expression levels, yet their models generally lack proper biological grounding and struggle at identifying complex differentiation paths. RESULTS: Here we introduce cellTree: an R/Bioconductor package that uses a novel statistical approach, based on document analysis techniques, to produce tree structures outlining the hierarchical relationship between single-cell samples, while identifying latent groups of genes that can provide biological insights. CONCLUSIONS: With cellTree, we provide experimentalists with an easy-to-use tool, based on statistically and biologically sound algorithms, to efficiently explore and visualise single-cell RNA data. The cellTree package is publicly available in the online Bioconductor repository at: http://bioconductor.org/packages/cellTree/ .


Subjects
RNA/genetics; Sequence Analysis, RNA/methods; Stem Cells/immunology; Cell Differentiation; Humans