
1.

Ligand-based virtual-screening identified a novel CFTR ligand which improves the defective cell surface expression of misfolded ABC transporters.

Taniguchi, Shogo; Berenger, Francois; Doi, Yukako; Mimura, Ayana; Yamanishi, Yoshihiro; Okiyoneda, Tsukasa.

Front Pharmacol ; 15: 1370676, 2024.

Article En | MEDLINE | ID: mdl-38666024

Cystic fibrosis (CF) is a monogenic disease caused by mutation of CFTR, a cAMP-regulated Cl- channel expressed at the apical plasma membrane (PM) of epithelia. ΔF508-CFTR, the most common mutant in CF, fails to reach the PM due to its misfolding and premature degradation at the endoplasmic reticulum (ER). Recently, CFTR modulators have been developed to correct CFTR abnormalities, with some being used as therapeutic agents for CF treatment. One notable example is Trikafta, a triple combination of CFTR modulators (TEZ/ELX/IVA), which significantly enhances the functionality of ΔF508-CFTR on the PM. However, there is room for improvement in its therapeutic effectiveness since TEZ/ELX/IVA does not fully stabilize ΔF508-CFTR on the PM. To discover new CFTR modulators, we conducted a virtual screening of approximately 4.3 million compounds based on the chemical structures of existing CFTR modulators. This effort led us to identify a novel CFTR ligand named FR3. Unlike clinically available CFTR modulators, FR3 appears to operate through a distinct mechanism of action. FR3 enhances the functional expression of ΔF508-CFTR on the apical PM in airway epithelial cell lines by stabilizing NBD1. Notably, FR3 counteracted the degradation of mature ΔF508-CFTR, which still occurs despite the presence of TEZ/ELX/IVA. Furthermore, FR3 corrected the defective PM expression of a misfolded ABCB1 mutant. Therefore, FR3 may be a potential lead compound for addressing diseases resulting from the misfolding of ABC transporters.

2.

A community effort in SARS-CoV-2 drug discovery.

Schimunek, Johannes; Seidl, Philipp; Elez, Katarina; Hempel, Tim; Le, Tuan; Noé, Frank; Olsson, Simon; Raich, Lluís; Winter, Robin; Gokcan, Hatice; Gusev, Filipp; Gutkin, Evgeny M; Isayev, Olexandr; Kurnikova, Maria G; Narangoda, Chamali H; Zubatyuk, Roman; Bosko, Ivan P; Furs, Konstantin V; Karpenko, Anna D; Kornoushenko, Yury V; Shuldau, Mikita; Yushkevich, Artsemi; Benabderrahmane, Mohammed B; Bousquet-Melou, Patrick; Bureau, Ronan; Charton, Beatrice; Cirou, Bertrand C; Gil, Gérard; Allen, William J; Sirimulla, Suman; Watowich, Stanley; Antonopoulos, Nick; Epitropakis, Nikolaos; Krasoulis, Agamemnon; Itsikalis, Vassilis; Theodorakis, Stavros; Kozlovskii, Igor; Maliutin, Anton; Medvedev, Alexander; Popov, Petr; Zaretckii, Mark; Eghbal-Zadeh, Hamid; Halmich, Christina; Hochreiter, Sepp; Mayr, Andreas; Ruch, Peter; Widrich, Michael; Berenger, Francois; Kumar, Ashutosh; Yamanishi, Yoshihiro.

Mol Inform ; 43(1): e202300262, 2024 Jan.

Article En | MEDLINE | ID: mdl-37833243

The COVID-19 pandemic continues to pose a substantial threat to human lives and is likely to do so for years to come. Despite the availability of vaccines, searching for efficient small-molecule drugs that are widely available, including in low- and middle-income countries, is an ongoing challenge. In this work, we report the results of an open science community effort, the "Billion molecules against COVID-19 challenge", to identify small-molecule inhibitors against SARS-CoV-2 or relevant human receptors. Participating teams used a wide variety of computational methods to screen a minimum of 1 billion virtual molecules against 6 protein targets. Overall, 31 teams participated, and they suggested a total of 639,024 molecules, which were subsequently ranked to find 'consensus compounds'. The organizing team coordinated with various contract research organizations (CROs) and collaborating institutions to synthesize and test 878 compounds for biological activity against proteases (Nsp5, Nsp3, TMPRSS2), nucleocapsid N, RdRP (only the Nsp12 domain), and (alpha) spike protein S. Overall, 27 compounds with weak inhibition/binding were experimentally identified by binding-, cleavage-, and/or viral suppression assays and are presented here. Open science approaches such as the one presented here contribute to the knowledge base of future drug discovery efforts in finding better SARS-CoV-2 treatments.

COVID-19 , SARS-CoV-2 , Humans , Pandemics , Biological Assay , Drug Discovery

3.

Quantum Annealing Designs Nonhemolytic Antimicrobial Peptides in a Discrete Latent Space.

Tucs, Andrejs; Berenger, Francois; Yumoto, Akiko; Tamura, Ryo; Uzawa, Takanori; Tsuda, Koji.

ACS Med Chem Lett ; 14(5): 577-582, 2023 May 11.

Article En | MEDLINE | ID: mdl-37197452

Increasing the variety of antimicrobial peptides is crucial in meeting the global challenge of multi-drug-resistant bacterial pathogens. While several deep-learning-based peptide design pipelines are reported, they may not be optimal in data efficiency. High efficiency requires a well-compressed latent space, where optimization is likely to fail due to numerous local minima. We present a multi-objective peptide design pipeline based on a discrete latent space and D-Wave quantum annealer with the aim of solving the local minima problem. To achieve multi-objective optimization, multiple peptide properties are encoded into a score using non-dominated sorting. Our pipeline is applied to design therapeutic peptides that are antimicrobial and non-hemolytic at the same time. From 200 000 peptides designed by our pipeline, four peptides proceeded to wet-lab validation. Three of them showed high antimicrobial activity, and two are non-hemolytic. Our results demonstrate how quantum-based optimizers can be taken advantage of in real-world medical studies.
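
The non-dominated sorting step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the "higher is better" convention, and the use of the front index as the single score are all assumptions made for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and
    strictly better on at least one (assumption: higher is better)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def non_dominated_ranks(points: List[Sequence[float]]) -> List[int]:
    """Rank 0 is the Pareto front; rank 1 is the front that remains
    once rank 0 is removed; and so on."""
    ranks = [-1] * len(points)
    remaining = set(range(len(points)))
    rank = 0
    while remaining:
        # A point is on the current front if no remaining point dominates it.
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# Hypothetical (antimicrobial score, non-hemolytic score) pairs.
peptides = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.2, 0.9), (0.1, 0.1)]
print(non_dominated_ranks(peptides))
```

A lower rank means a better trade-off between the objectives, so one integer per peptide can drive a single-objective optimizer.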

4.

3D-Sensitive Encoding of Pharmacophore Features.

Berenger, Francois; Tsuda, Koji.

J Chem Inf Model ; 63(8): 2360-2369, 2023 04 24.

Article En | MEDLINE | ID: mdl-37036083

In the presence of structural data, one sometimes needs to compare 3D ligands. We design an overlay-free method to rank order 3D molecules in the pharmacophore feature space. The proposed encoding includes only two fittable parameters, is sparse, and is not too high dimensional. At the cost of an additional parameter, to delineate the binding site from a protein-ligand complex, the method can compare binding sites. The method was benchmarked on the LIT-PCBA data set for ligand-based virtual screening experiments and the sc-PDB and a Vertex data set when comparing binding sites. In similarity searches, the proposed method outperforms an open-source software doing optimal superposition of ligand-based pharmacophores and RDKit's 3D pharmacophore fingerprints. When comparing binding sites, the method is competitive with state-of-the-art approaches. On a single CPU core, up to 374,000 ligand/s or 132,000 binding site/s can be rank ordered. The "AutoCorrelation of Pharmacophore Features" open-source software is released at https://github.com/tsudalab/ACP4.

Pharmacophore , Software , Ligands , Binding Sites

5.

Molecular generation by Fast Assembly of (Deep)SMILES fragments.

Berenger, Francois; Tsuda, Koji.

J Cheminform ; 13(1): 88, 2021 Nov 14.

Article En | MEDLINE | ID: mdl-34775976

BACKGROUND: In recent years, in silico molecular design has been regaining interest. To generate molecules with optimized properties on a computer, scoring functions can be coupled with a molecular generator to design novel molecules with a desired property profile. RESULTS: In this article, a simple method is described to generate only valid molecules at high frequency ([Formula: see text] molecule/s using a single CPU core), given a molecular training set. The proposed method generates diverse SMILES (or DeepSMILES) encoded molecules while also showing some propensity at training set distribution matching. When working with DeepSMILES, the method reaches peak performance ([Formula: see text] molecule/s) because it relies almost exclusively on string operations. The "Fast Assembly of SMILES Fragments" software is released as open-source at https://github.com/UnixJunkie/FASMIFRA. Experiments regarding speed, training set distribution matching, molecular diversity and benchmark against several other methods are also shown.

6.

Lean-Docking: Exploiting Ligands' Predicted Docking Scores to Accelerate Molecular Docking.

Berenger, Francois; Kumar, Ashutosh; Zhang, Kam Y J; Yamanishi, Yoshihiro.

J Chem Inf Model ; 61(5): 2341-2352, 2021 05 24.

Article En | MEDLINE | ID: mdl-33861591

In structure-based virtual screening (SBVS), a binding site on a protein structure is used to search for ligands with favorable nonbonded interactions. Because it is computationally difficult, docking is time-consuming and any docking user will eventually encounter a chemical library that is too big to dock. This problem might arise because there is not enough computing power or because preparing and storing so many three-dimensional (3D) ligands requires too much space. In this study, however, we show that quality regressors can be trained to predict docking scores from molecular fingerprints. Although typical docking has a screening rate of less than one ligand per second on one CPU core, our regressors can predict about 5800 docking scores per second. This approach allows us to focus docking on the portion of a database that is predicted to have docking scores below a user-chosen threshold. Herein, usage examples are shown, where only 25% of a ligand database is docked, without any significant virtual screening performance loss. We call this method "lean-docking". To validate lean-docking, a massive docking campaign using several state-of-the-art docking software packages was undertaken on an unbiased data set, with only wet-lab tested active and inactive molecules. Although regressors allow the screening of a larger chemical space, even at a constant docking power, it is also clear that significant progress in the virtual screening power of docking scores is desirable.

Small Molecule Libraries , Binding Sites , Ligands , Molecular Docking Simulation , Protein Binding
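
The selection step described in the Lean-Docking abstract above (dock only the ligands whose predicted score falls below a threshold) might look like the following sketch. The regressor itself is left out; the predicted scores, ligand names, and both helper functions are hypothetical, and the sketch assumes that lower docking scores are better.

```python
import math
from typing import Dict, List


def threshold_for_fraction(predicted: Dict[str, float], fraction: float) -> float:
    """Score threshold such that roughly `fraction` of the library
    has a predicted score at or below it."""
    scores = sorted(predicted.values())
    k = max(1, math.ceil(fraction * len(scores)))
    return scores[k - 1]


def select_for_docking(predicted: Dict[str, float], threshold: float) -> List[str]:
    """Ligands worth docking: predicted score at or below the threshold."""
    return [name for name, score in predicted.items() if score <= threshold]


# Hypothetical predicted docking scores (assumption: lower is better).
predicted = {"lig1": -9.1, "lig2": -5.0, "lig3": -7.4, "lig4": -3.2}
cutoff = threshold_for_fraction(predicted, 0.25)
print(cutoff, select_for_docking(predicted, cutoff))
```

Only the selected subset then needs 3D preparation and actual docking, which is where the time and storage savings would come from.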

7.

Ranking Molecules with Vanishing Kernels and a Single Parameter: Active Applicability Domain Included.

Berenger, Francois; Yamanishi, Yoshihiro.

J Chem Inf Model ; 60(9): 4376-4387, 2020 09 28.

Article En | MEDLINE | ID: mdl-32281797

In ligand-based virtual screening, high-throughput screening (HTS) data sets can be exploited to train classification models. Such models can be used to prioritize yet untested molecules, from the most likely active (against a protein target of interest) to the least likely active. In this study, a single-parameter ranking method with an Applicability Domain (AD) is proposed. In effect, Kernel Density Estimates (KDE) are revisited to improve their computational efficiency and incorporate an AD. Two modifications are proposed: (i) using vanishing kernels (i.e., kernel functions with a finite support) and (ii) using the Tanimoto distance between molecular fingerprints as a radial basis function. This construction is termed "Vanishing Ranking Kernels" (VRK). Using VRK on 21 HTS assays, it is shown that VRK can compete in performance with a graph convolutional deep neural network. VRK are conceptually simple and fast to train. During training, they require optimizing a single parameter. A trained VRK model usually defines an active AD. Exploiting this AD can significantly increase the screening frequency of a VRK model. Software: https://github.com/UnixJunkie/rankers. Data sets: https://zenodo.org/record/1320776 and https://zenodo.org/record/3540423.

Neural Networks, Computer , Software , Ligands
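
To make the two modifications named in the VRK abstract above concrete, here is a minimal sketch combining a finite-support kernel with the Tanimoto distance. The kernel shape (Epanechnikov-like), the "actives minus inactives" score, and the rule used for the applicability domain are assumptions for illustration, not the published definition; `bandwidth` stands in for the single parameter and must be positive.

```python
from typing import FrozenSet, Iterable, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits set in a fingerprint


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """1 - |a & b| / |a | b|; two empty fingerprints are at distance 0."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def vanishing_kernel(d: float, bandwidth: float) -> float:
    """Finite-support kernel: positive for d < bandwidth, exactly 0 beyond."""
    if d >= bandwidth:
        return 0.0
    u = d / bandwidth
    return 1.0 - u * u


def vrk_score(query: Fingerprint,
              actives: Iterable[Fingerprint],
              inactives: Iterable[Fingerprint],
              bandwidth: float) -> Tuple[float, bool]:
    """Return (score, in_domain). Assumed score: kernel mass from actives
    minus kernel mass from inactives. Assumed domain rule: at least one
    training active lies within the kernel support."""
    pos = sum(vanishing_kernel(tanimoto_distance(query, a), bandwidth)
              for a in actives)
    neg = sum(vanishing_kernel(tanimoto_distance(query, i), bandwidth)
              for i in inactives)
    return pos - neg, pos > 0.0


actives = [frozenset({1, 2, 3, 4})]
inactives = [frozenset({7, 8})]
print(vrk_score(frozenset({1, 2, 3}), actives, inactives, bandwidth=0.5))
```

Because the kernel is exactly zero beyond the bandwidth, molecules far from every training active can be skipped outright, which is one way an applicability domain could raise screening throughput.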

8.

Improvement of the novel inhibitor for Mycobacterium enoyl-acyl carrier protein reductase (InhA): a structure-activity relationship study of KES4 assisted by in silico structure-based drug screening.

Taira, Junichi; Umei, Tomohiro; Inoue, Keitaro; Kitamura, Mitsuru; Berenger, Francois; Sacchettini, James C; Sakamoto, Hiroshi; Aoki, Shunsuke.

J Antibiot (Tokyo) ; 73(6): 372-381, 2020 06.

Article En | MEDLINE | ID: mdl-32152525

InhA or enoyl-acyl carrier protein reductase of Mycobacterium tuberculosis (mtInhA), which controls mycobacterial cell wall construction, has been targeted in the development of antituberculosis drugs. Previously, our in silico structure-based drug screening study identified a novel class of compounds (designated KES4), which is capable of inhibiting the enzymatic activity of mtInhA, as well as mycobacterial growth. The compounds are composed of four ring structures (A-D), and the MD simulation predicted specific interactions with mtInhA of the D-ring and methylene group between the B-ring and C-ring; however, there is still room for improvement in the A-ring structure. In this study, a structure-activity relationship study of the A-ring was attempted with the assistance of in silico docking simulations. In brief, the virtual chemical library of A-ring-modified KES4 was constructed and subjected to in silico docking simulation against mtInhA using the GOLD program. Among the selected candidates, we achieved synthesis of seven compounds, and the bioactivities (effects on InhA activity and mycobacterial growth and cytotoxicity) of the synthesized molecules were evaluated. Among the compounds tested, two candidates (compounds 3d and 3f) exhibited properties superior to those of the lead compound KES4 as mtInhA-targeted anti-infectives for mycobacteria.

Antitubercular Agents/pharmacology , Bacterial Proteins/antagonists & inhibitors , Mycobacterium tuberculosis/drug effects , Oxidoreductases/antagonists & inhibitors , Antitubercular Agents/chemistry , Computer Simulation , Molecular Docking Simulation , Structure-Activity Relationship

9.

Omics-based Identification of Glycan Structures as Biomarkers for a Variety of Diseases.

Akiyoshi, Sayaka; Iwata, Michio; Berenger, Francois; Yamanishi, Yoshihiro.

Mol Inform ; 39(1-2): e1900112, 2020 01.

Article En | MEDLINE | ID: mdl-31622036

Glycans play important roles in cell communication, protein interaction, and immunity, and structural changes in glycans are associated with the regulation of a range of biological pathways involved in disease. However, our understanding of the detailed relationships between specific diseases and glycans is very limited. In this study, we proposed an omics-based method to investigate the correlations between glycans and a wide range of human diseases. We analyzed the gene expression patterns of glycogenes (glycosyltransferases and glycosidases) for 79 different diseases. A biological pathway-based glycogene signature was constructed to identify the alteration in glycan biosynthesis and the associated glycan structures for each disease state. The degradation of N-glycan and keratan sulfate, for example, may promote the growth or metastasis of multiple types of cancer, including endometrial, gastric, and nasopharyngeal. Our results also revealed that commonalities between diseases can be interpreted using glycogene expression patterns, as well as the associated glycan structure patterns at the level of the affected pathway. The proposed method is expected to be useful for understanding the relationships between glycans, glycogenes, and disease and identifying disease-specific glycan biomarkers.

Biomarkers, Tumor/genetics , Neoplasms/genetics , Polysaccharides/genetics , Biomarkers, Tumor/metabolism , Carbohydrate Conformation , Humans , Neoplasms/metabolism , Polysaccharides/metabolism

10.

Predicting drug-induced transcriptome responses of a wide range of human cell lines by a novel tensor-train decomposition algorithm.

Iwata, Michio; Yuan, Longhao; Zhao, Qibin; Tabei, Yasuo; Berenger, Francois; Sawada, Ryusuke; Akiyoshi, Sayaka; Hamano, Momoko; Yamanishi, Yoshihiro.

Bioinformatics ; 35(14): i191-i199, 2019 07 15.

Article En | MEDLINE | ID: mdl-31510663

MOTIVATION: Genome-wide identification of the transcriptomic responses of human cell lines to drug treatments is a challenging issue in medical and pharmaceutical research. However, drug-induced gene expression profiles are largely unknown and unobserved for all combinations of drugs and human cell lines, which is a serious obstacle in practical applications. RESULTS: Here, we developed a novel computational method to predict unknown parts of drug-induced gene expression profiles for various human cell lines and predict new drug therapeutic indications for a wide range of diseases. We proposed a tensor-train weighted optimization (TT-WOPT) algorithm to predict the potential values for unknown parts in tensor-structured gene expression data. Our results revealed that the proposed TT-WOPT algorithm can accurately reconstruct drug-induced gene expression data for a range of human cell lines in the Library of Integrated Network-based Cellular Signatures. The results also revealed that in comparison with the use of original gene expression profiles, the use of imputed gene expression profiles improved the accuracy of drug repositioning. We also performed a comprehensive prediction of drug indications for diseases with gene expression profiles, which suggested many potential drug indications that were not predicted by previous approaches. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Computational Biology , Transcriptome , Algorithms , Cell Line , Drug Repositioning , Humans

11.

Chemoinformatics and structural bioinformatics in OCaml.

Berenger, Francois; Zhang, Kam Y J; Yamanishi, Yoshihiro.

J Cheminform ; 11(1): 10, 2019 Feb 05.

Article En | MEDLINE | ID: mdl-30719579

BACKGROUND: OCaml is a functional programming language with strong static types, Hindley-Milner type inference and garbage collection. In this article, we share our experience in prototyping chemoinformatics and structural bioinformatics software in OCaml. RESULTS: First, we introduce the language, list entry points for chemoinformaticians who would be interested in OCaml and give code examples. Then, we list some scientific open source software written in OCaml. We also present recent open source libraries useful in chemoinformatics. The parallelization of OCaml programs and their performance is also shown. Finally, tools and methods useful when prototyping scientific software in OCaml are given. CONCLUSIONS: In our experience, OCaml is a programming language of choice for method development in chemoinformatics and structural bioinformatics.

12.

A Distance-Based Boolean Applicability Domain for Classification of High Throughput Screening Data.

Berenger, Francois; Yamanishi, Yoshihiro.

J Chem Inf Model ; 59(1): 463-476, 2019 01 28.

Article En | MEDLINE | ID: mdl-30567434

In Quantitative Structure-Activity Relationship (QSAR) modeling, one must come up with an activity model but also with an applicability domain for that model. Some existing methods to create an applicability domain are complex, hard to implement, and/or difficult to interpret. Also, they often require the user to select a threshold value, or they embed an empirical constant. In this work, we propose a trivial to interpret and fully automatic Distance-Based Boolean Applicability Domain (DBBAD) algorithm for category QSAR. In retrospective experiments on High Throughput Screening data sets, this applicability domain improves the classification performance and early retrieval of support vector machine and random forest based classifiers, while improving the scaffold diversity among top-ranked active molecules.

Algorithms , Computational Biology/methods , Drug Evaluation, Preclinical , High-Throughput Screening Assays , Quantitative Structure-Activity Relationship

13.

Consensus queries in ligand-based virtual screening experiments.

Berenger, Francois; Vu, Oanh; Meiler, Jens.

J Cheminform ; 9(1): 60, 2017 Nov 28.

Article En | MEDLINE | ID: mdl-29185065

BACKGROUND: In ligand-based virtual screening experiments, a known active ligand is used in similarity searches to find putative active compounds for the same protein target. When there are several known active molecules, screening using all of them is more powerful than screening using a single ligand. A consensus query can be created by either screening serially with different ligands before merging the obtained similarity scores, or by combining the molecular descriptors (i.e. chemical fingerprints) of those ligands. RESULTS: We report on the discriminative power and speed of several consensus methods, on two datasets only made of experimentally verified molecules. The two datasets contain a total of 19 protein targets, 3776 known active and ~2 × 10^6 inactive molecules. Three chemical fingerprints are investigated: MACCS 166 bits, ECFP4 2048 bits and an unfolded version of MOLPRINT2D. Four different consensus policies and five consensus sizes were benchmarked. CONCLUSIONS: The best consensus method is to rank candidate molecules using the maximum score obtained by each candidate molecule versus all known actives. When the number of actives used is small, the same screening performance can be approached by a consensus fingerprint. However, if the computational exploration of the chemical space is limited by speed (i.e. throughput), a consensus fingerprint can outperform this consensus of scores.
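
The two kinds of consensus query contrasted in the abstract above can be sketched as follows. Fingerprints are modeled as sets of bit indices. The "maximum score versus all known actives" policy follows the abstract; building the consensus fingerprint as the union of the actives' bits is an assumption, since the abstract does not say how descriptors are combined.

```python
from typing import Dict, FrozenSet, List

Fingerprint = FrozenSet[int]  # indices of the bits set in a fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity; defined as 0 when both fingerprints are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_score_consensus(candidate: Fingerprint,
                        actives: List[Fingerprint]) -> float:
    """Consensus of scores: best similarity to any known active."""
    return max(tanimoto(candidate, a) for a in actives)


def consensus_fingerprint(actives: List[Fingerprint]) -> Fingerprint:
    """Consensus of descriptors (assumed policy: union of set bits)."""
    return frozenset().union(*actives)


def rank_by_max(candidates: Dict[str, Fingerprint],
                actives: List[Fingerprint]) -> List[str]:
    """Candidate names, most similar first."""
    return sorted(candidates,
                  key=lambda name: max_score_consensus(candidates[name], actives),
                  reverse=True)


actives = [frozenset({1, 2, 3}), frozenset({3, 4, 5})]
candidates = {"c1": frozenset({1, 2}), "c2": frozenset({3, 4, 5, 6})}
print(rank_by_max(candidates, actives))
print(tanimoto(candidates["c1"], consensus_fingerprint(actives)))
```

The trade-off is visible in the cost: the max-score policy needs one comparison per active for every candidate, while a consensus fingerprint needs a single comparison per candidate.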

14.

Fragger: a protein fragment picker for structural queries.

Berenger, Francois; Simoncini, David; Voet, Arnout; Shrestha, Rojan; Zhang, Kam Y J.

F1000Res ; 6: 1722, 2017.

Article En | MEDLINE | ID: mdl-29399321

Protein modeling and design activities often require querying the Protein Data Bank (PDB) with a structural fragment, possibly containing gaps. For some applications, it is preferable to work on a specific subset of the PDB or with unpublished structures. These requirements, along with specific user needs, motivated the creation of a new software to manage and query 3D protein fragments. Fragger is a protein fragment picker that allows protein fragment databases to be created and queried. All fragment lengths are supported and any set of PDB files can be used to create a database. Fragger can efficiently search a fragment database with a query fragment and a distance threshold. Matching fragments are ranked by distance to the query. The query fragment can have structural gaps and the allowed amino acid sequences matching a query can be constrained via a regular expression of one-letter amino acid codes. Fragger also incorporates a tool to compute the backbone RMSD of one versus many fragments in high throughput. Fragger should be useful for protein design, loop grafting and related structural bioinformatics tasks.
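
The query behavior described in the abstract above (distance threshold, ranking by distance, sequence constraint via a regular expression of one-letter codes) can be mimicked in a few lines. This sketch does not use Fragger itself: the candidate list, the precomputed distances, and the `pick_fragments` helper are hypothetical, and Python's `re` syntax is assumed in place of whatever regex dialect the tool accepts.

```python
import re
from typing import List, Tuple


def pick_fragments(candidates: List[Tuple[str, float]],
                   max_dist: float,
                   seq_pattern: str) -> List[str]:
    """Keep fragments within `max_dist` of the query whose whole sequence
    matches `seq_pattern`, then rank them by increasing distance."""
    regex = re.compile(seq_pattern)
    hits = [(dist, seq) for seq, dist in candidates
            if dist <= max_dist and regex.fullmatch(seq)]
    return [seq for _, seq in sorted(hits)]


# Hypothetical (sequence, distance-to-query) pairs.
candidates = [("GAVP", 0.8), ("GSLP", 0.3), ("GAVA", 0.1), ("GTKP", 2.5)]
# Gly, then Ala/Ser/Thr, then any residue, then Pro.
print(pick_fragments(candidates, max_dist=1.0, seq_pattern="G[AST].P"))
```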

15.

A rotation-translation invariant molecular descriptor of partial charges and its use in ligand-based virtual screening.

Berenger, Francois; Voet, Arnout; Lee, Xiao Yin; Zhang, Kam Y J.

J Cheminform ; 6: 23, 2014.

Article En | MEDLINE | ID: mdl-24887178

BACKGROUND: Measures of similarity for chemical molecules have been developed since the dawn of chemoinformatics. Molecular similarity has been measured by a variety of methods including molecular descriptor based similarity, common molecular fragments, graph matching and 3D methods such as shape matching. Similarity measures are widespread in practice and have proven to be useful in drug discovery. Because of our interest in electrostatics and high throughput ligand-based virtual screening, we sought to exploit the information contained in atomic coordinates and partial charges of a molecule. RESULTS: A new molecular descriptor based on partial charges is proposed. It uses the autocorrelation function and linear binning to encode all atoms of a molecule into two rotation-translation invariant vectors. Combined with a scoring function, the descriptor allows a database of compounds to be rank-ordered versus a query molecule. The proposed implementation is called ACPC (AutoCorrelation of Partial Charges) and released in open source. Extensive retrospective ligand-based virtual screening experiments were performed, and the method was compared with other methods in order to validate it and the associated protocol. CONCLUSIONS: While it is a simple method, it performed remarkably well in experiments. At an average speed of 1649 molecules per second, it reached an average median area under the curve of 0.81 on 40 different targets, hence validating the proposed protocol and implementation.
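
The encoding idea in the abstract above (autocorrelation over atom pairs, with linear binning of interatomic distances) can be sketched as below. This is not the ACPC implementation: splitting the pairs into one vector for same-sign charge products and one for opposite-sign products is an assumed reading of "two vectors", and `bin_width` and `n_bins` are illustrative parameters.

```python
import math
from typing import List, Tuple

Atom = Tuple[float, float, float, float]  # x, y, z, partial charge


def autocorrelation(atoms: List[Atom],
                    bin_width: float = 1.0,
                    n_bins: int = 10) -> Tuple[List[float], List[float]]:
    """Return (same_sign, opposite_sign) vectors indexed by distance bin.
    Only interatomic distances and charges are used, so the result does
    not change when the molecule is rotated or translated."""
    same = [0.0] * n_bins
    opposite = [0.0] * n_bins
    for i in range(len(atoms)):
        xi, yi, zi, qi = atoms[i]
        for j in range(i + 1, len(atoms)):
            xj, yj, zj, qj = atoms[j]
            d = math.dist((xi, yi, zi), (xj, yj, zj))
            product = qi * qj
            target = same if product >= 0 else opposite
            # Linear binning: split the weight between the two nearest
            # bin centers (located at 0, bin_width, 2 * bin_width, ...).
            x = d / bin_width
            lo = int(math.floor(x))
            frac = x - lo
            weight = abs(product)
            if lo < n_bins:
                target[lo] += weight * (1.0 - frac)
            if lo + 1 < n_bins:
                target[lo + 1] += weight * frac
    return same, opposite


# Two hypothetical atoms 1.5 distance units apart with opposite charges.
molecule = [(0.0, 0.0, 0.0, 0.5), (1.5, 0.0, 0.0, -0.4)]
print(autocorrelation(molecule, bin_width=1.0, n_bins=4))
```

Linear binning avoids the discontinuity of hard histogram bins: a small change in a distance moves weight smoothly between neighboring bins instead of flipping a count.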

16.

Combining in silico and in cerebro approaches for virtual screening and pose prediction in SAMPL4.

Voet, Arnout R D; Kumar, Ashutosh; Berenger, Francois; Zhang, Kam Y J.

J Comput Aided Mol Des ; 28(4): 363-73, 2014 Apr.

Article En | MEDLINE | ID: mdl-24446075

The SAMPL challenges provide an ideal opportunity for unbiased evaluation and comparison of different approaches used in computational drug design. During the fourth round of this SAMPL challenge, we participated in the virtual screening and binding pose prediction on inhibitors targeting the HIV-1 integrase enzyme. For virtual screening, we used well known and widely used in silico methods combined with personal in cerebro insights and experience. Regular docking only performed slightly better than random selection, but the performance was significantly improved upon incorporation of additional filters based on pharmacophore queries and electrostatic similarities. The best performance was achieved when logical selection was added. For the pose prediction, we utilized a similar consensus approach that amalgamated the results of the Glide-XP docking with structural knowledge and rescoring. The pose prediction results revealed that docking displayed reasonable performance in predicting the binding poses. However, prediction performance can be improved utilizing scientific experience and rescoring approaches. In both the virtual screening and pose prediction challenges, the top performance was achieved by our approaches. Here we describe the methods and strategies used in our approaches and discuss the rationale of their performances.

Computer-Aided Design , HIV Integrase Inhibitors/chemistry , HIV Integrase Inhibitors/pharmacology , HIV Integrase/metabolism , HIV-1/enzymology , Molecular Docking Simulation , Drug Design , HIV Infections/drug therapy , HIV Infections/enzymology , HIV Infections/virology , HIV Integrase/chemistry , Humans , Protein Binding , Software

17.

Electrostatic similarities between protein and small molecule ligands facilitate the design of protein-protein interaction inhibitors.

Voet, Arnout; Berenger, Francois; Zhang, Kam Y J.

PLoS One ; 8(10): e75762, 2013.

Article En | MEDLINE | ID: mdl-24130741

One of the underlying principles in drug discovery is that a biologically active compound is complementary in shape and molecular recognition features to its receptor. This principle implies that molecules binding to the same receptor may share some common features. Here, we have investigated whether the electrostatic similarity can be used for the discovery of small molecule protein-protein interaction inhibitors (SMPPIIs). We have developed a method that can be used to evaluate the similarity of electrostatic potentials between small molecules and known protein ligands. This method was implemented in a software called EleKit. Analyses of all available (at the time of research) SMPPII structures indicate that SMPPIIs bear some similarities of electrostatic potential with the ligand proteins of the same receptor. This is especially true for the more polar SMPPIIs. Retrospective analysis of several successful SMPPIIs has shown the applicability of EleKit in the design of new SMPPIIs.

Proteins/antagonists & inhibitors , Proteins/metabolism , Static Electricity , Ligands , Protein Binding , Proteins/chemistry , Software

18.

A probabilistic fragment-based protein structure prediction algorithm.

Simoncini, David; Berenger, Francois; Shrestha, Rojan; Zhang, Kam Y J.

PLoS One ; 7(7): e38799, 2012.

Article En | MEDLINE | ID: mdl-22829868

Conformational sampling is one of the bottlenecks in fragment-based protein structure prediction approaches. They generally start with a coarse-grained optimization where mainchain atoms and centroids of side chains are considered, followed by a fine-grained optimization with an all-atom representation of proteins. It is during this coarse-grained phase that fragment-based methods intensely sample the conformational space. If the native-like region is sampled more, the accuracy of the final all-atom predictions may be improved accordingly. In this work we present EdaFold, a new method for fragment-based protein structure prediction based on an Estimation of Distribution Algorithm. Fragment-based approaches build protein models by assembling short fragments from known protein structures. Whereas the probability mass functions over the fragment libraries are uniform in the usual case, we propose an algorithm that learns from previously generated decoys and steers the search toward native-like regions. A comparison with the Rosetta AbInitio protocol shows that EdaFold is able to generate models with lower energies and to enhance the percentage of near-native coarse-grained decoys on a benchmark of [Formula: see text] proteins. The best coarse-grained models produced by both methods were refined into all-atom models and used in molecular replacement. All-atom decoys produced from EdaFold's decoy set reach high enough accuracy to solve the crystallographic phase problem by molecular replacement for some test proteins. EdaFold showed a higher success rate in molecular replacement when compared to Rosetta. Our study suggests that improving low-resolution coarse-grained decoys allows computational methods to avoid subsequent sampling issues during all-atom refinement and to produce better all-atom models. EdaFold can be downloaded from http://www.riken.jp/zhangiru/software.html [corrected].

Algorithms , Proteins/chemistry , Protein Conformation

19.

Durandal: fast exact clustering of protein decoys.

Berenger, Francois; Shrestha, Rojan; Zhou, Yong; Simoncini, David; Zhang, Kam Y J.

J Comput Chem ; 33(4): 471-4, 2012 Feb 05.

Article En | MEDLINE | ID: mdl-22120171

In protein folding, clustering is commonly used as one way to identify the best decoy produced. Initializing the pairwise distance matrix for a large decoy set is computationally expensive. We have proposed a fast method that works even on large decoy sets. This method is implemented in a software called Durandal. Durandal has been shown to be consistently faster than other software performing fast exact clustering. In some cases, Durandal can even outperform the speed of an approximate method. Durandal uses the triangular inequality to accelerate exact clustering, without compromising the distance function. Recently, we have further enhanced the performance of Durandal by incorporating a Quaternion-based characteristic polynomial method that has increased the speed of Durandal between 13% and 27% compared with the previous version. Durandal source code is available under the GNU General Public License at http://www.riken.jp/zhangiru/software/durandal_released_qcp.tgz. Alternatively, a compiled version of Durandal is also distributed with the nightly builds of the Phenix (http://www.phenix-online.org/) crystallographic software suite (Adams et al., Acta Crystallogr Sect D 2010, 66, 213).

Protein Folding , Proteins/chemistry , Software , Cluster Analysis
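
The role of the triangular inequality in the Durandal abstract above can be shown with a small sketch. Given distances to one reference item, lower and upper bounds on any pairwise distance are known, and the exact distance is computed only when those bounds straddle the cutoff. This is an illustration of the principle, not Durandal's algorithm: the single fixed reference, the function name, and the 1D toy data are assumptions.

```python
from typing import Callable, List, Tuple


def neighbors_within(n: int,
                     dist: Callable[[int, int], float],
                     cutoff: float,
                     ref: int = 0) -> Tuple[List[Tuple[int, int]], int]:
    """Return (pairs with distance <= cutoff, number of dist() calls).
    For any metric: |d(r,i) - d(r,j)| <= d(i,j) <= d(r,i) + d(r,j)."""
    to_ref = [dist(ref, i) for i in range(n)]
    evaluated = n
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            lower = abs(to_ref[i] - to_ref[j])
            upper = to_ref[i] + to_ref[j]
            if lower > cutoff:
                continue  # certainly too far apart: skip the computation
            if upper <= cutoff:
                pairs.append((i, j))  # certainly close: no computation needed
                continue
            evaluated += 1  # bounds are inconclusive: compute exactly
            if dist(i, j) <= cutoff:
                pairs.append((i, j))
    return pairs, evaluated


# Toy metric: absolute difference between points on a line.
points = [0.0, 0.5, 10.0, 10.4]
pairs, calls = neighbors_within(len(points),
                                lambda i, j: abs(points[i] - points[j]),
                                cutoff=1.0)
print(pairs, calls)
```

The result is identical to a brute-force pass over all pairs, which is the sense in which such pruning stays exact; the benefit grows when each distance (for example an RMSD between decoys) is expensive.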

20.

Accelerating ab initio phasing with de novo models.

Shrestha, Rojan; Berenger, Francois; Zhang, Kam Y J.

Acta Crystallogr D Biol Crystallogr ; 67(Pt 9): 804-12, 2011 Sep.

Article En | MEDLINE | ID: mdl-21904033

Ab initio phasing is one of the remaining challenges in protein crystallography. Recent progress in computational structure prediction has enabled the generation of de novo models with high enough accuracy to solve the phase problem ab initio. This 'ab initio phasing with de novo models' method first generates a huge number of de novo models and then selects some lowest energy models to solve the phase problem using molecular replacement. The amount of CPU time required is huge even for small proteins and this has limited the utility of this method. Here, an approach is described that significantly reduces the computing time required to perform ab initio phasing with de novo models. Instead of performing molecular replacement after the completion of all models, molecular replacement is initiated during the course of each simulation. The approach principally focuses on avoiding the refinement of the best and the worst models and terminating the entire simulation early once suitable models for phasing have been obtained. In a benchmark data set of 20 proteins, this method is over two orders of magnitude faster than the conventional approach. It was observed that in most cases molecular-replacement solutions were determined soon after the coarse-grained models were turned into full-atom representations. It was also found that all-atom refinement was hardly able to change the models sufficiently to enable successful molecular replacement if the coarse-grained models were not very close to the native structure. Therefore, it remains critical to generate good-quality coarse-grained models to enable subsequent all-atom refinement for successful ab initio phasing by molecular replacement.

Crystallography, X-Ray/methods , Proteins/chemistry