Search | Nursing VHL Search Portal

Improving protein-ligand docking and screening accuracies by incorporating a scoring function correction term.

Zheng, Liangzhen; Meng, Jintao; Jiang, Kai; Lan, Haidong; Wang, Zechen; Lin, Mingzhi; Li, Weifeng; Guo, Hongwei; Wei, Yanjie; Mu, Yuguang.

Brief Bioinform ; 23(3)2022 05 13.

Article in English | MEDLINE | ID: mdl-35289359

ABSTRACT

Scoring functions are important components in molecular docking for structure-based drug discovery. Traditional scoring functions, generally empirical- or force field-based, are robust and have proven useful for hit identification and lead optimization. Although multiple highly accurate deep learning- or machine learning-based scoring functions have been developed, their direct application to docking and screening is limited. We describe a novel strategy to develop a reliable protein-ligand scoring function by augmenting the traditional Vina score with a correction term (OnionNet-SFCT). The correction term is based on an AdaBoost random forest model that uses multiple layers of contacts formed between protein residues and ligand atoms. Added to the Vina score, the correction term considerably improves the predictive ability of AutoDock Vina for docking and screening tasks on different benchmarks (such as a cross-docking dataset, CASF-2016, DUD-E and DUD-AD). Furthermore, our model can be combined with multiple docking applications to increase pose selection accuracy and screening ability, indicating its broad applicability to structure-based drug discovery. In addition, in a reverse practice, the combined scoring strategy successfully identified multiple known receptors of a plant hormone. To summarize, the results show that combining a data-driven model (OnionNet-SFCT) with an empirical scoring function (Vina score) is a good scoring strategy that could be useful for structure-based drug discovery and potentially for target fishing in the future.

Subject(s)

Drug Discovery , Proteins , Drug Discovery/methods , Ligands , Machine Learning , Molecular Docking Simulation , Protein Binding , Proteins/chemistry
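
The combined scoring idea in the abstract above can be illustrated with a minimal sketch. The linear mixing form, the weight `alpha`, and the convention that a lower correction value is better are assumptions made for illustration and are not stated in this record; `Pose`, `combined_score` and `rerank` are hypothetical names, not part of AutoDock Vina or OnionNet-SFCT.

```python
from dataclasses import dataclass


@dataclass
class Pose:
    name: str
    vina_score: float   # empirical Vina score (lower is better)
    correction: float   # data-driven correction term (assumed: lower is better)


def combined_score(pose: Pose, alpha: float = 0.5) -> float:
    """Assumed linear mix of the empirical score and the correction term."""
    return (1.0 - alpha) * pose.vina_score + alpha * pose.correction


def rerank(poses, alpha=0.5):
    """Return poses sorted from best (lowest combined score) to worst."""
    return sorted(poses, key=lambda p: combined_score(p, alpha))


if __name__ == "__main__":
    poses = [Pose("pose_a", -8.0, 3.0), Pose("pose_b", -7.5, 0.5)]
    for p in rerank(poses):
        print(p.name, round(combined_score(p), 3))
```

With `alpha = 0` the ranking reduces to the plain Vina ranking, so the weight controls how strongly the correction term can reorder poses.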

RabbitQC: high-speed scalable quality control for sequencing data.

Yin, Zekun; Zhang, Hao; Liu, Meiyang; Zhang, Wen; Song, Honglei; Lan, Haidong; Wei, Yanjie; Niu, Beifang; Schmidt, Bertil; Liu, Weiguo.

Bioinformatics ; 37(4): 573-574, 2021 05 01.

Article in English | MEDLINE | ID: mdl-32790850

ABSTRACT

MOTIVATION: Modern sequencing technologies continue to revolutionize many areas of biology and medicine. Since the generated datasets are error-prone, downstream applications usually require quality control methods to pre-process FASTQ files. However, existing tools for this task cannot fully exploit the capabilities of computing platforms, which leads to slow runtimes. RESULTS: We present RabbitQC, an extremely fast integrated quality control tool for FASTQ files, which can take full advantage of modern hardware. It includes a variety of operations and supports different sequencing technologies (Illumina, Oxford Nanopore and PacBio). RabbitQC achieves speedups of one to two orders of magnitude compared to other state-of-the-art tools. AVAILABILITY AND IMPLEMENTATION: C++ sources and binaries are available at https://github.com/ZekunYin/RabbitQC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Nanopores , Software , High-Throughput Nucleotide Sequencing , Quality Control , Sequence Analysis, DNA
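
To make the kind of FASTQ pre-processing mentioned above concrete, here is a minimal sketch of one common quality control step: dropping reads whose mean Phred quality falls below a threshold. It is not RabbitQC code and does not reflect its implementation or options; the function names, the thresholds, and the assumption of Phred+33 encoded, strictly four-line records are illustrative choices.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality) from strict 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue                  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)                      # '+' separator line, ignored
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (assumes Phred+33 encoding)."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def filter_reads(lines, min_mean_q=20.0, min_len=1):
    """Keep reads that are long enough and have sufficient mean quality."""
    kept = []
    for read_id, seq, qual in parse_fastq(lines):
        if len(seq) >= min_len and mean_phred(qual) >= min_mean_q:
            kept.append((read_id, seq, qual))
    return kept
```

A call such as `filter_reads(open("reads.fastq"))` (a hypothetical file name) would return the surviving records as tuples; a production tool would stream, parallelize and report statistics rather than build a list.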

When homologous sequences meet structural decoys: Accurate contact prediction by tFold in CASP14-(tFold for CASP14 contact prediction).

Shen, Tao; Wu, Jiaxiang; Lan, Haidong; Zheng, Liangzhen; Pei, Jianguo; Wang, Sheng; Liu, Wei; Huang, Junzhou.

Proteins ; 89(12): 1901-1910, 2021 12.

Article in English | MEDLINE | ID: mdl-34473376

ABSTRACT

In this paper, we report the performance of our tFold framework on the inter-residue contact prediction task in the 14th Critical Assessment of protein Structure Prediction (CASP14). The tFold framework combines both homologous sequences and structural decoys under an ultra-deep network architecture. Squeeze-excitation and axial attention mechanisms are employed to effectively capture inter-residue interactions. In CASP14, our best predictor achieved an average top-L precision of 41.78% for long-range contacts across all 22 free-modeling (FM) targets and ranked 1st among the 60 participating teams. The tFold web server is now freely available at: https://drug.ai.tencent.com/console/en/tfold.

Subject(s)

Neural Networks, Computer , Protein Folding , Proteins , Software , Structural Homology, Protein , Computational Biology , Models, Molecular , Proteins/chemistry , Proteins/metabolism , Reproducibility of Results , Sequence Analysis, Protein
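
The top-L long-range precision quoted in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the official CASP evaluation code: long-range pairs are taken to be those with sequence separation of at least 24 residues, the precision is computed over the k highest-scoring such pairs (k defaults to the sequence length L), and the function name and input layout are hypothetical.

```python
def top_l_long_range_precision(pred, true_contacts, min_sep=24, k=None):
    """Precision of the k highest-scoring long-range predicted contacts.

    pred          -- L x L nested list of predicted contact probabilities
    true_contacts -- set of (i, j) index pairs with i < j
    min_sep       -- minimum sequence separation counted as long-range
    k             -- number of top predictions to score (default: L)
    """
    length = len(pred)
    if k is None:
        k = length
    # Collect every residue pair that is far enough apart in sequence.
    candidates = [(pred[i][j], i, j)
                  for i in range(length)
                  for j in range(i + min_sep, length)]
    candidates.sort(key=lambda item: item[0], reverse=True)
    top = candidates[:k]
    if not top:
        return 0.0
    hits = sum((i, j) in true_contacts for _, i, j in top)
    return hits / len(top)
```

Averaging this value over a set of targets gives a per-benchmark figure of the kind reported in the abstract.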

Biosynthesis of Chuangxinmycin Featuring a Deubiquitinase-like Sulfurtransferase.

Zhang, Xingwang; Xu, Xiaokun; You, Cai; Yang, Chaofan; Guo, Jiawei; Sang, Moli; Geng, Ce; Cheng, Fangyuan; Du, Lei; Shen, Yuemao; Wang, Sheng; Lan, Haidong; Yang, Fan; Li, Yuezhong; Tang, Ya-Jie; Zhang, Youming; Bian, Xiaoying; Li, Shengying; Zhang, Wei.

Angew Chem Int Ed Engl ; 60(46): 24418-24423, 2021 11 08.

Article in English | MEDLINE | ID: mdl-34498345

ABSTRACT

Knowledge of the sulfur incorporation mechanisms involved in the biosynthesis of sulfur-containing molecules remains limited. Chuangxinmycin is a sulfur-containing antibiotic with a unique thiopyrano[4,3,2-cd]indole (TPI) skeleton and selective inhibitory activity against bacterial tryptophanyl-tRNA synthetase. Despite the previously reported biosynthetic gene clusters and the recent functional characterization of a P450 enzyme responsible for C-S bond formation, the enzymatic mechanism for sulfur incorporation remains unknown. Here, we resolve this central biosynthetic problem by in vitro biochemical characterization of the key enzymes and reconstitute the TPI skeleton in a one-pot enzymatic reaction. We reveal that the JAMM/MPN+ protein Cxm3 functions as a deubiquitinase-like sulfurtransferase that catalyzes a non-classical sulfur-transfer reaction by interacting with the ubiquitin-like sulfur carrier protein Cxm4GG. This finding adds a new mechanism for sulfurtransferases in nature.

Subject(s)

Anti-Bacterial Agents/biosynthesis , Bacterial Proteins/metabolism , Sulfurtransferases/metabolism , Actinoplanes/genetics , Actinoplanes/metabolism , Anti-Bacterial Agents/chemistry , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Escherichia coli/chemistry , Escherichia coli/genetics , Escherichia coli/metabolism , Humans , Indoles/analysis , Indoles/chemistry , Indoles/metabolism , Multigene Family , Pyrococcus/enzymology , Pyrococcus/genetics , Sulfur/metabolism , Sulfurtransferases/chemistry , Sulfurtransferases/genetics , Ubiquitination , Ubiquitins/genetics , Ubiquitins/metabolism

BGSA: a bit-parallel global sequence alignment toolkit for multi-core and many-core architectures.

Zhang, Jikai; Lan, Haidong; Chan, Yuandong; Shang, Yuan; Schmidt, Bertil; Liu, Weiguo.

Bioinformatics ; 35(13): 2306-2308, 2019 07 01.

Article in English | MEDLINE | ID: mdl-30445566

ABSTRACT

MOTIVATION: Modern bioinformatics tools for analyzing large-scale NGS datasets often need to include fast implementations of core sequence alignment algorithms in order to achieve reasonable execution times. We address this need by presenting the BGSA toolkit for optimized implementations of popular bit-parallel global pairwise alignment algorithms on modern microprocessors. RESULTS: BGSA outperforms Edlib, SeqAn and BitPAl for pairwise edit distance computations and Parasail, SeqAn and BitPAl when using more general scoring schemes for pairwise alignments of a batch of sequence reads on both standard multi-core CPUs and Xeon Phi many-core CPUs. Furthermore, banded edit distance performance of BGSA on a Xeon Phi-7210 outperforms the highly optimized NVBio implementation on a Titan X GPU for the seed verification stage of a read mapper by a factor of 4.4. AVAILABILITY AND IMPLEMENTATION: BGSA is open-source and available at https://github.com/sdu-hpcl/BGSA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Algorithms , Software , Sequence Alignment , Sequence Analysis, DNA
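
As an illustration of what "bit-parallel" means for the edit distance computations mentioned above, the sketch below uses the well-known Myers bit-vector recurrence (in the Hyyrö-style formulation for global distance). BGSA's own kernels are not shown in this record, so this should be read as one example of the technique rather than as the toolkit's implementation; the function name is hypothetical, and Python's arbitrary-precision integers stand in for the fixed-width machine words and SIMD lanes a real implementation would use.

```python
def bit_parallel_edit_distance(pattern, text):
    """Global (Levenshtein) edit distance via a bit-vector recurrence."""
    m = len(pattern)
    if m == 0:
        return len(text)
    mask = (1 << m) - 1
    last = 1 << (m - 1)

    # Bit i of peq[c] is set when pattern[i] == c.
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    vp, vn = mask, 0          # vertical +1 / -1 deltas of the current column
    score = m                 # distance between pattern and the empty text
    for ch in text:
        eq = peq.get(ch, 0)
        d0 = ((((eq & vp) + vp) ^ vp) | eq | vn) & mask
        hp = (vn | ~(d0 | vp)) & mask     # horizontal +1 deltas
        hn = d0 & vp                      # horizontal -1 deltas
        if hp & last:
            score += 1
        elif hn & last:
            score -= 1
        hp = ((hp << 1) | 1) & mask       # low bit set: top row grows by 1
        hn = (hn << 1) & mask
        vp = (hn | ~(d0 | hp)) & mask
        vn = hp & d0
    return score
```

Each loop iteration updates a whole column of the dynamic programming matrix with a handful of word operations, which is the source of the speedup over cell-by-cell computation.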

Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo.

BMC Bioinformatics ; 17 Suppl 9: 267, 2016 Jul 19.

Article in English | MEDLINE | ID: mdl-27455061

ABSTRACT

BACKGROUND: Computing alignments between two or more sequences is a common operation in computational molecular biology. The continuing growth of biological sequence databases establishes the need for efficient parallel implementations of these computations on modern accelerators. RESULTS: This paper presents new approaches to high performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture, i.e. cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. CONCLUSIONS: Evaluations show that our method achieves a peak overall performance of up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes, for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi.

Subject(s)

Computational Biology/methods , Proteins/chemistry , Algorithms , Amino Acid Sequence , Databases, Nucleic Acid , Databases, Protein , Proteins/genetics , Sequence Alignment , Software
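
For readers unfamiliar with the quantities in the abstract above, the following minimal sketch shows a scalar Smith-Waterman local alignment score and how a GCUPS figure (giga cell updates per second) is derived from it. It is a plain reference version, not the parallel implementation described in the record; the linear gap penalty, the scoring values and the function names are assumptions for illustration.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty (assumed scheme)."""
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def gcups(query_len, db_residues, seconds):
    """Giga cell updates per second for scanning a database with one query."""
    return query_len * db_residues / seconds / 1e9
```

Under this definition, scanning 220 million database residues with a 1000-residue query in one second corresponds to 220 GCUPS.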

Computing Platforms for Big Biological Data Analytics: Perspectives and Challenges.

Yin, Zekun; Lan, Haidong; Tan, Guangming; Lu, Mian; Vasilakos, Athanasios V; Liu, Weiguo.

Comput Struct Biotechnol J ; 15: 403-411, 2017.

Article in English | MEDLINE | ID: mdl-28883909

ABSTRACT

The last decade has witnessed an explosion in the amount of available biological sequence data, due to the rapid progress of high-throughput sequencing projects. However, the amount of biological data is becoming so great that traditional data analysis platforms and methods can no longer meet the need to rapidly perform data analysis tasks in the life sciences. As a result, both biologists and computer scientists face the challenge of gaining profound insight into the deepest biological functions from big biological data. This in turn requires massive computational resources. Therefore, high performance computing (HPC) platforms are highly needed, as are efficient and scalable algorithms that can take advantage of these platforms. In this paper, we survey the state-of-the-art HPC platforms for big biological data analytics. We first list the characteristics of big biological data and popular computing platforms. Then we provide a taxonomy of different biological data analysis applications and a survey of the way they have been mapped onto various computing platforms. After that, we present a case study to compare the efficiency of different computing platforms for handling the classical biological sequence alignment problem. Finally, we discuss the open issues in big biological data analytics.
