Search | BVS Infectious and Parasitic Diseases

Improving protein-ligand docking and screening accuracies by incorporating a scoring function correction term.

Zheng, Liangzhen; Meng, Jintao; Jiang, Kai; Lan, Haidong; Wang, Zechen; Lin, Mingzhi; Li, Weifeng; Guo, Hongwei; Wei, Yanjie; Mu, Yuguang.

Brief Bioinform; 23(3), 2022 May 13.

Article in English | MEDLINE | ID: mdl-35289359

ABSTRACT

Scoring functions are important components in molecular docking for structure-based drug discovery. Traditional scoring functions, generally empirical- or force field-based, are robust and have proven to be useful for identifying hits and for lead optimization. Although multiple highly accurate deep learning- or machine learning-based scoring functions have been developed, their direct applications for docking and screening are limited. We describe a novel strategy to develop a reliable protein-ligand scoring function by augmenting the traditional scoring function Vina score using a correction term (OnionNet-SFCT). The correction term is developed based on an AdaBoost random forest model, utilizing multiple layers of contacts formed between protein residues and ligand atoms. When added to the Vina score, the model considerably enhances the AutoDock Vina prediction abilities for docking and screening tasks based on different benchmarks (such as cross-docking dataset, CASF-2016, DUD-E and DUD-AD). Furthermore, our model could be combined with multiple docking applications to increase pose selection accuracies and screening abilities, indicating its broad applicability for structure-based drug discovery. In addition, in a reverse practice, the combined scoring strategy successfully identified multiple known receptors of a plant hormone. To summarize, the results show that the combination of a data-driven model (OnionNet-SFCT) and an empirical scoring function (Vina score) is a good scoring strategy that could be useful for structure-based drug discovery and potentially for target fishing in the future.

Subjects

Drug Discovery; Proteins; Drug Discovery/methods; Ligands; Machine Learning; Molecular Docking Simulation; Protein Binding; Proteins/chemistry
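
The combined scoring strategy in the OnionNet-SFCT abstract above (Vina score plus a data-driven correction term) can be illustrated with a minimal sketch. The abstract does not give the exact combination formula, so the linear blend, the weight `alpha`, the sign convention of the correction term, and the names `combined_score` and `rank_poses` are all assumptions made for illustration. The only convention taken as given is that lower (more negative) Vina scores indicate better poses.

```python
# Hypothetical sketch: re-rank docking poses with a Vina score plus a
# correction term. The blend formula and alpha are assumptions, not the
# published OnionNet-SFCT definition.

def combined_score(vina_score, sfct_correction, alpha=0.5):
    """Blend an empirical score with a learned correction (lower is better)."""
    return (1.0 - alpha) * vina_score + alpha * sfct_correction


def rank_poses(poses, alpha=0.5):
    """Sort poses from best to worst by the combined score.

    Each pose is a dict with hypothetical keys 'name', 'vina' and 'sfct'.
    """
    return sorted(
        poses,
        key=lambda p: combined_score(p["vina"], p["sfct"], alpha),
    )


if __name__ == "__main__":
    poses = [
        {"name": "a", "vina": -9.0, "sfct": 2.0},  # best Vina score, large correction
        {"name": "b", "vina": -8.5, "sfct": 0.5},  # slightly worse Vina, small correction
    ]
    for pose in rank_poses(poses):
        print(pose["name"], combined_score(pose["vina"], pose["sfct"]))
```

In this toy input, pose `a` wins on the Vina score alone, but the correction term moves pose `b` to the top, which is the kind of pose re-selection the abstract describes.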

RabbitQC: high-speed scalable quality control for sequencing data.

Yin, Zekun; Zhang, Hao; Liu, Meiyang; Zhang, Wen; Song, Honglei; Lan, Haidong; Wei, Yanjie; Niu, Beifang; Schmidt, Bertil; Liu, Weiguo.

Bioinformatics; 37(4): 573-574, 2021 May 01.

Article in English | MEDLINE | ID: mdl-32790850

ABSTRACT

MOTIVATION: Modern sequencing technologies continue to revolutionize many areas of biology and medicine. Since the generated datasets are error-prone, downstream applications usually require quality control methods to pre-process FASTQ files. However, existing tools for this task are currently not able to fully exploit the capabilities of computing platforms, leading to slow runtimes. RESULTS: We present RabbitQC, an extremely fast integrated quality control tool for FASTQ files, which can take full advantage of modern hardware. It includes a variety of operations and supports different sequencing technologies (Illumina, Oxford Nanopore and PacBio). RabbitQC achieves speedups of between one and two orders of magnitude compared to other state-of-the-art tools. AVAILABILITY AND IMPLEMENTATION: C++ sources and binaries are available at https://github.com/ZekunYin/RabbitQC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Nanopores; Software; High-Throughput Nucleotide Sequencing; Quality Control; Sequence Analysis, DNA
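
To make the FASTQ quality-control task from the RabbitQC abstract above concrete, here is a minimal single-threaded sketch of one typical operation: dropping reads whose mean Phred quality is below a threshold. It is not RabbitQC's implementation and does not attempt its performance optimizations. The function names, the threshold of 20, and the assumption of four-line records with Phred+33 encoding are illustrative choices.

```python
import io

# Illustrative FASTQ filter; assumes 4-line records and Phred+33 qualities.


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a FASTQ text handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # separator line starting with '+'
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality


def mean_phred(quality, offset=33):
    """Average Phred score of a quality string."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(handle, min_mean_quality=20.0):
    """Keep reads whose mean quality reaches the (hypothetical) threshold."""
    return [
        (name, seq, qual)
        for name, seq, qual in read_fastq(handle)
        if qual and mean_phred(qual) >= min_mean_quality
    ]


if __name__ == "__main__":
    data = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!!!\n"
    for name, seq, _ in filter_reads(io.StringIO(data)):
        print(name, seq)
```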

When homologous sequences meet structural decoys: Accurate contact prediction by tFold in CASP14-(tFold for CASP14 contact prediction).

Shen, Tao; Wu, Jiaxiang; Lan, Haidong; Zheng, Liangzhen; Pei, Jianguo; Wang, Sheng; Liu, Wei; Huang, Junzhou.

Proteins; 89(12): 1901-1910, 2021 Dec.

Article in English | MEDLINE | ID: mdl-34473376

ABSTRACT

In this paper, we report our tFold framework's performance on the inter-residue contact prediction task in the 14th Critical Assessment of protein Structure Prediction (CASP14). Our tFold framework seamlessly combines both homologous sequences and structural decoys under an ultra-deep network architecture. Squeeze-excitation and axial attention mechanisms are employed to effectively capture inter-residue interactions. In CASP14, our best predictor achieves 41.78% in the averaged top-L precision for long-range contacts for all the 22 free-modeling (FM) targets, and ranked 1st among all the 60 participating teams. The tFold web server is now freely available at: https://drug.ai.tencent.com/console/en/tfold.

Subjects

Neural Networks, Computer; Protein Folding; Proteins; Software; Structural Homology, Protein; Computational Biology; Models, Molecular; Proteins/chemistry; Proteins/metabolism; Reproducibility of Results; Sequence Analysis, Protein
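
The tFold abstract above reports top-L precision for long-range contacts. A minimal sketch of how such a metric is commonly computed is given below. It assumes the usual convention that a pair of residues is long-range when the sequence separation is at least 24, and that L is the sequence length. The function name, the input layout (a square score matrix plus a set of true contact pairs), and the handling of proteins with fewer than L candidate pairs are illustrative assumptions, not details taken from the paper.

```python
# Sketch of top-L precision for long-range contacts (separation >= min_sep).


def top_l_precision(pred, true_contacts, min_sep=24):
    """Fraction of the L highest-scoring long-range pairs that are true contacts.

    pred          -- L x L list of lists with predicted contact scores
    true_contacts -- set of (i, j) tuples with i < j that are real contacts
    """
    length = len(pred)
    candidates = [
        (pred[i][j], i, j)
        for i in range(length)
        for j in range(i + min_sep, length)
    ]
    # Highest predicted score first.
    candidates.sort(key=lambda item: item[0], reverse=True)
    top = candidates[:length]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if (i, j) in true_contacts)
    return hits / len(top)


if __name__ == "__main__":
    # Toy example with a short chain and a reduced separation cutoff.
    scores = [[0.0] * 6 for _ in range(6)]
    ranked = {(0, 2): 0.9, (0, 3): 0.8, (0, 4): 0.7,
              (0, 5): 0.6, (1, 3): 0.5, (1, 4): 0.4}
    for (i, j), value in ranked.items():
        scores[i][j] = value
    print(top_l_precision(scores, {(0, 2), (0, 3), (1, 4)}, min_sep=2))
```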

Biosynthesis of Chuangxinmycin Featuring a Deubiquitinase-like Sulfurtransferase.

Zhang, Xingwang; Xu, Xiaokun; You, Cai; Yang, Chaofan; Guo, Jiawei; Sang, Moli; Geng, Ce; Cheng, Fangyuan; Du, Lei; Shen, Yuemao; Wang, Sheng; Lan, Haidong; Yang, Fan; Li, Yuezhong; Tang, Ya-Jie; Zhang, Youming; Bian, Xiaoying; Li, Shengying; Zhang, Wei.

Angew Chem Int Ed Engl; 60(46): 24418-24423, 2021 Nov 08.

Article in English | MEDLINE | ID: mdl-34498345

ABSTRACT

The knowledge on sulfur incorporation mechanism involved in sulfur-containing molecule biosynthesis remains limited. Chuangxinmycin is a sulfur-containing antibiotic with a unique thiopyrano[4,3,2-cd]indole (TPI) skeleton and selective inhibitory activity against bacterial tryptophanyl-tRNA synthetase. Despite the previously reported biosynthetic gene clusters and the recent functional characterization of a P450 enzyme responsible for C-S bond formation, the enzymatic mechanism for sulfur incorporation remains unknown. Here, we resolve this central biosynthetic problem by in vitro biochemical characterization of the key enzymes and reconstitute the TPI skeleton in a one-pot enzymatic reaction. We reveal that the JAMM/MPN+ protein Cxm3 functions as a deubiquitinase-like sulfurtransferase to catalyze a non-classical sulfur-transfer reaction by interacting with the ubiquitin-like sulfur carrier protein Cxm4GG. This finding adds a new mechanism for sulfurtransferase in nature.

Subjects

Anti-Bacterial Agents/biosynthesis; Bacterial Proteins/metabolism; Sulfurtransferases/metabolism; Actinoplanes/genetics; Actinoplanes/metabolism; Anti-Bacterial Agents/chemistry; Bacterial Proteins/chemistry; Bacterial Proteins/genetics; Escherichia coli/chemistry; Escherichia coli/genetics; Escherichia coli/metabolism; Humans; Indoles/analysis; Indoles/chemistry; Indoles/metabolism; Multigene Family; Pyrococcus/enzymology; Pyrococcus/genetics; Sulfur/metabolism; Sulfurtransferases/chemistry; Sulfurtransferases/genetics; Ubiquitination; Ubiquitins/genetics; Ubiquitins/metabolism

BGSA: a bit-parallel global sequence alignment toolkit for multi-core and many-core architectures.

Zhang, Jikai; Lan, Haidong; Chan, Yuandong; Shang, Yuan; Schmidt, Bertil; Liu, Weiguo.

Bioinformatics; 35(13): 2306-2308, 2019 Jul 01.

Article in English | MEDLINE | ID: mdl-30445566

ABSTRACT

MOTIVATION: Modern bioinformatics tools for analyzing large-scale NGS datasets often need to include fast implementations of core sequence alignment algorithms in order to achieve reasonable execution times. We address this need by presenting the BGSA toolkit for optimized implementations of popular bit-parallel global pairwise alignment algorithms on modern microprocessors. RESULTS: BGSA outperforms Edlib, SeqAn and BitPAl for pairwise edit distance computations and Parasail, SeqAn and BitPAl when using more general scoring schemes for pairwise alignments of a batch of sequence reads on both standard multi-core CPUs and Xeon Phi many-core CPUs. Furthermore, banded edit distance performance of BGSA on a Xeon Phi-7210 outperforms the highly optimized NVBio implementation on a Titan X GPU for the seed verification stage of a read mapper by a factor of 4.4. AVAILABILITY AND IMPLEMENTATION: BGSA is open-source and available at https://github.com/sdu-hpcl/BGSA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms; Software; Sequence Alignment; Sequence Analysis, DNA
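
The BGSA abstract above refers to bit-parallel global pairwise alignment without naming a specific algorithm. As an illustration of the general idea, the sketch below computes a global edit distance with one well-known bit-parallel scheme, Myers' bit-vector algorithm in its global-alignment form. Whether BGSA uses this exact variant is not stated in the abstract, and the sketch relies on Python's arbitrary-precision integers rather than the SIMD registers a tuned implementation would use.

```python
# Bit-parallel global edit distance (Myers-style bit vectors).
# Each bit of pv/mv encodes a +1/-1 vertical difference in one DP column.


def edit_distance(pattern, text):
    """Levenshtein distance between pattern and text."""
    m = len(pattern)
    if m == 0:
        return len(text)
    mask = (1 << m) - 1
    high = 1 << (m - 1)

    # peq[c] has bit i set when pattern[i] == c.
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    pv, mv, score = mask, 0, m  # first column: D[i][0] = i
    for ch in text:
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = (((eq & pv) + pv) ^ pv) | eq
        ph = (mv | ~(xh | pv)) & mask  # horizontal +1 differences
        mh = pv & xh                   # horizontal -1 differences
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # Shift in a 1 so the top row follows D[0][j] = j (global alignment).
        ph = ((ph << 1) | 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
    return score


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

Each loop iteration processes an entire column of the dynamic-programming matrix with a handful of word operations, which is the source of the speedup over cell-by-cell evaluation.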

Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo.

BMC Bioinformatics; 17 Suppl 9: 267, 2016 Jul 19.

Article in English | MEDLINE | ID: mdl-27455061

ABSTRACT

BACKGROUND: Computing alignments between two or more sequences is a common operation in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. RESULTS: This paper presents new approaches to high performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture; i.e. cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. CONCLUSIONS: Evaluations show that our method achieves a peak overall performance of up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi.

Subjects

Computational Biology/methods; Proteins/chemistry; Algorithms; Amino Acid Sequence; Databases, Nucleic Acid; Databases, Protein; Proteins/genetics; Sequence Alignment; Software
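
For readers unfamiliar with the Smith-Waterman algorithm and the GCUPS metric mentioned in the abstract above, the following sketch shows a plain, sequential scoring routine and the throughput calculation. It does not reproduce the paper's three-level parallelization. The scoring parameters (match 2, mismatch -1, linear gap -1) and the function names are assumptions chosen for simplicity, and real protein searches typically use substitution matrices and affine gaps.

```python
# Sequential Smith-Waterman local alignment score with a linear gap penalty.


def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between query and target."""
    prev = [0] * (len(target) + 1)
    best = 0
    for qc in query:
        curr = [0]
        for j, tc in enumerate(target, start=1):
            diag = prev[j - 1] + (match if qc == tc else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            if score > best:
                best = score
        prev = curr
    return best


def gcups(query_len, db_residues, seconds):
    """Giga cell updates per second: DP cells computed / time / 1e9."""
    return query_len * db_residues / seconds / 1e9


if __name__ == "__main__":
    print(smith_waterman_score("GATTACA", "TTAC"))
    print(gcups(1000, 2_000_000, 1.0))
```

Database scanning repeats this computation for one query against every database sequence, which is why the number of cell updates, and therefore GCUPS, is the natural throughput measure.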

Computing Platforms for Big Biological Data Analytics: Perspectives and Challenges.

Yin, Zekun; Lan, Haidong; Tan, Guangming; Lu, Mian; Vasilakos, Athanasios V; Liu, Weiguo.

Comput Struct Biotechnol J; 15: 403-411, 2017.

Article in English | MEDLINE | ID: mdl-28883909

ABSTRACT

The last decade has witnessed an explosion in the amount of available biological sequence data, due to the rapid progress of high-throughput sequencing projects. However, the volume of biological data is becoming so great that traditional data analysis platforms and methods can no longer meet the need to rapidly perform data analysis tasks in life sciences. As a result, both biologists and computer scientists are facing the challenge of gaining a profound insight into the deepest biological functions from big biological data. This in turn requires massive computational resources. Therefore, high performance computing (HPC) platforms are much needed, as are efficient and scalable algorithms that can take advantage of these platforms. In this paper, we survey the state-of-the-art HPC platforms for big biological data analytics. We first list the characteristics of big biological data and popular computing platforms. Then we provide a taxonomy of different biological data analysis applications and a survey of the way they have been mapped onto various computing platforms. After that, we present a case study to compare the efficiency of different computing platforms for handling the classical biological sequence alignment problem. Finally, we discuss the open issues in big biological data analytics.
