Search | BVS Regional Portal

Docking optimization, variance and promiscuity for large-scale drug-like chemical space using high performance computing architectures.

Trager, Richard E; Giblock, Paul; Soltani, Sherwin; Upadhyay, Amit A; Rekapalli, Bhanu; Peterson, Yuri K.

Drug Discov Today ; 21(10): 1672-1680, 2016 Oct.

Article in English | MEDLINE | ID: mdl-27352630

ABSTRACT

There is a continuing need to hasten and improve protein-ligand docking to facilitate the next generation of drug discovery. As the drug-like chemical space reaches into the billions of molecules, increasingly powerful computer systems are required to probe, as well as tackle, the software engineering challenges needed to adapt existing docking programs to use next-generation massively parallel processing systems. We demonstrate docking setup using the wrapper code approach to optimize the DOCK program for large-scale computation as well as docking analysis using variance and promiscuity as examples. Wrappers provide faster docking speeds when compared with the naive multi-threading system MPI-DOCK, making future endeavors in large-scale docking more feasible; in addition, eliminating highly variant or promiscuous compounds will make databases more useful.

Subjects

Drug Discovery, Molecular Docking Simulation, Computing Methodologies, Humans
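The wrapper idea in the abstract above (Trager et al.) can be illustrated with a minimal sketch: the ligand library is split into batches, and each batch is handed to an independent run of an unmodified serial docking program. This is not the authors' code. The names `chunk`, `run_wrapped`, and `fake_dock_batch` are hypothetical, the thread pool is only a stand-in for whatever job launcher a real HPC system would use, and `fake_dock_batch` returns made-up placeholder scores instead of calling DOCK.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def chunk(items: Sequence[str], size: int) -> List[List[str]]:
    """Split the ligand list into consecutive batches of at most `size`."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def run_wrapped(
    ligands: Sequence[str],
    dock_batch: Callable[[List[str]], Dict[str, float]],
    batch_size: int = 2,
    workers: int = 4,
) -> Dict[str, float]:
    """Run `dock_batch` once per batch and merge the per-batch results."""
    batches = chunk(ligands, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input batches.
        partial_results = list(pool.map(dock_batch, batches))
    merged: Dict[str, float] = {}
    for partial in partial_results:
        merged.update(partial)
    return merged


def fake_dock_batch(batch: List[str]) -> Dict[str, float]:
    """Hypothetical placeholder for one serial docking run on a batch.

    A real wrapper would launch the docking executable here; the score
    below is an arbitrary function of the ligand name, not a docking score.
    """
    return {name: -float(len(name)) for name in batch}


if __name__ == "__main__":
    print(run_wrapped(["lig1", "lig22", "lig333"], fake_dock_batch))
```

Because the batches do not depend on each other, the docking program itself needs no changes; only the wrapper decides how work is divided and collected.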

Optimizing high performance computing workflow for protein functional annotation.

Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene.

Concurr Comput ; 26(13): 2112-2121, 2014 Sep 10.

Article in English | MEDLINE | ID: mdl-25313296

ABSTRACT

Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool, the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data.
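The "at least 80% specificity and sensitivity" criterion in the abstract above can be made concrete with the standard definitions of the two measures. The sketch below is only an illustration of those definitions; the function names and the example counts are hypothetical and are not taken from the paper.

```python
from typing import Tuple


def sensitivity_specificity(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): fraction of true members that are found.
    specificity = TN / (TN + FP): fraction of non-members correctly rejected.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)


def meets_threshold(tp: int, fp: int, tn: int, fn: int, threshold: float = 0.8) -> bool:
    """Check that both measures reach the threshold (0.8 mirrors the 80% figure)."""
    sens, spec = sensitivity_specificity(tp, fp, tn, fn)
    return sens >= threshold and spec >= threshold


if __name__ == "__main__":
    # Made-up counts, used only to exercise the functions.
    print(sensitivity_specificity(80, 10, 90, 20))
    print(meets_threshold(80, 10, 90, 20))
```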

PoPLAR: Portal for Petascale Lifescience Applications and Research.

Rekapalli, Bhanu; Giblock, Paul; Reardon, Christopher.

BMC Bioinformatics ; 14 Suppl 9: S3, 2013.

Article in English | MEDLINE | ID: mdl-23902523

ABSTRACT

BACKGROUND: We are focusing specifically on fast data analysis and retrieval in bioinformatics that will have a direct impact on the quality of human health and the environment. The exponential growth of data generated in biology research, from small atoms to big ecosystems, necessitates an increasingly large computational component to perform analyses. Novel DNA sequencing technologies and complementary high-throughput approaches, such as proteomics, genomics, metabolomics, and meta-genomics, drive data-intensive bioinformatics. While individual research centers or universities could once provide for these applications, this is no longer the case. Today, only specialized national centers can deliver the level of computing resources required to meet the challenges posed by rapid data growth and the resulting computational demand. Consequently, we are developing massively parallel applications to analyze the growing flood of biological data and contribute to the rapid discovery of novel knowledge.

METHODS: The efforts of previous National Science Foundation (NSF) projects provided for the generation of parallel modules for widely used bioinformatics applications on the Kraken supercomputer. We have profiled and optimized the code of some of the scientific community's most widely used desktop and small-cluster-based applications, including BLAST from the National Center for Biotechnology Information (NCBI), HMMER, and MUSCLE; scaled them to tens of thousands of cores on high-performance computing (HPC) architectures; made them robust and portable to next-generation architectures; and incorporated these parallel applications in science gateways with a web-based portal.

RESULTS: This paper will discuss the various developmental stages, challenges, and solutions involved in taking bioinformatics applications from the desktop to petascale with a front-end portal for very-large-scale data analysis in the life sciences.

CONCLUSIONS: This research will help to bridge the gap between the rate of data generation and the speed at which scientists can study this data. The ability to rapidly analyze data at such a large scale is having a significant, direct impact on science achieved by collaborators who are currently using these tools on supercomputers.

Subjects

Computational Biology/methods, Computing Methodologies, Software, Genetic Databases, Electronic Data Processing, Information Storage and Retrieval/methods, Internet, User-Computer Interface
