Search | BVS IEC

Turbocharging protein binding site prediction with geometric attention, inter-resolution transfer learning, and homology-based augmentation.

Lee, Daeseok; Hwang, Wonjun; Byun, Jeunghyun; Shin, Bonggun.

BMC Bioinformatics ; 25(1): 306, 2024 Sep 20.

Article in English | MEDLINE | ID: mdl-39304807

ABSTRACT

BACKGROUND: Locating small-molecule binding sites in target proteins, at either pocket or residue resolution, is critical in many drug-discovery scenarios. Since it is not always easy to find such binding sites using conventional methods, different deep learning methods to predict binding sites from protein structures have been developed in recent years. The existing deep learning-based methods have several limitations, including (1) the inefficiency of the CNN-only architecture, (2) loss of information due to excessive post-processing, and (3) the under-utilization of available data sources. METHODS: We present a new model architecture and training method that resolves the aforementioned problems. First, by layering geometric self-attention units on top of residue-level 3D CNN outputs, our model overcomes the problems of CNN-only architectures. Second, by configuring the fundamental units of computation as residues and pockets instead of voxels, our method reduces the information loss from post-processing. Lastly, by employing inter-resolution transfer learning and homology-based augmentation, our method makes fuller use of available data sources. RESULTS: The proposed method significantly outperformed all state-of-the-art baselines at both resolutions, pocket and residue. An ablation study demonstrated the indispensability of our proposed architecture, as well as transfer learning and homology-based augmentation, for achieving optimal performance. We further scrutinized our model's performance through a case study involving human serum albumin, which demonstrated our model's superior capability in identifying multiple binding sites of the protein, outperforming the existing methods. CONCLUSIONS: We believe that our contribution to the literature is twofold. Firstly, we introduce a novel computational method for binding site prediction with practical applications, substantiated by its strong performance across diverse benchmarks and case studies. Secondly, the innovative aspects of our method, specifically the design of the model architecture, inter-resolution transfer learning, and homology-based augmentation, would serve as useful components for future work.
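The abstract does not spell out how the geometric self-attention units are defined. As a rough illustration only, the sketch below shows one common way to make self-attention over residue-level features geometry-aware: subtracting a pairwise-distance penalty from the attention logits. The function name, the `distance_scale` parameter, and the distance-bias form are all assumptions for this example, not the authors' formulation.

```python
import numpy as np


def geometric_self_attention(features, coords, w_q, w_k, w_v, distance_scale=10.0):
    """Single-head self-attention over residues with a distance bias (illustrative).

    features: (n_residues, d) array, e.g. per-residue outputs of a 3D CNN.
    coords:   (n_residues, 3) array of residue positions.
    w_q, w_k, w_v: (d, d_head) projection matrices.
    distance_scale: hypothetical length scale controlling how fast attention
        decays with distance (an assumption of this sketch).
    """
    q = features @ w_q
    k = features @ w_k
    v = features @ w_v

    # Standard scaled dot-product logits.
    logits = (q @ k.T) / np.sqrt(q.shape[-1])

    # Geometric term: penalise attention between residues that are far apart.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    logits = logits - dist / distance_scale

    # Numerically stable row-wise softmax.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    return weights @ v, weights
```

Because the geometric term depends only on pairwise distances, the attention weights in this sketch are unchanged when the whole structure is translated or rotated.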

Subjects

Proteins ; Binding Sites ; Proteins/chemistry ; Proteins/metabolism ; Deep Learning ; Computational Biology/methods ; Protein Binding ; Humans ; Databases, Protein

Controlled Molecule Generator for Optimizing Multiple Chemical Properties.

Shin, Bonggun; Park, Sungsoo; Bak, JinYeong; Ho, Joyce C.

ACM CHIL 2021 (2021) ; 2021: 146-153, 2021 Apr.

Article in English | MEDLINE | ID: mdl-35194593

ABSTRACT

Generating a novel and optimized molecule with desired chemical properties is an essential part of the drug discovery process. Failure to meet one of the required properties can frequently lead to failure in clinical testing, which is costly. In addition, optimizing these multiple properties is a challenging task because the optimization of one property is prone to changing other properties. In this paper, we pose this multi-property optimization problem as a sequence translation process and propose a new optimized molecule generator model based on the Transformer with two constraint networks: property prediction and similarity prediction. We further improve the model by incorporating score predictions from these constraint networks in a modified beam search algorithm. The experiments demonstrate that our proposed model, Controlled Molecule Generator (CMG), outperforms state-of-the-art models by a significant margin for optimizing multiple properties simultaneously.
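The abstract describes folding constraint-network scores into beam search but gives no details. The following is a minimal sketch of that general idea, assuming candidates are ranked by cumulative log-probability plus a weighted constraint score. The callables `next_log_probs` and `constraint_score`, the `weight` parameter, and the additive combination are hypothetical stand-ins, not CMG's actual algorithm.

```python
from typing import Callable, Dict, List, Tuple

Sequence = Tuple[str, ...]


def constrained_beam_search(
    next_log_probs: Callable[[Sequence], Dict[str, float]],
    constraint_score: Callable[[Sequence], float],
    beam_width: int = 2,
    max_len: int = 3,
    weight: float = 1.0,
    end_token: str = "<eos>",
) -> List[Tuple[Sequence, float]]:
    """Beam search whose ranking adds a constraint score to the log-probability.

    Returns the surviving beams as (sequence, cumulative log-probability),
    best-ranked first. The constraint score only affects ranking.
    """
    beams: List[Tuple[Sequence, float]] = [((), 0.0)]
    for _ in range(max_len):
        candidates: List[Tuple[Sequence, float]] = []
        for seq, logp in beams:
            # Finished sequences are carried over unchanged.
            if seq and seq[-1] == end_token:
                candidates.append((seq, logp))
                continue
            for token, token_logp in next_log_probs(seq).items():
                candidates.append((seq + (token,), logp + token_logp))
        # Rank by generator likelihood plus the weighted constraint score.
        candidates.sort(
            key=lambda c: c[1] + weight * constraint_score(c[0]),
            reverse=True,
        )
        beams = candidates[:beam_width]
    return beams
```

With `weight=0.0` this reduces to ordinary beam search; increasing `weight` lets the constraint scores steer which partial sequences survive pruning.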

SMAT: An attention-based deep learning solution to the automation of schema matching.

Zhang, Jing; Shin, Bonggun; Choi, Jinho D; Ho, Joyce C.

Adv Databases Inf Syst ; 12843: 260-274, 2021 Aug.

Article in English | MEDLINE | ID: mdl-34608464

ABSTRACT

Schema matching aims to identify the correspondences among attributes of database schemas. It is frequently considered the most challenging and decisive stage in many contemporary web semantics and database systems. Low-quality algorithmic matchers fail to provide improvement, while manual annotation consumes extensive human effort. Further complications arise from data privacy in certain domains such as healthcare, where only schema-level matching should be used to prevent data leakage. For this problem, we propose SMAT, a new deep learning model based on state-of-the-art natural language processing techniques to obtain semantic mappings between source and target schemas using only the attribute name and description. SMAT avoids directly encoding domain knowledge about the source and target systems, which allows it to be more easily deployed across different sites. We also introduce a new benchmark dataset, OMAP, based on real-world schema-level mappings from the healthcare domain. Our extensive evaluation on various benchmark datasets demonstrates the potential of SMAT to help automate schema-level matching tasks.

Predicting commercially available antiviral drugs that may act on the novel coronavirus (SARS-CoV-2) through a drug-target interaction deep learning model.

Beck, Bo Ram; Shin, Bonggun; Choi, Yoonjung; Park, Sungsoo; Kang, Keunsoo.

Comput Struct Biotechnol J ; 18: 784-790, 2020.

Article in English | MEDLINE | ID: mdl-32280433

ABSTRACT

Infection with a novel coronavirus found in Wuhan, China (SARS-CoV-2) is rapidly spreading, and the incidence rate is increasing worldwide. Due to the lack of effective treatment options for SARS-CoV-2, various strategies are being tested in China, including drug repurposing. In this study, we used our pre-trained deep learning-based drug-target interaction model called Molecule Transformer-Drug Target Interaction (MT-DTI) to identify commercially available drugs that could act on viral proteins of SARS-CoV-2. The result showed that atazanavir, an antiretroviral medication used to treat and prevent the human immunodeficiency virus (HIV), is the best chemical compound, showing an inhibitory potency with Kd of 94.94 nM against the SARS-CoV-2 3C-like proteinase, followed by remdesivir (113.13 nM), efavirenz (199.17 nM), ritonavir (204.05 nM), and dolutegravir (336.91 nM). Interestingly, lopinavir, ritonavir, and darunavir are all designed to target viral proteinases. However, in our prediction, they may also bind to the replication complex components of SARS-CoV-2 with an inhibitory potency with Kd < 1000 nM. In addition, we also found that several antiviral agents, such as Kaletra (lopinavir/ritonavir), could be used for the treatment of SARS-CoV-2. Overall, we suggest that the list of antiviral drugs identified by the MT-DTI model should be considered when establishing effective treatment strategies for SARS-CoV-2.
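To make the ranking arithmetic concrete, the short sketch below sorts the five predicted Kd values quoted in the abstract for the 3C-like proteinase and applies a cutoff such as the Kd < 1000 nM threshold mentioned there. The helper names are hypothetical, and the pKd conversion (pKd = 9 - log10 of Kd in nM) is a standard unit conversion added for illustration, not something the abstract reports.

```python
import math

# Predicted Kd values (nM) against the SARS-CoV-2 3C-like proteinase,
# as quoted in the abstract above.
predicted_kd_nm = {
    "atazanavir": 94.94,
    "remdesivir": 113.13,
    "efavirenz": 199.17,
    "ritonavir": 204.05,
    "dolutegravir": 336.91,
}


def rank_by_kd(kd_nm, threshold_nm=1000.0):
    """Return (drug, Kd) pairs below the threshold, strongest binder first."""
    hits = [(name, kd) for name, kd in kd_nm.items() if kd < threshold_nm]
    return sorted(hits, key=lambda item: item[1])


def pkd_from_nm(kd_nm):
    """Convert a Kd in nanomolar to pKd = -log10(Kd in molar)."""
    return 9.0 - math.log10(kd_nm)


for name, kd in rank_by_kd(predicted_kd_nm):
    print(f"{name}: Kd = {kd:.2f} nM, pKd = {pkd_from_nm(kd):.2f}")
```

A lower Kd means tighter predicted binding, so the list is sorted in ascending order of Kd.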

Target-Centered Drug Repurposing Predictions of Human Angiotensin-Converting Enzyme 2 (ACE2) and Transmembrane Protease Serine Subtype 2 (TMPRSS2) Interacting Approved Drugs for Coronavirus Disease 2019 (COVID-19) Treatment through a Drug-Target Interaction Deep Learning Model.

Choi, Yoonjung; Shin, Bonggun; Kang, Keunsoo; Park, Sungsoo; Beck, Bo Ram.

Viruses ; 12(11), 2020 Nov 18.

Article in English | MEDLINE | ID: mdl-33218024

ABSTRACT

Previously, our group predicted commercially available Food and Drug Administration (FDA)-approved drugs that can inhibit each step of the replication of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) using a deep learning-based drug-target interaction model called Molecule Transformer-Drug Target Interaction (MT-DTI). Unfortunately, additional clinically significant treatment options since the approval of remdesivir are scarce. To overcome the current coronavirus disease 2019 (COVID-19) more efficiently, a treatment strategy that controls not only SARS-CoV-2 replication but also the host entry step should be considered. In this study, we used MT-DTI to predict FDA-approved drugs that may have strong affinities for the angiotensin-converting enzyme 2 (ACE2) receptor and the transmembrane protease serine 2 (TMPRSS2), which are essential for viral entry into the host cell. Of the 460 drugs with Kd of less than 100 nM for the ACE2 receptor, 17 overlapped with drugs that inhibit the interaction of ACE2 and the SARS-CoV-2 spike reported in the NCATS OpenData portal. Among them, enalaprilat, an ACE inhibitor, showed a Kd value of 1.5 nM against ACE2. Furthermore, three of the top 30 drugs with strong affinity predictions for TMPRSS2 are anti-hepatitis C virus (HCV) drugs: ombitasvir, daclatasvir, and paritaprevir. Notably, of the top 30 drugs, the AT1R blocker eprosartan and the neuropsychiatric drug lisuride showed similar gene expression profiles to potential TMPRSS2 inhibitors. Collectively, we suggest that drugs predicted to have strong inhibitory potencies against ACE2 and TMPRSS2 through the DTI model should be considered as potential drug repurposing candidates for COVID-19.

Subjects

Angiotensin-Converting Enzyme 2/antagonists & inhibitors ; COVID-19 Drug Treatment ; Deep Learning ; Drug Repositioning/methods ; Serine Endopeptidases/metabolism ; Angiotensin-Converting Enzyme 2/metabolism ; Drug Development ; Hepacivirus/drug effects ; Humans ; SARS-CoV-2/drug effects ; Virus Internalization/drug effects ; Virus Replication/drug effects

Wx: a neural network-based feature selection algorithm for transcriptomic data.

Park, Sungsoo; Shin, Bonggun; Sang Shim, Won; Choi, Yoonjung; Kang, Kilsoo; Kang, Keunsoo.

Sci Rep ; 9(1): 10500, 2019 Jul 19.

Article in English | MEDLINE | ID: mdl-31324856

ABSTRACT

Next-generation sequencing (NGS), which allows the simultaneous sequencing of billions of DNA fragments, has revolutionized how we study genomics and molecular biology by generating genome-wide molecular maps of molecules of interest. However, the amount of information produced by NGS has made it difficult for researchers to choose the optimal set of genes. We have sought to resolve this issue by developing a neural network-based feature (gene) selection algorithm called Wx. The Wx algorithm ranks genes based on the discriminative index (DI) score that represents the classification power for distinguishing given groups. With a gene list ranked by DI score, researchers can intuitively select the optimal set of genes from the highest-ranking ones. We applied the Wx algorithm to a TCGA pan-cancer gene-expression cohort to identify an optimal set of gene-expression biomarker candidates that can distinguish cancer samples from normal samples for 12 different types of cancer. The 14 gene-expression biomarker candidates identified by Wx were comparable to or outperformed previously reported universal gene expression biomarkers, highlighting the usefulness of the Wx algorithm for next-generation sequencing data. Thus, we anticipate that the Wx algorithm can complement current state-of-the-art analytical applications for the identification of biomarker candidates as an alternative method. The stand-alone and web versions of the Wx algorithm are available at https://github.com/deargen/DearWXpub and https://wx.deargendev.me/ , respectively.

Subjects

Algorithms ; Genes, Neoplasm ; Neural Networks, Computer ; Transcriptome ; Biomarkers, Tumor ; Datasets as Topic ; Gene Expression Profiling ; High-Throughput Nucleotide Sequencing ; Humans ; Neoplasms/genetics ; RNA, Neoplasm/genetics

Cascaded Wx: A Novel Prognosis-Related Feature Selection Framework in Human Lung Adenocarcinoma Transcriptomes.

Shin, Bonggun; Park, Sungsoo; Hong, Ji Hyung; An, Ho Jung; Chun, Sang Hoon; Kang, Kilsoo; Ahn, Young-Ho; Ko, Yoon Ho; Kang, Keunsoo.

Front Genet ; 10: 662, 2019.

Article in English | MEDLINE | ID: mdl-31379926

ABSTRACT

Artificial neural network-based analysis has recently been used to predict clinical outcomes in patients with solid cancers, including lung cancer. However, the majority of algorithms were not originally developed to identify genes associated with patients' prognoses. To address this issue, we developed a novel prognosis-related feature selection framework called Cascaded Wx (CWx). The CWx framework ranks features according to the survival of a given cohort by training neural networks with three different high- and low-risk groups in a cascaded fashion. We showed that this approach accurately identified features that best identify the patients' prognoses, compared to other feature selection algorithms, including the Cox proportional hazards and Coxnet models, when applied to The Cancer Genome Atlas lung adenocarcinoma (LUAD) transcriptome data. The prognostic potential of the top 100 genes identified by CWx outperformed or was comparable to those identified by the other methods as assessed by the concordance index (c-index). In addition, the top 100 genes identified by CWx were found to be associated with the Wnt signaling pathway, providing biologically relevant evidence for the value of these genes in predicting the prognosis of patients with LUAD. Further analyses of other cancer types showed that the genes identified by CWx had the highest prognostic values according to the c-index. Collectively, the CWx framework will potentially be of great use to prognosis-related biomarker discoveries in a variety of diseases.
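The concordance index used for the comparison above measures how often a model assigns the higher risk to the patient who experiences the event earlier. The sketch below is a minimal pure-Python version of Harrell's c-index for right-censored data. The function and argument names are chosen for illustration, and the abstract does not state which implementation the authors used.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times:       observed follow-up time per patient.
    events:      1 (or True) if the event was observed, 0 if censored.
    risk_scores: predicted risk; higher means shorter expected survival.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only if patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs; c-index is undefined")
    return concordant / comparable
```

A value of 1.0 means the risk ordering matches the observed event order for every comparable pair, 0.5 corresponds to random ranking, and 0.0 to a fully reversed ordering.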
