1.
Brief Bioinform ; 23(5), 2022 09 20.
Article in English | MEDLINE | ID: mdl-35945035

ABSTRACT

Neural network (NN)-based protein modeling methods have improved significantly in recent years. Although the overall accuracy of the two non-homology-based modeling methods, AlphaFold and RoseTTAFold, is outstanding, their performance for specific protein families has remained unexamined. G-protein-coupled receptor (GPCR) proteins are particularly interesting since they are involved in numerous pathways. This work directly compares the performance of these novel deep learning-based protein modeling methods for GPCRs with that of the most widely used template-based software, Modeller. We collected the experimentally determined structures of 73 GPCRs from the Protein Data Bank. The official AlphaFold repository and RoseTTAFold web service were used with default settings to predict five structures of each protein sequence. The predicted models were then aligned with the experimentally solved structures and evaluated by the root-mean-square deviation (RMSD) metric. Considering only each program's top-scored structure, Modeller had the smallest average modeling RMSD of 2.17 Å, which is better than AlphaFold's 5.53 Å and RoseTTAFold's 6.28 Å, probably because Modeller already included many known structures as templates. However, the NN-based methods (AlphaFold and RoseTTAFold) outperformed Modeller in 21 and 15 out of the 73 cases with the top-scored model, respectively, where no good templates were available for Modeller. The larger RMSD values generated by the NN-based methods were primarily due to the differences in loop prediction compared to the crystal structures.
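
The RMSD metric used in this abstract can be illustrated with a minimal sketch. It assumes the predicted and experimental coordinates have already been superimposed and matched atom by atom; the superposition step itself is not shown, and the coordinates below are hypothetical values, not data from the study.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical, already-superimposed C-alpha coordinates in angstroms.
experimental = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.0), (1.5, 2.0, 0.0), (3.0, 0.0, 0.0)]

print(round(rmsd(experimental, predicted), 3))  # 1.155
```

A single displaced atom dominates the value here, which is consistent with the abstract's remark that loop differences can inflate RMSD even when the rest of the structure agrees.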


Subjects
Receptors, G-Protein-Coupled; Software; Databases, Protein; Models, Molecular; Protein Conformation; Receptors, G-Protein-Coupled/chemistry
2.
Brief Bioinform ; 23(1), 2022 01 17.
Article in English | MEDLINE | ID: mdl-34530437

ABSTRACT

The trade-off between the predictive power of machine learning (ML) and deep learning (DL) models and their interpretability has been a rising concern in central nervous system-related quantitative structure-activity relationship (CNS-QSAR) analysis. Many state-of-the-art predictive models fail to provide structural insights because of their black-box nature. This lack of interpretability, and the resulting inability to provide simple rules, remains a challenge for CNS-QSAR models. To address these issues, we developed a protocol that combines the power of ML and DL to generate a set of simple rules that are easy to interpret and have high predictive power. A data set of 940 marketed drugs (315 CNS-active, 625 CNS-inactive) was modeled with support vector machine and graph convolutional network algorithms. Individual ML/DL models were also constructed for comparison. The performance of these models was evaluated using an additional external data set of 117 marketed drugs (42 CNS-active, 75 CNS-inactive). Fingerprint-split validation was adopted to ensure model stringency and generalizability. The resulting novel hybrid ensemble model outperformed its constituent traditional QSAR models, with an accuracy of 0.96 and an F1 score of 0.95. Using the interpretability provided by this protocol, our model laid down a set of simple physicochemical rules, based on six sub-structural features, to determine whether a compound can be a CNS drug. These rules displayed higher classification ability than classical guidelines, with higher specificity and more mechanistic insight than rules for blood-brain barrier permeability alone. This hybrid protocol can potentially be used for other drug property predictions.


Subjects
Deep Learning; Blood-Brain Barrier; Machine Learning; Permeability; Support Vector Machine
3.
Brief Bioinform ; 22(3), 2021 05 20.
Article in English | MEDLINE | ID: mdl-32770190

ABSTRACT

In drug development, preclinical safety and pharmacokinetics assessments of candidate drugs to ensure the safety profile are a must. While in vivo and in vitro tests are traditionally used, experimental determinations have disadvantages, as they are usually time-consuming and costly. In silico predictions of these preclinical endpoints have each been developed in the past decades. However, only a few web-based tools have integrated different models to provide a simple one-step platform to help researchers thoroughly evaluate potential drug candidates. To efficiently achieve this approach, a platform for preclinical evaluation must not only predict key ADMET (absorption, distribution, metabolism, excretion and toxicity) properties but also provide some guidance on structural modifications to improve the undesired properties. In this review, we organized and compared several existing integrated web servers that can be adopted in preclinical drug development projects to evaluate the subject of interest. We also introduced our new web server, Virtual Rat, as an alternative choice to profile the properties of drug candidates. In Virtual Rat, we provide not only predictions of important ADMET properties but also possible reasons as to why the model made those structural predictions. Multiple models were implemented into Virtual Rat, including models for predicting human ether-a-go-go-related gene (hERG) inhibition, cytochrome P450 (CYP) inhibition, mutagenicity (Ames test), blood-brain barrier penetration, cytotoxicity and Caco-2 permeability. Virtual Rat is free and has been made publicly available at https://virtualrat.cmdm.tw/.


Subjects
Drug Development; Drug-Related Side Effects and Adverse Reactions; Models, Biological; Pharmacokinetics; Software; Animals; Caco-2 Cells; Drug Evaluation, Preclinical; Humans; Rats
4.
Bioinformatics ; 37(8): 1184-1186, 2021 05 23.
Article in English | MEDLINE | ID: mdl-32915954

ABSTRACT

SUMMARY: Drug discovery targeting G protein-coupled receptors (GPCRs), the largest known class of therapeutic targets, is challenging. To facilitate the rapid discovery and development of GPCR drugs, we built a system, PanGPCR, to predict multiple potential GPCR targets and their expression locations in tissues, as well as the side effects and possible repurposing of GPCR drugs. With PanGPCR, the compound of interest is docked to a library of 36 experimentally determined crystal structures comprising 46 docking sites for human GPCRs, and a ranked list is generated from the docking studies to assess all GPCRs and their binding affinities. Users can determine a given compound's GPCR targets and its repurposing potential accordingly. Moreover, potential side effects collected from the SIDER (Side-Effect Resource) database and mapped to 45 tissues and organs are provided by linking predicted off-targets and their expressed sequence tag profiles. With PanGPCR, multiple targets, repurposing potential and side effects can be determined by simply uploading a small ligand. AVAILABILITY AND IMPLEMENTATION: PanGPCR is freely accessible at https://gpcrpanel.cmdm.tw/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Drug Repositioning; Receptors, G-Protein-Coupled; Drug Discovery; Humans; Ligands; Receptors, G-Protein-Coupled/genetics
5.
J Formos Med Assoc ; 121(12): 2649-2652, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36031487

ABSTRACT

New psychoactive substances (NPS) have increasingly been illegally synthesized and used around the world in recent years. Due to the large volume and the variety of NPS, most do not have sufficient information about their addictive potential and harmful effects to human subjects. This makes it difficult to evaluate these potential substances of abuse. This study aims to build a database based on Taiwan's controlled substances, to provide quick structural and pharmacological feedback. Taiwan Controlled Substances Database (TCSD) includes the collection of controlled substances, relevant experimental and structural information, as well as computational features such as molecular fingerprints and descriptors. Two types of structural search were added: substructure search and topological fingerprint similarity search. A web framework was used to enhance accessibility and usability (https://cs2search.cmdm.tw).
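
A fingerprint similarity search of the kind mentioned in this abstract can be sketched as follows. The abstract does not say which similarity measure TCSD uses; the Tanimoto coefficient here is an assumption (a common choice for binary fingerprints), and the compound names and bit sets are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 1.0  # convention chosen here: two empty fingerprints are identical
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similarity_search(query, library, threshold=0.5):
    """Return (name, score) pairs at or above the threshold, best match first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints: each set holds the indices of bits that are on.
library = {
    "compound_A": {1, 4, 7, 9},
    "compound_B": {1, 4, 8},
    "compound_C": {2, 3},
}
query = {1, 4, 7}

print(similarity_search(query, library))
# [('compound_A', 0.75), ('compound_B', 0.5)]
```

In practice the bit sets would come from a cheminformatics toolkit; plain Python sets are used here only to keep the sketch self-contained.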


Subjects
Controlled Substances; Humans; Taiwan; Databases, Factual
6.
J Chem Inf Model ; 57(12): 3138-3148, 2017 12 26.
Article in English | MEDLINE | ID: mdl-29131618

ABSTRACT

Identification of the individual chemical constituents of a mixture, especially solutions extracted from medicinal plants, is a time-consuming task. The identification results are often limited by challenges such as the development of separation methods and the availability of known reference standards. A novel structure elucidation system, NP-StructurePredictor, is presented and used to accelerate the process of identifying chemical structures in a mixture based on a branch and bound algorithm combined with a large collection of natural product databases. NP-StructurePredictor requires only targeted molecular weights calculated from a list of m/z values from liquid chromatography-mass spectrometry (LC-MS) experiments as input information to predict the chemical structures of individual components matching the weights in a mixture. NP-StructurePredictor also provides the predicted structures with statistically calculated probabilities so that the most likely chemical structures of the natural products and their analogs can be proposed accordingly. Four data sets consisting of different Chinese herbs with mixtures containing known compounds were selected for validation studies, and all their components were correctly identified and highly predicted using NP-StructurePredictor. NP-StructurePredictor demonstrated its applicability for predicting the chemical structures of novel compounds by returning highly accurate results from four different validation case studies.
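
The general idea of matching a target molecular weight with a branch-and-bound style search can be sketched as below. This is not the NP-StructurePredictor algorithm, whose details the abstract does not give; the fragment names, masses and tolerance are hypothetical, and the sketch only shows how branches that would overshoot the target weight are pruned.

```python
def match_weight(target, fragment_masses, tolerance=0.01):
    """Return every multiset of fragments whose masses sum to target within tolerance."""
    # Try heavier fragments first so overshooting branches are cut early.
    names = sorted(fragment_masses, key=fragment_masses.get, reverse=True)
    solutions = []

    def branch(start, remaining, chosen):
        if abs(remaining) <= tolerance:
            solutions.append(tuple(chosen))
            return
        for i in range(start, len(names)):
            mass = fragment_masses[names[i]]
            # Bound: this fragment would exceed the remaining weight, so prune.
            if mass > remaining + tolerance:
                continue
            chosen.append(names[i])
            branch(i, remaining - mass, chosen)  # the same fragment may repeat
            chosen.pop()

    branch(0, target, [])
    return solutions


# Hypothetical building-block masses (arbitrary units, all positive).
fragments = {"A": 100.0, "B": 50.0, "C": 30.0}

for combo in match_weight(180.0, fragments):
    print(combo)
```

The search assumes all masses are positive and larger than the tolerance, which guarantees that every branch either reaches the target or is pruned.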


Subjects
Biological Products/chemistry; Plant Extracts/chemistry; Plants, Medicinal/chemistry; Chromatography, Liquid; Databases, Factual; Mass Spectrometry; Models, Chemical; Software
7.
Bioinformatics ; 31(11): 1869-71, 2015 Jun 01.
Article in English | MEDLINE | ID: mdl-25617412

ABSTRACT

Cytochrome P450 enzymes (CYPs) are the major enzymes involved in drug metabolism and bioactivation. Inhibition models were constructed for five of the most prominent enzymes from the CYP superfamily in human liver. The five enzymes chosen for this study, namely CYP1A2, CYP2D6, CYP2C19, CYP2C9 and CYP3A4, account for 90% of the xenobiotic and drug metabolism in the human body. CYP enzymes can be inhibited or induced by various drugs or chemical compounds. In this work, a rule-based CYP inhibition prediction online server, CypRules, was created based on predictive models generated by the rule-based C5.0 algorithm. CypRules can predict and provide structural rulesets for CYP inhibition for each compound uploaded to the server. Because it executes quickly, it can be used for virtual high-throughput screening (VHTS) of a large set of test compounds. AVAILABILITY AND IMPLEMENTATION: CypRules is freely accessible at http://cyprules.cmdm.tw/ and models, descriptor and program files for all compounds are publicly available at http://cyprules.cmdm.tw/sources/sources.rar.


Subjects
Cytochrome P-450 Enzyme Inhibitors/pharmacology; Software; Algorithms; Cytochrome P-450 Enzyme System/metabolism; High-Throughput Screening Assays; Humans; Liver/enzymology
8.
Toxicol Appl Pharmacol ; 288(1): 52-62, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26200234

ABSTRACT

Carbon nanotubes have become widely used in a variety of applications including biosensors and drug carriers. Therefore, the issue of carbon nanotube toxicity is increasingly an area of focus and concern. While previous studies have focused on the gross mechanisms of action relating to nanomaterials interacting with biological entities, this study proposes detailed mechanisms of action, relating to nanotoxicity, for a series of decorated (functionalized) carbon nanotube complexes based on previously reported QSAR models. Possible mechanisms of nanotoxicity for six endpoints (bovine serum albumin, carbonic anhydrase, chymotrypsin, hemoglobin along with cell viability and nitrogen oxide production) have been extracted from the corresponding optimized QSAR models. The molecular features relevant to each endpoint's respective mechanism of action for the decorated nanotubes are also discussed. Based on the molecular information contained within the optimal QSAR models for each nanotoxicity endpoint, one of two situations holds: either the decorator attached to the nanotube is directly responsible for the expression of a particular activity, irrespective of the decorator's 3D geometry and independent of the nanotube, or the decorators whose structures place their functional groups as far as possible from the nanotube surface most strongly influence the biological activity. These molecular descriptors are further used to hypothesize specific interactions involved in the expression of each of the six biological endpoints.


Subjects
Nanotubes, Carbon/toxicity; Carbonic Anhydrases/metabolism; Cell Survival/drug effects; Chymotrypsin/metabolism; Hemoglobins/metabolism; Macrophages/drug effects; Macrophages/metabolism; Macrophages/pathology; Molecular Structure; Nanotubes, Carbon/chemistry; Nitric Oxide/metabolism; Protein Binding; Quantitative Structure-Activity Relationship; Risk Assessment; Serum Albumin, Bovine/metabolism; Surface Properties
9.
J Chem Inf Model ; 55(2): 434-45, 2015 Feb 23.
Article in English | MEDLINE | ID: mdl-25625768

ABSTRACT

Fluorescence-based detection has been commonly used in high-throughput screening (HTS) assays. Autofluorescent compounds, which can emit light in the absence of artificial fluorescent markers, often interfere with the detection of fluorophores and result in false positive signals in these assays. This interference presents a major issue in fluorescence-based screening techniques. In an effort to reduce the time and cost that will be spent on prescreening of autofluorescent compounds, in silico autofluorescence prediction models were developed for selected fluorescence-based assays in this study. Five prediction models were developed based on the respective fluorophores used in these HTS assays, which absorb and emit light at specific wavelengths (excitation/emission): Alexa Fluor 350 (A350) (340 nm/450 nm), 7-amino-4-trifluoromethyl-coumarin (AFC) (405 nm/520 nm), Alexa Fluor 488 (A488) (480 nm/540 nm), Rhodamine (547 nm/598 nm), and Texas Red (547 nm/618 nm). The C5.0 rule-based classification algorithm and PubChem 2D chemical structure fingerprints were used to develop prediction models. To optimize the accuracies of these prediction models despite the highly imbalanced ratio of fluorescent versus nonfluorescent compounds presented in the collected data sets, oversampling and undersampling strategies were applied. The average final accuracy achieved for the training set was 97%, and that for the testing set was 92%. In addition, five external data sets were used to further validate the models. Ultimately, 14 representative structural features (or rules) were determined to efficiently predict autofluorescence in data sets containing both fluorescent and nonfluorescent compounds. Several cases were illustrated in this study to demonstrate the applicability of these rules.
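
The oversampling and undersampling strategies mentioned in this abstract can be illustrated with a minimal sketch of random resampling. The abstract does not specify the exact procedures used, so this shows only the generic idea; the class sizes and sample labels are hypothetical.

```python
import random


def oversample(minority, target_size, rng):
    """Randomly duplicate minority-class samples until target_size is reached."""
    extra = [rng.choice(minority) for _ in range(max(0, target_size - len(minority)))]
    return minority + extra


def undersample(majority, target_size, rng):
    """Randomly keep target_size majority-class samples, without replacement."""
    return rng.sample(majority, min(target_size, len(majority)))


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical imbalanced data set: 5 fluorescent vs 95 nonfluorescent compounds.
fluorescent = [f"F{i}" for i in range(5)]
nonfluorescent = [f"N{i}" for i in range(95)]

balanced_up = oversample(fluorescent, len(nonfluorescent), rng)
balanced_down = undersample(nonfluorescent, len(fluorescent), rng)

print(len(balanced_up), len(balanced_down))  # 95 5
```

Oversampling keeps every original sample but repeats minority compounds, whereas undersampling discards majority compounds; either way the classifier sees a balanced class ratio during training.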


Subjects
Fluorescent Dyes/classification; High-Throughput Screening Assays/methods; Models, Chemical; Algorithms; Cluster Analysis; Computer Simulation; Fluorescence; Fluorescent Dyes/chemistry; Fuzzy Logic; Machine Learning; Predictive Value of Tests; Quantitative Structure-Activity Relationship; Structure-Activity Relationship
10.
J Chem Inf Model ; 55(7): 1426-34, 2015 Jul 27.
Article in English | MEDLINE | ID: mdl-26108525

ABSTRACT

Hepatotoxicity, drug-induced liver injury, and competitive Cytochrome P-450 (CYP) isozyme binding are serious problems associated with drug use. It would be favorable to avoid or to understand potential CYP inhibition at the developmental stages. However, current in silico CYP prediction models or available public prediction servers can provide only yes/no classification results for just one or a few CYP enzymes. In this study, we utilized a rule-based C5.0 algorithm with different descriptors, including PaDEL, Mold2, and PubChem fingerprints, to construct rule-based inhibition prediction models for five major CYP enzymes (CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4) that account for 90% of drug oxidation or hydrolysis. We also developed a rational sampling algorithm for the selection of compounds in the training data set, to enhance the performance of these CYP prediction models. The optimized models include several improved features. First, the final models significantly outperformed all of the currently available models. Second, the final models can also be used for rapid virtual screening of a large set of compounds due to their ruleset-based nature. Moreover, such rule-based prediction models can provide rulesets for structural features related to the five major CYP enzymes. The five most significant rules for CYP inhibition were identified for each CYP enzyme and discussed. An example was chosen for each of the five CYP enzymes to demonstrate how rule-based models can be used to gain insights into structural features that correspond with CYP inhibition. A newer version of the freely accessible CYP prediction server, CypRules, is presented here as a result of the aforementioned improvements.


Subjects
Computer Simulation; Cytochrome P-450 Enzyme Inhibitors/pharmacology; Cytochrome P-450 Enzyme System/metabolism; Drug Discovery/methods; Algorithms; Cytochrome P-450 Enzyme Inhibitors/metabolism; Cytochrome P-450 Enzyme System/chemistry; Isoenzymes/antagonists & inhibitors; Isoenzymes/chemistry; Isoenzymes/metabolism; Models, Molecular; Protein Conformation
11.
J Chem Inf Model ; 53(1): 142-58, 2013 Jan 28.
Article in English | MEDLINE | ID: mdl-23252880

ABSTRACT

Little attention has been given to the selection of trial descriptor sets when designing a QSAR analysis even though a great number of descriptor classes, and often a greater number of descriptors within a given class, are now available. This paper reports an effort to explore interrelationships between QSAR models and descriptor sets. Zhou and co-workers (Zhou et al., Nano Lett. 2008, 8 (3), 859-865) designed, synthesized, and tested a combinatorial library of 80 surface-modified (that is, decorated) multi-walled carbon nanotubes for their composite nanotoxicity using six endpoints, all based on a common 0 to 100 activity scale. The six endpoint values for the 29 most nanotoxic decorated nanotubes were used as the training set for this study. The study reported here includes trial descriptor sets for all possible combinations of MOE, VolSurf, and 4D-fingerprints (FP) descriptor classes, both including and excluding explicit spatial contributions from the nanotube. Optimized QSAR models were constructed from these multiple trial descriptor sets. It was found that (a) both the form and quality of the best QSAR models for each of the endpoints are distinct and (b) some endpoints are quite dependent upon 4D-FP descriptors of the entire nanotube-decorator complex. However, other endpoints yielded equally good models using only decorator descriptors, with and without the decorator-only 4D-FP descriptors. Lastly, and most importantly, the quality, significance, and interpretation of a QSAR model were found to be critically dependent on the trial descriptor sets used within a given QSAR endpoint study.


Subjects
Endpoint Determination; Nanotubes/chemistry; Nanotubes/toxicity; Quantitative Structure-Activity Relationship; Animals; Cattle; Models, Molecular; Molecular Conformation; Proteins/metabolism; Toxicity Tests
12.
Molecules ; 18(11): 13487-509, 2013 Oct 31.
Article in English | MEDLINE | ID: mdl-24184819

ABSTRACT

There is a compelling need to discover type II inhibitors targeting the unique DFG-out inactive kinase conformation since they are likely to possess greater potency and selectivity relative to traditional type I inhibitors. Using a known inhibitor, such as a currently available and approved drug, as a template to design new drugs via computational de novo design is helpful when working with known ligand-receptor interactions. This study proposes a new template-based de novo design protocol to discover new inhibitors that preserve and also optimize the binding interactions of the type II kinase template. First, sorafenib (Nexavar) and nilotinib (Tasigna), two type II inhibitors with different ligand-receptor interactions, were selected as the template compounds. The five-step protocol can reassemble each drug from a large fragment library. Our procedure demonstrates that the selected template compounds can be successfully reassembled while the key ligand-receptor interactions are preserved. Furthermore, to demonstrate that the algorithm is able to construct more potent compounds, we considered kinase inhibitors and a data set for another protein, acetylcholinesterase (AChE) inhibitors. The de novo optimization was initiated using a template compound possessing a less than optimal activity from a series of aminoisoquinoline and TAK-285 inhibiting type II kinases, and E2020 derivatives inhibiting AChE, respectively. Three compounds with greater potency than the template compound were discovered that were also included in the original congeneric series. This template-based lead optimization protocol with the fragment library can help to automatically design compounds that retain the preferred binding interactions of known inhibitors and to further optimize the compounds in the binding pockets.


Subjects
Cholinesterase Inhibitors/chemistry; Protein Kinase Inhibitors/chemistry; Drug Design; Humans; Niacinamide/analogs & derivatives; Niacinamide/chemistry; Phenylurea Compounds/chemistry; Pyrimidines/chemistry; Sorafenib; Structure-Activity Relationship
13.
J Chem Inf Model ; 52(6): 1660-73, 2012 Jun 25.
Article in English | MEDLINE | ID: mdl-22642982

ABSTRACT

The inclusion and accessibility of different methodologies to explore chemical data sets has been beneficial to the field of predictive modeling, specifically in the chemical sciences in the field of Quantitative Structure-Activity Relationship (QSAR) modeling. This study discusses using contemporary protocols and QSAR modeling methods to properly model two biomolecular systems that have historically not performed well using traditional and three-dimensional QSAR methodologies. Herein, we explore, analyze, and discuss the creation of a classification human Ether-a-go-go Related Gene (hERG) potassium channel model and a continuous Tetrahymena pyriformis (T. pyriformis) model using Support Vector Machine (SVM) and Support Vector Regression (SVR), respectively. The models are constructed with three types of molecular descriptors that capture the gross physicochemical features of the compounds: (i) 2D, 2½D, and 3D physical features, (ii) VolSurf-like molecular interaction fields, and (iii) 4D-Fingerprints. The best hERG SVM model achieved 89% accuracy and the three best SVM models were able to screen a PubChem data set with an accuracy of 97%. The best T. pyriformis model had an R² value of 0.924 for the training set and was able to predict the continuous end points for two test sets with R² values of 0.832 and 0.620, respectively. The studies presented within demonstrate the predictive ability (classification and continuous end points) of QSAR models constructed from curated data sets, biologically relevant molecular descriptors, and Support Vector Machines and Support Vector Regression. The ability of these protocols and methodologies to accommodate large data sets (several thousand compounds) that are chemically diverse, and in the case of classification modeling unbalanced (one experimental outcome dominates the data set), allows scientists to further explore a remarkable amount of biological and chemical information.
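
The R² values reported for the continuous model can be read against a minimal sketch of the coefficient of determination. The abstract does not state which definition of R² was used (QSAR papers sometimes report the squared correlation coefficient instead), so the formula below is an assumption, and the observed and predicted values are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical end-point values and model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(round(r_squared(observed, predicted), 3))  # 0.98
```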


Subjects
Ether-A-Go-Go Potassium Channels/classification; Models, Molecular; Tetrahymena pyriformis/drug effects; Toxicology; Animals; ERG1 Potassium Channel; Quantitative Structure-Activity Relationship; Support Vector Machine
14.
Chem Res Toxicol ; 24(6): 934-49, 2011 Jun 20.
Article in English | MEDLINE | ID: mdl-21504223

ABSTRACT

The human ether-a-go-go related gene (hERG) potassium ion channel plays a key role in cardiotoxicity and is therefore a key target as part of preclinical drug discovery toxicity screening. The PubChem hERG Bioassay data set, composed of 1668 compounds, was used to construct an in silico screening model. The corresponding trial models were constructed from a descriptor pool composed of 4D fingerprints (4D-FP) and traditional 2D and 3D VolSurf-like molecular descriptors. A final binary classification model was constructed via a support vector machine (SVM). The resultant model was then validated using the PubChem hERG Bioassay data set (AID 376) and an external hERG data set by evaluating the model's ability to distinguish hERG blockers from nonblockers. The external data set (the test set) consisted of 356 compounds collected from available literature data, comprising 287 actives and 69 inactives. Four different sampling protocols and a 10-fold cross-correlation analysis (used in the validation process to evaluate classification models) explored the impact of the active-inactive data imbalance of the PubChem high-throughput data set. Four different data sets were explored, and the one employing Lipinski's rule-of-five coupled with measures of relative molecular lipophilicity performed the best in the 10-fold cross-correlation validation of the training data set as well as in overall prediction accuracy on the external test sets. The linear SVM binary classification model building strategy was applied to different combinations of MOE (traditional 2D, "2½D", and 3D VolSurf-like) and 4D-FP molecular descriptors to further explore and refine previously proposed key descriptors, identify new significant features that contribute to the prediction of hERG toxicity, and construct the optimal SVM binary classification model from a shrunken descriptor pool. The accuracy, sensitivity, and specificity of the best model determined from 10-fold cross-validation are 95, 90, and 96%, respectively; the overall accuracy is near 87% for the external set. The models constructed in this study demonstrate the following: (i) robustness, based upon accuracy across the structural diversity of the training set, (ii) the ability to predict a compound's "predisposition" to block hERG ion channels, and (iii) the ability to define and illustrate structural features that can be overlaid onto the chemical structures to aid in the 3D structure-activity interpretation of the hERG blocking effect.
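
The accuracy, sensitivity and specificity quoted above follow from a confusion matrix, as in this minimal sketch. The labels are hypothetical (1 for a hERG blocker, 0 for a nonblocker) and are not taken from the study's data sets.

```python
def classification_metrics(true_labels, predicted_labels):
    """Compute accuracy, sensitivity and specificity for binary labels (1 = active)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)          # blockers found
    tn = sum(1 for t, p in pairs if not t and not p)  # nonblockers rejected
    fp = sum(1 for t, p in pairs if not t and p)      # nonblockers flagged
    fn = sum(1 for t, p in pairs if t and not p)      # blockers missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical labels for ten compounds.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

print(classification_metrics(truth, predicted))
```

With imbalanced data, accuracy alone can look high while sensitivity or specificity is poor, which is why all three figures are reported together.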


Subjects
Drug Discovery/methods; Ether-A-Go-Go Potassium Channels/antagonists & inhibitors; Ether-A-Go-Go Potassium Channels/metabolism; Potassium Channel Blockers/chemistry; Potassium Channel Blockers/pharmacology; Artificial Intelligence; Computer Simulation; Humans; Models, Biological; Models, Molecular; Protein Binding; Quantitative Structure-Activity Relationship
15.
J Chem Inf Model ; 50(7): 1304-18, 2010 Jul 26.
Article in English | MEDLINE | ID: mdl-20565102

ABSTRACT

Blockage of the human ether-a-go-go related gene (hERG) potassium ion channel is a major factor related to cardiotoxicity. Hence, drug binding to this channel has become an important biological end point in side-effect screening. A set of 250 structurally diverse compounds screened for hERG activity was assembled from the literature using a set of reliability filters. This data set was used to construct a set of two-state hERG QSAR models. The descriptor pool used to construct the models consisted of 4D-fingerprints generated from the thermodynamic distribution of conformer states available to a molecule, 204 traditional 2D descriptors and 76 3D VolSurf-like descriptors computed using the Molecular Operating Environment (MOE) software. One model is a continuous partial least-squares (PLS) QSAR hERG binding model. Another related model is an optimized binary classification QSAR model that classifies compounds as active or inactive. This binary model achieves 91% accuracy over a large range of molecular diversity spanning the training set. Two external test sets were constructed. One test set is the condensed PubChem bioassay database containing 876 compounds, and the other test set consists of 106 additional compounds found in the literature. Both of the test sets were used to validate the binary QSAR model. The binary QSAR model permits a structural interpretation of possible sources for hERG activity. In particular, the presence of a polar negative group at a distance of 6-8 Å from a hydrogen bond donor in a compound is predicted to be a quite structure-specific pharmacophore that increases hERG blockage. Since a data set of high chemical diversity was used to construct the binary model, it is applicable for performing general virtual hERG screening.


Subjects
Chemistry, Pharmaceutical; Computer Simulation; Ether-A-Go-Go Potassium Channels/antagonists & inhibitors; Carbolines/chemistry; Carbolines/pharmacology; Cardiotoxins/chemistry; Cardiotoxins/pharmacology; Cocaine/analogs & derivatives; Cocaine/chemistry; Cocaine/pharmacology; Humans; Inhibitory Concentration 50; Molecular Structure; Nicotine/chemistry; Nicotine/pharmacology; Quantitative Structure-Activity Relationship; Software
16.
Database (Oxford) ; 2020, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31976536

ABSTRACT

Breathomics is a special branch of metabolomics that quantifies volatile organic compounds (VOCs) from collected exhaled breath samples. Understanding how breath molecules are related to diseases, mechanisms and pathways identified from experimental analytical measurements is challenging due to the lack of an organized resource describing breath molecules, related references and biomedical information embedded in the literature. To provide breath VOCs, related references and biomedical information, we aimed to organize a database composed of manually curated information and automatically extracted biomedical information. First, VOC-related disease information, including Medical Subject Headings (MeSH) terms, was manually curated and organized from 207 publications linked to 99 VOCs. Then, an automated text mining pipeline was used to collect 2766 publications and extract biomedical information from breath research. Finally, the manually curated information and the automatically extracted biomedical information were combined to form a breath molecule database, the Human Breathomics Database (HBDB). The HBDB includes references, VOCs and diseases associated with human breathomics. Most of these VOCs were detected in human breath samples or exhaled breath condensate samples. So far, the database contains a total of 913 VOCs related to human exhaled breath research reported in 2766 publications. The HBDB is the most comprehensive database of VOCs in human exhaled breath to date. It is a useful and organized resource for researchers and clinicians to identify and further investigate potential biomarkers from the breath of patients. Database URL: https://hbdb.cmdm.tw.


Subjects
Database Management Systems , Exhalation/physiology , Metabolome/physiology , Metabolomics/methods , Volatile Organic Compounds , Breath Tests , Data Mining , Humans , Volatile Organic Compounds/analysis , Volatile Organic Compounds/chemistry
17.
Nucleic Acids Res ; 34(Web Server issue): W198-201, 2006 Jul 01.
Article in English | MEDLINE | ID: mdl-16844991

ABSTRACT

We provide the 'R(E)MUS' (reinforced merging techniques for unique peptide segments) web server for identifying the locations and compositions of unique peptide segments from a set of protein family sequences. Different levels of uniqueness are determined according to substitutional relationships among the amino acids, frequency of appearance and biological properties such as priority for serving as candidate epitopes recognized by antibodies. R(E)MUS also provides interactive visualization of 3D structures for locating and comparing the identified unique peptide segments. The accuracy of the algorithm was found to be 70% in terms of mapping a unique peptide segment to an epitope. The R(E)MUS web server is available at http://biotools.cs.ntou.edu.tw/REMUS and the PC version of the software can be freely downloaded at either http://bioinfo.life.nthu.edu.tw/REMUS or http://spider.cs.ntou.edu.tw/BioTools/REMUS. A user guide and working examples for the PC version are available at http://spider.cs.ntou.edu.tw/BioTools/REMUS-DOCS.html, and details of the proposed algorithm have been described previously [H. T. Chang, T. W. Pai, T. C. Fan, B. H. Su, P. C. Wu, C. Y. Tang, C. T. Chang, S. H. Liu and M. D. T. Chang (2006) BMC Bioinformatics, 7, 38 and T. W. Pai, B. H. Su, P. C. Wu, M. D. T. Chang, H. T. Chang, T. C. Fan and S. H. Liu (2006) J. Bioinform. Comput. Biol., 4, 75-92].


Subjects
Epitopes/chemistry , Peptides/chemistry , Peptides/immunology , Sequence Analysis, Protein/methods , Software , Algorithms , Humans , Internet
18.
J Cheminform ; 9(1): 50, 2017 Sep 15.
Article in English | MEDLINE | ID: mdl-29086161

ABSTRACT

GPU acceleration is useful in solving complex chemical information problems. Identifying unknown structures from the mass spectra of natural product mixtures has been a desirable yet unresolved issue in metabolomics. However, this elucidation process has been hampered by complex experimental data and the inability of instruments to completely separate different compounds. Fortunately, with current high-resolution mass spectrometry, one feasible strategy is to define this problem as extending a scaffold database with sidechains of different probabilities to match the high-resolution mass obtained from a high-resolution mass spectrum. By introducing a dynamic programming (DP) algorithm, it is possible to solve this NP-complete problem in pseudo-polynomial time. However, the running time of the DP algorithm grows by orders of magnitude as the number of mass decimal digits increases, thus limiting the boost in structural prediction capabilities. By harnessing the heavily parallel architecture of modern GPUs, we designed a "compute unified device architecture" (CUDA)-based GPU-accelerated mixture elucidator (G.A.M.E.) that considerably improves the performance of the DP, allowing up to five decimal digits for input mass data. As exemplified by four testing datasets with verified constitutions from natural products, G.A.M.E. allows for efficient and automatic structural elucidation of unknown mixtures for practical procedures.

19.
J Cheminform ; 9(1): 57, 2017 Nov 15.
Article in English | MEDLINE | ID: mdl-29143270

ABSTRACT

The identification of chemical structures in natural product mixtures is an important task in drug discovery but is still a challenging problem, as structural elucidation is a time-consuming process and is limited by the available mass spectra of known natural products. Computer-aided structure elucidation (CASE) strategies seek to automatically propose a list of possible chemical structures in mixtures by utilizing chromatographic and spectroscopic methods. However, current CASE tools still cannot automatically solve structures for experienced natural product chemists. Here, after analyzing a collection of 243,130 natural products, we formulated the structural elucidation of natural products in a mixture as a computational problem of extending a list of scaffolds using a weighted side chain list, and designed an efficient algorithm to precisely identify the chemical structures. This problem is NP-complete. A dynamic programming (DP) algorithm can solve it in pseudo-polynomial time after converting floating point molecular weights into integers. However, the running time of the DP algorithm grows exponentially as the precision of the mass spectrometry experiment increases. To approach a polynomial-time solution, we proposed a novel iterative DP algorithm that can quickly recognize the chemical structures of natural products. When this algorithm was used to elucidate the structures of four natural products that had been experimentally and structurally determined, it found the exact solutions, and the running time was shown to be polynomial for average cases. The proposed method improved the speed of the structural elucidation of natural products and helped broaden the spectrum of available compounds that could be applied as new drug candidates. A web service built for structural elucidation studies is freely accessible at http://csccp.cmdm.tw/.
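
The integer-conversion step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' iterative DP algorithm or their web service; it is an ordinary coin-change style DP, and the function name `match_side_chains`, the side-chain labels and all masses are hypothetical values chosen for illustration. It shows why the approach is pseudo-polynomial: the table length equals the residual mass scaled by `10 ** decimals`, so each extra decimal digit makes the table ten times longer.

```python
from typing import Dict, List, Optional


def match_side_chains(
    target_mass: float,
    scaffold_mass: float,
    side_chains: Dict[str, float],
    decimals: int = 2,
) -> Optional[List[str]]:
    """Return one multiset of side chains whose masses fill the gap between
    the scaffold mass and the target mass, or None if no combination fits.

    All masses are converted to integers by scaling with 10 ** decimals,
    so the DP table has (scaled residual + 1) entries.
    """
    scale = 10 ** decimals
    residual = round((target_mass - scaffold_mass) * scale)
    if residual < 0:
        return None
    weights = {name: round(mass * scale) for name, mass in side_chains.items()}

    # reachable[m] is True when integer mass m can be built from side chains;
    # last[m] remembers one side chain that completes such a combination.
    reachable = [False] * (residual + 1)
    last: List[Optional[str]] = [None] * (residual + 1)
    reachable[0] = True
    for m in range(1, residual + 1):
        for name, w in weights.items():
            if 0 < w <= m and reachable[m - w]:
                reachable[m] = True
                last[m] = name
                break

    if not reachable[residual]:
        return None

    # Walk back through the table to recover the chosen side chains.
    combo: List[str] = []
    m = residual
    while m > 0:
        name = last[m]
        combo.append(name)
        m -= weights[name]
    return sorted(combo)


# Hypothetical scaffold and side-chain masses, for illustration only.
chains = {"A": 15.02, "B": 17.01, "C": 31.03}
result = match_side_chains(149.04, 100.00, chains, decimals=2)
```

With these made-up values the only fitting combination is one "A" plus two "B" side chains. Raising `decimals` keeps the answer the same but multiplies the table size by ten per digit, which is the growth in running time the abstract describes.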

20.
BMC Bioinformatics ; 7: 38, 2006 Jan 25.
Article in English | MEDLINE | ID: mdl-16433931

ABSTRACT

BACKGROUND: Members of a protein family often have highly conserved sequences; most of these sequences carry identical biological functions and possess similar three-dimensional (3-D) structures. However, enzymes with high sequence identity may acquire differential functions other than the common catalytic ability. It is probable that each of their variable regions consists of a unique peptide motif (UPM), which selectively interacts with other cellular proteins, rendering additional biological activities. The ability to identify and localize such UPMs is paramount in recognizing the characteristic role of each member of a protein family. RESULTS: We have developed a reinforced merging algorithm (RMA) with which non-gapped UPMs were identified in a variety of query protein sequences including members of human ribonuclease A (RNaseA), epidermal growth factor receptor (EGFR), matrix metalloproteinase (MMP), and Sma-and-Mad related protein families (Smad). The UPMs generally occupy specific positions in the resolved 3-D structures, especially the loop regions on the structural surfaces. These motifs coincide with the recognition sites for antibodies, as the epitopes of four monoclonal antibodies and two polyclonal antibodies were shown to overlap with the UPMs. Most of the UPMs were found to correlate well with the potential antigenic regions predicted by PROTEAN. Furthermore, an accuracy of 70% can be achieved in terms of mapping a UPM to an epitope. CONCLUSION: Our study provides a bioinformatic approach for searching and predicting potential epitopes and interacting motifs that distinguish different members of a protein family.
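
The idea of a non-gapped unique peptide motif can be sketched as follows. This is not the reinforced merging algorithm (RMA) described in the abstract: it is a simplified exact-match version that ignores amino acid substitution relationships, and the function name `unique_segments`, the window length `k` and the toy sequences are assumptions made for illustration. It marks every length-`k` window of the query that occurs in no other family member and merges overlapping windows into longer segments.

```python
from typing import Dict, List, Tuple


def unique_segments(
    family: Dict[str, str], query: str, k: int = 3
) -> List[Tuple[int, str]]:
    """Return (start, segment) pairs for the query sequence, where each
    segment is a merged run of k-mers absent from all other family members."""
    # Collect every k-mer that appears in any sequence other than the query.
    others = {
        seq[i:i + k]
        for name, seq in family.items()
        if name != query
        for i in range(len(seq) - k + 1)
    }

    seq = family[query]
    segments: List[Tuple[int, str]] = []
    start = end = -1  # current merged segment covers seq[start:end]
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in others:
            continue
        if start >= 0 and i <= end:
            end = i + k  # window overlaps or abuts the current segment
        else:
            if start >= 0:
                segments.append((start, seq[start:end]))
            start, end = i, i + k
    if start >= 0:
        segments.append((start, seq[start:end]))
    return segments


# Toy sequences, not real protein family members.
toy_family = {
    "a": "MKTAYIAKQR",
    "b": "MKTAYLLKQR",
    "c": "MKTAYIAKQW",
}
segments_b = unique_segments(toy_family, "b", k=3)
```

For the toy input, sequence "b" yields a single merged segment starting at position 3, because its four consecutive unique windows overlap. A faithful implementation would also score near matches rather than only exact ones.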


Subjects
Algorithms , Amino Acid Motifs , Proteins/chemistry , Proteins/classification , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence , Conserved Sequence , Models, Chemical , Models, Molecular , Molecular Sequence Data , Protein Conformation , Sequence Homology, Amino Acid , Structure-Activity Relationship