ABSTRACT
The use of virtual drug screening can be beneficial to research teams, enabling them to narrow down potentially useful compounds for further study. A variety of virtual screening methods have been developed, typically with machine learning classifiers at the center of their design. In the present study, we created a virtual screener for protein kinase inhibitors. Experimental compound-target interaction data were obtained from the IDG-DREAM Drug-Kinase Binding Prediction Challenge. These data were converted and fed as inputs into two multi-input recurrent neural networks (RNNs). The first network utilized data encoded in one-hot representation, while the other incorporated embedding layers. The models were developed in Python and were designed to output the IC50 of the target compounds. Model performance was assessed primarily through analysis of the Q2 values produced from runs of differing sample and epoch sizes; recorded loss values were also reported and graphed. Performance was limited, though multiple changes are proposed to potentially improve a multi-input recurrent neural network-based screening tool.
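Below is a minimal sketch, not the authors' code, of the embedding-based variant of such a multi-input RNN: one recurrent branch encodes the compound, another encodes the kinase, and the merged representation regresses IC50. All vocabulary sizes, sequence lengths, and layer widths are illustrative assumptions; the one-hot variant would simply replace the Embedding layers with one-hot-encoded inputs.

```python
from tensorflow import keras
from tensorflow.keras import layers

SMILES_VOCAB, KINASE_VOCAB = 64, 26   # assumed token vocabularies
SMILES_LEN, KINASE_LEN = 100, 500     # assumed padded sequence lengths

smiles_in = keras.Input(shape=(SMILES_LEN,), name="compound_tokens")
kinase_in = keras.Input(shape=(KINASE_LEN,), name="kinase_tokens")

# embedding variant: learn dense token vectors, then encode each sequence with an LSTM
c = layers.Embedding(SMILES_VOCAB, 32, mask_zero=True)(smiles_in)
c = layers.LSTM(64)(c)
k = layers.Embedding(KINASE_VOCAB, 32, mask_zero=True)(kinase_in)
k = layers.LSTM(64)(k)

# merge compound and kinase encodings and regress the (assumed log-scaled) IC50
merged = layers.concatenate([c, k])
h = layers.Dense(64, activation="relu")(merged)
ic50_out = layers.Dense(1, name="ic50")(h)

model = keras.Model(inputs=[smiles_in, kinase_in], outputs=ic50_out)
model.compile(optimizer="adam", loss="mse")
```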
Subjects
Protein Kinase Inhibitors/pharmacology, Protein Kinases/chemistry, Protein Kinases/metabolism, Computer Simulation, Deep Learning, Preclinical Drug Evaluation, Inhibitory Concentration 50, Machine Learning, Computer Neural Networks, Pilot Projects, Protein Binding, Protein Kinase Inhibitors/chemistry
ABSTRACT
Many research groups and institutions have created a variety of databases curating experimental and predicted data related to protein-ligand binding. The landscape of available databases is dynamic, with new databases emerging and established databases becoming defunct. Here, we review the current state of databases that contain binding pockets and protein-ligand binding interactions. We have compiled a list of such databases, fifty-three of which are currently available for use. We discuss variation in how binding pockets are defined and summarize pocket-finding methods. We organize the fifty-three databases into subgroups based on goals and contents, and describe standard use cases. We also illustrate that pockets within the same protein are characterized differently across different databases. Finally, we assess critical issues of sustainability, accessibility and redundancy.
ABSTRACT
Social media can provide real-time insight into trends in substance use, addiction, and recovery. Prior studies have used platforms such as Reddit and X (formerly Twitter), but evolving policies around data access have threatened these platforms' usability in research. We evaluate the potential of a broad set of platforms to detect emerging trends in the opioid epidemic. From these, we created a shortlist of 11 platforms, for which we documented official policies regulating drug-related discussion, data accessibility, geolocatability, and prior use in opioid-related studies. We quantified their volumes of opioid discussion, capturing informal language by including slang generated using a large language model. Beyond the most commonly used Reddit and X, the platforms with high potential for use in opioid-related surveillance are TikTok, YouTube, and Facebook. Leveraging many different social platforms, instead of a single platform, safeguards against sudden changes to data access and may better capture all populations that use opioids than any single platform.
ABSTRACT
Drug abuse is a serious problem in the United States, with over 90,000 drug overdose deaths nationally in 2020. A key step in combating drug abuse is detecting, monitoring, and characterizing its trends over time and location, also known as pharmacovigilance. While federal reporting systems accomplish this to a degree, they often have high latency and incomplete coverage. Social-media-based pharmacovigilance has zero latency, is easily accessible and unfiltered, and benefits from drug users being willing to share their experiences online pseudo-anonymously. However, unlike highly structured official data sources, social media text is rife with misspellings and slang, making automated analysis difficult. Generative Pretrained Transformer 3 (GPT-3) is a large autoregressive language model specialized for few-shot learning that was trained on text from the entire internet. We demonstrate that GPT-3 can be used to generate slang and common misspellings of terms for drugs of abuse. We repeatedly queried GPT-3 for synonyms of drugs of abuse and filtered the generated terms using automated Google searches and cross-references to known drug names. When generated terms for alprazolam were manually labeled, we found that our method produced 269 synonyms for alprazolam, 221 of which were new discoveries not included in an existing drug lexicon for social media. We repeated this process for 98 drugs of abuse, of which 22 are widely-discussed drugs of abuse, building a lexicon of colloquial drug synonyms that can be used for pharmacovigilance on social media.
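A minimal sketch of the generation-and-filtering loop described above, with the GPT-3 call and the automated web-search filter reduced to hypothetical placeholders (`query_llm`, `looks_drug_related`); the prompt wording and query count are assumptions, not the study's exact settings.

```python
from collections import Counter

def query_llm(prompt: str) -> list[str]:
    """Hypothetical wrapper around a GPT-3-style completion endpoint.
    Assumed to return a list of candidate synonym strings for one query."""
    raise NotImplementedError

def looks_drug_related(term: str) -> bool:
    """Placeholder for the automated web-search / cross-reference filter."""
    return len(term) > 2

def generate_synonyms(drug: str, known_lexicon: set[str], n_queries: int = 50) -> Counter:
    """Repeatedly sample the model and keep novel, plausible terms with their counts."""
    counts = Counter()
    prompt = f"List slang terms and common misspellings for the drug {drug}:"
    for _ in range(n_queries):                  # repeated sampling surfaces rarer slang
        for term in query_llm(prompt):
            term = term.strip().lower()
            if term and term not in known_lexicon and looks_drug_related(term):
                counts[term] += 1               # frequency helps rank candidates
    return counts
```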
Subjects
Social Media, Substance-Related Disorders, United States, Humans, Pharmacovigilance, Alprazolam, Natural Language Processing
ABSTRACT
The three-dimensional structures of proteins are crucial for understanding their molecular mechanisms and interactions. Machine learning algorithms that are able to learn accurate representations of protein structures are therefore poised to play a key role in protein engineering and drug development. The accuracy of such models in deployment is directly influenced by training data quality. The use of different experimental methods for protein structure determination may introduce bias into the training data. In this work, we evaluate the magnitude of this effect across three distinct tasks: estimation of model accuracy, protein sequence design, and catalytic residue prediction. Most protein structures are derived from X-ray crystallography, nuclear magnetic resonance (NMR), or cryo-electron microscopy (cryo-EM); we trained each model on datasets consisting of either all three structure types or only X-ray data. We find that across these tasks, models consistently perform worse on test sets derived from NMR and cryo-EM than they do on test sets of structures derived from X-ray crystallography, but that the difference can be mitigated when NMR and cryo-EM structures are included in the training set. Importantly, we show that including all three types of structures in the training set does not degrade test performance on X-ray structures, and in some cases even increases it. Finally, we examine the relationship between model performance and the biophysical properties of each method, and recommend that the biochemistry of the task of interest should be considered when composing training sets.
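A small sketch, under assumed column names and placeholder entries, of the two training-set compositions compared in this work: an X-ray-only set versus a mixed set that also includes NMR and cryo-EM structures.

```python
import pandas as pd

# hypothetical metadata table: one row per structure with its determination method
meta = pd.DataFrame({
    "pdb_id": ["1abc", "2def", "3ghi", "4jkl"],
    "method": ["X-RAY DIFFRACTION", "SOLUTION NMR",
               "ELECTRON MICROSCOPY", "X-RAY DIFFRACTION"],
})

# composition 1: X-ray structures only
xray_only = meta[meta["method"] == "X-RAY DIFFRACTION"]

# composition 2: all three experimental methods
mixed = meta[meta["method"].isin(
    ["X-RAY DIFFRACTION", "SOLUTION NMR", "ELECTRON MICROSCOPY"])]

# one copy of each model would then be trained per composition and evaluated on
# method-specific test sets (X-ray, NMR, cryo-EM) to measure the bias
```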
Subjects
Computational Biology, Proteins, Algorithms, Cryoelectron Microscopy, X-Ray Crystallography, Humans, Protein Conformation
ABSTRACT
Plasma membranes (PMs) contain hundreds of different lipid species that contribute differently to overall bilayer properties. By modulation of these properties, membrane protein function can be affected. Furthermore, inhomogeneous lipid mixing and domains of lipid enrichment/depletion can sort proteins and provide optimal local environments. Recent coarse-grained (CG) Martini molecular dynamics efforts have provided glimpses into lipid organization of different PMs: an "Average" and a "Brain" PM. Their high complexity and large size require long simulations (~80 µs) for proper sampling. Thus, these simulations are computationally taxing. This level of complexity is beyond the possibilities of all-atom simulations, raising the question: what complexity is needed for "realistic" bilayer properties? We constructed CG Martini PM models of varying complexity (63 down to 8 different lipids). Lipid tail saturations and headgroup combinations were kept as consistent as possible for the "tissues" (Average/Brain) at three levels of compositional complexity. For each system, we analyzed membrane properties to evaluate which features can be retained at lower complexity and to validate eight-component bilayers that can act as reliable mimetics for Average or Brain PMs. Systems of reduced complexity deliver a more robust and malleable tool for computational membrane studies and allow for equivalent all-atom simulations and experiments.
Subjects
Lipid Bilayers, Molecular Dynamics Simulation, Cell Membrane, Membranes, Proteins
ABSTRACT
Finding appropriate technology for the early detection of Alzheimer's disease (AD) is urgent, because AD etiopathology remains unknown and the disease causes serious social problems. Early detection of mild cognitive impairment (MCI) is of pivotal importance in delaying or preventing AD onset. Herein, we utilize deep learning (DL) techniques for multiclass classification among normal control, MCI, and AD subjects. We used multi-categorical data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), including brain imaging measurements, cognitive test results, cerebrospinal fluid measures, ApoE4 status, and age. We achieved an overall accuracy of 87.197% for our artificial neural network classifier and a similar overall accuracy of 88.275% for our 1D convolutional neural network classifier. We conclude that DL-based techniques are powerful tools for analyzing ADNI data, although further method refinement is needed.
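A minimal sketch, not the published architecture, of a 1D convolutional classifier over a concatenated ADNI feature vector; the feature count and layer sizes are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

N_FEATURES = 40   # assumed length of the concatenated ADNI feature vector

model = keras.Sequential([
    layers.Input(shape=(N_FEATURES, 1)),        # features treated as a 1D "signal"
    layers.Conv1D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.Conv1D(64, kernel_size=3, activation="relu"),
    layers.GlobalAveragePooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(3, activation="softmax"),      # normal control vs. MCI vs. AD
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```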
ABSTRACT
A blockchain is a system for storing and sharing information that is secure because of its transparency. Each block in the chain is both an independent unit containing its own information and a dependent link in the collective chain; this duality creates a network regulated by the participants who store and share the information rather than by a third party. Blockchain has many applications in healthcare: it can improve mobile health applications, monitoring devices, the sharing and storage of electronic medical records, clinical trial data, and the storage of insurance information. Research on blockchain in healthcare is currently limited, but blockchain is on the brink of transforming the healthcare system; through its decentralized principles, blockchain can improve the accessibility and security of patient information, and can therefore overturn the healthcare hierarchy and build a new system in which patients manage their own care.
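A toy sketch of the hash-chaining idea behind the duality described above: each block carries its own record plus the hash of the previous block, so altering any block invalidates every later link. The record contents are hypothetical examples, not any production healthcare blockchain.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Deterministic SHA-256 digest of a block's contents."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def make_block(record: dict, prev_hash: str) -> dict:
    """A block is its own unit (record) and a link to its predecessor (prev_hash)."""
    return {"record": record, "prev_hash": prev_hash}

genesis = make_block({"note": "genesis"}, prev_hash="0" * 64)
visit = make_block({"patient": "A", "event": "clinic visit"}, prev_hash=block_hash(genesis))
labs = make_block({"patient": "A", "event": "lab result"}, prev_hash=block_hash(visit))

# any participant can verify the chain by recomputing the hashes
assert visit["prev_hash"] == block_hash(genesis)
assert labs["prev_hash"] == block_hash(visit)
```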
ABSTRACT
BACKGROUND: Alzheimer's disease (AD) is the most common form of senile dementia. However, its pathological mechanisms are not fully understood. To comprehend AD pathological mechanisms, researchers have employed AD-related DNA microarray data and diverse computational algorithms. More efficient computational algorithms are needed to process DNA microarray data for identifying AD-related candidate genes. METHODS: In this paper, we propose an algorithm based on the following observation: when an acrobat walks along a steel wire, his or her body must have some swing; if the swing can be controlled, the acrobat can maintain balance; otherwise, the acrobat will fall. Based on this simple idea, we have designed a simple yet practical algorithm termed the Amplitude Deviation Algorithm (ADA). Deviation, overall deviation, deviation amplitude, and 3δ are introduced to characterize ADA. RESULTS: Fifty-two candidate genes for AD have been identified via ADA. The implications of some of these candidate genes for AD pathogenesis are discussed. CONCLUSIONS: Through the analysis of these AD candidate genes, we believe that AD pathogenesis may be related to abnormal signal transduction (AGTR1 and PTAFR), decreased protein transport capacity (COL5A2 (221729_at), COL5A2 (221730_at), COL4A1), impaired axon repair (CNR1), and intracellular calcium dyshomeostasis (CACNB2, CACNA1E). However, their potential implications for AD pathology should be further validated by wet-lab experiments, as they were identified only computationally using ADA.
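A heavily hedged sketch of one plausible reading of ADA, since the abstract does not give exact definitions: each gene's expression "swing" around its own baseline is measured, and genes whose swing amplitude exceeds a 3δ-style cutoff are flagged as candidates. The paper's actual definitions of deviation, overall deviation, and deviation amplitude may differ.

```python
import numpy as np

def ada_candidates(expr: np.ndarray, gene_ids: list[str]) -> list[str]:
    """expr: genes x samples microarray matrix; returns flagged candidate gene IDs."""
    # per-gene "swing" of expression around that gene's own mean across samples
    deviation = expr - expr.mean(axis=1, keepdims=True)
    # amplitude of the swing (largest upward minus largest downward deviation)
    amplitude = deviation.max(axis=1) - deviation.min(axis=1)
    # 3-delta-style cutoff on the amplitude distribution (assumed interpretation)
    threshold = amplitude.mean() + 3 * amplitude.std()
    return [g for g, a in zip(gene_ids, amplitude) if a > threshold]
```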
ABSTRACT
BACKGROUND: Virtual Screening (VS) has emerged as an important tool in the drug development process, as it conducts efficient in silico searches over millions of compounds, ultimately increasing the yield of potential drug leads. As a subset of Artificial Intelligence (AI), Machine Learning (ML) is a powerful way of conducting VS for drug leads. ML for VS generally involves assembling a filtered training set of compounds composed of known actives and inactives. After training, the model is validated and, if sufficiently accurate, applied to previously unseen databases to screen for novel compounds with the desired drug-target binding activity. OBJECTIVE: This study aims to review ML-based methods used for VS and their application to Alzheimer's Disease (AD) drug discovery. METHODS: To update the current knowledge on ML for VS, we review thorough backgrounds, explanations, and VS applications of the following ML techniques: Naïve Bayes (NB), k-Nearest Neighbors (kNN), Support Vector Machines (SVM), Random Forests (RF), and Artificial Neural Networks (ANN). RESULTS: All of these techniques have found success in VS, but the future of VS is likely to lean more heavily toward the use of neural networks, and more specifically Convolutional Neural Networks (CNN), a subset of ANN that utilize convolution. We additionally conceptualize a workflow for conducting ML-based VS for potential therapeutics for AD, a complex neurodegenerative disease with no known cure or prevention. This serves both as an example of how to apply the concepts introduced earlier in the review and as a potential workflow for future implementation. CONCLUSION: Different ML techniques are powerful tools for VS, each with its own advantages and disadvantages, and ML-based VS can be applied to AD drug development.
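A minimal sketch of the generic ML-for-VS workflow outlined above, using a Random Forest on Morgan fingerprints as one concrete choice; the libraries, descriptor settings, and placeholder compounds are illustrative assumptions rather than the review's prescriptions.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

def fingerprint(smiles: str) -> np.ndarray:
    """Morgan (ECFP-like) bit-vector fingerprint for one compound."""
    mol = Chem.MolFromSmiles(smiles)
    return np.array(AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048))

# training set: known actives (1) and inactives (0); placeholder compounds
train_smiles = ["CCO", "c1ccccc1O", "CC(=O)Nc1ccc(O)cc1"]
labels = [1, 0, 1]
X = np.vstack([fingerprint(s) for s in train_smiles])

clf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, labels)

# screen an unseen library and rank by predicted probability of activity
library = ["CC(=O)Oc1ccccc1C(=O)O", "CN1CCC[C@H]1c1cccnc1"]
scores = clf.predict_proba(np.vstack([fingerprint(s) for s in library]))[:, 1]
ranked = sorted(zip(library, scores), key=lambda pair: -pair[1])
```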
Subjects
Alzheimer Disease/drug therapy, Drug Discovery, Preclinical Drug Evaluation, Machine Learning, Neuroprotective Agents/therapeutic use, Bayes Theorem, Humans
ABSTRACT
Current drug development remains costly and slow despite tremendous technological advancements in drug discovery and medicinal chemistry. Using machine learning (ML) to virtually screen compound libraries promises to address this by generating drug leads more efficiently and accurately. Herein, we explain the broad basics and integration of both virtual screening (VS) and ML. We then discuss artificial neural networks (ANNs) and their usage for VS. The ANN is emerging as the dominant classifier for ML in general, and has proven its utility for both structure-based and ligand-based VS. Techniques such as dropout, multitask learning, and convolution improve the performance of ANNs and enable them to take on chemical meaning when learning about the drug-target binding activity of compounds.
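A minimal sketch of a multitask ANN with dropout for ligand-based VS, as one way the techniques named above can be combined; the fingerprint length, task count, and layer widths are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

N_BITS, N_TARGETS = 2048, 5    # assumed fingerprint length and number of drug targets

inp = keras.Input(shape=(N_BITS,), name="fingerprint")
h = layers.Dense(1024, activation="relu")(inp)
h = layers.Dropout(0.5)(h)     # dropout regularizes the shared representation
h = layers.Dense(256, activation="relu")(h)
h = layers.Dropout(0.5)(h)
out = layers.Dense(N_TARGETS, activation="sigmoid", name="per_target_activity")(h)

model = keras.Model(inp, out)  # one shared input, one sigmoid output per target
model.compile(optimizer="adam", loss="binary_crossentropy")
```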
Subjects
Deep Learning, Drug Discovery/methods, Humans, Ligands, Computer Neural Networks
ABSTRACT
The inefficiency of the drug development pipeline has called for novel solutions and cutting-edge technologies. Artificial intelligence (AI)-based methods, including various machine- and deep-learning algorithms, have been employed for virtual drug screening. With the continuous refinement of algorithms, improvements in computing hardware, and the increasing availability of molecular datasets for drug development, this is certainly a prime time for AI-powered virtual drug screening.