ABSTRACT
Computing quantum chemical properties of small molecules and polymers can provide valuable insights to physicists, chemists, and biologists when designing new materials, catalysts, biological probes, and drugs. Deep learning can compute quantum chemical properties accurately in a fraction of the time required by commonly used methods such as density functional theory. Most current approaches to deep learning in quantum chemistry begin with geometric information from experimentally derived molecular structures or pre-calculated atom coordinates. These approaches have many useful applications, but they can be costly in time and computational resources. In this study, we demonstrate that accurate quantum chemical computations can be performed without geometric information by operating in the coordinate-free domain using deep learning on graph encodings. Coordinate-free methods rely only on molecular graphs, the connectivity of atoms and bonds, without atom coordinates or bond distances. We also find that the choice of graph-encoding architecture substantially affects the performance of these methods. The structures of these graph-encoding architectures provide an opportunity to probe an important, outstanding question in quantum mechanics: what types of quantum chemical properties can be represented by local variable models? We find that Wave, a local variable model, accurately calculates the quantum chemical properties, while graph convolutional architectures require global variables. Furthermore, local variable Wave models outperform global variable graph convolution models on complex molecules with large, correlated systems.
ABSTRACT
Machine learning, combined with a proliferation of electronic healthcare records (EHR), has the potential to transform medicine by identifying previously unknown interventions that reduce the risk of adverse outcomes. To realize this potential, machine learning must leave the conceptual 'black box' in complex domains to overcome several pitfalls, like the presence of confounding variables. These variables predict outcomes but are not causal, often yielding uninformative models. In this work, we envision a 'conversational' approach to design machine learning models, which couple modeling decisions to domain expertise. We demonstrate this approach via a retrospective cohort study to identify factors which affect the risk of hospital-acquired venous thromboembolism (HA-VTE). Using logistic regression for modeling, we have identified drugs that reduce the risk of HA-VTE. Our analysis reveals that ondansetron, an anti-nausea and anti-emetic medication, commonly used in treating side-effects of chemotherapy and post-general anesthesia period, substantially reduces the risk of HA-VTE when compared to aspirin (11% vs. 15% relative risk reduction or RRR, respectively). The low cost and low morbidity of ondansetron may justify further inquiry into its use as a preventative agent for HA-VTE. This case study highlights the importance of engaging domain expertise while applying machine learning in complex domains.
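The relative risk reduction (RRR) figures quoted in this abstract follow the standard definition, RRR = 1 - (risk in the exposed group / risk in the unexposed group). The sketch below shows only that arithmetic. The function name and all counts are hypothetical, and this crude calculation omits any covariate adjustment that a regression model would provide, so it is not a reconstruction of the study's analysis.

```python
def relative_risk_reduction(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Crude (unadjusted) relative risk reduction: 1 - risk_exposed / risk_unexposed."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    return 1.0 - risk_exposed / risk_unexposed

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
rrr = relative_risk_reduction(
    events_exposed=89, n_exposed=1000,      # patients who received the drug
    events_unexposed=100, n_unexposed=1000  # patients who did not
)
print(f"RRR = {rrr:.1%}")
```

A positive RRR means the event was less frequent in the exposed group; a negative value would indicate increased risk.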
Subjects
Venous Thromboembolism , Hospitals , Humans , Machine Learning , Ondansetron/therapeutic use , Retrospective Studies , Risk Factors , Venous Thromboembolism/epidemiology , Venous Thromboembolism/prevention & control
ABSTRACT
Atom- or bond-level chemical properties of interest in medicinal chemistry, such as drug metabolism and electrophilic reactivity, are important to understand and predict across arbitrary new molecules. Deep learning can be used to map molecular structures to their chemical properties, but the data sets for these tasks are relatively small, which can limit accuracy and generalizability. To overcome this limitation, it would be preferable to model these properties on the basis of the underlying quantum chemical characteristics of small molecules. However, it is difficult to learn higher level chemical properties from lower level quantum calculations. To overcome this challenge, we pretrained deep learning models to compute quantum chemical properties and then reused the intermediate representations constructed by the pretrained network. Transfer learning, in this way, substantially outperformed models based on chemical graphs alone or quantum chemical properties alone. This result was robust, observable in five prediction tasks: identifying sites of epoxidation by metabolic enzymes and identifying sites of covalent reactivity with cyanide, glutathione, DNA and protein. We see that this approach may substantially improve the accuracy of deep learning models for specific chemical structures, such as aromatic systems.
Subjects
Deep Learning , Quantum Theory , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology
ABSTRACT
BACKGROUND: Pathologist evaluation of donor liver biopsies provides information for accepting or discarding potential donor livers. Due to the urgent nature of the decision process, this is regularly performed using frozen sectioning at the time of biopsy. The percent steatosis in a donor liver biopsy correlates with transplant outcome, however there is significant inter- and intra-observer variability in quantifying steatosis, compounded by frozen section artifact. We hypothesized that a deep learning model could identify and quantify steatosis in donor liver biopsies. METHODS: We developed a deep learning convolutional neural network that generates a steatosis probability map from an input whole slide image (WSI) of a hematoxylin and eosin-stained frozen section, and subsequently calculates the percent steatosis. Ninety-six WSI of frozen donor liver sections from our transplant pathology service were annotated for steatosis and used to train (n = 30 WSI) and test (n = 66 WSI) the deep learning model. FINDINGS: The model had good correlation and agreement with the annotation in both the training set (r of 0.88, intraclass correlation coefficient [ICC] of 0.88) and novel input test sets (r = 0.85 and ICC=0.85). These measurements were superior to the estimates of the on-service pathologist at the time of initial evaluation (r = 0.52 and ICC=0.52 for the training set, and r = 0.74 and ICC=0.72 for the test set). INTERPRETATION: Use of this deep learning algorithm could be incorporated into routine pathology workflows for fast, accurate, and reproducible donor liver evaluation. FUNDING: Mid-America Transplant Society.
Subjects
Deep Learning , Fatty Liver/pathology , Living Donors , Algorithms , Biopsy , Fatty Liver/diagnostic imaging , Frozen Sections , Humans , Image Processing, Computer-Assisted/methods , Immunohistochemistry , Liver Transplantation , Molecular Sequence Annotation , Neural Networks, Computer , Severity of Illness Index
ABSTRACT
Thiazoles are biologically active aromatic heterocyclic rings occurring frequently in natural products and drugs. These molecules typically undergo harmless elimination; however, a hepatotoxic response can occur due to multistep bioactivation of the thiazole to generate a reactive thioamide. A basis for those differences in outcomes remains unknown. A textbook example is the high hepatotoxicity observed for sudoxicam in contrast to the relatively safe use and marketability of meloxicam, which differs in structure from sudoxicam by the addition of a single methyl group. Both drugs undergo bioactivation, but meloxicam exhibits an additional detoxification pathway due to hydroxylation of the methyl group. We hypothesized that thiazole bioactivation efficiency is similar between sudoxicam and meloxicam due to the methyl group being a weak electron donor, and thus, the relevance of bioactivation depends on the competing detoxification pathway. For a rapid analysis, we modeled epoxidation of sudoxicam derivatives to investigate the impact of substituents on thiazole bioactivation. As expected, electron donating groups increased the likelihood for epoxidation with a minimal effect for the methyl group, but model predictions did not extrapolate well among all types of substituents. Through analytical methods, we measured steady-state kinetics for metabolic bioactivation of sudoxicam and meloxicam by human liver microsomes. Sudoxicam bioactivation was 6-fold more efficient than that for meloxicam, yet meloxicam showed a 6-fold higher efficiency of detoxification than bioactivation. Overall, sudoxicam bioactivation was 15-fold more likely than meloxicam considering all metabolic clearance pathways. Kinetic differences likely arise from different enzymes catalyzing respective metabolic pathways based on phenotyping studies. Rather than simply providing an alternative detoxification pathway, the meloxicam methyl group suppressed the bioactivation reaction. These findings indicate the impact of thiazole substituents on bioactivation is more complex than previously thought and likely contributes to the unpredictability of their toxic potential.
Subjects
Meloxicam/metabolism , Thiazines/metabolism , Activation, Metabolic , Biotransformation , Chemical and Drug Induced Liver Injury/metabolism , Electrons , Epoxy Compounds/metabolism , Humans , Hydroxylation , In Vitro Techniques , Kinetics , Metabolic Networks and Pathways/drug effects , Microsomes, Liver/metabolism , Thiazoles/metabolism
ABSTRACT
Metabolism of drugs affects their absorption, distribution, efficacy, excretion, and toxicity profiles. Metabolism is routinely assessed experimentally using recombinant enzymes, human liver microsomes, and animal models. Unfortunately, these experiments are expensive, time-consuming, and often extrapolate poorly to humans because they fail to capture the full breadth of metabolic reactions observed in vivo. As a result, metabolic pathways leading to the formation of toxic metabolites are often missed during drug development, giving rise to costly failures. To address some of these limitations, computational metabolism models can rapidly and cost-effectively predict sites of metabolism (the atoms or bonds that undergo enzymatic modifications) on thousands of drug candidates, thereby improving the likelihood of discovering metabolic transformations forming toxic metabolites. However, current computational metabolism models are often unable to predict the specific metabolites formed by metabolism at certain sites. Identification of reaction type is a key step toward metabolite prediction. Phase I enzymes, which are responsible for the metabolism of more than 90% of FDA approved drugs, catalyze highly diverse types of reactions and produce metabolites with substantial structural variability. Without knowledge of potential metabolite structures, medicinal chemists cannot differentiate harmful metabolic transformations from beneficial ones. To address this shortcoming, we propose a system for simultaneously labeling sites of metabolism and reaction types, by classifying them into five key reaction classes: stable and unstable oxidations, dehydrogenation, hydrolysis, and reduction. These classes unambiguously identify 21 types of phase I reactions, which cover 92.3% of known reactions in our database. We used this labeling system to train a neural network model of phase I metabolism on a literature-derived data set encompassing 20,736 human phase I metabolic reactions. Our model, Rainbow XenoSite, was able to identify reaction-type specific sites of metabolism with a cross-validated accuracy of 97.1% area under the receiver operator curve. Rainbow XenoSite with five-color and combined output is available for use free and online through our secure server at http://swami.wustl.edu/xenosite/p/phase1_rainbow.
Subjects
Deep Learning , Animals , Color , Humans , Metabolic Networks and Pathways , Microsomes, Liver , Neural Networks, Computer
ABSTRACT
Pediatric patients are at elevated risk of adverse drug reactions, and there is insufficient information on drug safety in children. Complicating risk assessment in children, there are numerous age-dependent changes in the absorption, distribution, metabolism, and elimination of drugs. A key contributor to age-dependent drug toxicity risk is the ontogeny of drug metabolism enzymes, the changes in both abundance and type throughout development from the fetal period through adulthood. Critically, these changes affect not only the overall clearance of drugs but also exposure to individual metabolites. In this study, we introduce time-embedding neural networks in order to model population-level variation in metabolism enzyme expression as a function of age. We use a time-embedding network to model the ontogeny of 23 drug metabolism enzymes. The time-embedding network recapitulates known demographic factors impacting 3A5 expression. The time-embedding network also effectively models the nonlinear dynamics of 2D6 expression, enabling a better fit to clinical data than prior work. In contrast, a standard neural network fails to model these features of 3A5 and 2D6 expression. Finally, we combine the time-embedding model of ontogeny with additional information to estimate age-dependent changes in reactive metabolite exposure. This simple approach identifies age-dependent changes in exposure to valproic acid and dextromethorphan metabolites and suggests potential mechanisms of valproic acid toxicity. This approach may help researchers evaluate the risk of drug toxicity in pediatric populations.
Subjects
Liver Neoplasms/metabolism , Neural Networks, Computer , Adolescent , Carboxylesterase/metabolism , Child , Child, Preschool , Cytochrome P-450 Enzyme System/metabolism , Glucuronosyltransferase/metabolism , Glutathione Transferase/metabolism , Humans , Inactivation, Metabolic , Infant , Oxygenases/metabolism , Principal Component Analysis , Sulfurtransferases/metabolism , Time Factors
ABSTRACT
PURPOSE: Following automated variant calling, manual review of aligned read sequences is required to identify a high-quality list of somatic variants. Despite widespread use in analyzing sequence data, methods to standardize manual review have not been described, resulting in high inter- and intralab variability. METHODS: This manual review standard operating procedure (SOP) consists of methods to annotate variants with four different calls and 19 tags. The calls indicate a reviewer's confidence in each variant and the tags indicate commonly observed sequencing patterns and artifacts that inform the manual review call. Four individuals were asked to classify variants prior to, and after, reading the SOP and accuracy was assessed by comparing reviewer calls with orthogonal validation sequencing. RESULTS: After reading the SOP, average accuracy in somatic variant identification increased by 16.7% (p value = 0.0298) and average interreviewer agreement increased by 12.7% (p value < 0.001). Manual review conducted after reading the SOP did not significantly increase reviewer time. CONCLUSION: This SOP supports and enhances manual somatic variant detection by improving reviewer accuracy while reducing the interreviewer variability for variant calling and annotation.
Subjects
High-Throughput Nucleotide Sequencing/standards , Mutation/genetics , Neoplasms/genetics , Software , Algorithms , Humans , Neoplasms/pathology , Polymorphism, Single Nucleotide/genetics , Sequence Alignment
ABSTRACT
Transplantable kidneys are in very limited supply. Accurate viability assessment prior to transplantation could minimize organ discard. Rapid and accurate evaluation of intra-operative donor kidney biopsies is essential for determining which kidneys are eligible for transplantation. The criterion for accepting or rejecting donor kidneys relies heavily on pathologist determination of the percent of glomeruli (determined from a frozen section) that are normal and sclerotic. This percentage is a critical measurement that correlates with transplant outcome. Inter- and intra-observer variability in donor biopsy evaluation is, however, significant. An automated method for determination of percent global glomerulosclerosis could prove useful in decreasing evaluation variability, increasing throughput, and easing the burden on pathologists. Here, we describe the development of a deep learning model that identifies and classifies non-sclerosed and sclerosed glomeruli in whole-slide images of donor kidney frozen section biopsies. This model extends a convolutional neural network (CNN) pre-trained on a large database of digital images. The extended model, when trained on just 48 whole slide images, exhibits slide-level evaluation performance on par with expert renal pathologists. Encouragingly, the model's performance is robust to slide preparation artifacts associated with frozen section preparation. The model substantially outperforms a model trained on image patches of isolated glomeruli, in terms of both accuracy and speed. The methodology overcomes the technical challenge of applying a pretrained CNN bottleneck model to whole-slide image classification. The traditional patch-based approach, while exhibiting deceptively good performance classifying isolated patches, does not translate successfully to whole-slide image segmentation in this setting. As the first model reported that identifies and classifies normal and sclerotic glomeruli in frozen kidney sections, and thus the first model reported in the literature relevant to kidney transplantation, it may become an essential part of donor kidney biopsy evaluation in the clinical setting.
Subjects
Deep Learning , Glomerulonephritis/diagnostic imaging , Image Interpretation, Computer-Assisted/methods , Kidney/diagnostic imaging , Transplants/diagnostic imaging , Algorithms , Frozen Sections , Humans , Kidney Transplantation
ABSTRACT
Scientists rely on high-throughput screening tools to identify promising small-molecule compounds for the development of biochemical probes and drugs. This study focuses on the identification of promiscuous bioactive compounds, which are compounds that appear active in many high-throughput screening experiments against diverse targets but are often false-positives which may not be easily developed into successful probes. These compounds can exhibit bioactivity due to nonspecific, intractable mechanisms of action and/or by interference with specific assay technology readouts. Such "frequent hitters" are now commonly identified using substructure filters, including pan assay interference compounds (PAINS). Herein, we show that mechanistic modeling of small-molecule reactivity using deep learning can improve upon PAINS filters when modeling promiscuous bioactivity in PubChem assays. Without training on high-throughput screening data, a deep learning model of small-molecule reactivity achieves a sensitivity and specificity of 18.5% and 95.5%, respectively, in identifying promiscuous bioactive compounds. This performance is similar to PAINS filters, which achieve a sensitivity of 20.3% at the same specificity. Importantly, such reactivity modeling is complementary to PAINS filters. When PAINS filters and reactivity models are combined, the resulting model outperforms either method alone, achieving a sensitivity of 24% at the same specificity. However, as a probabilistic model, the sensitivity and specificity of the deep learning model can be tuned by adjusting the threshold. Moreover, for a subset of PAINS filters, this reactivity model can help discriminate between promiscuous and nonpromiscuous bioactive compounds even among compounds matching those filters. Critically, the reactivity model provides mechanistic hypotheses for assay interference by predicting the precise atoms involved in compound reactivity. 
Overall, our analysis suggests that deep learning approaches to modeling promiscuous compound bioactivity may provide a complementary approach to current methods for identifying promiscuous compounds.
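The point in this abstract that a probabilistic model's sensitivity and specificity can be tuned by adjusting the threshold can be illustrated with a minimal sketch. The scores, labels, and the function name `sensitivity_specificity` are hypothetical and chosen for illustration only; they are not data or code from the study.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) when scores >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical predicted probabilities and true labels (1 = promiscuous compound).
scores = [0.95, 0.80, 0.40, 0.10, 0.70, 0.30, 0.20, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Raising the threshold trades sensitivity for specificity.
for threshold in (0.25, 0.50, 0.75):
    sens, spec = sensitivity_specificity(scores, labels, threshold)
    print(f"threshold={threshold:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

A fixed substructure filter yields a single operating point, whereas a scored model lets the user pick the point that suits the screening budget.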
Subjects
Drug Discovery/methods , High-Throughput Screening Assays/methods , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology , Animals , Computer Simulation , Databases, Factual , Enzyme Inhibitors/chemistry , Enzyme Inhibitors/pharmacology , Histone Acetyltransferases/antagonists & inhibitors , Histone Acetyltransferases/metabolism , Humans , Models, Biological , Neural Networks, Computer
ABSTRACT
A collection of new approaches to building and training neural networks, collectively referred to as deep learning, are attracting attention in theoretical chemistry. Several groups aim to replace computationally expensive ab initio quantum mechanics calculations with learned estimators. This raises questions about the representability of complex quantum chemical systems with neural networks. Can local-variable models efficiently approximate nonlocal quantum chemical features? Here, we find that convolutional architectures, those that only aggregate information locally, cannot efficiently represent aromaticity and conjugation in large systems. They cannot represent long-range nonlocality known to be important in quantum chemistry. This study uses aromatic and conjugated systems computed from molecule graphs, though reproducing quantum simulations is the ultimate goal. This task, by definition, is both computable and known to be important to chemistry. The failure of convolutional architectures on this focused task calls into question their use in modeling quantum mechanics. To remedy this heretofore unrecognized deficiency, we introduce a new architecture that propagates information back and forth in waves of nonlinear computation. This architecture is still a local-variable model, and it is both computationally and representationally efficient, processing molecules in sublinear time with far fewer parameters than convolutional networks. Wave-like propagation models aromatic and conjugated systems with high accuracy, and even models the impact of small structural changes on large molecules. This new architecture demonstrates that some nonlocal features of quantum chemistry can be efficiently represented in local variable models.
ABSTRACT
Follicular lymphoma (FL) is the most common form of indolent non-Hodgkin lymphoma, yet it remains only partially characterized at the genomic level. To improve our understanding of the genetic underpinnings of this incurable and clinically heterogeneous disease, whole-exome sequencing was performed on tumor/normal pairs from a discovery cohort of 24 patients with FL. Using these data and mutations identified in other B-cell malignancies, 1716 genes were sequenced in 113 FL tumor samples from 105 primarily treatment-naive individuals. We identified 39 genes that were mutated significantly above background mutation rates. CREBBP mutations were associated with inferior PFS. In contrast, mutations in previously unreported HVCN1, a voltage-gated proton channel-encoding gene and B-cell receptor signaling modulator, were associated with improved PFS. In total, 47 (44.8%) patients harbor mutations in the interconnected B-cell receptor (BCR) and CXCR4 signaling pathways. Histone gene mutations were more frequent than previously reported (identified in 43.8% of patients) and often co-occurred (17.1% of patients). A novel, recurrent hotspot was identified at a posttranslationally modified residue in the histone H2B family. This study expands the number of mutated genes described in several known signaling pathways and complexes involved in lymphoma pathogenesis (BCR, Notch, SWitch/sucrose nonfermentable (SWI/SNF), vacuolar ATPases) and identified novel recurrent mutations (EGR1/2, POU2AF1, BTK, ZNF608, HVCN1) that require further investigation in the context of FL biology, prognosis, and treatment.
Subjects
CREB-Binding Protein/genetics , Gene Expression Regulation, Neoplastic , Ion Channels/genetics , Lymphoma, Follicular/genetics , Receptors, Antigen, B-Cell/genetics , Signal Transduction/genetics , Adult , Agammaglobulinaemia Tyrosine Kinase , Aged , Aged, 80 and over , CREB-Binding Protein/metabolism , Disease-Free Survival , Early Growth Response Protein 1/genetics , Early Growth Response Protein 1/metabolism , Female , Gene Expression Profiling , Histones/genetics , Histones/metabolism , Humans , Ion Channels/metabolism , Lymphoma, Follicular/diagnosis , Lymphoma, Follicular/mortality , Lymphoma, Follicular/pathology , Male , Middle Aged , Mutation , Protein-Tyrosine Kinases/genetics , Protein-Tyrosine Kinases/metabolism , Receptors, Antigen, B-Cell/metabolism , Receptors, CXCR4/genetics , Receptors, CXCR4/metabolism , Receptors, Notch/genetics , Receptors, Notch/metabolism , Repressor Proteins/genetics , Repressor Proteins/metabolism , Trans-Activators/genetics , Trans-Activators/metabolism , Vacuolar Proton-Translocating ATPases/genetics , Vacuolar Proton-Translocating ATPases/metabolism
ABSTRACT
Many scientific questions are best approached by sharing data--collected by different groups or across large collaborative networks--into a combined analysis. Unfortunately, some of the most interesting and powerful datasets--like health records, genetic data, and drug discovery data--cannot be freely shared because they contain sensitive information. In many situations, knowing if private datasets overlap determines if it is worthwhile to navigate the institutional, ethical, and legal barriers that govern access to sensitive, private data. We report the first method of publicly measuring the overlap between private datasets that is secure under a malicious model without relying on private protocols or message passing. This method uses a publicly shareable summary of a dataset's contents, its cryptoset, to estimate its overlap with other datasets. Cryptosets approach "information-theoretic" security, the strongest type of security possible in cryptography, which is not even crackable with infinite computing power. We empirically and theoretically assess both the accuracy of these estimates and the security of the approach, demonstrating that cryptosets are informative, with a stable accuracy, and secure.
Subjects
Computer Security , Information Dissemination , Algorithms , Electronic Health Records , Humans , Models, Theoretical
ABSTRACT
Cytochrome P450 enzymes (P450s) are metabolic enzymes that process the majority of FDA-approved, small-molecule drugs. Understanding how these enzymes modify molecule structure is key to the development of safe, effective drugs. XenoSite server is an online implementation of XenoSite, a recently published computational model for P450 metabolism. XenoSite predicts which atomic sites of a molecule--sites of metabolism (SOMs)--are modified by P450s. XenoSite server accepts input in common chemical file formats including SDF and SMILES and provides tools for visualizing the likelihood that each atomic site is a site of metabolism for a variety of important P450s, as well as a flat file download of SOM predictions. AVAILABILITY AND IMPLEMENTATION: XenoSite server is available at http://swami.wustl.edu/xenosite.
Subjects
Computational Biology/methods , Cytochrome P-450 Enzyme System/metabolism , Dibenzothiepins/metabolism , Internet , Metabolic Networks and Pathways , Xenobiotics/metabolism , Antipsychotic Agents/metabolism , Cytochrome P-450 Enzyme System/chemistry , Humans , Molecular Docking Simulation , Neural Networks, Computer , Probability , Small Molecule Libraries/metabolism
ABSTRACT
ProteomeScout (https://proteomescout.wustl.edu) is a resource for the study of proteins and their post-translational modifications (PTMs) consisting of a database of PTMs, a repository for experimental data, an analysis suite for PTM experiments, and a tool for visualizing the relationships between complex protein annotations. The PTM database is a compendium of public PTM data, coupled with user-uploaded experimental data. ProteomeScout provides analysis tools for experimental datasets, including summary views and subset selection, which can identify relationships within subsets of data by testing for statistically significant enrichment of protein annotations. Protein annotations are incorporated in the ProteomeScout database from external resources and include terms such as Gene Ontology annotations, domains, secondary structure and non-synonymous polymorphisms. These annotations are available in the database download, in the analysis tools and in the protein viewer. The protein viewer allows for the simultaneous visualization of annotations in an interactive web graphic, which can be exported in Scalable Vector Graphics (SVG) format. Finally, quantitative data measurements associated with public experiments are also easily viewable within protein records, allowing researchers to see how PTMs change across different contexts. ProteomeScout should prove useful for protein researchers and should benefit the proteomics community by providing a stable repository for PTM experiments.
Subjects
Databases, Protein , Protein Processing, Post-Translational , Internet , Molecular Sequence Annotation , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Proteomics
ABSTRACT
Small-molecule screens are an integral part of drug discovery. Public domain data in PubChem alone represent more than 158 million measurements, 1.2 million molecules, and 4300 assays. We conducted a global analysis of these data, building a network of assays and connecting the assays if they shared nonpromiscuous active molecules. This network spans both phenotypic and target-based screens, recapitulates known biology, and identifies new polypharmacology. Phenotypic screens are extremely important for drug discovery, contributing to the discovery of a large proportion of new drugs. Connections between phenotypic and biochemical, target-based screens can suggest strategies for repurposing both small-molecule and biologic drugs. For example, a screen for molecules that prevent cell death from a mutated version of superoxide-dismutase is linked with ALOX15. This connection suggests a therapeutic role for ALOX15 inhibitors in amyotrophic lateral sclerosis. An interactive version of the network is available online (http://swami.wustl.edu/flow/assay_network.html).
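The network construction summarized in this abstract, connecting assays that share nonpromiscuous active molecules, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the identifiers, and the promiscuity cutoff (`max_assays_per_molecule`) are all assumptions, since the abstract does not specify how promiscuity was defined.

```python
from collections import defaultdict
from itertools import combinations

def build_assay_network(actives_by_assay, max_assays_per_molecule=2):
    """Link assays that share at least one nonpromiscuous active molecule.

    A molecule is treated as promiscuous (and ignored) if it is active in more
    than `max_assays_per_molecule` assays; this cutoff is an assumption.
    """
    # Invert the mapping: molecule -> set of assays in which it is active.
    assays_by_molecule = defaultdict(set)
    for assay, molecules in actives_by_assay.items():
        for molecule in molecules:
            assays_by_molecule[molecule].add(assay)

    edges = defaultdict(set)  # (assay_a, assay_b) -> shared molecules
    for molecule, assays in assays_by_molecule.items():
        if len(assays) > max_assays_per_molecule:
            continue  # skip promiscuous actives
        for a, b in combinations(sorted(assays), 2):
            edges[(a, b)].add(molecule)
    return dict(edges)

# Hypothetical assay and molecule identifiers.
actives = {
    "assay_A": {"m1", "m2", "m9"},
    "assay_B": {"m2", "m9"},
    "assay_C": {"m3", "m9"},
}
network = build_assay_network(actives)
print(network)
```

In this toy input, `m9` is active in all three assays and is dropped as promiscuous, so the only edge comes from `m2`, which links `assay_A` and `assay_B`.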
Subjects
Amyotrophic Lateral Sclerosis/drug therapy , Biological Assay/methods , Drug Discovery , High-Throughput Screening Assays/methods , Algorithms , Amyotrophic Lateral Sclerosis/genetics , Amyotrophic Lateral Sclerosis/metabolism , Arachidonate 15-Lipoxygenase/chemistry , Arachidonate 15-Lipoxygenase/genetics , Area Under Curve , Humans , Lipoxygenase Inhibitors/chemistry , Models, Statistical , Mutation , Phenotype , ROC Curve
ABSTRACT
In this study, we propose a new, secure method of sharing useful chemical information from small-molecule libraries, without revealing the structures of the libraries' molecules. Our method shares the relationship between molecules rather than structural descriptors. This is an important advance because, over the past few years, several groups have developed and published new methods of analyzing small-molecule screening data. These methods include advanced hit-picking protocols, promiscuous active filters, economic optimization algorithms, and screening visualizations, which can identify patterns in the data that might otherwise be overlooked. Application of these methods to private data requires finding strategies for sharing useful chemical data without revealing chemical structures. This problem has been examined in the context of ADME prediction models, with results from information theory suggesting it is impossible to share useful chemical information without revealing structures. In contrast, we present a new strategy for encoding the relationships between molecules instead of their structures, based on anonymized scaffold networks and trees, that safely shares enough chemical information to be useful in analyzing chemical data, while also sufficiently blinding structures from discovery. We present the details of this encoding, an analysis of the usefulness of the information it conveys, and the security of the structures it encodes. This approach makes it possible to share data across institutions, and may securely enable collaborative analysis that can yield insight into both specific projects and screening technology as a whole.
Subjects
Databases, Chemical , Information Dissemination , Models, Chemical , Computational Biology , Computer Security , Cooperative Behavior , Information Theory , Molecular Structure
ABSTRACT
Understanding how xenobiotic molecules are metabolized is important because it influences the safety, efficacy, and dose of medicines and how they can be modified to improve these properties. The cytochrome P450s (CYPs) are proteins responsible for metabolizing 90% of drugs on the market, and many computational methods can predict which atomic sites of a molecule--sites of metabolism (SOMs)--are modified during CYP-mediated metabolism. This study improves on prior methods of predicting CYP-mediated SOMs by using new descriptors and machine learning based on neural networks. The new method, XenoSite, is faster to train and more accurate by as much as 4% or 5% for some isozymes. Furthermore, some "incorrect" predictions made by XenoSite were subsequently validated as correct predictions by reevaluation of the source literature. Moreover, XenoSite output is interpretable as a probability, which reflects both the confidence of the model that a particular atom is metabolized and the statistical likelihood that its prediction for that atom is correct.
Subjects
Cytochrome P-450 Enzyme System/chemistry , Molecular Docking Simulation , Neural Networks, Computer , Small Molecule Libraries/chemistry , Biotransformation , Catalytic Domain , Cytochrome P-450 Enzyme System/metabolism , Humans , Isoenzymes/chemistry , Isoenzymes/metabolism , Ligands , Probability , Protein Binding , Small Molecule Libraries/metabolism , Structure-Activity Relationship , Substrate Specificity , Thermodynamics
ABSTRACT
Mercury is present in many industrial processes at low concentrations and is a cause for concern due to the propensity for mercury to bioaccumulate. As a cumulative toxin, introduction of mercury into the environment at any level has the potential to adversely affect ecologic systems. To date, no commercial precipitants are available that can irreversibly and permanently bind mercury. In the current work, selected commercial reagents were compared alongside the dianionic ligand 1,3-benzenediamidoethanethiolate (BDET²⁻) to test the feasibility of low-level (parts-per-billion, ppb) mercury treatment for groundwater near a chloralkali plant. Of all the reagents examined, only K₂BDET was capable of reducing mercury concentrations to below instrumental detection limits of 0.05 ppb with the added benefit of producing a stable precipitate.