ABSTRACT
Although gene discovery in neuropsychiatric disorders, including autism spectrum disorder, intellectual disability, epilepsy, schizophrenia, and Tourette disorder, has accelerated, resulting in a large number of molecular clues, it has proven difficult to generate specific hypotheses without the corresponding datasets at the protein complex and functional pathway level. Here, we describe one path forward: an initiative aimed at mapping the physical and genetic interaction networks of these conditions and then using these maps to connect the genomic data to neurobiology and, ultimately, the clinic. These efforts will bring together geneticists, structural biologists, neurobiologists, systems biologists, and clinicians, leveraging a wide array of experimental approaches and creating the collaborative infrastructure necessary for long-term investigation. This initiative will ultimately intersect with parallel studies that focus on other diseases, as there is a significant overlap with genes implicated in cancer, infectious disease, and congenital heart defects.
Subjects
Chromosome Mapping/methods , Neurodevelopmental Disorders/genetics , Systems Biology/methods , Gene Regulatory Networks/genetics , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Genomics/methods , Humans , Neurobiology/methods , Neuropsychiatry
ABSTRACT
Make-on-demand chemical libraries have drastically increased the reach of molecular docking, with the enumerated ready-to-dock ZINC-22 library approaching 6.4 billion molecules (July 2024). While ever-growing libraries result in better-scoring molecules, the computational resources required to dock all of ZINC-22 make this endeavor infeasible for most. Here, we organize and traverse chemical space with hierarchical navigable small-world graphs, a method we term retrieval augmented docking (RAD). RAD recovers most virtual actives, despite docking only a fraction of the library. Furthermore, RAD is protein-agnostic, supporting additional docking campaigns without additional computational overhead. In depth, we assess RAD on published large-scale docking campaigns against D4 and AmpC spanning 99.5 million and 138 million molecules, respectively. RAD recovers 95% of DOCK virtual actives for both targets after evaluating only 10% of the libraries. In breadth, RAD shows widespread applicability against 43 DUDE-Z proteins, evaluating 50.3 million associations. On average, RAD recovers 87% of virtual actives while docking 10% of the library without sacrificing chemical diversity.
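The traversal idea can be illustrated with a minimal Python sketch. It rests on several simplifying assumptions. A single-layer neighbor graph (`neighbors`) stands in for the hierarchical HNSW index. `dock_score` is a hypothetical callable standing in for a docking run, where lower scores are better. The expansion rule, which always expands the best-scoring molecule docked so far, is illustrative and is not the published RAD procedure.

```python
import heapq
from typing import Callable, Dict, List, Set, Tuple

def traverse_graph(
    neighbors: Dict[int, List[int]],
    dock_score: Callable[[int], float],
    seeds: List[int],
    budget: int,
) -> Dict[int, float]:
    """Best-first traversal: dock the seeds, then repeatedly expand the
    neighbors of the best-scoring docked molecule until `budget`
    molecules have been docked. Lower docking scores are better."""
    scores: Dict[int, float] = {}
    frontier: List[Tuple[float, int]] = []  # min-heap of (score, molecule id)
    seen: Set[int] = set()

    def dock(mol: int) -> None:
        scores[mol] = dock_score(mol)
        heapq.heappush(frontier, (scores[mol], mol))

    for s in seeds:
        if len(scores) >= budget:
            break
        if s not in seen:
            seen.add(s)
            dock(s)

    while frontier and len(scores) < budget:
        _, best = heapq.heappop(frontier)  # most promising docked molecule
        for nb in neighbors.get(best, []):
            if nb in seen:
                continue
            seen.add(nb)
            dock(nb)
            if len(scores) >= budget:
                break
    return scores
```

The graph is built from ligand structures alone. The same `neighbors` map can therefore be reused for a new target by supplying a different `dock_score`, which mirrors the protein-agnostic property described above.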
Subjects
Molecular Docking Simulation , Small Molecule Libraries , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology , Small Molecule Libraries/metabolism , Proteins/chemistry , Proteins/metabolism
ABSTRACT
Message passing neural networks (MPNNs) on molecular graphs generate continuous and differentiable encodings of small molecules with state-of-the-art performance on protein-ligand complex scoring tasks. Here, we describe the proximity graph network (PGN) package, an open-source toolkit that constructs ligand-receptor graphs based on atom proximity and allows users to rapidly apply and evaluate MPNN architectures for a broad range of tasks. We demonstrate the utility of PGN by introducing benchmarks for affinity and docking score prediction tasks. Graph networks generalize better than fingerprint-based models and perform strongly for the docking score prediction task. Overall, MPNNs with proximity graph data structures augment the prediction of ligand-receptor complex properties when ligand-receptor data are available.
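The sketch below builds a proximity graph and then runs one round of message passing. Three choices in it are assumptions made for illustration: the 4 Å default cutoff, dropping receptor-receptor edges, and plain sum aggregation. The sketch does not use or reproduce the PGN package API.

```python
import math
from typing import List, Tuple

# (source, xyz coordinates, feature vector); source is "ligand" or "receptor"
Atom = Tuple[str, Tuple[float, float, float], List[float]]

def proximity_graph(atoms: List[Atom], cutoff: float = 4.0) -> List[Tuple[int, int]]:
    """Undirected edges between atoms within `cutoff` angstroms.
    Receptor-receptor pairs are skipped (an assumption) so edges
    describe the ligand and its contacts with the receptor."""
    edges = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            src_i, xyz_i, _ = atoms[i]
            src_j, xyz_j, _ = atoms[j]
            if src_i == "receptor" and src_j == "receptor":
                continue
            if math.dist(xyz_i, xyz_j) <= cutoff:
                edges.append((i, j))
    return edges

def message_pass(atoms: List[Atom], edges: List[Tuple[int, int]]) -> List[List[float]]:
    """One round of sum aggregation: h_i' = h_i + sum of h_j over neighbors j."""
    h = [list(feat) for _, _, feat in atoms]
    out = [list(v) for v in h]
    for i, j in edges:
        for k in range(len(h[i])):
            out[i][k] += h[j][k]
            out[j][k] += h[i][k]
    return out
```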
Subjects
Neural Networks, Computer , Proteins , Ligands , Proteins/chemistry , Proteins/metabolism , Molecular Docking Simulation , Protein Binding
ABSTRACT
Machine learning-based drug discovery success depends on molecular representation. Yet traditional molecular fingerprints omit both the protein and pointers back to structural information that would enable better model interpretability. Therefore, we propose LUNA, a Python 3 toolkit that calculates and encodes protein-ligand interactions into new hashed fingerprints inspired by Extended Connectivity FingerPrint (ECFP): EIFP (Extended Interaction FingerPrint), FIFP (Functional Interaction FingerPrint), and Hybrid Interaction FingerPrint (HIFP). LUNA also provides visual strategies to make the fingerprints interpretable. We performed three major experiments exploring the fingerprints' use. First, we trained machine learning models to reproduce DOCK3.7 scores using 1 million docked dopamine D4 complexes. We found that EIFP-4,096 (R² = 0.61) outperformed related molecular and interaction fingerprints. Second, we used LUNA to support interpretable machine learning models. Finally, we demonstrate that interaction fingerprints can accurately identify similarities across molecular complexes that other fingerprints overlook. Hence, we envision LUNA and its interface fingerprints as promising methods for machine learning-based virtual screening campaigns. LUNA is freely available at https://github.com/keiserlab/LUNA.
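The hashing-and-folding step behind such fingerprints can be sketched as follows. Several details are assumptions: the interaction tuples, the use of SHA-1 for a run-stable hash, and the 4,096-bit length (chosen to echo EIFP-4,096). This is not LUNA's API. It also omits the ECFP-style iterative neighborhood expansion that inspired EIFP.

```python
import hashlib
from typing import Iterable, Set, Tuple

# Hypothetical interaction identifier: (interaction type, ligand atom type, receptor atom type)
Interaction = Tuple[str, str, str]

def _stable_hash(parts: Tuple[str, ...]) -> int:
    # SHA-1 gives the same value across runs, unlike Python's salted hash().
    digest = hashlib.sha1("|".join(parts).encode()).digest()
    return int.from_bytes(digest[:4], "little")

def interaction_fingerprint(interactions: Iterable[Interaction], n_bits: int = 4096) -> Set[int]:
    """Fold hashed interaction identifiers into a sparse set of on-bits in [0, n_bits)."""
    return {_stable_hash(inter) % n_bits for inter in interactions}
```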
Subjects
Dopamine , Proteins , Drug Discovery/methods , Ligands , Machine Learning , Proteins/chemistry
ABSTRACT
Multitask deep neural networks learn to predict ligand-target binding by example, yet public pharmacological data sets are sparse, imbalanced, and approximate. We constructed two hold-out benchmarks to approximate temporal and drug-screening test scenarios, whose characteristics differ from a random split of conventional training data sets. We developed a pharmacological data set augmentation procedure, Stochastic Negative Addition (SNA), which randomly assigns untested molecule-target pairs as transient negative examples during training. Under the SNA procedure, drug-screening benchmark performance increases from R² = 0.1926 ± 0.0186 to 0.4269 ± 0.0272 (122%). This gain was accompanied by a modest decrease in the temporal benchmark (13%). The SNA gains in drug-screening performance were consistent for classification and regression tasks and outperformed y-randomized controls. Our results highlight where data and feature uncertainty may be problematic and how incorporating uncertainty into training improves predictions of drug-target relationships.
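The augmentation step can be sketched minimally, assuming each known measurement is keyed by a (molecule, target) pair. The following are assumptions, not the authors' implementation:

- the function name;
- the `negative_value` label (0.0 suits a classification label; a regression task would need a low-affinity value of your choosing);
- the per-epoch resampling schedule.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Pair = Tuple[str, str]  # (molecule id, target id)

def sna_examples(
    observed: Dict[Pair, float],
    molecules: List[str],
    targets: List[str],
    n_negatives: int,
    negative_value: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[Tuple[Pair, float]]:
    """Return the observed examples plus randomly drawn untested pairs
    labeled `negative_value`. Calling this once per epoch keeps the added
    negatives transient: a pair is negative only for the epoch that drew it."""
    rng = rng or random.Random()
    # Never ask for more negatives than there are untested pairs.
    n_untested = len(molecules) * len(targets) - len(observed)
    n_negatives = min(n_negatives, n_untested)
    chosen: List[Pair] = []
    seen: Set[Pair] = set()
    while len(chosen) < n_negatives:
        pair = (rng.choice(molecules), rng.choice(targets))
        if pair not in observed and pair not in seen:
            seen.add(pair)
            chosen.append(pair)
    return list(observed.items()) + [(p, negative_value) for p in chosen]
```

Call `sna_examples` once per epoch with a shared `random.Random` instance. Each call draws a fresh set of negatives, so no untested pair is permanently labeled inactive.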
Subjects
Machine Learning , Neural Networks, Computer
ABSTRACT
Many psychiatric drugs act on multiple targets and therefore require screening assays that encompass a wide target space. With sufficiently rich phenotyping and a large sampling of compounds, it should be possible to identify compounds with desired mechanisms of action on the basis of behavioral profiles alone. Although zebrafish (Danio rerio) behavior has been used to rapidly identify neuroactive compounds, it is not clear what types of behavioral assays would be necessary to identify multitarget compounds such as antipsychotics. Here we developed a battery of behavioral assays in larval zebrafish to determine whether behavioral profiles can provide sufficient phenotypic resolution to identify and classify psychiatric drugs. Using the antipsychotic drug haloperidol as a test case, we found that behavioral profiles of haloperidol-treated zebrafish could be used to identify previously uncharacterized compounds with desired antipsychotic-like activities and multitarget mechanisms of action.
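One simple way to screen against a reference profile is to rank compounds by their correlation with it. In this sketch, each profile is a hypothetical vector of per-assay behavioral readouts, and Pearson correlation is an assumed similarity measure. The study's actual profiling and classification procedure may differ.

```python
import math
from typing import Dict, List, Tuple

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def rank_by_similarity(reference: List[float], profiles: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Sort compounds by how closely their behavioral profile tracks the reference."""
    scored = [(name, pearson(reference, prof)) for name, prof in profiles.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)
```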
Subjects
Antipsychotic Agents/analysis , Antipsychotic Agents/pharmacology , Behavior, Animal/drug effects , Zebrafish , Animals , Antipsychotic Agents/chemistry , Larva/drug effects , Mice , Molecular Structure , Zebrafish/growth & development
ABSTRACT
Although 400 million distinct compounds are now purchasable within a few weeks, the biological activities of most are unknown. To facilitate access to new chemistry for biology, we have combined the Similarity Ensemble Approach (SEA) with the maximum Tanimoto similarity to the nearest bioactive to predict activity for every commercially available molecule in ZINC. This method, which we label SEA+TC, outperforms both SEA and a naïve Bayes classifier in 5-fold cross-validation on ChEMBL's bioactivity data set (version 21). Using this method, predictions for over 40% of compounds (>160 million) have high significance (pSEA ≥ 40), high similarity (ECFP4MaxTc ≥ 0.4), or both, for one or more of 1382 targets well described by ligands in the literature. Using a further 1347 less-well-described targets, we predict activities for an additional 11 million compounds. To gauge whether these predictions are sensible, we investigate 75 predictions for 50 drugs lacking a binding affinity annotation in ChEMBL. The 535 million predictions for over 171 million compounds at 2629 targets are linked to purchasing information and evidence to support each prediction and are freely available via https://zinc15.docking.org and https://files.docking.org.
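The either-or rule stated above can be written down directly. This sketch assumes that fingerprints are given as sets of on-bit indices and that pSEA has already been computed by SEA (not shown). `is_confident` and its parameter names are hypothetical.

```python
from typing import Iterable, Set

def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two on-bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def max_tc(query: Set[int], ligands: Iterable[Set[int]]) -> float:
    """Similarity to the nearest known ligand of a target (0.0 if none)."""
    return max((tanimoto(query, lig) for lig in ligands), default=0.0)

def is_confident(p_sea: float, query: Set[int], ligands: Iterable[Set[int]],
                 p_sea_cutoff: float = 40.0, tc_cutoff: float = 0.4) -> bool:
    """High significance OR high similarity, using the cutoffs quoted in the abstract."""
    return p_sea >= p_sea_cutoff or max_tc(query, ligands) >= tc_cutoff
```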
Subjects
Drug Discovery/methods , Bayes Theorem , Gene Expression Profiling , Ligands , Quantitative Structure-Activity Relationship , Reproducibility of Results , Software , User-Computer Interface
ABSTRACT
Discovering the unintended 'off-targets' that predict adverse drug reactions is daunting by empirical methods alone. Drugs can act on several protein targets, some of which can be unrelated by conventional molecular metrics, and hundreds of proteins have been implicated in side effects. Here we use a computational strategy to predict the activity of 656 marketed drugs on 73 unintended 'side-effect' targets. Approximately half of the predictions were confirmed, either from proprietary databases unknown to the method or by new experimental assays. Affinities for these new off-targets ranged from 1 nM to 30 µM. To explore relevance, we developed an association metric to prioritize those new off-targets that explained side effects better than any known target of a given drug, creating a drug-target-adverse drug reaction network. Among these new associations was the prediction that the abdominal pain side effect of the synthetic oestrogen chlorotrianisene was mediated through its newly discovered inhibition of the enzyme cyclooxygenase-1. The clinical relevance of this inhibition was borne out in whole human blood platelet aggregation assays. This approach may have wide application to de-risking toxicological liabilities in drug discovery.
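The abstract does not specify the association metric. The sketch below therefore substitutes a standard one-sided hypergeometric enrichment test, to show how a target-side-effect association can be scored from drug counts. It is a swapped-in technique, not the authors' metric, and the function name is hypothetical.

```python
from math import comb

def enrichment_p(n_drugs: int, n_with_adr: int, n_hitting_target: int, n_both: int) -> float:
    """One-sided hypergeometric P(X >= n_both).

    The probability that at least n_both of the drugs hitting a target
    also carry the adverse reaction, if target activity and the reaction
    were unrelated."""
    total = comb(n_drugs, n_hitting_target)
    upper = min(n_with_adr, n_hitting_target)
    tail = sum(
        comb(n_with_adr, k) * comb(n_drugs - n_with_adr, n_hitting_target - k)
        for k in range(n_both, upper + 1)
    )
    return tail / total
```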
Subjects
Drug Evaluation, Preclinical/methods , Drug-Related Side Effects and Adverse Reactions , Toxicity Tests/methods , Blood Platelets/drug effects , Chlorotrianisene/adverse effects , Chlorotrianisene/chemistry , Chlorotrianisene/pharmacology , Cyclooxygenase 1/metabolism , Cyclooxygenase Inhibitors/adverse effects , Cyclooxygenase Inhibitors/pharmacology , Databases, Factual , Estrogens, Non-Steroidal/adverse effects , Estrogens, Non-Steroidal/pharmacology , Forecasting , Humans , Models, Biological , Molecular Targeted Therapy/adverse effects , Platelet Aggregation/drug effects , Reproducibility of Results , Substrate Specificity
ABSTRACT
Phenotypic screens can identify molecules that are at once penetrant and active on the integrated circuitry of a whole cell or organism. These advantages are offset by the need to identify the targets underlying the phenotypes. Additionally, logistical considerations limit screening for certain physiological and behavioral phenotypes to organisms such as zebrafish and C. elegans. This further raises the challenge of elucidating whether compound-target relationships found in model organisms are preserved in humans. To address these challenges, we searched for compounds that affect feeding behavior in C. elegans and sought to identify their molecular mechanisms of action. Here, we applied predictive chemoinformatics to small molecules previously identified in a C. elegans phenotypic screen likely to be enriched for feeding-regulatory compounds. Based on the predictions, 16 of these compounds were tested in vitro against 20 mammalian targets. Of these, nine were active, with affinities ranging from 9 nM to 10 µM. Four of these nine compounds were found to alter feeding. We then verified the in vitro findings in vivo through genetic knockdowns, the use of previously characterized compounds with high affinity for the four targets, and chemical genetic epistasis, which is the effect of combined chemical and genetic perturbations on a phenotype relative to that of each perturbation in isolation. Our findings reveal four previously unrecognized pathways that regulate feeding in C. elegans with strong parallels in mammals. Together, our study addresses three inherent challenges in phenotypic screening: the identification of the molecular targets from a phenotypic screen, the confirmation of the in vivo relevance of these targets, and the evolutionary conservation and relevance of these targets to their human orthologs.
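Chemical genetic epistasis is often quantified under a multiplicative null model. If a compound and a gene knockdown act independently, the combined phenotype should equal the product of the single-perturbation phenotypes. The sketch below assumes that model, with phenotypes normalized to untreated wild type (1.0). The study may have used a different scoring scheme.

```python
def epistasis(w_chem: float, w_gene: float, w_both: float) -> float:
    """Multiplicative-model epistasis score.

    Each w is a phenotype (e.g. feeding rate) relative to untreated wild type (1.0).
    A score near 0 means the two perturbations act independently; a positive
    score means the combination is milder than expected, as when the compound
    and the knocked-down gene act on the same pathway."""
    return w_both - w_chem * w_gene
```

Consider a compound and a knockdown that each reduce feeding to 0.5 of wild type. If together they also give 0.5 rather than 0.25, the positive score (0.25) is consistent with both perturbations acting on the same pathway.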
Subjects
Caenorhabditis elegans/drug effects , Feeding Behavior/drug effects , Animals , Caenorhabditis elegans/physiology , Caenorhabditis elegans Proteins/antagonists & inhibitors , Caenorhabditis elegans Proteins/metabolism , Computer Simulation , Drug Evaluation, Preclinical , Humans , Peristalsis/drug effects , Pharynx/drug effects , Phenotype , Quinolines/pharmacology , Receptors, G-Protein-Coupled/antagonists & inhibitors , Receptors, G-Protein-Coupled/metabolism , Small Molecule Libraries
ABSTRACT
Although drugs are intended to be selective, at least some bind to several physiological targets, explaining side effects and efficacy. Because many drug-target combinations exist, it would be useful to explore possible interactions computationally. Here we compared 3,665 US Food and Drug Administration (FDA)-approved and investigational drugs against hundreds of targets, defining each target by its ligands. Chemical similarities between drugs and ligand sets predicted thousands of unanticipated associations. Thirty were tested experimentally, including the antagonism of the β1 receptor by the transporter inhibitor Prozac, the inhibition of the 5-hydroxytryptamine (5-HT) transporter by the ion channel drug Vadilex, and antagonism of the histamine H4 receptor by the enzyme inhibitor Rescriptor. Overall, 23 new drug-target associations were confirmed, five of which were potent (<100 nM). The physiological relevance of one, the action of the drug N,N-dimethyltryptamine (DMT) on serotonergic receptors, was confirmed in a knockout mouse. The chemical similarity approach is systematic and comprehensive, and may suggest side-effects and new indications for many drugs.
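Because each target is defined by its ligand set, a drug can be compared to a target through summed pairwise similarity. The sketch computes an un-normalized, SEA-style raw score: the sum of the Tanimoto coefficients that reach a threshold, taken over all cross-set pairs. Two things are assumptions: fingerprints represented as sets of on-bit indices, and a caller-supplied `threshold`. SEA additionally converts this raw score into a z-score and E-value against a background of random ligand sets; that step is omitted here.

```python
from itertools import product
from typing import List, Set

def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two on-bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def raw_set_score(set_a: List[Set[int]], set_b: List[Set[int]], threshold: float) -> float:
    """Sum of pairwise Tanimoto values that reach `threshold`, over all
    (a, b) pairs drawn from the two sets; a single drug is a set of one."""
    total = 0.0
    for a, b in product(set_a, set_b):
        tc = tanimoto(a, b)
        if tc >= threshold:
            total += tc
    return total
```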
Subjects
Drug Evaluation, Preclinical/methods , Pharmaceutical Preparations/metabolism , Substrate Specificity , Animals , Computational Biology , Databases, Factual , Drug-Related Side Effects and Adverse Reactions , Humans , Ligands , Mice , Mice, Knockout , Off-Label Use , Receptors, Serotonin/metabolism , United States , United States Food and Drug Administration
ABSTRACT
Metformin, an established first-line treatment for patients with type 2 diabetes, has been associated with gastrointestinal (GI) adverse effects that limit its use. Histamine and serotonin have potent effects on the GI tract. The effects of metformin on histamine and serotonin uptake were evaluated in cell lines overexpressing several amine transporters (OCT1, OCT3 and SERT). Metformin inhibited histamine and serotonin uptake by OCT1, OCT3 and SERT in a dose-dependent manner, with OCT1-mediated amine uptake being most potently inhibited (IC50 = 1.5 mM). A chemoinformatics-based method known as Similarity Ensemble Approach predicted diamine oxidase (DAO) as an additional intestinal target of metformin, with an E-value of 7.4 × 10⁻⁵. Inhibition of DAO was experimentally validated using a spectrophotometric assay with putrescine as the substrate. The Ki of metformin for DAO was measured to be 8.6 ± 3.1 mM. In this study, we found that metformin inhibited intestinal amine transporters and DAO at concentrations that may be achieved in the intestine after therapeutic doses. Further studies are warranted to determine the relevance of these interactions to the adverse effects of metformin on the gastrointestinal tract.
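A Hill-type dose-response curve shows what the reported IC50 implies at other concentrations. This sketch uses the 1.5 mM OCT1 IC50 from the text and assumes a Hill coefficient of 1, which the abstract does not report. The function name is hypothetical.

```python
def fraction_inhibited(conc_mM: float, ic50_mM: float = 1.5, hill: float = 1.0) -> float:
    """Fractional inhibition from a Hill-type dose-response curve:
    1 / (1 + (IC50 / [I]) ** hill), with 0 at zero inhibitor."""
    if conc_mM <= 0:
        return 0.0
    return 1.0 / (1.0 + (ic50_mM / conc_mM) ** hill)
```

Under these assumptions, OCT1-mediated uptake is half inhibited at 1.5 mM and about 90% inhibited at 13.5 mM.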
Subjects
Membrane Transport Proteins/metabolism , Metformin/metabolism , Amine Oxidase (Copper-Containing)/metabolism , Biological Transport/physiology , Cell Line , Diabetes Mellitus, Type 2/metabolism , HEK293 Cells , Humans , Intestinal Mucosa/metabolism , Kinetics , Octamer Transcription Factor-3/metabolism , Organic Cation Transporter 1/metabolism , Serotonin Plasma Membrane Transport Proteins/metabolism
ABSTRACT
Chemical probes interrogate disease mechanisms at the molecular level by linking genetic changes to observable traits. However, comprehensive chemical screens in diverse biological models are impractical. To address this challenge, we develop ChemProbe, a model that predicts cellular sensitivity to hundreds of molecular probes and drugs by learning to combine transcriptomes and chemical structures. Using ChemProbe, we infer the chemical sensitivity of cancer cell lines and tumor samples and analyze how the model makes predictions. We retrospectively evaluate drug response predictions for precision breast cancer treatment and prospectively validate chemical sensitivity predictions in new cellular models, including a genetically modified cell line. Our model interpretation analysis identifies transcriptome features reflecting compound targets and protein network modules, identifying genes that drive ferroptosis. ChemProbe is an interpretable in silico screening tool that allows researchers to measure cellular response to diverse compounds, facilitating research into molecular mechanisms of chemical sensitivity.
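One way a model can learn to combine a transcriptome with a chemical structure is feature-wise linear modulation (FiLM). In this scheme, the compound supplies a per-gene scale and shift for the expression profile. The sketch uses FiLM purely as an illustration; it is not asserted to be ChemProbe's architecture. All weights and names are hypothetical; in practice the weights would be learned.

```python
import math
from typing import List, Tuple

Vector = List[float]

def film_params(fp: List[int], w_gamma: List[Vector], w_beta: List[Vector]) -> Tuple[Vector, Vector]:
    """Map a compound's binary fingerprint to per-gene scale (gamma) and shift (beta).
    Each row of w_gamma / w_beta holds the weights for one gene."""
    gamma = [1.0 + sum(w * f for w, f in zip(row, fp)) for row in w_gamma]
    beta = [sum(w * f for w, f in zip(row, fp)) for row in w_beta]
    return gamma, beta

def predict_sensitivity(expr: Vector, fp: List[int], w_gamma: List[Vector], w_beta: List[Vector],
                        w_out: Vector, b_out: float) -> float:
    """Modulate the expression profile with compound-specific (gamma, beta),
    then apply a linear read-out and a sigmoid to get a score in (0, 1)."""
    gamma, beta = film_params(fp, w_gamma, w_beta)
    modulated = [g * x + b for x, g, b in zip(expr, gamma, beta)]
    logit = sum(w * m for w, m in zip(w_out, modulated)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))
```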
Subjects
Transcriptome , Humans , Cell Line, Tumor , Antineoplastic Agents/pharmacology , Breast Neoplasms/genetics , Breast Neoplasms/drug therapy , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , Ferroptosis/drug effects , Ferroptosis/genetics , Female , Machine Learning , Gene Expression Regulation, Neoplastic/drug effects , Computer Simulation
ABSTRACT
Quantitatively mapping enzyme sequence-catalysis landscapes remains a critical challenge in understanding enzyme function, evolution, and design. Here, we expand an emerging microfluidic platform to measure catalytic constants (kcat and KM) for hundreds of diverse naturally occurring sequences and mutants of the model enzyme adenylate kinase (ADK). This enables us to dissect the sequence-catalysis landscape's topology, navigability, and mechanistic underpinnings, revealing distinct catalytic peaks organized by structural motifs. These results challenge long-standing hypotheses in enzyme adaptation, demonstrating that thermophilic enzymes are not slower than their mesophilic counterparts. Combining the rich representations of protein sequences provided by deep-learning models with our custom high-throughput kinetic data yields semi-supervised models that significantly outperform existing models at predicting catalytic parameters of naturally occurring ADK sequences. Our work demonstrates a promising strategy for dissecting sequence-catalysis landscapes across enzymatic evolution and building family-specific models capable of accurately predicting catalytic constants, opening new avenues for enzyme engineering and functional prediction.
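The two measured constants enter the Michaelis-Menten rate law, and their ratio gives the catalytic efficiency. The values used in the check below are hypothetical, not measured ADK parameters.

```python
def mm_rate(s: float, kcat: float, km: float, e_total: float) -> float:
    """Initial rate v = kcat * [E]0 * [S] / (KM + [S])."""
    return kcat * e_total * s / (km + s)

def efficiency(kcat: float, km: float) -> float:
    """Catalytic efficiency kcat / KM (per concentration per time)."""
    return kcat / km
```

At [S] = KM the rate is half of kcat·[E]0. At [S] ≪ KM the rate approaches (kcat/KM)·[E]0·[S], which is why kcat/KM is the usual single-number summary when comparing variants.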
ABSTRACT
Accumulation of abnormal tau protein into neurofibrillary tangles (NFTs) is a pathologic hallmark of Alzheimer disease (AD). Accurate detection of NFTs in tissue samples can reveal relationships with clinical, demographic, and genetic features through deep phenotyping. However, expert manual analysis is time-consuming, subject to observer variability, and cannot handle the data volumes generated by modern imaging. We present a scalable, open-source, deep-learning approach to quantify NFT burden in digital whole slide images (WSIs) of post-mortem human brain tissue. To achieve this, we developed a method to generate detailed NFT boundaries directly from single-point-per-NFT annotations. We then trained a semantic segmentation model on 45 annotated 2400 µm × 1200 µm regions of interest (ROIs) selected from 15 unique temporal cortex WSIs of AD cases from three institutions (University of California (UC)-Davis, UC-San Diego, and Columbia University). Segmenting NFTs at the single-pixel level, the model achieved an area under the receiver operating characteristic curve (AUROC) of 0.832 and an F1 of 0.527 (196-fold over random) on a held-out test set of 664 NFTs from 20 ROIs (7 WSIs). We compared this to deep object detection, which achieved comparable but coarser-grained performance and was 60% faster. The segmentation and object detection models correlated well with expert semi-quantitative scores at the whole-slide level (Spearman's ρ = 0.654 (p = 6.50 × 10⁻⁵) and ρ = 0.513 (p = 3.18 × 10⁻³), respectively). We openly release this multi-institution deep-learning pipeline to provide detailed NFT spatial distribution and morphology analysis capability at a scale otherwise infeasible by manual assessment.
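Pixel-level F1 and a random baseline can be computed as below. The abstract does not say how the "196-fold over random" baseline was defined. This sketch assumes a classifier that labels pixels positive at random, for which precision equals the positive-pixel prevalence. The function names are hypothetical.

```python
from typing import Optional, Sequence

def f1_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Pixel-level F1 from flattened binary masks (1 = NFT pixel)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def random_baseline_f1(prevalence: float, positive_rate: Optional[float] = None) -> float:
    """F1 implied by the expected precision and recall of a classifier that marks
    pixels positive at random: precision = prevalence, recall = positive_rate
    (defaults to the prevalence, in which case F1 equals the prevalence)."""
    q = prevalence if positive_rate is None else positive_rate
    return 2 * prevalence * q / (prevalence + q) if prevalence + q else 0.0
```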
ABSTRACT
Target identification is a core challenge in chemical genetics. Here we use chemical similarity to computationally predict the targets of 586 compounds that were active in a zebrafish behavioral assay. Among 20 predictions tested, 11 compounds had activities ranging from 1 nM to 10,000 nM on the predicted targets. The roles of two of these targets were tested in the original zebrafish phenotype. Prediction of targets from chemotype is rapid and may be generally applicable.
Subjects
Computer Simulation , Drug Evaluation, Preclinical/methods , Animals , Behavior, Animal/drug effects , Dose-Response Relationship, Drug , Phenotype , Structure-Activity Relationship , Zebrafish
ABSTRACT
Chemical probes interrogate disease mechanisms at the molecular level by linking genetic changes to observable traits. However, comprehensive chemical screens in diverse biological models are impractical. To address this challenge, we developed ChemProbe, a model that predicts cellular sensitivity to hundreds of molecular probes and drugs by learning to combine transcriptomes and chemical structures. Using ChemProbe, we inferred the chemical sensitivity of cancer cell lines and tumor samples and analyzed how the model makes predictions. We retrospectively evaluated drug response predictions for precision breast cancer treatment and prospectively validated chemical sensitivity predictions in new cellular models, including a genetically modified cell line. Our model interpretation analysis identified transcriptome features reflecting compound targets and protein network modules, identifying genes that drive ferroptosis. ChemProbe is an interpretable in silico screening tool that allows researchers to measure cellular response to diverse compounds, facilitating research into molecular mechanisms of chemical sensitivity.
ABSTRACT
Natural and experimental genetic variants can modify DNA loops and insulating boundaries to tune transcription, but it is unknown how sequence perturbations affect chromatin organization genome wide. We developed a deep-learning strategy to quantify the effect of any insertion, deletion, or substitution on chromatin contacts and systematically scored millions of synthetic variants. While most genetic manipulations have little impact, regions with CTCF motifs and active transcription are highly sensitive, as expected. Our unbiased screen and subsequent targeted experiments also point to noncoding RNA genes and several families of repetitive elements as CTCF-motif-free DNA sequences with particularly large effects on nearby chromatin interactions, sometimes exceeding the effects of CTCF sites and explaining interactions that lack CTCF. We anticipate that our disruption tracks may be of broad interest and utility as a measure of 3D genome sensitivity, and our computational strategies may serve as a template for biological inquiry with deep learning.
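The scoring idea can be sketched as follows: predict chromatin contacts for the reference and the variant sequence, then measure how much the map changes. Three elements are assumptions or simplifications. `toy_predict` is a hypothetical stand-in for a trained deep-learning model. Mean squared difference is an assumed disruption metric. Indels are trimmed or N-padded back to a fixed input length.

```python
from typing import Callable, List

Matrix = List[List[float]]

def apply_variant(seq: str, pos: int, ref_len: int, alt: str) -> str:
    """Replace seq[pos:pos+ref_len] with alt (substitution, insertion, or deletion),
    then trim or N-pad to the original length for a fixed-input-size model."""
    mutated = seq[:pos] + alt + seq[pos + ref_len:]
    return mutated[:len(seq)].ljust(len(seq), "N")

def disruption_score(seq: str, pos: int, ref_len: int, alt: str,
                     predict: Callable[[str], Matrix]) -> float:
    """Mean squared difference between predicted contact maps for the
    reference and variant sequences."""
    ref_map = predict(seq)
    alt_map = predict(apply_variant(seq, pos, ref_len, alt))
    n = len(ref_map)
    return sum((ref_map[i][j] - alt_map[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)

def toy_predict(seq: str, bin_size: int = 4) -> Matrix:
    """Hypothetical stand-in for a trained model: contact strength between
    two bins is the product of their GC fractions."""
    bins = [seq[i:i + bin_size] for i in range(0, len(seq), bin_size)]
    gc = [sum(c in "GC" for c in b) / len(b) for b in bins]
    return [[a * b for b in gc] for a in gc]
```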
ABSTRACT
The investigation of chromatin organization in single cells holds great promise for identifying causal relationships between genome structure and function. However, analysis of single-molecule data is hampered by extreme yet inherent heterogeneity, making it challenging to determine the contributions of individual chromatin fibers to bulk trends. To address this challenge, we propose ChromaFactor, a novel computational approach based on non-negative matrix factorization that deconvolves single-molecule chromatin organization datasets into their most salient primary components. ChromaFactor provides the ability to identify trends accounting for the maximum variance in the dataset while simultaneously describing the contribution of individual molecules to each component. Applying our approach to two single-molecule imaging datasets across different genomic scales, we find that these primary components demonstrate significant correlation with key functional phenotypes, including active transcription, enhancer-promoter distance, and genomic compartment. ChromaFactor offers a robust tool for understanding the complex interplay between chromatin structure and function on individual DNA molecules, pinpointing which subpopulations drive functional changes and fostering new insights into cellular heterogeneity and its implications for bulk genomic phenomena.
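The text above says ChromaFactor is built on non-negative matrix factorization. The sketch below implements the textbook Lee-Seung multiplicative-update algorithm, which factors a non-negative data matrix V into W·H. The data layout is an assumption: each row of V is one molecule's flattened pairwise-distance map, rows of H are components, and W holds per-molecule contributions. This is not ChromaFactor's code.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]

def nmf(v: Matrix, rank: int, iters: int = 500, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Lee-Seung multiplicative updates minimizing ||V - WH||_F^2 with W, H >= 0.
    H <- H * (W^T V) / (W^T W H);  W <- W * (V H^T) / (W H H^T)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h
```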
ABSTRACT
Precise, scalable, and quantitative evaluation of whole slide images is crucial in neuropathology. We release a deep learning model for rapid object detection and precise information on the identification, locality, and counts of cored plaques and cerebral amyloid angiopathies (CAAs). We trained this object detector using a repurposed image-tile dataset without any human-drawn bounding boxes. We evaluated the detector on a new manually annotated dataset of whole slide images (WSIs) from three institutions, four staining procedures, and four human experts. The detector matched the cohort of neuropathology experts, achieving 0.64 (model) vs. 0.64 (cohort) average precision (AP) for cored plaques and 0.75 vs. 0.51 AP for CAAs at a 0.5 intersection-over-union (IoU) threshold. It provided count and locality predictions that correlated with gold-standard CERAD-like WSI scoring (p = 0.07 ± 0.10). The openly available model can score WSIs in minutes without a GPU on a standard workstation.
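Average precision at a 0.5 IoU threshold can be sketched as follows. The box format (x1, y1, x2, y2), greedy matching, and non-interpolated AP are assumptions. The authors' evaluation may use an interpolated variant.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(dets: List[Tuple[float, Box]], gts: List[Box], thr: float = 0.5) -> float:
    """Non-interpolated AP: rank detections by confidence, greedily match each
    to the best unmatched ground-truth box with IoU >= thr, and average the
    precision measured at each true positive over the number of ground-truth boxes."""
    matched = [False] * len(gts)
    tp = 0
    precision_sum = 0.0
    ranked = sorted(dets, key=lambda d: d[0], reverse=True)
    for rank, (_, box) in enumerate(ranked, start=1):
        best, best_iou = -1, thr
        for g, gt in enumerate(gts):
            if not matched[g]:
                ov = iou(box, gt)
                if ov >= best_iou:
                    best, best_iou = g, ov
        if best >= 0:
            matched[best] = True
            tp += 1
            precision_sum += tp / rank
    return precision_sum / len(gts) if gts else 0.0
```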
ABSTRACT
Precise, scalable, and quantitative evaluation of whole slide images is crucial in neuropathology. We release a deep learning model for rapid object detection and precise information on the identification, locality, and counts of cored plaques and cerebral amyloid angiopathy (CAA). We trained this object detector using a repurposed image-tile dataset without any human-drawn bounding boxes. We evaluated the detector on a new manually annotated dataset of whole slide images (WSIs) from three institutions, four staining procedures, and four human experts. The detector matched the cohort of neuropathology experts, achieving 0.64 (model) vs. 0.64 (cohort) average precision (AP) for cored plaques and 0.75 vs. 0.51 AP for CAAs at a 0.5 intersection-over-union (IoU) threshold. It provided count and locality predictions that approximately correlated with gold-standard human CERAD-like WSI scoring (p = 0.07 ± 0.10). The openly available model can score WSIs in minutes without a GPU on a standard workstation.