1.
Comput Struct Biotechnol J ; 23: 1929-1937, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38736695

ABSTRACT

Recent advances in language modeling have had a tremendous impact on how we handle sequential data in science. Language architectures have emerged as a hotbed of innovation and creativity in natural language processing over the last decade, and have since gained prominence in modeling proteins and chemical processes, elucidating structural relationships from textual/sequential data. Surprisingly, some of these relationships refer to three-dimensional structural features, raising important questions on the dimensionality of the information encoded within sequential data. Here, we demonstrate that the unsupervised application of a language model architecture to a language representation of bio-catalyzed chemical reactions can capture the signal at the base of the substrate-binding site atomic interactions. This allows us to identify the three-dimensional binding site position in unknown protein sequences. The language representation comprises a reaction-simplified molecular-input line-entry system (SMILES) for substrate and products, and amino acid sequence information for the enzyme. This approach can recover, with no supervision, 52.13% of the binding site when considering co-crystallized substrate-enzyme structures as ground truth, vastly outperforming other attention-based models.

2.
Chimia (Aarau) ; 77(7-8): 484-488, 2023 Aug 09.
Article in English | MEDLINE | ID: mdl-38047789

ABSTRACT

The RXN for Chemistry project, initiated by IBM Research Europe - Zurich in 2017, aimed to develop a series of digital assets using machine learning techniques to promote the use of data-driven methodologies in synthetic organic chemistry. This research adopts an innovative concept by treating chemical reaction data as language records and framing the prediction of a synthetic organic chemistry reaction as a translation task between precursor and product languages. Over the years, the IBM Research team has successfully developed language models for various applications, including forward reaction prediction, retrosynthesis, reaction classification, atom-mapping, procedure extraction from text, and inference of experimental protocols and their use in programming commercial automation hardware to implement an autonomous chemical laboratory. Furthermore, the project has recently incorporated biochemical data in training models for greener and more sustainable chemical reactions. The remarkable ease of constructing prediction models and continually enhancing them through data augmentation with minimal human intervention has led to the widespread adoption of language model technologies, facilitating the digitalization of chemistry in diverse industrial sectors such as pharmaceuticals and chemical manufacturing. This manuscript provides a concise overview of the scientific components that contributed to the prestigious Sandmeyer Award in 2022.

4.
Nat Commun ; 14(1): 3686, 2023 Jun 21.
Article in English | MEDLINE | ID: mdl-37344485

ABSTRACT

Advances in machine learning (ML) and automated experimentation are poised to vastly accelerate research in polymer science. Data representation is a critical aspect for enabling ML integration in research workflows, yet many data models impose significant rigidity, making it difficult to accommodate the broad array of experiment and data types found in polymer science. This inflexibility presents a significant barrier for researchers to leverage their historical data in ML development. Here we show that a domain-specific language, termed Chemical Markdown Language (CMDL), provides flexible, extensible, and consistent representation of disparate experiment types and polymer structures. CMDL enables seamless use of historical experimental data to fine-tune regression transformer (RT) models for generative molecular design tasks. We demonstrate the utility of this approach through the generation and the experimental validation of catalysts and polymers in the context of ring-opening polymerization, although we provide examples of how CMDL can be more broadly applied to other polymer classes. Critically, we show how the CMDL-tuned model preserves key functional groups within the polymer structure, allowing for experimental validation. These results reveal the versatility of CMDL and how it facilitates translation of historical data into meaningful predictive and generative models to produce experimentally actionable output.

5.
J Chem Inf Model ; 62(18): 4295-4299, 2022 09 26.
Article in English | MEDLINE | ID: mdl-36098536

ABSTRACT

Recent work showed that active site rather than full-protein-sequence information improves predictive performance in kinase-ligand binding affinity prediction. To refine the notion of an "active site", we here propose and compare multiple definitions. We report significant evidence that our novel definition is superior to previous definitions and better models of ATP-noncompetitive inhibitors. Moreover, we leverage the discontiguity of the active site sequence to motivate novel protein-sequence augmentation strategies and find that combining them further improves performance.


Subjects
Adenosine Triphosphate, Adenosine Triphosphate/metabolism, Amino Acid Sequence, Binding Sites, Ligands, Protein Binding
6.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35724564

ABSTRACT

In molecular biology, it is a general assumption that the ensemble of expressed molecules, their activities and interactions determine biological function, cellular states and phenotypes. Stable protein complexes, or macromolecular machines, are, in turn, the key functional entities mediating and modulating most biological processes. Although identifying protein complexes and their subunit composition can now be done inexpensively and at scale, determining their function remains challenging and labor intensive. This study describes Protein Complex Function predictor (PCfun), the first computational framework for the systematic annotation of protein complex functions using Gene Ontology (GO) terms. PCfun is built upon a word embedding using natural language processing techniques based on 1 million open access PubMed Central articles. Specifically, PCfun leverages two approaches for accurately identifying protein complex function: (i) an unsupervised approach that obtains the nearest neighbor (NN) GO term word vectors for a protein complex query vector and (ii) a supervised approach using Random Forest (RF) models trained specifically for recovering the GO terms of protein complex queries described in the CORUM protein complex database. PCfun consolidates both approaches by performing a hypergeometric statistical test to enrich the top NN GO terms within the child terms of the GO terms predicted by the RF models. The documentation and implementation of the PCfun package are available at https://github.com/sharmavaruns/PCfun. We anticipate that PCfun will serve as a useful tool and novel paradigm for the large-scale characterization of protein complex function.


Subjects
Computational Biology, Proteins, Computational Biology/methods, Protein Databases, Gene Ontology, Humans, Natural Language Processing
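
The consolidation step described in the abstract above (a hypergeometric test on the overlap between the top nearest-neighbor GO terms and the child terms of an RF-predicted GO term) can be sketched as follows. This is an illustrative sketch only, not the PCfun implementation: the function names, the GO identifiers and the choice of the full GO term list as the test universe are all assumptions made for the example.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable (sampling without replacement)."""
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return favourable / comb(population, draws)


def enrichment_p_value(top_nn_terms, rf_child_terms, all_terms):
    """Hypothetical helper: are the NN terms over-represented among the RF child terms?"""
    universe = set(all_terms)
    child_terms = set(rf_child_terms) & universe
    nn_terms = set(top_nn_terms) & universe
    overlap = len(nn_terms & child_terms)
    return hypergeom_sf(overlap, len(universe), len(child_terms), len(nn_terms))


# Toy universe of ten made-up GO identifiers (assumption for illustration).
ALL_TERMS = [f"GO:{i:07d}" for i in range(10)]
p_value = enrichment_p_value(ALL_TERMS[:3], ALL_TERMS[:3], ALL_TERMS)
```

In this toy case all three nearest-neighbor terms fall inside the three child terms, so the p-value is the probability of drawing exactly that set, 1 in 120.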
7.
Nat Commun ; 13(1): 964, 2022 02 18.
Article in English | MEDLINE | ID: mdl-35181654

ABSTRACT

Enzyme catalysts are an integral part of green chemistry strategies towards a more sustainable and resource-efficient chemical synthesis. However, the use of biocatalysed reactions in retrosynthetic planning clashes with the difficulties in predicting the enzymatic activity on unreported substrates and enzyme-specific stereo- and regioselectivity. As of now, only rule-based systems support retrosynthetic planning using biocatalysis, while initial data-driven approaches are limited to forward predictions. Here, we extend the data-driven forward reaction as well as retrosynthetic pathway prediction models based on the Molecular Transformer architecture to biocatalysis. The enzymatic knowledge is learned from an extensive data set of publicly available biochemical reactions with the aid of a new class token scheme based on the enzyme commission classification number, which captures catalysis patterns among different enzymes belonging to the same hierarchy. The forward reaction prediction model (top-1 accuracy of 49.6%), the retrosynthetic pathway (top-1 single-step round-trip accuracy of 39.6%) and the curated data set are made publicly available to facilitate the adoption of enzymatic catalysis in the design of greener chemistry processes.


Subjects
Biocatalysis, Bioreactors, Synthetic Chemistry Techniques, Green Chemistry Technology/methods, Catalysis, Cheminformatics, Natural Resources
8.
J Chem Inf Model ; 62(2): 240-257, 2022 01 24.
Article in English | MEDLINE | ID: mdl-34905358

ABSTRACT

Recent advances in deep learning have enabled the development of large-scale multimodal models for virtual screening and de novo molecular design. The human kinome with its abundant sequence and inhibitor data presents an attractive opportunity to develop proteochemometric models that exploit the size and internal diversity of this family of targets. Here, we challenge a standard practice in sequence-based affinity prediction models: instead of leveraging the full primary structure of proteins, each target is represented by a sequence of 29 discontiguous residues defining the ATP binding site. In kinase-ligand binding affinity prediction, our results show that the reduced active site sequence representation is not only computationally more efficient but consistently yields significantly higher performance than the full primary structure. This trend persists across different models, data sets, and performance metrics and holds true when predicting pIC50 for both unseen ligands and kinases. Our interpretability analysis reveals a potential explanation for the superiority of the active site models: whereas only mild statistical effects about the extraction of three-dimensional (3D) interaction sites take place in the full sequence models, the active site models are equipped with an implicit but strong inductive bias about the 3D structure stemming from the discontiguity of the active sites. Moreover, in direct comparisons, our models perform similarly or better than previous state-of-the-art approaches in affinity prediction. We then investigate a de novo molecular design task and find that the active site provides benefits in the computational efficiency, but otherwise, both kinase representations yield similar optimized affinities (for both SMILES- and SELFIES-based molecular generators). Our work challenges the assumption that the full primary structure is indispensable for modeling human kinases.


Subjects
Proteins, Binding Sites, Catalytic Domain, Humans, Ligands, Protein Binding, Proteins/metabolism
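
The reduced representation described in the abstract above, in which a kinase is represented by a short sequence of discontiguous residues instead of its full primary structure, amounts to indexing the full sequence at a fixed set of positions. The sketch below assumes 1-based residue positions; the sequence and the positions are made up for illustration and are not the 29 ATP-binding-site residues used by the authors.

```python
def active_site_sequence(full_sequence, positions):
    """Concatenate the residues found at the given 1-based positions."""
    return "".join(full_sequence[p - 1] for p in positions)


# Hypothetical toy sequence and positions (assumptions, not real kinase data).
TOY_SEQUENCE = "MKTAYIAKQR"
TOY_POSITIONS = [2, 5, 9]
toy_site = active_site_sequence(TOY_SEQUENCE, TOY_POSITIONS)
```

Because the selected positions are not adjacent in the primary structure, neighbouring characters in the resulting string can be far apart in sequence, which is the discontiguity the abstract refers to.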
10.
Bioinformatics ; 37(Suppl_1): i245-i253, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252933

ABSTRACT

SUMMARY: In recent years, SWATH-MS has become the proteomic method of choice for data-independent-acquisition, as it enables high proteome coverage, accuracy and reproducibility. However, data analysis is convoluted and requires prior information and expert curation. Furthermore, as quantification is limited to a small set of peptides, potentially important biological information may be discarded. Here we demonstrate that deep learning can be used to learn discriminative features directly from raw MS data, hence eliminating the need for elaborate data processing pipelines. Using transfer learning to overcome sample sparsity, we exploit a collection of publicly available deep learning models already trained for the task of natural image classification. These models are used to produce feature vectors from each mass spectrometry (MS) raw image, which are later used as input for a classifier trained to distinguish tumor from normal prostate biopsies. Although the deep learning models were originally trained for a completely different classification task and no additional fine-tuning is performed on them, we achieve a highly remarkable classification performance of 0.876 AUC. We investigate different types of image preprocessing and encoding. We also investigate whether the inclusion of the secondary MS2 spectra improves the classification performance. Throughout all tested models, we use standard protein expression vectors as gold standards. Even with our naïve implementation, our results suggest that the application of deep learning and transfer learning techniques might pave the way to the broader usage of raw mass spectrometry data in real-time diagnosis. AVAILABILITY AND IMPLEMENTATION: The open source code used to generate the results from MS images is available on GitHub: https://ibm.biz/mstransc. The raw MS data underlying this article cannot be shared publicly for the privacy of individuals that participated in the study. Processed data including the MS images, their encodings, classification labels and results can be accessed at the following link: https://ibm.box.com/v/mstc-supplementary. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Deep Learning, Feasibility Studies, Humans, Male, Mass Spectrometry, Computer Neural Networks, Proteomics, Reproducibility of Results
11.
Curr Med Chem ; 28(38): 7862-7886, 2021.
Article in English | MEDLINE | ID: mdl-34325627

ABSTRACT

It is more pressing than ever to reduce the time and costs for the development of lead compounds in the pharmaceutical industry. The co-occurrence of advances in high-throughput screening and the rise of deep learning (DL) have enabled the development of large-scale multimodal predictive models for virtual drug screening. Recently, deep generative models have emerged as a powerful tool to explore the chemical space and raise hopes to expedite the drug discovery process. Following this progress in chemocentric approaches for generative chemistry, the next challenge is to build multimodal conditional generative models that leverage disparate knowledge sources when mapping biochemical properties to target structures. Here, we call on the community to bridge drug discovery more closely with systems biology when designing deep generative models. Complementing the plethora of reviews on the role of DL in chemoinformatics, we specifically focus on the interface of predictive and generative modelling for drug discovery. Through a systematic publication keyword search on PubMed and a selection of preprint servers (arXiv, bioRxiv, ChemRxiv, and medRxiv), we quantify trends in the field and find that molecular graphs and VAEs have become the most widely adopted molecular representations and architectures in generative models, respectively. We discuss progress on DL for toxicity, drug-target affinity, and drug sensitivity prediction and specifically focus on conditional molecular generative models that encompass multimodal prediction models. Moreover, we outline future prospects in the field and identify challenges such as the integration of deep learning systems into experimental workflows in a closed-loop manner or the adoption of federated machine learning techniques to overcome data sharing barriers. Other challenges include, but are not limited to, interpretability in generative models, more sophisticated metrics for the evaluation of molecular generative models, and, following up on that, community-accepted benchmarks for both multimodal drug property prediction and property-driven molecular design.


Subjects
Deep Learning, Drug Design, Drug Discovery, Humans, Machine Learning, Molecular Models
12.
Patterns (N Y) ; 2(6): 100269, 2021 Jun 11.
Article in English | MEDLINE | ID: mdl-33969323

ABSTRACT

Although a plethora of research articles on AI methods for COVID-19 medical imaging have been published, their clinical value remains unclear. We conducted the largest systematic review of the literature addressing the utility of AI in imaging for COVID-19 patient care. By keyword searches on PubMed and preprint servers throughout 2020, we identified 463 manuscripts and performed a systematic meta-analysis to assess their technical merit and clinical relevance. Our analysis evidences a significant disparity between clinical and AI communities, in the focus on both imaging modalities (AI experts neglected CT and ultrasound, favoring X-ray) and performed tasks (71.9% of AI papers centered on diagnosis). The vast majority of manuscripts were found to be deficient regarding potential use in clinical practice, but 2.7% (n = 12) of publications were assigned a high maturity level and are summarized in greater detail. We provide an itemized discussion of the challenges in developing clinically relevant AI solutions with recommendations and remedies.

13.
iScience ; 24(4): 102269, 2021 Apr 23.
Article in English | MEDLINE | ID: mdl-33851095

ABSTRACT

With the advent of deep generative models in computational chemistry, in-silico drug design is undergoing an unprecedented transformation. Although deep learning approaches have shown potential in generating compounds with desired chemical properties, they disregard the cellular environment of target diseases. Bridging systems biology and drug design, we present a reinforcement learning method for de novo molecular design from gene expression profiles. We construct a hybrid Variational Autoencoder that tailors molecules to target-specific transcriptomic profiles, using an anticancer drug sensitivity prediction model (PaccMann) as reward function. Without incorporating information about anticancer drugs, the molecule generation is biased toward compounds with high predicted efficacy against cell lines or cancer types. The generation can be further refined by subsidiary constraints such as toxicity. Our cancer-type-specific candidate drugs are similar to cancer drugs in drug-likeness, synthesizability, and solubility and frequently exhibit the highest structural similarity to compounds with known efficacy against these cancer types.

14.
Bioinformatics ; 37(14): 2070-2072, 2021 08 04.
Article in English | MEDLINE | ID: mdl-33241320

ABSTRACT

SUMMARY: The advent of high-throughput technologies has provided researchers with measurements of thousands of molecular entities and enabled the investigation of the internal regulatory apparatus of the cell. However, network inference from high-throughput data is far from being a solved problem. While a plethora of different inference methods have been proposed, they often lead to non-overlapping predictions, and many of them lack user-friendly implementations to enable their broad utilization. Here, we present Consensus Interaction Network Inference Service (COSIFER), a package and a companion web-based platform to infer molecular networks from expression data using state-of-the-art consensus approaches. COSIFER includes a selection of state-of-the-art methodologies for network inference and different consensus strategies to integrate the predictions of individual methods and generate robust networks. AVAILABILITY AND IMPLEMENTATION: COSIFER Python source code is available at https://github.com/PhosphorylatedRabbits/cosifer. The web service is accessible at https://ibm.biz/cosifer-aas. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software, Consensus
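
As an illustration of what a consensus strategy over several network inference methods can look like, the sketch below averages the rank each method assigns to an edge (a Borda-style scheme). This is an assumption chosen for the example: it does not use the COSIFER API, and the abstract does not state which consensus strategies the package implements. Edges missing from a method's output are given the worst possible rank.

```python
def consensus_ranking(method_scores):
    """Order edges by their mean rank across methods (lower mean rank first).

    method_scores: list of dicts, each mapping an edge (tuple of node names)
    to a confidence score, where a higher score means a more confident edge.
    """
    edges = sorted(set().union(*method_scores))
    worst_rank = len(edges)
    totals = {edge: 0.0 for edge in edges}
    for scores in method_scores:
        # Rank 1 is the method's most confident edge; ties broken by edge name.
        ordered = sorted(scores, key=lambda e: (-scores[e], e))
        ranks = {edge: pos for pos, edge in enumerate(ordered, start=1)}
        for edge in edges:
            totals[edge] += ranks.get(edge, worst_rank)
    mean_rank = {edge: total / len(method_scores) for edge, total in totals.items()}
    return sorted(edges, key=lambda e: (mean_rank[e], e))


# Two hypothetical inference methods scoring edges of a three-gene network.
method_one = {("A", "B"): 0.9, ("B", "C"): 0.2}
method_two = {("A", "B"): 0.7, ("A", "C"): 0.6}
consensus = consensus_ranking([method_one, method_two])
```

Here the edge between A and B is ranked first by both methods and therefore heads the consensus list, while the two edges each supported by only one method tie on mean rank.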
15.
Genome Biol ; 21(1): 302, 2020 12 14.
Article in English | MEDLINE | ID: mdl-33317623

ABSTRACT

BACKGROUND: Tumor-specific genomic aberrations are routinely determined by high-throughput genomic measurements. It remains unclear how complex genome alterations affect molecular networks through changing protein levels and consequently biochemical states of tumor tissues. RESULTS: Here, we investigate the propagation of genomic effects along the axis of gene expression during prostate cancer progression. We quantify genomic, transcriptomic, and proteomic alterations based on 105 prostate samples, consisting of benign prostatic hyperplasia regions and malignant tumors, from 39 prostate cancer patients. Our analysis reveals the convergent effects of distinct copy number alterations impacting on common downstream proteins, which are important for establishing the tumor phenotype. We devise a network-based approach that integrates perturbations across different molecular layers, which identifies a sub-network consisting of nine genes whose joint activity positively correlates with increasingly aggressive tumor phenotypes and is predictive of recurrence-free survival. Further, our data reveal a wide spectrum of intra-patient network effects, ranging from similar to very distinct alterations on different molecular layers. CONCLUSIONS: This study uncovers molecular networks with considerable convergent alterations across tumor sites and patients. It also exposes a diversity of network effects: we could not identify a single sub-network that is perturbed in all high-grade tumor regions.


Subjects
Disease Progression, Neoplastic Gene Expression Regulation, Prostatic Neoplasms/genetics, Tumor Biomarkers/genetics, DNA Copy Number Variations, Genetic Heterogeneity, Genomics, Humans, Male, Mutation, Phenotype, Prostate/pathology, Proteogenomics, Proteome, Proteomics, Messenger RNA, Transcriptome
16.
NPJ Syst Biol Appl ; 6(1): 27, 2020 08 25.
Article in English | MEDLINE | ID: mdl-32843649

ABSTRACT

Knowledge about the clonal evolution of a tumor can help to interpret the function of its genetic alterations by identifying initiating events and events that contribute to the selective advantage of proliferative, metastatic, and drug-resistant subclones. Clonal evolution can be reconstructed from estimates of the relative abundance (frequency) of subclone-specific alterations in tumor biopsies, which, in turn, inform on its composition. However, estimating these frequencies is complicated by the high genetic instability that characterizes many cancers. Models for genetic instability suggest that copy number alterations (CNAs) can influence mutation-frequency estimates and thus impede efforts to reconstruct tumor phylogenies. Our analysis suggested that accurate mutation frequency estimates require accounting for CNAs, a challenging endeavour when using the genetic profile of a single tumor biopsy. Instead, we propose an optimization algorithm, Chimæra, to account for the effects of CNAs using profiles of multiple biopsies per tumor. Analyses of simulated data and tumor profiles suggested that Chimæra estimates are consistently more accurate than those of previously proposed methods and resulted in improved phylogeny reconstructions and subclone characterizations. Our analyses inferred recurrent initiating mutations in hepatocellular carcinomas, resolved the clonal composition of Wilms' tumors, and characterized the acquisition of mutations in drug-resistant prostate cancers.


Subjects
Clonal Evolution, Neoplasms/genetics, Neoplasms/pathology, Biopsy, DNA Copy Number Variations, Humans
17.
Nucleic Acids Res ; 48(W1): W502-W508, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32402082

ABSTRACT

The identification of new targeted and personalized therapies for cancer requires the fast and accurate assessment of the drug efficacy of potential compounds against a particular biomolecular sample. It has been suggested that the integration of complementary sources of information might strengthen the accuracy of a drug efficacy prediction model. Here, we present a web-based platform for the Prediction of AntiCancer Compound sensitivity with Multimodal Attention-based Neural Networks (PaccMann). PaccMann is trained on public transcriptomic cell line profiles, compound structure information and drug sensitivity screenings, and outperforms state-of-the-art methods on anticancer drug sensitivity prediction. On the open-access web service (https://ibm.biz/paccmann-aas), users can select a known drug compound or design their own compound structure in an interactive editor, perform in-silico drug testing and investigate compound efficacy on publicly available or user-provided transcriptomic profiles. PaccMann leverages methods for model interpretability and outputs confidence scores as well as attention heatmaps that highlight the genes and chemical sub-structures that were more important to make a prediction, hence facilitating the understanding of the model's decision making and the involved biochemical processes. We hope to serve the community with a toolbox for fast and efficient validation in drug repositioning or lead compound identification regimes.


Subjects
Antineoplastic Agents/pharmacology, Drug Repositioning, Software, Antineoplastic Agents/chemistry, Computer Simulation, Gene Expression Profiling, Internet, Computer Neural Networks, Sirolimus/analogs & derivatives, Sirolimus/pharmacology
18.
IEEE/ACM Trans Comput Biol Bioinform ; 17(6): 2141-2147, 2020.
Article in English | MEDLINE | ID: mdl-31494553

ABSTRACT

Boolean models are a powerful abstraction for qualitative modeling of gene regulatory networks. With the recent availability of advanced high-throughput technologies, Boolean models have increasingly grown in size and complexity, posing a challenge for existing software simulation tools that have not scaled at the same speed. Field Programmable Gate Arrays (FPGAs) are powerful reconfigurable integrated circuits that can offer massive performance improvements. Due to their highly parallel nature, FPGAs are well suited to simulate complex molecular networks. We present here a new simulation framework for Boolean models, which first converts the model to Verilog, a standardized hardware description language, and then connects it to an execution core that runs on an FPGA coherently attached to a POWER8 processor. We report an order of magnitude speedup over a multi-threaded software simulation tool running on the same processor on a selection of Boolean models. Analysis of a T-cell large granular lymphocyte leukemia (T-LGL) model demonstrates that our framework achieves consistent performance improvements resulting in new biological insights. In addition, we show that our solution allows attractor detection to be performed at an unprecedented speed, exhibiting a speedup ranging from one to three orders of magnitude compared to alternative software solutions.


Subjects
Computational Biology/methods, Computer Simulation, Gene Regulatory Networks/genetics, Genetic Models, Humans, Large Granular Lymphocytic Leukemia/genetics, Software
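
For readers unfamiliar with the task that the framework above accelerates, the following is a minimal software sketch of synchronous Boolean network simulation and attractor detection. It is not the authors' FPGA/Verilog framework, and the three-gene network is a made-up toy. It assumes synchronous updates, in which every node is recomputed from the same current state.

```python
def step(state, rules):
    """Synchronous update: each node's next value is computed from the current state."""
    return tuple(rule(state) for rule in rules)


def find_attractor(initial_state, rules):
    """Follow the trajectory until a state repeats and return the repeating cycle."""
    first_seen = {}
    trajectory = []
    state = tuple(initial_state)
    while state not in first_seen:
        first_seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state, rules)
    # Everything from the first visit of the repeated state onward is the attractor.
    return trajectory[first_seen[state]:]


# Hypothetical toy network over nodes (A, B, C): a ring with one inhibition.
TOY_RULES = [
    lambda s: int(not s[2]),  # A is active when C is inactive
    lambda s: s[0],           # B copies A
    lambda s: s[1],           # C copies B
]

long_cycle = find_attractor((0, 0, 0), TOY_RULES)
short_cycle = find_attractor((0, 1, 0), TOY_RULES)
```

Starting from (0, 0, 0) the toy network enters a cycle of six states, while (0, 1, 0) alternates with (1, 0, 1). Because the state space grows as 2^n with the number of nodes, exhaustive attractor searches become expensive quickly, which is the motivation for hardware acceleration given in the abstract.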
19.
Sci Rep ; 9(1): 15918, 2019 11 04.
Article in English | MEDLINE | ID: mdl-31685861

ABSTRACT

We present the Network-based Biased Tree Ensembles (NetBiTE) method for drug sensitivity prediction and drug sensitivity biomarker identification in cancer using a combination of prior knowledge and gene expression data. Our devised method consists of a biased tree ensemble that is built according to a probabilistic bias weight distribution. The bias weight distribution is obtained by assigning high weights to the drug targets and propagating the assigned weights over a protein-protein interaction network such as STRING. The propagation of weights defines neighborhoods of influence around the drug targets and as such simulates the spread of perturbations within the cell following drug administration. Using a synthetic dataset, we showcase how application of biased tree ensembles (BiTE) results in significant accuracy gains at a much lower computational cost compared to the unbiased random forests (RF) algorithm. We then apply NetBiTE to the Genomics of Drug Sensitivity in Cancer (GDSC) dataset and demonstrate that NetBiTE outperforms RF in predicting IC50 drug sensitivity, only for drugs that target membrane receptor pathways (MRPs): RTK, EGFR and IGFR signaling pathways. We propose, based on the NetBiTE results, that for drugs that inhibit MRPs, the expression of target genes prior to drug administration is a biomarker for IC50 drug sensitivity following drug administration. We further verify and reinforce this proposition through control studies on PI3K/MTOR signaling pathway inhibitors, a drug category that does not target MRPs, and through assignment of dummy targets to MRP-inhibiting drugs and investigating the variation in NetBiTE accuracy.


Subjects
Algorithms, Antineoplastic Agents/chemistry, Biomarkers/metabolism, Neoplasms/pathology, Antineoplastic Agents/pharmacology, Antineoplastic Agents/therapeutic use, Factual Databases, Humans, Inhibitory Concentration 50, Neoplasms/drug therapy, Neoplasms/metabolism, Phosphatidylinositol 3-Kinases/metabolism, Protein Interaction Maps/drug effects, Protein Kinase Inhibitors/chemistry, Protein Kinase Inhibitors/pharmacology, Protein Kinase Inhibitors/therapeutic use, Cell Surface Receptors/metabolism, Signal Transduction/drug effects, TOR Serine-Threonine Kinases/metabolism
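
The idea of assigning high weights to drug targets and propagating them over an interaction network can be illustrated with a small diffusion sketch. The abstract does not specify the propagation scheme, so the random-walk-with-restart update used here, the `restart` parameter, and the four-node toy network are all assumptions for illustration and not the NetBiTE implementation.

```python
def propagate_weights(adjacency, initial, restart=0.5, iterations=50):
    """Diffuse initial node weights over an undirected network.

    adjacency: dict mapping each node to the list of its neighbours
               (symmetric, every node has at least one neighbour).
    initial:   dict mapping drug-target nodes to their starting weight.
    """
    nodes = list(adjacency)
    start = {node: initial.get(node, 0.0) for node in nodes}
    weights = dict(start)
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # Each neighbour shares its current weight equally among its own neighbours.
            incoming = sum(weights[nb] / len(adjacency[nb]) for nb in adjacency[node])
            updated[node] = restart * start[node] + (1 - restart) * incoming
        weights = updated
    return weights


# Hypothetical path-shaped network: TARGET - G1 - G2 - G3.
TOY_NETWORK = {
    "TARGET": ["G1"],
    "G1": ["TARGET", "G2"],
    "G2": ["G1", "G3"],
    "G3": ["G2"],
}
bias = propagate_weights(TOY_NETWORK, {"TARGET": 1.0})
```

The resulting weights decay with distance from the target and sum to the total initial weight, so they can be read as a probability distribution over genes. In a biased tree ensemble, such a distribution could replace the uniform feature sampling of a standard random forest.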
20.
Mol Pharm ; 16(12): 4797-4806, 2019 12 02.
Article in English | MEDLINE | ID: mdl-31618586

ABSTRACT

In line with recent advances in neural drug design and sensitivity prediction, we propose a novel architecture for interpretable prediction of anticancer compound sensitivity using a multimodal attention-based convolutional encoder. Our model is based on the three key pillars of drug sensitivity: compounds' structure in the form of a SMILES sequence, gene expression profiles of tumors, and prior knowledge on intracellular interactions from protein-protein interaction networks. We demonstrate that our multiscale convolutional attention-based encoder significantly outperforms a baseline model trained on Morgan fingerprints and a selection of encoders based on SMILES, as well as the previously reported state-of-the-art for multimodal drug sensitivity prediction (R² = 0.86 and RMSE = 0.89). Moreover, the explainability of our approach is demonstrated by a thorough analysis of the attention weights. We show that the attended genes significantly enrich apoptotic processes and that the drug attention is strongly correlated with a standard chemical structure similarity index. Finally, we report a case study of two receptor tyrosine kinase (RTK) inhibitors acting on a leukemia cell line, showcasing the ability of the model to focus on informative genes and submolecular regions of the two compounds. The demonstrated generalizability and the interpretability of our model testify to its potential for in silico prediction of anticancer compound efficacy on unseen cancer cells, positioning it as a valid solution for the development of personalized therapies as well as for the evaluation of candidate compounds in de novo drug design.


Subjects
Algorithms, Antineoplastic Agents, Deep Learning, Drug Design, Humans, Computer Neural Networks