Search | BVS - MINISTÉRIO DA SAÚDE

1.

A Comprehensive Investigation of Active Learning Strategies for Conducting Anti-Cancer Drug Screening.

Vasanthakumari, Priyanka; Zhu, Yitan; Brettin, Thomas; Partin, Alexander; Shukla, Maulik; Xia, Fangfang; Narykov, Oleksandr; Weil, Michael Ryan; Stevens, Rick L.

Cancers (Basel) ; 16(3), 2024 Jan 26.

Article in English | MEDLINE | ID: mdl-38339281

ABSTRACT

It is well known that cancers of the same histology type can respond differently to a treatment. Thus, computational drug response prediction is of paramount importance for both preclinical drug screening studies and clinical treatment design. To build drug response prediction models, treatment response data need to be generated through screening experiments and used as input to train the prediction models. In this study, we investigate various active learning strategies for selecting experiments to generate response data for the purposes of (1) improving the performance of drug response prediction models built on the data and (2) identifying effective treatments. Here, we focus on constructing drug-specific response prediction models for cancer cell lines. Various approaches have been designed and applied to select cell lines for screening, including random, greedy, uncertainty, diversity, combined greedy and uncertainty, sampling-based hybrid, and iteration-based hybrid approaches. All of these approaches are evaluated and compared using two criteria: (1) the number of identified hits, that is, selected experiments validated to be responsive, and (2) the performance of the response prediction model trained on the data of selected experiments. The analysis was conducted for 57 drugs, and the results show a significant improvement in identifying hits using active learning approaches compared with the random and greedy sampling methods. Active learning approaches also show an improvement in response prediction performance for some of the drugs and analysis runs compared with the greedy sampling method.
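The selection strategies named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_cell_lines`, the use of an ensemble's spread as the uncertainty measure, and the convention that a lower predicted value means a more responsive cell line are all assumptions made for illustration.

```python
import numpy as np

def select_cell_lines(ensemble_preds, n_select, strategy="uncertainty"):
    """Pick indices of unscreened cell lines for the next screening round.

    ensemble_preds has shape (n_models, n_candidates): one predicted response
    per candidate from each ensemble member. Lower is assumed to mean
    "more responsive" (an assumption of this sketch).
    """
    preds = np.asarray(ensemble_preds, dtype=float)
    mean = preds.mean(axis=0)   # consensus prediction per candidate
    std = preds.std(axis=0)     # disagreement between ensemble members
    if strategy == "greedy":
        order = np.argsort(mean, kind="stable")    # most responsive first
    elif strategy == "uncertainty":
        order = np.argsort(-std, kind="stable")    # most uncertain first
    elif strategy == "random":
        order = np.random.default_rng(0).permutation(preds.shape[1])
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return order[:n_select].tolist()

# Two hypothetical models scoring four candidate cell lines.
demo_preds = [[0.1, 0.5, 0.9, 0.4],
              [0.2, 0.5, 0.1, 0.6]]
print(select_cell_lines(demo_preds, 1, "greedy"))
print(select_cell_lines(demo_preds, 2, "uncertainty"))
```

In a full loop, the selected cell lines would be screened, their measured responses added to the training data, and the model refit before the next round.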

2.

Engineering of increased L-Threonine production in bacteria by combinatorial cloning and machine learning.

Hanke, Paul; Parrello, Bruce; Vasieva, Olga; Akins, Chase; Chlenski, Philippe; Babnigg, Gyorgy; Henry, Chris; Foflonker, Fatima; Brettin, Thomas; Antonopoulos, Dionysios; Stevens, Rick; Fonstein, Michael.

Metab Eng Commun ; 17: e00225, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37435441

ABSTRACT

The goal of this study is to develop a general strategy for bacterial engineering using an integrated synthetic biology and machine learning (ML) approach. This strategy was developed in the context of increasing L-threonine production in Escherichia coli ATCC 21277. A set of 16 genes was initially selected based on metabolic pathway relevance to threonine biosynthesis and used for combinatorial cloning to construct a set of 385 strains to generate training data (i.e., a range of L-threonine titers linked to each of the specific gene combinations). Hybrid (regression/classification) deep learning (DL) models were developed and used to predict additional gene combinations in subsequent rounds of combinatorial cloning for increased L-threonine production based on the training data. As a result, E. coli strains built after just three rounds of iterative combinatorial cloning and model prediction generated higher L-threonine titers (from 2.7 g/L to 8.4 g/L) than those of patented L-threonine strains being used as controls (4-5 g/L). Interesting combinations of genes in L-threonine production included deletions of the tdh, metL, dapA, and dhaM genes as well as overexpression of the pntAB, ppc, and aspC genes. Mechanistic analysis of the metabolic system constraints for the best performing constructs offers ways to improve the models by adjusting weights for specific gene combinations. Graph theory analysis of pairwise gene modifications and corresponding levels of L-threonine production also suggests additional rules that can be incorporated into future ML models.

3.

Deep learning methods for drug response prediction in cancer: Predominant and emerging trends.

Partin, Alexander; Brettin, Thomas S; Zhu, Yitan; Narykov, Oleksandr; Clyde, Austin; Overbeek, Jamie; Stevens, Rick L.

Front Med (Lausanne) ; 10: 1086097, 2023.

Article in English | MEDLINE | ID: mdl-36873878

ABSTRACT

Cancer claims millions of lives yearly worldwide. While many therapies have been made available in recent years, by and large cancer remains unsolved. Exploiting computational predictive models to study and treat cancer holds great promise in improving drug development and personalized design of treatment plans, ultimately suppressing tumors, alleviating suffering, and prolonging lives of patients. A wave of recent papers demonstrates promising results in predicting cancer response to drug treatments while utilizing deep learning methods. These papers investigate diverse data representations, neural network architectures, learning methodologies, and evaluation schemes. However, deciphering promising predominant and emerging trends is difficult due to the variety of explored methods and the lack of a standardized framework for comparing drug response prediction models. To obtain a comprehensive landscape of deep learning methods, we conducted an extensive search and analysis of deep learning models that predict the response to single drug treatments. A total of 61 deep learning-based models have been curated, and summary plots were generated. Based on the analysis, observable patterns and prevalence of methods have been revealed. This review makes it possible to better understand the current state of the field and identify major challenges and promising solution paths.

4.

Data augmentation and multimodal learning for predicting drug response in patient-derived xenografts from gene expressions and histology images.

Partin, Alexander; Brettin, Thomas; Zhu, Yitan; Dolezal, James M; Kochanny, Sara; Pearson, Alexander T; Shukla, Maulik; Evrard, Yvonne A; Doroshow, James H; Stevens, Rick L.

Front Med (Lausanne) ; 10: 1058919, 2023.

Article in English | MEDLINE | ID: mdl-36960342

ABSTRACT

Patient-derived xenografts (PDXs) are an appealing platform for preclinical drug studies. A primary challenge in modeling drug response prediction (DRP) with PDXs and neural networks (NNs) is the limited number of drug response samples. We investigate a multimodal neural network (MM-Net) and data augmentation for DRP in PDXs. The MM-Net learns to predict response using drug descriptors, gene expressions (GE), and histology whole-slide images (WSIs). We explore whether combining WSIs with GE improves predictions as compared with models that use GE alone. We propose two data augmentation methods which allow us to train multimodal and unimodal NNs on a single larger dataset without changing architectures: 1) combine single-drug and drug-pair treatments by homogenizing drug representations, and 2) augment drug-pairs, which doubles the sample size of all drug-pair samples. Unimodal NNs which use GE are compared to assess the contribution of data augmentation. The NN that uses the original and the augmented drug-pair treatments as well as single-drug treatments outperforms NNs that ignore either the augmented drug-pairs or the single-drug treatments. In assessing the multimodal learning based on the MCC metric, MM-Net outperforms all the baselines. Our results show that data augmentation and integration of histology images with GE can improve prediction performance of drug response in PDXs.

5.

AI-accelerated protein-ligand docking for SARS-CoV-2 is 100-fold faster with no significant change in detection.

Clyde, Austin; Liu, Xuefeng; Brettin, Thomas; Yoo, Hyunseung; Partin, Alexander; Babuji, Yadu; Blaiszik, Ben; Mohd-Yusof, Jamaludin; Merzky, Andre; Turilli, Matteo; Jha, Shantenu; Ramanathan, Arvind; Stevens, Rick.

Sci Rep ; 13(1): 2105, 2023 02 06.

Article in English | MEDLINE | ID: mdl-36747041

ABSTRACT

Protein-ligand docking is a computational method for identifying drug leads. The method is capable of narrowing a vast library of compounds down to a tractable size for downstream simulation or experimental testing and is widely used in drug discovery. While there has been progress in accelerating scoring of compounds with artificial intelligence, few works have bridged these successes back to the virtual screening community in terms of utility and forward-looking development. We demonstrate the power of high-speed ML models by scoring 1 billion molecules in under a day (50 k predictions per GPU second). We showcase a workflow for docking utilizing surrogate AI-based models as a pre-filter to a standard docking workflow. Our workflow is ten times faster at screening a library of compounds than the standard technique, with an error rate less than 0.01% of detecting the underlying best scoring 0.1% of compounds. Our analysis of the speedup explains that another order of magnitude speedup must come from model accuracy rather than computing speed. In order to drive another order of magnitude of acceleration, we share a benchmark dataset consisting of 200 million 3D complex structures and 2D structure scores across a consistent set of 13 million "in-stock" molecules over 15 receptors, or binding sites, across the SARS-CoV-2 proteome. We believe this is strong evidence for the community to begin focusing on improving the accuracy of surrogate models to improve the ability to screen massive compound libraries 100 × or even 1000 × faster than current techniques and reduce missing top hits. The technique outlined aims to be a fast drop-in replacement for docking for screening billion-scale molecular libraries.
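The pre-filter idea described in this abstract can be sketched as follows. This is an illustrative toy, not the published workflow: the function names, the `dock_fn` callback standing in for an expensive docking run, and the convention that a lower score is better are assumptions.

```python
import numpy as np

def prefilter_then_dock(surrogate_scores, dock_fn, keep_fraction):
    """Dock only the compounds the cheap surrogate ranks in the top fraction.

    Lower score is assumed to mean a better predicted binder.
    Returns {compound_index: docking_score} for the docked subset.
    """
    scores = np.asarray(surrogate_scores, dtype=float)
    n_keep = max(1, int(round(keep_fraction * scores.size)))
    keep = np.argsort(scores, kind="stable")[:n_keep]
    return {int(i): dock_fn(int(i)) for i in keep}

def recall_of_top_hits(true_scores, docked, top_fraction):
    """Fraction of the truly best compounds that survived the pre-filter."""
    true_scores = np.asarray(true_scores, dtype=float)
    n_top = max(1, int(round(top_fraction * true_scores.size)))
    top = set(np.argsort(true_scores, kind="stable")[:n_top].tolist())
    return len(top & set(docked)) / n_top

# Hypothetical library of ten compounds: "true" docking scores and a
# noisy surrogate approximation of them.
true_scores = [-9.0, -3.0, -7.0, -1.0, -5.0, -2.0, -8.0, -4.0, -6.0, -0.5]
surrogate = [-8.0, -3.5, -7.5, -1.0, -4.0, -2.0, -9.0, -5.0, -6.0, -1.5]

docked = prefilter_then_dock(surrogate, lambda i: true_scores[i], keep_fraction=0.3)
print(sorted(docked))                                 # compounds actually docked
print(recall_of_top_hits(true_scores, docked, 0.2))   # top-20% hits recovered
```

Docking 30% of the library here costs roughly a third of a full screen, and the recall function measures what that saving loses. This is the trade-off between speedup and missed top hits that the abstract ties to surrogate accuracy.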

Subjects

COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/metabolism , Artificial Intelligence , Molecular Docking Simulation , Ligands , Proteins/metabolism

6.

Introducing the Bacterial and Viral Bioinformatics Resource Center (BV-BRC): a resource combining PATRIC, IRD and ViPR.

Olson, Robert D; Assaf, Rida; Brettin, Thomas; Conrad, Neal; Cucinell, Clark; Davis, James J; Dempsey, Donald M; Dickerman, Allan; Dietrich, Emily M; Kenyon, Ronald W; Kuscuoglu, Mehmet; Lefkowitz, Elliot J; Lu, Jian; Machi, Dustin; Macken, Catherine; Mao, Chunhong; Niewiadomska, Anna; Nguyen, Marcus; Olsen, Gary J; Overbeek, Jamie C; Parrello, Bruce; Parrello, Victoria; Porter, Jacob S; Pusch, Gordon D; Shukla, Maulik; Singh, Indresh; Stewart, Lucy; Tan, Gene; Thomas, Chris; VanOeffelen, Margo; Vonstein, Veronika; Wallace, Zachary S; Warren, Andrew S; Wattam, Alice R; Xia, Fangfang; Yoo, Hyunseung; Zhang, Yun; Zmasek, Christian M; Scheuermann, Richard H; Stevens, Rick L.

Nucleic Acids Res ; 51(D1): D678-D689, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36350631

ABSTRACT

The National Institute of Allergy and Infectious Diseases (NIAID) established the Bioinformatics Resource Center (BRC) program to assist researchers with analyzing the growing body of genome sequence and other omics-related data. In this report, we describe the merger of the PAThosystems Resource Integration Center (PATRIC), the Influenza Research Database (IRD) and the Virus Pathogen Database and Analysis Resource (ViPR) BRCs to form the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) https://www.bv-brc.org/. The combined BV-BRC leverages the functionality of the bacterial and viral resources to provide a unified data model, enhanced web-based visualization and analysis tools, bioinformatics services, and a powerful suite of command line tools that benefit the bacterial and viral research communities.

Subjects

Genomics , Software , Viruses , Humans , Bacteria/genetics , Computational Biology , Genetic Databases , Human Influenza , Viruses/genetics

7.

Integration of Computational Docking into Anti-Cancer Drug Response Prediction Models.

Narykov, Oleksandr; Zhu, Yitan; Brettin, Thomas; Evrard, Yvonne A; Partin, Alexander; Shukla, Maulik; Xia, Fangfang; Clyde, Austin; Vasanthakumari, Priyanka; Doroshow, James H; Stevens, Rick L.

Cancers (Basel) ; 16(1), 2023 Dec 21.

Article in English | MEDLINE | ID: mdl-38201477

ABSTRACT

Cancer is a heterogeneous disease in that tumors of the same histology type can respond differently to a treatment. Anti-cancer drug response prediction is of paramount importance for both drug development and patient treatment design. Although various computational methods and data have been used to develop drug response prediction models, it remains a challenging problem due to the complexities of cancer mechanisms and cancer-drug interactions. To better characterize the interaction between cancer and drugs, we investigate the feasibility of integrating computationally derived features of molecular mechanisms of action into prediction models. Specifically, we add docking scores of drug molecules and target proteins in combination with cancer gene expressions and molecular drug descriptors for building response models. The results demonstrate a marginal improvement in drug response prediction performance when adding docking scores as additional features, through tests on large drug screening data. We discuss the limitations of the current approach and provide the research community with a baseline dataset of the large-scale computational docking for anti-cancer drugs.

8.

GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics.

Zvyagin, Maxim; Brace, Alexander; Hippe, Kyle; Deng, Yuntian; Zhang, Bin; Bohorquez, Cindy Orozco; Clyde, Austin; Kale, Bharat; Perez-Rivera, Danilo; Ma, Heng; Mann, Carla M; Irvin, Michael; Pauloski, J Gregory; Ward, Logan; Hayot-Sasson, Valerie; Emani, Murali; Foreman, Sam; Xie, Zhen; Lin, Diangen; Shukla, Maulik; Nie, Weili; Romero, Josh; Dallago, Christian; Vahdat, Arash; Xiao, Chaowei; Gibbs, Thomas; Foster, Ian; Davis, James J; Papka, Michael E; Brettin, Thomas; Stevens, Rick; Anandkumar, Anima; Vishwanath, Venkatram; Ramanathan, Arvind.

bioRxiv ; 2022 Nov 23.

Article in English | MEDLINE | ID: mdl-36451881

ABSTRACT

We seek to transform how new and emergent variants of pandemic-causing viruses, specifically SARS-CoV-2, are identified and classified. By adapting large language models (LLMs) for genomic data, we build genome-scale language models (GenSLMs) which can learn the evolutionary landscape of SARS-CoV-2 genomes. By pre-training on over 110 million prokaryotic gene sequences and fine-tuning a SARS-CoV-2-specific model on 1.5 million genomes, we show that GenSLMs can accurately and rapidly identify variants of concern. Thus, to our knowledge, GenSLMs represents one of the first whole genome scale foundation models which can generalize to other prediction tasks. We demonstrate scaling of GenSLMs on GPU-based supercomputers and AI-hardware accelerators utilizing 1.63 Zettaflops in training runs with a sustained performance of 121 PFLOPS in mixed precision and peak of 850 PFLOPS. We present initial scientific insights from examining GenSLMs in tracking evolutionary dynamics of SARS-CoV-2, paving the path to realizing this on large biological data.

9.

TULIP: An RNA-seq-based Primary Tumor Type Prediction Tool Using Convolutional Neural Networks.

Jones, Sara; Beyers, Matthew; Shukla, Maulik; Xia, Fangfang; Brettin, Thomas; Stevens, Rick; Weil, M Ryan; Ranganathan Ganakammal, Satishkumar.

Cancer Inform ; 21: 11769351221139491, 2022.

Article in English | MEDLINE | ID: mdl-36507076

ABSTRACT

Background: With cancer as one of the leading causes of death worldwide, accurate primary tumor type prediction is critical in identifying genetic factors that can inhibit or slow tumor progression. There have been efforts to categorize primary tumor types with gene expression data using machine learning, and more recently with deep learning, in the last several years. Methods: In this paper, we developed four 1-dimensional (1D) Convolutional Neural Network (CNN) models to classify RNA-seq count data as one of 17 highly represented primary tumor types or 32 primary tumor types regardless of imbalanced representation. Additionally, we adapted the models to take as input either all Ensembl genes (60,483) or protein coding genes only (19,758). Unlike previous work, we avoided selection bias by not filtering genes based on expression values. RNA-seq count data expressed as FPKM-UQ of 9,025 and 10,940 samples from The Cancer Genome Atlas (TCGA) were downloaded from the Genomic Data Commons (GDC) corresponding to 17 and 32 primary tumor types respectively for training and validating the models. Results: All 4 1D-CNN models had an overall accuracy of 94.7% to 97.6% on the test dataset. Further evaluation indicates that the models with protein coding genes only as features performed with better accuracy compared to the models with all Ensembl genes for both 17 and 32 primary tumor types. For all models, the accuracy by primary tumor type was above 80% for most primary tumor types. Conclusions: We packaged all 4 models as a Python-based deep learning classification tool called TULIP (TUmor CLassIfication Predictor) for performing quality control on primary tumor samples and characterizing cancer samples of unknown tumor type. Further optimization of the models is needed to improve the accuracy of certain primary tumor types.

10.

A cross-study analysis of drug response prediction in cancer cell lines.

Xia, Fangfang; Allen, Jonathan; Balaprakash, Prasanna; Brettin, Thomas; Garcia-Cardona, Cristina; Clyde, Austin; Cohn, Judith; Doroshow, James; Duan, Xiaotian; Dubinkina, Veronika; Evrard, Yvonne; Fan, Ya Ju; Gans, Jason; He, Stewart; Lu, Pinyi; Maslov, Sergei; Partin, Alexander; Shukla, Maulik; Stahlberg, Eric; Wozniak, Justin M; Yoo, Hyunseung; Zaki, George; Zhu, Yitan; Stevens, Rick.

Brief Bioinform ; 23(1), 2022 01 17.

Article in English | MEDLINE | ID: mdl-34524425

ABSTRACT

To enable personalized cancer treatment, machine learning models have been developed to predict drug response as a function of tumor and drug features. However, most algorithm development efforts have relied on cross-validation within a single study to assess model accuracy. While an essential first step, cross-validation within a biological data set typically provides an overly optimistic estimate of the prediction performance on independent test sets. To provide a more rigorous assessment of model generalizability between different studies, we use machine learning to analyze five publicly available cell line-based data sets: National Cancer Institute 60, Cancer Therapeutics Response Portal (CTRP), Genomics of Drug Sensitivity in Cancer, Cancer Cell Line Encyclopedia and Genentech Cell Line Screening Initiative (gCSI). Based on observed experimental variability across studies, we explore estimates of prediction upper bounds. We report performance results of a variety of machine learning models, with a multitasking deep neural network achieving the best cross-study generalizability. By multiple measures, models trained on CTRP yield the most accurate predictions on the remaining testing data, and gCSI is the most predictable among the cell line data sets included in this study. With these experiments and further simulations on partial data, two lessons emerge: (1) differences in viability assays can limit model generalizability across studies and (2) drug diversity, more than tumor diversity, is crucial for raising model generalizability in preclinical screening.

Subjects

Neoplasms , Algorithms , Cell Line , Humans , Machine Learning , Neoplasms/drug therapy , Neoplasms/genetics , Computer Neural Networks

11.

High-Throughput Virtual Screening and Validation of a SARS-CoV-2 Main Protease Noncovalent Inhibitor.

Clyde, Austin; Galanie, Stephanie; Kneller, Daniel W; Ma, Heng; Babuji, Yadu; Blaiszik, Ben; Brace, Alexander; Brettin, Thomas; Chard, Kyle; Chard, Ryan; Coates, Leighton; Foster, Ian; Hauner, Darin; Kertesz, Vilmos; Kumar, Neeraj; Lee, Hyungro; Li, Zhuozhao; Merzky, Andre; Schmidt, Jurgen G; Tan, Li; Titov, Mikhail; Trifan, Anda; Turilli, Matteo; Van Dam, Hubertus; Chennubhotla, Srinivas C; Jha, Shantenu; Kovalevsky, Andrey; Ramanathan, Arvind; Head, Martha S; Stevens, Rick.

J Chem Inf Model ; 62(1): 116-128, 2022 01 10.

Article in English | MEDLINE | ID: mdl-34793155

ABSTRACT

Despite the recent availability of vaccines against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the search for inhibitory therapeutic agents has assumed importance, especially in the context of emerging new viral variants. In this paper, we describe the discovery of a novel noncovalent small-molecule inhibitor, MCULE-5948770040, that binds to and inhibits the SARS-CoV-2 main protease (Mpro) by employing a scalable high-throughput virtual screening (HTVS) framework and a targeted compound library of over 6.5 million molecules that could be readily ordered and purchased. Our HTVS framework leverages the U.S. supercomputing infrastructure, achieving nearly 91% resource utilization and nearly 126 million docking calculations per hour. Downstream biochemical assays validate this Mpro inhibitor with an inhibition constant (Ki) of 2.9 µM (95% CI 2.2, 4.0). Furthermore, using room-temperature X-ray crystallography, we show that MCULE-5948770040 binds to a cleft in the primary binding site of Mpro, forming stable hydrogen bond and hydrophobic interactions. We then used multiple µs-time scale molecular dynamics (MD) simulations and machine learning (ML) techniques to elucidate how the bound ligand alters the conformational states accessed by Mpro, involving motions both proximal and distal to the binding site. Together, our results demonstrate how MCULE-5948770040 inhibits Mpro and offer a springboard for further therapeutic design.

Subjects

COVID-19 , Protease Inhibitors , Antiviral Agents , Coronavirus 3C Proteases , Humans , Molecular Docking Simulation , Molecular Dynamics Simulation , Orotic Acid/analogs & derivatives , Piperazines , SARS-CoV-2

12.

A genomic data resource for predicting antimicrobial resistance from laboratory-derived antimicrobial susceptibility phenotypes.

VanOeffelen, Margo; Nguyen, Marcus; Aytan-Aktug, Derya; Brettin, Thomas; Dietrich, Emily M; Kenyon, Ronald W; Machi, Dustin; Mao, Chunhong; Olson, Robert; Pusch, Gordon D; Shukla, Maulik; Stevens, Rick; Vonstein, Veronika; Warren, Andrew S; Wattam, Alice R; Yoo, Hyunseung; Davis, James J.

Brief Bioinform ; 22(6), 2021 11 05.

Article in English | MEDLINE | ID: mdl-34379107

ABSTRACT

Antimicrobial resistance (AMR) is a major global health threat that affects millions of people each year. Funding agencies worldwide and the global research community have expended considerable capital and effort tracking the evolution and spread of AMR by isolating and sequencing bacterial strains and performing antimicrobial susceptibility testing (AST). For the last several years, we have been capturing these efforts by curating data from the literature and data resources and building a set of assembled bacterial genome sequences that are paired with laboratory-derived AST data. This collection currently contains AST data for over 67 000 genomes encompassing approximately 40 genera and over 100 species. In this paper, we describe the characteristics of this collection, highlighting areas where sampling is comparatively deep or shallow, and showing areas where attention is needed from the research community to improve sampling and tracking efforts. Beyond tracking the evolution and spread of AMR, the data also serve as a useful starting point for building machine learning models for predicting AMR phenotypes. We demonstrate this by describing two machine learning models that are built from the entire dataset to show where the predictive power is comparatively high or low. This AMR metadata collection is freely available and maintained on the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) FTP site ftp://ftp.bvbrc.org/RELEASE_NOTES/PATRIC_genomes_AMR.txt.

Subjects

Computational Biology/methods , Genetic Databases , Microbial Drug Resistance , Genomics/methods , Microbial Sensitivity Tests , Artificial Intelligence , Bacteria/drug effects , Bacteria/genetics , Bacterial Genome , Humans , Laboratories , Machine Learning , Phenotype

13.

Publisher Correction: Converting tabular data into images for deep learning with convolutional neural networks.

Zhu, Yitan; Brettin, Thomas; Xia, Fangfang; Partin, Alexander; Shukla, Maulik; Yoo, Hyunseung; Evrard, Yvonne A; Doroshow, James H; Stevens, Rick L.

Sci Rep ; 11(1): 14036, 2021 Jul 01.

Article in English | MEDLINE | ID: mdl-34211076

14.

Converting tabular data into images for deep learning with convolutional neural networks.

Zhu, Yitan; Brettin, Thomas; Xia, Fangfang; Partin, Alexander; Shukla, Maulik; Yoo, Hyunseung; Evrard, Yvonne A; Doroshow, James H; Stevens, Rick L.

Sci Rep ; 11(1): 11325, 2021 05 31.

Article in English | MEDLINE | ID: mdl-34059739

ABSTRACT

Convolutional neural networks (CNNs) have been successfully used in many applications where important information about data is embedded in the order of features, such as speech and imaging. However, most tabular data do not assume a spatial relationship between features, and thus are unsuitable for modeling using CNNs. To meet this challenge, we develop a novel algorithm, image generator for tabular data (IGTD), to transform tabular data into images by assigning features to pixel positions so that similar features are close to each other in the image. The algorithm searches for an optimized assignment by minimizing the difference between the ranking of distances between features and the ranking of distances between their assigned pixels in the image. We apply IGTD to transform gene expression profiles of cancer cell lines (CCLs) and molecular descriptors of drugs into their respective image representations. Compared with existing transformation methods, IGTD generates compact image representations with better preservation of feature neighborhood structure. Evaluated on benchmark drug screening datasets, CNNs trained on IGTD image representations of CCLs and drugs exhibit a better performance of predicting anti-cancer drug response than both CNNs trained on alternative image representations and prediction models trained on the original tabular data.
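The objective described in this abstract, matching the ranking of feature distances to the ranking of pixel distances, can be sketched as below. This is a simplified illustration, not the published IGTD procedure: the absolute-difference error, the exhaustive pairwise-swap search, and all function names are assumptions.

```python
import numpy as np

def rank_matrix(dist):
    """Replace each pairwise distance by its rank among all pairs (symmetric)."""
    n = dist.shape[0]
    iu = np.triu_indices(n, k=1)
    order = np.argsort(dist[iu], kind="stable")
    pair_ranks = np.empty(order.size)
    pair_ranks[order] = np.arange(1, order.size + 1)
    ranks = np.zeros((n, n))
    ranks[iu] = pair_ranks
    return ranks + ranks.T

def assignment_error(feature_ranks, pixel_ranks, perm):
    """Sum of |rank difference| when feature perm[p] is placed at pixel p."""
    reordered = feature_ranks[np.ix_(perm, perm)]
    iu = np.triu_indices(len(perm), k=1)
    return float(np.abs(reordered[iu] - pixel_ranks[iu]).sum())

def greedy_swaps(feature_ranks, pixel_ranks, n_passes=5):
    """Keep any pairwise swap of two features that lowers the error."""
    n = feature_ranks.shape[0]
    perm = np.arange(n)
    best = assignment_error(feature_ranks, pixel_ranks, perm)
    for _ in range(n_passes):
        improved = False
        for i in range(n):
            for j in range(i + 1, n):
                perm[[i, j]] = perm[[j, i]]          # try the swap
                err = assignment_error(feature_ranks, pixel_ranks, perm)
                if err < best:
                    best, improved = err, True
                else:
                    perm[[i, j]] = perm[[j, i]]      # undo the swap
        if not improved:
            break
    return perm, best

# Toy data: four features with 1-D values, laid out on a 1x4 pixel row.
values = np.array([0.0, 3.0, 1.0, 2.0])
feature_ranks = rank_matrix(np.abs(values[:, None] - values[None, :]))
pixels = np.arange(4.0)
pixel_ranks = rank_matrix(np.abs(pixels[:, None] - pixels[None, :]))
perm, err = greedy_swaps(feature_ranks, pixel_ranks)
print(perm, err)
```

A real image would use 2-D pixel coordinates (for example Euclidean distances on a grid) and a feature distance such as one derived from correlation. Those choices are outside what the abstract specifies.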

Subjects

Deep Learning , Computer-Assisted Image Processing , Software , Tumor Cell Line , Humans

15.

Learning curves for drug response prediction in cancer cell lines.

Partin, Alexander; Brettin, Thomas; Evrard, Yvonne A; Zhu, Yitan; Yoo, Hyunseung; Xia, Fangfang; Jiang, Songhao; Clyde, Austin; Shukla, Maulik; Fonstein, Michael; Doroshow, James H; Stevens, Rick L.

BMC Bioinformatics ; 22(1): 252, 2021 May 17.

Article in English | MEDLINE | ID: mdl-34001007

ABSTRACT

BACKGROUND: Motivated by the size and availability of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating drug response data, a common question is whether the generalization performance of existing prediction models can be further improved with more training data. METHODS: We utilize empirical learning curves for evaluating and comparing the data scaling properties of two neural networks (NNs) and two gradient boosting decision tree (GBDT) models trained on four cell line drug screening datasets. The learning curves are accurately fitted to a power law model, providing a framework for assessing the data scaling behavior of these models. RESULTS: The curves demonstrate that no single model dominates in terms of prediction performance across all datasets and training sizes, thus suggesting that the actual shape of these curves depends on the unique pair of an ML model and a dataset. The multi-input NN (mNN), in which gene expressions of cancer cells and molecular drug descriptors are input into separate subnetworks, outperforms a single-input NN (sNN), where the cell and drug features are concatenated for the input layer. In contrast, a GBDT with hyperparameter tuning exhibits superior performance as compared with both NNs at the lower range of training set sizes for two of the tested datasets, whereas the mNN consistently performs better at the higher range of training sizes. Moreover, the trajectory of the curves suggests that increasing the sample size is expected to further improve prediction scores of both NNs. These observations demonstrate the benefit of using learning curves to evaluate prediction models, providing a broader perspective on the overall data scaling characteristics. CONCLUSIONS: A fitted power law learning curve provides a forward-looking metric for analyzing prediction performance and can serve as a co-design tool to guide experimental biologists and computational scientists in the design of future experiments in prospective research studies.
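Fitting a power law to a learning curve, as this abstract describes, can be sketched in a few lines. The two-parameter form error ≈ a · m^b (with m the training set size) and the log-log least-squares fit are assumptions chosen for simplicity. The abstract does not state the exact functional form or fitting procedure used, and the data points below are synthetic.

```python
import numpy as np

def fit_power_law(train_sizes, errors):
    """Fit error ≈ a * m**b by linear least squares in log-log space."""
    m = np.asarray(train_sizes, dtype=float)
    e = np.asarray(errors, dtype=float)
    b, log_a = np.polyfit(np.log(m), np.log(e), deg=1)
    return float(np.exp(log_a)), float(b)

def extrapolate(a, b, m):
    """Predicted error at training size m under the fitted power law."""
    return a * float(m) ** b

# Synthetic learning curve generated from a known power law.
sizes = np.array([100, 200, 400, 800])
errs = 2.0 * sizes ** -0.5

a, b = fit_power_law(sizes, errs)
print(a, b)
print(extrapolate(a, b, 1600))  # projected error if the dataset doubles again
```

A negative exponent b means the error keeps falling as data is added. Comparing the extrapolated value with the cost of generating more data is one way such a curve could inform experiment planning.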

Subjects

Neoplasms , Pharmaceutical Preparations , Cell Line , Learning Curve , Machine Learning , Neoplasms/drug therapy , Neoplasms/genetics , Prospective Studies

16.

Ensemble transfer learning for the prediction of anti-cancer drug response.

Zhu, Yitan; Brettin, Thomas; Evrard, Yvonne A; Partin, Alexander; Xia, Fangfang; Shukla, Maulik; Yoo, Hyunseung; Doroshow, James H; Stevens, Rick L.

Sci Rep ; 10(1): 18040, 2020 10 22.

Article in English | MEDLINE | ID: mdl-33093487

ABSTRACT

Transfer learning, which transfers patterns learned on a source dataset to a related target dataset for constructing prediction models, has been shown effective in many applications. In this paper, we investigate whether transfer learning can be used to improve the performance of anti-cancer drug response prediction models. Previous transfer learning studies for drug response prediction focused on building models to predict the response of tumor cells to a specific drug treatment. We target the more challenging task of building general prediction models that can make predictions for both new tumor cells and new drugs. Uniquely, we investigate the power of transfer learning for three drug response prediction applications including drug repurposing, precision oncology, and new drug development, through different data partition schemes in cross-validation. We extend the classic transfer learning framework through ensemble and demonstrate its general utility with three representative prediction algorithms including a gradient boosting model and two deep neural networks. The ensemble transfer learning framework is tested on benchmark in vitro drug screening datasets. The results demonstrate that our framework broadly improves the prediction performance in all three drug response prediction applications with all three prediction algorithms.

Subjects

Antineoplastic Agents/pharmacology , Datasets as Topic , Deep Learning , Antitumor Drug Screening Assays , Neoplasms/drug therapy , Neoplasms/pathology , Algorithms , Antineoplastic Agents/therapeutic use , Drug Development , Drug Repositioning , Humans , Biological Models , Computer Neural Networks , Precision Medicine

17.

Enhanced Co-Expression Extrapolation (COXEN) Gene Selection Method for Building Anti-Cancer Drug Response Prediction Models.

Zhu, Yitan; Brettin, Thomas; Evrard, Yvonne A; Xia, Fangfang; Partin, Alexander; Shukla, Maulik; Yoo, Hyunseung; Doroshow, James H; Stevens, Rick L.

Genes (Basel) ; 11(9), 2020 09 11.

Article in English | MEDLINE | ID: mdl-32933072

ABSTRACT

The co-expression extrapolation (COXEN) method has been successfully used in multiple studies to select genes for predicting the response of tumor cells to a specific drug treatment. Here, we enhance the COXEN method to select genes that are predictive of the efficacies of multiple drugs for building general drug response prediction models that are not specific to a particular drug. The enhanced COXEN method first ranks the genes according to their prediction power for each individual drug and then takes a union of top predictive genes of all the drugs, among which the algorithm further selects genes whose co-expression patterns are well preserved between cancer cases for building prediction models. We apply the proposed method on benchmark in vitro drug screening datasets and compare the performance of prediction models built based on the genes selected by the enhanced COXEN method to that of models built on genes selected by the original COXEN method and randomly picked genes. Models built with the enhanced COXEN method always present a statistically significantly improved prediction performance (adjusted p-value ≤ 0.05). Our results demonstrate the enhanced COXEN method can dramatically increase the power of gene expression data for predicting drug response.

Subjects

Antineoplastic Agents/pharmacology; Biomarkers, Tumor/genetics; Drug Screening Assays, Antitumor/methods; Gene Expression Profiling/methods; Models, Statistical; Neoplasms/drug therapy; Neoplasms/genetics; Algorithms; Humans

18.

The PATRIC Bioinformatics Resource Center: expanding data and analysis capabilities.

Davis, James J; Wattam, Alice R; Aziz, Ramy K; Brettin, Thomas; Butler, Ralph; Butler, Rory M; Chlenski, Philippe; Conrad, Neal; Dickerman, Allan; Dietrich, Emily M; Gabbard, Joseph L; Gerdes, Svetlana; Guard, Andrew; Kenyon, Ronald W; Machi, Dustin; Mao, Chunhong; Murphy-Olson, Dan; Nguyen, Marcus; Nordberg, Eric K; Olsen, Gary J; Olson, Robert D; Overbeek, Jamie C; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D; Shukla, Maulik; Thomas, Chris; VanOeffelen, Margo; Vonstein, Veronika; Warren, Andrew S; Xia, Fangfang; Xie, Dawen; Yoo, Hyunseung; Stevens, Rick.

Nucleic Acids Res; 48(D1): D606-D612, 2020 Jan 08.

Article in English | MEDLINE | ID: mdl-31667520

ABSTRACT

The PathoSystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center funded by the National Institute of Allergy and Infectious Diseases (https://www.patricbrc.org). PATRIC supports bioinformatic analyses of all bacteria with a special emphasis on pathogens, offering a rich comparative analysis environment that provides users with access to over 250 000 uniformly annotated and publicly available genomes with curated metadata. PATRIC offers web-based visualization and comparative analysis tools, a private workspace in which users can analyze their own data in the context of the public collections, services that streamline complex bioinformatic workflows and command-line tools for bulk data analysis. Over the past several years, as genomic and other omics-related experiments have become more cost-effective and widespread, we have observed considerable growth in the usage of and demand for easy-to-use, publicly available bioinformatic tools and services. Here we report the recent updates to the PATRIC resource, including new web-based comparative analysis tools, eight new services and the release of a command-line interface to access, query and analyze data.

Subjects

Bacteria/genetics; Computational Biology/methods; Databases, Genetic; Algorithms; Animals; Caenorhabditis elegans/genetics; Chickens/genetics; Drosophila melanogaster/genetics; Host-Pathogen Interactions/genetics; Humans; Internet; Macaca mulatta/genetics; Metagenomics; Mice; National Institute of Allergy and Infectious Diseases (U.S.); Phenotype; Phylogeny; Rats; Swine/genetics; United States; Zebrafish/genetics

19.

AI Meets Exascale Computing: Advancing Cancer Research With Large-Scale High Performance Computing.

Bhattacharya, Tanmoy; Brettin, Thomas; Doroshow, James H; Evrard, Yvonne A; Greenspan, Emily J; Gryshuk, Amy L; Hoang, Thuc T; Lauzon, Carolyn B Vea; Nissley, Dwight; Penberthy, Lynne; Stahlberg, Eric; Stevens, Rick; Streitz, Fred; Tourassi, Georgia; Xia, Fangfang; Zaki, George.

Front Oncol; 9: 984, 2019.

Article in English | MEDLINE | ID: mdl-31632915

ABSTRACT

The application of data science in cancer research has been boosted by major advances in three primary areas: (1) Data: diversity, amount, and availability of biomedical data; (2) Advances in Artificial Intelligence (AI) and Machine Learning (ML) algorithms that enable learning from complex, large-scale data; and (3) Advances in computer architectures allowing unprecedented acceleration of simulation and machine learning algorithms. These advances help build in silico ML models that can provide transformative insights from data including: molecular dynamics simulations, next-generation sequencing, omics, imaging, and unstructured clinical text documents. Unique challenges persist, however, in building ML models related to cancer, including: (1) access, sharing, labeling, and integration of multimodal and multi-institutional data across different cancer types; (2) developing AI models for cancer research capable of scaling on next generation high performance computers; and (3) assessing robustness and reliability in the AI models. In this paper, we review the National Cancer Institute (NCI)-Department of Energy (DOE) collaboration, Joint Design of Advanced Computing Solutions for Cancer (JDACS4C), a multi-institution collaborative effort focused on advancing computing and data technologies to accelerate cancer research on three levels: molecular, cellular, and population. This collaboration integrates various types of generated data, pre-exascale compute resources, and advances in ML models to increase understanding of basic cancer biology, identify promising new treatment options, predict outcomes, and eventually prescribe specialized treatments for patients with cancer.

20.

Cystic Fibrosis Rapid Response: Translating Multi-omics Data into Clinically Relevant Information.

Cobián Güemes, Ana Georgina; Lim, Yan Wei; Quinn, Robert A; Conrad, Douglas J; Benler, Sean; Maughan, Heather; Edwards, Rob; Brettin, Thomas; Cantú, Vito Adrian; Cuevas, Daniel; Hamidi, Rohaum; Dorrestein, Pieter; Rohwer, Forest.

mBio; 10(2), 2019 Apr 16.

Article in English | MEDLINE | ID: mdl-30992350

ABSTRACT

Pulmonary exacerbations are the leading cause of death in cystic fibrosis (CF) patients. To track microbial dynamics during acute exacerbations, a CF rapid response (CFRR) strategy was developed. The CFRR relies on viromics, metagenomics, metatranscriptomics, and metabolomics data to rapidly monitor active members of the viral and microbial community during acute CF exacerbations. To highlight CFRR, a case study of a CF patient is presented, in which an abrupt decline in lung function characterized a fatal exacerbation. The microbial community in the patient's lungs was closely monitored through the multi-omics strategy, which led to the identification of pathogenic shigatoxigenic Escherichia coli (STEC) expressing Shiga toxin. This case study illustrates the potential for the CFRR to deconstruct complicated disease dynamics and provide clinicians with alternative treatments to improve the outcomes of pulmonary exacerbations and expand the life spans of individuals with CF.

IMPORTANCE: Proper management of polymicrobial infections in patients with cystic fibrosis (CF) has extended their life span. Information about the composition and dynamics of each patient's microbial community aids in the selection of appropriate treatment of pulmonary exacerbations. We propose the cystic fibrosis rapid response (CFRR) as a fast approach to determine viral and microbial community composition and activity during CF pulmonary exacerbations. The CFRR potential is illustrated with a case study in which a cystic fibrosis fatal exacerbation was characterized by the presence of shigatoxigenic Escherichia coli. The incorporation of the CFRR within the CF clinic could increase the life span and quality of life of CF patients.

Subjects

Cystic Fibrosis/complications; Disease Progression; Escherichia coli Infections/diagnosis; Genomics; Lung/microbiology; Metabolomics; Adult; Case-Control Studies; Coinfection/complications; Cystic Fibrosis/microbiology; Disease Management; Fatal Outcome; Gene Expression Profiling; Humans; Lung/physiopathology; Male; Metabolome; Metagenome; Microbiota; Shiga Toxin/genetics; Shiga-Toxigenic Escherichia coli/genetics; Shiga-Toxigenic Escherichia coli/pathogenicity
