Results 1 - 20 of 164
1.
Nucleic Acids Res ; 51(D1): D678-D689, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36350631

ABSTRACT

The National Institute of Allergy and Infectious Diseases (NIAID) established the Bioinformatics Resource Center (BRC) program to assist researchers with analyzing the growing body of genome sequence and other omics-related data. In this report, we describe the merger of the PAThosystems Resource Integration Center (PATRIC), the Influenza Research Database (IRD) and the Virus Pathogen Database and Analysis Resource (ViPR) BRCs to form the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) https://www.bv-brc.org/. The combined BV-BRC leverages the functionality of the bacterial and viral resources to provide a unified data model, enhanced web-based visualization and analysis tools, bioinformatics services, and a powerful suite of command line tools that benefit the bacterial and viral research communities.


Subject(s)
Genomics , Software , Viruses , Humans , Bacteria/genetics , Computational Biology , Databases, Genetic , Influenza, Human , Viruses/genetics
2.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34524425

ABSTRACT

To enable personalized cancer treatment, machine learning models have been developed to predict drug response as a function of tumor and drug features. However, most algorithm development efforts have relied on cross-validation within a single study to assess model accuracy. While an essential first step, cross-validation within a biological data set typically provides an overly optimistic estimate of the prediction performance on independent test sets. To provide a more rigorous assessment of model generalizability between different studies, we use machine learning to analyze five publicly available cell line-based data sets: National Cancer Institute 60, Cancer Therapeutics Response Portal (CTRP), Genomics of Drug Sensitivity in Cancer, Cancer Cell Line Encyclopedia and Genentech Cell Line Screening Initiative (gCSI). Based on observed experimental variability across studies, we explore estimates of prediction upper bounds. We report performance results of a variety of machine learning models, with a multitasking deep neural network achieving the best cross-study generalizability. By multiple measures, models trained on CTRP yield the most accurate predictions on the remaining testing data, and gCSI is the most predictable among the cell line data sets included in this study. With these experiments and further simulations on partial data, two lessons emerge: (1) differences in viability assays can limit model generalizability across studies and (2) drug diversity, more than tumor diversity, is crucial for raising model generalizability in preclinical screening.
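The cross-study evaluation described in this abstract can be sketched as a train-on-one, test-on-all matrix. This is a minimal illustration, not the authors' pipeline: the function names, the mean-predictor model and the toy data are all assumptions made for the example.

```python
from statistics import mean

def cross_study_matrix(studies, fit, score):
    """Train on each study and score on every study.

    studies: dict mapping study name -> (features, responses).
    Returns a dict mapping (train_name, test_name) -> score.
    Diagonal entries are within-study scores and tend to be optimistic.
    """
    results = {}
    for train_name, (x_train, y_train) in studies.items():
        model = fit(x_train, y_train)
        for test_name, (x_test, y_test) in studies.items():
            results[(train_name, test_name)] = score(model, x_test, y_test)
    return results

# Hypothetical stand-ins: a model that always predicts the training mean,
# scored by negative mean absolute error (higher is better).
def fit_mean(x, y):
    return mean(y)

def neg_mae(model, x, y):
    return -mean(abs(model - value) for value in y)

# Toy data only; study names "A" and "B" are placeholders.
toy_studies = {
    "A": ([0, 1], [1.0, 3.0]),
    "B": ([0, 1], [2.0, 2.0]),
}
matrix = cross_study_matrix(toy_studies, fit_mean, neg_mae)
```

Comparing off-diagonal entries such as `matrix[("A", "B")]` with the diagonal gives a rough view of how much performance drops when a model leaves the study it was trained on.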


Subject(s)
Neoplasms , Algorithms , Cell Line , Humans , Machine Learning , Neoplasms/drug therapy , Neoplasms/genetics , Neural Networks, Computer
3.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34379107

ABSTRACT

Antimicrobial resistance (AMR) is a major global health threat that affects millions of people each year. Funding agencies worldwide and the global research community have expended considerable capital and effort tracking the evolution and spread of AMR by isolating and sequencing bacterial strains and performing antimicrobial susceptibility testing (AST). For the last several years, we have been capturing these efforts by curating data from the literature and data resources and building a set of assembled bacterial genome sequences that are paired with laboratory-derived AST data. This collection currently contains AST data for over 67 000 genomes encompassing approximately 40 genera and over 100 species. In this paper, we describe the characteristics of this collection, highlighting areas where sampling is comparatively deep or shallow, and showing areas where attention is needed from the research community to improve sampling and tracking efforts. Beyond tracking the evolution and spread of AMR, the data also serve as a useful starting point for building machine learning models for predicting AMR phenotypes. We demonstrate this by describing two machine learning models that are built from the entire dataset to show where the predictive power is comparatively high or low. This AMR metadata collection is freely available and maintained on the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) FTP site ftp://ftp.bvbrc.org/RELEASE_NOTES/PATRIC_genomes_AMR.txt.


Subject(s)
Computational Biology/methods , Databases, Genetic , Drug Resistance, Microbial , Genomics/methods , Microbial Sensitivity Tests , Artificial Intelligence , Bacteria/drug effects , Bacteria/genetics , Genome, Bacterial , Humans , Laboratories , Machine Learning , Phenotype
4.
J Chem Inf Model ; 62(1): 116-128, 2022 01 10.
Article in English | MEDLINE | ID: mdl-34793155

ABSTRACT

Despite the recent availability of vaccines against the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the search for inhibitory therapeutic agents has assumed importance especially in the context of emerging new viral variants. In this paper, we describe the discovery of a novel noncovalent small-molecule inhibitor, MCULE-5948770040, that binds to and inhibits the SARS-CoV-2 main protease (Mpro) by employing a scalable high-throughput virtual screening (HTVS) framework and a targeted compound library of over 6.5 million molecules that could be readily ordered and purchased. Our HTVS framework leverages the U.S. supercomputing infrastructure achieving nearly 91% resource utilization and nearly 126 million docking calculations per hour. Downstream biochemical assays validate this Mpro inhibitor with an inhibition constant (Ki) of 2.9 µM (95% CI 2.2, 4.0). Furthermore, using room-temperature X-ray crystallography, we show that MCULE-5948770040 binds to a cleft in the primary binding site of Mpro forming stable hydrogen bond and hydrophobic interactions. We then used multiple µs-time scale molecular dynamics (MD) simulations and machine learning (ML) techniques to elucidate how the bound ligand alters the conformational states accessed by Mpro, involving motions both proximal and distal to the binding site. Together, our results demonstrate how MCULE-5948770040 inhibits Mpro and offers a springboard for further therapeutic design.


Subject(s)
COVID-19 , Protease Inhibitors , Antiviral Agents , Coronavirus 3C Proteases , Humans , Molecular Docking Simulation , Molecular Dynamics Simulation , Orotic Acid/analogs & derivatives , Piperazines , SARS-CoV-2
5.
Nucleic Acids Res ; 48(D1): D606-D612, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31667520

ABSTRACT

The PathoSystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center funded by the National Institute of Allergy and Infectious Diseases (https://www.patricbrc.org). PATRIC supports bioinformatic analyses of all bacteria with a special emphasis on pathogens, offering a rich comparative analysis environment that provides users with access to over 250 000 uniformly annotated and publicly available genomes with curated metadata. PATRIC offers web-based visualization and comparative analysis tools, a private workspace in which users can analyze their own data in the context of the public collections, services that streamline complex bioinformatic workflows and command-line tools for bulk data analysis. Over the past several years, as genomic and other omics-related experiments have become more cost-effective and widespread, we have observed considerable growth in the usage of and demand for easy-to-use, publicly available bioinformatic tools and services. Here we report the recent updates to the PATRIC resource, including new web-based comparative analysis tools, eight new services and the release of a command-line interface to access, query and analyze data.


Subject(s)
Bacteria/genetics , Computational Biology/methods , Databases, Genetic , Algorithms , Animals , Caenorhabditis elegans/genetics , Chickens/genetics , Drosophila melanogaster/genetics , Host-Pathogen Interactions/genetics , Humans , Internet , Macaca mulatta/genetics , Metagenomics , Mice , National Institute of Allergy and Infectious Diseases (U.S.) , Phenotype , Phylogeny , Rats , Swine/genetics , United States , Zebrafish/genetics
6.
BMC Bioinformatics ; 22(1): 252, 2021 May 17.
Article in English | MEDLINE | ID: mdl-34001007

ABSTRACT

BACKGROUND: Motivated by the size and availability of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating drug response data, a common question is whether the generalization performance of existing prediction models can be further improved with more training data. METHODS: We utilize empirical learning curves for evaluating and comparing the data scaling properties of two neural networks (NNs) and two gradient boosting decision tree (GBDT) models trained on four cell line drug screening datasets. The learning curves are accurately fitted to a power law model, providing a framework for assessing the data scaling behavior of these models. RESULTS: The curves demonstrate that no single model dominates in terms of prediction performance across all datasets and training sizes, thus suggesting that the actual shape of these curves depends on the unique pair of an ML model and a dataset. The multi-input NN (mNN), in which gene expressions of cancer cells and molecular drug descriptors are input into separate subnetworks, outperforms a single-input NN (sNN), where the cell and drug features are concatenated for the input layer. In contrast, a GBDT with hyperparameter tuning exhibits superior performance as compared with both NNs at the lower range of training set sizes for two of the tested datasets, whereas the mNN consistently performs better at the higher range of training sizes. Moreover, the trajectory of the curves suggests that increasing the sample size is expected to further improve prediction scores of both NNs. These observations demonstrate the benefit of using learning curves to evaluate prediction models, providing a broader perspective on the overall data scaling characteristics. 
CONCLUSIONS: A fitted power law learning curve provides a forward-looking metric for analyzing prediction performance and can serve as a co-design tool to guide experimental biologists and computational scientists in the design of future experiments in prospective research studies.
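A fitted power-law learning curve of the kind this abstract describes can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the two-parameter form error ≈ a · m^(−b) with no asymptotic offset (the abstract does not give the exact parameterization), fits it by ordinary least squares in log-log space, and runs on synthetic numbers rather than any data from the study.

```python
import math

def fit_power_law(sizes, errors):
    """Fit errors ~ a * sizes**(-b) by least squares on log-transformed data.

    Returns (a, b). Assumes all sizes and errors are strictly positive.
    """
    xs = [math.log(m) for m in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope

def predict_error(a, b, size):
    """Extrapolate the fitted curve to a new training set size."""
    return a * size ** (-b)

# Synthetic learning curve generated from a = 2.0, b = 0.3 (illustrative only).
train_sizes = [1000, 2000, 4000, 8000, 16000]
val_errors = [2.0 * m ** -0.3 for m in train_sizes]

a_hat, b_hat = fit_power_law(train_sizes, val_errors)
projected = predict_error(a_hat, b_hat, 32000)
```

The extrapolated value `projected` is the kind of forward-looking estimate the abstract refers to: it suggests how much error might fall if the training set were doubled again, provided the power-law trend continues to hold.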


Subject(s)
Neoplasms , Pharmaceutical Preparations , Cell Line , Learning Curve , Machine Learning , Neoplasms/drug therapy , Neoplasms/genetics , Prospective Studies
7.
Brief Bioinform ; 20(4): 1094-1102, 2019 07 19.
Article in English | MEDLINE | ID: mdl-28968762

ABSTRACT

The Pathosystems Resource Integration Center (PATRIC, www.patricbrc.org) is designed to provide researchers with the tools and services that they need to perform genomic and other 'omic' data analyses. In response to mounting concern over antimicrobial resistance (AMR), the PATRIC team has been developing new tools that help researchers understand AMR and its genetic determinants. To support comparative analyses, we have added AMR phenotype data to over 15 000 genomes in the PATRIC database, often assembling genomes from reads in public archives and collecting their associated AMR panel data from the literature to augment the collection. We have also been using this collection of AMR metadata to build machine learning-based classifiers that can predict the AMR phenotypes and the genomic regions associated with resistance for genomes being submitted to the annotation service. Likewise, we have undertaken a large AMR protein annotation effort by manually curating data from the literature and public repositories. This collection of 7370 AMR reference proteins, which contains many protein annotations (functional roles) that are unique to PATRIC and RAST, has been manually curated so that it projects stably across genomes. The collection currently projects to 1 610 744 proteins in the PATRIC database. Finally, the PATRIC Web site has been expanded to enable AMR-based custom page views so that researchers can easily explore AMR data and design experiments based on whole genomes or individual genes.


Subject(s)
Computational Biology/methods , Databases, Genetic , Drug Resistance, Microbial/genetics , Systems Integration , Computational Biology/trends , Databases, Genetic/statistics & numerical data , Genome, Microbial , Humans , Internet , Molecular Sequence Annotation
8.
Nucleic Acids Res ; 45(D1): D535-D542, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899627

ABSTRACT

The Pathosystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center (https://www.patricbrc.org). Recent changes to PATRIC include a redesign of the web interface and some new services that provide users with a platform that takes them from raw reads to an integrated analysis experience. The redesigned interface allows researchers direct access to tools and data, and the emphasis has changed to user-created genome-groups, with detailed summaries and views of the data that researchers have selected. Perhaps the biggest change has been the enhanced capability for researchers to analyze their private data and compare it to the available public data. Researchers can assemble their raw sequence reads and annotate the contigs using RASTtk. PATRIC also provides services for RNA-Seq, variation, model reconstruction and differential expression analysis, all delivered through an updated private workspace. Private data can be compared by 'virtual integration' to any of PATRIC's public data. The number of genomes available for comparison in PATRIC has expanded to over 80 000, with a special emphasis on genomes with antimicrobial resistance data. PATRIC uses this data to improve both subsystem annotation and k-mer classification, and tags new genomes as having signatures that indicate susceptibility or resistance to specific antibiotics.


Subject(s)
Bacteria/genetics , Computational Biology/methods , Databases, Genetic , Genome, Bacterial , Genomics/methods , Anti-Bacterial Agents/pharmacology , Bacteria/drug effects , Bacteria/metabolism , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Drug Resistance, Bacterial , Molecular Sequence Annotation , Proteome , Proteomics/methods , Software , Web Browser
9.
BMC Bioinformatics ; 19(Suppl 18): 491, 2018 Dec 21.
Article in English | MEDLINE | ID: mdl-30577736

ABSTRACT

BACKGROUND: Current multi-petaflop supercomputers are powerful systems, but present challenges when faced with problems requiring large machine learning workflows. Complex algorithms running at system scale, often with different patterns that require disparate software packages and complex data flows cause difficulties in assembling and managing large experiments on these machines. RESULTS: This paper presents a workflow system that makes progress on scaling machine learning ensembles, specifically in this first release, ensembles of deep neural networks that address problems in cancer research across the atomistic, molecular and population scales. The initial release of the application framework that we call CANDLE/Supervisor addresses the problem of hyper-parameter exploration of deep neural networks. CONCLUSIONS: Initial results demonstrating CANDLE on DOE systems at ORNL, ANL and NERSC (Titan, Theta and Cori, respectively) demonstrate both scaling and multi-platform execution.


Subject(s)
Early Detection of Cancer/methods , Machine Learning/trends , Neoplasms/diagnosis , Humans , Neoplasms/pathology , Neural Networks, Computer , Workflow
10.
BMC Bioinformatics ; 19(Suppl 18): 486, 2018 Dec 21.
Article in English | MEDLINE | ID: mdl-30577754

ABSTRACT

BACKGROUND: The National Cancer Institute drug pair screening effort against 60 well-characterized human tumor cell lines (NCI-60) presents an unprecedented resource for modeling combinational drug activity. RESULTS: We present a computational model for predicting cell line response to a subset of drug pairs in the NCI-ALMANAC database. Based on residual neural networks for encoding features as well as predicting tumor growth, our model explains 94% of the response variance. While our best result is achieved with a combination of molecular feature types (gene expression, microRNA and proteome), we show that most of the predictive power comes from drug descriptors. To further demonstrate value in detecting anticancer therapy, we rank the drug pairs for each cell line based on model predicted combination effect and recover 80% of the top pairs with enhanced activity. CONCLUSIONS: We present promising results in applying deep learning to predicting combinational drug response. Our feature analysis indicates screening data involving more cell lines are needed for the models to make better use of molecular features.


Subject(s)
Deep Learning/trends , Drug Evaluation, Preclinical/methods , Cell Line, Tumor , Humans , National Cancer Institute (U.S.) , Neural Networks, Computer , United States
11.
Proc Natl Acad Sci U S A ; 112(21): E2813-9, 2015 May 26.
Article in English | MEDLINE | ID: mdl-25964331

ABSTRACT

Vibrio species are both ubiquitous and abundant in marine coastal waters, estuaries, ocean sediment, and aquaculture settings worldwide. We report here the isolation, characterization, and genome sequence of a novel Vibrio species, Vibrio antiquarius, isolated from a mesophilic bacterial community associated with hydrothermal vents located along the East Pacific Rise, near the southwest coast of Mexico. Genomic and phenotypic analysis revealed V. antiquarius is closely related to pathogenic Vibrio species, namely Vibrio alginolyticus, Vibrio parahaemolyticus, Vibrio harveyi, and Vibrio vulnificus, but sufficiently divergent to warrant a separate species status. The V. antiquarius genome encodes genes and operons with ecological functions relevant to the environment conditions of the deep sea and also harbors factors known to be involved in human disease caused by freshwater, coastal, and brackish water vibrios. The presence of virulence factors in this deep-sea Vibrio species suggests a far more fundamental role of these factors for their bacterial host. Comparative genomics revealed a variety of genomic events that may have provided an important driving force in V. antiquarius evolution, facilitating response to environmental conditions of the deep sea.


Subject(s)
Hydrothermal Vents/microbiology , Vibrio/isolation & purification , Vibrio/pathogenicity , Evolution, Molecular , Genome, Bacterial , Humans , Phylogeny , RNA, Bacterial/genetics , RNA, Ribosomal, 16S/genetics , Seawater/microbiology , Species Specificity , Vibrio/genetics , Virulence/genetics
12.
Nucleic Acids Res ; 40(Web Server issue): W604-8, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22700702

ABSTRACT

A web services application programming interface (API) was developed to provide programmatic access to the regulatory interactions accumulated in the RegPrecise database (http://regprecise.lbl.gov), a core resource on transcriptional regulation for the microbial domain of the Department of Energy (DOE) Systems Biology Knowledgebase. RegPrecise captures and visualizes regulogs, sets of genes controlled by orthologous regulators in several closely related bacterial genomes, that were reconstructed by comparative genomics. The current release of RegPrecise 2.0 includes >1400 regulogs controlled either by protein transcription factors or by conserved ribonucleic acid regulatory motifs in >250 genomes from 24 taxonomic groups of bacteria. The reference regulons accumulated in RegPrecise can serve as a basis for automatic annotation of regulatory interactions in newly sequenced genomes. The developed API provides efficient access to the RegPrecise data through a comprehensive set of 14 web service resources. The RegPrecise web services API is freely accessible at http://regprecise.lbl.gov/RegPrecise/services.jsp with no login requirements.


Subject(s)
Gene Expression Regulation, Bacterial , Regulon , Software , Transcription, Genetic , Gene Regulatory Networks , Genome, Bacterial , Genomics/methods , Internet , Nucleotide Motifs , Regulatory Sequences, Ribonucleic Acid , Transcription Factors/metabolism , User-Computer Interface
13.
Cancers (Basel) ; 16(3)2024 Jan 26.
Article in English | MEDLINE | ID: mdl-38339281

ABSTRACT

It is well-known that cancers of the same histology type can respond differently to a treatment. Thus, computational drug response prediction is of paramount importance for both preclinical drug screening studies and clinical treatment design. To build drug response prediction models, treatment response data need to be generated through screening experiments and used as input to train the prediction models. In this study, we investigate various active learning strategies of selecting experiments to generate response data for the purposes of (1) improving the performance of drug response prediction models built on the data and (2) identifying effective treatments. Here, we focus on constructing drug-specific response prediction models for cancer cell lines. Various approaches have been designed and applied to select cell lines for screening, including a random, greedy, uncertainty, diversity, combination of greedy and uncertainty, sampling-based hybrid, and iteration-based hybrid approach. All of these approaches are evaluated and compared using two criteria: (1) the number of identified hits that are selected experiments validated to be responsive, and (2) the performance of the response prediction model trained on the data of selected experiments. The analysis was conducted for 57 drugs and the results show a significant improvement on identifying hits using active learning approaches compared with the random and greedy sampling method. Active learning approaches also show an improvement on response prediction performance for some of the drugs and analysis runs compared with the greedy sampling method.
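Three of the selection strategies named in this abstract (random, greedy and uncertainty) can be sketched as below. This is an illustrative reading, not the study's implementation: it assumes each candidate cell line has a list of ensemble predictions, that a higher score means a stronger predicted response, that greedy ranks by mean prediction and that uncertainty ranks by the spread of predictions. The function name and data layout are hypothetical.

```python
import random
import statistics

def select_candidates(predictions, k, strategy="uncertainty", seed=0):
    """Pick k cell lines to screen next.

    predictions: dict mapping cell line name -> list of predicted response
    scores from an ensemble of models (assumption: higher = more responsive).
    """
    names = sorted(predictions)
    if strategy == "random":
        return random.Random(seed).sample(names, k)
    if strategy == "greedy":
        # Favor candidates the ensemble expects to respond most strongly.
        ranking_key = lambda name: statistics.mean(predictions[name])
    elif strategy == "uncertainty":
        # Favor candidates on which the ensemble members disagree most.
        ranking_key = lambda name: statistics.pstdev(predictions[name])
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return sorted(names, key=ranking_key, reverse=True)[:k]

# Toy ensemble output for three placeholder cell lines.
toy_predictions = {
    "line_A": [0.9, 0.9, 0.9],
    "line_B": [0.1, 0.9, 0.5],
    "line_C": [0.2, 0.2, 0.3],
}
greedy_pick = select_candidates(toy_predictions, 1, strategy="greedy")
uncertain_pick = select_candidates(toy_predictions, 1, strategy="uncertainty")
```

In a full active learning loop the selected cell lines would be screened, their measured responses added to the training set, and the model retrained before the next selection round.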

14.
Methods Mol Biol ; 2802: 547-571, 2024.
Article in English | MEDLINE | ID: mdl-38819571

ABSTRACT

As genomic and related data continue to expand, research biologists are often hampered by the computational hurdles required to analyze their data. The National Institute of Allergy and Infectious Diseases (NIAID) established the Bioinformatics Resource Centers (BRC) to assist researchers with their analysis of genome sequence and other omics-related data. Recently, the PAThosystems Resource Integration Center (PATRIC), the Influenza Research Database (IRD), and the Virus Pathogen Database and Analysis Resource (ViPR) BRCs merged to form the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) at https://www.bv-brc.org/. The combined BV-BRC leverages the functionality of the original resources for bacterial and viral research communities with a unified data model, enhanced web-based visualization and analysis tools, and bioinformatics services. Here we demonstrate how antimicrobial resistance data can be analyzed in the new resource.


Subject(s)
Bacteria , Computational Biology , Databases, Genetic , Drug Resistance, Bacterial , Genomics , Genomics/methods , Computational Biology/methods , Drug Resistance, Bacterial/genetics , Bacteria/genetics , Bacteria/drug effects , Humans , Software , Genome, Bacterial , Anti-Bacterial Agents/pharmacology , Web Browser , United States , National Institute of Allergy and Infectious Diseases (U.S.)
15.
Proc Natl Acad Sci U S A ; 107(49): 21134-9, 2010 Dec 07.
Article in English | MEDLINE | ID: mdl-21078967

ABSTRACT

Whether Vibrio mimicus is a variant of Vibrio cholerae or a separate species has been the subject of taxonomic controversy. A genomic analysis was undertaken to resolve the issue. The genomes of V. mimicus MB451, a clinical isolate, and VM223, an environmental isolate, comprise ca. 4,347,971 and 4,313,453 bp and encode 3,802 and 3,290 ORFs, respectively. As in other vibrios, chromosome I (C-I) predominantly contains genes necessary for growth and viability, whereas chromosome II (C-II) bears genes for adaptation to environmental change. C-I harbors many virulence genes, including some not previously reported in V. mimicus, such as mannose-sensitive hemagglutinin (MSHA), and enterotoxigenic hemolysin (HlyA); C-II encodes a variant of Vibrio pathogenicity island 2 (VPI-2), and Vibrio seventh pandemic island II (VSP-II) cluster of genes. Extensive genomic rearrangement in C-II indicates it is a hot spot for evolution and genesis of speciation for the genus Vibrio. The number of virulence regions discovered in this study (VSP-II, MSHA, HlyA, type IV pilin, PilE, and integron integrase, IntI4) with no notable difference in potential virulence genes between clinical and environmental strains suggests these genes also may play a role in the environment and that pathogenic strains may arise in the environment. Significant genome synteny with prototypic pre-seventh pandemic strains of V. cholerae was observed, and the results of phylogenetic analysis support the hypothesis that, in the course of evolution, V. mimicus and V. cholerae diverged from a common ancestor with a prototypic sixth pandemic genomic backbone.


Subject(s)
Genomics/methods , Vibrio mimicus/genetics , Chromosomes, Bacterial , Genes, Bacterial , Genetic Speciation , Genome, Bacterial , Synteny , Vibrio cholerae/genetics
16.
Front Med (Lausanne) ; 10: 1086097, 2023.
Article in English | MEDLINE | ID: mdl-36873878

ABSTRACT

Cancer claims millions of lives yearly worldwide. While many therapies have been made available in recent years, by and large cancer remains unsolved. Exploiting computational predictive models to study and treat cancer holds great promise in improving drug development and personalized design of treatment plans, ultimately suppressing tumors, alleviating suffering, and prolonging lives of patients. A wave of recent papers demonstrates promising results in predicting cancer response to drug treatments while utilizing deep learning methods. These papers investigate diverse data representations, neural network architectures, learning methodologies, and evaluation schemes. However, deciphering promising predominant and emerging trends is difficult due to the variety of explored methods and the lack of a standardized framework for comparing drug response prediction models. To obtain a comprehensive landscape of deep learning methods, we conducted an extensive search and analysis of deep learning models that predict the response to single drug treatments. A total of 61 deep learning-based models have been curated, and summary plots were generated. Based on the analysis, observable patterns and prevalence of methods have been revealed. This review allows readers to better understand the current state of the field and identify major challenges and promising solution paths.

17.
Metab Eng Commun ; 17: e00225, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37435441

ABSTRACT

The goal of this study is to develop a general strategy for bacterial engineering using an integrated synthetic biology and machine learning (ML) approach. This strategy was developed in the context of increasing L-threonine production in Escherichia coli ATCC 21277. A set of 16 genes was initially selected based on metabolic pathway relevance to threonine biosynthesis and used for combinatorial cloning to construct a set of 385 strains to generate training data (i.e., a range of L-threonine titers linked to each of the specific gene combinations). Hybrid (regression/classification) deep learning (DL) models were developed and used to predict additional gene combinations in subsequent rounds of combinatorial cloning for increased L-threonine production based on the training data. As a result, E. coli strains built after just three rounds of iterative combinatorial cloning and model prediction generated higher L-threonine titers (from 2.7 g/L to 8.4 g/L) than those of patented L-threonine strains being used as controls (4-5 g/L). Interesting combinations of genes in L-threonine production included deletions of the tdh, metL, dapA, and dhaM genes as well as overexpression of the pntAB, ppc, and aspC genes. Mechanistic analysis of the metabolic system constraints for the best performing constructs offers ways to improve the models by adjusting weights for specific gene combinations. Graph theory analysis of pairwise gene modifications and corresponding levels of L-threonine production also suggests additional rules that can be incorporated into future ML models.

18.
Sci Rep ; 13(1): 2105, 2023 02 06.
Article in English | MEDLINE | ID: mdl-36747041

ABSTRACT

Protein-ligand docking is a computational method for identifying drug leads. The method is capable of narrowing a vast library of compounds down to a tractable size for downstream simulation or experimental testing and is widely used in drug discovery. While there has been progress in accelerating scoring of compounds with artificial intelligence, few works have bridged these successes back to the virtual screening community in terms of utility and forward-looking development. We demonstrate the power of high-speed ML models by scoring 1 billion molecules in under a day (50 k predictions per GPU second). We showcase a workflow for docking utilizing surrogate AI-based models as a pre-filter to a standard docking workflow. Our workflow is ten times faster at screening a library of compounds than the standard technique, with an error rate less than 0.01% of detecting the underlying best scoring 0.1% of compounds. Our analysis of the speedup explains that another order of magnitude speedup must come from model accuracy rather than computing speed. In order to drive another order of magnitude of acceleration, we share a benchmark dataset consisting of 200 million 3D complex structures and 2D structure scores across a consistent set of 13 million "in-stock" molecules over 15 receptors, or binding sites, across the SARS-CoV-2 proteome. We believe this is strong evidence for the community to begin focusing on improving the accuracy of surrogate models to improve the ability to screen massive compound libraries 100 × or even 1000 × faster than current techniques and reduce missing top hits. The technique outlined aims to be a fast drop-in replacement for docking for screening billion-scale molecular libraries.
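The surrogate pre-filter idea in this abstract can be sketched as a two-stage screen: a cheap model ranks the whole library, and only the best-ranked fraction is passed to the expensive docking step. The sketch below is an assumption-laden toy, not the published workflow: function names are hypothetical, lower scores are assumed to mean stronger predicted binding, and integers stand in for molecules.

```python
def prefilter_then_dock(molecules, surrogate_score, dock, keep_fraction=0.1):
    """Dock only the molecules the surrogate model ranks best.

    Assumption: lower scores mean stronger predicted binding.
    Returns a dict mapping each shortlisted molecule to its docking score.
    """
    ranked = sorted(molecules, key=surrogate_score)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return {molecule: dock(molecule) for molecule in ranked[:n_keep]}

def top_hit_recall(true_scores, docked, top_fraction=0.01):
    """Fraction of the true best-scoring molecules that survived the filter."""
    n_top = max(1, int(len(true_scores) * top_fraction))
    true_top = sorted(true_scores, key=true_scores.get)[:n_top]
    return sum(molecule in docked for molecule in true_top) / n_top

# Toy library: integers stand in for molecules; the "true" docking score is
# the integer itself and the surrogate is a slightly distorted copy of it.
library = list(range(100))
true_dock = lambda m: float(m)
toy_surrogate = lambda m: m + (m % 3)

docked_scores = prefilter_then_dock(library, toy_surrogate, true_dock,
                                    keep_fraction=0.25)
recall = top_hit_recall({m: true_dock(m) for m in library}, docked_scores,
                        top_fraction=0.05)
```

The speedup comes from calling `dock` on only a fraction of the library, while `top_hit_recall` measures the cost of that shortcut: how many of the genuinely best compounds the surrogate failed to pass through.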


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/metabolism , Artificial Intelligence , Molecular Docking Simulation , Ligands , Proteins/metabolism
19.
Front Med (Lausanne) ; 10: 1058919, 2023.
Article in English | MEDLINE | ID: mdl-36960342

ABSTRACT

Patient-derived xenografts (PDXs) are an appealing platform for preclinical drug studies. A primary challenge in modeling drug response prediction (DRP) with PDXs and neural networks (NNs) is the limited number of drug response samples. We investigate multimodal neural network (MM-Net) and data augmentation for DRP in PDXs. The MM-Net learns to predict response using drug descriptors, gene expressions (GE), and histology whole-slide images (WSIs). We explore whether combining WSIs with GE improves predictions as compared with models that use GE alone. We propose two data augmentation methods which allow us to train multimodal and unimodal NNs without changing architectures with a single larger dataset: 1) combine single-drug and drug-pair treatments by homogenizing drug representations, and 2) augment drug-pairs which doubles the sample size of all drug-pair samples. Unimodal NNs which use GE are compared to assess the contribution of data augmentation. The NN that uses the original and the augmented drug-pair treatments as well as single-drug treatments outperforms NNs that ignore either the augmented drug-pairs or the single-drug treatments. In assessing the multimodal learning based on the MCC metric, MM-Net outperforms all the baselines. Our results show that data augmentation and integration of histology images with GE can improve prediction performance of drug response in PDXs.
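One plausible reading of the second augmentation method, which doubles the number of drug-pair samples, is to add each pair again with the two drugs in swapped order. The sketch below illustrates only that reading: the abstract does not spell out the mechanism, the record layout and field names are hypothetical, and the homogenization of single-drug and drug-pair representations is not shown.

```python
def augment_drug_pairs(samples):
    """Return the samples plus a copy of each with drug order swapped.

    samples: list of dicts with hypothetical keys "drug1", "drug2",
    "tumor" and "response". The response label is left unchanged because
    the treatment is the same regardless of the order the drugs are listed.
    """
    augmented = []
    for sample in samples:
        augmented.append(sample)
        swapped = {**sample, "drug1": sample["drug2"], "drug2": sample["drug1"]}
        augmented.append(swapped)
    return augmented

# Toy records with placeholder identifiers.
pair_samples = [
    {"drug1": "drug_X", "drug2": "drug_Y", "tumor": "pdx_1", "response": 1},
    {"drug1": "drug_Y", "drug2": "drug_Z", "tumor": "pdx_2", "response": 0},
]
augmented_samples = augment_drug_pairs(pair_samples)
```

Training on both orderings encourages a network with two separate drug inputs to give the same prediction whichever slot a drug occupies.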

20.
Cancers (Basel) ; 16(1)2023 Dec 21.
Article in English | MEDLINE | ID: mdl-38201477

ABSTRACT

Cancer is a heterogeneous disease in that tumors of the same histology type can respond differently to a treatment. Anti-cancer drug response prediction is of paramount importance for both drug development and patient treatment design. Although various computational methods and data have been used to develop drug response prediction models, it remains a challenging problem due to the complexities of cancer mechanisms and cancer-drug interactions. To better characterize the interaction between cancer and drugs, we investigate the feasibility of integrating computationally derived features of molecular mechanisms of action into prediction models. Specifically, we add docking scores of drug molecules and target proteins in combination with cancer gene expressions and molecular drug descriptors for building response models. The results demonstrate a marginal improvement in drug response prediction performance when adding docking scores as additional features, through tests on large drug screening data. We discuss the limitations of the current approach and provide the research community with a baseline dataset of the large-scale computational docking for anti-cancer drugs.
