Results 1 - 20 of 39
1.
Cancers (Basel) ; 16(3)2024 Jan 26.
Article in English | MEDLINE | ID: mdl-38339281

ABSTRACT

It is well known that cancers of the same histology type can respond differently to a treatment. Thus, computational drug response prediction is of paramount importance for both preclinical drug screening studies and clinical treatment design. To build drug response prediction models, treatment response data need to be generated through screening experiments and used as input to train the prediction models. In this study, we investigate various active learning strategies for selecting experiments to generate response data for the purposes of (1) improving the performance of drug response prediction models built on the data and (2) identifying effective treatments. Here, we focus on constructing drug-specific response prediction models for cancer cell lines. Various approaches have been designed and applied to select cell lines for screening, including random, greedy, uncertainty, diversity, combined greedy and uncertainty, sampling-based hybrid, and iteration-based hybrid approaches. All of these approaches are evaluated and compared using two criteria: (1) the number of identified hits, that is, selected experiments validated to be responsive, and (2) the performance of the response prediction model trained on the data of selected experiments. The analysis was conducted for 57 drugs, and the results show a significant improvement in identifying hits using active learning approaches compared with the random and greedy sampling methods. Active learning approaches also show an improvement in response prediction performance for some of the drugs and analysis runs compared with the greedy sampling method.
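
The difference between greedy and uncertainty-based selection can be illustrated with a minimal sketch. It assumes an ensemble of models that yields several predicted responses per unscreened cell line, and it assumes that a lower predicted value means a more responsive cell line. The function names, the data layout, and the example values are hypothetical and are not taken from the study.

```python
from statistics import mean, pstdev

def select_greedy(predictions, k):
    """Pick the k candidates with the lowest mean predicted response
    (assumption: lower value = more responsive)."""
    ranked = sorted(predictions, key=lambda c: mean(predictions[c]))
    return ranked[:k]

def select_uncertain(predictions, k):
    """Pick the k candidates whose ensemble predictions disagree the most."""
    ranked = sorted(predictions, key=lambda c: pstdev(predictions[c]), reverse=True)
    return ranked[:k]

# Hypothetical ensemble predictions for four unscreened cell lines.
predictions = {
    "cell_A": [0.20, 0.22, 0.21],
    "cell_B": [0.50, 0.90, 0.10],
    "cell_C": [0.80, 0.82, 0.81],
    "cell_D": [0.30, 0.60, 0.45],
}
greedy_pick = select_greedy(predictions, 2)
uncertain_pick = select_uncertain(predictions, 2)
```

Greedy selection favors the candidates the current model already believes are hits, while uncertainty selection favors the candidates that would likely teach the model the most; the hybrid strategies named in the abstract combine the two ideas.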

2.
Front Med (Lausanne) ; 10: 1086097, 2023.
Article in English | MEDLINE | ID: mdl-36873878

ABSTRACT

Cancer claims millions of lives yearly worldwide. While many therapies have been made available in recent years, by and large cancer remains unsolved. Exploiting computational predictive models to study and treat cancer holds great promise in improving drug development and personalized design of treatment plans, ultimately suppressing tumors, alleviating suffering, and prolonging lives of patients. A wave of recent papers demonstrates promising results in predicting cancer response to drug treatments while utilizing deep learning methods. These papers investigate diverse data representations, neural network architectures, learning methodologies, and evaluation schemes. However, deciphering promising predominant and emerging trends is difficult due to the variety of explored methods and the lack of a standardized framework for comparing drug response prediction models. To obtain a comprehensive landscape of deep learning methods, we conducted an extensive search and analysis of deep learning models that predict the response to single drug treatments. A total of 61 deep learning-based models have been curated, and summary plots were generated. Based on the analysis, observable patterns and prevalence of methods have been revealed. This review helps readers better understand the current state of the field and identify major challenges and promising solution paths.

3.
Front Med (Lausanne) ; 10: 1058919, 2023.
Article in English | MEDLINE | ID: mdl-36960342

ABSTRACT

Patient-derived xenografts (PDXs) are an appealing platform for preclinical drug studies. A primary challenge in modeling drug response prediction (DRP) with PDXs and neural networks (NNs) is the limited number of drug response samples. We investigate a multimodal neural network (MM-Net) and data augmentation for DRP in PDXs. The MM-Net learns to predict response using drug descriptors, gene expressions (GE), and histology whole-slide images (WSIs). We explore whether combining WSIs with GE improves predictions as compared with models that use GE alone. We propose two data augmentation methods that allow us to train multimodal and unimodal NNs on a single larger dataset without changing architectures: 1) combining single-drug and drug-pair treatments by homogenizing drug representations, and 2) augmenting drug pairs, which doubles the sample size of all drug-pair samples. Unimodal NNs which use GE are compared to assess the contribution of data augmentation. The NN that uses the original and the augmented drug-pair treatments as well as single-drug treatments outperforms NNs that ignore either the augmented drug-pairs or the single-drug treatments. In assessing the multimodal learning based on the MCC metric, MM-Net outperforms all the baselines. Our results show that data augmentation and integration of histology images with GE can improve prediction performance of drug response in PDXs.
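
One plausible reading of the second augmentation method is that each drug-pair sample is duplicated with the two drugs in swapped order, which doubles the number of drug-pair samples. The sketch below illustrates that reading only; the abstract does not spell out the mechanism, and the tuple layout and names are hypothetical.

```python
def augment_drug_pairs(samples):
    """Return the original samples plus a copy of each with the two drugs swapped.

    Each sample is a tuple (tumor_id, drug1, drug2, response). Swapping the
    drug order is an assumed augmentation; the response label is unchanged.
    """
    augmented = list(samples)
    for tumor_id, drug1, drug2, response in samples:
        augmented.append((tumor_id, drug2, drug1, response))
    return augmented

# Hypothetical drug-pair samples.
samples = [("pdx_1", "drug_X", "drug_Y", 1), ("pdx_2", "drug_X", "drug_Z", 0)]
augmented = augment_drug_pairs(samples)
```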

4.
Nucleic Acids Res ; 51(D1): D678-D689, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36350631

ABSTRACT

The National Institute of Allergy and Infectious Diseases (NIAID) established the Bioinformatics Resource Center (BRC) program to assist researchers with analyzing the growing body of genome sequence and other omics-related data. In this report, we describe the merger of the PAThosystems Resource Integration Center (PATRIC), the Influenza Research Database (IRD) and the Virus Pathogen Database and Analysis Resource (ViPR) BRCs to form the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) https://www.bv-brc.org/. The combined BV-BRC leverages the functionality of the bacterial and viral resources to provide a unified data model, enhanced web-based visualization and analysis tools, bioinformatics services, and a powerful suite of command line tools that benefit the bacterial and viral research communities.


Subject(s)
Genomics; Software; Viruses; Humans; Bacteria/genetics; Computational Biology; Databases, Genetic; Influenza, Human; Viruses/genetics
5.
Cancers (Basel) ; 16(1)2023 Dec 21.
Article in English | MEDLINE | ID: mdl-38201477

ABSTRACT

Cancer is a heterogeneous disease in that tumors of the same histology type can respond differently to a treatment. Anti-cancer drug response prediction is of paramount importance for both drug development and patient treatment design. Although various computational methods and data have been used to develop drug response prediction models, it remains a challenging problem due to the complexities of cancer mechanisms and cancer-drug interactions. To better characterize the interaction between cancer and drugs, we investigate the feasibility of integrating computationally derived features of molecular mechanisms of action into prediction models. Specifically, we add docking scores of drug molecules and target proteins in combination with cancer gene expressions and molecular drug descriptors for building response models. The results demonstrate a marginal improvement in drug response prediction performance when adding docking scores as additional features, through tests on large drug screening data. We discuss the limitations of the current approach and provide the research community with a baseline dataset of the large-scale computational docking for anti-cancer drugs.

7.
Pathogens ; 10(6)2021 May 22.
Article in English | MEDLINE | ID: mdl-34067337

ABSTRACT

Pneumonic tularemia is a highly debilitating and potentially fatal disease caused by inhalation of Francisella tularensis. Most of our current understanding of its pathogenesis is based on the highly virulent F. tularensis subsp. tularensis strain SCHU S4. However, multiple sources of SCHU S4 have been maintained and propagated independently over the years, potentially generating genetic variants with altered virulence. In this study, the virulence of four SCHU S4 stocks (NR-10492, NR-28534, NR-643 from BEI Resources and FTS-635 from Battelle Memorial Institute), along with that of another virulent subsp. tularensis strain, MA00-2987, was assessed in parallel. In the Fischer 344 rat model of pneumonic tularemia, NR-643 and FTS-635 were found to be highly attenuated compared to NR-10492, NR-28534, and MA00-2987. In the NZW rabbit model of pneumonic tularemia, NR-643 caused morbidity but not mortality even at a dose equivalent to 500x the LD50 for NR-10492. Genetic analyses revealed that NR-10492 and NR-28534 were identical to each other, and nearly identical to the reference SCHU S4 sequence. NR-643 and FTS-635 were identical to each other but were found to have nine regions of difference in the genomic sequence when compared to the published reference SCHU S4 sequence. Given the genetic differences and decreased virulence, NR-643/FTS-635 should be clearly designated as a separate SCHU S4 substrain and no longer utilized in efficacy studies to evaluate potential vaccines and therapeutics against tularemia.

8.
Sci Rep ; 11(1): 11325, 2021 05 31.
Article in English | MEDLINE | ID: mdl-34059739

ABSTRACT

Convolutional neural networks (CNNs) have been successfully used in many applications where important information about data is embedded in the order of features, such as speech and imaging. However, most tabular data do not assume a spatial relationship between features, and thus are unsuitable for modeling using CNNs. To meet this challenge, we develop a novel algorithm, image generator for tabular data (IGTD), to transform tabular data into images by assigning features to pixel positions so that similar features are close to each other in the image. The algorithm searches for an optimized assignment by minimizing the difference between the ranking of distances between features and the ranking of distances between their assigned pixels in the image. We apply IGTD to transform gene expression profiles of cancer cell lines (CCLs) and molecular descriptors of drugs into their respective image representations. Compared with existing transformation methods, IGTD generates compact image representations with better preservation of feature neighborhood structure. Evaluated on benchmark drug screening datasets, CNNs trained on IGTD image representations of CCLs and drugs exhibit a better performance of predicting anti-cancer drug response than both CNNs trained on alternative image representations and prediction models trained on the original tabular data.
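
The objective described above, the mismatch between the ranking of feature-to-feature distances and the ranking of pixel-to-pixel distances, can be sketched as follows. This is a simplified illustration and not the published IGTD implementation: Euclidean distance, the absolute-difference error, the tie handling, and all names are assumptions made for the example.

```python
from itertools import combinations
from math import dist

def rank_values(values):
    """Rank values in ascending order (0 = smallest); ties keep input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks

def assignment_error(feature_vectors, pixel_coords, assignment):
    """Sum of absolute differences between the rank of each feature-pair
    distance and the rank of the distance between the assigned pixels.

    feature_vectors[i] holds the values of feature i across samples;
    assignment[i] is the index of the pixel given to feature i.
    """
    pairs = list(combinations(range(len(feature_vectors)), 2))
    feature_d = [dist(feature_vectors[i], feature_vectors[j]) for i, j in pairs]
    pixel_d = [dist(pixel_coords[assignment[i]], pixel_coords[assignment[j]])
               for i, j in pairs]
    feature_ranks = rank_values(feature_d)
    pixel_ranks = rank_values(pixel_d)
    return sum(abs(a - b) for a, b in zip(feature_ranks, pixel_ranks))

# Hypothetical data: features 0 and 1 are similar, feature 2 is very different.
features = [[0.0, 0.0], [0.1, 0.1], [5.0, 5.0]]
pixels = [(0, 0), (0, 1), (0, 2)]  # a 1x3 image
good = assignment_error(features, pixels, [0, 1, 2])  # similar features adjacent
bad = assignment_error(features, pixels, [0, 2, 1])   # similar features far apart
```

A search procedure would then repeatedly try swapping pixel assignments and keep the swaps that lower this error.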


Subject(s)
Deep Learning; Image Processing, Computer-Assisted; Software; Cell Line, Tumor; Humans
9.
BMC Bioinformatics ; 22(1): 252, 2021 May 17.
Article in English | MEDLINE | ID: mdl-34001007

ABSTRACT

BACKGROUND: Motivated by the size and availability of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating drug response data, a common question is whether the generalization performance of existing prediction models can be further improved with more training data. METHODS: We utilize empirical learning curves for evaluating and comparing the data scaling properties of two neural networks (NNs) and two gradient boosting decision tree (GBDT) models trained on four cell line drug screening datasets. The learning curves are accurately fitted to a power law model, providing a framework for assessing the data scaling behavior of these models. RESULTS: The curves demonstrate that no single model dominates in terms of prediction performance across all datasets and training sizes, thus suggesting that the actual shape of these curves depends on the unique pair of an ML model and a dataset. The multi-input NN (mNN), in which gene expressions of cancer cells and molecular drug descriptors are input into separate subnetworks, outperforms a single-input NN (sNN), where the cell and drug features are concatenated for the input layer. In contrast, a GBDT with hyperparameter tuning exhibits superior performance as compared with both NNs at the lower range of training set sizes for two of the tested datasets, whereas the mNN consistently performs better at the higher range of training sizes. Moreover, the trajectory of the curves suggests that increasing the sample size is expected to further improve prediction scores of both NNs. These observations demonstrate the benefit of using learning curves to evaluate prediction models, providing a broader perspective on the overall data scaling characteristics. 
CONCLUSIONS: A fitted power law learning curve provides a forward-looking metric for analyzing prediction performance and can serve as a co-design tool to guide experimental biologists and computational scientists in the design of future experiments in prospective research studies.
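
Fitting a power law to a learning curve can be sketched in a few lines. The example assumes the two-parameter form error ≈ a · m^b (with b < 0) fitted by least squares in log-log space; the abstract does not state the exact functional form used, and the training sizes and error values below are synthetic.

```python
from math import exp, log

def fit_power_law(sizes, errors):
    """Least-squares fit of errors ≈ a * sizes**b, done in log-log space."""
    xs = [log(m) for m in sizes]
    ys = [log(e) for e in errors]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope_num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope_den = sum((x - x_bar) ** 2 for x in xs)
    b = slope_num / slope_den
    a = exp(y_bar - b * x_bar)
    return a, b

# Synthetic learning curve that follows error = 2.0 * m ** -0.5 exactly.
sizes = [1000, 2000, 4000, 8000]
errors = [2.0 * m ** -0.5 for m in sizes]
a, b = fit_power_law(sizes, errors)

# Extrapolate the fitted curve to a larger, not yet collected, training size.
predicted_error_at_16000 = a * 16000 ** b
```

The extrapolation step is what makes the fitted curve a forward-looking tool: it gives a rough estimate of the error that more screening data might buy before the experiments are run.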


Subject(s)
Neoplasms; Pharmaceutical Preparations; Cell Line; Learning Curve; Machine Learning; Neoplasms/drug therapy; Neoplasms/genetics; Prospective Studies
10.
Sci Rep ; 10(1): 18040, 2020 10 22.
Article in English | MEDLINE | ID: mdl-33093487

ABSTRACT

Transfer learning, which transfers patterns learned on a source dataset to a related target dataset for constructing prediction models, has been shown effective in many applications. In this paper, we investigate whether transfer learning can be used to improve the performance of anti-cancer drug response prediction models. Previous transfer learning studies for drug response prediction focused on building models to predict the response of tumor cells to a specific drug treatment. We target the more challenging task of building general prediction models that can make predictions for both new tumor cells and new drugs. Uniquely, we investigate the power of transfer learning for three drug response prediction applications including drug repurposing, precision oncology, and new drug development, through different data partition schemes in cross-validation. We extend the classic transfer learning framework through ensemble and demonstrate its general utility with three representative prediction algorithms including a gradient boosting model and two deep neural networks. The ensemble transfer learning framework is tested on benchmark in vitro drug screening datasets. The results demonstrate that our framework broadly improves the prediction performance in all three drug response prediction applications with all three prediction algorithms.


Subject(s)
Antineoplastic Agents/pharmacology; Datasets as Topic; Deep Learning; Drug Screening Assays, Antitumor; Neoplasms/drug therapy; Neoplasms/pathology; Algorithms; Antineoplastic Agents/therapeutic use; Drug Development; Drug Repositioning; Humans; Models, Biological; Neural Networks, Computer; Precision Medicine
11.
Genes (Basel) ; 11(9)2020 09 11.
Article in English | MEDLINE | ID: mdl-32933072

ABSTRACT

The co-expression extrapolation (COXEN) method has been successfully used in multiple studies to select genes for predicting the response of tumor cells to a specific drug treatment. Here, we enhance the COXEN method to select genes that are predictive of the efficacies of multiple drugs for building general drug response prediction models that are not specific to a particular drug. The enhanced COXEN method first ranks the genes according to their prediction power for each individual drug and then takes a union of top predictive genes of all the drugs, among which the algorithm further selects genes whose co-expression patterns are well preserved between cancer cases for building prediction models. We apply the proposed method on benchmark in vitro drug screening datasets and compare the performance of prediction models built based on the genes selected by the enhanced COXEN method to that of models built on genes selected by the original COXEN method and randomly picked genes. Models built with the enhanced COXEN method always present a statistically significantly improved prediction performance (adjusted p-value ≤ 0.05). Our results demonstrate the enhanced COXEN method can dramatically increase the power of gene expression data for predicting drug response.
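
The first two steps described above, ranking genes per drug and taking the union of the top genes, can be sketched as below. The sketch assumes that "prediction power" is measured by the absolute Pearson correlation between a gene's expression and the drug response, which the abstract does not specify; the co-expression preservation filter is not shown, and all names and data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0

def top_genes_union(expression, responses, n_top):
    """Union over drugs of the n_top genes most correlated with each drug's response.

    expression: {gene: [expression per cell line]}
    responses:  {drug: [response per cell line]}
    """
    selected = set()
    for response in responses.values():
        ranked = sorted(expression,
                        key=lambda g: abs(pearson(expression[g], response)),
                        reverse=True)
        selected.update(ranked[:n_top])
    return selected

# Hypothetical expression and response values for four cell lines.
expression = {"g1": [1, 2, 3, 4], "g2": [1, 3, 2, 4], "g3": [2, 2, 3, 1]}
responses = {"drug_A": [1, 2, 3, 4], "drug_B": [2, 2, 3, 1]}
candidates = top_genes_union(expression, responses, n_top=1)
```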


Subject(s)
Antineoplastic Agents/pharmacology; Biomarkers, Tumor/genetics; Drug Screening Assays, Antitumor/methods; Gene Expression Profiling/methods; Models, Statistical; Neoplasms/drug therapy; Neoplasms/genetics; Algorithms; Humans
14.
Brief Bioinform ; 20(4): 1094-1102, 2019 07 19.
Article in English | MEDLINE | ID: mdl-28968762

ABSTRACT

The Pathosystems Resource Integration Center (PATRIC, www.patricbrc.org) is designed to provide researchers with the tools and services that they need to perform genomic and other 'omic' data analyses. In response to mounting concern over antimicrobial resistance (AMR), the PATRIC team has been developing new tools that help researchers understand AMR and its genetic determinants. To support comparative analyses, we have added AMR phenotype data to over 15 000 genomes in the PATRIC database, often assembling genomes from reads in public archives and collecting their associated AMR panel data from the literature to augment the collection. We have also been using this collection of AMR metadata to build machine learning-based classifiers that can predict the AMR phenotypes and the genomic regions associated with resistance for genomes being submitted to the annotation service. Likewise, we have undertaken a large AMR protein annotation effort by manually curating data from the literature and public repositories. This collection of 7370 AMR reference proteins, which contains many protein annotations (functional roles) that are unique to PATRIC and RAST, has been manually curated so that it projects stably across genomes. The collection currently projects to 1 610 744 proteins in the PATRIC database. Finally, the PATRIC Web site has been expanded to enable AMR-based custom page views so that researchers can easily explore AMR data and design experiments based on whole genomes or individual genes.


Subject(s)
Computational Biology/methods; Databases, Genetic; Drug Resistance, Microbial/genetics; Systems Integration; Computational Biology/trends; Databases, Genetic/statistics & numerical data; Genome, Microbial; Humans; Internet; Molecular Sequence Annotation
15.
J Clin Microbiol ; 57(2)2019 02.
Article in English | MEDLINE | ID: mdl-30333126

ABSTRACT

Nontyphoidal Salmonella species are the leading bacterial cause of foodborne disease in the United States. Whole-genome sequences and paired antimicrobial susceptibility data are available for Salmonella strains because of surveillance efforts from public health agencies. In this study, a collection of 5,278 nontyphoidal Salmonella genomes, collected over 15 years in the United States, was used to generate extreme gradient boosting (XGBoost)-based machine learning models for predicting MICs for 15 antibiotics. The MIC prediction models had an overall average accuracy of 95% within ±1 2-fold dilution step (confidence interval, 95% to 95%), an average very major error rate of 2.7% (confidence interval, 2.4% to 3.0%), and an average major error rate of 0.1% (confidence interval, 0.1% to 0.2%). The model predicted MICs with no a priori information about the underlying gene content or resistance phenotypes of the strains. By selecting diverse genomes for the training sets, we show that highly accurate MIC prediction models can be generated with fewer than 500 genomes. We also show that our approach for predicting MICs is stable over time, despite annual fluctuations in antimicrobial resistance gene content in the sampled genomes. Finally, using feature selection, we explore the important genomic regions identified by the models for predicting MICs. To date, this is one of the largest MIC modeling studies to be published. Our strategy for developing whole-genome sequence-based models for surveillance and clinical diagnostics can be readily applied to other important human pathogens.
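
The "accuracy within ±1 two-fold dilution step" metric can be made concrete with a short sketch: a prediction counts as correct when it differs from the measured MIC by at most a factor of two in either direction. The function names and MIC values are hypothetical, and the small tolerance is only there to absorb floating-point rounding.

```python
from math import log2

def within_one_dilution(predicted_mic, measured_mic):
    """True if the predicted MIC is within one two-fold dilution step of the measured MIC."""
    return abs(log2(predicted_mic) - log2(measured_mic)) <= 1 + 1e-9

def dilution_accuracy(predicted, measured):
    """Fraction of predictions within ±1 two-fold dilution step."""
    hits = sum(within_one_dilution(p, m) for p, m in zip(predicted, measured))
    return hits / len(measured)

# Hypothetical MICs: only the third prediction is two dilution steps off.
measured = [0.5, 2, 8, 32]
predicted = [1, 2, 2, 16]
accuracy = dilution_accuracy(predicted, measured)
```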


Subject(s)
Drug Resistance, Bacterial; Genotyping Techniques/methods; Machine Learning; Microbial Sensitivity Tests/methods; Salmonella Infections/microbiology; Salmonella/drug effects; Salmonella/genetics; Foodborne Diseases/microbiology; Genome, Bacterial; Humans; Salmonella/isolation & purification; United States
16.
BMC Bioinformatics ; 19(Suppl 18): 486, 2018 Dec 21.
Article in English | MEDLINE | ID: mdl-30577754

ABSTRACT

BACKGROUND: The National Cancer Institute drug pair screening effort against 60 well-characterized human tumor cell lines (NCI-60) presents an unprecedented resource for modeling combinational drug activity. RESULTS: We present a computational model for predicting cell line response to a subset of drug pairs in the NCI-ALMANAC database. Based on residual neural networks for encoding features as well as predicting tumor growth, our model explains 94% of the response variance. While our best result is achieved with a combination of molecular feature types (gene expression, microRNA and proteome), we show that most of the predictive power comes from drug descriptors. To further demonstrate value in detecting anticancer therapy, we rank the drug pairs for each cell line based on model predicted combination effect and recover 80% of the top pairs with enhanced activity. CONCLUSIONS: We present promising results in applying deep learning to predicting combinational drug response. Our feature analysis indicates screening data involving more cell lines are needed for the models to make better use of molecular features.
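
"Explains 94% of the response variance" is conventionally read as a coefficient of determination (R²) of 0.94. A minimal sketch of that metric, assuming the standard definition and using made-up numbers:

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical observed and predicted growth values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
score = r_squared(observed, predicted)
```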


Subject(s)
Deep Learning/trends; Drug Evaluation, Preclinical/methods; Cell Line, Tumor; Humans; National Cancer Institute (U.S.); Neural Networks, Computer; United States
19.
Sci Rep ; 8(1): 421, 2018 01 11.
Article in English | MEDLINE | ID: mdl-29323230

ABSTRACT

Antimicrobial resistant infections are a serious public health threat worldwide. Whole genome sequencing approaches to rapidly identify pathogens and predict antibiotic resistance phenotypes are becoming more feasible and may offer a way to reduce clinical test turnaround times compared to conventional culture-based methods, and in turn, improve patient outcomes. In this study, we use whole genome sequence data from 1668 clinical isolates of Klebsiella pneumoniae to develop an XGBoost-based machine learning model that accurately predicts minimum inhibitory concentrations (MICs) for 20 antibiotics. The overall accuracy of the model, within ±1 two-fold dilution factor, is 92%. Individual accuracies are ≥90% for 15/20 antibiotics. We show that the MICs predicted by the model correlate with known antimicrobial resistance genes. Importantly, the genome-wide approach described in this study offers a way to predict MICs for isolates without knowledge of the underlying gene content. This study shows that machine learning can be used to build a complete in silico MIC prediction panel for K. pneumoniae and provides a framework for building MIC prediction models for other pathogenic bacteria.


Subject(s)
Anti-Bacterial Agents/pharmacology; Klebsiella Infections/microbiology; Klebsiella pneumoniae/genetics; Whole Genome Sequencing/methods; Computer Simulation; DNA, Bacterial/genetics; Drug Resistance, Multiple, Bacterial; Humans; Klebsiella pneumoniae/drug effects; Machine Learning; Microbial Sensitivity Tests; Models, Theoretical
20.
Nature ; 551(7681): 457-463, 2017 11 23.
Article in English | MEDLINE | ID: mdl-29088705

ABSTRACT

Our growing awareness of the microbial world's importance and diversity contrasts starkly with our limited understanding of its fundamental structure. Despite recent advances in DNA sequencing, a lack of standardized protocols and common analytical frameworks impedes comparisons among studies, hindering the development of global inferences about microbial life on Earth. Here we present a meta-analysis of microbial community samples collected by hundreds of researchers for the Earth Microbiome Project. Coordinated protocols and new analytical methods, particularly the use of exact sequences instead of clustered operational taxonomic units, enable bacterial and archaeal ribosomal RNA gene sequences to be followed across multiple studies and allow us to explore patterns of diversity at an unprecedented scale. The result is both a reference database giving global context to DNA sequence data and a framework for incorporating data from future studies, fostering increasingly complete characterization of Earth's microbial diversity.


Subject(s)
Biodiversity; Earth, Planet; Microbiota/genetics; Animals; Archaea/genetics; Archaea/isolation & purification; Bacteria/genetics; Bacteria/isolation & purification; Ecology/methods; Gene Dosage; Geographic Mapping; Humans; Plants/microbiology; RNA, Ribosomal, 16S/analysis; RNA, Ribosomal, 16S/genetics