Results 1 - 20 of 50
1.
Nucleic Acids Res ; 51(D1): D678-D689, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36350631

ABSTRACT

The National Institute of Allergy and Infectious Diseases (NIAID) established the Bioinformatics Resource Center (BRC) program to assist researchers with analyzing the growing body of genome sequence and other omics-related data. In this report, we describe the merger of the PAThosystems Resource Integration Center (PATRIC), the Influenza Research Database (IRD) and the Virus Pathogen Database and Analysis Resource (ViPR) BRCs to form the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) https://www.bv-brc.org/. The combined BV-BRC leverages the functionality of the bacterial and viral resources to provide a unified data model, enhanced web-based visualization and analysis tools, bioinformatics services, and a powerful suite of command line tools that benefit the bacterial and viral research communities.


Subjects
Genomics; Software; Viruses; Humans; Bacteria/genetics; Computational Biology; Databases, Genetic; Influenza, Human; Viruses/genetics
2.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34524425

ABSTRACT

To enable personalized cancer treatment, machine learning models have been developed to predict drug response as a function of tumor and drug features. However, most algorithm development efforts have relied on cross-validation within a single study to assess model accuracy. While an essential first step, cross-validation within a biological data set typically provides an overly optimistic estimate of the prediction performance on independent test sets. To provide a more rigorous assessment of model generalizability between different studies, we use machine learning to analyze five publicly available cell line-based data sets: National Cancer Institute 60, Cancer Therapeutics Response Portal (CTRP), Genomics of Drug Sensitivity in Cancer, Cancer Cell Line Encyclopedia and Genentech Cell Line Screening Initiative (gCSI). Based on observed experimental variability across studies, we explore estimates of prediction upper bounds. We report performance results of a variety of machine learning models, with a multitasking deep neural network achieving the best cross-study generalizability. By multiple measures, models trained on CTRP yield the most accurate predictions on the remaining testing data, and gCSI is the most predictable among the cell line data sets included in this study. With these experiments and further simulations on partial data, two lessons emerge: (1) differences in viability assays can limit model generalizability across studies and (2) drug diversity, more than tumor diversity, is crucial for raising model generalizability in preclinical screening.


Subjects
Neoplasms; Algorithms; Cell Line; Humans; Machine Learning; Neoplasms/drug therapy; Neoplasms/genetics; Neural Networks, Computer
3.
Emerg Infect Dis ; 29(5)2023 05.
Article in English | MEDLINE | ID: mdl-37054986

ABSTRACT

Since late 2020, SARS-CoV-2 variants have regularly emerged with competitive and phenotypic differences from previously circulating strains, sometimes with the potential to escape from immunity produced by prior exposure and infection. The Early Detection group is one of the constituent groups of the US National Institutes of Health National Institute of Allergy and Infectious Diseases SARS-CoV-2 Assessment of Viral Evolution program. The group uses bioinformatic methods to monitor the emergence, spread, and potential phenotypic properties of emerging and circulating strains to identify the most relevant variants for experimental groups within the program to phenotypically characterize. Since April 2021, the group has prioritized variants monthly. Prioritization successes include rapidly identifying most major variants of SARS-CoV-2 and providing experimental groups within the National Institutes of Health program easy access to regularly updated information on the recent evolution and epidemiology of SARS-CoV-2 that can be used to guide phenotypic investigations.


Subjects
COVID-19; SARS-CoV-2; United States/epidemiology; Humans; SARS-CoV-2/genetics; COVID-19/epidemiology; National Institutes of Health (U.S.)
4.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34379107

ABSTRACT

Antimicrobial resistance (AMR) is a major global health threat that affects millions of people each year. Funding agencies worldwide and the global research community have expended considerable capital and effort tracking the evolution and spread of AMR by isolating and sequencing bacterial strains and performing antimicrobial susceptibility testing (AST). For the last several years, we have been capturing these efforts by curating data from the literature and data resources and building a set of assembled bacterial genome sequences that are paired with laboratory-derived AST data. This collection currently contains AST data for over 67 000 genomes encompassing approximately 40 genera and over 100 species. In this paper, we describe the characteristics of this collection, highlighting areas where sampling is comparatively deep or shallow, and showing areas where attention is needed from the research community to improve sampling and tracking efforts. In addition to using the data to track the evolution and spread of AMR, it also serves as a useful starting point for building machine learning models for predicting AMR phenotypes. We demonstrate this by describing two machine learning models that are built from the entire dataset to show where the predictive power is comparatively high or low. This AMR metadata collection is freely available and maintained on the Bacterial and Viral Bioinformatics Center (BV-BRC) FTP site ftp://ftp.bvbrc.org/RELEASE_NOTES/PATRIC_genomes_AMR.txt.


Subjects
Computational Biology/methods; Databases, Genetic; Drug Resistance, Microbial; Genomics/methods; Microbial Sensitivity Tests; Artificial Intelligence; Bacteria/drug effects; Bacteria/genetics; Genome, Bacterial; Humans; Laboratories; Machine Learning; Phenotype
5.
Nucleic Acids Res ; 48(D1): D606-D612, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31667520

ABSTRACT

The PathoSystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center funded by the National Institute of Allergy and Infectious Diseases (https://www.patricbrc.org). PATRIC supports bioinformatic analyses of all bacteria with a special emphasis on pathogens, offering a rich comparative analysis environment that provides users with access to over 250 000 uniformly annotated and publicly available genomes with curated metadata. PATRIC offers web-based visualization and comparative analysis tools, a private workspace in which users can analyze their own data in the context of the public collections, services that streamline complex bioinformatic workflows and command-line tools for bulk data analysis. Over the past several years, as genomic and other omics-related experiments have become more cost-effective and widespread, we have observed considerable growth in the usage of and demand for easy-to-use, publicly available bioinformatic tools and services. Here we report the recent updates to the PATRIC resource, including new web-based comparative analysis tools, eight new services and the release of a command-line interface to access, query and analyze data.


Subjects
Bacteria/genetics; Computational Biology/methods; Databases, Genetic; Algorithms; Animals; Caenorhabditis elegans/genetics; Chickens/genetics; Drosophila melanogaster/genetics; Host-Pathogen Interactions/genetics; Humans; Internet; Macaca mulatta/genetics; Metagenomics; Mice; National Institute of Allergy and Infectious Diseases (U.S.); Phenotype; Phylogeny; Rats; Swine/genetics; United States; Zebrafish/genetics
6.
BMC Bioinformatics ; 22(1): 252, 2021 May 17.
Article in English | MEDLINE | ID: mdl-34001007

ABSTRACT

BACKGROUND: Motivated by the size and availability of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating drug response data, a common question is whether the generalization performance of existing prediction models can be further improved with more training data. METHODS: We utilize empirical learning curves for evaluating and comparing the data scaling properties of two neural networks (NNs) and two gradient boosting decision tree (GBDT) models trained on four cell line drug screening datasets. The learning curves are accurately fitted to a power law model, providing a framework for assessing the data scaling behavior of these models. RESULTS: The curves demonstrate that no single model dominates in terms of prediction performance across all datasets and training sizes, thus suggesting that the actual shape of these curves depends on the unique pair of an ML model and a dataset. The multi-input NN (mNN), in which gene expressions of cancer cells and molecular drug descriptors are input into separate subnetworks, outperforms a single-input NN (sNN), where the cell and drug features are concatenated for the input layer. In contrast, a GBDT with hyperparameter tuning exhibits superior performance as compared with both NNs at the lower range of training set sizes for two of the tested datasets, whereas the mNN consistently performs better at the higher range of training sizes. Moreover, the trajectory of the curves suggests that increasing the sample size is expected to further improve prediction scores of both NNs. These observations demonstrate the benefit of using learning curves to evaluate prediction models, providing a broader perspective on the overall data scaling characteristics. 
CONCLUSIONS: A fitted power law learning curve provides a forward-looking metric for analyzing prediction performance and can serve as a co-design tool to guide experimental biologists and computational scientists in the design of future experiments in prospective research studies.
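The power-law fit described in this abstract can be illustrated with a short sketch. This is not the authors' code: it assumes a simplified two-parameter form, error ≈ a · size^b, fitted by least squares in log-log space, and the training sizes and error values are synthetic. The function names `fit_power_law` and `predict_error` are hypothetical.

```python
import math

def fit_power_law(sizes, errors):
    """Fit error ~ a * size**b by ordinary least squares on log-transformed data."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)    # intercept in log-log space = log(a)
    return a, b

def predict_error(a, b, size):
    """Evaluate the fitted curve at a given training set size."""
    return a * size ** b

# Synthetic learning curve (assumption): error shrinks as training size grows.
sizes = [1000, 2000, 4000, 8000, 16000]
errors = [2.0 * s ** -0.3 for s in sizes]

a, b = fit_power_law(sizes, errors)
# Extrapolate to a training size that has not been measured yet.
extrapolated = predict_error(a, b, 32000)
```

A negative exponent `b` indicates that the error is still falling with more data, which is the sense in which a fitted curve can serve as a forward-looking estimate. A form with an additional offset term for irreducible error would need a nonlinear fit instead.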


Subjects
Neoplasms; Pharmaceutical Preparations; Cell Line; Learning Curve; Machine Learning; Neoplasms/drug therapy; Neoplasms/genetics; Prospective Studies
7.
Brief Bioinform ; 20(4): 1094-1102, 2019 07 19.
Article in English | MEDLINE | ID: mdl-28968762

ABSTRACT

The Pathosystems Resource Integration Center (PATRIC, www.patricbrc.org) is designed to provide researchers with the tools and services that they need to perform genomic and other 'omic' data analyses. In response to mounting concern over antimicrobial resistance (AMR), the PATRIC team has been developing new tools that help researchers understand AMR and its genetic determinants. To support comparative analyses, we have added AMR phenotype data to over 15 000 genomes in the PATRIC database, often assembling genomes from reads in public archives and collecting their associated AMR panel data from the literature to augment the collection. We have also been using this collection of AMR metadata to build machine learning-based classifiers that can predict the AMR phenotypes and the genomic regions associated with resistance for genomes being submitted to the annotation service. Likewise, we have undertaken a large AMR protein annotation effort by manually curating data from the literature and public repositories. This collection of 7370 AMR reference proteins, which contains many protein annotations (functional roles) that are unique to PATRIC and RAST, has been manually curated so that it projects stably across genomes. The collection currently projects to 1 610 744 proteins in the PATRIC database. Finally, the PATRIC Web site has been expanded to enable AMR-based custom page views so that researchers can easily explore AMR data and design experiments based on whole genomes or individual genes.


Subjects
Computational Biology/methods; Databases, Genetic; Drug Resistance, Microbial/genetics; Systems Integration; Computational Biology/trends; Databases, Genetic/statistics & numerical data; Genome, Microbial; Humans; Internet; Molecular Sequence Annotation
8.
PLoS Comput Biol ; 16(10): e1008319, 2020 10.
Article in English | MEDLINE | ID: mdl-33075053

ABSTRACT

A growing number of studies are using machine learning models to accurately predict antimicrobial resistance (AMR) phenotypes from bacterial sequence data. Although these studies are showing promise, the models are typically trained using features derived from comprehensive sets of AMR genes or whole genome sequences and may not be suitable for use when genomes are incomplete. In this study, we explore the possibility of predicting AMR phenotypes using incomplete genome sequence data. Models were built from small sets of randomly-selected core genes after removing the AMR genes. For Klebsiella pneumoniae, Mycobacterium tuberculosis, Salmonella enterica, and Staphylococcus aureus, we report that it is possible to classify susceptible and resistant phenotypes with average F1 scores ranging from 0.80-0.89 with as few as 100 conserved non-AMR genes, with very major error rates ranging from 0.11-0.23 and major error rates ranging from 0.10-0.20. Models built from core genes have predictive power in cases where the primary AMR mechanisms result from SNPs or horizontal gene transfer. By randomly sampling non-overlapping sets of core genes, we show that F1 scores and error rates are stable and have little variance between replicates. Although these small core gene models have lower accuracies and higher error rates than models built from the corresponding assembled genomes, the results suggest that sufficient variation exists in the core non-AMR genes of a species for predicting AMR phenotypes.
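The three metrics reported in this abstract can be computed as in the sketch below. It is an illustration, not the study's code. It assumes the definitions conventional in susceptibility testing: a very major error is a resistant isolate predicted susceptible, and a major error is a susceptible isolate predicted resistant. It also assumes that resistant ("R") is the positive class for F1. The labels are made up and the function name `amr_metrics` is hypothetical.

```python
def amr_metrics(y_true, y_pred):
    """Return (F1, very major error rate, major error rate) for R/S labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == "R" and p == "R")
    fn = sum(1 for t, p in pairs if t == "R" and p == "S")  # very major errors
    fp = sum(1 for t, p in pairs if t == "S" and p == "R")  # major errors
    tn = sum(1 for t, p in pairs if t == "S" and p == "S")

    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    vme_rate = fn / (tp + fn) if (tp + fn) else 0.0  # share of resistant isolates missed
    me_rate = fp / (fp + tn) if (fp + tn) else 0.0   # share of susceptible isolates flagged
    return f1, vme_rate, me_rate

# Made-up example: 4 resistant and 6 susceptible isolates.
y_true = ["R", "R", "R", "R", "S", "S", "S", "S", "S", "S"]
y_pred = ["R", "R", "R", "S", "S", "S", "S", "S", "R", "S"]

f1, vme, me = amr_metrics(y_true, y_pred)
```

Reporting the two error rates separately matters because calling a resistant isolate susceptible is the more serious of the two mistakes.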


Subjects
Conserved Sequence/genetics; Drug Resistance, Bacterial/genetics; Genome, Bacterial/genetics; Genomics/methods; Machine Learning; Algorithms; Anti-Bacterial Agents/pharmacology; Bacteria/drug effects; Bacteria/genetics; Phenotype
9.
Nucleic Acids Res ; 45(D1): D535-D542, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899627

ABSTRACT

The Pathosystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center (https://www.patricbrc.org). Recent changes to PATRIC include a redesign of the web interface and some new services that provide users with a platform that takes them from raw reads to an integrated analysis experience. The redesigned interface allows researchers direct access to tools and data, and the emphasis has changed to user-created genome-groups, with detailed summaries and views of the data that researchers have selected. Perhaps the biggest change has been the enhanced capability for researchers to analyze their private data and compare it to the available public data. Researchers can assemble their raw sequence reads and annotate the contigs using RASTtk. PATRIC also provides services for RNA-Seq, variation, model reconstruction and differential expression analysis, all delivered through an updated private workspace. Private data can be compared by 'virtual integration' to any of PATRIC's public data. The number of genomes available for comparison in PATRIC has expanded to over 80 000, with a special emphasis on genomes with antimicrobial resistance data. PATRIC uses this data to improve both subsystem annotation and k-mer classification, and tags new genomes as having signatures that indicate susceptibility or resistance to specific antibiotics.


Subjects
Bacteria/genetics; Computational Biology/methods; Databases, Genetic; Genome, Bacterial; Genomics/methods; Anti-Bacterial Agents/pharmacology; Bacteria/drug effects; Bacteria/metabolism; Bacterial Proteins/genetics; Bacterial Proteins/metabolism; Drug Resistance, Bacterial; Molecular Sequence Annotation; Proteome; Proteomics/methods; Software; Web Browser
10.
BMC Bioinformatics ; 19(Suppl 18): 486, 2018 Dec 21.
Article in English | MEDLINE | ID: mdl-30577754

ABSTRACT

BACKGROUND: The National Cancer Institute drug pair screening effort against 60 well-characterized human tumor cell lines (NCI-60) presents an unprecedented resource for modeling combinational drug activity. RESULTS: We present a computational model for predicting cell line response to a subset of drug pairs in the NCI-ALMANAC database. Based on residual neural networks for encoding features as well as predicting tumor growth, our model explains 94% of the response variance. While our best result is achieved with a combination of molecular feature types (gene expression, microRNA and proteome), we show that most of the predictive power comes from drug descriptors. To further demonstrate value in detecting anticancer therapy, we rank the drug pairs for each cell line based on model predicted combination effect and recover 80% of the top pairs with enhanced activity. CONCLUSIONS: We present promising results in applying deep learning to predicting combinational drug response. Our feature analysis indicates screening data involving more cell lines are needed for the models to make better use of molecular features.


Subjects
Deep Learning/trends; Drug Evaluation, Preclinical/methods; Cell Line, Tumor; Humans; National Cancer Institute (U.S.); Neural Networks, Computer; United States
11.
Bioinformatics ; 31(2): 252-8, 2015 Jan 15.
Article in English | MEDLINE | ID: mdl-25273106

ABSTRACT

MOTIVATION: We've developed a highly curated bacterial virulence factor (VF) library in PATRIC (Pathosystems Resource Integration Center, www.patricbrc.org) to support infectious disease research. Although several VF databases are available, there is still a need to incorporate new knowledge found in published experimental evidence and integrate these data with other information known for these specific VF genes, including genomic and other omics data. This integration supports the identification of VFs, comparative studies and hypothesis generation, which facilitates the understanding of virulence and pathogenicity. RESULTS: We have manually curated VFs from six prioritized NIAID (National Institute of Allergy and Infectious Diseases) category A-C bacterial pathogen genera, Mycobacterium, Salmonella, Escherichia, Shigella, Listeria and Bartonella, using published literature. This curated information on virulence has been integrated with data from genomic functional annotations, transcriptomic experiments, protein-protein interactions and disease information already present in PATRIC. Such integration gives researchers access to a broad array of information about these individual genes, and also to a suite of tools to perform comparative genomic and transcriptomics analysis that are available at PATRIC. AVAILABILITY AND IMPLEMENTATION: All tools and data are freely available at PATRIC (http://patricbrc.org). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Bacteria/genetics; Bacterial Infections/microbiology; Bacterial Proteins/metabolism; Computer Graphics; Databases, Factual; Virulence Factors/metabolism; Virulence/genetics; Bacteria/classification; Bacteria/pathogenicity; Gene Expression Profiling; Genome, Bacterial; Genomics; Humans; Protein Interaction Mapping; Systems Integration
12.
Bioinformatics ; 31(9): 1496-8, 2015 May 01.
Article in English | MEDLINE | ID: mdl-25573919

ABSTRACT

MOTIVATION: RNA-Seq is a method for profiling transcription using high-throughput sequencing and is an important component of many research projects that wish to study transcript isoforms, condition specific expression and transcriptional structure. The methods, tools and technologies used to perform RNA-Seq analysis continue to change, creating a bioinformatics challenge for researchers who wish to exploit these data. Resources that bring together genomic data, analysis tools, educational material and computational infrastructure can minimize the overhead required of life science researchers. RESULTS: RNA-Rocket is a free service that provides access to RNA-Seq and ChIP-Seq analysis tools for studying infectious diseases. The site makes available thousands of pre-indexed genomes, their annotations and the ability to stream results to the bioinformatics resources VectorBase, EuPathDB and PATRIC. The site also provides a combination of experimental data and metadata, examples of pre-computed analysis, step-by-step guides and a user interface designed to enable both novice and experienced users of RNA-Seq data. AVAILABILITY AND IMPLEMENTATION: RNA-Rocket is available at rnaseq.pathogenportal.org. Source code for this project can be found at github.com/cidvbi/PathogenPortal. CONTACT: anwarren@vt.edu SUPPLEMENTARY INFORMATION: Supplementary materials are available at Bioinformatics online.


Subjects
Gene Expression Profiling/methods; High-Throughput Screening Assays/methods; Sequence Analysis, RNA/methods; Software; Animals; Bacteria/genetics; Disease Vectors; Genomics; Parasites/genetics
13.
Nucleic Acids Res ; 42(Database issue): D206-14, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24293654

ABSTRACT

In 2004, the SEED (http://pubseed.theseed.org/) was created to provide consistent and accurate genome annotations across thousands of genomes and as a platform for discovering and developing de novo annotations. The SEED is a constantly updated integration of genomic data with a genome database, web front end, API and server scripts. It is used by many scientists for predicting gene functions and discovering new pathways. In addition to being a powerful database for bioinformatics research, the SEED also houses subsystems (collections of functionally related protein families) and their derived FIGfams (protein families), which represent the core of the RAST annotation engine (http://rast.nmpdr.org/). When a new genome is submitted to RAST, genes are called and their annotations are made by comparison to the FIGfam collection. If the genome is made public, it is then housed within the SEED and its proteins populate the FIGfam collection. This annotation cycle has proven to be a robust and scalable solution to the problem of annotating the exponentially increasing number of genomes. To date, >12 000 users worldwide have annotated >60 000 distinct genomes using RAST. Here we describe the interconnectedness of the SEED database and RAST, the RAST annotation pipeline and updates to both resources.


Subjects
Databases, Genetic; Genome, Archaeal; Genome, Bacterial; Molecular Sequence Annotation; Bacterial Proteins/chemistry; Bacterial Proteins/genetics; Bacterial Proteins/physiology; Genomics; Internet; Software
14.
Nucleic Acids Res ; 42(Database issue): D581-91, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24225323

ABSTRACT

The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein-protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Datatypes are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10,000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes, where comparisons of different annotations are available, and also include available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest and a private workspace where they can store both genomic and gene associations, and their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available. This manuscript describes updates to the PATRIC since its initial report in the 2007 NAR Database Issue.


Subjects
Databases, Genetic; Genome, Bacterial; Bacteria/classification; Bacteria/genetics; Bacterial Infections/microbiology; Bacterial Proteins/chemistry; Bacterial Proteins/metabolism; Bacterial Typing Techniques; Gene Expression Profiling; Genomics; Humans; Internet; Protein Conformation; Protein Interaction Mapping
15.
J Bacteriol ; 196(5): 920-30, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24336939

ABSTRACT

Brucella species include important zoonotic pathogens that have a substantial impact on both agriculture and human health throughout the world. Brucellae are thought of as "stealth pathogens" that escape recognition by the host innate immune response, modulate the acquired immune response, and evade intracellular destruction. We analyzed the genome sequences of members of the family Brucellaceae to assess its evolutionary history from likely free-living soil-based progenitors into highly successful intracellular pathogens. Phylogenetic analysis split the genus into two groups: recently identified and early-dividing "atypical" strains and a highly conserved "classical" core clade containing the major pathogenic species. Lateral gene transfer events brought unique genomic regions into Brucella that differentiated them from Ochrobactrum and allowed the stepwise acquisition of virulence factors that include a type IV secretion system, a perosamine-based O antigen, and systems for sequestering metal ions that are absent in progenitors. Subsequent radiation within the core Brucella resulted in lineages that appear to have evolved within their preferred mammalian hosts, restricting their virulence to become stealth pathogens capable of causing long-term chronic infections.


Subjects
Biological Evolution; Brucellaceae/genetics; Brucellaceae/pathogenicity; Genome, Bacterial; Genomics/methods; Phylogeny; Bacterial Proteins/genetics; Bacterial Proteins/metabolism; Gene Expression Regulation, Bacterial/physiology; Virulence
16.
Cancers (Basel) ; 16(3)2024 Jan 26.
Article in English | MEDLINE | ID: mdl-38339281

ABSTRACT

It is well-known that cancers of the same histology type can respond differently to a treatment. Thus, computational drug response prediction is of paramount importance for both preclinical drug screening studies and clinical treatment design. To build drug response prediction models, treatment response data need to be generated through screening experiments and used as input to train the prediction models. In this study, we investigate various active learning strategies of selecting experiments to generate response data for the purposes of (1) improving the performance of drug response prediction models built on the data and (2) identifying effective treatments. Here, we focus on constructing drug-specific response prediction models for cancer cell lines. Various approaches have been designed and applied to select cell lines for screening, including a random, greedy, uncertainty, diversity, combination of greedy and uncertainty, sampling-based hybrid, and iteration-based hybrid approach. All of these approaches are evaluated and compared using two criteria: (1) the number of identified hits that are selected experiments validated to be responsive, and (2) the performance of the response prediction model trained on the data of selected experiments. The analysis was conducted for 57 drugs and the results show a significant improvement on identifying hits using active learning approaches compared with the random and greedy sampling method. Active learning approaches also show an improvement on response prediction performance for some of the drugs and analysis runs compared with the greedy sampling method.
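Three of the selection strategies named in this abstract (greedy, uncertainty, and their combination) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation. Each candidate cell line is assumed to carry a predicted responsiveness score (higher means more likely to be a hit) and an uncertainty estimate. The weighted-sum combination, the function names and the example values are all hypothetical.

```python
def select_greedy(candidates, k):
    """Pick the k candidates with the highest predicted responsiveness."""
    return sorted(candidates, key=lambda name: candidates[name][0], reverse=True)[:k]

def select_uncertainty(candidates, k):
    """Pick the k candidates whose predictions are least certain."""
    return sorted(candidates, key=lambda name: candidates[name][1], reverse=True)[:k]

def select_combined(candidates, k, weight=0.5):
    """Blend both criteria; `weight` is the share given to uncertainty (assumption)."""
    def score(name):
        predicted, uncertainty = candidates[name]
        return (1 - weight) * predicted + weight * uncertainty
    return sorted(candidates, key=score, reverse=True)[:k]

# Hypothetical candidates: name -> (predicted responsiveness, uncertainty).
candidates = {
    "CL1": (0.9, 0.05),
    "CL2": (0.4, 0.30),
    "CL3": (0.7, 0.20),
    "CL4": (0.2, 0.10),
}

greedy_pick = select_greedy(candidates, 2)
uncertain_pick = select_uncertainty(candidates, 2)
combined_pick = select_combined(candidates, 2)
```

In an active learning loop, the selected cell lines would be screened, their measured responses added to the training data, and the model retrained before the next selection round.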

17.
Methods Mol Biol ; 2802: 547-571, 2024.
Article in English | MEDLINE | ID: mdl-38819571

ABSTRACT

As genomic and related data continue to expand, research biologists are often hampered by the computational hurdles required to analyze their data. The National Institute of Allergy and Infectious Diseases (NIAID) established the Bioinformatics Resource Centers (BRC) to assist researchers with their analysis of genome sequence and other omics-related data. Recently, the PAThosystems Resource Integration Center (PATRIC), the Influenza Research Database (IRD), and the Virus Pathogen Database and Analysis Resource (ViPR) BRCs merged to form the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) at https://www.bv-brc.org/ . The combined BV-BRC leverages the functionality of the original resources for bacterial and viral research communities with a unified data model, enhanced web-based visualization and analysis tools, and bioinformatics services. Here we demonstrate how antimicrobial resistance data can be analyzed in the new resource.


Subjects
Bacteria; Computational Biology; Databases, Genetic; Drug Resistance, Bacterial; Genomics; Genomics/methods; Computational Biology/methods; Drug Resistance, Bacterial/genetics; Bacteria/genetics; Bacteria/drug effects; Humans; Software; Genome, Bacterial; Anti-Bacterial Agents/pharmacology; Web Browser; United States; National Institute of Allergy and Infectious Diseases (U.S.)
18.
Front Med (Lausanne) ; 10: 1058919, 2023.
Article in English | MEDLINE | ID: mdl-36960342

ABSTRACT

Patient-derived xenografts (PDXs) are an appealing platform for preclinical drug studies. A primary challenge in modeling drug response prediction (DRP) with PDXs and neural networks (NNs) is the limited number of drug response samples. We investigate multimodal neural network (MM-Net) and data augmentation for DRP in PDXs. The MM-Net learns to predict response using drug descriptors, gene expressions (GE), and histology whole-slide images (WSIs). We explore whether combining WSIs with GE improves predictions as compared with models that use GE alone. We propose two data augmentation methods which allow us to train multimodal and unimodal NNs on a single larger dataset without changing architectures: 1) combine single-drug and drug-pair treatments by homogenizing drug representations, and 2) augment drug-pairs which doubles the sample size of all drug-pair samples. Unimodal NNs which use GE are compared to assess the contribution of data augmentation. The NN that uses the original and the augmented drug-pair treatments as well as single-drug treatments outperforms NNs that ignore either the augmented drug-pairs or the single-drug treatments. In assessing the multimodal learning based on the MCC metric, MM-Net outperforms all the baselines. Our results show that data augmentation and integration of histology images with GE can improve prediction performance of drug response in PDXs.
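Two ideas from this abstract lend themselves to a brief sketch: doubling the drug-pair samples and scoring with the Matthews correlation coefficient (MCC). The abstract does not say how drug-pairs are augmented. The code below assumes it is done by swapping the order of the two drugs, which is one plausible way to double the sample count. The MCC formula is the standard one. Function names and data are hypothetical.

```python
import math

def augment_drug_pairs(samples):
    """Append a copy of each (drug_a, drug_b, response) sample with the drugs swapped."""
    augmented = list(samples)
    for drug_a, drug_b, response in samples:
        augmented.append((drug_b, drug_a, response))
    return augmented

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = responder)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Made-up drug-pair samples: (first drug, second drug, response label).
pair_samples = [("drugA", "drugB", 1), ("drugC", "drugD", 0)]
augmented = augment_drug_pairs(pair_samples)

# Made-up labels and predictions for the metric.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
score = mcc(y_true, y_pred)
```

MCC is a common choice when response classes are imbalanced, since it uses all four cells of the confusion matrix.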

19.
Cancers (Basel) ; 16(1)2023 Dec 21.
Article in English | MEDLINE | ID: mdl-38201477

ABSTRACT

Cancer is a heterogeneous disease in that tumors of the same histology type can respond differently to a treatment. Anti-cancer drug response prediction is of paramount importance for both drug development and patient treatment design. Although various computational methods and data have been used to develop drug response prediction models, it remains a challenging problem due to the complexities of cancer mechanisms and cancer-drug interactions. To better characterize the interaction between cancer and drugs, we investigate the feasibility of integrating computationally derived features of molecular mechanisms of action into prediction models. Specifically, we add docking scores of drug molecules and target proteins in combination with cancer gene expressions and molecular drug descriptors for building response models. The results demonstrate a marginal improvement in drug response prediction performance when adding docking scores as additional features, through tests on large drug screening data. We discuss the limitations of the current approach and provide the research community with a baseline dataset of the large-scale computational docking for anti-cancer drugs.

20.
J Bacteriol ; 194(2): 376-94, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22056929

ABSTRACT

We present the draft genome for the Rickettsia endosymbiont of Ixodes scapularis (REIS), a symbiont of the deer tick vector of Lyme disease in North America. Among Rickettsia species (Alphaproteobacteria: Rickettsiales), REIS has the largest genome sequenced to date (>2 Mb) and contains 2,309 genes across the chromosome and four plasmids (pREIS1 to pREIS4). The most remarkable finding within the REIS genome is the extraordinary proliferation of mobile genetic elements (MGEs), which contributes to a limited synteny with other Rickettsia genomes. In particular, an integrative conjugative element named RAGE (for Rickettsiales amplified genetic element), previously identified in scrub typhus rickettsiae (Orientia tsutsugamushi) genomes, is present on both the REIS chromosome and plasmids. Unlike the pseudogene-laden RAGEs of O. tsutsugamushi, REIS encodes nine conserved RAGEs that include F-like type IV secretion systems similar to that of the tra genes encoded in the Rickettsia bellii and R. massiliae genomes. An unparalleled abundance of encoded transposases (>650) relative to genome size, together with the RAGEs and other MGEs, comprise ~35% of the total genome, making REIS one of the most plastic and repetitive bacterial genomes sequenced to date. We present evidence that conserved rickettsial genes associated with an intracellular lifestyle were acquired via MGEs, especially the RAGE, through a continuum of genomic invasions. Robust phylogeny estimation suggests REIS is ancestral to the virulent spotted fever group of rickettsiae. As REIS is not known to invade vertebrate cells and has no known pathogenic effects on I. scapularis, its genome sequence provides insight on the origin of mechanisms of rickettsial pathogenicity.


Subjects
Gene Expression Regulation, Bacterial/physiology; Genome, Bacterial; Interspersed Repetitive Sequences; Ixodes/microbiology; Rickettsia/genetics; Animals; Arachnid Vectors/microbiology; Biological Evolution; Chromosome Mapping; Chromosomes, Bacterial; Molecular Sequence Data; Plasmids; Symbiosis