1.
Nucleic Acids Res ; 51(D1): D678-D689, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36350631

ABSTRACT

The National Institute of Allergy and Infectious Diseases (NIAID) established the Bioinformatics Resource Center (BRC) program to assist researchers with analyzing the growing body of genome sequence and other omics-related data. In this report, we describe the merger of the PAThosystems Resource Integration Center (PATRIC), the Influenza Research Database (IRD) and the Virus Pathogen Database and Analysis Resource (ViPR) BRCs to form the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) https://www.bv-brc.org/. The combined BV-BRC leverages the functionality of the bacterial and viral resources to provide a unified data model, enhanced web-based visualization and analysis tools, bioinformatics services, and a powerful suite of command line tools that benefit the bacterial and viral research communities.


Subject(s)
Genomics; Software; Viruses; Humans; Bacteria/genetics; Computational Biology; Databases, Genetic; Influenza, Human; Viruses/genetics
2.
Nature ; 551(7681): 457-463, 2017 11 23.
Article in English | MEDLINE | ID: mdl-29088705

ABSTRACT

Our growing awareness of the microbial world's importance and diversity contrasts starkly with our limited understanding of its fundamental structure. Despite recent advances in DNA sequencing, a lack of standardized protocols and common analytical frameworks impedes comparisons among studies, hindering the development of global inferences about microbial life on Earth. Here we present a meta-analysis of microbial community samples collected by hundreds of researchers for the Earth Microbiome Project. Coordinated protocols and new analytical methods, particularly the use of exact sequences instead of clustered operational taxonomic units, enable bacterial and archaeal ribosomal RNA gene sequences to be followed across multiple studies and allow us to explore patterns of diversity at an unprecedented scale. The result is both a reference database giving global context to DNA sequence data and a framework for incorporating data from future studies, fostering increasingly complete characterization of Earth's microbial diversity.


Subject(s)
Biodiversity; Earth, Planet; Microbiota/genetics; Animals; Archaea/genetics; Archaea/isolation & purification; Bacteria/genetics; Bacteria/isolation & purification; Ecology/methods; Gene Dosage; Geographic Mapping; Humans; Plants/microbiology; RNA, Ribosomal, 16S/analysis; RNA, Ribosomal, 16S/genetics
3.
BMC Bioinformatics ; 22(1): 252, 2021 May 17.
Article in English | MEDLINE | ID: mdl-34001007

ABSTRACT

BACKGROUND: Motivated by the size and availability of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating drug response data, a common question is whether the generalization performance of existing prediction models can be further improved with more training data. METHODS: We utilize empirical learning curves for evaluating and comparing the data scaling properties of two neural networks (NNs) and two gradient boosting decision tree (GBDT) models trained on four cell line drug screening datasets. The learning curves are accurately fitted to a power law model, providing a framework for assessing the data scaling behavior of these models. RESULTS: The curves demonstrate that no single model dominates in terms of prediction performance across all datasets and training sizes, thus suggesting that the actual shape of these curves depends on the unique pair of an ML model and a dataset. The multi-input NN (mNN), in which gene expressions of cancer cells and molecular drug descriptors are input into separate subnetworks, outperforms a single-input NN (sNN), where the cell and drug features are concatenated for the input layer. In contrast, a GBDT with hyperparameter tuning exhibits superior performance as compared with both NNs at the lower range of training set sizes for two of the tested datasets, whereas the mNN consistently performs better at the higher range of training sizes. Moreover, the trajectory of the curves suggests that increasing the sample size is expected to further improve prediction scores of both NNs. These observations demonstrate the benefit of using learning curves to evaluate prediction models, providing a broader perspective on the overall data scaling characteristics. 
CONCLUSIONS: A fitted power law learning curve provides a forward-looking metric for analyzing prediction performance and can serve as a co-design tool to guide experimental biologists and computational scientists in the design of future experiments in prospective research studies.
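
As an illustration of the learning-curve idea in this abstract, the sketch below fits a power law to (training size, error) points and extrapolates to a larger training set. It is a minimal sketch under stated assumptions, not the authors' code: the abstract does not give the functional form, so a two-parameter form error = a * size**b with no offset term is assumed, and the function names and the data points are hypothetical.

```python
import math

def fit_power_law(sizes, errors):
    """Least-squares fit of errors ~ a * sizes**b, done as a line in log-log space.

    Assumption: a pure two-parameter power law (no irreducible-error offset).
    """
    xs = [math.log(m) for m in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)   # intercept in log-log space = log(a)
    return a, b

def predict_error(a, b, size):
    """Extrapolate the fitted curve to a training set of the given size."""
    return a * size ** b

# Synthetic, hypothetical learning curve generated from error = 2 * size**-0.5.
sizes = [1000, 2000, 4000, 8000, 16000]
errors = [2.0 * m ** -0.5 for m in sizes]

a, b = fit_power_law(sizes, errors)
projected = predict_error(a, b, 32000)
```

A negative fitted exponent means the curve is still falling, which is the kind of forward-looking signal the abstract describes; with noisy real data a nonlinear fit with an offset term may be more appropriate.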


Subject(s)
Neoplasms; Pharmaceutical Preparations; Cell Line; Learning Curve; Machine Learning; Neoplasms/drug therapy; Neoplasms/genetics; Prospective Studies
4.
Brief Bioinform ; 20(4): 1094-1102, 2019 07 19.
Article in English | MEDLINE | ID: mdl-28968762

ABSTRACT

The Pathosystems Resource Integration Center (PATRIC, www.patricbrc.org) is designed to provide researchers with the tools and services that they need to perform genomic and other 'omic' data analyses. In response to mounting concern over antimicrobial resistance (AMR), the PATRIC team has been developing new tools that help researchers understand AMR and its genetic determinants. To support comparative analyses, we have added AMR phenotype data to over 15 000 genomes in the PATRIC database, often assembling genomes from reads in public archives and collecting their associated AMR panel data from the literature to augment the collection. We have also been using this collection of AMR metadata to build machine learning-based classifiers that can predict the AMR phenotypes and the genomic regions associated with resistance for genomes being submitted to the annotation service. Likewise, we have undertaken a large AMR protein annotation effort by manually curating data from the literature and public repositories. This collection of 7370 AMR reference proteins, which contains many protein annotations (functional roles) that are unique to PATRIC and RAST, has been manually curated so that it projects stably across genomes. The collection currently projects to 1 610 744 proteins in the PATRIC database. Finally, the PATRIC Web site has been expanded to enable AMR-based custom page views so that researchers can easily explore AMR data and design experiments based on whole genomes or individual genes.


Subject(s)
Computational Biology/methods; Databases, Genetic; Drug Resistance, Microbial/genetics; Systems Integration; Computational Biology/trends; Databases, Genetic/statistics & numerical data; Genome, Microbial; Humans; Internet; Molecular Sequence Annotation
5.
J Clin Microbiol ; 57(2)2019 02.
Article in English | MEDLINE | ID: mdl-30333126

ABSTRACT

Nontyphoidal Salmonella species are the leading bacterial cause of foodborne disease in the United States. Whole-genome sequences and paired antimicrobial susceptibility data are available for Salmonella strains because of surveillance efforts from public health agencies. In this study, a collection of 5,278 nontyphoidal Salmonella genomes, collected over 15 years in the United States, was used to generate extreme gradient boosting (XGBoost)-based machine learning models for predicting MICs for 15 antibiotics. The MIC prediction models had an overall average accuracy of 95% within ±1 2-fold dilution step (confidence interval, 95% to 95%), an average very major error rate of 2.7% (confidence interval, 2.4% to 3.0%), and an average major error rate of 0.1% (confidence interval, 0.1% to 0.2%). The model predicted MICs with no a priori information about the underlying gene content or resistance phenotypes of the strains. By selecting diverse genomes for the training sets, we show that highly accurate MIC prediction models can be generated with less than 500 genomes. We also show that our approach for predicting MICs is stable over time, despite annual fluctuations in antimicrobial resistance gene content in the sampled genomes. Finally, using feature selection, we explore the important genomic regions identified by the models for predicting MICs. To date, this is one of the largest MIC modeling studies to be published. Our strategy for developing whole-genome sequence-based models for surveillance and clinical diagnostics can be readily applied to other important human pathogens.
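
The accuracy criterion quoted in this abstract (a prediction counts as correct when it falls within ±1 two-fold dilution step of the measured MIC) can be sketched as follows. This is a minimal illustration, assuming MICs are given as positive concentrations on a two-fold dilution series; the function names and the example values are hypothetical and are not taken from the study.

```python
import math

def within_one_dilution(predicted_mic, measured_mic):
    """True if the two MICs differ by at most one two-fold dilution step."""
    steps = abs(math.log2(predicted_mic) - math.log2(measured_mic))
    return steps <= 1.0 + 1e-9  # small tolerance for floating-point error

def dilution_accuracy(predicted, measured):
    """Fraction of predictions within +/-1 two-fold dilution of the measurement."""
    hits = sum(within_one_dilution(p, m) for p, m in zip(predicted, measured))
    return hits / len(measured)

# Hypothetical MIC values in micrograms per millilitre.
measured = [0.5, 1, 2, 8, 16]
predicted = [1, 1, 8, 4, 16]

accuracy = dilution_accuracy(predicted, measured)
```

Working in log2 space turns each doubling or halving of concentration into one unit, so the tolerance is a simple absolute difference.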


Subject(s)
Drug Resistance, Bacterial; Genotyping Techniques/methods; Machine Learning; Microbial Sensitivity Tests/methods; Salmonella Infections/microbiology; Salmonella/drug effects; Salmonella/genetics; Foodborne Diseases/microbiology; Genome, Bacterial; Humans; Salmonella/isolation & purification; United States
6.
Nucleic Acids Res ; 45(D1): D535-D542, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899627

ABSTRACT

The Pathosystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center (https://www.patricbrc.org). Recent changes to PATRIC include a redesign of the web interface and some new services that provide users with a platform that takes them from raw reads to an integrated analysis experience. The redesigned interface allows researchers direct access to tools and data, and the emphasis has changed to user-created genome-groups, with detailed summaries and views of the data that researchers have selected. Perhaps the biggest change has been the enhanced capability for researchers to analyze their private data and compare it to the available public data. Researchers can assemble their raw sequence reads and annotate the contigs using RASTtk. PATRIC also provides services for RNA-Seq, variation, model reconstruction and differential expression analysis, all delivered through an updated private workspace. Private data can be compared by 'virtual integration' to any of PATRIC's public data. The number of genomes available for comparison in PATRIC has expanded to over 80 000, with a special emphasis on genomes with antimicrobial resistance data. PATRIC uses this data to improve both subsystem annotation and k-mer classification, and tags new genomes as having signatures that indicate susceptibility or resistance to specific antibiotics.


Subject(s)
Bacteria/genetics; Computational Biology/methods; Databases, Genetic; Genome, Bacterial; Genomics/methods; Anti-Bacterial Agents/pharmacology; Bacteria/drug effects; Bacteria/metabolism; Bacterial Proteins/genetics; Bacterial Proteins/metabolism; Drug Resistance, Bacterial; Molecular Sequence Annotation; Proteome; Proteomics/methods; Software; Web Browser
7.
BMC Bioinformatics ; 19(Suppl 18): 486, 2018 Dec 21.
Article in English | MEDLINE | ID: mdl-30577754

ABSTRACT

BACKGROUND: The National Cancer Institute drug pair screening effort against 60 well-characterized human tumor cell lines (NCI-60) presents an unprecedented resource for modeling combinational drug activity. RESULTS: We present a computational model for predicting cell line response to a subset of drug pairs in the NCI-ALMANAC database. Based on residual neural networks for encoding features as well as predicting tumor growth, our model explains 94% of the response variance. While our best result is achieved with a combination of molecular feature types (gene expression, microRNA and proteome), we show that most of the predictive power comes from drug descriptors. To further demonstrate value in detecting anticancer therapy, we rank the drug pairs for each cell line based on model predicted combination effect and recover 80% of the top pairs with enhanced activity. CONCLUSIONS: We present promising results in applying deep learning to predicting combinational drug response. Our feature analysis indicates screening data involving more cell lines are needed for the models to make better use of molecular features.
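
The two headline figures in this abstract (fraction of response variance explained, and recovery of top-ranked drug pairs) correspond to standard evaluation quantities that can be sketched as below. This is an illustrative sketch only: the abstract does not define how "top pairs with enhanced activity" were selected, so a simple top-k overlap is assumed, and all names and numbers are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: share of observed variance explained."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def top_k_recovery(observed_scores, predicted_scores, k):
    """Fraction of the truly top-k drug pairs that the model also ranks in its top k.

    Assumption: higher score means a stronger combination effect.
    """
    top_obs = set(sorted(observed_scores, key=observed_scores.get, reverse=True)[:k])
    top_pred = set(sorted(predicted_scores, key=predicted_scores.get, reverse=True)[:k])
    return len(top_obs & top_pred) / k

# Hypothetical responses for one cell line.
r2 = r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])

observed_scores = {("A", "B"): 0.9, ("A", "C"): 0.7, ("B", "C"): 0.2, ("C", "D"): 0.1}
predicted_scores = {("A", "B"): 0.8, ("A", "C"): 0.3, ("B", "C"): 0.5, ("C", "D"): 0.0}
recovery = top_k_recovery(observed_scores, predicted_scores, k=2)
```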


Subject(s)
Deep Learning/trends; Drug Evaluation, Preclinical/methods; Cell Line, Tumor; Humans; National Cancer Institute (U.S.); Neural Networks, Computer; United States
8.
Article in English | MEDLINE | ID: mdl-28069655

ABSTRACT

β-Lactams are the most widely used antibacterials. Among β-lactams, carbapenems are considered the last line of defense against recalcitrant infections. As recent developments have prompted consideration of carbapenems for treatment of drug-resistant tuberculosis, it is only a matter of time before Mycobacterium tuberculosis strains resistant to these drugs will emerge. In the present study, we investigated the genetic basis that confers such resistance. To our surprise, instead of mutations in the known β-lactam targets, a single nucleotide polymorphism in the Rv2421c-Rv2422 intergenic region was common among M. tuberculosis mutants selected with meropenem or biapenem. We present data supporting the hypothesis that this locus harbors a previously unidentified gene that encodes a protein. This protein binds to β-lactams, slowly hydrolyzes the chromogenic β-lactam nitrocefin, and is inhibited by select penicillins and carbapenems and the β-lactamase inhibitor clavulanate. The mutation results in a W62R substitution that reduces the protein's nitrocefin-hydrolyzing activity and binding affinities for carbapenems.


Subject(s)
Bacterial Proteins/genetics; DNA, Intergenic; Mutation; Mycobacterium tuberculosis/genetics; beta-Lactam Resistance/genetics; Amino Acid Sequence; Amino Acid Substitution; Anti-Bacterial Agents/pharmacology; Bacterial Proteins/metabolism; Base Sequence; Cephalosporins/metabolism; Cephalosporins/pharmacology; Clavulanic Acid/metabolism; Clavulanic Acid/pharmacology; Gene Expression; Genetic Loci; Humans; Meropenem; Microbial Sensitivity Tests; Mycobacterium tuberculosis/drug effects; Mycobacterium tuberculosis/isolation & purification; Mycobacterium tuberculosis/metabolism; Open Reading Frames; Protein Binding; Thienamycins/pharmacology; Tuberculosis, Multidrug-Resistant/microbiology
9.
BMC Genomics ; 17: 568, 2016 Aug 08.
Article in English | MEDLINE | ID: mdl-27502787

ABSTRACT

BACKGROUND: Automatically generated bacterial metabolic models, and even some curated models, lack accuracy in predicting energy yields due to poor representation of key pathways in energy biosynthesis and the electron transport chain (ETC). Further compounding the problem, complex interlinking pathways in genome-scale metabolic models, and the need for extensive gapfilling to support complex biomass reactions, often result in predicting unrealistic yields or unrealistic physiological flux profiles. RESULTS: To overcome this challenge, we developed methods and tools (http://coremodels.mcs.anl.gov) to build high-quality core metabolic models (CMM) representing accurate energy biosynthesis based on a well-studied, phylogenetically diverse set of model organisms. We compare these models to explore the variability of core pathways across all microbial life, and by analyzing the ability of our core models to synthesize ATP and essential biomass precursors, we evaluate the extent to which the core metabolic pathways and functional ETCs are known for all microbes. 6,600 (80%) of our models were found to have some type of aerobic ETC, whereas 5,100 (62%) have an anaerobic ETC, and 1,279 (15%) do not have any ETC. Using our manually curated ETC and energy biosynthesis pathways with no gapfilling at all, we predict accurate ATP yields for nearly 5,586 (70%) of the models under aerobic and anaerobic growth conditions. This study revealed gaps in our knowledge of the central pathways that result in 2,495 (30%) CMMs being unable to produce ATP under any of the tested conditions. We then established a methodology for the systematic identification and correction of inconsistent annotations using core metabolic models coupled with phylogenetic analysis. CONCLUSIONS: We predict accurate energy yields based on our improved annotations in energy biosynthesis pathways and the implementation of diverse ETC reactions across the microbial tree of life. We highlighted missing annotations that were essential to energy biosynthesis in our models. We examine the diversity of these pathways across all microbial life and enable the scientific community to explore the analyses generated from this large-scale analysis of over 8000 microbial genomes.


Subject(s)
Energy Metabolism; Metabolic Networks and Pathways; Models, Biological; Adenosine Triphosphate/biosynthesis; Bacteria/classification; Bacteria/genetics; Bacteria/metabolism; Biomass; Computational Biology/methods; Electron Transport Chain Complex Proteins/metabolism; Genomics/methods; Molecular Sequence Annotation; Phylogeny
10.
Nucleic Acids Res ; 42(Database issue): D581-91, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24225323

ABSTRACT

The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein-protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Datatypes are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10,000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes, where comparisons of different annotations are available, and also include available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest and a private workspace where they can store both genomic and gene associations, and their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available. This manuscript describes updates to the PATRIC since its initial report in the 2007 NAR Database Issue.


Subject(s)
Databases, Genetic; Genome, Bacterial; Bacteria/classification; Bacteria/genetics; Bacterial Infections/microbiology; Bacterial Proteins/chemistry; Bacterial Proteins/metabolism; Bacterial Typing Techniques; Gene Expression Profiling; Genomics; Humans; Internet; Protein Conformation; Protein Interaction Mapping
11.
Nucleic Acids Res ; 41(1): 687-99, 2013 Jan 07.
Article in English | MEDLINE | ID: mdl-23109554

ABSTRACT

The nonessential regions in bacterial chromosomes are ill-defined due to incomplete functional information. Here, we establish a comprehensive repertoire of the genome regions that are dispensable for growth of Bacillus subtilis in a variety of media conditions. In complex medium, we attempted deletion of 157 individual regions ranging in size from 2 to 159 kb. A total of 146 deletions were successful in complex medium, whereas the remaining regions were subdivided to identify new essential genes (4) and coessential gene sets (7). Overall, our repertoire covers ~76% of the genome. We screened for viability of mutant strains in rich defined medium and glucose minimal media. Experimental observations were compared with predictions by the iBsu1103 model, revealing discrepancies that led to numerous model changes, including the large-scale application of model reconciliation techniques. We ultimately produced the iBsu1103V2 model and generated predictions of metabolites that could restore the growth of unviable strains. These predictions were experimentally tested and demonstrated to be correct for 27 strains, validating the refinements made to the model. The iBsu1103V2 model has improved considerably at predicting loss of viability, and many insights gained from the model revisions have been integrated into the Model SEED to improve reconstruction of other microbial models.


Subject(s)
Bacillus subtilis/genetics; Chromosomes, Bacterial; Models, Biological; Bacillus subtilis/growth & development; Bacillus subtilis/metabolism; Chromosome Deletion; Chromosome Mapping; Metabolic Networks and Pathways/genetics; Phenotype
12.
Cancers (Basel) ; 16(3)2024 Jan 26.
Article in English | MEDLINE | ID: mdl-38339281

ABSTRACT

It is well known that cancers of the same histology type can respond differently to a treatment. Thus, computational drug response prediction is of paramount importance for both preclinical drug screening studies and clinical treatment design. To build drug response prediction models, treatment response data need to be generated through screening experiments and used as input to train the prediction models. In this study, we investigate various active learning strategies of selecting experiments to generate response data for the purposes of (1) improving the performance of drug response prediction models built on the data and (2) identifying effective treatments. Here, we focus on constructing drug-specific response prediction models for cancer cell lines. Various approaches have been designed and applied to select cell lines for screening, including a random, greedy, uncertainty, diversity, combination of greedy and uncertainty, sampling-based hybrid, and iteration-based hybrid approach. All of these approaches are evaluated and compared using two criteria: (1) the number of identified hits that are selected experiments validated to be responsive, and (2) the performance of the response prediction model trained on the data of selected experiments. The analysis was conducted for 57 drugs and the results show a significant improvement in identifying hits using active learning approaches compared with the random and greedy sampling methods. Active learning approaches also show an improvement in response prediction performance for some of the drugs and analysis runs compared with the greedy sampling method.
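
Two of the selection strategies named in this abstract, greedy and uncertainty-based, can be contrasted with a small sketch. This is a hypothetical illustration rather than the study's implementation: it assumes each candidate cell line already has a predicted responsiveness score (higher means more likely to be a hit) and an uncertainty estimate, and the field names and values are invented for the example.

```python
def select_greedy(candidates, k):
    """Pick the k cell lines the current model predicts to be most responsive."""
    return sorted(candidates, key=lambda c: c["predicted"], reverse=True)[:k]

def select_uncertain(candidates, k):
    """Pick the k cell lines whose predictions the model is least sure about."""
    return sorted(candidates, key=lambda c: c["uncertainty"], reverse=True)[:k]

# Hypothetical pool of unscreened cell lines for one drug.
pool = [
    {"name": "CL1", "predicted": 0.9, "uncertainty": 0.05},
    {"name": "CL2", "predicted": 0.4, "uncertainty": 0.30},
    {"name": "CL3", "predicted": 0.7, "uncertainty": 0.10},
    {"name": "CL4", "predicted": 0.2, "uncertainty": 0.25},
]

greedy_batch = [c["name"] for c in select_greedy(pool, 2)]
uncertain_batch = [c["name"] for c in select_uncertain(pool, 2)]
```

Greedy selection targets likely hits directly, while uncertainty selection targets experiments expected to be most informative for retraining the model; the hybrid approaches listed in the abstract mix such criteria.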

13.
Methods Mol Biol ; 2802: 547-571, 2024.
Article in English | MEDLINE | ID: mdl-38819571

ABSTRACT

As genomic and related data continue to expand, research biologists are often hampered by the computational hurdles required to analyze their data. The National Institute of Allergy and Infectious Diseases (NIAID) established the Bioinformatics Resource Centers (BRC) to assist researchers with their analysis of genome sequence and other omics-related data. Recently, the PAThosystems Resource Integration Center (PATRIC), the Influenza Research Database (IRD), and the Virus Pathogen Database and Analysis Resource (ViPR) BRCs merged to form the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) at https://www.bv-brc.org/. The combined BV-BRC leverages the functionality of the original resources for bacterial and viral research communities with a unified data model, enhanced web-based visualization and analysis tools, and bioinformatics services. Here we demonstrate how antimicrobial resistance data can be analyzed in the new resource.


Subject(s)
Bacteria; Computational Biology; Databases, Genetic; Drug Resistance, Bacterial; Genomics; Genomics/methods; Computational Biology/methods; Drug Resistance, Bacterial/genetics; Bacteria/genetics; Bacteria/drug effects; Humans; Software; Genome, Bacterial; Anti-Bacterial Agents/pharmacology; Web Browser; United States; National Institute of Allergy and Infectious Diseases (U.S.)
15.
Front Med (Lausanne) ; 10: 1086097, 2023.
Article in English | MEDLINE | ID: mdl-36873878

ABSTRACT

Cancer claims millions of lives yearly worldwide. While many therapies have been made available in recent years, by and large cancer remains unsolved. Exploiting computational predictive models to study and treat cancer holds great promise in improving drug development and personalized design of treatment plans, ultimately suppressing tumors, alleviating suffering, and prolonging lives of patients. A wave of recent papers demonstrates promising results in predicting cancer response to drug treatments while utilizing deep learning methods. These papers investigate diverse data representations, neural network architectures, learning methodologies, and evaluation schemes. However, deciphering promising predominant and emerging trends is difficult due to the variety of explored methods and the lack of a standardized framework for comparing drug response prediction models. To obtain a comprehensive landscape of deep learning methods, we conducted an extensive search and analysis of deep learning models that predict the response to single drug treatments. A total of 61 deep learning-based models have been curated, and summary plots were generated. Based on the analysis, observable patterns and prevalence of methods have been revealed. This review allows us to better understand the current state of the field and to identify major challenges and promising solution paths.

16.
Front Med (Lausanne) ; 10: 1058919, 2023.
Article in English | MEDLINE | ID: mdl-36960342

ABSTRACT

Patient-derived xenografts (PDXs) are an appealing platform for preclinical drug studies. A primary challenge in modeling drug response prediction (DRP) with PDXs and neural networks (NNs) is the limited number of drug response samples. We investigate a multimodal neural network (MM-Net) and data augmentation for DRP in PDXs. The MM-Net learns to predict response using drug descriptors, gene expressions (GE), and histology whole-slide images (WSIs). We explore whether combining WSIs with GE improves predictions as compared with models that use GE alone. We propose two data augmentation methods which allow us to train multimodal and unimodal NNs on a single larger dataset without changing architectures: 1) combine single-drug and drug-pair treatments by homogenizing drug representations, and 2) augment drug pairs, which doubles the sample size of all drug-pair samples. Unimodal NNs which use GE are compared to assess the contribution of data augmentation. The NN that uses the original and the augmented drug-pair treatments as well as single-drug treatments outperforms NNs that ignore either the augmented drug-pairs or the single-drug treatments. In assessing the multimodal learning based on the MCC metric, MM-Net outperforms all the baselines. Our results show that data augmentation and integration of histology images with GE can improve prediction performance of drug response in PDXs.
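
The two augmentation ideas in this abstract can be sketched in a few lines. The abstract does not say how drug representations were homogenized, so this sketch assumes, purely for illustration, that a single-drug treatment is encoded as the same drug in both slots of a two-drug record; the pair augmentation is assumed to be a swap of the two drug slots. All function names, field names, and records are hypothetical.

```python
def homogenize(sample):
    """Encode every treatment as a two-slot record.

    Assumed convention: a single-drug treatment repeats its drug in both slots.
    """
    drug2 = sample.get("drug2") or sample["drug1"]
    return {"drug1": sample["drug1"], "drug2": drug2, "response": sample["response"]}

def augment_pairs(samples):
    """Add a slot-swapped copy of every true drug pair, doubling the pair samples."""
    swapped = [
        {**s, "drug1": s["drug2"], "drug2": s["drug1"]}
        for s in samples
        if s["drug1"] != s["drug2"]  # single-drug records gain nothing from a swap
    ]
    return samples + swapped

# Hypothetical raw treatments: one drug pair and one single drug.
raw = [
    {"drug1": "drugA", "drug2": "drugB", "response": 1},
    {"drug1": "drugC", "drug2": None, "response": 0},
]

homogenized = [homogenize(s) for s in raw]
training_set = augment_pairs(homogenized)
```

With a shared two-slot encoding, one network input layout serves both treatment types, which is what lets a single larger dataset be used without changing the architecture.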

17.
Cancers (Basel) ; 16(1)2023 Dec 21.
Article in English | MEDLINE | ID: mdl-38201477

ABSTRACT

Cancer is a heterogeneous disease in that tumors of the same histology type can respond differently to a treatment. Anti-cancer drug response prediction is of paramount importance for both drug development and patient treatment design. Although various computational methods and data have been used to develop drug response prediction models, it remains a challenging problem due to the complexities of cancer mechanisms and cancer-drug interactions. To better characterize the interaction between cancer and drugs, we investigate the feasibility of integrating computationally derived features of molecular mechanisms of action into prediction models. Specifically, we add docking scores of drug molecules and target proteins in combination with cancer gene expressions and molecular drug descriptors for building response models. The results demonstrate a marginal improvement in drug response prediction performance when adding docking scores as additional features, through tests on large drug screening data. We discuss the limitations of the current approach and provide the research community with a baseline dataset of the large-scale computational docking for anti-cancer drugs.

18.
Biochim Biophys Acta ; 1810(10): 967-77, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21421023

ABSTRACT

BACKGROUND: The development of next generation sequencing technology is rapidly changing the face of the genome annotation and analysis field. One of the primary uses for genome sequence data is to improve our understanding and prediction of phenotypes for microbes and microbial communities, but the technologies for predicting phenotypes must keep pace with the new sequences emerging. SCOPE OF REVIEW: This review presents an integrated view of the methods and technologies used in the inference of phenotypes for microbes and microbial communities based on genomic and metagenomic data. Given the breadth of this topic, we place special focus on the resources available within the SEED Project. We discuss the two steps involved in connecting genotype to phenotype: sequence annotation, and phenotype inference, and we highlight the challenges in each of these steps when dealing with both single genome and metagenome data. MAJOR CONCLUSIONS: This integrated view of the genotype-to-phenotype problem highlights the importance of a controlled ontology in the annotation of genomic data, as this benefits subsequent phenotype inference and metagenome annotation. We also note the importance of expanding the set of reference genomes to improve the annotation of all sequence data, and we highlight metagenome assembly as a potential new source for complete genomes. Finally, we find that phenotype inference, particularly from metabolic models, generates predictions that can be validated and reconciled to improve annotations. GENERAL SIGNIFICANCE: This review presents the first look at the challenges and opportunities associated with the inference of phenotype from genotype during the next generation sequencing revolution. This article is part of a Special Issue entitled: Systems Biology of Microorganisms.


Subject(s)
Genotype; Phenotype; Sequence Analysis, DNA/methods; Animals; Humans; Metagenomics/methods
19.
Sci Rep ; 11(1): 11325, 2021 05 31.
Article in English | MEDLINE | ID: mdl-34059739

ABSTRACT

Convolutional neural networks (CNNs) have been successfully used in many applications where important information about data is embedded in the order of features, such as speech and imaging. However, most tabular data do not assume a spatial relationship between features, and thus are unsuitable for modeling using CNNs. To meet this challenge, we develop a novel algorithm, image generator for tabular data (IGTD), to transform tabular data into images by assigning features to pixel positions so that similar features are close to each other in the image. The algorithm searches for an optimized assignment by minimizing the difference between the ranking of distances between features and the ranking of distances between their assigned pixels in the image. We apply IGTD to transform gene expression profiles of cancer cell lines (CCLs) and molecular descriptors of drugs into their respective image representations. Compared with existing transformation methods, IGTD generates compact image representations with better preservation of feature neighborhood structure. Evaluated on benchmark drug screening datasets, CNNs trained on IGTD image representations of CCLs and drugs exhibit a better performance of predicting anti-cancer drug response than both CNNs trained on alternative image representations and prediction models trained on the original tabular data.
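
The objective described in this abstract, comparing the ranking of feature-to-feature distances with the ranking of distances between assigned pixels, can be sketched for a toy case. This is a simplified illustration and not the published IGTD implementation: it assumes the discrepancy is the sum of absolute rank differences, breaks ties by pair order, and only scores a given assignment rather than searching for the best one. All names and values are hypothetical.

```python
from itertools import combinations

def rank_pairs(distances):
    """Map each pair to the rank of its distance (0 = smallest).

    Ties keep insertion order, which is a simplification made for this sketch.
    """
    ordered = sorted(distances, key=distances.get)
    return {pair: rank for rank, pair in enumerate(ordered)}

def assignment_error(feature_dist, pixel_coords, assignment):
    """Sum of absolute differences between feature-pair ranks and pixel-pair ranks.

    feature_dist: {(a, b): distance} with a < b in sorted feature-name order.
    pixel_coords: list of (row, col) positions, indexed by pixel number.
    assignment:   {feature name: pixel number}.
    """
    f_d, p_d = {}, {}
    for a, b in combinations(sorted(assignment), 2):
        f_d[(a, b)] = feature_dist[(a, b)]
        (r1, c1), (r2, c2) = pixel_coords[assignment[a]], pixel_coords[assignment[b]]
        p_d[(a, b)] = ((r1 - r2) ** 2 + (c1 - c2) ** 2) ** 0.5
    f_rank, p_rank = rank_pairs(f_d), rank_pairs(p_d)
    return sum(abs(f_rank[pair] - p_rank[pair]) for pair in f_d)

# Toy example: three features placed on a 1 x 3 image.
feature_dist = {("g1", "g2"): 0.1, ("g1", "g3"): 0.9, ("g2", "g3"): 0.2}
pixel_coords = [(0, 0), (0, 1), (0, 2)]

good_error = assignment_error(feature_dist, pixel_coords, {"g1": 0, "g2": 1, "g3": 2})
bad_error = assignment_error(feature_dist, pixel_coords, {"g1": 0, "g2": 2, "g3": 1})
```

Placing the two most dissimilar features (g1 and g3) at opposite ends of the image gives the lower error, which matches the abstract's goal of keeping similar features close together.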


Subject(s)
Deep Learning; Image Processing, Computer-Assisted; Software; Cell Line, Tumor; Humans
20.
Pathogens ; 10(6)2021 May 22.
Article in English | MEDLINE | ID: mdl-34067337

ABSTRACT

Pneumonic tularemia is a highly debilitating and potentially fatal disease caused by inhalation of Francisella tularensis. Most of our current understanding of its pathogenesis is based on the highly virulent F. tularensis subsp. tularensis strain SCHU S4. However, multiple sources of SCHU S4 have been maintained and propagated independently over the years, potentially generating genetic variants with altered virulence. In this study, the virulence of four SCHU S4 stocks (NR-10492, NR-28534, NR-643 from BEI Resources and FTS-635 from Battelle Memorial Institute), along with another virulent subsp. tularensis strain, MA00-2987, was assessed in parallel. In the Fischer 344 rat model of pneumonic tularemia, NR-643 and FTS-635 were found to be highly attenuated compared to NR-10492, NR-28534, and MA00-2987. In the NZW rabbit model of pneumonic tularemia, NR-643 caused morbidity but not mortality even at a dose equivalent to 500x the LD50 for NR-10492. Genetic analyses revealed that NR-10492 and NR-28534 were identical to each other, and nearly identical to the reference SCHU S4 sequence. NR-643 and FTS-635 were identical to each other but were found to have nine regions of difference in the genomic sequence when compared to the published reference SCHU S4 sequence. Given the genetic differences and decreased virulence, NR-643/FTS-635 should be clearly designated as a separate SCHU S4 substrain and no longer utilized in efficacy studies to evaluate potential vaccines and therapeutics against tularemia.
