Results 1 - 10 of 10
1.
Metab Eng Commun ; 17: e00225, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37435441

ABSTRACT

The goal of this study is to develop a general strategy for bacterial engineering using an integrated synthetic biology and machine learning (ML) approach. This strategy was developed in the context of increasing L-threonine production in Escherichia coli ATCC 21277. A set of 16 genes was initially selected based on metabolic pathway relevance to threonine biosynthesis and used for combinatorial cloning to construct a set of 385 strains to generate training data (i.e., a range of L-threonine titers linked to each of the specific gene combinations). Hybrid (regression/classification) deep learning (DL) models were developed and used to predict additional gene combinations in subsequent rounds of combinatorial cloning for increased L-threonine production based on the training data. As a result, E. coli strains built after just three rounds of iterative combinatorial cloning and model prediction generated higher L-threonine titers (from 2.7 g/L to 8.4 g/L) than those of patented L-threonine strains being used as controls (4-5 g/L). Interesting combinations of genes in L-threonine production included deletions of the tdh, metL, dapA, and dhaM genes as well as overexpression of the pntAB, ppc, and aspC genes. Mechanistic analysis of the metabolic system constraints for the best performing constructs offers ways to improve the models by adjusting weights for specific gene combinations. Graph theory analysis of pairwise gene modifications and corresponding levels of L-threonine production also suggests additional rules that can be incorporated into future ML models.

2.
BMC Bioinformatics ; 22(1): 252, 2021 May 17.
Article in English | MEDLINE | ID: mdl-34001007

ABSTRACT

BACKGROUND: Motivated by the size and availability of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating drug response data, a common question is whether the generalization performance of existing prediction models can be further improved with more training data. METHODS: We utilize empirical learning curves for evaluating and comparing the data scaling properties of two neural networks (NNs) and two gradient boosting decision tree (GBDT) models trained on four cell line drug screening datasets. The learning curves are accurately fitted to a power law model, providing a framework for assessing the data scaling behavior of these models. RESULTS: The curves demonstrate that no single model dominates in terms of prediction performance across all datasets and training sizes, thus suggesting that the actual shape of these curves depends on the unique pair of an ML model and a dataset. The multi-input NN (mNN), in which gene expressions of cancer cells and molecular drug descriptors are input into separate subnetworks, outperforms a single-input NN (sNN), where the cell and drug features are concatenated for the input layer. In contrast, a GBDT with hyperparameter tuning exhibits superior performance as compared with both NNs at the lower range of training set sizes for two of the tested datasets, whereas the mNN consistently performs better at the higher range of training sizes. Moreover, the trajectory of the curves suggests that increasing the sample size is expected to further improve prediction scores of both NNs. These observations demonstrate the benefit of using learning curves to evaluate prediction models, providing a broader perspective on the overall data scaling characteristics. 
CONCLUSIONS: A fitted power law learning curve provides a forward-looking metric for analyzing prediction performance and can serve as a co-design tool to guide experimental biologists and computational scientists in the design of future experiments in prospective research studies.


Subject(s)
Neoplasms, Pharmaceutical Preparations, Cell Line, Learning Curve, Machine Learning, Neoplasms/drug therapy, Neoplasms/genetics, Prospective Studies
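
The power-law learning curve described in the abstract above can be illustrated with a minimal sketch. It assumes the simplest two-parameter form, error ≈ a · m^b, fitted by ordinary least squares in log-log space; the abstract does not state the exact functional form or fitting procedure the authors used, and the training sizes and error values below are made-up placeholders, not results from the paper.

```python
import math


def fit_power_law(sizes, errors):
    """Fit errors ≈ a * sizes**b by linear least squares on log-transformed data."""
    xs = [math.log(m) for m in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)   # intercept in log-log space = log(a)
    return a, b


def predict_error(a, b, size):
    """Extrapolate the fitted curve to a new training set size."""
    return a * size ** b


# Hypothetical learning-curve points (training set size, validation error).
train_sizes = [1000, 2000, 4000, 8000, 16000]
val_errors = [0.200, 0.170, 0.145, 0.123, 0.105]

a, b = fit_power_law(train_sizes, val_errors)
print(f"fitted exponent b = {b:.3f}")
print(f"extrapolated error at 32000 samples = {predict_error(a, b, 32000):.3f}")
```

A negative exponent means the error is still falling as data are added, which is the kind of forward-looking signal the abstract refers to. A form with an additional constant term for irreducible error would need a nonlinear fit instead.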
3.
Curr Opin Biotechnol ; 17(5): 448-56, 2006 Oct.
Article in English | MEDLINE | ID: mdl-16978855

ABSTRACT

Within the past five years genome-scale gene essentiality data sets have been published for ten diverse bacterial species. These data are a rich source of information about cellular networks that we are only beginning to explore. The analysis of these data, very heterogeneous in nature, is a challenging task. Even the definition of 'essential genes' in various genome-scale studies varies from genes 'absolutely required for survival' to those 'strongly contributing to fitness' and robust competitive growth. A comparative analysis of gene essentiality across multiple organisms based on projection of experimentally observed essential genes to functional roles in a collection of metabolic pathways and subsystems is emerging as a powerful tool of systems biology.


Subject(s)
Essential Genes/genetics, Metabolic Networks and Pathways/genetics, Systems Biology/methods, Computational Biology/methods, Genetic Databases, Bacterial Genome/genetics, Biological Models
4.
Nucleic Acids Res ; 33(17): 5691-702, 2005.
Article in English | MEDLINE | ID: mdl-16214803

ABSTRACT

The release of the 1000th complete microbial genome will occur in the next two to three years. In anticipation of this milestone, the Fellowship for Interpretation of Genomes (FIG) launched the Project to Annotate 1000 Genomes. The project is built around the principle that the key to improved accuracy in high-throughput annotation technology is to have experts annotate single subsystems over the complete collection of genomes, rather than having an annotation expert attempt to annotate all of the genes in a single genome. Using the subsystems approach, all of the genes implementing the subsystem are analyzed by an expert in that subsystem. An annotation environment was created where populated subsystems are curated and projected to new genomes. A portable notion of a populated subsystem was defined, and tools developed for exchanging and curating these objects. Tools were also developed to resolve conflicts between populated subsystems. The SEED is the first annotation environment that supports this model of annotation. Here, we describe the subsystem approach, and offer the first release of our growing library of populated subsystems. The initial release of data includes 180,177 distinct proteins with 2133 distinct functional roles. These data come from 173 subsystems and 383 different organisms.


Subject(s)
Archaeal Genome, Bacterial Genome, Genomics/methods, Software, Acyl Coenzyme A/metabolism, Coenzyme A/biosynthesis, Computational Biology, Internet, Leucine/metabolism, Ribosomal Proteins/classification, Terminology as Topic, Controlled Vocabulary
5.
FEMS Microbiol Lett ; 250(2): 175-84, 2005 Sep 15.
Article in English | MEDLINE | ID: mdl-16099605

ABSTRACT

Genome features of the Bacillus cereus group genomes (representative strains of Bacillus cereus, Bacillus anthracis and Bacillus thuringiensis subsp. israelensis) were analyzed and compared with the Bacillus subtilis genome. A core set of 1381 protein families among the four Bacillus genomes, with an additional set of 933 families common to the B. cereus group, was identified. Differences in signal transduction pathways, membrane transporters, cell surface structures, cell wall, and S-layer proteins, suggesting differences in phenotype, were identified. The B. cereus group has signal transduction systems including a tyrosine kinase related to two-component system histidine kinases from B. subtilis. A model for regulation of the stress-responsive sigma factor sigmaB in the B. cereus group, different from the well-studied regulation in B. subtilis, has been proposed. Despite a high degree of chromosomal synteny among these genomes, significant differences in cell wall and spore coat proteins that contribute to survival and adaptation in specific hosts have been identified.


Subject(s)
Bacillus anthracis/genetics, Bacillus cereus/genetics, Bacillus subtilis/genetics, Bacillus thuringiensis/genetics, Bacterial Genome, Bacterial Proteins/genetics, Cell Wall/genetics, Genomics, Membrane Glycoproteins/genetics, Membrane Proteins/genetics, Membrane Transport Proteins/genetics, Signal Transduction/genetics, Synteny
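
The distinction in the abstract above between a core set of protein families shared by all four genomes and an additional set common only to the B. cereus group comes down to set intersections. The sketch below assumes each genome is already reduced to a set of family identifiers; the function name, genome labels and family IDs are hypothetical placeholders, and the abstract does not describe how the authors actually clustered proteins into families.

```python
def core_and_group_families(families_by_genome, group):
    """Return (core, group_only) sets of protein family identifiers.

    core       : families present in every genome
    group_only : families present in every genome of `group` but not in the core
    """
    core = set.intersection(*families_by_genome.values())
    group_shared = set.intersection(*(families_by_genome[g] for g in group))
    return core, group_shared - core


# Toy input: family IDs are arbitrary placeholders, not real data.
families = {
    "cereus": {"F1", "F2", "F3"},
    "anthracis": {"F1", "F2", "F4"},
    "thuringiensis": {"F1", "F2", "F3", "F4"},
    "subtilis": {"F1", "F5"},
}

core, group_only = core_and_group_families(
    families, group=["cereus", "anthracis", "thuringiensis"]
)
print("core families:", sorted(core))
print("group-only families:", sorted(group_only))
```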
6.
Nat Biotechnol ; 22(12): 1554-8, 2004 Dec.
Article in English | MEDLINE | ID: mdl-15543133

ABSTRACT

The lactic acid bacterium Streptococcus thermophilus is widely used for the manufacture of yogurt and cheese. This dairy species of major economic importance is phylogenetically close to pathogenic streptococci, raising the possibility that it has a potential for virulence. Here we report the genome sequences of two yogurt strains of S. thermophilus. We found a striking level of gene decay (10% pseudogenes) in both microorganisms. Many genes involved in carbon utilization are nonfunctional, in line with the paucity of carbon sources in milk. Notably, most streptococcal virulence-related genes that are not involved in basic cellular processes are either inactivated or absent in the dairy streptococcus. Adaptation to the constant milk environment appears to have resulted in the stabilization of the genome structure. We conclude that S. thermophilus has evolved mainly through loss-of-function events that remarkably mirror the environment of the dairy niche resulting in a severely diminished pathogenic potential.


Subject(s)
Bacterial Proteins/genetics, Chromosome Mapping/methods, Molecular Evolution, Genomic Instability/genetics, Streptococcal Infections/genetics, Streptococcus thermophilus/genetics, Virulence Factors/genetics, Yogurt/microbiology, Base Sequence, Conserved Sequence, Bacterial Genome, Molecular Sequence Data, DNA Sequence Analysis, Nucleic Acid Sequence Homology, Species Specificity, Streptococcus thermophilus/classification, Streptococcus thermophilus/pathogenicity
7.
Nature ; 423(6935): 87-91, 2003 May 01.
Article in English | MEDLINE | ID: mdl-12721630

ABSTRACT

Bacillus cereus is an opportunistic pathogen causing food poisoning manifested by diarrhoeal or emetic syndromes. It is closely related to the animal and human pathogen Bacillus anthracis and the insect pathogen Bacillus thuringiensis, the former being used as a biological weapon and the latter as a pesticide. B. anthracis and B. thuringiensis are readily distinguished from B. cereus by the presence of plasmid-borne specific toxins (B. anthracis and B. thuringiensis) and capsule (B. anthracis). But phylogenetic studies based on the analysis of chromosomal genes bring controversial results, and it is unclear whether B. cereus, B. anthracis and B. thuringiensis are varieties of the same species or different species. Here we report the sequencing and analysis of the type strain B. cereus ATCC 14579. The complete genome sequence of B. cereus ATCC 14579 together with the gapped genome of B. anthracis A2012 enables us to perform comparative analysis, and hence to identify the genes that are conserved between B. cereus and B. anthracis, and the genes that are unique for each species. We use the former to clarify the phylogeny of the cereus group, and the latter to determine plasmid-independent species-specific markers.


Subject(s)
Bacillus anthracis/genetics, Bacillus cereus/genetics, Bacterial Genome, Base Sequence, Conserved Sequence, Bacterial Genes/genetics, Molecular Sequence Data, Phylogeny, Plasmids/genetics, DNA Sequence Analysis, Species Specificity
8.
Nucleic Acids Res ; 31(1): 164-71, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12519973

ABSTRACT

The ERGO (http://ergo.integratedgenomics.com/ERGO/) genome analysis and discovery suite is an integration of biological data from genomics, biochemistry, high-throughput expression profiling, genetics and peer-reviewed journals to achieve a comprehensive analysis of genes and genomes. Far beyond any conventional systems that facilitate functional assignments, ERGO combines pattern-based analysis with comparative genomics by visualizing genes within the context of regulation, expression profiling, phylogenetic clusters, fusion events, networked cellular pathways and chromosomal neighborhoods of other functionally related genes. The result of this multifaceted approach is to provide an extensively curated database of the largest available integration of genomes, with a vast collection of reconstructed cellular pathways spanning all domains of life. Although access to ERGO is provided only under subscription, it is already widely used by the academic community. The current version of the system integrates 500 genomes from all domains of life in various levels of completion, 403 of which are available for subscription.


Subject(s)
Genetic Databases, Genome, Genomics, Animals, Computational Biology, Gene Expression Profiling, Metabolism, Proteins/physiology
9.
J Bacteriol ; 184(16): 4555-72, 2002 Aug.
Article in English | MEDLINE | ID: mdl-12142426

ABSTRACT

Novel drug targets are required in order to design new defenses against antibiotic-resistant pathogens. Comparative genomics provides new opportunities for finding optimal targets among previously unexplored cellular functions, based on an understanding of related biological processes in bacterial pathogens and their hosts. We describe an integrated approach to identification and prioritization of broad-spectrum drug targets. Our strategy is based on genetic footprinting in Escherichia coli followed by metabolic context analysis of essential gene orthologs in various species. Genes required for viability of E. coli in rich medium were identified on a whole-genome scale using the genetic footprinting technique. Potential target pathways were deduced from these data and compared with a panel of representative bacterial pathogens by using metabolic reconstructions from genomic data. Conserved and indispensable functions revealed by this analysis potentially represent broad-spectrum antibacterial targets. Further target prioritization involves comparison of the corresponding pathways and individual functions between pathogens and the human host. The most promising targets are validated by direct knockouts in model pathogens. The efficacy of this approach is illustrated using examples from metabolism of adenylate cofactors NAD(P), coenzyme A, and flavin adenine dinucleotide. Several drug targets within these pathways, including three distantly related adenylyltransferases (orthologs of the E. coli genes nadD, coaD, and ribF), are discussed in detail.


Subject(s)
Coenzyme A/biosynthesis, Escherichia coli/metabolism, Flavin-Adenine Dinucleotide/biosynthesis, NADP/biosynthesis, Anti-Bacterial Agents, DNA Footprinting, DNA Transposable Elements, Drug Design, Bacterial Drug Resistance, Escherichia coli/drug effects, Escherichia coli/genetics, Flavin Mononucleotide/biosynthesis, Bacterial Genome, Insertional Mutagenesis, Nicotinamide-Nucleotide Adenylyltransferase/metabolism, Phosphotransferases (Alcohol Group Acceptor)/genetics, Substrate Specificity
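
The prioritization logic in the abstract above (essential in E. coli, conserved across pathogens, then compared against the human host) can be sketched as a simple filter. This is a rough illustration only: it reduces the host comparison to a yes/no check for a close human counterpart, whereas the abstract describes a comparison of pathways and individual functions. The function name, gene labels, pathogen labels and the `min_fraction` threshold are all hypothetical.

```python
def prioritize_targets(essential, pathogen_orthologs, human_orthologs, min_fraction=1.0):
    """Rank essential genes by conservation across pathogens.

    essential          : set of genes essential in the model organism
    pathogen_orthologs : dict mapping pathogen name -> set of genes with an ortholog there
    human_orthologs    : set of genes with a close human counterpart (excluded)
    min_fraction       : minimum fraction of pathogens that must carry the ortholog
    """
    candidates = []
    for gene in essential:
        if gene in human_orthologs:
            continue  # crude stand-in for the pathogen-versus-host comparison
        hits = sum(1 for genes in pathogen_orthologs.values() if gene in genes)
        fraction = hits / len(pathogen_orthologs)
        if fraction >= min_fraction:
            candidates.append((gene, fraction))
    # Most broadly conserved first; ties broken alphabetically for stable output.
    return sorted(candidates, key=lambda item: (-item[1], item[0]))


# Toy input with placeholder names, not data from the study.
essential_genes = {"geneA", "geneB", "geneC"}
pathogens = {
    "pathogen1": {"geneA", "geneB"},
    "pathogen2": {"geneA", "geneC"},
}
human = {"geneB"}

print(prioritize_targets(essential_genes, pathogens, human))
print(prioritize_targets(essential_genes, pathogens, human, min_fraction=0.5))
```

Lowering `min_fraction` trades breadth of spectrum for a longer candidate list, which would then go to experimental validation such as the direct knockouts mentioned in the abstract.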
10.
J Bacteriol ; 184(7): 2005-18, 2002 Apr.
Article in English | MEDLINE | ID: mdl-11889109

ABSTRACT

We present a complete DNA sequence and metabolic analysis of the dominant oral bacterium Fusobacterium nucleatum. Although not considered a major dental pathogen on its own, this anaerobe facilitates the aggregation and establishment of several other species including the dental pathogens Porphyromonas gingivalis and Bacteroides forsythus. The F. nucleatum strain ATCC 25586 genome was assembled from shotgun sequences and analyzed using the ERGO bioinformatics suite (http://www.integratedgenomics.com). The genome contains 2.17 Mb encoding 2,067 open reading frames, organized on a single circular chromosome with 27% GC content. Despite its taxonomic position among the gram-negative bacteria, several features of its core metabolism are similar to that of gram-positive Clostridium spp., Enterococcus spp., and Lactococcus spp. The genome analysis has revealed several key aspects of the pathways of organic acid, amino acid, carbohydrate, and lipid metabolism. Nine very-high-molecular-weight outer membrane proteins are predicted from the sequence, none of which has been reported in the literature. More than 137 transporters for the uptake of a variety of substrates such as peptides, sugars, metal ions, and cofactors have been identified. Biosynthetic pathways exist for only three amino acids: glutamate, aspartate, and asparagine. The remaining amino acids are imported as such or as di- or oligopeptides that are subsequently degraded in the cytoplasm. A principal source of energy appears to be the fermentation of glutamate to butyrate. Additionally, desulfuration of cysteine and methionine yields ammonia, H(2)S, methyl mercaptan, and butyrate, which are capable of arresting fibroblast growth, thus preventing wound healing and aiding penetration of the gingival epithelium. The metabolic capabilities of F. nucleatum revealed by its genome are therefore consistent with its specialized niche in the mouth.


Subject(s)
Fusobacterium nucleatum/genetics, Bacterial Genome, Protein Biosynthesis, Genetic Transcription, Amino Acids/metabolism, Bacterial Outer Membrane Proteins/metabolism, Biological Transport, Cell Division, Coenzymes/metabolism, DNA Repair, DNA Replication, DNA Transposable Elements, Bacterial DNA/analysis, Bacterial Drug Resistance, Fusobacterium nucleatum/metabolism, Lipid Metabolism, Lipopolysaccharides/metabolism, Insertional Mutagenesis, Nucleotides/metabolism, Protons, Signal Transduction/physiology, Virulence
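
The GC content quoted in the abstract above (27%) is a simple base-composition statistic. The sketch below shows one common way to compute it from a nucleotide string, counting only unambiguous A, C, G and T bases; the function name is an assumption for illustration, the example sequence is a made-up fragment, and nothing here reflects how the ERGO suite computes the value.

```python
def gc_content(sequence):
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / counted


# Made-up fragment for illustration, not part of any real genome.
fragment = "ATGCATAT"
print(f"GC content: {gc_content(fragment):.0%}")
```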