1.
J Ind Microbiol Biotechnol ; 50(1), 2023 Feb 17.
Article in English | MEDLINE | ID: mdl-37656881

ABSTRACT

Biomanufacturing could contribute as much as $30 trillion to the global economy by 2030. However, the success of the growing bioeconomy depends on our ability to manufacture high-performing strains in a time- and cost-effective manner. The Design-Build-Test-Learn (DBTL) framework has proven to be an effective strain engineering approach. Significant improvements have been made in genome engineering, genotyping, and phenotyping throughput over the last couple of decades that have greatly accelerated the DBTL cycles. However, to achieve a radical reduction in strain development time and cost, we need to look at the strain engineering process through a lens of optimizing the whole cycle, as opposed to simply increasing throughput at each stage. We propose an approach that integrates all 4 stages of the DBTL cycle and takes advantage of the advances in computational design, high-throughput genome engineering, and phenotyping methods, as well as machine learning tools for making predictions about strain scale-up performance. In this perspective, we discuss the challenges of industrial strain engineering, outline the best approaches to overcoming these challenges, and showcase examples of successful strain engineering projects for production of heterologous proteins, amino acids, and small molecules, as well as improving tolerance, fitness, and de-risking the scale-up of industrial strains.

3.
Nat Commun ; 14(1): 241, 2023 01 16.
Article in English | MEDLINE | ID: mdl-36646716

ABSTRACT

Deep mutational scanning is a powerful approach to investigate a wide variety of research questions including protein function and stability. Here, we perform deep mutational scanning on three essential E. coli proteins (FabZ, LpxC and MurA) involved in cell envelope synthesis using high-throughput CRISPR genome editing, and study the effect of the mutations in their original genomic context. We use more than 17,000 variants of the proteins to interrogate protein function and the importance of individual amino acids in supporting viability. Additionally, we exploit these libraries to study resistance development against antimicrobial compounds that target the selected proteins. Among the three proteins studied, MurA seems to be the superior antimicrobial target due to its low mutational flexibility, which decreases the chance of acquiring resistance-conferring mutations that simultaneously preserve MurA function. Additionally, we rank anti-LpxC lead compounds for further development, guided by the number of resistance-conferring mutations against each compound. Our results show that deep mutational scanning studies can be used to guide drug development, which we hope will contribute towards the development of novel antimicrobial therapies.


Subject(s)
Anti-Bacterial Agents , Escherichia coli Proteins , Anti-Bacterial Agents/pharmacology , Anti-Bacterial Agents/chemistry , Bacterial Proteins/metabolism , Escherichia coli/metabolism , Mutation , Escherichia coli Proteins/genetics , Escherichia coli Proteins/pharmacology
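The abstract above says lead compounds were ranked by the number of resistance-conferring mutations against each one. A minimal sketch of that kind of ranking follows. The compound names and mutation labels are made-up placeholders, not data from the study, and the ordering rule (fewer distinct mutations ranks first) is an assumption drawn from the abstract's reasoning about MurA.

```python
from collections import defaultdict


def rank_compounds(resistance_hits):
    """Rank compounds by how many distinct mutations confer resistance.

    resistance_hits: iterable of (compound, mutation) pairs.
    Returns (compound, count) tuples, fewest mutations first
    (assumed here to be the more favourable end of the ranking).
    """
    mutations = defaultdict(set)
    for compound, mutation in resistance_hits:
        mutations[compound].add(mutation)  # a set ignores repeated observations
    return sorted(
        ((c, len(m)) for c, m in mutations.items()),
        key=lambda pair: (pair[1], pair[0]),  # ties broken by name
    )


# Hypothetical placeholder data, for illustration only.
hits = [
    ("cmpd_A", "mut_1"),
    ("cmpd_A", "mut_2"),
    ("cmpd_A", "mut_1"),  # duplicate observation
    ("cmpd_B", "mut_1"),
]
ranking = rank_compounds(hits)
```

In practice such counts would probably need normalising by library coverage and selection stringency before compounds are compared; the sketch leaves that out.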
4.
Nucleic Acids Res ; 36(Database issue): D943-6, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17933772

ABSTRACT

The Generation Challenge Programme (GCP; www.generationcp.org) has developed an online resource documenting stress-responsive genes comparatively across plant species. This public resource is a compendium of protein families, phylogenetic trees, multiple sequence alignments (MSA) and associated experimental evidence. The central objective of this resource is to elucidate orthologous and paralogous relationships between plant genes that may be involved in response to environmental stress, mainly abiotic stresses such as water deficit ('drought'). The web-based graphical user interface (GUI) of the resource includes query and visualization tools that allow diverse searches and browsing of the underlying project database. The web interface can be accessed at http://dayhoff.generationcp.org.


Subject(s)
Crops, Agricultural/genetics , Databases, Genetic , Genes, Plant , Crops, Agricultural/metabolism , Dehydration , Environment , Gene Expression Profiling , Internet , Phylogeny , Plant Proteins/chemistry , Plant Proteins/classification , Sequence Alignment , User-Computer Interface
5.
PLoS Comput Biol ; 3(8): e160, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17708678

ABSTRACT

Function prediction by homology is widely used to provide preliminary functional annotations for genes for which experimental evidence of function is unavailable or limited. This approach has been shown to be prone to systematic error, including percolation of annotation errors through sequence databases. Phylogenomic analysis avoids these errors in function prediction but has been difficult to automate for high-throughput application. To address this limitation, we present a computationally efficient pipeline for phylogenomic classification of proteins. This pipeline uses the SCI-PHY (Subfamily Classification in Phylogenomics) algorithm for automatic subfamily identification, followed by subfamily hidden Markov model (HMM) construction. A simple and computationally efficient scoring scheme using family and subfamily HMMs enables classification of novel sequences to protein families and subfamilies. Sequences representing entirely novel subfamilies are differentiated from those that can be classified to subfamilies in the input training set using logistic regression. Subfamily HMM parameters are estimated using an information-sharing protocol, enabling subfamilies containing even a single sequence to benefit from conservation patterns defining the family as a whole or in related subfamilies. SCI-PHY subfamilies correspond closely to functional subtypes defined by experts and to conserved clades found by phylogenetic analysis. Extensive comparisons of subfamily and family HMM performances show that subfamily HMMs dramatically improve the separation between homologous and non-homologous proteins in sequence database searches. Subfamily HMMs also provide extremely high specificity of classification and can be used to predict entirely novel subtypes. The SCI-PHY Web server at http://phylogenomics.berkeley.edu/SCI-PHY/ allows users to upload a multiple sequence alignment for subfamily identification and subfamily HMM construction. Biologists wishing to provide their own subfamily definitions can do so. Source code is available on the Web page. The Berkeley Phylogenomics Group PhyloFacts resource contains pre-calculated subfamily predictions and subfamily HMMs for more than 40,000 protein families and domains at http://phylogenomics.berkeley.edu/phylofacts/.


Subject(s)
Algorithms , Artificial Intelligence , Pattern Recognition, Automated/methods , Proteins/chemistry , Proteins/classification , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence , Markov Chains , Molecular Sequence Data , Reproducibility of Results , Sensitivity and Specificity
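The abstract above describes scoring a sequence against family and subfamily HMMs, then using logistic regression to separate members of known subfamilies from sequences that represent novel ones. The sketch below shows only the shape of that decision and is not the SCI-PHY implementation: the scores are taken as already computed, and the function name, cutoff, features, and logistic weights are all hypothetical.

```python
import math


def classify(family_score, subfamily_scores, family_cutoff=20.0,
             bias=0.0, w_family=0.1, w_subfamily=-0.1):
    """Toy family/subfamily classification from precomputed model scores.

    family_score: score of the sequence against the family model.
    subfamily_scores: dict mapping subfamily name -> score.
    All numeric parameters are illustrative assumptions.
    Returns (label, probability that the sequence is a novel subfamily).
    """
    if family_score < family_cutoff:
        return "not in family", 0.0
    best_name, best_score = max(subfamily_scores.items(), key=lambda kv: kv[1])
    # Logistic model: a strong family score paired with a weak best
    # subfamily score pushes the probability of novelty up.
    z = bias + w_family * family_score + w_subfamily * best_score
    p_novel = 1.0 / (1.0 + math.exp(-z))
    label = "novel subfamily" if p_novel > 0.5 else best_name
    return label, p_novel


known = classify(120.0, {"A": 150.0, "B": 90.0})
novel = classify(120.0, {"A": 60.0, "B": 70.0})
outside = classify(5.0, {"A": 60.0, "B": 70.0})
```

With these made-up weights the first call is assigned to subfamily "A", the second is flagged as a novel subfamily, and the third falls below the family cutoff. A real pipeline would fit the weights on labelled training data.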
6.
Nucleic Acids Res ; 35(Web Server issue): W27-32, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17488835

ABSTRACT

Phylogenomic analysis addresses the limitations of function prediction based on annotation transfer, and has been shown to enable the highest accuracy in prediction of protein molecular function. The Berkeley Phylogenomics Group provides a series of web servers for phylogenomic analysis: classification of sequences to pre-computed families and subfamilies using the PhyloFacts Phylogenomic Encyclopedia, FlowerPower clustering of proteins sharing the same domain architecture, MUSCLE multiple sequence alignment, SATCHMO simultaneous alignment and tree construction and SCI-PHY subfamily identification. The PhyloBuilder web server provides an integrated phylogenomic pipeline starting with a user-supplied protein sequence, proceeding to homolog identification, multiple alignment, phylogenetic tree construction, subfamily identification and structure prediction. The Berkeley Phylogenomics Group resources are available at http://phylogenomics.berkeley.edu.


Subject(s)
Computational Biology/methods , Phylogeny , Algorithms , Animals , Computers , Databases, Genetic , Databases, Protein , Humans , Internet , Models, Genetic , Protein Conformation , Sequence Alignment , Sequence Analysis, Protein , Software , User-Computer Interface
7.
BMC Evol Biol ; 7 Suppl 1: S12, 2007 Feb 08.
Article in English | MEDLINE | ID: mdl-17288570

ABSTRACT

BACKGROUND: Function prediction by transfer of annotation from the top database hit in a homology search has been shown to be prone to systematic error. Phylogenomic analysis reduces these errors by inferring protein function within the evolutionary context of the entire family. However, accuracy of function prediction for multi-domain proteins depends on all members having the same overall domain structure. By contrast, most common homolog detection methods are optimized for retrieving local homologs, and do not address this requirement. RESULTS: We present FlowerPower, a novel clustering algorithm designed for the identification of global homologs as a precursor to structural phylogenomic analysis. Similar to methods such as PSI-BLAST, FlowerPower employs an iterative approach to clustering sequences. However, rather than using a single HMM or profile to expand the cluster, FlowerPower identifies subfamilies using the SCI-PHY algorithm and then selects and aligns new homologs using subfamily hidden Markov models. FlowerPower is shown to outperform BLAST, PSI-BLAST and the UCSC SAM-Target 2K methods at discrimination between proteins in the same domain architecture class and those having different overall domain structures. CONCLUSION: Structural phylogenomic analysis enables biologists to avoid the systematic errors associated with annotation transfer; clustering sequences based on sharing the same domain architecture is a critical first step in this process. FlowerPower is shown to consistently identify homologous sequences having the same domain architecture as the query. AVAILABILITY: FlowerPower is available as a webserver at http://phylogenomics.berkeley.edu/flowerpower/.


Subject(s)
Algorithms , Phylogeny , Protein Structure, Tertiary , Proteins/physiology , Sequence Analysis, Protein/methods , Animals , Cluster Analysis , Databases, Genetic , Humans , Proteins/classification , Research Design , Sequence Alignment
8.
Genome Biol ; 7(9): R83, 2006.
Article in English | MEDLINE | ID: mdl-16973001

ABSTRACT

The Berkeley Phylogenomics Group presents PhyloFacts, a structural phylogenomic encyclopedia containing almost 10,000 'books' for protein families and domains, with pre-calculated structural, functional and evolutionary analyses. PhyloFacts enables biologists to avoid the systematic errors associated with function prediction by homology through the integration of a variety of experimental data and bioinformatics methods in an evolutionary framework. Users can submit sequences for classification to families and functional subfamilies. PhyloFacts is available as a worldwide web resource from http://phylogenomics.berkeley.edu/phylofacts.


Subject(s)
Databases, Protein , Proteins , Animals , Evolution, Molecular , Humans , Phylogeny , Protein Structure, Tertiary , Proteins/chemistry , Proteins/classification , Proteins/genetics , Structure-Activity Relationship
9.
Plant Physiol ; 138(2): 611-23, 2005 Jun.
Article in English | MEDLINE | ID: mdl-15955925

ABSTRACT

The tomato (Lycopersicon esculentum) Cf-9 resistance gene encodes the first characterized member of the plant receptor-like protein (RLP) family. Other RLPs such as CLAVATA2 and TOO MANY MOUTHS are known to regulate development. The domain structure of RLPs consists of extracellular leucine-rich repeats, a transmembrane helix, and a short cytoplasmic region. Here, we identify 90 RLPs in rice (Oryza sativa) and compare them with functionally characterized RLPs from different plant species and with 56 Arabidopsis (Arabidopsis thaliana) RLPs, including the downy mildew resistance protein RPP27. Many RLPs cluster into four distinct superclades, three of which include RLPs known to be involved in plant defense. Sequence comparisons reveal diagnostic amino acid residues that may specify different molecular functions in different RLP subtypes. This analysis of rice RLPs thus identified at least 73 candidate resistance genes and four genes potentially involved in development. Due to the synteny between rice and other Gramineae, this analysis should provide valuable tools for experimental studies in rice and other cereals.


Subject(s)
Arabidopsis/genetics , Oryza/genetics , Plant Proteins/genetics , Receptors, Cell Surface/genetics , Amino Acid Sequence , Arabidopsis/chemistry , Arabidopsis Proteins/chemistry , Arabidopsis Proteins/genetics , Conserved Sequence , Gene Expression Regulation, Plant , Genes, Plant , Genome, Plant , Molecular Sequence Data , Oryza/chemistry , Phylogeny , Plant Proteins/chemistry , Receptors, Cell Surface/chemistry , Sequence Alignment , Sequence Homology, Amino Acid
10.
Mol Cell Proteomics ; 4(8): 1072-84, 2005 Aug.
Article in English | MEDLINE | ID: mdl-15901827

ABSTRACT

We report an extensive proteome analysis of rice etioplasts, which were highly purified from dark-grown leaves by a novel protocol using Nycodenz density gradient centrifugation. Comparative protein profiling of different cell compartments from leaf tissue demonstrated the purity of the etioplast preparation by the absence of diagnostic marker proteins of other cell compartments. Systematic analysis of the etioplast proteome identified 240 unique proteins that provide new insights into heterotrophic plant metabolism and control of gene expression. They include several new proteins that were not previously known to localize to plastids. The etioplast proteins were compared with proteomes from Arabidopsis chloroplasts and plastids from tobacco Bright Yellow 2 cells. Together with computational structure analyses of proteins without functional annotations, this comparative proteome analysis revealed novel etioplast-specific proteins. These include components of the plastid gene expression machinery such as two RNA helicases, an RNase II-like hydrolytic exonuclease, and a site 2 protease-like metalloprotease, all of which were not known previously to localize to the plastid and are indicative of so far unknown regulatory mechanisms of plastid gene expression. All etioplast protein identifications and related data were integrated into a database that is freely available upon request.


Subject(s)
Gene Expression Regulation, Plant , Oryza/chemistry , Plant Proteins/metabolism , Plastids/chemistry , Amino Acid Sequence , Arabidopsis/chemistry , Arabidopsis/genetics , Chloroplasts , Computational Biology , Electrophoresis, Gel, Two-Dimensional , Exonucleases/metabolism , Mass Spectrometry , Metalloproteases/metabolism , Molecular Sequence Data , Plant Proteins/analysis , Plant Proteins/chemistry , Plant Proteins/classification , Proteome/analysis , Sequence Homology, Amino Acid , Signal Transduction
11.
Pac Symp Biocomput ; : 322-33, 2005.
Article in English | MEDLINE | ID: mdl-15759638

ABSTRACT

The limitations of homology-based methods for prediction of protein molecular function are well known; differences in domain structure, gene duplication events and errors in existing database annotations complicate this process. In this paper we present a method to detect and model protein subfamilies, which can be used in high-throughput, genome-scale phylogenomic inference of protein function. We demonstrate the method on a set of nine PFAM families, and show that subfamily HMMs provide greater separation of homologs and non-homologs than is possible with a single HMM for each family. We also show that subfamily HMMs can be used for functional classification with a very low expected error rate. The BETE method for identifying functional subfamilies is illustrated on a set of serotonin receptors.


Subject(s)
Genomics , Animals , Bayes Theorem , Biological Evolution , Databases, Nucleic Acid , Databases, Protein , Enzymes/genetics , Gene Duplication , Markov Chains , Models, Genetic , Phylogeny , Proteins/chemistry , Proteins/genetics , Sequence Alignment
12.
Proc Natl Acad Sci U S A ; 102(6): 2087-92, 2005 Feb 08.
Article in English | MEDLINE | ID: mdl-15684089

ABSTRACT

During infection of Arabidopsis thaliana, the bacterium Pseudomonas syringae pv tomato delivers the effector protein AvrRpt2 into the plant cell cytosol. Within the plant cell, AvrRpt2 undergoes N-terminal processing and causes elimination of Arabidopsis RIN4. Previous work established that AvrRpt2 is a putative cysteine protease, and AvrRpt2 processing and RIN4 elimination require an intact predicted catalytic triad in AvrRpt2. In this work, proteolytic events that depend on AvrRpt2 activity were characterized. The amino acid sequence surrounding the processing site of AvrRpt2 and two related sequences from RIN4 triggered AvrRpt2-dependent proteolytic cleavage of a synthetic substrate, demonstrating that these sequences are cleavage recognition sites for AvrRpt2 activity. Processing-deficient AvrRpt2 mutants were identified and shown to retain their ability to eliminate wild-type RIN4. Single amino acid substitutions were made in each of the two RIN4 cleavage sites, and mutation of both sites resulted in cleavage-resistant RIN4. Growth of Pseudomonas expressing AvrRpt2 was significantly higher than catalytically inactive mutants on Arabidopsis rin4/rps2 mutant plants, suggesting there are additional protein targets of AvrRpt2 that account for the virulence activity of this effector. Bioinformatics analysis identified putative Arabidopsis proteins containing sequences similar to the proteolytic cleavage sites conserved in AvrRpt2 and RIN4. Several of these proteins were eliminated in an AvrRpt2-dependent manner in a transient in planta expression system. These results identify amino acids important for AvrRpt2 substrate recognition and cleavage as well as demonstrate AvrRpt2 protease activity eliminates multiple Arabidopsis proteins in a transient expression system.


Subject(s)
Bacterial Proteins/metabolism , Pseudomonas syringae/metabolism , Amino Acid Sequence , Arabidopsis/microbiology , Arabidopsis Proteins/metabolism , Bacterial Proteins/genetics , Carrier Proteins/metabolism , Intracellular Signaling Peptides and Proteins , Molecular Sequence Data , Sequence Alignment
13.
Proc Natl Acad Sci U S A ; 102(5): 1685-90, 2005 Feb 01.
Article in English | MEDLINE | ID: mdl-15668378

ABSTRACT

The Agrobacterium T-DNA transporter belongs to a growing class of evolutionarily conserved transporters, called type IV secretion systems (T4SSs). VirB4, 789 aa, is the largest T4SS component, providing a rich source of possible structural domains. Here, we use a variety of bioinformatics methods to predict that the C-terminal domain of VirB4 (including the Walker A and B nucleotide-binding motifs) is related by divergent evolution to the cytoplasmic domain of TrwB, the coupling protein required for conjugative transfer of plasmid R388 from Escherichia coli. This prediction is supported by detailed sequence and structure analyses showing conservation of functionally and structurally important residues between VirB4 and TrwB. The availability of a solved crystal structure for TrwB enables the construction of a comparative model for VirB4 and the prediction that, like TrwB, VirB4 forms a hexamer. These results lead to a model in which VirB4 acts as a docking site at the entrance of the T4SS channel and acts in concert with VirD4 and VirB11 to transport substrates (T-strand linked to VirD2 or proteins such as VirE2, VirE3, or VirF) through the T4SS.


Subject(s)
Bacterial Proteins/chemistry , Organic Anion Transporters/chemistry , Organic Anion Transporters/metabolism , Rhizobium/physiology , Amino Acid Sequence , Bacterial Proteins/metabolism , Macromolecular Substances/chemistry , Macromolecular Substances/metabolism , Models, Molecular , Molecular Sequence Data , Peptide Fragments/chemistry , Peptide Fragments/metabolism , Protein Structure, Secondary , Protein Transport , Sequence Alignment , Sequence Homology, Amino Acid
14.
Curr Protoc Bioinformatics ; Chapter 6: Unit 6.9, 2005 Oct.
Article in English | MEDLINE | ID: mdl-18428751

ABSTRACT

With the explosion in sequence data, accurate prediction of protein function has become a vital task in prioritizing experimental investigation. While computationally efficient methods for homology-based function prediction have been developed to make this approach feasible in high-throughput mode, it is not without its dangers. Biological processes such as gene duplication, domain shuffling, and speciation produce families of related genes whose gene products can have vastly different molecular functions. Standard sequence-comparison approaches may not discriminate effectively among these candidate homologs, leading to errors in database annotations. In this unit, we describe phylogenomic approaches to reduce the error rate in function prediction. Phylogenomic inference of protein molecular function consists of a series of subtasks. Once a cluster of homologs is identified, a multiple sequence alignment and phylogenetic tree are constructed. Finally, the phylogenetic tree is overlaid with experimental data culled for the members of the family, and changes in biochemical function can be traced along the evolutionary tree.


Subject(s)
Algorithms , Evolution, Molecular , Models, Genetic , Proteins/genetics , Proteins/metabolism , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Base Sequence , Computer Simulation , DNA Mutational Analysis/methods , Molecular Sequence Data , Phylogeny , Sequence Homology, Nucleic Acid
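The last step in the abstract above, overlaying experimental annotations on a phylogenetic tree and reading function off the evolutionary context, can be illustrated with a toy example. This is a simplified sketch rather than the protocol from the unit: the tree is a hand-written nested tuple, not the output of tree-building software, and the sequence names, annotations, and the "smallest annotated clade" rule are all assumptions made for illustration.

```python
def leaves(tree):
    """Return the leaf names of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return [tree]
    names = []
    for child in tree:
        names.extend(leaves(child))
    return names


def infer_function(tree, query, annotations):
    """Collect the annotations found in the smallest clade that contains
    the query and at least one experimentally annotated sequence.

    A single returned function suggests a transferable annotation;
    several functions mean the placement is ambiguous.
    """
    if query not in leaves(tree):
        raise ValueError(f"{query!r} is not a leaf of the tree")
    best = set()
    node = tree
    while not isinstance(node, str):
        found = {annotations[n] for n in leaves(node) if n in annotations}
        if found:
            best = found  # deeper (smaller) clades overwrite shallower ones
        # Descend into the child clade that contains the query.
        node = next(child for child in node if query in leaves(child))
    return best


# Hypothetical family: "q" and "x" are uncharacterised sequences.
tree = ((("q", "a1"), "a2"), ("b1", ("x", "b2")))
annotations = {"a1": "kinase", "a2": "kinase", "b1": "phosphatase"}

q_call = infer_function(tree, "q", annotations)
x_call = infer_function(tree, "x", annotations)
```

Here `q` sits in a clade whose annotated members all share one function, so the call is unambiguous. A query whose nearest annotated clade mixes functions would return more than one label and would need manual inspection.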
15.
Curr Protoc Protein Sci ; Chapter 2: 2.11.1-2.11.24, 2005 Sep.
Article in English | MEDLINE | ID: mdl-18429280

ABSTRACT

Prediction of molecular function of proteins has become an important task in the genomics era. A wide variety of sequence analysis tools are available to biologists for this task. We have selected one or two primary protocols for tasks such as domain detection, subcellular localization, and motif detection. We also present a strategy for integration of results from different protocols. All the resources needed for these protocols are accessible via publicly available Web servers and databases and require little or no computational expertise.


Subject(s)
Proteins/chemistry , Amino Acid Sequence , Databases, Protein , Humans , Internet , Molecular Sequence Data , Protein Structure, Secondary , Sequence Alignment , Sequence Homology, Amino Acid , Subcellular Fractions/chemistry
16.
Curr Protoc Mol Biol ; Chapter 19: Unit 19.5, 2005 May.
Article in English | MEDLINE | ID: mdl-18265355

ABSTRACT

Prediction of molecular function of proteins has become an important task in the genomics era. A wide variety of sequence analysis tools are available to biologists for this task. We have selected one or two primary protocols for tasks such as domain detection, subcellular localization, and motif detection. We also present a strategy for integration of results from different protocols. All the resources needed for these protocols are accessible via publicly available Web servers and databases and require little or no computational expertise.


Subject(s)
Database Management Systems , Sequence Analysis, Protein/methods , Amino Acid Motifs , Information Storage and Retrieval , Internet , Protein Structure, Tertiary , Software , User-Computer Interface