Results 1 - 13 of 13
1.
Bioinformatics ; 35(21): 4402-4404, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31086982

ABSTRACT

SUMMARY: To address the need for improved phage annotation tools that scale, we created an automated throughput annotation pipeline: multiple-genome Phage Annotation Toolkit and Evaluator (multiPhATE). multiPhATE is a throughput pipeline driver that invokes an annotation pipeline (PhATE) across a user-specified set of phage genomes. This tool incorporates a de novo phage gene calling algorithm and assigns putative functions to gene calls using protein-, virus- and phage-centric databases. multiPhATE's modular construction allows the user to implement all or any portion of the analyses by acquiring local instances of the desired databases and specifying the desired analyses in a configuration file. We demonstrate multiPhATE by annotating two newly sequenced Yersinia pestis phage genomes. Within multiPhATE, the PhATE processing pipeline can be readily implemented across multiple processors, making it adaptable for throughput sequencing projects. Software documentation assists the user in configuring the system. AVAILABILITY AND IMPLEMENTATION: multiPhATE was implemented in Python 3.7, and runs as a command-line code under Linux or Unix. multiPhATE is freely available under an open-source BSD3 license from https://github.com/carolzhou/multiPhATE. Instructions for acquiring the databases and third-party codes used by multiPhATE are included in the distribution README file. Users may report bugs by submitting to the GitHub issues page associated with the multiPhATE distribution. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Bacteriophages , Computational Biology , Algorithms , Genome , Software
2.
BMC Genomics ; 18(1): 334, 2017 04 28.
Article in English | MEDLINE | ID: mdl-28454561

ABSTRACT

BACKGROUND: Examination of complex biological systems has long been achieved through methodical investigation of the system's individual components. While informative, this strategy often leads to inappropriate conclusions about the system as a whole. With the advent of high-throughput "omic" technologies, however, researchers can now simultaneously analyze an entire system at the level of molecule (DNA, RNA, protein, metabolite) and process (transcription, translation, enzyme catalysis). This strategy reduces the likelihood of improper conclusions, provides a framework for elucidation of genotype-phenotype relationships, and brings finer resolution to comparative genomic experiments. Here, we apply a multi-omic approach to analyze the gene expression profiles of two closely related Pseudomonas aeruginosa strains grown in n-alkanes or glycerol. RESULTS: The environmental P. aeruginosa isolate ATCC 33988 consumed medium-length (C10-C16) n-alkanes more rapidly than the laboratory strain PAO1, despite high genome sequence identity (average nucleotide identity >99%). Our data show that ATCC 33988 induces a characteristic set of genes at the transcriptional, translational and post-translational levels during growth on alkanes, many of which differ from those expressed by PAO1. Of particular interest was the lack of expression from the rhl operon of the quorum sensing (QS) system, resulting in no measurable rhamnolipid production by ATCC 33988. Further examination showed that ATCC 33988 lacked the entire lasI/lasR arm of the QS response. Instead of promoting expression of QS genes, ATCC 33988 up-regulates a small subset of its genome, including operons responsible for specific alkaline proteases and sphingosine metabolism. CONCLUSION: This work represents the first time results from RNA-seq, microarray, ribosome footprinting, proteomics, and small molecule LC-MS experiments have been integrated to compare gene expression in bacteria. Together, these data provide insights as to why strain ATCC 33988 is better adapted for growth and survival on n-alkanes.


Subjects
Alkanes/pharmacology , Computational Biology/methods , Pseudomonas aeruginosa/drug effects , Gene Expression Profiling , Glycolipids/metabolism , Pseudomonas aeruginosa/cytology , Pseudomonas aeruginosa/genetics , Pseudomonas aeruginosa/metabolism , Quorum Sensing/drug effects
3.
BMC Bioinformatics ; 17: 43, 2016 Jan 20.
Article in English | MEDLINE | ID: mdl-26792120

ABSTRACT

BACKGROUND: Here we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein FASTA data sets, and (3) provide a web interface for convenient execution of the tools. RESULTS: In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resulting functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well-annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. CONCLUSIONS: PSAT is a meta-server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequence-based genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.


Subjects
Genome, Bacterial , Herbaspirillum/genetics , High-Throughput Nucleotide Sequencing/methods , Internet , Molecular Sequence Annotation/methods , Software , Computational Biology/methods , Computers , Water Microbiology
4.
BMC Bioinformatics ; 12: 226, 2011 Jun 02.
Article in English | MEDLINE | ID: mdl-21635786

ABSTRACT

BACKGROUND: Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory; still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could help overcome these difficulties by facilitating the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. RESULTS: Here we present StralSV (structure-alignment sequence variability), a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus, and we demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique, or that share structural similarity with proteins that would be considered distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local structural alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. CONCLUSIONS: StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position. StralSV is provided as a web service at http://proteinmodel.org/AS2TS/STRALSV/.
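
The residue-frequency idea in this abstract can be illustrated with a minimal Python sketch. It is not the StralSV implementation: the input format (a flat list of reference-position and aligned-residue pairs already extracted from local structure alignments) and the function names `residue_frequencies` and `consensus` are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def residue_frequencies(pairs):
    """Tally residue frequencies per reference position.

    `pairs` is a hypothetical input: an iterable of
    (reference_position, aligned_residue) tuples.
    """
    counts = defaultdict(Counter)
    for position, residue in pairs:
        counts[position][residue] += 1
    freqs = {}
    for position, counter in counts.items():
        total = sum(counter.values())
        freqs[position] = {res: n / total for res, n in counter.items()}
    return freqs


def consensus(freqs):
    """Return the most frequent residue at each reference position."""
    return {pos: max(f, key=f.get) for pos, f in freqs.items()}


# Example: position 10 is mostly Asp (D); position 11 is always Gly (G).
example_pairs = [(10, "D"), (10, "D"), (10, "E"), (11, "G")]
example_freqs = residue_frequencies(example_pairs)
example_consensus = consensus(example_freqs)
```

A reference residue that differs from the consensus at its position, or a position where one residue dominates, would be the kind of site the abstract describes as potentially important.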


Subjects
Algorithms , Poliovirus/enzymology , RNA-Dependent RNA Polymerase/chemistry , Structural Homology, Protein , Amino Acid Motifs , Amino Acid Sequence , DNA Primers/genetics , Models, Molecular , Molecular Sequence Data , Poliovirus/metabolism , RNA-Dependent RNA Polymerase/metabolism
5.
G3 (Bethesda) ; 11(5)2021 05 07.
Article in English | MEDLINE | ID: mdl-33734357

ABSTRACT

To address a need for improved tools for annotation and comparative genomics of bacteriophage genomes, we developed multiPhATE2. As an extension of multiPhATE, a functional annotation code released previously, multiPhATE2 performs gene finding using multiple algorithms, compares the results of the algorithms, performs functional annotation of coding sequences, and incorporates additional search algorithms and databases to extend the search space of the original code. MultiPhATE2 performs gene matching among sets of closely related bacteriophage genomes, and uses multiprocessing to speed computations. MultiPhATE2 can be re-started at multiple points within the workflow to allow the user to examine intermediate results and adjust the subsequent computations accordingly. In addition, multiPhATE2 accommodates custom gene calls and sequence databases, again adding flexibility. MultiPhATE2 was implemented in Python 3.7 and runs as a command-line code under Linux or Mac operating systems. Full documentation is provided as a README file and a Wiki website.


Subjects
Bacteriophages , Algorithms , Bacteriophages/genetics , Genome , Genomics , Molecular Sequence Annotation , Software
6.
Microorganisms ; 9(1)2021 Jan 08.
Article in English | MEDLINE | ID: mdl-33429904

ABSTRACT

One of the main steps in gene-finding in prokaryotes is determining which open reading frames encode a protein, and which occur by chance alone. There are many different methods to differentiate the two; the most prevalent approach is using shared homology with a database of known genes. This method presents many pitfalls, most notably the catch that you only find genes that you have seen before. The four most popular prokaryotic gene-prediction programs (GeneMark, Glimmer, Prodigal, PHANOTATE) all use a protein-coding training model to predict protein-coding genes, with the latter three allowing for the training model to be created ab initio from the input genome. Different methods are available for creating the training model, and to increase the accuracy of such tools, we present here GOODORFS, a method for identifying protein-coding genes within a set of all possible open reading frames (ORFs). Our workflow begins with taking the amino acid frequencies of each ORF, calculating an entropy density profile (EDP), using KMeans to cluster the EDPs, and then selecting the cluster with the lowest variation as the coding ORFs. To test the efficacy of our method, we ran GOODORFS on 14,179 annotated phage genomes, and compared our results to the initial training-set creation step of four other similar methods (Glimmer, MED2, PHANOTATE, Prodigal). We found that GOODORFS was the most accurate (0.94) and had the best F1-score (0.85), while Glimmer had the highest precision (0.92) and PHANOTATE had the highest recall (0.96).
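
The workflow described above (amino acid frequencies, entropy density profile, clustering, selection of the tightest cluster) can be sketched in Python as follows. This is an illustrative sketch, not the GOODORFS code. It assumes the EDP component for residue i is -(p_i log p_i)/H, where H is the Shannon entropy of the frequencies; it replaces a library KMeans with a tiny deterministic Lloyd iteration seeded from the first k points; and it measures "variation" as the mean squared distance to the cluster center. All function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def entropy_density_profile(protein):
    """Return a 20-element EDP for one translated ORF (assumed definition)."""
    counts = Counter(aa for aa in protein if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    freqs = [counts[aa] / total for aa in AMINO_ACIDS]
    terms = [-p * math.log(p) if p > 0 else 0.0 for p in freqs]
    entropy = sum(terms)
    if entropy == 0:  # only one residue type present; nothing to normalize
        return [0.0] * len(AMINO_ACIDS)
    return [t / entropy for t in terms]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50):
    """Minimal Lloyd's algorithm; centers start at the first k points."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def lowest_variation_cluster(points, labels, centers):
    """Index of the cluster with the smallest mean squared spread."""
    def spread(c):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            return float("inf")
        return sum(sq_dist(p, centers[c]) for p in members) / len(members)
    return min(range(len(centers)), key=spread)
```

In use, each candidate ORF would be translated, converted with `entropy_density_profile`, clustered, and the ORFs labeled with the index returned by `lowest_variation_cluster` would be taken as the putative coding set.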

7.
PLoS Comput Biol ; 5(6): e1000401, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19503843

ABSTRACT

Here we introduce a quantitative structure-driven computational domain-fusion method, which we used to predict the structures of proteins believed to be involved in regulation of the subtilin pathway in Bacillus subtilis, and used to predict a protein-protein complex formed by interaction between the proteins. Homology modeling of SpaK and SpaR yielded preliminary structural models based on a best template for SpaK comprising a dimer of a histidine kinase, and for SpaR a response regulator protein. Our LGA code was used to identify multi-domain proteins with structure homology to both modeled structures, yielding a set of domain-fusion templates then used to model a hypothetical SpaK/SpaR complex. The models were used to identify putative functional residues and residues at the protein-protein interface, and bioinformatics was used to compare functionally and structurally relevant residues in corresponding positions among proteins with structural homology to the templates. Models of the complex were evaluated in light of known properties of the functional residues within two-component systems involving His-Asp phosphorelays. Based on this analysis, a phosphotransferase complexed with a beryllofluoride was selected as the optimal template for modeling a SpaK/SpaR complex conformation. In vitro phosphorylation studies performed using wild type and site-directed SpaK mutant proteins validated the predictions derived from application of the structure-driven domain-fusion method: SpaK was phosphorylated in the presence of ³²P-ATP and the phosphate moiety was subsequently transferred to SpaR, supporting the hypothesis that SpaK and SpaR function as sensor and response regulator, respectively, in a two-component signal transduction system, and furthermore suggesting that the structure-driven domain-fusion approach correctly predicted a physical interaction between SpaK and SpaR. Our domain-fusion algorithm leverages quantitative structure information and provides a tool for generation of hypotheses regarding protein function, which can then be tested using empirical methods.


Subjects
Computational Biology/methods , DNA-Binding Proteins/chemistry , Protein Interaction Domains and Motifs , Protein Serine-Threonine Kinases/chemistry , Transcription Factors/chemistry , Bacillus subtilis/enzymology , Bacillus subtilis/genetics , Bacillus subtilis/metabolism , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Escherichia coli/genetics , Models, Chemical , Models, Molecular , Phosphorylation , Protein Conformation , Protein Interaction Mapping , Protein Serine-Threonine Kinases/genetics , Protein Serine-Threonine Kinases/metabolism , Recombinant Fusion Proteins/chemistry , Recombinant Fusion Proteins/genetics , Recombinant Fusion Proteins/metabolism , Reproducibility of Results , Structural Homology, Protein , Transcription Factors/genetics , Transcription Factors/metabolism
8.
BMC Bioinformatics ; 7: 459, 2006 Oct 17.
Article in English | MEDLINE | ID: mdl-17044936

ABSTRACT

BACKGROUND: MannDB was created to meet a need for rapid, comprehensive automated protein sequence analyses to support selection of proteins suitable as targets for driving the development of reagents for pathogen or protein toxin detection. Because a large number of open-source tools were needed, it was necessary to produce a software system to scale the computations for whole-proteome analysis. Thus, we built a fully automated system for executing software tools and for storage, integration, and display of automated protein sequence analysis and annotation data. DESCRIPTION: MannDB is a relational database that organizes data resulting from fully automated, high-throughput protein-sequence analyses using open-source tools. Types of analyses provided include predictions of cleavage, chemical properties, classification, features, functional assignment, post-translational modifications, motifs, antigenicity, and secondary structure. Proteomes (lists of hypothetical and known proteins) are downloaded and parsed from GenBank and then inserted into MannDB, and annotations from SwissProt are downloaded when identifiers are found in the GenBank entry or when identical sequences are identified. Currently 36 open-source tools are run against MannDB protein sequences either on local systems or by means of batch submission to external servers. In addition, BLAST against protein entries in MvirDB, our database of microbial virulence factors, is performed. A web client browser enables viewing of computational results and downloaded annotations, and a query tool enables structured and free-text search capabilities. When available, links to external databases, including MvirDB, are provided. MannDB contains whole-proteome analyses for at least one representative organism from each category of biological threat organism listed by APHIS, CDC, HHS, NIAID, USDA, USFDA, and WHO. CONCLUSION: MannDB comprises a large number of genomes and comprehensive protein sequence analyses representing organisms listed as high-priority agents on the websites of several governmental organizations concerned with bio-terrorism. MannDB provides the user with a BLAST interface for comparison of native and non-native sequences and a query tool for conveniently selecting proteins of interest. In addition, the user has access to a web-based browser that compiles comprehensive and extensive reports. Access to MannDB is freely available at http://manndb.llnl.gov/.


Subjects
Bacterial Proteins/chemistry , Bacterial Proteins/metabolism , Databases, Protein , Information Storage and Retrieval/methods , Sequence Alignment/methods , Sequence Analysis, Protein/methods , User-Computer Interface , Algorithms , Amino Acid Sequence , Bacterial Proteins/classification , Bacterial Proteins/genetics , Binding Sites , Computer Graphics , Database Management Systems , Internet , Molecular Sequence Data , Protein Binding , Proteome/chemistry , Proteome/classification , Proteome/genetics , Proteome/metabolism , Software , Systems Integration
9.
Bioinform Biol Insights ; 10: 81-95, 2016.
Article in English | MEDLINE | ID: mdl-27385911

ABSTRACT

Modeling the molecular mechanisms that govern genetic variation can be useful in understanding the dynamics that drive genetic state transition in quasispecies viruses. For example, there is considerable interest in understanding how the relatively benign vaccine strains of poliovirus eventually revert to forms that confer neurovirulence and cause disease (i.e., vaccine-derived poliovirus). This report describes a stochastic simulation model, S2M, which can be used to generate hypothetical outcomes based on known mechanisms of genetic diversity. S2M begins with predefined genotypes based on the Sabin-1 and Mahoney wild-type sequences, constructs a set of independent cell-based populations, and performs in-cell replication and cell-to-cell infection cycles while quantifying genetic changes that track the transition from Sabin-1 toward Mahoney. Realism is incorporated into the model by assigning defaults for variables that constrain mechanisms of genetic variability based roughly on metrics reported in the literature, yet these values can be modified at the command line in order to generate hypothetical outcomes driven by these parameters. To demonstrate the utility of S2M, simulations were performed to examine the effects of the rates of replication error and recombination and the presence or absence of defective interfering particles, upon reaching the end states of Mahoney resemblance (semblance of a vaccine-derived state), neurovirulence, genome fitness, and cloud diversity. Simulations provide insight into how modeled biological features may drive hypothetical outcomes, independently or in combination, in ways that are not always intuitively obvious.
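
As a rough illustration of the kind of quantity such a simulation tracks, the toy Python sketch below applies random replication errors to a starting sequence and records its fractional identity to a target sequence after each cycle. It is far simpler than S2M: it models a single genome, replication error only (no recombination, selection, cell populations, or defective interfering particles), and every name and parameter here (`simulate_resemblance`, `error_rate`, `cycles`, `seed`) is hypothetical.

```python
import random

NUCLEOTIDES = "ACGU"


def simulate_resemblance(start, target, error_rate, cycles, seed=0):
    """Toy model: per-site replication errors, resemblance to target per cycle.

    Returns a list with one value per cycle: the fraction of positions at
    which the evolving genome matches `target`.
    """
    if len(start) != len(target):
        raise ValueError("start and target must have equal length")
    rng = random.Random(seed)  # seeded for reproducible runs
    genome = list(start)
    history = []
    for _ in range(cycles):
        for i in range(len(genome)):
            if rng.random() < error_rate:
                # An "error" draws a random base, which may equal the old one.
                genome[i] = rng.choice(NUCLEOTIDES)
        matches = sum(g == t for g, t in zip(genome, target))
        history.append(matches / len(target))
    return history
```

Running the function with different `error_rate` values and comparing the resulting trajectories mirrors, in miniature, the parameter-driven "hypothetical outcome" experiments the abstract describes.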

10.
Source Code Biol Med ; 10: 9, 2015.
Article in English | MEDLINE | ID: mdl-26246852

ABSTRACT

BACKGROUND: In order to better define regions of similarity among related protein structures, it is useful to identify the residue-residue correspondences among proteins. Few codes exist for constructing a one-to-many multiple sequence alignment derived from a set of structure or sequence alignments, and a need was evident for creating such a tool for combining pairwise structure alignments that would allow for insertion of gaps in the reference structure. RESULTS: This report describes a new Python code, CombAlign, which takes as input a set of pairwise sequence alignments (which may be structure based) and generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA). The use and utility of CombAlign was demonstrated by generating gapped MSSAs using sets of pairwise structure-based sequence alignments between structure models of the matrix protein (VP40) and pre-small/secreted glycoprotein (sGP) of Reston Ebolavirus and the corresponding proteins of several other filoviruses. The gapped MSSAs revealed structure-based residue-residue correspondences, which enabled identification of structurally similar versus differing regions in the Reston proteins compared to each of the other corresponding proteins. CONCLUSIONS: CombAlign is a new Python code that generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA) given a set of pairwise sequence alignments (which may be structure based). CombAlign has utility in assisting the user in distinguishing structurally conserved versus divergent regions on a reference protein structure relative to other closely related proteins. CombAlign was developed in Python 2.6, and the source code is available for download from the GitHub code repository.
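
The one-to-many merge described in this abstract can be sketched as follows. This is a minimal illustration of the general idea, not the CombAlign source: it assumes each pairwise alignment is a pair of equal-length strings with `-` as the gap character, that all pairs share the same ungapped reference, and the function name `merge_pairwise` is hypothetical. Insertions relative to the reference are kept by opening gap columns in the reference and in every other row.

```python
def merge_pairwise(alignments):
    """Merge (ref_aligned, other_aligned) pairs into one gapped alignment.

    Returns a list of rows: the gapped reference first, then one row per
    input alignment, all of equal length.
    """
    ref = alignments[0][0].replace("-", "")
    n = len(ref)
    parsed = []
    for ref_aln, other_aln in alignments:
        if len(ref_aln) != len(other_aln):
            raise ValueError("aligned strings must have equal length")
        if ref_aln.replace("-", "") != ref:
            raise ValueError("all alignments must share the same reference")
        inserts = [""] * (n + 1)  # inserts[i]: residues placed before ref[i]
        matched = []              # residue (or gap) aligned to each ref residue
        i = 0
        for r, o in zip(ref_aln, other_aln):
            if r == "-":
                inserts[i] += o
            else:
                matched.append(o)
                i += 1
        parsed.append((inserts, matched))

    # Each insertion slot must be wide enough for the longest insertion.
    widths = [max(len(ins[i]) for ins, _ in parsed) for i in range(n + 1)]

    def build(inserts, matched):
        parts = []
        for i in range(n):
            parts.append(inserts[i].ljust(widths[i], "-"))
            parts.append(matched[i])
        parts.append(inserts[n].ljust(widths[n], "-"))
        return "".join(parts)

    rows = [build([""] * (n + 1), list(ref))]
    rows.extend(build(ins, m) for ins, m in parsed)
    return rows


# Example: one sequence has an insertion after "C", another a deletion at "C".
example_rows = merge_pairwise([("AC-GT", "ACTGT"), ("ACGT", "A-GT")])
```

Padding insertions on the right within each slot is an arbitrary layout choice in this sketch; residues inserted at the same slot by different sequences are placed in shared columns without implying that they correspond to one another.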

11.
BMC Res Notes ; 5: 96, 2012 Feb 14.
Article in English | MEDLINE | ID: mdl-22333139

ABSTRACT

BACKGROUND: Genes conferring antibiotic resistance to groups of bacterial pathogens are cause for considerable concern, as many once-reliable antibiotics continue to see a reduction in efficacy. The recent discovery of the metallo β-lactamase blaNDM-1 gene, which appears to grant antibiotic resistance to a variety of Enterobacteriaceae via a mobile plasmid, is one example of this distressing trend. The following work describes a computational analysis of pathogen-borne MBLs that focuses on the structural aspects of characterized proteins. RESULTS: Using both sequence and structural analyses, we examine residues and structural features specific to various pathogen-borne MBL types. This analysis identifies a linker region within MBL-like folds that may act as a discriminating structural feature between these proteins, and specifically resistance-associated acquirable MBLs. Recently released crystal structures of the newly emerged NDM-1 protein were aligned against related MBL structures using a variety of global and local structural alignment methods, and the overall fold conformation is examined for structural conservation. Conservation appears to be present in most areas of the protein, yet is strikingly absent within a linker region, making NDM-1 unique with respect to a linker-based classification scheme. Variability analysis of the NDM-1 crystal structure highlights unique residues in key regions as well as identifying several characteristics shared with other transferable MBLs. CONCLUSIONS: A discriminating linker region identified in MBL proteins is highlighted and examined in the context of NDM-1 and primarily three other MBL types: IMP-1, VIM-2 and ccrA. The presence of an unusual linker region variant and uncommon amino acid composition at specific structurally important sites may help to explain the unusually broad kinetic profile of NDM-1 and may aid in directing research attention to areas of this protein, and possibly other MBLs, that may be targeted for inactivation or attenuation of enzymatic activity.

12.
Bioinform Biol Insights ; 2: 5-13, 2008 Feb 01.
Article in English | MEDLINE | ID: mdl-19812763

ABSTRACT

We compared structure alignments generated by several protein structure comparison programs to determine whether existing methods would satisfactorily align residues at a highly conserved position within an immunogenic loop in ribosome inactivating proteins (RIPs). Using default settings, structure alignments generated by several programs (CE, DaliLite, FATCAT, LGA, MAMMOTH, MATRAS, SHEBA, SSM) failed to align the respective conserved residues, although LGA reported correct residue-residue (R-R) correspondences when the beta-carbon (Cβ) position was used as the point of reference in the alignment calculations. Further tests using variable points of reference indicated that points distal from the beta carbon along a vector connecting the alpha and beta carbons yielded rigid structural alignments in which residues known to be highly conserved in RIPs were reported as corresponding residues in structural comparisons between ricin A chain, abrin-A, and other RIPs. Results suggest that approaches to structure alignment employing alternate point representations corresponding to side chain position may yield structure alignments that are more consistent with observed conservation of functional surface residues than do standard alignment programs, which apply uniform criteria for alignment (i.e. alpha carbon (Cα) as point of reference) along the entirety of the peptide chain. We present the results of tests that suggest the utility of allowing user-specified points of reference in generating alternate structural alignments, and we present a web server for automatically generating such alignments: http://as2ts.llnl.gov/AS2TS/LGA/lga_pdblist_plots.html.
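
The "point of reference along the vector connecting the alpha and beta carbons" can be made concrete with a short Python sketch. It only shows the geometry and is not taken from LGA: the function name `reference_point` and the `scale` parameter are assumptions, with `scale = 0` giving the alpha carbon, `scale = 1` the beta carbon, and `scale > 1` a point distal from the beta carbon in the side-chain direction.

```python
def reference_point(ca, cb, scale):
    """Return the point ca + scale * (cb - ca) for 3D coordinates.

    ca, cb: (x, y, z) coordinates of the alpha and beta carbons.
    scale:  hypothetical parameter; 0 -> alpha carbon, 1 -> beta carbon,
            values above 1 extend beyond the beta carbon.
    """
    return tuple(a + scale * (b - a) for a, b in zip(ca, cb))


# Example: a point twice as far from the alpha carbon as the beta carbon is.
example_point = reference_point((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 2.0)
```

One such point per residue would then stand in for the alpha carbon when the structures are superimposed. Glycine has no beta carbon, so a real workflow would need a rule for that case, which this sketch does not address.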

13.
Bioinformatics ; 21(14): 3089-96, 2005 Jul 15.
Article in English | MEDLINE | ID: mdl-15905278

ABSTRACT

MOTIVATION: Specific and sensitive ligand-based protein detection assays that employ antibodies or small molecules such as peptides or aptamers require that the corresponding surface region of the protein be accessible and that there be minimal cross-reactivity with non-target proteins. To reduce the time and cost of laboratory screening efforts for diagnostic reagents, we developed new methods for evaluating and selecting protein surface regions for ligand targeting. RESULTS: We devised combined structure- and sequence-based methods for identifying 3D epitopes and binding pockets on the surface of the A chain of ricin that are conserved with respect to a set of ricin A chains and unique with respect to other proteins. We (1) used structure alignment software to detect structural deviations and extracted from this analysis the residue-residue correspondence, (2) devised a method to compare corresponding residues across sets of ricin structures and structures of closely related proteins, (3) devised a sequence-based approach to determine residue infrequency in local sequence context and (4) modified a pocket-finding algorithm to identify surface crevices in close proximity to residues determined to be conserved/unique based on our structure- and sequence-based methods. In applying this combined informatics approach to ricin A, we identified a conserved/unique pocket in close proximity to (but not overlapping) the active site that is suitable for bi-dentate ligand development. These methods are generally applicable to identification of surface epitopes and binding pockets for development of diagnostic reagents, therapeutics and vaccines.


Subjects
Algorithms , Models, Chemical , Models, Molecular , Ricin/analysis , Ricin/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence , Binding Sites , Computer Simulation , Conserved Sequence , Molecular Sequence Data , Protein Binding , Protein Conformation , Sequence Homology, Amino Acid