Results 1 - 20 of 39
1.
Nucleic Acids Res ; 41(Database issue): D387-95, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23197656

ABSTRACT

TIGRFAMs, available online at http://www.jcvi.org/tigrfams, is a database of protein family definitions. Each entry features a seed alignment of trusted representative sequences, a hidden Markov model (HMM) built from that alignment, cutoff scores that let automated annotation pipelines decide which proteins are members, and annotations for transfer onto member proteins. Most TIGRFAMs models are designated equivalog, meaning they assign a specific name to proteins conserved in function from a common ancestral sequence. Models describing more functionally heterogeneous families are designated subfamily or domain, and assign less specific but more widely applicable annotations. The Genome Properties database, available at http://www.jcvi.org/genome-properties, specifies how computed evidence, including TIGRFAMs HMM results, should be used to judge whether an enzymatic pathway, a protein complex or another type of molecular subsystem is encoded in a genome. TIGRFAMs and Genome Properties content are developed in concert because subsystems reconstruction for large numbers of genomes guides selection of seed alignment sequences and cutoff values during protein family construction. Both databases specialize heavily in bacterial and archaeal subsystems. At present, 4284 models appear in TIGRFAMs, while 628 systems are described by Genome Properties. Content derives both from subsystem discovery work and from biocuration of the scientific literature.
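The role of a per-model cutoff score can be illustrated with a minimal sketch. It assumes HMM bit scores have already been computed by some search tool; the `FamilyModel` class, the accession, the cutoff value and the protein IDs are all hypothetical and are not taken from the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FamilyModel:
    """Hypothetical stand-in for one protein family entry."""
    accession: str
    name: str
    trusted_cutoff: float  # bit score at or above which a hit counts as a member


def assign_members(model, scores):
    """Return the protein IDs whose HMM bit score meets the model's cutoff."""
    return sorted(pid for pid, score in scores.items()
                  if score >= model.trusted_cutoff)


# Illustrative values only.
model = FamilyModel("FAM_0001", "example equivalog", trusted_cutoff=250.0)
scores = {"protA": 412.3, "protB": 249.9, "protC": 250.0}
members = assign_members(model, scores)
print(members)
```

In a pipeline of this shape, the annotation attached to the model would be transferred only to the proteins returned by `assign_members`.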


Subject(s)
Databases, Protein , Proteins/classification , Proteins/genetics , Genome, Archaeal , Genome, Bacterial , Genomics/methods , Internet , Markov Chains , Molecular Sequence Annotation , Proteins/physiology , Sequence Alignment
2.
Nucleic Acids Res ; 40(Database issue): D306-12, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22096229

ABSTRACT

InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.


Subject(s)
Databases, Protein , Protein Structure, Tertiary , Proteins/classification , Proteins/physiology , Sequence Analysis, Protein , Software , Terminology as Topic , User-Computer Interface
3.
J Bacteriol ; 194(1): 36-48, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22037399

ABSTRACT

Multiple new prokaryotic C-terminal protein-sorting signals were found that reprise the tripartite architecture shared by LPXTG and PEP-CTERM: motif, TM helix, basic cluster. Defining hidden Markov models were constructed for all. PGF-CTERM occurs in 29 archaeal species, some of which have more than 50 proteins that share the domain. PGF-CTERM proteins include the major cell surface protein in Halobacterium, a glycoprotein with a partially characterized diphytanylglyceryl phosphate linkage near its C terminus. Comparative genomics identifies a distant exosortase homolog, designated archaeosortase A (ArtA), as the likely protein-processing enzyme for PGF-CTERM. Proteomics suggests that the PGF-CTERM region is removed. Additional systems include VPXXXP-CTERM/archaeosortase B in two of the same archaea and PEF-CTERM/archaeosortase C in four others. Bacterial exosortases often fall into subfamilies that partner with very different cohorts of extracellular polymeric substance biosynthesis proteins; several species have multiple systems. Variant systems include the VPDSG-CTERM/exosortase C system unique to certain members of the phylum Verrucomicrobia, VPLPA-CTERM/exosortase D in several alpha- and deltaproteobacterial species, and a dedicated (single-target) VPEID-CTERM/exosortase E system in alphaproteobacteria. Exosortase-related families XrtF in the class Flavobacteria and XrtG in Gram-positive bacteria mark distinctive conserved gene neighborhoods. A picture emerges of an ancient and now well-differentiated superfamily of deeply membrane-embedded protein-processing enzymes. Their target proteins are destined to transit cellular membranes during their biosynthesis, during which most undergo additional posttranslational modifications such as glycosylation.
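The tripartite architecture (motif, TM helix, basic cluster) can be made concrete with a rough screening sketch. This is not the HMM-based method the abstract describes: it swaps in a simple regular-expression heuristic, and the residue set, window size and thresholds are all assumptions chosen for illustration. The motif is matched literally, so wildcard motifs such as VPXXXP would need a different pattern.

```python
import re

# Assumed set of residues tolerated in the hydrophobic stretch (illustrative).
HELIX_RESIDUES = "AFGILMVWTSC"


def has_tripartite_tail(seq, motif="PEP", window=40, min_helix=12, min_basic=2):
    """Heuristic check for motif + hydrophobic stretch + basic C-terminal cluster."""
    tail = seq[-window:].upper()
    pattern = re.compile(
        re.escape(motif)                 # 1. the sorting motif
        + r".{0,6}?"                     # short optional linker
        + "([" + HELIX_RESIDUES + "]{" + str(min_helix) + ",})"  # 2. TM-like stretch
        + r"([A-Z]{1,8})$"               # 3. short tail reaching the C terminus
    )
    match = pattern.search(tail)
    if match is None:
        return False
    basic = sum(match.group(2).count(residue) for residue in "KR")
    return basic >= min_basic


# Made-up sequences for illustration.
positive = "MKKTLLASAGDDSNTPEPSAVLLGLGALGLLAMRRRKR"
no_basic = "MKKTLLASAGDDSNTPEPSAVLLGLGALGLLAMDDE"
no_motif = "MKKTLLASAGDDSNTAAASAVLLGLGALGLLAMRRRKR"
print(has_tripartite_tail(positive), has_tripartite_tail(no_basic),
      has_tripartite_tail(no_motif))
```

A heuristic like this would only serve as a quick pre-filter; a profile HMM captures position-specific preferences that a fixed pattern cannot.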


Subject(s)
Aminoacyltransferases/metabolism , Archaeal Proteins/metabolism , Bacterial Proteins/metabolism , Cysteine Endopeptidases/metabolism , Protein Processing, Post-Translational/physiology , Amino Acid Sequence , Aminoacyltransferases/genetics , Archaeal Proteins/genetics , Bacterial Proteins/genetics , Cell Membrane , Cysteine Endopeptidases/genetics , Gene Expression Regulation, Archaeal/physiology , Gene Expression Regulation, Bacterial/physiology , Gene Expression Regulation, Enzymologic/physiology , Molecular Sequence Data
4.
BMC Bioinformatics ; 12: 434, 2011 Nov 09.
Article in English | MEDLINE | ID: mdl-22070167

ABSTRACT

BACKGROUND: Phylogenetic profiling is a technique of scoring co-occurrence between a protein family and some other trait, usually another protein family, across a set of taxonomic groups. In spite of several refinements in recent years, the technique still invites significant improvement. To be its most effective, a phylogenetic profiling algorithm must be able to examine co-occurrences among protein families whose boundaries are uncertain within large homologous protein superfamilies. RESULTS: Partial Phylogenetic Profiling (PPP) is an iterative algorithm that scores a given taxonomic profile against the taxonomic distribution of families for all proteins in a genome. The method works through optimizing the boundary of each protein family, rather than by relying on prebuilt protein families or fixed sequence similarity thresholds. Double Partial Phylogenetic Profiling (DPPP) is a related procedure that begins with a single sequence and searches for optimal granularities for its surrounding protein family in order to generate the best query profiles for PPP. We present ProPhylo, a high-performance software package for phylogenetic profiling studies through creating individually optimized protein family boundaries. ProPhylo provides precomputed databases for immediate use and tools for manipulating the taxonomic profiles used as queries. CONCLUSION: ProPhylo results show universal markers of methanogenesis, a new DNA phosphorothioation-dependent restriction enzyme, and efficacy in guiding protein family construction. The software and the associated databases are freely available under the open source Perl Artistic License from ftp://ftp.jcvi.org/pub/data/ppp/.
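The idea of optimizing a family boundary against a taxonomic profile can be sketched as follows. This is a simplified illustration under stated assumptions, not the ProPhylo implementation: it assumes a ranked list of hits is already available, counts each genome once, and scores each depth with a binomial tail probability. The function names and toy data are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def best_boundary(ranked_genomes, profile, n_genomes_total):
    """Scan hits from best to worst and return (depth, score) at the depth
    where agreement with the profile is least likely to arise by chance.

    ranked_genomes: genome ID of each hit, ordered by decreasing similarity
    profile: set of genome IDs that carry the trait of interest
    """
    p = len(profile) / n_genomes_total  # chance that a random genome is in the profile
    seen = set()
    in_profile = 0
    best = (0, 1.0)
    for genome in ranked_genomes:
        if genome in seen:
            continue  # count each genome only once
        seen.add(genome)
        in_profile += genome in profile
        score = binomial_tail(in_profile, len(seen), p)
        if score < best[1]:
            best = (len(seen), score)
    return best


# Toy data: 3 of 10 genomes carry the trait.
profile = {"g1", "g2", "g3"}
ranked = ["g1", "g2", "g3", "g7", "g8"]
depth, score = best_boundary(ranked, profile, n_genomes_total=10)
print(depth, score)
```

In this toy case the first three hits all come from trait-bearing genomes, so the lowest score falls at depth 3; deeper hits from other genomes only dilute the signal.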


Subject(s)
Archaea/metabolism , Archaeal Proteins/genetics , Methane/biosynthesis , Phylogeny , Software , Algorithms , Archaea/genetics , DNA/metabolism
5.
Nucleic Acids Res ; 37(Database issue): D211-5, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18940856

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) integrates predictive models or 'signatures' representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually and approximately half of the roughly 58,000 signatures available in the source databases belong to an InterPro entry. Recently, we have started to also display the remaining un-integrated signatures via our web interface. Other developments include the provision of non-signature data, such as structural data, in new XML files on our FTP site, as well as the inclusion of matchless UniProtKB proteins in the existing match XML files. The web interface has been extended and now links out to the ADAN predicted protein-protein interaction database and the SPICE and Dasty viewers. The latest public release (v18.0) covers 79.8% of UniProtKB (v14.1) and consists of 16 549 entries. InterPro data may be accessed either via the web address above, via web services, by downloading files by anonymous FTP or by using the InterProScan search software (http://www.ebi.ac.uk/Tools/InterProScan/).


Subject(s)
Databases, Protein , Sequence Analysis, Protein , Proteins/chemistry , Proteins/classification , Systems Integration
6.
Plant Commun ; 2(2): 100101, 2021 03 08.
Article in English | MEDLINE | ID: mdl-33898973

ABSTRACT

The most popular CRISPR-SpCas9 system recognizes canonical NGG protospacer adjacent motifs (PAMs). Previously engineered SpCas9 variants, such as Cas9-NG, favor G-rich PAMs in genome editing. In this manuscript, we describe a new plant genome-editing system based on a hybrid iSpyMacCas9 platform that allows for targeted mutagenesis, C to T base editing, and A to G base editing at A-rich PAMs. This study fills a major technology gap in the CRISPR-Cas9 system for editing NAAR PAMs in plants, which greatly expands the targeting scope of CRISPR-Cas9. Finally, our vector systems are fully compatible with Gateway cloning and will work with all existing single-guide RNA expression systems, facilitating easy adoption of the systems by others. We anticipate that more tools, such as prime editing, homology-directed repair, CRISPR interference, and CRISPR activation, will be further developed based on our promising iSpyMacCas9 platform.


Subject(s)
CRISPR-Cas Systems , Gene Editing/methods , Genome, Plant , Oryza/genetics , Triticum/genetics , Zea mays/genetics
7.
Nat Commun ; 12(1): 1944, 2021 03 29.
Article in English | MEDLINE | ID: mdl-33782402

ABSTRACT

CRISPR-Cas12a is a promising genome editing system for targeting AT-rich genomic regions. Comprehensive genome engineering requires simultaneous targeting of multiple genes at defined locations. Here, to expand the targeting scope of Cas12a, we screen nine Cas12a orthologs that have not been demonstrated in plants, and identify six, ErCas12a, Lb5Cas12a, BsCas12a, Mb2Cas12a, TsCas12a and MbCas12a, that possess high editing activity in rice. Among them, Mb2Cas12a stands out with high editing efficiency and tolerance to low temperature. An engineered Mb2Cas12a-RVRR variant enables editing with more relaxed PAM requirements in rice, yielding two times higher genome coverage than the wild type SpCas9. To enable large-scale genome engineering, we compare 12 multiplexed Cas12a systems and identify a potent system that exhibits nearly 100% biallelic editing efficiency with the ability to target as many as 16 sites in rice. This is the highest level of multiplex edits in plants to date using Cas12a. Two compact single transcript unit CRISPR-Cas12a interference systems are also developed for multi-gene repression in rice and Arabidopsis. This study greatly expands the targeting scope of Cas12a for crop genome engineering.


Subject(s)
Arabidopsis/genetics , Bacterial Proteins/genetics , CRISPR-Associated Proteins/genetics , CRISPR-Cas Systems , Endodeoxyribonucleases/genetics , Gene Editing/methods , Genetic Engineering/methods , Genome, Plant , Oryza/genetics , Agrobacterium tumefaciens , Alleles , Arabidopsis/metabolism , Bacterial Proteins/metabolism , Base Sequence , CRISPR-Associated Protein 9/genetics , CRISPR-Associated Protein 9/metabolism , CRISPR-Associated Proteins/metabolism , Clustered Regularly Interspaced Short Palindromic Repeats , Crops, Agricultural , Endodeoxyribonucleases/metabolism , Humans , Isoenzymes/genetics , Isoenzymes/metabolism , Oryza/metabolism , Plants, Genetically Modified , RNA, Guide, Kinetoplastida/genetics , RNA, Guide, Kinetoplastida/metabolism , Sequence Alignment
8.
J Bacteriol ; 192(21): 5788-98, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20675471

ABSTRACT

Regimens targeting Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), require long courses of treatment and a combination of three or more drugs. An increase in drug-resistant strains of M. tuberculosis demonstrates the need for additional TB-specific drugs. A notable feature of M. tuberculosis is coenzyme F(420), which is distributed sporadically and sparsely among prokaryotes. This distribution allows for comparative genomics-based investigations. Phylogenetic profiling (comparison of differential gene content) based on F(420) biosynthesis nominated many actinobacterial proteins as candidate F(420)-dependent enzymes. Three such families dominated the results: the luciferase-like monooxygenase (LLM), pyridoxamine 5'-phosphate oxidase (PPOX), and deazaflavin-dependent nitroreductase (DDN) families. The DDN family was determined to be limited to F(420)-producing species. The LLM and PPOX families were observed in F(420)-producing species as well as species lacking F(420) but were particularly numerous in many actinobacterial species, including M. tuberculosis. Partitioning the LLM and PPOX families based on an organism's ability to make F(420) allowed the application of the SIMBAL (sites inferred by metabolic background assertion labeling) profiling method to identify F(420)-correlated subsequences. These regions were found to correspond to flavonoid cofactor binding sites. Significantly, these results showed that M. tuberculosis carries at least 28 separate F(420)-dependent enzymes, most of unknown function, and a paucity of flavin mononucleotide (FMN)-dependent proteins in these families. While prevalent in mycobacteria, markers of F(420) biosynthesis appeared to be absent from the normal human gut flora. These findings suggest that M. tuberculosis relies heavily on coenzyme F(420) for its redox reactions. This dependence and the cofactor's rarity may make F(420)-related proteins promising drug targets.


Subject(s)
Actinobacteria/enzymology , Gene Expression Regulation, Bacterial/physiology , Mycobacterium tuberculosis/enzymology , Riboflavin/analogs & derivatives , Amino Acid Sequence , Binding Sites , Coenzymes/metabolism , Flavonoids , Gene Expression Profiling , Genome, Bacterial , Molecular Biology , Molecular Sequence Data , Molecular Structure , Phylogeny , Protein Conformation , Riboflavin/genetics , Riboflavin/metabolism
9.
BMC Bioinformatics ; 11: 52, 2010 Jan 26.
Article in English | MEDLINE | ID: mdl-20102603

ABSTRACT

BACKGROUND: Comparative genomics methods such as phylogenetic profiling can mine powerful inferences from inherently noisy biological data sets. We introduce Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL), a method that applies the Partial Phylogenetic Profiling (PPP) approach locally within a protein sequence to discover short sequence signatures associated with functional sites. The approach is based on the basic scoring mechanism employed by PPP, namely the use of binomial distribution statistics to optimize sequence similarity cutoffs during searches of partitioned training sets. RESULTS: Here we illustrate and validate the ability of the SIMBAL method to find functionally relevant short sequence signatures by application to two well-characterized protein families. In the first example, we partitioned a family of ABC permeases using a metabolic background property (urea utilization). Thus, the TRUE set for this family comprised members whose genome of origin encoded a urea utilization system. By moving a sliding window across the sequence of a permease, and searching each subsequence in turn against the full set of partitioned proteins, the method found which local sequence signatures best correlated with the urea utilization trait. Mapping of SIMBAL "hot spots" onto crystal structures of homologous permeases reveals that the significant sites are gating determinants on the cytosolic face rather than, say, docking sites for the substrate-binding protein on the extracellular face. In the second example, we partitioned a protein methyltransferase family using gene proximity as a criterion. In this case, the TRUE set comprised those methyltransferases encoded near the gene for the substrate RF-1. SIMBAL identifies sequence regions that map onto the substrate-binding interface while ignoring regions involved in the methyltransferase reaction mechanism in general. Neither method for training set construction requires any prior experimental characterization. CONCLUSIONS: SIMBAL shows that, in functionally divergent protein families, selected short sequences often significantly outperform their full-length parent sequence for making functional predictions by sequence similarity, suggesting avenues for improved functional classifiers. When combined with structural data, SIMBAL affords the ability to localize and model functional sites.
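The sliding-window procedure can be sketched in a few lines. This is a toy illustration, not the SIMBAL software: exact substring matching stands in for the sequence similarity search, each window is scored with a binomial tail probability over the TRUE/FALSE partition, and all function names and sequences are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def sliding_windows(seq, width, step=1):
    """Yield (start, subsequence) for each window across the sequence."""
    for start in range(0, len(seq) - width + 1, step):
        yield start, seq[start:start + width]


def window_score(sub, true_set, false_set):
    """Lower is better: chance of seeing this many TRUE hits among all hits.

    Exact substring containment is a toy stand-in for a similarity search.
    """
    true_hits = sum(sub in s for s in true_set)
    false_hits = sum(sub in s for s in false_set)
    n = true_hits + false_hits
    if n == 0:
        return 1.0
    p = len(true_set) / (len(true_set) + len(false_set))
    return binomial_tail(true_hits, n, p)


def hot_spots(seq, true_set, false_set, width):
    """Return (score, start, subsequence) tuples, best-scoring windows first."""
    return sorted((window_score(sub, true_set, false_set), start, sub)
                  for start, sub in sliding_windows(seq, width))


# Made-up partitioned training set.
query = "MKTAYWHEEL"
true_set = ["MKTAYWHEEL", "GGTAYWHQQL", "PPTAYWHAAL"]
false_set = ["MKTCCCCEEL", "MKTDDDDEEL", "MKTAAAAEEL"]
ranked = hot_spots(query, true_set, false_set, width=4)
print(ranked[0])
```

Here the windows covering the shared TAYWH core score best because they hit every TRUE sequence and no FALSE sequence, while the N-terminal window also hits a FALSE sequence and scores worse.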


Subject(s)
Algorithms , Gene Expression Profiling/methods , Proteins/chemistry , Proteins/physiology , Sequence Analysis, Protein/methods , Amino Acid Sequence , Molecular Sequence Data , Phylogeny , Structure-Activity Relationship
10.
Nature ; 426(6964): 299-302, 2003 Nov 20.
Article in English | MEDLINE | ID: mdl-14628053

ABSTRACT

Post-translational modifications provide sensitive and flexible mechanisms to dynamically modulate protein function in response to specific signalling inputs. In the case of transcription factors, changes in phosphorylation state can influence protein stability, conformation, subcellular localization, cofactor interactions, transactivation potential and transcriptional output. Here we show that the evolutionarily conserved transcription factor Eyes absent (Eya) belongs to the phosphatase subgroup of the haloacid dehalogenase (HAD) superfamily, and propose a function for it as a non-thiol-based protein tyrosine phosphatase. Experiments performed in cultured Drosophila cells and in vitro indicate that Eyes absent has intrinsic protein tyrosine phosphatase activity and can autocatalytically dephosphorylate itself. Confirming the biological significance of this function, mutations that disrupt the phosphatase active site severely compromise the ability of Eyes absent to promote eye specification and development in Drosophila. Given the functional importance of phosphorylation-dependent modulation of transcription factor activity, this evidence for a nuclear transcriptional coactivator with intrinsic phosphatase activity suggests an unanticipated method of fine-tuning transcriptional regulation.


Subject(s)
Drosophila Proteins/metabolism , Drosophila melanogaster/enzymology , Eye Proteins/metabolism , Protein Tyrosine Phosphatases/metabolism , Transcription Factors/metabolism , Amino Acid Motifs , Amino Acid Sequence , Animals , Antibodies, Phospho-Specific/immunology , Drosophila Proteins/chemistry , Drosophila Proteins/genetics , Drosophila melanogaster/embryology , Drosophila melanogaster/genetics , Embryonic Induction , Eye/embryology , Eye/enzymology , Eye/metabolism , Eye Proteins/chemistry , Eye Proteins/genetics , Gene Expression Regulation , Kinetics , Mice , Models, Molecular , Molecular Sequence Data , Mutation/genetics , Phosphorylation , Protein Conformation , Protein Tyrosine Phosphatases/chemistry , Protein Tyrosine Phosphatases/genetics , Substrate Specificity , Transcription Factors/chemistry , Transcription Factors/genetics
11.
Nature ; 432(7019): 910-3, 2004 Dec 16.
Article in English | MEDLINE | ID: mdl-15602564

ABSTRACT

Since the recognition of prokaryotes as essential components of the oceanic food web, bacterioplankton have been acknowledged as catalysts of most major biogeochemical processes in the sea. Studying heterotrophic bacterioplankton has been challenging, however, as most major clades have never been cultured or have only been grown to low densities in sea water. Here we describe the genome sequence of Silicibacter pomeroyi, a member of the marine Roseobacter clade (Fig. 1), the relatives of which comprise approximately 10-20% of coastal and oceanic mixed-layer bacterioplankton. This first genome sequence from any major heterotrophic clade consists of a chromosome (4,109,442 base pairs) and megaplasmid (491,611 base pairs). Genome analysis indicates that this organism relies upon a lithoheterotrophic strategy that uses inorganic compounds (carbon monoxide and sulphide) to supplement heterotrophy. Silicibacter pomeroyi also has genes advantageous for associations with plankton and suspended particles, including genes for uptake of algal-derived compounds, use of metabolites from reducing microzones, rapid growth and cell-density-dependent regulation. This bacterium has a physiology distinct from that of marine oligotrophs, adding a new strategy to the recognized repertoire for coping with a nutrient-poor ocean.


Subject(s)
Adaptation, Physiological/genetics , Genome, Bacterial , Plankton/genetics , Plankton/physiology , Roseobacter/genetics , Roseobacter/physiology , Seawater/microbiology , Carrier Proteins/genetics , Carrier Proteins/metabolism , Genes, Bacterial/genetics , Marine Biology , Molecular Sequence Data , Oceans and Seas , Phylogeny , Plankton/classification , RNA, Ribosomal, 16S/genetics , Roseobacter/classification
12.
Nat Biotechnol ; 25(5): 569-75, 2007 May.
Article in English | MEDLINE | ID: mdl-17468768

ABSTRACT

Dichelobacter nodosus causes ovine footrot, a disease that leads to severe economic losses in the wool and meat industries. We sequenced its 1.4-Mb genome, the smallest known genome of an anaerobe. It differs markedly from small genomes of intracellular bacteria, retaining greater biosynthetic capabilities and lacking any evidence of extensive ongoing genome reduction. Comparative genomic microarray studies and bioinformatic analysis suggested that, despite its small size, almost 20% of the genome is derived from lateral gene transfer. Most of these regions seem to be associated with virulence. Metabolic reconstruction indicated unsuspected capabilities, including carbohydrate utilization, electron transfer and several aerobic pathways. Global transcriptional profiling and bioinformatic analysis enabled the prediction of virulence factors and cell surface proteins. Screening of these proteins against ovine antisera identified eight immunogenic proteins that are candidate antigens for a cross-protective vaccine.


Subject(s)
Antigens/immunology , Antigens/therapeutic use , Dichelobacter nodosus/genetics , Dichelobacter nodosus/pathogenicity , Foot Rot/immunology , Foot Rot/microbiology , Sequence Analysis, DNA/methods , Animals , Antigens/genetics , Chromosome Mapping/methods , Dichelobacter nodosus/immunology , Dichelobacter nodosus/metabolism , Foot Rot/prevention & control , Genome, Bacterial/genetics
13.
Appl Environ Microbiol ; 75(7): 2046-56, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19201974

ABSTRACT

The complete genomes of three strains from the phylum Acidobacteria were compared. Phylogenetic analysis placed them as a unique phylum. They share genomic traits with members of the Proteobacteria, the Cyanobacteria, and the Fungi. The three strains appear to be versatile heterotrophs. Genomic and culture traits indicate the use of carbon sources that span simple sugars to more complex substrates such as hemicellulose, cellulose, and chitin. The genomes encode low-specificity major facilitator superfamily transporters and high-affinity ABC transporters for sugars, suggesting that they are best suited to low-nutrient conditions. They appear capable of nitrate and nitrite reduction but not N(2) fixation or denitrification. The genomes contained numerous genes that encode siderophore receptors, but no evidence of siderophore production was found, suggesting that they may obtain iron via interaction with other microorganisms. The presence of cellulose synthesis genes and a large class of novel high-molecular-weight excreted proteins suggests potential traits for desiccation resistance, biofilm formation, and/or contribution to soil structure. Polyketide synthase and macrolide glycosylation genes suggest the production of novel antimicrobial compounds. Genes that encode a variety of novel proteins were also identified. The abundance of acidobacteria in soils worldwide and the breadth of potential carbon use by the sequenced strains suggest significant and previously unrecognized contributions to the terrestrial carbon cycle. Combining our genomic evidence with available culture traits, we postulate that cells of these isolates are long-lived, divide slowly, exhibit slow metabolic rates under low-nutrient conditions, and are well equipped to tolerate fluctuations in soil hydration.


Subject(s)
Bacteria/genetics , Bacteria/isolation & purification , DNA, Bacterial/genetics , Genome, Bacterial , Soil Microbiology , Anti-Bacterial Agents/biosynthesis , Biological Transport , Carbohydrate Metabolism , Cyanobacteria/genetics , DNA, Bacterial/chemistry , Fungi/genetics , Macrolides/metabolism , Molecular Sequence Data , Nitrogen/metabolism , Phylogeny , Proteobacteria/genetics , Sequence Analysis, DNA , Sequence Homology
14.
Nucleic Acids Res ; 35(Database issue): D260-4, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17151080

ABSTRACT

TIGRFAMs is a collection of protein family definitions built to aid in high-throughput annotation of specific protein functions. Each family is based on a hidden Markov model (HMM), where both cutoff scores and membership in the seed alignment are chosen so that the HMMs can classify numerous proteins according to their specific molecular functions. Most TIGRFAMs models describe 'equivalog' families, where both orthology and lateral gene transfer may be part of the evolutionary history, but where a single molecular function has been conserved. The Genome Properties system contains a queriable set of metabolic reconstructions, genome metrics and extractions of information from the scientific literature. Its genome-by-genome assertions of whether or not specific structures, pathways or systems are present provide high-level conceptual descriptions of genomic content. These assertions enable comparative genomics, provide a meaningful biological context to aid in manual annotation, support assignments of Gene Ontology (GO) biological process terms and help validate HMM-based predictions of protein function. The Genome Properties system is particularly useful as a generator of phylogenetic profiles, through which new protein family functions may be discovered. The TIGRFAMs and Genome Properties systems can be accessed at http://www.tigr.org/TIGRFAMs and http://www.tigr.org/Genome_Properties.


Subject(s)
Archaeal Proteins/physiology , Bacterial Proteins/physiology , Databases, Protein , Archaeal Proteins/classification , Archaeal Proteins/genetics , Bacterial Proteins/classification , Bacterial Proteins/genetics , Genome, Bacterial , Genomics , Internet , Phylogeny , Software , User-Computer Interface
15.
Nucleic Acids Res ; 35(Database issue): D224-8, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17202162

ABSTRACT

InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro). The InterProScan search tool is now also available via a web service at http://www.ebi.ac.uk/Tools/webservices/WSInterProScan.html.


Subject(s)
Databases, Protein , Internet , Protein Structure, Tertiary , Proteins/chemistry , Proteins/classification , Proteins/physiology , Sequence Analysis, Protein , Systems Integration , User-Computer Interface
16.
PLoS Genet ; 2(2): e21, 2006 Feb.
Article in English | MEDLINE | ID: mdl-16482227

ABSTRACT

Anaplasma (formerly Ehrlichia) phagocytophilum, Ehrlichia chaffeensis, and Neorickettsia (formerly Ehrlichia) sennetsu are intracellular vector-borne pathogens that cause human ehrlichiosis, an emerging infectious disease. We present the complete genome sequences of these organisms along with comparisons to other organisms in the Rickettsiales order. Ehrlichia spp. and Anaplasma spp. display a unique large expansion of immunodominant outer membrane proteins facilitating antigenic variation. All Rickettsiales have a diminished ability to synthesize amino acids compared to their closest free-living relatives. Unlike members of the Rickettsiaceae family, these pathogenic Anaplasmataceae are capable of making all major vitamins, cofactors, and nucleotides, which could confer a beneficial role in the invertebrate vector or the vertebrate host. Further analysis identified proteins potentially involved in vacuole confinement of the Anaplasmataceae, a life cycle involving a hematophagous vector, vertebrate pathogenesis, human pathogenesis, and lack of transovarial transmission. These discoveries provide significant insights into the biology of these obligate intracellular pathogens.


Subject(s)
Ehrlichia/genetics , Ehrlichiosis/genetics , Genomics/methods , Animals , Biotin/metabolism , DNA Repair , Ehrlichiosis/microbiology , Genome , Humans , Models, Biological , Phylogeny , Rickettsia/genetics , Ticks
17.
Nat Biotechnol ; 22(5): 554-9, 2004 May.
Article in English | MEDLINE | ID: mdl-15077118

ABSTRACT

Desulfovibrio vulgaris Hildenborough is a model organism for studying the energy metabolism of sulfate-reducing bacteria (SRB) and for understanding the economic impacts of SRB, including biocorrosion of metal infrastructure and bioremediation of toxic metal ions. The 3,570,858 base pair (bp) genome sequence reveals a network of novel c-type cytochromes, connecting multiple periplasmic hydrogenases and formate dehydrogenases, as a key feature of its energy metabolism. The relative arrangement of genes encoding enzymes for energy transduction, together with inferred cellular location of the enzymes, provides a basis for proposing an expansion to the 'hydrogen-cycling' model for increasing energy efficiency in this bacterium. Plasmid-encoded functions include modification of cell surface components, nitrogen fixation and a type-III protein secretion system. This genome sequence represents a substantial step toward the elucidation of pathways for reduction (and bioremediation) of pollutants such as uranium and chromium and offers a new starting point for defining this organism's complex anaerobic respiration.


Subject(s)
Desulfovibrio vulgaris/genetics , Genome, Bacterial , Desulfovibrio vulgaris/metabolism , Energy Metabolism , Molecular Sequence Data
18.
Nucleic Acids Res ; 33(Database issue): D201-5, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608177

ABSTRACT

InterPro, an integrated documentation resource of protein families, domains and functional sites, was created to integrate the major protein signature databases. Currently, it includes PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF and SUPERFAMILY. Signatures are manually integrated into InterPro entries that are curated to provide biological and functional information. Annotation is provided in an abstract, Gene Ontology mapping and links to specialized databases. New features of InterPro include extended protein match views, taxonomic range information and protein 3D structure data. One of the new match views is the InterPro Domain Architecture view, which shows the domain composition of protein matches. Two new entry types were introduced to better describe InterPro entries: these are active site and binding site. PIRSF and the structure-based SUPERFAMILY are the latest member databases to join InterPro, and CATH and PANTHER are soon to be integrated. InterPro release 8.0 contains 11 007 entries, representing 2573 domains, 8166 families, 201 repeats, 26 active sites, 21 binding sites and 20 post-translational modification sites. InterPro covers over 78% of all proteins in the Swiss-Prot and TrEMBL components of UniProt. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).


Subject(s)
Databases, Protein , Proteins/chemistry , Proteins/classification , Sequence Analysis, Protein , Databases, Protein/trends , Humans , Protein Structure, Tertiary , Sequence Alignment , Systems Integration
19.
BMC Biol ; 4: 29, 2006 Aug 24.
Article in English | MEDLINE | ID: mdl-16930487

ABSTRACT

BACKGROUND: Protein translocation to the proper cellular destination may be guided by various classes of sorting signals recognizable in the primary sequence. Detection in some genomes, but not others, may reveal sorting system components by comparison of the phylogenetic profile of the class of sorting signal to that of various protein families.
RESULTS: We describe a short C-terminal homology domain, sporadically distributed in bacteria, with several key characteristics of protein sorting signals. The domain includes a near-invariant motif Pro-Glu-Pro (PEP). This possible recognition or processing site is followed by a predicted transmembrane helix and a cluster rich in basic amino acids. We designate this domain PEP-CTERM. It tends to occur multiple times in a genome if it occurs at all, with a median count of eight instances; Verrucomicrobium spinosum has sixty-five. PEP-CTERM-containing proteins generally contain an N-terminal signal peptide and exhibit high diversity and little homology to known proteins. All bacteria with PEP-CTERM have both an outer membrane and exopolysaccharide (EPS) production genes. By a simple heuristic for screening phylogenetic profiles in the absence of pre-formed protein families, we discovered that a homolog of the membrane protein EpsH (exopolysaccharide locus protein H) occurs in a species when PEP-CTERM domains are found. The EpsH family contains invariant residues consistent with a transpeptidase function. Most PEP-CTERM proteins are encoded by single-gene operons preceded by large intergenic regions. In the Proteobacteria, most of these upstream regions share a DNA sequence, a probable cis-regulatory site that contains a sigma-54 binding motif. The phylogenetic profile for this DNA sequence exactly matches that of three proteins: a sigma-54-interacting response regulator (PrsR), a transmembrane histidine kinase (PrsK), and a TPR protein (PrsT).
CONCLUSION: These findings are consistent with the hypothesis that PEP-CTERM and EpsH form a protein export sorting system, analogous to the LPXTG/sortase system of Gram-positive bacteria, and correlated to EPS expression. It occurs preferentially in bacteria from sediments, soils, and biofilms. The novel method that led to these findings, partial phylogenetic profiling, requires neither global sequence clustering nor arbitrary similarity cutoffs and appears to be a rapid, effective alternative to other profiling methods.
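The exact-match comparison of phylogenetic profiles mentioned in this abstract can be illustrated with a minimal sketch. It is not the partial phylogenetic profiling method itself, which the abstract does not specify in detail. It assumes that a profile is simply the set of genomes in which a feature is detected. The function name `matching_profiles`, the genome labels, and the family names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, List


def matching_profiles(
    query: FrozenSet[str],
    families: Dict[str, FrozenSet[str]],
) -> List[str]:
    """Return, in sorted order, the families whose profile equals the query's.

    A profile here is the set of genomes in which a feature was detected
    (an assumption made for this sketch).
    """
    return sorted(name for name, genomes in families.items() if genomes == query)


# Hypothetical presence data: which genomes contain the query feature.
query = frozenset({"genome_A", "genome_C", "genome_D"})

# Hypothetical protein families and the genomes in which each was found.
families = {
    "family_1": frozenset({"genome_A", "genome_C", "genome_D"}),
    "family_2": frozenset({"genome_A", "genome_B"}),
    "family_3": frozenset({"genome_D", "genome_C", "genome_A"}),
    "family_4": frozenset({"genome_A", "genome_C", "genome_D", "genome_E"}),
}

matches = matching_profiles(query, families)
print(matches)
```

With this toy data, only the families present in exactly the same three genomes as the query are reported; a family found in one additional genome (`family_4`) is excluded, which is what distinguishes an exact profile match from a looser correlation.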


Subject(s)
Amino Acid Motifs/genetics , Bacterial Proteins/metabolism , Polysaccharides, Bacterial/metabolism , Protein Sorting Signals/genetics , Amino Acid Sequence , Bacteria/genetics , Bacteria/growth & development , Bacteria/metabolism , Bacterial Proteins/genetics , Biofilms , Genome, Bacterial/genetics , Markov Chains , Molecular Sequence Data , Phylogeny , Protein Sorting Signals/physiology , Protein Transport/physiology , Seawater/microbiology , Sequence Alignment , Soil Microbiology
20.
OMICS ; 10(2): 100-4, 2006.
Article in English | MEDLINE | ID: mdl-16901213

ABSTRACT

This article summarizes the proceedings of the "eGenomics: Cataloguing our Complete Genome Collection II" workshop held November 10-11, 2005, at the European Bioinformatics Institute. This exploratory workshop, organized by members of the Genomic Standards Consortium (GSC), brought together researchers from the genomic, functional OMICS, and computational biology communities to discuss standardization activities across a range of projects. The workshop proceedings and outcomes are set to help guide the development of the GSC's Minimal Information about a Genome Sequence (MIGS) specification.


Subject(s)
Databases as Topic/standards , Genome, Human , Genome , Genomics/standards , Animals , Humans