Results 1 - 20 of 31
1.
Science ; 278(5338): 609-14, 1997 Oct 24.
Article in English | MEDLINE | ID: mdl-9381171

ABSTRACT

Ancient duplications and rearrangements of protein-coding segments have resulted in complex gene family relationships. Duplications can be tandem or dispersed and can involve entire coding regions or modules that correspond to folded protein domains. As a result, gene products may acquire new specificities, altered recognition properties, or modified functions. Extreme proliferation of some families within an organism, perhaps at the expense of other families, may correspond to functional innovations during evolution. The underlying processes are still at work, and the large fraction of human and other genomes consisting of transposable elements may be a manifestation of the evolutionary benefits of genomic flexibility.


Subject(s)
Multigene Family , Proteins/genetics , Amino Acid Sequence , Animals , Base Sequence , Computer Communication Networks , Databases as Topic , Evolution, Molecular , Genetic Variation , Humans , Phylogeny , Proteins/chemistry , Proteins/classification , Proteins/physiology , Repetitive Sequences, Nucleic Acid
2.
Trends Genet ; 17(8): 465-72, 2001 Aug.
Article in English | MEDLINE | ID: mdl-11485819

ABSTRACT

Inteins are selfish DNA elements found within coding regions. They are translated with their host protein, but then catalyze their own excision and the formation of a peptide bond between their flanking protein regions. Understanding what drives and selects inteins is relevant for assessing whether they have unidentified biological functions and whether they can invade and become established in new genes and organisms. Inteins are suggested to have been present and more common in the progenitors of eukaryotes and prokaryotes. In these cells, inteins had some beneficial function or had evolved from an unknown beneficial protein. Since then, this putative benefit has been lost and inteins are gradually becoming extinct. The proteins in which inteins are currently found are proposed to be proteins vital for the survival of the organism, where intein removal is most difficult.


Subject(s)
DNA/genetics , Archaea/chemistry , Biological Evolution , DNA Replication , Models, Biological , Phylogeny , Protein Structure, Tertiary , Transcription, Genetic
3.
J Mol Biol ; 307(3): 939-49, 2001 Mar 30.
Article in English | MEDLINE | ID: mdl-11273712

ABSTRACT

A new method to analyze the similarity between multiply aligned protein motifs (blocks) was developed. It identifies sets of consistently aligned blocks. These are found to be protein regions of similar function and structure that appear in different contexts. For example, the Rossmann fold ligand-binding region is found similar to TIM barrel and methylase regions, various protein families are predicted to have a TIM-barrel fold and the structural relation between the ClpP protease and crotonase folds is identified from their sequence. Besides identifying local structure features, sequence similarity across short sequence regions (fewer than 20 amino acids) also predicts structure similarity of whole domains (folds) a few hundred amino acid residues long. Most of these relations could not be identified by other advanced sequence-to-sequence or sequence-to-multiple-alignment comparisons. We describe the method (termed CYRCA), present examples of our findings, and discuss their implications.


Subject(s)
Computational Biology/methods , Protein Folding , Proteins/chemistry , Proteins/metabolism , Sequence Alignment , Adenosine Triphosphatases/chemistry , Adenosine Triphosphatases/metabolism , Algorithms , Amino Acid Motifs , Automation , Binding Sites , Databases as Topic , Endopeptidase Clp , Enoyl-CoA Hydratase/chemistry , Enoyl-CoA Hydratase/metabolism , Internet , Ligands , Models, Molecular , Protein Binding , Protein Structure, Tertiary , Serine Endopeptidases/chemistry , Serine Endopeptidases/metabolism , Software , Structure-Activity Relationship
4.
Protein Sci ; 3(12): 2340-50, 1994 Dec.
Article in English | MEDLINE | ID: mdl-7756989

ABSTRACT

Inteins (protein introns) are internal portions of protein sequences that are posttranslationally excised while the flanking regions are spliced together, making an additional protein product. Inteins have been found in a number of homologous genes in yeast, mycobacteria, and extreme thermophile archaebacteria. The inteins are probably multifunctional, autocatalyzing their own splicing, and some were also shown to be DNA endonucleases. The splice junction regions and two regions similar to homing endonucleases were thought to be the only common sequence features of inteins. This work analyzed all published intein sequences with recently developed methods for detecting weak, conserved sequence features. The methods complemented each other in the identification and assessment of several patterns characterizing the intein sequences. New intein conserved features are discovered and the known ones are quantitatively described and localized. The general sequence description of all the known inteins is derived from the motifs and their relative positions. The intein sequence description is used to search the sequence databases for intein-like proteins. A sequence region in a mycobacterial open reading frame possessing all of the intein motifs and absent from sequences homologous to both of its flanking sequences is identified as an intein. A newly discovered putative intein in red algae chloroplasts is found not to contain the endonuclease motifs present in all other inteins. The yeast HO endonuclease is found to have an overall intein-like structure and a few viral polyprotein cleavage sites are found to be significantly similar to the intein amino-end splice junction motif. The intein features described may serve for detection of intein sequences.


Subject(s)
Bacterial Proteins/chemistry , DNA Helicases , Introns , Protein Processing, Post-Translational , Proteins/chemistry , Amino Acid Sequence , Deoxyribonucleases, Type II Site-Specific/chemistry , DnaB Helicases , Fungal Proteins/chemistry , Molecular Sequence Data , Mycobacterium leprae/chemistry , Open Reading Frames , Plant Proteins/chemistry , Proteins/metabolism , Rhodophyta/chemistry , Saccharomyces cerevisiae/chemistry , Saccharomyces cerevisiae Proteins , Sequence Alignment , Sequence Homology, Amino Acid
5.
Protein Sci ; 7(1): 64-71, 1998 Jan.
Article in English | MEDLINE | ID: mdl-9514260

ABSTRACT

Analysis of the conserved sequence features of inteins (protein "introns") reveals that they are composed of three distinct modular domains. The N-terminal (N) and C-terminal (C) domains are predicted to perform different parts of the autocatalytic protein splicing reaction. An optional endonuclease domain (EN) is shown to correspond to different types of homing endonucleases in different inteins. The N domain contains motifs predicted to catalyze the first steps of protein splicing, leading to the cleavage of the intein N terminus from its protein host. Intein N domain motifs are also found in C-terminal autocatalytic domains (CADs) present in hedgehog and other protein families. Specific residues in the N domain of intein and CADs are proposed to form a charge relay system involved in cleaving their N-termini. The intein C domain is apparently unique to inteins and contains motifs that catalyze the final protein splicing steps: ligation of the intein flanks and cleavage of its C terminus to release the free intein and spliced host protein. All intein EN domains known thus far have dodecapeptide (DOD, LAGLI-DADG) type homing endonuclease motifs. This work identifies an EN domain with an HNH homing-endonuclease motif and two new small inteins with no EN domains. One of these small inteins might be inactive or a "pseudo intein." The results suggest a modular architecture for inteins, clarify their origin and relationship to other protein families, and extend recent experimental findings on the functional roles of intein N, C, and EN motifs.


Subject(s)
Protein Splicing/physiology , Proteins/chemistry , Amino Acid Sequence , Bacterial Proteins/chemistry , Bacterial Proteins/metabolism , Binding Sites/physiology , Carboxylic Ester Hydrolases/chemistry , Catalysis , DNA Gyrase , DNA Topoisomerases, Type II , Deoxyribonucleases, Type II Site-Specific/physiology , Models, Molecular , Molecular Sequence Data , Proteins/metabolism , Saccharomyces cerevisiae Proteins , Sequence Alignment
6.
Gene ; 122(1): 129-37, 1992 Dec 01.
Article in English | MEDLINE | ID: mdl-1452019

ABSTRACT

In addition to universally appearing mitochondrial (mt) genes, origins of replication and transcription start regions typical of all mt genome variants of the yeast Saccharomyces cerevisiae, the mt genomes of some of the strains contain variable sequences. These sequences are apparently largely dispensable. They are mainly composed of group-I and -II introns and intergenic open reading frames (ORFs). Many of the introns contain ORFs, some of which were shown by genetic and biochemical means to be involved in splicing and transposition of the mt introns. Some of the optional sequences are hypothesized to be mobile genetic elements. Nucleotide (nt) sequences of the mt genome of S. cerevisiae were examined by analyzing occurrences of oligodeoxyribonucleotide (oligo) 'words'. This linguistic technique had been found to be sensitive to both function and origin of the sequence [Pietrokovski et al., J. Biomol. Struct. Dyn. 7 (1990) 1251-1268]. A clear difference is found between the oligo vocabularies of the optional and basic yeast mt sequences. The difference is mainly located in protein coding segments of the optional sequences which contain conserved amino acid motifs, characteristic of intronic and intergenic ORFs. The use of nt linguistics to detect the sequence dissimilarity and its causes in yeast mitochondria provides fast and straightforward results, identifying the intronic and intergenic ORFs as DNA sequences of foreign, non-mt origin.


Subject(s)
Genome, Fungal , Mitochondria/metabolism , Saccharomyces cerevisiae/genetics , Amino Acid Sequence , Base Sequence , Conserved Sequence , DNA Replication , Genetic Techniques , Introns , Molecular Sequence Data , Nucleotides , Open Reading Frames , Transcription, Genetic
7.
Gene ; 163(2): GC17-26, 1995 Oct 03.
Article in English | MEDLINE | ID: mdl-7590261

ABSTRACT

Protein blocks consist of multiply aligned sequence segments that correspond to the most highly conserved regions of protein families. Typically, a set of related proteins has more than one region in common and their relationship can be represented as a series of ungapped blocks separated by unaligned regions. Blockmaker is an automated system available by electronic mail (blockmaker@howard.fhcrc.org) and the World Wide Web (http://www.blocks.fhcrc.org4) that finds blocks in a group of related protein sequences submitted by the user. It adapts and extends existing algorithms to make them useful to biologists looking for conserved regions in a group of related protein sequences. Two sets of blocks are returned, one in which candidate blocks are detected using the MOTIF algorithm and the other using a Gibbs sampler algorithm that has been adapted for full automation. This use of two block-finding methods based on completely different principles provides a 'reality check,' whereby a block detected by both methods is considered to be correct. Resulting blocks can be displayed using the information-based 'sequence logo' method, adapted to incorporate sequence weights, which provides an intuitive visual description of both the residue and the conservation information at each position. Blocks generated by this system are useful in diverse applications, such as searching databases and designing degenerate PCR primers. As an example, blocks made from amino acid sequences related to Caenorhabditis elegans Tc1 transposase were used to search GenBank, revealing that several fish and amphibian genomic sequences harbor previously unreported Tc1 homologs.
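
The information-based display mentioned in the abstract above can be illustrated with a minimal sketch. It computes the per-column information content (in bits) of an ungapped block, optionally using sequence weights. This is not Blockmaker's code: the function name `column_information`, the uniform 20-letter background, and the absence of any small-sample correction are all assumptions made for illustration.

```python
from math import log2

def column_information(block, weights=None):
    """Per-column information content (bits) of an ungapped block.

    block   -- list of equal-length aligned protein segments
    weights -- optional per-sequence weights (hypothetical parameter;
               defaults to equal weights)
    Assumes a uniform background over 20 amino acids and applies no
    small-sample correction.
    """
    if weights is None:
        weights = [1.0] * len(block)
    width = len(block[0])
    if any(len(seq) != width for seq in block):
        raise ValueError("all segments in a block must have the same length")
    total = sum(weights)
    info = []
    for i in range(width):
        # Weighted residue frequencies for this column.
        freqs = {}
        for seq, w in zip(block, weights):
            freqs[seq[i]] = freqs.get(seq[i], 0.0) + w / total
        # Shannon entropy of the column; information = max entropy - entropy.
        entropy = -sum(p * log2(p) for p in freqs.values() if p > 0)
        info.append(log2(20) - entropy)
    return info
```

A fully conserved column scores log2(20) bits under these assumptions, and a column split evenly between two residues scores one bit less. In a logo, each residue's letter height would be its weighted frequency times the column's information.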


Subject(s)
Amino Acid Sequence , Computer Graphics , Databases, Factual , Software Design , Transposases , Algorithms , Animals , Caenorhabditis elegans/enzymology , DNA-Binding Proteins/chemistry , Information Storage and Retrieval , Molecular Sequence Data , Nucleotidyltransferases/chemistry , Proteins/chemistry , Sequence Alignment
9.
J Biotechnol ; 35(2-3): 257-72, 1994 Jun 30.
Article in English | MEDLINE | ID: mdl-7765062

ABSTRACT

Nucleotide and amino acid sequences can be analyzed and compared by their oligomer compositions. Such methods are fundamentally different from comparison methods based on sequence alignment. They are analogous to the linguistic analysis of human texts. The methods have a wide range of sensitivity and can identify homologous as well as functionally and taxonomically related sequences. Significant sequence dissimilarity can also be identified enabling detection of foreign DNA sequences in genomes, genetic libraries and databases. The simplicity and speed of linguistic methods make them very suitable for database searching and maintenance and as a preliminary step to more specific and time-consuming analysis methods.


Subject(s)
Linguistics , Sequence Alignment/methods , Amino Acid Sequence , Animals , Base Sequence , Biotechnology , DNA, Fungal/genetics , DNA, Mitochondrial/genetics , DNA, Viral/genetics , Humans , Mice , Molecular Sequence Data , Promoter Regions, Genetic , Rats , Retroviridae/genetics , Saccharomyces cerevisiae/genetics , Sequence Homology, Amino Acid
10.
J Biomol Struct Dyn ; 7(6): 1251-68, 1990 Jun.
Article in English | MEDLINE | ID: mdl-2363847

ABSTRACT

The frequencies of "words", oligonucleotides within nucleotide sequences, reflect the genetic information contained in the sequence "texts". Nucleotide sequences are characteristically represented by their contrast word vocabularies. Comparison of the sequences by correlating their contrast vocabularies is shown to reflect well the relatedness (unrelatedness) between the sequences. A single value, the linguistic similarity between the sequences, is suggested as a measure of sequence relatedness. Sequences as short as 1000 bases can be characterized and quantitatively related to other sequences by this technique. The linguistic sequence similarity value is used for analysis of taxonomically and functionally diverse nucleotide sequences. The similarity value is shown to be very sensitive to the relatedness of the source species, thus providing a convenient tool for taxonomic classification of species by their sequence vocabularies. Functionally diverse sequences appear distinct by their linguistic similarity values. This can be a basis for a quick screening technique for functional characterization of the sequences and for mapping functionally distinct regions in long sequences.
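
The idea of comparing sequences by their word vocabularies, as described in the abstract above, can be sketched as follows. This is a simplified illustration, not the published method: it correlates plain k-mer frequencies rather than the contrast values the abstract refers to, and the function names and the default word length `k=3` are assumptions.

```python
from collections import Counter
from itertools import product
from math import sqrt

def kmer_frequencies(seq, k=3):
    """Relative frequencies of all 4**k DNA words of length k, in a fixed order.

    Words containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)
    if total == 0:
        raise ValueError("sequence contains no valid words of length k")
    return [counts[w] / total for w in words]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)

def word_similarity(seq_a, seq_b, k=3):
    """Single similarity value: correlation of the two k-mer vocabularies."""
    return pearson(kmer_frequencies(seq_a, k), kmer_frequencies(seq_b, k))
```

Because no alignment is needed, the cost of one comparison grows linearly with sequence length, which fits the abstract's suggestion of using the measure as a quick screen.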


Subject(s)
Base Sequence , Animals , Humans , Phylogeny , RNA, Ribosomal/genetics , Sequence Homology, Nucleic Acid
11.
ISME J ; 8(3): 625-635, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24088628

ABSTRACT

Bdellovibrio and like organisms (BALO) are obligate predators of Gram-negative bacteria, belonging to the α- and δ-proteobacteria. BALO prey using either a periplasmic or an epibiotic predatory strategy, but the genetic background underlying these phenotypes is not known. Here we compare the epibiotic Bdellovibrio exovorus and Micavibrio aeruginosavorus to the periplasmic B. bacteriovorus and Bacteriovorax marinus. Electron microscopy showed that M. aeruginosavorus, but not B. exovorus, can attach to prey cells in a non-polar manner through its longitudinal side. Both these predators were resistant to a surprisingly high number of antibiotic compounds, possibly via 26 and 19 antibiotic-resistance genes, respectively, most of them encoding efflux pumps. Comparative genomic analysis of all the BALOs revealed that epibiotic predators have a much smaller genome (ca. 2.5 Mbp) than the periplasmic predators (ca. 3.5 Mbp). Additionally, periplasmic predators have, on average, 888 more proteins, at least 60% more peptidases, and one more rRNA operon. Fifteen and 219 protein families were specific to the epibiotic and the periplasmic predators, respectively, the latter clearly forming the core of the periplasmic 'predatome', which is upregulated during the growth phase. Metabolic deficiencies of epibiotic genomes include the synthesis of inosine, riboflavin, vitamin B6 and the siderophore aerobactin. The phylogeny of the epibiotic predators suggests that they evolved by convergent evolution, with M. aeruginosavorus originating from a non-predatory ancestor while B. exovorus evolved from periplasmic predators by gene loss.


Subject(s)
Bdellovibrio/classification , Bdellovibrio/physiology , Biological Evolution , Gram-Negative Bacteria/physiology , Bacterial Proteins/analysis , Bdellovibrio/cytology , Bdellovibrio/genetics , Genome, Bacterial , Phylogeny , Proteome/analysis
17.
Nucleic Acids Res ; 24(19): 3836-45, 1996 Oct 01.
Article in English | MEDLINE | ID: mdl-8871566

ABSTRACT

A general searching method for comparing multiple sequence alignments was developed to detect sequence relationships between conserved protein regions. Multiple alignments are treated as sequences of amino acid distributions and aligned by comparing pairs of such distributions. Four different comparison measures were tested and the Pearson correlation coefficient chosen. The method is sensitive, detecting weak sequence relationships between protein families. Relationships are detected beyond the range of conventional sequence database searches, illustrating the potential usefulness of the method. The previously undetected relation between flavoprotein subunits of two oxidoreductase families points to the potential active site in one of the families. The similarity between the bacterial RecA, DnaA and Rad51 protein families reveals a region in DnaA and Rad51 proteins likely to bind and unstack single-stranded DNA. Helix-turn-helix DNA binding domains from diverse proteins are readily detected and shown to be similar to each other. Glycosylasparaginase and gamma-glutamyltransferase enzymes are found to be similar in their proteolytic cleavage sites. The method has been fully implemented on the World Wide Web at URL: http://blocks.fhcrc.org/blocks-bin/LAMAvsearch.
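
The core idea in the abstract above, treating each alignment column as an amino acid distribution and scoring column pairs by Pearson correlation, can be sketched as below. This is an illustrative simplification and not the published implementation: it uses raw column frequencies, scores an ungapped overlap by the mean column correlation, and the names `column_profiles`, `best_ungapped_score`, and the `min_overlap` parameter are assumptions.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_profiles(block):
    """Per-column amino acid frequency vectors for an ungapped block."""
    width = len(block[0])
    if any(len(seq) != width for seq in block):
        raise ValueError("all segments in a block must have the same length")
    profiles = []
    for i in range(width):
        column = [seq[i] for seq in block]
        profiles.append([column.count(aa) / len(column) for aa in AMINO_ACIDS])
    return profiles

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def best_ungapped_score(block_a, block_b, min_overlap=3):
    """Slide block_b along block_a without gaps.

    Returns (mean column correlation, offset) for the best overlap of at
    least min_overlap columns, or None if no such overlap exists.
    Column i of block_a is paired with column i - offset of block_b.
    """
    pa, pb = column_profiles(block_a), column_profiles(block_b)
    best = None
    for offset in range(-(len(pb) - min_overlap), len(pa) - min_overlap + 1):
        pairs = [(pa[i], pb[i - offset])
                 for i in range(len(pa)) if 0 <= i - offset < len(pb)]
        if len(pairs) < min_overlap:
            continue
        score = sum(pearson(x, y) for x, y in pairs) / len(pairs)
        if best is None or score > best[0]:
            best = (score, offset)
    return best
```

Comparing a block with itself yields a mean correlation of 1 at offset 0 in this sketch. A practical search would also need a significance estimate for the scores, which is omitted here.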


Subject(s)
Conserved Sequence , Databases, Factual , Proteins/chemistry , Sequence Homology, Amino Acid , Amino Acid Sequence , Aspartylglucosylaminase/chemistry , Aspartylglucosylaminase/metabolism , Catalysis , DNA Nucleotidyltransferases/chemistry , DNA Nucleotidyltransferases/metabolism , DNA, Single-Stranded/metabolism , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/metabolism , Flavin-Adenine Dinucleotide/metabolism , Helix-Turn-Helix Motifs , Hydrolysis , Proteins/metabolism , Transposases , gamma-Glutamyltransferase/chemistry , gamma-Glutamyltransferase/metabolism
18.
Mol Gen Genet ; 254(6): 689-95, 1997 May.
Article in English | MEDLINE | ID: mdl-9202385

ABSTRACT

A helix-turn-helix (HTH) DNA-binding motif is identified in transposase sequences in Tc1, mariner and pogo DNA transposons. The findings are supported by results of various sequence analysis methods. Tc1 transposases are also predicted to contain another DNA-binding region. These findings are in accord with experimental evidence obtained from Tc1A, Tc3A and pogo transposases. The pogo family transposases, but not the pogo-type transcription factors, contain the HTH motif, suggesting that HTH structures are essential for Tc1/mariner/pogo transposition. Analysis of multiple sequence alignments enabled the identification of the HTH motif in distantly related protein sequences.


Subject(s)
DNA Nucleotidyltransferases/metabolism , DNA Transposable Elements , DNA-Binding Proteins/metabolism , Nucleotidyltransferases/metabolism , Amino Acid Sequence , Binding Sites , Conserved Sequence , DNA/metabolism , DNA Nucleotidyltransferases/chemistry , Databases, Factual , Molecular Sequence Data , Protein Conformation , Sequence Homology, Amino Acid , Transposases
19.
J Biol Chem ; 274(40): 28751-61, 1999 Oct 01.
Article in English | MEDLINE | ID: mdl-10497247

ABSTRACT

We describe here the isolation and characterization of a B-type DNA polymerase (PolB) from the archaeon Methanobacterium thermoautotrophicum DeltaH. Uniquely, the catalytic domains of M. thermoautotrophicum PolB are encoded from two different genes, a feature that has not been observed as yet in other polymerases. The two genes were cloned, and the proteins were overexpressed in Escherichia coli and purified individually and as a complex. We demonstrate that both polypeptides are needed to form the active polymerase. Similar to other polymerases constituting the B-type family, PolB possesses both polymerase and 3'-5' exonuclease activities. We found that a homolog of replication protein A from M. thermoautotrophicum inhibits the PolB activity. The inhibition of DNA synthesis by replication protein A from M. thermoautotrophicum can be relieved by the addition of M. thermoautotrophicum homologs of replication factor C and proliferating cell nuclear antigen. The possible roles of PolB in M. thermoautotrophicum replication are discussed.


Subject(s)
DNA Polymerase beta/isolation & purification , Methanobacterium/enzymology , Amino Acid Sequence , Base Sequence , Chromatography, Affinity , Cloning, Molecular , DNA Polymerase beta/genetics , DNA Polymerase beta/metabolism , DNA Replication , DNA-Binding Proteins/metabolism , Electrophoresis, Polyacrylamide Gel , Methanobacterium/genetics , Molecular Sequence Data , Oligonucleotides , Recombinant Proteins/genetics , Recombinant Proteins/isolation & purification , Recombinant Proteins/metabolism
20.
Funct Integr Genomics ; 1(4): 250-5, 2001 Mar.
Article in English | MEDLINE | ID: mdl-11793244

ABSTRACT

The availability of the complete sequence of the Drosophila genome and the assignment of putative reading frames provides an opportunity to search for new members in families of proteins generating signaling cascades. The six major pathways that dictate patterning were examined: receptor tyrosine kinases, transforming growth factor beta (TGF beta), Wnt, Toll, Hedgehog and Notch. Several new components were identified for the first four pathways, including ligands, receptors, cytoplasmic components and transcription factors. Most notable is the identification of a vascular endothelial growth factor (VEGF) receptor tyrosine kinase, two insulin/insulin growth factor I (IGF I) receptors without cytoplasmic protein kinase domains, and a family of proteins similar to Rhomboid (a protein involved in cleavage of TGF alpha-like ligands). A new TGF beta family ligand, two new Wnts and a Frizzled receptor were also identified. Finally, for the Toll pathway, two new potential Spatzle-like ligands and two new receptors were identified.


Subject(s)
Drosophila melanogaster/genetics , Genome , Signal Transduction , Animals , Humans