Abstract
PREX (http://www.csb.wfu.edu/prex/) is a database that currently contains 3516 peroxiredoxin (Prx or PRDX) protein sequences, each unambiguously classified into one of six distinct subfamilies. Peroxiredoxins are a diverse and ubiquitous family of highly expressed, cysteine-dependent peroxidases that are important for antioxidant defense and for the regulation of cell signaling pathways in eukaryotes. Subfamily members were identified using the Deacon Active Site Profiler (DASP) bioinformatics tool to focus on functionally relevant sequence fragments surrounding key residues required for protein activity. Searches of this database can be conducted by protein annotation, accession number, PDB ID, organism name, or protein sequence. Output includes the subfamily to which each classified Prx belongs, accession and GI numbers, genus and species, and the functional site signature used for classification. The query sequence is also presented aligned with a select group of Prxs for manual evaluation and interpretation by the user. A synopsis of the characteristics of members of each subfamily is also provided, along with pertinent references.
Subjects
Databases, Protein; Peroxiredoxins/classification; Peroxiredoxins/chemistry; User-Computer Interface
Abstract
Peroxiredoxins (Prxs) are a widespread and highly expressed family of cysteine-based peroxidases that react very rapidly with H2O2, organic peroxides, and peroxynitrite. Correct subfamily classification has been problematic because Prx subfamilies are frequently not correlated with phylogenetic distribution and diverge in their preferred reductant, oligomerization state, and tendency toward overoxidation. We have developed a method that uses the Deacon Active Site Profiler (DASP) tool to extract functional-site profiles from structurally characterized proteins to computationally define subfamilies and to identify new Prx subfamily members from GenBank(nr). For the 58 literature-defined Prx test proteins, 57 were correctly assigned, and none were assigned to the incorrect subfamily. The >3500 putative Prx sequences identified were then used to analyze residue conservation in the active site of each Prx subfamily. Our results indicate that the existence and location of the resolving cysteine vary in some subfamilies (e.g., Prx5) to a greater degree than previously appreciated and that interactions at the A interface (common to Prx5, Tpx, and higher order AhpC/Prx1 structures) are important for stabilization of the correct active-site geometry. Interestingly, this method also allows us to further divide the AhpC/Prx1 into four groups that are correlated with functional characteristics. The DASP method provides more accurate subfamily classification than PSI-BLAST for members of the Prx family and can now readily be applied to other large protein families.
Subjects
Peroxiredoxins/chemistry; Amino Acid Sequence; Catalytic Domain; Entropy; Models, Molecular; Molecular Sequence Data; Phylogeny; Sequence Homology, Amino Acid
Abstract
Cysteine sulfenic acid (Cys-SOH), a reversible modification, is a catalytic intermediate at enzyme active sites, a sensor for oxidative stress, a regulator of some transcription factors, and a redox-signaling intermediate. This post-translational modification is not random: specific features near the cysteine control its reactivity. To identify features responsible for the propensity of cysteines to be modified to sulfenic acid, a list of 47 proteins (containing 49 known Cys-SOH sites) was compiled. Modifiable cysteines are found in proteins from most structural classes and many functional classes, but show no propensity for any one type of protein secondary structure. To identify features affecting cysteine reactivity, these sites were analyzed using both functional site profiling and electrostatic analysis. Overall, the solvent exposure of modifiable cysteines is not different from that of the average cysteine. The combined sequence, structure, and electrostatic approaches reveal mechanistic determinants not obvious from overall sequence comparison, including: (1) the pKa values of some modifiable cysteines are affected by backbone features only; (2) charged residues are underrepresented in the structure near modifiable sites; (3) threonine and other polar residues can exert a large influence on the cysteine pKa; and (4) hydrogen bonding patterns are suggested to be important. This compilation of Cys-SOH modification sites and their features provides a quantitative assessment of previous observations and a basis for further analysis and prediction of these sites. Agreement with known experimental data indicates the utility of this combined approach for identifying mechanistic determinants at protein functional sites.
Subjects
Cysteine/analogs & derivatives; Cysteine/chemistry; Proteins/chemistry; Sulfenic Acids/chemistry; Amino Acid Sequence; Binding Sites; Catalysis; Cysteine/metabolism; Hydrogen Bonding; Proteins/metabolism; Sequence Alignment; Static Electricity; Sulfenic Acids/metabolism
Abstract
Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow identification of mechanistic determinants: amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two-Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by the Structure-Function Linkage Database, SFLD) self-identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each of which is evaluated to determine whether it self-identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well-curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP-identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F-measure and performance analysis on the enolase search results and comparison to GEMMA and SCI-PHY demonstrate that TuLIP avoids the over-division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results.
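The divisive loop described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the TuLIP implementation: `divisive_cluster`, `toy_split`, `toy_self_identifies`, and the integer `TOY_FEATURE` values are hypothetical stand-ins. In TuLIP, the self-identification test is a DASP2 search and the splitting uses active site profiles; here a toy one-dimensional feature takes their place so that the control flow can be run.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Cluster = Tuple[str, ...]


def divisive_cluster(
    members: Sequence[str],
    split: Callable[[Cluster], List[Cluster]],
    self_identifies: Callable[[Cluster], bool],
) -> Tuple[List[Cluster], List[str]]:
    """Divide until every member is in an accepted group or is a singlet.

    `split` and `self_identifies` are caller-supplied (hypothetical here);
    the abstract's self-identification test is a DASP2 search.
    """
    groups: List[Cluster] = []
    singlets: List[str] = []
    pending: List[Cluster] = [tuple(members)] if members else []
    while pending:
        cluster = pending.pop()
        if len(cluster) == 1:
            singlets.append(cluster[0])      # cannot be divided further
        elif self_identifies(cluster):
            groups.append(cluster)           # accepted as a functionally relevant group
        else:
            parts = split(cluster)
            if len(parts) < 2:
                raise ValueError("split must return at least two sub-clusters")
            pending.extend(parts)            # keep dividing the rejected cluster
    return groups, singlets


# Toy stand-ins (assumptions for illustration only, not real structures).
TOY_FEATURE: Dict[str, int] = {"s1": 10, "s2": 12, "s3": 15, "s4": 50, "s5": 54, "s6": 90}


def toy_self_identifies(cluster: Cluster) -> bool:
    """Accept a cluster whose feature values span at most 10 units."""
    values = [TOY_FEATURE[m] for m in cluster]
    return max(values) - min(values) <= 10


def toy_split(cluster: Cluster) -> List[Cluster]:
    """Cut the cluster in two at the largest gap between sorted feature values."""
    ordered = sorted(cluster, key=TOY_FEATURE.__getitem__)
    gaps = [TOY_FEATURE[b] - TOY_FEATURE[a] for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    return [tuple(ordered[:cut]), tuple(ordered[cut:])]


groups, singlets = divisive_cluster(list(TOY_FEATURE), toy_split, toy_self_identifies)
```

In this sketch the full input set is itself tested before the first split; whether TuLIP does the same is not stated in the abstract.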
Subjects
Databases, Protein; Glutathione Transferase/chemistry; Glutathione Transferase/genetics; Phosphopyruvate Hydratase/chemistry; Phosphopyruvate Hydratase/genetics; Sequence Analysis, Protein/methods
Abstract
Eglin c is a small protease inhibitor whose structural and thermodynamic properties have been well studied. Previous thermodynamic measurements on mutants at solvent-accessible positions in the protein's helix have shown the unexpected result that the data could be best fit by including residue- and position-specific parameters in the model. To explore the origins of this surprising result, long molecular dynamics simulations in explicit solvent have been performed. These simulations indicate specific long-range interactions between the solvent-exposed residues in the eglin c alpha-helix and binding loop, an unexpected observation for such a small protein. The residues involved in the interaction are on opposite sides of the protein, about 25 Å apart. Simulations of alanine substitutions at the solvent-exposed helix positions, arginine 22, glutamic acid 23, threonine 26, and leucine 27, show both small and large perturbations of eglin c dynamics. Two mutations exhibit large impacts on the long-range helix-loop interactions. Previous stability measurements (Yi et al., Biochemistry 2003;42:7594-7603) had indicated that an alanine substitution at position 27 was less stabilizing than at other solvent-exposed positions in the helix. The L27A mutation effects observed in these simulations suggest that the position-dependent loss of stability measured in wet bench experiments is derived from changes in dynamics that involve long-range interactions; thus, these simulations support the hypothesis that solvent-exposed positions in helices are not always equivalent.
Subjects
Proteins/chemistry; Computational Biology; Computer Simulation; Models, Molecular; Mutation/genetics; Nuclear Magnetic Resonance, Biomolecular; Protein Structure, Secondary; Protein Structure, Tertiary; Proteins/genetics
Abstract
A major pharmaceutical problem is designing diverse and selective lead compounds. The human genome sequence provides opportunities to discover compounds that are protein selective if we can develop methods to identify specificity determinants from sequence alone. We have analyzed sequence and structural diversity of sheep COX-1 and mouse COX-2 proteins by Active Site Profiling (ASP). Eleven residues that should serve as specificity determinants between COX-1 and COX-2 were identified; however, the literature suggests that only one has been utilized in structure-based discovery. ASP was used to create a position-specific scoring matrix, which was used to identify possible cross-reacting proteins from the human sequences. This method proved selective for cyclooxygenases, comparing well with results using BLAST. The methods identify a probable misannotation of a cyclooxygenase: BLAST gives it high sequence similarity scores, but ASP shows that it does not contain the residues necessary for cyclooxygenase function. ASP analysis of human COX proteins suggests that some specificity determinants that distinguish COX-1 and COX-2 proteins are similar between sheep COX-1/mouse COX-2 and human COX-1/COX-2; however, residue identities at those positions are not necessarily conserved. Our results lay groundwork for development of family-specific pattern recognition methods to selectively match compounds with proteins.
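To make the position-specific scoring matrix step concrete, the following is a minimal sketch of a generic log-odds PSSM built from aligned, equal-length site fragments. It is not the ASP scoring scheme: the uniform background, the pseudocount of 1, the function names `build_pssm` and `score_fragment`, and the four-residue `TOY_FRAGMENTS` are all assumptions made for illustration, and the fragments are invented rather than taken from any cyclooxygenase.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(fragments: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Build a log2-odds matrix, one column per aligned position.

    Assumes a uniform background frequency (1/20) and additive pseudocounts.
    """
    if not fragments:
        raise ValueError("at least one fragment is required")
    length = len(fragments[0])
    if any(len(f) != length for f in fragments):
        raise ValueError("fragments must be aligned to the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(fragments) + pseudocount * len(AMINO_ACIDS)
    pssm: List[Dict[str, float]] = []
    for pos in range(length):
        counts = Counter(f[pos] for f in fragments)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pssm.append(column)
    return pssm


def score_fragment(pssm: List[Dict[str, float]], fragment: str) -> float:
    """Sum the per-position log-odds for a candidate fragment.

    Only the 20 standard residues are supported in this sketch.
    """
    if len(fragment) != len(pssm):
        raise ValueError("fragment length must match the matrix")
    return sum(column[aa] for column, aa in zip(pssm, fragment))


# Invented fragments, for illustration only.
TOY_FRAGMENTS = ["HYSR", "HYSK", "HFSR"]
toy_pssm = build_pssm(TOY_FRAGMENTS)
matching = score_fragment(toy_pssm, "HYSR")    # shares the conserved residues
unrelated = score_fragment(toy_pssm, "GAVL")   # shares none of them
```

A fragment that carries the conserved residues scores higher than one that does not, which is the property a search for cross-reacting proteins would rely on.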
Subjects
Cyclooxygenase 1/chemistry; Cyclooxygenase 1/genetics; Cyclooxygenase 2/chemistry; Cyclooxygenase 2/genetics; Membrane Proteins/chemistry; Membrane Proteins/genetics; Amino Acid Sequence; Animals; Binding Sites/physiology; Cyclooxygenase 1/metabolism; Cyclooxygenase 2/metabolism; Cyclooxygenase 2 Inhibitors/chemistry; Cyclooxygenase 2 Inhibitors/metabolism; Cyclooxygenase Inhibitors/chemistry; Cyclooxygenase Inhibitors/metabolism; Humans; Membrane Proteins/metabolism; Mice; Molecular Sequence Data; Sequence Homology, Amino Acid; Sheep, Domestic
Abstract
The development of accurate protein function annotation methods has emerged as a major unsolved biological problem. Protein similarity networks, one approach to function annotation via annotation transfer, group proteins into similarity-based clusters. An underlying assumption is that the edge metric used to identify such clusters correlates with functional information. In this contribution, this assumption is evaluated by observing topologies in similarity networks using three different edge metrics: sequence (BLAST), structure (TM-Align), and active site similarity (active site profiling, implemented in DASP). Network topologies for four well-studied protein superfamilies (enolase, peroxiredoxin (Prx), glutathione transferase (GST), and crotonase) were compared with curated functional hierarchies and structure. As expected, network topology differs depending on the edge metric; comparison of topologies provides valuable information on structure/function relationships. Subnetworks based on active site similarity correlate with known functional hierarchies at a single edge threshold more often than sequence- or structure-based networks. Sequence- and structure-based networks are useful for identifying sequence and domain similarities and differences; therefore, it is important to consider the clustering goal before deciding on an appropriate edge metric. Further, conserved active site residues identified in enolase and GST active site subnetworks correspond with published functionally important residues. Extension of this analysis yields predictions of functionally determinant residues for GST subgroups. These results support the hypothesis that active site similarity-based networks reveal clusters that share functional details and lay the foundation for capturing functionally relevant hierarchies using an approach that is automatable and can deliver greater precision in function annotation than current similarity-based methods.
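The idea of reading clusters off a similarity network at a single edge threshold can be sketched as follows. This is an illustrative example only: `clusters_at_threshold` and the `TOY_SCORES` values are hypothetical, and the sketch assumes a metric where a higher score means greater similarity (a metric such as a BLAST E-value would need the comparison reversed).

```python
from typing import Dict, List, Sequence, Set, Tuple


def clusters_at_threshold(
    nodes: Sequence[str],
    scores: Dict[Tuple[str, str], float],
    threshold: float,
) -> List[Set[str]]:
    """Return the connected components of the network at one edge threshold.

    An edge is kept when its similarity score is >= threshold.
    """
    adjacency: Dict[str, Set[str]] = {n: set() for n in nodes}
    for (a, b), similarity in scores.items():
        if similarity >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen: Set[str] = set()
    clusters: List[Set[str]] = []
    for start in nodes:
        if start in seen:
            continue
        component: Set[str] = set()
        stack = [start]
        while stack:                      # depth-first walk over kept edges
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Invented pairwise scores, for illustration only.
TOY_NODES = ["p1", "p2", "p3", "p4"]
TOY_SCORES = {("p1", "p2"): 0.9, ("p2", "p3"): 0.4, ("p3", "p4"): 0.8}

strict = clusters_at_threshold(TOY_NODES, TOY_SCORES, 0.5)
loose = clusters_at_threshold(TOY_NODES, TOY_SCORES, 0.3)
```

With the stricter threshold the weak p2-p3 edge is dropped and two clusters remain; with the looser one all four nodes merge. Comparing such partitions against a curated hierarchy is the kind of evaluation the abstract describes.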