Results 1 - 20 of 32
1.
Nucleic Acids Res ; 45(7): e46, 2017 04 20.
Article in English | MEDLINE | ID: mdl-27923999

ABSTRACT

Iterative similarity search programs, like psiblast, jackhmmer, and psisearch, are much more sensitive than pairwise similarity search methods like blast and ssearch because they build a position specific scoring model (a PSSM or HMM) that captures the pattern of sequence conservation characteristic of a protein family. But models are subject to contamination; once an unrelated sequence has been added to the model, homologs of the unrelated sequence will also produce high scores, and the model can diverge from the original protein family. Examination of alignment errors during psiblast PSSM contamination suggested a simple strategy for dramatically reducing PSSM contamination. psiblast PSSMs are built from the query-based multiple sequence alignment (MSA) implied by the pairwise alignments between the query model (PSSM, HMM) and the subject sequences in the library. When the original query sequence residues are inserted into gapped positions in the aligned subject sequence, the resulting PSSM rarely produces alignment over-extensions or alignments to unrelated sequences. This simple step, which tends to anchor the PSSM to the original query sequence and slightly increase target percent identity, can reduce the frequency of false-positive alignments more than 20-fold compared with psiblast and jackhmmer, with little loss in search sensitivity.
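
The query-seeding step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `seed_subject_with_query` and the example sequences are hypothetical, and the sketch only covers the aligned region of a single pairwise alignment, assuming both rows are given as equal-length gapped strings.

```python
def seed_subject_with_query(query_aln: str, subject_aln: str, gap: str = "-") -> str:
    """Build the subject's row of a query-based MSA, filling subject gaps
    with the query residue from the same column (hypothetical helper)."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have equal length")
    row = []
    for q, s in zip(query_aln, subject_aln):
        if q == gap:
            # Subject insertion relative to the query: a query-based MSA
            # has no column for it, so the residue is dropped.
            continue
        # Where the subject has a gap, insert the original query residue.
        row.append(q if s == gap else s)
    return "".join(row)


# Invented example alignment (not taken from the paper).
query_aln = "MKT-AYIAKQR"
subject_aln = "MRTG--IGKQR"
print(seed_subject_with_query(query_aln, subject_aln))  # MRTAYIGKQR
```

The returned row has one residue per query position, so gapped columns contribute query residues rather than nothing when the rows are stacked to build a scoring model; this is the sense in which the abstract says the model is anchored to the original query.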


Subject(s)
Sequence Alignment/methods , Sequence Analysis, Protein/methods , Protein Domains , Software
2.
Nucleic Acids Res ; 42(Database issue): D485-9, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24319146

ABSTRACT

Understanding which are the catalytic residues in an enzyme and what function they perform is crucial to many biology studies, particularly those leading to new therapeutics and enzyme design. The original version of the Catalytic Site Atlas (CSA) (http://www.ebi.ac.uk/thornton-srv/databases/CSA) published in 2004, which catalogs the residues involved in enzyme catalysis in experimentally determined protein structures, had only 177 curated entries and employed a simplistic approach to expanding these annotations to homologous enzyme structures. Here we present a new version of the CSA (CSA 2.0), which greatly expands the number of both curated (968) and automatically annotated catalytic sites in enzyme structures, utilizing a new method for annotation transfer. The curated entries are used, along with the variation in residue type from the sequence comparison, to generate 3D templates of the catalytic sites, which in turn can be used to find catalytic sites in new structures. To ease the transfer of CSA annotations to other resources a new ontology has been developed: the Enzyme Mechanism Ontology, which has permitted the transfer of annotations to Mechanism, Annotation and Classification in Enzymes (MACiE) and UniProt Knowledge Base (UniProtKB) resources. The CSA database schema has been re-designed and both the CSA data and search capabilities are presented in a new modern web interface.


Subject(s)
Catalytic Domain , Databases, Protein , Enzymes/chemistry , Biological Ontologies , Internet , Sequence Analysis, Protein
3.
Bioinformatics ; 29(23): 3007-13, 2013 Dec 01.
Article in English | MEDLINE | ID: mdl-23995390

ABSTRACT

MOTIVATION: Sequence similarity searches performed with BLAST, SSEARCH and FASTA achieve high sensitivity by using scoring matrices (e.g. BLOSUM62) that target low identity (<33%) alignments. Although such scoring matrices can effectively identify distant homologs, they can also produce local alignments that extend beyond the homologous regions. RESULTS: We measured local alignment start/stop boundary accuracy using a set of queries where the correct alignment boundaries were known, and found that 7% of BLASTP and 8% of SSEARCH alignment boundaries were overextended. Overextended alignments include non-homologous sequences; they occur most frequently between sequences that are more closely related (>33% identity). Adjusting the scoring matrix to reflect the identity of the homologous sequence can correct higher identity overextended alignment boundaries. In addition, the scoring matrix that produced a correct alignment could be reliably predicted based on the sequence identity seen in the original BLOSUM62 alignment. Realigning with the predicted scoring matrix corrected 37% of all overextended alignments, resulting in more correct alignments than using BLOSUM62 alone.


Subject(s)
Computational Biology/methods , Position-Specific Scoring Matrices , Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Algorithms , Amino Acid Sequence , Databases, Protein , Molecular Sequence Data , Sequence Homology, Amino Acid
4.
Nucleic Acids Res ; 40(Database issue): D783-9, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22058127

ABSTRACT

MACiE (which stands for Mechanism, Annotation and Classification in Enzymes) is a database of enzyme reaction mechanisms, and can be accessed from http://www.ebi.ac.uk/thornton-srv/databases/MACiE/. This article presents the release of Version 3 of MACiE, which not only extends the dataset to 335 entries, covering 182 of the EC sub-subclasses with a crystal structure available (~90%), but also incorporates greater chemical and structural detail. This version of MACiE represents a shift in emphasis for new entries, from non-homologous representatives covering EC reaction space to enzymes with mechanisms of interest to our users and collaborators with a view to exploring the chemical diversity of life. We present new tools for exploring the data in MACiE and comparing entries as well as new analyses of the data and new searches, many of which can now be accessed via dedicated Perl scripts.


Subject(s)
Databases, Protein , Enzymes/chemistry , Biocatalysis , Biochemical Phenomena , Catalytic Domain , Coenzymes/chemistry , Enzymes/classification , Internet , Molecular Sequence Annotation
5.
NAR Genom Bioinform ; 6(2): lqae066, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38863529

ABSTRACT

The 'canonical' protein sets distributed by UniProt are widely used for similarity searching, and functional and structural annotation. For many investigators, canonical sequences are the only version of a protein examined. However, higher eukaryotes often encode multiple isoforms of a protein from a single gene. For unreviewed (UniProtKB/TrEMBL) protein sequences, the longest sequence in a Gene-Centric group is chosen as canonical. This choice can create inconsistencies, selecting >95% identical orthologs with dramatically different lengths, which is biologically unlikely. We describe the ortho2tree pipeline, which examines Reference Proteome canonical and isoform sequences from sets of orthologous proteins, builds multiple alignments, constructs gap-distance trees, and identifies low-cost clades of isoforms with similar lengths. After examining 140 000 proteins from eight mammals in UniProtKB release 2022_05, ortho2tree proposed 7804 canonical changes for release 2023_01, while confirming 53 434 canonicals. Gap distributions for isoforms selected by ortho2tree are similar to those in bacterial and yeast alignments, organisms unaffected by isoform selection, suggesting ortho2tree canonicals more accurately reflect genuine biological variation. 82% of ortho2tree proposed-changes agreed with MANE; for confirmed canonicals, 92% agreed with MANE. Ortho2tree can improve canonical assignment among orthologous sequences that are >60% identical, a group that includes vertebrates and higher plants.

6.
Bioinformatics ; 28(12): 1650-1, 2012 Jun 15.
Article in English | MEDLINE | ID: mdl-22539666

ABSTRACT

UNLABELLED: Iterative similarity searches with PSI-BLAST position-specific score matrices (PSSMs) find many more homologs than single searches, but PSSMs can be contaminated when homologous alignments are extended into unrelated protein domains, an error known as homologous over-extension (HOE). PSI-Search combines an optimal Smith-Waterman local alignment sequence search, using SSEARCH, with the PSI-BLAST profile construction strategy. An optional sequence boundary-masking procedure, which prevents alignments from being extended after they are initially included, can reduce HOE errors in the PSSM profile. Preventing HOE improves selectivity for both PSI-BLAST and PSI-Search, but PSI-Search has ~4-fold better selectivity than PSI-BLAST and similar sensitivity at 50% and 60% family coverage. PSI-Search also produces 2- to 4-fold fewer false-positives than JackHMMER, but is ~5% less sensitive. AVAILABILITY AND IMPLEMENTATION: PSI-Search is available from the authors as a standalone implementation written in Perl for Linux-compatible platforms. It is also available through a web interface (www.ebi.ac.uk/Tools/sss/psisearch) and SOAP and REST Web Services (www.ebi.ac.uk/Tools/webservices).


Subject(s)
Amino Acid Motifs , Sequence Alignment/methods , Software , Computational Biology/methods , Databases, Protein , Internet , Programming Languages
7.
Nucleic Acids Res ; 38(7): 2177-89, 2010 Apr.
Article in English | MEDLINE | ID: mdl-20064877

ABSTRACT

We have characterized a novel type of PSI-BLAST error, homologous over-extension (HOE), using embedded Pfam domain queries on searches against a reference library containing Pfam-annotated UniProt sequences and random synthetic sequences. PSI-BLAST makes two types of errors: alignments to non-homologous regions and HOE alignments that begin in a homologous region, but extend beyond the homology into neighboring sequence regions. When the neighboring sequence region contains a non-homologous domain, PSI-BLAST can incorporate the unrelated sequence into its position specific scoring matrix, which then finds non-homologous proteins with significant expectation values. HOE accounts for the largest fraction of the initial false positive (FP) errors, and the largest fraction of FPs at iteration 5. In searches against complete protein sequences, 5-9% of alignments at iteration 5 are non-homologous. HOE frequently begins in a partial protein domain; when partial domains are removed from the library, HOE errors decrease from 16 to 3% of weighted coverage (hard queries; 35-5% for sampled queries) and no-error searches increase from 2 to 58% weighted coverage (hard; 16-78% sampled). When HOE is reduced by not extending previously found sequences, PSI-BLAST specificity improves 4-8-fold, with little loss in sensitivity.


Subject(s)
Sequence Alignment/methods , Sequence Homology, Amino Acid , Phylogeny , Position-Specific Scoring Matrices , Protein Structure, Tertiary , Proteins/chemistry , Proteins/classification , Proteins/genetics
8.
Front Genet ; 13: 984513, 2022.
Article in English | MEDLINE | ID: mdl-36482890

ABSTRACT

The integration of mitochondrial genome fragments into the nuclear genome is well documented, and the transfer of these mitochondrial nuclear pseudogenes (numts) is thought to be an ongoing evolutionary process. With the increasing number of eukaryotic genomes available, genome-wide distributions of numts are often surveyed. However, inconsistencies in genome quality can reduce the accuracy of numt estimates, and methods used for identification can be complicated by the diverse sizes and ages of numts. Numts have been previously characterized in rodent genomes and it was postulated that they might be more prevalent in a group of voles with rapidly evolving karyotypes. Here, we examine 37 rodent genomes, and an additional 26 vertebrate genomes, while also considering numt detection methods. We identify numts using DNA:DNA and protein:translated-DNA similarity searches and compare numt distributions among rodent and vertebrate taxa to assess whether some groups are more susceptible to transfer. A combination of protein sequence comparisons (protein:translated-DNA) and BLASTN genomic DNA searches detects 50% more numts than genomic DNA:DNA searches alone. In addition, higher-quality RefSeq genomes produce lower estimates of numts than GenBank genomes, suggesting that lower quality genome assemblies can overestimate numt abundance. Phylogenetic analysis shows that mitochondrial transfers are not associated with karyotypic diversity among rodents. Surprisingly, we did not find a strong correlation between numt counts and genome size. Estimates using DNA:DNA analyses can underestimate the amount of mitochondrial DNA that is transferred to the nucleus.

9.
Bioinformatics ; 26(3): 310-8, 2010 Feb 01.
Article in English | MEDLINE | ID: mdl-19948773

ABSTRACT

MOTIVATION: To test whether protein folding constraints and secondary structure sequence preferences significantly reduce the space of amino acid words in proteins, we compared the frequencies of four- and five-amino acid word clumps (independent words) in proteins to the frequencies predicted by four random sequence models. RESULTS: While the human proteome has many overrepresented word clumps, these words come from large protein families with biased compositions (e.g. Zn-fingers). In contrast, in a non-redundant sample of Pfam-AB, only 1% of four-amino acid word clumps (4.7% of 5mer words) are 2-fold overrepresented compared with our simplest random model [MC(0)], and 0.1% (4mers) to 0.5% (5mers) are 2-fold overrepresented compared with a window-shuffled random model. Using a false discovery rate q-value analysis, the number of exceptional four- or five-letter words in real proteins is similar to the number found when comparing words from one random model to another. Consensus overrepresented words are not enriched in conserved regions of proteins, but four-letter words are enriched 1.18- to 1.56-fold in alpha-helical secondary structures (but not beta-strands). Five-residue consensus exceptional words are enriched for alpha-helix 1.43- to 1.61-fold. Protein word preferences in regular secondary structure do not appear to significantly restrict the use of sequence words in unrelated proteins, although the consensus exceptional words have a secondary structure bias for alpha-helix. Globally, words in protein sequences appear to be under very few constraints; for the most part, they appear to be random. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
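
The comparison against the simplest random model in this abstract can be sketched in Python. This is a simplified illustration, not the authors' method: it assumes MC(0) means a zero-order model in which residues are drawn independently from the overall composition, it counts overlapping words rather than the "word clumps" used in the paper, and the function name `word_enrichment` is hypothetical.

```python
from collections import Counter
from math import prod


def word_enrichment(seqs, k=4):
    """Return observed/expected ratios for every k-letter word in `seqs`,
    with expectations taken from a composition-only (zero-order) model."""
    # Residue frequencies pooled over all sequences.
    composition = Counter("".join(seqs))
    total = sum(composition.values())
    freq = {aa: n / total for aa, n in composition.items()}

    # Overlapping k-mer counts (a simplification of the paper's word clumps).
    observed = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    n_windows = sum(max(len(s) - k + 1, 0) for s in seqs)

    # Expected count of a word = number of windows * product of residue frequencies.
    return {
        word: count / (n_windows * prod(freq[aa] for aa in word))
        for word, count in observed.items()
    }


if __name__ == "__main__":
    ratios = word_enrichment(["ACAC"], k=2)
    for word, ratio in sorted(ratios.items()):
        print(word, round(ratio, 3))
```

A word would count as 2-fold overrepresented in the abstract's sense when its ratio is at least 2; any statistical assessment (such as the q-value analysis mentioned above) is outside the scope of this sketch.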


Subject(s)
Amino Acid Sequence , Proteins/chemistry , Sequence Analysis, Protein/methods , Databases, Protein , Protein Folding , Protein Structure, Secondary
10.
Bioinformatics ; 26(18): 2361-2, 2010 Sep 15.
Article in English | MEDLINE | ID: mdl-20693322

ABSTRACT

UNLABELLED: RefProtDom provides a set of divergent query domains, originally selected from Pfam, and full-length proteins containing their homologous domains, with diverse architectures, for evaluating pair-wise and iterative sequence similarity searches. Pfam homology and domain boundary annotations in the target library were supplemented using local and semi-global searches, PSI-BLAST searches, and SCOP and CATH classifications. AVAILABILITY: RefProtDom is available from http://faculty.virginia.edu/wrpearson/fasta/PUBS/gonzalez09a.


Subject(s)
Databases, Protein , Protein Structure, Tertiary , Proteins , Software
11.
BMC Bioinformatics ; 11: 146, 2010 Mar 22.
Article in English | MEDLINE | ID: mdl-20307279

ABSTRACT

BACKGROUND: While the pairwise alignments produced by sequence similarity searches are a powerful tool for identifying homologous proteins (proteins that share a common ancestor and a similar structure), pairwise sequence alignments often fail to represent accurately the structural alignments inferred from three-dimensional coordinates. Since sequence alignment algorithms produce optimal alignments, the best structural alignments must reflect suboptimal sequence alignment scores. Thus, we have examined a range of suboptimal sequence alignments and a range of scoring parameters to understand better which sequence alignments are likely to be more structurally accurate. RESULTS: We compared near-optimal protein sequence alignments produced by the Zuker algorithm and a set of probabilistic alignments produced by the probA program with structural alignments produced by four different structure alignment algorithms. There is significant overlap between the solution spaces of structural alignments and both the near-optimal sequence alignments produced by commonly used scoring parameters for sequences that share significant sequence similarity (E-values < 10^-5) and the ensemble of probA alignments. We constructed a logistic regression model incorporating three input variables derived from sets of near-optimal alignments: robustness, edge frequency, and maximum bits-per-position. A ROC analysis shows that this model more accurately classifies amino acid pairs (edges in the alignment path graph) according to the likelihood of appearance in structural alignments than the robustness score alone. We investigated various trimming protocols for removing incorrect edges from the optimal sequence alignment; the most effective protocol is to remove matches from the semi-global optimal alignment that are outside the boundaries of the local alignment, although trimming according to the model-generated probabilities achieves a similar level of improvement.
The model can also be used to generate novel alignments by using the probabilities in lieu of a scoring matrix. These alignments are typically better than the optimal sequence alignment, and include novel correct structural edges. We find that the probA alignments sample a larger variety of alignments than the Zuker set, which more frequently results in alignments that are closer to the structural alignments, but that using the probA alignments as input to the regression model does not increase performance. CONCLUSIONS: The pool of suboptimal pairwise protein sequence alignments substantially overlaps structure-based alignments for pairs with statistically significant similarity, and a regression model based on information contained in this alignment pool improves the accuracy of pairwise alignments with respect to structure-based alignments.


Subject(s)
Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein
12.
Nature ; 431(7012): 1107-12, 2004 Oct 28.
Article in English | MEDLINE | ID: mdl-15510150

ABSTRACT

Cryptosporidium species cause acute gastroenteritis and diarrhoea worldwide. They are members of the Apicomplexa--protozoan pathogens that invade host cells by using a specialized apical complex and are usually transmitted by an invertebrate vector or intermediate host. In contrast to other Apicomplexans, Cryptosporidium is transmitted by ingestion of oocysts and completes its life cycle in a single host. No therapy is available, and control focuses on eliminating oocysts in water supplies. Two species, C. hominis and C. parvum, which differ in host range, genotype and pathogenicity, are most relevant to humans. C. hominis is restricted to humans, whereas C. parvum also infects other mammals. Here we describe the eight-chromosome approximately 9.2-million-base genome of C. hominis. The complement of C. hominis protein-coding genes shows a striking concordance with the requirements imposed by the environmental niches the parasite inhabits. Energy metabolism is largely from glycolysis. Both aerobic and anaerobic metabolisms are available, the former requiring an alternative electron transport system in a simplified mitochondrion. Biosynthesis capabilities are limited, explaining an extensive array of transporters. Evidence of an apicoplast is absent, but genes associated with apical complex organelles are present. C. hominis and C. parvum exhibit very similar gene complements, and phenotypic differences between these parasites must be due to subtle sequence divergence.


Subject(s)
Cryptosporidium/genetics , Genome, Protozoan , Animals , Chromosomes/genetics , Cryptosporidium/classification , Cryptosporidium/enzymology , Cryptosporidium/metabolism , Cryptosporidium parvum/genetics , Enzymes/genetics , Evolution, Molecular , Genes, Protozoan/genetics , Genomics , Humans , Phenotype , Protozoan Proteins/genetics
13.
PLoS One ; 14(11): e0224288, 2019.
Article in English | MEDLINE | ID: mdl-31738797

ABSTRACT

Bioinformatics, a discipline that combines aspects of biology, statistics, mathematics, and computer science, is becoming increasingly important for biological research. However, bioinformatics instruction is not yet generally integrated into undergraduate life sciences curricula. To understand why, we studied how bioinformatics is being included in biology education in the US by conducting a nationwide survey of faculty at two- and four-year institutions. The survey asked several open-ended questions that probed barriers to integration, the answers to which were analyzed using a mixed-methods approach. The barrier most frequently reported by the 1,260 respondents was lack of faculty expertise/training, but other deterrents (lack of student interest, overly full curricula, and lack of student preparation) were also common. Interestingly, the barriers faculty face depended strongly on whether they are members of an underrepresented group and on the Carnegie Classification of their home institution. We were surprised to discover that the cohort of faculty who were awarded their terminal degree most recently reported the most preparation in bioinformatics but teach it at the lowest rate.


Subject(s)
Biology/education , Computational Biology/education , Curriculum , Faculty/statistics & numerical data , Female , Humans , Male , Motivation , Students/psychology , Surveys and Questionnaires/statistics & numerical data , United States
14.
Curr Opin Struct Biol ; 15(3): 254-60, 2005 Jun.
Article in English | MEDLINE | ID: mdl-15919194

ABSTRACT

Modern sequence alignment algorithms are used routinely to identify homologous proteins, proteins that share a common ancestor. Homologous proteins always share similar structures and often have similar functions. Over the past 20 years, sequence comparison has become both more sensitive, largely because of profile-based methods, and more reliable, because of more accurate statistical estimates. As sequence and structure databases become larger, and comparison methods become more powerful, reliable statistical estimates will become even more important for distinguishing similarities that are due to homology from those that are due to analogy (convergence). The newest sequence alignment methods are more sensitive than older methods, but more accurate statistical estimates are needed for their full power to be realized.


Subject(s)
Algorithms , Databases, Protein , Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence , Molecular Sequence Data , Proteins/analysis , Proteins/classification , Sequence Alignment/trends , Sequence Analysis, Protein/trends , Sequence Homology, Amino Acid , Structure-Activity Relationship
15.
Curr Protoc Bioinformatics ; 59: 9.4.1-9.4.22, 2017 09 13.
Article in English | MEDLINE | ID: mdl-28902397

ABSTRACT

Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.


Subject(s)
Computational Biology/methods , Databases, Protein , Sequence Analysis, Protein/methods , Software , Escherichia coli/genetics , Evolution, Molecular , Proteins/chemistry , Proteins/genetics , Sequence Alignment , Sequence Homology, Amino Acid
16.
Nucleic Acids Res ; 31(13): 3859-61, 2003 Jul 01.
Article in English | MEDLINE | ID: mdl-12824437

ABSTRACT

The CRP (Cleavage of Radiolabeled Phosphoproteins) program guides the design and interpretation of experiments to identify protein phosphorylation sites by Edman sequencing of unseparated peptides. Traditionally, phosphorylation sites are determined by cleaving the phosphoprotein and separating the peptides for Edman 32P-phosphate release sequencing. CRP analysis of a phosphoprotein's sequence accelerates this process by omitting the separation step: given a protein sequence of interest, the CRP program performs an in silico proteolytic cleavage of the sequence and reports the predicted Edman cycles in which radioactivity would be observed if a given serine, threonine or tyrosine were phosphorylated. Experimentally observed cycles containing 32P can be compared with CRP predictions to confirm candidate sites and/or explore the ability of additional cleavage experiments to resolve remaining ambiguities. To reduce ambiguity, the phosphorylated residue (P-Tyr, P-Ser or P-Thr) can be determined experimentally, and CRP will ignore sites with alternative residues. CRP also provides simple predictions of likely phosphorylation sites using known kinase recognition motifs. The CRP interface is available at http://fasta.bioch.virginia.edu/crp.
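
The in silico cleavage and cycle prediction described in this abstract can be sketched in Python. This is an illustration under stated assumptions, not the CRP program: the trypsin-like rule (cut after every K or R, with no exceptions) is assumed for the example, and the function names `cleave` and `edman_cycles` and the test sequence are hypothetical.

```python
def cleave(seq, sites="KR"):
    """Cut `seq` after each residue in `sites` (assumed trypsin-like rule).
    Returns (start_offset, peptide) pairs, offsets being 0-based."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa in sites:
            peptides.append((start, seq[start:i + 1]))
            start = i + 1
    if start < len(seq):
        peptides.append((start, seq[start:]))
    return peptides


def edman_cycles(seq, sites="KR", targets="STY"):
    """Map each Edman cycle number to the Ser/Thr/Tyr residues (labelled with
    1-based protein positions) that would be released in that cycle when all
    peptides are sequenced together without separation."""
    cycles = {}
    for offset, peptide in cleave(seq, sites):
        for j, aa in enumerate(peptide):
            if aa in targets:
                # Residue j of a peptide is released in Edman cycle j + 1.
                cycles.setdefault(j + 1, []).append(f"{aa}{offset + j + 1}")
    return dict(sorted(cycles.items()))


# Invented sequence for illustration only.
print(edman_cycles("MSKTAYRGS"))  # {1: ['T4'], 2: ['S2', 'S9'], 3: ['Y6']}
```

In this toy case, radioactivity in cycle 2 would be ambiguous between S2 and S9; passing a different `sites` string stands in for the additional cleavage experiments the abstract mentions, and restricting `targets` to one residue type mirrors determining the phosphorylated residue experimentally.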


Subject(s)
Phosphoproteins/chemistry , Phosphoproteins/metabolism , Sequence Analysis, Protein/methods , Software , Humans , Internet , Phosphorylation , Phosphoserine/analysis , Phosphothreonine/analysis , Phosphotyrosine/analysis , Radioactive Tracers
17.
Curr Protoc Bioinformatics ; 53: 3.9.1-3.9.25, 2016 Mar 24.
Article in English | MEDLINE | ID: mdl-27010337

ABSTRACT

The FASTA programs provide a comprehensive set of rapid similarity searching tools (fasta36, fastx36, tfastx36, fasty36, tfasty36), similar to those provided by the BLAST package, as well as programs for slower, optimal, local, and global similarity searches (ssearch36, ggsearch36), and for searching with short peptides and oligonucleotides (fasts36, fastm36). The FASTA programs use an empirical strategy for estimating statistical significance that accommodates a range of similarity scoring matrices and gap penalties, improving alignment boundary accuracy and search sensitivity. The FASTA programs can produce "BLAST-like" alignment and tabular output, for ease of integration into existing analysis pipelines, and can search small, representative databases, and then report results for a larger set of sequences, using links from the smaller dataset. The FASTA programs work with a wide variety of database formats, including mySQL and postgreSQL databases. The programs also provide a strategy for integrating domain and active site annotations into alignments and highlighting the mutational state of functionally critical residues. These protocols describe how to use the FASTA programs to characterize protein and DNA sequences, using protein:protein, protein:DNA, and DNA:DNA comparisons.


Subject(s)
Nucleotides/chemistry , Proteins/chemistry , Sequence Alignment , Sequence Homology, Amino Acid , Sequence Homology, Nucleic Acid , Databases, Nucleic Acid , Databases, Protein
18.
Methods Enzymol ; 401: 186-204, 2005.
Article in English | MEDLINE | ID: mdl-16399387

ABSTRACT

The best known glutathione transferase family, with its class-alpha, -mu, -pi, -omega, -sigma, -theta, and -zeta subdivisions, is only one of four, or perhaps five, ancient protein families that conjugate glutathione or use a glutathione intermediate: (1) the cytoplasmic family, (2) the mitochondrial (kappa) family, (3) the microsomal (MAPEG) family, which may actually be two separate families, and (4) the fosfomycin/glyoxalase family. Although the cytoplasmic family is perhaps the most diverse, all four of these families have homologs in both prokaryotes and eukaryotes; it is striking that at least three, and perhaps as many as five, different protein folds capable of binding and positioning glutathione for a nucleophilic attack emerged more than 2 billion years ago. This chapter presents phylogenies for the four (or five) glutathione transferase families, focusing on the statistical evidence for homology (and non-homology).


Subject(s)
Evolution, Molecular , Glutathione Transferase/classification , Isoenzymes/classification , Animals , Cytoplasm/enzymology , Glutathione Transferase/chemistry , Glutathione Transferase/genetics , Humans , Isoenzymes/chemistry , Isoenzymes/genetics , Microsomes/enzymology , Multigene Family , Phylogeny , Thioredoxins/classification , Thioredoxins/genetics
19.
Methods Enzymol ; 401: 1-8, 2005.
Article in English | MEDLINE | ID: mdl-16399376

ABSTRACT

The nomenclature for human soluble glutathione transferases (GSTs) is extended to include new members of the GST superfamily that have been discovered, sequenced, and shown to be expressed. The GST nomenclature is based on primary structure similarities and the division of GSTs into classes of more closely related sequences. The classes are designated by the names of the Greek letters: Alpha, Mu, Pi, etc., abbreviated in Roman capitals: A, M, P, and so on. (The Greek characters should not be used.) Class members are distinguished by Arabic numerals and the native dimeric protein structures are named according to their subunit composition (e.g., GST A1-2 is the enzyme composed of subunits 1 and 2 in the Alpha class). Soluble GSTs from other mammalian species can be classified in the same manner as the human enzymes, and this chapter presents the application of the nomenclature to the rat and mouse GSTs.
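
The naming rule in this abstract can be made concrete with a small Python parser. It is only a sketch: the function name `parse_gst` and the accepted pattern (one capital class letter followed by two subunit numbers joined by a hyphen) are assumptions made for illustration, and the lookup table contains only the three class letters the abstract spells out.

```python
import re

# Only the class letters named explicitly in the abstract; others are left as-is.
CLASS_NAMES = {"A": "Alpha", "M": "Mu", "P": "Pi"}


def parse_gst(name):
    """Split a dimer name such as 'GST A1-2' into its class and subunits
    (hypothetical helper; the accepted pattern is an assumption)."""
    match = re.fullmatch(r"GST\s*([A-Z])(\d+)-(\d+)", name.strip())
    if match is None:
        raise ValueError(f"not a recognised GST dimer name: {name!r}")
    letter, first, second = match.groups()
    return {
        "class": CLASS_NAMES.get(letter, letter),
        "subunits": (int(first), int(second)),
    }


print(parse_gst("GST A1-2"))  # {'class': 'Alpha', 'subunits': (1, 2)}
```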


Subject(s)
Glutathione Transferase/classification , Isoenzymes/classification , Terminology as Topic , Animals , Humans , Molecular Sequence Data
20.
Curr Protoc Bioinformatics ; 51: 4.12.1-4.12.8, 2015 Sep 03.
Article in English | MEDLINE | ID: mdl-26334923

ABSTRACT

The characterization of new genomes based on their protein sets has been revolutionized by new sequencing technologies, but biologists seeking to exploit new sequence information are often frustrated by the challenges associated with accurately assigning biological functions to newly identified proteins. Here, we highlight some of the challenges in functional inference from sequence similarity. Investigators can improve the accuracy of function prediction by (1) being conservative about the evolutionary distance to a protein of known function; (2) considering the ambiguous meaning of "functional similarity," and (3) being aware of the limitations of annotations in functional databases. Protein function prediction does not offer "one-size-fits-all" solutions. Prediction strategies work better when the idiosyncrasies of function and functional annotation are better understood.


Subject(s)
Databases, Protein , Proteins/chemistry , Proteins/metabolism , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence , Data Mining/methods , Molecular Sequence Data , Structure-Activity Relationship