Search | BVS Regional Portal

Comprehensive reanalysis of transcription factor knockout expression data in Saccharomyces cerevisiae reveals many new targets.

Reimand, Jüri; Vaquerizas, Juan M; Todd, Annabel E; Vilo, Jaak; Luscombe, Nicholas M.

Nucleic Acids Res ; 38(14): 4768-77, 2010 Aug.

Article in English | MEDLINE | ID: mdl-20385592

ABSTRACT

Transcription factor (TF) perturbation experiments give valuable insights into gene regulation. Genome-scale evidence from microarray measurements may be used to identify regulatory interactions between TFs and targets. Recently, Hu and colleagues published a comprehensive study covering 269 TF knockout mutants for the yeast Saccharomyces cerevisiae. However, the information that can be extracted from this valuable dataset is limited by the method employed to process the microarray data. Here, we present a reanalysis of the original data using improved statistical techniques freely available from the BioConductor project. We identify over 100,000 differentially expressed genes, nine times the total reported by Hu et al. We validate the biological significance of these genes by assessing their functions, the occurrence of upstream TF-binding sites, and the prevalence of protein-protein interactions. The reanalysed dataset outperforms the original across all measures, indicating that we have uncovered a vastly expanded list of relevant targets. In summary, this work presents a high-quality reanalysis that maximizes the information contained in the Hu et al. compendium. The dataset is available from ArrayExpress (accession: E-MTAB-109) and it will be invaluable to any scientist interested in the yeast transcriptional regulatory system.

Subjects

Gene Expression Profiling; Gene Expression Regulation, Fungal; Saccharomyces cerevisiae Proteins/metabolism; Saccharomyces cerevisiae/genetics; Transcription Factors/metabolism; Binding Sites; Data Interpretation, Statistical; Down-Regulation; Gene Knockout Techniques; Mutation; Oligonucleotide Array Sequence Analysis; Saccharomyces cerevisiae/metabolism; Saccharomyces cerevisiae Proteins/genetics; Transcription Factors/genetics

The CATH hierarchy revisited: structural divergence in domain superfamilies and the continuity of fold space.

Cuff, Alison; Redfern, Oliver C; Greene, Lesley; Sillitoe, Ian; Lewis, Tony; Dibley, Mark; Reid, Adam; Pearl, Frances; Dallman, Tim; Todd, Annabel; Garratt, Richard; Thornton, Janet; Orengo, Christine.

Structure ; 17(8): 1051-62, 2009 Aug 12.

Article in English | MEDLINE | ID: mdl-19679085

ABSTRACT

This paper explores the structural continuum in CATH and the extent to which superfamilies adopt distinct folds. Although most superfamilies are structurally conserved, in some of the most highly populated superfamilies (4% of all superfamilies) there is considerable structural divergence. While relatives share a similar fold in the evolutionarily conserved core, diverse elaborations to this core can result in significant differences in the global structures. When similar protocols are applied to examine the extent to which structural overlaps occur between different fold groups, this effect appears to be confined to just a few architectures and is largely due to small, recurring super-secondary motifs (e.g., αβ-motifs, α-hairpins). Although 24% of superfamilies overlap with superfamilies having different folds, only 14% of nonredundant structures in CATH are involved in overlaps. Nevertheless, the existence of these overlaps suggests that, in some regions of structure space, the fold universe should be seen as more continuous.

Subjects

Databases, Protein; Protein Structure, Tertiary; Proteins/chemistry; Computational Biology/methods; Models, Molecular; Protein Folding; Protein Structure, Secondary; Proteins/classification

Progress of structural genomics initiatives: an analysis of solved target structures.

Todd, Annabel E; Marsden, Russell L; Thornton, Janet M; Orengo, Christine A.

J Mol Biol ; 348(5): 1235-60, 2005 May 20.

Article in English | MEDLINE | ID: mdl-15854658

ABSTRACT

The explosion in gene sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with the long-term hope of significantly impacting on the life sciences, biotechnology and drug discovery. Here, we present a comprehensive analysis of solved SG targets to assess progress of these initiatives. Eleven consortia have contributed 316 non-redundant entries and 323 protein chains to the Protein Data Bank (PDB), and 459 and 393 domains to the CATH and SCOP structure classifications, respectively. The quality and size of these proteins are comparable to those solved in traditional structural biology and, despite huge scope for duplicated efforts, only 14% of targets have a close homologue (≥30% sequence identity) solved by another consortium. Analysis of CATH and SCOP revealed the significant contribution that structural genomics is making to the coverage of superfamilies and folds. A total of 67% of SG domains in CATH are unique, lacking an already characterised close homologue in the PDB, whereas only 21% of non-SG domains are unique. For 29% of domains, structure determination revealed a remote evolutionary relationship not apparent from sequence, and 19% and 11% contributed new superfamilies and new folds, respectively. The secondary structure class, fold and superfamily distributions of this dataset reflect those of the genomes. The domains fall into 172 different folds and 259 superfamilies in CATH but the distribution is highly skewed. The most populous of these are those that recur most frequently in the genomes. Whilst 11% of superfamilies are bacteria-specific, most are common to all three superkingdoms of life and together the 316 PDB entries have provided new and reliable homology models for 9287 non-redundant gene sequences in 206 completely sequenced genomes.
From the perspective of this analysis, it appears that structural genomics is on track to be a success, and it is hoped that this work will inform future directions of the field.

Subjects

Computational Biology/trends; Databases, Protein; Genomics/methods; Protein Conformation; Animals; Genome; Humans; Sequence Analysis, Protein; Structural Homology, Protein

The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis.

Pearl, Frances; Todd, Annabel; Sillitoe, Ian; Dibley, Mark; Redfern, Oliver; Lewis, Tony; Bennett, Christopher; Marsden, Russell; Grant, Alistair; Lee, David; Akpor, Adrian; Maibaum, Michael; Harrison, Andrew; Dallman, Timothy; Reeves, Gabrielle; Diboun, Ilhem; Addou, Sarah; Lise, Stefano; Johnston, Caroline; Sillero, Antonio; Thornton, Janet; Orengo, Christine.

Nucleic Acids Res ; 33(Database issue): D247-51, 2005 Jan 01.

Article in English | MEDLINE | ID: mdl-15608188

ABSTRACT

The CATH database of protein domain structures (http://www.biochem.ucl.ac.uk/bsm/cath/) currently contains 43,229 domains classified into 1467 superfamilies and 5107 sequence families. Each structural family is expanded with sequence relatives from GenBank and completed genomes, using a variety of efficient sequence search protocols and reliable thresholds. This extended CATH protein family database contains 616,470 domain sequences classified into 23,876 sequence families. This results in the significant expansion of the CATH HMM model library to include models built from the CATH sequence relatives, giving a 10% increase in coverage for detecting remote homologues. An improved Dictionary of Homologous Superfamilies (DHS) (http://www.biochem.ucl.ac.uk/bsm/dhs/) containing specific sequence, structural and functional information for each superfamily in CATH considerably assists manual validation of homologues. Information on sequence relatives in CATH superfamilies, GenBank and completed genomes is presented in the CATH-associated DHS and Gene3D resources. Domain partnership information can be obtained from Gene3D (http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/). A new CATH server has been implemented (http://www.biochem.ucl.ac.uk/cgi-bin/cath/CathServer.pl) providing automatic classification of newly determined sequences and structures using a suite of rapid sequence and structure comparison methods. The statistical significance of matches is assessed and links are provided to the putative superfamily or fold group to which the query sequence or structure is assigned.

Subjects

Databases, Nucleic Acid; Databases, Protein; Genomics; Protein Structure, Tertiary; Proteins/classification; Sequence Analysis, Protein; Databases, Protein/statistics & numerical data; Internet; Proteins/genetics; Sequence Homology, Amino Acid; Systems Integration; User-Computer Interface

Target selection and determination of function in structural genomics.

Watson, James D; Todd, Annabel E; Bray, James; Laskowski, Roman A; Edwards, Aled; Joachimiak, Andrzej; Orengo, Christine A; Thornton, Janet M.

IUBMB Life ; 55(4-5): 249-55, 2003.

Article in English | MEDLINE | ID: mdl-12880206

ABSTRACT

The first crucial step in any structural genomics project is the selection and prioritization of target proteins for structure determination. There may be a number of selection criteria to be satisfied, including that the proteins have novel folds, that they be representatives of large families for which no structure is known, and so on. The better the selection at this stage, the greater the value of the structures obtained at the end of the experimental process. This value can be further enhanced once the protein structures have been solved if the functions of the given proteins can also be determined. Here we describe the methods used at either end of the experimental process: firstly, sensitive sequence comparison techniques for selecting a high-quality list of target proteins, and secondly the various computational methods that can be applied to the eventual 3D structures to determine the most likely biochemical function of the proteins in question.

Subjects

Genomics/methods; Proteins/chemistry; Proteins/physiology; Amino Acid Motifs; Animals; Binding Sites; Cluster Analysis; Computational Biology/methods; Databases, Protein; Phylogeny; Protein Structure, Tertiary; Proteins/classification; Sequence Homology, Amino Acid; Structure-Activity Relationship

Inferring protein function from structure.

Bartlett, Gail J; Todd, Annabel E; Thornton, Janet M.

Methods Biochem Anal ; 44: 387-407, 2003.

Article in English | MEDLINE | ID: mdl-12647396

Subjects

Proteins/chemistry; Proteins/physiology; Amino Acid Sequence; Computational Biology; Computer Simulation; Databases, Protein; Enzymes/chemistry; Enzymes/genetics; Enzymes/physiology; Ligands; Models, Molecular; Molecular Sequence Data; Molecular Structure; Protein Conformation; Protein Folding; Proteins/genetics; Proteomics; Sequence Alignment; Software

Sequence and structural differences between enzyme and nonenzyme homologs.

Todd, Annabel E; Orengo, Christine A; Thornton, Janet M.

Structure ; 10(10): 1435-51, 2002 Oct.

Article in English | MEDLINE | ID: mdl-12377129

ABSTRACT

To improve our understanding of the evolution of novel functions, we performed a sequence, structural, and functional analysis of homologous enzymes and nonenzymes of known three-dimensional structure. In most examples identified, the nonenzyme is derived from an ancestral catalytic precursor (as opposed to the reverse evolutionary scenario, nonenzyme to enzyme), and the active site pocket has been disrupted in some way, owing to the substitution of critical catalytic residues and/or steric interactions that impede substrate binding and catalysis. Pairwise sequence identity is typically insignificant, and almost one-half of the enzyme and nonenzyme pairs do not share any similarity in function. Heterooligomeric enzymes comprising homologous subunits in which one chain is catalytically inactive and enzyme polypeptides that contain internal catalytic and noncatalytic duplications of an ancient enzyme domain are also discussed.

Subjects

Enzymes/chemistry; Biopolymers; Catalytic Domain; Enzymes/genetics; Evolution, Molecular; Models, Molecular; Protein Conformation

Plasticity of enzyme active sites.

Todd, Annabel E; Orengo, Christine A; Thornton, Janet M.

Trends Biochem Sci ; 27(8): 419-26, 2002 Aug.

Article in English | MEDLINE | ID: mdl-12151227

ABSTRACT

The expectation is that any similarity in reaction chemistry shared by enzyme homologues is mediated by common functional groups conserved through evolution. However, detailed enzyme studies have revealed the flexibility of many active sites, in that different functional groups, unconserved with respect to position in the primary sequence, mediate the same mechanistic role. Nevertheless, the catalytic atoms might be spatially equivalent. More rarely, the active sites have completely different locations in the protein scaffold. This variability could result from: (1) the hopping of functional groups from one position to another to optimize catalysis; (2) the independent specialization of a low-activity primordial enzyme in different phylogenetic lineages; (3) functional convergence after evolutionary divergence; or (4) circular permutation events.

Subjects

Catalytic Domain; Enzymes/chemistry; Enzymes/metabolism; Amino Acid Sequence; Conserved Sequence; Enzymes/genetics; Evolution, Molecular; Protein Conformation; Structure-Activity Relationship

The CATH protein family database: a resource for structural and functional annotation of genomes.

Orengo, Christine A; Bray, James E; Buchan, Daniel W A; Harrison, Andrew; Lee, David; Pearl, Frances M G; Sillitoe, Ian; Todd, Annabel E; Thornton, Janet M.

Proteomics ; 2(1): 11-21, 2002 Jan.

Article in English | MEDLINE | ID: mdl-11788987

ABSTRACT

Over the last decade, there have been huge increases in the numbers of protein sequences and structures determined. In parallel, many methods have been developed for recognising similarities between these proteins, arising from their common evolutionary background, and for clustering such relatives into protein families. Here we review some of the protein family resources available to the biologist and describe how these can be used to provide structural and functional annotations for newly determined sequences. In particular we describe recent developments to the CATH domain database of protein structural families which have facilitated genome annotation and which have also revealed important caveats that must be considered when transferring functional data between homologous proteins.

Subjects

Databases, Protein; Genome; Protein Conformation