Mining the structural genomics pipeline: identification of protein properties that affect high-throughput experimental analysis.
Goh, Chern-Sing; Lan, Ning; Douglas, Shawn M; Wu, Baolin; Echols, Nathaniel; Smith, Andrew; Milburn, Duncan; Montelione, Gaetano T; Zhao, Hongyu; Gerstein, Mark.
Affiliation
  • Goh CS; Molecular Biophysics and Biochemistry, Yale University, 266 Whitney Ave, New Haven, CT 06520, USA.
J Mol Biol; 336(1): 115-30, 2004 Feb 06.
Article in English | MEDLINE | ID: mdl-14741208
ABSTRACT
Structural genomics projects represent major undertakings that will change our understanding of proteins. They generate unique datasets that, for the first time, present a standardized view of proteins in terms of their physical and chemical properties. By analyzing these datasets here, we are able to discover correlations between a protein's characteristics and its progress through each stage of the structural genomics pipeline, from cloning, expression, and purification to, ultimately, structural determination. First, we use tree-based analyses (decision trees and random forest algorithms) to discover the most significant protein features that influence a protein's amenability to high-throughput experimentation. Based on this, we identify potential bottlenecks in various stages of the structural genomics process through specialized "pipeline schematics". We find that the properties of a protein that are most significant are: (i) whether it is conserved across many organisms; (ii) the percentage composition of charged residues; (iii) the occurrence of hydrophobic patches; (iv) the number of binding partners it has; and (v) its length. Conversely, a number of other properties that might have been thought to be important, such as nuclear localization signals, are not significant. Thus, using our tree-based analyses, we are able to identify combinations of features that best differentiate the small group of proteins for which a structure has been determined from all the currently selected targets. This information may prove useful in optimizing high-throughput experimentation. Further information is available from http://mining.nesg.org/.
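As a rough illustration of the tree-based idea mentioned in the abstract, the sketch below ranks features by how much the best single threshold split on each one reduces Gini impurity, which is the kind of criterion a decision tree can use to pick a split. This is a simplification and not the authors' procedure: it does not build a full tree or a random forest, and the feature names (`length`, `pct_charged`, `n_homolog_organisms`), the values, and the labels are all invented for the example.

```python
# Toy illustration only: feature names, values, and labels are invented,
# not data or code from the paper.
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, feature):
    """Return (impurity_decrease, threshold) for the best binary split on one feature."""
    parent = gini(labels)
    n = len(rows)
    values = sorted({row[feature] for row in rows})
    best = (0.0, None)
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        if parent - child > best[0]:
            best = (parent - child, threshold)
    return best


def rank_features(rows, labels):
    """Rank features by the impurity decrease of their best single split."""
    scores = {feature: best_split(rows, labels, feature) for feature in rows[0]}
    return sorted(scores.items(), key=lambda item: item[1][0], reverse=True)


# Hypothetical targets; label 1 = structure determined, 0 = not determined.
targets = [
    {"length": 120, "pct_charged": 30, "n_homolog_organisms": 40},
    {"length": 300, "pct_charged": 28, "n_homolog_organisms": 35},
    {"length": 500, "pct_charged": 27, "n_homolog_organisms": 30},
    {"length": 200, "pct_charged": 15, "n_homolog_organisms": 5},
    {"length": 400, "pct_charged": 18, "n_homolog_organisms": 8},
    {"length": 700, "pct_charged": 29, "n_homolog_organisms": 3},
]
solved = [1, 1, 1, 0, 0, 0]

for name, (decrease, threshold) in rank_features(targets, solved):
    print(name, round(decrease, 3), threshold)
```

On this invented data the conservation-like feature separates the two groups perfectly and so ranks first. A random forest would instead average such impurity decreases over many trees grown on resampled data, which makes the ranking less sensitive to any single split.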
Subjects
Collections: 01-international | Database: MEDLINE | Main subject: Protein Conformation / Proteins / Genomics | Type of study: Diagnostic_studies / Prognostic_studies | Language: En | Journal: J Mol Biol | Publication year: 2004 | Document type: Article | Affiliation country: United States