ABSTRACT
Data management has emerged as one of the central issues in the high-throughput processes of taking a protein target sequence through to a protein sample. To simplify this task, and following extensive consultation with the international structural genomics community, we describe here a model of the data related to protein production. The model is suitable for both large and small facilities for use in tracking samples, experiments, and results through the many procedures involved. The model is described in Unified Modeling Language (UML). In addition, we present relational database schemas derived from the UML. These relational schemas are already in use in a number of data management projects.
Subjects
Genomics/methods, Protein Engineering/methods, Proteins/chemistry, Proteomics/methods, Algorithms, Amino Acid Sequence, Data Interpretation, Statistical, Databases, Protein, Internet, Models, Biological, Programming Languages, Research, Software, Software Design, Systems Biology, Unified Medical Language System
ABSTRACT
To use crystallography for the determination of the three-dimensional structures of proteins, protein crystals need to be grown. Automated imaging systems are increasingly being used to monitor these crystallization experiments. These present problems of accessibility to the data, repeatability of any image analysis performed, and the amount of storage required. Various image formats and techniques can be combined to provide effective solutions to high-volume processing problems such as these; however, the lack of widespread support for the most effective algorithms, such as JPEG 2000, which yielded a 64% improvement in file size over the bitmap, currently inhibits the immediate uptake of this approach.