Trends in structural coverage of the protein universe and the impact of the Protein Structure Initiative.

Khafizov, Kamil; Madrid-Aliste, Carlos; Almo, Steven C; Fiser, Andras

Affiliation

Khafizov K; Department of Systems and Computational Biology, Department of Biochemistry, New York Structural Genomics Research Consortium, Immune Function Network, and Department of Physiology and Biophysics, Albert Einstein College of Medicine, Bronx, NY 10461.

Proc Natl Acad Sci U S A ; 111(10): 3733-8, 2014 Mar 11.

Article in En | MEDLINE | ID: mdl-24567391

ABSTRACT

The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins--including proteins for which reliable homology models can be generated--on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ~50% of this new structural coverage, despite determining only ~10% of all new structures. Based on current trends, it is expected that ~55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long.
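
The figures quoted in the abstract can be cross-checked with a rough calculation. The sketch below assumes, purely for illustration, that residue-level coverage grows linearly and that removing structural genomics would remove its ~50% share of the growth rate. The abstract does not describe the authors' projection method, so this is a consistency check on the quoted numbers, not a reconstruction of their model. The function name `years_to_target` is hypothetical.

```python
# Back-of-the-envelope check of the abstract's figures.
# Assumption (not from the paper): coverage grows linearly over time.

def years_to_target(current_pct, target_pct, rate_pct_per_year):
    """Years needed to reach target coverage at a constant growth rate."""
    return (target_pct - current_pct) / rate_pct_per_year

# Figures quoted in the abstract.
start_pct, current_pct, span_years = 30.0, 40.0, 10.0
target_pct = 55.0
sg_share = 0.5  # approximate structural genomics share of new coverage

# Observed rate: 10 percentage points over 10 years.
rate_with_sg = (current_pct - start_pct) / span_years
# Assumed rate if the structural genomics contribution were absent.
rate_without_sg = rate_with_sg * (1.0 - sg_share)

years_with_sg = years_to_target(current_pct, target_pct, rate_with_sg)
years_without_sg = years_to_target(current_pct, target_pct, rate_without_sg)

print(f"with structural genomics:    {years_with_sg:.0f} y")
print(f"without structural genomics: {years_without_sg:.0f} y")
```

Under these assumptions the script prints 15 y and 30 y, which matches the abstract's statement that the goal would take approximately twice as long without structural genomics efforts.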

Subjects

Databases, Protein; Molecular Sequence Annotation/trends; Proteins/chemistry; Proteomics/trends; Computational Biology; Molecular Sequence Annotation/methods; Species Specificity

Full text: 1 Database: MEDLINE Main subject: Proteins / Databases, Protein / Proteomics / Molecular Sequence Annotation Language: En Publication year: 2014 Document type: Article
