Thousands of protein linear motif classes may still be undiscovered.

Bulavka, Denys; Aptekmann, Ariel A; Méndez, Nicolás A; Krick, Teresa; Sánchez, Ignacio E

Affiliation

Bulavka D; Laboratorio de Fisiología de Proteínas, Facultad de Ciencias Exactas y Naturales, Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Universidad de Buenos Aires, Buenos Aires, Argentina.
Aptekmann AA; Departamento de Matemática, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina.
Méndez NA; Laboratorio de Fisiología de Proteínas, Facultad de Ciencias Exactas y Naturales, Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Universidad de Buenos Aires, Buenos Aires, Argentina.
Krick T; Department of Marine and Coastal Sciences, Rutgers University, New Brunswick, NJ, United States of America.
Sánchez IE; Laboratorio de Fisiología de Proteínas, Facultad de Ciencias Exactas y Naturales, Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Universidad de Buenos Aires, Buenos Aires, Argentina.

PLoS One ; 16(5): e0248841, 2021.

Article in English | MEDLINE | ID: mdl-33939703

ABSTRACT

Linear motifs are short protein subsequences that mediate protein interactions. Hundreds of motif classes, comprising thousands of motif instances, are known. Our theory estimates how many motif classes remain undiscovered. As is commonly done, we describe motif classes as regular expressions specifying motif length and the allowed amino acids at each motif position. We measure motif specificity for a pair of motif classes by quantifying how many motif-discriminating positions prevent a protein subsequence from matching the two classes at once. We derive theorems for the maximal number of motif classes that can simultaneously maintain a certain number of motif-discriminating positions between all pairs of classes in the motif universe, for a given amino acid alphabet. We also calculate the fraction of all protein subsequences that would belong to a motif class if all potential motif classes came into existence. Naturally occurring pairs of motif classes most often present a single motif-discriminating position. This mild specificity maximizes the potential number of coexisting motif classes, the expansion of the motif universe due to amino acid modifications, and the fraction of amino acid sequences that code for a motif instance. As a result, thousands of linear motif classes may remain undiscovered.
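
The notion of a motif-discriminating position can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes two motif classes of equal length compared in the same register, each written as a list of allowed amino acid sets (one set per position), and it reads "motif-discriminating position" as a position where the two allowed sets share no amino acid. The names `discriminating_positions`, `matches`, `class_a` and `class_b` are hypothetical, and the two toy classes are invented for illustration.

```python
from typing import List, Set

# A motif class as a fixed-length regular expression:
# one set of allowed amino acids per position.
Motif = List[Set[str]]

ANY = set("ACDEFGHIKLMNPQRSTVWY")  # wildcard: any of the 20 standard amino acids


def discriminating_positions(a: Motif, b: Motif) -> int:
    """Count positions where the two classes allow no common amino acid.

    Assumption of this sketch: both classes have the same length and
    are compared position by position.
    """
    if len(a) != len(b):
        raise ValueError("this sketch only compares motif classes of equal length")
    return sum(1 for allowed_a, allowed_b in zip(a, b)
               if allowed_a.isdisjoint(allowed_b))


def matches(motif: Motif, peptide: str) -> bool:
    """Return True if the peptide matches the motif class at every position."""
    return len(peptide) == len(motif) and all(
        residue in allowed for residue, allowed in zip(peptide, motif)
    )


# Two invented toy classes, roughly [KR].L and [DE].[LIV] in regex notation.
class_a: Motif = [set("KR"), ANY, set("L")]
class_b: Motif = [set("DE"), ANY, set("LIV")]

n_discriminating = discriminating_positions(class_a, class_b)
```

In this toy case only the first position is discriminating, so a peptide such as `KAL` can match `class_a` but cannot match `class_b`; with zero discriminating positions, some peptide could match both classes at once.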

Subjects

Amino Acid Motifs; Sequence Analysis, Protein/methods; Humans; Sensitivity and Specificity; Sequence Analysis, Protein/standards

Full text: 1 Database: MEDLINE Main subject: Sequence Analysis, Protein / Amino Acid Motifs Type of study: Diagnostic_studies Limit: Humans Language: English Year of publication: 2021 Document type: Article