Results 1-3 of 3
1.
J Mol Graph Model ; 132: 108818, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39025021

ABSTRACT

Specific amino acid (AA) binding by aminoacyl-tRNA synthetases (aaRSs) is necessary for correct translation of the genetic code. Sequence and structure analyses have revealed the main specificity determinants and allowed aaRSs to be partitioned into two classes and several subclasses. However, the information contributed by each determinant has not been precisely quantified, and other, minor determinants may still be unidentified. The growth of genomic data and the development of machine learning classification methods allow us to revisit these questions. This work considered subclass IIb, formed by three enzymes: aspartyl-, asparaginyl-, and lysyl-tRNA synthetase (AspRS, AsnRS, and LysRS). Over 35,000 sequences from the Pfam database were used to train a machine learning model based on ensembles of decision trees. The model was trained to reproduce the existing classification of each sequence as AspRS, AsnRS, or LysRS, and to identify which sequence positions were most important for the classification. A few positions (5-8, depending on the AA substrate) sufficed for accurate classification. Most, but not all, of them were well-known specificity determinants. The machine learning models thus identified sets of mutations that distinguish the three subclass members, which might be targeted in engineering efforts to alter or swap AA specificities for biotechnology applications.


Subject(s)
Amino Acyl-tRNA Synthetases, Machine Learning, Substrate Specificity, Amino Acyl-tRNA Synthetases/chemistry, Amino Acyl-tRNA Synthetases/genetics, Amino Acyl-tRNA Synthetases/metabolism, Models, Molecular, Amino Acid Sequence
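The first abstract describes ranking alignment positions by how much they help classify a sequence as AspRS, AsnRS, or LysRS. The authors used ensembles of decision trees; the sketch below is not their method but swaps in a simpler, single-column criterion: the information gain of each alignment column with respect to the class label, which is one common split criterion for decision trees. The function names and the toy alignment are hypothetical, not Pfam data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Drop in label entropy after grouping sequences by the residue at one column."""
    n = len(labels)
    groups = {}
    for residue, label in zip(column, labels):
        groups.setdefault(residue, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_positions(alignment, labels, top_k=5):
    """Return the top_k (column index, gain) pairs, most informative first."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    gains = [
        (pos, information_gain([seq[pos] for seq in alignment], labels))
        for pos in range(length)
    ]
    # Stable sort: columns with equal gain keep their original order.
    gains.sort(key=lambda pair: pair[1], reverse=True)
    return gains[:top_k]


if __name__ == "__main__":
    # Hypothetical toy alignment, not real aaRS sequences.
    alignment = ["AKDE", "AKDQ", "ARNE", "ARNQ", "AEKE", "AEKQ"]
    labels = ["AspRS", "AspRS", "AsnRS", "AsnRS", "LysRS", "LysRS"]
    for pos, gain in rank_positions(alignment, labels, top_k=2):
        print(f"column {pos}: {gain:.3f} bits")
```

In this toy case, columns 1 and 2 each separate the three classes perfectly (log2(3), about 1.585 bits), while the invariant column 0 and the class-independent column 3 carry no information. Unlike a tree ensemble, this per-column score cannot capture positions that are informative only in combination, and gap characters are simply treated as one more residue symbol.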
2.
Protein Sci ; 33(4): e4918, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38501429

ABSTRACT

Protein kinases are key actors in signaling networks and important drug targets. They cycle between active and inactive conformations, distinguished by a few elements within the catalytic domain. One is the activation loop, whose conserved DFG motif can adopt DFG-in, DFG-out, and some rarer conformations. Annotation and classification of the structural kinome are important, as different conformations can be targeted by different inhibitors and activators. Valuable resources exist; however, large-scale applications will benefit from increased automation and interpretability of structural annotation. For this purpose, interpretable machine learning models based on ensembles of decision trees are described. To train them, a set of catalytic domain sequences and structures was collected, somewhat larger and more diverse than existing resources. The structures were clustered based on the DFG conformation, manually annotated, and then used as training input. Two main models were constructed: one distinguished active/inactive conformations and the other in/out/other DFG conformations. They initially considered 1692 structural variables spanning the whole catalytic domain, then identified ("learned") a small subset that sufficed for accurate classification. The first model correctly labeled all but 3 of 3289 structures as active or inactive, while the second assigned the correct DFG label to all but 17 of 8826 structures. The most potent classifying variables were all related to well-known structural elements in or near the activation loop, and their ranking gives insights into conformational preferences. The models were used to automatically annotate 3850 kinase structures recently predicted with the AlphaFold2 tool, showing that AlphaFold2 reproduced the active/inactive proportions but not the DFG-in proportions seen in the Protein Data Bank. We expect the models will be useful for understanding and engineering kinases.


Subject(s)
Protein Kinase Inhibitors, Protein Kinases, Models, Molecular, Protein Kinase Inhibitors/chemistry, Protein Conformation, Protein Kinases/chemistry, Machine Learning
3.
Nucleic Acids Res ; 51(3): 1229-1244, 2023 Feb 22.
Article in English | MEDLINE | ID: mdl-36651276

ABSTRACT

An increasing number of studies emphasize the role of non-coding variants in the development of hereditary diseases. However, interpreting such variants in clinical genetic testing remains a critical challenge because their pathogenicity mechanisms are poorly understood. It was previously shown that variants in 5'-untranslated regions (5'UTRs) can lead to hereditary diseases by disrupting upstream open reading frames (uORFs). Here, we manually annotated upstream translation initiation sites (TISs) in human disease-associated genes from the OMIM database and identified approximately 4.7 thousand TISs related to uORFs. We compared our TISs with those from previous studies and provided a list of 'high confidence' uORFs. Using a luciferase assay, we experimentally validated the translation of uORFs in the ETFDH, PAX9, MAST1, HTT, TTN, GLI2 and COL2A1 genes, as well as the existence of an N-terminal CDS extension in the ZIC2 gene. In addition, we created a tool to annotate the effects of genetic variants located in uORFs. We identified variants from the HGMD and ClinVar databases that disrupt uORFs and could thereby lead to Mendelian disorders. We also showed that the distribution of uORF-affecting variants differs between pathogenic and population variants. Finally, drawing on manually curated data, we developed a machine-learning algorithm that predicts TISs in other human genes.


Subject(s)
5' Untranslated Regions, Databases, Genetic, Disease, Open Reading Frames, Humans, Protein Biosynthesis, Disease/genetics
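The third abstract mentions a tool that annotates the effects of variants located in uORFs, but its interface is not described. The following is a minimal, hypothetical sketch of the underlying idea, assuming a uORF is an ATG-initiated frame whose start codon lies in the 5'UTR and that a variant's effect can be summarized by which uORFs are lost or gained. It considers only ATG starts, whereas the annotated TISs in the study may include other start codons. The function names `find_uorfs` and `snv_uorf_effect` and the toy transcript are illustrative assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(transcript, cds_start):
    """Find ATG-initiated upstream ORFs whose start codon lies in the 5'UTR.

    transcript: mRNA written with DNA letters, 5'UTR followed by the CDS.
    cds_start: 0-based index of the annotated CDS start codon.
    Returns (start, stop_end) pairs; stop_end is the index just past the
    first in-frame stop codon, or None if the frame runs off the transcript.
    """
    uorfs = []
    for start in range(cds_start):
        if transcript[start:start + 3] != "ATG":
            continue
        stop_end = None
        # Walk codon by codon in the uORF's own reading frame.
        for pos in range(start + 3, len(transcript) - 2, 3):
            if transcript[pos:pos + 3] in STOP_CODONS:
                stop_end = pos + 3
                break
        uorfs.append((start, stop_end))
    return uorfs


def snv_uorf_effect(transcript, cds_start, pos, alt):
    """Compare uORFs before and after a single-nucleotide variant at 0-based pos."""
    if len(alt) != 1:
        raise ValueError("alt must be a single nucleotide")
    mutated = transcript[:pos] + alt + transcript[pos + 1:]
    ref = set(find_uorfs(transcript, cds_start))
    var = set(find_uorfs(mutated, cds_start))
    by_start = lambda orf: orf[0]  # avoids comparing None stop positions
    return {"lost": sorted(ref - var, key=by_start),
            "gained": sorted(var - ref, key=by_start)}


if __name__ == "__main__":
    # Hypothetical toy transcript: 14-nt 5'UTR with one uORF, then a short CDS.
    utr = "GCCATGAAATAGCC"
    cds = "ATGGCCTAA"
    transcript = utr + cds
    print(find_uorfs(transcript, len(utr)))               # [(3, 12)]
    print(snv_uorf_effect(transcript, len(utr), 4, "C"))  # start codon ATG -> ACG
```

Here the variant at position 4 destroys the uORF start codon, so the uORF is reported as lost; a variant at position 10 (TAG to TCG) would instead remove the stop codon and report the same start with a different, here open-ended, frame. A real annotator would also need to handle indels, non-ATG starts, and coordinates on the genome rather than on a single transcript string.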