Search | VHL Regional Portal

AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms.

Bordin, Nicola; Sillitoe, Ian; Nallapareddy, Vamsi; Rauer, Clemens; Lam, Su Datt; Waman, Vaishali P; Sen, Neeladri; Heinzinger, Michael; Littmann, Maria; Kim, Stephanie; Velankar, Sameer; Steinegger, Martin; Rost, Burkhard; Orengo, Christine.

Commun Biol ; 6(1): 160, 2023 02 08.

Article in English | MEDLINE | ID: mdl-36755055

ABSTRACT

Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remainder cluster into 2367 putative novel superfamilies. Detailed manual analysis of 618 of these, each having at least one human relative, reveals extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67% and increase the number of unique 'global' folds by 36%, and will provide valuable insights into structure-function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.
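The headline figures in this abstract can be turned into approximate counts. The short sketch below does only that arithmetic; the variable names are illustrative, and because the abstract gives the model total as roughly 370,000, the derived counts are estimates rather than figures reported by the authors.

```python
# Approximate counts derived from the percentages quoted in the abstract.
# The total is stated as "~370,000", so every derived number is an estimate.
confident_models = 370_000          # approximate total of confident AF2 models
assigned_percent = 92               # share assigned to existing CATH superfamilies

# Integer arithmetic avoids floating-point rounding noise.
assigned_models = confident_models * assigned_percent // 100
remaining_models = confident_models - assigned_models

putative_novel_superfamilies = 2367
manually_analysed = 618             # those with at least one human relative

# Share of the putative novel superfamilies that received manual analysis.
analysed_percent = 100 * manually_analysed / putative_novel_superfamilies

print(f"assigned to existing superfamilies: ~{assigned_models:,}")
print(f"left for novel-superfamily clustering: ~{remaining_models:,}")
print(f"manually analysed: {analysed_percent:.1f}% of putative novel superfamilies")
```

Under these assumptions, roughly 340,000 models fall into existing superfamilies and roughly 30,000 remain to be clustered into the 2367 putative novel superfamilies.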

Subject(s)

Furylfuramide; Proteins; Humans; Databases, Protein; Proteins/chemistry

CATHe: detection of remote homologues for CATH superfamilies using embeddings from protein language models.

Nallapareddy, Vamsi; Bordin, Nicola; Sillitoe, Ian; Heinzinger, Michael; Littmann, Maria; Waman, Vaishali P; Sen, Neeladri; Rost, Burkhard; Orengo, Christine.

Bioinformatics ; 39(1)2023 01 01.

Article in English | MEDLINE | ID: mdl-36648327

ABSTRACT

MOTIVATION: CATH is a protein domain classification resource that exploits an automated workflow of structure and sequence comparison alongside expert manual curation to construct a hierarchical classification of evolutionary and structural relationships. The aim of this study was to develop algorithms for detecting remote homologues missed by state-of-the-art hidden Markov model (HMM)-based approaches. The method developed (CATHe) combines a neural network with sequence representations obtained from protein language models. It was assessed using a dataset of remote homologues having less than 20% sequence identity to any domain in the training set. RESULTS: The CATHe models trained on the 1773 largest and the 50 largest CATH superfamilies had accuracies of 85.6 ± 0.4% and 98.2 ± 0.3%, respectively. As a further test of the power of CATHe to detect more remote homologues missed by HMMs derived from CATH domains, we used a dataset consisting of protein domains that had annotations in Pfam, but not in CATH. By using highly reliable CATHe predictions (expected error rate <0.5%), we were able to provide CATH annotations for 4.62 million Pfam domains. For a subset of these domains from Homo sapiens, we structurally validated 90.86% of the predictions by comparing their corresponding AlphaFold2 structures with structures from the CATH superfamilies to which they were assigned. AVAILABILITY AND IMPLEMENTATION: The code for the developed models is available at https://github.com/vam-sin/CATHe, and the datasets developed in this study can be accessed at https://zenodo.org/record/6327572. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
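The evaluation criterion described above (less than 20% sequence identity to any training domain) can be illustrated with a minimal sketch. It is not code from the CATHe repository. It assumes pairwise alignments are already available as equal-length gapped strings, and that identity is counted over columns where neither sequence has a gap; the abstract does not state which identity definition the authors used. The function names and toy sequences are hypothetical.

```python
from typing import Iterable, Tuple


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: inputs are two rows of a pairwise alignment, with '-' as gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def is_remote_homologue(
    alignments: Iterable[Tuple[str, str]], threshold: float = 20.0
) -> bool:
    """True if the query stays below the identity threshold against every
    training domain. Each item is (aligned_query, aligned_training_domain)."""
    return all(percent_identity(q, t) < threshold for q, t in alignments)


# Toy alignments, invented for illustration only.
print(percent_identity("ACDE", "ACDF"))
print(is_remote_homologue([("ACDEFGHIKL", "AWWWWWWWWW")]))
```

A stricter or looser identity definition (for example, dividing by the shorter sequence length) would change which domains count as remote, so the denominator should match whatever convention the dataset uses.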

Subject(s)

Algorithms; Proteins; Humans; Sequence Homology, Amino Acid; Proteins/chemistry; Databases, Protein

DeepCys: Structure-based multiple cysteine function prediction method trained on deep neural network: Case study on domains of unknown functions belonging to COX2 domains.

Nallapareddy, Vamsi; Bogam, Shubham; Devarakonda, Himaja; Paliwal, Shubham; Bandyopadhyay, Debashree.

Proteins ; 89(7): 745-761, 2021 07.

Article in English | MEDLINE | ID: mdl-33580578

ABSTRACT

Cysteine (Cys) is the most reactive amino acid and participates in a wide range of biological functions. In-silico predictions complement experiments to meet the need for functional characterization. Algorithms that predict multiple Cys functions are scarce, in contrast to algorithms that predict a specific function. Here we present a deep neural network-based method for predicting multiple Cys functions, available as a web server (DeepCys) (https://deepcys.herokuapp.com/). The DeepCys model was trained and tested on two independent datasets curated from protein crystal structures. The method requires three inputs for a given Cys, namely the PDB identifier (ID), chain ID and residue ID. It outputs the probabilities of four cysteine functions, namely disulphide, metal-binding, thioether and sulphenylation, and predicts the most probable Cys function. The algorithm exploits local and global protein properties, such as sequence and secondary structure motifs, buried fractions, microenvironments and protein/enzyme class. DeepCys outperformed most of the multiple and specific Cys function algorithms. This method can predict the maximum number of cysteine functions. Moreover, for the first time, it explicitly predicts the thioether function. The tool was used to elucidate the cysteine functions in domains of unknown function belonging to cytochrome C oxidase subunit-II like transmembrane domains. Apart from the web server, a standalone program is also available on GitHub (https://github.com/vam-sin/deepcys).
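The input and output shape described in this abstract (three identifiers in, four class probabilities out, plus the most probable class) can be sketched as follows. This is an illustration only, not the DeepCys implementation or its API. The `CysQuery` type, the `summarise` function, the raw scores and the placeholder PDB ID are all hypothetical, and the use of a softmax to turn scores into probabilities is an assumption rather than something the abstract states.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

# The four cysteine functions named in the abstract.
CYS_FUNCTIONS = ("disulphide", "metal-binding", "thioether", "sulphenylation")


@dataclass(frozen=True)
class CysQuery:
    """The three inputs the abstract lists for one cysteine (hypothetical type)."""
    pdb_id: str
    chain_id: str
    residue_id: int


def softmax(scores: Sequence[float]) -> list:
    """Convert raw scores into probabilities that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def summarise(scores: Sequence[float]) -> Tuple[Dict[str, float], str]:
    """Map four raw scores to per-function probabilities and the top function."""
    if len(scores) != len(CYS_FUNCTIONS):
        raise ValueError("expected one score per cysteine function")
    probabilities = dict(zip(CYS_FUNCTIONS, softmax(scores)))
    most_probable = max(probabilities, key=probabilities.get)
    return probabilities, most_probable


# Placeholder query and made-up scores standing in for a model's raw output.
query = CysQuery(pdb_id="1ABC", chain_id="A", residue_id=42)
probabilities, most_probable = summarise([2.0, 0.5, -1.0, 0.1])
print(query, most_probable)
```

Reporting all four probabilities alongside the top class, as the abstract describes, lets a user see how close the runner-up functions are instead of relying on a single label.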

Subject(s)

Cysteine/chemistry; Deep Learning; Disulfides/chemistry; Electron Transport Complex IV/chemistry; Protein Processing, Post-Translational; Software; Amino Acid Sequence; Cations, Divalent/chemistry; Cations, Divalent/metabolism; Cysteine/metabolism; Disulfides/metabolism; Electron Transport Complex IV/metabolism; Glutathione/chemistry; Glutathione/metabolism; Models, Molecular; Nitroso Compounds/chemistry; Nitroso Compounds/metabolism; Protein Domains; Protein Structure, Secondary; Structure-Activity Relationship; Sulfides/chemistry; Sulfides/metabolism; Sulfinic Acids/chemistry; Sulfinic Acids/metabolism; Sulfonic Acids/chemistry; Sulfonic Acids/metabolism
