Search | VHL Regional Portal

1.

Predicting human and viral protein variants affecting COVID-19 susceptibility and repurposing therapeutics.

Waman, Vaishali P; Ashford, Paul; Lam, Su Datt; Sen, Neeladri; Abbasian, Mahnaz; Woodridge, Laurel; Goldtzvik, Yonathan; Bordin, Nicola; Wu, Jiaxin; Sillitoe, Ian; Orengo, Christine A.

Sci Rep ; 14(1): 14208, 2024 06 20.

Article in English | MEDLINE | ID: mdl-38902252

ABSTRACT

The COVID-19 disease is an ongoing global health concern. Although vaccination provides some protection, people are still susceptible to re-infection. Ostensibly, certain populations or clinical groups may be more vulnerable. Factors causing these differences are unclear and whilst socioeconomic and cultural differences are likely to be important, human genetic factors could influence susceptibility. Experimental studies indicate SARS-CoV-2 uses innate immune suppression as a strategy to speed up entry and replication into the host cell. Therefore, it is necessary to understand the impact of variants in immunity-associated human proteins on susceptibility to COVID-19. In this work, we analysed missense coding variants in several SARS-CoV-2 proteins and their human protein interactors that could enhance binding affinity to SARS-CoV-2. We curated a dataset of 19 SARS-CoV-2:human protein 3D complexes, from the experimentally determined structures in the Protein Data Bank and models built using AlphaFold2-multimer, and analysed the impact of missense variants occurring in the protein-protein interface region. We analysed 468 missense variants from human proteins and 212 variants from SARS-CoV-2 proteins and computationally predicted their impacts on binding affinities for the human-viral protein complexes. We predicted a total of 26 affinity-enhancing variants from 13 human proteins implicated in increased binding affinity to SARS-CoV-2. These include key immunity-associated genes (TOMM70, ISG15, IFIH1, IFIT2, RPS3, PALS1, NUP98, AXL, ARF6, TRIMM, TRIM25) as well as important spike receptors (KREMEN1, AXL and ACE2). We report both common (e.g., Y13N in IFIH1) and rare variants in these proteins and discuss their likely structural and functional impact, using information on known and predicted functional sites. Potential mechanisms associated with immune suppression implicated by these variants are discussed. Occurrence of certain predicted affinity-enhancing variants should be monitored as they could lead to increased susceptibility and reduced immune response to SARS-CoV-2 infection in individuals/populations carrying them. Our analyses aid in understanding the potential impact of genetic variation in immunity-associated proteins on COVID-19 susceptibility and help guide drug-repurposing strategies.

Subjects

COVID-19 , Missense Mutation , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , SARS-CoV-2/immunology , COVID-19/genetics , COVID-19/virology , COVID-19/immunology , Drug Repositioning , Viral Proteins/genetics , Viral Proteins/metabolism , Protein Binding , Genetic Predisposition to Disease , Disease Susceptibility , COVID-19 Drug Treatment

2.

Chainsaw: protein domain segmentation with fully convolutional neural networks.

Wells, Jude; Hawkins-Hooker, Alex; Bordin, Nicola; Sillitoe, Ian; Paige, Brooks; Orengo, Christine.

Bioinformatics ; 40(5), 2024 May 02.

Article in English | MEDLINE | ID: mdl-38718225

ABSTRACT

MOTIVATION: Protein domains are fundamental units of protein structure and play a pivotal role in understanding folding, function, evolution, and design. The advent of accurate structure prediction techniques has resulted in an influx of new structural data, making the partitioning of these structures into domains essential for inferring evolutionary relationships and functional classification. RESULTS: This article presents Chainsaw, a supervised learning approach to domain parsing that achieves accuracy that surpasses current state-of-the-art methods. Chainsaw uses a fully convolutional neural network which is trained to predict the probability that each pair of residues is in the same domain. Domain predictions are then derived from these pairwise predictions using an algorithm that searches for the most likely assignment of residues to domains given the set of pairwise co-membership probabilities. Chainsaw matches CATH domain annotations in 78% of protein domains versus 72% for the next closest method. When predicting on AlphaFold models, expert human evaluators were twice as likely to prefer Chainsaw's predictions versus the next best method. AVAILABILITY AND IMPLEMENTATION: github.com/JudeWells/Chainsaw.
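The step from pairwise co-membership probabilities to a domain assignment can be illustrated with a deliberately simplified sketch. This is not Chainsaw's algorithm: the abstract describes a search for the most likely assignment, whereas the code below only thresholds the probabilities and groups residues into connected components. The function name `assign_domains`, the 0.5 threshold, and the toy matrix are assumptions made for illustration.

```python
def assign_domains(prob, threshold=0.5):
    """Group residues into domains from a symmetric matrix of pairwise
    co-membership probabilities (illustrative simplification only)."""
    n = len(prob)
    parent = list(range(n))  # union-find forest, one node per residue

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of residues predicted to share a domain.
    for i in range(n):
        for j in range(i + 1, n):
            if prob[i][j] >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    labels = {}
    assignment = []
    for i in range(n):
        root = find(i)
        if root not in labels:
            labels[root] = len(labels)
        assignment.append(labels[root])
    return assignment


# Toy example: residues 0-1 and residues 2-3 form two separate domains.
toy = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
domains = assign_domains(toy)
```

Thresholding discards the confidence information that a likelihood-based search would use, so this sketch is only meant to show the shape of the input and output.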

Subjects

Algorithms , Computer Neural Networks , Protein Domains , Proteins , Proteins/chemistry , Protein Databases , Computational Biology/methods , Software , Humans

3.

Enhancing missense variant pathogenicity prediction with protein language models using VariPred.

Lin, Weining; Wells, Jude; Wang, Zeyuan; Orengo, Christine; Martin, Andrew C R.

Sci Rep ; 14(1): 8136, 2024 04 07.

Article in English | MEDLINE | ID: mdl-38584172

ABSTRACT

Computational approaches for predicting the pathogenicity of genetic variants have advanced in recent years. These methods enable researchers to determine the possible clinical impact of rare and novel variants. Historically these prediction methods used hand-crafted features based on structural, evolutionary, or physiochemical properties of the variant. In this study we propose a novel framework that leverages the power of pre-trained protein language models to predict variant pathogenicity. We show that our approach VariPred (Variant impact Predictor) outperforms current state-of-the-art methods by using an end-to-end model that only requires the protein sequence as input. Using one of the best-performing protein language models (ESM-1b), we establish a robust classifier that requires no calculation of structural features or multiple sequence alignments. We compare the performance of VariPred with other representative models including 3Cnet, Polyphen-2, REVEL, MetaLR, FATHMM and ESM variant. VariPred performs as well as, or in most cases better than these other predictors using six variant impact prediction benchmarks despite requiring only sequence data and no pre-processing of the data.

Subjects

Missense Mutation , Proteins , Virulence , Proteins/genetics , Amino Acid Sequence , Computational Biology/methods

4.

CATH 2024: CATH-AlphaFlow Doubles the Number of Structures in CATH and Reveals Nearly 200 New Folds.

Waman, Vaishali P; Bordin, Nicola; Alcraft, Rachel; Vickerstaff, Robert; Rauer, Clemens; Chan, Qian; Sillitoe, Ian; Yamamori, Hazuki; Orengo, Christine.

J Mol Biol ; : 168551, 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38548261

ABSTRACT

CATH (https://www.cathdb.info) classifies domain structures from experimental protein structures in the PDB and predicted structures in the AlphaFold Database (AFDB). To cope with the scale of the predicted data, a new NextFlow workflow (CATH-AlphaFlow) has been developed to classify high-quality domains into CATH superfamilies and identify novel fold groups and superfamilies. CATH-AlphaFlow uses a novel state-of-the-art structure-based domain boundary prediction method (ChainSaw) for identifying domains in multi-domain proteins. We applied CATH-AlphaFlow to process PDB structures not classified in CATH and AFDB structures from 21 model organisms, expanding CATH by over 100%. Domains not classified in existing CATH superfamilies or fold groups were used to seed novel folds, giving 253 new folds from PDB structures (September 2023 release) and 96 from AFDB structures of proteomes of 21 model organisms. Where possible, functional annotations were obtained using (i) predictions from publicly available methods and (ii) annotations from structural relatives in AFDB/UniProt50. We also predicted functional sites and highly conserved residues. Some folds are associated with important functions such as photosynthetic acclimation (in flowering plants), iron permease activity (in fungi) and post-natal spermatogenesis (in mice). CATH-AlphaFlow will allow us to identify many more CATH relatives in the AFDB, further characterising the protein structure landscape.

5.

FunPredCATH: An ensemble method for predicting protein function using CATH.

Bonello, Joseph; Orengo, Christine.

Biochim Biophys Acta Proteins Proteom ; 1872(2): 140985, 2024 02 01.

Article in English | MEDLINE | ID: mdl-38122964

ABSTRACT

MOTIVATION: The number of unannotated proteins in UniProt grows at a very high rate every year due to more efficient sequencing methods. However, the experimental annotation of proteins is a lengthy and expensive process. Using computational techniques to narrow the search can speed up the process by providing highly specific Gene Ontology (GO) terms. METHODOLOGY: We propose an ensemble approach that combines three generic base predictors that predict Gene Ontology (BP, CC and MF) terms from sequences across different species. We train our models on UniProtGOA annotation data and use the CATH domain resources to identify the protein families. We then calculate a score based on the prevalence of individual GO terms in the functional families that is then used as an indicator of confidence when assigning the GO term to an uncharacterised protein. METHODS: In the ensemble, we use a statistics-based method that scores the occurrence of GO terms in a CATH FunFam against a background set of proteins annotated by the same GO term. We also developed a set-based method that uses Set Intersection and Set Union to score the occurrence of GO terms within the same CATH FunFam. Finally, we also use FunFams-Plus, a predictor method developed by the Orengo Group at UCL to predict GO terms for uncharacterised proteins in the CAFA3 challenge. EVALUATION: We evaluated the methods against the CAFA3 benchmark and DomFun. We used the Precision, Recall and Fmax metrics and the benchmark datasets that are used in CAFA3 to evaluate our models and compare them to the CAFA3 results. Our results show that FunPredCATH compares well with top CAFA methods in the different ontologies and benchmarks. CONTRIBUTIONS: FunPredCATH compares well with other prediction methods on CAFA3, and the ensemble approach outperforms the base methods. We show that non-IEA models obtain higher Fmax scores than the IEA counterparts, while the models including IEA annotations have higher coverage at the expense of a lower Fmax score.
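For readers unfamiliar with the Fmax metric mentioned above, the following is a minimal sketch of the protein-centric definition commonly used in CAFA-style evaluation: precision is averaged over proteins with at least one prediction at a given score threshold, recall is averaged over all benchmark proteins, and Fmax is the best harmonic mean over thresholds. The function name `fmax`, the dictionary layout, and the toy GO identifiers are assumptions for illustration; a full evaluation would also propagate terms through the ontology, which is omitted here.

```python
def fmax(predictions, truth, thresholds=None):
    """Protein-centric Fmax.

    predictions: {protein: {go_term: score in [0, 1]}}
    truth:       {protein: non-empty set of true go_terms}
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions = []   # only proteins with >= 1 prediction at this threshold
        recall_sum = 0.0  # accumulated over all benchmark proteins
        for protein, true_terms in truth.items():
            scored = predictions.get(protein, {})
            predicted = {term for term, s in scored.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(truth)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy benchmark with hypothetical GO identifiers.
toy_predictions = {
    "P1": {"GO:1": 0.9, "GO:2": 0.4},
    "P2": {"GO:3": 0.8},
}
toy_truth = {"P1": {"GO:1"}, "P2": {"GO:3", "GO:4"}}
score = fmax(toy_predictions, toy_truth)
```

In this toy case the best trade-off occurs at thresholds between 0.4 and 0.8, where precision is 1.0 and recall is 0.75.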

Subjects

Proteins , Protein Sequence Analysis , Protein Databases , Proteins/metabolism , Molecular Sequence Annotation , Protein Sequence Analysis/methods , Gene Ontology

6.

Large-scale clustering of AlphaFold2 3D models shines light on the structure and function of proteins.

Bordin, Nicola; Lau, Andy M; Orengo, Christine.

Mol Cell ; 83(22): 3950-3952, 2023 Nov 16.

Article in English | MEDLINE | ID: mdl-37977115

ABSTRACT

Two recent studies exploited ultra-fast structural aligners and deep-learning approaches to cluster the protein structure space in the AlphaFold Database. Barrio-Hernandez et al. [1] and Durairaj et al. [2] uncovered fascinating new protein functions and structural features previously unknown.

Subjects

Cluster Analysis , Factual Databases

7.

Broad functional profiling of fission yeast proteins using phenomics and machine learning.

Rodríguez-López, María; Bordin, Nicola; Lees, Jon; Scholes, Harry; Hassan, Shaimaa; Saintain, Quentin; Kamrad, Stephan; Orengo, Christine; Bähler, Jürg.

Elife ; 12, 2023 10 03.

Article in English | MEDLINE | ID: mdl-37787768

ABSTRACT

Many proteins remain poorly characterized even in well-studied organisms, presenting a bottleneck for research. We applied phenomics and machine-learning approaches with Schizosaccharomyces pombe for broad cues on protein functions. We assayed colony-growth phenotypes to measure the fitness of deletion mutants for 3509 non-essential genes in 131 conditions with different nutrients, drugs, and stresses. These analyses exposed phenotypes for 3492 mutants, including 124 mutants of 'priority unstudied' proteins conserved in humans, providing varied functional clues. For example, over 900 proteins were newly implicated in the resistance to oxidative stress. Phenotype-correlation networks suggested roles for poorly characterized proteins through 'guilt by association' with known proteins. For complementary functional insights, we predicted Gene Ontology (GO) terms using machine learning methods exploiting protein-network and protein-homology data (NET-FF). We obtained 56,594 high-scoring GO predictions, of which 22,060 also featured high information content. Our phenotype-correlation data and NET-FF predictions showed a strong concordance with existing PomBase GO annotations and protein networks, with integrated analyses revealing 1675 novel GO predictions for 783 genes, including 47 predictions for 23 priority unstudied proteins. Experimental validation identified new proteins involved in cellular aging, showing that these predictions and phenomics data provide a rich resource to uncover new protein functions.

Subjects

Schizosaccharomyces pombe Proteins , Schizosaccharomyces , Humans , Phenomics , Schizosaccharomyces pombe Proteins/genetics , Phenotype , Schizosaccharomyces/genetics , Machine Learning

8.

Protein diversification through post-translational modifications, alternative splicing, and gene duplication.

Goldtzvik, Yonathan; Sen, Neeladri; Lam, Su Datt; Orengo, Christine.

Curr Opin Struct Biol ; 81: 102640, 2023 08.

Article in English | MEDLINE | ID: mdl-37354790

ABSTRACT

Proteins provide the basis for cellular function. Having multiple versions of the same protein within a single organism provides a way of regulating its activity or developing novel functions. Post-translational modifications of proteins, by means of adding/removing chemical groups to amino acids, allow for a well-regulated and controlled way of generating functionally distinct protein species. Alternative splicing is another method with which organisms possibly generate new isoforms. Additionally, gene duplication events throughout evolution generate multiple paralogs of the same genes, resulting in multiple versions of the same protein within an organism. In this review, we discuss recent advancements in the study of these three methods of protein diversification and provide illustrative examples of how they affect protein structure and function.

Subjects

Alternative Splicing , Gene Duplication , Molecular Evolution , Protein Isoforms/genetics , Post-Translational Protein Processing

9.

KinFams: De-Novo Classification of Protein Kinases Using CATH Functional Units.

Adeyelu, Tolulope; Bordin, Nicola; Waman, Vaishali P; Sadlej, Marta; Sillitoe, Ian; Moya-Garcia, Aurelio A; Orengo, Christine A.

Biomolecules ; 13(2), 2023 02 02.

Article in English | MEDLINE | ID: mdl-36830646

ABSTRACT

Protein kinases are important targets for treating human disorders, and they are the second most targeted families after G-protein coupled receptors. Several resources provide classification of kinases into evolutionary families (based on sequence homology); however, very few systematically classify functional families (FunFams) comprising evolutionary relatives that share similar functional properties. We have developed the FunFam-MARC (Multidomain ARchitecture-based Clustering) protocol, which uses multi-domain architectures of protein kinases and specificity-determining residues for functional family classification. FunFam-MARC predicts 2210 kinase functional families (KinFams), which have increased functional coherence, in terms of EC annotations, compared to the widely used KinBase classification. Our protocol provides a comprehensive classification for kinase sequences from >10,000 organisms. We associate human KinFams with diseases and drugs and identify 28 druggable human KinFams, i.e., enriched in clinically approved drugs. Since relatives in the same druggable KinFam tend to be structurally conserved, including the drug-binding site, these KinFams may be valuable for shortlisting therapeutic targets. Information on the human KinFams and associated 3D structures from AlphaFold2 are provided via our CATH FTP website and Zenodo. This gives the domain structure representative of each KinFam together with information on any drug compounds available. For 32% of the KinFams, we provide information on highly conserved residue sites that may be associated with specificity.

Subjects

Protein Kinases , Proteins , Humans , Protein Kinases/metabolism , Proteins/chemistry , Protein Databases , Amino Acid Sequence Homology

10.

ModelCIF: An Extension of PDBx/mmCIF Data Representation for Computed Structure Models.

Vallat, Brinda; Tauriello, Gerardo; Bienert, Stefan; Haas, Juergen; Webb, Benjamin M; Zídek, Augustin; Zheng, Wei; Peisach, Ezra; Piehl, Dennis W; Anischanka, Ivan; Sillitoe, Ian; Tolchard, James; Varadi, Mihaly; Baker, David; Orengo, Christine; Zhang, Yang; Hoch, Jeffrey C; Kurisu, Genji; Patwardhan, Ardan; Velankar, Sameer; Burley, Stephen K; Sali, Andrej; Schwede, Torsten; Berman, Helen M; Westbrook, John D.

J Mol Biol ; 435(14): 168021, 2023 07 15.

Article in English | MEDLINE | ID: mdl-36828268

ABSTRACT

ModelCIF (github.com/ihmwg/ModelCIF) is a data information framework developed for and by computational structural biologists to enable delivery of Findable, Accessible, Interoperable, and Reusable (FAIR) data to users worldwide. ModelCIF describes the specific set of attributes and metadata associated with macromolecular structures modeled by solely computational methods and provides an extensible data representation for deposition, archiving, and public dissemination of predicted three-dimensional (3D) models of macromolecules. It is an extension of the Protein Data Bank Exchange / macromolecular Crystallographic Information Framework (PDBx/mmCIF), which is the global data standard for representing experimentally-determined 3D structures of macromolecules and associated metadata. The PDBx/mmCIF framework and its extensions (e.g., ModelCIF) are managed by the Worldwide Protein Data Bank partnership (wwPDB, wwpdb.org) in collaboration with relevant community stakeholders such as the wwPDB ModelCIF Working Group (wwpdb.org/task/modelcif). This semantically rich and extensible data framework for representing computed structure models (CSMs) accelerates the pace of scientific discovery. Herein, we describe the architecture, contents, and governance of ModelCIF, and tools and processes for maintaining and extending the data standard. Community tools and software libraries that support ModelCIF are also described.

Subjects

Protein Databases , Macromolecular Substances/chemistry , Protein Conformation , Software

11.

The opportunities and challenges posed by the new generation of deep learning-based protein structure predictors.

Varadi, Mihaly; Bordin, Nicola; Orengo, Christine; Velankar, Sameer.

Curr Opin Struct Biol ; 79: 102543, 2023 04.

Article in English | MEDLINE | ID: mdl-36807079

ABSTRACT

The function of proteins can often be inferred from their three-dimensional structures. Experimental structural biologists spent decades studying these structures, but the accelerated pace of protein sequencing continuously increases the gaps between sequences and structures. The early 2020s saw the advent of a new generation of deep learning-based protein structure prediction tools that offer the potential to predict structures based on any number of protein sequences. In this review, we give an overview of the impact of this new generation of structure prediction tools, with examples of the impacted field in the life sciences. We discuss the novel opportunities and new scientific and technical challenges these tools present to the broader scientific community. Finally, we highlight some potential directions for the future of computational protein structure prediction.

Subjects

Deep Learning , Computational Biology/methods , Proteins/chemistry , Amino Acid Sequence

12.

AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms.

Bordin, Nicola; Sillitoe, Ian; Nallapareddy, Vamsi; Rauer, Clemens; Lam, Su Datt; Waman, Vaishali P; Sen, Neeladri; Heinzinger, Michael; Littmann, Maria; Kim, Stephanie; Velankar, Sameer; Steinegger, Martin; Rost, Burkhard; Orengo, Christine.

Commun Biol ; 6(1): 160, 2023 02 08.

Article in English | MEDLINE | ID: mdl-36755055

ABSTRACT

Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remaining cluster into 2367 putative novel superfamilies. Detailed manual analysis on 618 of these, having at least one human relative, reveals extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67% and increase the number of unique 'global' folds by 36% and will provide valuable insights on structure function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.

Subjects

Furylfuramide , Proteins , Humans , Protein Databases , Proteins/chemistry

13.

CATHe: detection of remote homologues for CATH superfamilies using embeddings from protein language models.

Nallapareddy, Vamsi; Bordin, Nicola; Sillitoe, Ian; Heinzinger, Michael; Littmann, Maria; Waman, Vaishali P; Sen, Neeladri; Rost, Burkhard; Orengo, Christine.

Bioinformatics ; 39(1), 2023 01 01.

Article in English | MEDLINE | ID: mdl-36648327

ABSTRACT

MOTIVATION: CATH is a protein domain classification resource that exploits an automated workflow of structure and sequence comparison alongside expert manual curation to construct a hierarchical classification of evolutionary and structural relationships. The aim of this study was to develop algorithms for detecting remote homologues missed by state-of-the-art hidden Markov model (HMM)-based approaches. The method developed (CATHe) combines a neural network with sequence representations obtained from protein language models. It was assessed using a dataset of remote homologues having less than 20% sequence identity to any domain in the training set. RESULTS: The CATHe models trained on 1773 largest and 50 largest CATH superfamilies had an accuracy of 85.6 ± 0.4% and 98.2 ± 0.3%, respectively. As a further test of the power of CATHe to detect more remote homologues missed by HMMs derived from CATH domains, we used a dataset consisting of protein domains that had annotations in Pfam, but not in CATH. By using highly reliable CATHe predictions (expected error rate <0.5%), we were able to provide CATH annotations for 4.62 million Pfam domains. For a subset of these domains from Homo sapiens, we structurally validated 90.86% of the predictions by comparing their corresponding AlphaFold2 structures with structures from the CATH superfamilies to which they were assigned. AVAILABILITY AND IMPLEMENTATION: The code for the developed models is available on https://github.com/vam-sin/CATHe, and the datasets developed in this study can be accessed on https://zenodo.org/record/6327572. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms , Proteins , Humans , Amino Acid Sequence Homology , Proteins/chemistry , Protein Databases

14.

Novel machine learning approaches revolutionize protein knowledge.

Bordin, Nicola; Dallago, Christian; Heinzinger, Michael; Kim, Stephanie; Littmann, Maria; Rauer, Clemens; Steinegger, Martin; Rost, Burkhard; Orengo, Christine.

Trends Biochem Sci ; 48(4): 345-359, 2023 04.

Article in English | MEDLINE | ID: mdl-36504138

ABSTRACT

Breakthrough methods in machine learning (ML), protein structure prediction, and novel ultrafast structural aligners are revolutionizing structural biology. Obtaining accurate models of proteins and annotating their functions on a large scale is no longer limited by time and resources. The most recent method to be top ranked by the Critical Assessment of Structure Prediction (CASP) assessment, AlphaFold 2 (AF2), is capable of building structural models with an accuracy comparable to that of experimental structures. Annotations of 3D models are keeping pace with the deposition of the structures due to advancements in protein language models (pLMs) and structural aligners that help validate these transferred annotations. In this review we describe how recent developments in ML for protein science are making large-scale structural bioinformatics available to the general scientific community.

Subjects

Machine Learning , Proteins , Proteins/chemistry , Computational Biology/methods , Protein Conformation

15.

Mapping the Constrained Coding Regions in the Human Genome to Their Corresponding Proteins.

Hasenahuer, Marcia A; Sanchis-Juan, Alba; Laskowski, Roman A; Baker, James A; Stephenson, James D; Orengo, Christine A; Raymond, F Lucy; Thornton, Janet M.

J Mol Biol ; 435(2): 167892, 2023 01 30.

Article in English | MEDLINE | ID: mdl-36410474

ABSTRACT

Constrained Coding Regions (CCRs) in the human genome have been derived from DNA sequencing data of large cohorts of healthy control populations, available in the Genome Aggregation Database (gnomAD) [1]. They identify regions depleted of protein-changing variants and thus identify segments of the genome that have been constrained during human evolution. By mapping these DNA-defined regions from genomic coordinates onto the corresponding protein positions and combining this information with protein annotations, we have explored the distribution of CCRs and compared their co-occurrence with different protein functional features, previously annotated at the amino acid level in public databases. As expected, our results reveal that functional amino acids involved in interactions with DNA/RNA, protein-protein contacts and catalytic sites are the protein features most likely to be highly constrained for variation in the control population. More surprisingly, we also found that linear motifs, linear interacting peptides (LIPs), disorder-order transitions upon binding with other protein partners and liquid-liquid phase separating (LLPS) regions are also strongly associated with high constraint for variability. We also compared intra-species constraints in the human CCRs with inter-species conservation and functional residues to explore how such CCRs may contribute to the analysis of protein variants. As has been previously observed, CCRs are only weakly correlated with conservation, suggesting that intraspecies constraints complement interspecies conservation and can provide more information to interpret variant effects.
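The mapping from genomic coordinates onto protein positions described above can be sketched as follows. This is an illustrative simplification, not the authors' pipeline: it assumes 1-based inclusive coordinates, a single transcript whose coding exons are given as `(start, end)` pairs, and a CDS that begins at the first coding base. The function name `genomic_to_residue` and the toy exon coordinates are hypothetical.

```python
def genomic_to_residue(pos, cds_exons, strand="+"):
    """Return the 1-based residue number encoded at genomic position `pos`,
    or None if `pos` lies outside the coding exons.

    cds_exons: list of (start, end) genomic intervals, 1-based inclusive.
    strand:    "+" or "-"; on the minus strand the CDS runs from high to
               low genomic coordinates.
    """
    # Walk the exons in transcription order.
    ordered = sorted(cds_exons, reverse=(strand == "-"))
    offset = 0  # number of coding bases in the exons already passed
    for start, end in ordered:
        if start <= pos <= end:
            within = pos - start if strand == "+" else end - pos
            return (offset + within) // 3 + 1  # three bases per codon
        offset += end - start + 1
    return None


# Toy transcript: two coding exons of six bases each (four codons in total).
toy_exons = [(100, 105), (200, 205)]
first_codon = genomic_to_residue(102, toy_exons)          # last base of codon 1
third_codon = genomic_to_residue(200, toy_exons)          # first base of exon 2
intronic = genomic_to_residue(150, toy_exons)             # not coding
minus_first = genomic_to_residue(205, toy_exons, "-")     # CDS start on minus strand
```

A constrained region spanning several genomic positions would map to the set of residue numbers returned for its coding bases, which can then be intersected with residue-level annotations.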

Subjects

Human Genome , Open Reading Frames , Proteins , Humans , Base Sequence , Human Genome/genetics , Genomics , Proteins/genetics , Chromosome Mapping

16.

InterPro in 2022.

Paysan-Lafosse, Typhaine; Blum, Matthias; Chuguransky, Sara; Grego, Tiago; Pinto, Beatriz Lázaro; Salazar, Gustavo A; Bileschi, Maxwell L; Bork, Peer; Bridge, Alan; Colwell, Lucy; Gough, Julian; Haft, Daniel H; Letunic, Ivica; Marchler-Bauer, Aron; Mi, Huaiyu; Natale, Darren A; Orengo, Christine A; Pandurangan, Arun P; Rivoire, Catherine; Sigrist, Christian J A; Sillitoe, Ian; Thanki, Narmada; Thomas, Paul D; Tosatto, Silvio C E; Wu, Cathy H; Bateman, Alex.

Nucleic Acids Res ; 51(D1): D418-D427, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36350672

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. Here, we report recent developments with InterPro (version 90.0) and its associated software, including updates to data content and to the website. These developments extend and enrich the information provided by InterPro, and provide more user-friendly access to the data. Additionally, we have worked on adding Pfam website features to the InterPro website, as the Pfam website will be retired in late 2022. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB. Moreover, we report the development of a card game as a method of engaging the non-scientific community. Finally, we discuss the benefits and challenges brought by the use of artificial intelligence for protein structure prediction.

Subjects

Protein Databases , Humans , Amino Acid Sequence , Artificial Intelligence , Internet , Proteins/chemistry , Software

17.

Dissecting peripheral protein-membrane interfaces.

Tubiana, Thibault; Sillitoe, Ian; Orengo, Christine; Reuter, Nathalie.

PLoS Comput Biol ; 18(12): e1010346, 2022 12.

Article in English | MEDLINE | ID: mdl-36516231

ABSTRACT

Peripheral membrane proteins (PMPs) include a wide variety of proteins that share the ability to bind transiently to the chemically complex interfacial region of membranes through their interfacial binding site (IBS). In contrast to protein-protein or protein-DNA/RNA interfaces, peripheral protein-membrane interfaces are poorly characterized. We collected a dataset of PMP domains representative of the variety of PMP functions: membrane-targeting domains (Annexin, C1, C2, discoidin C2, PH, PX), enzymes (PLA, PLC/D) and lipid-transfer proteins (START). The dataset contains 1328 experimental structures and 1194 AlphaFold models. We mapped the amino acid composition and structural patterns of the IBS of each protein in this dataset, and evaluated which were more likely to be found at the IBS compared to the rest of the domains' accessible surface. In agreement with earlier work we find that about two thirds of the PMPs in the dataset have protruding hydrophobes (Leu, Ile, Phe, Tyr, Trp and Met) at their IBS. The three aromatic amino acids Trp, Tyr and Phe are a hallmark of PMP IBSs regardless of whether they protrude on loops or not. This is also the case for lysines but not arginines, suggesting that, unlike for Arg-rich membrane-active peptides, the less membrane-disruptive lysine is preferred in PMPs. Another striking observation was the over-representation of glycines at the IBS of PMPs compared to the rest of their surface, possibly giving IBS loops the flexibility they need to insert between membrane lipids. The analysis of the 9 superfamilies revealed amino acid distribution patterns in agreement with their known functions and membrane-binding mechanisms. Besides revealing novel amino acid patterns at protein-membrane interfaces, our work contributes a new PMP dataset and an analysis pipeline that can be further built upon for future studies of PMP properties, or for developing PMP prediction tools using, for example, machine learning approaches.
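The comparison of amino acid composition at the IBS against the rest of the accessible surface can be illustrated with a simple log-odds calculation. This is a sketch under stated assumptions rather than the published analysis pipeline: the function name `ibs_enrichment`, the pseudocount, and the toy residue lists are all hypothetical, and no statistical test is applied.

```python
import math
from collections import Counter


def ibs_enrichment(ibs_residues, other_surface_residues, pseudocount=1.0):
    """Return {amino_acid: log2(frequency at IBS / frequency elsewhere)}.

    Positive values mean the residue type is over-represented at the
    interfacial binding site relative to the rest of the surface.
    """
    ibs = Counter(ibs_residues)
    other = Counter(other_surface_residues)
    residue_types = sorted(set(ibs) | set(other))
    # Pseudocounts keep the ratio finite when a type is absent from one set.
    ibs_total = len(ibs_residues) + pseudocount * len(residue_types)
    other_total = len(other_surface_residues) + pseudocount * len(residue_types)
    return {
        aa: math.log2(
            ((ibs[aa] + pseudocount) / ibs_total)
            / ((other[aa] + pseudocount) / other_total)
        )
        for aa in residue_types
    }


# Toy surface: glycine, tryptophan and lysine at the IBS, acidic residues elsewhere.
toy_ibs = ["GLY", "GLY", "TRP", "LYS"]
toy_other = ["GLY", "ASP", "ASP", "GLU"]
enrichment = ibs_enrichment(toy_ibs, toy_other)
```

With real data the two residue lists would come from a per-residue IBS annotation combined with a solvent-accessibility filter, and the resulting ratios would need a significance test before interpretation.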

Subjects

Cell Membrane , Peptides , Amino Acids/chemistry , Binding Sites , Peptides/chemistry , Cell Membrane/chemistry

18.

3D-Beacons: decreasing the gap between protein sequences and structures through a federated network of protein structure data resources.

Varadi, Mihaly; Nair, Sreenath; Sillitoe, Ian; Tauriello, Gerardo; Anyango, Stephen; Bienert, Stefan; Borges, Clemente; Deshpande, Mandar; Green, Tim; Hassabis, Demis; Hatos, Andras; Hegedus, Tamas; Hekkelman, Maarten L; Joosten, Robbie; Jumper, John; Laydon, Agata; Molodenskiy, Dmitry; Piovesan, Damiano; Salladini, Edoardo; Salzberg, Steven L; Sommer, Markus J; Steinegger, Martin; Suhajda, Erzsebet; Svergun, Dmitri; Tenorio-Ku, Luiggi; Tosatto, Silvio; Tunyasuvunakool, Kathryn; Waterhouse, Andrew Mark; Zídek, Augustin; Schwede, Torsten; Orengo, Christine; Velankar, Sameer.

Gigascience ; 11, 2022 11 30.

Article in English | MEDLINE | ID: mdl-36448847

ABSTRACT

While scientists can often infer the biological function of proteins from their 3-dimensional quaternary structures, the gap between the number of known protein sequences and their experimentally determined structures keeps increasing. A potential solution to this problem is presented by ever more sophisticated computational protein modeling approaches. While often powerful on their own, most methods have strengths and weaknesses. Therefore, it benefits researchers to examine models from various model providers and perform comparative analysis to identify what models can best address their specific use cases. To make data from a large array of model providers more easily accessible to the broader scientific community, we established 3D-Beacons, a collaborative initiative to create a federated network with unified data access mechanisms. The 3D-Beacons Network allows researchers to collate coordinate files and metadata for experimentally determined and theoretical protein models from state-of-the-art and specialist model providers and also from the Protein Data Bank.

Subjects

Metadata , Records , Amino Acid Sequence , Protein Databases , Computer Simulation

19.

Structural and energetic analyses of SARS-CoV-2 N-terminal domain characterise sugar binding pockets and suggest putative impacts of variants on COVID-19 transmission.

Lam, Su Datt; Waman, Vaishali P; Fraternali, Franca; Orengo, Christine; Lees, Jonathan.

Comput Struct Biotechnol J ; 20: 6302-6316, 2022.

Article in English | MEDLINE | ID: mdl-36408455

ABSTRACT

Coronavirus disease 2019 (COVID-19), caused by SARS-CoV-2, is an ongoing pandemic that causes significant health/socioeconomic burden. Variants of concern (VOCs) have emerged affecting transmissibility, disease severity and re-infection risk. Studies suggest that the N-terminal domain (NTD) of the spike protein may have a role in facilitating virus entry via sialic-acid receptor binding. Furthermore, most VOCs include novel NTD variants. Despite global sequence and structure similarity, most sialic-acid binding pockets in NTD vary across coronaviruses. Our work suggests ongoing evolutionary tuning of the sugar-binding pockets and recent analyses have shown that NTD insertions in VOCs tend to lie close to loops. We extended the structural characterisation of these sugar-binding pockets and explored whether variants could enhance sialic acid-binding. We found that recent NTD insertions in VOCs (i.e., Gamma, Delta and Omicron variants) and emerging variants of interest (VOIs) (i.e., Iota, Lambda and Theta variants) frequently lie close to sugar-binding pockets. For some variants, including the recent Omicron VOC, we find increases in predicted sialic acid-binding energy, compared to the original SARS-CoV-2, which may contribute to increased transmission. These binding observations are supported by molecular dynamics (MD) simulations. We examined the similarity of NTD across Betacoronaviruses to determine whether the sugar-binding pockets are sufficiently similar to be exploited in drug design. Whilst most pockets are too structurally variable, we detected a previously unknown highly structurally conserved pocket which can be investigated in pursuit of a generic pan-Betacoronavirus drug. Our structure-based analyses help rationalise the effects of VOCs and provide hypotheses for experiments. Our findings suggest a strong need for experimental monitoring of changes in NTD of VOCs.

20.

A roadmap for the functional annotation of protein families: a community perspective.

de Crécy-Lagard, Valérie; Amorin de Hegedus, Rocio; Arighi, Cecilia; Babor, Jill; Bateman, Alex; Blaby, Ian; Blaby-Haas, Crysten; Bridge, Alan J; Burley, Stephen K; Cleveland, Stacey; Colwell, Lucy J; Conesa, Ana; Dallago, Christian; Danchin, Antoine; de Waard, Anita; Deutschbauer, Adam; Dias, Raquel; Ding, Yousong; Fang, Gang; Friedberg, Iddo; Gerlt, John; Goldford, Joshua; Gorelik, Mark; Gyori, Benjamin M; Henry, Christopher; Hutinet, Geoffrey; Jaroch, Marshall; Karp, Peter D; Kondratova, Liudmyla; Lu, Zhiyong; Marchler-Bauer, Aron; Martin, Maria-Jesus; McWhite, Claire; Moghe, Gaurav D; Monaghan, Paul; Morgat, Anne; Mungall, Christopher J; Natale, Darren A; Nelson, William C; O'Donoghue, Seán; Orengo, Christine; O'Toole, Katherine H; Radivojac, Predrag; Reed, Colbie; Roberts, Richard J; Rodionov, Dmitri; Rodionova, Irina A; Rudolf, Jeffrey D; Saleh, Lana; Sheynkman, Gloria.

Database (Oxford) ; 2022, 2022 08 12.

Article in English | MEDLINE | ID: mdl-35961013

ABSTRACT

Over the last 25 years, biology has entered the genomic era and is becoming a science of 'big data'. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by more than 500 000 genomes sequenced to date. By different estimates, only half the predicted sequenced proteins carry an accurate functional annotation, and this percentage varies drastically between different organismal lineages. Such a large gap in knowledge hampers all aspects of biological enterprise and, thereby, is standing in the way of genomic biology reaching its full potential. A brainstorming meeting to address this issue funded by the National Science Foundation was held during 3-4 February 2022. Bringing together data scientists, biocurators, computational biologists and experimentalists within the same venue allowed for a comprehensive assessment of the current state of functional annotations of protein families. Further, major issues that were obstructing the field were identified and discussed, which ultimately allowed for the proposal of solutions on how to move forward.

Subjects

Genomics , Proteins , Base Sequence , Computational Biology , Genome , Molecular Sequence Annotation
