Results 1 - 20 of 24
1.
Mol Cell ; 83(22): 3950-3952, 2023 Nov 16.
Article in English | MEDLINE | ID: mdl-37977115

ABSTRACT

Two recent studies exploited ultra-fast structural aligners and deep-learning approaches to cluster the protein structure space in the AlphaFold Database. Barrio-Hernandez et al. and Durairaj et al. uncovered fascinating, previously unknown protein functions and structural features.


Subject(s)
Cluster Analysis , Databases, Factual
2.
Trends Biochem Sci ; 48(4): 345-359, 2023 04.
Article in English | MEDLINE | ID: mdl-36504138

ABSTRACT

Breakthrough methods in machine learning (ML), protein structure prediction, and novel ultrafast structural aligners are revolutionizing structural biology. Obtaining accurate models of proteins and annotating their functions on a large scale is no longer limited by time and resources. The most recent method to be top ranked by the Critical Assessment of Structure Prediction (CASP) assessment, AlphaFold 2 (AF2), is capable of building structural models with an accuracy comparable to that of experimental structures. Annotations of 3D models are keeping pace with the deposition of the structures due to advancements in protein language models (pLMs) and structural aligners that help validate these transferred annotations. In this review we describe how recent developments in ML for protein science are making large-scale structural bioinformatics available to the general scientific community.


Subject(s)
Machine Learning , Proteins , Proteins/chemistry , Computational Biology/methods , Protein Conformation
3.
Bioinformatics ; 40(5), 2024 May 02.
Article in English | MEDLINE | ID: mdl-38718225

ABSTRACT

MOTIVATION: Protein domains are fundamental units of protein structure and play a pivotal role in understanding folding, function, evolution, and design. The advent of accurate structure prediction techniques has resulted in an influx of new structural data, making the partitioning of these structures into domains essential for inferring evolutionary relationships and functional classification. RESULTS: This article presents Chainsaw, a supervised learning approach to domain parsing that achieves accuracy that surpasses current state-of-the-art methods. Chainsaw uses a fully convolutional neural network which is trained to predict the probability that each pair of residues is in the same domain. Domain predictions are then derived from these pairwise predictions using an algorithm that searches for the most likely assignment of residues to domains given the set of pairwise co-membership probabilities. Chainsaw matches CATH domain annotations in 78% of protein domains versus 72% for the next closest method. When predicting on AlphaFold models, expert human evaluators were twice as likely to prefer Chainsaw's predictions versus the next best method. AVAILABILITY AND IMPLEMENTATION: github.com/JudeWells/Chainsaw.


Subject(s)
Algorithms , Neural Networks, Computer , Protein Domains , Proteins , Proteins/chemistry , Databases, Protein , Computational Biology/methods , Software , Humans
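
The Chainsaw abstract above describes deriving domains from pairwise co-membership probabilities by searching for the most likely assignment. The following is a minimal sketch of that idea only, not Chainsaw's actual algorithm: it assumes the pairwise probabilities are independent, scores each candidate labelling by its log-likelihood, and searches exhaustively, which is feasible only for toy inputs. The function names and the `max_domains` parameter are hypothetical.

```python
import math
from itertools import product


def log_likelihood(assignment, prob):
    """Log-likelihood of a labelling, assuming independent pairwise probabilities."""
    eps = 1e-9  # keeps log() finite when a probability is exactly 0 or 1
    total = 0.0
    n = len(assignment)
    for i in range(n):
        for j in range(i + 1, n):
            p = min(max(prob[i][j], eps), 1 - eps)
            same_domain = assignment[i] == assignment[j]
            total += math.log(p if same_domain else 1 - p)
    return total


def best_assignment(prob, max_domains=2):
    """Exhaustive search over labellings (hypothetical helper, toy sizes only)."""
    n = len(prob)
    best, best_score = None, -math.inf
    for labels in product(range(max_domains), repeat=n):
        score = log_likelihood(labels, prob)
        if score > best_score:  # strict '>' keeps the first of equivalent labellings
            best, best_score = labels, score
    return best


# Toy input: residues 0-1 and residues 2-3 are likely to share a domain.
toy = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
print(best_assignment(toy))
```

A real protein has hundreds of residues, so exhaustive search is not an option; the sketch only makes the objective explicit.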
4.
Brief Bioinform ; 23(4), 2022 07 18.
Article in English | MEDLINE | ID: mdl-35641150

ABSTRACT

Mutations in human proteins lead to diseases. The structure of these proteins can help understand the mechanism of such diseases and develop therapeutics against them. With improved deep learning techniques, such as RoseTTAFold and AlphaFold, we can predict the structure of proteins even in the absence of structural homologs. We modeled and extracted the domains from 553 disease-associated human proteins without known protein structures or close homologs in the Protein Data Bank. We noticed that the model quality was higher and the root mean square deviation (RMSD) between AlphaFold and RoseTTAFold models was lower for domains that could be assigned to CATH families than for those that could only be assigned to Pfam families of unknown structure or could not be assigned to either. We predicted ligand-binding sites, protein-protein interfaces and conserved residues in these predicted structures. We then explored whether the disease-associated missense mutations were in the proximity of these predicted functional sites, whether they destabilized the protein structure based on ddG calculations or whether they were predicted to be pathogenic. We could explain 80% of these disease-associated mutations based on proximity to functional sites, structural destabilization or pathogenicity. When compared to polymorphisms, a larger percentage of disease-associated missense mutations were buried, closer to predicted functional sites, and predicted as destabilizing and pathogenic. Using models from the two state-of-the-art techniques provides better confidence in our predictions, and we explain 93 additional mutations based on RoseTTAFold models which could not be explained based solely on AlphaFold models.


Subject(s)
Mutation, Missense , Proteins , Databases, Protein , Humans , Models, Molecular , Mutation , Proteins/chemistry , Proteins/genetics
5.
Bioinformatics ; 39(1), 2023 01 01.
Article in English | MEDLINE | ID: mdl-36648327

ABSTRACT

MOTIVATION: CATH is a protein domain classification resource that exploits an automated workflow of structure and sequence comparison alongside expert manual curation to construct a hierarchical classification of evolutionary and structural relationships. The aim of this study was to develop algorithms for detecting remote homologues missed by state-of-the-art hidden Markov model (HMM)-based approaches. The method developed (CATHe) combines a neural network with sequence representations obtained from protein language models. It was assessed using a dataset of remote homologues having less than 20% sequence identity to any domain in the training set. RESULTS: The CATHe models trained on the 1773 largest and the 50 largest CATH superfamilies had an accuracy of 85.6 ± 0.4% and 98.2 ± 0.3%, respectively. As a further test of the power of CATHe to detect more remote homologues missed by HMMs derived from CATH domains, we used a dataset consisting of protein domains that had annotations in Pfam, but not in CATH. By using highly reliable CATHe predictions (expected error rate <0.5%), we were able to provide CATH annotations for 4.62 million Pfam domains. For a subset of these domains from Homo sapiens, we structurally validated 90.86% of the predictions by comparing their corresponding AlphaFold2 structures with structures from the CATH superfamilies to which they were assigned. AVAILABILITY AND IMPLEMENTATION: The code for the developed models is available at https://github.com/vam-sin/CATHe, and the datasets developed in this study can be accessed at https://zenodo.org/record/6327572. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Proteins , Humans , Sequence Homology, Amino Acid , Proteins/chemistry , Databases, Protein
6.
Nucleic Acids Res ; 49(D1): D266-D273, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33237325

ABSTRACT

CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies these into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, with additional derived data, such as predicted sequence domains, and functionally coherent sequence subsets (Functional Families or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data, with an addition of 65,351 fully classified domain structures (+15%), providing 500 238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH including SCOP, InterPro, Aquaria and 2DProt.


Subject(s)
Computational Biology/statistics & numerical data , Databases, Protein/statistics & numerical data , Protein Domains , Proteins/chemistry , Amino Acid Sequence , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Computational Biology/methods , Epidemics , Humans , Internet , Molecular Sequence Annotation , Proteins/genetics , Proteins/metabolism , SARS-CoV-2/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid , Viral Proteins/chemistry , Viral Proteins/genetics , Viral Proteins/metabolism
7.
J Cell Sci ; 133(16), 2020 08 17.
Article in English | MEDLINE | ID: mdl-32665322

ABSTRACT

The yeast Hansenula polymorpha contains four members of the Pex23 family of peroxins, which characteristically contain a DysF domain. Here we show that all four H. polymorpha Pex23 family proteins localize to the endoplasmic reticulum (ER). Pex24 and Pex32, but not Pex23 and Pex29, predominantly accumulate at peroxisome-ER contacts. Upon deletion of PEX24 or PEX32 - and to a much lesser extent, of PEX23 or PEX29 - peroxisome-ER contacts are lost, concomitant with defects in peroxisomal matrix protein import, membrane growth, and organelle proliferation, positioning and segregation. These defects are suppressed by the introduction of an artificial peroxisome-ER tether, indicating that Pex24 and Pex32 contribute to tethering of peroxisomes to the ER. Accumulation of Pex32 at these contact sites is lost in cells lacking the peroxisomal membrane protein Pex11, in conjunction with disruption of the contacts. This indicates that Pex11 contributes to Pex32-dependent peroxisome-ER contact formation. The absence of Pex32 has no major effect on pre-peroxisomal vesicles that occur in pex3 atg1 deletion cells.


Subject(s)
Peroxisomes , Saccharomyces cerevisiae Proteins , Endoplasmic Reticulum/genetics , Membrane Proteins/genetics , Organelle Biogenesis , Peroxins/genetics , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Saccharomycetales
8.
Bioinformatics ; 37(20): 3449-3455, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-33978744

ABSTRACT

MOTIVATION: Classifying proteins into functional families can improve our understanding of protein function and can allow transferring annotations within one family. For this, functional families need to be 'pure', i.e., contain only proteins with identical function. Functional Families (FunFams) cluster proteins within CATH superfamilies into such groups of proteins sharing function. 11% of all FunFams (22 830 of 203 639) contain EC annotations and of those, 7% (1526 of 22 830) have inconsistent functional annotations. RESULTS: We propose an approach to further cluster FunFams into functionally more consistent sub-families by encoding their sequences through embeddings. These embeddings originate from language models transferring knowledge gained from predicting missing amino acids in a sequence (ProtBERT) and have been further optimized to distinguish between proteins belonging to the same or a different CATH superfamily (PB-Tucker). Using distances between embeddings and DBSCAN to cluster FunFams and identify outliers doubled the number of pure clusters per FunFam compared to random clustering. Our approach was not limited to FunFams but also succeeded on families created using sequence similarity alone. Complementing EC annotations, we observed similar results for binding annotations. Thus, we expect an increased purity also for other aspects of function. Our results can help generate FunFams; the resulting clusters with improved functional consistency allow more reliable inference of annotations. We expect this approach to succeed equally for any other grouping of proteins by their phenotypes. AVAILABILITY AND IMPLEMENTATION: Code and embeddings are available via GitHub: https://github.com/Rostlab/FunFamsClustering. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
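
The clustering step in the abstract above (DBSCAN on distances between embeddings, with unclustered points treated as outliers) can be illustrated with a small pure-Python sketch. This is a generic textbook DBSCAN on toy 2D vectors with Euclidean distance, not the authors' code; the values of `eps` and `min_samples` and the toy points are assumptions chosen for illustration.

```python
import math


def dbscan(points, eps, min_samples):
    """Return one label per point: a cluster index (0, 1, ...) or -1 for outliers."""
    n = len(points)
    unseen, noise = None, -1
    labels = [unseen] * n

    def neighbours(i):
        # All points within eps of point i (the point itself is included).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not unseen:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = noise  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not unseen:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_samples:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


# Toy "embeddings": two tight groups and one isolated point.
toy_embeddings = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 0)]
print(dbscan(toy_embeddings, eps=0.5, min_samples=3))
```

Points labelled -1 correspond to the outliers mentioned in the abstract; real protein embeddings have hundreds to thousands of dimensions, so an indexed library implementation would normally replace this quadratic neighbour search.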

9.
Mol Syst Biol ; 17(9): e10079, 2021 09.
Article in English | MEDLINE | ID: mdl-34519429

ABSTRACT

We modeled 3D structures of all SARS-CoV-2 proteins, generating 2,060 models that span 69% of the viral proteome and provide details not available elsewhere. We found that ~6% of the proteome mimicked human proteins, while ~7% was implicated in hijacking mechanisms that reverse post-translational modifications, block host translation, and disable host defenses; a further ~29% self-assembled into heteromeric states that provided insight into how the viral replication and translation complex forms. To make these 3D models more accessible, we devised a structural coverage map, a novel visualization method to show what is, and is not, known about the 3D structure of the viral proteome. We integrated the coverage map into an accompanying online resource (https://aquaria.ws/covid) that can be used to find and explore models corresponding to the 79 structural states identified in this work. The resulting Aquaria-COVID resource helps scientists use emerging structural data to understand the mechanisms underlying coronavirus infection and draws attention to the 31% of the viral proteome that remains structurally unknown or dark.


Subject(s)
Angiotensin-Converting Enzyme 2/metabolism , Host-Pathogen Interactions/genetics , Protein Processing, Post-Translational , SARS-CoV-2/metabolism , Spike Glycoprotein, Coronavirus/metabolism , Amino Acid Transport Systems, Neutral/chemistry , Amino Acid Transport Systems, Neutral/genetics , Amino Acid Transport Systems, Neutral/metabolism , Angiotensin-Converting Enzyme 2/chemistry , Angiotensin-Converting Enzyme 2/genetics , Binding Sites , COVID-19/genetics , COVID-19/metabolism , COVID-19/virology , Computational Biology/methods , Coronavirus Envelope Proteins/chemistry , Coronavirus Envelope Proteins/genetics , Coronavirus Envelope Proteins/metabolism , Coronavirus Nucleocapsid Proteins/chemistry , Coronavirus Nucleocapsid Proteins/genetics , Coronavirus Nucleocapsid Proteins/metabolism , Humans , Mitochondrial Membrane Transport Proteins/chemistry , Mitochondrial Membrane Transport Proteins/genetics , Mitochondrial Membrane Transport Proteins/metabolism , Mitochondrial Precursor Protein Import Complex Proteins , Models, Molecular , Molecular Mimicry , Neuropilin-1/chemistry , Neuropilin-1/genetics , Neuropilin-1/metabolism , Phosphoproteins/chemistry , Phosphoproteins/genetics , Phosphoproteins/metabolism , Protein Binding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Interaction Domains and Motifs , Protein Interaction Mapping/methods , Protein Multimerization , SARS-CoV-2/chemistry , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/chemistry , Spike Glycoprotein, Coronavirus/genetics , Viral Matrix Proteins/chemistry , Viral Matrix Proteins/genetics , Viral Matrix Proteins/metabolism , Viroporin Proteins/chemistry , Viroporin Proteins/genetics , Viroporin Proteins/metabolism , Virus Replication
10.
Bioinformatics ; 34(22): 3937-3938, 2018 11 15.
Article in English | MEDLINE | ID: mdl-29931249

ABSTRACT

Summary: We introduce ICBdocker, a Docker environment that allows the annotation of functional and structural features of proteomes through a Python/Perl pipeline. DataTables pages make it easy to set up a web resource for research groups with a focus on the same organisms or datasets. The results are available as tab-separated values files and HTML, allowing data analysis and browsing. The pipeline focuses on modularity and scalability, and can be integrated with multi-processing and high-performance computing clusters. Availability and implementation: ICBdocker is freely available on DockerHub at https://hub.docker.com/r/bordin89/icb/. Source code and documentation are available on GitHub at: https://github.com/bordin89/ICB_docker.


Subject(s)
Proteome , Software , Computational Biology , Proteomics
11.
Genomics ; 110(5): 231-238, 2018 09.
Article in English | MEDLINE | ID: mdl-29074368

ABSTRACT

Planctomycetes are bacteria with complex molecular and cellular biology. They have large genomes, some over 7 Mb, and complex life cycles that include motile cells and sessile cells. Some live on the complex biofilm of macroalgae. Factors governing their life in this environment were investigated at the genomic level. We analyzed the genomes of three planctomycetes isolated from algal surfaces. The genomes ranged from 6.6 Mbp to 8.1 Mbp in size. Genes for outer-membrane proteins, peptidoglycan and lipopolysaccharide biosynthesis were present. Rubripirellula obstinata LF1T, Roseimaritima ulvae UC8T and Mariniblastus fucicola FC18T shared with Rhodopirellula baltica and R. rubra SWK7 unique proteins related to metal binding systems, phosphate metabolism, chemotaxis, and stress response. These functions may contribute to their ecological success in such a complex environment. Exceptionally large proteins (6000 to 10,000 amino acids) with extracellular, periplasmic or membrane-associated locations were found, which may be involved in biofilm formation or cell adhesion.


Subject(s)
Genome, Bacterial , Planctomycetales/genetics , Bacterial Outer Membrane Proteins/genetics , Biofilms , Chlorophyta/microbiology , Lipopolysaccharides/biosynthesis , Lipopolysaccharides/genetics , Phaeophyceae/microbiology , Planctomycetales/pathogenicity , Planctomycetales/physiology , Proteoglycans/genetics
12.
BMC Microbiol ; 18(1): 133, 2018 10 16.
Article in English | MEDLINE | ID: mdl-30326838

ABSTRACT

BACKGROUND: Bacillus licheniformis GL174 is a culturable endophytic strain isolated from Vitis vinifera cultivar Glera, the grapevine mainly cultivated for Prosecco wine production. This strain was previously demonstrated to possess some specific plant growth promoting traits, but its endophytic attitude and its role in biocontrol were only partially explored. In this study, the potential biocontrol action of the strain was investigated in vitro and in vivo and, by genome sequence analyses, putative functions involved in biocontrol and plant-bacteria interaction were assessed. RESULTS: Firstly, to confirm the endophytic behavior of the strain, its ability to colonize grapevine tissues was demonstrated and its biocontrol properties were analyzed. Antagonism test results showed that the strain could reduce and inhibit the mycelium growth of diverse plant pathogens in vitro and in vivo. The strain was demonstrated to produce different molecules of the lipopeptide class; moreover, its genome was sequenced, and analysis of the sequences revealed the presence of many protein-coding genes involved in the biocontrol process, such as transporters, plant-cell lytic enzymes, siderophores and other secondary metabolites. CONCLUSIONS: This step-by-step analysis shows that Bacillus licheniformis GL174 may be a good biocontrol agent candidate, and describes some distinctive traits and possible key elements involved in this process. The use of this strain could potentially help grapevine plants to cope with pathogen attacks and reduce the amount of chemicals used in the vineyard.


Subject(s)
Bacillus licheniformis/physiology , Biological Control Agents , Vitis/microbiology , Bacillus licheniformis/genetics , Biodiversity , Endophytes/genetics , Endophytes/physiology , Genome, Bacterial , Phylogeny , Plant Diseases/microbiology , Plant Leaves/microbiology , Plant Roots/microbiology , Sequence Analysis, DNA , Whole Genome Sequencing
13.
Sci Rep ; 14(1): 14208, 2024 06 20.
Article in English | MEDLINE | ID: mdl-38902252

ABSTRACT

COVID-19 is an ongoing global health concern. Although vaccination provides some protection, people are still susceptible to re-infection. Ostensibly, certain populations or clinical groups may be more vulnerable. Factors causing these differences are unclear and whilst socioeconomic and cultural differences are likely to be important, human genetic factors could influence susceptibility. Experimental studies indicate SARS-CoV-2 uses innate immune suppression as a strategy to speed up entry into, and replication in, the host cell. Therefore, it is necessary to understand the impact of variants in immunity-associated human proteins on susceptibility to COVID-19. In this work, we analysed missense coding variants in several SARS-CoV-2 proteins and their human protein interactors that could enhance binding affinity to SARS-CoV-2. We curated a dataset of 19 SARS-CoV-2:human protein 3D complexes, from the experimentally determined structures in the Protein Data Bank and models built using AlphaFold2-multimer, and analysed the impact of missense variants occurring in the protein-protein interface region. We analysed 468 missense variants from human proteins and 212 variants from SARS-CoV-2 proteins and computationally predicted their impacts on binding affinities for the human-viral protein complexes. We predicted a total of 26 affinity-enhancing variants from 13 human proteins implicated in increased binding affinity to SARS-CoV-2. These include key immunity-associated genes (TOMM70, ISG15, IFIH1, IFIT2, RPS3, PALS1, NUP98, AXL, ARF6, TRIMM, TRIM25) as well as important spike receptors (KREMEN1, AXL and ACE2). We report both common (e.g., Y13N in IFIH1) and rare variants in these proteins and discuss their likely structural and functional impact, using information on known and predicted functional sites. Potential mechanisms associated with immune suppression implicated by these variants are discussed. Occurrence of certain predicted affinity-enhancing variants should be monitored as they could lead to increased susceptibility and reduced immune response to SARS-CoV-2 infection in individuals/populations carrying them. Our analyses aid in understanding the potential impact of genetic variation in immunity-associated proteins on COVID-19 susceptibility and help guide drug-repurposing strategies.


Subject(s)
COVID-19 , Mutation, Missense , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , SARS-CoV-2/immunology , COVID-19/genetics , COVID-19/virology , COVID-19/immunology , Drug Repositioning , Viral Proteins/genetics , Viral Proteins/metabolism , Protein Binding , Genetic Predisposition to Disease , Disease Susceptibility , COVID-19 Drug Treatment
14.
J Mol Biol ; : 168551, 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38548261

ABSTRACT

CATH (https://www.cathdb.info) classifies domain structures from experimental protein structures in the PDB and predicted structures in the AlphaFold Database (AFDB). To cope with the scale of the predicted data, a new NextFlow workflow (CATH-AlphaFlow) has been developed to classify high-quality domains into CATH superfamilies and identify novel fold groups and superfamilies. CATH-AlphaFlow uses a novel state-of-the-art structure-based domain boundary prediction method (ChainSaw) for identifying domains in multi-domain proteins. We applied CATH-AlphaFlow to process PDB structures not classified in CATH and AFDB structures from 21 model organisms, expanding CATH by over 100%. Domains not classified in existing CATH superfamilies or fold groups were used to seed novel folds, giving 253 new folds from PDB structures (September 2023 release) and 96 from AFDB structures of proteomes of 21 model organisms. Where possible, functional annotations were obtained using (i) predictions from publicly available methods and (ii) annotations from structural relatives in AFDB/UniProt50. We also predicted functional sites and highly conserved residues. Some folds are associated with important functions such as photosynthetic acclimation (in flowering plants), iron permease activity (in fungi) and post-natal spermatogenesis (in mice). CATH-AlphaFlow will allow us to identify many more CATH relatives in the AFDB, further characterising the protein structure landscape.

15.
Curr Opin Struct Biol ; 79: 102543, 2023 04.
Article in English | MEDLINE | ID: mdl-36807079

ABSTRACT

The function of proteins can often be inferred from their three-dimensional structures. Experimental structural biologists spent decades studying these structures, but the accelerated pace of protein sequencing continuously widens the gap between sequences and structures. The early 2020s saw the advent of a new generation of deep learning-based protein structure prediction tools that offer the potential to predict structures for any number of protein sequences. In this review, we give an overview of the impact of this new generation of structure prediction tools, with examples of the impacted fields in the life sciences. We discuss the novel opportunities and new scientific and technical challenges these tools present to the broader scientific community. Finally, we highlight some potential directions for the future of computational protein structure prediction.


Subject(s)
Deep Learning , Computational Biology/methods , Proteins/chemistry , Amino Acid Sequence
16.
Biomolecules ; 13(2), 2023 02 02.
Article in English | MEDLINE | ID: mdl-36830646

ABSTRACT

Protein kinases are important targets for treating human disorders, and they form the second most targeted family after G-protein-coupled receptors. Several resources provide classification of kinases into evolutionary families (based on sequence homology); however, very few systematically classify functional families (FunFams) comprising evolutionary relatives that share similar functional properties. We have developed the FunFam-MARC (Multidomain ARchitecture-based Clustering) protocol, which uses multi-domain architectures of protein kinases and specificity-determining residues for functional family classification. FunFam-MARC predicts 2210 kinase functional families (KinFams), which have increased functional coherence, in terms of EC annotations, compared to the widely used KinBase classification. Our protocol provides a comprehensive classification for kinase sequences from >10,000 organisms. We associate human KinFams with diseases and drugs and identify 28 druggable human KinFams, i.e., KinFams enriched in clinically approved drugs. Since relatives in the same druggable KinFam tend to be structurally conserved, including the drug-binding site, these KinFams may be valuable for shortlisting therapeutic targets. Information on the human KinFams and associated 3D structures from AlphaFold2 is provided via our CATH FTP website and Zenodo. This gives the domain structure representative of each KinFam together with information on any drug compounds available. For 32% of the KinFams, we provide information on highly conserved residue sites that may be associated with specificity.


Subject(s)
Protein Kinases , Proteins , Humans , Protein Kinases/metabolism , Proteins/chemistry , Databases, Protein , Sequence Homology, Amino Acid
17.
Elife ; 12, 2023 10 03.
Article in English | MEDLINE | ID: mdl-37787768

ABSTRACT

Many proteins remain poorly characterized even in well-studied organisms, presenting a bottleneck for research. We applied phenomics and machine-learning approaches with Schizosaccharomyces pombe for broad cues on protein functions. We assayed colony-growth phenotypes to measure the fitness of deletion mutants for 3509 non-essential genes in 131 conditions with different nutrients, drugs, and stresses. These analyses exposed phenotypes for 3492 mutants, including 124 mutants of 'priority unstudied' proteins conserved in humans, providing varied functional clues. For example, over 900 proteins were newly implicated in the resistance to oxidative stress. Phenotype-correlation networks suggested roles for poorly characterized proteins through 'guilt by association' with known proteins. For complementary functional insights, we predicted Gene Ontology (GO) terms using machine learning methods exploiting protein-network and protein-homology data (NET-FF). We obtained 56,594 high-scoring GO predictions, of which 22,060 also featured high information content. Our phenotype-correlation data and NET-FF predictions showed a strong concordance with existing PomBase GO annotations and protein networks, with integrated analyses revealing 1675 novel GO predictions for 783 genes, including 47 predictions for 23 priority unstudied proteins. Experimental validation identified new proteins involved in cellular aging, showing that these predictions and phenomics data provide a rich resource to uncover new protein functions.


Subject(s)
Schizosaccharomyces pombe Proteins , Schizosaccharomyces , Humans , Phenomics , Schizosaccharomyces pombe Proteins/genetics , Phenotype , Schizosaccharomyces/genetics , Machine Learning
18.
Commun Biol ; 6(1): 160, 2023 02 08.
Article in English | MEDLINE | ID: mdl-36755055

ABSTRACT

Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remainder cluster into 2367 putative novel superfamilies. Detailed manual analysis of 618 of these, each having at least one human relative, reveals extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67% and increase the number of unique 'global' folds by 36%, and will provide valuable insights into structure-function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.


Subject(s)
Furylfuramide , Proteins , Humans , Databases, Protein , Proteins/chemistry
19.
NAR Genom Bioinform ; 4(2): lqac043, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35702380

ABSTRACT

Experimental structures are leveraged through multiple sequence alignments, or more generally through homology-based inference (HBI), facilitating the transfer of information from a protein with known annotation to a query without any annotation. A recent alternative expands the concept of HBI from sequence-distance lookup to embedding-based annotation transfer (EAT). These embeddings are derived from protein Language Models (pLMs). Here, we introduce using single protein representations from pLMs for contrastive learning. This learning procedure creates a new set of embeddings that optimizes constraints captured by hierarchical classifications of protein 3D structures defined by the CATH resource. The approach, dubbed ProtTucker, is better at recognizing distant homologous relationships than more traditional techniques such as threading or fold recognition. Thus, these embeddings have allowed sequence comparison to step into the 'midnight zone' of protein similarity, i.e. the region in which distantly related sequences have a seemingly random pairwise sequence similarity. The novelty of this work is in the particular combination of tools and sampling techniques that achieved performance comparable to or better than existing state-of-the-art sequence comparison methods. Additionally, since this method does not need to generate alignments, it is also orders of magnitude faster. The code is available at https://github.com/Rostlab/EAT.
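
Embedding-based annotation transfer, as described in the abstract above, amounts to a nearest-neighbour lookup in embedding space: the query inherits the annotation of the closest annotated protein. The sketch below illustrates only that lookup, using Euclidean distance on toy 2D vectors; it is not the code from the linked repository. The function name, the optional `max_distance` cut-off, and the pairing of labels with vectors are assumptions made for illustration.

```python
import math


def transfer_annotation(query, lookup, max_distance=None):
    """Return (label, distance) for the nearest lookup embedding.

    `lookup` is a list of (embedding, label) pairs. If `max_distance` is given
    and the nearest hit is farther away, the label is None (no transfer).
    """
    best_label, best_dist = None, math.inf
    for embedding, label in lookup:
        d = math.dist(query, embedding)
        if d < best_dist:
            best_label, best_dist = label, d
    if max_distance is not None and best_dist > max_distance:
        return None, best_dist
    return best_label, best_dist


# Toy lookup set: 2D stand-ins for embeddings, with CATH-style superfamily codes.
toy_lookup = [
    ((1.0, 0.0), "3.40.50.300"),
    ((0.0, 1.0), "1.10.10.10"),
]

label, dist = transfer_annotation((0.9, 0.1), toy_lookup)
print(label, round(dist, 3))
```

Because no alignment is computed, each query costs one distance evaluation per lookup entry, which is consistent with the speed advantage the abstract reports.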

20.
STAR Protoc ; 3(1): 101029, 2022 03 18.
Article in English | MEDLINE | ID: mdl-35059650

ABSTRACT

Lak megaphages are prevalent across diverse gut microbiomes and may impact animal and human health through lysis of Prevotella. Given their large genome size (up to 660 kbp), Lak megaphages are difficult to culture, and their identification relies on molecular techniques. Here, we present optimized protocols for identifying Lak phages in various microbiome samples, including procedures for DNA extraction, followed by detection and quantification of genes encoding Lak structural proteins using diagnostic endpoint and SYBR green-based quantitative PCR, respectively. For complete details on the use and execution of this protocol, please refer to Crisci et al. (2021).


Subject(s)
Bacteriophages , Gastrointestinal Microbiome , Microbiota , Animals , Bacteriophages/genetics , Microbiota/genetics , Prevotella/genetics , Real-Time Polymerase Chain Reaction/methods