Search | BVS Violência e Saúde

1.

The Human RNA-Binding Proteome and Its Dynamics during Translational Arrest.

Trendel, Jakob; Schwarzl, Thomas; Horos, Rastislav; Prakash, Ananth; Bateman, Alex; Hentze, Matthias W; Krijgsveld, Jeroen.

Cell ; 176(1-2): 391-403.e19, 2019 01 10.

Article in English | MEDLINE | ID: mdl-30528433

ABSTRACT

Proteins and RNA functionally and physically intersect in multiple biological processes; however, no universal method is currently available to purify protein-RNA complexes. Here, we introduce XRNAX, a method for the generic purification of protein-crosslinked RNA, and demonstrate its versatility to study the composition and dynamics of protein-RNA interactions by various transcriptomic and proteomic approaches. We show that XRNAX captures all RNA biotypes and use this to characterize the sub-proteomes that interact with coding and non-coding RNAs (ncRNAs) and to identify hundreds of protein-RNA interfaces. Exploiting the quantitative nature of XRNAX, we observe drastic remodeling of the RNA-bound proteome during arsenite-induced stress, distinct from autophagy-related changes in the total proteome. In addition, we combine XRNAX with crosslinking immunoprecipitation sequencing (CLIP-seq) to validate the interaction of ncRNA with lamin B1 and EXOSC2. Thus, XRNAX is a resourceful approach to study structural and compositional aspects of protein-RNA interactions to address fundamental questions in RNA biology.

Subjects

High-Throughput Nucleotide Sequencing/methods , RNA-Binding Proteins/isolation & purification , RNA/isolation & purification , Binding Sites , Exosome Multienzyme Ribonuclease Complex/metabolism , Humans , Immunoprecipitation/methods , Lamin Type B/metabolism , Protein Binding/genetics , Protein Binding/physiology , Protein Biosynthesis/genetics , Protein Biosynthesis/physiology , Protein Processing, Post-Translational , Proteins/isolation & purification , Proteins/metabolism , Proteome/metabolism , Proteomics/methods , RNA/genetics , RNA/metabolism , RNA, Messenger/metabolism , RNA, Untranslated/metabolism , RNA-Binding Proteins/metabolism , Transcriptome

2.

Uncovering new families and folds in the natural protein universe.

Durairaj, Janani; Waterhouse, Andrew M; Mets, Toomas; Brodiazhenko, Tetiana; Abdullah, Minhal; Studer, Gabriel; Tauriello, Gerardo; Akdel, Mehmet; Andreeva, Antonina; Bateman, Alex; Tenson, Tanel; Hauryliuk, Vasili; Schwede, Torsten; Pereira, Joana.

Nature ; 622(7983): 646-653, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37704037

ABSTRACT

We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database [1]. These models cover nearly all proteins that are known, including those challenging to annotate for function or putative biological role using standard homology-based approaches. In this study, we examine the extent to which the AlphaFold database has structurally illuminated this 'dark matter' of the natural protein universe at high predicted accuracy. We further describe the protein diversity that these models cover as an annotated interactive sequence similarity network, accessible at https://uniprot3d.org/atlas/AFDB90v4 . By searching for novelties from sequence, structure and semantic perspectives, we uncovered the β-flower fold, added several protein families to the Pfam database [2] and experimentally demonstrated that one of these belongs to a new superfamily of translation-targeting toxin-antitoxin systems, TumE-TumA. This work underscores the value of large-scale efforts in identifying, annotating and prioritizing new protein families. By leveraging the recent deep learning revolution in protein bioinformatics, we can now shed light on uncharted areas of the protein universe at an unprecedented scale, paving the way to innovations in life sciences and biotechnology.

Subjects

Databases, Protein , Deep Learning , Molecular Sequence Annotation , Protein Folding , Proteins , Structural Homology, Protein , Amino Acid Sequence , Internet , Proteins/chemistry , Proteins/classification , Proteins/metabolism

3.

Bacterial retrons encode phage-defending tripartite toxin-antitoxin systems.

Bobonis, Jacob; Mitosch, Karin; Mateus, André; Karcher, Nicolai; Kritikos, George; Selkrig, Joel; Zietek, Matylda; Monzon, Vivian; Pfalz, Birgit; Garcia-Santamarina, Sarela; Galardini, Marco; Sueki, Anna; Kobayashi, Callie; Stein, Frank; Bateman, Alex; Zeller, Georg; Savitski, Mikhail M; Elfenbein, Johanna R; Andrews-Polymenis, Helene L; Typas, Athanasios.

Nature ; 609(7925): 144-150, 2022 09.

Article in English | MEDLINE | ID: mdl-35850148

ABSTRACT

Retrons are prokaryotic genetic retroelements encoding a reverse transcriptase that produces multi-copy single-stranded DNA (msDNA) [1]. Despite decades of research on the biosynthesis of msDNA [2], the function and physiological roles of retrons have remained unknown. Here we show that Retron-Sen2 of Salmonella enterica serovar Typhimurium encodes an accessory toxin protein, STM14_4640, which we renamed RcaT. RcaT is neutralized by the reverse transcriptase-msDNA antitoxin complex, and becomes active upon perturbation of msDNA biosynthesis. The reverse transcriptase is required for binding to RcaT, and the msDNA is required for the antitoxin activity. The highly prevalent RcaT-containing retron family constitutes a new type of tripartite DNA-containing toxin-antitoxin system. To understand the physiological roles of such toxin-antitoxin systems, we developed toxin activation-inhibition conjugation (TAC-TIC), a high-throughput reverse genetics approach that identifies the molecular triggers and blockers of toxin-antitoxin systems. By applying TAC-TIC to Retron-Sen2, we identified multiple trigger and blocker proteins of phage origin. We demonstrate that phage-related triggers directly modify the msDNA, thereby activating RcaT and inhibiting bacterial growth. By contrast, prophage proteins circumvent retrons by directly blocking RcaT. Consistently, retron toxin-antitoxin systems act as abortive infection anti-phage defence systems, in line with recent reports [3,4]. Thus, RcaT retrons are tripartite DNA-regulated toxin-antitoxin systems, which use the reverse transcriptase-msDNA complex both as an antitoxin and as a sensor of phage protein activities.

Subjects

Antitoxins , Bacteriophages , Retroelements , Salmonella typhimurium , Toxin-Antitoxin Systems , Antitoxins/genetics , Bacteriophages/metabolism , DNA, Bacterial/genetics , DNA, Single-Stranded/genetics , Nucleic Acid Conformation , Prophages/metabolism , RNA-Directed DNA Polymerase/metabolism , Retroelements/genetics , Salmonella typhimurium/genetics , Salmonella typhimurium/growth & development , Salmonella typhimurium/virology , Toxin-Antitoxin Systems/genetics

4.

Highly accurate protein structure prediction for the human proteome.

Tunyasuvunakool, Kathryn; Adler, Jonas; Wu, Zachary; Green, Tim; Zielinski, Michal; Zídek, Augustin; Bridgland, Alex; Cowie, Andrew; Meyer, Clemens; Laydon, Agata; Velankar, Sameer; Kleywegt, Gerard J; Bateman, Alex; Evans, Richard; Pritzel, Alexander; Figurnov, Michael; Ronneberger, Olaf; Bates, Russ; Kohl, Simon A A; Potapenko, Anna; Ballard, Andrew J; Romera-Paredes, Bernardino; Nikolov, Stanislav; Jain, Rishub; Clancy, Ellen; Reiman, David; Petersen, Stig; Senior, Andrew W; Kavukcuoglu, Koray; Birney, Ewan; Kohli, Pushmeet; Jumper, John; Hassabis, Demis.

Nature ; 596(7873): 590-596, 2021 08.

Article in English | MEDLINE | ID: mdl-34293799

ABSTRACT

Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure [1]. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold2, at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) has very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.
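
The coverage figures above are fractions of all residues, not of proteins. As a minimal Python sketch (not the authors' code) of that bookkeeping, the helper below pools per-residue pLDDT scores across proteins. It assumes the AlphaFold confidence bands (pLDDT > 70 counted as confident, pLDDT > 90 as very high); the function name `plddt_coverage` and the list-of-lists input are hypothetical.

```python
from typing import Dict, Iterable, List


def plddt_coverage(per_protein_plddt: Iterable[List[float]],
                   confident: float = 70.0,
                   very_high: float = 90.0) -> Dict[str, float]:
    """Fraction of all residues above each pLDDT threshold.

    Hypothetical helper: each inner list holds the per-residue pLDDT
    scores (0-100) of one predicted protein structure.
    """
    total = confident_n = very_high_n = 0
    for scores in per_protein_plddt:
        for score in scores:
            total += 1
            # Assumed banding: a residue counts only if strictly above the threshold.
            if score > confident:
                confident_n += 1
            if score > very_high:
                very_high_n += 1
    if total == 0:
        raise ValueError("no residues supplied")
    # Pooling residues across proteins gives "% of residues", not "% of proteins".
    return {"confident": confident_n / total, "very_high": very_high_n / total}


# Toy input: two proteins, eight residues in total.
print(plddt_coverage([[95, 92, 80, 40], [60, 75, 91, 30]]))
# {'confident': 0.625, 'very_high': 0.375}
```

Because every residue above 90 is also above 70, the "very high" fraction is always a subset of the "confident" fraction, matching the way the two percentages relate in the abstract.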

Subjects

Computational Biology/standards , Deep Learning/standards , Models, Molecular , Protein Conformation , Proteome/chemistry , Datasets as Topic/standards , Diacylglycerol O-Acyltransferase/chemistry , Glucose-6-Phosphatase/chemistry , Humans , Membrane Proteins/chemistry , Protein Folding , Reproducibility of Results

5.

When will RNA get its AlphaFold moment?

Schneider, Bohdan; Sweeney, Blake Alexander; Bateman, Alex; Cerny, Jiri; Zok, Tomasz; Szachniuk, Marta.

Nucleic Acids Res ; 51(18): 9522-9532, 2023 Oct 13.

Article in English | MEDLINE | ID: mdl-37702120

ABSTRACT

The protein structure prediction problem has been solved for many types of proteins by AlphaFold. Recently, there has been considerable excitement to build off the success of AlphaFold and predict the 3D structures of RNAs. RNA prediction methods use a variety of techniques, from physics-based to machine learning approaches. We believe that there are challenges preventing the successful development of deep learning-based methods like AlphaFold for RNA in the short term. Broadly speaking, the challenges are the limited number of structures and alignments making data-hungry deep learning methods unlikely to succeed. Additionally, there are several issues with the existing structure and sequence data, as they are often of insufficient quality, highly biased and missing key information. Here, we discuss these challenges in detail and suggest some steps to remedy the situation. We believe that it is possible to create an accurate RNA structure prediction method, but it will require solving several data quality and volume issues, usage of data beyond simple sequence alignments, or the development of new less data-hungry machine learning methods.

6.

EMBL's European Bioinformatics Institute (EMBL-EBI) in 2022.

Thakur, Matthew; Bateman, Alex; Brooksbank, Cath; Freeberg, Mallory; Harrison, Melissa; Hartley, Matthew; Keane, Thomas; Kleywegt, Gerard; Leach, Andrew; Levchenko, Mariia; Morgan, Sarah; McDonagh, Ellen M; Orchard, Sandra; Papatheodorou, Irene; Velankar, Sameer; Vizcaino, Juan Antonio; Witham, Rick; Zdrazil, Barbara; McEntyre, Johanna.

Nucleic Acids Res ; 51(D1): D9-D17, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36477213

ABSTRACT

The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) is one of the world's leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory (EMBL), Europe's only intergovernmental life sciences organisation. This overview summarises the status of services that EMBL-EBI data resources provide to scientific communities globally. The scale, openness, rich metadata and extensive curation of EMBL-EBI added-value databases make them particularly well-suited as training sets for deep learning, machine learning and artificial intelligence applications, a selection of which are described here. The data resources at EMBL-EBI can catalyse such developments because they offer sustainable, high-quality data, collected in some cases over decades and made openly available to any researcher, globally. Our aim is for EMBL-EBI data resources to keep providing the foundations for tools and research insights that transform fields across the life sciences.

Subjects

Artificial Intelligence , Computational Biology , Data Management , Databases, Factual , Genome , Internet

7.

InterPro in 2022.

Paysan-Lafosse, Typhaine; Blum, Matthias; Chuguransky, Sara; Grego, Tiago; Pinto, Beatriz Lázaro; Salazar, Gustavo A; Bileschi, Maxwell L; Bork, Peer; Bridge, Alan; Colwell, Lucy; Gough, Julian; Haft, Daniel H; Letunic, Ivica; Marchler-Bauer, Aron; Mi, Huaiyu; Natale, Darren A; Orengo, Christine A; Pandurangan, Arun P; Rivoire, Catherine; Sigrist, Christian J A; Sillitoe, Ian; Thanki, Narmada; Thomas, Paul D; Tosatto, Silvio C E; Wu, Cathy H; Bateman, Alex.

Nucleic Acids Res ; 51(D1): D418-D427, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36350672

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. Here, we report recent developments with InterPro (version 90.0) and its associated software, including updates to data content and to the website. These developments extend and enrich the information provided by InterPro, and provide more user-friendly access to the data. Additionally, we have worked on adding Pfam website features to the InterPro website, as the Pfam website will be retired in late 2022. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB. Moreover, we report the development of a card game as a method of engaging the non-scientific community. Finally, we discuss the benefits and challenges brought by the use of artificial intelligence for protein structure prediction.

Subjects

Databases, Protein , Humans , Amino Acid Sequence , Artificial Intelligence , Internet , Proteins/chemistry , Software

8.

The European Bioinformatics Institute (EMBL-EBI) in 2021.

Cantelli, Gaia; Bateman, Alex; Brooksbank, Cath; Petrov, Anton I; Malik-Sheriff, Rahuman S; Ide-Smith, Michele; Hermjakob, Henning; Flicek, Paul; Apweiler, Rolf; Birney, Ewan; McEntyre, Johanna.

Nucleic Acids Res ; 50(D1): D11-D19, 2022 01 07.

Article in English | MEDLINE | ID: mdl-34850134

ABSTRACT

The European Bioinformatics Institute (EMBL-EBI) maintains a comprehensive range of freely available and up-to-date molecular data resources, which includes over 40 resources covering every major data type in the life sciences. This year's service update for EMBL-EBI includes new resources, PGS Catalog and AlphaFold DB, and updates on existing resources, including the COVID-19 Data Platform, trRosetta and RoseTTAFold models introduced in Pfam and InterPro, and the launch of Genome Integrations with Function and Sequence by UniProt and Ensembl. Furthermore, we highlight projects through which EMBL-EBI has contributed to the development of community-driven data standards and guidelines, including the Recommended Metadata for Biological Images (REMBI), and the BioModels Reproducibility Scorecard. Training is one of EMBL-EBI's core missions and a key component of the provision of bioinformatics services to users: this year's update includes many of the improvements that have been developed to EMBL-EBI's online training offering.

Subjects

Computational Biology/education , Computational Biology/methods , Databases, Factual , Academies and Institutes , Artificial Intelligence , COVID-19 , Databases, Factual/economics , Databases, Factual/statistics & numerical data , Databases, Pharmaceutical , Databases, Protein , Europe , Genome, Human , Humans , Information Storage and Retrieval , RNA, Untranslated/genetics , SARS-CoV-2/genetics

9.

Periscope Proteins are variable-length regulators of bacterial cell surface interactions.

Whelan, Fiona; Lafita, Aleix; Gilburt, James; Dégut, Clément; Griffiths, Samuel C; Jenkins, Huw T; St John, Alexander N; Paci, Emanuele; Moir, James W B; Plevin, Michael J; Baumann, Christoph G; Bateman, Alex; Potts, Jennifer R.

Proc Natl Acad Sci U S A ; 118(23)2021 06 08.

Article in English | MEDLINE | ID: mdl-34074781

ABSTRACT

Changes at the cell surface enable bacteria to survive in dynamic environments, such as diverse niches of the human host. Here, we reveal "Periscope Proteins" as a widespread mechanism of bacterial surface alteration mediated through protein length variation. Tandem arrays of highly similar folded domains can form an elongated rod-like structure; thus, variation in the number of domains determines how far an N-terminal host ligand binding domain projects from the cell surface. Supported by newly available long-read genome sequencing data, we propose that this class could contain over 50 distinct proteins, including those implicated in host colonization and biofilm formation by human pathogens. In large multidomain proteins, sequence divergence between adjacent domains appears to reduce interdomain misfolding. Periscope Proteins break this "rule," suggesting that their length variability plays an important role in regulating bacterial interactions with host surfaces, other bacteria, and the immune system.

Subjects

Bacterial Proteins , Membrane Proteins , Streptococcus gordonii , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Membrane Proteins/chemistry , Membrane Proteins/genetics , Membrane Proteins/metabolism , Streptococcus gordonii/chemistry , Streptococcus gordonii/genetics , Streptococcus gordonii/metabolism

10.

A STRP-ed definition of Structured Tandem Repeats in Proteins.

Monzon, Alexander Miguel; Arrías, Paula Nazarena; Elofsson, Arne; Mier, Pablo; Andrade-Navarro, Miguel A; Bevilacqua, Martina; Clementel, Damiano; Bateman, Alex; Hirsh, Layla; Fornasari, Maria Silvina; Parisi, Gustavo; Piovesan, Damiano; Kajava, Andrey V; Tosatto, Silvio C E.

J Struct Biol ; 215(4): 108023, 2023 12.

Article in English | MEDLINE | ID: mdl-37652396

ABSTRACT

Tandem Repeat Proteins (TRPs) are a class of proteins with repetitive amino acid sequences that have been studied extensively for over two decades. Different features at the level of sequence, structure, function and evolution have been attributed to them by various authors. Yet many of their salient features appear only when looking at specific subclasses of protein tandem repeats. Here, we attempt to rationalize the existing knowledge on TRPs by pointing out several dichotomies. The emerging picture is more nuanced than generally assumed and allows us to draw some boundaries of what is not a "proper" TRP. We conclude with an operational definition of a specific subset, which we have denominated STRPs (Structural Tandem Repeat Proteins), which separates a subclass of tandem repeats with distinctive features from several other less well-defined types of repeats. We believe that this definition will help researchers in the field to better characterize the biological meaning of this large yet largely understudied group of proteins.

Subjects

Proteins , Tandem Repeat Sequences , Proteins/genetics , Proteins/chemistry , Tandem Repeat Sequences/genetics , Amino Acid Sequence

11.

Domain shuffling of a highly mutable ligand-binding fold drives adhesin generation across the bacterial kingdom.

Barringer, Rob; Parnell, Alice E; Lafita, Aleix; Monzon, Vivian; Back, Catherine R; Madej, Mariusz; Potempa, Jan; Nobbs, Angela H; Burston, Steven G; Bateman, Alex; Race, Paul R.

Proteins ; 91(8): 1007-1020, 2023 08.

Article in English | MEDLINE | ID: mdl-36912614

ABSTRACT

Bacterial fibrillar adhesins are specialized extracellular polypeptides that promote the attachment of bacteria to the surfaces of other cells or materials. Adhesin-mediated interactions are critical for the establishment and persistence of stable bacterial populations within diverse environmental niches and are important determinants of virulence. The fibronectin (Fn)-binding fibrillar adhesin CshA, and its paralogue CshB, play important roles in host colonization by the oral commensal and opportunistic pathogen Streptococcus gordonii. As paralogues are often catalysts for functional diversification, we have probed the early stages of structural and functional divergence in Csh proteins by determining the X-ray crystal structure of the CshB adhesive domain NR2 and characterizing its Fn-binding properties in vitro. Despite sharing a common fold, CshB_NR2 displays an ~1.7-fold reduction in Fn-binding affinity relative to CshA_NR2. This correlates with reduced electrostatic charge in the Fn-binding cleft. Complementary bioinformatic studies reveal that homologues of CshA/B_NR2 domains are widely distributed in both Gram-positive and Gram-negative bacteria, where they are found housed within functionally cryptic multi-domain polypeptides. Our findings are consistent with the classification of Csh adhesins and their relatives as members of the recently defined polymer adhesin domain (PAD) family of bacterial proteins.

Subjects

Anti-Bacterial Agents , Membrane Proteins , Ligands , Membrane Proteins/chemistry , Gram-Negative Bacteria/metabolism , Gram-Positive Bacteria/metabolism , Adhesins, Bacterial/genetics , Adhesins, Bacterial/chemistry , Adhesins, Bacterial/metabolism , Bacterial Proteins/chemistry

12.

Computational strategies to combat COVID-19: useful tools to accelerate SARS-CoV-2 and coronavirus research.

Hufsky, Franziska; Lamkiewicz, Kevin; Almeida, Alexandre; Aouacheria, Abdel; Arighi, Cecilia; Bateman, Alex; Baumbach, Jan; Beerenwinkel, Niko; Brandt, Christian; Cacciabue, Marco; Chuguransky, Sara; Drechsel, Oliver; Finn, Robert D; Fritz, Adrian; Fuchs, Stephan; Hattab, Georges; Hauschild, Anne-Christin; Heider, Dominik; Hoffmann, Marie; Hölzer, Martin; Hoops, Stefan; Kaderali, Lars; Kalvari, Ioanna; von Kleist, Max; Kmiecinski, Renó; Kühnert, Denise; Lasso, Gorka; Libin, Pieter; List, Markus; Löchel, Hannah F; Martin, Maria J; Martin, Roman; Matschinske, Julian; McHardy, Alice C; Mendes, Pedro; Mistry, Jaina; Navratil, Vincent; Nawrocki, Eric P; O'Toole, Áine Niamh; Ontiveros-Palacios, Nancy; Petrov, Anton I; Rangel-Pineros, Guillermo; Redaschi, Nicole; Reimering, Susanne; Reinert, Knut; Reyes, Alejandro; Richardson, Lorna; Robertson, David L; Sadegh, Sepideh; Singer, Joshua B.

Brief Bioinform ; 22(2): 642-663, 2021 03 22.

Article in English | MEDLINE | ID: mdl-33147627

ABSTRACT

SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed as a rapid reaction to the need for fast detection, understanding and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to get insight into the evolution and pathogenesis of the virus. In this review, we cover bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, the discovery of potential drug targets and development of therapeutic strategies. For each tool, we briefly describe its use case and how it advances research specifically for SARS-CoV-2. All tools are free to use and available online, either through web applications or public code repositories. Contact: evbc@unj-jena.de.

Subjects

COVID-19/prevention & control , Computational Biology , SARS-CoV-2/isolation & purification , Biomedical Research , COVID-19/epidemiology , COVID-19/virology , Genome, Viral , Humans , Pandemics , SARS-CoV-2/genetics

13.

DPCfam: Unsupervised protein family classification by Density Peak Clustering of large sequence datasets.

Russo, Elena Tea; Barone, Federico; Bateman, Alex; Cozzini, Stefano; Punta, Marco; Laio, Alessandro.

PLoS Comput Biol ; 18(10): e1010610, 2022 10.

Article in English | MEDLINE | ID: mdl-36260616

ABSTRACT

Proteins that are known only at a sequence level outnumber those with an experimental characterization by orders of magnitude. Classifying protein regions (domains) into homologous families can generate testable functional hypotheses for yet unannotated sequences. Existing domain family resources typically use at least some degree of manual curation: they grow slowly over time and leave a large fraction of the protein sequence space unclassified. Here we describe automatic clustering by Density Peak Clustering of UniRef50 v. 2017_07, a protein sequence database including approximately 23M sequences. We performed a radical re-implementation of a pipeline we previously developed in order to allow handling millions of sequences and data volumes of the order of 3 terabytes. The modified pipeline, which we call DPCfam, finds ~45,000 protein clusters in UniRef50. Our automatic classification corresponds closely to those of the Pfam and ECOD resources: in particular, about 81% of medium-large Pfam families and 72% of ECOD families can be mapped to clusters generated by DPCfam. In addition, our protocol finds more than 14,000 clusters constituted of protein regions with no Pfam annotation, which are therefore candidates for representing novel protein families. These results are made available to the scientific community through a dedicated repository.
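
Density Peak Clustering, the algorithm DPCfam builds on, scores each point by its local density and by its distance to the nearest denser point; points high on both become cluster centres, and every other point inherits the label of its nearest denser neighbour. The following is a minimal, generic Python sketch of that idea on a precomputed distance matrix. It is not the DPCfam pipeline itself, which clusters alignment-derived protein regions. The function name, the cutoff `d_c` and the centre thresholds `rho_min`/`delta_min` are illustrative assumptions.

```python
from typing import List, Sequence


def density_peak_clustering(dist: Sequence[Sequence[float]], d_c: float,
                            rho_min: int, delta_min: float) -> List[int]:
    """Assign a cluster label to every point of a symmetric distance matrix."""
    n = len(dist)
    # Local density rho: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n          # distance to the nearest denser point
    nearest_denser = [-1] * n  # index of that point (-1 for the densest one)
    for rank, i in enumerate(order):
        if rank == 0:
            # Convention: the densest point gets its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_denser[i] = j
    labels = [-1] * n
    n_clusters = 0
    for i in order:
        is_centre = nearest_denser[i] == -1 or (
            rho[i] >= rho_min and delta[i] >= delta_min)
        if is_centre:
            labels[i] = n_clusters
            n_clusters += 1
        else:
            # The denser neighbour was visited earlier, so it is already labelled.
            labels[i] = labels[nearest_denser[i]]
    return labels


# Toy data: two well-separated groups on a line.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
dist = [[abs(a - b) for b in points] for a in points]
print(density_peak_clustering(dist, d_c=0.5, rho_min=1, delta_min=1.0))
# [0, 0, 0, 1, 1, 1]
```

Visiting points in decreasing density means each point's denser neighbour is always labelled first, so cluster assignment is a single pass after the centres are chosen. This all-pairs version is quadratic in the number of points, which is why the abstract stresses a re-implementation for millions of sequences.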

Subjects

Proteins , Databases, Protein , Proteins/genetics , Cluster Analysis , Amino Acid Sequence , Protein Domains

14.

Pfam: The protein families database in 2021.

Mistry, Jaina; Chuguransky, Sara; Williams, Lowri; Qureshi, Matloob; Salazar, Gustavo A; Sonnhammer, Erik L L; Tosatto, Silvio C E; Paladin, Lisanna; Raj, Shriya; Richardson, Lorna J; Finn, Robert D; Bateman, Alex.

Nucleic Acids Res ; 49(D1): D412-D419, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33125078

ABSTRACT

The Pfam database is a widely used resource for classifying protein sequences into families and domains. Since Pfam was last described in this journal, over 350 new families have been added in Pfam 33.1 and numerous improvements have been made to existing entries. To facilitate research on COVID-19, we have revised the Pfam entries that cover the SARS-CoV-2 proteome, and built new entries for regions that were not covered by Pfam. We have reintroduced Pfam-B which provides an automatically generated supplement to Pfam and contains 136 730 novel clusters of sequences that are not yet matched by a Pfam family. The new Pfam-B is based on a clustering by the MMseqs2 software. We have compared all of the regions in the RepeatsDB to those in Pfam and have started to use the results to build and refine Pfam repeat families. Pfam is freely available for browsing and download at http://pfam.xfam.org/.

Subjects

Computational Biology/statistics & numerical data , Databases, Protein , Proteins/metabolism , Proteome/metabolism , Animals , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Computational Biology/methods , Epidemics , Humans , Internet , Models, Molecular , Protein Structure, Tertiary , Proteins/chemistry , Proteins/genetics , Proteome/classification , Proteome/genetics , Repetitive Sequences, Amino Acid/genetics , SARS-CoV-2/genetics , SARS-CoV-2/physiology , Sequence Analysis, Protein/methods

15.

Rfam 14: expanded coverage of metagenomic, viral and microRNA families.

Kalvari, Ioanna; Nawrocki, Eric P; Ontiveros-Palacios, Nancy; Argasinska, Joanna; Lamkiewicz, Kevin; Marz, Manja; Griffiths-Jones, Sam; Toffano-Nioche, Claire; Gautheret, Daniel; Weinberg, Zasha; Rivas, Elena; Eddy, Sean R; Finn, Robert D; Bateman, Alex; Petrov, Anton I.

Nucleic Acids Res ; 49(D1): D192-D200, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33211869

ABSTRACT

Rfam is a database of RNA families where each of the 3444 families is represented by a multiple sequence alignment of known RNA sequences and a covariance model that can be used to search for additional members of the family. Recent developments have involved expert collaborations to improve the quality and coverage of Rfam data, focusing on microRNAs, viral and bacterial RNAs. We have completed the first phase of synchronising microRNA families in Rfam and miRBase, creating 356 new Rfam families and updating 40. We established a procedure for comprehensive annotation of viral RNA families starting with Flavivirus and Coronaviridae RNAs. We have also increased the coverage of bacterial and metagenome-based RNA families from the ZWD database. These developments have enabled a significant growth of the database, with the addition of 759 new families in Rfam 14. To facilitate further community contribution to Rfam, expert users are now able to build and submit new families using the newly developed Rfam Cloud family curation system. New Rfam website features include a new sequence similarity search powered by RNAcentral, as well as search and visualisation of families with pseudoknots. Rfam is freely available at https://rfam.org.

Subjects

Databases, Nucleic Acid , Metagenome , MicroRNAs/genetics , RNA, Bacterial/genetics , RNA, Untranslated/genetics , RNA, Viral/genetics , Bacteria/genetics , Bacteria/metabolism , Base Pairing , Base Sequence , Humans , Internet , MicroRNAs/classification , MicroRNAs/metabolism , Molecular Sequence Annotation , Nucleic Acid Conformation , RNA, Bacterial/classification , RNA, Bacterial/metabolism , RNA, Untranslated/classification , RNA, Untranslated/metabolism , RNA, Viral/classification , RNA, Viral/metabolism , Sequence Alignment , Sequence Analysis, RNA , Software , Viruses/genetics , Viruses/metabolism

16.

The InterPro protein families and domains database: 20 years on.

Blum, Matthias; Chang, Hsin-Yu; Chuguransky, Sara; Grego, Tiago; Kandasaamy, Swaathi; Mitchell, Alex; Nuka, Gift; Paysan-Lafosse, Typhaine; Qureshi, Matloob; Raj, Shriya; Richardson, Lorna; Salazar, Gustavo A; Williams, Lowri; Bork, Peer; Bridge, Alan; Gough, Julian; Haft, Daniel H; Letunic, Ivica; Marchler-Bauer, Aron; Mi, Huaiyu; Natale, Darren A; Necci, Marco; Orengo, Christine A; Pandurangan, Arun P; Rivoire, Catherine; Sigrist, Christian J A; Sillitoe, Ian; Thanki, Narmada; Thomas, Paul D; Tosatto, Silvio C E; Wu, Cathy H; Bateman, Alex; Finn, Robert D.

Nucleic Acids Res ; 49(D1): D344-D354, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33156333

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models which describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.

Subjects

Databases, Protein , Proteins/chemistry , Amino Acid Sequence , COVID-19/metabolism , Internet , Molecular Sequence Annotation , Protein Domains , Protein Interaction Maps , SARS-CoV-2/metabolism , Sequence Alignment

17.

Large-Scale Discovery of Microbial Fibrillar Adhesins and Identification of Novel Members of Adhesive Domain Families.

Monzon, Vivian; Bateman, Alex.

J Bacteriol ; 204(6): e0010722, 2022 06 21.

Article in English | MEDLINE | ID: mdl-35608365

ABSTRACT

Fibrillar adhesins are bacterial cell surface proteins that mediate interactions with the environment, including host cells during colonization or other bacteria during biofilm formation. These proteins are characterized by a stalk that projects the adhesive domain closer to the binding target. Fibrillar adhesins evolve quickly and thus can be difficult to computationally identify, yet they represent an important component for understanding bacterium-host interactions. To detect novel fibrillar adhesins, we developed a random forest prediction approach based on common characteristics we identified for this protein class. We applied this approach to Firmicutes and Actinobacteria proteomes, yielding over 6,500 confidently predicted fibrillar adhesins. To verify the approach, we investigated predicted fibrillar adhesins that lacked a known adhesive domain. Based on these proteins, we identified 24 sequence clusters representing potential novel members of adhesive domain families. We used AlphaFold to verify that 15 clusters showed structural similarity to known adhesive domains, such as the TED domain. Overall, our study has made a significant contribution to the number of known fibrillar adhesins and has enabled us to identify novel members of adhesive domain families involved in bacterial pathogenesis. IMPORTANCE: Fibrillar adhesins are a class of bacterial cell surface proteins that enable bacteria to interact with their environment. We developed a machine learning approach to identify fibrillar adhesins and applied this classification approach to the Firmicutes and Actinobacteria Reference Proteomes database. This method allowed us to detect a high number of novel fibrillar adhesins and also novel members of adhesive domain families. To confirm our predictions of these potential adhesin protein domains, we predicted their structure using the AlphaFold tool.

Subjects

Adhesives , Proteome , Adhesins, Bacterial/metabolism , Bacteria/genetics , Bacteria/metabolism , Bacterial Adhesion , Humans , Membrane Proteins/chemistry , Protein Domains

18.

Sequence analysis of tyrosine recombinases allows annotation of mobile genetic elements in prokaryotic genomes.

Smyshlyaev, Georgy; Bateman, Alex; Barabas, Orsolya.

Mol Syst Biol ; 17(5): e9880, 2021 05.

Article in English | MEDLINE | ID: mdl-34018328

ABSTRACT

Mobile genetic elements (MGEs) sequester and mobilize antibiotic resistance genes across bacterial genomes. Efficient and reliable identification of such elements is necessary to follow resistance spreading. However, automated tools for MGE identification are missing. Tyrosine recombinase (YR) proteins drive MGE mobilization and could provide markers for MGE detection, but they constitute a diverse family also involved in housekeeping functions. Here, we conducted a comprehensive survey of YRs from bacterial, archaeal, and phage genomes and developed a sequence-based classification system that dissects the characteristics of MGE-borne YRs. We revealed that MGE-related YRs evolved from non-mobile YRs by acquisition of a regulatory arm-binding domain that is essential for their mobility function. Based on these results, we further identified numerous unknown MGEs. This work provides a resource for comparative analysis and functional annotation of YRs and aids the development of computational tools for MGE annotation. Additionally, we reveal how YRs adapted to drive gene transfer across species and provide a tool to better characterize antibiotic resistance dissemination.

Subjects

Archaea/genetics , Bacteria/genetics , Fungi/genetics , Recombinases/metabolism , Sequence Analysis, Protein/methods , Archaea/enzymology , Bacteria/enzymology , Drug Resistance, Microbial , Evolution, Molecular , Fungi/enzymology , Interspersed Repetitive Sequences , Molecular Sequence Annotation , Systems Biology

19.

Defining the remarkable structural malleability of a bacterial surface protein Rib domain implicated in infection.

Whelan, Fiona; Lafita, Aleix; Griffiths, Samuel C; Cooper, Rachael E M; Whittingham, Jean L; Turkenburg, Johan P; Manfield, Iain W; St John, Alexander N; Paci, Emanuele; Bateman, Alex; Potts, Jennifer R.

Proc Natl Acad Sci U S A ; 116(52): 26540-26548, 2019 Dec 26.

Article in English | MEDLINE | ID: mdl-31818940

ABSTRACT

Streptococcus groups A and B cause serious infections, including early onset sepsis and meningitis in newborns. Rib domain-containing surface proteins are found associated with invasive strains and elicit protective immunity in animal models. Yet, despite their apparent importance in infection, the structure of the Rib domain was previously unknown. Structures of single Rib domains of differing length reveal a rare case of domain atrophy through deletion of 2 core antiparallel strands, resulting in the loss of an entire sheet of the β-sandwich from an immunoglobulin-like fold. Previously, observed variation in the number of Rib domains within these bacterial cell wall-attached proteins has been suggested as a mechanism of immune evasion. Here, the structure of tandem domains, combined with molecular dynamics simulations and small angle X-ray scattering, suggests that variability in Rib domain number would result in differential projection of an N-terminal host-colonization domain from the bacterial surface. The identification of 2 further structures where the typical B-D-E immunoglobulin β-sheet is replaced with an α-helix further confirms the extensive structural malleability of the Rib domain.

20.

Discovery of fibrillar adhesins across bacterial species.

Monzon, Vivian; Lafita, Aleix; Bateman, Alex.

BMC Genomics ; 22(1): 550, 2021 Jul 18.

Article in English | MEDLINE | ID: mdl-34275445

ABSTRACT

BACKGROUND: Fibrillar adhesins are long multidomain proteins that form filamentous structures at the cell surface of bacteria. They are an important yet understudied class of proteins composed of adhesive and stalk domains that mediate interactions of bacteria with their environment. This study aims to characterize fibrillar adhesins in a wide range of bacterial phyla and to identify new fibrillar adhesin-like proteins to improve our understanding of host-bacteria interactions. RESULTS: Through careful literature and computational searches, we identified 82 stalk and 27 adhesive domain families in fibrillar adhesins. Based on the presence of these domains in the UniProt Reference Proteomes database, we identified and analysed 3,542 fibrillar adhesin-like proteins across species of the most common bacterial phyla. We further enumerate the adhesive and stalk domain combinations found in nature and demonstrate that fibrillar adhesins have complex and variable domain architectures, which differ across species. By analysing the domain architecture of fibrillar adhesins, we show that in Gram-positive bacteria, adhesive domains are mostly positioned at the N-terminus and cell surface anchors at the C-terminus of the protein, while their positions are more variable in Gram-negative bacteria. We provide an open repository of fibrillar adhesin-like proteins and domains to enable further studies of this class of bacterial surface proteins. CONCLUSION: This study provides a domain-based characterization of fibrillar adhesins and demonstrates that they are widely found in species across the main bacterial phyla. We have discovered numerous novel fibrillar adhesins and improved our understanding of pathogenic adhesion and invasion mechanisms.

Subjects

Adhesins, Bacterial , Bacterial Proteins , Adhesins, Bacterial/genetics , Bacteria/genetics , Bacterial Adhesion , Bacterial Proteins/genetics , Gram-Positive Bacteria , Membrane Proteins
