PhANNs, a fast and accurate tool and web server to classify phage structural proteins.
Cantu, Vito Adrian; Salamon, Peter; Seguritan, Victor; Redfield, Jackson; Salamon, David; Edwards, Robert A; Segall, Anca M.
Affiliation
  • Cantu VA; Computational Science Research Center, San Diego State University, San Diego, United States of America.
  • Salamon P; Viral Information Institute, San Diego State University, San Diego, United States of America.
  • Seguritan V; Viral Information Institute, San Diego State University, San Diego, United States of America.
  • Redfield J; Department of Mathematics and Statistics, San Diego State University, San Diego, United States of America.
  • Salamon D; Computational Science Research Center, San Diego State University, San Diego, United States of America.
  • Edwards RA; Department of Biology, San Diego State University, San Diego, United States of America.
  • Segall AM; Department of Mathematics and Statistics, San Diego State University, San Diego, United States of America.
PLoS Comput Biol; 16(11): e1007845, 2020 Nov.
Article in English | MEDLINE | ID: mdl-33137102
For any given bacteriophage genome or phage-derived sequences in metagenomic data sets, we are unable to assign a function to 50-90% of genes, or more. Structural protein-encoding genes constitute a large fraction of the average phage genome and are among the most divergent and difficult-to-identify genes using homology-based methods. To understand the functions encoded by phages, their contributions to their environments, and to help gauge their utility as potential phage therapy agents, we have developed a new approach to classify phage ORFs into ten major classes of structural proteins or into an "other" category. The resulting tool is named PhANNs (Phage Artificial Neural Networks). We built a database of 538,213 manually curated phage protein sequences that we split into eleven subsets (10 for cross-validation, one for testing) using a novel clustering method that ensures there are no homologous proteins between sets yet maintains the maximum sequence diversity for training. An Artificial Neural Network ensemble trained on features extracted from those sets reached a test F1-score of 0.875 and test accuracy of 86.2%. PhANNs can rapidly classify proteins into one of the ten structural classes or, if not predicted to fall in one of the ten classes, as "other," providing a new approach for functional annotation of phage proteins. PhANNs is open source and can be run from our web server or installed locally.
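The abstract does not spell out the clustering method used to build the eleven subsets, so the following is only a minimal sketch of the general idea of a cluster-aware split: whole clusters are assigned to subsets, so related sequences never end up on both sides of a train/test boundary. The function name `split_by_cluster`, the `cluster_of` input, and the greedy balancing rule are assumptions made for illustration; they are not the authors' procedure and do nothing to maximize sequence diversity.

```python
from collections import defaultdict


def split_by_cluster(cluster_of, n_subsets=11):
    """Assign whole clusters to subsets so no cluster spans two subsets.

    cluster_of: dict mapping a protein ID to a cluster label (hypothetical
    input, e.g. the output of any sequence-clustering tool).
    Returns a dict mapping protein ID -> subset index in [0, n_subsets).
    """
    # Group protein IDs by their cluster label.
    members = defaultdict(list)
    for protein_id, cluster_id in cluster_of.items():
        members[cluster_id].append(protein_id)

    sizes = [0] * n_subsets
    subset_of = {}
    # Greedy balancing (an assumption): place the largest clusters first,
    # each into whichever subset is currently smallest.
    ordered = sorted(members, key=lambda c: (-len(members[c]), str(c)))
    for cluster_id in ordered:
        target = sizes.index(min(sizes))
        for protein_id in members[cluster_id]:
            subset_of[protein_id] = target
        sizes[target] += len(members[cluster_id])
    return subset_of


# Toy data: 100 made-up protein IDs spread over 25 made-up clusters.
toy = {f"p{i}": f"c{i % 25}" for i in range(100)}
assignment = split_by_cluster(toy)
```

In a setup like the one described, ten of the resulting subsets could serve as cross-validation folds and the remaining one as the held-out test set.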
Full text: 1 | Database: MEDLINE | Main subject: Bacteriophages / Viral Structural Proteins / Internet / Databases, Protein | Type of study: Prognostic_studies | Language: En | Journal: PLoS Comput Biol | Journal subject: BIOLOGY / MEDICAL INFORMATICS | Year of publication: 2020 | Document type: Article | Country of affiliation: United States
