A deep learning genome-mining strategy for biosynthetic gene cluster prediction.
Hannigan, Geoffrey D; Prihoda, David; Palicka, Andrej; Soukup, Jindrich; Klempir, Ondrej; Rampula, Lena; Durcak, Jindrich; Wurst, Michael; Kotowski, Jakub; Chang, Dan; Wang, Rurun; Piizzi, Grazia; Temesi, Gergely; Hazuda, Daria J; Woelk, Christopher H; Bitton, Danny A.
Affiliation
  • Hannigan GD; Exploratory Science Center, Merck & Co., Inc., Cambridge, Massachusetts, USA.
  • Prihoda D; Big Data Solutions, MSD Czech Republic s.r.o., Prague, Czech Republic.
  • Palicka A; Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology, Prague, Czech Republic.
  • Soukup J; AI & Big Data Analytics, MSD Czech Republic s.r.o., Prague, Czech Republic.
  • Klempir O; Data Science, MSD Czech Republic s.r.o., Prague, Czech Republic.
  • Rampula L; Bioinformatics & Cheminformatics Solutions, MSD Czech Republic s.r.o., Prague, Czech Republic.
  • Durcak J; NLP, MSD Czech Republic s.r.o., Prague, Czech Republic.
  • Wurst M; Bioinformatics & Cheminformatics Solutions, MSD Czech Republic s.r.o., Prague, Czech Republic.
  • Kotowski J; AI & Big Data Analytics, MSD Czech Republic s.r.o., Prague, Czech Republic.
  • Chang D; AI & Big Data Analytics, MSD Czech Republic s.r.o., Prague, Czech Republic.
  • Wang R; Genetics & Pharmacogenomics, Merck & Co., Inc., Boston, MA, USA.
  • Piizzi G; Exploratory Science Center, Merck & Co., Inc., Cambridge, Massachusetts, USA.
  • Temesi G; Exploratory Science Center, Merck & Co., Inc., Cambridge, Massachusetts, USA.
  • Hazuda DJ; Bioinformatics & Cheminformatics Solutions, MSD Czech Republic s.r.o., Prague, Czech Republic.
  • Woelk CH; Exploratory Science Center, Merck & Co., Inc., Cambridge, Massachusetts, USA.
  • Bitton DA; Infectious Diseases and Vaccine Research, MRL, Merck & Co., Inc., West Point, PA, USA.
Nucleic Acids Res; 47(18): e110, 2019-10-10.
Article in English | MEDLINE | ID: mdl-31400112
ABSTRACT
Natural products represent a rich reservoir of small molecule drug candidates utilized as antimicrobial drugs, anticancer therapies, and immunomodulatory agents. These molecules are microbial secondary metabolites synthesized by co-localized genes termed Biosynthetic Gene Clusters (BGCs). The increase in full microbial genomes and similar resources has led to development of BGC prediction algorithms, although their precision and ability to identify novel BGC classes could be improved. Here we present a deep learning strategy (DeepBGC) that offers reduced false positive rates in BGC identification and an improved ability to extrapolate and identify novel BGC classes compared to existing machine-learning tools. We supplemented this with random forest classifiers that accurately predicted BGC product classes and potential chemical activity. Application of DeepBGC to bacterial genomes uncovered previously undetectable putative BGCs that may code for natural products with novel biologic activities. The improved accuracy and classification ability of DeepBGC represents a major addition to in-silico BGC identification.
Subjects

Full text: 1 | Database: MEDLINE | Main subject: Multigene Family / Computational Biology / Biosynthetic Pathways / Data Mining | Study type: Prognostic_studies / Risk_factors_studies | Language: En | Publication year: 2019 | Document type: Article