mtx-COBRA: Subcellular localization prediction for bacterial proteins.
Arora, Isha; Kummer, Arkadij; Zhou, Hao; Gadjeva, Mihaela; Ma, Eric; Chuang, Gwo-Yu; Ong, Edison.
Affiliation
  • Arora I; Moderna, Inc., 200 Technology Square, Cambridge, MA 02139, USA.
  • Kummer A; Moderna, Inc., 200 Technology Square, Cambridge, MA 02139, USA.
  • Zhou H; Moderna, Inc., 200 Technology Square, Cambridge, MA 02139, USA.
  • Gadjeva M; Moderna, Inc., 200 Technology Square, Cambridge, MA 02139, USA.
  • Ma E; Moderna, Inc., 200 Technology Square, Cambridge, MA 02139, USA.
  • Chuang GY; Moderna, Inc., 200 Technology Square, Cambridge, MA 02139, USA.
  • Ong E; Moderna, Inc., 200 Technology Square, Cambridge, MA 02139, USA. Electronic address: edison.ong@modernatx.com.
Comput Biol Med; 171: 108114, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38401450
ABSTRACT

BACKGROUND:

Bacteria can have beneficial effects on our health and environment; however, many are responsible for serious infectious diseases, warranting the need for vaccines against such pathogens. Bioinformatic and experimental technologies are crucial for the development of vaccines. The vaccine design pipeline requires identification of bacteria-specific antigens that can be recognized and can induce a response by the immune system upon infection. Immune system recognition is influenced by the location of a protein. Methods have been developed to determine the subcellular localization (SCL) of proteins in prokaryotes and eukaryotes. Bioinformatic tools such as PSORTb can be employed to determine SCL of proteins, which would be tedious to perform experimentally. Unfortunately, PSORTb often predicts many proteins as having an "Unknown" SCL, reducing the number of antigens to evaluate as potential vaccine targets.

METHOD:

We present a new pipeline called subCellular lOcalization prediction for BacteRiAl Proteins (mtx-COBRA). mtx-COBRA uses Meta's protein language model, Evolutionary Scale Modeling, combined with an Extreme Gradient Boosting machine learning model to identify SCL of bacterial proteins based on amino acid sequence. This pipeline is trained on a curated dataset that combines data from UniProt and the publicly available ePSORTdb dataset.
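
The two-stage shape of such a pipeline (turn each sequence into a fixed-length vector, then classify the vector) can be sketched as below. This is only an illustration, not the authors' implementation: the abstract does not state which ESM model or which XGBoost settings are used, so the sketch swaps in an amino acid frequency vector for the language-model embedding and a nearest-centroid rule for the gradient-boosted classifier. The function and class names, the toy sequences, and the label strings are all hypothetical.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(sequence):
    """Stand-in embedding (assumption): amino acid frequencies.

    In the pipeline described above, a protein language model would
    produce this fixed-length vector instead.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


class NearestCentroid:
    """Stand-in classifier (assumption) in place of gradient boosting."""

    def fit(self, X, y):
        sums = {}
        sizes = Counter(y)
        for vec, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(vec))
            for i, value in enumerate(vec):
                acc[i] += value
        # One mean vector per localization class.
        self.centroids_ = {
            label: [v / sizes[label] for v in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        return [
            min(self.centroids_, key=lambda c: math.dist(vec, self.centroids_[c]))
            for vec in X
        ]


# Toy, made-up training data: (sequence, localization label).
TRAIN = [
    ("MKKLLLAVAVLLALLA", "CytoplasmicMembrane"),
    ("LLVVAALLIIGGLLAV", "CytoplasmicMembrane"),
    ("MSEEKDDRKEEDKRQE", "Cytoplasmic"),
    ("DEKRDEEKKRDEQEND", "Cytoplasmic"),
]

model = NearestCentroid().fit(
    [embed(seq) for seq, _ in TRAIN], [label for _, label in TRAIN]
)


def predict_scl(sequence):
    """Embed one sequence and return its predicted localization label."""
    return model.predict([embed(sequence)])[0]
```

Keeping the embedding step and the classifier behind separate interfaces means either stand-in could be replaced by a real language-model embedding or a boosted-tree model without changing `predict_scl`.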

RESULTS:

Using benchmarking analyses, nested 5-fold cross-validation, and leave-one-pathogen-out methods, followed by testing on the held-out dataset, we show that our pipeline predicts the SCL of bacterial proteins more accurately than PSORTb.
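
A leave-one-pathogen-out evaluation holds out every protein from one pathogen at a time, so a model is never tested on an organism it saw during training. The sketch below shows one way to generate such splits. It is a minimal illustration under assumptions: the record layout `(pathogen, sequence, label)`, the function name, and the toy records are hypothetical, and the nested 5-fold cross-validation mentioned above is not shown.

```python
from collections import defaultdict


def leave_one_pathogen_out(records):
    """Yield (held_out_pathogen, train_indices, test_indices).

    `records` is a list of (pathogen, sequence, label) tuples; this
    layout is an assumption made for the example.
    """
    by_pathogen = defaultdict(list)
    for idx, (pathogen, _sequence, _label) in enumerate(records):
        by_pathogen[pathogen].append(idx)

    # Sort pathogen names so the split order is deterministic.
    for pathogen in sorted(by_pathogen):
        test_idx = by_pathogen[pathogen]
        held_out = set(test_idx)
        train_idx = [i for i in range(len(records)) if i not in held_out]
        yield pathogen, train_idx, test_idx


# Toy, made-up records for illustration only.
RECORDS = [
    ("pathogen_A", "MKKLLLAVAV", "CytoplasmicMembrane"),
    ("pathogen_B", "MSEEKDDRKE", "Cytoplasmic"),
    ("pathogen_A", "DEKRDEEKKR", "Cytoplasmic"),
    ("pathogen_C", "LLVVAALLII", "CytoplasmicMembrane"),
    ("pathogen_B", "AVLLIAGVLL", "CytoplasmicMembrane"),
]

SPLITS = list(leave_one_pathogen_out(RECORDS))
```

Each split would then be used to train on `train_indices` and score predictions on `test_indices`, giving one score per held-out pathogen.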

CONCLUSIONS:

mtx-COBRA provides an accessible pipeline that can more efficiently classify bacterial proteins with currently "Unknown" SCLs than existing bioinformatic and experimental methods.

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Bacterial Proteins / Vaccines Language: English Journal: Comput Biol Med Year: 2024 Document type: Article Affiliation country: United States

...