TT3D: Leveraging precomputed protein 3D sequence models to predict protein-protein interactions.
Sledzieski, Samuel; Devkota, Kapil; Singh, Rohit; Cowen, Lenore; Berger, Bonnie.
Affiliation
  • Sledzieski S; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, United States.
  • Devkota K; Department of Computer Science, Tufts University, 177 College Avenue, Medford, MA 02155, United States.
  • Singh R; Department of Biostatistics & Bioinformatics, Duke University, Durham, NC 27705, United States.
  • Cowen L; Department of Cell Biology, Duke University, Durham, NC 27705, United States.
  • Berger B; Department of Computer Science, Tufts University, 177 College Avenue, Medford, MA 02155, United States.
Bioinformatics; 39(11), 2023 Nov 01.
Article in En | MEDLINE | ID: mdl-37897686
ABSTRACT
MOTIVATION:

High-quality computational structural models are now precomputed and available for nearly every protein in UniProt. However, the best way to leverage these models to predict which pairs of proteins interact in a high-throughput manner is not immediately clear. The recent Foldseek method of van Kempen et al. encodes the structural information of distances and angles along the protein backbone into a linear string of the same length as the protein string, using tokens from a 21-letter discretized structural alphabet (3Di).

RESULTS:

We show that using both the amino acid sequence and the 3Di sequence generated by Foldseek as inputs to our recent deep-learning method, Topsy-Turvy, substantially improves the performance of predicting protein-protein interactions cross-species. Thus TT3D (Topsy-Turvy 3D) presents a way to reuse all the computational effort going into producing high-quality structural models from sequence, while being sufficiently lightweight so that high-quality binary protein-protein interaction predictions across all protein pairs can be made genome-wide.

AVAILABILITY AND IMPLEMENTATION:

TT3D is available at https://github.com/samsledje/D-SCRIPT. An archived version of the code at time of submission can be found at https://zenodo.org/records/10037674.
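To make the "same length" property of the 3Di string concrete, here is a minimal Python sketch of pairing per-residue sequence features with a one-hot encoding of the 3Di tokens. It is an illustration under stated assumptions, not the TT3D implementation: the abstract does not say how the two inputs are combined, so plain concatenation is assumed here, and the token order in `THREE_DI_ALPHABET` and the names `one_hot_3di` and `combine_features` are hypothetical.

```python
# Assumption: a 21-token alphabet written as 20 letters plus 'X' for
# unknown states. The order is illustrative, not taken from TT3D or Foldseek.
THREE_DI_ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"
TOKEN_INDEX = {tok: i for i, tok in enumerate(THREE_DI_ALPHABET)}


def one_hot_3di(three_di):
    """Return one 21-dimensional one-hot row per 3Di token."""
    rows = []
    for tok in three_di.upper():
        row = [0.0] * len(THREE_DI_ALPHABET)
        # Unrecognized tokens fall back to the 'X' column.
        row[TOKEN_INDEX.get(tok, TOKEN_INDEX["X"])] = 1.0
        rows.append(row)
    return rows


def combine_features(residue_features, three_di):
    """Append the one-hot 3Di row to each residue's feature vector.

    Because the 3Di string has exactly one token per residue, the two
    inputs can be aligned position by position.
    """
    if len(residue_features) != len(three_di):
        raise ValueError("3Di string must have one token per residue")
    return [
        list(features) + structure
        for features, structure in zip(residue_features, one_hot_3di(three_di))
    ]
```

For a protein of length L with d-dimensional residue features, `combine_features` returns L rows of d + 21 values each; a length mismatch between the two inputs raises `ValueError` rather than silently truncating.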

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Software / Proteins Language: En Journal: Bioinformatics Journal subject: Medical Informatics Year: 2023 Document type: Article Affiliation country: United States Publication country: England / Scotland / GB / Great Britain / UK / United Kingdom
