OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization.

Ahdritz, Gustaf; Bouatta, Nazim; Floristean, Christina; Kadyan, Sachin; Xia, Qinghui; Gerecke, William; O'Donnell, Timothy J; Berenberg, Daniel; Fisk, Ian; Zanichelli, Niccolò; Zhang, Bo; Nowaczynski, Arkadiusz; Wang, Bei; Stepniewska-Dziubinska, Marta M; Zhang, Shang; Ojewole, Adegoke; Guney, Murat Efe; Biderman, Stella; Watkins, Andrew M; Ra, Stephen; Lorenzo, Pablo Ribalta; Nivon, Lucas; Weitzner, Brian; Ban, Yih-En Andrew; Chen, Shiyang; Zhang, Minjia; Li, Conglong; Song, Shuaiwen Leon; He, Yuxiong; Sorger, Peter K; Mostaque, Emad; Zhang, Zhao; Bonneau, Richard; AlQuraishi, Mohammed

Affiliations

Ahdritz G; Department of Systems Biology, Columbia University, New York, NY, USA.
Bouatta N; Harvard University, Cambridge, MA, USA.
Floristean C; Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA. nbouatta@gmail.com.
Kadyan S; Department of Systems Biology, Columbia University, New York, NY, USA.
Xia Q; Department of Systems Biology, Columbia University, New York, NY, USA.
Gerecke W; Department of Systems Biology, Columbia University, New York, NY, USA.
O'Donnell TJ; Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA.
Berenberg D; Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Fisk I; Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA.
Zanichelli N; Flatiron Institute, New York, NY, USA.
Zhang B; OpenBioML, Cambridge, MA, USA.
Nowaczynski A; Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT, USA.
Wang B; NVIDIA, Santa Clara, CA, USA.
Stepniewska-Dziubinska MM; NVIDIA, Santa Clara, CA, USA.
Zhang S; NVIDIA, Santa Clara, CA, USA.
Ojewole A; NVIDIA, Santa Clara, CA, USA.
Guney ME; NVIDIA, Santa Clara, CA, USA.
Biderman S; NVIDIA, Santa Clara, CA, USA.
Watkins AM; EleutherAI, New York, NY, USA.
Ra S; Booz Allen Hamilton, McLean, VA, USA.
Lorenzo PR; Prescient Design, Genentech, New York, NY, USA.
Nivon L; Prescient Design, Genentech, New York, NY, USA.
Weitzner B; NVIDIA, Santa Clara, CA, USA.
Ban YA; Cyrus Bio, Seattle, WA, USA.
Chen S; Outpace Bio, Seattle, WA, USA.
Zhang M; Arzeda, Seattle, WA, USA.
Li C; Rutgers University, New Brunswick, NJ, USA.
Song SL; University of Illinois at Urbana-Champaign, Champaign, IL, USA.
He Y; Microsoft, Redmond, WA, USA.
Sorger PK; Microsoft, Redmond, WA, USA.
Mostaque E; Microsoft, Redmond, WA, USA.
Zhang Z; Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA.
Bonneau R; Stability AI, Los Altos, CA, USA.
AlQuraishi M; Rutgers University, New Brunswick, NJ, USA.

Nat Methods; 21(8): 1514-1524, 2024 Aug.

Article in English | MEDLINE | ID: mdl-38744917

ABSTRACT

AlphaFold2 revolutionized structural biology with the ability to predict protein structures with exceptionally high accuracy. Its implementation, however, lacks the code and data required to train new models. These are necessary to (1) tackle new tasks, like protein-ligand complex structure prediction, (2) investigate the process by which the model learns and (3) assess the model's capacity to generalize to unseen regions of fold space. Here we report OpenFold, a fast, memory-efficient and trainable implementation of AlphaFold2. We train OpenFold from scratch, matching the accuracy of AlphaFold2. Having established parity, we find that OpenFold is remarkably robust at generalizing even when the size and diversity of its training set are deliberately limited, including near-complete elisions of classes of secondary structure elements. By analyzing intermediate structures produced during training, we also gain insights into the hierarchical manner in which OpenFold learns to fold. In sum, our studies demonstrate the power and utility of OpenFold, which we believe will prove to be a crucial resource for the protein modeling community.

Subjects

Models, Molecular; Protein Folding; Proteins; Proteins/chemistry; Computational Biology/methods; Software; Protein Conformation; Algorithms; Protein Structure, Secondary

Full text: 1 Database: MEDLINE Main subject: Proteins / Models, Molecular / Protein Folding Language: En Publication year: 2024 Document type: Article Country of affiliation: United States
