Protein NMR assignment by isotope pattern recognition.

Rasulov, Uluk; Wang, Harrison K; Viennet, Thibault; Droemer, Maxim A; Matosin, Srdan; Schindler, Sebastian; Sun, Zhen-Yu J; Mureddu, Luca; Vuister, Geerten W; Robson, Scott A; Arthanari, Haribabu; Kuprov, Ilya

Affiliation

Rasulov U; School of Chemistry, University of Southampton, University Road, Southampton SO17 1BJ, UK.
Wang HK; Department of Biochemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA.
Viennet T; Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA 02215, USA.
Droemer MA; Department of Chemistry and iNANO, Aarhus University, Langelandsgade 140, 8000 Aarhus C, Denmark.
Matosin S; Faculty for Chemistry and Pharmacy, Ludwig-Maximilians-Universität München, Munich, Germany.
Schindler S; Department of Biochemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA.
Sun ZJ; Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA 02215, USA.
Mureddu L; Department of Biochemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA.
Vuister GW; Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA 02215, USA.
Robson SA; Department of Biochemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA.
Arthanari H; Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA 02215, USA.
Kuprov I; Department of Molecular and Cell Biology, Institute for Structural and Chemical Biology, University of Leicester, Lancaster Road, Leicester LE1 7HB, UK.

Sci Adv; 10(36): eado0403, 2024 Sep 06.

Article in English | MEDLINE | ID: mdl-39231223

ABSTRACT

The current standard method for amino acid signal identification in protein NMR spectra is sequential assignment using triple-resonance experiments. Good software and elaborate heuristics exist, but the process remains laboriously manual. Machine learning does help, but its training databases need millions of samples that cover all relevant physics and every kind of instrumental artifact. In this communication, we offer a solution to this problem. We propose polyadic decompositions to store millions of simulated three-dimensional NMR spectra, on-the-fly generation of artifacts during training, a probabilistic way to incorporate prior and posterior information, and integration with the industry standard CcpNmr software framework. The resulting neural nets take [1H,13C] slices of mixed pyruvate-labeled HNCA spectra (different CA signal shapes for different residue types) and return an amino acid probability table. In combination with primary sequence information, backbones of common proteins (GB1, MBP, and INMT) are rapidly assigned from just the HNCA spectrum.
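The storage idea mentioned in the abstract can be illustrated with a canonical polyadic form, in which a three-dimensional array is written as a sum of rank-one terms. The sketch below is not the authors' implementation: it assumes plain Lorentzian lineshapes, and the axis ranges, peak positions, and function names (`lorentzian`, `cp_factors`, `full_spectrum`, `slice_at`) are hypothetical choices made only for illustration. It shows that a [1H,13C] slice can be rebuilt directly from the factor matrices without ever forming the full cube.

```python
import numpy as np

def lorentzian(axis, centre, width):
    """Absorption-mode Lorentzian line of unit height sampled on a 1D axis."""
    half = width / 2.0
    return half**2 / ((axis - centre) ** 2 + half**2)

def cp_factors(axes, peaks):
    """Return peak amplitudes and one factor matrix per dimension.

    Column r of each factor matrix is the 1D lineshape of peak r along
    that dimension, so every peak contributes one rank-one term.
    """
    factors = [
        np.stack([lorentzian(ax, p["centre"][d], p["width"][d]) for p in peaks], axis=1)
        for d, ax in enumerate(axes)
    ]
    amps = np.array([p["amp"] for p in peaks], dtype=float)
    return amps, factors

def full_spectrum(amps, factors):
    """Expand the polyadic form into the dense 3D array (for checking only)."""
    a, b, c = factors
    return np.einsum("r,ir,jr,kr->ijk", amps, a, b, c)

def slice_at(amps, factors, k):
    """Rebuild the 2D slice at index k of the third axis from the factors alone."""
    a, b, c = factors
    return np.einsum("r,ir,jr->ij", amps * c[k], a, b)

# Hypothetical axes (ppm) and peaks, chosen only for illustration.
axes = [
    np.linspace(6.0, 10.0, 64),     # 1H
    np.linspace(40.0, 70.0, 48),    # 13C
    np.linspace(100.0, 135.0, 32),  # 15N
]
peaks = [
    {"centre": (8.2, 55.0, 120.0), "width": (0.05, 0.6, 0.8), "amp": 1.0},
    {"centre": (7.6, 61.5, 112.0), "width": (0.05, 0.6, 0.8), "amp": 0.7},
]

amps, factors = cp_factors(axes, peaks)
stored = amps.size + sum(f.size for f in factors)  # numbers kept in polyadic form
dense = 64 * 48 * 32                               # numbers in the dense cube
print(f"polyadic form: {stored} values, dense cube: {dense} values")
```

Under these assumptions the saving comes from storing one short vector per peak per dimension instead of every grid point. Signals with more structured shapes, such as the residue-specific CA patterns described above, would presumably need more terms per signal.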

Subject(s)

Proteins; Proteins/chemistry; Nuclear Magnetic Resonance, Biomolecular/methods; Software; Amino Acids/chemistry; Algorithms; Isotopes/chemistry; Machine Learning

Full text: 1 | Database: MEDLINE | Main subject: Proteins | Language: English | Journal: Sci Adv | Year: 2024 | Document type: Article