Search | Biblioteca Virtual em Saúde

The generative capacity of probabilistic protein sequence models.

McGee, Francisco; Hauri, Sandro; Novinger, Quentin; Vucetic, Slobodan; Levy, Ronald M; Carnevale, Vincenzo; Haldane, Allan.

Nat Commun; 12(1): 6302, 2021 Nov 02.

Article in English | MEDLINE | ID: mdl-34728624

ABSTRACT

Potts models and variational autoencoders (VAEs) have recently gained popularity as generative protein sequence models (GPSMs) to explore fitness landscapes and predict mutation effects. Despite encouraging results, current model evaluation metrics leave unclear whether GPSMs faithfully reproduce the complex multi-residue mutational patterns observed in natural sequences due to epistasis. Here, we develop a set of sequence statistics to assess the "generative capacity" of three current GPSMs: the pairwise Potts Hamiltonian, the VAE, and the site-independent model. We show that the Potts model's generative capacity is largest, as the higher-order mutational statistics generated by the model agree with those observed for natural sequences, while the VAE's lies between the Potts and site-independent models. Importantly, our work provides a new framework for evaluating and interpreting GPSM accuracy which emphasizes the role of higher-order covariation and epistasis, with broader implications for probabilistic sequence models in general.
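The gap between a site-independent model and natural sequences can be illustrated with a small sketch. The example below is not the authors' code or their statistics: it assumes a toy integer-encoded alignment (the hypothetical `make_toy_msa` helper, with two perfectly coupled columns), fits a site-independent model from per-column frequencies, samples from it, and compares pairwise covariation C_ij(a,b) = f_ij(a,b) - f_i(a) f_j(b) in the "natural" and generated sets. All function names are illustrative.

```python
import numpy as np


def site_frequencies(msa, q):
    """Per-column residue frequencies f_i(a) for an integer MSA of shape (N, L)."""
    n, length = msa.shape
    freqs = np.zeros((length, q))
    for i in range(length):
        freqs[i] = np.bincount(msa[:, i], minlength=q) / n
    return freqs


def sample_site_independent(freqs, n_samples, rng):
    """Draw sequences column by column, ignoring any coupling between sites."""
    length, q = freqs.shape
    cols = [rng.choice(q, size=n_samples, p=freqs[i]) for i in range(length)]
    return np.stack(cols, axis=1)


def pairwise_covariation(msa, q):
    """C_ij(a,b) = f_ij(a,b) - f_i(a) f_j(b), returned with shape (L, L, q, q)."""
    n = msa.shape[0]
    onehot = np.eye(q)[msa]  # shape (N, L, q)
    f_pair = np.einsum("nia,njb->ijab", onehot, onehot) / n
    f_site = onehot.mean(axis=0)
    return f_pair - np.einsum("ia,jb->ijab", f_site, f_site)


def make_toy_msa(n, rng):
    """Hypothetical toy data: column 1 copies column 0, column 2 is independent."""
    a = rng.integers(0, 2, size=n)
    c = rng.integers(0, 2, size=n)
    return np.stack([a, a, c], axis=1)


rng = np.random.default_rng(0)
q = 2  # toy alphabet size; real protein alignments use 20 residues plus a gap
natural = make_toy_msa(5000, rng)
freqs = site_frequencies(natural, q)
generated = sample_site_independent(freqs, 5000, rng)

c_nat = pairwise_covariation(natural, q)
c_gen = pairwise_covariation(generated, q)
print("natural   C_01(0,0):", c_nat[0, 1, 0, 0])
print("generated C_01(0,0):", c_gen[0, 1, 0, 0])
```

In this toy setting the generated sequences match the single-site frequencies but lose the coupling between columns 0 and 1, which is the kind of discrepancy that covariation-based statistics are meant to expose. The abstract's higher-order statistics extend the same idea beyond pairs of positions.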

Subjects

Mutation; Proteins/chemistry; Sequence Alignment/methods; Algorithms; Amino Acid Sequence; Computer Simulation; Databases, Protein; Humans; Models, Statistical; Protein Structural Elements; Proteins/genetics; Structure-Activity Relationship

Structure motif-centric learning framework for inorganic crystalline systems.

Banjade, Huta R; Hauri, Sandro; Zhang, Shanshan; Ricci, Francesco; Gong, Weiyi; Hautier, Geoffroy; Vucetic, Slobodan; Yan, Qimin.

Sci Adv; 7(17), 2021 Apr.

Article in English | MEDLINE | ID: mdl-33883136

ABSTRACT

Incorporating physical principles into a machine learning (ML) architecture is a fundamental step toward the continued development of artificial intelligence for inorganic materials. Inspired by Pauling's rule, we propose that structure motifs in inorganic crystals can serve as a central input to a machine learning framework. We demonstrate that the presence of structure motifs and their connections in a large set of crystalline compounds can be converted into unique vector representations using an unsupervised learning algorithm. To demonstrate the use of structure motif information, we create a motif-centric learning framework by combining motif information with atom-based graph neural networks to form an atom-motif dual graph network (AMDNet), which is more accurate in predicting electronic structure properties of metal oxides, such as bandgaps. The work illustrates a route toward the fundamental design of graph neural network learning architectures for complex materials by incorporating beyond-atom physical principles.
