1.
bioRxiv; 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38586054

ABSTRACT

Machine learning (ML) for protein design requires large protein fitness datasets generated by high-throughput experiments for training, fine-tuning, and benchmarking models. However, most models do not account for experimental noise inherent in these datasets, harming model performance and changing model rankings in benchmarking studies. Here, we develop FLIGHTED, a Bayesian method for generating fitness landscapes with calibrated errors from noisy high-throughput experimental data. We apply FLIGHTED to single-step selection assays such as phage display and to a novel high-throughput assay, DHARMA, that ties fitness to base editing activity. Our results show that FLIGHTED robustly generates fitness landscapes with accurate errors. We demonstrate that FLIGHTED improves model performance and enables the generation of protein fitness datasets of up to 10^6 variants with DHARMA. FLIGHTED can be used on any high-throughput assay and makes it easy for ML scientists to account for experimental noise when modeling protein fitness.

2.
Nucleic Acids Res; 50(D1): D553-D559, 2022 Jan 07.
Article in English | MEDLINE | ID: mdl-34850923

ABSTRACT

The Structural Classification of Proteins-extended (SCOPe, https://scop.berkeley.edu) knowledgebase aims to provide an accurate, detailed, and comprehensive description of the structural and evolutionary relationships amongst the majority of proteins of known structure, along with resources for analyzing the protein structures and their sequences. Structures from the PDB are divided into domains and classified using a combination of manual curation and highly precise automated methods. In the current release of SCOPe, 2.08, we have developed search and display tools for analysis of genetic variants we mapped to structures classified in SCOPe. In order to improve the utility of SCOPe to automated methods such as deep learning classifiers that rely on multiple alignment of sequences of homologous proteins, we have introduced new machine-parseable annotations that indicate aberrant structures as well as domains that are distinguished by a smaller repeat unit. We also classified structures from 74 of the largest Pfam families not previously classified in SCOPe, and we improved our algorithm to remove N- and C-terminal cloning, expression and purification sequences from SCOPe domains. SCOPe 2.08-stable classifies 106 976 PDB entries (about 60% of PDB entries).


Subjects
Computational Biology; Databases, Protein; Proteins/classification; Algorithms; Databases, Chemical; Gene Expression Regulation/genetics; Machine Learning; Proteins/genetics