1.
BMC Bioinformatics ; 25(1): 204, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38824535

ABSTRACT

BACKGROUND: Protein solubility is a critically important physicochemical property closely related to protein expression. For example, it is one of the main factors to be considered in the design and production of antibody drugs and a prerequisite for realizing various protein functions. Although several solubility prediction models have emerged in recent years, many of them capture only the information embedded in one-dimensional amino acid sequences, resulting in unsatisfactory predictive performance. RESULTS: In this study, we introduce a novel Graph Attention network-based protein Solubility model, GATSol, which represents the 3D structure of a protein as a protein graph. In addition to amino acid node features extracted by a state-of-the-art protein large language model, GATSol uses amino acid distance maps generated with AlphaFold. Rigorous testing on the independent eSOL and Saccharomyces cerevisiae test datasets shows that GATSol outperforms most recently introduced models, especially with respect to the coefficient of determination R², which reaches 0.517 and 0.424, respectively. It outperforms the current state-of-the-art GraphSol by 18.4% on the S. cerevisiae_test set. CONCLUSIONS: GATSol captures 3D features of proteins by building protein graphs, which significantly improves the accuracy of protein solubility prediction. Recent advances in protein structure modeling allow our method to incorporate spatial features extracted from predicted structures while requiring only protein sequences as input, which simplifies the entire graph neural network prediction process and makes it more user-friendly and efficient. As a result, GATSol may help prioritize highly soluble proteins, ultimately reducing the cost and effort of experimental work. The source code and data of the GATSol model are freely available at https://github.com/binbinbinv/GATSol .
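
The idea of turning a distance map into a protein graph can be illustrated with a minimal sketch. This is not the GATSol implementation: the function name `contact_edges`, the 10 Å cutoff, and the toy coordinates are assumptions made for illustration, and the abstract does not state which cutoff or edge definition the authors use.

```python
from math import dist


def contact_edges(coords, cutoff=10.0):
    """Return undirected edges (i, j), i < j, for residue pairs closer than `cutoff`.

    `coords` holds one 3D point per residue (for example C-alpha positions).
    The 10.0 default is an illustrative assumption, not a value from the paper.
    """
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            # One entry of the distance map: Euclidean distance between residues i and j.
            if dist(coords[i], coords[j]) < cutoff:
                edges.append((i, j))
    return edges


# Toy coordinates (hypothetical, in angstroms): three nearby residues and one far away.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
print(contact_edges(coords))
```

In a graph neural network setting, an edge list like this would be paired with per-residue feature vectors (the node features), with the isolated fourth residue contributing no edges.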


Subjects
Proteins , Solubility , Proteins/chemistry , Proteins/metabolism , Protein Conformation , Protein Databases , Computational Biology/methods , Software , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae/chemistry , Algorithms , Molecular Models , Amino Acid Sequence
2.
bioRxiv ; 2024 Sep 21.
Article in English | MEDLINE | ID: mdl-39345488

ABSTRACT

Purpose: We previously developed an approach to calibrate computational tools for clinical variant classification, updating recommendations for the reliable use of variant impact predictors to provide evidence strength up to Strong. A new generation of tools using distinctive approaches has since been released, and these methods must be independently calibrated for clinical application. Method: Using our local posterior probability-based calibration and our established data set of ClinVar pathogenic and benign variants, we determined the strength of evidence provided by three new tools (AlphaMissense, ESM1b, VARITY) and the calibrated score thresholds meeting each evidence strength. Results: All three tools reached the Strong level of evidence for variant pathogenicity and Moderate for benignity, though sometimes for few variants. Compared to previously recommended tools, these yielded at best only modest improvements in the tradeoffs of evidence strength and false positive predictions. Conclusion: At calibrated thresholds, three new computational predictors provided evidence for variant pathogenicity at similar strength to the four previously recommended predictors (and comparable with functional assays for some variants). This calibration broadens the scope of computational tools for application in clinical variant classification. Their new approaches offer promise for future advancement of the field.

3.
Circ Genom Precis Med ; : e004584, 2024 Aug 09.
Article in English | MEDLINE | ID: mdl-39119706

ABSTRACT

BACKGROUND: Genetic testing for cardiac channelopathies is the standard of care. However, many rare genetic variants remain classified as variants of uncertain significance (VUS) due to lack of epidemiological and functional data. Whether deep protein language models may aid in VUS resolution remains unknown. Here, we set out to compare how 2 deep protein language models perform at VUS resolution in the 3 most common long-QT syndrome-causative genes compared with the gold-standard patch clamp. METHODS: A total of 72 rare nonsynonymous VUS (9 KCNQ1, 19 KCNH2, and 50 SCN5A) were engineered by site-directed mutagenesis and expressed in either HEK293 cells or TSA201 cells. The whole-cell patch-clamp technique was used to functionally characterize these variants. The protein language models, ESM1b and AlphaMissense, were used to predict the effect of missense variants, and their predictions were compared with patch clamp. RESULTS: Considering variants in all 3 genes, the ESM1b model had a receiver operator curve-area under the curve of 0.75 (P=0.0003). It had a sensitivity of 88% and a specificity of 50%. AlphaMissense performed well compared with patch clamp, with a receiver operator curve-area under the curve of 0.85 (P<0.0001), sensitivity of 80%, and specificity of 76%. CONCLUSIONS: Deep protein language models aid in VUS resolution with high sensitivity but lower specificity. Thus, these tools cannot fully replace functional characterization but can aid in reducing the number of variants that may require functional analysis.
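
For readers less familiar with the reported metrics, the following minimal sketch shows how sensitivity and specificity are computed when binary predictions are compared against a reference standard such as patch clamp. The function name and the toy labels are hypothetical and are not data from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 means abnormal.

    y_true: reference-standard labels (for example a functional assay result).
    y_pred: predictor calls at some fixed score threshold.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference label")
    # Sensitivity: fraction of truly abnormal variants that are flagged.
    # Specificity: fraction of truly normal variants that are cleared.
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels (made up for illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)
```

A predictor with high sensitivity but modest specificity, the pattern the abstract reports, flags most abnormal variants but also flags a sizeable share of normal ones, which is why functional follow-up remains necessary.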
