Pesquisa | BVS CLAP/SMR-OPAS/OMS

DeepLoc 2.1: multi-label membrane protein type prediction using protein language models.

Ødum, Marius Thrane; Teufel, Felix; Thumuluri, Vineet; Almagro Armenteros, José Juan; Johansen, Alexander Rosenberg; Winther, Ole; Nielsen, Henrik.

Nucleic Acids Res ; 52(W1): W215-W220, 2024 Jul 05.

Artigo em Inglês | MEDLINE | ID: mdl-38587188

RESUMO

DeepLoc 2.0 is a popular web server for the prediction of protein subcellular localization and sorting signals. Here, we introduce DeepLoc 2.1, which additionally classifies the input proteins into the membrane protein types Transmembrane, Peripheral, Lipid-anchored and Soluble. Leveraging pre-trained transformer-based protein language models, the server utilizes a three-stage architecture for sequence-based, multi-label predictions. Comparative evaluations with other established tools on a test set of 4933 eukaryotic protein sequences, constructed following stringent homology partitioning, demonstrate state-of-the-art performance. Notably, DeepLoc 2.1 outperforms existing models, with the larger ProtT5 model exhibiting a marginal advantage over the ESM-1B model. The web server is available at https://services.healthtech.dtu.dk/services/DeepLoc-2.1.

Assuntos

Proteínas de Membrana , Software , Proteínas de Membrana/química , Proteínas de Membrana/metabolismo , Internet , Sinais Direcionadores de Proteínas , Análise de Sequência de Proteína

DeepLocRNA: an interpretable deep learning model for predicting RNA subcellular localization with domain-specific transfer-learning.

Wang, Jun; Horlacher, Marc; Cheng, Lixin; Winther, Ole.

Bioinformatics ; 40(2)2024 02 01.

Artigo em Inglês | MEDLINE | ID: mdl-38317052

RESUMO

MOTIVATION: Accurate prediction of RNA subcellular localization plays an important role in understanding cellular processes and functions. Although post-transcriptional processes are governed by trans-acting RNA binding proteins (RBPs) through interaction with cis-regulatory RNA motifs, current methods do not incorporate RBP-binding information. RESULTS: In this article, we propose DeepLocRNA, an interpretable deep-learning model that leverages a pre-trained multi-task RBP-binding prediction model to predict the subcellular localization of RNA molecules via fine-tuning. We constructed DeepLocRNA using a comprehensive dataset with variant RNA types and evaluated it on the held-out dataset. Our model achieved state-of-the-art performance in predicting RNA subcellular localization in mRNA and miRNA. It has also demonstrated great generalization capabilities, performing well on both human and mouse RNA. Additionally, a motif analysis was performed to enhance the interpretability of the model, highlighting signal factors that contributed to the predictions. The proposed model provides general and powerful prediction abilities for different RNA types and species, offering valuable insights into the localization patterns of RNA molecules and contributing to our understanding of cellular processes at the molecular level. A user-friendly web server is available at: https://biolib.com/KU/DeepLocRNA/.

Assuntos

Aprendizado Profundo , Animais , Humanos , Camundongos , RNA/metabolismo , RNA Mensageiro/genética , RNA Mensageiro/metabolismo , Motivos de Nucleotídeos , Proteínas de Ligação a RNA/metabolismo , Biologia Computacional/métodos

Can large language models reason about medical questions?

Liévin, Valentin; Hother, Christoffer Egeberg; Motzfeldt, Andreas Geert; Winther, Ole.

Patterns (N Y) ; 5(3): 100943, 2024 Mar 08.

Artigo em Inglês | MEDLINE | ID: mdl-38487804

RESUMO

Although large language models often produce impressive outputs, it remains unclear how they perform in real-world scenarios requiring strong reasoning skills and expert domain knowledge. We set out to investigate whether closed- and open-source models (GPT-3.5, Llama 2, etc.) can be applied to answer and reason about difficult real-world-based questions. We focus on three popular medical benchmarks (MedQA-US Medical Licensing Examination [USMLE], MedMCQA, and PubMedQA) and multiple prompting scenarios: chain of thought (CoT; think step by step), few shot, and retrieval augmentation. Based on an expert annotation of the generated CoTs, we found that InstructGPT can often read, reason, and recall expert knowledge. Last, by leveraging advances in prompt engineering (few-shot and ensemble methods), we demonstrated that GPT-3.5 not only yields calibrated predictive distributions but also reaches the passing score on three datasets: MedQA-USMLE (60.2%), MedMCQA (62.7%), and PubMedQA (78.2%). Open-source models are closing the gap: Llama 2 70B also passed the MedQA-USMLE with 62.5% accuracy.

DiscoTope-3.0: improved B-cell epitope prediction using inverse folding latent representations.

Høie, Magnus Haraldson; Gade, Frederik Steensgaard; Johansen, Julie Maria; Würtzen, Charlotte; Winther, Ole; Nielsen, Morten; Marcatili, Paolo.

Front Immunol ; 15: 1322712, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-38390326

RESUMO

Accurate computational identification of B-cell epitopes is crucial for the development of vaccines, therapies, and diagnostic tools. However, current structure-based prediction methods face limitations due to the dependency on experimentally solved structures. Here, we introduce DiscoTope-3.0, a markedly improved B-cell epitope prediction tool that innovatively employs inverse folding structure representations and a positive-unlabelled learning strategy, and is adapted for both solved and predicted structures. Our tool demonstrates a considerable improvement in performance over existing methods, accurately predicting linear and conformational epitopes across multiple independent datasets. Most notably, DiscoTope-3.0 maintains high predictive performance across solved, relaxed and predicted structures, alleviating the need for experimental structures and extending the general applicability of accurate B-cell epitope prediction by 3 orders of magnitude. DiscoTope-3.0 is made widely accessible on two web servers, processing over 100 structures per submission, and as a downloadable package. In addition, the servers interface with RCSB and AlphaFoldDB, facilitating large-scale prediction across over 200 million cataloged proteins. DiscoTope-3.0 is available at: https://services.healthtech.dtu.dk/service.php?DiscoTope-3.0.

Assuntos

Epitopos de Linfócito B , Conformação Molecular

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA