Decoding functional proteome information in model organisms using protein language models.

Barrios-Núñez, Israel; Martínez-Redondo, Gemma I; Medina-Burgos, Patricia; Cases, Ildefonso; Fernández, Rosa; Rojas, Ana M

Affiliation

Barrios-Núñez I; Computational Biology and Bioinformatics Group, Andalusian Center for Developmental Biology (CABD-CSIC), 41013 Sevilla, Spain.
Martínez-Redondo GI; Metazoa Phylogenomics Lab, Institute of Evolutionary Biology (CSIC-UPF), 08003 Barcelona, Spain.
Medina-Burgos P; Computational Biology and Bioinformatics Group, Andalusian Center for Developmental Biology (CABD-CSIC), 41013 Sevilla, Spain.
Cases I; Bioinformatics Unit, Andalusian Center for Developmental Biology (CABD-CSIC), 41013 Sevilla, Spain.
Fernández R; Metazoa Phylogenomics Lab, Institute of Evolutionary Biology (CSIC-UPF), 08003 Barcelona, Spain.
Rojas AM; Computational Biology and Bioinformatics Group, Andalusian Center for Developmental Biology (CABD-CSIC), 41013 Sevilla, Spain.

NAR Genom Bioinform; 6(3): lqae078, 2024 Sep.

Article in En | MEDLINE | ID: mdl-38962255

ABSTRACT

Protein language models have been tested and proven reliable on curated datasets but have not yet been applied to full proteomes. Accordingly, we tested how two different machine learning-based methods performed when decoding functional information from the proteomes of selected model organisms. We found that protein language models are more precise and informative than deep learning methods for all the species tested and across the three Gene Ontology domains studied, and that they better recover functional information from transcriptomic experiments. These results indicate that protein language models are likely to be suitable for large-scale annotation and downstream analyses, and we offer a guide for their use.

Collection: 01-internacional | Database: MEDLINE | Language: En | Journal: NAR Genom Bioinform | Year: 2024 | Document type: Article | Affiliation country: Spain