Improving quantitative prediction of protein subcellular locations in fluorescence images through deep generative models.
Comput Biol Med; 179: 108913, 2024 Sep.
Article | English | MEDLINE | ID: mdl-39047508
ABSTRACT
Machine learning has been used to recognize protein localization at the subcellular level, which greatly facilitates protein function studies, especially for multi-label proteins that localize to more than one organelle. However, existing works mostly address the qualitative classification of protein subcellular locations and ignore the fraction of a multi-label protein in each of its locations. In fact, about 50% of proteins are multi-label proteins, and ignoring quantitative information severely limits our understanding of their spatial distribution and functional mechanisms. One reason for the lack of quantitative studies is the scarcity of quantitative annotations. To address this data shortage, we propose a generative model, PLocGAN, which generates cell images conditioned on quantitative annotations of the fluorescence distribution. The model is a conditional generative adversarial network in which condition learning uses partial label learning to overcome the lack of training labels, so that it can be trained with only qualitative labels. It also uses contrastive learning to enhance the diversity of the generated images. We assessed PLocGAN on four pixel-fused synthetic datasets and one real dataset and demonstrated that it generates images with good fidelity and diversity, outperforming existing state-of-the-art generative methods. To verify the utility of PLocGAN for the quantitative prediction of protein subcellular locations, we replaced the training images with generated quantitative images, built prediction models, and found that the generated images improved quantitative estimation. This work demonstrates the effectiveness of deep generative models in bioimage analysis and provides a new solution for quantitative subcellular proteomics.
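The abstract relies on two ideas that are easy to misread: pixel-fused synthetic images whose location fractions are known, and partial label learning from qualitative labels only. The Python sketch below illustrates both under stated assumptions; it is not PLocGAN and contains no GAN. It assumes that pixel fusion means a per-pixel linear mix of two single-location images, and it uses a generic candidate-set likelihood as a stand-in partial-label loss. All function names (`pixel_fuse`, `fraction_label`, `partial_label_loss`, `l1_fraction_error`) are hypothetical.

```python
# Hypothetical sketch; not the PLocGAN implementation.
import math
from typing import List, Sequence, Set

# A grayscale image as a list of rows (pure Python to stay dependency-free).
Image = List[List[float]]


def pixel_fuse(img_a: Image, img_b: Image, alpha: float) -> Image:
    """Blend two single-location images pixel by pixel: alpha*A + (1-alpha)*B.

    Assumption: a linear per-pixel mix is one plausible way to build a
    synthetic image whose fluorescence split between two locations is known.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(img_a) != len(img_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(img_a, img_b)
    ):
        raise ValueError("images must have the same shape")
    return [
        [alpha * a + (1.0 - alpha) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]


def fraction_label(alpha: float) -> List[float]:
    """Quantitative label for a fused image: [fraction in A, fraction in B]."""
    return [alpha, 1.0 - alpha]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax, read here as predicted location fractions."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def partial_label_loss(logits: Sequence[float], candidates: Set[int]) -> float:
    """Assumed candidate-set likelihood: -log of the mass on the candidates.

    Only a qualitative label (the set of locations where the protein is seen)
    is required; how the mass is split among the candidates is not penalized.
    """
    if not candidates or not all(0 <= k < len(logits) for k in candidates):
        raise ValueError("candidates must be a non-empty set of valid indices")
    probs = softmax(logits)
    mass = sum(probs[k] for k in candidates)
    return -math.log(mass)


def l1_fraction_error(pred: Sequence[float], true: Sequence[float]) -> float:
    """Sum of absolute differences between predicted and true fractions."""
    return sum(abs(p - t) for p, t in zip(pred, true))


if __name__ == "__main__":
    img_a = [[1.0, 0.0], [0.0, 0.0]]  # toy image: signal in location A only
    img_b = [[0.0, 0.0], [0.0, 1.0]]  # toy image: signal in location B only
    print(pixel_fuse(img_a, img_b, 0.25))  # [[0.25, 0.0], [0.0, 0.75]]
    print(fraction_label(0.25))            # [0.25, 0.75]
```

In this sketch the mixing weight `alpha` doubles as an exact quantitative label, which is what makes fused images useful when real quantitative annotations are scarce. Real images that carry only qualitative labels would instead be scored with something like `partial_label_loss`, and `l1_fraction_error` is one simple way to compare predicted and true fractions.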
Database: MEDLINE
Main subject: Deep Learning
Limits: Humans
Language: English
Year of publication: 2024
Document type: Article