Search | BVS - Ministry of Health

Learning With Noisy Labels Over Imbalanced Subpopulations.

Chen, Mingcai; Zhao, Yu; He, Bing; Han, Zongbo; Huang, Junzhou; Wu, Bingzhe; Yao, Jianhua.

IEEE Trans Neural Netw Learn Syst ; PP, 2024 May 01.

Article in English | MEDLINE | ID: mdl-38691432

ABSTRACT

Learning with noisy labels (LNL) has attracted significant attention from the research community. Many recent LNL methods rely on the assumption that clean samples tend to have a "small loss." However, this assumption often fails to generalize to real-world cases with imbalanced subpopulations, that is, training subpopulations that vary in sample size or recognition difficulty. Recent LNL methods therefore risk misclassifying "informative" samples (e.g., hard samples or samples in the tail subpopulations) as noisy samples, leading to poor generalization performance. To address this issue, we propose a novel LNL method that deals with noisy labels and imbalanced subpopulations simultaneously. It first leverages sample correlation to estimate samples' clean probabilities for label correction and then uses the corrected labels for distributionally robust optimization (DRO) to further improve robustness. Specifically, in contrast to previous works that use classification loss as the selection criterion, we introduce a feature-based metric that takes sample correlation into account when estimating samples' clean probabilities. We then refurbish the noisy labels using the estimated clean probabilities and the pseudo-labels from the model's predictions. With the refurbished labels, we use DRO to train the model to be robust to subpopulation imbalance. Extensive experiments on a wide range of benchmarks demonstrate that our technique can consistently improve state-of-the-art (SOTA) robust learning paradigms against noisy labels, especially when encountering imbalanced subpopulations. Our code is available at https://github.com/chenmc1996/LNL-IS.
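The three steps named in this abstract (a feature-based clean-probability estimate, label refurbishment, and a DRO objective) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; their code is at the URL above. The function names, the k-nearest-neighbour agreement score used as the feature-based metric, the convex-combination refurbishment rule, and the worst-group loss used as the DRO objective are all assumptions chosen for illustration.

```python
import numpy as np

def knn_clean_probability(features, noisy_labels, k=5):
    """Fraction of each sample's k nearest neighbours (cosine similarity)
    that carry the same given label. Assumed stand-in for a feature-based,
    correlation-aware metric; features must be non-zero vectors."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)             # a sample is not its own neighbour
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    agree = noisy_labels[neighbours] == noisy_labels[:, None]
    return agree.mean(axis=1)

def refurbish_labels(noisy_labels, model_probs, clean_prob):
    """Soft label = w * given one-hot label + (1 - w) * model prediction,
    where w is the estimated clean probability (assumed mixing rule)."""
    num_classes = model_probs.shape[1]
    one_hot = np.eye(num_classes)[noisy_labels]
    w = clean_prob[:, None]
    return w * one_hot + (1.0 - w) * model_probs

def worst_group_loss(per_sample_loss, group_ids):
    """Group-DRO style objective: the largest mean loss over subpopulations."""
    groups = np.unique(group_ids)
    return max(per_sample_loss[group_ids == g].mean() for g in groups)
```

In this sketch a sample whose neighbours in feature space disagree with its given label receives a low clean probability, so its refurbished label leans toward the model's prediction, while minimizing the worst-group loss keeps small or hard subpopulations from being ignored during training.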

scPROTEIN: a versatile deep graph contrastive learning framework for single-cell proteomics embedding.

Li, Wei; Yang, Fan; Wang, Fang; Rong, Yu; Liu, Linjing; Wu, Bingzhe; Zhang, Han; Yao, Jianhua.

Nat Methods ; 21(4): 623-634, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38504113

ABSTRACT

Single-cell proteomics sequencing technology sheds light on protein-protein interactions, posttranslational modifications and proteoform dynamics in the cell. However, the uncertainty estimation for peptide quantification, data missingness, batch effects and high noise hinder the analysis of single-cell proteomic data. It is important to solve this set of tangled problems together, but the existing methods tailored for single-cell transcriptomes cannot fully address this task. Here we propose a versatile framework designed for single-cell proteomics data analysis called scPROTEIN, which consists of peptide uncertainty estimation based on a multitask heteroscedastic regression model and cell embedding generation based on graph contrastive learning. scPROTEIN can estimate the uncertainty of peptide quantification, denoise protein data, remove batch effects and encode single-cell proteomic-specific embeddings in a unified framework. We demonstrate that scPROTEIN is efficient for cell clustering, batch correction, cell type annotation, clinical analysis and spatially resolved proteomic data exploration.
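The two building blocks named in this abstract, heteroscedastic regression and contrastive learning, can be illustrated with a minimal NumPy sketch. It is not the scPROTEIN implementation: the function names, the single-task Gaussian negative log-likelihood (the abstract describes a multitask model), and the InfoNCE form of the contrastive loss with its temperature value are assumptions for illustration, and the graph encoder that would produce the embeddings is omitted.

```python
import numpy as np

def heteroscedastic_nll(y, mean, log_var):
    """Gaussian negative log-likelihood with a predicted variance per
    observation (constant term dropped). exp(log_var) acts as the
    uncertainty estimate for each measurement."""
    return np.mean(0.5 * (log_var + (y - mean) ** 2 / np.exp(log_var)))

def info_nce(view_a, view_b, temperature=0.5):
    """InfoNCE contrastive loss between two embedding views.
    Row i of view_a and row i of view_b are the positive pair; every
    other row of view_b serves as a negative for row i."""
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))
```

Under the first loss, a model can lower its penalty on a poorly fitted observation by predicting a larger variance for it, which is how a per-observation uncertainty estimate emerges from training. The second loss is smallest when the two views of the same cell are closer to each other than to views of other cells.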

Subjects

Learning; Proteomics; Cluster Analysis; Protein Processing, Post-Translational; Peptides
