Predicting liver cancers using skewed epidemiological data.

Li, Jinpeng; Tao, Yaling; Cong, Huaiwei; Zhu, Enwei; Cai, Ting

Affiliation

Li J; Ningbo HwaMei Hospital, University of Chinese Academy of Sciences, Ningbo, Zhejiang 315010, China; Institute of Life and Health Industry, University of Chinese Academy of Sciences, Ningbo, Zhejiang 315010, China. Electronic address: lijinpeng@ucas.ac.cn.
Tao Y; Ningbo HwaMei Hospital, University of Chinese Academy of Sciences, Ningbo, Zhejiang 315010, China; Institute of Life and Health Industry, University of Chinese Academy of Sciences, Ningbo, Zhejiang 315010, China. Electronic address: taoyaling@ucas.ac.cn.
Cong H; Institute of Life and Health Industry, University of Chinese Academy of Sciences, Ningbo, Zhejiang 315010, China.
Zhu E; Institute of Life and Health Industry, University of Chinese Academy of Sciences, Ningbo, Zhejiang 315010, China.
Cai T; Ningbo HwaMei Hospital, University of Chinese Academy of Sciences, Ningbo, Zhejiang 315010, China; Institute of Life and Health Industry, University of Chinese Academy of Sciences, Ningbo, Zhejiang 315010, China. Electronic address: caiting@ucas.ac.cn.

Artif Intell Med; 124: 102234, 2022 Feb.

Article in English | MEDLINE | ID: mdl-35115129

ABSTRACT

Liver cancer is a threat to human health and life worldwide. The key to reducing liver cancer incidence is to identify high-risk populations and carry out individualized interventions before cancer occurs. Building predictive models with machine learning algorithms is an effective and economical way to forecast potential liver cancers. However, because the datasets are usually extremely skewed (negative samples far outnumber positive samples), machine learning models suffer from severe bias and make unreliable predictions. In this paper, we systematically evaluate existing approaches to the class-imbalance problem and introduce two undersampling methods. The first is based on K-means++, where robust cluster centers are used as negative samples. The second is based on learning vector quantization, which takes diagnostic labels into account during clustering, and the resulting prototypes are used as negative data. In this way, positive and negative samples are rebalanced. The algorithm is applied to five-year liver cancer prediction in the Early Diagnosis and Treatment of Urban Cancer project in China. We achieve an AUC of 0.76 using only epidemiological information, with no clinical measures. Experimental results show the advantage of our method over existing oversampling, undersampling, and ensemble algorithms, as well as state-of-the-art outlier detection algorithms. This work explores a feasible and practical roadmap for handling skewed medical data in cancer prediction and benefits applications targeting human health and well-being.
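
The first undersampling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes binary labels (1 = positive, 0 = negative), numeric feature vectors, and that the number of clusters is set equal to the number of positive samples so the classes end up balanced. The function names (kmeanspp_init, kmeans_centers, undersample_negatives) are hypothetical, and the learning vector quantization variant is not covered.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng):
    """K-means++ seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        total = sum(d2)
        if total == 0:
            # Every point coincides with a chosen center; pick any point.
            centers.append(list(rng.choice(points)))
            continue
        r = rng.uniform(0, total)
        acc = 0.0
        for p, d in zip(points, d2):
            acc += d
            if acc >= r:
                centers.append(list(p))
                break
    return centers


def kmeans_centers(points, k, n_iter=20, seed=0):
    """Lloyd iterations started from K-means++ seeds; returns k centers."""
    rng = random.Random(seed)
    centers = kmeanspp_init(points, k, rng)
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def undersample_negatives(X, y, seed=0):
    """Replace the negative class with cluster centers.

    Assumption (not stated in the abstract): the number of clusters
    equals the number of positive samples.
    """
    pos = [list(x) for x, label in zip(X, y) if label == 1]
    neg = [list(x) for x, label in zip(X, y) if label == 0]
    k = min(len(pos), len(neg))
    centers = kmeans_centers(neg, k, seed=seed)
    return pos + centers, [1] * len(pos) + [0] * k
```

Calling undersample_negatives(X, y) on a skewed dataset returns the original positive samples followed by an equal number of synthetic negative samples (the cluster centers), which can then be passed to any classifier.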

Subject(s)

Liver Neoplasms; Machine Learning; Algorithms; China; Cluster Analysis; Humans; Liver Neoplasms/diagnosis; Liver Neoplasms/epidemiology

Keywords

Cancer prediction; Clustering; Liver cancer; Machine learning; Risk assessment


Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Machine Learning / Liver Neoplasms Type of study: Diagnostic_studies / Prognostic_studies / Screening_studies Limits: Humans Country/Region as subject: Asia Language: En Journal: Artif Intell Med Journal subject: MEDICAL INFORMATICS Year: 2022 Document type: Article
