Search | VHL Regional Portal

1.

Design and Application of Multi-Center Clinical Research Platform for Phenotyping of Voriconazole Hepatotoxicity.

Zhang, Ying; Wang, Yuqing; Chi, Shengqiang; Ru, Hua; Jiang, Yifan; Tian, Yu; Zhou, Tianshu; Li, Jingsong.

Stud Health Technol Inform ; 310: 1482-1483, 2024 Jan 25.

Article in English | MEDLINE | ID: mdl-38269707

ABSTRACT

We introduce a phenotyping pipeline for voriconazole hepatotoxicity based on a multi-center clinical research platform. Using the platform's queue construction, feature generation, and feature screening functions, 52 features were obtained for model training. The prediction model of voriconazole hepatotoxicity was obtained using the model training and evaluation functions of the platform. Important risk factors and protective factors of the model were listed.

Subject(s)

Chemical and Drug Induced Liver Injury , Humans , Voriconazole/toxicity , Protective Factors , Risk Factors , Chemical and Drug Induced Liver Injury/etiology

2.

Temporal Phenotyping for End-Stage Renal Disease Using Longitudinal Electronic Health Records.

Chi, Shengqiang; Wang, Feng; Li, Xueyao; Xu, Minghong; Li, Jingsong.

Stud Health Technol Inform ; 310: 264-268, 2024 Jan 25.

Article in English | MEDLINE | ID: mdl-38269806

ABSTRACT

End Stage Renal Disease (ESRD) is a highly heterogeneous disease with significant differences in prevalence, mortality, complications, and treatment modalities across age, sex, race, and ethnicity. An improved knowledge of disease characteristics results from the use of a data-driven phenotypic classification strategy to identify patients of different subtypes and expose the clinical traits of different subtypes. This study used topic models and process mining techniques to perform subtyping of ESRD patients on hemodialysis based on real-world longitudinal electronic health record data. The mined subtypes are interpretable and clinically significant, and they can reflect differences in the progression of the disease state and clinical outcomes.

Subject(s)

Electronic Health Records , Kidney Failure, Chronic , Humans , Kidney Failure, Chronic/epidemiology , Kidney Failure, Chronic/therapy , Renal Dialysis , Ethnicity , Knowledge

3.

A Hemodialysis Mortality Prediction Model Based on Active Contrastive Learning.

Wang, Feng; Chi, Shengqiang; Li, Xueyao; Zhang, Hang; Li, Jingsong.

Stud Health Technol Inform ; 310: 720-724, 2024 Jan 25.

Article in English | MEDLINE | ID: mdl-38269903

ABSTRACT

Hemodialysis (HD) is the main treatment for end-stage renal disease with high mortality and heavy economic burdens. Predicting the mortality risk in patients undergoing maintenance HD and identifying high-risk patients are critical to enable early intervention and improve quality of life. In this study, we proposed a two-stage protocol based on electronic health record (EHR) data to predict mortality risk of maintenance HD patients. First, we developed a multilayer perceptron (MLP) model to predict mortality risk. Second, an Active Contrastive Learning (ACL) method was proposed to select sample pairs and optimize the representation space to improve the prediction performance of the MLP model. Our ACL method outperforms other methods and has an average F1-score of 0.820 and an average area under the receiver operating characteristic curve of 0.853. This work is generalizable to analyses of cross-sectional EHR data, while this two-stage approach can be applied to other diseases as well.

Subject(s)

Kidney Failure, Chronic , Quality of Life , Humans , Cross-Sectional Studies , Renal Dialysis , Problem-Based Learning , Kidney Failure, Chronic/diagnosis , Kidney Failure, Chronic/therapy

4.

Graph Neural Network Based Multi-Label Hierarchical Classification for Disease Predictions in General Practice.

Chi, Shengqiang; Wang, Yuqing; Zhang, Ying; Zhu, Weiwei; Li, Jingsong.

Stud Health Technol Inform ; 310: 725-729, 2024 Jan 25.

Article in English | MEDLINE | ID: mdl-38269904

ABSTRACT

General practitioners are expected to be strong diagnosticians who detect patients with serious diseases early, intervene early, and refer patients appropriately. However, in current general practice, primary general practitioners lack sufficient clinical experience, and the rate of correct diagnosis for general diseases is low. To assist general practitioners in diagnosis, this paper proposes a multi-label hierarchical classification method based on a graph neural network, which integrates medical knowledge and electronic health record (EHR) data to build a disease prediction model. Experimental results on data consisting of 231,783 EHR visits show that the proposed model outperforms all baseline models in the general disease prediction task, with a top-3 recall of 0.865. The interpretable results of the model can effectively help clinicians understand the basis of the model's decision-making.

Subject(s)

General Practice , General Practitioners , Humans , Family Practice , Knowledge , Neural Networks, Computer
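
The top-3 recall reported in the abstract of entry 4 can be illustrated with a minimal sketch. The abstract does not state how the metric was computed, so the definition below (per visit, the share of true disease labels that appear among the k highest-scoring predictions, averaged over visits) is an assumption, and the function name `top_k_recall` and the toy data are hypothetical.

```python
from typing import Sequence, Set


def top_k_recall(
    scores: Sequence[Sequence[float]],
    true_labels: Sequence[Set[int]],
    k: int = 3,
) -> float:
    """Mean per-visit recall of the true labels within the top-k predictions.

    Assumed definition, not taken from the paper: every visit must have
    at least one true label, and labels are integer column indices.
    """
    total = 0.0
    for row, truth in zip(scores, true_labels):
        # Indices of the k labels with the highest predicted scores.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        total += len(truth & set(ranked)) / len(truth)
    return total / len(scores)


# Toy example: two visits, four candidate disease labels.
example_scores = [[0.1, 0.5, 0.2, 0.9], [0.8, 0.1, 0.06, 0.04]]
example_truth = [{1, 2}, {3}]
example_recall = top_k_recall(example_scores, example_truth, k=3)
```

In the toy data, both true labels of the first visit fall in its top three while the single true label of the second visit does not, so the averaged value sits halfway between a full hit and a full miss.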

5.

A Knowledge-based and Data-driven Approach for Predicting Acute Kidney Injury in Patients with Heart Failure.

Chi, Shengqiang; Zhou, Tianshu; Zhu, Weiwei; Li, Xueyao; Li, Jingsong.

Annu Int Conf IEEE Eng Med Biol Soc ; 2023: 1-4, 2023 07.

Article in English | MEDLINE | ID: mdl-38083608

ABSTRACT

Integrating medical knowledge and electronic health record data has great potential for diagnosis prediction. However, present studies only utilized information from knowledge graphs, omitting potentially significant global graph structural features. In this study, we proposed a modeling approach that integrates knowledge and data: it reconstructs patient electronic health record data as a graph structure and uses medical knowledge as internal information of the patient data to build, with graph neural networks, a risk prediction model for acute kidney injury in patients with heart failure. Experimental results based on the MIMIC III data showed that the proposed method was superior to other baseline models in predicting the risk of acute kidney injury in heart failure patients, with an accuracy of 0.725 and an F1 score of 0.755. This study provides a novel approach to disease risk prediction models that integrate medical knowledge and data.

Subject(s)

Acute Kidney Injury , Heart Failure , Humans , Neural Networks, Computer , Heart Failure/complications , Heart Failure/diagnosis , Acute Kidney Injury/diagnosis , Acute Kidney Injury/etiology , Electronic Health Records

6.

A novel lifelong machine learning-based method to eliminate calibration drift in clinical prediction models.

Chi, Shengqiang; Tian, Yu; Wang, Feng; Zhou, Tianshu; Jin, Shan; Li, Jingsong.

Artif Intell Med ; 125: 102256, 2022 03.

Article in English | MEDLINE | ID: mdl-35241261

ABSTRACT

OBJECTIVE: Clinical prediction models (CPMs) constructed based on artificial intelligence have been proven to have positive impacts on clinical activities. However, the deterioration of CPM performance over time has rarely been studied. This paper proposes a model updating method to solve the calibration drift issue caused by data drift. MATERIALS AND METHODS: This paper proposes a novel model updating method based on lifelong machine learning (LML). The effectiveness of the proposed method is verified in four tumor datasets, and a comprehensive comparison with other model updating methods is performed. RESULTS: Changes in data distributions cause model performances to drift. The four compared model updating methods have different effects in terms of improving the discrimination and calibration abilities of the tested models. The LML method proposed in this study improves model performance better than or equivalent to the other methods. The proposed method achieved a mean AUC of 0.8249, 0.8780, 0.8261, and 0.8489, a mean AUPRC of 0.7782, 0.9730, 0.4655, and 0.5728, a mean F1 of 0.6866, 0.9552, 0.2985, and 0.3585, and a mean estimated calibration index (ECI) of 0.0320, 0.0338, 0.0101, and 0.0115 using colorectal, lung, breast and prostate cancer datasets. DISCUSSION: The LML framework simultaneously monitors model performance and the distribution of disease risk characteristics, enabling it to effectively address the performance degradation caused by gradual and sudden data drifts and provide reasonable explanations for the causes of performance degradation. CONCLUSION: Monitoring model performance and the underlying data distribution can promote model life cycle iteration with "development-deployment-maintenance-monitoring" as the core, which, in turn, ensures that the model can provide accurate predictions, guides the model update process and explains the causes of model performance changes.

Subject(s)

Artificial Intelligence , Models, Statistical , Calibration , Humans , Machine Learning , Male , Prognosis

7.

Deep Semisupervised Multitask Learning Model and Its Interpretability for Survival Analysis.

Chi, Shengqiang; Tian, Yu; Wang, Feng; Wang, Yu; Chen, Ming; Li, Jingsong.

IEEE J Biomed Health Inform ; 25(8): 3185-3196, 2021 08.

Article in English | MEDLINE | ID: mdl-33687852

ABSTRACT

Survival analysis is a commonly used method in the medical field to analyze and predict the time of events. In medicine, this approach plays a key role in determining the course of treatment, developing new drugs, and improving hospital procedures. Most of the existing work in this area has addressed the problem by making strong assumptions about the underlying stochastic process. However, these assumptions are usually violated in the real-world data. This paper proposed a semisupervised multitask learning (SSMTL) method based on deep learning for survival analysis with or without competing risks. SSMTL transforms the survival analysis problem into a multitask learning problem that includes semisupervised learning and multipoint survival probability prediction. The distribution of survival times and the relationship between covariates and outcomes were modeled directly without any assumptions. Semisupervised loss and ranking loss are used to deal with censored data and the prior knowledge of the nonincreasing trend of the survival probability. Additionally, the importance of prognostic factors is determined, and the time-dependent and nonlinear effects of these factors on survival outcomes are visualized. The prediction performance of SSMTL is better than that of previous models in settings with or without competing risks, and the effects of predictors are successfully described. This study is of great significance for the exploration and application of deep learning methods involving medical structured data and provides an effective deep-learning-based method for survival analysis with complex-structured clinical data.

Subject(s)

Supervised Machine Learning , Humans , Stochastic Processes , Survival Analysis

8.

Semi-supervised learning to improve generalizability of risk prediction models.

Chi, Shengqiang; Li, Xinhang; Tian, Yu; Li, Jun; Kong, Xiangxing; Ding, Kefeng; Weng, Chunhua; Li, Jingsong.

J Biomed Inform ; 92: 103117, 2019 04.

Article in English | MEDLINE | ID: mdl-30738948

ABSTRACT

The utility of a prediction model depends on its generalizability to patients drawn from different but related populations. We explored whether a semi-supervised learning model could improve the generalizability of colorectal cancer (CRC) risk prediction relative to supervised learning methods. Data on 113,141 patients diagnosed with nonmetastatic CRC from 2004 to 2012 were obtained from the Surveillance Epidemiology End Results registry for model development, and data on 1149 patients from the Second Affiliated Hospital, Zhejiang University School of Medicine, who were diagnosed between 2004 and 2011, were collected for generalizability testing. A clinical prediction model for CRC survival risk using a semi-supervised logistic regression method was developed and validated to investigate the model discrimination, calibration, generalizability, interpretability and clinical usefulness. Rigorous model performance comparisons with other supervised learning models were performed. The area under the curve of the logistic membership model revealed a large heterogeneity between the development cohort and validation cohort, which is typical of generalizability studies of prediction models. The discrimination was good for all models. Calibration was poor for supervised learning models, while the semi-supervised logistic regression model exhibited a good calibration on the validation cohort, which indicated good generalizability. Clinical usefulness analysis showed that semi-supervised logistic regression can lead to better clinical outcomes than supervised learning methods. These results increase our current understanding of the generalizability of different models and provide a reference for predictive model development for clinical decision-making.

Subject(s)

Colorectal Neoplasms/diagnosis , Colorectal Neoplasms/mortality , Models, Statistical , Supervised Machine Learning , Adolescent , Adult , Aged , Aged, 80 and over , Child , Diagnosis, Computer-Assisted , Female , Humans , Male , Middle Aged , Prognosis , Risk , Survival Analysis , Young Adult

9.

Spatially varying effects of predictors for the survival prediction of nonmetastatic colorectal Cancer.

Tian, Yu; Li, Jun; Zhou, Tianshu; Tong, Danyang; Chi, Shengqiang; Kong, Xiangxing; Ding, Kefeng; Li, Jingsong.

BMC Cancer ; 18(1): 1084, 2018 Nov 08.

Article in English | MEDLINE | ID: mdl-30409119

ABSTRACT

BACKGROUND: An increasing number of studies have identified spatial differences in colorectal cancer survival. However, little is known about the spatially varying effects of predictors in survival prediction modeling studies of colorectal cancer that have focused on estimating the absolute survival risk for patients from a wide range of populations. This study aimed to demonstrate the spatially varying effects of predictors of survival for nonmetastatic colorectal cancer patients. METHODS: Patients diagnosed with nonmetastatic colorectal cancer from 2004 to 2013 who were followed up through the end of 2013 were extracted from the Surveillance Epidemiology End Results registry (Patients: 128061). The log-rank test and the restricted mean survival time were used to evaluate survival outcome differences among spatial clusters corresponding to a widely used clinical predictor: stage determined by AJCC 7th edition staging system. The heterogeneity test, which is used in meta-analyses, revealed the spatially varying effects of single predictors. Then, considering the above predictors in a standard survival prediction model based on spatially clustered data, the spatially varying coefficients of these models revealed that some covariate effects may not be constant across the geographic regions of the study. Then, two types of survival prediction models (a statistical model and a machine learning model) were built; these models considered the predictors and enabled survival prediction for patients from a wide range of geographic regions. RESULTS: Based on univariate and multivariate analysis, some prognostic factors, such as "TNM stage", "tumor size" and "age at diagnosis," have significant spatially varying effects among different regions. When considering these spatially varying effects, machine learning models have fewer assumption constraints (such as proportional hazard assumptions) and better predictive performance compared with statistical models. 
Upon comparing the concordance indexes of these two models, the machine learning model was found to be more accurate (0.898[0.895,0.902]) than the statistical model (0.732 [0.726, 0.738]). CONCLUSIONS: Based on this study, it's recommended that the spatially varying effect of predictors should be considered when building survival prediction models involving large-scale and multicenter research data. Machine learning models that are not limited by the requirement of a statistical hypothesis are promising alternative models.

Subject(s)

Colorectal Neoplasms/mortality , Adult , Aged , Aged, 80 and over , Colorectal Neoplasms/epidemiology , Colorectal Neoplasms/pathology , Female , Humans , Kaplan-Meier Estimate , Machine Learning , Male , Middle Aged , Multivariate Analysis , Neoplasm Staging , Prognosis , Proportional Hazards Models , SEER Program , Spatial Analysis , United States/epidemiology

10.

POPCORN: A web service for individual PrognOsis prediction based on multi-center clinical data CollabORatioN without patient-level data sharing.

Tian, Yu; Shang, Yong; Tong, Dan-Yang; Chi, Sheng-Qiang; Li, Jun; Kong, Xiang-Xing; Ding, Ke-Feng; Li, Jing-Song.

J Biomed Inform ; 86: 1-14, 2018 10.

Article in English | MEDLINE | ID: mdl-30103028

ABSTRACT

BACKGROUND AND OBJECTIVE: Clinical prognosis prediction plays an important role in clinical research and practice. The construction of prediction models based on electronic health record data has recently become a research focus. Due to the lack of external validation, prediction models based on single-center, hospital-specific datasets may not perform well with datasets from other medical institutions. Therefore, research investigating prognosis prediction model construction based on a collaborative analysis of multi-center electronic health record data could increase the number and coverage of patients used for model training, enrich patient prognostic features and ultimately improve the accuracy and generalization of prognosis prediction. MATERIALS AND METHODS: A web service for individual prognosis prediction based on multi-center clinical data collaboration without patient-level data sharing (POPCORN) was proposed. POPCORN focuses on solving key issues in multi-center collaborative research based on electronic health record systems; these issues include the standardization of clinical data expression, the preservation of patient privacy during model training and the effect of case mix variance on the prediction model construction and application. POPCORN is based on a multivariable meta-analysis and a Bayesian framework and can construct suitable prediction models for multiple clinical scenarios that can effectively adapt to complex clinical application environments. RESULTS: POPCORN was validated using a joint, multi-center collaborative research network between China and the United States with patients diagnosed with colorectal cancer. The performance of the models based on POPCORN was comparable to that of the standard prognosis prediction model; however, POPCORN did not expose raw patient data. 
The prediction models had similar AUC, but the BMA model had the lowest ECI across all prediction models, indicating that this model had better calibration performance than the other models, especially for patients in Chinese hospitals. CONCLUSIONS: The POPCORN system can build prediction models that perform well in complex clinical application scenarios and can provide effective decision support for individual patient prognostic predictions.

Subject(s)

Colorectal Neoplasms/diagnosis , Colorectal Neoplasms/epidemiology , Decision Support Systems, Clinical , Electronic Health Records , Internet , Access to Information , Aged , Algorithms , Bayes Theorem , Calibration , China , Diagnosis, Computer-Assisted , Female , Humans , Information Dissemination , International Cooperation , Male , Middle Aged , Probability , Prognosis , Reproducibility of Results , United States

11.

A modified TNM staging system for non-metastatic colorectal cancer based on nomogram analysis of SEER database.

Kong, Xiangxing; Li, Jun; Cai, Yibo; Tian, Yu; Chi, Shengqiang; Tong, Danyang; Hu, Yeting; Yang, Qi; Li, Jingsong; Poston, Graeme; Yuan, Ying; Ding, Kefeng.

BMC Cancer ; 18(1): 50, 2018 01 08.

Article in English | MEDLINE | ID: mdl-29310604

ABSTRACT

BACKGROUND: To revise the American Joint Committee on Cancer TNM staging system for colorectal cancer (CRC) based on a nomogram analysis of the Surveillance, Epidemiology, and End Results (SEER) database, and to prove the rationality of enhancing T stage's weighting in our previously proposed T-plus staging system. METHODS: A total of 115,377 non-metastatic CRC patients from SEER were randomly divided into training and testing sets at a 1:1 ratio. The Nomo-staging system was established via three nomograms based on 1-year, 2-year and 3-year disease specific survival (DSS) logistic regression analysis of the training set. The predictive value of the Nomo-staging system for the testing set was evaluated by concordance index (c-index), likelihood ratio (L.R.) and Akaike information criteria (AIC) for 1-year, 2-year, 3-year overall survival (OS) and DSS. Kaplan-Meier survival curves were used to evaluate discrimination and gradient monotonicity. An external validation was performed on a database from the Second Affiliated Hospital of Zhejiang University (SAHZU). RESULTS: Patients with T1-2 N1 and T1N2a were classified into stage II while T4 N0 patients were classified into stage III in the Nomo-staging system. Kaplan-Meier survival curves of OS and DSS in the testing set showed that the Nomo-staging system performed better in discrimination and gradient monotonicity, and the external validation in the SAHZU database also showed distinctly better discrimination. The Nomo-staging system showed higher values of L.R. and c-index, and a lower value of AIC, when predicting OS and DSS in the testing set. CONCLUSION: The Nomo-staging system showed better performance in prognosis prediction, and the weight of lymph node status in prognosis prediction should be cautiously reconsidered.

Subject(s)

Colorectal Neoplasms/epidemiology , Nomograms , Prognosis , Colorectal Neoplasms/pathology , Female , Humans , Kaplan-Meier Estimate , Male , Neoplasm Staging , SEER Program

12.

Time-dependent and nonlinear effects of prognostic factors in nonmetastatic colorectal cancer.

Chi, Sheng-Qiang; Tian, Yu; Li, Jun; Tong, Dan-Yang; Kong, Xiang-Xing; Poston, Graeme; Ding, Ke-Feng; Li, Jing-Song.

Cancer Med ; 6(8): 1882-1892, 2017 Aug.

Article in English | MEDLINE | ID: mdl-28707427

ABSTRACT

The survival risk following curative surgery for nonmetastatic colorectal cancer (CRC) may be over- or underestimated due to a lack of attention to nonlinear effects and violation of the proportional hazards assumption. In this paper, we aimed to detect and interpret the shape of time-dependent and nonlinear effects to improve the predictive performance of models of prognoses in nonmetastatic CRC patients. Data for nonmetastatic CRC patients diagnosed between 2004 and 2012 were obtained from the Surveillance Epidemiology End Results registry. Time-dependent and nonlinear effects were tested and plotted. A nonlinear model that used random survival forests was implemented. The estimated 5-year cancer-specific death rate was 17.95% (95% CI, 17.70-18.20%). Tumor invasion depth, lymph node status, age at diagnosis, tumor grade, histology and tumor site were significantly associated with cancer-specific death. Nonlinear and time-dependent effects on survival were detected. Positive lymph node number had a larger effect per unit of measurement at low values than at high values, whereas age at diagnosis showed the opposite pattern. Moreover, nonproportional hazards were detected for all covariates, indicating that the contributions of these risks to survival outcomes decreased over time. The nonlinear model predicted prognoses more accurately (C-index: 0.7934, 0.7933-0.7934) than did the Fine and Gray model (C-index: 0.7550, 0.7510-0.7583). The three-dimensional cumulative incidence curves derived from nonlinear model were used to identify the change points of the risk trends. It would be useful to implement these findings in treatment plans and follow-up surveillance in nonmetastatic CRC patients.

Subject(s)

Colorectal Neoplasms/mortality , Colorectal Neoplasms/pathology , Adolescent , Adult , Aged , Aged, 80 and over , Female , Follow-Up Studies , Humans , Incidence , Male , Middle Aged , Neoplasm Grading , Neoplasm Staging , Nonlinear Dynamics , Prognosis , SEER Program , Time Factors , United States/epidemiology , Young Adult
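
The C-index values quoted in the abstract of entry 12 compare how well two models rank patients by risk. As a rough illustration only, the sketch below computes Harrell's concordance index for right-censored data. It ignores competing risks, which the paper's Fine and Gray comparison accounts for, so it is a simplified stand-in rather than the authors' procedure. The function name `concordance_index` and the toy data are hypothetical.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Harrell's C-index for right-censored survival data (no competing risks).

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. The pair is concordant
    when the model gave patient i the higher risk score. Tied scores
    count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example: four patients, the third is censored (event = 0).
example_c = concordance_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risks=[0.9, 0.4, 0.5, 0.1],
)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the two C-index figures in the abstract are read.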

13.

A Study on Data-Driven Novel Cancer Staging Methods.

Gao, Yuan; Tian, Yu; Chi, Shengqiang; Lu, Yao; Li, Xinhang; Zhou, Tianshu; Li, Jing-Song.

Stud Health Technol Inform ; 245: 1263, 2017.

Article in English | MEDLINE | ID: mdl-29295348

ABSTRACT

This paper presents a data-driven method to study the relationship between survival and the clinical information of patients. Machine learning models were established to study the survival situation at the time of interest based on survival analysis. The way to determine the time of interest is an innovation of this paper: the distribution of survival time, namely its three quartiles, is considered together with traditional analysis experience.

Subject(s)

Machine Learning , Neoplasm Staging , Humans , Statistics as Topic , Survival Analysis
