1.
J Biomed Inform ; 154: 104651, 2024 Jun.
Article En | MEDLINE | ID: mdl-38703936

OBJECTIVE: Chatbots have the potential to improve user compliance in electronic Patient-Reported Outcome (ePRO) systems. Compared to rule-based chatbots, Large Language Models (LLMs) offer advantages such as simplifying the development process and increasing conversational flexibility. However, there is currently a lack of practical applications of LLMs in ePRO systems. Therefore, this study utilized ChatGPT to develop the Chat-ePRO system and designed a pilot study to explore the feasibility of building an ePRO system based on an LLM. MATERIALS AND METHODS: This study employed prompt engineering and offline knowledge distillation to design a dialogue algorithm and built the Chat-ePRO system on the WeChat Mini Program platform. To compare Chat-ePRO with the form-based ePRO and rule-based chatbot ePRO used in previous studies, we conducted a pilot study applying the three ePRO systems sequentially at the Sir Run Run Shaw Hospital to collect patients' PRO data. RESULT: Chat-ePRO is capable of correctly generating conversation based on PRO forms (success rate: 95.7 %) and accurately extracting the PRO data instantaneously from conversation (Macro-F1: 0.95). The majority of subjective evaluations from doctors (>70 %) suggest that Chat-ePRO is able to comprehend questions and consistently generate responses. The pilot study shows that Chat-ePRO achieves a higher response rate (9/10, 90 %) and a longer interaction time (10.86 s/turn) than the other two methods. CONCLUSION: Our study demonstrated the feasibility of utilizing algorithms such as prompt engineering to drive an LLM in completing ePRO data collection tasks, and validated that the Chat-ePRO system can effectively enhance patient compliance.


Algorithms , Patient Reported Outcome Measures , Pilot Projects , Humans , Male , Female , Electronic Health Records , Middle Aged , Adult
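Entry 1 reports extraction quality as a Macro-F1 score. As a reminder of what that metric measures, the following is a minimal sketch of the standard macro-averaged F1 computation; the function name and the toy labels are illustrative assumptions and are not taken from the Chat-ePRO study.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (standard macro-F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, for illustration only
gold = ["a", "a", "b", "b"]
pred = ["a", "b", "b", "b"]
score = macro_f1(gold, pred)
```

Because every class contributes equally to the average, a rare answer option that is extracted poorly lowers the score as much as a common one would.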
2.
Neuroreport ; 35(8): 499-508, 2024 May 15.
Article En | MEDLINE | ID: mdl-38597270

Intracerebral hemorrhage (ICH) is a severe stroke subtype. Secondary injury is a key factor leading to neurological deficits after ICH. Electroacupuncture (EA) can improve neurological function after ICH; however, its underlying mechanism is still unclear. The aim of this study is to investigate whether EA could ameliorate secondary injury after ICH by counteracting oxidative stress, and to explore its potential regulatory mechanism. A rat model of ICH was established by injecting autologous blood into the striatum. After intervention with EA or with EA combined with a peroxisome proliferator-activated receptor-γ (PPARγ) blocker, Zea-Longa scores, modified neurological severity scores and open field tests were used to evaluate the neurological function of the rats. Tissue reactive oxygen species (ROS) levels were measured by flow cytometry. Tissue tumor necrosis factor-α (TNF-α) levels were analyzed by enzyme-linked immunosorbent assays. The protein expression of PPARγ, nuclear factor erythroid 2-related factor 2 (Nrf2) and γ-glutamylcysteine synthetase (γ-GCS) was detected by Western blot. Immunohistochemistry was used to observe the activation of microglia. The degree of axonal demyelination was observed by transmission electron microscopy. Compared with the model group, EA intervention improved neurological function, decreased ROS and TNF-α levels, increased the protein expression of PPARγ, Nrf2 and γ-GCS, and reduced the activation of microglia. It also alleviated axonal myelin sheath damage. In addition, the neuroprotective effect of EA was partially attenuated by the PPARγ blocker. EA improved neurological function after ICH-induced secondary injury in rats, possibly by activating the PPARγ/Nrf2/γ-GCS signaling pathway, reducing microglia activation, and inhibiting oxidative stress, thereby alleviating the extent of axonal demyelination.


Cerebral Hemorrhage , Electroacupuncture , Glutamate-Cysteine Ligase , NF-E2-Related Factor 2 , Oxidative Stress , PPAR gamma , Rats, Sprague-Dawley , Animals , PPAR gamma/metabolism , NF-E2-Related Factor 2/metabolism , Electroacupuncture/methods , Oxidative Stress/physiology , Oxidative Stress/drug effects , Cerebral Hemorrhage/metabolism , Cerebral Hemorrhage/complications , Rats , Male , Glutamate-Cysteine Ligase/metabolism , Signal Transduction/physiology , Signal Transduction/drug effects , Reactive Oxygen Species/metabolism
3.
Emerg Microbes Infect ; 13(1): 2332670, 2024 Dec.
Article En | MEDLINE | ID: mdl-38646911

This study aimed to provide data on the clinical features of invasive pneumococcal disease (IPD) and the molecular characteristics of Streptococcus pneumoniae isolates from paediatric patients in China. We conducted a multi-centre prospective study of IPD in 19 hospitals across China from January 2019 to December 2021. Data on demographic characteristics, risk factors for IPD, death, and disability were collected and analysed. Serotypes, antibiotic susceptibility, and multi-locus sequence typing (MLST) of pneumococcal isolates were also determined. A total of 478 IPD cases and 355 pneumococcal isolates were enrolled. Among the patients, 260 were male, and the median age was 35 months (interquartile range, 12-46 months). Septicaemia (37.7%), meningitis (32.4%), and pneumonia (27.8%) were common disease types, and 46 (9.6%) patients died from IPD. Thirty-four serotypes were detected; 19F (24.2%), 14 (17.7%), 23F (14.9%), 6B (10.4%) and 19A (9.6%) were the most common. Pneumococcal isolates were highly resistant to macrolides (98.3%), tetracycline (94.1%), and trimethoprim/sulfamethoxazole (70.7%). Penicillin non-susceptibility rates were 6.2% and 83.3% in non-meningitis and meningitis isolates, respectively. 19F-ST271, 19A-ST320 and 14-ST876 showed high resistance to antibiotics. This multi-centre study reports the clinical features of IPD and demonstrates the serotype distribution and antibiotic resistance of pneumococcal isolates in Chinese children. IPD could potentially be reduced by improved uptake of pneumococcal vaccination, and continued surveillance is warranted.


Anti-Bacterial Agents , Multilocus Sequence Typing , Pneumococcal Infections , Serogroup , Streptococcus pneumoniae , Humans , Streptococcus pneumoniae/genetics , Streptococcus pneumoniae/drug effects , Streptococcus pneumoniae/classification , Streptococcus pneumoniae/isolation & purification , Male , Pneumococcal Infections/microbiology , Pneumococcal Infections/epidemiology , Pneumococcal Infections/mortality , Female , Child, Preschool , China/epidemiology , Infant , Anti-Bacterial Agents/pharmacology , Prospective Studies , Microbial Sensitivity Tests , Hospitals/statistics & numerical data , Child , Risk Factors , East Asian People
4.
Zhongguo Shi Yan Xue Ye Xue Za Zhi ; 32(2): 476-482, 2024 Apr.
Article Zh | MEDLINE | ID: mdl-38660855

OBJECTIVE: To study the reversal effect of NVP-BEZ235 on doxorubicin resistance in the Burkitt lymphoma RAJI cell line. METHODS: The doxorubicin-resistant cell line was induced by treating RAJI cells with a concentration gradient of doxorubicin. The levels of Pgp, p-AKT, and p-mTOR in cells were detected by Western blot. Cell viability was detected by MTT assay. IC50 was computed with SPSS. RESULTS: The doxorubicin-resistant Burkitt lymphoma cell line, RAJI/DOX, was established successfully. The expression of Pgp and the phosphorylation levels of AKT and mTOR in the RAJI/DOX cell line were higher than those in the RAJI cell line. NVP-BEZ235 downregulated the phosphorylation levels of AKT and mTOR in the RAJI/DOX cell line. NVP-BEZ235 inhibited the proliferation of the RAJI/DOX cell line, and the effect was evident when it was combined with doxorubicin. CONCLUSION: The constitutive activation of the PI3K/AKT/mTOR pathway was more pronounced in the RAJI/DOX cell line than in the RAJI cell line. NVP-BEZ235 reversed doxorubicin resistance of the RAJI/DOX cell line by inhibiting the PI3K/AKT/mTOR signaling pathway.


Burkitt Lymphoma , Cell Proliferation , Doxorubicin , Drug Resistance, Neoplasm , Imidazoles , Proto-Oncogene Proteins c-akt , Quinolines , TOR Serine-Threonine Kinases , Humans , Doxorubicin/pharmacology , Cell Line, Tumor , Proto-Oncogene Proteins c-akt/metabolism , Quinolines/pharmacology , TOR Serine-Threonine Kinases/metabolism , Cell Proliferation/drug effects , Imidazoles/pharmacology , Phosphatidylinositol 3-Kinases/metabolism , Signal Transduction , Cell Survival/drug effects , Phosphorylation
5.
Comput Struct Biotechnol J ; 23: 982-989, 2024 Dec.
Article En | MEDLINE | ID: mdl-38404709

The thermostable α-amylase derived from Bacillus licheniformis (BLA) has multiple advantages, including enhancing the mass transfer rate and reducing microbial contamination in starch hydrolysis. Nonetheless, the application of BLA is constrained by the accessibility and stability of enzymes capable of achieving high conversion rates at elevated temperatures. Moreover, the thermotolerance of BLA requires further enhancement. Here, we developed a computational strategy for constructing small and smart mutant libraries to identify variants with enhanced thermostability. Initially, molecular dynamics (MD) simulations were employed to identify the regions with high flexibility. Subsequently, FoldX, a computational design predictor, was used to design mutants by rigidifying highly flexible residues, whereas the simultaneous decrease in folding free energy assisted in improving thermostability. Through the utilization of MD and FoldX, residues K251, T277, N278, K319, and E336, situated at a distance of 5 Å from the catalytic triad, were chosen for mutation. Seventeen mutants were identified and characterized by evaluating enzymatic characteristics and kinetic parameters. The catalytic efficiency of the E271L/N278K mutant reached 184.1 g L⁻¹ s⁻¹, which is 1.88-fold larger than the corresponding value determined for the WT. Furthermore, the most thermostable mutant, E336S, exhibited a 1.43-fold improvement in half-life at 95 °C compared with that of the WT. This study, by combining computational simulation with experimental verification, establishes that potential sites can be computationally predicted to increase the activity and stability of BLA, and thus provides a possible strategy to guide protein design.

6.
Int J Biol Macromol ; 262(Pt 2): 130248, 2024 Mar.
Article En | MEDLINE | ID: mdl-38367782

Phenylalanine ammonia-lyase (PAL) has various applications in fine chemical manufacturing and the pharmaceutical industry. In particular, PAL derived from Anabaena variabilis (AvPAL) is used as a therapeutic agent to treat phenylketonuria in clinical settings. In this study, we aligned the amino acid sequences of AvPAL and PAL derived from Nostoc punctiforme (NpPAL) to obtain several mutants with enhanced activity, expression yield, and thermal stability via amino acid substitution and saturation mutagenesis at the N-terminal position. Enzyme kinetic experiments revealed that the kcat values of NpPAL-N2K, NpPAL-I3T, and NpPAL-T4L mutants were increased to 3.2-, 2.8-, and 3.3-fold that of the wild-type, respectively. Saturation mutagenesis of the fourth amino acid in AvPAL revealed that the kcat values of AvPAL-L4N, AvPAL-L4P, AvPAL-L4Q and AvPAL-L4S increased to 4.0-, 3.7-, 3.6-, and 3.2-fold, respectively. Additionally, the soluble protein yield of AvPAL-L4K increased to approximately 14 mg/L, which is approximately 3.5-fold that of AvPAL. Molecular dynamics studies further revealed that maintaining the attacking state of the reaction and N-terminal structure increased the rate of catalytic reaction and improved the solubility of proteins. These findings provide new insights for the rational design of PAL in the future.


Anabaena variabilis , Phenylalanine Ammonia-Lyase , Phenylalanine Ammonia-Lyase/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Anabaena variabilis/genetics , Anabaena variabilis/metabolism , Amino Acid Sequence , Catalysis
7.
Stud Health Technol Inform ; 310: 906-910, 2024 Jan 25.
Article En | MEDLINE | ID: mdl-38269940

Lymph node metastasis is of paramount importance for patient treatment decision-making, prognosis evaluation, and clinical trial enrollment. However, accurate preoperative diagnosis remains challenging. In this study, we proposed a multi-task network to learn the primary tumor pathological features using the pT stage prediction task and leverage these features to facilitate lymph node metastasis prediction. We conducted experiments using electronic medical record data from 681 patients with non-small cell lung cancer. The proposed method achieved a 0.768 area under the receiver operating characteristic curve (AUC) value with a 0.073 standard deviation (SD) and a 0.448 average precision (AP) value with a 0.113 SD for lymph node metastasis prediction, which significantly outperformed the baseline models. Based on the results, we can conclude that the proposed multi-task method can effectively learn representations about tumor pathological conditions to support lymph node metastasis prediction.


Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Humans , Lymphatic Metastasis , Learning , Electronic Health Records
8.
J Biomed Inform ; 149: 104566, 2024 Jan.
Article En | MEDLINE | ID: mdl-38070818

Modern hospitals implement clinical pathways to standardize patients' treatments. Conformance checking techniques provide an automated tool to assess whether the actual executions of clinical processes comply with the corresponding clinical pathways. However, clinical processes are typically characterized by a high degree of uncertainty, both in their execution and recording. This paper focuses on uncertainty related to logging clinical processes. The logging of the activities executed during a clinical process in the hospital information system is often performed manually by the involved actors (e.g., the nurses). However, such logging can occur at a different time than the actual execution time, which hampers the reliability of the diagnostics provided by conformance checking techniques. To address this issue, we propose a novel conformance checking algorithm that leverages principles of fuzzy set theory to incorporate experts' knowledge when generating conformance diagnostics. We exploit this knowledge to define a fuzzy tolerance in a time window, which is then used to assess the magnitude of timestamp violations of the recorded activities when evaluating the overall process execution compliance. Experiments conducted on a real-life case study in a Dutch hospital show that the proposed method obtains more accurate diagnostics than the state-of-the-art approaches. We also consider how our diagnostics can be used to stimulate discussion with domain experts on possible strategies to mitigate logging uncertainty in the clinical practice.


Algorithms , Hospital Information Systems , Humans , Reproducibility of Results , Uncertainty , Hospitals , Fuzzy Logic
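Entry 8 describes a fuzzy tolerance over a time window for judging how badly a logged timestamp deviates from the actual execution time. The sketch below shows one common way to express such a tolerance, a piecewise-linear membership function. The function name, the two thresholds, and the linear shape are assumptions made for illustration; the abstract does not specify the membership function the authors used.

```python
from datetime import datetime, timedelta


def timestamp_compliance(expected: datetime, logged: datetime,
                         full_tolerance: timedelta,
                         zero_tolerance: timedelta) -> float:
    """Degree of compliance in [0, 1] for a logged timestamp.

    Deviations up to `full_tolerance` count as fully compliant (1.0),
    deviations of `zero_tolerance` or more as fully non-compliant (0.0),
    with a linear decrease in between. Both thresholds are hypothetical
    stand-ins for values that domain experts would supply.
    """
    if not timedelta(0) <= full_tolerance < zero_tolerance:
        raise ValueError("need 0 <= full_tolerance < zero_tolerance")
    deviation = abs(logged - expected)
    if deviation <= full_tolerance:
        return 1.0
    if deviation >= zero_tolerance:
        return 0.0
    # timedelta / timedelta yields a plain float
    return (zero_tolerance - deviation) / (zero_tolerance - full_tolerance)


# Illustrative values: logging 37.5 minutes late with a 15-to-60-minute tolerance band
planned = datetime(2024, 1, 1, 8, 0)
degree = timestamp_compliance(planned, planned + timedelta(minutes=37, seconds=30),
                              timedelta(minutes=15), timedelta(minutes=60))
```

A graded score like this lets a conformance checker distinguish an entry logged slightly late from one logged hours after the fact, instead of flagging both as the same violation.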
9.
Int J Med Inform ; 183: 105321, 2024 Mar.
Article En | MEDLINE | ID: mdl-38157785

INTRODUCTION: Electronic health records contain an enormous amount of valuable information recorded in free text. Information extraction is the strategy to transform free text into structured data, but some of its components require annotated data to tune, which has become a bottleneck. Large language models achieve good performance on various downstream NLP tasks without parameter tuning, making them a possible way to extract information in a zero-shot manner. METHODS: In this study, we aim to explore whether the most popular large language model, ChatGPT, can extract information from radiological reports. We first design the prompt template for the information of interest in the CT reports. Then, we generate the prompts by combining the prompt template with the CT reports as the inputs of ChatGPT to obtain the responses. A post-processing module is developed to transform the responses into structured extraction results. In addition, we add prior medical knowledge to the prompt template to reduce wrong extraction results. We also explore the consistency of the extraction results. RESULTS: We conducted the experiments with 847 real CT reports. The experimental results indicate that ChatGPT can achieve competitive performance for some extraction tasks, such as tumor location and tumor long and short diameters, compared with the baseline information extraction system. By adding some prior medical knowledge to the prompt template, extraction tasks about tumor spiculations and lobulations obtain significant improvements, but tasks about tumor density and lymph node status do not achieve better performance. CONCLUSION: ChatGPT can achieve competitive information extraction for radiological reports in a zero-shot manner. Adding prior medical knowledge as instructions can further improve performance for some extraction tasks but may lead to worse performance for some complex extraction tasks.


Electronic Health Records , Neoplasms , Humans , Information Storage and Retrieval , Knowledge , Language
10.
Article En | MEDLINE | ID: mdl-38083421

Lung cancer is one of the most dangerous cancers worldwide. Surgical resection remains the only potentially curative option for patients with lung cancer. However, this invasive treatment often causes various complications, which seriously endanger patient health. In this study, we proposed a novel multi-label network, namely a hierarchy-driven multi-label network with label constraints (HDMN-LC), to predict the risk of complications in lung cancer patients. In this method, we first divided all complications into pulmonary and cardiovascular complication groups and employed a hierarchical clustering algorithm to analyze the hierarchies between these complications. After that, we employed the hierarchies to drive the network architecture design so that related complications could share more hidden features. Then, we combined all complications and employed an auxiliary task that predicts whether any complication would occur, to force the bottom layer to learn general features. Finally, we proposed a regularization term to constrain the relationship between specific and combined complication labels to improve performance. We conducted extensive experiments on real clinical data of 593 patients. Experimental results indicate that the proposed method outperforms the single-label and multi-label baseline methods, with an average AUC value of 0.653. The results also demonstrate the effectiveness of the hierarchy-driven network architecture and label constraints. We conclude that the proposed method can predict complications for lung cancer patients more effectively than the baseline methods. Clinical relevance: This study presents a novel multi-label network that can more accurately predict the risk of specific postoperative complications for lung cancer patients. The method can help clinicians identify high-risk patients more accurately and in a more timely manner so that interventions can be implemented in advance to ensure patient safety.


Lung Neoplasms , Humans , Lung Neoplasms/surgery , Algorithms , Postoperative Complications/diagnosis , Postoperative Complications/etiology , Learning , Pattern Recognition, Automated/methods
11.
Heliyon ; 9(4): e15570, 2023 Apr.
Article En | MEDLINE | ID: mdl-37151662

Background: ICD-10 has been widely used in statistical analysis of mortality rates and medical reimbursement. Automatic ICD-10 coding is desperately needed because manually assigning codes is expensive, time-consuming, and labor-intensive. Diagnoses described in medical records differ significantly from those used in ICD-10 classification, making it impossible for existing automatic coding techniques to perform well enough to support medical billing, resource allocation, and research requirements. Meanwhile, most of the current automatic coding approaches are oriented toward English ICD-10. This paper presents a method for automatically assigning ICD-10 codes to diagnoses extracted from Chinese discharge records. Method: First, BERT creates word representations of the two texts. Second, the context representation layer incorporates contextual information into the representation of each time step of the word representations using a bidirectional Long Short-Term Memory. Third, the matching layer compares each contextual embedding of the uncoded diagnosis record against a weighted version of all contextual character embeddings of the manually coded diagnosis record. The matching strategy applies element-wise subtraction and element-wise multiplication, followed by a neural network layer. Fourth, the matching vectors are combined using a one-layer convolutional neural network. A sigmoid is then used to output matching results. Results: To evaluate the proposed method, 1,003,558 manually coded primary diagnoses were gathered from the homepage of the discharge medical records. The experimental results showed that the proposed method outperformed popular deep semantic matching algorithms, such as DSSM, ConvNet, ESIM, and ABCNN, and demonstrated state-of-the-art results in single text matching with an accuracy of 0.986, a precision of 0.979, a recall of 0.983, and an F1-score of 0.981. Conclusion: The automatic ICD-10 coding of Chinese diagnoses is successful when using the proposed deep semantic matching approach based on analogical reasoning.
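The matching step in entry 11 (element-wise subtraction and element-wise multiplication, followed by a neural network layer) can be sketched in a few lines. This is a toy illustration only: concatenating the two comparison vectors, the ReLU activation, and the hand-picked weights are assumptions, and the BERT, BiLSTM, attention, and CNN stages of the described model are deliberately left out.

```python
def matching_vector(a, b):
    """Compare two equal-length embeddings element by element.

    Returns the element-wise difference followed by the element-wise
    product. Concatenating the two is an assumption of this sketch.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    diff = [x - y for x, y in zip(a, b)]
    prod = [x * y for x, y in zip(a, b)]
    return diff + prod


def dense_relu(features, weights, biases):
    """One fully connected layer with ReLU; weights is a list of rows."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * f for w, f in zip(row, features)) + bias
        outputs.append(max(0.0, z))
    return outputs


# Hypothetical 2-dimensional embeddings and hand-picked (untrained) weights
uncoded = [1.0, 2.0]
coded = [0.5, 4.0]
features = matching_vector(uncoded, coded)
hidden = dense_relu(features, [[1, 0, 0, 0], [0, 1, 0, 0]], [0.0, 0.0])
```

The difference terms capture where two embeddings disagree and the product terms where they agree in sign and magnitude, which is why the two comparisons are often used together in text-matching models.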

12.
Article En | MEDLINE | ID: mdl-37018304

Lymph node metastasis (LNM) is critical for treatment decision-making for cancer patients, but it is difficult to diagnose accurately before surgery. Machine learning can learn nontrivial knowledge from multi-modal data to support accurate diagnosis. In this paper, we proposed a Multi-modal Heterogeneous Graph Forest (MHGF) approach to extract the deep representations of LNM from multi-modal data. Specifically, we first extracted the deep image features from CT images to represent the pathological anatomic extent of the primary tumor (pathological T stage) using a ResNet-Trans network. Then, a heterogeneous graph with six vertices and seven bi-directional relations was defined by medical experts to describe the possible relations between the clinical and image features. After that, we proposed a graph forest approach to construct the sub-graphs by removing each vertex in the complete graph iteratively. Finally, we used graph neural networks to learn the representations of each sub-graph in the forest to predict LNM and averaged all the prediction results as the final result. We conducted experiments on 681 patients' multi-modal data. The proposed MHGF achieves the best performance, with a 0.806 AUC value and a 0.513 AP value, compared with state-of-the-art machine learning and deep learning methods. The results indicate that the graph method can explore the relations between different types of features to learn effective deep representations for LNM prediction. Moreover, we found that the deep image features about the pathological anatomic extent of the primary tumor are useful for LNM prediction, and that the graph forest approach can further improve the generalization ability and stability of the LNM prediction model.
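The graph forest construction in entry 12 (build sub-graphs by removing each vertex of the complete graph in turn, then average the per-sub-graph predictions) can be illustrated as follows. The vertex names, the seven edges, and the probabilities are hypothetical placeholders, since the abstract does not list them, and the graph neural network that would score each sub-graph is omitted.

```python
def graph_forest(vertices, edges):
    """Return one sub-graph per vertex, each with that vertex removed.

    `edges` is a list of undirected (u, v) pairs. Whether the complete
    graph itself also belongs to the forest is not stated in the abstract;
    this sketch leaves it out.
    """
    forest = []
    for removed in vertices:
        kept_vertices = [v for v in vertices if v != removed]
        kept_edges = [(u, v) for u, v in edges if removed not in (u, v)]
        forest.append((kept_vertices, kept_edges))
    return forest


def average_predictions(probabilities):
    """Average per-sub-graph probabilities into one final prediction."""
    return sum(probabilities) / len(probabilities)


# Hypothetical graph with six vertices and seven undirected relations
nodes = ["v1", "v2", "v3", "v4", "v5", "v6"]
links = [("v1", "v2"), ("v1", "v3"), ("v2", "v3"), ("v3", "v4"),
         ("v4", "v5"), ("v5", "v6"), ("v1", "v6")]
forest = graph_forest(nodes, links)
final = average_predictions([0.2, 0.4, 0.6])  # placeholder model outputs
```

Dropping one vertex at a time means that no single feature group is present in every ensemble member, which is one plausible reason such an ensemble would be more stable than a single model on the full graph.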

13.
Brief Bioinform ; 24(2)2023 03 19.
Article En | MEDLINE | ID: mdl-36733262

Single-cell RNA sequencing (scRNA-seq) data typically contain a large number of missing values, which often results in the loss of critical gene signaling information and seriously limits downstream analysis. Deep learning-based imputation methods often handle scRNA-seq data better than shallow ones, but most of them do not consider the inherent relations between genes, even though the expression of a gene is often regulated by other genes. Therefore, it is essential to impute scRNA-seq data by considering the regional gene-to-gene relations. We propose a novel model (named scGGAN) to impute scRNA-seq data that learns the gene-to-gene relations by Graph Convolutional Networks (GCN) and the global scRNA-seq data distribution by Generative Adversarial Networks (GAN). scGGAN first leverages single-cell and bulk genomics data to explore inherent relations between genes and builds a more compact gene relation network to jointly capture the homogeneous and heterogeneous information. Then, it constructs a GCN-based GAN model to integrate the scRNA-seq data, gene sequencing data and gene relation network for generating scRNA-seq data, and trains the model through adversarial learning. Finally, it utilizes data generated by the trained GCN-based GAN model to impute scRNA-seq data. Experiments on simulated and real scRNA-seq datasets show that scGGAN can effectively identify dropout events, recover biologically meaningful expressions, determine subcellular states and types, and improve differential expression analysis and temporal dynamics analysis. Ablation experiments confirm that both the gene relation network and gene sequence data help the imputation of scRNA-seq data.


Single-Cell Gene Expression Analysis , Software , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Genomics , Gene Expression Profiling
14.
Front Chem ; 10: 1063374, 2022.
Article En | MEDLINE | ID: mdl-36569957

Emergence of the SARS-CoV-2 Omicron variant of concern (VOC; B.1.1.529) resulted in a new peak of the COVID-19 pandemic, which called for the development of effective therapeutics against the Omicron VOC. The receptor binding domain (RBD) of the spike protein, which is responsible for recognition and binding of the human ACE2 receptor protein, is a potential drug target. Mutations in the receptor binding domain of the S-protein have been postulated to enhance the binding strength of the Omicron VOC to host proteins. In this study, bioinformatic analyses were performed to screen for potential therapeutic compounds targeting the Omicron VOC. A total of 92,699 compounds from different libraries were screened against the receptor binding domain of the S-protein via docking and binding free energy analysis, yielding the top 5 hits. Dynamic simulation trajectory analysis and binding free energy decomposition were used to determine the inhibitory mechanism of candidate molecules by focusing on their interactions with recognized residues on the receptor binding domain. ADMET prediction and DFT calculations were conducted to determine the pharmacokinetic parameters and precise chemical properties of the identified molecules. The molecular properties of the identified molecules and their ability to interfere with recognition of the human ACE2 receptor by the receptor binding domain suggest that they are potential therapeutic agents for the SARS-CoV-2 Omicron VOC.

15.
Materials (Basel) ; 15(21)2022 Oct 23.
Article En | MEDLINE | ID: mdl-36363026

As a zero-dimensional (0D) nanomaterial, the graphene quantum dot (GQD) has a unique physical structure and electrochemical properties and has been widely used in biomedical fields such as bioimaging, biosensing, and drug delivery. Its biological safety and potential cytotoxicity to human and animal cells have become a growing concern in recent years. In particular, the potential DNA structure damage caused by GQDs is of great importance but still obscure. In this study, molecular dynamics (MD) simulation was used to investigate the adsorption behavior and the structural changes of single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA) on the surfaces of GQDs with different sizes and degrees of oxidation. Our results showed that ssDNA can strongly adsorb and lie flat on the surface of GQDs and graphene oxide quantum dots (GOQDs), whereas dsDNA was preferentially oriented vertically on both surfaces. As GQD size increased, greater structural change was found in adsorbed ssDNA and dsDNA, while the size effect of GOQDs on the structure of ssDNA and dsDNA was not significant. These findings may help to improve the understanding of GQD biocompatibility and the potential applications of GQDs in the biomedical field.

16.
BMC Med Inform Decis Mak ; 22(1): 245, 2022 09 19.
Article En | MEDLINE | ID: mdl-36123745

BACKGROUND: Lung cancer is the leading cause of cancer death worldwide. Prognostic prediction plays a vital role in the decision-making process for postoperative non-small cell lung cancer (NSCLC) patients. However, the high imbalance ratio of prognostic data limits the development of effective prognostic prediction models. METHODS: In this study, we present a novel approach, namely ensemble learning with active sampling (ELAS), to tackle the imbalanced data problem in NSCLC prognostic prediction. ELAS first applies an active sampling mechanism to query the most informative samples to update the base classifier to give it a new perspective. This training process is repeated until not enough samples are queried. Next, an internal validation set is employed to evaluate the base classifiers, and the ones with the best performance are integrated as the ensemble model. In addition, we set up multiple initial training data seeds and internal validation sets to ensure the stability and generalization of the model. RESULTS: We verified the effectiveness of ELAS on a real clinical dataset containing 1848 postoperative NSCLC patients. Experimental results showed that ELAS achieved the best average AUROC (0.736) and AUPRC (0.453) across 6 prognostic tasks and obtained significant improvements in comparison with SVM, AdaBoost, Bagging, SMOTE and TomekLinks. CONCLUSIONS: We conclude that ELAS can effectively alleviate the imbalanced data problem in NSCLC prognostic prediction and demonstrates good potential for future postoperative NSCLC prognostic prediction.


Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Algorithms , Carcinoma, Non-Small-Cell Lung/surgery , Humans , Lung Neoplasms/surgery , Machine Learning , Prognosis
17.
Bioinformatics ; 38(19): 4581-4588, 2022 09 30.
Article En | MEDLINE | ID: mdl-35997558

MOTIVATION: High-resolution annotation of gene functions is a central task in functional genomics. Multiple proteoforms translated from alternatively spliced isoforms of a single gene are the actual function performers and greatly increase functional diversity. The specific functions of different isoforms can decipher the molecular basis of various complex diseases at a finer granularity. Multi-instance learning (MIL)-based solutions have been developed to distribute gene (bag)-level Gene Ontology (GO) annotations to isoforms (instances), but they simply presume that a particular annotation of the gene is attributable to only one isoform, neglect the hierarchical structures and semantics of massive GO terms (labels), or can only handle dozens of terms. RESULTS: We propose an effective approach, IsofunGO, to differentiate massive functions of isoforms by GO embedding. In particular, IsofunGO first introduces an attributed hierarchical network to model massive GO terms, and a GO network embedding strategy to learn compact representations of GO terms and project GO annotations of genes into compressed ones; this strategy not only explores and preserves the hierarchy between GO terms but also greatly reduces the prediction load. Next, it develops an attention-based MIL network to fuse genomics and transcriptomics data of isoforms and predict isoform functions by referring to compressed annotations. Extensive experiments on benchmark datasets demonstrate the efficacy of IsofunGO. Both the GO embedding and the attention mechanism can boost performance and interpretability. AVAILABILITY AND IMPLEMENTATION: The code of IsofunGO is available at http://www.sdu-idea.cn/codes.php?name=IsofunGO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Computational Biology , Semantics , Gene Ontology , Molecular Sequence Annotation , Protein Isoforms/genetics
18.
Comput Methods Programs Biomed ; 225: 107033, 2022 Oct.
Article En | MEDLINE | ID: mdl-35905698

BACKGROUND: Personalized medicine requires patient similarity analysis to provide specific treatments tailored to each patient. However, patient similarity analysis in personalized clinical scenarios faces two challenges. First, heterogeneous and multi-type data are usually recorded in Electronic Health Records (EHRs) during the course of admissions, which makes it difficult to measure patient similarity. Second, disease progression manifests diverse disease states at different times, which adds sequential complexity to dynamically retrieving similar patients' sequences. MATERIALS AND METHODS: To overcome the above-mentioned challenges, we propose a novel dynamic patient similarity analysis model based on deep learning. Specifically, the proposed model embeds the multi-type and heterogeneous data into hidden representations with a specially designed embedding and attention module. Thereafter, the proposed model retrieves similar patients' sequences based on these hidden representations in a dynamic manner. More importantly, we adopt two clinical tasks, i.e., diagnosis prediction and medication recommendation, to validate the effectiveness of the proposed model. It is worth noting that the proposed model integrates a drug-drug interaction (DDI) knowledge graph in the medication recommendation task to reduce adverse reactions caused by combinational treatments, such that a more rational strategy can be realized. We evaluate our proposed model using the critical care database MIMIC-III, which includes 5,430 patients covering 14,096 clinical visits. RESULTS: The proposed model outperforms several state-of-the-art methods. For diagnosis prediction, the average PR-AUC score of the proposed model reaches 0.6200, which is significantly higher than that of the baseline models (0.2497∼0.5407). Meanwhile, for medication recommendation, the average PR-AUC of the proposed model is 0.6682 (Jaccard: 0.4070; F1: 0.5672; Recall: 0.7832) whereas the K-nearest model can only reach 0.3805 (Jaccard: 0.3911; F1: 0.5465; Recall: 0.5705). In addition, our proposed model achieves a lower DDI rate. CONCLUSION: We propose a novel dynamic patient similarity analysis model, which can be implemented into a decision support system for clinical tasks including diagnosis prediction, surgical procedure selection, medication recommendation, etc. Also, the proposed model serves as an explainable protocol in clinical practice thanks to its analogy to real clinical reasoning, in which a doctor diagnoses diseases and prescribes medications based on experience with previously cured patients.


Electronic Health Records , Precision Medicine , Critical Care , Databases, Factual , Humans , Intensive Care Units
19.
Stud Health Technol Inform ; 290: 106-110, 2022 Jun 06.
Article En | MEDLINE | ID: mdl-35672980

Clinical data often have limited usefulness because the same concept is expressed in many different ways. Chinese clinical data standardization can improve the usability of clinical data. The complexity of data cleaning and coding for Chinese clinical data prompted a shift from inefficient manual coding to computer-aided tools. This study established a universal data cleaning and coding process and tool for Chinese clinical data standardization, which can greatly improve human efficiency. The process included preprocessing, a text similarity algorithm, and manual review. The standardization process proved effective for the diagnosis, drug, and examination data standardization tasks and can be extended gradually to other clinical domains. The semi-automatic data cleaning and coding can halve the time needed for standardization, and it has been used in hospitals in Beijing.


Algorithms , China , Humans , Reference Standards
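The three-stage process in entry 19 (preprocessing, text similarity, manual review) might be sketched as below. The abstract does not name the similarity algorithm, so Python's standard-library `difflib.SequenceMatcher` is swapped in here as a stand-in; the normalization rule, the 0.8 threshold, and the term codes are likewise hypothetical.

```python
from difflib import SequenceMatcher


def preprocess(term: str) -> str:
    """Normalize a raw term: lowercase it and drop all whitespace."""
    return "".join(term.lower().split())


def suggest_code(raw_term, standard_terms, threshold=0.8):
    """Suggest the closest standard code for a raw clinical term.

    Returns (code, score, needs_review). Matches scoring below
    `threshold` are flagged for manual review rather than accepted.
    """
    cleaned = preprocess(raw_term)
    best_code, best_score = None, 0.0
    for code, name in standard_terms.items():
        score = SequenceMatcher(None, cleaned, preprocess(name)).ratio()
        if score > best_score:
            best_code, best_score = code, score
    return best_code, best_score, best_score < threshold


# Hypothetical dictionary of standard terms; the codes are placeholders
standard = {"D001": "type 2 diabetes mellitus", "D002": "hypertension"}
exact = suggest_code("Type 2  Diabetes Mellitus", standard)
unclear = suggest_code("high blood pressure", standard)
```

In this sketch the first term is matched automatically while the second is routed to a human reviewer, which is the division of labor a semi-automatic process of this kind relies on.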
...