1.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38066710

ABSTRACT

Post-translational modification (PTM) occurs after a protein is translated from ribonucleic acid. It is an important biological phenomenon because it is implicated in almost all cellular processes. Identification of PTM sites from a given protein sequence is a hot topic in bioinformatics. Many computational methods have been proposed, and they provide good performance. However, most previous methods can tackle only one PTM type, and few consider multiple PTM types. In this study, a multi-label classification model, named RMTLysPTM, was developed to recognize four types of lysine (K) PTM sites: acetylation, crotonylation, methylation and succinylation. The sites surrounding a lysine site were selected to constitute a peptide segment, with the lysine at the center. An in-depth analysis was conducted to count the distribution of 2-residues with fixed location across the four types of lysine PTM sites. By aggregating the distribution information of the 2-residues in one peptide segment, the peptide segment was encoded by informative features. Furthermore, a prediction engine that can precisely capture the traits of the above representations was designed to recognize the types of lysine PTM sites. The cross-validation results on two datasets (Qiu and CPLM training datasets) suggested that the model had extremely high performance, and tests on protein Q16778 and the CPLM testing datasets indicated that RMTLysPTM had strong generalization ability. The model was found to be generally superior to all previous models and to those using popular methods and features. A web server was set up for RMTLysPTM, and it can be accessed at http://119.3.127.138/.
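
The peptide-segment step described in this abstract can be illustrated with a minimal sketch. It is not the RMTLysPTM implementation: the function name `lysine_segments`, the flank width, and the padding symbol `X` are assumptions chosen for illustration, since the abstract does not state the window size or how sequence ends are handled.

```python
def lysine_segments(sequence, flank=3, pad="X"):
    """Return (1-based position, segment) for every lysine (K) in `sequence`.

    Each segment has length 2 * flank + 1 with the lysine at its center.
    `flank` and `pad` are illustrative assumptions, not values from the paper.
    """
    # Pad both ends so lysines near the termini still get a full-length segment.
    padded = pad * flank + sequence + pad * flank
    segments = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Residue i of `sequence` sits at index i + flank in `padded`,
            # so the window starting at i is centered on it.
            segments.append((i + 1, padded[i:i + 2 * flank + 1]))
    return segments


# Toy sequence (hypothetical, not a real protein).
segments = lysine_segments("MKTAYIAKQR", flank=3)
for position, segment in segments:
    print(position, segment)
```

Each returned segment could then be passed to a feature encoder; the 2-residue distribution features themselves are not reproduced here because the abstract does not define them in enough detail.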


Subjects
Lysine , Proteins , Lysine/metabolism , Proteins/chemistry , Post-Translational Protein Processing , Amino Acid Sequence , Peptides/metabolism
2.
BMC Bioinformatics ; 25(1): 50, 2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38291384

ABSTRACT

BACKGROUND: Enzymes play an irreplaceable and important role in maintaining the lives of living organisms. The Enzyme Commission (EC) number of an enzyme indicates its essential functions. Correct identification of the first digit (family class) of the EC number for a given enzyme has been a hot topic for the past twenty years. Several previous methods adopted functional domain composition to represent enzymes. However, this leads to the curse of dimensionality, thereby reducing the efficiency of the methods. On the other hand, most previous methods can only deal with enzymes belonging to one family class. In fact, several enzymes belong to two or more family classes. RESULTS: In this study, a fast and efficient multi-label classifier, named PredictEFC, was designed. To construct this classifier, a novel feature extraction scheme was designed for processing functional domain information of enzymes, which counts the distribution of each functional domain entry across seven family classes in the training dataset. Based on this scheme, each training or test enzyme was encoded into a 7-dimensional vector by fusing its functional domain information and the above statistical results. Random k-labelsets (RAKEL) was adopted to build the classifier, where random forest was selected as the base classification algorithm. The two tenfold cross-validation results on the training dataset showed that the accuracy of PredictEFC can reach 0.8493 and 0.8370. The independent tests on two datasets yielded accuracy values of 0.9118 and 0.8777. CONCLUSION: The performance of PredictEFC was slightly lower than that of the classifier directly using functional domain composition. However, its efficiency was sharply improved. The running time was less than one-tenth of the time of the classifier directly using functional domain composition. In addition, the utility of PredictEFC was superior to that of classifiers using traditional dimensionality reduction methods and some previous methods, and this classifier can be transplanted for predicting enzyme family classes of other species. Finally, a web server available at http://124.221.158.221/ was set up for easy usage.
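
The feature extraction scheme in this abstract (count how each functional domain is distributed over the seven family classes, then encode an enzyme as a 7-dimensional vector) can be sketched as follows. This is an assumption-laden illustration, not the PredictEFC code: the abstract does not say how domain statistics are fused, so averaging the per-domain distributions is a stand-in, and the domain identifiers `dom_A`, `dom_B`, `dom_C` are hypothetical placeholders.

```python
from collections import defaultdict

N_CLASSES = 7  # the seven enzyme family classes mentioned in the abstract


def domain_distributions(training):
    """Map each domain to its normalized distribution over family classes.

    `training` is a list of (set of domain ids, set of class indices) pairs.
    """
    counts = defaultdict(lambda: [0] * N_CLASSES)
    for domains, classes in training:
        for domain in domains:
            for cls in classes:
                counts[domain][cls] += 1
    return {d: [v / sum(row) for v in row] for d, row in counts.items()}


def encode(domains, dists):
    """Encode an enzyme as a 7-dimensional vector.

    Assumption: fuse by averaging the distributions of the enzyme's known
    domains; the paper may use a different fusion rule.
    """
    known = [dists[d] for d in domains if d in dists]
    if not known:
        return [0.0] * N_CLASSES  # no domain seen in training
    return [sum(column) / len(known) for column in zip(*known)]


# Toy training data with hypothetical domain identifiers.
training = [
    ({"dom_A", "dom_B"}, {0}),
    ({"dom_B"}, {0, 3}),      # a multi-label enzyme: classes 0 and 3
    ({"dom_C"}, {6}),
]
dists = domain_distributions(training)
vector = encode({"dom_B", "dom_C"}, dists)
print(vector)
```

Whatever the number of distinct domain entries, the encoded vector stays at seven dimensions, which is the efficiency argument the abstract makes against using raw functional domain composition.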


Subjects
Algorithms , Enzymes , Enzymes/classification
3.
Biometrics ; 80(3)2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38949889

ABSTRACT

The response envelope model proposed by Cook et al. (2010) is an efficient method to estimate the regression coefficient under the context of the multivariate linear regression model. It improves estimation efficiency by identifying material and immaterial parts of responses and removing the immaterial variation. The response envelope model has been investigated only for continuous response variables. In this paper, we propose the multivariate probit model with latent envelope, in short, the probit envelope model, as a response envelope model for multivariate binary response variables. The probit envelope model takes into account relations between Gaussian latent variables of the multivariate probit model by using the idea of the response envelope model. We address the identifiability of the probit envelope model by employing the essential identifiability concept and suggest a Bayesian method for the parameter estimation. We illustrate the probit envelope model via simulation studies and real-data analysis. The simulation studies show that the probit envelope model has the potential to gain efficiency in estimation compared to the multivariate probit model. The real data analysis shows that the probit envelope model is useful for multi-label classification.
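
For orientation, the model described in this abstract can be written out under stated assumptions. The first two lines are the standard multivariate probit formulation; the last line applies the usual response envelope parameterization of Cook et al. (2010) to the latent Gaussian vector. The symbols are chosen here for illustration, and the paper's own notation and identifiability constraints may differ.

```latex
% Latent Gaussian layer of the multivariate probit model (r binary responses)
Z_i = \mu + \beta x_i + \varepsilon_i, \qquad \varepsilon_i \sim N_r(0, \Sigma)
% Observed binary responses
Y_{ij} = \mathbf{1}\{ Z_{ij} > 0 \}, \qquad j = 1, \dots, r
% Envelope structure (assumed form): material part \Gamma, immaterial part \Gamma_0
\beta = \Gamma \eta, \qquad \Sigma = \Gamma \Omega \Gamma^{\top} + \Gamma_0 \Omega_0 \Gamma_0^{\top}
```

Here $(\Gamma, \Gamma_0)$ is assumed to be an orthogonal matrix, so the immaterial variation $\Gamma_0 \Omega_0 \Gamma_0^{\top}$ carries no information about $\beta$ and can be separated from it in estimation.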


Subjects
Bayes Theorem , Computer Simulation , Statistical Models , Multivariate Analysis , Humans , Linear Models , Biometry/methods , Normal Distribution
4.
J Biomed Inform ; 150: 104586, 2024 02.
Article in English | MEDLINE | ID: mdl-38191011

ABSTRACT

BACKGROUND: Halbert L. Dunn's concept of wellness is multi-dimensional, encompassing social and mental well-being. Neglecting these dimensions over time can have a negative impact on an individual's mental health. The manual efforts employed in in-person therapy sessions reveal that underlying factors of mental disturbance, if triggered, may lead to severe mental health disorders. OBJECTIVE: In our research, we introduce a fine-grained approach focused on identifying indicators of wellness dimensions and marking their presence in self-narrated human writings on the Reddit social media platform. DESIGN AND METHOD: We present the MultiWD dataset, a curated collection of 3281 instances, specifically designed and annotated to facilitate the identification of multiple wellness dimensions in Reddit posts. In our study, we introduce the task of identifying wellness dimensions and utilize state-of-the-art classifiers to solve this multi-label classification task. RESULTS: Our findings highlight the best and comparative performance of fine-tuned large language models against a fine-tuned BERT model. As such, we set BERT as a baseline model to tag wellness dimensions in user-penned text, with an F1 score of 76.69. CONCLUSION: Our findings underscore the need for trustworthy and domain-specific knowledge infusion to develop more comprehensive and contextually aware AI models for tagging and extracting wellness dimensions.


Subjects
Mental Disorders , Social Media , Humans , Mental Health , Awareness
5.
J Biomed Inform ; 157: 104711, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39182632

ABSTRACT

OBJECTIVE: This study aimed to develop a novel approach using routinely collected electronic health records (EHRs) data to improve the prediction of a rare event. We illustrated this using an example of improving early prediction of an autism diagnosis, given its low prevalence, by leveraging correlations between autism and other neurodevelopmental conditions (NDCs). METHODS: To achieve this, we introduced a conditional multi-label model by merging conditional learning and multi-label methodologies. The conditional learning approach breaks a hard task into more manageable pieces in each stage, and the multi-label approach utilizes information from related neurodevelopmental conditions to learn predictive latent features. The study involved forecasting autism diagnosis by age 5.5 years, utilizing data from the first 18 months of life, and the analysis of feature importance correlations to explore the alignment within the feature space across different conditions. RESULTS: Upon analysis of health records from 18,156 children, we are able to generate a model that predicts a future autism diagnosis with moderate performance (AUROC=0.76). The proposed conditional multi-label method significantly improves predictive performance with an AUROC of 0.80 (p < 0.001). Further examination shows that both the conditional and multi-label approach alone provided marginal lift to the model performance compared to a one-stage one-label approach. We also demonstrated the generalizability and applicability of this method using simulated data with high correlation between feature vectors for different labels. CONCLUSION: Our findings underscore the effectiveness of the developed conditional multi-label model for early prediction of an autism diagnosis. The study introduces a versatile strategy applicable to prediction tasks involving limited target populations but sharing underlying features or etiology among related groups.

6.
Int J Geriatr Psychiatry ; 39(2): e6071, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38372966

ABSTRACT

BACKGROUND: Geriatric depression and anxiety have been identified as mood disorders commonly associated with the onset of dementia. Currently, the diagnosis of geriatric depression and anxiety relies on self-reported assessments for primary screening purposes, which is uncomfortable for older adults and can be prone to misreporting. When a more precise diagnosis is needed, additional methods such as in-depth interviews or functional magnetic resonance imaging are used. However, these methods can not only be time-consuming and costly but also require systematic and cost-effective approaches. OBJECTIVE: The main objective of this study was to investigate the feasibility of training an end-to-end deep learning (DL) model by directly inputting time-series activity tracking and sleep data obtained from consumer-grade wrist-worn activity trackers to identify comorbid depression and anxiety. METHODS: To enhance accuracy, the input of the DL model consisted of step counts and sleep stages as time series data, along with minimal depression and anxiety assessment scores as non-time-series data. The basic structure of the DL model was designed to process mixed-input data and perform multi-label-based classification for depression and anxiety. Various DL models, including the convolutional neural network (CNN) and long short-term memory (LSTM), were applied to process the time-series data, and model selection was conducted by comparing the performances of the hyperparameters. RESULTS: This study achieved significant results in the multi-label classification of depression and anxiety, with a Hamming loss score of 0.0946 in the Residual Network (ResNet), by applying a mixed-input DL model based on activity tracking data. The comparison of hyper-parameter performance and the development of various DL models, such as CNN, LSTM, and ResNet contributed to the optimization of time series data processing and achievement of meaningful results. 
CONCLUSIONS: This study can be considered as the first to develop a mixed-input DL model based on activity tracking data for the multi-label identification of late-life depression and anxiety. The findings of the study demonstrate the feasibility and potential of using consumer-grade wrist-worn activity trackers in conjunction with DL models to improve the identification of comorbid mental health conditions in older adults. The study also established a multi-label classification framework for identifying the complex symptoms of depression and anxiety.
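
The Hamming loss reported in this abstract is the fraction of individual label decisions that are wrong, taken over all samples and all labels. A minimal sketch of the standard definition follows; the toy data are invented for illustration and are unrelated to the study's dataset.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of (sample, label) entries where prediction and truth differ."""
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    return wrong / total


# Columns: [depression, anxiety]; rows: four hypothetical participants.
y_true = [[1, 0], [1, 1], [0, 0], [0, 1]]
y_pred = [[1, 0], [1, 0], [0, 0], [0, 1]]
loss = hamming_loss(y_true, y_pred)
print(loss)  # one wrong entry out of eight
```

Lower is better, and a prediction that gets one of two labels right is penalized half as much as one that gets both wrong, which is why the metric suits comorbid (multi-label) targets.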


Subjects
Deep Learning , Humans , Aged , Depression/diagnosis , Depression/epidemiology , Anxiety/diagnosis , Anxiety/epidemiology , Anxiety Disorders , Sleep
7.
BMC Biol ; 21(1): 238, 2023 10 31.
Article in English | MEDLINE | ID: mdl-37904157

ABSTRACT

BACKGROUND: Therapeutic peptides play an essential role in human physiology, treatment paradigms and bio-pharmacy. Several computational methods have been developed to identify the functions of therapeutic peptides based on binary classification and multi-label classification. However, these methods fail to explicitly exploit the relationship information among different functions, preventing the further improvement of the prediction performance. Besides, with the development of peptide detection technology, peptide functions will be more comprehensively discovered. Therefore, it is necessary to explore computational methods for detecting therapeutic peptide functions with limited labeled data. RESULTS: In this study, a novel method called TPpred-LE based on Transformer framework was proposed for predicting therapeutic peptide multiple functions, which can explicitly extract the function correlation information by using label embedding methodology and exploit the specificity information based on function-specific classifiers. Besides, we incorporated the multi-label classifier retraining approach (MCRT) into TPpred-LE to detect the new therapeutic functions with limited labeled data. Experimental results demonstrate that TPpred-LE outperforms the other state-of-the-art methods, and TPpred-LE with MCRT is robust for the limited labeled data. CONCLUSIONS: In summary, TPpred-LE is a function-specific classifier for accurate therapeutic peptide function prediction, demonstrating the importance of the relationship information for therapeutic peptide function prediction. MCRT is a simple but effective strategy to detect functions with limited labeled data.


Subjects
Computational Biology , Peptides , Humans , Peptides/therapeutic use
8.
Yi Chuan ; 46(8): 661-669, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39140146

ABSTRACT

The identification of enzyme functions plays a crucial role in understanding the mechanisms of biological activities and advancing the development of life sciences. However, existing enzyme EC number prediction methods did not fully utilize protein sequence information and still had shortcomings in identification accuracy. To address this issue, we proposed an EC number prediction network using hierarchical features and global features (ECPN-HFGF). This method first utilized residual networks to extract generic features from protein sequences, and then employed hierarchical feature extraction modules and global feature extraction modules to further extract hierarchical and global features of protein sequences. Subsequently, the prediction results of both feature types were combined, and a multitask learning framework was utilized to achieve accurate prediction of enzyme EC numbers. Experimental results indicated that the ECPN-HFGF method performed best in the task of predicting EC numbers for protein sequences, achieving macro F1 and micro F1 scores of 95.5% and 99.0%, respectively. The ECPN-HFGF method effectively combined hierarchical and global features of protein sequences, allowing for rapid and accurate EC number prediction. Compared to current commonly used methods, this method offers significantly higher prediction accuracy, providing an efficient approach for the advancement of enzymology research and enzyme engineering applications.
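
The macro F1 and micro F1 scores quoted in this abstract can differ substantially when some classes are rare. The sketch below shows the generic definitions only; it is not the evaluation code of ECPN-HFGF, and the toy label matrices are invented.

```python
def f1(tp, fp, fn):
    """F1 from true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (micro F1, macro F1) for binary indicator matrices."""
    n_labels = len(y_true[0])
    stats = [[0, 0, 0] for _ in range(n_labels)]  # [tp, fp, fn] per label
    for true_row, pred_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            if t and p:
                stats[j][0] += 1
            elif p:
                stats[j][1] += 1
            elif t:
                stats[j][2] += 1
    # Macro: average the per-label F1 scores, so every label counts equally.
    macro = sum(f1(*s) for s in stats) / n_labels
    # Micro: pool the counts over labels first, so frequent labels dominate.
    micro = f1(*[sum(column) for column in zip(*stats)])
    return micro, macro


# Label 0 is frequent and always right; label 1 is rare and always wrong.
y_true = [[1, 0], [1, 0], [1, 1], [1, 0]]
y_pred = [[1, 0], [1, 0], [1, 0], [1, 1]]
micro, macro = micro_macro_f1(y_true, y_pred)
print(micro, macro)
```

In this toy case micro F1 is 0.8 while macro F1 is 0.5, the same direction of gap as the 99.0% versus 95.5% reported above, which is what one would expect if rare EC classes are harder to predict.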


Subjects
Computational Biology , Computational Biology/methods , Amino Acid Sequence , Proteins/chemistry , Algorithms , Protein Sequence Analysis/methods , Enzymes/chemistry , Enzymes/metabolism
9.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32520339

ABSTRACT

Long non-coding RNAs (lncRNAs) have been the subject of intensive recent study owing to their association with various human diseases. It is desirable to build artificial intelligence-based models that predict diseases or tissues from lncRNA data, which would be useful in disease diagnosis and therapy. The accuracy and robustness of existing models based on machine learning techniques leave room for improvement. In this study, we propose a deep learning model, Multi-Label Classifications with Deep Forest (MLCDForest), to address multi-label classification for tissue prediction of a given lncRNA; it can be regarded as an implementation of the deep forest model for multi-label classification. MLCDForest is a sequential multi-label-grained scanning method, which distinguishes it from the standard deep forest model. It is trained sequentially over the labels, with label correlation taken into account. A systematic comparison using the lncRNA-disease association datasets demonstrates that our method consistently shows superior performance over the state-of-the-art methods in disease prediction. By considering label correlation in the sequential multi-label-grained scanning, our model provides a powerful tool for multi-label classification and tissue prediction based on given lncRNAs.


Subjects
Computational Biology , Deep Learning , Disease/genetics , Genetic Models , Long Noncoding RNA/genetics , Humans
10.
J Biomed Inform ; 143: 104396, 2023 07.
Article in English | MEDLINE | ID: mdl-37211195

ABSTRACT

Automated ICD coding is a multi-label prediction task aiming at assigning patient diagnoses with the most relevant subsets of disease codes. In the deep learning regime, recent works have suffered from large label set and heavy imbalance distribution. To mitigate the negative effect in such scenarios, we propose a retrieve and rerank framework that introduces the Contrastive Learning (CL) for label retrieval, allowing the model to make more accurate prediction from a simplified label space. Given the appealing discriminative power of CL, we adopt it as the training strategy to replace the standard cross-entropy objective and retrieve a small subset by taking the distance between clinical notes and ICD codes into account. After properly training, the retriever could implicitly capture the code co-occurrence, which makes up for the deficiency of cross-entropy assigning each label independently of the others. Further, we evolve a powerful model via a Transformer variant for refining and reranking the candidate set, which can extract semantically meaningful features from long clinical sequences. Applying our method on well-known models, experiments show that our framework provides more accurate results guaranteed by preselecting a small subset of candidates before fine-level reranking. Relying on the framework, our proposed model achieves 0.590 and 0.990 in terms of Micro-F1 and Micro-AUC on benchmark MIMIC-III.
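
The retrieval stage described in this abstract, which narrows a large label space to a small candidate set based on the distance between a clinical note and the ICD codes, can be sketched as a nearest-neighbor lookup. Everything concrete here is an assumption: the two-dimensional vectors, the code names `code_a` to `code_d`, and the use of cosine similarity stand in for embeddings that the paper learns with contrastive training. The Transformer-based reranker is not reproduced.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(note_vec, code_vecs, k):
    """Return the k code names whose embeddings are closest to the note."""
    ranked = sorted(code_vecs,
                    key=lambda code: cosine(note_vec, code_vecs[code]),
                    reverse=True)
    return ranked[:k]


# Hypothetical label embeddings; real ones would come from a trained encoder.
code_vecs = {
    "code_a": [1.0, 0.0],
    "code_b": [0.7, 0.7],
    "code_c": [0.0, 1.0],
    "code_d": [-1.0, 0.0],
}
candidates = retrieve([0.9, 0.1], code_vecs, k=2)
print(candidates)
```

A second-stage model then only has to score the `k` retrieved candidates rather than the full code set, which is the simplification of the label space that the abstract credits for the accuracy gain.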


Subjects
Electronic Health Records , International Classification of Diseases , Humans
11.
Sensors (Basel) ; 23(19)2023 Sep 25.
Article in English | MEDLINE | ID: mdl-37836898

ABSTRACT

The traction system is very important to ensure the safe operation of high-speed trains, and the failure of the traction transformer is the most likely fault in the traction system. Fault diagnosis in actual work relies largely on manual experience. This paper proposes an improved RAkEL (Random k-Labelsets) algorithm for the fault diagnosis of high-speed train traction transformers. Firstly, this article starts from the large amount of "sleeping" fault maintenance data accumulated by the railway department, takes a single maintenance record as an instance, uses specific monitoring values to construct an instance vector, and uses the fault phenomena corresponding to the monitoring indicators as labels. Then, this paper improves the step of selecting k-labelsets in RAkEL, and extracts associated faults using the Relief algorithm. Finally, this paper excavates and uses the association rules between data and faults to identify traction transformer faults. The results showed that the improved RAkEL diagnostic method had a significant improvement in the evaluation indicators. Compared with other multi-label classification algorithms, including BR (Binary Relevance) and CLR (Calibrated Label Ranking), this method performs well on multiple evaluation indicators. It can further help engineers perform timely maintenance work in the future.
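
For context, the baseline RAkEL procedure that this paper modifies draws random subsets of k labels (k-labelsets) and trains one label-powerset classifier per subset, treating each combination of those k labels as a single class. The sketch below covers only the plain random selection and the powerset target encoding; it does not include the paper's improved selection step or its Relief-based extraction, and the fault labels are hypothetical.

```python
import random
from itertools import combinations


def sample_labelsets(n_labels, k, m, seed=0):
    """Draw m distinct k-labelsets uniformly at random (baseline RAkEL step)."""
    all_sets = list(combinations(range(n_labels), k))
    return random.Random(seed).sample(all_sets, m)


def powerset_target(label_row, labelset):
    """Project a full label vector onto one labelset.

    The resulting tuple is the single multi-class target that a
    label-powerset classifier for this labelset would be trained on.
    """
    return tuple(label_row[j] for j in labelset)


# Four hypothetical fault labels; one maintenance record shows faults 0 and 2.
record_labels = [1, 0, 1, 0]
labelsets = sample_labelsets(n_labels=4, k=2, m=3)
for labelset in labelsets:
    print(labelset, powerset_target(record_labels, labelset))
```

At prediction time the per-labelset outputs are combined by voting on each label. Because labels inside one labelset are predicted jointly, co-occurring faults can be captured, which Binary Relevance (one independent classifier per label) cannot do.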

12.
Sensors (Basel) ; 23(5)2023 Feb 24.
Article in English | MEDLINE | ID: mdl-36904717

ABSTRACT

Classifications based on deep learning have been widely applied in the estimation of the direction of arrival (DOA) of signal. Due to the limited number of classes, the classification of DOA cannot satisfy the required prediction accuracy of signals from random azimuth in real applications. This paper presents a Centroid Optimization of deep neural network classification (CO-DNNC) to improve the estimation accuracy of DOA. CO-DNNC includes signal preprocessing, classification network, and Centroid Optimization. The DNN classification network adopts a convolutional neural network, including convolutional layers and fully connected layers. The Centroid Optimization takes the classified labels as the coordinates and calculates the azimuth of received signal according to the probabilities of the Softmax output. The experimental results show that CO-DNNC is capable of acquiring precise and accurate estimation of DOA, especially in the cases of low SNRs. In addition, CO-DNNC requires lower numbers of classes under the same condition of prediction accuracy and SNR, which reduces the complexity of the DNN network and saves training and processing time.
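
The Centroid Optimization step, as the abstract describes it, treats each class label as an angular coordinate and uses the Softmax probabilities as weights. A minimal reading of that description is a probability-weighted mean, sketched below. This is an assumption about the exact formula; the sketch also ignores wrap-around at 0/360 degrees and any restriction to classes near the peak that the paper may apply.

```python
def centroid_azimuth(class_angles, probs):
    """Probability-weighted centroid of the class angles, in degrees."""
    total = sum(probs)
    return sum(angle * p for angle, p in zip(class_angles, probs)) / total


# Four classes spaced 10 degrees apart (illustrative grid).
class_angles = [0.0, 10.0, 20.0, 30.0]
# Hypothetical Softmax output: most mass on 20 degrees, some on 10 degrees.
probs = [0.0, 0.25, 0.75, 0.0]
estimate = centroid_azimuth(class_angles, probs)
print(estimate)  # 17.5, between the two most likely classes
```

A plain argmax would return 20 degrees here, so its error is bounded below by the class spacing. The centroid can land between classes, which is consistent with the abstract's claim that fewer classes are needed for the same accuracy.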

13.
Sichuan Da Xue Xue Bao Yi Xue Ban ; 54(5): 884-891, 2023 Sep.
Article in Chinese | MEDLINE | ID: mdl-37866942

ABSTRACT

Objective: To improve the accuracy of potentially inappropriate medication (PIM) prediction, a PIM prediction model that combines knowledge graph and machine learning was proposed. Methods: Firstly, based on Beers criteria 2019 and using the knowledge graph as the basic structure, a PIM knowledge representation framework with logical expression capabilities was constructed, and a PIM inference process was implemented from patient information nodes to PIM nodes. Secondly, a machine learning prediction model for each PIM label was established based on the classifier chain algorithm, to learn the potential feature associations from the data. Finally, based on a threshold of sample size, a portion of reasoning results from the knowledge graph was used as output labels on the classifier chain to enhance the reliability of the prediction results of low-frequency PIMs. Results: 11 741 prescriptions from 9 medical institutions in Chengdu were used to evaluate the effectiveness of the model. Experimental results show that the accuracy of the model for PIM quantity prediction is 98.10%, the F1 is 93.66%, the Hamming loss for PIM multi-label prediction is 0.06%, and the macroF1 is 66.09%, which has higher prediction accuracy than the existing models. Conclusion: The method proposed has better prediction performance for potentially inappropriate medication and significantly improves the recognition of low-frequency PIM labels.
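
The classifier chain algorithm mentioned in the Methods trains one binary model per label in a fixed order, feeding each model the original features plus the labels earlier in the chain. The sketch below shows that structure only. The 1-nearest-neighbor base learner is a stand-in chosen to keep the example dependency-free, the toy data are invented, and the knowledge-graph step that supplies some labels in the paper is not modeled.

```python
def nearest_neighbor(train_X, train_y):
    """Tiny 1-NN base learner, standing in for any binary classifier."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in train_X]
        return train_y[dists.index(min(dists))]
    return predict


def fit_chain(X, Y):
    """Fit one model per label; model j sees features + true labels 0..j-1."""
    models = []
    for j in range(len(Y[0])):
        augmented = [x + y[:j] for x, y in zip(X, Y)]
        models.append(nearest_neighbor(augmented, [y[j] for y in Y]))
    return models


def predict_chain(models, x):
    """Predict labels in chain order, passing earlier predictions forward."""
    preds = []
    for model in models:
        preds.append(model(x + preds))
    return preds


# Toy data: two features, two labels (hypothetical PIM labels).
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [[0, 0], [0, 1], [1, 1], [1, 0]]
models = fit_chain(X, Y)
prediction = predict_chain(models, [0.9, 0.1])
print(prediction)
```

Because later models condition on earlier labels, the chain can learn associations between labels, which is the "potential feature associations" role the abstract assigns to it.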


Subjects
Inappropriate Prescribing , Potentially Inappropriate Medication List , Humans , Inappropriate Prescribing/prevention & control , Reproducibility of Results , Automated Pattern Recognition , Polypharmacy , Retrospective Studies
14.
Appl Intell (Dordr) ; 53(8): 9444-9462, 2023.
Article in English | MEDLINE | ID: mdl-35966181

ABSTRACT

Multi-view multi-label learning (MVML) is an important paradigm in machine learning, where each instance is represented by several heterogeneous views and associated with a set of class labels. However, label incompleteness and neglect of both the relationships among views and the correlations among labels cause performance degradation in MVML algorithms. Accordingly, a novel method, label recovery and label correlation co-learning for Multi-View Multi-Label classification with incoMplete Labels (MV2ML), is proposed in this paper. First, a label correlation-guided, kernel-based binary classifier is constructed for each label. Then, we adopt the multi-kernel fusion method to effectively fuse the multi-view data by utilizing the individual and complementary information among multiple views and distinguishing the contribution of each view. Finally, we propose a collaborative learning strategy that simultaneously considers the exploitation of asymmetric label correlations, the fusion of multi-view data, the recovery of the incomplete label matrix and the construction of the classification model. In this way, the recovery of the incomplete label matrix and the learning of label correlations interact and boost each other to guide the training of classifiers. Extensive experimental results demonstrate that MV2ML achieves highly competitive classification performance against state-of-the-art approaches on various real-world multi-view multi-label datasets in terms of six evaluation criteria.

15.
BMC Genomics ; 23(1): 284, 2022 Apr 09.
Article in English | MEDLINE | ID: mdl-35395714

ABSTRACT

BACKGROUND: Disclosure of patients' genetic information in the process of applying machine learning techniques for tumor classification compromises the privacy of personal information. Homomorphic Encryption (HE), which supports operations on encrypted data, can be used as one of the tools to perform such computation without information leakage, but the limited set of operations supported by HE makes it very challenging to apply general machine learning algorithms directly. In particular, non-polynomial activation functions, including softmax functions, are difficult to implement with HE and require a suitable approximation method to minimize the loss of accuracy. In the secure genome analysis competition iDASH 2020, one competition task was to build a multi-label tumor classification method that uses HE to predict the class of samples based on genetic information. METHODS: We develop a secure multi-label tumor classification method using HE to ensure privacy during all the computations of the model inference process. Our solution is based on a 1-layer neural network model with the softmax activation function and uses the approximate HE scheme. We present an approximation method that enables softmax activation in the model using HE and a technique for efficiently encoding data to reduce computational costs. In addition, we propose an HE-friendly data filtering method to reduce the size of large-scale genetic data. RESULTS: We analyze the dataset from The Cancer Genome Atlas (TCGA), which consists of 3,622 samples from 11 types of cancers with genetic features from 25,128 genes. Our preprocessing method reduces the number of genes to 4,096 or fewer and achieves a microAUC value of 0.9882 (85% accuracy) with a 1-layer shallow neural network. Using our model, we successfully compute the tumor classification inference steps on the encrypted test data in 3.75 minutes.
As a result of exceptionally high microAUC values, our solution was awarded co-first place in iDASH 2020 Track 1: "Secure multi-label Tumor classification using Homomorphic Encryption". CONCLUSIONS: Our solution is the first result of implementing a neural network model with softmax activation using HE. Also, HE optimization methods presented in this work enable machine learning implementation using HE or other challenging HE applications.


Subjects
Computer Security , Privacy , Algorithms , Genome-Wide Association Study , Humans , Computer Neural Networks
16.
Brief Bioinform ; 21(5): 1628-1640, 2020 09 25.
Article in English | MEDLINE | ID: mdl-31697319

ABSTRACT

Human protein subcellular localization has an important research value in biological processes, also in elucidating protein functions and identifying drug targets. Over the past decade, a number of protein subcellular localization prediction tools have been designed and made freely available online. The purpose of this paper is to summarize the progress of research on the subcellular localization of human proteins in recent years, including commonly used data sets proposed by the predecessors and the performance of all selected prediction tools against the same benchmark data set. We carry out a systematic evaluation of several publicly available subcellular localization prediction methods on various benchmark data sets. Among them, we find that mLASSO-Hum and pLoc-mHum provide a statistically significant improvement in performance, as measured by the value of accuracy, relative to the other methods. Meanwhile, we build a new data set using the latest version of Uniprot database and construct a new GO-based prediction method HumLoc-LBCI in this paper. Then, we test all selected prediction tools on the new data set. Finally, we discuss the possible development directions of human protein subcellular localization. Availability: The codes and data are available from http://www.lbci.cn/syn/.


Subjects
Internet , Proteins/metabolism , Subcellular Fractions/metabolism , Benchmarking , Datasets as Topic , Humans
17.
Sensors (Basel) ; 22(3)2022 Feb 04.
Article in English | MEDLINE | ID: mdl-35161928

ABSTRACT

The rapid growth and adaptation of medical information to identify significant health trends and help with timely preventive care have been recent hallmarks of the modern healthcare data system. Heart disease is the deadliest condition in the developed world. Cardiovascular disease and its complications, including dementia, can be averted with early detection. Further research in this area is needed to prevent strokes and heart attacks. An optimal machine learning model can help achieve this goal with a wealth of healthcare data on heart disease. Heart disease can be predicted and diagnosed using machine-learning-based systems. Active learning (AL) methods improve classification quality by incorporating user-expert feedback with sparsely labelled data. In this paper, five (MMC, Random, Adaptive, QUIRE, and AUDI) selection strategies for multi-label active learning were applied and used for reducing labelling costs by iteratively selecting the most relevant data to query their labels. The selection methods with a label ranking classifier have hyperparameters optimized by a grid search to implement predictive modelling in each scenario for the heart disease dataset. Experimental evaluation includes accuracy and F-score with/without hyperparameter optimization. Results show that the generalization of the learning model beyond the existing data for the optimized label ranking model uses the selection method versus others due to accuracy. However, the selection method was highlighted in regards to the F-score using optimized settings.


Subjects
Cardiovascular Diseases , Heart Diseases , Delivery of Health Care , Heart Diseases/diagnosis , Humans , Machine Learning , Supervised Machine Learning
18.
Sensors (Basel) ; 22(14)2022 Jul 13.
Article in English | MEDLINE | ID: mdl-35890929

ABSTRACT

Commercial load is an essential demand-side resource. Monitoring commercial loads helps not only commercial customers understand their energy usage to improve energy efficiency but also helps electric utilities develop demand-side management strategies to ensure stable operation of the power system. However, existing non-intrusive methods cannot monitor multiple commercial loads simultaneously and do not consider the high correlation and severe imbalance among commercial loads. Therefore, this paper proposes a deep learning-based non-intrusive commercial load monitoring method to solve these problems. The method takes the total power signal of the commercial building as input and directly determines the state and power consumption of several specific appliances. The key elements of the method are a new neural network structure called TTRNet and a new loss function called MLFL. TTRNet is a multi-label classification model that can autonomously learn correlation information through its unique network structure. MLFL is a loss function specifically designed for multi-label classification tasks, which solves the imbalance problem and improves the monitoring accuracy for challenging loads. To validate the proposed method, experiments are performed separately in seen and unseen scenarios using a public dataset. In the seen scenario, the method achieves an average F1 score of 0.957, which is 7.77% better than existing multi-label classification methods. In the unseen scenario, the average F1 score is 0.904, which is 1.92% better than existing methods. The experimental results show that the method proposed in this paper is both effective and practical.


Subjects
Deep Learning , Monitoring, Physiologic , Neural Networks, Computer
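
The abstract above does not define MLFL, so the following is only a sketch of a generic focal-style binary cross-entropy averaged over labels, one common way to down-weight easy examples under class imbalance. The parameters `gamma` and `alpha` follow the standard focal loss formulation and are assumptions here, not values or definitions taken from the paper.

```python
import math


def multilabel_focal_loss(y_true, y_prob, gamma=2.0, alpha=0.5, eps=1e-7):
    """Focal-style binary cross-entropy averaged over labels.

    y_true: list of 0/1 appliance states; y_prob: predicted on-probabilities.
    gamma down-weights well-classified labels; alpha balances on/off classes.
    """
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)       # avoid log(0)
        p_t = p if t == 1 else 1.0 - p        # probability given to the true state
        a_t = alpha if t == 1 else 1.0 - alpha
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)


# Hypothetical on/off states for four appliances and two sets of predictions.
states = [1, 0, 0, 1]
easy = multilabel_focal_loss(states, [0.95, 0.05, 0.10, 0.90])
hard = multilabel_focal_loss(states, [0.30, 0.60, 0.10, 0.40])
```

With `gamma=0` and `alpha=0.5` the function reduces to half the ordinary binary cross-entropy, which makes it easy to check against a known baseline.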
19.
BMC Bioinformatics ; 22(1): 590, 2021 Dec 13.
Article in English | MEDLINE | ID: mdl-34903164

ABSTRACT

BACKGROUND: Clinical notes are documents that contain detailed information about the health status of patients. Medical codes generally accompany them. However, manual diagnosis is costly and error-prone. Moreover, large datasets in clinical diagnosis are susceptible to noisy labels because of erroneous manual annotation. Therefore, machine learning has been utilized to perform automatic diagnoses. Previous state-of-the-art (SOTA) models used convolutional neural networks to build document representations for predicting medical codes. However, the clinical notes are usually long-tailed. Moreover, most models fail to deal with the noise during code allocation. Therefore, a denoising mechanism and long-tailed classification are the keys to automated coding at scale. RESULTS: In this paper, a new joint learning model is proposed to extend our attention model for predicting medical codes from clinical notes. On the MIMIC-III-50 dataset, our model outperforms all the baselines and SOTA models in all quantitative metrics. On the MIMIC-III-full dataset, our model outperforms the most advanced models in macro-F1, micro-F1, macro-AUC, and precision at eight. In addition, after introducing the denoising mechanism, the convergence speed of the model becomes faster, and the loss of the model is reduced overall. CONCLUSIONS: The innovations of our model are threefold: firstly, the code-specific representation can be identified by adopting the self-attention mechanism and the label attention mechanism. Secondly, the performance on long-tailed distributions can be boosted by introducing the joint learning mechanism. Thirdly, the denoising mechanism is suitable for reducing the noise effects in medical code prediction. Finally, we evaluate the effectiveness of our model on the widely used MIMIC-III datasets and achieve new SOTA results.


Subjects
Electronic Health Records , International Classification of Diseases , Humans , Machine Learning , Neural Networks, Computer
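
The abstract above reports micro-F1, macro-F1, and precision at eight. As a minimal sketch of how such multi-label metrics are usually computed, the code below uses their standard definitions; the function names and the toy label vectors are assumptions for illustration and are not taken from the paper or from MIMIC-III.

```python
def f1(tp, fp, fn):
    """F1 from counts; taken as 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """y_true, y_pred: lists of equal-length 0/1 label vectors, one per note."""
    n_labels = len(y_true[0])
    counts = [[0, 0, 0] for _ in range(n_labels)]  # tp, fp, fn per code
    for truth, pred in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(truth, pred)):
            if t and p:
                counts[j][0] += 1
            elif p:
                counts[j][1] += 1
            elif t:
                counts[j][2] += 1
    # Macro: average the per-code F1, so rare codes weigh as much as common ones.
    macro = sum(f1(*c) for c in counts) / n_labels
    # Micro: pool the counts over all codes, so frequent codes dominate.
    micro = f1(*[sum(c[i] for c in counts) for i in range(3)])
    return micro, macro


def precision_at_k(truth, scores, k):
    """Fraction of the k highest-scored codes that are truly assigned."""
    top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    return sum(truth[j] for j in top) / k


# Toy example: two notes, three candidate codes.
y_true = [[1, 0, 1], [1, 1, 0]]
y_pred = [[1, 0, 0], [1, 0, 0]]
micro, macro = micro_macro_f1(y_true, y_pred)
p_at_2 = precision_at_k([1, 0, 1], [0.9, 0.8, 0.1], k=2)
```

In this toy case the model only ever predicts the common first code, so micro-F1 stays higher than macro-F1. That gap is the usual symptom of poor performance on long-tailed codes.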
20.
BMC Bioinformatics ; 22(1): 554, 2021 Nov 15.
Article in English | MEDLINE | ID: mdl-34781902

ABSTRACT

BACKGROUND: Protein-RNA interactions play key roles in many processes regulating gene expression. To understand the underlying binding preference, ultraviolet cross-linking and immunoprecipitation (CLIP)-based methods have been used to identify the binding sites for hundreds of RNA-binding proteins (RBPs) in vivo. Using these large-scale experimental data to infer RNA binding preference and predict missing binding sites has become a great challenge. Some existing deep-learning models have demonstrated high prediction accuracy for individual RBPs. However, it remains difficult to avoid significant bias due to the experimental protocol. The DeepRiPe method was recently developed to solve this problem by introducing multi-task or multi-label learning into this field. However, this method has not reached an ideal level of prediction power due to its weak neural network architecture. RESULTS: Compared to the DeepRiPe approach, our Multi-resBind method demonstrated substantial improvements on the same large-scale PAR-CLIP dataset, with increases in the area under the receiver operating characteristic curve and in average precision. We conducted extensive experiments to evaluate the impact of various types of input data on the final prediction accuracy. The same approach was used to evaluate the effect of loss functions. Finally, a modified integrated gradient was employed to generate attribution maps. The patterns disentangled from relative contributions according to context offer biological insights into the underlying mechanism of protein-RNA interactions. CONCLUSIONS: Here, we propose Multi-resBind as a new multi-label deep-learning approach to infer protein-RNA binding preferences and predict novel interactions. The results clearly demonstrate that Multi-resBind is a promising tool to predict unknown binding sites in vivo and gain biological insights into why the neural network makes a given prediction.


Subjects
RNA-Binding Proteins , RNA , Binding Sites , Neural Networks, Computer , Protein Binding , RNA/genetics , RNA/metabolism , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism
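
The abstract above mentions a modified integrated gradient for attribution maps without describing the modification. The sketch below therefore shows only the standard integrated gradients method, applied to a toy logistic model with an analytic gradient; the model, weights, inputs, and all-zero baseline are assumptions for illustration and have no connection to the Multi-resBind network.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Toy differentiable predictor: logistic regression on a feature vector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def model_grad(x, w, b):
    """Analytic gradient of the toy model with respect to its input."""
    s = model(x, w, b)
    return [s * (1.0 - s) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann-sum approximation of standard integrated gradients."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        t = (k + 0.5) / steps
        # Point on the straight path from the baseline to the input.
        point = [b0 + t * (xi - b0) for xi, b0 in zip(x, baseline)]
        for i, g in enumerate(model_grad(point, w, b)):
            avg_grad[i] += g / steps
    # Scale the path-averaged gradient by the input's offset from the baseline.
    return [(xi - b0) * g for xi, b0, g in zip(x, baseline, avg_grad)]


# Hypothetical weights, input, and all-zero baseline.
w, b = [1.5, -2.0, 0.5], 0.1
x, baseline = [1.0, 0.2, -1.0], [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline, w, b)
```

A useful sanity check is the completeness property: the attributions should sum, up to numerical error, to the difference between the model output at the input and at the baseline.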