1.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34226915

ABSTRACT

Pseudouridine is a ubiquitous RNA modification type present in eukaryotes and prokaryotes, which plays a vital role in various biological processes. Almost all kinds of RNAs are subject to this modification. However, it remains a great challenge to identify pseudouridine sites via experimental approaches, which require expensive and time-consuming experimental research. Therefore, computational approaches that can be used to perform accurate in silico identification of pseudouridine sites from the large amount of RNA sequence data are highly desirable and can aid in the functional elucidation of this critical modification. Here, we propose a new computational approach, termed Porpoise, to accurately identify pseudouridine sites from RNA sequence data. Porpoise builds upon a comprehensive evaluation of 18 frequently used feature encoding schemes, from which four types of features were selected: binary features, pseudo k-tuple composition, nucleotide chemical property and position-specific trinucleotide propensity based on single-strand (PSTNPss). The selected features are fed into the stacked ensemble learning framework to enable the construction of an effective stacked model. Both cross-validation tests on the benchmark dataset and independent tests show that Porpoise achieves better predictive performance than several state-of-the-art approaches. The application of model interpretation tools demonstrates the importance of PSTNPss for the performance of the trained models. This new method is anticipated to facilitate community-wide efforts to identify putative pseudouridine sites and formulate novel testable biological hypotheses.


Subjects
Computational Biology/methods , Pseudouridine/chemistry , RNA/chemistry , RNA/genetics , Algorithms , Machine Learning , Pseudouridine/genetics , Reproducibility of Results , Sequence Analysis, RNA/methods
2.
Sensors (Basel) ; 22(19)2022 Oct 04.
Article in English | MEDLINE | ID: mdl-36236620

ABSTRACT

Multispectral imaging (MSI) has become a new fast and non-destructive detection method in seed identification. Previous research has usually focused on single models in MSI data analysis, which always employed all features, reducing efficiency and raising system cost. In this study, we developed a stacking ensemble learning (SEL) model for successfully identifying a single seed of sickle alfalfa (Medicago falcata), hybrid alfalfa (M. varia), and alfalfa (M. sativa). SEL adopted a three-layer structure, i.e., level 0 with principal component analysis (PCA), linear discriminant analysis (LDA), and quadratic discriminant analysis (QDA) as models of dimensionality reduction and feature extraction (DRFE); level 1 with support vector machine (SVM), multiple logistic regression (MLR), generalized linear models with elastic net regularization (GLMNET), and eXtreme Gradient Boosting (XGBoost) as basic learners; and level 2 with XGBoost as meta-learner. We confirmed that the values of overall accuracy, kappa, precision, sensitivity, and specificity in the SEL model were all significantly higher than those in basic models alone, based on both spectral features and a combination of morphological and spectral features. Furthermore, we also developed a feature filtering process and successfully selected 5 optimal features out of 33, which corresponded to the contents of chlorophyll, anthocyanin, fat, and moisture in seeds. Our SEL model in MSI data analysis provided a new way for seed identification, and the feature filtering process could potentially be used widely for the development of a low-cost and narrow-channel sensor.


Subjects
Anthocyanins , Medicago , Chlorophyll , Seeds , Support Vector Machine
3.
Sensors (Basel) ; 22(9)2022 Apr 27.
Article in English | MEDLINE | ID: mdl-35591030

ABSTRACT

Semantic segmentation network-based methods can detect concrete damage at the pixel level. However, the performance of a single semantic segmentation network is often limited. To improve the concrete damage detection performance of a semantic segmentation network, a stacking ensemble learning-based concrete crack detection method using multiple semantic segmentation networks is proposed. To realize this method, a database including 500 images and their labels with concrete crack and spalling is built and divided into training and testing sets. At first, the training and prediction of five semantic segmentation networks (FCN-8s, SegNet, U-Net, PSPNet and DeepLabv3+) are respectively implemented on the built training set according to a five-fold cross-validation principle, where 80% of the training images are used in the training process, and 20% training images are reserved. Then, in predicting the results of reserved training images from trained semantic segmentation networks, the class labels of all pixels are collected, and then four softmax regression-based ensemble learning models are trained using the collected class labels and their true classification labels. The trained ensemble learning models are applied to regressed testing results of semantic segmentation network models. Compared with the best single semantic segmentation network, the best ensemble learning model provides performance improvement of 0.21% PA, 0.54% MPA, 3.66% MIoU, and 0.12% FWIoU, respectively. The study results show that the stacking ensemble learning strategy can indeed improve concrete damage detection performance through ensemble learning of multiple semantic segmentation networks.
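
The five-fold scheme in this abstract (train on 80% of the training images, reserve 20%, then train the ensemble on predictions for the reserved images) is the usual way to obtain out-of-fold predictions for stacking. A minimal sketch of that bookkeeping follows; the `fit` and `predict` callables stand in for any base model and are hypothetical, as are the function names. It does not reproduce the paper's segmentation networks.

```python
def five_fold_splits(n_items, n_folds=5):
    """Yield (train_idx, held_out_idx); each item is held out exactly once."""
    indices = list(range(n_items))
    for fold in range(n_folds):
        held_out = indices[fold::n_folds]
        held = set(held_out)
        train = [i for i in indices if i not in held]
        yield train, held_out


def out_of_fold_predictions(items, fit, predict, n_folds=5):
    """Predict every item with a model that was trained without that item."""
    oof = [None] * len(items)
    for train_idx, held_idx in five_fold_splits(len(items), n_folds):
        model = fit([items[i] for i in train_idx])
        for i in held_idx:
            oof[i] = predict(model, items[i])
    return oof
```

The resulting list has one prediction per training item, none of which came from a model that saw that item, so a second-level learner trained on it is not fitted to leaked labels.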


Subjects
Image Processing, Computer-Assisted , Neural Networks, Computer , Databases, Factual , Image Processing, Computer-Assisted/methods , Machine Learning , Semantics
4.
Sensors (Basel) ; 21(11)2021 May 25.
Article in English | MEDLINE | ID: mdl-34070287

ABSTRACT

(1) Background: Diabetic retinopathy, one of the most serious complications of diabetes, is the primary cause of blindness in developed countries. Therefore, the prediction of diabetic retinopathy has a positive impact on its early detection and treatment. The prediction of diabetic retinopathy based on high-dimensional and small-sample-structured datasets (such as biochemical data and physical data) was the problem to be solved in this study. (2) Methods: This study proposed the XGB-Stacking model with the foundation of XGBoost and stacking. First, a wrapped feature selection algorithm, XGBIBS (Improved Backward Search Based on XGBoost), was used to reduce data feature redundancy and improve the effect of a single ensemble learning classifier. Second, in view of the slight limitation of a single classifier, a stacking model fusion method, Sel-Stacking (Select-Stacking), which keeps Label-Proba as the input matrix of the meta-classifier and determines the optimal combination of learners by a global search, was used in the XGB-Stacking model. (3) Results: XGBIBS greatly improved the prediction accuracy and the feature reduction rate of a single classifier. Compared to a single classifier, the accuracy of the Sel-Stacking model was improved to varying degrees. Experiments proved that the prediction model of XGB-Stacking based on the XGBIBS algorithm and the Sel-Stacking method made effective predictions on diabetic retinopathy. (4) Conclusion: The XGB-Stacking prediction model of diabetic retinopathy based on biochemical and physical data had outstanding performance. This is highly significant for improving the screening efficiency of diabetic retinopathy and reducing the cost of diagnosis.


Subjects
Diabetes Mellitus , Diabetic Retinopathy , Algorithms , Diabetic Retinopathy/diagnosis , Humans , Machine Learning
5.
Water Res ; 259: 121848, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-38824797

ABSTRACT

Chronic exposure to elevated geogenic arsenic (As) and fluoride (F-) concentrations in groundwater poses a significant global health risk. In regions around the world where regular groundwater quality assessments are limited, the presence of harmful levels of As and F- in shallow groundwater extracted from specific wells remains uncertain. This study utilized an enhanced stacking ensemble learning model to predict the distributions of As and F- in shallow groundwater based on 4,393 available datasets of observed concentrations and forty relevant environmental factors. The enhanced model was obtained by fusing well-suited Extreme Gradient Boosting, Random Forest, and Support Vector Machine as the base learners and a structurally simple Linear Discriminant Analysis as the meta-learner. The model precisely captured the patchy distributions of groundwater As and F- with AUC values of 0.836 and 0.853, respectively. The findings revealed that 9.0% of the study area was characterized by a high As risk in shallow groundwater, while 21.2% was identified as having a high risk of fluoride contamination. About 0.2% of the study area shows elevated levels of both. The affected populations are estimated at approximately 7.61 million, 34.1 million, and 0.2 million, respectively. Furthermore, sedimentary environment exerted the greatest influence on the distribution of groundwater As, followed closely by human activities and climate, contributing 29.5%, 28.1%, and 21.9%, respectively. Likewise, sedimentary environment was the primary factor affecting groundwater F- distribution, followed by hydrogeology and soil physicochemical properties, contributing 27.8%, 24.0%, and 23.3%, respectively. This study contributed to the identification of health risks associated with shallow groundwater As and F-, and provided insights into evaluating health risks in regions with limited samples.
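
The AUC values quoted in this abstract summarize how well predicted risk scores rank contaminated wells above uncontaminated ones. As a reminder of what the number means, the sketch below computes AUC directly from its pairwise definition: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half. The function name `roc_auc` and the toy data are illustrative assumptions, not the study's data or code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for positive (e.g. above a concentration threshold), 0 otherwise.
    scores: predicted risk, higher meaning more likely positive.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

This quadratic-time form is meant for reading, not for datasets of several thousand samples, where a rank-based computation would be preferable.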


Subjects
Arsenic , Environmental Monitoring , Fluorides , Groundwater , Water Pollutants, Chemical , Groundwater/chemistry , Fluorides/analysis , Arsenic/analysis , Water Pollutants, Chemical/analysis , China
6.
Talanta ; 276: 126242, 2024 Aug 15.
Article in English | MEDLINE | ID: mdl-38761656

ABSTRACT

Spectral preprocessing techniques can, to a certain extent, eliminate irrelevant information, such as current noise and stray light from spectral data, thereby enhancing the performance of prediction models. However, current preprocessing techniques mostly attempt to find the best single preprocessing method or their combination, overlooking the complementary information among different preprocessing methods. These preprocessing techniques fail to maximize the utilization of useful information in spectral data and restrict the performance of prediction models. This study proposed a spectral ensemble preprocessing method based on the rapidly developing ensemble learning methods in recent years and the ridge regression (RR) model, named stacking preprocessing ridge regression (SPRR), to address the aforementioned issues. Different from conventional ensemble learning methods, the proposed SPRR method applied multiple different preprocessing techniques to the original spectral data, generating multiple preprocessed datasets. These datasets were then individually inputted into RR base models for training. Ultimately, RR still served as the meta-model, integrating the output results of each RR base model through stacking. This approach not only produced diversity in base models but also achieved higher accuracy and lower computational complexity by using a single type of base model. On the apple spectral dataset collected by our team, correlation analysis showed significant complementary information among the data produced by different preprocessing techniques. This provided robust theoretical support for the proposed SPRR method. 
By introducing the currently popular averaging ensemble preprocessing method in a comparative experiment, the results of applying the proposed SPRR method to six datasets (apple, meat, wheat, olive oil, tablet, and corn) demonstrated that compared to the single preprocessing method and averaging ensemble preprocessing method, SPRR yielded the best accuracy and reliability for all six datasets. Furthermore, under the same conditions of the training and test datasets, the proposed SPRR method demonstrated better performance than the four commonly used ensemble preprocessing methods.

7.
J Med Imaging Radiat Sci ; 55(4): 101729, 2024 Aug 10.
Article in English | MEDLINE | ID: mdl-39128321

ABSTRACT

PURPOSE: To construct a tumor motion monitoring model for stereotactic body radiation therapy (SBRT) of lung cancer from a feasibility perspective. METHODS: A total of 32 treatment plans for 22 patients were collected, whose planning CT and the centroid position of the planning target volume (PTV) were used as the reference. Images of different respiratory phases in 4DCT were acquired to redefine the targets and obtain the floating PTV centroid positions. In accordance with the planning CT and CBCT registration parameters, data augmentation was accomplished, yielding 2130 experimental recordings for analysis. We employed a stacking multi-learning ensemble approach to fit the 3D point cloud variations of body surface and the change of target position to construct the tumor motion monitoring model, and the prediction accuracy was assessed using root mean squared error (RMSE) and R-squared (R2). RESULTS: The predicted displacement of the stacking ensemble model shows a high degree of agreement with the reference value in each direction. In the first layer of the model, the X direction (RMSE = 0.019 ∼ 0.145 mm, R2 = 0.9793 ∼ 0.9996) and the Z direction (RMSE = 0.051 ∼ 0.168 mm, R2 = 0.9736 ∼ 0.9976) show the best results, while the Y direction ranked behind (RMSE = 0.088 ∼ 0.224 mm, R2 = 0.9553 ∼ 0.9933). The second-layer model combines the advantages of the first-layer unit models, yielding RMSE of 0.015 mm, 0.083 mm, and 0.041 mm and R2 of 0.9998, 0.9931, and 0.9984 for X, Y, and Z, respectively. CONCLUSIONS: The tumor motion monitoring method for SBRT of lung cancer has potential for non-ionizing, non-invasive, markerless, real-time application.
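
The two accuracy measures used in this abstract have standard definitions, sketched below for one displacement axis. The function names are illustrative; the abstract does not say which software computed the reported values.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the inputs (e.g. mm)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A model that always predicts the mean of the reference values scores an R2 of 0, so values near 1 on each of the X, Y and Z axes indicate that almost all displacement variance is explained.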

8.
Comput Biol Med ; 178: 108731, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38870727

ABSTRACT

Non-sugar sweeteners (NSSs), or artificial sweeteners, have been used as food chemicals since World War II. NSSs, however, also raise concerns about their mutagenicity. Evaluating the mutagenic potential of NSSs is crucial for food safety; this step is needed for every new chemical registration in the food and pharmaceutical industries. A computational assessment requires less time, less money, and fewer animals than in vivo experiments; thus, this study developed a novel computational method, called DeepRA, that combines an ensemble convolutional deep neural network with read-across algorithms to classify the mutagenicity of chemicals. The mutagenicity data were obtained from the curated Ames test data set. The DeepRA model was developed using both molecular descriptors and molecular fingerprints. The resulting DeepRA model provided accurate and reliable mutagenicity classification on an independent test set. This model was then used to examine NSS-related chemicals, enabling the evaluation of mutagenicity of NSS-like substances. The model is publicly available at https://github.com/taraponglab/deepra for further use in chemical regulation and risk assessment.


Subjects
Deep Learning , Mutagens , Mutagens/toxicity , Sweetening Agents/toxicity , Mutagenicity Tests , Algorithms , Neural Networks, Computer
9.
Ultrasound Med Biol ; 50(9): 1361-1371, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38910034

ABSTRACT

BACKGROUND: Ultrasound image examination has become the preferred choice for diagnosing metabolic dysfunction-associated steatotic liver disease (MASLD) due to its non-invasive nature. Computer-aided diagnosis (CAD) technology can assist doctors in avoiding deviations in the detection and classification of MASLD. METHOD: We propose a hybrid model that integrates the pre-trained VGG16 network with an attention mechanism and a stacking ensemble learning model, which is capable of multi-scale feature aggregation based on the self-attention mechanism and multi-classification model fusion (Logistic regression, random forest, support vector machine) based on stacking ensemble learning. The proposed hybrid method achieves four classifications of normal, mild, moderate, and severe fatty liver based on ultrasound images. RESULT AND CONCLUSION: Our proposed hybrid model reaches an accuracy of 91.34% and exhibits superior robustness against interference, which is better than traditional neural network algorithms. Experimental results show that, compared with the pre-trained VGG16 model, adding the self-attention mechanism improves the accuracy by 3.02%. Using the stacking ensemble learning model as a classifier further increases the accuracy to 91.34%, exceeding any single classifier such as LR (89.86%) and SVM (90.34%) and RF (90.73%). The proposed hybrid method can effectively improve the efficiency and accuracy of MASLD ultrasound image detection.


Subjects
Algorithms , Neural Networks, Computer , Ultrasonography , Humans , Ultrasonography/methods , Liver/diagnostic imaging , Fatty Liver/diagnostic imaging , Machine Learning , Image Interpretation, Computer-Assisted/methods
10.
PeerJ Comput Sci ; 10: e2103, 2024.
Article in English | MEDLINE | ID: mdl-38983199

ABSTRACT

Images and videos containing fake faces are the most common type of digital manipulation. Such content can lead to negative consequences by spreading false information. The use of machine learning algorithms to produce fake face images has made it challenging to distinguish between genuine and fake content. Face manipulations are categorized into four basic groups: entire face synthesis, face identity manipulation (deepfake), facial attribute manipulation and facial expression manipulation. The study utilized lightweight convolutional neural networks to detect fake face images generated by using entire face synthesis and generative adversarial networks. The dataset used in the training process includes 70,000 real images in the FFHQ dataset and 70,000 fake images produced with StyleGAN2 using the FFHQ dataset. 80% of the dataset was used for training and 20% for testing. Initially, the MobileNet, MobileNetV2, EfficientNetB0, and NASNetMobile convolutional neural networks were trained separately. The models were pre-trained on ImageNet and reused with transfer learning. After this first round of training, EfficientNetB0 reached the highest accuracy, 93.64%. EfficientNetB0 was then revised to increase its accuracy by adding two dense layers (256 neurons) with ReLU activation, two dropout layers, one flattening layer, one dense layer (128 neurons) with ReLU activation, and a classification dense layer with two nodes and softmax activation. With this revision, EfficientNetB0 achieved an accuracy of 95.48%. Finally, the model that achieved 95.48% accuracy was used to train MobileNet and MobileNetV2 models together using the stacking ensemble learning method, resulting in the highest accuracy rate of 96.44%.

11.
Brief Funct Genomics ; 22(3): 302-311, 2023 05 18.
Article in English | MEDLINE | ID: mdl-36715222

ABSTRACT

Enhancers, a class of distal cis-regulatory elements located in the non-coding region of DNA, play a key role in gene regulation. It is difficult to identify enhancers from DNA sequence data because enhancers are freely distributed in the non-coding region, with no specific sequence features, and having a long distance with the targeted promoters. Therefore, this study presents a stacking ensemble learning method to accurately identify enhancers and classify enhancers into strong and weak enhancers. Firstly, we obtain the fusion feature matrix by fusing the four features of Kmer, PseDNC, PCPseDNC and Z-Curve9. Secondly, five K-Nearest Neighbor (KNN) models with different parameters are trained as the base model, and the Logistic Regression algorithm is utilized as the meta-model. Thirdly, the stacking ensemble learning strategy is utilized to construct a two-layer model based on the base model and meta-model to train the preprocessed feature sets. The proposed method, named iEnhancer-SKNN, is a two-layer prediction model, in which the function of the first layer is to predict whether the given DNA sequences are enhancers or non-enhancers, and the function of the second layer is to distinguish whether the predicted enhancers are strong enhancers or weak enhancers. The performance of iEnhancer-SKNN is evaluated on the independent testing dataset and the results show that the proposed method has better performance in predicting enhancers and their strength. In enhancer identification, iEnhancer-SKNN achieves an accuracy of 81.75%, an improvement of 1.35% to 8.75% compared with other predictors, and in enhancer classification, iEnhancer-SKNN achieves an accuracy of 80.50%, an improvement of 5.5% to 25.5% compared with other predictors. Moreover, we identify key transcription factor binding site motifs in the enhancer regions and further explore the biological functions of the enhancers and these key motifs. 
Source code and data can be downloaded from https://github.com/HaoWuLab-Bioinformatics/iEnhancer-SKNN.
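
Of the four fused descriptors named in this abstract, Kmer is the simplest: the relative frequency of every length-k subsequence. The sketch below shows one common form of it. The choice of k, the normalization by the number of windows, and the function name `kmer_frequencies` are assumptions for illustration; the authors' implementation is in the repository linked above.

```python
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGT"):
    """Return the frequency of each k-mer, in lexicographic alphabet order.

    Windows containing characters outside `alphabet` (e.g. N) are skipped
    but still count toward the number of windows.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    n_windows = len(seq) - k + 1
    if n_windows <= 0:
        return [0.0] * len(kmers)
    for i in range(n_windows):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    return [counts[kmer] / n_windows for kmer in kmers]
```

With k = 2 the vector has 16 entries per sequence; descriptors of this kind would be concatenated with the others to form a fusion feature matrix.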


Subjects
Enhancer Elements, Genetic , Software , Enhancer Elements, Genetic/genetics , Promoter Regions, Genetic/genetics , Sequence Analysis, DNA/methods , DNA , Machine Learning
12.
Methods Mol Biol ; 2624: 139-151, 2023.
Article in English | MEDLINE | ID: mdl-36723814

ABSTRACT

Pseudouridine is a ubiquitous RNA modification and plays a crucial role in many biological processes. However, it remains a challenging task to identify pseudouridine sites using expensive and time-consuming experimental research. To this end, we present Porpoise, a computational approach to identify pseudouridine sites from RNA sequence data. Porpoise builds on a stacking ensemble learning framework with several informative features and achieves competitive performance compared with state-of-the-art approaches. This protocol elaborates on step-by-step use and execution of the local stand-alone version and the webserver of Porpoise. In addition, we also provide a general machine learning framework that can help identify the optimal stacking ensemble learning model using different combinations of feature-based features. This general machine learning framework can facilitate users to build their pseudouridine predictors using their in-house datasets.


Subjects
Pseudouridine , RNA , RNA/genetics , Machine Learning , Base Sequence
13.
Diagnostics (Basel) ; 13(4)2023 Feb 14.
Article in English | MEDLINE | ID: mdl-36832205

ABSTRACT

Endoscopic procedures for diagnosing gastrointestinal tract findings depend on specialist experience and inter-observer variability. This variability can cause minor lesions to be missed and prevent early diagnosis. In this study, deep learning-based hybrid stacking ensemble modeling has been proposed for detecting and classifying gastrointestinal system findings, aiming at early diagnosis with high accuracy and sensitive measurements and saving workload to help the specialist and objectivity in endoscopic diagnosis. In the first level of the proposed bi-level stacking ensemble approach, predictions are obtained by applying 5-fold cross-validation to three new CNN models. A machine learning classifier selected at the second level is trained according to the obtained predictions, and the final classification result is reached. The performances of the stacking models were compared with the performances of the deep learning models, and McNemar's statistical test was applied to support the results. According to the experimental results, stacking ensemble models performed with a significant difference with 98.42% ACC and 98.19% MCC in the KvasirV2 dataset and 98.53% ACC and 98.39% MCC in the HyperKvasir dataset. This study is the first to offer a new learning-oriented approach that efficiently evaluates CNN features and provides objective and reliable results with statistical testing compared to state-of-the-art studies on the subject. The proposed approach improves the performance of deep learning models and outperforms the state-of-the-art studies in the literature.
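
McNemar's test, which this abstract uses to compare stacking models with single deep learning models, looks only at the test cases on which two classifiers disagree. A minimal sketch with the usual continuity correction follows. Whether the authors used this corrected form or an exact variant is not stated in the abstract, so the formula choice and the function name `mcnemar` are assumptions.

```python
import math


def mcnemar(b, c):
    """McNemar's chi-squared test with continuity correction.

    b: cases classifier A got right and classifier B got wrong.
    c: cases classifier A got wrong and classifier B got right.
    Returns (statistic, p_value) for a chi-squared distribution with 1 df.
    """
    if b + c == 0:
        return 0.0, 1.0  # the classifiers never disagree
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-squared with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

A small p-value indicates that the two classifiers' error patterns differ by more than chance would explain, which is a stronger statement than a difference in accuracy alone.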

14.
Comput Biol Med ; 165: 107386, 2023 10.
Article in English | MEDLINE | ID: mdl-37619323

ABSTRACT

Diabetes mellitus has become a major public health concern associated with high mortality and reduced life expectancy and can cause blindness, heart attacks, kidney failure, lower limb amputations, and strokes. A new generation of antidiabetic peptides (ADPs) that act on β-cells or T-cells to regulate insulin production is being developed to alleviate the effects of diabetes. However, the lack of effective peptide-mining tools has hampered the discovery of these promising drugs. Hence, novel computational tools need to be developed urgently. In this study, we present ADP-Fuse, a novel two-layer prediction framework capable of accurately identifying ADPs or non-ADPs and categorizing them into type 1 and type 2 ADPs. First, we comprehensively evaluated 22 peptide sequence-derived features coupled with eight notable machine learning algorithms. Subsequently, the most suitable feature descriptors and classifiers for both layers were identified. The output of these single-feature models, embedded with multiview information, was trained with an appropriate classifier to provide the final prediction. Comprehensive cross-validation and independent tests substantiate that ADP-Fuse surpasses single-feature models and the feature fusion approach for the prediction of ADPs and their types. In addition, the SHapley Additive exPlanation method was used to elucidate the contributions of individual features to the prediction of ADPs and their types. Finally, a user-friendly web server for ADP-Fuse was developed and made publicly accessible (https://balalab-skku.org/ADP-Fuse), enabling the swift screening and identification of novel ADPs and their types. This framework is expected to contribute significantly to antidiabetic peptide identification.


Subjects
Diabetes Mellitus , Hypoglycemic Agents , Peptides , Amino Acid Sequence , Algorithms , Machine Learning , Computational Biology
15.
Diagnostics (Basel) ; 13(3)2023 Jan 20.
Article in English | MEDLINE | ID: mdl-36766500

ABSTRACT

(1) Background: Accurate diagnosis of wound age is crucial for investigating violent cases in forensic practice. However, effective biomarkers and forecast methods are lacking. (2) Methods: Samples were collected from rats divided randomly into control and contusion groups at 0, 4, 8, 12, 16, 20, and 24 h post-injury. The characteristics of concern were nine mRNA expression levels. Internal validation data were used to train different machine learning algorithms, namely random forest (RF), support vector machine (SVM), multilayer perceptron (MLP), gradient boosting (GB), and stochastic gradient descent (SGD), to predict wound age. These models were considered the base learners, which were then applied to developing 26 stacking ensemble models combining two, three, four, or five base learners. The best-performing stacking model and base learner were evaluated through external validation data. (3) Results: The best results were obtained using a stacking model of RF + SVM + MLP (accuracy = 92.85%, area under the receiver operating characteristic curve (AUROC) = 0.93, root-mean-square-error (RMSE) = 1.06 h). The wound age prediction performance of the stacking models was also confirmed for another independent dataset. (4) Conclusions: We illustrate that machine learning techniques, especially ensemble algorithms, have a high potential to be used to predict wound age. According to the results, the strategy can be applied to other types of forensic forecasts.
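
The figure of 26 stacking models in this abstract follows from counting every subset of the five base learners that has two to five members: 10 + 10 + 5 + 1 = 26. The sketch below enumerates them; the function name `candidate_stacks` is illustrative and the code does not train any model.

```python
from itertools import combinations

# Base learners named in the abstract.
BASE_LEARNERS = ["RF", "SVM", "MLP", "GB", "SGD"]


def candidate_stacks(learners, min_size=2):
    """List every combination of at least `min_size` base learners."""
    stacks = []
    for size in range(min_size, len(learners) + 1):
        stacks.extend(combinations(learners, size))
    return stacks
```

Each tuple in the returned list identifies one candidate ensemble, including the RF + SVM + MLP combination that the abstract reports as best.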

16.
Front Genet ; 14: 1332273, 2023.
Article in English | MEDLINE | ID: mdl-38264213

ABSTRACT

Increasing evidence indicates that mutations and dysregulation of long non-coding RNA (lncRNA) play a crucial role in the pathogenesis and prognosis of complex human diseases. Computational methods for predicting the association between lncRNAs and diseases have gained increasing attention. However, these methods face two key challenges: obtaining reliable negative samples and incorporating lncRNA-disease association (LDA) information from multiple perspectives. This paper proposes a method called NDMLDA, which combines multi-view feature extraction, unsupervised negative sample denoising, and a stacking ensemble classifier. Firstly, an unsupervised method (K-means) is used to design a negative sample denoising module to alleviate the imbalance of samples and the impact of potential noise in the negative samples on model performance. Secondly, graph attention networks are employed to extract multi-view features of both lncRNAs and diseases, thereby enhancing the learning of association information between them. Finally, lncRNA-disease association prediction is implemented through a stacking ensemble classifier. Existing research datasets are integrated to evaluate performance, and 5-fold cross-validation is conducted on this dataset. Experimental results demonstrate that NDMLDA achieves an AUC of 0.9907 and an AUPR of 0.9927, with a 5-fold cross-validation variance of less than 0.1%. These results outperform the baseline methods. Additionally, case studies further illustrate the model's potential in cancer diagnosis and precision medicine implementation.

17.
Environ Sci Pollut Res Int ; 30(27): 71063-71087, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37156950

ABSTRACT

Accurate prediction of carbon emissions is vital to achieving carbon neutrality, which is one of the major goals of the global effort to protect the ecological environment. However, due to the high complexity and volatility of carbon emission time series, it is hard to forecast carbon emissions effectively. This research offers a novel decomposition-ensemble framework for multi-step prediction of short-term carbon emissions. The proposed framework involves three main steps: (i) data decomposition. A secondary decomposition method, which is a combination of empirical wavelet transform (EWT) and variational modal decomposition (VMD), is used to process the original data. (ii) Prediction and selection: ten models are used to forecast the processed data. Then, neighborhood mutual information (NMI) is used to select suitable sub-models from candidate models. (iii) Stacking ensemble: the stacking ensemble learning method is innovatively introduced to integrate the selected sub-models and output the final prediction results. For illustration and verification, the carbon emissions of three representative EU countries are used as our sample data. The empirical results show that the proposed framework is superior to other benchmark models in predictions 1, 15, and 30 steps ahead, with the mean absolute percentage error (MAPE) of the proposed framework being as low as 5.4475% in Italy dataset, 7.3159% in France dataset, and 8.6821% in Germany dataset.
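
The error measure reported in this abstract, mean absolute percentage error (MAPE), has a one-line definition, sketched below. The function name and the toy numbers in the usage note are illustrative assumptions, not data from the study.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes every actual value is non-zero, since each error is
    divided by the corresponding actual value.
    """
    ratios = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)
```

For example, forecasts of 110 and 180 against actual values of 100 and 200 are each off by 10%, so `mape([100, 200], [110, 180])` is 10. Because the error is relative, MAPE allows comparison across country series with different emission levels.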


Subjects
Environment , Wavelet Analysis , Forecasting , Time Factors , France
18.
Comput Biol Med ; 148: 105700, 2022 09.
Article in English | MEDLINE | ID: mdl-35715261

ABSTRACT

Tumor homing peptides (THPs) play a crucial role in recognizing and specifically binding to cancer cells. Although experimental approaches can facilitate the precise identification of THPs, they are usually time-consuming, labor-intensive, and not cost-effective. However, computational approaches can identify THPs by utilizing sequence information alone, thus highlighting their great potential for large-scale identification of THPs. Herein, we propose NEPTUNE, a novel computational approach for the accurate and large-scale identification of THPs from sequence information. Specifically, we constructed variant baseline models from multiple feature encoding schemes coupled with six popular machine learning algorithms. Subsequently, we comprehensively assessed and investigated the effects of these baseline models on THP prediction. Finally, the probabilistic information generated by the optimal baseline models is fed into a support vector machine-based classifier to construct the final meta-predictor (NEPTUNE). Cross-validation and independent tests demonstrated that NEPTUNE achieved superior performance for THP prediction compared with its constituent baseline models and the existing methods. Moreover, we employed the powerful SHapley additive exPlanations method to improve the interpretation of NEPTUNE and elucidate the most important features for identifying THPs. Finally, we implemented an online web server using NEPTUNE, which is available at http://pmlabstack.pythonanywhere.com/NEPTUNE. NEPTUNE could be beneficial for the large-scale identification of unknown THP candidates for follow-up experimental validation.


Subjects
Neoplasms , Neptune , Algorithms , Computational Biology , Humans , Machine Learning , Peptides , Support Vector Machine