Results 1 - 20 of 66
1.
Brief Bioinform ; 25(3), 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38670158

ABSTRACT

Despite the widespread clinical use of ionizable lipid nanoparticles (LNPs) for messenger RNA (mRNA) delivery, screening LNPs efficiently remains a challenge for mRNA drug delivery systems. Traditional screening methods often require substantial experimental time and incur high research and development costs. To accelerate the early development stage of LNPs, we propose TransLNP, a transformer-based transfection prediction model designed to aid in the selection of LNPs for mRNA drug delivery systems. TransLNP uses two types of molecular information to capture the relationship between structure and transfection efficiency: coarse-grained atomic sequence information and fine-grained atomic spatial relationship information. Because existing LNP experimental data are scarce, we find that pretraining the molecular model is crucial for the task of predicting LNP properties; pretraining is performed by reconstructing atomic 3D coordinates and predicting masked atoms. In addition, data imbalance is particularly prominent in the real-world exploration of LNPs. We introduce the BalMol block to address this problem by smoothing the distribution of labels and molecular features. Our approach outperforms state-of-the-art methods in transfection property prediction under both random and scaffold data splitting. Additionally, we establish a relationship between molecular structural similarity and transfection differences, selecting 4267 pairs of molecular transfection cliffs, which are pairs of molecules that exhibit high structural similarity but significant differences in transfection efficiency, thereby revealing the primary source of prediction errors. The code, model and data are made publicly available at https://github.com/wklix/TransLNP.


Subjects
Lipids , Liposomes , Nanoparticles , RNA, Messenger , Nanoparticles/chemistry , RNA, Messenger/genetics , RNA, Messenger/chemistry , Lipids/chemistry , Transfection , Humans , Models, Molecular , Drug Delivery Systems
2.
Brief Bioinform ; 24(4), 2023 Jul 20.
Article in English | MEDLINE | ID: mdl-37328692

ABSTRACT

Protein complexes are key functional units in cellular processes. High-throughput techniques, such as co-fractionation coupled with mass spectrometry (CF-MS), have advanced protein complex studies by enabling global interactome inference. However, dealing with complex fractionation characteristics to define true interactions is not a simple task, since CF-MS is prone to false positives due to the chance co-elution of non-interacting proteins. Several computational methods have been designed to analyze CF-MS data and construct probabilistic protein-protein interaction (PPI) networks. Current methods usually first infer PPIs based on handcrafted CF-MS features, and then use clustering algorithms to form potential protein complexes. While powerful, these methods have two weaknesses: handcrafted features based on domain knowledge may introduce bias, and the severely imbalanced PPI data make the models prone to overfitting. To address these issues, we present a balanced end-to-end learning architecture, Software for Prediction of Interactome with Feature-extraction Free Elution Data (SPIFFED), which integrates feature representation from raw CF-MS data and interactome prediction by a convolutional neural network. SPIFFED outperforms the state-of-the-art methods in predicting PPIs under conventional imbalanced training. When trained with balanced data, SPIFFED shows greatly improved sensitivity for true PPIs. Moreover, the ensemble SPIFFED model provides different voting schemes to integrate predicted PPIs from multiple CF-MS datasets. Using clustering software (i.e. ClusterONE), SPIFFED allows users to infer high-confidence protein complexes depending on the CF-MS experimental designs. The source code of SPIFFED is freely available at: https://github.com/bio-it-station/SPIFFED.


Subjects
Protein Interaction Mapping , Proteins , Protein Interaction Mapping/methods , Proteins/chemistry , Algorithms , Protein Interaction Maps , Software
3.
BMC Bioinformatics ; 25(1): 111, 2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38486135

ABSTRACT

BACKGROUND: DNA-binding proteins (DNA-BPs) are proteins that bind and interact with DNA. DNA-BPs regulate and affect numerous biological processes, such as transcription and DNA replication, repair, and organization of the chromosomal DNA. Very few proteins, however, are DNA-binding in nature. Therefore, it is necessary to develop an efficient predictor for identifying DNA-BPs. RESULT: In this work, we propose new benchmark datasets for the DNA-binding protein prediction problem. We discovered several quality concerns with the widely used benchmark datasets, PDB1075 (for training) and PDB186 (for independent testing), which necessitated the preparation of new benchmark datasets. Our proposed datasets UNIPROT1424 and UNIPROT356 can be used for model training and independent testing, respectively. We retrained selected state-of-the-art DNA-BP predictors on the new dataset and report their performance. We also trained a novel predictor using the new benchmark dataset. We extracted features from various feature categories, then used a Random Forest classifier and Recursive Feature Elimination with Cross-validation (RFECV) to select the optimal set of 452 features. We then proposed a stacking ensemble architecture as our final prediction model. Named Stacking Ensemble Model for DNA-binding Protein Prediction, or StackDPP for short, our model achieved 0.92, 0.92 and 0.93 accuracy in 10-fold cross-validation, jackknife and independent testing, respectively. CONCLUSION: StackDPP performed very well in cross-validation testing and outperformed all the state-of-the-art prediction models in independent testing. Its performance scores in cross-validation testing generalized very well to the independent test set. The source code of the model is publicly available at https://github.com/HasibAhmed1624/StackDPP . Therefore, we expect that this generalized model can be adopted by researchers and practitioners to identify novel DNA-binding proteins.


Subjects
Algorithms , DNA-Binding Proteins , DNA-Binding Proteins/metabolism , Software , DNA/metabolism
4.
Pattern Recognit Lett ; 182: 111-117, 2024 Jun.
Article in English | MEDLINE | ID: mdl-39086494

ABSTRACT

Detecting action units is an important task in face analysis, especially in facial expression recognition. This is due, in part, to the idea that expressions can be decomposed into multiple action units. To evaluate systems that detect action units, F1-binary score is often used as the evaluation metric. In this paper, we argue that F1-binary score does not reliably evaluate these models due largely to class imbalance. Because of this, F1-binary score should be retired and a suitable replacement should be used. We justify this argument through a detailed evaluation of the negative influence of class imbalance on action unit detection. This includes an investigation into the influence of class imbalance in train and test sets and in new data (i.e., generalizability). We empirically show that F1-micro should be used as the replacement for F1-binary.
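As an illustration of the two metrics named in this abstract, the following minimal sketch computes F1-binary and F1-micro on the same predictions for a single, rarely active action unit. The function names and the toy labels are hypothetical, and micro-averaging over both classes of one action unit is only one common reading of "F1-micro"; the paper's exact protocol is not given in the abstract.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn) when `positive` is treated as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_binary(y_true, y_pred, positive=1):
    """F1 of the positive class only."""
    return f1_from_counts(*confusion_counts(y_true, y_pred, positive))


def f1_micro(y_true, y_pred):
    """F1 computed from TP/FP/FN pooled over every class."""
    tp = fp = fn = 0
    for label in sorted(set(y_true) | set(y_pred)):
        c_tp, c_fp, c_fn = confusion_counts(y_true, y_pred, label)
        tp, fp, fn = tp + c_tp, fp + c_fp, fn + c_fn
    return f1_from_counts(tp, fp, fn)


# Toy data (hypothetical): 5 active frames out of 100.
y_true = [1] * 5 + [0] * 95
# The detector finds 3 of the 5 active frames and raises 4 false alarms.
y_pred = [1, 1, 1, 0, 0] + [1] * 4 + [0] * 91

print("F1-binary:", f1_binary(y_true, y_pred))
print("F1-micro: ", f1_micro(y_true, y_pred))
```

On this toy data F1-binary is 0.5 and F1-micro is 0.94. In a single-label setting, pooling the counts over both classes makes F1-micro equal to plain accuracy, which is why the two scores react so differently to class imbalance.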

5.
Brief Bioinform ; 22(5), 2021 Sep 02.
Article in English | MEDLINE | ID: mdl-33834199

ABSTRACT

Post-translational modifications (PTMs) play significant roles in regulating protein structure, activity and function, and they are closely involved in various pathologies. Therefore, the identification of associated PTMs is the foundation of in-depth research on related biological mechanisms, disease treatments and drug design. Due to the high cost and time consumption of high-throughput sequencing techniques, developing machine learning-based predictors has been considered an effective approach to rapidly recognize potential modified sites. However, the imbalanced distribution of true and false PTM sites, namely, the data imbalance problem, largely affects the reliability and application of prediction tools. In this article, we conduct a systematic survey of the research progress in imbalanced PTM classification. First, we describe the modeling process in detail and outline useful data imbalance solutions. Then, we summarize the recently proposed bioinformatics tools based on imbalanced PTM data and build a convenient website, ImClassi_PTMs (available at lab.malab.cn/~dlj/ImbClassi_PTMs/), where researchers can browse them. Moreover, we analyze the challenges of current computational predictors and propose some suggestions to improve the efficiency of imbalance learning. We hope that this work will provide comprehensive knowledge of imbalanced PTM recognition and contribute to advanced predictors in the future.


Subjects
Algorithms , Computational Biology/methods , Machine Learning , Models, Biological , Protein Processing, Post-Translational , Proteins/metabolism , Databases, Protein , Humans , Neural Networks, Computer , Proteins/classification , Reproducibility of Results
6.
Brief Bioinform ; 22(6), 2021 Nov 05.
Article in English | MEDLINE | ID: mdl-34322702

ABSTRACT

Since 2015, a fast-growing number of deep learning-based methods have been proposed for protein-ligand binding site prediction, and many have achieved promising performance. These methods, however, neglect the imbalanced nature of binding site prediction problems. Traditional data-based approaches for handling data imbalance employ linear interpolation of minority class samples. Such approaches may not be fully exploited by deep neural networks on downstream tasks. We present a novel technique for balancing input classes by developing a deep neural network-based variational autoencoder (VAE) that aims to learn important attributes of the minority classes concerning nonlinear combinations. After learning, the trained VAE was used to generate new minority class samples that were later added to the original data to create a balanced dataset. Finally, a convolutional neural network was used for classification, for which we assumed that the nonlinearity could be fully integrated. As a case study, we applied our method to the identification of FAD- and FMN-binding sites of electron transport proteins. Compared with the best classifiers that use traditional machine learning algorithms, our models obtained a great improvement in sensitivity while maintaining similar or higher levels of accuracy and specificity. We also demonstrate that our method is better than other data imbalance handling techniques, such as SMOTE, ADASYN, and class weight adjustment. Additionally, our models outperform existing predictors in predicting the same binding types. Our method is general and can be applied to other data types for prediction problems with moderate-to-heavy data imbalances.


Subjects
Neural Networks, Computer , Algorithms , Deep Learning , Ligands
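The "linear interpolation of minority class samples" that this abstract contrasts with its VAE approach can be sketched as follows. This is a simplified SMOTE-style generator written for illustration only; the function name, the parameter `k` and the toy points are assumptions, and it is neither the authors' VAE nor a full SMOTE implementation.

```python
import random


def interpolate_minority(samples, n_new, k=3, seed=0):
    """Create n_new synthetic points, each on the line segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (squared Euclidean distance). Needs at least two samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        others = [s for s in samples if s is not base]
        others.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class (hypothetical 2-D feature vectors).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = interpolate_minority(minority, n_new=6)
print(len(new_points), "synthetic samples")
```

Every synthetic point lies on a straight segment between two existing minority samples, so the generator can only produce linear combinations of the observed data.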
7.
Brief Bioinform ; 22(5), 2021 Sep 02.
Article in English | MEDLINE | ID: mdl-33709119

ABSTRACT

Discovering drug-target (protein) interactions (DTIs) is of great significance for researching and developing novel drugs, and is of tremendous benefit to the pharmaceutical industry and patients. However, identifying DTIs with wet-lab experimental methods is generally expensive and time-consuming. Therefore, different machine learning-based methods have been developed for this purpose, but a substantial number of unknown interactions remain to be discovered. Furthermore, data imbalance and feature dimensionality are critical challenges in drug-target datasets that can degrade classifier performance and have not yet been adequately addressed. This paper proposes a novel drug-target interaction prediction method called PreDTIs. First, the feature vectors of the protein sequence are extracted by the pseudo-position-specific scoring matrix (PsePSSM), dipeptide composition (DC) and pseudo amino acid composition (PseAAC), and the drug is encoded with MACCS substructure fingerprints. In addition, we propose a FastUS algorithm to handle the class imbalance problem and develop a MoIFS algorithm to remove irrelevant and redundant features and obtain an optimal feature set. Finally, the balanced and optimal features are provided to the LightGBM classifier to identify DTIs, and 5-fold cross-validation is applied to evaluate the prediction ability of the proposed method. Prediction results indicate that the proposed model PreDTIs is significantly superior to other existing methods in predicting DTIs, and our model could be used to discover new drugs for unknown disorders or infections, such as coronavirus disease 2019, using existing drug compounds and severe acute respiratory syndrome coronavirus 2 protein sequences.


Subjects
Computational Biology/methods , Pharmaceutical Preparations/chemistry , Proteins/chemistry , Datasets as Topic , Machine Learning , Protein Binding
8.
Sensors (Basel) ; 23(7), 2023 Apr 02.
Article in English | MEDLINE | ID: mdl-37050751

ABSTRACT

Certain fields present significant challenges when attempting to train complex Deep Learning architectures, particularly when the available datasets are limited and imbalanced. Real-time object detection in maritime environments using aerial images is a notable example. Although SeaDronesSee is the most extensive and complete dataset for this task, it suffers from significant class imbalance. To address this issue, we present POSEIDON, a data augmentation tool specifically designed for object detection datasets. Our approach generates new training samples by combining objects and samples from the original training set while utilizing the image metadata to make informed decisions. We evaluate our method using YOLOv5 and YOLOv8 and demonstrate its superiority over other balancing techniques, such as error weighting, by an overall improvement of 2.33% and 4.6%, respectively.

9.
Sensors (Basel) ; 23(7), 2023 Mar 25.
Article in English | MEDLINE | ID: mdl-37050506

ABSTRACT

The analysis of sleep stages for children plays an important role in early diagnosis and treatment. This paper introduces our sleep stage classification method, which addresses two challenges. The first is the data imbalance problem, i.e., the highly skewed class distribution with underrepresented minority classes. For this, a Gaussian Noise Data Augmentation (GNDA) algorithm was applied to polysomnography recordings to balance the data sizes of the different sleep stages. The second challenge is the difficulty in identifying a minority class of sleep stages, given their short sleep duration and similarities to other stages in terms of EEG characteristics. To overcome this, we developed a DeConvolution- and Self-Attention-based Model (DCSAM), which can invert the feature map of a hidden layer to the input space to extract local features and extract the correlations between all possible pairs of features to distinguish sleep stages. The results on our dataset show that DCSAM based on GNDA obtains an accuracy of 90.26% and a macro F1-score of 86.51%, which are higher than those of our previous method. We also tested DCSAM on a well-known public dataset, Sleep-EDFX, to check whether it is applicable to sleep data from adults. It achieves performance comparable to state-of-the-art methods, with accuracies of 91.77%, 92.54%, 94.73%, and 95.30% for six-stage, five-stage, four-stage, and three-stage classification, respectively. These results imply that our DCSAM based on GNDA has great potential to offer performance improvements in various medical domains by considering the data imbalance problems and correlations among features in time series data.


Subjects
Electroencephalography , Sleep , Adult , Humans , Child , Electroencephalography/methods , Sleep Stages , Polysomnography/methods , Algorithms
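The idea of balancing sleep stages with Gaussian noise augmentation can be sketched as below. The abstract does not give the details of GNDA, so this is a generic version under stated assumptions: each minority-stage epoch is copied with zero-mean Gaussian noise until every stage has as many epochs as the largest one. The function name, `sigma` and the toy epochs are hypothetical.

```python
import random
from collections import Counter


def gaussian_noise_balance(signals, labels, sigma=0.05, seed=0):
    """Return (signals, labels) extended with noisy copies of minority-class
    signals so that every class reaches the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_signals, out_labels = list(signals), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(signals, labels) if y == cls]
        for _ in range(target - n):
            source = rng.choice(pool)
            # Add independent zero-mean Gaussian noise to every time point.
            out_signals.append([x + rng.gauss(0.0, sigma) for x in source])
            out_labels.append(cls)
    return out_signals, out_labels


# Toy epochs (hypothetical): three "W" epochs and one "N1" epoch.
epochs = [[0.1, 0.2, 0.3], [0.0, 0.1, 0.0], [0.2, 0.2, 0.1], [0.9, 0.8, 0.7]]
stages = ["W", "W", "W", "N1"]
aug_epochs, aug_stages = gaussian_noise_balance(epochs, stages)
print(Counter(aug_stages))
```

The noise scale `sigma` controls how far the synthetic epochs move from the originals and would need tuning against the amplitude of real recordings.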
10.
Psychother Res ; 33(6): 683-695, 2023 Jul.
Article in English | MEDLINE | ID: mdl-36669124

ABSTRACT

Objective: The occurrence of dropout from psychological interventions is associated with poor treatment outcome and high health, societal and economic costs. Recently, machine learning (ML) algorithms have been tested in psychotherapy outcome research. Dropout predictions are usually limited by imbalanced datasets and the size of the sample. This paper aims to improve dropout prediction by comparing ML algorithms, sample sizes and resampling methods. Method: Twenty ML algorithms were examined in twelve subsamples (drawn from a sample of N = 49,602) using four resampling methods in comparison to the absence of resampling and to each other. Prediction accuracy was evaluated in an independent holdout dataset using the F1-Measure. Results: Resampling methods improved the performance of ML algorithms and down-sampling can be recommended, as it was the fastest method and as accurate as the other methods. For the highest mean F1-Score of .51 a minimum sample size of N = 300 was necessary. No specific algorithm or algorithm group can be recommended. Conclusion: Resampling methods could improve the accuracy of predicting dropout in psychological interventions. Down-sampling is recommended as it is the least computationally taxing method. The training sample should contain at least 300 cases.


Subjects
Algorithms , Machine Learning , Humans , Sample Size , Psychotherapy
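Down-sampling, the resampling method this abstract recommends, can be sketched as random under-sampling of every class to the size of the smallest one. The abstract does not specify the exact variant used in the study, so the function below is a generic illustration with hypothetical names and toy data; it would be applied to the training sample only, leaving the holdout set untouched.

```python
import random
from collections import Counter, defaultdict


def down_sample(rows, labels, seed=0):
    """Randomly drop rows from the larger classes so that every class keeps
    as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, y in zip(rows, labels):
        by_class[y].append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels


# Toy training data (hypothetical): 2 dropouts (1) and 8 completers (0).
patients = list(range(10))
dropout = [1, 1] + [0] * 8
bal_patients, bal_dropout = down_sample(patients, dropout)
print(Counter(bal_dropout))
```

Because it only discards rows, down-sampling adds no synthetic data and shrinks the training set, which is consistent with the abstract describing it as the least computationally taxing option.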
11.
Comput Commun ; 198: 195-205, 2023 Jan 15.
Article in English | MEDLINE | ID: mdl-36506874

ABSTRACT

Road crashes are a major problem for traffic safety management and usually cause flash crowd traffic, with a profound influence on traffic management and communication systems. In 2020, the sudden outbreak of the novel coronavirus disease (COVID-19) pandemic led to significant changes in road traffic conditions. In this paper, by analyzing crash data from 2016 to 2020 and new COVID-19 case data in 2020, we find that the average crash severity and crash deaths during this period (a rapid increase of new COVID-19 cases in 2020) are higher than those in the previous four years. Hence, it is necessary to develop a novel road crash risk prediction model for such an emergency. We propose a novel data-adaptive fatigue focal loss (DA-FFL) method that fuses fatigue factors to establish a road crash risk prediction model under the scenario of large-scale emergencies. The experimental results demonstrate that DA-FFL performs better than other typical methods in terms of area under curve (AUC) and false alarm rate (FAR) for imbalanced data. Furthermore, DA-FFL has better prediction performance when used with a convolutional neural network-long short-term memory (CNN-LSTM) model.
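For orientation, the standard binary focal loss that methods in this family build on can be written as FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The sketch below implements only this standard form; it is not the DA-FFL method of the paper, whose data-adaptive and fatigue terms are not described in the abstract. The default values of `gamma` and `alpha` are the commonly used ones and are an assumption here.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Standard binary focal loss for one example.

    p: predicted probability of the positive class (e.g. "crash").
    y: true label, 1 or 0.
    gamma: focusing parameter; larger values down-weight easy examples.
    alpha: weight of the positive class; negatives get 1 - alpha.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct positive contributes far less than a missed positive.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
print(easy, hard)
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary cross-entropy, which makes the role of the focusing term easy to check.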

12.
Brief Bioinform ; 21(4): 1448-1454, 2020 Jul 15.
Article in English | MEDLINE | ID: mdl-31267129

ABSTRACT

For genome-wide CRISPR off-target cleavage site (OTS) prediction, an important issue is data imbalance: the number of true OTS recognized by whole-genome off-target detection techniques is much smaller than that of all possible nucleotide mismatch loci, making the training of machine learning models very challenging. Therefore, computational models proposed for OTS prediction and scoring should be carefully designed and properly evaluated in order to avoid bias. In our study, two tools are taken as examples to further emphasize the data imbalance issue in CRISPR off-target prediction, with the aim of achieving better sensitivity and specificity for optimized CRISPR gene editing. We would like to indicate that (1) the benchmark of CRISPR off-target prediction should be properly evaluated and not overestimated, by considering the data imbalance issue; (2) incorporation of efficient computational techniques (including ensemble learning and data synthesis techniques) can help to address the data imbalance issue and improve the performance of CRISPR off-target prediction. Taken together, we call for more efforts to address the data imbalance issue in CRISPR off-target prediction to facilitate the clinical utility of CRISPR-based gene editing techniques.


Subjects
Clustered Regularly Interspaced Short Palindromic Repeats , Gene Editing/methods , Machine Learning
13.
J Biomed Inform ; 135: 104192, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36064114

ABSTRACT

The extraction of drug-drug interactions (DDIs) is an important task in the field of biomedical research, which can reduce unexpected health risks during patient treatment. Previous work indicates that methods using external drug information have a much higher performance than those methods not using it. However, the use of external drug information is time-consuming and resource-costly. In this work, we propose a novel method for extracting DDIs which does not use external drug information, but still achieves comparable performance. First, we no longer convert the drug name to standard tokens such as DRUG0, the method commonly used in previous research. Instead, full drug names with drug entity marking are input to BioBERT, allowing us to enhance the selected drug entity pair. Second, we adopt the Key Semantic Sentence approach to emphasize the words closely related to the DDI relation of the selected drug pair. After the above steps, the misclassification of similar instances which are created from the same sentence but corresponding to different pairs of drug entities can be significantly reduced. Then, we employ the Gradient Harmonizing Mechanism (GHM) loss to reduce the weight of mislabeled instances and easy-to-classify instances, both of which can lead to poor performance in DDI extraction. Overall, we demonstrate in this work that it is better not to use drug blinding with BioBERT, and show that GHM performs better than Cross-Entropy loss if the proportion of label noise is less than 30%. The proposed model achieves state-of-the-art results with an F1-score of 84.13% on the DDIExtraction 2013 corpus (a standard English DDI corpus), which fills the performance gap (4%) between methods that rely on and do not rely on external drug information.


Subjects
Biomedical Research , Semantics , Humans , Data Mining/methods , Drug Interactions
14.
J Biomed Inform ; 129: 104072, 2022 May.
Article in English | MEDLINE | ID: mdl-35421602

ABSTRACT

BACKGROUND: Medical decision-making impacts both individual and public health. Clinical scores are commonly used among various decision-making models to determine the degree of disease deterioration at the bedside. AutoScore was proposed as a useful clinical score generator based on machine learning and a generalized linear model. However, its current framework still leaves room for improvement when addressing unbalanced data of rare events. METHODS: Using machine intelligence approaches, we developed AutoScore-Imbalance, which comprises three components: training dataset optimization, sample weight optimization, and adjusted AutoScore. Baseline techniques for performance comparison included the original AutoScore, full logistic regression, stepwise logistic regression, least absolute shrinkage and selection operator (LASSO), full random forest, and random forest with a reduced number of variables. These models were evaluated based on their area under the curve (AUC) in the receiver operating characteristic analysis and balanced accuracy (i.e., mean value of sensitivity and specificity). By utilizing a publicly accessible dataset from Beth Israel Deaconess Medical Center, we assessed the proposed model and baseline approaches to predict inpatient mortality. RESULTS: AutoScore-Imbalance outperformed baselines in terms of AUC and balanced accuracy. The nine-variable AutoScore-Imbalance sub-model achieved the highest AUC of 0.786 (0.732-0.839), while the eleven-variable original AutoScore obtained an AUC of 0.723 (0.663-0.783), and the logistic regression with 21 variables obtained an AUC of 0.743 (0.685-0.801). The AutoScore-Imbalance sub-model (using a down-sampling algorithm) yielded an AUC of 0.771 (0.718-0.823) with only five variables, demonstrating a good balance between performance and variable sparsity. 
Furthermore, AutoScore-Imbalance obtained the highest balanced accuracy of 0.757 (0.702-0.805), compared to 0.698 (0.643-0.753) by the original AutoScore and the maximum of 0.720 (0.664-0.769) by other baseline models. CONCLUSIONS: We have developed an interpretable tool to handle clinical data imbalance, presented its structure, and demonstrated its superiority over baselines. The AutoScore-Imbalance tool can be applied to highly unbalanced datasets to gain further insight into rare medical events and facilitate real-world clinical decision-making.


Subjects
Algorithms , Machine Learning , Clinical Decision-Making , Logistic Models , ROC Curve
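Balanced accuracy, defined in this abstract as the mean of sensitivity and specificity, can be computed as in the minimal sketch below. The function name and the toy labels are hypothetical and are not taken from the study's dataset.

```python
def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of sensitivity (recall on positives) and specificity (recall on
    negatives). Raises ValueError if either class is absent from y_true."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("y_true must contain both classes")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy data (hypothetical): 10 deaths among 100 admissions, and a model that
# predicts survival for everyone.
outcome = [1] * 10 + [0] * 90
always_survive = [0] * 100
plain_accuracy = sum(t == p for t, p in zip(outcome, always_survive)) / len(outcome)
print(plain_accuracy, balanced_accuracy(outcome, always_survive))
```

On this toy data plain accuracy is 0.9 while balanced accuracy is 0.5, which shows why the latter is preferred when the event of interest is rare.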
15.
Sensors (Basel) ; 22(19), 2022 Sep 29.
Article in English | MEDLINE | ID: mdl-36236506

ABSTRACT

Following the recent advances in wireless communication that have led to more Internet of Things (IoT) systems, many security threats now target IoT systems, causing harm to information. Considering the vast application areas of IoT systems, ensuring that cyberattacks are holistically detected to avoid harm is paramount. Machine learning (ML) algorithms have demonstrated high capacity in helping to mitigate attacks on IoT devices and other edge systems with reasonable accuracy. However, the changing behavior of intruders in IoT networks requires improved intrusion detection system (IDS) models capable of detecting multiple attacks with a higher detection rate and lower computational resource requirements, which is one of the challenges of IoT systems. Many ensemble methods have been used with different ML classifiers, including decision trees and random forests, to propose IDS models for IoT environments. The boosting method is one of the approaches used to design an ensemble classifier. This paper proposes an efficient method for detecting cyberattacks and network intrusions based on boosted ML classifiers. Our proposed model is named BoostedEnsML. First, we train six different ML classifiers (DT, RF, ET, LGBM, AD, and XGB) and obtain one ensemble using the stacking method and another with a majority voting approach. Two different datasets containing high-profile attacks, including distributed denial of service (DDoS), denial of service (DoS), botnets, infiltration, web attacks, Heartbleed, and portscan, were used to train, evaluate, and test the IDS model. To ensure that we obtained a holistic and efficient model, we performed data balancing with the synthetic minority oversampling technique (SMOTE) and adaptive synthetic (ADASYN) techniques; after that, we used stratified K-fold to split the data into training, validation, and testing sets. Based on the best two models, we construct our proposed BoostedEnsML model using LightGBM and XGBoost, as the combination of the two classifiers gives a lightweight yet efficient model, which is part of the target of this research. Experimental results show that BoostedEnsML outperformed existing ensemble models in terms of accuracy, precision, recall, F-score, and area under the curve (AUC), reaching 100% in each case on the selected datasets for multiclass classification.


Subjects
Internet of Things , Algorithms , Area Under Curve , Machine Learning
16.
Sensors (Basel) ; 22(15), 2022 Aug 04.
Article in English | MEDLINE | ID: mdl-35957379

ABSTRACT

As the range of security attacks increases across diverse network applications, intrusion detection systems are of central interest. Such detection systems are more crucial for the Internet of Things (IoT) due to the voluminous and sensitive data it produces. However, the real-world network produces imbalanced traffic including different and unknown attack types. Due to this imbalanced nature of network traffic, the traditional learning-based detection techniques suffer from lower overall detection performance, higher false-positive rate, and lower minority-class attack detection rates. To address the issue, we propose a novel deep generative-based model called Class-wise Focal Loss Variational AutoEncoder (CFLVAE) which overcomes the data imbalance problem by generating new samples for minority attack classes. Furthermore, we design an effective and cost-sensitive objective function called Class-wise Focal Loss (CFL) to train the traditional Variational AutoEncoder (VAE). The CFL objective function focuses on different minority class samples and scrutinizes high-level feature representation of observed data. This leads the VAE to generate more realistic, diverse, and quality intrusion data to create a well-balanced intrusion dataset. The balanced dataset results in improving the intrusion detection accuracy of learning-based classifiers. Therefore, a Deep Neural Network (DNN) classifier with a unique architecture is then trained using the balanced intrusion dataset to enhance the detection performance. Moreover, we utilize a challenging and highly imbalanced intrusion dataset called NSL-KDD to conduct an extensive experiment with the proposed model. The results demonstrate that the proposed CFLVAE with DNN (CFLVAE-DNN) model obtains promising performance in generating realistic new intrusion data samples and achieves superior intrusion detection performance. 
Additionally, the proposed CFLVAE-DNN model outperforms several state-of-the-art data generation and traditional intrusion detection methods. Specifically, the CFLVAE-DNN achieves 88.08% overall intrusion detection accuracy and 3.77% false positive rate. More significantly, it obtains the highest low-frequency attack detection rates for U2R (79.25%) and R2L (67.5%) against all the state-of-the-art algorithms.


Subjects
Internet of Things , Algorithms , Neural Networks, Computer
17.
Appl Soft Comput ; 129: 109588, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36061418

ABSTRACT

Healthcare systems worldwide have been struggling since the beginning of the COVID-19 pandemic. The early diagnosis of this unprecedented infection has become their ultimate objective. Detecting positive patients from chest X-ray images is a quick and efficient solution for overloaded hospitals. Many studies based on deep learning (DL) techniques have shown high performance in classifying COVID-19 chest X-ray images. However, most of these studies suffer from a class imbalance problem, mainly due to the limited number of COVID-19 samples. Such a problem may significantly reduce the efficiency of DL classifiers. In this work, we aim to build an accurate model that assists clinicians in the early diagnosis of COVID-19 using balanced data. To this end, we trained six state-of-the-art convolutional neural networks (CNNs) via transfer learning (TL) on three different COVID-19 datasets. The models were developed to perform a multi-class classification task that distinguishes between COVID-19, normal, and viral pneumonia cases. To address the class imbalance issue, we first investigated the Weighted Categorical Loss (WCL) and then the Synthetic Minority Oversampling Technique (SMOTE) on each dataset separately. After a comparative study of the obtained results, we selected the model that achieved high classification results in terms of accuracy, sensitivity, specificity, precision, F1 score, and AUC compared to other recent works. DenseNet201 and VGG-19 claimed the best scores. With an accuracy of 98.87%, an F1 score of 98.21%, a sensitivity of 98.86%, a specificity of 99.43%, a precision of 100%, and an AUC of 99.15%, the WCL combined with CheXNet outperformed the other examined models.
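A weighted categorical loss of the kind named in this abstract is usually a cross-entropy in which each class carries its own weight. The sketch below uses inverse-frequency weights, n / (k * n_c), which is a common choice and an assumption here; the abstract does not state how the weights were set. All names and the toy labels are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n / (k * n_c): n samples, k classes, n_c in class c."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * m) for cls, m in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class); `probs` holds one dict of class
    probabilities per sample."""
    total = norm = 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        total += -w * math.log(max(p[y], 1e-12))
        norm += w
    return total / norm


# Toy labels (hypothetical): COVID-19 is the rarest class.
labels = ["covid"] + ["normal"] * 3 + ["viral_pneumonia"] * 2
weights = inverse_frequency_weights(labels)
uniform = {"covid": 1 / 3, "normal": 1 / 3, "viral_pneumonia": 1 / 3}
loss = weighted_cross_entropy([uniform] * len(labels), labels, weights)
print(weights, loss)
```

With these weights a misclassified COVID-19 image costs three times as much as a misclassified normal image, so the minority class is not drowned out during training.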

18.
Entropy (Basel) ; 24(8)2022 Aug 02.
Article in English | MEDLINE | ID: mdl-36010729

ABSTRACT

To overcome the lack of flexibility of Harris Hawks Optimization (HHO) in switching between exploration and exploitation, and the low efficiency of its exploitation phase, an efficient improved greedy Harris Hawks Optimizer (IGHHO) is proposed and applied to the feature selection (FS) problem. IGHHO uses a new transformation strategy that allows flexible switching between exploration and exploitation, so that it can jump out of local optima. We replace the original HHO exploitation process with improved differential perturbation and a greedy strategy to improve its global search capability. We tested it in experiments against seven algorithms using unimodal, multimodal, hybrid, and composite CEC2017 benchmark functions, and IGHHO outperformed them on optimization problems with different function characteristics. We propose new objective functions for the problem of data imbalance in FS and apply IGHHO to it. IGHHO outperformed comparison algorithms in terms of classification accuracy and feature subset length. The results show that IGHHO applies not only to global optimization of functions with different characteristics but also to practical optimization problems.
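
To make the idea of an imbalance-aware feature selection objective more concrete, here is a hedged sketch in plain Python. It is not the objective function from the paper, which the abstract does not specify. It assumes a common wrapper-style fitness that mixes a balanced error rate with the fraction of selected features; `balanced_error`, `fs_fitness`, `greedy_accept`, and the weight `alpha` are hypothetical names and parameters chosen for illustration.

```python
def balanced_error(y_true, y_pred):
    """1 - mean per-class recall, so every class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return 1.0 - sum(recalls) / len(recalls)


def fs_fitness(y_true, y_pred, mask, alpha=0.9):
    """Lower is better: weighted sum of balanced error and subset size ratio.

    mask is a 0/1 list marking which features are selected.
    """
    ratio = sum(mask) / len(mask)
    return alpha * balanced_error(y_true, y_pred) + (1.0 - alpha) * ratio


def greedy_accept(current, candidate, fitness):
    """Greedy selection: keep the candidate only if it strictly improves."""
    return candidate if fitness(candidate) < fitness(current) else current
```

A classifier that always predicts the majority class scores well on plain accuracy but poorly on `balanced_error`, which is why a fitness of this kind is less easily misled by skewed class distributions.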

19.
Ecol Lett ; 24(6): 1251-1261, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33783944

ABSTRACT

Ecologists increasingly rely on complex computer simulations to forecast ecological systems. To make such forecasts precise, uncertainties in model parameters and structure must be reduced and correctly propagated to model outputs. Naively using standard statistical techniques for this task, however, can lead to bias and underestimation of uncertainties in parameters and predictions. Here, we explain why these problems occur and propose a framework for robust inference with complex computer simulations. Having identified that model error is more consequential in complex computer simulations, due to their more pronounced nonlinearity and interconnectedness, we discuss two possible solutions: data rebalancing, and adding bias corrections on model outputs or processes during or after the calibration procedure. We illustrate the methods in a case study using a dynamic vegetation model. We conclude that developing better methods for robust inference with complex computer simulations is vital for generating reliable predictions of ecosystem responses.


Subjects
Ecosystem , Models, Statistical , Bayes Theorem , Computer Simulation , Forecasting , Uncertainty
20.
Environ Sci Technol ; 55(14): 9958-9967, 2021 Jul 20.
Article in English | MEDLINE | ID: mdl-34240848

ABSTRACT

Deep learning (DL) offers an unprecedented opportunity to revolutionize the landscape of toxicity prediction based on quantitative structure-activity relationship (QSAR) studies in the big data era. However, the structural description in the reported DL-QSAR models is still restricted to the two-dimensional level. Inspired by point clouds, a type of geometric data structure, a novel three-dimensional (3D) molecular surface point cloud with electrostatic potential (SepPC) was proposed to describe chemical structures. Each surface point of a chemical is assigned its 3D coordinate and molecular electrostatic potential. A novel DL architecture SepPCNET was then introduced to directly consume unordered SepPC data for toxicity classification. The SepPCNET model was trained on 1317 chemicals tested in a battery of 18 estrogen receptor-related assays of the ToxCast program. The obtained model recognized the active and inactive chemicals at accuracies of 82.8 and 88.9%, respectively, with a total accuracy of 88.3% on the internal test set and 92.5% on the external test set, which outperformed other up-to-date machine learning models and succeeded in recognizing the difference in the activity of isomers. Additional insights into the toxicity mechanism were also gained by visualizing critical points and extracting data-driven point features of active chemicals.
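
The phrase "directly consume unordered SepPC data" implies that the model's output must not depend on the order of the surface points. The sketch below shows one standard way to get that property: a shared per-point transform followed by a symmetric max pooling, in the style of PointNet-like networks. It is an assumption-laden toy, not the SepPCNET architecture; the point layout `(x, y, z, potential)`, the fixed weight matrix, and the function names are all hypothetical.

```python
def point_features(point, weights):
    """Shared per-point linear map followed by ReLU.

    point  : tuple (x, y, z, potential), an assumed layout
    weights: list of rows, one row per output feature
    """
    return [max(0.0, sum(w * v for w, v in zip(row, point))) for row in weights]


def global_descriptor(cloud, weights):
    """Max-pool each feature over all points.

    Because max is symmetric, the result is independent of point order.
    """
    feats = [point_features(p, weights) for p in cloud]
    return [max(col) for col in zip(*feats)]


# Toy cloud of three surface points with made-up coordinates and potentials.
cloud = [(1.0, 0.0, 0.0, -0.2), (-1.0, 2.0, 0.0, 0.5), (0.5, 0.0, 1.0, 0.1)]
weights = [[1, 0, 0, 0], [0, 0, 0, 1]]  # feature 0 reads x, feature 1 reads potential
descriptor = global_descriptor(cloud, weights)
```

Reordering `cloud` leaves `descriptor` unchanged, which is the property a network needs in order to take a raw point set as input without imposing an arbitrary ordering.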


Subjects
Estrogens , Quantitative Structure-Activity Relationship , Estrogens/toxicity , Humans , Static Electricity